SAES-422 Multistate Research Activity Accomplishments Report

Status: Approved

Basic Information

Participants

Fernandez, Gina (gefernan@ncsu.edu) - North Carolina State University (NC); Gasic, Ksenija (kgasic@clemson.edu) - Clemson University (SC); Hulbert, Scot (scot_hulbert@wsu.edu) - Washington State University (WA); Iorizzo, Massimo (miorizz@ncsu.edu) - North Carolina State University (NC); Jung, Sook (sook.jung@wsu.edu) - Washington State University (WA); Knapp, Steven (sjknapp@ucdavis.edu) - University of California, Davis (CA); Main, Doreen (dorrie@wsu.edu) - Washington State University (WA); Peace, Cameron (cpeace@wsu.edu) - Washington State University (WA).

At the annual meeting, we discussed progress on each objective and the outstanding work to be completed by the end of the project in 2024. We spent considerable time discussing sustainability options for the five databases and the other resources that make up NRSP10. An action plan was developed as a result of these discussions, and a second meeting of participants is being planned for May 2023. During year 3, we also met several times with project participants, database steering committee members, and over 160 community researchers to present and discuss the work of NRSP10 and to provide training on the use of NRSP10 resources.

Accomplishments

Objective 1: Expand the online community databases for Rosaceae, citrus, cotton, cool-season food legumes and Vaccinium crops. Progress from 10/01/21 to 09/30/22 (Year 3): We have continued to add genomic, genetic, and breeding data to our five databases and have added new or modified tools for increased functionality. In year 3, we added 74 genomes, 4,480,397 genes, 4,662,527 mRNAs, 27 genetic maps, 298,463 markers, 66,392 phenotypes, 534 trait loci, 5,997,856 genotypes and 3,729 germplasm. A total of 9 TB of data was added to NRSP10 database resources. These data were provided by participants and community partners or were obtained from publications. In addition, we perform data analyses to add functional annotation to every genome, align markers and transcripts to the genomes, perform synteny analysis, identify orthologs and paralogs, and create pathway maps for genome sequences. Manual data curation includes standardizing marker and QTL names across publications and continually updating the trait ontologies that we develop and maintain.

Each of our databases has a "work completed" page detailing the data and tools added, by date. Database usage continues to grow. Usage from 10/01/21 to 09/30/22 was as follows:
GDR – 37,755 users from 180 countries, 1,235,564 pages viewed over 102,615 visits (22% U.S.);
CottonGen – 30,976 users from 175 countries, 400,492 pages viewed over 61,989 visits (23% U.S.);
Pulse Crop Database – 10,182 users from 141 countries, 68,506 pages viewed over 13,666 visits (17% U.S.);
Citrus Genome Database – 15,273 users from 165 countries, 286,005 pages viewed over 25,850 visits (18% U.S.); and
GDV – 6,038 users from 109 countries, 99,170 pages viewed over 11,085 visits (27% U.S.).

During the first 5 years of the NRSP10 project (10/01/14-09/30/19), users accessed 3,679,433 pages of these databases. In the first 3 years of the current NRSP10, our community of users has accessed 5,465,318 pages, and the databases have been cited in 482 peer-reviewed publications.

Objective 2: Develop a Tripal module for visualization of epigenetics data. Progress from 10/01/19 to 02/15/22: This module is no longer needed, as methylation data are now visualized in the JBrowse genome browser.

Objective 3: Enhance TripalMap to integrate genomic and genetic data. Progress from 10/01/21 to 02/15/22: TripalMap, our genetic map comparison and visualization viewer, now includes a genome comparison view with a genome correspondence matrix and organism filtering for genomes (TripalMap v2.0). Chromosomes and linkage groups are linked by shared markers, allowing users to explore the genomic features around a QTL even when only its genetic position is available. Genes in the TripalMap chromosome view are hyperlinked to our JBrowse genome viewer. TripalMap v2.0 has been implemented in all NRSP10 databases.

Objective 4: Enhance TripalBIMS to (a) support phenomics data, (b) add GWAS analysis, and (c) add global performance prediction capability. Progress from 10/01/21 to 09/30/22: We continued development of the Tripal Breeding Information Management System (BIMS), a program embedded in each of our five databases, and began development of a standalone BIMS not associated with any database (https://www.breedwithbims.org) to meet requests from breeders of non-NRSP10 crops for access to such a resource. The standalone site also provides central access to all BIMS training manuals, webinars, presentations and publications, including the updated BIMS manual, making it a comprehensive resource. New functionality during year 3 includes (1) BrAPI compliance, which enables breeders to send data directly to BIMS from any BrAPI-compliant resource, such as the Field Book app; (2) updates to BIMS to keep pace with Field Book updates; (3) importing of public data from the community database into a breeder's private BIMS program; (4) a prototype global prediction tool, in which users can load genotypic data for their germplasm of interest and choose an environmental condition and trait name to view predicted trait values; (5) an updated BIMS training manual; and (6) dedicated BIMS training for 3 programs and multiple crops.

Objective 5: (a) Identify sustainability models and (b) provide additional tools and resources as required by the community. Progress from 10/01/21 to 09/30/22: (a) In year 3, Phoenix Bioinformatics conducted a database user survey to gauge the willingness of users and institutions to financially support our GDR, CottonGen and GDV NRSP10 databases (representative models for all 5 of our databases). Phoenix Bioinformatics also contacted key industry representatives to discuss possible sustainability models and their willingness to provide financial support. One non-federal funding organization has indicated that it will allow principal investigators to request funds in their research proposals to support their crop database. USDA ARS has submitted a funding request for $500,000 per year to be appropriated to support CottonGen. A new SCRI database project (PD Main) was funded ($5.2M, 2022-2026) to support collection and curation of genomic, genetic, and breeding data for our Rosaceae, citrus, Vaccinium, and pulse crop databases, with significant resources allocated to training and outreach. Cotton Incorporated/USDA-ARS and the cotton industry provided $154,000 in year 3 for CottonGen data curation and some development work. (b) At the request of the American Pomological Society and ASHS, we have developed an online register of tree fruit and nut cultivars (https://www.fruitandnutlist.org/), which collates and standardizes information from the last 50 tree fruit and nut registers into searchable data for use by breeders and growers. At the request of the US Plant Breeding Coordinating Committee (PBCC, SCC-80), we developed and maintain web pages (https://www.nrsp10.org/PBCC_about_us) within the NRSP10 website for PBCC information management and resource dissemination to the plant breeding community.
At the request of the National Association of Plant Breeders (NAPB) leadership team, we have designed and developed a new NAPB website using Drupal 9, which is easier to manage and update than the previous website. Data on U.S. plant breeding programs, and access to that information (https://www.nrsp10.org/us-breeding-program), continue to be updated using tools available on the NRSP10 site. A new 5-year survey is slated for spring 2023 through the NRSP10 site, with results to be presented at the annual NAPB meeting in July 2023.

Core Tripal Database Platform: NRSP10 continues to develop core Tripal and to provide support for communities using, or interested in adopting, this database platform. Between 10/01/21 and 09/30/22, core Tripal efforts focused on Tripal v4 core development and Tripal v3 core maintenance. Community engagement includes monthly user group meetings and the use of Slack for real-time conversations among the Project Management Committee, the Tripal Advisory Council, and users in general; the Chado group also uses the Tripal Slack for its conversations. Releases include:

Tripal v3.8 – released 02/22/22, with various bug fixes and improvements to the OBO loader and Tripal fields;
Tripal v3.9 – released 08/12/22, with mostly minor bug fixes and compatibility improvements; and
Tripal MegaSearch v1.4.0 – released 09/27/22.

Over 140 sites have installed the Tripal database platform to date.

AgBioData, Consortium of Agricultural Databases: On September 1, 2021, NSF funded ($600,000) a three-year Research Coordination Network project, “RCN: Reimagining a Sustainable Data Network to Accelerate Agricultural Research and Discovery”. The NRSP10 project team helped lead this proposal, and the AgBioData website (https://www.agbiodata.org) is developed and hosted through NRSP10. The AgBioData Research Coordination Network (RCN) aims to accelerate research in agricultural science by increasing the accessibility and reuse of large-scale biological data. It focuses on increasing the value of big scientific data through the FAIR (findable, accessible, interoperable, and reusable) model. Annual meetings bring together multidisciplinary scientists from all the major U.S. agricultural genomic, genetic, and breeding (GGB) databases and allied resources, accelerating synergistic efforts to make the large volume of data curated by AgBioData databases FAIR. This shortens data processing and curation times, simplifies data management, and standardizes data handling across databases, making it easier for researchers to find and use data. The larger network of all data stakeholders is working together to identify and prioritize the most pressing data and metadata standardization needs and to develop and implement processes to address them. Working groups have been created to focus on key community data issues, including unified nomenclature, metadata standards, data federation, and recommendations for emerging data types. The network directly supports data-generating scientists by developing clearly defined FAIR data management guides, which provide a framework for maximizing data visibility and reuse for common types of biological data, and a foundational FAIR data management curriculum suitable for academic courses or short training modules. Lastly, sustainability efforts will provide a roadmap for the future of GGB databases.
The AgBioData Consortium is a model for how database managers, researchers, educators, and publishers can work together to be more resource-efficient and how, as central resources for the communities they serve, they can use a collective voice to benefit all scientists, both domestically and abroad.

Impacts

  1. Continued to enable basic discovery and crop improvement research in tree fruit, berries, nuts and cotton through access to high-quality, curated and integrated data and analysis tools in the Rosaceae, Cotton, Vaccinium, Citrus and Legume databases and the one-stop genome annotation platform supported by NRSP10. Use and impact of the NRSP10 databases continue to grow as more data and functionality are added. Between October 1, 2021, and September 30, 2022, these collective database resources were visited by 12,402 U.S. visitors from all 50 states and major territories, out of a total of 188,104 visitors from 181 countries, with over 4 million pages accessed. NRSP10 databases were cited in 482 peer-reviewed publications in 2022 (Google Scholar). Please see the general/citations page in each database for links to publications citing the databases.
  2. NRSP10 project activities have improved breeding program efficiency, enabling high-throughput evaluation of newly developed peach, dry pea, and berry cultivars and breeding lines to produce more individuals with the desired traits. These activities accelerate the development of organic pulse cultivars and of disease-tolerant, climate-resilient peach cultivars adapted to the southeastern environment. The broader public will benefit from high-protein, micronutrient-rich pulse crops (dry pea and lentil) that help combat obesity and malnutrition, and from access to affordable peach fruit produced with low pesticide input. More than 1,000 advanced organic dry pea cultivars biofortified with high protein will be field tested for target release in 2025.
  3. Access to data on U.S. plant breeding capacity enables national, state, and institutional decision-making. In 2018, at the request of the Plant Breeding Coordinating Committee (SCC-80), the NRSP10 team developed a long-term resource to collect, collate and disseminate information on U.S. plant breeding capacity in public sector institutions. This work was funded through the NRSP10 project and an NSF PGRP grant to PI Main. Breeding program information continues to be updated through the NRSP10 site, which is particularly important for capturing data on retiring and new breeders. The next 5-year survey of U.S. plant breeding capacity is slated for spring 2023, with results to be presented at the annual NAPB meeting in July 2023.
  4. Enhanced visibility of the resources provided by NRSP10. Disseminating the results of this project further increased the visibility of, and demand for, the resources provided by NRSP10. Between October 1, 2021, and September 1, 2022, this was accomplished through 9 peer-reviewed publications; presentations at conferences (Biennial NAPIA and BIC Bean Meetings, November 2021; Plant and Animal Genome Conference, January 2022; 10th ISHS Peach Symposium, June 2022; American Society of Plant Biologists Annual Meeting, July 2022; American Society for Horticultural Science Annual Meeting, July 2022; and International Symposium on Breeding and Effective Use of Biotechnology and Molecular Tools in Horticultural Crops, August 2022); Rosaceae and Vaccinium training workshops; Tripal codefests; several webinars; 20 newsletters; and 8 short "How To" videos. All presentations and webinars are available from the database websites.
  5. Reduced redundancy of effort and resources by providing access to a standardized Tripal database platform, with help-desk support that facilitates adoption. There are currently over 140 species/clade genomics, genetics and/or breeding databases using this common platform, and over 6,000 crop and wild relative species are now served through a Tripal database. Extensive collaboration among 12 database research groups in 5 countries continues to increase co-development and extension of Tripal platform functionality. Funding for Tripal development by federal agencies and industry demonstrates the impact of this open-source software as a standard platform for databases that serve genomics, genetics and breeding data and analysis/visualization tools to scientists. A proposal for a large federal grant to further support Tripal is currently in preparation, led by NRSP10 project participants Stephen Ficklin (WSU), Meg Staton (UTenn), Jill Wegrzyn (UConn), and Dorrie Main (WSU).
  6. The AgBioData Research Coordination Network (RCN) continues to accelerate research in agricultural science by increasing the accessibility and reuse of large-scale biological data and by increasing communication and coordination toward sustainability within the Agricultural Biological Database Community (AgBioData), facilitated through the NRSP10-hosted AgBioData website (https://www.agbiodata.org).
  7. Increased sustainability of NRSP10 databases by leveraging more than $14M in federal and industry funding (2014-2026). This includes the newly funded $5.2M USDA NIFA SCRI grant to PDs Main/Jung/Peace (WSU), Gasic/Rife (Clemson), Ru (Auburn), Gmitter (Florida) and Bassil/McGee (USDA ARS) for our Rosaceae, Vaccinium, citrus and pulse crop databases, and ~$350,000 from USDA NIFA, Cotton Incorporated, the cotton industry, and USDA ARS for CottonGen (2021-2024). Other sources of support include two SCRI projects (VacCAP, PD Iorizzo, NCSU; Lentil, PD Burrows, Montana State University).

Publications

Yu, J., Jung, S., Cheng, C.H., Lee, T., Zheng, P., Buble, K., Crabb, J., Humann, J., Hough, H., Jones, D. and Campbell, J.T., 2021. CottonGen: The Community Database for Cotton Genomics, Genetics, and Breeding Research. Plants, 10(12), p.2805. https://doi.org/10.3390/plants10122805

Jung, S., Lee, T., Cheng, C. H., Zheng, P., Buble, K., Crabb, J., ... & Main, D. (2022, May). Resources for peach genomics, genetics and breeding research in GDR, the Genome Database for Rosaceae. In X International Peach Symposium, Acta Horticulturae 1352 (pp. 149-156). https://doi.org/10.17660/ActaHortic.2022.1352.20

Edger, P. P., Iorizzo, M., Bassil, N. V., Benevenuto, J., Ferrão, L. F. V., Giongo, L., ... & Zalapa, J. (2022). There and back again; historical perspective and future directions for Vaccinium breeding and research studies. Horticulture Research, 9, uhac083. https://doi.org/10.1093/hr/uhac083

Rolling, W. R., Senalik, D., Iorizzo, M., Ellison, S., Van Deynze, A., & Simon, P. W. (2022). CarrotOmics: a genetics and comparative genomics database for carrot (Daucus carota). Database, baac079. https://doi.org/10.1093/database/baac079

Thavarajah, D., Lawrence, T. J., Powers, S. E., Kay, J., Thavarajah, P., Shipe, E., ... & Boyles, R. (2022). Organic dry pea (Pisum sativum L.) biofortification for better human health. PLoS ONE, 17(1), e0261109. https://doi.org/10.1371/journal.pone.0261109

Salaria, S., Boatwright, J. L., Thavarajah, P., Kumar, S., & Thavarajah, D. (2022). Protein Biofortification in Lentils (Lens culinaris Medik.) Toward Human Health. Frontiers in Plant Science, 13, 869713. https://doi.org/10.3389/fpls.2022.869713

Madurapperumage, A., Johnson, N., Thavarajah, P., Tang, L., & Thavarajah, D. (2022). Fourier‐transform infrared spectroscopy (FTIR) as a high‐throughput phenotyping tool for quantifying protein quality in pulse crops. The Plant Phenome Journal, 5(1), e20047. https://doi.org/10.1002/ppj2.20047

Reda, T., & Powers, S. E. (2022). Falling into line: Adaptation of organically grown kale (Brassica oleracea var. acephala) and kale relatives to fall planting. Scientia Horticulturae, 295, 110878. https://doi.org/10.1016/j.scienta.2022.110878

Humann J, Crabb J, Cheng C-H, Lee T, Zheng P, Buble K, Jung S, Yu J, Hough H, Coyne C, McGee R, Main D. 2021. The Pulse Crop Database: A Resource for Pulse Crop Research and Improvement. Proceedings of the 2021 Biennial NAPIA and BIC Bean Meeting, November 4-6, 2021.

Jung S, Lee T, Cheng C-H, Gasic K, Humann JL, Yu J, Hough H, Campbell BT, Main D. 2022. New Features on the Tripal BIMS, Breeding Information Management System. Proceedings of the Plant and Animal Genome XXIX Conference, January 8-12, 2022.

Ficklin SP, Wytko C, Soto B, Main D, Feltus FA. 2022. Two New Tripal Extension Modules: File Management for FAIR Data and Biological Networks. Proceedings of the Plant and Animal Genome XXIX Conference, January 8-12, 2022.

Honaas LA, Zhang H, Wafula EK, Eilers J, Harkess A, Timilsena P, dePamphilis CW, Waite J, Jung S, Main D. 2022. Towards a Catalog of Pome Tree Architecture Genes: The Draft ‘d’Anjou’ Genome (Pyrus communis L.). Proceedings of the Plant and Animal Genome XXIX Conference, January 8-12, 2022.

Humann JL, Cheng C-H, Crabb J, Jung S, Yu J, Main D. 2022. Using the Tripal Pub Curator Module to Manage Data Curation from Publications. Proceedings of the Plant and Animal Genome XXIX Conference, January 8-12, 2022.

Ru S, Hardner C, Evans K, Main D, Carter PA, Harshman J, Sandefur P, Edge-Garza D, Peace C. 2022. Empirical Evaluation of Multi-Trait DNA Testing in an Apple Seedling Population. Proceedings of the Plant and Animal Genome XXIX Conference, January 8-12, 2022.

Iorizzo M, Mengist MF, Bostan H, De Paola D, Teresi S, Teresi A, Cremona G, Qi X, Mackey T, Bassil N, Ashrafi H, Giongo L, Jibran R, Chagne D, Bianco L, Lila MA, Rowland LJ, Iovene M, Edger P. 2022. Comparative genome analysis in blueberry revealed autopolyploid recombination behavior and a heterozygous reciprocal translocation. American Society for Horticultural Science 2022 Annual Conference, July 29-August 3, 2022, Chicago, IL, USA.

Iorizzo M, Lila MA, Perkins-Veazie P, Pottorff M, Mengist MF, Colonna A, Vorsa N, Edger P, Bassil N, Luby C, Mackey T, Munoz P, Zalapa J, Gallardo KR, Atucha A, Main D, Giongo L, Li C, Polashock J, Sims C, Canales E, DeVetter L, Chagne D, Espley R, Coe M. 2022. Vaccinium CAP: A community-based project to develop advanced genetic and genomic tools to improve fruit quality in blueberry and cranberry. Acta Horticulturae, Proceedings of the International Symposium on Breeding and Effective Use of Biotechnology and Molecular Tools in Horticultural Crops, August 14-20, 2022, Angers, France.

Mengist MF, Grace M, Mackey T, Bassil N, Luby C, Ferruzzi M, Lila MA, Iorizzo M. 2022. Dissecting the genetic basis of anthocyanins accumulation and diversity in blueberries (Vaccinium corymbosum L). American Society for Horticultural Science 2022 Annual Conference, July 29-August 3, 2022, Chicago, IL, USA.

Humann J, Crabb J, Frank M, Cheng C-H, Zheng P, Lee T, Buble K, Hough H, Scott K, Jung S, Main D. 2022. Using the Citrus Genome Database for Genetics, Genomics, and Breeding Research. American Society for Horticultural Science 2022 Annual Meeting, July 30-August 3, 2022, Chicago, IL.

Jung S, Yu J, Humann J, Zheng P, Lee T, Cheng C-H, Buble K, Hough H, Main D. 2022. Database and Data Management Resources for Tree Fruit and Berry Breeding. American Society for Horticultural Science 2022 Annual Meeting, July 30-August 3, 2022, Chicago, IL.

Jung S, Humann J, Yu J, Zheng P, Lee T, Cheng C-H, Buble K, Hough H, Main D. 2022. Complementary Efforts By Other Databases, Initiatives and Networks to Facilitate Use of the Existing Genetic Diversity in the NPGS System and Other Collections (GDV, GDR, AgBioData). American Society for Horticultural Science 2022 Annual Meeting, July 30-August 3, 2022, Chicago, IL.

Jung S, Yu J, Humann J, Zheng P, Lee T, Cheng C-H, Buble K, Hough H, Main D. 2022. Update on database resources for tree fruit, berry, cotton, and pulse genome research. Plant Biology 2022 (organized by ASPB), July 11, 2022, Portland, OR.

Jung S, Lee T, Cheng C-H, Zheng P, Buble K, Crabb J, Gasic K, Yu J, Humann J, Hough H, Main D. 2022. Resources for peach genomics, genetics and breeding research in GDR, the Genome Database for Rosaceae. Proceedings of the 10th ISHS Peach Symposium, May 30-June 3, 2022, Naoussa, Greece.
