SAES-422 Multistate Research Activity Accomplishments Report
Status: Approved
Basic Information
- Project No. and Title: NRSP_OLD8 : National Animal Genome Research Program
- Period Covered: 10/01/2014 to 09/01/2015
- Date of Report: 02/26/2015
- Annual Meeting Dates: 01/10/2015 to 01/11/2015
Participants
The NRSP-8 business meeting was preceded by two days of species workshops, area subcommittees, and the combined Animal Genome Workshop presented on Sunday afternoon. The combined workshop included four plenary presentations: Dr. Kim Worley, Baylor College of Medicine, “Improving the Reference – Better Genomes for the Sheep and the Cow”; Dr. John Hickey, The Roslin Institute, University of Edinburgh, “Genomic Selection 2.0”; Dr. Hans Cheng, USDA-ARS-ADOL, who delivered the NRSP8 Distinguished Lecture entitled “Integrative Genomics to Provide Basic and Applied Knowledge to Control Marek’s Disease in Chicken”; and Dr. Elisabetta Giuffra, INRA, UMR de Genetique Animale et Biologie Integrative, “The Functional Annotation of Animal Genomes (FAANG) Initiative”. Dr. Giuffra’s presentation was followed by a 30-minute FAANG Roundtable Discussion by members of the international FAANG consortium. Topics of discussion included current pilot projects as well as plans for future large-scale collaborative work on annotation of regulatory elements across many animal clades. The business meeting was called to order by the Chair, Dr. Stephen White (USDA-ARS-ADRU, Pullman, WA), and minutes were recorded by the Secretary, Dr. Daniel Ciobanu (University of Nebraska-Lincoln), with approximately 40 members in attendance. Coordinator reports were presented for the species/topic groups of Cattle, Poultry, Swine, Sheep and Small Ruminants, Equine, Aquaculture, and Bioinformatics. Dr. Eric Young (North Carolina State University) provided the administrative report. Dr. Lakshmi Matukumalli (contact for the Tools and Resources – Animal Breeding, Genetics and Genomics program, USDA-NIFA-AFRI) provided a brief update. It was confirmed that the 2015 NRSP8 meeting will again be held in conjunction with the Plant and Animal Genome conference in San Diego. Dr. Daniel Ciobanu assumed the NRSP-8 Chair for 2015-2016, and Dr. Huaijun Zhou (University of California-Davis) was elected Secretary for 2016-2017.
The meeting was adjourned.
Accomplishments
Objective 1: Catfish
A) Cooperative research between the USDA-ARS Warmwater Aquaculture Research Unit and the School of Fisheries, Aquaculture and Aquatic Sciences at Auburn University has resulted in the first-generation catfish genome sequence assembly. Next-generation sequences from a doubled haploid channel catfish were error-corrected and assembled using the MaSuRCA/Whole Genome Shotgun Assembler pipeline. Mate-pair reads from 3 kb and 8 kb fragments, and paired-end sequences from 34.4 kb fosmid clones, were used to link contigs into scaffolds. Illumina and PacBio sequences were used to fill scaffold gaps, which increased the average contig length from 7.2 kb to 17.1 kb. Half of the assembled bases were contained in 2,861 contigs of 76.7 kb or longer (up to 607 kb) and in 113 scaffolds of 1.88 Mb or longer (up to 11.5 Mb). Ninety-nine percent of the assembled bases were contained in 5,299 scaffolds of 1 kb or longer. The k-mer-based genome size estimate was 948 Mb, and the combined length of contigs and degenerates (sequences deemed to be genomic repeats) was 934 Mb. Of the assembled bases, 93.7% could be placed on the high-density genetic map and 95.6% on the BAC physical map. B) The catfish genome was annotated using transcriptome sequencing. Through transcriptome analysis of various tissues, almost 28,000 genes were identified, and 23,000 complete cDNAs have been assembled and annotated. Gene families and gene duplication were analyzed. C) SNP resources were developed and validated in different lines of blue catfish.
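The statements above about the contigs and scaffolds that hold half of the assembled bases describe the contig and scaffold N50 (76.7 kb and 1.88 Mb), with the counts (2,861 and 113) being the corresponding L50 values. A minimal sketch of how these statistics are computed from a list of sequence lengths; the lengths in the usage line are invented for illustration.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    N50 is the length of the shortest sequence among the longest sequences
    that together hold at least half of all assembled bases; L50 is the
    number of sequences needed to reach that half.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating assembled bases.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise AssertionError("unreachable: the full set always reaches half")


# Toy example with lengths in kb (hypothetical values).
print(n50_l50([100, 80, 60, 40, 20]))  # (80, 2)
```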
Oyster
A) A proposal to sequence and assemble the Eastern Oyster genome was funded by the USDA NIFA Animal Breeding, Genetics, and Genomics program (PI: Marta Gómez-Chiarri). B) A draft genome of the pearl oyster (Pinctada fucata) is in progress (NCBI BioProject ID PRJDB2628). C) The mantle transcriptome of the pearl oyster Pinctada maxima was sequenced, assembled and annotated, and 1,764 SSRs were identified. D) The Pacific Oyster methylome was characterized. Genes regulated by CpG methylation largely originate within the eukaryotic lineage, suggesting that alternate methylation patterns contribute to the radiation of eukaryotic taxa.
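The source does not name the tool used to find SSRs (simple sequence repeats, or microsatellites) in the pearl oyster transcriptome. A minimal regex-based sketch of how perfect 1 to 6 bp repeats can be located in an assembled transcript; the minimum repeat counts per motif length are illustrative assumptions, and compound or imperfect repeats are ignored.

```python
import re

# Assumed minimum repeat counts for mono- to hexanucleotide motifs
# (illustrative thresholds, not those used in the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit, e.g. 'ATAT'."""
    k = len(motif)
    return any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer capture group followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound(motif):
                continue  # reported under its shorter repeat unit instead
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("GGCACACACACACACTT"))  # [(2, 'CA', 6)]
```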
Salmonids
A) The first rainbow trout reference genome was published (Berthelot et al., 2014) and the genome assembly was posted at the NCBI genome database. B) Thousands of SNP markers from RAD tags were identified and genotyped for various populations of O. mykiss (Matala et al. 2014), O. tshawytscha, O. nerka, and O. kisutch. C) An effective method for custom amplicon sequencing (GT-seq) that allows thousands of fish to be genotyped for panels of 100-1000 SNPs was developed (Campbell et al. 2014, E-published).
Shrimp
A high-quality draft assembly of the Litopenaeus vannamei genome remains elusive. Several groups are working on this, including a team led by Mike Criscitiello at Texas A&M and Rogerio Sotelo-Mundo at CIAD in Hermosillo, Mexico, and one led by Jianguo He at Sun Yat-sen University in China. Efforts are currently focused on assembly methods and the inclusion of more PacBio data.
Striped bass
The 585 Mb hybrid Illumina-PacBio striped bass genome sequence assembly was annotated, and a JBrowse website is being designed to provide online access to it through the NRSP8 website (URL: http://stripedbass.animalgenome.org/), with availability anticipated in 2016. A dedicated virtual machine platform has been set up to develop cyber resources for the Striped Bass Genome Database project at NC State University. Female and male white bass genomes have also been sequenced using Illumina and assembled using the striped bass genome as a reference. The female and male white bass genome assemblies consist of 57,533 contigs (643 Mb) and 56,818 contigs (644 Mb), respectively, and partial CEGMA scores indicate that both assemblies are 97.98% complete.
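CEGMA reports completeness as the percentage of a fixed set of 248 core eukaryotic genes detected in an assembly. Assuming that standard set, a 97.98% partial score corresponds to 243 of the 248 genes being found at least partially; the count of 243 is inferred from the percentage here and is not stated in the report.

```python
# CEGMA's completeness report uses a fixed set of 248 core eukaryotic genes (CEGs).
CORE_GENES = 248


def cegma_completeness(genes_found):
    """Percent of the 248 CEGs detected in an assembly, rounded to 2 decimals."""
    return round(100 * genes_found / CORE_GENES, 2)


print(cegma_completeness(243))  # 97.98
```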
Objective 2: Catfish
A) Bulked segregant RNA-seq (BSR-Seq) was used to identify differentially expressed genes and SNPs associated with resistance to enteric septicemia of catfish (ESC). A total of 1,255 genes were differentially expressed between resistant and susceptible fish. In addition, 56,419 SNPs located in 4,304 unique genes differed significantly between susceptible and resistant fish. Detailed analysis of these significant SNPs made it possible to distinguish those caused by genetic segregation from those caused by allele-specific expression. Mapping of the significant SNPs, together with analysis of differentially expressed genes, allowed identification of candidate genes underlying resistance to ESC. Genome sequencing of multiple individuals identified 8.4 million SNPs, and comparison of domestic and wild catfish allowed identification of selective sweeps. B) Characterization of innate and specific immune responses in catfish, and of the genes underlying them, continued.
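In a BSR-Seq comparison, a significant SNP is one whose allele frequencies differ between the resistant and susceptible bulks. The source does not state which test was applied; Fisher's exact test on allele read counts is one common choice and is assumed in this minimal standard-library sketch. In a genome-wide scan, the resulting p-values would also need correction for multiple testing.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: resistant bulk, susceptible bulk.
    Columns: reference-allele reads, alternate-allele reads.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x reference reads in the first row,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


print(fisher_exact_two_sided(3, 1, 1, 3))  # ≈ 0.4857
```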
Oyster
A) The transcriptomes of wild juvenile oysters from high- and low-salinity regimes were sequenced and compared to identify candidate genes for osmoregulation. High nucleotide sequence divergence between the Eastern Oyster and the Pacific Oyster limits the extent to which the C. gigas genome can serve as a reference genome for C. virginica; however, purifying selection on protein sequences in this genus allowed accurate functional annotation of C. virginica predicted protein sequences. B) The transcriptomes of juvenile Eastern Oysters from families resistant and susceptible to ROD (Roseovarius Oyster Disease) were sequenced and compared to characterize the responses of different families to the disease and to provide insight into mechanisms of disease resistance. Transcripts involved in immune recognition, signaling, protease inhibition, detoxification, and apoptosis were differentially expressed between the two families. C) Two Illumina GoldenGate genotyping arrays, each containing 384 SNP markers, were designed for Crassostrea gigas and Ostrea edulis, respectively, and used to genotype 1,000 individuals from wild and selected populations as well as families bred for commercially important traits. The overall success rate was 60%. These arrays provide adequate power for parentage assignment. D) Transcriptomes of Pacific Oysters from three wild populations were sequenced and aligned to the Pacific Oyster genome. In total, 5.8 × 10⁵ SNPs were identified, and non-synonymous SNPs were enriched in genes involved in apoptosis and responses to biological stimuli. HRM genotyping assays have been developed for approximately 1,300 SNP markers.
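Parentage assignment with SNP panels is often done by exclusion. A candidate sire-dam pair is rejected at any locus where the offspring genotype cannot be formed from one allele of each parent, and the pair with the fewest exclusions is retained. The source does not describe the assignment method used with these arrays, so the following is a minimal sketch under assumptions: biallelic SNPs coded as alternate-allele counts (0, 1, 2), missing calls as None, and hypothetical animal IDs. Real pipelines also tolerate a few mismatches caused by genotyping error.

```python
from itertools import product


def alleles(g):
    """Alleles carried by a genotype coded as alternate-allele count (0, 1 or 2)."""
    return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[g]


def trio_compatible(offspring, sire, dam):
    """True if the offspring genotype can arise from one allele of each parent."""
    return any(a + b == offspring for a, b in product(alleles(sire), alleles(dam)))


def count_exclusions(offspring, sire, dam):
    """Number of loci excluding the candidate pair; loci with missing calls are skipped."""
    return sum(
        1
        for o, s, d in zip(offspring, sire, dam)
        if None not in (o, s, d) and not trio_compatible(o, s, d)
    )


def best_parents(offspring, candidates):
    """Return (sire_id, dam_id, exclusions) for the pair with the fewest exclusions.

    candidates maps (sire_id, dam_id) -> (sire_genotypes, dam_genotypes).
    """
    return min(
        ((s_id, d_id, count_exclusions(offspring, s_g, d_g))
         for (s_id, d_id), (s_g, d_g) in candidates.items()),
        key=lambda t: t[2],
    )


kid = [0, 1, 2, 1]
pairs = {("A", "B"): ([0, 1, 2, 2], [0, 0, 1, 0]),
         ("C", "B"): ([2, 2, 0, 0], [0, 0, 1, 0])}
print(best_parents(kid, pairs))  # ('A', 'B', 0)
```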
Salmonids
A) The rainbow trout 57K SNP array is now commercially available from Affymetrix in two formats: for samples in 96-well plates and for samples in 384-well plates (more economical). The minimum order is 192 samples for the 96 format and 1,920 samples for the 384 format. More ordering information and a data sheet with technical information are available on the Affymetrix web site: http://www.affymetrix.com/estore/catalog/prod900010/AFFY/Axiom%26%23174%3B+Trout+Genotyping+Array#1_1. B) GWAS were conducted for disease resistance in rainbow trout (Campbell et al. 2014, E-published), and natural populations of steelhead were tested for genomic associations with variable environments and landscapes (Matala et al. 2014). C) SNP markers were used to identify specific stocks of Chinook salmon and to identify run timing, straying, and delayed mortality in natural populations (Hess et al. 2014; Rechisky et al. 2014). D) A total of 76 differentially expressed miRNAs, including 10 miRNAs novel to rainbow trout, were identified in skeletal muscle under the influence of estrogen. The known miRNAs include important myogenic miRNAs, such as miR-1, miR-133a, miR-126, miR-145, miR-499 and miR-206. E) A stringent set of 9,674 large intergenic noncoding RNAs (lincRNAs) was identified by RNA-Seq analysis of the rainbow trout transcriptome. These lincRNAs are in general less conserved than protein-coding genes and are typically co-expressed with their neighboring genes. Many of them are tissue-specific and functionally associated with important biological processes.
Shrimp
Several RNA-seq projects have been completed in this species, including a refined transcriptome annotation published by the Texas A&M/CIAD team in late 2014 (Scientific Reports 4:7081, 2014).
Striped bass
Artificial neural networks and supervised machine learning were employed to further evaluate relationships between ovary transcriptome profiles and egg quality (fertility) in striped bass. Expression levels of as few as 250-1,000 ovary genes proved to be a robust predictor of egg quality (R² always > 0.80) in separate studies involving analyses of gene expression by microarray or RNA-Seq in different groups of domesticated and wild striped bass. Egg transcriptome profiles were nearly as informative as ovary profiles from the same females, implicating maternal transcripts deposited in eggs in the control of egg quality.
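The R² threshold quoted above measures how much of the variation in observed egg quality the model's predictions explain. The source does not say exactly how R² was computed for the neural-network models. This minimal sketch uses the usual coefficient of determination, 1 − SS_res/SS_tot, applied to hypothetical fertility values in percent.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)             # total variation
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))  # unexplained part
    return 1 - ss_res / ss_tot


# Hypothetical observed vs. predicted fertility (%) for four females.
print(r_squared([10, 20, 30, 40], [12, 18, 33, 37]))  # ≈ 0.948
```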
Objective 3: Oyster
A proposal to hold resource coordination workshops focused on oysters and other shellfish was funded through the NRSP8 Aquaculture Program (PI: Steven Roberts).
Salmonids
A bioinformatics pipeline was developed for genotyping SNPs from raw sequence data for the GT-seq method (Campbell et al. 2014, E-published).
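At its core, genotyping from amplicon reads turns per-locus allele read counts into a genotype call. This minimal sketch uses allele-ratio thresholds: a ratio of 10 or more, or 0.1 or less, gives a homozygous call, 0.2 to 5 gives a heterozygous call, and anything else is left uncalled. These cut-offs and the minimum read depth are assumptions for illustration, not parameters taken from the published GT-seq pipeline.

```python
def call_genotype(allele1_reads, allele2_reads, min_depth=10,
                  hom_ratio=10.0, het_low=0.2, het_high=5.0):
    """Call a biallelic SNP genotype from allele read counts.

    Returns 'A1A1', 'A2A2', 'A1A2', or None (no call).
    All thresholds are illustrative assumptions.
    """
    if allele1_reads + allele2_reads < min_depth:
        return None  # too few reads to make a call
    if allele2_reads == 0:
        return "A1A1"
    ratio = allele1_reads / allele2_reads
    if ratio >= hom_ratio:
        return "A1A1"
    if ratio <= 1 / hom_ratio:
        return "A2A2"
    if het_low <= ratio <= het_high:
        return "A1A2"
    return None  # ambiguous: between the homozygous and heterozygous bands


print(call_genotype(30, 28))  # A1A2
```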
Shrimp
The shrimp community is slowly making data more accessible. Acacia Alcivar-Warren’s Environmental Genomics group is pursuing international shrimp projects in epigenomics and environmental toxicology and had set up a One Health Genomics website, which was unfortunately hacked. Most groups are still sharing data via small institutional repositories (e.g. http://repository.tamu.edu/handle/1969.1/152151).
Cattle Technical Report
Objective 1:
Bovine genome sequence: An important focus of the community has been improving the bovine genome assembly. In 2014, NIFA funding brought only limited success in expanding these efforts to the desired level. A group of collaborating scientists is working toward improving the bovine reference genome assembly and its annotation. Multiple data types have been generated, such as an optical map, Illumina paired-end and mate-pair sequence, PacBio sequence, and improved gene predictions based on RNA-seq data. All of the data were derived from tissue samples from L1 Dominette 01449, the reference animal. The goal of the group has been to generate improved reference genome assemblies with fewer gaps, misassemblies, and missing genes.
Efforts currently underway and supported by NIFA grants: 1) David Schwartz, Shigou Zhou (University of Wisconsin, Madison) and collaborators have generated a whole genome optical map (BtOM1.0) of Dominette 01449. BtOM1.0 is a high-resolution physical map that has been used to compare the structure of both bovine assemblies, UMD3.1 and Btau4.6, revealing that Btau4.6 has twice as many discordances as UMD3.1. BtOM1.0, when used as an independent guide, will greatly advance future improvements of existing sequence builds and will also serve as an accurate physical scaffold for comparative genomic studies. 2) Kim Worley (Baylor College of Medicine, Houston, TX) is using long PacBio reads to improve the assembly and has generated 20x PacBio data for both Dominette 01449 and a Texel sheep. She will use PBJelly software (developed in-house at Baylor) to fill gaps and improve scaffolds, and is working with David Schwartz to put the optical map data in a form that will allow identification of the regions of Btau4.6 and UMD3.1 that are consistent or inconsistent with the optical map. 3) Chris Elsik (University of Missouri, Columbia) is working on improving gene annotation using RNA-seq data from strand-specific cDNA libraries, new software, and visualization applications with community annotation. 4) Huaijun Zhou and collaborators (University of California, Davis) are following the blueprint of the human and mouse ENCODE projects for identifying the functional roles of regulatory elements in the genome; this group has implemented a similar effort in cattle, pig and chicken, initiating the AGENCODE project. The goals are to identify promoter-, enhancer-, and silencer-specific chromatin marks and to determine the functional roles of regulatory regions in relevant tissues in each species.
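Comparing a sequence assembly with an optical map generally begins by predicting, from the assembly, the restriction fragments that the mapping enzyme would produce. Those fragment sizes are then aligned to the optical map, and regions where the patterns disagree are flagged as discordances. This minimal sketch covers only the in silico digestion step. The SwaI recognition site used in the example (ATTT^AAAT) is an assumption, since the source does not name the enzyme used for BtOM1.0. Because that site is palindromic, searching one strand is sufficient.

```python
def in_silico_digest(seq, site, cut_offset):
    """Return restriction fragment lengths for one linear sequence.

    site: recognition sequence; cut_offset: cut position within the site.
    Every occurrence of the site, including overlapping ones, is cut.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Assumed enzyme: SwaI cuts ATTT^AAAT (offset 4). Toy 23 bp sequence.
toy = "GG" + "ATTTAAAT" + "CCCC" + "ATTTAAAT" + "T"
print(in_silico_digest(toy, "ATTTAAAT", 4))  # [6, 12, 5]
```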
Efforts currently underway, not supported by NIFA grants: Tim Smith (Meat Animal Research Center, Nebraska) and Juan Medrano (University of California, Davis): Genomic DNA from Dominette 01449 is being used to produce PacBio libraries to generate >50x coverage of the genome for an independent assembly, which will be merged with the current assembly, UMD3.1, to leverage the advantages of both. Funding for this project comes from $55k of NRSP8 Cattle coordinator funds, $15k from USMARC and $10k from Zoetis. Part of the funds will also be used to generate full-length PacBio IsoSeq cDNA sequences from Dominette tissues, supplementing the fat, muscle and lung tissue sequencing already performed at USMARC. Data from these efforts will be made publicly available because they are being generated with public funding.
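Fold-coverage targets such as the ">50x" goal above equal the total sequenced bases divided by genome size. This minimal sketch estimates the read yield needed for a given target. The cattle genome size of about 2.7 Gb is an approximate assumption and is not taken from this report.

```python
def bases_needed(genome_size_bp, target_coverage):
    """Sequence yield (bp) needed for a target mean fold-coverage."""
    return genome_size_bp * target_coverage


def mean_coverage(total_bases_bp, genome_size_bp):
    """Mean fold-coverage = total sequenced bases / genome size."""
    return total_bases_bp / genome_size_bp


# Assumed cattle genome size of ~2.7 Gb (approximate, for illustration only).
GENOME = 2_700_000_000
print(bases_needed(GENOME, 50) / 1e9)  # 135.0 (Gb of reads for 50x)
```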
Objective 2: n/a
Objective 3:
Harvey Blackburn at the USDA-ARS National Animal Germplasm Program (NAGP), Colorado State University Experimental Station and EMBRAPA have joined efforts to develop a genomic database that will serve as a repository for DNA data from large animal genomics projects whose valuable data need permanent archiving for future research. This effort, coupled with the existing capacity to store phenotypic and production system data in the Animal-GRIN database as well as germplasm/tissue samples, will facilitate the community’s efforts to maintain valuable data for future use. Database and bioinformatics activities are also coordinated by Jim Reecy (NRSP8 Bioinformatics Coordinator) at the NAGRP site (http://www.genome.iastate.edu/cattle/). Coordination funds supported travel awards for students, post-docs and workshop speakers for PAG-XXII in January 2014, and will do the same for PAG-XXIII in January 2015. Future priorities are to support efforts toward improving the bovine genome reference sequence assembly in 2015, and to support data sharing and the creation of sample and data repositories that will benefit other cattle research investigators. We will expand our efforts to include international collaborators and the cattle industry, and expect to keep the Cattle Genome Community informed of developments and activities of the Cattle Genome Coordinators through a periodic newsletter. If you have any informational items you would like distributed via this newsletter, please contact Alison Van Eenennaam (alvaneenennaam@ucdavis.edu) or either of the two other co-coordinators. Constructive suggestions from researchers on areas to support in bovine genomics are also welcome.
Swine Technical Report
Objective 1:
Porcine SNP chips update: Illumina and the International Porcine SNP Chip Consortium developed a porcine 60K BeadChip that has been used worldwide in numerous genome-wide association studies (GWAS). GeneSeek, a supplier of genotyping services, offers a low-density chip, the GeneSeek Genomic Profiler for Porcine LD (GGP-Porcine LD), which uses Illumina Infinium chemistry and features approximately 8,500 SNPs for imputation to high-density genotypes. GeneSeek also released a new chip in 2014, the GGP-Porcine HD, which features nearly 70,000 SNPs spanning the pig genome, as well as several markers that directly affect disease and performance traits. Details on these chips can be obtained from GeneSeek (geneseekinfo@neogen.com). A new high-density SNP chip is being developed by Affymetrix and will be announced in 2015.
Objective 2:
Shared Materials and Funding: NRSP8 funds are available to support community activities to find associations with many different traits. In 2014, the Advisory Committee developed and approved a policy stating that, to be eligible for NRSP8 Coordination support, a swine genomics project must materially involve two or more NRSP8 member groups (university or ARS research locations), and that substantial funding will only be provided for projects with matching funding from another agency. In FY 2014, one project was approved to work toward a genetic analysis of PEDV resistance. For any questions on this policy, please contact the Coordinators.
Objective 3:
International Efforts: Communication with all international groups and individuals is excellent. The Swine Genome coordinators have been working with a large number of individuals in many countries to develop a new initiative called Functional Annotation of Animal Genomes (FAANG). This group proposes a project to identify all functional elements in animal genomes and has presented its plans on a website organized by the Swine Coordination effort (see www.faang.org).
Policy Updates: We have established an Advisory Committee, which will provide guidance on policy and help evaluate requests for funding. The members of this Advisory Committee represent the swine industry, swine genomics and biotechnology researchers, NRSP-8 Stations and participating USDA labs. The members are: Jack Dekkers (ISU), Chris Hostetler (National Pork Board), Joan Lunney (USDA-BARC), Randy Prather (U. Missouri), and Juan P. Steibel (MSU). Thanks to this group for volunteering for this important role!
Communication: The Pig Genome Update (PGU) has now published 120 issues and has been distributed electronically to over 2,300 people worldwide. The PGU will be published electronically three times a year; in addition to carrying general updates, the issues will be timed to coincide with major events of interest to the genome community in February, June, and October.
Travel and Meeting Support: Travel of some scientists to important pig genomics meetings was partially funded. These included Chris Eisley (2014 Neal Jorgenson Travel Award) and Joan Lunney (2014 Distinguished Lecturer, NRSP-8 Workshop). 2015 commitments include Melanie Trenhaile (2015 Neal Jorgenson Travel Award winner), Elisabetta Giuffra (2015 NRSP-8 special speaker on FAANG), and Huaijun Zhou (Midwest ASAS Functional Genomics Workshop).
2014 Research Support Activities: The goal is to help support all of the objectives of this project. Major activities included helping to facilitate the collection of phenotypes and the future shared use of SNP chips. New bioinformatic tools relevant to the swine genomics community will also be developed with the help of the bioinformatics team. Constructive suggestions from researchers to help this coordination and facilitation program grow and succeed are appreciated.
Projects approved for funding during period:
1. FAANG project led by Huaijun Zhou, University of California-Davis. This project also had funding promised by the NRSP8 Bovine and Poultry Coordinators, as well as funding by the National Pork Board.
2. PEDV genetic resistance project led by Max Rothschild, with collaborators Daniel Ciobanu and Canadian swine genetics companies.
Poultry Technical Report
Objective 1:
Reference linkage map. Linkage mapping is now done primarily via high-throughput SNP assays. Very high-density SNP panels (ca. 600,000 SNPs) have been developed and are being employed in genome-wide association studies (GWAS) and genomic selection (GS). Efforts will begin soon to help resolve unmapped sequence contigs through genetic mapping of selected SNPs on the East Lansing reference panel.
Chicken genome sequence. Efforts to improve the chicken genome sequence are ongoing and are being led by Wes Warren, The Genome Institute at Washington U. The latest build, Galgal5.0, will incorporate 30x PacBio coverage from a 10 kb library (funded by Cobb-Vantress), which has improved the contig N50 from 250 kb to 1.79 Mb and reduced the number of unplaced scaffold gaps by almost half (from 1,722 to 965). Unfortunately, even with these efforts and the use of 6x Moleculo sequence, the new build has still not captured the roughly 5% of the sequence that is missing (believed to lie predominantly on the microchromosomes). Future planned efforts include PacBio sequencing of a 20 kb library and, as discussed above, integration with an improved genetic map.
Objective 2:
DNA from the East Lansing international reference mapping population has been sent to many laboratories throughout the world. Similarly, DNA from the junglefowl used to generate the reference sequence assembly has been widely distributed, especially for copy number variant studies.
Objective 3:
Database activities are led by the NRSP-8 Bioinformatics Coordinator, Jim Reecy; Susan Lamont and Shane Burgess represent poultry interests on the advisory committee for this group. Poultry bioinformatics has also benefited from support at several other locations. We maintain a homepage for the NRSP-8 U.S. Poultry Genome project (http://poultry.mph.msu.edu) that provides a variety of genome mapping resources, including our newsletter archive. The Poultry Genome Newsletter is published quarterly and is distributed through our homepage and on the angenmap email discussion group.
Meetings: Over 3,000 scientists attended the Plant and Animal Genome XXII meeting last January, held jointly with the annual NAGRP meeting. Coordination funds helped support attendance at PAG-XXII: travel support for John Hsieh, Iowa State U. graduate student (Lamont, PI); Melissa Monson, recipient of the Neal Jorgenson Genome Travel Award, U. Minnesota graduate student (Reed, PI); Dr. Michael Romanov, U. Kent, UK; and Dr. Rachel Hawken, Cobb-Vantress, NRSP-8 workshop speaker.
Impact: This project is generating tools through which the genome sequence can be used to locate inherited production trait alleles and to ascertain the physiological basis for those traits. Among other things, it has resulted in the complete sequence of the chicken genome and now the turkey genome. Commercial breeders are using the sequence and the SNPs we generated to characterize and improve production lines using GS. In simpler terms, we are moving closer to understanding the causes of phenotypic variation relevant to the agricultural use of poultry.
Equine Technical Report
Objective 1:
New Reference Genome Assembly: Ted Kalbfleisch announced that the Morris Animal Foundation had selected for funding a proposal by Ted, Jamie MacLeod and Ludovic Orlando to create a new assembly of the reference sequence, the putative Ecab 3.0. Partial support for a postdoctoral fellow will come from USDA-NRSP8 coordinators’ funds. The grant proposal and work are underpinned by data provided by workshop participants, including whole genome sequence information from TWILIGHT (the reference horse) and from horses of other breeds.
Whole Genome Sequences: In connection with research projects, many of which are cited in the reference section, over 200 horses have had their whole genomes sequenced. Many of those sequences are being used for the new assembly described in the previous paragraph and were used to identify SNPs for construction of the 670K SNP assay tool described below.
Access to reference DNA: The Cornell laboratory (Doug Antczak and Don Miller) has continued to provide other scientists with samples from TWILIGHT, the horse that provided DNA for the reference sequence, and from BRAVO, the horse that provided DNA for the CHORI 241 BAC library.
Objective 2:
New SNP assay tool: The 670K SNP chip is now available for research use in horses. This initiative was proposed and driven by Dr. Molly McCue of the University of Minnesota, with support from students, co-workers and funding from several agencies, including the USDA-NRSP8 coordinators’ fund. Bob Schaefer (UMN) gave a presentation describing the considerations in designing the tool. GeneSeek (NE), a commercial laboratory offering testing, has agreed to coordinate testing among laboratories to help reduce costs. Workshop scientists contributed whole genome sequence data from more than 200 horses to discover SNPs for use on this assay tool.
Objective 3:
A consortium was established to annotate functional elements in the genome responsible for regulating phenotypic traits for all animal species. The group is called Functional Annotation of Animal Genomes (FAANG) and is patterned after the ENCODE program that has been successful for studying functional genomics in humans. Dr. Jamie MacLeod (University of Kentucky) has been invited to serve on the guiding committee to represent the interests of horse genomics. Dr. MacLeod has invited participation in a subgroup focusing on horses, called E-FAANG, for Equine – FAANG.
Database Activities: Two databases compile published genetic data for horses: http://locus.jouy.inra.fr/cgi-bin/lgbc/mapping/common/intro2.pl?BASE=horse; http://www.thearkdb.org/. Several genome browsers have been developed, at the University of California, Santa Cruz, Ensembl and NCBI: http://www.genome.ucsc.edu/cgi-bin/hgGateway?hgsid=95987985&clade=vertebrate&org=Horse&db=0; http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9796; http://www.equinegenome.org/Equinegenome.org.html; http://pre.ensembl.org/Equus_caballus/index.html. A SNP database is available: http://www.broad.mit.edu/mammals/horse/.
An RNA-seq database is available: http://macleod.uky.edu/equinebrowser/. A major entry point for databases and other relevant information about the horse genome workshop and its participants is the workshop website: http://www.uky.edu/AG/Horsemap.
International Efforts: The horse genome technical committee is an international activity, with approximately half of the participants coming from Europe, Africa and Australasia and the other half from North America.
Communication: Communication within the horse genome workshop is facilitated by an email list that the Horse Genome Coordinator uses to share information, and through the website: http://www.uky.edu/AG/Horsemap. A major aim for the website is to increase its value for informing members of the horse industry about the scientists using horse genomics to solve important problems, and for explaining the value of horse genomics.
Travel and Meeting Support: During 2014, travel awards were provided to 10 students, including one Jorgenson award, and travel support for two invited speakers to the Horse Genome Workshop and to the NRSP8 general meeting.
Future Activities: During 2015 a workshop on Horse Genomics will be conducted under the auspices of the Dorothy Russell Havemyer Foundation in conjunction with the USDA-NRSP8. The workshop will include discussions of applications of the horse genome tools to address issues of performance and health in horses. In addition, one session will be devoted to discussion of FAANG and activities to promote this program. Coordinator funding will be used for partial support of a postdoctoral fellow to work on the Morris Animal Foundation funded project to create a new assembly for the horse genome (Ecab 3.0).
Sheep/Goat Technical Report
Objective 1:
An ongoing project of the ISGC is the development of a whole genome reference assembly. In 2010, sequence data were generated at two sequencing facilities (Beijing Genomics Institute and the Roslin Institute) from DNA of a Texel ewe and a Texel ram, respectively. A paper published in Science in June 2014 describes the whole genome assembly (Oar v3.1), the RH map, and the linkage map. The paper highlights differences between the genome structures of sheep, cattle and goats, and it also includes the analysis of about a terabyte of transcriptome data. Allelic variation, allelic imbalance and copy number variation are also covered as points of interest. Biological stories include digestive tract enzymes, evolution of the rumen, lipid metabolism and evolution of wool.
Kim Worley (BCM-HGSC) received funding from a 2013 USDA/AFRI grant to fill gaps and improve the sheep assembly using PacBio sequence data with PBJelly (scaffolding and gap filling). Around XX of whole genome shotgun sequence has been generated with PacBio technology from the Texel ram used in the sheep assembly. The reads were long (averaging up to 10 kb) and therefore useful for spanning gaps in a draft genome. For sequences mapped to specific sheep chromosomes, the PBJelly method closed 70% of the gaps, reducing the number of contigs from 117,293 to 35,267. The assembly is more contiguous: the contig N50 increased from 41.7 kb to 165.2 kb, and almost one-quarter of the contigs are now longer than 100 kb (8,527, up from 2,355). The PacBio data appear to improve GC representation, increasing the G+C content slightly (by 0.1% of the contig bases). The fraction of ambiguous bases (Ns) in the scaffolds decreased from 3.12% to 0.87% of the genome. The PacBio work will be incorporated into Oar v4.
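The gap-filling figures above are internally consistent. When a gap is closed, its two neighboring contigs merge into one, so the drop in contig count approximates the number of gaps closed. This quick check assumes that, for chromosome-mapped sequence, the number of gaps before gap filling is close to the number of contigs, because contigs far outnumber chromosomes. The function name is hypothetical.

```python
def gap_closure_summary(contigs_before, contigs_after, contigs_over_100kb):
    """Summarize gap filling, assuming each closed gap merges two contigs into one."""
    gaps_closed = contigs_before - contigs_after
    return {
        "gaps_closed": gaps_closed,
        # Gaps before filling ~= contigs before (few chromosomes relative to contigs).
        "pct_closed": round(100 * gaps_closed / contigs_before, 1),
        "pct_over_100kb": round(100 * contigs_over_100kb / contigs_after, 1),
    }


print(gap_closure_summary(117_293, 35_267, 8_527))
# {'gaps_closed': 82026, 'pct_closed': 69.9, 'pct_over_100kb': 24.2}
```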
Objective 2
A USDA/AFRI project, funded in 2013, is focused on a resequencing database that will include extensive annotation of variants. The overall objective of this proposal is to build a comprehensive database of genomic variation for sheep, based on whole genome analysis, and to make the database available to the research community. This resource, referred to as SheepGenomesDB, will speed discovery and innovation for scientists working in the area of livestock genomics.
Four elements essential to the design, construction and delivery of the SheepGenomes DB have been completed to date. The first element is the finalized design of the database and how it interacts with external public data archives. Importantly, the workflow integrates NCBI, dbSNP and the European Variation Archive (EVA), which delivers key advantages concerning public access of genome information, data storage and variant accessioning. Secondly, a standardized bioinformatic pipeline has been completed for raw sequence read filtering and mapping to create reference guided assemblies of each individual. This is essential, as a standardized pipeline ensures that the variants detected within every animal in SheepGenomesDB can be confidently compared against each other. Thirdly, a mission statement was distributed to the community, resulting in agreements to submit over 400 sheep genomes into the analysis by early 2015 (see table below). Another 1000+ genomes in Run 2 are expected by late 2015. Finally, a web portal has been created to house project information and to support user downloads of SNP and indel information.
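The report does not name the tools or thresholds in the SheepGenomesDB pipeline. As a minimal sketch of what "raw sequence read filtering" with fixed, shared parameters can look like, the Python below drops reads that are too short or whose mean base quality falls below a cutoff. It assumes Phred+33 (Sanger/Illumina 1.8+) quality encoding, and the thresholds (mean Q20, 50 bases) are hypothetical values chosen for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from consecutive 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (l.rstrip("\n") for l in lines[i:i + 4])
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming ASCII offset 33 (Sanger/Illumina 1.8+)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records: Iterable[Record],
                 min_mean_q: float = 20.0,  # hypothetical cutoff
                 min_len: int = 50) -> List[Record]:  # hypothetical cutoff
    """Keep reads passing both thresholds. Using the same values for every
    animal is what makes downstream variant calls comparable."""
    return [r for r in records
            if len(r[1]) >= min_len and mean_phred(r[2]) >= min_mean_q]
```

The design point is less the specific cutoff than that every submitted genome passes through identical filters before mapping.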
Objective 3: n/a
Bioinformatics Technical Report
Objective 1: n/a
Objective 2:
Over the past year, in partnership with researchers at Kansas State University, Michigan State University, Iowa State University, and the U.S. Department of Agriculture, we continued to develop and improve web-interfaced relational databases that store and disseminate phenotypic and genotypic information from large genomic studies in farm animals, to better serve the needs of researchers. For example, we are working with the PRRS CAP Host Genome consortium to develop a relational database to house individual animal genotype and phenotype data (http://www.animalgenome.org/lunney). This will help the consortium, whose individual research labs lack expertise with relational databases, share information among consortium members and thereby facilitate data analysis.
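To make concrete what "a relational database to house individual animal genotype and phenotype data" involves, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and the example query are hypothetical and are not the schema of the PRRS CAP database; they only show how keying both genotypes and phenotypes on an animal identifier lets consortium members join the two.

```python
import sqlite3

# Hypothetical schema: one row per animal, per (animal, trait), per (animal, SNP).
SCHEMA = """
CREATE TABLE animal (
    animal_id TEXT PRIMARY KEY,
    breed     TEXT
);
CREATE TABLE phenotype (
    animal_id TEXT REFERENCES animal(animal_id),
    trait     TEXT NOT NULL,
    value     REAL,
    PRIMARY KEY (animal_id, trait)
);
CREATE TABLE genotype (
    animal_id TEXT REFERENCES animal(animal_id),
    snp_id    TEXT NOT NULL,
    genotype  TEXT,  -- e.g. 'AA', 'AB', 'BB'
    PRIMARY KEY (animal_id, snp_id)
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def mean_trait_by_genotype(conn: sqlite3.Connection, snp_id: str, trait: str):
    """Average phenotype and animal count per genotype class at one SNP."""
    return conn.execute(
        """
        SELECT g.genotype, AVG(p.value), COUNT(*)
        FROM genotype g JOIN phenotype p ON g.animal_id = p.animal_id
        WHERE g.snp_id = ? AND p.trait = ?
        GROUP BY g.genotype
        ORDER BY g.genotype
        """,
        (snp_id, trait),
    ).fetchall()
```

A production system would use a server database and access controls, but the join pattern between genotype and phenotype tables is the same.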
PLANS FOR THE FUTURE: Facilitate the development and sharing of animal populations and the collection and analysis of new, unique, and interesting phenotypes. We will seek to partner with any NRSP-8 members wishing to warehouse phenotypic and genotypic data in customized relational databases. This will help consortia/researchers whose individual research labs lack expertise with relational databases to warehouse and share information.
Objective 3:
The following describes the project's activities over the past year. The NAGRP data repository has been actively used by the horse community to share Variant Call Format (VCF) files for collaborative research.
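As a brief illustration of the kind of file being shared, the Python sketch below reads the fixed columns of VCF data lines (CHROM, POS, ID, REF, ALT) and tallies alternate alleles as SNVs or indels. It is a simplified example and not a tool provided by the repository; it ignores genotype columns, and symbolic or missing alleles are simply lumped into an "other" class.

```python
from collections import Counter


def parse_vcf_records(text):
    """Yield (chrom, pos, ref, alts) from the data lines of a VCF file."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information (##) and header (#CHROM) lines
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        yield chrom, int(pos), ref, alt.split(",")  # ALT may list several alleles


def classify(ref, alt):
    """Label a REF/ALT pair; symbolic (<DEL>), missing (.) and * alleles -> 'other'."""
    if alt.startswith("<") or alt in (".", "*"):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def summarize(text):
    """Count alternate alleles by class across all records."""
    counts = Counter()
    for _chrom, _pos, ref, alts in parse_vcf_records(text):
        for alt in alts:
            counts[classify(ref, alt)] += 1
    return counts
```

For real work, an established parser that also validates the header and genotype fields is preferable; this sketch only shows the file's basic layout.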
Multi-species support: The Animal QTLdb and the NAGRP data repository have been actively serving research on multiple species. A state-of-the-art online genome browser (JBrowse) has been set up on the AnimalGenome.ORG server to serve the cattle, chicken, pig, sheep, and horse communities by aligning QTL/association data with annotated genes and other genome features (http://i.animalgenome.org/jbrowse). A key advantage of JBrowse is that users can easily load their own data (quantitative XYPlot/density tracks, BAM files, or VCF files) directly into their web browser for comparison in their local environment. New data sources and species continue to be added. JBrowse complements GBrowse, which displays alignments of multiple HD SNP chips, OMIA genes, and STS markers against QTL/association data for cattle, chicken, pig, sheep, and horse. Recently, a dedicated virtual machine platform was set up to develop cyber resources for the Striped Bass Genome Database, a project led by Benjamin Reading and Charles Opperman at North Carolina State University (http://stripedbass.animalgenome.org/).
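The core operation behind viewing QTL tracks alongside gene annotations is an interval-overlap test on shared chromosome coordinates. The Python sketch below finds, for each QTL, the genes whose spans overlap it. It is not how JBrowse or GBrowse are implemented internally; it assumes 1-based, inclusive coordinates (the GFF convention), and the feature names and positions in any example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int  # 1-based, inclusive (GFF-style)
    end: int    # inclusive


def overlaps(a: Feature, b: Feature) -> bool:
    """True if two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_in_qtl(qtls: List[Feature], genes: List[Feature]) -> Dict[str, List[str]]:
    """Map each QTL name to the names of genes whose span overlaps it."""
    return {q.name: [g.name for g in genes if overlaps(q, g)] for q in qtls}
```

This linear scan is fine for a handful of QTL; genome browsers index features (for example by sorted position or bins) so that such lookups stay fast on whole-genome tracks.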
Ontology development: This past year we continued to focus on integrating the Animal Trait Ontology (ATO) into the Vertebrate Trait Ontology (http://bioportal.bioontology.org/ontologies/VT). We have continued working with the Rat Genome Database to integrate ATO terms that are not applicable to the Vertebrate Trait Ontology into the Clinical Measurement Ontology (http://bioportal.bioontology.org/ontologies/CMO). Traits specific to livestock products continue to be incorporated into a Livestock Product Trait Ontology (PT; http://animalgenome.org/cgi-bin/amido/browse.cgi). We have also continued mapping the cattle, pig, chicken, sheep, and horse QTL traits to the Vertebrate Trait Ontology (VT), Product Trait Ontology (PT), and Clinical Measurement Ontology (CMO) to help standardize the trait nomenclature used in the QTLdb. A new web page has been set up to reflect this development (http://www.animalgenome.org/bioinfo/projects/ato/alt), with links to the three new sites for VT, PT, and CMO. At the request of community members, at least 45 new terms were added to the VT in 2014. Anyone interested in helping to improve the ATO/VT is encouraged to contact James Reecy (jreecy@iastate.edu), Cari Park (caripark@iastate.edu), or Zhiliang Hu (zhu@iastate.edu). The new VT/PT/CMO cross-mapping is now used by the Animal QTLdb and VCMap tools. Annotation to the VT is also available for rat QTL data in the Rat Genome Database and for mouse strain measurements in the Mouse Phenome Database. Finally, we plan to expand the livestock breed ontology with updated data from Oklahoma State University, the Food and Agriculture Organization, and China.
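As a simplified picture of what mapping published QTL trait names onto a controlled vocabulary accomplishes, the Python sketch below normalizes free-text trait names and looks them up in a synonym table that assigns each to a standard term and its ontology (VT, PT, or CMO). The synonym table and the ontology assignments are hypothetical placeholders, not actual VT/PT/CMO terms or identifiers, and real curation relies on curator judgment rather than a simple lookup.

```python
# Hypothetical synonym table: normalized free-text trait name ->
# (standard trait label, ontology). Entries are illustrative only.
SYNONYMS = {
    "adg": ("average daily gain", "VT"),
    "average daily gain": ("average daily gain", "VT"),
    "daily weight gain": ("average daily gain", "VT"),
    "marbling score": ("marbling score", "PT"),
    "serum cholesterol": ("serum cholesterol level", "CMO"),
}


def normalize(name):
    """Lower-case and collapse whitespace so spelling variants match."""
    return " ".join(name.lower().split())


def standardize(traits):
    """Split trait names into mapped (name -> (term, ontology)) and unmapped."""
    mapped, unmapped = {}, []
    for trait in traits:
        hit = SYNONYMS.get(normalize(trait))
        if hit is None:
            unmapped.append(trait)  # would go to a curator for a new term
        else:
            mapped[trait] = hit
    return mapped, unmapped
```

The unmapped list mirrors the community requests that led to new VT terms: anything not yet covered becomes a candidate for curation.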
Continuing work on the chicken anatomy ontology is supported by UA biocurator funds and focuses on (1) linking adult chicken anatomy terms with the Uberon ontology (of generic anatomical terms) and (2) adding developmental terms provided by Prof. Burt's group at the Roslin Institute. The chicken anatomy ontology currently contains 14,627 terms, cross-referenced with the Uberon ontology (and other related anatomy ontologies). Because this ontology will be required for the Functional Annotation of Animal Genomes (FAANG) project, during 2015 we will seek competitive funding for a full-time biocurator to complete it.
Software development: The NRSP-8 Bioinformatics Online Tool Box has been actively updated (http://www.animalgenome.org/bioinfo/tools/). Software upgrades were made continually to SNPlotz, the Gene Ontology CateGOrizer, and the Expeditor. CateGOrizer is now linked to a new external tool, ReviGO, so that users can take CateGOrizer output directly to ReviGO to obtain a semantically representative subset of GO terms.
In collaboration with Dr. Shengsong Xie and Yuhua Fu from Shanghai, China, an sRNAPrimer design tool has been made available through AnimalGenome.ORG (http://www.animalgenome.org/cgi-bin/host/sRNAPrimer/d).
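CateGOrizer summarizes a list of GO terms by counting how many fall under each of a chosen set of higher-level categories. A minimal Python sketch of that idea walks each term's is_a parents upward and counts the chosen categories it reaches. The toy graph and term names below are placeholders rather than real GO identifiers, and counting a term under every category it reaches is only one possible convention; CateGOrizer's own counting options may differ.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Toy is_a graph: child term -> parent terms (placeholder names, not GO IDs).
PARENTS: Dict[str, List[str]] = {
    "lipid_catabolism": ["lipid_metabolism", "catabolism"],
    "lipid_metabolism": ["metabolism"],
    "catabolism": ["metabolism"],
    "t_cell_activation": ["immune_response"],
    "immune_response": [],
    "metabolism": [],
}


def ancestors(term: str) -> Set[str]:
    """All terms reachable upward from `term`, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, []))
    return seen


def categorize(terms: Iterable[str], categories: Set[str]) -> Counter:
    """Count input terms under each chosen category they descend from."""
    counts = Counter()
    for term in terms:
        for cat in ancestors(term) & categories:
            counts[cat] += 1
    return counts
```

Because the GO is a directed acyclic graph rather than a tree, one term can reach several categories, which is why the sketch tracks visited terms instead of following a single parent chain.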
As a result of collaborations among Iowa State University, the Medical College of Wisconsin, and the University of Iowa, the Virtual Comparative Map (http://www.animalgenome.org/VCmap/) tool has passed its initial development stage and is now stable and serving the community. Application development, improvement, and testing have continued. Online help materials have been added, including a written user manual and a video tutorial. AgBase and the NRSP-8 websites provide multiple reciprocal reference links to facilitate resource sharing. Please feel free to try the tool and send any feedback to vcmap@animalgenome.org.
Gene nomenclature standard:
During 2014, the Chicken Gene Nomenclature Committee (CGNC) updated nomenclature to support new annotations from both NCBI and Ensembl. We currently provide standardized nomenclature for 16,422 genes, and these data are now routinely distributed to both NCBI Entrez and Ensembl. During 2014, funding to support chicken gene nomenclature was provided by NIH NIGMS project number 5R24GM079326-02; during 2015 we will seek continued competitive funding for this project.
The initial cattle gene nomenclature is provided by the Bovine Genome Database.
Currently we have standardized gene nomenclature for 9,910 Bos taurus genes based upon homology to assigned human gene nomenclature (http://www.animalgenome.org/genetics_glossaries/bovgene). We are also working with HGNC to support the development and use of standardized gene nomenclature for livestock species.
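The report does not describe how homology is used to transfer human gene symbols to cattle genes. One common, conservative rule is to transfer a symbol only when the ortholog relationship is one-to-one, leaving one-to-many and many-to-many cases for manual curation. The Python sketch below implements that assumed rule; the gene identifiers and symbols in any example are hypothetical.

```python
from collections import defaultdict


def assign_symbols(ortholog_pairs):
    """Given (cattle_gene_id, human_symbol) ortholog pairs, assign the human
    symbol only where the relationship is one-to-one in both directions."""
    by_cattle = defaultdict(set)
    by_human = defaultdict(set)
    for cattle_id, human_symbol in ortholog_pairs:
        by_cattle[cattle_id].add(human_symbol)
        by_human[human_symbol].add(cattle_id)

    assigned = {}
    for cattle_id, symbols in by_cattle.items():
        if len(symbols) == 1:
            (symbol,) = symbols
            if len(by_human[symbol]) == 1:  # no other cattle gene claims it
                assigned[cattle_id] = symbol
    return assigned  # everything else is left for manual review
```

Restricting automatic transfer to one-to-one orthologs trades coverage for reliability, which is one reason the number of genes with standardized names grows gradually.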
Minimal standards development: We have continued to work on the MIQAS project to help define minimal standards for publication of QTL and gene association data (http://miqas.sourceforge.net/). Most recently, we developed documentation describing how these standards are implemented in the Animal QTLdb.
Expanded Animal QTLdb functionality: In 2014, a total of 9,063 new QTL were added to the database. Currently, there are 12,618 curated porcine QTL, 13,415 curated bovine QTL, 4,379 curated chicken QTL, 1,005 curated horse QTL, 791 curated sheep QTL, and 127 curated rainbow trout QTL in the database (http://www.animalgenome.org/QTLdb/). All included livestock QTL data have been ported to the NCBI, Ensembl, and UCSC genome browsers, so users can now fully utilize the browser and data mining tools at NCBI, Ensembl, and UCSC to explore animal QTL/association data. In addition, we have continued to improve existing QTLdb curation and user portal tools and to add new ones. The new additions include a batch data loading tool to speed up curation and a new API tool set to facilitate programmatic access to the database (see our poster #1157 for details).
Facilitating research: The Data Repository, which the aquaculture, cattle, chicken, and pig communities use to share their genome analysis data, has proven very useful (http://www.animalgenome.org/repository). New data are continually being added.
A total of 1,126 data files, covering different animal genomes, supplementary data for publications, and other shared material, have been made available to community users. More than 50 data files were shared or transmitted by collaborators and community groups through the online file-sharing tool.
Our helpdesk is here to assist community members. Throughout 2014, we helped more than 60 research groups and individuals with their research projects and questions. Our involvement ranged from data transfer, data assembly, and data analysis to software applications, code development, and more. Please continue to contact us whenever you need help with bioinformatic issues.
Community support and user services at AnimalGenome.ORG: We have been maintaining and actively updating the NRSP-8 species web pages for each of the six species. We host about two dozen mailing lists and web sites for various research groups in the NAGRP community (http://www.animalgenome.org/community/), including groups such as AnGenMap, "CRI-MAP users", and "Sheep Models". The most recent addition is a new web site for the Functional Annotation of Animal Genomes (FAANG) project, with mailing list, user forum, wiki, and online publishing capabilities to support coordinated international efforts to accelerate genome-to-phenome research. Web hits and data downloads continued to increase in 2014. For example, AnimalGenome.ORG received over 3.7 million web hits from 237,000 individual sites (visitors), which made 970,000 data downloads generating almost 2 TB of internet traffic.
Reaching out: We send periodic updates to over 2,500 users worldwide to inform them of news and updated information that we develop or host at AnimalGenome.ORG.
More than 38 new items were announced to the community in 2014.
PLANS FOR THE FUTURE: Develop, integrate, and implement bioinformatic resources to support the discovery of genetic mechanisms that underlie traits of interest.
We will continue to work with bovine, mouse, rat, and human QTL database curators to develop minimal information for publication standards. We will also work with these same database groups to improve phenotype and measurement ontologies, which will facilitate transfer of QTL information across species. We will continue working with U.S. and European colleagues to develop a Bioinformatics Blueprint, similar to the Animal Genomics Blueprint recently published by USDA-NIFA, to help direct future livestock-oriented bioinformatic/database efforts.