SAES-422 Multistate Research Activity Accomplishments Report

Status: Approved

Basic Information

Participants

See attached

NRSP-8 Business Meeting
Date: January 12, 2014

1. Call to order by Milt Thomas (roughly 5:15 pm), with Stephen White serving as the secretary.

2. Coordinator Reports
Milt Thomas called on each species coordinator to give a report. The allotted time for each report was shortened because the room was needed to prepare for the opening session of PAG. Thus, below is a list of the species reports. The specifics of each report were submitted in the annual reports. Notable highlights of these 7 reports follow the list.
a. Equine – Ernie Bailey
b. Swine – Chris Tuggle and Cathy Ernst
c. Aquaculture – Caird Rexroad (in place of John Liu)
d. Cattle – Juan Medrano, Alison Van Eenennaam, Jerry Taylor
e. Sheep/Goat – Stephen White
f. Poultry – Mary Delany
g. Bioinformatics – James Reecy
Notable highlights: Many of the species groups have added Co-Coordinators or expanded their number. The complete list of coordinators and co-coordinators now includes: Equine – Ernest Bailey, Molly McCue, Samantha Brooks; Swine – Chris Tuggle, Cathy Ernst; Aquaculture – John Liu, Caird Rexroad III; Cattle – Juan Medrano, Jerry Taylor, Alison Van Eenennaam; Sheep/Goat – Noelle Cockett, Stephen White; Poultry – Mary Delany, Hans Cheng; Bioinformatics – James Reecy, Max Rothschild, Susan Lamont, Chris Tuggle, Fiona McCarthy. Multiple genome assembly improvement and high density genotyping projects are under way, and the sheep reference genome assembly paper has been submitted for publication. Multiple species groups also expressed enthusiasm for AgENCODE projects in discussion following the white paper meeting that preceded the weekend meeting.

3. Administrator reports
a. Eric Young: Eric stated that the renewal of NRSP8 had gone extremely well, with no edits or changes recommended. The budget has also been approved. There will now be a 5 year approval process with a mid-term review in year 3. This review will depend on the success documented in the annual reports. Thanks to Tom Porter, Milt Thomas and the writing committee. Finally, Eric reminded everyone that the current report will be the project-end report for the terminating NRSP8 project (2008-2013), so project reports need to include accomplishment summaries over the whole project period.
b. Lakshmi Matukumalli: Due to funding restrictions, Lakshmi did not attend but sent his apologies.

4. Call for old business: no items requested or presented.

5. Call for new business: Noelle Cockett spoke about AgENCODE and steps that will contribute to its success.
a. Steve Ellis had mentioned an NSF call for comments from the community about next funding priorities. Jim Reecy will provide notice on AngenMap. Community feedback on the importance of regulatory element identification across taxa will be important, but those who respond need to keep in mind that the scientific priorities of NSF do not explicitly include agricultural production.
b. Also, NRSP8 is planning to host a meeting in the Washington, DC area in the first half of 2014 to help coordinate international efforts and funding sources for AgENCODE. A committee including Chris Tuggle, Joan Lunney, and Stephen White will be advancing this effort.

6. Nominations for next business meeting: no items requested or presented. Location confirmed as San Diego for next year.

7. Nominations for Secretary/Chair-elect: Joan Lunney nominated Daniel Ciobanu of the University of Nebraska for Secretary in 2015 and Chair in 2016. The nomination was seconded by Susan Lamont and the motion passed unanimously. Daniel had been notified of the intention to nominate and had agreed to accept the nomination prior to the business meeting.

8. "Pass the gavel" to Stephen White: After passing the gavel, the NRSP8 community thanked Milt Thomas for his leadership in the last year during a time of transition for the NRSP8 projects.

9. Meeting adjourned.

Minutes prepared by Stephen White, 02/05/2014.

Accomplishments

OBJECTIVES

Objective 1: Create shared genomic tools and reagents and sequence information to enhance the understanding and discovery of genetic mechanisms affecting traits of interest.

Objective 2: Facilitate the development and sharing of animal populations and the collection and analysis of new, unique and interesting phenotypes.

Objective 3: Develop, integrate and implement bioinformatics resources to support the discovery of genetic mechanisms that underlie traits of interest.

Aquaculture Technical Report

Objective 1:

Catfish: The channel catfish genome assembly has been improved by changing the assembly algorithm from ABySS to MaSuRCA. The current version of the catfish genome assembly (v1.1) has 95% of the channel catfish genome sequences spanning 780.7 Mb in 46,936 contigs and 8,597 scaffolds. We are continuing to work with the developers of Celera Assembler to optimize assembly of long PacBio and Illumina reads. The catfish genome was annotated using transcriptome sequencing. Through transcriptome analysis of various tissues, a total of over 23,000 complete cDNAs have been assembled and annotated. Gene families and gene duplications were analyzed. A draft genetic linkage map from 3-generation families has been produced, is currently being analyzed, and will be used to assist genome assembly. A 250K SNP array based on Affymetrix Axiom technology has been constructed.

Oyster: The transcriptome of an adult Eastern oyster (Crassostrea virginica) was sequenced with short Illumina reads and assembled into 66,229 contigs. The de novo assembly covers 90% of published ESTs, and a set of ~40K contigs has been annotated using public databases. 657 genes related to innate immunity have been identified. RNA sequencing of C. virginica samples collected before and after the Deepwater Horizon oil spill resulted in a de novo transcriptome assembly in which 9,469 transcripts were homologous to Pacific oyster transcripts.
RNA-seq data are being used to identify potential effects of oiled water and sediments on the Eastern oyster. A Crassostrea gigas fosmid library was constructed that contains 459,936 clones representing 22.34-fold haploid genome equivalents. End sequencing revealed over 6,000 sequences with open reading frames ≥ 300 bp, 1 million SNPs, and 3,200 SSRs. Fifty-six SNPs were identified in C. gigas sequences mined from the EST database. Forty-two SNPs conform to Hardy-Weinberg Equilibrium and 28 are polymorphic in a full-sib family, suggesting these SNPs will be useful for pedigree analysis, association studies and marker assisted selection.

Salmonids: To identify genes and gene products that are essential in the regulation of embryonic development in rainbow trout, RNA-Seq analysis was performed on eight RNA samples isolated from developing embryos. There are 2,020 transcripts that are expressed only in embryos before cell division, and 34 genes that start to be expressed in 3d embryos, the onset of the maternal-to-zygotic transition in rainbow trout. In addition, a total of 50,351 novel transcripts were identified from the dataset, and 3,329 to 17,312 splice variants were observed at different stages of embryonic development. The first rainbow trout high density 57K SNP chip was developed and characterized. Approximately 50K of the SNPs were validated in a panel of 18 rainbow trout populations at the standard 97% call rate of the Affymetrix SNP polisher software.

Striped Bass: Genomic DNA (30.52 Gb) sequenced from 4 domesticated striped bass was assembled into ~517 Mb comprising 71,500 contigs averaging 7 Kb, with several over 80Mb and one >100 Mb. Contig coverage is generally 30X, with fewer than 10 contigs >400X, suggesting the genome is near 600 Mb, similar to the confamilial European sea bass.
Over 200 million unique sequences of small RNAs from ovarian tissues were obtained, with most representing piRNAs expressed in early oogenesis, including ~400 miRNAs known to regulate transcript translation and degradation. Striped bass and white bass are the parental species of the hybrid striped bass (white bass, Morone chrysops × striped bass, M. saxatilis). Major tissues and organs (brain, liver, spleen, kidney, ovary, testes, etc.) from 10 individuals of each species (5 male and 5 female) were harvested, and the RNA was sequenced in a lane of Illumina HiSeq2000. A total of 262 × 10^6 high quality reads were obtained, with 135 × 10^6 reads from striped bass and 127 × 10^6 reads from white bass. Reads were assembled into 203,587 striped bass contigs and 185,531 white bass contigs. Annotation was carried out by BLAST against the UniProt and nr databases for both species. Again, similar results were obtained from both species, with 18,630 UniProt and 23,605 nr annotated unigenes in striped bass and 18,584 UniProt and 22,354 nr annotated unigenes in white bass.

Objective 2:

Catfish: Bulked segregant RNA-seq (BSR-Seq) was used to analyze differentially expressed genes and SNPs associated with disease resistance against enteric septicemia of catfish (ESC). A total of 1,255 differentially expressed genes were found between resistant and susceptible fish. In addition, 56,419 SNPs located on 4,304 unique genes were identified as significant between susceptible and resistant fish. Detailed analysis of these significant SNPs allowed those caused by genetic segregation to be distinguished from those caused by allele-specific expression. Mapping of the significant SNPs, along with analysis of differentially expressed genes, allowed identification of candidate genes underlying disease resistance against ESC. Genotyping-by-sequencing was conducted on individuals from populations of wild and aquacultured blue catfish.
Markers were validated and extended using multiplex Sequenom MassArray assays and should be useful in follow-up studies of the diversity of cultured blue catfish populations and in parentage studies. In-depth transcriptome sequencing of channel catfish resistant and susceptible to Flavobacterium columnare, as well as microbiome sequencing of channel, blue, and hybrid catfish mucosal tissues, was also carried out.

Oyster: Sequence polymorphisms and differential gene expression patterns were identified that can distinguish between two C. gigas lines exhibiting either high or low survival with respect to summer mortality.

Salmonids: Several studies were completed to investigate genetic variation of multiple salmonid species including Chinook salmon, steelhead/rainbow trout, and cutthroat trout. Studies included investigation into the genetic basis for traits such as thermal adaptation and migration. QTL mapping families for stress response and bacterial cold water disease resistance in rainbow trout that were previously genotyped with microsatellites were re-genotyped with ~5,000 restriction-site associated DNA (RAD) SNPs. The major microsatellite QTL were validated by the new RAD SNP linkage maps. Sequence information from the RAD SNPs is useful for aligning the QTL with sequence contigs from the rainbow trout draft genome assembly in an effort to identify positional candidate genes.

Striped Bass: Novel supervised machine learning analyses identified networks of expressed ovarian genes and proteins that collectively function to determine a complex phenotype, egg quality. Artificial neural networks (ANNs) were used to reveal a powerful relationship (R² > 90%) between profiles of maternal ovary gene expression and subsequent egg fertility in wild and domestic striped bass. RNA-seq data from the same fish are being mined for single nucleotide polymorphisms (SNPs) to determine if there is a genetic basis for egg quality.
K-means clustering and support vector machines (SVMs) were applied to quantitative tandem mass spectrometry data to reveal a strong relationship (R² > 83%) between ovarian stage and protein profiles during the annual reproductive cycle.

Objective 3: The aquaculture community works with the Bioinformatics Coordinator to develop species-specific resources, such as those included in the Animal QTLdb. Large sequence databases are also publicly available at www.animalgenome.org/aquaculture/database/.

Oyster: SQLShare has been used to store, distribute, and query large genomic datasets from the Pacific oyster. Details of this project, including tutorials for the freely available resource, are available at: github.com/sr320/qdod/wiki.

Salmonids: The working draft of the rainbow trout genome assembly reported in 2012 was placed on the animalgenome.org web site hosted by the NRSP-8 Bioinformatics Coordinators. It is now available for downloading by the general public. In addition, an Excel file with the genome location of the 145K RAD SNPs dataset reported in 2012 is available from the same web site and as an appendix file from the journal Molecular Ecology Resources.

Striped Bass: A new high performance computing cluster dedicated to NGS analysis in studies of striped bass was established. This new cyberinfrastructure, built around the open source scientific computing platform Galaxy, was used successfully to assemble an ovarian transcriptome from RNA-seq data.

Cattle Technical Report

Objective 1: Bovine Genome sequence: An important focus of the community has been improving the bovine genome assembly. There are several actively funded efforts in this direction, and new updates to the assembly are expected in 2014. The Bovine Genome Improvement Consortium, a group of scientists working to improve the bovine reference genome assembly and its annotation, was formed.
Multiple data types have been or are in the process of being generated, such as an optical map, Illumina paired-end and mate-pair sequence, PacBio sequence, and improved gene predictions based on RNA-seq data. All of the data will be derived from tissue samples from L1 Dominette 01449, the reference animal. The goal of the group is a single reference genome sequence with fewer gaps, misassemblies, and missing genes. The consortium held a conference call on November 21, 2013 to coordinate efforts between various projects and will be meeting at PAG in January 2014. The specific efforts currently underway and supported by NIFA grants are:

1) David Schwartz and Shiguo Zhou (University of Wisconsin, Madison): Currently the optical map has 76 contigs. It covers 97% of the genome (based on the UMD3.1 and Btau 4.6 reference genomes), with a genome coverage of 447x. It has an 8.9 kb average fragment size and an average contig size of 34 Mb. The optical map can be compared to any assembly of the Dominette reference genome; the comparison reports discordances of any kind (likely misassemblies) and can also place unmapped contigs. Overall both assemblies are in reasonable shape, but the optical map will contribute significantly to their improvement.

2) Chris Elsik (University of Missouri) will create a dedicated page at www.bovinegenome.org to direct people towards the efforts to improve the bovine genome reference. Jim Reecy will mirror the information from the Animal Genome website.

3) Kim Worley (Baylor College of Medicine, Houston, TX) is using long PacBio reads to improve the assembly. The target is to generate 10X PacBio coverage for both the sheep and cattle genomes and use PBJelly software (developed in-house at Baylor) to fill gaps and improve scaffolds.

4) Jared Decker (University of Missouri, Columbia): Aleksey Zimin and others will generate the new genome assembly of Dominette data (not currently NIFA supported).
5) Chris Elsik (University of Missouri, Columbia) will improve gene annotation using RNA-seq data, new software and visualization applications, and community annotation.

Objective 2: n/a

Objective 3: Bioinformatics and database resources: Dr. Harvey Blackburn at the USDA-ARS National Animal Germplasm Program (NAGP) and the Colorado State University Agricultural Experimental Station have joined efforts to begin the development of genomic databases that will serve as a repository for DNA data from the large animal genomics projects funded by AFRI, the dairy and beef industry, and other large projects that may have valuable data that need permanent archiving for future research. This effort, coupled with the existing capacities to store phenotypic and production system data in Animal-GRIN as well as germplasm/tissue samples, will facilitate the community's efforts to maintain valuable data for future use. Database and bioinformatics activities are also coordinated by Jim Reecy (NRSP8 Bioinformatics Coordinator) at the NAGRP site (http://www.genome.iastate.edu/cattle/).

Swine Technical Report

Objective 1: Genome Map Development Update: New gene markers continue to be identified with the development of the 60K SNP chip and through GWAS and sequencing efforts. The 60K SNP chip information can be integrated with the development of Build 10.2, as maps are now based on the pig sequencing efforts.

Shared Materials and Funding: The Pig Genome Coordinator has recently supported community activities to find associations with many different traits. In FY 2013, several projects, including those for disease resistance, reproduction and meat quality, were supported. This brings the total to well over 3,000 chips/genotyping for those several projects from 2009-2013.

Porcine SNP chip update: Illumina and the International Porcine SNP Chip Consortium developed a porcine 60K+ SNP chip and have shipped it to many researchers worldwide. The original publication was Ramos et al. 2009.
Prices for the chip have been dropping and are reasonable. A new custom low density chip is now available for imputation work. GeneSeek, a supplier of genotyping services, has announced the GeneSeek Genomic Profiler for Porcine LD (GGP-Porcine). This custom low density BeadChip utilizes Illumina Infinium chemistry and features approximately 8,500 SNPs for high density chip imputation. The GGP-Porcine BeadChip also includes gene markers for several well-known reproduction, growth, feed efficiency, and meat quality traits at no added expense. These include the following markers: EPOR, MC4R, HMGA, CCKAR, PRKAG, ESR, and CAST. Details on these markers will be available from GeneSeek. In addition, researchers can request additional markers including the HAL, Rendement Napole (RN), resistance marker to E. coli (F4 ab/ac), a SNP parentage panel, which impacts litter size in Large White or Yorkshire by paying additional royalty fees for these optional licensed tests. The chip was developed as a result of a collaborative effort involving leading academic, USDA, and GeneSeek researchers. The price (per sample) is about 40% of the cost of the 60K chip.

Objective 2: n/a

Objective 3: Database Activities: The Pig Genome Database continues to receive considerable updating. The Animal QTLdb added 1,468 new pig QTL during 2013 (release #21), making the total number of pig QTL in the database 8,919. Throughout 2013, the NAGRP bioinformatics team has continued its efforts to improve the Animal QTLdb, including a new mirror site in China, support for the addition of gene network analysis data, and improved search and data analysis tools. Users are encouraged to register an account to enter new QTL data. Find out more at http://www.animalgenome.org/QTLdb . In addition, the pig genome build 10.2 annotations continue to be updated in the BioMart (http://www.animalgenome.org:8181) and for the Animal QTLdb.
Poultry Technical Report

Objective 1:

Reference linkage map. Linkage mapping is now primarily via high throughput SNP assays. Very high density SNP mapping (ca. 600,000 SNP) panels have been developed and are being employed in genome-wide association studies and genome-wide marker-assisted selection (GMAS). Last year, 192 Affymetrix 600K genotypes were obtained from DNA Landmarks for various committee members using coordination funding.

Physical and comparative maps. Physical mapping of the turkey genome is complete, involving construction of a detailed chicken-turkey BAC contig comparative map.

Chicken genome sequence. A new build of the chicken genome sequence, Galgal4.0, which combines the original reads, next generation sequencing (NGS) reads (Roche and Illumina) and the near-finished quality Z sequence done by Bellott et al. (Nature 466:612-616, 2010), was released late in 2011 and is now on browser sites. This build still has not captured the roughly 5% of missing sequence (believed to be predominantly on the microchromosomes). Methods to fill gaps and obtain the missing sequence are being pursued (e.g., optical mapping, PacBio and Moleculo sequence methods, new assembly algorithms). Optical mapping and Moleculo sequencing of the reference genome were supported this year through coordination funds. In addition, Cobb-Vantress Inc. made a ~$150,000 commitment to this effort, which is being led by The Genome Institute at Washington U., and additional support has been obtained this year from a USDA-NIFA-AFRI grant submission. A number of additional chicken genomes have been sequenced via NGS. This year coordination funds supported a project to sequence 19 different chicken lines of interest to NRSP-8 members. That project is currently in progress with data availability anticipated by the end of 2013.

Turkey genome sequence.
The Turkey Genome Sequencing Consortium generated a draft sequence of the turkey genome (Dalloul et al., PLoS Biology 8(9):e1000475, 2010) using a combination of NGS reads, along with the turkey BAC contig map noted above. Coordination funds were committed to aid in this effort, which also enjoyed support from VaTech, BARC and U. of Minn., among others. Efforts are ongoing to improve the annotation of genes and fill gaps in the turkey sequence, as funded by a subsequent AFRI grant.

Chicken microarrays. Previously, coordination funds provided microarrays for transcriptional profiling and comparative genome hybridization. Some coordination support also was committed to Illumina RNA-sequencing and Agilent chip-based transcriptional profiling, partly in hopes of filling in missing sequences.

Objective 2: DNA from the East Lansing international reference mapping population has been sent to many laboratories throughout the world. Similarly, DNA from the junglefowl used to generate the reference sequence assembly has been widely distributed, especially for copy number variant studies.

Objective 3: Database activities are led by the NRSP-8 Bioinformatics Coordinator, Jim Reecy. Susan Lamont and Shane Burgess represent poultry interests on the advisory committee for this group. Poultry bioinformatics has also benefitted from support at several other locations. We maintain a homepage for the NRSP-8 U.S. Poultry Genome project (http://poultry.mph.msu.edu) that provides a variety of genome mapping resources, including our newsletter archive. The Poultry Genome Newsletter is published quarterly and is distributed through our homepage and on the angenmap email discussion group.

Equine Technical Report

Objective 1: Two major goals for the horse genome workshop have been improvement of the reference genome and development of a new assay tool for SNP analysis.
Gene discovery is one of the main research areas for scientists in this section, and problems with the current assembly were discussed at previous PAG meetings and at a horse genome workshop held in July 2014 under the auspices of the Dorothy Russell Havemeyer Foundation. REFSEQ data helped to identify genes which had not been included in the assembly; REFSEQ data also identified numerous differences between the genome annotation for the horse and the actual intron-exon structure and sequence; and a gene for one trait was identified among sequences which had not been included in the assembly and had been relegated to chromosome UN. Toward the goal of improving the reference sequence, additional NEXTGEN DNA sequence was obtained for the reference horse, TWILIGHT, and compared to the annotated reference genome. Discrepancies between the assemblies suggested novel approaches that will be part of a research grant designed to create a new assembly of the reference genome. Scientists at NCBI were contacted and stated that they have created, and are continuing to develop, processes to annotate a new assembly. A new assembly for the horse would be readily handled by these programs. The studies described above were the subject of publications during the last year or will be published in the near future.

Illumina SNP chips have been used extensively by workshop participants to make gene discoveries. The current tool (Illumina Equine SNP70) is about to expire, and the workshop participants are collaborating to develop a new SNP assay tool using a format developed by Affymetrix, Inc. The earlier tools were based on SNPs discovered during the initial production of the reference sequence and are heavily biased towards SNPs found in Thoroughbred horses. The new chip is being designed based on SNPs detected in connection with NEXTGEN sequencing of more than 150 horses of diverse breeds.
A set of 200 horses of diverse breeds will be tested with 2 million SNPs, and a set of 670,000 SNPs will be selected for the final assay tool. Work on this began in 2013 and the tool will be available in the middle of 2014.

Objective 2: The extent of variation among horses was assayed using the Illumina SNP50 chip by testing 814 horses of 36 different breeds. Workshop participants collaborated in a consortium to assemble DNA samples from horses of diverse breeds from around the world. SNPs were evaluated for the information content they provided, and a set of 10,536 was used to construct trees reflecting the relationships among breeds and to assess the efficacy of the SNP set in assigning individual horses to their stated breed. The work was published in 2013. A second published study extended that work to investigate signatures of selection.

Objective 3: During 2013 a committee was established to standardize and database nomenclature for horse genomics.

Sheep/Goat Technical Report

Objective 1:

Develop high resolution genome maps for sheep: An ongoing project of the ISGC (International Sheep Genome Consortium) is development of a whole genome reference assembly. In 2010, sequence data were generated at two sequencing facilities (Beijing Genomics Institute and the Roslin Institute) from DNA of a Texel ewe and a Texel ram, respectively. The first step of the reference sequence assembly involved de novo assembly of 75X reads from the Texel ewe into contigs and scaffolds. Once that was completed, sequences from both animals were used for gap filling. Version 2.0 of the ovine whole-genome reference sequence (Oar v2.0) was publicly released in February 2011, and Oar v3.1 was released in October 2012 through NCBI GenBank.
For chromosome assemblies go to http://www.ncbi.nlm.nih.gov/assembly/GCA_000298735.1/ and for the full assembly, including scaffolds and contigs not assigned to chromosomes, go to http://www.livestockgenomics.csiro.au/cgi-bin/gbrowse/oarv3.1/ . To allow annotation by Ensembl, the ISGC has agreed not to release a new version of the assembly until late 2015, although update patches for some regions will likely be released before then. The RNA dataset produced by the Roslin Institute and submitted to Ensembl is the largest transcriptome analysis of any species in Ensembl, including man. A manuscript describing the whole genome assembly (Oar v3.1), the RH map, and the linkage map is in preparation. Highlights of differences between the genome structure of sheep, cattle and goats are included in the manuscript. The analysis of about a terabyte of data on the transcriptome is also included. Variation of alleles, allelic imbalance and copy number variation have been included in the manuscript as points of interest. Biological stories include reproduction, digestive tract enzymes, evolution of the rumen, lipid metabolism and evolution of wool.

Kim Worley (BCM-HGSC) received funding from a 2013 USDA/AFRI grant to fill gaps in the sheep assembly using PacBio data with PBJelly (scaffolding and gap filling) and Honey (structural variation identification and assembly QC). For sheep, this approach has produced 19x long read data (40 kb max, 8.5 kb N50, and 6 kb mean). Three rounds of PBJelly 2 have moved the assembly contig N50 from 41.7 kb to over 500 kb and closed 89% of gaps. However, there was only a minor shift in scaffold N50 (from 100.1 Mb to 101.2 Mb). Improvement of the Y chromosome, which is full of repetitive sequence, will not be possible with this method. The PacBio work will be in Oar v4.

Develop high resolution genome maps for goats: The sequencing of the goat genome and development of a more refined genome assembly have continued over the past year.
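As a point of reference for the contig and scaffold N50 values quoted for the sheep assembly, the sketch below shows one common way to compute N50 from a list of sequence lengths: sort the lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the example lengths are hypothetical and chosen only for illustration; they are not taken from the Oar assemblies, and the assembly groups may have used different tools or conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb (illustration only, not real assembly data).
example_contigs_kb = [80, 70, 50, 40, 30, 20]
example_n50_kb = n50(example_contigs_kb)
print(example_n50_kb)
```

Because N50 depends only on the length distribution, closing gaps between contigs can raise contig N50 sharply while leaving scaffold N50 nearly unchanged, which is the pattern reported for the PBJelly rounds.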
Sequencing and assembly of the San Clemente Island goat, Papadum, by the USDA and VSU have been evolving as the sequencing technologies evolve. Currently, we have about 67X coverage of Illumina HiSeq sequence data and more than 13X coverage with PacBio sequences. The PacBio technology has been rapidly improving, and collaboration with Tim Smith at the USDA-ARS Clay Center has allowed us to put more of our sequencing focus on PacBio technologies. Currently we have over 8X coverage of PacBio sequences that are >5 kb. We are utilizing the next generation of PacBio chemistry to improve read lengths and are pushing the coverage of Papadum to 70X on PacBio and increasing our Illumina sequence coverage.

The Illumina 50K SNP chip characterization and development paper, led by Gwenola Tosser-Klopp at INRA, has been accepted for publication by PLoS One. The SNP chip is still available and was used heavily over the last year by many groups. Additionally, the first non-IGGC publication using the SNP chip is out and we expect to have more publications coming out over the next year.

Gwenola Tosser-Klopp, Philippe Bardou, Olivier Bouchez, Cédric Cabau, Richard Crooijmans, Yang Dong, Cécile Donnadieu-Tonon, André Eggen, Henri C. M. Heuven, Saadiah Jamli, Abdullah Johari Jiken, Christophe Klopp, Cynthia T. Lawley, John McEwan, Patrice Martin, Carole R. Moreno, Philippe Mulsant, Ibouniyamine Nabihoudine, Eric Pailhoux, Isabelle Palhière, Rachel Rupp, Julien Sarry, Brian L. Sayre, Aurélie Tircazes, Jun Wang, Wen Wang, Wenguang Zhang, and the International Goat Genome Consortium. 2014. Design and Characterization of a 52K SNP Chip for Goats. PLOS ONE. doi:10.1371/journal.pone.0086227

Objective 2:

Provide genome mapping resources for sheep: An ovine HD (600K) SNP chip was released by Illumina in 2013.
Parameters for inclusion of SNPs on the chip were equal spacing (80% of the SNPs), functional, GBS (genotyping by sequencing), literature and 50K chip SNPs (that were not already included under equal spacing). There is still 6K of headroom for SNPs that can be added by users. The final design provides about 12 SNPs every 50K. There should be little ascertainment bias across breeds; however, there will be bias within breeds. There are many low MAF SNPs on the HD chip, so users should not create their own cluster file and should instead use the cluster file created with ~3,500 animals and available from Illumina. The SNPs will be deposited in dbSNP by March 2014. The chip should be useful for GWAS and evolutionary studies, and moderately useful for imputation.

A USDA/AFRI project, funded in 2013, is focused on a resequencing database that will include extensive annotation of variants. To date, 75 animals from the HapMap project that were included in the sequencing project for SNP identification were chosen; they represent 40 breeds and 2 wild sheep species (thin tail and big horn). BCM-HGSC did the sequencing and then aligned the sequences using Oar v3.1. Over 24M SNPs were detected, and some selection sweeps across breeds were identified, such as those for pigmentation, horns and shape of ears. As expected, no breed-specific selection sweeps were found.

Provide genome mapping resources and unique goat populations: We are continuing collaboration with the USDA-ARS Beltsville faculty and research groups from ILRI, ICARDI, ASARECA working in Africa and international collaborators from Brazil, Austria, UK, China, New Zealand and Australia, on a project to improve management of goat genetic resources and the goat production value chain in Africa. The project has collected >2400 samples from ~55 sites in 12 African countries. DNA extraction is complete for most of those samples, and we are currently genotyping them with the Illumina 50K SNP panel.
We have used the Illumina 50K SNP panel to begin characterization of U.S. meat goat breeds. We have genotyped samples from the Boer, Myotonic, Kiko and Spanish breed populations. We are using the panel to analyze breed relationships and origins as well as inbreeding levels.

Objective 3: Bioinformatics and genetic mechanisms in goats: We are continuing to develop our bioinformatics tools and activities. This past year we began developing a novel mathematical model for candidate gene finding using a bioinformatics and model systems analysis approach. We have developed methods for identifying protein-protein interactions, transcription factor binding sites and epigenetic factors from multiple species and are working on methods to combine these data types for an improved prediction of candidate genes in a cross-species analysis.

Bioinformatics Technical Report

Objective 1: n/a

Objective 2: Over the past year, in partnership with researchers at Kansas State University, Michigan State University, Iowa State University, and the U.S. Department of Agriculture, we have further developed and improved the web-interfaced relational databases that store and disseminate phenotypic and genotypic information from large genomic studies in farm animals, to better serve the needs of researchers. For example, we are working with the PRRS CAP Host Genome consortium to develop a relational database to house individual animal genotype and phenotype data (http://www.animalgenome.org/lunney/index.php). This will help the consortium, whose individual research labs lack expertise with relational databases, share information among consortium members, thereby facilitating data analysis.

Objective 3: Poultry: A total of 477 new QTL were curated into the Animal QTLdb (http://www.animalgenome.org/QTLdb/chicken.html).
Chicken QTL can be visualized against the genome at http://www.animalgenome.org/cgi-bin/gbrowse/chicken/ and aligned with chicken 60K SNPs along with NCBI-annotated gene information (http://www.animalgenome.org/cgi-bin/gbrowse/chicken/) on genome build GG_4.0. In addition, we continue to mirror Dr. Carl Schmidt's Gallus genome browser while the original site is undergoing restructuring (http://www.animalgenome.org/cgi-bin/gbrowse/gallus/). The Chicken Gene Nomenclature Committee (CGNC) database was initiated with NRSP-8 funds to provide standardized gene nomenclature for chicken genes. As of 30 December 2013, we have assigned nomenclature for 14,800 genes using orthology to HGNC assigned gene names, and a further 1,684 genes have been manually assigned nomenclature by biocurators supported by Arizona state funds. We also responded to 11 community requests to provide chicken gene nomenclature. The standardized chicken gene names are publicly displayed at the NCBI Entrez Gene database, and we are working with Ensembl to ensure they can also access this data.

Cattle: In the past year, 2000 new cattle QTL were added to the Animal QTLdb (http://www.animalgenome.org/QTLdb/cattle). In addition, cattle QTL can now be viewed relative to both the UMD3.1 assembly (http://www.animalgenome.org/cgi-bin/gbrowse/bovine/) and Btau4.2 assembly (http://www.animalgenome.org/cgi-bin/gbrowse/cattle). Cattle 770K high-density SNPs and 4.1M dbSNP data are now available in GBrowse to align with QTL and in SNPlotz for genome analysis (http://www.animalgenome.org/tools/snplotz/). We have also updated the initial cattle gene nomenclature provided by the Bovine Genome Database, providing standardized gene nomenclature for 9,910 Bos taurus genes based upon homology to assigned human gene nomenclature. This data is available at http://www.animalgenome.org/genetics_glossaries/bovgene.

Swine: The pig genome sequencing information has been updated at http://www.animalgenome.org/pigs/genome/ and a new pig genome database is under active development (http://www.animalgenome.org/pig/genome/db/). In the past year, a total of 1,547 new QTL were added to the Animal QTLdb (http://www.animalgenome.org/QTLdb/pig). The pig gene Wishlist (http://www.animalgenome.org/cgi-bin/host/ssc/gene2bacs) has continued to support the pig genome annotation activities.

Sheep: In 2013, 36 new sheep QTL were added to the Animal QTLdb (http://www.animalgenome.org/QTLdb/sheep). The NRSP-8 web site for activities in the sheep genome community has been actively updated (http://www.animalgenome.org/sheep/). GBrowse alignments for sheep 54K SNP and BAC clones were set up on OAR Build 3.1.

Aquaculture: Many useful links for aquaculture can be found at http://www.animalgenome.org/aquaculture/. Thanks to collaborative efforts by researchers from the USDA National Center for Cool and Cold Water Aquaculture, new QTL continue to be entered into the QTLdb. In 2013, 39 new rainbow trout QTL were curated into the Animal QTLdb (http://www.animalgenome.org/cgi-bin/QTLdb/OM/index).

Multi-species: A local copy of Biomart software has been kept up-to-date on the AnimalGenome.ORG server to serve the cattle, chicken, pig, and horse communities (http://www.animalgenome.org:8181/). New data sources and species continue to be updated.

Ontology development: This past year we continued to focus on the integration of the Animal Trait Ontology into the Vertebrate Trait Ontology (http://bioportal.bioontology.org/ontologies/1659). We have continued working with the Rat Genome Database to integrate ATO terms that are not applicable to the Vertebrate Trait Ontology into the Clinical Measurement Ontology (http://bioportal.bioontology.org/ontologies/1583).
Traits specific to livestock products continue to be incorporated into a Livestock Product Trait Ontology (PT; http://animalgenome.org/cgi-bin/amido/browse.cgi). We have also continued mapping the cattle, pig, chicken, and sheep QTL traits to Vertebrate Trait Ontology (VT), Product Trait Ontology (PT) and Clinical Measurement Ontology (CMO) to help standardize the trait nomenclature used in the QTLdb. A new web page has been set up to reflect this development (http://www.animalgenome.org/bioinfo/projects/ato/alt), with new sites at http://www.animalgenome.org/bioinfo/projects/vt/, http://www.animalgenome.org/bioinfo/projects/pt/, and http://www.animalgenome.org/bioinfo/projects/cmo/ respectively. Anyone interested in helping to improve the ATO/VT is encouraged to contact James Reecy (jreecy@iastate.edu), Cari Park (caripark@iastate.edu) or Zhiliang Hu (zhu@iastate.edu). The new VT/PT/CMO cross-mapping is now used by the Animal QTLdb and VCMap tools. Finally, we have made plans to expand the livestock breed ontology with updated data from Oklahoma State University, the Food and Agriculture Organization, and China.

The chicken adult anatomy is complete and consists of 2,284 ontology terms cross-referenced with the Vertebrate and Uberon Ontologies. The information for these terms includes relationships, synonyms, definitions, and comments (homologies to mammalian structures, species differences). In January 2013, Drs Frances Wong (Roslin Institute) and Fiona McCarthy (University of Arizona) collaborated to begin integration of adult and embryological anatomy terms for the chicken ontology. Dr Wong's visit to the UA was partially supported by a Collaborative Exchange award from the Phenotype Ontology Research Coordination Network (NSF-DEB-0956049). Continuation of this work awaits further funding.

Software development: The NRSP-8 Bioinformatics Online Tool Box has been actively updated (http://www.animalgenome.org/bioinfo/tools/).
Software upgrades were made continually to SNPlotz, Gene Ontology CateGOrizer, BEAP, and the Expeditor. As a result of collaborations between Iowa State University, the Medical College of Wisconsin, and University of Iowa, the Virtual Comparative Map (VCMap; http://www.animalgenome.org/VCmap/) tool has passed its initial development stage and is at a stable working status serving the community. Application development, improvement, and testing have continued. Online help materials have been added, including a written user manual and a video tutorial. To improve links between AgBase and the NRSP-8 website, AgBase now also provides a link to the Virtual Comparative Map (VCMap). Please feel free to try things out and send any feedback to vcmap@animalgenome.org.

Mailing lists and user forums: We have been hosting a couple dozen mailing lists / web sites for various research groups in the NAGRP community. The most active groups include AnGenMap (www.animalgenome.org/community/angenmap/), "CRI-MAP users" (http://www.animalgenome.org/tools/share/crimap/, for user interactions to improve the CRI-MAP software), and "Sheep Models" (www.animalgenome.org/sheep/community/SheepModels). Upon request from Hasan Khatib (hkhatib@wisc.edu), a new mailing list "EPIgroup" (www.animalgenome.org/community/epigroup/) was set up to promote epigenetics research in livestock species. It currently has 198 members. Upon request from Frank Nicholas (frank.nicholas@sydney.edu.au), a new mailing list "OMIA-Support Group" (www.animalgenome.org/community/omia-support/) was set up to facilitate OMIA development activities. It currently has 80 members.

Minimal standards development: We have continued to work on the MIBBI project (http://www.mibbi.org/index.php/Main_Page) to help define minimal standards for publication of QTL and gene association data (http://miqas.sourceforge.net/). See Taylor et al. (2008) for additional information.

Expanded Animal QTLdb functionality: In 2013, a total of 4099 new QTL were added to the database. Currently, there are 9862 curated porcine QTL, 6305 curated bovine QTL, 3919 curated chicken QTL, 789 curated sheep QTL, and 127 curated rainbow trout QTL in the database (http://www.animalgenome.org/QTLdb/). We are adding Horse QTLdb to the Animal QTLdb family to collect horse QTL/association data. All included livestock QTL data have been ported to NCBI. In 2013, we worked with UCSC and Ensembl to port the livestock animal QTL data to UCSC Golden tracks and to Ensembl databases. Users can now fully utilize the tools at NCBI, Ensembl, and UCSC to mine animal QTL data. The January 2013 Nucleic Acids Research Database Issue contains a paper describing our latest developments on the Animal QTLdb.

Facilitating research: The Data Repository for the aquaculture, cattle, chicken, and pig communities to share their genome analysis data has proven to be very useful (http://www.animalgenome.org/repository). New data is continually being added. Frequent data downloads include over 140 data files in 6 different animal species. The newly added data includes the rainbow trout genome assembly draft, chicken 60K SNP information, etc. In parallel to the public data repository, the online data file-sharing tool has also been actively used to facilitate data sharing among collaborators and/or groups. Our helpdesk is here to assist community members. Throughout the year, we have helped more than 50 research groups/individuals with their research projects and questions. Our involvement has ranged from data transfer, data assembly, and data analysis to software applications and code development. Please continue to contact us when you need help with bioinformatics issues.
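
As a cross-check on the Animal QTLdb figures quoted in this report, the per-species counts can be tallied with a short script. This is only an illustrative sketch: the variable names are hypothetical, and the numbers are copied by hand from the species paragraphs of this report rather than queried from the Animal QTLdb itself.

```python
# New QTL curated during the reporting year, as quoted per species in this
# report (hand-copied values, not retrieved from the database).
new_qtl = {
    "chicken": 477,
    "cattle": 2000,
    "pig": 1547,
    "sheep": 36,
    "rainbow trout": 39,
}

# Curated QTL currently held in the database, as quoted in this report.
curated_qtl = {
    "pig": 9862,
    "cattle": 6305,
    "chicken": 3919,
    "sheep": 789,
    "rainbow trout": 127,
}

total_new = sum(new_qtl.values())
total_curated = sum(curated_qtl.values())

print(f"New QTL added across species: {total_new}")
print(f"Curated QTL across species:   {total_curated}")
```

The sum of the per-species additions can be compared directly with the overall figure of 4099 new QTL reported for 2013.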

Impacts

Publications
