Abstract:
For vast areas of the globe and large parts of the tree of life, the data needed to inform trait diversity are incomplete. Such trait data, when fully assembled, however, form the link between the evolutionary history of organisms, their assembly into communities, and the nature and functioning of ecosystems. Recent efforts to close data gaps have focused on collating trait-by-species databases, which provide only species-level, aggregated value ranges for traits of interest and often lack the direct observations on which those ranges are based. Perhaps under-appreciated is that digitized biocollection records collectively contain a vast trove of trait data measured directly from individuals, but this content remains hidden and highly heterogeneous, impeding discoverability and use. We developed and deployed a suite of openly accessible software tools to collate a full set of trait descriptions and extract two key traits, body length and mass, from >18 million specimen records in VertNet, a global biodiversity data publisher and aggregator. We tested the success rate of these tools against hand-checked validation data sets and characterized the quality and quantity of the extracted data. A post-processing toolkit was developed to standardize and harmonize data sets, and to integrate this improved content into VertNet for broadest reuse. This work added more than 1.5 million harmonized measurements of vertebrate body mass and length directly to specimen records. Rates of false positives and negatives for extracted data were extremely low. We also created new tools for filtering, querying, and assembling this research-ready vertebrate trait content for viewing and download. Our work has yielded a novel database and platform for harmonized trait content that will grow as the tools introduced here become part of publication workflows. We close by noting how this effort extends to new communities already developing similar digitized content. © The Author(s) 2016.
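To make the extraction and harmonization steps summarized above concrete, here is a minimal Python sketch of pulling body mass and total length out of a free-text specimen note and converting them to common units (grams and millimetres). It is not the VertNet trait-extraction toolkit: the function name `extract_traits`, the keyword patterns, the output keys, and the unit table are all hypothetical, and real specimen records use far more notations than these two regular expressions cover.

```python
import re

# Hypothetical conversion factors to the harmonized units (grams, millimetres).
MASS_TO_G = {"mg": 0.001, "g": 1.0, "kg": 1000.0, "lb": 453.59237, "oz": 28.349523125}
LENGTH_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4}

# Illustrative patterns only: a trait keyword, an optional ":" or "=",
# a decimal number, and a unit drawn from the tables above.
MASS_RE = re.compile(
    r"\b(?:weight|wt|mass)\s*[:=]?\s*(\d+(?:\.\d+)?)\s*(mg|kg|g|lb|oz)\b",
    re.IGNORECASE,
)
LENGTH_RE = re.compile(
    r"\b(?:total length|body length|tl)\s*[:=]?\s*(\d+(?:\.\d+)?)\s*(mm|cm|m|in)\b",
    re.IGNORECASE,
)


def extract_traits(note: str) -> dict:
    """Return harmonized body mass (g) and total length (mm) found in a free-text note.

    Only the first match for each trait is used; traits that are not found
    are simply absent from the returned dictionary.
    """
    traits = {}
    mass = MASS_RE.search(note)
    if mass:
        value, unit = float(mass.group(1)), mass.group(2).lower()
        traits["body_mass_g"] = value * MASS_TO_G[unit]
    length = LENGTH_RE.search(note)
    if length:
        value, unit = float(length.group(1)), length.group(2).lower()
        traits["total_length_mm"] = value * LENGTH_TO_MM[unit]
    return traits


if __name__ == "__main__":
    print(extract_traits("Total length 12.5 cm; weight 23.5 g"))
    # → {'body_mass_g': 23.5, 'total_length_mm': 125.0}
```

Returning values already converted to a single unit per trait mirrors the idea of harmonized measurements; checking such output against hand-checked records is how false-positive and false-negative rates like those mentioned above would be estimated.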
Record:
Document: | Article |
Title: | The importance of digitized biocollections as a source of trait data and a new VertNet resource |
Authors: | Guralnick, R.P.; Zermoglio, P.F.; Wieczorek, J.; LaFrance, R.; Bloom, D.; Russell, L. |
Affiliations: | University of Florida Museum of Natural History, University of Florida at Gainesville, Gainesville, FL, United States; Departamento de Ecología, Genética y Evolución, Instituto IEGEBA (CONICET-UBA), Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina; Institut de Recherche sur la Biologie de l'Insecte, UMR 7261 CNRS, Université François Rabelais, Tours, France; Museum of Vertebrate Zoology, University of California, Berkeley, CA, United States; Biodiversity Institute, University of Kansas, Lawrence, KS, United States |
Keywords: | animal; DNA sequence; genetic database; genetic variation; human; procedures; quantitative trait locus; software; Animals; Databases, Genetic; Genetic Variation; Humans; Quantitative Trait Loci; Sequence Analysis, DNA; Software |
Year: | 2016 |
Volume: | 2016 |
DOI: | http://dx.doi.org/10.1093/database/baw158 |
Journal title: | Database |
Abbreviated journal title: | Database |
ISSN: | 1758-0463 |
Record: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_17580463_v2016_n_p_Guralnick |
Citations:
---------- APA ----------
Guralnick, R.P., Zermoglio, P.F., Wieczorek, J., LaFrance, R., Bloom, D. & Russell, L. (2016). The importance of digitized biocollections as a source of trait data and a new VertNet resource. Database, 2016. http://dx.doi.org/10.1093/database/baw158
---------- CHICAGO ----------
Guralnick, R.P., Zermoglio, P.F., Wieczorek, J., LaFrance, R., Bloom, D., Russell, L. "The importance of digitized biocollections as a source of trait data and a new VertNet resource". Database 2016 (2016). http://dx.doi.org/10.1093/database/baw158
---------- MLA ----------
Guralnick, R.P., Zermoglio, P.F., Wieczorek, J., LaFrance, R., Bloom, D., Russell, L. "The importance of digitized biocollections as a source of trait data and a new VertNet resource". Database, vol. 2016, 2016. http://dx.doi.org/10.1093/database/baw158
---------- VANCOUVER ----------
Guralnick, R.P., Zermoglio, P.F., Wieczorek, J., LaFrance, R., Bloom, D., Russell, L. The importance of digitized biocollections as a source of trait data and a new VertNet resource. Database. 2016;2016. http://dx.doi.org/10.1093/database/baw158