Demographics identification: Variable extraction resource (DIVER)

Alexander Hsieh, Son Doan, Michael Conway, Ko Wei Lin, Hyeoneui Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Lack of standardization in representing phenotype data generated in different studies is a major barrier to data reuse for cross-study analyses. To address this issue, we developed DIVER, a tool that identifies and standardizes demographic variables in dbGaP, based on simple natural language processing and standardized terminology mapping. In an evaluation using variables (N=3,565) from a range of pulmonary studies in dbGaP, DIVER proved to be an effective approach to standardizing dbGaP variables, identifying demographic variables with high recall and precision (98% and 94%, respectively). In addition, DIVER correctly modeled 79% of the identified demographic variables at the core semantic level. Examination of the variables that DIVER could not handle showed where the tool needs enhancement to further improve its semantic modeling accuracy. DIVER is an important component of a system for phenotype discovery in dbGaP studies.
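
The abstract reports recall and precision for identifying demographic variables but does not describe the implementation. The following is only a toy sketch of how keyword-based identification and those two metrics might be computed. The keyword map, the function names, and the five sample variables are all hypothetical and are not taken from DIVER or dbGaP.

```python
import re

# Hypothetical keyword-to-concept map; NOT DIVER's actual terminology mapping.
DEMOGRAPHIC_TERMS = {
    "age": "Age",
    "sex": "Sex",
    "gender": "Sex",
    "race": "Race",
    "ethnicity": "Ethnicity",
}


def identify(description):
    """Return the concept for the first demographic keyword found, else None."""
    for token in re.findall(r"[a-z]+", description.lower()):
        if token in DEMOGRAPHIC_TERMS:
            return DEMOGRAPHIC_TERMS[token]
    return None


def precision_recall(predicted, gold):
    """Compute precision and recall from sets of predicted and true variable IDs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up variable descriptions, for illustration only.
variables = {
    "v1": "Age at enrollment (years)",
    "v2": "Sex of participant",
    "v3": "FEV1 percent predicted",
    "v4": "Age of house in years",       # keyword match, but not a demographic
    "v5": "Self-reported ethnic group",  # demographic, but no keyword match
}
gold = {"v1", "v2", "v5"}  # hand-labeled demographic variables (assumed)

predicted = {vid for vid, desc in variables.items() if identify(desc)}
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, "Age of house in years" is a false positive and "Self-reported ethnic group" is a miss, which shows how naive keyword matching can lower precision and recall respectively.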

Original language: English
Title of host publication: Proceedings - 2012 IEEE 2nd Conference on Healthcare Informatics, Imaging and Systems Biology, HISB 2012
Pages: 40-49
Number of pages: 10
State: Published - 2012
Externally published: Yes
Event: 2012 IEEE 2nd Conference on Healthcare Informatics, Imaging and Systems Biology, HISB 2012 - San Diego, CA, United States
Duration: Sep 27 2012 → Sep 28 2012

Conference

Conference: 2012 IEEE 2nd Conference on Healthcare Informatics, Imaging and Systems Biology, HISB 2012
Country/Territory: United States
City: San Diego, CA
Period: 09/27/12 → 09/28/12

Keywords

  • data reuse
  • data standardization
  • dbGaP
  • phenotype variables
