Crowdsourcing Twitter annotations to identify first-hand experiences of prescription drug use

Nestor Alvaro, Mike Conway, Son Doan, Christoph Lofi, John Overington, Nigel Collier

Research output: Contribution to journal › Article › peer-review

51 Scopus citations


Self-reported patient data has been shown to be a valuable knowledge source for post-market pharmacovigilance. In this paper we propose using the popular micro-blogging service Twitter to gather evidence about adverse drug reactions (ADRs) after first identifying micro-blog messages (also known as "tweets") that report first-hand experience. To achieve this goal we explore machine learning with data crowdsourced from lay annotators. With the help of lay annotators recruited from CrowdFlower we manually annotated 1548 tweets containing keywords related to two kinds of drugs: SSRIs (e.g. Paroxetine) and cognitive enhancers (e.g. Ritalin). Our results show that inter-annotator agreement (Fleiss' kappa) for crowdsourcing ranks in moderate agreement with a pair of experienced annotators (Spearman's rho = 0.471). We used the gold standard annotations from CrowdFlower to train a range of supervised machine learning models to recognize first-hand experience. F-score values are reported for 6 of these techniques, with the Bayesian Generalized Linear Model performing best (F-score = 0.64 and informedness = 0.43) when combined with a selected set of features obtained by using information gain criteria.
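The two evaluation measures named in the abstract, F-score and informedness, can both be derived from a binary confusion matrix. The following is a minimal sketch under the standard definitions (F-score as the harmonic mean of precision and recall; informedness as recall plus specificity minus one). The function name and the counts in the usage line are hypothetical illustrations, not the paper's code or data.

```python
def f_score_and_informedness(tp, fp, fn, tn):
    """Return (F-score, informedness) for a binary confusion matrix.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives and true negatives. Illustrative helper only;
    this is not taken from the paper.
    """
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate

    # F-score (F1): harmonic mean of precision and recall.
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0

    # Informedness (Youden's J): recall + specificity - 1,
    # which is 0 for chance-level prediction and 1 for perfect prediction.
    informedness = recall + specificity - 1
    return f_score, informedness


# Hypothetical counts, chosen only to show the call.
f, j = f_score_and_informedness(tp=40, fp=10, fn=10, tn=40)
print(f"F-score={f:.2f}, informedness={j:.2f}")
```

Unlike F-score, informedness takes true negatives into account, which may be why the abstract reports the two measures together.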

Original language: English
Pages (from-to): 280-287
Number of pages: 8
Journal: Journal of Biomedical Informatics
State: Published - Dec 1 2015
Externally published: Yes


  • Crowdsourcing
  • Natural language processing
  • Pharmacovigilance
  • Twitter

