TY - JOUR
T1 - How do you #relax when you're #stressed? A content analysis and infodemiology study of stress-related tweets
AU - Doan, Son
AU - Ritchart, Amanda
AU - Perry, Nicholas
AU - Chaparro, Juan D.
AU - Conway, Mike
N1 - Funding Information:
SD and AR were partially supported by NIH grant U54HL108460. NP and MC were partially supported by NIH grant R00LM011393. JDC was partially supported by the NLM Medical Informatics Training Grant 5T15LM011271-04. We would like to thank Mr Gregory Stoddard, MPH, MBA at the University of Utah's Division of Epidemiology for his valuable comments on an earlier version of this manuscript.
Publisher Copyright:
© Son Doan, Amanda Ritchart, Nicholas Perry, Juan D Chaparro, Mike Conway. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 13.06.2017. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/)
PY - 2017/4
Y1 - 2017/4
AB - Background: Stress is a contributing factor to many major health problems in the United States, such as heart disease, depression, and autoimmune diseases. Relaxation is often recommended in mental health treatment as a frontline strategy to reduce stress, thereby improving health conditions. Twitter is a microblog platform that allows users to post their own personal messages (tweets), including their expressions about feelings and actions related to stress and stress management (eg, relaxing). While Twitter is increasingly used as a source of data for understanding mental health from a population perspective, the specific issue of stress, as manifested on Twitter, has not yet been the focus of any systematic study. Objective: The objective of our study was to understand how people express their feelings of stress and relaxation through Twitter messages. In addition, we aimed to investigate automated natural language processing methods to (1) classify stress versus nonstress and relaxation versus nonrelaxation tweets, and (2) identify first-hand experience (that is, who is the experiencer) in stress and relaxation tweets. Methods: We first performed a qualitative content analysis of 1326 and 781 tweets containing the keywords “stress” and “relax,” respectively. We then investigated the use of machine learning algorithms, in particular naive Bayes and support vector machines, to automatically classify tweets as stress versus nonstress and relaxation versus nonrelaxation. Finally, we applied these classifiers to sample datasets drawn from 4 cities in the United States (Los Angeles, New York, San Diego, and San Francisco) obtained from Twitter's streaming application programming interface, with the goal of evaluating the extent of any correlation between our automatic classification of tweets and results from public stress surveys. Results: Content analysis showed that the most frequent topic of stress tweets was education, followed by work and social relationships. The most frequent topic of relaxation tweets was rest & vacation, followed by nature and water. When we applied the classifiers to the cities dataset, the proportion of stress tweets in New York and San Diego was substantially higher than that in Los Angeles and San Francisco. In addition, we found that characteristic expressions of stress and relaxation varied for each city based on its geolocation. Conclusions: This content analysis and infodemiology study revealed that Twitter, when used in conjunction with natural language processing techniques, is a useful data source for understanding stress and stress management strategies, and can potentially supplement infrequently collected survey-based stress data.
KW - Machine learning
KW - Natural language processing
KW - Relaxation
KW - Social media
KW - Stress
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85051008525&partnerID=8YFLogxK
U2 - 10.2196/publichealth.5939
DO - 10.2196/publichealth.5939
M3 - Article
AN - SCOPUS:85051008525
VL - 3
JO - JMIR Public Health and Surveillance
JF - JMIR Public Health and Surveillance
SN - 2369-2960
IS - 2
M1 - e35
ER -