TY - JOUR
T1 - Estimating disease burden using Internet data
AU - Qiu, Riyi
AU - Hadzikadic, Mirsad
AU - Yu, Sha
AU - Yao, Lixia
N1 - Publisher Copyright: © The Author(s) 2018. Copyright 2019 Elsevier B.V., All rights reserved.
PY - 2019/12/1
Y1 - 2019/12/1
N2 - Data on disease burden are often used for assessing population health, evaluating the effectiveness of interventions, formulating health policies, and planning future resource allocation. We investigated whether Internet usage and social media data, specifically search volume on Google, page view counts on Wikipedia, and the frequency of disease mentions on Twitter, correlated with disease burden, measured by prevalence and treatment cost, for 1633 diseases over an 11-year period. We also applied the least absolute shrinkage and selection operator (LASSO) to predict the burden of diseases. We found that Google search volume was relatively strongly correlated with the burdens of 39 of the 1633 diseases, including viral hepatitis, diabetes mellitus, multiple sclerosis, and hemorrhoids. Wikipedia and Twitter data were strongly correlated with the burdens of 15 and 7 diseases, respectively. However, an accurate analysis must consider each condition’s characteristics, including acute/chronic nature, severity, familiarity to the public, and the presence of stigma.
KW - Google search
KW - Twitter
KW - Wikipedia
KW - data mining
KW - disease burden
KW - least absolute shrinkage and selection operator
KW - prevalence
KW - treatment cost
UR - http://www.scopus.com/inward/record.url?scp=85058943364&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058943364&partnerID=8YFLogxK
U2 - 10.1177/1460458218810743
DO - 10.1177/1460458218810743
M3 - Article
C2 - 30488754
AN - SCOPUS:85058943364
SN - 1460-4582
VL - 25
SP - 1863
EP - 1877
JO - Health Informatics Journal
JF - Health Informatics Journal
IS - 4
ER -