TY - GEN
T1 - Estimating disease burden using Google Trends and Wikipedia data
AU - Qiu, Riyi
AU - Hadzikadic, Mirsad
AU - Yao, Lixia
PY - 2017/1/1
Y1 - 2017/1/1
AB - Data on disease burden is often used for assessing population health, evaluating the effectiveness of interventions, formulating health policies, and planning future resource allocation. We investigated whether Internet usage data, particularly the search volume on Google and page view counts on Wikipedia, are correlated with the disease burden, measured by prevalence and treatment cost, for 1,633 diseases over an 11-year period. We also applied the method of least absolute shrinkage and selection operator (LASSO) to predict the burden of diseases, using those Internet data together with three other variables we quantified previously. We found a relatively strong correlation for 39 of 1,633 diseases, including viral hepatitis, diabetes mellitus, other headache syndromes, multiple sclerosis, sleep apnea, hemorrhoids, and disaccharidase deficiency. However, an accurate analysis must consider each condition’s characteristics, including acute/chronic nature, severity, familiarity to the public, and presence of stigma.
KW - Disease burden
KW - Least absolute shrinkage and selection operator (LASSO)
KW - Page view
KW - Prevalence
KW - Search query volume
KW - Treatment cost
UR - http://www.scopus.com/inward/record.url?scp=85026287521&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85026287521&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-60045-1_39
DO - 10.1007/978-3-319-60045-1_39
M3 - Conference contribution
AN - SCOPUS:85026287521
SN - 9783319600444
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 374
EP - 385
BT - Advances in Artificial Intelligence
A2 - Ali, Moonis
A2 - Benferhat, Salem
A2 - Tabia, Karim
PB - Springer Verlag
T2 - 30th International Conference on Industrial, Engineering, and Other Applications of Applied Intelligent Systems, IEA/AIE 2017
Y2 - 27 June 2017 through 30 June 2017
ER -