Using Unsupervised Machine Learning Methods to Cluster Comorbidities in a Population-Based Cohort of Patients With Rheumatoid Arthritis

Cynthia S. Crowson, Tina M. Gunderson, John M. Davis, Elena Myasoedova, Vanessa L. Kronzer, Caitrin M. Coffey, Elizabeth J. Atkinson

Research output: Contribution to journal › Article › peer-review


Objective: To identify clusters of comorbidities in patients with rheumatoid arthritis (RA) using 4 methods and to compare them with those in patients without RA.

Methods: In this retrospective, population-based study, residents of 8 Minnesota counties with prevalent RA as of January 1, 2015, were identified. Age-, sex-, and county-matched non-RA comparators were selected from the same underlying population. Diagnostic codes were retrieved for the 5 years before January 1, 2015. Using a requirement of 2 codes ≥30 days apart, 44 previously defined morbidities and 11 nonoverlapping chronic disease categories based on Clinical Classifications Software were defined. The unsupervised machine learning methods of interest were hierarchical clustering, factor analysis, K-means clustering, and network analysis.

Results: Two groups of 1,643 patients with and without RA (72% female; mean age 63.1 years in both groups) were studied. Clustering of comorbidities revealed strong associations among mental/behavioral comorbidities and among cardiovascular risk factors and diseases. The clusters were associated with age and sex. Differences between the 4 clustering methods were driven by comorbidities that were rare and those that were weakly associated with other comorbidities. Common comorbidities tended to group together consistently across approaches. The instability of clusters when using different random seeds or bootstrap sampling calls into question the usefulness and reliability of these methods. Clusters of common comorbidities were similar in the RA and non-RA cohorts.

Conclusion: Despite the higher comorbidity burden in patients with RA compared to the general population, clustering comorbidities did not identify substantial differences in comorbidity patterns between the RA and non-RA cohorts. The instability of the clustering methods suggests caution when interpreting clusters derived from a single method.
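The case-definition rule in the Methods (2 diagnostic codes ≥30 days apart) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the record layout, and the toy dates are hypothetical, and the sketch assumes the dates have already been restricted to the 5-year lookback window and mapped from diagnostic codes to conditions.

```python
from datetime import date

MIN_GAP_DAYS = 30  # from the abstract: 2 codes at least 30 days apart


def has_condition(code_dates, min_gap_days=MIN_GAP_DAYS):
    """Return True if any two diagnosis dates are >= min_gap_days apart."""
    if len(code_dates) < 2:
        return False
    # The widest gap is always between the earliest and latest dates,
    # so comparing those two is sufficient.
    return (max(code_dates) - min(code_dates)).days >= min_gap_days


def comorbidity_flags(records):
    """Map {patient: {condition: [dates]}} to {patient: {condition: bool}}."""
    return {
        patient: {cond: has_condition(dates) for cond, dates in conds.items()}
        for patient, conds in records.items()
    }


# Hypothetical toy records; not data from the study.
records = {
    "patient_a": {
        "hypertension": [date(2012, 3, 1), date(2012, 5, 15)],
        "depression": [date(2013, 1, 10), date(2013, 1, 20)],
    },
    "patient_b": {
        "hypertension": [date(2014, 6, 1)],
    },
}

flags = comorbidity_flags(records)
```

The resulting patient-by-condition boolean table is the kind of input that clustering methods such as those named in the abstract would operate on.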

Original language: English (US)
Pages (from-to): 210-219
Number of pages: 10
Journal: Arthritis Care and Research
Issue number: 2
State: Published - Feb 2023

ASJC Scopus subject areas

  • Rheumatology

