TY - JOUR
T1 - Using Naïve Bayesian Analysis to Determine Imaging Characteristics of KRAS Mutations in Metastatic Colon Cancer
AU - Pershad, Yash
AU - Govindan, Siddharth
AU - Hara, Amy K.
AU - Borad, Mitesh J.
AU - Bekaii-Saab, Tanios
AU - Wallace, Alex
AU - Albadawi, Hassan
AU - Oklu, Rahmi
N1 - Publisher Copyright:
© 2017 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2017/9
Y1 - 2017/9
AB - Genotype, particularly Ras status, greatly affects prognosis and treatment of liver metastasis in colon cancer patients. This pilot study aimed to apply word frequency analysis and a naive Bayes classifier to radiology reports to extract imaging descriptors that distinguish wild-type colon cancer patients from those with v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) mutations. In this institutional-review-board-approved study, we compiled a SNaPshot mutation analysis dataset from 457 colon adenocarcinoma patients. From this cohort, we analyzed the radiology reports (>32,000 reports) of 299 patients who were either wild-type (147 patients) or had a KRAS mutation (152 patients). Our algorithm determined word frequencies within the wild-type and mutant radiology reports and used a naive Bayes classifier to determine the probability of a given word belonging to either group. Words with a greater than 50% probability of belonging to the KRAS mutation group and with the highest absolute probability difference compared with the wild-type group included “several”, “innumerable”, “confluent”, and “numerous” (p < 0.01). In contrast, words with a greater than 50% probability of belonging to the wild-type group and with the highest absolute probability difference included “few”, “discrete”, and “[no] recurrent” (p = 0.03). Words used in radiology reports, which have direct implications for disease course, tumor burden, and therapy, appear with differing frequency in patients with KRAS mutations versus wild-type colon adenocarcinoma. Moreover, because mutant tumors likely have characteristic imaging traits, probabilistic word analysis is useful for identifying their unique characteristics and disease course, with applications ranging from radiology and pathology reports to clinical notes.
KW - Machine learning
KW - Natural language processing
KW - Naïve Bayesian classification
KW - RAS mutation
KW - Radiogenomics
UR - http://www.scopus.com/inward/record.url?scp=85064999864&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064999864&partnerID=8YFLogxK
U2 - 10.3390/diagnostics7030050
DO - 10.3390/diagnostics7030050
M3 - Article
AN - SCOPUS:85064999864
SN - 2075-4418
VL - 7
JO - Diagnostics
JF - Diagnostics
IS - 3
M1 - 50
ER -
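The per-word probability described in the abstract can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes model with add-one (Laplace) smoothing, class priors taken from the number of reports in each group, and a simple lowercase tokenizer; the abstract does not specify any of these choices. The function names (word_posteriors, top_words) and the four toy reports are hypothetical illustrations, not the authors' code or data. For two groups, the "absolute probability difference" |P(A|w) - P(B|w)| reduces to |2P(A|w) - 1|.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def word_posteriors(reports_a, reports_b, alpha=1.0):
    """Return {word: P(group A | word)} under a multinomial naive Bayes model.

    Assumptions: class priors are the fraction of reports in each group, and
    word likelihoods use add-alpha (Laplace) smoothing over the shared vocabulary.
    """
    counts_a = Counter(t for r in reports_a for t in tokenize(r))
    counts_b = Counter(t for r in reports_b for t in tokenize(r))
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    prior_a = len(reports_a) / (len(reports_a) + len(reports_b))
    prior_b = 1.0 - prior_a
    posteriors = {}
    for w in vocab:
        # Smoothed likelihood of the word within each group's reports.
        like_a = (counts_a[w] + alpha) / (total_a + alpha * len(vocab))
        like_b = (counts_b[w] + alpha) / (total_b + alpha * len(vocab))
        joint_a = prior_a * like_a
        joint_b = prior_b * like_b
        # Bayes' rule: normalize the joint probabilities over the two groups.
        posteriors[w] = joint_a / (joint_a + joint_b)
    return posteriors


def top_words(posteriors, k=3):
    # Keep words leaning toward group A (> 0.5), rank by |P(A|w) - P(B|w)| = |2p - 1|,
    # and break ties alphabetically so the output order is deterministic.
    leaning = [(w, p) for w, p in posteriors.items() if p > 0.5]
    return sorted(leaning, key=lambda wp: (-abs(2 * wp[1] - 1), wp[0]))[:k]


# Hypothetical toy reports, invented for illustration only.
kras_reports = ["innumerable confluent hepatic metastases", "numerous hepatic lesions, several new"]
wt_reports = ["few discrete hepatic lesions", "no recurrent disease, few nodules"]

post = word_posteriors(kras_reports, wt_reports)
print(top_words(post))
print(top_words(word_posteriors(wt_reports, kras_reports), k=1))
```

Ranking by posterior alone would be sensitive to rare words; a real analysis would likely also filter by minimum frequency and test significance, which this sketch does not attempt.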