TY - JOUR
T1 - American Academy of Orthopaedic Surgeons' OrthoInfo provides more readable information regarding meniscus injury than ChatGPT-4 while information accuracy is comparable
AU - Bohn, Camden
AU - Hand, Catherine
AU - Tannir, Shadia
AU - Ulrich, Marisa
AU - Saniei, Sami
AU - Girod-Hoffman, Miguel
AU - Lu, Yining
AU - Krych, Aaron
AU - Forsythe, Brian
N1 - Publisher Copyright:
© 2025 The Authors
PY - 2025/4
Y1 - 2025/4
AB - Introduction: Over 61% of Americans seek health information online, often using artificial intelligence (AI) tools such as ChatGPT. However, concerns persist about the readability and accessibility of AI-generated content, especially for individuals with varying health literacy levels. This study compares the readability and accuracy of ChatGPT responses on meniscus injuries with those from the American Academy of Orthopaedic Surgeons' OrthoInfo website, which is tailored for patient education. We hypothesized that while ChatGPT offers accurate information, its readability would be lower than that of OrthoInfo. Methods: Seven frequently asked questions about meniscus injuries were used to compare responses from ChatGPT-4 and OrthoInfo. Readability was assessed using multiple calculators (Flesch-Kincaid, Gunning Fog Index, Coleman-Liau, Simple Measure of Gobbledygook (SMOG), FORCAST Readability Formula, Fry Graph, and Raygor Readability Estimate), and accuracy was evaluated by three independent reviewers on a 4-point scale. Statistical analysis included independent t-tests to compare readability and accuracy between the two sources. Results: ChatGPT responses required a significantly higher education level to comprehend, with an average reading grade level of 13.8 compared with 9.8 for OrthoInfo (p < 0.01). The Flesch Reading Ease Index also indicated lower readability for ChatGPT (32.0 vs. 59.9, p < 0.01). However, both ChatGPT and OrthoInfo responses were highly accurate, with all but one ChatGPT response receiving the highest accuracy rating of 4. The response addressing physical examination findings was rated slightly lower (3.3 vs. 3.6, p = 0.52), a difference that was not statistically significant. Conclusion: While AI-generated responses were accurate, their lower readability made them less accessible than OrthoInfo, which is written for a broad audience. This study underscores the importance of clear, accessible information on meniscal injuries and suggests that AI tools should incorporate readability metrics to enhance patient comprehension. Despite the potential of AI, resources such as OrthoInfo remain essential for effectively communicating health information to the public. Level of evidence: IV.
KW - Accuracy
KW - Artificial intelligence
KW - ChatGPT
KW - Health literacy
KW - Meniscus injury
KW - Readability
UR - http://www.scopus.com/inward/record.url?scp=105000594858&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105000594858&partnerID=8YFLogxK
U2 - 10.1016/j.jisako.2025.100843
DO - 10.1016/j.jisako.2025.100843
M3 - Article
C2 - 39988021
AN - SCOPUS:105000594858
SN - 2059-7754
VL - 11
JO - Journal of ISAKOS
JF - Journal of ISAKOS
M1 - 100843
ER -