TY - JOUR
T1 - Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES)
T2 - Architecture, component evaluation and applications
AU - Savova, Guergana K.
AU - Masanz, James J.
AU - Ogren, Philip V.
AU - Zheng, Jiaping
AU - Sohn, Sunghwan
AU - Kipper-Schuler, Karin C.
AU - Chute, Christopher G.
PY - 2010/9
Y1 - 2010/9
N2 - We aim to build and evaluate an open-source natural language processing system for information extraction from the clinical free text of electronic medical records. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released as open source at http://www.ohnlp.org. cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture framework and the OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans. Accuracy for the concept mapping, negation, and status attributes was 0.957, 0.943, and 0.859 for exact spans and 0.580, 0.939, and 0.839 for overlapping spans, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free text.
AB - We aim to build and evaluate an open-source natural language processing system for information extraction from the clinical free text of electronic medical records. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released as open source at http://www.ohnlp.org. cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture framework and the OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans. Accuracy for the concept mapping, negation, and status attributes was 0.957, 0.943, and 0.859 for exact spans and 0.580, 0.939, and 0.839 for overlapping spans, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free text.
UR - http://www.scopus.com/inward/record.url?scp=78149490620&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78149490620&partnerID=8YFLogxK
U2 - 10.1136/jamia.2009.001560
DO - 10.1136/jamia.2009.001560
M3 - Article
C2 - 20819853
AN - SCOPUS:78149490620
SN - 1067-5027
VL - 17
SP - 507
EP - 513
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 5
ER -