A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19

David Oniani; Yanshan Wang

doi:10.1145/3388440.3412413

Digital Health Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

COVID-19 (2019 Novel Coronavirus) has resulted in an ongoing pandemic and, as of 26 July 2020, has caused more than 15.7 million cases and over 640,000 deaths. The highly dynamic and rapidly evolving situation with COVID-19 has made it difficult to access accurate, on-demand information regarding the disease. Online communities, forums, and social media provide potential venues to search for relevant questions and answers, or to post questions and seek answers from other members. However, due to the nature of such sites, there is always a limited number of relevant questions and responses to search, and posted questions are rarely answered immediately. With the advancements in the field of natural language processing, particularly in the domain of language models, it has become possible to design chatbots that can automatically answer consumer questions. However, such models are rarely applied and evaluated in the healthcare domain to meet information needs with accurate and up-to-date healthcare data. In this paper, we propose to apply a language model for automatically answering questions related to COVID-19 and qualitatively evaluate the generated responses. We utilized the GPT-2 language model and applied transfer learning to retrain it on the COVID-19 Open Research Dataset (CORD-19) corpus. In order to improve the quality of the generated responses, we applied four different approaches, namely tf-idf (Term Frequency-Inverse Document Frequency), Bidirectional Encoder Representations from Transformers (BERT), Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT), and Universal Sentence Encoder (USE), to filter and retain relevant sentences in the responses. In the performance evaluation step, we asked two medical experts to rate the responses. We found that BERT and BioBERT, on average, outperform both tf-idf and USE in relevance-based sentence filtering tasks.
Additionally, based on the chatbot, we created a user-friendly interactive web application to be hosted online and made its source code available free of charge to anyone interested in running it locally, online, or just for experimental purposes. Overall, our work has yielded significant results in both designing a chatbot that produces high-quality responses to COVID-19-related questions and comparing several embedding generation techniques.
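
The relevance-based sentence filtering step described in the abstract can be illustrated with a minimal sketch of the tf-idf variant. This is not the authors' implementation: the tokenizer, the smoothed idf formula, the `top_k` cutoff, and all function names below are assumptions chosen for illustration, and the example sentences are invented rather than taken from model output. The idea shown is only that each sentence of a generated response is scored by cosine similarity to the question, and the highest-scoring sentences are retained.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens, keeping hyphenated terms such as 'covid-19'."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict of term -> weight) per document.

    Uses a smoothed idf, ln((1 + N) / (1 + df)) + 1, which is an assumption
    made here for illustration, not a detail taken from the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty text
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_sentences(question, sentences, top_k=2):
    """Keep up to top_k sentences most similar to the question, in original order.

    Sentences with zero similarity are dropped even if fewer than top_k remain.
    """
    vectors = tfidf_vectors([question] + list(sentences))
    q_vec, s_vecs = vectors[0], vectors[1:]
    scores = [cosine(q_vec, s_vec) for s_vec in s_vecs]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(ranked) if scores[i] > 0]


# Hypothetical question and response sentences, invented for this sketch.
question = "What are the symptoms of COVID-19?"
response = [
    "Common symptoms of COVID-19 include fever and cough.",
    "Stock markets rallied on Tuesday.",
    "Symptoms may appear 2-14 days after exposure.",
]
print(filter_sentences(question, response, top_k=2))
```

The BERT, BioBERT, and USE approaches named in the abstract would replace `tfidf_vectors` with dense sentence embeddings from the respective model, while the similarity scoring and filtering steps could stay the same under this sketch's assumptions.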

Original language	English (US)
Title of host publication	Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
Publisher	Association for Computing Machinery, Inc
ISBN (Electronic)	9781450379649
DOIs	https://doi.org/10.1145/3388440.3412413
State	Published - Sep 21 2020
Event	11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020 - Virtual, Online, United States Duration: Sep 21 2020 → Sep 24 2020

Publication series

Name	Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020

Conference

Conference	11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
Country/Territory	United States
City	Virtual, Online
Period	9/21/20 → 9/24/20

Keywords

ai
bert
biobert
cord-19
covid-19
dataset
gpt-2
nlp
semantic similarity
tf-idf
use

ASJC Scopus subject areas

Computer Science Applications
Software
Biomedical Engineering
Health Informatics

Cite this

Oniani, D., & Wang, Y. (2020). A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19. In Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020 (Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020). Association for Computing Machinery, Inc. https://doi.org/10.1145/3388440.3412413

A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19. / Oniani, David; Wang, Yanshan.
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020. Association for Computing Machinery, Inc, 2020. (Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020).

Oniani, D & Wang, Y 2020, A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19. in Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020. Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020, Association for Computing Machinery, Inc, 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020, Virtual, Online, United States, 9/21/20. https://doi.org/10.1145/3388440.3412413

Oniani D, Wang Y. A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19. In Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020. Association for Computing Machinery, Inc. 2020. (Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020). doi: 10.1145/3388440.3412413

Oniani, David ; Wang, Yanshan. / A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19. Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020. Association for Computing Machinery, Inc, 2020. (Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020).

@inproceedings{c0c4f16b14c1437486675ae6059bd361,

title = "A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19",

keywords = "ai, bert, biobert, cord-19, covid-19, dataset, gpt-2, nlp, semantic similarity, tf-idf, use",

author = "David Oniani and Yanshan Wang",

note = "Publisher Copyright: {\textcopyright} 2020 ACM.; 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020 ; Conference date: 21-09-2020 Through 24-09-2020",

year = "2020",

month = sep,

day = "21",

doi = "10.1145/3388440.3412413",

language = "English (US)",

series = "Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020",

publisher = "Association for Computing Machinery, Inc",

booktitle = "Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020",

}

TY - GEN

T1 - A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19

AU - Oniani, David

AU - Wang, Yanshan

PY - 2020/9/21

Y1 - 2020/9/21

KW - ai

KW - bert

KW - biobert

KW - cord-19

KW - covid-19

KW - dataset

KW - gpt-2

KW - nlp

KW - semantic similarity

KW - tf-idf

KW - use

UR - http://www.scopus.com/inward/record.url?scp=85097002603&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85097002603&partnerID=8YFLogxK

U2 - 10.1145/3388440.3412413

DO - 10.1145/3388440.3412413

M3 - Conference contribution

AN - SCOPUS:85097002603

T3 - Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020

BT - Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020

PB - Association for Computing Machinery, Inc

T2 - 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020

Y2 - 21 September 2020 through 24 September 2020

ER -
