TY - GEN
T1 - Augmenting Reddit Posts to Determine Wellness Dimensions impacting Mental Health
AU - Liyanage, Chandreen
AU - Garg, Muskan
AU - Mago, Vijay
AU - Sohn, Sunghwan
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - Amid an ongoing health crisis, there is a growing need to discern possible signs of Wellness Dimensions (WD) manifested in self-narrated text. As the distribution of WD in social media data is intrinsically imbalanced, we experiment with generative NLP models for data augmentation to enable further improvement in the prescreening task of classifying WD. To this end, we propose a simple yet effective data augmentation approach through prompt-based generative NLP models, and evaluate the ROUGE scores and syntactic/semantic similarity between existing interpretations and augmented data. Our approach with the ChatGPT model surpasses all the other methods and achieves improvement over baselines such as Easy Data Augmentation and backtranslation. Introducing data augmentation to generate more training samples and a balanced dataset results in improvements in the F-score and the Matthews Correlation Coefficient of up to 13.11% and 15.95%, respectively.
AB - Amid an ongoing health crisis, there is a growing need to discern possible signs of Wellness Dimensions (WD) manifested in self-narrated text. As the distribution of WD in social media data is intrinsically imbalanced, we experiment with generative NLP models for data augmentation to enable further improvement in the prescreening task of classifying WD. To this end, we propose a simple yet effective data augmentation approach through prompt-based generative NLP models, and evaluate the ROUGE scores and syntactic/semantic similarity between existing interpretations and augmented data. Our approach with the ChatGPT model surpasses all the other methods and achieves improvement over baselines such as Easy Data Augmentation and backtranslation. Introducing data augmentation to generate more training samples and a balanced dataset results in improvements in the F-score and the Matthews Correlation Coefficient of up to 13.11% and 15.95%, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85174509749&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174509749&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85174509749
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 306
EP - 312
BT - BioNLP 2023 - BioNLP and BioNLP-ST, Proceedings of the Workshop
A2 - Demner-Fushman, Dina
A2 - Ananiadou, Sophia
A2 - Cohen, Kevin
PB - Association for Computational Linguistics (ACL)
T2 - 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, BioNLP 2023
Y2 - 13 July 2023
ER -