Speech foundation models in healthcare: Effect of layer selection on pathological speech feature prediction

Research output: Contribution to journal › Conference article › peer-review

Abstract

Accurately extracting clinical information from speech is critical to the diagnosis and treatment of many neurological conditions. As such, there is interest in leveraging AI for automatic, objective assessments of clinical speech to facilitate diagnosis and treatment of speech disorders. We explore transfer learning using foundation models, focusing on the impact of layer selection for the downstream task of predicting pathological speech features. We find that selecting an optimal layer can greatly improve performance (∼15.8% increase in balanced accuracy per feature as compared to the worst layer, ∼13.6% increase as compared to the final layer), though the best layer varies by predicted feature and does not always generalize well to unseen data. A learned weighted sum offers comparable performance to the average best layer in-distribution (only ∼1.2% lower) and generalizes well to out-of-distribution data (only 1.5% lower than the average best layer).
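The learned weighted sum mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are normalized, so the softmax used here is an assumption. The names `softmax`, `weighted_layer_sum`, `layer_embeddings`, and `layer_scores` are hypothetical and are not taken from the paper. In practice the scores would be trained jointly with the downstream classifier; here they are fixed numbers so the combination step can be read in isolation.

```python
import math


def softmax(scores):
    """Turn unnormalized scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_embeddings, layer_scores):
    """Combine one embedding per layer into a single vector.

    layer_embeddings: list of L vectors, each of length D (one per model layer).
    layer_scores: list of L unnormalized scores (assumed learnable in training).
    Returns a vector of length D.
    """
    if len(layer_embeddings) != len(layer_scores):
        raise ValueError("need exactly one score per layer")
    weights = softmax(layer_scores)
    dim = len(layer_embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, layer_embeddings))
        for d in range(dim)
    ]


# Three hypothetical layers, each producing a 2-dimensional embedding.
layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
uniform = weighted_layer_sum(layers, [0.0, 0.0, 0.0])   # plain average of layers
peaked = weighted_layer_sum(layers, [0.0, 0.0, 50.0])   # dominated by the last layer
```

With equal scores the result is the plain average over layers, and a single large score approaches selecting one layer outright. Under this assumption, choosing a single best layer is a limiting case of the weighted sum.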

Original language: English (US)
Pages (from-to): 4618-4622
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs
State: Published - 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: Sep 1 2024 to Sep 5 2024

Keywords

  • foundation models
  • latent representations
  • layer analysis
  • pathological speech
  • transfer learning

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
