Detecting Speech Abnormalities With a Perceiver-Based Sequence Classifier that Leverages a Universal Speech Model

Hagen Soltau, Izhak Shafran, Alex Ottenwess, Joseph R.J. Duffy, Rene L. Utianski, Leland R. Barnard, John L. Stricker, Daniela Wiepert, David T. Jones, Hugo Botha

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We propose a Perceiver-based sequence classifier to detect abnormalities in speech reflective of several neurological disorders. We combine this classifier with a Universal Speech Model (USM) that is trained on 12 million hours of diverse audio recordings. Our model compresses long sequences into a small set of class-specific latent representations, and a factorized projection is used to predict different attributes of the disordered input speech. The benefit of our approach is that it allows us to model different regions of the input for different classes while remaining data efficient. We evaluated the proposed model extensively on a curated corpus from the Mayo Clinic. Our model achieves an average accuracy of 83.1%, outperforming standard transformer (80.9%) and perceiver (81.8%) models. With limited task-specific data, we find that pretraining is important and, surprisingly, that pretraining with the unrelated automatic speech recognition (ASR) task is also beneficial. Encodings from the middle layers provide a mix of both acoustic and phonetic information and achieve the best prediction results, compared to using only the final-layer encodings (83.1% vs 79.6%). The results are promising and, with further refinements, may help clinicians detect speech abnormalities without needing access to highly specialized speech-language pathologists.
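
To make the idea of class-specific latent representations concrete, here is a minimal NumPy sketch of cross-attention pooling with one latent query per class, followed by a separate projection per class. It is an illustration under stated assumptions, not the authors' implementation: the single attention head, the weight names (`w_k`, `w_v`, `w_out`), the random inputs standing in for USM encodings, and the simple per-class projection used in place of the paper's factorized projection are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def class_latent_pool(features, latents, w_k, w_v):
    """Cross-attend from per-class latent queries to a long feature sequence.

    features: (T, D) frame-level encodings (random stand-ins here)
    latents:  (C, D) one query vector per class (hypothetical layout)
    returns:  pooled (C, D) summaries and attention weights (C, T)
    """
    keys = features @ w_k
    values = features @ w_v
    scores = latents @ keys.T / np.sqrt(keys.shape[-1])
    attn = softmax(scores, axis=-1)  # each class attends over all T frames
    return attn @ values, attn

def per_class_logits(pooled, w_out, b_out):
    # Assumed stand-in for a factorized projection:
    # logit_c = pooled[c] . w_out[c] + b_out[c], one projection per class.
    return np.einsum("cd,cd->c", pooled, w_out) + b_out

rng = np.random.default_rng(0)
T, D, C = 500, 16, 4  # frames, feature size, number of classes (arbitrary)
features = rng.normal(size=(T, D))
latents = rng.normal(size=(C, D))
w_k = rng.normal(size=(D, D)) / np.sqrt(D)
w_v = rng.normal(size=(D, D)) / np.sqrt(D)
w_out = rng.normal(size=(C, D))
b_out = np.zeros(C)

pooled, attn = class_latent_pool(features, latents, w_k, w_v)
logits = per_class_logits(pooled, w_out, b_out)
probs = 1.0 / (1.0 + np.exp(-logits))  # independent score per class
print(pooled.shape, attn.shape, probs.shape)
```

Because each class has its own query, each row of `attn` can concentrate on a different region of the input, and the sequence of T frames is reduced to C vectors regardless of its length.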

Original language: English (US)
Title of host publication: 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350306897
State: Published - 2023
Event: 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023 - Taipei, Taiwan, Province of China
Duration: Dec 16 2023 to Dec 20 2023

Keywords

  • neurological tests
  • perceiver
  • sequence classification
  • speech disorders

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Acoustics and Ultrasonics
  • Linguistics and Language
  • Communication
