Improving ASR Systems for Children with Autism and Language Impairment Using Domain-Focused DNN Transfer Techniques.

Gale, Robert; Chen, Liu; Dolata, Jill; van Santen, Jan; Asgari, Meysam

Affiliation

Gale R; Center for Spoken Language Understanding (CSLU), Oregon Health & Science University (OHSU), Portland, OR.
Chen L; Center for Spoken Language Understanding (CSLU), Oregon Health & Science University (OHSU), Portland, OR.
Dolata J; Center for Spoken Language Understanding (CSLU), Oregon Health & Science University (OHSU), Portland, OR.
van Santen J; Center for Spoken Language Understanding (CSLU), Oregon Health & Science University (OHSU), Portland, OR.
Asgari M; Center for Spoken Language Understanding (CSLU), Oregon Health & Science University (OHSU), Portland, OR.

Interspeech; 2019: 11-15, 2019 Sep.

Article in English | MEDLINE | ID: mdl-33088838

ABSTRACT

This study explores building and improving an automatic speech recognition (ASR) system for children aged 6-9 years and diagnosed with autism spectrum disorder (ASD), language impairment (LI), or both. Working with only 1.5 hours of target data in which children perform the Clinical Evaluation of Language Fundamentals Recalling Sentences task, we apply deep neural network (DNN) weight transfer techniques to adapt a large DNN model trained on the LibriSpeech corpus of adult speech. To begin, we aim to find the best proportional training rates of the DNN layers. Our best configuration yields a 29.38% word error rate (WER). Using this configuration, we explore the effects of quantity and similarity of data augmentation in transfer learning. We augment our training with portions of the OGI Kids' Corpus, adding 4.6 hours of typically developing speakers aged kindergarten through 3rd grade. We find that 2nd grade data alone - approximately the mean age of the target data - outperforms other grades and all the sets combined. Doubling the data for 1st, 2nd, and 3rd grade, we again compare each grade as well as pairs of grades. We find the combination of 1st and 2nd grade performs best at a 26.21% WER.
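The results above are reported as word error rate (WER). As a minimal sketch, assuming the standard word-level definition (substitutions, deletions, and insertions divided by the number of reference words), WER and the relative change between two WER figures can be computed as follows. The abstract does not name the scoring tool the authors used; the function names and the example sentence pair here are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Invented sentence pair (not from the study): one substitution, one deletion.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# The two WER figures quoted in the abstract, in percent.
reduction = relative_reduction(29.38, 26.21)
```

Under this definition, moving from 29.38% to 26.21% WER is an absolute drop of 3.17 points, or roughly 10.8% relative to the first figure.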

Keywords

autism spectrum disorder; children speech recognition; deep neural network; language impairment; speech recognition; transfer learning

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Language: En | Journal: Interspeech | Year: 2019 | Document type: Article | Country of publication: France
