Vowel speech recognition from rat electroencephalography using long short-term memory neural network.
Ham, Jinsil; Yoo, Hyun-Joon; Kim, Jongin; Lee, Boreom.
Affiliation
  • Ham J; Department of Biomedical Science and Engineering (BMSE), Gwangju Institute of Science and Technology (GIST), Gwangju, South Korea.
  • Yoo HJ; Department of Physical Medicine and Rehabilitation, Korea University Anam Hospital, Korea University College of Medicine, Seoul, South Korea.
  • Kim J; Deepmedi Research Institute of Technology, Deepmedi Inc., Seoul, South Korea.
  • Lee B; Department of Biomedical Science and Engineering (BMSE), Gwangju Institute of Science and Technology (GIST), Gwangju, South Korea.
PLoS One; 17(6): e0270405, 2022.
Article in English | MEDLINE | ID: mdl-35737731
Over the years, considerable research has been conducted to investigate the mechanisms of speech perception and recognition. Electroencephalography (EEG) is a powerful tool for identifying brain activity and has therefore been widely used to determine the neural basis of speech recognition. For the classification of speech recognition in particular, deep learning-based approaches are in the spotlight because they can automatically learn and extract representative features through end-to-end learning. This study aimed to identify components potentially related to phoneme representation in the rat brain and to discriminate brain activity for each vowel stimulus on a single-trial basis using a bidirectional long short-term memory (BiLSTM) network and classical machine learning methods. Nineteen male Sprague-Dawley rats underwent microelectrode implantation surgery for recording EEG signals from the bilateral anterior auditory fields. Five vowel speech stimuli with markedly different formant frequencies were chosen: /a/, /e/, /i/, /o/, and /u/. EEG recorded during randomly presented vowel stimuli was minimally preprocessed and normalized by a z-score transformation before being used as input for classification. The BiLSTM network showed the best performance among the classifiers, achieving an overall accuracy of 75.18%, an F1-score of 0.75, and a Cohen's κ of 0.68 under 10-fold cross-validation. These results indicate that LSTM layers can effectively model sequential data such as EEG; hence, informative features can be derived through a BiLSTM trained end to end, without any additional hand-crafted feature extraction.
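For readers who want a concrete picture of the pipeline the abstract describes, the sketch below shows per-trial z-score normalization of EEG followed by a BiLSTM classifier evaluated with stratified 10-fold cross-validation, reporting overall accuracy, macro F1-score, and Cohen's κ. This is a minimal illustration, not the authors' code: the channel count, hidden size, training loop, and all variable names are assumptions.

    # Minimal sketch (PyTorch + scikit-learn), NOT the authors' released code.
    # Assumed data layout: X of shape (n_trials, time, channels), y in {0..4}
    # for the five vowels /a/, /e/, /i/, /o/, /u/.
    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import accuracy_score, f1_score, cohen_kappa_score

    class BiLSTMClassifier(nn.Module):
        def __init__(self, n_channels=2, hidden=64, n_classes=5):
            super().__init__()
            # bidirectional=True is the "Bi" in BiLSTM; forward and backward
            # hidden states are concatenated, hence 2 * hidden below.
            self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden,
                                batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * hidden, n_classes)

        def forward(self, x):            # x: (batch, time, channels)
            out, _ = self.lstm(x)
            return self.fc(out[:, -1])   # classify from the last time step

    def zscore(trials):
        # Per-trial, per-channel z-score transformation, as in the abstract.
        mu = trials.mean(axis=1, keepdims=True)
        sd = trials.std(axis=1, keepdims=True)
        return (trials - mu) / (sd + 1e-8)

    def evaluate(X, y, epochs=30, lr=1e-3):
        X = zscore(X).astype(np.float32)
        skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        accs, f1s, kappas = [], [], []
        for train_idx, test_idx in skf.split(X, y):
            model = BiLSTMClassifier(n_channels=X.shape[2])
            opt = torch.optim.Adam(model.parameters(), lr=lr)
            loss_fn = nn.CrossEntropyLoss()
            xb = torch.from_numpy(X[train_idx])
            yb = torch.from_numpy(y[train_idx]).long()
            for _ in range(epochs):      # plain full-batch training for brevity
                opt.zero_grad()
                loss = loss_fn(model(xb), yb)
                loss.backward()
                opt.step()
            with torch.no_grad():
                pred = model(torch.from_numpy(X[test_idx])).argmax(1).numpy()
            accs.append(accuracy_score(y[test_idx], pred))
            f1s.append(f1_score(y[test_idx], pred, average="macro"))
            kappas.append(cohen_kappa_score(y[test_idx], pred))
        return np.mean(accs), np.mean(f1s), np.mean(kappas)

Note that with five balanced vowel classes, chance accuracy is 20%, so the reported 75.18% accuracy and κ of 0.68 both indicate classification substantially above chance.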
Subject(s): Speech Perception

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Speech Perception Study type: Prognostic_studies Limit: Animals Language: En Journal: PLoS One Journal subject: SCIENCE / MEDICINE Year: 2022 Document type: Article Affiliation country: South Korea Publication country: United States
