Graph-Based Audio Classification Using Pre-Trained Models and Graph Neural Networks.
Castro-Ospina, Andrés Eduardo; Solarte-Sanchez, Miguel Angel; Vega-Escobar, Laura Stella; Isaza, Claudia; Martínez-Vargas, Juan David.
Affiliations
  • Castro-Ospina AE; Grupo de Investigación Máquinas Inteligentes y Reconocimiento de Patrones, Instituto Tecnológico Metropolitano, Medellín 050013, Colombia.
  • Solarte-Sanchez MA; Grupo de Investigación Máquinas Inteligentes y Reconocimiento de Patrones, Instituto Tecnológico Metropolitano, Medellín 050013, Colombia.
  • Vega-Escobar LS; Grupo de Investigación Máquinas Inteligentes y Reconocimiento de Patrones, Instituto Tecnológico Metropolitano, Medellín 050013, Colombia.
  • Isaza C; SISTEMIC, Electronic Engineering Department, Universidad de Antioquia-UdeA, Medellín 050010, Colombia.
  • Martínez-Vargas JD; GIDITIC, Universidad EAFIT, Medellín 050022, Colombia.
Sensors (Basel) ; 24(7)2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38610318
ABSTRACT
Sound classification plays a crucial role in enhancing the interpretation, analysis, and use of acoustic data, leading to a wide range of practical applications, of which environmental sound analysis is one of the most important. In this paper, we explore the representation of audio data as graphs in the context of sound classification. We propose a methodology that leverages pre-trained audio models to extract deep features from audio files, which are then employed as node information to build graphs. Subsequently, we train various graph neural networks (GNNs), specifically graph convolutional networks (GCNs), GraphSAGE, and graph attention networks (GATs), to solve multi-class audio classification problems. Our findings underscore the effectiveness of employing graphs to represent audio data. Moreover, they highlight the competitive performance of GNNs in sound classification endeavors, with the GAT model emerging as the top performer, achieving a mean accuracy of 83% in classifying environmental sounds and 91% in identifying the land cover of a site based on its audio recording. In conclusion, this study provides novel insights into the potential of graph representation learning techniques for analyzing audio data.
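The pipeline described above (deep features from a pre-trained audio model used as node attributes, a graph built over them, then a GNN classifier) can be sketched minimally as follows. This is an illustrative assumption-laden sketch, not the authors' implementation: random vectors stand in for real pre-trained embeddings, the graph is a hypothetical cosine k-NN graph, and a single GCN propagation step in NumPy stands in for the trained GCN/GraphSAGE/GAT models.

```python
# Sketch of the abstract's pipeline: features -> graph -> GNN layer.
# Random 8-dim vectors stand in for pre-trained audio embeddings.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))  # 6 audio clips, 8-dim node features

def knn_adjacency(X, k=2):
    """Symmetric k-NN adjacency over cosine similarity (an assumed graph-construction choice)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)          # exclude self-similarity
    A = np.zeros_like(S)
    for i in range(len(S)):
        A[i, np.argsort(S[i])[-k:]] = 1.0  # link each node to its k nearest
    return np.maximum(A, A.T)              # symmetrize

def gcn_layer(A, X, W):
    """One GCN propagation step: ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + np.eye(len(A))             # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

A = knn_adjacency(X)
W = rng.normal(size=(8, 4))  # untrained weights, for shape illustration only
H = gcn_layer(A, X, W)
print(H.shape)  # (6, 4): one 4-dim hidden representation per audio clip
```

In practice the node features would come from a pre-trained audio model and the GNN weights would be learned end-to-end for the multi-class labels; this sketch only shows the data flow the abstract describes.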
Keywords

Full text: 1 Collections: 01-international Database: MEDLINE Language: En Journal: Sensors (Basel) Publication year: 2024 Document type: Article Country of affiliation: Colombia Country of publication: Switzerland