EvoImp: Multiple Imputation of Multi-label Classification data with a genetic algorithm.

Jacob Junior, Antonio Fernando Lavareda; do Carmo, Fabricio Almeida; de Santana, Adamo Lima; Santana, Ewaldo Eder Carvalho; Lobato, Fabio Manoel Franca

Affiliation

Jacob Junior AFL; Graduate Program in Electrical Engineering (PPGEE), Federal University of Maranhão (UFMA), São Luís, Maranhão, Brazil.
do Carmo FA; Graduate Program in Computer Engineering and Systems (PECS), State University of Maranhão (UEMA), São Luís, Maranhão, Brazil.
de Santana AL; Graduate Program in Computer Engineering and Systems (PECS), State University of Maranhão (UEMA), São Luís, Maranhão, Brazil.
Santana EEC; Corporate R&D Headquarters, Fuji Electric Co., Tokyo, Japan.
Lobato FMF; Graduate Program in Electrical Engineering (PPGEE), Federal University of Maranhão (UFMA), São Luís, Maranhão, Brazil.

PLoS One ; 19(1): e0297147, 2024.

Article in English | MEDLINE | ID: mdl-38241256

ABSTRACT

Missing data is a prevalent problem that requires attention because most data analysis techniques cannot handle it. The problem is particularly critical in Multi-Label Classification (MLC), where only a few studies have investigated missing data. MLC differs from Single-Label Classification (SLC) by allowing an instance to be associated with multiple classes. Movie classification is a didactic example, since a movie can be both "drama" and "biography". One of the most common treatments for missing data is imputation, which seeks plausible values to fill in the missing ones. In this scenario, we propose a novel imputation method based on a multi-objective genetic algorithm that optimizes multiple data imputations, called Multiple Imputation of Multi-label Classification data with a genetic algorithm, or simply EvoImp. We applied the proposed method to multi-label learning and evaluated its performance on six synthetic databases under various missing-value distribution scenarios. The method was compared with other state-of-the-art imputation strategies, such as K-Means Imputation (KMI) and weighted K-Nearest Neighbors Imputation (WKNNI). The results showed that the proposed method outperformed the baselines in all scenarios, achieving the best Exact Match, Accuracy, and Hamming Loss values. The superior results were consistent across dataset domains and sizes, demonstrating the robustness of EvoImp. Thus, EvoImp represents a feasible solution for treating missing data in multi-label learning.
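The three evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the code below assumes the definitions commonly used in the multi-label literature: Exact Match as the fraction of instances whose whole label set is predicted correctly, Accuracy as the per-instance ratio of correctly predicted labels to the union of true and predicted labels (averaged over instances), and Hamming Loss as the fraction of individual label assignments that are wrong. The function names, the label names, and the toy data are hypothetical and are not taken from the article.

```python
from typing import Sequence

# A label matrix: one row per instance, one 0/1 entry per label.
LabelMatrix = Sequence[Sequence[int]]


def exact_match(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of instances whose full label vector is predicted exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


def hamming_loss(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of individual (instance, label) cells that are wrong."""
    n_cells = len(y_true) * len(y_true[0])
    wrong = sum(
        1
        for t, p in zip(y_true, y_pred)
        for ti, pi in zip(t, p)
        if ti != pi
    )
    return wrong / n_cells


def accuracy(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Mean per-instance |true AND pred| / |true OR pred| (assumed definition)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for ti, pi in zip(t, p) if ti == 1 and pi == 1)
        union = sum(1 for ti, pi in zip(t, p) if ti == 1 or pi == 1)
        # Convention assumed here: two empty label sets count as a perfect match.
        total += 1.0 if union == 0 else inter / union
    return total / len(y_true)


# Hypothetical toy data: columns are the labels (drama, comedy, thriller).
y_true = [[1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1]]
y_pred = [[1, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 0]]

print("Exact Match :", exact_match(y_true, y_pred))
print("Accuracy    :", accuracy(y_true, y_pred))
print("Hamming Loss:", hamming_loss(y_true, y_pred))
```

Higher Exact Match and Accuracy are better, while lower Hamming Loss is better, so an imputation method is preferable when a classifier trained on its imputed data moves the first two up and the third down.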

Subjects

Algorithms; Research Design; Cluster Analysis; Databases, Factual

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Research Design / Algorithms Type of study: Prognostic_studies Language: En Journal: PLoS One Journal subject: SCIENCE / MEDICINE Year of publication: 2024 Document type: Article Affiliation country: Brazil Country of publication: United States