Utilizing Nearest-Neighbor Clustering for Addressing Imbalanced Datasets in Bioengineering.

Huang, Chih-Ming; Lin, Chun-Hung; Hung, Chuan-Sheng; Zeng, Wun-Hui; Zheng, You-Cheng; Tsai, Chih-Min

Affiliation

Huang CM; Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 833, Taiwan.
Lin CH; Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 833, Taiwan.
Hung CS; Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 833, Taiwan.
Zeng WH; Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 833, Taiwan.
Zheng YC; Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 833, Taiwan.
Tsai CM; Division of Cardiology, Department of Internal Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung 833, Taiwan.

Bioengineering (Basel); 11(4), 2024 Mar 31.

Article in English | MEDLINE | ID: mdl-38671767

ABSTRACT

Imbalanced classification is common in settings such as fault diagnosis, intrusion detection, and medical diagnosis, where abnormal data are difficult to obtain. This article treats the task as a one-class problem, implementing and refining the One-Class Nearest-Neighbor (OCNN) algorithm. The original inter-quartile range mechanism is replaced with the K-means with outlier removal (KMOR) algorithm to identify outliers in the target class efficiently. Parameters are then optimized by treating these outliers as non-target-class samples. A new algorithm, Location-based Nearest-Neighbor (LBNN), clusters the one-class training data using KMOR and calculates the farthest distance and percentile for each test data point to determine whether it belongs to the target class. Experiments cover parameter studies, validation on eight standard imbalanced datasets from KEEL, and three applications to real imbalanced medical datasets. The results show better precision, recall, and G-means than traditional classification models, indicating that the approach is effective for imbalanced data.
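The abstract does not give the details of KMOR or LBNN, so the following is only a minimal sketch of the general idea: cluster the target-class training data, set outliers aside, and accept a test point when it lies within a distance threshold of its nearest cluster. It is not the authors' implementation. The function names (`fit_one_class`, `predict`, `g_mean`), the parameters `gamma` and `q`, the outlier rule (a point is an outlier if it is farther than `gamma` times the mean inlier distance from its nearest center), and the toy data are all assumptions made for illustration. The G-mean is computed with the common definition, the square root of sensitivity times specificity.

```python
import numpy as np

def fit_one_class(X, k, gamma=3.0, q=100.0, n_iter=20):
    """Cluster target-class data and derive one distance threshold per cluster.

    Assumed, simplified stand-in for KMOR: a k-means-style loop in which points
    farther than gamma * (mean inlier distance) from every center are outliers.
    Requires gamma >= 1 so that at least one inlier always remains.
    """
    X = np.asarray(X, dtype=float)
    # Deterministic farthest-first initialisation (an illustrative choice).
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)

    inlier = np.ones(len(X), dtype=bool)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        nearest = dist.min(axis=1)
        # Flag points that are far from all centers as outliers.
        inlier = nearest <= gamma * nearest[inlier].mean()
        # Update each center from its inlier members only.
        for j in range(k):
            members = X[inlier & (labels == j)]
            if len(members):
                centers[j] = members.mean(axis=0)

    # Recompute assignments against the final centers.
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels, nearest = dist.argmin(axis=1), dist.min(axis=1)
    # Threshold per cluster: the q-th percentile of inlier distances
    # (q=100 is the farthest inlier distance).
    radii = np.zeros(k)
    for j in range(k):
        d_j = nearest[inlier & (labels == j)]
        if len(d_j):
            radii[j] = np.percentile(d_j, q)
    return centers, radii, inlier

def predict(X, centers, radii):
    """Return True where a point lies within the threshold of its nearest center."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    return dist.min(axis=1) <= radii[labels]

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (target class = True)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    sensitivity = (y_pred & y_true).sum() / y_true.sum()
    specificity = (~y_pred & ~y_true).sum() / (~y_true).sum()
    return float(np.sqrt(sensitivity * specificity))

# Toy target-class data: two tight groups plus one stray point at (5, 5).
X_train = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (5, 5)]
centers, radii, inlier = fit_one_class(X_train, k=2)

# Toy test set: three target-class points followed by two non-target points.
X_test = [(0.5, 0.6), (10.2, 10.9), (1.2, 1.2), (5, 5), (20, 20)]
y_true = [1, 1, 1, 0, 0]
pred = predict(X_test, centers, radii)
score = g_mean(y_true, pred)
print(inlier, pred, score)
```

Lowering `q` below 100 tightens each cluster's threshold, which in this sketch trades recall on the target class for fewer false acceptances; how the published method balances the two is described in the full text, not here.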

Keywords

K-means with outlier removal (KMOR); Location-based Nearest Neighbor (LBNN); One-Class Nearest-Neighbor (OCNN)

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Language: En | Journal: Bioengineering (Basel) | Year: 2024 | Document type: Article | Country of affiliation: Taiwan | Country of publication: Switzerland