Balancing effort and benefit of K-means clustering algorithms in Big Data realms.
PLoS One; 13(9): e0201874, 2018.
Article in En | MEDLINE | ID: mdl-30183705
In this paper we propose a criterion to balance the processing time and the solution quality of k-means clustering algorithms when applied to instances where the number n of objects is large. Most known strategies for improving the performance of k-means algorithms target the initialization or classification steps. In contrast, our criterion applies to the convergence step: the process stops as soon as the number of objects that change their assigned cluster in an iteration falls below a given threshold. Through computer experimentation with synthetic and real instances, we found that a threshold close to 0.03n involves a decrease in computing time by a factor of about 4/100, yielding solutions whose quality is reduced by less than two percent. These findings suggest the usefulness of our criterion in Big Data realms.
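The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain Lloyd-style k-means with random initialization, and the names `kmeans_early_stop`, `threshold_fraction`, `max_iter`, and `seed` are hypothetical. The only part taken from the abstract is the convergence test, which stops once fewer than `threshold_fraction * n` objects change cluster in an iteration (0.03n being the value the abstract reports).

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_early_stop(points, k, threshold_fraction=0.03, max_iter=100, seed=0):
    """Lloyd-style k-means with a reassignment-count stopping rule (sketch).

    Assumption: random initialization from the data; the abstract does not
    specify the initialization or update details.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    min_changes = threshold_fraction * len(points)
    iterations = 0

    for iterations in range(1, max_iter + 1):
        # Classification step: assign each object to its nearest centroid
        # and count how many objects changed cluster.
        changes = 0
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            if nearest != labels[i]:
                labels[i] = nearest
                changes += 1

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

        # Convergence step: stop when no object moved (classic rule) or when
        # fewer than threshold_fraction * n objects moved (early-stop rule).
        if changes == 0 or changes < min_changes:
            break

    return labels, centroids, iterations


# Usage: three synthetic Gaussian blobs, compared with and without early stopping.
rng = random.Random(42)
data = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        for cx, cy in [(0, 0), (6, 6), (0, 6)] for _ in range(300)]
_, _, full_iters = kmeans_early_stop(data, k=3, threshold_fraction=0.0)
_, _, early_iters = kmeans_early_stop(data, k=3, threshold_fraction=0.03)
print("iterations without early stop:", full_iters)
print("iterations with 0.03n threshold:", early_iters)
```

With the same seed, both runs follow the same trajectory, so the thresholded run can never take more iterations than the unthresholded one. How much time it saves and how much quality it costs depends on the data; the figures in the abstract come from the authors' experiments, not from this sketch.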
Full text:
1
Collections:
01-internacional
Database:
MEDLINE
Main subject:
Algorithms
/
Computer Simulation
/
Cluster Analysis
/
Data Interpretation, Statistical
Study type:
Prognostic_studies
/
Risk_factors_studies
Language:
En
Journal:
PLoS One
Journal subject:
SCIENCE
/
MEDICINE
Publication year:
2018
Document type:
Article
Affiliation country:
Mexico
Publication country:
United States