Search | BVS Regional Portal

Feature selection reveal peripheral blood parameter's changes between COVID-19 infections patients from Brazil and Ecuador.

Feltes, Bruno César; Vieira, Igor Araújo; Parraga-Alava, Jorge; Meza, Jaime; Portmann, Edy; Terán, Luis; Dorn, Márcio.

Infect Genet Evol ; 98: 105228, 2022 Mar.

Article in English | MEDLINE | ID: mdl-35104680

ABSTRACT

The investigation of conventional complete blood-count (CBC) data for classifying the SARS-CoV-2 infection status became a topic of interest, particularly as a complementary laboratory tool in developing and third-world countries that financially struggled to test their population. Although hematological parameters in COVID-19-affected individuals from Asian and USA populations are available, there are no descriptions of comparative analyses of CBC findings between COVID-19 positive and negative cases from Latin American countries. In this sense, machine learning techniques have been employed to examine CBC data and aid in screening patients suspected of SARS-CoV-2 infection. In this work, we used machine learning to compare CBC data between two highly genetically distinguished Latin American countries: Brazil and Ecuador. We notice a clear distribution pattern of positive and negative cases between the two countries. Interestingly, almost all red blood cell count parameters were divergent. For males, neutrophils and lymphocytes are distinct between Brazil and Ecuador, while eosinophils are distinguished for females. Finally, neutrophils, lymphocytes, and monocytes displayed a particular distribution for both genders. Therefore, our findings demonstrate that the same set of CBC features relevant to one population is unlikely to apply to another. This is the first study to compare CBC data from two genetically distinct Latin American countries.

Subjects

COVID-19/blood; COVID-19/physiopathology; Hematologic Tests/methods; Hematologic Tests/statistics & numerical data; Mass Screening/methods; Mass Screening/statistics & numerical data; SARS-CoV-2/pathogenicity; Adult; Aged; Aged, 80 and over; Brazil/epidemiology; Ecuador/epidemiology; Female; Humans; Male; Middle Aged
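The abstract does not spell out the feature-selection procedure. As a hedged illustration only, one simple filter-style approach ranks each CBC parameter by the absolute standardized mean difference (Cohen's d with a pooled standard deviation) between two groups, such as patients from two countries. This is an assumed technique chosen for illustration, not the authors' method, and the function names and dictionary input format are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Standardized mean difference between two samples (pooled SD).

    Illustrative filter score only; not the method used in the study.
    """
    na, nb = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def rank_features(group_a, group_b):
    """Rank features by |d|, largest first.

    group_a, group_b: hypothetical format, dict of feature name -> list of values.
    """
    scores = {f: abs(cohens_d(group_a[f], group_b[f])) for f in group_a}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage with made-up toy values (not study data):
toy_a = {"x": [1, 2, 3], "y": [10, 11, 12]}
toy_b = {"x": [1, 2, 3], "y": [20, 21, 22]}
print(rank_features(toy_a, toy_b))  # -> [('y', 10.0), ('x', 0.0)]
```

Each feature needs at least two values per group, and a parameter with zero variance in both groups makes the pooled standard deviation zero, which raises ZeroDivisionError. A univariate filter like this ignores interactions between parameters, which model-based selection methods can capture.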

UrbangEnCy: An emergency events dataset based on citizen sensors for monitoring urban scenarios in Ecuador.

Parraga-Alava, Jorge; Alcivar-Cevallos, Roberth; Vaca-Cardenas, Leticia; Meza, Jaime.

Data Brief ; 34: 106693, 2021 Feb.

Article in English | MEDLINE | ID: mdl-33490324

ABSTRACT

Recently, the use of citizen sensors (people generating and sharing real data through social media) for detecting and disseminating emergency events in real time has increased considerably, because people at the site of an event, as well as elsewhere, can quickly post relevant information about such alerts. Here, we present an emergency events dataset called UrbangEnCy. The dataset contains over 25,500 Spanish-language texts posted on Twitter from January 19th to August 19th, 2020, with emergency- and non-emergency-related content in Ecuador. We obtained, cleaned, and filtered these tweets, and then selected the location and temporal data as well as the tweet content. In addition, the dataset includes annotations of the type of tweet (emergency/non-emergency), as well as additional nomenclature used to describe emergencies by Ecuador's Center for immediate response service to emergencies (ECU 911) and by international emergency services agencies (ESAs). The UrbangEnCy dataset facilitates the evaluation of data science, machine learning, and natural language processing algorithms applied to supervised and unsupervised problems related to text mining and pattern recognition. The dataset is freely and publicly available at https://doi.org/10.17632/4x37zz82k8.

Influence of the GO-based semantic similarity measures in multi-objective gene clustering algorithm performance.

Parraga-Alava, Jorge; Inostroza-Ponta, Mario.

J Bioinform Comput Biol ; 18(6): 2050038, 2020 Dec.

Article in English | MEDLINE | ID: mdl-33148094

ABSTRACT

Using prior biological knowledge of relationships and genetic functions for gene similarity, from repositories such as the Gene Ontology (GO), has shown good results in multi-objective gene clustering algorithms. In this scenario, and to obtain useful clustering results, it would be helpful to know which measure of biological similarity between genes should be employed to yield meaningful clusters that have both similar expression patterns (co-expression) and biological homogeneity. In this paper, we studied the influence of the four most used GO-based semantic similarity measures on the performance of a multi-objective gene clustering algorithm. We used four publicly available datasets and carried out comparative studies based on performance metrics from the multi-objective optimization field and on clustering performance indexes. In most cases, the Jiang-Conrath and Wang similarities stand out in terms of multi-objective metrics. In terms of clustering properties, the Resnik similarity achieves the best values of compactness and separation, and therefore of co-expression of groups of genes. Meanwhile, in terms of biological homogeneity, the Wang similarity yields a greater number of significant GO terms. However, statistical, visual, and biological significance tests showed that none of the GO-based semantic similarity measures stands out above the rest in significantly improving the performance of the multi-objective gene clustering algorithm.

Subjects

Algorithms; Multigene Family; Cluster Analysis; Computational Biology; Databases, Genetic/statistics & numerical data; Gene Ontology/statistics & numerical data; Semantics; Transcriptome
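Two of the measures compared above, Resnik and Jiang-Conrath, are information-content (IC) measures, which can be sketched on a toy ontology. In their usual definitions, IC(t) = -log p(t), where p(t) is the probability that a gene is annotated with term t or one of its descendants; the Resnik similarity is the IC of the most informative common ancestor (MICA), and the Jiang-Conrath distance is IC(t1) + IC(t2) - 2 * IC(MICA). The terms and probabilities below are hypothetical, not real GO data, and the graph-based Wang measure is not shown.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (not real GO IDs).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
}

# Hypothetical annotation probabilities p(t); they never decrease toward the root.
PROB = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def ic(term):
    """Information content: -log p(t). Rarer terms are more informative."""
    return -math.log(PROB[term])


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor (MICA)."""
    return max(ic(t) for t in ancestors(t1) & ancestors(t2))


def jiang_conrath_distance(t1, t2):
    """Jiang-Conrath distance: IC(t1) + IC(t2) - 2 * IC(MICA); 0 for identical terms."""
    return ic(t1) + ic(t2) - 2 * resnik(t1, t2)


# Usage: siblings under "A" share more information than terms meeting only at the root.
print(resnik("A1", "A2"), resnik("A1", "B"))
print(jiang_conrath_distance("A1", "A2"))
```

Implementations differ in how they convert the Jiang-Conrath distance into a similarity and in how they combine term-level scores into a gene-level score, so the numbers from this sketch are not directly comparable with any particular library.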

A data set for electric power consumption forecasting based on socio-demographic features: Data from an area of southern Colombia.

Parraga-Alava, Jorge; Moncayo-Nacaza, Jorge Dario; Revelo-Fuelagán, Javier; Rosero-Montalvo, Paul D; Anaya-Isaza, Andrés; Peluffo-Ordóñez, Diego Hernán.

Data Brief ; 29: 105246, 2020 Apr.

Article in English | MEDLINE | ID: mdl-32083158

ABSTRACT

In this article, we introduce a data set of electric-power-consumption-related features registered in seven main municipalities of Nariño, Colombia, from December 2010 to May 2016. The data set consists of 4427 socio-demographic characteristics and 7 measured power-consumption values. Data were collected entirely by the company Centrales Eléctricas de Nariño (CEDENAR) from client consumption records. Power consumption data were collected manually: company workers registered the readings (measured in kWh) reported by the electric energy meters installed at each house or building. The released data set is intended to give researchers suitable input for designing and assessing forecasting, modelling, simulation, and optimization approaches to electric power consumption prediction and characterization problems. The data set, named PCSTCOL in shorthand, is freely and publicly available at https://doi.org/10.17632/xbt7scz5ny.3.

RoCoLe: A robusta coffee leaf images dataset for evaluation of machine learning based methods in plant diseases recognition.

Parraga-Alava, Jorge; Cusme, Kevin; Loor, Angélica; Santander, Esneider.

Data Brief ; 25: 104414, 2019 Aug.

Article in English | MEDLINE | ID: mdl-31516934

ABSTRACT

In this article, we introduce a robusta coffee leaf image dataset called RoCoLe. The dataset contains 1560 leaf images: images with visible red mites and spots (indicating the presence of coffee leaf rust) for infected cases, and images without such structures for healthy cases. In addition, the dataset includes annotations for objects (leaves), state (healthy or unhealthy), and disease severity (leaf area with spots). All images were obtained under real-world conditions in the same field of coffee plants using a smartphone camera. The RoCoLe dataset facilitates evaluating the performance of machine learning algorithms for image segmentation and classification problems related to plant disease recognition. The dataset is freely and publicly available at https://doi.org/10.17632/c5yvn32dzg.2.
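The severity annotation described above (leaf area with spots) can be read as the fraction of leaf pixels covered by rust spots. A minimal sketch, assuming binary leaf and spot masks stored as equal-sized 2-D lists of 0/1; the mask format and the function name are hypothetical and may not match how RoCoLe actually stores its annotations, which may also bin severity into classes.

```python
def severity(leaf_mask, spot_mask):
    """Fraction of leaf pixels that are also spot pixels (hypothetical mask format).

    leaf_mask, spot_mask: equal-sized 2-D lists of 0/1 values.
    """
    leaf = spots = 0
    for leaf_row, spot_row in zip(leaf_mask, spot_mask):
        for is_leaf, is_spot in zip(leaf_row, spot_row):
            if is_leaf:
                leaf += 1
                # Only count spots that lie on the leaf itself.
                if is_spot:
                    spots += 1
    if leaf == 0:
        raise ValueError("mask contains no leaf pixels")
    return spots / leaf


# Usage: one spotted pixel on a four-pixel leaf.
print(severity([[1, 1], [1, 1]], [[1, 0], [0, 0]]))  # -> 0.25
```

Spot pixels outside the leaf are ignored, so segmentation noise on the background does not inflate the score.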

A multi-objective gene clustering algorithm guided by apriori biological knowledge with intensification and diversification strategies.

Parraga-Alava, Jorge; Dorn, Marcio; Inostroza-Ponta, Mario.

BioData Min ; 11: 16, 2018.

Article in English | MEDLINE | ID: mdl-30100924

ABSTRACT

BACKGROUND: Biologists aim to understand the genetic background of diseases, metabolic disorders, and other genetic conditions. Microarrays are one of the main high-throughput technologies for collecting information about the behaviour of genetic information under different conditions. Clustering is one of the main techniques for analysing these data; it aims to find groups of genes that share some criterion, such as a similar expression profile. However, the problem of finding groups is normally multidimensional, which makes it necessary to approach clustering as a multi-objective problem in which various cluster validity indexes are optimised simultaneously. These indexes are usually based on criteria such as compactness and separation, which may not be sufficient because they cannot guarantee clusters that have both similar expression patterns and biological coherence. METHOD: We propose a Multi-Objective Clustering algorithm Guided by a-Priori Biological Knowledge (MOC-GaPBK) to find clusters of genes with high levels of co-expression, biological coherence, and also good compactness and separation. Cluster quality indexes are used to simultaneously optimise gene relationships at the expression level and biological functionality. Our proposal also includes intensification and diversification strategies to improve the search process. RESULTS: The effectiveness of the proposed algorithm is demonstrated on four publicly available datasets. Comparative studies using different objective functions and other widely used microarray clustering techniques are reported. Statistical, visual, and biological significance tests are carried out to show the superiority of the proposed algorithm.
CONCLUSIONS: Integrating a-priori biological knowledge into a multi-objective approach and using intensification and diversification strategies allow the proposed algorithm to find solutions with higher quality than other microarray clustering techniques available in the literature in terms of co-expression, biological coherence, compactness and separation.
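In the multi-objective formulation described above, candidate clusterings are compared on several validity indexes at once, so no single score ranks them. A common way to compare such candidates is Pareto dominance. The sketch below is a generic illustration, not MOC-GaPBK itself: it assumes every objective is minimised (a maximised index such as separation is negated) and keeps only the non-dominated objective vectors. The example values are made up.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one.

    All objectives are assumed to be minimised.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(solutions):
    """Keep the objective vectors that no other vector dominates.

    A vector never dominates itself, so it is safe to compare against the full list.
    """
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Usage: each tuple is (compactness, -separation) for a hypothetical clustering.
candidates = [(1.0, -5.0), (2.0, -6.0), (2.5, -4.0)]
print(pareto_front(candidates))  # -> [(1.0, -5.0), (2.0, -6.0)]
```

The third candidate is dropped because the first is both more compact and better separated; the remaining two trade compactness against separation, which is why a multi-objective search returns a set of solutions rather than a single best clustering.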
