Search | BVS Regional Portal

1.

Analysis of Variance Combined with Optimized Gradient Boosting Machines for Enhanced Load Recognition in Home Energy Management Systems.

Cabral, Thales W; Neto, Fernando B; de Lima, Eduardo R; Fraidenraich, Gustavo; Meloni, Luís G P.

Sensors (Basel) ; 24(15)2024 Jul 31.

Article in English | MEDLINE | ID: mdl-39124011

ABSTRACT

Load recognition has not yet been comprehensively explored in Home Energy Management Systems (HEMSs). Current approaches to load recognition leave gaps, such as enhancing appliance identification and increasing the overall performance of the load-recognition system through more robust models. To address these gaps, we propose a novel approach based on the Analysis of Variance (ANOVA) F-test combined with SelectKBest and gradient-boosting machines (GBMs) for load recognition. The proposed approach improves feature selection and consequently aids inter-class separability. Further, we optimized GBM models, such as the histogram-based gradient-boosting machine (HistGBM), light gradient-boosting machine (LightGBM), and extreme gradient boosting (XGBoost), to create a more reliable load-recognition system. Our findings reveal that the ANOVA-GBM approach achieves greater efficiency in training time, even when compared to Principal Component Analysis (PCA) and a higher number of features. ANOVA-XGBoost is approximately 4.31 times faster than PCA-XGBoost, ANOVA-LightGBM is about 5.15 times faster than PCA-LightGBM, and ANOVA-HistGBM is 2.27 times faster than PCA-HistGBM. The general performance results show the impact of the approach on the overall performance of the load-recognition system. Some of the key results show that the ANOVA-LightGBM pair reached 96.42% accuracy, 96.27% F1, and a Kappa index of 0.9404; the ANOVA-HistGBM combination achieved 96.64% accuracy, 96.48% F1, and a Kappa index of 0.9434; and the ANOVA-XGBoost pair attained 96.75% accuracy, 96.64% F1, and a Kappa index of 0.9452. These results outperform rival methods from the literature. In addition, the accuracy gain of the proposed approach is prominent when compared directly with its competitors. The highest accuracy gains were 13.09, 13.31, and 13.42 percentage points (pp) for the pairs ANOVA-LightGBM, ANOVA-HistGBM, and ANOVA-XGBoost, respectively. These significant improvements highlight the effectiveness and refinement of the proposed approach.
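The feature-selection step named in this abstract (ANOVA F-test scores fed to a keep-the-top-k selector) can be sketched in a few lines. The version below is a minimal standard-library illustration, not the authors' pipeline: the function names `anova_f` and `select_k_best`, the toy feature values, and the appliance labels are all assumptions made for the example. scikit-learn offers `f_classif` and `SelectKBest` for the same purpose.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    # Between-class variability: distance of each class mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-class variability: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(rows, labels, k):
    """Return the indices of the k highest-scoring features and all scores."""
    n_features = len(rows[0])
    scores = [anova_f([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Hypothetical toy data: feature 0 separates the two appliances, feature 1 does not.
rows = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.1, 4.0], [2.9, 5.0], [3.0, 3.0]]
labels = ["kettle", "kettle", "kettle", "fridge", "fridge", "fridge"]
selected, scores = select_k_best(rows, labels, k=1)
print(selected, scores)
```

A feature whose class means are far apart relative to its within-class spread gets a large F value, which is why this kind of selection tends to help inter-class separability before the boosting model is trained.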

2.

Predictors of in-ICU length of stay among congenital heart defect patients using artificial intelligence model: A pilot study.

Chang Junior, João; Caneo, Luiz Fernando; Turquetto, Aida Luiza Ribeiro; Amato, Luciana Patrick; Arita, Elisandra Cristina Trevisan Calvo; Fernandes, Alfredo Manoel da Silva; Trindade, Evelinda Marramon; Jatene, Fábio Biscegli; Dossou, Paul-Eric; Jatene, Marcelo Biscegli.

Heliyon ; 10(4): e25406, 2024 Feb 29.

Article in English | MEDLINE | ID: mdl-38370176

ABSTRACT

Objective: This study aims to develop a predictive model using artificial intelligence to estimate the ICU length of stay (LOS) for Congenital Heart Defects (CHD) patients after surgery, improving care planning and resource management. Design: We analyze clinical data from 2240 CHD surgery patients to create and validate the predictive model. Twenty AI models are developed and evaluated for accuracy and reliability. Setting: The study is conducted in a Brazilian hospital's Cardiovascular Surgery Department, focusing on transplants and cardiopulmonary surgeries. Participants: Retrospective analysis is conducted on data from 2240 consecutive CHD patients undergoing surgery. Interventions: Ninety-three pre- and intraoperative variables are used as ICU LOS predictors. Measurements and main results: Using regression and clustering methodologies for ICU LOS estimation, the Light Gradient Boosting Machine, using regression, achieved a Mean Squared Error (MSE) of 15.4, 11.8, and 15.2 days for training, testing, and unseen data. Key predictors included metrics such as "Mechanical Ventilation Duration", "Weight on Surgery Date", and "Vasoactive-Inotropic Score". Meanwhile, the clustering model, CatBoost Classifier, attained an accuracy of 0.6917 and AUC of 0.8559 with similar key predictors. Conclusions: Patients with higher ventilation times, vasoactive-inotropic scores, anoxia time, cardiopulmonary bypass time, and lower weight, height, BMI, age, hematocrit, and presurgical oxygen saturation have longer ICU stays, aligning with existing literature.

3.

Prediction of Body Weight by Using PCA-Supported Gradient Boosting and Random Forest Algorithms in Water Buffaloes (Bubalus bubalis) Reared in South-Eastern Mexico.

Gomez-Vazquez, Armando; Tirink, Cem; Cruz-Tamayo, Alvar Alonzo; Cruz-Hernandez, Aldenamar; Camacho-Pérez, Enrique; Okuyucu, Ibrahim Cihangir; Sahin, Hasan Alp; Dzib-Cauich, Dany Alejandro; Gülboy, Ömer; Garcia-Herrera, Ricardo Alfonso; Chay-Canul, Alfonso J.

Animals (Basel) ; 14(2)2024 Jan 17.

Article in English | MEDLINE | ID: mdl-38254463

ABSTRACT

This study aims to use advanced machine learning techniques supported by Principal Component Analysis (PCA) to estimate body weight (BW) in buffaloes raised in southeastern Mexico and compare their performance. The first stage of the current study consists of body measurements and the process of determining the most informative variables using PCA, a dimension reduction method. This process reduces the data size by eliminating the complex structure of the model and provides a faster and more effective learning process. As a second stage, two separate prediction models were developed with Gradient Boosting and Random Forest algorithms, using the principal components obtained from the data set reduced by PCA. The performances of both models were compared using R², RMSE and MAE metrics, and showed that the Gradient Boosting model achieved a better prediction performance with a higher R² value and lower error rates than the Random Forest model. In conclusion, PCA-supported modeling applications can provide more reliable results, and the Gradient Boosting algorithm is superior to Random Forest in this context. The current study demonstrates the potential use of machine learning approaches in estimating body weight in water buffaloes, and will support sustainable animal husbandry by contributing to decision-making processes in the field of animal science.
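The three metrics used to compare the two models in this abstract have simple closed forms. The sketch below computes them with the standard library; the function name `regression_metrics` and the body-weight values are hypothetical and are not taken from the study.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    rmse = sqrt(ss_res / n)                           # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination
    return {"r2": r2, "rmse": rmse, "mae": mae}


# Hypothetical body weights in kg (observed) and model outputs (predicted).
observed = [400.0, 450.0, 500.0, 550.0]
predicted = [410.0, 440.0, 505.0, 545.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

A model is preferred when R² is higher and RMSE and MAE are lower, which is the comparison rule the abstract applies to Gradient Boosting versus Random Forest.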

4.

Machine learning algorithms accurately identify free-living marine nematode species.

Brito de Jesus, Simone; Vieira, Danilo; Gheller, Paula; Cunha, Beatriz P; Gallucci, Fabiane; Fonseca, Gustavo.

PeerJ ; 11: e16216, 2023.

Article in English | MEDLINE | ID: mdl-37842061

ABSTRACT

Background: Identifying species, particularly small metazoans, remains a daunting challenge and the phylum Nematoda is no exception. Typically, nematode species are differentiated based on morphometry and the presence or absence of certain characters. However, recent advances in artificial intelligence, particularly machine learning (ML) algorithms, offer promising solutions for automating species identification, especially in taxonomically complex groups. By training ML models with extensive datasets of accurately identified specimens, the models can learn to recognize patterns in nematodes' morphological and morphometric features. This enables them to make precise identifications of newly encountered individuals. Implementing ML algorithms can improve the speed and accuracy of species identification and allow researchers to efficiently process vast amounts of data. Furthermore, it empowers non-taxonomists to make reliable identifications. The objective of this study is to evaluate the performance of ML algorithms in identifying species of free-living marine nematodes, focusing on two well-known genera: Acantholaimus Allgén, 1933 and Sabatieria Rouville, 1903. Methods: A total of 40 species of Acantholaimus and 60 species of Sabatieria were considered. The measurements and identifications were obtained from the original publications of species for both genera; this compilation included information regarding the presence or absence of specific characters, as well as morphometric data. To assess the performance of the species identification, four ML algorithms were employed: Random Forest (RF), Stochastic Gradient Boosting (SGBoost), Support Vector Machine (SVM) with both linear and radial kernels, and K-nearest neighbor (KNN). Results: For both genera, the random forest (RF) algorithm demonstrated the highest accuracy in correctly classifying specimens into their respective species, achieving an accuracy rate of 93% for Acantholaimus and 100% for Sabatieria; only a single individual from Acantholaimus in the test data was misclassified. Conclusion: These results highlight the overall effectiveness of ML algorithms in species identification. Moreover, they demonstrate that the identification of marine nematodes can be automated, optimizing biodiversity and ecological studies and making species identification more accessible, efficient, and scalable. Ultimately, this will contribute to our understanding and conservation of biodiversity.

Subjects

Artificial Intelligence , Nematoda , Humans , Animals , Algorithms , Machine Learning , Chromadorea

5.

Who's your data? Primary immune deficiency differential diagnosis prediction via machine learning and data mining of the USIDNET registry.

Méndez Barrera, Jose Alfredo; Rocha Guzmán, Samuel; Hierro Cascajares, Elisa; Garabedian, Elizabeth K; Fuleihan, Ramsay L; Sullivan, Kathleen E; Lugo Reyes, Saul O.

Clin Immunol ; 255: 109759, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37678719

ABSTRACT

PURPOSE: There are currently more than 480 primary immune deficiency (PID) diseases and about 7000 rare diseases that together afflict around 1 in every 17 humans. Computational aids based on data mining and machine learning might facilitate the diagnostic task by extracting rules from large datasets and making predictions when faced with new problem cases. In a proof-of-concept data mining study, we aimed to predict PID diagnoses using a supervised machine learning algorithm based on classification tree boosting. METHODS: Through a data query at the USIDNET registry we obtained a database of 2396 patients with common diagnoses of PID, including their clinical and laboratory features. We kept 286 features and all 12 diagnoses to include in the model. We used the XGBoost package with parallel tree boosting for the supervised classification model, and SHAP for variable importance interpretation, on Python v3.7. The patient database was split into training and testing subsets, and after boosting through gradient descent, the predictive model provides measures of diagnostic prediction accuracy and individual feature importance. After a baseline performance test, we used the Class Weighting Hyperparameter, or scale_pos_weight to correct for imbalanced classification. RESULTS: The twelve PID diagnoses were CVID (1098 patients), DiGeorge syndrome, Chronic granulomatous disease, Congenital agammaglobulinemia, PID not otherwise classified, Specific antibody deficiency, Complement deficiency, Hyper-IgM, Leukocyte adhesion deficiency, ectodermal dysplasia with immune deficiency, Severe combined immune deficiency, and Wiskott-Aldrich syndrome. For CVID, the model found an accuracy on the train sample of 0.80, with an area under the ROC curve (AUC) of 0.80, and a Gini coefficient of 0.60. In the test subset, accuracy was 0.76, AUC 0.75, and Gini 0.51. 
The positive feature value to predict CVID was highest for upper respiratory infections, asthma, autoimmunity and hypogammaglobulinemia. Features with the highest negative predictive value were high IgE, growth delay, abscess, lymphopenia, and congenital heart disease. For the rest of the diagnoses, accuracy stayed between 0.75 and 0.99, AUC 0.46-0.87, Gini 0.07-0.75, and LogLoss 0.09-8.55. DISCUSSION: Clinicians should remember to consider the negative predictive features together with the positives. We are calling this a proof-of-concept study to continue with our explorations. A good performance is encouraging, and feature importance might aid feature selection for future endeavors. In the meantime, we can learn from the rules derived by the model and build a user-friendly decision tree to generate differential diagnoses.
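Two quantities in this abstract can be checked with short arithmetic. The Gini coefficient reported alongside AUC follows the usual relation Gini = 2 × AUC − 1, and a common starting value for XGBoost's `scale_pos_weight` is the ratio of negative to positive cases. The sketch below assumes a one-vs-rest framing for CVID and uses the full-registry counts from the abstract; whether the authors used this exact ratio, or computed it on the training split only, is not stated, and the helper names are hypothetical.

```python
def gini_from_auc(auc):
    """Gini coefficient implied by an area under the ROC curve."""
    return 2.0 * auc - 1.0


def negative_to_positive_ratio(n_positive, n_total):
    """Ratio often used as a starting point for XGBoost's scale_pos_weight."""
    return (n_total - n_positive) / n_positive


# AUC values reported in the abstract for CVID (train and test).
train_gini = gini_from_auc(0.80)
test_gini = gini_from_auc(0.75)

# Counts from the abstract: 1098 CVID patients out of 2396 (assumed one-vs-rest).
cvid_weight = negative_to_positive_ratio(1098, 2396)

print(round(train_gini, 2), round(test_gini, 2), round(cvid_weight, 3))
```

The training value reproduces the reported Gini of 0.60. The test value comes out at 0.50 rather than the reported 0.51, which is consistent with the AUC having been rounded to two decimals before publication.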

Subjects

Primary Immunodeficiency Diseases , Wiskott-Aldrich Syndrome , Humans , Differential Diagnosis , Machine Learning , Data Mining

6.

Risk Factors Associated with COVID-19 Lethality: A Machine Learning Approach Using Mexico Database.

Carvantes-Barrera, Alejandro; Díaz-González, Lorena; Rosales-Rivera, Mauricio; Chávez-Almazán, Luis A.

J Med Syst ; 47(1): 90, 2023 Aug 19.

Article in English | MEDLINE | ID: mdl-37597034

ABSTRACT

Identifying risk factors associated with COVID-19 lethality is crucial in combating the ongoing pandemic. In this study, we developed lethality predictive models for each epidemiological wave and for the overall dataset using the Extreme Gradient Boosting technique and analyzed them using Shapley values to determine the contribution levels of various features, including demographics, comorbidities, medical units, and recent medical information from confirmed COVID-19 cases in Mexico between February 23, 2020, and April 15, 2022. The results showed that pneumonia and advanced age were the most important factors predicting patient death in all cohorts. Additionally, the medical unit where the patient received care acted as a risk or protective factor. IMSS medical units were identified as high-risk factors in all cohorts, except in wave four, while SSA medical units generally were moderate protective factors. We also found that intubation was a high-risk factor in the first epidemiological wave and a moderate-risk factor in the following waves. Female gender was a protective factor of moderate-high importance in all cohorts, while being between 18 and 29 years old was a moderate protective factor and being between 50 and 59 years old was a moderate risk factor. Additionally, diabetes (all cohorts), obesity (third wave), and hypertension (fourth wave) were identified as moderate risk factors. Finally, residing in municipalities with the lowest Human Development Index level represented a moderate risk factor. In conclusion, this study identified several significant risk factors associated with COVID-19 lethality in Mexico, which could aid policymakers in developing targeted interventions to reduce mortality rates.
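Shapley values, used in this abstract to rank risk and protective factors, assign each feature its average marginal contribution over all orders in which features could be added. The sketch below computes them exactly for a tiny hand-written scoring function. The feature names, the weights, and the `toy_risk` function are hypothetical and do not come from the study; packages such as SHAP rely on much more efficient algorithms for real models.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_risk(present):
    """Hypothetical risk score with one interaction term (illustration only)."""
    score = 0.0
    if "pneumonia" in present:
        score += 0.30
    if "age_over_60" in present:
        score += 0.20
    if "pneumonia" in present and "age_over_60" in present:
        score += 0.10  # interaction shared equally by the two features
    if "female" in present:
        score -= 0.05  # negative contribution, i.e. a protective factor
    return score


features = ["pneumonia", "age_over_60", "female"]
phi = shapley_values(features, toy_risk)
print(phi)
```

A positive value marks a feature that pushes the prediction toward the outcome (a risk factor), and a negative value marks a protective factor. The values always sum to the difference between the score with all features present and the score with none.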

Subjects

COVID-19 , Humans , Female , Adolescent , Young Adult , Adult , Middle Aged , COVID-19/epidemiology , Mexico/epidemiology , Risk Factors , Obesity , Machine Learning

7.

Machine learning and comorbidity network analysis for hospitalized patients with COVID-19 in a city in Southern Brazil.

Passarelli-Araujo, Hemanoel; Passarelli-Araujo, Hisrael; Urbano, Mariana R; Pescim, Rodrigo R.

Smart Health (Amst) ; 26: 100323, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36159078

ABSTRACT

The large amount of data generated during the COVID-19 pandemic requires advanced tools for the long-term prediction of risk factors associated with COVID-19 mortality with higher accuracy. Machine learning (ML) methods directly address this topic and are essential tools to guide public health interventions. Here, we used ML to investigate the importance of demographic and clinical variables on COVID-19 mortality. We also analyzed how comorbidity networks are structured according to age groups. We conducted a retrospective study of COVID-19 mortality with hospitalized patients from Londrina, Parana, Brazil, registered in the database for severe acute respiratory infections (SIVEP-Gripe), from January 2021 to February 2022. We tested four ML models to predict the COVID-19 outcome: Logistic Regression, Support Vector Machine, Random Forest, and XGBoost. We also constructed a comorbidity network to investigate the impact of co-occurring comorbidities on COVID-19 mortality. Our study comprised 8358 hospitalized patients, of whom 2792 (33.40%) died. The XGBoost model achieved excellent performance (ROC-AUC = 0.90). Both permutation method and SHAP values highlighted the importance of age, ventilatory support status, and intensive care unit admission as key features in predicting COVID-19 outcomes. The comorbidity networks for old deceased patients are denser than those for young patients. In addition, the co-occurrence of heart disease and diabetes may be the most important combination to predict COVID-19 mortality, regardless of age and sex. This work presents a valuable combination of machine learning and comorbidity network analysis to predict COVID-19 outcomes. Reliable evidence on this topic is crucial for guiding the post-pandemic response and assisting in COVID-19 care planning and provision.

8.

Spatial patterns and determinants of avocado frontier dynamics in Mexico.

Ramírez-Mejía, Diana; Levers, Christian; Mas, Jean-François.

Reg Environ Change ; 22(1): 28, 2022.

Article in English | MEDLINE | ID: mdl-35250377

ABSTRACT

The surging demand for commodity crops has led to rapid and severe agricultural frontier expansion globally and has put producing regions increasingly under pressure. However, knowledge about spatial patterns of agricultural frontier dynamics, their leading spatial determinants, and socio-ecological trade-offs is often lacking, hindering contextualized decision making towards more sustainable food systems. Here, we used inventory data to map frontier dynamics of avocado production, a cash crop of increasing importance in global diets, for Michoacán, Mexico, before and after the implementation of the North American Free Trade Agreement (NAFTA). We compiled a set of environmental, accessibility and social variables and identified the leading determinants of avocado frontier expansion and their interactions using extreme gradient boosting. We predicted potential expansion patterns and assessed their impacts on areas important for biodiversity conservation. Avocado frontiers expanded more than tenfold from 12,909 ha (1974) to 152,493 ha (2011), particularly after NAFTA. Annual precipitation, distance to settlements, and land tenure were key factors explaining avocado expansion. Under favorable climatic and accessibility conditions, most avocado expansion occurred on private lands. In contrast, under suboptimal conditions, most avocado expansion occurred on communal lands. Large areas suitable for further avocado expansion overlapped with priority sites for restoration, highlighting an imminent conflict between conservation and economic revenues. This is the first analysis of avocado frontier dynamics and their spatial determinants across a major production region, and our results provide entry points to implement government-based strategies to support small-scale farmers, mostly those on communal lands, while trying to minimize the socio-environmental impacts of avocado production.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10113-022-01883-6.

9.

Design of Automatic Tool for Diagnosis of Pneumonia Using Boosting Techniques

Postalcioglu, Seda.

Braz. arch. biol. technol ; 65: e22210322, 2022. tables, graphs

Article in English | LILACS-Express | LILACS | ID: biblio-1364443

ABSTRACT

Covid-19 is today's pandemic disease and can cause hospital overcrowding. It also affects the lungs and may cause pneumonia. The most popular technique for diagnosing pneumonia is the evaluation of X-ray images. However, a sufficient number of radiologists is needed to interpret them. High rates of child deaths due to pneumonia have been reported. With an automatic diagnostic system, a diagnosis can be made quickly and treatment can then start rapidly. This study aims to diagnose pneumonia with an automatic tool based on boosting techniques, which reduces the workload of doctors and radiologists. Boosting techniques are a family of machine learning techniques. Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost) are used in the study. These techniques were chosen for their simulation duration in modeling and their convenience for real-time applications. L2 normalization and feature selection are applied to the data before the techniques are applied. A Random Forest classifier is used as the feature-selection estimator. After modeling, the Categorical Boosting algorithm was observed to be faster than the other techniques, with a simulation duration of 0.7 seconds. With this automatic tool, the user can upload the desired X-ray image to the system and read the result on the screen without a radiologist or doctor.
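The L2 normalization mentioned in this abstract rescales a feature vector so that its Euclidean length is 1. The sketch below assumes the normalization is applied per sample (row-wise), which the abstract does not specify; the function name `l2_normalize` and the input vector are hypothetical.

```python
from math import sqrt


def l2_normalize(vector):
    """Scale a feature vector to unit Euclidean (L2) norm."""
    norm = sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # An all-zero vector has no direction; return it unchanged.
        return list(vector)
    return [x / norm for x in vector]


# Hypothetical feature vector extracted from one X-ray image.
features = [3.0, 4.0, 0.0]
unit = l2_normalize(features)
print(unit)
```

After this step every sample has the same overall magnitude, so the later feature selection and boosting stages compare the relative pattern of features rather than their absolute scale.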

10.

Automatic method for classifying COVID-19 patients based on chest X-ray images, using deep features and PSO-optimized XGBoost.

Dias Júnior, Domingos Alves; da Cruz, Luana Batista; Bandeira Diniz, João Otávio; França da Silva, Giovanni Lucca; Junior, Geraldo Braz; Silva, Aristófanes Corrêa; de Paiva, Anselmo Cardoso; Nunes, Rodolfo Acatauassú; Gattass, Marcelo.

Expert Syst Appl ; 183: 115452, 2021 Nov 30.

Article in English | MEDLINE | ID: mdl-34177133

ABSTRACT

The COVID-19 pandemic, which originated in December 2019 in the city of Wuhan, China, continues to have a devastating effect on the health and well-being of the global population. Currently, approximately 8.8 million people have already been infected and more than 465,740 people have died worldwide. An important step in combating COVID-19 is the screening of infected patients using chest X-ray (CXR) images. However, this task is extremely time-consuming and prone to variability among specialists owing to its heterogeneity. Therefore, the present study aims to assist specialists in identifying COVID-19 patients from their chest radiographs, using automated computational techniques. The proposed method has four main steps: (1) the acquisition of the dataset, from two public databases; (2) the standardization of images through preprocessing; (3) the extraction of features using a deep features-based approach implemented through the networks VGG19, Inception-v3, and ResNet50; (4) the classifying of images into COVID-19 groups, using eXtreme Gradient Boosting (XGBoost) optimized by particle swarm optimization (PSO). In the best-case scenario, the proposed method achieved an accuracy of 98.71%, a precision of 98.89%, a recall of 99.63%, and an F1-score of 99.25%. In our study, we demonstrated that the problem of classifying CXR images of patients under COVID-19 and non-COVID-19 conditions can be solved efficiently by combining a deep features-based approach with a robust classifier (XGBoost) optimized by an evolutionary algorithm (PSO). The proposed method offers considerable advantages for clinicians seeking to tackle the current COVID-19 pandemic.
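Step (4) of the method tunes XGBoost with particle swarm optimization (PSO). The sketch below shows a basic global-best PSO loop using only the standard library. It is not the authors' configuration: the swarm size, the inertia and acceleration coefficients, the search bounds, and the `toy_validation_error` function (a stand-in for a real cross-validated error of a classifier) are all assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200, seed=0,
                 inertia=0.7, cognitive=1.5, social=1.5):
    """Minimize `objective` over box `bounds` with a global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_validation_error(params):
    """Hypothetical error surface with its minimum at learning_rate=0.1, max_depth=6."""
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + ((max_depth - 6.0) / 10.0) ** 2


best_params, best_error = pso_minimize(toy_validation_error,
                                       [(0.01, 0.5), (2.0, 12.0)])
print(best_params, best_error)
```

In a real setting the objective would train the classifier with the candidate hyperparameters and return a validation error, and integer-valued parameters such as tree depth would be rounded before use.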

11.

Environmental controls on canopy foliar nitrogen distributions in a Neotropical lowland forest.

Balzotti, Christopher S; Asner, Gregory P; Taylor, Philip G; Cleveland, Cory C; Cole, Rebecca; Martin, Roberta E; Nasto, Megan; Osborne, Brooke B; Porder, Stephen; Townsend, Alan R.

Ecol Appl ; 26(8): 2449-2462, 2016 Dec.

Article in English | MEDLINE | ID: mdl-27874999

ABSTRACT

Distributions of foliar nutrients across forest canopies can give insight into their plant functional diversity and improve our understanding of biogeochemical cycling. We used airborne remote sensing and partial least squares regression to quantify canopy foliar nitrogen (foliar N) across ~164 km2 of wet lowland tropical forest in the Osa Peninsula, Costa Rica. We determined the relative influence of climate and topography on the observed patterns of foliar N using a gradient boosting model technique. At a local scale, where climate and substrate were constant, we explored the influence of slope position on foliar N by quantifying foliar N on remnant terraces, their adjacent slopes, and knife-edged ridges. In addition, we climbed and sampled 540 trees and analyzed foliar N in order to quantify the role of species identity (phylogeny) and environmental factors in predicting foliar N. Observed foliar N heterogeneity reflected environmental factors working at multiple spatial scales. Across the larger landscape, elevation and precipitation had the highest relative influence on predicting foliar N (30% and 24%), followed by soils (15%), site exposure (9%), compound topographic index (8%), substrate (6%), and landscape dissection (6%). Phylogeny explained ~75% of the variation in the field collected foliar N data, suggesting that phylogeny largely underpins the response to the environmental factors. Taken together, these data suggest that a large fraction of the variance in foliar N across the landscape is proximately driven by species composition, though ultimately this is likely a response to abiotic factors such as climate and topography. Future work should focus on the mechanisms and feedbacks involved, and how shifts in climate may translate to changes in forest function.

Subjects

Nitrogen , Plant Leaves , Costa Rica , Forests , Trees , Tropical Climate
