ABSTRACT
Artificial intelligence techniques are now widely used in agricultural applications, including the detection of devastating diseases such as late blight (Phytophthora infestans) and early blight (Alternaria solani) affecting potato (Solanum tuberosum L.) crops. In this paper, we present a mobile application for detecting potato crop diseases based on deep neural networks. The images were taken from the PlantVillage dataset, with a batch of 1000 images for each of the three identified classes (healthy, early blight-diseased, late blight-diseased). An exploratory analysis of the architectures used for early and late blight diagnosis in potatoes was performed, achieving an accuracy of 98.7% with MobileNetV2. Based on the results obtained, an offline mobile application was developed, supported on devices with Android 4.1 or later, which also features an information section on the 27 diseases affecting potato crops and a gallery of symptoms. In future work, segmentation techniques will be used to highlight the damaged region of the potato leaf, evaluate its extent, and possibly identify different types of diseases affecting the same plant.
ABSTRACT
This work proposes a new methodology to identify and validate deep learning models for artificial oil lift systems that use submersible electric pumps. The proposed methodology allows for obtaining the models and evaluating the prediction's uncertainty jointly and systematically. The methodology employs a nonlinear model to generate training and validation data and the Markov Chain Monte Carlo algorithm to assess the neural network's epistemic uncertainty. The nonlinear model was used to overcome the need for large datasets when training deep learning models. After training and validation with synthetic data, the developed models are also validated against experimental data. The validation is further performed through the models' uncertainty assessment and experimental data. The method was implemented in Python, with the TensorFlow and Keras libraries used to build the neural networks and find the hyperparameters. The results show that the proposed methodology obtained models representing both the nonlinear model's dynamic behavior and the experimental data. It provides a most probable value close to the experimental data, and the uncertainty of the generated deep learning models has the same order of magnitude as that of the nonlinear model. This uncertainty assessment shows that the built models were adequately validated. The proposed deep learning models can be applied in several applications requiring a reliable and computationally lighter model. Hence, the obtained AI dynamic models can be employed for digital twin construction, control, and optimization.
ABSTRACT
Breast ultrasound (BUS) image classification into benign and malignant classes is often based on pre-trained convolutional neural networks (CNNs) to cope with small-sized training data. Nevertheless, BUS images are single-channel gray-level images, whereas pre-trained CNNs learned from color images with red, green, and blue (RGB) components. Thus, a gray-to-color conversion method is applied to fit the BUS image to the CNN's input layer size. This paper evaluates 13 gray-to-color conversion methods proposed in the literature that follow three strategies: replicating the gray-level image to all RGB channels, decomposing the image to enhance inherent information like the lesion's texture and morphology, and learning a matching layer. In addition, we introduce an image decomposition method based on the lesion's structural information to describe its inner and outer complexity. These gray-to-color conversion methods are evaluated under the same experimental framework using a pre-trained CNN architecture named ResNet-18 and a BUS dataset with more than 3000 images. The Matthews correlation coefficient (MCC), sensitivity (SEN), and specificity (SPE) measure the classification performance. The experimental results show that decomposition methods outperform replication and learning-based methods when using information from the lesion's binary mask (obtained from a segmentation method), reaching an MCC value greater than 0.70 and specificity up to 0.92, although the sensitivity is about 0.80. With the proposed method, on the other hand, the trade-off between sensitivity and specificity is better balanced, obtaining about 0.88 for both indices and an MCC of 0.73. This study contributes to the objective assessment of different gray-to-color conversion approaches in classifying breast lesions, revealing that mask-based decomposition methods improve classification performance. Moreover, the proposed method based on structural information improves the sensitivity, obtaining more reliable classification results on malignant cases and potentially benefiting clinical practice.
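The replication strategy and the three reported metrics can be illustrated with a minimal sketch. It assumes malignant is the positive class and uses plain nested lists for the image; the function names and the confusion-matrix counts in the usage note are hypothetical and are not taken from the paper.

```python
import math

def replicate_gray_to_rgb(gray):
    """Copy a 2-D gray-level image (list of rows) into three identical channels."""
    return [[(p, p, p) for p in row] for row in gray]

def binary_metrics(tp, fp, tn, fn):
    """Return (MCC, sensitivity, specificity) from confusion-matrix counts.

    Assumption: the positive class is 'malignant'.
    """
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    spe = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return mcc, sen, spe
```

For example, `binary_metrics(88, 12, 88, 12)` uses made-up counts for a balanced test set; on an imbalanced set the same sensitivity and specificity would generally give a different MCC.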
Subject(s)
Breast , Neural Networks, Computer , Female , Humans , Breast/diagnostic imaging , Ultrasonography , Ultrasonography, Mammary , Sensitivity and Specificity
ABSTRACT
This paper presents a generic framework for fault prognosis using autoencoder-based deep learning methods. The proposed approach relies upon a semi-supervised extrapolation of autoencoder reconstruction errors, which can deal with the unbalanced proportion between faulty and non-faulty data in an industrial context to improve systems' safety and reliability. In contrast to supervised methods, the approach requires less manual data labeling and can find previously unknown patterns in data. The technique focuses on detecting and isolating possible measurement divergences and tracking their growth to signal a fault's occurrence, while individually evaluating each monitored variable to provide fault detection and prognosis. The paper also provides an appropriate set of metrics to measure the accuracy of the models; the lack of such metrics is a common disadvantage of unsupervised methods, which have no predefined answers during training. Computational results using the Commercial Modular Aero Propulsion System Simulation (CMAPSS) monitoring data show the effectiveness of the proposed framework.
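The idea of tracking the growth of a reconstruction error and extrapolating it to a fault can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the extrapolation model, so a least-squares straight line and a fixed fault threshold are assumed here, and the function names are hypothetical.

```python
def fit_line(ys):
    """Least-squares slope and intercept for points (0, ys[0]), (1, ys[1]), ...

    Requires at least two samples.
    """
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def steps_until_threshold(errors, threshold):
    """Estimate how many steps after the last sample the fitted trend
    reaches the threshold (assumed linear growth of the error).

    Returns None when the fitted trend is not increasing.
    """
    slope, intercept = fit_line(errors)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (len(errors) - 1))
```

Applied separately to the reconstruction-error series of each monitored variable, such an estimate gives one time-to-threshold per variable, which is one way to read "individually evaluating each monitored variable".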
Subject(s)
Benchmarking , Neural Networks, Computer , Reproducibility of Results , Prognosis , Computer Simulation
ABSTRACT
Transposable elements are mobile sequences that can move and insert themselves into chromosomes, activating under internal or external stimuli and giving the organism the ability to adapt to the environment. Annotating transposable elements in genomic data is currently considered a crucial task to understand key aspects of organisms such as phenotype variability, species evolution, and genome size, among others. Because of the way they replicate, LTR retrotransposons are the most common transposable elements in plants, accounting in some cases for up to 80% of all DNA information. To annotate these elements, a reference library is usually created and curated to eliminate TE fragments and false positives, and the elements are then annotated in the genome using the homology method. However, the curation process can take weeks and requires extensive manual work and the execution of multiple time-consuming bioinformatics tools. Here, we propose a machine learning-based approach to perform this process automatically on plant genomes, obtaining up to 91.18% F1-score. This approach was tested with four plant species, obtaining up to 93.6% F1-score (Oryza granulata) in only 22.61 s, where bioinformatics methods took approximately 6 h. This acceleration demonstrates that the ML-based approach is efficient and could be used in massive sequencing projects.
Subject(s)
Retroelements , Terminal Repeat Sequences , DNA Transposable Elements , Evolution, Molecular , Genome, Plant , Machine Learning , Plants/genetics , Retroelements/genetics
ABSTRACT
Automatic flood detection may be an important component for triggering damage control systems and minimizing the risk of social or economic impacts caused by flooding. Riverside images from regular cameras are a widely available resource that can be used for tackling this problem. Nevertheless, state-of-the-art neural networks, the most suitable approach for this type of computer vision task, are usually resource-consuming, which poses a challenge for deploying these models on low-capability Internet of Things (IoT) devices with unstable internet connections. In this work, we propose a deep neural network (DNN) architecture pruning algorithm capable of finding a pruned version of a given DNN within a user-specified memory footprint. Our results demonstrate that the proposed algorithm can find a pruned DNN model with the specified memory footprint with little to no degradation of its segmentation performance. Finally, we show that our algorithm can be used in a memory-constrained wireless sensor network (WSN) employed to detect flooding events of urban rivers, and that the resulting pruned models achieve results competitive with the original models.
Subject(s)
Internet of Things , Algorithms , Computers , Floods , Neural Networks, Computer
ABSTRACT
In the last five years, the inclusion of Deep Learning algorithms in prognostics and health management (PHM) has led to a performance increase in diagnostics, prognostics, and anomaly detection. However, the lack of interpretability of these models results in resistance towards their deployment. Deep Learning-based models fall within the accuracy/interpretability tradeoff, which means that their complexity leads to high performance levels at the cost of interpretability. This work aims at addressing this tradeoff by proposing a technique for feature selection embedded in deep neural networks that uses a feature selection (FS) layer trained with the rest of the network to evaluate the input features' importance. The importance values are used to determine which features will be considered for deployment of a PHM model. For comparison with other techniques, this paper introduces a new metric called ranking quality score (RQS), which measures how performance evolves while following the corresponding ranking. The proposed framework is exemplified with three case studies involving health state diagnostics and prognostics and remaining useful life prediction. Results show that the proposed technique achieves higher RQS than the compared techniques, while maintaining the same performance level as the same model without an FS layer.
Subject(s)
Deep Learning , Algorithms , Neural Networks, Computer , Prognosis
ABSTRACT
Olive tree growing is an important economic activity in many countries, mostly in the Mediterranean Basin, Argentina, Chile, Australia, and California. Although recent intensification techniques organize olive groves in hedgerows, most olive groves are rainfed and the trees are scattered (as in Spain and Italy, which account for 50% of the world's olive oil production). Accurate measurement of trees' biovolume is a first step to monitor their performance in olive production and health. In this work, we use one of the most accurate deep learning instance segmentation methods (Mask R-CNN) and unmanned aerial vehicle (UAV) images for olive tree crown and shadow segmentation (OTCS) to further estimate the biovolume of individual trees. We evaluated our approach on images with different spectral bands (red, green, blue, and near infrared) and vegetation indices (normalized difference vegetation index, NDVI, and green normalized difference vegetation index, GNDVI). The performance of red-green-blue (RGB) images was assessed at two spatial resolutions, 3 cm/pixel and 13 cm/pixel, while NDVI and GNDVI images were assessed only at 13 cm/pixel. All trained Mask R-CNN-based models showed high performance in tree crown segmentation, particularly when using the fusion of all datasets in GNDVI and NDVI (F1-measure from 95% to 98%). The comparison in a subset of trees of our estimated biovolume with ground truth measurements showed an average accuracy of 82%. Our results support the use of NDVI and GNDVI spectral indices for the accurate estimation of the biovolume of scattered trees, such as olive trees, in UAV images.
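For reference, the two vegetation indices named above follow the standard normalized-difference form: NDVI = (NIR - Red) / (NIR + Red) and GNDVI = (NIR - Green) / (NIR + Green). The sketch below computes them per pixel; the function names and the convention of returning 0.0 when both bands are zero are assumptions made for illustration, and the reflectance values in the usage note are hypothetical.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b); returns 0.0 when both bands are zero (assumed convention)."""
    total = a + b
    return (a - b) / total if total else 0.0

def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    return normalized_difference(nir, red)

def gndvi(nir, green):
    """Green normalized difference vegetation index for one pixel."""
    return normalized_difference(nir, green)
```

With reflectance values such as `nir=0.6` and `red=0.2`, `ndvi` is about 0.5; healthy vegetation typically yields higher values than bare soil, which is what makes these indices useful as segmentation inputs.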
Subject(s)
Olea , Agriculture , Australia , Chile , Italy , Spain
ABSTRACT
The identification of human violence determinants has sparked multiple questions from different academic fields. Innovative methodological assessments of the weight and interaction of multiple determinants are still required. Here, we examine multiple features potentially associated with confessed acts of violence in ex-members of illegal armed groups in Colombia (N = 26,349) through deep learning and feature-derived machine learning. We assessed 162 social-contextual and individual mental health potential predictors of historical data regarding consequentialist, appetitive, retaliative, and reactive domains of violence. Deep learning yields high accuracy using the full set of determinants. Progressive feature elimination revealed that contextual factors were more important than individual factors. Combined social network adversities, membership identification, and normalization of violence were among the more accurate social-contextual factors. To a lesser extent the best individual factors were personality traits (borderline, paranoid, and antisocial) and psychiatric symptoms. The results provide a population-based computational classification regarding historical assessments of violence in vulnerable populations.
ABSTRACT
Long terminal repeat (LTR) retrotransposons are mobile elements that constitute the major fraction of most plant genomes. The identification and annotation of these elements via bioinformatics approaches represent a major challenge in the era of massive plant genome sequencing. In addition to their involvement in genome size variation, LTR retrotransposons are also associated with the function and structure of different chromosomal regions and can alter the function of coding regions, among others. Several sequence databases of plant LTR retrotransposons are available for public access, such as PGSB and RepetDB, or restricted access such as Repbase. Although these databases are useful to identify LTR-RTs in new genomes by similarity, the elements of these databases are not fully classified to the lineage (also called family) level. Here, we present InpactorDB, a semi-curated dataset composed of 130,439 elements from 195 plant genomes (belonging to 108 plant species) classified to the lineage level. This dataset has been used to train two deep neural networks (i.e., one fully connected and one convolutional) for the rapid classification of these elements. In lineage-level classification approaches, we obtain up to 98% performance, indicated by the F1-score, precision and recall scores.
Subject(s)
Computational Biology/methods , Databases, Nucleic Acid , Genome, Plant , Genomics/methods , Retroelements , Terminal Repeat Sequences , Machine Learning , Neural Networks, Computer , Reproducibility of Results
ABSTRACT
BACKGROUND: The Smad7 protein is a negative regulator of the TGF-β signaling pathway, which is upregulated in patients with breast cancer. miRNAs regulate protein expression by arresting or degrading mRNAs. The purpose of this work is to identify a miRNA profile that regulates the expression of the mRNA coding for Smad7 in breast cancer, using data from patients with breast cancer obtained from the Cancer Genome Atlas Project. METHODS: We develop an automatic search method based on genetic algorithms to find a predictive model based on deep neural networks (DNN) that fits the set of biological data, and apply the Olden algorithm to identify the relative importance of each miRNA. RESULTS: A computational non-linear regression model based on deep neural networks is presented that predicts the regulation exerted by the miRNAs on the target mRNA transcripts coding for the Smad7 protein in patients with breast cancer, with an R2 of 0.99 and an MSE of 0.00001. In addition, the model is validated against the results of in vivo and in vitro experiments reported in the literature. The set of miRNAs hsa-mir-146a, hsa-mir-93, hsa-mir-375, hsa-mir-205, hsa-mir-15a, hsa-mir-21, hsa-mir-20a, hsa-mir-503, hsa-mir-29c, hsa-mir-497, hsa-mir-107, hsa-mir-125a, hsa-mir-200c, hsa-mir-212, hsa-mir-429, hsa-mir-34a, hsa-let-7c, hsa-mir-92b, hsa-mir-33a, hsa-mir-15b, hsa-mir-224, hsa-mir-185 and hsa-mir-10b forms a profile that critically regulates the expression of the mRNA coding for Smad7 in breast cancer. CONCLUSIONS: We developed a genetic algorithm to select the best features (miRNAs) as DNN inputs. The genetic algorithm also builds the best DNN architecture by optimizing the parameters. Although the results have not yet been confirmed by laboratory experiments, they suggest that the miRNA profile could be used as biomarkers or as targets in targeted therapies.
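The Olden (connection weights) algorithm mentioned in METHODS scores each input by summing, over hidden units, the product of the input-to-hidden and hidden-to-output weights. The sketch below shows the single-hidden-layer, single-output case only; the networks in the paper are deeper, so this is a simplified illustration, and the function name and weight layout are assumptions.

```python
def olden_importance(w_input_hidden, w_hidden_output):
    """Connection-weights importance for a single-hidden-layer network.

    w_input_hidden[i][j]: weight from input i to hidden unit j.
    w_hidden_output[j]:   weight from hidden unit j to the single output.
    Returns one signed importance value per input.
    """
    return [
        sum(w_ij * v_j for w_ij, v_j in zip(row, w_hidden_output))
        for row in w_input_hidden
    ]
```

The sign of each value indicates whether the input tends to raise or lower the output, and the magnitude gives its relative weight; inputs here would correspond to individual miRNA expression levels.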
Subject(s)
Algorithms , Breast Neoplasms/genetics , Deep Learning , MicroRNAs/genetics , Models, Biological , Neural Networks, Computer , Smad7 Protein/genetics , Female , Gene Expression Regulation, Neoplastic , Humans , MicroRNAs/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Smad7 Protein/metabolism
ABSTRACT
Cardiac auscultation is one of the most conventional approaches for the initial assessment of heart disease; however, the technique is highly user-dependent and has low repeatability. Several computational approaches based on the analysis of phonocardiograms (PCG) have been proposed to classify heart sounds as normal or abnormal, but most often they do not achieve acceptable levels of sensitivity (Se) and specificity (Sp) or require the use of special hardware. We propose a novel approach for the classification of PCG. First, the system makes use of deep neural networks for computing individual cardiac cycle probabilities, followed by classification using weighted probability comparisons. The system was tested on an extended dataset consisting of a balanced sample of 18179 normal and abnormal cycles, achieving Se and Sp values of 91.3% and 93.8%, respectively. In addition, the system overcomes previous limitations since it was trained with a balanced sample; also, the decision factor used during the classification stage allows controlling the trade-off between Se and Sp, making the proposed system suitable for clinical applications.
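One possible reading of "weighted probability comparisons" with a decision factor is sketched below. The abstract does not give the actual rule, so everything here is an assumption: per-cycle abnormal probabilities are averaged, the mean is scaled by a hypothetical `decision_factor`, and the result is compared with the mean normal probability.

```python
def classify_recording(cycle_probs_abnormal, decision_factor=1.0):
    """Classify a recording from per-cycle abnormal probabilities.

    Assumed rule (not from the paper): label 'abnormal' when
    decision_factor * mean(P(abnormal)) > mean(P(normal)).
    A larger decision_factor favors sensitivity over specificity.
    """
    mean_abnormal = sum(cycle_probs_abnormal) / len(cycle_probs_abnormal)
    mean_normal = 1.0 - mean_abnormal
    if decision_factor * mean_abnormal > mean_normal:
        return "abnormal"
    return "normal"
```

Under this rule, a recording whose cycles average a 0.4 abnormal probability is labeled normal with a factor of 1.0 and abnormal with a factor of 2.0, which shows how a single parameter can shift the Se/Sp balance.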