Results 1 - 20 of 41
1.
Molecules ; 29(14)2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39065004

ABSTRACT

In this paper, an alternative and efficient copper(I)-catalyzed synthesis of 2-sulfonyliminocoumarins is developed through a three-component reaction of ortho-hydroxybenzyl alcohol, alkynes, and p-toluenesulfonyl azide. The proposed route for access to the 2-iminocoumarin ring involves a [4 + 2] hetero-Diels-Alder reaction between ortho-quinone methide and ketenimine intermediates generated in situ.

2.
Sci Rep ; 14(1): 16200, 2024 07 13.
Article in English | MEDLINE | ID: mdl-39003293

ABSTRACT

The COVID-19 pandemic has had a significant impact on students' academic performance. The effects of the pandemic have varied among students, but some general trends have emerged. One of the primary challenges for students during the pandemic has been the disruption of their study habits. Students who have grown used to online learning routines might find it even more challenging to perform well in face-to-face learning. Therefore, assessing the potential risk factors associated with students' low performance, and predicting that performance, is important for early intervention. As students' performance data encompass diverse behaviors, standard machine learning methods find it hard to extract useful insights for practical decision making and early interventions. Therefore, this research explores regularized ensemble learning methods for effectively analyzing students' performance data and reaching valid conclusions. To this end, three pruning strategies are implemented for the random forest method. These methods are based on out-of-bag sampling, sub-sampling and sub-bagging. The pruning strategies discard trees that are adversely affected by unusual patterns in the students' data, forming forests of accurate and diverse trees. The methods are illustrated on an example dataset collected from university students currently studying on campus in a face-to-face modality, who studied during the COVID-19 pandemic through online learning. The suggested methods outperform all the other methods considered in this paper for predicting students at risk of academic failure. Moreover, factors such as class attendance, student interaction, internet connectivity, and pre-requisite course(s) during the restrictions are identified as the most significant features.


Subject(s)
COVID-19 , Machine Learning , Students , Humans , COVID-19/epidemiology , Risk Factors , Education, Distance/methods , SARS-CoV-2/isolation & purification , Academic Performance , Pandemics , Universities , Risk Assessment/methods , Female , Male
3.
Heliyon ; 10(12): e32203, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38975167

ABSTRACT

Probability distributions are widely utilized in applied sciences, especially in the field of biomedical science. Biomedical data typically exhibit positive skewness, necessitating the use of flexible, skewed distributions to effectively model such phenomena. In this study, we introduce a novel approach to characterize new lifetime distributions, known as the New Flexible Exponent Power (NFEP) Family of distributions. This involves the addition of a new parameter to existing distributions. A specific sub-model within the proposed class, known as the New Flexible Exponent Power Weibull (NFEP-Wei), is derived to illustrate the concept of flexibility. We employ the well-established Maximum Likelihood Estimation (MLE) method to estimate the unknown parameters in this family of distributions. A simulation study is conducted to assess the behavior of the estimators in various scenarios. To gauge the flexibility and effectiveness of the NFEP-Wei distribution, we compare it with the AP-Wei (alpha power Weibull), MO-Wei (Marshall-Olkin Weibull), classical Wei (Weibull), NEP-Wei (new exponent power Weibull), FRLog-Wei (flexible reduced logarithmic Weibull), and Kum-Wei (Kumaraswamy Weibull) distributions by analyzing four distinct biomedical datasets. The results demonstrate that the NFEP-Wei distribution outperforms the compared distributions.

4.
BMC Med Inform Decis Mak ; 24(1): 120, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38715002

ABSTRACT

In recent times, time-to-event data such as time to failure or death are routinely collected alongside high-throughput covariates. These high-dimensional bioinformatics data often challenge classical survival models, which are either infeasible to fit or produce low prediction accuracy due to overfitting. To address this issue, the focus has shifted towards introducing novel approaches for feature selection and survival prediction. In this article, we propose a new hybrid feature selection approach that handles high-dimensional bioinformatics datasets for improved survival prediction. This study explores the efficacy of four distinct variable selection techniques, LASSO, RSF-vs, SCAD, and CoxBoost, in the context of non-parametric biomedical survival prediction. Leveraging these methods, we conducted comprehensive variable selection processes. Subsequently, survival analysis models (specifically CoxPH, RSF, and DeepHit NN) were employed to construct predictive models based on the selected variables. Furthermore, we introduce a novel approach wherein only variables consistently selected by a majority of the aforementioned feature selection techniques are considered. This strategy, referred to as the proposed method, aims to enhance the reliability and robustness of variable selection, subsequently improving the predictive performance of the survival analysis models. To evaluate its effectiveness, we compare the proposed method with the existing LASSO, RSF-vs, SCAD, and CoxBoost techniques using performance metrics including the integrated Brier score (IBS), concordance index (C-Index) and integrated absolute error (IAE) on numerous high-dimensional survival datasets. The real data applications reveal that the proposed method outperforms the competing methods in terms of survival prediction accuracy.


Subject(s)
Neural Networks, Computer , Humans , Survival Analysis , Statistics, Nonparametric , Computational Biology/methods
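
The majority-vote step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `majority_vote_features`, the strict "more than half" threshold, and the feature names in the example are all assumptions made here for illustration.

```python
from collections import Counter


def majority_vote_features(selections):
    """Keep features chosen by a strict majority of selection methods.

    `selections` maps a method name to the features that method selected.
    The strict-majority rule (> half of the methods) is an assumption.
    """
    counts = Counter()
    for features in selections.values():
        counts.update(set(features))  # each method votes once per feature
    threshold = len(selections) / 2
    return sorted(f for f, c in counts.items() if c > threshold)


# Hypothetical selections, for illustration only (not data from the paper).
example = {
    "LASSO": ["g1", "g2", "g3"],
    "RSF-vs": ["g2", "g3", "g5"],
    "SCAD": ["g1", "g2"],
    "CoxBoost": ["g2", "g3", "g4"],
}
selected = majority_vote_features(example)
```

With four methods, a feature needs at least three votes to survive, so a feature picked by exactly two methods is dropped under this assumed rule.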
5.
PLoS One ; 19(5): e0297544, 2024.
Article in English | MEDLINE | ID: mdl-38809823

ABSTRACT

Statistical quality control is concerned with the analysis of production and manufacturing processes. Control charts are process control techniques, commonly applied to observe and control deviations. Shewhart control charts are very sensitive to large shifts and rest on the basic assumption of normality. Cumulative Sum (CUSUM) control charts are effective for identifying shifts that may have special causes, such as outliers or excessive variability in subgroup means. This study uses a CUSUM control chart structure to evaluate the performance of robust dispersion parameters. We investigated the design structure features of various control charts, based on currently defined estimators and some new robust scale estimators using trimming and winsorization in different scenarios. The Median Absolute Deviation based on trimming and winsorization is introduced. The effectiveness of CUSUM control charts based on these estimators is evaluated in terms of average run length (ARL) and standard deviation of the run length (SDRL) using a simulation study. The results show the robustness of the CUSUM chart in observing small changes in magnitude for both normal and contaminated data. In general, CUSUM charts based on the robust estimators MADTM and MADWM outperform the others in all environments.


Subject(s)
Quality Control , Models, Statistical , Computer Simulation , Algorithms
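
As a rough illustration of the ingredients named in the abstract above, the sketch below computes a plain median absolute deviation and runs a standard two-sided tabular CUSUM. It does not reproduce the trimmed or winsorized estimators (MADTM, MADWM), whose definitions are not given in the abstract. The function names, the consistency constant 1.4826, and the defaults `k=0.5` and `h=5.0` are conventional textbook choices assumed here.

```python
from statistics import median


def mad(values, scale=1.4826):
    """Median absolute deviation; scale=1.4826 makes it consistent
    with the standard deviation under normality."""
    m = median(values)
    return scale * median([abs(v - m) for v in values])


def cusum_first_signal(samples, target, sigma, k=0.5, h=5.0):
    """Two-sided tabular CUSUM on standardized observations.

    Returns the 0-based index of the first out-of-control signal,
    or None if the chart never signals.
    """
    c_plus = c_minus = 0.0
    for i, x in enumerate(samples):
        z = (x - target) / sigma
        c_plus = max(0.0, c_plus + z - k)    # accumulates upward drift
        c_minus = max(0.0, c_minus - z - k)  # accumulates downward drift
        if c_plus > h or c_minus > h:
            return i
    return None
```

A robust scale estimate such as `mad(phase_one_data)` could be passed as `sigma`, which is the general idea of pairing a robust dispersion estimator with a CUSUM chart.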
6.
Sci Rep ; 14(1): 8992, 2024 04 18.
Article in English | MEDLINE | ID: mdl-38637663

ABSTRACT

This paper aims to introduce a novel family of probability distributions using the well-known T-X family method. The proposed family is called the "Novel Generalized Exponent Power X Family" of distributions. A three-parameter special sub-model of the proposed family is derived and named the "Novel Generalized Exponent Power Weibull" distribution (NGEP-Wei for short). For the proposed family, some statistical properties are derived, including the hazard rate function, moments, moment generating function, order statistics, residual life, and reverse residual life. The well-known maximum likelihood estimation method is used for estimating the model parameters. In addition, a comprehensive Monte Carlo simulation study is conducted to assess the efficacy of this estimation method. Finally, model selection criteria such as the Akaike information criterion (AINC), the corrected Akaike information criterion (CINC), the Bayesian information criterion (BINC), the Hannan-Quinn information criterion (HQINC), the Cramér-von Mises (CRMI) statistic, and the Anderson-Darling (ANDA) statistic are used for comparison purposes. The NGEP-Wei is compared with rival distributions using two COVID-19 data sets. In terms of performance, we show that the proposed method outperforms the other competing methods included in this study.


Subject(s)
COVID-19 , Humans , Bayes Theorem , Mexico/epidemiology , COVID-19/epidemiology , Computer Simulation , Canada
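
The likelihood-based criteria listed in the abstract above have standard textbook forms, sketched below for a fitted model with maximized log-likelihood `log_lik`, `k` parameters, and `n` observations. The function name is hypothetical, and reading "CINC" as the small-sample corrected AIC is an assumption; the paper's own definitions may differ.

```python
import math


def information_criteria(log_lik, k, n):
    """Standard information criteria; smaller values indicate a better fit.

    log_lik: maximized log-likelihood, k: number of parameters,
    n: sample size (needs n > k + 1 for the corrected AIC).
    """
    aic = 2 * k - 2 * log_lik
    caic = aic + 2 * k * (k + 1) / (n - k - 1)  # assumed meaning of "CINC"
    bic = k * math.log(n) - 2 * log_lik
    hqic = 2 * k * math.log(math.log(n)) - 2 * log_lik
    return {"AIC": aic, "CAIC": caic, "BIC": bic, "HQIC": hqic}


# Hypothetical fit: log-likelihood -100 with 3 parameters on 50 observations.
crit = information_criteria(log_lik=-100.0, k=3, n=50)
```

When comparing candidate distributions on the same data, the model with the smallest value of a given criterion is preferred under that criterion.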
7.
Sci Rep ; 14(1): 9116, 2024 04 20.
Article in English | MEDLINE | ID: mdl-38643305

ABSTRACT

RNA modifications are pivotal in the development of newly synthesized structures, showcasing a vast array of alterations across various RNA classes. Among these, 5-hydroxymethylcytosine (5HMC) stands out, playing a crucial role in gene regulation and epigenetic changes, yet its detection through conventional methods proves cumbersome and costly. To address this, we propose Deep5HMC, a robust learning model leveraging machine learning algorithms and discriminative feature extraction techniques for accurate 5HMC sample identification. Our approach integrates seven feature extraction methods and various machine learning algorithms, including Random Forest, Naive Bayes, Decision Tree, and Support Vector Machine. Through K-fold cross-validation, our model achieved a notable 84.07% accuracy rate, surpassing previous models by 7.59%, signifying its potential in early cancer and cardiovascular disease diagnosis. This study underscores the promise of Deep5HMC in offering insights for improved medical assessment and treatment protocols, marking a significant advancement in RNA modification analysis.


Subject(s)
5-Methylcytosine/analogs & derivatives , Algorithms , Neural Networks, Computer , Bayes Theorem , Support Vector Machine , RNA
8.
Sci Rep ; 13(1): 20020, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37973894

ABSTRACT

The article introduces a novel Bayesian AEWMA Control Chart that integrates different loss functions (LFs) like the square error loss function and Linex loss function under an informative prior for posterior and posterior predictive distributions, implemented across diverse ranked set sampling (RSS) designs. The main objective is to detect small to moderate shifts in the process mean, with the average run length and standard deviation of run length serving as performance measures. The study employs a hard bake process in semiconductor production to demonstrate the effectiveness of the proposed chart, comparing it with existing control charts through Monte Carlo simulations. The results underscore the superiority of the proposed approach, particularly under RSS designs compared to simple random sampling (SRS), in identifying out-of-control signals. Overall, this study contributes a comprehensive method integrating various LFs and RSS schemes, offering a more precise and efficient approach for detecting shifts in the process mean. Real-world applications highlight the heightened sensitivity of the suggested chart in identifying out-of-control signals compared to existing Bayesian charts using SRS.
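
To make the role of the two loss functions concrete, the sketch below gives the Bayes estimators of a normal mean under a conjugate normal prior with known process variance. It assumes the common Linex form L = exp(c(est - theta)) - c(est - theta) - 1, for which the estimator under a normal posterior is the posterior mean minus c times half the posterior variance. The function names and the numeric values are hypothetical, and the abstract does not state which parameterization the authors use.

```python
def normal_posterior(xbar, n, sigma2, prior_mean, prior_var):
    """Posterior mean and variance of a normal mean with known variance
    sigma2 and a conjugate normal prior N(prior_mean, prior_var)."""
    denom = sigma2 + n * prior_var
    post_mean = (sigma2 * prior_mean + n * prior_var * xbar) / denom
    post_var = sigma2 * prior_var / denom
    return post_mean, post_var


def estimate_self(post_mean, post_var):
    """Squared error loss: the Bayes estimator is the posterior mean."""
    return post_mean


def estimate_linex(post_mean, post_var, c):
    """Linex loss (assumed form above): posterior mean shifted by -c*var/2."""
    return post_mean - c * post_var / 2


# Hypothetical numbers, for illustration only.
m, v = normal_posterior(xbar=10.0, n=4, sigma2=4.0, prior_mean=0.0, prior_var=1.0)
```

A positive `c` penalizes overestimation more heavily, so the Linex estimate sits below the posterior mean; as `c` approaches zero the two estimators coincide.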

9.
Sci Rep ; 13(1): 20723, 2023 Nov 25.
Article in English | MEDLINE | ID: mdl-38007541

ABSTRACT

This study introduces the Bayesian adaptive exponentially weighted moving average (AEWMA) control chart within the framework of measurement error, examining two separate loss functions: the squared error loss function and the linex loss function. We conduct an analysis of the posterior and posterior predictive distributions utilizing a conjugate prior. In the presence of measurement error (ME), we employ a linear covariate model to assess the control chart's effectiveness. Additionally, we explore the impacts of measurement error by investigating multiple measurements and a method involving linearly increasing variance. We conduct a Monte Carlo simulation study to assess the control chart's performance under ME, examining its run length profile. Subsequently, we offer a specific numerical instance related to the hard-bake process in semiconductor manufacturing, serving to verify the functionality and practical application of the suggested Bayesian AEWMA control chart when confronted with ME.

10.
Sci Rep ; 13(1): 18240, 2023 Oct 25.
Article in English | MEDLINE | ID: mdl-37880337

ABSTRACT

Control charts, including the exponentially weighted moving average (EWMA) chart, are valuable for efficiently detecting small to moderate shifts. This study introduces a Bayesian EWMA control chart that employs ranked set sampling (RSS) with known prior information and two distinct loss functions (LFs), the Square Error Loss function (SELF) and the Linex Loss function (LLF), for posterior and posterior predictive distributions. The chart's performance is assessed using average run length (ARL) and standard deviation of run length (SDRL) profiles, and it is compared to the Bayesian EWMA control chart based on simple random sampling (SRS). The results indicate that the proposed control chart detects small to moderate shifts more effectively. The application in semiconductor manufacturing provides concrete evidence that the Bayesian EWMA control chart, when implemented with RSS schemes, demonstrates a higher degree of sensitivity in detecting deviations from normal process behavior. Compared with the Bayesian EWMA control chart using SRS, it exhibits a superior ability to identify and flag instances where the manufacturing process is going out of control. This heightened sensitivity is critical for promptly addressing and rectifying issues, which ultimately contributes to improved quality control in semiconductor production.
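
For orientation, the classical (non-Bayesian) EWMA chart that these Bayesian variants build on can be sketched as below. It uses the textbook recursion and time-varying control limits; the smoothing constant `lam=0.2`, the width `L=3.0`, and the function name are conventional defaults assumed here, and nothing in this sketch models the ranked set sampling or loss-function components of the paper.

```python
import math


def ewma_first_signal(samples, mu0, sigma, lam=0.2, L=3.0):
    """Classical EWMA chart for individual observations.

    Returns the 0-based index of the first point outside the control
    limits, or None if the chart never signals.
    """
    z = mu0  # the EWMA statistic starts at the in-control mean
    for t, x in enumerate(samples, start=1):
        z = lam * x + (1 - lam) * z
        # Exact variance factor of z at time t (approaches lam / (2 - lam)).
        var_factor = lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        half_width = L * sigma * math.sqrt(var_factor)
        if abs(z - mu0) > half_width:
            return t - 1
    return None
```

Smaller values of `lam` give more weight to past observations, which is what makes the chart sensitive to small, sustained shifts in the mean.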

11.
Sci Rep ; 13(1): 14042, 2023 Aug 28.
Article in English | MEDLINE | ID: mdl-37640724

ABSTRACT

The objective of this study is to investigate the behavior of the Bayesian exponentially weighted moving average (EWMA) control chart in the presence of measurement error (ME). It explores the impact of different ranked set sampling designs and loss functions on the performance of the control chart when ME is present. The analysis incorporates a covariate model, multiple measurement methods, and a conjugate prior to account for ME. The performance evaluation of the proposed Bayesian EWMA control chart with ME includes metrics such as average run length and standard deviation of run lengths. The findings, obtained through Monte Carlo simulation and real data application, indicated that ME significantly affects the performance of the Bayesian EWMA control chart when RSS schemes are employed. Particularly noteworthy is the superior performance of the median RSS scheme compared to the other two schemes in the presence of ME.

12.
PeerJ Comput Sci ; 9: e1190, 2023.
Article in English | MEDLINE | ID: mdl-37346678

ABSTRACT

The outbreak of the COVID-19 pandemic has also triggered a tsunami of news, instructions, and precautionary measures related to the disease on social media platforms. Despite the considerable support on social media, a large amount of fake propaganda and many conspiracies are also circulated. People also reacted to COVID-19 vaccination on social media and expressed their opinions, perceptions, and conceptions. The present research work aims to explore the opinion dynamics of the general public about COVID-19 vaccination to help administrative authorities devise policies to increase vaccination acceptance. For this purpose, a framework is proposed to perform sentiment analysis of COVID-19 vaccination-related tweets. The influence of term frequency-inverse document frequency (TF-IDF), bag of words (BoW), Word2Vec, and a combination of TF-IDF and BoW is explored with classifiers including random forest, gradient boosting machine, extra tree classifier (ETC), logistic regression, Naïve Bayes, stochastic gradient descent, multilayer perceptron, convolutional neural network (CNN), bidirectional encoder representations from transformers (BERT), long short-term memory (LSTM), and recurrent neural network (RNN). Results reveal that ETC with BoW features outperforms the other models, achieving 92% accuracy, and is the most suitable approach for sentiment analysis of COVID-19-related tweets. Opinion dynamics show that sentiments in favor of vaccination have increased over time.

13.
Sci Rep ; 13(1): 9463, 2023 Jun 10.
Article in English | MEDLINE | ID: mdl-37301897

ABSTRACT

Memory-type control charts, such as the cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) control charts, are more desirable for detecting a small or moderate shift in the location parameter of a production process. In this article, a novel Bayesian adaptive EWMA (AEWMA) control chart utilizing ranked set sampling (RSS) designs is proposed under two different loss functions, i.e., the square error loss function (SELF) and the linex loss function (LLF), and with an informative prior distribution, to monitor the mean shift of a normally distributed process. An extensive Monte Carlo simulation is used to check the performance of the suggested Bayesian-AEWMA control chart using RSS schemes. The effectiveness of the proposed AEWMA control chart is evaluated through the average run length (ARL) and standard deviation of run length (SDRL). The results indicate that the proposed Bayesian control chart applying RSS schemes is more sensitive in detecting mean shifts than the existing Bayesian AEWMA control chart based on simple random sampling (SRS). Finally, to demonstrate the effectiveness of the proposed Bayesian-AEWMA control chart under different RSS schemes, we present a numerical example involving the hard-bake process in semiconductor fabrication. Our results show that the Bayesian-AEWMA control chart using RSS schemes outperforms the EWMA and AEWMA control charts utilizing the Bayesian approach under simple random sampling in detecting out-of-control signals.


Subject(s)
Choline O-Acetyltransferase , Research Design , Bayes Theorem , Computer Simulation , Monte Carlo Method
14.
Front Public Health ; 10: 922795, 2022.
Article in English | MEDLINE | ID: mdl-35968475

ABSTRACT

In this article, a new hybrid time series model is proposed to predict COVID-19 daily confirmed cases and deaths. Due to the variations and complexity in the data, it is very difficult to predict its future trajectory using linear time series or mathematical models. In this research article, a novel hybrid ensemble empirical mode decomposition and error trend seasonal (EEMD-ETS) model has been developed to forecast the COVID-19 pandemic. The proposed hybrid model applies EEMD to decompose the complex, nonlinear, and nonstationary data into different intrinsic mode functions (IMFs), from low to high frequencies, and a single monotone residue. The stationarity of each IMF component is checked with the augmented Dickey-Fuller (ADF) test, the components are then used to build the EEMD-ETS model, and finally, future predictions are obtained from the proposed hybrid model. For illustration purposes and to check the performance of the proposed model, four datasets of daily confirmed cases and deaths from COVID-19 in Italy, Germany, the United Kingdom (UK), and France have been used. Similarly, four different statistical metrics, i.e., root mean square error (RMSE), symmetric mean absolute percentage error (sMAPE), mean absolute error (MAE), and mean absolute percentage error (MAPE), have been used to compare different time series models. It is evident from the results that the proposed hybrid EEMD-ETS model outperforms the other time series and machine learning models. Hence, it is worth using as an effective model for the prediction of COVID-19.


Subject(s)
COVID-19 , COVID-19/epidemiology , Forecasting , Humans , Models, Theoretical , Pandemics , Seasons
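
The four accuracy metrics named in the abstract above can be computed as in the sketch below. The function name and sample values are hypothetical. Note that sMAPE has several definitions in the literature; the variant here divides by the mean of the absolute actual and predicted values, which is an assumption about the form used in the paper.

```python
import math


def forecast_errors(actual, predicted):
    """RMSE, MAE, MAPE (%) and sMAPE (%) for paired series.

    Assumes equal-length inputs and nonzero actual values (for MAPE).
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100 / n * sum(abs(e) / abs(a) for e, a in zip(errors, actual))
    smape = 100 / n * sum(
        2 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)
    )
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "sMAPE": smape}


# Hypothetical daily counts, for illustration only.
scores = forecast_errors(actual=[100.0, 200.0], predicted=[110.0, 190.0])
```

RMSE penalizes large errors more heavily than MAE, while the two percentage metrics make models comparable across countries with very different case counts.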
15.
Health Res Policy Syst ; 20(1): 43, 2022 Apr 18.
Article in English | MEDLINE | ID: mdl-35436896

ABSTRACT

BACKGROUND: Brief behavioural support can effectively help tuberculosis (TB) patients quit smoking and improve their outcomes. In collaboration with TB programmes in Bangladesh, Nepal and Pakistan, we evaluated the implementation and scale-up of cessation support using four strategies: (1) brief tobacco cessation intervention, (2) integration of tobacco cessation within routine training, (3) inclusion of tobacco indicators in routine records and (4) embedding research within TB programmes. METHODS: We used mixed methods of observation, interviews, questionnaires and routine data. We aimed to understand the extent and facilitators of vertical scale-up (institutionalization) within 59 health facility learning sites in Pakistan, 18 in Nepal and 15 in Bangladesh, and horizontal scale-up (increased coverage beyond learning sites). We observed training and surveyed all 169 TB health workers who were trained, in order to measure changes in their confidence in delivering cessation support. Routine TB data from the learning sites were analysed to assess intervention delivery and use of TB forms revised to report smoking status and cessation support provided. A purposive sample of TB health workers, managers and policy-makers were interviewed (Bangladesh n = 12; Nepal n = 13; Pakistan n = 19). Costs of scale-up were estimated using activity-based cost analysis. RESULTS: Routine data indicated that health workers in learning sites asked all TB patients about tobacco use and offered them cessation support. Qualitative data showed use of intervention materials, often with adaptation and partial implementation in busy clinics. Short (1-2 hours) training integrated within existing programmes increased mean confidence in delivering cessation support by 17% (95% CI: 14-20%). A focus on health system changes (reporting, training, supervision) facilitated vertical scale-up. 
Dissemination of materials beyond learning sites and changes to national reporting forms and training indicated a degree of horizontal scale-up. Embedding research within TB health systems was crucial for horizontal scale-up and required the dynamic use of tactics including alliance-building, engagement in the wider policy process, use of insider researchers and a deep understanding of health system actors and processes. CONCLUSIONS: System-level changes within TB programmes may facilitate routine delivery of cessation support to TB patients. These strategies are inexpensive, and with concerted efforts from TB programmes and donors, tobacco cessation can be institutionalized at scale.


Subject(s)
Tobacco Use Cessation , Tuberculosis , Health Behavior , Humans , Smoking/therapy , Tobacco Use , Tobacco Use Cessation/methods , Tuberculosis/therapy
17.
J Healthc Eng ; 2021: 2567080, 2021.
Article in English | MEDLINE | ID: mdl-34512933

ABSTRACT

In this paper, we focus on machine learning (ML) feature selection (FS) algorithms for identifying and diagnosing multidrug-resistant (MDR) tuberculosis (TB). MDR-TB is a universal public health problem, and its early detection has been one of the burning issues. The present study was conducted in the Malakand Division of Khyber Pakhtunkhwa, Pakistan, to add to the knowledge on the disease and to address the identification and early detection of MDR-TB using ML algorithms. These models also identify the most important factors causing MDR-TB infection, the study of which gives additional insights into the matter. ML algorithms such as random forest (RF), k-nearest neighbors, support vector machine (SVM), logistic regression, least absolute shrinkage and selection operator (LASSO), artificial neural networks (ANNs), and decision trees are applied to analyse the case-control dataset. This study reveals that close contact with MDR-TB patients, smoking, depression, previous TB history, improper treatment, and interruption in first-line TB treatment have a great impact on MDR status. Accordingly, weight loss, chest pain, hemoptysis, and fatigue are important symptoms. Based on accuracy, sensitivity, and specificity, SVM and RF are the suggested models for patient classification.


Subject(s)
Antitubercular Agents , Tuberculosis, Multidrug-Resistant , Algorithms , Antitubercular Agents/therapeutic use , Humans , Machine Learning , Pakistan , Tuberculosis, Multidrug-Resistant/diagnosis , Tuberculosis, Multidrug-Resistant/drug therapy , Tuberculosis, Multidrug-Resistant/epidemiology
18.
PeerJ Comput Sci ; 7: e562, 2021.
Article in English | MEDLINE | ID: mdl-34141889

ABSTRACT

In this paper, a novel feature selection method for microarray gene expression datasets, called Robust Proportional Overlapping Score (RPOS), is proposed. It utilizes a robust measure of dispersion, the Median Absolute Deviation (MAD). This method robustly identifies the most discriminative genes by considering the overlapping scores of the gene expression values for binary class problems. Genes with a high degree of overlap between classes are discarded and the ones that discriminate between the classes are selected. The results of the proposed method are compared with five state-of-the-art gene selection methods based on classification error, Brier score, and sensitivity, by considering eleven gene expression datasets. Classification of observations for different sets of selected genes by the proposed method is carried out by three different classifiers, i.e., random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). Box-plots and stability scores of the results are also shown in this paper. The results reveal that in most cases the proposed method outperforms the other methods.

19.
PLoS One ; 15(11): e0242762, 2020.
Article in English | MEDLINE | ID: mdl-33253248

ABSTRACT

OBJECTIVES: Forecasting epidemics like COVID-19 is of crucial importance: it will help not only governments but also medical practitioners to know the future trajectory of the spread, which might help them with the best possible treatments, precautionary measures and protections. In this study, the popular autoregressive integrated moving average (ARIMA) model will be used to forecast the cumulative number of confirmed cases, recovered cases, and deaths in Pakistan from COVID-19 spanning June 25, 2020 to July 04, 2020 (a 10-days-ahead forecast). METHODS: To meet the desired objectives, data for this study have been taken from the website of the Ministry of National Health Service of Pakistan, covering February 27, 2020 to June 24, 2020. Two different ARIMA models will be used to obtain the 10-days-ahead point and 95% interval forecasts of the cumulative confirmed cases, recovered cases, and deaths. The statistical software RStudio, with the "forecast", "ggplot2", "tseries", and "seasonal" packages, has been used for data analysis. RESULTS: The forecasted cumulative confirmed cases, recovered cases, and deaths up to July 04, 2020 are 231239 with a 95% prediction interval of (219648, 242832), 111616 with a prediction interval of (101063, 122168), and 5043 with a 95% prediction interval of (4791, 5295), respectively. Statistical measures, i.e., root mean square error (RMSE) and mean absolute error (MAE), are used to assess model accuracy. It is evident from the analysis results that the ARIMA and seasonal ARIMA models are better than the other time series models in terms of forecasting accuracy and are hence recommended for forecasting epidemics like COVID-19. CONCLUSION: It is concluded from this study that the forecasting accuracy of ARIMA models in terms of RMSE and MAE is better than that of the other time series models, and therefore they could be considered a good tool for forecasting the spread, recoveries, and deaths from the current outbreak of COVID-19.
Besides, this study can also help decision-makers develop short-term strategies with regard to the current number of disease occurrences until an appropriate medication is developed.


Subject(s)
COVID-19/epidemiology , Forecasting , Humans , Models, Statistical , Pakistan/epidemiology , Seasons
20.
Front Genet ; 11: 539227, 2020.
Article in English | MEDLINE | ID: mdl-33093842

ABSTRACT

Meiotic recombination is the driving force of evolutionary development and an important source of genetic variation. Meiotic recombination does not take place randomly in a chromosome but occurs in some regions of the chromosome. Regions of a chromosome with a higher rate of meiotic recombination events are considered hotspots, and regions where the frequency of recombination events is lower are called coldspots. Prediction of meiotic recombination spots provides useful information about the basic functionality of inheritance and genome diversity. This study proposes an intelligent computational predictor called iRSpots-DNN for the identification of recombination spots. The proposed predictor is based on a novel feature extraction method and an optimized deep neural network (DNN). The DNN was employed as a classification engine, whereas the novel feature extraction method was developed to extract meaningful features for the identification of hotspots and coldspots across the yeast genome. Unlike previous algorithms, the proposed feature extraction avoids bias among different selected features and preserves the sequence discriminant properties along with the sequence-structure information simultaneously. This study also considered other effective classifiers, namely support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF), to predict recombination spots. Experimental results on a benchmark dataset with 10-fold cross-validation showed that iRSpots-DNN achieved the highest accuracy, i.e., 95.81%. Additionally, the performance of the proposed iRSpots-DNN is significantly better than that of the existing predictors on a benchmark dataset. The relevant benchmark dataset and source code are freely available at: https://github.com/Fatima-Khan12/iRspot_DNN/tree/master/iRspot_DNN.
