1.
J Sci Food Agric; 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-39031773

ABSTRACT

BACKGROUND: Different varieties of rice vary in planting time, stress resistance, and other characteristics. With advances in rice-breeding technology, the number of rice varieties has increased significantly, making variety identification crucial for both trading and planting. RESULTS: This study collected RGB images of 20 hybrid rice seed varieties. An enhanced deep super-resolution network (EDSR) was employed to enhance image resolution, and a variety classification model trained on the high-resolution dataset outperformed the model trained on the low-resolution dataset. A novel training sample selection methodology was introduced, integrating deep learning with the Kennard-Stone (KS) algorithm. Convolutional neural networks (CNNs) and autoencoders served as supervised and unsupervised feature extractors, respectively. The extracted feature vectors were then processed by the KS algorithm to select training samples. The proposed methodologies outperformed the random selection approach in rice variety classification, improving overall classification accuracy by approximately 10.08%. Furthermore, the impact of noise on the proposed methodology was investigated by adding noise to the images, and the proposed methodologies maintained their advantage over the random selection approach on the noisy image dataset. CONCLUSION: The experimental results indicate that both supervised and unsupervised learning models performed effectively as feature extractors, and the deep learning framework significantly influenced the selection of training set samples. This study presents a novel approach to training sample selection in classification tasks and suggests that the proposed method could extend to other image datasets and other types of data; further exploration of this potential is warranted. © 2024 Society of Chemical Industry.
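The abstract gives no implementation details, but the Kennard-Stone core it builds on is the classic max-min distance rule. Below is a minimal sketch that assumes the deep feature vectors (from a CNN or autoencoder, as in the paper) have already been extracted; the function name and interface are illustrative, not from the paper.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Select n_select representative samples from feature matrix X using
    the Kennard-Stone algorithm: seed with the pair of samples at maximum
    mutual distance, then repeatedly add the sample whose minimum distance
    to the already-selected set is largest."""
    X = np.asarray(X, dtype=float)
    # full pairwise Euclidean distance matrix (fine for a sketch; O(n^2) memory)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # seed with the two most distant samples
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select and remaining:
        # for each candidate, distance to its nearest already-selected sample
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected
```

The paper's twist is running this rule on learned deep features rather than raw pixels; the selection step itself is unchanged.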

2.
Nutr Neurosci; :1-11, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39046352

ABSTRACT

Objective: Previous studies have suggested that diet is associated with depressive symptoms. We aimed to develop and validate a Dietary Depression Index (DDI) based on dietary prediction of depression in a large Chinese cancer screening cohort. Methods: In the training set (n = 2729), we developed the DDI using intake of 20 food groups, derived from a food frequency questionnaire, to predict depression as assessed by the Patient Health Questionnaire-9, based on the reduced rank regression method. Sensitivity, specificity, positive predictive value, and negative predictive value were used to assess the performance of the DDI in evaluating depression in the validation dataset (n = 1176). Results: Receiver operating characteristic analysis was performed to determine the best cut-off value of the DDI for predicting depression. In the study population, the DDI ranged from -3.126 to 1.810. The discriminative ability of the DDI in predicting depression was good, with an AUC of 0.799 overall, 0.794 in males, and 0.808 in females. The best cut-off values of the DDI for depression prediction were 0.204 overall, 0.330 in males, and 0.034 in females. The DDI is thus a validated method to assess the effects of diet on depression. Conclusion: Among the individual food components in the DDI, fermented vegetables, fresh vegetables, whole grains, and onions were inversely associated with depressive symptoms, whereas legumes, pickled vegetables, and rice were positively associated.
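The abstract does not state which criterion picked the "best cut-off value" on the ROC curve; the usual choice is the threshold maximizing Youden's J (sensitivity + specificity - 1), assumed here. A minimal sketch:

```python
def best_cutoff(scores, labels):
    """Pick the score threshold maximizing Youden's J = sensitivity +
    specificity - 1, the common way to choose a 'best cut-off' from a
    ROC analysis. labels are 1 (case) or 0 (control); a sample is called
    positive when its score >= threshold."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        j = sens + spec - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j
```

With a DDI-style continuous score this would yield thresholds like the paper's 0.204 overall; the example here is purely synthetic.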

3.
Sensors (Basel); 24(5), 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38475088

ABSTRACT

A computational spectrometer is a novel form of spectrometer well suited to portable in situ applications. In the encoding part of a computational spectrometer, filters with highly non-correlated responses are requisite for compressed sensing, which poses severe challenges for optical design and fabrication. In the reconstruction part, conventional iterative reconstruction algorithms offer limited efficiency and accuracy, which hinders their application to real-time in situ measurements. This study proposes a neural network computational spectrometer trained with a small dataset and high-correlation optical filters. We aim to change the paradigm in which the accuracy of neural network computational spectrometers depends heavily on the amount of training data and on the non-correlation of the optical filters. First, we propose a hypothesis about a distribution law for common large training datasets, in which a unique widespread distribution emerges when the spectrum correlation is calculated. Based on that, we sample the original dataset according to the distribution probability to form a small training dataset. A fully connected neural network architecture is then constructed to perform the reconstruction, and a group of thin-film filters is introduced as the encoding layer. The neural network is trained with the small dataset under high-correlation filters and applied in simulation. Finally, experiments indicate that the neural network enabled by the small training dataset performs very well with the thin-film filters. This study may provide a reference for computational spectrometers based on high-correlation optical filters.

4.
Data Brief; 51: 109727, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38020417

ABSTRACT

Inverse kinematics plays a vital role in the planning and execution of robot motions. In the design of motion control for NAO robot arms, a proper inverse kinematics model must be found. Neural networks are a flexible data-driven technique for modeling inverse kinematics: the model is obtained by training neural networks on a dataset, and this training is impossible without one. The contribution of this research is therefore to provide a dataset for developing neural-network-based inverse kinematics models for NAO robot arms. The dataset created in this paper is named ARKOMA, an acronym for ARif eKO MAuridhi, the creators of the dataset. It contains 10,000 input-output data pairs, in which the end-effector position and orientation are the input data and a set of joint angular positions are the output data. For further application, the dataset was split into three subsets: 60% of the data was allocated to the training dataset, 20% to the validation dataset, and the remaining 20% to the testing dataset. The dataset provided in this paper is applicable to the NAO H25 v3.3 or later.
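The 60/20/20 split described above is a one-liner to reproduce; this sketch assumes a simple shuffled random split (the data brief does not say whether the split was random or stratified), and the function name is illustrative.

```python
import random

def split_dataset(pairs, seed=0):
    """Shuffle input-output pairs and split them 60/20/20 into
    training / validation / testing subsets, as done for ARKOMA."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)  # deterministic shuffle for reproducibility
    n = len(pairs)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])
```

For the 10,000 ARKOMA pairs this yields subsets of 6000, 2000, and 2000 samples.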

5.
J Fr Ophtalmol; 46(7): 706-711, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37537126

ABSTRACT

PURPOSE: The purpose of this study was to evaluate the performance of ChatGPT, a cutting-edge artificial intelligence (AI) language model developed by OpenAI, in successfully completing the French language version of the European Board of Ophthalmology (EBO) examination and to assess its potential role in medical education and knowledge assessment. METHODS: ChatGPT, based on the GPT-4 architecture, was exposed to a series of EBO examination questions in French, covering various aspects of ophthalmology. The AI's performance was evaluated by comparing its responses with the correct answers provided by ophthalmology experts. Additionally, the study assessed the time taken by ChatGPT to answer each question as a measure of efficiency. RESULTS: ChatGPT achieved a 91% success rate on the EBO examination, demonstrating a high level of competency in ophthalmology knowledge and application. The AI provided correct answers across all question categories, indicating a strong understanding of basic sciences, clinical knowledge, and clinical management. The AI model also answered the questions rapidly, taking only a fraction of the time needed by human test-takers. CONCLUSION: ChatGPT's performance on the French language version of the EBO examination demonstrates its potential to be a valuable tool in medical education and knowledge assessment. Further research is needed to explore optimal ways to implement AI language models in medical education and to address the associated ethical and practical concerns.


Subject(s)
Artificial Intelligence; Ophthalmology; Humans; Language
6.
Sensors (Basel); 22(13), 2022 Jun 21.
Article in English | MEDLINE | ID: mdl-35808174

ABSTRACT

Obstacle detection for autonomous navigation through semantic image segmentation using neural networks has grown in popularity for use in unmanned ground and surface vehicles because of its ability to rapidly create a highly accurate pixel-wise classification of complex scenes. Due to the lack of available training data, semantic networks are rarely applied to navigation in complex water scenes such as rivers, creeks, canals, and harbors. This work seeks to address the issue by making a one-of-a-kind River Obstacle Segmentation En-Route By USV Dataset (ROSEBUD) publicly available for use in robotic SLAM applications that map water and non-water entities in fluvial images from the water level. ROSEBUD provides a challenging baseline for surface navigation in complex environments using complex fluvial scenes. The dataset contains 549 images encompassing various water qualities, seasons, and obstacle types that were taken on narrow inland rivers and then hand-annotated for use in semantic network training. The difference between the ROSEBUD dataset and existing marine datasets was verified. Two state-of-the-art networks were trained on existing water segmentation datasets and tested for generalization to the ROSEBUD dataset. Results from further training show that modern semantic networks custom-made for water recognition, and trained on marine images, can properly segment large areas, but they struggle to properly segment small obstacles in fluvial scenes without further training on the ROSEBUD dataset.


Subject(s)
Image Processing, Computer-Assisted; Vision, Monocular; Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Rivers; Semantics
7.
Sensors (Basel); 22(3), 2022 Feb 06.
Article in English | MEDLINE | ID: mdl-35161986

ABSTRACT

The robustness and noise immunity of convolutional neural networks are currently of great interest. In this paper, we propose a technique for robustness estimation and stability improvement. We also examined the noise immunity of convolutional neural networks and estimated the influence of uncertainty in the training and testing datasets on recognition probability. For this purpose, we estimated the recognition accuracies on multiple datasets with different uncertainties, analyzed these data, and derived the dependence of recognition accuracy on training dataset uncertainty. We hypothesized and proved the existence of an optimal (in terms of recognition accuracy) amount of uncertainty in the training data for neural networks working with data of undefined uncertainty. We showed that this optimum can be determined using statistical modeling. Adding an optimal amount of uncertainty (noise of some kind) to the training dataset can thus improve the overall recognition quality and noise immunity of convolutional neural networks.
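The abstract's "determination of this optimum ... using statistical modeling" reduces, in the simplest reading, to a grid search over training-noise amplitudes with repeated trials. This sketch assumes a user-supplied `train_eval` callable standing in for "train the CNN with noise level sigma and return held-out accuracy"; nothing here comes from the paper itself.

```python
import random

def best_noise_level(train_eval, noise_levels, trials=5, seed=0):
    """Grid-search the training-noise amplitude: train_eval(sigma, rng)
    returns a held-out accuracy for one training run at noise level
    sigma. Average over several trials and return the noise level with
    the best mean accuracy (the 'optimal amount of uncertainty')."""
    rng = random.Random(seed)
    return max(
        noise_levels,
        key=lambda s: sum(train_eval(s, rng) for _ in range(trials)) / trials,
    )
```

In practice `train_eval` would be expensive, so the trial count and grid would be kept small.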


Subject(s)
Neural Networks, Computer; Noise; Image Processing, Computer-Assisted; Probability; Recognition, Psychology; Uncertainty
8.
Environ Sci Pollut Res Int; 29(57): 85727-85741, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35001275

ABSTRACT

The enforcement of the Movement Control Order to curtail the spread of COVID-19 has affected home energy consumption, especially HVAC systems. Occupancy detection and estimation have been recognized as key contributors to improving building energy efficiency. Several solutions have been proposed over the past decade to improve the precision of occupancy detection and estimation in buildings. Environmental sensing is one of the practical solutions for detecting and estimating occupants in a building under uncertain occupant behavior. However, the literature reveals that the performance of environmental sensing is relatively poor due to the poor quality of the training datasets used in the models. This study proposes a smart sensing framework that combines camera-based and environmental sensing approaches using supervised learning to gather standard, robust datasets on indoor occupancy that can be used for cross-validation of different machine learning algorithms in future research. The proposed solution was tested in a living room with a prototype system integrated with various sensors using a random forest regressor, although other techniques could easily be integrated within the proposed framework. The primary implication of this study is the prediction of room occupancy from sensor inputs to a model, in order to lower energy consumption. The results indicate that the proposed solution can acquire data and process and predict occupant presence and number with 99.3% accuracy. Additionally, to demonstrate the impact of occupant number on energy saving, a room with two zones was modeled in DesignBuilder, each zone air-conditioned with a different thermostat controller: the first zone uses IoFClime and the second a modified IoFClime. The simulation was conducted in EnergyPlus with a random simulation of 10 occupants and local climate data under three scenarios. Thermal comfort analysis with the Fanger model shows that up to 50% and 25% of energy can be saved under the first and third scenarios, respectively.


Subject(s)
Air Pollution, Indoor; COVID-19; Humans; Air Pollution, Indoor/analysis; Air Conditioning; Climate; Efficiency
9.
Sensors (Basel); 21(15), 2021 Jul 22.
Article in English | MEDLINE | ID: mdl-34372214

ABSTRACT

This paper proposes a method to embed and extract a watermark in a digital hologram using a deep neural network. The entire watermarking algorithm consists of three sub-networks. For robustness, an attack simulation is inserted inside the deep neural network. By including attack simulation and holographic reconstruction in the network, the deep neural network can train for invisibility and robustness simultaneously. We propose a network training method using the hologram and its reconstruction. After training the proposed network, we analyze the robustness against each attack and perform re-training according to the results to improve robustness. We quantitatively evaluate robustness against various attacks and show the reliability of the proposed technique.


Subject(s)
Computer Security; Image Interpretation, Computer-Assisted; Algorithms; Neural Networks, Computer; Reproducibility of Results
10.
Sensors (Basel); 21(14), 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-34300686

ABSTRACT

The Internet of Things (IoT) consists of small devices or networks of sensors that permanently generate huge amounts of data. These devices usually have limited resources in computing power or memory, so raw data are transferred to central systems or the cloud for analysis. Lately, moving intelligence into the IoT has become feasible, with machine learning (ML) moved to edge devices. The aim of this study is to provide an experimental analysis of processing a large imbalanced dataset (DS2OS), split into a training dataset (80%) and a test dataset (20%). The training dataset was reduced by randomly selecting smaller numbers of samples to create new datasets Di (i = 1, 2, 5, 10, 15, 20, 40, 60, 80%). These were then used with several machine learning algorithms to identify the size at which the performance metrics saturate and classification results stop improving, with an F1 score of 0.95 or higher; this happened at 20% of the training dataset. Further on, two solutions for reducing the number of samples to provide a balanced dataset are given. In the first, datasets DRi consist of all anomalous samples in seven classes and a reduced majority class ('NL') with i = 0.1, 0.2, 0.5, 1, 2, 5, 10, 15, 20 percent of randomly selected samples. In the second, datasets DCi are generated from representative samples determined by clustering the training dataset. All three dataset-reduction methods showed comparable performance results. Further evaluation of training times and memory usage on a Raspberry Pi 4 shows that it is possible to run ML algorithms with limited-size datasets on edge devices.
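The DRi construction above (keep all anomalous samples, randomly subsample the 'NL' majority class) can be sketched in a few lines; the function name and interface are illustrative, not from the paper.

```python
import random

def reduce_majority(samples, labels, majority_label, frac, seed=0):
    """Build a DRi-style balanced subset: keep every minority-class
    sample and a random fraction frac of the majority class
    (e.g. 'NL' in DS2OS). Returns (sample, label) pairs."""
    rng = random.Random(seed)
    maj_idx = [i for i, y in enumerate(labels) if y == majority_label]
    min_idx = [i for i, y in enumerate(labels) if y != majority_label]
    n_keep = max(1, int(frac * len(maj_idx)))  # keep at least one majority sample
    kept = set(min_idx) | set(rng.sample(maj_idx, n_keep))
    return [(samples[i], labels[i]) for i in sorted(kept)]
```

The clustering-based DCi variant would replace `rng.sample` with the medoids of clusters fitted on the majority class.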


Subject(s)
Internet of Things; Machine Learning; Algorithms; Benchmarking
11.
Comput Methods Programs Biomed; 206: 106111, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33957377

ABSTRACT

BACKGROUND AND OBJECTIVE: Lung cancer is the most common type of cancer, with a high mortality rate. Early detection using medical imaging is critically important for the long-term survival of patients. Computer-aided diagnosis (CAD) tools can potentially reduce the number of incorrect interpretations of medical image data by radiologists. Datasets with adequate sample size, annotation, and truth are the dominant factors in developing and training effective CAD algorithms. The objective of this study was to produce a practical approach and a tool for the creation of medical image datasets. METHODS: The proposed model uses the modified maximum transverse diameter approach to mark a putative lung nodule. The modification involves the possibility to use a set of overlapping spheres of appropriate size to approximate the shape of the nodule. The algorithm embedded in the model also groups the marks made by different readers for the same lesion. We used the data of 536 randomly selected patients of Moscow outpatient clinics to create a dataset of standard-dose chest computed tomography (CT) scans utilizing the double-reading approach with arbitration. Six volunteer radiologists independently produced a report for each scan using the proposed model, with the main focus on the detection of lesions with sizes ranging from 3 to 30 mm. After this, an arbitrator reviewed their marks and annotations. RESULTS: The maximum transverse diameter approach outperformed the alternative methods (3D box, ellipsoid, and complete outline construction) in a study of 10,000 computer-generated tumor models of different shapes in terms of accuracy and speed of nodule shape approximation. The markup and annotation of the CTLungCa-500 dataset revealed 72 studies containing no lung nodules. The remaining 464 CT scans contained 3151 lesions marked by at least one radiologist: 56%, 14%, and 29% of the lesions were malignant, benign, and non-nodular, respectively. A total of 2887 lesions had the target size of 3-30 mm. Only 70 nodules were identified unanimously by all six readers. Increasing the number of independent readers interpreting the CT scans increased accuracy while decreasing agreement. The dataset markup process took three working weeks. CONCLUSIONS: The developed cluster model simplifies the collaborative and crowdsourced creation of image repositories and makes it time-efficient. Our proof-of-concept dataset provides a valuable source of annotated medical imaging data for training CAD algorithms aimed at early detection of lung nodules. The tool and the dataset are publicly available at https://github.com/Center-of-Diagnostics-and-Telemedicine/FAnTom.git and https://mosmed.ai/en/datasets/ct_lungcancer_500/, respectively.


Subject(s)
Lung Neoplasms; Solitary Pulmonary Nodule; Algorithms; Diagnosis, Computer-Assisted; Humans; Lung Neoplasms/diagnostic imaging; Radiographic Image Interpretation, Computer-Assisted; Sensitivity and Specificity; Solitary Pulmonary Nodule/diagnostic imaging; Tomography, X-Ray Computed
12.
Australas Phys Eng Sci Med; 42(2): 573-584, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31087232

ABSTRACT

The construction of a powerful statistical shape model (SSM) requires a rich training dataset that covers a large variety of complex anatomical topologies. The lack of real data leaves most SSMs unable to generalize to unseen instances. Artificial enrichment of the training data is one method proposed to address this issue. In this paper, we introduce a novel technique called constrained cage-based deformation (CCBD), which can produce unlimited artificial data and promises to enrich variability within the training dataset. The proposed method is a two-step algorithm: the first step moves a few handles together, and the second step transfers the displacements of these handles to the base mesh vertices to generate a realistic new instance. Evaluation of the statistical characteristics of CCBD confirms that our proposed technique quantitatively outperforms notable data-generating methods, both in generalization ability and in specificity.


Subject(s)
Algorithms; Databases as Topic; Models, Statistical; Femur/anatomy & histology; Humans; Imaging, Three-Dimensional; Liver/anatomy & histology; Numerical Analysis, Computer-Assisted; Principal Component Analysis; Reproducibility of Results
13.
BMC Genet; 18(1): 115, 2017 Dec 16.
Article in English | MEDLINE | ID: mdl-29246113

ABSTRACT

BACKGROUND: Weighted genetic risk scores (GRS), defined as weighted sums of risk alleles of single nucleotide polymorphisms (SNPs), are statistically powerful for detecting gene-environment (GxE) interactions. To assign weights, the gold standard is to use external weights from an independent study. However, appropriate external weights are not always available. In such situations, and in the presence of predominant marginal genetic effects, we have shown in a previous study that GRS with internal weights from marginal genetic effects ("GRS-marginal-internal") are a powerful and reliable alternative to single-SNP approaches or unweighted GRS. However, this approach might not be appropriate for detecting predominant interactions, i.e., interactions showing an effect stronger than the marginal genetic effect. METHODS: In this paper, we present a weighting approach for such predominant interactions ("GRS-interaction-training") in which part of the data is used to estimate the weights from the interaction terms and the remaining data are used to determine the GRS. We conducted a simulation study for the detection of GxE interactions in which we evaluated power, type I error, and sign misspecification. We compared this new weighting approach to the GRS-marginal-internal approach and to GRS with external weights. RESULTS: Our simulation study showed that in the absence of external weights and with predominant interaction effects, the highest power was reached with the GRS-interaction-training approach. If marginal genetic effects were predominant, the GRS-marginal-internal approach was more appropriate. Furthermore, the power to detect interactions reached by the GRS-interaction-training approach was only slightly lower than that achieved by GRS with external weights. The power of the GRS-interaction-training approach was confirmed in a real data application to the Traffic, Asthma and Genetics (TAG) Study (N = 4465 observations). CONCLUSION: When appropriate external weights are unavailable, we recommend using internal weights from the study population itself to construct weighted GRS for GxE interaction studies. If the SNPs were chosen because a strong marginal genetic effect was hypothesized, GRS-marginal-internal should be used. If the SNPs were chosen because of their collective impact on the biological mechanisms mediating the environmental effect (hypothesis of predominant interactions), GRS-interaction-training should be applied.
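The score itself is simple once the weights exist; the methodological work above is entirely about where the weights come from. A minimal sketch of the definition (the function name is illustrative):

```python
def weighted_grs(genotypes, weights):
    """Weighted genetic risk score for one subject: sum over SNPs of the
    risk-allele count (0, 1, or 2) times the SNP's weight. In the
    GRS-interaction-training approach the weights are GxE interaction-term
    estimates from a training split; in GRS-marginal-internal they are
    marginal genetic effect estimates."""
    return sum(g * w for g, w in zip(genotypes, weights))
```

The GRS then enters a single regression term (GRS x exposure) in place of many single-SNP interaction tests, which is where the power gain comes from.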


Subject(s)
Asthma/genetics; Environmental Pollution; Gene-Environment Interaction; Polymorphism, Single Nucleotide; Child; Computer Simulation; Genetic Markers; Genome-Wide Association Study; Genotype; Humans; Inflammation/genetics; Models, Genetic; Risk Factors
14.
J Digit Imaging; 30(5): 629-639, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28405834

ABSTRACT

We propose a generalized framework for developing computer-aided detection (CADe) systems whose characteristics depend only on those of the training dataset. The purpose of this study is to show the feasibility of the framework. Two different CADe systems were experimentally developed with a prototype of the framework but with different training datasets. The CADe systems include four components: preprocessing, candidate area extraction, candidate detection, and candidate classification. Four pretrained algorithms with dedicated optimization/setting methods corresponding to the respective components were prepared in advance and were sequentially trained in the order of processing of the components. In this study, two different datasets, brain MRA with cerebral aneurysms and chest CT with lung nodules, were collected to develop the two types of CADe systems within the framework. The performance of the developed CADe systems was evaluated by threefold cross-validation. CADe systems for detecting cerebral aneurysms in brain MRAs and for detecting lung nodules in chest CTs were successfully developed using the respective datasets, showing the framework to be feasible. This feasibility shows promise for a new paradigm in the development of CADe systems: development without any lesion-specific algorithm design.


Subject(s)
Algorithms; Diagnosis, Computer-Assisted/methods; Intracranial Aneurysm/diagnostic imaging; Magnetic Resonance Angiography/methods; Multiple Pulmonary Nodules/diagnostic imaging; Tomography, X-Ray Computed/methods; Feasibility Studies; Female; Humans; Male; Middle Aged
15.
Med Phys; 44(6): 2515-2531, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28339103

ABSTRACT

PURPOSE: To develop a new automated treatment planning solution for breast and rectal cancer radiotherapy. METHODS: The automated treatment planning solution developed in this study includes selection of an iteratively optimized training dataset, dose-volume histogram (DVH) prediction for the organs at risk (OARs), and automatic generation of clinically acceptable treatment plans. The iteratively optimized training dataset was selected by iterative optimization from 40 treatment plans for left-breast and rectal cancer patients who received radiation therapy. A two-dimensional kernel density estimation algorithm (denoted two-parameter KDE), which incorporated two predictive features, was implemented to produce the predicted DVHs. Finally, 10 additional left-breast treatment plans were re-planned using the Pinnacle3 Auto-Planning (AP) module (version 9.10, Philips Medical Systems) with objective functions derived from the predicted DVH curves, and the automatically generated re-optimized treatment plans were compared with the original manually optimized plans. RESULTS: By combining the iteratively optimized training dataset methodology and the two-parameter KDE prediction algorithm, our proposed automated planning strategy improves the accuracy of DVH prediction. The automatically generated treatment plans using doses derived from the predicted DVHs achieve better dose sparing for some OARs without compromising other metrics of plan quality. CONCLUSIONS: The proposed automated treatment planning solution can be used to efficiently evaluate and improve the quality and consistency of treatment plans for intensity-modulated breast and rectal cancer radiation therapy.
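The abstract does not define its "two-parameter KDE" beyond it being a two-dimensional kernel density estimate over two predictive features; a plain bivariate Gaussian KDE, which such a predictor would build on, looks like this (names and bandwidth handling are illustrative assumptions, not the paper's method):

```python
import math

def kde2d(points, query, bandwidth=1.0):
    """Bivariate Gaussian kernel density estimate at a query point:
    average of Gaussian kernels centered on the training points,
    normalized so the estimate integrates to 1 over the plane."""
    h = bandwidth
    total = 0.0
    for x, y in points:
        dx, dy = (query[0] - x) / h, (query[1] - y) / h
        total += math.exp(-0.5 * (dx * dx + dy * dy))
    return total / (len(points) * 2 * math.pi * h * h)
```

In the DVH-prediction setting, the `points` would be (feature, dose) observations from the training plans, and conditional slices of the density would give the predicted DVH values.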


Subject(s)
Organs at Risk; Radiotherapy Planning, Computer-Assisted; Rectal Neoplasms/radiotherapy; Humans; Radiotherapy Dosage; Radiotherapy, Intensity-Modulated
16.
Anal Biochem; 497: 48-56, 2016 Mar 15.
Article in English | MEDLINE | ID: mdl-26723495

ABSTRACT

Succinylation is a posttranslational modification (PTM) in which a succinyl group is added to a Lys (K) residue of a protein molecule. Lysine succinylation plays an important role in orchestrating various biological processes, but it is also associated with some diseases. Therefore, both basic research and drug development pose the following problem: given an uncharacterized protein sequence containing many Lys residues, which of them can be succinylated, and which cannot? With the avalanche of protein sequences generated in the postgenomic age, the answer has become even more urgent. Fortunately, statistically significant experimental data for succinylated sites in proteins have become available very recently, an indispensable prerequisite for developing a computational method to address this problem. By incorporating sequence-coupling effects into the general pseudo amino acid composition, and by using KNNC (K-nearest neighbors cleaning) and IHTS (inserting hypothetical training samples) treatments to optimize the training dataset, a predictor called iSuc-PseOpt has been developed. Rigorous cross-validations indicated that it remarkably outperformed the existing method. A user-friendly web server for iSuc-PseOpt has been established at http://www.jci-bioinfo.cn/iSuc-PseOpt, where users can easily get their desired results without needing to go through the complicated mathematical equations involved.
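The abstract does not spell out its KNNC treatment; a common nearest-neighbor cleaning variant drops a training sample when most of its nearest neighbors carry the opposite label, on the grounds that it is likely mislabeled or sits on a noisy class boundary. A minimal sketch under that assumption (the function name is illustrative):

```python
def knn_clean(samples, labels, k=3):
    """K-nearest-neighbors cleaning: return the indices of samples kept,
    dropping any sample for which fewer than half of its k nearest
    neighbors share its label."""
    def dist(a, b):
        # squared Euclidean distance; monotone, so fine for ranking neighbors
        return sum((x - y) ** 2 for x, y in zip(a, b))
    kept = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        nbrs = sorted((j for j in range(len(samples)) if j != i),
                      key=lambda j: dist(x, samples[j]))[:k]
        agree = sum(1 for j in nbrs if labels[j] == y)
        if 2 * agree >= k:  # keep if at least half the neighbors agree
            kept.append(i)
    return kept
```

For peptide data the Euclidean distance would be computed on the pseudo amino acid composition vectors rather than raw sequences.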


Subject(s)
Lysine/analysis; Proteins/chemistry; Succinates/chemistry; Algorithms; Animals; Artificial Intelligence; Databases, Protein; Humans; Internet; Software
17.
Molecules; 21(1): E95, 2016 Jan 19.
Article in English | MEDLINE | ID: mdl-26797600

ABSTRACT

Knowledge of protein-protein interactions and their binding sites is indispensable for in-depth understanding of the networks in living cells. With the avalanche of protein sequences generated in the postgenomic age, it is critical to develop computational methods for identifying protein-protein binding sites (PPBSs) in a timely fashion from sequence information alone, because the information obtained this way can be used for both biomedical research and drug development. To address this challenge, we have proposed a new predictor, called iPPBS-Opt, in which we have used: (1) the K-Nearest Neighbors Cleaning (KNNC) and Inserting Hypothetical Training Samples (IHTS) treatments to optimize the training dataset; (2) an ensemble voting approach to select the most relevant features; and (3) the stationary wavelet transform to formulate the statistical samples. Cross-validation tests against experiment-confirmed results have demonstrated that the new predictor is very promising, implying that the aforementioned practices are indeed very effective. In particular, the approach of using wavelets to express protein/peptide sequences might be the key to grasping the problem's essence, fully consistent with the finding that many important biological functions of proteins can be elucidated through their low-frequency internal motions. To maximize the convenience of most experimental scientists, we have provided a step-by-step guide on how to use the predictor's web server (http://www.jci-bioinfo.cn/iPPBS-Opt) to get the desired results without the need to go through the complicated mathematical equations involved.


Subject(s)
Binding Sites; Carrier Proteins/chemistry; Computational Biology/methods; Protein Interaction Domains and Motifs; Proteins/chemistry; Software; Algorithms; Carrier Proteins/metabolism; Datasets as Topic; Proteins/metabolism
18.
J Biomol Struct Dyn; 33(10): 2221-33, 2015.
Article in English | MEDLINE | ID: mdl-25513722

ABSTRACT

Information about the interactions of drug compounds with proteins in cellular networking is very important for drug development. Unfortunately, all existing predictors for identifying drug-protein interactions were trained on skewed benchmark datasets in which the number of non-interactive drug-protein pairs is overwhelmingly larger than that of the interactive ones. Training predictors on such highly unbalanced benchmark datasets leads to many interactive drug-protein pairs being mispredicted as non-interactive. Since the minority interactive pairs often contain the most important information for drug design, it is necessary to minimize this kind of misprediction. In this study, we adopted the neighborhood cleaning rule and the synthetic minority over-sampling technique (SMOTE) to treat the skewed benchmark datasets and balance the positive and negative subsets. The new benchmark datasets thus obtained are called the optimized benchmark datasets, based on which a new predictor called iDrug-Target was developed. It contains four sub-predictors: iDrug-GPCR, iDrug-Chl, iDrug-Ezy, and iDrug-NR, specialized for identifying the interactions of drug compounds with GPCRs (G-protein-coupled receptors), ion channels, enzymes, and NRs (nuclear receptors), respectively. Rigorous cross-validations on a set of experiment-confirmed datasets have indicated that these new predictors remarkably outperform the existing ones for the same purpose. To maximize users' convenience, a publicly accessible web server for iDrug-Target has been established at http://www.jci-bioinfo.cn/iDrug-Target/, through which users can easily get their desired results. It has not escaped our notice that the aforementioned strategy can be widely used in many other areas as well.
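SMOTE's core idea, as used above to grow the minority (interactive) class, is to synthesize new samples by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below is a deliberately minimal variant of that idea, not the full published algorithm, and its names are illustrative.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples: pick a minority point,
    find its k nearest minority neighbors, and interpolate a random
    fraction of the way toward one of them (SMOTE-style oversampling)."""
    rng = random.Random(seed)
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: dist(x, p))[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(xi + t * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic
```

The complementary neighborhood cleaning rule would then prune majority samples near these minority points, which is the other half of the balancing strategy the abstract describes.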


Subject(s)
Drugs, Investigational/chemistry; Enzymes/chemistry; Ion Channels/chemistry; Receptors, Cytoplasmic and Nuclear/chemistry; Receptors, G-Protein-Coupled/chemistry; Software; Benchmarking; Databases, Chemical; Datasets as Topic; Drug Design; Drug Discovery; Drugs, Investigational/chemical synthesis; Enzymes/metabolism; Humans; Internet; Ion Channels/metabolism; Molecular Targeted Therapy/methods; Protein Binding; ROC Curve; Receptors, Cytoplasmic and Nuclear/metabolism; Receptors, G-Protein-Coupled/metabolism