Search | BVS Regional Portal

Deep Learning to Generate in Silico Chemical Property Libraries and Candidate Molecules for Small Molecule Identification in Complex Samples.

Colby, Sean M; Nuñez, Jamie R; Hodas, Nathan O; Corley, Courtney D; Renslow, Ryan R.

Anal Chem ; 92(2): 1720-1729, 2020 01 21.

Article in English | MEDLINE | ID: mdl-31661259

ABSTRACT

Comprehensive and unambiguous identification of small molecules in complex samples will revolutionize our understanding of the role of metabolites in biological systems. Existing and emerging technologies have enabled measurement of chemical properties of molecules in complex mixtures and, in concert, are sensitive enough to resolve even stereoisomers. Despite these experimental advances, small molecule identification is inhibited by (i) chemical reference libraries (e.g., mass spectra, collision cross section, and other measurable property libraries) representing <1% of known molecules, limiting the number of possible identifications, and (ii) the lack of a method to generate candidate matches directly from experimental features (i.e., without a library). To this end, we developed a variational autoencoder (VAE) to learn a continuous numerical, or latent, representation of molecular structure to expand reference libraries for small molecule identification. We extended the VAE to include a chemical property decoder, trained as a multitask network, in order to shape the latent representation such that it assembles according to desired chemical properties. The approach is unique in its application to metabolomics and small molecule identification, with its focus on properties that can be obtained from experimental measurements (m/z, CCS) paired with its training paradigm, which involved a cascade of transfer learning iterations. First, molecular representation is learned from a large data set of structures with m/z labels. Next, in silico property values are used to continue training, as experimental property data is limited. Finally, the network is further refined by being trained with the experimental data. This allows the network to learn as much as possible at each stage, enabling success with progressively smaller data sets without overfitting. 
Once trained, the network can be used to predict chemical properties directly from structure, as well as generate candidate structures with desired chemical properties. Our approach is orders of magnitude faster than first-principles simulation for CCS property prediction. Additionally, the ability to generate novel molecules along manifolds, defined by chemical property analogues, positions DarkChem as highly useful in a number of application areas, including metabolomics and small molecule identification, drug discovery and design, chemical forensics, and beyond.
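The cascade of transfer learning stages described in this abstract can be sketched with a toy model. This is a minimal illustration, not the DarkChem network: a two-parameter linear model stands in for the VAE, and the three synthetic data sets (`stage1`, `stage2`, `stage3`), their sizes, and their underlying slopes are all assumptions chosen only to mimic "large proxy labels, then in silico values, then a small experimental set". The point shown is that each stage starts from the parameters left by the previous one.

```python
import random

def mse(params, data):
    """Mean squared error of the linear model y = w*x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(params, data, lr=0.1, epochs=200):
    """Full-batch gradient descent, warm-started from `params`."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def make_data(n, slope, intercept, rng):
    """Hypothetical noise-free data set drawn from a known line."""
    return [(x, slope * x + intercept) for x in (rng.random() for _ in range(n))]

rng = random.Random(0)
stage1 = make_data(1000, 2.0, 1.0, rng)  # large set with cheap labels (assumed)
stage2 = make_data(100, 2.2, 1.1, rng)   # smaller in silico set (assumed)
stage3 = make_data(10, 2.3, 1.2, rng)    # small experimental set (assumed)

params = (0.0, 0.0)
history = []  # parameters after each stage
for data in (stage1, stage2, stage3):
    params = train(params, data)  # continue from the previous stage's weights
    history.append(params)

print("error on experimental data before final stage:", mse(history[1], stage3))
print("error on experimental data after final stage: ", mse(history[2], stage3))
```

Because the final stage starts close to a good solution, the ten experimental points are only asked to make a small correction rather than to fit the model from scratch.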

Subject(s)

Computer Simulation , Deep Learning , Small Molecule Libraries/analysis , Metabolomics , Molecular Structure , Small Molecule Libraries/metabolism

Doing the Impossible: Why Neural Networks Can Be Trained at All.

Hodas, Nathan O; Stinis, Panos.

Front Psychol ; 9: 1185, 2018.

Article in English | MEDLINE | ID: mdl-30050485

ABSTRACT

As deep neural networks grow in size, from thousands to millions to billions of weights, the performance of those networks becomes limited by our ability to accurately train them. A common naive question arises: if we have a system with billions of degrees of freedom, don't we also need billions of samples to train it? Of course, the success of deep learning indicates that reliable models can be learned with reasonable amounts of data. Similar questions arise in protein folding, spin glasses and biological neural networks. With effectively infinite potential folding/spin/wiring configurations, how does the system find the precise arrangement that leads to useful and robust results? Simple sampling of the possible configurations until an optimal one is reached is not a viable option even if one waited for the age of the universe. On the contrary, there appears to be a mechanism in the above phenomena that forces them to achieve configurations that live on a low-dimensional manifold, avoiding the curse of dimensionality. In the current work we use the concept of mutual information between successive layers of a deep neural network to elucidate this mechanism and suggest possible ways of exploiting it to accelerate training. We show that adding structure to the neural network leads to higher mutual information between layers. High mutual information between layers implies that the effective number of free parameters is exponentially smaller than the raw number of tunable weights, providing insight into why neural networks with far more weights than training points can be reliably trained.
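Mutual information between two layers can be estimated from paired samples of their activations. The sketch below is a plug-in estimator over discrete values and is only an illustration: the abstract does not say which estimator the authors used, and the function name `mutual_information` and the assumption that activations have already been discretized into bins are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        p_indep = (count_x[x] / n) * (count_y[y] / n)
        mi += p_joint * log2(p_joint / p_indep)
    return mi

# A layer that copies its input shares all of the input's information.
layer_a = [0, 1, 0, 1]
print(mutual_information(layer_a, layer_a))       # 1.0 bit

# Two statistically independent layers share none.
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 bits
```

Plug-in estimates of this kind are biased upward when samples are few relative to the number of bins, so in practice the bin count would need to be chosen with care.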

Deep learning for computational chemistry.

Goh, Garrett B; Hodas, Nathan O; Vishnu, Abhinav.

J Comput Chem ; 38(16): 1291-1307, 2017 06 15.

Article in English | MEDLINE | ID: mdl-28272810

ABSTRACT

The rise and fall of artificial neural networks is well documented in the scientific literature of both computer science and computational chemistry. Yet almost two decades later, we are now seeing a resurgence of interest in deep learning, a machine learning algorithm based on multilayer neural networks. Within the last few years, we have seen the transformative impact of deep learning in many domains, particularly in speech recognition and computer vision, to the extent that the majority of expert practitioners in those fields are now regularly eschewing previously established models in favor of deep learning models. In this review, we provide an introductory overview of the theory of deep neural networks and the unique properties that distinguish them from traditional machine learning algorithms used in cheminformatics. By providing an overview of the variety of emerging applications of deep neural networks, we highlight their ubiquity and broad applicability to a wide range of challenges in the field, including quantitative structure-activity relationships, virtual screening, protein structure prediction, quantum chemistry, materials design, and property prediction. In reviewing the performance of deep neural networks, we observed that they consistently outperformed state-of-the-art non-neural-network models across disparate research topics, and deep neural network-based models often exceeded the "glass ceiling" expectations of their respective tasks. Coupled with the maturity of GPU-accelerated computing for training deep neural networks and the exponential growth of chemical data on which to train these networks, we anticipate that deep learning algorithms will be a valuable tool for computational chemistry. © 2017 Wiley Periodicals, Inc.

ShapeShop: Towards Understanding Deep Learning Representations via Interactive Experimentation.

Hohman, Fred; Hodas, Nathan; Chau, Duen Horng.

Ext Abstr Hum Factors Computing Syst ; 2017: 1694-1699, 2017 May.

Article in English | MEDLINE | ID: mdl-29354810

ABSTRACT

Deep learning is the driving force behind many recent technologies; however, deep neural networks are often viewed as "black-boxes" due to their internal complexity that is hard to understand. Little research focuses on helping people explore and understand the relationship between a user's data and the learned representations in deep learning models. We present our ongoing work, ShapeShop, an interactive system for visualizing and understanding what semantics a neural network model has learned. Built using standard web technologies, ShapeShop allows users to experiment with and compare deep learning models to help explore the robustness of image classifiers.

Adaptive Visual Sort and Summary of Micrographic Images of Nanoparticles for Forensic Analysis.

Jurrus, Elizabeth; Hodas, Nathan; Baker, Nathan; Marrinan, Tim; Hoover, Mark D.

IEEE Int Symp Technol Homel Security HST ; 2016, 2016 Sep 15.

Article in English | MEDLINE | ID: mdl-30191203

ABSTRACT

Image classification of nanoparticles from scanning electron microscopes for nuclear forensic analysis is a long, time-consuming process. Months of analyst time may initially be required to sift through images in order to categorize morphological characteristics associated with nanoparticle identification. Subsequent assessment of newly acquired images against identified characteristics can be equally time-consuming. We present INStINCt, our Intelligent Signature Canvas, a web-based canvas framework for quickly organizing image data that partitions images based on features derived from convolutional neural networks. This work is demonstrated using particle images from an aerosol study conducted by Pacific Northwest National Laboratory under the auspices of the U.S. Army Public Health Command to determine depleted uranium aerosol doses and risks.

Determination of the source of SHG verniers in zebrafish skeletal muscle.

Dempsey, William P; Hodas, Nathan O; Ponti, Aaron; Pantazis, Periklis.

Sci Rep ; 5: 18119, 2015 Dec 11.

Article in English | MEDLINE | ID: mdl-26657568

ABSTRACT

SHG microscopy is an emerging microscopic technique for medically relevant imaging because certain endogenous proteins, such as muscle myosin lattices within muscle cells, are sufficiently spatially ordered to generate detectable SHG without the use of any fluorescent dye. Given that SHG signal is sensitive to the structural state of muscle sarcomeres, SHG functional imaging can give insight into the integrity of muscle cells in vivo. Here, we report a thorough theoretical and experimental characterization of myosin-derived SHG intensity profiles within intact zebrafish skeletal muscle. We determined that "SHG vernier" patterns, regions of bifurcated SHG intensity, are illusory when sarcomeres are staggered with respect to one another. These optical artifacts arise due to the phase coherence of SHG signal generation and the Gouy phase shift of the laser at the focus. In contrast, two-photon excited fluorescence images obtained from fluorescently labeled sarcomeric components do not contain such illusory structures, regardless of the orientation of adjacent myofibers. Based on our results, we assert that complex optical artifacts such as SHG verniers should be taken into account when applying functional SHG imaging as a diagnostic readout for pathological muscle conditions.

Subject(s)

Microscopy/methods , Skeletal Muscle/metabolism , Myofibrils/metabolism , Myosins/metabolism , Sarcomeres/metabolism , Zebrafish/metabolism , Animals , Artifacts , Diagnostic Imaging/methods , Confocal Microscopy , Multiphoton Excitation Fluorescence Microscopy , Muscle Cells/metabolism , Skeletal Muscle/anatomy & histology , Photons , Reproducibility of Results , Zebrafish/anatomy & histology , Zebrafish/embryology

The simple rules of social contagion.

Hodas, Nathan O; Lerman, Kristina.

Sci Rep ; 4: 4343, 2014 Mar 11.

Article in English | MEDLINE | ID: mdl-24614301

ABSTRACT

It is commonly believed that information spreads between individuals like a pathogen, with each exposure by an informed friend potentially resulting in a naive individual becoming infected. However, empirical studies of social media suggest that individual response to repeated exposure to information is far more complex. As a proxy for intervention experiments, we compare user responses to multiple exposures on two different social media sites, Twitter and Digg. We show that the position of exposing messages on the user-interface strongly affects social contagion. Accounting for this visibility significantly simplifies the dynamics of social contagion. The likelihood an individual will spread information increases monotonically with exposure, while explicit feedback about how many friends have previously spread it increases the likelihood of a response. We provide a framework for unifying information visibility, divided attention, and explicit social feedback to predict the temporal dynamics of user behavior.

Subject(s)

Information Dissemination , Social Behavior , Social Media , Friends/psychology , Humans , Likelihood Functions

Microscopic structure and dynamics of air/water interface by computer simulations--comparison with sum-frequency generation experiments.

Wang, Yanting; Hodas, Nathan O; Jung, Yousung; Marcus, R A.

Phys Chem Chem Phys ; 13(12): 5388-93, 2011 Mar 28.

Article in English | MEDLINE | ID: mdl-21347495

ABSTRACT

The air/water interface was simulated, and the mode amplitudes of the effective nonlinear sum-frequency generation (SFG) susceptibilities (A_eff) and their ratios were calculated for the ssp, ppp, and sps polarization combinations and compared with experiments. By designating "surface-sensitive" free OH bonds on the water surface, many aspects of the SFG measurements were calculated and compared with those inferred from experiment. We calculate an average tilt angle close to the SFG observed value of 35°, an average surface density of free OH bonds close to the experimental value of about 2.8 × 10^18 m^-2, computed ratios of A_eff values that are very similar to those from the SFG experiment, and absolute values that are in reasonable agreement with experiment. A one-parameter model was used to calculate these properties. The method utilizes results available from independent IR and Raman experiments to obtain some of the needed quantities, rather than calculating them ab initio. The present results provide microscopic information on water structure useful to applications such as our recent theory of on-water heterogeneous catalysis.

Asymmetry in RNA pseudoknots: observation and theory.

Aalberts, Daniel P; Hodas, Nathan O.

Nucleic Acids Res ; 33(7): 2210-4, 2005.

Article in English | MEDLINE | ID: mdl-15831794

ABSTRACT

RNA can fold into a topological structure called a pseudoknot, composed of non-nested double-stranded stems connected by single-stranded loops. Our examination of the PseudoBase database of pseudoknotted RNA structures reveals asymmetries in the stem and loop lengths and provocative composition differences between the loops. By taking into account differences between major and minor grooves of the RNA double helix, we explain much of the asymmetry with a simple polymer physics model and statistical mechanical theory, with only one adjustable parameter.

Subject(s)

Molecular Models , RNA/chemistry , Biopolymers/chemistry , Statistical Models , Nucleic Acid Conformation

Efficient computation of optimal oligo-RNA binding.

Hodas, Nathan O; Aalberts, Daniel P.

Nucleic Acids Res ; 32(22): 6636-42, 2004.

Article in English | MEDLINE | ID: mdl-15608295

ABSTRACT

We present an algorithm that calculates the optimal binding conformation and free energy of two RNA molecules, one or both oligomeric. This algorithm has applications to modeling DNA microarrays, RNA splice-site recognition, and other antisense problems. Although other recent algorithms perform the same calculation in time proportional to the cube of the sum of the lengths, O((N1 + N2)^3), our oligomer binding algorithm, called bindigo, scales as the product of the sequence lengths, O(N1*N2). The algorithm performs well in practice with the aid of a heuristic for large asymmetric loops. To demonstrate its speed and utility, we use bindigo to investigate the binding proclivities of U1 snRNA to mRNA donor splice sites.
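To make the product-of-lengths scaling concrete, the sketch below fills a table with one cell per pair of positions in the two sequences. It is not bindigo: it uses no free-energy model and allows no loops or bulges, and simply finds the longest contiguous antiparallel stem using Watson-Crick and G-U wobble pairs. The function name `best_stem` and the example sequences are hypothetical.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def best_stem(oligo, target):
    """Length of the longest contiguous antiparallel duplex between two RNAs.

    Each (i, j) cell is visited once, so the running time is proportional
    to len(oligo) * len(target).
    """
    rev = target[::-1]  # read the target 3'->5' so paired bases line up
    n1, n2 = len(oligo), len(rev)
    prev = [0] * (n2 + 1)  # stem lengths ending at the previous oligo base
    best = 0
    for i in range(1, n1 + 1):
        cur = [0] * (n2 + 1)
        for j in range(1, n2 + 1):
            if (oligo[i - 1], rev[j - 1]) in PAIRS:
                cur[j] = prev[j - 1] + 1  # extend the stem by one base pair
                best = max(best, cur[j])
        prev = cur
    return best

print(best_stem("GGGAAA", "UUUCCC"))  # fully complementary: 6 pairs
print(best_stem("AAAA", "AAAA"))      # no allowed pairs: 0
```

A realistic binding calculation would replace the pair count with stacking free energies and permit internal loops, which is where a heuristic such as the one the abstract mentions for large asymmetric loops would come in.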

Subject(s)

Algorithms , Computational Biology/methods , Oligoribonucleotides/chemistry , RNA Sequence Analysis/methods , Base Pairing , Base Sequence , Binding Sites , Nucleic Acid Conformation , Oligoribonucleotides/metabolism , RNA Splice Sites , Messenger RNA/chemistry , Messenger RNA/metabolism , Small Nuclear RNA/chemistry , Small Nuclear RNA/metabolism , Sequence Alignment
