1.
Comput Biol Med ; 141: 105118, 2022 02.
Article in English | MEDLINE | ID: mdl-34971979

ABSTRACT

There are many difficulties in extracting and using knowledge for medical analytic and predictive purposes from Real-World Data, even when the data is already well structured in the manner of a large spreadsheet. Preparative curation and standardization or "normalization" of such data involves a variety of chores but underlying them is an interrelated set of fundamental problems that can in part be dealt with automatically during the datamining and inference processes. These fundamental problems are reviewed here and illustrated and investigated with examples. They concern the treatment of unknowns, the need to avoid independency assumptions, and the appearance of entries that may not be fully distinguished from each other. Unknowns include errors detected as implausible (e.g., out of range) values that are subsequently converted to unknowns. These problems are further impacted by high dimensionality and problems of sparse data that inevitably arise from high-dimensional datamining even if the data is extensive. All these considerations are different aspects of incomplete information, though they also relate to problems that arise if care is not taken to avoid or ameliorate consequences of including the same information twice or more, or if misleading or inconsistent information is combined. This paper addresses these aspects from a slightly different perspective using the Q-UEL language and inference methods based on it by borrowing some ideas from the mathematics of quantum mechanics and information theory. It takes the view that detection and correction of probabilistic elements of knowledge subsequently used in inference need only involve testing and correction so that they satisfy certain extended notions of coherence between probabilities. This is by no means the only possible view, and it is explored here and later compared with a related notion of consistency.
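Two of the ideas in this abstract can be sketched in a few lines: converting implausible (out-of-range) entries to unknowns, and testing a pair of conditional probabilities for coherence. The sketch below is illustrative only. The field names, the plausibility ranges, and the function names are assumptions, and the coherence test is just the elementary Bayes identity P(A|B)P(B) = P(B|A)P(A), not the extended notions of coherence the paper develops.

```python
from typing import Optional

# Hypothetical plausibility ranges; the abstract does not specify any.
PLAUSIBLE_RANGES = {
    "age_years": (0.0, 120.0),
    "systolic_bp_mmHg": (50.0, 300.0),
}


def to_unknown_if_implausible(field: str, value: Optional[float]) -> Optional[float]:
    """Return None (unknown) for a missing or out-of-range value."""
    if value is None:
        return None
    low, high = PLAUSIBLE_RANGES[field]
    return value if low <= value <= high else None


def is_coherent(p_a: float, p_b: float,
                p_a_given_b: float, p_b_given_a: float,
                tol: float = 1e-9) -> bool:
    """Check the elementary Bayes identity P(A|B) P(B) == P(B|A) P(A)."""
    return abs(p_a_given_b * p_b - p_b_given_a * p_a) <= tol
```

For example, an age of 250 becomes an unknown, while the probabilities P(A) = 0.2, P(B) = 0.5, P(A|B) = 0.3, P(B|A) = 0.75 pass the coherence test because both products equal 0.15.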


Subject(s)
Medicine, Language, Probability
2.
Comput Biol Med ; 117: 103621, 2020 02.
Article in English | MEDLINE | ID: mdl-32072972

ABSTRACT

The Quantum Universal Exchange Language (Q-UEL), based on Dirac notation and algebra from quantum mechanics, along with its associated data mining and Hyperbolic Dirac Net (HDN) for probabilistic inference, has proven to be a useful architectural principle for knowledge management, analysis and prediction systems in medicine. It has been described in several papers; its extension to clinical genomics and precision medicine is described here. Two use cases are studied: (a) bioinformatics in clinical decision support, especially for risk of type 2 diabetes using mitochondrial patient DNA sequences, and (b) bioinformatics and computational biology (conformational) research examples related to drug discovery involving the recently discovered class of mitochondrial derived peptides (MDPs). MDPs were surprising when first discovered as coded in small open reading frames (sORFs), and are emerging as having a fundamental role in metabolic control, longevity and disease. This project originally represented a language specification study relating to what information related to genomics is essential or useful to carry, and what processing will be needed. However, novel aspects introduced or discovered include the HDN-like neural nets and their use, along with more established methods, for prediction of type 2 diabetes, and in particular for proposals for over 80 natural MDPs, most of which had not previously been described at the time of the study, as potential drug lead targets. Also, use of many medical records with simulated joining of mtDNA as performance tests led to some insightful observations regarding the behavior of HDN predictions where independent factors are involved.


Subject(s)
Type 2 Diabetes Mellitus, Mitochondrial Genome, Pharmaceutical Preparations, Type 2 Diabetes Mellitus/drug therapy, Type 2 Diabetes Mellitus/genetics, Humans, Language, Precision Medicine
3.
Comput Biol Med ; 112: 103369, 2019 09.
Article in English | MEDLINE | ID: mdl-31377681

ABSTRACT

While clinical and biomedical information in digital form has been escalating, it is socioeconomic factors that are important determinants of health on the national and global scale. We show how collective use of data mining and prediction algorithms to analyze socioeconomic population health data can stand beside classical correlation analysis in routine data analysis. The underlying theoretical basis is the Dirac notation and algebra that is a scientific standard but unusual outside of the physical sciences, combined with a theory of expected information first developed for analyzing sparse data but still largely confined to bioinformatics. The latter was important here because the records analyzed (which are for US counties and equivalents, not patients) are very few by contemporary data mining standards. The approach is very unlikely to be familiar to socioeconomic researchers, so the theory and the advantages of our inference nets over the Bayes Net are reviewed here, mostly using socioeconomic examples. While our expertise and focus is in regard to novel analytical methods rather than socioeconomics per se, a significant negative (countertrending) relationship between population health and equity was initially surprising, at least to the present authors. This encouraged deeper exploration including that of the relationship between our data mining methods and traditional Pearson's correlation. The latter is susceptible to giving wrong conclusions if a phenomenon called Simpson's paradox applies, so this is also investigated. Also discussed is that, even for very few records, associative data mining can still demand significant computational resources due to a combinatorial explosion.
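The vulnerability of Pearson's correlation to Simpson's paradox can be shown with a small numerical sketch. The data below are invented for illustration and are not taken from the study: within each of two hypothetical groups the correlation is exactly negative, yet pooling the groups yields a positive correlation.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient for two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented data: two groups, each with a perfectly negative trend.
group_a_x, group_a_y = [1, 2, 3], [3, 2, 1]
group_b_x, group_b_y = [6, 7, 8], [8, 7, 6]

r_a = pearson(group_a_x, group_a_y)  # negative within group A
r_b = pearson(group_b_x, group_b_y)  # negative within group B
# Pooling the groups reverses the sign of the correlation.
r_pooled = pearson(group_a_x + group_b_x, group_a_y + group_b_y)
```

Here the between-group difference dominates the pooled statistic, so a conclusion drawn from the pooled correlation alone would contradict the trend seen inside every group.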


Subject(s)
Algorithms, Data Mining, Language, Humans, Socioeconomic Factors
4.
Comput Biol Med ; 108: 382-399, 2019 05.
Article in English | MEDLINE | ID: mdl-31075569

ABSTRACT

Probabilistic inference methods require a more general and realistic description of the world as a Bidirectional General Graph (BGG). While in its original form the Bayes Net (BN) has been promoted as a predictive tool, it is more immediately a way of testing a hypothesis or model about interactions in a system usually considered on a causal basis. Once established, the model can be used in a predictive way, but the problem here is that for a traditional BN the hypotheses or models that can be formed are limited to the Directed Acyclic Graph (DAG) by definition. Three interrelated features are highlighted that represent deficiencies of the DAG which are corrected by conversion to a method based on a BGG: (i) lack of intrinsic representation of coherence by Bayes' rule, (ii) relatedly, the need to consider interdependence in parent nodes, and (iii) the need for management of a property called recurrence. These deficiencies can represent large errors in absolute estimates of probabilities, and while relative and renormalized probabilities ameliorate that, they can often make much of a net superfluous through cancelations by division. The Hyperbolic Dirac Net (HDN) based on Dirac's quantum mechanics is a solution that led naturally to avoiding these deficiencies. It encodes bidirectional probabilities in an h-complex value rediscovered by Dirac, i.e., with the imaginary number h such that hh = +1. Properties of the HDN described previously are reviewed (though emphasis is on descriptions in familiar probability terms), the issue of recurrence is introduced, methods of construction are simplified, and the severity of the quantitative differences between BNs and analogous HDNs is exemplified. There is also discussion of how results compare with other approaches in practice.
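The h-complex arithmetic mentioned in this abstract (hh = +1) can be sketched as follows. The abstract states only that bidirectional probabilities are encoded in an h-complex value; the particular encoding used here, ½[P(A|B) + P(B|A)] + ½h[P(A|B) − P(B|A)], and the names `HComplex`, `encode`, and `decode` are assumptions made for illustration and may not match the paper's own definitions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HComplex:
    """A number re + h*hyp, where the unit h satisfies h*h = +1."""
    re: float
    hyp: float

    def __mul__(self, other: "HComplex") -> "HComplex":
        # (a + b h)(c + d h) = (ac + bd) + (ad + bc) h, because h*h = +1.
        return HComplex(self.re * other.re + self.hyp * other.hyp,
                        self.re * other.hyp + self.hyp * other.re)


def encode(p_forward: float, p_backward: float) -> HComplex:
    """Assumed encoding of a forward/backward probability pair."""
    return HComplex((p_forward + p_backward) / 2,
                    (p_forward - p_backward) / 2)


def decode(z: HComplex) -> Tuple[float, float]:
    """Recover (forward, backward) from the assumed encoding."""
    return z.re + z.hyp, z.re - z.hyp


h = HComplex(0.0, 1.0)
h_squared = h * h  # equals HComplex(1.0, 0.0)

# Multiplying two encoded pairs multiplies the forward and backward
# probabilities independently.
product = encode(0.5, 0.25) * encode(0.5, 0.5)
forward, backward = decode(product)  # (0.25, 0.125)
```

Under this assumed encoding a single product carries both directions of conditioning at once, which is one way to read the abstract's claim that the h-complex value represents bidirectional probabilities.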


Subject(s)
Algorithms, Medicine, Theoretical Models, Bayes Theorem, Humans, Probability
5.
Comput Biol Med ; 95: 147-166, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29500985

ABSTRACT

Theoretical and methodological principles are presented for the construction of very large inference nets for odds calculations, composed of hundreds to many thousands or more elements, generated in this paper by structured data mining. It is argued that the usual small inference nets can sometimes represent rather simple, arbitrary estimates. Examples of applications in clinical and public health data analysis, medical claims data and detection of irregular entries, and bioinformatics data are presented. Construction of large nets benefits from application of a theory of expected information for sparse data and the Dirac notation and algebra. The extent to which these are important here is briefly discussed. Purposes of the study include (a) exploration of the properties of large inference nets and of perturbation and tacit conditionality models, (b) using these to propose simpler models including one that a physician could use routinely, analogous to a "risk score", (c) examination of the merit of describing optimal performance in a single measure that combines accuracy, specificity, and sensitivity in place of a ROC curve, and (d) relationship to methods for detecting anomalous and potentially fraudulent data.


Subject(s)
Big Data, Data Mining/methods, Factual Databases, Automated Data Processing/methods, Theoretical Models, Computational Biology/methods, Humans, Insurance Claim Review
6.
Comput Biol Med ; 51: 183-97, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24954566

ABSTRACT

We recently introduced the concept of a Hyperbolic Dirac Net (HDN) for medical inference on the grounds that, while the traditional Bayes Net (BN) is popular in medicine, it is not suited to that domain: there are many interdependencies such that any "node" can be ultimately conditional upon itself. A traditional BN is a directed acyclic graph by definition, while the HDN is a bidirectional general graph closer to a diffuse "field" of influence. Cycles require bidirectionality; the HDN uses a particular type of imaginary number from Dirac's quantum mechanics to encode it. Comparison with the BN is made alongside a set of recipes for converting a given BN to an HDN, also adding cycles that do not usually require reiterative methods. This conversion is called the P-method. Conversion to cycles can sometimes be difficult, but more troubling was that the original BN had probabilities needing adjustment to satisfy realism alongside the important property called "coherence". The more general and simpler K-method, not dependent on the BN, is usually (but not necessarily) derived by data mining, and is therefore also introduced. As discussed, BN developments may converge to an HDN-like concept, so it is reasonable to consider the HDN as a BN extension.


Subject(s)
Computer-Assisted Decision Making, Decision Support Techniques, Biological Models, Humans