Results 1 - 16 of 16
1.
Curr Urol ; 18(2): 104-109, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39176299

ABSTRACT

Background: The incidence of prostate cancer is increasing worldwide. A significant proportion of patients develop metastatic disease and are initially prescribed androgen deprivation therapy (ADT). However, subsequent sequences of treatments in real-world settings that may improve overall survival remain an area of active investigation. Materials and methods: Data were collected from 384 patients presenting with de novo metastatic prostate cancer from 2011 to 2015 at a tertiary cancer center. Patients were categorized into surviving (n = 232) and deceased (n = 152) groups at the end of 3 years. Modified sequence pattern mining techniques (Generalized Sequential Pattern Mining and Sequential Pattern Discovery using Equivalence Classes) were applied to determine the exact order of the most frequent sets of treatments in each group. Results: Degarelix, as the initial form of ADT, appeared uniquely in the surviving group. The sequence of ADT followed by abiraterone and docetaxel was uniquely associated with a higher 3-year overall survival. Orchiectomy followed by fosfestrol was found to have a unique niche among surviving patients with a long duration of response to the initial ADT. Patients who received chemotherapy followed by radiotherapy and those who received radiotherapy followed by chemotherapy were found more frequently in the deceased group. Conclusions: We identified unique treatment sequences among surviving and deceased patients at the end of 3 years. Degarelix should be the preferred form of ADT. Patients who received ADT followed by abiraterone and chemotherapy showed better results. Patients requiring palliative radiation and chemotherapy in any sequence were significantly more frequent in the deceased group, identifying the need to offer such patients the most efficacious agents and to target them in clinical trial design.
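
The group-wise comparison of ordered treatment patterns described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: it counts ordered pairs of treatments rather than running the modified GSP or SPADE algorithms the authors used, the function names are hypothetical, and the toy sequences are invented for the example and are not study data.

```python
from collections import Counter
from itertools import combinations


def ordered_pairs(sequence):
    """Return the set of pairs (a, b) where a occurs before b in the sequence."""
    return {(sequence[i], sequence[j])
            for i, j in combinations(range(len(sequence)), 2)}


def pattern_support(sequences):
    """Fraction of sequences that contain each ordered pair at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update(ordered_pairs(seq))
    return {pattern: n / len(sequences) for pattern, n in counts.items()}


def unique_patterns(group_a, group_b, min_support=0.5):
    """Pairs that are frequent in group_a and never occur in group_b."""
    sup_a = pattern_support(group_a)
    sup_b = pattern_support(group_b)
    return sorted(p for p, s in sup_a.items()
                  if s >= min_support and p not in sup_b)


# Invented toy sequences (assumption, not patient data).
surviving = [["ADT", "abiraterone", "docetaxel"], ["ADT", "abiraterone"]]
deceased = [["ADT", "docetaxel", "radiotherapy"], ["radiotherapy", "docetaxel"]]

print(unique_patterns(surviving, deceased))
# [('ADT', 'abiraterone'), ('abiraterone', 'docetaxel')]
```

Full sequential pattern miners extend this idea to patterns of arbitrary length and prune candidates by minimum support instead of enumerating every pair.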

2.
Patterns (N Y) ; 4(12): 100890, 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38106611

ABSTRACT

Predictive pattern mining is an approach used to construct prediction models when the input is represented by structured data, such as sets, graphs, and sequences. The main idea behind predictive pattern mining is to build a prediction model by considering substructures, such as subsets, subgraphs, and subsequences (referred to as patterns), present in the structured data as features of the model. The primary challenge in predictive pattern mining lies in the exponential growth of the number of patterns with the complexity of the structured data. In this study, we propose the safe pattern pruning method to address the explosion of pattern numbers in predictive pattern mining. We also discuss how it can be effectively employed throughout the entire model building process in practical data analysis. To demonstrate the effectiveness of the proposed method, we conduct numerical experiments on regression and classification problems involving sets, graphs, and sequences.

3.
Mol Phylogenet Evol ; 189: 107933, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37769827

ABSTRACT

As some of the smallest vertebrates, yet largest producers of consumed reef biomass, cryptobenthic reef fishes serve a disproportionate role in reef ecosystems and are one of the most poorly understood groups of fish. The blenny genera Hypleurochilus and Parablennius are currently considered paraphyletic and the interrelationships of Parablennius have been the focus of recent phylogenetic studies. However, the interrelationships of Hypleurochilus remain understudied. This genus is transatlantically distributed and comprises 11 species with a convoluted taxonomic history. In this study, relationships for ten Hypleurochilus species are resolved using multi-locus nuclear and mtDNA sequence data, morphological data, and mined COI barcode data. Mitochondrial and nuclear sequence data from 61 individuals collected from the western Atlantic and northern Gulf of Mexico (N. GoM) delimit seven species into a temperate clade, a tropical clade, and a third distinct lineage. This lineage, herein referred to as H. cf. aequipinnis, may represent a species of Hypleurochilus whose range has expanded into the N. GoM. Inclusion of publicly available COI sequences for an additional three species provides further phylogenetic resolution. H. bananensis forms a new eastern Atlantic clade with H. cf. aequipinnis, providing further evidence for a western Atlantic range expansion. Single marker COI delimitation was unable to elucidate the relationships between H. springeri/H. pseudoaequipinnis and between H. multifilis/H. caudovittatus due to incomplete lineage sorting. Mitochondrial data are also unable to accurately resolve the placement of H. bermudensis. However, a comprehensive approach using multi-locus phylogenetic and species delimitation methods was able to resolve these relationships. While mining publicly available sequence data allowed for the inclusion of an increased number of species in the analysis and a more comprehensive phylogeny, it was not without drawbacks, as a handful of sequences are potentially mis-identified. Overall, we find that the recent divergence of some species within this genus and potential introgression events confound the results of single locus delimitation methods, yet a combination of single and multi-locus analyses has allowed for insights into the biogeography of this genus and uncovered a potential transatlantic range expansion.


Subject(s)
Ecosystem , Perciformes , Animals , Phylogeny , Gulf of Mexico , DNA, Mitochondrial/genetics , Fishes/genetics , Bayes Theorem
4.
Expert Syst Appl ; 209: 118182, 2022 Dec 15.
Article in English | MEDLINE | ID: mdl-35966368

ABSTRACT

A growing number of universities worldwide use various forms of online and blended learning as part of their academic curricula. Furthermore, the recent changes caused by the COVID-19 pandemic have led to a drastic increase in the importance and ubiquity of online education. The major advantages of e-learning include not only improving students' learning experience and widening their educational prospects, but also the opportunity to gain insights into students' learning processes through learning analytics. This study contributes to the topic of improving and understanding e-learning processes in the following ways. First, we demonstrate that accurate predictive models can be built based on sequential patterns derived from students' behavioral data, which are able to identify underperforming students early in the course. Second, we investigate the specificity-generalizability trade-off in building such predictive models by investigating whether predictive models should be built for every course individually based on course-specific sequential patterns, or across several courses based on more general behavioral patterns. Finally, we present a methodology for capturing temporal aspects in behavioral data and analyze its influence on the predictive performance of the models. Our improved sequence classification technique is capable of predicting student performance with high levels of accuracy, reaching 90% for course-specific models.

5.
Sensors (Basel) ; 21(4)2021 Feb 03.
Article in English | MEDLINE | ID: mdl-33546167

ABSTRACT

Games have become one of the most popular activities across cultures and ages. There is ample evidence that supports the benefits of using games for learning and assessment. However, incorporating game activities as part of the curriculum in schools remains limited. Some of the barriers to broader adoption in classrooms are the lack of actionable assessment data, the fact that teachers often do not have a clear sense of how students are interacting with the game, and uncertainty about whether the gameplay is leading to productive learning. To address this gap, we seek to provide sequence and process mining metrics to teachers that are easily interpretable and actionable. More specifically, we build our work on top of Shadowspect, a three-dimensional geometry game that has been developed to measure geometry skills as well as other cognitive and noncognitive skills. We use data from its implementation across schools in the U.S. to implement two sequence and process mining metrics in an interactive dashboard for teachers. The final objective is to help teachers understand the sequence of actions and common errors of students using Shadowspect so they can better understand the process, make proper assessments, and conduct personalized interventions when appropriate.

6.
Int J Med Inform ; 148: 104366, 2021 04.
Article in English | MEDLINE | ID: mdl-33485216

ABSTRACT

OBJECTIVE: This work aims at deriving interesting clinical events using association rule mining based on a user-annotated order of clinical features. MATERIALS AND METHODS: A user specifies a partial temporal order of features by indexing features of interest, with repeated and bundled indexes allowed as needed. An association mining algorithm plugin was designed to generate rules that adhere to the user-specified temporal order. The plugin uses temporal and sequence constraints to reduce rule permutations early in the rule generation process. The method was evaluated with a large medical claims dataset to generate clinical events. RESULTS: Using the plugin algorithm, the database is scanned to calculate the support of item sequences whose sequential order conforms with the user-annotated feature order. In our experiments with 20,000 medical claim data records, our method generated rules in significantly less time than the standalone Apriori algorithm. Our approach generates dendrograms to organize the rules into meaningful hierarchies and provides a graphical interface to navigate the rules and unfold interesting clinical events. DISCUSSION: Since many associations in healthcare are of a sequential nature, some of the derived rules may describe interesting clinical flows or events, while others may be contextually irrelevant. Our method exploits user-specified sequence constraints to eliminate irrelevant rules and reduce rule permutations, speeding up rule mining. CONCLUSION: This work can be the foundation for future association rule mining studies to extract sequential events based on interestingness. The work can support clinical education where the instructor defines feature sequence constraints, and students unfold and examine extracted sequential rules.


Subject(s)
Algorithms , Data Mining , Databases, Factual , Humans
7.
Sensors (Basel) ; 20(9)2020 Apr 29.
Article in English | MEDLINE | ID: mdl-32365545

ABSTRACT

With the rapid development of sensing technology, data mining, and machine learning for human health monitoring, it has become possible to monitor personal motion and vital signs in a manner that minimizes disruption of an individual's daily routine and helps individuals with difficulties live independently at home. A primary difficulty that researchers confront is acquiring an adequate amount of labeled data for model training and validation purposes. Activity discovery therefore addresses the lack of activity labels by using approaches based on sequence mining and clustering. In this paper, we introduce an unsupervised method for discovering activities from a network of motion detectors in a smart home setting. First, we present an intra-day clustering algorithm to find frequent sequential patterns within a day. As a second step, we present an inter-day clustering algorithm to find the common frequent patterns between days. Furthermore, we refine the patterns to have more compressed and defined cluster characterizations. Finally, we track the occurrences of various regular routines to monitor the functional health in an individual's patterns and lifestyle. We evaluate our methods on two public data sets captured in real-life settings from two apartments during seven-month and three-month periods.

8.
Article in English | MEDLINE | ID: mdl-29515937

ABSTRACT

The advent of mobile health (mHealth) technologies challenges the capabilities of current visualizations, interactive tools, and algorithms. We present Chronodes, an interactive system that unifies data mining and human-centric visualization techniques to support explorative analysis of longitudinal mHealth data. Chronodes extracts and visualizes frequent event sequences that reveal chronological patterns across multiple participant timelines of mHealth data. It then combines novel interaction and visualization techniques to enable multifocus event sequence analysis, which allows health researchers to interactively define, explore, and compare groups of participant behaviors using event sequence combinations. Through summarizing insights gained from a pilot study with 20 behavioral and biomedical health experts, we discuss Chronodes's efficacy and potential impact in the mHealth domain. Ultimately, we outline important open challenges in mHealth, and offer recommendations and design guidelines for future research.

9.
BMC Bioinformatics ; 18(1): 228, 2017 May 02.
Article in English | MEDLINE | ID: mdl-28464826

ABSTRACT

BACKGROUND: A common challenge in bioinformatics is to identify short sub-sequences that are unique in a set of genomes or reference sequences, which can efficiently be achieved by k-mer (k consecutive nucleotides) counting. However, there are several areas that would benefit from a more stringent definition of "unique", requiring that these sub-sequences of length W differ by more than k mismatches (i.e. a Hamming distance greater than k) from any other sub-sequence, which we term the k-disjoint problem. Examples include finding sequences unique to a pathogen for probe-based infection diagnostics; reducing off-target hits for re-sequencing or genome editing; detecting sequence (e.g. phage or viral) insertions; and multiple substitution mutations. Since both sensitivity and specificity are critical, an exhaustive, yet efficient solution is desirable. RESULTS: We present microTaboo, a method that allows for efficient and extensive sequence mining of unique (k-disjoint) sequences of up to 100 nucleotides in length. On a number of simulated and real data sets ranging from microbe- to mammalian-size genomes, we show that microTaboo is able to efficiently find all sub-sequences of a specified length W that do not occur within a threshold of k mismatches in any other sub-sequence. We exemplify that microTaboo has many practical applications, including point substitution detection, sequence insertion detection, padlock probe target search, and candidate CRISPR target mining. CONCLUSIONS: microTaboo implements a solution to the k-disjoint problem in an alignment- and assembly-free manner. microTaboo is available for Windows, Mac OS X, and Linux, running Java 7 and higher, under the GNU GPLv3 license, at: https://MohammedAlJaff.github.io/microTaboo.


Subject(s)
Computational Biology/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Genome
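
The k-disjoint definition in the microTaboo abstract (a window of length W whose Hamming distance to every other window exceeds k) can be made concrete with a brute-force sketch. This is an assumption-laden illustration only: it compares all windows pairwise, which is quadratic and far from what an efficient tool would do, and the function names and toy sequences are hypothetical and not part of microTaboo.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def windows(seq, w):
    """All (start, substring) pairs of length w in seq."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def k_disjoint(target, references, w, k):
    """Windows of target that differ by more than k mismatches from every
    other window, both elsewhere in target and anywhere in the references."""
    target_windows = windows(target, w)
    reference_windows = [win for ref in references for _, win in windows(ref, w)]
    result = []
    for i, win in target_windows:
        elsewhere = [x for j, x in target_windows if j != i]
        if all(hamming(win, x) > k for x in elsewhere + reference_windows):
            result.append((i, win))
    return result


# Toy sequences invented for the example.
print(k_disjoint("ACGTAC", ["ACGTTT"], w=3, k=0))  # [(2, 'GTA'), (3, 'TAC')]
print(k_disjoint("ACGTAC", ["ACGTTT"], w=3, k=1))  # [(3, 'TAC')]
```

With k = 0 the check reduces to ordinary uniqueness, which is the k-mer counting case the abstract contrasts against; raising k removes windows such as GTA that have a near match (GTT) in the reference.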
10.
Artif Intell Med ; 71: 43-56, 2016 07.
Article in English | MEDLINE | ID: mdl-27506130

ABSTRACT

MOTIVATION: Prescribing cascade (PC) occurs when an adverse drug reaction (ADR) is misinterpreted as a new medical condition, leading to further prescriptions for treatment. Additional prescriptions, however, may worsen the existing condition or introduce additional adverse effects (AEs). Timely detection and prevention of detrimental PCs is essential as drug AEs are among the leading causes of hospitalization and deaths. Identifying detrimental PCs would enable warnings and contraindications to be disseminated and assist the detection of unknown drug AEs. Nonetheless, the detection is difficult and has been limited to case reports or case assessment using administrative health claims data. Social media is a promising source for detecting signals of detrimental PCs due to the public availability of many discussions regarding treatments and drug AEs. OBJECTIVE: In this paper, we investigate the feasibility of detecting detrimental PCs from social media. METHODS: The detection, however, is challenging due to the data uncertainty and data rarity in social media. We propose a framework to mine sequences of drugs and AEs that signal detrimental PCs, taking into account the data uncertainty and data rarity. RESULTS: We conduct experiments on two real-world datasets collected from Twitter and Patient health forum. Our framework achieves encouraging results in the validation against known detrimental PCs (F1=78% for Twitter and 68% for Patient) and the detection of unknown potential detrimental PCs (Precision@50=72% and NDCG@50=95% for Twitter, Precision@50=86% and NDCG@50=98% for Patient). In addition, the framework is efficient and scalable to large datasets. CONCLUSION: Our study demonstrates the feasibility of generating hypotheses of detrimental PCs from social media to reduce pharmacists' guesswork.


Subject(s)
Data Mining , Drug-Related Side Effects and Adverse Reactions , Social Media , Humans , Pharmacists
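
The ranking metrics reported in this abstract, Precision@50 and NDCG@50, follow a standard pattern that a small sketch can illustrate. The version below assumes binary relevance labels and computes the ideal ranking from the full candidate list; the abstract does not state which conventions the authors used, so both choices, along with the function names and the toy relevance list, are assumptions.

```python
import math


def precision_at_k(relevance, k):
    """Share of the top-k ranked items that are relevant (labels are 0 or 1)."""
    return sum(relevance[:k]) / k


def dcg_at_k(relevance, k):
    """Discounted cumulative gain: relevant items count less at lower ranks."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance, k):
    """DCG divided by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Invented labels for five ranked candidates, best-ranked first.
ranked_relevance = [1, 0, 1, 1, 0]
print(precision_at_k(ranked_relevance, 3))
print(ndcg_at_k(ranked_relevance, 3))
```

Precision@k ignores the order within the top k, whereas NDCG@k rewards placing relevant candidates nearer the top, which is why the two figures in the abstract can differ substantially.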
11.
Methods Inf Med ; 55(3): 223-33, 2016 May 17.
Article in English | MEDLINE | ID: mdl-26848079

ABSTRACT

OBJECTIVES: Understanding the progression of comorbid neurodevelopmental disorders (NDD) during different critical time periods may contribute to our comprehension of the underlying pathophysiology of NDDs. The objective of our study was to identify frequent temporal sequences of developmental diagnoses in noisy patient data. METHODS: We used a data set of 2810 patients, documenting NDD diagnoses given to them by an NDD expert at a child developmental center during multiple visits at different ages. Extensive preprocessing steps were developed in order to allow the data set to be processed by an efficient sequence mining algorithm (SPADE). RESULTS: The discovered sequences were validated by cross validation for 10 iterations; all correlation coefficients for support, confidence and lift measures were above 0.75 and their proportions were similar. No significant differences between the distributions of sequences were found using the Kolmogorov-Smirnov test. CONCLUSIONS: We have demonstrated the feasibility of using the SPADE algorithm for discovery of valid temporal sequences of comorbid disorders in children with NDDs. The identification of such sequences would be beneficial from clinical and research perspectives. Moreover, these sequences could serve as features for developing a full-fledged temporal predictive model.


Subject(s)
Algorithms , Data Mining , Neurodevelopmental Disorders/pathology , Adolescent , Child , Child, Preschool , Comorbidity , Humans , Infant , Models, Theoretical , Time Factors
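
The support, confidence and lift measures mentioned in this abstract can be illustrated for a sequential rule "A is later followed by B". The sketch below uses the common textbook definitions, which may differ in detail from the SPADE implementation the authors ran; the function names and the placeholder diagnosis labels are hypothetical.

```python
def contains_in_order(sequence, pattern):
    """True if the items of pattern appear in sequence in the same order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain the pattern in order."""
    return sum(contains_in_order(s, pattern) for s in sequences) / len(sequences)


def rule_measures(sequences, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.
    Assumes both sides occur at least once, otherwise a division by zero occurs."""
    sup_rule = support(sequences, antecedent + consequent)
    confidence = sup_rule / support(sequences, antecedent)
    lift = confidence / support(sequences, consequent)
    return sup_rule, confidence, lift


# Placeholder diagnosis sequences, one list per patient (invented data).
visits = [
    ["dx_A", "dx_B"],
    ["dx_A", "dx_B", "dx_C"],
    ["dx_A", "dx_C"],
    ["dx_B", "dx_C"],
]
print(rule_measures(visits, ["dx_A"], ["dx_B"]))
```

A lift below 1, as in this toy data, means the consequent follows the antecedent less often than its overall frequency would suggest.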
12.
Sensors (Basel) ; 16(2): 145, 2016 Jan 23.
Article in English | MEDLINE | ID: mdl-26805850

ABSTRACT

Due to the recent explosive growth of location-aware services based on mobile devices, predicting the next places of a user is of increasing importance to enable proactive information services. In this paper, we introduce a data-driven framework that aims to predict the user's next places using his/her past visiting patterns analyzed from mobile device logs. Specifically, the notion of the spatiotemporal-periodic (STP) pattern is proposed to capture the visits with spatiotemporal periodicity by focusing on a detail level of location for each individual. Subsequently, we present algorithms that extract the STP patterns from a user's past visiting behaviors and predict the next places based on the patterns. The experimental results obtained using a real-world dataset show that, in most cases, the proposed methods are more effective in predicting the user's next places than the previous approaches considered.

13.
IEEE Trans Knowl Data Eng ; 28(11): 2910-2926, 2016 Nov 01.
Article in English | MEDLINE | ID: mdl-37274928

ABSTRACT

In this paper, we study the problem of mining frequent sequences under the rigorous differential privacy model. We explore the possibility of designing a differentially private frequent sequence mining (FSM) algorithm which can achieve both high data utility and a high degree of privacy. We found that, in differentially private FSM, the amount of required noise is proportional to the number of candidate sequences. If we could effectively prune those unpromising candidate sequences, the utility and privacy tradeoff can be significantly improved. To this end, by leveraging a sampling-based candidate pruning technique, we propose PFS2, a novel differentially private FSM algorithm. It is the first algorithm that supports the general gap-constrained FSM in the context of differential privacy. The gap constraints in FSM can be used to limit the mining results to a controlled set of frequent sequences. In our PFS2 algorithm, the core is to utilize sample databases to prune the candidate sequences generated based on the downward closure property. In particular, we use the noisy local support of candidate sequences in the sample databases to estimate which candidate sequences are potentially frequent. To improve the accuracy of such private estimations, a gap-aware sequence shrinking method is proposed to enforce the length constraint on the sample databases. Moreover, to calibrate the amount of noise required by differential privacy, a gap-aware sensitivity computation method is proposed to obtain the sensitivity of the local support computations with different gap constraints. Furthermore, to decrease the probability of misestimating frequent sequences as infrequent, a threshold relaxation method is proposed to relax the user-specified threshold for the sample databases. Through formal privacy analysis, we show that our PFS2 algorithm is ϵ-differentially private. Extensive experiments on real datasets illustrate that our PFS2 algorithm can privately find frequent sequences with high accuracy.
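
Two ingredients named in this abstract, gap-constrained support and noise calibrated to sensitivity, can be sketched in isolation. The code below is not the PFS2 algorithm: it omits sampling, candidate pruning, sequence shrinking and threshold relaxation, and it simply assumes a sensitivity of 1 for a single support count where each individual contributes one sequence. All function names and the toy database are hypothetical.

```python
import random


def occurs_with_gap(seq, pattern, max_gap):
    """True if pattern occurs in seq in order, skipping at most max_gap
    items between consecutive matched items."""
    def match(start, p_idx):
        if p_idx == len(pattern):
            return True
        # The first item may match anywhere; each later item must lie
        # within max_gap + 1 positions after the previous match.
        end = len(seq) if p_idx == 0 else min(len(seq), start + max_gap + 1)
        return any(seq[i] == pattern[p_idx] and match(i + 1, p_idx + 1)
                   for i in range(start, end))
    return match(0, 0)


def support_count(database, pattern, max_gap):
    """Number of sequences that contain the pattern under the gap constraint."""
    return sum(occurs_with_gap(s, pattern, max_gap) for s in database)


def laplace_noise(scale, rng):
    """The difference of two i.i.d. exponentials is Laplace(0, scale)."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_support(database, pattern, max_gap, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism on one support count (sensitivity is an assumption)."""
    true_count = support_count(database, pattern, max_gap)
    return true_count + laplace_noise(sensitivity / epsilon, rng)


database = [list("abcd"), list("acbd"), list("ad"), list("dcba")]  # toy data
rng = random.Random(0)
print(support_count(database, list("ad"), max_gap=1))  # 1
print(support_count(database, list("ad"), max_gap=2))  # 3
print(noisy_support(database, list("ad"), max_gap=2, epsilon=1.0, rng=rng))
```

Because the noise scale is the sensitivity divided by ϵ, every additional candidate whose support must be released consumes privacy budget, which is the motivation the abstract gives for pruning unpromising candidates early.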

14.
Health Inf Sci Syst ; 3: 5, 2015.
Article in English | MEDLINE | ID: mdl-26664724

ABSTRACT

Despite the rapid global movement towards electronic health records, clinical letters written in unstructured natural languages are still the preferred form of inter-practitioner communication about patients. These letters, when archived over a long period of time, provide invaluable longitudinal clinical details on individual patients and patient populations. In this paper we present three unsupervised approaches to automatically extract single and multi-word medical terms without domain-specific knowledge: sequential pattern mining (PrefixSpan), frequency- and linguistics-based C-Value, and keyphrase extraction from co-occurrence graphs (TextRank). Because each of the three approaches focuses on different aspects of the language feature space, we propose a genetic algorithm to learn the best parameters for linearly integrating the three extractors for optimal performance against domain expert annotations. Around 30,000 clinical letters sent over the past decade from ophthalmology specialists to general practitioners at an eye clinic are anonymised as the corpus to evaluate the effectiveness of the ensemble against individual extractors. With minimal annotation, the ensemble achieves an average F-measure of 65.65% when considering only complex medical terms, and an F-measure of 72.47% if we take single word terms (i.e. unigrams) into consideration, markedly better than the three term extraction techniques when used alone.

15.
J Biomed Inform ; 56: 369-78, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26146159

ABSTRACT

OBJECTIVE: In order to derive data-driven insights, we develop Care Pathway Explorer, a system that mines and visualizes a set of frequent event sequences from patient EMR data. The goal is to utilize historical EMR data to extract common sequences of medical events such as diagnoses and treatments, and investigate how these sequences correlate with patient outcome. MATERIALS AND METHODS: The Care Pathway Explorer uses a frequent sequence mining algorithm adapted to handle the real-world properties of EMR data, including techniques for handling event concurrency, multiple levels-of-detail, temporal context, and outcome. The mined patterns are then visualized in an interactive user interface consisting of novel overview and flow visualizations. RESULTS: We use the proposed system to analyze the diagnoses and treatments of a cohort of hyperlipidemic patients with hypertension and diabetes pre-conditions, and demonstrate the clinical relevance of patterns mined from EMR data. The patterns that were identified corresponded to clinical and published knowledge, some of it unknown to the physician at the time of discovery. CONCLUSION: Care Pathway Explorer, which combines frequent sequence mining techniques with advanced visualizations, supports the integration of data-driven insights into care pathway discovery.


Subject(s)
Data Mining/methods , Electronic Health Records , Hyperlipidemias/diagnosis , Algorithms , Cohort Studies , Computer Graphics , Data Collection , Humans , Hyperlipidemias/complications , Hyperlipidemias/drug therapy , Hypertension/diagnosis , Hypertension/drug therapy , Lipoproteins, LDL/analysis , Patient Outcome Assessment , Prediabetic State/diagnosis , Prediabetic State/drug therapy , Software
16.
MAbs ; 7(4): 693-706, 2015.
Article in English | MEDLINE | ID: mdl-26018625

ABSTRACT

Camelid immunoglobulin variable (IGV) regions were found to be homologous to their human counterparts; however, the germline V repertoires of camelid heavy and light chains are still incomplete and their therapeutic potential is only beginning to be appreciated. We therefore leveraged the publicly available HTG and WGS databases of Lama pacos and Camelus ferus to retrieve the germline repertoire of V genes using human IGV genes as reference. In addition, we amplified IGKV and IGLV genes to uncover the V germline repertoire of Lama glama and sequenced BAC clones covering part of the Lama pacos IGK and IGL loci. Our in silico analysis showed that camelid counterparts of all human IGKV and IGLV families and most IGHV families could be identified, based on canonical structure and sequence homology. Interestingly, this sequence homology seemed largely restricted to the Ig V genes and was far less apparent in other genes: 6 therapeutically relevant target genes differed significantly from their human orthologs. This contributed to efficient immunization of llamas with the human proteins CD70, MET, interleukin (IL)-1β and IL-6, resulting in large panels of functional antibodies. The in silico predicted human-homologous canonical folds of camelid-derived antibodies were confirmed by X-ray crystallography solving the structure of 2 selected camelid anti-CD70 and anti-MET antibodies. These antibodies showed identical fold combinations as found in the corresponding human germline V families, yielding binding site structures closely similar to those occurring in human antibodies. In conclusion, our results indicate that active immunization of camelids can be a powerful therapeutic antibody platform.


Subject(s)
Immunoglobulin Variable Region , Protein Folding , Sequence Homology, Amino Acid , Animals , Camelids, New World , Camelus , Crystallography, X-Ray , Humans , Immunoglobulin Variable Region/chemistry , Immunoglobulin Variable Region/genetics , Immunoglobulin Variable Region/immunology , Protein Structure, Tertiary