ABSTRACT
In late summer and early autumn 2022, an intense bloom of Protoceratium reticulatum was recorded in the Patagonian fjord system. This dinoflagellate is the main yessotoxin (YTX) producer along the Chilean coast and a major threat to artisanal fisheries, the aquaculture industry, and environmental health. The high YTX levels (>3.75 mg kg⁻¹) resulted in the first YTX-related ban on shellfish collection in Chile. At Puyuhuapi Fjord, a global "hotspot" of harmful algal bloom events, the cell density of P. reticulatum determined in integrated tube samples (0-10 m) at the end of April 2022 reached 407,000 cells L⁻¹. At the same time, YTX levels exceeded the regulatory limit roughly 2.5-fold, with concentrations as high as 9.42 mg kg⁻¹ measured in native populations of the blue mussel Mytilus chilensis. Five YTX analogues (45-OH-YTX, COOH-45-keto-YTX, COOH-45-OH-YTX, COOH-YTX, and 45,55-diOH-YTX) were also detected in relevant amounts. While the ban lasted close to 3 months, toxin accumulation and detoxification were monitored over a 1-year period. This study assessed the implications of high levels of YTXs and their analogues for the local economy and ecosystem health, given the increase in P. reticulatum blooms predicted for NW Patagonia in the context of a changing climate.
ABSTRACT
Protoceratium reticulatum is the main yessotoxin producer along the Chilean coast. Until recently, the yessotoxin levels recorded in this region did not pose a serious threat to human health. However, a bloom of P. reticulatum during the austral summer of 2022 caused the first ban on shellfish collection because of high toxin levels. An earlier bloom of P. reticulatum, during the austral summer of 2020, allowed an evaluation of the fine-scale distribution of the dinoflagellate during a tidal cycle. High-resolution measurements of biophysical properties were carried out in mid-summer (February 18-19) at a fixed sampling station in Puyuhuapi Fjord, Chilean Patagonia, as part of an intensive 24-h biophysical experiment to monitor the circadian distributions of P. reticulatum vegetative cells and yessotoxins. High P. reticulatum cell densities (>20 × 10³ cells L⁻¹) were found in association with a warmer (14.5-15 °C) and estuarine (23.5-24.5 g kg⁻¹) subsurface water layer (6-8 m). P. reticulatum cell numbers and yessotoxins followed a synchronous distribution pattern consistent with the excursions of the pycnocline. Nevertheless, the surface aggregation of the cells was modulated by the light cycle, suggesting daily vertical migration. The yessotoxin content per P. reticulatum cell ranged from 9.4 to 52.2 pg. This study demonstrates both the value of fine-scale resolution measurements of biophysical properties in a highly stratified system and the potential ecosystem impact of P. reticulatum strains producing high levels of yessotoxins.
Subjects
Dinoflagellida, Mollusk Venoms, Oxocins, Dinoflagellida/physiology, Oxocins/analysis, Chile, Estuaries, Light, Harmful Algal Bloom, Marine Toxins/analysis
ABSTRACT
INTRODUCTION: New health technologies are yielding promising results. In chronic conditions such as type 2 diabetes mellitus, which ranks among the top five causes of global mortality, they could help support patient management. MATERIALS AND METHODS: A systematic review of scientific publications from the last 5 years (January 2019 to October 2023) will be conducted to describe the effect of mobile app use on glycated hemoglobin in adult patients with type 2 diabetes mellitus who participated in randomized controlled trials. The search will be carried out in MEDLINE (Ovid), Embase (Ovid), CINAHL (EBSCOhost), CENTRAL, WoS, Scopus, Epistemonikos, and LILACS. The search strategy will combine controlled vocabulary and natural language, and the Cochrane filter will be applied to identify randomized controlled trials. The review will include articles in Spanish, English, or French that report results from randomized controlled trials using mobile applications to manage adults (over 18 years) with type 2 diabetes mellitus and whose outcomes include effects on glycated hemoglobin. The Cochrane Risk of Bias Tool will be used to assess the quality of the studies, and the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) methodology will be used to evaluate the certainty of the evidence. RESULTS: The analysis will be based on participants' glycated hemoglobin levels. Because these data are quantitative and continuous, they facilitate identification of the effects of the mobile applications used to manage type 2 diabetes mellitus (T2DM) in adults. If sufficient data are available, a meta-analysis will be conducted using IBM SPSS.
The effect of the intervention will be estimated as the mean difference. All point estimates will be accompanied by 95% confidence intervals. A random-effects model will be used. Heterogeneity will be assessed using Cochran's Q and I² statistics. DISCUSSION: Because the quality of content and functionality of health-care applications is highly variable, the scientific evidence on the effect of this type of technology in people with T2DM needs to be evaluated.
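The planned pooling step (mean difference, 95% CI, random-effects model, Cochran's Q and I²) can be illustrated with a small sketch. The protocol names IBM SPSS and a random-effects model but not the between-study variance estimator; the DerSimonian-Laird estimator used here, the hypothetical function name `random_effects_md`, and its inputs (per-trial HbA1c mean differences and their standard errors) are assumptions for illustration only.

```python
import math

def random_effects_md(mds, ses, z=1.959964):
    """Pool per-trial mean differences with a DerSimonian-Laird random-effects model.

    mds: mean differences (e.g. HbA1c, intervention minus control)
    ses: their standard errors
    Returns the pooled MD, its 95% CI, Cochran's Q, I^2 (%), and tau^2.
    """
    k = len(mds)
    w = [1.0 / se ** 2 for se in ses]                  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, mds)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, mds))   # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0             # between-trial variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0         # I^2 as a percentage
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]               # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, mds)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "md": pooled,
        "ci95": (pooled - z * se_pooled, pooled + z * se_pooled),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }
```

For instance, `random_effects_md([-0.4, -0.2, -0.6], [0.15, 0.10, 0.20])` would return a pooled mean difference with its interval and heterogeneity statistics; these inputs are made-up numbers, not data from any trial. When Q does not exceed its degrees of freedom, tau² is truncated to zero and the model reduces to a fixed-effect pooling.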
Subjects
Type 2 Diabetes Mellitus, Mobile Applications, Systematic Reviews as Topic, Type 2 Diabetes Mellitus/therapy, Humans, Glycated Hemoglobin/analysis, Glycated Hemoglobin/metabolism, Randomized Controlled Trials as Topic
ABSTRACT
The frequency of harmful algal blooms (HABs) has increased over the last two decades, a phenomenon enhanced by global climate change. However, the effects of climate change will not be distributed equally, and Chile has emerged as an important vulnerable area. The Chilean Patagonian region (41-56°S) hosts two marine ecoregions that support robust blue economies via wild fisheries, aquaculture, and tourism. However, the harmful algal bloom-forming dinoflagellate Alexandrium catenella, a causative agent of paralytic shellfish poisoning outbreaks, threatens the viability of blue industries in this region and others worldwide. Despite the proliferation of A. catenella blooms over the last few decades, the role of sedimentary resting cysts in bloom recurrence and in the species' northward expansion across Chilean Patagonia is not well understood. Because A. catenella produces resting cysts, its sediment-cyst dynamics likely contribute to the species' geographical expansion and bloom recurrence. To examine this, we analyzed a decade of A. catenella surface sediment cyst records across the two ecoregions of the Chilean Patagonian System, further stratified into five subregions based on water temperature, salinity, dissolved oxygen, and nutrient characteristics. We also analyzed spatio-temporal cyst dynamics before, during, and after a bloom in the Chiloense ecoregion (the more northern of the two) of the Magellanic province. Our results indicated highly variable A. catenella resting cyst abundances, with a maximum of 221 cysts cm⁻³ recorded in 2002 after an intense bloom. Generalized linear mixed models and linear mixed models showed that sampling season, subregion, and total organic matter (%) explained resting cyst presence and density. The results also demonstrated the presence of A. catenella cysts in northern subregions, evidencing the northward geographical expansion observed during the last few decades. The risks of A. catenella bloom recurrence from small, patchy resting cyst distributions across broad geographical areas and under changing environmental conditions are discussed.
Subjects
Dinoflagellida, Shellfish Poisoning, Harmful Algal Bloom, Temperature, Aquaculture
ABSTRACT
Toxic and harmful algal blooms (HABs) are a global problem affecting human health, marine ecosystems, and coastal economies, the latter through their impact on aquaculture, fisheries, and tourism. As knowledge of HABs and the techniques to study them advance, so do international monitoring efforts, which have led to a large increase in the total number of reported cases. However, beyond increased detection, environmental factors associated with global change, mainly high nutrient levels and warming temperatures, are responsible for the increased occurrence, persistence, and geographical expansion of HABs. The Chilean Patagonian fjords provide an "open-air laboratory" for the study of climate change, including its impact on the blooms of several toxic microalgal species that, in recent years, have expanded their geographical range and increased in virulence and recurrence (Alexandrium catenella, Pseudochattonella verruculosa, and Heterosigma akashiwo, as well as species of the genera Dinophysis and Pseudo-nitzschia). Here, we review the evolution of HABs in the Chilean Patagonian fjords, focusing on the established connections between key features of HABs (expansion, recurrence, and persistence) and current and predicted climate-change-related factors. We conclude that large-scale climatic anomalies such as lack of rainfall and heat waves, events intensified by climate change, promote the massive proliferation of these species by creating ideal conditions for their growth and persistence, as they affect water-column stratification, nutrient inputs, and reproductive rates.
ABSTRACT
Harmful algal blooms (HABs) in southern Chile are a serious threat to public health, tourism, artisanal fisheries, and aquaculture in this region. Ichthyotoxic HAB species have recently become a major annual threat to the Chilean salmon farming industry because of their severe economic impacts. In early austral autumn 2021, an intense bloom of the raphidophyte Heterosigma akashiwo was detected in Comau Fjord, Chilean Patagonia, causing high mortality of farmed salmon (nearly 6000 tons of biomass) within 15 days. H. akashiwo cells were first detected at the head of the fjord on March 16, 2021 (up to 478 cells mL⁻¹). By March 31, the surface cell density had reached a maximum of 2 × 10⁵ cells mL⁻¹, with intense brown patches visible on the water surface. Strong and persistent high-pressure anomalies over the southern tip of South America, consistent with the positive phase of the Southern Annular Mode (SAM), resulted in extremely dry conditions, high solar radiation, and strong southerly winds. The coupling of these conditions with long water retention times inside the fjord can explain the spatial-temporal dynamics of this bloom event. Other factors, such as local internal physical uplift (favored by the north-to-south orientation of the fjord), salt-fingering events, and the uplift associated with subantarctic deep-water renewal, likely injected nutrients into the euphotic layer, which in turn could have promoted cell growth and thus the high cell densities reached by the bloom.
Subjects
Estuaries, Microalgae, Animals, Climate Change, Harmful Algal Bloom, Salmon, Chile, Water
ABSTRACT
Harmful algal blooms, in particular recurrent blooms of the dinoflagellate Alexandrium catenella associated with paralytic shellfish poisoning (PSP), frequently limit commercial shellfish harvests, with serious socioeconomic consequences. Although the PSP-inducing species that threaten the most vulnerable commercial shellfish species are very patchy and spatially heterogeneous in their distribution, the spatial and temporal scales of their effects have largely been ignored by monitoring programs and researchers. In this study, we examined the spatial and temporal dynamics of PSP toxicity in the clam Ameghinomya antiqua at two fishing grounds in southern Chile (Ovalada Island and Low Bay). During the summer of 2009, both were affected by an intense toxic bloom of A. catenella (up to 1.1 × 10⁶ cells L⁻¹). Generalized linear models were used to assess the potential influence of different environmental variables on the field detoxification rates of PSP toxins over a period of 12 months, and a four-parameter exponential decay model was used to fit and compare field detoxification rates at each sampling site. The results show differences in the spatial variability and temporal dynamics of PSP toxicity: toxicities were about 10-fold higher and detoxification was 20% faster at Ovalada Island, the less oceanic site and the one where more clams are produced annually. Our observations support considering different spatial and temporal scales to obtain more accurate assessments of PSP accumulation and detoxification dynamics and to improve the efficacy of fisheries management after toxic events.
Subjects
Dinoflagellida, Shellfish Poisoning, Biological Toxins, Humans, Shellfish, Harmful Algal Bloom
ABSTRACT
Harmful algal blooms (HABs) pose a severe socioeconomic problem worldwide. The dinoflagellate Alexandrium catenella produces potent neurotoxins called saxitoxins (STXs), and its blooms are associated with the human intoxication known as paralytic shellfish poisoning (PSP). Knowing where and how these blooms originate is crucial for predicting them. Most studies in Chilean Patagonia have focused on coastal areas, assuming that blooms from the adjacent oceanic region are almost nonexistent. Using a combination of field studies and modelling approaches, we first evaluated the role of the continental shelf off northern Chilean Patagonia as a source of A. catenella resting cysts, which may act as an inoculum for toxic coastal blooms. This area is characterized by a seasonal upwelling system with positive Ekman pumping during spring-summer and by the presence of six major submarine canyons. We found that these submarine canyons increase the vertical advection of bottom waters and thus significantly enhance coastal upwelling, a previously unreported factor among those involved in bloom initiation. This finding puts this offshore area at high risk of resuspension of A. catenella resting cysts. Here, we discuss in detail the physical processes promoting this resuspension.
Subjects
Cysts, Dinoflagellida, Shellfish Poisoning, Humans, Chile, Harmful Algal Bloom, Oceans and Seas
ABSTRACT
Harmful algal blooms (HABs) are recurrent in the NW Patagonia fjord system, and their frequency has increased over the last few decades. Outbreaks of HAB species such as Alexandrium catenella, a causal agent of paralytic shellfish poisoning, and Protoceratium reticulatum, a yessotoxin producer, have raised considerable concern because of their adverse socioeconomic consequences. Monitoring programs have mainly focused on the planktonic stages of these species, but because they produce benthic resting cysts, the factors influencing cyst distributions are increasingly recognized as potentially important to HAB recurrence in some regions. Still, a holistic understanding of the physico-chemical conditions influencing cyst distribution in this region is lacking, especially regarding seasonal changes in the drivers of cyst distributions, since the characteristics that favor cyst preservation in the sediment may change through the seasons. In this study, we analyzed the physico-chemical properties of the sediment (temperature, pH, redox potential) and measured bottom dissolved oxygen levels in a "hotspot" area of southern Chile, sampling in spring-summer and in fall-winter, to determine the role these factors may play as modulators of dinoflagellate cyst distribution, specifically for the cysts of A. catenella and P. reticulatum. A permutational analysis of variance (PERMANOVA) showed a significant effect of sediment redox conditions in explaining the differences in cyst assemblages between the spring-summer and fall-winter periods (seasonality). In a generalized linear model (GLM), sediment redox potential and pH were associated with the highest abundances of A. catenella resting cysts in spring-summer, whereas sediment temperature best explained the distribution of A. catenella cysts in fall-winter. For P. reticulatum, only spring-summer sediment redox potential and temperature explained the variation in cyst abundances. The implications of environmental (physico-chemical) seasonality for the resting cyst dynamics of both species are discussed.
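PERMANOVA compares groups of samples through a pseudo-F statistic computed from a distance matrix and assesses significance by permuting group labels. The sketch below follows Anderson's pseudo-F formulation; the distance measure used for the cyst assemblages is not stated in the abstract, so any square distance matrix can be passed in, and the function names `pseudo_f` and `permanova` are hypothetical.

```python
import random

def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F from a square distance matrix and one group label per sample."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs i < j, divided by N
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same quantity per group, divided by group size
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        m = len(idx)
        ss_within += sum(dist[idx[p]][idx[q]] ** 2
                         for p in range(m) for q in range(p + 1, m)) / m
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p-value) obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    labels = list(groups)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pseudo_f(dist, labels) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

As a toy usage, four one-dimensional samples `[0.0, 1.0, 10.0, 11.0]` with Euclidean distances and labels `["spring-summer", "spring-summer", "fall-winter", "fall-winter"]` give a pseudo-F of 200; with so few samples, about a third of all relabelings are equally extreme, so the p-value cannot be small.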
Subjects
Cysts, Dinoflagellida, Shellfish Poisoning, Estuaries, Harmful Algal Bloom, Humans, Seasons
ABSTRACT
Fish-killing blooms of Heterosigma akashiwo and Pseudochattonella verruculosa have been devastating for the farmed salmon industry, but in southern Chile the conditions that promote the growth and toxicity of these microalgae are poorly understood. This study examined the effects of different combinations of temperature (12, 15, 18 °C) and salinity (10, 20, 30 psu) on the growth of Chilean strains of these two species. The optimal growth conditions for H. akashiwo and P. verruculosa differed: H. akashiwo reached a maximum rate of 0.99 day⁻¹ at 15 °C and a salinity of 20 psu, whereas P. verruculosa reached a maximum rate of 1.06 day⁻¹ at 18 °C and a salinity of 30 psu. Cytotoxicity assays (2 × 10¹ to 2 × 10⁵ cells mL⁻¹; intact cells, culture filtrates, and cell lysates) performed at salinities of 20 and 30 psu showed a 100% reduction in the viability of embryonic fish cells exposed to intact cells of H. akashiwo and a 39% reduction following exposure to culture filtrates of P. verruculosa. Differences in the fish-killing mechanisms (direct cell contact vs. extracellular substances) and physiological traits of H. akashiwo and P. verruculosa explain the recent occurrence of very large blooms under contrasting (cold-brackish vs. warm-saline) extreme climate conditions in Chile.
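Rates such as 0.99 day⁻¹ are specific growth rates (μ). The abstract does not say how they were estimated; the sketch below assumes the standard exponential-growth relation μ = ln(N_t / N_0) / Δt applied to two cell densities taken during the exponential phase. The function names are hypothetical.

```python
import math

def specific_growth_rate(n0, nt, days):
    """Specific growth rate mu (day^-1), assuming exponential growth N_t = N_0 * exp(mu * t)."""
    if n0 <= 0 or nt <= 0 or days <= 0:
        raise ValueError("cell densities and elapsed time must be positive")
    return math.log(nt / n0) / days

def doubling_time(mu):
    """Time (days) needed for the population to double at specific growth rate mu."""
    return math.log(2) / mu
```

Under this assumption, the reported maxima correspond to doubling times of roughly 0.70 days (`doubling_time(0.99)`) for H. akashiwo and 0.65 days (`doubling_time(1.06)`) for P. verruculosa.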
Subjects
Microalgae, Stramenopiles, Animals, Homicide, Salinity, Temperature
ABSTRACT
The bloom-forming toxic dinoflagellate Alexandrium catenella was first detected in southern Chile (39.5-55° S) 50 years ago and is responsible for most of the area's cases of paralytic shellfish poisoning (PSP). Given the complex life history of A. catenella, which includes benthic sexual cysts, we examined the potential link between latitude, toxicity, and sexual compatibility. Nine clones isolated from Chilean Patagonia were used in self- and out-crosses in all possible combinations (n = 45). The effect of latitude on toxicity, reproductive success indexes, and cyst production was also determined. The toxin profiles of all strains consisted of C1, C2, GTX4, GTX1, GTX3, and NeoSTX, and both their proportions (%) and content per cell (pg cell⁻¹) followed a latitudinal gradient, with the more toxic strains occurring in the north (40.6° S). Reproductive success also showed a latitudinal tendency and was lower in the north. None of the self-crosses yielded resting cysts; instead, resting cyst production was highest in pairings of clones separated by distances of 1000-1650 km. Our results contribute to a better understanding of PSP outbreaks in the region and demonstrate the importance of resting cysts in fueling new toxic events. They also provide additional evidence that the introduction of strains from neighboring regions is a cause for concern.
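The figure n = 45 follows from counting every cross among nine clones: 9 self-crosses plus C(9, 2) = 36 out-crosses between distinct clones. The enumeration can be checked with the standard library; the clone labels below are hypothetical placeholders.

```python
from itertools import combinations, combinations_with_replacement

# Hypothetical labels for the nine clones isolated from Chilean Patagonia
clones = [f"clone_{i}" for i in range(1, 10)]

self_crosses = [(c, c) for c in clones]          # each clone crossed with itself
out_crosses = list(combinations(clones, 2))      # each unordered pair of distinct clones
all_crosses = list(combinations_with_replacement(clones, 2))

print(len(self_crosses), len(out_crosses), len(all_crosses))  # 9 36 45
```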
Subjects
Dinoflagellida/genetics, Dinoflagellida/metabolism, Marine Toxins/metabolism, Marine Toxins/toxicity, Chile, Ribosomal Spacer DNA/genetics, Eutrophication, Marine Toxins/genetics, Reproduction
ABSTRACT
Noncompaction cardiomyopathy (NCCM) is an abnormality of myocardial morphology frequently associated with a genetic etiology; however, there are few descriptions of its association with autoimmune diseases. We present a review of the literature and the case of a patient with lupus who was admitted with signs of decompensated heart failure, in whom left ventricular noncompaction (LVNC) was confirmed by echocardiography and cardiac magnetic resonance imaging (C-MRI).
Subjects
Female, Systemic Lupus Erythematosus, Heart Failure, Cardiomyopathies
ABSTRACT
The dinoflagellate Alexandrium catenella causes paralytic shellfish poisoning and has negative socioeconomic impacts on the fishing industry and aquaculture. In Chilean Patagonia, the reasons underlying the significant northward expansion of A. catenella blooms during the last five decades are not well understood. To assess the potential spreading risk of A. catenella during an intense austral summer bloom, we conducted an in situ experiment in a "hotspot" of this dinoflagellate in southern Chile. The objective was to assess the accumulation of A. catenella resting cysts in passive (fishing nets) and active (mussels) dispersal agents during the bloom decline phase. Large numbers of resting cysts were detected in fishing nets at 5 m depth (maximum of 5334 cysts net⁻¹ month⁻¹) and in mussels (maximum of 16 cysts g⁻¹ of digestive gland) near Vergara Island. The potential of these vectors to serve as inoculum sources and the implications of our findings for A. catenella population dynamics are discussed.
Subjects
Dinoflagellida, Harmful Algal Bloom, Animals, Chile, Estuaries
ABSTRACT
Despite continuous technical advances in health information standards, their widespread adoption critically depends on political agreement among a diverse set of stakeholders. Countries that have addressed this issue have used diverse strategies. In this vision paper, we present the path that Chile is taking to establish a national program to implement health information standards and achieve interoperability. The Chilean government established an inter-agency program to define the current interoperability situation and the existing gaps, barriers, and facilitators for interoperable health information systems. In response to the identified issues, the government decided to fund a consortium of Chilean universities to create the National Center for Health Information Systems. This consortium is expected to encourage interaction among all health care stakeholders, both public and private, to advance the selection of national standards and define certification procedures for software and human resources in health information technologies.
Subjects
Health Information Systems, Public-Private Sector Partnerships, Chile, Delivery of Health Care, Humans, Medical Informatics
ABSTRACT
Obesity is a chronic disease with an increasing impact on the world's population. In this work, we present a method for automatically identifying obesity using text mining techniques and information related to body weight measures and obesity comorbidities. We used a dataset of 3015 de-identified medical records labeled for two classification problems. The first distinguishes among obesity, overweight, normal weight, and underweight. The second differentiates among obesity types: super obesity, morbid obesity, severe obesity, and moderate obesity. We represented the records with a bag-of-words approach using unigram and bigram feature representations. We implemented two approaches: a hierarchical method and a nonhierarchical one. We used Support Vector Machine and Naïve Bayes classifiers with ten-fold cross-validation to evaluate and compare performance. Our results indicate that the hierarchical approach does not work as well as the nonhierarchical one. In general, Support Vector Machine performed better than Naïve Bayes for both classification problems. We also observed that the bigram representation improved performance compared with the unigram representation.
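To make the bag-of-words and Naïve Bayes components concrete, here is a from-scratch multinomial Naïve Bayes sketch with add-one smoothing. It is not the authors' pipeline: the tokenizer, the choice to add bigrams on top of unigrams (the abstract does not say whether its bigram representation also keeps unigrams), the class names, and the example records are all hypothetical, and the SVM and cross-validation steps are omitted.

```python
import math
from collections import Counter, defaultdict

def features(text, use_bigrams=True):
    """Bag-of-words features: lower-cased unigrams, plus bigrams if requested."""
    tokens = text.lower().split()
    feats = list(tokens)
    if use_bigrams:
        feats += [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return feats

class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels, use_bigrams=True):
        self.use_bigrams = use_bigrams
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.feature_counts[label].update(features(doc, use_bigrams))
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        self.totals = {c: sum(counts.values()) for c, counts in self.feature_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            for f in features(doc, self.use_bigrams):
                if f in self.vocab:  # features unseen in training are ignored
                    score += math.log((self.feature_counts[label][f] + 1)
                                      / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

With four made-up records, e.g. `MultinomialNaiveBayes().fit(["patient with morbid obesity bmi 42", "obesity and type 2 diabetes", "normal weight bmi 22", "weight within normal range"], ["obesity", "obesity", "normal", "normal"])`, the bigram `morbid_obesity` contributes evidence alongside the unigrams when classifying a new note.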
Subjects
Artificial Intelligence, Data Mining/methods, Electronic Health Records/organization & administration, Obesity/diagnosis, Bayes Theorem, Comorbidity, Humans, Natural Language Processing, Overweight/diagnosis, Support Vector Machine
ABSTRACT
Alexandrium ostenfeldii is present in a wide variety of coastal environments worldwide and is the only known dinoflagellate species that produces paralytic shellfish poisoning (PSP) toxins and two types of cyclic imines, spirolides (SPXs) and gymnodimines (GYMs). The increasing frequency of A. ostenfeldii blooms in the Baltic Sea has been attributed to warming waters in this region. To learn more about the environmental conditions favoring the proliferation of A. ostenfeldii and its complex toxicity, the effects of temperature and salinity on the kinetics of both growth and net toxin production of this species were examined using a factorial design and response-surface analysis (RSA). The growth of Baltic A. ostenfeldii occurred over a wide range of temperatures and salinities (12.5-25.5°C and 5-21, respectively), with optimal growth achieved at a temperature of 25.5°C and a salinity of 11.2. Together with the finding that a salinity > 21 was the only growth-limiting factor detected for this strain, this study provides important insights into the autecology and population distribution of this species in the Baltic Sea. PSP toxins, including gonyautoxin (GTX)-3, GTX-2, and saxitoxin (STX), and GYMs (GYM-A and GYM-B/-C analogues) were detected under all temperature and salinity conditions tested and, in most cases, during both the exponential and stationary phases of the dinoflagellate's growth cycle. Toxin concentrations were maximal at 20.9°C and a salinity of 17 for the GYM-A analogue and at > 19°C and a salinity of 15 for PSP toxins. The ecological implications of the optimal conditions for growth and toxin production of A. ostenfeldii in the Baltic Sea are discussed.
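Response-surface analysis typically fits a second-order polynomial in the factors and locates the optimum at the surface's stationary point. The sketch below assumes that form, y = b0 + b1·T + b2·S + b11·T² + b22·S² + b12·T·S, and solves ∂y/∂T = ∂y/∂S = 0; the coefficients in the example are hypothetical, not the study's fitted values. If the stationary point falls outside the tested range, the best condition within the design lies on its boundary instead.

```python
def quadratic_surface(b, t, s):
    """Evaluate a second-order response surface in temperature t and salinity s."""
    b0, b1, b2, b11, b22, b12 = b
    return b0 + b1 * t + b2 * s + b11 * t * t + b22 * s * s + b12 * t * s

def stationary_point(b):
    """Solve dY/dT = dY/dS = 0 and classify the point from the Hessian."""
    _, b1, b2, b11, b22, b12 = b
    # Linear system: 2*b11*T + b12*S = -b1 ; b12*T + 2*b22*S = -b2
    det = 4 * b11 * b22 - b12 ** 2
    if det == 0:
        raise ValueError("surface has no unique stationary point")
    t = (b12 * b2 - 2 * b22 * b1) / det
    s = (b12 * b1 - 2 * b11 * b2) / det
    if det > 0 and b11 < 0:
        kind = "maximum"
    elif det > 0 and b11 > 0:
        kind = "minimum"
    else:
        kind = "saddle"
    return t, s, kind
```

For example, the hypothetical surface y = 1 - (T - 20)² - (S - 15)², i.e. coefficients `(-624, 40, 30, -1, -1, 0)`, has its maximum at T = 20, S = 15.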
Subjects
Dinoflagellida/growth & development, Dinoflagellida/metabolism, Marine Toxins/analysis, Oceans and Seas, Population Dynamics, Salinity, Temperature
ABSTRACT
In this work, we present a system to identify and extract patients' smoking status from clinical narrative text in Spanish. The clinical narrative text was processed using natural language processing techniques and annotated by four people with a biomedical background. The dataset used for classification had 2,465 documents, each annotated with one of four smoking status categories. We used two feature representations: single-word tokens and bigrams. The classification problem was divided into two levels: first, distinguishing smokers (S) from non-smokers (NS); second, distinguishing current smokers (CS) from past smokers (PS). For each feature representation and classification level, we used two classifiers: Support Vector Machines (SVM) and Bayesian Networks (BN). We split the dataset into a training set containing 66% of the documents, used to build the classifiers, and a test set containing the remaining 34%, used to evaluate the models. SVM with the bigram representation performed best at both classification levels. At the S vs NS level, performance was ACC = 85%, precision = 85%, and recall = 90%; at the CS vs PS level, ACC = 87%, precision = 91%, and recall = 94%.
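The evaluation protocol (a 66%/34% hold-out split, then accuracy, precision, and recall) can be sketched as follows. The shuffling, the seed, the rounding of the split point, and the choice of which class counts as positive are assumptions not stated in the abstract; the function names are hypothetical.

```python
import random

def train_test_split(items, train_fraction=0.66, seed=0):
    """Shuffle items and split them into training and test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, and recall, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

Applied to 2,465 document identifiers, this split yields 1,627 training and 838 test documents; the study's exact counts may differ depending on how the split was rounded.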
Subjects
Factual Databases, Electronic Health Records/classification, Natural Language Processing, Smoking, Bayes Theorem, Chile, Humans, Narration, Support Vector Machine
ABSTRACT
The use of text mining and supervised machine learning algorithms on biomedical databases has become increasingly common. However, a question remains: how much data must be annotated to create a suitable training set for a machine learning classifier? In prior research with active learning in medical text classification, we found evidence that not only sample size but also intrinsic characteristics of the texts being analyzed, such as vocabulary size and document length, may influence the resulting classifier's performance. This study attempts to build a regression model that predicts performance from sample size and other text features. Although the model must be trained on existing datasets, we believe that, once it is built, performance on new datasets can be predicted without obtaining annotations for them.
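One way to frame such a model is ordinary least squares on per-dataset features. The abstract does not specify the model form, so the linear specification, the candidate features (for example log sample size, vocabulary size, and mean document length), and the function names below are assumptions; the solver uses the normal equations with Gauss-Jordan elimination to stay dependency-free.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y.

    X: list of feature tuples (e.g. log sample size, vocabulary size, mean
    document length); an intercept column is added automatically.
    """
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    a = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("features are collinear")
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

def predict_ols(beta, x):
    """Predicted performance for one feature tuple."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
```

In practice, features measured on very different scales (vocabulary sizes in the thousands versus log sample sizes) are worth rescaling before fitting, because the normal equations amplify poor conditioning.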
Subjects
Artificial Intelligence, Documentation/classification, Documentation/statistics & numerical data, Meaningful Use/statistics & numerical data, Natural Language Processing, Terminology as Topic, Controlled Vocabulary, Data Curation/methods, Data Mining/statistics & numerical data, Automated Pattern Recognition/methods, Automated Pattern Recognition/statistics & numerical data, Sample Size
ABSTRACT
OBJECTIVE: This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. DESIGN: Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared with that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. MEASUREMENTS: Classification accuracy and the area under the receiver operating characteristic (ROC) curve were computed for each algorithm at different sample sizes. The performance of the active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why performance varies across datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. RESULTS: The DIST and CMB algorithms performed better than passive learning. With the significance level set at 0.05, DIST outperformed passive learning on all five datasets, while CMB was better than passive learning on four. We found strong correlations between dataset diversity and DIV performance, and between dataset uncertainty and DIST performance. CONCLUSION: For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty.
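The selection rules being compared can be sketched in a few lines. This assumes that DIST means margin-based sampling (choosing the unlabeled items whose classifier decision scores lie closest to the boundary) and that passive learning means uniform random sampling; the relative entropy function is the standard Kullback-Leibler divergence, but how the study turned it into dataset diversity and uncertainty measures is not described in the abstract. All function names are hypothetical.

```python
import math
import random

def select_distance_based(scores, k):
    """DIST-style (margin) sampling: indices of the k unlabeled items whose
    decision scores are closest to the classifier's boundary at 0."""
    return sorted(range(len(scores)), key=lambda i: abs(scores[i]))[:k]

def select_passive(n_unlabeled, k, seed=0):
    """Passive learning baseline: k indices drawn uniformly at random."""
    return random.Random(seed).sample(range(n_unlabeled), k)

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must be > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

In an active learning loop, the selected items would be sent for annotation, added to the training set, and the classifier retrained before the next round; that loop is omitted here.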
Subjects
Data Mining/methods, Natural Language Processing, Algorithms, Artificial Intelligence, Humans, ROC Curve
ABSTRACT
BACKGROUND: Supervised learning methods need annotated data to generate efficient models. Annotated data, however, are a relatively scarce resource and can be expensive to obtain. For both passive and active learning methods, the size of the annotated sample required to reach a performance target needs to be estimated. METHODS: We designed and implemented a method that fits an inverse power law model to points of a learning curve created using a small annotated training set. Fitting is carried out using nonlinear weighted least squares optimization. The fitted model is then used to predict the classifier's performance and confidence interval for larger sample sizes. For evaluation, the nonlinear weighted curve fitting method was applied to a set of learning curves generated using clinical text and waveform classification tasks with active and passive sampling methods, and predictions were validated using standard goodness-of-fit measures. As a control, we used an unweighted fitting method. RESULTS: A total of 568 models were fitted, and the model predictions were compared with the observed performances. Depending on the dataset and sampling method, between 80 and 560 annotated samples were needed to achieve mean absolute error and root mean squared error below 0.01. Our weighted fitting method also outperformed the baseline unweighted method (p < 0.05). CONCLUSIONS: This paper describes a simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves. The algorithm outperformed an unweighted algorithm described in previous literature. It can help researchers determine the annotation sample size for supervised machine learning.
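A weighted learning-curve fit can be sketched without a general nonlinear optimizer. This sketch assumes the inverse power law form y = a - b·x^(-c) and, by default, weights proportional to sample size; both are illustrative choices, not necessarily those of the paper. Instead of the paper's nonlinear weighted least squares solver, it swaps in a simpler technique: a grid search over the exponent c, with a and b obtained in closed form by weighted linear least squares for each candidate c. Confidence intervals are omitted.

```python
def fit_inverse_power_law(xs, ys, weights=None, c_grid=None):
    """Weighted least-squares fit of y = a - b * x**(-c) to learning-curve points.

    For each candidate exponent c the model is linear in (a, b), so those two
    parameters are solved in closed form; the c with the lowest weighted SSE wins.
    """
    if weights is None:
        weights = list(xs)  # assumption: trust points with larger samples more
    if c_grid is None:
        c_grid = [i / 100 for i in range(1, 301)]  # exponents 0.01 ... 3.00
    total_w = sum(weights)
    best = None
    for c in c_grid:
        g = [-(x ** -c) for x in xs]  # y = a + b * g
        gm = sum(w * gi for w, gi in zip(weights, g)) / total_w
        ym = sum(w * yi for w, yi in zip(weights, ys)) / total_w
        sxx = sum(w * (gi - gm) ** 2 for w, gi in zip(weights, g))
        if sxx == 0:
            continue
        sxy = sum(w * (gi - gm) * (yi - ym) for w, gi, yi in zip(weights, g, ys))
        b = sxy / sxx
        a = ym - b * gm
        sse = sum(w * (yi - (a + b * gi)) ** 2 for w, gi, yi in zip(weights, g, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c

def predict_performance(params, x):
    """Extrapolated performance at sample size x."""
    a, b, c = params
    return a - b * x ** (-c)
```

A typical use is to fit the curve on performances measured at small sample sizes (say 10 to 160 annotated examples) and then call `predict_performance` at larger sizes to estimate how many annotations a target accuracy would require.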