Results 1 - 4 of 4
1.
PLoS One ; 15(12): e0244179, 2020.
Article in English | MEDLINE | ID: mdl-33378340

ABSTRACT

The state-of-the-art systems for most natural language engineering tasks employ machine learning methods. Despite the improved performance of these systems, there is a lack of established methods for assessing the quality of their predictions. This work introduces a method for explaining the predictions of any sequence-based natural language processing (NLP) task implemented with any model, neural or non-neural. Our method, named EXSEQREG, introduces the concept of a region, which links the prediction to the features that are potentially important for the model. A region is a list of positions in the input sentence associated with a single prediction. Many NLP tasks are compatible with the proposed explanation method, as regions can be formed according to the nature of the task. The method models the prediction probability differences induced by the careful removal of features used by the model. The output of the method is a list of importance values, each signifying the impact of the corresponding feature on the prediction. The proposed method is demonstrated with a neural-network-based named entity recognition (NER) tagger using Turkish and Finnish datasets. A qualitative analysis of the explanations is presented, and the results are validated with a procedure based on the mutual information score of each feature. We show that the method produces reasonable explanations and may be used for (i) assessing how much each feature contributes to a specific prediction of the model and (ii) exploring the features that play a significant role for a trained model when analyzed across the corpus.
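
The mechanism described in the abstract can be illustrated with a short sketch. The following Python fragment is a minimal sketch rather than the paper's implementation: it computes importance values as prediction probability differences under one-at-a-time feature removal over a region. The model.predict_proba interface and the MASK placeholder are assumptions made for illustration.

MASK = "<unk>"  # placeholder token used to "remove" a feature from the input

def region_importance(model, tokens, region, label):
    """Importance of each position in `region` (a list of positions in
    `tokens` tied to a single prediction) for the prediction `label`."""
    base = model.predict_proba(tokens, region, label)  # assumed interface
    importances = []
    for pos in region:
        perturbed = list(tokens)
        perturbed[pos] = MASK  # carefully remove one feature
        p = model.predict_proba(perturbed, region, label)
        importances.append(base - p)  # probability difference = importance
    return importances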


Subject(s)
Natural Language Processing, Machine Learning, Software
2.
PLoS One ; 15(8): e0236863, 2020.
Article in English | MEDLINE | ID: mdl-32780736

ABSTRACT

Much valuable information is embedded in social media posts (microposts), which are contributed by a great variety of people about subjects of interest to others. Automated utilization of this information is challenging due to the overwhelming quantity of posts and the fact that information about a subject is distributed across several posts. Numerous approaches have been proposed to detect topics in collections of microposts, where the topics are represented by lists of terms such as words, phrases, or word embeddings. Such topics are used in tasks like classification and recommendation. Although these representations are becoming increasingly human-interpretable, interpreting the topics is treated as a separate task in such methods. This work proposes an approach for identifying machine-interpretable topics of collective interest. We define a topic as a set of related elements that are associated by having been posted in the same contexts. To represent topics, we introduce an ontology specified according to the W3C recommended standards. The elements of the topics are identified by linking entities to resources published on Linked Open Data (LOD). Such a representation enables processing topics to provide insights that go beyond what is explicitly expressed in the microposts. The feasibility of the proposed approach is examined by generating topics from more than one million tweets collected from Twitter during various events. The utility of these topics is demonstrated with a variety of topic-related tasks, along with a comparison of the effort required to perform the same tasks with word-list-based representations. Manual evaluation of 36 randomly selected sets of topics yielded a precision of 81.0% and an F1 score of 93.3%.
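
As an illustration of the kind of machine-interpretable representation the abstract describes, the following Python sketch uses rdflib (a library implementing the W3C RDF standards) to express a topic as a set of elements linked to Linked Open Data resources. The ex: namespace, the property names, and the example entities are hypothetical stand-ins; the paper defines its own ontology.

from rdflib import Graph, Namespace, RDF, URIRef

EX = Namespace("http://example.org/topic#")      # hypothetical ontology namespace
DBR = Namespace("http://dbpedia.org/resource/")  # a common LOD dataset

g = Graph()
g.bind("ex", EX)

topic = URIRef(EX["topic-42"])
g.add((topic, RDF.type, EX.Topic))
# Elements of the topic: entities posted in the same contexts, identified
# by linking them to LOD resources.
for entity in ("Hurricane_Sandy", "New_Jersey", "Flood"):
    g.add((topic, EX.hasElement, DBR[entity]))

print(g.serialize(format="turtle"))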


Subject(s)
Algorithms, Semantic Web, Humans, Social Media
3.
PLoS One ; 11(3): e0151885, 2016.
Article in English | MEDLINE | ID: mdl-26991442

ABSTRACT

Twitter is an extremely high-volume platform for user-generated contributions on any topic. The wealth of content created in real time and in massive quantities calls for automated approaches to identifying the topics of the contributions. Such topics can be utilized in numerous ways, including public opinion mining, marketing, entertainment, and disaster management. Toward this end, approaches to relate single or partial posts to knowledge base items have been proposed. However, in microblogging systems like Twitter, topics emerge from the culmination of a large number of contributions. Therefore, it is necessary to identify topics based on collections of posts, where individual posts contribute to some aspect of the greater topic. Models such as Latent Dirichlet Allocation (LDA) provide algorithms for relating collections of posts to sets of keywords that represent underlying topics. In these approaches, determining which specific topic(s) the keyword sets represent remains a separate task. Another issue in topic detection is scope, which is often limited to a specific domain, such as health. This work proposes an approach for identifying domain-independent, specific topics related to sets of posts. In this approach, individual posts are processed and then aggregated to identify key tokens, which are then mapped to specific topics. Wikipedia article titles are selected to represent topics, since the corresponding articles are up to date, user generated, sophisticated, and span the topics of human interest. This paper describes the proposed approach, a prototype implementation, and a case study based on data gathered during the heavily contributed periods corresponding to the four US election debates in 2012. The manually evaluated results (0.96 precision) and other observations from the study are discussed in detail.
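
The pipeline sketched in the abstract (process posts, aggregate key tokens, map them to Wikipedia article titles) can be outlined in a few lines of Python. This is a minimal sketch under simplifying assumptions: whitespace tokenization and exact title lookup stand in for the paper's actual processing and mapping steps.

from collections import Counter

def detect_topics(posts, wikipedia_titles, top_k=20):
    # Process individual posts and aggregate token counts across the collection.
    counts = Counter()
    for post in posts:
        counts.update(tok.lower() for tok in post.split())  # naive tokenizer
    key_tokens = [tok for tok, _ in counts.most_common(top_k)]
    # Map key tokens to Wikipedia article titles representing specific topics.
    titles = {t.lower(): t for t in wikipedia_titles}
    return [titles[tok] for tok in key_tokens if tok in titles]

posts = ["Great debate tonight on the economy", "The economy dominated the debate"]
print(detect_topics(posts, {"Economy", "Debate"}))  # -> ['Debate', 'Economy']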


Subject(s)
Blogging, Internet, Social Media, Algorithms, Data Mining, Public Opinion
4.
IEEE J Biomed Health Inform ; 18(4): 1363-9, 2014 Jul.
Article in English | MEDLINE | ID: mdl-25014939

ABSTRACT

Radiologists inspect CT scans and record their observations in reports to communicate with physicians. These reports may suffer from ambiguous language and inconsistencies arising from subjective reporting styles, which present challenges in interpretation. Standardization efforts, such as RadLex, a lexicon of radiology terms, aim to address this issue by developing standard vocabularies. While such vocabularies support consistent annotation, they fall short of enabling reports to be processed by intelligent applications. To support such applications, the semantics of the concepts as well as their relationships must be modeled, for which ontologies are effective: they enable software to make inferences beyond what is explicitly present in the reports. This paper presents ONLIRA (Ontology of the Liver for Radiology), an open-source ontology developed to support intelligent applications such as identifying and ranking similar liver patient cases. ONLIRA is introduced in terms of its concepts, properties, and relations, and examples of real liver patient cases are provided for illustration. The ontology is evaluated in terms of its ability to express real liver patient cases and to answer semantic queries.
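
As an illustration of the semantic queries mentioned in the abstract, the following Python sketch runs a SPARQL query over a set of patient cases with rdflib. The file name and the class and property IRIs are hypothetical stand-ins, not ONLIRA's actual vocabulary; a real query would use the ontology's published IRIs.

from rdflib import Graph

g = Graph()
g.parse("onlira_cases.ttl", format="turtle")  # assumed file of patient cases

# Find cases whose liver has a lesion with a lobulated margin contour.
query = """
PREFIX ex: <http://example.org/onlira#>
SELECT ?case ?lesion WHERE {
    ?case   ex:hasLiver       ?liver .
    ?liver  ex:hasLesion      ?lesion .
    ?lesion ex:marginContour  ex:Lobulated .
}
"""
for case, lesion in g.query(query):
    print(case, lesion)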


Subject(s)
Biological Ontologies, Computational Biology/methods, Liver/diagnostic imaging, Semantics, Tomography, X-Ray Computed/methods, Humans, Lung/diagnostic imaging