Search | BVS Regional Portal

On cross-lingual retrieval with multilingual text encoders.

Litschko, Robert; Vulic, Ivan; Ponzetto, Simone Paolo; Glavas, Goran.

Inf Retr Boston ; 25(2): 149-183, 2022.

Article in English | MEDLINE | ID: mdl-35573078

ABSTRACT

Pretrained multilingual text encoders based on neural transformer architectures, such as multilingual BERT (mBERT) and XLM, have recently become a default paradigm for cross-lingual transfer of natural language processing models, rendering cross-lingual word embedding spaces (CLWEs) effectively obsolete. In this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a number of diverse language pairs. We first treat these models as multilingual text encoders and benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR. In contrast to supervised language understanding, our results indicate that for unsupervised document-level CLIR (a setup with no relevance judgments for IR-specific fine-tuning), pretrained multilingual encoders on average fail to significantly outperform earlier models based on CLWEs. For sentence-level retrieval, we do obtain state-of-the-art performance: the peak scores, however, are met by multilingual encoders that have been further specialized, in a supervised fashion, for sentence understanding tasks, rather than using their vanilla 'off-the-shelf' variants. Following these results, we introduce localized relevance matching for document-level CLIR, where we independently score a query against document sections. In the second part, we evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments. Our results show that, despite the supervision, and due to the domain and language shift, supervised re-ranking rarely improves the performance of multilingual transformers as unsupervised base rankers. Finally, only with in-domain contrastive fine-tuning (i.e., same domain, only language transfer), we manage to improve the ranking quality. We uncover substantial empirical differences between cross-lingual retrieval results and results of (zero-shot) cross-lingual transfer for monolingual retrieval in target languages, which point to "monolingual overfitting" of retrieval models trained on monolingual (English) data, even if they are based on multilingual transformers.
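The idea of localized relevance matching mentioned in this abstract (scoring a query independently against sections of a document) can be illustrated with a minimal sketch. The abstract does not specify how sections are formed or how section scores are combined, so the sliding-window split, the max aggregation, and the names `windows`, `localized_score`, and `toy_encode` are all assumptions made for illustration; a real system would replace `toy_encode` with a multilingual text encoder.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def windows(tokens: List[str], size: int, stride: int) -> List[List[str]]:
    """Split a token list into sections (sliding windows; an assumed scheme)."""
    if len(tokens) <= size:
        return [tokens]
    starts = list(range(0, len(tokens) - size + 1, stride))
    if starts[-1] != len(tokens) - size:
        starts.append(len(tokens) - size)  # make sure the tail is covered
    return [tokens[s:s + size] for s in starts]


def localized_score(
    query_vec: Vector,
    doc_tokens: List[str],
    encode: Callable[[List[str]], Vector],
    size: int = 2,
    stride: int = 2,
) -> float:
    """Score each section independently and keep the best one (assumed: max)."""
    return max(cosine(query_vec, encode(w)) for w in windows(doc_tokens, size, stride))


# Hypothetical stand-in for an encoder: counts over a tiny fixed vocabulary.
VOCAB = ["colon", "surgery", "sensor", "network"]


def toy_encode(tokens: List[str]) -> List[float]:
    return [float(tokens.count(term)) for term in VOCAB]


doc = ["sensor", "network", "sensor", "network", "colon", "surgery"]
query_vec = toy_encode(["colon", "surgery"])

whole_doc = cosine(query_vec, toy_encode(doc))          # one vector for the full document
localized = localized_score(query_vec, doc, toy_encode)  # best-matching section

print(f"whole-document score: {whole_doc:.3f}")
print(f"localized score:      {localized:.3f}")
```

In this toy setting the relevant passage is a small part of the document, so encoding the whole document dilutes the match while the section-level score does not.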

CHRONIC PSEUDO-OBSTRUCTION OF THE SIGMOID COLON: A CASE REPORT.

Vulic, Ivan; Sestan-Pesa, Matija; Muzina Misic, Dubravka; Pavic, Ivana; Zivkovic, Mario; Budimir, Ivan; Hrabar, Davor; Ljubicic, Neven; Nikolic, Marko.

Acta Clin Croat ; 61(4): 735-740, 2022 Dec.

Article in English | MEDLINE | ID: mdl-37868188

ABSTRACT

Chronic intestinal pseudo-obstruction (CIPO) is a rare syndrome characterized by signs of intestinal obstruction lasting for 6 months or more, in the absence of a definitive cause of obstruction. We report a case of CIPO in a 49-year-old female patient with a 6-month history of ongoing irregular bowel movements, manifested as constipation and diarrhea accompanied by abdominal pain and a bloated feeling. Contrast-enhanced abdominal computed tomography and magnetic resonance enterography revealed focal thickening of a segment of the lienal flexure and intermittent areas of wider and narrower caliber along the sigmoid colon. No signs of a definitive cause of obstruction were found, but evidence for dolichosigma was revealed, which was later confirmed with colonoscopy. Due to persisting symptoms, the patient agreed to elective resection of the sigmoid colon. Following the procedure, symptoms regressed with a significant improvement in the quality of life. The patient has been regularly monitored in an outpatient setting and reports an absence of symptoms since the procedure. Pathophysiology of the resected section revealed more prominent lymphatic tissue, follicular arrangement, and reactively altered germinal centers, which can suggest CIPO.

Subject(s)

Intestinal Obstruction; Intestinal Pseudo-Obstruction; Female; Humans; Middle Aged; Colon, Sigmoid/diagnostic imaging; Colon, Sigmoid/surgery; Colon, Sigmoid/pathology; Quality of Life; Intestinal Obstruction/diagnosis; Intestinal Obstruction/etiology; Intestinal Obstruction/surgery; Intestinal Pseudo-Obstruction/diagnosis; Intestinal Pseudo-Obstruction/etiology; Intestinal Pseudo-Obstruction/surgery; Tomography, X-Ray Computed

Trustworthy Wireless Sensor Networks for Monitoring Humidity and Moisture Environments.

Prodanovic, Radomir; Sarang, Sohail; Rancic, Dejan; Vulic, Ivan; Stojanovic, Goran M; Stankovski, Stevan; Ostojic, Gordana; Baranovski, Igor; Maksovic, Dusan.

Sensors (Basel) ; 21(11), 2021 May 24.

Article in English | MEDLINE | ID: mdl-34073687

ABSTRACT

Wireless sensor networks (WSNs) are characterized by flexibility and scalability in any environment. These networks are increasingly used in agricultural and industrial environments and have a dual role in data collection from sensors and transmission to a monitoring system, as well as enabling the management of the monitored environment. Environment management depends on trust in the data collected from the surrounding environment, including the time of data creation. This paper proposes a trust model for monitoring humidity and moisture in agricultural and industrial environments. The proposed model uses a digital signature and public key infrastructure (PKI) to establish trust in the data source, i.e., the trust in the sensor. Trust in data generation is essential for real-time environmental monitoring and subsequent analyses, thus timestamp technology is implemented here to further ensure that gathered data are not created or changed after the assigned time. Model validation is performed using the Castalia network simulator by testing energy consumption at the receiver and sender nodes and the delay incurred by creating or validating a trust token. In addition, validation is also performed using the Ascertia TSA Crusher application for the time consumed to obtain a timestamp from the free TSA. The results show that by applying different digital signatures and timestamps, the trust entity of the WSN improved significantly with an increase in power consumption of the sender node by up to 9.3% and receiver node by up to 126.3% for a higher number of nodes, along with a packet delay of up to 15.6% and an average total time consumed up to 1.186 s to obtain the timestamp from the best chosen TSA, which was as expected.

Wireless Sensor Network in Agriculture: Model of Cyber Security.

Prodanovic, Radomir; Rancic, Dejan; Vulic, Ivan; Zoric, Nenad; Bogicevic, Dusan; Ostojic, Gordana; Sarang, Sohail; Stankovski, Stevan.

Sensors (Basel) ; 20(23), 2020 Nov 25.

Article in English | MEDLINE | ID: mdl-33255859

ABSTRACT

Nowadays, wireless sensor networks (WSN) are widely used in agriculture monitoring to improve the quality and productivity of farming. In this application, sensors gather different types of data (e.g., humidity, carbon dioxide level, and temperature) in real-time scenarios. Thus, data gathering, transmission, and rapid response to new circumstances require a secured data mechanism to avoid malicious adversaries. Therefore, this paper focuses on data security from the data origin source to the end-user, and proposes a general data security model that is independent of the network topology and structure, and can be widely used in the agriculture monitoring application. The developed model considers practical aspects, the architecture of the sensor node, as well as the necessity to save energy while ensuring data security, and optimizes the model through the application of organizational and technical measures. The model evaluation is conducted through simulation in terms of energy consumption. The result shows that the proposed model ensures good data security at the cost of a slight increase in energy consumption at receiver and sender nodes, and energy consumption per bit, up to 2%, 7%, and 1.3%, respectively, due to overhead added for authentication in the network.

A deep learning approach to bilingual lexicon induction in the biomedical domain.

Heyman, Geert; Vulic, Ivan; Moens, Marie-Francine.

BMC Bioinformatics ; 19(1): 259, 2018 07 09.

Article in English | MEDLINE | ID: mdl-29986664

ABSTRACT

BACKGROUND: Bilingual lexicon induction (BLI) is an important task in the biomedical domain as translation resources are usually available for general language usage, but are often lacking in domain-specific settings. In this article we consider BLI as a classification problem and train a neural network composed of a combination of recurrent long short-term memory and deep feed-forward networks in order to obtain word-level and character-level representations. RESULTS: The results show that the word-level and character-level representations each improve state-of-the-art results for BLI and biomedical translation mining. The best results are obtained by exploiting the synergy between these word-level and character-level representations in the classification model. We evaluate the models both quantitatively and qualitatively. CONCLUSIONS: Translation of domain-specific biomedical terminology benefits from the character-level representations compared to relying solely on word-level representations. It is beneficial to take a deep learning approach and learn character-level representations rather than relying on handcrafted representations that are typically used. Our combined model captures the semantics at the word level while also taking into account that specialized terminology often originates from a common root form (e.g., from Greek or Latin).

Subject(s)

Data Mining/methods; Deep Learning; Natural Language Processing; Semantics; Humans; Knowledge Bases; Multilingualism

Bio-SimVerb and Bio-SimLex: wide-coverage evaluation sets of word similarity in biomedicine.

Chiu, Billy; Pyysalo, Sampo; Vulic, Ivan; Korhonen, Anna.

BMC Bioinformatics ; 19(1): 33, 2018 02 05.

Article in English | MEDLINE | ID: mdl-29402212

ABSTRACT

BACKGROUND: Word representations support a variety of Natural Language Processing (NLP) tasks. The quality of these representations is typically assessed by comparing the distances in the induced vector spaces against human similarity judgements. Whereas comprehensive evaluation resources have recently been developed for the general domain, similar resources for biomedicine currently suffer from a lack of coverage, both in terms of word types included and with respect to the semantic distinctions. Notably, verbs have been excluded, although they are essential for the interpretation of biomedical language. Further, current resources do not discern between semantic similarity and semantic relatedness, although this has been proven as an important predictor of the usefulness of word representations and their performance in downstream applications. RESULTS: We present two novel comprehensive resources targeting the evaluation of word representations in biomedicine. These resources, Bio-SimVerb and Bio-SimLex, address the previously mentioned problems, and can be used for evaluations of verb and noun representations respectively. In our experiments, we have computed Pearson's correlation between performances on intrinsic and extrinsic tasks using twelve popular state-of-the-art representation models (e.g. word2vec models). The intrinsic-extrinsic correlations using our datasets are notably higher than with previous intrinsic evaluation benchmarks such as UMNSRS and MayoSRS. In addition, when evaluating representation models for their abilities to capture verb and noun semantics individually, we show a considerable variation between performances across all models. CONCLUSION: Bio-SimVerb and Bio-SimLex enable intrinsic evaluation of word representations. This evaluation can serve as a predictor of performance on various downstream tasks in the biomedical domain. The results on Bio-SimVerb and Bio-SimLex using standard word representation models highlight the importance of developing dedicated evaluation resources for NLP in biomedicine for particular word classes (e.g. verbs). These are needed to identify the most accurate methods for learning class-specific representations. Bio-SimVerb and Bio-SimLex are publicly available.
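The intrinsic evaluation described in this abstract (comparing distances in a vector space against human similarity judgements) can be sketched as follows. This is only an illustration: the embeddings, word pairs, and ratings are invented toy values, the helper names are hypothetical, and the choice of cosine similarity with Pearson's correlation is an assumption (rank correlations such as Spearman's are also commonly used for this kind of benchmark).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient; 0.0 if either series is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y) if var_x and var_y else 0.0


def intrinsic_score(
    embeddings: Dict[str, List[float]],
    rated_pairs: List[Tuple[str, str, float]],
) -> float:
    """Correlate model similarities with human ratings over the rated pairs."""
    model_sims = [cosine(embeddings[w1], embeddings[w2]) for w1, w2, _ in rated_pairs]
    human_sims = [rating for _, _, rating in rated_pairs]
    return pearson(model_sims, human_sims)


# Invented toy data, not taken from Bio-SimVerb or Bio-SimLex.
toy_embeddings = {
    "inhibit": [1.0, 0.0],
    "block": [0.9, 0.1],
    "bind": [0.5, 0.5],
    "express": [0.0, 1.0],
}
toy_pairs = [
    ("inhibit", "block", 9.0),
    ("inhibit", "bind", 5.0),
    ("inhibit", "express", 1.0),
]

score = intrinsic_score(toy_embeddings, toy_pairs)
print(f"correlation with human ratings: {score:.3f}")
```

A higher correlation indicates that the model's similarity ordering agrees more closely with the human judgements on the rated pairs.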

Subject(s)

Biomedical Technology; Semantics; Software; Databases as Topic; Humans; Language; Natural Language Processing

Investigating the cross-lingual translatability of VerbNet-style classification.

Majewska, Olga; Vulic, Ivan; McCarthy, Diana; Huang, Yan; Murakami, Akira; Laippala, Veronika; Korhonen, Anna.

Lang Resour Eval ; 52(3): 771-799, 2018.

Article in English | MEDLINE | ID: mdl-30956632

ABSTRACT

VerbNet, the most extensive online verb lexicon currently available for English, has proved useful in supporting a variety of NLP tasks. However, its exploitation in multilingual NLP has been limited by the fact that such classifications are available for few languages only. Since manual development of VerbNet is a major undertaking, researchers have recently translated VerbNet classes from English to other languages. However, no systematic investigation has been conducted into the applicability and accuracy of such a translation approach across different, typologically diverse languages. Our study is aimed at filling this gap. We develop a systematic method for translation of VerbNet classes from English to other languages which we first apply to Polish and subsequently to Croatian, Mandarin, Japanese, Italian, and Finnish. Our results on Polish demonstrate high translatability with all the classes (96% of English member verbs successfully translated into Polish) and strong inter-annotator agreement, revealing a promising degree of overlap in the resultant classifications. The results on other languages are equally promising. This demonstrates that VerbNet classes have strong cross-lingual potential and the proposed method could be applied to obtain gold standards for automatic verb classification in different languages. We make our annotation guidelines and the six language-specific verb classifications available with this paper.
