1.
J Med Internet Res ; 26: e48443, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38271060

ABSTRACT

BACKGROUND: The widespread use of electronic health records in the clinical and biomedical fields makes the removal of protected health information (PHI) essential to maintain privacy. However, a significant portion of information is recorded in unstructured textual forms, posing a challenge for deidentification. In multilingual countries, medical records could be written in a mixture of more than one language, referred to as code mixing. Most current clinical natural language processing techniques are designed for monolingual text, and there is a need to address the deidentification of code-mixed text.
OBJECTIVE: The aim of this study was to investigate the effectiveness and underlying mechanism of fine-tuned pretrained language models (PLMs) in identifying PHI in the code-mixed context. Additionally, we aimed to evaluate the potential of prompting large language models (LLMs) for recognizing PHI in a zero-shot manner.
METHODS: We compiled the first clinical code-mixed deidentification data set consisting of text written in Chinese and English. We explored the effectiveness of fine-tuned PLMs for recognizing PHI in code-mixed content, with a focus on whether PLMs exploit naming regularity and mention coverage to achieve superior performance, by probing the developed models' outputs to examine their decision-making process. Furthermore, we investigated the potential of prompt-based in-context learning of LLMs for recognizing PHI in code-mixed text.
RESULTS: The developed methods were evaluated on a code-mixed deidentification corpus of 1700 discharge summaries. We observed that different PHI types had preferences in their occurrences within the different types of language-mixed sentences, and PLMs could effectively recognize PHI by exploiting the learned name regularity. However, the models may exhibit suboptimal results when regularity is weak or mentions contain unknown words that the representations cannot generate well. We also found that the availability of code-mixed training instances is essential for the model's performance. Furthermore, the LLM-based deidentification method was a feasible and appealing approach that can be controlled and enhanced through natural language prompts.
CONCLUSIONS: The study contributes to understanding the underlying mechanism of PLMs in addressing the deidentification process in the code-mixed context and highlights the significance of incorporating code-mixed training instances into the model training phase. To support the advancement of research, we made a manipulated subset of the resynthesized data set available for research purposes. Based on the compiled data set, we found that the LLM-based deidentification method is a feasible approach, but carefully crafted prompts are essential to avoid unwanted output. However, the use of such methods in the hospital setting requires careful consideration of data security and privacy concerns. Further research could explore the augmentation of PLMs and LLMs with external knowledge to improve their strength in recognizing rare PHI.
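The abstract's point that prompts must be carefully crafted to avoid unwanted output can be illustrated with a small prompt builder. This is only a sketch under stated assumptions: the function name `build_phi_prompt`, the PHI type labels, and the output format are all hypothetical and are not taken from the study, and no particular LLM API is assumed.

```python
from typing import Sequence

# Hypothetical PHI categories for illustration only; the study's label set is not given here.
DEFAULT_PHI_TYPES = ("NAME", "DATE", "HOSPITAL", "ID_NUMBER")


def build_phi_prompt(note: str, phi_types: Sequence[str] = DEFAULT_PHI_TYPES) -> str:
    """Compose a zero-shot instruction asking a model to list PHI spans in `note`."""
    type_list = ", ".join(phi_types)
    return (
        "You are given a clinical note that may mix Chinese and English.\n"
        f"List every span of protected health information of these types: {type_list}.\n"
        # Constraining the output format is one way to discourage free-form, unwanted text.
        "Return one line per span in the form TYPE<TAB>TEXT, copying the text exactly.\n"
        "If there is none, return the single word NONE and nothing else.\n\n"
        f"Note:\n{note}"
    )
```

The resulting string would be sent to whichever model is being evaluated; the explicit output format and the NONE fallback are design choices meant to make the response easy to parse and to check.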


Subject(s)
Artificial Intelligence, Electronic Health Records, Humans, Natural Language Processing, Privacy, China
2.
Russ Linguist ; 48(1): 2, 2024.
Article in English | MEDLINE | ID: mdl-38125924

ABSTRACT

The subject of this study is the so-called "Surzhyk", a mixed Ukrainian-Russian variety used by millions of people in Ukraine, sometimes alongside Ukrainian and, less commonly, alongside Russian. More specifically, the focus here is on the lexicon, addressing the following questions: (i) To what extent is the mixed speech lexicon influenced by Ukrainian or Russian? (ii) Does the distribution of Ukrainian or Russian lexemes reveal a reduction in variation, i.e. patterns of stabilisation? In other words, are there tendencies for one of the two competing, synonymous, or functionally equivalent Ukrainian or Russian lexemes to prevail over the other? Many Ukrainian linguists have stereotypically claimed for years that the distribution of Ukrainian and Russian elements in Surzhyk is unpredictable, spontaneous, if not chaotic. It is worth noting that these opinions are not based on comprehensive, systematic empirical evidence and largely ignore theoretical developments in the field of code-mixing. In contrast, by means of a quantitative analysis of an extensive corpus and a focus on intra-sentential code-mixing, this study demonstrates that the majority of recorded lexical Ukrainian-Russian competitions exhibit a clear fixation on one of the two expressions, resulting in a reduction in variation. In these instances, one of the two expressions prevails extensively across the entire region of Central Ukraine and the Black Sea Coast. Surzhyk is evidently evolving towards a "fused lect". A smaller portion of the examined instances reveals such stabilisation only in certain parts of the survey area, and another equally small portion exhibits widespread variability. In general, Ukrainian and Russian lexemes are roughly balanced in quantity.

3.
PeerJ Comput Sci ; 9: e1312, 2023.
Article in English | MEDLINE | ID: mdl-37409088

ABSTRACT

With the massive use of social media today, mixing between languages in social media text is prevalent. In linguistics, the phenomenon of mixing languages is known as code-mixing. The prevalence of code-mixing exposes various concerns and challenges in natural language processing (NLP), including language identification (LID) tasks. This study presents a word-level language identification model for code-mixed Indonesian, Javanese, and English tweets. First, we introduce a code-mixed corpus for Indonesian-Javanese-English language identification (IJELID). To ensure reliable dataset annotation, we provide full details of the data collection and annotation standards construction procedures. Some challenges encountered during corpus creation are also discussed in this paper. Then, we investigate several strategies for developing code-mixed language identification models, including fine-tuned BERT, BLSTM-based models, and CRF. Our results show that fine-tuned IndoBERTweet models can identify languages better than the other techniques. This is the result of BERT's ability to understand each word's context from the given text sequence. Finally, we show that sub-word language representation in BERT models can provide a reliable model for identifying languages in code-mixed texts.
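To make the word-level language identification task concrete, the sketch below tags each token using a lexicon lookup. It is a toy baseline, not the BERT, BLSTM, or CRF models the abstract evaluates; the lexicons, the tag names `ID`, `JV`, `EN`, and `O`, and the function name `tag_tokens` are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicons; a real system would learn from an annotated corpus.
LEXICONS: Dict[str, Set[str]] = {
    "ID": {"saya", "tidak", "makan", "sudah"},   # Indonesian
    "JV": {"kowe", "ora", "mangan", "wis"},      # Javanese
    "EN": {"you", "not", "eat", "already", "sorry"},
}


def tag_tokens(tokens: List[str]) -> List[Tuple[str, str]]:
    """Label each token with the first language whose lexicon contains it, else 'O'."""
    tagged = []
    for tok in tokens:
        key = tok.lower()
        label = next((lang for lang, words in LEXICONS.items() if key in words), "O")
        tagged.append((tok, label))
    return tagged
```

A lookup like this ignores context entirely, which is the limitation the abstract attributes the contextual models' advantage to: a word shared by two languages cannot be disambiguated without its neighbours.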

4.
Lang Speech ; 66(2): 412-441, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35903821

ABSTRACT

Mouth activity forms a key component of all sign languages. This can be divided into mouthings, which originate from words in the ambient spoken language, and mouth gestures, which do not. This study examines the relationship between the distribution of mouthings co-occurring with verb signs in British Sign Language (BSL) and various linguistic and social factors, using the BSL Corpus. We find considerable variation between participants and a lack of homogeneity in mouth actions with particular signs. This accords with previous theories that mouthings constitute code-blending between spoken and signed languages (similar to code-switching or code-mixing in spoken languages) rather than being a phonologically or lexically compulsory part of the sign. We also find a strong association between production of plain verbs (which are body-anchored and cannot be modified spatially) and increased mouthing. In addition, we observe significant effects of region (signers from the south of the United Kingdom mouth more than those from the north), gender (women mouth more than men), and age (signers aged 16-35 years produce fewer mouthings than older participants). We find no significant effect of language background (deaf vs. hearing family). Based on these findings, we argue that the multimodal, multilingual, and simultaneous nature of code-blending in sign languages fits well within the paradigm of translanguaging. We discuss implications of this for concepts of translanguaging, code-switching, code-mixing, and related phenomena, highlighting the need to consider not just modality and linguistic codes but also sequential versus simultaneous patterning.


Subject(s)
Language, Sign Language, Male, Humans, Female, Linguistics, Gestures, Mouth
5.
Transl Issues Psychol Sci ; 9(4): 323-337, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38405269

ABSTRACT

Language switching is common in bilingual environments, including those of many bilingual children. Some bilingual children hear rapid switching that involves immediate translation of words (an 'immediate-translation' pattern), while others hear their languages most often in long blocks of a single language (a 'one-language-at-a-time' pattern). Our two-site experimental study compared two groups of developing bilinguals from different communities, and investigated whether differences in the timing of language switching impose different demands on bilingual children's learning of novel nouns in their two languages: do children learn differently if they hear a translation immediately vs. if they hear translations more separated in time? Using an at-home online tablet word learning task, data were collected asynchronously from 3- to 5-year-old bilinguals from French-English bilingual families in Montreal, Canada (N = 31) and Spanish-English bilingual families in New Jersey, USA (N = 22). Results showed that bilingual children in both communities readily learned new words, and their performance was similar across the immediate-translation and one-language-at-a-time conditions. Our findings highlight that different types of bilingual interactions can provide equal learning opportunities for bilingual children's vocabulary development.

6.
Stud Health Technol Inform ; 290: 627-631, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673092

ABSTRACT

Electronic health records (EHRs) at medical institutions provide valuable sources for research in both clinical and biomedical domains. However, before such records can be used for research purposes, protected health information (PHI) mentioned in the unstructured text must be removed. In Taiwan's EHR systems, unstructured EHR texts are usually written in a mixture of English and Chinese, which poses challenges for de-identification. This paper presents, to the best of our knowledge, the first study on the construction of a code-mixed EHR de-identification corpus and the evaluation of different mature entity recognition methods applied to the code-mixed PHI recognition task.


Subject(s)
Confidentiality, Electronic Health Records, Language, Natural Language Processing, Taiwan
7.
Soft comput ; : 1-18, 2022 Apr 23.
Article in English | MEDLINE | ID: mdl-35493275

ABSTRACT

Code-mixing on social media is a trend in many countries where people speak multiple languages, such as India, where Hindi and English are major communication languages. Sentiment analysis is beneficial in understanding users' opinions and thoughts on social, economic, and political issues. It eliminates the manual monitoring of each and every review, which is a cumbersome task. However, performing sentiment analysis on code-mixed data is challenging, as it involves various out-of-vocabulary terms and numerous other issues, making it a new field in natural language processing. This work deals with such text and builds an ensemble classifier to detect sentiment polarity. Our classifier ensembles a multilingual variant of RoBERTa and a sentence-level embedding from Universal Sentence Encoder to identify the sentiments of these code-mixed tweets with higher accuracy. This ensemble optimises the classifier's performance by using the strengths of both for transfer learning. Experiments were conducted on real-life benchmark datasets and revealed their sentiment. The performance of the proposed classifier framework is compared with other baselines and deep learning models on five datasets to show the superiority of our results. Results showed improved accuracy, precision, and recall for the proposed classifier. The accuracy achieved by our classifier on code-mixed datasets is 66% on Joshi et al. 2016, 60% on SAIL 2017, and 67% on the SemEval 2020 Task-9 dataset, which is on average around 3% higher than contemporary baselines.
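The abstract does not say how the two components are combined, so the following is only a generic illustration of ensembling: soft voting, in which two models' class-probability vectors are averaged with a weight. The function name `soft_vote`, the `weight_a` parameter, and the three-class layout are hypothetical, and this should not be read as the authors' method.

```python
from typing import Sequence


def soft_vote(prob_a: Sequence[float], prob_b: Sequence[float], weight_a: float = 0.5) -> int:
    """Return the index of the class with the highest weighted-average probability."""
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same set of classes")
    # Weighted average of the two probability vectors (assumed combination rule).
    combined = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]
    return max(range(len(combined)), key=combined.__getitem__)
```

With equal weights, a confident second model can override the first; raising `weight_a` shifts the decision toward the first model, which is the kind of trade-off an ensemble is tuned for.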

8.
Front Psychol ; 12: 726764, 2021.
Article in English | MEDLINE | ID: mdl-34744892

ABSTRACT

This article investigates the role of direct input in the code-mixing of three bilingual children aged 2-4 years acquiring English as one language, and either German, Polish, or Finnish as the other. From a usage-based perspective, it is assumed that children's early utterances are item-based and that they contain many lexically fixed patterns. To account for such patterns, the traceback method has been developed to test the hypothesis that children's utterances are constructed on the basis of a limited inventory of chunks and frame-and-slot patterns. We apply this method to the code-mixed utterances, suggesting that much of the code-mixing occurs within frame-and-slot patterns, such as "Was ist X?", as in "Was ist breakfast muesli?" ("What is breakfast muesli?"). We further analyzed each code-mixed utterance in terms of priming. Our findings suggest that much of the early code-mixing is based on concrete lexically fixed patterns which are subject to input occurring in immediately prior speech, either the child's own or that of her caregivers.
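The frame-and-slot idea in the abstract's example can be sketched as a simple pattern match: a fixed frame with one open slot, filled by material that may come from the other language. This illustrates slot filling only and is not the traceback procedure itself; the function name `match_frame` and the convention of marking the slot with a capital `X` are assumptions.

```python
import re
from typing import Optional


def match_frame(frame: str, utterance: str) -> Optional[str]:
    """Return the slot filler if `utterance` fits `frame` (one slot marked 'X'), else None."""
    # Split the frame around its first 'X'; the frame's fixed words must not contain 'X'.
    prefix, suffix = frame.split("X", 1)
    pattern = re.escape(prefix) + r"(.+)" + re.escape(suffix)
    m = re.fullmatch(pattern, utterance)
    return m.group(1) if m else None
```

For the abstract's example, `match_frame("Was ist X?", "Was ist breakfast muesli?")` yields the English filler inside the German frame, while an utterance that does not share the fixed words yields `None`.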

9.
Rev. logop. foniatr. audiol. (Ed. impr.) ; 41(3): 151-160, Jul-Sep 2021. illus., tables, graphs
Article in English | IBECS | ID: ibc-227197

ABSTRACT

Introduction: This study examined word identification among Spanish-English bilingual and English monolingual individuals, i.e., speech perception using a gated word task examining phonetic category formation.
Methods: Participants included 25 English-speaking monolingual and 10 fluent Spanish-English speaking bilingual adults. The experiment included a gating task incorporating Spanish sentences, English sentences, code-mixed Spanish-English sentences, and code-mixed English-Spanish sentences. The gated words at the end of the sentences consisted of voiceless initial consonants, voiced initial consonants, CV-tense words, or CV-lax words.
Results: There was a significant main effect for consonant voicing features and for language group. In addition, there was a significant interaction between consonant voicing features and language group. A significant main effect for vowel tenseness was also found along with a significant interaction between vowel tenseness and language group.
Discussion: Results suggest that bilinguals are capable of ultimate attainment, with phonotactic capabilities similar to those of monolingual speakers when identifying English words. Bilingual speakers may also be more sensitive to voicing features for both Spanish and English words.




Subject(s)
Humans, Male, Female, Language, Language Development, Multilingualism, Phonetics, Speech, Speech, Language and Hearing Sciences, Audiology
10.
Front Psychol ; 12: 682838, 2021.
Article in English | MEDLINE | ID: mdl-34385956

ABSTRACT

This paper offers an inductive, exploratory study on the role of input and individual differences in the early code-mixing of bilingual children. Drawing on data from two German-English bilingual children, aged 2-4, we use the traceback method to check whether their code-mixed utterances can be accounted for with the help of constructional patterns that can be found in their monolingual data and/or in their caregivers' input. In addition, we apply the traceback method to check whether the patterns used by one child can also be found in the input of the other child. Results show that patterns found in the code-mixed utterances could be traced back to the input the children receive, suggesting that children extract lexical knowledge from their environment. Additionally, tracing back patterns within each child was more successful than tracing back to the other child's corpus, indicating that each child has their own set of patterns which depends very much on their individual input. As such, these findings can shed new light on the interplay of the two developing grammars in bilingual children and their individual differences.

11.
Front Psychol ; 11: 488, 2020.
Article in English | MEDLINE | ID: mdl-32390892

ABSTRACT

In the literature, the term code-mixing/switching refers to instances of language mixing in which speakers/signers combine properties of two or more languages in their utterances. Such linguistic behavior is typically discussed in the context of multilinguals, and experts commonly focus on the form of language mixing/switching and its cross-linguistic commonalities. Not much is known, however, about how the knowledge of code-mixing comes about. How come any speaker/signer having access to more than one externalization channel (spoken or signed) code-mixes spontaneously? Likewise, why do both neurotypical speakers/signers and certain neuro-atypical speakers/signers produce structurally similar mixing types? This paper offers some answers to these questions, arguing that the cognitive process underlying code-mixing is a basic property of the human learning device: recombination, a fully automated cognitive process. Recombination is innate: it allows learners to select relevant linguistic features from heterogeneous inputs and recombine them into new syntactic objects as part of their mental grammars, whose extensions, arguably individual idiolects, represent what Aboh (2015a,b, 2019b) characterizes as hybrid grammars.

12.
Biling (Camb Engl) ; 23(3): 500-518, 2020 May.
Article in English | MEDLINE | ID: mdl-33776543

ABSTRACT

Although there is a body of work investigating code-switching (alternation between two languages in production) in the preschool period, it largely relies on case studies or very small samples. The current work seeks to extend extant research by exploring the development of code-switching longitudinally from 31 to 39 months of age in two distinct groups of bilingual children: Spanish-English children in San Diego and French-English children in Montréal. In two studies, consistent with previous research, children code-switched more often between than within utterances and code-switched more content than function words. Additionally, children code-switched more from Spanish or French to English than the reverse. Importantly, the factors driving the rate of code-switching differed across samples such that exposure was the most important predictor of code-switching in Spanish-English children whereas proficiency was the more important predictor in French-English children.
