Sublanguage Corpus Analysis Toolkit: A tool for assessing the representativeness and sublanguage characteristics of corpora.
Temnikova, Irina P; Baumgartner, William A; Hailu, Negacy D; Nikolova, Ivelina; McEnery, Tony; Kilgarriff, Adam; Angelova, Galia; Cohen, K Bretonnel.
Affiliation
  • Temnikova IP; Qatar Computing Research Institute, Doha, Qatar.
  • Baumgartner WA; Computational Bioscience Program, Univ. of Colorado School of Medicine, USA.
  • Hailu ND; Computational Bioscience Program, Univ. of Colorado School of Medicine, USA.
  • Nikolova I; IICT, Bulgarian Academy of Sciences, Sofia, Bulgaria.
  • McEnery T; Department of Linguistics and English Language, Lancaster University, Lancaster, UK.
  • Kilgarriff A; Lexical Computing Ltd., Brighton, UK.
  • Angelova G; IICT, Bulgarian Academy of Sciences, Sofia, Bulgaria.
  • Cohen KB; Computational Bioscience Program, Univ. of Colorado School of Medicine, USA.
LREC Int Conf Lang Resour Eval ; 2014: 1714-1718, 2014 May.
Article in English | MEDLINE | ID: mdl-29568819
Sublanguages are varieties of language that form "subsets" of the general language, typically exhibiting particular types of lexical, semantic, and other restrictions and deviance. SubCAT, the Sublanguage Corpus Analysis Toolkit, assesses the representativeness and closure properties of corpora to determine the extent to which they are either sublanguages or representative samples of the general language. The current version of SubCAT contains scripts and applications for assessing lexical closure, morphological closure, sentence type closure, over-represented words, and syntactic deviance. Its operation is illustrated with three case studies concerning scientific journal articles, patents, and clinical records. Materials from two language families are analyzed: English (Germanic) and Bulgarian (Slavic). The software is available at sublanguage.sourceforge.net under a liberal Open Source license.
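As a rough illustration of what a lexical closure check might involve, the following minimal Python sketch counts how many distinct word types have been seen as tokens accumulate. It is not SubCAT's code: the function name `type_growth_curve`, the `step` parameter, and the simple regex tokenizer are all assumptions made for this example. A curve that flattens as more text is read would suggest a tendency toward closure; a curve that keeps rising would suggest an open vocabulary.

```python
import re


def type_growth_curve(text, step=1000):
    """Return (tokens_seen, distinct_types_seen) pairs sampled every `step` tokens.

    Hypothetical helper for illustration only; not part of SubCAT.
    """
    # Naive tokenizer (an assumption): lowercase runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    seen = set()
    curve = []
    for i, token in enumerate(tokens, start=1):
        seen.add(token)
        # Sample at each step boundary and once more at the final token.
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve
```

Calling `type_growth_curve(corpus_text, step=1000)` on two corpora of similar size and comparing the resulting curves is one simple way to contrast a suspected sublanguage with a general-language sample.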
Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: LREC Int Conf Lang Resour Eval Year: 2014 Document type: Article Country of affiliation: Qatar Country of publication: France