Language Statistics at Different Spatial, Temporal, and Grammatical Scales.
Sánchez-Puig, Fernanda; Lozano-Aranda, Rogelio; Pérez-Méndez, Dante; Colman, Ewan; Morales-Guzmán, Alfredo J; Rivera Torres, Pedro Juan; Pineda, Carlos; Gershenson, Carlos.
Affiliation
  • Sánchez-Puig F; Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
  • Lozano-Aranda R; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
  • Pérez-Méndez D; Instituto de Fisica Interdisciplinar y Sistemas Complejos, Universidad de las Islas Baleares, 07122 Palma de Mallorca, Spain.
  • Colman E; Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
  • Morales-Guzmán AJ; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
  • Rivera Torres PJ; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
  • Pineda C; Posgrado en Ciencias de la Computación, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
  • Gershenson C; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.
Entropy (Basel); 26(9), 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39330068
ABSTRACT
In recent decades, the field of statistical linguistics has made significant strides, fueled by the availability of data. Leveraging Twitter data, this paper explores the English and Spanish languages, investigating their rank diversity across different scales: temporal intervals (ranging from 3 to 96 h), spatial radii (spanning 3 km to over 3000 km), and grammatical word n-grams (ranging from 1-grams to 5-grams). The analysis focuses on word n-grams, examining a time period of 1 year (2014) and eight different countries. Our findings highlight the relevance of all three scales, with the most substantial changes observed at the grammatical level. Specifically, at the monogram level, rank diversity curves exhibit remarkable similarity across languages, countries, and temporal or spatial scales. However, as the grammatical scale expands, variations in rank diversity become more pronounced and are influenced by temporal, spatial, linguistic, and national factors. Additionally, we investigate the statistical characteristics of Twitter-specific tokens, including emojis, hashtags, and user mentions, revealing a sigmoid pattern in their rank diversity function. These insights contribute to quantifying universal language statistics while also identifying potential sources of variation.
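The abstract does not define rank diversity. As a minimal sketch, assuming the definition commonly used in the rank-dynamics literature (the number of distinct n-grams that occupy rank k across the time intervals, divided by the number of intervals), the quantity could be computed as follows. The function names `ngrams`, `ranking`, and `rank_diversity` are hypothetical, and the tie-breaking rule is an illustrative choice, not the authors' method.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of word n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ranking(tokens, n):
    """Rank the n-grams of one interval by descending frequency.

    Ties are broken alphabetically so the result is deterministic
    (an assumption made for this sketch).
    """
    counts = Counter(ngrams(tokens, n))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ordered]


def rank_diversity(rankings, max_rank):
    """Assumed definition: d(k) = (distinct occupants of rank k) / (intervals).

    `rankings` holds one ranked list per time interval; ranks are
    0-indexed here, so d[0] corresponds to the top rank.
    """
    num_intervals = len(rankings)
    diversity = []
    for k in range(max_rank):
        occupants = {r[k] for r in rankings if k < len(r)}
        diversity.append(len(occupants) / num_intervals)
    return diversity


if __name__ == "__main__":
    # Toy stand-ins for the tokenized text of three time intervals.
    intervals = [
        "the cat sat on the mat".split(),
        "the dog and the cat".split(),
        "a cat and a dog".split(),
    ]
    rankings = [ranking(tokens, 1) for tokens in intervals]
    print(rank_diversity(rankings, max_rank=3))
```

Under this assumed definition, a value near 0 means the same n-gram holds a given rank in almost every interval, while a value near 1 means the occupant changes in nearly every interval. Repeating the calculation with `n` from 1 to 5, or with intervals of different lengths, corresponds to varying the grammatical and temporal scales described in the abstract.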
Full text: 1 Collections: 01-international Database: MEDLINE Language: En Journal: Entropy (Basel) Publication year: 2024 Document type: Article Country of affiliation: Mexico Country of publication: Switzerland
