Behav Res Methods; 54(2): 830-844, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34357542

ABSTRACT

We present Shabd, a psycholinguistic database in Hindi. It is based on a corpus of 1.4 billion words from electronic newspapers and news websites. Word frequencies and part of speech information have been derived and are made available in a cleaned list of 34 thousand hand-selected words, and a list of 96 thousand words observed with a frequency of more than 100 times in the corpus. Next to the Shabd database, we also make a list with all 2.3 million word types available and a list with the 2.5 million most frequent word pairs (word bigrams). The quality of the word frequency measure was tested in two lexical decision tasks. We observed that the Shabd word frequencies outperform existing frequencies based on smaller corpora of newspapers but not the Worldlex word frequencies based on an analysis of blogs. We also observed that word frequency accounts for as much variance as contextual diversity (operationalized as the number of documents in which the words were observed). The Shabd database is freely available for research.


Subject(s)
Language; Psycholinguistics; Blogging; Databases, Factual; Humans; Speech