EduNER: a Chinese named entity recognition dataset for education research.

Li, Xu; Wei, Chengkun; Jiang, Zhuoren; Meng, Wenlong; Ouyang, Fan; Zhang, Zihui; Chen, Wenzhi

Affiliation

Li X; College of Computer Science and Technology, Zhejiang University, 38 Zheda Rd., Hangzhou, 310027 Zhejiang China.
Wei C; College of Computer Science and Technology, Zhejiang University, 38 Zheda Rd., Hangzhou, 310027 Zhejiang China.
Jiang Z; School of Public Affairs, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058 Zhejiang China.
Meng W; College of Computer Science and Technology, Zhejiang University, 38 Zheda Rd., Hangzhou, 310027 Zhejiang China.
Ouyang F; College of Education, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058 Zhejiang China.
Zhang Z; Information Technology Center, Zhejiang University, 866 Yuhangtang Rd., Hangzhou, 310058 Zhejiang China.
Chen W; College of Computer Science and Technology, Zhejiang University, 38 Zheda Rd., Hangzhou, 310027 Zhejiang China.

Neural Comput Appl; 1-15, 2023 May 20.

Article in English | MEDLINE | ID: mdl-37362570

ABSTRACT

A high-quality, domain-oriented dataset is crucial for domain-specific named entity recognition (NER). In this study, we introduce EduNER, a novel education-oriented Chinese NER dataset. To provide representative and diverse training data, we collect documents from multiple sources, including textbooks, academic papers, and education-related web pages, spanning ten years (2012-2021). A team of domain experts is invited to define the education NER schema, and a group of trained annotators is hired to complete the annotation. A collaborative labeling platform is built to accelerate human annotation. The resulting EduNER dataset contains 16 entity types, more than 11,000 sentences, and 35,731 entities. We conduct a thorough statistical analysis of EduNER and summarize its distinctive characteristics by comparing it with eight open-domain or domain-specific NER datasets. We further evaluate sixteen state-of-the-art models on the NER task, and the experimental results can guide further exploration. To the best of our knowledge, EduNER is the first publicly available NER dataset in the education domain, and it may promote the development of education-oriented NER models.
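The abstract reports corpus statistics (entity types, sentences, entities) but does not state EduNER's file format. Character-level BIO tagging is a common convention for Chinese NER, so the sketch below assumes it; the tag labels (`PER`, `THEORY`, `COURSE`), the sample sentences, and the helper names `extract_entities` and `corpus_statistics` are hypothetical and not taken from the dataset. It shows one way such counts could be computed from annotated sentences.

```python
from collections import Counter


def extract_entities(chars, tags):
    """Return (entity_type, text) spans from one sentence with character-level BIO tags.

    Assumes hypothetical tags of the form "B-TYPE", "I-TYPE", and "O".
    A stray "I-" tag that does not continue an open span of the same type
    closes any open span and is otherwise ignored.
    """
    entities = []
    current_type = None
    current_chars = []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any span that is still open.
            if current_type is not None:
                entities.append((current_type, "".join(current_chars)))
            current_type = tag[2:]
            current_chars = [ch]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continue the currently open span.
            current_chars.append(ch)
        else:
            # "O" or a non-continuing "I-" tag ends the open span.
            if current_type is not None:
                entities.append((current_type, "".join(current_chars)))
            current_type = None
            current_chars = []
    if current_type is not None:
        entities.append((current_type, "".join(current_chars)))
    return entities


def corpus_statistics(sentences):
    """Given a list of (chars, tags) pairs, return the sentence count,
    the total entity count, and a Counter of entities per type."""
    type_counts = Counter()
    for chars, tags in sentences:
        for entity_type, _ in extract_entities(chars, tags):
            type_counts[entity_type] += 1
    return len(sentences), sum(type_counts.values()), type_counts


if __name__ == "__main__":
    # Hypothetical toy corpus, not drawn from EduNER.
    toy = [
        (list("布鲁姆提出掌握学习理论"),
         ["B-PER", "I-PER", "I-PER", "O", "O"] + ["B-THEORY"] + ["I-THEORY"] * 5),
        (list("小学数学"), ["B-COURSE"] + ["I-COURSE"] * 3),
    ]
    n_sentences, n_entities, per_type = corpus_statistics(toy)
    print(n_sentences, n_entities, dict(per_type))
```

Scanning tags left to right and closing a span on any non-continuing tag keeps the counting logic in a single pass per sentence; a real loader would replace the toy corpus with whatever format the released dataset actually uses.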

Keywords

Benchmark; Chinese named entity recognition; Dataset; Education

Collection: 01-internacional | Database: MEDLINE | Study type: Prognostic_studies | Language: English | Journal: Neural Comput Appl | Year: 2023 | Document type: Article | Country of publication: United Kingdom