Conference paper. Year: 2024

Unveiling Strengths and Weaknesses of NLP Systems Based on a Rich Evaluation Corpus: the Case of NER in French

Abstract

Named Entity Recognition (NER) is an applied task for which annotation schemes vary. To compare the performance of systems whose tagsets differ in precision and coverage, it is necessary to assess (i) the comparability of their annotation schemes and (ii) the adequacy of each scheme with respect to a common annotation scheme. Moreover, given the lack of robustness of some tools to textual variation, we cannot expect an evaluation conducted on a homogeneous, low-coverage corpus to reliably predict the tools' actual performance. To address both of these limitations in evaluation, we provide a gold corpus for French covering 6 textual genres and annotated with a rich tagset that enables comparison with multiple annotation schemes. We use the flexibility of this gold corpus to provide both (i) an individual evaluation of four heterogeneous NER systems on their target tagsets and (ii) a comparison of their performance on a common scheme. This rich evaluation framework enables a fair comparison of NER systems across textual genres and annotation schemes.
Main file
FENEC_LREC_2024.pdf (230.05 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04534593, version 1 (05-04-2024)

Identifiers

  • HAL Id: hal-04534593, version 1

Cite

Alice Millour, Yoann Dupont, Karën Fort, Liam Duignan. Unveiling Strengths and Weaknesses of NLP Systems Based on a Rich Evaluation Corpus: the Case of NER in French. LREC-COLING 2024, May 2024, Turin, Italy. ⟨hal-04534593⟩