Evaluation in natural language processing
Section: Language and Computation
The purposes of evaluation. Different kinds of evaluation (of a hypothesis, of a resource, of a system in terms of its requirements, of a system in terms of usability, of model adequacy, of economic impact). Measures and concepts (properties of measures, relationship with desirable properties, statistical remarks). Evaluation of user-visible vs. user-transparent tasks; black-box vs. glass-box evaluation. The evaluation contest paradigm. Evaluation resources (golden resources, pooling, ablation). Baselines, ceilings, inter-annotator agreement. Corpus-based evaluation. Detailed examples: parsing, information retrieval, information extraction, machine translation, morphological analysis, and generation.