Introduction

HornEval is a platform for human evaluation of machine translation (MT) and automatic speech recognition (ASR) systems, with a focus on African languages and low-resource settings.

About HornEval

HornEval helps researchers and practitioners run human evaluations, manage evaluation datasets, and compare systems on a public leaderboard.

  • Evaluation tool: evaluators rate and rank model outputs so you can compare systems objectively.
  • Crowdsourcing for evaluation: tasks can be completed by many evaluators, and their results are aggregated into per-dataset benchmarks.
  • Leaderboarding tool: completed evaluations feed a public leaderboard so you can see which models perform best on which datasets and domains.

Evaluation types

The platform supports two evaluation modes:

  • MT (Machine Translation): each task has a source segment and one or more translation hypotheses. Evaluators assign a numeric rating and a preference rank.
  • ASR (Automatic Speech Recognition): each task has an audio input and one or more transcription hypotheses. Evaluators rate and rank transcriptions. ASR evaluations are batch-based.
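As a rough sketch of the data model the two modes imply (all names here are illustrative, not HornEval's actual schema or API): a task pairs an input (source segment or audio reference) with one or more system hypotheses, each of which collects numeric ratings and preference ranks from evaluators, and aggregating ratings per system yields a leaderboard ordering.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class Hypothesis:
    system: str                                   # name of the MT/ASR system
    text: str                                     # translation or transcription output
    ratings: list = field(default_factory=list)   # numeric ratings from evaluators
    ranks: list = field(default_factory=list)     # preference ranks from evaluators

@dataclass
class Task:
    mode: str                                     # "MT" or "ASR"
    source: str                                   # source segment (MT) or audio reference (ASR)
    hypotheses: list = field(default_factory=list)

def leaderboard(tasks):
    """Aggregate the mean rating per system across all tasks,
    highest-scoring system first (a simple illustrative scheme)."""
    scores = {}
    for task in tasks:
        for hyp in task.hypotheses:
            scores.setdefault(hyp.system, []).extend(hyp.ratings)
    return sorted(((system, mean(r)) for system, r in scores.items() if r),
                  key=lambda pair: pair[1], reverse=True)

# Example: one MT task rated and ranked by two evaluators
task = Task(
    mode="MT",
    source="Selam!",
    hypotheses=[
        Hypothesis("sys-a", "Hello!", ratings=[90, 85], ranks=[1, 1]),
        Hypothesis("sys-b", "Hi there.", ratings=[60, 70], ranks=[2, 2]),
    ],
)
print(leaderboard([task]))
```

The actual platform may aggregate ratings differently (for example, normalizing per evaluator); this only shows how per-task ratings roll up into a cross-system comparison.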