Documentation
A short guide to HornEval and how to use it.
Introduction
HornEval is a lightweight platform for evaluating language technologies: machine translation (MT), automatic speech recognition (ASR), and text-to-speech (TTS). It helps researchers and practitioners run human evaluations, manage datasets, and compare systems on a leaderboard.
What is HornEval?
HornEval focuses on African languages and low-resource settings. You can run real-time or batch evaluations, upload and manage evaluation datasets, and view ranked results on a public leaderboard. The app uses Better Auth for sign-in (Google, GitHub, Hugging Face) and protects certain routes so that only signed-in users can access the datasets and profile pages.
Evaluation types
- MT (Machine Translation): Compare translation outputs, rate quality, and rank systems. Supports real-time and batch evaluation.
- ASR (Automatic Speech Recognition): Evaluate transcriptions, optionally against a reference. Supports batch upload and human rating.
- TTS (Text-to-Speech): Placeholder for future text-to-speech evaluation.
Getting started
From the home page you can start an MT evaluation (add batches, run tasks). The leaderboard is public; datasets and profile pages require sign-in. Use the navbar to switch between MT, ASR, Leaderboard, and Datasets. Sign in with Google, GitHub, or Hugging Face when prompted for protected actions.
Authentication
HornEval uses Better Auth with OAuth only (no email/password). You can sign in with Google, GitHub, or Hugging Face. Sessions are stored in MongoDB. Routes like /profile, /users, and /datasets are protected by the Next.js proxy; unauthenticated users are redirected to the home page.
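The redirect rule for protected routes can be sketched as a small pure function. This is a minimal illustration, not HornEval's actual proxy code: the names `PROTECTED_PREFIXES`, `isProtectedPath`, and `resolveRedirect` are hypothetical, and the session check is reduced to a boolean that a real proxy would obtain from Better Auth.

```typescript
// Hypothetical sketch: the route prefixes come from this guide,
// but every name below is illustrative and not taken from the HornEval codebase.
const PROTECTED_PREFIXES = ["/profile", "/users", "/datasets"];

// True when the path is a protected route or a sub-path of one.
function isProtectedPath(pathname: string): boolean {
  return PROTECTED_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + "/"),
  );
}

// Returns the path to redirect to, or null when the request may proceed.
// `hasSession` stands in for whatever session lookup the real proxy performs.
function resolveRedirect(pathname: string, hasSession: boolean): string | null {
  if (isProtectedPath(pathname) && !hasSession) {
    return "/"; // unauthenticated users are sent to the home page
  }
  return null;
}
```

Matching on the exact prefix or on the prefix followed by a slash keeps an unrelated path such as /profiles from being treated as protected.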
Datasets
The Datasets page (for signed-in users) lets you manage MT and ASR evaluation batches. You can upload JSON batches, refresh the list, and open batch details. Each batch has tasks and a completion state. Only users with the right role can upload; others may need to ask an admin.
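This guide does not specify the JSON batch format, so the sketch below assumes a hypothetical shape (a `name`, a `type` of "mt" or "asr", and a list of `tasks` with a `completed` flag) purely to illustrate how an upload might be validated and how a completion state could be derived from its tasks. The field names and the helpers `parseBatch` and `completionRatio` are assumptions, not HornEval's schema.

```typescript
// Hypothetical batch shape; the real HornEval upload format may differ.
type EvalType = "mt" | "asr";

interface Task {
  id: string;
  source: string; // assumed: source text for MT, audio reference for ASR
  completed: boolean;
}

interface Batch {
  name: string;
  type: EvalType;
  tasks: Task[];
}

function isEvalType(value: unknown): value is EvalType {
  return value === "mt" || value === "asr";
}

// Parses an uploaded JSON string and rejects anything that does not fit the assumed shape.
function parseBatch(json: string): Batch {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Batch must be a JSON object");
  }
  const { name, type, tasks } = data as Record<string, unknown>;
  if (typeof name !== "string" || name.length === 0) {
    throw new Error("Batch needs a non-empty name");
  }
  if (!isEvalType(type)) {
    throw new Error('Batch type must be "mt" or "asr"');
  }
  if (!Array.isArray(tasks)) {
    throw new Error("Batch needs a tasks array");
  }
  const parsedTasks: Task[] = tasks.map((task: unknown, index: number) => {
    if (typeof task !== "object" || task === null) {
      throw new Error(`Task ${index} must be an object`);
    }
    const { id, source, completed } = task as Record<string, unknown>;
    if (typeof id !== "string" || typeof source !== "string") {
      throw new Error(`Task ${index} needs string id and source fields`);
    }
    // A missing or non-boolean flag is treated as "not completed".
    return { id, source, completed: completed === true };
  });
  return { name, type, tasks: parsedTasks };
}

// Fraction of completed tasks, from 0 to 1; an empty batch counts as 0.
function completionRatio(batch: Batch): number {
  if (batch.tasks.length === 0) return 0;
  const done = batch.tasks.filter((task) => task.completed).length;
  return done / batch.tasks.length;
}
```

Under these assumptions, a two-task batch with one completed task has a completion ratio of 0.5.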
Leaderboard
The Leaderboard page is public. It shows ranked systems (e.g. MT or ASR models) based on human evaluation scores. You can filter, pin models for comparison, and see metrics. Data comes from completed evaluation batches.
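As a rough illustration of how a ranking could be derived from human evaluation scores, the sketch below averages ratings per system and sorts by the mean. The guide does not say how HornEval aggregates scores, so the mean, the tie-break by system name, and the names `Rating`, `LeaderboardRow`, and `buildLeaderboard` are all assumptions.

```typescript
// Hypothetical rating record; the scale (for example 1 to 5) is an assumption.
interface Rating {
  system: string;
  score: number;
}

interface LeaderboardRow {
  rank: number;
  system: string;
  meanScore: number;
  ratingCount: number;
}

// Averages the scores per system and ranks systems from highest to lowest mean.
// Ties are broken alphabetically by system name so the order is deterministic.
function buildLeaderboard(ratings: Rating[]): LeaderboardRow[] {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const { system, score } of ratings) {
    const entry = totals.get(system) ?? { sum: 0, count: 0 };
    entry.sum += score;
    entry.count += 1;
    totals.set(system, entry);
  }
  return Array.from(totals.entries())
    .map(([system, { sum, count }]) => ({
      system,
      meanScore: sum / count,
      ratingCount: count,
    }))
    .sort(
      (a, b) => b.meanScore - a.meanScore || a.system.localeCompare(b.system),
    )
    .map((row, index) => ({ rank: index + 1, ...row }));
}
```

A production leaderboard would likely also account for the number of ratings per system, since a mean over very few ratings is noisy.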
Roles & permissions
Each user has a role: user, admin, or root. Only root can open the Users page to manage accounts and roles. Admins can manage datasets and permissions as configured in the app.
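The role rules above can be expressed as simple permission checks. This is a sketch under stated assumptions: the three role names come from this guide, but the function names are hypothetical, and letting root manage datasets alongside admin is an assumption, since the actual permissions are configured in the app.

```typescript
type Role = "user" | "admin" | "root";

// Stated in this guide: only root can open the Users page.
function canManageUsers(role: Role): boolean {
  return role === "root";
}

// Assumption: admin and root may manage datasets (including uploads).
// The real rule depends on how permissions are configured in the app.
function canManageDatasets(role: Role): boolean {
  return role === "admin" || role === "root";
}
```

Keeping each permission in its own function makes it easy to adjust one rule without touching the others if the configured permissions change.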