Terminology
Standard terms used across the platform and in batch specifications.
Batch, task, and model
- Batch: a named evaluation set (e.g. a dataset split). Contains a list of tasks.
- Task: one input (source text for MT, audio for ASR) and multiple model outputs.
- Model: a system (e.g. an MT or ASR engine) that produced one output per task.
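As a rough illustration, the batch/task/model relationship could be modelled as below. This is only a sketch: the class names, the field names (`task_id`, `source`, `text`) and the `missing_outputs` helper are assumptions made for the example, not the platform's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Output:
    model: str  # name of the system (e.g. an MT or ASR engine) that produced this output
    text: str   # the output itself (a translation or a transcript)


@dataclass
class Task:
    task_id: str  # hypothetical identifier, not defined in the glossary
    source: str   # source text for MT; for ASR this could be a path to the audio
    outputs: list[Output] = field(default_factory=list)


@dataclass
class Batch:
    name: str  # e.g. the name of a dataset split
    tasks: list[Task] = field(default_factory=list)

    def models(self) -> set[str]:
        """All model names that appear anywhere in the batch."""
        return {o.model for t in self.tasks for o in t.outputs}


def missing_outputs(batch: Batch) -> list[tuple[str, str]]:
    """Return (task_id, model) pairs where a model has no output for a task."""
    all_models = batch.models()
    missing = []
    for task in batch.tasks:
        present = {o.model for o in task.outputs}
        for model in sorted(all_models - present):
            missing.append((task.task_id, model))
    return missing


batch = Batch(
    name="dev-split",
    tasks=[
        Task("t1", "Guten Morgen", [Output("engine-a", "Good morning"),
                                    Output("engine-b", "Good day")]),
        Task("t2", "Danke", [Output("engine-a", "Thanks")]),
    ],
)
print(missing_outputs(batch))
```

The helper reflects the "one output per task" rule for models: here `engine-b` has no output for `t2`, so that pair is reported.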
Rate and rank
- Rate: numeric quality score (e.g. 1–5) assigned by the evaluator to each output.
- Rank: relative preference order among a task's outputs (1 = best, 2 = second best, and so on). Every output needs both a rating and a rank before submission.
- Ranking must be consistent with ratings: higher-rated outputs must receive better (lower) ranks.
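The two rules above (both values present, ranks consistent with ratings) could be checked before submission along these lines. This is a minimal sketch: the function names and the dictionary layout (model name mapped to value) are assumptions for the example. The glossary does not say how equal ratings should be ranked, so the sketch leaves ties unconstrained.

```python
def is_complete(ratings: dict, ranks: dict, models: list) -> bool:
    """True if every model's output has both a rating and a rank."""
    return all(m in ratings and m in ranks for m in models)


def is_consistent(ratings: dict, ranks: dict) -> bool:
    """True if every strictly higher-rated output has a strictly lower rank number.

    Outputs with equal ratings are not constrained (assumption: the source
    does not define tie handling).
    """
    for a in ratings:
        for b in ratings:
            if ratings[a] > ratings[b] and ranks[a] >= ranks[b]:
                return False
    return True


models = ["engine-a", "engine-b", "engine-c"]
ratings = {"engine-a": 5, "engine-b": 3, "engine-c": 4}
ranks = {"engine-a": 1, "engine-b": 3, "engine-c": 2}

print(is_complete(ratings, ranks, models))  # both values present for all outputs
print(is_consistent(ratings, ranks))        # ranks follow the ratings
```

A pairwise comparison is used instead of sorting so that the check stays correct regardless of how ties in the ratings are ranked.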
Rating guideline, domain, and reference
- Rating guideline: defines the scale (e.g. 1–5) and the meaning of each value. Can be attached to a batch so evaluators see it in the UI.
- Domain: optional content category (e.g. Health, News) per task.
- Reference: optional gold or corrected segment (e.g. a reference translation). Evaluators can view an existing reference, or add one when all outputs are low quality.