Series

The Benchmark

1 edition

  1. The Benchmark — MMLU (Massive Multitask Language Understanding)

    A plain-English explainer of one AI evaluation benchmark: what it measures, how it works, and when to trust it.