Global Compliance Score
Aggregated EU AI Act score across all recently audited LLM traces.
Compliance by Article (EU AI Act 2024/1689)
| Article | Requirement | Traces | Pass | Fail | Avg Score | Status |
|---|---|---|---|---|---|---|
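The dashboard does not say how each row is derived. The snippet below is a minimal sketch of one plausible roll-up from per-trace evaluator verdicts into the columns above. Everything in it is an assumption for illustration: the `Judgement` record, the 0 to 1 score scale, the `pass_threshold` of 0.8 and the "compliant"/"at risk" status labels are hypothetical, not the product's actual schema or rules.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable


@dataclass(frozen=True)
class Judgement:
    # Hypothetical record: one evaluator verdict for one trace against one article.
    trace_id: str
    article: str   # e.g. "Art. 13"
    passed: bool
    score: float   # assumed to lie in [0, 1]


def article_rows(judgements: Iterable[Judgement], pass_threshold: float = 0.8) -> list[dict]:
    """Group verdicts by article and compute the table columns (hypothetical rules)."""
    by_article: dict[str, list[Judgement]] = {}
    for j in judgements:
        # dicts keep insertion order, so rows appear in first-seen article order
        by_article.setdefault(j.article, []).append(j)

    rows = []
    for article, js in by_article.items():
        passes = sum(j.passed for j in js)
        avg = mean(j.score for j in js)
        rows.append({
            "article": article,
            "traces": len({j.trace_id for j in js}),  # distinct traces judged
            "pass": passes,
            "fail": len(js) - passes,
            "avg_score": round(avg, 2),
            # Assumed status rule: average score at or above the threshold.
            "status": "compliant" if avg >= pass_threshold else "at risk",
        })
    return rows


if __name__ == "__main__":
    demo = [
        Judgement("t1", "Art. 13", True, 0.9),
        Judgement("t2", "Art. 13", False, 0.5),
        Judgement("t1", "Art. 14", True, 1.0),
    ]
    for row in article_rows(demo):
        print(row)
```

Keeping Pass/Fail (binary verdicts) separate from Avg Score (graded) lets a row show, for example, mostly passing traces whose average still sits below a threshold.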
Evaluator Calibration — is the judge trustworthy?
Cohen's κ measures live agreement between the Gemini evaluator and human auditors on a labelled golden set (20 traces, 100 article-judgements), correcting for the agreement expected by chance alone. We treat the judge like any other model: measure it, then improve it. Expanding the golden set and tuning the rubric anchors raised κ from 0.24 (“fair”) to about 0.65 (“substantial”), the band conventionally used as the bar for a trustworthy evaluator.
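For readers who want to reproduce the number, here is a minimal sketch of Cohen's κ in Python: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed share of items on which judge and human agree and p_e is the agreement expected by chance from each rater's own label frequencies. The function names are illustrative, and the ten pass/fail labels are made up for the example; they are not the golden set. The band names follow the Landis & Koch (1977) scale, which is where the “fair” and “substantial” labels come from. scikit-learn's `cohen_kappa_score` computes the same statistic if you would rather not hand-roll it.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(judge: Sequence[Hashable], human: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters labelling the same items."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(judge)
    # Observed agreement: fraction of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(judge, human)) / n
    # Chance agreement: for each label, P(judge picks it) * P(human picks it), summed.
    judge_counts, human_counts = Counter(judge), Counter(human)
    p_e = sum(judge_counts[lab] * human_counts[lab] for lab in judge_counts) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Map kappa to the Landis & Koch (1977) descriptive bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Illustrative labels only (not the real golden set): 8 of 10 verdicts agree.
JUDGE = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass", "fail", "pass"]
HUMAN = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "fail", "pass"]

if __name__ == "__main__":
    k = cohens_kappa(JUDGE, HUMAN)
    print(f"kappa = {k:.3f} ({landis_koch(k)})")
```

In this toy example 80% raw agreement yields κ of about 0.58 (“moderate”), because both raters say “pass” 60% of the time and so would agree on 52% of items by chance. That gap between raw agreement and κ is why the judge is scored with κ rather than plain accuracy.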