EU AI Act Compliance


Global Compliance Score


Aggregated EU AI Act score across all recently audited LLM traces.

Compliance by Article (EU AI Act 2024/1689)

Article | Requirement | Traces | Pass | Fail | Avg Score | Status
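The per-article columns can be read as a roll-up of individual trace judgements. Below is a minimal sketch, assuming each judgement carries a score in [0, 1] and a pass/fail verdict. The `Judgement` schema, the "ok"/"review" status rule, and the unweighted global mean are all hypothetical, because the page does not state how scores are aggregated. The article labels are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Judgement:
    """One evaluator verdict for one trace against one article (hypothetical schema)."""
    trace_id: str
    article: str   # illustrative label, e.g. "Art. 13"
    score: float   # assumed evaluator score in [0, 1]
    passed: bool   # assumed pass/fail verdict


def article_rows(judgements):
    """Roll judgements up into one row per article: traces, pass, fail, avg score, status."""
    by_article = defaultdict(list)
    for j in judgements:
        by_article[j.article].append(j)

    rows = {}
    for article, js in sorted(by_article.items()):
        passes = sum(j.passed for j in js)
        fails = len(js) - passes
        rows[article] = {
            "traces": len({j.trace_id for j in js}),
            "pass": passes,
            "fail": fails,
            "avg_score": round(mean(j.score for j in js), 3),
            # Hypothetical status rule: any failing trace flags the article for review.
            "status": "ok" if fails == 0 else "review",
        }
    return rows


def global_score(judgements):
    """Assumed global score: unweighted mean of every article-level score."""
    return round(mean(j.score for j in judgements), 3)


if __name__ == "__main__":
    sample = [
        Judgement("t1", "Art. 13", 0.9, True),
        Judgement("t2", "Art. 13", 0.5, False),
        Judgement("t1", "Art. 14", 0.8, True),
    ]
    for article, row in article_rows(sample).items():
        print(article, row)
    print("global", global_score(sample))
```

A weighted mean (for example, weighting articles by risk) would be an equally plausible design. The unweighted version is used here only because it is the simplest reading of "aggregated score".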

Evaluator Calibration — is the judge trustworthy?

Cohen's κ, computed live, measures chance-corrected agreement between the Gemini evaluator and human auditors on a labelled golden set of 20 traces (100 article-level judgements). Measuring the judge is the point: we treat it like any other model, measure it, then improve it. Expanding the golden set and tuning the rubric anchors raised κ from 0.24 (“fair”) to about 0.65 (“substantial”), the 0.61–0.80 band on the Landis & Koch scale that is conventionally treated as the bar for a trustworthy evaluator.
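For reference, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement rate and p_e is the agreement expected by chance from each rater's label frequencies. Below is a minimal sketch, assuming each article-level judgement reduces to a categorical label (here a hypothetical "pass"/"fail") from both the Gemini judge and a human auditor. The function names `cohens_kappa` and `landis_koch` are illustrative, not part of this dashboard.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined (0/0).
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Map kappa onto the Landis & Koch (1977) agreement bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical labels for ten article-judgements (not real golden-set data).
    judge = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
    human = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "pass", "pass"]
    k = cohens_kappa(judge, human)
    print(f"kappa={k:.3f} ({landis_koch(k)})")
```

In this toy example the raters agree on 8 of 10 items, yet κ is only about 0.52 because both label "pass" 70% of the time, so much of that agreement is expected by chance. If adding a dependency is acceptable, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic.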
