Human-in-the-loop quality evaluation
Drop your v3-results-*.json file here
Supports v3 benchmark datasets (1,400+ outputs)