Pull requests: openai/evals

Pull requests list

Add agent pre-action control eval
#1658 opened May 10, 2026 by mindbomber
13 tasks done
Add atr_prompt_injection eval (modelgraded safety, 16 multilingual samples)
#1657 opened May 10, 2026 by eeee2345
13 tasks done
Add agent-tool-abstention eval (13 samples, Match template)
#1656 opened May 8, 2026 by MukundaKatta
5 tasks done
Add agent-tool-routing eval (12 samples, Match template)
#1655 opened May 8, 2026 by MukundaKatta
5 tasks done
Fix OpenAI completion args routing
#1653 opened Apr 23, 2026 by kayametehan
Add explain mode to HumanCliSolver
#1652 opened Apr 23, 2026 by kayametehan
Handle nested token usage details in oaieval
#1650 opened Apr 23, 2026 by kayametehan
Add Turkish proverbs eval
#1649 opened Apr 23, 2026 by kayametehan
eval: add RAIL Score responsible AI evaluation across 8 dimensions
#1640 opened Apr 2, 2026 by SumitVermakgp
12 tasks done
README: fix Evals starter guide link
#1623 opened Feb 19, 2026 by dcol91863
Add Logic Stress Stress-test Suite (v2, v3)
#1622 opened Feb 16, 2026 by 14H034160212 (Contributor)
fix: correct typos in evals
#1621 opened Feb 7, 2026 by thecaptain789
Improving CI
#1617 opened Feb 5, 2026 by fsdavi
13 tasks
Pritiks23 patch 1
#1613 opened Feb 3, 2026 by Pritiks23
13 tasks done
Refactor JSONL file loading logic in data.py
#1612 opened Feb 3, 2026 by Pritiks23
13 tasks done
Add powershell-encoding-basics eval
#1611 opened Jan 28, 2026 by TheodorNEngoy
Update to python 3.12
#1607 opened Dec 21, 2025 by omonimus1