openai / evals Public

Fork 2.9k
Star 18.4k

Pull requests: openai/evals

Labels 10 Milestones 0

73 Open 1,261 Closed

Pull requests list

Add agent pre-action control eval

#1658 opened May 10, 2026 by mindbomber

13 tasks done

Add atr_prompt_injection eval (modelgraded safety, 16 multilingual samples)

#1657 opened May 10, 2026 by eeee2345

13 tasks done

Add agent-tool-abstention eval (13 samples, Match template)

#1656 opened May 8, 2026 by MukundaKatta

5 tasks done

Add agent-tool-routing eval (12 samples, Match template)

#1655 opened May 8, 2026 by MukundaKatta

5 tasks done

eval: add Oracle ERP Cloud workflow and terminology eval

#1654 opened May 4, 2026 by karthikchundi-commits

Fix OpenAI completion args routing

#1653 opened Apr 23, 2026 by kayametehan

Add explain mode to HumanCliSolver

#1652 opened Apr 23, 2026 by kayametehan

Route modern OpenAI models through chat completions

#1651 opened Apr 23, 2026 by kayametehan

Handle nested token usage details in oaieval

#1650 opened Apr 23, 2026 by kayametehan

Add Turkish proverbs eval

#1649 opened Apr 23, 2026 by kayametehan

Update Python version to 3.12 and refresh PR template

#1648 opened Apr 23, 2026 by kayametehan

Add Turkish language evals: logical reasoning and grammar

#1647 opened Apr 23, 2026 by kayametehan

eval: add RAIL Score responsible AI evaluation across 8 dimensions

#1640 opened Apr 2, 2026 by SumitVermakgp

12 tasks done

fix: replace 11 bare except clauses with except Exception

#1626 opened Feb 25, 2026 by haosenwang1018

Add finance-agent routing eval dataset and builder guidance

#1625 opened Feb 24, 2026 by maxpetrusenko

README: fix Evals starter guide link

#1623 opened Feb 19, 2026 by dcol91863

Add Logic Stress Stress-test Suite (v2, v3)

#1622 opened Feb 16, 2026 by 14H034160212 Contributor

fix: correct typos in evals

#1621 opened Feb 7, 2026 by thecaptain789

Improving CI

#1617 opened Feb 5, 2026 by fsdavi

13 tasks

Add reasoning consistency eval under constrained intermediate steps

#1615 opened Feb 5, 2026 by getappai

Pritiks23 patch 1

#1613 opened Feb 3, 2026 by Pritiks23

13 tasks done

Refactor JSONL file loading logic in data.py

#1612 opened Feb 3, 2026 by Pritiks23

13 tasks done

Add powershell-encoding-basics eval

#1611 opened Jan 28, 2026 by TheodorNEngoy

Update to python 3.12

#1607 opened Dec 21, 2025 by omonimus1

Add tnengoy_citations.dev.v0 (model-graded factuality eval)

#1603 opened Oct 12, 2025 by TheodorNEngoy
