Agent Observability
Select a project to explore agent trajectories.
🔧
SWE-Agent
Software engineering agent trajectories from the Nebius SWE-agent benchmark — real bug-fixing sessions on open-source repos.
8 tasks · 742 runs · 406 succeeded · 336 failed
🔍
TRACE
Reward hacking detection dataset from PatronusAI — agent trajectories labeled for specification gaming, reward tampering, sycophancy, and sandbagging.
516 tasks · 517 runs · 249 succeeded · 268 reward hacking