Articles tagged: SWE-bench

2 articles

Guides

Testing Mythos and Fable: Moving Beyond SWE-bench with Nvidia’s Open Contender

Explore how Nvidia’s new open-source framework challenges SWE-bench dominance. Learn to test AI models with Mythos and Fable for real-worl...

Jun 20, 2026 · 8 min
AI research

Is it agentic enough? Benchmarking open models on your own tooling

Learn how to evaluate open-source AI agents for autonomy and task completion using custom benchmarks. A practical guide for researchers an...

Jun 18, 2026 · 9 min