Articles tagged: benchmarking

5 articles

AI research

The Next Frontier: How Artificial Intelligence is Reshaping Scientific Discovery

Artificial intelligence is revolutionizing scientific research by accelerating hypothesis generation, automating experiments, and uncovering patte...

Jun 23, 2026 · 7 min
AI research

Is it agentic enough? Benchmarking open models on your own tooling

Learn how to evaluate open-source AI agents for autonomy and task completion using custom benchmarks. A practical guide for researchers an...

Jun 18, 2026 · 9 min
AI research

olmo-eval: An evaluation workbench for the model development loop

olmo-eval is an evaluation workbench designed to integrate seamlessly into the model development loop, enabling rapid iteration and system...

Jun 12, 2026 · 7 min
AI research

Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

A clear and practical article about artificial intelligence for a professional audience.

Jun 10, 2026 · 4 min
AI agents

The Open Source Community is backing OpenEnv for Agentic RL

A clear and practical article about artificial intelligence for a professional audience.

Jun 8, 2026 · 8 min