
OpenHands

All Hands AI

OpenHands started as the open-source OpenDevin project and now ships as the reference implementation behind several top SWE-bench Verified entries. Architecturally it is a sandboxed runtime plus a small set of agent processes (CodeAct, Browser, Planner) that share a workspace. Most agentic-coding research papers in 2025-2026 use OpenHands as their substrate.

Type: agent-platform
License: Open source
Model story: Multi-model
Vendor: All Hands AI

Leaderboard Placements

Benchmark            Best base model      Score   Rank
SWE-bench Verified   Claude Sonnet 4.6    65.8    #8 / 15
Terminal-Bench       Claude Sonnet 4.6    30.1    #10 / 13
Aider Polyglot       -                    -       -
SWE-Lancer           Claude Sonnet 4.6    28.4    #5 / 5

Distribution

Open-source. Run as a Docker container locally or on a hosted runtime. MIT license.

Model Story

Multi-model. Most entries use Claude Sonnet 4.6 or GPT-5.5; the harness has no preferred model.

Pricing

Free harness; you pay for the underlying API tokens and any compute you host.
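
Because the harness itself is free, a session's cost is mostly token arithmetic. The sketch below estimates it under assumed numbers: the function name, the token counts, and the $3 input / $15 output per million tokens rates are illustrative placeholders, not figures published for any particular OpenHands run. Self-hosted compute is not included.

```python
def session_cost_usd(input_tokens, output_tokens,
                     input_price_per_mtok, output_price_per_mtok):
    """Estimate API spend for one agent session.

    Prices are in USD per million tokens (Mtok). All values passed in
    below are assumptions chosen for illustration.
    """
    total = (input_tokens * input_price_per_mtok
             + output_tokens * output_price_per_mtok)
    return total / 1_000_000


# Hypothetical session: 2M input tokens, 150k output tokens,
# at assumed rates of $3 (input) and $15 (output) per Mtok.
cost = session_cost_usd(2_000_000, 150_000, 3.00, 15.00)
print(f"Estimated API cost: ${cost:.2f}")
```

Agentic sessions tend to be input-heavy because the growing context is resent on each step, so the input rate usually dominates the estimate.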

Who It's For

Researchers and teams building on top of an open agentic substrate, plus anyone who wants to use the same harness that public benchmark entries are run on.

Notable Features

  • CodeAct: agent expresses actions as Python code
  • Built-in browser tool for web tasks
  • Sandboxed Docker runtime per session
  • Microservice-style agent architecture (swap planners freely)
  • Reference implementation for SWE-bench paper submissions
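
The CodeAct idea listed above (actions expressed as Python code) can be illustrated with a toy loop step: the model emits a code string, the harness executes it, and the captured output becomes the next observation. This is a minimal sketch and not OpenHands' implementation. The name `run_action` and the shared `workspace` dict are hypothetical, and the code runs in-process with no isolation, whereas the harness described here executes inside a sandboxed Docker runtime.

```python
import contextlib
import io


def run_action(code: str, workspace: dict) -> str:
    """Execute one agent-emitted Python action and return the observation.

    `workspace` is the namespace shared across actions, so variables set
    by one action are visible to the next. Illustration only: there is
    no sandboxing here.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, workspace)
    except Exception as exc:
        # Errors are returned as text so the agent can react to them.
        return buffer.getvalue() + f"{type(exc).__name__}: {exc}"
    return buffer.getvalue()


# Two consecutive actions sharing state through the workspace.
workspace = {}
obs1 = run_action("total = sum(range(5))\nprint(total)", workspace)
obs2 = run_action("print(total * 2)", workspace)
print(obs1, obs2, sep="")
```

Returning errors as observation text, rather than raising them, lets the agent read the failure and retry with a corrected action.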
