LIVE
OPUS 4.7: $15 / $75 per Mtok
SONNET 4.6: $3 / $15 per Mtok
GPT-5.5: $10 / $30 per Mtok
GEMINI 3.1: $3.50 / $10.50 per Mtok
SWE-BENCH leader: Claude Opus 4.7, 72.1%
MMLU-PRO leader: Opus 4.7, 88.4
VALS FINANCE leader: Opus 4.7, 64.4%
AFTA v1.0 whitepaper live at /whitepaper

Benchmarks

Free
GET /api/benchmarks

The /api/benchmarks endpoint returns benchmark scores for major AI models across SWE-bench (real software engineering tasks), MMLU-Pro (general reasoning), HumanEval (code generation), GPQA Diamond (graduate science), and MATH (competition math). Scores are updated weekly as new results are published.
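For callers who are not using an SDK, the endpoint can also be reached with a plain HTTP GET. The sketch below uses only the Python standard library. The base URL is deliberately left as a parameter because this page does not state the API host, and the helper names `benchmarks_url` and `fetch_benchmarks` are hypothetical. It also assumes that no API key is required, which the "Free" label suggests but does not confirm.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def benchmarks_url(base_url: str) -> str:
    """Build the full URL for GET /api/benchmarks from a caller-supplied base URL."""
    # Normalise to exactly one trailing slash so urljoin appends the path
    # instead of replacing the last path segment.
    return urljoin(base_url.rstrip("/") + "/", "api/benchmarks")


def fetch_benchmarks(base_url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the benchmarks payload (assumes no auth is needed)."""
    request = Request(benchmarks_url(base_url), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_benchmarks("<your API host>")` should return a dictionary with the same shape as the example response on this page.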

When to use this endpoint

Use it when your agent needs to compare model capability on a specific dimension. For per-benchmark leaderboard views, see /benchmarks/[name]; for a time series of one model on one benchmark, use /api/premium/history/benchmarks/series.

Example response

{
  "ok": true,
  "lastUpdated": "2026-04-24",
  "benchmarks": [
    { "id": "swe_bench", "name": "SWE-bench", "description": "Real GitHub issue resolution", "maxScore": 100 }
  ],
  "models": [
    {
      "model": "Claude Opus 4.7",
      "provider": "Anthropic",
      "scores": { "swe_bench": 65.4, "mmlu_pro": 93.8, "human_eval": 96.2 }
    }
  ]
}
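Given a payload of this shape, comparing models on a single dimension comes down to picking the highest score for one benchmark id. The following is a minimal sketch rather than part of any SDK: the `leader` helper is a hypothetical name, and the second model in the sample data is a made-up placeholder added only so there is something to compare against.

```python
from typing import Optional


def leader(payload: dict, benchmark_id: str) -> Optional[dict]:
    """Return the model entry with the highest score on one benchmark.

    Models that do not report the benchmark are skipped. Returns None
    when no model reports a score for it.
    """
    scored = [
        m for m in payload.get("models", [])
        if benchmark_id in m.get("scores", {})
    ]
    if not scored:
        return None
    return max(scored, key=lambda m: m["scores"][benchmark_id])


# First entry mirrors the example response; the second is a hypothetical
# placeholder, not real benchmark data.
sample = {
    "ok": True,
    "models": [
        {
            "model": "Claude Opus 4.7",
            "provider": "Anthropic",
            "scores": {"swe_bench": 65.4, "mmlu_pro": 93.8, "human_eval": 96.2},
        },
        {
            "model": "Hypothetical Model X",
            "provider": "ExampleCo",
            "scores": {"swe_bench": 70.0},
        },
    ],
}

top_mmlu = leader(sample, "mmlu_pro")
```

Skipping models that lack a score, instead of treating a missing score as 0, keeps a model that was never evaluated from being ranked as if it had failed the benchmark.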

Code samples

Python SDK

from tensorfeed import TensorFeed

tf = TensorFeed()
b = tf.benchmarks()
# Rank models by SWE-bench score, highest first (models without a score sort as 0)
ranked = sorted(b["models"], key=lambda m: m["scores"].get("swe_bench", 0), reverse=True)

TypeScript SDK

import { TensorFeed } from 'tensorfeed';

const tf = new TensorFeed();
const { models } = await tf.benchmarks();
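const top = models
  // Keep only models that report a SWE-bench score (a score of 0 is kept)
  .filter(m => m.scores.swe_bench != null)
  // Sort highest score first
  .sort((a, b) => b.scores.swe_bench - a.scores.swe_bench);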

FAQ

Where do the benchmark scores come from?

Scores come from each benchmark's official leaderboard and, where applicable, from vendor-published numbers verified against the test methodology. We do not run independent benchmark evaluations.

How current are the benchmark scores?

Scores are updated weekly via the daily catalog cron. Scores for newly launched models typically appear within a few days of being published.
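An agent that caches this data may want to check how old a payload is before relying on it. The sketch below assumes `lastUpdated` is an ISO 8601 date string, as in the example response. The seven-day threshold is an assumption derived from the weekly cadence described above, not a documented guarantee, and both helper names are hypothetical.

```python
from datetime import date


def days_since_update(payload: dict, today: date) -> int:
    """Number of whole days between the payload's lastUpdated date and today."""
    last_updated = date.fromisoformat(payload["lastUpdated"])
    return (today - last_updated).days


def is_stale(payload: dict, today: date, max_age_days: int = 7) -> bool:
    """True when the payload is older than max_age_days (assumed weekly cadence)."""
    return days_since_update(payload, today) > max_age_days


# Only the lastUpdated value is taken from the example response.
payload = {"ok": True, "lastUpdated": "2026-04-24"}
age = days_since_update(payload, date(2026, 5, 1))
```

Passing `today` explicitly, rather than calling `date.today()` inside the helper, keeps the check deterministic and easy to test.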

Related endpoints