LIVE
OPUS 4.7: $15 / $75 per Mtok
SONNET 4.6: $3 / $15 per Mtok
GPT-5.5: $10 / $30 per Mtok
GEMINI 3.1: $3.50 / $10.50 per Mtok
SWE-BENCH leader: Claude Opus 4.7, 72.1%
MMLU-PRO leader: Opus 4.7, 88.4
VALS FINANCE leader: Opus 4.7, 64.4%
AFTA v1.0 whitepaper live at /whitepaper

Inference Provider Matrix

Free
GET /api/inference-providers

The /api/inference-providers endpoint returns the cross-provider pricing matrix for open-weight models. The same weights (Llama 4 Maverick, Llama 4 Scout, DeepSeek V4, Mixtral, Qwen 2.5) are priced differently across 8 hosted providers. Each offer carries the input price, output price, blended price, output tokens per second (TPS), the context window the provider serves, feature flags (function calling, JSON mode, vision), and the provider docs URL.

When to use this endpoint

Use this endpoint when your agent is picking the cheapest hosted inference path for an open-weight model. For a single-model lookup, use /api/inference-providers/cheapest instead, which avoids fetching the full matrix.
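A minimal sketch of that selection step, assuming the offer field names used in the example response on this page (`blendedPrice`, `contextWindow`, `features`). The helper `cheapest_offer`, its parameters, and the second provider in the sample data are hypothetical and not part of the API or SDK.

```python
from typing import Optional


def cheapest_offer(offers: list[dict],
                   required_features: frozenset = frozenset(),
                   min_context: int = 0) -> Optional[dict]:
    """Return the lowest-blended-price offer that meets the constraints, or None."""
    eligible = [
        o for o in offers
        if required_features <= set(o.get("features", []))
        and o.get("contextWindow", 0) >= min_context
    ]
    # min() raises ValueError on an empty sequence, so return None explicitly.
    if not eligible:
        return None
    return min(eligible, key=lambda o: o["blendedPrice"])


# Illustrative data only: the first offer mirrors the example response,
# the second is made up to show the filter at work.
offers = [
    {"provider": "DeepInfra", "blendedPrice": 0.355, "contextWindow": 10000000,
     "features": ["function-calling", "vision"]},
    {"provider": "ExampleHost", "blendedPrice": 0.20, "contextWindow": 131072,
     "features": ["function-calling"]},
]

best = cheapest_offer(offers, required_features=frozenset({"vision"}))
print(best["provider"] if best else "no eligible offer")
```

Filtering before taking the minimum matters because the cheapest offer overall may lack a feature or context length the agent needs.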

Parameters

Name    In     Type    Description
family  query  string  Filter by origin lab (Meta, DeepSeek, Mistral, Alibaba), e.g. Meta

* required
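As a rough illustration of how the `family` query parameter could be attached to a request URL, the sketch below builds the URL with the standard library. The base URL is taken from the TypeScript sample on this page. The helper name `build_matrix_url` is hypothetical, and it assumes that omitting `family` returns the unfiltered matrix, since the parameter is not marked as required.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL as it appears in the TypeScript sample on this page.
BASE_URL = "https://tensorfeed.ai/api/inference-providers"


def build_matrix_url(family: Optional[str] = None) -> str:
    """Return the endpoint URL, adding ?family=... only when a filter is given."""
    if family is None:
        return BASE_URL
    # urlencode escapes characters that are not URL-safe.
    return f"{BASE_URL}?{urlencode({'family': family})}"


print(build_matrix_url())
print(build_matrix_url("Meta"))
```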

Example response

{
  "ok": true,
  "lastUpdated": "2026-04-30",
  "tracked_providers": ["Together AI", "Fireworks", "DeepInfra", "Groq", "OpenRouter", "Replicate", "Anyscale", "DeepSeek"],
  "models": [
    {
      "modelId": "llama-4-scout",
      "modelName": "Llama 4 Scout",
      "family": "Meta",
      "paramsB": 109,
      "license": "Llama 4 Community License",
      "openWeights": true,
      "offers": [
        { "provider": "DeepInfra", "inputPrice": 0.16, "outputPrice": 0.55, "blendedPrice": 0.355, "contextWindow": 10000000, "outputTPS": 170, "features": ["function-calling", "vision"] }
      ]
    }
  ]
}
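In the sample offer, `blendedPrice` (0.355) equals the simple average of `inputPrice` (0.16) and `outputPrice` (0.55). The page does not state the blending formula, so the 1:1 weighting below is an assumption inferred from this single data point, and the function name `blended_price` is hypothetical.

```python
import math


def blended_price(input_price: float, output_price: float) -> float:
    """Average of input and output price (assumed 1:1 weighting, not documented)."""
    return (input_price + output_price) / 2


# Values copied from the example response.
offer = {"inputPrice": 0.16, "outputPrice": 0.55, "blendedPrice": 0.355}

computed = blended_price(offer["inputPrice"], offer["outputPrice"])
# Compare with a tolerance because these are floating-point values.
print(math.isclose(computed, offer["blendedPrice"]))
```

If a workload is heavily skewed toward input or output tokens, a weighted average of the two prices is likely a better cost estimate than the blended figure.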

Code samples

Python SDK

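from tensorfeed import TensorFeed

tf = TensorFeed()
# Fetch the matrix, filtered to models from one origin lab.
matrix = tf.inference_providers(family="Meta")
for m in matrix["models"]:
    if not m["offers"]:
        continue  # min() raises ValueError on an empty offer list
    # Pick the offer with the lowest blended price for this model.
    cheapest = min(m["offers"], key=lambda o: o["blendedPrice"])
    print(f"{m['modelName']:<28} {cheapest['provider']:<14} ${cheapest['blendedPrice']:.3f}")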

TypeScript SDK

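const res = await fetch("https://tensorfeed.ai/api/inference-providers?family=Meta");
if (!res.ok) throw new Error(`Request failed: ${res.status}`);
const { models } = await res.json();
for (const m of models) {
  if (m.offers.length === 0) continue; // reduce() without an initial value throws on []
  // Keep whichever offer has the lower blended price.
  const cheapest = m.offers.reduce((a, b) => a.blendedPrice < b.blendedPrice ? a : b);
  console.log(`${m.modelName}: ${cheapest.provider} @ $${cheapest.blendedPrice}`);
}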

FAQ

Why is the same model priced differently across providers?

Each inference provider runs its own GPU fleet, quantization strategy, and batching policy. Together AI and Fireworks anchor on FP8 Turbo variants for speed. DeepInfra optimizes for raw cost. Groq runs custom LPU silicon at very high throughput, with a context-window trade-off. The price spread on a single model is routinely 3x to 10x.
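One way to quantify that spread for a single model is the ratio of the highest to the lowest blended price across its offers. The sketch below assumes the `blendedPrice` field from the example response. The helper name `price_spread` is hypothetical, and the provider names and prices are made up for illustration.

```python
def price_spread(offers: list[dict]) -> float:
    """Ratio of the most expensive to the cheapest blended price."""
    prices = [o["blendedPrice"] for o in offers]
    if not prices:
        raise ValueError("no offers to compare")
    if min(prices) <= 0:
        raise ValueError("spread is undefined for zero or negative prices")
    return max(prices) / min(prices)


# Hypothetical offers, not real provider prices.
sample_offers = [
    {"provider": "ProviderA", "blendedPrice": 0.25},
    {"provider": "ProviderB", "blendedPrice": 0.50},
    {"provider": "ProviderC", "blendedPrice": 1.00},
]

print(f"{price_spread(sample_offers):.1f}x")  # prints "4.0x"
```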

How fresh is this data?

The data is refreshed editorially once a week. Provider pricing changes more often than embedding pricing but less often than spot-priced compute, so a weekly cadence is the right granularity.
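A client that caches the matrix could check the `lastUpdated` field from the example response against that weekly cadence. This is a sketch under two assumptions: that `lastUpdated` is always an ISO date (`YYYY-MM-DD`) as in the sample, and that 7 days is a reasonable staleness threshold. The helper name `is_stale` is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def is_stale(last_updated: str, today: Optional[date] = None,
             max_age_days: int = 7) -> bool:
    """True when last_updated (YYYY-MM-DD) is more than max_age_days before today."""
    today = today or date.today()
    age = today - date.fromisoformat(last_updated)
    return age > timedelta(days=max_age_days)


# "2026-04-30" is the lastUpdated value from the example response.
print(is_stale("2026-04-30", today=date(2026, 5, 5)))   # 5 days old: False
print(is_stale("2026-04-30", today=date(2026, 5, 12)))  # 12 days old: True
```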

Related endpoints