Last Updated: March 2026
AI API Pricing Guide: Every Provider Compared
AI API pricing can be confusing. Every provider uses slightly different units, some charge differently for input and output tokens, and prices change frequently. This guide breaks it all down in one place, with real cost examples so you can estimate what your project will actually cost. All prices are in USD per 1 million tokens unless noted otherwise.
Pricing Overview: All Models
Here is every major API model with its current pricing, sorted by provider. Prices are per 1 million tokens. For context, 1 million tokens is roughly 750,000 words — several full-length novels' worth of text.
| Provider | Model | Input $/1M | Output $/1M | Context |
|---|---|---|---|---|
| Anthropic | Claude Opus 4.6 | $15.00 | $75.00 | 200K |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 200K |
| Anthropic | Claude Haiku 4.5 | $0.80 | $4.00 | 200K |
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128K |
| OpenAI | o1 | $15.00 | $60.00 | 200K |
| OpenAI | o3-mini | $1.10 | $4.40 | 200K |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | 1M |
| Google | Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
| Meta | Llama 4 Scout | Free* | Free* | 10M |
| Meta | Llama 4 Maverick | Free* | Free* | 1M |
| Mistral | Mistral Large | $2.00 | $6.00 | 128K |
| Mistral | Mistral Small | $0.10 | $0.30 | 128K |
| Cohere | Command R+ | $2.50 | $10.00 | 128K |
| Cohere | Command R | $0.15 | $0.60 | 128K |
* Open source models are free to self-host. Hosted API pricing varies by provider (e.g., Together, Fireworks, Groq). Prices are subject to change. Check provider websites for the most current pricing.
Pricing by Provider
Anthropic
| Model | Input | Output | Context | Capabilities |
|---|---|---|---|---|
| Claude Opus 4.6 | $15.00 | $75.00 | 200K | text, vision, tool-use, code |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | text, vision, tool-use, code |
| Claude Haiku 4.5 | $0.80 | $4.00 | 200K | text, vision, tool-use, code |
OpenAI
| Model | Input | Output | Context | Capabilities |
|---|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K | text, vision, tool-use, code |
| GPT-4o-mini | $0.15 | $0.60 | 128K | text, vision, tool-use, code |
| o1 | $15.00 | $60.00 | 200K | text, reasoning, code |
| o3-mini | $1.10 | $4.40 | 200K | text, reasoning, code |
Google
| Model | Input | Output | Context | Capabilities |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M | text, vision, tool-use, code, reasoning |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M | text, vision, tool-use, code |
Meta
| Model | Input | Output | Context | Capabilities |
|---|---|---|---|---|
| Llama 4 Scout | Free* | Free* | 10M | text, vision, code |
| Llama 4 Maverick | Free* | Free* | 1M | text, vision, code |
Mistral
| Model | Input | Output | Context | Capabilities |
|---|---|---|---|---|
| Mistral Large | $2.00 | $6.00 | 128K | text, vision, tool-use, code |
| Mistral Small | $0.10 | $0.30 | 128K | text, tool-use, code |
Cohere
| Model | Input | Output | Context | Capabilities |
|---|---|---|---|---|
| Command R+ | $2.50 | $10.00 | 128K | text, tool-use, RAG |
| Command R | $0.15 | $0.60 | 128K | text, tool-use, RAG |
Cost Calculator Examples
Abstract token prices are hard to reason about. Here are concrete examples showing what common tasks actually cost with different models. These assume typical token counts for each task type.
Example 1: Chatbot Application (10,000 conversations/month)
Assuming each conversation averages 2,000 input tokens and 1,000 output tokens:
| Model | Input Cost | Output Cost | Total/month |
|---|---|---|---|
| Claude Opus 4.6 | $300.00 | $750.00 | $1,050.00 |
| Claude Sonnet 4.6 | $60.00 | $150.00 | $210.00 |
| GPT-4o | $50.00 | $100.00 | $150.00 |
| Claude Haiku 4.5 | $16.00 | $40.00 | $56.00 |
| GPT-4o-mini | $3.00 | $6.00 | $9.00 |
| Gemini 2.0 Flash | $2.00 | $4.00 | $6.00 |
The takeaway: there is a 175x cost difference between the most expensive and cheapest options for the same workload. Choosing the right model matters enormously.
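These monthly figures follow from simple arithmetic: total tokens divided by one million, times the per-million rate for each side. A minimal sketch of that calculation (prices hardcoded from the table above, so they will drift out of date; verify current rates before relying on them):

```python
# Per-1M-token (input, output) prices in USD, taken from the table above.
PRICES = {
    "claude-opus-4.6":   (15.00, 75.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gpt-4o":            (2.50, 10.00),
    "gpt-4o-mini":       (0.15, 0.60),
    "gemini-2.0-flash":  (0.10, 0.40),
}

def monthly_cost(model, requests, input_tokens, output_tokens):
    """Estimated monthly USD cost for a given request volume."""
    in_rate, out_rate = PRICES[model]
    total_in = requests * input_tokens / 1_000_000   # millions of input tokens
    total_out = requests * output_tokens / 1_000_000  # millions of output tokens
    return total_in * in_rate + total_out * out_rate

# Example 1's chatbot workload: 10,000 conversations/month,
# 2,000 input + 1,000 output tokens each.
print(monthly_cost("claude-sonnet-4.6", 10_000, 2_000, 1_000))  # 210.0
```

Swapping the model name reproduces each row of the table, which makes it easy to re-run the comparison with your own traffic numbers.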
Example 2: Document Summarization (1,000 documents/month)
Assuming each document is 10,000 input tokens and the summary is 500 output tokens:
| Model | Total/month |
|---|---|
| Claude Opus 4.6 | $187.50 |
| Gemini 2.5 Pro | $17.50 |
| Gemini 2.0 Flash | $1.20 |
| Mistral Small | $1.15 |
Example 3: Code Generation (500 requests/day)
Assuming 1,500 input tokens (prompt + context) and 2,000 output tokens (generated code) per request, over a 30-day month (15,000 requests):
| Model | Total/month |
|---|---|
| o1 (reasoning) | $2,137.50 |
| Claude Sonnet 4.6 | $517.50 |
| GPT-4o | $356.25 |
| o3-mini | $156.75 |
| GPT-4o-mini | $21.38 |
Free Tier Comparison
Most providers offer free API access with usage limits. Here is what you get without spending anything:
| Provider | Free Tier Details | Models Available | Limits |
|---|---|---|---|
| OpenAI | Free credits for new accounts | GPT-4o-mini, GPT-3.5 | Rate limited; credit expires |
| Anthropic | Free credits for new accounts | Claude Haiku, Sonnet | Rate limited; credit expires |
| Google | Generous free tier via AI Studio | Gemini 2.0 Flash, 2.5 Pro (limited) | 15 RPM for Flash; lower for Pro |
| Mistral | Free tier available | Mistral Small, open models | Rate limited |
| Meta (via hosts) | Free self-hosting; hosted free tiers vary | Llama 4 Scout, Maverick | Unlimited if self-hosted |
Pro tip: Google AI Studio offers the most generous free API access. If you are prototyping or building a low-traffic application, you can potentially run entirely on Google's free tier with Gemini 2.0 Flash.
Price Per Task Estimates
Here is roughly what common tasks cost per individual request using different model tiers. These are estimates based on typical token counts.
| Task | Tokens (in/out) | Frontier Model | Mid-tier Model | Budget Model |
|---|---|---|---|---|
| Summarize an article | 3K / 300 | $0.067 | $0.014 | $0.0006 |
| Translate 1 page | 500 / 600 | $0.052 | $0.011 | $0.0004 |
| Generate a function | 1K / 500 | $0.053 | $0.011 | $0.0005 |
| Write a blog post | 500 / 3K | $0.233 | $0.047 | $0.0019 |
| Analyze a spreadsheet | 10K / 1K | $0.225 | $0.045 | $0.0021 |
| Chat response (avg) | 2K / 500 | $0.068 | $0.014 | $0.0006 |
Frontier model = Claude Opus 4.6 / o1. Mid-tier = Claude Sonnet 4.6 / GPT-4o. Budget = GPT-4o-mini / Gemini Flash.
Tips for Reducing API Costs
API costs can add up quickly, especially at scale. Here are practical strategies for keeping them under control:
1. Use the smallest model that works
This is the single most impactful optimization. For many tasks, GPT-4o-mini or Gemini Flash produces results that are nearly as good as frontier models at a fraction of the cost. Test your use case with cheaper models first and only upgrade if quality is genuinely insufficient. A model that is 10x cheaper and 95% as good is almost always the right choice.
2. Implement caching
If users ask similar questions, cache the responses. Both Anthropic and OpenAI offer prompt caching features that can reduce costs by up to 90% for repeated prefixes. Even simple application-level caching (storing responses for identical inputs) can save significant money.
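The application-level variant is just a lookup keyed on the exact model-plus-prompt pair. A minimal sketch (here `call_model` is a stand-in for whatever API client you actually use, passed in as a function):

```python
import hashlib

_cache = {}

def cached_completion(model, prompt, call_model):
    """Return a cached response for identical (model, prompt) pairs.

    call_model is your real API call, passed in as a function; it is
    only invoked on a cache miss, so repeated identical requests are free.
    """
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(model, prompt)
    return _cache[key]
```

Note this is distinct from provider-side prompt caching, which discounts repeated prompt *prefixes* even when the full request differs; the two techniques stack.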
3. Optimize your prompts
Shorter prompts cost less. Remove unnecessary instructions, examples, and context. Use system prompts efficiently. If you are including few-shot examples, test whether you really need all of them. Often 1-2 examples work nearly as well as 5-6.
4. Set max token limits
Always set a max_tokens parameter to prevent unexpectedly long (and expensive) responses. For a summarization task, you probably do not need more than 500 output tokens. For code generation, 2,000 is usually plenty.
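The useful property of a cap is that it bounds worst-case spend per request regardless of what the model does. A small sketch of that bound, using the Claude Opus 4.6 output rate from the table above:

```python
def worst_case_output_cost(max_tokens, out_rate_per_million):
    """Upper bound on one response's output cost in USD: even if the
    model emits the full max_tokens, you cannot pay more than this."""
    return max_tokens * out_rate_per_million / 1_000_000

# Capping a Claude Opus 4.6 ($75/1M output) summary at 500 tokens
# bounds output spend at $0.0375 per request.
print(worst_case_output_cost(500, 75.00))  # 0.0375
```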
5. Use model routing
Route different requests to different models based on complexity. Simple questions go to a cheap model; complex ones go to a frontier model. You can implement this with a classifier (which itself can be a cheap model) or with simple heuristics based on input length or keywords.
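A heuristic router can be only a few lines. The sketch below uses input length and keyword markers; the model names, threshold, and marker list are placeholders to tune against your own traffic, not recommended values:

```python
def pick_model(prompt):
    """Toy heuristic router: default to a cheap model, escalate to a
    frontier model only for long or complexity-flagged inputs."""
    complex_markers = ("prove", "refactor", "step by step", "analyze")
    if len(prompt) > 2_000 or any(m in prompt.lower() for m in complex_markers):
        return "claude-opus-4.6"
    return "gpt-4o-mini"

print(pick_model("What is the capital of France?"))      # gpt-4o-mini
print(pick_model("Analyze this contract for risks..."))  # claude-opus-4.6
```

In production, the escalation decision is often made by a cheap classifier model instead of keywords, but the routing structure is the same.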
6. Batch your requests
Both OpenAI and Anthropic offer batch APIs with 50% discounts. If your use case does not require real-time responses (e.g., processing a backlog of documents), batching can cut your costs in half.
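The savings math is straightforward but worth making explicit, since the discount applies to the whole bill, input and output alike. A sketch (the 50% figure matches the published batch discounts described above; confirm it still holds before budgeting around it):

```python
def batched_cost(realtime_cost, batch_discount=0.50):
    """Estimated cost after moving a workload to a batch API.
    batch_discount is the fraction taken off the real-time price."""
    return realtime_cost * (1 - batch_discount)

# Example 2's Claude Opus 4.6 summarization run: $187.50/month
# in real time becomes $93.75/month as an overnight batch job.
print(batched_cost(187.50))  # 93.75
```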
7. Consider open source models
For high-volume applications, self-hosting an open source model like Llama 4 or Mistral can be dramatically cheaper than API calls. The upfront infrastructure cost is higher, but per-request costs approach zero. See our open source LLM guide for details.
Understanding Tokens
Tokens are the fundamental unit of AI API pricing. A token is roughly three-quarters of a word in English. Here are some helpful benchmarks:
- 1 token = roughly 4 characters or 0.75 words in English
- 100 tokens = roughly 75 words (a short paragraph)
- 1,000 tokens = roughly 750 words (about 1.5 pages)
- 10,000 tokens = roughly 7,500 words (a long article)
- 100,000 tokens = roughly 75,000 words (a short novel)
- 1,000,000 tokens = roughly 750,000 words (several novels)
Important: input tokens and output tokens are priced differently, with output tokens typically costing 2-5x more than input tokens. This is because generating text is more computationally intensive than processing it. When estimating costs, always account for both sides.
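These rules of thumb translate into a quick back-of-envelope estimator. The sketch below applies the ~4-characters-per-token rule for English text; it is an approximation only, and for billing-accurate counts you should use the provider's own tokenizer:

```python
def estimate_tokens(text):
    """Rough English token estimate via the ~4 chars/token rule of thumb.
    Not billing-accurate; use the provider's tokenizer for exact counts."""
    return max(1, round(len(text) / 4))

# A 44-character sentence lands near the 1-token-per-0.75-words benchmark.
print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```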