Question 1

What is the cheapest AI image generation API?

Accepted Answer

FLUX.1 Schnell is the cheapest, at $0.003 per image via Replicate or Together. The weights are open and Apache-licensed, so you can also self-host it on your own GPU with no per-image fee (you still pay for the hardware). For higher fidelity, FLUX 1.1 Pro at $0.04 per image is the next tier, and FLUX 1.1 Pro Ultra at $0.06 per image is the top of the line.
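To see what the tier gap means at volume, here is a rough cost sketch in Python using only the per-image prices quoted above. The dictionary keys, the function name, and the 10,000-image batch size are illustrative assumptions, not identifiers from any provider's API. Real bills may also vary with resolution or other settings.

```python
# Per-image prices in USD, copied from the answer above.
# Assumption: a flat price per image with no volume discounts.
PRICE_PER_IMAGE = {
    "flux-schnell": 0.003,
    "flux-1.1-pro": 0.04,
    "flux-1.1-pro-ultra": 0.06,
}


def batch_cost(model: str, n_images: int) -> float:
    """Estimate the USD cost of generating n_images with the given model."""
    return round(PRICE_PER_IMAGE[model] * n_images, 2)


if __name__ == "__main__":
    for name in PRICE_PER_IMAGE:
        print(f"{name}: ${batch_cost(name, 10_000):,.2f} per 10,000 images")
```

At these prices the Pro tier costs a bit over 13 times as much per image as Schnell, so the choice mostly comes down to whether the fidelity gain is worth that multiple at your volume.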

Question 2

What is the cheapest production speech-to-text in 2026?

Accepted Answer

Whisper Large v3 hosted on Groq, at roughly $0.0007 per minute, is the cheapest production STT. The model is open-weights under Apache-2.0; Groq runs it on custom LPU silicon at the lowest production price in the catalog. Deepgram Nova-3 at $0.0043/min is the cheapest non-Whisper option and offers state-of-the-art latency.
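Per-minute prices this small are easier to compare once scaled to a realistic workload. The sketch below converts the two rates quoted above into a cost for a given number of audio hours. The provider keys, the function name, and the 1,000-hour workload are assumptions made for illustration only.

```python
# Per-minute prices in USD, copied from the answer above.
# Assumption: billing is linear in audio duration, with no minimums or rounding.
PRICE_PER_MINUTE = {
    "whisper-large-v3-groq": 0.0007,
    "deepgram-nova-3": 0.0043,
}


def transcription_cost(provider: str, audio_hours: float) -> float:
    """Estimate the USD cost of transcribing audio_hours of audio."""
    return round(PRICE_PER_MINUTE[provider] * audio_hours * 60, 2)


if __name__ == "__main__":
    for name in PRICE_PER_MINUTE:
        print(f"{name}: ${transcription_cost(name, 1000):,.2f} per 1,000 hours")
```

Under these assumptions Nova-3 costs roughly six times as much as Groq-hosted Whisper for the same audio, so the premium only pays off if its latency matters for your use case.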

Question 3

Which video model leads the leaderboards?

Accepted Answer

As of late April 2026, Alibaba's HappyHorse 1.0 leads the Artificial Analysis Video Arena with a 115-point Elo margin. Sora 2 (OpenAI) and Veo 3 (Google) follow. Veo 3 is the only major model with native synchronized audio generation; the others require a separate TTS pass.
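An Elo gap is easier to interpret as a head-to-head win rate. The sketch below applies the standard logistic Elo formula to the 115-point gap quoted above. It assumes the arena uses the conventional 400-point scale, which the answer does not state, and the function name is illustrative.

```python
def expected_win_rate(elo_gap: float) -> float:
    """Expected win probability for the higher-rated model.

    Uses the standard Elo formula with a 400-point scale
    (an assumption about how the arena computes its ratings).
    """
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400))


if __name__ == "__main__":
    print(f"Win rate at a 115 Elo gap: {expected_win_rate(115):.1%}")
```

Under that assumption, a 115-point lead corresponds to the leader being preferred in roughly two out of three pairwise comparisons against the runner-up.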

Question 4

Which TTS has the lowest latency for real-time voice agents?

Accepted Answer

Cartesia Sonic 2, at sub-90 ms time-to-first-byte (TTFB). It is built on the Mamba architecture rather than transformers, and it is about 90% cheaper than ElevenLabs at comparable quality for non-celebrity voice cloning. Deepgram Aura-2 is the closest competitor at sub-200 ms TTFB.

Multimodal Models