Multimodal Models
A catalog of image generation, video generation, text-to-speech, and speech-to-text models. Pricing is listed in modality-native units (per image, per second, per 1k characters, per minute).
For agents: the same data is available at /api/multimodal. Filter with ?modality=image|video|tts|stt. Free, no auth, cached for 10 minutes.
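As a rough illustration of the filter described above, here is a minimal Python sketch that builds the request URL and fetches the catalog. Several things in it are assumptions rather than documented facts: the host `https://example.com` is a placeholder (the page gives only the path), the helper names `multimodal_url` and `fetch_catalog` are hypothetical, the `|` in the filter is read as "pick one of these values", and the response is assumed to be JSON.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode, urljoin

# Placeholder host (assumption): the page only documents the path /api/multimodal.
BASE_URL = "https://example.com"

# Modality values listed on the page.
MODALITIES = ("image", "video", "tts", "stt")


def multimodal_url(modality: Optional[str] = None, base_url: str = BASE_URL) -> str:
    """Build the catalog URL, optionally filtered to a single modality."""
    url = urljoin(base_url, "/api/multimodal")
    if modality is None:
        return url  # unfiltered: the full catalog
    if modality not in MODALITIES:
        raise ValueError(f"modality must be one of {MODALITIES}, got {modality!r}")
    return f"{url}?{urlencode({'modality': modality})}"


def fetch_catalog(modality: Optional[str] = None, base_url: str = BASE_URL):
    """Fetch the catalog without credentials (the page says no auth is needed).

    Assumes the endpoint returns a JSON body; this is not confirmed by the page.
    """
    with urllib.request.urlopen(multimodal_url(modality, base_url), timeout=10) as resp:
        return json.load(resp)
```

With a real host substituted for the placeholder, `fetch_catalog("tts")` would request only the text-to-speech entries. Because responses are cached for 10 minutes, polling more often than that is unlikely to return fresher data.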