Training Datasets
Pretraining corpora, instruction-tuning datasets, DPO preference data, and multimodal datasets that the open-source community uses to train and fine-tune AI models. Each entry lists size, license, languages, and content type.
For agents: the same data is available at /api/training-datasets. Filter with ?stage=pretraining|instruction-tuning|dpo|rlhf|multimodal. Free, no auth required, responses cached for 10 minutes.
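
As a rough illustration of the endpoint described above, here is a minimal Python sketch that builds the request URL and fetches one stage. Several things in it are assumptions, not facts from this page: the host (`https://example.com` is a placeholder for the site's real origin), the helper names `build_url` and `fetch_datasets`, the reading of the `|` notation as "pick one value", and the expectation that the response body is JSON.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Stage values as listed in the filter description.
STAGES = {"pretraining", "instruction-tuning", "dpo", "rlhf", "multimodal"}

# Placeholder origin (assumption): the page gives only the path,
# so replace this with the site's actual host.
BASE_URL = "https://example.com"


def build_url(stage=None, base_url=BASE_URL):
    """Return the endpoint URL, optionally filtered to a single stage."""
    if stage is not None and stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    url = urljoin(base_url, "/api/training-datasets")
    if stage is not None:
        url += "?" + urlencode({"stage": stage})
    return url


def fetch_datasets(stage=None, base_url=BASE_URL):
    """Fetch and decode the response, assuming the body is JSON."""
    with urlopen(build_url(stage, base_url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_datasets("dpo") once BASE_URL is real.
    print(build_url("dpo"))
```

Because responses are cached for 10 minutes, polling more often than that is unlikely to return anything new.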