20/20 Vision Language Models
A Prescription for Better VLMs through Data Curation Alone
DatologyAI · Joshi et al.
Hold the architecture, recipe, and compute fixed, and vary only the pretraining data. Our pipeline combines multimodal deduplication, quality filtering, mixture design, and both task-agnostic and task-specific synthetic data, all with rigorous multimodal decontamination. It produces much better vision-language models, reproducibly.