MoDE: CLIP Data Experts via Clustering

Meta presents MoDE: CLIP Data Experts via Clustering.

The success of contrastive language-image pretraining (CLIP) relies on supervision from image-caption pairs, which tend to be noisy in web-crawled data.

