Google presents DiPaCo v/@Ar_Douillard.
Distributed Path Composition.
An experimental mixture of experts that can be trained across the world, with no engineering limit on its size, while remaining lightweight and fast at test time.
Everything…