Mar 30, 2024

Invalid SMILES are beneficial rather than detrimental to chemical language models

Posted in category: chemistry

Generative models for chemical structures are often trained to produce output in the widely used SMILES notation. Michael Skinnider shows that training models to avoid generating invalid SMILES strings impairs how well they learn other chemical properties, and that allowing models to generate invalid strings, which are easily removed post hoc, yields better-performing models.
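To make the "removed post hoc" step concrete, here is a minimal sketch of filtering sampled strings after generation. It is not the method used in the study: the function names `looks_syntactically_valid` and `filter_valid` are hypothetical, and the validity check is a deliberately crude stand-in (balanced parentheses, closed brackets, paired ring-closure digits) rather than a real SMILES parser.

```python
from typing import Callable, Iterable, List


def looks_syntactically_valid(smiles: str) -> bool:
    """Crude stand-in for a real SMILES parser (an assumption for illustration).

    Checks only that the string is non-empty, that parentheses and square
    brackets are closed, and that ring-closure digits outside brackets occur
    in pairs. It does not check valence, aromaticity, or atom symbols.
    """
    if not smiles:
        return False
    depth = 0              # current nesting depth of branch parentheses
    open_rings = set()     # ring-closure digits seen an odd number of times
    in_bracket = False     # True while inside a [...] atom expression
    for ch in smiles:
        if in_bracket:
            if ch == "]":
                in_bracket = False
            continue       # digits inside brackets are counts or charges
        if ch == "[":
            in_bracket = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing a branch that was never opened
                return False
        elif ch.isdigit():
            open_rings ^= {ch}  # first sighting opens a ring, second closes it
    return depth == 0 and not open_rings and not in_bracket


def filter_valid(
    samples: Iterable[str],
    is_valid: Callable[[str], bool] = looks_syntactically_valid,
) -> List[str]:
    """Keep only the sampled strings that pass the validity check."""
    return [s for s in samples if is_valid(s)]


if __name__ == "__main__":
    # Hypothetical model output: three well-formed strings, four broken ones.
    sampled = ["CCO", "c1ccccc1", "C1CCCC", "CC(C", "CC)C", "[NH4+]", ""]
    print(filter_valid(sampled))
```

In practice the check would be delegated to a cheminformatics toolkit. With RDKit, for example, `Chem.MolFromSmiles` returns `None` for a string it cannot parse, so a one-line wrapper around it could be passed as `is_valid` in place of the toy check.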
