Feb 11, 2023

Scientists Use AI to Create Music through Proteins

Posted in categories: biotech/medical, media & arts, robotics/AI

A language model has been used to synthesize human proteins for the first time.

AI models have been flexing their muscles lately. ChatGPT has recently become the poster child for platforms that comprehend human language. Now a team of researchers has used a language model to generate amino acid sequences, showcasing its ability to replicate human biology and evolution.

The language model, named ProGen, can generate protein sequences with a certain degree of control. The researchers achieved this by training the model to learn the composition of proteins. The experiment marks the first time a language model has been used to synthesize human proteins.

A study describing the research was published in the journal *Nature Biotechnology* on Thursday. The project was a joint effort by researchers at the University of California-San Francisco, the University of California-Berkeley, and Salesforce Research, the science arm of a software company based in San Francisco.

## The significance of using a language model

The researchers say they chose a language model for its ability to generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural language sentences on diverse topics.

“In the same way that words are strung together one-by-one to form text sentences, amino acids are strung together one-by-one to make proteins,” Nikhil Naik, the Director of AI Research at Salesforce Research, told *Motherboard*. The team applied “neural language modeling to proteins for generating realistic, yet novel protein sequences.”
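To make the analogy concrete, here is a minimal Python sketch of how a protein can be treated as text: each amino acid becomes one token, and the model's task is to predict the next token from the ones before it. The helper names and the example fragment are made up for illustration, and this is not ProGen's actual preprocessing.

```python
# The 20 standard amino acids, written with their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_TO_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence):
    """Map a protein sequence to a list of integer token ids."""
    return [TOKEN_TO_ID[aa] for aa in sequence]


def next_token_pairs(sequence):
    """Yield (prefix, next_residue) pairs.

    Predicting the next residue from the prefix is the training
    signal of an autoregressive language model.
    """
    for i in range(1, len(sequence)):
        yield sequence[:i], sequence[i]


example = "MKTAYIAK"  # hypothetical fragment, for illustration only
ids = encode(example)
pairs = list(next_token_pairs(example))

print(ids)
for prefix, target in pairs:
    print(f"{prefix!r} -> {target!r}")
```

Each printed pair is one "fill in the next letter" exercise, the same setup a text model uses with words.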

The model was trained on 280 million protein sequences from over 19,000 families, a dataset that was “augmented with control tags specifying protein properties.”

According to *Motherboard*, the use of conditional language models by the team allows for significantly more control over what types of sequences are generated, making them more useful for designing proteins with specific properties.
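The idea of conditioning on a control tag can be sketched with a toy model. In the Python example below, a simple table of residue-to-residue counts stands in for the neural network, and the tags `family_A` and `family_B`, the training sequences, and the function names are all invented for illustration. It is not the team's method, only a small demonstration of how a tag can steer what gets generated.

```python
import random
from collections import defaultdict

# Hypothetical toy data: (control tag, sequence) pairs.
TOY_DATA = [
    ("family_A", "AGAGGA"),
    ("family_A", "GAAGAG"),
    ("family_B", "KEKKEK"),
    ("family_B", "EKEEKK"),
]


def train(tagged_sequences):
    """Count residue transitions separately for each control tag."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for tag, seq in tagged_sequences:
        padded = "^" + seq + "$"  # "^" marks the start, "$" the end
        for prev, nxt in zip(padded, padded[1:]):
            counts[tag][prev][nxt] += 1
    return counts


def generate(counts, tag, max_len=30, seed=0):
    """Sample a sequence one residue at a time, conditioned on the tag."""
    if tag not in counts:
        raise KeyError(f"unknown control tag: {tag}")
    rng = random.Random(seed)
    out, prev = [], "^"
    while len(out) < max_len:
        options = counts[tag][prev]
        residues = list(options)
        weights = [options[r] for r in residues]
        nxt = rng.choices(residues, weights=weights)[0]
        if nxt == "$":  # the model chose to end the sequence
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


model = train(TOY_DATA)
print("family_A:", generate(model, "family_A"))
print("family_B:", generate(model, "family_B"))
```

Changing only the tag changes which kind of sequence comes out, which is the sense in which a conditional model gives more control than an unconditional one.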