We trained ESM3, a bidirectional masked generative model, on protein sequence, structure, and function. ESM3 operates over tokenized representations of multiple modalities and can generate proteins with high fidelity and controllability. ESM3 improves further with feedback, using alignment methods similar to the Reinforcement Learning from Human Feedback (RLHF) applied to LLMs.
We prompted ESM3 with a chain of thought to generate fluorescent proteins. Among the generations that we synthesized, we found a bright fluorescent protein at a far distance (58% identity) from known fluorescent proteins. Similarly distant natural fluorescent proteins are separated by over five hundred million years of evolution.
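To make the "58% identity" figure concrete, the following is a minimal sketch of how percent sequence identity can be computed from two pre-aligned sequences. It is an illustration under stated assumptions, not the procedure used in this work: the function name `percent_identity`, the gap character `-`, and the choice of denominator (all alignment columns where at least one sequence has a residue) are hypothetical, and identity values depend on which alignment method and denominator convention are chosen.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences that are already aligned.

    Assumption (not from the source): the denominator is the number of
    alignment columns in which at least one sequence has a residue.
    Other conventions (e.g. dividing by the shorter sequence length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            # Columns that are gaps in both sequences carry no information.
            continue
        compared += 1
        if a == b:
            # Neither character is a gap here, so this is a residue match.
            matches += 1

    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments (made up for illustration, not real proteins).
    print(percent_identity("ACDE", "ACDF"))
    print(percent_identity("AC-E", "ACDE"))
```

Under this convention, a value of 58% would mean that 58 of every 100 compared alignment columns hold the same residue in both proteins.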
That being said, this is definitely one that's particularly optimistic, particularly low-ego, centered around a curiosity one has with the world.
That I like quite a bit.