Key Points
- Evo is an AI model trained on bacterial genomes to interpret DNA like language.
- It can reconstruct missing portions of genes, recovering the full sequence when given enough of the gene as a prompt.
- The system successfully restores deleted genes within functional clusters.
- Evo generates novel protein sequences, demonstrated with toxin variants lacking known antitoxins.
- The approach respects evolutionary limits, altering only regions that tolerate variation.
- Potential applications include protein engineering, synthetic biology, and drug discovery.
AI Model Trained on Bacterial Genomes
Scientists developed an artificial intelligence system called Evo, which was trained on the full complement of bacterial genomes. By learning the relationships between nucleotide patterns and larger genomic contexts, Evo can interpret DNA fragments much as a language model interprets text.
Filling Gaps in Known Genes
When prompted with a partial sequence of a known gene, Evo can accurately predict the missing portion. For example, when given 30 percent of a gene’s sequence, Evo generated 85 percent of the remainder; when given 80 percent, it reconstructed the rest of the gene completely.
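The percentages above can be made concrete with a small scoring sketch. The Python below is illustrative only: the helper names `split_prompt` and `recovered_fraction` are hypothetical, and position-by-position identity is an assumed metric, since the summary does not say how recovery was measured. The call to the model itself is left out.

```python
def split_prompt(gene: str, prompt_percent: int) -> tuple[str, str]:
    """Split a gene into the prompt shown to the model and the held-out rest.

    prompt_percent is an integer from 0 to 100 (assumed convention).
    """
    cut = len(gene) * prompt_percent // 100
    return gene[:cut], gene[cut:]


def recovered_fraction(truth: str, prediction: str) -> float:
    """Fraction of held-out positions that the prediction reproduces exactly.

    Assumed metric: simple position-wise identity with no alignment.
    Positions missing from a short prediction count as not recovered.
    """
    if not truth:
        return 1.0
    matches = sum(a == b for a, b in zip(truth, prediction))
    return matches / len(truth)


# Toy example with a made-up 20-nucleotide sequence.
gene = "ATGGCCAAGTTTGACCTGAA"
prompt, held_out = split_prompt(gene, 30)
# A hypothetical model output would be scored with:
# recovered_fraction(held_out, model_output)
```

A real evaluation would probably align the predicted and true sequences first, because a single inserted or deleted base would otherwise shift every later position out of register.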
Restoring Deleted Genes
In experiments where a single gene was removed from a functional cluster, Evo correctly identified and restored the missing gene, demonstrating its understanding of gene organization and functional relationships.
Generating Novel Protein Sequences
Beyond completing existing genes, Evo was challenged to produce new protein sequences. Researchers used bacterial toxin genes, which typically evolve rapidly and are paired with antitoxin genes. By prompting Evo with a toxin that was only mildly related to known toxins and filtering out responses resembling known antitoxins, the system generated a novel toxin sequence with no obvious antitoxin counterpart.
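The filtering step described above, discarding outputs that resemble known sequences, can be sketched as follows. This is a minimal illustration under stated assumptions, not the researchers' pipeline: `similarity` and `filter_novel` are hypothetical names, the 0.8 cutoff is arbitrary, and the standard library's `difflib.SequenceMatcher` stands in for the sequence-alignment tools a real analysis would more likely use.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity score between 0.0 and 1.0.

    Stand-in metric (assumption): difflib's ratio, not a biological alignment score.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_novel(candidates: list[str], known: list[str],
                 max_similarity: float = 0.8) -> list[str]:
    """Keep only candidates scoring below the cutoff against every known sequence."""
    return [
        c for c in candidates
        if all(similarity(c, k) < max_similarity for k in known)
    ]


# Toy protein-like strings, invented for illustration.
known_sequences = ["MKTLLVAG"]
generated = ["MKTLLVAG", "QQWPRRNY"]
novel = filter_novel(generated, known_sequences)
```

Here the first generated string is identical to a known sequence and is dropped, while the second shares no characters with it and is kept.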
Implications for Biotechnology
These results indicate that Evo can not only replicate known biological information but also explore new sequence space while respecting evolutionary constraints. This opens avenues for designing proteins with desired functions, accelerating synthetic biology, and expanding the toolkit for protein engineering.
Source: arstechnica.com