Changes in DNA sequence (mutations) cause thousands of different genetic diseases and underlie evolution. However, after 70 years of molecular biology, our ability to predict how sequence changes alter the properties of the protein molecular machines encoded by DNA remains limited. This gap fundamentally constrains clinical genetics, for example the identification of disease-causing mutations, and makes engineering biology difficult and slow.
To address this shortcoming, we are developing methods to quantify how millions of sequence changes across many different proteins alter their molecular properties. Applied at scale, these approaches will allow us to generate reference atlases of mutational effects for clinical genetics. More fundamentally, they will produce datasets large and diverse enough for the central ‘encoding’ problems of molecular biology to be tackled directly with computational approaches, including artificial intelligence. Our long-term objective is to understand, predict and engineer the sequence-to-activity relationships that underlie essentially all of biology.