CLEMSON, South Carolina — A team of scientists from Clemson University and Cornell University is developing the first set of computational techniques that can predict how DNA mutations affect proteins and protein-to-protein interactions, which are vital in determining how the body’s tissues and organs function. Their study also holds the potential to accelerate the synthesis of new drug treatments for a variety of genetic disorders.

“Humans have about 20,000 different proteins in each cell, and each protein is involved in about five different interactions,” said Emil Alexov of Clemson’s department of physics and astronomy. “Imagine how that breaks down – each cell in our body then directs anywhere from 100,000 up to a million different interactions. Because DNA is different from person to person, these interactions are slightly different in each individual. Some of these differences are OK – that is the reason why we aren’t identical to each other. But some of these differences can cause diseases.”


Professor Emil Alexov (center) works with former graduate students Lin Wang (left) and Zhe Zhang (right).
Image Credit: Clemson University Relations

Alexov is working with collaborators from Cornell under a $2.3 million grant from the National Institutes of Health.

Disease-causing mutations in DNA can code for the wrong amino acid – or sometimes delete an amino acid entirely – resulting in a misfolded protein that wreaks havoc on normal functioning.

For example, cystic fibrosis – a disease characterized by the chronic buildup of mucus in the lungs – occurs when phenylalanine, an amino acid, is deleted from the protein encoded by the CFTR gene. This gene’s protein product, in its normal state, acts as a chloride channel to control water secretion and absorption. But without the protein, the channel ceases to exist, causing the affected individual to produce mucus that is unusually thick, sticky and difficult to clear.

By the end of the four-year study, the team aims to computationally predict how a mutation – like the one implicated in cystic fibrosis – will affect the corresponding protein interactions without the need for costly, time-consuming experiments.

“Human DNA contains about three billion base pairs. You cannot conduct three billion experiments, let alone billions and billions, once you consider all of the potential mutations,” Alexov said. “It’s simply not feasible; it’s got to be computationally done.”
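As a rough back-of-the-envelope illustration of the scale Alexov describes, the sketch below counts only single-base substitutions, assuming each of the roughly three billion positions can be swapped for any of the three other bases. The constant names are illustrative and not from the study. Insertions, deletions and combinations of mutations are ignored, so the true number of potential mutations is far larger.

```python
# Back-of-the-envelope count of single-base substitutions in human DNA.
# Assumption: every position can change to any of the 3 other bases.
GENOME_BASE_PAIRS = 3_000_000_000   # "about three billion base pairs"
ALTERNATIVE_BASES = 3               # e.g. A -> C, G or T

single_base_substitutions = GENOME_BASE_PAIRS * ALTERNATIVE_BASES

print(f"{single_base_substitutions:,} possible single-base substitutions")
```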

To start, Alexov’s colleagues – professors Haiyuan Yu and Andrew G. Clark at Cornell – will home in on a total of 6,000 mutations: 4,000 that are common in the general population and 2,000 that will be nominated by researchers in the human genetics community.

After preparing the mutated samples, Yu and Clark will use high-throughput sequencing to generate millions upon millions of reads (short fragments of DNA sequence) that can be pieced together like a puzzle to render the original DNA sequence of the mutation. Through a handful of laboratory procedures, the pair will then test the mutant proteins to uncover interactions of interest and to discover which mutations result in enhanced or weakened interactions.
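The "puzzle" idea can be illustrated with a toy example. The sketch below is not the Cornell team's pipeline. It is a minimal greedy merge, assuming error-free reads that overlap by at least three bases, and the function names `overlap` and `greedy_assemble` are hypothetical. Real sequencing software has to handle read errors, repeats and vastly larger data sets.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps; stop with several pieces
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three made-up overlapping reads from a made-up 15-base sequence.
contigs = greedy_assemble(["ATGGCGTA", "GCGTACGT", "TACGTTAGC"])
print(contigs)
```

Each pass joins the two fragments that share the longest matching end, much as a puzzle solver joins the two pieces that fit best.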

Illustration of a particular protein-protein interaction. The figure shows the dynein binding domain interacting with its partner, the alpha/beta tubulin dimer. The electrostatic field lines indicate strong interactions between the partners.

Illustration of a particular protein-protein interaction between dynein and tubulin proteins.
Image Credit: Emil Alexov

Alexov will direct the second half of the study by developing computational tools – using data points sent to him by Yu and Clark – to estimate the behavior of protein interactions.

“If you are lucky, there will be an experimental structure of the particular protein in question, and if you are luckier, the experimentally determined structure of this protein will interact with some other protein,” Alexov said. “That is the best-case scenario. But in the vast majority of cases, we aren’t that lucky, so we have to build our own structures.”

After building a 3-D model of both the normal structure of a particular protein and its mutant version, Alexov and his students can run calculations for binding affinity and other biochemical measures.

“Binding affinity, or binding free energy, is a quantity that tells us to what extent this protein-protein complex is tightly bound or not, with the understanding that any deviation from the norm is bad,” Alexov said. “Let’s say the normal protein interacts with its partner at a particular strength. If the mutation makes this binding tighter, it’s bad. If the mutation makes this binding weaker, it’s equally bad. So we need to have some measure to say if the mutation makes this interaction highly perturbed or not.”
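The comparison Alexov describes is often written as a change in binding free energy between the mutant and the normal (wild-type) complex. The notation below is a common convention, used here as an assumption rather than taken from the team's work:

```latex
\Delta\Delta G_{\text{bind}} = \Delta G_{\text{bind}}^{\text{mutant}} - \Delta G_{\text{bind}}^{\text{normal}},
\qquad
\text{perturbation} = \left| \Delta\Delta G_{\text{bind}} \right|
```

Taking the absolute value reflects the point that tighter and weaker binding are treated as equally harmful: only the size of the deviation from the normal value matters, not its direction.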

But the question is, at what point is a deviation from the norm no longer treatable?

“Typically, the thing that we argue in our papers is we don’t care about the absolute change, we care about percentage. For example, if the normal binding energy is five, a perturbation value of one is significant. But if the binding energy was 20, one is not so significant,” Alexov said. “In many cases, we don’t know what the typical binding energy is, and we have to make a guess. So we make a prediction based on other details – like structural changes and hydrogen bonding – to make us more confident that the thing we’re predicting is going to happen in real life.”
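The arithmetic behind Alexov's example can be sketched in a few lines. The function name `relative_perturbation` is hypothetical, and the quote gives no units, so the values are treated as plain numbers:

```python
def relative_perturbation(normal_energy: float, change: float) -> float:
    """Size of a binding-energy change as a fraction of the normal value."""
    if normal_energy == 0:
        raise ValueError("normal binding energy must be non-zero")
    return abs(change) / abs(normal_energy)


# The two cases from the quote: the same change of 1 against 5 and 20.
for normal in (5.0, 20.0):
    fraction = relative_perturbation(normal, 1.0)
    print(f"normal binding energy {normal:g}: a change of 1 is {fraction:.0%}")
```

The same absolute change works out to 20 percent of a binding energy of five but only 5 percent of a binding energy of 20, which is why the percentage is the more telling figure.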

Alexov’s ultimate goal is to find a drug that can bind to a perturbed protein to restore its original shape and binding affinity and, therefore, its proper function – unlocking the potential to treat a multitude of genetic disorders.

“Often the best treatment can be highly facilitated if you know what the primary origin of the disease is,” Alexov said. “If we can understand how these DNA differences affect interactions in the proteins of our body, this will pave the way to develop personalized treatments for better patient care.”


Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM125639. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.