Physics And The Messy World Of Proteins


Photography: Michael Sexton


What’s physics got to do with the messy world of proteins?
Everything, it seems, as Mark Buchanan finds out

HAVE A LOOK round Ken Dill’s office, and you can probably guess his line of work. On the blackboard, drawings of knotted and tangled strings fight it out for space. Books lie open on the desk, their pages cluttered with diagrams of strings. And on the computer screen, stringy chains of yellow, luminescent beads wriggle and writhe. On the evidence, you might guess that Dill is a string theorist.

In a sense, you’d be right. But Dill is no quantum physicist seeking a Theory of Everything. He’s not even doing what most people would think of as physics. Try biology instead. At the University of California at San Francisco, Dill is trying to create a mathematical theory for the building blocks of the living world–those strange and stringy molecules called proteins.

Biologists know a vast amount about proteins: as long chains of amino acids folded into distinctive blobs, they do everything from catalysing chemical reactions to destroying microbes in the bloodstream. But smack bang in the middle of this understanding is a yawning gap–how proteins fold up. The folded shape of a protein is vital to its action–an extra kink or a missing coil can spoil everything. Even when researchers know a protein’s precise sequence of amino acids, they are usually powerless to predict its final shape.

Go one step deeper and the puzzle becomes more disconcerting. After half a century, scientists still haven’t worked out the basic mechanisms by which proteins manage to fold at all. Worse yet, simple arithmetic says that protein folding ought to be impossible. But this is where Dill’s mathematical “string theory” is coming to the rescue. With an approach based in physics, he and a handful of others are finally teasing out a solution to this “folding paradox”. Armed with the clues coming out of this work, biologists may not have to puzzle over their protein problems for much longer.

The conceptual quandary is this: a protein manages to curl up into the same shape every time. That might not seem problematic, but while a protein usually has only one correct fold, it can be contorted into an enormous number of others. When two amino acids bond, says Peter Wolynes, a chemist at the University of Illinois at Urbana-Champaign, they can adopt roughly 10 different orientations in relation to each other. “So a protein of 60 amino acids can be in any of about 10^60 states.” This means that even if a protein could try out 100 billion folds a second, it should take longer than the age of the Universe to stumble over its correct fold. “Like a drunk playing golf,” says Wolynes, “it should take practically forever.” But it doesn’t.
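The arithmetic behind the paradox is easy to check. The sketch below uses the figures quoted above (10 orientations, 60 amino acids, 100 billion folds a second); the age of the Universe and the length of a year are rounded values assumed here, not taken from the article.

```python
# Back-of-envelope check of the folding paradox, using the figures quoted above.
ORIENTATIONS_PER_BOND = 10        # rough figure quoted by Wolynes
CHAIN_LENGTH = 60                 # amino acids in the example protein
FOLDS_PER_SECOND = 100e9          # 100 billion trial folds per second
AGE_OF_UNIVERSE_YEARS = 1.4e10    # assumption: roughly 14 billion years
SECONDS_PER_YEAR = 3.15e7         # assumption: rounded value


def exhaustive_search_years(n_states, rate_per_second):
    """Years needed to try every state once at the given rate."""
    return n_states / rate_per_second / SECONDS_PER_YEAR


n_states = ORIENTATIONS_PER_BOND ** CHAIN_LENGTH   # about 10^60 states
years = exhaustive_search_years(n_states, FOLDS_PER_SECOND)
ratio = years / AGE_OF_UNIVERSE_YEARS

print(f"search time: {years:.1e} years")
print(f"that is about {ratio:.0e} times the assumed age of the Universe")
```

On these assumptions a blind search would take something like 10^41 years, which is why random trial and error cannot be how proteins fold.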

To try to understand this folding paradox, Dill, Wolynes and others have taken an unusual approach–by turning their backs on proteins altogether. A protein is just one example of a polymer, a chain made of many units linked together. So they are looking into the folding of very simple polymers–computer models of chains of beads that can move and bend as the beads attract or repel one another. Dill pioneered this approach ten years ago with his then postdoc Kit Lau. “Coming from the tradition of polymer theory and physics,” says Dill, “toy models seemed a natural way to go–we could test ideas that couldn’t be tested in any other way.”

Real proteins contain up to 20 different types of amino acids. But in a crude sense, says Dill, all these amino acids fall into just two classes: hydrophobic and hydrophilic. Some shun contact with water, while others eagerly seek it. So Dill and Lau settled on strings with only two kinds of beads.

These models may insult the complexity of real proteins, but their behaviour is so complicated that Dill and his colleague Hue Sun Chan, also at UCSF, are still trying to understand them today. In any simulation run, the computer mimics the effects of tossing a string into a bath of water. When that happens, the string’s hydrophobic beads quickly clump together into a group as they try to stay dry. By contrast, its hydrophilic beads try to stay in contact with the water, and end up surrounding the hydrophobic blob (see Diagram). The result is a folded string.

So far so good. But folding into a vague blob isn’t enough. To mimic protein behaviour, a string needs to be a “good folder”, something that folds quickly and reliably into a unique final shape. In the early 1990s, Dill and his colleagues found that random sequences of his two beads didn’t work that way. “We found that only a few per cent led to good folders,” says Dill. All quickly collapsed into globules, but most were very un-protein-like, and even a small change in the unfolded starting shape often led to a completely different folded state.
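A two-bead string of this kind is simple enough to try out in a few lines. The sketch below is a minimal illustration rather than Dill's own code. It assumes the chain lives on a square grid, that every pair of touching hydrophobic ("H") beads lowers the energy by one unit, and that a "good folder" is a sequence with exactly one lowest-energy shape. It ignores folding speed altogether, and all function names are made up for the example.

```python
import random

MOVES = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def conformations(n):
    """Yield every self-avoiding shape of an n-bead chain (n >= 2) on a
    square grid, with rotations and mirror images left out."""

    def extend(path, turned):
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            if dy == -1 and not turned:
                continue  # drop mirror images: the first vertical step goes up
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # the chain may not cross itself
            path.append(nxt)
            yield from extend(path, turned or dy != 0)
            path.pop()

    # Fixing the first bond along the x axis removes the four rotations.
    yield from extend([(0, 0), (1, 0)], False)


def energy(seq, path):
    """-1 for each pair of H beads that touch without being chain neighbours."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == "H" and seq[j] == "H":
                (x1, y1), (x2, y2) = path[i], path[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    e -= 1
    return e


def ground_state(seq):
    """Return (lowest energy, number of shapes that reach it)."""
    best, count = 0, 0
    for path in conformations(len(seq)):
        e = energy(seq, path)
        if e < best:
            best, count = e, 1
        elif e == best:
            count += 1
    return best, count


def is_good_folder(seq):
    """Assumed criterion: exactly one shape has the lowest energy."""
    lowest, count = ground_state(seq)
    return lowest < 0 and count == 1


def fraction_good(length, samples, seed=0):
    """Share of random H/P sequences that pass the criterion above."""
    rng = random.Random(seed)
    seqs = ["".join(rng.choice("HP") for _ in range(length)) for _ in range(samples)]
    return sum(is_good_folder(s) for s in seqs) / samples
```

Calling `fraction_good(10, 200)`, for instance, samples 200 random ten-bead strings. The fraction it reports depends on the chain length and on the criterion assumed here, so it should not be read as a reproduction of Dill's figure.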

An artefact of too simple a model? It seems not. In 1994, chemists Andrej Sali, Eugene Shakhnovich and Martin Karplus of Harvard University studied the folding of virtual strings made of many types of beads that interacted in complicated ways. Looking at the folding of strings with random sequences of these beads, they found that only about 15 per cent folded reliably into unique states. The lesson? As physicist Vijay Pande of the University of California at Berkeley puts it: “Random sequences just aren’t sufficiently protein-like.”

So the sequences of good folders, proteins or otherwise, must have some special property that lets them arrive at their unique shape. What is it? And how does it ensure reliable folding? Some scientists are running larger and more detailed simulations hoping to find out, but others, such as Wolynes, have been searching the remote corners of physics for theoretical notions that might help. Rather unexpectedly, one of the most fruitful ideas comes from the physics of solids, in the form of something known as a “spin glass”.

Take a chunk of copper peppered inside with a few magnesium atoms, and you have a spin glass. Each magnesium atom has spin, a quantum mechanical property that makes it act like a tiny magnet. These magnets interact with one another and, depending on the distance between them, some try to point in the same direction, while others try to point in opposite directions (see Diagram).

The term “spin glass” comes from an analogy with ordinary glass, which is made by rapidly cooling a molten material. As it cools, the molecules try to arrange themselves in an orderly way, as in a crystal, because that is the state with lowest energy. But if the cooling is fast, they can’t do it. Instead, like cars in a crowded car park, they get jammed up and stuck in a haphazard arrangement.

This is also what happens with a spin glass. The forces that make a magnesium spin try to align with its closest neighbours and “disalign” with others lead to what physicists call “frustration”: the system has a difficult time finding a low-energy arrangement that satisfies all the pairs at once. “There are conflicting interactions”, says Wolynes. So, like the molecules in window glass, the spins of magnesium ions tend to get locked into arrangements that satisfy some pairs but leave others unsatisfied.
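Frustration shows up even in the smallest possible example. The sketch below is an illustration with assumed coupling values, not a model of a real alloy. It puts three spins on a triangle and lists every arrangement. When each pair wants to agree, only two arrangements reach the lowest energy, and they are mirror images of each other. When each pair wants to disagree, no arrangement can satisfy all three pairs, and six arrangements tie for the lowest energy.

```python
from itertools import product


def energy(spins, couplings):
    """Sum of -J * s_i * s_j over coupled pairs; J > 0 favours aligned spins."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())


def ground_states(couplings, n_spins):
    """Return the lowest energy and every spin arrangement that reaches it."""
    states = list(product((+1, -1), repeat=n_spins))
    lowest = min(energy(s, couplings) for s in states)
    return lowest, [s for s in states if energy(s, couplings) == lowest]


# Three spins on a triangle. The couplings are assumed, purely for illustration.
aligned = {(0, 1): +1, (1, 2): +1, (0, 2): +1}      # every pair wants to agree
frustrated = {(0, 1): -1, (1, 2): -1, (0, 2): -1}   # every pair wants to disagree

for name, couplings in (("aligned", aligned), ("frustrated", frustrated)):
    lowest, best = ground_states(couplings, 3)
    print(name, "lowest energy:", lowest, "arrangements:", len(best))
```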

As a result, a spin glass does not have one optimal arrangement of spins but many, all of which have about the same energy. There simply is no unique state of especially low energy. And in this curious property Wolynes sees a connection with folded polymers. “Most random sequences,” he points out, “would have a good deal of frustration too.” In a folded chain, for example, some hydrophobic amino acids might be forced by their neighbours to live at the globule’s surface, exposed to water, while some hydrophilic beads might be buried inside the globule. Sure enough, the toy computer models bear this out: like spin glasses, most random polymers don’t fold into one unique state but into one of many, all with about the same energy.

So if most random polymers are like spin glasses, what about the few that fold well? And what about real proteins? As Wolynes sees it, this is the purpose of the special sequences that proteins have. “Proteins”, as he puts it, “have sequences that have been artfully selected by evolution so that the frustration in their interactions is much less than you would expect.”

Rugged landscape

To get a better picture of this idea of “minimal frustration”, imagine an abstract undulating landscape, where at any point, the height corresponds to the energy of one possible conformation of the amino acid chain. For most polymers with random sequences of beads–as for a spin glass–this “energy landscape” would be extremely rugged and have no states of exceptionally low energy. Upon folding, the polymer would take the shape corresponding to one of the many local, shallow valleys, and it would fall into a different one every time (see Diagram).

For the rare random polymers that do fold well, however, Dill’s simulations and those of other groups show that the energy landscape is different. While the landscape of a good folder may still be rugged, one state has markedly lower energy than all others. In this folded state the system can escape frustration, and satisfy the needs of a fair fraction of the beads all at once.

Based on this insight, Wolynes has tried to apply the maths of spin glass theory to proteins. The basic model for spin glasses, the “random energy model”, was invented in 1980 by Bernard Derrida of the École Normale Supérieure in Paris. Derrida assumed that the disorder in a spin glass–the random distances between magnesium ions–would create an energy landscape that was truly random and extremely rugged. For a protein, Wolynes points out that two slightly different folds should have fairly similar energies, so he’s modified the random energy model to try to take that into account.

Wolynes begins with a rugged energy landscape that has a single state with an especially low energy. This is the correctly folded state. Then he supposes that states only slightly different from the correct fold also have lower than average energies. The result is a landscape with a deep, rugged valley. This funnel-like structure, Wolynes proposes, is the answer to the folding paradox. A protein just falls downhill into its folded state.

Dill’s simulations back up the idea, at least for toy proteins. In his simple two-bead models, good folding polymers have funnels in their energy landscapes, and poor folders don’t. At Berkeley, Pande and his colleague Daniel Rokhsar have used model strings to prove the validity of this idea in another way–by showing how to design polymer sequences from scratch so as to obtain good folders. Researchers usually take polymers with fixed sequences and watch them fold. Rokhsar and Pande turned the game on its head by fixing a random polymer in a particular fold, and then letting the sequence of amino acids evolve so as to make the energy of the fold as low as possible. In every case, the result is a sequence that produces a good-folding polymer with a funnel-like energy landscape.
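The design trick can be sketched in miniature. The code below is not Pande and Rokhsar's method, only a stand-in for it. It assumes a two-bead string, a made-up nine-bead target fold on a square grid, and a greedy search that swaps pairs of beads whenever the swap lowers the energy of that fold. Swapping, rather than changing beads outright, keeps the numbers of H and P beads fixed, so the search cannot simply turn every bead hydrophobic.

```python
from itertools import combinations

# Hypothetical target fold: a compact 3x3 "snake" of nine beads on a square grid.
TARGET_FOLD = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 2), (1, 2), (2, 2)]


def contacts(fold):
    """Pairs of beads that touch on the grid without being chain neighbours."""
    return [(i, j) for i, j in combinations(range(len(fold)), 2)
            if j - i > 1
            and abs(fold[i][0] - fold[j][0]) + abs(fold[i][1] - fold[j][1]) == 1]


def energy(seq, fold):
    """HP-style energy: -1 for every hydrophobic-hydrophobic contact."""
    return -sum(1 for i, j in contacts(fold) if seq[i] == "H" and seq[j] == "H")


def design(seq, fold):
    """Swap pairs of unlike beads for as long as a swap lowers the energy of `fold`."""
    seq = list(seq)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(seq)), 2):
            if seq[i] == seq[j]:
                continue
            before = energy(seq, fold)
            seq[i], seq[j] = seq[j], seq[i]
            if energy(seq, fold) < before:
                improved = True
            else:
                seq[i], seq[j] = seq[j], seq[i]  # undo a swap that did not help
    return "".join(seq)


start = "HPHPPPHPH"
designed = design(start, TARGET_FOLD)
print(start, energy(start, TARGET_FOLD))
print(designed, energy(designed, TARGET_FOLD))
```

A greedy search like this can stop short of the best possible sequence, and it does not check whether the designed string would actually fold into the target. That would take a folding simulation of the kind the researchers ran.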

Something very similar seems to be going on in real proteins. In 1995, Wolynes and several colleagues were able to estimate the rough shape and ruggedness of the energy landscapes of a few real proteins–albeit small ones, with fairly short amino acid chains–and showed that they too had funnel-like features.

So in the light of the physics models, protein folding turns out to be not really paradoxical at all. “It was a misconception”, says Dill. Put folded proteins into a hot liquid, and their vigorous motion will make them unfold into wriggling strings. But most will return to their characteristic shape as the liquid cools below some “folding” temperature. At this point, the random kicks the protein receives from the environment prevent it being trapped in any of the shallow valleys in its landscape. So the protein explores different states as it cools, but when it falls into the deep well corresponding to its folded state, it is so deep that it stays there.

The researchers are beginning to draw parallels between protein folding and other natural processes too, such as freezing. Water freezes by nucleation–when, below the freezing temperature, a few molecules fall randomly into a crystalline structure, and seed the growth of the rest of the solid. The same is true of proteins, says Rokhsar. “First, some very small droplets of the folded structure form. These then form a scaffolding for the rest of the protein structure.”

The next goal, says Dill, is to bridge the gap between real proteins and computer models, by making the models more realistic and filling in details of the general picture of how the computer strings fold. Nowhere are those details more crucial than in efforts to solve that other protein-folding problem–how to predict protein shape from amino acid sequence. Biologists already know the amino acid sequences for most of the body’s 100 000 proteins, but they know the structures of only a few thousand, because the only way to determine those structures is through time-consuming methods such as X-ray crystallography. This is frustrating to biotechnology companies eager to find novel proteins for medicines or superstrong fibres, or to design new proteins.

Far better would be to predict the structures by mere mathematics. At the moment, the most effective approach is through comparisons. Suppose you’d like to predict the structure of a protein and already happen to know the structures of others with similar sequences. Then you’re in luck–you have some clues as to what the protein’s structure is likely to be. Using this approach, says Wolynes, it’s possible to get roughly accurate predictions for about a third of the unknown sequences.

But for the other two-thirds, there aren’t any clues. For these, the general idea is to feed a protein’s sequence into a computer and let it search for a state with very low energy. This approach merely mimics the folding process electronically. Unfortunately, even the fastest supercomputer isn’t able to simulate the motion of every molecule in a protein for more than a few nanoseconds. That’s not long enough to see how they fold, so cruder models have to be used. “That’s where the energy landscapes come in,” says Wolynes.

Downhill run

By understanding the typical features of protein energy landscapes, it becomes easier to make judicious approximations and arrive at computer models that can make roughly accurate predictions in reasonable amounts of time. Dill, for example, is developing computer algorithms that can take a sequence and search for low-energy folds. Searching by just going downhill takes a long time, as the computer tends to get stuck in shallow, local wells, rather than finding the global valley corresponding to the lowest energy fold. Knowing that the landscape will have a funnel-like shape, Dill can build this knowledge into the algorithm, making its search strategy more efficient. It can hop over the local traps and find its way to the correct fold. Other researchers are developing similar ideas. “We’re not there yet,” Wolynes says, “but it has become an engineering problem.”
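The difference between plain downhill searching and a search that can hop over traps is easy to see on a toy landscape. The sketch below is not Dill's algorithm. It assumes a one-dimensional row of 101 states, an energy that tilts towards a "correct fold" at position 50, and a bump on three states out of every four to supply the ruggedness. A search that looks only one step ahead stalls in the first shallow well, while one allowed to look four steps ahead clears the bumps and runs down the funnel.

```python
NATIVE = 50       # hypothetical position of the correct fold
N_STATES = 101    # states are numbered 0..100


def energy(x, slope=0.1, bump=1.0):
    """Toy rugged funnel: an overall tilt towards NATIVE plus regular bumps."""
    rugged = bump if (x - NATIVE) % 4 != 0 else 0.0
    return slope * abs(x - NATIVE) + rugged


def descend(x, reach=1):
    """Move to the lowest-energy state within `reach` steps until none is lower."""
    while True:
        lo = max(0, x - reach)
        hi = min(N_STATES - 1, x + reach)
        best = min(range(lo, hi + 1), key=energy)
        if energy(best) >= energy(x):
            return x  # nothing lower within reach: stop here
        x = best


stuck = descend(11)             # looks one step ahead only
found = descend(11, reach=4)    # may hop over a bump
print("one-step search stops at", stuck, "with energy", energy(stuck))
print("hopping search stops at", found, "with energy", energy(found))
```

The hop length of four is chosen to match the spacing of the bumps in this made-up landscape. On a real protein landscape, choosing how far and in which direction to hop is where knowledge of the funnel shape would have to come in.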

And fortunately, it is less of a problem every day. Researchers are accumulating the structures of so many proteins that “the fraction of new structures that are really unrelated in sequence or fold to anything we have seen before is falling rapidly”, says John Moult, a biophysicist at the Center for Advanced Research in Biotechnology in Rockville, Maryland. In five or ten years, he says, scientists should have a set of archetypal proteins with folded shapes that would more or less cover the space of all possible structures.

By using these as guides to build better computer models, it should in a few years be possible to predict the shape of any new protein by looking to details of the archetypal protein that its sequence most resembles. For biology and biotechnology, this will usher in a new era. After half a century, scientists will at last be able to read the “second half” of the genetic code.

Further Reading:

  • “From Levinthal to pathways to funnels”, by Ken Dill and Hue Sun Chan, Nature Structural Biology, vol 4, p 10 (1997)