
Left to right: Blake Sweeney, Neocles Leontis, Anton Petrov, Craig Zirbel, and James Roll

BGSU researchers develop resource for predicting 3-D structure of RNA molecules

Most living things are a bit like an ongoing construction project. Following the instructions encoded in their DNA, a creature’s cells go about performing their necessary functions, allowing the organism to develop and then maintain the various tissues and organs that keep it alive.

You might think of DNA as the architect, the molecule with the blueprint for life. But as in most construction projects, the architect creates the blueprint and others must do the rest of the work. In our cells, one of the hardest workers is ribonucleic acid, or RNA.

There are many known types of RNA molecules—messenger RNA, ribosomal RNA, and microRNA, for example—but improved techniques have allowed scientists to discover thousands more in recent years. Now, they are working hard to understand the functions of all these new RNAs to gain fresh insights into evolution, develop new pharmaceuticals, and enhance our understanding of many disease processes, including cancer.

More than half a century ago, James Watson and Francis Crick discovered the chemical structure of DNA—the famous double helix. Like DNA, RNA incorporates helices but it also folds to form loops and junctions. These features give specific RNA molecules a three-dimensional shape, and discovering the structure of each new RNA is the key to understanding how each one functions. But determining this 3-D structure has been a difficult and complicated task … until now.

Bowling Green State University faculty members Drs. Neocles Leontis and Craig Zirbel, supported by postdoctoral researcher Anton Petrov, graduate students James Roll (statistics) and Blake Sweeney (biology), and others, are developing an online resource to help researchers around the world gain valuable new insights into the 3-D structure, and function, of these important biological molecules.

“Our current priority is to launch a Web application for other scientists to use to help them predict the 3-D structures of RNAs they are studying,” said Leontis, a professor in the Department of Chemistry. “We want to create a process that will scale and allow scientists to submit the RNA sequences from entire genomes. Right now we are working on a retail level, but we will be moving toward wholesale.”

They have deployed a beta version of the application and are now in the process of redesigning the interface. The origins of this work date back to the late 1990s when Leontis was collaborating with Eric Westhof, director of a research institute for molecular biology at the University of Strasbourg, France.

They realized that RNA molecules, like DNA, consist of just four basic building blocks, called bases, that link together to form long chains. When an RNA molecule folds to form a functional 3-D structure, the bases bond to make base pairs, which stack on each other to make helices and other structural motifs.

“When all the base pairs are of the Watson-Crick type, you get a regular double helix, as in DNA,” Leontis said. “But the helices in RNA are short and interspersed with loops and junctions that contain non-Watson-Crick base pairs. Together, we proposed a simple way to classify the non-Watson-Crick base pairs into geometrical patterns or families and we made predictions, based on this classification, of new combinations that should appear in structures. We also made predictions as to which combinations might substitute for each other during evolution.”

This ability to classify base pairs into families and predict substitute base pairs created a new “vocabulary” for examining these structures. As work in the field continued, new structures were identified that confirmed their predictions. Today, the Leontis-Westhof base pair classification system is widely accepted and used to annotate structures.

The stability of a particular arrangement of stacked base pairs in RNA depends on many factors, but some arrangements are more stable than others. This is important because when cells reproduce, mutations may occur in a cell’s DNA, resulting in base substitutions in the RNA produced from that DNA. Some of these substitutions are stable and some are not, and the stable ones can persist. In general, RNA tends to form a more stable molecule. This tendency is one clue that researchers can use, when given a particular sequence of bases, to predict the shape of an RNA molecule.

While there are a variety of tools available for researchers to determine the secondary (2-D) shape of RNA, discovering the three-dimensional (3-D) structure of an RNA molecule is much more difficult. To date, X-ray crystallography has definitively determined the 3-D structure of a fairly small number of specific RNA molecules, compared to all that are known to exist in our cells.

Zirbel, a professor in the Department of Mathematics and Statistics, is applying probability theory to help predict RNA 3-D structure.

“If you give me the sequence of an RNA molecule, I can go to websites that will predict the secondary structure and tell me where the helices made of Watson-Crick base pairs are,” said Zirbel. “That will leave internal loops and hairpins, which are geometrically the most interesting, and which are composed of the non-Watson-Crick base pairs that determine its shape. What I’d like to do is take the sequence of the internal loop and tell you what it looks like in 3-D. Is it a 45-degree bend, a 90-degree bend, or an extra twist? That would help you reassemble the RNA in 3-D. It doesn’t solve the whole problem, just the important part.”

The Web application the team is developing uses an innovative “grammar” that builds on the “vocabulary” introduced by the Leontis-Westhof base pair classification system. This allows researchers to ask new questions—and get better answers—about RNA 3-D structure. It does this by examining all the possible 3-D structures for a given sequence and then suggesting which is the most likely.

“In statistics, you do this kind of thing often,” Zirbel said. “For example, my model could be that the data come from a normal distribution with some mean and some standard deviation. Then you try to determine which mean and which standard deviation match the data the best.”
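The textbook case Zirbel mentions can be sketched in a few lines of Python. This is only an illustration of fitting a normal distribution by maximum likelihood, not code from the BGSU project; the function names and the sample data are made up for the example.

```python
import math

def fit_normal_mle(data):
    """Return the mean and standard deviation that best match the data
    in the maximum-likelihood sense (note the division by n, not n - 1)."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    return mean, math.sqrt(variance)

def normal_log_likelihood(data, mean, std):
    """Log of the probability density of the data under Normal(mean, std)."""
    return sum(
        -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)
        for x in data
    )

# Hypothetical measurements, invented for this illustration.
data = [4.8, 5.1, 5.0, 4.9, 5.2]
mean, std = fit_normal_mle(data)
print(f"best-fitting mean = {mean:.3f}, standard deviation = {std:.3f}")
```

Any other choice of mean and standard deviation gives these data a lower likelihood than the fitted pair, which is what "match the data the best" means here.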

“Here, we’ve got 250 different models,” Zirbel said. “We just don’t know which model is the most likely to have generated the sequence that we’re seeing. So our decision criterion is maximum likelihood. You take each model and ask, ‘What is the probability that this model would generate that sequence? Then what is the probability that the next model would generate that sequence?’ All the way down the line, and then choose the model that has the maximum likelihood of generating that sequence.”
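The decision rule Zirbel describes can be shown with a toy example. The sketch below is an assumption-laden stand-in, not the team's software: it uses three invented models instead of 250, and each model is simply a table giving the probability of each base at each position of a three-base loop. The model names and all the probabilities are hypothetical, and the real probabilistic models are not described in this article.

```python
import math

# Hypothetical toy models: one probability table per loop position.
# Names and numbers are invented for illustration only.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
MODELS = {
    "bend_45": [
        {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.7, "U": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7},
    ],
    "bend_90": [
        {"A": 0.1, "C": 0.1, "G": 0.7, "U": 0.1},
        {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},
        {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},
    ],
    "extra_twist": [UNIFORM, UNIFORM, UNIFORM],
}

def log_likelihood(model, sequence):
    """Log-probability that the model would generate the sequence."""
    if len(model) != len(sequence):
        return float("-inf")  # this toy model cannot produce that length
    total = 0.0
    for position, base in zip(model, sequence):
        p = position.get(base, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total

def most_likely_model(models, sequence):
    """Ask every model the same question and keep the one with the
    maximum likelihood of generating the sequence."""
    return max(models, key=lambda name: log_likelihood(models[name], sequence))

print(most_likely_model(MODELS, "GAA"))
```

Working with logarithms keeps the products of many small probabilities from underflowing, which matters once sequences and model collections grow beyond a toy size.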

From a software programming perspective, this is a challenging task. But the interdisciplinary nature of the current project team is pushing the project ahead. Petrov has created software to automate the cataloging and classification of the various loop, hairpin, and junction structures that give RNA its 3-D shape, while Roll has been working to refine the probabilistic model that is the core of the application’s prediction algorithm. Sweeney brings database expertise that has helped the group integrate sequence data from other genome projects, and also simplified software coding and maintenance.

“It turns out it’s a whole lot of fun, and it’s different than anything I have done before,” Zirbel said. “It’s kind of the Wild West of probabilistic modeling, and I happen to like that. As a mathematician I don’t mind solving somebody else’s problem as opposed to proving theorems in my own domain. In this case, we have a chemist, Neocles, who knows a problem that people want solved, and we’re out to solve it. We complement each other in that he appreciates the mathematics and I appreciate the physics, chemistry and biology.”