Homology Modeling of Proteins

The most successful protein structure prediction method to date is homology modeling (also known as comparative modeling). The approach relies on the structural conservation of the framework regions among the members of a protein family. Because 3D structure is more conserved in evolution than sequence, even the best sequence alignment methods frequently fail to identify the regions that share the required level of structural similarity. This matters because alignment quality is the single most important factor determining the accuracy of the 3D model.

Reliable prediction of framework regions of proteins

We have developed an algorithm called Consensus that consistently provides a high-quality alignment of the framework regions for comparative modeling[1]. The target and template sequences are aligned by five different alignment algorithms (chosen on the basis of a benchmarking study we conducted earlier; please see the reference). A consensus alignment is built from the five resulting alignments, and each position is assigned a confidence level (consensus strength). The regions reliable for homology modeling are then predicted by applying criteria involving the secondary structure and solvent exposure profile of the template, the predicted secondary structure of the target, the consensus confidence level, the template domain boundaries, and the structural continuity of the predicted region with other predicted regions. On average, the method predicts structures that deviate from the native structures by about 2.5 angstroms, and the predictions extend to almost 80% of the regions that are structurally aligned in the FSSP database. The approach was implemented as the Consensus alignment server and tested at CAFASP3, a competition for such servers.
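The consensus step described above can be illustrated with a minimal sketch. It is not the published Consensus algorithm: it assumes that each input is a pairwise alignment given as two gapped strings, and that the consensus strength of a target position is simply the number of alignments agreeing on its most common template partner. The function names aligned_pairs and consensus are hypothetical, and the secondary-structure, solvent-exposure and domain-boundary criteria are not modeled.

```python
from collections import Counter


def aligned_pairs(target_row, template_row, gap="-"):
    """Map target residue index -> template residue index for one gapped alignment."""
    pairs = {}
    i = j = 0  # running residue indices in the ungapped target and template
    for a, b in zip(target_row, template_row):
        if a != gap and b != gap:
            pairs[i] = j
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def consensus(alignments, target_length):
    """Return one (template_index or None, strength) tuple per target position.

    Assumption: strength is the number of input alignments that agree on the
    most frequently chosen template partner for that position.
    """
    maps = [aligned_pairs(t, s) for t, s in alignments]
    result = []
    for pos in range(target_length):
        votes = Counter(m[pos] for m in maps if pos in m)
        if votes:
            partner, strength = votes.most_common(1)[0]
            result.append((partner, strength))
        else:
            result.append((None, 0))
    return result


# Three toy alignments of the same target/template pair (illustrative data only).
toy_alignments = [
    ("ACDE", "ACDE"),
    ("ACDE-", "-ACDE"),
    ("ACDE", "ACDE"),
]
toy_consensus = consensus(toy_alignments, target_length=4)
```

Positions with a low strength would be candidates for exclusion from the reliable regions; where to set that threshold is a design choice that this sketch leaves open.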

Multiple model approach

Our first approach to dealing with this uncertainty in target-template alignments was a multiple-model approach. A number of pairwise alignments of the target and template sequences were generated using simple dynamic programming with varied parameters, a model was constructed for each alignment, and the resulting models were then discriminated on the basis of energy. This approach was tested at the CASP4 (Critical Assessment of Structure Prediction) competition in 2000. Given its simplicity, the approach gave surprisingly good results for the easier targets, i.e., those with a good template available, but the dynamic programming was too rudimentary to produce a good alignment for difficult targets, so the free energy ranking algorithm had to choose among inferior models. The ranking itself worked reasonably well, but completely wrong alignments caused false positives, and the resolution of the discrimination was sometimes inadequate.
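The multiple-model pipeline above can be sketched as follows. This is an illustration under stated assumptions, not the code used at CASP4: the alignment is a textbook Needleman-Wunsch with identity scoring and a linear gap penalty, only the gap penalty is varied, and build_model and energy are hypothetical callables standing in for the model-building and free energy steps, which the text does not specify. Scores are assumed to be integers so that the traceback comparisons are exact.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )
    # Trace back from the bottom-right corner, preferring the diagonal move.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                row_a.append(a[i - 1])
                row_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b))


def candidate_alignments(target, template, gap_penalties=(-1, -2, -4)):
    """Generate distinct alignments by varying the gap penalty."""
    seen = []
    for g in gap_penalties:
        aln = needleman_wunsch(target, template, gap=g)
        if aln not in seen:
            seen.append(aln)
    return seen


def rank_models(alignments, build_model, energy):
    """Build one model per alignment and sort by energy, lowest first.

    build_model and energy are placeholders supplied by the caller.
    """
    scored = [(energy(build_model(aln)), aln) for aln in alignments]
    return sorted(scored, key=lambda pair: pair[0])
```

With a real model builder and energy function plugged in, the first element of the list returned by rank_models would be the selected model. As the text notes, the selection can only be as good as the best alignment in the candidate set.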

The consensus algorithm for comparative modeling would serve at least two purposes:

  1. It would help in generating minimal models that can be completely trusted as accurate base models and can be further improved and tested.
  2. Multiple alignments can now be generated with these reliable regions constrained. This would result in far fewer alignments, as well as hopefully eliminate the false positive problem, improving the resolution power of the energy-based discrimination.

The reliable regions identified using our consensus method could potentially be used as the constrained regions, varying the remaining segments of the alignment. This approach assumes that the template is usually the best approximation for a target, i.e. following the template wherever possible is better than predicting ab initio. Loops, target regions that are very different from the template, and regions that are aligned to gaps may be predicted again using a multiple model approach, varying them structurally.
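One simple way to picture the constraint is as a filter on candidate alignments. The sketch below assumes that the reliable regions are supplied as (target index, template index) pairs and keeps only the alignments that reproduce every such pair. The names satisfies_constraints and constrained_candidates are hypothetical, and a real implementation would more likely build the constraint into the alignment step than filter afterwards.

```python
def aligned_pairs(target_row, template_row, gap="-"):
    """Map target residue index -> template residue index for one gapped alignment."""
    pairs = {}
    i = j = 0  # running residue indices in the ungapped target and template
    for a, b in zip(target_row, template_row):
        if a != gap and b != gap:
            pairs[i] = j
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def satisfies_constraints(alignment, reliable_pairs):
    """True if the alignment pairs every reliable target residue as required."""
    pairs = aligned_pairs(*alignment)
    return all(pairs.get(t) == s for t, s in reliable_pairs)


def constrained_candidates(alignments, reliable_pairs):
    """Keep only the candidate alignments consistent with the reliable regions."""
    return [aln for aln in alignments if satisfies_constraints(aln, reliable_pairs)]


# Two toy candidates that differ only in where the gap is placed (illustrative data).
candidates = [("ACDE", "A-DE"), ("ACDE", "-ADE")]
kept = constrained_candidates(candidates, reliable_pairs=[(0, 0)])
```

In this toy case only the first candidate pairs target residue 0 with template residue 0, so the second is discarded. This is how constraining the reliable regions shrinks the set of alignments passed on to energy-based discrimination.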

Comparative Modeling of P450s

Cytochrome P450s are present in almost all species and are involved in a huge variety of functions. Several types exist in humans alone, and understanding them is of vital importance. The overall structure of the P450s is remarkably well conserved across the entire protein family, even though the sequence identity can be as low as ~20%. Crystal structures have been determined for very few of them, so the scope for comparative modeling is enormous. We hope to predict models and refine them sufficiently for use in other applications such as docking.

One of our research areas involves docking studies on comparative models of P450s with carcinogenic molecules identified in the environment. Certain environmental carcinogens are metabolized differently by the cytochrome P450 1A1s in humans and fish, and exhibit different degrees of cytotoxicity. We are investigating why this is so, and whether docking studies can substantiate our hypotheses about the mechanisms. This is a collaborative effort with Dr. J. Stegeman of the Woods Hole Oceanographic Institution.