Shoichet Laboratory

Design and Experimental Testing of Docking Algorithms


A long-term goal of our research is to address the docking problem. This may be stated as, "Given the structure of a protein and that of a potential ligand, can the two form a favorable complex? What are the bases for binding and specificity?"

Like many molecular recognition questions, molecular docking is difficult because of the many states accessible to macromolecules and their ligands, and the problem of calculating accurate energies. The number of accessible states grows exponentially with the degrees of freedom of the docking molecules. Energy calculations in condensed phases must subtract large numbers to arrive at small differences, almost guaranteeing inaccuracy.
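
Both difficulties can be illustrated with a few lines of arithmetic. The sketch below is only an illustration: it assumes each rotatable bond is sampled independently at a fixed number of angles, and that the errors on two large energy terms are independent. The function names and the example numbers are hypothetical and are not taken from any docking program.

```python
import math


def count_states(n_torsions: int, samples_per_torsion: int) -> int:
    """Conformations generated when every rotatable bond is sampled independently."""
    return samples_per_torsion ** n_torsions


def difference_uncertainty(sigma_a: float, sigma_b: float) -> float:
    """Standard error of (a - b) when the errors on a and b are independent."""
    return math.hypot(sigma_a, sigma_b)


if __name__ == "__main__":
    # Exponential growth: each added degree of freedom multiplies the state count.
    for n_torsions in (2, 5, 10):
        print(n_torsions, "torsions ->", count_states(n_torsions, 6), "states")

    # Hypothetical large terms (kcal/mol), each known to about 2%.
    bound, unbound = -100.0, -95.0
    sigma = difference_uncertainty(0.02 * abs(bound), 0.02 * abs(unbound))
    print("difference:", bound - unbound, "+/-", round(sigma, 2))
```

With these made-up numbers, a 2% error on each large term leaves an uncertainty that is more than half the size of the 5 kcal/mol difference being sought.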

We are investigating methods to address the docking problem through coupled computational and experimental studies. Computationally, we are developing algorithms to sample degrees of freedom such that the number of states grows linearly, not exponentially, with chemical complexity. We are also investigating methods to calculate desolvation energies in docking. An important aspect of our research program is that we test new docking algorithms experimentally in our own laboratory. This allows us to investigate how the algorithms are performing in greater detail than is typically accessible to theoretical laboratories. AmpC β-lactamase and a binding site engineered into T4 lysozyme provide well-behaved experimental systems for these studies. Much of our research centers on using docking as a method to screen molecular databases for ligands that will complement a protein structure.

The conformation problem in docking. One challenge in molecular docking is accounting for molecular flexibility. There are many states to consider when docking flexible molecules, and because their number grows exponentially with the degrees of freedom, exhaustive enumeration quickly becomes intractable. In early work, we showed that ligand conformational flexibility could be partially addressed by pre-calculating conformations and docking them as an ensemble (Lorber & Shoichet, Protein Science, 1998). This allowed us to dock several hundred ligand conformations in about the same time it had previously taken to dock one conformation. Subsequently, we extended this approach to consider ensembles of similar compounds. This pre-organization allowed structure-activity information to emerge directly from the docking screens, and suggested entirely new classes of inhibitors for several enzymes, which we were able to test experimentally (Su et al., Proteins, 2001). We have generalized these algorithms by treating ligands hierarchically, which allows us to mix and match side chains that are sampled independently. We can now dock thousands and even millions of ligand conformations in the time it once took to dock a single ligand conformation. This method has been applied to protein-protein docking (Lorber et al., Protein Science, 2002), and its application to database screening by molecular docking is now being explored.
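
The benefit of sampling side chains independently can be sketched with a toy example. This is not the laboratory's algorithm: it assumes a fixed placement of the rigid core, a score that is a simple sum over side-chain positions, and side chains that do not interact with one another. The function names and the numeric scores are hypothetical. Under those assumptions, choosing the best conformer at each position separately gives the same result as enumerating every combination, at a cost that grows with the sum rather than the product of the option counts.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Options = Sequence[Sequence[float]]  # one list of candidate conformers per position


def dock_flat(options: Options, score: Callable[[float], float]) -> Tuple[float, int]:
    """Score every full combination; returns (best total, combinations scored)."""
    best, n_scored = float("inf"), 0
    for combo in product(*options):
        n_scored += 1
        best = min(best, sum(score(c) for c in combo))
    return best, n_scored


def dock_hierarchical(options: Options, score: Callable[[float], float]) -> Tuple[float, int]:
    """Pick the best conformer per position; returns (best total, conformers scored)."""
    best, n_scored = 0.0, 0
    for position in options:
        n_scored += len(position)
        best += min(score(c) for c in position)
    return best, n_scored


if __name__ == "__main__":
    # Hypothetical per-conformer scores (lower is better), three positions.
    toy = [[1.0, -2.0, 0.5], [3.0, -1.0, 0.0], [-0.5, 2.0, 1.0]]
    print(dock_flat(toy, lambda s: s))
    print(dock_hierarchical(toy, lambda s: s))
```

The equivalence holds only while the additive, non-interacting assumption does; real side chains can clash, so a practical method would need to check combinations that the independent search proposes.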

The scoring problem in docking. A second major challenge in molecular docking lies in the "scoring" functions that docking programs use. "Scoring" involves evaluating the fit of each docked molecule from the database, and ranking the molecules accordingly. We have tried to stay close to atomistic scoring functions, reasoning that these represent good models of physical reality. To do this, we have found that the docking score must explicitly include the cost of desolvating the ligands as they are docked into the binding site. Simple implementations of this idea improved our ability to distinguish likely from unlikely docking "hits" (Shoichet et al., Proteins, 1999). More recently, we have shown that improved treatments of partial atomic charges also improve the docking calculation. We are applying this new treatment of ligand databases to predict the binding of ligands to a model binding site engineered into T4 lysozyme.
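
How a desolvation term can change a ranking is easy to show with a minimal sketch. The example assumes a score that is a plain sum of an electrostatic term, a van der Waals term, and a positive desolvation penalty. The class, the function names, and all energies are hypothetical and are not taken from any published scoring function.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class DockedLigand:
    name: str
    electrostatic: float      # kcal/mol, negative is favorable
    van_der_waals: float      # kcal/mol, negative is favorable
    desolvation_cost: float   # kcal/mol, positive penalty for stripping water


def score(ligand: DockedLigand, include_desolvation: bool = True) -> float:
    """Total score; lower is better."""
    total = ligand.electrostatic + ligand.van_der_waals
    if include_desolvation:
        total += ligand.desolvation_cost
    return total


def rank(ligands: Sequence[DockedLigand], include_desolvation: bool = True) -> List[str]:
    """Ligand names ordered from best to worst score."""
    ordered = sorted(ligands, key=lambda lig: score(lig, include_desolvation))
    return [lig.name for lig in ordered]


if __name__ == "__main__":
    # Hypothetical ligands: a highly charged one and a neutral one.
    ligands = [
        DockedLigand("charged", -30.0, -10.0, 25.0),
        DockedLigand("neutral", -8.0, -15.0, 3.0),
    ]
    print(rank(ligands, include_desolvation=False))
    print(rank(ligands, include_desolvation=True))
```

In this toy case the charged ligand ranks first when desolvation is ignored, because its electrostatic term dominates, and drops below the neutral ligand once the cost of removing it from water is counted.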

The specificity problem in docking and scoring. One of the most surprising discoveries to emerge from the experimental side of our docking work is a general mechanism for non-specific enzyme inhibition. A large number of molecules, many of them reported in the literature, inhibit many different targets and show peculiar, non-drug-like behavior. To understand the source of this frustrating behavior, we are using enzymology and light scattering techniques to study hits from virtual and experimental high-throughput screening (HTS). Unexpectedly, we have shown that many promiscuous inhibitors form aggregates of 100-400 nm diameter (figure), and we hypothesize that it is the aggregate that is the inhibitory species. These aggregate-forming, non-specific inhibitors are common in screening databases used by drug companies, and may well explain many of the false-positive hits encountered in HTS, which are among the biggest problems in discovery research in the pharmaceutical industry (McGovern et al., Journal of Medicinal Chemistry, 2002).

In summary, we take a physical approach to the docking problem. By linking theory with experiment, we can consider several of the important problems outstanding in the field at atomic resolution.

Recent publications:

  • Lorber DM, Udo MK, Shoichet BK. Protein-protein docking with multiple residue conformations and residue substitutions. Protein Science 11, 1393-1408 (2002).
  • Su AI, Lorber DM, Weston GS, Baase WA, Matthews BW, Shoichet BK. Docking molecules by families to increase the diversity of hits in database screens: computational strategy and experimental evaluation. Proteins 42, 279-293 (2001).