PickR: Pick diverse R-groups for library design using 3D electrostatics and shape

Paolo Tosco¹, Mark Mackey¹

¹Cresset
Diverse library design and enumeration is an important technique for generating new chemical matter for hit or lead finding. The ultimate goal is to obtain the broadest coverage of chemical space while minimizing the number of molecules to buy or synthesize. Existing techniques generally use 2D methods involving chemical features or structural graphs to assess the similarity of any two compounds.

While 2D methods are fast, they have significant limitations in their ability to capture the biological similarity between molecules, especially when conformationally flexible structures are involved. Structures that appear to differ substantially in functional group decoration may give rise to quite similar steric and electrostatic properties, which are what actually determine their recognition by biological macromolecules.

In this contribution we show how 3D electrostatic and shape similarity can be effectively applied to this problem. The move from 2D to 3D brings a much richer, more realistic description of molecular interactions. However, it also introduces conformational sampling into the problem, significantly increasing the size and complexity of the calculations.

The PickR algorithm exploits the fact that most libraries are constructed using a combinatorial paradigm, so that selecting the final molecules to include in the library reduces to selecting a suitable range of building blocks, or R-groups. To assess the diversity of these R-groups, we align all reagents on a common bond, usually the bond formed in the combinatorial reaction, and compute the electrostatic and shape similarity of every pair of conformations. As the alignment along a bond leaves a rotational degree of freedom, we sample multiple relative orientations of each reagent pair about that bond to make sure that the best steric and electrostatic overlap is attained.
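The rotational sampling described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify how PickR scores similarity, so a toy Gaussian overlap of atom positions and partial charges stands in for the actual electrostatic and shape similarity, and the function names (`rotate_about_bond`, `overlaps`, `similarity`, `best_pair_similarity`), the step count, and the example coordinates are all hypothetical. Conformers are assumed to be pre-aligned so that the common bond lies along the z axis.

```python
import math

# Each atom is (x, y, z, partial_charge). All conformers are assumed to be
# pre-aligned so that the common bond lies along the z axis.

def rotate_about_bond(atoms, angle):
    """Rotate atoms by `angle` radians about the z axis (the shared bond)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z, q) for x, y, z, q in atoms]

def overlaps(a, b, alpha=0.5):
    """Return (shape, electrostatic) Gaussian overlap sums of two atom sets."""
    shape = elec = 0.0
    for xa, ya, za, qa in a:
        for xb, yb, zb, qb in b:
            d2 = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            g = math.exp(-alpha * d2)
            shape += g
            elec += qa * qb * g
    return shape, elec

def similarity(a, b):
    """Toy score: mean of Carbo-style shape and electrostatic similarities."""
    s_ab, e_ab = overlaps(a, b)
    s_aa, e_aa = overlaps(a, a)
    s_bb, e_bb = overlaps(b, b)
    shape_sim = s_ab / math.sqrt(s_aa * s_bb)
    # The electrostatic term is used only when both sets carry some charge.
    elec_sim = e_ab / math.sqrt(e_aa * e_bb) if e_aa > 0 and e_bb > 0 else 0.0
    return 0.5 * (shape_sim + elec_sim)

def best_pair_similarity(confs_a, confs_b, n_steps=12):
    """Maximum score over all conformer pairs and n_steps bond rotations."""
    best = -1.0
    for a in confs_a:
        for b in confs_b:
            for k in range(n_steps):
                rotated = rotate_about_bond(b, 2.0 * math.pi * k / n_steps)
                best = max(best, similarity(a, rotated))
    return best

# Hypothetical three-atom reagent and a copy twisted by 90 degrees.
reagent = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 1.5, 0.1), (1.2, 0.0, 2.0, -0.4)]
twisted = rotate_about_bond(reagent, math.pi / 2)
print(similarity(reagent, twisted))                # score without sampling
print(best_pair_similarity([reagent], [twisted]))  # score after sampling
```

In this sketch the cost per reagent pair grows as the number of conformers of the first reagent times the number of conformers of the second times `n_steps`, which hints at why the move to 3D increases the size of the calculation.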

This procedure leads to a single similarity value for each reagent pair, and these values are collected into a similarity matrix. Clustering this matrix yields a diverse pick of R-groups, which are prioritized for inclusion in the library to be synthesized or acquired from a vendor. We will discuss the algorithm together with the challenges associated with the move to 3D, and highlight both advantages and disadvantages of the approach, particularly with regard to scalability. While libraries consisting of a few thousand reagents can be conveniently processed on a single commodity workstation, diversity picks from large corporate or vendor libraries (>100K molecules) require significant computing resources.
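As a rough illustration of how a similarity matrix can be turned into a diverse pick, the sketch below uses a simple leader (sphere-exclusion) clustering. This is a swapped-in technique chosen for brevity: the abstract does not say which clustering method PickR uses. The function name `leader_cluster`, the threshold, and the example matrix are hypothetical.

```python
def leader_cluster(sim, threshold):
    """Group items by leader clustering on a symmetric similarity matrix.

    Items are visited in order. Each item joins the first existing leader
    it resembles (similarity >= threshold); otherwise it becomes a new
    leader. The leaders form the diverse pick.
    """
    clusters = {}  # leader index -> list of member indices
    for i in range(len(sim)):
        for leader in clusters:
            if sim[i][leader] >= threshold:
                clusters[leader].append(i)
                break
        else:
            clusters[i] = [i]
    return clusters

# Hypothetical similarity matrix for four reagents: 0 and 1 are alike,
# 2 and 3 are alike, and the two groups differ from each other.
sim = [
    [1.00, 0.90, 0.20, 0.20],
    [0.90, 1.00, 0.20, 0.20],
    [0.20, 0.20, 1.00, 0.85],
    [0.20, 0.20, 0.85, 1.00],
]
clusters = leader_cluster(sim, threshold=0.7)
picks = list(clusters)  # one representative per cluster
print(clusters, picks)
```

Leader clustering makes a single pass over the reagents, but the result depends on the visiting order; hierarchical or MaxMin-style selection would be reasonable alternatives under the same matrix.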

To demonstrate the algorithm, we will present the application of PickR to the selection of amino acid side chains and compare the results with those obtained using a 2D method. We will also touch on the extension of this methodology to diversity picking in final product space, as opposed to R-group space alone. R-groups can be easily reacted in silico with the scaffold(s) of interest, and the resulting products assessed for 3D similarity/diversity as a whole. This makes it possible to take into account the conformational restrictions and electrostatic effects that arise upon conjugation, allowing for a more considered design of the reagent library in light of the compounds to be made and submitted to biological assays.