Topological Data Analysis of Conformational Space

Lee Steinberg1, Ingrid Membrillo-Solis2, Mariam Pirashvili2, Jacek Brodzki2, Jeremy Frey1

1Department of Chemistry, University of Southampton
2Department of Mathematics, University of Southampton

Molecular energy landscapes, and their underlying conformational spaces, are a fundamental concept for chemists. For example, the problem of protein folding is often posed as an optimisation of a large number of dihedral angles that minimises the (free) energy. To that end, there have been many attempts to develop general methods for understanding conformational spaces and energy landscapes. Perhaps the most common is the Ramachandran plot [1], a contour plot of the free energy function projected onto two dihedral angles.
Although powerful, the Ramachandran plot does not take the topology of the conformational space directly into account. In particular, the circular nature of the dihedral angles naturally induces a toroidal topology, and while the free energy function is designed to be continuous over the periodic boundaries, the Ramachandran plot itself is not influenced by the underlying topology. Previous work has sought to rectify this problem. ‘Gluing’ the conformational space along the periodic boundaries is perhaps the most obvious approach [2]. Such an approach does create useful pictures, and is topologically faithful. However, it leads to a stretched geometry, so paths over the conformational space can appear distorted. Recent developments in mathematics have led to the field of topological data analysis and, within it, persistent homology [3]. Such techniques have already found success in chemistry, from solubility prediction [4] to materials informatics [5]. In this work, we apply these techniques to the conformational space of a well-studied molecule, alanine dipeptide. By studying a representative set of conformers, we show that the toroidal topology is easily recovered. We are able to verify that the conformational space of this molecule must be embedded in at least 4 dimensions, with the exact number depending on the extent of the approximations made. We compare the use of different metrics on conformational space, and show that they lead to similar topologies. We then suggest a molecular representation well suited to describing conformational spaces. We also use these techniques to study the energy landscape itself, and show that we are able to observe the presence of transition states and local minima. We then move on to a complicated problem posed by a simple molecule: pentane. The molecular symmetry causes the traditional representation of conformational spaces to break down, and leads to a totally disconnected conformational space.
However, it is clear that, as long as molecular chirality is fixed, it should be possible to transform any conformer to any other – the conformational space should be path connected. We are able to discuss where exactly the traditional representation breaks down, and show that by moving to a more abstract representation (that will be familiar to chemists in general), topological data analysis techniques allow us to correctly identify the conformational space under different approximations. Finally, we present a perspective as to how these techniques could be applied to more complex molecules, in particular in improving the understanding of their free energy landscapes without complex calculation.

[1] Ramachandran, Ramakrishnan, Sasisekharan, Stereochemistry of polypeptide chain configurations, Journal of Molecular Biology, 7, 1963, 95-99
[2] Jakli, Knak Jensen, Csizmadia, Perczel, Variation of conformational properties at a glance…, Chemical Physics Letters, 547, 2012, 82-88
[3] Edelsbrunner, Harer, Persistent Homology – A Survey, Contemporary Mathematics, 435, 2007, 257-282
[4] Pirashvili, Steinberg, Belchi-Guillamon, Niranjan, Frey, Brodzki, Improved understanding of aqueous solubility modeling through topological data analysis, Journal of Cheminformatics, 10, 2018, 54
[5] Lee, Barthel, Dlotko, Moosavi, Hess, Smit, High-throughput screening approach for nanoporous materials genome using topological data analysis, Journal of Chemical Theory and Computation, 14, 2018, 4427-4437
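
Illustrative sketch (not the authors' code). The abstract's points that two dihedral angles induce a toroidal topology, and that the choice of metric matters, can be made concrete with a small Python example. It assumes one standard way of placing a pair of dihedral angles on a torus in four dimensions, namely (cos φ, sin φ, cos ψ, sin ψ), and compares three distances between two conformers that sit on either side of the ±180° periodic boundary. All function names and the example angles are hypothetical and chosen for illustration only; the metrics actually compared in the work are not specified in the abstract.

```python
import math


def embed_on_torus(phi_deg, psi_deg):
    """Map two dihedral angles (degrees) to a point on a torus in R^4.

    Assumed embedding: each angle is sent to a point on a unit circle.
    """
    phi, psi = math.radians(phi_deg), math.radians(psi_deg)
    return (math.cos(phi), math.sin(phi), math.cos(psi), math.sin(psi))


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def naive_distance(p, q):
    """Euclidean distance on raw angles; ignores the periodic boundaries."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def flat_torus_distance(p, q):
    """Distance on the flat torus: each angle difference wraps around."""
    return math.hypot(angular_difference(p[0], q[0]),
                      angular_difference(p[1], q[1]))


def embedded_distance(p, q):
    """Euclidean (chord) distance between the two points embedded in R^4."""
    return math.dist(embed_on_torus(*p), embed_on_torus(*q))


# Two hypothetical conformers, 2 degrees apart across the phi = +/-180 boundary.
a = (179.0, 60.0)
b = (-179.0, 60.0)

print("naive (degrees):     ", naive_distance(a, b))
print("flat torus (degrees):", flat_torus_distance(a, b))
print("embedded in R^4:     ", embedded_distance(a, b))
```

The naive distance treats the two conformers as almost maximally far apart in φ, whereas both periodic-aware distances report them as near neighbours. A distance matrix built from either of the latter two functions is the kind of input that a persistent homology calculation would take in place of raw angle differences.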