E. James Harner



E. James Harner, Harshinder Singh, Shengqiao Li, and Jun Tan (2003), Computational Challenges in Computing Nearest Neighbor Estimates of Entropy for Large Molecules, Computing Science and Statistics, 35, I2003Proceedings/HarnerJames/HarnerJames.paper.pdf



Computational Challenges in Computing Nearest Neighbor Estimates of Entropy for Large Molecules
E. James Harner, (West Virginia University), jharner@stat.wvu.edu,
Harshinder Singh, (NIOSH/West Virginia University), hsingh@stat.wvu.edu,
Shengqiao Li, (NIOSH/West Virginia University), shli@stat.wvu.edu, and
Jun Tan, (West Virginia University), jtan@stat.wvu.edu

Abstract

Entropy is a statistical thermodynamic property of molecules; its evaluation is important for studying the properties of biological molecules (such as peptides, proteins, and DNA molecules) and chemical molecules. Entropy evaluation is also important in drug design and for investigating the effect of toxins on human skin. The entropy of a molecule depends mainly on random fluctuations in its torsional (also called rotational or dihedral) angles. The traditional approach assumed a multivariate normal distribution for the torsional angles of large molecules (Karplus and Kushick, Macromolecules, 1981). However, the assumption of normality is not valid in many situations, particularly when there are large fluctuations in the torsional angles. Demchuk and Singh (Molecular Physics, 2001) introduced a circular probability approach to modeling torsional angles in molecules and illustrated it by modeling the torsional angle of methanol using von Mises distributions. A bathtub-shaped probability distribution was derived for the potential energy of the methanol molecule. Singh et al. (Biometrika, 2002) introduced a bivariate circular model, a natural torus analogue of the bivariate normal distribution, to which it reduces when the fluctuations in the angles are small. The marginal distributions are symmetric and are either unimodal or bimodal. This model was used for modeling two angles of a pentapeptide. In general, the distributions of the torsional angles can have arbitrary shapes, and macromolecules have a large number of torsional angles, which are interdependent. Thus a nonparametric approach appears to be a natural choice for entropy estimation of large molecules. However, entropy evaluation using histogram and kernel density estimates also has problems in high dimensions. Estimates of entropy based on nearest neighbor distances between sample points (Kozachenko and Leonenko, Problems of Information Transmission, 1987) and estimates of entropy based on kth nearest neighbor distances (Singh et al., to appear in Statistics and Decisions) offer hope for estimating entropy for large molecules. However, evaluating nearest neighbor distances is computationally challenging when the number of torsional angles is large and the data set obtained from molecular dynamics simulations of the molecule is very large. We discuss computational approaches for obtaining estimates of entropy based on nearest neighbor distances. Our approaches use Rmpi to parallelize n^2 and n log(n) kth nearest neighbor algorithms, where n is the number of dynamic simulations. We illustrate this approach using data on torsional angles of some large molecules.
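
To make the kth nearest neighbor idea concrete, the following is a minimal Python sketch of one commonly used form of the estimator, H = (d/n) * sum_i ln(rho_i) + ln(V_d) + ln(n) - psi(k), where rho_i is the distance from sample point i to its kth nearest neighbor, V_d is the volume of the unit ball in d dimensions, and psi is the digamma function. It is an illustration under stated assumptions and not the authors' implementation: the names knn_entropy and digamma_int are hypothetical, plain Euclidean distance is assumed (no wrap-around handling for angles), the neighbor search is serial brute force over all pairs with no Rmpi-style parallelism, and the normalizing constant may differ slightly from the one used in the cited papers.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_int(k):
    """Digamma psi(k) for a positive integer k: -gamma + sum_{j=1}^{k-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def knn_entropy(points, k=1):
    """kth nearest neighbor entropy estimate (in nats) by brute-force search.

    points: list of equal-length coordinate lists (n points in d dimensions).
    Assumes Euclidean distance and no duplicate points; a zero kth-neighbor
    distance would make math.log raise ValueError.
    """
    n = len(points)
    d = len(points[0])
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    # ln of the unit-ball volume: pi^(d/2) / Gamma(d/2 + 1)
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    total_log_rho = 0.0
    for i, p in enumerate(points):
        # All-pairs search: distances from point i to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total_log_rho += math.log(dists[k - 1])  # kth smallest distance
    return d * total_log_rho / n + log_unit_ball + math.log(n) - digamma_int(k)


if __name__ == "__main__":
    # Four equally spaced points on a line, purely as a smoke test.
    print(knn_entropy([[0.0], [1.0], [2.0], [3.0]], k=1))
```

The all-pairs loop is the step that becomes expensive as n grows, which is where a tree-based neighbor search or a parallel split of the outer loop across processes would come in.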

