Sridevi Parise, Padhraic Smyth, and Sergey Kirshner (2003), Multivariate Density Estimation with Permuted Variable-Values, Computing Science and Statistics, 35, I2003Proceedings/PariseSridevi/PariseSridevi.presentation.pdf
In this paper we consider the problem of multivariate density estimation from a data set of N rows (observations) and p columns (variables), where the observed values in each row may have been permuted in some random fashion. Thus, for any row we do not know with certainty which variable is associated with each value. This problem arises in a number of practical applications, such as computer vision (where measured features of objects are not in correspondence with each other) and text analysis (where fields of the data are not in correspondence). We derive lower bounds on how well any algorithm can "un-mix" the rows, given perfect knowledge of the original multivariate distribution (but not knowing the permutation applied to each row). We show how these bounds are related to the well-known classification Bayes error rate, providing some insight into the difficulty of the problem. We analyze the special case of Gaussian densities and demonstrate the different effects of positive and negative correlation among the column variables. In the more usual practical case, where the parameters of the original multivariate density are unknown, the EM algorithm can be used to simultaneously learn the parameters and "un-mix" the rows. The concepts in the paper are illustrated using an application in astronomical data analysis, involving registration, clustering, and classification of images of galaxies.
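The known-density setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian density with known mean and covariance and, as a further assumption of this sketch, that every permutation of a row is equally likely a priori. The function names (`gaussian_logpdf`, `permutation_posterior`, `unmix_row`) are hypothetical. For each row, the sketch scores every candidate ordering under the known density and returns the most probable one.

```python
import itertools
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian evaluated at one point x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)  # squared Mahalanobis distance
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)

def permutation_posterior(row, mean, cov):
    """Posterior probability of each candidate reordering of one row.

    Assumption of this sketch: a uniform prior over all p! permutations,
    so the posterior is proportional to the density of the reordered row.
    Enumerating p! orderings is only feasible for small p.
    """
    perms = list(itertools.permutations(range(len(row))))
    logp = np.array([gaussian_logpdf(row[list(p)], mean, cov) for p in perms])
    logp -= logp.max()  # subtract the max for numerical stability
    post = np.exp(logp)
    return perms, post / post.sum()

def unmix_row(row, mean, cov):
    """Return the most probable reordering of the row and its posterior."""
    perms, post = permutation_posterior(row, mean, cov)
    best = int(np.argmax(post))
    return row[list(perms[best])], post[best]

# Well-separated means: the swapped row is restored with high confidence.
mean = np.array([0.0, 10.0])
cov = np.eye(2)
restored, confidence = unmix_row(np.array([10.2, -0.1]), mean, cov)

# Identical marginals: both orderings are equally probable, so no
# algorithm can do better than guessing on this row.
_, post_sym = permutation_posterior(np.array([1.0, 2.0]), np.zeros(2), np.eye(2))
```

The second call hints at why the problem can be hard: when the candidate orderings are nearly equally probable under the density, even this maximum-posterior rule is wrong a large fraction of the time, which is the kind of quantity a Bayes-error-style bound describes.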