Martin Theus, Di Cook, and Heike Hofmann (2003), Scatterplots for Massive Datasets, Computing Science and Statistics, 35, I2003Proceedings/TheusMartin/TheusMartin.presentation.pdf
Representing data in scatterplots works well up to about tens of thousands of cases. A point on a page takes very little ink, so a lot of points can be drawn before overplotting occurs, especially when optimizations such as pixel-sized glyphs and large plot windows are used. However, scatterplots lose their usefulness when data sets reach the order of 100,000 cases. With such large data, substantial overplotting masks structure in the data. Plots of large data are thus inherently binned by the available screen real estate. This talk discusses the use of alpha-blending to render the points and of grey scale to represent the counts at each pixel, and investigates both methods for representing pairs of variables in large data. In addition, data visualization is most useful when it is implemented in an interactive system that allows information to be linked between several plots. This talk therefore also investigates the nature of linking information from a binned scatterplot representation.
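The binning and the two rendering methods mentioned above can be illustrated with a minimal sketch. It is not taken from the talk. The function names, the 200 x 200 plot size, the alpha value, the simulated Gaussian data, and the selection rule (x > 1) are all assumptions made for illustration. The linking step at the end shows only one possible way to carry a selection into a binned plot.

```python
import random

def bin_to_pixels(xs, ys, width, height, x_range, y_range):
    """Count how many points land on each pixel of a width x height plot."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    # Guard against a zero range when all values are identical.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    counts = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Clamp so that values on the upper boundary fall in the last pixel.
        col = min(max(int((x - x_min) / x_span * width), 0), width - 1)
        row = min(max(int((y - y_min) / y_span * height), 0), height - 1)
        counts[row][col] += 1
    return counts

def alpha_blend_darkness(count, alpha):
    """Darkness in [0, 1] after drawing `count` points of opacity `alpha`
    on top of each other (standard "over" compositing on a white page)."""
    return 1.0 - (1.0 - alpha) ** count

def grey_scale_darkness(count, max_count):
    """Darkness in [0, 1] from a linear grey scale on the pixel count."""
    return count / max_count if max_count else 0.0

# Simulated data (assumption): 100,000 cases from two independent normals.
random.seed(0)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]
x_range = (min(xs), max(xs))
y_range = (min(ys), max(ys))

counts = bin_to_pixels(xs, ys, 200, 200, x_range, y_range)
max_count = max(max(row) for row in counts)

# Hypothetical linking step: bin a selected subset on the same pixel grid,
# so each pixel knows how many of its cases are currently highlighted.
selected = [(x, y) for x, y in zip(xs, ys) if x > 1.0]
sel_counts = bin_to_pixels(
    [p[0] for p in selected], [p[1] for p in selected],
    200, 200, x_range, y_range,
)
```

Under alpha-blending the darkness saturates geometrically as points pile up on a pixel. A linear grey scale stays proportional to the count, but a few very dense pixels then dominate the scale. Binning the selection on the same grid guarantees that the highlighted count at a pixel never exceeds that pixel's total count.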