Standing on the Shoulders of Giants and Variations on a Theme

James Goodnight

Those who work seriously with large volumes of data and software technology are continually seeking improvements. Issues affecting "quality of results" become paramount as data volumes continue to grow, and opportunities to address data quality issues abound. Algorithms designed for complex data, including more powerful sampling strategies, dimension reduction techniques, the ability to handle both numerical and textual data, and the effective use of heuristics to speed results, will be key to distilling vast data into quality information, knowledge, and wisdom. Examples from genomics illustrate many of the important issues.