Table of Contents

Keynote Addresses

Standing on the Shoulders of Giants and Variations on a Theme
James F. Goodnight

Data Mining at Microsoft
David Heckerman


Banquet Address

Eavesdropping on the Brain
Terrence Sejnowski


Short Courses

Statistical Data Mining
Edward J. Wegman

Bioinformatics
Pierre Baldi


Statistical Graphics

Making Trees Interactive – KLIMT
Simon Urbanek and Antony R. Unwin

Some Graphics for Recursive Partitioning
Daniel B. Carr and Ru Sun


Flexible Models for Prediction

Predictive Data Mining with Multiple Additive Regression Trees
Jerome H. Friedman

Towards Understanding Boosting
Bin Yu and Peter Bühlmann

Why Does Model Averaging Work?
Yoav Freund


Model-Based Clustering

A Model Based Approach to Text Categorization and Clustering
Alejandro Murua, Jeremy Tantrum, Werner Stuetzle and Solveig Sieberts

Unsupervised Segmentation and Classification of Mixtures of Markovian Sources
Yevgeny Seldin, Gill Bejerano, and Naftali Tishby

Evaluating Sequential Tests for a Class of Stochastic Processes
Xiaoping Xiong and Ming Tan

A Comparison of Reversible Jump MCMC Algorithms for DNA Sequence Segmentation Using Hidden Markov Models
Richard J. Boys and Daniel A. Henderson


Office of Naval Research Overview

Functional Analysis of Computer Network Data
J. L. Solka and D. J. Marchette

Inferring Internal Losses and Delays in Communication Networks from Edge Measurements
Robert Nowak

Texture Modeling Using Self-Similar Wavelets and POMMs
Jennifer Davidson and Richard Barton

The Adaptive Data Cube: An Experiment in Hyperspectral Pattern Recognition
Carey E. Priebe


Visualization for Data Mining

GGobi: XGobi Redesigned and Extended
Deborah F. Swayne, Duncan Temple Lang, Andreas Buja, and Dianne Cook

Visual Post Analysis of Association Rules
H. Hofmann

Uncovering Complexity in Data through Sound
Mark H. Hansen and Ben Rubin


Beyond Correlation

Causal Inference in Statistics: A Gentle Introduction
Judea Pearl

The Defining Role of "Principal Effects" in Comparing Treatments Using General Post-Treatment Variables
Constantine E. Frangakis and Donald B. Rubin


Statistical Models for Text

Active Learning for Support Vector Machines with Applications to Text Classification
Simon Tong and Daphne Koller

Conditional Random Fields for Text Processing
John Lafferty, Andrew McCallum, and Fernando Pereira

Relevant Encoding of Linguistic Data via the Information Bottleneck Method
Naftali Tishby


Bayesian Methods

A Split Merge Markov Chain Sampling Algorithm for Bayesian Mixture Models
Sonia Jain and Radford M. Neal

Priors for Bayesian Neural Networks
Mark Robinson

Adaptive Metropolis-Hastings Samplers for the Bayesian Analysis of Large Linear Gaussian Systems
Stephen K. H. Yeung and Darren J. Wilkinson

Genetic Analysis of Melanoma Onset by using Estimating Equations and Bayesian Random Effects Models
K-A. Do, P. Kuhnert, S-J. Lee, J. F. Aitken, A. Green, and N. G. Martin

GDAGsim: Sparse Matrix Algorithms for Bayesian Computation
Darren J. Wilkinson


Army Research Office Overview

Approximations to Dirichlet Processes with Applications
Jayaram Sethuraman

Banks of Interacting Bayesian Filters
Boris L. Rozovskii, R. Blazek and A. Petrov

Data Reduction by Quantization
Edward J. Wegman and Nkem-Amin (Martin) Khumbah

The Principle and Practice of Minimum Description Length
Bin Yu


Gene Expression – I

A Bayesian Approach to Analysis of cDNA Microarray Data
M. A. Black, B. A. Craig, M. Tanurdzic, and R. W. Doerge

A Statistical Analysis of Radiolabeled Gene Expression Data
Rafael A. Irizarry, Giovanni Parmigiani, Mingzhou Guo, Tatiana Dracheva, and Jin Jen

Replication and Appropriate Statistical Analysis are Required for Accurate Interpretation of DNA Microarray Experiments
She-pin Hung and G. Wesley Hatfield

Identifying Statistically Significant Similarities in Gene Expression Patterns via Bayesian Infinite Mixture Models
Mario Medvedovic and Siva Sivaganesan

An Interdisciplinary Program Employing Computational, Biochemical and Genomic Methods to Examine the Effects of Chromosome Structure on the Regulation of Gene Expression
Lorenzo Tolleri, Craig J. Benham, Pierre Baldi, and G. Wesley Hatfield


Graphical Models

Variational Models and Bayesian Estimation
Tommi Jaakkola

Advanced Mean Field Methods for Probabilistic Models
Manfred Opper and Ole Winther

Probability Assessment with Maximum Entropy in Bayesian Networks
Wim Wiegerinck and Tom Heskes


Environmental Modeling

Functional Data Analysis of Complex Computer Simulation Output: A Case Study in Nuclear Waste Disposal Waste Assessment
David Draper and Bruno Mendes

Integrated Assessment of Drinking Water Regulations
Mitchell J. Small, Patrick Gurian, Mark Schervish, and J. R. Lockwood

Bayesian Sensitivity Analysis and Uncertainty Analysis
Jeremy E. Oakley and Anthony O'Hagan

Sensitivity Analysis of a Buried Radioactive Waste Risk Model
Tom Stockton


Computational Finance

Learning to Trade via Direct Reinforcement
John Moody

Statistical Inference, the Bootstrap, and Neural Network Modeling with Application to Foreign Exchange Rates
Jeff Racine and Halbert White


National Security Agency Overview

Dynamic Visualization of Changing Prior and Posterior in Bayesian Analysis
Hani Doss and B. Narasimhan

Nonparametric Clustering
David W. Scott


Massive Data Sets

...Reflections on a Workshop
Jon R. Kettenring


Analyzing Web Data

Statistical Learning Problems Associated with the World Wide Web
Byron Dom

Finite State Approaches to Information Extraction
Andrew McCallum, Fernando Pereira, John Lafferty, and Dayne Freitag

Graph Structure in the Web
Andrew Tomkins


Bayesian Bioinformatics

Genome-Wide Binding Motif Discovery via Microarray and Prospect Sampler
Jun Liu and Xiaole Liu

Hierarchical Models for Gene Expression Data Analysis
Michael Newton and Christina Kendziorski

Stochastic Models for Sequences with Non-Local Dependency Structure
Scott C. Schmidler


John Tukey and the Interface

John Tukey and the Correlation Coefficient
David R. Brillinger

On the Interaction between Statistics and Computing: In Memory of John W. Tukey
Luisa Fernholz

The Legacy of John Tukey
Robert L. Launer


Ecological and Earth Science Applications

Spatio-Temporal Prediction of Incomplete Precipitation Records
Craig Johns and Douglas Nychka

Bayesian and Frequentist Inference for Ecological Inference: The R x C Case
Ori Rosen, Wenxin Jiang, Gary King, and Martin Tanner

Using the Chemical Mass Balance Model to Estimate Pollution Source Contributions from Correlated Air Quality Observations
William F. Christensen

Land Cover Mapping using Combination and Ensemble Classifiers
Brian M. Steele and David A. Patterson

Mining for Knowledge about Ostracode Assemblages in the Tecolutla River Delta
A. Dale Magoun, Melvin Kontrovitz and Daniel J. Stanley


International Association for Statistical Computing Overview

Developing Data Mining Systems
Arno Siebes

Graphical and Statistical Pruning of Association Rules
Adalbert Wilhelm


How Large is the Web?

Searching the Web: Current Limitations, New Techniques, and Future Directions
C. Lee Giles

How Large is the World Wide Web?
Adrian Dobra and Stephen E. Fienberg


Support Vector Machines

A Tutorial on Support Vector Machines
Bernhard Schölkopf

Kernel Methods for Unsupervised Learning
Bernhard Schölkopf


Chernoff Faces the Interface

Graphical Representation as a Discipline
Herman Chernoff

Clustering and Genetics of Complex Disease
Richard Olshen

Multivariate Statistical Process Control and Signature Analysis using Eigenfactor Detection Methods
Kuang-Han Chen, Duane S. Boning, and Roy E. Welsch


Clusters, Outliers, and Density Models

Data Sharpening for Higher-Order Density Estimation
Michael C. Minnotte and Peter Hall

Robust Detection of Multivariate Outliers in High Dimensions and High Levels of Contamination
Mark Werner and Karen Kafadar

The Complexity of Computing the MCD Estimator
Thorsten Bernholt and Paul Fischer

Finding Committee Solutions by Clustering Models in Function Space
Thomas Ragg

Detecting Novel Samples in Mass Spectral Data: A Clustering Approach
Vladimir Svetnik and Andy I. Liaw


Journal of Computational and Graphical Statistics Overview

A Computational Approach to Full Nonparametric Bayesian Inference under Dirichlet Process Mixture Models
Alan E. Gelfand and Athanasios Kottas

Hierarchical Model-Based Clustering for Large Datasets
Christian Posse


Software Support for Bayesian Analysis Systems

Computing Environments for Bayesian Statistics
Robert Gentleman

Stochastic Parameterized Grammars for Bayesian Model Composition
Eric Mjolsness, Michael Turmon, and Wolfgang Fink

The Bayes Net Toolbox for Matlab
Kevin P. Murphy


Massive Data Sets

Data Squashing: Constructing Summary Data Sets
William DuMouchel

Exploratory Analysis of Retail Sales of Billions of Items
Dunja Mladenic, William F. Eddy, and Scott Ziolko

Mining Large Datasets
Johannes Gehrke


Census 2000: Lessons for Census 2010

Technology and 2010 Census
Carol M. Van Horn

The U. S. Census Bureau's MAF/TIGER System, Internal and External Interfaces
Robert Marx and Linda M. Franz


Gene Expression – II

Assessing Patient Survival using Microarray Gene Expression Data via Partial Least Squares Proportional Hazard Regression
Danh V. Nguyen and David M. Rocke

Lessons Learned from Analyzing the Differential Gene Expression Data between Normal and Tumor Tissues in Head and Neck Cancer Patients
J. Jack Lee, Hyung Woo Kim, Feng Zhan, and Adel K. El-Naggar

Taming Genetic Microarray Data: A Paradigm using a Well-Known Case Study
Howard T. Thaler

Statistical Modelling of Micro Array Data
Ziad Taib


National Science Foundation Overview

Unraveling and Defining Biocomplexity
William K. Michener and James L. Rosenberger

Theoretical and Computational Challenges in Entropy Evaluation of Macromolecules
H. Singh, J. Harner, V. Hnizdo and E. Demchuk


Computational Tools and Methods

Ciphertext Size Requirement of Ciphertext-Only Attack on Vigenère Cipher
Qiong Yang and Song Guo

Interval Computation of Gamma Probabilities and Their Inverses
Trong Wu

Smooth Quadratures of Volterra Integral Equations with Applications to Estimation of HIV Infection Rates and Projection of AIDS Incidence
John J. Hsieh

Designing Experiments for Causal Networks
William D. Heavlin

Multi-Layer Structured Correlation Designs for Heterogeneous and Unbalanced Clustered Data
Edward C. Chao

On Perfect Stability in Characteristic Functions
Jinhyo Kim and Bongsu Ko

An Environment for Creating Interactive Statistical Documents
Samuel E. Buttrey, Deborah Nolan and Duncan Temple Lang

Experiences with a Course on "Web-Based Statistics"
Jürgen Symanzik and Natascha Vukasinovic

ASSIST: A Package for Spline Smoothing in S-Plus
Yuedong Wang and Chunlei Ke

JAVA Implementation of Multiple Linear Regression Models for Patient-Specific Longitudinal Data to Monitor Chemotherapy-Induced Anemia
Christine E. McLaren, Wagner Truppel, Randall F. Holcombe, and Edward L. Kambour

The Development of Community Nutrition Map (CNMap)
Alvin B. Nowverl


Decision Support and Forecasting

Cost Growth Models for NASA's Programs: A Summary
Tze-San Lee and L. Dale Thomas

Series Approximations in Analysis of Risk
Costas A. Christophi and Reza Modarres

An Adequate Statistics for the Exponentially Distributed Censoring Data
P. S. Nair and S-C. Cheng

Comparing Two Measurement Devices: Review and Extensions to Estimate New Device Variability
Brian J. Eastwood

Computationally Intensive Techniques for a Fully Bayesian, Decision Theoretic Approach to Financial Forecasting and Portfolio Selection
Andrew Simpson and Darren J. Wilkinson


Classification Methods

A Statistical View of the Support Vector Machine
Yi Lin

Lazy Class Probability Estimators
Dragos D. Margineantu and Thomas G. Dietterich

PERT – Perfect Random Tree Ensembles
Adele Cutler and Guohua Zhao

Multicategory Support Vector Machines
Yoonkyung Lee, Yi Lin, and Grace Wahba

Using Pseudo-Predictors to Improve the Performance of a Classification Rule
Majid Mojirsheibani


Regression and Function Estimation

Inference for Self-Modeling Regression with Random Effects
Naomi Altman

Support Vector Machine Regression in Chemometrics
Ayhan Demiriz, Kristin P. Bennett, Curt M. Breneman, and Mark J. Embrechts

Data-Driven and Optimal Denoising of a Signal and Recovery of Its Derivative Using Multiwavelets
Nathanial Tymes, Jr., Sam Efromovich, M. Christina Pereyra, and Joseph D. Lakey

RIP-GAMs with an Application to Human Brain Research
Michael G. Schimek

An Adaptive-Learned Temporal Radial Basis Function Network for Recursive Function Estimation
Yiu Ming Cheung and Lei Xu


Visualization and Image Data

A Statistical Approach to the Segmentation of MR Imagery and Volume Estimation of Stroke Lesions
Benjamin Stein and Joseph Horowitz

Visualizing Spatial Autocorrelation with Dynamically Linked Windows
Luc Anselin, Ibnu Syabri, Oleg Smirnov, and Yanqui Ren

Compressions and Analysis of Very Large Imagery Data Sets using Spatial Statistics
James A. Shine

Statistical Visualization of Environmental Data on the Web using nViZn
Lacey Jones and Jürgen Symanzik

A Principled Approach to Interactive Hierarchical Non-Linear Visualization of High-Dimensional Data
Peter Tino, Ian Nabney, Yi Sun, and Bruce S. Williams


Data Mining

A Tree-Based Scan Statistic for Database Disease Surveillance
Martin Kulldorff, Zixing Fang, and Stephen Walsh

Creating Ensembles of Decision Trees through Sampling
Chandrika Kamath and Erick Cantú-Paz

Data Mining Diabetic Databases: Are Rough Sets a Useful Addition?
Joseph L. Breault

Model Complexity Based Design of Radial Basis Function Networks with Data Mining Applications
Miyoung Shin and Amrit L. Goel

Combining Decision Trees using Systematic Patterns
Hyunjoong Kim


Sampling and Resampling Methods

Resampling Time Series with Seasonal Components
Dimitris N. Politis

Correlation and Sampling in Relational Data Mining
David Jensen and Jennifer Neville

Inference for the Sample Maximum in the Presence of Serial Correlation and Heavy-Tailed Distributions
Tucker McElroy and Dimitris N. Politis

BootQC: Bootstrap for Statistical Quality Control and Applications to Aviation Safety Analysis
Regina Y. Liu and Hueychung Teng

Selection of the Shrinkage Factor for the Two Stage Testimator of the Normal Mean using Bootstrap Likelihood
Makarand V. Ratnaparkhi, Vasant B. Waikar, and Frederick J. Schuurmann


Bioinformatics Day


Biological Sequence Analysis

Comparative Genomics and the Future of Biological Knowledge
Anthony Kerlavage

The Public Working Draft of the Human Genome
David Haussler

Identification of Post-Translationally Modified and Mutated Proteins via Mass-Spectrometry
Pavel Pevzner


Gene Expression Data Analysis

Improved Statistical Inference from DNA Microarray Data using Analysis of Variance and a Bayesian Statistical Framework
G. Wesley Hatfield

Statistical Issues, Data Analysis, and Modelling for Gene Expression Profiling
Mike West

Plaid Models for DNA Microarrays
Art Owen


Medical Informatics

Integrating Data and Disciplines: Biostatistics and Biomedical Informatics
Joyce Niland

The Trouble with Text: Challenges and Promises of Biomedical Information Retrieval Technology
Wanda Pratt

Public Health Aspects of Bioinformatics and Medical Informatics
Abdelmonem A. Afifi


Automated Analysis of Brain Images

On Metrics and Variational Equations of Computational Anatomy
Michael Miller

Visual Analysis of Variance: A Tool for Quantitative Assessment of fMRI Data Processing and Analysis
William F. Eddy and R. L. McNamee

Positron Emission Tomography: Image Formation and Analysis
Richard Leahy


Corrected Paper from Volume 32

Is Cross-Validation the Best Approach for Principal Component and Ridge Regression?
Roy E. Welsch