Standing on the Shoulders of Giants and Variations on a Theme

James F. Goodnight

Data Mining at Microsoft

David Heckerman

Eavesdropping on the Brain

Terrence Sejnowski

Statistical Data Mining

Edward J. Wegman

Bioinformatics

Pierre Baldi

Making Trees Interactive – KLIMT

Simon Urbanek and Antony R. Unwin

Some Graphics for Recursive Partitioning

Daniel B. Carr and Ru Sun

Predictive Data Mining with Multiple Additive Regression Trees

Jerome H. Friedman

Towards Understanding Boosting

Bin Yu and Peter Bühlmann

Why Does Model Averaging Work?

Yoav Freund

A Model Based Approach to Text Categorization and Clustering

Alejandro Murua, Jeremy Tantrum, Werner Stuetzle and Solveig Sieberts

Unsupervised Segmentation and Classification of Mixtures of Markovian Sources

Yevgeny Seldin, Gill Bejerano, and Naftali Tishby

Evaluating Sequential Tests for a Class of Stochastic Processes

Xiaoping Xiong and Ming Tan

A Comparison of Reversible Jump MCMC algorithms for DNA Sequence Segmentation
Using Hidden Markov Models

Richard J. Boys and Daniel A. Henderson

Functional Analysis of Computer Network Data

J. L. Solka and D. J. Marchette

Inferring Internal Losses and Delays in Communication Networks from Edge
Measurements

Robert Nowak

Texture Modeling Using Self-Similar Wavelets and POMMs

Jennifer Davidson and Richard Barton

The Adaptive Data Cube: An Experiment in Hyperspectral Pattern Recognition

Carey E. Priebe

GGobi: XGobi Redesigned and Extended

Deborah F. Swayne, Duncan Temple Lang, Andreas Buja, and Dianne Cook

Visual Post Analysis of Association Rules

H. Hofmann

Uncovering Complexity in Data through Sound

Mark H. Hansen and Ben Rubin

Causal Inference in Statistics: A Gentle Introduction

Judea Pearl

The Defining Role of "Principal Effects" in Comparing Treatments Using General
Post-Treatment Variables

Constantine E. Frangakis and Donald B. Rubin

Active Learning for Support Vector Machines with Applications to Text Classification

Simon Tong and Daphne Koller

Conditional Random Fields for Text Processing

John Lafferty, Andrew McCallum, and Fernando Pereira

Relevant Encoding of Linguistic Data via the Information Bottleneck Method

Naftali Tishby

A Split Merge Markov Chain Sampling Algorithm for Bayesian Mixture Models

Sonia Jain and Radford M. Neal

Priors for Bayesian Neural Networks

Mark Robinson

Adaptive Metropolis-Hastings Samplers for the Bayesian Analysis of Large Linear
Gaussian Systems

Stephen K. H. Yeung and Darren J. Wilkinson

Genetic Analysis of Melanoma Onset by using Estimating Equations and Bayesian
Random Effects Models

K-A. Do, P. Kuhnert, S-J. Lee, J. F. Aitken, A. Green, and N. G. Martin

GDAGsim: Sparse Matrix Algorithms for Bayesian Computation

Darren J. Wilkinson

Approximations to Dirichlet Processes with Applications

Jayaram Sethuraman

Banks of Interacting Bayesian Filters

Boris L. Rozovskii, R. Blazek and A. Petrov

Data Reduction by Quantization

Edward J. Wegman and Nkem-Amin (Martin) Khumbah

The Principle and Practice of Minimum Description Length

Bin Yu

A Bayesian Approach to Analysis of cDNA Microarray Data

M. A. Black, B. A. Craig, M. Tanurdzic, and R. W. Doerge

A Statistical Analysis of Radiolabeled Gene Expression Data

Rafael A. Irizarry, Giovanni Parmigiani, Mingzhou Guo, Tatiana Dracheva, and Jin Jen

Replication and Appropriate Statistical Analysis are Required for Accurate Interpretation
of DNA Microarray Experiments

She-pin Hung and G. Wesley Hatfield

Identifying Statistically Significant Similarities in Gene Expression Patterns via
Bayesian Infinite Mixture Models

Mario Medvedovic and Siva Sivaganesan

An Interdisciplinary Program Employing Computational, Biochemical and Genomic
Methods to Examine the Effects of Chromosome Structure on the Regulation of Gene
Expression

Lorenzo Tolleri, Craig J. Benham, Pierre Baldi, and G. Wesley Hatfield

Variational Models and Bayesian Estimation

Tommi Jaakkola

Advanced Mean Field Methods for Probabilistic Models

Manfred Opper and Ole Winther

Probability Assessment with Maximum Entropy in Bayesian Networks

Wim Wiegerinck and Tom Heskes

Functional Data Analysis of Complex Computer Simulation Output: A Case Study in
Nuclear Waste Disposal Waste Assessment

David Draper and Bruno Mendes

Integrated Assessment of Drinking Water Regulations

Mitchell J. Small, Patrick Gurian, Mark Schervish, and J. R. Lockwood

Bayesian Sensitivity Analysis and Uncertainty Analysis

Jeremy E. Oakley and Anthony O'Hagan

Sensitivity Analysis of a Buried Radioactive Waste Risk Model

Tom Stockton

Learning to Trade via Direct Reinforcement

John Moody

Statistical Inference, the Bootstrap, and Neural Network Modeling with Application to
Foreign Exchange Rates

Jeff Racine and Halbert White

Dynamic Visualization of Changing Prior and Posterior in Bayesian Analysis

Hani Doss and B. Narasimhan

Nonparametric Clustering

David W. Scott

...Reflections on a Workshop

Jon R. Kettenring

Statistical Learning Problems Associated with the World Wide Web

Byron Dom

Finite State Approaches to Information Extraction

Andrew McCallum, Fernando Pereira, John Lafferty, and Dayne Freitag

Graph Structure in the Web

Andrew Tomkins

Genome-Wide Binding Motif Discovery via Microarray and Prospect Sampler

Jun Liu and Xiaole Liu

Hierarchical Models for Gene Expression Data Analysis

Michael Newton and Christina Kendziorski

Stochastic Models for Sequences with Non-Local Dependency Structure

Scott C. Schmidler

John Tukey and the Correlation Coefficient

David R. Brillinger

On the Interaction between Statistics and Computing: In Memory of John W. Tukey

Luisa Fernholz

The Legacy of John Tukey

Robert L. Launer

Spatio-Temporal Prediction of Incomplete Precipitation Records

Craig Johns and Douglas Nychka

Bayesian and Frequentist Inference for Ecological Inference: The R x C Case

Ori Rosen, Wenxin Jiang, Gary King, and Martin Tanner

Using the Chemical Mass Balance Model to Estimate Pollution Source Contributions
from Correlated Air Quality Observations

William F. Christensen

Land Cover Mapping using Combination and Ensemble Classifiers

Brian M. Steele and David A. Patterson

Mining for Knowledge about Ostracode Assemblages in the Tecolutla River Delta

A. Dale Magoun, Melvin Kontrovitz and Daniel J. Stanley

Developing Data Mining Systems

Arno Siebes

Graphical and Statistical Pruning of Association Rules

Adalbert Wilhelm

Searching the Web: Current Limitations, New Techniques, and Future Directions

C. Lee Giles

How Large is the World Wide Web?

Adrian Dobra and Stephen E. Fienberg

A Tutorial on Support Vector Machines

Bernhard Schölkopf

Kernel Methods for Unsupervised Learning

Bernhard Schölkopf

Graphical Representation as a Discipline

Herman Chernoff

Clustering and Genetics of Complex Disease

Richard Olshen

Multivariate Statistical Process Control and Signature Analysis using Eigenfactor
Detection Methods

Kuang-Han Chen, Duane S. Boning, and Roy E. Welsch

Data Sharpening for Higher-Order Density Estimation

Michael C. Minnotte and Peter Hall

Robust Detection of Multivariate Outliers in High Dimensions and High Levels of
Contamination

Mark Werner and Karen Kafadar

The Complexity of Computing the MCD Estimator

Thorsten Bernholt and Paul Fischer

Finding Committee Solutions by Clustering Models in Function Space

Thomas Ragg

Detecting Novel Samples in Mass Spectral Data: A Clustering Approach

Vladimir Svetnik and Andy I. Liaw

A Computational Approach to Full Nonparametric Bayesian Inference under Dirichlet
Process Mixture Models

Alan E. Gelfand and Athanasios Kottas

Hierarchical Model-Based Clustering for Large Datasets

Christian Posse

Computing Environments for Bayesian Statistics

Robert Gentleman

Stochastic Parameterized Grammars for Bayesian Model Composition

Eric Mjolsness, Michael Turmon, and Wolfgang Fink

The Bayes Net Toolbox for Matlab

Kevin P. Murphy

Data Squashing: Constructing Summary Data Sets

William DuMouchel

Exploratory Analysis of Retail Sales of Billions of Items

Dunja Mladenic, William F. Eddy, and Scott Ziolko

Mining Large Datasets

Johannes Gehrke

Technology and 2010 Census

Carol M. Van Horn

The U. S. Census Bureau's MAF/TIGER System, Internal and External Interfaces

Robert Marx and Linda M. Franz

Assessing Patient Survival using Microarray Gene Expression Data via Partial Least
Squares Proportional Hazard Regression

Danh V. Nguyen and David M. Rocke

Lessons Learned from Analyzing the Differential Gene Expression Data between Normal
and Tumor Tissues in Head and Neck Cancer Patients

J. Jack Lee, Hyung Woo Kim, Feng Zhan, and Adel K. El-Naggar

Taming Genetic Microarray Data: A Paradigm using a Well-Known Case Study

Howard T. Thaler

Statistical Modelling of Micro Array Data

Ziad Taib

Unraveling and Defining Biocomplexity

William K. Michener and James L. Rosenberger

Theoretical and Computational Challenges in Entropy Evaluation of Macromolecules

H. Singh, J. Harner, V. Hnizdo and E. Demchuk

Ciphertext Size Requirement of Ciphertext-Only Attack on Vigenère Cipher

Qiong Yang and Song Guo

Interval Computation of Gamma Probabilities and Their Inverses

Trong Wu

Smooth Quadratures of Volterra Integral Equations with Applications to Estimation of
HIV Infection Rates and Projection of AIDS Incidence

John J. Hsieh

Designing Experiments for Causal Networks

William D. Heavlin

Multi-Layer Structured Correlation Designs for Heterogeneous and Unbalanced
Clustered Data

Edward C. Chao

On Perfect Stability in Characteristic Functions

Jinhyo Kim and Bongsu Ko

An Environment for Creating Interactive Statistical Documents

Samuel E. Buttrey, Deborah Nolan and Duncan Temple Lang

Experiences with a Course on "Web-Based Statistics"

Jürgen Symanzik and Natascha Vukasinovic

ASSIST: A Package for Spline Smoothing in S-Plus Template

Yuedong Wang and Chunlei Ke

JAVA Implementation of Multiple Linear Regression Models for Patient-Specific
Longitudinal Data to Monitor Chemotherapy-Induced Anemia

Christine E. McLaren, Wagner Truppel, Randall F. Holcombe, and Edward L. Kambour

The Development of Community Nutrition Map (CNMap)

Alvin B. Nowverl

Cost Growth Models for NASA's Programs: A Summary

Tze-San Lee and L. Dale Thomas

Series Approximations in Analysis of Risk

Costas A. Christophi and Reza Modarres

An Adequate Statistics for the Exponentially Distributed Censoring Data

P. S. Nair and S-C. Cheng

Comparing Two Measurement Devices: Review and Extensions to Estimate New Device
Variability

Brian J. Eastwood

Computationally Intensive Techniques for a Fully Bayesian, Decision Theoretic
Approach to Financial Forecasting and Portfolio Selection

Andrew Simpson and Darren J. Wilkinson

A Statistical View of the Support Vector Machine

Yi Lin

Lazy Class Probability Estimators

Dragos D. Margineantu and Thomas G. Dietterich

PERT – Perfect Random Tree Ensembles

Adele Cutler and Guohua Zhao

Multicategory Support Vector Machines

Yoonkyung Lee, Yi Lin, and Grace Wahba

Using Pseudo-Predictors to Improve the Performance of a Classification Rule

Majid Mojirsheibani

Inference for Self-Modeling Regression with Random Effects

Naomi Altman

Support Vector Machine Regression in Chemometrics

Ayhan Demiriz, Kristin P. Bennett, Curt M. Breneman, and Mark J. Embrechts

Data-Driven and Optimal Denoising of a Signal and Recovery of Its Derivative Using
Multiwavelets

Nathanial Tymes, Jr., Sam Efromovich, M. Christina Pereyra, and Joseph D. Lakey

RIP-GAMs with an Application to Human Brain Research

Michael G. Schimek

An Adaptive-Learned Temporal Radial Basis Function Network for Recursive Function
Estimation

Yiu Ming Cheung and Lei Xu

A Statistical Approach to the Segmentation of MR Imagery and Volume Estimation of
Stroke Lesions

Benjamin Stein and Joseph Horowitz

Visualizing Spatial Autocorrelation with Dynamically Linked Windows

Luc Anselin, Ibnu Syabri, Oleg Smirnov, and Yanqui Ren

Compressions and Analysis of Very Large Imagery Data Sets using Spatial Statistics

James A. Shine

Statistical Visualization of Environmental Data on the Web using nViZn

Lacey Jones and Jürgen Symanzik

A Principled Approach to Interactive Hierarchical Non-Linear Visualization of
High-Dimensional Data

Peter Tino, Ian Nabney, Yi Sun, and Bruce S. Williams

A Tree-Based Scan Statistic for Database Disease Surveillance

Martin Kulldorff, Zixing Fang, and Stephen Walsh

Creating Ensembles of Decision Trees through Sampling

Chandrika Kamath and Erick Cantú-Paz

Data Mining Diabetic Databases: Are Rough Sets a Useful Addition?

Joseph L. Breault

Model Complexity Based Design of Radial Basis Function Networks with Data Mining
Applications

Miyoung Shin and Amrit L. Goel

Combining Decision Trees using Systematic Patterns

Hyunjoong Kim

Resampling Time Series with Seasonal Components

Dimitris N. Politis

Correlation and Sampling in Relational Data Mining

David Jensen and Jennifer Neville

Inference for the Sample Maximum in the Presence of Serial Correlation and
Heavy-Tailed Distributions

Tucker McElroy and Dimitris N. Politis

BootQC: Bootstrap for Statistical Quality Control and Applications to Aviation Safety
Analysis

Regina Y. Liu and Hueychung Teng

Selection of the Shrinkage Factor for the Two Stage Testimator of the Normal Mean
using Bootstrap Likelihood

Makarand V. Ratnaparkhi, Vasant B. Waikar, and Frederick J. Schuurmann

Comparative Genomics and the Future of Biological Knowledge

Anthony Kerlavage

The Public Working Draft of the Human Genome

David Haussler

Identification of Post-Translationally Modified and Mutated Proteins via
Mass-Spectrometry

Pavel Pevzner

Improved Statistical Inference from DNA Microarray Data using Analysis of Variance
and a Bayesian Statistical Framework

G. Wesley Hatfield

Statistical Issues, Data Analysis, and Modelling for Gene Expression Profiling

Mike West

Plaid Models for DNA Microarrays

Art Owen

Integrating Data and Disciplines: Biostatistics and Biomedical Informatics

Joyce Niland

The Trouble with Text: Challenges and Promises of Biomedical Information Retrieval
Technology

Wanda Pratt

Public Health Aspects of Bioinformatics and Medical Informatics

Abdelmonem A. Afifi

On Metrics and Variational Equations of Computational Anatomy

Michael Miller

Visual Analysis of Variance: A Tool for Quantitative Assessment of fMRI Data
Processing and Analysis

William F. Eddy and R. L. McNamee

Positron Emission Tomography: Image Formation and Analysis

Richard Leahy

Is Cross-Validation the Best Approach for Principal Component and Ridge Regression?

Roy E. Welsch