Table of Contents

Short Courses

Random Forests
Leo Breiman and Adele Cutler

Gene Expression Analysis
Rafael A. Irizarry

Papers and Presentations

Comparison of Classification Techniques in Bioinformatics
Rashpal Ahluwalia and Sundar Chidambaram

Visualizing Patients Treated with Three-Dimensional Computed Tomography-Guided Brachytherapy
Faleh Alshameri and Jee Vang

Extending the Loop Design for Microarray Experiments
Naomi S. Altman

Mixed Effects Model for Assessing RNA Degradation in Affymetrix GeneChip experiments
Kellie J. Archer, Suresh E. Joel, and Viswanathan Ramakrishnan

Cherry picking: A new robustness tool
David Banks and Leanna House

Modeling Dinucleotide Density Fluctuations in Genome Sequences
R. H. Baran

Downdating and other operations on a truncated complete orthogonal decomposition
Jesse Barlow

Mining Distance-Based Outliers in Near Linear Time
Stephen D. Bay and Mark Schwabacher

Many features, few samples: From cheminformatics to bioinformatics
Kristin Bennett

Five Hierarchical Levels of Sequence-Structure Correlation in Proteins
Christopher Bystroff

Evaluating Natural Language Processing Applications Applied to Outbreak and Disease Surveillance
Wendy W. Chapman, John N. Dowling, Oleg Ivanov, Per H. Gesteland, Robert T. Olszewski, Jeremy U. Espino, and Michael M. Wagner

Learning Imbalanced Data with Random Forests
Chao Chen, Andy Liaw, and Leo Breiman

Cancer classification using informative gene profiles
Xue-Wen Chen

Limitations of Statistical Learning from Gene Expression Data
Tianjiao Chu

The Restricted Partition Method for Detecting Epistatic Interactions Contributing to a Quantitative Trait
Robert Culverhouse, Tsvika Klein, Mary Relling, and William Shannon

The MiTAP System for Monitoring Reports of Disease Outbreak
Laurie E. Damianos, Guido Zarrella, and Lynette Hirschman

Cluster Substance Identification via Conditional Entropy Calculations
James C. Diggans and Jeffrey L. Solka

A Wavelet-Based Statistical Analysis of fMRI Data: I. Motivation and Data Distribution Modeling
Ivo D. Dinov, John W. Boscardin, Michael S. Mega, Elizabeth L. Sowell, and Arthur W. Toga

On the Least Median Square Problem
Jeff Erickson, Sariel Har-Peled, and David Mount

Modeling continuous shape change for facial animation
Julian Faraway

Sequential model list selection for function approximation
Ernest Fokoue

Jointly Optimizing Model Complexity and Data-Processing Parameters with Mixed-Input SPSA
Jim Garrett

Gene Expression Comparisons for Class Prediction in Cancer Studies
Donald Geman, Christian d'Avignon, Daniel Q. Naiman, Raimond L. Winslow, and Arnaud Zeboulon

Estimating the Parameters of Infinite Scale Mixtures of Normals
Hasan Hamdan and John P. Nolan

Yxilon - Designing the Next Generation, Vertically Integrable Statistical Software Environment
Wolfgang Härdle, Sigbert Klinke, and Uwe Ziegenhagen

Computation of the kth Nearest Neighbor Estimate of Entropy of Molecules Using Parallel Processing
E. James Harner, Jun Tan, Shengqiao Li, and Harshinder Singh

Assessing Survival Forests for Prognosis Based on Gene Profiles
Thu M. Hoàng and Van L. Parsons

Application of the Random Forest Classification Algorithm to a SELDI-TOF Proteomics Study in the Setting of a Cancer Prevention Trial
Grant Izmirlian

Characterization and Re-Annotation of Common Genes Found in 35 Complete Chloroplast Genomes
Beatrice Kilel

Structural Analysis of Network Traffic Flows
Eric Kolaczyk, Anukool Lakhina, Dina Papagiannaki, Mark Crovella, Christophe Diot, and Nina Taft

Assessment of the Relative Therapeutic Effect in Small Groups at Several Time Points: Comparison of Mucosal and Subcutaneous Peptide Vaccines in Rhesus Macaques Exposed to SHIV
V.A. Kuznetsov, V.S. Stepanov, J.A. Berzofsky, and I.M. Belyakov

Subsampling Model Selection in Neural Networks for Nonlinear Time Series Analysis
Michele La Rocca and Cira Perna

Structured multicategory support vector machine with ANOVA decomposition
Yoonkyung Lee

Intersection Graphs for Text Analysis
Elizabeth Leeds and David J. Marchette

Shakespeare: A Combinatoric to Gene Network Modularization
Nicholas Lewin-Koh and Christopher Taylor

Classification of high dimensional data by two-way mixture models
Jia Li

User Profiling in Window Title and Process Table
Chien-Chih Lin, Eun Young Noh, Youngping Yan, and Edward Wegman

Cancer prediction with kernel PLS and gene expression profile
Zhenqiu Liu, Dechang Chen, and Jacques Reifman

The Volume-of-Tube Formula: Computational Methods and Statistical Applications
Catherine Loader

SVD-based functional ANOVA for measurement evaluation of MALDI-TOF mass spectrometry of polymers
Z. Q. John Lu

Monte Carlo Analysis of Univariate Statistical Outlier Techniques
Mark W. Lukens

Automated Phenotypic Networks for the Integration of Heterogeneous Databases
Yves A. Lussier and Xiaoyan Wang

Signal Conditioning and Filtering of SELDI Mass Spectrometry Time Series
Dariya I. Malyarenko, William E. Cooke, Eugene R. Tracy, Haijian Chen, O. John Semmes, Maciek Sasinowski, Michael W. Trosset, and Dennis M. Manos

Confidence-Based Cost-Sensitive Classification Decisions
Dragos D. Margineantu

Classification and Clustering Using Weighted Text Proximity Matrices
Wendy L. Martinez, Angel R. Martinez, and Edward J. Wegman

Polyoptimizing Genetic Algorithm for Feature Subset Selection
Ewy Mathe and John Grefenstette

Identifying Differentially Expressed Proteins in 2-D DIGE Experiments
Yan Ma and E. James Harner

Multivariate Density Estimation for Massive Datasets via Sequential Convex Hull Peeling
James P. McDermott and Dennis K. J. Lin

Supervised Learning Methods for Gene-Expression Data
G. J. McLachlan, C. Ambroise, L. Ben-Tovim Jones, and X. Zhu

Systems biology thought experiments for interpreting epistasis models
Jason Moore

XML-Based Applications in Statistical Analysis
Yuichi Mori, Tomokazu Fujino, Yoshiro Yamamoto, Takafumi Kubota, and Tomoyuki Tarumi

A Comparison of Sequential and False Discovery Rate Algorithms: Computational Experiments for Exploratory DNA Microarray Studies
Danh V. Nguyen

Streaming Graphics
Andrew Norton and Leland Wilkinson

Gene-Gene and Gene-Environment Interactions and Genetic Case-Control Association Studies
Jurg Ott and Josephine Hoh

Wavelet and SiZer Analysis of Internet Traffic Data
Cheolwoo Park, Fred Godtliebsen, Felix Hernandez-Campos, J. S. Marron, Vitaliana Rondonotti, F. Donelson Smith, Stilian Stoev, and Murad Taqqu

Mixture models in molecular classification
Giovanni Parmigiani

Mining distance-based outliers in near linear time
Hanchuan Peng and Fuhui Long

Evolving Classifiers for Knowledge Discovery in Medical and Biological Databases
Michael R. Peterson, Travis E. Doom, and Michael L. Raymer

DNA Microbial and Viral Identification Using Ultra Specific Probes “Blind” to Host Background DNA
Catherine Putonti, George Fox, Richard C. Willson, B. Montgomery Pettitt, and Yuriy Fofanov

Wavelet Domain Linear Inversion via the LASSO
Leming Qu and Partha Routh

Computational Geometry, Data Depth, and Robust Statistics
Eynat Rafalin and Diane Souvaine

Noncentral Generalized F Distributions with Applications to Joint Outlier Detection
Donald E. Ramirez

Actor Allegiance and Block Model Strength
John Rigsby and Jeffrey L. Solka

Performance of the False Discovery Rate for Small Sets of cDNA Microarrays
Simon Rosenfeld

Fitting Large-Scale Spatial Models with Applications to Microarray Data Analysis
Stephan R. Sain and Reinhard Furrer

Bayesian Hierarchical Model of the Browsing Behavior of World Wide Web Users
Juana Sanchez and Ching-Ti Liu

Alternatives to Mixture Modeling in High Dimensions
David W. Scott

Visual Data Mining of RNA Secondary Structure and Folding Pathways as Determined by the Massively Parallel Genetic Algorithm
Bruce A. Shapiro and Wojciech Kasprzak

Parallelizing the Computation of Spatial Covariance in Large Spatial Datasets
James A. Shine

An Efficient Algorithm for Simulating Coalescence with Recombination
Katy L. Simonsen, Dan A. Noland, and Chinh Le

Model-Based Clustering with an Adaptive Mixtures Smart Start
Jeffrey L. Solka and Wendy L. Martinez

Identifying Cross Corpora Document Associations via Minimal Spanning Trees
Jeffrey L. Solka, Avory C. Bryant, and Edward J. Wegman

The Analysis of Biomedical Data - Caveats and Challenges
Ray L. Somorjai

Cramér-Rao Bounds and Monte Carlo Calculation of the Fisher Information Matrix in Difficult Problems
James C. Spall

On the Construction of Discriminant Coordinates from Dissimilarity Data
Michael W. Trosset

Interactive graphics for large data sets - there is more to it than meets the eye
Antony Unwin

Privacy-Preserving k-Means Clustering over Vertically Partitioned Data
Jaideep Vaidya and Chris Clifton

Mining Concept-Drifting Data Streams
Haixun Wang

Visual Analytics for Streaming Internet Traffic
Edward J. Wegman

Performance Metrics for Group-Detection Algorithms
J. V. White, S. Steingold, and C. G. Fournelle

A Two-Stage Nearest-Neighbor Classifier with Application to Microbial Source Tracking
Jayson D. Wilbur

Having it all
Allan Wilks

Indexing Continual Range Queries for Efficient Stream Processing
Kun-Lung Wu, Shyh-Kwei Chen, and Philip S. Yu

Visual Analytics for Dynamically Conditioned Choropleth Maps: QQplots, Scatterplots, Smoothes, and Two-Way Tables
Chunling Zhang, Yaru Li, and Daniel Carr

Does sequence similarity predict expression similarity?
Kui Zhang