Grossmann, Wilfried (2003), Metadata Usage in Statistical Computing, Computing Science and Statistics, 35, I2003Proceedings/GrossmannWilfried/GrossmannWilfried.paper.pdf
Information about data plays a crucial role in every step of statistical analysis, but (data) management of this information is usually done in an ad hoc manner by the working statistician. Recent developments in statistical computing environments, in particular for data mining, have improved the situation, but a systematic approach is not yet available. In the talk we will outline a model that integrates statistical data and information about the data (i.e., metadata), and show the application of this structure in the context of statistical computing. The model is based on a number of information objects that describe the data in some detail, for example information about the underlying populations, the methods for obtaining the data, or the variables used in the data set together with their roles in the context of the data set. Based on this model, one can define for each statistical procedure the corresponding transformation on the adjoined metadata and describe the modifications of the metadata implied by the statistical procedure. Examples are given for a number of important data pre-processing steps, such as data combination, modification of variables, and weighting.
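The idea of pairing each data operation with a corresponding metadata transformation can be illustrated with a minimal sketch. It is not the model from the paper: the class names (`VariableMeta`, `DatasetMeta`, `Dataset`), the function `derive_variable`, the `history` field, and the sample values are all hypothetical, chosen only to show one pre-processing step (modification of a variable) that updates the data and its metadata together.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class VariableMeta:
    """Hypothetical information object describing one variable."""
    name: str
    role: str       # e.g. "identifier", "measurement", "derived"
    unit: str = ""


@dataclass(frozen=True)
class DatasetMeta:
    """Hypothetical information objects attached to a data set."""
    population: str
    collection_method: str
    variables: Tuple[VariableMeta, ...]
    history: Tuple[str, ...] = ()   # assumed log of applied procedures


@dataclass(frozen=True)
class Dataset:
    rows: Tuple[Dict[str, float], ...]
    meta: DatasetMeta


def derive_variable(ds: Dataset, source: str, target: str,
                    fn: Callable[[float], float], unit: str) -> Dataset:
    """Compute `target` from `source` and record the change in the metadata."""
    if all(v.name != source for v in ds.meta.variables):
        raise KeyError(f"unknown variable: {source}")
    # Data transformation: add the new column to every row.
    new_rows = tuple({**row, target: fn(row[source])} for row in ds.rows)
    # Metadata transformation: describe the new variable and log the step.
    new_var = VariableMeta(name=target, role="derived", unit=unit)
    new_meta = replace(
        ds.meta,
        variables=ds.meta.variables + (new_var,),
        history=ds.meta.history + (f"derived {target} from {source}",),
    )
    return Dataset(rows=new_rows, meta=new_meta)


# Invented sample data, for illustration only.
survey = Dataset(
    rows=({"id": 1, "income_eur": 30000.0}, {"id": 2, "income_eur": 45000.0}),
    meta=DatasetMeta(
        population="hypothetical household sample",
        collection_method="hypothetical survey",
        variables=(
            VariableMeta("id", "identifier"),
            VariableMeta("income_eur", "measurement", "EUR"),
        ),
    ),
)
result = derive_variable(survey, "income_eur", "income_keur",
                         lambda x: x / 1000, "kEUR")
```

Because the function returns a new `Dataset` rather than mutating its input, the original data and metadata remain available for comparison. Data combination or weighting could be sketched in the same way, each with its own rule for how the metadata changes.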