Abstract:
In this paper we consider how the representation, access,
and organization of data drastically affect the performance
of data mining techniques. The framework we propose uses
vertical data representation, an emerging data
representation technique, combined with a couple of
compression schemes to facilitate efficient data mining that
scales to large datasets. The key to using a compression
scheme in SEED Miner lies in its vertical data
representation, in which data is stored column by column, in
contrast to the conventional horizontal, row-based
representation. We also present the results of empirical
simulations to validate our analysis that WAH (Word-Aligned
Hybrid) compression applied on top of vertical data provides
scalability and efficiency for the applications and
algorithms embedded in SEED Miner.
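
To make the contrast between horizontal and vertical representation concrete, the following is a minimal sketch and not SEED Miner's actual implementation. It assumes a toy transaction dataset, and the function names `to_vertical` and `support` are hypothetical. Each item is mapped to an uncompressed bitmap (a Python integer), so that counting the transactions containing a set of items reduces to a bitwise AND. WAH compression itself is not shown.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Convert row-based transactions into one bitmap per item.

    Bit i of an item's bitmap is set when transaction i contains the item.
    (Illustrative helper, not part of any real library.)
    """
    bitmaps = defaultdict(int)
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] |= 1 << tid
    return dict(bitmaps)


def support(bitmaps, itemset):
    """Count transactions that contain every item in itemset."""
    items = list(itemset)
    if not items:
        return 0
    acc = bitmaps.get(items[0], 0)
    for item in items[1:]:
        acc &= bitmaps.get(item, 0)  # intersect the columns
    return bin(acc).count("1")


# Horizontal (row-based) toy data: one set of items per transaction.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
bitmaps = to_vertical(transactions)
```

With this layout, `support(bitmaps, ["a", "b"])` touches only the two columns involved rather than scanning every row, which is the property that makes bitmap compression schemes attractive on top of vertical data.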