Principal component analysis in query selectivity estimation

Dariusz R Augustyn


Query selectivity makes it possible to estimate the size of a query result. This estimate is required for choosing the optimal method of query execution, which is the main goal of the query optimizer. Calculating the selectivity of a query with a complex multi-attribute selection condition requires a non-parametric estimator of the multi-dimensional probability density function of the distribution of table attribute values. Representing a multi-dimensional distribution with a multi-dimensional histogram is very space-consuming in high dimensions. An approach based on Principal Component Analysis (PCA) reduces the dimensionality and makes the representation space-efficient. Additionally, the attribute value independence rule (multiplying simple, single-attribute selectivities) may be applied in the dimension-reduced space, which makes PCA-based selectivity estimation simpler and more effective. The paper also presents an implementation of the proposed solution in the Oracle DBMS as an extension of the query optimizer, built with the Oracle Data Cartridge Interface Statistics (ODCIStats) module.
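The idea of applying the attribute value independence rule after a PCA rotation can be illustrated with a minimal Python/NumPy sketch. It is not the paper's Oracle ODCIStats implementation: the function names `fit_pca_histograms` and `estimate_selectivity` are hypothetical, and the choices of equal-width histograms and Monte Carlo integration over the query box are assumptions made for illustration. Because the PCA rotation is orthogonal (its Jacobian is 1), treating the principal components as independent means the density at a point in the original attribute space is approximated by the product of the 1-D densities of its principal-component coordinates. This sketch keeps all components; the space saving comes from storing d one-dimensional histograms instead of one d-dimensional grid, and how the paper treats discarded low-variance components is not reproduced here.

```python
import numpy as np

def fit_pca_histograms(data, bins=20):
    """Hypothetical helper: rotate data into PC space and build one 1-D histogram per component."""
    mean = data.mean(axis=0)
    cov = np.cov(data - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # columns of eigvecs are the principal axes
    order = np.argsort(eigvals)[::-1]          # sort axes by decreasing variance
    eigvecs = eigvecs[:, order]
    scores = (data - mean) @ eigvecs           # coordinates of each row in PC space
    hists = [np.histogram(scores[:, j], bins=bins) for j in range(scores.shape[1])]
    return mean, eigvecs, hists

def estimate_selectivity(model, low, high, n_samples=20000, seed=0):
    """Hypothetical helper: Monte Carlo estimate of P(low <= X <= high),
    assuming independent principal components (AVI applied in PC space)."""
    mean, eigvecs, hists = model
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    # Uniform sample points inside the query box, given in original attribute coordinates.
    points = rng.uniform(low, high, size=(n_samples, low.size))
    scores = (points - mean) @ eigvecs
    density = np.ones(n_samples)
    for j, (counts, edges) in enumerate(hists):
        pdf = counts / (counts.sum() * np.diff(edges))   # histogram read as a density
        idx = np.searchsorted(edges, scores[:, j], side="right") - 1
        inside = (idx >= 0) & (idx < counts.size)        # points outside the histogram get 0
        comp = np.zeros(n_samples)
        comp[inside] = pdf[idx[inside]]
        density *= comp                                  # product of 1-D component densities
    volume = np.prod(high - low)
    return float(density.mean() * volume)                # integral of the density over the box
```

For example, on strongly correlated two-attribute data the estimate for a range predicate on both attributes can be compared with the fraction of rows that actually satisfy it. With b buckets per histogram, this representation stores d·b bucket counts rather than the b^d cells of a full d-dimensional equal-width histogram.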


query optimizer; PCA; query selectivity; Oracle ODCIStats; histogram

