Applying Bayesian networks for query selectivity estimation in query optimizer of Oracle DBMS

Dariusz R Augustyn, Łukasz Warchał

Abstract


The paper presents applying Bayesian network-based method of a selectivity estimation. The query selectivity allows estimate query result size, which allows to choose the optimal method of query execution. Obtaining the selectivity for a query with a selection condition based on many attributes, requires an estimator of a multidimensional probability density function of attribute values. Bayesian network can be used as a memory-efficient representation of the multidimensional distribution of attribute values. The article shows Bayesian network approach applied for extending the functionality of the query optimizer. Some Weka modules are used for implementing Bayesian network-based selectivity estimation in Oracle DBMS optimizer.

Keywords


query execution plan; query selectivity estimation; estimation of a multidimensional attribute value distribution; Bayesian network; Weka; query optimizer of Oracle DBMS

Full Text:

PDF (Polski)

References


Getoor L., Taskar B., Koller D.: Selectivity estimation using probabilistic modes. Proc. of ACM SIGMOD Int. Conf. on Management of Data. ACM, New York 2001.

Possala V., Ioannidis Y. E.: Selectivity Estimation without the Attribute Value Independence Assumption. Proc. of the 23rd Int. Conf. on Very Large Databases, The VLDB Journal, Athens 1997.

Gunopulos D., Kollios G., Tsotras V.J.: Approximating Multi-Dimensional Aggregate Range Queries Over Real Attributes. ACM SIGMOD 2000, Dallas 2000.

Lee. J., Deok-Hwan K. Chin-Wan Ch.: Multi-dimensional Selectivity Estimation Using Compressed Histogram Estimation Information. Proc. of ACM SIGMOD Int. Conf. on Management of Data. ACM, Philadelphia 1999.

Chakrabarti K. Garofalakis M., Rastogi R., Shim K: Approximate Query Processing Using Wavelets. VLDB Journal, vol. 10, no. 2-3, Springer-Verlag. New York 2001.

Bruno N., Chaudhuri S., Gravano L. (2001) STHoles: a multidimensional workload-aware histogram. Proc. of ACM SIGMOD hit. Conf. on Management of Data 30(2). ACM, New York 2001.

Kullback-Leibler divergence - Wikipedia (2010). http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence.

Witten I.H., Frank E.: Data Mining: Practical Machine Learning Tools and Techniques (2 edition), Morgan Kaufmann, 2005.

Weka 3 - Data Mining with Open Source Machine Learning Software in Java (2010). http://www.cs.waikato.ac.nz/ml/weka.

Bouckaert R. R.: Bayesian Network Classifiers in Weka for Version 3-5-8. University of Waikato 2008.

Accessing Databases from WEKA: http://weka.wikispaces.com/Databases

Oracle 10g. Using extensible optimizer (2010). http://download.oracle.com/docs/cd/B14117 01 /appdev.101/b10800/dciextopt.htm.




DOI: http://dx.doi.org/10.21936/si2011_v32.n1A.107