An R package for induction and evaluation of classification rules

Wojciech Malara, Marek Sikora, Łukasz Wróbel

Abstract


This paper presents an R package for the induction and evaluation of classification rules. The implemented rule induction algorithm follows the covering (separate-and-conquer) strategy. A distinctive feature of the algorithm is that different rule quality measures can be used for growing and for pruning rules. The implementation is among the first of its kind available for the R environment.
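
The covering strategy with separate growing and pruning measures can be illustrated with a minimal sketch. It is written in Python rather than R and is not the package's API or the authors' exact algorithm: the function names (`grow`, `prune`, `induce`), the two measures (`precision`, `wracc`), the restriction to nominal attributes, and the toy data are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[Dict[str, str], str]             # (attribute values, class label)
Condition = Tuple[str, str]                      # elementary condition: attribute == value
Measure = Callable[[int, int, int, int], float]  # (p, n, P, N) -> rule quality


def covers(conds: List[Condition], x: Dict[str, str]) -> bool:
    """A rule covers an example when all of its conditions hold."""
    return all(x.get(a) == v for a, v in conds)


def counts(conds: List[Condition], data: List[Example], target: str):
    """Return (p, n, P, N): covered positives/negatives and class totals."""
    P = sum(1 for _, y in data if y == target)
    N = len(data) - P
    p = sum(1 for x, y in data if y == target and covers(conds, x))
    n = sum(1 for x, y in data if y != target and covers(conds, x))
    return p, n, P, N


def precision(p, n, P, N):
    """Illustrative measure: fraction of covered examples that are positive."""
    return p / (p + n) if p + n else 0.0


def wracc(p, n, P, N):
    """Illustrative measure: weighted relative accuracy."""
    if p + n == 0 or P + N == 0:
        return 0.0
    return (p + n) / (P + N) * (p / (p + n) - P / (P + N))


def grow(data: List[Example], target: str, measure: Measure) -> List[Condition]:
    """Add the best-scoring condition until no negative example is covered."""
    conds: List[Condition] = []
    while counts(conds, data, target)[1] > 0:
        # Candidate conditions come from positives the rule still covers.
        candidates = {(a, v) for x, y in data
                      if y == target and covers(conds, x)
                      for a, v in x.items()} - set(conds)
        if not candidates:
            break

        def score(c: Condition):
            p, n, P, N = counts(conds + [c], data, target)
            return measure(p, n, P, N), p        # ties broken by coverage

        conds.append(max(sorted(candidates), key=score))
    return conds


def prune(conds: List[Condition], data: List[Example], target: str,
          measure: Measure) -> List[Condition]:
    """Drop conditions while doing so does not lower the pruning measure."""
    conds = list(conds)
    improved = True
    while improved and len(conds) > 1:
        improved = False
        base = measure(*counts(conds, data, target))
        for c in conds:
            shorter = [d for d in conds if d != c]
            if measure(*counts(shorter, data, target)) >= base:
                conds, improved = shorter, True
                break
    return conds


def induce(data: List[Example], target: str,
           grow_measure: Measure, prune_measure: Measure) -> List[List[Condition]]:
    """Covering loop: learn a rule, remove the positives it covers, repeat."""
    rules: List[List[Condition]] = []
    remaining = list(data)
    while any(y == target for _, y in remaining):
        conds = prune(grow(remaining, target, grow_measure),
                      remaining, target, prune_measure)
        if not conds or not any(y == target and covers(conds, x)
                                for x, y in remaining):
            break                                # no progress possible
        rules.append(conds)
        remaining = [(x, y) for x, y in remaining
                     if not (y == target and covers(conds, x))]
    return rules


# Hypothetical toy data set, used only to exercise the sketch.
toy_data: List[Example] = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "yes"}, "stay"),
]
rules = induce(toy_data, "play", grow_measure=precision, prune_measure=wracc)
```

Passing the growing and pruning measures as separate arguments mirrors the feature described in the abstract: any function of the covered and total counts of positive and negative examples can be supplied in either role.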

Keywords


data mining; classification; classification rules

Full Text:

PDF (Polish)

DOI: http://dx.doi.org/10.21936/si2013_v34.n2B.71