The influence of selected metrics on the result of an examination of clusters

Łukasz Paśko, Galina Setlak

Abstract


This paper presents metrics used to measure the distance between objects in a feature space. Analyses were performed on seven datasets. For each dataset, the occurrence of clusters of similar objects was examined and measures of cluster dispersion were calculated. The calculations were carried out using fourteen metrics known from the literature. The article reports selected results, with particular emphasis on the differences that arise from using different metrics.
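As a rough illustration of how the choice of metric can change a dispersion measure, the sketch below computes the mean distance from each object to the cluster centroid under three common metrics. The abstract does not name the fourteen metrics or the dispersion measures used in the paper, so the Euclidean, Manhattan, and Chebyshev distances, the `dispersion` helper, and the sample points are all assumptions made for illustration only.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences (taxicab distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))

def dispersion(cluster, metric):
    # Hypothetical dispersion measure: mean distance of objects to the centroid.
    dim = len(cluster[0])
    centroid = [sum(p[i] for p in cluster) / len(cluster) for i in range(dim)]
    return sum(metric(p, centroid) for p in cluster) / len(cluster)

# Made-up sample cluster in a two-dimensional feature space.
cluster = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
results = {m.__name__: dispersion(cluster, m)
           for m in (euclidean, manhattan, chebyshev)}

for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

For the same three points, the dispersion value differs with each metric, which is the kind of difference the paper sets out to examine.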

Keywords


data mining; metrics; measures of the quality of clusters

DOI: http://dx.doi.org/10.21936/si2015_v36.n1.720