Privacy preserving density-based clustering over horizontally partitioned spatial data

Marcin Gorawski, Marek Łuk

Abstract


The paper proposes a new density-based distributed clustering algorithm, PPDBDC (Privacy Preserving Density-Based Distributed Clustering). The algorithm can be applied to horizontally partitioned spatial data in a data mining process. It builds on two existing distributed clustering algorithms, DBDC and SDBDC. In addition, the presented solution preserves the privacy of local data.

Keywords


knowledge discovery; data mining; clustering; spatial data; data privacy; parallel computing; horizontal data partitioning; DBSCAN; DBDC; SDBDC; PPDBDC

Full Text:

PDF (Polish)





DOI: http://dx.doi.org/10.21936/si2007_v28.n3A.548