Influence of the virtual machine manager on the data mining system performance

Dariusz Czerwiński

Abstract


This paper presents a comparative analysis of how the virtual machine manager (hypervisor) affects the performance of data mining systems. The discussion is based on results obtained in a test environment built on the Cloudera Hadoop distribution, used as a personal cluster. The main focus is the impact of the hypervisor on typical operations in a data mining system, such as parallelized calculations, memory operations, and the use of CPU resources.

Keywords


systems efficiency; virtualization; data mining systems; Cloudera Hadoop

Full Text:

PDF (Polish)


DOI: http://dx.doi.org/10.21936/si2012_v33.n3A.127