Abstract:
There is a high necessity to refresh the data mining results, as the former results become stale and obsolete over time due to dynamic and evolving data. Clustering is one of the important data mining techniques that help to group data points with similarity together. To mine the data generated exponentially in these days, MapReduce, a parallel programming framework can be combined MapReduce with the k-medoids clustering algorithm to arrive at the optimum results quickly. Due to the parallel processing architecture of Hadoop, the proposed iterative algorithm for processing incremental data using an intermediate key file exhibited better performance over conventional k-medoids.