Wednesday, November 25, 2009

streamcluster Perforated Loops

This benchmark implements an online approximation of the K-means clustering algorithm~\cite{hartigan1979k}. It partitions a set of points into clusters, with each cluster centered around one of the input points. The algorithm is designed to process sets of points from a space of features, with the clusters corresponding to entities with similar features.

The inner loop in pFL() estimates the cost of opening a new cluster center. Perforating this loop allows the application to make a less accurate estimate of this cost more quickly. The following comment appears above the definition of the loop bound (ITER) in the source code:
/* higher ITER --> more likely to get correct number of centers */
/* higher ITER also scales the running time almost linearly */


This comment reflects the fact that the original version of the loop computes an inexact approximation of the true cost, with the number of iterations controlling a performance vs. quality of service tradeoff. In effect, the perforation space exploration space algorithm rediscovers this tradeoff.

// This function is defined in streamcluster.cpp in the streamcluster source code
float pFL(Points *points, int *feasible, int numfeasible,
float z, long *k, double cost, long iter, float e,
int pid, pthread_barrier_t* barrier)
{
#ifdef ENABLE_THREADS
pthread_barrier_wait(barrier);
#endif
long i;
long x;
double change;
long numberOfPoints;

change = cost;
/* continue until we run iter iterations without improvement */
/* stop instead if improvement is less than e */
while (change/cost > 1.0*e) {
change = 0.0;
numberOfPoints = points->num;
/* randomize order in which centers are considered */

if( pid == 0 ) {
intshuffle(feasible, numfeasible);
}
#ifdef ENABLE_THREADS
pthread_barrier_wait(barrier);
#endif
// This loop is perforated, changing i++ to i+=2
for (i=0;i<iter;i++) {
x = i%numfeasible;
change += pgain(feasible[x], points, z, k, pid, barrier);
}

cost -= change;
#ifdef PRINTINFO
if( pid == 0 ) {
fprintf(stderr, "%d centers, cost %lf, total distance %lf\n",
*k, cost, cost - z*(*k));
}
#endif
#ifdef ENABLE_THREADS
pthread_barrier_wait(barrier);
#endif
}
return(cost);
}



Our examination of the source code confirms that perforating this loop
(as the comment suggests) leaves the application able to process all inputs.