The inner loop in pFL() estimates the cost of opening a new cluster center. Perforating this loop allows the application to make a less accurate estimate of this cost more quickly. The following comment appears above the definition of the loop bound (ITER) in the source code:

/* higher ITER --> more likely to get correct number of centers */

/* higher ITER also scales the running time almost linearly */

This comment reflects the fact that the original version of the loop computes an inexact approximation of the true cost, with the number of iterations controlling a performance vs. quality of service tradeoff. In effect, the perforation space exploration space algorithm rediscovers this tradeoff.

`// This function is defined in streamcluster.cpp in the streamcluster source code`

float pFL(Points *points, int *feasible, int numfeasible,

float z, long *k, double cost, long iter, float e,

int pid, pthread_barrier_t* barrier)

{

#ifdef ENABLE_THREADS

pthread_barrier_wait(barrier);

#endif

long i;

long x;

double change;

long numberOfPoints;

change = cost;

/* continue until we run iter iterations without improvement */

/* stop instead if improvement is less than e */

while (change/cost > 1.0*e) {

change = 0.0;

numberOfPoints = points->num;

/* randomize order in which centers are considered */

if( pid == 0 ) {

intshuffle(feasible, numfeasible);

}

#ifdef ENABLE_THREADS

pthread_barrier_wait(barrier);

#endif

// This loop is perforated, changing i++ to i+=2

for (i=0;i<iter;i++) {

x = i%numfeasible;

change += pgain(feasible[x], points, z, k, pid, barrier);

}

cost -= change;

#ifdef PRINTINFO

if( pid == 0 ) {

fprintf(stderr, "%d centers, cost %lf, total distance %lf\n",

*k, cost, cost - z*(*k));

}

#endif

#ifdef ENABLE_THREADS

pthread_barrier_wait(barrier);

#endif

}

return(cost);

}

Our examination of the source code confirms that perforating this loop

(as the comment suggests) leaves the application able to process all inputs.