20160208:TP:C:Threads:Algos
Sommaire
Introduction
Our purpose this week is to use threads for performing interesting stuff.
Timing
First, in order to see if your parallel code is interesting, you need to be able to correctly time your algorithms and only the part you care (not init or other stuff.)
For that we need to have access to a correct wall-clock, this is provided by clock_gettime(2), using CLOCK_MONOTONIC. This function fills a structure timespec of the form:
struct timespec { time_t tv_sec; /* seconds */ long tv_nsec; /* nanoseconds */ };
Timing a piece code is simply done by getting time before the code is ran and just after it, then the difference between the two points gives you the elapsed time. So, the first things you need is a function able to compute differences between two points in time, since we only care about a readable value to display, we'll return the difference in seconds as a double precision floating point number (double.)
- Implement the following function:
double double_difftime(struct timespec *t0, struct timespec *t1);
Eventually, you may want to wrap the timing block of code in a macro like this one:
# define TIMING_CODE(BLK__, RES__) \
do { \
struct timespec t0, t1; \
clock_gettime(CLOCK_MONOTONIC, &t0); \
BLK__; \
clock_gettime(CLOCK_MONOTONIC, &t1); \
RES__ = double_difftime(&t0, &t1); \
} while (0)
That can then ben call like this:
double runtime; TIMING_CODE( (myfunc(...), anotherfunc(...)), runtime);
Or, using statement-expression (GNU extension):
double runtime; TIMING_CODE( ({ myfunc(...); anotherfunc(...); }), runtime);
Array Sum
We'll implement several strategy to implement simple array sum.
Basic block split
In this version, we divide the array in simple chunks, one for each threads, compute the sum in each chunk using a thread and then sum the result. The pseudo-code is as follow:
lin_array_sum(begin, end) : compute the sum of the cells between begin and end run(data) : data.result = lin_array_sum(data.begin, data.end) exit thread basic_split_sum(begin, end, nbthr): workers: array of nbthr thread handles data: array of nbthr thread's data (containing a begin, an end and a double for result) len = (end - begin) / nbthr cur = begin for i in [0, nbthr[: data[i].begin = cur data[i].end = min(end, cur + len) create_thread(workers[i], run, data[i]) cur = cur + len result = 0 for i in [0, nbthr[: thread_join(workers[i]) result = result + data[i].result return result
- Implement the described strategy and compare time with run on a simple call on linear sum. Try with various number of threads.
Divide and Conquer
Now we want to implement the D'n'C strategy, without threads the algorithm look like this:
dnd_sum(begin, end, threshold): if end - begin < threshold: return lin_array_sum(begin, end) mid = begin + (end - begin) / 2 a = dnd_sum(begin, mid) b = dnd_sum(mid, end) return a + b
In order to do that with threads properly (beware the version seen in lecture won't work, probably) you need a run function that take thread's data, run the code, set the result in thread data and leave current thread.
void* worker(void *arg_) { struct arg *arg = arg_; arg->result = dnd_sum(arg->begin, arg->end, arg->threshold); pthread_exit(NULL); }
You also need to compute the threshold in order to run a maximum number of threads (fixed). If you apply the correct strategy, the threshold can be obtained by just dividing the length of the array by the number of threads. The correct strategy is to only replace one recursive call by a new thread.
- Implement and compare with the previous version.
Vector dot product
The dot-product of two vectors of same dimensions is defined as:
dotproduct(v1, v2): sum = 0 for i in [0, len(v1)[: sum = sum + v1[i] * v2[i] return sum
- Adapt previous code to compute dot-product in parallel.
Finding minimum
Your goal now is to apply the same ideas to find the minimum of the application of a function on every value of the array. In order to test, consider the following function (you'll need to implement it):
/* next_prime(x) returns the first prime number after x */ /* x <= next_prime(x) and for all prime number y: x <= next_prime(x) <= y */ unsigned long next_prime(unsigned_long x);
- Implement, apply, compare.