20160208:TP:C:Threads:Algos

Sommaire

1 Introduction
2 Timing
3 Array Sum
- 3.1 Basic block split
- 3.2 Divide and Conquer
4 Vector dot product
5 Finding minimum

Introduction

Our purpose this week is to use threads for performing interesting stuff.

Timing

First, in order to see if your parallel code is interesting, you need to be able to correctly time your algorithms and only the part you care (not init or other stuff.)

For that we need to have access to a correct wall-clock, this is provided by clock_gettime(2), using CLOCK_MONOTONIC. This function fills a structure timespec of the form:

struct timespec {
  time_t   tv_sec;        /* seconds */
  long     tv_nsec;       /* nanoseconds */
};

Timing a piece code is simply done by getting time before the code is ran and just after it, then the difference between the two points gives you the elapsed time. So, the first things you need is a function able to compute differences between two points in time, since we only care about a readable value to display, we'll return the difference in seconds as a double precision floating point number (double.)

Implement the following function:

double double_difftime(struct timespec *t0, struct timespec *t1);

Eventually, you may want to wrap the timing block of code in a macro like this one:

# define TIMING_CODE(BLK__, RES__)	    \
    do {				    \
      struct timespec t0, t1;		    \
      clock_gettime(CLOCK_MONOTONIC, &t0);  \
      BLK__;				    \
      clock_gettime(CLOCK_MONOTONIC, &t1);  \
      RES__ = double_difftime(&t0, &t1);    \
    } while (0)

That can then ben call like this:

  double runtime;
  TIMING_CODE( (myfunc(...), anotherfunc(...)), runtime);

Or, using statement-expression (GNU extension):

  double runtime;
  TIMING_CODE( ({
      myfunc(...);
      anotherfunc(...);
    }), runtime);

Array Sum

We'll implement several strategy to implement simple array sum.

Basic block split

In this version, we divide the array in simple chunks, one for each threads, compute the sum in each chunk using a thread and then sum the result. The pseudo-code is as follow:

lin_array_sum(begin, end) : compute the sum of the cells between begin and end

run(data) :
  data.result = lin_array_sum(data.begin, data.end)
  exit thread

basic_split_sum(begin, end, nbthr):
  workers: array of nbthr thread handles
  data: array of nbthr thread's data (containing a begin, an end and a double for result)
  len = (end - begin) / nbthr
  cur = begin
  for i in [0, nbthr[:
    data[i].begin = cur
    data[i].end = min(end, cur + len)
    create_thread(workers[i], run, data[i])
    cur = cur + len
  result = 0
  for i in [0, nbthr[:
    thread_join(workers[i])
    result = result + data[i].result
  return result

Implement the described strategy and compare time with run on a simple call on linear sum. Try with various number of threads.

Divide and Conquer

Now we want to implement the D'n'C strategy, without threads the algorithm look like this:

dnd_sum(begin, end, threshold):
  if end - begin < threshold:
    return lin_array_sum(begin, end)
  mid = begin + (end - begin) / 2
  a = dnd_sum(begin, mid)
  b = dnd_sum(mid, end)
  return a + b

In order to do that with threads properly (beware the version seen in lecture won't work, probably) you need a run function that take thread's data, run the code, set the result in thread data and leave current thread.

void* worker(void *arg_) {
  struct arg *arg = arg_;
  arg->result = dnd_sum(arg->begin, arg->end, arg->threshold);
  pthread_exit(NULL);
}

You also need to compute the threshold in order to run a maximum number of threads (fixed). If you apply the correct strategy, the threshold can be obtained by just dividing the length of the array by the number of threads. The correct strategy is to only replace one recursive call by a new thread.

Implement and compare with the previous version.

Vector dot product

The dot-product of two vectors of same dimensions is defined as:

dotproduct(v1, v2):
  sum = 0
  for i in [0, len(v1)[:
    sum = sum + v1[i] * v2[i]
  return sum

Adapt previous code to compute dot-product in parallel.

Finding minimum

Your goal now is to apply the same ideas to find the minimum of the application of a function on every value of the array. In order to test, consider the following function (you'll need to implement it):

/* next_prime(x) returns the first prime number after x */
/* x <= next_prime(x) and for all prime number y: x <= next_prime(x) <= y */
unsigned long next_prime(unsigned_long x);

Implement, apply, compare.

20160208:TP:C:Threads:Algos

Sommaire

Introduction

Timing

Array Sum

Basic block split

Divide and Conquer

Vector dot product

Finding minimum

Menu de navigation

Outils personnels

Espaces de noms

Variantes

Affichages

Plus

Rechercher

Plan

Outils