20180212:TP:C:Threads:Intro

De wiki-prog
Révision de 23 janvier 2018 à 13:07 par Slashvar (discuter | contributions) (Introduction)

Aller à : navigation, rechercher


Introduction

This session spans across two weeks and is dedicated to multi-threading, it's intended to be an introduction to basic multi-threading (first part) and an exploration of some basic parallel algorithms.

Note that this session is long, but the first exercises are really basic and are just here for playing with thread creation, if you get the idea, you can skip to more interesting content like parallel sum.

Provided Makefile

Here is the simple Makefile that will be able to compile all questions from this session.

# Makefile
 
CC=gcc -fsanitize=address
CPPFLAGs=
CFLAGS= -Wall -Wextra -std=c99 -O2 -pthread
LDFLAGS=
LDLIBS=
 
all:
 
clean:
	rm -f *.o
 
# END

Hellos world

The first step is to start multiple threads. You'll need the following functions:

int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg);
int pthread_join(pthread_t thread, void **retval);
void pthread_exit(void *retval);

Our goal is to start a certain number of threads (we take the number on the command line) and make them print a simple message.

V1: all the same

The first version is pretty simple: we want to print Hello from thread ! in each thread.

Your program must behave like this:

shell> make hello_simple
gcc -fsanitize=address -Wall -Wextra -std=c99 -O2 -pthread    hello_simple.c   -o hello_simple
shell> ./hello_simple 2
Hello from thread !
Hello from thread !
shell> ./hello_simple 8
Hello from thread !
Hello from thread !
Hello from thread !
Hello from thread !
Hello from thread !
Hello from thread !
Hello from thread !
Hello from thread !

In order to manage creation and join, you need to keep track of thread handles, an array of phtread_t is the best choice.

// pseudo code for example only !
pthread_t thread_handles[42];
// ...
int e;
// run 3rd thread
e = pthread_create(thread_handles + 3, NULL, hello, "I'm dummy code copier");
// hello(arg) is the thread start routine,
// its argument should be the string to be printed
if (e != 0) // oups something goes wrong ...
  abort();  // use better error handling ...

Note about error handling: pthread_create(3) does not (may not?) set errno like traditional syscall, if you want to use err(3) in order to have a correct error display, you have to set errno to the return value of pthread_create(3).

V2: identified thread

Now, we want something similar, but each thread will have an id number (provided by the thread creator.)

Again, here is an expected output:

shell> make hello_id
gcc -fsanitize=address -Wall -Wextra -std=c99 -O2 -pthread    hello_id.c   -o hello_id
shell> ./hello_id 4
<00>: Hello from thread !
<01>: Hello from thread !
<03>: Hello from thread !
<02>: Hello from thread !
shell> ./hello_id 8
<07>: Hello from thread !
<00>: Hello from thread !
<01>: Hello from thread !
<04>: Hello from thread !
<03>: Hello from thread !
<05>: Hello from thread !
<06>: Hello from thread !
<02>: Hello from thread !

In order to keep your data passing clean, the recommended way is to use a struct storing the message to be printed (even if it's the same for all threads) and the thread id.

struct th_arg {
  const char *msg;
  size_t id;
};

Then, you create an array of this struct and initialize each cell with the message and a different id (use the cell index), and pass the pointer to the cell to the thread (pthread_create(3) last parameter.)

// pseudo code for example only !
struct th_arg data[42];
(data + 3)->msg = "I'm dummy code copier";
(data + 3)->id = 3;
pthread_create(thread_handles + 3, NULL, hello, data + 3);

Who is reading stdin ?

What happen when two threads read from the standard input (or any other input) ? Which thread gets the input ?

The best way to solve that is to test !

Let's write an echo function:

void echo(size_t id) {
  int r;
  char buf[256];
  while ( (r = read(STDIN_FILENO, buf, 255)) ) {
    if (r == -1) {
      if (errno == EINTR) continue;
      err(2, "Issue reading from stdin in thread %zu", id);
    }
    buf[r] = 0;
    printf("<%02zu>: %s", id, buf);
  }
}

On the same idea as previous exercise, launch multiple threads (that will run the echo function) and see who gets inputs from stdin.

Compile, run, test, learn !

Timing

First, in order to see if your parallel code is interesting, you need to be able to correctly time your algorithms and only the part you care (not init or other stuff.)

For that we need to have access to a correct wall-clock, this is provided by clock_gettime(2), using CLOCK_MONOTONIC. This function fills a structure timespec of the form:

struct timespec {
  time_t   tv_sec;        /* seconds */
  long     tv_nsec;       /* nanoseconds */
};

Timing a piece code is simply done by getting time before the code is ran and just after it, then the difference between the two points gives you the elapsed time. So, the first things you need is a function able to compute differences between two points in time, since we only care about a readable value to display, we'll return the difference in seconds as a double precision floating point number (double.)

  • Implement the following function:
double double_difftime(struct timespec *t0, struct timespec *t1);

Eventually, you may want to wrap the timing block of code in a macro like this one:

# define TIMING_CODE(BLK__, RES__)	    \
    do {				    \
      struct timespec t0, t1;		    \
      clock_gettime(CLOCK_MONOTONIC, &t0);  \
      BLK__;				    \
      clock_gettime(CLOCK_MONOTONIC, &t1);  \
      RES__ = double_difftime(&t0, &t1);    \
    } while (0)

That can then ben call like this:

  double runtime;
  TIMING_CODE( (myfunc(...), anotherfunc(...)), runtime);

Or, using statement-expression (GNU extension):

  double runtime;
  TIMING_CODE( ({
      myfunc(...);
      anotherfunc(...);
    }), runtime);

Array Sum

We'll implement several strategy to implement simple array sum.

Basic block split

In this version, we divide the array in simple chunks, one for each threads, compute the sum in each chunk using a thread and then sum the result. The pseudo-code is as follow:

lin_array_sum(begin, end) : compute the sum of the cells between begin and end

run(data) :
  data.result = lin_array_sum(data.begin, data.end)
  exit thread

basic_split_sum(begin, end, nbthr):
  workers: array of nbthr thread handles
  data: array of nbthr thread's data (containing a begin, an end and a double for result)
  len = (end - begin) / nbthr
  cur = begin
  for i in [0, nbthr[:
    data[i].begin = cur
    data[i].end = min(end, cur + len)
    create_thread(workers[i], run, data[i])
    cur = cur + len
  result = 0
  for i in [0, nbthr[:
    thread_join(workers[i])
    result = result + data[i].result
  return result
  • Implement the described strategy and compare time with run on a simple call on linear sum. Try with various number of threads.

Divide and Conquer

Now we want to implement the D'n'C strategy, without threads the algorithm look like this:

dnd_sum(begin, end, threshold):
  if end - begin < threshold:
    return lin_array_sum(begin, end)
  mid = begin + (end - begin) / 2
  a = dnd_sum(begin, mid)
  b = dnd_sum(mid, end)
  return a + b

In order to do that with threads properly (beware the version seen in lecture won't work, probably) you need a run function that take thread's data, run the code, set the result in thread data and leave current thread.

void* worker(void *arg_) {
  struct arg *arg = arg_;
  arg->result = dnd_sum(arg->begin, arg->end, arg->threshold);
  pthread_exit(NULL);
}

You also need to compute the threshold in order to run a maximum number of threads (fixed). If you apply the correct strategy, the threshold can be obtained by just dividing the length of the array by the number of threads. The correct strategy is to only replace one recursive call by a new thread.

  • Implement and compare with the previous version.

Vector dot product

The dot-product of two vectors of same dimensions is defined as:

dotproduct(v1, v2):
  sum = 0
  for i in [0, len(v1)[:
    sum = sum + v1[i] * v2[i]
  return sum
  • Adapt previous code to compute dot-product in parallel.

Finding minimum

Your goal now is to apply the same ideas to find the minimum of the application of a function on every value of the array. In order to test, consider the following function (you'll need to implement it):

/* next_prime(x) returns the first prime number after x */
/* x <= next_prime(x) and for all prime number y: x <= next_prime(x) <= y */
unsigned long next_prime(unsigned_long x);
  • Implement, apply, compare.