20171120:Practical:C:hashtable

De wiki-prog
Aller à : navigation, rechercher


Introduction

You will implement a traditional hash table using separate chaining (with linked lists) and automatic resizing.

S3#: It is due on Friday, April 27, 2018 at 2pm.

The submission format is:

Submission

  • 20171120_hash_table/
    • AUTHORS
    • hash_table.h
    • hash_table.c
    • tests

API

Your goal is to implement all the operation defined in the header hash_table.h:

/*
 * hash_table.h Hash Table implementation
 */
 
#ifndef _HASH_TABLE_H_
#define _HASH_TABLE_H_
 
#include <stdint.h>
#include <stdlib.h>
 
struct pair {
  uint32_t              hkey;
  char                 *key;
  void                 *value;
  struct pair          *next;
};
 
struct htable {
  size_t                size, capacity;
  struct pair         **tab;
};
 
/*
 * hash(data):
 * compute the hash of the nul terminated string data.
 */
uint32_t hash(char *data);
 
/*
 * create_htable(capacity):
 * build a new hash table with initial capacity.
 */
struct htable *create_htable(size_t capacity);
 
/*
 * access_htable(htable, key):
 * returns a pointer to the pair containing the given key
 * returns NULL if the key is not present
 */
struct pair *access_htable(struct htable *htable, char *key);
 
/*
 * add_htable(htable,key,value):
 * add the pair (key,value) to the hash table
 */
int add_htable(struct htable *htable, char *key, void *value);
 
/*
 * remove_htable(htable, key):
 * removes the pair containing the given key from the hash table
 */
void remove_htable(struct htable *htable, char *key);
 
/*
 * clear_htable(htable):
 * delete all pairs in the table
 */
void clear_htable(struct htable *htable);
 
# endif /* _HASH_TABLE_H_ */

You must (of course) respect the described structure since tests can rely on it.

Work

You will write the implementation file hash_table.c for the header hash_table.h.

Your file must not contains any main function

The hash function is described in a following section as for the strategies for insertion, collision and auto-resizing.

Implementation details

Your hash table will store pairs made of a key (a C string) and a generic pointer (void*) used as a value.

Collisions will be handled using linked lists: each pair contains a next pointer.

The next section describes the hash function and for key equality tests you'll use the standard strcmp(3) function.

Hash

You'll use this classical hash function (try to guess its name … )

The input is a C string (nul terminated) and the output is 32bit unsigned integer (uint32_t from stdint.h)


  • The initial hash value (h) is set to 0
  • Loop on all character in the key:
    • add the character value to h
    • add h times 1024 to h
    • h is replaced by the XOR between h and h/64
  • After the loop, add h times 8 to h
  • h is replaced by the XOR between h and h/2048
  • add h times 32768 to h
  • return h

Table initialization and searching

The function create_htable(capacity) creates a new hash table with capacity capacity.

All unused cells must be set to NULL.

Seaching

The function accees_htable(htable, key) looks for a pair containing the key key. The function will return the pointer to the corresponding pair in the table and NULL if it didn't found the key.

In order to search for a key:

  • compute the hash value of the key
  • compute the index in the array as the remainder of the hash by the capacity of the array
  • search for the key in the linked list starting at the computed index (use the hash value before using strcmp(3))

Note that we return the pair so our access function can be used to update the associated value.

Insert and delete

The function add_htable(htable,key,value) add a new pair in the table. The function returns 0 if the key is already present in the table, a value different from 0 otherwise.

You must first start like the search function then create the new pair (if the key was not present) and add it in front of the linked list.

The function remove_htable(htable, key) looks for key and suppresses (if present) the pair in the corresponding list.

Auto-resizing

When the number of stored elements hit a predefined threshold, the number of collisions (and thus the length of the linked lists in each cells) will grow quickly. In order to compensate this, we'll resize our table.

When the ratio size/capacity is greater than 0.75 (75%), we double the capacity of the table: you need to allocate a new array and insert again all the pairs from the original table. Beware that changing the capacity change the computed index for all keys.

Clear the table

The last operation, clear_htable(htable), delete all pairs from the table. Beware, you must free each pair !