C:Workshop:2018:D4
Due date for this tutorial: 2018-01-22 at 2pm Assignment directory: dotpstree
Introduction
Our goal for the last session of this workshop is to implement a variant of pstree(1). This command shows a tree representation of the running processes (using ASCII art), since don't really want to do ASCII art, we'll use the dot format instead and use dot(1) (or any other graphviz commands) to obtain a graphical representation of this tree. We will limit our display to the pid of running processes.
For that, we need to access the processes information described in /proc, but we don't want to loose time on directory crawling. Instead we'll use the base library of commands like ps(1) and top(1): libprocps. It's not part of the standard C library but is available on all Linux (in the base system).
The workflow of our program is as follow:
- get the list of processes and their information
- build the tree in memory
- serialize the tree in the dot format
The following sessions are here to guide you in this process.
Trees, lists and process representation
All data structures will be intrusive (see [1]).
General Trees as binary trees
The file tree.h (code follow) describes the lists and trees (note that this header is self contained, no other files is needed):
/* tree.h : intrusive general tree as binary trees */ /* provides also basic intrusive lists */ # ifndef DOTPSTREE_TREE_H_ # define DOTPSTREE_TREE_H_ # include <stddef.h> # define CONTAINER_OF(_type_, _field_, _ptr_) \ ((_type_*)((char*)(_ptr_) - offsetof(_type_, _field_))) /* classic intrusive lists */ struct list { struct list *next; }; /* list_init_sentinel(head) initializes a new list with head as a sentinel */ /* head must be allocated separately, of course */ static inline void list_init_sentinel(struct list *head) { head->next = NULL; } /* list_push_front(head, elm) pushes elm in front of the sentinel head */ /* head and elm must be allocated list cells */ static inline void list_push_front(struct list *head, struct list *elm) { elm->next = head->next; head->next = elm; } /* list_pop_front(head) pops the first element of the list and returns it */ /* returns NULL if the list is empty */ static inline struct list* list_pop_front(struct list *head) { struct list *elm = head->next; if (elm) { head->next = elm->next; elm->next = NULL; } return elm; } /* foreach_list(_head_, _list_iter_) iterating macro on list */ /* _list_iter_ must be the name of an existing variable of */ /* type struct list* */ # define foreach_list(_head_, _list_iter_) \ for (_list_iter_ = _head_; \ _list_iter_->next; \ _list_iter_ = _list_iter_->next ) /* General trees implemented using binary representation */ struct tree { struct tree *child, *sibling; }; /* tree_attach_child(parent, child) attaches the node child to parent */ /* both parent and child must be valid nodes, child must not have siblings */ static inline void tree_attach_child(struct tree *parent, struct tree *child) { child->sibling = parent->child; parent->child = child; } /* foreach_child(_node_, _chld_iter_) macro iterating on children of _node_ */ /* _chld_iter_ must the name of a variable of type struct tree* */ # define foreach_child(_node_, _chld_iter_) \ for (_chld_iter_ = _node_->child; \ _chld_iter_; \ _chld_iter_ = _chld_iter_->sibling) /* for example only */ static inline size_t tree_number_of_children(struct tree *node) { size_t count = 0; struct tree *child; foreach_child(node, child) { count += 1; } return count; } # endif /* DOTPSTREE_TREE_H_ */
The implementation follows the classical First-Child-Right-Sibling model where a node contains a pointer to its first child and a pointer to its right sibling (the next child of its parent.)
Process and memory management
Memory Pool
During the construction of the process tree, we will build a lot of nodes and then throw them away when finished. It can interesting to use a memory pool allocator rather than relying on malloc and having to free each node one by one.
Here is an implementation of such a memory pool:
/* mempool.h: a simple memory pool */ # ifndef DOTPSTREE_MEMPOOL_H_ # define DOTPSTREE_MEMPOOL_H_ # include <assert.h> # include <stddef.h> # include <stdlib.h> struct mempool { size_t capacity, size; char *data; }; static inline struct mempool* mempool_init(size_t capacity) { struct mempool *pool = malloc(sizeof (struct mempool)); pool->size = 0; pool->capacity = capacity; pool->data = calloc(capacity, sizeof (char)); return pool; } static inline void *mempool_alloc(struct mempool *pool, size_t size) { // you should take care of initial capacity assert(pool->size + size <= pool->capacity); void *ptr = pool->data + pool->size; pool->size += size; return ptr; } static inline void mempool_reset(struct mempool *pool) { pool->size = 0; } static inline void mempool_delete(struct mempool *pool) { free(pool->data); free(pool); } # endif /* DOTPSTREE_MEMPOOL_H_ */
Process representation
The following structure will represent a process:
struct process { pid_t pid; struct list list; struct tree tree; };
We limit our data to the pid of each process, but a lot more information can be added if you want to go further.
Each process will belong to a process list (for lookup) and a process tree.
These processes will be stored in a process_set described by this structure:
struct process_set { struct list *process_list; struct tree *process_tree; struct mempool *pool; };
The roles of process_list and process_tree are obvious. pool is the memory pool used to allocate memory all nodes. Note that thanks to the intrusive implementation, we have only memory cell allocated for each process, even if process nodes belong to two data structures.
Here is the complete process.h header, describing these two structures and their basic operations. Note that this header defines functions that you'll need to implement in process.c (see assignment later.)
/* process.h: process structure */ # ifndef DOTPSTREE_PROCESS_H_ # define DOTPSTREE_PROCESS_H_ # include <stddef.h> # include <unistd.h> # include <proc/readproc.h> # include "mempool.h" # include "tree.h" struct process { pid_t pid; struct list list; struct tree tree; }; struct process_set { struct list *process_list; struct tree *process_tree; struct mempool *pool; }; static inline void process_set_init(struct process_set *set) { // be large, allocate a million of struct process // lazy memory mapping is nice (and any way what is 100MB) set->pool = mempool_init(sizeof (struct process) * (1 << 20)); set->process_list = mempool_alloc(set->pool, sizeof (struct list)); list_init_sentinel(set->process_list); set->process_tree = NULL; } static inline struct process* process_set_find(struct process_set *set, pid_t pid) { struct list *cur; foreach_list(set->process_list, cur) { struct process *proc = CONTAINER_OF(struct process, list, cur->next); if (proc->pid == pid) return proc; } return NULL; } struct process* process_set_add(struct process_set *set, pid_t pid); void process_set_attach_process(struct process_set *set, proc_t *procinfo); struct process_set* make_process_set(void); static inline void process_set_delete(struct process_set *set) { mempool_delete(set->pool); free(set); } # endif /* DOTPSTREE_PROCESS_H_ */
Assignment
Your job is to implement our dot version of pstree.
Your must provide a directory (in your git) named dotpstree and containing:
- dotpstree:
- AUTHORS
- Makefile
- dot_out.c and dot_out.h: file containing tree traversal and dot output building
- dotpstree.c: the main file
- mempool.h
- process.c: implementation of the corresponding header
- process.h
- tree.h
process.c
You must provide implementation for the following functions:
/* process_set_add(set, pid) tries to add the process pid to the list and returns it */ /* if it's already in the list, returns the process node */ /* if it's the first encounter of the process, it'll add it to the process list */ /* the process will be set as the root of process tree if pid==1 */ struct process* process_set_add(struct process_set *set, pid_t pid); /* process_set_attach_process(set, procinfo) attaches a new process to the set */ /* this function is responsible to add the process and attach it to its parent */ void process_set_attach_process(struct process_set *set, proc_t *procinfo); /* make_process_set() creates and populates the set of processes */ struct process_set* make_process_set(void);
Here are some details:
- process_set_add(set, pid):
- if pid is already in the list, return it
- create the node (using the memory pool !)
- attach it to the list
- if pid == 1, then set the root of process tree to this node
- process_set_attach(set, procinfo):
- get needed information from procinfo
- add the process node
- if pid is not 1, search for its parent and add the node to children the parent's children list
- make_process_set():
- allocate (using malloc(3)) and initialize a process set
- get the process information list using readproctab(PROC_FILLSTAT)
- iterate through the list of process information and attach the corresponding process
- returns the set
readproctab(3)
This function is provided by libprocps, read its manual page for more information.
readproctab(PROC_FILLSTAT) will return an array of proc_t* terminated by NULL
The type proc_t is a structure type, it is described in /usr/include/proc/readproc.h, but concretely what we need is only the following fields:
- tgid which is (almost) a synonym for pid
- ppid the pid of the parent of the current process
Note that the manual page describes a function named freeproctab(3) but this function doesn't exist. We won't free the array obtained.
dot_out.c
The header file:
/* dot_out.h: produce dot file from process tree */ # ifndef DOTPSTREE_DOT_OUT_H_ # define DOTPSTREE_DOT_OUT_H_ # include <stdio.h> # include "process.h" void dot_output(FILE *out, struct process_set *set); # endif /* DOTPSTREE_DOT_OUT_H_ */
Implement the function dot_output(out, set). The parameter out is a stdio stream usable with fprintf(3) it will serve as an output for our dot.
Building the tree requires a simple traversal like a depth first one.
The function must build the whole dot file looking like the following example (this the file used to generate the tree at the beginning of this subject, using the command dot -Grankdir=LR):
digraph PSTree { 1 -> 22066; 22066 -> 22069; 22069 -> 22159; 22159 -> 22165; 22069 -> 22077; 22077 -> 22078; 22078 -> 22083; 22083 -> 24030; 22083 -> 23745; 22083 -> 23624; 22083 -> 23498; 22083 -> 23465; 22083 -> 22871; 22083 -> 22852; 22083 -> 22553; 22083 -> 22227; 22083 -> 22221; 22083 -> 22200; 22083 -> 20416; 22083 -> 19785; 22083 -> 19742; 22083 -> 19507; 22083 -> 18099; 22078 -> 22080; 22080 -> 22081; 22069 -> 22075; 22069 -> 22074; 1 -> 20680; 1 -> 19999; 1 -> 17082; 17082 -> 22448; 17082 -> 20725; 17082 -> 20670; 20670 -> 22446; 17082 -> 19990; 17082 -> 19981; 17082 -> 19976; 17082 -> 17099; 1 -> 16841; 1 -> 10646; 10646 -> 17134; 17134 -> 19942; 19942 -> 20019; 20019 -> 20372; 20372 -> 20668; 20372 -> 20662; 20372 -> 18163; 18163 -> 18164; 20372 -> 9630; 9630 -> 9631; 9631 -> 29337; 20372 -> 8660; 8660 -> 8661; 20019 -> 20022; 1 -> 8813; 1 -> 583; 1 -> 529; 1 -> 520; 1 -> 514; 1 -> 508; 1 -> 501; 501 -> 23894; 1 -> 498; 1 -> 475; 475 -> 513; 1 -> 474; 1 -> 473; 1 -> 471; 1 -> 464; 1 -> 462; 1 -> 461; 461 -> 463; 1 -> 388; 1 -> 294; 1 -> 248; }
dotpsrtee.c and Makefile
Here is a complete Makefile:
# Makefile CC=gcc CPPFLAGS= -MMD -D_XOPEN_SOURCE=500 CFLAGS= -Wall -Wextra -std=c99 -O0 -g LDFLAGS= LDLIBS= -lprocps SRC= process.c dot_out.c dotpstree.c OBJ= ${SRC:.c=.o} DEP= ${SRC:.c=.d} all: dotpstree dotpstree: ${OBJ} clean: ${RM} ${OBJ} ${RM} ${DEP} ${RM} dotpstree -include ${DEP} # END
And the main program:
/* dotpstree.c: a dot version of pstree using procps-ng */ # include <err.h> # include <stdlib.h> # include <stdio.h> # include "dot_out.h" # include "process.h" int main(int argc, char *argv[]) { struct process_set *set = make_process_set(); int has_file_output = 0; FILE *out = stdout; if (argc > 1) { out = fopen(argv[1], "w"); has_file_output = 1; if (!out) err(1, "can't open %s", argv[1]); } dot_output(out, set); if (has_file_output) fclose(out); process_set_delete(set); return 0; }
Advanced options
In this section we will extend our program with some extra features, if you want to go a little bit further, you can take a look at the man page for pstree(1).
In order to support those options, you need to add parameters to the command line of the program, you are free to manage them as you wish but the program must keep the original behavior when run with no parameter or with a single one. You will provide a file called BONUS with a quick description on how we can tests your bonus.
These extensions are bonus and thus less guided.
Display process name
So far we only use PID of process, it would be nice to have the command name.
In structure proc_t there's two fields that can be used for that: cmdline and cmd.
The first one is an array of strings corresponding to the argv array in the process and the second one is the basename of the command (the last element of command path, thus the command name). The cmd field seems appropriate, it's a static string (of max 16char) in the structure, you will extend our process structure in order to store that name and copy the content of cmd in your struct.
You can use that name when generating the dot output by adding for each process a line that may look like this:
1 [label="init"];
Selecting a user
Once your able to build the complete tree of processes, we can add a simple filter in order to only display process for a given user.
This a little bit more tricky than command name: we can pass some options to readproctab(3) in order to get only process matching a given user id, but in that case we probably won't have a tree but a forest. Implementing this bonus requires some change in the way we build the tree, but you can simplifies the problem by keeping a fake root in place of the process with pid 1. In fact, depending on how you implement the tree construction, it may works out-of-box …
In order to obtain processes from a given uid, you will need to use the following call to readproctab(3) (see also openproc(3) for more information, note that you can select more than one uid since it takes an array and a size):
tab = readproctab(PROC_FILLSTAT | PROC_UID, &uid, 1);
If you want to use a username rather than a uid number, you need getpwnam(3) (read the manual).