20180129:TP:C:TCP:Client
Introduction
The goal of this session is to write a simple application that fetches a web page using only sockets.
Assignments
The tutorial must be committed to your git repository as usual. Here is the directory structure:
- 20180129_TCP_client/
- AUTHORS
- Makefile
- get_page.c
Your code lives in a single file, get_page.c, and your Makefile must build the binary get_page with the usual flags.
Minimal HTTP Query
Here is a simple test that fetches a web page. I chose to visit http://perdu.com:
> nc perdu.com 80
GET http://perdu.com HTTP/1.0

HTTP/1.1 200 OK
Date: Fri, 15 Jan 2016 13:29:25 GMT
Server: Apache
Last-Modified: Tue, 02 Mar 2010 18:52:21 GMT
ETag: "cc-480d5dd98a340"
Accept-Ranges: bytes
Content-Length: 204
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

<html><head><title>Vous Etes Perdu ?</title></head><body><h1>Perdu sur l'Internet ?</h1>
<h2>Pas de panique, on va vous aider</h2>
<strong><pre> * <----- vous êtes ici</pre></strong></body></html>
As you can see, the query starts with the keyword GET, followed by the URL and the protocol version (we use 1.0, which requires less work for our query). What you can't see is that each line is terminated by the sequence \r\n (carriage return, then line feed) and that an empty line ends the query, so the C string of my query looks like:
char *query = "GET http://perdu.com HTTP/1.0\r\n\r\n";
So, our first exercise is to build the query string and compute its size:
- Implement the following function:
char* build_query(const char *url, size_t *len);
build_query(url, &len) takes the URL string, builds the query string and returns a pointer to it; this pointer must be freeable with free(3). The function also stores the length of the query (number of characters, not counting the terminating '\0') in the variable pointed to by len.
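One possible way to write build_query is sketched below. It assumes asprintf(3), a GNU/BSD extension that needs _GNU_SOURCE on glibc, and it terminates lines with "\r\n", the line ending HTTP expects. Treat it as an illustration under these assumptions, not as the reference solution.

```c
#define _GNU_SOURCE   /* asprintf is a GNU/BSD extension, not ISO C */
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

/* Build "GET <url> HTTP/1.0\r\n\r\n" in a malloc'ed buffer.
 * The caller releases the result with free(3). */
char *build_query(const char *url, size_t *len)
{
    char *query;
    /* asprintf allocates the buffer itself and returns the number of
     * characters written, not counting the terminating '\0'. */
    int n = asprintf(&query, "GET %s HTTP/1.0\r\n\r\n", url);
    if (n == -1)
        errx(EXIT_FAILURE, "asprintf: cannot allocate the query");
    *len = (size_t)n;
    return query;
}
```

With the URL http://perdu.com, this sketch stores 33 in len.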
Connection to the server
Getting the IP address
First, to connect to a server we need its IP address. Many code samples use gethostbyname(3), but this function is now deprecated and we must use getaddrinfo(3) (as presented in the lecture). Here is some code:
// needs <err.h>, <netdb.h>, <stdlib.h>, <string.h>, <sys/socket.h>
int addrinfo_error;
struct addrinfo hints;
struct addrinfo *result, *rp;

// hints describes what we want
// first fill the structure with 0s
memset(&hints, 0, sizeof (struct addrinfo));
// We only want IPv4, use AF_UNSPEC if you don't care which one
hints.ai_family = AF_INET;
// We want TCP
hints.ai_socktype = SOCK_STREAM;
hints.ai_flags = 0;
hints.ai_protocol = 0;

// name and port are strings (domain name and port, obviously)
// result will contain the result
addrinfo_error = getaddrinfo(name, port, &hints, &result);

// Error management
if ( addrinfo_error != 0 ) {
    errx(EXIT_FAILURE, "Fail getting address for %s on port %s: %s",
         name, port, gai_strerror(addrinfo_error));
}
result is a linked list of struct addrinfo (the next field is ai_next). We now need to try each entry in the list until one connection succeeds.
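Walking the list could look like the sketch below. The helper name connect_to and its convention of returning -1 on failure are assumptions made for this illustration; the loop itself follows the usual socket(2) then connect(2) pattern, restricted to IPv4 like the hints above.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve name:port and return a connected TCP
 * socket, or -1 if resolution failed or no address accepted us. */
int connect_to(const char *name, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *result, *rp;
    int fd = -1;

    memset(&hints, 0, sizeof (struct addrinfo));
    hints.ai_family = AF_INET;        /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    if (getaddrinfo(name, port, &hints, &result) != 0)
        return -1;

    for (rp = result; rp != NULL; rp = rp->ai_next) {
        fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd == -1)
            continue;                 /* unusable entry, try the next one */
        if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0)
            break;                    /* connected: keep this socket */
        close(fd);                    /* connection failed, release it */
        fd = -1;
    }

    freeaddrinfo(result);             /* the list is no longer needed */
    return fd;
}
```

A caller would check the result against -1 before sending anything, and close(2) the descriptor once the exchange is over.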
Let's do it!
- Write the following function:
void get_page(const char *name, const char *url, const char *port);
get_page(name, url, port) connects to the machine name on port port, builds a query for the URL url, sends it, reads the answer and writes it to the standard output.
You have to manage every aspect of the communication properly: first send the whole query, then read everything sent by the server and write it to STDOUT_FILENO. Beware: the server will probably close the stream once it has sent everything it wants to send. Note that the empty line ("\r\n\r\n") only marks the end of the headers; the body follows it, so the answer is finished when the server closes the connection.
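The send-then-read logic can be sketched with two small helpers. The names write_all and copy_until_eof are hypothetical, and the sketch makes two simplifying assumptions: any error is fatal, and interrupted calls (EINTR) are not retried.

```c
#include <err.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes, because write(2) may
 * accept fewer bytes than requested. */
void write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w == -1)
            err(EXIT_FAILURE, "write failed");
        buf += w;             /* skip what has already been sent */
        len -= (size_t)w;
    }
}

/* Hypothetical helper: copy everything readable from in to out until
 * read(2) returns 0, meaning the peer closed the stream. */
void copy_until_eof(int in, int out)
{
    char buf[4096];
    ssize_t r;

    while ((r = read(in, buf, sizeof buf)) > 0)
        write_all(out, buf, (size_t)r);
    if (r == -1)
        err(EXIT_FAILURE, "read failed");
}
```

Inside get_page, the query would go through write_all on the connected socket, followed by copy_until_eof from that socket to STDOUT_FILENO.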
Here is an example of main:
int main(void)
{
    get_page("perdu.com", "http://perdu.com", "80");
    return 0;
}
And the output of the full program:
> ./get_page
HTTP/1.1 200 OK
Date: Fri, 15 Jan 2016 15:17:58 GMT
Server: Apache
Last-Modified: Tue, 02 Mar 2010 18:52:21 GMT
ETag: "cc-480d5dd98a340"
Accept-Ranges: bytes
Content-Length: 204
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

<html><head><title>Vous Etes Perdu ?</title></head><body><h1>Perdu sur l'Internet ?</h1>
<h2>Pas de panique, on va vous aider</h2>
<strong><pre> * <----- vous êtes ici</pre></strong></body></html>
Connexion closed by the server
Line breaks are only here for presentation; your output may differ …
Here is a list of man pages you need to read:
- asprintf(3)
- bind(2)
- close(2)
- connect(2)
- getaddrinfo(3)
- memset(3)
- read(2)
- socket(2)
- strncmp(3)
- write(2)