20180129:TP:C:TCP:Client


Introduction

The goal of this session is to write a simple application that fetches a web page using only sockets.

Assignment

The tutorial must be committed to your git repository as usual. Here is the expected directory structure:

  • 20180129_TCP_client/
    • AUTHORS
    • Makefile
    • get_page.c

Your code lives in a single file, get_page.c, and your Makefile must build the binary get_page with the usual flags.

Minimal HTTP Query

Here is a simple test that accesses a web page; I chose to visit http://perdu.com:

> nc perdu.com 80
GET http://perdu.com HTTP/1.0
 
HTTP/1.1 200 OK
Date: Fri, 15 Jan 2016 13:29:25 GMT
Server: Apache
Last-Modified: Tue, 02 Mar 2010 18:52:21 GMT
ETag: "cc-480d5dd98a340"
Accept-Ranges: bytes
Content-Length: 204
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
 
<html><head><title>Vous Etes Perdu ?</title></head><body><h1>Perdu sur l'Internet ?</h1>
<h2>Pas de panique, on va vous aider</h2>
<strong><pre>    * <----- vous &ecirc;tes ici</pre></strong></body></html>

As you can see, the query starts with the keyword GET, followed by the URL and the protocol version (we use 1.0 because it requires less work for our query). What you can't see is that each line is terminated by the sequence \r\n (carriage return, then line feed) and that the query ends with an empty line, so the C string of my query looks like:

char *query = "GET http://perdu.com HTTP/1.0\r\n\r\n";

So, our first exercise is to build the query string and compute its size:

  • Implement the following function:
char* build_query(const char *url, size_t *len);

build_query(url, &len) takes the URL string, builds the query string and returns a pointer to it; this pointer must be freeable with free(3). The function also stores the length of the query (the number of characters, not counting the terminating '\0') in the variable pointed to by len.
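As an illustration, here is one possible sketch of build_query. It assumes the query has the form "GET <url> HTTP/1.0" followed by an empty line, with each line ended by "\r\n" as HTTP requires. It sizes the buffer with snprintf(3) and allocates it with malloc(3); asprintf(3) would be a shorter alternative where it is available. The QUERY_FMT macro and the choice to return NULL on failure are assumptions of this sketch, not requirements of the assignment.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format of the whole query: request line, then an empty line. */
#define QUERY_FMT "GET %s HTTP/1.0\r\n\r\n"

/* Returns a malloc'ed query for url (to be released with free(3)),
 * or NULL on failure. On success, *len receives the query length,
 * not counting the terminating '\0'. */
char *build_query(const char *url, size_t *len)
{
    /* First pass: ask snprintf how many characters are needed. */
    int n = snprintf(NULL, 0, QUERY_FMT, url);
    if (n < 0)
        return NULL;

    char *query = malloc((size_t)n + 1);
    if (query == NULL)
        return NULL;

    /* Second pass: actually write the query into the buffer. */
    snprintf(query, (size_t)n + 1, QUERY_FMT, url);
    *len = (size_t)n;
    return query;
}
```

With the URL http://perdu.com, this produces the 33-character string "GET http://perdu.com HTTP/1.0\r\n\r\n".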

Connecting to the server

Getting the IP address

First, in order to connect to a server, we need its IP address. Many code samples use gethostbyname(3), but this function is now obsolete and we must use getaddrinfo(3) instead (as presented in the lecture). Here is some code:

  // Needs <err.h>, <netdb.h>, <stdlib.h>, <string.h>,
  // <sys/socket.h> and <sys/types.h>
  struct addrinfo hints;
  struct addrinfo *result, *rp;
  int addrinfo_error;
 
  // hints describes what we want
  // first fill the structure with 0s
  memset(&hints, 0, sizeof (struct addrinfo));
  // We only want IPv4, use AF_UNSPEC if you don't care which one
  hints.ai_family = AF_INET;
  // We want TCP
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = 0;
  hints.ai_protocol = 0;
 
  // name and port are strings (domain name and port, obviously)
  // result will receive the head of the list of answers
  addrinfo_error = getaddrinfo(name, port, &hints, &result);
 
  // Error management
  if ( addrinfo_error != 0 ) {
    errx(EXIT_FAILURE, "Fail getting address for %s on port %s: %s",
         name, port, gai_strerror(addrinfo_error));
  }

result is a linked list of struct addrinfo (the next field is ai_next). We now need to try each entry in the list until one connection succeeds. Once you are done with the list, release it with freeaddrinfo(3).
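Walking the list could look like the following sketch. The helper name connect_to is hypothetical (it is not part of the assignment), and the sketch assumes that exiting with errx(3) is acceptable when no address works. It tries socket(2) and connect(2) on each entry and stops at the first success.

```c
#include <err.h>
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a TCP socket connected to name:port,
 * or exits with an error message if no address can be reached. */
int connect_to(const char *name, const char *port)
{
    struct addrinfo hints, *result, *rp;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, as above */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    int error = getaddrinfo(name, port, &hints, &result);
    if (error != 0)
        errx(EXIT_FAILURE, "Fail getting address for %s on port %s: %s",
             name, port, gai_strerror(error));

    /* Try each answer until one connection succeeds. */
    for (rp = result; rp != NULL; rp = rp->ai_next) {
        fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd == -1)
            continue;                 /* cannot create socket: next entry */
        if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0)
            break;                    /* connected: keep this socket */
        close(fd);                    /* failed: clean up and try next */
        fd = -1;
    }

    freeaddrinfo(result);             /* the list is no longer needed */

    if (fd == -1)
        errx(EXIT_FAILURE, "Could not connect to %s on port %s", name, port);
    return fd;
}
```

Closing the socket after a failed connect(2) avoids leaking one file descriptor per unreachable address.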

Let's do it!

  • Write the following function:
void get_page(const char *name, const char *url, const char *port);

get_page(name, url, port) connects to the machine name on port port, builds a query for the URL url, sends it, reads the answer and writes it to the standard output.

You have to manage all communication aspects properly: first send the whole query, then read everything sent by the server and write it to STDOUT_FILENO. Beware: read(2) and write(2) may transfer fewer bytes than requested, so loop until everything has been handled. With HTTP/1.0 the server closes the connection once it has sent everything it wants to, so the answer is finished when read(2) returns 0. An empty line ("\r\n\r\n") only marks the end of the headers; the body of the page follows it.
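The two loops involved can be sketched as follows. The helper names write_all and copy_stream are hypothetical, and the 4096-byte buffer size is an arbitrary choice. The first helper keeps calling write(2) until the whole buffer has been sent; the second copies one file descriptor to another until read(2) reports end of stream.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: writes exactly len bytes from buf to fd,
 * retrying after short writes. Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w == -1) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: retry */
            return -1;
        }
        buf += w;                     /* skip what was already written */
        len -= (size_t)w;
    }
    return 0;
}

/* Hypothetical helper: copies everything from in to out until the
 * peer closes the stream. Returns 0 on success, -1 on error. */
int copy_stream(int in, int out)
{
    char buf[4096];
    for (;;) {
        ssize_t r = read(in, buf, sizeof buf);
        if (r == 0)
            return 0;                 /* end of stream: peer closed */
        if (r == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out, buf, (size_t)r) == -1)
            return -1;
    }
}
```

Given a connected socket fd and a query of length len, get_page could then call write_all(fd, query, len) followed by copy_stream(fd, STDOUT_FILENO), and finally close(2) the socket.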

Here is an example of main:

int main(void) {
  get_page("perdu.com", "http://perdu.com", "80");
  return 0;
}

And the output of the full program:

> ./get_page 
HTTP/1.1 200 OK
Date: Fri, 15 Jan 2016 15:17:58 GMT
Server: Apache
Last-Modified: Tue, 02 Mar 2010 18:52:21 GMT
ETag: "cc-480d5dd98a340"
Accept-Ranges: bytes
Content-Length: 204
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
 
<html><head><title>Vous Etes Perdu ?</title></head><body><h1>Perdu sur l'Internet ?</h1>
<h2>Pas de panique, on va vous aider</h2>
<strong><pre>    * <----- vous &ecirc;tes ici</pre></strong></body></html>
Connection closed by the server

The line breaks are only there for presentation; your output may differ slightly.

Here is a list of man pages you need to read:

  • asprintf(3)
  • bind(2)
  • close(2)
  • connect(2)
  • getaddrinfo(3)
  • memset(3)
  • read(2)
  • socket(2)
  • strncmp(3)
  • write(2)