A Gentle Introduction to TCP - Part 1
TCP is one of the main protocols we use to get computers to communicate. Computers on a network are identified by their IP address (a 4 byte integer in IPv4). Each process using TCP on that computer is identified by a "port" (a 2 byte integer). Communication always happens in pairs. One of the participants behaves as a "server", waiting for someone to connect, while the other behaves as a "client", initiating the connection. Once the connection is established, client and server work the same way as far as TCP is concerned. In this post I'll show how to implement a simple TCP echo server, which waits for a message and sends it back, and a client, which sends a message and prints the reply.
The Server
To use TCP we need to use the socket interface provided by the OS. At first we'll look into the UNIX socket interface. Later we'll see the Windows one. This is the server:
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #define ADDR "127.0.0.1"
    #define PORT 3000

    int main(void)
    {
        /* Create the listening socket (IPv4, TCP) */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return -1; }

        /* Choose the local address and port to listen on */
        struct sockaddr_in bind_buffer;
        bind_buffer.sin_family = AF_INET;
        bind_buffer.sin_addr.s_addr = inet_addr(ADDR);
        bind_buffer.sin_port = htons(PORT);
        if (bind(fd, (struct sockaddr*) &bind_buffer, sizeof(bind_buffer))) {
            perror("bind"); return -1;
        }

        if (listen(fd, 8)) {
            perror("listen"); return -1;
        }

        while (1) {
            /* Wait for a client and get a socket for that connection */
            int new_fd = accept(fd, NULL, NULL);
            if (new_fd < 0) {
                perror("accept"); return -1;
            }

            char message[128];
            int len = recv(new_fd, message, sizeof(message)-1, 0);
            if (len < 0) { perror("recv"); return -1; }

            /* Echo back whatever we received */
            if (len > 0)
                send(new_fd, message, len, 0);

            close(new_fd);
        }

        close(fd);
        return 0;
    }
socket, perror
To use TCP we need to set up a "socket", a kernel object we create to keep track of states of new and old connections. A socket can be a listening socket or a connection socket. A listening socket is something required to create one or more connection sockets. The connection sockets represent the actual communication with another computer.
The socket function creates the socket and returns an integer which identifies the socket object within the kernel. The integer must be non-negative to be valid, so if we get -1 it means an error occurred. When that happens, socket also sets the errno value. We call perror to print the label we pass in (here the name of the function that failed) followed by the text description of errno.
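The socket function takes three arguments. We are only interested in using TCP over IPv4, so the first argument must be AF_INET to select IPv4 and the second must be SOCK_STREAM to select a stream socket, which over IPv4 means TCP. The third argument selects a specific protocol. Passing zero picks the default one for the first two arguments, which here is TCP.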
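To make the error handling concrete, here is a minimal sketch of the socket/perror pattern wrapped in a function. The name open_tcp_socket and its label parameter are my own invention for illustration, not part of the socket API.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: creates a TCP/IPv4 socket.
   Returns the descriptor (>= 0) on success, -1 on failure. */
int open_tcp_socket(const char *label)
{
    assert(label != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        /* Prints "<label>: <text description of errno>" to stderr */
        perror(label);
        return -1;
    }
    return fd;
}
```

Calling open_tcp_socket("socket") behaves like the first two lines of the server's main. The descriptor it returns still has to be released with close when we are done with it.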
bind, dotted-decimal notation, and byte ordering
The socket we just created will be used as a "listener". In other words, it's the object that establishes new connections and produces new connection sockets representing them. Now we use the bind function to specify which IP address and TCP port we are going to listen for connections on. The IP we specify here is not another computer's but our own. This is necessary because a computer can have multiple IP addresses. The way this works is we put the information in the sockaddr_in structure, and then call bind telling it to apply this configuration to the fd socket.
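    /* Simplified; the real definitions come from <netinet/in.h> */
    struct in_addr {
        uint32_t s_addr;              /* IPv4 address, network byte order */
    };

    struct sockaddr_in {
        short          sin_family;    /* AF_INET */
        unsigned short sin_port;      /* TCP port, network byte order */
        struct in_addr sin_addr;      /* IPv4 address */
        char           sin_zero[8];   /* padding */
    };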
The bind function does not know what type of address we are specifying, so once again we need to specify IPv4 by setting the sin_family field to AF_INET. Then we store the IP address in sin_addr (more precisely, in its s_addr member) and the TCP port in sin_port. If the function succeeds 0 is returned, otherwise -1 is returned and errno is set.
The IP address is represented by one 32 bit string, but usually we express it in dotted-decimal form. Each one of the four bytes is expressed in base 10 and a dot is used to separate the bytes:
127.0.0.1
the single bytes are represented in binary as:
127 -> 01111111
0 -> 00000000
0 -> 00000000
1 -> 00000001
so the actual bit string ends up being:
01111111000000000000000000000001
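The inet_addr function performs this conversion for us. The value it returns is already in network byte order (explained below), so it can be stored in the address structure as is.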
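To see that there is no magic in the conversion, here is a rough sketch that does it by hand. The helpers parse_dotted_decimal and matches_inet_addr are hypothetical names of mine, and the parsing is deliberately simple (it assumes four plain decimal fields).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>

/* Hypothetical helper: packs the four dotted-decimal fields into one
   32 bit value, first field in the most significant byte.
   Returns 0 on success, -1 if the text is not a valid address. */
int parse_dotted_decimal(const char *text, uint32_t *out)
{
    assert(text != NULL && out != NULL);

    unsigned a, b, c, d;
    char extra;
    /* Exactly four fields must match; a fifth item means trailing junk */
    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;

    *out = ((uint32_t) a << 24) | ((uint32_t) b << 16)
         | ((uint32_t) c << 8)  |  (uint32_t) d;
    return 0;
}

/* Returns 1 if our result, moved to network byte order with htonl,
   equals what inet_addr produces for the same text. */
int matches_inet_addr(const char *text)
{
    uint32_t value;
    if (parse_dotted_decimal(text, &value))
        return 0;
    return htonl(value) == inet_addr(text);
}
```

For "127.0.0.1" the packed value is 0x7F000001, which is the bit string shown above written in hexadecimal.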
Before setting the port value, we need to convert it to the "network byte order" using htons.
By "byte order" we mean how CPUs store multi-byte values in memory. Let's say we have the number 1 stored in a 4 byte unsigned integer. Its binary representation is:
00000000 00000000 00000000 00000001
(I split the bits in groups of 8 to make it easier to read.)
Of course computers store things as bytes (groups of 8 bits), so this 4 byte number will span 4 addressable memory cells. When we say we're storing the number at address 1000, what we really mean is we are storing its bytes at addresses 1000 to 1003. When we want its value back, the CPU reads the four bytes and reassembles the original 32 bit value. Notice that it does not matter in which order the four bytes are stored: the CPU could shuffle them, and as long as it uses the same order when loading the value back, everything works. Turns out, that's exactly what happens! Different computers (more precisely their CPUs) store bytes in different orders. There are two main ways to order bytes: little endian and big endian. The number 16975631 stored in 32 bits is logically represented as:
00000001 00000011 00000111 00001111
And here is how it's laid out in memory under each endianness:
address:        1000     1001     1002     1003
big endian:     00000001 00000011 00000111 00001111
little endian:  00001111 00000111 00000011 00000001
Due to this difference, any time computers transmit to each other strings of bits representing multi-byte values, they need to agree on which byte ordering to use. The computer that uses the other ordering then needs to reverse the bytes. In most network protocols the ordering used is big endian.
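If you want to check which kind of machine you are on, or produce big endian bytes without caring about it, something like the following sketch works. The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint32_t value = 1;
    unsigned char first;
    memcpy(&first, &value, 1);   /* look at the byte at the lowest address */
    return first == 1;
}

/* Writes value to out[0..3] in big endian order on any host,
   because shifts work on the logical value, not on memory. */
void store_big_endian32(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char) (value >> 24);
    out[1] = (unsigned char) (value >> 16);
    out[2] = (unsigned char) (value >> 8);
    out[3] = (unsigned char) value;
}

/* Rebuilds the value from four big endian bytes. */
uint32_t load_big_endian32(const unsigned char in[4])
{
    assert(in != NULL);
    return ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16)
         | ((uint32_t) in[2] << 8)  |  (uint32_t) in[3];
}
```

Storing 16975631 with store_big_endian32 yields the bytes 1, 3, 7, 15, matching the big endian row of the layout above.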
Because port numbers are two bytes wide, we need to make sure they are stored as big endian in the sockaddr_in buffer. The name htons stands for "host to network short": the function converts from whichever byte ordering we (the host) have to big endian. If we are already big endian (unlikely) it does nothing. If we are little endian it reverses the bytes. There are similar functions for larger data types and for converting the other way around: htonl, ntohs, and ntohl.
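As a quick sanity check, this sketch looks at the bytes htons actually leaves in memory. port_wire_bytes is a hypothetical helper, only here to inspect the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical helper: copies out the in-memory bytes of a port
   after it has been converted with htons. */
void port_wire_bytes(uint16_t port, unsigned char out[2])
{
    assert(out != NULL);
    uint16_t network = htons(port);
    memcpy(out, &network, sizeof(network));
}
```

Port 3000 is 0x0BB8 in hexadecimal, so on any host the two bytes come out as 0x0B followed by 0xB8. Converting back with ntohs gives 3000 again.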
listen
Now that the socket is configured we can start listening for new connections! We do so using the listen function. The first argument is the socket we want to listen with. The second argument is the "backlog" size. The way TCP applications usually work is they accept one or more connections using the accept function (more on that later) and then go do some work for a bit. When they're done they accept new connections. It is possible that new connections are established faster than the program can accept them, resulting in them "piling up". The backlog is the maximum number of connections that can wait to be accepted. For instance, with a backlog of 3, a fourth pending connection will typically be refused or ignored until there is room (the exact behavior depends on the OS, which treats this value as a hint). This value does not need to be high as programs usually accept connections very fast. Still I like to set something greater than 1. Like bind, listen returns 0 on success, otherwise it returns -1 and sets errno.
accept
Now that we brought the socket to the listening state we can accept new connections. We use the accept function for this. When this function is called, the program stops and waits for incoming connections. When a new connection arrives, the kernel automatically creates a socket object for it and accept returns its descriptor. If the return value is -1, the function failed (and errno is set). The first argument is the listening socket we want to accept a connection from, and the other two are optional output arguments. They return the address of the machine we are talking to. If we are not interested we can set them to NULL.
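If you do care about who connected, the two optional arguments can be used roughly as in this sketch. The helper accept_with_peer is my own name, and inet_ntoa (the reverse of inet_addr) is not used elsewhere in this post.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: accepts one connection and writes the peer's
   address into text as "ip:port". Returns the new descriptor or -1. */
int accept_with_peer(int listen_fd, char *text, size_t text_size)
{
    assert(text != NULL && text_size > 0);

    struct sockaddr_in peer;
    socklen_t peer_len = sizeof(peer);   /* in: buffer size, out: bytes written */
    int new_fd = accept(listen_fd, (struct sockaddr*) &peer, &peer_len);
    if (new_fd < 0) {
        perror("accept");
        return -1;
    }

    /* Both fields arrive in network byte order */
    snprintf(text, text_size, "%s:%u",
             inet_ntoa(peer.sin_addr), (unsigned) ntohs(peer.sin_port));
    return new_fd;
}
```

In the echo server this could replace the plain accept call, for example to log each client before echoing its message.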
send, recv, and close
Once we have acquired a valid file descriptor from accept, we can read bytes coming from the other computer using the recv function and send bytes using the send function. You can also use read and write.
The recv function takes the connection socket as its first argument. The second and third arguments are the destination buffer's location and size. The last argument is used to specify different options. You usually keep it at zero. The function returns the number of bytes received and written to the destination buffer, 0 if the connection was closed by the other machine, and -1 if an error occurred (and errno is set). The read call is equivalent to recv with no options set. When successful, recv can return any number of bytes from 1 to the destination buffer's size.
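The three possible outcomes of recv are easy to mix up, so here is a small sketch that handles each of them and also NUL-terminates the data so it can be printed as a string. recv_text is a hypothetical helper, not a standard function.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: receives at most cap-1 bytes and NUL-terminates them.
   Returns the number of bytes received, 0 if the other side closed
   the connection, or -1 on error. */
int recv_text(int fd, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 1);

    /* Leave one byte free for the terminator */
    ssize_t len = recv(fd, buf, cap - 1, 0);
    if (len < 0) {
        perror("recv");
        return -1;
    }

    buf[len] = '\0';   /* len is at most cap-1, so this stays in bounds */
    return (int) len;
}
```

A positive return value only says that some bytes arrived, not that the whole message did.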
The send function is analogous to recv. The first argument is the socket, the second and third arguments specify the source buffer, and the last argument is for extra options. You usually set it to zero. This function returns the number of bytes sent on success or -1 when it fails (and sets errno). The number of sent bytes can be anything from 0 to the size of the source buffer. In practice, you won't get the 0 return value unless the source buffer has a length of 0.
When you're done talking with the other machine you can tell the kernel to delete the socket object by calling close on its descriptor. You can do this on a specific connection or on the listening socket if you're not interested in establishing new connections. If you close the listener, the connections that were already established will continue working. The close function can also fail, but I rarely find it useful to check for errors here.
The Client
This is the client:
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #define SERVER_ADDR "127.0.0.1"
    #define SERVER_PORT 3000

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return -1; }

        /* Address and port of the server we want to reach */
        struct sockaddr_in connect_buffer;
        connect_buffer.sin_family = AF_INET;
        connect_buffer.sin_addr.s_addr = inet_addr(SERVER_ADDR);
        connect_buffer.sin_port = htons(SERVER_PORT);
        if (connect(fd, (struct sockaddr*) &connect_buffer, sizeof(connect_buffer))) {
            perror("connect"); return -1;
        }

        /* Send the text without its NUL terminator */
        char message[] = "Hello!";
        int res = send(fd, message, sizeof(message)-1, 0);
        if (res < 0) { perror("send"); return -1; }

        printf("Sent: %s\n", message);

        /* Leave room for the terminator we add below */
        char reply[128];
        int len = recv(fd, reply, sizeof(reply)-1, 0);
        if (len < 0) { perror("recv"); return -1; }
        reply[len] = '\0';

        if (len > 0) {
            printf("Received: %s\n", reply);
        }

        close(fd);
        return 0;
    }
The differences from the server are:
- Calling bind is optional
- We use connect instead of listen/accept
- There is no loop
socket
The socket creation is the same as for the server. The only difference is that we create a connection socket (not a listener) from the start. This socket is not yet active. We need to use the connect function to actually establish a connection with a specific machine.
Not using bind
The call to bind associates a specific IP address and port to our socket. Without it, the OS would just give us an arbitrary free port number! Binding is necessary for a server so that clients know which port to connect to. If the server used a random port whenever it started up, clients would have no way of knowing where to connect. This is not the case for clients, so we can use whichever port the OS decides to give us.
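If you are curious which port the OS picked, you can ask for it. This sketch uses getsockname, a standard call that is not covered in this post. The wrapper local_port is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: returns the local port currently associated
   with an IPv4 socket (in host byte order), or -1 on error. */
int local_port(int fd)
{
    assert(fd >= 0);

    struct sockaddr_in local;
    socklen_t local_len = sizeof(local);
    if (getsockname(fd, (struct sockaddr*) &local, &local_len)) {
        perror("getsockname");
        return -1;
    }
    return ntohs(local.sin_port);
}
```

Calling local_port(fd) in the client right after connect succeeds shows the port that was assigned for us, and it will usually differ from one run to the next.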
connect
The connect function takes a socket descriptor as its first argument. This is the socket that will be used to connect. The second and third arguments are used to specify the location of the machine to connect to. We need to fill in the sockaddr_in just like we did for the server, with the only exception that the information refers to the other machine, not ourselves. If everything went OK and the connection is established, 0 is returned. If something went wrong, -1 is returned and errno is set. From this point on, it's possible to use recv and send on that connection.
Disclaimer
This is just an example program. Here is a list of changes I would make if I had to use this in the real world:
- Don't abort the program when accept fails
- Don't assume the entire message is received in one recv call. You need to call it in a loop to make sure everything is received and properly handle interruptions (see EINTR)
- Don't assume send handles the entire buffer in one call. You need to call it in a loop to make sure everything is sent and properly handle interruptions (see EINTR)
- Use non-blocking sockets to avoid stalling for slow (or malicious) machines