Sockets

All the IPC mechanisms discussed so far have only been useful for communication between processes on the same machine. With the modern trend towards networks, it is important for Linux to support an IPC mechanism which allows it to be fully network compatible. Sockets is that mechanism.

Sockets made a first appearance in early versions of BSD UNIX. Here the designers tried to build a very general mechanism that would be able to accommodate many network protocols. To a large extent the designers were successful but, in order to allow all the necessary flexibility, programming with sockets requires the specification of values for a number of variable parameters. In order to make more sense of some of the parameters, it is necessary to look first at some of the underlying network principles.

The Internet Protocol (IP) sits on top of lower level software protocols and the network hardware, and hides these network structures from the application software.

This means that applications do not need to know what kind of network hardware they are using (Ethernet, serial lines, parallel lines).

IP is what is known as a connectionless service, which means that the network doesn't have any fixed data connections between particular machines, just lots of machines all conceptually connected on a common highway. In order to make message routing possible, each interface on each host machine on the network is given a unique IP address. Each address is just a 32-bit number, usually given as four 8-bit values, printed in decimal and with periods (full stops) between them (for example 194.61.21.6). If you are only using a standalone machine, or your own local network, then you are free to choose your own IP addresses. If, however, you will be connecting your machine to a network that has global Internet access, you will need to have a globally registered Internet address allocated to you by your network administrator. Each IP data packet (called a datagram) transmitted on the network carries its source and destination IP address information with it, and may get routed through a number of machines to reach its destination. Because each IP datagram is carried separately, maybe even over different routes, the connectionless service does not guarantee that the packets will arrive in the same order as they were sent, or even that they will arrive at all.
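To make the address format concrete, here is a minimal sketch in C that converts between the dotted decimal notation and the underlying 32-bit number. The function names dotted_to_u32 and u32_to_dotted are invented for this illustration; inet_pton() and inet_ntop() are standard library calls that are not otherwise covered in this text.

```c
#include <arpa/inet.h>    /* inet_pton, inet_ntop, htonl, ntohl */
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET */

/* Parse dotted decimal text into a 32-bit value in host byte order.
 * Returns 0 on success, -1 if the text is not a valid address.
 * (Hypothetical helper, named for this example only.) */
int dotted_to_u32(const char *text, uint32_t *out)
{
    struct in_addr addr;

    if (inet_pton(AF_INET, text, &addr) != 1)
        return -1;
    *out = ntohl(addr.s_addr);   /* s_addr is held in network byte order */
    return 0;
}

/* Format a 32-bit host-order value as dotted decimal text.
 * buf should hold at least INET_ADDRSTRLEN bytes. Returns 0 or -1. */
int u32_to_dotted(uint32_t value, char *buf, size_t buflen)
{
    struct in_addr addr;

    addr.s_addr = htonl(value);
    return inet_ntop(AF_INET, &addr, buf, (socklen_t)buflen) ? 0 : -1;
}
```

With these helpers, the example address 194.61.21.6 corresponds to the 32-bit value 0xC23D1506, one byte for each of the four decimal fields.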

The User Datagram Protocol

At the IP level, the addresses in the datagrams only specify machine interface IP numbers; they do not specify any particular processes or applications on those machines. To overcome this problem, the User Datagram Protocol (UDP) sits on top of the basic IP layer and provides a set of communication end points called ports, each identified by a small integer value. The UDP message packets contain both source and destination port numbers which are used to allow the UDP software at the destination to deliver an incoming message to the correct process, and for that process to be able to return a reply to the original sender.

Using UDP, each message packet is a self-contained entity. This means that each packet of data also carries with it all the information required to deliver it, so that a sequence of packets, transmitted from a particular process via a given UDP port, need not all have the same destination.

UDP simply uses the underlying Internet Protocol to send its messages, and so it, too, provides the same unreliable, connectionless message passing as IP. Unreliable, here, is not intended to mean that messages are unlikely to get through, just that their delivery is not guaranteed.

The Transmission Control Protocol

At the application program level, there is often a need to transfer large quantities of data between two fixed processes. In this kind of scenario, using an unreliable connectionless protocol means that application programs would have to have sophisticated error detection and correction mechanisms built into them. In order to avoid this unnecessary overhead on the design and construction of application programs, what is required is a reliable protocol which guarantees the delivery of data between two fixed processes and also guarantees that the data will be received in the same order that it was sent. The Transmission Control Protocol (TCP) is the standard solution.

Like UDP, TCP also sits on top of the basic IP layer and provides protocol ports to allow multiple destination processes on a single machine. Unlike UDP, however, TCP is a connection based protocol. This means that both ends of the communication link must agree to participate before any data can be exchanged, and that sequences of packets transmitted at one end are all intended for the same destination. As all the data packets sent over a particular TCP connection are intended for the same destination, and their order of arrival is guaranteed to be the same as their order of transmission, there is no need for any kind of message boundaries to be observed by the protocol itself. So, for example, using the read() and write() system calls, a message of 100 bytes transmitted by the sender may legitimately be received at the destination as four separate 25-byte messages. This means that it is up to the processes at each end of the connection to impose any required structure on the data transmitted, as it is not necessarily preserved by TCP.
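Because a single read() may return only part of what the sender wrote, a receiver that expects a fixed-size message usually reads in a loop. The following is one possible sketch of such a loop; the name read_full is an assumption made for this example and is not a system call.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling read() until `want` bytes have arrived.
 * Returns the number of bytes read: `want` on success, fewer if the
 * other end closed the stream early, or -1 on error.
 * (Hypothetical helper, named for this example only.) */
ssize_t read_full(int fd, void *buf, size_t want)
{
    char *p = buf;
    size_t got = 0;

    while (got < want) {
        ssize_t n = read(fd, p + got, want - got);

        if (n == 0)               /* end of stream */
            break;
        if (n < 0) {
            if (errno == EINTR)   /* interrupted by a signal: try again */
                continue;
            return -1;
        }
        got += (size_t)n;         /* a partial read: keep going */
    }
    return (ssize_t)got;
}
```

A receiver expecting the 100-byte message from the example above would call read_full(sd, buf, 100) and get the whole message back, however many pieces it arrived in.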

A socket is a communication end point and as such can be mapped quite nicely onto a UDP or TCP port. Although these are not the only possible mappings they are by far the most common and are the ones considered in most detail here.

Clients and Servers

A server is a process that offers some kind of service to other processes, and clients are the server's customer processes. In the case where the client and server are to communicate using TCP, the server will usually open its end of the link and passively wait for connections to arrive. The client, on the other hand, will take the active role in opening the connection to the server.

Once a TCP-socket connection between two processes has been established, the end points of the connection are made to perform like ordinary Linux files, so that standard system calls, like read() and write(), can be used to transfer the data. To make this possible, sockets have descriptors associated with them analogous to the file descriptors associated with ordinary files. The main difference between socket descriptors and file descriptors is that the socket descriptors do not automatically have specific destination addresses bound to them when the socket is created, whereas file descriptors are bound to specific file names by the open() system call. To enable the read() and write() system calls to be used in both cases like this, it is necessary for Linux to ensure that socket and file descriptors are in the same numeric range and that there is no duplication of numeric values in a given application. This is done by allocating them both from the same per process table.

Creating a Socket

Sockets are created using the socket() system call. The socket() call takes three parameters: family, type and protocol, and returns an integer socket descriptor as its return value. The general form of the call is:

	sd = socket(family, type, protocol)

The family parameter names the address family to use with the socket. The address family specifies the format for addresses when they are given. The two main families are AF_INET where addresses are given as 32-bit IP addresses (standard Internet format) and AF_UNIX where the addresses are path names in the Linux file system.

The type parameter specifies the type of communication required. The main choice is between a connectionless datagram (UDP type) service, called SOCK_DGRAM and a reliable, connection based (TCP type) service, called SOCK_STREAM. If only from the point of view of simplicity and reliability, most home-grown applications will be written using SOCK_STREAM.

In the majority of cases, specifying the address family and socket type is sufficient to determine the communication protocol to use over the new socket. In these cases, the protocol parameter is given the value zero so that a default protocol will automatically be selected. Protocol values other than zero are normally only used when making direct, low level, access to hardware interfaces.

If the socket() call is successful, the return value is a socket descriptor of type int. A -1 is returned on error. The socket descriptor is used as a parameter to other calls in setting up socket connections.

The standard close() system call should be used on a socket descriptor to shut it down at the end of a session.
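As a small sketch of the calls just described, the function below creates an Internet-family socket of a chosen type and reports any failure. The name open_inet_socket is invented for this example.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Create an AF_INET socket of the given type (SOCK_STREAM or
 * SOCK_DGRAM). Returns the socket descriptor, or -1 on error.
 * (Hypothetical helper, named for this example only.) */
int open_inet_socket(int type)
{
    /* Protocol 0 asks the system to select the default protocol
     * for this family and type. */
    int sd = socket(AF_INET, type, 0);

    if (sd == -1)
        perror("socket");
    return sd;
}
```

A caller would typically write sd = open_inet_socket(SOCK_STREAM), use the descriptor, and finish with close(sd).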

Binding a Local Address

When a socket is first created, it has no specific address associated with it and so remote processes have no way to refer to it. The format of an address for a particular socket will depend upon the address family that was specified when the socket was created. For example, in the Internet address family (AF_INET), a socket address would be a machine interface IP address and a port number on that machine.

Very often a client process will not care what port number is assigned to its socket and will allow the system to pick one. Server processes, on the other hand, need to be able to specify their addresses to the system because they will usually be operating at a fixed port number which is known to their clients and which the clients will need to use to establish a connection.

To bind a specific address to a socket, a server process uses the bind() system call, as follows:

	bind(sd, address, addrlen)

The sd parameter is just the socket descriptor that was returned by the socket() call. The address parameter is a pointer to a structure that contains an address of an appropriate type for the socket type being bound, and the addrlen parameter specifies the length of the address, in bytes. This arrangement is necessary in order to permit the wide variety of address formats that may be encountered between different address families.
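For the AF_INET family the address structure is a struct sockaddr_in. The sketch below binds a socket to the loopback interface (127.0.0.1) and then asks the system which port was actually assigned. The helper names bind_loopback and bound_port are assumptions made for this example, and getsockname() is a standard call that is not otherwise covered in this text. The port is converted with htons() because addresses are stored in network byte order.

```c
#include <arpa/inet.h>    /* htons, htonl, ntohs */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind sd to 127.0.0.1 on the given port (host byte order).
 * A port of 0 asks the system to choose a free port.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int bind_loopback(int sd, unsigned short port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* bind() takes a generic address pointer plus the length in bytes. */
    return bind(sd, (struct sockaddr *)&addr, sizeof addr);
}

/* Report the port currently bound to sd, in host byte order,
 * or -1 on error. (Hypothetical helper.) */
int bound_port(int sd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    if (getsockname(sd, (struct sockaddr *)&addr, &len) == -1)
        return -1;
    return ntohs(addr.sin_port);
}
```

A real server would normally pass its fixed, published port number rather than 0, and might bind to INADDR_ANY so that connections are accepted on every interface.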

Clients Connecting to a Server

In connection based (TCP type) communication links, a client process needs to initiate the connection to the server. In order to do this, the client needs to know the address and port number that the server will use to provide its service. With this information, the client uses the connect() system call:

	connect(sd, address, addrlen)

The sd parameter is the socket descriptor of the client's local socket. The address parameter specifies the address that the server is using on the destination machine, addrlen being the length of the specified address, in bytes.

If the client's socket has not been bound to a specified local address when the connect() call is made, then the system will automatically pick a local address and an appropriate local port number and bind them to the client's socket.

The connect() call returns the value 0 if a connection is successfully established or the value -1 on error. Once a client has successfully established a connection, the socket can then be used to send and receive data. There are several possibilities here, the most common being the use of the read() and write() system calls. These are exactly the same calls as are used on files but are given the local socket descriptor as their first parameter.
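The client side can be sketched as follows, assuming an AF_INET stream socket and a server address given in dotted decimal form. The function name connect_to is invented for this example, and inet_pton() is a standard library call that is not otherwise covered in this text.

```c
#include <arpa/inet.h>    /* inet_pton, htons */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a stream socket and connect it to ip:port.
 * Returns the connected socket descriptor, or -1 on error.
 * (Hypothetical helper, named for this example only.) */
int connect_to(const char *ip, unsigned short port)
{
    struct sockaddr_in server;
    int sd;

    memset(&server, 0, sizeof server);
    server.sin_family = AF_INET;
    server.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &server.sin_addr) != 1)
        return -1;                       /* not a valid dotted address */

    sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd == -1)
        return -1;

    /* No bind() here: the system picks the local address and port. */
    if (connect(sd, (struct sockaddr *)&server, sizeof server) == -1) {
        close(sd);
        return -1;
    }
    return sd;                           /* ready for read() and write() */
}
```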

Setting up a Server

Once a server process has used the socket() and bind() system calls to create itself a socket and bind a local address and port to the socket, it then needs to prepare the socket to take incoming connection requests. The listen() system call is used to perform this function. At the same time, this system call is also used to set the size of a queue that Linux will use to allow queuing of simultaneous connection requests from multiple clients. The general form of the listen() system call is:

	listen(sd, qlen)

where the sd parameter is the socket descriptor returned by a socket() call and qlen is the permitted size of the queue given as the maximum number of pending client connection requests.

Once a socket has been set up in this manner, the server now only needs to wait for a connection. To do this it uses the accept() system call as follows:

	newsd = accept(sd, address, addrlen)

Once again, the sd parameter is the socket descriptor associated with the server's socket. The address parameter is a pointer to a socket address structure which will be filled in by the accept() call with the address and port number bound to the client's socket when a connection is established. The addrlen parameter is a pointer to an integer which is also filled in by the accept() call to give the size, in bytes, of the client's address.

The return value from accept() is a new socket descriptor which has as its destination the established client, and which will be used in all further communication with that client. The original socket descriptor that was passed as a parameter to accept() is still open at this point and may be used again in another call to accept() to establish a connection with another client process.

Once a socket descriptor has been returned by accept(), the server can communicate with the new client using the normal read() and write() system calls.
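The whole sequence can be tried out inside a single process, because a connection request to a listening socket waits in the queue until accept() is called. The sketch below is only an illustration under that assumption: the function name loopback_tcp_demo is invented, the server uses a system-chosen port on the loopback interface, and getsockname() (not otherwise covered in this text) is used to discover that port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Run a one-shot server and client in the same process over loopback.
 * Returns 0 if the client's message arrives intact, -1 otherwise.
 * (Hypothetical demonstration function.) */
int loopback_tcp_demo(void)
{
    struct sockaddr_in addr, peer;
    socklen_t len = sizeof addr, peerlen = sizeof peer;
    const char msg[] = "hello";
    char buf[sizeof msg];
    size_t got = 0;
    int lsd = -1, csd = -1, newsd = -1, rc = -1;

    /* Server side: socket(), bind(), listen(). */
    lsd = socket(AF_INET, SOCK_STREAM, 0);
    if (lsd == -1) goto done;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);              /* let the system choose a port */
    if (bind(lsd, (struct sockaddr *)&addr, sizeof addr) == -1) goto done;
    if (listen(lsd, 5) == -1) goto done;   /* up to 5 pending requests */
    if (getsockname(lsd, (struct sockaddr *)&addr, &len) == -1) goto done;

    /* Client side: the request is queued until the server accepts it. */
    csd = socket(AF_INET, SOCK_STREAM, 0);
    if (csd == -1) goto done;
    if (connect(csd, (struct sockaddr *)&addr, sizeof addr) == -1) goto done;

    /* Server side: accept() returns a new descriptor for this client
     * and fills in the client's address. */
    newsd = accept(lsd, (struct sockaddr *)&peer, &peerlen);
    if (newsd == -1) goto done;

    if (write(csd, msg, sizeof msg) != (ssize_t)sizeof msg) goto done;
    while (got < sizeof msg) {             /* data may arrive in pieces */
        ssize_t n = read(newsd, buf + got, sizeof msg - got);
        if (n <= 0) goto done;
        got += (size_t)n;
    }
    if (memcmp(buf, msg, sizeof msg) == 0)
        rc = 0;

done:
    if (newsd != -1) close(newsd);
    if (csd != -1) close(csd);
    if (lsd != -1) close(lsd);
    return rc;
}
```

In a real application the client and server would be separate processes, and the server would normally loop, calling accept() again on the original descriptor for each new client.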

Blocking and Non-blocking Sockets

By default, a call to accept() will block if no client process is waiting to establish a connection. This means that the accept() call does not return until there is a client ready to communicate. Consequently, the server process cannot proceed with any other tasks until a connection is made.

Sometimes it is required that a server just look to see if a client is waiting to connect, accepting the connection if it is, but getting on with something else if there is no pending connection. This is done by making the socket non-blocking so that the accept() call returns immediately, even if no client is waiting. The fcntl() system call is used to set or reset the O_NDELAY flag on the associated socket descriptor to achieve this result. First get the flags associated with the socket descriptor:

	flags = fcntl(sd, F_GETFL);

And then set O_NDELAY as follows:

	fcntl(sd, F_SETFL, flags | O_NDELAY);

Or to reset O_NDELAY:

	fcntl(sd, F_SETFL, flags & ~O_NDELAY);
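Putting these steps together, a server might poll for clients along the following lines. This is a sketch only: the names set_nonblocking and try_accept are invented for the example, and it uses O_NONBLOCK, the POSIX spelling of the flag, on the assumption that O_NDELAY has the same value on your system (as it does on Linux).

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>

/* Switch non-blocking mode on (enable != 0) or off for descriptor sd.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int set_nonblocking(int sd, int enable)
{
    int flags = fcntl(sd, F_GETFL);

    if (flags == -1)
        return -1;
    if (enable)
        flags |= O_NONBLOCK;
    else
        flags &= ~O_NONBLOCK;
    return fcntl(sd, F_SETFL, flags) == -1 ? -1 : 0;
}

/* Look once for a waiting client on a non-blocking listening socket.
 * Returns a new descriptor if a client was waiting, -2 if nobody was
 * waiting, or -1 on any other error. (Hypothetical helper.) */
int try_accept(int listen_sd)
{
    int newsd = accept(listen_sd, NULL, NULL);   /* client address not needed */

    if (newsd == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return newsd;
}
```

When try_accept() returns -2 the server is free to carry on with other work and check again later.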

Connectionless Communication

In addition to connection based communication, there is also support for connectionless arrangements based on UDP datagram sockets. These sockets are created in the same way as stream based sockets except that the type parameter in the socket() call is SOCK_DGRAM instead of SOCK_STREAM. Once the datagram socket is created, if a particular local address needs to be used it can be bound to the socket with the bind() system call. If the use of bind() is required then this must precede the first use of the socket for data transmission. If bind() is not used then the system will automatically allocate a local address and port number the first time data is sent.

A process should use the system calls sendto() and recvfrom() for sending and receiving data, because they include a parameter to hold the address and port number of the communication partner in their parameter lists. The general form of the sendto() call is:

	sendto(sd, buf, len, flags, addr, addrlen)

where sd, buf and len have the same types and actions as the parameters in the write() system call. The flags parameter, whose values are defined in the header file <linux/socket.h>, is used for several special message options. The addr parameter is a pointer to a structure containing the destination address and port, and addrlen is the size, in bytes, of the address structure.

The recvfrom() system call has the general form:

	recvfrom(sd, buf, len, flags, addr, lenptr)

Its parameter list is similar to the list for sendto(), with sd, buf and len acting like the read() system call. The addr and lenptr parameters are pointers to an address structure and an integer respectively and are filled in by the recvfrom() call with the appropriate information about the communication partner.
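A datagram exchange can be sketched within one process using two sockets on the loopback interface. The function name udp_loopback_demo is invented for this illustration, and getsockname() (not otherwise covered in this text) is used to find the receiver's system-chosen port. The example assumes that the datagram is delivered; on loopback this is normally the case, but if it were lost the recvfrom() call would block.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one datagram from an unbound socket to a bound one over
 * loopback. Returns 0 if it arrives intact with the sender's address
 * filled in, -1 otherwise. (Hypothetical demonstration function.) */
int udp_loopback_demo(void)
{
    struct sockaddr_in dest, from;
    socklen_t len = sizeof dest, fromlen = sizeof from;
    const char msg[] = "ping";
    char buf[64];
    ssize_t n;
    int rx = -1, tx = -1, rc = -1;

    /* Receiver: bind so that the sender has an address to aim at. */
    rx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx == -1) goto done;
    memset(&dest, 0, sizeof dest);
    dest.sin_family = AF_INET;
    dest.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    dest.sin_port = htons(0);              /* let the system choose a port */
    if (bind(rx, (struct sockaddr *)&dest, sizeof dest) == -1) goto done;
    if (getsockname(rx, (struct sockaddr *)&dest, &len) == -1) goto done;

    /* Sender: no bind(); a local port is allocated on the first send. */
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (tx == -1) goto done;
    if (sendto(tx, msg, sizeof msg, 0,
               (struct sockaddr *)&dest, sizeof dest) != (ssize_t)sizeof msg)
        goto done;

    /* recvfrom() fills in the sender's address and its length. */
    n = recvfrom(rx, buf, sizeof buf, 0, (struct sockaddr *)&from, &fromlen);
    if (n == (ssize_t)sizeof msg && memcmp(buf, msg, sizeof msg) == 0 &&
        from.sin_addr.s_addr == htonl(INADDR_LOOPBACK) && from.sin_port != 0)
        rc = 0;

done:
    if (tx != -1) close(tx);
    if (rx != -1) close(rx);
    return rc;
}
```

The address returned in from is what a receiving process would pass back to sendto() in order to reply to the sender.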

Datagram sockets can also use the connect() system call. In this case there is no attempt to establish a connection with another process; the connect() call is just used to specify an association between a datagram socket and a particular destination address and port. Once connect() has been used, any data sent on the socket is automatically routed to the associated destination. This allows the use of the send() and recv() system calls for passing data. The general form of these calls is:

	send(sd, buf, len, flags) 
	recv(sd, buf, len, flags)

These two calls are exactly the same as sendto() and recvfrom(), but without the last two parameters, as this information has already been set up by connect(). The send() and recv() calls can also be used for communication on connection based sockets, as here a process's communication partner has already been permanently established.

Socket Support Calls

In addition to the main set of system calls associated with socket use, which you have now seen, there are a number of support functions which also need a mention. These are related to hostnames, port numbers, standard network services and network byte ordering.

The gethostname() system call allows processes to have access to the local hostname. The form of the call is:

	gethostname(name, len)

where name is a pointer to a char array that will hold the host's domain name, and len is the maximum name length that the array can hold. For privileged (EUID 0) processes, sethostname() can be used to set the local hostname to a specified string.
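A short sketch of the gethostname() call in use is given here. The wrapper name local_hostname is an assumption made for this example; it adds an explicit terminator because a name that is too long for the array may be truncated without one.

```c
#include <stddef.h>
#include <unistd.h>

/* Copy the local hostname into buf (of size buflen) and make sure the
 * result is NUL-terminated. Returns 0 on success, -1 on error.
 * (Hypothetical helper, named for this example only.) */
int local_hostname(char *buf, size_t buflen)
{
    if (buflen == 0 || gethostname(buf, buflen) == -1)
        return -1;
    buf[buflen - 1] = '\0';   /* a truncated name may lack a terminator */
    return 0;
}
```

A caller might declare char name[256] and then call local_hostname(name, sizeof name) before printing the result.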

Once you have a host's domain name available you need to be able to get its IP address or vice versa. The library functions gethostbyname() and gethostbyaddr() perform these tasks. The general form of these calls is:

	hostent = gethostbyname(name)
	hostent = gethostbyaddr(addr, len, type)

The name parameter is a pointer to a char array containing the host's domain name. The addr parameter is a pointer to a host address of length len and address type type. Both calls return a pointer to a struct hostent, defined in <netdb.h>, which contains name, address and address type information for the required host.

There is a set of well-known ports for all the standard network services, and to find the port associated with a particular service name and protocol there is a standard library function called getservbyname(). This has the general form:

	servent = getservbyname(name, protocol)

The name parameter is a pointer to a string containing the name of the required service and the protocol parameter is a pointer to a string containing the appropriate communication protocol (udp or tcp). This function looks up the required information in the file /etc/services and returns a pointer to a structure of type struct servent filled in appropriately. The servent structure is declared in <netdb.h>.
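As an illustration, the sketch below wraps getservbyname() to return a plain port number. The name service_port is invented for this example. The s_port field of struct servent is held in network byte order, so it is converted with ntohs() before being returned.

```c
#include <arpa/inet.h>   /* ntohs */
#include <netdb.h>       /* getservbyname, struct servent */
#include <stddef.h>
#include <stdint.h>

/* Look up a service by name and protocol ("tcp" or "udp").
 * Returns the port in host byte order, or -1 if the service is not
 * listed. (Hypothetical helper, named for this example only.) */
int service_port(const char *name, const char *protocol)
{
    struct servent *sp = getservbyname(name, protocol);

    if (sp == NULL)
        return -1;
    return ntohs((uint16_t)sp->s_port);
}
```

On a system whose /etc/services file has the usual entries, service_port("http", "tcp") would be expected to give 80, but the result depends entirely on the local services database.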

Different machines with different processor architectures store integer values in different ways. The most important difference as far as networking is concerned is the order in which the bytes of an integer are stored. Some processors, including the 80x86 family, store the bytes of an integer low byte first. This means that, on these machines, a pointer to an integer actually points at the least significant byte of the multi-byte value. Other processors store the bytes that make up an integer high byte first, so that an int* in this case points to the high byte of the multi-byte value. If steps were not taken to avoid the problem, then machines of opposite types would have trouble communicating integer values to each other over a network link. The solution to this problem is to agree that all machines will send multi-byte values to each other one way or the other. In fact, the agreed standard is to send high byte first. This means that the low byte first machines (80x86 machines running Linux included) need to reverse their byte order whenever integers are input from or output to the network. To do this conversion, four functions are provided, two to input and output 16-bit values (short) and two more for the input and output of 32-bit values (long). The functions are:

	netval = htonl(hostval) - host to network long;
	netval = htons(hostval) - host to network short;
	hostval = ntohl(netval) - network to host long;
	hostval = ntohs(netval) - network to host short.

All machines supply these functions. On machines that already use the correct byte ordering they are just dummy functions, or macros that do nothing. They are still supplied, however, so that code written on one machine can be portable, even to machines with the opposite processor architectures.
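The effect of the conversion can be seen by looking at the individual bytes of a converted value. The following is a minimal sketch in which the helper names put_u32 and get_u32 are invented for this example.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Store a 32-bit host value into 4 bytes in network byte order,
 * ready to be written to a socket. (Hypothetical helper.) */
void put_u32(unsigned char out[4], uint32_t hostval)
{
    uint32_t netval = htonl(hostval);

    memcpy(out, &netval, sizeof netval);
}

/* Recover a host-order 32-bit value from 4 bytes that were received
 * in network byte order. (Hypothetical helper.) */
uint32_t get_u32(const unsigned char in[4])
{
    uint32_t netval;

    memcpy(&netval, in, sizeof netval);
    return ntohl(netval);
}
```

Whatever the byte order of the machine, put_u32() applied to the value 0x01020304 leaves the bytes 01, 02, 03, 04 in that order, which is the high byte first layout used on the network.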

Obviously, if you only want to transmit streams of bytes (such as strings) then no use of these functions is required. More information on these and related calls is available in sections 2 and 3 of the manual pages.