Online documentation - Websydian v6.0


Basic Internet Technology

The basic structure of the Internet
IP - The Internet Protocol
TCP - The Transmission Control Protocol
Internet Applications and Application-level Protocols

The Internet - a Network of Networks

The Internet is a network of networks. Fundamentally, it consists of a large number of Local Area Networks (LANs) that have been connected to each other. LANs are the networks that organizations use to connect their computers. Computers that are connected by a LAN are usually all located in the same building (hence the name), or in a few buildings located very close to each other. There are many competing and mutually incompatible LAN technologies; the most popular are Novell NetWare and various Microsoft and IBM products, although nowadays everything seems to be converging towards Internet technology. Technically, these LANs are incompatible because they use different communications protocols. Most of these protocols were designed for local area networks only, and hence they do not work well in larger networks, where the computers are separated by large distances or where network bandwidth is scarce.

TCP/IP

TCP/IP is the name of a layered set of protocols designed to work well both in local area networks and in wide area networks. TCP/IP technology is what makes the Internet work, and it is slowly but surely replacing proprietary LAN technology in nearly all computer networks in the world.

Internet Protocol Address

TCP/IP is based on the notion of multiple, connected networks. Each network using this technology is assigned a unique network number. In theory, every single network in the world has its own network number. Networks that are not connected to other networks can use any network number they please. Each computer connected to a network is assigned a unique host number. The host number must be unique within the network to which the computer is connected. The combination of a network number and a host number is called an Internet Protocol address; IP address for short.
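
The split of an IP address into a network number and a host number can be sketched with Python's standard ipaddress module. The address 192.0.2.37 and the /24 prefix length (which says how many leading bits form the network number) are assumptions chosen for illustration; the helper name split_address is hypothetical.

```python
import ipaddress

# Hypothetical example: 192.0.2.0/24 is an address block reserved for documentation.
interface = ipaddress.ip_interface("192.0.2.37/24")

def split_address(iface):
    """Return (network number, host number) as integers."""
    address = int(iface.ip)
    host_mask = int(iface.network.hostmask)           # bits that belong to the host part
    network_number = int(iface.network.network_address)
    host_number = address & host_mask
    return network_number, host_number

network_number, host_number = split_address(interface)
print("network:", ipaddress.ip_address(network_number))
print("host:   ", host_number)
```

Two hosts with the same network number are on the same network; only the host number distinguishes them.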

Two computers can communicate if

they know each other's network and host numbers, and
the two networks to which the computers are connected can reach each other.

TCP/IP is really the name of an entire suite of protocols, of which the two most important are the Internet Protocol (IP) and the Transmission Control Protocol (TCP). Because of their importance, they are discussed briefly in the following sections.

Internet Protocol

The Internet Protocol is the most fundamental protocol in TCP/IP. Its job is to move individual pieces of data from one computer (host or network node, in network terminology) to another. This is achieved using datagrams, which are packets of information consisting of

the IP addresses of the sender and recipient of the datagram,
some data (the payload of the datagram), and
a simple checksum which allows the recipient of the datagram to check that the datagram's header (including the addresses) was not corrupted during transport.

There are a few other things in a datagram, but they are not important for this discussion.
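
As a rough illustration of the three parts listed above, the following toy model bundles two addresses, a payload, and a 16-bit ones' complement checksum. The class name Datagram and its methods are hypothetical, and for simplicity the checksum here is computed over the addresses and the payload together; the real IPv4 header checksum covers only the header fields.

```python
from dataclasses import dataclass
import ipaddress

def internet_checksum(data: bytes) -> int:
    """Complement of the 16-bit ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                               # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)      # fold the carry back in
    return ~total & 0xFFFF

@dataclass
class Datagram:
    source: str
    destination: str
    payload: bytes
    checksum: int = 0

    def _covered_bytes(self) -> bytes:
        # Simplification: the toy checksum covers addresses and payload.
        return (ipaddress.ip_address(self.source).packed
                + ipaddress.ip_address(self.destination).packed
                + self.payload)

    def seal(self) -> "Datagram":
        """Sender side: compute and store the checksum."""
        self.checksum = internet_checksum(self._covered_bytes())
        return self

    def is_intact(self) -> bool:
        """Recipient side: recompute the checksum and compare."""
        return self.checksum == internet_checksum(self._covered_bytes())

datagram = Datagram("192.0.2.1", "198.51.100.7", b"hello").seal()
print(datagram.is_intact())
datagram.payload = b"jello"                           # simulate corruption in transit
print(datagram.is_intact())
```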

There is an upper bound on how much data a single IP datagram can carry. When large amounts of data are to be transmitted, the data is split into several IP datagrams, each of which carries part of the data. On the way from one network node to another, IP datagrams move independently of each other. Two datagrams sent immediately after each other may in fact take two different paths to the destination host.
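
The effect of splitting data and letting each piece travel on its own can be simulated in a few lines. This is only a sketch: the 8-byte limit, the function names, and the use of a random shuffle to stand in for independent routing are all assumptions made for the example.

```python
import random

MAX_PAYLOAD = 8  # hypothetical, deliberately tiny limit so the output stays readable

def split_into_datagrams(data: bytes, max_payload: int = MAX_PAYLOAD):
    """Split data into (offset, chunk) pairs, each chunk at most max_payload bytes."""
    return [(offset, data[offset:offset + max_payload])
            for offset in range(0, len(data), max_payload)]

def deliver_independently(datagrams, seed=1):
    """Simulate independent routing: the arrival order is arbitrary."""
    arrived = list(datagrams)
    random.Random(seed).shuffle(arrived)
    return arrived

message = b"Each datagram travels on its own."
datagrams = split_into_datagrams(message)
arrived = deliver_independently(datagrams)
for offset, chunk in arrived:
    print(offset, chunk)
```

Nothing in this simulation restores the original order; the recipient only gets the pieces as they happen to arrive.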

Everything you see when you use the Internet is achieved using IP. It is, in other words, the way data is moved across the Internet. There are no other ways.

Transmission Control Protocol

Clearly, IP has a very fragmented view of networking. On its own, all it lets you do is send datagrams between hosts (network nodes) on the network. What most applications need is to establish connections between processes on computers. That is exactly what the Transmission Control Protocol (TCP) does. It uses the services provided by IP to create what appears to be a permanently established connection (i.e., the connection remains until it is explicitly disconnected).

TCP allows two processes (i.e. running programs) to send bytes to and receive bytes from each other (octets, in networking terminology, since a byte is an eight-bit unit). TCP ensures that if one process sends a sequence of octets, then that is what the other process will receive at the other end (i.e. exactly those octets, in the sequence in which they were sent).
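
A minimal sketch of such a connection, using Python's standard socket module over the loopback interface: one thread plays the receiving process and echoes every octet back, so the sender can check that it gets exactly what it sent. The function names are hypothetical, and both "processes" are threads in one program purely to keep the example self-contained.

```python
import socket
import threading

def echo_one_connection(server_sock):
    """Accept a single connection and send back every octet received."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:                      # the peer has finished sending
                break
            conn.sendall(data)

def exchange(message: bytes) -> bytes:
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as server_sock:
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=echo_one_connection, args=(server_sock,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)   # signal "no more octets from this side"
            received = b""
            while True:
                chunk = client.recv(1024)
                if not chunk:
                    break
                received += chunk
        worker.join()
    return received

message = b"octets arrive complete and in order"
print(exchange(message) == message)
```

Note that recv may return the data in pieces of any size; TCP preserves the sequence of octets, not the boundaries of individual send calls.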

TCP does all the work to make this happen, and that is a bigger job than it might seem at first, because IP does not guarantee delivery of datagrams. If the underlying telecommunications links used by IP to move the datagrams across the internetwork become congested, datagrams may be dropped. Since IP datagrams are completely independent of each other, IP has no way of detecting that a datagram was lost in transit. TCP, on the other hand, keeps track of the datagrams, retransmits any that were lost, and delivers the data to its client process (i.e., the process that uses the connection) in the correct order.
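
The idea of building reliable, ordered delivery on top of an unreliable datagram service can be sketched as a small simulation. This is a toy model, not the real TCP algorithm: actual TCP numbers individual octets and uses acknowledgements, timers, and windows. The function names, the 30% loss rate, and the round-based retransmission loop are assumptions made for illustration.

```python
import random

def unreliable_send(segments, rng, loss_rate=0.3):
    """Simulated IP: drops some segments and delivers the rest in random order."""
    survivors = [segment for segment in segments if rng.random() >= loss_rate]
    rng.shuffle(survivors)
    return survivors

def reliable_transfer(data: bytes, segment_size=4, seed=7, max_rounds=1000):
    """Return (reassembled data, number of transmission rounds needed)."""
    rng = random.Random(seed)
    # Number each segment so the receiver can detect gaps and restore the order.
    outstanding = {seq: data[start:start + segment_size]
                   for seq, start in enumerate(range(0, len(data), segment_size))}
    total = len(outstanding)
    received = {}
    rounds = 0
    while outstanding:
        rounds += 1
        if rounds > max_rounds:
            raise RuntimeError("gave up after too many retransmissions")
        for seq, chunk in unreliable_send(list(outstanding.items()), rng):
            received[seq] = chunk             # receiver stores by sequence number
        for seq in received:                  # acknowledged segments are not resent
            outstanding.pop(seq, None)
    return b"".join(received[seq] for seq in range(total)), rounds

original = b"The quick brown fox jumps over the lazy dog"
result, rounds = reliable_transfer(original)
print(result == original, "rounds:", rounds)
```

The client process only ever sees the reassembled byte string; the losses, retransmissions, and reordering stay hidden inside the transfer function.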

The services provided by TCP are so valuable that hardly any application uses IP on its own.

We have now seen that the relationship between IP and TCP is one of abstraction. TCP uses the services of IP to provide a better service at a higher level of abstraction (a permanent, reliable connection rather than just a bunch of datagrams). Likewise, IP uses the underlying (telecommunications) data links to provide its end-to-end datagram service. The relationship is illustrated by the figure below showing the layered structure of TCP/IP.

The TCP/IP stack

Data is actually only transferred at the lowest level. At all the higher levels, data is simply passed to a lower level (indicated in the figure above by the small curved arrows). However, each layer communicates with the corresponding layer at the other end of the connection, meaning that the data handed from e.g. the TCP layer to the IP layer on the left side of the figure is identical to the data handed by the IP layer to the TCP layer on the right side.
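
The layering described here can be illustrated with a toy model in which each layer adds a header on the way down and removes it on the way up. The header strings and function names are hypothetical stand-ins; real TCP and IP headers are binary structures with many fields.

```python
TCP_HEADER = b"[tcp]"   # hypothetical stand-ins for the real protocol headers
IP_HEADER = b"[ip]"

def send_down(app_data: bytes, trace: dict) -> bytes:
    """Sending host: each layer adds its header and passes the result down."""
    segment = TCP_HEADER + app_data           # TCP layer wraps the application data
    trace["tcp_to_ip"] = segment              # what TCP hands to IP
    datagram = IP_HEADER + segment            # IP layer wraps the TCP segment
    return datagram                           # handed to the data link

def data_link(datagram: bytes) -> bytes:
    """The only step where bytes actually move between the two hosts."""
    return bytes(datagram)

def pass_up(datagram: bytes, trace: dict) -> bytes:
    """Receiving host: each layer strips its header and passes the rest up."""
    if not datagram.startswith(IP_HEADER):
        raise ValueError("not an IP datagram")
    segment = datagram[len(IP_HEADER):]
    trace["ip_to_tcp"] = segment              # what IP hands to TCP
    if not segment.startswith(TCP_HEADER):
        raise ValueError("not a TCP segment")
    return segment[len(TCP_HEADER):]

left_trace, right_trace = {}, {}
delivered = pass_up(data_link(send_down(b"hello", left_trace)), right_trace)
print(delivered)
print(left_trace["tcp_to_ip"] == right_trace["ip_to_tcp"])
```

The final comparison checks the property stated above: what the TCP layer hands down on one side is what the IP layer hands up on the other.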

Application-Level Protocols

So TCP provides this wonderful service, but why and to whom? The answer is: to applications which need inter-host communication. Software developers can use the services of TCP to implement any communication functionality they want. Anything beyond the reliable connection itself must be implemented by the software developers, i.e. TCP/IP does not provide any further services.

Generally, implementing an application which is distributed across multiple hosts that communicate across a network amounts to implementing a communications protocol.

A communications protocol is simply a set of rules for communicating. It describes in excruciating detail how two computer programs can exchange information in a reliable manner. Implementing a system that uses TCP fundamentally implies specifying how the two processes at each end of the TCP connection are to communicate. This is a relatively difficult task, because the two processes operate independently of each other and because all sorts of error scenarios are possible (e.g. one or the other host could crash or slow down to a crawl, the underlying data links could become congested or be completely lost, etc.). Accounting for (and, if possible, recovering from) all possible error conditions in communications protocols is inherently hard, so few people or organizations have the skills and resources to successfully design, implement, and deploy them. Because of this inherent difficulty, major new protocols appear only rarely.
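
To make the idea of "a set of rules" concrete, here is a minimal sketch of a made-up application-level protocol: every message is preceded by a 4-byte length, so the receiver knows where one message ends and the next begins. The framing rule and the function names are assumptions invented for this example, and socket.socketpair() stands in for an established TCP connection.

```python
import socket
import struct

HEADER = struct.Struct("!I")   # assumed rule: 4-byte big-endian length before each message

def send_message(sock, payload: bytes) -> None:
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exactly(sock, count: int) -> bytes:
    """Read exactly count bytes; a byte stream may deliver them in several pieces."""
    buffer = b""
    while len(buffer) < count:
        chunk = sock.recv(count - len(buffer))
        if not chunk:
            # One of the error scenarios a protocol has to account for.
            raise ConnectionError("peer closed the connection mid-message")
        buffer += chunk
    return buffer

def recv_message(sock) -> bytes:
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)

left, right = socket.socketpair()
with left, right:
    send_message(left, b"HELLO")
    send_message(left, b"")               # an empty message is still a valid message
    print(recv_message(right))
    print(recv_message(right))
```

Even this tiny protocol already has to decide what happens when the other side disappears halfway through a message, which hints at why complete protocol designs are hard.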

TCP/IP allows computers with widely varying operating systems and application software to communicate because the protocols completely specify how two processes are to communicate. Hence, anybody can implement systems which behave as required by the protocol specification. For this to be possible, the protocol specifications must be publicly available. Only protocols whose specifications are widely available can be widely implemented and hence widely used.

Certain applications (and hence the corresponding communications protocols) were implemented almost immediately after the design of TCP/IP was complete. They include:

File Transfer Protocol (FTP), a protocol for browsing directories and files in a remote machine, and uploading and downloading files.
Telnet, a protocol for logging into a remote multi-user host.
Simple Mail Transfer Protocol (SMTP), a protocol for exchanging e-mail messages between hosts.
Network News Transfer Protocol (NNTP), a protocol for exchanging and managing distributed discussion groups between hosts.

All of these protocols have freely available implementations that almost all larger Internet sites use (with the exception of SMTP, which has a significant number of commercial implementations).

The above protocols have remained almost unchanged since their very first official releases, a timespan of about twenty years. That is a very long time in computing (and even more so in networking).

The most important new TCP/IP-based application to come along this decade is the World Wide Web (WWW), which defined the Hypertext Transfer Protocol (HTTP) to transfer documents between hypertext servers (web servers) and hypertext display clients (web browsers).

This application, and the communications protocol on which it is based, are the subject of the next part of this introduction to Websydian and Internet technology.

Proceed to the next section, World Wide Web Technology.