The Node.js platform itself is billed as a solution for writing fast and scalable network applications. To write network-oriented software, you need to understand how networking technologies and protocols interrelate. Over the course of the next section, we explain how networks have been designed around technology stacks with clear boundaries, how Node implements these protocols, and what their APIs look like.
Here you’ll learn how Node’s networking modules work, including the dgram, dns, http, and net modules. If you’re unsure about network terminology like socket, packet, and protocol, then don’t worry: we also introduce key networking concepts to give you a solid foundation in network programming.
This section is an introduction to networking. You’ll learn about network layers, packets, sockets—all of the stuff that networks are made of. These ideas are critical to understanding Node’s networking APIs.
Networking jargon can quickly become overwhelming. To get everyone on the same page, we’ve included table 7.1, which summarizes the main concepts that will form the basis of this chapter. If you don’t know the difference between TCP (Transmission Control Protocol) and UDP (User Datagram Protocol), for example, it will be difficult to know when to use each protocol. In this section we introduce the terms you need to know and then explore the concepts a bit more so you leave the section with a solid foundation.
If you’re responsible for implementing high-level protocols that run on top of HTTP or even low-latency game code that uses UDP, then you should understand each of these concepts. We break each of these concepts down into more detail over the next few sections.
The stack of protocols and standards that make up the internet and internet technology in general can be modeled as layers. The lowest layers represent physical media—Ethernet, Bluetooth, fiber optics—the world of pins, voltages, and network adapters.
As software developers, we work at a higher level than the hardware. When talking to networks with Node, we’re concerned with the application and transport layers of the Internet Protocol (IP) suite.
Layers are best represented visually. Figure 7.1 relates logical network layers to packets. The lower-level physical and data-link layer protocols wrap higher-level protocols.
Packets are wrapped by protocols at consecutive layers. A TCP packet, which could represent part of a series of packets from an HTTP request, is contained in the data section of an IP packet, which in turn is wrapped by an Ethernet packet. Going back to figure 7.1, TCP packets from HTTP requests cut through the transport and application layers: TCP is the transport layer protocol on which the higher-level HTTP protocol is built. The other layers are also involved, but we don’t always know which specific protocols are used at each layer: HTTP is always transmitted over TCP/IP, but beyond that, Wi-Fi or Ethernet can be used, and your programs won’t know the difference.
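The wrapping described above can be sketched with plain Buffers. This is a toy model rather than a real protocol implementation: the wrap helper and the bracketed header strings are hypothetical stand-ins for the binary headers that each layer really adds.

```javascript
// Toy model of encapsulation: each layer prepends its own header to the
// payload handed down by the layer above. Real headers are binary structures;
// the bracketed strings are placeholders for illustration only.
function wrap(header, payload) {
  return Buffer.concat([Buffer.from(header), payload]);
}

const httpRequest = Buffer.from('GET / HTTP/1.1\r\nHost: example.com\r\n\r\n');
const tcpSegment = wrap('[TCP header]', httpRequest);       // transport layer
const ipPacket = wrap('[IP header]', tcpSegment);           // network layer
const ethernetFrame = wrap('[Ethernet header]', ipPacket);  // link layer

// The outermost header comes first, and the HTTP request sits innermost.
console.log(ethernetFrame.toString());
```

Reading the output from left to right mirrors how a receiving host unwraps the data: each layer strips its own header and passes the remainder upward.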
Figure 7.2 shows how network layers are wrapped by each protocol. Notice that data is never seen to move more than one step between layers: we don’t talk about application layer protocols interacting directly with the network layer.
When writing Node programs, you should appreciate that HTTP is implemented using TCP because Node’s http module is built on the underlying TCP implementation found in the net module. But you don’t need to understand how Ethernet, 10BASE-T, or Bluetooth works.
You’ve probably heard of TCP/IP. This is the common name for the Internet Protocol suite, so called because the Transmission Control Protocol (TCP) and the Internet Protocol (IP) were the earliest and most important protocols defined by this standard.
In the Internet Protocol, a host is identified by an IP address. In IPv4, addresses are 32-bit, which limits the address space to roughly 4.3 billion addresses. IP has been at the center of controversy over the last decade because addresses are running out. To fix this, a new version of the protocol known as IPv6, which uses 128-bit addresses, was developed.
You can make TCP connections with Node by using the net module. This allows you to implement application layer protocols that don’t have dedicated core modules: IRC, POP, and even FTP could be implemented on top of net. If you find yourself needing to talk to nonstandard TCP protocols, perhaps something used internally in your company, then net.Socket and net.createConnection will make light work of it.
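As a minimal sketch of what this looks like in practice, the following starts a tiny echo server and talks to it with net.createConnection. The echoOnce helper, the loopback address, and the use of an ephemeral port (port 0) are assumptions made for the example, not part of any particular protocol.

```javascript
const net = require('net');

// Hypothetical helper: starts an echo server on an ephemeral port, sends one
// message to it over TCP, and resolves with whatever the server sent back.
function echoOnce(message) {
  return new Promise((resolve, reject) => {
    const server = net.createServer((socket) => {
      socket.pipe(socket); // write every received byte straight back
    });
    server.on('error', reject);

    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      const client = net.createConnection({ port, host: '127.0.0.1' }, () => {
        client.write(message);
      });

      let reply = '';
      client.setEncoding('utf8');
      client.on('data', (chunk) => {
        reply += chunk;
        // TCP is a byte stream, so the reply may arrive in several chunks.
        if (reply.length >= message.length) client.end();
      });
      client.on('error', reject);
      client.on('close', () => {
        server.close(() => resolve(reply));
      });
    });
  });
}

echoOnce('hello over TCP\n').then((reply) => {
  console.log('echoed:', reply.trim());
});
```

A custom protocol would replace the pipe call with its own parsing logic, but the connection handling stays much the same.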
Node supports both IPv4 and IPv6 in several ways: the dns module can query IPv4 and IPv6 records, and the net module can transmit and receive data to hosts on IPv4 and IPv6 networks.
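One small illustration of this dual support is net.isIP, which reports the address family of a string. The sample addresses below are chosen only for the example.

```javascript
const net = require('net');

// net.isIP returns 4 for an IPv4 address, 6 for an IPv6 address,
// and 0 when the string is not a valid IP address at all.
const samples = ['127.0.0.1', '::1', 'not-an-address'];
const families = samples.map((address) => net.isIP(address));

console.log(families); // [ 4, 6, 0 ]
```

On the dns side, dns.resolve4 and dns.resolve6 query IPv4 (A) and IPv6 (AAAA) records respectively.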
The interesting thing about IP is that it doesn’t guarantee data integrity or delivery. For reliable communication, we need a transport layer protocol like TCP. There are also times when delivery isn’t strictly required, although of course it’s preferred. In these situations a lighter protocol is needed, and that’s where UDP comes in. The next section examines TCP and UDP in more detail.
Datagrams are the basic unit of communication in UDP. These messages are self-contained, holding a source, destination, and some user data. UDP doesn’t guarantee delivery or message order, or offer protection against duplicated data. Most protocols you’ll use with Node programs will be built on TCP, but there are times when UDP is useful. If delivery isn’t critical, but performance is desired, then UDP may be a better choice. One example is a streaming video service, where occasional glitches are an acceptable trade-off to gain more throughput.
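To make the idea of a self-contained datagram concrete, here is a minimal sketch using the dgram module. The sendDatagram helper is hypothetical, and the example assumes both sockets live on the loopback interface, where loss is unlikely. Over a real network the message could simply never arrive.

```javascript
const dgram = require('dgram');

// Hypothetical helper: sends a single datagram between two local UDP sockets
// and resolves with the received text and the sender's address.
function sendDatagram(text) {
  return new Promise((resolve, reject) => {
    const receiver = dgram.createSocket('udp4');
    const sender = dgram.createSocket('udp4');
    receiver.on('error', reject);

    // Each 'message' event delivers one whole datagram plus sender details.
    receiver.on('message', (msg, rinfo) => {
      sender.close();
      receiver.close();
      resolve({ text: msg.toString(), from: rinfo.address });
    });

    receiver.bind(0, '127.0.0.1', () => {
      const { port } = receiver.address();
      // No connection is set up first: the datagram is simply addressed and sent.
      sender.send(Buffer.from(text), port, '127.0.0.1', (err) => {
        if (err) {
          sender.close();
          receiver.close();
          reject(err);
        }
      });
    });
  });
}

sendDatagram('hello over UDP').then(({ text, from }) => {
  console.log(`received "${text}" from ${from}`);
});
```

Notice there is no acknowledgment step. If the application needs to know that a message arrived, it has to build that on top.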
TCP and UDP both use the same network layer: IP. Both provide services to application layer protocols. But they’re very different. TCP is a connection-oriented and reliable byte stream service, whereas UDP is based around datagrams, and doesn’t guarantee the delivery of data.
Contrast this with TCP, which is a full-duplex, connection-oriented protocol. In TCP, there are only ever two endpoints for a given connection. The basic unit of information passed between endpoints is known as a segment: the combination of a chunk of data along with a header. When you hear the term packet, a TCP segment is generally being referred to.
Although UDP packets include checksums that help detect corruption, which can occur as a datagram travels across the internet, there’s no automatic retransmission of corrupt packets. It’s up to your application to handle this if required. Packets with invalid data are, in effect, silently discarded.
Every packet, whether it’s TCP or UDP, has an origin and destination address. But the source and destination programs are also important. When your Node program connects to a DNS server or accepts incoming HTTP connections, there has to be a way to map between the packets traveling along the network and the programs that generated them. To fully describe a connection, you need an extra piece of information. This is known as a port number—the combination of a port number and an address is known as a socket. Read on to learn more about ports and how they relate to sockets.
The basic unit of a network, from a programmer’s perspective, is the socket. A socket is the combination of an IP address and a port number, and there are both TCP and UDP sockets. As you saw in the previous section, a TCP connection is full-duplex: opening a connection to a given host allows communication to flow to and from that host. Although the term socket is correct, historically “socket” meant the Berkeley Sockets API.
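The address-and-port pairing is visible on every net.Socket. As a rough sketch, the hypothetical describeConnection helper below opens a loopback connection and reports both endpoints. The ephemeral port numbers are assigned by the operating system, so they will differ between runs.

```javascript
const net = require('net');

// Hypothetical helper: connects to a local server and resolves with the
// address:port pair at each end of the TCP connection.
function describeConnection() {
  return new Promise((resolve, reject) => {
    // The server only needs to accept the connection and drain any input.
    const server = net.createServer((socket) => socket.resume());
    server.on('error', reject);

    server.listen(0, '127.0.0.1', () => {
      const serverPort = server.address().port;
      const client = net.createConnection(serverPort, '127.0.0.1', () => {
        const endpoints = {
          local: `${client.localAddress}:${client.localPort}`,
          remote: `${client.remoteAddress}:${client.remotePort}`,
          serverPort,
        };
        client.end();
        server.close(() => resolve(endpoints));
      });
      client.on('error', reject);
    });
  });
}

describeConnection().then(({ local, remote }) => {
  console.log(`client socket ${local} is connected to server socket ${remote}`);
});
```

The client’s local port is picked automatically, which is why two connections from the same machine to the same server can be told apart.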
THE BERKELEY SOCKETS API: Berkeley Sockets, released in 1983, was an API for working with internet sockets. This is the original API for the TCP/IP suite. Although the origins lie in Unix, Microsoft Windows includes a networking stack that closely follows Berkeley Sockets.
There are well-known port numbers for standard TCP/IP services. They include DNS, HTTP, SSH, and more. Many of the oldest assignments are odd numbers for historical reasons. TCP and UDP port spaces are separate, so the same number can be used by both. If an application layer protocol requires both TCP and UDP connections, then the convention is to use the same port number for both connections. An example of a protocol that uses both UDP and TCP is DNS.
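The fact that TCP and UDP ports don’t collide can be checked with a short experiment. The sketch below is an illustration under a few assumptions: the bindTcpAndUdpOnSamePort helper is hypothetical, it works on the loopback interface, and it assumes no other process happens to hold the same UDP port.

```javascript
const net = require('net');
const dgram = require('dgram');

// Hypothetical helper: listens on a TCP port, then binds a UDP socket to the
// same port number, and resolves with both numbers.
function bindTcpAndUdpOnSamePort() {
  return new Promise((resolve, reject) => {
    const tcp = net.createServer();
    const udp = dgram.createSocket('udp4');

    const fail = (err) => {
      // Release whatever was opened before reporting the failure.
      if (tcp.listening) tcp.close();
      try { udp.close(); } catch (closeErr) { /* socket was already closed */ }
      reject(err);
    };
    tcp.on('error', fail);
    udp.on('error', fail);

    tcp.listen(0, '127.0.0.1', () => {
      const tcpPort = tcp.address().port;
      // Binding UDP to a port already used by TCP is allowed.
      udp.bind(tcpPort, '127.0.0.1', () => {
        const udpPort = udp.address().port;
        udp.close(() => tcp.close(() => resolve({ tcpPort, udpPort })));
      });
    });
  });
}

bindTcpAndUdpOnSamePort().then(({ tcpPort, udpPort }) => {
  console.log(`TCP port ${tcpPort} and UDP port ${udpPort} were open together`);
});
```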
In Node, you can create TCP sockets with the net module, and UDP is supported by the dgram module. Other networking protocols are also supported—DNS is a good example.
The following sections look at the application layer protocols included in Node’s core modules.
Node has a suite of networking modules that allows you to build web and other server applications. Over the next few sections we’ll cover DNS, TCP, HTTP, and encryption.
The Domain Name System (DNS) is the naming system for addressing resources connected to the internet (or even a private network). Node has a core module called dns for looking up and resolving addresses. Like other core modules, dns has asynchronous APIs. In this case, the implementation is also asynchronous, apart from certain methods that are backed by a thread pool. This means DNS queries in Node are fast, but also have a friendly API that is easy to learn.
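The distinction between the two kinds of methods can be sketched as follows. dns.lookup uses the operating system’s resolver and runs on the thread pool, whereas dns.resolve4 always sends a DNS query over the network. The lookupHost and resolveA wrapper names are hypothetical, and the example only calls lookupHost with localhost so that it doesn’t depend on a reachable DNS server.

```javascript
const dns = require('dns');

// Uses the operating system's resolver (hosts file, system DNS settings).
// This is one of the methods backed by the thread pool.
function lookupHost(hostname) {
  return dns.promises.lookup(hostname);
}

// Always performs a DNS query over the network for IPv4 (A) records.
function resolveA(hostname) {
  return dns.promises.resolve4(hostname);
}

lookupHost('localhost').then(({ address, family }) => {
  console.log(`localhost resolved to ${address} (IPv${family})`);
});
```

Calling resolveA with a public hostname returns an array of IPv4 address strings, provided a DNS server can be reached.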
You don’t often have to use this module, but we’ve included techniques because it’s a powerful API that can come in handy for network programming. Most application layer protocols, HTTP included, accept hostnames rather than IP addresses.
Node also provides modules for networking protocols that we’re more familiar with—for example, HTTP.
HTTP is important to most Node developers. Whether you’re building web applications or calling web services, you’re probably interacting with HTTP in some way. Node’s http core module is built on the net, stream, buffer, and events modules. It’s low-level, but can be used to create simple HTTP servers and clients without too much effort.
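As a minimal sketch of both halves, the following creates an HTTP server and requests a page from it with http.get. The fetchGreeting helper, the /hello path, and the response text are made up for the example.

```javascript
const http = require('http');

// Hypothetical helper: starts a local HTTP server, requests /hello from it,
// and resolves with the status code and response body.
function fetchGreeting() {
  return new Promise((resolve, reject) => {
    const server = http.createServer((req, res) => {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end(`you asked for ${req.url}`);
    });
    server.on('error', reject);

    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      // agent: false uses a one-off connection, so the server can close
      // promptly once the response has been read.
      const options = { host: '127.0.0.1', port, path: '/hello', agent: false };
      http.get(options, (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => {
          server.close(() => resolve({ status: res.statusCode, body }));
        });
      }).on('error', reject);
    });
  });
}

fetchGreeting().then(({ status, body }) => {
  console.log(status, body);
});
```

The response arrives as a stream of chunks, which reflects the module’s foundations in the stream and events modules.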
Due to the importance of the web to Node development, we’ve included several techniques that explore Node’s http module. Also, when we’re working with HTTP we often need to use encryption—Node also supports encryption through the crypto and tls modules.
You should know the term SSL (Secure Sockets Layer) because it’s how secure web pages have traditionally been served to web browsers. Not just HTTP traffic gets encrypted, though: other services, like email, encrypt messages as well. Encrypted TCP connections now use TLS (Transport Layer Security), the successor to SSL. Node’s tls module is implemented using OpenSSL.
This type of encryption is based on public key cryptography. The server must have a private key, and clients need one only if they authenticate with their own certificates. The server makes its public key available so clients can encrypt messages. To decrypt these messages, access to the server’s private key is required.
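The relationship between the two keys can be sketched with Node’s crypto module. This illustrates the asymmetric idea only: a real TLS handshake uses public key techniques to agree on session keys rather than encrypting application data with RSA directly. The encryptFor and decryptWith helpers are hypothetical names, and the 2048-bit RSA key is an assumption made for the example.

```javascript
const crypto = require('crypto');

// Generate a key pair. The private key stays with the "server", while the
// public key can be handed out freely.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
});

// Anyone holding the public key can encrypt a message.
function encryptFor(key, text) {
  return crypto.publicEncrypt(key, Buffer.from(text, 'utf8'));
}

// Only the holder of the matching private key can decrypt it.
function decryptWith(key, ciphertext) {
  return crypto.privateDecrypt(key, ciphertext).toString('utf8');
}

const ciphertext = encryptFor(publicKey, 'a secret message');
console.log(decryptWith(privateKey, ciphertext));
```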
Node supports TLS by allowing TCP servers to be created that support several ciphers. The TLS server class, tls.Server, inherits from net.Server, so once you’ve got your head around TCP clients and servers in Node, encrypted connections are just an extension of these principles.
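Both points can be checked without setting up certificates. The short sketch below inspects the class relationship and asks the tls module which ciphers it knows about. The exact list depends on the OpenSSL version bundled with your Node build.

```javascript
const net = require('net');
const tls = require('tls');

// tls.Server extends net.Server, so a TLS server is also a TCP server.
const inheritsFromNetServer = tls.Server.prototype instanceof net.Server;

// tls.getCiphers() returns the names of the supported TLS ciphers.
const ciphers = tls.getCiphers();

console.log('tls.Server inherits from net.Server:', inheritsFromNetServer);
console.log('number of supported ciphers:', ciphers.length);
```

Creating a working TLS server additionally requires a private key and certificate passed to tls.createServer.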
A solid understanding of TLS is important if you want to deploy web applications with Node. People are increasingly concerned with security and privacy, and unfortunately SSL/TLS is designed in such a way that programmer error can cause security weaknesses.
There’s one final aspect of networking in Node that we’d like to introduce before we move on to the techniques for this chapter: how Node is able to give you asynchronous APIs to networking technologies that are sometimes blocking at the system level.