Trying to figure out whether you should use websockets or HTTP? This article explains what a websocket is, how it works, where it's used, and how it differs from HTTP.
What is a websocket?
A websocket is a persistent connection between a client and a server. It provides a bidirectional, full-duplex communication channel that is opened with an HTTP request and then runs over a single TCP connection. The WebSocket protocol was standardized by the IETF as RFC 6455 in 2011.
The WebSocket API, available in web browsers, makes it possible to open a two-way interactive communication session between your user's browser and your server. With it, you can send messages to the server and receive event-driven responses without having to poll the server for a reply.
Although a websocket connection is functionally similar to a standard Unix-style socket, the two are not related.
A websocket is a framed protocol: each piece of data (a message) is carried in one or more discrete frames, and each frame encodes the size of its own payload. A frame consists of a short header, which includes the frame type (the opcode) and the payload length, followed by the payload data.
The most important fields of a WebSocket frame are:
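To make the idea of splitting a message concrete, here is a minimal Python sketch of how a sender might cut one text message into fragments. It assumes the RFC 6455 opcode values 0x1 (first frame of a text message) and 0x0 (continuation frame); the helper names `fragment_message` and `reassemble` and the `max_fragment` size are hypothetical. It only models the (FIN, opcode, payload) triple of each frame, not the byte-level header.

```python
TEXT = 0x1          # opcode of the first frame of a text message
CONTINUATION = 0x0  # opcode of every later fragment of the same message


def fragment_message(message: str, max_fragment: int) -> list[tuple[bool, int, bytes]]:
    """Split a text message into (fin, opcode, payload) triples (hypothetical helper)."""
    if max_fragment <= 0:
        raise ValueError("max_fragment must be positive")
    data = message.encode("utf-8")
    # Slice the encoded bytes; an empty message still needs one (empty) frame.
    chunks = [data[i:i + max_fragment] for i in range(0, len(data), max_fragment)] or [b""]
    frames = []
    for index, chunk in enumerate(chunks):
        opcode = TEXT if index == 0 else CONTINUATION
        fin = index == len(chunks) - 1  # only the last fragment sets FIN
        frames.append((fin, opcode, chunk))
    return frames


def reassemble(frames: list[tuple[bool, int, bytes]]) -> str:
    """Join fragments back into the original message (hypothetical helper)."""
    if not frames or frames[0][1] != TEXT or not frames[-1][0]:
        raise ValueError("expected a text frame first and FIN on the last frame")
    return b"".join(payload for _, _, payload in frames).decode("utf-8")
```

For example, `fragment_message("Hello, world", 5)` yields three frames carrying b"Hello", b", wor" and b"ld", with FIN set only on the last one.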
Fin Bit
The Fin bit is the first bit of the WebSocket frame header. It is set to 1 when this frame is the final fragment of a message.
RSV1, RSV2, RSV3 Bits
These bits are reserved for extensions. They must be 0 unless an extension negotiated during the handshake defines a meaning for them.
Opcode
Every frame carries a 4-bit opcode that determines how its payload data should be interpreted. Here's a list of the opcode values with their descriptions:
- 0x00 - the frame continues the payload of the previous frame (a continuation frame).
- 0x01 - this opcode denotes a text frame. The payload of a text frame is UTF-8 encoded text.
- 0x02 - this opcode denotes a binary frame. The payload of a binary frame is arbitrary bytes, delivered to the application without any changes.
- 0x03-0x07 - these opcodes are reserved for future non-control frames.
- 0x08 - this is a close frame. Either the client or the server sends it to start closing the connection.
- 0x09 - this is a ping frame. It functions as a heartbeat mechanism to ensure that the connection is still alive. The receiver needs to respond with a pong.
- 0x0a - this is a pong frame, sent in response to a ping. A pong can also be sent unsolicited as a one-way heartbeat; the receiver does not need to respond to it.
- 0x0b-0x0f - these opcodes are reserved for future control frames.
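The opcode rules above can be captured in a few lines of Python. This is a sketch, not a library API: the `Opcode` enum and the `is_control`, `is_reserved` and `reply_to_control` helpers are hypothetical names. It relies on the fact that control frames (close, ping, pong and the reserved 0x0b-0x0f range) all have the highest bit of the 4-bit opcode set, and it shows that a ping is answered with a pong carrying the same payload, while a pong needs no reply.

```python
from enum import IntEnum
from typing import Optional, Tuple


class Opcode(IntEnum):
    CONTINUATION = 0x0
    TEXT = 0x1
    BINARY = 0x2
    CLOSE = 0x8
    PING = 0x9
    PONG = 0xA


def is_control(opcode: int) -> bool:
    # Control opcodes are 0x8-0xF, i.e. the high bit of the 4-bit field is set.
    return (opcode & 0x8) != 0


def is_reserved(opcode: int) -> bool:
    # 0x3-0x7 are reserved data opcodes, 0xB-0xF reserved control opcodes.
    return 0x3 <= opcode <= 0x7 or 0xB <= opcode <= 0xF


def reply_to_control(opcode: int, payload: bytes) -> Optional[Tuple[Opcode, bytes]]:
    """Return the heartbeat reply to send back, or None (hypothetical helper)."""
    if opcode == Opcode.PING:
        return (Opcode.PONG, payload)  # a pong echoes the ping's payload
    # A pong (solicited or not) never requires a reply; other frames are not heartbeats.
    return None
```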
Mask
Setting this bit to 1 indicates that the payload is masked. The protocol requires every frame sent from the client to the server to be masked with a random 32-bit key (the mask) chosen by the client, while frames sent from the server to the client must not be masked. The masking key is combined with the payload data using an XOR operation before the frame is sent. Masking stops caches from misinterpreting WebSocket frames as cacheable data.
When the WebSocket protocol was being developed, it was found that if an attacker deployed a malicious server and clients connected to it, intermediate proxies or other infrastructure could end up caching that server's responses, so that future clients requesting the same data would receive the attacker's response instead of the correct one. This attack is known as cache poisoning, and it arises because you cannot control how misbehaving proxies act. That is a real problem when you introduce a new protocol like WebSocket that has to work with the existing infrastructure of the internet.
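The XOR masking described above is simple enough to sketch directly. In this hypothetical Python helper, byte i of the payload is XORed with byte i mod 4 of the 32-bit masking key; because XOR is its own inverse, the same function both masks (on the client) and unmasks (on the server). Using `os.urandom` for the key is an assumption for illustration; the protocol only requires the key to be unpredictable and freshly chosen for each frame.

```python
import os


def apply_mask(payload: bytes, masking_key: bytes) -> bytes:
    """XOR payload byte i with masking_key[i % 4]; masking and unmasking are identical."""
    if len(masking_key) != 4:
        raise ValueError("the masking key must be exactly 4 bytes (32 bits)")
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


def new_masking_key() -> bytes:
    # One reasonable source of an unpredictable per-frame key (an assumption, not a mandate).
    return os.urandom(4)
```

With the masking key 37 fa 21 3d used in RFC 6455's own examples, `apply_mask(b"Hello", key)` produces the bytes 7f 9f 4d 51 58.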
Payload len
The Payload len field and the Extended payload length field together encode the length of the frame's payload data. If the payload is 125 bytes or smaller, the length is stored directly in the 7-bit Payload len field. If Payload len is 126, the following 16 bits hold the actual length (up to 65,535 bytes); if it is 127, the following 64 bits hold the length.
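The three length encodings can be sketched as follows. This hypothetical `encode_length` helper builds the second byte of the frame header (the MASK bit plus the 7-bit Payload len) followed by any extended length, written in network byte order with Python's `struct` module. The 64-bit form is capped at 2^63 - 1 because the protocol requires its most significant bit to be 0.

```python
import struct

MAX_PAYLOAD = 2**63 - 1  # the top bit of the 64-bit length must be 0


def encode_length(length: int, masked: bool) -> bytes:
    """Return the MASK/Payload len byte plus any extended length (hypothetical helper)."""
    if not 0 <= length <= MAX_PAYLOAD:
        raise ValueError("payload length out of range")
    mask_bit = 0x80 if masked else 0x00
    if length <= 125:
        # Small payloads fit directly in the 7-bit field.
        return bytes([mask_bit | length])
    if length <= 0xFFFF:
        # 126 means "the next 2 bytes hold the length".
        return bytes([mask_bit | 126]) + struct.pack("!H", length)
    # 127 means "the next 8 bytes hold the length".
    return bytes([mask_bit | 127]) + struct.pack("!Q", length)
```

For a 256-byte unmasked payload this returns 7e 01 00: the marker 126 followed by the 16-bit length.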
Masking-key
This field is closely tied to the mask bit. Every frame sent from the client to the server is masked with a 32-bit value carried in the frame itself. The Masking-key field is present when the mask bit is set to 1 and absent when it is set to 0.
Payload data
The payload data consists of any extension data negotiated between the client and the server, followed by arbitrary application data. Extensions are negotiated during the initial handshake and let you extend the WebSocket protocol with additional behavior, such as per-message compression.
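Putting the fields together, the following Python sketch decodes a single frame from a byte buffer: FIN, RSV bits, opcode, mask bit, payload length (including the extended forms), masking key and payload. The `Frame` class and `parse_frame` function are hypothetical names, and the sketch assumes the buffer starts at a frame boundary. A production parser would also need to reject unmasked client frames, fragmented or oversized (more than 125-byte) control frames, and non-zero RSV bits without a negotiated extension.

```python
import struct
from dataclasses import dataclass


@dataclass
class Frame:
    fin: bool
    rsv: int        # RSV1-RSV3 packed into a 3-bit integer
    opcode: int
    masked: bool
    payload: bytes  # already unmasked


def parse_frame(data: bytes) -> tuple[Frame, int]:
    """Decode one frame from the start of data; return it and the bytes consumed."""

    def need(end: int) -> None:
        if len(data) < end:
            raise ValueError("incomplete frame")

    need(2)
    b0, b1 = data[0], data[1]
    fin = bool(b0 & 0x80)
    rsv = (b0 >> 4) & 0x07
    opcode = b0 & 0x0F
    masked = bool(b1 & 0x80)
    length = b1 & 0x7F
    offset = 2

    if length == 126:  # 16-bit extended length follows
        need(offset + 2)
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:  # 64-bit extended length follows
        need(offset + 8)
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8

    key = b""
    if masked:  # the 32-bit masking key sits right before the payload
        need(offset + 4)
        key = data[offset:offset + 4]
        offset += 4

    need(offset + length)
    payload = data[offset:offset + length]
    if masked:
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return Frame(fin, rsv, opcode, masked, payload), offset + length
```

Fed the single-frame "Hello" examples from RFC 6455 (unmasked 81 05 48 65 6c 6c 6f, and masked 81 85 37 fa 21 3d 7f 9f 4d 51 58), it returns a final text frame with payload b"Hello" in both cases.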
What is a websocket used for?
A websocket is used to open a two-way interactive communication session between a user's browser and a server. It lets you send messages to the server and receive event-driven responses without polling the server for a reply, which makes it a good fit for real-time features such as chat, live notifications, multiplayer games and live dashboards.
How do websockets work?
A websocket connection starts with a WebSocket handshake that uses one of two URI schemes, ws or wss. These can be thought of as the equivalents of HTTP and HTTPS respectively: wss connections are encrypted with TLS.
When one of these schemes is used, the client and server must follow the standard WebSocket opening handshake. The client starts it by sending an HTTP/1.1 GET request that asks to upgrade the connection, using headers such as Connection: Upgrade, Upgrade: websocket, Sec-WebSocket-Key and Sec-WebSocket-Version.
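Because ws and wss mirror http and https, they also share their default ports (80 and 443). As a sketch, a client could pull the connection details out of a WebSocket URL with Python's standard `urllib.parse.urlsplit`; the `parse_ws_url` helper and the example URLs are hypothetical. RFC 6455 also forbids fragment identifiers (#...) in WebSocket URIs, so the sketch rejects them.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"ws": 80, "wss": 443}  # same defaults as http and https


def parse_ws_url(url: str) -> tuple[str, int, str, bool]:
    """Return (host, port, resource path, use_tls) for a ws/wss URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a WebSocket URL: {url}")
    if parts.fragment:
        raise ValueError("fragment identifiers are not allowed in WebSocket URLs")
    if not parts.hostname:
        raise ValueError("missing host")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query  # the query string is part of the requested resource
    return parts.hostname, port, path, parts.scheme == "wss"
```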
Here’s how the connection gets established:
The request
The ‘Connection: Upgrade’ and ‘Upgrade: websocket’ headers mark the request as a WebSocket handshake. The ‘Sec-WebSocket-Key’ header carries a Base64-encoded random value (16 random bytes before encoding), which is generated anew for every handshake.
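A minimal Python sketch of the client side of this step: it generates a fresh Sec-WebSocket-Key from 16 random bytes and formats the upgrade request. The `make_handshake_request` helper, the host name and the path are hypothetical, and the sketch only builds the text of the request; sending it over a TCP (or TLS) socket is left out. Sec-WebSocket-Version: 13 is the version defined by RFC 6455.

```python
import base64
import os


def make_handshake_request(host: str, path: str = "/") -> tuple[str, str]:
    """Return (request text, key) for a WebSocket opening handshake (hypothetical helper)."""
    # A new Base64-encoded 16-byte nonce for every handshake.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"  # a blank line ends the HTTP header block
    )
    return request, key
```

The client keeps the returned key so that it can check the server's answer in the next step.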
The response
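If the server accepts the handshake, it replies with an HTTP 101 Switching Protocols status. The ‘Sec-WebSocket-Accept’ response header contains a value derived from the ‘Sec-WebSocket-Key’ request header: the server appends a fixed GUID defined in RFC 6455 to the key, hashes the result with SHA-1 and Base64-encodes the hash. Because only a server that actually understands the WebSocket protocol will produce this value, the client can reject responses from servers, proxies or caches that were never meant to accept WebSocket connections, which keeps misconfigured servers from causing errors in the application.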
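The accept value is easy to compute with Python's standard `hashlib` and `base64` modules. The helper names `compute_accept` and `make_handshake_response` are hypothetical; the GUID constant and the example key/accept pair come from RFC 6455, which uses the key dGhlIHNhbXBsZSBub25jZQ== and expects s3pPLMBiTxaQ9kYGzzhZRbK+xOo= in return.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def make_handshake_response(sec_websocket_key: str) -> str:
    """Build the server's 101 response (hypothetical helper)."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {compute_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

On the client side, the same `compute_accept` call can be used to check that the response header matches the key that was sent.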
Websocket vs HTTP
Websockets and HTTP are both used for application communication, which might confuse you when you’re trying to figure out which one you should opt for. Let’s see how they are different from each other.
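WebSocket is a framed, bidirectional protocol in which either side can send a message at any time. HTTP, in contrast, is a request-response protocol: the client sends a request and the server answers it. Both protocols run on top of TCP.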
The websocket protocol has the ability to support continual data transmission. It is widely used in real-time application development. HTTP, on the other hand, is stateless and is generally used for developing RESTful applications.
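Because either side can send data as soon as it is available, and each message carries only a few bytes of framing instead of a full set of HTTP headers, websockets typically deliver frequent, small messages with lower latency and overhead than HTTP. With HTTP, the server cannot send anything until the client makes a request, so real-time updates require repeated requests or polling, which adds delay.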
A websocket uses a single, long-lived TCP connection that stays open until one side closes it. With HTTP/1.0, a separate connection was opened for each request and closed as soon as the response was delivered; HTTP/1.1 and later can keep a connection open and reuse it, but every exchange is still a client request followed by a server response.