This article is adapted from an article by the Jianshu author "Dimensional _Woo", with some content trimmed. Thanks to the original author for sharing.
HTTP (HyperText Transfer Protocol) is the most widely used network protocol on the Internet. All WWW files must comply with this standard. HTTP was originally designed to provide a way to publish and receive HTML pages.
For mobile instant messaging (IM applications in particular), today's mainstream data communication is nothing more than long connections plus short connections, and short connections are an application of the HTTP protocol introduced in this article. A correct understanding of the HTTP protocol is therefore quite useful when writing IM software. (For how HTTP is applied on the mobile side, see "Summary of optimization methods for short connections in modern mobile networks: request speed, weak-network adaptation, security guarantees".)
This article is long, so let's preview the mind map first:
- Mobile IM development entry article: "Getting started with one is enough: develop mobile IM from zero"
(This article was published simultaneously at: http://www.52im.net/thread-1677-1-1.html)
2, "the father of HTTP"
▲ "Father of HTTP" - Ted Nelson
▲ HTTP protocol logo
In 1960, Ted Nelson conceived a method of processing textual information through a computer, and called it hypertext, which became the basis for the development of the HTTP hypertext transfer protocol standard architecture.
The World Wide Web Consortium and the Internet Engineering Task Force collaborated on the research and eventually released a series of RFCs, the most famous of which is RFC 2616. RFC 2616 defines HTTP/1.1, the version of the HTTP protocol in common use today.
Because of Ted Nelson's breakthrough historical contribution to the development of HTTP technology, he is known as the "father of HTTP."
3, a series of articles
This article is the sixth in a series of articles.
If you feel that this series of articles is too basic, you can read the "Unknown Network Programming" series directly.
5, HTTP overview
5.1 Computer Network Architecture Layering
5.2 TCP/IP communication transport stream
When the TCP/IP protocol suite is used for network communication, the two parties communicate through the layers in order. The sender works down from the application layer, and the receiver works up from the link layer.
The TCP/IP communication transport stream is as follows:
First, the client acting as the sender issues an HTTP request to view a web page at the application layer (HTTP protocol);
Then, for ease of transmission, the transport layer (TCP protocol) splits the data received from the application layer (the HTTP request message) into segments, marks each segment with a sequence number and port number, and forwards it to the network layer;
At the network layer (IP protocol), the destination IP address is added and the packet is forwarded to the link layer, which adds the destination MAC address. At this point, the request is ready to be sent onto the network;
The server at the receiving end receives the data at the link layer and passes it up through the layers in order until it reaches the application layer. Only when the data reaches the application layer has the server actually received the HTTP request sent by the client.
The HTTP request is shown below:
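To make the layering concrete, here is a minimal sketch of how each layer wraps the HTTP request on the way down and unwraps it on the way up. The bracketed headers are toy placeholders, not real TCP, IP, or Ethernet header formats, and the port, addresses, and host name are illustrative assumptions. The sketch follows the usual convention of placing the destination IP address at the network layer and the MAC address at the link layer.

```python
def send_down(http_message: bytes) -> bytes:
    """Sender side: each layer prepends its own (toy) header."""
    segment = b"[TCP dst_port=80 seq=1]" + http_message    # transport layer
    packet = b"[IP dst=192.0.2.10]" + segment              # network layer
    frame = b"[ETH dst_mac=00:00:5e:00:53:01]" + packet    # link layer
    return frame


def receive_up(frame: bytes) -> bytes:
    """Receiver side: strip the headers in the opposite order."""
    data = frame
    for marker in (b"[ETH", b"[IP", b"[TCP"):
        if not data.startswith(marker):
            raise ValueError(f"expected a {marker!r} header")
        # Drop everything up to and including the closing bracket.
        data = data[data.index(b"]") + 1:]
    return data


# Hypothetical request; example.test is a placeholder host.
request = b"GET /index.html HTTP/1.1\r\nHost: example.test\r\n\r\n"
print(receive_up(send_down(request)) == request)
```

The application layer on the receiving side gets back exactly the bytes that the sending application handed to the transport layer.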
In the network architecture, there are numerous network protocols. This article focuses on the HTTP protocol (HTTP/1.1 version).
The HTTP protocol (HyperText Transfer Protocol) is a transfer protocol for transmitting hypertext from a WWW server to a local browser. It can make the browser more efficient and reduce network transmission. It not only ensures that the computer transmits the hypertext document correctly and quickly, but also determines which part of the document is transferred and which part of the content is displayed first (such as text before graphics).
HTTP is an application layer communication protocol between a client (a browser or another program) and a web server. Hypertext information is stored on web servers on the Internet, and the client retrieves the hypertext it wants to access through the HTTP protocol. HTTP carries both commands and transfer information. It is used not only for Web access but also for communication between other Internet/Intranet applications, which makes it possible to integrate hypermedia access to all kinds of application resources.
The website address we enter in the browser's address bar is called a URL (Uniform Resource Locator). Just as every household has a street address, every web page has an Internet address. When you enter a URL in the browser's address bar or click a hyperlink, the URL determines the address to browse. The browser fetches the page code of the site from the web server through the Hypertext Transfer Protocol (HTTP) and renders it as a web page.
6, HTTP work process
HTTP request response model:
In a complete HTTP exchange, the client and the server go through the following seven steps:
1) Establish a TCP connection: Before HTTP work starts, the client first establishes a connection with the server over the network. The connection is made with TCP, which together with IP forms the foundation of the Internet, the well-known TCP/IP protocol family; this is why the Internet is also called a TCP/IP network. HTTP is an application layer protocol that sits above TCP. By rule, a higher-layer connection can only be made after the lower-layer connection is established, so the TCP connection must be established first. The default port for HTTP over TCP is 80.
2) The client sends a request command to the server: once the TCP connection is established, the client sends a request command to the server;
For example: GET /sample/hello.jsp HTTP/1.1
3) The client sends the request headers: after sending the request command, the client sends further information to the server in the form of header fields, and then sends a blank line to tell the server that the headers are complete;
4) The server responds: after the client makes its request, the server returns a response to the client;
For example: HTTP/1.1 200 OK
The first part of the response is the protocol version and the response status code;
5) The server returns the response headers: just as the client sends information about itself along with the request, the server sends information about itself and about the requested document along with the response;
6) The server sends data to the client: after sending the headers, the server sends a blank line to mark the end of the headers, and then sends the data the user actually requested, in the format described by the Content-Type response header;
7) The server closes the TCP connection: in general, once the server has returned the requested data, it closes the TCP connection. However, if the client or the server includes the line Connection: keep-alive in its headers, the TCP connection stays open after the response is sent, so the client can continue to send requests over the same connection. Keeping the connection open saves the time needed to establish a new connection for each request and saves network bandwidth.
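The seven steps can be walked through in a short, hedged sketch. To keep it runnable without a network, a connected pair from `socket.socketpair()` stands in for the TCP connection of step 1 (an assumption of this example, not how a real client connects), and both roles run in the same process. The host name and the response body are placeholders.

```python
import socket


def http_exchange():
    # Step 1: a connected socket pair stands in for the TCP connection.
    client, server = socket.socketpair()

    # Steps 2-3: request line, request headers, then a blank line.
    request = (
        "GET /sample/hello.jsp HTTP/1.1\r\n"
        "Host: example.test\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    client.sendall(request.encode("ascii"))

    # Server side: read the request and pick out the request line.
    received = server.recv(4096).decode("ascii")
    request_line = received.split("\r\n", 1)[0]

    # Steps 4-6: status line, response headers, blank line, then the data.
    body = b"<html>hello</html>"
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    server.sendall(head.encode("ascii") + body)

    # Step 7: the server closes the connection.
    server.close()

    # The client reads until the connection is closed by the server.
    chunks = []
    while True:
        data = client.recv(4096)
        if not data:
            break
        chunks.append(data)
    client.close()

    head_bytes, _, payload = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head_bytes.split(b"\r\n", 1)[0].decode("ascii")
    return request_line, status_line, payload


print(http_exchange())
```

A real client would call `socket.create_connection((host, 80))` in step 1; the rest of the exchange has the same shape.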
7, HTTP protocol basis
7.1 Communication is achieved through the exchange of requests and responses
When the HTTP protocol is used, one end must take the client role and the other the server role. The client and server roles are determined per communication line. The HTTP protocol stipulates that the client sends a request and the server then responds to that request. In other words, communication is always initiated by the client, and the server does not send a response until it has received a request.
7.2 HTTP is a protocol that does not save state
HTTP is a stateless protocol. The protocol itself does not keep the state of the communication between requests and responses. That is, at the HTTP level, the protocol does not persist any information about requests or responses that have already been sent. HTTP was deliberately designed to be this simple so that it can process a large number of transactions quickly and remain scalable.
However, as the Web has developed, many services need to keep the communication state, so cookie technology was introduced. With cookies on top of HTTP communication, state can be managed.
7.3 Using state management of cookies
Cookie technology controls client state by writing cookie information into request and response messages. The server tells the client to save a cookie through a header field called Set-Cookie in the response message. The next time the client sends a request to the server, it automatically adds the cookie value to the request message. When the server finds the cookie sent by the client, it checks which client the request came from, compares it with the records on the server, and recovers the earlier state information.
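The cookie round trip can be sketched with Python's standard `http.cookies` module. The cookie name `sid`, the session id, and the `SESSIONS` table are hypothetical names chosen for illustration; a real server would generate unpredictable session ids and usually add attributes such as `HttpOnly`.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side records, keyed by session id.
SESSIONS = {"abc123": {"user": "alice", "cart_items": 2}}


def build_set_cookie(session_id: str) -> str:
    """Server side: produce the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["path"] = "/"
    return cookie["sid"].OutputString()


def find_session(cookie_header: str):
    """Server side: map the Cookie request header back to stored state."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sid")
    if morsel is None:
        return None
    return SESSIONS.get(morsel.value)


# First response: the server asks the client to store the cookie.
print("Set-Cookie:", build_set_cookie("abc123"))
# Later request: the client sends the cookie back and the state is recovered.
print(find_session("sid=abc123; theme=dark"))
```

The protocol itself stays stateless; the state lives in the server's records and is only looked up through the identifier carried in the cookie.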
7.4 Locating resources with request URIs
The HTTP protocol uses URIs to locate resources on the Internet. It is because of this function of the URI that a resource can be reached from anywhere on the Internet.
7.5 HTTP methods that tell the server the intent (HTTP/1.1)
7.6 Persistent connection
In the initial version of the HTTP protocol, the TCP connection was closed after every HTTP exchange. For example, when a browser loads an HTML page containing multiple images, requesting the HTML page also triggers requests for the other resources the page contains. Each of those requests therefore causes a needless TCP connection setup and teardown, which increases traffic overhead.
To solve this problem, HTTP/1.1 and some HTTP/1.0 implementations introduced persistent connections. With a persistent connection, the TCP connection is kept open as long as neither end explicitly closes it, so that multiple requests and responses can be exchanged over a single TCP connection. In HTTP/1.1, all connections are persistent by default.
Persistent connections also make it possible to send requests in a pipelined manner. Previously, after sending a request, the client had to wait for the response before it could send the next request. With pipelining, the next request can be sent without waiting for the previous response, so multiple requests can be in flight at the same time instead of being answered one by one.
For example, when requesting an HTML page that contains multiple images, a persistent connection lets the requests finish sooner than opening a separate connection for each one, and pipelining is faster still. The more requests there are, the more obvious the time difference.
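As a rough illustration of pipelining over one persistent connection, the sketch below sends two requests back to back before reading any response. It assumes a connected pair from `socket.socketpair()` in place of a real TCP connection and plays both roles in one process; the paths, the host name, and the response bodies are made up. Each response carries a Content-Length so that the client knows where one response ends and the next begins.

```python
import socket


def read_head(reader):
    """Read header lines up to the blank line that ends the head."""
    lines = []
    while True:
        line = reader.readline()
        if line in (b"\r\n", b""):
            return lines
        lines.append(line.rstrip(b"\r\n").decode("ascii"))


def read_response(reader):
    """Read one response whose body length is given by Content-Length."""
    head = read_head(reader)
    length = 0
    for line in head[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    return head[0], reader.read(length)


def pipelined_demo():
    client, server = socket.socketpair()
    client_reader = client.makefile("rb")
    server_reader = server.makefile("rb")

    # Pipelining: both requests are sent before any response is read.
    for path in ("/index.html", "/logo.png"):
        client.sendall(
            f"GET {path} HTTP/1.1\r\nHost: example.test\r\n\r\n".encode("ascii")
        )

    # The server answers the requests in order on the same connection.
    for _ in range(2):
        request_line = read_head(server_reader)[0]
        body = f"body of {request_line.split(' ')[1]}".encode("ascii")
        head = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n\r\n"
        server.sendall(head.encode("ascii") + body)

    results = [read_response(client_reader) for _ in range(2)]

    for closable in (client_reader, server_reader, client, server):
        closable.close()
    return results


print(pipelined_demo())
```

Responses come back in the same order as the requests, which is how the client matches them up.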
8, HTTP message structure
8.1 HTTP Messages
The information exchanged over the HTTP protocol is called an HTTP message. The HTTP message sent by the requesting end (client) is called a request message; the one sent by the responding end (server) is called a response message. An HTTP message is itself a string of text made up of multiple lines of data (with CR+LF as the line break).
8.2 HTTP Message Structure
An HTTP message can be roughly divided into two parts: the message header and the message body. The two are separated by the first blank line (CR+LF). A message body is not always present.
The HTTP message structure is as follows:
8.3 Request message structure
The header of the request message consists of the following data:
Request line - contains the method used for the request, the request URI, and the HTTP version;
Header fields - the various headers that describe the conditions and attributes of the request (general headers, request headers, entity headers, and headers not defined in the RFC, such as cookies).
An example of a request message is as follows:
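As a hedged illustration, the sketch below assembles a request message from its parts and splits it back into request line, header fields, and body. The method, URI, host, and form data are made-up sample values, and the parser assumes well-formed input with a single space after each colon.

```python
def build_request(method, uri, headers, body=b""):
    """Request line, header fields, a blank line, then the optional body."""
    lines = [f"{method} {uri} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_request(raw):
    """Split a raw request at the first blank line and decode its head."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, uri, version = request_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, uri, version, headers, body


form = b"name=ueno&age=37"
raw = build_request(
    "POST",
    "/form/entry",
    {
        "Host": "example.test",
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(form)),
    },
    form,
)
print(raw.decode("ascii"))
print(parse_request(raw))
```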
8.4 Response Message Structure
The header of the response message consists of the following data:
Status line - contains the HTTP version, a status code indicating the result of the response, and a reason phrase;
Header fields - the various headers that describe the conditions and attributes of the response (general headers, response headers, entity headers, and headers not defined in the RFC, such as cookies).
An example of a response message is as follows:
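A minimal sketch of reading a response message: it splits the raw bytes at the first blank line, then breaks the status line into version, status code, and reason phrase. The sample response is invented for illustration, and the parser assumes well-formed input.

```python
def parse_response(raw):
    """Return (version, status code, reason phrase, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Only the first colon separates the name from the value.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Tue, 10 Jul 2012 06:50:15 GMT\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)
print(parse_response(sample))
```

Header names are lowercased here because field names are case-insensitive, which makes later lookups simpler.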
9, HTTP header fields (key analysis)
9.1 Header Field Overview
Let's first review the location of the header field in the message. The HTTP message contains the message header and the message body. The message header contains the request line (or status line) and the header field.
Among the many fields of the message, the HTTP header field contains the most abundant information. The header field exists in both the request and response messages and covers the content information related to the HTTP message. The header field is used to provide the client and server with the message body size, language used, authentication information, and so on.
9.2 Header field structure
The HTTP header field consists of the header field name and the field value separated by a colon ":".
In addition, a single HTTP header field can carry multiple values.
What happens when two or more header fields with the same name appear in an HTTP message header is not clearly defined in the specification. The result depends on the browser's internal processing logic, so which field takes priority may differ from one browser to another.
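The name/value structure can be sketched as follows. Repeated field names are collected into a list rather than silently overwritten, which is one way to make the ambiguity visible to the caller. The header lines are sample values, and splitting on commas is a simplification that does not suit every header (dates in cookie attributes, for instance, contain commas).

```python
from collections import defaultdict


def parse_header_fields(lines):
    """Map each lowercased field name to the list of values seen for it."""
    fields = defaultdict(list)
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a header field: {line!r}")
        fields[name.strip().lower()].append(value.strip())
    return dict(fields)


def split_list_value(value):
    """Split a comma-separated field value into its individual items."""
    return [item.strip() for item in value.split(",") if item.strip()]


fields = parse_header_fields([
    "Content-Type: text/html",
    "Keep-Alive: timeout=15, max=100",
    "Accept-Encoding: gzip",
    "Accept-Encoding: deflate",
])
print(fields)
print(split_list_value(fields["keep-alive"][0]))
```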
9.3 Header field types
Header fields fall into the following four types according to their use: general header fields, request header fields, response header fields, and entity header fields.
9.4 General Header Fields (HTTP/1.1)
9.5 Request Header Fields (HTTP/1.1)
9.6 Response Header Fields (HTTP/1.1)
9.7 Entity Header Fields (HTTP/1.1)
9.8 Header fields that serve cookies
10, other header fields
HTTP header fields can be extended freely, so web servers and browsers use a variety of non-standard header fields. The following are the most commonly used ones.
X-Frame-Options: DENY
The header field X-Frame-Options is an HTTP response header that controls whether the site's content may be displayed inside the frame tags of other websites. Its main purpose is to prevent clickjacking attacks. The header field X-Frame-Options can be given the following two field values:
DENY: Loading in a frame is refused.
SAMEORIGIN: Loading is permitted only for pages under the same origin (top-level browsing context). (For example, if http://sample.com/sample.html is served with SAMEORIGIN, frames in any page on sample.com may load it, but pages on other domains such as example.com may not.)
X-XSS-Protection: 1
The header field X-XSS-Protection is an HTTP response header. It is a countermeasure against cross-site scripting (XSS) attacks that switches the browser's XSS protection mechanism on or off. The field values that can be specified in X-XSS-Protection are as follows:
0 : Disable XSS filtering
1 : Enable XSS filtering
DNT: 1
The header field DNT is an HTTP request header. DNT is short for Do Not Track; it expresses a refusal to have personal information collected and is a way to opt out of tracking by targeted advertising. The field values that can be specified in DNT are as follows:
0 : Agree to be tracked
1 : Refuse to be tracked
For the DNT header field to take effect, the web server must support DNT accordingly.
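A small sketch of how an application might set and read these fields. The helper names are hypothetical and plain dictionaries stand in for whatever header objects a real web framework provides.

```python
def security_response_headers(allow_same_origin_frames=False):
    """Response headers covering the framing and XSS-filter fields above."""
    return {
        "X-Frame-Options": "SAMEORIGIN" if allow_same_origin_frames else "DENY",
        "X-XSS-Protection": "1",
    }


def refuses_tracking(request_headers):
    """True when the request carries DNT: 1 (field names are case-insensitive)."""
    normalized = {
        name.lower(): value.strip() for name, value in request_headers.items()
    }
    return normalized.get("dnt") == "1"


print(security_response_headers())
print(refuses_tracking({"Host": "example.test", "DNT": "1"}))
```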
P3P: CP="CAO DSP LAW CURa ADMa DEVa TAIa PSAa PSDa IVAa IVDa OUR BUS IND
The header field P3P is an HTTP response header. By using P3P (the Platform for Privacy Preferences) technology, a website can express its handling of personal privacy in a form that programs can understand, in order to protect the user's privacy.
To set up P3P, follow the steps below:
Step 1: Create the P3P privacy policy
Step 2: Create the P3P privacy policy reference file and save it as /w3c/p3p.xml
Step 3: Create a compact policy from the P3P privacy policy and output it in the HTTP response
11, HTTP response status code
100 Continue
The server has received only part of the request. As long as the server has not rejected the request, the client should continue to send the rest of it.
101 Switching Protocols
Switching protocols: the server will switch to another protocol as requested by the client.
200 OK
The request succeeded (the response to the GET or POST request follows).
201 Created
The request has been fulfilled and a new resource has been created.
202 Accepted
The request has been accepted for processing, but the processing is not complete.
203 Non-authoritative Information
The document has returned normally, but some of the response headers may be incorrect because a copy of the document is being used.
204 No Content
No new documents. The browser should continue to display the original document. This status code is useful if the user refreshes the page periodically and the servlet can determine that the user's document is new enough.
205 Reset Content
No new documents. But the browser should reset what it displays. Used to force the browser to clear the form input.
206 Partial Content
The client sent a GET request with a Range header and the server completed it.
300 Multiple Choices
Multiple choices. List of links. The user can select a link to reach the destination. A maximum of five addresses are allowed.
301 Moved Permanently
The requested page has been moved permanently to a new URL.
302 Found
The requested page has been moved temporarily to a new URL.
303 See Other
The requested page can be found under another URL.
304 Not Modified
The document has not been modified. The client has a cached copy and issued a conditional request (typically with an If-Modified-Since header, meaning that the client wants the document only if it is newer than the specified date). The server tells the client that the cached copy can still be used.
305 Use Proxy
The document requested by the client should be extracted by the proxy server indicated by the Location header.
306 Unused
This code was used in an earlier version. It is no longer used, but the code is still reserved.
307 Temporary Redirect
The requested page has been moved temporarily to a new URL.
400 Bad Request
The server failed to understand the request.
401 Unauthorized
The requested page requires a username and password.
401.2 - The server configuration caused the login to fail.
401.3 - Unauthorized due to ACL restrictions on the resource.
401.4 - Filter authorization failed.
401.5 - ISAPI/CGI application authorization failed.
401.7 - Access is denied by the URL authorization policy on the web server. This error code is specific to IIS 6.0.
402 Payment Required
This code is reserved and not yet in use.
403 Forbidden
Access to the requested page is forbidden.
403.1 - Execute access is forbidden.
403.2 - Read access is forbidden.
403.3 - Write access is forbidden.
403.4 - SSL is required.
403.5 - SSL 128 is required.
403.6 - The IP address was rejected.
403.7 - A client certificate is required.
403.8 - Site access was denied.
403.9 - There are too many users.
403.10 - The configuration is invalid.
403.12 - Access to the mapping table is denied.
403.13 - The client certificate is revoked.
403.14 - The directory listing is denied.
403.15 - The client access license limit was exceeded.
403.16 - The client certificate is not trusted or is invalid.
403.17 - The client certificate has expired or is not yet valid.
403.18 - The requested URL cannot be executed in the current application pool. This error code is specific to IIS 6.0.
403.19 - CGI cannot be executed for clients in this application pool. This error code is specific to IIS 6.0.
403.20 - Passport login failed. This error code is specific to IIS 6.0.
404 Not Found
The server could not find the requested page.
404.0 - (none): no file or directory was found.
404.1 - The website cannot be accessed on the requested port.
404.2 - The web service extension lockdown policy blocks this request.
404.3 - The MIME mapping policy blocks this request.
405 Method Not Allowed
The method specified in the request is not allowed.
406 Not Acceptable
The server generated response cannot be accepted by the client.
407 Proxy Authentication Required
The user must first authenticate using a proxy server so that the request is processed.
408 Request Timeout
The request exceeded the wait time of the server.
409 Conflict
The request could not be completed due to a conflict.
410 Gone
The requested page is no longer available.
411 Length Required
The Content-Length header is not defined. Without it, the server will not accept the request.
412 Precondition Failed
The prerequisites in the request were evaluated as failed by the server.
413 Request Entity Too Large
The server will not accept the request because the requested entity is too large.
414 Request-URI Too Long
The server will not accept the request because the URL is too long. This happens when a POST request is converted to a GET request with very long query information.
415 Unsupported Media Type
Since the media type is not supported, the server will not accept the request.
416 Requested Range Not Satisfiable
The server cannot satisfy the Range header specified by the client in the request.
417 Expectation Failed
The server could not meet the expectation given in the request's Expect header.
500 Internal Server Error
The request was not completed. The server encountered an unpredictable situation.
500.12 - The application is busy restarting on the web server.
500.13 - The web server is too busy.
500.15 - Direct requests for Global.asa are not allowed.
500.16 - The UNC authorization credentials are incorrect. This error code is specific to IIS 6.0.
500.18 - The URL authorization store cannot be opened. This error code is specific to IIS 6.0.
500.100 - Internal ASP error.
501 Not Implemented
The request was not completed. The server does not support the requested feature.
502 Bad Gateway
The request was not completed. The server received an invalid response from the upstream server.
502.1 - The CGI application timed out.
502.2 - An error occurred in the CGI application.
503 Service Unavailable
The request was not completed. The server is temporarily overloaded or down.
504 Gateway Timeout
The gateway timed out.
505 HTTP Version Not Supported
The server does not support the HTTP protocol version specified in the request.
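The first digit of a status code gives its class (1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error). The sketch below uses the standard library's `http.HTTPStatus` to look up the reason phrase and derives the class from the first digit. The function name is an assumption of this example, and IIS sub-status codes such as 403.1 are not covered because they never appear in the status line itself.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return 'code phrase (class)' for a numeric HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not in the standard library's registry.
        phrase = "Unknown"
    status_class = STATUS_CLASSES.get(code // 100, "unknown class")
    return f"{code} {phrase} ({status_class})"


for code in (200, 304, 404, 503):
    print(describe_status(code))
```

A client that meets an unfamiliar code can still react sensibly by falling back to the class.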
12, HTTP message entity
12.1 Overview of HTTP Message Entities
Please take a closer look at the content of each component in the example above.
Next, let's look at the concepts of message and entity. If you think of an HTTP message as a box in the Internet freight system, the HTTP entity is the actual cargo inside the message.
Message: the unit of data exchanged and transmitted over the network, that is, the block of data a station sends at one time. A message contains the complete data to be sent; its length varies widely and is neither fixed nor limited;
Entity: the payload data (supplementary item) carried by a request or response, consisting of the entity header and the entity body. (The entity header is covered in section 9.7 above.)
In the picture above, the content of the dark red box on the right is the entity part of the message, and the two blue boxes are the entity header and the entity body. The content of the pink box on the left is the message body.
Usually the message body is identical to the entity body. Only when an encoding is applied during transfer does the content of the entity body change, so that it differs from the message body.
12.2 Content Encoding
HTTP applications sometimes need to encode content before sending it. For example, a server may compress a large HTML document before sending it over a slow connection, which reduces the time needed to transfer the entity. A server can also scramble or encrypt content to prevent unauthorized third parties from seeing the content of the document.
This kind of encoding is applied to the content by the sender. Once the content has been content-encoded, the encoded data is placed in the entity body and sent to the recipient as usual.
Content encoding type:
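gzip is one widely used content coding. As a minimal sketch, the sender below compresses the entity body and labels it with a Content-Encoding header, and the recipient reverses the transformation based on that header. The function names and the plain-dict headers are assumptions of this example.

```python
import gzip


def encode_body(body: bytes):
    """Sender: compress the entity body and describe the coding in headers."""
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
    }
    return headers, compressed


def decode_body(headers, payload: bytes) -> bytes:
    """Recipient: undo the content coding named in Content-Encoding."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(payload)
    return payload


document = b"<p>hello</p>" * 200
headers, payload = encode_body(document)
print(len(document), "->", len(payload), "bytes")
print(decode_body(headers, payload) == document)
```

Note that Content-Length describes the encoded body, since that is what actually travels in the message.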
12.3 Transfer encoding
Content encoding is a reversible transformation of the message body and is closely tied to the specific format of the content.
Transfer encoding is also a reversible transformation applied to the entity body, but it is used for architectural reasons and is independent of the format of the content. Transfer encoding changes the way the data in a message is transmitted over the network.
12.4 Chunked encoding
Chunked encoding splits the message into chunks of known size. The chunks are sent one after another, so the size of the whole message does not need to be known before sending starts. Chunked encoding is a form of transfer encoding and is therefore a property of the message.
If the connection between the client and the server is not persistent, the client does not need to know the length of the body it is reading; it only needs to read until the server closes the connection.
With a persistent connection, the server must know the size of the body before writing it and send that size in the Content-Length header. If the server creates the content dynamically, it may not know the length of the body before sending it.
Chunked encoding solves this difficulty by letting the server send the body in chunks, stating only the size of each chunk. Because the body is created dynamically, the server can buffer part of it, send the size of that part followed by the chunk itself, and repeat the process until the whole body has been sent. The server uses a chunk of size 0 to signal the end of the body, so the connection can stay open and be ready for the next response.
Let's take a look at an example of a chunk-encoded message:
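The wire format can be sketched in a few lines: each chunk is its size in hexadecimal, CR+LF, the chunk data, and CR+LF, and a chunk of size 0 ends the body. The helper names are hypothetical, and this simplified decoder ignores trailers that may follow the final chunk.

```python
def encode_chunked(parts):
    """Encode an iterable of byte strings as a chunked message body."""
    out = bytearray()
    for part in parts:
        if part:  # a zero-size chunk is reserved for the end marker
            out += f"{len(part):X}\r\n".encode("ascii") + part + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk: size 0, then the final blank line
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from a chunked message body."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size_text = data[pos:line_end].split(b";")[0].decode("ascii")
        size = int(size_text, 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CR+LF


encoded = encode_chunked([b"Hello, ", b"world"])
print(encoded)
print(decode_chunked(encoded))
```

The response carrying such a body would use the header Transfer-Encoding: chunked instead of Content-Length.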
12.5 Multipart media types
A multipart email message in MIME contains multiple messages that are sent together as a single complex message. Each part is independent and has its own set of headers describing its content, and the parts are joined together by a delimiter string.
Correspondingly, the HTTP protocol also adopts multipart object collections, so that a single message body can contain entities of several types.
The multipart object collection contains the following objects:
multipart/form-data: used when uploading files from a web form;
multipart/byteranges: used when a response with status code 206 Partial Content contains content from multiple ranges.
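As a rough sketch of the multipart/form-data layout, the helper below joins plain text fields with a boundary string. The function name, field names, and boundary are made up for illustration; a file upload would add a filename and a Content-Type header to its part, and real clients pick a boundary that cannot occur in the data.

```python
def build_form_data(fields, boundary):
    """Return (Content-Type value, body) for simple text form fields."""
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}",                                     # part delimiter
            f'Content-Disposition: form-data; name="{name}"',    # part header
            "",                                                  # blank line
            value,                                               # part body
        ]
    lines += [f"--{boundary}--", ""]  # closing delimiter
    body = "\r\n".join(lines).encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body


content_type, body = build_form_data(
    {"user": "alice", "note": "hello"}, "example-boundary-42"
)
print(content_type)
print(body.decode("utf-8"))
```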
12.6 Range requests
Suppose you are downloading a large file and the network is interrupted when the download is three-quarters done; the download would then have to start over. To solve this problem, a resumable mechanism is needed that continues the download from the point of interruption. This is what range requests provide.
With range requests, an HTTP client can resume downloading an entity by requesting only the range (part) of the entity that it failed to get. This assumes that the object has not changed between the client's last request for the entity and the range request. For example:
GET /bigfile.html HTTP/1.1
Host: www.sample.com
Range: bytes=20224-
In the above example, the client requests the part of the document that follows the first 20224 bytes.
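On the server side, answering such a request comes down to parsing the Range header and slicing the entity. The sketch below handles a single byte range only; the function names are hypothetical, and the in-memory `body` stands in for a file on disk.

```python
def parse_range(header_value: str, total: int):
    """Return (start, end) inclusive byte positions for a single byte range."""
    unit, _, spec = header_value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is supported here")
    start_text, _, end_text = spec.strip().partition("-")
    if start_text == "":
        # Suffix form "bytes=-N": the last N bytes of the entity.
        start, end = max(total - int(end_text), 0), total - 1
    else:
        start = int(start_text)
        end = min(int(end_text), total - 1) if end_text else total - 1
    if start > end or start >= total:
        raise ValueError("range not satisfiable")
    return start, end


def partial_response(body: bytes, range_header: str):
    """Build the status code, headers, and payload of a 206 response."""
    start, end = parse_range(range_header, len(body))
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(body)}",
        "Content-Length": str(end - start + 1),
    }
    return 206, headers, body[start:end + 1]


body = bytes(range(256)) * 100  # a 25600-byte stand-in for the large file
status, headers, payload = partial_response(body, "bytes=20224-")
print(status, headers, len(payload))
```

A server that cannot satisfy the range would answer with 416 Requested Range Not Satisfiable instead of raising an error as this sketch does.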
13, Web servers that work with HTTP
Besides the client and the server, several other applications help relay HTTP communication. The more important ones are listed below: proxies, caches, gateways, tunnels, and agents.
The HTTP proxy server is an important building block for web security, application integration, and performance optimization. A proxy sits between the client and the server, receives all HTTP requests from the client, and forwards them to the server (possibly after modifying them). To the user, these applications act as a proxy that accesses the server on the user's behalf.
For security reasons, a proxy is typically used as a trusted intermediate node through which all web traffic is forwarded. Proxies can also filter requests and responses, for example to enforce safe or content-filtered browsing.
The first request from the browser:
The browser requests again:
A web cache, or caching proxy, is a special type of HTTP proxy server that keeps copies of popular documents that pass through the proxy. The next client that requests the same document can be served from the cache's own copy. Downloading a document from a nearby cache is much faster for the client than downloading it from a remote web server.
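The idea can be reduced to a few lines: the first request for a URL is forwarded to the origin server and the result is stored, and later requests for the same URL are answered from the stored copy. The class and the `fetch_from_origin` callback are hypothetical names, and this sketch never expires or revalidates entries, which a real cache must do.

```python
class CachingProxy:
    """Toy caching proxy: stores every document it has forwarded."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: url -> document bytes
        self._store = {}                  # url -> cached copy

    def get(self, url):
        if url in self._store:
            return self._store[url], "HIT"    # served from the cached copy
        document = self._fetch(url)           # forwarded to the origin server
        self._store[url] = document
        return document, "MISS"


origin_requests = []


def origin_server(url):
    """Stand-in for the remote web server; records each request it sees."""
    origin_requests.append(url)
    return b"<html>document</html>"


proxy = CachingProxy(origin_server)
print(proxy.get("http://example.test/page"))
print(proxy.get("http://example.test/page"))
print(len(origin_requests))
```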
A gateway is a special kind of server that acts as an intermediary for other servers. It is usually used to convert HTTP traffic to other protocols. A gateway receives requests as if it were the origin server of the resource, and the client may not know that it is communicating with a gateway.
A tunnel is an HTTP application that, once established, blindly relays raw data between two connections. HTTP tunnels are typically used to forward non-HTTP data over one or more HTTP connections without inspecting the data.
A common use of HTTP tunneling is to carry encrypted Secure Sockets Layer (SSL) traffic over an HTTP connection so that the SSL traffic can pass through a firewall that only allows web traffic.
An agent (user agent) is a client application that issues HTTP requests on behalf of a user. Every application that issues web requests is an HTTP agent.
(Original link: https://www.jianshu.com/p/6e9e4156ece3)