Harbin Institute of Technology (Shenzhen), Advanced Computer Network, Course Project: Webpage Capture and Recovery

Harbin Institute of Technology Shenzhen Graduate School
Advanced Computer Network Project Report
HTTP protocol analysis and webpage reverting
Report date: December 12, 2017

Content
HTTP protocol analysis and webpage reverting
1. Introduction
  1.1. Objectives
  1.2. Environment and tools
  1.3. Task Distribution
2. Protocol Analysis
  2.1 HTTP Analysis
  2.2. TCP Analysis
  2.3. IP Analysis
3. Capture Packets
  3.1 Introduction
  3.2 Process
4. Analyse Packets
  4.1. Introduction
  4.2. The process of analysis
  4.3. The method of analysis
5. Revert Webpage
  5.1. Principle
  5.2. Process
  Main Code
  5.3. Result

1. Introduction
1.1. Objectives
HTTP is the most widely used application protocol on the Internet. The goal of this project is to write a program that captures incoming packets, analyzes the HTTP protocol, and then reverts (reconstructs) a selected test webpage from the captured data.

1.2. Environment and tools
Windows 10
Code::Blocks + C
WinPcap

2. Protocol Analysis
2.1 HTTP Analysis
HTTP (Hypertext Transfer Protocol) is an application-layer protocol. It specifies the format of the data transferred between the browser client and the server: which messages the client can send to the server, and which responses the server returns. It is the foundation of data communication for the World Wide Web.

Work procedure
An HTTP operation is called a transaction, and its working process can be divided into four steps.
1) First, the client and the server establish a connection.
2) After the connection is established, the client sends a request to the server. The request consists of the URL and the protocol version number, followed by a MIME-like message that includes request modifiers, client information, and possibly content.
3) After receiving the request, the server returns a response. The response consists of a status line, which carries the protocol version number and a success or error code, followed by a MIME-like message that includes server information, entity information, and possibly content.
4) The client receives the information returned by the server and displays it to the user through the browser. The client then disconnects from the server.
If an error occurs at any point in this process, an error message is returned to the client and displayed. A request may also pass through a proxy server before it reaches the web server.

HTTP protocol structure
HTTP messages consist of requests from the client to the server and responses from the server to the client.

  Request line
  General header
  Request header
  Entity header
  Message body
Table 2.1.1 request message format

The request line starts with the method field, followed by the URL field and the HTTP protocol version field, and ends with CRLF. The fields are separated by a space (SP). No CR or LF is allowed except in the final CRLF sequence.

  Status line
  General header
  Response header
  Entity header
  Message body
Table 2.1.2 response message format

The status code consists of 3 digits and indicates whether the request was understood and fulfilled. The reason phrase is a brief textual description of the status code. The status code supports automated processing, while the reason phrase is intended for human readers; the client is not required to examine or display it.

Request method
HTTP/1.1 defines 8 methods to indicate the desired action to be performed on the identified resource.
What this resource represents, whether pre-existing data or dynamically generated data, depends on the implementation of the server. Often, the resource corresponds to a file or to the output of an executable residing on the server.

  Method     Description
  GET        Requests to read a web page.
  HEAD       Requests to read the header of a web page.
  PUT        Requests to store a web page.
  POST       Appends data to the resource named by the URL.
  DELETE     Deletes the web page.
  TRACE      Echoes the received request.
  OPTIONS    Queries the properties of the server or of a particular file.
  CONNECT    Converts the request connection to a transparent TCP/IP tunnel.
Table 2.1.3 HTTP request methods

An HTTP server should implement at least the GET and HEAD methods; the other methods are optional. Beyond the methods above, a specific HTTP server can also define custom extension methods.

Client request message
  GET /somedir/page.html HTTP/1.1
  Host:
  Connection: close
  User-agent: Mozilla/5.0
  Accept-language: fr
The header lines of a client request are followed by a blank line, so the request ends with two consecutive line terminators, each a carriage return followed by a line feed. The Host field distinguishes between the DNS names that share a single IP address, which allows name-based virtual hosting. It is optional in HTTP/1.0 but mandatory in HTTP/1.1.

Response message
The client sends a request to the server, and the server responds with a status line. The response includes the protocol version of the message, the success or failure code, the server information, the entity meta-information, and, where needed, the entity content.
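The request line format and the method table described above can be checked mechanically. The following is a minimal sketch in C, not the project's own code: the struct `request_line` and the functions `is_known_method` and `parse_request_line` are hypothetical names introduced here for illustration, and the fixed buffer sizes are arbitrary assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three fields of a request line. */
struct request_line {
    char method[16];
    char url[256];
    char version[16];
};

/* The 8 methods listed in the method table. */
static const char *const k_methods[] = {
    "GET", "HEAD", "PUT", "POST", "DELETE", "TRACE", "OPTIONS", "CONNECT"
};

/* Returns 1 if `method` is one of the 8 listed methods, otherwise 0. */
int is_known_method(const char *method)
{
    assert(method != NULL);
    for (size_t i = 0; i < sizeof k_methods / sizeof k_methods[0]; i++) {
        if (strcmp(method, k_methods[i]) == 0)
            return 1;
    }
    return 0;
}

/*
 * Parses "METHOD SP URL SP VERSION CRLF" from the start of `msg`.
 * Returns 0 on success and -1 if the line is malformed.
 * Note: sscanf accepts runs of whitespace between fields, so this sketch
 * is more lenient than the strict single-SP grammar.
 */
int parse_request_line(const char *msg, struct request_line *out)
{
    assert(msg != NULL && out != NULL);

    const char *eol = strstr(msg, "\r\n");
    if (eol == NULL)
        return -1;                      /* no terminating CRLF */

    char line[512];
    size_t len = (size_t)(eol - msg);
    if (len >= sizeof line)
        return -1;                      /* too long for this sketch */
    memcpy(line, msg, len);
    line[len] = '\0';

    char extra;
    /* Width limits keep sscanf inside the fixed-size buffers; the final
     * %c only matches if a fourth token (or an overlong field) exists. */
    int n = sscanf(line, "%15s %255s %15s %c",
                   out->method, out->url, out->version, &extra);
    if (n != 3)
        return -1;
    if (strncmp(out->version, "HTTP/", 5) != 0)
        return -1;
    return 0;
}
```

Applied to the sample client request, `parse_request_line` fills in the method `GET`, the URL `/somedir/page.html`, and the version `HTTP/1.1`. Keeping the method check separate from the parser lets a caller accept server-specific extension methods if it wants to.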
Depending on the category of the response, the server response may contain entity content, but not all responses do.

Server response message
  HTTP/1.1 200 OK
  Connection: close
  Date: Mon, 11 Dec 2017 22:38:34 GMT
  Server: Apache/
  Last-Modified: Mon, 11 Dec 2017 23:11:55 GMT
  Content-Type: text/html; charset=UTF-8
  Content-Length: 6821
  Accept-Ranges: bytes

  (data data data data.)
Content-Type specifies the Internet media type of the data conveyed by the HTTP message, while Content-Length indicates its length in bytes. An HTTP/1.1 web server announces that it can answer requests for certain byte ranges of a document by setting the field Accept-Ranges: bytes. This is useful when the client needs only certain portions of a resource, which is called byte serving. Connection: close means that the web server will close the TCP connection immediately after transferring this response.
Most of the header lines are optional. When Content-Length is missing, the length is determined in other ways. Chunked transfer encoding uses a chunk size of 0 to mark the end of the content. Identity encoding without Content-Length reads content until the socket is closed.

2.2. TCP Analysis
TCP (Transmission Control Protocol) is a core protocol of the Internet protocol suite. It originated in the initial network implementation, in which it complemented IP (Internet Protocol). Therefore, the entire suite is commonly referred to as TCP/IP. TCP provides reliable, ordered, and error-checked delivery of a stream of octets between applications running on hosts that communicate over an IP network. Major Internet applications such as the World Wide Web, email, remote administration, and file transfer rely on TCP.
Applications that do not require a reliable data stream service may use UDP (User Datagram Protocol), which provides a connectionless datagram service that emphasizes reduced latency over reliability.

TCP segment structure
TCP accepts data from a data stream, divides it into chunks, and adds a TCP header to each chunk, creating a TCP segment. The TCP segment is then encapsulated into an Internet Protocol (IP) datagram and exchanged with peers. The term TCP packet appears in both informal and formal usage. In more precise terminology, segment refers to the TCP Protocol Data Unit (PDU), datagram to the IP PDU, and frame to the data link layer PDU:
  Processes transmit data by calling on the TCP and passing buffers of data as arguments. The TCP packages the data from these buffers into segments and calls on the internet module to transmit each segment to the destination TCP.
A TCP segment consists of a segment header and a data section. The TCP header contains 10 mandatory fields and an optional extension field (Options). The data section follows the header. Its contents are the payload data carried for the application. The length of the data section is not specified in the TCP segment header. It can be calculated by subtracting the combined length of the TCP header and the encapsulating IP header from the total IP datagram length (specified in the IP header).

Figure 2.2.1 TCP message header
• Source port (16 bits): Identifies the sending port.
• Destination port (16 bits): Identifies the receiving port.
• Sequence number (32 bits): Has a dual role. If the SYN flag is set (1), this is the initial sequence number. The sequence number of the actual first data byte, and the acknowledgment number in the corresponding ACK, are then this sequence number plus 1.
  If the SYN flag is clear (0), this is the accumulated sequence number of the first data byte of this segment for the current session.
• Acknowledgment number (32 bits): If the ACK flag is set, the value of this field is the next sequence number that the receiver is expecting. This acknowledges receipt of all prior bytes (if any). The first ACK sent by each end acknowledges the other end's initial sequence number itself, but no data.
• Data offset (4 bits): Specifies the size of the TCP header in 32-bit words. The minimum header size is 5 words and the maximum is 15 words, which gives a minimum of 20 bytes and a maximum of 60 bytes and allows for up to 40 bytes of options in the header. The field gets its name from the fact that it is also the offset from the start of the TCP segment to the actual data.
• Reserved (3 bits): For future use; should be set to zero.
• Flags (9 bits) (also known as control bits): Contains 9 1-bit flags.
  NS (1 bit): ECN-nonce concealment protection.
  CWR (1 bit): The Congestion Window Reduced flag is set by the sending host to indicate that it received a TCP segment with the ECE flag set and has responded with its congestion control mechanism.
  ECE (1 bit): ECN-Echo has a dual role, depending on the value of the SYN flag. If the SYN flag is set (1), it indicates that the TCP peer is ECN capable. If the SYN flag is clear (0), it indicates that a packet with the Congestion Experienced flag set in its IP header was received during normal transmission.
  URG (1 bit): Indicates that the Urgent pointer field is significant.
  ACK (1 bit): Indicates that the Acknowledgment field is significant. All packets after the initial SYN packet sent by the client should have this flag set.
  PSH (1 bit): Push function. Asks to push the buffered data to the receiving application.
  RST (1 bit): Resets the connection.
  SYN (1 bit): Synchronizes sequence numbers. Only the first packet sent from each end should have this flag set.
  Some other flags and fields change meaning based on the SYN flag; some are valid only when it is set, and others only when it is clear.
  FIN (1 bit): No more data from the sender.
• Window size (16 bits): The size of the receive window, which specifies the number of window size units (by default, bytes), beyond the segment identified by the sequence number in the acknowledgment field, that the sender of this segment is currently willing to receive.
• Checksum (16 bits): Used for error-checking of the header and data.
• Urgent pointer (16 bits): If the URG flag is set, this field is an offset from the sequence number that indicates the last urgent data byte.
• Options (variable, 0 to 320 bits, divisible by 32): The length of this field is determined by the data offset field. An option has up to three fields: Option-Kind (1 byte), Option-Length (1 byte), and Option-Data (variable). The Option-Kind field indicates the type of option and is the only field that is not optional. Depending on the kind of option, the next two fields may be present: the Option-Length field gives the total length of the option, and the Option-Data field contains the value of the option, if applicable. For example, an Option-Kind byte of 0x01 indicates a No-Op option, which is used only for padding and has no Option-Length or Option-Data byte following it. An Option-Kind byte of 0 is the End Of Options option and is also only one byte long. An Option-Kind byte of 0x02 indicates the Maximum Segment Size (MSS) option and is followed by a length byte (which should be 0x04). Note that this length is the total length of the option, including the Option-Kind and Option-Length bytes. So although the MSS value is typically expressed in two bytes, the length of the option is 4 bytes (2 bytes of value plus 2 bytes of kind and length).
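The header fields and the option layout described above can be read from a raw byte buffer with a few helpers. The following is a minimal sketch in C under stated assumptions: the buffer starts at the first byte of the TCP header, and the function names (`tcp_header_len`, `tcp_flags`, `tcp_find_mss`) are hypothetical and are not taken from the project code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a 16-bit value in network byte order (big-endian). */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Header length in bytes. The data offset is the high nibble of byte 12
 * and counts 32-bit words. Returns 0 if the buffer is shorter than the
 * header or the offset is below the minimum of 5 words.
 */
size_t tcp_header_len(const uint8_t *tcp, size_t len)
{
    assert(tcp != NULL);
    if (len < 20)
        return 0;
    size_t words = (size_t)(tcp[12] >> 4);
    if (words < 5)
        return 0;
    size_t hlen = words * 4;
    return hlen <= len ? hlen : 0;
}

/*
 * The 9 flag bits as one value: NS is the low bit of byte 12 and the
 * other 8 flags fill byte 13 (FIN = 0x001, SYN = 0x002, ACK = 0x010).
 */
unsigned tcp_flags(const uint8_t *tcp)
{
    assert(tcp != NULL);
    return ((unsigned)(tcp[12] & 0x01u) << 8) | tcp[13];
}

/*
 * Walks the options area and returns the MSS value, or -1 if the header
 * carries no well-formed MSS option.
 */
int tcp_find_mss(const uint8_t *tcp, size_t len)
{
    size_t hlen = tcp_header_len(tcp, len);
    if (hlen == 0)
        return -1;

    size_t i = 20;                       /* options start after 20 bytes */
    while (i < hlen) {
        uint8_t kind = tcp[i];
        if (kind == 0)                   /* End Of Options */
            break;
        if (kind == 1) {                 /* No-Op: a single byte */
            i++;
            continue;
        }
        if (i + 1 >= hlen)
            return -1;                   /* length byte is missing */
        uint8_t optlen = tcp[i + 1];     /* total length incl. kind+length */
        if (optlen < 2 || i + optlen > hlen)
            return -1;
        if (kind == 2)                   /* Maximum Segment Size */
            return optlen == 4 ? (int)read_be16(tcp + i + 2) : -1;
        i += optlen;                     /* skip options we do not decode */
    }
    return -1;
}
```

For a SYN segment whose options area holds the bytes 02 04 05 B4, `tcp_find_mss` returns 1460. Unknown option kinds are skipped by their length byte rather than rejected, so the walk still reaches an MSS option that follows other options.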
In short, an MSS option with a value of 0x05B4 shows up as (0x02 0x04 0x05B4) in the TCP options section.

2.3. IP Analysis
IP (Internet Protocol) is the principal communications protocol in the Internet protocol suite for relaying datagrams across network boundaries. Its routing function enables internetworking and essentially establishes the Internet. IP has the task of delivering packets from the source host to the destination host solely on the basis of the IP addresses in the packet headers. For this purpose, IP defines packet structures that encapsulate the data to be delivered. It also defines addressing methods that are used to label the datagram with source and destination information. Historically, IP was the connectionless datagram service in the original Transmission Control Program introduced by Vint Cerf and Bob Kahn in 1974; it was complemented by a connection-oriented service that became the Transmission Control Protocol (TCP). The Internet protocol suite is therefore often referred to as TCP/IP. The first major version of IP, Internet Protocol Version 4 (IPv4), is the dominant protocol of the Internet. Its successor is Internet Protocol Version 6 (IPv6).
The Internet Protocol is responsible for addressing hosts and for routing datagrams (packets) from a source host to a destination host across one or more IP networks. For this purpose, the Internet Protocol defines the format of packets and provides an addressing system that has two functions: identifying hosts, and providing a logical location service.

1) Datagram construction
Figure: Sample encapsulation of application data from UDP to a link protocol frame.
Each datagram has two components: a header and a payload. The IP header is tagged with the source IP address, the destination IP address, and other metadata needed to route and deliver the datagram. The payload is the data that is transported.
This method of nesting the data payload in a packet with a header is called encapsulation.

2) IP addressing
IP addressing entails the assignment of IP addresses and associated parameters to host interfaces. The address space is divided into networks and subnetworks, which involves the designation of network or routing prefixes. IP routing is performed by all hosts, as well as by routers, whose main function is to transport packets across network boundaries. Routers communicate with one another via specially designed routing protocols, either interior gateway protocols or exterior gateway protocols, as needed for the topology of the network.

3) IP routing
IP routing is also common in local networks. For example, many Ethernet switches support IP multicast operations. These switches use IP addresses and the Internet Group Management Protocol to control multicast routing, but use MAC addresses for the actual routing.

IP header format
Figure 2.3.1 IP message header
• Version (4 bits): Specifies the IP protocol version number.
• IHL (4 bits): Internet Header Length, the length of the header in 32-bit words.
• Type of service (8 bits): Contains a 3-bit precedence field, 4 TOS bits, and 1 reserved bit (must be set to 0). The 4 TOS bits stand for minimum delay, maximum throughput, maximum reliability, and minimum cost. Login programs such as ssh and telnet need the minimum delay service. The file transfer program ftp requires the maximum throughput service.
• Total length (16 bits): The length of the entire IP datagram in bytes. The maximum length of an IP datagram is 65535 (2^16 - 1) bytes. Because of MTU limitations, datagrams that exceed the MTU are fragmented.
• Identification (16 bits): Uniquely identifies each datagram sent by the host. The initial value is randomly generated by the system, and the value is typically incremented by 1 each time a datagram is sent. All fragments of the same datagram have the same identifier.
• Flags (3 bits): The first bit is reserved. DF (Don't Fragment) indicates that fragmentation is prohibited.
  If DF is set and the length of an IP datagram exceeds the MTU, the IP module discards the datagram and returns an ICMP error message. MF (More Fragments) is set to 1 in every fragment except the last one.
• Fragment offset (13 bits): The offset of this fragment within the original datagram.
• Time to live (8 bits): The number of router hops the datagram is allowed to pass before reaching the destination. When TTL is reduced to 0, the router discards the datagram and returns an ICMP error packet.
• Protocol (8 bits): Identifies the upper-layer protocol, for example 1 for ICMP and 6 for TCP. For the full list, see /etc/protocols.
• Header checksum (16 bits): Protects the IP header only. It is filled in by the sender, and the receiver verifies the header with the 16-bit one's-complement Internet checksum.
• Source address / destination address (32 bits each): The IP addresses of the sender and the receiver.

3. Capture Packets
3.1 Introduction
WinPcap is the industry-standard tool for link-layer network access in Windows environments. It allows applications to capture and transmit network packets while bypassing the protocol stack, and it has additional useful features, including kernel-level packet filtering, a network statistics engine, and support for remote packet capture.

3.2 Process
The process of packet capturing:
In our project, the Init() function completes this process. The details are as follows:
pcap_t *pcap_handle
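
Once a frame has been captured, the header layouts from Section 2 determine where the HTTP bytes begin. The following is a minimal sketch in C of that step. It is independent of the Init() routine and makes no WinPcap calls; it assumes Ethernet II framing without VLAN tags, IPv4, and an unfragmented datagram, and the names `inet_checksum` and `locate_tcp_payload` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a 16-bit value in network byte order (big-endian). */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Internet checksum: one's-complement of the one's-complement sum of the
 * data taken as 16-bit words. Run over a received IPv4 header with its
 * checksum field in place, a valid header yields 0.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    assert(data != NULL);
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += read_be16(data + i);
    if (i < len)
        sum += (uint32_t)data[i] << 8;      /* odd byte, zero-padded */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16); /* fold the carries back in */
    return (uint16_t)~sum;
}

/*
 * Locates the TCP payload inside an Ethernet II frame that carries IPv4.
 * On success stores the payload offset and length and returns 0.
 * Returns -1 for anything this sketch does not handle.
 */
int locate_tcp_payload(const uint8_t *frame, size_t len,
                       size_t *payload_off, size_t *payload_len)
{
    assert(frame != NULL && payload_off != NULL && payload_len != NULL);
    const size_t eth_len = 14;               /* dst MAC, src MAC, EtherType */

    if (len < eth_len + 20)
        return -1;
    if (read_be16(frame + 12) != 0x0800)     /* EtherType for IPv4 */
        return -1;

    const uint8_t *ip = frame + eth_len;
    if ((ip[0] >> 4) != 4)                   /* Version */
        return -1;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4; /* IHL, converted to bytes */
    size_t total = read_be16(ip + 2);        /* Total length */
    if (ihl < 20 || total < ihl || eth_len + total > len)
        return -1;
    if (ip[9] != 6)                          /* Protocol: 6 is TCP */
        return -1;
    if (inet_checksum(ip, ihl) != 0)         /* header checksum mismatch */
        return -1;
    if (read_be16(ip + 6) & 0x3FFF)          /* MF set or nonzero offset */
        return -1;

    if (total - ihl < 20)
        return -1;
    const uint8_t *tcp = ip + ihl;
    size_t thl = (size_t)(tcp[12] >> 4) * 4; /* TCP data offset in bytes */
    if (thl < 20 || ihl + thl > total)
        return -1;

    /* Payload length = IP total length - IP header - TCP header. */
    *payload_off = eth_len + ihl + thl;
    *payload_len = total - ihl - thl;
    return 0;
}
```

The payload length follows the rule given in Section 2.2: the IP total length minus the IP and TCP header lengths. Rejecting frames with a bad header checksum is a design choice of this sketch; on hosts with checksum offloading, outgoing frames can be captured before the network card fills the checksum in, so a capture tool may prefer to skip that check.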
