Saturday, March 9, 2019

What Happens When You Write www.google.com and Hit Enter

Many processes are involved if we go into every detail, but I will cover a few points that give insight into what happens from the moment a URL is entered until the page is displayed in the browser.

  1. The browser checks its cache. If the server address for the requested URL is already cached, go to step 4.
  2. If it is not in the cache, the browser asks the OS for the server's IP address.
  3. The OS makes a DNS lookup and returns the IP address to the browser.
    1. Since the operating system doesn’t know where “www.google.com” is, it queries a DNS resolver.
    2. For most users, their DNS resolver is provided by their Internet Service Provider (ISP)
    3. Assuming it has nothing cached, the resolver starts by querying one of the root DNS servers for the IP of “www.google.com.” The root is represented by the hidden trailing “.” at the end of the domain name. Typing this extra “.” is not necessary because it is implied when the name is resolved.
    4. There are 13 root server clusters named A-M with servers in over 380 locations. They are operated by 12 different organizations, such as Verisign, which operates the A and J clusters. All of them serve copies of the same root zone, which is managed by the Internet Assigned Numbers Authority (IANA).
    5. These root servers hold the locations of all of the top level domains (TLDs) such as .com, .de, .io, and newer generic TLDs such as .camera.
    6. The root doesn’t have the IP info for “www.google.com,” but it knows that .com might know, so it returns the location of the .com servers. The root responds with a list of the 13 locations of the .com gTLD servers, listed as NS or “name server” records.
    7. Next, the resolver queries one of the .com name servers for the location of google.com. Like the root servers, each TLD has 4-13 clustered name servers existing in many locations. There are two types of TLDs: country codes (ccTLDs), each run by an organization designated for that country, and generic (gTLDs). Each gTLD has a registry operator responsible for running its servers. In this case, we will be using the gTLD servers operated by Verisign, which runs .com and .net, among others.
    8. Each TLD server holds a list of all of the authoritative name servers for each domain in the TLD. For example, each of the 13 .com gTLD servers has a list with all of the name servers for every single .com domain. The .com gTLD server does not have the IP addresses for google.com, but it knows the location of google.com’s name servers. The .com gTLD server responds with a list of all of google.com’s NS records. In this case, Google has four name servers, “ns1.google.com” to “ns4.google.com.”
    9. Finally, the DNS resolver queries one of Google’s name servers for the IP of “www.google.com.”
    10. This time the queried Name Server knows the IPs and responds with an A or AAAA address record (depending on the query type) for IPv4 and IPv6, respectively.
    11. At this point the resolver has finished the recursion process and is able to respond to the end user’s operating system with an IP address.
  4. The browser opens a TCP connection to the server (this step is more complex with HTTPS, which adds a TLS handshake on top of the TCP connection)
    1. Client sends SYN packet.
    2. Web server sends SYN-ACK packet.
    3. Client answers with ACK packet, concluding the three-way TCP connection establishment.
  5. The browser sends the HTTP request over the TCP connection
    1. Web server processes the request, finds the resource, and sends the response to the Client. Client receives the first byte of the first packet from the web server, which contains the HTTP Response headers and content.
  6. Client loads the content of the response
  7. Web server sends second TCP segment with the PSH flag set.
  8. Client sends ACK. (The client typically sends an ACK for every two segments it receives from the host.)
  9. Web server sends the third TCP segment, which continues the HTTP response.
  10. Browser receives HTTP response and may close the TCP connection, or reuse it for another request
    1. Client sends a FIN packet to close the TCP connection.
  11. Browser checks if the response is a redirect or a conditional response (3xx result status codes), authorization request (401), error (4xx and 5xx), etc.; these are handled differently from normal responses (2xx)
  12. If the response is cacheable, it is stored in a cache. The Cache-Control response header tells the browser whether it may cache a response:
    1. Preventing caching (Cache-Control: no-cache, no-store, must-revalidate)
    2. Caching static assets (Cache-Control: public, max-age=31536000)
  13. Browser decodes response (e.g. if it's gzipped)
  14. The browser determines what to do with the response (e.g. is it an HTML page, an image, a sound clip, a JS file, or a CSS file?) using the Content-Type response header
  15. The browser renders the response or offers a download dialog for unrecognized types
    1. For an HTML file, the browser parses the HTML to create the DOM (Document Object Model)
    2. If the parser encounters a script tag that references a JS file, the browser pauses DOM parsing, downloads the file, and executes the JS code before continuing
      1. If the script tag has the async attribute, the download happens in parallel with parsing, but the DOM parser pauses while the JS code executes
      2. If the script tag has the defer attribute, the download happens in parallel and the JS code executes after DOM parsing completes
    3. If the parser encounters a CSS file, DOM parsing is not blocked and the CSS file is downloaded in parallel
      1. CSS blocks rendering; once the CSS is downloaded, the CSSOM (CSS Object Model) is created
      2. The DOM and CSSOM trees are combined to form the render tree
      3. Render tree contains only the nodes required to render the page
      4. Layout computes the exact position and size of each object
      5. The last step is paint, which takes in the final render tree and renders the pixels to the screen
      6. And now you see the www.google.com page in the browser
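
Steps 11 to 14 above (checking the status code, deciding whether to cache, decoding the body, and choosing what to do with the content) can be illustrated with a small Python sketch. This is not how any real browser is implemented. The function names are hypothetical, the headers are assumed to arrive as a dict with lowercase keys, and the caching rule is deliberately simplified to "store unless no-store is present".

```python
import gzip
from typing import Dict, Optional


def classify_status(status: int) -> str:
    """Step 11: map a status code to a broad handling category."""
    if 200 <= status < 300:
        return "normal"
    if 300 <= status < 400:
        return "redirect-or-conditional"
    if status == 401:
        return "authorization-required"
    if 400 <= status < 600:
        return "error"
    return "other"


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() if sep else None
    return directives


def may_store(headers: Dict[str, str]) -> bool:
    """Step 12 (simplified assumption): only no-store forbids storing."""
    directives = parse_cache_control(headers.get("cache-control", ""))
    return "no-store" not in directives


def decode_body(headers: Dict[str, str], body: bytes) -> bytes:
    """Step 13: undo gzip content encoding if the header announces it."""
    if headers.get("content-encoding", "").strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def choose_action(headers: Dict[str, str]) -> str:
    """Step 14: pick a handler from the Content-Type media type."""
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if media_type == "text/html":
        return "parse as HTML"
    if media_type == "text/css":
        return "apply as stylesheet"
    if media_type in ("text/javascript", "application/javascript"):
        return "execute as script"
    if media_type.startswith("image/"):
        return "decode as image"
    if media_type.startswith("audio/"):
        return "play as audio"
    return "offer download"


if __name__ == "__main__":
    # A made-up response, used only to exercise the functions above.
    headers = {
        "content-type": "text/html; charset=UTF-8",
        "content-encoding": "gzip",
        "cache-control": "public, max-age=31536000",
    }
    body = gzip.compress(b"<h1>hello</h1>")
    print(classify_status(200))
    print(may_store(headers))
    print(decode_body(headers, body))
    print(choose_action(headers))
```

Real browsers apply many more rules at each step (for example, no-cache still allows storing but forces revalidation, and responses without Cache-Control may be cached heuristically), so treat this only as a map of where each decision happens.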