Saturday, August 19, 2017

Done to Death

I saw this tweet from @redteamwrangler today:


This question is a pretty easy fallback for interviewers and it might be getting a little old for us. But it set me to thinking about how tediously I could answer this without using the Internet or any reference materials on my machine. Some of these details might be a little off or just downright wrong, but I'll display my ignorance. It was a fun exercise since it isn't in the context of any actual interview :-)

What happens when you type www.google.com into a browser and press return? Supposing you're using Internet Explorer for Windows on a wired (Ethernet) connection:

  • 8042 or emulated 8042 keyboard controller emits some scancodes
  • Keyboard interrupt (IRQ 1) is routed through the 8259A interrupt controller (or an APIC) to the CPU
  • Interrupt service routine acknowledges interrupt, potentially moves one or more scancodes into a buffer
  • Or a delayed procedure call (DPC) does
  • ntoskrnl/win32k determine which thread corresponds to the foreground window and deliver a series of window messages of type WM_KEYDOWN/WM_KEYUP ending with one having virtual key code VK_RETURN
  • Window procedure for the browser URL bar (which is a window object) is called with a window message of type WM_KEYUP with virtual key code VK_RETURN
  • Window procedure has a switch statement / jmp table in it that accounts for this particular window message (WM_KEYUP) and maybe a sub-case for VK_RETURN
  • Probably takes the accumulated buffer so far (L"www.google.com") and passes it to a function
  • Probably uses a library like WinInet to do the real stuff
    • Probably calls InternetOpen() to get a handle of type HINTERNET
    • Probably calls InternetOpenUrlW() or HttpSomethingSomething() to get another HINTERNET handle
      • Probably reads the registry or uses a cached value for the HTTP User-Agent field that it provides here
      • Probably uses WinSock2 for TCP
      • Probably calls ws2_32!WSAStartup() if it hasn't been called yet
      • Checks proxy settings for the user and optionally establishes a connection with the corresponding hostname and implements HTTP proxy requests instead of direct HTTP requests
      • Parses the URL for the hostname, protocol scheme, any explicit port specification, URI, query parameters, etc.
        • Probably uses wininet!InternetCrackUrlW() for this
        • If no protocol scheme is specified, uses http:// which has a default port of 80
        • If https:// is specified, a default port of 443
      • Issues a DNS A (IPv4 address) request and/or AAAA (IPv6 address) request for the name
        • Probably calls ws2_32!gethostbyname() (IPv4 only) or ws2_32!getaddrinfo() to do this
          • Thread consults DNS resolver cache service (if running) for name, probably via IPC
            • DNS resolver cache either returns the cached address or...
            • Probably uses dnsapi.dll which exports some function that...
              • Checks DNS configuration (probably the registry) to get primary, secondary, tertiary, etc. DNS servers
              • Calls ws2_32!inet_addr() or ws2_32!inet_pton() to convert human-readable configuration to IPv4 or IPv6 addresses
              • Creates a socket object via ws2_32!socket()
              • Fills out a sockaddr_in structure with the DNS server's address and port 53
              • Uses ws2_32!sendto() on an AF_INET/IPPROTO_UDP socket to query the server connectionlessly
                • Network layer (hmm, getting hand wavy) consults routing table to determine what interface packet should go through and whether it must visit a gateway
                • Network card device driver creates and fills out an object that the kernel uses to describe network datagrams (packets)
                • Network card device driver initiates I/O request with NIC via PCI registers or other hardware interface to provide datagram to be transmitted
                  • NIC takes the medium and transmits Ethernet frames bearing the octets that were given to it
                  • If another host transmits at the same time on a shared medium (a collision), both hosts use the binary exponential backoff algorithm to pick a random wait before retrying
                  • A router is likely the gateway; it accepts the packets and creates new packets to send across one or more other networks until the DNS server receives them
                  • DNS server UDP stack handles incoming packet, provides it to UDP-based DNS service, e.g. BIND, which is bound to port 53
                  • BIND parses the packets, potentially forwards the request if the desired names are not in its zone file, and returns the response
              • The DNS resolver cache service receives and parses DNS reply/replies and returns the answer to the DNS client
          • If the DNS resolver cache service wasn't running, ws2_32!gethostbyname() probably does most of this itself
      • Since you only typed "www.google.com", it's plaintext HTTP on the default port, so 80
      • Establishes a TCP connection with the resulting host number and port
        • ws2_32!socket() to get a socket object
        • ws2_32!connect() with AF_INET and IPPROTO_TCP to connect
          • tcpip.sys is probably involved here
          • Again the network card and Ethernet medium stuff
          • TCP three-way handshake, window negotiation, etc.
            • The client sends a SYN TCP segment
            • The server returns a SYN,ACK TCP segment
            • The client returns an ACK TCP segment
          • TCP data transmission
            • The client sends a PSH,ACK TCP segment pushing data
            • Something like "GET / HTTP/1.1\r\nHost: www.google.com\r\nUser-Agent: ..."
  • Google's web server does some thinking and returns a response
  • The client receives a 3xx redirect response and gets directed to go to https://www.google.com/
  • Uses Microsoft schannel (secure channel) library to negotiate ciphers, parse the server's security certificate, and transmit data over TLS
  • Starts with ClientHello message, ServerHello, etc.
  • Obtains HTTP response, something like "HTTP/1.1 200 OK\r\n..." with some HTML in the HTTP body
  • HTML links to images, maybe JavaScript, etc., resulting in follow-on requests, some of them subject to Cross-Origin Resource Sharing (CORS) checks
  • Invokes the JScript scripting engine for JavaScript and the rendering engine to display the content
  • Uses graphics primitives and likely renders into a buffer that it furnishes to win32k.sys through GDI calls
  • GDI manages framebuffer of all windows including the foreground window
  • Monitor displays the framebuffer
  • Photons fly into your eyes
  • Optic nerve and brain adjust for upside-down image arriving at retina
  • Person realizes they went to Google and says, "crap, I meant to go to Bing." Orrrrrr maybe not, haha.
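
The URL-cracking and request-building steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what Internet Explorer or WinInet actually does: crack_url, build_get, and the "sketch/0.1" User-Agent are made-up names, and the rule of prepending http:// to a bare hostname just follows the walkthrough.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes mentioned in the walkthrough.
DEFAULT_PORTS = {"http": 80, "https": 443}


def crack_url(typed: str) -> dict:
    """Split address-bar text into scheme, host, port, and path (hypothetical helper)."""
    # Assumption: a bare hostname is treated as plaintext HTTP.
    if "://" not in typed:
        typed = "http://" + typed
    parts = urlsplit(typed)
    scheme = parts.scheme.lower()
    # An explicit port wins; otherwise fall back to the scheme default.
    # Schemes other than http/https raise KeyError in this sketch.
    port = parts.port or DEFAULT_PORTS[scheme]
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return {"scheme": scheme, "host": parts.hostname, "port": port, "path": path}


def build_get(host: str, path: str, user_agent: str = "sketch/0.1") -> bytes:
    """Build a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Connection: close",
        "",  # blank line that terminates the header block
        "",
    ]
    # HTTP header lines are separated by CRLF, not a bare LF.
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    url = crack_url("www.google.com")
    print(url)
    print(build_get(url["host"], url["path"]).decode("ascii"))
```

The bytes from build_get are what would be handed to send() on the connected TCP socket once the three-way handshake is done.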

If I had more time, I'd draw this out a little more, but I had to quit eventually. And the more I do this, the more I bump into all the things I don't know. A couple things I'd like to know more about:

  • What does "network layer" mean on Windows? I could use ETW with syscall stackwalking enabled to follow the ws2_32!connect() call into the kernel, or Windows Internals might just tell me.
  • What kernel object represents a packet in the Windows kernel? A packet buffer? I forgot :-( 
  • How does the networking stack give a packet to the NIC to transmit it? My device driver fu is ageing.
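
Going back to the DNS step in the walkthrough: the query that goes out over UDP is small enough to build by hand. Here is a rough sketch in Python, assuming a plain A-record query with the "recursion desired" flag set. build_dns_query, send_query, and the query ID 0x1234 are hypothetical names and values picked for illustration, and none of this claims to be how dnsapi.dll does it internally.

```python
import socket
import struct


def build_dns_query(name: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a single-question DNS query packet (hypothetical helper).

    qtype 1 is an A record (IPv4); 28 would be AAAA (IPv6).
    """
    # 12-byte header: ID, flags (0x0100 = recursion desired),
    # QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0, all big-endian.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS (1 = IN, the Internet class).
    question = qname + struct.pack("!HH", qtype, 1)
    return header + question


def send_query(server_ip: str, name: str, timeout: float = 2.0) -> bytes:
    """Send the query over UDP port 53 and return the raw reply bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # Connectionless: no handshake, just a datagram to the server.
        sock.sendto(build_dns_query(name), (server_ip, 53))
        reply, _addr = sock.recvfrom(512)
        return reply


if __name__ == "__main__":
    print(build_dns_query("www.google.com").hex())
```

Calling send_query with the address of a configured DNS server would return the raw reply; parsing the answer section is left out to keep the sketch short.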