This question is a pretty easy fallback for interviewers and it might be getting a little old for us. But it set me to thinking about how tediously I could answer this without using the Internet or any reference materials on my machine. Some of these details might be a little off or just downright wrong, but I'll display my ignorance. It was a fun exercise since it isn't in the context of any actual interview :-)
What happens when you type www.google.com into a browser and press return? Supposing you're using Internet Explorer for Windows on a wired (Ethernet) connection:
- 8042 or emulated 8042 keyboard controller emits some scancodes
- Keyboard interrupt (IRQ 1) is sent to the CPU through the 8259A-style interrupt controller (an APIC on modern machines)
- Interrupt service routine acknowledges interrupt, potentially moves one or more scancodes into a buffer
- Or a deferred procedure call (DPC) does
- ntoskrnl/win32k determine which thread corresponds to the foreground window and deliver a series of window messages of type WM_KEYDOWN/WM_KEYUP ending with one having virtual key code VK_RETURN
- Window procedure for the browser URL bar (which is a window object) is called with a window message of type WM_KEYUP with virtual key code VK_RETURN
- Window procedure has a switch statement / jmp table in it that accounts for this particular window message (WM_KEYUP) and maybe a sub-case for VK_RETURN
- Probably takes the accumulated buffer so far (L"www.google.com") and passes it to a function
- Probably uses a library like WinInet to do the real stuff
- Probably calls InternetOpen() to get a handle of type HINTERNET
- Probably calls InternetOpenUrlW() or HttpSomethingSomething() to get another HINTERNET handle
- Probably reads the registry or uses a cached value for the HTTP User-Agent field that it provides here
- Probably uses WinSock2 for TCP
- Probably calls ws2_32!WSAStartup() if it hasn't been called yet
- Checks the user's proxy settings; if a proxy is configured, connects to the proxy host and issues HTTP proxy requests instead of direct HTTP requests
- Parses the URL for the hostname, protocol scheme, any explicit port specification, path, query parameters, etc.
- Probably uses wininet!InternetCrackUrlW() for this (I first thought urlmon, but it lives in wininet)
- If no protocol scheme is specified, uses http:// which has a default port of 80
- If https:// is specified, uses a default port of 443
- Issues a DNS A (IPv4 address) request and/or AAAA (IPv6 address) request for the name
- Probably calls ws2_32!gethostbyname() to do this (IPv4 only; ws2_32!getaddrinfo() is the one that handles both A and AAAA)
- Thread consults DNS resolver cache service (if running) for name, probably via IPC
- DNS resolver cache either returns the cached address for the name or...
- Probably uses dnsapi.dll which exports some function that...
- Checks DNS configuration (probably the registry) to get primary, secondary, tertiary, etc. DNS servers
- Calls something like ws2_32!inet_pton() (or ws2_32!inet_addr() for IPv4 only; Winsock doesn't export inet_aton()) to convert human-readable configuration to IPv4 or IPv6 addresses
- Creates a socket object via ws2_32!socket()
- Creates a sockaddr_in object (wrapping an in_addr) that addresses the DNS server
- Uses ws2_32!sendto() on an AF_INET/IPPROTO_UDP socket to query the server connectionlessly on UDP port 53
- Network layer (hmm, getting hand wavy) consults routing table to determine what interface packet should go through and whether it must visit a gateway
- Network card device driver creates and fills out an object that the kernel uses to describe network datagrams (packets)
- Network card device driver initiates I/O request with NIC via PCI registers or other hardware interface to provide datagram to be transmitted
- NIC waits for the medium to be free and transmits Ethernet frames bearing the octets that were given to it
- If another host transmits at the same time (a collision, which can only happen on half-duplex Ethernet), both hosts back off for a random interval chosen by the binary exponential backoff algorithm and then retry
- A router is likely the gateway; it accepts the packets and forwards them, re-framing them at each hop, across one or more other networks until the DNS server receives them
- DNS server UDP stack handles incoming packet, provides it to UDP-based DNS service, e.g. BIND, which is bound to port 53
- BIND parses the packets, potentially forwards the request if the desired names are not in its zone file, and returns the response
- The DNS resolver cache service receives and parses DNS reply/replies and returns the answer to the DNS client
- If the DNS resolver cache service wasn't running, ws2_32!gethostbyname() probably does most of this itself
- Since you only typed "www.google.com", it's plaintext HTTP on the default port, so 80
- Establishes a TCP connection with the resulting IP address and port
- ws2_32!socket() with AF_INET, SOCK_STREAM, and IPPROTO_TCP to get a socket object
- ws2_32!connect() with a sockaddr_in holding the address and port to connect
- tcpip.sys is probably involved here
- Again the network card and Ethernet medium stuff
- TCP three-way handshake, window negotiation, etc.
- The client sends a SYN TCP segment
- The server returns a SYN,ACK TCP segment
- The client returns an ACK TCP segment
- TCP data transmission
- The client sends a PSH,ACK TCP segment pushing data (SYN is only set during the handshake)
- Something like "GET / HTTP/1.1\r\nHost: www.google.com\r\nUser-Agent: ..."
- Google's web server does some thinking and returns a response
- The client receives a 3xx redirect response and gets directed to go to https://www.google.com/
- Uses Microsoft schannel (secure channel) library to negotiate ciphers, parse the server's security certificate, and transmit data over TLS
- Starts with ClientHello message, ServerHello, etc.
- Obtains HTTP response, something like "HTTP/1.1 200 OK\r\n..." with some HTML in the HTTP body
- HTML links to images, maybe JavaScript, etc., resulting in follow-on requests and, for some cross-origin ones, Cross-Origin Resource Sharing (CORS) checks
- Invokes the JScript scripting engine to run any script and the rendering engine to display the content
- Uses graphics primitives and likely renders into a buffer that it furnishes to win32k.sys through GDI calls
- GDI manages framebuffer of all windows including the foreground window
- Monitor displays the framebuffer
- Photons fly into your eyes
- Optic nerve and brain adjust for upside-down image arriving at retina
- Person realizes they went to Google and says, "crap, I meant to go to Bing." Orrrrrr maybe not, haha.
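
The URL-cracking and default-port steps in the list above can be sketched in a few lines. To be clear, this is not what IE or WinInet actually does internally; it's a minimal Python illustration of the same idea, and the `crack_url` helper and `DEFAULT_PORTS` table are names I made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical table for this sketch: default port per scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def crack_url(typed: str) -> dict:
    """Split address-bar text into scheme, host, port, path, and query."""
    # No scheme typed? Assume plaintext HTTP, as in the walkthrough.
    if "://" not in typed:
        typed = "http://" + typed

    parts = urlsplit(typed)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")

    # An explicit port wins; otherwise fall back to the scheme's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]

    return {
        "scheme": scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path or "/",  # an empty path is requested as "/"
        "query": parts.query,
    }
```

Calling `crack_url("www.google.com")` gives scheme `http`, host `www.google.com`, port 80, and path `/`, which matches the "plaintext HTTP on the default port" step.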
If I had more time, I'd draw this out a little more, but I had to quit eventually. And the more I do this, the more I bump into all the things I don't know. A couple things I'd like to know more about:
- What does "network layer" mean on Windows? I could use ETW with syscall stackwalking enabled to follow the ws2_32!connect() call into the kernel, or Windows Internals might just tell me.
- What kernel object represents a packet in the Windows kernel? A packet buffer? I forgot :-(
- How does the networking stack give a packet to the NIC to transmit it? My device driver fu is ageing.
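
One thing I can pin down without tracing into the kernel is what the DNS query from the walkthrough looks like on the wire. Here is a minimal Python sketch that builds the UDP payload for a single-question query in the RFC 1035 format. The function name `build_dns_query` and the fixed query ID are assumptions for illustration; a real resolver picks a random ID and handles much more (retries, truncation, response parsing).

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a one-question DNS query (qtype 1 = A, 28 = AAAA)."""
    # Header: ID, flags (only "recursion desired" set), QDCOUNT=1,
    # then ANCOUNT, NSCOUNT, ARCOUNT all zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)

    # QNAME: each label is length-prefixed; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"

    # QTYPE followed by QCLASS (1 = IN, the Internet class).
    question = qname + struct.pack("!HH", qtype, 1)
    return header + question
```

The resulting bytes are what a `sendto()` call would hand to the UDP stack, addressed to port 53 on the configured DNS server.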