How to measure the time spent in a context switch

How can you measure the time spent in a context switch?

This is a tricky question, but let’s start with a possible solution.
A context switch is the time spent switching between two processes (e.g., bringing a waiting process into execution and sending an executing process into waiting/terminated state). This happens in multitasking. The operating system must bring the state information of waiting processes into memory and save the state information of the running process.
In order to solve this problem, we would like to record timestamps of the last and first instruction of the swapping processes. The context switching time would be the difference in the timestamps between the two processes.
Let’s take an easy example: Assume there are only two processes, P1 and P2.
P1 is executing and P2 is waiting for execution. At some point, the OS must swap P1 and P2 — let’s assume it happens at the Nth instruction of P1. So, the context switch time for this would be Time_Stamp(P2_1) – Time_Stamp(P2_N)
Easy enough. The tricky part is this: how do we know when this swapping occurs? Swapping is governed by the scheduling algorithm of the OS. We can not, of course, record the timestamp of every instruction in the process.
Another issue: there are many kernel level threads which are also doing context switches, and the user does not have any control over them.
Overall, we can say that this is mostly an approximate calculation which depends on the underlying OS. One approximation could be to record the end instruction timestamp of a process and start timestamp of a process and waiting time in queue.
If the total timeof execution of all the processes was T, then the context switch time = T – (SUM for all processes (waiting time + execution time)).

Differences between thread and process

What’s the difference between a thread and a process?

My initial thoughts:

  1. A thread is smaller than a process.
  2. Resources can be exchanged between threads but not between processes.
  3. One process usually contains multiple threads.


Processes and threads are related to each other but are fundamentally different.
A process can be thought of as an instance of a program in execution. Each process is an independent entity to which system resources (CPU time, memory, etc.) are allocated and each process is executed in a separate address space. One process cannot access the variables and data structures of another process. If you wish to access another process’ resources,
inter-process communications have to be used such as pipes, files, sockets etc.
A thread uses the same stack space of a process. A process can have multiple threads. A key difference between processes and threads is that multiple threads share parts of their state. Typically, one allows multiple threads to read and write the same memory (no processes can directly access the memory of another process). However, each thread still has its own registers and its own stack, but other threads can read and write the stack memory.
A thread is a particular execution path of a process; when one thread modifies a process resource, the change is immediately visible to sibling threads.

Differences between TCP and UDP

What are the differences between TCP and UDP? Explain how TCP handles reliable delivery (explain ACK mechanism), flow control (explain TCP sender’s / receiver’s window) and congestion control.

TCP (Transmission Control Protocol): TCP is a connection-oriented protocol. A connection can be made from client to server, and from then on any data can be sent along that connection.

  • Reliable – when you send a message along a TCP socket, you know it will get there unless the connection fails completely. If it gets lost along the way, the server will re-request the lost part. This means complete integrity; data will not get corrupted.
  • Ordered – if you send two messages along a connection, one after the other, you know the first message will get there first. You don’t have to worry about data arriving in the wrong order.
  • Heavyweight – when the low level parts of the TCP “stream” arrive in the wrong order, resend requests have to be sent. All the out of sequence parts must be put back together, which requires a bit of work.

UDP(User Datagram Protocol): UDP is connectionless protocol. With UDP you send messages (packets) across the network in chunks.

  • Unreliable – When you send a message, you don’t know if it’ll get there; it could get lost on the way.
  • Not ordered – If you send two messages out, you don’t know what order they’ll arrive in.
  • Lightweight – No ordering of messages, no tracking connections, etc. It’s just fire and forget! This means it’s a lot quicker, and the network card / OS have to do very little work to translate the data back from the packets.

Explain how TCP handles reliable delivery (explain ACK mechanism), flow control (explain TCP sender’s/receiver’s window).

For each TCP packet, the receiver of a packet must acknowledge that the packet is received. If there is no acknowledgement, the packet is sent again. These guarantee that every single packet is delivered. ACK is a packet used in TCP to acknowledge receipt of a packet. A TCP window is the amount of outstanding (unacknowledged by the recipient) data a sender can send on a particular connection before it gets an acknowledgment back from the receiver that it has gotten some of it.
For example, if a pair of hosts are talking over a TCP connection that has a TCP window with a size of 64 KB, the sender can only send 64 KB of data and then it must wait for an acknowledgment from the receiver that some or all of the data has been received. If the receiver acknowledges that all the data has been received, then the sender is free to send another 64 KB. If the sender gets back an acknowledgment from the receiver that it received the first 32 KB (which could happen if the second 32 KB was still in transit or it could happen if the second 32 KB got lost), then the sender can only send another additional 32 KB since it can’t have more than 64 KB of unacknowledged data outstanding (the second 32 KB of data plus the third).

Congestion Control

The TCP uses a network congestion avoidance algorithm that includes various aspects of an additive-increase-multiplicative-decrease scheme, with other schemes such as slow-start in order to achieve congestion avoidance.
There are different algorithms to solve the problem; Tahoe and Reno are the most well known. To avoid congestion collapse, TCP uses a multi-faceted congestion control strategy. For each connection, TCP maintains a congestion window, limiting the total number of unacknowledged packets that may be in transit end-to-end. This is somewhat analogous to TCP’s sliding window used for flow control. TCP uses a mechanism called slow start to increase the congestion window after a connection is initialized and after a timeout. It starts with a window of two times the maximum segment size (MSS). Although the initial rate is low, the rate of increase is very rapid: for every packet acknowledged, the congestion window increases by 1 MSS so that for every round trip time (RTT), the congestion window has doubled. When the congestion window exceeds a threshold ssthresh the algorithm enters a new state, called congestion avoidance. In some implementations (i.e., Linux), the initial ssthresh is large, and so the first slow start usually ends after a loss. However, ssthresh is updated at the end of each slow start, and will often affect subsequent slow starts triggered by timeouts.

What is a network/subnet mask

What is a network / subnet mask? Explain how host A sends a message / packet to host B when: (a) both are on same network and (b) both are on different networks. Explain which layer makes the routing decision and how.

A mask is a bit pattern used to identify the network/subnet address. The IP address consists of two components: the network address and the host address.
The IP addresses are categorized into different classes which are used to identify the network address.
Example: Consider IP address This address belongs to Class B, so:
Network Mask: 11111111.11111111.00000000.00000000
Given Address: 10011000.11010101.00001011.00000010
By ANDing Network Mask and IP Address, we get the following network address:
10011000.11010101.00000000.00000000 (
Host address: 00001011.00000010
Similarly, a network administrator can divide any network into sub-networks by using subnet mask. To do this, we further divide the host address into two or more subnets.
For example, if the above network is divided into 18 subnets (requiring a minimum of 5 bits to represent 18 subnets), the first 5 bits will be used to identify the subnet address.
Subnet Mask: 11111111.11111111.11111000.00000000 (
Given Address: 10011000.11010101.00001011.00000010
So, by ANDing the subnet mask and the given address, we get the following subnet address: 10011000.11010101.00001000.00000000 (
How Host A sends a message/packet to Host B:
When both are on same network: the host address bits are used to identify the host within the network.
Both are on different networks: the router uses the network mask to identify the network and route the packet. The host can be identified using the network host address.
The network layer is responsible for making routing decisions. A routing table is used to store the path information and the cost involved with that path, while a routing algorithm uses the routing table to decide the path on which to route the packets.
Routing is broadly classified into Static and Dynamic Routing based on whether the table is fixed or it changes based on the current network condition.

Compare and contrast IPv4 and IPv6

Compare and contrast the IPv4 and IPv6 protocols.

IPv4 and IPv6 are the internet protocols applied at the network layer. IPv4 is the most widely used protocol right now and IPv6 is the next generation protocol for internet.

  • IPv4 is the fourth version of Internet protocol which uses 32 bit addressing whereas IPv6 is a next generation internet protocol which uses 128 bits addressing.
  • IPv4 allows 4,294,967,296 unique addresses where as IPv6 can hold 340-undecillion (34, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000) unique IP addresses.
  • IPv4 has different class types: A,B,C,D and E. Class A, Class B, and Class C are the three classes of addresses used on IP networks in common practice. Class D addresses are reserved for multicast. Class E addresses are simply reserved, meaning they should not be used on IP networks (used on a limited basis by some research organizations for experimental purposes).
  • IPv4 has different class types: A,B,C,D and E. Class A, Class B, and Class C are the three classes of addresses used on IP networks in common practice. Class D addresses are reserved for multicast. Class E addresses are simply reserved, meaning they should not be used on IP networks (used on a limited basis by some research organizations for experimental purposes).
    1. Unicast addresses: A Unicast address acts as an identifier for a single interface. An IPv6 packet sent to a Unicast address is delivered to the interface identified by that address.
    2. Multicast addresses: A Multicast address acts as an identifier for a group / set of interfaces that may belong to the different nodes. An IPv6 packet delivered to a multicast address is delivered to the multiple interfaces.
    3. Anycast addresses: Anycast addresses act as identifiers for a set of interfaces that may belong to the different nodes. An IPv6 packet destined for an Anycast address is delivered to one of the interfaces identified by the address.
  • IPv4 address notation:,
  • IPv6 addresses are denoted by eight groups of hexadecimal quartets separated by colons in between them.
  • An example of a valid IPv6 address: 2001:cdba:0000:0000:0000:0000:3257:9652

Because of the increase in the population, there is a need of Ipv6 protocol which can provide solution for:

  1. Increased address space
  2. More efficient routing
  3. Reduced management requirement
  4. Improved methods to change ISP
  5. Better mobility support
  6. Multi-homing
  7. Security
  8. Scoped address: link-local, site-local and global-address space

Common routing protocol

Explain any common routing protocol in detail. For example: BGP, OSPF, RIP.


  1. BGP: Border Gateway Protocol
    BGP is the core routing protocol of the Internet . When a BGP router first comes up on the Internet, either for the first time or after being turned off, it establishes connections with the other BGP routers with which it directly communicates. The first thing it does is download the entire routing table of each neighboring router. After that it only exchanges much shorter update messages with other routers.
    BGP routers send and receive update messages to indicate a change in the preferred path to reach a computer with a given IP address. If the router decides to update its own routing tables because this new path is better, then it will subsequently propagate this information to all of the other neighboring BGP routers to which it is connected, and they will in turn decide whether to update their own tables and propagate the information further.
  2. RIP: Routing Information Protocol
    RIP provides the standard IGP protocol for local area networks, and provides great network stability, guaranteeing that if one network connection goes down the network can quickly adapt to send packets through another connection.
    What makes RIP work is a routing database that stores information on the fastest route from computer to computer, an update process that enables each router to tell other routers which route is the fastest from its point of view, and an update algorithm that enables each router to update its database with the fastest route communicated from neighboring routers.
  3. OSPF: Open Shortest Path First
    Open Shortest Path First (OSPF) is a particularly efficient IGP routing protocol that is faster than RIP, but also more complex.
    The main difference between OSPF and RIP is that RIP only keeps track of the closest router for each destination address, while OSPF keeps track of a complete topological database of all connections in the local network. The OSPF algorithm works as described below:

    • Startup. When a router is turned on it sends Hello packets to all of its neighbors, receives their Hello packets in return, and establishes routing connections by synchronizing databases with adjacent routers that agree to synchronize.
    • Update. At regular intervals each router sends an update message called its “link state” describing its routing database to all the other routers, so that all routers have the same description of the topology of the local network.
    • Shortest path tree. Each router then calculates a mathematical data structure called a “shortest path tree” that describes the shortest path to each destination address and therefore indicates the closest router to send to for each communication; in other words — “open shortest path first”.

What happens after typing a URL into a browser

Explain what happens, step by step, after you type a URL into a browser. Use as much detail as possible.

My initial thoughts:
Take the URL of my website as example:

  1. The browser searches for its cookie to see if the URL was stored. If yes, get its server address.
  2. If the URL is not stored, the browser sends a request to the domain server. The domain server returns a server address of my blog. This is domain resolution.
  3. The browser then sends a request to the server of my blog, the server returns a redirected address.
  4. The browser then sends another request following the redirected address, the server will returns a package of data, probably using POST method.
  5. The browser then render the data to show the webpage of the URL.

There’s no right, or even complete, answer for this question This question allows you to go into arbitrary amounts of detail depending on what you’re comfortable with. Here’s a start though:

  1. Browser contacts the DNS server to find the IP address of URL.
  2. DNS returns back the IP address of the site.
  3. Browser opens TCP connection to the web server at port 80.
  4. Browser fetches the html code of the page requested.
  5. Browser renders the HTML in the display window.
  6. Browser terminates the connection when window is closed.

One of the most interesting steps is Step 1 and 2 – “Domain Name Resolution”. The web addresses we type are nothing but an alias to an IP address in human readable form. Mapping of domain names and their associated Internet Protocol (IP) addresses is managed by the Domain Name System (DNS), which is a distributed but hierarchical entity.
Each domain name server is divided into zones. A single server may only be responsible for knowing the host names and IP addresses for a small subset of a zone, but DNS servers can work together to map all domain names to their IP addresses. That means if one domain name server is unable to find the IP addresses of a requested domain then it requests the information from other domain name servers.

Allocate a two dimensional array using one call of malloc in C

Write a function called my2DAlloc which allocates a two dimensional array. Minimize the number of calls to malloc and make sure that the memory is accessible by the notation arr[i][j].

We will use one call to malloc.
Allocate one block of memory to hold the row vector and the array data. The row vector will reside in rows * sizeof(int*) bytes. The integers in the array will take up another rows * cols * sizeof(int) bytes.
Constructing the array in a single malloc has the added benefit of allowing disposal of the array with a single free call rather than using a special function to free the subsidiary data blocks.

int** My2DAlloc(int rows, int cols) {
	int header = rows * sizeof(int*);
	int data = rows * cols * sizeof(int);
	int** rowptr = (int**)malloc(header + data);
	int* buf = (int*)(rowptr + rows);
	int k;
	for (k = 0; k < rows; ++k) {
		rowptr[k] = buf + k*cols;
	return rowptr;

Here is a demo:

Aligned malloc in C

Write an aligned malloc & free function that takes number of bytes and aligned byte (which is always power of 2).
align_malloc (1000,128) will return a memory address that is a multiple of 128 and that points to memory of size 1000 bytes.
aligned_free() will free memory allocated by align_malloc.


  1. We will use malloc routine provided by C to implement the functionality.
    Allocate memory of size (bytes required + alignment – 1 + sizeof(void*)) using malloc.
    alignment: malloc can give us any address and we need to find a multiple of alignment. (Therefore, at maximum multiple of alignment, we will be alignment-1 bytes away from any location.)
    sizeof(size_t): We are returning a modified memory pointer to user, which is different from the one that would be returned by malloc. We also need to extra space to store the address given by malloc, so that we can free memory in aligned_free by calling free routine provided by C.
  2. If it returns NULL, then aligned_malloc will fail and we return NULL.
  3. Else, find the aligned memory address which is a multiple of alignment (call this p2).
  4. Store the address returned by malloc (e.g., p1 is just size_t bytes ahead of p2), which will be required by aligned_free.
  5. Return p2.
void* aligned_malloc(size_t required_bytes, size_t alignment) {
	void* p1; // original block
	void** p2; // aligned block
	int offset = alignment - 1 + sizeof(void*);
	if ((p1 = (void*)malloc(required_bytes + offset)) == NULL) {
		return NULL;
	p2 = (void**)(((size_t)(p1) + offset) & ~(alignment - 1));
	p2[-1] = p1;
	return p2;

void aligned_free(void *p) {

Device FIFO queue problem

A device boots with an empty FIFO queue. In the first 400 ns period after startup, and in each subsequent 400 ns period, a maximum of 80 words will be written to the queue. Each write takes 4 ns. A worker thread requires 3 ns to read a word, and 2 ns to process it before reading the next word. What is the shortest depth of the FIFO such that no data is lost?

While a perfectly optimal solution is complex, an interviewer is most interested in how you approach the problem.
First, note that writes do not have to be evenly distributed within a period. Thus a likely worst case is 80 words are written at the end of the first period, followed by 80 more at the start of the next.
Note that the maximum write rate for a full period is exactly matched by a full period of processing (400 ns / ((3 ns + 2 ns)/process) = 80 processed words/period).
As the 2nd period in our example is fully saturated, adding writes from a 3rd period would not add additional stress, and this example is a true worst case for the conditions.
For an estimate of maximum queue size, notice that these 160 writes take 640 ns (160 writes * 4 ns / write = 640 ns), during which time only 128 words have been read (640 ns / ((3 ns + 2 ns) / word) = 128 words). However, the first read cannot start until the first write has finished, which fills an extra slot in the queue.
Also, depending on the interactions between read and write timing, a second additional slot may be necessary to ensure a write does not trash the contents of a concurrently occurring read. Thus, a safe estimate is that the queue must be at least 34 words deep (160 – 128 + 1 + 1 = 34) to accommodate the unread words.

Previous Older Entries