Allocate a two dimensional array using one call of malloc in C

Write a function called my2DAlloc which allocates a two dimensional array. Minimize the number of calls to malloc and make sure that the memory is accessible by the notation arr[i][j].

We will use one call to malloc.
Allocate one block of memory to hold both the row vector and the array data. The row vector will occupy rows * sizeof(int*) bytes. The integers in the array will take up another rows * cols * sizeof(int) bytes.
Constructing the array in a single malloc has the added benefit of allowing disposal of the array with a single free call rather than using a special function to free the subsidiary data blocks.

int** My2DAlloc(int rows, int cols) {
	size_t header = rows * sizeof(int*);
	size_t data = rows * cols * sizeof(int);
	int** rowptr = (int**)malloc(header + data);
	if (rowptr == NULL)
		return NULL;
	int* buf = (int*)(rowptr + rows);
	int k;
	for (k = 0; k < rows; ++k) {
		rowptr[k] = buf + k * cols;
	}
	return rowptr;
}

Here is a demo:

Aligned malloc in C

Write an aligned malloc & free function that takes a number of bytes and an alignment (which is always a power of 2).
aligned_malloc(1000, 128) will return a memory address that is a multiple of 128 and that points to memory of size 1000 bytes.
aligned_free() will free memory allocated by aligned_malloc.


  1. We will use the malloc routine provided by C to implement the functionality.
    Allocate memory of size (bytes required + alignment – 1 + sizeof(void*)) using malloc.
    alignment – 1: malloc can give us any address, and we need to find a multiple of alignment within the block. (At worst, the nearest multiple of alignment is alignment – 1 bytes away.)
    sizeof(void*): We are returning a modified memory pointer to the user, which is different from the one returned by malloc. We also need extra space to store the address given by malloc, so that we can free the memory in aligned_free by calling the free routine provided by C.
  2. If malloc returns NULL, then aligned_malloc fails and we return NULL.
  3. Else, find the aligned memory address which is a multiple of alignment (call this p2).
  4. Store the address returned by malloc (call it p1) in the slot just before p2, where aligned_free will look for it.
  5. Return p2.
void* aligned_malloc(size_t required_bytes, size_t alignment) {
	void* p1; // original block
	void** p2; // aligned block
	size_t offset = alignment - 1 + sizeof(void*);
	if ((p1 = malloc(required_bytes + offset)) == NULL) {
		return NULL;
	}
	p2 = (void**)(((size_t)(p1) + offset) & ~(alignment - 1));
	p2[-1] = p1;
	return p2;
}

void aligned_free(void *p) {
	free(((void**)p)[-1]);
}

Device FIFO queue problem

A device boots with an empty FIFO queue. In the first 400 ns period after startup, and in each subsequent 400 ns period, a maximum of 80 words will be written to the queue. Each write takes 4 ns. A worker thread requires 3 ns to read a word, and 2 ns to process it before reading the next word. What is the shortest depth of the FIFO such that no data is lost?

While a perfectly optimal solution is complex, an interviewer is most interested in how you approach the problem.
First, note that writes do not have to be evenly distributed within a period. Thus a likely worst case is 80 words are written at the end of the first period, followed by 80 more at the start of the next.
Note that the maximum write rate for a full period is exactly matched by a full period of processing (400 ns / ((3 ns + 2 ns)/process) = 80 processed words/period).
As the 2nd period in our example is fully saturated, adding writes from a 3rd period would not add additional stress, and this example is a true worst case for the conditions.
For an estimate of maximum queue size, notice that these 160 writes take 640 ns (160 writes * 4 ns / write = 640 ns), during which time only 128 words have been read (640 ns / ((3 ns + 2 ns) / word) = 128 words). However, the first read cannot start until the first write has finished, which fills an extra slot in the queue.
Also, depending on the interactions between read and write timing, a second additional slot may be necessary to ensure a write does not trash the contents of a concurrently occurring read. Thus, a safe estimate is that the queue must be at least 34 words deep (160 – 128 + 1 + 1 = 34) to accommodate the unread words.

Prevent reverse engineering of DLLs

What are the best practices to prevent reverse engineering of DLLs?

Best practices include the following:

  • Use obfuscators.
  • Do not store any data (strings, etc.) in the open; always compress or encode it.
  • Use a static link so there is no DLL to attack.
  • Strip all symbols.
  • Use a .DEF file and an import library to have anonymous exports known only by their export ids.
  • Keep the DLL in a resource and expose it in the file system (under a suitably obscure name, perhaps even generated at run time) only when running.
  • Hide all real functions behind a factory method that exchanges a secret (better, proof of knowledge of a secret) for a table of function pointers to the real methods.
  • Use anti-debugging techniques borrowed from the malware world to prevent reverse
    engineering. (Note that this will likely get you false positives from AV tools.)
  • Use protectors.

How to make sure a process doesn’t access unauthorized part of stack

Discuss how you would make sure that a process doesn’t access an unauthorized part of the stack.

As with any ambiguously worded interview question, it may help to probe the interviewer to understand what specifically you’re intended to solve. Are you trying to prevent code that has overflowed a buffer from compromising the execution by overwriting stack values? Are you trying to maintain some form of thread-specific isolation between threads? Is the code of interest native code like C++ or running under a virtual machine like Java?
Remember that, in a multi-threaded environment, there can be multiple stacks in a process.
Native Code
One threat to the stack is malicious program input, which can overflow a buffer and overwrite stack pointers, thus circumventing the intended execution of the program.
If the interviewer is looking for a simple method to reduce the risk of buffer overflows in native code, modern compilers provide this sort of stack protection through a command line option. With Microsoft’s CL, you just pass /GS to the compiler. With GCC, you can use -fstack-protector-all.
For more complex schemes, you could set individual permissions on the range of memory pages representing the stack section you care about. In the Win32 API, you’d use the VirtualProtect API to mark the page PAGE_READONLY or PAGE_NOACCESS. This will cause code accessing the region to trigger an exception when it touches that section of the stack.
Alternately, you could use the HW debug registers (DRs) to set a read or write breakpoint on the specific memory addresses of interest. A separate process could be used to debug the process of interest and catch the HW exception generated whenever this section of the stack is accessed.
However, it’s very important to note that under normal circumstances, threads and processes are not means of access control. Nothing can prevent native code from writing anywhere within the address space of its process, including to the stack. Specifically, there is nothing to prevent malicious code in the process from calling VirtualProtect and marking the stack sections of interest PAGE_EXECUTE_READWRITE. Equally so, nothing prevents code from zeroing out the HW debug registers, eliminating your breakpoints. In summary, nothing can fully prevent native code from accessing memory addresses, including the stack, within its own process space.
A final option is to consider requiring this code that should be “sandboxed” to run in a managed language like Java or C#/.NET. By default, the virtual machines running managed code in these languages make it impossible to gain complete access to the stack from within the process.
One can use further security features of the runtimes to prevent the code from spawning additional processes or running “unsafe” code to inspect the stack. With .NET, for example, you can use Code Access Security (CAS) or appdomain permissions to control such execution.

Determine a machine is using big endian or little endian

Write a program to find whether a machine is big endian or little endian.


#include <stdio.h>

int main() {
	unsigned int i = 1;
	char *c = (char*)&i;
	if (*c)
		printf("Little endian");
	else
		printf("Big endian");
	return 0;
}

In the above program, the character pointer c points to the integer i. Since a char is 1 byte, dereferencing the pointer yields only the first (lowest-addressed) byte of the integer. On a little-endian machine the least significant byte is stored first, so *c will be 1; on a big-endian machine the most significant byte is stored first, so *c will be 0.

What happens after a user presses a key?

Write a step by step execution of things that happen after a user presses a key on the keyboard. Use as much detail as possible.

My initial thoughts:

  1. A key is pressed, which generates a signal that is transferred to the computer.
  2. That triggers an interrupt, which the OS needs to handle.
  3. The OS maps the key to a keycode according to the key map.
  4. Appropriate functions are executed according to the keycode.


  1. The keyboard sends the scan code of the key to the keyboard controller (scan codes for key press and key release are different).
  2. The keyboard controller interprets the scan code and stores it in a buffer.
  3. The keyboard controller sends a hardware interrupt to the processor. This is done by putting a signal on the “interrupt request line”: IRQ 1.
  4. The interrupt controller maps IRQ 1 to INT 9.
  5. An interrupt is a signal that tells the processor to stop what it is currently doing and perform some special task.
  6. The processor invokes the “interrupt handler”. The CPU fetches the address of the “Interrupt Service Routine” (ISR) from the “Interrupt Vector Table” maintained by the OS (the processor uses the IRQ number for this).
  7. The ISR reads the scan code from port 60h and decides whether to process it or pass control to the program for taking action.

Direct memory access(DMA)

Describe direct memory access (DMA). Can a user level buffer/pointer be used by kernel or drivers?

Direct memory access (DMA) is a feature that provides direct read/write access to system memory without involving the CPU. The DMA controller manages this by requesting system-bus access (DMA request) from the CPU. The CPU completes its current task and grants access by asserting the DMA acknowledge signal. Once it has the bus, the controller reads/writes the data and returns the bus to the CPU by asserting the bus-release signal. This transfer is faster than the usual transfer performed by the CPU. During this time the CPU can continue with processing tasks that don’t require memory access.
By using DMA, drivers can access the memory allocated to the user level buffer/pointer.

Branch Target buffer

What is a Branch Target buffer? Explain how it can be used in reducing bubble cycles in cases of branch misprediction.

Branch misprediction occurs when the CPU mispredicts the next instruction to be executed.
The CPU uses pipelining which allows several instructions to be processed simultaneously. But during a conditional jump, the next instruction to be executed depends on the result of the condition. Branch Prediction tries to guess the next instruction. However, if the guess is wrong, we are penalized because the instruction which was executed must be discarded.
The Branch Target Buffer (BTB) reduces the penalty by predicting the path of the branch, computing the target of the branch, and caching the information used by the branch. There will be no stalls if the branch entry is found in the BTB and the prediction is correct; otherwise the penalty will be at least two cycles.

Virtual memory, page fault and thrashing

Explain the following terms: virtual memory, page fault, thrashing.

My initial thoughts:

  1. Virtual memory is a technique that uses hard disk pages as main memory.
  2. Page fault is the error occurred when transferring pages.
  3. Trashing… No idea.

Virtual memory is a computer system technique which gives an application program the impression that it has contiguous working memory (an address space), while in fact it may be physically fragmented and may even overflow onto disk storage. Systems that use this technique make programming of large applications easier and use real physical memory (e.g. RAM) more efficiently than those without virtual memory.

Page Fault: A page is a fixed-length block of memory that is used as a unit of transfer between physical memory and external storage like a disk, and a page fault is an interrupt (or exception) to the software raised by the hardware, when a program accesses a page that is mapped in address space, but not loaded in physical memory.

Thrashing is the term used to describe a degenerate situation on a computer where increasing resources are used to do a decreasing amount of work; in this situation the system is said to be thrashing. Usually it refers to two or more processes accessing a shared resource repeatedly such that serious system performance degradation occurs because the system is spending a disproportionate amount of time just accessing the shared resource. Resource access time may generally be considered wasted, since it does not contribute to the advancement of any process. In modern computers, thrashing may occur in the paging system (if there is not ‘sufficient’ physical memory or the disk access time is overly long), or in the communications system (especially in conflicts over internal bus access), etc.