Find largest 1M numbers in 1B numbers

Describe an algorithm to find the largest 1 million numbers in 1 billion numbers. Assume that the computer memory can hold all one billion numbers.

Thoughts:
We can certainly sort those numbers and return the last 1 million of them. That takes O(n\log n).
If we think about it, we actually do not need to sort them. After all, we just need the largest 1 million numbers, in whatever orders. Therefore we can sort of “partially” sort the numbers and try to find the parts that we need. To do this, we get inspiration from Quicksort, where we get two partitions around pivot during each run:

[elements < pivot][elements >= pivot]

For simplicity, Let us denote the number of elements in the right part as m.
If m is exactly equal to 1 million, we have found the largest 1 million numbers!
If m is larger than 1 million, we need to keep looking for the 1 million numbers, but in the left part of the array this time. We can do this in a recursive way.
If m is less than 1 million, we first remember those m numbers and then we search for the largest (1 million – m) numbers in the left part of the array. Again, recursion here.
Using the random pivot choosing approach, this algorithm takes O(n) time complexity and it does not need any additional space since we can do the partition in place.

Codes:

	public static int quickPartition(int[] array, int low, int high) {
		if (high <= low)
			return -1;

		// randomly choose a pivot
		int index = new Random().nextInt(high - low + 1) + low;
		int pivot = array[index];

		// swap the pivot to the front
		int num = array[index];
		array[index] = array[low];
		array[low] = num;

		// partition as in Quicksort
		int L = low + 1;
		int R = high;

		while (R >= L) {
			while (R >= L && array[L] < pivot)
				L++;
			while (R >= L && array[R] >= pivot)
				R--;
			if (R >= L) {
				int temp = array[L];
				array[L] = array[R];
				array[R] = temp;
				L++;
				R--;
			}
		}

		// swap the pivot back to appropriate position.
		num = array[R];
		array[R] = array[low];
		array[low] = num;

		return R;
	}

	public static List<Integer> largestM(int[] numbers, int M, int low, int high) {
		// RIndex = index of pivot
		int RIndex = quickPartition(numbers, low, high);
		// mFound = # of the largest numbers
		int mFound = high + 1 - RIndex;

		List<Integer> results = new LinkedList<Integer>();

		if (mFound == M) { // Done!
			int i = RIndex;
			for (; i <= high; ++i)
				results.add(numbers[i]);
			return results;

		} else if (mFound > M) { // Keep looking
			// check if those mFound elements are actually the same
			boolean duplicates = true;
			for (int i = RIndex; i <= high; ++i) {
				if (numbers[i] != numbers[RIndex])
					duplicates = false;
			}

			if (!duplicates)
				return largestM(numbers, M, RIndex, high);
			else {
				// if they are the same, just return M duplicates of them
				for (int k = 0; k < M; ++k)
					results.add(numbers[RIndex]);
				return results;
			}
		} else { // Has found some, keep looking for the rest
			int i = RIndex;
			for (; i < numbers.length; ++i)
				results.add(numbers[i]);

			results.addAll(largestM(numbers, M - mFound, low, RIndex - 1));
			return results;
		}

	}
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: