Find duplicate elements in the array with 4KB memory only

You have an array with all the numbers from 1 to N, where N is at most 32,000. The array may have duplicate entries and you do not know what N is. With only 4KB of memory available, how would you print all duplicate elements in the array?

My initial thoughts:
4KB = 32768 bits. We can use a byte array to represent if we have seen number i. If so, we make it 1. If we encounter a duplicate, we would have marked the particular position with 1, so we would know we have a duplicate. Then we print it out.

My initial codes:

	public static void findDuplicates(int[] array) {
		byte[] bitVector = new byte[32000 / 8];
		for (int number : array) {
			int posInArray = number >> 3;
			int posInBit = number % 8;
			if ((bitVector[posInArray] & (1 << posInBit)) > 0)
				// found duplicates
				System.out.println(number);
			else
				bitVector[posInArray] |= (1 << posInBit);
		}
	}

Solution:
We have 4KB of memory which means we can address up to 8 * 4 * (2^10) bits. Note that 32*(2^10) bits is greater than 32000. We can create a bit vector with 32000 bits, where each bit represents one integer.
NOTE: While this isn’t an especially difficult problem, it’s important to implement this cleanly. We will define our own bit vector class to hold a large bit vector.

	public static void checkDuplicates(int[] array) {
		BitSet bs = new BitSet(32000);
		for (int i = 0; i < array.length; i++) {
			int num = array[i];
			int num0 = num - 1; // bitset starts at 0, numbers start at 1
			if (bs.get(num0)) {
				System.out.println(num);
			} else {
				bs.set(num0);
			}
		}
	}

	class BitSet {
		int[] bitset;
		
		public BitSet(int size) {
			bitset = new int[size >> 5]; // divide by 32
		}

		boolean get(int pos) {
			int wordNumber = (pos >> 5); // divide by 32
			int bitNumber = (pos & 0x1F); // mod 32
			return (bitset[wordNumber] & (1 << bitNumber)) != 0;
		}

		void set(int pos) {
			int wordNumber = (pos >> 5); // divide by 32
			int bitNumber = (pos & 0x1F); // mod 32
			bitset[wordNumber] |= 1 << bitNumber;
		}
	}

Comments:
bitset starts at 0 while numbers start at 1. I didn’t pay attention to that.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: