Shortest distances between two words in a file

You have a large text file containing words. Given any two words, find the shortest distance (in terms of number of words) between them in the file. Can you make the searching operation in O(1) time? What about the space complexity for your solution?

My initial thoughts:
We can build and store the mapping between pairs of words to their shortest distance. That is O(n^2) space complexity. But the query is constant since it’s just one step of look-up.

My initial codes:

	private static Map<HashSet<String>, Integer> distances = 
			new HashMap<HashSet<String>, Integer>();

	public static int query(String word1, String word2) {
		return distances.get(new Pair<String>(word1, word2));

	public static void buildMap(String text) {
		// clean up the text
		String[] temp = text.split("[\\s\\p{Punct}]");
		List<String> words = new ArrayList<String>();
		for (String word : temp) {
			word = word.trim();
			if (!word.isEmpty()) {

		// build the mapping between pairs of words to
		// their shortest distances
		for (int i = 0; i < words.size(); ++i) {
			for (int j = i + 1; j < words.size(); ++j) {
				if (!words.get(i).equals(words.get(j))) {
					HashSet<String> pair = new HashSet<String>();
					if (distances.keySet().contains(pair)) {
						int curr = distances.get(pair);
						if (j - i < curr)
							distances.put(pair, j - i);
					} else {
						distances.put(pair, j - i);

Please refer to the book: Cracking the Coding Interview.
Another way to solve this is by:

  1. Pre-processing: Store the locations for different words in a hashtable. One scan of the text: O(n) time complexity. Approximately O(n) space complexity.
  2. Query: Modified binary search. For example, query is (hello, world). Lookup in the hashtable we have “hello” -> [1,2] and “world” -> [5,8,9]. We search 1 in [5,8,9] to find the nearest, which is 5. So the distance is 4. We search 2 in [5,8,9] to find the nearest, which is 5 again, yielding distance 3, less than 4. So the shortest distance between “hello” and “world” is 3. In practice, number of locations for a word is relatively small comparing to the size of the text, hence the cost of query/search is nearly constant.

Implementation of the modified binary search:

	private static Map<String, ArrayList<Integer>> locations = new HashMap<String, ArrayList<Integer>>();

	private static void storeLocations(String[] text) {
		for (int i = 0; i < text.length; ++i) {
			String word = text[i];
			if (locations.keySet().contains(word)) {
				ArrayList<Integer> location = locations.get(word);
				locations.put(word, location);
			} else {
				ArrayList<Integer> location = new ArrayList<Integer>();
				locations.put(word, location);

	private static int modified_binary_search(ArrayList<Integer> array,
			int target) {
		int low = 0;
		int high = array.size();
		while (low < high) {
			int mid = low + (high - low) / 2;
			if (target == array.get(mid))
				return target;
			if (target < array.get(mid))
				high = mid;
				low = mid + 1;
		if (low >= 0 && low < array.size())
			return array.get(low);
			return array.get(array.size() - 1);

	private static int shortest_distance(String a, String b) {
		int min = Integer.MAX_VALUE;
		for (int index_a : locations.get(a)) {
			ArrayList<Integer> array = locations.get(b);
			int nearest_index_b = modified_binary_search(array, index_a);
			int distance = Math.abs(nearest_index_b - index_a);
			if (distance < min)
				min = distance;
		return min;

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: