You have a large text file containing words. Given any two words, find the shortest distance (in terms of number of words) between them in the file. Can you make the searching operation in O(1) time? What about the space complexity for your solution?
My initial thoughts:
We can build and store the mapping between pairs of words to their shortest distance. That is space complexity. But the query is constant since it’s just one step of look-up.
My initial codes:
private static Map<HashSet<String>, Integer> distances = new HashMap<HashSet<String>, Integer>(); public static int query(String word1, String word2) { return distances.get(new Pair<String>(word1, word2)); } public static void buildMap(String text) { // clean up the text String[] temp = text.split("[\\s\\p{Punct}]"); List<String> words = new ArrayList<String>(); for (String word : temp) { word = word.trim(); if (!word.isEmpty()) { words.add(word); } } // build the mapping between pairs of words to // their shortest distances for (int i = 0; i < words.size(); ++i) { for (int j = i + 1; j < words.size(); ++j) { if (!words.get(i).equals(words.get(j))) { HashSet<String> pair = new HashSet<String>(); pair.add(words.get(i)); pair.add(words.get(j)); if (distances.keySet().contains(pair)) { int curr = distances.get(pair); if (j - i < curr) distances.put(pair, j - i); } else { distances.put(pair, j - i); } } } } }Solution:
Please refer to the book: Cracking the Coding Interview.
Another way to solve this is by:
- Pre-processing: Store the locations for different words in a hashtable. One scan of the text: time complexity. Approximately space complexity.
- Query: Modified binary search. For example, query is (hello, world). Lookup in the hashtable we have “hello” -> [1,2] and “world” -> [5,8,9]. We search 1 in [5,8,9] to find the nearest, which is 5. So the distance is 4. We search 2 in [5,8,9] to find the nearest, which is 5 again, yielding distance 3, less than 4. So the shortest distance between “hello” and “world” is 3. In practice, number of locations for a word is relatively small comparing to the size of the text, hence the cost of query/search is nearly constant.
Implementation of the modified binary search:
private static Map<String, ArrayList<Integer>> locations = new HashMap<String, ArrayList<Integer>>(); private static void storeLocations(String[] text) { for (int i = 0; i < text.length; ++i) { String word = text[i]; if (locations.keySet().contains(word)) { ArrayList<Integer> location = locations.get(word); location.add(i); Collections.sort(location); locations.put(word, location); } else { ArrayList<Integer> location = new ArrayList<Integer>(); location.add(i); locations.put(word, location); } } } private static int modified_binary_search(ArrayList<Integer> array, int target) { int low = 0; int high = array.size(); while (low < high) { int mid = low + (high - low) / 2; if (target == array.get(mid)) return target; if (target < array.get(mid)) high = mid; else low = mid + 1; } if (low >= 0 && low < array.size()) return array.get(low); else return array.get(array.size() - 1); } private static int shortest_distance(String a, String b) { int min = Integer.MAX_VALUE; for (int index_a : locations.get(a)) { ArrayList<Integer> array = locations.get(b); int nearest_index_b = modified_binary_search(array, index_a); int distance = Math.abs(nearest_index_b - index_a); if (distance < min) min = distance; } return min; }