Shortest distances between two words in a file

You have a large text file containing words. Given any two words, find the shortest distance (in terms of number of words) between them in the file. Can you make the searching operation in O(1) time? What about the space complexity for your solution?

My initial thoughts:
We can build and store the mapping between pairs of words to their shortest distance. That is O(n^2) space complexity. But the query is constant since it’s just one step of look-up.

My initial codes:

	private static Map<HashSet<String>, Integer> distances = 
			new HashMap<HashSet<String>, Integer>();

	public static int query(String word1, String word2) {
		return distances.get(new Pair<String>(word1, word2));
	}

	public static void buildMap(String text) {
		// clean up the text
		String[] temp = text.split("[\\s\\p{Punct}]");
		List<String> words = new ArrayList<String>();
		for (String word : temp) {
			word = word.trim();
			if (!word.isEmpty()) {
				words.add(word);
			}
		}

		// build the mapping between pairs of words to
		// their shortest distances
		for (int i = 0; i < words.size(); ++i) {
			for (int j = i + 1; j < words.size(); ++j) {
				if (!words.get(i).equals(words.get(j))) {
					HashSet<String> pair = new HashSet<String>();
					pair.add(words.get(i));
					pair.add(words.get(j));
					if (distances.keySet().contains(pair)) {
						int curr = distances.get(pair);
						if (j - i < curr)
							distances.put(pair, j - i);
					} else {
						distances.put(pair, j - i);
					}
				}
			}
		}
	}

Solution:
Please refer to the book: Cracking the Coding Interview.
Another way to solve this is by:

  1. Pre-processing: Store the locations for different words in a hashtable. One scan of the text: O(n) time complexity. Approximately O(n) space complexity.
  2. Query: Modified binary search. For example, query is (hello, world). Lookup in the hashtable we have “hello” -> [1,2] and “world” -> [5,8,9]. We search 1 in [5,8,9] to find the nearest, which is 5. So the distance is 4. We search 2 in [5,8,9] to find the nearest, which is 5 again, yielding distance 3, less than 4. So the shortest distance between “hello” and “world” is 3. In practice, number of locations for a word is relatively small comparing to the size of the text, hence the cost of query/search is nearly constant.

Implementation of the modified binary search:

	private static Map<String, ArrayList<Integer>> locations = new HashMap<String, ArrayList<Integer>>();

	private static void storeLocations(String[] text) {
		for (int i = 0; i < text.length; ++i) {
			String word = text[i];
			if (locations.keySet().contains(word)) {
				ArrayList<Integer> location = locations.get(word);
				location.add(i);
				Collections.sort(location);
				locations.put(word, location);
			} else {
				ArrayList<Integer> location = new ArrayList<Integer>();
				location.add(i);
				locations.put(word, location);
			}
		}
	}

	private static int modified_binary_search(ArrayList<Integer> array,
			int target) {
		int low = 0;
		int high = array.size();
		while (low < high) {
			int mid = low + (high - low) / 2;
			if (target == array.get(mid))
				return target;
			if (target < array.get(mid))
				high = mid;
			else
				low = mid + 1;
		}
		if (low >= 0 && low < array.size())
			return array.get(low);
		else
			return array.get(array.size() - 1);
	}

	private static int shortest_distance(String a, String b) {
		int min = Integer.MAX_VALUE;
		for (int index_a : locations.get(a)) {
			ArrayList<Integer> array = locations.get(b);
			int nearest_index_b = modified_binary_search(array, index_a);
			int distance = Math.abs(nearest_index_b - index_a);
			if (distance < min)
				min = distance;
		}
		return min;
	}

Check if there is a route between two nodes in a Diagraph

Given a directed graph, design an algorithm to find out whether there is a route between two nodes.

My initial thoughts:
Starting from the source, using DFS to traverse the graph to check if we arrive at the target.

My initial codes:


	public static boolean isRouteBetween(Digraph g, Integer start, Integer end) {
		Stack<Integer> stack = new Stack<Integer>();
		Set<Integer> visited = new HashSet<Integer>();
		if (start != null) {
			stack.push(start);

			while (!stack.isEmpty()) {
				Integer curr = stack.pop();
				if (!visited.contains(curr)) {
					if (curr.equals(end))
						return true;
					else {
						visited.add(curr);
						for (Integer child : g.adj(curr))
							stack.push(child);
					}
				}
			}
		}
		return false;
	}

Comments:

  • Time complexity: O(|V|+|E|). Space complexity: O(|V|).
  • Use Stack for DFS. Use Queue for BFS.

Solution:

	public enum State {
		Unvisited, Visited, Visiting;
	}

	public static boolean search(Graph g, Node start, Node end) {
		LinkedList<Node> q = new LinkedList<Node>(); // operates as Stack
		for (Node u : g.getNodes()) {
			u.state = State.Unvisited;
		}
		start.state = State.Visiting;
		q.add(start);
		Node u;
		while (!q.isEmpty()) {
			u = q.removeFirst(); // i.e., pop()
			if (u != null) {
				for (Node v : u.getAdjacent()) {
					if (v.state == State.Unvisited) {
						if (v == end) {
							return true;
						} else {
							v.state = State.Visiting;
							q.add(v);
						}
					}
				}
				u.state = State.Visited;
			}
		}
		return false;
	}