Intro of some sort. Still in progress!
1. Two sum
The premise of the Two Sum problem is simple: given an array of integers and some target integer, return the indices of the two numbers such that they add up to the target integer. Additionally, for the purposes of this post, we will make the modification that instead of returning the indices, we simply return whether we can find two integers that add up to the target integer. There are a few approaches to this problem:
- We can use two nested
for
loops: an outer one to select each element, and an inner one to check if any subsequent elements sums to the target integer. - We can sort the array first, then loop over the elements using two pointers: one at the beginning of the list and one at the end, and move inward based on whether the current sum is too small or too large.
- We build a hash map of values to indices, then loop through the array checking if the complement (target minus current value) exists in our map.
- An improvement over the approach in 3. is to build the hash map while simultaenously checking for complements in a single pass, eliminating the need for our second loop.
Comparing the approaches to these problems using big-\(O\) analysis, where \(N\) is the length of the array, we find that the first approach is clearly \(O(N^2)\) time since we're checking every pair of elements. The second approach is dominated by \(O(N \log N)\) time for sorting. The third and fourth approaches are taught to be the fastest: \(O(N)\) time (with the third approach slower by a factor of two), with the tradeoff being \(O(N)\) space to store the hash map.
So, if all we cared about was speed, by asymptotic analysis, the hash table dominates. Unfortunately, in the real world, it is not so simple.
2. Practical implementation
Suppose we were implementing the "algorithmically efficient" solution in C++, using the standard library's std::unordered_map
.
References
- P. Celis, P. -A. Larson and J. I. Munro. Robin hood hashing. Annual Symposium on Foundations of Computer Science (sfcs 1985), 1985, pp. 281-288, doi: 10.1109/SFCS.1985.48.
- Robin Hood Hashing, David Gries, 2021
- Robin Hood Hashing should be your default Hash Table implementation, Sebastian Sylvan, May 8, 2013 (Hacker News post)
- More on Robin hood Hashing, Seabstian Sylvan, Aug 5, 2013
- Robin Hood hashing, Emmanual Goossaert, Nov 11, 2013
- Robin Hood hashing: backward shift deletion, Emmanuel Goossaert, Nov 17, 2013
- Building a HashMap in Rust - Part 1: What's a HashMap?, Alexis Beingessner
- tmmcguire/rust-toys/pony/anagrams/robinhood.pony, Tommy M. McGuire
- CppCon 2014: "Efficiency with Algorithms, Performance with Data Structures", Chandler Carruth
- CppCon 2017: "Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step", Matt Kulukundis
- code::dive 2014: Cpu Caches and Why You Care, Scott Meyers
- Benchmark of major hash maps implementations, Thibaut Goetghebuer-Planchon, Oct 5, 2017
- Solving "Two Sum" in C with a tiny hash table, Chris Wellons, June 26, 2023 (Hacker News post)