Fascinating talk on using probabilistic data structures to save oodles of search time if a limited number of false indicators can be tolerated.
Tyler McMullen - Scribd
Different Data Structures
Some very interesting structures
- Bloom Filter
- Splay Tree
- Tests for existence in a set
- Minimal memory use
Example: 100million strings in a set
Tradition set: 10gb minimum vs 280mb
How does it work?
Binary sequence. Uses hash
In places where occasional false positives are okay
find items within a distance of a target
reduces search space
works inside a metric space
If we know the distace between 2 of 3 points, then we can make assumptions about the distance between the remaining "unmeasured" two points.
- Most often used for spelling corrections
- Work in any metric space
- Reduce the search space.
- Self-blancing binary tree
- Brings most accessed items toward root
- The more uneven the access pattern, the better the performance.
Good for caches, garbage collectors, etc.
- O(1) (order 1) on lookup, add, removal
- Ordered traversals
- Prefix matchine
- Excellent memory management.
Useful as an autocompleter.
Interesting, he implemented this as a rack filter.