Take Away
Fascinating talk on using probabilistic data structures to save oodles of search time if a limited number of false indicators can be tolerated.
Introduction
Tyler McMullen - Scribd
Different Data Structures
Why?
- Speed
- Memory
- Clarity
Some very interesting structures
- Bloom Filter
- BK-tree
- Splay Tree
- Trie
Bloom Filters
- Tests for existence in a set
- Probabilistic
- Minimal memory use
Example: 100million strings in a set
Tradition set: 10gb minimum vs 280mb
How does it work?
Binary sequence. Uses hash
In places where occasional false positives are okay
BK-tree
find items within a distance of a target
reduces search space
works inside a metric space
Triangle Inequality
If we know the distace between 2 of 3 points, then we can make assumptions about the distance between the remaining "unmeasured" two points.
Uses:
- Most often used for spelling corrections
- Work in any metric space
- Reduce the search space.
Splay Tree
- Self-blancing binary tree
- Brings most accessed items toward root
- The more uneven the access pattern, the better the performance.
Good for caches, garbage collectors, etc.
Trie
(pronounced "try")
- O(1) (order 1) on lookup, add, removal
- Ordered traversals
- Prefix matchine
- Excellent memory management.
Useful as an autocompleter.
Interesting, he implemented this as a rack filter.
No comments:
Post a Comment