Old Fart Developer: Scenes from LA Ruby Conference - 2010

Take Away

Fascinating talk on using probabilistic data structures to save oodles of search time if a limited number of false indicators can be tolerated.

Introduction

Tyler McMullen - Scribd

Different Data Structures

Why?

Speed
Memory
Clarity

Some very interesting structures

Bloom Filter
BK-tree
Splay Tree
Trie

Bloom Filters

Tests for existence in a set
Probabilistic
Minimal memory use

Example: 100million strings in a set

Tradition set: 10gb minimum vs 280mb

How does it work?

Binary sequence. Uses hash

In places where occasional false positives are okay

BK-tree

find items within a distance of a target

reduces search space

works inside a metric space

Triangle Inequality

If we know the distace between 2 of 3 points, then we can make assumptions about the distance between the remaining "unmeasured" two points.

Uses:

Most often used for spelling corrections
Work in any metric space
Reduce the search space.

Splay Tree

Self-blancing binary tree
Brings most accessed items toward root
The more uneven the access pattern, the better the performance.
Good for caches, garbage collectors, etc.

Trie

(pronounced "try")

O(1) (order 1) on lookup, add, removal
Ordered traversals
Prefix matchine
Excellent memory management.

Useful as an autocompleter.

Interesting, he implemented this as a rack filter.

Old Fart Developer

Thursday, February 25, 2010

Scenes from LA Ruby Conference - 2010 - Data Structures

No comments:

Post a Comment

About Me