Thursday, February 25, 2010

Scenes from LA Ruby Conference - 2010 - Large Databases

Take Away

Large databases are very sensitive to mistakes that don't affect smaller databases. Anything that causes scanning of many records will not only run slowly, it frequently causes the entire web site to go down. Prevent scanning by careful application of indexes and avoiding data transformation operations when querying.


Tim Morgan - Scribd
  • We learn Rails when we break it
  • Scribd is a really large web site (Amazon S3)
  • Large sites are easy to break.
Infamous mistakes

These are the ones you remember for a long time (along with everyone else).

  • Almost always problems of scale.
  • Almost always about how Rails interfaces with the database.
  • Postgres and Oracle are out there, but MySQL is what is being used.
  • If you add something that is a SQL request, look at the SQL itself.
  • Always understand the queries your code is generating. Look at the query log.
  • Test with a heavily populated database. If you find it sucks, think about what your customers think.
  • Pay close attention to your indexes.
  • MySQL and Postgres have very different implementations of how their indexes work.
The problem with find_in_batches.
  • Iterating with User.all.each is going to be very slow, because it loads every record at once.
  • Better to use find_each, which fetches records in batches of x items per query.
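
To make the batching idea concrete, here is a minimal plain-Ruby sketch, not Rails code. The in-memory `users` array stands in for a table, and `fetch_batch` and `each_in_batches` are hypothetical helpers invented for illustration. It assumes batches are fetched in primary key order with an "id greater than the last one seen" condition, which is roughly how batched finders tend to work.

```ruby
# In-memory stand-in for a users table: rows are hashes with an integer :id.
users = (1..10).map { |i| { :id => i, :login_name => "user#{i}" } }

# Hypothetical query helper: returns up to `limit` rows with id > last_id,
# ordered by id. Mimics "WHERE id > ? ORDER BY id LIMIT ?".
def fetch_batch(table, last_id, limit)
  table.select { |row| row[:id] > last_id }
       .sort_by { |row| row[:id] }
       .first(limit)
end

# Yields every row while holding only one batch at a time.
# Returns the number of queries issued.
def each_in_batches(table, batch_size)
  last_id = 0
  queries = 0
  loop do
    batch = fetch_batch(table, last_id, batch_size)
    queries += 1
    break if batch.empty?
    batch.each { |row| yield row }
    last_id = batch.last[:id]
    # A short batch means there is nothing left to fetch.
    break if batch.size < batch_size
  end
  queries
end

seen = []
query_count = each_in_batches(users, 4) { |row| seen << row[:id] }
```

With 10 rows and a batch size of 4, the sketch issues three small queries instead of one query that materializes everything. Note that it leans on a single, ordered integer key to know where the next batch starts.
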
Composite primary keys? Use the composite keys plugin, which is a monkey-patch.

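problem with :case_sensitive => false
  • took the site down.
  • problem was the generated SQL: lower(login_name). Wrapping the column in a function means the index on login_name can't be used, so every row gets scanned.
  • solution: in MySQL, make the case-insensitive column binary.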
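
A rough plain-Ruby sketch of why the lower(login_name) query hurts. Nothing here is Rails or MySQL code: the `rows` array, the hash used as an "index", and both helper methods are made up for illustration. The point is only the count of rows examined.

```ruby
# Toy "table" and a hash acting as an index on the stored login_name value.
rows = [
  { :id => 1, :login_name => "alice" },
  { :id => 2, :login_name => "Bob" },
  { :id => 3, :login_name => "carol" }
]
index_on_login_name = rows.group_by { |row| row[:login_name] }

# Like WHERE lower(login_name) = ?: the transform has to run on every row,
# so the index is no help. Returns [matches, rows_examined].
def find_with_lower(rows, login)
  examined = 0
  matches = rows.select do |row|
    examined += 1
    row[:login_name].downcase == login.downcase
  end
  [matches, examined]
end

# Like WHERE login_name = ?: a direct index lookup touches only matching rows.
def find_with_index(index, login)
  matches = index.fetch(login, [])
  [matches, matches.size]
end

scan_matches, scan_examined = find_with_lower(rows, "bob")
index_matches, index_examined = find_with_index(index_on_login_name, "Bob")
```

Both lookups find the same row, but the transformed query examines all three rows while the indexed one examines a single row. On a table with millions of users, that difference is the outage.
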
problem with delete and destroy
  • caused by misleading Rails documentation
  • delete_all is faster than destroy because it doesn't run callbacks (hooks).
  • before_destroy positioning is important; it must be declared before any associations.
  • delete_all() called on an association doesn't remove the link table record itself; it just sets the id column to nil.
  • Solution: call delete_all on the link model class instead:
  • CategoryMembership.delete_all :category_id =>
  • Fixed in Rails 3.0
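
The difference between the two behaviours can be modelled in plain Ruby. This is only a toy model, not ActiveRecord: the `memberships` array, both helper methods, and the category id 7 are hypothetical values chosen for the example.

```ruby
# Toy link table: each row joins a category to a document.
memberships = [
  { :id => 1, :category_id => 7, :document_id => 100 },
  { :id => 2, :category_id => 7, :document_id => 101 },
  { :id => 3, :category_id => 8, :document_id => 100 }
]

# The behaviour the notes complain about: rows stay, the key becomes nil.
def nullify_for_category(table, category_id)
  table.each { |row| row[:category_id] = nil if row[:category_id] == category_id }
end

# The behaviour you actually wanted: matching rows are removed.
def delete_for_category(table, category_id)
  table.reject! { |row| row[:category_id] == category_id }
  table
end

# Work on copies so each run starts from the same data.
nullified = nullify_for_category(memberships.map(&:dup), 7)
deleted = delete_for_category(memberships.map(&:dup), 7)
```

After nullifying, all three rows are still in the table and two of them are orphans with a nil category_id. After deleting, only the row for category 8 remains.
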
problem with indexes

Think about indexes early.
  • MySQL generally uses only one index per table in a query, so you may have to figure out an index on multiple columns.
  • you may have to tell MySQL which index to use: use index (index_documents_on_user_id).
  • Always understand the queries your code is generating.
  • Test with a heavily populated database.
  • Pay close attention to your indexes.
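
As an illustration of the index hint mentioned above, here is a small Ruby helper that builds a MySQL query with USE INDEX. The helper name is hypothetical, and the `documents` table and `user_id` column are assumptions inferred from the index name in the notes.

```ruby
# Builds a MySQL SELECT with an index hint. Table and column names are
# assumptions inferred from the index name index_documents_on_user_id.
def documents_for_user_sql(user_id, index_name = "index_documents_on_user_id")
  # Integer() raises on non-numeric input, so the value is safe to interpolate.
  "SELECT * FROM documents USE INDEX (#{index_name}) WHERE user_id = #{Integer(user_id)}"
end

sql = documents_for_user_sql(42)
```

Run the resulting statement through EXPLAIN on a heavily populated database to check that the hinted index is the one being used.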
