Friday, February 26, 2010

OCRuby Feb 26, 2010 meeting minutes

Thought I'd post this here since the OCRuby group submission is moderated.

This is mainly interesting for the MongoDB presentation by Tommy Chheng.

Attendees:

Dash
Tina
Michael Hartl
Tommy Chheng
Scott Smith

Michael has published 7 chapters of his electronic book at railstutorial.org

Tommy: Whither MongoDB for Natural Development

Used by:
  • github
  • EA
  • The New York Times
  • sourceforge
  • The Business Insider
The above are "big" sites.

Why?
  • NoSQL trend?
  • Scalability?
  • Natural Development!
Compare

SQL: fixed schema
  • Data model: rows/tables
  • Data types: primitives
MongoDB: dynamic schema
  • Data model: BSON documents/collections
  • Data types: primitives + arrays/hashes
Records vs Documents

SQL
3 tables
  • Documents
  • Revisions
  • Tags
MongoDB
  • document = {title:...} (reads as Javascript code)
  • BSON data structures map naturally to the hash object in most programming languages.
  • ORM can just be a thin layer.
  • Mongo is a lot easier to debug.
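The "records vs documents" contrast can be shown with a plain-Ruby sketch. It uses no database and no mongo gem. The same data is held first as three SQL-style tables and then as one nested hash, which is the shape a BSON document takes. All field names are hypothetical. Nesting related data like this is also the workaround for MongoDB's document-level atomicity.

```ruby
# SQL-style: three "tables" linked by a document_id foreign key (hypothetical columns).
documents = [{ id: 1, title: "Minutes" }]
revisions = [
  { id: 10, document_id: 1, body: "draft" },
  { id: 11, document_id: 1, body: "final" }
]
tags = [
  { id: 100, document_id: 1, name: "ruby" },
  { id: 101, document_id: 1, name: "mongodb" }
]

# Reassembling one document means a lookup per table (what a SQL join does).
doc = documents.find { |d| d[:id] == 1 }
assembled = doc.merge(
  revisions: revisions.select { |r| r[:document_id] == doc[:id] }.map { |r| r[:body] },
  tags: tags.select { |t| t[:document_id] == doc[:id] }.map { |t| t[:name] }
)

# Document-style: the same data as one nested hash, the shape of a BSON document.
document = {
  title: "Minutes",
  revisions: ["draft", "final"],
  tags: ["ruby", "mongodb"]
}
```

Apart from the SQL-side `id`, the assembled hash and the nested document hold the same data. The nested form needs no join to read back.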
Mongo Install

How to use in Ruby?

Easy to use in Rails or Sinatra
2 gems
  • mongo+mongo_ext gem
  • mongo_mapper gem


MongoMapper Models
  • No Migrations!
  • AR Validations
  • AR Callbacks
  • Testing using Factory Girl
  • Rails 3?
    • a fork which uses ActiveModel exists; see the mailing list
    • it is complete but not merged in yet.

MongoMapper Model

(sample code)

CouchDB or MongoDB?

CouchDB
  • One big database
  • HTTP REST
  • Find by id or map/reduce JS functions
  • MVCC
MongoDB
  • Many collections
  • sockets
  • Find by id or dynamic JS queries
  • Update in place
MVCC: updates by writing a new copy (a new revision) of the document; good for data safety.
Update in place is a lot faster.
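The two update strategies can be compared in a toy Ruby sketch that uses an in-memory hash as storage. The class names are hypothetical. Real CouchDB revision ids are strings, not the plain integers used here.

```ruby
# Update in place (MongoDB-style): overwrite the stored document directly.
class InPlaceStore
  def initialize
    @docs = {}
  end

  def put(id, doc)
    @docs[id] = doc
  end

  def update(id, changes)
    @docs.fetch(id).merge!(changes) # mutates the existing hash; no copy is made
  end

  def get(id)
    @docs.fetch(id)
  end
end

# MVCC (CouchDB-style): every update appends a new frozen revision; old ones stay readable.
class MvccStore
  def initialize
    @revisions = Hash.new { |h, k| h[k] = [] }
  end

  # Returns the new revision number (1-based).
  def put(id, doc)
    @revisions[id] << doc.dup.freeze
    @revisions[id].length
  end

  def update(id, changes)
    put(id, @revisions.fetch(id).last.merge(changes)) # merge returns a copy
  end

  def get(id, rev = nil)
    revs = @revisions.fetch(id)
    rev ? revs.fetch(rev - 1) : revs.last
  end
end
```

The MVCC store pays for a copy on every write. In return, an earlier revision is still intact if a write goes wrong. The in-place store skips the copy, which is where its speed comes from.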

MongoDB by default writes to disk in batches

Limitations

Memory-mapped files:
  • 32-bit: 4GB limit; use 64-bit
Atomic updates only at document level
  • Solve by nesting related data in document
MongoDB lacks:
  • transactions
  • CouchDB-style MVCC revisioning

Author

Twitter @tommychheng
  • using mongoDB + mongo_mapper!

Thursday, February 25, 2010

Scenes from LA Ruby Conference - 2010 - Announcements

Take Away

Continues to be lots of interest and activity in Ruby. Conferences in San Francisco and Phoenix announced for this year. Jobs are available, especially in San Francisco.

Job Available

Product Manager - Slingshot Labs, incubator for News Corporation
Looking for a senior Rails developer.

Ruby Conference

Josh Susser and Jim Myer
Golden Gate Ruby Conference
Announced fall conference: Sept 10-11, 2010 (Fri/Sat)

Primary Sponsor for Today's Conference

AT&T Interactive
Buzz.com: social website. They have t-shirts. Nothing to do w/ Google.

Ruby Conference

Derek w/ Sunnyconf in Phoenix
Sept 25.

LA Ruby Monthly Meetings

Alf Mcgollough 2nd Thursday of each month.
Meetup.com/laruby.
Looking for west side venue that can handle 50 people.

Job Available

AT&T interactive hiring
  • Ruby and data skills
  • Ruby and web and UI skills
  • Oracle/Ruby
  • Service type work in Ruby/Rails
Job Available

NextspRocket.com
A place to pay money to get open source bugs fixed.
Check out the site.

Ron Evans - thanks to organizers

Scenes from LA Ruby Conference - 2010 - Large Databases

Take Away

Large databases are very sensitive to mistakes that don't affect smaller databases. Anything that causes scanning of many records will not only run slowly, it frequently causes the entire web site to go down. Prevent scanning by careful application of indexes and avoiding data transformation operations when querying.

Introduction

Tim Morgan - Scribd
  • We learn Rails when we break it
  • Scribd is a really large web site (Amazon S3)
  • Large sites are easy to break.
Infamous mistakes

These are the ones you remember for a long time (along w/ everyone else).

Causes
  • Almost always problems of scale.
  • Almost always about how Rails interfaces w/ database.
  • Postgres/Oracle also apply, but MySQL is what is being used here.
  • If you add something that is a SQL request, look at the SQL itself.
  • Always understand the queries your code is generating. Look at the query log.
  • Test with a heavily populated database. If you find it sucks, think what your customers think.
  • Pay close attention to your indexes.
  • MySQL and Postgres have very different implementations of how their indexes work.
Iterating over many records: find_in_batches
  • Doing it using User.all.each is going to be very slow; it loads every record at once.
  • Better to use find_each, which fetches x records per query, in batches.
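find_each and find_in_batches walk the table in primary-key order and issue one bounded query per batch. Rails defaults to 1,000 records per batch. The plain-Ruby sketch below mimics that loop. The `USERS` table and the `fetch_batch` helper are hypothetical stand-ins for the database and its SQL query. This is not ActiveRecord's own code.

```ruby
# Hypothetical in-memory "users" table, already in primary-key order.
USERS = (1..2_500).map { |id| { id: id, name: "user#{id}" } }

# Stand-in for: SELECT * FROM users WHERE id > ? ORDER BY id LIMIT ?
def fetch_batch(after_id, limit)
  USERS.select { |u| u[:id] > after_id }.first(limit)
end

# Batching in the spirit of find_in_batches: never hold the whole table at once.
def each_in_batches(batch_size)
  last_id = 0
  loop do
    batch = fetch_batch(last_id, batch_size)
    break if batch.empty?
    yield batch
    last_id = batch.last[:id] # resume after the last primary key seen
  end
end

sizes = []
each_in_batches(1_000) { |batch| sizes << batch.size }
```

Each batch starts from the last id seen rather than an OFFSET. On an indexed primary key, every query stays cheap however deep into the table it gets.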
Composite primary keys? Use the composite keys plugin.
http://gist.github.com/105318 is a monkey-patch.

validates_uniqueness_of
  • :case_sensitive => false
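  • took the site down.
  • problem was the generated SQL LOWER(login_name), which keeps MySQL from using the index on login_name, so every check scanned the table
  • In MySQL, make the case-insensitive column binary.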
  • solution http://gist.github.com/105367
problem with delete and destroy
  • caused by misleading Rails documentation
  • delete_all is faster than destroy because it doesn't run callbacks (hooks).
  • before_destroy positioning is important; it must be placed before any associations.
  • The association's delete_all() doesn't remove the join-table record itself; it just sets the foreign key column to nil.
  • Solution: call delete_all on the join model directly:
  • CategoryMembership.delete_all :category_id => self.id
  • Fixed in Rails 3.0
problem with indexes

Think about indexes early.
  • MySQL uses only one index at a time, so you may need a single index covering multiple columns (a composite index).
  • You may have to tell MySQL which index to use:
  • USE INDEX (index_documents_on_user_id)
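A composite index on (user_id, created_at) can be pictured as the rows kept sorted by that pair. A query on both columns then needs only one seek plus a short scan. This Ruby sketch models the index as a sorted array, whereas real MySQL indexes are B-trees. The table, its columns and the `lookup` helper are all hypothetical.

```ruby
# Hypothetical documents rows: [user_id, created_at (as an Integer), row_id].
rows = [
  [7, 300, :r1], [3, 100, :r2], [7, 100, :r3],
  [3, 250, :r4], [7, 200, :r5], [5, 150, :r6]
]

# Conceptual composite index on (user_id, created_at): entries sorted by that key pair.
index = rows.sort_by { |user_id, created_at, _| [user_id, created_at] }

# WHERE user_id = ? AND created_at >= ?
# One binary search finds the first matching entry; then walk while user_id still matches.
def lookup(index, user_id, since)
  start = index.bsearch_index { |u, c, _| ([u, c] <=> [user_id, since]) >= 0 }
  return [] if start.nil?
  index.drop(start).take_while { |u, _, _| u == user_id }.map { |_, _, row_id| row_id }
end
```

With two separate single-column indexes and only one usable per query, MySQL would still have to filter many rows on the other column. The combined key avoids that.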
Conclusions
  • Always understand the queries your code is generating.
  • Test with a heavily populated database.
  • Pay close attention to your indexes.




Scenes from LA Ruby Conference - 2010 - Data Structures

Take Away

Fascinating talk on using probabilistic data structures to save oodles of search time when a limited number of false positives can be tolerated.

Introduction

Tyler McMullen - Scribd

Different Data Structures

Why?
  • Speed
  • Memory
  • Clarity
Some very interesting structures
  • Bloom Filter
  • BK-tree
  • Splay Tree
  • Trie
Bloom Filters
  • Tests for existence in a set
  • Probabilistic
  • Minimal memory use
Example: 100 million strings in a set
Traditional set: 10 GB minimum vs. 280 MB for a Bloom filter

How does it work?
A bit array (binary sequence) plus hash functions.
Use it in places where occasional false positives are okay.
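Here is a minimal Ruby sketch of a Bloom filter. It assumes SHA-256 from the standard `digest` library as the hash source, and the class and method names are hypothetical. An Integer stands in for the bit array. That makes this an illustration of the logic rather than of the memory savings quoted above.

```ruby
require "digest"

# Minimal Bloom filter sketch: a bit array plus k hash-derived positions per item.
class BloomFilter
  def initialize(num_bits, num_hashes)
    raise ArgumentError, "this sketch supports at most 8 hashes" if num_hashes > 8
    @num_bits = num_bits
    @num_hashes = num_hashes
    @bits = 0 # an Integer used as the bit array
  end

  def add(item)
    positions(item).each { |i| @bits |= (1 << i) }
    self
  end

  # false means "definitely not in the set"; true means "probably in the set".
  def include?(item)
    positions(item).all? { |i| @bits[i] == 1 }
  end

  private

  # Derive k bit positions from one SHA-256 digest (eight 32-bit words).
  def positions(item)
    Digest::SHA256.digest(item.to_s).unpack("N8").first(@num_hashes).map { |h| h % @num_bits }
  end
end
```

An added item always answers true, so there are no false negatives. An item that was never added usually answers false. It answers true only if all of its bits happen to have been set by other items, which is the tolerated false positive.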

BK-tree

find items within a distance of a target
reduces search space
works inside a metric space

Triangle Inequality
If we know the distances from one point to the other two, we can bound the distance between the remaining "unmeasured" pair: |d(a,b) - d(b,c)| <= d(a,c) <= d(a,b) + d(b,c).

Uses:
  • Most often used for spelling corrections
  • Work in any metric space
  • Reduce the search space.
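Here is a small Ruby sketch of a BK-tree for spelling-style lookups. It assumes Levenshtein edit distance as the metric, and the helper and class names are hypothetical. When searching, the triangle inequality lets it skip every child whose edge label falls outside [d - max_dist, d + max_dist].

```ruby
# Levenshtein edit distance; it is a true metric, so the triangle inequality holds.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

class BKTree
  # Each child is stored under the edge label d(child.word, parent.word).
  Node = Struct.new(:word, :children)

  def initialize(&distance)
    @distance = distance
    @root = nil
  end

  def add(word)
    if @root.nil?
      @root = Node.new(word, {})
      return self
    end
    node = @root
    loop do
      d = @distance.call(word, node.word)
      return self if d.zero? # already present
      child = node.children[d]
      if child.nil?
        node.children[d] = Node.new(word, {})
        return self
      end
      node = child
    end
  end

  # All stored words within max_dist of target.
  def search(target, max_dist)
    results = []
    stack = @root ? [@root] : []
    until stack.empty?
      node = stack.pop
      d = @distance.call(target, node.word)
      results << node.word if d <= max_dist
      # Triangle inequality: only subtrees with edge label in [d - max_dist, d + max_dist] can match.
      node.children.each { |edge, child| stack << child if (edge - d).abs <= max_dist }
    end
    results
  end
end
```

Usage: build the tree with `BKTree.new { |a, b| levenshtein(a, b) }`, add a dictionary, and call `search("bok", 1)` to get candidate corrections.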
Splay Tree
  • Self-balancing binary tree
  • Brings most accessed items toward root
  • The more uneven the access pattern, the better the performance.
Good for caches, garbage collectors, etc.
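Here is a compact recursive splay tree in Ruby. It is a sketch of the classic zig-zig/zig-zag rotations, not the speaker's implementation, and the class and method names are hypothetical. Every insert or lookup rotates the touched key to the root, so frequently accessed keys stay near the top.

```ruby
# Minimal splay tree sketch: each access splays the touched key to the root.
class SplayTree
  Node = Struct.new(:key, :value, :left, :right)

  def initialize
    @root = nil
  end

  # Key currently at the root (the most recently accessed or inserted key).
  def root_key
    @root&.key
  end

  def insert(key, value)
    if @root.nil?
      @root = Node.new(key, value, nil, nil)
      return self
    end
    @root = splay(@root, key)
    if key == @root.key
      @root.value = value
    elsif key < @root.key # new node becomes root; old root goes to its right
      node = Node.new(key, value, @root.left, @root)
      @root.left = nil
      @root = node
    else                  # new node becomes root; old root goes to its left
      node = Node.new(key, value, @root, @root.right)
      @root.right = nil
      @root = node
    end
    self
  end

  # Lookup; the search path is splayed, so a hit ends up at the root.
  def [](key)
    return nil if @root.nil?
    @root = splay(@root, key)
    @root.key == key ? @root.value : nil
  end

  private

  def rotate_right(x)
    y = x.left
    x.left = y.right
    y.right = x
    y
  end

  def rotate_left(x)
    y = x.right
    x.right = y.left
    y.left = x
    y
  end

  # Bring key (or the last node on its search path) to the top of this subtree.
  def splay(node, key)
    return node if node.nil? || node.key == key
    if key < node.key
      return node if node.left.nil?
      if key < node.left.key    # zig-zig
        node.left.left = splay(node.left.left, key)
        node = rotate_right(node)
      elsif key > node.left.key # zig-zag
        node.left.right = splay(node.left.right, key)
        node.left = rotate_left(node.left) unless node.left.right.nil?
      end
      node.left.nil? ? node : rotate_right(node)
    else
      return node if node.right.nil?
      if key > node.right.key   # zag-zag
        node.right.right = splay(node.right.right, key)
        node = rotate_left(node)
      elsif key < node.right.key # zag-zig
        node.right.left = splay(node.right.left, key)
        node.right = rotate_right(node.right) unless node.right.left.nil?
      end
      node.right.nil? ? node : rotate_left(node)
    end
  end
end
```

The tree does no explicit balancing. The payoff comes from skewed access patterns, where hot keys are found after only a few steps from the root.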

Trie

(pronounced "try")
  • O(1) (order 1) on lookup, add, removal with respect to the number of stored keys; the cost grows only with key length
  • Ordered traversals
  • Prefix matching
  • Excellent memory management.

Useful as an autocompleter.

Interestingly, he implemented this as a Rack filter.
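A trie used as an autocompleter can be sketched in a few lines of Ruby. Child nodes are kept in hashes keyed by character, and the class and method names are hypothetical. Finding the prefix takes one step per character. Traversing the children in sorted order gives the ordered completions.

```ruby
# Minimal trie sketch for prefix lookups / autocomplete.
class Trie
  Node = Struct.new(:children, :terminal)

  def initialize
    @root = Node.new({}, false)
  end

  def add(word)
    node = word.each_char.reduce(@root) { |n, ch| n.children[ch] ||= Node.new({}, false) }
    node.terminal = true # marks the end of a stored word
    self
  end

  def include?(word)
    node = find(word)
    !node.nil? && node.terminal
  end

  # Autocomplete: every stored word that starts with prefix, in sorted order.
  def complete(prefix)
    node = find(prefix)
    return [] if node.nil?
    results = []
    collect(node, prefix, results)
    results
  end

  private

  # Walk one node per character; nil if the path doesn't exist.
  def find(str)
    node = @root
    str.each_char do |ch|
      node = node.children[ch]
      return nil if node.nil?
    end
    node
  end

  def collect(node, prefix, results)
    results << prefix if node.terminal
    node.children.keys.sort.each { |ch| collect(node.children[ch], prefix + ch, results) }
  end
end
```

Usage: after adding "car", "card", "care", "cart", "cat" and "dog", `complete("car")` returns the four "car" words in alphabetical order.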

Scenes from LA Ruby Conference - 2010 -Teaching Ruby to Kids

Take Away

It's great to see Sarah evangelizing software training to kids. I've thought about it for a long time and her presentation will spur me to do it. Very good hints on how to do it so that you and your students both thoroughly enjoy it and become better practitioners.

Introduction

Sarah Mei - Teaching Ruby to Kids

Teaching is her hobby.

Why?

Most programming instructors = FAIL
Teacher needs to be a coder.
Programming is becoming part of basic literacy.
Why should you teach?
  • Rewarding
  • Teaching leads to learning by the teacher.
  • Teaching is not rocket science.
Agile teaching
  • set goals
  • form a plan but expect to adapt
  • keep iterations short
Set goals
  • specific
  • immediate
  • measurable
Form a plan
  • What do I start with?
  • Keep your goals in mind.
  • Software Teaching Tools:
    • Shoes
    • Hackety Hack
    • Small Ruby
Theme
  • Kids love anything visual
  • Anything interactive
  • irb: compelling for kids (maybe)
Plan
  • Install all the tools you might use on all the computers the kids have access to.
  • start small
  • Use the internet.
Short iterations
  • Your "lesson plan" should be a series of very small steps.
  • 15 minutes or less
Listen to the customer
  • Follow tangents!
  • don't stick to a plan because it's the plan.
  • Don't worry about "finishing"
  • Look for teachable moments.
  • Look for signs they've turned off

"Ruby: the programming language for extroverts"

Deploy continuously
  • Do it often, practice
  • Teaching is a learned skill.
  • Take all opportunities you can to teach.
  • talks at your local meetup
  • pair programming
  • summer camps, etc, need volunteers
  • National Lab Day
  • In SF, I always need teachers for intro workshops.
Expect some things you try to fall flat.
  • Some students won't engage
  • Keep at it.

Summary:
  • You should teach
  • You can teach
  • Agile is for more than just development.
  • Practice.
Ruby is a great first language.

For really young kids:
  • Scratch
  • Kodu (Microsoft)
  • ISTE has curriculum for elementary school.
  • cs-unplugged (a web site)

web-kit and javascript.

These are drag-and-drop environments.

Scenes from LA Ruby Conference - 2010 - Web App Performance Monitoring

Take Away

Bjorn showed New Relic in action with how it monitors web sites and identifies issues early and clearly. Unfortunately my blog entry here fails to capture much of it as it was mostly demo. Worth looking into.

Introduction

Bjorn Freeman-Benson - New Relic

Building the First Successful Human-Powered Airplane

Paul MacCready
1977 - Gossamer Condor

Why did Paul succeed?

How to make the lightest possible airplane as quickly as possible? Ended up crashing a lot.

His team could repair a crash in less than 12 hours. Other teams took 6 months to repair a crash.

MacCready's team could iterate faster.

Applied to Software Development

Presenter wants to be that agile.

How he uses New Relic to do that.

larubyconf
https://rpm.newrelic.com
30 days of RPM Gold for free.

Monday, February 22, 2010

Scenes from LA Ruby Conference - 2010 -Civic Hacking

Take Away

Luigi showed how government has lots of useful data but few tools to make sense of the data. Here's where software developers can make a contribution: to build free tools to make this data more accessible. He talked about the various opportunities and why accomplishing them would make a real difference.

Introduction

Luigi Montanez
luigi@sunlightfoundation.com
@LuigiMontanez

Purpose: get government to open up its data and provide software tools to comprehend it.

Over a thousand people in their effort.

16 paid staff.

"D.C. is Hollywood for ugly people."

Guiding principles:
  • Electoral Politics no
  • Governance yes
  • Open source
Civic Side Projects

Purposes/Challenges
  • Challenging entrenched bureaucracies
  • Open source + Open data = better Government
  • Government opens data; developers write apps around it.
  • Government as a wholesaler, not retailer.
Sunlight Labs API
  • Bio and contact info for elected officials.
OpenSecrets.org
  • Contributions
    • Example: how much health insurance money has been spent politically and how.
GovTrack.us - Bills and Vote Records

MAPLight.org - Vote Influence
  • How representatives voted, correlated w/ donations.
Code for America
  • Will choose 5 cities
  • 5 developers will be supplied to each of those cities.
  • Modeled after Teach for America program.
Getting Involved.
  • groups.google.com/group/sunlightlabs
  • #transparency on Freenode
  • github (didn't get the actual repository)
Benefits:
  • Enhance your skillset
  • Low risk, high reward
  • Another testing framework? Really?
  • Local/state govts. an untapped market
  • Solve a hard problem.
TED Talk
  • David Cameron's TED talk, "The next age of government"



Scenes from LA Ruby Conference - 2010 - Garbage Collection and the Ruby Heap

Take Away

The traditional Ruby engine is paranoid about memory management because it has to run in so many disparate environments. This hurts garbage collection performance. If you know or can define where your Ruby installation will run, you can make optimizations that greatly speed up garbage collection. Indeed, this is one of the benefits of Ruby Enterprise Edition, and Ruby 1.9 accomplishes a subset of the optimizations discussed here.

Note: this session went extremely fast and I was not able to collect the notes as I wanted.

Introduction

Joe Damato and Aman Gupta
@joedamato @tmm1

Garbage Collection and the Ruby Heap
  • Why GC
  • Ruby is simple and elegant
  • GC makes life easier.
  • No more memory management
    • Memory management
    • memory leaks
MRI
  • Objects always allocated on the heap
  • Fixed-size slots
  • sizeof(struct RVALUE) = 40 bytes (on 64-bit builds)

See their site for how to optimize the GC.

Ruby memory leaks:
  • These are reference leaks
memprof - replacement for gdb.rb and bleak_house

Scenes from LA Ruby Conference - 2010 - Threads and Processes

Take Away

Good discussion of the different facilities available to Ruby and the underlying operating system and their tradeoffs.

Introduction

Aman Gupta - Joe Damato - Threads

http://timetobleed.com Joe's blog.

Fundamentals

What is a thread?

A thread is just a set of execution state.

Models
  • Green threads
  • Native
  • Hybrid
Green 1:N
  • Lightweight
  • Kernel doesn't know they exist
  • Implementation is in userland.
Pros
  • Create lots cheaply
  • Switch between them cheaply.
  • Schedule them however you want.
Cons
  • The main one: they all run inside a single Ruby process on one kernel thread, so they can't use more than one CPU at a time.
Native Threads 1:1
  • Kernel knows they exist
  • Some user land code.
Pros
  • Take advantage of SMP
    • Shared memory
    • Blocking in one thread doesn't block the others
Cons
  • didn't get
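On MRI 1.9 and later, `Thread` maps to a native thread, although a global lock still serializes Ruby code. A thread blocked on a queue or on I/O does not stop the others. The sketch below uses the standard `Queue` class, and its event strings are illustrative only.

```ruby
queue = Queue.new
events = []
lock = Mutex.new

consumer = Thread.new do
  item = queue.pop # blocks only this thread until something is pushed
  lock.synchronize { events << "consumer got #{item}" }
end

# The main thread keeps running while the consumer is blocked.
lock.synchronize { events << "main still running" }
queue << :job
consumer.join
```

The order of events is fixed. The consumer can record its event only after the main thread has pushed the job, and by then the main thread has already logged that it kept running.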
Hybrid Threads (M:N)

Pros
  • Take advantage of SMP
  • Cheap setup and teardown
Cons
  • Need 2 schedulers
Erlang uses hybrid threads (Ruby 1.9 uses native threads with a global interpreter lock)

Preemptive Multitasking
  • The operating system switches between processes regardless of what state they are in.
Cooperative Multitasking
  • thread gives up voluntarily.
Fibers
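Ruby 1.9 added `Fiber`, which is cooperative multitasking in the sense described above. A fiber runs until it calls `Fiber.yield`, and the caller decides when to `resume` it. The minimal producer/consumer sketch below shows this, and its log strings are illustrative only.

```ruby
log = []

producer = Fiber.new do
  3.times do |i|
    log << "produce #{i}"
    Fiber.yield i # voluntarily hand control back to whoever called resume
  end
  :done # value returned by the final resume
end

values = []
loop do
  v = producer.resume
  break if v == :done
  log << "consume #{v}"
  values << v
end
```

Nothing preempts the fiber. The strict produce/consume alternation in `log` happens only because each side gives up control explicitly.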

Tools
  • strace
  • google-perftools
lsof - "list of open files" - a utility. Can also be used to get a list of open sockets.

strace

trace system calls and signals.

Ruby: SIGVTALRM used

github.com/ice799/matzruby
  • heap_stacks branch
  • heap_stacks_186 branch
github.com/tmm1/ruby187
  • fibers branch

Scenes from LA Ruby Conference - 2010 - The Next 10 Years

Take Away

A warning that the rate of change in underlying software development paradigms will require new mental approaches to the large software challenges lying ahead of us. The "Algol-based" languages used by the vast majority of developers will yield to more scalable functional languages.

In the meantime, continue to grow your skills and constantly learn how to take advantage of tools to get more bang for the time you spend designing and coding.

Introduction

A new look at software development - What will the next 10 years bring?

Dave Astels
Twitter: @dastels
dastels@engineyard.com

Observations:
  • You're doing it completely wrong.
  • Software is hard
  • Software construction is the most complex endeavor ever undertaken by mankind.
  • The only software that's worth making is software that does something new.
  • It's only getting (more)...
    • more complex
    • bigger
    • distributed
    • parallel
    • life critical
30 years of software

think about...
  • popular languages
  • flavors, blends, derivatives
  • Fortran->Algol 54/58
  • Lisp 58
  • Ruby described as new-age lisp.
  • Smalltalk from Lisp and Simula
Some outliers
  • Prolog --> Erlang
  • ML --> Haskell
Time for a change?

Let the computer do more work.
Declarative Languages
  • What you want to do instead of how.
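Ruby isn't a declarative language, but its Enumerable methods give a feel for "what, not how". The sketch below uses hypothetical order data and writes the same filter twice, first spelled out imperatively and then declaratively.

```ruby
orders = [
  { customer: "ann", total: 30 },
  { customer: "bob", total: 5 },
  { customer: "ann", total: 12 }
]

# Imperative: spell out *how* (loop, accumulator, branch).
big_totals_imperative = []
orders.each do |o|
  if o[:total] > 10
    big_totals_imperative << o[:total]
  end
end

# Declarative style: state *what* you want; Enumerable handles the iteration.
big_totals_declarative = orders.select { |o| o[:total] > 10 }.map { |o| o[:total] }
```

Both versions produce [30, 12]. The second leaves the iteration strategy to the library, which is the kind of freedom the speaker argues a computer could exploit, for example to parallelize.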
Functional Languages
Strong type systems
  • the language Agda
Tools that cooperate
  • real-time analysis of our code.
Do more for the developer
  • Giving some insight into the code you're writing.
  • Static analysis
Generated test suites
  • Identify boundary cases.
Runtime analysis

Will current ideas continue to serve us?

Speaker says NO!

evolution/revolution

new way of programming

We need parallel strategies
  • problem decomposition
  • data structure design
    • distributed
  • algorithmic organization
We need:
  • better languages
  • better tools
    • tools that help us.
Google's "Go" language
  • everything is parallelized.
parallel and distributed baked in
  • that actively prevent bugs
Closing quotes (Guy Steele):
  • The bag of programming tricks that has served us for 50 years is the wrong way to think going forward and must be thrown out.
  • The great tricks of sequential programming don't work.
Summary
  • It's a parallel world of parallel problems.
  • Have strategies that assume imperfection. How do we write code that way?

Scenes from LA Ruby Conference - 2010 - Mobile Ruby

Take Away

Programming for mobile devices is a lot more than adjusting to the smaller screen; the devices themselves offer additional opportunities: GPS, camera, motion sensors, etc. However, they also present significant challenges: comparatively primitive development environments and operating systems. These are discussed along with how HTML 5 and Rhomobile are working towards a single programming API across mobile platforms.

Introduction

This was presented by Sarah Allen of Blazing Cloud (gotta retrieve the presenter's name)
Rhomobile framework called "Rhodes"

Mobile app development sucks:
  • Awkward
  • In some ways is archaic
    • Old languages
Brand Transcends Platform

My brand instead of cell phone brand. (Do I agree w/ this?).

Mobile gives you more than desktop:
  • Geolocation
  • Camera
  • Connected
  • Everyone you know is connected
Means you have different opportunities than desktop.

Write Once, Run Anywhere

How to get code onto the device.
  • Rhodes is similar to Rails.
  • Views are HTML.
  • It all works within the device. So a kind of HTML processor is inside the device.
It's also analogous to Rails:
  • Controller -> RhoController
  • Model -> Rhom
  • View -> ERB files
Javascript alternatives:
  • Titanium

Scenes from LA Ruby Conference - 2010 - The Big Rewrite

Take Away

The primary danger in the major rewrite is proceeding before the business has bought into it; this is usually fatal (or you end up wishing it was fatal). Along with the important technical skills needed, this talk identifies how to know when the business is behind it, and how to help to navigate the business to support the rewrite. (Or, to determine that it's not a good idea to do the rewrite at this time.)

The Big Rewrite, Doing it Right

Rich Kilmer

btw, he used a presentation software package called Prezi, which was very effective with swirling text while zooming.

Drivers for a Rewrite

  • Must be business driven
  • Must NOT be technology driven
  • Don't call it a rewrite
  • Complete in a major release cycle
Why? Costs money and resources. Business has to see value.

Preparing for a rewrite

Drop a major release before you start.
  • One the customer is really happy with.
  • Understand your domain
  • Or have a domain expert available all the time.
  • Break down the current system into logical sets of functionality. (Rich later showed the resulting code which was incredibly clean; this made a (hopefully) lasting impression on me.)
  • Choose the right technology for what you want to do. Examples:

    • Develop a standard worker framework (minion, resque)
    • Dedicate resources to repeatable data migration
    • Keep services code consistent, models clean
    • Use the right tool for each job

Flip It

  • Perform incremental migrations of historic data
  • Prepare business users for potential disruptions
  • Run flip scenarios several times
  • Enable "read only" system during final flip (if needed)
  • Provide a way to fall back if the flip fails

Miscellaneous

  • Don't code for assumptions.
  • If you find that you want to use the same name for two different classes, you may have two different domains which might need different applications.
  • Design for the expectation that separate systems will probably not back up synchronously, so be prepared to reconcile disparities.