Welcome to null-terminated Casey. For an introduction, click here

haiku project: Ranked Index

The startup scene up in the northern wastes of Seattle is small but very vibrant. One of the cool little things we have to amuse ourselves is Marcelo Calbucci's Seattle Startup Index (that link is for March's).

Everybody knows it's meaningless (few would actually value icanhascheezburger over Farecast), but it's fun to watch the startups move up and down, and his idea of averaging Alexa and Compete is a good one.

Meanwhile a week ago on Hacker News, there was some Alexa graphing of YC startups going on, and they couldn't figure out what to round out the top five of Reddit, Scribd, Justin.tv and Weebly with (I have a biased opinion.)

So I thought I would generalize the list, and I've done so at Ranked Index.

Here's my best shot at a ranked list of publicly announced Y Combinator startups that still resolve.

There are a few other lists on the front page, and everything's still pretty rough and slow, so send me suggestions and subscribe to the blog to hear about updates (it'll be in my FeedSalon too of course.)

Trying out Disqus

The jury-rigged comment system I've had on Casey0 and null-terminated is now four years old and was never very good.

So I'm giving the comment system from fellow YCers Disqus a shot; it's not the exact use case they've designed for, but the JavaScript was simple enough that I think I've successfully shoehorned my needs into their functionality. Or vice versa, I'm not sure. Let me know what you think.

AWSRecord: Adding SimpleDB to S3Record for scalable queries

It's been a while since I had a technical blog post, so today I thought I'd show some code I wrote last weekend to experiment with Amazon's SimpleDB.

Background

Last summer, there was some discussion about using Amazon S3 for object persistence, rather than file storage. One well-written approach is laid out in this blog post. Unfortunately the nice code formatting and colors seem to have disappeared, so use the Wayback Machine's version.

I implemented something very similar to what's described, and that's what's serving as my database on several side projects, most notably FeedSalon.

It's a great way to not have to worry about a database if you're running out of EC2. Not only is all your data backed up automatically, but you can just add more EC2 instances and they all just work together (under my high-latency/eventual-consistency uses at least).

The Problem

One of the main downsides of S3Record is that in order to do any searching or sorting, one must iterate through every object, which is completely unacceptable.

However, Amazon recently launched a new service called SimpleDB that is designed to organize data using key-value pairs. So by extending S3Record to store certain fields in a SimpleDB "domain", we can then query those fields as desired.

Thus is born AWSRecord.

Implementation

I started with my version of S3Record, which is very similar to the one presented in the blog entry linked above, so take a glance there first.

All we're doing is picking a bucket, and for any given key, storing a YAML representation of the object associated with that key. I think the only significant interface change in my version is that the bucket name is pulled into its own function for ease of changing:

  def self.bucket
    return "#{ self.name.downcase }.caseybucket"
  end

So for example, if we were creating an S3-backed User class for FeedSalon, you could create something like this:

require 's3record'
class User < S3Record
  attr_accessor :name, :age, :zipcode
  def self.bucket
    return "user.s3record"
  end
end
and then start creating, reading, and updating users on a whole bunch of machines without much scaling effort at all.
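
If you don't want to dig through the linked post, here's a rough sketch of the idea using an in-memory Hash where S3 would be. To be clear, this is not my actual S3Record: FAKE_S3, SketchRecord, SketchUser and find are all made-up names for illustration, and the real thing talks to S3 instead of a Hash.

```ruby
require 'yaml'

# Hypothetical stand-in for S3: bucket name => { key => YAML string }.
FAKE_S3 = Hash.new { |buckets, name| buckets[name] = {} }

class SketchRecord
  attr_accessor :key

  # One bucket per class, named after the class.
  def self.bucket
    "#{name.downcase}.sketchbucket"
  end

  def initialize(attrs = {})
    attrs.each { |field, value| send("#{field}=", value) }
  end

  # Serialize the instance variables to YAML and store them under the key.
  def update
    fields = instance_variables.each_with_object({}) do |ivar, hash|
      hash[ivar.to_s.delete('@')] = instance_variable_get(ivar)
    end
    FAKE_S3[self.class.bucket][key] = YAML.dump(fields)
    nil
  end

  # Create is just an update of a key that isn't there yet.
  def create
    update
  end

  # Load the YAML stored under the key and rebuild an object from it.
  def self.find(key)
    yaml = FAKE_S3[bucket][key]
    yaml && new(YAML.load(yaml))
  end
end

class SketchUser < SketchRecord
  attr_accessor :name, :age, :zipcode
end

SketchUser.new(:key => 'caseymrm', :name => 'Casey Muller', :age => 27, :zipcode => 93023).create
found = SketchUser.find('caseymrm')
puts found.name
```

Since every machine reads and writes the same bucket, none of them needs any local state beyond the class definition.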

Let's start working on AWSRecord, which will extend this to be queryable using SimpleDB.

We'll pick a similarly named SimpleDB domain, which unlike an S3 bucket doesn't have to be globally unique.

require 'logger'   # for the Logger.new(nil) passed to the SimpleDB service below
require 's3record'
require 'aws_sdb'
class AwsRecord < S3Record
  def self.domain
    return "#{ self.name.downcase }.record"
  end
Let's leave the calculation of the queryable fields very flexible; we're not trying to build ActiveRecord here. We can do this by just providing a hash that the child classes can fill and calculate however they want.
  def query_attributes
    {}
  end
One of the issues with SimpleDB is the lack of libraries (it's still in beta), so I picked the most mature looking one (although I was tempted to support the nytimes instead). Unfortunately, it doesn't have quite the same interface as our S3 library, so we'll use a little singleton pattern that we can sub out later if another library looks better.
  @SDB_SERVICE = nil
  def self.sdb
    @SDB_SERVICE ||= AwsSdb::Service.new(Logger.new(nil), 'ACCESS_KEY', 'PRIVATE_KEY')
  end
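
Here's a small standalone sketch of why that lazy accessor makes the library easy to swap. FakeSdb and SketchStore are hypothetical names, and the writer method is my own addition for illustration; the only thing taken from the code above is the `||=` accessor.

```ruby
# Hypothetical stand-in exposing the same two calls this post uses on the real service.
class FakeSdb
  attr_reader :domains

  def initialize
    @domains = Hash.new { |hash, domain| hash[domain] = {} }
  end

  def put_attributes(domain, key, attributes)
    @domains[domain][key] = attributes
  end

  def delete_attributes(domain, key)
    @domains[domain].delete(key)
  end
end

class SketchStore
  class << self
    # Assigning a service here replaces the lazily built default.
    attr_writer :sdb

    # Built on first use, then reused on every later call.
    def sdb
      @sdb ||= FakeSdb.new
    end
  end
end

class SketchChild < SketchStore
end

same_object = SketchStore.sdb.equal?(SketchStore.sdb)
shared_with_child = SketchChild.sdb.equal?(SketchStore.sdb)
puts same_object        # true
puts shared_with_child  # false
```

One thing to notice: the variable is a class-level instance variable, so each subclass lazily builds its own service object rather than sharing the parent's.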
The meaty part is that on update or delete, we keep the SimpleDB attributes in sync (create just uses the update method).
  def update
    super
    self.class.sdb.put_attributes(self.class.domain, @key, query_attributes)
  end
  def self.delete(key)
    super(key)
    self.sdb.delete_attributes(self.domain, key)
  end
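
To show the shape of that without any AWS calls, here's a sketch with two in-memory Hashes playing the S3 bucket and the SimpleDB domain. All of the names (OBJECTS, ATTRIBUTES, BaseSketch, IndexedSketch, PersonSketch) are invented for the example.

```ruby
# Hypothetical in-memory stand-ins for the S3 bucket and the SimpleDB domain.
OBJECTS = {}
ATTRIBUTES = {}

class BaseSketch
  attr_reader :key, :fields

  def initialize(key, fields)
    @key = key
    @fields = fields
  end

  # Plays the part of S3Record#update: store the whole object.
  def update
    OBJECTS[@key] = @fields.dup
    nil
  end

  def self.delete(key)
    OBJECTS.delete(key)
    nil
  end
end

class IndexedSketch < BaseSketch
  # Child classes decide which fields are queryable.
  def query_attributes
    {}
  end

  def update
    super                               # write the full object first
    ATTRIBUTES[@key] = query_attributes # then mirror the queryable fields
    nil
  end

  def self.delete(key)
    super(key)
    ATTRIBUTES.delete(key)
    nil
  end
end

class PersonSketch < IndexedSketch
  def query_attributes
    { :zipcode => fields[:zipcode] }
  end
end

PersonSketch.new('nephew', :name => 'Nephew', :zipcode => 93023).update
PersonSketch.new('casey', :name => 'Casey the Great', :zipcode => 98102).update
PersonSketch.delete('casey')
puts ATTRIBUTES.inspect
```

The two writes are separate calls, so if the second one fails the two stores can drift apart; re-running update on the record is the simplest repair.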
And finally the juicy part is that we can make queries which will return keys that can then be fetched.
  def self.query_keys(query, max = nil, token = nil)
    self.sdb.query(self.domain, query, max, token)
  end
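
The max and token arguments are there for paging through big result sets. Here's a sketch of a loop that keeps fetching until the token comes back empty. I'm assuming an empty (or nil) token means "no more pages", which matches the empty string in the irb output further down but isn't something I've checked against the library; fake_query_keys and collect_all_keys are made-up helpers, and the fake ignores the query string entirely.

```ruby
# Hypothetical data and query function standing in for User.query_keys.
# It returns [keys, next_token], with "" as the token on the last page.
ALL_KEYS = %w[casey caseymrm nephew test1 test2]

def fake_query_keys(_query, max = nil, token = nil)
  start = token.to_s.empty? ? 0 : token.to_i
  page = ALL_KEYS[start, max || ALL_KEYS.size] || []
  next_start = start + page.size
  [page, next_start < ALL_KEYS.size ? next_start.to_s : ""]
end

# Ask for pages of page_size keys until the token comes back empty.
# The block does the actual query so this works with any query function.
def collect_all_keys(query, page_size)
  keys = []
  token = nil
  loop do
    page, token = yield(query, page_size, token)
    keys.concat(page)
    break if token.nil? || token.empty?
  end
  keys
end

keys = collect_all_keys("['age' > '18']", 2) do |query, max, token|
  fake_query_keys(query, max, token)
end
puts keys.inspect
```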
Let's test it by changing our User class to inherit from AwsRecord and adding an extra method. We'll set it up to query all three fields directly.
  def query_attributes
    {
      :name => name,
      :age => age,
      :zipcode => zipcode
    }
  end
Try creating a few:
$ irb
> require 'user'
=> true
> User.new(:key => 'caseymrm', :name => "Casey Muller", :age => 27, :zipcode => 93023).create
=> nil
> User.new(:key => 'casey', :name => "Casey the Great", :age => 27, :zipcode => 98102).create
=> nil
> User.new(:key => 'nephew', :name => "Nephew", :age => 9, :zipcode => 93023).create
=> nil
> User.all
=> [#<User:0xb75cb1e4 @age=27, @key="casey", @created_at=Sat Mar 15 17:42:14 -0700 2008, @zipcode=98102, @name="Casey the Great">, #<User:0xb75c382c @age=27, @key="caseymrm", @created_at=Sat Mar 15 17:42:59 -0700 2008, @zipcode=93023, @name="Casey Muller">, #<User:0xb75b8314 @age=9, @key="nephew", @created_at=Sat Mar 15 17:44:58 -0700 2008, @zipcode=93023, @name="Nephew">]
Okay, how about a nice SimpleDB query?
> User.query_keys("['zipcode' = '93023']")
=> [["caseymrm", "nephew"], ""]
Looks like it works. Let's try something more complicated: say there's a mature FeedSalon section, and we want users over 18:
> User.query_keys("['age' > '18']")
=> [["casey", "caseymrm", "nephew"], ""]
Uh oh, why did the 9-year-old nephew come up? SimpleDB compares everything as strings, lexicographically, so since '9' sorts after the leading '1' of '18', that record was returned. The solution is to zero-pad numbers to a fixed width, so let's add a couple of helpers.
  def self.pad_num(number, max_digits = 10)
    "%%0%di" % max_digits % number.to_i
  end
  def self.query_keys(query, pad = true, max = nil, token = nil)
    query = query.gsub(/\d+/) {|n| self.pad_num(n)} if pad
    self.sdb.query(self.domain, query, max, token)
  end
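
If the string comparison thing isn't obvious, here's a tiny standalone check in plain Ruby, no SimpleDB involved. pad_num is the same helper as above, just pulled out to the top level so the snippet runs on its own.

```ruby
# Same helper as above, as a top-level method so this runs standalone.
def pad_num(number, max_digits = 10)
  "%%0%di" % max_digits % number.to_i
end

ages = [27, 9, 18]

# As strings, "9" sorts after "18" and "27" because '9' > '1' and '9' > '2'.
unpadded = ages.map(&:to_s).sort
# Zero-padded to a fixed width, string order matches numeric order.
padded = ages.map { |age| pad_num(age) }.sort

puts '9' > '18'                 # true
puts pad_num(9) > pad_num(18)   # false
puts unpadded.inspect           # ["18", "27", "9"]
puts padded.inspect             # ["0000000009", "0000000018", "0000000027"]
```

The width just has to be at least as long as the biggest number you'll ever store, and it has to be the same for the stored value and the query.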
And adjust our User:
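  def query_attributes
    {
      :name => name,
      :age => self.class.pad_num(age),
      # zipcode is stored unpadded, so pass pad = false to query_keys when
      # querying it; the default pad = true rewrites every number in the query
      :zipcode => zipcode
    }
  end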
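
It's worth seeing what the padded query_keys actually sends. This sketch pulls the gsub out into a hypothetical pad_query helper (not part of the class above) so you can look at the rewritten strings.

```ruby
# Same padding helper as in the class, at the top level for a standalone run.
def pad_num(number, max_digits = 10)
  "%%0%di" % max_digits % number.to_i
end

# Hypothetical helper: the same gsub that query_keys applies when pad is true.
def pad_query(query)
  query.gsub(/\d+/) { |digits| pad_num(digits) }
end

age_query = pad_query("['age' > '18']")
zip_query = pad_query("['zipcode' = '93023']")

puts age_query   # ['age' > '0000000018']
puts zip_query   # ['zipcode' = '0000093023']
```

The regex pads every run of digits, so it also rewrites values for fields that are stored unpadded (and any digits inside an attribute name). For those queries, either store the field padded too or call query_keys with pad set to false.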
We'll need to update to get the numbers padded in SimpleDB, then let's try the query again.
> User.all.each{|u| u.update}
=> [#<User:0xb74cf100 @age=27, @key="casey", @created_at=Sat Mar 15 17:42:14 -0700 2008, @zipcode=98102, @name="Casey the Great">, #<User:0xb74cc518 @age=27, @key="caseymrm", @created_at=Sat Mar 15 17:42:59 -0700 2008, @zipcode=93023, @name="Casey Muller">, #<User:0xb74c9930 @age=9, @key="nephew", @created_at=Sat Mar 15 17:44:58 -0700 2008, @zipcode=93023, @name="Nephew">]
> User.query_keys("['age' > '18']")
=> [["caseymrm", "casey"], ""]
So there you have it, S3Record with SimpleDB queries on selected fields.

Conclusions

It's not really written the Ruby on Rails way; I think for that you'd want to embed the SimpleDB information in the attributes directly. But once the child class is written, I find it very easy to work with the data in the actual application code.

Like I said, I whipped this up last weekend, but I'm not using it anywhere, because of a couple of issues.

Price

Wherever I use S3Record, it's a very read-heavy application, and I use a local cache on each machine. Make sure you read up on the S3 per-request charges as well as the SimpleDB machine-hour and indexing costs. This technique has a very predictable and scalable cost, but if your revenues don't increase linearly with activity (or are nonexistent), be careful.
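
For a feel of how a local cache cuts down on per-request charges, here's a minimal read-through cache sketch. It's not the cache I actually run; ReadThroughCache, the 60-second lifetime, and the fetch block are all assumptions made up for the example.

```ruby
# Hypothetical read-through cache: only goes to the remote store when the
# key is missing locally or the local copy is older than ttl_seconds.
class ReadThroughCache
  attr_reader :remote_reads

  def initialize(ttl_seconds, &fetcher)
    @ttl = ttl_seconds
    @fetcher = fetcher   # block that performs the real (billable) read
    @entries = {}
    @remote_reads = 0
  end

  # now is a parameter so the example can simulate time passing.
  def get(key, now = Time.now)
    entry = @entries[key]
    return entry[:value] if entry && now - entry[:at] < @ttl

    @remote_reads += 1
    value = @fetcher.call(key)
    @entries[key] = { :value => value, :at => now }
    value
  end
end

cache = ReadThroughCache.new(60) { |key| "record for #{key}" }
start = Time.at(0)
cache.get('casey', start)        # miss: goes to the remote store
cache.get('casey', start + 30)   # hit: served locally
cache.get('casey', start + 90)   # expired: goes to the remote store again
puts cache.remote_reads          # 2
```

The trade-off is staleness: each machine can serve a copy up to the lifetime old, which is fine for the eventual-consistency uses I described earlier and not fine for much else.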

Sorting

The biggest thing I wanted from SimpleDB was querying and sorting. It turns out you don't get sorting... explicitly.

Actually, people on the forums have found a hacky workaround. Check it out:

> TestConsumer.new(:key => 'test1', :age => 15).create
=> nil
> TestConsumer.new(:key => 'test2', :age => 25).create
=> nil
> TestConsumer.new(:key => 'test3', :age => 35).create
=> nil
> TestConsumer.query_keys("")
=> [["test2", "nephew", "test3", "test1", "caseymrm", "casey"], ""]
> TestConsumer.query_keys("['age' > '0']")
=> [["nephew", "test1", "test2", "caseymrm", "casey", "test3"], ""]
Apparently if you add an intersection to the end with a test of (field) > 0, the results reliably come back sorted. This is an undocumented feature though, and requires a lot of work if you want to also maintain a backwards index for reverse sorting, etc.
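
For the reverse sorting part, one way to build that backwards index (my guess at an approach, not something from the forums) is to store a second attribute holding the padded complement of the number, so that ascending string order on it is descending numeric order. LIMIT, pad_num and reverse_pad_num here are hypothetical, and this only works for non-negative integers no bigger than LIMIT.

```ruby
MAX_DIGITS = 10
LIMIT = 10**MAX_DIGITS - 1   # largest value that fits in MAX_DIGITS digits

# Zero-pad to a fixed width so string order matches numeric order.
def pad_num(number, max_digits = MAX_DIGITS)
  number.to_i.to_s.rjust(max_digits, '0')
end

# Pad the complement instead, so bigger numbers get smaller strings.
def reverse_pad_num(number)
  pad_num(LIMIT - number.to_i)
end

ages = { 'nephew' => 9, 'test1' => 15, 'casey' => 27, 'test3' => 35 }

# Sorting by the string attributes, the way a string-ordered store would.
ascending = ages.keys.sort_by { |key| pad_num(ages[key]) }
descending = ages.keys.sort_by { |key| reverse_pad_num(ages[key]) }

puts ascending.inspect    # ["nephew", "test1", "casey", "test3"]
puts descending.inspect   # ["test3", "casey", "test1", "nephew"]
```

That's an extra attribute to keep in sync for every field you want sorted both ways, which is the "lot of work" I mean.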

If anybody else is actually using a technique like this in production, I'd be very interested in hearing about it.

Home alone

Oh, and just to explain the ridiculously awesome photos I just posted, that's what happens when I'm home alone and wanting to cut my hair.

I think I show good range though, from the start to a mustache, I'm getting there, but I get top-heavy before becoming awesome and then being done. Wait, is that a head in the trash?

haiku project: Feedcomb

Okay, I'm in danger of not posting in February, so just like my previous haiku project, I've taken a little thing I wrote for myself, found a good domain name (alright, Jon came up with it), and spent about 10 minutes on a logo.

I give you Feedcomb. I realize the name is kind of ridiculously similar to FeedSalon, maybe I should've gone with one of the runners up.

I built Feedcomb because I was tired of seeing links to the same websites as they spread through the various social news sites. So now I can browse reddit, click a few things, and if I decide to take a dip into boing boing, the links I've already clicked on start out collapsed.

It's very primitive, still plenty to do, but at least it's out there, and I most humbly welcome comments and criticisms.

