Clever Caching

Besides our talk, my most valuable experience at RailsConf was talking with Tobi about the new caching strategy (‘Tobi caching’) he’s using at Shopify. There are a few parts which all work together nicely.

ETags Matter

As Joe explained, using good ETags can substantially reduce your bandwidth bill. In his case it was a 70% reduction. The takeaway is that you need to think about how you’re going to generate opaque cache coherency values for your actions. For a good intro to HTTP conditional GETs, go read this tutorial by Charles.

Expiry is a Pain

Anyone who’s had to write sweepers for an application with heavy caching knows how frustrating it can be. After all, cache invalidation is one of the two hard things in computer science. If you could somehow avoid expiring all the ‘stuff’ you’re caching, your life would be much, much easier.

Memcache is Smart

Memcache and the Memcache client libraries have plenty of smarts built into them, despite being ‘dumb by design’. The client libraries use clever hashing to know which server to talk to, which lets you run a cluster of caches without worrying too much about which keys live on which server.
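To make the hashing idea concrete, here’s a minimal sketch of how a client might map a key to a server. The server names and the server_for helper are made up for illustration, and real client libraries tend to use more careful schemes (consistent hashing, for instance) so that adding a server doesn’t reshuffle every key.

```ruby
require 'zlib'

# Hypothetical server list; a real client reads this from its configuration.
SERVERS = ['cache1:11211', 'cache2:11211', 'cache3:11211'].freeze

# Pick a server by hashing the key and taking the remainder.
# The same key always lands on the same server, so every app
# process agrees on where a value lives without any coordination.
def server_for(key, servers = SERVERS)
  servers[Zlib.crc32(key) % servers.length]
end

puts server_for('www.koziarski.net:clever-caching:42')
```

The point is only that the choice of server is a pure function of the key, so the application never has to track it.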

The server also has its own smarts about what keys are important. When it needs the memory, memcached will drop the least recently used values, thereby ensuring that your unused keys won’t be ‘wasting space’.
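If you haven’t seen least-recently-used eviction before, this toy cache shows the behaviour. TinyLRU is a hypothetical class written for this post, not memcached’s implementation, which manages memory in a more involved way. It relies on Ruby hashes keeping insertion order (Ruby 1.9 and later).

```ruby
# A toy LRU cache: the first key in the hash is always the
# least recently used one, because reads and writes re-insert.
class TinyLRU
  def initialize(capacity)
    @capacity = capacity
    @store = {}
  end

  def get(key)
    return nil unless @store.key?(key)
    value = @store.delete(key)   # remove and re-insert to mark as recent
    @store[key] = value
    value
  end

  def put(key, value)
    @store.delete(key)
    @store[key] = value
    # Over capacity: drop the least recently used entry.
    @store.delete(@store.keys.first) if @store.size > @capacity
  end

  def keys
    @store.keys
  end
end

cache = TinyLRU.new(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')      # 'a' is now the most recently used
cache.put('c', 3)   # over capacity, so 'b' is evicted
```

Keys nobody asks for drift to the front and fall out on their own, which is exactly the property the rest of this post leans on.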

Mix It All Together

So with that in mind, what can we do to improve our application’s performance and simplify our application?

Forget About Expiry

As mentioned before, expiry is a complete pain in the ass. So let’s not do it. The key to getting away with this is to pick a key which completely encapsulates the resource you’re caching, and also ensures that if anything relevant changes, the key changes. Take the case of this blog post: a simple key would be the permalink. However, if we used that, we’d need to expire the cache every time someone commented or I corrected a typo.

The no-expiry alternative would be for Mephisto to keep a ‘version number’ associated with each post and increment it every time someone commented or the post body changed. Once it was doing that, we could construct a key that looked like www.koziarski.net:clever-caching:#{version_number}. Every time the version number changed, we’d get a cache miss and regenerate the content, but subsequent requests would be served out of memcache. The entries stored under old version numbers are never requested again, so memcached’s least-recently-used eviction eventually drops them for us. No more expiry!
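Here’s a rough sketch of the versioned key in plain Ruby. Post, cache_key, fetch and the in-memory STORE hash are all stand-ins I’ve invented for illustration; in the real application the version would live on the model and the store would be memcache.

```ruby
# Hypothetical stand-in for a post record with a version counter.
Post = Struct.new(:permalink, :body, :version) do
  # Any relevant change bumps the version, which changes the cache key.
  def update_body(new_body)
    self.body = new_body
    self.version += 1
  end
end

def cache_key(host, post)
  [host, post.permalink, post.version].join(':')
end

STORE = {} # in-memory stand-in for memcache

# Return the cached value, or run the block and cache its result.
def fetch(key)
  STORE[key] ||= yield
end

post = Post.new('clever-caching', 'first draft', 1)
fetch(cache_key('www.koziarski.net', post)) { "rendered: #{post.body}" }

post.update_body('typo fixed')
# New version means a new key, so this is a miss and we re-render.
fetch(cache_key('www.koziarski.net', post)) { "rendered: #{post.body}" }
```

Notice that nothing ever deletes the entry for version 1. It just stops being asked for.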

Now that we’ve saved all that CPU time, we should see if there’s a way we can save some bandwidth too.

Embrace ETags

Thankfully, our cache key has all the properties of an ETag: whenever something important changes, our cache key does. So let’s use that as the basis of our ETag by taking the MD5 hash of the key. The only reason I don’t advocate using the cache key itself is that you may want to include sensitive data in the key. Now we can just chuck d444415a8228fbed44cfa7ef39f15d8b into the ETag header, and compare that hash with the value of ‘If-None-Match’ from the request headers.
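Stripped of the Rails bits, the ETag check is tiny. This is a sketch only: etag_for and not_modified? are names I’ve made up, and the key is an example value.

```ruby
require 'digest/md5'

# Derive an opaque ETag from the cache key, so any sensitive
# parts of the key never reach the client.
def etag_for(key)
  Digest::MD5.hexdigest(key)
end

# True when the client already holds the current version,
# in which case we can answer 304 Not Modified with no body.
def not_modified?(if_none_match, key)
  if_none_match == etag_for(key)
end

key = 'www.koziarski.net:clever-caching:42'
puts etag_for(key)
puts not_modified?(etag_for(key), key)
puts not_modified?(nil, key)
```

Strictly speaking, HTTP entity tags are quoted strings, so a more careful version would wrap the digest in double quotes both when setting the header and when comparing.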

Conclusion

By doing this you get the bandwidth savings of HTTP caching and the performance boost of action caching, without the difficult expiry code. You can avoid all the NFS-related headaches of page caching but still get most of the performance boost.

While the approach won’t suit every project, it could well suit yours. Finally, a snippet of sorts for those of you who think in code:

require 'digest/md5'

around_filter :cache_sensibly, :only => :show

def cache_sensibly
  # compose the key using something we know matches our business
  cache_response(request.host, request.request_uri, @blog.version, @post.version) { yield }
end

private

def cache_response(*keys)
  key = keys * ':'
  # use the hash as an etag so we can cache on
  # private data
  etag = Digest::MD5.hexdigest(key)

  # first handle HTTP, lets us avoid a memcache hit
  # and saves a huge amount of bandwidth to the client
  if request.env["HTTP_IF_NONE_MATCH"] == etag
    headers["X-Cache"] = "HTTP"
    head :not_modified
    return
  end
  response.headers["ETag"] = etag

  # Next check memcache
  if data = Cache.get(key)
    # render from the cached values
    headers["Content-Type"] = data[:content_type]
    headers["X-Cache"] = "HIT"
    render :text => data[:content], :status => data[:status]
  else
    # Finally, yield, indicate we've missed then cache the response
    headers["X-Cache"] = "MISS"
    yield
    Cache.put(key, {
      :content      => response.body,
      :status       => headers["Status"].to_i,
      :content_type => (response.content_type || "text/html")
    })
  end
end

Posted on May 28th, 2007 | 12 comments

Tomasz Gorski May 28th, 2007 @ 11:37 PM

Thanks for the very interesting article. BTW, I really enjoyed reading all of your posts. It’s interesting to read ideas and observations from someone else’s point of view… makes you think more. So please keep up the great work. Greetings

Mark Rowe May 29th, 2007 @ 12:30 PM

It’s worth noting that sending only an ETag header will lead to the file being uncacheable in Safari and other HTTP clients that rely on Mac OS X’s URL loading functionality. It will send an If-Modified-Since header to check the freshness of a resource that had a Last-Modified header when last retrieved, but at present it will never send an If-None-Match if the resource had an ETag.

Koz May 29th, 2007 @ 01:07 PM

@Mark: Yeah, I’m considering updating my cache_response method to take a Time option too, but for now the memcache caching takes care of the performance impact from Safari and similar clients.

Lourens Naude May 30th, 2007 @ 04:43 PM

Nice language.

Would a global AR::B#version as per http://pastie.caboo.se/65956 be considered good practice for versioning models?

Rich Collins May 31st, 2007 @ 09:40 AM

I don’t understand how you aren’t doing cache expiry. Keeping track of the versions is cache expiry. It just turns out that cache expiry is not hard for this case.

The hard case is when you can’t reliably predict what the content will contain (user recommendations for instance).

Koz May 31st, 2007 @ 11:26 AM

@Rich: Incrementing one integer is way easier than figuring out which fragments and pages to expire.

It’s expiry in the simplest sense of the word, but it’s missing all the painful and slow things.

Jason Watkins June 8th, 2007 @ 11:56 AM

I’d suggest taking a look at http://blog.craz8.com/ ’s action cache plugin. You could combine this with the nginx memcached module and X-Accel-Redirect, and not even stream the memcache’d hit through Rails. Using a similar pattern with file store, cache hits that fail the conditional GET still clear Rails in 3ms or so.

Koz June 8th, 2007 @ 12:58 PM

The ActionCache plugin still uses regular old action caching, which means expiry, and you can’t get per-user or per-whatever-matters-to-you caches. The nginx idea could be worth investigating, but even without it the response times from my actions are < 20ms, so it’s not a pain point right now.

Maarten Manders July 24th, 2007 @ 01:10 AM

Thanks for your inspiration. Your versioned cache solves another problem very elegantly when working with memcached or APC.

By appending the same version number (or just a random token) to many cache items at the same time, I can invalidate them all at one go by incrementing it. This makes up for memcached’s and APC’s lack of “tags” (groups) for cache items.

For example, I could invalidate all cached user profiles at once. With another version number, it would also be possible to invalidate all cached profiles of users from, say, the U.S.

campbell Anderson July 28th, 2007 @ 12:41 AM

I see Yahoo released a cool tool for analysis, YSlow, where they spent a lot of time researching things. This is a Firefox plugin which taps into Firebug. http://developer.yahoo.com/yslow/

meekish July 28th, 2007 @ 08:07 AM

I built a custom Rails calendar for my work to track our work orders. It works very well, but is quite slow as it has to render links for hundreds of work orders on each calendar page.

How could the “Forget About Expiry” concept be applied to a very low traffic site like ours that doesn’t need memcache?

CRAZ8 August 26th, 2007 @ 06:13 PM

@Koz: The ActionCache plugin extends the Rails action caching to allow you to specify whatever you want as the fragment cache key. This allows you to use session state, cookie values or phases of the moon to differentiate cache entries – typically this is used to key off the current_user in some way.

The plugin also allows you to bypass the expiry problem by setting a Time To Live for each request. Simple, but for lots of scenarios, a simple timeout works great.
