In this day of multi-core CPUs and parallel programming platforms, a common meme amongst developers has become “Is library X thread safe?”. As reasonable as that sounds, it’s actually not a particularly interesting question to ask, and anyone who claims to have a useful answer probably doesn’t know what’s going on. I say that because unless you say what you want to do in parallel, it’s meaningless to say whether or not a library is thread safe.
Take a popular definition of thread safety; this one’s from Wikipedia:
A piece of code is thread-safe if it functions correctly during simultaneous execution by multiple threads.
Using that definition, is libevent thread-safe? The answer turns out to be a little more complicated than a yes or no: it depends on what you’re trying to do in each of those threads.
What about Rails? Well, strictly speaking Rails is currently thread-safe. It functions correctly during simultaneous execution; we just have this huge mutex around the entire request, so we only handle one request at a time. What about Active Record? Well, yes… But it opens a connection per thread, which will quickly exhaust your database server’s resources. Both of these restrictions present problems for people wanting to use Rails in a highly threaded environment, so perhaps thread safety isn’t a particularly useful goal without further clarification.
It’s just not useful to say whether or not Rails (or any other library) is ‘thread safe’ without saying what you want to do in parallel. The code may function correctly but run slower than it would in a single thread.
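To make the “huge mutex” point concrete, here is a minimal sketch of a dispatcher that is perfectly thread-safe and yet never does two things at once. The `SerializedDispatcher` class and its counters are hypothetical, made up for illustration; this is not Rails’ actual dispatch code.

```ruby
# Hypothetical stand-in for a framework's dispatcher; not Rails' real code.
class SerializedDispatcher
  attr_reader :max_in_flight

  def initialize
    @lock = Mutex.new
    @in_flight = 0
    @max_in_flight = 0
  end

  # Every request takes the same lock, so at most one runs at a time.
  def handle(request)
    @lock.synchronize do
      @in_flight += 1
      @max_in_flight = [@max_in_flight, @in_flight].max
      sleep 0.01 # pretend to do some work
      @in_flight -= 1
      "handled #{request}"
    end
  end
end

dispatcher = SerializedDispatcher.new
threads = 4.times.map { |i| Thread.new { dispatcher.handle(i) } }
results = threads.map(&:value)
```

Every request gets a correct answer, but `max_in_flight` never rises above one, so four threads buy you nothing over a single one.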
With Rails there are several different concurrent use cases we could aim to support in our new thread-safety project:
- Dispatch requests in parallel with a single thread per request
- Allow rendering to be performed in parallel for a single request (several threads each rendering a separate partial for the same response)
- Make Active Record use a connection pool to prevent opening an excessive number of connections
- Allow a single Active Record object to be worked with concurrently in multiple threads.
Each of these goals will involve differing numbers of locks, and differing degrees of change to the current design. They’ll also improve (or degrade) performance by different factors for different benchmarks. Whatever the results of this process there will still be some locks around some methods, and those locks are bound to annoy someone.
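As a rough illustration of the connection pool goal, here is a sketch under stated assumptions: `TinyPool` and its string “connections” are hypothetical and are not Active Record’s API. The point is that ten threads share two connections, and the queue inside the pool is itself one of those locks that someone will eventually be annoyed by.

```ruby
# Hypothetical pool; names are illustrative, not Active Record's API.
class TinyPool
  def initialize(size)
    @available = Queue.new
    size.times { |i| @available << "connection-#{i}" }
  end

  # Check a connection out, yield it, and always hand it back.
  # Queue#pop blocks while the pool is empty, so callers wait
  # instead of opening yet another connection.
  def with_connection
    conn = @available.pop
    yield conn
  ensure
    @available << conn if conn
  end
end

pool = TinyPool.new(2)
used = Queue.new
threads = 10.times.map do
  Thread.new do
    pool.with_connection do |conn|
      used << conn
      sleep 0.005 # pretend to run a query
    end
  end
end
threads.each(&:join)

seen = []
seen << used.pop until used.empty?
distinct = seen.uniq
```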
Phrases like “W is completely threadsafe” or “X isn’t thread safe” should be a red flag to you. They’re a sign that the speaker doesn’t quite understand the subtlety of the question, and the irrelevance of the answer without further qualification.
One of the nice things about the last two and a half years has been working from home. You avoid 90% of the dilbert-style office drama, and the commute is hard to beat. But the nicest thing about working from home is that, strictly speaking, I can work from any home, not just our little place.
So from July the 4th till the end of the year Anika and I will be living in Paris. You know, this Paris:

We have an apartment in the Marais for most of our stay, and another in the Latin Quarter for the remainder. We intend to take the opportunity to travel around Europe and soak up the French culture.
So if you’re doing something interesting with Ruby, are based in Paris, and can handle a visit from a Kiwi speaking awful Franglais, drop me a line.
Today I had a few hours to spare and decided to try out Passenger. This blog is hardly a high traffic website, but it has some crazy RewriteRules that I figured would test the limits of the module.
Everything appears to be working perfectly, and it took less than 5 minutes to set everything up. I’m seriously impressed at how simple this was. I’ll confess to being skeptical at first, but so far the package lives up to the promises on its website.
If you notice anything broken let me know. Now I have to figure out how to kill the rest of my spare time.

On Friday the 14th of March 2008 Anika and I ‘tied the knot’ in a small ceremony at the Wellington registry office. Thanks to all our well wishers, especially for the dozens and dozens of Twitter messages I received.
PS – if I owe you an email or something, there’s my excuse!
If you’re a developer, and read other developers’ blogs, odds are you’ve heard of git. While some of the posts about git are almost absurdly exuberant, there’s no denying that it fixes some of the really hard problems with source control.
I’ve been using git-svn as a frontend for my rails development for around 4-5 months, and I can’t imagine going without fast local branches, smart merges and rebases. So naturally I’m interested in migrating rails from subversion to git, as are the rest of the core team. Unfortunately it’s not as simple as running git-svnimport and updating some webpages. We rely on several pieces of subversion which aren’t quite so easy to replace.
Our scripts work with Subversion
Our subversion servers do a lot of traffic. The rake rails:freeze:edge task contributes to this, but so do people using svn:externals for their rails application. The amount of traffic the servers do is an indication that people like to use this functionality and we shouldn’t just turn it off one day. The svn command line client is extremely portable and available on almost every system. The git tool set doesn’t have the equivalent of an ‘svn export’, which is what the freeze tasks and plugin installers rely on.
Trac works better with svn than git
The ability to automatically close tickets with a commit message is really useful, as is the automatic hyperlinking that we can use in ticket comments. There are hacks to add git support to trac, but it will take some time to evaluate them. This stuff currently ‘just works’ with subversion.
Interim solution
In the meantime, I’m publishing my git-svn powered repository locally, at github and gitorious. If you want to use git to work on your rails patches, feel free to grab me in #rails-contrib to talk about your branches, and the steps to get them merged.
Unfortunately due to the way git-svn works, when your patches get applied to rails they’ll be squash merged and then rebased out of existence by git-svn. So once your branch gets accepted, you’ll have to throw it away or suffer thousands of merge conflicts. To minimise your pain make sure you have one branch per feature and ensure all branches can exist independently.
While the limitations of git-svn mean we’re only testing a subset of the git workflow, it’s important that we figure out kinks in a git workflow for rails, so please help out.
Migration plan
So in order for us to migrate from subversion to git, we need:
- A reliable read-only svn mirror based on our git repository
- A git post-commit hook which allows us to close tickets
- Something similar to our trac browser + ticket integration.
The post-commit hook and browser aren’t particularly difficult. While gitweb is a little ugly, I’m sure we could make something work. The real challenge is the backwards compatible, read-only svn mirror. Evan had a post on this a while back, but that involves initializing a brand new repository and apparently is prone to breaking from time to time. Rick has a simple file copying solution which we may end up using.
If you have ideas or experience with any of these issues, and the motivation to help out, please let us know by emailing the rails-core list or grabbing me in IRC.
The great thing about working with open source is that if you come across a bug, you can fix it yourself. It’s the whole underpinning of the success of the model. Every developer scratches their own itch, and the cumulative result is a vibrant and useful piece of software driven by an enthusiastic community. If you’ve been fortunate enough to be the steward of a successful open source project you’ve probably seen this in action.
If you’ve been working on a really successful open source project, you’ve probably seen something else happen.
In the early stages of your project, the people who hang around are willing and able to help. When they find a bug or some missing documentation, they write a patch to remedy the situation. Even those who can’t fix a bug will put in a Herculean effort to help isolate it with a test case.
But after a while the ‘they-brigade’ shows up, and instead of offering to help they merely act indignant that ‘they’ (the open source project) haven’t done it already. Rather than spend 10 minutes writing a patch to improve the documentation of a project they’ll spend 20 minutes writing an outraged blog post demanding that ‘they’ fix the problem.
Most open source projects aren’t controlled by a secretive cabal of conspirators. If you notice something broken don’t assume that the maintainers are deliberately trying to fuck with you. It probably hasn’t hurt them yet, or they’ve chosen to spend their time on something else. Remember, you’re not paying for this, no-one’s violating their SLA. Instead of seething with rage, try to find a test case that isolates the bug you’ve found. Instead of flaming on a forum, try to sketch out the points that the improved documentation should cover.
You’ll find your own work much more fulfilling if you’ve taken the time to improve your tools. The fact that the wider community benefits is just a nice big cherry on top.
This started as a comment on Geoff’s post but seemed to justify knocking the dust off this thing.
Apparently there are some people who feel that it’s ‘impossible’ to get patches into rails and that the core team doesn’t communicate with its user base. As someone who spends a lot of my time helping others contribute to rails I find the ‘impression’ hard to reconcile with my own experience. We have an active mailing list and irc channel where discussion takes place almost every day.
In the 2.0 effort we’ve received patches from 177 different individuals ranging from minor typo fixes, through to entire new features. So if you’ve got some killer ideas to contribute, subscribe to the core list and start talking about them. 2.0 is almost here, but there’s plenty of scope for new or improved functionality in rails.next, and we’d love to hear from you.
Rails 2.0 has had several profiling-driven optimisations we found by benchmarking real applications rather than hello world apps. Named routes were a common source of slowness in big applications, so 2.0 has new code that makes them several times faster. Repeatedly parsing Dates and Times from the database also contributed to performance problems, so we have code to cache the results and for good measure we made the parsing faster too.
I’m not saying we’ve solved all the problems, or that rails is now perfect. No framework is! If you have an idea for improving performance, and a profiler report showing it makes a big difference, join the IRC channel and let’s talk about it. We’re always open to ideas.
Like any open source project rails depends on you, the community, for contributions. If you have something you feel like fixing, jump on in!
Besides our talk, my most valuable experience at RailsConf was talking with Tobi about the new caching strategy (‘Tobi caching’) he’s using at Shopify. There are a few parts which all work together nicely.
Etags Matter
As Joe explained, using good etags can substantially reduce your bandwidth bill. In his case it was a 70% reduction. The takeaway from this is that you need to think about how you’re going to generate opaque cache coherency values for your actions. For a good intro to HTTP conditional gets, go read this tutorial by Charles.
Expiry is a Pain
Anyone who’s had to write sweepers for an application with heavy caching knows how frustrating it can be. After all, cache invalidation is one of the two hard things in computer science. If you could somehow avoid expiring all the ‘stuff’ you’re caching, your life would be much much easier.
Memcache is Smart
Memcache and the Memcache client libraries have plenty of smarts built into them, despite being ‘dumb by design’. The client libraries use clever hashing to know which server to talk to. This lets you run a cluster of caches without worrying too much about which keys live on which server.
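For a feel of how a client can pick a server from the key alone, here is a deliberately naive sketch. The server names are hypothetical, and real client libraries typically use more elaborate schemes than a plain modulo, so treat this as the idea rather than any particular library’s algorithm.

```ruby
require "zlib"

# Hypothetical server list; a real client is configured with your own hosts.
SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"].freeze

# Pick a server from the key alone, so every client that shares the
# same server list agrees on where a given key lives.
def server_for(key, servers = SERVERS)
  servers[Zlib.crc32(key) % servers.size]
end

first_lookup  = server_for("www.koziarski.net:clever-caching:42")
second_lookup = server_for("www.koziarski.net:clever-caching:42")
```

The weakness of the plain modulo is that adding or removing a server remaps most keys, which is one reason real clients tend to do something smarter.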
The server also has its own smarts about which keys are important. When it needs the memory, memcached will drop the least recently used values, thereby ensuring that your unused keys won’t be ‘wasting space’.
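Least-recently-used eviction is easy to sketch in a few lines. The `TinyLRU` class below is a toy written for this post; memcached’s real eviction is more involved, but the behaviour that matters here is the same: touch a key and it survives, ignore it and it eventually falls out.

```ruby
# Toy LRU cache for illustration; not how memcached is implemented.
class TinyLRU
  def initialize(capacity)
    @capacity = capacity
    @store = {} # Ruby hashes keep insertion order, oldest first
  end

  def get(key)
    return nil unless @store.key?(key)
    value = @store.delete(key) # re-insert to mark as most recently used
    @store[key] = value
  end

  def put(key, value)
    @store.delete(key)
    @store[key] = value
    @store.shift while @store.size > @capacity # drop least recently used
    value
  end

  def keys
    @store.keys
  end
end

cache = TinyLRU.new(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")    # "a" is now more recently used than "b"
cache.put("c", 3) # over capacity, so "b" is evicted
```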
Mix it all together
So with that in mind, what can we do to improve our application’s performance, and simplify our application?
Forget about expiry
As mentioned before, expiry is a complete pain in the ass. So let’s not do it. The key to getting away with this is to pick a key which completely encapsulates the resource you’re caching, and also ensures that if anything relevant changes, the key changes. Take the case of this blog post. A simple key would be the permalink; however, if we used that, we’d need to expire the cache every time someone commented or I corrected a typo.
The no-expiry alternative would be for mephisto to keep a ‘version number’ associated with each post and increment it every time someone commented, or the post body changed. Once it was doing that, we could construct a key that looked like www.koziarski.net:clever-caching:#{version_number}. Every time the version number changed, we’d get a cache miss, and regenerate the content, but subsequent requests will be served out of memcache. No more expiry!
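Here is a small sketch of the versioned key idea, using a plain Hash as a stand-in for memcache. The `Post` struct, `cache_key` and `render_cached` are hypothetical names invented for the example; mephisto’s real models aren’t shown here.

```ruby
# Hypothetical post record; not mephisto's real model.
Post = Struct.new(:permalink, :version, :body)

CACHE   = {} # stand-in for memcache
RENDERS = [] # records each expensive render so the misses are visible

def cache_key(host, post)
  [host, post.permalink, post.version].join(":")
end

def render_cached(host, post)
  CACHE[cache_key(host, post)] ||= begin
    RENDERS << post.version
    "<h1>#{post.permalink}</h1>#{post.body}"
  end
end

post = Post.new("clever-caching", 1, "first draft")
render_cached("www.koziarski.net", post) # miss, renders
render_cached("www.koziarski.net", post) # hit, no render

post.body = "typo fixed"
post.version += 1 # any relevant change bumps the version
page = render_cached("www.koziarski.net", post) # new key, so a miss
```

Nothing ever deletes the version 1 entry. It simply stops being asked for, and the cache’s own eviction gets rid of it when the memory is needed.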
Now that we’ve saved all that CPU time, we should see if there’s a way we can save some bandwidth too.
Embrace Etags
Thankfully, our cache key has all the properties of an ETag: whenever something important changes, our cache key does. So let’s use that as the basis for building our ETag by taking the MD5 hash of the key. The only reason I don’t advocate using the cache key itself is that you may want to include sensitive data in the key. Now we can just chuck d444415a8228fbed44cfa7ef39f15d8b into the ETag header, and compare it with the value of ‘If-None-Match’ from the request headers.
Conclusion
By doing this you get the bandwidth savings of HTTP caching, the performance boost of action caching, but without the difficult expiry code. You can avoid all the NFS related headaches of page caching, but still get most of the performance boost.
While the approach won’t suit every project, it could well suit yours. Finally, a snippet of sorts for those of you who think in code:
    require 'digest/md5'

    around_filter :cache_sensibly, :only => :show

    def cache_sensibly
      # compose the key using something we know matches our business
      cache_response(request.host, request.request_uri, @blog.version, @post.version) { yield }
    end

    private

    def cache_response(*keys)
      key = keys * ':'
      # use the hash as an etag so we can cache on
      # private data
      etag = Digest::MD5.hexdigest(key)
      # first handle HTTP, lets us avoid a memcache hit
      # and saves a huge amount of bandwidth to the client
      if request.env["HTTP_IF_NONE_MATCH"] == etag
        headers["X-Cache"] = "HTTP"
        head :not_modified
        return
      end
      response.headers["ETag"] = etag
      # Next check memcache
      if data = Cache.get(key)
        # render from the cached values
        headers["Content-Type"] = data[:content_type]
        headers["X-Cache"] = "HIT"
        render :text => data[:content], :status => data[:status]
      else
        # Finally, yield, indicate we've missed then cache the response
        headers["X-Cache"] = "MISS"
        yield
        Cache.put(key, {:content => response.body, :status => headers["Status"].to_i, :content_type => (response.content_type || "text/html")})
      end
    end
My recent trip to Microsoft reintroduced me to a few goings on in “The Enterprise”, which is thankfully something of a dim memory for me. The one thing which always seemed like garbage to me is BPEL (and any other XML workflow tool). However there are a bunch of smart people who clearly think that there’s some merit to it. I’m fairly opinionated, it goes with the territory, but I figure I should ask “what is it I’m missing?”
For the readers who have yet to learn about BPEL, check out this Oracle tutorial.
One of the constructs BPEL gives you is the ability to specify conditionals:
    <switch>
      <case condition="bpws:getVariableData('FlightResponseAA','confirmationData','/confirmationData/Price') &lt;= bpws:getVariableData('FlightResponseDA','confirmationData','/confirmationData/Price')">
        <!-- Select American Airlines -->
        <assign>
          <copy>
            <from variable="FlightResponseAA"/>
            <to variable="TravelResponse"/>
          </copy>
        </assign>
      </case>
      <otherwise>
        <!-- Select Delta Airlines -->
        <assign>
          <copy>
            <from variable="FlightResponseDA"/>
            <to variable="TravelResponse"/>
          </copy>
        </assign>
      </otherwise>
    </switch>
Of course we already have that:
    if FlightResponseAA.confirmationData.Price <= FlightResponseDA.confirmationData.Price
      TravelResponse = FlightResponseAA
    else
      TravelResponse = FlightResponseDA
    end
It also lets you invoke processes:
    <flow>
      <sequence>
        <!-- Async invoke of the AA Web service and wait for the callback-->
        <invoke partnerLink="AmericanAirlines" portType="aln:FlightAvailabilityPT" operation="FlightAvailability" inputVariable="FlightDetails"/>
        <receive partnerLink="AmericanAirlines" portType="aln:FlightCallbackPT" operation="FlightTicketCallback" variable="FlightResponseAA"/>
      </sequence>
      <sequence>
        <!-- Async invoke of the DA Web service and wait for the callback-->
        <invoke partnerLink="DeltaAirlines" portType="aln:FlightAvailabilityPT" operation="FlightAvailability" inputVariable="FlightDetails"/>
        <receive partnerLink="DeltaAirlines" portType="aln:FlightCallbackPT" operation="FlightTicketCallback" variable="FlightResponseDA"/>
      </sequence>
    </flow>
But we already have that too:
    FlightResponseAA = AmericanAirlines.FlightAvailability()
    FlightResponseDA = DeltaAirlines.FlightAvailability()
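If what you care about is that the flow makes both calls at the same time, plain threads cover that as well. This is only a sketch: the two modules, the `flight_availability` method and the prices are stand-ins made up for the example, not real airline services.

```ruby
# Hypothetical stand-ins for the two airline services; prices are made up.
module AmericanAirlines
  def self.flight_availability
    sleep 0.01 # pretend to wait on the network
    { airline: "AA", price: 420 }
  end
end

module DeltaAirlines
  def self.flight_availability
    sleep 0.01
    { airline: "DA", price: 450 }
  end
end

# Start both calls, then wait for both answers.
threads = [AmericanAirlines, DeltaAirlines].map do |service|
  Thread.new { service.flight_availability }
end
flight_response_aa, flight_response_da = threads.map(&:value)

# The same conditional as before, picking the cheaper response.
travel_response =
  if flight_response_aa[:price] <= flight_response_da[:price]
    flight_response_aa
  else
    flight_response_da
  end
```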
So it seems to me that BPEL and other workflow tools simply take standard programming language constructs, wrap them up in XML and call it something other than programming. All that engineering effort could have been thrown into interesting research to advance the state of our industry. Instead we get this mess…
So, what have I missed? Are workflow systems some kind of amazing solution to a really hard problem, or a solution in search of a problem?
Last week I had a whirlwind trip to Seattle to attend the Microsoft Technology Summit, which was intended as a gathering of a small group of technologists to discuss today’s technology issues and opportunities, as well as Microsoft’s role & future direction.
The Good
Overall the experience was pleasant and reasonably educational. The three most enjoyable parts of the conference were:
- Hallway conversations with Program Managers for IE and IIS 7. They’re genuinely interested in helping improve the experience of Rails users on Windows.
- Informal conversations over meals or drinks with the other attendees.
- Spending time with the team in the Microsoft Open Source Lab
It was great to see the new web.config file in IIS7; it seems like the MS guys really paid attention to what people liked about Apache. Jim and John’s talk about dynamic languages on the CLR was pretty interesting too. They’re both clearly sharp and some cool things are probably in the pipeline.
I was really impressed by the PowerShell demo. It’s like a REPL with classes and methods for almost everything on your system. I just can’t understand why they decided to take $_.
The Bad
The conference did have its downsides though. Some of the sessions were clearly targeted more at people who were either currently using .NET or were likely to switch. This meant we often got demonstrations of IDE interaction rather than an explanation of the underlying technology.
Some of the speakers also seemed to think that if you were an ‘open source guy’ (or gal) you were some kind of foaming at the mouth lunatic. The other attendees and I are far more pragmatic than that, and it was annoying to be treated as some Stallman-esque extremist.
The Ugly
The most frustrating session would have to be Don Box and Chris Anderson on “Why Microsoft Sucks”. Several of the funnier quotes from that session have already made their way onto the web, but there were others, such as:
“But if you’re Matz, or DHH, or Larry Wall, you’re screwed, because you don’t have time to build out this stack and then make it interoperate”.
Of course that’s not the real reason we don’t have a WS-* stack. The reality is that the entire set of standards is a steaming pile of complexity with an infinitesimally small value-add which manages to make even the most simple interactions an enormous engineering effort.
Don and Chris are clearly passionate, intelligent guys, but their talk came across as unjustifiably self-assured given how late Vista was, and the huge disaster that WS-* and SOA have inflicted on our industry.
Conclusion
All in all I’m grateful to Microsoft for inviting me to the summit. It was a great opportunity to meet relevant people within the company and the wider community. For more detailed coverage take a look at Ben Galbraith’s MTS07 articles or check Technorati.
Full Disclosure: Microsoft paid my way to the summit and gave me some stuff while I was there. If you think that makes me a shill, you’re nuts but welcome to your opinion.
For those of you who haven’t seen it already, Tumblr is a nice hosted tumblelog service. I’ve enjoyed it so much that from now on I’ll be posting any random bits and pieces I find online at Koz’s Web.
If you’re looking for a nice option somewhere between OCD presence and a full-blown blog, give Tumblr a try, and if you’re looking for a random peek into what flows through my aggregators, check out Koz’s Web.
Well yes, except for that kind of policy-driven intermediary-rich environment remains, more or less, science fiction; I personally have never observed such a thing actually working, and I have little faith that the WS-* theorists, meeting in their invitation-only back rooms, cooking up and superseding specs, are going to get it right first time based on zero real-world experience. Particularly with an abomination like WSDL in at the very core.
Sing it brother.
Posted on February 28th, 2007
Sometimes I wonder if it’s worthwhile trying to find common ground between the web and the ‘Enterprisey Guys’ with their Temple of Complexity.
Maybe we’re better off just letting them have the Systems Integration work, and we’ll keep the fun stuff. I’d much rather build a neat new web application than tie some billing system on an AS-400 to a printing system on a zSeries…
Last weekend I was lucky enough to attend Kiwi Foo Camp, 3 days of hanging out with interesting geeks talking about interesting things. Rather than spend my time selling rails to everyone I could meet, I made a conscious effort to attend sessions outside my ‘circle of comfort’. While the evangelist in me still managed to escape at least once, on the whole I kept to the strategy.
The highlights for me were:
- Speaking with Robert O’Callahan about Gecko, caching and fetching external javascripts. More about that some time soon.
- Talking with Ben Goodger about how Firefox handles the conflicting demands of plugin authors and core-refactorings. A topic pretty relevant to rails.
- Meeting local and international Perl guys. While I’m not a Perl programmer, I really enjoyed hearing their perspective on the issues we have in common.
- Listening to Artur Bergman’s talk on ‘Fucking big Websites’. He’s the only person I’ve met who swears more than DHH, Zed Shaw and myself combined, and the material was good too. Best of luck with the surgery Artur.
Kudos to Nat, Janine and Russell Brown who put on a fantastic event, and to Mahurangi College for being such gracious hosts.
Oh yeah, the Maserati was awesome
Lately I’ve been spending my spare time writing Cocoa code, more on that later. On the whole I’m finding the transition a little strange, but it’s always fun to learn what other frameworks do.
The extremely long method names take a little getting used to. To retrieve an object from a dictionary you use [foo objectForKey:@"bar"]; this takes some adjusting if you’re coming from Ruby (foo["bar"]) or Java (foo.get("bar")).
Garbage collection can’t come soon enough. While retain/release memory management beats the pants off malloc/free, it’s still too much work and even Apple gets it wrong. Those of you thinking garbage collection is slow or inefficient should read Jamie Zawinski’s essay on the topic.
I find the handling of nil completely bizarre. First up, you can’t put nil into a collection, you have to use [NSNull null] for that. But the part that really confuses me is message sending.
If you send nil a message, any message, it’ll return nil. Some people like this but I’ve just found it obscures the source of an error. It lets your entire model slowly be swallowed by a single null reference, like some strange cross between Strangelets and cancer.
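For the Rubyists, the closest analogy I can offer is the safe navigation operator. This is a rough sketch of the two failure styles, not a claim about how the Objective-C runtime works, and the `Account` struct is made up for the example.

```ruby
# Hypothetical model, invented for the example.
Account = Struct.new(:owner)

account = Account.new(nil) # the owner was never set: this is the real bug

# Nil-swallowing style, roughly what messaging nil feels like:
# the nil quietly flows onward and surfaces far from its source.
shouted = account.owner&.name&.upcase

# Plain message send: Ruby fails right where the nil is first used.
error = begin
  account.owner.name
rescue NoMethodError => e
  e
end
```

In the first style `shouted` is simply nil, and whatever consumes it later gets to guess why. In the second you get an exception pointing at the line where the nil was first used.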
Overall, I’m enjoying Cocoa. There are some serious gaps in the documentation and tool chain, but it feels miles ahead of the last desktop programming I did.