ActiveRecord, the ORM of Ruby on Rails, is a great tool for modeling database-backed applications without writing a lot of repetitive SQL. By following the principle of “Convention over Configuration”, you don’t need to set up much to get started: no getters and setters, and no mapping definitions, as long as you follow the conventions. At first, it feels a little bit like magic, except that it is not. In the end, it is still generating SQL for you. That means that even though ActiveRecord does a great job for you, it cannot protect you from bad queries.
During the past few months, I have seen my fair share of code that makes your database cry havoc and, even worse, takes down an entire application on subsequent requests. Most of these mistakes could have been easily avoided by following one simple guideline: “Be reasonable. Before committing your code, take a look at the generated SQL and keep in mind that the production database might have more than just 10 records in it.”
So let’s take a look at some example code that might spell trouble for your application:
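The original snippet is not preserved in this post, but a typical offender (model and attribute names are illustrative) looks like this:

```ruby
# Somewhere in a view: collect every user's id to build a select box.
# This loads every column of every user and instantiates a full
# ActiveRecord object per row -- just to read a single attribute.
user_ids = User.all.collect(&:id)
```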
At first sight, this doesn’t look very bad. It gets the job done and probably performs nicely in a development environment. First, let’s look at the SQL being generated:
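For a `User.all` call, the generated SQL would look something like this:

```sql
SELECT * FROM users;
```

Every column, every row — no `WHERE`, no `LIMIT`.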
We are fetching every single column of every user in our system just to get the ids. Even worse, for every user in the database, ActiveRecord creates a new user object. That takes time and uses up memory. Just imagine what would happen if you had a couple of thousand users or more in your database. At best, the request will just time out after 60 seconds and everything goes back to normal. Worst case, memory consumption skyrockets, takes out the server, and the database is jammed with long-running queries. Not bad for just one line of code.
Let’s assume this line of code was somewhere inside a view in the admin interface providing the option to assign a project owner. To fix the problem, we can do the following:
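A fix in the Rails 2-era API of the time (column names are illustrative) is to select only the columns the view actually needs:

```ruby
# Only fetch the two columns the select box needs.
# Generated SQL: SELECT id, name FROM users
users = User.all(:select => "id, name")
```

On modern Rails you would reach for `User.select(:id, :name)`, or `User.pluck(:id)` when you only need a bare array of ids.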
Here are some other gems I have found hidden somewhere in a view:
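The exact snippets aren’t reproduced here, but the usual suspects look roughly like this (all names illustrative):

```ruby
# Counting by loading everything: instantiates every record just to call #size.
@user_count = User.all.size            # better: User.count

# N+1 queries: one query for the projects, then one more per project
# to fetch its owner when the view calls project.owner.
@projects.each { |project| project.owner.name }
# better: Project.all(:include => :owner) in the controller
```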
So here are my tips to prevent this kind of time bomb in your application:
To get a feeling for what is happening behind the scenes, I like to try things out in script/console and look at the generated SQL. To see it directly in irb, I change the Rails logger to STDOUT in my ~/.irbrc file:
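The classic Rails 2-era snippet for this looks roughly like the following sketch:

```ruby
require 'logger'

# ~/.irbrc -- if irb was started from script/console (RAILS_ENV is set),
# define the default Rails logger as one that writes to STDOUT, so every
# SQL statement ActiveRecord generates is printed right in the console.
if ENV.include?('RAILS_ENV') && !Object.const_defined?('RAILS_DEFAULT_LOGGER')
  Object.const_set('RAILS_DEFAULT_LOGGER', Logger.new(STDOUT))
end
```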
To get even more information, there are plenty of cool profiling tools that hook into your Rails app and give you a nice GUI with all the information you need. Try out New Relic, FiveRuns TuneUp or Rack::Bug. There is also a great Railscast explaining all three of these tools.
In the last post, I introduced simple techniques to improve the performance of a web application. In this post, I will talk about performance optimization on the browser side. With a few easy steps, your pages will load a lot faster, and you will reduce bandwidth consumption and load on your servers.
Yahoo! has released a great tool called YSlow. YSlow analyzes web pages and suggests ways to improve their performance based on a set of rules for high performance web pages. It is an add-on for Firebug, which in turn is an add-on for Firefox.
There are 22 rules defined in YSlow:
Since I like to learn by example, I’ve created a simple, rather ugly web application called Cootweets. It will be some sort of a “geo-tagged flickr image with a tweet attached to it” thing. So throughout this blog post, I will try to optimize the start page of Cootweets to satisfy YSlow’s requirements. I won’t talk about every rule, just the ones that YSlow complained about. Here is what the start page looks like:
So let’s get started and make the start page load faster.
Expires headers basically tell the browser how long it may cache a specific component, which avoids unnecessary HTTP requests on subsequent page views. That speeds up page loads, saves bandwidth, and reduces the load on your servers. If you use a CDN, setting Expires headers will also save you money.
Cootweets runs on Nginx and Passenger. Here is how I changed the Nginx configuration:
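A sketch of that configuration: set a far-future Expires header, but only for asset requests that carry a numeric cache-busting timestamp in the query string (the file types listed are illustrative):

```nginx
location ~* \.(ico|css|js|gif|jpe?g|png)$ {
  # Rails appends ?<timestamp> to asset URLs, so cached copies are
  # invalidated by a new URL whenever the file changes on deploy.
  if ($query_string ~ "^[0-9]+$") {
    expires max;
  }
}
```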
If you are running Apache, the solution is similar, but somehow I could not convince Apache to only cache assets that have a timestamp appended. I tried the same regular expression I used with Nginx, but then Apache just didn’t set any Expires headers at all. This actually makes sense, because the directive is called FilesMatch, not RequestMatch. So I have come up with a workaround using Apache variables.
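A reconstructed sketch of that workaround: use mod_rewrite to set an environment variable whenever the query string looks like a Rails timestamp, then let mod_headers set the Expires header only on those requests (file types, variable name and date are illustrative):

```apache
RewriteEngine On
# Flag asset requests whose query string is a numeric timestamp.
RewriteCond %{QUERY_STRING} ^[0-9]+$
RewriteRule \.(ico|css|js|gif|jpe?g|png)$ - [E=versioned_asset:1]
# Set a (static) far-future Expires header only on flagged requests.
Header set Expires "Thu, 31 Dec 2037 23:59:59 GMT" env=versioned_asset
```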
However, this does not set the time dynamically. Of course, we could alter the Apache configuration on each deploy to fix this, but that seems cumbersome for such a small task. If someone knows how to do this dynamically with an Apache directive, please let me know.
You can check whether it works by looking at the HTTP headers returned from the webserver. YSlow offers a convenient way to do this, but you can also use curl with the --head option to look at the headers.
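For example, assuming a local development server and a timestamped stylesheet URL (both illustrative):

```shell
# Fetch only the response headers of a timestamped asset and
# check for the Expires line in the output.
curl --head "http://localhost:3000/stylesheets/main.css?1234567890"
```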
Looking at the browser logs, I noticed something strange. Even though the Expires headers were set correctly, Safari and Firefox still requested the components from the server and got back a 304 Not Modified. While this is an improvement, I prefer no requests over a few requests. It took me quite some time to figure out that this only happens when an explicit reload is triggered in the browser.
And here for Apache:
YSlow is a very useful tool to quickly analyze page load performance. It offers a rich set of information and tools, comfortably integrated into Firefox. Google has recently released a similar tool called Page Speed. The cool thing about Page Speed is that it actually estimates the bandwidth reduction that could be achieved, so make sure to check it out too.