Tag: Performance

ActiveRecord, the ORM of Ruby on Rails, is a great tool for modeling database-backed applications without writing a lot of repetitive SQL queries. By following the principle “Convention over Configuration”, you don’t need to set up a lot of configuration to get started: no getters and setters and no mapping definitions, as long as you follow the conventions. At first it feels a little bit like magic, except that it is not. In the end, it is still generating SQL for you. That means that even though ActiveRecord does a great job for you, it cannot protect you from bad queries.

During the past few months, I have seen my fair share of code that makes your database cry havoc and, even worse, takes down an entire application on subsequent requests. Most of these mistakes could have been easily avoided by following one simple guideline: “Be reasonable. Before committing your code, take a look at the generated SQL and keep in mind that the production database might have more than just 10 records inside.”

So let’s take a look at some example code that might be trouble for your application:

At first sight, this doesn’t look very bad. It gets the job done and probably performs nicely in a development environment. First, let’s look at the SQL being generated:

We are fetching every single column of every user in our system just to get the IDs. Even worse, for every user in the database, ActiveRecord creates a new user object. That takes time and uses up memory. Just imagine what would happen if you had a couple of thousand or more users in your database. At best, the request will just time out after 60 seconds and everything goes back to normal. Worst case, memory consumption rockets and takes out the server, and the database is jammed with long-running queries. Not bad for just one line of code.

Let’s assume this line of code was somewhere inside a view in the admin interface providing the option to assign a project owner. To fix the problem, we can do the following:

  1. Paginate
  2. Don’t fetch all columns
  3. Move the code to the controller, that’s where it belongs
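Points 1 and 2 come down to the SQL the database finally sees. Here is a minimal sketch, in plain Ruby rather than ActiveRecord, of the query that a paginated, column-restricted fetch should end up producing. The helper name, the users table with id and name columns, and the page size of 30 are assumptions for illustration:

```ruby
# Hypothetical helper: builds the SQL for one page of users.
# Table name, column names and page size are assumptions.
def users_page_sql(page, per_page = 30)
  page = Integer(page)
  per_page = Integer(per_page)
  raise ArgumentError, 'page must be >= 1' if page < 1

  # Page 1 starts at row 0, page 2 at row per_page, and so on.
  offset = (page - 1) * per_page
  "SELECT id, name FROM users ORDER BY name LIMIT #{per_page} OFFSET #{offset}"
end

puts users_page_sql(1) # SELECT id, name FROM users ORDER BY name LIMIT 30 OFFSET 0
puts users_page_sql(3) # SELECT id, name FROM users ORDER BY name LIMIT 30 OFFSET 60
```

However the query is built, the point is that the database returns at most one page of two-column rows, no matter how many users exist.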

Here are some other gems I have found hidden somewhere in a view:


Tips

So here are my tips to prevent these kinds of time bombs in your application:

  • Look at the generated SQL
  • Let AR/SQL do the math (conditions, count, sum, named_scope, sorting, ordering) instead of using Ruby’s Enumerable methods
  • Paginate, but paginate in SQL, not on Ruby collections (User.paginate vs. User.all.paginate)
  • Only fetch what you really need (:select => "id, name")
  • Eager load associations with :include to avoid the n + 1 problem (Post.all.each{|post| puts post.author.name} vs. Post.all(:include => :author))
  • Whenever possible fetch all records you need from inside a controller or a model
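To make the n + 1 problem from the list above concrete, here is a small simulation in plain Ruby. It does not use ActiveRecord: FakeDb is a made-up stand-in that only counts how many "queries" are issued, and the posts and authors are sample data invented for the sketch.

```ruby
# Toy stand-in for a database that counts the queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize(posts, authors)
    @posts = posts
    @authors = authors
    @query_count = 0
  end

  # One query returning all posts.
  def all_posts
    @query_count += 1
    @posts
  end

  # One query per call, returning a single author name.
  def author(id)
    @query_count += 1
    @authors.fetch(id)
  end

  # One query returning all requested authors at once.
  def authors(ids)
    @query_count += 1
    @authors.select { |id, _name| ids.include?(id) }
  end
end

AUTHORS = { 1 => 'Ann', 2 => 'Bob' }
POSTS = [
  { :title => 'A', :author_id => 1 },
  { :title => 'B', :author_id => 2 },
  { :title => 'C', :author_id => 1 }
]

# n + 1: one query for the posts, then one more per post.
def author_names_lazily(db)
  db.all_posts.map { |post| db.author(post[:author_id]) }
end

# Eager loading: one query for the posts, one for all their authors.
def author_names_eagerly(db)
  posts = db.all_posts
  authors = db.authors(posts.map { |post| post[:author_id] }.uniq)
  posts.map { |post| authors[post[:author_id]] }
end

lazy = FakeDb.new(POSTS, AUTHORS)
eager = FakeDb.new(POSTS, AUTHORS)
author_names_lazily(lazy)
author_names_eagerly(eager)
puts lazy.query_count  # 4
puts eager.query_count # 2
```

With three posts the difference is small, but the lazy version grows by one query per post while the eager version stays at two.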

Check SQL logs

To get a feeling of what is happening behind the scenes, I like to try things out in script/console and look at the generated SQL. To see it directly in irb, I change the Rails logger to STDOUT in my ~/.irbrc file:
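A minimal sketch of such an ~/.irbrc entry, assuming Rails 2.x, where Rails reuses the RAILS_DEFAULT_LOGGER constant if it is already defined when the environment loads. The helper name is hypothetical, and the check for RAILS_ENV is only there so that plain irb sessions outside a Rails app stay untouched:

```ruby
require 'logger'

# Hypothetical helper: point the Rails logger at STDOUT when irb was
# started through script/console (which sets RAILS_ENV).
def log_rails_to_stdout(env = ENV)
  return unless env.include?('RAILS_ENV')
  return if Object.const_defined?(:RAILS_DEFAULT_LOGGER)

  # Assumption: Rails 2.x picks up this constant instead of creating
  # its own file-based logger.
  Object.const_set(:RAILS_DEFAULT_LOGGER, Logger.new(STDOUT))
end

log_rails_to_stdout
```

With this in place, every query triggered from the console is printed right below the expression that caused it.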

To get even more information, there are plenty of cool profiling tools that hook into your Rails app and give you a nice GUI with all the information you need. Try out New Relic, FiveRuns TuneUp or Rack::Bug. There is also a great Railscast explaining all three tools.


In the last post I introduced simple techniques to improve the performance of a web application. In this post I will talk about performance optimization on the browser side. With a few easy steps, the pages will load a lot faster and you will reduce bandwidth consumption and load on your servers.

Yahoo! has released a great tool called YSlow. YSlow analyzes web pages and suggests ways to improve their performance based on a set of rules for high-performance web pages. It is an add-on for Firebug, which in turn is an add-on for Firefox.

There are 22 rules defined in YSlow:

  1. Minimize HTTP Requests
  2. Use a Content Delivery Network
  3. Add an Expires or a Cache-Control Header
  4. Gzip Components
  5. Put StyleSheets at the Top
  6. Put Scripts at the Bottom
  7. Avoid CSS Expressions
  8. Make JavaScript and CSS External
  9. Reduce DNS Lookups
  10. Minify JavaScript and CSS
  11. Avoid Redirects
  12. Remove Duplicate Scripts
  13. Configure ETags
  14. Make AJAX Cacheable
  15. Use GET for AJAX Requests
  16. Reduce the Number of DOM Elements
  17. No 404s
  18. Reduce Cookie Size
  19. Use Cookie-Free Domains for Components
  20. Avoid Filters
  21. Do Not Scale Images in HTML
  22. Make favicon.ico Small and Cacheable

Since I like to learn by example, I’ve created a simple, rather ugly web application called Cootweets. It will be some sort of a “geo-tagged Flickr image with a tweet attached to it” thing. Throughout this blog post, I will try to optimize the start page of Cootweets to satisfy YSlow’s requirements. I won’t talk about every rule, just the ones that YSlow complained about. Here is what the start page looks like:

[Screenshot: Cootweets start page]

So let’s get started and make the start page load faster.

Make fewer HTTP requests (B)

  This page has 6 external Javascript scripts. Try combining them into one.
  This page has 3 external stylesheets. Try combining them into one.

What is this about?

After a browser has received a response from your server, it parses the content and tries to render it. It downloads every resource included in the HTML, such as CSS, JavaScript, Flash and images. That means the browser fires a new HTTP request for each resource. By combining the 6 scripts into one file and the 3 stylesheets into another, we go from 9 requests to 2, a reduction of 7.

Let’s fix it

Let’s combine the CSS and Javascript. If you are running Ruby on Rails, you can simply tell the asset tag helper to combine the files by passing in the :cache option. There is also a plugin called AssetPackager which compresses the files too.
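What combining means can be sketched in a few lines of plain Ruby. This is not how the :cache option or AssetPackager is implemented, just an illustration of the idea. The function names and the stylesheet file names are made up for the example:

```ruby
require 'tmpdir'

# Concatenate the given files, in order, into one bundle file.
# The /* ... */ marker is a valid comment in both CSS and JavaScript.
def combine_assets(paths, bundle_path)
  File.open(bundle_path, 'w') do |bundle|
    paths.each do |path|
      bundle.puts "/* #{File.basename(path)} */"
      bundle.puts File.read(path)
    end
  end
  bundle_path
end

# HTTP requests saved when file_count files are replaced by one bundle.
def requests_saved(file_count)
  [file_count - 1, 0].max
end

# Demo with throwaway files in a temporary directory.
Dir.mktmpdir do |dir|
  sources = %w[reset.css layout.css theme.css].map do |name|
    path = File.join(dir, name)
    File.write(path, "/* rules from #{name} */")
    path
  end
  combine_assets(sources, File.join(dir, 'all.css'))
end

# 6 scripts and 3 stylesheets, each group combined into one file.
puts requests_saved(6) + requests_saved(3) # 7
```

The order of the files matters for both CSS and JavaScript, which is why the sketch keeps the order it was given.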

Giving it a second thought, I realize that I don’t even need JavaScript on the start page. It’s just a habit to include the jQuery files, because I use them so frequently. So getting rid of it is the best option at this point.

Add Expires headers (F)

  There are 7 static components without a far-future expiration date.

      * (no expires) http://localhost:3000/stylesheets/all.css?...
      * (no expires) http://localhost:3000/javascripts/all.js?...
      * (no expires) http://localhost:3000/images/compass.png
      * (no expires) http://localhost:3000/images/claim.png?...
      * (no expires) http://localhost:3000/images/camera.png?...
      * (no expires) http://localhost:3000/images/geotag.png?...
      * (no expires) http://localhost:3000/images/tweety.png?...  

What is this about?

Expires headers basically tell the browser how long it can cache a specific component. This avoids unnecessary HTTP requests on subsequent page views, which speeds up page loads, saves bandwidth and reduces the load on the servers. If you use a CDN, setting the Expires headers will save you money.

Let’s fix it

The implementation highly depends on your setup, but the idea is the same: tell your web server to add a far-future Expires header to static assets like CSS, JavaScript and images. There is, however, one pitfall. What happens if you change a file without changing its name? The browser will probably use the cached version. So we need to add some kind of versioning to our assets. With Cootweets we are in luck, because Rails appends a timestamp to the URLs created with the asset tag helper. If a file changes, its URL changes and the browser will discard the cached version.
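The versioning idea can be illustrated in plain Ruby. This is a sketch of the principle, not the Rails implementation: the function name is hypothetical, and it assumes the version is simply the file's modification time in seconds.

```ruby
require 'tempfile'

# Hypothetical helper: append the file's modification time to its URL,
# so the URL changes whenever the file on disk changes.
def versioned_asset_url(url_path, file_path)
  "#{url_path}?#{File.mtime(file_path).to_i}"
end

Tempfile.create('all.css') do |file|
  # Pretend the stylesheet was last modified at timestamp 1000.
  File.utime(Time.at(1_000), Time.at(1_000), file.path)
  puts versioned_asset_url('/stylesheets/all.css', file.path) # /stylesheets/all.css?1000

  # A later edit changes the timestamp and therefore the URL.
  File.utime(Time.at(2_000), Time.at(2_000), file.path)
  puts versioned_asset_url('/stylesheets/all.css', file.path) # /stylesheets/all.css?2000
end
```

Because the browser caches by full URL, the second URL is a cache miss even though the first one was stored with a far-future expiration date.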

Cootweets runs on Nginx and Passenger. Here is how I changed the Nginx configuration:
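A minimal sketch of such a configuration, not necessarily the exact one used for Cootweets. It assumes that versioned asset URLs carry a purely numeric query string (the Rails timestamp); the list of file extensions is an assumption as well:

```nginx
# Inside the server block: match static assets by extension.
location ~* \.(css|js|png|jpg|jpeg|gif|ico)$ {
    # Only assets requested with a numeric timestamp (?1234567890)
    # get a far-future Expires header; unversioned assets do not.
    if ($args ~ "^[0-9]+$") {
        expires max;
    }
}
```

Restricting the header to timestamped URLs means an asset referenced without a timestamp is never cached beyond the point where it could go stale.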

If you are running Apache, the solution is similar, but somehow I could not convince Apache to cache only assets that have a timestamp appended. I tried the same regular expression I used with Nginx, but then Apache just didn’t set any Expires headers at all. This actually makes sense, because the directive is called FilesMatch and not RequestMatch. I have come up with a workaround using Apache variables.


However, this does not set the time dynamically. Well of course, we could alter the Apache configuration on each deploy to fix this but that seems cumbersome for such a little task. If someone knows how to do this dynamically with an Apache directive, please let me know.

You can check if it works by looking at the HTTP headers returned from the web server. YSlow offers a convenient way to do this, but you can also use curl with the --head option to look at the headers.
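Once you have the header value, judging whether it counts as "far future" is simple date arithmetic. A small sketch in Ruby, where the function name and the one-year threshold are assumptions rather than YSlow's actual rule:

```ruby
require 'time'

# True if the Expires value lies at least min_days after `now`.
# Missing or unparseable values count as "not far future".
def far_future_expires?(expires_value, now = Time.now, min_days = 365)
  return false if expires_value.nil?

  Time.httpdate(expires_value) - now >= min_days * 24 * 60 * 60
rescue ArgumentError
  false
end

now = Time.utc(2010, 1, 1)
puts far_future_expires?('Thu, 31 Dec 2037 23:55:55 GMT', now) # true
puts far_future_expires?('Fri, 01 Jan 2010 00:10:00 GMT', now) # false
puts far_future_expires?(nil, now)                             # false
```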

[Screenshot: Expires headers]



Looking at the browser logs, I noticed something strange. Even though the Expires headers are set correctly, Safari and Firefox still requested the components from the server and got back a 304 Not Modified. While this is an improvement, I prefer no requests over a few requests. It took me quite some time to figure out that this only happens when an explicit reload is triggered in the browser.

Compress components with GZip (D)

  There are 3 plain text components that should be sent compressed

      * http://localhost:3000/
      * http://localhost:3000/stylesheets/base_packaged.css?...
      * http://localhost:3000/javascripts/base_packaged.js?...

You can save bandwidth by compressing HTML pages, CSS stylesheets and Javascript. Luckily, you don’t have to modify your application. The web server can take care of this. Here is the configuration for Nginx:
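A minimal sketch of such an Nginx setup, placed in the http or server block. The list of MIME types is an assumption; text/html is left out because Nginx always compresses it once gzip is on:

```nginx
# Compress responses for clients that announce gzip support.
gzip on;

# text/html is always included; add stylesheets and scripts.
gzip_types text/css application/javascript application/x-javascript;

# Send "Vary: Accept-Encoding" so proxies keep both variants apart.
gzip_vary on;
```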


And here for Apache:
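For Apache, a comparable sketch relies on mod_deflate being available. Again, the list of MIME types is an assumption:

```apache
<IfModule mod_deflate.c>
  # Compress HTML pages, stylesheets and scripts on the way out.
  AddOutputFilterByType DEFLATE text/html text/css application/javascript application/x-javascript
</IfModule>
```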

Results

With a few simple changes, we have reduced the number of requests and the size of the data being transferred. Comparing an empty cache with a primed one, I reduced the total number of requests from 8 down to 2 in my sample application. In terms of bandwidth, the reduction is from 416.9K to 0.6K. For the sake of the example, I have included the JavaScript, even though the page doesn’t use any. There is also a CSS image that has no Expires header and therefore gets requested on every page load.


[Screenshot: cache statistics]

YSlow is a very useful tool to quickly analyze page load performance. It offers a rich set of information and tools, comfortably integrated into Firefox. Google has recently released a similar tool called Page Speed. The cool thing about Page Speed is that it actually estimates the bandwidth reduction that could be achieved, so make sure to check it out too.
