Apr 07 2011

Baking Your Blog with Varnish - Single Web Server Setup

With all the recent talk around the web about "baked blogs", I wanted to try setting up Varnish on my blog, which is powered by Drupal 7. Varnish can provide a significant speed boost; the project's about page describes what it does quite succinctly:

Varnish stores web pages in memory so the web servers don't have to create the same web page over and over again. The web server only recreate a page when it is changed. Additionally Varnish can serve web pages much faster then any application server is capable of - giving the website a significant speed up.

I wanted to get Varnish up and running on my personal web server as a learning exercise. While I was getting started, Nate Haug of Lullabot published a hugely informative article on Configuring Varnish for High-Availability with Multiple Web Servers. This helped me out a lot, but the big difference was the last part of the title of Nate's post, "with Multiple Web Servers": I have just the one.

As I worked through some different configuration issues, I quickly came to realize that there is not a great deal of information out there on this type of setup, so I decided to document my experience. Before I really begin, I should preface this by saying that I am a Varnish novice. Your mileage may vary based on these instructions, but this is what worked for me.

Getting Varnish Installed

I scoured Google to find some specific instructions on how to install Varnish on CentOS. Turns out, I should have started with the official site. It's two lines, and it worked like a charm.

rpm --nosignature -i http://repo.varnish-cache.org/redhat/el5/noarch/varnish-release-2.1-2.noarch.rpm

yum install varnish

Getting Varnish Running

Since I am setting this up on a single box, what I wanted was Varnish on port 80 and Apache on port 8080 so that when users type the domain in their browser, their request will go through Varnish by default. Nevertheless, I recommend setting it up in reverse for testing purposes and then once you have everything running smoothly, switching the ports.

Depending on how you have your server configured, you probably will need to open up your firewall to allow connections on port 8080 for testing. You can do this by editing the iptables configuration located at /etc/sysconfig/iptables and adding the following rule:

-A INPUT -p tcp -m tcp --dport 8080 -j ACCEPT

Save your changes and then restart iptables:

/etc/init.d/iptables restart

Before you start Varnish you need to edit two configuration files. The first is located at /etc/sysconfig/varnish. Two items are of note.

  1. The first is the port, 8080, on the first line. This is the port that you are setting up Varnish to listen on. You will need to come back and change this to answer on port 80 once we are ready to serve from Varnish.
  2. The second item is the last line, which specifies the amount of memory to be made available to Varnish. In a setup where Varnish is running on a dedicated server, it is normal to max out the memory here. However, in this configuration, sharing a server with Apache, I have set the memory allocation to be one half of the available memory. So on a server with 1GB, I would set the memory allocation to 512M. 1


    DAEMON_OPTS="-a :8080 \
                 -T localhost:6082 \
                 -f /etc/varnish/default.vcl \
                 -u varnish -g varnish \
                 -S /etc/varnish/secret \
                 -p thread_pool_add_delay=2 \
                 -p thread_pools=<Number of CPU cores> \
                 -p thread_pool_min=<800 / Number of CPU cores> \
                 -p thread_pool_max=4000 \
                 -p session_linger=50 \
                 -p sess_workspace=262144 \
                 -s malloc,<Available memory / 2>"
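As a rough sketch of how the three placeholder values above could be worked out, the shell function below takes a core count and the total memory in MB and prints the derived options. The function name varnish_opts and its arguments are made up for illustration, and the half-of-memory split is only the rule of thumb described above, not a tuned recommendation.

```shell
# Hypothetical helper (not part of Varnish): print the DAEMON_OPTS values
# that depend on the hardware.
#   $1 = number of CPU cores
#   $2 = total memory in MB
varnish_opts() {
  cores=$1
  mem_mb=$2
  printf '%s\n' "-p thread_pools=${cores}"
  printf '%s\n' "-p thread_pool_min=$((800 / cores))"   # integer division
  printf '%s\n' "-s malloc,$((mem_mb / 2))M"            # half the memory for Varnish
}

# Example: a 2-core box with 1GB (1024 MB) of memory
varnish_opts 2 1024
```

On Linux, the two inputs can usually be read with grep -c ^processor /proc/cpuinfo and from the MemTotal line of /proc/meminfo.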

The VCL

The second configuration file that needs to be changed is a bit more daunting. This is the .vcl configuration file that is located at /etc/varnish/default.vcl. I used some of the configurations listed in Lullabot's example vcl but not all of these make sense for a single server setup. For instance, I removed all references to multiple backends and SSL configurations as I am not making use of these. My vcl config file can be referenced here. I am not going to go through the vcl line by line, since Nate did a really great job of that, but below are a couple important pieces.

The backend(s) you specify is where Varnish is going to send your web requests when they get passed through. In this case this is Apache. For testing we have left Apache on port 80.

backend default {
  .host = "127.0.0.1";
  .port = "80";
}

Why was there a hit/miss?

You can add the following to the vcl_fetch subroutine of your vcl config to add diagnostic headers to the response indicating why there was a Varnish cache hit or miss. You can view these headers by using the Net tab in the Firefox extension, Firebug. 2

# Varnish determined the object was not cacheable
if (!beresp.cacheable) {
    set beresp.http.X-Cacheable = "NO:Not Cacheable";

# You don't wish to cache content for logged in users
} elsif (req.http.Cookie ~ "(UserID|_session)") {
    set beresp.http.X-Cacheable = "NO:Got Session";
    return(pass);

# You are respecting the Cache-Control=private header from the backend
} elsif (beresp.http.Cache-Control ~ "private") {
    set beresp.http.X-Cacheable = "NO:Cache-Control=private";
    return(pass);

# You are extending the lifetime of the object artificially
} elsif (beresp.ttl < 1s) {
    set beresp.ttl   = 5s;
    set beresp.grace = 5s;
    set beresp.http.X-Cacheable = "YES:FORCED";

# Varnish determined the object was cacheable
} else {
    set beresp.http.X-Cacheable = "YES";
}

return(deliver);
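If you would rather check the diagnostic header from the command line than from Firebug, something like the sketch below should work. The function name x_cacheable is made up for illustration; it only reads response headers from standard input, so it is shown here with canned headers rather than a live request.

```shell
# Hypothetical helper: print the value of the X-Cacheable header from a
# block of HTTP response headers read on standard input.
x_cacheable() {
  # Strip carriage returns, then match the header name case-insensitively.
  tr -d '\r' | awk -F': ' 'tolower($1) == "x-cacheable" { print $2 }'
}

# Canned response headers for illustration.
printf 'HTTP/1.1 200 OK\r\nX-Cacheable: YES:FORCED\r\n\r\n' | x_cacheable
```

Against a running server you could pipe in the output of curl -sI for a page on your site instead, for example http://localhost:8080/ while Varnish is still on the test port (that URL is an assumption based on the setup above).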

Going Live

Once you have Varnish up and running, and have tuned it to your liking, you should then switch the ports that Varnish and Apache are listening on so that requests from the web on port 80 hit Varnish first.

Update the following two settings in your Apache configuration (/etc/httpd/conf/httpd.conf) to listen on port 8080:

Listen 8080

NameVirtualHost *:8080

You will also need to update any VirtualHost files you are using to listen on port 8080 instead of port 80.

You will then need to reconfigure Varnish to listen on port 80. The file should now be familiar to you (/etc/sysconfig/varnish):

DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -u varnish -g varnish \
             -S /etc/varnish/secret \
             -p thread_pool_add_delay=2 \
             -p thread_pools=<Number of CPU cores> \
             -p thread_pool_min=<800 / Number of CPU cores> \
             -p thread_pool_max=4000 \
             -p session_linger=50 \
             -p sess_workspace=262144 \
             -s malloc,<Available memory / 2>"

Update the vcl configuration file so that Varnish is looking for the backend (Apache) on port 8080:

backend default {
  .host = "127.0.0.1";
  .port = "8080";
}

Once these configuration changes are complete, you will need to stop both Apache & Varnish and then start both again.

1. I do not know if this is the most efficient setting. It seemed logical.
2. VCLExampleHitMissHeader

Dec 16 2010

Starting Fresh

I had been planning to move this site from Tumblr to Drupal for some time now, but with Tumblr's recent extended downtime I figured this was as good a time as any to move up the timeline. I really enjoyed using Tumblr while I was a user. There are some features that I am certainly going to miss, but features can be built, especially in a framework as flexible as Drupal. Not having to worry about the backend side of things is also a nice bonus when using Tumblr, but really, I am managing my own server anyway, so why should I not host my own site there? What can't be overstated, though, is the value of controlling your own data, in formats that you understand and can adapt if needed.

As a developer, I find it hard to resist using the latest and greatest, sometimes to my own chagrin. Nevertheless, I went ahead and transitioned to an install of Drupal 7 since it has now made it all the way to the release candidate stage. The module landscape is obviously a bit more sparse than what is out there for Drupal 6, but considering that this is a fairly simple site and how quickly module development is moving for D7, I decided to go ahead and take the plunge. Away we go.

Aug 11 2010

Gator Football Calendar Updated

I have updated my Florida Gators football calendar with published game times and broadcast info for the first few games. The calendar will be updated throughout the season with game times and results, but with Gator football fast approaching I wanted to make sure everyone was aware! GO GATORS!

Jul 28 2010

Adding All New Files to Subversion

One thing that has always bothered me when working with Subversion is the inability to easily add a large number of new files to the repository. Subversion does not have an equivalent of 'git add *', so we are left to add all of the new files, at best, one folder at a time.

Imagine doing this for a Drupal site that hasn't been touched in a while and you need to apply security updates for core, along with a large number of modules. You could very easily spend more time typing in all of your 'svn add ...' commands than getting the updates applied.

Using the following command, you can run 'svn add' on every line returned from 'svn status' that has a question mark associated with it (i.e. new to the repository).

svn status | grep '^?' | sed 's/^?[[:space:]]*//' | while IFS= read -r f; do svn add "$f"; done
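The filtering step can be tried safely on canned output before pointing it at a real working copy. The sketch below is only an illustration (the function name unversioned_paths and the sample paths are made up): it anchors the match to the first column, so a modified file with a question mark in its name is skipped, and it quotes the path so names containing spaces survive.

```shell
# Hypothetical helper: keep only unversioned entries ("?" in the first
# column of `svn status` output) and strip the status columns so that just
# the path remains.
unversioned_paths() {
  grep '^?' | sed 's/^?[[:space:]]*//'
}

# Canned `svn status` output: one new file with a space in its path, and
# one modified file whose name happens to contain a question mark.
printf '%s\n' \
  '?       sites/all/modules/new module/README.txt' \
  'M       docs/what?.txt' |
  unversioned_paths |
  while IFS= read -r f; do
    # On a real working copy this line would be: svn add "$f"
    printf 'would add: %s\n' "$f"
  done
```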

Hopefully, this should make adding large numbers of new files to Subversion quite a bit easier. Of course, you could always just use git!

Jul 12 2010

Apache Solr / Drupal search performance tip

A great tip by Davy Van Den Bremt of Drupal Coder on disabling Drupal's core search indexer when using Apache Solr.

Jul 07 2010

Creating a new Drupal site from a Drush make file

If you spend a fair amount of time creating new Drupal sites and are not using Drush, you are missing out! I used to maintain a Subversion repository with my most frequently used modules just for this purpose, but keeping these modules up to date was a very tedious, manual task (almost more trouble than it was worth). With Drush Make, however, you can just fire it off and, right before your eyes, you will have a brand new directory with the latest versions of all the modules you specified, downloaded and ready to go.

From Drush make's project page:

Drush make is an extension to drush that can create a ready-to-use drupal site, pulling sources from various locations. It does this by parsing a flat text file (similar to a drupal .info file) and downloading the sources it describes. In practical terms, this means that it is possible to distribute a complicated Drupal distribution as a single text file.

Below is an example Drush make file.

; CORE
core = 6.x
projects[] = drupal

; MODULES
; acquia
projects[admin_menu][subdir] = "acquia"
projects[cck][subdir] = "acquia"
projects[filefield][subdir] = "acquia"
projects[imageapi][subdir] = "acquia"
projects[imagecache][subdir] = "acquia"
projects[imagefield][subdir] = "acquia"
projects[pathauto][subdir] = "acquia"
projects[token][subdir] = "acquia"
projects[views][subdir] = "acquia"
projects[webform][subdir] = "acquia"

; administration
projects[password_strength][subdir] = "other"
projects[userprotect][subdir] = "other"

; development
projects[coder][subdir] = "other"
projects[devel][subdir] = "other"

; THEMES
projects[zen][subdir] = "other" 

I have shown the example above to illustrate the make file syntax. This example is a bit leaner, module-wise, than even my bare-bones make file. I have separated out the modules contained in the Acquia distribution, of which I am a big fan, but this is just for organization's sake.

Once you have created the make file that you are going to use, you can run it with the following command:

drush make "path_to_make_file" /path_to_newsite_/

I have published a standard and a bare-bones make file on GitHub. Please feel free to fork with your changes/additions!


Jul 02 2010

Overriding a Drupal Views SQL Query

We all know and love Drupal Views. The queries that it builds are most often very elegant, but there are certain cases where you need to use a query that is just a bit too complicated for the Views query generator to build on its own. I ran into this recently where I could not accomplish my task without a join to a sub-query.

If you find this to be the case, you can utilize the views_pre_execute hook to override the query being supplied by Views.

Implementation

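// Replace "hook" with your module's machine name,
// e.g. mymodule_views_pre_execute().
function hook_views_pre_execute(&$view) 
{
  if($view->name == 'your_views_name') 
  {
    // Swap in the hand-written SQL in place of the query Views built.
    $view->build_info['query'] = "SELECT * FROM node";
  }
}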

Once you have implemented this in your custom module, if you edit the view on your site, any changes you make that would normally change the query (sort criteria, filters, etc.) will have no effect. However, you can use the options in 'Basic Settings' and 'Page Settings' to edit the properties of how and where the view is going to display. When you use the 'Preview' button to test your view with various displays, your custom SQL query will be displayed so you can verify that it is being used.

Note: You could just create a custom module with custom SQL in lieu of using a view. However, if you have already built a view and styled your page/block based on the Views-generated elements, you might not be looking forward to this. In this case you could use the above technique to override the query without disrupting the rest of the work you have done relying on the view.

Feb 13 2010

Blizzard Wonderland

Jan 18 2010

Updated Florida Gators 2010 Football Calendar

I have updated my Florida Gators Football calendar for the 2010 season. If you add this calendar as a subscription in any calendaring program that supports the iCal standard (iCal, Google Calendar, Outlook, etc) it will update itself on a regular basis with game times and channels when these become available! Florida Gators 2010 Football Calendar.

Dec 30 2009

Optimizing Multiple Domains for Google Using Apache's mod_rewrite

If you own more than one domain but point them all to the same site, a portion of your virtual host file may look something like this:

<VirtualHost *:80>
   ServerName maindomain.com 
   ServerAlias www.maindomain.com alternatedomain.com
</VirtualHost>

The problem here is that Google sees the same content being served on multiple domains and can knock down your search ranking as a result. The way to remedy the situation is to use Apache's mod_rewrite so that a 301 redirect sends the user to your main URL. The only caveat is that no matter which domain you type into the browser's URL bar, it will be rewritten to the main domain. Below is an example that you can add to your virtual host file:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.maindomain\.com|alternatedomain\.com) [NC]
RewriteRule ^(.*)$ http://maindomain.com$1 [R=301,L]

Save the file, restart Apache, and your alternate domains should then be rewritten to your main domain, your Google ranking preserved, and all is right with the world.
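To get a feel for which host names the condition catches, the host pattern can be tried out in the shell. This is only an approximation under a couple of assumptions: grep -E uses POSIX extended regular expressions rather than the PCRE engine Apache uses (the two agree for a pattern this simple), -i stands in for the [NC] flag, and the function name redirect_action is made up for illustration. The pattern here escapes the dots so that they match only a literal period.

```shell
# Hypothetical helper: report what the rewrite rule would do for a given
# Host header value.
redirect_action() {
  if printf '%s\n' "$1" |
     grep -Eiq '^(www\.maindomain\.com|alternatedomain\.com)'; then
    echo "301 to http://maindomain.com"
  else
    echo "served as-is"
  fi
}

for host in maindomain.com www.maindomain.com ALTERNATEDOMAIN.COM; do
  printf '%s: %s\n' "$host" "$(redirect_action "$host")"
done
```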
