Introduction

I recently put a couple of NginX servers in front of my web server to improve performance. Here's a quick description of how to set up NginX as a frontend cache.

The setup

I have my web site served by an Apache web server. I also have the site replicated to a second server, which is a hot standby. I added two NginX servers as frontend caches to improve performance. As the site is mainly static content, I don't have to worry about sessions, etc. If the web server fails, NginX automatically uses the standby. Load balancing is achieved between the NginX caches using round robin DNS (evil, I know). I also have a small app running on another web server and that is transparently served from another domain by NginX.

NginX config

    http {
        include       /etc/nginx/mime.types;

        upstream main {
            server primary.example.com;
            server backup.example.com backup;
        }

        proxy_buffering           on;
        proxy_cache_valid         any 10m;
        proxy_cache_path          /var/www/cache levels=1:2 keys_zone=my-cache:8m max_size=1000m inactive=600m;
        proxy_temp_path           /var/www/cache/tmp;
        proxy_buffer_size         4k;
        proxy_buffers             100 8k;

        sendfile                  on;
        keepalive_timeout         65;
        tcp_nodelay               on;

        server {
            listen 1.2.3.4:80;
            server_name www.example.com;
            access_log  /var/log/nginx/access.log;
            error_log  /var/log/nginx/error.log;

            proxy_buffering on;
            proxy_set_header  X-Real-IP  $remote_addr;

            location / {
                proxy_pass http://main;
                proxy_cache            my-cache;
                proxy_cache_valid       200 10m;   # status 200, cached for 10 minutes
            }

            location /tools/exampletool/ {
                proxy_pass http://server3.example.com/dir/;
                proxy_cache             my-cache;
                proxy_cache_valid       200 10m;   # status 200, cached for 10 minutes
                proxy_set_header Host server3.example.com;
            }
        }
    }
    

You probably want to review (and possibly change) the following settings:

upstream: main is an arbitrary label; you can set it to whatever you like. You can see that I have entered two server names. These are the names of the origins. As you can probably work out, primary is normally used, unless it fails, in which case backup is used. If you were to leave off the backup keyword, the two servers would be used in a round-robin system.
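To make that concrete, here is what a plain round-robin upstream might look like, with no backup keyword and an optional weight (the server names are placeholders):

    upstream main {
        # both servers take traffic; weight defaults to 1
        server primary.example.com weight=2;
        server secondary.example.com;
    }

With weight=2, the first server receives roughly twice as many requests as the second.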

proxy_*: These are fairly self-explanatory and generally look after themselves. You might want to specify a different directory for the cache. Do note that the temp_path and cache_path *must* be on the same volume, as one just contains hard links to the other.

server_name: If you've setup a web server, you probably know what this is!

location: This maps a location on the cache to an origin server. You can see that / is passed to http://main, which is defined as two origin servers in the upstream {} section. It also uses my-cache to hold a cached copy of the data. The /tools/exampletool/ location does something a little clever. It is still part of the virtual host, but http://www.example.com/tools/exampletool/ is mapped to a different origin server from the rest of the site. Notice also that I have specified a server and path directly, rather than an upstream source, as I only have one server serving this part of the site. I have also used proxy_set_header to change the Host header that the cache presents to the origin server.
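Incidentally, if you want to check whether a given request was served from the cache, recent versions of NginX expose an $upstream_cache_status variable (HIT, MISS, EXPIRED and so on) that you can surface in a response header for debugging. A sketch, to go inside the server {} block:

    # report cache status to the client; remove once you're done debugging
    add_header X-Cache-Status $upstream_cache_status;

You can then watch the header with curl or your browser's developer tools.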

Setting up the origin server

If you're going to use a frontend cache, it helps if that cache knows how long it can hold a page before needing to fetch a fresh copy from the origin server. Obviously, different content should be cached for different lengths of time, e.g. a news page wants a short time, while images or stylesheets can be cached for much longer. This is done using the Cache-Control and Expires headers in HTTP. Apache has a mod_expires module and Lighttpd a mod_expire module which handle this. Other web servers are an exercise for the reader. Read the fine manuals.
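For example, a response that may be cached for an hour carries headers along these lines (the values are purely illustrative):

    HTTP/1.1 200 OK
    Cache-Control: max-age=3600
    Expires: Thu, 08 Jan 2026 12:00:00 GMT

The cache can serve this response for up to an hour before revalidating against the origin.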

Apache

First, make sure that mod_expires is loaded. Then you can add "ExpiresActive" and "ExpiresDefault" directives to your <Directory> or <Location> sections:

    <Location />
        ExpiresActive On
        ExpiresDefault "access plus 1 hours"
    </Location>
    <Location /css>
        ExpiresActive On
        ExpiresDefault "access plus 7 days"
    </Location>
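
mod_expires can also key on content type rather than location, via ExpiresByType; for instance (the MIME types here are just examples):

    ExpiresActive On
    # cache images and stylesheets for a week, wherever they live
    ExpiresByType image/png "access plus 7 days"
    ExpiresByType text/css "access plus 7 days"

ExpiresDefault still applies to any type not explicitly listed.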
    

Lighttpd

Again, make sure that you have loaded mod_expire. You can then set expire.url directives:

    $HTTP["url"] =~ "^/" {
         expire.url = ( "" => "access plus 1 hours" )
    }
    $HTTP["url"] =~ "^/css/" {
         expire.url = ( "" => "access plus 7 days" )
    }
    

Load balancing the caches

As previously mentioned, this is achieved by adding an A record for each cache to the DNS zone. This is not exactly ideal: it doesn't achieve terribly good load balancing, since the DNS server just hands out the records in a round-robin fashion, which means that several users behind the same DNS cache will end up on the same server, and different users accessing different content will cause asymmetric load on the caches. It also means that if a cache fails, the DNS server still quite happily points users towards it. However, it is good enough for now. I have set the TTL to three minutes, so I can manually drop an offending record out of the DNS zone in the event of a problem, and that will have to do for now.
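In BIND zone-file terms, the round-robin A records with a three-minute TTL look something like this (the names and addresses are placeholders, using the documentation address range):

    ; two caches behind one name; the resolver rotates the answers
    www.example.com.  180  IN  A  192.0.2.10   ; cache 1
    www.example.com.  180  IN  A  192.0.2.11   ; cache 2

The 180-second TTL means a record pulled from the zone stops being handed out within a few minutes.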