Komputerwiz

Yesterday was rather eventful. My 9/11 Tenth Anniversary memorial image made it to the top rankings in Google Images and brought a lot of traffic to my site—so much traffic, in fact, that it overwhelmed my server and triggered a sudden need for me to rethink how my server operates.

It all started when I received a notification from Rackspace stating that my server was forcibly restarted because it exceeded its RAM and swap limits. I logged in to see that the problem was still happening and that Apache was the culprit: an influx of web requests had forked hundreds of worker processes ¹. Each individual process uses little memory, but several hundred processes add up quickly.

Apache’s overhead stems from this forking pattern: when the parent process forks to handle excess HTTP requests, each worker process carries the full memory footprint of its prototype ². This means that modules like PHP, SSL, directory index, and authorization are always loaded, even for simple tasks like serving static files.
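
For readers who have not seen forking up close, here is a minimal Python sketch of the pattern (Unix only). It is not how Apache itself is written; the function name and the pipe are my own illustration. It only shows that one call to fork() yields two processes, and that the child starts out as a copy of whatever the parent had already loaded.

```python
import os

def fork_and_report():
    """Fork once; return (child PID seen by the parent, value fork() gave the child)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: fork() returned 0 here.
        os.close(read_fd)
        os.write(write_fd, str(pid).encode())
        os.close(write_fd)
        os._exit(0)  # exit right away without running the parent's cleanup code
    # Parent process: fork() returned the child's positive process ID.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        child_value = int(reader.read().decode())
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return pid, child_value

if __name__ == "__main__":
    parent_value, child_value = fork_and_report()
    print(parent_value > 0, child_value)
```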

I could have solved this problem in a variety of ways: stopping Apache completely, slimming Apache down by disabling a couple of modules, restricting the maximum number of worker processes more tightly, or switching to a more lightweight server altogether.
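
As a rough illustration of the "restrict the worker count" option, the arithmetic might look like the Python sketch below. Every number in it is hypothetical rather than measured on my server, and the helper name is made up. With Apache's prefork MPM, the result would be a starting point for the MaxClients directive (called MaxRequestWorkers in Apache 2.4).

```python
def max_workers(available_mb, per_worker_mb, reserve_mb=0):
    """Largest worker count whose combined memory fits in the budget."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = available_mb - reserve_mb
    # Never report a negative count if the reserve already exceeds the RAM.
    return max(usable_mb // per_worker_mb, 0)

# Hypothetical figures: 512 MB of RAM, 128 MB held back for the OS and
# database, and 20 MB per Apache worker.
print(max_workers(512, 20, reserve_mb=128))  # 19
```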

I decided to use the last solution and chose Nginx as a nimbler proxy for Apache. However, I was still locked out of my server by the flood of requests. Even after another forced reboot, the HTTP requests were too numerous for me to log in remotely and stop Apache: I had to put the server into recovery mode to set up the solution. I’m sorry if you were affected by this downtime.

The solution entailed using Nginx as the “official” Web server to serve static files where possible and to proxy other requests to a locally bound Apache. The configuration was fairly simple:

server {
    listen 50.56.79.101:80;
    server_name matthew.komputerwiz.net;

    location ~ /wp-content/uploads/ {
        root /path/to/webroot/;
    }

    location / {
        proxy_pass http://127.0.0.1:80/;
        proxy_redirect off;
        include /etc/nginx/proxy_params;
    }
}

It is worth noting that the first location block matches requests whose path contains the uploads directory and “points” them at the desired web root directory. Nginx appends the full request URI to the path specified in the root directive, so “/wp-content/uploads/” must not be repeated there.
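
The way root maps a request to a file can be approximated in a few lines of Python. This is a simplification of what Nginx does internally (no URI decoding or normalization), and the file name in the example is made up.

```python
def resolve_with_root(root, uri):
    """Approximate Nginx's root mapping: the whole request URI is appended to root."""
    # Drop a trailing slash on root so the joined path has no doubled slash.
    return root.rstrip("/") + uri

# Hypothetical request for an uploaded image:
print(resolve_with_root("/path/to/webroot/", "/wp-content/uploads/2011/09/memorial.jpg"))
# /path/to/webroot/wp-content/uploads/2011/09/memorial.jpg
```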

The key is to bind Nginx to the external IP address, pass the host name to Apache so it can do its name-based virtual host resolution (in the included proxy_params config),

proxy_set_header        Host            $host;
proxy_set_header        X-Real-IP       $remote_addr;
proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;

and then bind Apache to the loopback (localhost) interface.

NameVirtualHost *:80
Listen 127.0.0.1:80
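
To make the three proxy_set_header lines more concrete, here is a small Python sketch of the headers Apache ends up receiving. It is only a model: the function name is my own, the IP addresses are documentation examples, and the host is passed in directly rather than derived the way Nginx computes $host. The one subtle part is $proxy_add_x_forwarded_for, which appends the client address to any X-Forwarded-For header that arrived with the request.

```python
def proxy_headers(host, remote_addr, incoming_forwarded_for=None):
    """Model the headers set by the proxy_params snippet for one request."""
    if incoming_forwarded_for:
        # An earlier proxy already added the header: append this client.
        forwarded_for = f"{incoming_forwarded_for}, {remote_addr}"
    else:
        # No header on the incoming request: start the chain.
        forwarded_for = remote_addr
    return {
        "Host": host,
        "X-Real-IP": remote_addr,
        "X-Forwarded-For": forwarded_for,
    }

print(proxy_headers("matthew.komputerwiz.net", "203.0.113.7"))
```

Because Apache now sees every connection as coming from 127.0.0.1, these headers are the only place the original visitor's address survives.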

At this point, I reenabled Apache and did not have any memory issues for the rest of the day.

¹ In an operating system, forking is a low-level kernel routine that creates a clone of the executing process with the same code and memory layout. The process for which fork() returns a positive ID is the parent, and the process for which fork() returns 0 is the child.
² Since the worker processes are child processes, does this mean that Apache is guilty of child labor? :-P

How to rebuild an overloaded web server in a hurry