[vox-tech] Performance tuning for http file serving

Bill Broadley bill at broadley.org
Tue Apr 13 00:23:23 PDT 2010


On 03/31/2010 05:12 PM, Alex Mandel wrote:
> I'm looking for some references and tips on how to tune a server
> specifically for serving large files over the internet. ie 4 GB iso
> files. I'm talking software config tweaks here.

How many 4GB ISO files are there?  How many will be downloaded 
simultaneously, and by how many clients?  How fast is the uplink to, 
er, umm, wherever the clients are (on the internet)?

> The system is using a RAID with striping, the filesystem is XFS (some
> tuning there maybe?)

Your enemy is random access; I wouldn't expect XFS, ext2/3/4, JFS, or 
any of the many other alternatives to make much difference.

> and will be running Apache2.2 all on a Debian
> Stable install if that helps. It's got 2 2.5ghz cores and 8GB of ram
> (those can be adjusted since it's actually a kvm virtual machine).

Again, your enemy is random access.  To keep things simple let's just 
assume a single disk.  If you have one download and relatively current 
hardware you should be able to manage around 80-90MB/sec of sequential 
reads (assuming your network can keep up).
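
A quick way to sanity-check that sequential number is to time big 
sequential reads yourself.  A rough sketch in C (the path is a 
placeholder, timing is coarse, and the file should be bigger than RAM 
or read with a cold cache so you measure the disk and not the page 
cache):

    /* Sequential-read baseline: read a big file in 1MB chunks and
     * report MB/sec.  Path is a placeholder. */
    #define _FILE_OFFSET_BITS 64
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        static char buf[1024 * 1024];
        const char *path = "/srv/isos/test.iso";   /* placeholder */
        long long total = 0;
        ssize_t n;

        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        time_t start = time(NULL);
        while ((n = read(fd, buf, sizeof buf)) > 0)
            total += n;
        double secs = difftime(time(NULL), start);

        printf("%lld MB in %.0f sec = %.1f MB/sec\n",
               total >> 20, secs, secs > 0 ? (total >> 20) / secs : 0.0);
        close(fd);
        return 0;
    }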

What gets ugly is when you have 2 or more clients accessing 2 or more 
files.  Suddenly it becomes very important to handle your I/O 
intelligently.  Say you have 4 clients reading 4 ISO files, and a 
relatively stupid/straightforward I/O system: you read 4KB from each 
file (into user space), then do a send.
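
A minimal sketch of that kind of naive copy loop (hypothetical: file_fd 
and sock_fd are assumed to be an already-open ISO and a connected 
client socket, and partial writes are ignored for brevity):

    /* Naive per-client serving loop: read 4KB into user space, then
     * write it to the socket.  With several of these running at once
     * the disk seeks between files on nearly every 4KB read. */
    #include <unistd.h>

    static void serve_naive(int file_fd, int sock_fd)
    {
        char buf[4096];
        ssize_t n;

        while ((n = read(file_fd, buf, sizeof buf)) > 0)
            write(sock_fd, buf, (size_t)n);   /* partial writes ignored */
    }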

It turns out a single disk can only do 75 seeks/sec or so, which means 
you only get 75 * 4KB = 300KB/sec in aggregate.  Obviously you want to 
read ahead on the files.

You might assume that a "RAID with striping" will be much faster than a 
single disk on random workloads.  I suggest testing this with your 
favorite benchmark, something like postmark or fio.  Set it up to 
simulate the number of simultaneous random accesses and the size of 
files you expect to be involved.
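
If you want a quick sanity check before setting up a benchmark tool, 
something like the following rough sketch times random 4KB reads and 
reports reads/sec, which on a cold cache approximates seeks/sec.  The 
path and span are placeholders and there's no O_DIRECT, so drop the 
page cache first or use a file much bigger than RAM:

    /* Rough random-read test: N random 4KB preads over a large file,
     * report reads/sec.  Path and span are placeholders; span should
     * not exceed the actual file size. */
    #define _FILE_OFFSET_BITS 64
    #define _XOPEN_SOURCE 500
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/srv/isos/test.iso";      /* placeholder */
        const off_t span = 4LL * 1024 * 1024 * 1024;  /* ~4GB of file */
        const int reads = 1000;
        char buf[4096];
        int i;

        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        srand((unsigned)time(NULL));
        time_t start = time(NULL);
        for (i = 0; i < reads; i++) {
            off_t off = ((off_t)rand() % (span / 4096)) * 4096;
            if (pread(fd, buf, sizeof buf, off) < 0) {
                perror("pread");
                return 1;
            }
        }
        double secs = difftime(time(NULL), start);
        printf("%d reads in %.0f sec, %.0f reads/sec\n",
               reads, secs, secs > 0 ? reads / secs : 0.0);
        close(fd);
        return 0;
    }

fio will give you much better control over queue depth, block size, and 
the number of concurrent readers, so treat the above as a back-of-the- 
envelope check only.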

Once you fix that problem you run into others as the bandwidth starts 
increasing.  Normal configurations basically read(file...) and then 
write(socket...), which involves extra copies and context switches. 
Context switches are to kernel performance what random reads are to I/O 
performance.  Fixes include using mmap or sendfile.
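
As a rough sketch (again assuming an already-open file and a connected 
socket), serving with Linux's sendfile(2) looks something like this; 
the kernel moves the data straight from the page cache to the socket 
without it ever visiting user space:

    /* sendfile(2) sketch: have the kernel copy file_fd to the connected
     * socket sock_fd, avoiding the user-space buffer entirely. */
    #include <sys/types.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>

    static int serve_sendfile(int file_fd, int sock_fd)
    {
        struct stat st;
        off_t offset = 0;

        if (fstat(file_fd, &st) < 0)
            return -1;

        while (offset < st.st_size) {
            /* Send in ~1MB chunks; sendfile advances offset for us. */
            ssize_t sent = sendfile(sock_fd, file_fd, &offset, 1024 * 1024);
            if (sent <= 0)
                return -1;   /* error or unexpected end of file */
        }
        return 0;
    }

Apache does have an EnableSendfile directive, so you may already be 
getting this for static files; worth checking before switching servers.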

Which brings me to my next question... do you have a requirement for 
Apache, or is that just the path of least resistance?  Various servers 
are highly optimized for serving large static files quickly.  Tux 
springs to mind, although it is somewhat dated these days.  A decent 
summary is at:
    http://en.wikipedia.org/wiki/TUX_web_server

A more recent entry among the simple/fast web servers is nginx, which 
is fairly popular for a niche server.  Various popular sites like 
wordpress.com, hulu, and sourceforge use it.

