[vox-tech] Performance tuning for http file serving

Alex Mandel tech_dev at wildintellect.com
Tue Apr 13 01:09:36 PDT 2010


Bill Broadley wrote:
> On 03/31/2010 05:12 PM, Alex Mandel wrote:
>> I'm looking for some references and tips on how to tune a server
>> specifically for serving large files over the internet, i.e. 4GB ISO
>> files. I'm talking software config tweaks here.
> 
> How many 4GB ISO files are there?  How many simultaneous files?
> Clients?  How fast is the uplink to er, umm, wherever the clients are
> (on the internet)?
> 
Most clients will be downloading 1-3 files, and those are likely to be
the same files for everyone.

>> The system is using a RAID with striping, the filesystem is XFS (some
>> tuning there maybe?)
> 
> Your enemy is random access; I wouldn't expect XFS, ext2/3/4, JFS, or
> any of the many other alternatives to make much difference.
> 
>> and will be running Apache 2.2, all on a Debian
>> Stable install if that helps. It's got 2 2.5GHz cores and 8GB of RAM
>> (those can be adjusted since it's actually a KVM virtual machine).
> 
> Well, your enemy is random access.  To keep things simple, let's just
> assume a single disk.  If you have one download and relatively current
> hardware you should be able to manage around 80-90MB/sec (assuming your
> network can keep up).
> 
> What gets ugly is if you have 2 or more clients accessing 2 or more
> files.  Suddenly it becomes very, very important to handle your I/O
> intelligently.  Say you have 4 clients, reading 4 ISO files, and a
> relatively stupid/straightforward I/O system.  Say you read 4KB from
> each file (in user space), then do a send.
> 
> It turns out a single disk can only do 75 seeks/sec or so, which means
> you only get 75*4KB = 300KB/sec.  Obviously you want to read ahead on
> the files.
>
Is there a way to configure this?
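From a quick look at the man pages, read-ahead seems to be adjustable in
two places: per block device (blockdev --setra, or read_ahead_kb under
/sys/block), and per file from the application via posix_fadvise().  A
minimal sketch of the latter, just to see what the call looks like (the
/srv/isos/release.iso path is made up):

  /* Sketch: hint the kernel to read one file ahead aggressively.
   * Build: gcc -O2 -o fadvise fadvise.c */
  #define _XOPEN_SOURCE 600        /* for posix_fadvise() */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
      int fd, rc;

      fd = open("/srv/isos/release.iso", O_RDONLY);
      if (fd < 0) {
          perror("open");
          return 1;
      }

      /* Tell the kernel the whole file will be read sequentially, so it
       * can use a large read-ahead window instead of seek-sized reads. */
      rc = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
      if (rc != 0)
          fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

      /* ... the usual read()/send() or sendfile() loop would go here ... */
      close(fd);
      return 0;
  }

No idea yet whether Apache already issues hints like this on its own.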

> You might assume that a "RAID with striping" will be much faster than a
> single disk on random workloads.  I suggest testing this with your
> favorite benchmark, something like postmark or fio.  Set it up to
> simulate the number of simultaneous random accesses over the size of
> files you expect to be involved.
>
It's RAID 6 and there's no changing it.
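I can at least measure what the array does under random 4KB reads, though.
As a crude stand-in for that kind of fio job, something along these lines
should print a rough reads-per-second figure (the /srv/isos/test.iso
scratch file and the read count are made up; it needs to run against a
file bigger than RAM, or the page cache will flatter the result):

  /* Sketch: time random 4KB reads over one large file.
   * Build: gcc -O2 -o seektest seektest.c -lrt */
  #define _XOPEN_SOURCE 600
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <unistd.h>

  int main(void)
  {
      const char *path = "/srv/isos/test.iso";
      const int reads = 2000;
      char buf[4096];
      struct timespec t0, t1;
      double secs;
      off_t blocks;
      int fd, i;

      fd = open(path, O_RDONLY);
      if (fd < 0) { perror("open"); return 1; }

      blocks = lseek(fd, 0, SEEK_END) / 4096;
      if (blocks <= 0) { fprintf(stderr, "file too small\n"); return 1; }
      srand(1234);

      clock_gettime(CLOCK_MONOTONIC, &t0);
      for (i = 0; i < reads; i++) {
          /* one 4KB read at a random block-aligned offset */
          off_t off = ((off_t)rand() % blocks) * 4096;
          if (pread(fd, buf, sizeof buf, off) < 0) {
              perror("pread");
              return 1;
          }
      }
      clock_gettime(CLOCK_MONOTONIC, &t1);

      secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
      printf("%d random 4KB reads in %.1fs -> %.0f reads/sec\n",
             reads, secs, reads / secs);
      close(fd);
      return 0;
  }

Running a few copies of that at once would approximate the multiple-client
case described above.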

> Once you fix that problem you run into others as the bandwidth starts
> increasing.  Normal configurations often boil down to a read(file...)
> followed by a write(socket...).  This involves extra copies and context
> switches.  Context switches hurt kernel performance much like random
> reads hurt I/O performance.  Fixes include using mmap or sendfile.
> 
I'll have to read up more on those.
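From what I can tell from the man page, sendfile(2) has the kernel push
the file pages straight into the socket buffers, so nothing gets copied
through a user-space buffer at all.  A minimal sketch of the idea
(send_iso and connfd are just names I made up; the socket setup and
accept loop are assumed to exist elsewhere):

  /* Sketch: stream one file to an already-connected socket via sendfile().
   * For 4GB files on a 32-bit build, compile with -D_FILE_OFFSET_BITS=64. */
  #include <sys/sendfile.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int send_iso(int connfd, const char *path)
  {
      struct stat st;
      off_t offset = 0;
      int fd;

      fd = open(path, O_RDONLY);
      if (fd < 0) { perror("open"); return -1; }

      if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return -1; }

      while (offset < st.st_size) {
          /* the kernel advances offset by however much it managed to send */
          ssize_t sent = sendfile(connfd, fd, &offset, st.st_size - offset);
          if (sent < 0) { perror("sendfile"); close(fd); return -1; }
          if (sent == 0) break;      /* file shrank underneath us */
      }

      close(fd);
      return 0;
  }

From the Apache 2.2 docs it looks like the EnableSendfile directive (and
EnableMMAP for the mmap side) toggles exactly this, so it may just be a
matter of checking our existing config rather than writing anything.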

> Which brings me to my next question... do you have a requirement for
> apache or is that just the path of least resistance?  Various servers
> are highly optimized for serving large static files quickly.  Tux
> springs to mind, although it is somewhat dated these days.  A decent
> summary is at:
>    http://en.wikipedia.org/wiki/TUX_web_server
> 
> A more recent entry among the simple/fast web servers is nginx, fairly
> popular for a niche server.  Various popular sites like wordpress.com,
> hulu, and sourceforge use it.

Apache is the path of least resistance, as it's in use on all the machines
in this cluster of various web services and all the admins know how to
configure it. I'm open to exploring anything that's in Debian+Backports
and is current/supported. Most of the oddball things people have mentioned
so far are long-abandoned projects from 2005 and earlier. nginx sounds
promising and I'll look into that.

Thanks,
Alex

