[vox-tech] RAID systems

Tue Oct 19 22:20:18 PDT 2004

> > Measuring a real world workload in real world conditions.  Short
> > of that I'd recommend bonnie++ and "PostMark: A New File System
> > Benchmark"
> 
> Right now all I have been doing is cron'ing iostat to give me snapshots
> every few minutes.

Very reasonable, although that's only a snapshot.  Running iostat 60 or
iostat 600 will give you a more complete picture (24/7 totals instead of
occasional snapshots).

> Yea, the worst is always what I plan for with these sorts of things,
> but I guess no system is foolproof or failsafe.

Indeed, but offsite offline backups are a great place to start.

> The best idea I have of the population of files that will be stored is:
>  random.  I have general statistics, but they can change on even a
> daily basis.  Most of the storage would be for millions of <64k text
> files, but not always.

I like to run something like:
	http://broadley.org/bill/dirstat.pl

[root@localhost perl]# time ./dirstat.pl /
scanning /

Total directories =    25807
Total files       =   389283
Total size        =    98441.5 MB
Average Directory =       15.1 files and  3906.08 KB
Maximum Directory =     7522 files //dev
Average filesize  =      258.95 KB

real    0m21.077s
user    0m5.128s
sys     0m10.775s
[root@localhost perl]#

So things to look for:
* large directories might need application changes for smaller dirs,
  ext3 htrees, reiserfs or other support for large dirs.
* average file size (for inode allocation)
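
For anyone without Perl handy, the same kind of survey can be sketched in
Python.  This is not dirstat.pl itself, only a rough approximation of the
numbers it prints; the function name survey_tree and the keys it returns
are made up for illustration.

```python
import os

def survey_tree(root):
    """Walk root and collect rough directory and file statistics."""
    dirs = files = total_bytes = 0
    max_files, max_dir = 0, root
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += 1
        count = 0
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)   # lstat: do not follow symlinks
            except OSError:
                continue              # file vanished or is unreadable
            count += 1
            total_bytes += st.st_size
        files += count
        # Remember the directory holding the most files.
        if count > max_files:
            max_files, max_dir = count, dirpath
    return {
        "dirs": dirs,
        "files": files,
        "bytes": total_bytes,
        "avg_files_per_dir": files / dirs if dirs else 0.0,
        "avg_file_bytes": total_bytes / files if files else 0.0,
        "max_files": max_files,
        "max_dir": max_dir,
    }
```

Calling survey_tree("/some/project") and looking at max_files and
avg_file_bytes gives the two numbers listed above: how large the largest
directory is, and how small the typical file is.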

> > I believe ext3 will allocate additional inodes as needed, no need to
> > preallocate.
> > 
> 
> One of the previous raid systems (scsi hardware raid) that we had ran
> out of inodes (it was formatted ufs and ran in solaris) in the first
> month or two that we used it for production.  I just don't want to make
> the same mistake twice...

Ugh, indeed, I must have misremembered, or maybe I was thinking of a
different filesystem.  Never allocate more than one inode per block,
though; the extras will go to waste.
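
To put rough numbers on that: mke2fs sizes the inode table from a
bytes-per-inode ratio (its -i option), so the inode count is roughly the
filesystem size divided by that ratio.  A minimal sketch of the
arithmetic, ignoring reserved inodes and per-group rounding;
estimate_inodes and its parameters are names invented here, and the 1 TB
volume is just an example size.

```python
def estimate_inodes(fs_bytes, bytes_per_inode, block_size=4096):
    """Approximate inode count for a filesystem of fs_bytes."""
    # More than one inode per block is wasted on ext3: every non-empty
    # file needs at least one block, so clamp the ratio to the block size.
    ratio = max(bytes_per_inode, block_size)
    return fs_bytes // ratio

TB = 1000 ** 4
# One inode per 4 KB block versus one inode per 16 KB, on a 1 TB volume.
dense = estimate_inodes(TB, 4096)
sparse = estimate_inodes(TB, 16384)
print(dense, sparse)
```

Comparing the result against the file count you expect for the worst
project (the 3-million-small-files case) shows how much headroom a given
ratio leaves.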

> As mentioned before, pretty randomized populations, and there's a high
> degree of variance between projects.  Basically, we are sent huge
> populations of data, we process the data into different formats, and
> return it.  The input data are mostly correspondence (email, word docs,
> spreadsheets, etc), but that is generally just a rule of thumb...  The
> populations are simply moving targets that vary widely from each
> project, and that is all that I have to go on... :)

If you are ever stuck with a lack of inodes you can make a filesystem in a 
file and loop mount it.

> For some projects, there can be 3 million files where 99% are less than
> 4k in size.  For others there can be 3000 files where all are more than

mkfs.ext3 -T news will allocate one inode per 4 KB block.

> 128k.  Most fall somewhere in between.  Knowing exact numbers would
> mean that I could tell the future and know what would be coming in the
> door (which would be cool...).

Heh.

> Again, here is my dilemma.  I just chose something that would hopefully
> be "good enough(tm)" to use everyday, and something that would handle 30
> gazillion 2k files (I for-sure know there will be gazillions of emails,
> most of which are less than 2k, what I don't know is the ratio of
> smaller files to larger files).

Files smaller than the block size aren't coalesced afaik, so each one
still takes a full block.  You might need another fs if that matters; on
the other hand, you can set 1k or 2k blocks.
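
A quick way to see what the block size costs is to total the slack (the
unused tail of each file's last block) for a sample of file sizes.  The
sketch below assumes no tail packing, as described above; the helper
names and the 1500-byte email population are hypothetical, not measured
data.

```python
def allocated_bytes(file_size, block_size):
    """Bytes of data blocks consumed by one file when tails are not packed."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size

def slack_report(file_sizes, block_sizes=(1024, 2048, 4096)):
    """Return {block_size: wasted_bytes} for a list of file sizes."""
    payload = sum(file_sizes)
    return {
        bs: sum(allocated_bytes(size, bs) for size in file_sizes) - payload
        for bs in block_sizes
    }

# Hypothetical population: 100,000 emails of 1500 bytes each.
report = slack_report([1500] * 100_000)
for bs, wasted in report.items():
    print(f"{bs:5d}-byte blocks: {wasted / 1e6:.1f} MB of slack")
```

Feeding it real sizes from a past project (for example, collected during
a directory scan) would give a better idea of whether 1k or 2k blocks
are worth it.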

> I have a triple supply on the drive cabinet and a double supply on the
> box, all fed by UPS.  

Nice.

-- 
Bill Broadley
Computational Science and Engineering
UC Davis