[vox-tech] RAID systems

Jan W jcwynholds at yahoo.com
Wed Oct 20 11:48:44 PDT 2004


--- Bill Broadley <bill at cse.ucdavis.edu> wrote:

> > > Measuring a real world workload in real world conditions.  Short
> > > of that I'd recommend bonnie++ and "PostMark: A New File System
> > > Benchmark"
> > 
> > Right now all I have been doing is cron'ing iostat to give me snapshots
> > every few minutes.
> 
> Very reasonable.  Although that's a snapshot.  iostat 60 or iostat 600
> will give you a more complete picture (24/7 totals instead of occasional
> snapshots).

Again, thanks for the great tip.

> 
> > Yea, the worst is always what I plan for with these sorts of things,
> > but I guess no system is foolproof or failsafe.
> 
> Indeed, but offsite offline backups are a great place to start.

I'll be talking to my boss about this today.

> 
> > The best idea I have of the population of files that will be stored is:
> > random.  I have general statistics, but they can change on even a
> > daily basis.  Most of the storage would be for millions of <64k text
> > files, but not always.
> 
> I like to run something like:
> 	http://broadley.org/bill/dirstat.pl
> 
> [root@localhost perl]# time ./dirstat.pl /
> scanning /
>  
> Total directories =    25807
> Total files       =   389283
> Total size        =    98441.5 MB
> Average Directory =       15.1 files and  3906.08 KB
> Maximum Directory =     7522 files //dev
> Average filesize  =      258.95 KB
>  
> real    0m21.077s
> user    0m5.128s
> sys     0m10.775s
> [root@localhost perl]#
> 
> So things to look for:
> * large directories might need application changes for smaller dirs,
>   ext3 htrees, reiserfs or other support for large dirs.
> * average file size (for inode allocation)
> 

Pretty handy little tool there.  
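
For anyone who can't fetch dirstat.pl, here is a rough Python sketch that
gathers the same kind of totals as the output quoted above.  It is not
Bill's script; the function name dirstat and the dictionary keys are my
own, and it counts apparent file sizes (st_size) rather than blocks
actually allocated on disk.

```python
import os


def dirstat(root):
    """Collect directory and file totals under root (hypothetical helper)."""
    total_dirs = 0
    total_files = 0
    total_bytes = 0
    max_count, max_path = 0, root
    for dirpath, _dirnames, filenames in os.walk(root):
        total_dirs += 1
        count = 0
        for name in filenames:
            try:
                # lstat so a symlink is measured itself, not its target
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            count += 1
            total_bytes += size
        total_files += count
        if count > max_count:
            max_count, max_path = count, dirpath
    return {
        "dirs": total_dirs,
        "files": total_files,
        "bytes": total_bytes,
        "avg_files_per_dir": total_files / total_dirs if total_dirs else 0.0,
        "avg_file_bytes": total_bytes / total_files if total_files else 0.0,
        "max_dir": (max_count, max_path),
    }


if __name__ == "__main__":
    stats = dirstat(".")
    print("Total directories = %8d" % stats["dirs"])
    print("Total files       = %8d" % stats["files"])
    print("Total size        = %10.1f MB" % (stats["bytes"] / 2**20))
    print("Average Directory = %10.1f files" % stats["avg_files_per_dir"])
    print("Maximum Directory = %8d files %s" % stats["max_dir"])
    print("Average filesize  = %10.2f KB" % (stats["avg_file_bytes"] / 1024))
```

The two numbers to watch are the same ones Bill lists: the largest
directory (for htree/reiserfs decisions) and the average file size (for
choosing an inode ratio).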

> > > I believe ext3 will allocate additional inodes as needed, no need to
> > > preallocate.
> > > 
> > 
> > One of the previous raid systems (scsi hardware raid) that we had ran
> > out of inodes (it was formatted ufs and ran in solaris) in the first
> > month or two that we used it for production.  I just don't want to make
> > the same mistake twice...
> 
> Ugh, indeed, I must have misremembered, or maybe I was remembering the
> wrong filesystem.  Never allocate more than one inode per block though,
> they will go to waste.
> 
> > As mentioned before, pretty randomized populations, and there's a high
> > degree of variance between projects.  Basically, we are sent huge
> > populations of data, we process the data into different formats, and
> > return it.  The input data are mostly correspondence (email, word docs,
> > spreadsheets, etc), but that is generally just a rule of thumb...  The
> > populations are simply moving targets that vary widely from each
> > project, and that is all that I have to go on... :)
> 
> If you are ever stuck with a lack of inodes you can make a filesystem
> in a file and loop mount it.

Just make sure that you have one free inode to make the file...

> 
> > For some projects, there can be 3 million files where 99% are less than
> > 4k in size.  For others there can be 3000 files where all are more than
> 
> mkfs.ext3 -T news will make one inode per 4kb block.

Exactly the option I used, I must have forgotten to post that.

Actually, as I recall, it was:

[root@localhost /]# mkfs.ext3 -b 4096 -R stride=16 -T news /dev/md0
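
To put rough numbers on what those options imply, here is a small Python
sketch of the arithmetic.  The 1 TiB array size is purely hypothetical
(the real size of /dev/md0 isn't stated here), the 128-byte inode size is
an assumption about the ext3 default of the time, and the real mke2fs
rounds per block group, so treat the results as approximate.

```python
def inode_plan(fs_bytes, bytes_per_inode=4096, inode_size=128):
    """Approximate inode count and inode-table size for a filesystem.

    bytes_per_inode=4096 corresponds to one inode per 4 KiB block
    (what "-T news" was described as giving above).
    inode_size=128 is an assumed on-disk inode size.
    """
    inodes = fs_bytes // bytes_per_inode
    table_bytes = inodes * inode_size
    return inodes, table_bytes


def chunk_bytes(stride_blocks, block_size):
    """RAID chunk size implied by a stride given in filesystem blocks."""
    return stride_blocks * block_size


if __name__ == "__main__":
    fs_bytes = 2**40  # hypothetical 1 TiB array
    inodes, table = inode_plan(fs_bytes)
    print("inodes:        %d" % inodes)
    print("inode tables:  %.1f GiB (%.3f%% of the fs)"
          % (table / 2**30, 100.0 * table / fs_bytes))
    # stride=16 with 4096-byte blocks matches a 64 KiB RAID chunk
    print("chunk size:    %d KiB" % (chunk_bytes(16, 4096) // 1024))
```

With one inode per block the inode tables cost about 3% of the disk under
these assumptions, which seems a fair price for not running out of inodes
again.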

> 
> > 128k.  Most fall somewhere in between.  Knowing exact numbers would
> > mean that I could tell the future and know what would be coming in the
> > door (which would be cool...).
> 
> Heh.
> 
> > Again, here is my dilemma.  I just chose something that would hopefully
> > be "good enough(tm)" to use every day, and something that would handle 30
> > gazillion 2k files (I for-sure know there will be gazillions of emails,
> > most of which are less than 2k, what I don't know is the ratio of
> > smaller files to larger files).
> 
> Files smaller than the block size aren't coalesced afaik, so you might
> need another fs if you need that; on the other hand you can set 1k or 2k
> blocks.

For some cases, I might think about reducing the block size, but for
now, I am just going to sit with 4k blocks until there is a good reason
to move to a smaller size.  Even before I think of reducing block
sizes,  I might think about trying reiserfs if the performance for
small files isn't great.  I might be building a few more of these boxes
soon, so that is one thing I am still going to try out.  I think I'll
try it out just to get a comparison between the two fs's.
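
For a feel of what the block size costs on small files, here is a Python
sketch of the slack-space arithmetic.  It assumes every file occupies
whole blocks (no tail packing, as Bill notes above) and ignores indirect
blocks and other metadata; the 3 million x 2k population is a made-up
example loosely based on the email-heavy projects described earlier.

```python
def allocated_bytes(file_size, block_size):
    """Bytes consumed on disk when a file is rounded up to whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def slack_fraction(sizes, block_size):
    """Fraction of allocated space that holds no file data."""
    data = sum(sizes)
    allocated = sum(allocated_bytes(s, block_size) for s in sizes)
    if allocated == 0:
        return 0.0
    return (allocated - data) / allocated


if __name__ == "__main__":
    # Hypothetical population: 3 million files of 2 KiB each
    count, size = 3000000, 2048
    for block_size in (1024, 2048, 4096):
        used = count * allocated_bytes(size, block_size)
        waste = slack_fraction([size], block_size)
        print("block %4d: %6.2f GiB allocated, %4.1f%% slack"
              % (block_size, used / 2**30, 100.0 * waste))
```

Feeding it real sizes from an existing project (for example from a
directory walk) would show whether 4k blocks are actually hurting before
bothering to reformat.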

> 
> > I have a triple supply on the drive cabinet and a double supply on the
> > box, all fed by UPS.
> 
> Nice.
> 

Thanks for all the help and advice, Bill.  I humbly bow to your wise
words :) 

--thanks much

jan

> -- 
> Bill Broadley
> Computational Science and Engineering
> UC Davis
> 




		

