[vox-tech] Perl - reading fixed width formats

Peter Jay Salzman p at dirac.org
Thu Aug 13 07:50:58 PDT 2009


Hi all,

I need to read in ~25000 files whose lines are in a fixed-width format:

   field 1: line 1, chars 1-4
   field 2: line 1, chars 5-6
   field 3: line 1, chars 7-11
   field 4: line 1, chars 12 to EOL
   field 5: line 2, chars 1-30
   field 6: line 3, chars 1-10
   field 7: line 4, chars 1-2
   ...

The naive method is looping over each file, incrementally reading 4 chars, 2
chars, 5 chars, etc.  However, the slowest part of all this (I would think)
would be the constant disk access for each field for each file.

There are ~25000 files, 1.7GB altogether.  The machine I'm on has 32GB of
memory.  It only took 27s to read all 25k files into a Perl array on a
typically loaded machine.

I can imagine some fancy scenario where I slurp all the files into an array
and then process them, but there are many different ways of doing this.

Am I correct that reading everything into memory in one fell swoop will make
a noticeable difference in run time?
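
For concreteness, a minimal sketch of the slurp-then-process idea.  The helper
name slurp_lines and the use of @ARGV as the file list are assumptions made
for illustration only; they are not part of the setup described above.

```perl
use strict;
use warnings;

# Hypothetical helper: read one file into an array of lines with a single
# list-context read, so fields can be cut out of memory afterwards.
sub slurp_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);    # read every line, then strip the newlines
    close $fh;
    return \@lines;
}

# Map each file name to a reference to its lines.
# Assumption: the list of files is passed on the command line.
my %lines_for = map { $_ => slurp_lines($_) } @ARGV;
```

With this layout, $lines_for{$file}[0] is line 1 of a file, $lines_for{$file}[1]
is line 2, and so on.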

And once everything is in an array, what's the most efficient way to take a
string representing one line of a file, and break it down into fields?  Is
there anything faster than substr()?
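
One candidate to compare against substr() is Perl's built-in unpack(), which
splits a whole line with a single template.  The sketch below cuts "line 1"
from the layout above both ways; the sample string is made up, and which
approach is faster on the real data would have to be measured (for example
with the core Benchmark module).

```perl
use strict;
use warnings;

# Made-up sample of "line 1": chars 1-4, 5-6, 7-11, then 12 to EOL.
my $line = "20090800042rest of line";

# substr(): one call per field, using zero-based offsets.
my @by_substr = (
    substr($line, 0, 4),     # chars 1-4
    substr($line, 4, 2),     # chars 5-6
    substr($line, 6, 5),     # chars 7-11
    substr($line, 11),       # chars 12 to EOL
);

# unpack(): one call per line.  A4 A2 A5 take fixed widths, A* takes the rest.
my @by_unpack = unpack 'A4 A2 A5 A*', $line;

print join('|', @by_unpack), "\n";
```

Note that the "A" format strips trailing spaces from each field; the "a"
format keeps them, if padding matters.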

Thanks!
Pete

