[vox-tech] Perl - reading fixed width formats
Peter Jay Salzman
p at dirac.org
Thu Aug 13 07:50:58 PDT 2009
Hi all,
I need to read in ~25000 files whose lines are in a fixed-width format:
field 1: line 1, chars 1-4
field 2: line 1, chars 5-6
field 3: line 1, chars 7-11
field 4: line 1, chars 12 to EOL
field 5: line 2, chars 1-30
field 6: line 3, chars 1-10
field 7: line 4, chars 1-2
...
The naive method is to loop over each file, incrementally reading 4 chars, 2
chars, 5 chars, etc. However, the slowest part of all this (I would think)
would be the constant disk access for each field in each file.
There are ~25000 files. Altogether they're 1.7GB. The machine I'm on has
32GB of memory. It took only 27s to read all 25k files into a Perl array on
a typically loaded machine.
I can imagine some fancy scenario where I slurp in all files into an array,
and then process, but there are many different ways of doing this.
Am I correct that reading everything into memory in one fell swoop will make
a noticeable difference in run time?
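The slurp-then-process idea above can be sketched roughly as follows. This is
only an illustration, not a measured answer: the helper name slurp_lines is
made up for this example, and it assumes newline-terminated text files. Perl
filehandles are buffered, so even line-by-line reads do not go to disk once
per field, and the real gain from slurping would have to be timed.

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire file with a single <$fh> call by
# setting $/ (the input record separator) to undef, then split the
# buffer into lines in memory.
sub slurp_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                      # undef => <$fh> returns the whole file
    my $content = <$fh>;
    close $fh;
    return split /\n/, $content;   # lines without their trailing newlines
}
```

Called as my @lines = slurp_lines($file), this gives line 1 in $lines[0],
line 2 in $lines[1], and so on, which matches the per-line field layout
listed earlier.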
And once everything is in an array, what's the most efficient way to take a
string representing one line of a file, and break it down into fields? Is
there anything faster than substr()?
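For reference, the usual alternative to substr() for fixed-width fields is
unpack() with a template. Below is a minimal sketch for line 1 of the layout
above, next to the equivalent substr() calls. The function names and the
sample string are invented for illustration, and nothing here shows which
one is faster.

```perl
use strict;
use warnings;

# Template for line 1 of the layout in this post:
#   A4 = chars 1-4, A2 = chars 5-6, A5 = chars 7-11, A* = char 12 to EOL.
# 'A' strips trailing spaces from each field; use 'a' to keep them.
my $LINE1_TEMPLATE = 'A4 A2 A5 A*';

# Split line 1 into its four fields with a single unpack call.
sub parse_line1_unpack {
    my ($line) = @_;
    return unpack $LINE1_TEMPLATE, $line;
}

# The same split done with substr, for comparison (offsets are 0-based).
sub parse_line1_substr {
    my ($line) = @_;
    return (
        substr($line, 0, 4),
        substr($line, 4, 2),
        substr($line, 6, 5),
        substr($line, 11),
    );
}

my $sample = '20090813042some description';   # made-up data, not from a real file
my @fields = parse_line1_unpack($sample);
print join('|', @fields), "\n";   # prints 2009|08|13042|some description
```

Which of the two is quicker on real data would need to be measured, for
example with cmpthese() from the core Benchmark module.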
Thanks!
Pete