[vox-tech] Another Perl CSV question - long lines = segfault
Bill Kendrick
vox-tech@lists.lugod.org
Sat, 18 Oct 2003 12:01:42 -0700
Thanks to the folks who helped me 'fix' the CSV file the other day.
I'm now stumbling over another problem, where Perl segfaults on long
lines. If I skip CSV lines longer than 3500 characters, it goes on its
merry way. Lines of around 4000 characters or more cause Perl itself to die
(segfault) when I try to parse them using the following code (from the
"Perl Cookbook"):
sub parse_csv
{
    my $text   = shift;
    my @fields = ();
    my $field;

    while ($text =~ m{
        # Either some non-quote/non-comma text:
        ( [^"',]+ )

        # Or...
        |

        # ...a double-quoted field:
        "            # field's opening quote; don't save this
        (            # now a field is either:
          (?: [^"]   # non-quotes
            |        # or
            ""       # adjacent quote pairs
          )*         # any number
        )
        "            # field's closing quote; don't save this either
    }gx)
    {
        if (defined $1) { $field = $1; }
        else            { ($field = $2) =~ s/""/"/g; }
        push @fields, $field;
    }

    return @fields;
}
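For what it's worth, on short made-up lines it behaves exactly as expected,
quotes and all:

my $line   = 'plain,"quoted, with comma","embedded ""quotes"" here"';
my @fields = parse_csv($line);
# @fields is now:
#   ('plain', 'quoted, with comma', 'embedded "quotes" here')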
At first, I thought it was dying due to character combos (commas, quotes,
double-quotes, or what-have-you), but then I started examining line lengths
just before running the parse subroutine.
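The skip I mentioned above is nothing fancy, by the way; it's just a length
check on each line before calling the parser, roughly like this:

while (my $line = <F>)
{
    chomp $line;
    next if (length($line) > 3500);   # skipping these avoids the segfault
    my @fields = parse_csv($line);
    # ... process @fields here ...
}

That obviously isn't a real fix, though, since it throws away the long records.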
Looking at an strace, I can see that when I do this:
open(F, $somefile);
while (<F>)
{
    ...
}
...it's only reading 4096 bytes at a time.
Is there a way to increase that buffer, so that I can ensure it reads,
say, up to 20,000 bytes? (The longest line in the file seems to be around
14,000 bytes; at least, that's what "wc -L" tells me.)
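One experiment I may try: pulling a big chunk in with sysread() instead of
<F>, just to see whether asking for more than 4096 bytes per call makes any
difference. Totally untested sketch:

open(F, $somefile) or die "can't open $somefile: $!";
my $chunk = '';
sysread(F, $chunk, 20_000);   # ask the OS for up to 20,000 bytes in one read()
# Note: this ignores line boundaries completely, so I'd still have to split
# $chunk into lines myself before handing the pieces to parse_csv().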
In the meantime, I'll Google, since the books I'm looking at don't have
good examples of this. ;)
Thx!
-bill!
--
bill@newbreedsoftware.com Got kids? Get Tux Paint!
http://newbreedsoftware.com/bill/ http://newbreedsoftware.com/tuxpaint/