[vox-tech] simple parsing of multiple-line records... [solved]
Dylan Beaudette
dylan at iici.no-ip.org
Sun Apr 24 15:25:10 PDT 2005
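Thanks again, Mark, for the excellent examples.
After a bit of reading in an O'Reilly book on AWK, it seems that
multiple-line records aren't as hard as I was making them:
awk 'BEGIN {FS = "\n"; RS = ""} ... rest of script'
did the job! Setting RS to the empty string puts AWK in "paragraph
mode", where records are separated by blank lines, and FS = "\n"
makes each line of a record its own field ($1, $2, ...).
Thanks!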
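For comparison, here is a rough Python sketch of what those two settings do. Python is used only because the original question mentions it. The sketch assumes records are separated by one or more empty lines, as with gawk's RS = "", and treats each line of a record as one field, like FS = "\n". The helper name `paragraph_records` and the sample data are illustrative. The dashed lines from the original post are left out on the assumption that they only framed the example.

```python
import re

# Sample input in the layout from the original question (dashed framing lines omitted).
SAMPLE = """easting: 661674.9375
northing: 4035004.0000
elevation: 968.8617
distance along surface: 15.9540

easting: 661683.7500
northing: 4034946.7500
elevation: 961.4768
distance along surface: 58.4077
"""


def paragraph_records(text):
    """Rough analogue of AWK's RS = "" plus FS = "\\n" (hypothetical helper).

    Records are separated by runs of empty lines; each line of a record
    becomes one field, so fields[0] plays the role of $1, fields[1] of $2.
    """
    # Paragraph mode ignores leading and trailing newlines, so strip them first.
    blocks = re.split(r"\n\n+", text.strip("\n"))
    return [block.split("\n") for block in blocks if block]


for nr, fields in enumerate(paragraph_records(SAMPLE), start=1):
    # nr mirrors AWK's NR and len(fields) mirrors NF.
    print(nr, len(fields), fields[0])
```

Each field is still a whole "label: value" line, so a real script would split it further (in AWK, for example, with split()).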
--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341
On Apr 22, 2005, at 12:35 AM, Mark K. Kim wrote:
> AWK was made for what you're trying to do. I don't know AWK all that
> well, but here's a very simple example that will produce a one-line
> record for each record in the data you gave. This works with GAWK, and
> may or may not work with pure AWK:
>
> /easting/ { printf("rec=%d easting=%s ", rec, $2); }
> /northing/ { printf("northing=%s ", $2); }
> /elevation/ { printf("elevation=%s ", $2); }
> /distance along surface/ { printf("dist=%s\n", $4); rec++; }
>
> After creating the above as a file (call it "parse.awk"), you can parse
> your data (assuming the data is in a file called "data.txt") like this:
>
> $ gawk -f parse.awk data.txt
>
> which will produce the following:
>
> rec=0 easting=661674.9375 northing=4035004.0000 elevation=968.8617 dist=15.9540
> rec=1 easting=661683.7500 northing=4034946.7500 elevation=961.4768 dist=58.4077
>
> where "rec" is the record number and the remaining fields are the
> data. This code makes several assumptions (data ordering and so forth),
> SO YOU SHOULDN'T USE THE ABOVE CODE in production; use it only as a
> reference for writing more robust code.
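A rough Python translation of the first parse.awk may help anyone following the Python suggestion. It keeps the same assumptions: the four labels always appear in the same order, each record ends at its distance line, and the value sits in AWK field $2 or $4 after whitespace splitting. The names `awk_field` and `simple_parse` are hypothetical.

```python
# Sample input in the layout from the original question.
SAMPLE = """easting: 661674.9375
northing: 4035004.0000
elevation: 968.8617
distance along surface: 15.9540

easting: 661683.7500
northing: 4034946.7500
elevation: 961.4768
distance along surface: 58.4077
"""


def awk_field(fields, n):
    """Return AWK-style $n (1-based); a missing field is the empty string."""
    return fields[n - 1] if 0 < n <= len(fields) else ""


def simple_parse(lines):
    """Mimic the first parse.awk: build one output line per record.

    Like the AWK version, this assumes every record lists easting, northing,
    elevation and distance in that order, and ends at the distance line.
    """
    out, parts, rec = [], [], 0
    for line in lines:
        fields = line.split()  # like AWK's default FS: runs of whitespace
        # Separate ifs, not elif: AWK tests every pattern against every line.
        if "easting" in line:
            parts.append(f"rec={rec} easting={awk_field(fields, 2)}")
        if "northing" in line:
            parts.append(f"northing={awk_field(fields, 2)}")
        if "elevation" in line:
            parts.append(f"elevation={awk_field(fields, 2)}")
        if "distance along surface" in line:
            parts.append(f"dist={awk_field(fields, 4)}")
            out.append(" ".join(parts))
            parts, rec = [], rec + 1
    return out


for row in simple_parse(SAMPLE.splitlines()):
    print(row)
```

Because it only reacts to substrings, it shares the AWK version's weakness: a record with a missing or reordered line silently produces a malformed output row.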
>
> Here's some code that's a little more robust, but you should still
> customize it for your data:
>
> # Check whether we have a complete record, and print it if we do.
> function rec_chk() {
>     # If I have all the data for a record, print the record.
>     # Compare with "" rather than testing truth: an input value such as
>     # "0.0000" looks numeric, so as a bare condition it would count as false.
>     if (easting != "" && northing != "" && elevation != "" && dist != "") {
>         # Print the record
>         print_rec();
>
>         # Reset all data and increment the record counter
>         easting = northing = elevation = dist = "";
>         rec++;
>     }
> }
>
> # Print the record.
> function print_rec() {
>     printf("%d %s %s %s %s\n", rec, easting, northing, elevation, dist);
> }
>
> # Data updates
> /easting/ { easting=$2; rec_chk(); }
> /northing/ { northing=$2; rec_chk(); }
> /elevation/ { elevation=$2; rec_chk(); }
> /distance along surface/ { dist=$4; rec_chk(); }
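The same order-independent idea can be sketched in Python. This is a hypothetical translation, not a drop-in replacement. It assumes the same four labels and field positions as the AWK code, stores each value as soon as its line appears, and emits a record once all four are non-empty. Like the AWK version, it tests for empty strings so that a value such as 0.0000 still counts as present. `PATTERNS`, `KEYS` and `robust_parse` are illustrative names.

```python
# (substring to look for, name to store it under, 1-based AWK field number)
PATTERNS = [
    ("easting", "easting", 2),
    ("northing", "northing", 2),
    ("elevation", "elevation", 2),
    ("distance along surface", "dist", 4),
]
KEYS = ("easting", "northing", "elevation", "dist")


def robust_parse(lines):
    """Collect values in any order; emit "rec easting northing elevation dist"
    each time all four have been seen, then reset and bump the counter."""
    rec = 0
    cur = dict.fromkeys(KEYS, "")
    out = []
    for line in lines:
        fields = line.split()
        for pattern, key, n in PATTERNS:
            if pattern not in line:
                continue
            cur[key] = fields[n - 1] if n <= len(fields) else ""
            # rec_chk(): an empty string means "not seen yet"; "0.0000" counts.
            if all(cur[k] != "" for k in KEYS):
                out.append(" ".join([str(rec)] + [cur[k] for k in KEYS]))
                cur = dict.fromkeys(KEYS, "")
                rec += 1
    return out


# Fields deliberately out of order, with a zero distance.
for row in robust_parse([
    "distance along surface: 0.0000",
    "elevation: 968.8617",
    "northing: 4035004.0000",
    "easting: 661674.9375",
]):
    print(row)
```

Driving the matches from a table keeps the label-to-field mapping in one place, so adapting it to other data means editing `PATTERNS` and `KEYS` rather than the loop.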
>
> There are a lot of built-in functions if you want to do math
> operations, variable type conversions, or whatever else you can think
> of. See `info gawk` under "functions" or "library functions" for more
> information.
>
> I hope that gets you started.
>
> -Mark
>
> PS: I usually just use Perl rather than AWK for something like this --
> Perl has more support, is more flexible, and I know it better than AWK!
> GAWK sure does make simple parsing like this easier if you know how,
> though!
>
> PPS: A friend of mine who went to Columbia tells me he took a class
> from "A" (Prof. Alfred Aho) of "AWK"... =P How funny!
>
>
> On Thu, 21 Apr 2005, Dylan Beaudette wrote:
>
>> Hi everyone,
>>
>> This might be a really simple question, but:
>>
>>
>> I have a text file with multiple-line records in a format like this:
>>
>> ----------------------------
>> easting: 661674.9375
>> northing: 4035004.0000
>> elevation: 968.8617
>> distance along surface: 15.9540
>>
>> easting: 661683.7500
>> northing: 4034946.7500
>> elevation: 961.4768
>> distance along surface: 58.4077
>> -----------------------------
>>
>>
>> I am familiar with single-line records and work with them in AWK quite
>> frequently. However, I have yet to come upon an elegant solution to
>> parsing something like the above file, such that I can assign each
>> field in a multi-line record to a unique variable. A solution in AWK
>> would be ideal, but I am open to tackling this problem with something
>> like Python.
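Since Python is mentioned as an option, here is one hypothetical way to give every field its own name: read blank-line-separated blocks of "label: value" lines into one dictionary per record, keyed by the label. It assumes each field line has its label before the first colon and skips lines without a colon (such as the dashed lines in the sample). `parse_records` is an illustrative name, not an existing library function.

```python
SAMPLE = """----------------------------
easting: 661674.9375
northing: 4035004.0000
elevation: 968.8617
distance along surface: 15.9540

easting: 661683.7500
northing: 4034946.7500
elevation: 961.4768
distance along surface: 58.4077
-----------------------------
"""


def parse_records(text):
    """Return one dict per blank-line-separated record, mapping each
    label (text before the first colon) to its value (text after it)."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current record, if one is in progress.
            if current:
                records.append(current)
                current = {}
            continue
        label, sep, value = line.partition(":")
        if sep:  # assumption: lines without a colon are decoration, so skip them
            current[label.strip()] = value.strip()
    if current:  # the last record may not be followed by a blank line
        records.append(current)
    return records


for rec in parse_records(SAMPLE):
    # Values are still strings; convert with float() before doing arithmetic.
    print(rec["easting"], rec["northing"], float(rec["elevation"]),
          rec["distance along surface"])
```

From there, `easting = rec["easting"]` and so on gives each field its own variable, and new labels are picked up without changing the parser.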
>>
>>
>> Any ideas?
>>
>> Thanks!
>>
>> Dylan
>>
>>
>> _______________________________________________
>> vox-tech mailing list
>> vox-tech at lists.lugod.org
>> http://lists.lugod.org/mailman/listinfo/vox-tech
>>
>
> --
> Mark K. Kim
> AIM: markus kimius
> Homepage: http://www.cbreak.org/
> Xanga: http://www.xanga.com/vindaci
> Friendster: http://www.friendster.com/user.php?uid=13046
> PGP key fingerprint: 7324 BACA 53AD E504 A76E 5167 6822 94F0 F298 5DCE
> PGP key available on the homepage