[vox-tech] simple parsing of multiple-line records... [solved]

Dylan Beaudette dylan at iici.no-ip.org
Sun Apr 24 15:25:10 PDT 2005


Thanks again Mark, for the excellent examples.

After a bit of reading in an O'Reilly book on AWK, it seems that
multiple-line records aren't as hard as I was making them out to be:

awk ' BEGIN {FS = "\n"; RS = ""}... rest of script'

did the job!
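
For the archive, here is a minimal sketch of how that BEGIN block might be fleshed out for the sample data quoted further down. It is not the script I actually ran: the file names data.txt and records.awk, the array v, and the loop that splits each line at the first ":" are all assumptions made for illustration. It also assumes the dashed lines in the quoted sample are not part of the file.

```shell
# Recreate the sample data (assumed file name: data.txt).
cat > data.txt <<'EOF'
easting:      661674.9375
 northing:     4035004.0000
 elevation:         968.8617
distance along surface:               15.9540

 easting:      661683.7500
 northing:     4034946.7500
 elevation:         961.4768
distance along surface:               58.4077
EOF

# Hypothetical script: RS = "" makes each blank-line-separated block one
# record, and FS = "\n" makes each line of that block one field.
cat > records.awk <<'EOF'
BEGIN { FS = "\n"; RS = "" }
{
    split("", v)                        # portable way to empty the array
    for (i = 1; i <= NF; i++) {
        n = index($i, ":")              # split "label: value" at the first colon
        if (n == 0) continue
        key = substr($i, 1, n - 1)
        val = substr($i, n + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim surrounding whitespace
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        v[key] = val
    }
    # NR is the record number, counting from 1.
    printf("%d %s %s %s %s\n", NR, v["easting"], v["northing"],
           v["elevation"], v["distance along surface"])
}
EOF

awk -f records.awk data.txt
```

With the two sample records this should print one line per record, e.g. "1 661674.9375 4035004.0000 968.8617 15.9540". Looking values up by label means the order of the lines within a record should not matter.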

thanks!

--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341



On Apr 22, 2005, at 12:35 AM, Mark K. Kim wrote:

> AWK was made for what you're trying to do.  I don't know AWK all that
> well, but here's a very simple example that will produce a line-record
> from the data you gave.  This works with GAWK, and may or may not work
> with pure AWK:
>
>   /easting/ { printf("rec=%d easting=%s ", rec, $2); }
>   /northing/ { printf("northing=%s ", $2); }
>   /elevation/ { printf("elevation=%s ", $2); }
>   /distance along surface/ { printf("dist=%s\n", $4); rec++; }
>
> After creating the above as a file (call it "parse.awk"), you can parse
> your data (assuming the data is in a file called "data.txt") like this:
>
>   $ gawk -f parse.awk data.txt
>
> which will produce the following:
>
>   rec=0 easting=661674.9375 northing=4035004.0000 elevation=968.8617 dist=15.9540
>   rec=1 easting=661683.7500 northing=4034946.7500 elevation=961.4768 dist=58.4077
>
> where "rec" is the record number, followed by the rest of the data.
> This code makes several assumptions, such as the ordering of the data, SO
> YOU SHOULDN'T USE THE ABOVE CODE in production; use it only as a
> reference for writing more robust code.
>
> Here's some code that's a little more robust, but you should still
> customize it for your data:
>
>   # Check to see whether we want to print the record, then print if we do.
>   function rec_chk() {
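>      # If I have all the data for a record, print the record.  Compare
>      # against "" so that a legitimate value of 0 still counts as data.
>      if(easting != "" && northing != "" && elevation != "" && dist != "") {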
>         # Print the record
>         print_rec();
>
>         # Reset all data and increment the record counter
>         easting = northing = elevation = dist = "";
>         rec++;
>      }
>   }
>
>   # Print the record.
>   function print_rec() {
>      printf("%d %s %s %s %s\n", rec, easting, northing, elevation, dist);
>   }
>
>   # Data updates
>   /easting/ { easting=$2; rec_chk(); }
>   /northing/ { northing=$2; rec_chk(); }
>   /elevation/ { elevation=$2; rec_chk(); }
>   /distance along surface/ { dist=$4; rec_chk(); }
>
> There are a lot of built-in functions if you want to do some math
> operations or variable type conversions or whatever you can think of.
> See `info gawk` under "functions" or "library functions" for more
> information.
>
> I hope that gets you started.
>
> -Mark
>
> PS: I usually just use PERL rather than AWK for something like this -- PERL
> has more support, is more flexible, and I know it better than AWK!  GAWK
> sure does make simple parsing like this easier if you know how, though!
>
> PPS: A friend of mine who went to Columbia tells me he took a class from
> "A" (Prof. Alfred Aho) of "AWK"... =P  How funny!
>
>
> On Thu, 21 Apr 2005, Dylan Beaudette wrote:
>
>> Hi everyone,
>>
>> This might be a really simple question, but:
>>
>>
>> I have a text file with multiple-line records in a format like this:
>>
>> ----------------------------
>> easting:      661674.9375
>>  northing:     4035004.0000
>>  elevation:         968.8617
>> distance along surface:               15.9540
>>
>>  easting:      661683.7500
>>  northing:     4034946.7500
>>  elevation:         961.4768
>> distance along surface:               58.4077
>> -----------------------------
>>
>>
>> I am familiar with single-line records, and work with them in AWK quite
>> frequently. However, I have yet to come upon an elegant solution to
>> parsing something like the above file, such that I can assign each
>> field in a multi-line record to a unique variable. A solution in AWK
>> would be ideal, but I am open to tackling this problem with something
>> like Python.
>>
>>
>> any ideas?
>>
>> thanks!
>>
>> Dylan
>>
>>
>> _______________________________________________
>> vox-tech mailing list
>> vox-tech at lists.lugod.org
>> http://lists.lugod.org/mailman/listinfo/vox-tech
>>
>
> -- 
> Mark K. Kim
> AIM: markus kimius
> Homepage: http://www.cbreak.org/
> Xanga: http://www.xanga.com/vindaci
> Friendster: http://www.friendster.com/user.php?uid=13046
> PGP key fingerprint: 7324 BACA 53AD E504 A76E  5167 6822 94F0 F298 5DCE
> PGP key available on the homepage
>
>


