[vox-tech] adding line numbers to an HTML file

Micah Cowan micah at cowan.name
Thu Oct 26 15:29:01 PDT 2006


On Thu, 2006-10-26 at 15:13 -0700, Dylan Beaudette wrote:
> On Thursday 26 October 2006 14:25, Micah Cowan wrote:
> > On Thu, 2006-10-26 at 14:23 -0700, Dylan Beaudette wrote:
> > > Hi everyone,
> > >
> > > wondering if there is a simple way to add line numbers to every non-html
> > > tag in a webpage:
> > >
> > > here is a dirty hack that does not work very well:
> > >
> > > lynx -source http://casoilresource.lawr.ucdavis.edu/drupal/node/319/print
> > > | cat -n > test.html
> > >
> > > or - if there is a way to add line numbers to non-tag data, similar to
> > > how the paste (http://rafb.net/paste/) service works.
> > >
> > > any ideas would be very helpful!
> >
> > The link you show doesn't seem to distinguish "tag data", and it's
> > really not clear to me exactly what you're trying to accomplish. Perhaps
> > if you could post a short "before-and-after" example?
> >
> > Depending on what you want, Perl or Python--or possibly even just
> > awk--should be able to meet your needs, but I can't really give you a
> > solution until I understand the problem properly :)
> 
> some clarification is indeed warranted:
> 
> the page in question 
> (http://casoilresource.lawr.ucdavis.edu/drupal/node/319/print) produces 
> printer-friendly output (simple html). I would like to add line numbers to 
> this document so that the students in my class can easily refer to specific 
> lines of code. In my hack posted above, i add a line number to *every* line - 
> even html tags like <head>, <body> , etc. I would like to add line numbers to 
> the text in-between html elements. i.e
> 
> <body>
> 1 something
> 2 about 
> 3 some other thing
> 4 here
> ...
> </body>
> 
> perhaps some regex-fu is required?

A quick awk script I whipped up earlier when you first posted was:

  awk 'BEGIN{ l=1 }   $1 ~ /^[^<]/ { $0 = l++ " " $0 }   { print }'

This takes its input and prepends an incrementing line number to any
line whose first non-space character is not a "<". That's almost as
stupid as "cat -n", though: it will add numbers before lines that are
part of a tag spread across multiple lines, and it will fail to number
lines that are mostly textual content but happen to begin with a tag
(such as a line that is wrapped in a <b> tag).
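
To make both failure modes concrete, here is a minimal sketch that runs
that one-liner against a made-up sample. The sample input and the
wrapper name number_text_lines are assumptions for illustration only;
they are not taken from the actual page.

```shell
# Hypothetical sample input (not from the real page): one plain text line,
# one text line that starts with a tag, and one tag split across two lines.
sample='<body>
something
<b>bold</b> text
<a
href="x">
</body>'

# Hypothetical wrapper around the one-liner above, so it can be reused.
number_text_lines() {
  awk 'BEGIN{ l=1 }   $1 ~ /^[^<]/ { $0 = l++ " " $0 }   { print }'
}

printf '%s\n' "$sample" | number_text_lines
```

With this sample, "something" is numbered 1, the stray href="x"> line
is wrongly numbered 2, and the <b>bold</b> text line gets no number at
all.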

Given your specific example, I notice that most of the examples consist
of lines ending in <br>, so for this specific case, you might use:

  awk 'BEGIN{ l=1 }
       $0 ~ /<br>$/ && $1 ~ /^[^<]/ { $0 = l++ "&nbsp;" $0 }
       { print }'

Which helps limit it to the lines in the examples.
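
As a rough check of what the <br> restriction buys you, here is a small
sketch on invented input. The sample lines and the function name
number_br_lines are hypothetical, and I'm assuming the code lines on
the page look roughly like this.

```shell
# Hypothetical sample: prose without <br>, then code lines ending in <br>.
sample='Some prose that should stay unnumbered.
<p>First example:</p>
ls -l<br>
pwd<br>
<p>Second example:</p>
date<br>'

# Hypothetical wrapper: number only lines that end in <br> and do not
# start with a tag.
number_br_lines() {
  awk 'BEGIN{ l=1 }
       $0 ~ /<br>$/ && $1 ~ /^[^<]/ { $0 = l++ "&nbsp;" $0 }
       { print }'
}

printf '%s\n' "$sample" | number_br_lines
```

Here the prose line and the <p> lines pass through untouched, while the
three <br> lines are numbered 1, 2 and 3, with the count carrying on
across both examples.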

If you want them to reset for each example, the following might work:

  awk 'BEGIN{ l=1 }

  {
    if ($0 ~ /<br>$/ && $1 ~ /^[^<]/)
      $0 = l++ "&nbsp;" $0;
    else
      l=1;

    print;
  }'
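
A quick way to see the reset in action, as a sketch on made-up input
(the sample lines and the name number_per_example are assumptions, not
anything from the real page):

```shell
# Hypothetical sample: two runs of <br> lines separated by a tag line.
sample='ls -l<br>
pwd<br>
<p>Next example:</p>
date<br>'

# Hypothetical wrapper around the resetting script above.
number_per_example() {
  awk 'BEGIN{ l=1 }
  {
    # Number code-like lines; any other line restarts the count at 1.
    if ($0 ~ /<br>$/ && $1 ~ /^[^<]/)
      $0 = l++ "&nbsp;" $0;
    else
      l=1;

    print;
  }'
}

printf '%s\n' "$sample" | number_per_example
```

The first two lines come out numbered 1 and 2, and "date" starts over
at 1. Note that any non-matching line resets the counter, so a blank
line in the middle of an example would restart the numbering too.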

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/



