[vox-tech] Suggestions for cleaning up repetitive HTML tags?

Bill Kendrick nbs at sonic.net
Wed Aug 18 16:23:52 PDT 2010


On Wed, Aug 18, 2010 at 01:29:14PM -0500, Chanoch (Ken) Bloom wrote:
> Consider writing a SAX filter that just drops the offending <font> and
> </font>.

Well, we want the style info to remain... there's just no reason in
the world for the document to specify it over and over again on
a per-word or per-character(!) basis. :)
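
To make that goal concrete, here is a minimal sketch of one way to do it:
collapse runs of sibling <font> elements whose attributes are identical, so
the style is stated once per run instead of once per word. It assumes the
input is already well-formed XML/XHTML and uses only Python's standard
xml.etree.ElementTree; the function name merge_adjacent_fonts is made up for
illustration and is not something from this thread.

```python
import xml.etree.ElementTree as ET


def merge_adjacent_fonts(parent):
    """Merge runs of sibling <font> elements that have identical attributes.

    Hypothetical helper: assumes well-formed XML and that only whitespace
    separates the tags being merged.
    """
    i = 0
    while i + 1 < len(parent):
        a, b = parent[i], parent[i + 1]
        mergeable = (
            a.tag == "font"
            and b.tag == "font"
            and a.attrib == b.attrib
            and not (a.tail or "").strip()  # nothing but whitespace between them
        )
        if not mergeable:
            i += 1
            continue
        # Text that sat between the two tags, followed by b's leading text.
        gap = (a.tail or "") + (b.text or "")
        if len(a):
            # a has child elements: append after its last child.
            a[-1].tail = (a[-1].tail or "") + gap
        else:
            a.text = (a.text or "") + gap
        a.extend(list(b))   # move b's children into a
        a.tail = b.tail     # keep whatever followed b
        parent.remove(b)
    # Recurse after merging, so children moved into a merged element get checked too.
    for child in parent:
        merge_adjacent_fonts(child)
    return parent


src = (
    '<p><font face="Arial">Hel</font><font face="Arial">lo</font> '
    '<font face="Arial">world</font><font face="Times">!</font></p>'
)
root = merge_adjacent_fonts(ET.fromstring(src))
result = ET.tostring(root, encoding="unicode")
print(result)
```

The three Arial runs end up in a single <font> element and the Times one is
left alone, so the style info survives but is no longer repeated. Real-world
HTML that is not well-formed would need a forgiving parser first.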


> Also consider using XPath, like my following example in Ruby (using the
> Nokogiri XML library)

Ooooh.  Thanks, I'll poke at this.  (I know there's some XPath stuff
in PHP that I know nothing about, since so far I've only spoken to it
about XML via its DOMDocument stuff.)

Thanks,

-bill!

