[vox-tech] Suggestions for cleaning up repetitive HTML tags?
Bill Kendrick
nbs at sonic.net
Wed Aug 18 16:23:52 PDT 2010
On Wed, Aug 18, 2010 at 01:29:14PM -0500, Chanoch (Ken) Bloom wrote:
> Consider writing a SAX filter that just drops the offending <font> and
> </font>.
Well, we want the style info to remain... there's just no reason in
the world for the document to specify it over and over again on
a per-word or per-character(!) basis. :)
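
For illustration only, here is a minimal sketch of the kind of cleanup described above: keep the style info, but collapse runs of sibling <font> tags that repeat the same attributes. It is written in Python with the standard library's xml.etree.ElementTree, not the PHP or Ruby tools mentioned in this thread. The function name merge_adjacent_fonts and the sample markup are hypothetical, and it assumes the input is already well-formed XML (real-world HTML would likely need a more forgiving parser first).

```python
import xml.etree.ElementTree as ET


def merge_adjacent_fonts(parent):
    """Merge runs of sibling <font> elements that carry identical attributes.

    Hypothetical helper: siblings are merged only when nothing but
    whitespace separates them; that whitespace is kept inside the merged tag.
    """
    prev = None
    for child in list(parent):  # iterate over a copy, since we remove items
        if (prev is not None
                and prev.tag == "font" and child.tag == "font"
                and prev.attrib == child.attrib
                and not (prev.tail or "").strip()):
            # Text that must move into prev: the gap plus child's leading text.
            glue = (prev.tail or "") + (child.text or "")
            if len(prev):
                prev[-1].tail = (prev[-1].tail or "") + glue
            else:
                prev.text = (prev.text or "") + glue
            prev.extend(list(child))   # carry over any nested elements
            prev.tail = child.tail
            parent.remove(child)
        else:
            prev = child
    # Recurse after merging so nested runs are handled too.
    for child in parent:
        merge_adjacent_fonts(child)


sample = ('<p><font face="Arial">Hel</font><font face="Arial">lo</font> '
          '<font face="Arial">world</font><font face="Times">!</font></p>')
root = ET.fromstring(sample)
merge_adjacent_fonts(root)
cleaned = ET.tostring(root, encoding="unicode")
print(cleaned)
# <p><font face="Arial">Hello world</font><font face="Times">!</font></p>
```

The three Arial runs become one tag while the Times run is left alone, so the styling survives but is stated once per run instead of once per fragment.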
> Also consider using XPath, like my following example in Ruby (using the
> Nokogiri XML library)
Ooooh. Thanks, I'll poke at this. (I know there's some XPath stuff
in PHP that I know nothing about, since so far I've only spoken to it
about XML via its DOMDocument stuff.)
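
As a rough idea of what the XPath approach looks like, here is a small sketch in Python (again not the Ruby/Nokogiri or PHP code discussed here). It uses the limited XPath subset built into xml.etree.ElementTree to find every <font> element and tally how often each attribute set repeats. The sample markup and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

doc = ET.fromstring(
    '<p><font face="Arial">a</font><font face="Arial">b</font>'
    '<font face="Times">c</font></p>'
)

# ".//font" selects every <font> element at any depth below the root.
fonts = doc.findall(".//font")

# Count how many times each distinct set of attributes is repeated.
style_counts = Counter(tuple(sorted(f.attrib.items())) for f in fonts)

# A predicate narrows the match to one particular style.
arial = doc.findall(".//font[@face='Arial']")

print(len(fonts), len(arial))
```

A tally like this is mostly useful for seeing how bad the repetition is before deciding how to merge the tags.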
Thanks,
-bill!