[vox-tech] Parsing Html
Mike Simons
vox-tech@lists.lugod.org
Wed, 11 Jun 2003 16:17:06 -0400
On Wed, Jun 11, 2003 at 03:08:27PM -0500, Jay Strauss wrote:
> I have the mongo piece of html I read from an online source. I want to
> parse it, particularly I'm interested in a specific table (one of many
> within the html). I'd like to get at that table and basically turn it into
> a perl data structure I can use
> like: array of array refs, that is an element for each row that points at an
> array of cells
>
> I tried to read and use HTML::Parser but I was overwhelmed. Anyone know an
> easy way to do this?
I need more detail.
- can you give the url to the table?
  or, if you can't give the url, explain generally and give a
  sample of one or two table elements.
- are you interested in a one time parse,
  or will you be re-running this on the changing contents of the
  page every day?
normally I just use s/aaa/bbb/ expressions to chop the html crap out
of a table, then an m/()()()/ style match to convert the elements into
a useful data structure...
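
A rough sketch of that regex approach, in case it helps. The sample HTML here is made up (the real page wasn't posted), and it assumes a simple table with no nested tables and properly closed tr/td tags. It just takes the first table it finds; picking out one specific table among many would need a more specific pattern. It is not a general HTML parser and will break on messy markup.

```perl
use strict;
use warnings;

# Hypothetical sample input; stands in for the page that was not posted.
my $html = <<'HTML';
<p>some text</p>
<table border="1">
<tr><td><b>apple</b></td><td> 3 </td></tr>
<tr><td>pear</td><td><a href="x.html">7</a></td></tr>
</table>
HTML

# Isolate the first table (assumption: no nested tables).
my ($table) = $html =~ m{<table\b[^>]*>(.*?)</table>}is;
$table = '' unless defined $table;

my @rows;    # array of array refs: one element per row, each a list of cells

# Walk each <tr>...</tr> block, then each <td>/<th> cell inside it.
while ($table =~ m{<tr\b[^>]*>(.*?)</tr>}gis) {
    my $row = $1;
    my @cells;
    while ($row =~ m{<t[dh]\b[^>]*>(.*?)</t[dh]>}gis) {
        my $cell = $1;
        $cell =~ s/<[^>]+>//g;      # chop out any remaining tags
        $cell =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @cells, $cell;
    }
    push @rows, \@cells;
}

print join(" | ", @$_), "\n" for @rows;
```

After this runs, $rows[0][0] is the first cell of the first row, and so on. If the page changes layout often, this kind of pattern matching gets fragile fast, which is why the one-time versus daily question matters.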
-- 
GPG key: http://simons-clan.com/~msimons/gpg/msimons.asc
Fingerprint: 524D A726 77CB 62C9 4D56 8109 E10C 249F B7FA ACBE