[vox-tech] Parsing Html

Mike Simons vox-tech@lists.lugod.org
Wed, 11 Jun 2003 16:17:06 -0400


On Wed, Jun 11, 2003 at 03:08:27PM -0500, Jay Strauss wrote:
> I have the mongo piece of html I read from an online source.  I want to
> parse it, particularly I'm interested in a specific table (one of many
> within the html).  I'd like to get at that table and basically turn it into
> a perl data structure I can use
> like: array of array refs, that is an element for each row that points to an
> array of cells
>
> I tried to read and use HTML::Parser but I was overwhelmed.  Anyone know an
> easy way to do this?

need more detail.

- can you give the url to the table?
  or
  if you can't give the url, describe it generally or give a
    sample of one or two table elements.

- are you interested in a one time parse
  or
  will you be re-running this on the changing contents of the page every day?


  normally I just use s/aaa/bbb/ expressions to chop out the html crap
from a table, then a m/()()()/ style thing to convert the elements into a
useful data structure...
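
A minimal sketch of that approach, for illustration only. The sample HTML,
the id="quotes" attribute, and the three-column layout are all assumptions,
since the real page was never posted. It uses one substitution to strip
unwanted inline tags and one capturing match per row to build an array of
array refs:

```perl
use strict;
use warnings;

# Hypothetical sample input; the real table layout is unknown.
my $html = <<'HTML';
<table>
<tr><td>foo</td><td>bar</td></tr>
</table>
<table id="quotes">
<tr><td><b>IBM</b></td><td>82.10</td><td>+0.35</td></tr>
<tr><td><b>SUNW</b></td><td>4.91</td><td>-0.02</td></tr>
</table>
HTML

# Step 1: isolate the one table of interest (assumes it carries id="quotes").
my ($table) = $html =~ m{<table id="quotes">(.*?)</table>}s
    or die "table not found\n";

# Step 2: s/aaa/bbb/ style cleanup, dropping inline tags we don't care about.
$table =~ s{</?b>}{}g;

# Step 3: m/()()()/ style capture, one match per row with three cells each.
my @rows;
while ($table =~ m{<tr><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td></tr>}g) {
    push @rows, [ $1, $2, $3 ];    # each row becomes an array ref of cells
}

# Print each row tab-separated.
print join("\t", @$_), "\n" for @rows;
```

Here $rows[0][0] is the first cell of the first row. Patterns like these
only hold up while the page's markup stays as regular as the sample; if the
layout changes from day to day, a real parser is likely the sturdier choice.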

-- 
GPG key: http://simons-clan.com/~msimons/gpg/msimons.asc
Fingerprint: 524D A726 77CB 62C9 4D56  8109 E10C 249F B7FA ACBE
