[vox-tech] PHP / CURL
Dave Margolis
dave at silogram.net
Mon Sep 4 18:27:13 PDT 2006
On Sep 1, 2006, at 10:35 AM, serendipitu serendipitu wrote:
> I need to READ some data from that page without manually logging in
> every 24 hours.
PHP/curl makes this pretty easy (depending on how much effort the
site developers have put into preventing screen-scraping).
Any other language with curl bindings can do the same thing
(Perl is one that comes to mind).
You need a pretty strong understanding of PHP and a basic
understanding of how HTTP works. You'll need a webserver that runs
PHP or a local machine with the PHP command line interface
installed. Then you'll need a script. That script will take a
series of steps that each represent a login, a link click, a form
submission, or some kind of user interaction with a website.
The process basically works like this:
First you call curl_init() to get things started.
You need to call curl_setopt() any number of times to define what
type of call you're going to make (in this case a series of HTTP
transactions). These curl_setopt() calls are very similar to the
command line switches you'd throw at the command line version of curl.
Then you finish up with a curl_exec() and a curl_close().
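Here is a rough sketch of those steps, not a tested recipe for any particular site. The URLs and form field names in the usage comment are made up, and the helper names (session_options, login_options, fetch) are just for illustration. A real site may also want hidden form fields, a particular User-Agent, or other details you can only find by looking at its login form. The cookie file is what carries the login from one request to the next.

```php
<?php
// Sketch only: helper names are illustrative, and any URLs or form
// field names you pass in depend entirely on the site being scraped.

// Options shared by every request in one logged-in session.
function session_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // like `curl -L`: follow redirects
        CURLOPT_COOKIEJAR      => $cookieFile, // like `curl -c`: save cookies when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // like `curl -b`: send previously saved cookies
    ];
}

// Options for a form submission such as a login: session options plus a POST body.
function login_options(string $url, string $cookieFile, array $fields): array
{
    return session_options($url, $cookieFile) + [
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query($fields), // like `curl -d`
    ];
}

// One HTTP transaction: curl_init(), curl_setopt() per option, curl_exec(), curl_close().
function fetch(array $options): string
{
    $ch = curl_init();
    foreach ($options as $option => $value) {
        curl_setopt($ch, $option, $value);
    }
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("curl request failed: $error");
    }
    curl_close($ch); // a no-op on PHP 8+, where the handle is freed automatically
    return $body;
}

// Usage (hypothetical URLs and field names):
//   $jar = tempnam(sys_get_temp_dir(), 'cookies');
//   fetch(login_options('https://example.com/login', $jar,
//                       ['username' => 'me', 'password' => 'secret']));
//   $html = fetch(session_options('https://example.com/account', $jar));
```

Each call to fetch() is one step (a login, a link click, a form submission). Run the steps in the same order a browser would, then pull the data you want out of the HTML returned by the last one.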
It took me a lot of reading and trial and error to figure this all
out. I'd start here: http://www.php.net/manual/en/ref.curl.php
Every site is different, and it's difficult to tell you what to do
without having a half.com account.
Dave