[vox-tech] how to modify .htaccess to prevent wget or the likes from downing my site?

Chanoch (Ken) Bloom kbloom at gmail.com
Wed May 25 12:10:21 PDT 2011


On Wed, 2011-05-25 at 14:50 -0400, Hai Yi wrote:
> Hello all:
> 
> I first asked this question to the support of my web host, and they
> redirected me to this link:
> http://www.webhostingtalk.com/showthread.php?t=437549
> 
> and the snippet on that page looks like:
> 
> 
> SetEnvIfNoCase User-Agent "^Wget" bad_bot
> 
> <Limit GET POST>
>    Order Allow,Deny
>    Allow from all
>    Deny from env=bad_bot
> </Limit>

This snippet will block wget only if wget deigns to identify itself as
wget in the User-Agent string.

> 
> I copied and pasted it to the .htaccess under /public_html. Still, I
> am able to use this command to fetch my site:
> 
> wget --wait=20 --limit-rate=20K -r -p -U Mozilla www.my_site.com

Yup. With -U Mozilla, wget identifies itself as Mozilla in the
User-Agent string, so the "^Wget" pattern never matches. Going by the
User-Agent header alone, you have no way of knowing that someone's
using wget to download from your site.

> However, if I  tried the same wget with a slight change in the command
> line (without " -U Mozilla ")
> 
>  wget --wait=20 --limit-rate=20K -r -p www.my_site.com
> 
> I get this:
> 
> --2011-05-25 14:30:36--  http://www.my_site.com/
> Resolving www.my_site.com... xxx.xx.xxx.xx
> Connecting to www.my_site.com|xxx.xx.xxx.xx|:80... connected.
> HTTP request sent, awaiting response... 403 Forbidden
> 2011-05-25 14:30:37 ERROR 403: Forbidden.

Wget deigned to identify itself as wget this time.
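
To make the difference between your two runs concrete, here is a rough
Python sketch of the test that the SetEnvIfNoCase line performs. It is
not how Apache implements it, just the same case-insensitive "^Wget"
match applied to the header the client chooses to send. The function
name is_blocked and the wget version string are my own assumptions for
illustration.

```python
import re

# Same pattern as the .htaccess rule:
#   SetEnvIfNoCase User-Agent "^Wget" bad_bot
BAD_BOT = re.compile(r"^Wget", re.IGNORECASE)

def is_blocked(user_agent: str) -> bool:
    """Return True if the rule would flag this User-Agent as bad_bot."""
    return BAD_BOT.search(user_agent) is not None

# First string: roughly what wget sends by default (version is illustrative).
# Second string: what it sends when run with -U Mozilla.
for ua in ["Wget/1.12 (linux-gnu)", "Mozilla"]:
    verdict = "403 Forbidden" if is_blocked(ua) else "served"
    print(f"{ua!r}: {verdict}")
```

The server only ever sees the string. Nothing in the request proves
which program actually sent it.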

> Now I have three questions:

> 1. Why didn't the code in .htaccess prevent the downloading? Did I
> miss something?

(See my explanation above.)

> 2. Do we have other tools acting like wget, how can we prevent them
> all from downing the site content?

There are other tools that act like wget. You can't prevent them *all*
from downloading, though you could blacklist specific ones the way you
did with Wget. Of course, they can also change their User-Agent string,
and then that header gives you no way of telling them from a browser.

> 3. If someone is downloading, can we have some log file that can
> expose the downloader's info?

Your web server logs will have their IP address, but I doubt you could
do anything useful with that information. If your users log in to the
site, you could try to keep track of that yourself somehow, but that
could be very complex depending on what you're trying to prevent.
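
If you want to look anyway, something like the following Python sketch
would tally requests per client from an access log. It assumes your
host writes Apache's "combined" log format, which I can't confirm from
here. The sample lines, addresses and the function name
requests_per_client are made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
#   host ident user [time] "request" status bytes "referer" "user-agent"
LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

def requests_per_client(lines):
    """Count requests per (IP address, User-Agent) pair."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m:  # silently skip lines that don't fit the assumed format
            counts[(m.group(1), m.group(2))] += 1
    return counts

# Made-up sample lines; in practice, iterate over the real log file.
SAMPLE = [
    '192.0.2.10 - - [25/May/2011:14:30:36 -0400] "GET / HTTP/1.0" 200 5120 "-" "Mozilla"',
    '192.0.2.10 - - [25/May/2011:14:30:56 -0400] "GET /about.html HTTP/1.0" 200 2048 "-" "Mozilla"',
    '198.51.100.7 - - [25/May/2011:14:31:02 -0400] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for (ip, agent), n in requests_per_client(SAMPLE).most_common():
    print(n, ip, agent)
```

A client that fetches every page at a steady interval will stand out in
a tally like this, but all you learn is an IP address and whatever
User-Agent it chose to send.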

...


In other words, the protection you're asking for is basically impossible
against a determined downloader.

--Ken

