[vox-tech] how to modify .htaccess to prevent wget or the likes from downing my site?

Hai Yi yihai2004 at gmail.com
Wed May 25 11:50:15 PDT 2011


Hello all:

I first asked my web host's support about this, and they
pointed me to this link:
http://www.webhostingtalk.com/showthread.php?t=437549

and the snippet on that page looks like:


SetEnvIfNoCase User-Agent "^Wget" bad_bot

<Limit GET POST>
   Order Allow,Deny
   Allow from all
   Deny from env=bad_bot
</Limit>


I copied and pasted it to the .htaccess under /public_html. Still, I
am able to use this command to fetch my site:

wget --wait=20 --limit-rate=20K -r -p -U Mozilla www.my_site.com

I noticed that the "Wget" in the above snippet has a capital "W", so
I changed it to lowercase, but that made no difference.

However, if I try the same wget with a slight change in the command
line (without " -U Mozilla "):

 wget --wait=20 --limit-rate=20K -r -p www.my_site.com

I get this:

--2011-05-25 14:30:36--  http://www.my_site.com/
Resolving www.my_site.com... xxx.xx.xxx.xx
Connecting to www.my_site.com|xxx.xx.xxx.xx|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2011-05-25 14:30:37 ERROR 403: Forbidden.
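
My guess at what is going on, written as a small Python sketch rather than
anything authoritative: the rule only looks at the User-Agent header, and
"-U Mozilla" replaces that header, so the "^Wget" pattern no longer matches.
SetEnvIfNoCase already ignores case, which would explain why changing the
"W" made no difference. The function name is_blocked and the sample header
"Wget/1.12 (linux-gnu)" are my own assumptions for illustration.

```python
import re

# Same pattern as the SetEnvIfNoCase line: anchored at the start of the
# header value and matched without regard to case.
BAD_BOT = re.compile(r"^Wget", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the rule would set bad_bot for this User-Agent."""
    return BAD_BOT.search(user_agent) is not None


# "Wget/1.12 (linux-gnu)" is an assumed example of wget's default header;
# "Mozilla" is what "-U Mozilla" sends instead.
for ua in ["Wget/1.12 (linux-gnu)", "wget/1.12", "Mozilla"]:
    print(ua, "->", "403 Forbidden" if is_blocked(ua) else "allowed")
```

If this reading is right, any client that sends a header not starting with
"Wget" gets past the rule, whatever program it really is.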


On the same page
(http://www.webhostingtalk.com/showthread.php?t=437549), I noticed a
comment:

"
wget -U "Mozilla/4.03 [en] (X11; I; SunOS 5.5.1 sun4u)"

Use of this option is discouraged, unless you really know what you are doing.

"


Now I have three questions:

1. Why didn't the code in .htaccess prevent the downloading? Did I
miss something?
2. Are there other tools that act like wget, and how can we prevent them
all from downloading the site content?
3. If someone is downloading, is there a log file that would
expose the downloader's info?
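
For question 3, this is the kind of thing I have in mind, as a rough sketch
only. It assumes the host writes an Apache access log in the "combined"
format (I have not confirmed that mine does), and the sample line below is
made up, using a documentation IP address. The names COMBINED and
parse_line are hypothetical.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


# Made-up sample line, not taken from a real log.
sample = (
    '192.0.2.10 - - [25/May/2011:14:30:36 -0700] "GET / HTTP/1.0" '
    '403 202 "-" "Wget/1.12 (linux-gnu)"'
)
entry = parse_line(sample)
print(entry["host"], entry["status"], entry["agent"])
```

The client address and the User-Agent string are the two fields I would
want to see for each request, keeping in mind that the User-Agent can be
faked with -U.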

Thanks a lot!
Hai

