[vox-tech] how to modify .htaccess to prevent wget or the likes from downing my site?
Hai Yi
yihai2004 at gmail.com
Wed May 25 11:50:15 PDT 2011
Hello all:
I first put this question to my web host's support, and they
pointed me to this link:
http://www.webhostingtalk.com/showthread.php?t=437549
The snippet on that page looks like this:
SetEnvIfNoCase User-Agent "^Wget" bad_bot
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
I copied and pasted it into the .htaccess file under /public_html.
Still, I am able to fetch my site with this command:
wget --wait=20 --limit-rate=20K -r -p -U Mozilla www.my_site.com
I noticed that "Wget" in the above snippet has a capital "W", so I
changed it to lowercase, but that made no difference.
However, if I run the same wget command with one slight change
(without " -U Mozilla "):
wget --wait=20 --limit-rate=20K -r -p www.my_site.com
I get this:
--2011-05-25 14:30:36-- http://www.my_site.com/
Resolving www.my_site.com... xxx.xx.xxx.xx
Connecting to www.my_site.com|xxx.xx.xxx.xx|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2011-05-25 14:30:37 ERROR 403: Forbidden.
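My guess is that the rule only looks at the start of the User-Agent
string, which would explain why the two runs behave differently. Below
is a rough simulation of that guess in Python, not Apache itself. It
assumes SetEnvIfNoCase applies "^Wget" as a case-insensitive regular
expression to the header; the function name is_blocked and the example
version strings are made up for illustration.

```python
import re

# Pattern copied from the .htaccess snippet. Assumption: SetEnvIfNoCase
# treats it as a case-insensitive regular expression on the header value.
BAD_BOT = re.compile(r"^Wget", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the rule would mark this User-Agent as bad_bot."""
    return BAD_BOT.search(user_agent) is not None


# Illustrative header values: wget's default style, a lowercase variant,
# and the value sent when wget is run with "-U Mozilla".
for ua in ["Wget/1.12 (linux-gnu)", "wget/1.12", "Mozilla"]:
    verdict = "403 Forbidden" if is_blocked(ua) else "allowed"
    print(f"{ua!r}: {verdict}")
```

If this guess is right, the case of the "W" should not matter, and any
client that sends a User-Agent not starting with "Wget" would get past
the rule.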
On the same page
(http://www.webhostingtalk.com/showthread.php?t=437549), I noticed a
comment:
"
wget -U "Mozilla/4.03 [en] (X11; I; SunOS 5.5.1 sun4u)"
Use of this option is discouraged, unless you really know what you are doing.
"
Now I have three questions:
1. Why didn't the code in .htaccess prevent the downloading? Did I
miss something?
2. Are there other tools that act like wget, and how can we prevent
all of them from downloading the site content?
3. If someone is downloading the site, is there a log file that
would reveal the downloader's information?
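To make question 3 concrete, here is a sketch of the kind of
information I am hoping to get. It assumes the host writes Apache's
"combined" access log format, which I have not confirmed; the sample
line, the IP address in it, and the helper name parse_line are all made
up for illustration.

```python
import re

# Assumed layout of one "combined" access log line:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return the fields of one log line as a dict, or None if no match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


# Made-up sample line, not taken from a real log.
sample = (
    '192.0.2.10 - - [25/May/2011:14:30:36 -0700] '
    '"GET / HTTP/1.0" 403 202 "-" "Wget/1.12 (linux-gnu)"'
)

entry = parse_line(sample)
print(entry["ip"], entry["status"], entry["agent"])
```

The client IP address, the time of the request, and the User-Agent
string are the fields I would most like to see for each download.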
Thanks a lot!
Hai