[vox-tech] how to modify .htaccess to prevent wget or the likes from downing my site?

Hai Yi yihai2004 at gmail.com
Wed May 25 12:51:06 PDT 2011


Thanks a lot, Ken! I love this mailing list!
It's not that disappointing a fact, though; I guess I can still enumerate
downloaders and their common user-agent strings. Do you have a link or
something that lists popular downloaders? I still want to protect my
site to some extent.
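
A rough sketch of what such an enumeration might look like, extending
the snippet quoted below and assuming the same Apache 2.2-style access
directives. The extra patterns are assumptions based on the default
User-Agent strings these tools commonly send (curl, HTTrack,
libwww-perl, Python-urllib); they are examples, not a complete list,
and each tool can change its string just as -U does for Wget.

```apache
# Flag any request whose User-Agent matches a listed downloader.
# Patterns other than "^Wget" are illustrative assumptions.
SetEnvIfNoCase User-Agent "^Wget" bad_bot
SetEnvIfNoCase User-Agent "^curl" bad_bot
SetEnvIfNoCase User-Agent "HTTrack" bad_bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "^Python-urllib" bad_bot

# Refuse GET and POST requests from anything flagged above.
<Limit GET POST>
   Order Allow,Deny
   Allow from all
   Deny from env=bad_bot
</Limit>
```

HTTrack is matched without the leading "^" because its name appears in
the middle of a Mozilla-style string rather than at the start.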

Thanks,
Hai


On Wed, May 25, 2011 at 3:10 PM, Chanoch (Ken) Bloom <kbloom at gmail.com> wrote:
> On Wed, 2011-05-25 at 14:50 -0400, Hai Yi wrote:
>> Hello all:
>>
>> I first put this question to my web host's support, and they
>> redirected me to this link:
>> http://www.webhostingtalk.com/showthread.php?t=437549
>>
>> and the snippet on that page looks like:
>>
>>
>> SetEnvIfNoCase User-Agent "^Wget" bad_bot
>>
>> <Limit GET POST>
>>    Order Allow,Deny
>>    Allow from all
>>    Deny from env=bad_bot
>> </Limit>
>
> This snippet blocks Wget only if Wget deigns to identify itself as
> Wget by saying so in the User-Agent string.
>
>>
>> I copied and pasted it into the .htaccess file under /public_html.
>> Still, I am able to use this command to fetch my site:
>>
>> wget --wait=20 --limit-rate=20K -r -p -U Mozilla www.my_site.com
>
> Yup. With -U Mozilla, Wget identifies itself as Mozilla in the
> User-Agent string, so the "^Wget" pattern never matches. That means
> you have no way at all of knowing that someone's trying to use Wget
> to download from your site.
>
>> However, if I try the same wget command with a slight change
>> (without "-U Mozilla"):
>>
>>  wget --wait=20 --limit-rate=20K -r -p www.my_site.com
>>
>> I get this:
>>
>> --2011-05-25 14:30:36--  http://www.my_site.com/
>> Resolving www.my_site.com... xxx.xx.xxx.xx
>> Connecting to www.my_site.com|xxx.xx.xxx.xx|:80... connected.
>> HTTP request sent, awaiting response... 403 Forbidden
>> 2011-05-25 14:30:37 ERROR 403: Forbidden.
>
> Wget deigned to identify itself as Wget this time, so the rule matched.
>
>> Now I have three questions:
>
>> 1. Why didn't the code in .htaccess prevent the downloading? Did I
>> miss something?
>
> (See my explanation above.)
>
>> 2. Are there other tools that act like wget? How can we prevent
>> them all from downloading the site content?
>
> There are other tools that act like Wget. You can't prevent them *all*
> from downloading, though you could blacklist specific ones the way you
> did with Wget. Of course, they may also change the User-Agent string,
> and then you have no way of telling at all.
>
>> 3. If someone is downloading, can we have some log file that can
>> expose the downloader's info?
>
> Your web server logs will have their IP address, but I doubt you could
> do anything useful with that information. If your users log in to the
> site, you could try to keep track of that yourself somehow, but that
> could be very complex depending on what you're trying to prevent.
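>
> As a limited illustration (a sketch, not a real defence): once an
> address shows up in the server's access log, it can be denied in the
> same .htaccess file, assuming the Apache 2.2-style directives used in
> the snippet above. The address here is a documentation placeholder,
> not one taken from any real log, and a downloader can sidestep the
> rule by switching to another address or going through a proxy.
>
> ```apache
> # Hypothetical address; substitute one seen in your own logs.
> Order Allow,Deny
> Allow from all
> Deny from 203.0.113.45
> ```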
>
> ...
>
>
> In other words, the protection you're asking for is basically impossible
> against a determined downloader.
>
> --Ken
> _______________________________________________
> vox-tech mailing list
> vox-tech at lists.lugod.org
> http://lists.lugod.org/mailman/listinfo/vox-tech
>

