HTTP Authentication
Many web sites restrict access to documents by using ``HTTP Authentication''. This isn't just any form of ``enter your password'' restriction, but is a specific mechanism where the HTTP server sends the browser an HTTP code that says ``That document is part of a protected 'realm', and you can access it only if you re-request it and add some special authorization headers to your request''.
For example, the Unicode.org admins stop email-harvesting bots from harvesting the contents of their mailing list archives, by protecting them with HTTP Authentication, and then publicly stating the username and password (at http://www.unicode.org/mail-arch/) -- namely username ``unicode-ml'' and password ``unicode''.
Consider this URL, which is part of the protected area of that web site:
http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html
If you access that with a browser, you'll get a prompt like ``Enter username and password for 'Unicode-MailList-Archives' at server 'www.unicode.org'''.
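To make those ``special authorization headers'' a little more concrete, here is a minimal sketch of the header a client would add when it re-requests the document. It assumes the server asks for the Basic scheme (other schemes, such as Digest, work differently), it uses only the core MIME::Base64 module, and the variable names are purely illustrative.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# The publicly stated credentials for the Unicode mailing list archive
my ($user, $pass) = ('unicode-ml', 'unicode');

# Basic scheme: base64-encode the string "username:password".
# The empty second argument suppresses the trailing newline.
my $token  = encode_base64("$user:$pass", '');
my $header = "Authorization: Basic $token";

print "$header\n";

# Base64 is an encoding, not encryption: anyone can reverse it
print decode_base64($token), "\n";
```

Because the token is trivially reversible, Basic authentication over plain HTTP protects against casual access (such as harvesting bots), not against anyone who can read the traffic.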
In LWP, if you just request that URL, like this:
use LWP;
my $browser = LWP::UserAgent->new;
my $url =
  'http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html';
my $response = $browser->get($url);
die "Error: ", $response->header('WWW-Authenticate') || 'Error accessing',
  # (the 'WWW-Authenticate' header carries the scheme and the realm name)
  "\n ", $response->status_line, "\n at $url\n Aborting"
  unless $response->is_success;
Then you'll get this error:
Error: Basic realm="Unicode-MailList-Archives"
401 Authorization Required
at http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html
Aborting at auth1.pl line 9. [or wherever]
...because the $browser doesn't know the username and password for that realm (``Unicode-MailList-Archives'') at that host (``www.unicode.org''). The simplest way to tell it is the credentials method, which registers a username and password that the browser can try for that realm at that host. The syntax is:
$browser->credentials(
  'servername:portnumber',
  'realm-name',
  'username' => 'password'
);
In most cases, the port number is 80, the default TCP port for HTTP, and you usually call the credentials method before you make any requests. For example:
$browser->credentials(
  'reports.mybazouki.com:80',
  'web_server_usage_reports',
  'plinky' => 'banjo123'
);
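If you'd rather not type the 'servername:portnumber' string by hand, one option is to derive it from the URL you plan to fetch. The sketch below assumes the URI module (which LWP itself depends on) and its host_port method; the variable names are illustrative.

```perl
use strict;
use warnings;
use URI;

my $url = 'http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html';

# host_port returns "host:port", filling in the scheme's default
# port (80 for http) when the URL doesn't name one explicitly
my $netloc = URI->new($url)->host_port;

print "$netloc\n";
```

The resulting string can then be passed as the first argument to the credentials method.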
So if we add the following to the program above, right after the $browser = LWP::UserAgent->new; line...
$browser->credentials(  # add this to our $browser's "key ring"
  'www.unicode.org:80',
  'Unicode-MailList-Archives',
  'unicode-ml' => 'unicode'
);
...then when we run it, the request succeeds, instead of causing the die to be called.
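To check the key ring without touching the network, you can ask the browser what it would offer for a given realm and URL. This is only a sketch: it assumes a reasonably recent LWP::UserAgent, where get_basic_credentials is the method consulted when a server challenges a request, and it relies on the URI module to parse the URL.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

my $browser = LWP::UserAgent->new;
$browser->credentials(
  'www.unicode.org:80',
  'Unicode-MailList-Archives',
  'unicode-ml' => 'unicode'
);

my $uri = URI->new(
  'http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html');

# Look up what the browser would try for this realm at this URL's host:port
# (the third argument says this is not a proxy challenge)
my ($user, $pass) =
  $browser->get_basic_credentials('Unicode-MailList-Archives', $uri, 0);

print defined $user
  ? "Would try username '$user'\n"
  : "No credentials stored for that realm and host\n";
```

If the lookup comes back empty, likely culprits are a realm name that doesn't match the server's exactly, or a first argument to credentials that leaves out the port number.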