{
    my %vus;      # URLs already queued or visited
    my @avoir;    # URLs still to visit

    # Crawl the given URLs and, recursively, every link found on them.
    # Relies on names defined elsewhere: $ua (which must provide register()
    # and wait(), as LWP::Parallel::UserAgent does), @web, and the helpers
    # verif_link_format(), date(), size(), callback(), illegal() and
    # grep_array().
    sub robot
    {
        foreach my $url ( @_ )
        {
            # Mark on enqueue so the same URL is never queued twice.
            next if $vus{$url}++;
            push(@avoir, $url);
        }
        return unless @avoir;
        while (defined(my $page = shift @avoir)) {
            my $link = $page;
            my $verif_links = verif_link_format($link);
            if ( $verif_links == 1 ) {
                my $reqs = [
                    HTTP::Request->new('GET', $link)
                ];
                foreach my $req (@$reqs) {
                    # Content types the robot is willing to receive.
                    $req->header('Accept' => [
                        qw(
                            text/html application/xhtml+xml application/msexcel
                            application/msword application/pdf application/rtf
                            application/x-javascript application/x-httpd-php
                            text/rtf text/nux application/sxc application/sxw
                            application/sxd application/mdi application/x-httpd-asp
                            application/x-httpd-aspx application/x-httpd-mspx
                            application/x-httpd-cfm application/x-httpd-jthml
                            application/x-httpd-pm application/vnd.sun.xml.calc
                            application/vnd.sun.xml.writer application/vnd.sun.xml.draw)
                        ]
                    );
                    print "Registering '".$req->url."'\n";
                    # register() returns a response only when queuing failed.
                    if ( my $res = $ua->register ($req) ) {
                        print STDERR $res->error_as_HTML;
                    }
                }
                # Run the registered requests; 2 is the timeout argument.
                my $entries = $ua->wait(2);
                foreach (keys %$entries) {
                    my $res = $entries->{$_}->response;
                    # print "Answer for '",$res->request->url, "' was \t", $res->code,": ",
                    # $res->message,"\n";
                    if ( $res->is_success ) {
                        my ($code, $date, $size, %page, $base, $content_type);
                        $code = $res->content;    # page body (source), not the status code
                        $date = date();
                        $size = size($link);
                        $base = $res->base;
                        $content_type = $res->content_type;
                        %page = (
                            result => [
                                "$code", "$link",
                                "$date", "$size KB",
                                "$content_type"
                            ]
                        );
                        print "$_\n" for @{ $page{'result'} };
                        # callback() is expected to push the extracted links onto @web.
                        my $p = HTML::LinkExtor->new(\&callback);
                        $p->parse($code);
                        # Resolve relative links against the page's base URL.
                        @web = map { url($_, $base)->abs } @web;
                        @web = illegal(@web);
                        @web = grep_array(@web);
                        robot(@web);
                    }
                    else
                    {
                        print "Request failed\n";
                    }
                }
            }
            else
            {
                print "Bad link\n";
            }
        }
    }
}
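The `%vus` / `@avoir` bookkeeping in `robot` is a visited-set plus work-queue traversal. Here is a minimal sketch of that pattern on its own, assuming a hypothetical in-memory link table `%links_of` in place of real HTTP requests and a hypothetical `crawl` function (neither is part of the posted code). It marks a URL as seen when it is enqueued, so no URL can be queued twice, and it drains the queue in one loop instead of recursing.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the web: page => links found on that page.
my %links_of = (
    'a' => [ 'b', 'c' ],
    'b' => [ 'a', 'd' ],
    'c' => [ 'd' ],
    'd' => [],
);

# Visit every page reachable from the start URLs exactly once.
# $fetch_links is a code ref standing in for "download the page and
# extract its links"; it receives a URL and returns a list of URLs.
sub crawl {
    my ($fetch_links, @start) = @_;
    my %seen;     # same role as %vus
    my @queue;    # same role as @avoir
    my @order;    # pages in the order they were visited

    for my $url (@start) {
        push @queue, $url unless $seen{$url}++;
    }
    while (@queue) {
        my $page = shift @queue;
        push @order, $page;
        for my $url ( $fetch_links->($page) ) {
            # Mark on enqueue: a URL found on two pages is queued only once.
            push @queue, $url unless $seen{$url}++;
        }
    }
    return @order;
}

my @visited = crawl( sub { @{ $links_of{ $_[0] } || [] } }, 'a' );
print join(' ', @visited), "\n";    # prints: a b c d
```

In a real robot, the code ref passed to `crawl` would perform the request and the link extraction; the queue handling stays the same.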