#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
binmode(STDOUT, ":utf8");
@ARGV = ("../numnotices.txt");
while (<>) {
my $ua = WWW::Mechanize->new(agent => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15',
cookie_jar => {},
autocheck => 0 ); # do not die on HTTP errors; the is_success checks below report them
my $url_pubmed = 'http://www.ncbi.nlm.nih.gov/pubmed';
chomp(my $numero = $_); # strip the trailing newline so it does not end up in the URL or the query term
print $_;
#my $numero = "7957670";
my $uid = $numero . "[uid]";
my $url = $url_pubmed . "?term=" . $numero;
my $reponse = $ua->get($url);
if ( $reponse->is_success ) {
$reponse = $ua->post($url_pubmed, [
'EntrezSystem2.PEntrez.DbConnector.Db' => 'pubmed',
'EntrezSystem2.PEntrez.DbConnector.LastDb' => 'pubmed',
'EntrezSystem2.PEntrez.DbConnector.Term' => $uid,
'EntrezSystem2.PEntrez.DbConnector.LastTabCmd' => '',
'EntrezSystem2.PEntrez.DbConnector.LastQueryKey' => '1',
'EntrezSystem2.PEntrez.DbConnector.IdsFromResult' => '',
'EntrezSystem2.PEntrez.DbConnector.LastIdsFromResult' => '',
'EntrezSystem2.PEntrez.DbConnector.LinkName' => '',
'EntrezSystem2.PEntrez.DbConnector.LinkReadableName' => '',
'EntrezSystem2.PEntrez.DbConnector.LinkSrcDb' => '',
'EntrezSystem2.PEntrez.DbConnector.Cmd' => 'displaychanged',
'EntrezSystem2.PEntrez.DbConnector.TabCmd' => '',
'EntrezSystem2.PEntrez.DbConnector.QueryKey' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_SearchBar.SearchResourceList' => 'pubmed',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_SearchBar.Term' => $uid,
'EntrezSystem2.PEntrez.Pubmed.Pubmed_SearchBar.FeedLimit' => '15',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_SearchBar.FeedName' => $uid,
'EntrezSystem2.PEntrez.Pubmed.Pubmed_SearchBar.CurrDb' => 'pubmed',
'EntrezSystem2.PEntrez.Pubmed.Entrez_PageController.PreviousPageName' => 'results',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.sPresentation' => 'xml',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.FFormat' => 'abstract',
'email_format' => 'abstract',
'email_address' => '',
'email_add_text' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.FileFormat' => 'abstract',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.LastPresentation' => 'abstract',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.Presentation' => 'xml',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.PageSize' => '20',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.LastPageSize' => '20',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.Sort' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.LastSort' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.FileSort' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.Format' => 'text',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.LastFormat' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_ResultsController.ResultCount' => '1',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_ResultsController.RunLastQuery' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_SingleItemSupl.Discovery_SearchDetails.SearchDetailsTerm' => $uid,
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.HistoryDisplay.Cmd' => 'displaychanged',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.EmailTab.EmailReport' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.EmailTab.EmailFormat' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.EmailTab.EmailCount' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.EmailTab.EmailStart' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.EmailTab.EmailSort' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.EmailTab.Email' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.EmailTab.EmailText' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.EmailTab.EmailQueryKey' => '',
'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.EmailTab.QueryDescription' => '',
'p$a' => 'EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DisplayBar.SetDisplay',
'p$l' => 'EntrezSystem2',
'p$st' => 'pubmed' ]); # literal '$' in the keys: the form encoder escapes it to %24 itself
if ( $reponse->is_success ) {
my $contenu = $ua->content();
print html2xml($contenu) . "\n";
}
else {
my $status = $ua->status();
print STDERR "Failure [$status] for POST \"$numero\"\n";
}
}
else {
my $status = $ua->status();
print STDERR "Failure [$status] for GET \"$url\"\n";
}
# replace the escaped tag characters <, >
sub html2xml
{
my $html = shift;
$html =~ s|.+<pre>(.+)</pre>|$1|s;
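# the record inside <pre> is entity-escaped; turn each &lt;/tag&gt; and its opening &lt;tag ...&gt; back into real tags
while ( $html =~ m|&lt;/(\w+)&gt;| ) {
my $avant = $`;
my $apres = $';
my $tag = $1;
$avant =~ s|^(.*)&lt;($tag(\s+\w+=\".+?\")*)&gt;|$1<$2>|s;
$html = $avant . "</$tag>" . $apres;
}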
# collapse to a single line
#$html =~ s|\n[\t ]*||go;
#print $html;
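(my $id = $_) =~ s/\s+\z//; # $_ still holds the input line; drop its newline before using it in a file name
open my $fh, '>>:encoding(UTF-8)', "id$id" or die "Cannot open id$id: $!";
print {$fh} $html;
close $fh;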
return $html; # the caller prints the result; without this the sub returns print's status (1)
}
}
exit 0;