BEGINNER: problem retrieving a variable during a sort
Hi everyone,
Here is some code I wrote:
Code:
#!/usr/bin/perl -w
# Usage: ./recup X.xml > X_res
# Program that extracts the title and the authors
$nb=0;      # current input line number
$hmax=0;    # largest text height seen so far
$repmax=0;  # highest repetition count seen for a font/top key
while (<>)
{
    chomp;
    $nb++;
    if(/^<text top="(\d+)" left="(\d+)" width="(\d+)" height="(\d+)" font="(\d+)">(.+)<\/text>$/)
    {
        $text=$_;
        $t=$1;
        $h=$4;
        $hmax=$h if ($h > $hmax);
        $f=$5;
        # strip the markup, keep only the text content
        $text=~ s/<\/text>/ /g;
        $text=~ s/<.+?>//g;
        $ft=$f.$t;      # key: font id followed by top position
        $ft{$nb}=$ft;
        $nt=$nb.$text;  # key: line number followed by text
        $hash_nbline=$hash_nbline{$ft}++;
        $repmax=$hash_nbline if ($hash_nbline > $repmax);
        $hash_ft{$hash_nbline}=$ft;  # most recent key to reach this count
        push @{$hash{$nt}},$h,$ft,$nb,$hash_nbline,$text;
    }
    last if ($nb==70);
}
print "Titre:";
for $nt(sort keys %hash)
{
    $text{$nt}=pop @{$hash{$nt}};
    # the title is the text with the largest height
    if ($hash{$nt}[0] == $hmax)
    {
        print $text{$nt};
    }
}
print "\nAuteurs:";
for $nt(sort keys %hash)
{
    # the authors are expected on the most repeated font/top key
    if ($hash{$nt}[1] == $hash_ft{$repmax})
    {
        print $text{$nt};
    }
}
The problem is that with a file like the one below, I only get: 1,2 1,2 1,2 1 2
instead of the authors.
I need the key that goes with "$repmax" but with the smallest "$ft", because what I currently get is "$repmax" with the largest "$ft".
How can I do that?
Thank you.
Code:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1191" width="918">
<fontspec id="0" size="27" family="Times" color="#292425"/>
<fontspec id="1" size="15" family="Times" color="#292425"/>
<fontspec id="2" size="9" family="Times" color="#292425"/>
<fontspec id="3" size="7" family="Times" color="#292425"/>
<fontspec id="4" size="12" family="Times" color="#292425"/>
<text top="122" left="85" width="646" height="28" font="0">ITTACA: a new database for integrated tumor</text>
<text top="155" left="85" width="650" height="28" font="0">transcriptome array and clinical data analysis</text>
<text top="200" left="85" width="92" height="17" font="1">Adil Elfilali</text>
<text top="196" left="178" width="19" height="11" font="2">1,2</text>
<text top="200" left="196" width="123" height="17" font="1">, Severine Lair</text>
<text top="196" left="319" width="19" height="11" font="2">1,2</text>
<text top="200" left="338" width="129" height="17" font="1">, Catia Verbeke</text>
<text top="196" left="467" width="19" height="11" font="2">1,2</text>
<text top="200" left="486" width="156" height="17" font="1">, Philippe La Rosa</text>
<text top="196" left="642" width="7" height="11" font="2">1</text>
<text top="200" left="649" width="171" height="17" font="1">, Francois Radvanyi</text>
<text top="196" left="820" width="7" height="11" font="2">2</text>
<text top="224" left="86" width="195" height="17" font="1">and Emmanuel Barillot</text>
<text top="220" left="281" width="11" height="11" font="2">1,</text>
<text top="224" left="292" width="7" height="17" font="1">*</text>
<text top="259" left="85" width="6" height="9" font="3">1</text>
<text top="262" left="91" width="591" height="14" font="4">Institut Curie, Service Bioinformatique, 26 rue d'Ulm, Paris, 75248 cedex 05, France and</text>
<text top="277" left="85" width="6" height="9" font="3">2</text>
<text top="280" left="91" width="510" height="14" font="4">Institut Curie, CNRS UMR 144, 26 rue d'Ulm, Paris, 75248 cedex 05, France</text>
<text top="318" left="85" width="371" height="11" font="2">Received August 9, 2005; Revised and Accepted September 19, 2005</text>
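For what it's worth, here is one possible way to pick the most repeated font/top key while preferring the smallest key on a tie. In the sample file the keys 1200 (author names) and 2196 (affiliation numbers) both occur five times, so a tie-break is needed. This is only a minimal sketch: the helper name most_frequent_smallest_key is hypothetical (it is not part of the script above), and it assumes the keys can be compared as numbers, as the script already does with "==".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): given a list of
# font/top keys, return the key that occurs most often. When several keys
# occur equally often, the numerically smallest one is preferred.
sub most_frequent_smallest_key {
    my @keys = @_;

    # Count how many times each key occurs.
    my %count;
    $count{$_}++ for @keys;

    my ($best_key, $best_count);
    for my $key (keys %count) {
        if (   !defined $best_key
            || $count{$key} > $best_count
            || ($count{$key} == $best_count && $key < $best_key))
        {
            ($best_key, $best_count) = ($key, $count{$key});
        }
    }
    return $best_key;    # undef when the list is empty
}

# Keys built as font.top for the author area of the sample file.
my @keys = qw(1200 2196 1200 2196 1200 2196 1200 2196 1200 2196 1224 2220 1224);
print most_frequent_smallest_key(@keys), "\n";    # prints 1200
```

In the script above, the same idea could presumably be applied by calling the helper on "values %ft" after the read loop and comparing against its result in place of "$hash_ft{$repmax}". Note that "and Emmanuel Barillot" has top="224", so its key (1224) differs from 1200 and it would still not be printed by this test alone.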