It took me a while to understand how your data is formatted (yes, each sequence is spread over non-contiguous lines!) and what you want to obtain: you want to count the occurrences of each base and keep the most frequent base at each position.
(Note that there is a catch: what do you do when two bases are equally frequent? Never mind, we will just pick one arbitrarily.)
Here is a clean way to do it (though it may be a bit tough if you are new to Perl).
| use strict; use warnings;
# Parse file and build array
#
my %sequences;
while ( my $line = <DATA> ) {
    # Keep only data lines and strip the "accession_<code>" prefix
    next unless $line =~ s/^accession_([^\s]+)\s+//;
    my $code = $1;
    # One element per character; spaces and the newline are kept as columns
    my @bases = split //, $line;
    # Append to the sequence, since its lines are scattered through the file
    push @{ $sequences{$code} }, @bases ;
}
my @array = values %sequences;
# Compute occurrences and sort bases by frequency
#
my @result;
my $last = $#{ $array[0] };
for my $index ( 0..$last ) {
    my %occurences;
    my @column = map { $_->[$index] } @array;
    for my $base ( @column ) {
        $occurences{$base}++;
    }
    my @bases_by_frequence =
        map { $_->[0] }
        sort { $b->[1] <=> $a->[1] }
        map { [ $_, $occurences{$_} ] }
        keys %occurences;
    push @result, $bases_by_frequence[0];
}
print @result;
__DATA__
accession_NC_003888.3 TGGGTATCCA TCCCTCAGCC GCCGCCCTCC TCGCCCGCTC CCATCGGCTC
accession_AL939106.1 TGGGTATCCA TCCCTCAGCC GCCGCCCTCC TCGCCCGCTC CCATCGGCTC
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 GGTGCCGATC CTCGCAACAC CAACTACGCC GGCGGCAACG CGTCCGCCAA
accession_AL939106.1 GGTGCCGATC CTCGCAACAC CAACTACGCC GGCGGCAACG CGTCCGCCAA
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 GGGCACCGGG ACCGATCCCG TCACCGGGGG TGAGGTGGAG CTGATGTGGG
accession_AL939106.1 GGGCACCGGG ACCGATCCCG TCACCGGGGG TGAGGTGGAG CTGATGTGGG
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 TCAAGGGGTC CGGCGGGGAC CTCGGGACGC TCACCGGGGC CGGGCTCGCC
accession_AL939106.1 TCAAGGGGTC CGGCGGGGAC CTCGGGACGC TCACCGGGGC CGGGCTCGCC
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 GTGCTGCGGC TGGACCGGAT GCGGGCGCTC AAGGACGTCT ACCCGGGCGT
accession_AL939106.1 GTGCTGCGGC TGGACCGGAT GCGGGCGCTC AAGGACGTCT ACCCGGGCGT
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 CGAGCACGAG GACGAGATGG TCGCCGCCTT CGACTACTGC CTGCACGGCA
accession_AL939106.1 CGAGCACGAG GACGAGATGG TCGCCGCCTT CGACTACTGC CTGCACGGCA
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 AGGGCGGCGC CGCGCCCTCG ATCGACACGG CGATGCACGG GCTCGTCGAG
accession_AL939106.1 AGGGCGGCGC CGCGCCCTCG ATCGACACGG CGATGCACGG GCTCGTCGAG
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 GCCGCACACG TCGACCACCT CCACCCCGAC TCCGGCATCG CGCTGGCCTG
accession_AL939106.1 GCCGCACACG TCGACCACCT CCACCCCGAC TCCGGCATCG CGCTGGCCTG
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 CGCGGCGGAC GGGGAGAAGC TGACCGCCGA GTGCTTCGGC GACACCGTGG
accession_AL939106.1 CGCGGCGGAC GGGGAGAAGC TGACCGCCGA GTGCTTCGGC GACACCGTGG
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 TGTGGGTGCC GTGGCGGCGG CCCGGTTTCC AGCTCGGGCT GGACATCGCC
accession_AL939106.1 TGTGGGTGCC GTGGCGGCGG CCCGGTTTCC AGCTCGGGCT GGACATCGCC
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 GCCGTCAAGG AGGCCAACCC GCAGGCCGTC GGCTGTGTGC TCGGCGGGCA
accession_AL939106.1 GCCGTCAAGG AGGCCAACCC GCAGGCCGTC GGCTGTGTGC TCGGCGGGCA
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 CGGCATCACC GCGTGGGGCG ACACCTCCGA GGAGTGCGAG CGCAACTCGC
accession_AL939106.1 CGGCATCACC GCGTGGGGCG ACACCTCCGA GGAGTGCGAG CGCAACTCGC
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 TGCACATCAT CCGCACCGCC GAGGCGTTCC TGGCCGAACG CGGGAAGGCC
accession_AL939106.1 TGCACATCAT CCGCACCGCC GAGGCGTTCC TGGCCGAACG CGGGAAGGCC
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 GAGCCGTTCG GTCCGGTCAT CCCGGCGTAC GCGGCGCTGC CCGAGGCCGA
accession_AL939106.1 GAGCCGTTCG GTCCGGTCAT CCCGGCGTAC GCGGCGCTGC CCGAGGCCGA
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 GCGGCGGGAG CGGGCGGCGG CGCTGGCGCC GTACGTCCGT GGTCTGGCGT
accession_AL939106.1 GCGGCGGGAG CGGGCGGCGG CGCTGGCGCC GTACGTCCGT GGTCTGGCGT
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 CCCGGGACCG GGCGCAGGTG GGACACTTCA CCGACGCGGA CGTGGTGCTG
accession_AL939106.1 CCCGGGACCG GGCGCAGGTG GGACACTTCA CCGACGCGGA CGTGGTGCTG
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 GACTTCCTCG CGCGCGCCGA GCATCCGCGA CTCGCCGCGC TCGGCACCTC
accession_AL939106.1 GACTTCCTCG CGCGCGCCGA GCATCCGCGA CTCGCCGCGC TCGGCACCTC
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 CTGCCCGGAC CACTTCCTGC GCACGAAGGT GCGGCCGCTG GTCCTGGACG
accession_AL939106.1 CTGCCCGGAC CACTTCCTGC GCACGAAGGT GCGGCCGCTG GTCCTGGACG
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 TGGCGCCGAC CGCGCCGCTG GAGGAGGCGG TGGCCCGGCT CAAGGAGCTG
accession_AL939106.1 TGGCGCCGAC CGCGCCGCTG GAGGAGGCGG TGGCCCGGCT CAAGGAGCTG
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 CACGCGGCCT ACCGCGAGGA GTACGCCGCC TACTACGAGC GGCACGCCGA
accession_AL939106.1 CACGCGGCCT ACCGCGAGGA GTACGCCGCC TACTACGAGC GGCACGCCGA
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 GCCCGACTCC CCCGCCATGC GCGGCGCCGA CCCGGCGATC GTGCTGGTGC
accession_AL939106.1 GCCCGACTCC CCCGCCATGC GCGGCGCCGA CCCGGCGATC GTGCTGGTGC
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 CGGGTGTCGG CATGTTCAGC TTCGGCAAGG ACAAGCAGAC CGCCCGGGTG
accession_AL939106.1 CGGGTGTCGG CATGTTCAGC TTCGGCAAGG ACAAGCAGAC CGCCCGGGTG
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 GCCGGCGAGT TCTACGTCAA CGCGATCAAC GTGATGCGCG GGGCCGAGGC
accession_AL939106.1 GCCGGCGAGT TCTACGTCAA CGCGATCAAC GTGATGCGCG GGGCCGAGGC
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 GGTGTCGACG TACGCGCCGA TCGAGGAGTC GGAGAAGTTC CGCATCGAGT
accession_AL939106.1 GGTGTCGACG TACGCGCCGA TCGAGGAGTC GGAGAAGTTC CGCATCGAGT
accession_NM_016286.2 .......... .......... .......... .......... ..........
accession_AB013846.1 .......... .......... .......... .......... ..........
accession_NC_003888.3 ACTGGGCGCT GGAGGAGGCC AAGCTCCGGC GGATGCCGAG GCCGAAGC.C
accession_AL939106.1 ACTGGGCGCT GGAGGAGGCC AAGCTCCGGC GGATGCCGAG GCCGAAGC.C
accession_NM_016286.2 .......... .......... .......... ......TGGA GCTGTTCCTC
accession_AB013846.1 .......... .......... .......... ......TGGA GCTGTTCCTC
accession_NC_003888.3 GCTGGCCACC CGTGTCGCCC TGGTCACCGG CGCGGGCAGC GGGATCGGGA
accession_AL939106.1 GCTGGCCACC CGTGTCGCCC TGGTCACCGG CGCGGGCAGC GGGATCGGGA
accession_NM_016286.2 GCGGGCCGCC GG....GTGC TGGTCACCGG GGCAGGCAAA GGTATAGGGC
accession_AB013846.1 GCGGGCCGCC GG....GTGC TGGTCACCGG GGCAGGCAAA GGTATAGGGC
accession_NC_003888.3 AGGCGATCGC GCGGCGGCTG GTGGACGA.G GGCGCCTGTG TGGTCGTGGC
accession_AL939106.1 AGGCGATCGC GCGGCGGCTG GTGGACGA.G GGCGCCTGTG TGGTCGTGGC
accession_NM_016286.2 GCGGCACGGT CCAGGCGCTG CACGCGAC.G GGCGCGCGGG TGGTGGCTGT
accession_AB013846.1 GCGGCACGGT CCAGGCGCTG CACGCGAC.G GGCGCGCGGG TGGTGGCTGT
accession_NC_003888.3 CGACCTGAAC GCCGA..GAA CGC.GGCGGC GGTCGCCGAG GAGC.TGGGC
accession_AL939106.1 CGACCTGAAC GCCGA..GAA CGC.GGCGGC GGTCGCCGAG GAGC.TGGGC
accession_NM_016286.2 .GAGCCGGAC TCAGGCGGAT CTT.GACAGC CTTGTCCGCG AG...TGCCC
accession_AB013846.1 .GAGCCGGAC TCAGGCGGAT CTT.GACAGC CTTGTCCGCG AG...TGCCC
accession_NC_003888.3 GGG..GACGA CAAGGCCGTC GCCGTGACC. ........GT CGACGTCA..
accession_AL939106.1 GGG..GACGA CAAGGCCGTC GCCGTGACC. ........GT CGACGTCA..
accession_NM_016286.2 GGG..GATAG AAC....... .CCGTGTGC. ........GT GGACCTGG..
accession_AB013846.1 GGG..GATAG AAC....... .CCGTGTGC. ........GT GGACCTGG..
accession_NC_003888.3 CCTCCGAGGA GCAGATCGCC GCGGCGTTCC AGGCGGCCGC CCTCGCCTTC
accession_AL939106.1 CCTCCGAGGA GCAGATCGCC GCGGCGTTCC AGGCGGCCGC CCTCGCCTTC
accession_NM_016286.2 GTGACTGGGA GGCCACCGAG CGGGCGCT.. GGGCAGCG.. ........TG
accession_AB013846.1 GTGACTGGGA GGCCACCGAG CGGGCGCT.. GGGCAGCG.. ........TG
accession_NC_003888.3 GGCGGGGTCG ACCTGGTGGT CAACAACGCG GGGATCTCCA TCTCCAAGCC
accession_AL939106.1 GGCGGGGTCG ACCTGGTGGT CAACAACGCG GGGATCTCCA TCTCCAAGCC
accession_NM_016286.2 GGCCCCGTGG ACCTGCTGGT GAACAACGCC GCTGTCGCCC TGCTGCAGCC
accession_AB013846.1 GGCCCCGTGG ACCTGCTGGT GAACAACGCC GCTGTCGCCC TGCTGCAGCC
accession_NC_003888.3 GCTGCTGGAG ACCTCCGCGA AGGACTGGGA CCTCCAGCAC GACATCATGG
accession_AL939106.1 GCTGCTGGAG ACCTCCGCGA AGGACTGGGA CCTCCAGCAC GACATCATGG
accession_NM_016286.2 CTTCCTGGAG GTCACCAAGG AGGCCTTTGA CAGATCCTTT GAGGTGAACC
accession_AB013846.1 CTTCCTGGAG GTCACCAAGG AGGCCTTTGA CAGATCCTTT GAGGTGAACC
accession_NC_003888.3 CCCGCGGTTC CT.TCCTCGT CTCGCGCGAG GCCGCCCGGG TGATGACCGC
accession_AL939106.1 CCCGCGGTTC CT.TCCTCGT CTCGCGCGAG GCCGCCCGGG TGATGACCGC
accession_NM_016286.2 TGCGTGCGGT CA.TCCAGGT GTCGCAGATT GTGGCCAGGG GCTTAATAGC
accession_AB013846.1 TGCGTGCGGT CA.TCCAGGT GTCGCAGATT GTGGCCAGGG GCTTAATAGC
accession_NC_003888.3 GCAGGAGCTG GGCGGCGACA TCGTCTACAT CGCGTCGAAG AACGCCGTGT
accession_AL939106.1 GCAGGAGCTG GGCGGCGACA TCGTCTACAT CGCGTCGAAG AACGCCGTGT
accession_NM_016286.2 CCGGGGAGTC CCAGGGGCCA TCGTGAATGT CTCCAGCCAG TGCTCCCAGC
accession_AB013846.1 CCGGGGAGTC CCAGGGGCCA TCGTGAATGT CTCCAGCCAG TGCTCCCAGC
accession_NC_003888.3 TCGCCGGCCC GAACAACATC GCC.TACTCC GCCACCAAGG CCGACCAGGC
accession_AL939106.1 TCGCCGGCCC GAACAACATC GCC.TACTCC GCCACCAAGG CCGACCAGGC
accession_NM_016286.2 GGGCAGTAAC TAACCATAGC GTC.TACTGC TCCACCAAGG GTGCCCTGGA
accession_AB013846.1 GGGCAGTAAC TAACCATAGC GTC.TACTGC TCCACCAAGG GTGCCCTGGA
accession_NC_003888.3 CCATCAGGTG CGCCTGCTCG CCGCCGAGCT GGGCGAGCAC GGCATCCGCG
accession_AL939106.1 CCATCAGGTG CGCCTGCTCG CCGCCGAGCT GGGCGAGCAC GGCATCCGCG
accession_NM_016286.2 CATGCTGACC AAGGTGATGG CCCTAGAGCT CGGGCCCCAC AAGATCCGAG
accession_AB013846.1 CATGCTGACC AAGGTGATGG CCCTAGAGCT CGGGCCCCAC AAGATCCGAG
accession_NC_003888.3 TCAACGGCGT CAACCCGGAC GGCGTGGTGC GCGGTTCCGG GATCTTCGCG
accession_AL939106.1 TCAACGGCGT CAACCCGGAC GGCGTGGTGC GCGGTTCCGG GATCTTCGCG
accession_NM_016286.2 TGAATGCAGT AAACCCCACA GTGGTGATG. ACGTCCATGG G.....CCAG
accession_AB013846.1 TGAATGCAGT AAACCCCACA GTGGTGATG. ACGTCCATGG G.....CCAG
accession_NC_003888.3 GGCGGCTGGG GTGCCAAGCG GGCGGCGGTG TACGGCGTGC CGGAGGAGAA
accession_AL939106.1 GGCGGCTGGG GTGCCAAGCG GGCGGCGGTG TACGGCGTGC CGGAGGAGAA
accession_NM_016286.2 GCCACCTGGA GTGAC..... .......... CCCCACAAGG CCAAGACTAT
accession_AB013846.1 GCCACCTGGA GTGAC..... .......... CCCCACAAGG CCAAGACTAT
accession_NC_003888.3 GCTGGGCGAG TTCTACGCGC AGCGGACCCT GCTCAAGCGC GAGGTGCTGC
accession_AL939106.1 GCTGGGCGAG TTCTACGCGC AGCGGACCCT GCTCAAGCGC GAGGTGCTGC
accession_NM_016286.2 GCTGAACCGA ATCC.CACTT GGCA.AGTTT GCT....... GAGGTA....
accession_AB013846.1 GCTGAACCGA ATCC.CACTT GGCA.AGTTT GCT....... GAGGTA....
accession_NC_003888.3 CCGAGCACGT CGCCAACGCC GTGTTCGCGC TGACCGGCGG C.GACCTGAC
accession_AL939106.1 CCGAGCACGT CGCCAACGCC GTGTTCGCGC TGACCGGCGG C.GACCTGAC
accession_NM_016286.2 ..GAGCACGT GGTGAACGCC ATCCTCTTTC TGCTGAGTGA CCGAAGTGGC
accession_AB013846.1 ..GAGCACGT GGTGAACGCC ATCCTCTTTC TGCTGAGTGA CCGAAGTGGC
accession_NC_003888.3 GCACACCACC GGTCTGCACG TCCCGGTCGA CGCCGGCGTC GCCGCCGCGT
accession_AL939106.1 GCACACCACC GGTCTGCACG TCCCGGTCGA CGCCGGCGTC GCCGCCGCGT
accession_NM_016286.2 ATG.ACCACG GGTTCCACTT TGCCGGTGGA AGGGGGCTTC TGGGCC....
accession_AB013846.1 ATG.ACCACG GGTTCCACTT TGCCGGTGGA AGGGGGCTTC TGGGCC....
accession_NC_003888.3 TCCTGCGATG AG
accession_AL939106.1 TCCTGCGATG AG
accession_NM_016286.2 TGCTGAG... ..
accession_AB013846.1 TGCTGA.... .. |
Here is what running it produces:
|
T...TATCCA TCCCTCA.CC .CC.CCCTCC TC.CCC.CTC CCATC..CTC
..T.CC.ATC CTC.CAACAC CAACTAC.CC ..C..CAAC. C.TCC.CCAA
...CACC... ACC.ATCCC. TCACC..... T.A..T..A. CT.AT.T...
TCAA....TC C..C....AC CTC...AC.C TCACC....C C...CTC.CC
.T.CT.C..C T..ACC..AT .C...C.CTC AA..AC.TCT ACCC...C.T
C.A.CAC.A. .AC.A.AT.. TC.CC.CCTT C.ACTACT.C CT.CAC..CA
A...C..C.C C.C.CCCTC. ATC.ACAC.. C.AT.CAC.. .CTC.TC.A.
.CC.CACAC. TC.ACCACCT CCACCCC.AC TCC..CATC. C.CT..CCT.
C.C..C..AC ....A.AA.C T.ACC.CC.A .T.CTTC..C .ACACC.T..
T.T...T.CC .T..C..C.. CCC..TTTCC A.CTC...CT ..ACATC.CC
.CC.TCAA.. A..CCAACCC .CA..CC.TC ..CT.T.T.C TC..C...CA
C..CATCACC .C.T....C. ACACCTCC.A ..A.T.C.A. C.CAACTC.C
T.CACATCAT CC.CACC.CC .A..C.TTCC T..CC.AAC. C...AA..CC
.A.CC.TTC. .TCC..TCAT CCC..C.TAC .C..C.CT.C CC.A..CC.A
.C..C...A. C...C..C.. C.CT..C.CC .TAC.TCC.T ..TCT..C.T
CCC...ACC. ..C.CA..T. ..ACACTTCA CC.AC.C..A C.T..T.CT.
.ACTTCCTC. C.C.C.CC.A .CATCC.C.A CTC.CC.C.C TC..CACCTC
CT.CCC..AC CACTTCCT.C .CAC.AA..T .C..CC.CT. .TCCT..AC.
T..C.CC.AC C.C.CC.CT. .A..A..C.. T..CCC..CT CAA..A.CT.
CAC.C..CCT ACC.C.A..A .TAC.CC.CC TACTAC.A.C ..CAC.CC.A
.CCC.ACTCC CCC.CCAT.C .C..C.CC.A CCC..C.ATC .T.CT..T.C
C...T.TC.. CAT.TTCA.C TTC..CAA.. ACAA.CA.AC C.CCC...T.
.CC..C.A.T TCTAC.TCAA C.C.ATCAAC .T.AT.C.C. ...CC.A..C
..T.TC.AC. TAC.C.CC.A TC.A..A.TC ..A.AA.TTC C.CATC.A.T
ACT...C.CT ..A..A..CC AA.CTCC..C ..AT.CTGAA GCTGAACCTC
GCTGGCCACC CGT.TCGTCC TGGTCACCGG CGCAGGCAAA GGTATAGGGA
ACGCCATCGT CCAGCCGCTG CACGACAA.G GGCGCCTGTG TGGTCGTTGT
CGACCTGAAC TCAGAC.GAA CTT.GACAGC CTTCTCCGAG AA.C.TGCCC
GGG..GATAA AAA..CC.TC .CCGTGACC. ........GT CGACCTCA..
CTTACTAGGA GCACATCGAC CCGGCGTTCC AGGCAGCC.C CCTC.CCTTC
GGCCCCGTCG ACCTGCTGGT CAACAACGCC GCTATCTCCA TCTTCAAGCC
CTTCCTGGAG ATCACCAAGA AGGACTTTGA CATATACTAT GACATCAACC
TCCGTGCTTT CA.TCCACGT CTCGCACAAT GTCGCCAGGG TCATAATAGC
CCAGGAACTC CCAGGCGACA TCGTCAATAT CTCCACCAAG AACTCCCAGT
TCGCAGTAAC TAACAATATC GTC.TACTCC TCCACCAAGG CTGACCAGGA
CAATCAGATC AACCTGATCG CCCTAGAGCT CGGCCACCAC AACATCCGAG
TCAATGCAGT AAACCCCAAA GTCGTGATGC ACGTTTATGG GATCTTCCAG
GCCACCTGGA GTGACAA.C. ..C..C..T. TACCACAAGC CCAAGAATAA
GCTGAACCAA ATCTACACTT AGCA.ACTTT GCTCAA.C.C GAGGTACT.C
CCGAGCACGT CGTCAACGCC ATCTTCTTTC TGATCAGTGA CCGAACTGAC
ATACACCACC GGTTTCAATT TCCCGGTCGA AGCCGGCTTC TCCGCC.C.T |
Now for some explanations.
1. We read each line. Lines that do not start with accession_ (empty lines, for instance) are skipped; along the way we capture the code that identifies the data line (whatever follows accession_). A split with an empty separator turns the rest of the line into an array with one character per element (@bases), spaces and the line break included, which is why the output keeps the layout of the input. Each of these arrays is appended to the entry of an associative array (a hash) whose key identifies the sequence (your accession_ whatever code), so the scattered pieces of one sequence end up joined together. When we are done, we turn the hash into an array (a two-dimensional one, then).
2. Next we count the occurrences in each column. To do that we walk through the columns by index and build a hash (%occurences) whose keys are the bases; the values of the hash are incremented at each occurrence. Then we sort the bases (the keys of the hash) in decreasing order of their number of occurrences (the value associated with the key). We put the whole thing in an array, and we push its first element (the most frequent base) onto a result array (@result).
3. We print.
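The three steps above can be sketched on a tiny in-memory alignment. This is only an illustration under assumptions: the accession codes A, B, C and their sequences are made up, the lines are chomped, and there are no spaces or ties, so the result is deterministic (unlike the full script, which keeps spaces and line breaks as columns).

```perl
use strict;
use warnings;

# Hypothetical toy input: two blocks of three short sequences each.
my @lines = (
    "accession_A ACGT\n",
    "accession_B ACGA\n",
    "accession_C TCGA\n",
    "accession_A GG\n",
    "accession_B GC\n",
    "accession_C GC\n",
);

# Step 1: join the blocks of each sequence, one base per array element.
my %sequences;
for my $line (@lines) {
    next unless $line =~ s/^accession_(\S+)\s+//;
    my $code = $1;
    chomp $line;
    push @{ $sequences{$code} }, split //, $line;
}
my @rows = values %sequences;

# Step 2: for each column, count the bases and keep the most frequent one.
my @consensus;
for my $index ( 0 .. $#{ $rows[0] } ) {
    my %count;
    $count{ $_->[$index] }++ for @rows;
    my ($best) = sort { $count{$b} <=> $count{$a} } keys %count;
    push @consensus, $best;
}

# Step 3: print the result.
my $consensus = join '', @consensus;
print "$consensus\n";    # prints ACGAGC
```

The full sequences here are ACGTGG, ACGAGC and TCGAGC, and every column has a clear majority, hence ACGAGC.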
Sorting the hash uses a very common Perl technique known as the Schwartzian transform. Roughly: with map we build pairs made of (1) the base (the key of our hash) and (2) its number of occurrences (the value in our hash); we pass these pairs to sort, which orders them using (2) as the criterion; then we pass the sorted pairs to map, which returns the list of elements we care about, the bases, this time sorted by number of occurrences.
Oh, and I almost forgot: this construct reads from bottom to top.
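To make the bottom-to-top reading concrete, here is a minimal standalone sketch of the transform. The counts in %occurences are made up, and the secondary comparison on the base name is an addition of mine (not in the script above) so that ties are broken the same way on every run.

```perl
use strict;
use warnings;

# Hypothetical counts for one alignment column; C and G are tied.
my %occurences = ( A => 1, C => 2, G => 2, '.' => 1 );

# Read from the bottom up: build pairs, sort them, extract the bases.
my @bases_by_frequence =
    map  { $_->[0] }                     # 3. keep only the base
    sort {
        $b->[1] <=> $a->[1]              # 2a. highest count first
            or $a->[0] cmp $b->[0]       # 2b. assumed tie-break: base name
    }
    map  { [ $_, $occurences{$_} ] }     # 1. pair [ base, count ]
    keys %occurences;

print "@bases_by_frequence\n";    # prints: C G . A
```

Since the count is just a hash lookup here, sorting the keys directly with `sort { $occurences{$b} <=> $occurences{$a} } keys %occurences` would give the same ordering by count; the transform pays off mainly when the sort key is expensive to compute.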
Of course, to read the data from a file, you do:
open my $fh, '<', 'ClustalW_result.txt' or die "$!";
and you replace <DATA> with <$fh>.
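Put together, the reading part could look like the sketch below. The sub name read_sequences, the temporary file and its content are hypothetical and only there so the example runs on its own; with your data you would pass 'ClustalW_result.txt' instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read an alignment file; return a hash ref: accession code => array ref
# of characters (line terminators are kept, as in the script above).
sub read_sequences {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %sequences;
    while ( my $line = <$fh> ) {
        next unless $line =~ s/^accession_([^\s]+)\s+//;
        my $code = $1;
        push @{ $sequences{$code} }, split //, $line;
    }
    close $fh or die "Cannot close $path: $!";
    return \%sequences;
}

# Demonstration on a throwaway file with made-up content.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "accession_X1 ACGT\n\naccession_X1 TT\n";
close $tmp_fh or die "Cannot close $tmp_name: $!";

my $sequences = read_sequences($tmp_name);
my $length    = scalar @{ $sequences->{X1} };
print "X1 has $length characters\n";
```

Including the file name in the die message, next to $!, makes it easier to see which file could not be opened.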
Some more advice. Indent your code properly (perltidy). Use meaningful variable names. Do not forget the strict and warnings pragmas. Your code was hard to follow and probably would not compile.
I hope this gives you some ideas. Ask about anything you do not understand.
PS: I liked your:
Posted by
angioedema
(...) I am not used to working (...)