1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104
| #!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Bio::SeqIO;
#-------------------------------- best_variable_region_per_seq.pl
# fichier d'entrées
my $infile = 'P:/Theorie/Leonid/pamgene/purA/test.fsa';
my $in = Bio::SeqIO->new(-file => $infile , '-format' => 'fasta');
# taille de la région de part et d'autre du point central
my $N = 4;
my $l;
my %hash_ali;
my $start = 0;
my $gap_symbol = '';
while ( my $seq_IO = $in->next_seq() ) {
$hash_ali{$seq_IO->primary_id} = $seq_IO->seq ;
$l = length ($seq_IO->seq);
# on recherche la position où toutes
# les séquences sont alignées
my ($s) = $seq_IO->seq =~ m/^([^a-z]*)/i;
my $l = length ($s) + 1;
if ($start < $l){
$start = $l
}
($gap_symbol) = $seq_IO->seq =~ m/([^a-z])/i;
}
my $k = $l - 2 * $N + 1;
foreach my $id (sort keys %hash_ali){
my %score_id;
my @array_score;
for my $X ( $N + 1 + $start ..$k){
# my $subseq = join "", @{[substr($hash_ali{$id}, 0, $X-1) =~ /([A-Z])/gi]}[-$N..-1], @{[substr($hash_ali{$id}, $X-1) =~ /([A-Z])/gi]}[0 .. $N];
my @subseq = @{[substr($hash_ali{$id}, 0, $X) =~ /([A-Z])/gi]}[-$N..-1], @{[substr($hash_ali{$id}, $X-1) =~ /([A-Z])/gi]}[0 .. $N];
=h
my $rx = join ('\\'.$gap_symbol.'*', @subseq );
my ($subseq_ali) = $hash_ali{$id} =~m/($rx)/;
my $pos = $X - 2 * $N;
my $score = 0;
foreach my $id2 (keys %hash_ali){
if ($id2 ne $id){
my @subseq2 = @{[substr($hash_ali{$id2}, 0, $X) =~ /([A-Z])/gi]}[-$N..-1], @{[substr($hash_ali{$id2}, $X-1) =~ /([A-Z])/gi]}[0 .. $N];
# print join "$id\t".('', @subseq)."\t$id2".('', @subseq2)."\n";
for my $j (0..$#subseq){
if ($subseq[$j] ne $subseq2[$j]){
$score ++;
}
}
}
}
print "$id\t$pos\t$score\t$subseq_ali\n";
push @{$array_score[$score]}, $score;
=cut
}
# $score_id{$id} = \(sort {$b<=>$a} @array_score)[0..10];
# print Dumper %score_id ;
} |
Partager