use strict;
use warnings;
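#six arguments are expected: transcript file, transcript source, enzyme name,
#restriction site, number of virtual tags already loaded, number of transcripts already loaded
@ARGV == 6 or die "Usage: $0 transcripts_file transcript_source enzyme restriction_site tags_count transcripts_count\n";
#transcript sequences file (each record is a description line beginning with ">" followed by sequence lines)
open(TRANSCRIPTS_FILE,'<',$ARGV[0]) or die "Cannot open transcript file '$ARGV[0]': $!";
my @transcripts=<TRANSCRIPTS_FILE>;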
#source of transcript sequences
my $transcript_source=$ARGV[1];
#name of the anchoring enzyme and sequence of its restriction site
my $enzyme=$ARGV[2];
my $restriction_site=uc($ARGV[3]); #upper-case the restriction site sequence in case it was given in lower case
$restriction_site =~ s/\s//g; #remove any whitespace from the sequence
#initialization:
#one transcript sequence
my $transcript_seq='';
#one transcript description line (i.e. the line beginning with ">", without the ">" symbol itself)
@transcripts or die "Transcript file '$ARGV[0]' is empty\n";
chomp($transcripts[0]);
my $transcript_line_desc=substr($transcripts[0],1);
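#identifier of the next virtual tag: number of virtual tags already extracted ($ARGV[4]) plus one
my $tags_nb=$ARGV[4]+1;
#identifier of the current transcript: number of transcript sequences already processed ($ARGV[5]) plus one
my $transcripts_nb=$ARGV[5]+1;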
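#open the load data files in append mode
open(TAGS,'>>','loaddata_VirtualTag') or die "Cannot open loaddata_VirtualTag: $!";
open(CONNECTION,'>>','loaddata_VirtualTagTranscript') or die "Cannot open loaddata_VirtualTagTranscript: $!";
open(TRANSCRIPTS,'>>','loaddata_Transcript') or die "Cannot open loaddata_Transcript: $!";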
#reading of the transcript file from the second line
#(the first one was used for initialization)
for (my $i=1; $i<=$#transcripts; $i++)
{
    my $line=$transcripts[$i];
    chomp($line);
    if ($line =~ /^>/){
        #a new description line: process the transcript sequence accumulated so far
        $transcript_seq =~ s/\s//g; #remove any whitespace from the sequence
        $transcript_seq = uc($transcript_seq); #upper-case the sequence in case it was given in lower case
        #finding the virtual tags in the transcript sequence
        my @tags_id = Find_Tags($transcript_seq,$tags_nb,$enzyme,$restriction_site);
        if (@tags_id)
        {
            $tags_nb = $tags_id[-1]+1; #advance the tag counter past the last identifier returned by Find_Tags
        }
        #analyzing transcript sequence
        Analyze_Transcript($transcript_seq,$transcript_line_desc,$transcripts_nb,$transcript_source);
        #connection between virtual tags and transcript
        foreach my $id (@tags_id)
        {
            print CONNECTION $id."\t".$transcripts_nb."\n";
        }
        #move on to the next transcript sequence
        $transcript_line_desc=substr($line,1);
        $transcripts_nb ++;
        $transcript_seq='';
    }
    else
    {
        #sequence line: append it to the current transcript sequence
        $transcript_seq .= $line;
    }
}
#analyzing the last transcript sequence
$transcript_seq =~ s/\s//g; #remove any whitespace from the sequence
$transcript_seq = uc($transcript_seq); #upper-case the sequence in case it was given in lower case
#finding the virtual tags in the transcript sequence
my @tags_id = Find_Tags($transcript_seq,$tags_nb,$enzyme,$restriction_site);
#$tags_nb currently holds the next free identifier; turn it into the identifier of the last tag extracted
if (@tags_id)
{
    $tags_nb = $tags_id[-1];
}
else
{
    #no tag found in the last transcript (assuming Find_Tags numbers tags sequentially from $tags_nb)
    $tags_nb--;
}
#analyzing transcript sequence
Analyze_Transcript($transcript_seq,$transcript_line_desc,$transcripts_nb,$transcript_source);
#connection between virtual tags and transcript
foreach my $id (@tags_id)
{
    print CONNECTION $id."\t".$transcripts_nb."\n";
}
#closing of files to load into corresponding tables
close TAGS;
close CONNECTION;
close TRANSCRIPTS;
close TRANSCRIPTS_FILE;
#printing number of virtual tags and transcripts in files to load into corresponding tables
print "$tags_nb\n$transcripts_nb\n";
exit;