[XML] Improving the parsing

Hello everyone!

Here's my little weekend problem (have a good weekend, everyone :D). Here is an XML file I want to parse:

Code:

<protein_group group_number="2" probability="1.00">
      <protein protein_name="UniRef100_Q8IUK7" n_indistinguishable_proteins="3" probability="1.00" percent_coverage="3.8" unique_stripped_peptides="KVPQVSTPTLVEVSR" group_sibling_id="a" total_number_peptides="2">
         <annotation protein_description="Similar to serum albumin precursor [Homo sapiens]"/>
         <indistinguishable_protein protein_name="UniRef100_P02768">
            <annotation protein_description="Serum albumin precursor [Homo sapiens]"/>
         </indistinguishable_protein>
         <indistinguishable_protein protein_name="UniRef100_Q86YG0">
            <annotation protein_description="Similar to alpha-fetoprotein [Homo sapiens]"/>
         </indistinguishable_protein>
         <peptide peptide_sequence="KVPQVSTPTLVEVSR" charge="2" initial_probability="1.00" nsp_adjusted_probability="1.00" peptide_group_designator="a" weight="1.00" is_nondegenerate_evidence="Y" n_tryptic_termini="2" n_sibling_peptides="0.99" n_sibling_peptides_bin="3" n_instances="1" is_contributing_evidence="Y">
         </peptide>
         <peptide peptide_sequence="KVPQVSTPTLVEVSR" charge="3" initial_probability="0.99" nsp_adjusted_probability="1.00" peptide_group_designator="a" weight="1.00" is_nondegenerate_evidence="Y" n_tryptic_termini="2" n_sibling_peptides="1.00" n_sibling_peptides_bin="3" n_instances="1" is_contributing_evidence="Y">
         </peptide>
      </protein>
</protein_group>

And here is my code that parses this XML file:

Code:

use XML::Parser;
 
# initialize the parser
my $parser = XML::Parser->new( Handlers => 
{
Start=>\&handle_start
});
$url = "./interact-prot.xml";
$parser->parsefile( $url );
open FILE, ">>Results.txt" or die "Cannot open Results.txt: $!";
print FILE "\nURL:$url";
 
my @stack;
 
sub handle_start {
	my(@stack);
	my($name, $desc, $val);
	my( $expat, $protein, %attrs ) = @_;
	push( @stack, { protein_group=>$protein});
	if( %attrs ) {
		while( my( $key, $value ) = each( %attrs )) {
			if ($key eq "group_number") {
				$val = $value;
				open FILE, ">>Results.txt" or die "Cannot open Results.txt: $!";
				print FILE "\nGroup number: $val\n";
			}		
		}
		while( my( $key, $value ) = each( %attrs )) {
			if ($key eq "protein_name") {
				$name = $value;
				open FILE, ">>Results.txt" or die "Cannot open Results.txt: $!";
				print FILE "AccessionNumber: $name\n";
			}	
		}
		while( my( $key, $value ) = each( %attrs )) {
			if ($key eq "probability") {
				$val = $value;
				open FILE, ">>Results.txt" or die "Cannot open Results.txt: $!";
				print FILE "probability=$val\t";
			}
		}
		while( my( $key, $value ) = each( %attrs )) {
			if ($key eq "percent_coverage") {
				$val = $value;
				open FILE, ">>Results.txt" or die "Cannot open Results.txt: $!";
				print FILE "percent_coverage=$val\t";
			}
		}
		while( my( $key, $value ) = each( %attrs )) {
			if ($key eq "unique_stripped_peptides") {
				$val = $value;
				open FILE, ">>Results.txt" or die "Cannot open Results.txt: $!";
				print FILE "unique_stripped_peptides=$val\n";
			}
		}	
		while( my( $key, $value ) = each( %attrs )) {
			if ($key eq "peptide_sequence") {
				$val = $value;
				open FILE, ">>Results.txt" or die "Cannot open Results.txt: $!";
				print FILE "peptide_sequence=$val\t";
			}
		}		
		while( my( $key, $value ) = each( %attrs )) {
			if ($key eq "charge") {
				$val = $value;
				open FILE, ">>Results.txt" or die "Cannot open Results.txt: $!";
				print FILE "charge=$val\t";
			}
		}		
		while( my( $key, $value ) = each( %attrs )) {
			if ($key eq "initial_probability") {
				$val = $value;
				open FILE, ">>Results.txt" or die "Cannot open Results.txt: $!";
				print FILE "initial_probability=$val\n";
			}
		}	
	}	
}

As you'll have noticed, I'm only after a few pieces of information. However, I'm getting redundant output, particularly for the probability attributes. I would also like to distinguish the protein_name attribute of the protein tag from the protein_name attribute of the indistinguishable_protein tag. Do you see how I could do that?
Thanks in advance for your help, and sorry for the long post.
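Both problems raised here (the same probability attribute appearing on several tags, and protein_name meaning different things under protein and indistinguishable_protein) come from looking at attribute names without the element they belong to. A minimal sketch, assuming XML::Parser is installed: key a table of wanted attributes by element name, and print each field prefixed by its element. The %wanted table and the summarize_xml helper are hypothetical names for this illustration; the attribute lists are simply the ones the code above looks for.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical table: the attributes wanted for each element, in print order.
# Keying by element keeps <protein_group probability=...> apart from
# <protein probability=...>, and the protein_name of <protein> apart from
# the protein_name of <indistinguishable_protein>.
my %wanted = (
    protein_group             => [qw(group_number probability)],
    protein                   => [qw(protein_name probability percent_coverage unique_stripped_peptides)],
    indistinguishable_protein => [qw(protein_name)],
    peptide                   => [qw(peptide_sequence charge initial_probability)],
);

# Returns one line per wanted element, e.g. "protein_group: group_number=2 probability=1.00".
# $source may be an XML string or an open filehandle (XML::Parser's parse() accepts both).
sub summarize_xml {
    my ($source) = @_;
    my @lines;
    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ( $expat, $element, %attrs ) = @_;
                my $keys = $wanted{$element} or return;    # ignore other tags
                # Walk the fixed list, so output order never depends on hash order
                my @fields = map  { "$_=$attrs{$_}" }
                             grep { defined $attrs{$_} } @$keys;
                push @lines, "$element: @fields" if @fields;
            },
        }
    );
    $parser->parse($source);
    return @lines;
}

# Usage (hypothetical script name): perl summarize.pl interact-prot.xml
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for summarize_xml($in);
    close $in;
}
```

Iterating over a fixed list per element, rather than over `each %attrs`, also means each line always has its fields in the same order.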

See you!

How about something like this:

Code:

#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

my $url = "./interact-prot.xml";

# Open the results file once, instead of reopening it for every element
open my $out, '>>', 'Results.txt' or die "Cannot open Results.txt: $!";

# initialize the parser
my $parser = XML::Parser->new( Handlers =>
{
Start=>\&handle_start
});
$parser->parsefile( $url );
print {$out} "\nURL:$url";
close $out;

sub handle_start {
    # Expat passes the element name, then the attributes as a flat
    # (name, value, name, value, ...) list in the order they appear in the tag.
    # Walking that list, instead of "keys %attrs" (random order), keeps the
    # "\t" and "\n" separators below in the same order as in the file.
    my( $expat, $element, @attr_pairs ) = @_;
    while ( my( $key, $val ) = splice( @attr_pairs, 0, 2 ) ) {
        if ($key eq "group_number") {
            print {$out} "\nGroup number: $val\n";
        }
        elsif ($key eq "protein_name" and $element eq 'protein') {
            # for the 'protein' tag
            print {$out} "AccessionNumber: $val\n";
        }
        elsif ($key eq "protein_name" and $element eq 'indistinguishable_protein') {
            # for the 'indistinguishable_protein' tag
            print {$out} "Indistinguishable AccessionNumber: $val\n";
        }
        elsif ($key eq "probability") {
            print {$out} "probability=$val\t";
        }
        elsif ($key eq "percent_coverage") {
            print {$out} "percent_coverage=$val\t";
        }
        elsif ($key eq "unique_stripped_peptides") {
            print {$out} "unique_stripped_peptides=$val\n";
        }
        elsif ($key eq "peptide_sequence") {
            print {$out} "peptide_sequence=$val\t";
        }
        elsif ($key eq "charge") {
            print {$out} "charge=$val\t";
        }
        elsif ($key eq "initial_probability") {
            print {$out} "initial_probability=$val\n";
        }
    }
}
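Comparing the element name, as in the reply above, works for protein_name because the two attributes sit on differently named tags. It is not enough for the annotation tags in the sample file: the same tag appears under both protein and indistinguishable_protein, so only its parent tells them apart. That seems to be what the @stack in the first post was aiming for, but it is declared with `my(@stack)` inside handle_start, so it starts out empty on every call. A sketch, assuming XML::Parser, that keeps the stack outside the handlers, pushing on Start and popping on End; annotations_by_parent is a hypothetical helper name.

```perl
use strict;
use warnings;
use XML::Parser;

# Returns [parent_element, protein_description] for each <annotation>.
# $source may be an XML string or an open filehandle.
sub annotations_by_parent {
    my ($source) = @_;
    my ( @stack, @found );    # @stack holds the names of currently open elements
    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ( $expat, $element, %attrs ) = @_;
                if ( $element eq 'annotation' && @stack ) {
                    # $stack[-1] is the element that encloses this <annotation>
                    push @found, [ $stack[-1], $attrs{protein_description} // '' ];
                }
                push @stack, $element;    # this element is now open
            },
            End => sub { pop @stack },    # closing tag: it is no longer open
        }
    );
    $parser->parse($source);
    return @found;
}

# Usage (hypothetical script name): perl annotations.pl interact-prot.xml
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_->[0]\t$_->[1]\n" for annotations_by_parent($in);
    close $in;
}
```

Self-closing tags such as `<annotation .../>` fire both Start and End, so the stack stays balanced. XML::Parser::Expat also documents a `current_element` method that, inside a Start handler, returns the parent element's name; it may let you drop the hand-made stack, but check its documentation for your version.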

:D

--
Jedaï

17/07/2004, 23:01
GLDavid

Hi Jedaï, I can't wait to test your code, because this is a great present (I don't have Internet yet in my cabin in Canada, tucked away deep in the woods, with squirrels on the doorstep...). So, roll on Monday!
And I thank you very warmly with a Maudite :ccool: (that's Robert Charlebois's beer).

See you!
19/07/2004, 17:46
GLDavid

What else can I say but:
:merci: :hola: :bravo: :ave:

See you!