Hello,

I am desperately trying to parse the text below. My goal is for each element enclosed in {{ }} to become one item of a list.

I am using the findall function for this, but without success.

Code:
re.findall("{{ ?.*|\s* ?}}",snp_page)
Thanks in advance for your help.

The text to parse:

Code:
{{Rsnum
|rsid=1799966
|Gene=BRCA1
|Chromosome=17
|position=43071077
|Orientation=minus
|ReferenceAllele=A
|MissenseAllele=G
|GMAF=0.3274
|Assembly=GRCh38
|GenomeBuild=38.1
|dbSNPBuild=141
|geno1=(A;A)
|geno2=(A;G)
|geno3=(G;G)
|Gene_s=BRCA1
}}{{ population diversity
| geno1=(A;A)
| geno2=(A;G)
| geno3=(G;G)
| CEU | 45.5 | 46.4 | 8.2
| HCB | 46.3 | 41.9 | 11.8
| JPT | 53.1 | 38.9 | 8.0
| YRI | 63.3 | 35.4 | 1.4
| ASW | 50.9 | 45.6 | 3.5
| CHB | 46.3 | 41.9 | 11.8
| CHD | 29.4 | 55.0 | 15.6
| GIH | 31.6 | 50.0 | 18.4
| LWK | 62.7 | 32.7 | 4.5
| MEX | 41.4 | 43.1 | 15.5
| MKK | 60.9 | 35.9 | 3.2
| TSI | 41.2 | 51.5 | 7.2
| HapMapRevision=28
}}
This SNP, a variant in the [[BRCA1]] gene, is 1 of 25 SNPs reported to represent independently minor, but cumulatively significant, increased risk for [[breast cancer]]. {{PMID|17341484}}

For details of all 25 SNPs in this group, along with the two methods used to calculate overall risk estimates for [[breast cancer]], refer to the SNPedia [[breast cancer]] entry.

For this particular SNP, the risk (minor) allele is (G).

Code:
{{ClinVar
|rsid=1799966
|Reversed=1
|FwdREF=A
|FwdALT=G,T
|REF=T
|ALT=A,C
|RSPOS=41223094
|CHROM=17
|GMAF=0.3274
|dbSNPBuildID=89
|SSR=0
|SAO=0
|VP=0x05016800000017051f100101
|GENEINFO=BRCA1:672
|GENE_NAME=BRCA1
|GENE_ID=672
|WGT=0
|VC=SNV
|CLNALLE=1; 2
|CLNHGVS=NC_000017.10:g.41223094T>A; NC_000017.10:g.41223094T>C
|CLNSIG=2
|CLNCUI=
|CLNACC=
RCV000031194.2; RCV000048673.2; RCV000034753.1; RCV000048672.2
|Tags=RV;PM;PMC;SLO;VLD;G5A;G5;HD;GNO;KGPhase1;KGPilot123;KGPROD;OTHERKG;PH3;LSD
|CAF=0.6726; 0.3274
|CLNDBN=Breast-ovarian cancer, familial 1; Familial cancer of breast; not provided
|CLNDSDB=GeneReviews:MedGen:OMIM:Orphanet; GeneReviews:MedGen:OMIM:SNOMED_CT
|CLNDSDBID=NBK1247:C2676676:604370:145; NBK1247:C0346153:114480:254843006
|COMMON=1
|Disease=Breast-ovarian cancer; Familial cancer of breast; not provided
}}
 
{{PMID Auto
|PMID=18559551
|Title=Pathway analysis of single-nucleotide polymorphisms potentially associated with glioblastoma multiforme susceptibility using random forests.
}}
 
{{GET Evidence
|gene=BRCA1
|aa_change=Ser1634Gly
|aa_change_short=S1634G
|impact=not reviewed
|qualified_impact=Insufficiently evaluated not reviewed
|inheritance=unknown
|quality_scores=Array
|dbsnp_id=rs1799966
|overall_frequency_n=3203
|overall_frequency_d=10758
|overall_frequency=0.297732
|n_genomes=3
|n_genomes_annotated=0
|n_haplomes=3
|n_articles=0
|n_articles_annotated=0
|qualityscore_in_silico=1
|qualitycomment_in_silico=Y
|gene_in_genetests=Y
|genetests_testable=Y
|genetests_reviewed=Y
|nblosum100=2
|autoscore=2
|webscore=N
}}
 
{{on chip | 23andMe v1}}
{{on chip | 23andMe v2}}
{{on chip | 23andMe v3}}
{{on chip | 23andMe v4}}
{{on chip | FTDNA2}}
{{on chip | FTDNA}}
{{on chip | Illumina Human 1M}}
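
A possible direction, offered as a sketch rather than a definitive answer: in the findall pattern in this post, the top-level `|` splits the expression into two separate alternatives (`{{ ?.*` or `\s* ?}}`), and `.` does not match newlines by default, so no single match can run from an opening `{{` to its closing `}}`. The example below uses a non-greedy group with `re.DOTALL` instead. It assumes the templates are never nested (true of the text shown here); the shortened `snp_page` sample and the names `TEMPLATE_RE` and `templates` are illustrative only.

```python
import re

# Shortened, hypothetical stand-in for snp_page, built from the text above.
snp_page = """{{Rsnum
|rsid=1799966
|Gene=BRCA1
}}{{ population diversity
| geno1=(A;A)
| HapMapRevision=28
}}
This SNP is a variant in the [[BRCA1]] gene. {{PMID|17341484}}

{{on chip | 23andMe v1}}
{{on chip | FTDNA}}
"""

# \{\{ and \}\} match the literal delimiters.
# (.*?) is non-greedy, so each match stops at the first closing }}.
# re.DOTALL lets . match newlines, so multi-line templates are captured.
# Assumption: templates are not nested inside one another.
TEMPLATE_RE = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)

# One list item per template, without the braces or surrounding whitespace.
templates = [body.strip() for body in TEMPLATE_RE.findall(snp_page)]

for item in templates:
    print(repr(item))
```

If the page can contain nested templates, a regular expression like this one would cut the outer template short at the first inner `}}`, and a dedicated wikitext parser would probably be the safer choice.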