Projet lycée décodage code césar

Bonjour,

Je suis en première avec un spécialité nsi (programmation) donc je débute sur python,

j'ai un projet de cryptographie qui consiste a décrypté un code césar sans savoir le décalage soit je doit brute force avec toute les possibilité et de trouvé la quel est la plus française soit j'utilise la fréquence d'apparition des lettres en français via wikipedia,

le problème c'est que le résultat trouvé n'est pas du tout français soit je pense que le problème viens de mes calcules mais je ne trouve pas le problème c'est pour ça que je vous demande votre aide.

Code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
 
import csv
alphabet=["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
###decodage code césar###
def decodage_césar(texte,decalage:int):
    nv=""
    global alphabet
    for lettre in texte:
        if lettre in alphabet:
            ind=alphabet.index(lettre)
            ind=ind-decalage
            if ind<0:
                ind=26+ind
                print(ind,len(texte))
            lettre=alphabet[ind]
            nv+=lettre
        else:
            nv+=lettre
 
    return nv
 
 
 
 
def csv(nomFichier):
    """Fonction qui importe un fichier csv sous la forme d'une liste de
    tuples en sautant la ligne d'entêtes"""
    freq = {}
    csvfile = open(nomFichier, 'r', newline='')
    tableReader = csv.reader(csvfile, delimiter=';')
    tableReader.__next__()    # permet de sauter une ligne (la ligne d'entête lorsqu'elle existe)
    for row in tableReader:
        freq.append({row[0]:row[1]})
    return freq
 
def test(texte):
    écart=0
    freq_fr={"a":7.11,"b":1.14,"c":3.18,"d":3.67,"e":12.10,"f":1.11,"g":1.23,"h":1.11,"i":6.59,"j":0.34,"k":0.29,"l":4.96,"m":2.62,"n":6.39,"o":5.02,"p":2.49,"q":0.65,"r":6.07,"s":6.51,"t":5.92,"u":4.49,"v":1.11,"w":0.17,"x":0.38,"y":0.46,"z":0.15}
    #frequence d'apparition de lettre en français en pourcentage
    freq_texte={"a":0,"b":0,"c":0,"d":0,"e":0,"f":0,"g":0,"h":0,"i":0,"j":0,"k":0,"l":0,"m":0,"n":0,"o":0,"p":0,"q":0,"r":0,"s":0,"t":0,"u":0,"v":0,"w":0,"x":0,"y":0,"z":0}
    #frequence d'apparition de lettre dans le texte
    ban=[" ","'","(",")",",","?","1","2","3","4","5","6","7","8","9",'"',"’",".","-","_","!","\n","«","»"]
    #les caractère/nombre bannies pour le test
    for lettre in texte:
        if lettre not in ban:
            freq_texte[lettre]+=1
            #on ajoute +1 a chaque fois que la lettre est dans le texte
    for i in alphabet:
        freq_texte[i]=freq_texte[i]/len(texte)*100
        #on calcule le pourcantage d'apparition
    for i in alphabet:
        écart=freq_texte[i]/freq_fr[i]
        #on calcule la marge d'erreur 
    return (écart/len(freq_texte))*-1
    #on le met en negatif pour un trie + efficace
 
def detection(texte):
    compatibilité=[]
    for i in range(len(alphabet)+1):
        compatibilité.append((test(decodage_césar(texte,i)),i))
    compatibilité.sort()
    #on ajoute le resultat de toute les possibilité et on trie la liste 
    #ainsi la meilleur possibilité est la première
    compatibilité=int(compatibilité[0][1])
    #on prend le decalge pour décodé
    return decodage_césar(texte,compatibilité)
 
texte="""papc ijgxcv thi jc bpiwtbpixrxtc, rgneidadvjt, exdccxtg st a’xcudgbpixfjt, st a’xcitaaxvtcrt pgixuxrxtaat ti st ap bdgewdvtctht tc qxdadvxt. papc bpiwxhdc ijgxcv cpxi at 23 yjxc 1912 p adcsgth. hdc etgt thi psbxcxhigpitjg rdadcxpa p bpsgph tc xcst. pj bpaqdgdjvw rdaatvt, hth sdch wdgh sj rdbbjc hdci kxit gtrdccjh. psdathrtci gtugpripxgt p ap rjaijgt raphhxfjt, hdc vdji edjg ath hrxtcrth ti hdc wdbdhtmjpaxit at bpgvxcpaxhtci p ap hwtgqdgct hrwdda. p 17 pch, ijgxcv rdbegtcs ath gtrwtgrwth s’txchitxc, bpxh rdccpxi sth trwtrh gtetith p hth tmpbtch. tc 1931, xa xcitvgt at zxcv’h rdaatvt st a’jcxktghxit st rpbqgxsvt dj xa igdjkt jc bxaxtj eajh upkdgpqat edjg tijsxtg ath bpiwtbpixfjth.sxeadbt tc 1933, papc ijgxcv p aj ath ejqaxrpixdch st zjgi vosta ti st kdc ctjbpcc hjg at udcstbtci sth bpiwtbpixfjth fjx xcuajtcrtci hth igpkpjm,  vgprt pjmfjtah xa dqixtci jct qdjght s’tchtxvcpci rwtgrwtjg tc 1935. xa gthdji at egdqatbt st ap strxhxdc stuxcx epg spkxs wxaqtgi, ejxh h’xchrgxi tc sdridgpi st advxfjt bpiwtbpixfjt p a’jcxktghxit st egxcrtidc hdjh ap sxgtrixdc s’padcod rwjgrw, jc advxrxtc pbtgxrpxc fjx thi pggxkt pjm btbth rdcrajhxdch. xa stktadeet spch hp iwtht ap cdixdc s’wnetgrparja ti peejxt hp stbdchigpixdc hjg jct "bprwxct jcxktghtaat", jc dgsxcpitjg fjx thi tcrdgt jct pqhigprixdc. pavdgxiwbth, rparjapqxaxit - ath rdcrteih st ap egdvgpbbpixdc hdci edhth ti ap kdxt thi djktgit pjm dgsxcpitjgh egdvgpbbpqath. tc 1939, papc ijgxcv gtkxtci tchtxvctg p rpbqgxsvt dj xa phhxhit pj rdjgh st lxiivtchitxc fjx egtitcs sxhhdjsgt ap advxfjt bpiwtbpixfjt ! ap htrdcst vjtggt bdcsxpat trapit ti ijgxcv h’tcvpvt spch a’pgbtt qgxipccxfjt dj xa igpkpxaat p qatirwatn epgz pj strwxuugtbtci sth bthhpvth st ap bpgxct paatbpcst. hdc peedgi thi thhtcixta, rpg ijgxcv pbtaxdgt ap "qdbqt" edadcpxht, jc sxhedhxixu fjx etgbti st strwxuugtg at rdst sth bprwxcth tcxvbp jixaxhtth epg at 
 
rdbbpcstbtci cpox. pegth ap vjtggt, papc ijgxcv igpkpxaat pj cpixdcpa ewnhxrpa apqdgpidgn ti rdcrdxi jc egdidinet st rparjapitjg tatrigdcxfjt, a’prt ( pjidbpixr rdbejixcv tcvxct), fjx egtcs sj gtipgs spch hp gtpaxhpixdc. pjhhx xa gtydxci a’jcxktghxit st bpcrwthitg fjx pkpxi rdchigjxi tc 1948 at egtbxtg dgsxcpitjg egdvgpbbpqat detgpixdccta, at bpgz1. ijgxcv epgixrxet p ap egdvgpbbpixdc ti ht ephhxdcct edjg a’xcitaaxvtcrt pgixuxrxtaat. xa tapqdgt jc ithi fjx kpaxst a’xcitaaxvtcrt s’jct bprwxct, at upbtjm "ithi st ijgxcv." epgpaatatbtci, xa h’xcitgthht pjm ewtcdbtcth st rgdxhhpcrt pcxbpat ti ktvtipat fjx at rdcsjxhtci pjm « higjrijgth st ijgxcv ». taj btbqgt st ap gdnpa hdrxtin, ijgxcv tegdjkt st vgpkth sxuuxrjaith fjpcs ap gtktapixdc st hdc wdbdhtmjpaxit egdkdfjt jc hrpcspat tc 1952. r’thi jc rgxbt edjg ap yjhixrt qgxipccxfjt. at egdrth at rdcspbct p ap egxhdc. edjg tkxitg a’tcutgbtbtci, papc ijgxcv thi rdcigpxci s’prrteitg ap rphigpixdc rwxbxfjt, bpxh hp rpggxtgt thi qgxhtt. xa ht hjxrxst p 42 pch tc rgdfjpci jct edbbt igtbett spch sj rnpcjgt at 7 yjxc 1954.
"""
#print(csv("freq_lettre_fr.csv")
print(detection(texte))

p.s: l'import de fichié csv ne marche pas je sais pas si c'est mon programme ou visual studio code

Salut,

Expliquez ces lignes de la fonction test:
Code:

1 2 3 for i in alphabet: écart=freq_texte[i]/freq_fr[i] #on calcule la marge d'erreur
et prenez le temps de décrire en français la méthode/algorithme que vous voulez utiliser.

note: votre truc est assez laborieux... Si le caractère "e" est le plus fréquent (en français), trouver le caractère le plus fréquent dans le texte et calculer de décalage en fonction de sa distance à 'e' devrait suffire.

Après vous pouvez bien sur calculer la fréquence de chaque texte produit par tous les décalages possibles et choisir celui qui aura la plus petite distance avec la fréquence de référence.... pour autant qu'on sache définir une distance entre fréquences!

- W

Merci de ton aide,

Enfaite mon calcule est mal fait j'avais mal réfléchie j'ai donc tout modifié et c'est bon
Code:

1 2 3 4 5 6 7 8 9 10 11 12 13 écart=[] for lettre in texte: if lettre not in ban: freq_texte[lettre]+=1 #on ajoute +1 a chaque fois que la lettre est dans le texte for i in alphabet: freq_texte[i]=freq_texte[i]/len(texte)*100 #on calcule le pourcantage d'apparition écart.append((freq_texte[i]-freq_fr[i])/freq_fr[i]) #on calcule la marge d'erreur return sum(écart)/len(écart) #on le met en negatif pour un trie + efficace
En soit j'ai fait qu'il met toute les marges d'erreurs dans la liste écart puis je fait une moyenne et ainsi je trouve le bon résultat en le metten en negatif pour qu'avec un .sort() la meilleur option soit la 1er .

Même si ce n'est pas le programme le plus optimisé ni le plus efficace j'ai fait avec mon niveau et mon temps.