Text parser - code improvement

**yapasmieux** · 15/11/2020, 21h35

Hello,

I have just finished one of my first object-oriented Python programs, and I am posting it here to get your feedback on possible improvements.
The code works correctly, but I think there are many aspects of Python that I have not mastered, and I would like more experienced people to suggest more elegant solutions.

I want to insert an ID in front of each title in a block of titles whose hierarchy is expressed by tab indentation:
WBS of the Project
\t Design
\t\t Preliminary Design Review
\t\t\t Documentation milestone 1
\t\t\t\t Documentation package drafting
\t\t\t PDR Meeting
\t\t\t\t PDR Report
\t\t Critical Design Review
\t\t\t Documentation milestone 2
\t\t\t\t Documentation package drafting
\t\t\t Final CDR Meeting
\t\t\t\t Final CDR Report
\t\t\t Documentation milestone 3
\t\t\t\t Documentation package drafting
\t HW & SW Development
\t\t Software
\t\t\t Software Specification
\t\t\t\t Spec 1
\t\t\t\t Spec 2
\t\t\t\t Spec 3

I want to keep the tab indentation of each title but put a number in front of it (1.1, 1.1.2, and so on):

1 WBS of the Project
\t 1.1 Design
\t\t 1.1.1 Preliminary Design Review
\t\t\t 1.1.1.1 Documentation milestone 1
\t\t\t\t 1.1.1.1.1 Documentation package drafting
\t\t\t 1.1.1.2 PDR Meeting
\t\t\t\t 1.1.1.2.1 PDR Report
\t\t 1.1.2 Critical Design Review
\t\t\t 1.1.2.1 Documentation milestone 2
\t\t\t\t 1.1.2.1.1 Documentation package drafting
\t\t\t 1.1.2.2 Final CDR Meeting
\t\t\t\t 1.1.2.2.1 Final CDR Report
\t\t\t 1.1.2.3 Documentation milestone 3
\t\t\t\t 1.1.2.3.1 Documentation package drafting
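
The numbering rule illustrated above can be sketched with one counter per indentation level. This is only a minimal sketch, separate from the script posted below: `number_outline` is a hypothetical name, and it assumes that the level of a title is the number of leading tab characters on its line.

```python
def number_outline(lines):
    """Prefix each tab-indented title with a dotted hierarchical ID."""
    counters = []  # counters[i] is the current number at depth i
    numbered = []
    for line in lines:
        title = line.strip()
        if not title:
            continue  # skip blank lines
        # Assumption: depth is the number of leading tab characters
        depth = len(line) - len(line.lstrip("\t"))
        # Forget the counters of deeper levels
        del counters[depth + 1:]
        # Make sure a counter exists for this depth (zeros fill skipped levels)
        while len(counters) <= depth:
            counters.append(0)
        counters[depth] += 1
        label = ".".join(str(n) for n in counters)
        numbered.append("\t" * depth + label + " " + title)
    return numbered


sample = ["WBS of the Project", "\tDesign", "\t\tPreliminary Design Review",
          "\t\tCritical Design Review", "\tHW & SW Development"]
for numbered_line in number_outline(sample):
    print(numbered_line)
```

Because each level keeps an integer counter, numbers above 9 (for example 1.10) need no special handling.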

Code:

# -*- coding: utf-8 -*-
# Python 3 only
 
"""
##################         Algorithm       #########################
#      Original         |    Count   |    ID        |     len(ID)  #
#Project                |    0       |     1        |       1      #
#   HW                  |    1       |     1.1      |       3      #
#       Specification   |    2       |     1.1.1    |       5      #
#   SW                  |    1       |     1.2      |       3      #
#       Specification   |    2       |     1.2.1    |       5      #
#       Code            |    2       |     1.2.2    |       5      #
#           Validation  |    3       |     1.2.2.1  |       7      #
#   Qualification       |    1       |     1.3      |       3      #
####################################################################
 
"""
 
import os
import re
import sys
 
def tabCreator(nb_tab, sentence):
    """
    Return the sentence prefixed with nb_tab tab characters
    """
    temp_sentence = sentence
    for tab in range(nb_tab):
        temp_sentence = "\t" + temp_sentence
    return temp_sentence
 
def calculID(count_tab, ID):
    """
    Function that gives an ID to each title depending on its level of hierarchy
    """    
    # if the title is deeper than the previous ID, open a new sub-level with ".1"
    # (these length tests assume every number in the ID is a single digit)
    if 2*count_tab+1 > len(str(ID)):
        ID = ID + ".1"
 
    # otherwise, if the title is at the same level as the previous ID, increment the last number
    elif 2*count_tab +1 == len(str(ID)):
        ID = ID[:-1] + str(int(ID[-1])+1)
 
    # otherwise the title is at a shallower level: cut the ID back to that level and increment it
    else:
        ID = ID[0:count_tab*2] + str(int(ID[count_tab*2])+1)
 
    return ID
 
class WBS:
    """
    Class defining an object for manipulating a WBS document
    """
    def __init__(self, origin_file='WBS.txt'):
        self.original_content = open(origin_file,'r', encoding="utf8")
        self.final_file = open('GENERATED_WBS.txt','w', encoding="utf8")
        self.l_final = []
        self.count_tab = 0
        self.ID = "0"        
 
    def parseOrigin(self):
        """
        Parse the original content:
        strip the newline character and
        prefix each title with its hierarchy number
        """
 
        for each_line in self.original_content:
 
            # count the tabs in the line to get its hierarchy level
            self.count_tab = each_line.count("\t")
 
            # compute the ID of the title from its level and the previous ID
            self.ID = calculID(self.count_tab, self.ID)
 
            # remove the newline character from the line
            temp_buffer = each_line.replace('\n','')   
 
            # split the line on runs of tabs; the last element is the title text
            temp_buffer = re.compile("\t+").split(temp_buffer)
 
            # concatenate the ID and the title
            temp_buffer[-1] = self.ID + " " + temp_buffer[-1]
 
            # put the original number of tabs back in front of the ID
            final_sentence = tabCreator(self.count_tab, temp_buffer[-1])
 
            # append the result to the final list
            self.l_final.append(final_sentence)
 
    def writeFinalFile(self):
        """
        Function to write l_final into final file
        """
        for each_line in self.l_final:
            print(each_line,file=self.final_file)
 
    def closeFiles(self):
        """
        Function to close all files
        """
        self.original_content.close()
        self.final_file.close()
 
if __name__ == '__main__':
    """Main function to call python patch_anki.py"""
    os.chdir('.')
    try:
        # sys.argv[1] lets the user pass the input file as an argument
        wbs = WBS(sys.argv[1])
    except IndexError :
        wbs = WBS()
    except FileNotFoundError:
        print("file not found")
        sys.exit(1)
 
    wbs.parseOrigin()
    wbs.writeFinalFile()
    wbs.closeFiles()
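
One point that may be worth checking in `calculID` above: it infers the level from the length of the ID string, which only holds while every number is a single digit (after "1.9" the ID becomes "1.10", which is four characters long). A possible variant, shown here as a hedged sketch under the hypothetical name `next_id`, splits the ID on "." instead. It assumes the same initial value "0" that `WBS.__init__` uses.

```python
def next_id(count_tab, previous_id):
    """Return the ID that follows previous_id for a title indented by count_tab tabs."""
    # "0" is the starting value used before the first title (as in WBS.__init__)
    if previous_id == "0":
        parts = []
    else:
        parts = [int(p) for p in previous_id.split(".")]

    if count_tab >= len(parts):
        # Deeper than the previous title: open a new sub-level
        parts.append(1)
    else:
        # Same level or shallower: cut back to this level and increment it
        parts = parts[:count_tab + 1]
        parts[-1] += 1
    return ".".join(str(p) for p in parts)


current = "0"
for tabs in (0, 1, 2, 2, 1):
    current = next_id(tabs, current)
    print(current)  # prints 1, 1.1, 1.1.1, 1.1.2, 1.2 on successive lines
```

For the sample WBS this should give the same IDs as `calculID`, while also continuing correctly past nine siblings (`next_id(1, "1.9")` returns "1.10").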

Thank you in advance.
