Pandas: transposing a DataFrame column

**Eric_03** · 19/11/2020, 11h58

Hello,

I am trying to transpose a column so that I can reuse it as the `colspecs` argument of Pandas' `pd.read_fwf` function.

[Attached images: Pandas.png, Transpo.png]

`pd.read_fwf` works perfectly when the list is hard-coded. With the list built from df3, however, it fails. I suspect the transposition itself is already part of the problem.

Thanks for your suggestions

Eric

**Eric_03** · 19/11/2020, 13h50

[Attached image: Transpo.png]

I managed to transpose the column, but I am still stuck on the next step.

[Attached image: Pandas.png]

Eric

**wiztricks** · 19/11/2020, 16h46

Hi,

Posting images is fine, but it does not help anyone who wants to reproduce the problem in order to understand it.

colspecs is a 2-dimensional structure.
Your array appears to have 3 dimensions.
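
To make the dimension count concrete, here is a minimal sketch with hypothetical values (not the poster's data). It compares a flat list of (start, stop) pairs with the same pairs wrapped in one extra container, using NumPy only to report the nesting depth.

```python
import numpy as np

# Hypothetical values, chosen only to compare nesting depths
flat = [(0, 4), (4, 14), (14, 18)]   # list of (start, stop) pairs
wrapped = [flat]                     # the same pairs inside one extra container

print(np.array(flat).ndim, np.array(flat).shape)        # 2 (3, 2)
print(np.array(wrapped).ndim, np.array(wrapped).shape)  # 3 (1, 3, 2)
```

The first form is the shape `colspecs` is expected to have; the second has one level too many.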

- W

**Eric_03** · 19/11/2020, 17h34

Hello,

Code :

import pandas as pd

Code :

def createList(name, n):
    result = {}
    for i in range(n):
        nameList = name + str(i)
        result[nameList] = []
    return result

Code :

# Read the XML file (PNS FI); the with block closes the file automatically
with open("xml2.xml", "r") as f:
    lines = f.read().splitlines()

# Number of lines in the file
NombredeLigne = len(lines)

# Number of fields per record: one <FixedColumn> element per field
NombreChampsParLigne = sum(1 for s in lines if s.startswith("<FixedColumn>"))

print(NombreChampsParLigne)

# Create one empty list per field, named list0, list1, ...
res = createList("list", NombreChampsParLigne)

Code :

# Extract the values from the XML file (PNS FI)

t = []
j = -1
debut = ""
verifFormat = "no"  # set to "yes" only while a <Date> field awaits its <Format>

for s in lines:

    # A <FixedColumn> element starts the description of a new field
    if s.startswith("<FixedColumn>"):
        debut = "OK"
        j += 1

    if debut == "OK":

        if s.startswith("<Name>"):
            s = s.replace("<Name>", "")
            s = s.replace("</Name>", "")
            t.append(s)

        if s.startswith("<Description>"):
            s = s.replace("<Description>", "")
            s = s.replace("</Description>", "")
            t.append(s)

        # Non-date types have no format: store "NaN" as a placeholder
        if s.startswith("<AlphaNumeric/>"):
            s = s.replace("<AlphaNumeric/>", "AlphaNumeric")
            verifFormat = "no"
            t.append(s)
            t.append("NaN")

        if s.startswith("<Numeric>"):
            s = s.replace("<Numeric>", "Numeric")
            verifFormat = "no"
            t.append(s)
            t.append("NaN")

        if s.startswith("<Date>"):
            s = s.replace("<Date>", "Date")
            t.append(s)
            verifFormat = "yes"

        if s.startswith("<Format>") and verifFormat == "yes":
            s = s.replace("<Format>", "")
            s = s.replace("</Format>", "")
            verifFormat = "no"
            t.append(s)

        if s.startswith("<From>"):
            s = s.replace("<From>", "")
            s = s.replace("</From>", "")
            t.append(s)

        # <To> is the last value of a field: store the row in list j and reset t
        if s.startswith("<To>"):
            s = s.replace("<To>", "")
            s = s.replace("</To>", "")
            t.append(s)

            valList = "list" + str(j)
            res[valList].extend(t)
            del t[:]

my_dataframe = pd.DataFrame.from_dict(res, orient='index')
my_dataframe

Output:

Code :

 	0 	1 	2 	3 	4 	5
list0 	BKPF-BUKRS 	Company Code 	AlphaNumeric 	NaN 	1 	4
list1 	BKPF-BELNR 	Accounting Document Number 	AlphaNumeric 	NaN 	5 	14
list2 	BKPF-GJAHR 	Fiscal Year 	Numeric 	NaN 	15 	18
list3 	BKPF-BLART 	Document Type 	AlphaNumeric 	NaN 	19 	20
list4 	BKPF-BLDAT 	Document Date in Document 	Date 	YYYYMMDD 	21 	28
... 	... 	... 	... 	... 	... 	...
list110 	BSIS-FIPEX 	Commitment item - Do not use field - see note ... 	AlphaNumeric 	NaN 	1138 	1161
list111 	BSIS-PRODPER 		AlphaNumeric 	NaN 	1162 	1167
list112 	BSIS-QSSKZ 	Withholding Tax Code 	AlphaNumeric 	NaN 	1168 	1169
list113 	BSIS-PROPMANO 	Mandate, Mandate-Opening Contract 	AlphaNumeric 	NaN 	1170 	1182
list114 	SKB1-XOPVW 	Indicator: Open Item Management? 	AlphaNumeric 	NaN 	1183 	1183
 
115 rows × 6 columns

Code :

# Convert columns 4 and 5 of the dataframe (from the XML) to integers
my_dataframe[4] = my_dataframe[4].astype(int)
my_dataframe[5] = my_dataframe[5].astype(int)

# Subtract 1 from column 4 so the values can be used to split the PNS FI file
my_dataframe[4] = my_dataframe[4] - 1

# Add a column 6 holding the (column 4, column 5) tuple of each row
my_dataframe[6] = my_dataframe[[4, 5]].apply(tuple, axis=1)

my_dataframe

Output:

Code :

 	0 	1 	2 	3 	4 	5 	6
list0 	BKPF-BUKRS 	Company Code 	AlphaNumeric 	NaN 	0 	4 	(0, 4)
list1 	BKPF-BELNR 	Accounting Document Number 	AlphaNumeric 	NaN 	4 	14 	(4, 14)
list2 	BKPF-GJAHR 	Fiscal Year 	Numeric 	NaN 	14 	18 	(14, 18)
list3 	BKPF-BLART 	Document Type 	AlphaNumeric 	NaN 	18 	20 	(18, 20)
list4 	BKPF-BLDAT 	Document Date in Document 	Date 	YYYYMMDD 	20 	28 	(20, 28)
... 	... 	... 	... 	... 	... 	... 	...
list110 	BSIS-FIPEX 	Commitment item - Do not use field - see note ... 	AlphaNumeric 	NaN 	1137 	1161 	(1137, 1161)
list111 	BSIS-PRODPER 		AlphaNumeric 	NaN 	1161 	1167 	(1161, 1167)
list112 	BSIS-QSSKZ 	Withholding Tax Code 	AlphaNumeric 	NaN 	1167 	1169 	(1167, 1169)
list113 	BSIS-PROPMANO 	Mandate, Mandate-Opening Contract 	AlphaNumeric 	NaN 	1169 	1182 	(1169, 1182)
list114 	SKB1-XOPVW 	Indicator: Open Item Management? 	AlphaNumeric 	NaN 	1182 	1183 	(1182, 1183)
 
115 rows × 7 columns
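
The subtraction of 1 from column 4 is what turns the XML positions, which appear to be 1-based and inclusive, into the 0-based half-open intervals that `colspecs` uses. A small sketch with a made-up record (the record content is an assumption; the positions are those of the first two rows above) shows that the converted pairs behave like Python slices:

```python
# Hypothetical record: a 4-character company code followed by a 10-character document number
record = "1000" + "0000012345"

# 1-based, inclusive positions, as in the <From>/<To> values of the first two rows
from_to = [(1, 4), (5, 14)]

# 0-based, half-open intervals: only the start moves, the stop stays the same
colspecs = [(start - 1, stop) for start, stop in from_to]

fields = [record[start:stop] for start, stop in colspecs]
print(colspecs)  # [(0, 4), (4, 14)]
print(fields)    # ['1000', '0000012345']
```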

Code :

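# Wrap column 6 in its own dataframe and transpose it (1 row, 115 columns)
df3 = pd.DataFrame(data=my_dataframe[6])

df3_transposed = df3.T

Listcolspecs = df3_transposed

# dd = list(Listcolspecs.values)
# l = m.tolist()
# .values is a 2-D array here, so list() yields a list holding a single array
a = list(Listcolspecs.values)
# l = a.tolist()
print(a)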

Code :

[array([(0, 4), (4, 14), (14, 18), (18, 20), (20, 28), (28, 36), (36, 38),
       (38, 46), (46, 52), (52, 60), (60, 68), (68, 80), (80, 100),
       (100, 116), (116, 132), (132, 142), (142, 152), (152, 156),
       (156, 181), (181, 182), (182, 187), (187, 207), (207, 211),
       (211, 212), (212, 222), (222, 234), (234, 254), (254, 274),
       (274, 275), (275, 283), (283, 291), (291, 299), (299, 311),
       (311, 313), (313, 373), (373, 379), (379, 419), (419, 451),
       (451, 455), (455, 465), (465, 473), (473, 483), (483, 501),
       (501, 505), (505, 515), (515, 518), (518, 526), (526, 534),
       (534, 539), (539, 555), (555, 557), (557, 559), (559, 561),
       (561, 562), (562, 566), (566, 568), (568, 571), (571, 586),
       (586, 601), (601, 616), (616, 631), (631, 681), (681, 697),
       (697, 709), (709, 713), (713, 723), (723, 731), (731, 732),
       (732, 740), (740, 741), (741, 756), (756, 771), (771, 777),
       (777, 782), (782, 783), (783, 798), (798, 813), (813, 828),
       (828, 843), (843, 858), (858, 873), (873, 874), (874, 882),
       (882, 892), (892, 893), (893, 894), (894, 909), (909, 910),
       (910, 913), (913, 923), (923, 943), (943, 947), (947, 962),
       (962, 977), (977, 992), (992, 995), (995, 1003), (1003, 1011),
       (1011, 1024), (1024, 1044), (1044, 1060), (1060, 1074),
       (1074, 1090), (1090, 1100), (1100, 1110), (1110, 1111),
       (1111, 1115), (1115, 1117), (1117, 1127), (1127, 1137),
       (1137, 1161), (1161, 1167), (1167, 1169), (1169, 1182),
       (1182, 1183)], dtype=object)]
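
The printed value is a list with a single element, an array holding all the tuples, rather than a list with one tuple per field. A possible way to get the flat form, sketched on a three-row stand-in for `my_dataframe` (the toy frame is an assumption; the column labels 4, 5 and 6 follow the thread), is to skip the transposition and call `Series.tolist()` on column 6:

```python
import pandas as pd

# Toy stand-in for columns 4 and 5 of my_dataframe (first three rows shown in the thread)
my_dataframe = pd.DataFrame(
    {4: [0, 4, 14], 5: [4, 14, 18]},
    index=["list0", "list1", "list2"],
)
my_dataframe[6] = list(zip(my_dataframe[4], my_dataframe[5]))

# What the code above does: transpose, then wrap the 2-D array in a list
a = list(pd.DataFrame(data=my_dataframe[6]).T.values)
print(len(a), len(a[0]))  # one outer element that holds all three tuples

# Flat alternative: one (start, stop) tuple per list element
colspecs = my_dataframe[6].tolist()
print(colspecs)
```

With the flat list, each element is a 2-element tuple, which is the form the hard-coded list has.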

Code :

# Read the PNS file (PNS FI)

# colspecs = [(0, 4), (4, 14), (14, 18), (18, 20), (20, 28), (28, 36), ......, (1182, 1183)]
# colspecs = [(0, 4), (4, 14), (14, 18), (18, 20), (20, 28), (28, 36)]
colspecs = Listheader

pd.read_fwf('FI.TXT', colspecs=colspecs, header=None)

The extension of the xml2 file must be renamed to xml, and the FI2 file must be renamed to FI.

Thanks
Eric

**wiztricks** · 19/11/2020, 18h42

Hi,

Code that lets us reproduce the problem means a few dozen lines that anyone can run after copying them, without modifying them.

Roughly, you should end up with something like this:

Code :

from io import StringIO

from pandas import read_fwf

data1 = """\
201158    360.242940   149.910199   11950.7
201159    444.953632   166.985655   11788.4
201160    364.136849   183.628767   11806.2
201161    413.836124   184.375703   11916.8
201162    502.953953   173.237159   12468.3
"""
colspecs = [(0, 4), (4, 8), (8, 20), (21, 33), (34, 43)]
df = read_fwf(StringIO(data1), colspecs=colspecs, header=None)
print(df)

In it, replace data1 with a few of your own records and colspecs with the object you want to use.
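
As an illustration of that kind of reproduction, here is a sketch with made-up records (not the poster's FI.TXT). It passes `read_fwf` first a flat list of tuples, then the same tuples held in an object array wrapped in a list. Whether the second form is exactly what the notebook builds is an assumption based on the printed output earlier in the thread.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Made-up records: a 4-character field followed by a 10-character field
data1 = "AAAAbbbbbbbbbb\nCCCCdddddddddd\n"

good = [(0, 4), (4, 14)]

# The same tuples stored in one object array, itself wrapped in a list
arr = np.empty(2, dtype=object)
arr[0] = (0, 4)
arr[1] = (4, 14)
bad = [arr]

df = pd.read_fwf(StringIO(data1), colspecs=good, header=None)
print(df)

error = None
try:
    pd.read_fwf(StringIO(data1), colspecs=bad, header=None)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```

The second call fails because the only element of `bad` is an array, not a 2-element tuple or list of integers.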

- W

**Eric_03** · 19/11/2020, 21h36

Good evening,

Thank you for the suggestion, but I still get the same problem when `result` is built with `result = list(records)`.

Code :

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-b63d84ab8fc7> in <module>
      3 #result = [(0, 4), (4, 14), (14, 18), (18, 20), (20, 28), (28, 36), (36, 38), (38, 46), (46, 52), (52, 60), (60, 68), (68, 80), (80, 100), (100, 116), (116, 132), (132, 142), (142, 152), (152, 156), (156, 181), (181, 182), (182, 187), (187, 207), (207, 211), (211, 212), (212, 222), (222, 234), (234, 254), (254, 274), (274, 275), (275, 283), (283, 291), (291, 299), (299, 311), (311, 313), (313, 373), (373, 379), (379, 419), (419, 451), (451, 455), (455, 465), (465, 473), (473, 483), (483, 501), (501, 505), (505, 515), (515, 518), (518, 526), (526, 534), (534, 539), (539, 555), (555, 557), (557, 559), (559, 561), (561, 562), (562, 566), (566, 568), (568, 571), (571, 586), (586, 601), (601, 616), (616, 631), (631, 681), (681, 697), (697, 709), (709, 713), (713, 723), (723, 731), (731, 732), (732, 740), (740, 741), (741, 756), (756, 771), (771, 777), (777, 782), (782, 783), (783, 798), (798, 813), (813, 828), (828, 843), (843, 858), (858, 873), (873, 874), (874, 882), (882, 892), (892, 893), (893, 894), (894, 909), (909, 910), (910, 913), (913, 923), (923, 943), (943, 947), (947, 962), (962, 977), (977, 992), (992, 995), (995, 1003), (1003, 1011), (1011, 1024), (1024, 1044), (1044, 1060), (1060, 1074), (1074, 1090), (1090, 1100), (1100, 1110), (1110, 1111), (1111, 1115), (1115, 1117), (1117, 1127), (1127, 1137), (1137, 1161), (1161, 1167), (1167, 1169), (1169, 1182), (1182, 1183)]
      4 
----> 5 pd.read_fwf('FI.TXT', colspecs=result, header=None)
 
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in read_fwf(filepath_or_buffer, colspecs, widths, infer_nrows, **kwds)
    780     kwds["infer_nrows"] = infer_nrows
    781     kwds["engine"] = "python-fwf"
--> 782     return _read(filepath_or_buffer, kwds)
    783 
    784 
 
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    446 
    447     # Create the parser.
--> 448     parser = TextFileReader(fp_or_buf, **kwds)
    449 
    450     if chunksize or iterator:
 
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
    878             self.options["has_index_names"] = kwds["has_index_names"]
    879 
--> 880         self._make_engine(self.engine)
    881 
    882     def close(self):
 
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
   1124                     '"python-fwf")'
   1125                 )
-> 1126             self._engine = klass(self.f, **self.options)
   1127 
   1128     def _failover_to_python(self):
 
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, f, **kwds)
   3659         self.colspecs = kwds.pop("colspecs")
   3660         self.infer_nrows = kwds.pop("infer_nrows")
-> 3661         PythonParser.__init__(self, f, **kwds)
   3662 
   3663     def _make_reader(self, f):
 
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, f, **kwds)
   2273         # Set self.data to something that can read lines.
   2274         if hasattr(f, "readline"):
-> 2275             self._make_reader(f)
   2276         else:
   2277             self.data = f
 
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _make_reader(self, f)
   3662 
   3663     def _make_reader(self, f):
-> 3664         self.data = FixedWidthReader(
   3665             f,
   3666             self.colspecs,
 
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, f, colspecs, delimiter, comment, skiprows, infer_nrows)
   3573                 and isinstance(colspec[1], (int, np.integer, type(None)))
   3574             ):
-> 3575                 raise TypeError(
   3576                     "Each column specification must be "
   3577                     "2 element tuple or list of integers"
 
TypeError: Each column specification must be 2 element tuple or list of integers

I am attaching the ipynb file as txt. The printed output shows that it is indeed a list of integers.
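
Printed output can hide the exact type of each element, so one way to find out what the check in the traceback objects to is to test every entry. The helper below is a hypothetical sketch (the name `find_bad_colspecs` is not part of pandas) that loosely mirrors the condition visible in the traceback: each entry must be a tuple or list of exactly two integers or `None`.

```python
import numbers


def find_bad_colspecs(colspecs):
    """Return (position, entry) pairs for entries that are not
    2-element tuples or lists of integers (or None)."""
    bad = []
    for pos, spec in enumerate(colspecs):
        ok = (
            isinstance(spec, (tuple, list))
            and len(spec) == 2
            and all(v is None or isinstance(v, numbers.Integral) for v in spec)
        )
        if not ok:
            bad.append((pos, spec))
    return bad


print(find_bad_colspecs([(0, 4), (4, 14)]))       # []
print(find_bad_colspecs([(0, 4), ("4", "14")]))   # [(1, ('4', '14'))]
print(find_bad_colspecs([[(0, 4), (4, 14)]]))     # [(0, [(0, 4), (4, 14)])]
```

Running it on `result` would list the positions and values of any entries that do not have the expected form.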

Eric
