pandas Unicode encoding problem

**kondor76** · 15/03/2019, 10h47

Hello,

Having finally found a potential solution with pandas for converting Excel sheets to PDF, I'm now facing another problem:

Code:

#!/usr/bin/python
# coding: utf-8
import glob
import os
from os.path import basename, splitext
#from openpyxl import load_workbook
#from PyPDF2 import PdfFileWriter
import sys
import codecs
#from fpdf import FPDF
import pandas as pd
import pdfkit as pdf
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
path = "."
os.chdir(path)
dir = os.getcwd()
ext = "*.xlsm"
print("dir",dir)
#fics_list = os.listdir(path)
pattern = '/'.join([dir,ext])
print("pattern",pattern)
fics_list = glob.glob(pattern)
#print (fics_list)
for path_file in fics_list:
        # open the Excel workbook; pd.ExcelFile provides the parse() method used below
        # (pd.read_excel would return a DataFrame, which has no parse())
        xl = pd.ExcelFile(path_file)
        df_CRA = xl.parse(sheet_name=3, encoding='utf-8')
        df_CRA.to_html('test.html')
        filename = basename(path_file)
        # splitext also handles file names that contain more than one dot
        filename_prefix_CRA, filename_ext = splitext(filename)
        pdf_ext = "pdf"
        pdf_CRA_filename = '.'.join([filename_prefix_CRA,pdf_ext])
        pdf.from_file('test.html', pdf_CRA_filename)
        filename_prefix_HSupp = filename_prefix_CRA.replace('CRA','HSupp')
        pdf_HSupp_filename = '.'.join([filename_prefix_HSupp,pdf_ext])
        df_HSUPP = xl.parse(sheet_name=4, encoding='utf-8')
        df_HSUPP.to_html('test.html')
        pdf.from_file('test.html', pdf_HSupp_filename)

When I run the script, I get this for sheet 3:

Traceback (most recent call last):
  File "./file.py", line 30, in <module>
    df_CRA.to_html('test.html')
  File "/----/python/lib/python3.6/site-packages/pandas/core/frame.py", line 2265, in to_html
    formatter.to_html(classes=classes, notebook=notebook, border=border)
  File "/----/python/lib/python3.6/site-packages/pandas/io/formats/format.py", line 734, in to_html
    buffer_put_lines(f, html)
  File "/----/python/lib/python3.6/site-packages/pandas/io/formats/format.py", line 1626, in buffer_put_lines
    buf.write('\n'.join(lines))
UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 118: ordinal not in range(128)

And this for sheet 4 (with the sheet 3 code commented out):

Traceback (most recent call last):
  File "./file.py", line 30, in <module>
    df_CRA.to_html('test.html')
  File "/-----/python/lib/python3.6/site-packages/pandas/core/frame.py", line 2265, in to_html
    formatter.to_html(classes=classes, notebook=notebook, border=border)
  File "/-----/python/lib/python3.6/site-packages/pandas/io/formats/format.py", line 734, in to_html
    buffer_put_lines(f, html)
  File "/-----/python/lib/python3.6/site-packages/pandas/io/formats/format.py", line 1626, in buffer_put_lines
    buf.write('\n'.join(lines))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 117: ordinal not in range(128)
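
Both tracebacks end in a plain `buf.write(...)`, so the same error can be reproduced without pandas. Below is a minimal sketch, assuming the output file ends up being opened with an ASCII codec (for example because the locale default is ASCII). That cause is an assumption, not something the tracebacks prove. `write_text` is a hypothetical helper and the sample string is made up to contain the two characters named in the error messages.

```python
import os
import tempfile


def write_text(text, path, encoding):
    """Write text to path with an explicit codec (hypothetical helper)."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


# Made-up cell content containing the two characters from the tracebacks:
# U+2019 (typographic apostrophe) and U+00E9 (e with acute accent).
html = "<td>l\u2019\xe9t\xe9</td>"

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test.html")

# An ASCII codec cannot represent either character, so the write fails.
try:
    write_text(html, path, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Naming UTF-8 explicitly makes the write independent of the locale.
write_text(html, path, "utf-8")
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(ascii_failed, round_trip == html)
```

If the assumption holds, one possible workaround is to call `to_html()` without a file name, which returns the HTML as a string, and write that string to a file opened with `encoding="utf-8"` as above.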

I admit I still struggle quite a bit with these encoding problems. Any idea how to resolve this?

Thanks
