Hi,

I'm trying to write and read an XML file that contains non-ASCII characters.
Here is what I did:

# -*- coding: utf-8 -*-
from lxml import etree

def write_xml(filename):
    root_elt = etree.Element("Repository", {"version": str(1),
                                            "val": "éléà"})
    elt_tree = etree.ElementTree(root_elt)
    tmp = etree.tostring(elt_tree, pretty_print=True, encoding="utf-8", xml_declaration=True)
    with open(filename, 'w') as f:
        f.write(str(tmp, encoding="utf-8"))

def read_xml(filename):
    tree = etree.parse(filename)
    root = tree.getroot()
    version = int(root.get("version"))
    val = root.get("val")

filename = "fic.xml"
write_xml(filename)
read_xml(filename)

Here is the traceback I get:

Traceback (most recent call last):
  File "test.py", line 22, in <module>
    read_xml(filename)
  File "test.py", line 13, in read_xml
    tree = etree.parse(filename)
  File "lxml.etree.pyx", line 2954, in lxml.etree.parse (src/lxml\lxml.etree.c:56220)
  File "parser.pxi", line 1533, in lxml.etree._parseDocument (src/lxml\lxml.etree.c:82303)
  File "parser.pxi", line 1562, in lxml.etree._parseDocumentFromURL (src/lxml\lxml.etree.c:82596)
  File "parser.pxi", line 1462, in lxml.etree._parseDocFromFile (src/lxml\lxml.etree.c:81635)
  File "parser.pxi", line 1002, in lxml.etree._BaseParser._parseDocFromFile (src/lxml\lxml.etree.c:78544)
  File "parser.pxi", line 569, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml\lxml.etree.c:74488)
  File "parser.pxi", line 650, in lxml.etree._handleParseResult (src/lxml\lxml.etree.c:75379)
  File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml\lxml.etree.c:74712)
lxml.etree.XMLSyntaxError: Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x6C 0xE9 0xE0, line 2, column 25

I tried playing with the encoding parameter, but could not get anything to work.

Does anyone have an idea?
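
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the bytes in the error (0xE9 0x6C 0xE9 0xE0) are what "éléà" looks like in a single-byte encoding such as cp1252. That would suggest open(filename, 'w') re-encoded the text with the platform default encoding, while the XML declaration still announces UTF-8. The example below uses the standard library's xml.etree.ElementTree as a stand-in for lxml (an assumption, so that it runs without extra dependencies) and writes the serialized bytes in binary mode. With lxml, the equivalent change would presumably be open(filename, 'wb') followed by f.write(tmp).

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)


def write_xml(filename):
    root = ET.Element("Repository", {"version": "1", "val": "éléà"})
    # With encoding="utf-8", tostring() returns bytes that are already UTF-8.
    data = ET.tostring(root, encoding="utf-8")
    # Binary mode writes those bytes unchanged; text mode without an explicit
    # encoding would re-encode the text with the platform default.
    with open(filename, "wb") as f:
        f.write(data)


def read_xml(filename):
    root = ET.parse(filename).getroot()
    return int(root.get("version")), root.get("val")


# The bytes quoted in the error message decode cleanly as cp1252.
reported = bytes([0xE9, 0x6C, 0xE9, 0xE0])
decoded = reported.decode("cp1252")

# Round trip through a temporary file (hypothetical location for the demo).
path = os.path.join(tempfile.mkdtemp(), "fic.xml")
write_xml(path)
version, val = read_xml(path)
```

Keeping text mode but passing encoding="utf-8" to open() should work as well; the point is only that the bytes on disk match the encoding the XML file declares.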