Unable to convert a cell to HTML with the accented characters converted

**balawoo** · 09/02/2019, 22h17

Hello,

I'm new to Perl. To be able to use Testlink, I need to convert an Excel file to XML, converting the accented characters in the text, and I want to keep the specific formatting of my source file (bold, highlighting, ...).

Here is an example, the content of one row of my file:
PP10-RG-010 MASTER DATA 1 1 1 Le format et le contenu des 2 documents sont décrits dans la SFD XXX (JIRA 624).

When I run a simple example like this one, it works:

Code:

#!/usr/bin/perl

use HTML::Entities;
use warnings;
#use utf8;
#binmode(stdout => utf8);

my $unsafe_chars = "< & >";
my $string = "<br>àéèçûîùô<>";
print $string, "\n";
print encode_entities($string, "àéèçûîùô");

The result matches what I am trying to do, namely:
<br>àéèçûîùô<>
<br>Ã Ã©Ã¨Ã§Ã»Ã®Ã¹Ã´<>
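
For comparison, here is a minimal sketch of the same test with the string literals treated as characters rather than bytes. It assumes the script file is saved as UTF-8 and enables `use utf8`, which the snippet above leaves commented out; `encode_accents` is a hypothetical helper name, not part of HTML::Entities. Without `use utf8`, each accented letter is two bytes and each byte is encoded on its own, which is the likely origin of the `Ã©`-style pairs in the second output line.

```perl
use strict;
use warnings;
use utf8;                                  # assumption: this file is saved as UTF-8
use HTML::Entities qw(encode_entities);

binmode(STDOUT, ':encoding(UTF-8)');       # print characters as UTF-8 bytes

# Hypothetical helper: encode only the listed accented letters,
# leaving markup such as <br> untouched.
sub encode_accents {
    my ($text) = @_;
    return encode_entities($text, 'àéèçûîùôâ');
}

my $string = '<br>àéèçûîùô<>';
print $string, "\n";
print encode_accents($string), "\n";
```

Under these assumptions the second line uses named entities (`&agrave;`, `&eacute;`, `&egrave;`, ...) and the `<br>` and `<>` markup is left as it is.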

When I process my Excel file with the following code (taken from an existing script):

Code:

#!/usr/bin/perl -w
#######################################################
#
# Author: Polina Antipova aka Kitsune
#
# What it does:
# 	This script converts XLS files to XML format compatible with Testlink.
#
# More info at https://bitbucket.org/kitsuneo/xls2testlink-script/wiki
# Suggestion/bugs are also accepted at polina.antipova@gmail.com
#
# !! Script is very data sensible. Please, be careful with input XLS file !!
#
########################################################
 
use warnings;
use Encode qw/encode decode/;
use Spreadsheet::ParseExcel;
use File::Basename;
use feature qw(say);
use feature "switch";
#use Text::EtText::EtText2HTML;
#use HTML::TextToHTML;
use HTML::Entities;
 
sub log_and_print {
    print (scalar(localtime()), " ", @_, "\n");
    print LOGFILE (scalar(localtime()), " ", @_, "\n");
}
 
sub print_log {
	print LOGFILE (scalar(localtime()), " ", @_, "\n");
}
 
sub print_txt {
	print TXT (@_);
}
 
sub print_xml {
	print XML (@_);
}
 
# Source file and path are taken as script parameter
if (@ARGV) {
	$full_source_file_path = $ARGV[0];
	}
 
else {
	die (" ! Please, enter file name as script parameter. \n See README.txt for more details");
	}
 
# Parse @ARGV, get path and file from script parameter
($source_file_name, $source_file_path, $source_file_suffix) = fileparse($full_source_file_path, qr/\.[^.]*/);	
 
# Define all files and paths
$source_file = "$source_file_name$source_file_suffix";
 
$logs_dir = "logs";
$converted_dir = "converted_files";
 
# Create folders for logs and converted files
chdir ($source_file_path);
mkdir ($logs_dir);
mkdir ($converted_dir);
 
# Output files get names after source file
$txt_file = ("$source_file_name\_parsed.txt");
$xml_file = ("$source_file_name\_resulted.xml");
$file_log = ("$source_file_name\_debug.log");
 
# Files + folders knocked into variables
$txt = "$converted_dir/$txt_file";
$xml = "$converted_dir/$xml_file";
$log = "$logs_dir/$file_log";
 
open (LOGFILE, "> $log") || die("Could not open file! $log");
open (TXT, "> $txt") || die("Could not open file! $txt");
 
log_and_print ("**************");
log_and_print ("Start working");
log_and_print ("STEP1: converting $source_file to TXT \n");
 
# see http://search.cpan.org/~jmcnamara/Spreadsheet-ParseExcel-0.59/lib/Spreadsheet/ParseExcel.pm#SYNOPSIS
# STEP1: The data from XLS file is stored in temp TXT file 
my $parser   = Spreadsheet::ParseExcel->new();
    my $workbook = $parser->parse($source_file);
 
    if ( !defined $workbook ) {
        die $parser->error(), ".\n";
    }
 
    for my $worksheet ( $workbook->worksheets() ) {
 
        my ( $row_min, $row_max ) = $worksheet->row_range();
        my ( $col_min, $col_max ) = $worksheet->col_range();
 
        for my $row ( $row_min .. $row_max ) {
            for my $col ( $col_min .. $col_max ) {
 
                my $cell = $worksheet->get_cell( $row, $col );
                next unless $cell;
 
               # Every cell is written as one row in .txt. Output format "Row;;Col;;value"
			   $cell_unformatted = $cell->unformatted();	
			   $cell_unformatted =~ s/\n/<br>/g; 
 
			   print_txt "$row;;$col;;", $cell_unformatted ,"\n";
 
            }
        }
    }
 
close(TXT);	
 
log_and_print ("STEP1: OK \n Result: file $txt is created \n");	
log_and_print ("STEP2: converting $txt to XML \n");
 
# HERE THE KINGDOM OF CONVERTING BEGINS
open (XML, "> $xml") || die("Could not open file! $xml");
 
 # Read TXT to an array
	open (SOURCE, "$txt") || die ("Could not open file! $txt");
	@parsed_data = <SOURCE>;
	close (SOURCE);
 
	# Initialize auxiliary variables
	$source_row_counter = 0;
	$commented = "//";
 
	$header_of_xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<requirements>\n";
	$end_of_kw = "</requirements>";
 
	# Print first row of XML file
	print_xml $header_of_xml;
 
 # Read every row from .txt array.	
 foreach $file_row (@parsed_data){	
 
	chomp $file_row;
 
	# Cell is processed according to its positions and value
	@splited_row = split(/;;/, $file_row);
 
	$cell_row_position = $splited_row[0];
	$cell_col_position = $splited_row[1];
	$cell_value = $splited_row[2];
 
	print_log ("Debug: Current position: row - $cell_row_position, col - $cell_col_position");
 
	# First cell in excel file is expected to be mark of new test case (one test case - one excel row)
	# If cell is first, all test case variable are cleaned up
	if ($cell_col_position == 0){
		$test_requirement = "";
		$tc_docid_tag = "";
		$tc_title_tag = "";
		$tc_version_tag = "";
		$tc_revision_tag = "";
		$tc_node_order_tag = "";
		$tc_description_tag = "";
		$tc_status_tag = "";
		$tc_type_tag = "";
		$tc_coverage_tag = "";
 
		# TODO shift positions in arrays: in excel rows starts from 1, in parsed excel from 0
		print_log ("Debug: Start accumulate test case at row position $cell_row_position");
	}
 
	# Each value is checked if it is commented or not
	# Commented if cell is empty or first cell symbol is double slash "//"
	no warnings;
	$if_commented = substr $cell_value, 0, 2;
 
	# Skip cells from Row1 and Column A - reserved for Header and comments
	if (($cell_row_position == 0) and ($cell_col_position == 0)) {	
		print_log ("File: Skip \"Header or Comments\". Cell position: row - $cell_row_position, col - $cell_col_position");
	}
 
	# Skip empty cells
	# Check that 1st test case has a test suite name (!mandatory)
#	elsif ($cell_value eq "") {	
#		print_log ("File: Skip empty cell. \"$cell_value\", cell position: row - $cell_row_position, col - $cell_col_position");
#		
#		log_and_print ("! File: Test suite (row2) must be defined, otherwise .xml will be broken!") if (($cell_row_position ==1) && ($cell_col_position ==1));
#		log_and_print ("! File: Test case name (row", $cell_row_position+1, ") must be defined, otherwise .xml will be broken!") if ($cell_col_position == 2);
#	}
 
	# Skip cells started with "//"
	elsif ($if_commented eq $commented) {	
		print_log ("File: Skip empty cell. Cell begins with \"$if_commented\", cell position: row - $cell_row_position, col - $cell_col_position");
		log_and_print ("! File: Test suite (row2) must not be commented, otherwise .xml will be broken!") if (($cell_row_position == 1)&& ($cell_col_position ==1));
		log_and_print ("! File: Test case name (row", $cell_row_position+1, ") must not be commented, otherwise .xml will be broken!") if ($cell_col_position == 2);
	}
 
	# Accumulate test case record from all other values
	# Script use hard-coded values (columns numbers) and expects precise work with source file	
	else {	
#		{  my $ts_requirement = "$cell_value";
 
#								my $ts_requirement_tag = "<requirement>\n" ;	
 
		#						print_xml ($end_of_kw, "\n") if ($cell_row_position > 2);
		#						print_xml ($ts_name_tag, "\n");
#							}
		given ($cell_col_position) {
 
			# Block <requirement>
#			# Get test suite name, close previous test suite if it is not 1st test suite in the file
#			when (1) {  my $ts_requirement = "$cell_value";
 
#						my $ts_requirement_tag = "<requirement>\n" ;	
 
#						print_xml ($end_of_kw, "\n") if ($cell_row_position > 2);
#						print_xml ($ts_name_tag, "\n");
#					}
#					print_log("$cell_col_position");
 
			# Block <docid>	
			when (0) {; 
				my $tc_docid = "$cell_value"; 
						$tc_docid_tag = "<requirement><docid ><![CDATA[$tc_docid]]></docid>\n";
					print_log("Ligne\n $cell_col_position\n $tc_docid_tag")
					}
 
			# Block <title>		
			when (1) { my $tc_title = "$cell_value"; 
						$tc_title_tag = "<title><![CDATA[$tc_title]]></title>\n";			
					}	
 
			# Block <version>		
			when (2) { my $tc_version = "$cell_value"; 
						$tc_version_tag = "<version>$tc_version</version>\n";
					}
 
			# Block <revision>		
			when (3) { my $tc_revision = "$cell_value";
						$tc_revision_tag = "<revision> $tc_revision </revision>\n";
					}
 
			# Block <node_order>
			when (4) { my $tc_node_order = "$cell_value";
							$tc_node_order_tag = "<node_order>$tc_node_order</node_order>\n";
 
							}
 
			# Block <description>
			when (5) { my $tc_description = "$cell_value";
			$t = encode_entities("$cell_value", "àéèçûîùôâ");
 
						$tc_description_tag = "<description><![CDATA[$t]]></description>\n";
						}
 
			# Block <status>
			when (6) { my $tc_status = "$cell_value";
					$tc_status_tag = "<status><![CDATA[V]]></status>\n";
 
					}
 
			# Block <type>
			when (7) { my $tc_type = "$cell_value";
					$tc_type_tag = "<type><![CDATA[3]]></type>\n";
 
				}
			# Block <expected_coverage>
			when (8) { my $tc_coverage = "$cell_value";
					$tc_coverage_tag = "<expected_coverage><![CDATA[1]]></expected_coverage>\n";
 
 
 
		# Final accumulated requirement
		$test_requirement = "$tc_docid_tag $tc_title_tag $tc_version_tag $tc_revision_tag $tc_node_order_tag $tc_description_tag $tc_status_tag $tc_type_tag $tc_coverage_tag	</requirement>\n";	
		print_xml $test_requirement;		
}
 
	# When cell is last in the excel row - call to print test case to file
		if ($cell_col_position == 8) {
		$source_row_counter++;
 
		print_log ("File: Print accumulated keyword at row position $cell_row_position");
		print_log ("Fichier $test_requirement");
		print_xml $test_requirement;		
		}
	}
	}
 
}
 
 
 # Close XML file - double </testsuite>
print_xml ($end_of_kw, "\n");
 
 # deduct one row from row counter (reserved for headers) for get number of processed Excel test cases
 $source_row_counter = $source_row_counter -1; 
 
 log_and_print ("STEP2: OK \n Result: file $xml is created \n");	
 log_and_print ("Processed $source_row_counter rows from $source_file \n See debug log for more details.\n\n");
 
close(XML);
close(LOGFILE);

The result is that the accents are not processed, and I can't work out where my mistake is. Can you help me?

Balawoo
NB: the attachments are my code and the sample file.
For information, I also have a post on this here: https://perlmonks.org/?node_id=1229686
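
One detail worth checking in the script above: it writes `encoding="UTF-8"` in the XML header but opens the output file with no encoding layer. The following is a minimal sketch of writing text through a UTF-8 layer. It assumes the cell text has already been decoded into Perl characters, which this thread does not establish for `unformatted()`. `utf8_bytes_for` is a hypothetical helper, and it writes to an in-memory file so that it runs without touching the disk.

```perl
use strict;
use warnings;

# Hypothetical helper: print text through a UTF-8 output layer and
# return the bytes that would end up in the file.
sub utf8_bytes_for {
    my ($text) = @_;
    my $buffer = '';
    # Three-argument open with a lexical handle; \$buffer is an in-memory file.
    open(my $out, '>:encoding(UTF-8)', \$buffer)
        or die "Could not open in-memory file: $!";
    print {$out} $text;
    close($out) or die "Could not close in-memory file: $!";
    return $buffer;
}

my $bytes = utf8_bytes_for("d\x{e9}crits");    # "décrits", written with an escape
printf "%d bytes: %s\n", length($bytes),
    join(' ', map { sprintf '%02X', ord($_) } split //, $bytes);
```

For a real file, the same layer would go in the mode argument, for example `open(my $xml_fh, '>:encoding(UTF-8)', $xml)`. The letter é then reaches the file as the two bytes C3 A9.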

**balawoo** · 10/02/2019, 20h54

With the help of the PerlMonks members, the issue is resolved.

Here is the code that works:

Code:

use warnings;
use strict;
#use Spreadsheet::ParseExcel ();
use Spreadsheet::Read 'ReadData';
use Encode 'decode';
use XML::LibXML;
 
my $INFILE = 'TestPGR.xls';
my $ENCODING = 'MacRoman';
my $OUTFILE = 'TestPGR.xml';
my %FIELDS = ( 1=>'docid', 2=>'title', 3=>'version', 4=>'revision',
	5=>'node_order', 6=>'description', 7=>'status', 8=>'type',
	9=>'expected_coverage', );
 
my $book = ReadData($INFILE, rc=>1, cells=>0 );
my $sheet = $book->[1] or die "Book doesn't have a sheet 1";
 
my $doc = XML::LibXML::Document->createDocument('1.0', 'UTF-8');
my $reqs = $doc->createElement('requirements');
$doc->setDocumentElement($reqs);
 
for my $r ( $sheet->{minrow}+1 .. $sheet->{maxrow} ) {
	my $req = $doc->createElement('requirement');
	for my $c ( $sheet->{mincol} .. $sheet->{maxcol} ) {
		next unless exists $FIELDS{$c};
		my $val = decode($ENCODING, $sheet->{cell}[$c][$r],
			Encode::FB_HTMLCREF);
		my $node = $doc->createElement($FIELDS{$c});
		$node->appendText($val);
		$req->appendChild($node);
	}
	$reqs->appendChild($req);
}
 
$doc->toFile($OUTFILE,1);
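
The `Encode::FB_HTMLCREF` flag passed to `decode` above tells Encode what to do with data it cannot convert. In the `encode` direction the same flag replaces unconvertible characters with numeric HTML character references. The sketch below shows that behaviour using only the core Encode module. `to_ascii_with_charrefs` is a hypothetical helper name, and the example illustrates the flag only; it is not the PerlMonks solution itself.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn every non-ASCII character into a numeric
# character reference (&#NNN;) while leaving ASCII, including tags, alone.
sub to_ascii_with_charrefs {
    my ($text) = @_;
    my $copy = $text;    # work on a copy in case encode() touches its argument
    return encode('ascii', $copy, Encode::FB_HTMLCREF);
}

print to_ascii_with_charrefs("<br>d\x{e9}crits dans la SFD"), "\n";
```

This prints `<br>d&#233;crits dans la SFD`: the accented letter becomes a character reference and the `<br>` tag is left as it is.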
