If you're getting errors that say: "'ascii' codec can't encode character 'x' in position y: ordinal not in range(128)", the problem is probably with your Python installation rather than with Beautiful Soup. Try printing out the non-ASCII characters without running them through Beautiful Soup and you should have the same problem. For instance, try running code like this:
latin1word = 'Sacr\xe9 bleu!'
unicodeword = unicode(latin1word, 'latin-1')
print unicodeword
If this works but Beautiful Soup doesn't, there's probably a bug in Beautiful Soup. However, if this doesn't work, the problem's with your Python setup. Python is playing it safe and not sending non-ASCII characters to your terminal. There are two ways to override this behavior.
   1. The easy way is to remap standard output to a converter that's not afraid to send ISO-Latin-1 or UTF-8 characters to the terminal.
	
	1 2 3 4
   |       import codecs
      import sys
      streamWriter = codecs.lookup('utf-8')[-1]
      sys.stdout = streamWriter(sys.stdout) | 
       codecs.lookup returns a number of bound methods and other objects related to a codec. The last one is a StreamWriter object capable of wrapping an output stream.
   2.  The hard way is to create a sitecustomize.py file in your Python installation which sets the default encoding to ISO-Latin-1 or to UTF-8. Then all your Python programs will use that encoding for standard output, without you having to do something for each program. In my installation, I have a /usr/lib/python/sitecustomize.py which looks like this:
	
	1 2
   |       import sys
      sys.setdefaultencoding("utf-8") | 
 
			
		 
	
Partager