Edgewall Software

Opened 18 years ago

Closed 18 years ago

Last modified 18 years ago

#29 closed defect (fixed)

compiler.parse() from std. library only takes a bytestring, not unicode

Reported by: arnarbi at gmail Owned by: cmlenz
Priority: major Milestone: 0.2
Component: Expression evaluation Version: 0.1
Keywords: Cc:

Description

A template expression like this

${tg.fleirtala(subdir.nopages, u'síða', u'síður')}

fails with a UnicodeEncodeError.

The error originates from compiler.parse(), which is called in markup.eval:_compile(). This simple test reproduces the error:

>>> from compiler import parse
>>> parse(u"u'\xfe'")

The string has to be converted to a bytestring, and its encoding declared to parse() either with a '# -*- encoding: xxx -*-' line or with a UTF-8 byte order mark (BOM). Like this:

>>> parse("# -*- encoding: UTF-8 -*-\nu'\xc3\xbe'")
Module(u'\xc3\xbe', Stmt([]))

or this

>>> parse("\xef\xbb\xbfu'\xc3\xbe'")    # the \xef\xbb\xbf is the UTF-8 BOM
Module(u'\xc3\xbe', Stmt([]))
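
For reference, the same workaround can be sketched in modern Python. The compiler module was removed in Python 3, so this sketch assumes ast.parse() as a stand-in for compiler.parse(); the helper names to_marked_bytes and parse_expr are hypothetical and not part of Markup. Note that Python 3's ast.parse() accepts a unicode str directly and decodes unmarked bytes as UTF-8, so the mark is redundant there; it is kept only to mirror the two options described above.

```python
import ast

UTF8_BOM = b"\xef\xbb\xbf"


def to_marked_bytes(source, use_bom=True):
    """Encode a unicode expression as UTF-8 and mark its encoding.

    With use_bom=True the UTF-8 byte order mark is prepended; otherwise
    a '# -*- coding: utf-8 -*-' declaration line is prepended instead.
    """
    data = source.encode("utf-8")
    if use_bom:
        return UTF8_BOM + data
    return b"# -*- coding: utf-8 -*-\n" + data


def parse_expr(source, use_bom=True):
    # ast.parse() (via compile()) accepts bytes and honours either a BOM
    # or a coding declaration when decoding them.
    return ast.parse(to_marked_bytes(source, use_bom), mode="eval")


tree = parse_expr("tg.fleirtala(subdir.nopages, u'síða', u'síður')")
print(ast.dump(tree.body.args[1]))
```

Both marking styles decode to the same expression tree; going by the attachment descriptions, the patches use the BOM variant.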

Attachments (2)

parse_utf8bom.patch (559 bytes) - added by arnarbi at gmail 18 years ago.
Patch that converts unicode expressions to byte-strings and adds BOM
parse_utf8bom_2.patch (1.5 KB) - added by arnarbi at gmail 18 years ago.
Same as above, but only sends marked string to parse() and doesn't store it
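
The patch contents are not reproduced on this page. Going only by the second attachment's description, the idea can be sketched as follows: the expression object keeps its original unicode source, and the BOM-marked UTF-8 bytes exist only as a temporary value handed to the parser. ExpressionSketch and its methods are hypothetical, and ast.parse() again stands in for the Python 2 compiler.parse().

```python
import ast

UTF8_BOM = b"\xef\xbb\xbf"


class ExpressionSketch:
    """Hypothetical template-expression object (not Markup's real class).

    The caller's unicode source is stored unchanged; the BOM-marked
    bytes are built only inside _parse() and never kept on the instance.
    """

    def __init__(self, source):
        self.source = source  # original text, stored unchanged
        self.tree = self._parse(source)

    @staticmethod
    def _parse(source):
        marked = UTF8_BOM + source.encode("utf-8")  # local value only
        return ast.parse(marked, mode="eval")

    def evaluate(self, namespace):
        # Compile the stored AST and evaluate it against the given names.
        code = compile(self.tree, "<expression>", "eval")
        return eval(code, {}, dict(namespace))


expr = ExpressionSketch("u'síða' + x")
print(expr.source, expr.evaluate({"x": "!"}))
```

Keeping the unmodified source presumably lets error messages and reprs show the expression exactly as the template author wrote it.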

Change History (3)

Changed 18 years ago by arnarbi at gmail

Patch that converts unicode expressions to byte-strings and adds BOM

Changed 18 years ago by arnarbi at gmail

Same as above, but only sends marked string to parse() and doesn't store it

comment:1 Changed 18 years ago by cmlenz

  • Resolution set to fixed
  • Status changed from new to closed

Applied a slightly modified version of the patch in [211]. Thanks!