Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#538 closed defect (fixed)

HTMLParser fails if a multi-byte character falls on a 4K boundary

Reported by: hodgestar
Owned by: hodgestar
Priority: major
Milestone: 0.7
Component: Parsing
Version: devel
Keywords:
Cc:

Description

If one does:

from io import BytesIO
from genshi.input import HTMLParser

text = u'a' * ((4 * 1024) - 1) + u'\xe6'
events = list(HTMLParser(BytesIO(text.encode('utf-8')),
                         encoding='utf-8'))

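it produces a truncated-input error: the 4095 ASCII characters fill all but the last byte of the first 4K read, so the two-byte UTF-8 encoding of u'\xe6' (0xC3 0xA6) is split across the boundary between two reads from the input file.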
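
The failure mode can be shown without the parser. The sketch below is only an illustration, not the change made for this ticket: it assumes the input is read in 4096-byte chunks (taken from the ticket title), and the helper names decode_naively and decode_incrementally are hypothetical. Decoding each chunk on its own fails on the split character, while an incremental decoder from the standard codecs module buffers the dangling lead byte until the next chunk arrives.

```python
import codecs
from io import BytesIO

CHUNK_SIZE = 4 * 1024  # assumed read size, taken from the ticket title


def decode_naively(stream, encoding="utf-8", chunk_size=CHUNK_SIZE):
    """Decode every chunk independently (hypothetical helper).

    Raises UnicodeDecodeError when a multi-byte character is split
    across two chunks.
    """
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts.append(chunk.decode(encoding))
    return "".join(parts)


def decode_incrementally(stream, encoding="utf-8", chunk_size=CHUNK_SIZE):
    """Decode with an incremental decoder (hypothetical helper).

    The decoder keeps incomplete trailing bytes and completes them
    with the start of the next chunk.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts.append(decoder.decode(chunk))
    # final=True makes genuinely truncated input raise instead of passing silently
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Same input as in the description: 4095 ASCII bytes, then a two-byte character.
text = "a" * (CHUNK_SIZE - 1) + "\xe6"
data = text.encode("utf-8")

try:
    decode_naively(BytesIO(data))
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True

recovered = decode_incrementally(BytesIO(data))
```

With this input the first chunk ends in the lead byte 0xC3, so the per-chunk decode raises, whereas the incremental version returns the original text unchanged.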

Change History (1)

comment:1 Changed 12 years ago by hodgestar

Resolution set to fixed
Status changed from new to closed

Fixed in r1189.
