#538 closed defect (fixed)
HTMLParser fails if a multi-byte character falls on a 4K boundary
Reported by: | hodgestar | Owned by: | hodgestar |
---|---|---|---|
Priority: | major | Milestone: | 0.7 |
Component: | Parsing | Version: | devel |
Keywords: |  | Cc: |  |
Description
If one does:
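    from io import BytesIO
    from genshi.input import HTMLParser

    # 4095 one-byte characters followed by one two-byte character
    text = u'a' * ((4 * 1024) - 1) + u'\xe6'
    events = list(HTMLParser(BytesIO(text.encode('utf-8')), encoding='utf-8'))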
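it produces a truncated-input error. In UTF-8, u'\xe6' encodes to two bytes (0xC3 0xA6), and the first of them is byte 4096 of the input, so the multi-byte character crosses the boundary of a 4K read from the input file and the first read ends mid-character.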
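The failure mode can be reproduced without the parser. The sketch below is illustrative only and is not the change made in r1189: it assumes a 4K read size (taken from the ticket title), and the helper names `decode_chunks_naive` and `decode_chunks_incremental` are hypothetical. It contrasts decoding each chunk on its own with using an incremental decoder from the standard `codecs` module, which buffers an incomplete byte sequence until the next chunk arrives.

```python
import codecs
from io import BytesIO

# Assumed read size, matching the 4K boundary named in the ticket title.
CHUNK_SIZE = 4 * 1024


def decode_chunks_naive(stream, encoding="utf-8", chunk_size=CHUNK_SIZE):
    """Decode every chunk independently (hypothetical helper).

    Raises UnicodeDecodeError when a multi-byte character is split
    across two reads.
    """
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts.append(chunk.decode(encoding))
    return "".join(parts)


def decode_chunks_incremental(stream, encoding="utf-8", chunk_size=CHUNK_SIZE):
    """Decode chunk by chunk with an incremental decoder (hypothetical helper).

    The decoder keeps a trailing partial character and completes it
    once the next chunk is fed in.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts.append(decoder.decode(chunk))
    # final=True reports an error if the input really does end mid-character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Same input as in the ticket: the two-byte character starts at byte 4096.
text = "a" * (4 * 1024 - 1) + "\xe6"
data = text.encode("utf-8")

try:
    decode_chunks_naive(BytesIO(data))
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True

recovered = decode_chunks_incremental(BytesIO(data))
```

With this input the naive version fails on the first chunk, while the incremental version returns the original text. An incremental decoder is only one way to handle the split; carrying the undecoded tail bytes over to the next read by hand would work as well.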
Change History (1)
comment:1 Changed 12 years ago by hodgestar
- Resolution set to fixed
- Status changed from new to closed
Fixed in r1189.