Opened 17 years ago
Closed 17 years ago
#136 closed defect (fixed)
Transformer.filter returns inconsistently marked stream
Reported by: | Dave Abrahams <dave@…> | Owned by: | athomas |
---|---|---|---|
Priority: | major | Milestone: | 0.5 |
Component: | General | Version: | 0.4.2 |
Keywords: | Cc: |
Description
The resulting stream ends up with a mixture of marked and unmarked events. Because the current tests just use print on the stream, the error is hidden. You can see the problem by doing print(list( resultstream )). You need to decide whether the result is supposed to be a marked or an unmarked stream. I suggest changing all filtering doctests by appending a new filter that asserts the correctness of the stream format before passing it through to the output, or updating Stream.render() to do the invariant check.
Attachments (1)
Change History (8)
Changed 17 years ago by Dave Abrahams <dave@…>
comment:1 Changed 17 years ago by athomas
- Summary changed from Transformer.filter broken to Transformer.filter returns inconsistently marked stream
comment:2 Changed 17 years ago by athomas
- Status changed from new to assigned
It's not clear to me what the exact problem is. Do you have an example filter that triggers this behaviour?
comment:3 Changed 17 years ago by Dave Abrahams <dave@…>
As I tried to make clear above, your own doctest case triggers it.
comment:4 follow-up: ↓ 5 Changed 17 years ago by athomas
Right, and I'm asking for more information. Like which doctest? And which marks are you talking about? StreamFilter marks or Transformer marks?
The doctest I assume you're referring to is this one (modified to use list()):
>>> from genshi.filters.html import HTMLSanitizer >>> html = HTML('<html><body>Some text<script>alert(document.cookie)' ... '</script> and some more text</body></html>') >>> import pprint >>> pprint.pprint(list(html | Transformer('body/*').filter(HTMLSanitizer())))
Which prints this when using list() as you suggest:
[('START', (QName(u'html'), Attrs()), (None, 1, 0)), ('START', (QName(u'body'), Attrs()), (None, 1, 6)), ('TEXT', u'Some text', (None, 1, 12)), ('TEXT', u' and some more text', (None, 1, 60)), ('END', QName(u'body'), (None, 1, 79)), ('END', QName(u'html'), (None, 1, 86))]
I don't see a mix of marked/unmarked events. If there's a bug, I'd like to fix it, but I need more information.
Also, the flush() function looks correct to me. self.filter is a "normal" stream filter, not a transformer filter, so it must return a normal stream event ((kind, data, pos)), not a transformer event ((mark, (kind, data, pos))). That is why flush() prepends OUTSIDE.
comment:5 in reply to: ↑ 4 ; follow-up: ↓ 6 Changed 17 years ago by Dave Abrahams <dave@…>
Replying to athomas:
Right, and I'm asking for more information. Like which doctest?
I meant (AFAICT) the only one there is (the one you assumed below). But clearly I was slightly mistaken.
And which marks are you talking about? StreamFilter marks or Transformer marks?
Heck, I didn't know there was a distinction. I think more docs may be needed to describe all these marks.
I don't see a mix of marked/unmarked events. If there's a bug, I'd like to fix it, but I need more information.
>>> def noop(s): ... for e in s: yield e ... >>> pprint.pprint(list(html | Transformer('body/*').filter(noop))) [('START', (QName(u'html'), Attrs()), (None, 1, 0)), ('START', (QName(u'body'), Attrs()), (None, 1, 6)), ('TEXT', u'Some text', (None, 1, 12)), ('START', (QName(u'script'), Attrs()), (None, 1, 21)), ('TEXT', u'alert(document.cookie)', (None, 1, 29)), ('END', QName(u'script'), (None, 1, 51)), ('OUTSIDE', ('END', QName(u'script'), (None, 1, 51))), ('END', QName(u'body'), (None, 1, 79)), ('END', QName(u'html'), (None, 1, 86))]
Also, the flush() function looks correct to me. self.filter is a "normal" stream filter, not a transformer filter, so it must return a normal stream event ((kind, data, pos)), not a transformer event ((mark, (kind, data, pos))). That is why flush() prepends OUTSIDE.
Well, then, I must be doing something wrong.
comment:6 in reply to: ↑ 5 Changed 17 years ago by athomas
Replying to Dave Abrahams <dave@boost-consulting.com>:
>>> def noop(s): ... for e in s: yield e ... >>> pprint.pprint(list(html | Transformer('body/\*').filter(noop))) [('START', (QName(u'html'), Attrs()), (None, 1, 0)), ('START', (QName(u'body'), Attrs()), (None, 1, 6)), ('TEXT', u'Some text', (None, 1, 12)), ('START', (QName(u'script'), Attrs()), (None, 1, 21)), ('TEXT', u'alert(document.cookie)', (None, 1, 29)), ('END', QName(u'script'), (None, 1, 51)), **('OUTSIDE', ('END', QName(u'script'), (None, 1, 51)))**, ('END', QName(u'body'), (None, 1, 79)), ('END', QName(u'html'), (None, 1, 86))]
Perfect, thanks!
Try this patch:
-
genshi/filters/transform.py
794 794 if mark: 795 795 queue.append(event) 796 796 else: 797 for event in flush(queue):798 yield event797 for qevent in flush(queue): 798 yield qevent 799 799 yield None, event 800 800 for event in flush(queue): 801 801 yield event
comment:7 Changed 17 years ago by athomas
- Resolution set to fixed
- Status changed from assigned to closed
This fix seems to work for me, committed in r691.
This patch appears to fix it, but I don't know whether you want None marks