Opened 18 years ago
Closed 18 years ago
#37 closed enhancement (fixed)
[PATCH] Make it possible to have a dict inside a full expression
| Reported by: | cboos | Owned by: | cmlenz |
|---|---|---|---|
| Priority: | minor | Milestone: | 0.4 |
| Component: | Template processing | Version: | 0.2 |
| Keywords: | | Cc: | |
Description
Currently, the regexp for a full expression (i.e. ${...}) stops at the first "}" it finds. This makes it impossible to embed a literal dictionary inside such a full expression.
It would be nice to be able to escape this character, e.g. by using a backslash:
markup/tests/template.py

    @@ -604,6 +604,12 @@
             self.assertEqual(Template.EXPR, parts[0][0])
             self.assertEqual('bla', parts[0][1].source)
     
    +    def test_interpolate_full_escape(self):
    +        parts = list(Template._interpolate('${{1:2\}}'))
    +        self.assertEqual(1, len(parts))
    +        self.assertEqual(Template.EXPR, parts[0][0])
    +        self.assertEqual('{1:2}', parts[0][1].source)
    +
         def test_interpolate_mixed1(self):
             parts = list(Template._interpolate('$foo bar $baz'))
             self.assertEqual(3, len(parts))

markup/template.py

    @@ -825,7 +825,7 @@
     
             self.stream = stream
     
    -    _FULL_EXPR_RE = re.compile(r'(?<!\$)\$\{(.+?)\}', re.DOTALL)
    +    _FULL_EXPR_RE = re.compile(r'(?<!\$)\$\{(.+?)(?<!\\)\}', re.DOTALL)
         _SHORT_EXPR_RE = re.compile(r'(?<!\$)\$([a-zA-Z][a-zA-Z0-9_\.]*)')
     
         def _interpolate(cls, text, filename=None, lineno=-1, offset=-1):
    @@ -844,6 +844,7 @@
                 for idx, group in enumerate(patterns.pop(0).split(text)):
                     if idx % 2:
                         try:
    +                        group = group.replace(r'\}','}')
                             yield EXPR, Expression(group, filename, lineno), \
                                   (filename, lineno, offset)
                         except SyntaxError, err:
Note: This relates to Oliver's suggestion on the mailing list about escaping ";" in the vars attribute of <py:with> directives. Here, I don't think that doubling "}" to escape it would be a good idea. I think we should even consider using the backslash as the general way of quoting. That would work in all cases (also as "\;" in <py:with>).
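For readers who want to try the escaping rule outside the template engine, here is a minimal standalone sketch. The regular expression is the one from the patch; the helper name `split_full_exprs` and its return shape are assumptions made for illustration and are not part of the Markup code.

```python
import re

# Pattern from the patch: a "}" closes the expression only when it is
# not preceded by a backslash.
FULL_EXPR_RE = re.compile(r'(?<!\$)\$\{(.+?)(?<!\\)\}', re.DOTALL)


def split_full_exprs(text):
    """Split text into (is_expression, value) pairs.

    Hypothetical helper: re.split() with one capturing group returns
    plain text at even indices and the captured expression at odd ones.
    """
    parts = []
    for idx, group in enumerate(FULL_EXPR_RE.split(text)):
        if idx % 2:
            # Undo the escaping before the expression would be compiled.
            parts.append((True, group.replace(r'\}', '}')))
        elif group:
            parts.append((False, group))
    return parts
```

With this sketch, the input `a ${{1:2\}} b` yields the text `a `, the expression `{1:2}`, and the text ` b`.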
Attachments (6)
Change History (23)
comment:1 Changed 18 years ago by cboos
- Component changed from General to Template processing
comment:2 Changed 18 years ago by cboos
comment:3 Changed 18 years ago by cmlenz
For the record, the reason I haven't applied this patch yet is that I'm still hoping we'll be able to find a way to not require the escaping at all. I consider the backslash escaping a last resort if we can't come up with something better.
comment:4 Changed 18 years ago by cmlenz
Lovely! Thanks a lot, Oliver!
Now the last missing part is to ignore braces inside string literals :-P
comment:5 Changed 18 years ago by oliver.cope@…
Well, I have something that does that now. It is somewhat inelegant: it scans to the first '}', then uses Python's parser module to test whether what it has found is syntactically valid, and if not scans to the next '}' and so on. It runs approximately three times slower than the existing implementation, a noticeable slowdown.
I also tried writing a simple FSM-based parser to detect string literals, but that performed even more slowly.
I'm not happy with either solution, and I can't think how else to tackle this one :o(
Of course, so long as any brackets inside string literals balance, the attached patch will still work. And if the brackets do not balance then it is always possible to make them balance, so perhaps we should not worry too much about this particular case?
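The scan-and-validate idea described in this comment can be sketched roughly as follows. This is not the attached patch: it assumes the built-in `compile()` as a stand-in for the `parser` module (which no longer exists in current Python versions), and the function name `find_expr_end` is made up for the example.

```python
def find_expr_end(text, start):
    """Return the index of the "}" that ends the expression beginning
    at text[start], or -1 if no candidate is a valid expression.

    Assumption: compile(..., 'eval') is used as the validity check.
    """
    pos = text.find('}', start)
    while pos != -1:
        candidate = text[start:pos].strip()
        try:
            compile(candidate, '<template>', 'eval')
        except SyntaxError:
            # Not a complete expression yet: try the next "}".
            pos = text.find('}', pos + 1)
        else:
            return pos
    return -1
```

Each failed candidate costs one compile attempt, which is consistent with the slowdown mentioned above, although this sketch has not been benchmarked.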
comment:6 Changed 18 years ago by cboos
... and what about a completely different approach: doubling the braces used to enclose the expression?
markup/tests/template.py

    @@ -717,6 +717,12 @@
             self.assertEqual(Template.EXPR, parts[0][0])
             self.assertEqual('bla', parts[0][1].source)
     
    +    def test_interpolate_full_escape(self):
    +        parts = list(Template._interpolate('${{ {1:2} }}'))
    +        self.assertEqual(1, len(parts))
    +        self.assertEqual(Template.EXPR, parts[0][0])
    +        self.assertEqual('{1:2}', parts[0][1].source)
    +
         def test_interpolate_mixed1(self):
             parts = list(Template._interpolate('$foo bar $baz'))
             self.assertEqual(3, len(parts))

markup/template.py

    @@ -853,6 +853,7 @@
             self.stream = stream
     
         _FULL_EXPR_RE = re.compile(r'(?<!\$)\$\{(.+?)\}', re.DOTALL)
    +    _FULL_EXPR2_RE = re.compile(r'(?<!\$)\$\{\{(.+?)\}\}', re.DOTALL)
         _SHORT_EXPR_RE = re.compile(r'(?<!\$)\$([a-zA-Z][a-zA-Z0-9_\.]*)')
     
         def _interpolate(cls, text, filename=None, lineno=-1, offset=-1):
    @@ -889,7 +890,8 @@
                         offset += len(lines[-1])
                     else:
                         offset += len(grp)
    -        return _interpolate(text, [cls._FULL_EXPR_RE, cls._SHORT_EXPR_RE])
    +        return _interpolate(text, [cls._FULL_EXPR2_RE, cls._FULL_EXPR_RE,
    +                                   cls._SHORT_EXPR_RE])
         _interpolate = classmethod(_interpolate)
     
         def generate(self, *args, **kwargs):
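A simplified way to see how the doubled-brace pattern behaves: the two regular expressions below are copied from the diff, while `first_expr` is a hypothetical helper that only looks at the first match and tries the doubled form first. The real `_interpolate` splits the text with each pattern in turn, so treat this as a sketch of the matching order only.

```python
import re

# Both patterns are taken from the diff above.
FULL_EXPR2_RE = re.compile(r'(?<!\$)\$\{\{(.+?)\}\}', re.DOTALL)
FULL_EXPR_RE = re.compile(r'(?<!\$)\$\{(.+?)\}', re.DOTALL)


def first_expr(text):
    """Return the source of the first full expression, or None.

    Hypothetical helper: the doubled-brace pattern is tried first,
    mirroring the order of the pattern list in the patch.
    """
    for pattern in (FULL_EXPR2_RE, FULL_EXPR_RE):
        match = pattern.search(text)
        if match:
            return match.group(1).strip()
    return None
```

Note that the spaces around the dict in `${{ {1:2} }}` matter: written as `${{1:2}}`, the doubled pattern takes both brace pairs as delimiters and the expression source becomes `1:2`.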
comment:7 follow-ups: ↓ 8 ↓ 9 Changed 18 years ago by cmlenz
cboos: that's still similar to having to escape... it's not the backslash that I dislike, it's that you need to think about this stuff at all as a template author :-P
So IMHO the ideal solution would mean dict literals in expressions “Just Work”.
Oliver, can you attach that code? Personally, I'm less concerned about parsing performance than about render performance.
Also, this code may be interesting:
(found that via PEP 215)
comment:8 in reply to: ↑ 7 Changed 18 years ago by cboos
Replying to cmlenz:
... it's that you need to think about this stuff at all as a template author :-P
OK, I understand your p.o.v., but mine (as a user) is that I much prefer a simple rule that will always work to a clever algorithm that might break in unexpected circumstances and leave me with a weird backtrace...
Also, here the "user" is writing some Python code, so she's at least familiar with the Python syntax. Python has a very similar quoting idiom: the triple single quote or triple double quote syntax. I'd amend the comment:6 proposal to use triple braces, so that this would match the Python way exactly.
e.g.
    This is part of a template text ${{{ """
    Now this is part of a Python "expression.
    This is "%s".
    """ % {1: 'Bad', 2: 'Average', 3: 'Good'}[level] }}}
So IMHO the ideal solution would mean dict literals in expressions “Just Work”.
Of course, if you find the ideal solution, I would have nothing against it. I'm only suggesting a sub-optimal approach which is not so bad, IMO (and better than not being able to have dict literals at all ;)
Changed 18 years ago by oliver.cope@…
Match balanced braces, even in quoted strings, using a simple FSM-based parser
Changed 18 years ago by oliver.cope@…
Match balanced braces, even in quoted strings, using Python's parser module
comment:9 in reply to: ↑ 7 Changed 18 years ago by oliver.cope@…
Replying to cmlenz:
Oliver, can you attach that code? Personally, I'm less concerned about parsing performance than about render performance.
Attached. Two versions: one uses Python's parser module as described in my previous comment, the other uses an FSM-based parser. Both handle quoted strings correctly. I'm currently using the FSM version in my own project as it is marginally faster, and also to give it a thorough workout.
Unfortunately, just too late, I noticed I'd left the doctests commented out for testing purposes when generating the diff, and don't have permission to replace the patches with corrected versions.
Olly.
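To illustrate the FSM idea, here is a small state machine that balances braces while skipping over quoted strings. It is a sketch written for this discussion, not the attached patch: the function name `find_closing_brace` is an assumption, and it only tracks single-character quotes, so triple-quoted strings that contain bare quote characters are not handled.

```python
def find_closing_brace(text, start):
    """Return the index of the "}" balancing the "{" at text[start],
    or -1 if the braces never balance.

    States: outside a string (quote is None) or inside a string
    (quote holds the opening quote character).
    """
    depth = 0
    quote = None
    i = start
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == '\\':
                i += 1          # skip the escaped character
            elif ch == quote:
                quote = None    # string literal ends
        elif ch in '"\'':
            quote = ch          # string literal begins
        elif ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1
```

Because braces inside string literals are ignored, an expression such as `{'a': '}'}['a']` is matched up to its real closing brace.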
comment:10 Changed 18 years ago by cmlenz
- Status changed from new to assigned
I've added my take on this, which is based on the Itpl.py module I linked to above. I've not tested the performance of this one, but I think it is rather clean.
Also included in the patch is a change of the escaping from double dollars ($$) to using a backslash.
comment:11 Changed 18 years ago by cmlenz
Oh, plus shorthand expressions no longer accept non-alphanumeric characters (i.e. no dots or underscores). I'd also like to add the restriction that they should start with a letter, to match Python identifiers. Not yet sure whether those changes (plus the escaping change) should be included.
comment:12 Changed 18 years ago by cboos
Ouch! I'd happily trade the ability to have literal dicts in full expressions for support for dots and underscores in shorthand expressions. I mean, if you don't support those, what would be the point of having shorthand expressions at all?
comment:13 follow-up: ↓ 14 Changed 18 years ago by cmlenz
Well, first of all, those aspects of the patch have nothing to do with this ticket, and I shouldn't have slipped them in. Sorry about that.
My thinking about shorthand expression notation is that it should be limited to what is allowed for Python identifiers. Underscores should definitely be allowed.
Dots are a different story though. If we allow identifiers plus dots, the rule is no longer simple… and why then stop with dots? You could allow brackets, parentheses, etc. And really, if you have expressions that include attribute access or anything more complex, how much overhead is it to add the braces?
I must admit I wasn't even aware that the shorthand notation allowed dots before seeing some of your templates on the Trac branch. I do think allowing dots in shorthand expressions hurts clarity/readability.
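The practical difference between the two rules can be seen by comparing the existing shorthand pattern with an identifier-only one. `DOTTED_RE` is the `_SHORT_EXPR_RE` pattern shown in the diffs on this ticket; `IDENT_RE` is a hypothetical pattern written for this comparison and is not taken from any patch.

```python
import re

# Existing shorthand pattern, as shown in the diffs on this ticket.
DOTTED_RE = re.compile(r'(?<!\$)\$([a-zA-Z][a-zA-Z0-9_\.]*)')

# Hypothetical identifier-only pattern (an assumption for illustration).
IDENT_RE = re.compile(r'(?<!\$)\$([a-zA-Z_][a-zA-Z0-9_]*)')


def shorthand_names(pattern, text):
    """Return the expression sources a shorthand pattern finds in text."""
    return pattern.findall(text)
```

For the text `Hello $user.name!`, the dotted pattern finds `user.name`, while the identifier-only pattern finds `user` and leaves `.name` as literal text, so the template author would have to write `${user.name}` instead.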
comment:14 in reply to: ↑ 13 Changed 18 years ago by anonymous
I know we're a bit OT wrt. this ticket, but I consider the subject to be of importance, so please read on.
Replying to cmlenz:
Dots are a different story though. If we allow identifiers plus dots, the rule is no longer simple… and why then stop with dots? You could allow brackets, parenthesis, etc.
The "." is at the same horizontal visual level as "_", so I don't think it makes you stop reading, as there is no white-space after such a dot. Also, brackets are not needed, thanks to the unification of attribute access/key access. Usage of "()" would OTOH introduce some kind of vertical barrier, so I agree that this would be confusing and shouldn't be allowed.
Last but not least, use of "." retains the same read-only semantics as access to a variable.
And really, if you have expressions that include attribute access or anything more complex, how much overhead is it to add the braces?
Not much, but really, why then bother having shorthand expressions at all? If you can't use shorthand expressions to read arbitrarily deep values out of your data model, you're limited to the non-structured values that you decide to put at the toplevel, which are not that many in practice. Lacking this ability could perhaps even encourage the use of the with directive for trivial things (like vars="k = item.key; v = item.value"), which would be worse, IMO.
I must admit I wasn't even aware that the shorthand notation allowed dots before seeing some of your templates on the Trac branch. I do think allowing dots in shorthand expressions hurts clarity/readability.
Well, that's a matter of style... if you want to leave some stylistic freedom to Genshi users, please consider leaving this possibility in. The Kid authors seem to share your view about readability, but they nevertheless support the dotted notation as well (http://kid-templating.org/language.html#identifier-shortcut-name).
comment:15 Changed 18 years ago by cboos
(forgot to login, the anonymous from comment:14 was me)
comment:16 Changed 18 years ago by cmlenz
- Milestone changed from 0.3 to 0.4
Gonna do this for the next release.
comment:17 Changed 18 years ago by cmlenz
- Resolution set to fixed
- Status changed from assigned to closed
Applied updated patch in [491].
Hm, note that if one wants a dict in a full expression, the dict constructor could be used instead of {...}.
As such, an alternative to '${{1:2\}}' would be '${dict([(1, 2)])}' (the key: value form is only valid inside braces), which is arguably even cleaner, as no special escaping syntax would be needed.
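As a quick check of which brace-free spellings Python accepts, the sketch below builds the same kind of dict through the constructor and tests candidate expression sources with the built-in `compile()`. Both function names are made up for this example.

```python
def brace_free_dicts():
    """Build dicts without using the {...} literal syntax."""
    by_pairs = dict([(1, 2)])          # works for any hashable key
    by_keywords = dict(one=1, two=2)   # keys must be valid identifiers
    return by_pairs, by_keywords


def is_valid_expression(source):
    """Return True if source compiles as a single Python expression."""
    try:
        compile(source, '<template>', 'eval')
    except SyntaxError:
        return False
    return True
```

The colon form `dict(1:2)` does not compile, so for non-identifier keys the list-of-pairs spelling is the one that would work inside `${...}`.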
However, there's still the case of "}" characters within string content, e.g.