Edgewall Software

Opened 18 years ago

Closed 18 years ago

#37 closed enhancement (fixed)

[PATCH] Make it possible to have a dict inside a full expression

Reported by: cboos Owned by: cmlenz
Priority: minor Milestone: 0.4
Component: Template processing Version: 0.2
Keywords: Cc:

Description

Currently, the regexp for a full expression (i.e. ${...}) stops at the first "}" it finds. This makes it impossible to embed a literal dictionary inside such a full expression.
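As a minimal illustration of the problem, the sketch below applies the full-expression pattern (copied from markup/template.py as quoted in this ticket) to a template containing a dict literal. The names FULL_EXPR_RE and captured are only used for this example.

```python
import re

# Pattern for full expressions, as quoted from markup/template.py.
FULL_EXPR_RE = re.compile(r'(?<!\$)\$\{(.+?)\}', re.DOTALL)

match = FULL_EXPR_RE.search('${{1:2}}')
# The non-greedy group ends at the first "}", so the dict literal is cut short.
captured = match.group(1)
print(repr(captured))
```

The captured text is missing the closing brace of the dict, so it can no longer be compiled as a Python expression.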

It would be nice to be able to escape this character, e.g. by using a backslash:

  • markup/tests/template.py

    @@ -604,6 +604,12 @@
             self.assertEqual(Template.EXPR, parts[0][0])
             self.assertEqual('bla', parts[0][1].source)
     
    +    def test_interpolate_full_escape(self):
    +        parts = list(Template._interpolate('${{1:2\}}'))
    +        self.assertEqual(1, len(parts))
    +        self.assertEqual(Template.EXPR, parts[0][0])
    +        self.assertEqual('{1:2}', parts[0][1].source)
    +
         def test_interpolate_mixed1(self):
             parts = list(Template._interpolate('$foo bar $baz'))
             self.assertEqual(3, len(parts))

  • markup/template.py

    @@ -825,7 +825,7 @@
     
             self.stream = stream
     
    -    _FULL_EXPR_RE = re.compile(r'(?<!\$)\$\{(.+?)\}', re.DOTALL)
    +    _FULL_EXPR_RE = re.compile(r'(?<!\$)\$\{(.+?)(?<!\\)\}', re.DOTALL)
         _SHORT_EXPR_RE = re.compile(r'(?<!\$)\$([a-zA-Z][a-zA-Z0-9_\.]*)')
     
         def _interpolate(cls, text, filename=None, lineno=-1, offset=-1):
    @@ -844,6 +844,7 @@
                 for idx, group in enumerate(patterns.pop(0).split(text)):
                     if idx % 2:
                         try:
    +                        group = group.replace(r'\}','}')
                             yield EXPR, Expression(group, filename, lineno), \
                                   (filename, lineno, offset)
                         except SyntaxError, err:
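The effect of the patched pattern can be tried in isolation. This is a minimal sketch that reuses the regexp and the unescaping step from the patch; the helper name extract_escaped is hypothetical and only serves the example.

```python
import re

# Patched pattern: a "}" preceded by a backslash does not end the expression.
ESCAPED_FULL_EXPR_RE = re.compile(r'(?<!\$)\$\{(.+?)(?<!\\)\}', re.DOTALL)


def extract_escaped(text):
    """Return the source of the first full expression, with "\\}" unescaped."""
    match = ESCAPED_FULL_EXPR_RE.search(text)
    if match is None:
        return None
    # Same unescaping step as in the patch.
    return match.group(1).replace(r'\}', '}')


source = extract_escaped(r'${{1:2\}}')
print(source)
```

Only the "}" characters inside the expression need escaping; the final "}" that closes the expression stays unescaped.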

Note: This relates to oliver's suggestion on the mailing list about escaping ";" in the vars attribute of <py:with> directives. Here, I don't think that doubling "}" to escape it would be a good idea. I think that we should even consider using the backslash as the general way of quoting. That would work in all cases (also for "\;" in <py:with>).

Attachments (6)

escape_rbrace-r255.patch (1.9 KB) - added by cboos 18 years ago.
updated patch, on current trunk (hint: ready for inclusion ;)
balanced.diff (6.4 KB) - added by oliver.cope@… 18 years ago.
patch for matching balanced brackets
interpolate-fsm-r281.patch (12.1 KB) - added by oliver.cope@… 18 years ago.
Match balanced braces, even in quoted strings, using a simple fsm based parser
interpolate-parser-r281.patch (9.6 KB) - added by oliver.cope@… 18 years ago.
Match balanced braces, even in quoted strings, using python's parser module
interpolate-tokenize_r306.patch (8.0 KB) - added by cmlenz 18 years ago.
Another attempt, based on the Itpl.py code
interpolate-tokenize_r311.diff (8.5 KB) - added by cmlenz 18 years ago.
Updated patch using the Itpl approach


Change History (23)

comment:1 Changed 18 years ago by cboos

  • Component changed from General to Template processing

comment:2 Changed 18 years ago by cboos

Hm, note that if one wants a dict in a full expression, the dict constructor could be used instead of {...}.

As such, an alternative to '${{1:2\}}' would be '${dict([(1, 2)])}', or '${dict(a=2)}' for string keys, which is arguably even cleaner, as no special escaping syntax would be needed.

However, there's still the case of "}" characters within string content, e.g.

 ${flag and "this is { ... \}" or "this is [ ... ]"}

Changed 18 years ago by cboos

updated patch, on current trunk (hint: ready for inclusion ;)

comment:3 Changed 18 years ago by cmlenz

For the record, the reason I haven't applied this patch yet is that I'm still hoping we'll be able to find a way to not require the escaping at all. I consider the backslash escaping a last resort if we can't come up with something better.

Changed 18 years ago by oliver.cope@…

patch for matching balanced brackets

comment:4 Changed 18 years ago by cmlenz

Lovely! Thanks a lot, Oliver!

Now the last missing part is to ignore braces inside string literals :-P

comment:5 Changed 18 years ago by oliver.cope@…

Well, I have something that does that now. It is somewhat inelegant: it scans to the first '}', then uses python's parser module to test if what it has found is syntactically valid, and if not scans to the next '}' and so on. It runs approximately three times slower than the existing implementation, a noticeable slowdown.
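The scan-and-retry idea described above can be sketched as follows. This is not the attached patch: Python's parser module has since been removed from the standard library, so the sketch swaps in the built-in compile() as the syntax check, and the function name find_expression_end is hypothetical.

```python
def find_expression_end(text, start):
    """Return the index of the "}" that ends the expression beginning at `start`.

    Each "}" is tried in turn; the first one that leaves a syntactically
    valid Python expression in text[start:pos] wins. Returns -1 if none does.
    """
    pos = text.find('}', start)
    while pos != -1:
        candidate = text[start:pos].strip()
        try:
            # Assumption: compile() stands in for the parser module here.
            compile(candidate, '<template>', 'eval')
        except SyntaxError:
            pos = text.find('}', pos + 1)
        else:
            return pos
    return -1


template = '${{1: "}"}} tail'
end = find_expression_end(template, 2)  # index 2 is just after "${"
expression = template[2:end]
print(expression)
```

Every failed candidate costs a full compile attempt, which is consistent with the slowdown reported in this comment.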

I also tried writing a simple fsm based parser to detect string literals, but that performed even more slowly.

I'm not happy with either solution, and I can't think how else to tackle this one :o(

Of course, so long as any brackets inside string literals balance, the attached patch will still work. And if the brackets do not balance then it is always possible to make them balance, so perhaps we should not worry too much about this particular case?
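To make the trade-off concrete, here is a minimal sketch of plain brace counting. It is an illustration of the idea only, not the code from balanced.diff, and the name match_balanced is hypothetical.

```python
def match_balanced(text, start):
    """Return the index of the "}" that balances the "{" of "${".

    `start` is the index just after "${". Braces are counted without
    looking at string literals, so quoted braces are counted too.
    """
    depth = 1
    for pos in range(start, len(text)):
        char = text[pos]
        if char == '{':
            depth += 1
        elif char == '}':
            depth -= 1
            if depth == 0:
                return pos
    return -1


dict_case = '${{1: 2}} tail'
balanced_string = '${"{x}"}'
unbalanced_string = '${"}"}'

print(dict_case[2:match_balanced(dict_case, 2)])
print(balanced_string[2:match_balanced(balanced_string, 2)])
print(unbalanced_string[2:match_balanced(unbalanced_string, 2)])
```

The first two cases come out whole, while the unbalanced brace inside the string literal ends the third expression too early.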

comment:6 Changed 18 years ago by cboos

... and what about a completely different approach: doubling the braces used to enclose the expression?

  • markup/tests/template.py

    @@ -717,6 +717,12 @@
             self.assertEqual(Template.EXPR, parts[0][0])
             self.assertEqual('bla', parts[0][1].source)
     
    +    def test_interpolate_full_escape(self):
    +        parts = list(Template._interpolate('${{ {1:2} }}'))
    +        self.assertEqual(1, len(parts))
    +        self.assertEqual(Template.EXPR, parts[0][0])
    +        self.assertEqual('{1:2}', parts[0][1].source)
    +
         def test_interpolate_mixed1(self):
             parts = list(Template._interpolate('$foo bar $baz'))
             self.assertEqual(3, len(parts))

  • markup/template.py

    @@ -853,6 +853,7 @@
             self.stream = stream
     
         _FULL_EXPR_RE = re.compile(r'(?<!\$)\$\{(.+?)\}', re.DOTALL)
    +    _FULL_EXPR2_RE = re.compile(r'(?<!\$)\$\{\{(.+?)\}\}', re.DOTALL)
         _SHORT_EXPR_RE = re.compile(r'(?<!\$)\$([a-zA-Z][a-zA-Z0-9_\.]*)')
     
         def _interpolate(cls, text, filename=None, lineno=-1, offset=-1):
    @@ -889,7 +890,8 @@
                         offset += len(lines[-1])
                     else:
                         offset += len(grp)
    -        return _interpolate(text, [cls._FULL_EXPR_RE, cls._SHORT_EXPR_RE])
    +        return _interpolate(text, [cls._FULL_EXPR2_RE, cls._FULL_EXPR_RE,
    +                                   cls._SHORT_EXPR_RE])
         _interpolate = classmethod(_interpolate)
     
         def generate(self, *args, **kwargs):
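The doubled-brace pattern can be exercised on its own. The sketch below reuses the regexp from the patch; the surrounding whitespace is stripped by hand here, on the assumption that the real Expression class does something equivalent. The nested-dict input is an extra case added for illustration.

```python
import re

# Doubled-brace pattern from the patch above.
FULL_EXPR2_RE = re.compile(r'(?<!\$)\$\{\{(.+?)\}\}', re.DOTALL)

simple = FULL_EXPR2_RE.search('${{ {1:2} }}').group(1)
print(repr(simple.strip()))

# A nested dict literal that itself contains "}}" still ends the match early.
nested = FULL_EXPR2_RE.search('${{ {1: {2: 3}} }}').group(1)
print(repr(nested))
```

So the doubled delimiter handles a single dict literal, but any "}}" inside the expression closes it prematurely, just as a single "}" does with the current pattern.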

comment:7 follow-ups: Changed 18 years ago by cmlenz

cboos: that's still similar to having to escape... it's not the backslash that I dislike, it's that you need to think about this stuff at all as a template author :-P

So IMHO the ideal solution would mean dict literals in expressions “Just Work”.

Oliver, can you attach that code? Personally, I'm less concerned about parsing performance than about render performance.

Also, this code may be interesting:

http://lfw.org/python/Itpl.py

(found that via PEP 215)

comment:8 in reply to: ↑ 7 Changed 18 years ago by cboos

Replying to cmlenz:

... it's that you need to think about this stuff at all as a template author :-P

OK, I understand your point of view, but mine (as a user) is that I much prefer a simple rule that will always work over a clever algorithm that might break in unexpected circumstances and leave me with a weird backtrace...

Also, here the "user" is writing some Python code, so she's at least familiar with the Python syntax. Python has a very similar quoting idiom: the triple single quote or triple double quote syntax. I'd amend the proposal in comment:6 to use triple braces, so that this would match the Python way exactly.

e.g.

This is part of a template text ${{{ """
 Now this is part of a Python "expression. This is "%s".
""" % {1: 'Bad', 2: 'Average', 3: 'Good'}[level] }}}


So IMHO the ideal solution would mean dict literals in expressions “Just Work”.

Of course, if you find the ideal solution, I would have nothing against it. I'm only suggesting a sub-optimal approach which is not so bad, IMO (and better than not being able to have dict literals at all ;)

Changed 18 years ago by oliver.cope@…

Match balanced braces, even in quoted strings, using a simple fsm based parser

Changed 18 years ago by oliver.cope@…

Match balanced braces, even in quoted strings, using python's parser module

comment:9 in reply to: ↑ 7 Changed 18 years ago by oliver.cope@…

Replying to cmlenz:

Oliver, can you attach that code? Personally, I'm less concerned about parsing performance than about render performance.

Attached. Two versions: one uses python's parser module as described in my previous comment, the other uses an fsm based parser. Both handle quoted strings correctly. I'm currently using the FSM version in my own project as it is marginally faster, and also to give it a thorough working out.

Unfortunately, just too late, I noticed I'd left the doctests commented out for testing purposes when generating the diff, and don't have permission to replace the patches with corrected versions.

Olly.

Changed 18 years ago by cmlenz

Another attempt, based on the Itpl.py code

comment:10 Changed 18 years ago by cmlenz

  • Status changed from new to assigned

I've added my take on this, which is based on the Itpl.py module I linked to above. I've not tested the performance of this one, but I think it is rather clean.
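For readers without the attachment at hand, the general idea of a tokenizer-driven scan can be sketched as below. This is not the attached patch and does not reproduce Itpl.py; it is a rough illustration that assumes Python's standard tokenize module is an acceptable stand-in, and the function name find_closing_brace is hypothetical.

```python
import io
import tokenize


def find_closing_brace(text, start):
    """Return the index of the "}" matching the "{" at text[start], or -1.

    The text is fed to Python's tokenizer, so braces inside string
    literals arrive as part of STRING tokens and are never counted.
    """
    source = text[start:]
    lines = io.StringIO(source).readlines()
    depth = 0
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type != tokenize.OP:
                continue
            if tok.string == '{':
                depth += 1
            elif tok.string == '}':
                depth -= 1
                if depth == 0:
                    # Convert the (row, column) position back to an offset.
                    row, col = tok.start
                    return start + sum(len(l) for l in lines[:row - 1]) + col
    except (tokenize.TokenError, SyntaxError):
        pass
    return -1


template = 'Hello ${{1: "}"}[1]} world'
open_brace = template.index('${') + 1
close_brace = find_closing_brace(template, open_brace)
expression = template[open_brace + 1:close_brace]
print(expression)
```

Scanning stops as soon as the opening brace is balanced, so the template text after the expression is never tokenized.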

Also included in the patch is a change of the escaping from double dollars ($$) to using a backslash.

comment:11 Changed 18 years ago by cmlenz

Oh, plus shorthand expressions no longer accept non-alphanumeric characters (i.e. no dots or underscores). I'd like to also add the restriction that they should start with letters, to match python identifiers. Not yet sure whether those changes (plus the escaping change) should be included.

comment:12 Changed 18 years ago by cboos

Ouch! I'd happily trade the possibility of having literal dicts in full expressions for the support for dots and underscores in shorthand expressions. I mean, if you don't support those, what would be the point of having shorthand expressions at all?

comment:13 follow-up: Changed 18 years ago by cmlenz

Well, first of all, those aspects of the patch have nothing to do with this ticket, and I shouldn't have slipped them in. Sorry about that.

My thinking about shorthand expression notation is that it should be limited to what is allowed for python identifiers. Underscores should definitely be allowed.

Dots are a different story though. If we allow identifiers plus dots, the rule is no longer simple… and why then stop with dots? You could allow brackets, parentheses, etc. And really, if you have expressions that include attribute access or anything more complex, how much overhead is it to add the braces?

I must admit I wasn't even aware that the shorthand notation allowed dots before seeing some of your templates on the Trac branch. I do think allowing dots in shorthand expressions hurts clarity/readability.
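The difference between the two rules under discussion can be seen with a small sketch. The dotted pattern is the _SHORT_EXPR_RE quoted in this ticket; the identifier-only pattern is a hypothetical variant written for this example and is not taken from any patch.

```python
import re

# Shorthand pattern as quoted from markup/template.py (dots allowed).
DOTTED_SHORT_EXPR_RE = re.compile(r'(?<!\$)\$([a-zA-Z][a-zA-Z0-9_\.]*)')
# Hypothetical variant restricted to plain Python identifiers.
IDENT_SHORT_EXPR_RE = re.compile(r'(?<!\$)\$([a-zA-Z_][a-zA-Z0-9_]*)')

text = 'Key: $item.key'
dotted = DOTTED_SHORT_EXPR_RE.search(text).group(1)
ident = IDENT_SHORT_EXPR_RE.search(text).group(1)
print(dotted, ident)
```

With the identifier-only rule, ".key" would be left in the output as literal text, so the template author would have to write ${item.key} instead.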

comment:14 in reply to: ↑ 13 Changed 18 years ago by anonymous

I know we're a bit OT wrt. this ticket, but I consider the subject to be of importance, so please read on.

Replying to cmlenz:

Dots are a different story though. If we allow identifiers plus dots, the rule is no longer simple… and why then stop with dots? You could allow brackets, parentheses, etc.

The "." is at the same horizontal visual level as "_", so I don't think it makes you stop reading, as there is no whitespace after such a dot. Also, brackets are not needed, thanks to the unification of attribute access/key access. Usage of "()" would OTOH introduce some kind of vertical barrier, so I agree that this would be confusing and shouldn't be allowed.

Last but not least, use of "." retains the same read-only semantics as access to a variable.


And really, if you have expressions that include attribute access or anything more complex, how much overhead is it to add the braces?

Not much, but really, why then bother having shorthand expressions at all? If you can't use shorthand expressions to read arbitrarily deep values out of your data model, you're limited to the non-structured values that you decide to put at the top level, which are not that many in practice. Lacking this ability could perhaps even encourage the use of the with directive for trivial things (like vars="k = item.key; v = item.value"), which would be worse, IMO.


I must admit I wasn't even aware that the shorthand notation allowed dots before seeing some of your templates on the Trac branch. I do think allowing dots in shorthand expressions hurts clarity/readability.

Well, that's a matter of style... if you want to leave some stylistic freedom to Genshi users, please consider leaving this possibility in. The Kid authors seem to share your view about readability, but they nevertheless support the dotted notation as well (http://kid-templating.org/language.html#identifier-shortcut-name).

comment:15 Changed 18 years ago by cboos

(forgot to login, the anonymous from comment:14 was me)

Changed 18 years ago by cmlenz

Updated patch using the Itpl approach

comment:16 Changed 18 years ago by cmlenz

  • Milestone changed from 0.3 to 0.4

Gonna do this for the next release.

comment:17 Changed 18 years ago by cmlenz

  • Resolution set to fixed
  • Status changed from assigned to closed

Applied updated patch in [491].
