Genshi Recipes: Localization
This is code to aid in localization of Genshi templates, without altering the underlying templates. It was originally written by Matt Good, then updated and fixed up by David Fraser.
How it works
First a word on streams. This code operates on Template streams. These are lists of events fairly similar to normal Genshi XML streams, but they also contain special events such as EXPR (expressions) and SUB (substreams). The code needs to work at this level because it should take advantage of the template parsing, but it must run before the context data is merged into the template.
In order to do this, we need three parts:
- Extraction of the localization text from templates
- Construction of the localized template stream using the translations and the original template stream
- Use of the localized template stream to generate the resulting page
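The following is a minimal, self-contained sketch of how the three parts fit together. It does not use Genshi: it models a template stream as a plain list of (kind, data, pos) tuples, and the string kinds 'START', 'TEXT' and 'END', the toy serializer and the dictionary catalog are all hypothetical stand-ins for Genshi's event kinds, its serializer and gettext.

```python
# Hypothetical stand-ins for Genshi's event kinds (the real code uses genshi.core.START etc.)
START, TEXT, END = 'START', 'TEXT', 'END'

# a toy "template stream": (kind, data, pos) tuples, pos = (filename, line, column)
template_stream = [
    (START, 'p', ('page.html', 1, 0)),
    (TEXT, '  Hello world  ', ('page.html', 1, 3)),
    (END, 'p', ('page.html', 1, 18)),
]

def extract(stream):
    """Part 1: collect (line number, text) pairs for translatable text."""
    return [(pos[1], data.strip()) for kind, data, pos in stream
            if kind == TEXT and data.strip()]

def localize(stream, ugettext):
    """Part 2: yield a new, localized stream without touching the original."""
    for kind, data, pos in stream:
        if kind == TEXT and data.strip():
            key = data.strip()
            data = data.replace(key, ugettext(key))  # keeps surrounding whitespace
        yield kind, data, pos

def render(stream):
    """Part 3: generate output from the localized stream (toy serializer)."""
    out = []
    for kind, data, pos in stream:
        if kind == START:
            out.append('<%s>' % data)
        elif kind == END:
            out.append('</%s>' % data)
        else:
            out.append(data)
    return ''.join(out)

catalog = {'Hello world': 'Bonjour le monde'}  # hypothetical translations
print(extract(template_stream))
print(render(localize(template_stream, lambda s: catalog.get(s, s))))
```

Because the localization step builds new tuples instead of editing the list in place, the same parsed template can be reused for users with different languages.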
The code on this page uses gettext - it generates POT files (although they don't currently contain the required header) and uses the ugettext function to translate the template pages (which will use MO files compiled from PO files containing the translations).
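Since the generated POT files lack the header entry, one option is to write a minimal header into a new POT file before extracting, because the lang_extract function in the module below opens the POT file in append mode. The sketch below is an assumption, not part of the original recipe: write_pot_header and its project argument are hypothetical, and only a few common header fields are written. The header is the entry with an empty msgid whose msgstr holds "Name: value" lines.

```python
import io
import time

def write_pot_header(fd, project='PROJECT VERSION'):
    """Hypothetical helper: write a minimal gettext header entry to fd.

    Each header field is written as a quoted line ending in an escaped
    newline ("\\n"), as in any multi-line PO string."""
    fields = [
        ('Project-Id-Version', project),
        ('POT-Creation-Date', time.strftime('%Y-%m-%d %H:%M%z')),
        ('MIME-Version', '1.0'),
        ('Content-Type', 'text/plain; charset=UTF-8'),
        ('Content-Transfer-Encoding', '8bit'),
    ]
    fd.write('msgid ""\n')
    fd.write('msgstr ""\n')
    for name, value in fields:
        fd.write('"%s: %s\\n"\n' % (name, value))
    fd.write('\n')  # blank line separates the header from the first entry

buf = io.StringIO()
write_pot_header(buf, project='MyApp 0.1')
print(buf.getvalue())
```

With a real file, you would open the new .pot file for writing, call write_pot_header on it, close it, and then run lang_extract, which appends the extracted entries after the header.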
Extraction of Localized Text
Here is a module that can be used to extract text from Genshi template streams into POT files, for translation into different languages:
import fnmatch
import os
import re
import logging
import copy

import genshi.core
import genshi.input
import genshi.eval
import genshi.template

# tags whose contents are never extracted or localized
ignore_tags = ['script', 'style']
# attributes whose plain string values are treated as translatable text
include_attribs = ['title', 'alt']
# version control / metadata directories skipped by _matching_files
exclude_dirs = ('.AppleDouble', '.svn', 'CVS', '_darcs')
# matches _('...') or _("...") inside template expressions;
# the captured group still includes the surrounding quotes
gettext_re = re.compile(r"_\(((?:'[^']*')|(?:\"[^\"]*\"))\)")

# calculate escapes: a table mapping each character code to its .po representation
escapes = []

def make_escapes(pass_iso8859):
    global escapes
    if pass_iso8859:
        # Allow iso-8859 characters to pass through so that e.g. 'msgid
        # "Höhe"' would not result in 'msgid "H\366he"'. Otherwise we
        # escape any character outside the 32..126 range.
        mod = 128
    else:
        mod = 256
    for i in range(256):
        if 32 <= (i % mod) <= 126:
            escapes.append(chr(i))
        else:
            escapes.append("\\%03o" % i)
    escapes[ord('\\')] = '\\\\'
    escapes[ord('\t')] = '\\t'
    escapes[ord('\r')] = '\\r'
    escapes[ord('\n')] = '\\n'
    escapes[ord('\"')] = '\\"'

make_escapes(False)

def escape(s):
    # note: the table only covers character codes below 256
    return ''.join(escapes[ord(c)] for c in s)

def normalize(s):
    """This converts the various Python string types into a format that is
    appropriate for .po files, namely much closer to C style."""
    lines = s.split('\n')
    if len(lines) == 1:
        s = '"' + escape(s) + '"'
    else:
        if not lines[-1]:
            del lines[-1]
            lines[-1] = lines[-1] + '\n'
        for i in range(len(lines)):
            lines[i] = escape(lines[i])
        lineterm = '\\n"\n"'
        s = '""\n"' + lineterm.join(lines) + '"'
    return s

def lang_extract(potfile, source_files, template_class=None):
    """extracts text strings from the given source files and outputs them at
    the end of the given pot file"""
    fd = open(potfile, 'a')
    try:
        keys_found = {}
        key_order = []
        for fname, linenum, key in extract_keys(source_files, ['.'], template_class):
            if key in keys_found:
                keys_found[key].append((fname, linenum))
            else:
                keys_found[key] = [(fname, linenum)]
                key_order.append(key)
        # one entry per key, listing every location it was found at
        for key in key_order:
            for fname, linenum in keys_found[key]:
                fd.write('#: %s:%s\n' % (fname, linenum))
            fd.write('msgid %s\n' % normalize(key))
            fd.write('msgstr ""\n\n')
    finally:
        fd.close()

def _matching_files(dirname, fileglob):
    """searches for matching filenames in a directory"""
    for root, dirs, files in os.walk(dirname):
        for exclude in exclude_dirs:
            try:
                dirs.remove(exclude)
            except ValueError:
                pass
        for fname in fnmatch.filter(files, fileglob):
            yield os.path.join(root, fname)

def extract_keys(files, search_path=None, template_class=None):
    """finds all the text keys in the given files"""
    loader = genshi.template.TemplateLoader(search_path)
    for fname in files:
        logging.info('Scanning l10n keys from: %s', fname)
        try:
            if template_class is None:
                template = loader.load(fname)
            else:
                template = loader.load(fname, cls=template_class)
        except genshi.input.ParseError as e:
            logging.warning('Skipping extracting l10n keys from %s: %s', fname, e)
            continue
        for linenum, key in extract_from_template(template):
            yield fname, linenum, key

def extract_from_template(template, search_text=True):
    """helper to extract linenumber and key pairs from a given template"""
    return extract_from_stream(template.stream, search_text)

def extract_from_stream(stream, search_text=True):
    """takes a parsed template's .stream (not a normal XML Stream) and searches
    for localizable text, yielding linenumber, text tuples"""
    # search_text is set to False when extracting from substreams (that are
    # attribute values for an attribute which is not text); in this case,
    # only Python strings in expressions are extracted
    stream = iter(stream)
    tagname = None
    skip_level = 0
    for kind, data, pos in stream:
        linenum = pos[1]
        if skip_level:
            # inside an ignored tag: only track nesting of ignored tags
            if kind is genshi.core.START:
                tag, attrs = data
                if tag.localname in ignore_tags:
                    skip_level += 1
            if kind is genshi.core.END:
                tag = data
                if tag.localname in ignore_tags:
                    skip_level -= 1
            continue
        if kind is genshi.core.START:
            tag, attrs = data
            tagname = tag.localname
            if tagname in ignore_tags:
                # skip the substream
                skip_level += 1
                continue
            for name, value in attrs:
                if isinstance(value, basestring):
                    if search_text and name in include_attribs:
                        yield linenum, value
                else:
                    # the attribute value is itself a substream (it contains expressions)
                    for dummy, key in extract_from_stream(value, name in include_attribs):
                        yield linenum, key
        elif kind is genshi.template.EXPR:
            if data.source != "?":
                # TODO: check if these expressions should be localized
                for key in gettext_re.findall(data.source):
                    key = key[1:-1]  # strip the surrounding quotes
                    if key:
                        yield linenum, key
        elif kind is genshi.core.TEXT and search_text:
            key = data.strip()
            if key:
                yield linenum, key
        elif kind is genshi.template.SUB:
            sub_kind, sub_stream = data
            for linenum, key in extract_from_stream(sub_stream, search_text):
                yield linenum, key
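To see what the extraction module does with expressions and duplicate strings, here is a small standalone sketch. It reuses the same gettext_re pattern (the captured group keeps the quotes, which is why the module strips them with key[1:-1]) and reimplements the grouping done in lang_extract on hard-coded (fname, linenum, key) triples. format_entries is a hypothetical name, and it skips the escaping that normalize() performs.

```python
import re

# same pattern as gettext_re in the module above
gettext_re = re.compile(r"_\(((?:'[^']*')|(?:\"[^\"]*\"))\)")

source = "_('Save') + ' / ' + _(\"Cancel\")"
matches = gettext_re.findall(source)
keys = [m[1:-1] for m in matches]  # strip the surrounding quotes, as the module does
print(matches)
print(keys)

def format_entries(found):
    """Group (fname, linenum, key) triples the way lang_extract does:
    one msgid per key, in first-seen order, with every location listed."""
    locations = {}
    order = []
    for fname, linenum, key in found:
        if key not in locations:
            locations[key] = []
            order.append(key)
        locations[key].append((fname, linenum))
    out = []
    for key in order:
        for fname, linenum in locations[key]:
            out.append('#: %s:%s' % (fname, linenum))
        # simplified: the real code escapes the key with normalize()
        out.append('msgid "%s"' % key)
        out.append('msgstr ""')
        out.append('')
    return '\n'.join(out)

print(format_entries([('index.html', 3, 'Save'),
                      ('edit.html', 7, 'Cancel'),
                      ('edit.html', 9, 'Save')]))
```

Note that the pattern only matches a single string literal placed directly inside _(...): calls such as _( 'x' ) with spaces, or strings containing escaped quotes, are not picked up.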
Localization of the Template Stream at Run Time
The following function can then be used to localize the template stream (see below for details on use). The ugettext function is passed in as a parameter because language selection needs to happen based on the language of the user submitting the request, not that of the machine serving the pages. You can thus pass in a specialized ugettext function that uses the appropriate language for the current user.
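As an illustration of why ugettext is a parameter, here is a minimal sketch that builds a ugettext-style function per user. The in-memory CATALOGS dictionary and make_ugettext are hypothetical; in practice the catalogs would come from compiled MO files via gettext.

```python
# Hypothetical per-language catalogs; in practice these come from compiled MO files.
CATALOGS = {
    'fr': {'Welcome': 'Bienvenue', 'Log out': 'Se déconnecter'},
    'de': {'Welcome': 'Willkommen'},
}

def make_ugettext(language):
    """Return a ugettext-style function bound to one user's language.

    Unknown languages and untranslated strings fall back to the original
    text, as gettext does for missing entries."""
    catalog = CATALOGS.get(language, {})
    def ugettext(message):
        return catalog.get(message, message)
    return ugettext

# each request builds its own function from the user's preferred language
for lang in ('fr', 'de', 'ja'):
    _ = make_ugettext(lang)
    print(lang, _('Welcome'), '/', _('Log out'))
```

A function built this way is what you would pass as the ugettext argument of localize_template, so the same parsed template serves every language.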
def localize_template(template_source_stream, ugettext, search_text=True):
    """localizes the given template stream (the parsed template's .stream,
    which contains EXPR and SUB events, not a rendered output stream);
    you need to pass in the ugettext function you want to use"""
    # NOTE: this MUST NOT modify the underlying objects or template reuse will break
    # in addition, if it calls itself recursively it must convert the result to a
    # list or it will break on repetition
    # search_text is set to False when localizing substreams (that are attribute
    # values for an attribute which is not text); in this case, only Python
    # strings in expressions are localized
    stream = iter(template_source_stream)
    skip_level = 0
    for kind, data, pos in stream:
        # handle skipping whole chunks we don't want to localize (just yielding everything in them)
        if skip_level:
            if kind is genshi.core.START:
                tag, attrs = data
                if tag.localname in ignore_tags:
                    skip_level += 1
            if kind is genshi.core.END:
                if data.localname in ignore_tags:
                    skip_level -= 1
            yield kind, data, pos
            continue
        # handle different kinds of things we want to localize
        if kind is genshi.core.START:
            tag, attrs = data
            tagname = tag.localname
            if tagname in ignore_tags:
                skip_level += 1
                yield kind, data, pos
                continue
            # work on a copy so the template's own attributes are never modified
            new_attrs = genshi.core.Attrs(attrs[:])
            changed = False
            for name, value in attrs:
                if isinstance(value, basestring):
                    if search_text and name in include_attribs:
                        # translate the attribute value itself
                        new_attrs.set(name, ugettext(value))
                        changed = True
                else:
                    # the value is a substream, so we get back a localized substream
                    # note: passing search_text=False implies far fewer matches; this may
                    # be wasteful and the subcall could be skipped in some cases
                    new_value = list(localize_template(value, ugettext,
                                                       search_text=(name in include_attribs)))
                    new_attrs.set(name, new_value)
                    changed = True
            if changed:
                # ensure we don't change the original attributes
                attrs = new_attrs
            yield kind, (tag, attrs), pos
        elif kind is genshi.template.EXPR:
            if data.source != "?":
                # TODO: check if these expressions should be localized
                for key in gettext_re.findall(data.source):
                    key = key[1:-1]
                    if key:
                        new_key = ugettext(key)
                        # TODO: if we do this, it needs to be fixed :-)
                        new_data = genshi.eval.Expression(data.source.replace(key, new_key))
                        # we lose the following data, but can't assign as it's readonly
                        # new_data.code.co_filename = data.code.co_filename
                        # new_data.code.co_firstlineno = data.code.co_firstlineno
            # for now the original expression is always yielded unchanged (see TODO above)
            yield kind, data, pos
        elif kind is genshi.core.TEXT and search_text:
            # we can adjust this as strings are immutable, so this won't change the original string
            key = data.strip()
            if key:
                new_key = ugettext(key)
                data = data.replace(key, new_key)
            yield kind, data, pos
        elif kind is genshi.template.SUB:
            sub_kind, sub_stream = data
            new_sub_stream = list(localize_template(sub_stream, ugettext, search_text=search_text))
            yield kind, (sub_kind, new_sub_stream), pos
        else:
            yield kind, data, pos
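The skipping of ignored tags and the warning about converting recursive results to lists can both be seen in a small standalone sketch. It mimics only the skip_level and TEXT handling of localize_template. The string event kinds are hypothetical stand-ins, and START/END data here is a bare tag name rather than a Genshi QName.

```python
# stand-ins for genshi.core.START / END / TEXT
START, END, TEXT = 'START', 'END', 'TEXT'
ignore_tags = ['script', 'style']

def localize_text(stream, ugettext):
    """Toy version of localize_template's tag skipping and TEXT handling."""
    skip_level = 0
    for kind, data, pos in stream:
        if skip_level:
            # inside an ignored element: pass everything through untouched,
            # but keep counting nested ignored tags
            if kind == START and data in ignore_tags:
                skip_level += 1
            elif kind == END and data in ignore_tags:
                skip_level -= 1
            yield kind, data, pos
            continue
        if kind == START and data in ignore_tags:
            skip_level += 1
        elif kind == TEXT:
            key = data.strip()
            if key:
                data = data.replace(key, ugettext(key))
        yield kind, data, pos

stream = [
    (START, 'p', None), (TEXT, ' Hello ', None), (END, 'p', None),
    (START, 'script', None), (TEXT, 'alert("Hello")', None), (END, 'script', None),
    (TEXT, 'Hello', None),
]
catalog = {'Hello': 'Hallo'}
result = list(localize_text(stream, lambda s: catalog.get(s, s)))
texts = [data for kind, data, pos in result if kind == TEXT]
print(texts)

# a generator can only be consumed once, which is why the real code
# wraps recursive calls in list() before storing them in the stream
gen = localize_text(stream, lambda s: catalog.get(s, s))
first, second = list(gen), list(gen)
print(len(first), len(second))
```

The text inside the script element is left alone, and the original stream list is untouched because new tuples are yielded instead of editing the old ones.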
Page Generation with Localized Templates
In order to use the modified Template stream, some processing has to happen before the normal Genshi mechanism takes over.
This class allows inclusion of "prefilters" that operate on the Template stream before its standard filters (simply using a filter to do this doesn't work; it causes problems):
class PrefilterMarkupTemplate(genshi.template.MarkupTemplate):
    """Derived markup template that can receive prefilters in its generate method"""
    # TODO: try and upstream this into genshi
    def generate(self, prefilters, *args, **kwargs):
        """Apply the template to the given context data.

        `prefilters` is a list of callables that each take an event stream and
        return a new one; they are applied to the template stream before the
        template's own filters.

        Any keyword arguments are made available to the template as context
        data.

        Only one positional argument is accepted: if it is provided, it must be
        an instance of the `Context` class, and keyword arguments are ignored.
        This calling style is used for internal processing.

        @return: a markup event stream representing the result of applying
            the template to the context data.
        """
        if args:
            assert len(args) == 1
            ctxt = args[0]
            if ctxt is None:
                ctxt = genshi.template.Context(**kwargs)
            assert isinstance(ctxt, genshi.template.Context)
        else:
            ctxt = genshi.template.Context(**kwargs)

        stream = self.stream
        for prefilter in prefilters:
            # TODO: add support for context in prefilters
            stream = prefilter(iter(stream))
        for filter_ in self.filters:
            stream = filter_(iter(stream), ctxt)
        return genshi.core.Stream(stream)
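The ordering that PrefilterMarkupTemplate.generate establishes can be sketched without Genshi: prefilters see only the template stream, then the template's filters see the stream plus the context. In the hypothetical example below, events are simplified to (kind, data) pairs, translate_prefilter stands in for localization and substitute stands in for Genshi's expression evaluation. Because translation runs first, it sees the placeholder rather than the user's data.

```python
def generate(template_stream, prefilters, filters, ctxt):
    """Mimic PrefilterMarkupTemplate.generate: prefilters take only the
    stream, then the template's filters take the stream and the context."""
    stream = template_stream
    for prefilter in prefilters:
        stream = prefilter(iter(stream))
    for filter_ in filters:
        stream = filter_(iter(stream), ctxt)
    return list(stream)

def translate_prefilter(stream):
    # hypothetical catalog; translates the template text, placeholder included
    catalog = {'Hello, ${name}!': 'Bonjour, ${name} !'}
    for kind, data in stream:
        if kind == 'TEXT':
            data = catalog.get(data, data)
        yield kind, data

def substitute(stream, ctxt):
    # hypothetical filter: fill ${key} placeholders from the context
    for kind, data in stream:
        if kind == 'TEXT':
            for key, value in ctxt.items():
                data = data.replace('${%s}' % key, value)
        yield kind, data

template = [('TEXT', 'Hello, ${name}!')]
print(generate(template, [translate_prefilter], [substitute], {'name': 'Ann'}))
print(generate(template, [], [substitute], {'name': 'Ann'}))
```

This is the property the recipe relies on: localization works on the template's own text, before context data is merged in.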
This derived class then allows you to call the above localization function as a prefilter on templates. It takes a domain_name parameter (this corresponds to which PO/MO file to use for translation), and assumes you can construct or retrieve a translation object for the current user on the fly using a get_translation function (not described here):
class LocalizeMarkupTemplate(PrefilterMarkupTemplate):
    """Derived markup template that can handle localizing before stream generation"""
    def __init__(self, source, basedir=None, filename=None, loader=None,
                 encoding=None, domain_name=None):
        """Initialize a template from either a string or a file-like object."""
        super(LocalizeMarkupTemplate, self).__init__(source, basedir=basedir,
                                                     filename=filename, loader=loader,
                                                     encoding=encoding)
        self.domain_name = domain_name

    def localize_prefilter(self, stream):
        """prefilter for localizing..."""
        # get_translation (not described here) must return a gettext translation
        # object for the current user and the given domain
        translation = get_translation(self.domain_name)
        stream = genshi.core.Stream(stream)
        # genshigettext is the module containing localize_template (above)
        localized_stream = genshigettext.localize_template(stream, translation.ugettext)
        return list(localized_stream)

    # TODO: try and persuade genshi to accept a stream directly here instead of
    # using self.stream - or accept prefilters; then we won't use such fragile copied code...
    def generate(self, prefilters, *args, **kwargs):
        """Apply the template to the given context data.

        Any keyword arguments are made available to the template as context
        data.

        Only one positional argument is accepted: if it is provided, it must be
        an instance of the `Context` class, and keyword arguments are ignored.
        This calling style is used for internal processing.

        @return: a markup event stream representing the result of applying
            the template to the context data.
        """
        return super(LocalizeMarkupTemplate, self).generate(
            prefilters + [self.localize_prefilter], *args, **kwargs)
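Since get_translation is not described here, the following is only one possible sketch of it, built on the standard gettext module. The per-request ContextVar, the LOCALE_DIR value and the 'mysite' domain are all hypothetical; the web framework would be responsible for setting the current language at the start of each request. With fallback=True, gettext.translation returns a NullTranslations object (which returns strings unchanged) when no matching MO file exists.

```python
import contextvars
import functools
import gettext

# Hypothetical: set by the web framework at the start of each request,
# e.g. from the user's preferences or the Accept-Language header.
current_language = contextvars.ContextVar('current_language', default='en')

LOCALE_DIR = 'locale'  # hypothetical: contains <lang>/LC_MESSAGES/<domain>.mo

@functools.lru_cache(maxsize=None)
def _load(domain_name, language):
    # cached per (domain, language) so MO files are parsed only once
    return gettext.translation(domain_name, localedir=LOCALE_DIR,
                               languages=[language], fallback=True)

def get_translation(domain_name):
    """Return the translation object for the current user's language."""
    return _load(domain_name, current_language.get())

token = current_language.set('fr')
try:
    translation = get_translation('mysite')
    # Python 3 translation objects provide .gettext (Python 2 had .ugettext)
    print(translation.gettext('Welcome'))
finally:
    current_language.reset(token)
```

Under Python 3, localize_prefilter would need to pass translation.gettext instead of translation.ugettext, since ugettext only exists in Python 2.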
Using the above code
Ideally the above code would be integrated into Genshi, but if you want to use it in the meantime:
- Place it in a module that you can use
- Run the extraction code to get the .pot files containing your translatable text (and verify that they seem sensible)
- Use a translation editor (something like Pootle for online translation, or poedit or kbabel for a local GUI) to create the translated .po files, then compile them into .mo files
- Use the LocalizeMarkupTemplate class rather than the standard MarkupTemplate class to parse your templates, and set up the required translation hooks (such as get_translation) for generation
- Hint: Don't waste your time adding _("") to your templates; all the text is extracted automatically.
See also: GenshiRecipes, Internationalization and Localization