How to build a simple template engine with Python and regex

Prologue

As I mentioned previously, I want to create a static content creation system. The first step is a template engine. Rather than building a fully featured template engine, I am planning only what is needed for this major iteration.

I have also saved some bonus features for later major iterations 🎊.

With that said, this major iteration (namely v1.0.0) will have two basic features:

Including an external template into another (or inheritance, I guess 🤔)
Looping over a dataset to produce multiple pages

Before anything else, we should decide on the syntax. The generic form I have settled on looks like this:

{ macro_name macro_parameter }
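To make the two parts of that syntax concrete, here is a minimal sketch that pulls the macro name and its parameter out of a macro. The pattern and the parse_macro helper are my own illustration, not part of the engine (the engine below uses a simpler pattern that captures only the parameter), and it assumes names and parameters are made of word characters and dots.

```python
import re

# Hypothetical pattern for "{ macro_name macro_parameter }":
# "{", optional space, the name, one whitespace, the parameter, optional space, "}"
MACRO = re.compile(r"\{\s?(?P<name>\w+)\s(?P<param>[\w.]+)\s?\}")

def parse_macro(text):
    """Return (macro_name, macro_parameter) for the first macro in text, or None."""
    m = MACRO.search(text)
    if m is None:
        return None
    return m.group("name"), m.group("param")

print(parse_macro("{ include content.main }"))  # ('include', 'content.main')
print(parse_macro("<p>no macro here</p>"))      # None
```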

Without further ado, let's go 🏃‍♀️

1. Including external templates into another

For this, the syntax to embed another page called index.html into base.html would look like this:

base.html

  <html>
  <head>...</head>
  <body>
      <!-- some generic content -->
      { include content.main }
  </body>
  </html>

index.html
```
  <h1>Welcome to SPC</h1>
```

So what I want to do is read through base.html and replace the { ... } macro wherever one is encountered. We could do this in many different ways, but an easy one is with regex.

regex is short for regular expression.

Using regex in Python is much simpler than other languages make it seem. If you would like me to do a quick walkthrough of regex in Python, please let me know in the comments.

So, to substitute the template, we would do something like this:

import re # import the standard regex library

pattern = r'{\s?\w+\s(\w+.\w+)\s?}' # regex pattern to search for
specimen = """
<html>
    <head>...</head>
    <body>
        <!-- some generic content -->
        { include content.main }
    </body>
</html>
"""
replace = "<h1>Welcome to SPC</h1>"

parsed_str = re.sub(pattern, replace, specimen) # using .sub() from library
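One caveat with passing a plain string as the replacement: re.sub processes backslash escapes in it, so template content that happens to contain a backslash can break or be altered. A minimal sketch of the difference, assuming a hypothetical included snippet with a JavaScript regex literal in it; returning the text from a callable inserts it literally.

```python
import re

pattern = r'{\s?\w+\s(\w+.\w+)\s?}'
specimen = "<body>{ include content.main }</body>"
# hypothetical snippet that contains a backslash (a JavaScript regex literal)
replace = r"<script>const digits = /\d+/;</script>"

# as a string, the replacement's backslash escapes are processed;
# an unknown escape such as "\d" raises re.error (Python 3.7+)
try:
    re.sub(pattern, replace, specimen)
except re.error as err:
    print("string replacement failed:", err)

# as a callable, the returned text is inserted literally
result = re.sub(pattern, lambda _: replace, specimen)
print(result)  # <body><script>const digits = /\d+/;</script></body>
```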

Now, if we write parsed_str to a file, we get exactly the page we intended. Next, let's encapsulate this in a function for modularity and to stay DRY. The function would be:

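def eval_include(specimen, replacement):
    global pattern # the module-level pattern defined above
    # a callable replacement is inserted literally, so backslashes in the template are not treated as escapes
    return re.sub(pattern, lambda _: replacement, specimen)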

If the global keyword disgusts you, just so you know, I am coming from assembly language and Cheat-Engine 😜, so I am pretty comfortable with it.
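For the curious: Python only requires global when a function rebinds a module-level name; merely reading one works without it. Here is a sketch of the same function with the pattern compiled once at import time. MACRO_PATTERN is my own name, not part of the library, and the lambda is there so the replacement text is inserted literally.

```python
import re

# compiled once at import time; functions can read this module-level name
# without declaring it global (global is only needed to rebind it)
MACRO_PATTERN = re.compile(r'{\s?\w+\s(\w+.\w+)\s?}')

def eval_include(specimen, replacement):
    """Replace every macro in specimen with replacement."""
    return MACRO_PATTERN.sub(lambda _: replacement, specimen)

print(eval_include("<body>{ include content.main }</body>", "<h1>Welcome to SPC</h1>"))
# <body><h1>Welcome to SPC</h1></body>
```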

Now, an end user might use the library like this:

from os.path import realpath
from mathew.macros import eval_include

base = ""
with open(realpath("templates/base.html"), "r") as b:
    base = b.read()

index_template = ""
with open(realpath("templates/index.html"), "r") as i:
    index_template = i.read()

with open(realpath("out/index.html"), "w") as i:
    i.write(
        eval_include(base, index_template) # do the templating magic 🧙‍♂️
    )
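That snippet assumes out/ already exists; open(..., "w") raises FileNotFoundError if the directory is missing. A minimal sketch of the same flow using pathlib, creating the output directory first. render_page is a hypothetical helper (not part of mathew.macros), UTF-8 encoding is my assumption, and eval_include is repeated so the block runs on its own.

```python
import re
from pathlib import Path

pattern = r'{\s?\w+\s(\w+.\w+)\s?}'

def eval_include(specimen, replacement):
    # same job as the library function above, repeated so this block is standalone
    return re.sub(pattern, lambda _: replacement, specimen)

def render_page(base_path, page_path, out_path):
    """Hypothetical helper: embed page_path into base_path and write out_path."""
    base = Path(base_path).read_text(encoding="utf-8")
    page = Path(page_path).read_text(encoding="utf-8")
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create out/ if it does not exist yet
    out.write_text(eval_include(base, page), encoding="utf-8")
    return out

# usage, mirroring the snippet above:
# render_page("templates/base.html", "templates/index.html", "out/index.html")
```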

The parsed page can be found in the out/ directory. File discovery and everything else will be automated later. For now, let's just focus on one thing at a time.

2. Looping over a dataset to produce multiple pages

Let's say we have a list of article titles to display on the homepage of the blog, for example:

pubslist.html

 <section>
     <h2>Patrician Publications</h2>
     { include pubslistitem.html }
 </section>

pubslistitem.html

 <article>
         <h4>{ eval pubs.title}</h4>
         <span>{eval pubs.cat }</span>
         <p>{ eval pubs.sum }</p>
 </article>

and the dataset

 {"pubs": [
 {"title": "Some 404 content", "cat": "kavik", "sum": "Summary 501"},
 {"title": "Some 403 content", "cat": "eric", "sum": "Summary 502"},
 {"title": "Some 402 content", "cat": "beric", "sum": "Summary 503"},
 {"title": "Some 401 content", "cat": "manuk", "sum": "Summary 504"}
 ]}

The dataset maps onto a Python dict without any extra logic. What differs from embedding another template is that we now evaluate variables: we render the item template once per record, replacing each variable with the matching value, and then embed the joined string into the destination template. Let's do it, shall we?
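If the dataset lives in a .json file, the standard json module does that mapping for us. A minimal sketch, assuming the data is stored as valid JSON (JSON does not allow a trailing comma after the last element) and trimmed to two records here; with a real file you would call json.load on the open file instead of json.loads on a string.

```python
import json

# the dataset as it might be stored on disk
raw = """
{"pubs": [
  {"title": "Some 404 content", "cat": "kavik", "sum": "Summary 501"},
  {"title": "Some 403 content", "cat": "eric", "sum": "Summary 502"}
]}
"""

dataset = json.loads(raw)           # JSON object -> dict, JSON array -> list
print(type(dataset).__name__)       # dict
print(dataset["pubs"][0]["title"])  # Some 404 content
```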

To evaluate a variable, we can use regex groups. That is what the () around \w+.\w+ in the pattern is for. We can easily access the matched substring with the .group() method on the match objects returned by the re functions.

import re

str_1 = "Hello 123"
pattern = r'\w+\s(\d+)'
digits = re.finditer(pattern, str_1) # returns an iterator of `match` objects

for digit in digits:
    print(digit.group(1)) # 123

Notice we are asking for group 1, not 0. That's not because the library is 1-indexed: groups are 0-indexed, but group 0 is the entire match, "Hello 123".
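A quick way to see the numbering for yourself is to print each group. The named-group variant at the end is an extra I am adding for illustration; the engine itself only uses numbered groups.

```python
import re

m = re.search(r'\w+\s(\d+)', "Hello 123")
print(m.group(0))  # Hello 123  (the entire match)
print(m.group(1))  # 123        (the first parenthesised group)
print(m.groups())  # ('123',)   (all groups, numbered from 1)

# named groups avoid counting parentheses
m = re.search(r'(?P<word>\w+)\s(?P<digits>\d+)', "Hello 123")
print(m.group("digits"))  # 123
```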

Remember the .sub() function? Its second parameter accepts either a str or a callable. The callable is called with a match object for every non-overlapping match of the pattern, and whatever it returns becomes the replacement. So we can produce dynamic replacements based on each match, like...

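import re

pattern = r'{\s?\w+\s(\w+.\w+)\s?}'
specimen = "<h4>{ eval pubs.title}</h4>"
dataset = {"pubs": [{"title": "Some 404 content"}]}
i = 0 # index of the record being rendered

# retrieving the key from the template string
m = re.search(pattern, specimen)
key = m.group(1) # == "pubs.title"
key = key.split(".") # == ["pubs", "title"]
key = key[1] # == "title"

# evaluate with the i^th record; the key is derived per match inside the lambda
re.sub(
    pattern, # the pattern
    lambda m: dataset["pubs"][i][m.group(1).split(".")[1]],
    specimen, # the template string
) # == "<h4>Some 404 content</h4>"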

If lambda is a mystery to you, it is Python's way of defining an anonymous, inline function.
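To show that a lambda is nothing special here, this sketch passes the same lookup to re.sub once as a lambda and once as a named function; both produce the same result. The single-record dict and the lookup name are made up for the example.

```python
import re

pattern = r'{\s?\w+\s(\w+.\w+)\s?}'
record = {"title": "Some 404 content"}  # one record from the dataset
specimen = "<h4>{ eval pubs.title}</h4>"

# the inline, anonymous version
with_lambda = re.sub(pattern, lambda m: record[m.group(1).split(".")[1]], specimen)

# the same logic as a named function
def lookup(m):
    return record[m.group(1).split(".")[1]]  # "pubs.title" -> record["title"]

with_def = re.sub(pattern, lookup, specimen)
print(with_lambda == with_def, with_def)  # True <h4>Some 404 content</h4>
```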

Defining the functions for the library API would look like this:

# render the template with a single record (one dict from the dataset)
def __eval_map(string, data):
    global pattern
    return re.sub(
        pattern, lambda m: data[m.group(1).split(".")[1]], string # "pubs.title" -> data["title"]
    )

# render the template once per record in the dataset
def parse_template(template, data):
    return [
            __eval_map(template, datum)
            for datum in data
        ]

parse_template collects its results using list comprehension syntax; if you are unfamiliar with the syntax, let me know in the comments.
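For readers new to comprehensions, here is the same idea written as an explicit for loop. parse_template_loop is a hypothetical name used only for this comparison, and __eval_map is repeated so the block runs on its own.

```python
import re

pattern = r'{\s?\w+\s(\w+.\w+)\s?}'

def __eval_map(string, data):
    # fill every macro in string from the single record data
    return re.sub(pattern, lambda m: data[m.group(1).split(".")[1]], string)

def parse_template_loop(template, data):
    """Same result as parse_template, written as an explicit for loop."""
    results = []
    for datum in data:
        results.append(__eval_map(template, datum))
    return results

items = parse_template_loop("<h4>{ eval pubs.title}</h4>",
                            [{"title": "A"}, {"title": "B"}])
print(items)  # ['<h4>A</h4>', '<h4>B</h4>']
```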

So putting it all together is just as breezy as...

from os.path import realpath
from mathew.macros import parse_template, eval_include

specimen = """
<article>
    <h4>{ eval pubs.title}</h4>
    <span>{eval pubs.cat }</span>
    <p>{ eval pubs.sum }</p>
</article>
"""
dataset = {
    "pubs": [
        {"title": "Some 404 content", "cat": "kavik", "sum": "Summary 501"},
        {"title": "Some 403 content", "cat": "eric", "sum": "Summary 502"},
        {"title": "Some 402 content", "cat": "beric", "sum": "Summary 503"},
        {"title": "Some 401 content", "cat": "manuk", "sum": "Summary 504"},
    ],
}

# parse each `<article>` tag for each list item
parsed_str = parse_template(specimen, dataset["pubs"])

# join the `<article>` tag-group
pubs_list_items = "".join(parsed_str)

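pubs_list_template = ""
with open(realpath("templates/pubslist.html"), "r") as p:
    pubs_list_template = p.read()

# the base layout, read the same way as in the first example
with open(realpath("templates/base.html"), "r") as b:
    base = b.read()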

# parse the `pubs_list` itself
parsed_list = eval_include(pubs_list_template, pubs_list_items) 

# write the final file with base
with open(realpath("out/pubs.html"), "w") as i:
    i.write(
        eval_include(base, parsed_list)
    )

The final page will be written to out/pubs.html.

Done?

Not quite. Did you notice that we still have to read the template strings manually, the data has to be supplied in a specific format, and parsing the templates is still manual?

These are for later. For now, we have a simple working template engine that does the job I intended it for. I am happy with it.

Another thing keen eyes might have noticed is that the macro_name in the template does nothing. In fact, if you swap include for eval or anything else, the script still does its job as long as the parameter part is valid. This is a bad design, but the worst part is that our eval_include allows only one template: every macro in the specimen gets replaced with the same string. Gotta fix that!
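One possible direction for that fix, sketched under my own assumptions: capture the macro name as a second group and dispatch on it, so include pulls in a template by its parameter and eval reads a field from the current record. The render function, the templates mapping, and the two-group pattern are hypothetical, not the engine's current API.

```python
import re

# hypothetical pattern that captures both the macro name and its parameter
MACRO = re.compile(r'\{\s?(\w+)\s([\w.]+)\s?\}')

def render(specimen, templates, record):
    """Sketch: dispatch each macro on its name instead of ignoring it.

    templates maps an include parameter (e.g. "header.html") to its text;
    record maps a field name (e.g. "title") to its value.
    """
    def replace(m):
        name, param = m.group(1), m.group(2)
        if name == "include":
            return templates[param]                  # embed the named template
        if name == "eval":
            return str(record[param.split(".")[1]])  # "pubs.title" -> record["title"]
        raise ValueError(f"unknown macro: {name}")
    return MACRO.sub(replace, specimen)

page = "<main>{ include header.html }<h4>{ eval pubs.title}</h4>{ include footer.html }</main>"
print(render(page,
             {"header.html": "<header>SPC</header>", "footer.html": "<footer>BE</footer>"},
             {"title": "Some 404 content"}))
# <main><header>SPC</header><h4>Some 404 content</h4><footer>BE</footer></main>
```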

Epilogue

I guess I don't have anything further, so this is BE signing off.

Cover by Suzy Hazelwood