Becoming a Jinja Ninja

Jinja is a templating framework used within Airflow

For those of us coming to Airflow without a background in web development (e.g. with Flask), the power of Jinja can be quite hidden. At the time of writing, none of the examples in the Airflow documentation show off anything more than:

{{ my_var }} or {{ my_func(a_variable) }}

This misses out on loads of awesome functionality which can save you so much time and make your code much more readable!

Additionally, from Python 3.6 onwards, this sort of simple templating is so easy with f-strings:

my_template=f'{my_var}' or my_template=f'{my_func(a_variable)}'

So why bother to use Jinja at all? Happily, the answer is that Jinja offers so much more, in a format that is much clearer to read than string manipulation in vanilla Python.

Filters

If columns = ['col1', 'col2', 'col3'] then we can easily concatenate the list of columns, e.g.


SELECT {{ columns|join(', ') }}
FROM {{ my_table }}

would be (assuming my_table is 'schema.table'):


SELECT col1, col2, col3
FROM schema.table
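To try this outside Airflow, here is a minimal sketch using the jinja2 package directly. The value 'schema.table' for my_table is an assumption chosen to match the output shown above.

```python
from jinja2 import Template

# Template text copied from the example above.
sql_template = Template(
    "SELECT {{ columns|join(', ') }}\n"
    "FROM {{ my_table }}"
)

rendered = sql_template.render(
    columns=["col1", "col2", "col3"],
    my_table="schema.table",  # assumed value, matching the output above
)
print(rendered)
```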

Other built-in filters are listed in the Jinja documentation.
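As a small taste, the sketch below chains a few of the other built-in filters (length, first, upper and default). The row_limit variable and the comment line in the SQL are hypothetical, added only for illustration.

```python
from jinja2 import Template

template = Template(
    "-- {{ columns|length }} columns, first is {{ columns|first|upper }}\n"
    "SELECT {{ columns|join(', ') }}\n"
    "FROM {{ my_table }}\n"
    # row_limit is a hypothetical variable; default() supplies a
    # fallback value when it is not passed to render().
    "LIMIT {{ row_limit|default(100) }}"
)

rendered = template.render(
    columns=["col1", "col2", "col3"],
    my_table="schema.table",
)
print(rendered)
```

Because row_limit is not passed in, the default filter falls back to 100.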

Statements

Using the syntax {% ... %} allows for some very powerful logic within the template

Tests (aka Conditions)


SELECT * 
FROM {{ my_table }}
{% if my_filter %}
WHERE {{ my_filter }}
{% endif %}
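A rough sketch of the same condition rendered twice, once with my_filter and once without. The filter text 'col1 > 10' is a made-up example value, and the if block is written on one line here to keep the whitespace simple.

```python
from jinja2 import Template

template = Template(
    "SELECT *\n"
    "FROM {{ my_table }}\n"
    "{% if my_filter %}WHERE {{ my_filter }}{% endif %}"
)

# With a filter, the WHERE clause is rendered.
with_filter = template.render(my_table="schema.table", my_filter="col1 > 10")

# Without one, my_filter is undefined, which is falsy, so the block is skipped.
without_filter = template.render(my_table="schema.table")

print(with_filter)
print(without_filter)
```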

Loops

If columns = ['col1', 'col2', 'col3'] then we can do more complicated formatting e.g.


SELECT
{% for col in columns %}
{{ col }}{% if not loop.last %},{% endif %}
{% endfor %}
FROM {{ my_table }}

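would be (ignoring the blank lines that the block tags leave behind with Jinja's default whitespace settings):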

SELECT
col1,
col2,
col3
FROM schema.table

This format can be much more readable for large lists than the |join(', ') filter

Particularly useful is the {% if not loop.last %},{% endif %} construct, which relies on the loop.last variable built in to Jinja. Other loop helpers are available too.
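Here is a sketch of the loop in plain Python, with two additions that are not in the template above. First, the block tags use Jinja's whitespace control ({%- ... %}), because with default settings each {% for %} and {% endfor %} line leaves a blank line in the output. Second, a variant uses loop.index, another of the loop helpers, to build hypothetical column aliases.

```python
from jinja2 import Template

columns = ["col1", "col2", "col3"]

# The "-" in "{%-" strips the whitespace (including the newline) before
# the tag, so the block tags do not leave blank lines in the output.
multiline = Template(
    "SELECT\n"
    "{%- for col in columns %}\n"
    "{{ col }}{% if not loop.last %},{% endif %}\n"
    "{%- endfor %}\n"
    "FROM {{ my_table }}"
)
multiline_sql = multiline.render(columns=columns, my_table="schema.table")
print(multiline_sql)

# loop.index counts iterations starting from 1; the c1, c2, ... aliases
# are made up for this example.
aliased = Template(
    "SELECT "
    "{% for col in columns %}"
    "{{ col }} AS c{{ loop.index }}{% if not loop.last %}, {% endif %}"
    "{% endfor %}"
)
aliased_sql = aliased.render(columns=columns)
print(aliased_sql)
```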

Single (variable) Object Pattern

Consider the template


SELECT {{ columns|join(', ') }}
FROM {{ schema }}.{{ name }}

with

my_template.render(
    columns=columns,
    schema=schema,
    name=name,
)

vs this template


SELECT {{ table.columns|join(', ') }}
FROM {{ table.schema }}.{{ table.name }}

with

my_template.render(table=table)
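The post does not say what kind of object table is. One possible shape, purely as an assumption, is a small dataclass; Jinja resolves table.schema and friends through ordinary attribute access.

```python
from dataclasses import dataclass
from typing import List

from jinja2 import Template


@dataclass
class Table:
    """Hypothetical "model" object holding everything the template needs."""

    schema: str
    name: str
    columns: List[str]


my_template = Template(
    "SELECT {{ table.columns|join(', ') }}\n"
    "FROM {{ table.schema }}.{{ table.name }}"
)

table = Table(schema="schema", name="table", columns=["col1", "col2", "col3"])
rendered = my_template.render(table=table)
print(rendered)
```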

Pros

  • Adding a new variable requires no change to the rendering call, just adding it to the “model” object if it doesn’t already exist, e.g.

SELECT {{ table.columns|join(', ') }}
FROM {{ table.schema }}.{{ table.name }}
LIMIT {{ table.limit }}

  • If the template uses variables from multiple “things” it’s clearer which are which without a huge number of variables e.g.

SELECT *
FROM {{ table.schema }}.{{ table.name }} AS t1
LEFT JOIN {{ table_2.schema }}.{{ table_2.name }} AS t2
ON t1.{{ table.id_col }} = t2.{{ table_2.f_key_col }}
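As a sketch of the join template being rendered, the example below uses plain dicts for the two objects, since Jinja falls back to item lookup when attribute lookup fails. The table names and key columns (customers, orders, id, customer_id) are invented for illustration.

```python
from jinja2 import Template

join_template = Template(
    "SELECT *\n"
    "FROM {{ table.schema }}.{{ table.name }} AS t1\n"
    "LEFT JOIN {{ table_2.schema }}.{{ table_2.name }} AS t2\n"
    "ON t1.{{ table.id_col }} = t2.{{ table_2.f_key_col }}"
)

# Hypothetical tables; dicts work here as well as objects with attributes.
table = {"schema": "sales", "name": "customers", "id_col": "id"}
table_2 = {"schema": "sales", "name": "orders", "f_key_col": "customer_id"}

rendered = join_template.render(table=table, table_2=table_2)
print(rendered)
```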

Cons

  • ???

Conclusion

Combining these simple aspects together can lead to some very elegant templates

These were just a few of the things I found super useful when creating templates, mostly for ELT workflows with SQL in a data warehouse.

See the jinja Template Documentation for a more thorough overview