In December, I did my first Advent of Code. I used Ansible, which is a terrible choice for programming challenges but that’s part of the fun!

Note

I gave myself these rules:

  • No LLMs or search engines.
  • No external commands (eg, ansible.builtin.shell etc).
  • No custom plugins (eg, my_filter.py).
  • No extra dependencies (eg, json_query).
  • No Jinja blocks (eg, {% … %}).

People have used Ansible for Advent of Code before, but all of them used Jinja blocks. Jinja block loops are hundreds of times faster than Ansible loops, but I restricted myself to only Ansible loops for my solutions.
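For reference, a Jinja block loop looks roughly like this (a minimal sketch; summing a range just stands in for real puzzle logic). The whole loop runs inside a single template evaluation, which is why it is so much faster:

# Banned under my rules: a {% ... %} block loop inside one template.
- set_fact:
    total: >-
      {%- set ns = namespace(sum=0) -%}
      {%- for x in range(50000) -%}
        {%- set ns.sum = ns.sum + x -%}
      {%- endfor -%}
      {{ ns.sum }}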

Ansible loops aren’t designed for iterating 10,000 times on a single host. So being forced to abuse them led me to discover two curious things:

  1. Looping over include_tasks has a slow-down that scales quadratically with the number of iterations.

  2. Appending to lists or dicts in a loop has O(N^2 * M) complexity for both time and memory (N=iterations, M=data_size). This easily causes out-of-memory problems.

Explanations below!

1. Loops

Even looping just a single task is already slow. This takes 57 seconds:

# Use ANSIBLE_STDOUT_CALLBACK=selective
# to avoid testing terminal print speed.
- hosts: localhost
  tasks:
    - name: 'set var to true 50k times'
      set_fact:
        foo: true
      loop: '{{ range(50000) | list }}'

# Executed in 57s.

To loop over several tasks, you’d use include_tasks to include another file:

- hosts: localhost
  tasks:
    # Looping over an `include_tasks`.
    - include_tasks:
        file: other_tasks.yml
      loop: '{{ range(50000) | list }}'

If other_tasks.yml has the same single task as before, it runs 14x slower:

# other_tasks.yml:
- set_fact:
    foo: true

# Executed in 14m13s.

There’s also a huge initialization overhead when looping include_tasks. In this case, Ansible spends 2 minutes just preparing to start the loop!

Why?

Every loop iteration seems to copy and re-evaluate nearly everything (eg, config, vars, templates etc).

It’s even worse when looping over an include_tasks. Every iteration is treated as a separate include (ie, each with its own copies of everything), and even just starting the loop can take minutes.

The slow-down when looping include_tasks scales quadratically with the number of iterations:

  • include_tasks gets deduplicated by searching a list for matches.
  • When looping include_tasks, every iteration is unique (due to loop vars) so can’t be deduped:
    • iteration 1: search list (0 items), no match so append
    • iteration 2: search list (1 item), no match so append
    • iteration N: search list (N-1 items), no match so append
    • total comparisons: 0 + 1 + ... + (N-1) = N(N-1)/2 = O(N^2)
    • N=50k iterations means 1.25 billion failed comparisons!

Mitigations

You can sometimes avoid looping over a single task by chaining filters. For example:

# Fast
- set_fact:
    result: >-
      {{
        items
        | map('int')
        | select('gt', 0)
      }}

# Slow
- set_fact:
    result: >-
      {{
        result | default([]) + [item | int]
      }}
  loop: '{{ items }}'
  when: item | int > 0

When you need to loop over more than one task, it’s natural to reach for include_tasks. But in some cases you can avoid this by using:

2. Appending to lists/dicts

Let’s say we want to append data into a list in a loop. For example, looping over every combination of inputs from an Advent of Code puzzle (which can sometimes mean 100k+ iterations) and storing the results of a calculation:

- hosts: localhost
  tasks:
    - name: 'list of 100 integers'
      set_fact:
        my_list: '{{ range(100) | list }}'
        result: []

    - name: 'append 100 times'
      set_fact:
        result: '{{ result + [my_list] }}'
      loop: '{{ range(100) | list }}'

What happens if we increase the number of iterations?

Iterations   Runtime            Peak RAM
100          1 second           10 MiB
1,000        70 seconds         1 GiB
10,000       2 hours (killed)   >64 GiB

Appending to create a list of only 1M integers (100 * 10000) exhausted all the RAM on my desktop! The same thing applies when appending to a dict in a loop with combine().
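For example, a dict version of the same anti-pattern might look like this (a minimal sketch; the 'key_' prefix and the values are purely illustrative):

# Same problem: the whole dict is rebuilt from scratch every iteration.
- set_fact:
    result: "{{ result | default({}) | combine({'key_' ~ item: item}) }}"
  loop: '{{ range(10000) | list }}'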

Why?

Complexity is O(N^2 × M) (where N=iterations, M=data_size) for both time and memory!

Note

In Python, you can use .append() or .extend() instead, but not in Ansible. Using + or combine() in a loop is O(N^2), which is already bad, but the reality below is even worse.

This happens for two reasons:

  1. Ansible’s finalization (ie, template result processing) recursively recreates nested lists/dicts.
    • Each inner my_list is recreated as a new list object.
    • This happens for all nested lists on every iteration, not just the new one!
  2. Ansible stores every iteration’s result in a list until the loop finishes.
    • iteration 1: Ansible stores result, which has one copy of my_list
    • iteration 2: Ansible stores another result, which has two copies
      • that’s now 1 + 2 = 3 copies of my_list
    • after iteration N, Ansible’s internal store now has 1 + 2 + ... + N = N(N+1)/2 copies of my_list

Relevant code from upstream Ansible.

From Ansible’s _jinja_bits.py:

# L1024, recursively recreate nested lists:
def _finalize_list(o, mode):
    for v in o:
        if v is not Omit:
            yield _finalize_template_result(v, mode)

From Ansible’s task_executor.py:

def _run_loop(self, items):
    results = []
    for item_index, item in enumerate(items):
        # L310, result includes ansible_facts:
        res = self._execute(variables=task_vars)
        # L366, append every iteration's result:
        results.append(res)
    return results

Another way to think about it is that the first factor prevents data sharing and the second factor prevents garbage collection.

The combined effect is catastrophic. After N iterations there are 1 + 2 + ... + N references to my_list, and each of those is a separate copy (rather than a shared reference):

  • N=100 iterations = 5k copies of my_list in memory
  • N=1,000 iterations = 500k copies
  • N=10,000 iterations = 50 million copies!

Each iteration recreates the entire result so far as a new list object. So not only does it take increasingly longer, but it also uses up increasingly more RAM until you question your life choices.

Mitigations

Don’t use Ansible for Advent of Code!

But if you must, a workaround is to append in batches to smaller lists and combine them at the end with a single expression:

# Combine batch_0, batch_1 ... batch_N
- set_fact:
    result: >-
      {{
        range(num_batches)
        | map('string')
        | map('regex_replace', '^', 'batch_')
        | map('extract', vars)
        | sum(start=[])
      }}
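
The batch-building half isn’t shown above. One way to do it (a sketch only, assuming batch_size and num_batches are integers defined elsewhere, and relying on set_fact accepting a templated fact name, as it has historically done) is to direct each append into a small per-batch list:

# Sketch: each iteration appends to batch_<item // batch_size>,
# so no single list ever grows beyond batch_size items.
- set_fact:
    'batch_{{ item // batch_size }}': >-
      {{
        vars['batch_' ~ (item // batch_size)] | default([])
        + [item]
      }}
  loop: '{{ range(num_batches * batch_size) | list }}'

Each per-batch list stays at most batch_size items long, so the quadratic rebuild and Ansible’s per-iteration result store only ever handle small lists; the single sum(start=[]) at the end pays the concatenation cost once.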