Check for duplicate items in a list with Ansible using a custom filter

Ansible provides some useful ways to manipulate data and filter items from sets or lists. None of the built-in filters, however, did what I thought would be a simple task: show the duplicate items in a list.

Let's assume that I mistyped and entered a duplicate IP address into a list - like so:

- {mac: 3E:79:66:00:87:F7, ip: 192.168.1.1, hostname: dns}
- {mac: F2:8F:35:40:8A:9C, ip: 192.168.1.1, hostname: irc}
- {mac: 92:A9:C6:04:AE:CA, ip: 192.168.1.3, hostname: dev}

I need a way not only to check each field for duplicates but also, for convenience, to print the offending value for the user. Imagine this list is 100+ items long!

You'd think one of the set-theory filters like unique or difference would work. But unique is designed to take a list with duplicate items and simply discard the duplicates, so it never tells you what they were. difference compares two lists, and since both sides of the comparison contain the same values, it reports no difference.
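To make that concrete, here is a rough plain-Python approximation of what the two filters do with the sample IPs. The helper names unique_like and difference_like are made up for this sketch and only imitate the filters' behaviour; they are not Ansible's implementation.

```python
# Plain-Python stand-ins for Ansible's unique and difference filters.
# Hypothetical helpers for illustration only, not Ansible's own code.

def unique_like(items):
    """Drop repeats, keeping the first occurrence of each item."""
    return list(dict.fromkeys(items))


def difference_like(a, b):
    """Return the distinct items of a that do not appear in b."""
    return [x for x in unique_like(a) if x not in b]


ips = ["192.168.1.1", "192.168.1.1", "192.168.1.3"]

# unique hides the problem: the repeated address is silently collapsed.
print(unique_like(ips))                        # ['192.168.1.1', '192.168.1.3']

# Comparing the list with its de-duplicated self finds nothing, because
# every address in ips also appears in the unique list.
print(difference_like(ips, unique_like(ips)))  # []
```

Neither result says which value was repeated, which is the one thing needed here.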

There's a full example of my entire solution in my infra repo.

Custom Filter

The answer turned out to be straightforward: create a custom filter (source). Create a folder named filter_plugins in the root of your Ansible project and put the following code into a file named something like duplicate_filter.py.

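#!/usr/bin/python

class FilterModule(object):
    def filters(self):
        # Map the filter name used in templates to its implementation.
        return {'duplicates': self.duplicates}

    def duplicates(self, items):
        """Return each item that occurs more than once in items, listed once."""
        sums = {}    # item -> number of times seen so far
        result = []  # duplicated items, in order of their second occurrence

        for item in items:
            if item not in sums:
                sums[item] = 1
            else:
                # Record the item only on its second occurrence so it is
                # reported once, however many times it repeats.
                if sums[item] == 1:
                    result.append(item)
                sums[item] += 1
        return result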
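Because a filter plugin is ordinary Python, it can be sanity-checked without running a playbook. The sketch below repeats the class so that it runs on its own, then feeds it the lists from the example. Instantiating FilterModule by hand like this is only a testing convenience assumed for the sketch, not something Ansible requires.

```python
class FilterModule(object):
    def filters(self):
        return {'duplicates': self.duplicates}

    def duplicates(self, items):
        sums = {}
        result = []
        for item in items:
            if item not in sums:
                sums[item] = 1
            else:
                if sums[item] == 1:
                    result.append(item)
                sums[item] += 1
        return result


# Look the filter up by name via filters(), roughly as Ansible does
# when it registers the plugin.
duplicates = FilterModule().filters()['duplicates']

ips = ['192.168.1.1', '192.168.1.1', '192.168.1.3']
print(duplicates(ips))                    # ['192.168.1.1']

# An item repeated three times is still reported only once.
print(duplicates(['a', 'b', 'a', 'a']))   # ['a']

# No repeats means an empty list.
print(duplicates(['dns', 'irc', 'dev']))  # []
```

Note that items are used as dictionary keys, so this only works for hashable values such as strings; a list of dicts would need to be reduced to a single field first.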

Then in your Ansible code you can call this new custom filter named duplicates thus:

- name: dupe check
  debug:
    msg: "Duplicate entry: {{ item | duplicates }}"
  loop:
    - "{{ dhcp_reservations | selectattr('mac', 'defined') | map(attribute='mac') }}"
    - "{{ dhcp_reservations | selectattr('ip', 'defined') | map(attribute='ip') }}"
    - "{{ dhcp_reservations | selectattr('hostname', 'defined') | map(attribute='hostname') }}"

This produces the following output:

TASK [ktz-dhcp-dns : dupe check] ****************************************************************************************************************************************************************************************
ok: [10.42.0.201] => (item=['3E:79:66:00:87:F7', 'F2:8F:35:40:8A:9C', '92:A9:C6:04:AE:CA']) => {
    "msg": "Duplicate entry: []"
}
ok: [10.42.0.201] => (item=['192.168.1.1', '192.168.1.1', '192.168.1.3']) => {
    "msg": "Duplicate entry: ['192.168.1.1']"
}
ok: [10.42.0.201] => (item=['dns', 'irc', 'dev']) => {
    "msg": "Duplicate entry: []"
}

Thus we can see easily that the duplicated field here was the IP address 192.168.1.1.
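To see what each loop iteration hands to the filter, the three pipelines in the task can be mimicked in plain Python. This is only a sketch: field_values is a hypothetical helper standing in for selectattr(..., 'defined') | map(attribute=...), and duplicates here is a compact stand-in for the plugin, written to give the same results.

```python
# Sample data from the top of the post.
reservations = [
    {"mac": "3E:79:66:00:87:F7", "ip": "192.168.1.1", "hostname": "dns"},
    {"mac": "F2:8F:35:40:8A:9C", "ip": "192.168.1.1", "hostname": "irc"},
    {"mac": "92:A9:C6:04:AE:CA", "ip": "192.168.1.3", "hostname": "dev"},
]


def field_values(entries, key):
    """Hypothetical helper: values of key from the entries that define it."""
    return [entry[key] for entry in entries if key in entry]


def duplicates(items):
    """Compact stand-in for the plugin: each repeated item, listed once."""
    seen, dupes = set(), []
    for item in items:
        if item in seen and item not in dupes:
            dupes.append(item)
        seen.add(item)
    return dupes


# One check per field, matching the three entries under loop.
report = {
    key: duplicates(field_values(reservations, key))
    for key in ("mac", "ip", "hostname")
}

for key, dupes in report.items():
    print(f"Duplicate {key} entries: {dupes}")
```

Only the ip check comes back non-empty, which matches the playbook output above.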

I expected writing a custom filter to be difficult and cumbersome, but it was very simple and, in the end, much faster than trying to turn YAML into a programming language!