Doit: A make tool implemented in Python

Posted on November 14, 2022 in computer-science

make is a very useful tool to automate the creation of files depending on other files. It takes into account the modification times of files to perform only the absolutely necessary actions. make was invented to assist compilation, and is considered a developer's tool. As such, it is not available on every computer. I always thought that it would be nice to have a pure Python tool emulating the main functionality of make. I was quite happy to discover that such a Python tool actually exists. It is named doit.

Here is a Makefile that builds the table of word frequencies from Alice in Wonderland.

all: alice_frequency_table.csv

alice.txt:
    curl https://www.gutenberg.org/files/11/11-0.txt --output alice.txt

alice_nopunct.txt: alice.txt
    sed 's/[[:punct:]]/ /g' $^  |  tr -s "[:blank:]" > $@

alice_tokenized.txt: alice_nopunct.txt
    sed 's/[[:blank:]]/\n/g' $^ > $@

alice_lowercase.txt: alice_tokenized.txt
    gawk '$$0{print tolower($$0)}' $^ > $@

alice_frequency_table.csv: alice_lowercase.txt
    sort $^ | uniq -c | sort -nr > $@

Now, here is the dodo.py file that doit will parse to behave like make with the Makefile above:

DOIT_CONFIG = {'action_string_formatting': 'both'}


def task_gettext():
    """Download the text of _Alice in Wonderland_"""
    return {
        "targets": ["alice.txt"],
        "uptodate": [True],
        "actions": ["curl https://www.gutenberg.org/files/11/11-0.txt \
                     --output {targets}"],
        }


def task_removepunct():
    """remove punctuation"""
    return {
        "targets": ["alice_nopunct.txt"],
        "file_dep": ["alice.txt"],
        "actions": ["sed 's/[[:punct:]]/ /g' {dependencies}  | \
                         tr -s '[:blank:]' > {targets}"]
        }


def task_tokenize():
    """replace whitespaces by newlines"""
    return {
        "targets": ["alice_tokenized.txt"],
        "file_dep": ["alice_nopunct.txt"],
        "actions": ["sed 's/[[:blank:]]/\\n/g' {dependencies} > {targets}"]
        }


def task_tolowercase():
    """convert text to lower case"""
    return {
        "targets": ["alice_lowercase.txt"],
        "file_dep": ["alice_tokenized.txt"],
        "actions": ["gawk '$0{{print tolower($0)}}' \
                             {dependencies} > {targets}"]
        }


def task_computefreqs():
    """tabulate the token frequencies"""
    return {
        "targets": ["alice_frequency_table.csv"],
        "file_dep": ["alice_lowercase.txt"],
        "actions": ["sort {dependencies} | uniq -c | sort -nr > {targets}"],
        }

Once you have installed doit with pip install doit, you can run it with doit. It searches for dodo.py in the current working directory and runs the necessary actions (and only those) to create the missing targets. It uses MD5 checksums saved in an internal database to detect which files have been modified, and therefore which actions must be executed to update the targets.
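
To make the checksum idea concrete, here is a minimal sketch of a content-based freshness test. It is not doit's actual implementation: the names md5_of and needs_rebuild, and the saved_digests dictionary standing in for the internal database, are hypothetical and chosen only for illustration.

```python
import hashlib


def md5_of(path):
    """Return the hex MD5 digest of a file's contents."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_rebuild(dependencies, saved_digests):
    """Return True if any dependency differs from its recorded digest.

    saved_digests is a hypothetical dict {path: md5 hex string} that plays
    the role of the database of checksums saved after the previous run.
    """
    return any(saved_digests.get(path) != md5_of(path) for path in dependencies)
```

Unlike a comparison of modification times, a test of this kind does not trigger a rebuild when a file is merely touched without its content changing.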

The dodo.py file is, no doubt, more verbose than the original Makefile.

In the above example, all actions are commands executed inside a shell, but doit's actions can also execute Python functions. This gives you the full power of Python when you write the “Makefiles”.
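
As an illustration, the lower-casing step could be written with a Python function instead of gawk. This is a sketch, not part of the original dodo.py: the helper name lowercase_tokens is hypothetical, and it relies on doit passing the task's file_dep and targets to a Python action whose parameters are named dependencies and targets.

```python
def lowercase_tokens(dependencies, targets):
    """Write a lower-cased copy of the first dependency, skipping blank lines."""
    with open(dependencies[0], encoding="utf-8") as src, \
         open(targets[0], "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # drop empty (or whitespace-only) lines
                dst.write(line.lower())


def task_tolowercase():
    """convert text to lower case"""
    return {
        "targets": ["alice_lowercase.txt"],
        "file_dep": ["alice_tokenized.txt"],
        # A callable instead of a shell command string.
        "actions": [lowercase_tokens],
    }
```

The behavior is only roughly equivalent to the gawk command (for instance, whitespace-only lines are dropped here), but the task no longer depends on gawk being installed.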

If you want to know more, check out https://pydoit.org/