python package management with pdm

Tuesday, 01 Oct 2024
#software

By most measures python is the most popular programming language of 2024: it leads indices like TIOBE and PYPL convincingly. While modern python stands out as a highly versatile language, its origins lie in scripting and automation (think bash or perl). Although it quickly outgrew that role, one area where its scripting roots remain evident is the utter hell that is python dependency management. In this article I will share how I use a tool called pdm to make python libraries easier to maintain, share, and replicate.

I will assume that you are on an apt-based system (e.g., Debian or Ubuntu), and have access to the internet and a terminal emulator like kitty or konsole. I will also assume that you are using bash, as it is the default shell on most such systems.

Installing pdm

To install pdm you will need some prerequisites, which can be installed with:

$ sudo apt install curl python3-venv

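A further package, python3-dev, is not strictly necessary (and is not installed by the command above), but it provides a number of quality-of-life improvements when using pdm version 2.13 or higher; if you want it, add it to the same apt install line. Once all the prerequisites are available, installing pdm is quite simple.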

$ curl -sSL https://pdm-project.org/install-pdm.py | python3 -
$ printf '\n#pdm\nexport PATH=$HOME/.local/bin:$PATH\n' | tee -a ~/.bashrc > /dev/null

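This should install pdm and add ~/.local/bin to your PATH. The PATH change only takes effect in new shells, so open a new terminal or run . ~/.bashrc before continuing.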
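One wrinkle with appending to ~/.bashrc via printf is that running the line twice adds the export twice. Here is a minimal sketch of an idempotent alternative. Note that append_once is a hypothetical helper of my own (not part of pdm), and the demo writes to a throwaway file rather than your real ~/.bashrc.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
# (hypothetical helper, for illustration only)
append_once() {
    local file=$1 line=$2
    touch "$file"
    # -q quiet, -x match the whole line, -F treat LINE as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo against a temporary file instead of the real ~/.bashrc.
demo_rc=$(mktemp)
append_once "$demo_rc" 'export PATH=$HOME/.local/bin:$PATH'
append_once "$demo_rc" 'export PATH=$HOME/.local/bin:$PATH'   # no-op the second time
cat "$demo_rc"

# Report whether pdm is reachable from the current shell.
if command -v pdm > /dev/null 2>&1; then
    echo "pdm found at $(command -v pdm)"
else
    echo "pdm is not on PATH in this shell"
fi
```

To use it for real, you would pass ~/.bashrc as the first argument.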

Creating a project

From here on, I will assume that we want to create and work on a project called pdm-project. Furthermore, we will assume that this project requires python version 3.9.2, and depends on numpy and matplotlib. First, let us ensure that the required version of python is installed and available. In pdm >= 2.13 this can be done by:

$ pdm python install 3.9.2
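Since the command above depends on pdm >= 2.13, it can be worth checking the installed version first. The following is a rough sketch: version_ge is a hypothetical helper, it relies on GNU sort -V (available on apt-based systems), and it assumes that the last word printed by pdm --version is the version number.

```shell
# version_ge A B: succeed if version A is at least version B.
# Sorts the two versions and checks that B comes first (or is equal).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v pdm > /dev/null 2>&1; then
    # Assumption: the version is the last whitespace-separated field.
    installed=$(pdm --version | awk '{print $NF}')
    if version_ge "$installed" 2.13; then
        echo "pdm $installed supports 'pdm python install'"
    else
        echo "pdm $installed is older than 2.13; consider upgrading"
    fi
else
    echo "pdm is not installed or not on PATH"
fi
```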

We are now ready to create our project. I will assume the working directory is your home directory. Creating a new project is as easy as:

$ mkdir pdm-project && cd ./pdm-project
$ pdm init

pdm will now ask you a number of questions and then populate the project folder with some scaffolding (including the file pyproject.toml). The project is ready for development, and the correct python interpreter has been installed. Now try the command which python in the project folder. This should give you the path of the python binary accessible from this folder, but you will probably get no output. This is because, while the correct version of python is installed, the virtual environment created for the project has not been activated. You can activate it by hand with eval $(pdm venv activate), but you would have to repeat that in every new shell.
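To make the idea of an "activated" environment a little more concrete, here is a small sketch. It relies on the fact that venv activation scripts export the VIRTUAL_ENV variable; venv_status is a hypothetical helper name. The activation step only runs if pdm is available and the current folder contains a pyproject.toml.

```shell
# venv_status: report which virtual environment, if any, is active in this shell.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "no virtual environment active"
    fi
}

venv_status

# Manual activation, meant to be run from inside the project folder.
# 'pdm venv activate' prints an activation command, which eval then runs.
if command -v pdm > /dev/null 2>&1 && [ -f pyproject.toml ]; then
    eval "$(pdm venv activate)"
    venv_status
fi
```

Once an environment is active, the deactivate command that the activation script defines switches it off again.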

Making life easier with direnv

Instead of activating by hand, I’d like to introduce a tool called direnv. It changes the shell environment based on your current working directory. By creating and populating a file named .envrc in a directory, you can tell direnv to export variables and run scripts when you enter or leave the directory. In our case, the basic idea is to create a .envrc file which will activate our virtual environment for us any time we are in the project directory.

Assuming you are on a bash shell, you can install direnv by:

$ sudo apt install direnv
$ printf '\n#direnv\neval "$(direnv hook bash)"\n' | tee -a ~/.bashrc > /dev/null
$ . ~/.bashrc

You can now make a .envrc file and autoload the python interpreter by:

$ cd ~/pdm-project
$ printf 'eval $(pdm venv activate)' | tee -a ./.envrc > /dev/null && direnv allow

Running which python should now give you the path to your interpreter (probably something like ~/pdm-project/.venv/bin/python). You can confirm the version of python with python --version (this should report Python 3.9.2).
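The one-line .envrc works, but it will complain whenever pdm is missing or the environment has not been created yet. A slightly more defensive variant might look like the sketch below. It uses only plain bash plus the same pdm venv activate call as above; the variable name activate_cmd is my own choice.

```shell
# .envrc (sketch): activate the project's virtual environment only when possible.
if command -v pdm > /dev/null 2>&1; then
    # Capture the activation command first, and run it only if pdm succeeded.
    activate_cmd=$(pdm venv activate 2> /dev/null) && eval "$activate_cmd"
else
    echo "pdm not found; skipping virtual environment activation" >&2
fi
```

Keep in mind that direnv asks you to run direnv allow again every time the contents of .envrc change.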

Tracking dependencies

Let us recap. We have installed pdm. We created a new project which requires a specific version of python (which we also installed). And we automated the process of firing up the correct python interpreter whenever we enter the project directory. Since installing the tools (pdm and direnv) only needs to be done once on your system, setting up a new project takes about a minute (pdm init and creating the .envrc).

Great. You have set up a new project. Next, let us install the project dependencies. We can do this by:

$ pdm add numpy matplotlib

Now you can use numpy and matplotlib in your project! But how does this work? Take a look at the file pyproject.toml in your project folder. The packages should be listed there with version constraints, while the exact versions that were resolved are recorded in the lock file pdm.lock. Congratulations, you have now started tracking dependencies in your project! Any new package you install with pdm add will be resolved and then added to pyproject.toml. You can remove a dependency X with pdm remove X.
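If you would rather not open pyproject.toml in an editor, a few lines of awk can print just the dependencies entry. This is a rough sketch rather than a real TOML parser: show_deps is a hypothetical helper, and the sample file below uses made-up, illustrative version numbers instead of real pdm output.

```shell
# show_deps [FILE]: print the dependencies array from a pyproject.toml.
# Starts printing at the 'dependencies = [' line and stops at the first
# line that ends with ']'. A sketch only; it does not parse TOML properly.
show_deps() {
    awk '
        /^dependencies *= *\[/  { printing = 1 }
        printing                { print }
        printing && /\] *$/     { printing = 0 }
    ' "${1:-pyproject.toml}"
}

# Illustrative sample file (hypothetical contents and version numbers).
sample=$(mktemp)
cat > "$sample" <<'EOF'
[project]
name = "pdm-project"
dependencies = [
    "numpy>=2.0.2",
    "matplotlib>=3.9.4",
]
requires-python = "==3.9.*"
EOF

show_deps "$sample"
```

Inside a real project, calling show_deps without arguments reads ./pyproject.toml, and pdm list shows what is actually installed in the environment.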

Sharing and replicating software

Now suppose you want to properly version and possibly share your code. The ideal tool for this is, of course, git. If not available already, it can be installed with:

$ sudo apt install git

To initialise a git repository at the root of our project and make the first commit, we simply do:

$ git init
$ printf '\n\n.envrc' | tee -a .gitignore > /dev/null
$ git add -A && git commit -m "First commit"
$ git status

The final command, git status, should report a clean working tree, confirming that everything has been committed. Note that we have excluded the .envrc file from version control by adding it to .gitignore. It is often preferable to gitignore any files (such as .envrc) you do not want to track or share. If this were a real project, at this point one would push it to a forge such as gitlab or bitbucket.
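If you want to double-check that a file really is excluded, git check-ignore can tell you. The sketch below wraps it in a hypothetical helper called is_ignored and tries it out in a throwaway repository, so nothing in your real project is touched.

```shell
# is_ignored PATH: succeed if git would ignore PATH in the current repository.
is_ignored() {
    git check-ignore -q -- "$1"
}

# Demo in a temporary repository (skipped if git is not installed).
if command -v git > /dev/null 2>&1; then
    demo_repo=$(mktemp -d)
    (
        cd "$demo_repo" || exit 1
        git init -q
        printf '.envrc\n' >> .gitignore
        touch .envrc pyproject.toml
        is_ignored .envrc && echo ".envrc is ignored"
        is_ignored pyproject.toml || echo "pyproject.toml will be tracked"
    )
fi
```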

Now suppose your enthusiastic collaborator has just read all the things written above, packaged their code into a neat little repository, and is inviting you to use it. This is where systems like pdm shine. We are going to simulate the process of installing someone else’s code by cloning our recently-created project.

$ git clone ~/pdm-project ~/pdm-project-2 && cd ~/pdm-project-2
$ pdm install

And you’re done! pdm will take care of installing the correct version of python, pulling in all the dependencies, and installing compatible versions of the software. If you then want the virtual environment to autoload, simply create a .envrc file and use direnv:

$ printf 'eval $(pdm venv activate)' | tee -a ./.envrc > /dev/null && direnv allow
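A quick way to convince yourself that the replicated environment works is to try importing the dependencies. Below is a minimal sketch; check_imports is a hypothetical helper that takes a python executable followed by module names, and it assumes the project's virtual environment is active so that python points at the right interpreter.

```shell
# check_imports PYTHON MODULE...: succeed if PYTHON can import every MODULE.
check_imports() {
    local py=$1
    shift
    local modules
    modules=$(IFS=,; echo "$*")    # join the module names with commas
    "$py" -c "import $modules"
}

if check_imports python numpy matplotlib 2> /dev/null; then
    echo "numpy and matplotlib are importable"
else
    echo "numpy and matplotlib are not importable from this shell"
fi
```

If the environment is not active, pdm run python -c "import numpy, matplotlib" performs a similar check by running python inside the project's environment.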

Conclusion

We have seen how to track and properly distribute small libraries. Larger projects may need more work, but if you are reading this, chances are the process outlined above will be enough for your needs.
