Bundling additional resources with your Python package

Modern web applications have a lot of dependencies and often require a lot of steps to properly install on the remote host. For instance, the typical bootstrapping process for a new version of the application on a remote host consists of the following steps:

Create a new virtual environment for isolation.
Move the project code to the execution environment.
Install the latest project requirements (usually from the requirements.txt file).
Synchronize or migrate the database schema.
Collect static files from project sources and external packages to the desired location.
Compile localization files for applications available in different languages.
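
The steps above can be sketched as a plain list of commands. This is only an illustration: the bootstrap_commands() helper, the example paths, and the use of rsync for moving the code are assumptions made here, not a recommended deployment tool. The manage.py subcommands are the standard Django ones.

```python
from pathlib import Path


def bootstrap_commands(venv_dir, project_dir, source_dir):
    """Return the bootstrapping steps as argv lists (hypothetical helper)."""
    venv = Path(venv_dir)
    project = Path(project_dir)
    python = str(venv / "bin" / "python")
    manage = str(project / "manage.py")
    return [
        # 1. create a new virtual environment for isolation
        ["python3", "-m", "venv", str(venv)],
        # 2. move the project code to the execution environment
        #    (rsync is an assumption; any transfer method would do)
        ["rsync", "-a", f"{source_dir}/", str(project)],
        # 3. install the latest project requirements
        [python, "-m", "pip", "install", "-r", str(project / "requirements.txt")],
        # 4. synchronize or migrate the database schema
        [python, manage, "migrate", "--noinput"],
        # 5. collect static files to the desired location
        [python, manage, "collectstatic", "--noinput"],
        # 6. compile localization files
        [python, manage, "compilemessages"],
    ]


if __name__ == "__main__":
    # dry run: only print the commands, do not execute them
    for argv in bootstrap_commands("/srv/venv", "/srv/app", "./src"):
        print(" ".join(argv))
```

Every one of these commands has to succeed on every host, which is why the rest of this section tries to move as many of them as possible to build time.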

For more complex sites, there might be a lot of additional tasks, mostly related to frontend code, that are independent of the previously defined tasks, as in the following example:

Generate CSS files using preprocessors such as SASS or LESS.
Perform minification, obfuscation, and/or concatenation of static files (JavaScript and CSS files).
Compile code written in JavaScript superset languages (CoffeeScript, TypeScript, and so on) to native JS.
Preprocess response template files (minification, style inlining, and so on).

Nowadays, for this kind of application, which requires a lot of additional assets to be prepared, most developers would probably use Docker images. Dockerfiles allow you to easily define all of the steps that are necessary to bundle all assets with your application image. But if you don't use Docker, all of these steps must be automated using other tools such as Make, Bash, Fabric, or Ansible. Still, it is not a good idea to do all of these steps directly on the remote hosts where the application is being installed. Here are the reasons:

Some of the popular tools for processing static assets can be either CPU or memory intensive. Running them in production environments can destabilize your application execution.
These tools very often will require additional system dependencies that may not be required for the normal operation of your projects. These are mostly additional runtime environments such as JVM, Node, or Ruby. This adds complexity to configuration management and increases the overall maintenance costs.
If you are deploying your application to multiple servers (tens, hundreds, or thousands), you are simply repeating a lot of work that could be done once. If you have your own infrastructure, then you may not experience a huge increase in costs, especially if you perform deployments in periods of low traffic. But if you run cloud computing services under a pricing model that charges you extra for spikes in load, or generally for execution time, then this additional cost may be substantial at a large enough scale.
Most of these steps just take a lot of time. You are installing your code on remote servers, so the last thing you want is to have your connection interrupted by some network issue. By keeping the deployment process quick, you are lowering the chance of deployment interruption.

Obviously, the results of these predeployment steps can't be included in your application code repository either. There are simply things that must be done with every release, and you can't change that. This is obviously a place for proper automation, but the key is to do it in the right place and at the right time.

Most of these things, such as static collection and code/asset preprocessing, can be done locally or in a dedicated environment, so the actual code that is deployed to the remote server requires only a minimal amount of on-site processing. The following are the most notable of such deployment steps, performed either while building a distribution or while installing a package:

Installation of Python dependencies and transferring of static assets (CSS files and JavaScript) to the desired location can be handled as a part of the install command of the setup.py script.
Preprocessing of code (processing JavaScript supersets, minification/obfuscation/concatenation of assets, and running SASS or LESS) and things such as localized text compilation (for example, compilemessages in Django) can be a part of the sdist/bdist command of the setup.py script.

Inclusion of preprocessed code other than Python can be easily handled with the proper MANIFEST.in file. Dependencies are, of course, best provided as an install_requires argument of the setup() function call from the setuptools package.
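
As a rough illustration of what such a file selects, a line like `recursive-include webxample *.mo *.css *.js *.html` in MANIFEST.in asks for every file under webxample whose name matches one of the patterns. The sketch below approximates that rule with the standard library so you can preview the result. The recursive_include() helper and the exact pattern list are assumptions made for this example; the real matching is done by setuptools, not by this code.

```python
import fnmatch
import os


def recursive_include(root, patterns):
    """Approximate MANIFEST.in's `recursive-include root pat1 pat2 ...`.

    Hypothetical helper: returns sorted POSIX-style paths (relative to
    `root`) of files whose base name matches any of the glob patterns.
    """
    matched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                full = os.path.join(dirpath, name)
                relative = os.path.relpath(full, root)
                matched.append(relative.replace(os.sep, "/"))
    return sorted(matched)


if __name__ == "__main__":
    # preview the non-Python files that would be shipped with the package
    for path in recursive_include("webxample", ["*.mo", "*.css", "*.js", "*.html"]):
        print(path)
```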

Packaging the whole application, of course, will require some additional work from you, such as providing your own custom setuptools commands or overriding the existing ones, but it gives you a lot of advantages and makes project deployment a lot faster and more reliable.

Let's use a Django-based project (using Django 1.9) as an example. I have chosen this framework because it seems to be the most popular Python project of this type, so there is a high chance that you already know it a bit. A typical structure of files in such a project might look like the following:

$ tree . -I __pycache__ --dirsfirst
.
├── webxample
│   ├── conf
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   ├── locale
│   │   ├── de
│   │   │   └── LC_MESSAGES
│   │   │       └── django.po
│   │   ├── en
│   │   │   └── LC_MESSAGES
│   │   │       └── django.po
│   │   └── pl
│   │       └── LC_MESSAGES
│   │           └── django.po
│   ├── myapp
│   │   ├── migrations
│   │   │   └── __init__.py
│   │   ├── static
│   │   │   ├── js
│   │   │   │   └── myapp.js
│   │   │   └── sass
│   │   │       └── myapp.scss
│   │   ├── templates
│   │   │   ├── index.html
│   │   │   └── some_view.html
│   │   ├── __init__.py
│   │   ├── admin.py
│   │   ├── apps.py
│   │   ├── models.py
│   │   ├── tests.py
│   │   └── views.py
│   ├── __init__.py
│   └── manage.py
├── MANIFEST.in
├── README.md
└── setup.py
    
15 directories, 23 files

Note that this differs slightly from the usual Django project template. By default, the package that contains the WSGI application, the settings module, and the URL configuration has the same name as the project. Because we decided to take the packaging approach, this would be named webxample, just like the top-level package. This can cause some confusion, so it is better to rename it to conf. Without digging into the possible implementation details, let's just make the following few simple assumptions:

Our example application has some external dependencies. Here, it will be two popular Django packages: djangorestframework and django-allauth, plus one non-Django package: gunicorn.
djangorestframework and django-allauth are provided as INSTALLED_APPS in the webxample.conf.settings module.
The application is localized in three languages (German, English, and Polish) but we don't want to store the compiled gettext messages in the repository.
We are tired of vanilla CSS syntax, so we decided to use a more powerful SCSS language that we translate into CSS using SASS.

Knowing the structure of the project, we can write our setup.py script in a way that makes setuptools handle the following:

Compilation of SCSS files under webxample/myapp/static/sass
Compilation of gettext messages under webxample/locale from .po to .mo format
Installation of the requirements
A new script that provides an entry point to the package, so we will have a custom command instead of the manage.py script

We have a bit of luck here: the Python binding for libsass, a C/C++ port of the SASS engine, provides some integration with setuptools and distutils. With only a little configuration, it provides a custom setup.py command for running the SASS compilation. This is shown in the following code:

from setuptools import setup 
 
setup( 
    name='webxample', 
    setup_requires=['libsass == 0.6.0'], 
    sass_manifests={ 
        'webxample.myapp': ('static/sass', 'static/css') 
    }, 
)

So, instead of running the sass command manually or executing a subprocess in the setup.py script, we can type python setup.py build_sass and have our SCSS files compiled to CSS. This is still not enough. It makes our life a bit easier, but we want the whole distribution fully automated so there is only one step for creating new releases. To achieve this goal, we are forced to override some of the existing setuptools distribution commands.

The example setup.py file that handles some of the project preparation steps through packaging might look like this:

import os 
 
from setuptools import setup 
from setuptools import find_packages 
from distutils.cmd import Command 
from distutils.command.build import build as _build 
 
try: 
    from django.core.management.commands.compilemessages \ 
        import Command as CompileCommand 
except ImportError: 
    # note: during installation django may not be available 
    CompileCommand = None 
 
# this environment variable is required
os.environ.setdefault( 
    "DJANGO_SETTINGS_MODULE", "webxample.conf.settings" 
) 
 
class build_messages(Command): 
    """ Custom command for building gettext messages in Django 
    """ 
    description = """compile gettext messages""" 
    user_options = [] 
 
    def initialize_options(self): 
        pass 
 
    def finalize_options(self):
        pass
 
    def run(self): 
        if CompileCommand: 
            CompileCommand().handle( 
                verbosity=2, locales=[], exclude=[] 
            ) 
        else: 
            raise RuntimeError("could not build translations") 
 
class build(_build): 
    """ Overridden build command that adds additional build steps
    """ 
    sub_commands = [ 
        ('build_messages', None), 
        ('build_sass', None), 
    ] + _build.sub_commands 
 
setup( 
    name='webxample', 
    setup_requires=[ 
        'libsass == 0.6.0', 
        'django == 1.9.2', 
    ], 
    install_requires=[ 
        'django == 1.9.2', 
        'gunicorn == 19.4.5', 
        'djangorestframework == 3.3.2', 
        'django-allauth == 0.24.1', 
    ], 
    packages=find_packages('.'), 
    sass_manifests={ 
        'webxample.myapp': ('static/sass', 'static/css') 
    }, 
    cmdclass={ 
        'build_messages': build_messages, 
        'build': build, 
    }, 
    entry_points={
        'console_scripts': [
            'webxample = webxample.manage:main',
        ]
    }
)

With such an implementation, we can build all assets and create the source distribution of a package for the webxample project using the following single Terminal command:

$ python setup.py build sdist

If you already have your own package index (created with devpi), you can add the upload subcommand or use twine so this package will be available for installation with pip in your organization. If we look into the structure of the source distribution created with our setup.py script, we can see that it contains the compiled gettext messages and the CSS style sheets generated from SCSS files:

$ tar -xvzf dist/webxample-0.0.0.tar.gz 2> /dev/null
$ tree webxample-0.0.0/ -I __pycache__ --dirsfirst
webxample-0.0.0/
├── webxample
│   ├── conf
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   ├── locale
│   │   ├── de
│   │   │   └── LC_MESSAGES
│   │   │       ├── django.mo
│   │   │       └── django.po
│   │   ├── en
│   │   │   └── LC_MESSAGES
│   │   │       ├── django.mo
│   │   │       └── django.po
│   │   └── pl
│   │       └── LC_MESSAGES
│   │           ├── django.mo
│   │           └── django.po
│   ├── myapp
│   │   ├── migrations
│   │   │   └── __init__.py
│   │   ├── static
│   │   │   ├── css
│   │   │   │   └── myapp.scss.css
│   │   │   └── js
│   │   │       └── myapp.js
│   │   ├── templates
│   │   │   ├── index.html
│   │   │   └── some_view.html
│   │   ├── __init__.py
│   │   ├── admin.py
│   │   ├── apps.py
│   │   ├── models.py
│   │   ├── tests.py
│   │   └── views.py
│   ├── __init__.py
│   └── manage.py
├── webxample.egg-info
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   ├── dependency_links.txt
│   ├── requires.txt
│   └── top_level.txt
├── MANIFEST.in
├── PKG-INFO
├── README.md
├── setup.cfg
└── setup.py

16 directories, 33 files
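
The same check can be done without unpacking the archive by inspecting it with the standard tarfile module. The following is a minimal sketch: missing_compiled_messages() is a hypothetical helper written for this example, and it only checks that every .po file in the archive has a .mo file next to it.

```python
import tarfile
from pathlib import PurePosixPath


def missing_compiled_messages(sdist_path):
    """Return .po members of the archive that lack a sibling .mo file."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        # normalize member names so that "./a/b" and "a/b" compare equal
        names = {
            str(PurePosixPath(member.name))
            for member in archive.getmembers()
            if member.isfile()
        }
    missing = []
    for name in sorted(names):
        path = PurePosixPath(name)
        if path.suffix == ".po" and str(path.with_suffix(".mo")) not in names:
            missing.append(name)
    return missing
```

For the archive shown above, missing_compiled_messages("dist/webxample-0.0.0.tar.gz") should return an empty list, because each django.po has a django.mo beside it.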

The additional benefit of using this approach is that we were able to provide our own entry point for the project in place of Django's default manage.py script. Now, we can run any Django management command using this entry point, for instance:

$ webxample migrate
$ webxample collectstatic
$ webxample runserver

This required a small change in the manage.py script for compatibility with the entry_points argument in setup(), so the main part of its code is wrapped in a main() function. This is shown in the following code:

#!/usr/bin/env python3 
import os 
import sys 
 
 
def main(): 
    os.environ.setdefault( 
        "DJANGO_SETTINGS_MODULE", "webxample.conf.settings" 
    ) 
 
    from django.core.management import execute_from_command_line 
 
    execute_from_command_line(sys.argv) 
 
 
if __name__ == "__main__": 
    main()
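
To see why the main() wrapper is needed, it helps to know roughly what a console_scripts entry such as webxample = webxample.manage:main means: the generated script imports the module named before the colon and calls the attribute named after it. The sketch below mimics that lookup with importlib. Note that load_entry_point_target() is a hypothetical helper written for this illustration, not the code that setuptools or pip actually generate.

```python
import importlib


def load_entry_point_target(spec):
    """Resolve a 'name = package.module:attr' spec to (name, object)."""
    name, _, target = spec.partition("=")
    module_name, _, attr_path = target.strip().partition(":")
    obj = importlib.import_module(module_name.strip())
    attr_path = attr_path.strip()
    if attr_path:
        # the attribute part may be dotted, for example "Class.method"
        for attr in attr_path.split("."):
            obj = getattr(obj, attr)
    return name.strip(), obj


if __name__ == "__main__":
    # a stdlib target is used only so that the example runs anywhere
    script_name, func = load_entry_point_target("joinpath = os.path:join")
    print(script_name, func("a", "b"))
```

Without a module-level callable such as main(), there would be nothing for the part after the colon to point at.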

Unfortunately, a lot of frameworks (including Django) are not designed with the idea of packaging your projects that way in mind. It means that, depending on the advancement of your application, converting it to a package may require a lot of changes. In Django, this often means rewriting many of the implicit imports and updating a lot of configuration variables in your settings file.

The other problem here is the consistency of releases created using Python packaging. If different team members are authorized to create application distributions, it is crucial that this process takes place in the same replicable environment. Especially when you do a lot of asset preprocessing, it is possible that packages created in two different environments will not look the same, even if they are created from the same code base. This may be due to different versions of the tools used during the build process. The best practice is to move the distribution responsibility to some continuous integration/delivery system such as Jenkins, Buildbot, Travis CI, or similar. The additional advantage is that you can assert that the package passes all of the required tests before going to distribution. You can even make automated deployment a part of such a continuous delivery system.
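
One simple way to detect such differences is to compare checksums of the files inside packages built in different environments. This is a minimal sketch, assuming the builds are plain .tar.gz source distributions. The sdist_digests() and diff_builds() helpers are hypothetical, and comparing per-file digests instead of the whole archive is a choice made here because archive metadata such as timestamps usually differs between builds.

```python
import hashlib
import tarfile


def sdist_digests(sdist_path):
    """Map each regular file in the archive to the SHA-256 of its content."""
    digests = {}
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile():
                data = archive.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def diff_builds(first_path, second_path):
    """Return sorted names of files that are missing or differ between builds."""
    first = sdist_digests(first_path)
    second = sdist_digests(second_path)
    return sorted(
        name
        for name in first.keys() | second.keys()
        if first.get(name) != second.get(name)
    )
```

An empty result from diff_builds() means both archives contain the same files with the same content; any name it returns is a good starting point for checking tool versions in the two environments.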

Mind that although distributing your code as Python packages using setuptools might seem elegant, it is actually not simple or effortless. It has the potential to greatly simplify your deployments, so it is definitely worth trying, but it comes at the cost of increased complexity. If the preprocessing pipeline for your application grows too complex, you should definitely consider building Docker images and deploying your application as containers.

Deployment with Docker requires some additional setup and orchestration but in the long term saves a lot of time and resources that are otherwise required to maintain repeatable build environments and complex preprocessing pipelines.

In the next section, we'll take a look at the common conventions and practices regarding deployment of Python applications.