A local CMake workflow with Docker


An outline of a template that provides an automated workflow driving a CMake project in a docker container.

This post must be read in concert with https://github.com/steve-downey/scratch of which it is part.

Routine process should be automated

Building a project that uses cmake runs through a predictable lifecycle. You should be able to pick up where you left off without remembering which step you were on, and you should be able to state your goal rather than the step. make is designed for this, and can drive the process.

The workflow

  • Update any submodules
  • Create a build area specific to the toolchain
  • Run cmake with that toolchain in the build area
  • Run the build in the build area
  • Run tests, either dependent or independent of rebuild
  • Run the install
  • (optionally) Clean the build
  • (optionally) Clean the configuration

All of the steps have recognizable filesystem artifacts, need to be run in order, and if there are any failures, the process should stop.

So we want a make system on top of our meta-make build system.

The one thing this system does that plain cmake doesn’t automate is making sure that any changes are incorporated into a build before tests are run. CMake splits these in order to provide the option of running tests without a recompile. I think it is a mistake to automate that, but I do provide a target that just runs ctest.

My normal target is test

make test

This will run through all of the steps necessary to compile and run the tests, and only those steps. The core commands for the build driver are:

compile
Compile all out-of-date source
install
Install into the INSTALL_PREFIX
ctest
Run the currently built test suite
test
Build and run the test suite
cmake
Run cmake again in the build area
clean
Clean the build area
realclean
Delete the build area

There are several makefile variables controlling which toolchain is used and where things are located. By default the build and install are completely outside the source tree, in ../cmake.bld and ../install respectively. The build directory is further refined by the project name, which defaults to the current directory name, and by the toolchain if that is overridden.

The build is a Ninja Multi-Config build, supporting RelWithDebInfo, Debug, Tsan, and Asan, with the particular flavor selectable by the CONFIG variable. See targets.mk for the variables and details, such as they are.
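
For example (test, ctest, and realclean are targets from the list above; CONFIG is the variable just described):

make CONFIG=Asan test      # configure, build, and run the test suite under AddressSanitizer
make CONFIG=Tsan ctest     # run the already-built Tsan test suite without forcing a rebuild
make realclean             # delete the build area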

Because other tools, unfortunately, rely on a compile_commands.json, this system symlinks one from the build area whenever reconfiguration is done.

default: compile

$(_build_path):
    mkdir -p $(_build_path)

$(_build_path)/CMakeCache.txt: | $(_build_path) .gitmodules
    cd $(_build_path) && $(run_cmake)
    -rm compile_commands.json
    ln -s $(_build_path)/compile_commands.json

compile: $(_build_path)/CMakeCache.txt ## Compile the project
    cmake --build $(_build_path)  --config $(CONFIG) --target all -- -k 0

install: $(_build_path)/CMakeCache.txt ## Install the project
    DESTDIR=$(abspath $(DEST)) ninja -C $(_build_path) -k 0  install

ctest: $(_build_path)/CMakeCache.txt ## Run CTest on current build
    cd $(_build_path) && ctest

ctest_ : compile
    cd $(_build_path) && ctest

test: ctest_ ## Rebuild and run tests

cmake: |  $(_build_path)
    cd $(_build_path) && ${run_cmake}

clean: $(_build_path)/CMakeCache.txt ## Clean the build artifacts
    cmake --build $(_build_path)  --config $(CONFIG) --target clean

realclean: ## Delete the build directory
    rm -rf $(_build_path)

To Docker or Not to Docker

The reason for the separate targets.mk file is to simplify running the build either purely locally on the host or in a docker container. The structure of the build is the same either way. In fact, before I dockerized this template project, there was a single makefile which was mostly the current contents of targets.mk. However, splitting does make the template easier to work with, as project-specific targets can naturally be placed in targets.mk.

The outer Makefile is responsible for checking if Docker has been requested and for making sure the container is ready. The makefile has a handful of targets of its own, but otherwise defers everything to targets.mk.

use-docker
Set a flag file, USE_DOCKER_FILE, indicating that subsequent builds should be forwarded to docker
remove-docker
Remove the flag file
docker-rebuild
Rebuild the docker image
docker-clean
Clean volumes and rebuild the image
docker-shell
Shell in the docker container

The docker container is built via docker-compose with the configuration in docker-compose.yml. It uses the Dockerfile, which uses steve-downey/cxx-dev:latest as the base image, and mounts the current source directory as a bind mount plus a volume for ../cmake.bld.

I don’t publish steve-downey/cxx-dev:latest, and you should build your own BASE. I do provide the recipe for the base image as a subproject in docker-inf/docker-cxx-dev.

The thought of you running unknown things as root scares me.

The image is assumed to provide current versions of gcc and clang, available as c++ or gcc, and as clang++, respectively.

The intent of the image is to provide compilation services and operate as an lsp server using clangd. Mine doesn’t provide X, editors, IDEs, etc. The intent isn’t a VM, it’s a controlled compiler installation.

Compiler installations bleed into each other. Multiple compilers installed onto the same base system can’t be assumed to behave the same way as a compiler installed as the only compiler. The ABI libraries vary, as do the standard libraries. Deployment makes this all an even worse problem. As a rule, for production I use Red Hat’s DTS compilers and only deploy on later OSs than I’ve built on, with strict controls on OS deployments, statically linking everything I possibly can.

The base image I am using here, steve-downey/cxx-dev, works for me, and is available as a definition at https://github.com/steve-downey/docker-cxx-dev as well.

It is based on current Ubuntu (jammy), installs gcc-12 from the Ubuntu repositories, and adds the LLVM repositories and installs clang-14 from them, following what https://apt.llvm.org/llvm.sh does.

It then installs the current release of cmake from https://apt.kitware.com/ubuntu/, because using out-of-date build tools is a bad idea all around.

I also configure it to run as USER 1000, because running everything as root is strictly worse, and 1000 is a 99.99 percent solution.

.update-submodules:
    git submodule update --init --recursive
    touch .update-submodules

.gitmodules: .update-submodules

.PHONY: use-docker
use-docker: ## Create docker switch file so that subsequent `make` commands run inside docker container.
    touch $(USE_DOCKER_FILE)

.PHONY: remove-docker
remove-docker: ## Remove docker switch file so that subsequent `make` commands run locally.
    $(RM) $(USE_DOCKER_FILE)

.PHONY: docker-rebuild
docker-rebuild: ## Rebuilds the docker file using the latest base image.
    docker-compose build

.PHONY: docker-clean
docker-clean: ## Clean up the docker volumes and rebuilds the image from scratch.
    docker-compose down -v
    docker-compose build

.PHONY: docker-shell
docker-shell: ## Shell in container
    docker-compose run --rm dev

Work In Progress

I expect I will make many changes to all of this. I’m providing no facilities for you to pick them up. Sorry.

Please consider this as an exhibition of techniques rather than as a solution.

std::execution, Sender/Receiver, and the Continuation Monad

Some thoughts on the std::execution proposal and my understanding of the underlying theory.

What’s proposed

From the paper’s Introduction

This paper proposes a self-contained design for a Standard C++ framework for managing asynchronous execution on generic execution contexts. It is based on the ideas in [P0443R14] and its companion papers.

Which doesn’t tell you much.

It proposes a framework where the principal abstractions are Senders, Receivers, and Schedulers.

Sender
A composable unit of work.
Receiver
Delimits work, handling completion, exceptions, or cancellation.
Scheduler
Arranges for the context in which work is done.

The primary user-facing concept is the sender. Values and functions can be lifted directly into senders. Senders can be stacked together, with a sender passing its value on to another function, and exception or cancellation handling can be stacked the same way.

Receivers handle the three ways a sender can complete, by returning a value, throwing an exception, or being canceled. As described, receivers are most likely to be implemented within particular algorithms that combine senders, such as `then` or `retry`.

Schedulers provide access to execution contexts. Inline execution, a single thread, a thread pool, a GPU, and so on would each have a scheduler that provides for putting a sender into the context it manages.
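
A minimal sketch of how these pieces combine, using the spellings from the proposal (schedule, then, sync_wait); the thread pool is a stand-in for whatever execution context an implementation provides, and the exact names have shifted between revisions:

// assuming the proposed <execution> header and namespace
namespace ex = std::execution;

// a scheduler hands out access to some execution context;
// "pool" here stands in for an implementation-provided thread pool
ex::scheduler auto sch = pool.scheduler();

// lift work into senders and chain the result along the value channel
ex::sender auto work =
    ex::schedule(sch)                         // a sender that completes on the pool
  | ex::then([] { return 13; })
  | ex::then([](int i) { return i + 42; });

// sync_wait supplies a receiver, starts the work, and blocks for a completion
auto [result] = std::this_thread::sync_wait(std::move(work)).value();  // result == 55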

There’s a fairly large API surface being proposed. But there’s an underlying theory about this, governing what algorithms need to be there and how the pieces fit together.

Continuation Passing Style and the Continuation Monad

Continuation passing style is a transformation from a normal function, which returns its result through the call stack, to one that sends its result to the “continuation” without returning. This means the function’s context can be cleaned up. Delimited continuations are a slight variation: instead of an unbounded “rest of the program”, the continuation has an end point and a value, so it’s essentially a function and can be handled as such. There is a purely mechanical method for converting all of the lambda calculus into CPS form, and this can be profitable for compilers based on the lambda calculus or related logics, like System F.
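
For example, in C++ terms, the transformation turns a returned value into a call to an explicitly passed continuation:

#include <iostream>

// direct style: the result comes back to the caller
int add(int a, int b) { return a + b; }

// continuation passing style: the result is handed to the continuation k,
// and the function itself returns nothing of its own
void add_cps(int a, int b, auto&& k) { k(a + b); }

int main() {
    std::cout << add(1, 2) << '\n';                        // 3, via the call stack
    add_cps(1, 2, [](int r) { std::cout << r << '\n'; });  // 3, delivered to the continuation
}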

The mechanical transformation also means that all the control structures, like loops, gotos, coroutines, exceptions, have CPS equivalents.

CPS is tedious, though. Having to explicitly add a continuation to everything is complicated.

However, there’s also a typeclass, or concept, that allows you to convert regular functions into continuation passing style automatically. Since the entire computation gets wrapped up as a value, it’s then rather straightforward to weave in concerns like where the work is being run, even switching back and forth between contexts. That’s the continuation monad.
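
A minimal sketch of that typeclass in C++, using generic lambdas; the names unit and bind are mine, and all of the scheduling concerns std::execution layers on top are omitted:

#include <iostream>

int main() {
    // unit: lift a value into CPS -- a computation is "something that takes a continuation"
    auto unit = [](auto v) { return [=](auto k) { return k(v); }; };

    // bind: run m, hand its value to f, which produces the next computation
    auto bind = [](auto m, auto f) {
        return [=](auto k) { return m([=](auto v) { return f(v)(k); }); };
    };

    // chain two steps, then supply the final continuation to run the whole thing
    auto prog = bind(unit(3), [&](int x) { return unit(x + 1); });
    prog([](int v) { std::cout << v << '\n'; });  // prints 4
}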

And unfortunately monads became an organizing principle in programming language theory one or two decades after most CS programs were standardized. So it’s all complicated and involves things we weren’t trained on. Fitting it into C++ has been an ongoing challenge, and until we had generic lambdas it was neither reasonably concise nor idiomatic.

See, however, the new monadic interface additions for std::optional for why you want this. Or Ranges, which are solidly based in the ‘list’ or non-deterministic monad.
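
The C++23 additions to std::optional (and_then, transform, or_else) are the same shape: a chain of may-fail steps reads linearly, with the empty case short-circuiting through. A small sketch:

#include <charconv>
#include <iostream>
#include <optional>
#include <string>

std::optional<int> parse_int(std::string const& s) {
    int v{};
    auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), v);
    return ec == std::errc{} ? std::optional{v} : std::nullopt;
}

std::optional<int> half(int i) {
    return i % 2 == 0 ? std::optional{i / 2} : std::nullopt;
}

int main() {
    auto r = parse_int("42")
                 .and_then(half)                           // bind: optional<int> -> optional<int>
                 .transform([](int i) { return i + 1; });  // map over the contained value
    std::cout << r.value_or(-1) << '\n';                   // 22
}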

We have a poor relationship with Theory

There is no satisfactory PL theory for object oriented programming. There’s lots of work, but it mostly ends up describing something that OO programmers don’t think is quite the same as what they do. Even the ones who spend a lot of time doing theory.

Yet OO was, and is, a successful discipline. Working with identity, behavior, and state has produced remarkable results. Temporal calculi, not so much.

For a long while, we as a discipline thought that multi-threading was similar. There was poor theory, but we had hardware, and libraries that let us use that hardware to do concurrent work correctly.

That turned out not to be the case.

Concurrency can’t be just a library, unfortunately. Concurrency models that hardware vendors will commit to won’t promise not to violate causality. That makes producing a programming model programmers can use frighteningly difficult.

Which is why having a sound theory for std::execution is a good thing, even if the theory is unfamiliar.

But as a group, we learned the wrong lessons from the 80s and thought it was a researcher’s job to take the successes of practitioners and put a sound basis to them. Ignoring that it is a feedback loop. In the 60s and 70s, those researchers were also the practitioners. It’s not wrong to get out ahead of theory, but we do need to check back.

p2300 std::execution

Senders, via the Decorator pattern, lift ordinary functions into the continuation passing style. People writing functions only need to be concerned with handling the arguments they are passed, without concern for execution context or continuations. Functions used by senders act like, and are, normal functions.

Senders manage a bundle of channels: normal return of a value, a thrown exception, and an error channel to handle cancellation or other errors not within the bounds of ordinary functions. All of these channels can be composed, taking the result on to another function, or composed monadically with a function returning a sender, where that function can determine the kind of sender based on the values of the arguments. The channels can be combined or rerouted, connecting one to another, or presenting a variant containing result, exception, and/or error to the continuation function.

Although senders form a logical graph of units of work, the physical type model is containment, much like expression templates. The result of binding senders together via an algorithm is a sender that contains the bound together senders. There are no nodes or allocations inherent to the model, just function calls.
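
A hedged sketch of that composition, again with the proposal’s spellings: let_value is the monadic bind (its function returns the next sender), upon_error reroutes the error channel back into a value, and the type of the composed sender contains its pieces rather than allocating nodes:

namespace ex = std::execution;

ex::sender auto work =
    ex::just(21)
  | ex::let_value([](int i) {
        // the function returns a sender; which sender can depend on the value
        return ex::just(i * 2);
    })
  | ex::upon_error([](std::exception_ptr) {
        // fold the error channel back into an ordinary value
        return -1;
    });

// decltype(work) is, roughly, upon_error_sender<let_value_sender<just_sender<int>, F>, G>:
// the graph is encoded in the containing types, like an expression template, with no allocation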

C++ coroutines fit into this model. C++ coroutines are, from the outside, functions with rules about the interaction patterns with the returned value. Making a coroutine-owning type a sender, and a sender co_awaitable, is possible and has been demonstrated.

std::execution takes the Continuation Monad and fits it to C++ control flow, return or exception, and adds cancellation, which incidentally allows a channel for failures from execution contexts. The thread pool can potentially signal failure via the error channel, without aliasing problems from application function code. However, for advanced users, these can be folded back into the normal function arguments and handled by application code. Policy decisions are not burned into the ROM of std::execution, but there are defaults that can be provided by application infrastructure authors.

Those infrastructure authors do not have to be std library vendors. The protocols, rendered as concepts, are available to normal users.

Network TS

Eppur si muove
And yet it moves

I do not believe ASIO’s model is a firm foundation for all async programming. However, it is well proven, and exists. It works.

And …

I have confidence that a networking library can and will be built using p2300. I am less confident that can be done in the timeframe for C++26. I do not believe for a moment we could have one for C++23, even with an existence proof of a networking library appearing now. It’s simply too late to review and agree. We’re in the same place as coroutines. We can have the machinery, but without all of the application-level, user-facing infrastructure we should have.

I think this was the right choice with coroutines, and I think providing the machinery for general continuation based async in the standard library so that we can build on top of it is the right choice. The authors have committed to making sure all the facilities are available for programmers, in particular the pipe syntax (an issue for ranges) as well as providing bases or adapters for coroutine promises and typed senders. We can experiment and add existing practice as we go.

Disclaimer

This is all my personal opinion, based on my own understanding. I’ve been in the meetings, I’ve been in discussions, asked questions. But if I’m wrong about some aspect of the proposal, that’s on me. Certainly not a formal opinion of Bloomberg, where I work. While we do lots of network services, and async programming, this isn’t what our tech looks like at all. Getting from here to there is an open question, but it would be for ASIO, too.

At least it isn’t CORBA.
