--- type: "blog-post" title: "Superior caching with dagger" description: "Dagger is an up-and-coming ci/cd orchestration tool as code, this may sound abstract, but it is quite simple, read on to learn more." draft: false date: "2023-08-02" updates: - time: "2023-08-02" description: "first iteration" tags: - '#blog' --- Dagger is an up-and-coming ci/cd orchestration tool as code, this may sound abstract, but it is quite simple, read on to learn more. ## Introduction This post is about me finding a solution to a problem, I've faced for a while with `rust` caching for docker images. I was building a new tool I am working on called `cuddle-please` (a release manager inspired by [release-please](https://github.com/googleapis/release-please)). I will start with a brief introduction to dagger, then the problem and how dagger solves it, in comparison to docker. ## What is dagger > If you already know what dagger is, feel free to skip ahead. I will explain > briefly what it is, and give a short example. Dagger is a tool where you can define your pipelines as code, dagger doesn't desire to replace your tools, such as bash, clis, apis and whatnot, but it wants to allow you to orchestrate them to your hearts content. And at the same time bring proper engineering principles to it, such as testing, packaging, and ergonomics. Dagger allows you to write your pipelines in one of the supported languages (of which are rapidly expanding). The official languages are by the dagger team are: - Go - Python - Typescript Community based ones are: - Rust (I am currently the author and maintainer of this one, but I don't work for `dagger`) - Elixir - Dotnet (in-progress) - Java (In-progress) - Ruby etc. Dagger at its simplest is an api on top of `docker` or rather `buildkit`, but brings with it so much more. You can kind of think of `dagger` as a juiced up `Dockerfile`, but it brings more interactivity and programmability to it. It even have elements of `docker-compose` as well. 
I personally call it `Programmatic Orchestration`. Anyways, a sample pipeline could be:

```rust
#[tokio::main]
async fn main() -> eyre::Result<()> {
    let client = dagger_sdk::connect().await?;

    let output = client
        .container()
        .from("alpine")
        .with_exec(vec!["echo", "hello-world"])
        .stdout()
        .await?;

    println!("stdout: {output}");

    Ok(())
}
```

Now simply build and run it:

```bash
cargo run
```

This will go ahead and download the image and run the `echo "hello-world"` command, whose output we can then extract and print. This is a very basic example; the equivalent `Dockerfile` would look like this:

```Dockerfile
FROM alpine

RUN echo "hello-world"
```

> The only prerequisite is a newer version of `docker`, but you can also install
> the `dagger` CLI as well, for better ergonomics and output.

However, dagger, as its namesake suggests, runs on DAGs (directed acyclic graphs). Normally you would express these with `multi-stage dockerfiles`:

```Dockerfile
FROM alpine as base

FROM base as builder
RUN ...

FROM base as production
COPY --from=builder /mnt/... .
```

This forms a dag when you run `docker build .`, where:

```
base runs first, because builder depends on it.
After base is done, production will run, because it depends on builder.
```

Dagger does the same thing behind the scenes, but with a much more capable API. In dagger, you can easily share sockets, files, folders, containers, stdout, etc. All of this can be done in a programming language, instead of a recipe-like declarative file such as a `Dockerfile`. It should be noted that dagger transforms your code into a declarative manifest behind the scenes, kind of like `Pulumi`, though it is still interactive; think `SQL`, where each query is a declarative command/query.

## Why orchestration matters
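As a quick taste of why this matters: because a dagger pipeline is ordinary code, the small decisions inside it can be unit-tested like any other function. Here is a minimal, hypothetical sketch (the `image_tag` helper is my own invention for illustration, not part of dagger or `cuddle-please`):

```rust
// Hypothetical helper: derive a docker image tag from a git reference.
// Because pipeline logic is plain Rust, it can be tested without ever
// touching docker or buildkit.
fn image_tag(git_ref: &str) -> String {
    match git_ref.strip_prefix("refs/tags/") {
        // A tag push produces a versioned image tag.
        Some(tag) => tag.to_string(),
        // Anything else falls back to "latest".
        None => "latest".to_string(),
    }
}

fn main() {
    println!("{}", image_tag("refs/tags/v1.2.3")); // prints "v1.2.3"
    println!("{}", image_tag("refs/heads/main")); // prints "latest"
}
```

Nothing dagger-specific is happening here, and that is exactly the point: the logic that would otherwise live in a shell one-liner inside a `Dockerfile` becomes plain code you can test in isolation.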
Dagger is a paradigm shift, because you can now enable engineering on top of your pipelines. With Dockerfiles, you would normally download all sorts of CLIs to manage your package managers, plus tooling such as `jq` and whatnot, to massage small scripts into something compatible with `docker build`.

## The problem

A good example is building production images for rust. Building ci docker images for rust is a massive pain. This is because when you run `cargo build`, or any of its siblings, it refreshes the package registry if needed, downloads dependencies, forms the dependency chain between crates, and builds the final crates/binaries. This is very bad for caching, because you can't tell `cargo` to only fetch and compile dependencies but leave your own crates alone. In practice this means that any code change to your crates, no matter how small, cache-busts your dependencies as well. `Dockerfile`, or rather `Buildkit`, on its own isn't able to properly split the cache between these steps, because from its point of view it is all a single atomic command.

Existing solutions have you download tools to handle it for you, but those are cumbersome and, to be honest, flaky. For example, `cargo-chef`: it lets you create a `recipe.json` file containing a list of all your dependencies, which you can move from a planner step into your build step, caching the dependencies that way. I've honestly found this really brittle, as the `recipe.json`-producing stage would cache-bust all the time.

```Dockerfile
FROM lukemathwalker/cargo-chef:latest-rust-1 AS chef
WORKDIR /app

FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
# Build dependencies - this is the caching Docker layer!
RUN cargo chef cook --release --recipe-path recipe.json
# Build application
COPY . .
RUN cargo build --release --bin app

# We do not need the Rust toolchain to run the binary!
FROM debian:buster-slim AS runtime
WORKDIR /app
COPY --from=builder /app/target/release/app /usr/local/bin
ENTRYPOINT ["/usr/local/bin/app"]
```

The above is the original example, but it has a flaw: it relies on the checksum of `recipe.json` staying the same. Because the planner stage just loads all files with `COPY . .`, a change in one of your crates will bust the hash of `recipe.json`.

Instead, what we would like to do is load in only the `Cargo.toml` and `Cargo.lock` files for our workspace, as well as those of any crates we've got, and then dynamically construct empty `main.rs` and `lib.rs` files to act as the binaries. This is the simplest approach, but very bothersome in a `Dockerfile` (`<crate-a>` and `<crate-b>` stand in for your actual crate names):

```Dockerfile
FROM rustlang/rust:nightly as base

FROM base as dep-builder
WORKDIR /mnt/src
COPY **/Cargo.toml .
COPY **/Cargo.lock .
RUN echo "fn main() {}" > crates/<crate-a>/src/main.rs
RUN echo "fn some() {}" > crates/<crate-a>/src/lib.rs
RUN echo "fn main() {}" > crates/<crate-b>/src/main.rs
RUN echo "fn some() {}" > crates/<crate-b>/src/lib.rs
# ...
RUN cargo build # refreshes the registry, fetches deps, compiles them, and links them into a dummy binary

FROM base as builder
WORKDIR /mnt/src
COPY --from=dep-builder target target
COPY **/Cargo.toml .
COPY **/Cargo.lock .
COPY crates crates
RUN cargo build # compiles user code and links everything together, reusing the incremental build cache from before
```

This is very cumbersome, as you have to remember to update the `echo` lines above for every crate. You can script your way out of it, but it is just an ugly approach that is hard to maintain and grok.

## The solution built in dagger

Instead, in `dagger` we can use a proper programmatic tool for the job:

```rust
// Some stuff omitted for brevity

// (1)
let mut rust_crates = vec![PathBuf::from("ci")];

// (2)
let mut dirs = tokio::fs::read_dir("crates").await?;
while let Some(entry) = dirs.next_entry().await?
{
    if entry.metadata().await?.is_dir() {
        rust_crates.push(entry.path())
    }
}

// (3)
fn create_skeleton_files(
    directory: dagger_sdk::Directory,
    path: &Path,
) -> eyre::Result<dagger_sdk::Directory> {
    let main_content = r#"fn main() {}"#;
    let lib_content = r#"fn some() {}"#;

    let directory = directory.with_new_file(
        path.join("src").join("main.rs").display().to_string(),
        main_content,
    );
    let directory = directory.with_new_file(
        path.join("src").join("lib.rs").display().to_string(),
        lib_content,
    );

    Ok(directory)
}

// (4)
let mut directory = directory;
for rust_crate in rust_crates.into_iter() {
    directory = create_skeleton_files(directory, &rust_crate)?
}
```

You can find this in [cuddle-please](https://git.front.kjuulh.io/kjuulh/cuddle-please/src/branch/main/ci/src/main.rs), which uses dagger as part of its `ci`. Anyways, for those not versed in `rust` (which most people probably aren't), what is happening here, in rough terms:

1. We create a list of known crates. In this case `ci` is added manually, because it is a bit special.
2. We list all folders in the `crates` folder and add them to `rust_crates`.
3. An inline function is created which can add new files to an existing directory; in this case it adds both a `main.rs` and a `lib.rs` file with some dummy content to a given path.
4. We apply these files for all the crates we found above.

This is roughly equivalent to what we had above, but this time we can test individual parts of the code, or even share it. For example, I could create a rust library containing this functionality and reuse it across all of my projects. This is a game-changer!

> Note that rust is a bit more verbose than the other sdks, especially in
> comparison to the dynamic ones, such as Python or Elixir.
> But to me this is a plus, because it allows us to work in the language we're
> most comfortable with, which in my case is `rust`.

You can look at the rest of the [file](https://git.front.kjuulh.io/kjuulh/cuddle-please/src/branch/main/ci/src/main.rs), but now when I actually build using `cargo run -p ci`, it will first do everything while it builds up its cache, and afterwards, if I do a code change in any of the files, only the binary will be recompiled and linked. This is mainly because of these two imports of files (which are equivalent to `COPY` in dockerfiles):

```rust
// (1)
let dep_src = client.host().directory_opts(
    args.source
        .clone()
        .unwrap_or(PathBuf::from("."))
        .display()
        .to_string(),
    dagger_sdk::HostDirectoryOptsBuilder::default()
        .include(vec!["**/Cargo.toml", "**/Cargo.lock"])
        .build()?,
);

// (2)
let src = client.host().directory_opts(
    args.source
        .clone()
        .unwrap_or(PathBuf::from("."))
        .display()
        .to_string(),
    dagger_sdk::HostDirectoryOptsBuilder::default()
        .exclude(vec!["node_modules/", ".git/", "target/"])
        .build()?,
);
```

1. Loads in only the Cargo files; this lets us cache-bust only when any of those files change.
2. Loads in everything except for some ignored folders; this is a mix of `COPY` and `.dockerignore`.

Now we simply load them at different times and execute builds in between:

```rust
// (1)
let rust_build_image = client.container().from(
    args.rust_builder_image
        .as_ref()
        .unwrap_or(&"rustlang/rust:nightly".into()),
);

// (2)
let target_cache = client.cache_volume("rust_target");

// (3)
let rust_build_image = rust_build_image
    .with_workdir("/mnt/src")
    .with_directory("/mnt/src", dep_src.id().await?)
    .with_exec(vec!["cargo", "build"])
    .with_mounted_cache("/mnt/src/target/", target_cache.id().await?)
    .with_directory("/mnt/src/crates", src.directory("crates").id().await?);

// (4)
let rust_exe_image = rust_build_image.with_exec(vec!["cargo", "build"]);

// (5)
rust_exe_image.exit_code().await?;
```

1. Does a `FROM` equivalent, creating a base container.
2. Builds a cache volume. This is extremely useful, because you can set up a shared cache pool for these volumes, so that you don't have to rely on buildkit layer caching (what is normally used in Dockerfiles).
3. Here we build the image:
   1. First we set the workdir,
   2. then load in the directory fetched above; this includes the Cargo files as well as the stub `main.rs` and `lib.rs` files.
   3. Next we fire off a normal build with `with_exec`, which functions like a `RUN`. Here we build the stub, with the refreshed registry and the downloaded and compiled dependencies.
   4. We load in the rest of the source and replace `crates` with our own crates; this loads in the proper `.rs` files.
4. We now build the actual binary.
5. We trigger `exit_code` to actually run the dag. Everything up to this point has been lazy, so if we didn't fire off `exit_code`, or some other action on it, we wouldn't actually execute the steps.

Now dagger will figure out the most optimal way of running our pipeline for maximum performance and cacheability.

## This is very verbose

Rust is a bit more verbose than other languages, especially in comparison to scripting languages. In the future, I would probably package this up and publish it as a `crate` I can depend on myself. That would make it quite easy to share across all of my projects. That project, like in my previous [post](https://blog.kasperhermansen.com/posts/cuddle/), could serve as a singular component, which could be tested in isolation and serve as a proper api, and tool in general. This is very hard, if not impossible, with regular `Dockerfiles` (without templating).

## Conclusion

I've shown a rough outline of what dagger is, why it is useful, and how you can do things with it that aren't possible with a `Dockerfile` proper. The code examples are somewhat contrived, but they highlight that you can solve real problems using this new paradigm of mixing code with orchestration.
In this case, an unholy union of `rust` and `buildkit` through `dagger`.
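To close the loop on the crate idea from earlier: the heart of the skeleton-file trick is pure path logic, which is exactly what would make it shareable and testable as a library. A rough sketch of what such a helper could look like (the function name and shape are my own assumptions for illustration, not the actual `cuddle-please` code; the dagger `Directory` wiring is left to the caller):

```rust
use std::path::{Path, PathBuf};

// Compute the stub files a crate needs so that `cargo build` can compile
// dependencies without the real sources being present. Pure logic: no
// dagger involved, so it is trivially unit-testable.
fn skeleton_files(crate_dir: &Path) -> Vec<(PathBuf, &'static str)> {
    vec![
        (crate_dir.join("src").join("main.rs"), "fn main() {}"),
        (crate_dir.join("src").join("lib.rs"), "fn some() {}"),
    ]
}

fn main() {
    // A caller would feed these (path, contents) pairs into
    // `Directory::with_new_file`, as in the pipeline above.
    for (path, contents) in skeleton_files(Path::new("crates/my-crate")) {
        println!("{} -> {}", path.display(), contents);
    }
}
```

Keeping the path computation separate from the dagger calls means the part most likely to contain bugs (which files, where, with what contents) can be covered by ordinary unit tests.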