---
type: "blog-post"
title: "Superior caching with dagger"
description: "Dagger is an up-and-coming ci/cd orchestration tool as code; this may sound abstract, but it is quite simple, read on to learn more."
draft: false
date: "2023-08-02"
updates:
  - time: "2023-08-02"
    description: "first iteration"
tags:
  - '#blog'
---

Dagger is an up-and-coming ci/cd orchestration tool as code. This may sound
abstract, but it is quite simple; read on to learn more.

## Introduction

This post is about me finding a solution to a problem I've faced for a while
with `rust` caching for docker images. I ran into it while building a new tool
I am working on called `cuddle-please` (a release manager inspired by
[release-please](https://github.com/googleapis/release-please)).

I will start with a brief introduction to dagger, then the problem, and how
dagger solves it in comparison to docker.

## What is dagger

> If you already know what dagger is, feel free to skip ahead. I will explain
> briefly what it is, and give a short example.

Dagger is a tool where you can define your pipelines as code. Dagger doesn't
aim to replace your tools, such as bash, clis, apis and whatnot; it wants to
let you orchestrate them to your heart's content, and at the same time bring
proper engineering principles to it, such as testing, packaging, and
ergonomics.

Dagger allows you to write your pipelines in one of the supported languages
(the list of which is rapidly expanding).

The official languages, maintained by the dagger team, are:

- Go
- Python
- Typescript

Community-based ones are:

- Rust (I am currently the author and maintainer of this one, but I don't work
  for `dagger`)
- Elixir
- Dotnet (in progress)
- Java (in progress)
- Ruby, etc.

Dagger at its simplest is an api on top of `docker`, or rather `buildkit`, but
it brings so much more with it. You can kind of think of `dagger` as a
juiced-up `Dockerfile`, but it brings more interactivity and programmability.
It even has elements of `docker-compose` as well. I personally call it
`Programmatic Orchestration`.

Anyways, a sample pipeline could be:

```rust
#[tokio::main]
async fn main() -> eyre::Result<()> {
    let client = dagger_sdk::connect().await?;

    let output = client
        .container()
        .from("alpine")
        .with_exec(vec!["echo", "hello-world"])
        .stdout()
        .await?;

    println!("stdout: {output}");

    Ok(())
}
```

Now simply build and run it:

```bash
cargo run
```

This will go ahead and download the image and run the `echo "hello-world"`
command, whose output we can then extract and print. This is a very basic
example. The equivalent `Dockerfile` would look like this:

```Dockerfile
FROM alpine

RUN echo "hello-world"
```

> The only prerequisite is a newer version of `docker`, but you can also
> install `dagger` as well, for better ergonomics and output.

However, as its namesake suggests, dagger runs on dags (directed acyclic
graphs). This means that where you would normally use multi-stage
`Dockerfiles`:

```Dockerfile
FROM alpine as base

FROM base as builder
RUN ...

FROM base as production
COPY --from=builder /mnt/... .
```

This forms a dag when you run `docker build .`, where:

```
base runs first because builder depends on it.
after builder is done, production runs because it depends on builder.
```

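To make that ordering concrete, here is a tiny stand-alone sketch in plain Rust (not dagger code, and the stage graph is just the toy example above): a depth-first walk that visits each stage's dependencies before the stage itself, which is essentially what a dag scheduler does.

```rust
use std::collections::HashMap;

// Toy model of dag scheduling: resolve the order in which stages must
// run by visiting a stage's dependencies before the stage itself
// (a depth-first topological sort over a small, acyclic stage graph).
fn build_order(deps: &HashMap<&str, Vec<&str>>, target: &str) -> Vec<String> {
    fn visit(deps: &HashMap<&str, Vec<&str>>, stage: &str, order: &mut Vec<String>) {
        // Run everything this stage depends on first.
        for dep in deps.get(stage).map(|v| v.as_slice()).unwrap_or(&[]) {
            visit(deps, dep, order);
        }
        // Then run the stage itself, once.
        if !order.iter().any(|s| s.as_str() == stage) {
            order.push(stage.to_string());
        }
    }
    let mut order = Vec::new();
    visit(deps, target, &mut order);
    order
}
```

With `builder` depending on `base` and `production` depending on `builder`, the resolved order is `base`, `builder`, `production`, matching the description above.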
Dagger does the same thing behind the scenes, but with a much more capable
api.

In dagger you can easily share sockets, files, folders, containers, stdout,
etc. All of this can be done in a programming language, instead of in a
recipe-like declarative file such as a `Dockerfile`.

It should be noted that dagger transforms your code into a declarative
manifest behind the scenes, kind of like `Pulumi`, though it is still
interactive. Think `SQL`, where each query is a declarative command/query.

## Why orchestration matters

Dagger is a paradigm shift, because you can now apply engineering practices on
top of your pipelines. Normally, in Dockerfiles, you would download all sorts
of clis to manage your package managers, plus tooling such as `jq` and
whatnot, to perform small changes that make your scripts compatible with
`docker build`.

## The problem

A good example is building production images for rust. Building ci docker
images for rust is a massive pain. This is because when you run `cargo build`,
or any of its siblings, cargo refreshes the package registry if needed,
downloads dependencies, forms the dependency chain between crates, and builds
the final crates / binaries. This is very bad for caching, because you can't
tell `cargo` to only fetch dependencies and compile them, but leave your own
crates alone.

This generally means that you will cache-bust your dependencies each time you
make a code change to your crates, no matter how small. `Dockerfile`, or
rather `buildkit` on its own, isn't able to properly split the cache between
these commands, because from its point of view it is all a single atomic
command.

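A toy way to see the cache-busting: buildkit keys a layer roughly on a digest of its inputs, so a `COPY . .` layer changes whenever any copied file changes, and everything behind it re-runs. This is a plain-Rust illustration of that idea (hypothetical digests, no buildkit involved):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for a layer cache key: hash every (path, contents) pair
// that a `COPY . .` would pull in. Any edit to any file changes the key,
// so the `cargo build` layer behind it re-runs from scratch.
fn layer_key(files: &[(&str, &str)]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (path, contents) in files {
        path.hash(&mut hasher);
        contents.hash(&mut hasher);
    }
    hasher.finish()
}
```

Hash the same file set twice and the keys match (cache hit); touch one source file and the key changes, even though the dependency manifests are identical.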
Existing solutions have you download tools to handle this for you, but those
are cumbersome and, to be honest, unreliable. For example, `cargo-chef`:
cargo-chef lets you create a recipe.json file, which contains a list of all
your dependencies, which you can move from a planner step into your build
step, and cache the dependencies that way. I've honestly found this really
flaky, as the lower `recipe.json`-producing image would cache-bust all the
time.

```Dockerfile
FROM lukemathwalker/cargo-chef:latest-rust-1 AS chef
WORKDIR /app

FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
# Build dependencies - this is the caching Docker layer!
RUN cargo chef cook --release --recipe-path recipe.json
# Build application
COPY . .
RUN cargo build --release --bin app

# We do not need the Rust toolchain to run the binary!
FROM debian:buster-slim AS runtime
WORKDIR /app
COPY --from=builder /app/target/release/app /usr/local/bin
ENTRYPOINT ["/usr/local/bin/app"]
```

The above is the original example, but it has some flaws: it relies on the
checksum of the recipe.json staying the same. If you make a change in one of
your crates, it can bust the hash of the recipe.json, because we just load all
the files with `COPY . .`.

Instead, what we would like to do is load only the `Cargo.toml` and
`Cargo.lock` files for our workspace, as well as those of any crates we've
got, and then dynamically construct empty main.rs and lib.rs files to act as
the binaries. This is the simplest approach, but very bothersome in a
`Dockerfile`.

```Dockerfile
FROM rustlang/rust:nightly as base

FROM base as dep-builder
WORKDIR /mnt/src
COPY Cargo.lock .
COPY **/Cargo.toml .

RUN echo "fn main() {}" >> crates/<some-crate>/src/main.rs
RUN echo "fn some() {}" >> crates/<some-crate>/src/lib.rs

RUN echo "fn main() {}" >> crates/<some-other-crate>/src/main.rs
RUN echo "fn some() {}" >> crates/<some-other-crate>/src/lib.rs

# ...

RUN cargo build # refreshes the registry, fetches deps, compiles them, and links them into a dummy binary

FROM base as builder

WORKDIR /mnt/src

COPY --from=dep-builder /mnt/src/target target
COPY Cargo.lock .
COPY **/Cargo.toml .
COPY crates crates

RUN cargo build # compiles our own code and links everything together, reusing the cache from the incremental build done previously
```

This is very cumbersome, as you have to remember to update the `echo` lines
above. You can script your way out of it, but it is just an ugly approach that
is hard to maintain and grok.

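For completeness, "scripting your way out" could look like a small generator that emits those stub-creation `RUN` lines from a crate list, so they never go stale. This is a hypothetical sketch, not something the setup above actually uses:

```rust
// Hypothetical generator for the stub-creation RUN lines above, so the
// Dockerfile no longer has to be updated by hand for every new crate.
fn stub_run_lines(crates: &[&str]) -> String {
    let mut out = String::new();
    for krate in crates {
        // One stub binary entry point and one stub library root per crate.
        out.push_str(&format!(
            "RUN echo \"fn main() {{}}\" > crates/{krate}/src/main.rs\n"
        ));
        out.push_str(&format!(
            "RUN echo \"fn some() {{}}\" > crates/{krate}/src/lib.rs\n"
        ));
    }
    out
}
```

Even so, you now have a script generating a Dockerfile, which is exactly the kind of indirection the next section avoids.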
## The solution built in dagger

Instead, what we can do in `dagger` is use a proper programmatic tool for
this.

```rust
// Some stuff omitted for brevity

// 1
let mut rust_crates = vec![PathBuf::from("ci")];

// 2
let mut dirs = tokio::fs::read_dir("crates").await?;
while let Some(entry) = dirs.next_entry().await? {
    if entry.metadata().await?.is_dir() {
        rust_crates.push(entry.path());
    }
}

// 3
fn create_skeleton_files(
    directory: dagger_sdk::Directory,
    path: &Path,
) -> eyre::Result<dagger_sdk::Directory> {
    let main_content = r#"fn main() {}"#;
    let lib_content = r#"fn some() {}"#;

    let directory = directory.with_new_file(
        path.join("src").join("main.rs").display().to_string(),
        main_content,
    );
    let directory = directory.with_new_file(
        path.join("src").join("lib.rs").display().to_string(),
        lib_content,
    );

    Ok(directory)
}

// 4
let mut directory = directory;
for rust_crate in rust_crates.into_iter() {
    directory = create_skeleton_files(directory, &rust_crate)?;
}
```

You can find this in
[cuddle-please](https://git.front.kjuulh.io/kjuulh/cuddle-please/src/branch/main/ci/src/main.rs),
which uses dagger as part of its `ci`. Anyways, for those not versed in
`rust`, which most people probably aren't, what is happening here, in rough
terms, is:

1. We create a list of known crates. In this case `ci` is added manually,
   because it is a bit special.
2. We list all folders in the `crates` folder and add them to `rust_crates`.
3. An inline function is created, which can add new files to an existing
   directory; in this case it adds both a main.rs and a lib.rs file with some
   dummy content at a given path.
4. Here we apply these files for all the crates we found above.

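Because this is ordinary code, the skeleton step is also easy to test in isolation. Here is a dagger-free sketch of the same idea against the real filesystem (the directory layout is illustrative):

```rust
use std::fs;
use std::path::Path;

// Dagger-free version of the skeleton step: create stub src/main.rs and
// src/lib.rs files for a crate, so a dependency-only `cargo build` has
// something to compile and link against.
fn create_skeleton_files(root: &Path, krate: &str) -> std::io::Result<()> {
    let src = root.join("crates").join(krate).join("src");
    fs::create_dir_all(&src)?;
    fs::write(src.join("main.rs"), "fn main() {}")?;
    fs::write(src.join("lib.rs"), "fn some() {}")?;
    Ok(())
}
```

The dagger version does the same thing, except the files land in an in-pipeline `Directory` instead of on disk.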
This is roughly equivalent to what we had above, but this time we can test
individual parts of the code, or even share them. For example, I could create
a rust library containing this functionality, which I could reuse across all
of my projects. This is a game-changer!

> Note that rust is a bit more verbose than the other sdks, especially in
> comparison to the dynamic ones, such as Python or Elixir. But to me this is
> a plus, because it allows us to work in the language we're most comfortable
> with, which in my case is `rust`.

You can look at the rest of the
[file](https://git.front.kjuulh.io/kjuulh/cuddle-please/src/branch/main/ci/src/main.rs),
but now if I actually build using `cargo run -p ci`, it will first do
everything while it builds its cache, and then afterwards, if I make a code
change in any of the files, only the binary will be recompiled and linked.

This is mainly because of these two imports of files (which are equivalent to
`COPY` in Dockerfiles):

```rust
// 1
let dep_src = client.host().directory_opts(
    args.source
        .clone()
        .unwrap_or(PathBuf::from("."))
        .display()
        .to_string(),
    dagger_sdk::HostDirectoryOptsBuilder::default()
        .include(vec!["**/Cargo.toml", "**/Cargo.lock"])
        .build()?,
);

// 2
let src = client.host().directory_opts(
    args.source
        .clone()
        .unwrap_or(PathBuf::from("."))
        .display()
        .to_string(),
    dagger_sdk::HostDirectoryOptsBuilder::default()
        .exclude(vec!["node_modules/", ".git/", "target/"])
        .build()?,
);
```

1. Loads in only the Cargo files; this allows us to only cache-bust if any of
   those files change.
2. Loads in everything except for some ignored paths; this is a mix of `COPY`
   and `.dockerignore`.

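The include/exclude options can be pictured as a plain path filter over the host directory. A rough dagger-free sketch, with deliberately simplified matching (suffixes for include, prefixes for exclude, rather than real globs):

```rust
// Rough stand-in for dagger's include/exclude directory options: keep a
// path if it ends with any include suffix (when includes are given) and
// does not start with any exclude prefix. Real dagger uses glob patterns;
// this is only to illustrate the filtering behaviour.
fn filter_paths<'a>(paths: &[&'a str], include: &[&str], exclude: &[&str]) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| include.is_empty() || include.iter().any(|s| p.ends_with(s)))
        .filter(|p| !exclude.iter().any(|s| p.starts_with(s)))
        .collect()
}
```

With an include of the Cargo files, only manifests survive; with an exclude of `target/`, everything but the build output does.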
Now we simply load them at different times and execute builds in between:

```rust
// 1
let rust_build_image = client.container().from(
    args.rust_builder_image
        .as_ref()
        .unwrap_or(&"rustlang/rust:nightly".into()),
);

// 2
let target_cache = client.cache_volume("rust_target");

// 3
let rust_build_image = rust_build_image
    .with_workdir("/mnt/src")
    .with_directory("/mnt/src", dep_src.id().await?)
    .with_exec(vec!["cargo", "build"])
    .with_mounted_cache("/mnt/src/target/", target_cache.id().await?)
    .with_directory("/mnt/src/crates", src.directory("crates").id().await?);

// 4
let rust_exe_image = rust_build_image.with_exec(vec!["cargo", "build"]);

// 5
rust_exe_image.exit_code().await?;
```

|
1. Do a `FROM` equivalent, creating a base container.
|
|
2. Builds a cache volume, this is extremely useful, because you can setup a
|
|
shared cache pool for these volumes, so that you don't have to rely on
|
|
buildkit-layer caching. (what is normally used in Dockerfiles)
|
|
3. Here we build the image
|
|
1. First we set the workdir,
|
|
2. then load in the directory fetched from above, this includes, the Cargo
|
|
files as well as stub main and lib.rs files
|
|
3. Next we fire off a normal build with `with_exec` which function like a
|
|
`RUN`. here we build the stub, with refreshed registry, downloaded and
|
|
compiled dependencies.
|
|
4. We load in the rest of the source and replace `crates` with out own
|
|
crates, this loads in the proper `.rs` files.
|
|
4. We now build the actual binary
|
|
5. We trigger exit_code, to actually run the dag, everything previously had been
|
|
lazy, so if we didn't fire off the exit_code, or do another code action on
|
|
it, we wouldn't actually execute the step. Now dagger will figure out the
|
|
most optimal way of running our pipeline for maximum performance and
|
|
cacheability.
|
|
|
|
## This is very verbose

Rust is a bit more verbose than other languages, especially in comparison to
scripting languages. In the future, I would probably package this up and
publish it as a `crate` I can depend on myself. That would be super nice, and
would make it quite easy to share this across all of my projects.

That project, like in my previous
[post](https://blog.kasperhermansen.com/posts/cuddle/), could serve as a
singular component, which could be tested in isolation, and serve as a proper
api and tool in general. This is something very hard, if not impossible, with
regular `Dockerfiles` (without templating).

## Conclusion

I've shown a rough outline of what dagger is, why it is useful, and how you
can do stuff with it that isn't possible using `Dockerfile` proper. The code
examples show some contrived code that highlights how you can solve real
problems using this new paradigm of mixing code with orchestration; in this
case an unholy union of `rust` and `buildkit` through `dagger`.