---
type: "blog-post"
title: "Superior caching with dagger"
description: "Dagger is an up-and-coming ci/cd orchestration tool as code, this may sound abstract, but it is quite simple, read on to learn more."
draft: false
date: "2023-08-02"
updates:
  - time: "2023-08-02"
    description: "first iteration"
tags:
  - '#blog'
---

Dagger is an up-and-coming ci/cd orchestration tool that lets you write your
pipelines as code. This may sound abstract, but it is quite simple. Read on to
learn more.

## Introduction

This post is about finding a solution to a problem I've faced for a while:
caching `rust` builds in docker images. I ran into it again while building a
new tool called `cuddle-please` (a release manager inspired by
[release-please](https://github.com/googleapis/release-please)).

I will start with a brief introduction to dagger, then describe the problem and
how dagger solves it compared to docker.

## What is dagger

> If you already know what dagger is, feel free to skip ahead. I will explain
> briefly what it is, and give a short example.

Dagger is a tool that lets you define your pipelines as code. Dagger doesn't
want to replace your tools, such as bash, clis, apis and whatnot, but it wants
to let you orchestrate them to your heart's content, while bringing proper
engineering principles to your pipelines, such as testing, packaging, and
ergonomics.

Dagger allows you to write your pipelines in one of the supported languages (a
list that is rapidly expanding).

The official languages supported by the dagger team are:

- Go
- Python
- Typescript

Community-maintained ones are:

- Rust (I am currently the author and maintainer of this one, but I don't work
  for `dagger`)
- Elixir
- Dotnet (in-progress)
- Java (in-progress)
- Ruby etc.

Dagger at its simplest is an api on top of `docker`, or rather `buildkit`, but
it brings with it so much more. You can think of `dagger` as a juiced-up
`Dockerfile` with more interactivity and programmability. It even has elements
of `docker-compose`. I personally call it `Programmatic Orchestration`.

Anyways, a sample pipeline could be:

```rust
#[tokio::main]
async fn main() -> eyre::Result<()> {
    // Connect to the dagger engine
    let client = dagger_sdk::connect().await?;

    // Equivalent of `FROM alpine` followed by `RUN echo "hello-world"`
    let output = client
        .container()
        .from("alpine")
        .with_exec(vec!["echo", "hello-world"])
        .stdout()
        .await?;

    println!("stdout: {output}");

    Ok(())
}
```

Now simply build and run it.

```bash
cargo run
```

This will download the image and run the `echo "hello-world"` command, whose
output we can then extract and print. This is a very basic example. The
equivalent `Dockerfile` would look like this:

```Dockerfile
FROM alpine
RUN echo "hello-world"
```

> The only prerequisite is a recent version of `docker`, but you can also
> install `dagger` for better ergonomics and output.

Dagger, as its name suggests, runs on dags (directed acyclic graphs). You may
already know the concept from multi-stage Dockerfiles:

```Dockerfile
FROM alpine as base

FROM base as builder
RUN ...

FROM base as production
COPY --from=builder /mnt/... .
```

This forms a dag when you run `docker build .`, where:

```
base runs first, because builder depends on it.
once builder is done, production runs, because it depends on builder.
```
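
To make the ordering concrete, here is a minimal sketch in plain `rust`, with no
dagger or buildkit involved. It is not how buildkit is implemented; the
`execution_order` function and the dependency map are made up for illustration.
It repeatedly picks the stages whose dependencies have already run, which for
the three stages above gives base, then builder, then production.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the stages in an order where every stage comes after its
/// dependencies. `deps` maps a stage name to the stages it depends on.
/// (Hypothetical helper, only meant to illustrate the idea of a dag.)
fn execution_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Vec<String> {
    let mut done: BTreeSet<&str> = BTreeSet::new();
    let mut order = Vec::new();

    // Repeatedly pick every stage whose dependencies have all run already.
    while done.len() < deps.len() {
        let ready: Vec<&str> = deps
            .iter()
            .filter(|(name, needs)| {
                !done.contains(*name) && needs.iter().all(|n| done.contains(n))
            })
            .map(|(name, _)| *name)
            .collect();

        // No runnable stage left means there is a cycle, so it is not a dag.
        assert!(!ready.is_empty(), "cycle detected");

        for name in ready {
            done.insert(name);
            order.push(name.to_string());
        }
    }

    order
}

fn main() {
    let mut deps = BTreeMap::new();
    deps.insert("base", vec![]);
    deps.insert("builder", vec!["base"]);
    deps.insert("production", vec!["base", "builder"]);

    println!("{:?}", execution_order(&deps));
}
```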

Dagger does the same thing behind the scenes, but with a much more capable api.

In dagger, you can easily share sockets, files, folders, containers, stdout,
etc. All of this is done in a programming language instead of a recipe-like
declarative file such as a `Dockerfile`.

It should be noted that dagger transforms your code into a declarative manifest
behind the scenes, kind of like `Pulumi`, though it is still interactive. Think
`SQL`, where each query is a declarative command.

## Why orchestration matters

Dagger is a paradigm shift, because it lets you apply engineering practices to
your pipelines. Normally, in Dockerfiles, you would download all sorts of clis
to manage your package managers, plus tooling such as `jq` and whatnot, just to
massage your scripts into something compatible with `docker build`.

## The problem

A good example is building production images for rust. Building ci docker images
for rust is a massive pain. This is because when you run `cargo build`, or any
of its siblings, it refreshes the package registry if needed, downloads
dependencies, forms the dependency chain between crates, and builds the final
crates / binaries. This is very bad for caching, because you can't tell `cargo`
to only fetch dependencies and compile them, but leave your own crates alone.

In general, this means that you will cache-bust your dependencies each time you
make a code change to your crates, no matter how small. A `Dockerfile`, or
rather `buildkit`, isn't able to split the cache between these steps on its
own, because from its point of view it is all a single atomic command.

Existing solutions rely on downloading extra tools to handle it for you, but
those are cumbersome and, tbh, incompatible. Take `cargo-chef`, for example. It
lets you create a `recipe.json` file, which contains a list of all your
dependencies. You can move that file from one step into your build step and
cache the dependencies that way. I've honestly found this really flaky, as the
earlier stage that produces the `recipe.json` would cache-bust all the time.

```Dockerfile
FROM lukemathwalker/cargo-chef:latest-rust-1 AS chef
WORKDIR /app

FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
# Build dependencies - this is the caching Docker layer!
RUN cargo chef cook --release --recipe-path recipe.json
# Build application
COPY . .
RUN cargo build --release --bin app

# We do not need the Rust toolchain to run the binary!
FROM debian:buster-slim AS runtime
WORKDIR /app
COPY --from=builder /app/target/release/app /usr/local/bin
ENTRYPOINT ["/usr/local/bin/app"]
```

The above is the original example, but it has a flaw: it relies on the checksum
of the `recipe.json` staying the same. If you change one of your crates, it
will bust the hash of the `recipe.json`, because we load all the files with
`COPY . .`.

Instead, what we would like to do is load in just the `Cargo.toml` and
`Cargo.lock` files for our workspace, as well as for any crates we've got, and
then dynamically construct empty `main.rs` and `lib.rs` files to act as the
binaries and libraries. This is the simplest approach, but very bothersome in a
`Dockerfile`.

```Dockerfile
FROM rustlang/rust:nightly as base

FROM base as dep-builder
WORKDIR /mnt/src
COPY **/Cargo.toml .
COPY **/Cargo.lock .

RUN echo "fn main() {}" >> crates/<some-crate>/src/main.rs
RUN echo "fn main() {}" >> crates/<some-crate>/src/lib.rs

RUN echo "fn main() {}" >> crates/<some-other-crate>/src/main.rs
RUN echo "fn main() {}" >> crates/<some-other-crate>/src/lib.rs

# ...

# Refreshes the registry, fetches deps, compiles them, and links them to a dummy binary
RUN cargo build

FROM base as builder

WORKDIR /mnt/src

COPY --from=dep-builder /mnt/src/target target
COPY **/Cargo.toml .
COPY **/Cargo.lock .
COPY crates crates

# Compiles user code and links everything together, reusing the dependency build done previously
RUN cargo build
```
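
The difference between the two Dockerfiles comes down to which files feed into
the cache key of the dependency step. Here is a rough sketch of that idea in
plain `rust`. Buildkit's real checksums are more involved, and `cache_key`,
`is_manifest` and the file list below are hypothetical. Hashing every file
means any source edit changes the key, while hashing only the Cargo manifests
keeps it stable.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A file as a (path, content) pair. This stand-in only shows the principle.
type File<'a> = (&'a str, &'a str);

/// Hash every file that passes `filter` into a single cache key.
fn cache_key(files: &[File<'_>], filter: impl Fn(&str) -> bool) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (path, content) in files {
        if filter(*path) {
            path.hash(&mut hasher);
            content.hash(&mut hasher);
        }
    }
    hasher.finish()
}

/// Only the Cargo manifests and lock files decide which dependencies we need.
fn is_manifest(path: &str) -> bool {
    path.ends_with("Cargo.toml") || path.ends_with("Cargo.lock")
}

fn main() {
    let before = [
        ("Cargo.toml", "[workspace]"),
        ("Cargo.lock", "version = 3"),
        ("crates/app/src/main.rs", "fn main() {}"),
    ];
    // Same workspace after a small edit to our own code.
    let mut after = before;
    after[2].1 = "fn main() { println!(\"changed\"); }";

    // `COPY . .` style: the source edit is part of the key.
    let all_changed = cache_key(&before, |_| true) != cache_key(&after, |_| true);
    // Manifest-only style: the source edit is not part of the key.
    let deps_changed = cache_key(&before, is_manifest) != cache_key(&after, is_manifest);

    println!("key over all files changed: {all_changed}");
    println!("key over manifests changed: {deps_changed}");
}
```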

The Dockerfile approach is very cumbersome, as you have to remember to keep the
set of `echo` lines up to date. You can script your way out of it, but it is
just an ugly approach that is hard to maintain and grok.

## The solution built in dagger

Instead, what we can do in `dagger` is use a proper programmatic tool for
this.

```rust
// Some stuff omitted for brevity

// 1
let mut rust_crates = vec![PathBuf::from("ci")];

// 2
let mut dirs = tokio::fs::read_dir("crates").await?;
while let Some(entry) = dirs.next_entry().await? {
    if entry.metadata().await?.is_dir() {
        rust_crates.push(entry.path());
    }
}

// 3
fn create_skeleton_files(
    directory: dagger_sdk::Directory,
    path: &Path,
) -> eyre::Result<dagger_sdk::Directory> {
    let main_content = r#"fn main() {}"#;
    let lib_content = r#"fn some() {}"#;

    let directory = directory.with_new_file(
        path.join("src").join("main.rs").display().to_string(),
        main_content,
    );
    let directory = directory.with_new_file(
        path.join("src").join("lib.rs").display().to_string(),
        lib_content,
    );

    Ok(directory)
}

// 4
let mut directory = directory;
for rust_crate in rust_crates.into_iter() {
    directory = create_skeleton_files(directory, &rust_crate)?;
}
```

You can find this in
[cuddle-please](https://git.front.kjuulh.io/kjuulh/cuddle-please/src/branch/main/ci/src/main.rs),
which uses dagger as part of its `ci`. Anyways, for those not versed in `rust`,
which most people probably aren't, here is what is happening in rough terms:

1. We create a list of known crates. In this case `ci` is added up front,
   because it is a bit special.
2. We list all folders in the `crates` folder and add them to `rust_crates`.
3. An inline function is created, which adds new files to an existing
   directory. In this case it adds both a `main.rs` and a `lib.rs` file with
   some dummy content at a given path.
4. Here we apply these files for all the crates we found above.

This is roughly equivalent to what we had above, but this time we can test
individual parts of the code, or even share it. For example, I could create a
rust library containing this functionality, which I could reuse across all of
my projects. This is a game-changer!
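
As an illustration of trying this logic in isolation, here is a minimal sketch
that swaps dagger's `Directory` for the local filesystem, so it runs with only
the standard library. The names `write_skeleton` and `discover_crates`, and the
scratch directory, are assumptions for the example and are not part of
cuddle-please.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Write placeholder `src/main.rs` and `src/lib.rs` files for one crate.
fn write_skeleton(root: &Path, crate_path: &Path) -> io::Result<()> {
    let src = root.join(crate_path).join("src");
    fs::create_dir_all(&src)?;
    fs::write(src.join("main.rs"), "fn main() {}\n")?;
    fs::write(src.join("lib.rs"), "fn some() {}\n")?;
    Ok(())
}

/// Every sub-directory of `root/crates` counts as a crate.
/// The result is sorted so the output is stable.
fn discover_crates(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut crates = Vec::new();
    for entry in fs::read_dir(root.join("crates"))? {
        let entry = entry?;
        if entry.metadata()?.is_dir() {
            crates.push(PathBuf::from("crates").join(entry.file_name()));
        }
    }
    crates.sort();
    Ok(crates)
}

fn main() -> io::Result<()> {
    // Hypothetical scratch workspace, so nothing in a real project is touched.
    let root = std::env::temp_dir().join("skeleton-demo");
    fs::create_dir_all(root.join("crates/app"))?;
    fs::create_dir_all(root.join("crates/lib"))?;

    for crate_path in discover_crates(&root)? {
        write_skeleton(&root, &crate_path)?;
        println!("stubbed {}", crate_path.display());
    }
    Ok(())
}
```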

> Note that rust is a bit more verbose than the other sdks, especially in
> comparison to the dynamic ones, such as Python or Elixir. But to me this is a
> plus, because it allows us to work in the language we're most comfortable
> with, which in my case is `rust`.

You can look at the rest of the
[file](https://git.front.kjuulh.io/kjuulh/cuddle-please/src/branch/main/ci/src/main.rs),
but now if I actually build using `cargo run -p ci`, the first run does
everything while it builds up its cache. Afterwards, if I make a code change in
any of the files, only the binary will be recompiled and linked.

This is mainly because of these two imports of files (which are equivalent to
`COPY` in Dockerfiles):

```rust
// 1
let dep_src = client.host().directory_opts(
    args.source
        .clone()
        .unwrap_or(PathBuf::from("."))
        .display()
        .to_string(),
    dagger_sdk::HostDirectoryOptsBuilder::default()
        .include(vec!["**/Cargo.toml", "**/Cargo.lock"])
        .build()?,
);

// 2
let src = client.host().directory_opts(
    args.source
        .clone()
        .unwrap_or(PathBuf::from("."))
        .display()
        .to_string(),
    dagger_sdk::HostDirectoryOptsBuilder::default()
        .exclude(vec!["node_modules/", ".git/", "target/"])
        .build()?,
);
```

1. Loads in only the Cargo files. This means we only cache-bust if any of those
   files change.
2. Loads in everything except for a few excluded folders. This is a mix of
   `COPY` and `.dockerignore`.
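
To get a feel for what the two filters select, here is a rough model in plain
`rust`. It does not use dagger's actual glob matching; `is_cargo_file` and
`is_excluded` are simplified stand-ins, and the file list is invented.

```rust
/// Rough stand-in for `include(["**/Cargo.toml", "**/Cargo.lock"])`:
/// keep Cargo manifests and lock files at any depth.
fn is_cargo_file(path: &str) -> bool {
    matches!(path.rsplit('/').next(), Some("Cargo.toml" | "Cargo.lock"))
}

/// Rough stand-in for `exclude(["node_modules/", ".git/", "target/"])`,
/// only looking at top-level folders.
fn is_excluded(path: &str) -> bool {
    ["node_modules/", ".git/", "target/"]
        .iter()
        .any(|dir| path.starts_with(*dir))
}

fn main() {
    let files = [
        "Cargo.toml",
        "Cargo.lock",
        "crates/app/Cargo.toml",
        "crates/app/src/main.rs",
        "target/debug/app",
        ".git/HEAD",
    ];

    // What the dependency build gets to see.
    let dep_src: Vec<_> = files.iter().filter(|p| is_cargo_file(p)).collect();
    // What the real build gets to see.
    let src: Vec<_> = files.iter().filter(|p| !is_excluded(p)).collect();

    println!("dep_src: {dep_src:?}");
    println!("src:     {src:?}");
}
```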

Now we simply load them at different times and execute builds in between:

```rust
// 1
let rust_build_image = client.container().from(
    args.rust_builder_image
        .as_ref()
        .unwrap_or(&"rustlang/rust:nightly".into()),
);

// 2
let target_cache = client.cache_volume("rust_target");

// 3
let rust_build_image = rust_build_image
    .with_workdir("/mnt/src")
    .with_directory("/mnt/src", dep_src.id().await?)
    .with_exec(vec!["cargo", "build"])
    .with_mounted_cache("/mnt/src/target/", target_cache.id().await?)
    .with_directory("/mnt/src/crates", src.directory("crates").id().await?);

// 4
let rust_exe_image = rust_build_image.with_exec(vec!["cargo", "build"]);

// 5
rust_exe_image.exit_code().await?;
```

1. Does a `FROM` equivalent, creating a base container.
2. Creates a cache volume. This is extremely useful, because you can set up a
   shared cache pool for these volumes, so that you don't have to rely on
   buildkit layer caching (which is what Dockerfiles normally use).
3. Here we build the image:
   1. First we set the workdir.
   2. Then we load in the directory fetched above. This includes the Cargo
      files as well as the stub `main.rs` and `lib.rs` files.
   3. Next we fire off a normal build with `with_exec`, which functions like a
      `RUN`. Here we build the stubs, which refreshes the registry and
      downloads and compiles the dependencies.
   4. We load in the rest of the source and replace `crates` with our own
      crates. This loads in the proper `.rs` files.
4. We now build the actual binary.
5. We trigger `exit_code` to actually run the dag. Everything up to this point
   has been lazy, so if we didn't call `exit_code`, or perform another action
   that needs a result, nothing would actually execute. Now dagger will figure
   out the most optimal way of running our pipeline for maximum performance and
   cacheability.
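
The laziness in step 5 can be illustrated with a toy builder. This is not the
dagger sdk; `LazyPipeline` and its `run` method are hypothetical. Each
`with_exec` call only records a step, and nothing happens until the terminal
call, which here plays the role of `exit_code`.

```rust
/// A toy stand-in for a dagger container: steps are recorded, not executed.
#[derive(Default)]
struct LazyPipeline {
    steps: Vec<String>,
}

impl LazyPipeline {
    /// Queue a command. Nothing runs yet, it only extends the list of steps.
    fn with_exec(mut self, cmd: &[&str]) -> Self {
        self.steps.push(cmd.join(" "));
        self
    }

    /// Terminal call: only now are the queued steps "run", in order.
    /// Running is simulated by returning one log line per step.
    fn run(self) -> Vec<String> {
        self.steps
            .into_iter()
            .map(|step| format!("ran: {step}"))
            .collect()
    }
}

fn main() {
    let pipeline = LazyPipeline::default()
        .with_exec(&["cargo", "build"])
        .with_exec(&["echo", "done"]);

    // Up to this point no command has been executed.
    println!("queued {} steps", pipeline.steps.len());

    for line in pipeline.run() {
        println!("{line}");
    }
}
```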

## This is very verbose

Rust is a bit more verbose than other languages, especially in comparison to
scripting languages. In the future, I would probably package this up and
publish it as a `crate` I can depend on myself. That would make it quite easy
to share this across all of my projects.

Like the project in my previous
[post](https://blog.kasperhermansen.com/posts/cuddle/), that crate could serve
as a single component that can be tested in isolation and act as a proper api,
and tool in general. This is something very hard, if not impossible, with
regular `Dockerfiles` (without templating).

## Conclusion

I've shown a rough outline of what dagger is, why it is useful, and how you can
do things with it that aren't possible using a `Dockerfile` alone. The code
examples are somewhat contrived, but they highlight that you can solve real
problems using this new paradigm of mixing code with orchestration. In this
case, an unholy union of `rust` and `buildkit` through `dagger`.