Clangd Setup for Embedded Development

February 1, 2025

Clangd and `compile_commands.json`

Clangd is a wonderful language server for C/C++ development. It works with tons of different compilers, target architectures, configurations, build environments, etc. In order to support all these different scenarios, it makes use of a special file (increasingly becoming a standard among build systems) named compile_commands.json.

This file is generated by build systems like CMake or Meson, and is quite literally just a big list of compiler commands listed out in json format. That is, for each source file that the build system inputs to your compiler, compile_commands.json contains that file and the list of flags your compiler uses to build it (think: standard library version, include paths, environment variables, config flags, etc). This allows cool things to work like clangd correctly differentiating between the different c/c++ versions for included libraries vs your own code!

With most typical environments (eg. compiling to native host architecture, regular linux distros), as long as clangd finds your compile_commands.json, you’ll be good to go. Sometimes you’ll have to add a build system flag to generate it, and maybe symlink the json file to the project root, but that’s it.

With compile_commands.json, Clangd will do the same thing your compiler does for each file: first try to resolve includes using any passed-in include paths (eg. via -I or -isystem), then check the usual places like /usr/lib/. Most critical things, like libc or libstdc++ headers, will be found in the usual system directories.

Embedded Development

When writing code for embedded devices, or even any devices with a different system architecture, things change up a bit!

The header files in the usual places are no longer valid for resolving includes; you’re not building binaries for your native system, so native system libraries don’t apply anymore. Sometimes, if everything clangd needs to know is specified in your compile_commands.json, it will still Just Work™. However, if clangd fails to understand your compiler toolchain (as is often the case when using gcc and especially with gcc-arm-embedded), you will have errors in your editor!

To fix this, clangd has a special cli flag: --query-driver. This flag is a glob pattern that whitelists certain gcc-compatible compiler binaries that clangd is allowed to then query for more correct include paths. This way, clangd can call out to gcc (or gcc-arm-embedded) directly instead of guessing what it thinks the include paths should be!

Let’s see this in action. First, without --query-driver (output shortened for clarity):

$ clangd --check=main.cpp
I[02:28:06.482] Loaded compilation database from compile_commands.json
I[02:28:06.485] Compile command from CDB is: arm-none-eabi-g++ --target=arm-none-eabi ...
I[02:28:06.500] internal (cc1) args are: -cc1 -triple thumbv7em-none-unknown-eabi ...
E[02:28:07.548] [pp_file_not_found] Line 6: in included file: 'gnu/stubs-32.h' file not found
I[02:28:07.601] All checks completed, 1 errors

Clangd knows from our compile_commands.json that we (the build system) are using arm-none-eabi-g++ to compile our code for a non-native target. But since it still uses clang’s header resolving logic under the hood instead of our compiler’s, it gets some stuff wrong. In this case, the error about 'gnu/stubs-32.h' file not found is because clangd is trying to use our system’s glibc headers, which causes conflicts due to architecture differences.

Now again, but with --query-driver='**' (the ** tells clangd it can query any compiler binary):

$ clangd --query-driver='**' --check=main.cpp
I[02:28:06.482] Loaded compilation database from compile_commands.json
I[02:38:09.179] System includes extractor: successfully executed arm-none-eabi-g++
      got includes: "/path/to/gcc-arm-embedded/arm-none-eabi/include/c++/12.3.1, ..."
I[02:28:06.485] Compile command from CDB is: arm-none-eabi-g++ --target=arm-none-eabi ...
I[02:28:06.500] internal (cc1) args are: -cc1 -triple thumbv7em-none-unknown-eabi ...
I[02:38:10.655] All checks completed, 0 errors

As we can see, clangd is now calling our embedded gcc compiler directly, asking it for the correct include paths to add to its resolving logic. Using strace, we can even see the command it’s calling directly:

$ strace -f -e trace=process -o out.txt clangd --query-driver='**' --check=main.cpp > /dev/null 2>&1
$ cat out.txt
...
4063415 execve("/nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/arm-none-eabi-g++", ["/nix/store/i3m8xrhhnb7l83cpwdd9r"..., "-E", "-v", "-x", "c++", "-"], 0x7ffec9e80478 /* 177 vars */ <unfinished ...>
...

We can see clangd invoked arm-none-eabi-g++ -E -v -x c++ -. Let’s run it ourselves and check the output:

$ arm-none-eabi-g++ -E -v -x c++ -
#include "..." search starts here:
#include <...> search starts here:
 /nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/../lib/gcc/arm-none-eabi/12.3.1/../../../../arm-none-eabi/include/c++/12.3.1
 /nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/../lib/gcc/arm-none-eabi/12.3.1/../../../../arm-none-eabi/include/c++/12.3.1/arm-none-eabi
 /nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/../lib/gcc/arm-none-eabi/12.3.1/../../../../arm-none-eabi/include/c++/12.3.1/backward
 /nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/../lib/gcc/arm-none-eabi/12.3.1/include
 /nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/../lib/gcc/arm-none-eabi/12.3.1/include-fixed
 /nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/../lib/gcc/arm-none-eabi/12.3.1/../../../../arm-none-eabi/include
End of search list.

And we see our compiler toolchain’s include paths! Awesome!

NixOS’s Fault

One of the design goals of clangd (as far as I understand) is that it behaves very similarly to clang. This makes sense; you don’t want your code to compile with no errors but then to have errors from clangd in your editor at the same time!

So, since compilers automatically search common default paths (like /usr/lib) for header files and shared libraries, clangd does too. This way, if you spin up a main.c and add a #include <stdio.h>, it just works; you don’t have to say “Please use my system compiler’s libc for printf.”

However, these paths don’t exist in NixOS, as everything is stored in unique isolated paths under /nix/store. So what do the nixpkgs authors do to make it still work? The same thing they do for compilers: add a wrapper bash script to set environment variables!

Unfortunately, in the case of clangd, it’s not a very good wrapper. In fact, in the earlier working example using --query-driver, I had to use the unwrapped clangd binary from nixpkgs to get to the correct output! Let’s take a look at why it’s bad (note that “@clang@” is substituded with clang’s nix store output path during the build process):

...
export CPATH=${CPATH}${CPATH:+':'}$(buildcpath ${NIX_CFLAGS_COMPILE} \
        $(<@clang@/nix-support/libc-cflags)
    ):@clang@/resource-root/include
export CPLUS_INCLUDE_PATH=${CPLUS_INCLUDE_PATH}${CPLUS_INCLUDE_PATH:+':'}$(buildcpath ${NIX_CFLAGS_COMPILE} \
        $(<@clang@/nix-support/libcxx-cxxflags) $(<@clang@/nix-support/libc-cflags) \
    ):@clang@/resource-root/include

exec -a "$0" @unwrapped@/bin/$(basename $0) "$@"

According to gcc’s docs, CPATH and CPLUS_INCLUDE_PATH tell the compiler (clangd uses clang for parsing under the hood, so it’s affected too!) to search the specified list of directories for header files exactly as if they were passed in with -I.

I other words, the wrapper is unconditionally hardcoding paths to the standard library header files asociated with our system’s clang; which, in our case, is glibc and libstdc++. Yikes! What about when we’re not using those? This wrapper assumes that we’re always using clangd for native-only development and never with custom toolchains.

I am unclear as to whether similar errors would occur in regular linux distributions without using --query-driver. My gut tells me that as long as the --query-driver’s include flags come first in the list, having the system’s c/c++ standard library headers hardcoded as /usr/local/whatever paths wouldn’t cause errors. But it is worth acknowledging that NixOS’s clangd wrapper caused the error I was having.

Since we don’t need any of the system’s default header files for our embedded development, running the unwrapped version of clangd with our query-driver='**' passed in will work just fine. However, this does break the works-out-of-the-box behavior that clang and gcc have and that I mentioned earlier, which is a hard sacrifice to make.

For now, I have my editor configured to use the unwrapped clangd binary, which I have added to my path using the following Nix code:

home.packages = let
  clangdUnwrapped = pkgs.runCommand "clangdUnwrapped" {} ''
    mkdir -p $out/bin
    ln -s ${pkgs.clang.cc}/bin/clangd $out/bin/clangd-unwrapped
  '';
in [
  clangdUnwrapped
];

Neovim lspconfig configuration for clangd:

local servers = {
  ...
  clangd = {
    cmd = {
      'clangd-unwrapped',
      '--query-driver=**', -- whitelists all compiler binaries. Technically a security risk, but I'm usually working in trusted environments
      '--enable-config', -- allows clangd to parse global or project-local configuration files
      '--background-index',
      '--clang-tidy',
      -- '--log=verbose',
    },
  },
}

Inconvenient for sure, but it works well enough as a temporary solution.

In the meantime, I’ll keep an eye on these issue and PRs:

(PR) https://github.com/NixOS/nixpkgs/pull/354755
(Issue) https://github.com/NixOS/nixpkgs/issues/348791
(Issue) https://github.com/clangd/clangd/issues/2181

That’s all. Hope you enjoyed or learned something!

Using Python VirtualEnvs in NixOS

December 13, 2024

Quick preface: this post is about a NixOS specific issue and thus expects some base level Nix experience. Hopefully soon when I have more time I’ll write more about how Nix works, why to use it, its tradeoffs, etc. That’s for another time though for now.

Python Venvs

Python virtual environments (venvs) are the official solution for managing python dependency hell and version conflict. They’re super nice; each project gets its own venv which houses all the project’s dependencies/libraries separately from everything else on the system in a simple and familiar layout:

Creating one is as simple as python -m venv .venv using your system-installed python, which is then set to be the python interpreter for that venv via a symlink (just use a different python3.x binary during setup to set a different python version for that venv).

To “enter” the venv, run source .venv/bin/active (or whatever the equivalent is for your inferior OS). Now pip as seen in the directory tree is added to your path, and you can pip install xyz just as if you were installing globally, but it only affects the local venv!

Python in NixOS

NixOS has it’s own special way to manage python installations. Nixpkgs, the Nix package collection, has over 8 thousand python libraries packaged as first class derivations, which are composable with any given python interpreter to create fully isolated, infinitely granular, unbreakable and immutable python installations. It’s extremely powerful (far more than regular python venvs are), however it requires a complete paradigm shift, which isn’t beneficial whatsoever for those not using Nix. So, especially when working with teams, it’s sometimes just not worth the extra effort.

Thankfully venvs work just fine with Nix’s python binaries. Just put any python version into your PATH and use the same command as above. There’s a catch though, and it has to do with compiled python libraries.

Compiled Python Libraries

Many if not most python libraries (eg. numpy, polars, pandas) are written in more efficient compiled programming languages and exposed as python wrappers around the more efficient code. This requires compiling native executables (or shared libaries) for the target machine and having python load them into the runtime correctly. However, these compiled libraries are often dynamically linked, requiring other dynamic libraries (libstdc++.so.6 for example) to be present on the system to run.

On most linux/unix systems this is all well and good as the libraries are present in /usr/local/lib/ and the dynamic library loader (/lib/ld-linux-x86-64.so.2) which python is linked against knows how to find them there. However, on NixOS, those libraries don’t exist there, but rather in the Nix store (/nix/store/…)! And since we don’t control the compilation process of python libraries from pypi, we aren’t able to patchelf them automatically to reference the nix store library paths instead.

Of course since python is so popular there is a dedicated wiki.nixos.org page for this. There’s few solutions listed:

The first one is a tool that runs patchelf on all the compiled libraries in your venv, but it requires setting up the venv a little differently and a manual run on every package update, which feels too clumsly
The second one recommends using buildFHSEnv, which basically puts your shell in an environment that emulates the regular FHS paths like /lib. I’ve used this a lot before and it works well, but still requires manual setup work for each new python project, which I’d like to avoid.
The last suggestion is to use a tool called nix-ld, which is what I’ll be going with!

Nix-ld

Nix-ld’s intended purpose is indeed to make unpatched dynamic executables run transparently on NixOS, which is exactly what we’re looking for! To accomplish this, it inserts a set of user-defined shared libraries (usually just the minimum necessary) into a path specified under $NIX_LD_LIBRARY_PATH, and adds a wrapper dynamic library loader at the default FHS path which simply adds the aforementioned shared library path to $LD_LIBRARY_PATH before jumpstarting the regular dynamic library loader. That’s a lot of jumble to basically say, it gives all unpatched dynamic executables access to a set of user-defined shared libraries using environment variables.

There’s one problem with this though, which is mentioned in nix-ld’s readme: This breaks subtly when using interpreters from nixpkgs that load the unpatched dynamically compiled libraries; which is exactly our case with Nixpkgs python. This is because in order for the nix-ld shared library path to be used by the interpreter when loading the compiled libary, the interpreter needs to have been started using the wrapper dynamical library loader created by nix-ld, as it does the task of inserting the path into $LD_LIBRARY_PATH of the child process. We can see that it’s still failing, even after installing nix-ld:

python main.py
> ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory

The Full Solution

So, since Nixpkgs python isn’t started with the wrapper dynamic library loader (because it’s properly patched!), we have to do the job of telling it to use the nix-ld shared library path ourselves. We can just do:

LD_LIBRARY_PATH=$NIX_LD_LIBRARY_PATH python main.py

This is already enough to get our python interpreter to properly load dynamic compiled python libraries. But this requires an ugly environment variable manipulation every time you want to call python or pip! Unacceptable. Instead, I prefer to wrap all of the python binaries installed on my system to do this for me, so that I can manipulate venvs without ever having to think about it. This is the generalized solution I came up with to do just that:

home.packages =
  let
    # Some dynamic executables are unpatched but are loaded by patched nixpkgs
    # executables, and therefore never pick up NIX_LD_LIBRARY_PATH. For
    # example, interpreters that use dynamically linked libraries, like python3
    # libraries run by nixpkgs' python. This wraps the interpreter for ease of
    # use with those executables. WARNING: Using LD_LIBRARY_PATH like this can
    # override some of the program's dylib links in the nix store; this should
    # be generally ok though
    makeNixLDWrapper = program: (pkgs.runCommand "${program.pname}-nix-ld-wrapped" { } ''
      mkdir -p $out/bin
      for file in ${program}/bin/*; do
        new_file=$out/bin/$(basename $file)
        echo "#! ${pkgs.bash}/bin/bash -e" >> $new_file
        echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$NIX_LD_LIBRARY_PATH"' >> $new_file
        echo 'exec -a "$0" '$file' "$@"' >> $new_file
        chmod +x $new_file
      done
    '');
  in
  with pkgs; [ ... (makeNixLDWrapper python3) ... ]

Taking a look at the embedded bash script more closely, with our python example (All nix store paths are automatically substituted in during the build process):

mkdir -p /nix/store/xgb<...>xsc-python3-nix-ld-wrapped/bin
# the python derivation has 14 binaries and/or aliases! Let's wrap each one
for file in /nix/store/3wb<...>jir-python3-3.11.9/bin/*; do
  new_file=$out/bin/$(basename $file)
  echo "#! /nix/store/1xh<...>qns-bash-5.2p32/bin/bash -e" >> $new_file

  # !! Insert user-defined shared library path into child process
  echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$NIX_LD_LIBRARY_PATH"' >> $new_file
  # !! Call the unwrapped binary, but set its argv[0] to this wrapper script
  echo 'exec -a "$0" '$file' "$@"' >> $new_file

  chmod +x $new_file
done

I found that it’s important to set argv[0] of the child process (regular unwrapped python in our case) to be the path to its wrapper script instead of the regular binary (see the exec -a flag). This is because if you don’t, creating new venv’s will symlink the unwrapped python interpreter into the venv, which will replace the wrapped one as soon as you enter rendering the wrapper useless.

(I also did try using bash heredocs instead of the separate echo’s, but kept getting mysterious syntax/formatting errors. Bash is so cursed lmao)

That’s all, hope you enjoyed or learned something :P

Building Non-Native Docker Images with QEMU and binfmt

November 2, 2024

So, you’ve finally finished the perfect Dockerfile for your project. You build it and test it on your machine; all is working well. You export the tarball, scp it over to your production server, and load it into the docker daemon - but wait! You forgot that your server has an Arm CPU and your laptop is x86_64. This won’t work, it’s an architecture mismatch! Womp womp.

There’s 3 solutions to the problem.

Build the Dockerfile on your production server (or another Arm computer). You’ll have to move over your project’s source code and dependencies for this too… Ew!
Rewrite your Dockerfile to support cross compilation. That’s a lot of work, and requires cross compilation compiler flags…
Simply enable ✨magic✨. And by that I of course mean emulating the docker build process using QEMU and binfmt_misc! Keep reading to learn more.

QEMU

As described on its website, QEMU is “a generic and open source machine emulator and virtualizer.” It’s super powerful and honestly feels like some black magic; but I’ll leave that for you to discover! It has two main modes: System Emulation and User Mode Emulation.

QEMU Zsh autocomplete list

With system emulation, QEMU emulates an entire foreign computer (optionally paired with a hypervisor like KVM/Xen to take advantage of native virtualization features), allowing full operating systems to be run that are built for nearly any non-native CPU architecture.

With user mode emulation, QEMU emulates only the CPU of an non-native binary, allowing, for example, a powerpc or armv6 binary to be run on an x86_64 machine. This uses dynamic binary translation for instruction sets, syscalls (fixing endianness and pointer-width mismatches), signal handling and threading emulation - in other words, basically magic. Notably it’s much faster than system emulation, which has to emulate the kernel, peripheral devices, and more.

As I mentioned earlier, our goal is to emulate the docker build process so we can generate non-native container images; i.e., docker build -t my-container . but with a different cpu architecture. User mode emulation is perfect for this!

binfmt

Ok, so we’ve installed QEMU. How do we tell Docker to use it? Here’s where magic part two comes in: binfmt_misc. Straight from the Wikipedia page: “binfmt_misc is a capability of the Linux kernel which allows arbitrary executable file formats to be recognized and passed to certain user space applications, such as emulators and virtual machines.” Essentially, we’re telling the linux kernel, “Actually, you do know how to run this foreign-architecture binary: just feed it into QEMU!” So we get:

$ ./ForeignHelloWorld
zsh: exec format error: ./ForeignHelloWorld    # before
$ ./ForeignHelloWorld
Hello, World!                                  # after

To set this up on most linux distributions, use one of multiarch/qemu-user-static or dbhi/qus or tonistiigi/binfmt; they all do roughly the same thing. For qemu-user-static, simply run

$ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

You might be wondering, how does an isolated docker container install binfmt registrations for qemu persistently on my host kernel??

Registering qemu to binfmt_misc

These “intaller” containers utilize Docker’s --privileged mode to mount the /proc/sys/fs/binfmt_misc directory, which will be the same for the host and in the container. They contain statically compiled builds of qemu with user mode only (qemu-user-static), and register them directly to the mounted binfmt_misc directory. When the container exits, the registrations persist on this host machine. You’ll end up with registrations looking like this:

$ cat /proc/sys/fs/binfmt_misc/qemu-arm   # host machine
enabled
interpreter /usr/bin/qemu-arm-static
flags: F
offset 0
magic 7f454c4601010100000000000000000002002800
mask ffffffffffffff00ffffffffffff00fffeffffff

The two important parts here are flags and interpreter. Again you might be wondering, I don’t see /usr/bin/qemu-arm-static on my host system, so where where does the kernel find the qemu binary?? If you weren’t wondering you should’ve been, because that’s a path to qemu from within the installer container, which was then promptly removed (--rm) altogether. Take a look (credit):

$ docker run --rm multiarch/qemu-user-static:x86_64-aarch64 /usr/bin/qemu-aarch64-static --version
qemu-aarch64 version 7.2.0 (Debian 1:7.2+dfsg-1~bpo11+2)
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers

The fundamental problem is that the kernel has do a path lookup for the registered binfmt interpreter at the time when a foreign binary is invoked. But if it’s invoked in, say, a container or chroot environment, the path to the interpreter is obviously no longer valid. the F flag is the solution to this problem created by Jonathan Corbet. Instead of locating the interpreter binary lazily (at the time of first invocation), it’s given a file descriptor immediately after being registered. This way, when the kernel needs to find the interpreter in a chroot or container where the path doesn’t exist, it just uses the preallocated, always-valid file descriptor for it instead!

This is also why the qemu binaries have to be static; if they were dinamically linked, the dynamic library loader lookup would obviously fail within the chroot/container environment.

So to answer the previous question about where the kernel finds /usr/bin/qemu-arm-static (I’m fairly confident about this but please somebody correct me if I’m wrong!): when the installer container registers qemu-arm-static to binfmt_misc, it’s immediately loaded by the kernel using the valid path from within the installer container, and the given file descriptor for it persists after the container exits. It doesn’t persist on reboot though, which is why the best solution I could find online is literally to rerun the installer on every boot with a systemd service!

Docker Build

Now all we have to do is make the Docker build process use QEMU from our binfmt_misc registrations. Fortunately, this is pretty easy!

Firstly, for a multiplatform build (i.e. if you want to build your docker container natively and in one or more non-native architectures and combine them into one multiarch build), make sure the containerd image store is enabled for the docker daemon. Podman shouldn’t have this problem; just a classic case docker being annoying (see this and this, or my NixOS setting).

Then, you may have to create a new docker buildx builder (especially for multiplatform builds):

$ docker buildx create \
  --name my-container-builder \
  --driver docker-container \
  --driver-opt=default-load=true \
  --use --bootstrap

Lastly, just run your build and specify the target platform!

$ docker buildx build -t my-container-image --platform=linux/arm64 .
      # or
$ docker buildx build -t my-multiplatform-container-image --platform=linux/arm64,linux/amd64,<whatever_else> .

You’ll know it’s working if you see something like this (this is a NextJS build, emulated for aarch64. Note that NixOS paths are slightly sifferent; the underlined path is a symlink to qemu-user-static for aarch64)

NixOS

Of course, NixOS has a super fun and declarative way to set this all up and I can’t help but share that here as well.

Instead of needing to use container-based qemu-user-static binfmt_misc installers (on every boot), NixOS provides a configuration module for binfmt: boot.binfmt.emulatedSystems. This sets up qemu-* for the given systems architectures automatically, even including wasmtime for wasm files and wine for Windows executables! To ensure the enterpreters are statically compiled versions (qemu-*-static), we can use pkgsStatic.qemu-user (requires nixpkgs-unstable; see nixpkgs#314998 and nixpkgs#334859). Lastly, to set the F flag for the registrations, we can use fixBinary = true;. Here’s what we end up with (thanks to Ten for this dicourse comment!):

boot.binfmt.emulatedSystems =
  let
    emulationsBySystem = {
      "x86_64-linux" = [
        "aarch64-linux" # qemu
        "armv6l-linux"
        "armv7l-linux"

        "x86_64-windows" # wine
        "i686-windows"

        "riscv32-linux" # qemu
        "riscv64-linux"

        "wasm32-wasi" # wasmtime
        "wasm64-wasi"
      ];
    }
  in
  emulationsBySystem.${pkgs.system};

# backport of preferStaticEmulators to nixos-24.05
boot.binfmt.registrations = lib.mergeAttrsList (system:
  {
    ${system}={
      interpreter = (pkgsUnstable.lib.systems.elaborate { inherit system; }).emulator pkgsUnstable.pkgsStatic;
      fixBinary = true;
    }
  }) config.boot.binfmt.emulatedSystems;

That last part is a backport of preferStaticEmulators for nixos-24.05. In nixos-unstable and nixos-24.11 it will become just

boot.binfmt.preferStaticEmulators = true;

Conclusion

In this post I (attempted to) explain the process of using QEMU user mode emulation and binfmt_misc registrations to automagically build non-native docker images. I hope you learned something! Feel free to bother me with questions, comments, or corrections.

Here are some of the resources I used:

P.S. In one of my next posts, I’ll talk about how I irradicated Dockerfiles alltogether, replacing them completely with pure nix :) More to come!

Compiling Sway with wlroots from Source

May 25, 2024

I used this process when I needed to compile the latest commit of Sway to test a new feature, and my distro (NixOS) didn’t have the required wlroots version packaged so I needed to compile that from source too. With the correct clangd setup in your editor, autocomplete/intellisense for sway & wlroots will both work flawlessly. Disclaimer: This isn’t official; there may be better ways to do it (feel free to ping me on the original gist). With that out of the way, here are the steps I used:

wlroots

Clone (git clone https://gitlab.freedesktop.org/wlroots/wlroots)
Obtain deps

Use your system’s package manager to obtain all of wlroots’ dependencies (listed here). A Nix shell which accomplishes just that is at the bottom of this post.

Setup build & Compile

--prefix tells Meson to use a new out subdir as the prefix path (for .so and .h files ) instead of the default /usr/local
-Doptimization is to avoid compiler errors from Nix’s fortify flags (alternatively use Nix’s hardeningDisable = [ "all" ];)

meson setup build -Ddebug=true -Doptimization=2 --prefix=/home/<user>/dev/wlroots/build/out
ninja -C build install # installs headers and shared object files to the previously specified prefix

Sway

Clone (git clone https://github.com/swaywm/sway/)
Obtain deps

Use your system package manager again (or the Nix shell at the end of this post). Sway’s build deps are listed here

Setup build & compile

The fun part:

PKG_CONFIG_LIBDIR="/home/<user>/dev/wlroots/build/out/lib/pkgconfig/" meson setup build -Ddebug=true -Doptimization=2
# ^ tells Meson to look for wlroots in our previously specified location
ninja -C build

Nix shell

Running nix-shell (or nix develop -f ./shell.nix) with this shell.nix in your CWD will add all of the listed dependencies to your environment without installing them to your system.

{ pkgs ? import <nixpkgs> { }, ... }:
pkgs.mkShell {
  name = "wlroots";
  packages = with pkgs; [
    # Wlroots deps
    libdrm
    libGL
    libcap
    libinput
    libpng
    libxkbcommon
    mesa
    pixman
    seatd
    vulkan-loader
    wayland
    wayland-protocols
    xorg.libX11
    xorg.xcbutilerrors
    xorg.xcbutilimage
    xorg.xcbutilrenderutil
    xorg.xcbutilwm
    xwayland
    ffmpeg
    hwdata
    libliftoff
    libdisplay-info

    # Sway deps
    libdrm
    libGL
    wayland
    libxkbcommon
    pcre2
    json_c
    libevdev
    pango
    cairo
    libinput
    gdk-pixbuf
    librsvg
    wayland-protocols
    xorg.xcbutilwm
    wayland-scanner
    scdoc

    # Build deps
    pkg-config
    meson
    ninja
  ];
}

How to Control External Monitor Brightness in Linux

May 11, 2024

Do you have an external monitor? Do you want to change it’s backlight brightness, but hate having to reach to fiddle with it’s awkwardly placed and unintuitive button interface?

Me too! Thankfully, there’s a widely supported protocol called Display Data Channel (DDC) based on i2c that lets us solve this problem in a nice way. It allows for communication between a computer display and a graphics adapter - things like setting color contrast, getting model name/date information, and of course, setting backlight brightness.

Control displays using ddcutil

To manually query or change monitor settings from the command line, install the ddcutil program. Use it to detect which of your monitors are ddc-capable with:

ddcutil detect

or to set backlight brightness with:

ddcutil setvcp 10 $BRIGHTNESS -b $I2C_BUS

where 10 corresponds to the backlight brightness setting, and $I2C_BUS is the unique i2c ID given to your monitor (/sys/bus/i2c/devices/i2c-{n}) which should be reported to you by ddcutil detect.

This process is definitely not ideal though; scripting ddcutil is quite painful (trust me on that) and prone to errors (running multiple commands too quickly causes “communication failed” errors!) Plus, nobody wants to manually detect and set bus ID’s!

Furthermore, none of the other cool brightness control software or modules (like Waybar’s backlight module) will work with your monitors. Thankfully, there’s a better solution!

ddcci-driver-linux

Linux has a standard “display brightness” interface (which is what’s used by all the brightness control software), in which each display is given a /sys/class/backlight/ entry for programs to interact with. To make this work with external ddc-capable monitors as well as regular laptop displays, one simply needs a program that bridges the gap between linux’s interface (/sys/class/backlight/) and ddc commands (/sys/bus/i2c/devices/i2c-{n}). Which is exactly what the ddcci-driver-linux kernel module does!

After properly installing the ddcci-driver-linux kernel module, /sys/class/backlight/ddcci{n} directories will be created for each capable external monitor. Now any backlight program will work with them!

Nvidia sucks

Unfortunately, Nvidia graphics cards can cause some trouble. Since the graphics adapter (in the case of an Nvidia GPU, Nvidia’s drivers) is required to do the messaging with capable monitors, it is responsible for some of the ddc-capability detection. And of course, Nvidia’s implementation of that is known to be faulty/broken.

If ddcutil doesn’t detect your monitors, try these workarounds. If the ddcci-driver-linux kernel module doesn’t create the necessary /sys/class/backlight/ entries, try this workaround (I implemented it in my NixOS configs here).

A Website? Explained.

July 9, 2023

When it comes to building and making things, I almost never feel satisfied without knowing how said thing works under the hood. A personal website is no different. In fact, I put off this project for so long for just that reason; I knew how much I didn’t know about the web, and knew that I wanted to learn it all properly instead of skimping out for some existing already-built-for-you solution.

I had never properly worked with the web before; things like how JavaScript frameworks work, what a web server does, or how SSL certificates are granted were all things I knew I would need to learn.

This article serves as a documentation of that process, and an explanation of the choices I made that led me to this website. I find that writing stuff like this down is the best way to make it stick. But it anyone else also somehow finds it useful, great!

Note: As my motivation came more from the desire to learn than to make a production-ready and maintainable system, my decisions may seem unsensible or borderline rediculous to experienced web people. Deal with it :P

The Backend

I started with the backend, figuring out how I would host the website, and learning the basics of how web hosting works in the process. I knew I didn’t want to use anything automatic with too many details hidden (e.g. Vercel), so I opted to find a cloud server that I could get access to for free (no trials) that I could host my website on manually.

After some searching, I found that the Oracle Cloud Always Free tier has the best free options out of any provider. So I got a VM of the shape that was available, and began installing software.

I opted for using the Nginx webserver, as it offers very low level and complete control over the webhosting, and is very efficient (and seemingly better than Apache at this point).

For SSL certs, I chose to use CertBot, which uses the LetsEncrypt certificate authority (this decision didn’t last - more on this later!). It was a relatively simple process, but it involved installing more software and many specific steps, commands, and configurations.

In the back of my head, as I continued to make all these small changes and additions that are impossible to remember, I knew I would regret not using something more reproducible (Docker, Nix, Bash scripts, etc). I knew that if I ever had to switch server providers or rebuild the system from scratch for any reason, I would have to relearn many small details all over again. But I continued on for the time being, to at least get to the point of hosting a webpage (I’ll get back to this later too).

After many many hours of:

configuring firewalls, iptables and DNS records
learning the Nginx syntax and its capabilities
finally understanding file permissions in Linux
installing SSL certs with CertBot
many failed attempts at fixing the SELinux security layer before I finally gave up and disabled it

I finally had a default landing page hosted on my domain. The next order of business:

The Frontend

I have only ever worked with raw HTML and JavaScript, and barely even any CSS. I knew however that I would want to write blogs in Markdown, and that I would want at least some extra capabilities and QOL features than what you get from just those basics. So I began to search for Static Site Generators (SSGs) and frameworks that I might want to use. Some things I looked for:

I don’t need lots of functionality, but I like a high complexity ceiling (aka opt-in complexity)
No rediculous abstractions; I still want to understand the entire system and compilation process
Preferably lightweight, minimal to no client-side JS
Fast compilation / build time

Out of those preferences, I found myself quite liking the approach that Astro takes. It allows for fully opt-in client-side JavaScript, is very simple to start with but allows for practically unlimited complexity (even mixing and matching any other JavaScript frameworks together), and has extremely well designed features, such as content collections. I spent a long long time over-analyzing and not able to make the final decision on what to use (there are far too many options), but eventually I just decided to go with it.

During the frontend design process, I learned that tinkering and perfecting with tiny details is quite addicting; enough to significantly increased the amount of time it took for me to finish putting the website together. Many hours of productivity have been accidentally lost to those details. It’s a lot easier to mess with paddings and drop-shadows in CSS than it is to think about what to write! I guess it’s something good to know about myself :p

As I continued to develop the website though, I had a realization. I now had access to a server fully capable of hosting my website, and my website design in development. But how was I going to connect them? How was I going to take the build ouput and transfer it over to the server?

Connecting the Frontend and Backend

I had to figure out a good way to transfer the website files to the server. Should I build the site locally or on another platform and copy that output to the server’s web-root directory manually after every update? Should I build the website on the server itself and constantly poll for updates in a loop?

After some thinking, I came up with these desires:

Source code for the website should be hosted on GitHub, but the build process should NOT tied into it
- Simple webhooks are OK - most git forges should have them
- GitHub Actions are NOT OK - they’re too rooted into GitHub, and hides too much of the process
The website is built separately from my computer, on the server itself
All I should have to do to trigger the rebuild and deploy process is a git push

I found out that GitHub (not surprisingly) provides a quite convenient webhook interface, allowing reposity owners to configure webhooks that get triggered upon different events that occur within the repository (such as a new commit or a push). This works perfectly for my case, as I can have the hosting server listen for said webhooks and begin the rebuild process when they’re received, which only requires me to run a git push to start the process! The only difficult part left is figuring out how to make the webhosting server actually listen for those webhooks.

A GitHub webhook can send to any IP or URL on any port, as a POST request with specific data and headers. Coming into this, I had absolutely no idea how I could make a computer listen for requests, or what that even really looked like or meant. Do you need a separate application for it? Is it common to have a public-facing server listen for random POST requests on an open port?

What I did know is that I already had an Nginx server running that was listening for POST requests, but for the website. So naturally, that was the first place my brain went to look for potential answers. But as I was too impatient to scour StackOverflow to decide if this was a good train of thought, I decided to have a (what turned out to be very long and detailed) conversation with ChatGPT about it! (that link contains a large portion of the conversation)

I was blown away by ChatGTP’s ability to understand my problem, and especially with its knowledge surrounding Nginx and its syntax. After a lot of conversation, I ended up with another Nginx server block–in the same config file I was already using for webhosting–to listen for the webhook POST requests and pass them to a “webhook server” running on port 3000 (something that parses the requests and can then execute commands depending on them): proxy_pass http://localhost:3000;.

At first, it recommended me to use a simple Bash script with NetCat as the “webhook server”, to listen to and parse the requests. That wasn’t an ideal solution though, due to many limitations and NetCat’s barebones-ness. So I searched online for other webhook servers, and found 2, both written in Go and hosted on GitHub. One had more recent commits, but the other had far more stars and popularity and has a GitHub Webhook configuration already built in, so I went with that one.

The webhook server I picked turned out to be very helpful and easy to work with. I copied the default GitHub Webhook configuration available in the repo, tweaked some parameters and added my secret field (basically an encrypted password), and set it to run my rebuild bash script (which I would show if not for security concerns). All the script has to do was pull the latest updates from my websites GitHub repo, rebuild, and copy the output to the webroot folder. And bam, I had a working connection between the front end source code and the backend hosting server!

An over-engineered diagram of the setup at this stage (I got carried away with Excalidraw lol):

Website Hosting Diagram

The Backend, Revisited (Dockerization)

At this point, I had configured and installed more things on the Oracle Cloud server than I could count. I knew that it would become a problem, as if at any time I would have to switch the server hosting platform or rebuild the system for any reason, I would have to figure it all out again. Installing tons of packages and software, configuring iptables, installing SSL certs, configuring file permissions for Nginx and its webroot, and working with but finally disabling SELinux. It was a lot.

Sensing my future regret, I decided to rebuild as much as I could of the setup in a Docker container, using what I’d learned so far to make it much more reproducible and understandable. This of course had to begin with learning Docker and its basic usage, but that went by fairly quick. More importantly, I had to figure out how I was going to combine the three main/difficult components of the setup:

Nginx, and its configuration
The Webhook server
Requesting and Installing SSL Certs

Nginx & Webhook Server (1 & 2)

This part was easiest; I started by using the officially supported Golang container to build Webhook manually. Then, utilizing Docker’s Multi-Stage builds (where you take stuff from one container to build the final one, leading a smaller final image), I copy the webhook binary into a fresh nginx container. Now all I had left to work out was SSL certs - the hardest part.

SSL Certs (3)

What makes the SSL certs more annoying is that you don’t want to re-request the certs from the certificate authority every single time you restart the container - it’s a very long process. It’s typical to wait to refresh your SSL certs for around 60 days, or even longer. This meant I would need to use a Docker Volume to keep the certificate files persistent between runs of the container.

I quickly learned that CertBot (which also forces users to install it as a Snap package :|) was not going to be ideal. Instead, I found acme.sh, a (posix shell compliant - no bash needed) shell script that can do everything CertBot can but better, plus a lot more.

Acme.sh has two main subcommands that are needed for issuing certs: acme.sh --issue, and acme.sh --install-cert. The issue command contacts the CAs and requests the certs, which is the long process that you don’t want to repeat. The install command just takes the certs that have already been received, and does something with them - in my case, I have configured it to simply copy them into a location where Nginx can read them.

In order to get certificates from a CA, they need to verify that you have control of the domain. There are many methods to do this, and acme.sh comes with 8 (as of writing), but the most common of them is called “Webroot mode.”

By giving acme.sh write-access to your webserver’s webroot folder (/usr/share/nginx in my case), the CA can verify that you own the domain by requesting acme.sh to add a file somewhere under the root (usually under /.well-known/acme-challenge/). When the CA then requests for that specific file, if your webserver is set up correctly, it should receive it, and therefor know that you own the domain.

Challenges Automating SSL Certs (3)

The above strategy works well and good, except for a problem that arises when attempting to automate the process in Docker: My Nginx configuration is set to host the webroot under https only; i.e., I don’t host anything without encryption. So how is Nginx supposed to host the file that acme.sh generates to request the SSL certs, without there being any SSL certs yet!

My solution: A super minimal, http-only Nginx configuration that is used just to request the SSL certs. Once they have been received and installed, Nginx will then be reloaded with the full https-only configuration. The amount of complexity here made me move the logic into a separate init_container.sh script, which then becomes the ENTRYPOINT of the docker container.

Here’s (another over-engineered) diagram of the setup for automatically generating and renewing SSL certs within the Docker container:

SSL Certs Automation

Results

And, bam! If you’re reading this, that’s because the setup is working.

Ignoring the fact that what I have now is a likely unmaintable, definitely overly-complicated and unnecessarily custom system, this is a success! :P

Most importantly, I learned a shit ton of stuff throughout the process. That’s what it was about anyways. There might be one or two other people that ever read things I post or appreciate the effort put into this site. I don’t mind though. It’s for myself and for fun–and that I have achieved :)