Using Python VirtualEnvs in NixOS
Quick preface: this post is about a NixOS-specific issue and thus expects some base-level Nix experience. Hopefully, when I have more time, I’ll write more about how Nix works, why to use it, its tradeoffs, etc. That’s for another time though.
Python Venvs
Python virtual environments (venvs) are the official solution for managing python dependency hell and version conflicts. They’re super nice; each project gets its own venv which houses all of the project’s dependencies/libraries separately from everything else on the system, in a simple and familiar layout:
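A freshly created venv looks roughly like this (a trimmed-down sketch; exact contents vary by python version and platform):

.venv/
├── bin/
│   ├── activate            # sourced to “enter” the venv
│   ├── pip
│   └── python -> /path/to/the/python3.x/that/created/it
├── lib/
│   └── python3.x/
│       └── site-packages/  # where pip installs the venv’s packages
└── pyvenv.cfg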
Creating one is as simple as python -m venv .venv using your system-installed python, which is then set as the python interpreter for that venv via a symlink (just use a different python3.x binary during setup to set a different python version for that venv).
To “enter” the venv, run source .venv/bin/activate (or whatever the equivalent is for your inferior OS). Now the pip seen in the directory tree above is added to your path, and you can pip install xyz just as if you were installing globally, but it only affects the local venv!
Python in NixOS
NixOS has its own special way to manage python installations. Nixpkgs, the Nix package collection, has over 8 thousand python libraries packaged as first-class derivations, which are composable with any given python interpreter to create fully isolated, infinitely granular, unbreakable and immutable python installations. It’s extremely powerful (far more so than regular python venvs), however it requires a complete paradigm shift, which isn’t beneficial whatsoever for those not using Nix. So, especially when working with teams, it’s sometimes just not worth the extra effort.
Thankfully venvs work just fine with Nix’s python binaries. Just put any python version into your PATH and use the same command as above. There’s a catch though, and it has to do with compiled python libraries.
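For example, something like this works (python311 here is just whichever python attribute you want from nixpkgs):

nix-shell -p python311          # temporarily puts nixpkgs’ python 3.11 on PATH
python -m venv .venv
source .venv/bin/activate
pip install numpy               # works… until it hits the catch below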
Compiled Python Libraries
Many if not most python libraries (e.g. numpy, polars, pandas) have their cores written in faster, compiled languages and are exposed as python wrappers around that native code. This requires compiling native executables (or shared libraries) for the target machine and having python load them into the runtime correctly. However, these compiled libraries are often dynamically linked, requiring other dynamic libraries (libstdc++.so.6, for example) to be present on the system at run time.
On most linux/unix systems this is all well and good: the libraries are present in /usr/local/lib/, and the dynamic library loader (/lib/ld-linux-x86-64.so.2) which python is linked against knows how to find them there. However, on NixOS those libraries don’t exist there, but rather in the Nix store (/nix/store/…)! And since we don’t control the compilation process of python libraries from pypi, we aren’t able to patchelf them automatically to reference the nix store library paths instead.
Of course, since python is so popular, there is a dedicated wiki.nixos.org page for this. There are a few solutions listed:
- The first one is a tool that runs patchelf on all the compiled libraries in your venv, but it requires setting up the venv a little differently and a manual run on every package update, which feels too clumsy.
- The second one recommends using buildFHSEnv, which basically puts your shell in an environment that emulates the regular FHS paths like /lib. I’ve used this a lot before and it works well, but it still requires manual setup work for each new python project, which I’d like to avoid.
- The last suggestion is to use a tool called nix-ld, which is what I’ll be going with!
Nix-ld
Nix-ld’s intended purpose is indeed to make unpatched dynamic executables run transparently on NixOS, which is exactly what we’re looking for! To accomplish this, it inserts a set of user-defined shared libraries (usually just the minimum necessary) into a path specified by $NIX_LD_LIBRARY_PATH, and adds a wrapper dynamic library loader at the default FHS path which simply adds the aforementioned shared library path to $LD_LIBRARY_PATH before jumpstarting the regular dynamic library loader. That’s a lot of jumble to basically say: it gives all unpatched dynamic executables access to a set of user-defined shared libraries via environment variables.
There’s one problem with this though, which is mentioned in nix-ld’s readme: this breaks subtly when using interpreters from nixpkgs that load unpatched, dynamically compiled libraries; which is exactly our case with Nixpkgs python. This is because in order for the nix-ld shared library path to be used by the interpreter when loading the compiled library, the interpreter needs to have been started by the wrapper dynamic library loader created by nix-ld, since that wrapper is what inserts the path into the child process’s $LD_LIBRARY_PATH. We can see that it’s still failing, even after installing nix-ld:
python main.py
> ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory
The Full Solution
So, since Nixpkgs python isn’t started with the wrapper dynamic library loader (because it’s properly patched!), we have to do the job of telling it to use the nix-ld shared library path ourselves. We can just do:
LD_LIBRARY_PATH=$NIX_LD_LIBRARY_PATH python main.py
This is already enough to get our python interpreter to properly load dynamically compiled python libraries. But this requires an ugly environment variable manipulation every time you want to call python or pip! Unacceptable. Instead, I prefer to wrap all of the python binaries installed on my system to do this for me, so that I can manipulate venvs without ever having to think about it. This is the generalized solution I came up with to do just that:
home.packages =
  let
    # Some dynamic executables are unpatched but are loaded by patched nixpkgs
    # executables, and therefore never pick up NIX_LD_LIBRARY_PATH. For
    # example, interpreters that use dynamically linked libraries, like python3
    # libraries run by nixpkgs' python. This wraps the interpreter for ease of
    # use with those executables. WARNING: Using LD_LIBRARY_PATH like this can
    # override some of the program's dylib links in the nix store; this should
    # be generally ok though
    makeNixLDWrapper = program: (pkgs.runCommand "${program.pname}-nix-ld-wrapped" { } ''
      mkdir -p $out/bin
      for file in ${program}/bin/*; do
        new_file=$out/bin/$(basename $file)
        echo "#! ${pkgs.bash}/bin/bash -e" >> $new_file
        echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$NIX_LD_LIBRARY_PATH"' >> $new_file
        echo 'exec -a "$0" '$file' "$@"' >> $new_file
        chmod +x $new_file
      done
    '');
  in
  with pkgs; [ ... (makeNixLDWrapper python3) ... ]
Taking a closer look at the embedded bash script, with our python example (all Nix store paths are substituted in automatically during the build process):
mkdir -p /nix/store/xgb<...>xsc-python3-nix-ld-wrapped/bin
# the python derivation has 14 binaries and/or aliases! Let's wrap each one
for file in /nix/store/3wb<...>jir-python3-3.11.9/bin/*; do
  new_file=$out/bin/$(basename $file)
  echo "#! /nix/store/1xh<...>qns-bash-5.2p32/bin/bash -e" >> $new_file
  # !! Insert user-defined shared library path into child process
  echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$NIX_LD_LIBRARY_PATH"' >> $new_file
  # !! Call the unwrapped binary, but set its argv[0] to this wrapper script
  echo 'exec -a "$0" '$file' "$@"' >> $new_file
  chmod +x $new_file
done
I found that it’s important to set argv[0] of the child process (regular unwrapped python in our case) to be the path to its wrapper script instead of the regular binary (see the exec -a flag). This is because if you don’t, creating new venvs will symlink the unwrapped python interpreter into the venv, which will replace the wrapped one as soon as you enter the venv, rendering the wrapper useless.
(I also tried using bash heredocs instead of the separate echos, but kept getting mysterious syntax/formatting errors. Bash is so cursed lmao)
That’s all, hope you enjoyed or learned something :P
Building Non-Native Docker Images with QEMU and binfmt
So, you’ve finally finished the perfect Dockerfile for your project. You build it and test it on your machine; all is working well. You export the tarball, scp it over to your production server, and load it into the docker daemon - but wait! You forgot that your server has an Arm CPU and your laptop is x86_64. This won’t work, it’s an architecture mismatch! Womp womp.
There are 3 solutions to the problem.
- Build the Dockerfile on your production server (or another Arm computer). You’ll have to move over your project’s source code and dependencies for this too… Ew!
- Rewrite your Dockerfile to support cross compilation. That’s a lot of work, and requires cross compilation compiler flags…
- Simply enable ✨magic✨. And by that I of course mean emulating the docker build process using QEMU and binfmt_misc! Keep reading to learn more.
QEMU
As described on its website, QEMU is “a generic and open source machine emulator and virtualizer.” It’s super powerful and honestly feels like some black magic; but I’ll leave that for you to discover! It has two main modes: System Emulation and User Mode Emulation.
With system emulation, QEMU emulates an entire foreign computer (optionally paired with a hypervisor like KVM/Xen to take advantage of native virtualization features), allowing full operating systems to be run that are built for nearly any non-native CPU architecture.
With user mode emulation, QEMU emulates only the CPU for a non-native binary, allowing, for example, a powerpc or armv6 binary to be run on an x86_64 machine. This uses dynamic binary translation for instruction sets, syscalls (fixing endianness and pointer-width mismatches), signal handling, and threading emulation - in other words, basically magic. Notably, it’s much faster than system emulation, which has to emulate the kernel, peripheral devices, and more.
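For example (hello-aarch64 is a stand-in for any aarch64 binary; this assumes the qemu user-mode binaries are installed), a foreign binary can be run through it explicitly:

$ qemu-aarch64 ./hello-aarch64   # an aarch64 ELF, running on an x86_64 host
Hello, World!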
As I mentioned earlier, our goal is to emulate the docker build process so we can generate non-native container images; i.e., docker build -t my-container . but with a different cpu architecture. User mode emulation is perfect for this!
binfmt
Ok, so we’ve installed QEMU. How do we tell Docker to use it? Here’s where magic part two comes in: binfmt_misc. Straight from the Wikipedia page: “binfmt_misc is a capability of the Linux kernel which allows arbitrary executable file formats to be recognized and passed to certain user space applications, such as emulators and virtual machines.” Essentially, we’re telling the linux kernel, “Actually, you do know how to run this foreign-architecture binary: just feed it into QEMU!” So we get:
$ ./ForeignHelloWorld
zsh: exec format error: ./ForeignHelloWorld # before
$ ./ForeignHelloWorld
Hello, World! # after
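(For reference, a registration is just a single formatted line - :name:type:offset:magic:mask:interpreter:flags - written to /proc/sys/fs/binfmt_misc/register. Here’s a hand-rolled sketch for 32-bit Arm ELF binaries, using the same magic/mask you’ll see registered later in this post and assuming a static qemu-arm at /usr/bin/qemu-arm-static; the installer containers below do the equivalent of this for you.)

sudo tee /proc/sys/fs/binfmt_misc/register <<'EOF'
:qemu-arm:M::\x7f\x45\x4c\x46\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\x00\xff\xfe\xff\xff\xff:/usr/bin/qemu-arm-static:F
EOF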
To set this up on most linux distributions, use one of multiarch/qemu-user-static or dbhi/qus or tonistiigi/binfmt; they all do roughly the same thing. For qemu-user-static, simply run
$ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
You might be wondering, how does an isolated docker container install binfmt registrations for qemu persistently on my host kernel??
Registering qemu to binfmt_misc
These “installer” containers utilize Docker’s --privileged mode to mount the /proc/sys/fs/binfmt_misc directory, which is the same on the host and in the container. They contain statically compiled, user-mode-only builds of qemu (qemu-user-static), and register them directly to the mounted binfmt_misc directory. When the container exits, the registrations persist on the host machine. You’ll end up with registrations looking like this:
$ cat /proc/sys/fs/binfmt_misc/qemu-arm # host machine
enabled
interpreter /usr/bin/qemu-arm-static
flags: F
offset 0
magic 7f454c4601010100000000000000000002002800
mask ffffffffffffff00ffffffffffff00fffeffffff
The two important parts here are flags and interpreter. Again you might be wondering: I don’t see /usr/bin/qemu-arm-static on my host system, so where does the kernel find the qemu binary?? If you weren’t wondering, you should’ve been, because that’s a path to qemu from within the installer container, which was then promptly removed (--rm) altogether. Take a look (credit):
$ docker run --rm multiarch/qemu-user-static:x86_64-aarch64 /usr/bin/qemu-aarch64-static --version
qemu-aarch64 version 7.2.0 (Debian 1:7.2+dfsg-1~bpo11+2)
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
The fundamental problem is that the kernel has to do a path lookup for the registered binfmt interpreter at the time a foreign binary is invoked. But if it’s invoked in, say, a container or chroot environment, the path to the interpreter is obviously no longer valid. The F (“fix binary”) flag is the solution to this problem (covered in the LWN article linked in the resources below). Instead of locating the interpreter binary lazily (at the time of first invocation), it’s opened immediately when registered and the kernel keeps a file descriptor for it. This way, when the kernel needs the interpreter in a chroot or container where the path doesn’t exist, it just uses the preallocated, always-valid file descriptor instead!
This is also why the qemu binaries have to be static: if they were dynamically linked, the dynamic library loader lookup would obviously fail within the chroot/container environment.
So, to answer the previous question about where the kernel finds /usr/bin/qemu-arm-static (I’m fairly confident about this, but please somebody correct me if I’m wrong!): when the installer container registers qemu-arm-static to binfmt_misc, it’s immediately loaded by the kernel using the valid path from within the installer container, and the resulting file descriptor persists after the container exits. It doesn’t persist across reboots though, which is why the best solution I could find online is literally to rerun the installer on every boot with a systemd service!
Docker Build
Now all we have to do is make the Docker build process use QEMU from our binfmt_misc registrations. Fortunately, this is pretty easy!
Firstly, for a multiplatform build (i.e. if you want to build your docker container natively and for one or more non-native architectures, and combine them into one multiarch image), make sure the containerd image store is enabled for the docker daemon. Podman shouldn’t have this problem; it’s just a classic case of docker being annoying (see this and this, or my NixOS setting).
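On a typical non-NixOS setup, that roughly means adding the containerd-snapshotter feature flag to the docker daemon config and restarting the daemon (a sketch - merge this into your existing daemon.json instead of overwriting it like I do here):

sudo tee /etc/docker/daemon.json <<'EOF'
{
  "features": {
    "containerd-snapshotter": true
  }
}
EOF
sudo systemctl restart docker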
Then, you may have to create a new docker buildx builder (especially for multiplatform builds):
$ docker buildx create \
--name my-container-builder \
--driver docker-container \
--driver-opt=default-load=true \
--use --bootstrap
Lastly, just run your build and specify the target platform!
$ docker buildx build -t my-container-image --platform=linux/arm64 .
# or
$ docker buildx build -t my-multiplatform-container-image --platform=linux/arm64,linux/amd64,<whatever_else> .
You’ll know it’s working if you see something like this (this is a NextJS build, emulated for aarch64; note that NixOS paths are slightly different, and the underlined path is a symlink to qemu-user-static for aarch64)
NixOS
Of course, NixOS has a super fun and declarative way to set this all up and I can’t help but share that here as well.
Instead of needing to rerun container-based qemu-user-static binfmt_misc installers on every boot, NixOS provides a configuration module for binfmt: boot.binfmt.emulatedSystems. This sets up qemu-* for the given system architectures automatically, even including wasmtime for wasm files and wine for Windows executables! To ensure the interpreters are statically compiled versions (qemu-*-static), we can use pkgsStatic.qemu-user (requires nixpkgs-unstable; see nixpkgs#314998 and nixpkgs#334859). Lastly, to set the F flag for the registrations, we can use fixBinary = true;. Here’s what we end up with (thanks to Ten for this discourse comment!):
boot.binfmt.emulatedSystems =
  let
    emulationsBySystem = {
      "x86_64-linux" = [
        "aarch64-linux" # qemu
        "armv6l-linux"
        "armv7l-linux"
        "x86_64-windows" # wine
        "i686-windows"
        "riscv32-linux" # qemu
        "riscv64-linux"
        "wasm32-wasi" # wasmtime
        "wasm64-wasi"
      ];
    };
  in
  emulationsBySystem.${pkgs.system};
# backport of preferStaticEmulators to nixos-24.05
boot.binfmt.registrations = lib.mergeAttrsList (map
  (system: {
    ${system} = {
      interpreter = (pkgsUnstable.lib.systems.elaborate { inherit system; }).emulator pkgsUnstable.pkgsStatic;
      fixBinary = true;
    };
  })
  config.boot.binfmt.emulatedSystems);
That last part is a backport of preferStaticEmulators for nixos-24.05. In nixos-unstable and nixos-24.11 it will become just
boot.binfmt.preferStaticEmulators = true;
Conclusion
In this post I (attempted to) explain the process of using QEMU user mode emulation and binfmt_misc registrations to automagically build non-native docker images. I hope you learned something! Feel free to bother me with questions, comments, or corrections.
Here are some of the resources I used:
- https://lwn.net/Articles/679309/
- https://dbhi.github.io/qus/context.html
- https://github.com/NixOS/nixpkgs/issues/160300
- https://github.com/NixOS/nixpkgs/blob/nixos-unstable/nixos/modules/system/boot/binfmt.nix
- https://discourse.nixos.org/t/docker-ignoring-platform-when-run-in-nixos/21120/16?u=bvngeecord
- https://docs.docker.com/build/building/multi-platform/#qemu
- https://drpdishant.medium.com/multi-arch-images-with-docker-buildx-and-qemu-141e0b6161e7
P.S. In one of my next posts, I’ll talk about how I eradicated Dockerfiles altogether, replacing them completely with pure nix :) More to come!
Compiling Sway with wlroots from Source
I used this process when I needed to compile the latest commit of Sway to test a new feature, and my distro (NixOS) didn’t have the required wlroots version packaged so I needed to compile that from source too. With the correct clangd setup in your editor, autocomplete/intellisense for sway & wlroots will both work flawlessly. Disclaimer: This isn’t official; there may be better ways to do it (feel free to ping me on the original gist). With that out of the way, here are the steps I used:
wlroots
- Clone (git clone https://gitlab.freedesktop.org/wlroots/wlroots)
- Obtain deps: use your system’s package manager to obtain all of wlroots’ dependencies (listed here). A Nix shell which accomplishes just that is at the bottom of this post.
- Setup build & compile. Two notes on the flags used below: --prefix tells Meson to use a new out subdir as the prefix path (for .so and .h files) instead of the default /usr/local, and -Doptimization is to avoid compiler errors from Nix’s fortify flags (alternatively, use Nix’s hardeningDisable = [ "all" ];).
meson setup build -Ddebug=true -Doptimization=2 --prefix=/home/<user>/dev/wlroots/build/out
ninja -C build install # installs headers and shared object files to the previously specified prefix
Sway
- Clone (git clone https://github.com/swaywm/sway/)
- Obtain deps: use your system package manager again (or the Nix shell at the end of this post). Sway’s build deps are listed here.
- Setup build & compile (the fun part):
PKG_CONFIG_LIBDIR="/home/<user>/dev/wlroots/build/out/lib/pkgconfig/" meson setup build -Ddebug=true -Doptimization=2
# ^ tells Meson to look for wlroots in our previously specified location
ninja -C build
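As for the clangd setup mentioned at the top: Meson writes a compile_commands.json into each build directory, so one simple approach (assuming clangd’s default project discovery) is to symlink it into the root of each repo:

ln -s build/compile_commands.json .   # run in both the wlroots and sway checkouts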
Nix shell
Running nix-shell (or nix develop -f ./shell.nix) with this shell.nix in your CWD will add all of the listed dependencies to your environment without installing them to your system.
{ pkgs ? import <nixpkgs> { }, ... }:
pkgs.mkShell {
name = "wlroots";
packages = with pkgs; [
# Wlroots deps
libdrm
libGL
libcap
libinput
libpng
libxkbcommon
mesa
pixman
seatd
vulkan-loader
wayland
wayland-protocols
xorg.libX11
xorg.xcbutilerrors
xorg.xcbutilimage
xorg.xcbutilrenderutil
xorg.xcbutilwm
xwayland
ffmpeg
hwdata
libliftoff
libdisplay-info
# Sway deps
libdrm
libGL
wayland
libxkbcommon
pcre2
json_c
libevdev
pango
cairo
libinput
gdk-pixbuf
librsvg
wayland-protocols
xorg.xcbutilwm
wayland-scanner
scdoc
# Build deps
pkg-config
meson
ninja
];
}
How to Control External Monitor Brightness in Linux
Do you have an external monitor? Do you want to change its backlight brightness, but hate having to reach over to fiddle with its awkwardly placed and unintuitive button interface?
Me too! Thankfully, there’s a widely supported protocol called Display Data Channel (DDC) based on i2c that lets us solve this problem in a nice way. It allows for communication between a computer display and a graphics adapter - things like setting color contrast, getting model name/date information, and of course, setting backlight brightness.
Control displays using ddcutil
To manually query or change monitor settings from the command line, install the ddcutil program. Use it to detect which of your monitors are ddc-capable with:
ddcutil detect
or to set backlight brightness with:
ddcutil setvcp 10 $BRIGHTNESS -b $I2C_BUS
where 10 corresponds to the backlight brightness setting, and $I2C_BUS is the unique i2c ID given to your monitor (/sys/bus/i2c/devices/i2c-{n}), which should be reported to you by ddcutil detect.
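You can also read the current value back, which is handy for sanity-checking the bus ID:

ddcutil getvcp 10 -b $I2C_BUS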
This process is definitely not ideal though; scripting ddcutil is quite painful (trust me on that) and prone to errors (running multiple commands too quickly causes “communication failed” errors!). Plus, nobody wants to manually detect and set bus IDs!
Furthermore, none of the other cool brightness control software or modules (like Waybar’s backlight module) will work with your monitors. Thankfully, there’s a better solution!
ddcci-driver-linux
Linux has a standard “display brightness” interface (which is what’s used by all the brightness control software), in which each display is given a /sys/class/backlight/ entry for programs to interact with. To make this work with external ddc-capable monitors as well as regular laptop displays, one simply needs a program that bridges the gap between linux’s interface (/sys/class/backlight/) and ddc commands (/sys/bus/i2c/devices/i2c-{n}). Which is exactly what the ddcci-driver-linux kernel module does!
After properly installing the ddcci-driver-linux kernel module, /sys/class/backlight/ddcci{n} directories will be created for each capable external monitor. Now any backlight program will work with them!
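For example, with brightnessctl (the ddcci device number is just whatever shows up on your machine):

brightnessctl --list                    # should now include ddcci* devices
brightnessctl --device=ddcci1 set 50%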
Nvidia sucks
Unfortunately, Nvidia graphics cards can cause some trouble. Since the graphics adapter (in the case of an Nvidia GPU, Nvidia’s drivers) is required to do the messaging with capable monitors, it is responsible for some of the ddc-capability detection. And of course, Nvidia’s implementation of that is known to be faulty/broken.
If ddcutil doesn’t detect your monitors, try these workarounds. If the ddcci-driver-linux kernel module doesn’t create the necessary /sys/class/backlight/ entries, try this workaround (I implemented it in my NixOS configs here).
A Website? Explained.
When it comes to building and making things, I almost never feel satisfied without knowing how said thing works under the hood. A personal website is no different. In fact, I put off this project for so long for just that reason; I knew how much I didn’t know about the web, and knew that I wanted to learn it all properly instead of skimping out for some existing already-built-for-you solution.
I had never properly worked with the web before; things like how JavaScript frameworks work, what a web server does, or how SSL certificates are granted were all things I knew I would need to learn.
This article serves as documentation of that process, and an explanation of the choices I made that led me to this website. I find that writing stuff like this down is the best way to make it stick. But if anyone else also somehow finds it useful, great!
Note: As my motivation came more from the desire to learn than to make a production-ready and maintainable system, my decisions may seem unreasonable or borderline ridiculous to experienced web people. Deal with it :P
The Backend
I started with the backend, figuring out how I would host the website, and learning the basics of how web hosting works in the process. I knew I didn’t want to use anything automatic with too many details hidden (e.g. Vercel), so I opted to find a cloud server that I could access for free (no trials) and host my website on manually.
After some searching, I found that the Oracle Cloud Always Free tier has the best free options out of any provider. So I got a VM of the shape that was available, and began installing software.
I opted for using the Nginx webserver, as it offers very low level and complete control over the webhosting, and is very efficient (and seemingly better than Apache at this point).
For SSL certs, I chose to use CertBot, which uses the LetsEncrypt certificate authority (this decision didn’t last - more on this later!). It was a relatively simple process, but it involved installing more software and many specific steps, commands, and configurations.
In the back of my head, as I continued to make all these small changes and additions that are impossible to remember, I knew I would regret not using something more reproducible (Docker, Nix, Bash scripts, etc). I knew that if I ever had to switch server providers or rebuild the system from scratch for any reason, I would have to relearn many small details all over again. But I continued on for the time being, to at least get to the point of hosting a webpage (I’ll get back to this later too).
After many many hours of:
- configuring firewalls, iptables and DNS records
- learning the Nginx syntax and its capabilities
- finally understanding file permissions in Linux
- installing SSL certs with CertBot
- many failed attempts at fixing the SELinux security layer before I finally gave up and disabled it
I finally had a default landing page hosted on my domain. The next order of business:
The Frontend
I have only ever worked with raw HTML and JavaScript, and barely even any CSS. I knew however that I would want to write blogs in Markdown, and that I would want at least some extra capabilities and QOL features than what you get from just those basics. So I began to search for Static Site Generators (SSGs) and frameworks that I might want to use. Some things I looked for:
- I don’t need lots of functionality, but I like a high complexity ceiling (aka opt-in complexity)
- No ridiculous abstractions; I still want to understand the entire system and compilation process
- Preferably lightweight, minimal to no client-side JS
- Fast compilation / build time
Out of those preferences, I found myself quite liking the approach that Astro takes. It allows for fully opt-in client-side JavaScript, is very simple to start with but allows for practically unlimited complexity (even mixing and matching other JavaScript frameworks), and has extremely well-designed features, such as content collections. I spent a long, long time over-analyzing, unable to make a final decision on what to use (there are far too many options), but eventually I just decided to go with it.
During the frontend design process, I learned that tinkering with and perfecting tiny details is quite addicting; enough to significantly increase the amount of time it took for me to finish putting the website together. Many hours of productivity have been accidentally lost to those details. It’s a lot easier to mess with paddings and drop-shadows in CSS than it is to think about what to write! I guess it’s something good to know about myself :p
As I continued to develop the website though, I had a realization. I now had access to a server fully capable of hosting my website, and my website design in development. But how was I going to connect them? How was I going to take the build output and transfer it over to the server?
Connecting the Frontend and Backend
I had to figure out a good way to transfer the website files to the server. Should I build the site locally or on another platform and copy that output to the server’s web-root directory manually after every update? Should I build the website on the server itself and constantly poll for updates in a loop?
After some thinking, I came up with these desires:
- Source code for the website should be hosted on GitHub, but the build process should NOT be tied into it
  - Simple webhooks are OK - most git forges should have them
  - GitHub Actions are NOT OK - they’re too rooted into GitHub, and hide too much of the process
- The website is built separately from my computer, on the server itself
- All I should have to do to trigger the rebuild and deploy process is a git push
I found out that GitHub (not surprisingly) provides a quite convenient webhook interface, allowing repository owners to configure webhooks that get triggered upon different events that occur within the repository (such as a new commit or a push). This works perfectly for my case, as I can have the hosting server listen for said webhooks and begin the rebuild process when they’re received, which only requires me to run a git push to start the process! The only difficult part left is figuring out how to make the webhosting server actually listen for those webhooks.
A GitHub webhook can send to any IP or URL on any port, as a POST request with specific data and headers. Coming into this, I had absolutely no idea how I could make a computer listen for requests, or what that even really looked like or meant. Do you need a separate application for it? Is it common to have a public-facing server listen for random POST requests on an open port?
What I did know is that I already had an Nginx server running that was listening for POST requests, but for the website. So naturally, that was the first place my brain went to look for potential answers. But as I was too impatient to scour StackOverflow to decide if this was a good train of thought, I decided to have a (what turned out to be very long and detailed) conversation with ChatGPT about it! (that link contains a large portion of the conversation)
I was blown away by ChatGPT’s ability to understand my problem, and especially by its knowledge surrounding Nginx and its syntax. After a lot of conversation, I ended up with another Nginx server block (in the same config file I was already using for webhosting) that listens for the webhook POST requests and passes them to a “webhook server” running on port 3000 (something that parses the requests and can then execute commands depending on them) via proxy_pass http://localhost:3000;.
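The relevant part of the config ended up looking something like this (a stripped-down sketch; the domain, the /hooks/ prefix, and the omitted SSL directives are placeholders, not my real setup):

server {
    listen 443 ssl;
    server_name example.com;
    # ssl_certificate / ssl_certificate_key omitted

    location /hooks/ {
        # forward GitHub's webhook POSTs to the webhook server on port 3000
        proxy_pass http://localhost:3000;
    }
}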
At first, it recommended using a simple Bash script with NetCat as the “webhook server” to listen for and parse the requests. That wasn’t an ideal solution though, due to NetCat’s many limitations and barebones-ness. So I searched online for other webhook servers and found two, both written in Go and hosted on GitHub. One had more recent commits, but the other had far more stars and popularity and had a GitHub Webhook configuration already built in, so I went with that one.
The webhook server I picked turned out to be very helpful and easy to work with. I copied the default GitHub Webhook configuration available in the repo, tweaked some parameters, added my secret field (basically a shared password), and set it to run my rebuild bash script (which I would show if not for security concerns). All the script has to do is pull the latest updates from my website’s GitHub repo, rebuild, and copy the output to the webroot folder. And bam, I had a working connection between the frontend source code and the backend hosting server!
An over-engineered diagram of the setup at this stage (I got carried away with Excalidraw lol):
The Backend, Revisited (Dockerization)
At this point, I had configured and installed more things on the Oracle Cloud server than I could count. I knew that it would become a problem: if I ever had to switch the server hosting platform or rebuild the system for any reason, I would have to figure it all out again. Installing tons of packages and software, configuring iptables, installing SSL certs, configuring file permissions for Nginx and its webroot, and working with but finally disabling SELinux. It was a lot.
Sensing my future regret, I decided to rebuild as much of the setup as I could in a Docker container, using what I’d learned so far to make it much more reproducible and understandable. This of course had to begin with learning Docker and its basic usage, but that went by fairly quickly. More importantly, I had to figure out how I was going to combine the three main/difficult components of the setup:
- Nginx, and its configuration
- The Webhook server
- Requesting and Installing SSL Certs
Nginx & Webhook Server (1 & 2)
This part was the easiest; I started by using the officially supported Golang container to build Webhook manually. Then, utilizing Docker’s multi-stage builds (where you take stuff from one container to build the final one, leading to a smaller final image), I copy the webhook binary into a fresh Nginx container. Now all I had left to work out was SSL certs - the hardest part.
SSL Certs (3)
What makes the SSL certs more annoying is that you don’t want to re-request the certs from the certificate authority every single time you restart the container - it’s a very long process. It’s typical to wait around 60 days, or even longer, between cert renewals. This meant I would need to use a Docker Volume to keep the certificate files persistent between runs of the container.
I quickly learned that CertBot (which also forces users to install it as a Snap package :|) was not going to be ideal. Instead, I found acme.sh, a (posix shell compliant - no bash needed) shell script that can do everything CertBot can but better, plus a lot more.
Acme.sh has two main subcommands that are needed for issuing certs: acme.sh --issue and acme.sh --install-cert. The issue command contacts the CA and requests the certs, which is the long process that you don’t want to repeat. The install command just takes the certs that have already been received and does something with them - in my case, I have configured it to simply copy them into a location where Nginx can read them.
In order to get certificates from a CA, it needs to verify that you have control of the domain. There are many methods to do this, and acme.sh comes with 8 (as of writing), but the most common of them is called “webroot mode.” By giving acme.sh write-access to your webserver’s webroot folder (/usr/share/nginx in my case), the CA can verify that you own the domain by requesting acme.sh to add a file somewhere under the root (usually under /.well-known/acme-challenge/). When the CA then requests that specific file, if your webserver is set up correctly, it will serve it, and the CA will therefore know that you own the domain.
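Concretely, the two steps end up looking something like this (the domain and cert destination paths are placeholders; the webroot matches the /usr/share/nginx one mentioned above):

acme.sh --issue -d example.com -w /usr/share/nginx
acme.sh --install-cert -d example.com \
    --key-file       /etc/nginx/ssl/example.com.key \
    --fullchain-file /etc/nginx/ssl/example.com.crt \
    --reloadcmd      "nginx -s reload"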
Challenges Automating SSL Certs (3)
The above strategy is all well and good, except for a problem that arises when attempting to automate the process in Docker: my Nginx configuration is set to host the webroot under https only; i.e., I don’t host anything without encryption. So how is Nginx supposed to host the file that acme.sh generates to request the SSL certs, when there are no SSL certs yet?!
My solution: a super minimal, http-only Nginx configuration that is used just to request the SSL certs. Once they have been received and installed, Nginx is then reloaded with the full https-only configuration. The amount of complexity here made me move the logic into a separate init_container.sh script, which then becomes the ENTRYPOINT of the docker container.
Here’s (another over-engineered) diagram of the setup for automatically generating and renewing SSL certs within the Docker container:
Results
And, bam! If you’re reading this, that’s because the setup is working.
Ignoring the fact that what I have now is a likely unmaintainable, definitely overly-complicated, and unnecessarily custom system, this is a success! :P
Most importantly, I learned a shit ton of stuff throughout the process. That’s what it was about anyways. There might be one or two other people that ever read things I post or appreciate the effort put into this site. I don’t mind though. It’s for myself and for fun - and that I have achieved :)