Clangd and compile_commands.json
Clangd is a wonderful language server for C/C++ development. It works with tons
of different compilers, target architectures, configurations, build
environments, etc. In order to support all these different scenarios, it makes
use of a special file (increasingly becoming a standard among build systems)
named compile_commands.json.
This file is generated by build systems like CMake or Meson, and is quite literally just a big list of compiler commands written out in JSON. That is, for each source file that the build system passes to your compiler, compile_commands.json records that file, the directory the compiler runs in, and the flags your compiler uses to build it (think: language standard, include paths, preprocessor defines, config flags, etc). This lets clangd do cool things like correctly differentiate between the C/C++ versions used by included libraries and by your own code!
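To make that concrete, here is a rough sketch of what a single entry might look like. The paths and flags are made up for illustration (they are not from my project); the directory, file, and command keys are the ones the JSON compilation database format uses.

```shell
# Write a hypothetical one-entry compilation database to a temp file.
# All paths and flags here are illustrative assumptions.
sample_db=$(mktemp)
cat > "$sample_db" <<'EOF'
[
  {
    "directory": "/home/user/project/build",
    "file": "/home/user/project/src/main.cpp",
    "command": "g++ -std=c++17 -I../include -DNDEBUG -c ../src/main.cpp -o main.o"
  }
]
EOF

# Show the result; a real file has one such entry per compiled source file.
cat "$sample_db"
```

Relative paths in the command (like -I../include) are interpreted relative to the entry’s directory, which is why that key is there.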
With most typical environments (e.g. compiling for the native host architecture on a regular Linux distro), as long as clangd finds your compile_commands.json, you’ll be good to go. Sometimes you’ll have to add a build system flag to generate it, and maybe symlink the JSON file to the project root, but that’s it.
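For CMake, the flag in question is CMAKE_EXPORT_COMPILE_COMMANDS (Meson, as far as I know, writes the file into its build directory on its own). Here is a minimal sketch, assuming a hypothetical layout with sources in the current directory and the build tree in ./build; link_compile_commands is just a helper name made up for this example.

```shell
# Symlink the generated database from the build tree into the project root,
# which is one of the places clangd looks for it.
link_compile_commands() {
    build_dir=$1
    project_root=$2
    ln -sf "$build_dir/compile_commands.json" "$project_root/compile_commands.json"
}

# Only configure if this actually looks like a CMake project.
if command -v cmake >/dev/null 2>&1 && [ -f CMakeLists.txt ]; then
    # Ask CMake to emit build/compile_commands.json while configuring.
    cmake -S . -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
    link_compile_commands "$PWD/build" "$PWD"
fi
```

A symlink (rather than a copy) means the file stays current whenever the build system regenerates it.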
With compile_commands.json, clangd will do the same thing your compiler does for
each file: first try to resolve includes using any passed-in include paths (e.g.
via -I or -isystem), then check the usual places like /usr/include/. Most
critical things, like libc or libstdc++ headers, will be found in the usual
system directories.
Embedded Development
When writing code for embedded devices, or even any devices with a different system architecture, things change up a bit!
The header files in the usual places are no longer valid for resolving includes; you’re not building binaries for your native system, so native system libraries don’t apply anymore. Sometimes, if everything clangd needs to know is specified in your compile_commands.json, it will still Just Work™. However, if clangd fails to understand your compiler toolchain (as is often the case when using gcc and especially with gcc-arm-embedded), you will have errors in your editor!
To fix this, clangd has a special CLI flag: --query-driver. This flag takes a
glob pattern that whitelists certain gcc-compatible compiler binaries that
clangd is then allowed to query for more correct include paths. This way,
clangd can call out to gcc (or gcc-arm-embedded) directly instead of guessing
what it thinks the include paths should be!
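The glob doesn’t have to be wide open. As a sketch of a narrower alternative, the helper below builds a --query-driver value that only matches binaries sharing a cross-compiler’s prefix in the same directory. The function name and the /opt/gcc-arm-embedded path are assumptions for illustration, and it relies on the usual prefix-tool naming scheme (arm-none-eabi-g++, arm-none-eabi-gcc, and so on).

```shell
# Build a --query-driver flag limited to one toolchain's binaries.
# $1 is the full path to a compiler, e.g. the output of `command -v`.
query_driver_flag() {
    compiler_path=$1
    dir=$(dirname "$compiler_path")
    name=$(basename "$compiler_path")
    case $name in
        # Prefixed cross compiler: strip the tool name, keep the prefix.
        *-*) pattern="${name%-*}-*" ;;
        # Unprefixed name (g++, clang): match that exact binary only.
        *)   pattern=$name ;;
    esac
    printf '%s\n' "--query-driver=${dir}/${pattern}"
}

query_driver_flag /opt/gcc-arm-embedded/bin/arm-none-eabi-g++
# prints: --query-driver=/opt/gcc-arm-embedded/bin/arm-none-eabi-*
```

In practice you would pass something like "$(command -v arm-none-eabi-g++)" as the argument and hand the resulting flag to clangd.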
Let’s see this in action. First, without --query-driver (output shortened for
clarity):
$ clangd --check=main.cpp
I[02:28:06.482] Loaded compilation database from compile_commands.json
I[02:28:06.485] Compile command from CDB is: arm-none-eabi-g++ --target=arm-none-eabi ...
I[02:28:06.500] internal (cc1) args are: -cc1 -triple thumbv7em-none-unknown-eabi ...
E[02:28:07.548] [pp_file_not_found] Line 6: in included file: 'gnu/stubs-32.h' file not found
I[02:28:07.601] All checks completed, 1 errors
Clangd knows from our compile_commands.json that we (the build system) are using
arm-none-eabi-g++ to compile our code for a non-native target. But since it
still uses clang’s header resolving logic under the hood instead of our
compiler’s, it gets some stuff wrong. In this case, the error about
'gnu/stubs-32.h' file not found is because clangd is trying to use our
system’s glibc headers, which causes conflicts due to architecture differences.
Now again, but with --query-driver='**' (the ** tells clangd it can query
any compiler binary):
$ clangd --query-driver='**' --check=main.cpp
I[02:28:06.482] Loaded compilation database from compile_commands.json
I[02:38:09.179] System includes extractor: successfully executed arm-none-eabi-g++
got includes: "/path/to/gcc-arm-embedded/arm-none-eabi/include/c++/12.3.1, ..."
I[02:28:06.485] Compile command from CDB is: arm-none-eabi-g++ --target=arm-none-eabi ...
I[02:28:06.500] internal (cc1) args are: -cc1 -triple thumbv7em-none-unknown-eabi ...
I[02:38:10.655] All checks completed, 0 errors
As we can see, clangd is now calling our embedded gcc compiler directly, asking
it for the correct include paths to add to its resolving logic. Using strace,
we can even see the exact command it’s running:
$ strace -f -e trace=process -o out.txt clangd --query-driver='**' --check=main.cpp > /dev/null 2>&1
$ cat out.txt
...
4063415 execve("/nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/arm-none-eabi-g++", ["/nix/store/i3m8xrhhnb7l83cpwdd9r"..., "-E", "-v", "-x", "c++", "-"], 0x7ffec9e80478 /* 177 vars */ <unfinished ...>
...
We can see that clangd invoked arm-none-eabi-g++ -E -v -x c++ - (the trailing -
means “read the source from stdin”). Let’s run it ourselves and check the
output:
$ arm-none-eabi-g++ -E -v -x c++ -
#include "..." search starts here:
#include <...> search starts here:
/nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/../lib/gcc/arm-none-eabi/12.3.1/../../../../arm-none-eabi/include/c++/12.3.1
/nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/../lib/gcc/arm-none-eabi/12.3.1/../../../../arm-none-eabi/include/c++/12.3.1/arm-none-eabi
/nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/../lib/gcc/arm-none-eabi/12.3.1/../../../../arm-none-eabi/include/c++/12.3.1/backward
/nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/../lib/gcc/arm-none-eabi/12.3.1/include
/nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/../lib/gcc/arm-none-eabi/12.3.1/include-fixed
/nix/store/i3m8xrhhnb7l83cpwdd9rlkcglpnxkw8-gcc-arm-embedded-12.3.rel1/bin/../lib/gcc/arm-none-eabi/12.3.1/../../../../arm-none-eabi/include
End of search list.
And we see our compiler toolchain’s include paths! Awesome!
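If you want just the directories, the list sits between two marker lines in that output. The sketch below is my own approximation of that kind of parsing, not clangd’s actual code; extract_system_includes is a made-up helper name. Note that gcc prints the -v output to stderr, so it has to be redirected.

```shell
# Print only the <...> include directories from `gcc -E -v` style output
# read on stdin, one per line, with leading whitespace stripped.
extract_system_includes() {
    awk '
        /^#include <\.\.\.> search starts here:/ { in_list = 1; next }
        /^End of search list\./                  { in_list = 0 }
        in_list { sub(/^[ \t]+/, ""); print }
    '
}

# Usage, guarded so it only runs when the cross compiler is installed.
if command -v arm-none-eabi-g++ >/dev/null 2>&1; then
    arm-none-eabi-g++ -E -v -x c++ - </dev/null 2>&1 | extract_system_includes
fi
```

This is handy for comparing what your toolchain reports against the “got includes” line in clangd’s log.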
NixOS’s Fault
One of the design goals of clangd (as far as I understand) is that it behaves very similarly to clang. This makes sense; you don’t want your code to compile with no errors but then to have errors from clangd in your editor at the same time!
So, since compilers automatically search common default paths (like
/usr/include and /usr/lib) for header files and shared libraries, clangd does
too. This way, if you spin up a main.c and add a #include <stdio.h>, it just
works; you don’t have to say
“Please use my system compiler’s libc for printf.”
However, these paths don’t exist in NixOS, as everything is stored in unique
isolated paths under /nix/store. So what do the nixpkgs authors do to make it
still work? The same thing they do for compilers: add a wrapper bash script to
set environment variables!
Unfortunately, in the case of clangd, it’s not a very good wrapper. In fact, in
the earlier working example using --query-driver, I had to use the unwrapped
clangd binary from nixpkgs to get the correct output! Let’s take a look at
why it’s bad (note that “@clang@” is substituted with clang’s nix store output
path during the build process):
...
export CPATH=${CPATH}${CPATH:+':'}$(buildcpath ${NIX_CFLAGS_COMPILE} \
$(<@clang@/nix-support/libc-cflags)
):@clang@/resource-root/include
export CPLUS_INCLUDE_PATH=${CPLUS_INCLUDE_PATH}${CPLUS_INCLUDE_PATH:+':'}$(buildcpath ${NIX_CFLAGS_COMPILE} \
$(<@clang@/nix-support/libcxx-cxxflags) $(<@clang@/nix-support/libc-cflags) \
):@clang@/resource-root/include
exec -a "$0" @unwrapped@/bin/$(basename $0) "$@"
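According to gcc’s docs, CPATH and CPLUS_INCLUDE_PATH tell the compiler (clangd
uses clang for parsing under the hood, so it’s affected too!) to search the
specified list of directories for header files: CPATH as if they were passed in
with -I, and CPLUS_INCLUDE_PATH as if they were passed in with -isystem, in
both cases after any such paths given on the command line.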
In other words, the wrapper is unconditionally hardcoding paths to the standard library header files associated with our system’s clang, which, in our case, are glibc and libstdc++. Yikes! What about when we’re not using those? This wrapper assumes that we’re always using clangd for native-only development and never with custom toolchains.
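If you’re not sure whether the clangd on your PATH is this kind of wrapper, a quick check is to look at its first two bytes: a shell wrapper starts with a #! shebang, while the real clangd is a compiled binary. This is a rough heuristic under that assumption, and is_shell_wrapper is a name invented for the sketch.

```shell
# Succeeds if the given file starts with a "#!" shebang (i.e. is a script).
is_shell_wrapper() {
    [ "$(head -c 2 "$1")" = '#!' ]
}

# Inspect whichever clangd the shell would run, if there is one.
clangd_path=$(command -v clangd || true)
if [ -n "$clangd_path" ]; then
    if is_shell_wrapper "$clangd_path"; then
        echo "$clangd_path looks like a wrapper script"
    else
        echo "$clangd_path looks like a real binary"
    fi
fi
```

If it is a script, reading it (for example with cat) shows exactly which environment variables it sets before exec’ing the real binary.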
I am unclear as to whether similar errors would occur in regular Linux
distributions without using --query-driver. My gut tells me that as long as
the include paths found via --query-driver come first in the list, having the
system’s C/C++ standard library headers hardcoded as /usr/local/whatever paths
wouldn’t cause errors. But it is worth acknowledging that NixOS’s clangd wrapper
caused the error I was having.
Since we don’t need any of the system’s default header files for our embedded
development, running the unwrapped version of clangd with
--query-driver='**' passed in will work just fine. However, this does break the
works-out-of-the-box behavior that clang and gcc have and that I mentioned
earlier, which is a hard sacrifice to make.
For now, I have my editor configured to use the unwrapped clangd binary, which I have added to my path using the following Nix code:
home.packages = let
  clangdUnwrapped = pkgs.runCommand "clangdUnwrapped" {} ''
    mkdir -p $out/bin
    ln -s ${pkgs.clang.cc}/bin/clangd $out/bin/clangd-unwrapped
  '';
in [
  clangdUnwrapped
];
Neovim lspconfig configuration for clangd:
local servers = {
  -- ...
  clangd = {
    cmd = {
      'clangd-unwrapped',
      '--query-driver=**', -- whitelists all compiler binaries. Technically a security risk, but I'm usually working in trusted environments
      '--enable-config', -- allows clangd to parse global or project-local configuration files
      '--background-index',
      '--clang-tidy',
      -- '--log=verbose',
    },
  },
}
Inconvenient for sure, but it works well enough as a temporary solution.
In the meantime, I’ll keep an eye on this PR and these issues:
- (PR) https://github.com/NixOS/nixpkgs/pull/354755
- (Issue) https://github.com/NixOS/nixpkgs/issues/348791
- (Issue) https://github.com/clangd/clangd/issues/2181
That’s all. Hope you enjoyed or learned something!