NixOS + Starting UNI = Should have forked out the extra dosh for a macbook?

Today I had a tutorial that required splunk enterprise to be installed.

I use NixOS, so the only option for me was the tarball distribution, as the other packaging options are for debian- and redhat-based systems.

Upon running ./bin/splunk start I was greeted with the following error:

fish: Job 1, './splunk start --accept-license' terminated by signal SIGSEGV (Address boundary error)

At first I thought, for some silly reason, that splunk literally meant "for linux kernels 3.x, 4.x and 5.4.x" on their downloads page and that the kernel had some sort of breaking change. Not sure why I thought a bad memory read would be caused by a kernel change, but being stupid I did. Anyways, it took all of 5 minutes to downgrade my kernel from 6.3.11 (per uname -r) to 5.4 thanks to nixos. That obviously didn't work, but my regular kernel was just one boot entry away, so I booted back into that and decided to do some more digging.

I noticed a large number of shared libraries in the lib directory, so I made sure all permissions matched what the manifest file states. As part of starting UNI I've (maybe unfortunately) had to learn python, so I decided to put my 2-3 days of python knowledge to the test and write a small script to do this rather than doing it in the shell. It was surprisingly easy even with very little knowledge of python's syntactic sugar.

The manifest file was structured as follows:

d 755 splunk splunk splunk/bin -
f 555 splunk splunk splunk/bin/genWebCert.sh 5e8b5dd29a8c8fc97cb29abfcd9328e9e395ce31cd8e1839dcef3f546278e7d0
l 555 splunk splunk splunk/bin/idle3 -> idle3.7

And the python:

from grp import getgrnam
from pwd import getpwnam
import os

manifest = open("manifest", "r")

# Caches so each user/group name is only looked up once.
known_users = {}
known_groups = {}

def adjust_file_meta(path, mode, user, group):
    if user not in known_users:
        uid = getpwnam(user).pw_uid
        known_users[user] = uid
    else: uid = known_users[user]
    if group not in known_groups:
        # Group names live in the group database, not the password database.
        gid = getgrnam(group).gr_gid
        known_groups[group] = gid
    else: gid = known_groups[group]
    # The manifest stores the mode as an octal string, e.g. "755".
    oct_mode = int(mode, 8)
    print(f"Changing mode to {oct_mode:o} (oct) for {path}")
    os.chmod(path, oct_mode)
    print(f"Changing ownership to {user}:{group} for {path}")
    os.chown(path, uid, gid)

def main():
    print("You may need to run this script as root or create a 'splunk' user and group.")

    for line in manifest:
        # Try to parse (production ready parsing right here) as either a regular file or a symlink
        try:
            ty, mode, user, group, path, _ = line.split(" ")
            adjust_file_meta(path, mode, user, group)
        except ValueError:
            ty, mode, user, group, path, _, _ = line.split(" ")
            adjust_file_meta(path, mode, user, group)

    print("Done.")

if __name__ == "__main__":
    main()

In hindsight I could have branched on the first character of the line rather than mucking around with the try/except, but it worked.

Setting up the user and group was a simple change to my nixos config:
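For what it's worth, here is a rough sketch of what that branching version might look like. The function name parse_manifest_line is made up for illustration, and it assumes paths in the manifest never contain spaces (true for the three sample lines above, not verified for the whole file).

```python
def parse_manifest_line(line):
    """Split one manifest line into (kind, mode, user, group, path).

    Hypothetical helper: assumes whitespace-separated fields and
    no spaces inside paths.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty manifest line")
    kind = fields[0]
    if kind in ("d", "f"):
        # d/f lines: kind mode user group path <hash or "-">
        _, mode, user, group, path, _ = fields
    elif kind == "l":
        # l lines: kind mode user group path -> target
        _, mode, user, group, path, _arrow, _target = fields
    else:
        raise ValueError(f"unknown entry type {kind!r}")
    # The mode is written in octal in the manifest.
    return kind, int(mode, 8), user, group, path
```

The returned tuple can be fed straight into something like adjust_file_meta, and an unexpected line type fails loudly instead of being silently retried.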

users = {
  users.splunk = {
    isSystemUser = true;
    group = "splunk";
  };
  groups.splunk = {};
};

Well, that didn't work either; I was still getting the same error. To dig into exactly what dynamic linking bullshitery was going on, it was time to bust out strace. No bueno, the only syscall I could see was:

execve("/opt/splunk/bin/splunk", ["/opt/splunk/bin/splunk"], 0x7ffe600af570 /* 106 vars */) = -1 ENOENT (No such file or directory)
strace: exec: No such file or directory
+++ exited with 1 +++

Not very useful at first glance, but if you fire up the manpage for execve you'll notice this key refresher:

pathname must be either a binary executable, or a script starting with a line of the form:
                  #!interpreter [optional-arg]

So a binary or a script denoted with a shebang? This for some odd reason reminded me of an interesting article I skimmed a while ago about creating a custom binary format for the smallest possible binary by creating a custom kernel module to serve things like requests for shared objects (https://www.muppetlabs.com/~breadbox/txt/mopb.html).

Remembering this, I decided to check exactly how the splunk binary was being interpreted by running readelf -a splunk | grep interpreter, and was greeted with the problem:

[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

Anyway, this is ELF (not the obscure but cool binfmt I mentioned above), so the program interpreter is just a path string stored in the binary (the PT_INTERP program header) that tells the kernel which loader to run the binary with. This is usually /lib64/ld-linux-x86-64.so.2 on most x86-64 linux systems. ld-linux is the runtime (dynamic) linker, which loads shared objects after the kernel has loaded the binary. If that path does not exist, execve fails with ENOENT even though the binary itself is there, which matches the strace output above.
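To make "just a string in the binary" a bit more concrete, here is a minimal sketch that pulls the interpreter out of an ELF image by hand. It only handles 64-bit little-endian files (which is what an x86-64 splunk build would be), the function name elf_interpreter is invented for this example, and readelf remains the tool to trust.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def elf_interpreter(data):
    """Return the interpreter path of a 64-bit little-endian ELF image.

    Returns None if the data is not such an ELF or has no PT_INTERP entry.
    """
    if len(data) < 64 or data[:4] != b"\x7fELF":
        return None
    if data[4] != 2 or data[5] != 1:
        return None  # only ELFCLASS64 + little-endian handled in this sketch
    # ELF64 header: e_phoff at byte 32, e_phentsize and e_phnum at byte 54.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        if base + 56 > len(data):
            return None  # truncated program header table
        # ELF64 program header: p_type, p_flags, p_offset, then p_filesz at +32.
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, base)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            # The path is stored NUL-terminated.
            return raw.split(b"\0", 1)[0].decode(errors="replace")
    return None
```

Usage would be along the lines of elf_interpreter(open("splunk", "rb").read()), which should report the same path that readelf prints.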

Let's compare this to a binary that works on my system to highlight the issue:

$ nix-shell -p bintools --command fish # Jump into a shell with readelf installed.
$ readelf -a (which ls) | rg interpreter # Find the interpreter for ls
[Requesting program interpreter: /nix/store/46m4xx889wlhsdj72j38fnlyyvvvvbyb-glibc-2.37-8/lib/ld-linux-x86-64.so.2]

Ah OK, so the interpreter is set to the absolute path of the glibc loader in the nix store. This is a very common pattern in nixos as it allows fully visualised and atomic changes to be computed from software's dependency graphs. I'm very new to NixOS (about 4-5 days in) and I've seen quite a lot of patching of binaries when dealing with packaging pre-built software. This was my main hint for the fix.

Ok so stupid fix:

patchelf --set-interpreter /nix/store/46m4xx889wlhsdj72j38fnlyyvvvvbyb-glibc-2.37-8/lib/ld-linux-x86-64.so.2 splunk

Well, that worked. There are a million reasons why this is a bad idea, but I was in the middle of a tutorial and had already wasted time trying to connect to the wifi (EAP instead of PSK) using wpa_supplicant and wpa_cli (which I eventually got working by using my phone as a hotspot and connecting to that :^) smarter not harder).

OK, so now we have a few different errors which are really just the same error: we need to fix all the ELF executables shipped with splunk.

Now I might not be new to linux desktop but abusing the shell to create slow abominations is doable:

sudo fd . --type f | sudo xargs file | rg 'ELF' | awk '{print $1}' | sed -e 's/://' > files.txt

Let's find all files, pipe them to file to get a file description, filter for those that are ELF binaries, use awk to get the first column (the file path), and finally remove the trailing : with the stream editor and redirect that into a scratch file. I did a quick check to see if man find or man fd had a way to directly filter for ELF binaries but I couldn't find anything. Another route would be to look at readelf's status code (I'd guess that would work).
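Yet another route, sketched here rather than something I ran at the time, is to skip file entirely and check the first four bytes of each file for the ELF magic number (0x7f followed by "ELF"). The function name find_elf_files is made up for the example, and symlinks are skipped to roughly mirror fd --type f.

```python
import os

ELF_MAGIC = b"\x7fELF"  # every ELF file starts with these four bytes


def find_elf_files(root):
    """Yield paths of regular files under root that start with the ELF magic."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    if f.read(4) == ELF_MAGIC:
                        yield path
            except OSError:
                # Unreadable file (permissions etc.): skip it in this sketch.
                continue
```

Something like print("\n".join(find_elf_files("."))) would then stand in for the fd | file | rg | awk | sed pipeline, and it does not trip over paths containing spaces or colons.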

Anyways let's use some vim magic to continue our journey:

vi files.txt

Now in vim let's run the following substitute command over every line:

:%s/\(.*\)/readelf -a \1 | grep Requesting\\ program\\ interpreter:\\ \\\/lib64\\\/ld-linux-x86-64.so.2 > \/dev\/null \&\& echo "\1"/
e.g. this turns
./mybinary
into (grep pattern shortened here):
readelf -a ./mybinary | grep interpret > /dev/null && echo "./mybinary"

Now we've transformed our list of files into a list of commands that will either print the file name (if it requests the /lib64 interpreter) or print nothing and return a bad exit code (if it does not). Next I press gg to go to the top of the file, then V to enter visual line mode, and then G to select all the lines. Invoking command mode with :'<,'>!sh pipes the selected lines through sh, which runs each one as a command, and replaces the selection with the output. Lines whose command printed nothing therefore vanish from the buffer. We've now effectively filtered all files shipped with splunk to:

  1. ELF binaries
  2. That have an interpreter

In my case that went from 21365 files to 135 ELF binaries to just 33 files to patch.

Now we can use the same vim magic to transform this list into a list of patchelf commands to run:

:%s/\(.*\)/patchelf --set-interpreter \/nix\/store\/46m4xx889wlhsdj72j38fnlyyvvvvbyb-glibc-2.37-8\/lib\/ld-linux-x86-64.so.2 \1/
Then repeat our :'<,'>!sh command to patch all the binaries.
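If vim gymnastics aren't your thing, the same step could be sketched in python. This is not what I ran: the helper names patchelf_commands and patch_all are invented for the example, it assumes files.txt holds one path per line, and it defaults to a dry run that only prints the commands.

```python
import subprocess


def patchelf_commands(paths, interpreter):
    """Build one patchelf argv list per path (nothing is executed here)."""
    return [["patchelf", "--set-interpreter", interpreter, p] for p in paths]


def patch_all(paths, interpreter, dry_run=True):
    """Print the patchelf commands, or run them when dry_run is False."""
    for argv in patchelf_commands(paths, interpreter):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops at the first binary patchelf fails on.
            subprocess.run(argv, check=True)


def read_paths(list_file):
    """Read non-empty lines (one path each) from a file such as files.txt."""
    with open(list_file) as f:
        return [line.strip() for line in f if line.strip()]
```

Calling patch_all(read_paths("files.txt"), "<path to your ld-linux in the nix store>") prints what would be run; passing dry_run=False actually patches the binaries. Passing argv lists instead of shell strings also sidesteps quoting problems with odd file names.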

And finally I can run splunk! In all honesty this is exactly what I expected when installing software on linux. It's not a big deal to spend more time upfront on things for a deeper understanding IMO. I'm a bit disappointed the UNI leans into software as proprietary as splunk, but I'm happy to learn it either way :^).
