Today I had a tutorial that required Splunk Enterprise to be installed.
I use NixOS, so the only option for me was the tarball distribution, as the other packaging options are for Debian- and Red Hat-based systems.
Upon running ./bin/splunk start I was greeted with the following error:
fish: Job 1, './splunk start --accept-license' terminated by signal SIGSEGV (Address boundary error)
At first I thought, for some silly reason, that Splunk literally meant "for Linux
kernels 3.x, 4.x and 5.4.x" on their downloads page and that the kernel had some
sort of breaking change. Not sure why I thought a bad memory read would be caused
by a kernel change, but being stupid I did. Anyway, it took all of 5 minutes to
downgrade my kernel from 6.3.11 (as reported by uname -r) to 5.4 thanks to NixOS.
That obviously didn't work, but my regular kernel was just one boot entry away,
so I booted back into that and decided to do some more digging.
I noticed a large number of shared libraries in the lib directory, so I
ensured that all permissions were correct as stated in the manifest file. As
part of starting UNI I've (maybe unfortunately) had to learn Python, so I
decided to put my 2-3 days of Python knowledge to the test and write a small
script to do this rather than doing it in the shell. It was surprisingly easy,
even with very little knowledge of Python's syntactic sugar.
The manifest file was structured as follows:
d 755 splunk splunk splunk/bin -
f 555 splunk splunk splunk/bin/genWebCert.sh 5e8b5dd29a8c8fc97cb29abfcd9328e9e395ce31cd8e1839dcef3f546278e7d0
l 555 splunk splunk splunk/bin/idle3 -> idle3.7
And the python:
from grp import getgrnam
from pwd import getpwnam
import os

# Cache name -> id lookups so each user and group is only resolved once.
known_users = {}
known_groups = {}


def adjust_file_meta(path, mode, user, group):
    if user not in known_users:
        known_users[user] = getpwnam(user).pw_uid
    uid = known_users[user]

    if group not in known_groups:
        # Groups live in the group database, not the passwd database.
        known_groups[group] = getgrnam(group).gr_gid
    gid = known_groups[group]

    oct_mode = int(mode, 8)
    print(f"Changing mode to {oct_mode:o} (oct) for {path}")
    os.chmod(path, oct_mode)
    print(f"Changing ownership to {user}:{group} for {path}")
    os.chown(path, uid, gid)


def main():
    print("You may need to run this script as root or create a 'splunk' user and group.")
    with open("manifest", "r") as manifest:
        for line in manifest:
            # Try to parse (production ready parsing right here) as either a regular file or a symlink
            try:
                ty, mode, user, group, path, _ = line.split(" ")
                adjust_file_meta(path, mode, user, group)
            except ValueError:
                # Symlink lines carry two extra fields: "->" and the target.
                ty, mode, user, group, path, _, _ = line.split(" ")
                adjust_file_meta(path, mode, user, group)
    print("Done.")


if __name__ == "__main__":
    main()
In hindsight I could have branched on the first character of the line rather than mucking around with the try/except, but it worked.
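For what it's worth, here is a rough sketch of that "branch on the first character" idea. It only assumes the three line shapes shown in the manifest excerpt above; the ManifestEntry name, the meaning I give to the last field (a checksum for files, a link target for symlinks) and the assumption that paths contain no spaces are all mine, not anything Splunk documents.

```python
from typing import NamedTuple, Optional


class ManifestEntry(NamedTuple):
    """One parsed manifest line (hypothetical helper type)."""
    kind: str             # "d", "f" or "l"
    mode: int             # permission bits, parsed from octal
    user: str
    group: str
    path: str
    extra: Optional[str]  # assumed: checksum for files, target for symlinks


def parse_manifest_line(line: str) -> ManifestEntry:
    # split() with no argument also drops the trailing newline.
    fields = line.split()
    kind, mode, user, group, path = fields[:5]
    if kind == "d":
        extra = None          # directories end in a "-" placeholder
    elif kind == "f":
        extra = fields[5]     # assumed to be a checksum
    elif kind == "l":
        extra = fields[6]     # fields[5] is the "->" arrow
    else:
        raise ValueError(f"unknown entry type {kind!r} in line: {line!r}")
    return ManifestEntry(kind, int(mode, 8), user, group, path, extra)
```

An unknown type character now fails loudly instead of silently falling into the except branch.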
Setting up the user and group was a simple change to my NixOS config:
users = {
  users.splunk = {
    isSystemUser = true;
    group = "splunk";
  };
  groups.splunk = { };
};
Well, that didn't work either; I was still getting the same error. To dig into
exactly what dynamic linking bullshittery was going on, it was time to bust out
strace. No bueno, the only syscall I could see was:
execve("/opt/splunk/bin/splunk", ["/opt/splunk/bin/splunk"], 0x7ffe600af570 /* 106 vars */) = -1 ENOENT (No such file or directory)
strace: exec: No such file or directory
+++ exited with 1 +++
Not very useful at first glance, but if you fire up the man page for execve
you'll notice this key refresher:
pathname must be either a binary executable, or a script starting with a line of the form: #!interpreter [optional-arg]
So a binary or a script denoted with a shebang? For some odd reason this reminded me of an interesting article I skimmed a while ago about creating the smallest possible binary by defining a custom binary format, with a custom kernel module to serve things like requests for shared objects: https://www.muppetlabs.com/~breadbox/txt/mopb.html
Remembering this, I decided to check out exactly how the splunk binary was being
interpreted by running readelf -a splunk | grep interpreter and was greeted
with the problem:
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
Anyway, this is ELF (not the obscure but cool binfmt I mentioned above), so the
program interpreter is just a path string stored in the binary (the PT_INTERP
program header points at it) that tells the kernel what to hand the binary to.
This is usually /lib64/ld-linux-x86-64.so.2 on most x86-64 Linux systems.
ld-linux is the runtime linker, which loads shared objects after the kernel has
loaded the binary. That path doesn't exist on my system, which explains the
ENOENT from execve even though the splunk binary itself is right there.
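To make "just a string in the binary" a bit more concrete, here is a minimal sketch of pulling that string out by hand, which is roughly what readelf is reporting. It only handles 64-bit little-endian ELF, the read_interp name is made up for this post, and the offsets are the standard ELF64 header layout rather than anything Splunk-specific.

```python
import struct
from typing import Optional

PT_INTERP = 3  # program header type for the interpreter path


def read_interp(data: bytes) -> Optional[str]:
    """Return the requested program interpreter, or None if there isn't one."""
    if len(data) < 64 or data[:4] != b"\x7fELF":
        return None                      # not an ELF file at all
    if data[4] != 2 or data[5] != 1:
        return None                      # sketch only covers ELF64, little-endian
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)        # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _, p_offset, _, _, p_filesz = struct.unpack_from("<IIQQQQ", data, off)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None                          # static binary or shared object without PT_INTERP
```

Usage would be something like read_interp(open("splunk", "rb").read()). Reading the whole file is lazy but fine for a one-off check.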
Let's compare this to a binary that works on my system to highlight the issue:
$ nix-shell -p bintools --command fish # Jump into a shell with readelf installed.
$ readelf -a (which ls) | rg interpreter # Find the interpreter for ls
[Requesting program interpreter: /nix/store/46m4xx889wlhsdj72j38fnlyyvvvvbyb-glibc-2.37-8/lib/ld-linux-x86-64.so.2]
Ah ok, so the interpreter is set to the absolute path of glibc's dynamic linker in the Nix store. This is a very common pattern in NixOS: every dependency is referenced by its full store path, which is what lets changes be computed from a package's dependency graph and applied atomically. I'm very new to NixOS (about 4-5 days in) and I've seen quite a lot of patching of binaries when dealing with packaging pre-built software. This was my main hint for the fix.
Ok so stupid fix:
patchelf --set-interpreter /nix/store/46m4xx889wlhsdj72j38fnlyyvvvvbyb-glibc-2.37-8/lib/ld-linux-x86-64.so.2 splunk
Well, that worked. There are a million reasons why this is a bad idea, but I was
in the middle of a tutorial and had already wasted time trying to connect to the
wifi (EAP) instead of the PSK using wpa_supplicant and wpa_cli (which I
eventually got working by using my phone as a hotspot and connecting to that :^)
smarter not harder).
Ok, so now we have a few different errors, which are really just the same error: we need to fix all the ELF executables shipped with splunk.
Now, I'm not new to the Linux desktop, so abusing the shell to create slow abominations is doable:
sudo fd . --type f | sudo xargs file | rg 'ELF' | awk '{print $1}' | sed -e 's/://' > files.txt
Let's find all files, pipe them to file to get a file description, filter for
those that are ELF binaries, use awk to get the first column (the file path),
and finally remove the trailing : with the stream editor and redirect that into
a scratch file. I did a quick check to see if man find or man fd had a way to
directly filter for ELF binaries but I couldn't find anything. Another route
would be to look at readelf's status code (I'd guess that would work).
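Yet another route, sketched here in Python since I'm apparently learning it anyway: skip file entirely and check the first four bytes of each file for the ELF magic number. This is only a sketch; find_elf_files is a name I made up, and it trusts the magic bytes alone, which is a cruder check than whatever file does.

```python
import os

ELF_MAGIC = b"\x7fELF"  # every ELF file starts with these four bytes


def find_elf_files(root: str) -> list:
    """Walk root and return the sorted paths of regular files that look like ELF."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so each real file is only listed once.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    if fh.read(4) == ELF_MAGIC:
                        matches.append(path)
            except OSError:
                continue  # unreadable file: ignore it, same spirit as the pipeline
    return sorted(matches)
```

Writing "\n".join(find_elf_files(".")) to files.txt should give roughly the same scratch file as the pipeline above, without spawning file for every path.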
Anyways let's use some vim magic to continue our journey:
vi files.txt
Now in vim let's run the following substitute command:
s/\(.*\)/readelf -a \1 | grep Requesting\\ program\\ interpreter:\\ \\\/lib64\\\/ld-linux-x86-64.so.2 > \/dev\/null \&\& echo "\1"/eg
This turns a line like ./mybinary into roughly:
readelf -a ./mybinary | grep interpret > /dev/null && echo "./mybinary"
Now we've transformed our list of files into a list of commands that will either
print the file name (if it requests the interpreter we grepped for) or return a
bad exit code and print nothing (if it does not). Next I press gg to go to the
top of the file, then V to enter visual line mode, and then G to select all the
lines. Invoking command mode with :'<,'>!sh pipes the selected lines through sh,
which runs each line as a command, and replaces the selection with the output. A
line whose command exits non-zero prints nothing, so it disappears from the
buffer. We've now effectively filtered all the files shipped with splunk down to
the ones that need patching. In my case that went from 21365 files to 135 ELF
binaries to just 33 files to patch.
Now we can use the same vim magic to transform this list into a list of patchelf
commands to run:
s/\(.*\)/patchelf --set-interpreter \/nix\/store\/46m4xx889wlhsdj72j38fnlyyvvvvbyb-glibc-2.37-8\/lib\/ld-linux-x86-64.so.2 \1/
and repeat our :'<,'>!sh command to patch all the binaries.
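If vim isn't your thing, the same "filter, then build patchelf commands" step could be sketched in Python. This is a dry run that only builds the command lines; patchelf_commands and its input (a dict mapping each path to the interpreter it currently requests, however you collected that) are hypothetical, and the only patchelf flag used is the --set-interpreter one from above.

```python
OLD_INTERP = "/lib64/ld-linux-x86-64.so.2"


def patchelf_commands(interpreters: dict, new_interp: str,
                      old_interp: str = OLD_INTERP) -> list:
    """Build one patchelf argv per file that still requests old_interp.

    interpreters maps path -> requested interpreter (None if the file has none).
    """
    return [
        ["patchelf", "--set-interpreter", new_interp, path]
        for path, interp in sorted(interpreters.items())
        if interp == old_interp
    ]
```

Each returned list can be handed to subprocess.run(cmd, check=True) once the output looks sane, which at least stops on the first failure instead of silently carrying on.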
And finally I can run splunk! In all honesty this is exactly what I expected when installing software on Linux. It's not a big deal to spend more time upfront on things for a deeper understanding IMO. I'm a bit disappointed the UNI leans into software as proprietary as splunk but I'm happy to learn it either way :^).