April 1, 2023

The Joys of Bootstrapping the IBM backup-archive Client

This is a recap of my struggles with trying to fully automate the initial bootstrap of the IBM backup-archive client.

My goal was to create an automatic bootstrapping tool for setting up IBM Spectrum Protect (formerly Tivoli Storage Manager (TSM), formerly ADSTAR Distributed Storage Manager (ADSM), formerly Workstation Data Save Facility (WDSF)) on Ubuntu servers. The requirements were fairly simple: given some input values like passwords, do a hands-off installation that leaves the client ready to immediately start backing up data, with no interaction from the operator.

The actual installation is fairly straightforward, as IBM provides .tar files containing .deb files which can be installed via dpkg -i. See Installing the Ubuntu Linux x86_64 client for an example of how to get the packages installed.

After this you need to supply some configuration files, dsm.sys and dsm.opt. Still no major trickery going on there, so things were moving forward as one might expect. However, when it came to supplying password credentials to the client, things started getting choppy. In my case I had two passwords I needed to supply to the client: the password used for registering the client with the central backup server, and an encryption password used for client-side encryption of the data before it is sent to the server.
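
To give an idea of what that configuration looks like, here is a minimal sketch of the two files. The server name, address and node name are placeholders for this example, and the exact set of options will depend on your environment:

* dsm.sys
SERVERNAME         MYSERVER
  COMMMETHOD       TCPIP
  TCPSERVERADDRESS backup.example.com
  TCPPORT          1500
  NODENAME         XXXXXXXXXXXX
  PASSWORDACCESS   generate

* dsm.opt
SERVERNAME         MYSERVER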

The client password: calling on the wisdom of the ancients

When setting an initial password for contacting the backup server the general expectation seems to be that you should interactively run the dsmc CLI tool. The tool will then prompt you for the password. For example, the article Starting a command-line session states the following:

Your IBM Spectrum Protect administrator can require you to use a password to connect to the server. The client prompts you for a password, if it is required. Contact your administrator if you do not know your password.

This was obviously a no-go for my needs. Digging around for information regarding automation, I found some guidance in Setting the client scheduler process to run as a background task and start automatically at startup, but it too states:

For the scheduler to start unattended, you must enable the client to store its password by setting the passwordaccess option to generate, and store the password by running a simple client command such as dsmc query session

So even this asks you to run the client interactively... argh.

While it seems doable to work with this interactive UI using something like an Expect script (having decided to use Python for the automation tool, I was already eyeing the pexpect docs), it seemed like a lot of work for something that should be more straightforward.

I did eventually find documentation for a set password command, but it mainly talks about "changing" the password, without directly showing any examples of how to set an initial password. However, some more web crawling led me to a forum thread on a URL named after one of the earlier incarnations of the Spectrum Protect system, namely ADSM: [SOLVED] Initial Login - Automate.

It turns out that just calling dsmc set password <initial password> <initial password> (supplying the same password twice) sets the initial password without any interactive prompt. While I am not the biggest fan of running CLI commands with sensitive arguments (since the information is visible in the process list if anyone happens to be watching), at least it is not a long-lived process and it will generally only be executed at the initial bootstrap of the machine, so the damage is limited. I salute you, my dear ancients!
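
In the automation tool this boils down to a plain subprocess call. A minimal sketch (the dsmc binary is assumed to be on PATH, and error handling is reduced to check=True):

import subprocess

def set_initial_client_password(password: str) -> None:
    # Supplying the same password twice sets the initial password without
    # any interactive prompt. The password is briefly visible in the
    # process list while the command runs.
    subprocess.run(
        ["dsmc", "set", "password", password, password],
        check=True,
    )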

All things considered, while the information was a bit hard to find, there was indeed a way to set the password. Now we only have to set the encryption password; how hard can it be?

The encryption password: oh my...

The dsmc client is able to encrypt data before sending it to the backup server. The way it works is that you configure an include.encrypt pattern in your client configuration. For example, if you wanted to encrypt everything on a UNIX-like machine you could configure the pattern include.encrypt /.../* (based on IBM Spectrum Protect client encryption). When a file path is about to be backed up the client checks if it matches the pattern. If the match is successful the file will be encrypted.

Looking at the Encryptkey docs there are three alternatives as to how the key is managed:

  • Prompt

The user is prompted for the encryption key password when the client begins a backup or archive.

... no, this ain't it.

  • Save

The encryption key password is saved in the backup-archive client password file. A prompt is issued for an initial encryption key password, and after the initial prompt, the saved encryption key password in the password file is used for the backups and archives [...]

... this seems closer to what we want, but again with the prompts? Why are you making my life difficult? Maybe the next one will solve my needs...

  • Generate

An encryption key password is dynamically generated when the client begins a backup or archive. This generated key password is used for the backups of files matching the include.encrypt specification. The generated key password, in an encrypted form, is kept on the IBM Spectrum Protect server. The key password is returned to the client to enable the file to be decrypted on restore and retrieve operations.

... so this is closer to what we want in regards to "hands-off" bootstrapping, but storing the password on the server is a bit too much to ask. The part of client-side encryption I actually want is that the server is not able to decrypt my data.

So what it comes down to is that we need to use the Save setting, while somehow supplying the password without being prompted for it. The way I understand the ordinary Save operation to work is that when a backup is running, and the client decides to back up a file that matches the pattern in your include.encrypt setting, only then will you be prompted for the encryption password if it is not already saved from before.
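
In dsm.sys terms that boils down to something like the following two options (a sketch, with the option names taken from the docs referenced above):

include.encrypt /.../*
encryptkey      save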

This prompt-on-first-matching-backup behavior makes it quite a bit harder to deal with the password prompt from Expect-like tools. Unlike the client password, which is always asked for if it is not set, triggering the encryption password prompt would require looking at our configured include.encrypt pattern(s), finding a file on the system that matches, and then running a backup of that file. What if the file is already backed up? What if the file happens to be really big? This becomes ugly fast.

I had absolutely no luck finding any information on a way to set this password other than having dsmc interactively prompt me for it.

After giving up on my quest of finding an existing solution, I started investigating what actually happens when setting the encryption password interactively. It turns out the files related to this operation are the following:

  • /etc/adsm/TSM.sth
  • /etc/adsm/TSM.KDB
  • /etc/adsm/TSM.IDX

Some descriptions of these files can be found here: Secure password storage.

TSM.KDB is the file actually storing passwords in an encrypted format, TSM.sth stores the encryption key used for encrypting the TSM.KDB contents, and finally there is TSM.IDX: an "index" file "used to track the passwords in the TSM.KDB file".

The article explains that these are "IBM® Global Security Kit (GSKit) keystores". As part of installing the .deb files for the client you also get a tool for interacting with those files: gsk8capicmd_64. Here is a reference PDF I dug up: GSKCapiCmd Users Guide.

Directly after setting the client (not encryption) password via dsmc set password we can see that the password ends up in this file:

# gsk8capicmd_64 -cert -list -db /etc/adsm/TSM.KDB -stashed
Certificates found
* default, - personal, ! trusted, # secret key
#	1680006909_0

... the -stashed argument might look a bit odd but it means we are using the encryption key stored in TSM.sth for decrypting the contents of TSM.KDB.

So what we are seeing is a list of "certificates". The legend describes what type each listed "label" is. In this case 1680006909_0 is a label for a "secret key". The label looks like a Unix timestamp followed by some sort of counter. It turns out we can extract the secret value, like so:

# gsk8capicmd_64 -secretkey -extract -db /etc/adsm/TSM.KDB -label 1680006909_0 -target /root/1680006909_0  -stashed

Inspecting the resulting file we see something like this:

# cat /root/1680006909_0
-----BEGIN SECRET KEY-----
b3JpZ2luYWwtcGFzc3dvcmQtZG8tbm90LXN0ZWFs
-----END SECRET KEY-----

The data in the middle is just a base64-encoded string of the plaintext client password. We can see the cleartext password like this (there is no newline at the end of the string, so be prepared to have your shell prompt appear right after the password):

# grep -v "SECRET KEY" /root/1680006909_0 | base64 -d
original-password-do-not-steal

So now we have a fairly good understanding of the TSM.KDB and TSM.sth files, but what is that TSM.IDX file? I have found no mention of it in the above reference guide, and no available command mentions "index" or "idx". What can we tell by investigating it from our shell?

# ls -l /etc/adsm/TSM.IDX
-rw-rw-r-- 1 root root 645 Mar 30 22:35 /etc/adsm/TSM.IDX

It is fairly small at 645 bytes. Is it a known filetype?

# file /etc/adsm/TSM.IDX
/etc/adsm/TSM.IDX: data

Not very helpful, just generic "data".

Any interesting text strings available in it?

# strings /etc/adsm/TSM.IDX
MYSERVER
XXXXXXXXXXXX
1680006909_0

OK, while it is a binary file the string contents seem fairly understandable. The first part is a capitalized version of the "SERVERNAME" setting in our dsm.sys, the "XXXXXXXXXXXX" part matches the "NODENAME" setting of our client, and the last bit of data is the label that was present in TSM.KDB. So while I am not able to see anything in the file identifying this as a "client" password, it seems to at least point to one.

Let's see if we can glean some more information from the bytes present in the file (the following entry has been generated as a dummy example, but it mimics the real-world data):

# hexdump -C /etc/adsm/TSM.IDX
00000000  ef 05 00 4d 59 53 45 52  56 45 52 00 00 00 00 00  |...MYSERVER.....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  00 00 00 58 58 58 58 58  58 58 58 58 58 58 58 00  |...XXXXXXXXXXXX.|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000180  00 00 00 00 00 31 36 38  30 30 30 36 39 30 39 5f  |.....1680006909_|
00000190  30 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |0...............|
000001a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000280  00 00 00 00 00                                    |.....|
00000285

It starts off with three fairly specific-looking bytes, EF, 05 and 00, then the SERVERNAME, followed by null bytes until the NODENAME appears, and then some more null padding until we reach the timestamp-based key entry reference. Finally there is some additional null padding until the end of the file.

At this point you can try backing up a file matching your include.encrypt pattern via dsmc, have it ask you for an encryption password in order to proceed, and after the operation has concluded you can then inspect TSM.KDB and TSM.IDX again.

The TSM.KDB database now has a new entry, something like this:

# gsk8capicmd_64 -cert -list -db /etc/adsm/TSM.KDB -stashed
Certificates found
* default, - personal, ! trusted, # secret key
#       1680006910_0
#       1680006909_0

So a new entry has been placed at the top of the list. If we extract it we will find the same kind of secret as we saw for the client password, but containing the encryption password instead. How about TSM.IDX?

# ls -l /etc/adsm/TSM.IDX
-rw-rw-r-- 1 root root 1290 Mar 30 22:36 /etc/adsm/TSM.IDX

The file has grown from 645 to 1290 bytes, and the keen-eyed observer might notice that 1290/2 = 645. It seems one "entry" in the file is 645 bytes long. How does the data look now?

# hexdump -C /etc/adsm/TSM.IDX
00000000  ef 05 00 4d 59 53 45 52  56 45 52 00 00 00 00 00  |...MYSERVER.....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  00 00 00 58 58 58 58 58  58 58 58 58 58 58 58 00  |...XXXXXXXXXXXX.|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000180  00 00 00 00 00 31 36 38  30 30 30 36 39 30 39 5f  |.....1680006909_|
00000190  30 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |0...............|
000001a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000280  00 00 00 00 00 ef 05 01  4d 59 53 45 52 56 45 52  |........MYSERVER|
00000290  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000380  00 00 00 00 00 00 00 00  58 58 58 58 58 58 58 58  |........XXXXXXXX|
00000390  58 58 58 58 00 00 00 00  00 00 00 00 00 00 00 00  |XXXX............|
000003a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000400  00 00 00 00 00 00 00 00  00 00 31 36 38 30 30 30  |..........168000|
00000410  36 39 31 30 5f 30 00 00  00 00 00 00 00 00 00 00  |6910_0..........|
00000420  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000500  00 00 00 00 00 00 00 00  00 00                    |..........|
0000050a

From this we can start to see some patterns emerge. The initial EF and 05 bytes are present at the start of both entries, and the following byte is 00 for the first entry and 01 for the second entry, so this looks like an index counter. Other than that the information is the same except for holding different key references. I tried changing the "SERVERNAME" configuration to a string of a different length (think "MYSERVERTEST" instead of "MYSERVER") and the entry remained 645 bytes long.

This convinced me that an entry has a fixed size no matter the length of the strings held inside the entry fields, with null bytes simply filling out the unused space of a given field.

One thing I find unclear about all of this is that none of the entries have any identifying information stating "this is a client password" or "this is an encryption password". At this point I just assume that the index is the deciding factor.

With this new knowledge I started thinking that maybe I could add these entries myself. I had already figured out that I could add a password entry to TSM.KDB by doing this:

# gsk8capicmd_64 -secretkey -add -db /etc/adsm/TSM.KDB -label $(date +%s)_0 -stashed -file /root/my-password-file

This worked as expected. I did notice that I should skip the trailing newline character when writing my password string to /root/my-password-file, because the secret added to TSM.KDB by dsmc did not include a newline.
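
In the automation tool this step looks roughly like the following sketch. The file path and label generation mirror the shell command above (you probably also want to remove the temporary password file afterwards):

import subprocess
import time

def add_kdb_secret(password: str, password_file: str = "/root/my-password-file") -> str:
    # Label mimicking $(date +%s)_0 from the shell command above.
    label = f"{int(time.time())}_0"

    # Write the password without a trailing newline, mirroring what dsmc
    # itself stores in TSM.KDB.
    with open(password_file, "w", encoding="utf-8") as fileobj:
        fileobj.write(password)

    subprocess.run(
        [
            "gsk8capicmd_64", "-secretkey", "-add",
            "-db", "/etc/adsm/TSM.KDB",
            "-label", label,
            "-stashed",
            "-file", password_file,
        ],
        check=True,
    )
    return label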

Adding this password with gsk8capicmd_64 did alter TSM.KDB; unfortunately it did not touch TSM.IDX. Without the pointer in TSM.IDX the client still asked for an encryption password, and I have not been able to find a way to easily manage this file. However, since I was already writing Python to automate the rest of the installation, an idea started to form, equally interesting and disgusting: maybe I could just write out a matching binary entry myself?

Creating my own TSM.IDX entry

What we now know about the IDX file is that it has a fixed entry size, and luckily enough it seems all fields in it are of a fixed length as well. After counting bytes for a while, these are the sizes I came up with (the below is an excerpt from my global Python declarations):

PREFIX_EXPECTED_BYTES = b"\xef\x05"
PREFIX_SIZE = 2
INDEX_SIZE = 1
SERVERNAME_MAX_SIZE = 256
NODENAME_MAX_SIZE = 130
LABEL_MAX_SIZE = 256
ENTRY_SIZE = (
    PREFIX_SIZE + INDEX_SIZE + SERVERNAME_MAX_SIZE + NODENAME_MAX_SIZE + LABEL_MAX_SIZE
)

So we have a "magic" prefix being the two leading bytes for each entry, followed by a 1 byte index counter. Then we have 256 bytes for the servername field, 130 bytes for the nodename field (why not 128 I do not know, I was wondering if I had counted wrong there but it seems not), and finally another 256 bytes for the label.

Adding this up we get an entry size of 645 bytes, which matches the file size we saw for the initial single-entry file.

With this I was able to construct a second entry for the file (expecting the first entry to have been created when setting the client password earlier) like so:

index = 1
servername = "MyServer"
nodename = "XXXXXXXXXXXX"
label = "1680006910_0"

# Build the entry as a byte string, starting out empty.
entry = b""

prefix_bytes = PREFIX_EXPECTED_BYTES
entry += prefix_bytes

# The index is at most one byte so therefore require length=1
index_bytes = index.to_bytes(length=1, byteorder="big")
entry += index_bytes

servername_bytes = upper_if_ascii(servername).encode(encoding="utf-8")
servername_padding_bytes = b"\x00" * (SERVERNAME_MAX_SIZE - len(servername_bytes))
entry += servername_bytes + servername_padding_bytes

nodename_bytes = nodename.encode(encoding="utf-8")
nodename_padding_bytes = b"\x00" * (NODENAME_MAX_SIZE - len(nodename_bytes))
entry += nodename_bytes + nodename_padding_bytes

label_bytes = label.encode(encoding="utf-8")
label_padding_bytes = b"\x00" * (LABEL_MAX_SIZE - len(label_bytes))
entry += label_bytes + label_padding_bytes

Update: See 'Using the python "struct" module' below for a better way of doing this.

For brevity I have omitted things like verifying that the supplied servername, nodename and label fit inside the field boundaries, verifying that the resulting entry length matches ENTRY_SIZE, and so on. What I then did was read the existing file, create my new entry, append my entry to the existing data and write out a new file.
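
The read/append/write part is nothing fancy; roughly something like this, reusing the entry built above:

idx_path = "/etc/adsm/TSM.IDX"

# Read the current contents, append the new entry and write everything back.
with open(idx_path, "rb") as fileobj:
    existing_data = fileobj.read()

with open(idx_path, "wb") as fileobj:
    fileobj.write(existing_data + entry)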

So by building up the entry field by field, filling any remaining space in a given "field" with null characters, I could now create my own TSM.IDX entry to match the TSM.KDB entry created with gsk8capicmd_64.

The upper_if_ascii() call was a result of me playing around with how the file reacted if I inserted UTF-8 characters like "åäö" in the SERVERNAME configuration. It turns out the entry inserted in the file would have all ASCII characters uppercased, but for some reason the two-byte åäö characters were left untouched.

This meant that calling the Python built-in upper() method on the string would add a field containing MYSÖRVER, whereas dsmc itself would insert MYSöRVER. While it is not likely we will use non-ASCII characters, it felt wrong to leave this known inconsistency in there. upper_if_ascii() just walks the characters of the string: if a character is ASCII it calls upper() on it, otherwise it returns the character as it is.
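
For reference, a minimal version of such a helper could look like this:

def upper_if_ascii(value: str) -> str:
    # Uppercase only the ASCII characters, leaving multi-byte characters
    # like "åäö" untouched, mimicking what dsmc writes to TSM.IDX.
    return "".join(char.upper() if char.isascii() else char for char in value)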

In addition to creating the entries, I also found myself needing to parse the file to see which entries were already taken (to tell whether we needed to add our extra entry, and so on).

Given that I had the entry field sizes, I could iterate over any number of entries in the file and carve out the different fields like this:

with open(path, "rb") as fileobj:
    data = fileobj.read()

num_entries = int(len(data) / ENTRY_SIZE)

entry_index = 0
while entry_index < num_entries:
    # Set the entry offset in the file based on what entry we are on,
    # starting with 0 for the first entry and then going forward.
    prefix_offset = entry_index * ENTRY_SIZE
    prefix_bytes = data[prefix_offset : prefix_offset + PREFIX_SIZE]

    if prefix_bytes != PREFIX_EXPECTED_BYTES:
        raise ValueError(
            "unexpected start of IDX entry, expected: '{PREFIX_EXPECTED_BYTES}', actual: '{prefix_bytes'}"
        )

    index_offset = prefix_offset + PREFIX_SIZE
    index_byte = data[index_offset : index_offset + INDEX_SIZE]

    # The byteorder does not matter for a single byte, so "big" has just been
    # randomly selected. If the index is ever allowed to be more than one byte
    # this needs to be verified, but we raise an exception above if this is
    # the case so it is noticed.
    index = int.from_bytes(index_byte, byteorder="big")

    servername_offset = index_offset + INDEX_SIZE
    servername_bytes = data[
        servername_offset : servername_offset + SERVERNAME_MAX_SIZE
    ]
    servername = servername_bytes.decode(encoding="utf-8").rstrip("\x00")

    nodename_offset = servername_offset + SERVERNAME_MAX_SIZE
    nodename_bytes = data[nodename_offset : nodename_offset + NODENAME_MAX_SIZE]
    nodename = nodename_bytes.decode(encoding="utf-8").rstrip("\x00")

    label_offset = nodename_offset + NODENAME_MAX_SIZE
    label_bytes = data[label_offset : label_offset + LABEL_MAX_SIZE]
    label = label_bytes.decode(encoding="utf-8").rstrip("\x00")

    # Go to next entry
    entry_index += 1

Update: See 'Using the python "struct" module' below for a better way of doing this.

Similarly, I have omitted some sanity checking above, like verifying that the length of the data read from the file is evenly divisible by the entry size.
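
For completeness, that particular check is just a couple of lines:

if len(data) % ENTRY_SIZE != 0:
    raise ValueError(
        f"IDX file size {len(data)} is not a multiple of the entry size {ENTRY_SIZE}"
    )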

With these things handled, the automation tool was now able to take an encryption password as input, call gsk8capicmd_64 to add an entry to TSM.KDB and then call my own functions for appending the necessary entry to TSM.IDX. I can finally bootstrap a client without having to be prompted for passwords. So bring out the celebratory beverages, right? Well...

Abandon hope all ye who modify backend files directly

This approach of modifying backend files directly is ultimately a dirty solution and most definitely not what the designers of the system intended. While I am happy that I was able to create a workaround to improve server handling, there are no guarantees that this will keep working. Just as an example, the previously mentioned Secure password storage article already describes a migration from an earlier file setup to the one I have been dealing with here:

When you upgrade to the IBM Spectrum Protect 8.1.2 or later client from an earlier client that uses the old password locations, the existing passwords are migrated to the following files in the new password store

The hack I came up with above is not likely to survive such future changes in a graceful way (though how likely such a future change is, I do not know either).

Dear IBM...

If anyone happens to know a better way to deal with this, one that can be carried out without interactive prompts, I would be very interested to hear about it. Throughout this experience I have felt a constant "I can't be the only one wanting to bootstrap this client without interactive prompts".

If I have indeed not missed something, and there actually is no way to do this other than the hacky way described above, then I would kindly request that someone at IBM considers this and supplies a standard, script-friendly way of setting this information. With the general notion that servers are something you spin up programmatically these days, there should be no need to visit them individually the way the backup client seems to expect.

Update: Using the python "struct" module

A friend of mine who read this article mentioned the Python struct module as an easier way to work with the binary TSM.IDX entries, as opposed to the manual padding I was doing above. Looking into it, this indeed made things much prettier. When using the struct module you describe the format of the data, and can then pack or unpack information based on that format. In my case the format ended up being defined (and verified) like this:

import struct
import sys

# Struct format string based on the numbers above (INDEX_SIZE is replaced by
# an unsigned char "B")
STRUCT_FORMAT = (
    f"{PREFIX_SIZE}sB{SERVERNAME_MAX_SIZE}s{NODENAME_MAX_SIZE}s{LABEL_MAX_SIZE}s"
)

# Verify that our manually counted sizes match the size resulting from our
# struct format string
if ENTRY_SIZE != struct.calcsize(STRUCT_FORMAT):
    print(
        f"ENTRY_SIZE ({ENTRY_SIZE}) does not match struct format string size ({struct.calcsize(STRUCT_FORMAT)})"
    )
    sys.exit(1)

Then, creating an entry (with the same padding as was done above) is as easy as this:

entry = struct.pack(
    STRUCT_FORMAT,
    PREFIX_EXPECTED_BYTES,
    index,
    bytes(upper_if_ascii(servername), "utf-8"),
    bytes(nodename, "utf-8"),
    bytes(label, "utf-8"),
)

... and going in the other direction, reading out fields from the binary data:

(
    prefix_bytes,
    index,
    servername_bytes,
    nodename_bytes,
    label_bytes,
) = struct.unpack(
    STRUCT_FORMAT, data[prefix_offset : prefix_offset + ENTRY_SIZE]
)
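
One thing to keep in mind is that the unpacked byte fields still carry their null padding, so they need the same decode-and-strip treatment as in the manual parser:

servername = servername_bytes.decode(encoding="utf-8").rstrip("\x00")
nodename = nodename_bytes.decode(encoding="utf-8").rstrip("\x00")
label = label_bytes.decode(encoding="utf-8").rstrip("\x00")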

As can be seen, this saves you the work of doing manual padding calculations and has much nicer UX once the format string has been defined. Thanks a lot for the feedback, blueCmd!