SSHMux Part 2 - The Internals

In my last post I told the tale of how I ended up writing my own SSH command runner / multiplexer, which I creatively named SSHMux. In this post I aim to give a more thorough examination of the internals of SSHMux and explain how and why I chose to do things a certain way. Fair warning: this post will be quite dense and code heavy, so proceed at your own risk.

First, let’s discuss the config loader. This is the module responsible for digesting the TOML file that defines the list of hosts and the command to run on each of them. For reference, here is an example of what that TOML file looks like.

command = "uptime"

[[hosts]]
host = "server1.local"
user = "root"
identity_file = "~/.ssh/id_rsa"

[[hosts]]
host = "server2.local"
user = "notroot"
identity_file = "~/.ssh/other_id_rsa"

[[hosts]]
host = "server3.local"
user = "notroot"
port = 2220
identity_file = "~/.ssh/id_rsa"

We define a command; in this case we’re just running uptime. Then we define 3 hosts, with the flexibility to set a different user, auth key, and port for each host independently. This was an important design choice, as the problem that led to the development of SSHMux required exactly this kind of flexibility.

To digest this file we start with the bread and butter of any parser, a basic set of structs…who would’ve guessed?

#[derive(Debug, Deserialize, Clone)]
#[serde(deny_unknown_fields)]
pub struct Host {
    pub host: String,
    pub user: Option<String>,
    pub port: Option<u16>,
    pub identity_file: Option<String>,
}

#[derive(Debug, Deserialize)]
#[serde(deny_unknown_fields)]
pub struct Config {
    pub command: String,
    pub hosts: Vec<Host>,
}

Fairly standard affair here, nothing fancy, nothing over the top. I would like to draw attention to the serde(deny_unknown_fields) attribute. Without over-explaining it, it makes deserialization fail if the config file contains any field that is not defined in the structs; when that error comes back, we bail and exit non-zero. While it’s not strictly necessary, I have adopted this as standard practice in most parser code I write, in one form or another. It’s one of those “good practice” things: I would rather think of it as overkill than as something I should have done.

The rest of the config loader module is just 2 functions, load_config and validate_config. The load_config function takes a single argument, the file path, and passes it to fs::read_to_string. Now that we have the contents of the config file, we can deserialize them into our Config struct with toml::from_str. The function returns our Config wrapped in a Result for appropriate error checking, and we’re off to the races.

Our validate_config function is where we start doing some actual work. We take our newly loaded Config and iterate over the defined hosts, performing some basic sanity checks to catch anything that might cause errors or misbehavior later on. To start, I define a bog-standard for loop that iterates over the list of hosts pulled from the config.

Check number 1: has the target address / hostname been defined? It sounds simple enough, but what if it hasn’t? Due to poor naming conventions and a lack of creativity, the hosts field in the Config is a vector of Host structs, and in the loop each Host is referenced as host but also contains a field called host. That leads to the if statement looking like this:

// Note: the i variable simply references the index of the current Host being checked.
if host.host.trim().is_empty() {
  anyhow::bail!("Host {} undefined hostname", i);
}

Simply put, if we have a host in our config that doesn’t have a hostname defined, we can’t target it, so we bail and tell the user to fix it.

Check number 2: duplicate host entries. This one is meatier, but it again guards against an obvious oversight and is another case of protecting the user (me) from themselves.

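// Declared once, before the loop over hosts.
let mut seen = std::collections::HashSet::new();

// Inside the loop: the key is user@host, or just host when no user is set.
let key = if let Some(user) = &host.user {
  format!("{}@{}", user, host.host)
} else {
  host.host.clone()
};

// insert() returns false if the key was already in the set.
if !seen.insert(key.clone()) {
  anyhow::bail!("Duplicate host entry found");
}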

HashSet::insert() has a neat feature: it inserts the given value into the set and returns a boolean telling you whether the value was newly added. It returns true if the value was not already present and false if it was, which is exactly a uniqueness check. Leveraging this, we define a key. If the host has a username defined, we use user@host, so that running the command as multiple users on the same target is not flagged as a duplicate. Otherwise we just use the hostname for this check.

If both of these checks pass for every host, the validate_config function will return Ok(()) and we can move on with the rest of the program.
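To see how the two checks sit together inside one loop, here is a rough, standard-library-only sketch. It is not the actual SSHMux source: the Host struct is trimmed down to the two fields the checks need, the function name validate_hosts is made up for this example, and errors are plain Strings where the real code uses anyhow::bail!.

```rust
use std::collections::HashSet;

// Trimmed-down stand-in for the post's Host struct (serde derives omitted).
#[derive(Debug, Clone)]
pub struct Host {
    pub host: String,
    pub user: Option<String>,
}

// Assumption: errors are plain Strings here; the post uses anyhow::bail! instead.
pub fn validate_hosts(hosts: &[Host]) -> Result<(), String> {
    // Declared once, outside the loop, so it remembers every key seen so far.
    let mut seen = HashSet::new();

    for (i, host) in hosts.iter().enumerate() {
        // Check 1: a blank hostname cannot be targeted.
        if host.host.trim().is_empty() {
            return Err(format!("Host {} undefined hostname", i));
        }

        // Check 2: the same user@host (or bare host) must not appear twice.
        let key = match &host.user {
            Some(user) => format!("{}@{}", user, host.host),
            None => host.host.clone(),
        };
        // insert() returns false when the key was already in the set.
        if !seen.insert(key) {
            return Err(format!("Duplicate host entry found at index {}", i));
        }
    }

    Ok(())
}
```

With this shape, root@server1.local and notroot@server1.local pass as two distinct entries, while listing root@server1.local twice is rejected.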

Moving on to the meat and potatoes of the whole program: the runner functions that create the SSH connection and execute the command on each host. Again, this is a simple module consisting of 2 functions with very creative and well-thought-out names, run_ssh and run_all. Side note: both of these functions are async, so that the hosts can be handled concurrently rather than one at a time.

First up, the run_ssh function. This function takes in the Host and a string containing the command defined in the config file.

async fn run_ssh(host: &Host, command: &str)

From there the function pulls out the port, username, and identity file defined in the config file. For the port, .unwrap_or is used to default the value to 22 if none was defined; the other values are left as Options so we can branch on them later. Next a mutable vector called ssh_args is defined. Initially it contains the -p option followed by the port, which is either 22 or the custom value given in the config.

Next is the first conditional, the identity file. Using if let Some() gives us a quick way to determine whether a -i option needs to be passed to the ssh command. Here is the full conditional statement.

if let Some(identity) = identity_file {
  ssh_args.push("-i".to_string());
  ssh_args.push(identity.clone());
}

Not much to explain here: if identity_file contains anything, we push -i and the path (for example ~/.ssh/id_rsa) onto the arguments vector as two separate entries.

Next we determine how to address the target host: user@host or just host. This might seem strange, but where there are multiple users and complex ACLs in place, or when machines execute scripts against other machines, it’s not uncommon for the username on the local machine to match the username on the server, in which case ssh needs no explicit user. This is achieved, again, with a very simple if let Some() check against the user field, creating a new variable called destination.

let destination = if let Some(user) = user {
  format!("{}@{}", user, host.host)
} else {
  host.host.clone()
};

After all of this, the destination and the command are pushed onto the vector.

ssh_args.push(destination);
ssh_args.push(command.to_string());

For reference this entire function exists to generate a command that looks like this.

$ ssh -p 22 -i ~/.ssh/id_rsa user@server.local uptime
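The argument-building steps above can be pulled into one small function, which makes them easy to check in isolation. This is a sketch under a few assumptions rather than the real run_ssh: build_ssh_args is a hypothetical helper name, the Host struct is re-declared without its serde derives so the block stands alone, and only the standard library is used.

```rust
// Stand-in for the post's Host struct (serde derives omitted).
pub struct Host {
    pub host: String,
    pub user: Option<String>,
    pub port: Option<u16>,
    pub identity_file: Option<String>,
}

// Hypothetical helper: returns everything that follows `ssh` on the command line.
pub fn build_ssh_args(host: &Host, command: &str) -> Vec<String> {
    // Fall back to the standard SSH port when the config omits one.
    let port = host.port.unwrap_or(22);
    let mut ssh_args = vec!["-p".to_string(), port.to_string()];

    // Only pass -i when an identity file was configured.
    if let Some(identity) = &host.identity_file {
        ssh_args.push("-i".to_string());
        ssh_args.push(identity.clone());
    }

    // user@host when a user is set, otherwise the bare hostname.
    let destination = match &host.user {
        Some(user) => format!("{}@{}", user, host.host),
        None => host.host.clone(),
    };

    ssh_args.push(destination);
    ssh_args.push(command.to_string());
    ssh_args
}
```

Each option and its value are separate entries in the vector because the arguments are handed to the process directly, not through a shell that would split a single string on spaces.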

And then run it by calling .spawn() on the Command:

let mut child = match Command::new("ssh")
  .args(&ssh_args)
  .stdout(Stdio::piped())
  .stderr(Stdio::piped())
  .spawn()
{
  Ok(child) => child,
  Err(e) => {
    eprintln!("Failed to spawn ssh command: {}", e);
    return;
  }
};

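The run_all function is simply a wrapper around run_ssh. It iterates over our list of hosts and creates a “task” for each one using tokio::spawn, which starts the task running in the background. The task handles are collected in a vector, and that vector is then looped over and each handle awaited, so run_all only returns once every host has finished.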

pub async fn run_all(config: &Config, verbose: bool) {
  let mut tasks = vec![];

  for (i, host_config) in config.hosts.iter().enumerate() {
    let command = config.command.clone();
    let host_clone = host_config.clone();

    let task = tokio::spawn(async move {
      run_ssh(&host_clone, &command).await;
    });

    tasks.push(task);
  }

  for task in tasks {
    let _ = task.await;
  }
}
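The two-phase shape of run_all (start everything, then wait for everything) is not specific to tokio. As an illustration only, here is the same pattern with plain OS threads from the standard library. Note the swap: this uses std::thread in place of tokio tasks, and fake_run and run_all_threads are made-up names, with fake_run standing in for run_ssh so the example needs no network.

```rust
use std::thread;

// Hypothetical stand-in for run_ssh: pretends to contact a host and returns a line of output.
fn fake_run(host: &str, command: &str) -> String {
    format!("{}: ran `{}`", host, command)
}

pub fn run_all_threads(hosts: &[String], command: &str) -> Vec<String> {
    let mut handles = Vec::new();

    // Phase 1: start one worker per host. Each begins running as soon as it is spawned.
    for host in hosts {
        // Owned copies are moved into the worker, mirroring the clones in run_all.
        let host = host.clone();
        let command = command.to_string();
        handles.push(thread::spawn(move || fake_run(&host, &command)));
    }

    // Phase 2: wait for each worker in spawn order and collect what it returned.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker panicked"))
        .collect()
}
```

Because the waiting happens in a separate loop, a slow first host does not hold up the others; they are already running while the first handle is being waited on.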

The rest of the program just defines a CLI and some output prettification. If you want to explore the code further, it’s available on GitHub here.