How do you test a command-line interface?

This might seem like an obvious question, but the devil’s in the detail. Command-line interfaces often have lots of flags and options. Things can get complex quickly.

So how should we manage this complexity?

The answer? Fluent interfaces, an object-orientated design pattern using method chaining to create a (sort of) domain specific language to increase code legibility. Think of building a sentence one word at a time.

In this post, we’ll explore fluent interfaces by testing a small portion of a very well-known Unix command.

A `cat` Has 9 Lives 12 Flags#

Let’s look at the Linux cat command.

From the man page:

NAME
       cat - concatenate files and print on the standard output

SYNOPSIS
       cat [OPTION]... [FILE]...

DESCRIPTION
       Concatenate FILE(s) to standard output.

       With no FILE, or when FILE is -, read standard input.

       -A, --show-all
              equivalent to -vET

       -b, --number-nonblank
              number nonempty output lines, overrides -n

       -e     equivalent to -vE

       -E, --show-ends
              display $ at end of each line

       -n, --number
              number all output lines

       -s, --squeeze-blank
              suppress repeated empty output lines

       -t     equivalent to -vT

       -T, --show-tabs
              display TAB characters as ^I

       -u     (ignored)

       -v, --show-nonprinting
              use ^ and M- notation, except for LFD and TAB

       --help display this help and exit

       --version
              output version information and exit

How would you test this utility with all its flags and options?

To keep the scope small, let’s only test cating a file, and the --help and --number flags.

With that in mind, here’s a first attempt:


from pathlib import Path
import subprocess

# given a command string, run it and return the result
def run_command(command: str):
    return subprocess.run(
        command.split(" "), 
        stdout=subprocess.PIPE, 
        stderr=subprocess.PIPE,
        encoding="utf-8",
    )

# test 'cat --help'
def test_help_flag(expected_help_text: str):
    assert run_command("cat --help").stdout == expected_help_text

# taking a file as an argument, test 'cat <filename>'
def test_catting_file(filename: Path):
    file_contents = open(filename, encoding="utf-8").read()
    assert run_command("cat " + str(filename)).stdout == file_contents

# given 2 files - 'f1' with line numbers and 'f2' without,
# test 'cat --number f2' gives same result as 'f1'
def test_line_numbers(f1: Path, f2: Path):
    expected_line_numbers = open(f1, encoding="utf-8").read()
    cat_line_numbers = run_command("cat --number " + str(f2)).stdout
    assert cat_line_numbers == expected_line_numbers

What’s wrong with that? Well…

It’s not very clear what’s being tested as the test commands are constructed through string concatenation (+).
It violates the DRY principle as we need to copy “cat…” for each test.
What will we do if the functionality of cat is extended? We’d have to go back and make more copies of our tests with slight changes.

Becoming Fluent#

The problem is our approach to testing variants of the cat command produces neither clear nor scalable code.

To address this, let’s create a class to help us build instances of the cat command:

from pathlib import Path
from typing import List
from typing_extensions import Self

class Cat:
    
    # initialise with an optional file name (the file to "cat")
    # as well as switches for '--help' and '--number'
    def __init__(self, file_name: Path | None = None) -> None:
        self._command: List[str] = ["cat"]
        self._help: bool = False
        self._number: bool = False
        self._file: Path | None = file_name  # the file to cat
    
    # set the '--help' flag
    def show_help(self) -> Self:
        self._help = True
        return self
    
    # set the '--number' flag
    def show_line_nums(self) -> Self:
        self._number = True
        return self
    
    # build the complete command string
    def build(self) -> List[str]:
        if self._help:
            self._command.append("--help")
        if self._number:
            self._command.append("--number")
        if self._file:
            self._command.append(str(self._file))
            
        return self._command

With our new Cat class, we can rewrite our tests from earlier. We’ll also change the run_command function so that it accepts an instance of our Cat class rather than a str:

import subprocess

# given a Cat instance, build and run it then return the result
def run_command(command: Cat):
    return subprocess.run(
        command.build(), 
        stdout=subprocess.PIPE, 
        stderr=subprocess.PIPE,
        encoding="utf-8",
    )

# test 'cat --help'
def test_help_flag(expected_help_text: str):
    assert run_command(Cat().show_help()).stdout == expected_help_text

# taking a file as an argument, test 'cat <filename>'
def test_catting_file(filename: Path):
    file_contents = open(filename, encoding="utf-8").read()
    assert run_command(Cat(filename)).stdout == file_contents

# given 2 files - 'f1' with line numbers and 'f2' without,
# test 'cat --number f2' gives same result as 'f1'
def test_line_numbers(f1: Path, f2: Path):
    expected_line_numbers = open(f1, encoding="utf-8").read()
    cat_line_numbers = run_command(Cat(f2).show_line_nums()).stdout
    assert cat_line_numbers == expected_line_numbers

By turning "cat --help", "cat " + filename and "cat --number " + filename into Cat().show_help(), Cat(filename) and Cat(filename).show_line_nums(), we’ve turned plain-old string concatenation into a sort-of sentence explaining what’s happening at each step.

Now it’s clear exactly what we’re testing.

Food for Thought#

While we’ve improved readability, there are some things to consider.

Notice in our Cat class each of the methods (apart from build) returns Self? This is important as its what makes fluent interfaces work.

Fluent interfaces rely on method chaining - calling multiple methods on the same object. In order to do that, you must return a reference to the original object in each method, hence return self.

Also, in my implementation above, its easy to generate nonsense command strings. For example, you could write: run_command(Cat(filename).show_help().show_line_nos()), which translates to cat --help --number <filename> - that doesn’t make sense.

To prevent this, we could make the build method more defensive, so that flags are only added where they make sense. Here’s a revised Cat class with validation:

class Cat:

    # initialise with an optional file name (the file to "cat")
    # as well as switches for '--help' and '--number'
    def __init__(self, file_name: Path | None = None) -> None:
        self._command: List[str] = ["cat"]
        self._help: bool = False
        self._number: bool = False
        self._file: Path | None = file_name  # the file to cat
    
    # set the '--help' flag
    def show_help(self) -> Self:
        self._help = True
        return self
    
    # set the '--number' flag
    def show_line_nums(self) -> Self:
        self._number = True
        return self
    
    # build the complete command string
    def build(self) -> List[str]:

        if sum([self._help, self._number]) > 1:
            raise ValueError("Only one of '--help' or '--number' can be set.")
        
        if self._help and self._file:
            raise ValueError("Can't set file name when setting '--help' flag.")
        
        # Then build command
        if self._help:
            self._command.append("--help")
        elif self._number:
            self._command.append("--number")
        
        if self._file and not self._help:  # Only add file if not help mode
            self._command.append(str(self._file))
            
        return self._command

Though this version handles errors, you can see the complexity rising even with this small example. Ultimately, it’s a choice between preventing invalid constructions, or embracing the “garbage in, garbage out” philosophy.

Conclusion#

Command-line tools often look simple — until you start testing them. Between numerous flags, optional inputs, and subtle behaviours, it’s easy to end up with brittle, repetitive test code that’s hard to reason about and even harder to maintain.

In this post, we explored how fluent interfaces offer a clean and expressive alternative. By modeling the cat command as a class with chainable methods, we were able to build test cases that read more like plain English and less like string puzzles. The result: better readability, less duplication, and a clearer link between what’s being tested and why.

Of course, fluent interfaces aren’t a silver bullet. They require discipline in design, and as we saw, it’s still possible to misuse them if you’re not careful. But for many scenarios, they strike a strong balance between flexibility and structure.

And remember: this pattern isn’t limited to Unix commands. You can apply the same technique to APIs, database queries, file system operations — anything where configurability and clarity matter. So, next time you’re wrangling a pile of test strings, ask yourself: could this be more fluent?

Fluent Interfaces: Making Your Code Tell a Story

Table of Contents

A `cat` Has 9 Lives 12 Flags#

Becoming Fluent#

Food for Thought#

Conclusion#

Fluent Interfaces: Making Your Code Tell a Story

Table of Contents

A cat Has 9 Lives 12 Flags#

Becoming Fluent#

Food for Thought#

Conclusion#

A `cat` Has 9 Lives 12 Flags#