elbencho is an I/O benchmarking tool developed by the illustrious Sven Breuner to combine the best aspects of fio and IOR in a single modern, flexible package. It is much friendlier to run in non-HPC parallel environments since it does not rely on MPI for inter-node synchronization, and it has really nice features like a live text UI so you can watch I/O performance in real time. Its code base is also much cleaner and nicer than IOR’s.
Getting started
Installing elbencho on macOS requires the following:
brew install boost cmake gcc git libarchive curl ossp-uuid make ncurses openssl@3 zlib
And build with
make LDFLAGS_BOOST="-lboost_program_options -lboost_thread"
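For completeness, the full build from source looks something like the following sketch (this assumes Sven's upstream GitHub repository; the resulting binary is the ./bin/elbencho used throughout the examples below):
# grab the source (upstream repository assumed), then build as above
git clone https://github.com/breuner/elbencho.git
cd elbencho
make LDFLAGS_BOOST="-lboost_program_options -lboost_thread"
 
# quick sanity check that the binary built and runs
./bin/elbencho --help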
Single client
To do an fio-like write bandwidth test, I do something like this:
mkdir elbencho.seq.1M
./bin/elbencho ./elbencho.seq.1M \
    --threads 8 \
    --size 8M \
    --block 1M \
    --blockvaralgo fast \
    --blockvarpct 0 \
    --sync \
    --mkdirs \
    --write \
    --delfiles \
    --deldirs \
    --nolive
where all of the arguments fall into one of several groups that affect what the benchmark actually does.
By default, elbencho performs all I/Os to one file, analogous to the IOR shared-file mode. With that in mind, the following define the general parameters of the benchmark:
- elbencho.seq.1M is the name of the output file or directory.
- --threads 8 uses eight I/O threads
- --size 8M generates a file that is 8M (8,388,608 bytes) large
- --block 1M performs writes using 1M (1,048,576 bytes) transfers. Since we’re generating an 8 MiB file using 8 threads, this test will have each thread write exactly 1 MiB at different offsets.
- --blockvaralgo fast uses the “fast” algorithm for filling the write I/O buffers with randomized data
- --blockvarpct 0 generates new random data in 0% of the write blocks (i.e., every write will contain the same randomized data)
- --sync calls sync after every step of the test to ensure that we capture the performance of writing data down to persistent media
The following define what tests to actually run:
- --mkdirs creates a directory in which each thread creates files
- --write performs a write test
- --delfiles deletes the files created during the --write phase
- --deldirs deletes the directories created during the --mkdirs phase
The following affects what information is actually presented to you:
- --nolive disables a curses-like live update screen that pops up for long-running test phases
The order of these options is not important. elbencho will always order the tests in the sensible way (create dirs, create files, write files, delete files, delete dirs).
Since we only specified one output file (elbencho.seq.1M), all threads will
write to one shared file.  You can have elbencho create multiple files like
this:
./bin/elbencho ./outputfile.{0..4}
In this case, five files (outputfile.0, outputfile.1, etc) will be created
and filled in parallel.  How different threads fill different offsets in
different files is described by Sven.
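Note that the {0..4} above is ordinary shell brace expansion rather than elbencho syntax, so the command is equivalent to listing the five file paths explicitly:
./bin/elbencho ./outputfile.0 ./outputfile.1 ./outputfile.2 ./outputfile.3 ./outputfile.4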
Multiple clients
To run on a cluster, you need to be able to open TCP sockets between your compute nodes. Provided you can do that, first consult the official Using elbencho with Slurm page to get an idea of what has to happen.
At NERSC, I had to modify the provided example script to appear as follows:
#!/usr/bin/env bash
#SBATCH -J elbencho
#SBATCH -C gpu
#SBATCH -t 60:00
#SBATCH -G 0
#SBATCH -A nstaff
#SBATCH -N 8
#SBATCH --tasks-per-node 1
#SBATCH --cpus-per-task 16
 
ELBENCHO="$HOME/src/git/elbencho/bin/elbencho"
 
# Build elbencho compatible list of hostnames
HOSTNAMES=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | tr "\n" ",")
 
# Start service on all nodes - note that you have to run the process in the
# foreground at NERSC to prevent srun from falling through, thinking the task
# finished cleanly, and killing the orphaned processes.
srun $ELBENCHO --service --foreground &
 
# Wait for the elbencho service to actually open up its http ports.  Without
# this, the next step may attempt to connect to elbencho services that haven't
# yet initialized, resulting in failure.
sleep 5
 
# Run benchmark
$ELBENCHO --hosts "$HOSTNAMES" \
         --threads 16 \
         --resfile elbencho-result.$SLURM_JOBID.txt \
         --size 1t \
         --block 1m \
         --blockvaralgo fast \
         --blockvarpct 100 \
         --write \
         /vast/$USER/elbencho.data
 
# Quit services
$ELBENCHO --quit --hosts "$HOSTNAMES"
We essentially run this like a hybrid MPI+OpenMP job: there is one elbencho service running on each of the eight compute nodes, and that process is responsible for spawning 16 I/O threads. You could run one thread per service and 16 service processes per node like a straight MPI job, I suppose.
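As an aside, the sleep 5 above is a blunt way of waiting for the services to come up. A slightly more robust sketch (not part of the original script, and assuming elbencho's default service port, which I believe is 1611) polls each node until its service port accepts TCP connections:
# hypothetical replacement for the "sleep 5" above: block until every node's
# elbencho service accepts TCP connections.  The port number is an assumption;
# check elbencho --help for the actual default on your build.
PORT=1611
for host in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    until timeout 1 bash -c "</dev/tcp/$host/$PORT" 2>/dev/null; do
        sleep 1
    done
done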
File per process
The default mode for elbencho is to perform all I/O to a single file. This makes sense when you consider elbencho as a replacement for fio, but it can also aggravate performance problems caused by file system locking, which may not be desirable.
To run elbencho in file-per-process mode (in a way analogous to ior -F), you
can do
mkdir somedirs
./bin/elbencho --threads 32 \
                 --files 1 \
                 --size 1t \
                 --block 1m \
                 --dirsharing \
                 --mkdirs \
                 --sync \
                 --write \
                 ./somedirs
The important parts here are
- --threads 32 is analogous to using 32 processes per node
- --files 1 is the number of files per thread, per process. In this case, each of our 32 threads will create one file and perform I/O to it exclusively.
- --size 1t will make each file 1 TiB. In the above case we have 32 threads, so we will create 32 × 1 TiB files (or 32 TiB total spread over 32 files)
- --dirsharing makes each thread and process create its files in a single shared directory. This isn’t strictly necessary, but this makes elbencho emulate the behavior of IOR’s -F (file per process) option.
If you are running in multiprocess mode (using --hosts), each thread from each
process will create the number of files specified by --files.  For example,
srun -N 6 -n 6 \
     --nodelist node06,node07,node08,node09,node10,node11 \
     ./bin/elbencho --service --foreground
 
mkdir somedirs
 
elbencho --hosts node06,node07,node08,node09,node10,node11 \
         --threads 32 \
         --files 1 \
         --size 1t \
         --block 1m \
         --dirsharing \
         --mkdirs \
         --sync \
         --write \
         ./somedirs
The above
- Starts one elbencho service on each of six nodes (srun -N 6 -n 6)
- Creates the directory in which elbencho will generate its directory tree
- Runs a test across six nodes, each running 32 threads, with each thread generating a single 1 TiB file. A total of 192 files (192 TiB) will be created. That’s a lot of data!
Single shared file
Elbencho can also emulate IOR’s default strided access pattern using --strided:
elbencho --threads 2 \
         --size 8192 \
         --block 1024 \
         --write \
         --hosts node0,node1 \
         --strided \
         ./somefile
The above creates a single 8 KiB file called somefile, and each node writes 4 KiB to it using 1 KiB transfers. The exact interleaving pattern is as follows:
| node | thread | worker | block index | byte range |
|---|---|---|---|---|
| node0 | thread0 | worker 0 | 0 | 0 - 1023 | 
| node0 | thread1 | worker 1 | 1 | 1024 - 2047 | 
| node1 | thread0 | worker 2 | 2 | 2048 - 3071 | 
| node1 | thread1 | worker 3 | 3 | 3072 - 4095 | 
| node0 | thread0 | worker 0 | 4 | 4096 - 5119 | 
| node0 | thread1 | worker 1 | 5 | 5120 - 6143 | 
| node1 | thread0 | worker 2 | 6 | 6144 - 7167 | 
| node1 | thread1 | worker 3 | 7 | 7168 - 8191 | 
That is, elbencho behaves the same as IOR would if IOR’s blockSize was the same as transferSize. segmentCount is inferred from elbencho’s --size and --block parameters.
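For comparison, a rough IOR equivalent of the two-node example above would be something like the following sketch (the mpirun invocation and ior path are illustrative): four processes, 1 KiB transfers, 1 KiB blocks, and segmentCount = size / (block × workers) = 8192 / (1024 × 4) = 2.
# 4 processes writing a strided shared file: 1 KiB transfers, 1 KiB blocks,
# 2 segments per process, for an 8 KiB file total
mpirun -np 4 ./ior -w -t 1k -b 1k -s 2 -o ./somefile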
Metadata testing
elbencho can also perform metadata performance testing like what mdtest does. To test the rate at which a client can create empty files, do something like
mkdir somedirs
./bin/elbencho --threads 1 \
               --size 0 \
               --files 10000 \
               --mkdirs \
               --write \
               --delfiles \
               --deldirs \
               ./somedirs
This will run, using a single thread (--threads 1), a test where
- a new directory is created called ./somedirs/r0/d0/ (--mkdirs)
- 10,000 new files are created (--files 10000)
- nothing is written to them so they stay empty (--size 0)
- those files are deleted (--delfiles)
- the directory structure created in step 1 is all deleted (--deldirs)
Like with mdtest, the --files argument specifies the number of files to create
per parallel worker, so using two threads will create twice as many files and
directories:
mkdir somedirs
./bin/elbencho --threads 2 \
               --size 0 \
               --files 10000 \
               --mkdirs \
               --write \
               ./somedirs
This will create the following:
find somedirs -type d
# somedirs
# somedirs/r0
# somedirs/r0/d0
# somedirs/r1
# somedirs/r1/d0
 
find somedirs/r0 -type f | wc -l
# 10000
 
find somedirs/r1 -type f | wc -l
# 10000
You can force all threads to create all files in a single shared directory using
the --dirsharing argument.  Re-running the above example with this additional
argument gives us a different directory tree:
mkdir somedirs
./bin/elbencho --threads 2 \
                 --size 0 \
                 --files 10000 \
                 --mkdirs \
                 --write \
                 --dirsharing \
                 ./somedirs
 
# ...
 
$ find somedirs -type d
# somedirs
# somedirs/r0
# somedirs/r0/d0
 
$ find somedirs/r0 -type f | wc -l
# 20000
More patterns
You can see exactly what order elbencho issues I/Os to files in by using the --opslog somefile.jsonl option. It generates a JSON Lines (jsonl) file with the offsets of each I/O issued for a job.
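For example, a minimal sketch of this applied to the strided test from earlier (run here with two local threads rather than two hosts) would look like:
./bin/elbencho --threads 2 \
               --size 8192 \
               --block 1024 \
               --strided \
               --write \
               --opslog ops.jsonl \
               ./somefile
 
# each line of ops.jsonl is one JSON record describing a single I/O, so the
# first few lines show the order and offsets of the initial writes
head -n 4 ops.jsonl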