Background

A customer who used a Solaris-based embedded UNIX system in his medical products was running into serious performance problems. The UNIX system communicated with the medical hardware via a SCSI channel, and the system's hard drive shared that same channel.

The data-transfer requirements for the medical hardware were severe: once data started flowing, it could not be interrupted, or there would be an underflow condition, which was fatal (to the software, not the patient). Under normal conditions the system had enough headroom to accommodate the transfer without incident. But if inbound jobs for the device arrived at the same time, processing them consumed enough disk I/O bandwidth on the shared SCSI bus to cause an underflow in the single job that was being transferred to the hardware.

These transfer failures were showing up often enough in the field to constitute a real problem for the tech support department, and there was enormous pressure to "fix it" somehow.

My customer was considering a drastic approach: create a lockfile while the data transfer was occurring, and modify much of his software to check for this lockfile periodically and "slow down" when it found the file there. He reasoned that by throttling disk I/O this way, he could reduce the non-transfer activity enough to stay within the SCSI bus's bandwidth limits. But it meant modifying, and testing, a lot of software. He hated this idea.

An even worse idea was redesigning the SCSI subsystem. He really hated this idea.

The solution

My approach was to write a kind of system debugger, the "procmgr". It was a background daemon that scanned the system's process table at regular intervals and, when it found a "troublesome" process, throttled it if a critical data transfer was in progress.
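The daemon's periodic scan of the process table can be sketched by listing the numeric entries in /proc, one per process. This is a minimal sketch under stated assumptions, not the original procmgr code: the helper names scan_pids() and is_pid_name() are hypothetical, and mapping each PID to a process name (which differs between Solaris and Linux) is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return nonzero if name is non-empty and consists only of decimal digits. */
static int is_pid_name(const char *name)
{
    if (*name == '\0')
        return 0;
    for (; *name != '\0'; name++)
        if (!isdigit((unsigned char)*name))
            return 0;
    return 1;
}

/*
 * Hypothetical helper: collect up to max PIDs from /proc into pids[].
 * Returns the number of PIDs stored, or -1 if /proc cannot be opened.
 * Zero-padded names such as "00123" (old SVR4-style /proc) parse as 123.
 */
int scan_pids(pid_t *pids, size_t max)
{
    DIR *dir = opendir("/proc");
    struct dirent *ent;
    size_t n = 0;

    if (dir == NULL)
        return -1;

    while (n < max && (ent = readdir(dir)) != NULL) {
        if (is_pid_name(ent->d_name))       /* skip ".", "..", "self", etc. */
            pids[n++] = (pid_t)strtol(ent->d_name, NULL, 10);
    }
    closedir(dir);
    return (int)n;
}
```

A daemon would call scan_pids() at each interval, look up each process's name, and apply its configured mode; the interval itself is not given in the original and would be a tuning choice.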

A text configuration file was used to tell the daemon which processes were to be controlled, and each had one of three possible modes:

Runnable: These processes were either important to the system (init) or didn't use enough resources to be worth bothering with (vi or telnetd). These processes were always left alone.
Throttle: These processes used a lot of I/O, but for various reasons we couldn't simply stop them outright for a minute at a time. These included the FTP daemon, which would cause network clients to time out if it simply stopped. So instead, the procmgr intercepted all read() and write() system calls via the /proc ioctl() interface and inserted a brief pause (250 msec) before allowing the program to continue. The controlled program had no way of knowing that it was being "debugged" in this way, and this dramatically reduced the I/O resources it used while the critical data transfer was going on.
Fullstop: These processes were I/O pigs and were simply stopped in their tracks for the duration of the critical data transfer. These included gzip and many of the other heavy-duty image processing programs.
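The Fullstop mode amounts to freezing a process for the duration of the transfer and resuming it afterward. The original procmgr did this through /proc; the sketch below swaps in the portable POSIX signals SIGSTOP and SIGCONT instead, which is an assumption and not the author's method. The helper names stop_all() and resume_all() are hypothetical, and the caller needs permission to signal the targets (in practice, root).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Fullstop sketch using POSIX job-control signals (a substitute for the
 * original's /proc-based stop). SIGSTOP cannot be caught, blocked, or
 * ignored, so each target halts immediately.
 * Returns the number of processes successfully signalled.
 */
size_t stop_all(const pid_t *pids, size_t n)
{
    size_t stopped = 0;

    for (size_t i = 0; i < n; i++)
        if (kill(pids[i], SIGSTOP) == 0)    /* fails (ESRCH) if the process is gone */
            stopped++;
    return stopped;
}

/*
 * Resume every process when the critical transfer ends. Errors are ignored
 * because a process may have been killed while it was stopped.
 */
void resume_all(const pid_t *pids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        (void)kill(pids[i], SIGCONT);
}
```

The caller would invoke stop_all() just before the critical transfer starts and resume_all() with the same list when it finishes, so no Fullstop process is left frozen. Note that SIGSTOP is a job-control stop, so a parent shell may notice and report the job as stopped.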

Unknown processes were left alone but reported in a debug log so the customer could consider how to properly characterize them.
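The configuration file's format isn't described, so the following sketch assumes a hypothetical one-rule-per-line format such as "ftpd throttle", with "#" starting a comment. The enum, struct rule, parse_rule(), and classify() are illustrative names, not the original procmgr's; the point is the default policy: a process with no matching rule is logged and left alone.

```c
#include <stdio.h>
#include <string.h>

enum mode { MODE_UNKNOWN, MODE_RUNNABLE, MODE_THROTTLE, MODE_FULLSTOP };

struct rule {
    char      name[32];   /* process name as reported in the process table */
    enum mode mode;
};

/* Map a mode keyword to its enum value; MODE_UNKNOWN if unrecognized. */
static enum mode parse_mode(const char *word)
{
    if (strcmp(word, "runnable") == 0) return MODE_RUNNABLE;
    if (strcmp(word, "throttle") == 0) return MODE_THROTTLE;
    if (strcmp(word, "fullstop") == 0) return MODE_FULLSTOP;
    return MODE_UNKNOWN;
}

/*
 * Parse one "name mode" line (hypothetical format; '#' starts a comment).
 * Returns 1 and fills *r on success, 0 for blank, comment, or bad lines.
 */
int parse_rule(const char *line, struct rule *r)
{
    char mode_word[16];

    if (sscanf(line, " %31s %15s", r->name, mode_word) != 2 || r->name[0] == '#')
        return 0;
    r->mode = parse_mode(mode_word);
    return r->mode != MODE_UNKNOWN;
}

/* Look up a process name; unknown processes are logged and left alone. */
enum mode classify(const struct rule *rules, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(rules[i].name, name) == 0)
            return rules[i].mode;

    /* illustrative debug-log message for later characterization */
    fprintf(stderr, "procmgr: unclassified process \"%s\"\n", name);
    return MODE_UNKNOWN;
}
```

Reading the file is then a loop of fgets() and parse_rule(). Rejecting an unknown mode keyword, rather than guessing, keeps a typo in the file from accidentally stopping a process.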

Controlling processes with /proc

The /proc filesystem provides a rich set of primitives for attaching to and controlling processes, and the system debuggers are built around this interface. It replaces the ptrace() interface of older UNIX systems. Linux has adopted a /proc filesystem as well, although Linux debuggers still control processes through ptrace() rather than /proc ioctls.

This code fragment shows roughly how the logic worked, though it drastically simplifies things by controlling only a single process and skipping all error checking. It also ignores checking the global state (whether a critical transfer is in progress), so it mainly illustrates how the /proc interface works.

#include <stdio.h>          // snprintf(), NULL
#include <fcntl.h>          // open(), O_RDWR
#include <unistd.h>         // close(), usleep()
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>    // SYS_read, SYS_write
#include <sys/procfs.h>     // ioctl-based /proc: PIOCSENTRY, PIOCSTATUS, PIOCRUN

void control(pid_t pid)
{
int         fd;             // file descriptor to process
sysset_t    syscalls;       // list of "interesting" system calls
char        fbuf[80];       // buffer for /proc filename
prstatus_t  pstatus;        // status filled in by PIOCSTATUS

    // create /proc filename and open it
    snprintf(fbuf, sizeof fbuf, "/proc/%05d", (int)pid);
    fd = open(fbuf, O_RDWR);

    // mark the "interesting" system calls
    premptyset( &syscalls );
    praddset( &syscalls, SYS_read );
    praddset( &syscalls, SYS_write );

    // set the "syscall entry" mask: the target now stops on entry to read() or write()
    ioctl(fd, PIOCSENTRY, &syscalls);

    while ( ioctl(fd, PIOCSTATUS, &pstatus) == 0 )
    {
        if (pstatus.pr_flags & PR_ISTOP)    // stopped due to tracing?
            ioctl(fd, PIOCRUN, NULL);       // let the pending read()/write() proceed

        usleep(250*1000);            // 250 milliseconds
    }

    // got an error from ioctl, target process must have exited

    close(fd);
}
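A rough estimate (not a figure from the original deployment) shows why this throttle is so effective. A process traced this way stops on entry to every read() or write() and is released only at the daemon's next poll, so with a poll interval T of 250 ms it completes at most about 1/T = 4 such calls per second. If each call transfers at most B bytes, its I/O rate is bounded as follows; the 8 KB buffer size is an assumption chosen only for illustration.

```latex
R_{\max} \lesssim \frac{B}{T} = \frac{8192\ \text{bytes}}{0.25\ \text{s}} = 32768\ \text{bytes/s} = 32\ \text{KiB/s}
```

The actual delay per call varies between zero and T, depending on where in the poll cycle the process stops, but the bound on sustained throughput holds either way.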

Summary

This program was about 2500 lines of C++ and was delivered three days after the initial telephone conversation with the customer. It was tested and put into production, and in the more than five years since it was deployed, the customer has not seen a single critical data-transfer error in the field: the project was considered wildly successful.

About Steve Friedl

UNIX Process Manager
