Testing patches for Live Upgrade compliance

Live Upgrade is a feature of Solaris that lets you create alternate boot environments. This makes it easy to switch between OS builds at boot time, and it also makes upgrading much easier, less risky, and quicker. This extends to patching too.

I recently received a query from a customer asking how we ensure that patches installed via Live Upgrade do not interfere with the running system. As well as ensuring that the patch applies correctly to your alternate boot environment, you need to be sure that the patch is not changing any files or killing processes on your running system.

In Solaris 8 and 9 we use an interposition library to check this. We check all the open*, creat* and *link* calls to ensure that they are dealing with files on the correct boot environment. We allow changes in /tmp etc., and commands also need to load libraries from the running environment, so we make exceptions for these. We also check the kill calls to ensure that processes are not being killed on the running system. An interposition library is one that is usually preloaded using LD_PRELOAD, so that when the runtime linker resolves a call it binds to the definition in our library rather than the one in the system library. Here's a snippet of how we check for creat calls:

 
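#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>

int
creat(char *path, mode_t mode)
{
        char *cmdname = "creat";
        typedef int (*realcreat_t)(char *p, mode_t m);
        static realcreat_t prealcreat;

        /* Look up the real creat() once and cache the pointer */
        if (prealcreat == NULL) {
                prealcreat = (realcreat_t)dlsym(RTLD_NEXT, "creat");
                if (prealcreat == NULL) {
                        (void) fprintf(stderr, "dlsym: %s\n", dlerror());
                        return (-1);    /* 0 would be a valid descriptor */
                }
        }
        /* parsepstname() is defined elsewhere in the library */
        parsepstname(path, cmdname);
        return ((*prealcreat)(path, mode));
}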

Our creat() takes the same arguments as the system call. The first thing we do is look up the real creat() by calling dlsym(3C), and we store the pointer. We then write the name of the file that's being created to a log file and call the real creat(). The parsepstname() function works out the full path to the file and then filters out our exceptions (/tmp etc.).
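The kind of check that parsepstname() performs can be sketched roughly as follows. This is only an illustration, not the library's actual code: the names make_full_path() and path_is_allowed(), the altroot argument, and the contents of the exception list are all assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical exception list: places on the running system that a
 * patch may touch. The real list is not given in this article. */
static const char *allowed_prefixes[] = { "/tmp", "/var/tmp", "/dev", NULL };

/* Return 1 if path equals prefix or lies beneath it, else 0. */
static int
has_prefix(const char *path, const char *prefix)
{
        size_t n = strlen(prefix);

        if (strncmp(path, prefix, n) != 0)
                return (0);
        return (path[n] == '\0' || path[n] == '/');
}

/* Build an absolute path from cwd and a possibly relative path.
 * Returns 0 on success, -1 if the result does not fit in buf. */
int
make_full_path(const char *cwd, const char *path, char *buf, size_t len)
{
        int n;

        if (path[0] == '/')
                n = snprintf(buf, len, "%s", path);
        else
                n = snprintf(buf, len, "%s/%s", cwd, path);
        return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}

/* Return 1 if full_path is inside the alternate boot environment
 * mounted at altroot, or matches an exception; 0 means "flag it". */
int
path_is_allowed(const char *full_path, const char *altroot)
{
        int i;

        if (has_prefix(full_path, altroot))
                return (1);
        for (i = 0; allowed_prefixes[i] != NULL; i++) {
                if (has_prefix(full_path, allowed_prefixes[i]))
                        return (1);
        }
        return (0);
}
```

With an alternate boot environment mounted at /a, path_is_allowed("/a/etc/passwd", "/a") returns 1 while path_is_allowed("/etc/passwd", "/a") returns 0. The sketch does not normalise ".." components or follow symbolic links, which a real check would need to consider.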

Similar functions need to be written for any calls that we want to examine.

One issue we came up against when designing this was that shell scripts often call /sbin/sh when they need to run other scripts. /sbin/sh is statically linked, so our interposition library will not work with it. In the case of pkgadd the environment was also being cleared. We get around these problems by catching the call to execute /sbin/sh, reloading our environment variables from a file and then exec'ing /bin/sh instead. It works but it’s a bit invasive. Also, if we need to make changes to the test we need to recompile the library and reinstall it on the test machines. If only there was some way to dynamically trace what was happening on the system…
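The two pieces of that workaround might look something like the sketch below. The function names pick_shell() and reload_env() and the NAME=value file format are assumptions for illustration; in the real library this logic would sit inside the interposed exec wrapper, which is not shown here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Swap the statically linked shell for the dynamically linked one so
 * that the preloaded library stays in effect. */
const char *
pick_shell(const char *path)
{
        return (strcmp(path, "/sbin/sh") == 0 ? "/bin/sh" : path);
}

/* Restore variables (for example LD_PRELOAD) from a file of
 * NAME=value lines. The file format is an assumption of this sketch.
 * Returns the number of variables set, or -1 if the file cannot be
 * opened. */
int
reload_env(const char *file)
{
        char line[1024];
        int count = 0;
        FILE *fp = fopen(file, "r");

        if (fp == NULL)
                return (-1);
        while (fgets(line, sizeof (line), fp) != NULL) {
                char *eq;

                line[strcspn(line, "\n")] = '\0';  /* strip newline */
                eq = strchr(line, '=');
                if (eq == NULL || eq == line)
                        continue;                  /* skip malformed lines */
                *eq = '\0';
                if (setenv(line, eq + 1, 1) == 0)
                        count++;
        }
        (void) fclose(fp);
        return (count);
}
```

An exec wrapper would call reload_env() first and then pass pick_shell(path) to the real exec call, so that a cleared environment and a statically linked shell no longer hide the child process from the library.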

Well, in Solaris 10 we can use DTrace for this. The procedure is basically the same: we check for certain system calls, filter out exceptions and flag an error if something is happening that should not be. Here's the DTrace script:

#!/usr/sbin/dtrace -qs

int x;
BEGIN{
/* set it to something that won't match a pid for
   the syscall prov. below */
x=-1;
}

/* The process that we are interested in */
proc:::create
/execname == "patchadd" || execname == "patchrm"/
{
        x=pid;
        self->called_proc_create = 1;
}

syscall::open*:entry,
syscall::creat*:entry,
syscall::unlink*:entry,
syscall::link:entry,
syscall::symlink:entry
/progenyof(x)/
{
     self->path = copyinstr(arg0);
     printf("%s:%s:%s:%s\n", probefunc, self->path, cwd, execname);
}

We check for patchadd and patchrm processes being started and note the pid. Although you use the luupgrade command to do the patching, it ultimately calls patchadd and patchrm to do the work. Then, when we examine a system call, we check that it comes from the patchadd process tree with the progenyof() test. If it does, we log the function and arguments. Rather than having DTrace handle the parsing, we have a Perl script in our test harness that filters out the exceptions and warns us of any errors.
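Our harness does the parsing in Perl, but to show what each output line contains, here is the splitting step sketched in C. The struct and function names are made up for the example, and it assumes that paths do not themselves contain a ':' character.

```c
#include <string.h>

/* One line of the D script's output: probefunc:path:cwd:execname */
struct trace_rec {
        char *func;
        char *path;
        char *cwd;
        char *execname;
};

/* Split line in place on ':' into the four fields.
 * Returns 0 on success, -1 if the line has fewer than four fields. */
int
parse_record(char *line, struct trace_rec *rec)
{
        char **fields[4];
        char *p = line;
        int i;

        fields[0] = &rec->func;
        fields[1] = &rec->path;
        fields[2] = &rec->cwd;
        fields[3] = &rec->execname;

        line[strcspn(line, "\n")] = '\0';       /* strip newline */
        for (i = 0; i < 4; i++) {
                *fields[i] = p;
                if (i == 3)
                        break;                  /* last field runs to end */
                p = strchr(p, ':');
                if (p == NULL)
                        return (-1);
                *p++ = '\0';
        }
        return (0);
}
```

Once a line is split like this, the path (joined to cwd when it is relative) can be compared against the alternate boot environment's mount point and the list of exceptions.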

We also check for kill calls in Solaris 10, but if a patch needs to start or stop a process it should really do so using svcadm. So we check especially for any calls to that:

proc:::exit
/execname == "svcadm"/
{
        printf("%s:%d:%s:%s\n", probefunc, arg0, execname, execname);
}

The DTrace version is much more straightforward and easier to implement. It’s also tracing everything, so we don’t have to worry about someone clearing the environment or calling statically linked commands.

This test has caught quite a few problems in patches. The majority of these are down to errors in the patch and package scripts, where patch creators are allowed to write their own scripts; sometimes these are written by product teams that have not considered patching in a Live Upgrade scenario. We rarely see any issues with this test anymore. It seems that once we introduce a test we get an initial peak in test failures; the issues are fed back upstream and corrected, and we then see a steady tail-off in failures.
