Overriding library functions in Mac OS X, the easy way: DYLD_INSERT_LIBRARIES

Back at MacHack 2003 Jonathan Rentzsch talked about how to override functions and inject code in Mac OS X using [several neat tricks](http://rentzsch.com/papers/overridingMacOSX). He also released a framework called [mach_star](http://rentzsch.com/mach_star/) which has two components: mach_override and mach_inject. These are great, but overkill for some simple cases.

A much easier way of doing library function overrides is using the DYLD_INSERT_LIBRARIES environment variable (analogous to LD_PRELOAD on Linux). The concept is simple: at load time the dynamic linker (dyld) will load any dynamic libraries specified in DYLD_INSERT_LIBRARIES before any libraries the executable wants loaded. By naming a function the same as one in a library function it will override any calls to the original.

The original function is also loaded, and can be retrieved using the dlsym(RTLD_NEXT, “function_name”); function. This allows a simple method of wrapping existing library functions.

Here’s a simple example which prints out the path of every file opened using the “fopen” function (lib_overrides.c):

#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>

// for caching the original fopen implementation
FILE * (*original_fopen) (const char *, const char *) = NULL;

// our fopen override implmentation
FILE * fopen(const char * filename, const char * mode)
{
    // if we haven’t already, retrieve the original fopen implementation
    if (!original_fopen)
        original_fopen = dlsym(RTLD_NEXT, "fopen");

    // do our own processing; in this case just print the parameters
    printf("== fopen: {%s,%s} ==\n", filename, mode);
    
    // call the original fopen with the same arugments
    FILE* f = original_fopen(filename, mode);
    
    // return the result
    return f;
}

And a simple test program (overrides_test.c):

#include <stdio.h>
#include <string.h>

int main(int argc, char const *argv[])
{
    char hello[] = "hello world";
    
    FILE *fp = fopen("hello.txt", "w");
   
    if (fp) {
        fwrite(hello, 1, strlen(hello), fp);
        fclose(fp);
    }

    return 0;
}

Compiled and tested:

tlrobinson$ gcc -Wall -o lib_overrides.dylib -dynamiclib lib_overrides.c
tlrobinson$ gcc -Wall -o overrides_test overrides_test.c
tlrobinson$ DYLD_FORCE_FLAT_NAMESPACE=1 DYLD_INSERT_LIBRARIES=lib_overrides.dylib overrides_test
== fopen: {hello.txt,w} ==
tlrobinson$

There are certainly scenarios far more interesting than this, though!

Presenting GCCalc: a horrible abuse of GCC

Following an [interesting discussion on Reddit](http://programming.reddit.com/info/62v70/comments) about [first class functions](http://en.wikipedia.org/wiki/First-class_function) in C, I was inspired to see what I could do with this new-found knowledge. The result is what I affectionately call “GCCalc”, for reasons that will become clear below.

GCCalc is a simple command line calculator, much like the common [bc](http://en.wikipedia.org/wiki/Bc_programming_language) calculator on many Unix systems. It’s implementation, however, is *very* different than most calculators. While bc is said to have “C-like syntax”, GCCalc’s syntax *is* C. Whatever you enter on the command line automatically gets compiled, loaded, and executed, and the result is returned (as a double) and printed to the screen.

You can either enter expressions like:

round(46.95886*sqrt(1+2/9.99*sin((21%5)*pow(2,8))))

or you can enter whole C statements (as long as they’re on one line, for now) like:

int i; for (i=0;i<10;i++) { printf("hello world!\n"); } printf("goodbye\n"); Unfortunately variables are scoped to the function that wraps them, so they don't persist across multiple entries. However, you can access the last result using the "last" variable (a double). [Here's the source file](http://tlrobinson.net/projects/gccalc/gccalc.c), and here's a syntax highlighted version: It's been tested on Mac OS X (Leopard) and Linux (Ubuntu Gutsy), with GCC 4. Compile with "gcc -o gccalc gccalc.c" on OS X, or "gcc -o gccalc gccalc.c -ldl" on Linux.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dlfcn.h>
#include <unistd.h>

#ifdef __ELF__
#define GCC_FLAGS "-fPIC -shared"
#define EXTENSION "so"
#else
#define GCC_FLAGS "-dynamiclib"
#define EXTENSION "dylib"
#endif

#define HEADERS "#include <stdio.h>\n#include<math.h>"

typedef double(func_return_double)(double);

unsigned count = 0;
char *cwd;
char tmp_path[1024] = {‘\0’};

void *lib = NULL;

int main(int argc, char **argv)
{
    double result = 0.0;
    char input_buffer[1024], code_buffer[2048], function_name[32], command_buffer[1024];
    
    // get out current directory, which we’ll use for tmp files (dlopen seems to need absolute paths)
    cwd = getcwd(NULL, 0);
    
    while (1)
    {
        // for unique function and file names (needed for dlopen/dlsym to work correctly)
        count++;
        
        // read in the next line
        printf(">> ");
        fgets(input_buffer, sizeof(input_buffer), stdin);
    
        // format the function name
        sprintf(function_name, "f%d", count);
        
        // format the code string: if it doesn’t contain a semicolon, assume it is just an expression
        if (strchr(input_buffer, ‘;’))
            sprintf(code_buffer, "%s\ndouble %s(double last) { %s\nreturn 0; }", HEADERS, function_name, input_buffer);
        else
            sprintf(code_buffer, "%s\ndouble %s(double last) { return (%s); }", HEADERS, function_name, input_buffer);
            
        // format the filename string, delete the file if it exists
        sprintf(tmp_path, "%s/libtmp%d.%s", cwd, count, EXTENSION);
        unlink(tmp_path);
        
        // format the gcc command string
        sprintf(command_buffer, "gcc -Wall %s -x c – -o %s", GCC_FLAGS, tmp_path);
        
        // execute gcc command, write out the code
        FILE *fp = popen(command_buffer, "w");
        fwrite(code_buffer, 1, strlen(code_buffer), fp);
        fprintf(fp, "\n");
    
        // pclose waits for gcc to terminate (fclose/close do NOT thus compilation will sometimes not finish prior to the dlopen)
        pclose(fp);

        void *ptr = NULL;
        
        // open the just-compiled dynamic library
        if ((lib = dlopen(tmp_path, RTLD_NOW|RTLD_LOCAL)) == NULL) {
            puts(dlerror());
        }
        // get the function pointer
        else if ((ptr = dlsym(lib, function_name)) == NULL) {
            puts(dlerror());
        }
        
        // execute it
        if (ptr != NULL)
        {
            func_return_double *func = (func_return_double*)ptr;
            result = (*func)(result);
            // print the result
            printf("=> %.*lf\n", (result/((int)result)>1.0)?5:0, result);
        }

        // clean up: close the library, delete the temp file
        dlclose(lib);
        unlink(tmp_path);
    }

    return 0;
}

Thanks to jbert on Reddit for the initial code and inspiration.

If only I had known about this back when The Daily WTF has having their [OMG WTF](http://omg.thedailywtf.com/) crazy calculator programming contest…