Are bus errors still commonplace? Plus a bonus discussion on org-babel.

What is a bus error?

I was reading through Peter van der Linden's Expert C Programming when I noticed an interesting example. On page 189, he talks about how one can cause a bus error. I've never had a bus error occur before. Maybe they're a solved problem?

Before we get too far ahead of ourselves: a bus error can occur when we access a variable at an address that's not valid for that variable's type. For the basic types, an address is not valid if it is not evenly divisible by the size of the variable (its alignment requirement). In other words...

/* sizeof (int) == 4 */
int p1 = *(int *) 5;
/* Causes a bus error, 5 % 4 != 0 */

int p2 = *(int *) 32;
/* No bus error, 32 % 4 == 0 */

Realistically, these programs would immediately segfault, as we don't have access to arbitrary memory addresses (unless we were working with embedded systems, perhaps). To avoid this, we can use a union.

Peter van der Linden's Code

Using the sample code in Expert C Programming, pg. 189, I am going to see if it causes a bus error.

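The address of the union must be divisible by 4 (or sizeof(int)), as it can store an integer. The element u.a[1] sits one byte past that address, so its address cannot also be divisible by 4. As long as sizeof(int) > sizeof(char) (or sizeof(int) > 1, as sizeof(char) == 1), casting that address to int * gives us a misaligned pointer, and we can successfully get our bus error.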

union {
  char a[10];
  int i;
} u;
int *p = (int *) &(u.a[1]);
*p = 17;
printf("*p %d\n", *p);

#+RESULTS:

*p 17

Look at that! No problems.

x86 is very forgiving when it comes to misalignment errors. For the most part, they just don't happen. This is great for us, but what if we ported this code over to a platform that is less friendly, like ARM?

Ideally, we want to see if a bus error can occur in our code, so that way we can avoid them during development, as opposed to fixing it later.

Looking through the gcc manual, I found a compile flag that will be useful.

-fsanitize=undefined

Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector. Various computations are instrumented to detect undefined behavior at runtime.

By adding the -fsanitize=undefined compile flag, our program will print a runtime error whenever the sanitizer detects undefined behavior, which includes a misaligned load or store.

There are similar flags, -fsanitize=address and -fsanitize=thread, that can be useful for runtime error checking; look at the gcc manual for more information. Options can be combined with commas, e.g. -fsanitize=address,undefined. Note that -fsanitize=thread cannot be combined with -fsanitize=address.

-fsanitize=undefined

There is one change that I need to make to the code. When a runtime error occurs, the results are printed to stderr. When we're looking at our code through a terminal, stderr and stdout might seem like the exact same thing.

I am not running this code through a terminal. I'm using org-babel, a very powerful tool for literate programming. If our program runs successfully, org-babel will tell us the results.

Unfortunately, these results don't include stderr. In order to see the runtime error, I need stderr's file descriptor to point to stdout. This is what the dup2() function is doing: it closes stderr's descriptor and makes it a duplicate of stdout's.

dup2(STDOUT_FILENO, STDERR_FILENO);

union {
  char a[10];
  int i;
} u;
int *p = (int *) &(u.a[1]);
*p = 17;
printf("*p %d\n", *p);
printf("p %lld\n", (long long) p);

#+RESULTS:

/tmp/babel-YOFYnN/C-src-93AiCJ.c:17:6: runtime error: store to misaligned address 0x7ffec796bddd for type 'int', which requires 4 byte alignment
0x7ffec796bddd: note: pointer points here
 40 5a 14 84 55 00 00  e0 be 96 c7 fe 7f 00 00  00 f5 9a c3 4a 31 08 2e  00 00 00 00 00 00 00 00  25
             ^ 
/tmp/babel-YOFYnN/C-src-93AiCJ.c:18:3: runtime error: load of misaligned address 0x7ffec796bddd for type 'int', which requires 4 byte alignment
0x7ffec796bddd: note: pointer points here
 40 5a 14 84 11 00 00  00 be 96 c7 fe 7f 00 00  00 f5 9a c3 4a 31 08 2e  00 00 00 00 00 00 00 00  25
             ^ 
*p 17
p 140732246965725

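And it works! We can now see the runtime error! We're trying to access an integer at address 140732246965725 (0x7ffec796bddd), which is not divisible by 4 (AKA sizeof(int)); the remainder is 1. On an architecture that doesn't tolerate misaligned accesses, this is where a bus error would occur.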

Crash and burn programming

Running code and printing out runtime errors is great. However, there's a saying in programming called "Fail early, fail often". What if we don't just want an error message printed? What if, instead, we want the program to immediately crash? After all, this is what would actually happen if we were on a CPU architecture that couldn't handle misaligned addresses.

I looked through the gcc manual and saw the -fno-sanitize-recover=all option. Supposedly, it does the following:

-fsanitize-recover=all and -fno-sanitize-recover=all is also accepted, the former enables recovery for all sanitizers that support it, the latter disables recovery for all sanitizers that support it.

Let's try it! I'm going to add -fno-sanitize-recover=all as a compile flag. This should cause the program to immediately crash, only printing the error message.

dup2(STDOUT_FILENO, STDERR_FILENO);

union {
  char a[10];
  int i;
} u;
int *p = (int *) &(u.a[1]);
*p = 17;
printf("*p %d\n", *p);

Huh? Why wasn't the error message printed? Crashing the program is what we wanted, but not without the error message! Without an error message, all we're doing is making our program harder to debug.

Fortunately, this isn't our fault. The error message is actually being printed, and it is being printed to stdout. If we were running our program in a terminal, we'd see the error message we expect.

Unfortunately, this is a limitation of org-babel. -fno-sanitize-recover=all causes a nonzero exit code to be returned on failure. org-babel does not like nonzero exit codes and fails to evaluate stdout when this happens. It does evaluate stderr when the exit code is nonzero, but only to a separate temporary buffer. At least this works outside of org-babel.

There's a (brief) discussion of this issue on the mailing list here. Given that this thread is 5 years old, I'm not holding my breath for a fix.

There is an easy solution for sh scripts; just create a line at the end with :. Unfortunately since this is C, that's not really an option.

Wrapping it up

The entire point of this endeavour is to try to make sure our code is portable. When I write a program for one system, that program better work on as many other systems as possible.

If any college students read this, professors don't like the "but it worked on my machine!" excuse. (On the other hand, it takes one mean professor to test with a different architecture in order to check whether you were careful about memory alignment. We can't predict everything!)

-fsanitize=undefined is a great flag to add when compiling; it catches more than just memory alignment! If you add the flag and forget about it, you will at least get a warning when undefined behavior occurs! I'd much rather have a program that doesn't work when I know why than a program that doesn't work when I don't know why.