Joel on Software

We spend a lot of time on this site talking about exciting Big Picture Stuff like .NET versus Java, XML strategy, Lock-In, competitive strategy, software design, architecture, and so forth. All this stuff is a layer cake, in a way. At the top layer, you've got software strategy. Below that, we think about architectures like .NET, and below that, individual products: software development products like Java or platforms like Windows.

Go lower on the cake, please. DLLs? Objects? Functions? No! Lower! At some point you're thinking about lines of code written in programming languages.

Still not low enough. Today I want to think about CPUs. A little bit of silicon moving bytes around. Pretend you are a beginning programmer. Tear away all that knowledge you've built up about programming, software, management, and get back to the lowest level Von Neumann fundamental stuff. Wipe J2EE out of your mind for a moment. Think Bytes.

Why are we doing this? I think that some of the biggest mistakes people make even at the highest architectural levels come from having a weak or broken understanding of a few simple things at the very lowest levels. You've built a marvelous palace but the foundation is a mess. Instead of a nice cement slab, you've got rubble down there. So the palace looks nice but occasionally the bathtub slides across the bathroom floor and you have no idea what's going on.

So today, take a deep breath. Walk with me, please, through a little exercise which will be conducted using the C programming language.

Remember the way strings work in C: they consist of a bunch of bytes followed by a null character, which has the value 0. This has two obvious implications:

  • There is no way to know where the string ends (that is, the string length) without moving through it, looking for the null character at the end.
  • Your string can't have any zeros in it. So you can't store an arbitrary binary blob like a JPEG picture in a C string.
Why do C strings work this way? It's because the PDP-7 minicomputer, on which UNIX and the C programming language were invented, had an ASCIZ string type. ASCIZ meant "ASCII with a Z (zero) at the end."
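To make those two implications concrete, here is a small sketch. The names walk_length and length_of_blob_with_zero are made up for this illustration; they are not part of any library.

```c
#include <stddef.h>

/* Hypothetical helper: finds a string's length the only way C allows,
   by walking forward one byte at a time until it hits the zero byte. */
size_t walk_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical helper: measures a blob that has a zero in the middle.
   The blob holds the bytes 'a' 'b' 0 'c' 'd' 0, but anything that
   treats it as a C string stops at the first zero. */
size_t length_of_blob_with_zero(void)
{
    const char blob[6] = { 'a', 'b', '\0', 'c', 'd', '\0' };
    return walk_length(blob);
}
```

Calling walk_length on a longer string means touching more bytes, every single time you ask. And the blob's "c" and "d" are simply invisible to any code that believes it is holding a string.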

Is this the only way to store strings? No, in fact, it's one of the worst ways to store strings. For non-trivial programs, APIs, operating systems, class libraries, you should avoid ASCIZ strings like the plague. Why?

Let's start by writing a version of the code for strcat, the function which appends one string to another.

    void strcat( char* dest, char* src )
    {
         while (*dest) dest++;         /* find the end of dest */
         while (*dest++ = *src++);     /* copy src, including its null */
    }

Study the code a bit and see what we're doing here. First, we're walking through the first string looking for its null-terminator. When we find it, we walk through the second string, copying one character at a time onto the first string.
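If you want to try that walk-then-copy logic yourself, here is a rough sketch. I've renamed the function my_strcat so it doesn't collide with the real strcat in the standard library, and build_greeting is a made-up helper for this illustration. The 13-byte minimum is just what this particular greeting needs: 12 characters plus the null.

```c
#include <stddef.h>

/* Same logic as the strcat in the text, under a different name. */
void my_strcat(char *dest, const char *src)
{
    while (*dest)                       /* walk to dest's null-terminator */
        dest++;
    while ((*dest++ = *src++) != '\0')  /* copy src, including its null */
        ;
}

/* Hypothetical demo: builds "Hello, world" in the caller's buffer.
   Returns 0 on success, or -1 if the buffer is too small, because
   my_strcat itself has no idea how much room dest really has. */
int build_greeting(char *buf, size_t cap)
{
    if (cap < 13)
        return -1;
    buf[0] = '\0';            /* start with an empty string */
    my_strcat(buf, "Hello");
    my_strcat(buf, ", ");
    my_strcat(buf, "world");
    return 0;
}
```

Notice that every call to my_strcat starts over at the front of buf and walks all the way to the null before it copies a single byte.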