Write and Use Self-Documenting Code

To this day, there is a quiet, almost subliminal debate between saving keystrokes and being verbose. My stance is that most of the C standard library is incomprehensible without memorization: names like strpbrk or strspn tell you almost nothing. I should only have to consult a reference to find out how to use a function properly, not to find out what its name means in the first place.

But naming conventions aren't the topic of this article, and they're not what I mean when I talk about keystrokes and verbosity. What does "self-documenting" mean in the context of code? It means the code explains itself: you don't need a comment or separate documentation to understand what it does or how to use it. I take an extreme view here: I consider code bad if it needs comments or documentation to explain it. But even without that extreme view, I would still prefer self-documenting code.

I'm going to repeat an example Bjarne Stroustrup used in one of his talks on C++11 (called C++0x at the time). Suppose you have a function with this prototype:

void drawRectangle(int, int, int, int);

The argument names are omitted because they were nondescript anyway, probably a through d. What does this function do? Well, it draws a rectangle, as its name self-documents. But how do we control where and how this rectangle is drawn? Well, I assume that's what the parameters are for.

But what are the parameters? Two points? A point, width, and height? A width, height, and point? Maybe it draws a rectangle from three points and one of the parameters is thickness? Something else entirely?

This is an example of code that is not self-documenting. You have no idea how to make it do what you want, and unless you read the source or consult possibly nonexistent documentation, you're just guessing or copying other code you think does what you want. Let's try to fix it with some prototypes of our own:

void drawRectangle(Point, Point);

void drawRectangle(Point, Dimension);

void drawRectangle(Point, Point, Point);

Not only is each of these prototypes better on its own, but thanks to overloading they can all exist at once. How do they draw rectangles? The first takes two points, and a rectangle can be drawn from two opposite corners, so that's no mystery. The second takes a point and a dimension, and an axis-aligned rectangle can be drawn from just a point, a width, and a height. It's still slightly ambiguous, though: we don't know whether the given point is the center or the top-left corner, so I personally would not write this prototype as-is. Ideally, the parameter name should indicate which. The third takes three points, and if you've ever heard of three-point rectangles, you know exactly how this one works too.
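A minimal sketch of letting the types carry the meaning, assuming hypothetical Point and Dimension structs (the prototypes above name them but do not define them) and a stand-in "renderer" that only records what it was asked to draw. It also swaps the Point-plus-Dimension overload for one taking a hypothetical TopLeft wrapper, so the corner-versus-center question is answered by the type rather than a parameter name. The three-point overload is left out to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical geometry types; the article names Point and Dimension
// but does not define them.
struct Point { int x; int y; };
struct Dimension { int width; int height; };

// Hypothetical wrapper stating which corner a lone Point refers to.
// Assumes screen coordinates, where y grows downward.
struct TopLeft { Point corner; };

// An axis-aligned rectangle, normalized to top-left corner plus size.
struct Rect { Point topLeft; Dimension size; };

// Stand-in for a real renderer: "drawing" just records the rectangle.
std::vector<Rect> drawnRects;

// Two opposite corners, given in either order.
void drawRectangle(Point a, Point b) {
    Point topLeft{std::min(a.x, b.x), std::min(a.y, b.y)};
    Dimension size{std::abs(a.x - b.x), std::abs(a.y - b.y)};
    drawnRects.push_back(Rect{topLeft, size});
}

// One corner plus a size; the TopLeft type documents which corner.
void drawRectangle(TopLeft origin, Dimension size) {
    drawnRects.push_back(Rect{origin.corner, size});
}
```

With these overloads, drawRectangle(Point{0, 0}, Dimension{4, 3}) no longer compiles; the caller has to write drawRectangle(TopLeft{Point{0, 0}}, Dimension{4, 3}), which says exactly what it means.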

With all this in mind, I bet you can guess why I prefer statically typed languages. Languages like Python, where variables and parameters can hold objects of any type, are a nightmare to work in: you have to constantly consult comments, documentation, and implementations to understand how to use code properly. It's practically impossible to just glance at code and know what it does, how to use it, or how to change it.

With self-documenting code, you know there is only one correct way to use it. You can abuse code, of course, but such code should be rejected. Since code must have exactly one correct use to be self-documenting, one feature of C and C++ springs to mind as completely breaking this rule: pointers. If you've read my article on pointers, you already know I don't like them, but I'll explain here in the context of this article.

T *p;

All we know is that p is a pointer to objects of type T. Is it an owning pointer? That is, do we need to eventually free or release it? If so, how? Do we use free or delete? Do we call somelib_destroy_type? And when do we even need to free or release it? When some reference count reaches zero? At user request? When it's removed from a container?

If it's not an owning pointer, maybe it's an optional object? Do we need to check it for null and act differently accordingly? If it's null, do we assume a default value, or do we stop and treat it as an error? Or are we just supposed to modify the pointed-to object through this pointer?

Pointers are used for so many different things, in so many different ways and scenarios, that any code using a raw pointer is not self-documenting at all. It is impossible to write self-documenting code that uses raw pointers. Instead, people invented wrapper classes that let you write self-documenting code with pointer semantics: in C++, the standard library has classes like std::unique_ptr, std::shared_ptr, std::weak_ptr, std::reference_wrapper, and std::optional, each of which is self-documenting because there is only one correct way to use it.
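A minimal C++17 sketch of the same idea, assuming a hypothetical Widget type: each signature states what a raw T* could only hint at. Ownership transfer goes through std::unique_ptr, shared ownership through std::shared_ptr, "look but don't own" through a reference, and "might be absent" through std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type used only for illustration.
struct Widget { std::string name; };

// Owning: the caller receives sole ownership, and the Widget is
// destroyed automatically when the unique_ptr goes out of scope.
std::unique_ptr<Widget> makeWidget(std::string name) {
    return std::make_unique<Widget>(Widget{std::move(name)});
}

// Shared ownership: the Widget lives until the last shared_ptr is gone.
std::shared_ptr<Widget> shareWidget(std::unique_ptr<Widget> widget) {
    return std::shared_ptr<Widget>(std::move(widget));
}

// Non-owning, never null, read-only: a const reference.
std::size_t nameLength(const Widget& widget) { return widget.name.size(); }

// Non-owning, never null, meant to be modified: a non-const reference.
void rename(Widget& widget, std::string newName) { widget.name = std::move(newName); }

// Possibly absent result: std::optional instead of a null pointer.
std::optional<std::size_t> findWidget(const std::vector<Widget>& widgets,
                                      const std::string& name) {
    for (std::size_t i = 0; i < widgets.size(); ++i) {
        if (widgets[i].name == name) return i;
    }
    return std::nullopt;
}
```

None of these signatures raise the questions about freeing, null checks, or mutation that a bare T* does.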

Another good example of self-documenting code is the operator overloading in the C++ standard library. It is often criticized, especially the choice of the shift operators for stream input and output, but I beg to differ. With the stream insertion and extraction operators, you can tell at a glance whether you're inserting into or extracting from a stream by which way the chevrons face. The fact that the same operators perform bitwise shifts on integral types is irrelevant.
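A small sketch, assuming a hypothetical Point type, of how user-defined types join in: overloading operator<< and operator>> gives them the same "read the arrow" semantics the built-in types have.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical point type for illustration.
struct Point { int x; int y; };

// Insertion: the chevrons point into the stream, and so does the data.
// Writes the point as "(x, y)".
std::ostream& operator<<(std::ostream& out, const Point& point) {
    return out << '(' << point.x << ", " << point.y << ')';
}

// Extraction: the chevrons point out of the stream, into the variable.
// Expects two whitespace-separated integers, "x y".
std::istream& operator>>(std::istream& in, Point& point) {
    return in >> point.x >> point.y;
}
```

A call site like std::cout << p or std::cin >> p then reads the same way as it does for an int.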

In fact, I think the stream insertion and extraction operators are more self-documenting than the bitwise shift operators on integers. Why? Because you know exactly what's happening with the stream, whereas with integers you don't know what happens to bits shifted off one end or what gets shifted in at the other, especially for negative values: before C++20, right-shifting a negative number was implementation-defined and left-shifting one was undefined behavior. All you know for certain is what happens to the bits in the middle.
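To make the edge cases concrete, a sketch assuming 32-bit fixed-width integers and shift counts below 32. For unsigned operands the rules at both edges are fixed; for a negative signed operand, the arithmetic right shift shown is only guaranteed since C++20 (left-shifting a negative value is deliberately not shown, since it was undefined behavior before C++20).

```cpp
#include <cassert>
#include <cstdint>

// Unsigned shifts: bits pushed past either end are discarded and zeros
// are shifted in. The shift count must be less than the bit width (32).
constexpr std::uint32_t shiftLeft(std::uint32_t value, unsigned count) {
    return value << count;
}

constexpr std::uint32_t shiftRight(std::uint32_t value, unsigned count) {
    return value >> count;
}

// Signed right shift of a negative value copies the sign bit in
// (arithmetic shift). This is guaranteed only since C++20; earlier
// standards left the result implementation-defined.
constexpr std::int32_t signedShiftRight(std::int32_t value, unsigned count) {
    return value >> count;
}
```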

String concatenation is another area of debate, though I don't understand why. Using + for string concatenation is self-documenting: you know what it should do just by looking at it, and your assumption is right. I don't know about other spoken languages, but in English it is common to say "add a sentence after that paragraph" just as you would say "add 10 to that number", so for native English speakers it makes sense too.
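A tiny sketch of the "add a sentence after that paragraph" reading, using a hypothetical helper named addSentence: std::string overloads + so that the operand order is the order of the result.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: std::string's + means "append", so reading the
// expression left to right gives the order of the result.
std::string addSentence(const std::string& paragraph, const std::string& sentence) {
    return paragraph + " " + sentence;
}
```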

Then there's the subscript operator, which for obvious reasons is much less controversial; so much so that not allowing it for user-defined types is often considered a flaw in a language's design. The function call operator, used by functors, enjoys the same consensus: lambdas and closures, which are essentially function objects, have become very popular in programming recently.
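A short sketch of both operators on user-defined types, assuming a hypothetical Board class and IsOnRank functor with squares numbered 0 to 63, eight per rank. The subscript makes the board read like an array; the functor is an object you call like a function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical 64-square board, indexed like a built-in array.
// Uses std::array::at, so an out-of-range square throws std::out_of_range.
class Board {
public:
    std::string& operator[](std::size_t square) { return squares_.at(square); }
    const std::string& operator[](std::size_t square) const { return squares_.at(square); }

private:
    std::array<std::string, 64> squares_{};  // squares 0..63, empty by default
};

// A functor: an object that is called like a function.
// Assumes squares are numbered rank by rank, eight per rank.
struct IsOnRank {
    std::size_t rank;  // 0..7
    bool operator()(std::size_t square) const { return square / 8 == rank; }
};
```

A lambda such as [](std::size_t square) { return square / 8 == 0; } is shorthand for writing a functor like IsOnRank by hand: the compiler generates a class with an operator() for you.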

Moving on from operator overloading, I want to touch on Hungarian notation. If you haven't read "Making Wrong Code Look Wrong" by Joel Spolsky, please do. Apps Hungarian is a good way to write self-documenting code in languages where type aliases are not allowed (I'm looking at you, Java) or where making a class type to simulate the wrapped/aliased type is impossible or inefficient (I'm looking at you, Java).
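A minimal sketch of Apps Hungarian in C++, in the style the article recommends: "us" marks unsafe (raw user input) strings and "s" marks safe (HTML-encoded) ones, and a conversion function named SFromUs makes the direction of conversion visible at the call site. The HTML-escaping body is an assumption for illustration, not code from the article.

```cpp
#include <cassert>
#include <string>

// The prefix records the *kind* of data, not its type: both are
// std::string. "us" = unsafe user input, "s" = safe, HTML-encoded text.
std::string SFromUs(const std::string& usText) {
    std::string sText;
    for (char c : usText) {
        switch (c) {
            case '&': sText += "&amp;"; break;
            case '<': sText += "&lt;"; break;
            case '>': sText += "&gt;"; break;
            case '"': sText += "&quot;"; break;
            default:  sText += c; break;
        }
    }
    return sText;
}
```

Writing sName = usName still compiles, but it now looks wrong, while sName = SFromUs(usName) looks right.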

I assume you gave that article a good read, so I'll move right on to my variant: type aliases. Most languages allow them; even C does, via typedef. C++11 even adds a nicer using syntax that supports alias templates. But what's the point of aliasing an existing type? The same reason the article suggests you prefix variables with the kind of data they hold.
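A quick sketch of the syntaxes mentioned, with hypothetical alias names: the C-style typedef, the C++11 using declaration, and an alias template. The static_assert also shows that an alias is another name for the same type, not a new type.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

// C style (also valid C++). Hypothetical: squares numbered 0..63.
typedef unsigned int SquareIndex;

// C++11 alias declaration: the new name comes first, reading left to right.
using PlayerName = std::string;

// C++11 alias template: a parameterized alias, which typedef cannot express.
template <typename T>
using ByPlayer = std::map<PlayerName, T>;

// An alias is another name for the same type, not a new type.
static_assert(std::is_same<ByPlayer<int>, std::map<std::string, int>>::value,
              "aliases do not create new types");
```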

There's the obvious reason to alias an existing type: you only have to change one line of code to change the type used everywhere else. The less obvious, even more useful reason is that it makes your code both self-documenting and more abstract. The self-documenting part is obvious: the alias name tells you exactly how to treat it, regardless of the underlying type. The abstraction aspect is a bit trickier to understand: the alias name tells you exactly how to treat it, regardless of the underlying type.

No, you read that correctly: they're one and the same. What do I mean? I'll use an example from my own experience. I've been working on a community project to make a modular chess game in C++, and one of the new developers kept seeing a type called Suit_t while looking through the code. Every time he looked up what it was an alias of, it turned out to alias yet another type. Eventually he got down to something he understood: the root type was std::string (which is itself an alias of std::basic_string<char>).

He asked why there was so much mystery and why we didn't just use the underlying type directly. I explained that the underlying type was unimportant; in fact, it was an implementation detail. It was also only a temporary choice: it wouldn't be a string forever. The name Suit_t told you it represented the suit of a chess piece (e.g. black or white), and in our design the only important characteristics of the type are that we can tell whether two pieces have different suits and that suits can be created dynamically at runtime.

Thus, a string was the most practical choice at the time. A string can do much more than be compared for inequality and be created at runtime, so to avoid depending on anything that would lock us into it, we aliased it and relied only on the minimal characteristics we needed. We can easily swap it out for another type, but for now a string more than satisfies our requirements.
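A hypothetical reconstruction of the pattern (not the project's actual code): a chain of aliases that ends in std::string, with the rest of the code only ever naming Suit_t and only ever comparing suits for inequality. SuitName, makeSuit, Piece, and areOpponents are illustrative names.

```cpp
#include <cassert>
#include <string>

// Hypothetical chain of aliases; other code only ever spells Suit_t.
using SuitName = std::string;  // hypothetical intermediate alias
using Suit_t = SuitName;       // the name the rest of the code uses

// Suits can be created at runtime, e.g. from a configuration file.
Suit_t makeSuit(const std::string& name) { return Suit_t(name); }

// Hypothetical piece type.
struct Piece {
    Suit_t suit;
};

// The one operation the design relies on: do two pieces differ in suit?
bool areOpponents(const Piece& a, const Piece& b) { return a.suit != b.suit; }
```

Swapping std::string for, say, an integer ID would mean changing only the alias and makeSuit; areOpponents and Piece would stay exactly as they are.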

You might think this is the exact opposite of self-documenting ("how do you know what you can and cannot do?"), but the point is that the underlying type is an implementation detail, and the alias hides that detail while telling you what the type is used for. It only takes one line of documentation to say "Suit_t can only be used in inequality comparisons" and you're done. On the other hand, if we had used a string everywhere, every time we used it we would have to document the class, member, or function to explain "this is supposed to be a suit". Using the alias reduces the required documentation dramatically.
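That one line of documentation is needed because an alias is not enforced: the compiler still lets any code treat Suit_t as a string. Where that matters, one option (a different technique from the plain alias, in the spirit of the class-type approach mentioned alongside Apps Hungarian) is a small hypothetical wrapper class that exposes only what the design allows, so the rule lives in the code instead of in a comment.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper: only construction and inequality are available,
// so "Suit can only be compared for inequality" is enforced by the compiler.
class Suit {
public:
    // Suits can still be created at runtime.
    explicit Suit(std::string name) : name_(std::move(name)) {}

    friend bool operator!=(const Suit& a, const Suit& b) { return a.name_ != b.name_; }

private:
    std::string name_;  // implementation detail, not reachable from outside
};
```

Code that tries to print a Suit, concatenate it, or compare it with == no longer compiles, which leaves exactly one correct way to use it. The trade-off is a little more boilerplate than a one-line alias.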

This also goes back to what I said earlier: self-documenting code should have only one right way to use it. A string can be used in so many different ways that you could argue using a string for the suit is an abuse of the string type. You'd be right, and that's exactly why we aliased it, so that its being a string is an implementation detail.

So, in conclusion, writing self-documenting code is an important skill, and you should reject or rewrite code that is not self-documenting before you forget how it works. I only covered areas I thought were relatively overlooked; I assume you already know the more common forms of self-documentation, such as class names, function names, and interface design.