A co-author of the C language, Brian Kernighan, wrote, “Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?” With that in mind, it makes sense to try to make our programs as simple as possible. Simple programs are easier to understand, easier to debug, and easier to maintain. Writing simple code takes effort, though, so let’s talk about some ways to do it.

Problems That Lead to Complexity

One of the most common causes of complex code is not fully understanding the problem. And, unfortunately, unless the problem is trivial, this is going to happen to everyone on almost every problem they work to solve. I don’t mean to sound insulting or condescending; it’s just how it is. Bear with me, though, I’ll explain. If you’ve ever written a piece of code that is still running and has never needed maintenance, added features, or bug fixes, then either the problem you solved was trivial, or the code isn’t being used. Sometimes the original understanding of the problem was correct, but as time goes on, the problem changes and the understanding does not.

Much of the code we write, especially for our jobs, needs to be updated. We have bugs to fix, features to add, font sizes to change, and colors to adjust. This can be due to customer requests, business requests, or a slew of other reasons. Even long-running systems which have needed no changes can all of a sudden need to be updated. Software doesn’t mature as it gets older. It rots. If everyone stopped writing code, and attackers stopped trying to exploit it, then I suppose I could accept an argument that code rot is no longer happening. But that’s not realistic. Even if the code has run for years, there may be changes in the underlying operating system or in the systems that interact with the code, or that the code interacts with. Security updates may force changes that adversely affect how the code runs. Perhaps a service that the code calls out to changes and no longer accepts the payload you’re sending, or the services that send in data may have new requirements, which in turn lead to further downstream changes.

In short, any non-trivial software will require changes at some point. The choice then is to either maintain the software and make the changes, decommission the software so it can no longer be used, or leave it alone and deal with the consequences and fallout. Therefore, in software, change is inevitable; it is a certainty. And since we cannot know what all those changes will be ahead of time, it is not possible to fully understand the problem. The problem changes as time goes on.

Poor Requirements

If you’ve been a developer for any length of time, you’ve certainly been presented with poor requirements. Building software is hard. Clients don’t know exactly what they want, and assumptions are made on both sides. Clients (and by “clients” I mean whoever is creating the requirements for the software you build) make assumptions. If you’re building a product and you’re the one coming up with the requirements, be aware that you are making assumptions too: you are playing the role of the client. You may make fewer assumptions than someone who hasn’t written code, but you are making them. You may be assuming the users of your software will use a feature in a particular way (or that they want the feature at all). In some cases, you may be playing three roles: the client deciding what the software needs to do, the developer creating the software, and the user actually utilizing the software to solve a problem.

There are ways to make the requirements better: you can point out inconsistencies, remove as many assumptions as you can identify, wireframe, whiteboard, build prototypes, and so on. But even with these tools, assumptions will still sneak in and requirements will change. New features will be requested, bugs will be found, and edge cases identified. The combination of not completely understanding the problem and changing requirements can easily lead to more complexity in the software than there should be. It takes a concerted effort of constant refactoring and redefining our understanding of the problem to keep complexity out and keep the software simple.

More Code, More Problems

There’s an old joke that there are two hard problems in computer science: naming things, cache invalidation, and off-by-one errors. Coming up with good names for things in code is absolutely one of the tougher problems. Beginner programmers, learning about the differences between ints and floats, chars and strings, often choose short names that indicate what type of data a variable holds. They want to avoid typing, and keeping track of the differences in data types may be difficult. More senior programmers figure that knowing what a variable represents and why it exists is more important, so they tend to use longer, more descriptive variable names. Nevertheless, creating descriptive and useful names is difficult. I often use the nondescript $i variable as a counter in a for loop because it’s easy and doesn’t require any thinking about what $i actually means within the loop. In other naming exercises, I’m better: for function and method names, class names, and namespaces, I do my best to create good names. Sometimes I’m more successful than others.

Doing Too Much

In almost every OOP talk I’ve given, I include the quote “If you need to use ‘and’ to describe what your class or method does, it’s probably doing too much.” Code can do too much for a number of reasons. Junior developers often don’t understand the need to separate concerns and split chunks of code into different methods and classes. Later on, it’s often easier to just add another conditional to meet the changing requirements of the software than it is to look at the software as a whole, understand how the new requirements fit into the whole scheme of things, and refactor to keep complexity low and the software simple. Over time, if the software is maintained this way, complexity grows slowly until the system is essentially a big ball of mud with little bits of functionality jammed in wherever they can fit.

Keeping the code simple is not simple work. It means that adding a new feature may require more code and classes than just adding a conditional or a loop. It requires the discipline to understand when to add a conditional vs when to refactor conditionals into classes, and when copy/paste might be the right way to solve a problem. It’s not easy, but as I’ve been saying, building non-trivial systems and keeping them simple is quite hard.

Knowing Too Much

One common pattern I see that quickly leads to complex software is classes and methods that know too much, or that expose too much. Encapsulation is the OOP concept of hiding an object’s data and functionality in order to protect them from outside interference and misuse. The “Law of Demeter”, also known as the “Principle of Least Knowledge”, provides guidelines to help keep software simple. It says that a method can invoke methods on the object it belongs to, on the parameters passed to the method, on any objects created or instantiated within the object, on objects that were passed in, and on global variables the object can reach. This means that if an object has a dependency, our code should not reach into that dependent object and do things with it. In PHP, we can often spot violations of the Law of Demeter when a line of code contains more than two -> operators. For example:

$query = $this->mapper->getSql()->buildQuery();

In this case, whatever code includes the above would break if the mapper changes, as well as if the SQL object inside the mapper changes. The code knows too much about the internal workings of the objects passed in. Code that doesn’t concern itself with how other objects work, or does so only minimally, is simpler than code that has tendrils reaching into objects and going places it doesn’t belong. This kind of code is hard to test, more prone to breaking, and difficult to debug and understand. One real-world example I see regularly is with ZF2’s SQL object when you want to determine the latest id generated by the database. This code would be found within a method that runs an insert query. Let’s take a look:

$id = $this->sql
    ->getAdapter()
    ->getDriver()
    ->getLastGeneratedValue($nameOfSequence);

The issue is that while the SQL object was injected, and it’s technically OK to retrieve the Adapter from it, the Law of Demeter says we should not be calling anything on the Adapter. But this code requires that we retrieve the Driver from the Adapter and then ask the Driver for the value it received on the last insert for the generated id. In the case of MySQL, the sequence name is not needed, but in PostgreSQL, it is.

In order to unit test this code, the test double for the SQL object would require the getAdapter method to return a test double for the Adapter. That test double would need to return a test double for a driver object when getDriver was called. And the getLastGeneratedValue method on the double for the driver would need to return a value or emulate the behavior you’re trying to test. Rather than just injecting a double for the SQL object, we’d also need test doubles for an AdapterInterface, and a Driver, all of which must have behaviors and return values defined, even if you really don’t care about the returned id value. And if any of those objects changed behavior, every bit of code that retrieves the generated identifier would need to change as well.
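To see the weight of that mocking, here is a sketch using hand-rolled doubles (plain anonymous classes rather than PHPUnit mocks). The class shapes and the canned id of 42 are made up for illustration; they only mimic the chain described above, not the actual ZF2 API.

```php
<?php
// A double for the Driver, returning a canned generated id.
$driverDouble = new class {
    public function getLastGeneratedValue($sequence = null)
    {
        return 42; // canned value for the test
    }
};

// A double for the Adapter, which must hand back the Driver double.
$adapterDouble = new class ($driverDouble) {
    private $driver;
    public function __construct($driver) { $this->driver = $driver; }
    public function getDriver() { return $this->driver; }
};

// A double for the SQL object, which must hand back the Adapter double.
$sqlDouble = new class ($adapterDouble) {
    private $adapter;
    public function __construct($adapter) { $this->adapter = $adapter; }
    public function getAdapter() { return $this->adapter; }
};

// The code under test walks the whole chain just to get one value.
$id = $sqlDouble->getAdapter()->getDriver()->getLastGeneratedValue('seq');
```

Three doubles, all wired together, just so one method can return one number: that is the cost of knowing too much.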

It would have been better for the SQL object itself to have a getLastGeneratedValue method which could in turn delegate to an Adapter, which would delegate to a Driver. In that case, the code could have been:

$id = $this->sql
    ->getLastGeneratedValue($nameOfSequence);

With this design the behavior of the adapter and driver is unknown and of no concern to our class, and unit testing would only require the injection of an SQL test double.

// Conclusion

While the naming of things is hard, and sometimes some level of complexity is needed, it’s best to write code that is self-explanatory. This is a lofty goal, and it’s not possible all the time. So we have comments in the code that can explain the how, the what, or the why. The best comments provide insight into why a piece of code does what it does.

On the other hand, it’s not uncommon to see a comment written when the code is initially built. However, as the code changes, the method is updated, and requirements shift, comments are often left behind, unmaintained. I’d argue that an incorrect, invalid, or misleading comment is worse than no comment at all. Comments should be avoided where the code can speak for itself. To be clear, I don’t mean don’t comment your code; I’m all for commenting your code. The code will require comments, but if you can make your code simple and clear enough that it’s obvious what’s happening, that is much better than a comment explaining an otherwise opaque bit of logic. Tracking through complex code is difficult enough without the comments lying or misleading you. So if you change some code that has comments, be sure to keep them updated. Just as the comment beginning this section was misleading, bad comments lead to incorrect assumptions.

Too Many Loops and Conditionals

If you run PHPUnit against your code (and you should), one of the metrics you can generate is called the “CRAP” index. The conveniently named acronym for “Change Risk Anti-Patterns” index is a combination of cyclomatic complexity and the amount of code coverage a method has. If your code coverage increases, the CRAP score decreases. If your code complexity increases, the CRAP score goes up. Cyclomatic complexity is a measurement of how many linearly independent paths through a piece of code there are. Each additional if statement and every added loop construct mean there’s a new, different path through the code. For switch .. case statements, each case, including default would represent another unique linear path through the code.
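For reference, here is a small sketch of how the CRAP index combines those two inputs, as I understand the published formula (the function name `crapScore` is just illustrative):

```php
<?php
// CRAP(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m)
// where comp is cyclomatic complexity and cov is the
// code coverage percentage (0 to 100) for the method.
function crapScore(int $complexity, float $coveragePercent): float
{
    return ($complexity ** 2) * ((1 - $coveragePercent / 100) ** 3)
        + $complexity;
}

// Full coverage: the score collapses to the complexity itself.
echo crapScore(5, 100); // 5
// No coverage: the score balloons to comp^2 + comp.
echo crapScore(5, 0);   // 30
```

Notice how coverage tempers complexity: a complex method with thorough tests scores far lower than the same method untested.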

To unit test code fully, there should be tests that exercise each of those paths. Cognitively, it becomes more difficult for a developer to reason through and keep track of all the ways through a bit of code, as well as any changes to state and variables as the code progresses. Complexity leads to bugs. Fortunately, there’s a great way to reduce complexity: removing loops and conditionals.

Removing Loops and Conditionals

If we can treat everything the same, conditionals become unnecessary. You may have seen members of the community pushing for functions and methods that only return one type of thing. PHP’s own return type declarations help with this, but without them, PHP allows us to return anything we want from the same function. It’s not uncommon to see a bit of code that tries to return a record from the database and, if the record isn’t found, returns false or null. This means that anything which calls this method must have a conditional to deal with either case.

$record = $this->getRecordById($id);
if ($record === false) {
  // Deal with no record found
} else {
  // Deal with the record
}

In the code above, we have two distinct paths: one if the record is found, one if it is not. To test properly, we’d need at least two tests. You might wonder how to avoid returning something different when no record is found, since the two cases really do seem different. This can be accomplished by declaring that the method returns a particular interface. You then have two (or more) objects which implement that interface: one that acts normally and contains the data, and another, commonly referred to as a “Null Object”, which has all the methods specified by the interface but doesn’t need to actually contain any data (since there isn’t any). Typically, every method in the null object does nothing.
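Here is a minimal sketch of that Null Object idea. The `RecordInterface`, `Record`, and `NullRecord` names, and the array-backed finder, are invented for illustration:

```php
<?php
// The shared contract: callers only ever see this interface.
interface RecordInterface
{
    public function getName(): string;
    public function exists(): bool;
}

// The normal object, carrying real data.
class Record implements RecordInterface
{
    private $name;
    public function __construct(string $name) { $this->name = $name; }
    public function getName(): string { return $this->name; }
    public function exists(): bool { return true; }
}

// The Null Object: same interface, but its methods do (nearly) nothing.
class NullRecord implements RecordInterface
{
    public function getName(): string { return ''; }
    public function exists(): bool { return false; }
}

// The finder always returns a RecordInterface, so callers never
// need a conditional just to guard against false or null.
function getRecordById(int $id, array $table): RecordInterface
{
    return isset($table[$id]) ? new Record($table[$id]) : new NullRecord();
}
```

Calling code can now do `getRecordById($id, $table)->getName()` without first checking for false or null.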

Let’s take a look at another strategy. Again, with the database, rather than a method that returns a single record (or nothing), imagine a method that gives back multiple records (or nothing), such as a filtering search over a table. A common return value would be an array of records. In this case, we get back an array regardless of whether the filter returned some records or excluded all of them. The caller of this method can safely loop over whatever is returned. It will work whether the array is empty or contains a bunch of records.
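A quick sketch of that strategy, with an invented `findPeopleWithCash` helper standing in for the database call:

```php
<?php
// Always returns an array -- possibly empty, never false or null.
function findPeopleWithCash(array $people, int $minimum): array
{
    $matches = [];
    foreach ($people as $person) {
        if ($person['cash_on_hand'] > $minimum) {
            $matches[] = $person;
        }
    }
    return $matches;
}

// The caller loops unconditionally; an empty result simply means
// the loop body never runs. No if/else required.
foreach (findPeopleWithCash([], 20) as $person) {
    // never reached for an empty result
}
```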

Declarative Programming

When we write loops in order to do things, whether it is totaling up values, transforming values, searching through an array to determine if something exists, or any other reason, we’re telling the computer “how” to do something. It requires more mental gymnastics to read that code and determine what it is doing. If instead, we can tell the computer “what” we want and let it work it out, it’s much easier to understand. Easy to understand code is simpler. Compare these two examples:

$result = [];
foreach ($people as $person) {
    if ($person['name'] == 'chad') {
       continue;
    }
    if ($person['cash_on_hand'] > 20) {
       $result[] = $person;
    }
}

vs

SELECT * FROM people
 WHERE name != 'chad'
   AND cash_on_hand > 20;

Both essentially do the same thing, but in the first example, we’ve told the computer how to get the list we want. We have a loop and two conditionals. But at the end, $result should contain just the people who have more than $20 in their pocket and who are not named “Chad”.

The SQL example does the same thing, but rather than telling the computer how to give us what we want, we just tell it what we want and let it figure out how to do it. The code is more expressive and clear.
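We can get some of that declarative feel in plain PHP, too: array_filter lets us state which records to keep and leaves the looping to PHP. A sketch, using the same made-up data shape as the earlier example:

```php
<?php
// Sample data matching the earlier foreach example.
$people = [
    ['name' => 'chad',  'cash_on_hand' => 100],
    ['name' => 'alice', 'cash_on_hand' => 50],
    ['name' => 'bob',   'cash_on_hand' => 5],
];

// array_filter keeps each $person for which the callback returns true:
// not named chad, and more than $20 on hand.
$result = array_filter($people, function ($person) {
    return $person['name'] !== 'chad' && $person['cash_on_hand'] > 20;
});
// $result keeps only alice
```

It is the same logic as the foreach version, but the "how" of iterating has been delegated to the language.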

Next, On “Hoarders”

Earlier, I mentioned that treating everything the same can help eliminate loops and conditionals. Using a concept of a “Collection”, we can treat sets of data more uniformly, and we can write code that is more declarative. We can tell the computer what we want and let it sort out the details. I could go into how to derive your own collection, but there’s a pretty excellent one in the Illuminate/Support package. For a more in-depth look at Collections and how they are awesome, I’d highly recommend picking up Adam Wathan’s “Refactoring to Collections” book - http://adamwathan.me/refactoring-to-collections/.

Let’s take a look at how we can rework the PHP example to utilize declarative programming and collections. Last month, I talked about Transducers, so the map, filter, and reduce concepts should be somewhat familiar. First, let’s build some functions to use in those methods.

$isChad = function ($person) {
    return $person['name'] == 'chad';
};

That’s simple enough, but it’s limited. It will return true if we pass in a $person which has a name of “chad”. With a tiny bit more code, we can make something that will return a function which will match whatever we send in.

function matchesName ($name) {
    return function ($person) use ($name) {
        return $person['name'] == $name;
    };
}

This means that through the use of a closure on $name we can create functions that match any name. The $isChad function can be created like this:

$isChad = matchesName('chad');

Pretty simple, right? And it means we don’t have to write a new function just to match a new name. With a few minor modifications, which I’ll leave to you, we can make a function which creates a function that matches when a given field matches a given value.

You may notice that in my original examples, we wanted to match people who didn’t have the name “chad”, but in this code, we’re returning true if the name is “chad”. We’ll get to why that is in just a moment.

First, let’s build the next helper function for filtering based on people with more than $20 in their wallet. Since we’ve already seen how to make a function that builds a function, we’ll start with that here:

function moneyOnHand ($money) {
    return function ($person) use ($money) {
        return $person['cash_on_hand'] > $money;
    };
}

This means we can define a function which returns true for people with more than $20 like this:

$moreThan20 = moneyOnHand(20);

We now have what we need.

$result = collect($people)
              ->reject(matchesName('chad'))
              ->filter(moneyOnHand(20));

Here’s what’s happening. First, the collect function is just a helper which returns a new instance of the \Illuminate\Support\Collection object with the contents of the $people array. Next, we call the built-in reject method, which takes a function that should return true if the record should be excluded. So we’re telling it to reject any person in the collection named “chad”. This returns a brand new Collection object with that rejection in place; the original $people is not touched. Next, that Collection runs the filter method, which keeps the records for which the supplied function returns true. The reject and filter methods are opposites of each other. At the end, the $result variable will contain a collection with just what we asked for. There are no loops or conditionals in our code, and it clearly expresses what we want and the intent behind it. You can quickly read the code and see what’s going on, as opposed to the foreach loop, which takes a bit of time.

There are many other built-in methods we can use on collections to work with the values. If we wanted just a list of the names of the people who have more than $20, we could do $result->pluck('name');. If we wanted to sum up all the money from these people, $result->sum('cash_on_hand'); will give us what we want. We could average it, or slice and dice the data any other way, in a simple, clear, and declarative fashion. This means we’ve got less code, and it’s clearer, which helps the code stay simple.

Conclusion

Keeping code simple and easy to maintain requires work. It’s not easy, and it requires constant vigilance when we’re changing the code. Reducing conditionals and loops by using collections can simplify code and make it more expressive. Keeping the requirements in mind, avoiding assumptions, limiting responsibilities, and keeping class implementations ignorant of the inner workings of other things all help to keep the code simple. Understanding the problem fully is also one of the best ways to simplify the code, but full understanding only comes with time. While it’s not easy, working to keep code simple pays dividends. See you next month!