Last month we talked about how our code is a liability and determining what problems we need to solve before writing more code. However, code is inevitable. So, we should strive to make our code as good as possible. Fortunately, we have quite a few tools available to help us do that and this month we’re going to talk about some of them.

Introduction

It has long been the goal of many to make programming a thing of the past - a way that non-developers could tell the computer what they wanted and the computer would create the necessary code to solve their problems. Unfortunately, or fortunately, this has not yet materialized. Fortunately, because it means we’ve all got our jobs coding for the foreseeable future. Unfortunately, because software that could do that for anyone could bring about all sorts of advances we haven’t even thought of.

However, I don’t believe software like that is a real possibility. In order for it to work for anything non-trivial, humans would still need to provide requirements to the computer in a clear, unambiguous manner. The requirements would need to be syntactically correct so the computer could understand and interpret them correctly. We all know how difficult it is to get clear and unambiguous requirements for building software. As humans, we’re really good at parsing requirements, but understanding them is much tougher. The requirements we receive are often open to interpretation and assumptions. Many times we don’t even realize we’ve made assumptions until a bug or defect is opened or a new requirement comes in to clarify or explain what the client really wanted. And often clients don’t know what they want until they see the code running and decide that what they saw doesn’t match what they want, even if it does what they asked for.

How More Code Can Help

Even though software is pretty bad at writing software, that doesn’t mean we can’t use it to gain insight into our code. As I’m writing this article, I have a running count of words at the bottom of the screen. Right now it reads 370 words. While I could go through this article manually and figure out how many words it contains, software can do it significantly faster than I could ever hope to. Code can find syntax errors faster than I can. But we can do even more than syntax checking and word counts with software. We can use software to gain insight into our codebase, let it find patterns, and use it to help determine ways we can improve our code and our tests, identify areas that need better testing or refactoring, and more. This month, we’ll look at some of these tools, see what sort of insights they can provide, and see how to integrate them so you are kept informed as you work on improving your code.

PHPUnit

PHPUnit has been the topic of many an article in PHP Architect over the years, so I’ll just briefly discuss it here. It’s a testing framework that helps ensure our code is doing what we expect and not doing what we don’t expect. Of course, in order to use this tool, we’ve got a lot of work to do. We must write tests, whether they are unit, integration, functional or system tests. For now, I’ll state that PHPUnit is a tool that you should be using and leave it to future articles to elaborate on that.

PHPLOC

PHPLOC (or PHP lines of code) is a tool that doesn’t require any work on your part other than running it. It will run across your codebase and give you insight into the make-up of your code. It does a lot more than just count lines of code, though. Since it’s written in PHP and knows about PHP, it will give you listings of how many classes you have and of what types, comments versus lines of code, and more. Take a look at Listing 1 to see sample output from running phploc on some code I work on regularly.

Listing 1

phploc 2.1.3 by Sebastian Bergmann.

Directories                               621
Files                                    1592

Size
  Lines of Code (LOC)                  129318
  Comment Lines of Code (CLOC)          40834 (31.58%)
  Non-Comment Lines of Code (NCLOC)     88484 (68.42%)
  Logical Lines of Code (LLOC)          24354 (18.83%)
    Classes                             16202 (66.53%)
      Average Class Length                 10
        Minimum Class Length                0
        Maximum Class Length              212
      Average Method Length                 2
        Minimum Method Length               0
        Maximum Method Length              42
    Functions                            1336 (5.49%)
      Average Function Length              26
    Not in classes or functions          6816 (27.99%)

Cyclomatic Complexity
  Average Complexity per LLOC            0.08
  Average Complexity per Class           2.34
    Minimum Class Complexity             1.00
    Maximum Class Complexity            78.00
  Average Complexity per Method          1.38
    Minimum Method Complexity            1.00
    Maximum Method Complexity           34.00

Dependencies
  Global Accesses                          13
    Global Constants                        0 (0.00%)
    Global Variables                        0 (0.00%)
    Super-Global Variables                 13 (100.00%)
  Attribute Accesses                     3974
    Non-Static                           3965 (99.77%)
    Static                                  9 (0.23%)
  Method Calls                          13062
    Non-Static                          12972 (99.31%)
    Static                                 90 (0.69%)

Structure
  Namespaces                              525
  Interfaces                               65
  Traits                                    2
  Classes                                1360
    Abstract Classes                        0 (0.00%)
    Concrete Classes                     1360 (100.00%)
  Methods                                5363
    Scope
      Non-Static Methods                 5362 (99.98%)
      Static Methods                        1 (0.02%)
    Visibility
      Public Methods                     5157 (96.16%)
      Non-Public Methods                  206 (3.84%)
  Functions                                50
    Named Functions                         0 (0.00%)
    Anonymous Functions                    50 (100.00%)
  Constants                                20
    Global Constants                        0 (0.00%)
    Class Constants                        20 (100.00%)

Tests
  Classes                                  92
  Methods                                 261

As you can see, about a third (31.58%) of the overall codebase is comments. The average length of a class is 10 lines of code, but the maximum is 212. That may be a class that’s worth looking at and refactoring. Looking further down, the average method length is just 2 lines of code but the longest is 42 lines. Again, this potentially points to a place that might warrant some further investigation and simplification.

The next section is Cyclomatic Complexity. Cyclomatic complexity is a measure of how many different linearly independent paths exist in the code. For example, a method with no loops and no branches has a cyclomatic complexity of 1. There are no branches and hence there is only one way through that code. Each branch point in the code, including loops and case statements, increases the cyclomatic complexity number by one. For example, a method with two if statements would score a 3 for cyclomatic complexity, scoring one for the entrance to the method and one more for each of the if statements.
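To make the counting concrete, here is a minimal sketch of a function with two if statements. The function name and the shipping rates are made up purely for illustration; only the counting in the comments matters.

```php
<?php
declare(strict_types=1);

// Hypothetical example: the name and the rates are invented for illustration.
function shippingCost(float $orderTotal, bool $isExpress): float
{
    $cost = 5.00;               // +1 for entering the function

    if ($orderTotal >= 100.0) { // +1 for the first if
        $cost = 0.00;
    }

    if ($isExpress) {           // +1 for the second if
        $cost += 10.00;
    }

    return $cost;               // cyclomatic complexity: 1 + 1 + 1 = 3
}
```

Adding a loop or another conditional to this function would raise the score by one each time, which is roughly how a method climbs toward a number like 34.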

Methods and functions with high cyclomatic complexity are harder to understand and harder to test. They are places where bugs like to hide. This means I should probably look at my code and find ways to reduce the number of conditionals and loops in my method with a cyclomatic complexity of 34. Other tools we’ll talk about in a bit also use this metric, so we’ll revisit it then.

Next up is the section for Dependencies. Currently there are no global constants or global variables, but there are 13 places where my code is accessing superglobals. I took a look at this one in particular and fortunately all of them happen in three methods across three classes. I could refactor those to inject the values I need instead of having those methods reach out into the superglobals. Removing these accesses will make the code easier to test and remove a potential source of bugs. Also in this section are the numbers and types of attribute accesses and method calls.
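As a rough sketch of what that refactoring could look like, here is a before and after. The class names and the header being read are hypothetical and are not taken from the codebase measured above.

```php
<?php
declare(strict_types=1);

// Hypothetical "before": the method reaches into the $_SERVER superglobal.
final class LocaleDetectorBefore
{
    public function preferredLanguage(): string
    {
        $header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '';

        return $header === '' ? 'en' : substr($header, 0, 2);
    }
}

// Hypothetical "after": the value is injected, so the class no longer
// knows or cares where it came from.
final class LocaleDetector
{
    private $acceptLanguage;

    public function __construct(string $acceptLanguage)
    {
        $this->acceptLanguage = $acceptLanguage;
    }

    public function preferredLanguage(): string
    {
        return $this->acceptLanguage === ''
            ? 'en'
            : substr($this->acceptLanguage, 0, 2);
    }
}

// The superglobal is read once, at the edge of the application:
// $detector = new LocaleDetector($_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '');
```

In a test, the second version can be constructed with any string, with no need to set up or restore $_SERVER.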

The Structure section indicates how many namespaces, interfaces, and traits were found. It splits classes between concrete and abstract. Methods are broken down between public and non-public and between static and non-static. Additionally, functions and constants are shown. For the output in the listing, I also included the --count-tests option, which counts PHPUnit tests and lists the number of test classes and methods.

PHPLOC provides interesting statistics relatively quickly on your codebase, but it is up to you to interpret and act on them. But for now, on to the next tool.

PHP_CodeSniffer

If you’re working on a codebase with a team, being able to easily read and understand each class and section of code is important. A few years ago, persuading team members to follow a coding standard was a much bigger challenge than it is today. While there are still developers who rail against coding standards, most I see have embraced the reality that even if you don’t completely agree with all parts of any given coding standard, having a coding standard is better than not. Many, if not most, open source PHP projects define a coding standard they follow. And many of these now use the standards defined by the PHP-FIG (Framework Interoperability Group), namely PSR-1 and PSR-2. There are also a number of other coding standards included with PHP_CodeSniffer, such as the PEAR and Zend standards.

PHP_CodeSniffer allows you to use any of these defined standards as-is, to use one of them while adding or removing certain “sniffs”, or to define your own standard with any of the “sniffs” from any of the standards. The software will run across your codebase and report any errors or warnings for code that doesn’t conform to your chosen or defined coding standard. This allows impartial software to let developers know when they’ve submitted code that is not compliant with the coding standards, even going so far as to reject pull requests or commits that are not up to standards (if you so choose).

Additionally, there are a number of tools available that will automatically reformat code to follow your chosen coding standard. I won’t be covering them here though. Choosing a defined coding standard (recommended) or creating your own (not recommended) means that no matter what section of code a developer is working on, the resulting code should look the same, which lowers the barrier to making changes by allowing developers to focus on the code, not how it’s formatted.

PHP Copy/Paste Detector

The PHP Copy/Paste Detector finds copy/pasted code in your software. And it’s probably smarter about it than you may think. First of all, in order to count, duplicate sections of code need to be longer than some threshold, so a bunch of small methods that are duplicated won’t be reported as duplicated. Secondly, even if the code has had variable names changed or comments added and been reformatted, phpcpd can find and report on those duplicate sections. It’s actually using the PHP parser tokens to determine blocks of duplicated code, so formatting and variable names don’t necessarily come into play. The phpcpd report will indicate the files and line ranges of the duplicated code it found, as well as a summary indicating how much of the codebase is made up of copy/pasted code.
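To give a feel for why comparing tokens is more robust than comparing raw text, here is a rough sketch built on PHP’s token_get_all(). This is not phpcpd’s actual algorithm; the tokenSignature() helper and the two snippets are assumptions made up for this illustration, and unlike phpcpd the sketch does nothing about renamed variables.

```php
<?php
declare(strict_types=1);

// Rough sketch only, not how phpcpd is implemented: reduce a piece of PHP
// source to a "signature" that ignores whitespace, comments and the open tag.
function tokenSignature(string $source): string
{
    $ignored = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT, T_OPEN_TAG];
    $parts = [];

    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            if (in_array($token[0], $ignored, true)) {
                continue;           // formatting and comments don't count
            }
            $parts[] = $token[1];   // the token's text
        } else {
            $parts[] = $token;      // single-character tokens such as "=" or ";"
        }
    }

    return implode(' ', $parts);
}

// The same statement, once with a comment and once reformatted.
$first = <<<'PHP'
<?php
$total = $price * $quantity; // line total
PHP;

$second = <<<'PHP'
<?php
$total   =   $price
    * $quantity;
PHP;

var_dump(tokenSignature($first) === tokenSignature($second)); // bool(true)
```

A plain string comparison of $first and $second would report a difference, while the token comparison sees the same code.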

While it might be tempting to keep this tool reporting at 0% all the time, that may not be a good idea in all cases. In many things, including reducing copy/paste code, the answer is “it depends”. Often copy/paste code can be reduced by refactoring the code into a common method or class and then calling it from the places that require it. However, it is important to ensure that the code that is duplicated is actually doing the same thing for the same reason. It’s often better to deal with duplicated code as opposed to introducing an incorrect abstraction.

PHP Mess Detector

The PHP Mess Detector is another code analysis tool that can tell you about potential problems and issues. This includes reporting on dead code (code that is unreachable), overcomplicated expressions (cyclomatic complexity), and other possible bugs. The phpmd tool has a number of different rulesets that can be enabled or disabled, each containing several different rules that it uses to detect these (potential) problem areas.

The first set of rules is about writing clean code. It will detect and report on functions and methods that use boolean arguments. The reason for this is that a boolean argument often represents a violation of the single responsibility principle (SRP). It’s the “S” in “SOLID”. It means each of our methods should do one thing and do it well. A boolean argument may indicate two different responsibilities provided by one method. To fix it, the logic around the boolean flag could be extracted into another method or class.
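Here is a small sketch of that kind of extraction. The function names and the formatting rules are hypothetical; the point is only that the flag disappears and each function ends up doing one thing.

```php
<?php
declare(strict_types=1);

// Hypothetical "before": the boolean flag selects one of two behaviours.
function formatPrice(float $amount, bool $withCurrencySymbol): string
{
    $formatted = number_format($amount, 2, '.', ',');

    if ($withCurrencySymbol) {
        return '$' . $formatted;
    }

    return $formatted;
}

// Hypothetical "after": one function per responsibility, no flag.
function formatAmount(float $amount): string
{
    return number_format($amount, 2, '.', ',');
}

function formatAmountWithSymbol(float $amount): string
{
    return '$' . formatAmount($amount);
}
```

A call like formatPrice(1234.5, true) tells the reader very little about what true means, while formatAmountWithSymbol(1234.5) says what it does.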

The “clean code” ruleset also looks for static access to other resources and uses of “else”. Static access can be a problem when testing because it makes it difficult to introduce a test double. And according to the phpmd docs, the “else” keyword is never needed and recommends the use of a ternary when doing simple assignments. Personally, I wouldn’t always follow all of those recommendations, but having the phpmd report list these as issues gives me an opportunity to revisit code and potentially come up with a better or simpler solution.
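As an illustration of the “else” recommendation, here is the same hypothetical discount calculation written three ways: with an else, with a ternary for the simple assignment, and with an early return. The function names and the discount rule are invented for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical example: this is the shape that would be reported for using "else".
function discountWithElse(int $quantity): float
{
    if ($quantity >= 10) {
        $discount = 0.10;
    } else {
        $discount = 0.0;
    }

    return $discount;
}

// A ternary covers the simple assignment case.
function discountWithTernary(int $quantity): float
{
    return $quantity >= 10 ? 0.10 : 0.0;
}

// An early return removes the else without a ternary.
function discountWithEarlyReturn(int $quantity): float
{
    if ($quantity >= 10) {
        return 0.10;
    }

    return 0.0;
}
```

All three behave the same; which one reads best is the kind of judgment call the report leaves to you.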

Next up is the “Unused Code Rules” ruleset. These detect unused private fields, unused local variables, unused private methods and unused formal parameters. The first three of these I’ve found are always fixable and often indicate either a change in design or a refactor that left a little bit of mess in the codebase. Removing them is nearly always the right thing to do and should not ever result in broken code. The only rule I’ve had any issue with is the one for unused formal parameters. If the parameter is part of an interface you’re implementing, then removing it would require a change to the interface, which would necessitate changing the other classes that implement the interface. This may not be possible. If the method in question doesn’t implement a method from an interface, then removing the unused parameter would require finding all callers of the method and changing them. Tools like PhpStorm make this easy, but if the unused parameter is not the last one in the list, it’s imperative to update all callers to remove the argument. If this is not done, it will introduce an error.

The third set of phpmd rules is about naming. It detects fields, parameters and local variables whose names are too long or too short. For instance, variable names like $id will be flagged as being too short. Variable names longer than 20 characters will also be flagged. Additionally, the naming rules will find constants that are not defined in upper case. They will detect and flag PHP 4 style constructors. These are constructors that aren’t “__construct”, but rather match the name of the class. Finally, they will flag “getters” for boolean fields that use the word “get” instead of “is” or “has”. For example, a getter named “getValid” on a boolean field will be flagged with the recommendation that the getter be called isValid() or hasValid().

Additionally, PHPMD includes a set of rules around design. It will flag code that uses exit(), eval() or goto. It will flag classes that have over 15 child classes, and will flag classes that descend from more than six classes. Both of these indicate there is likely an unbalanced inheritance hierarchy. Finally, it looks at coupling between objects. This means it will count up dependencies, method parameters that are objects, return types, and thrown exceptions. If there are more than 13 total, the class will be flagged. This detection works not just on formally type-hinted parameters, but also examines the doc block comments for @return, @throws, @param and others. Each of these coupled classes indicates some other class that the class in question must know something about, or that consumers of this class need to be aware of. Keeping this number low means it’s easier to work with the code.

The next-to-last set of rules is the so-called “Controversial Rules”. They are mostly about coding style and many would be covered by a PHP_CodeSniffer ruleset - ensuring class names, property names, method names, parameters and variables are defined in camel case. The final rule in this set is about accessing superglobal variables. Where phploc will report a count of how many times superglobal variables are accessed, phpmd will give you a report of where all those places in the code are.

The final ruleset for phpmd contains the “Code Size Rules”. These are the rules that, for me at least, lead to the best improvements in the code, but typically take the most effort to resolve. The first is Cyclomatic Complexity. As mentioned in the phploc section, this is how many paths there are through the code. PHPMD will flag methods with more than 10 paths. Each “if”, “else if”, “for”, “while” and “case” will be counted, along with 1 for entering into the method.

The next rule is for NPathComplexity. It is somewhat related to Cyclomatic Complexity, but it is the number of unique paths through the code, not just linearly independent paths. For this metric, each added conditional or loop can have a multiplier effect on the number of paths through the code. A score of 200 or higher will result in phpmd flagging the code.
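The multiplier effect is easiest to see with a few independent if statements in a row. In this sketch (the function and its flags are hypothetical), cyclomatic complexity is 1 + 3 = 4, while the number of unique paths is 2 × 2 × 2 = 8, because each if can be either taken or skipped regardless of the others. The loop at the bottom simply enumerates every combination to confirm the count.

```php
<?php
declare(strict_types=1);

// Hypothetical function with three independent if statements in sequence.
// Cyclomatic complexity: 1 (entry) + 3 (ifs) = 4.
// Unique paths (NPath): 2 * 2 * 2 = 8.
function describeOrder(bool $isGift, bool $isExpress, bool $isInternational): string
{
    $path = '';

    if ($isGift) {
        $path .= 'G';
    }

    if ($isExpress) {
        $path .= 'E';
    }

    if ($isInternational) {
        $path .= 'I';
    }

    return $path; // records which branches were taken
}

// Enumerate every combination of inputs and collect the distinct paths.
$paths = [];
foreach ([false, true] as $gift) {
    foreach ([false, true] as $express) {
        foreach ([false, true] as $international) {
            $paths[describeOrder($gift, $express, $international)] = true;
        }
    }
}

echo count($paths), "\n"; // prints 8
```

A fourth independent if would add only one to the cyclomatic complexity but double the path count to 16, which is why the NPath score can pass 200 faster than you might expect.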

The excessive method length rule looks at the lines of code in a method as an indicator that a method may be trying to do too much and recommends refactoring into other helper methods and classes, or removing copy/pasted code. Similarly, the excessive class length rule looks at the lines of code in the entire class, again as an indication the class might be doing too much.

The excessive parameter count rule flags methods that have more than 10 parameters. It may indicate that a new object should be created to group like parameters. The “Excessive Public Methods” rule flags classes that define more than 45 public methods. It indicates that a lot of effort may be needed in order to thoroughly test the class. The recommendation is to break the class into smaller classes that each do less. PHPMD will also flag classes that contain too many methods (public or otherwise) or too many properties. More than 15 properties or 25 methods may indicate a class that could be reduced in complexity. Both the public method and total method rules will ignore getters and setters by default.
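Grouping like parameters into a new object might look something like the following sketch. The Address class and the shippingLabel() function are hypothetical, and a real offender would have far more parameters than this, but the idea is the same: related values travel together as one named concept.

```php
<?php
declare(strict_types=1);

// Hypothetical value object that groups four related parameters into one.
final class Address
{
    private $street;
    private $city;
    private $postalCode;
    private $country;

    public function __construct(string $street, string $city, string $postalCode, string $country)
    {
        $this->street = $street;
        $this->city = $city;
        $this->postalCode = $postalCode;
        $this->country = $country;
    }

    public function toLabel(): string
    {
        return $this->street . "\n"
            . $this->postalCode . ' ' . $this->city . "\n"
            . $this->country;
    }
}

// Before (sketch): shippingLabel($recipient, $street, $city, $postalCode, $country)
// After: the address is passed as a single argument.
function shippingLabel(string $recipient, Address $address): string
{
    return $recipient . "\n" . $address->toLabel();
}

$address = new Address('1 Main St', 'Springfield', '12345', 'USA');
echo shippingLabel('Jane Doe', $address), "\n";
```

The new class also gives address-specific behaviour, such as formatting, an obvious home.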

Finally, the “Excessive Class Complexity” rule totals up the complexity metrics of the various methods in a class. It gives an indication of the relative amount of time and effort that would be needed to maintain or modify the class.

It is simple to configure phpmd to use some or all of the rules across any of the rulesets. If a configured rule doesn’t make sense for your codebase or you don’t agree with it, simply disable it. Overall though, phpmd provides a great way to automatically detect what could be problem areas of your code.

Recommendations for Usage

While it is possible to run all of these tools on your own development machine, chances are that doing so will become tedious and you’ll soon stop running them. Instead, I’d recommend setting up a “build” in Jenkins or another continuous integration server (think Travis CI, Bamboo, or others). These tools can be configured to run any time new source code is checked in, whether that’s when it is merged into the mainline or, preferably, also when a pull request is submitted. The CI tools can keep track of the various statistics and reports between one run and the next and produce charts and graphs that allow for easy viewing of trends and changes. The build job can also be configured to fail when the statistics that indicate problems are increasing, signaling that the code should not be merged or accepted until the issues introduced are resolved.

The CI server can report build status via email, Slack, IRC and other channels to inform you or other developers of the results of running all of these tools. It’s then up to you to determine when, and with how much effort, you want to work on the code to reduce the various errors and warnings presented by these code analysis tools.

Conclusion

I’d recommend looking at these tools, trying them out, and also taking a look at the other tools mentioned on the PHP QA Tools page: http://phpqatools.org/. In addition to providing installation and configuration information for the tools I’ve talked about, it also provides information about getting all of these tools (and more) to work in Jenkins. If you’re already running a CI server and don’t have these tools, consider installing them and integrating the reports into your build. Use the reports to increase the quality and maintainability of your code.

Finally, one tool we didn’t go into much this month was PHPUnit. With that tool, it’s important, but not always easy, to write tests that are effective and help ensure the code is doing the right thing. Next time I’ll talk about a tool that can help you determine if your tests are as effective as you might think they are. I’m really excited about sharing it with you and I hope you’ll join me next month.