Writing tests for your code is an outstanding way to ensure your application is doing what it should do and not do what it shouldn’t do, without needing to hire a lot of QA staff or relying on your users to beta test and find bugs. Since this column started, I’ve written on various testing topics and tools – PHPUnit, phpspec, and humbug, as well as testing philosophies like TDD. It’s been six months since the last testing article, and I’ve got another testing tool and another technique to tell you about.

Behavior Driven Development

We’ve talked about TDD, or Test-Driven Development, before. In TDD, you write tests that describe what you want the code to do or not do, and then you build the code that makes those tests pass. Behavior-Driven Development (BDD) is a variant of TDD, but it traditionally works at a different level and from a different perspective than we’d normally expect with TDD.

Typically, TDD is focused on unit testing. That is, we’re describing what an individual unit of code should do. This “unit” typically means a single public method in an object. If the class has dependencies, we mock out those external classes or systems and fake the behavior, so we can ensure our method does what we want when the external dependencies behave in certain ways. This is a great approach, and it’s extremely useful in many cases, but there are some downsides.

In order to test classes that have dependencies, we create test doubles that simulate how those dependencies should react, which lets us confirm that the code under test is working correctly. However, when the system is integrated and using the real dependencies, things can go poorly. Imagine integrating with another team’s library or another company’s API, and they change how it works. Maybe they fixed a bug your code inadvertently relies on, or they’ve added or removed fields or functionality. Maybe some aspect of the database has changed: a new field was added, or a critical field was removed. All your unit tests would still pass, but when the code runs together, it fails.
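To make that failure mode concrete, here’s a small, dependency-free PHP sketch (assuming PHP 7.4 or later). The interface and classes are hypothetical, invented for illustration: the same code is exercised first against a hand-written stub that reflects what we believe the API returns, and then against a stand-in for the real API after its owners renamed a field.

```php
<?php
// Hypothetical example: UserApi, StubUserApi, RenamedFieldApi and Greeter are
// invented for illustration and don't come from any real library.

interface UserApi
{
    /** Returns the decoded JSON payload for one user. */
    public function fetchUser(int $id): array;
}

// The test double encodes our assumption about the shape of the response.
final class StubUserApi implements UserApi
{
    public function fetchUser(int $id): array
    {
        return ['id' => $id, 'name' => 'Ada'];
    }
}

// Stands in for the real API after its owners renamed "name" to "full_name".
final class RenamedFieldApi implements UserApi
{
    public function fetchUser(int $id): array
    {
        return ['id' => $id, 'full_name' => 'Ada'];
    }
}

final class Greeter
{
    private UserApi $api;

    public function __construct(UserApi $api)
    {
        $this->api = $api;
    }

    public function greet(int $id): string
    {
        $user = $this->api->fetchUser($id);
        if (!array_key_exists('name', $user)) {
            throw new UnexpectedValueException('User payload has no "name" field');
        }
        return 'Hello, ' . $user['name'];
    }
}

// The unit-level check against the stub passes...
echo (new Greeter(new StubUserApi()))->greet(1), PHP_EOL; // Hello, Ada

// ...but the same code breaks once it is wired to the changed dependency.
try {
    (new Greeter(new RenamedFieldApi()))->greet(1);
} catch (UnexpectedValueException $e) {
    echo 'Integration failure: ', $e->getMessage(), PHP_EOL;
}
```

The stub-based check keeps passing no matter what the other team does, which is exactly why a higher level of testing is needed to catch the change.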

This is where we can look at levels of testing beyond unit tests. There are four widely accepted levels of testing. Let’s take a quick look at them before moving on to Behat and BDD.

Four Levels of Testing

At the smallest level, we have unit tests. Ideally, unit tests are written as early as possible. They test each bit of code in isolation, which is where we use test doubles like mock objects and stubs to ensure that our code does what it should. These tests should run very quickly, and chances are there will be a lot of them. Tools like phpspec are built for this level of testing, and PHPUnit handles it well too. Errors are detected early and quickly at this level, and unit tests should be the most stable tests in the system.

The next level is integration testing. At this level, we are combining different parts of the system to ensure they are working properly with each other. This could be a test that exercises a method that makes an API call or saves values to a database. At the unit test level, we’d provide a fake database or API. At the integration level, we can use the real database or real API, though we may call it directly without going through our full routing, validation, security, and other layers. Integration tests run more slowly than unit tests, sometimes significantly so. At this level, you can detect errors that happen when libraries or services change, but you are also just as likely to run into false failures when the services you’re trying to use are unavailable or misbehaving. We can use PHPUnit to perform this sort of test in an automated way.

After integration testing, there’s system testing. At this level, we’re testing the system as a whole, which tells us that everything is configured, provisioned, and working together. These tests are even slower than integration tests. Often the QA team performs this level of testing, but we can also use Behat.

Finally, there’s acceptance testing. This is where we determine if the product is ok to ship or deploy, as well as if it’s doing what it should be, if words are spelled correctly, colors and alignment look good, and no major bugs are found. Tests at this level are even slower and more fragile. Often, this level is not automated, or it’s only partially automated. Some acceptance tests can be automated via a tool like Behat.

As we move from unit tests, the fastest and most stable level, up through integration, system, and acceptance testing, the tests become slower and more fragile. This makes sense. At the higher levels, there are more systems, more code, and more requirements in play. Requirements are added or changed, which in turn affects the software from the top down. On the other hand, if a change breaks a unit test, there’s probably only a single function that needs to be modified.

How is BDD Different?

With PHPUnit or PHPSpec (and Humbug), in order to be effective, you need to be able to write and understand PHP code. You probably need to be familiar with the code that’s being tested. As a developer, this is expected and understood, and if you are a developer, you may wonder why I’m even bringing it up at all. However, a QA tester may not be familiar with the code and may not be familiar with writing code. This means that in general, developers would need to be involved with building and maintaining any test suites written using PHPUnit or phpspec, if not entirely responsible for them.

My goal is to ensure that testing is not only a role of my developers, but something my QA team can build on and contribute to. If they are already spending time writing test cases, I want that effort to be reusable in the form of automated tests. I feel the only viable way to grow software applications is by ensuring we have good tests. If we rely on manual tests, then as the software grows, we must grow the QA team, which means more hires, more desks, and more computers. It’s not sustainable. I’d much rather keep building our automation suite and let the QA team find new and weird bugs. As they find bugs, anything that can be automated is added to the test suite, so they don’t have to worry about that particular bug ever coming back. If every new bug that’s found gets an automated test showing it has been fixed, you can effectively eliminate manual regression testing. That means if your team does not follow a continuous deployment/delivery cycle and instead uses a “code freeze” so testers can test the application as a whole without new changes coming in, that freeze can be greatly shortened or even eliminated.

Reporting Bugs and Automating Regression Testing

When testers find a bug or defect in the system, they’ll usually write up a description of the problem. A good bug report includes reproduction steps, the expected outcome, and the actual outcome. If there were a way to turn these descriptions and steps into automated tests, then the testers’ job of verifying that the fix for a defect they reported works would be simple: they just run that automated test or tests.

Once we have those tests, we can run them on every change to the system and be assured that those defects haven’t sneaked back in. In other words, our regression cycle isn’t happening at the end of the iteration or once in a while; it’s happening on every change, each time a bug is fixed, a feature is added, or a pull request is merged.

Describing how new features work can also be thought of as a series of bug reports: instead of the system doing something incorrectly, it isn’t doing anything at all. It is harder to describe new features in terms of bug reports, but it may be possible given a good understanding of the requirements and the system.

Introducing Behat

Behat is a BDD testing framework. It is related to Cucumber, a Ruby BDD framework, and is now listed as Cucumber’s official PHP implementation. The tests in Behat and Cucumber are organized into features (essentially test suites), while each individual test is called a scenario. There’s a standard flow for writing these tests, which are written in a language called Gherkin. Gherkin is a domain-specific language that lets you describe the behavior of your software without describing how it is implemented. It can act as both documentation and the source for controlling automated tests.

Each “feature” in Behat is described in a *.feature file. The first part of a feature file is the “Feature Introduction”. Usually, it will look like a standard agile feature description:

Feature: Election Management API
  As an election administrator
  I need to be able to manage election details
  So I can properly configure the system for a new election

This block of text doesn’t actually do anything as far as testing is concerned. It’s strictly documentation. You can write whatever you want here.

The rest of the feature file consists of scenarios. Each scenario describes some aspect of the functionality that makes up the feature (or bug). For example, a bug report could look like this:

@BUG-1134
Scenario: A request for an id with alpha should not match routing and be a 404
  Given I authenticate as an administrator
  When I request "GET /api/election/banana"
  Then I should get a "404" response
  And The "detail" field should be "Page not found."

There are a couple of things worth noting. First, this scenario has a tag: the @BUG-1134 part. Tags are optional, but Behat allows you to filter which scenarios you run by tag. This means the QA team could verify the bug is fixed by running vendor/bin/behat --tags @BUG-1134.

The next line is the scenario description. For a bug report, it can state the expected behavior. Every line after that is a step of the scenario, and together the steps combine setup and validation. Each step starts with one of the following keywords: Given, When, Then, And, or But. Behat doesn’t use these words to decide which code runs for a step, but they make the scenario read more naturally. Typically, a scenario uses “Given” for setup steps, “When” for the action we are performing, and “Then” to transition to verification and validation. However, any of these words can start any step, in any order. Since scenarios double as documentation, I’d recommend choosing the words that make the most sense when read aloud. If you’re verifying multiple things, that may mean starting the first verification with Then and each subsequent one with And, or you may decide it reads best to start each verification with Then.

When you run the test, Behat by default outputs the scenario description, followed by each of the steps, along with the file and line each one comes from. If a step fails, Behat marks it as failed and shows the reason. Any steps after a failing step are skipped, the same way the rest of a PHPUnit test stops once an assertion fails.
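That execution rule can be sketched in a few lines of plain PHP. This is a hypothetical illustration of the behavior, not Behat’s actual runner; each step is represented as a pair of its text and a closure that throws on failure.

```php
<?php
// Hypothetical sketch of the execution rule: run steps in order and, once one
// throws, skip everything after it. Not Behat's actual runner.

/**
 * @param array<int, array{0: string, 1: callable}> $steps pairs of [step text, implementation]
 * @return array<int, array{0: string, 1: string}> pairs of [step text, outcome]
 */
function runScenario(array $steps): array
{
    $results = [];
    $failed = false;
    foreach ($steps as [$text, $implementation]) {
        if ($failed) {
            $results[] = [$text, 'skipped'];
            continue;
        }
        try {
            $implementation();
            $results[] = [$text, 'passed'];
        } catch (Throwable $e) {
            // Any exception counts as a failure, as it does in Behat.
            $results[] = [$text, 'failed: ' . $e->getMessage()];
            $failed = true;
        }
    }
    return $results;
}

$results = runScenario([
    ['Given I authenticate as an administrator', function () {}],
    ['When I request "GET /api/election/banana"', function () {}],
    ['Then I should get a "404" response', function () {
        throw new RuntimeException('Expected 404, got 200');
    }],
    ['And The "detail" field should be "Page not found."', function () {}],
]);

foreach ($results as [$text, $outcome]) {
    printf("%-32s %s\n", $outcome, $text);
}
```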

Scenario Templates

In PHPUnit, if we’ve got a bunch of tests that look the same, we can use the @dataProvider doc block annotation to send a set of different arguments into the test. In a similar fashion, Behat lets us build a scenario using the Scenario Outline or Scenario Template keyword. Suppose the bug report above also found that an id of 0, or an id larger than the database can handle, causes problems as well. In that case, we want to ensure that the id field only routes if it is a positive integer below some large value that is smaller than the maximum recognized integer. The only thing changing is the id, and we still expect the route not to match. We can rewrite the test like so:

Scenario Template: Invalid IDs should not route
  Given I authenticate as an administrator
  When I request "GET /api/election/<id>"
  Then I should get a "404" response
  And The "detail" field should be "Page not found."

  Examples:
  | id                  |
  | 0                   |
  | banana              | 
  | 1234567890123456789 |

With the above example, the scenario will be executed 3 times, and the <id> variable will be substituted with the values from the table, one at a time. Each example is treated like a separate scenario, so if one fails, the others will still run independently.
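Conceptually, the expansion is string substitution per Examples row. Here’s a rough sketch in plain PHP (7.4 or later); the expandOutline() function is hypothetical and only illustrates the idea, since Behat does this for you when it parses the feature file.

```php
<?php
// Hypothetical sketch: each Examples row yields one concrete scenario, with
// every <placeholder> in the step text replaced by that row's value.

/**
 * @param string[] $templateSteps steps containing <column> placeholders
 * @param array<int, array<string, string>> $examples one associative array per table row
 * @return array<int, string[]> one list of concrete steps per example row
 */
function expandOutline(array $templateSteps, array $examples): array
{
    $scenarios = [];
    foreach ($examples as $row) {
        // Build a map like ['<id>' => '0'] for strtr().
        $replacements = [];
        foreach ($row as $column => $value) {
            $replacements['<' . $column . '>'] = $value;
        }
        $scenarios[] = array_map(
            fn (string $step) => strtr($step, $replacements),
            $templateSteps
        );
    }
    return $scenarios;
}

$scenarios = expandOutline(
    [
        'Given I authenticate as an administrator',
        'When I request "GET /api/election/<id>"',
        'Then I should get a "404" response',
    ],
    [['id' => '0'], ['id' => 'banana'], ['id' => '1234567890123456789']]
);

foreach ($scenarios as $steps) {
    echo $steps[1], PHP_EOL; // the request step with <id> filled in
}
```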

Background and Common Setup

Suppose you’ve got a feature where every scenario requires the same setup, perhaps authentication or some other step. Instead of repeating those steps in every single scenario, we can do something similar to PHPUnit’s setUp method. Before each scenario in a feature file, Behat runs any steps in the “Background” section. You’ll want to place these steps at the start of your feature file, before the first scenario:

Background:
  Given I authenticate as an administrator

This means that we could eliminate the authentication step from every Behat scenario in that feature file. Depending on what you’re testing, this could eliminate a lot of lines of redundant steps.

How Does This All Work?

If you’re trying to follow along by writing and running these tests, you’re probably not having much luck so far. Behat runs through the feature file and matches each phrase to a method in a class called (by default) “FeatureContext”. Each scenario gets a new FeatureContext object, which can store state and other information during that scenario’s execution. Because each scenario gets its own FeatureContext, you don’t have to worry about state from one scenario leaking into the next.
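The per-scenario lifecycle, including the Background steps, can be approximated like this. RecordingContext and runFeature() are hypothetical stand-ins for FeatureContext and Behat’s runner, kept dependency-free so the sketch runs on its own (PHP 7.4 or later).

```php
<?php
// Hypothetical sketch: RecordingContext stands in for FeatureContext and
// runFeature() for Behat's runner. Only the lifecycle is modelled here.

final class RecordingContext
{
    /** Per-scenario state; it is discarded with the object after the scenario. */
    public array $log = [];

    public function authenticate(): void
    {
        $this->log[] = 'authenticated';
    }

    public function request(string $path): void
    {
        $this->log[] = 'GET ' . $path;
    }
}

/**
 * @param callable[] $background steps run before every scenario
 * @param array<string, callable[]> $scenarios scenario title => its own steps
 * @return array<string, string[]> scenario title => that scenario's log
 */
function runFeature(array $background, array $scenarios): array
{
    $logs = [];
    foreach ($scenarios as $title => $steps) {
        $context = new RecordingContext(); // fresh object, so no state leaks between scenarios
        foreach (array_merge($background, $steps) as $step) {
            $step($context);
        }
        $logs[$title] = $context->log;
    }
    return $logs;
}

$logs = runFeature(
    [fn (RecordingContext $c) => $c->authenticate()],
    [
        'Alpha id' => [fn (RecordingContext $c) => $c->request('/api/election/banana')],
        'Zero id'  => [fn (RecordingContext $c) => $c->request('/api/election/0')],
    ]
);

print_r($logs);
```

Note that the second scenario’s log doesn’t contain the first scenario’s request, because each one starts from a new context object.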

Inside the FeatureContext class, methods will be annotated with a doc block comment that contains a regular expression or expressions that match the phrases you see in the feature file.

For example:

/**
 * @Then /^I should get a "(\d+)" response$/
 */
public function iShouldGetAResponse($statusCode)
{
    $responseStatus = $this->response->getStatusCode();
    if ($responseStatus != $statusCode) {
        // Include both values so the failure explains itself
        throw new \Exception(sprintf(
            'Expected a %s response but got %s',
            $statusCode,
            $responseStatus
        ));
    }
}
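Behat treats an annotation wrapped in slashes as a regular expression, matches it against the step text without its keyword (Then, And, and so on), and passes each capture group to the method as an argument. The following dependency-free sketch imitates that idea with preg_match(); StepMatcher is a hypothetical class, and Behat’s own matcher does considerably more.

```php
<?php
// Hypothetical sketch of regex-based step matching: find the first pattern that
// matches a phrase and pass its capture groups to the implementation.

final class StepMatcher
{
    /** @var array<string, callable> regex => step implementation */
    private array $definitions = [];

    public function define(string $regex, callable $implementation): void
    {
        $this->definitions[$regex] = $implementation;
    }

    /** Runs the first definition whose regex matches, passing captured groups. */
    public function run(string $phrase)
    {
        foreach ($this->definitions as $regex => $implementation) {
            if (preg_match($regex, $phrase, $matches) === 1) {
                array_shift($matches); // drop the full match, keep the capture groups
                return $implementation(...$matches);
            }
        }
        throw new RuntimeException('Undefined step: ' . $phrase);
    }
}

$matcher = new StepMatcher();
$matcher->define('/^I should get a "(\d+)" response$/', function (string $statusCode) {
    return "expecting HTTP $statusCode";
});

echo $matcher->run('I should get a "404" response'), PHP_EOL; // expecting HTTP 404
```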

Any time a validation or check fails, we can throw an exception, and Behat treats it as a failure. Out of the box, Behat doesn’t ship with assertions the way PHPUnit does. However, PHPUnit’s assertions work by throwing exceptions when they don’t match, so you can install PHPUnit and use its assertions within your Behat FeatureContext. The previous method could be rewritten as:

use PHPUnit\Framework\Assert as t;

...<snip>...

/**
 * @Then /^I should get a "(\d+)" response$/
 */
public function iShouldGetAResponse($statusCode)
{
    t::assertEquals($statusCode, $this->response->getStatusCode());
}

Now, realistically, this is already making a lot of assumptions, and as you build your FeatureContext, you’ll probably want to handle them in some way. The first assumption is that by the time this step runs, we’ve already made a request and have a response. If we haven’t, then $this->response will likely be null, and Behat will fail with an error when PHP tries to call a method on null. Adding a check for a null response that fails with t::fail('A request must be made before checking the response'); ensures the test fails gracefully and lets the person who wrote the scenario know they’ve messed up. In my version of this code, I ensure that steps are called in order (that we have a response), and I output the response if the HTTP status code doesn’t match, which helps show what went wrong when an assertion fails. In fact, nearly all my phrases output the response when they fail. I find it gives much more insight into why a test is failing, which helps to fix the code.
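Here’s a minimal sketch of that guard and of including the response in the failure message. To keep it runnable without Behat or PHPUnit, it throws plain SPL exceptions instead of calling t::fail(), and ApiResponse and ApiAssertions are hypothetical classes rather than the FeatureContext I actually use.

```php
<?php
// Hypothetical sketch: fail clearly if no request has been made yet, and put
// the response body into the failure message so the cause is visible.

final class ApiResponse
{
    public int $status;
    public string $body;

    public function __construct(int $status, string $body)
    {
        $this->status = $status;
        $this->body = $body;
    }
}

final class ApiAssertions
{
    /** Set by a request step (not shown); null until a request has been made. */
    public ?ApiResponse $response = null;

    private function requireResponse(): ApiResponse
    {
        if ($this->response === null) {
            throw new LogicException('A request must be made before checking the response');
        }
        return $this->response;
    }

    public function assertStatus(int $expected): void
    {
        $response = $this->requireResponse();
        if ($response->status !== $expected) {
            throw new RuntimeException(sprintf(
                "Expected HTTP %d but got %d. Response body:\n%s",
                $expected,
                $response->status,
                $response->body
            ));
        }
    }
}

$context = new ApiAssertions();
try {
    $context->assertStatus(404); // no request yet, so the guard fires
} catch (LogicException $e) {
    echo $e->getMessage(), PHP_EOL;
}

$context->response = new ApiResponse(500, '{"detail":"Database unavailable"}');
try {
    $context->assertStatus(404);
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL; // includes the response body
}
```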

Because each line of the scenario executes on its own, it’s important to build phrases that are stand-alone and to ensure that anything they expect to be in place has run first. Just because the Behat scenarios can be built by putting the phrases in any order doesn’t mean it will work that way or that it makes sense.

In the examples I’ve given so far, which are very similar to actual Behat tests in my system, some phrases do very little work, while others do quite a lot. For instance, while the “I should get a "404" response” phrase is essentially a single assertion, the phrase for making a request is a lot more complicated. For me, it builds up a Guzzle request, including a body and headers that may have been configured in previous phrases. It makes that request and stores the response in the FeatureContext object, also parsing it into a separate property if it’s JSON. It will also automatically fail the test if a request returns a 500 status code or the request can’t be made at all.

In the Behat FeatureContext I have, I’ve built it for testing APIs. That means I can make requests against any endpoint with whatever HTTP verb I want, with whatever headers and body I need. I can make assertions about the returned HTTP status code. I can ensure that JSON content has fields that are exact values or values that match a regex pattern, or that they contain certain parts. It allows me to specify a path through the JSON object to get to a specific field. I can even extract certain fields and store them in variables that I can use when making subsequent requests or assertions.
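Two of those capabilities, following a path through decoded JSON and storing a value for later steps, can be sketched in plain PHP. The dot-separated path syntax (data.election.id) and the {name} placeholder style are assumptions for this example, and JsonVariables is a hypothetical helper, not part of Behat.

```php
<?php
// Hypothetical helper: the path syntax ("data.election.id") and {name}
// placeholders are assumptions for this sketch, not Behat features.

final class JsonVariables
{
    /** @var array<string, string> variable name => stored value */
    private array $variables = [];

    /** Follows a dot-separated path (e.g. "data.items.0.id") through decoded JSON. */
    public static function valueAtPath(array $json, string $path)
    {
        $current = $json;
        foreach (explode('.', $path) as $segment) {
            if (!is_array($current) || !array_key_exists($segment, $current)) {
                throw new OutOfBoundsException("No field at path '$path'");
            }
            $current = $current[$segment];
        }
        return $current;
    }

    /** Saves the value found at $path under $name for later steps. */
    public function store(string $name, array $json, string $path): void
    {
        $this->variables[$name] = (string) self::valueAtPath($json, $path);
    }

    /** Replaces {name} placeholders with previously stored values. */
    public function interpolate(string $text): string
    {
        $replacements = [];
        foreach ($this->variables as $name => $value) {
            $replacements['{' . $name . '}'] = $value;
        }
        return strtr($text, $replacements);
    }
}

// Usage: pull the id out of one response and reuse it in the next request.
$response = json_decode('{"data":{"election":{"id":42,"name":"General"}}}', true);
$vars = new JsonVariables();
$vars->store('electionId', $response, 'data.election.id');
echo $vars->interpolate('GET /api/election/{electionId}'), PHP_EOL; // GET /api/election/42
```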

This means I can quickly and easily write scenarios that test the APIs we have. It’s also easy to create phrases that are more expressive. For instance, I’ve written a lot of scenarios for various APIs to ensure routing doesn’t match for invalid IDs. I noticed that every one I’d written ended with a check for a 404 as well as a check that the detail field was “Page not found.” So I created a new phrase, “The route should not match”, which internally calls the methods that assert the status and detail.
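A composite phrase like that can simply delegate to the existing step methods. In this sketch the annotations use Behat’s regex style, but the class is hypothetical and simplified so it runs on its own: a real Behat 3 context must implement Behat\Behat\Context\Context, and the response here is a plain array rather than an HTTP response object.

```php
<?php
// Hypothetical, simplified context: no Behat interface and an array-based
// response, so the delegation pattern can run standalone.

final class RoutingContext
{
    /** Stand-in for the last HTTP response: ['status' => int, 'json' => array]. */
    public array $response = ['status' => 0, 'json' => []];

    /**
     * @Then /^I should get a "(\d+)" response$/
     */
    public function iShouldGetAResponse($statusCode): void
    {
        if ($this->response['status'] !== (int) $statusCode) {
            throw new RuntimeException(sprintf(
                'Expected a %d response, got %d',
                $statusCode,
                $this->response['status']
            ));
        }
    }

    /**
     * @Then /^The "([^"]*)" field should be "([^"]*)"$/
     */
    public function theFieldShouldBe($field, $expected): void
    {
        $actual = $this->response['json'][$field] ?? null;
        if ($actual !== $expected) {
            throw new RuntimeException(sprintf(
                'Expected "%s" to be %s, got %s',
                $field,
                var_export($expected, true),
                var_export($actual, true)
            ));
        }
    }

    /**
     * @Then /^The route should not match$/
     */
    public function theRouteShouldNotMatch(): void
    {
        // Delegate to the existing steps so the 404 check lives in one place.
        $this->iShouldGetAResponse('404');
        $this->theFieldShouldBe('detail', 'Page not found.');
    }
}

$context = new RoutingContext();
$context->response = ['status' => 404, 'json' => ['detail' => 'Page not found.']];
$context->theRouteShouldNotMatch(); // passes silently
```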

What Else

Of course, Behat can be used for more than just API testing. If you integrate it with a library like Mink, you can use it to control a browser and ensure that your website or application is behaving properly. This would allow for some additional level of System and Acceptance testing beyond just ensuring your API is acting right. Depending on how you create your FeatureContext, you could use Behat to build integration tests as well, or even unit tests, if you were so inclined. I still posit that PHPUnit and phpspec are better tools for unit testing though.

How We Use It

In our project, any time an API defect is found, we build a Behat test before fixing the defect. Our Behat tests are kept in a different source repository than our application code. Our CI server picks up the pull request for the new Behat test and runs it. It will fail, of course, but that’s OK. This gives the developer something to work against, and once the code is fixed, the Behat PR can be merged and will ensure the defect stays fixed once and for all. It does mean the Behat jobs on our CI server show a ton of failures, but when following a TDD or BDD workflow, this is expected. By keeping the tests in a different repo, though, we don’t have to interrupt our deployment flow. We can also upgrade dependencies like PHPUnit independently of the main application.

Additionally, since a feature or bug fix based on a Behat test involves two pull requests in two different repositories, we’ve made things a bit simpler by adding a feature to the webhooks project I told you about last month. It allows us to leave a comment like “Test this when [URL] is merged”, where the URL is that of the application pull request that should make the Behat test pass.

IDE Integration

One last bit before I wrap up. If you’re using PhpStorm, Behat integration is built in. This means you can command-click or control-click a step in your feature file, and it will jump directly to the code that implements the phrase. Additionally, when you need a new phrase, you can type it up in a feature file, click somewhere in it, press alt-enter or option-enter, and select “Create step definition”. It will prompt you for where to put the code, and it will generate the method name as well as the @Given doc block. Then you can fill it out. If you build a scenario full of phrases that don’t exist, it can create the stubs for you all in one shot as well.

You can also execute all your features, a single feature, or a single scenario with a keystroke in PhpStorm. If there are failing scenarios and you feel you’ve fixed them, you can also tell it to re-run only the failing scenarios.

Conclusion

Behat provides a really nice way to quickly automate system or acceptance tests. Since the tests are driven by English phrases, it is relatively easy to add new features or ensure bugs have been fixed. It allows users with little or no development experience to write scenarios that become running, working tests that tell developers what they need to do to fix a bug or build a feature. Because it tests at a higher level, the tests are slower than unit tests, but they can ensure the system is working as a whole, properly configured and integrated. I’d highly recommend trying it out. It has become a critical part of my testing strategy on my projects, right next to PHPUnit and phpspec. See you next month.

Behavior Driven Development With Behat