In the mid-to-late 1980s, Robert Zemeckis, Michael J. Fox, and Christopher Lloyd (and others) created a series of movies which explored time travel, paradoxes, and how if you mess up the past and your parents don’t fall in love, you may cease to exist. The second movie in the series (Spoiler Alert!) involves a bit where Biff, the series antagonist, retrieves a sports almanac from the future, brings it to the past, and is able to place a large series of winning bets on the outcome of various events, placing him at the top of his own empire and greatly affecting the future in an arguably negative way.

Introduction

While the movies are excellent and beloved by many, they’re not real. There’s no actual stainless steel car that can send the occupants through time when it reaches 88 miles per hour. In the movie, Marty jumps forward to just a few weeks from now – October 21, 2015 – ironically (or maybe not), the day after the end of PHP|World. So in a few weeks we’ll know completely how accurate or ridiculous the predictions made in this movie may be. Some prognostications have come true, while others, not so much. For instance, fax machines, while still, unfortunately, existing, do not play nearly as large of a role today as the movies predicted. We also still don’t have a real hoverboard.

As much as I would love to revisit these movies and talk about them for hours, I do have a point, and a way to tie all this together. In Back to the Future, Part II, Biff uses his knowledge of the future to make himself a lot of money. It is not hard to imagine that if one could actually predict the future, it would not be difficult to use that knowledge for profit. However, we cannot predict the future. Mostly.

On August 11, 2015, just a couple of weeks ago, Federal authorities unsealed charges against 32 hackers and international traders who used their knowledge of the future to gain profits of over $100 million by trading stocks. Now, it’s certainly not illegal to trade stocks, but it is illegal to trade stocks when you have insider information about deals a company may make or how they will announce their performance during their annual or quarterly reviews. However, these people were not (to my knowledge) insiders in the traditional sense. They used knowledge that was released to the public via news articles to make trades that made them a lot of money. So what’s the problem?

The information they were using was in the form of not-yet-released news articles, press releases, earnings statements, etc which they acquired by hacking the networks belonging to Marketwired and the PR Newswire Association. This gave them access to these articles before they were published, allowing them to trade based on easy-to-make predictions of what would happen to various companies’ stock prices before the public knew about the news. If you have a way of knowing with a fair amount of certainty that a stock price is going to get a good bump (or take a dive), it’s not hard to make trades that will take advantage of that knowledge.

By using a series of SQL injection attacks against the servers, over three years, the hackers gained access to about 150,000 draft news articles which they used to make informed trades. They didn’t do this all at once and they didn’t act on every article that was stolen which made it harder for authorities to figure out that something shady was going on.

What is SQLi?

Last month, I talked briefly about SQL Injection, or SQLi attacks. This month, it’s all about that. Like PHP, SQL (Structure Query Language) has certain keywords which mean something to the language on their own. These include words like SELECT, INSERT, UPDATE, DELETE, WHERE, INTO, etc. In between these keywords you’ll find words which were supplied by a user: names of tables, fields, functions, and more. In many queries there are also values or patterns which are used to control or limit the records which are affected by a particular query.

The database engine which runs they queries and gives back results or changes data, is able to interpret a provided instruction string into keywords, identifiers and data in order to do what we’ve asked it to do. The problem comes about when the SQL engine doesn’t know what the intention of a query is and it doesn’t know the difference between the instruction parts of SQL that the developer wanted to run as instruction and data provided by a user which may be misinterpreted as instruction rather than data.

Let’s take a look at how a SQL statement might be built and how it could be susceptible to an injection attack:

$query = "SELECT * FROM users WHERE username = '{$_REQUEST['user']}';"
// Run the $query

Now suppose we’ve set up a page with a field called user and we’ve got well behaved users. As long as they’re not messing around with us, a field containing a standard string representing a username will come into that query and we’ll have something that looks like this:

$query = "SELECT * FROM users WHERE username = 'dave';"

That’s a perfectly legitimate query and it will give back the results the developer was expecting: If there’s some user named ‘dave’ then we’ll get a row back. If not, we will get back something indicating that no such row exists. It’s probably worth noting here early on that most of the code examples in this article will be bad. Don’t use them in your code unless you’re practicing making attacks, and certainly don’t let this code get into any code you’re running on public servers.

So let’s jump out a bit to explain $_REQUEST just in case anyone reading is not familiar. $_REQUEST in PHP is what’s known as a “superglobal”. It’s automatically set up and populated by PHP and it’s available everywhere. The $_REQUEST array will be filled with values from $_GET (query parameters) and $_POST (standard form data from HTTP POST requests where the Content-Type is application/x-www-form-urlencoded). If the request comes in with some other Content-Type, then $_POST won’t be populated and $_REQUEST will not have any of the POSTed fields. It’s also possible that $_REQUEST could be populated by cookie values, depending on your php.ini setting for request_order. By default it’s, GP which stands for GET and POST. So right there it means that it’s not possible to know from $_REQUEST if the variable is a query parameter, a POST field or even a cookie. Not knowing where your variables come from is not a great idea on its own, but it’s not the topic today.

Now let’s revisit the SQL above. Suppose we’ve got a user who has an apostrophe in their name, like O’Reilly. They fill out the form, submit, and our query becomes:

$query = "SELECT * FROM users WHERE username = 'O'Reilly';";

The string itself is ok, but when executed as a SQL statement, the SQL engine will think the query is using “Reilly” as some sort of command and it will fail to run. Even though there was (likely) no malicious intent from our O’Reilly user, they’re not likely to have a good time on the site since it will not behave well with their username. Back in the early days of PHP, a function named addslashes was added and uncountable tutorials on the language recommended its use. The function replaces single quotes with \’ which means our code above changes slightly:

$username = addslashes($_REQUEST['user']);
$query = "SELECT * FROM users WHERE username = '$username';";

Now this is a tiny bit better because our O’Reilly friend will be able to use the site. The resulting query string becomes:

$query = "SELECT * FROM users WHERE username = 'O\'Reilly';";

This is a legitimate, runnable SQL string (in MySQL). But addslashes is bad, so my first suggestion is to make sure that you’re not using it anywhere in your code. If you find calls to it, work toward removing them and making your code safe. The prevalence of addslashes and the false assumption that just escaping (that’s what the backslash is doing) single quotes was good enough resulted in a misguided concept called magic quotes. Magic quotes meant that PHP would automatically escape quotes in strings if found. Since the majority of code at the time was going against MySQL and this worked well enough, it stuck around in PHP for some time. It was deprecated in PHP 5.3.0 and removed in PHP 5.4.0, but while it was around, and since it’s a feature that could be turned on or off in the ini, it lead to a lot of problems which to many PHP developers, were indicators of other developers or admins who didn’t really understand what they were doing. Indicators usually were strings which when viewed on the site would have single quotes prefaced by one or many backslashes. This was usually caused by a developer working on a machine that didn’t use magic quotes and manually calling addslashes and then uploading to a server which was configured with magic quotes causing the backslashes to be escaped as well.

As PHP evolved and gained more and more support for other flavors of databases, it became clear that a one-size-fits-all solution to escaping database input would not work and was not appropriate. Instead, it’s important to filter input strings and escape output strings (output into the database I mean) through a database-specific method which can ensure that SQL injection is avoided. More on that in a bit.

For now, though, enough of the history lesson, let’s get back to SQLi and recommendations on how to identify and fix issues in the code. If your code contains calls which start with mysql_, I would highly recommend fixing it. The mysql extension has been deprecated as of PHP 5.5.0 and will be removed. This means upgrading your server to run on a current version of PHP like PHP 5.6 or the upcoming PHP 7 may not work. The mysqli extension is recommended over the mysql extension. If you’re using MySQL on your sites or applications, the mysqli extension will work, and will support everything you can do with MySQL 5.1+. PDO doesn’t support every bit of every database functionality, just most, but I would still recommend PDO over mysqli (or other database specific functions) unless you’re using, and your application requires some of the functionality that PDO does not support. Chances are though, PDO will work for anything you’re doing.

The advantage PDO provides is that you’ll be able to work with a number of different database engines using the exact same set of method calls. In my previous position, we had a single application which needed to fetch data from MySQL, PostGreSQL, Oracle, and Microsoft SQL Server. To use each of these with their native drivers would mean learning mysqli_*, pg_*, oci* and mssql_* functions. By using PDO, I was able to connect and send queries into all of these databases with the same set of methods. While the actual SQL dialects means the queries needed to be built slightly differently, the PHP calls were all the same.

Identifying Potential SQL Injection

The easiest way to have a SQL injection vulnerability in your code is to build your queries using string concatenation with user provided data. By “user provided data”, I am intending to cast a wider net than you might be thinking. Of course all the standard $_GET, $_REQUEST, $_POST and $_COOKIE values are suspect. Additionally I also mean any value that we’ve stored in the database as well. You might be wondering why. It’s because at some point, data in the database may have been inserted through some way that would not pass our current restrictions. It could be DBAs directly inserting data, loading data from files, or at some time in the past when your application was not quite as secure as it may be today. So with that in mind, I mean we need to look for queries in our code that are built using PHP variables directly in the query.

In order to find potential SQL injection candidates, you’ll want to search your code for any queries that you’re running. This means looking for calls to functions like mysql_query, or mysqli_query or even PDO methods like query and execute. Additionally, searches for SQL keywords like SELECT, UPDATE, INSERT, DELETE and CALL will help find other places where queries may have been built in a different place from the code that runs them. When you’ve found the queries in your code, look at how they are built. Some queries may have no variable portion of the query, meaning the query never changes based on any variable. Queries like SELECT site_title FROM configuration;, for instance, are not vulnerable to SQL injection.

If you find PHP variables in the SQL string, such as:

$query = "SELECT $field FROM $table WHERE $whereField = '$whereValue'"

Then your query might be vulnerable to SQL injection. In order to ensure that it’s not, you’d need to track back to the origin of the $field, $table, $whereField and $whereValue variables to determine where they came from. If any of them traces back to any user supplied value, then the query is definitely SQLi vulnerable. If all of the variables have been properly filtered and escaped before using them, then you’re probably ok. Probably. If the variables are not properly filtered and escaped then you should consider the query to be vulnerable even if you cannot easily determine a way to compromise it.

Hacking Your Own Sites

Once you’ve found a query or a whole slew of queries which are potentially vulnerable, it can be enlightening (if not a bit scary) to develop a way to break it. Easy queries to target would be those that involve logging in, authentication, or searching. Targeting APIs is another way to quickly get data from the outside into your vulnerable queries. Suppose you’ve found that authentication involves the following bit of code.

public function authenticate($username, $password)
{
	$password = md5($password);
	$query = "SELECT * FROM users WHERE username='$username' AND password='$password';
	$result = $this->db->query($query);
	if ($result->getNumRows() == 0) {
		return false;
	}
	$_SESSION['user'] = $result->fetch();
	return $result;
}

Now, beyond SQL injection there’s plenty wrong with this method. However, the md5 call on password, while being a terrible, horrible, no-good, very bad idea, does mean that we know that $password is not SQLi vulnerable. But, we don’t have any idea where $username comes from. You could jump around in the code and find all the places from which the authenticate method is called, (assuming that’s even possible, it may not be) and make sure that all of them escape $username properly before passing it in. Assuming you’re able to do this, it will still never ensure that some future use of this method will not call into it without escaping $username.

In order to ensure that $username is dealt with properly in all cases, we need to handle it within the authenticate method. It should not be the responsibility of the caller of our method to know how to deal with $username and prepare it. Before we look at how to fix this query, let’s look at how to prove it’s injectable.

Suppose there’s a login form which will receive form values username and password. The form POSTs to a script which will handle creation of the object that contains the authenticate method. It calls into that method, passing in $_POST['username'] and $_POST['password']. In this example, our login is definitely vulnerable.

We’ve already established that the $password parameter is not vulnerable. So if you put anything into the password field, and pass in ' OR 1=1; -- for the username, the query that will be run against the database is:

SELECT * FROM users WHERE username='' OR 1=1; -- ' AND password='5f4dcc3b5aa765d61d8327deb882cf99'

Surprisingly, or perhaps not, the statement above is valid SQL. The SQL engine will stop caring about it after the semi colon because the double dash indicates that the rest of the statement is a comment. The query is essentially interpreted as “give me all the fields from the users table where a row has a username which is blank, OR true”. This sounds really weird, but think of it in PHP terms. If we build a statement like this:

if (someFunction() || 1==1) {
	// code in this block will always run
}

We’re essentially doing the same in SQL. Every row in the users table will be returned since 1=1 is always true for SQL. If you follow through the rest of the logic, the authenticate method, which was intended to either find one or zero records, is now dealing with all the records in the users table. The session value will be set to the first returned row, whatever that may be, but in many cases, it will be the fist user in the database. Often times, this is an admin user.

I can’t just leave the terrible authenticate method out there without mentioning a number of things wrong with it, and reiterating that it should never be used by anyone for anything real. We’ll get back to SQLi in just a moment, I promise.

  1. Using md5 for password hashing is highly discouraged. Use bcrypt instead. (Republishing note: use password_hash)
  2. The query itself is vulnerable to SQL injection for the $username parameter. We’ll fix that shortly.
  3. Rather than comparing to 0 results, we could compare to not equal 1. This doesn’t make this method safe on its own, but it means the attacker at least has to put a limit clause in the injected username field instead of the trivial OR 1=1; -- username.
  4. This function modifies session. As a function, it’s doing too much. It should not be using or changing superglobals at all, but it’s doing more than determining that a user is authenticated.
  5. It has more than one return value. It can either return an array or false. This means anything calling it will have to check for the false value.
  6. The returned array is not defined. Instead the method should return an object which could be used as a user object if authenticated or would identify as a non-authenticated user otherwise. Then there are no undefined values, the developer calling the method doesn’t need to look up the database definition to know what to expect or use.

There are probably more things wrong, but 6 major problems for seven lines of code is a good start. Now on to the fixing of the code.

Prepared Statements

To wrap up, I’m going to jump briefly into PDO and prepared statements. PDO provides a unified API for making database queries and a simple way to build and use “prepared statements”. Prepared statements give the developer a way to let the database engine know about the intention of the query. Instead of building the whole SQL string using concatenation or variable replacement, the instruction part of the query is provided completely with the variable parts provided as placeholders. There are limits on where you can put these placeholders as well. Since the database is using this provided query to make a plan on how to execute it against the database, you can essentially only provide pieces of the where clause. You will not be able to do things like providing placeholders for field or table names. If those changed, the database engine would have to create a different plan.

With prepared statements, you no longer need to worry about quoting strings, escaping them, or leaving integer fields unquoted. The prepared statement will handle all of that for you. In the method above, the query part becomes:

$query = 'SELECT * FROM users WHERE username = ? and password = ?';

Alternatively, you can also use named placeholders like so:

$query = 'SELECT * FROM users WHERE username = :username and password = :password';

In the first example, variables are “bound” in the order they appear in the query. In the second, they can be bound in any order but must match the name.

Briefly, we can create a PDO connection like so:

$db = new PDO('mysql:host=localhost;dbname=myapp', $dbuser, $dbpass);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

We’ve now got a PDO object which will throw exceptions for errors from queries. Assuming it’s the object that has been injected into our authentication object, we can update our method:

public function authenticate($username, $password)
{
    $password = md5($password);
    $query = "SELECT * FROM users WHERE username=:username AND password=:password;
    $statement = $this->db->prepare($query);
    $statement->bindParam('username', $username);
    $statement->bindParam('password', $password);

    $statement->execute();
    $row = $statement->fetch(PDO::FETCH_ASSOC);
	
    if (!$row) {
        return false;
    }
    $_SESSION['user'] = $row;
    return $row;
}

While the code above still has many of the problems I outlined above, it’s no longer vulnerable to SQL injection. The database receives the query with placeholders, and then the two calls to bindParam link the named placeholders to the variables that were passed into the method. Next we execute the statement. The execute line is one place where exceptions can happen, so with good code, we’d want to wrap it in a try…catch block and deal with exceptions appropriately. Once we’ve executed the statement, we can fetch a row from it. If there are no rows, we’ll get receive false and we can leave the method indicating the user is not authenticated. Otherwise, we’ll store the row in session (again, probably not good to do in this method) and return the data.

We’ve barely scratched the surface of the capabilities of PDO, and for this article we’ve gone as far as we will. I hope you’ll take some time to look at other ways to use PDO to achieve the goals of your application.

Conclusion

SQL Injection is a serious problem in applications. It allows attackers a rather straight-forward way to cause the database to do something that the application developer didn’t want to happen. It can allow authentication for users who have no business being allowed in an application, it can return data that doesn’t belong to a user or data they are not supposed to see. It can allow the attacker to insert their own data, change existing data, or even delete data, potentially up to dropping entire tables or databases. In short, it’s a serious problem and if you’re unaware of it, chances are your applications may be vulnerable.

I urge you to audit your code (as well as the code of third-party libraries you use) for SQL injection vulnerabilities. Fix these problems by properly escaping your code, or preferably, using tools like prepared statements. Using them consistently is a great way to avoid SQL injection. Thank you for joining me this month, and I’ll see you next time.