13 minutes
Leveling Up: Databasics With PDO
This article was originally published in PHP|Architect magazine in the August 2015 issue. These articles are all copyright by David Stockton. You may be able to purchase the issue here.
Requirements:
- PHP with PDO (Pretty much since 5.1)
- A database
If you’ve built anything useful or interesting with PHP, and chances are that you have, then you’ve probably used a database. PHP makes interacting with databases easy and PDO makes it even easier.
Introduction
I hope that most people reading this far have already heard of (and are using) PDO when interacting with databases. I am surprised every time I ask the question at a conference or user group, “How many people use PDO?” and the answer comes back as a rather low number. I wanted to write this article in the hopes that it will convert those of you who aren’t using PDO to start using it.
What is PDO?
PDO stands for PHP Data Objects. It’s not new either. It’s been around since version 5.1 &emdash; practically ancient in the PHP world! PDO provides a common API for interacting with a variety of databases. Rather than needing to know a bunch of database specific function calls like mysqli_connect
, oci_fetch
, mssql_execute
, etc, all of which vary slightly based on what database you’re connecting to and what features are exposed, you can learn the PDO API and then you’ll know how to use them with your databases.
An often cited reason for using PDO (or another database abstraction layer) is the ease of switching from one database engine to another. For instance, moving from MySQL to PostgreSQL or MSSQL. In reality though, a database migration is a really big deal. It’s only going to be marginally easier to migrate from one database to another if you’re using PDO versus native drivers. A database migration is almost certainly going to involve changing actual queries unless you’re using just the most basic functionality and minimal syntax. Nevertheless, learning PDO’s API and using PDO to interact with the database means that connecting to additional databases or working on another application will be easier regardless of what type of database you’re using.
Currently at my employer, we have several applications that use by MySQL as well as PostgreSQL. In my previous position, the application I worked on connected to and used data from MySQL, PostgreSQL, Oracle and MS SQL Server. Since all of these connections utilized PDO, interacting with the various data sources was consistent and simple.
Supported Databases
PDO supports a lot of databases, including many that you’re not likely to ever need. But since they are supported and, if you should have the need to use them, PDO will be there for you. Currently supported databases include the following:
- CUBRID
- MS SQL Server (via DBLIB and SQLSRV drivers)
- Firebird
- IBM
- Informix
- MySQL
- Oracle
- ODBC and DB2
- PostgreSQL
- SQLite
- 4D
How to Connect with PDO
In order to establish a connection with a database, PDO requires what’s called a DSN. DSN means “Data Source Name” and it provides the information PDO needs in order to establish the connection and make the native calls behind the scenes to run queries. Typically, the DSN will include the DSN prefix (usually identifying what kind of a database you’re connecting to), the host, port, name of the database, etc. You can also provide information like the unix socket (if you’re connecting to MySQL through a socket rather than a host) and the character set. Let’s take a look at some examples for connecting to a couple of different databases:
MySQL:
mysql:host=localhost;dbname=test;port=3307
PostgreSQL:
pgsql:dbname=example;host=localhost;port=5432
Of course there are DSNs for the other databases supported by PDO as well, including ODBC. If you’re running on a platform with ODBC drivers, you can connect to just about any SQL database.
Once you have a DSN you can use it (along with the username and password for the database) to establish a PDO object:
$db = new PDO('mysql:host=localhost;dbname=myapp_db;port=3306', $username, $password);
Most likely you’ll want to store your database paramaters in a configuration file of some kind, and then build the DSN from that so you can use the same code in development, QA, production, and any other environments you use.
Configuring the PDO Object
After creating the PDO object, chances are you’re going to want to configure its behavior just a bit. One of the most frustrating things I’ve run into is establishing a connection, running a query and then nothing happens. PDO, by default, is silent when it comes errors. It will set an error code, which you can retrieve, but then you need to remember to do as such, or your code may just fail and you’ll have no idea why. You can also configure PDO to emit warnings (why?) or throw exceptions on errors &emdash; which is my personal preference. You can configure this behavior with the setAttribute method like so:
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
Which upon an error, PDO will throw an exception you can catch and hopefully deal with if something goes wrong.
If the connection to the database outright fails, the constructor will throw a PDOException, so catching that when establishing the connection would be a good idea as well.
Running Queries with PDO
Running queries is simple with PDO. The simplest way is to utilize the query method of the PDO object. To do this, you pass it the SQL statement you wish to execute. It will return a PDOStatement object which you can use to deal with the results of the query.
$result = $db->query("SELECT name, description, quantity FROM store_inventory WHERE category = 'electronics'");
foreach ($result as $row) {
echo "$row['name'] - $row['description']: $row['quantity']\n";
}
In this example, there was no user provided data, so query is completely fine to use. In a bit, we’ll talk about how to deal with user input that you want to utilize in the query. For now though, let’s take a look at some of the other features PDO provides for dealing with queries that return data.
Column Binding
In the example above, $row was an associative array that we needed to provide the column names in order to use the values from each row. PDO provides the ability to bind a variable to a column, so that as you iterate over a result set, it will automatically update the value of the variables you bind. Let’s take a look.
$result = $db->query("SELECT name, description, quantity FROM store_inventory WHERE category = 'electronics'");
$result->bindColumn('name', $name);
$result->bindColumn('description', $description);
$result->bindColumn('quantity', $qty);
foreach ($result as $row) {
echo "$name - $description: $qty\n";
}
We’ve essentially go the same output in both examples, but in the second we’re not dealing with any of the pesky and redundant $row
references.
SQL Injection
Enough of the basic basics for now though. The main reason I like PDO, and encourage as many PHP developers as I can to use it, is that it makes it very simple to avoid a vulnerability known as SQL injection. For anyone not familiar with this, it’s a type of attack against a website which can result in running SQL of an attacker’s choosing. This can result in anything from being able to login as arbitrary users without needing to know their passwords, to seeing data that the user shouldn’t be able to see, inserting or changing arbitrary data, deleting arbitrary data, or even dropping the entire database. Needless to say, it’s best to avoid it.
SQL injection happens when user-provided data makes its way into the instruction part, when it should have been part of the data. You may be wondering what I mean by this.
Let’s take a look at a simple query that has user-provided-data. Suppose you have a site that allows the user to search for products for sale. A theoretical query to do this could look like:
SELECT * FROM products WHERE description LIKE '%elephpant%'
In this example, the user is searching for elephpant
and that value is inserted into the WHERE clause. The instruction part of the query tells the database to return all columns from a table called products. It should limit those products to those that contain ‘elephpant’ in the description. The ‘elephpant’ part is “data”, not instruction. It’s something provided by the user that will be used to let the database engine return a set of results that matches what we’re looking for.
You can imagine in PHP, the query may be built by something like this code:
$query = "SELECT * FROM products WHERE description LIKE '%$_REQUEST['description']%'";
If you have any queries in your system that look like this, your site or application is vulnerable to SQL Injection. The value in $_REQUEST['description']
is user provided.
If the user data is not properly escaped, the “data” part of the query can “jump out” into the instruction part. In the search for ‘elephpant’, we don’t have a problem. But imagine that this same code is run and the user searches for something containing a single quote character like “jacob’s ladder”. There’s nothing malicious going on, but the query that the code above would try to execute would be:
SELECT * FROM products WHERE description LIKE '%jacob's ladder%''
This will result in an error, because the database doesn’t know what “s ladder” means as far as instruction goes. If your site isn’t properly handling malformed SQL errors, you may have given the attacker the information they need to expand their attack. Imagine instead of searching for science experiments, someone searches for this: ' OR 1=1; DELETE FROM products; --
In this case, the SQL statement turns into two statements &emdash; one which will give back all products, followed by a second command which deletes every product from the database. There’s all sorts of options an attacker has if your query is open to SQL injection. Now depending on your database, drivers and other options, the above SQL injection may not delete everything. Some databases will only run a single statement at once. But there’s still plenty to be done. If the database supports subqueries or the UNION statement, the query could be modified to include other data besides product information, for instance usernames and passwords, transaction data, etc. The sky is really the limit.
Of course, it’s a bad idea to build queries by string concatenation with user provided data. If you’re using the mysql (please stop, this extension needs to die) or mysqli, you’ve probably (hopefully) used mysqli_real_escape_string
before putting user-provided-data into a SQL string. This takes care of issues like single quotes and other special characters that can make user-provided-data be perceived as instructions to the database. There’s an easier way to deal with all of it though.
Prepared Statements
PDO makes it very simple to create and execute what’s called a “prepared statement”. Using prepared statements allows the database to plan out how it’s going to execute a query ahead of time without data. (The data can be provided later) By separating the query and the data, there’s no chance for the database to see user-provided-data as instruction. Let’s take a look:
$query = "SELECT * FROM products WHERE name=:name";
$statement = $db->prepare($query);
At this point, $statement
contains a reference to a planned out and ready to execute query that just needs a value for :name to be provided. The :name
part is a placeholder in the statement that will be provided by the user through filling out a form. In order to use it, we need to bind a variable or provide it during execution. The following two examples will result in the same result:
$statement->bindValue('name', $_REQUEST['name']);
$statement->execute();
Or, alternatively, this way works as well:
$statement->execute(['name' => $_REQUEST['name']);
If you’re planning on running the same query multiple times, prepared statements are great. By allowing the database to plan out the statement and hold onto that plan, it doesn’t have to repeat work. Let’s take a look at a different way to bind that makes this especially nice:
$query = 'INSERT INTO fruit (name) VALUES (:name)';
$statement = $db->prepareStatement($query);
$statement->bindParam('name', $name);
Notice there’s a subtle difference. Instead of using bindValue
, which allows the use of explicit strings or constants (or the value of a variable at a point in time), bindParam uses the value of $name
as it changes. So we can run the code above, and then loop over something and call execute repeatedly for different values:
$fruits = ['apple', 'banana', 'pear'];
foreach ($fruits as $name) {
$statement->execute();
}
This code will loop over the $fruits
array and insert each one in turn through a statement that was previously planned and prepared once. It will execute more quickly than building and executing the individual queries for each one of the fruit names.
Imagine the same sort of pattern for importing a CSV file with a number of different fields into the database. You could create the query, prepare the statement, and then loop over the CSV file with a function like fgetcsv
, binding the values in a resulting row and running your prepared insert statement. This will go pretty quickly, but we can speed things up even further.
Transactions
SQL databases typically provide a way to execute something called a “transaction”. A transaction can be seen as an “all or none” sort of operation. To quickly understand what this means, imagine a database that represented bank accounts. In order to transfer money from one account to another, we need to subtract money from one account and add money to another account. If these are separate operations (probably a couple of update statements) and the script dies in the middle, say after the subtraction operation occurs, then the first account no longer has the money that was being transferred out, but the second account doesn’t have that same money deposited. The money has essentially disappeared.
Transactions allow us to wrap multiple operations and tell the database that it needs to do all of them or else it should undo everything it has done. This means if the script died before the money was added, then the original account would not reflect that any money had been withdrawn. As far as the database is concerned, it would look like the transfer hadn’t even happened.
Now back to the previous example of inserting a bunch of rows from a CSV file. Transactions can help speed things up. Instead of inserting each row one at a time, we can start a transaction, insert 1000 rows or so, then commit that in one shot and continue to do so until the file has been processed. I’ve seen massive speed increases on this sort of operation. Here’s what that code could look like:
$file = fopen('somefile.csv', 'r');
// prepare statement here
// make calls to bind variables into statement
$db->beginTransaction();
$count = 1;
foreach (fgetcsv($file) as $row) {
$statement->execute();
$count++;
if ($count % 1000 == 0) {
$db->commit();
$db->beginTransaction();
}
}
// Get the last < 1000 inserts committed
$db->commit();
This speedup occurs because instead of needing to guarantee that each individual row is written at the time execute() is called, we’re telling the database it only needs to do this every 1000 rows. It can do that operation pretty quickly.
Conclusion
PDO is a powerful database abstraction that allows you to quickly and easily interact with a multitude of databases in a very similar way. We can create prepared statements, which help to avoid SQL injection with almost no additional work on your part. We can use transactions to ensure that even if our script dies or is killed before it’s done, the database will be in a consistent state. We can also use transactions to speed up large imports by reducing the number of times the database needs to write to disk and confirm that the data has been written.
I do want to mention one thing in regards to SQL injection and prepared statements. It is not correct to say that using prepared statements will completely prevent SQL injection. For standard SQL with prepared statements, SQL injection is completely avoided. However, imagine that the SQL statement is a call to a stored procedure which builds a SQL string through concatenation of provided parameters. In this case, the parameters passed in could still result in a SQL string that causes a SQL injection.
Many, if not most PHP frameworks, are using PDO under the hood. So if you’re building using a framework, you might be using PDO already and not know it. However, if your database calls include the name of the database in a function call, like mysqli_connect
or pgsql_query
or anything similar, then I would highly recommend taking a look at PDO and using it. The patterns it allows for and provides means that your application will quickly become more secure, working with prepared statements in PDO will become second nature.
David Stockton is a husband, father and Software Developer. He builds software in Colorado, leading a few teams of software developers creating a very diverse array of web applications. His two daughters, age 11 and 9, are learning to code JavaScript, Python, Scratch, a bit of Java and PHP as well as building electrical circuits, and a 4 year old son who has been seen studying calculus, recursive algorithms and is excelling at annoying his sisters. David is a conference speaker and an active proponent of TDD, APIs and elegant PHP. He's on twitter as @dstockto and can be reached by email at levelingup@davidstockton.com.
2758 Words
2015-08-01 22:00 -0600