SQL injections

tbowan
(en français)
Juillet 6th 2020

Spoiler: If you are interested about web application security, SQL injection is the ABC of vulnerabilities. Even if it is known since at least 1998, it is still in the OWASP TOP 10. In fact, it is in the TOP 1…

If there's one constant in application development, it's data persistence. Between two executions of your program, how do you save and restore the data it needs? Corollary: when there's a lot of data, how do you organize them to find fast the right information?

As always in computer science, there are many solutions (more or less adapted), but for once, one solution has solved the problem so well that it has become almost systematic: SQL databases.

To be more precise, we must talk about "Relational database", SQL is the language that permit the interaction with thoses databases. These notions are so closely related that we often refer to them as “SQL database”.

Access where you shoudn't. jplenio @ pixabay

The problem, as always with these specific languages, is that users ~~can~~ will send anything as data and, if the developer isn't careful, hijack the normal execution of his application for their benefit. This is known as SQL injection, and it can do some serious damage.

Fortunately, since we've known about this vulnerability for so long, it's very easy to avoid it, and it's surprising that we can still find them... When you want to execute an SQL query, always use prepared queries and only insert parameters at runtime, and everything will be fine.

Relational databases

Even if you don't need to be an expert in relational databases, there are a few basics I'd like to cover before I get down to the nitty-gritty...

Tables

The key idea with these databases is that we don't store information individually, but that this information forms relations (hence the adjective relational), i.e. groups within which information is associated in order to form a whole that carries more meaning than its components.

Organize well for easy retrieval. j_cadmus @ flickr

Individually, titles, dates and text don't mean anything in particular, but in combination, they can describe blog posts.

To represent those relations, arrays are used (which are called tables). Rows represents items of the relation (e.g. blog articles), columns represents the information forming the relationship (e.g. their title, date, etc.).

For developers, relations can be seen as structures (in C) or objects (like C++, Java, ... but without methods). Lines of the table are individual

Pour les développeurs, les relations peuvent aussi être vues comme des structures (en C) ou encore des objets (en C++, Java, ... mais sans les méthodes). The table rows are the individual objects, and the table is a convenient way of storing and representing all the objects in the class.

Staying with the example of articles, here's what a table might look like, where they're described by their title (title), their date (publication in UNIX timestamp), their content (content, plain text) and an identifier (id), optional but usefull.

id	title	publication	content
1	Welcome	1593691200	Lorem ipsum dolor sit amet, consectetur adipiscing elit.
2	Happy new year	1672531199	Nullam convallis libero ac tellus sagittis congue ut ut ipsum.

For those who don't read UNIX timestamps off the top of their heads, the second article is scheduled for publication on December 31st, 2022 at 23:59:59.

Queries

All interactions with a database involve queries. These are textual commands that the database engine interprets and then executes on its data to provide you with the result.

For young people who have not lived through this period, you have to imagine that in the XX^th century, the pioneers of computer science had this rather crazy hope of being able to communicate naturally with machines. The languages and formats invented at the time were therefore generally very verbose.

So to create the table of previous articles, we would send the following request to our favourite server:

CREATE TABLE articles (
    id          int         AUTO_INCREMENT,
    title       VARCHAR(70) NOT NULL,
    publication int         NOT NULL,
    content     TEXT        NOT NULL,
    PRIMARY KEY(id)
) ;

Once the table has been created, it still needs to be populated, again via a specific query:

insert into articles (title, publication, content) VALUES
    (
        'Welcome',
        1593691200,
        'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
    ) ,
    (
        'Happy new year',
        1672531199,
        'Nullam convallis libero ac tellus sagittis congue ut ut ipsum.'
    ) ;

To retrieve a particular article, another query would be used:

SELECT * FROM articles WHERE title = 'Welcome' ;

There are, of course, other queries for modifying (UPDATE) or deleting (delete) data and tables (ALTER to modify them, DROP to delete them). You get the general idea...

In applications

When your application handles data or database, it must build a query then send it to the database using the appropriate function. Then, when the execution is completed, the result of the query is return to your application, and you can continue your operations.

Send, then wait for a response. Atlantios @ pixabay

Each databases language and technology has it own habits, but by and large, it's look alike. To keep thing simple, the following exemple are in PHP using a SQLite database.

To continue with the blog exemple, imagine your web application show the content of an article (which identifier id is a URL parameter). After connecting, you must build a select query (select), then execute and print the result.

Here's the kind of classic code you're likely to come across, which, as we'll see, has an SQL injection vulnerability...

// 1. connect the database
$pdo    = new PDO("sqlite:/var/www/mabase.sqlite") ;

// 2. build SQL query
$query  = "select * from articles where "
       .= "id = '" . $_GET["id"] . "' and "
       .= "publication < strftime('%s', 'now')"
    ;

// 3. send the query and get the response
$result = $pdo->query($query) ;
$row    = $result->fetch() ;

// 4. print the content
if ($row !== false && ) {
    echo "<h1>" . $row["title"] . "</h1>\n" ;
    echo "<p>Published : " . date("d/m/Y H:i:s", $row["publication"]) . "</p>\n" ;
    echo $row["content"] . "\n" ;
} else {
    echo "Not Found\n" ;
}

An honest user will pass normal values to your script, which in turn will supply the expected values. On the command line, here's the kind of thing you might get:

tbowan@nop:~$ curl "http://localhost?id=1"
<h1>Welcome</h1>
<p>Published : 02/07/2020 10:00:00</p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit.

And if you want to see the articles in advance, the SQL query takes care of the filtering and you get an error:

tbowan@nop:~$ curl "http://localhost?id=2"
Not Found

SQL injection

You see us coming... Just because you're supposed to put an integer in the parameter doesn't mean you can't put anything else 😉. And if we choose carefully, we can then inject our own SQL commands and hijack legitimate execution with our own instructions.

If you'd like to play with this vulnerability, I recommend Natas challenge #14 or even installing DVWA.

Ignore the rest of the query

By inserting the comment character (-- in most cases and # with mysql). Everything that follows in the query will then be ignored.

If the value inserted in the query is surrounded by single (or double) quotation marks, we must also inject a single (or double) quotation mark to simulate the end of the string and allow the engine to interpret our hyphens.

In our case, if a user sends the value 2'-- for the id parameter, this value will be inserted as is in the query constructed in PHP as follows:

select * from articles where id = '2'--' and publication < strftime('%s', 'now')

The past publication date condition is then ignored by the database, so filtering is no longer applied and all articles can be read.

tbowan@nop:~$ curl "http://localhost?id=2%27--"
<h1>Happy new year</h1>
<p>Published : 31/12/2022 23:59:59</p>
Nullam convallis libero ac tellus sagittis congue ut ut ipsum.

Read data

By inserting larger query fragments and using UNION, it is also possible to retrieve data from other tables...

See more than allowed. Coernl @ pixabay)

Assuming that another table (users) stores users with their pseudonym (username) and password (password), we could insert a more suitable fragment and construct the following query:

Indentation is added for readability, but in real life, everything will fit on a single line.

select * from articles where id = '-1'
union select
    id,
    username as title,
    0 as publication,
    password as content
from users
where username = "tbowan"
--' and publication < strftime('%s', 'now')

We use a non-existent identifier (-1) to not select any article, then we add users (union) after taking care to adapt the columns. In reality, you have to urlencode the fragment so that it corresponds to a valid URL, but that's not too complicated...

To make the request readable on small screens, I had to add line feeds...

tbowan@nop:~$ curl "http://localhost?id=-1%27"\
"%20union%20select"\
"%20id%2C"\
"%20username%20as%20title%2C"\
"%200%20as%20publication%2C"\
"%20password%20as%20content"\
"%20from%20users"\
"%20where%20username%20%3D%20%22tbowan%22"\
"%20--"
<h1>tbowan</h1>
<p>Published : 01/01/1970 00:00:00</p>
$2y$10$Yoynw3upeUSzt4A3ouRt1.
V/dAp62uHyhRB2c4e5e2Ad1KIh2b4We

Good thing I securely stored the password....

But also…

Sometimes, the application and/or database is configured to accept queries in series. By separating them with a semicolon, you can then insert all possible SQL queries and have read and write access to all data.

You can also destroy everything. stux @ pixabay

If this sequence is not possible, you will need to find entry points in the application that use corresponding SQL queries and are vulnerable to injection. For example, inserting, editing or deleting an item.

Protections

Fortunately, we now have a whole arsenal of solutions for easily avoiding these injections.

Filtring, that's a start

When handling data from users, it's never a good idea to first validate it. In fact, it should already be a reflex to filter all inputs according to the expected type.

It's tempting to make your own routines, but often languages already have everything you need. In PHP, you have filter_var() and validation filters which check that the data supplied is in the correct format.

The code could be modified to insert a validation before the query is generated.

// 2.1. Filter user data
$id = filter_var($_GET["id"], FILTER_VALIDATE_INT) ;
if ($id === false) {
    echo "Well tried but no." ;
    exit(1) ;
}

// 2.2 build the SQL query
$query  = "select * from articles where "
       .= "id = $id and "
       .= "publication < strftime('%s', 'now')"
    ;

But when about securing your applications, you shouldn't rely solely on filtering. There are two main reasons why:

Filters are only available for simple types (integer, float, domain name, URL, IP address,...). If your data is more complex, or if you need to accept arbitrary text, they will be useless,
They need to be systematically called up, and for maintenance, this is a source of omissions and problems, as no analysis tool will be able to detect them.

Escaping is good

Since we need to be able to handle textual data (or filter omissions), we need to escape escape characters. Once again, each language has its own methods.

Historically, we've used addslashes() for a long time, which escapes characters that should be escaped. This was automatically applied to input data (get, post and cookies) but it depended on the configuration... By default before PHP 5.4.0 and then only if you activated it...

In addition to the problems of double escaping (when you think it hasn't been escaped yet) or forgetting to escape (when you think it has already been escaped), this function doesn't handle character encodings.

Most of the time, you can rely on the database access functions library, which contains a function that knows how to properly escape your data, taking all the subtleties into account.

In PHP, I recommend the method PDO::quote() which, as well as escaping the characters, will add single quotes around our value. Our example could then be modified to specify the connection encoding and then escape the parameter as follows:

// 1. Connexion to the database
$pdo    = new PDO("sqlite:/var/www/mabase.sqlite", "charset=UTF8") ;

// 2 SQL Query
$query  = "select * from articles where "
       .= "id = " . $pdo->quote($_GET["id"]) . " and "
       .= "publication < strftime('%s', 'now')"
    ;

This time, it's no longer possible to inject SQL fragments but, as with any manual solution, you'll need to be very careful not to forget a call, as you can't check it automatically.

Preparing is better

Obviously, the best thing would be to use an adapter which, by construction, secures SQL queries against the injection of fragments into your parameters. Something like for command injection?

Protect in advance. Wokandapix @ pixabay

Good thing it already exists: prepared statements. You create a textual SQL query and, each time you need to insert a parameter, you mention it. The parameters will only be passed when the query is executed, and will be injected cleanly in the locations you specify.

Another advantage of this type of query is that by sending them in advance, your driver can take advantage of the available request caches which, for queries executed several times (even with different parameters) can save you time.

As always, each language has its little habit. For PHP, it's the PDO::prepare() method that creates the query that you can then execute. In our example, you need to specify a parameter in the query, then prepare it before executing it with the value of the parameter.

You can specify the location of the parameters with a question mark ? but I prefer to use the colon followed by a name :id which allows me to name them and often makes the code more readable.

// 2. SQL query
$query   = "select * from articles where "
        .= "id = :id and "
        .= "publication < strftime('%s', 'now')"
    ;

// 3. Send the query and get the result
$request = $pdo->prepare($query) ;
$request->execute([ "id" => $_GET["id"] ]) ;
$row    = $request->fetch() ;

This time, not only we protect against injections, it's detectable by code analysis tools (basically, any call to query() becomes suspicious) and more, we take advantage of optimisation to save execution time. Isn't life grand?

And now ?

You are now able to detect these vulnerabilities and correct them, but that's only the beginning of the journey if you're passionate about these injections... While you wait for us to write the rest, you can wait with these two other web vulnerabilities :

Avoid Command Injection in PHP: January 13rd 2020 Over time, we all start issuing commands from our web applications. The problem is when users provide parameters, you have to be particularly careful to avoid command injections that would hijack your application. Fortunately, the solution is easy to implement.
Object serialization and injection in PHP: April 2nd 2020 One day, we invented the somewhat crazy idea of transferring objects from one application to another. Subsequently, we realized that this feature opened up great attack possibilities for those who can inject their own content. Eventually, we realized that we should never use serialization.