# What could go wrong when MySQL strict SQL mode is off?
This article shows some examples of attacks that can abuse MySQL behavior when the strict SQL mode is disabled, especially when string characters are invalid in the current encoding. This happens when the encoding of the application (e.g. UTF-8) is wider than that of the database (e.g. ASCII).
Looking to improve your skills? Discover our **trainings** sessions! Learn more.
## What is MySQL strict SQL mode?
Strict mode[1] controls how MySQL handles invalid or missing values in data-change statements such as `INSERT` or `UPDATE`. The case detailed in this article is when the value is out of range. What happens when a value is invalid for the current encoding?
An over-simplification of the strict mode in this case:
Strict mode On
Strict mode Off
Error
Warning (silent)
Nothing modified
Insert “adjusted” value (“closest values”)
At this point, you guessed what could go wrong.
## Set and check
In fact, there is no “strict mode” that is “on” or “off”. There is a system variable named `sql_mode` that contains an array of values. If that array contains one of `STRICT_TRANS_TABLES` or `STRICT_ALL_TABLES` values or the combination mode `TRADITIONAL`, then MySQL is in “strict mode”.
The documentation for MySQL 9.1[2] claims that the default value of `sql_mode` is the following.
“`
ONLY_FULL_GROUP_BY STRICT_TRANS_TABLES NO_ZERO_IN_DATE NO_ZERO_DATE ERROR_FOR_DIVISION_BY_ZERO NO_ENGINE_SUBSTITUTION
“`
The default value may differ on other versions or alternative backend engines such as MariaDB or Percona Server. The `@@sql_mode` system variable allows to query the current value.
“`
[u]> SELECT @@sql_modeG; @@sql_mode: STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
“`
Installation programs may configure the SQL mode during the installation process. So even if the strict mode is enabled by default, XAMPP, CMS, etc. may silently disable it.
The violent method to disable strict mode (and everything else) is:
“`
SET sql_mode=”;
“`
## Observations
Let’s create a table using ASCII encoding where it is easy to insert invalid values and see what happens when the strict mode is on and off.
“`
— Create a table with a column using a narrow space encoding on purpose CREATE TABLE uni_sandbox ( id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255) CHARACTER SET ascii ); — Strict mode enabled SET sql_mode=’STRICT_ALL_TABLES’; INSERT INTO uni_sandbox (data) VALUES (‘I ♥ Unicode’); — => ERROR 1366 (22007): Incorrect string value: ‘xE2x99xA5 Un…’ for column `unicode8`.`uni_sandbox`.`data` at row 1 — Strict mode disabled SET sql_mode=”; INSERT INTO uni_sandbox (data) VALUES (‘I ♥ Unicode’); SELECT * FROM uni_sandboxG; — => data: I ? Unicode
“`
With strict mode on, an error was triggered, with strict mode off, the data was inserted but `♥` was replaced by `?`. Oracle MySQL or MariaDB documentation says “convert the invalid value to the closest valid value” without explaining exactly how it works. In practice, strings longer than the column size will be truncated, integers rounded to the nearest values, but that is only for values that are overflowing. For values that are invalid (can’t be represented in the encoding), each byte is replaced with a `?`.
## Attack context
Imagine the following context: there is a web application, that is strictlly checking the user input. A generic method commonly used for validation is by using regular expressions (RegExp). Independently of the language, nearly all RegExp engines supports POSIX character classes. But years ago, with the generalization of the Unicode support, many engines extended the range of those classes to work on Unicode too (originally limited to ASCII). So for example, instead of matching alphanumeric characters only on the ASCII range ( `A-Za-z0-9`), it will also match characters from the corresponding Unicode categories like Letter ( `L`) and Number ( `N`) by default.
Take the following example in Ruby with the character Latin Letter Bidental Percussive `ʭ`.
“`
/[[:alnum:]]/.match(‘ʭ’) # => #
“`
Some other languages like JavaScript, have also implemented non-POSIX character class selectors by implementing Unicode properties and categories selectors ( `p{…}`) where you can match Unicode properties directly (e.g. `p{Ll}` or `Lowercase_Letter` for lowercase letters) or alias properties[3] (e.g. `p{Alpha}` for Alphabetic that matches Letter and Letter Numbers).
“`
“ʭ”.match(/p{Alpha}/u) // Array [ “ʭ” ]
“`
On the other hand, if the database is a using a “narrow space” encoding such ASCII, CP-1252 or even a legacy partial implementations such as utf8mb3 in combination with the strict mode disabled (e.g. CMS installation does it automatically), that would lead to some Unicode characters passing the security check but, being invalid on the database encoding, they would end up replaced with `?`.
In this context, there are a few attack scenarios and security bypasses that could work in real life that come to mind.
## Weaponizing MySQL strict SQL mode fallback
### Summary of the context
All the attacks presented in this section will be placed under the following context: the application has security checks allowing only AlphaNumeric characters including Unicode ones but the MySQL database uses an ASCII encoding or similar where Unicode characters are invalid and with strict SQL mode disabled.
### Shell globbing
Bash does not support Regular Expressions (RegExp) but can still carry out filename expension, which is known as globbing. Bash will then expand characters known as wild cards. The most famous one is `*` matching any string of any length, but there is also `?` that matches exactly one single character (different from the `?` RegExp quantifier). For example, at the root of a Unix file system `???t` will match both `boot` and `root` directories.
“`
$ ls -d /???t /boot /root
“`
So imagine you have a local file disclosure (LFD) thanks to an Insecure direct object references (IDOR) but the files are renamed with an UUIDv4 (e.g. `2a0f6947-bd44-449e-94fe-82ebc3ecf115.txt`). Technically you can read the files of all other users but in practice you cannot because it is not possible to brute-force the identifier.
But now, what if you ask to read `ʭʭʭʭʭʭʭʭ-ʭʭʭʭ-ʭʭʭʭ-ʭʭʭʭ-ʭʭʭʭʭʭʭʭʭʭʭʭ.txt`? It would allow you to list all files, because `ʭ` (Latin Letter Bidental Percussive), as well as hundread of thousand of other characters, is a Unicode character of Letter type that would pass the alphanumeric security check. After that, when stored in MySQL, it would not be recognized as a valid ASCII character, so it would fall back to `?` because of the strict SQL mode being disabled. Then, in Bash, `?` would be expanded as any single character. Thanks to that a `find` command would list all files in the directory instead of just one, bypassing the application security check as well as the use of UUID identifier.
### Regular expression quantifier
For RegExps, `?` is a quantifier character, it is appended after an expression to say there must be zero or one occurence.
For example, the RegExp `filenamed?.txt` will match any of the filename below with no or one digit but not `filename10.txt` because there are two digits.
“`
filename.txt filename1.txt … filename9.txt
“`
Now imagine a Python application that returns files based on a user input like follows.
“`
import re username = User.username # user input fetched from DB check(username, lib.alphanum) # some Unicode Alphanum check re.findall(f’confidential-{username}.pdf’, ‘list-files-fetched-fromFS-or-DB’, re.IGNORECASE)
“`
So similarly to before, registering a username of `aʭbʭ…yʭzʭ0ʭ1…8ʭ9ʭ` would be transformed to `a?b?…y?z?0?1?…8?9?` which would allow matching any ASCII letter and number once interpolated into a RegExp. If targeting a pattern that is five character long, then this pattern must be repeated 5 times. The payload would be very long and inefficient, but it will work.
“`
import re import string payload = “”.join(map(lambda i: i + ‘?’, string.ascii_lowercase + string.digits)) * 5 re.findall(f’confidential-{payload}.pdf’, ‘confidential-noraj.pdf’, re.IGNORECASE) # => [‘confidential-noraj.pdf’]
“`
Of course, it would be easier in a scenario where you are allowed to use hyphens and brackets, and where you just need to bypass the `?` restriction.
### Query parameter
The ability to inject a `?` in a context where the user is supposed to only be able to write alphanumeric characters could also allow injecting a query parameter in a URL.
For example, if a web application is internally crafting a URL based on the user’s name or any filtered user input stored in database, then the user could be able to add a query parameter like `debug` or `admin` that would give them unauthorized access or sensitive information. The partial Python example below could give an idea of what it could look like.
“`
from flask import Flask, redirect, url_for import urllib.parse app = Flask(__name__) @app.route(‘/data/’) @internal # Internal route to get user data by username def profile_name(username): # get some data from FS based on username data = “user_secret” if request.args.get(“debug”) != None else “user_public_stuff” return data @app.route(‘/profile/name/’) @internal # Internal route to convert userid to username def profile_data(userid): user = User(userid).username # instanciate user class by fetching data from DB return redirect(urllib.parse.unquote(url_for(‘profile_name’, username=user))) @app.route(‘/api/public/profile/’) @auth # Public route to get profile info def api_profile(userid): return redirect(url_for(‘profile_data’, userid=userid)) @app.route(‘/api/public/profile/name/’) @auth # Public route to set user’s name def set_name(userid): username = request.args.get(“name”) check(username, lib.alphanum) # some Unicode Alphanum check User(userid).username = username return True if __name__ == ‘__main__’: app.run(debug=True)
“`
With what was seen earlier, the database would transform a username like `norajʭdebug` into `noraj?debug`. Of course, not being able to inject an `=` will prevent passing a value to the parameter. However, depending on the web framework used or programming language, the application may only check that the parameter is present.
### WAF bypass
Outside the previously set context, of course indirectly injecting a `?` could also help to bypass WAFs. For example, in a database storing file blobs or templates where the attacker has gained write access to, the WAF would block a payload including `<?php `. In that case, sending `<ʭphp` instead could bypass the detection rule and being converted to the original payload right after thanks to strict SQL mode fallback behavior.
### Real life story
A long time ago (in 2012), the planets were aligned to allow crazy bugs. In those prehistoric times, WordPress was disabling the strict SQL mode during the setup. On the other hand, MySQL was chaining the bad design flaws. The only implementation of UTF-8 in MySQL was called `utf8` (which is now called `utf8mb3`) that had the issue of handling only 1 to 3 bytes characters[4] (instead of 4 bytes maximum for valid UTF-8). `utf8mb3` was very misleading as claiming that MySQL was supporting UTF-8 was letting people think that the UTF-8 support was 100% complete, not just a partial implementation. The other dangerous behavior was that when strict SQL mode was disabled, it was not replacing invalid characters with `?` like it is now. Instead, the invalid characters were purely and simply removed as well as the rest of the string! So, any valid UTF-8 4 bytes character was judged as invalid in `utf8mb3` and was triggering a truncation of anything that was following.
The combination of those 3 major flaws was exploited in 2014 by Cedric’s Cruft to obtain stored XSS in WordPress[5] in the core comment feature.
