Creative Evasion Technique Against Website Firewalls

During one of our recent in-house Capture The Flag (CTF) events, I was playing with the idea of what could be done with Non-Breaking Spaces. I really wanted to win and surely there had to be a way through the existing evasion controls.

This post is going to be a bit code-heavy for most end-users, but if you choose to read you’re bound to find it very insightful.

Setting the Foundation

For those of you that are unfamiliar with programming languages, you have to understand the world of code is very deep and complex. Even most developers don’t fully understand this, hence the various issues we see. This is especially true in some of the world’s most popular extensible Content Management Systems (CMS) like WordPress, Joomla!, Drupal, and so many others.

The issue is not in the core as much as it is in things like plugins, extensions, themes, modules, components, etc… This comes from the lack of understanding of the complexity associated with web languages and encoding.

For instance, let’s look at HyperText Markup Language (HTML). HTML uses special characters such as “quotes” to add formatting to plain text.

Sucuri - HTML Encoding Example

HTML is obviously a very simple language and most developers don’t even recognize it as a real language, but that’s not the point in this post.. :)

What about JavaScript though? JavaScript uses {curly brackets} to define objects and functions.

Sucuri - JavaScript Example

You then have browser technologies like Flash and Java, but for the purpose of this post we’ll focus strictly on plain text.

What this hopefully illustrates is the dynamic nature of today’s web browsing technologies. This complexity is driven by users looking for richer applications and by businesses looking to improve the user experience. Unfortunately, in the process we start to overlook threats.

Plain Text in the Browser

Sometimes it doesn’t take much to demonstrate the type of threats that exist.

For instance, in the recent in-house CTF I decided to prove my point by focusing on plain text. The objective of the CTF was to perform a successful SQL injection attack against one of our honeypots. Naturally the honeypot was hardened, so the normal techniques were out of the question (yes, I checked too).

I started thinking of the various web elements and how they all interact with each other to give us the experiences we enjoy. In the process, I began thinking of plain text, specifically how it displays in our browsers.

Have you heard of ASCII characters? This is what I started thinking about as I pondered plain text. You see, ASCII worked for the English language. However, it was impossible for many other languages to use – because of the 128 character limitation.

Today, we use Unicode to display plain text in a browser:

Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world’s writing systems. Wikipedia

This brought me full circle to how attacks against websites work today.

In many instances the abuse method is the website URL structure. It’s what you type into the browser when you’re navigating to a page. Today’s attacks abuse this structure through the use of characters like :colons: and /slashes/, the same characters used to indicate resources. For this to work, though, these characters have to be encoded before they are placed in the URL (similar to the table provided above).

For example, if you Google :/ it would show up in the search parameters of the URL as %3A%2F
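A minimal sketch of that encoding step, using PHP's built-in rawurlencode() and bin2hex(). The sample inputs are only illustrative:

```php
<?php
// rawurlencode() percent-encodes every byte except letters, digits and "-_.~".
$samples = [':/', ' '];

foreach ($samples as $raw) {
    // bin2hex() shows the underlying bytes next to the encoded form.
    printf("%s => %s\n", bin2hex($raw), rawurlencode($raw));
}
// Prints:
// 3a2f => %3A%2F
// 20 => %20
```

Each percent sign is followed by the hexadecimal value of the byte it replaces, which is why the colon (0x3A) and the slash (0x2F) become %3A and %2F.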

Writing Spaces

With this knowledge, I was still focusing on Plain Text but now also looking at how it’s Encoded in the URL structure. The missing piece was how spaces would be handled.

Spaces, like other characters, need to be encoded as well. Who would think to abuse spaces, right? That just feels wrong, like kicking someone when they are down. It’s just not cool, right?

Reflecting on Spaces

With that understanding, let’s take a quick peek at what spaces do for us. In programming, spaces are very important. Let’s look at what happens when we don’t leverage them.

Since the objective of the CTF was SQLi, I looked at how Structured Query Language (SQL) makes use of a space. Something like the following:

SELECTCOUNT(*)FROM`wp_users`;

Would generate an error like this:

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECTCOUNT(*)FROM`wp_users`' at line 1

Add a little spacing, and voila:

SELECT COUNT(*) FROM `wp_users`;

The query to the database would be recognized as a valid query. Great!

This got me thinking how spaces are handled across the various technologies.

In ASCII, the ordinary space is %20 when it is URL encoded. This also happens to be what most security products look for in their default configuration. What I learned is that this leaves a window of opportunity open, in this case the non-breaking space.
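To make the gap concrete, here is a deliberately naive sketch. Both functions are hypothetical and do not represent any real firewall rule; the sketch also assumes single-byte (Latin-1) input, where the non-breaking space is the lone byte 0xA0:

```php
<?php
// Hypothetical, deliberately naive filter: it only understands the ordinary space (0x20).
function naive_filter_blocks(string $input): bool
{
    return preg_match('/union\x20+select/i', $input) === 1;
}

// Hypothetical hardened variant: treat the 0xA0 byte as a space before matching.
// Assumes single-byte input; in UTF-8 this replacement would split multi-byte characters.
function normalizing_filter_blocks(string $input): bool
{
    $normalized = str_replace("\xA0", ' ', $input);
    return naive_filter_blocks($normalized);
}

$plain   = "1 UNION SELECT 1";
$evasive = "1 UNION\xA0SELECT 1";

var_dump(naive_filter_blocks($plain));          // bool(true)
var_dump(naive_filter_blocks($evasive));        // bool(false)
var_dump(normalizing_filter_blocks($evasive));  // bool(true)
```

Whether the backend then accepts the 0xA0 byte as a separator depends on the database and its character set, so this only illustrates the filtering side of the problem.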

In HTML coding, the non-breaking space is a character entity which can:

  • create white space between words or web page elements
  • stop the browser from breaking a line in the wrong place.

What exactly happens when non-breaking spaces are overlooked?

Non-Breaking Spaces

The non-breaking space is essentially just a space, which isn’t allowed in a URL. It can be represented in different ways. Here are some examples:

HTML Entity: &nbsp;
URL Encoding: %A0
Hexadecimal Notation: \xA0
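As a quick check that these notations refer to the same byte, here is a small sketch using only PHP built-ins. It assumes the ISO-8859-1 (Latin-1) view of the character, which is what the single byte 0xA0 corresponds to:

```php
<?php
// The single byte 0xA0 is the non-breaking space in ISO-8859-1 (Latin-1).
$nbsp = "\xA0";

var_dump(rawurlencode($nbsp) === '%A0');                                     // bool(true)
var_dump(html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1') === $nbsp);  // bool(true)
var_dump(chr(160) === $nbsp);                                                // bool(true)

// Worth noting: in UTF-8 the same character is the two-byte sequence C2 A0.
var_dump(bin2hex(html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8')));        // string(4) "c2a0"
```

Which byte sequence a given application actually receives depends on the encoding it uses, so a filter may need to consider both forms.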

Let’s create a demonstration of how this could be used in an attack scenario, using PHP.

     <?php
     var_dump( "\xA0" );
     var_dump( " " );
     var_dump( "\x20" );

The output of the example above would be:

     string ' ' (length=1)
     string ' ' (length=1)
     string ' ' (length=1)

What this shows is that we have three different ways of making a space. This got me thinking what would happen if we removed the whitespace.

Trim Whitespace

This took me to the trim function in PHP. This should technically remove the whitespace from the start and end of the string.

Here is an example of what it’d look like:

     <?php
     $test = trim( "\xA0" );
     var_dump( $test );
     $test = trim( " " );
     var_dump( $test );
     $test = trim( "\x20" );
     var_dump( $test );

This would output something like:

     string ' ' (length=1)
     string '' (length=0)
     string '' (length=0)

To understand why the first “space” was not removed, we really need to turn to the PHP Documentation on Trim:

This function returns a string with whitespace stripped from the beginning and end of str. Without the second parameter, trim() will strip these characters:

  • " " (ASCII 32 (0x20)), an ordinary space.
  • "\t" (ASCII 9 (0x09)), a tab.
  • "\n" (ASCII 10 (0x0A)), a new line (line feed).
  • "\r" (ASCII 13 (0x0D)), a carriage return.
  • "\0" (ASCII 0 (0x00)), the NUL-byte.
  • "\x0B" (ASCII 11 (0x0B)), a vertical tab.

You will notice \xA0 is not listed there, meaning that if you do not tell trim to remove \xA0, it will not do it for you. In most cases you will not need to worry about this.

In this example, we can see the non-breaking space gets removed from the output. Note that whether \s matches the \xA0 byte depends on the PCRE character tables and locale in use, so confirm the behavior on your own setup:

       $string = preg_replace( '/\s+/', '', "test\xA0ing" );
       var_dump( $string );

The output of the example above is:

       string 'testing' (length=7)

Of course, you can still use trim, provided that you tell it to trim \xA0:

      $string = trim( "\xA0", " \t\n\r\0\x0B\xA0"  );
      var_dump( $string );
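One caveat with the extended character list is that trim() works byte by byte. The sketch below assumes the input might be UTF-8, where "à" is stored as the bytes C3 A0 and the non-breaking space as C2 A0. The helper name trim_utf8_nbsp is hypothetical:

```php
<?php
// Hypothetical helper for UTF-8 input: with the /u modifier the pattern works on
// code points, so U+00A0 is removed without touching bytes inside other characters.
function trim_utf8_nbsp(string $value): string
{
    $result = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $value);
    // preg_replace() returns null on invalid UTF-8; fall back to the input unchanged.
    return $result ?? $value;
}

$chars = " \t\n\r\0\x0B\xA0";

// Single-byte (Latin-1) input: adding \xA0 to trim() works as expected.
var_dump(trim("\xA0admin\xA0", $chars));                    // string(5) "admin"

// UTF-8 input: the trailing A0 byte of "à" is stripped, leaving a broken character.
var_dump(bin2hex(trim("voil\xC3\xA0", $chars)));            // string(10) "766f696cc3"

// The code-point-aware variant leaves "à" intact and still strips U+00A0.
var_dump(bin2hex(trim_utf8_nbsp("voil\xC3\xA0\xC2\xA0")));  // string(12) "766f696cc3a0"
```

Which variant is appropriate depends on the encoding your application actually receives. The byte-wise trim() shown earlier is fine for single-byte data, while UTF-8 data is safer with a code-point-aware approach.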

The Impacts of the Space

What I can say is that I went home with a prize.. :)

As for whether you as a developer or website owner find it a credible threat, I suppose only time will tell. What I can say, though, is that based on our research it appears that very few developers account for the lonely space, and that’s something we strongly encourage everyone to look at.

Spaces are crucial to a number of actions like SQL queries. Some ingenuity can lead to very successful SQL injection attacks, something that even the biggest Website Firewalls today are failing at.

For example, if you visited a page similar to:

The value of $_GET['var'] would be:

string '\xA0' (length=4)

Taking a look at the web server’s access log, we would see that an extra backslash is added:

"GET /index.php?var=\\xA0 HTTP/1.1" 200 -

This prevents \xA0 from being converted to a non-breaking space. If someone were to write a script that sends the hexadecimal notation, however, it would be converted to a non-breaking space by the time it reaches its destination.

So, if we sent an HTTP request in PHP like this:

     $data = file_get_contents( "" );
     var_dump( $data );

The notation would be converted to a non-breaking space, and the access log would show something like the following:

"GET /index.php?var=\xa0 HTTP/1.0" 200 -

See the problem and potential impact?
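The difference between the two requests comes down to four literal characters versus one raw byte. A minimal sketch of that distinction in PHP, with illustrative variable names:

```php
<?php
// Single quotes: PHP keeps the four literal characters \, x, A, 0.
$literal = '\xA0';
// Double quotes: PHP interprets the escape and produces the single byte 0xA0.
$byte = "\xA0";

var_dump(strlen($literal));        // int(4)
var_dump(strlen($byte));           // int(1)
var_dump(rawurlencode($literal));  // string(6) "%5CxA0"
var_dump(rawurlencode($byte));     // string(3) "%A0"
```

Typing the notation into a browser corresponds to the first case, while a script that builds the request with an interpreted escape corresponds to the second.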

It’s a lot of fun for our team to do these challenges. Also, the attention to detail and commitment to investigating potential attack vectors is what makes our Website Firewall so awesome.
