XSS – Security Acronyms explained

Published September 7, 2017, updated December 4, 2017, by Daniel Hepper. Categories: Django, JavaScript, Security, Web Development

This article is part of a series on security acronyms every Django developer should know.

What the XSS?

XSS stands for Cross-Site Scripting. Cross-Site Scripting is a code injection technique. Through carefully crafted input, an attacker injects code, usually JavaScript, into a website. This code is then run by the victim’s browser and can basically do anything on the page that a legitimate user can do.

Why is XSS a problem?

An important concept in web security is the Same Origin Policy (SOP). According to this policy, a script on one web page can only access data on another page if both pages have the same origin. Two pages have the same origin if their protocol, host and port are identical.
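To make the "protocol, host and port" rule concrete, here is a minimal Python sketch of the comparison. It is a simplification under stated assumptions: the helper names `origin` and `same_origin` and the `bank.example` URLs are hypothetical, only `http` and `https` default ports are filled in, and real browsers handle extra cases (opaque origins, for instance) that are ignored here.

```python
from urllib.parse import urlsplit

# Default ports are implied when a URL does not spell them out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a, b):
    """Two URLs share an origin only if scheme, host and port all match."""
    return origin(a) == origin(b)
```

Under this sketch, `https://bank.example/account` and `https://bank.example:443/transfer` share an origin, while switching to `http`, another host or port 8443 does not.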

What would happen without the Same Origin Policy? Let’s say you are logged into your online banking account. Suddenly, you receive an instant message from a friend: “Hey, you must watch this funny dog video: http://t.co/awoeirusfksl”. Normally, you don’t click any unsolicited links, but because it came from your friend and you are a dog person, you make an exception.

But it wasn’t really your friend who sent you this link. His IM account has been hacked and the website actually belongs to a hacker. So while you watch a cute wiener dog doing somersaults, the malicious web page opens your bank’s website in a hidden iframe. The malicious site runs a script that controls the iframe and executes a wire transfer, emptying your bank account.

If your palms got all sweaty, you can relax because the Same Origin Policy prevents exactly this kind of behavior.

However, when your bank’s website has a Cross-Site Scripting vulnerability, this attack can happen. Because the malicious code is injected into your bank’s website, the browser treats it as having the same origin and gives it full access to all data on the page.

The “Hello World” of XSS is to inject this code snippet:

<script>alert(1)</script>

When successfully injected into a page, this makes the browser open a dialog box displaying “1”.

This looks deceptively harmless and might be dismissed as an annoying prank, but make no mistake: once an attacker manages to exploit an XSS vulnerability, it’s Game Over. Almost anything you can do on the vulnerable site, an attacker can do.

Types of XSS

So how does an actual Cross-Site Scripting attack work? Cross-site scripting can be differentiated into several categories. The difference between these categories is how the malicious code, called the payload, finds its way into the vulnerable website.

Reflected XSS

In a reflected XSS attack, the victim’s browser sends the payload to the server, which then returns it as part of the response.

A common example is a search function, where the search term you entered is sent as a URL parameter and returned as part of the results page. To execute the attack, the attacker tricks you into opening a specially crafted URL.
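A crafted URL of this kind is easy to build. The following sketch assumes a hypothetical vulnerable search page at `https://bank.example/search` that reads the parameter `q`; it shows that percent-encoding only carries the payload through the link, and the server decodes it back into the original markup before reflecting it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

payload = '<script>alert(1)</script>'

# Hypothetical search URL of a vulnerable site; the payload is just a query parameter.
crafted_url = 'https://bank.example/search?' + urlencode({'q': payload})

# The percent-encoding is undone on the server side, so the view receives
# the original markup and, if it does not escape it, reflects it verbatim.
received = parse_qs(urlsplit(crafted_url).query)['q'][0]
```

Here `crafted_url` is `https://bank.example/search?q=%3Cscript%3Ealert%281%29%3C%2Fscript%3E`, which looks harmless enough in a chat message, yet `received` is the exact `<script>` payload again.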

Persistent XSS

In a persistent XSS attack, the payload is stored as part of some user input in the server’s database. As soon as the victim accesses a URL that uses this user input to render the page, the payload is executed.

A persistent XSS vulnerability can be especially harmful because it doesn’t require any action from a user besides visiting the vulnerable page.

An example could be a comment that an attacker posted on your photo on a photo-sharing site. As soon as you open the page containing the comment, the code planted by the attacker gets executed.

Part of the attack might be to post the same malicious comment on all of your friends’ profiles. When they check out the comment that seemingly came from you, it gets posted to their friends’ profiles and thus spreads like wildfire.

An example of this attack in the wild was the Samy worm that made headlines in 2005 by infecting over 1 million MySpace profiles in just 20 hours.

Client-side/DOM-based XSS

In a client-side or DOM-based attack, the attacker exploits a vulnerability in the code that runs on the client-side, most commonly JavaScript. Again, one attack vector could be an attacker tricking you into accessing a specially crafted URL where the URL parameter contains the payload.

Self XSS

Last but not least, there is self-XSS, where the attacker tricks you into hacking yourself, e.g. by entering the payload into the JavaScript console of your web browser.

You wonder why anyone would do this? What if someone tells you about a secret method that unlocks hidden functionality on Facebook? All you have to do is visit facebook.com, open the developer console and paste this blob of gibberish. You say that sounds ridiculous and nobody would fall for this scam?

Well, apparently enough people fall for it that Facebook feels the need to display a warning if you open the development console on facebook.com.

XSS Examples

Let’s look at some really simple examples of what an XSS vulnerability looks like. Let’s assume for a second you just started learning Django, read about Models, URLs and Views, and got really excited to build the next Facebook. You haven’t fully grasped the concept of templates yet, but you know how to concatenate strings, so you start coding.

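# NOTE: THIS IS BAD CODE, FOR DEMONSTRATION ONLY!

from django.contrib.auth.models import User
from django.http import HttpResponse, HttpResponseNotFound

from .models import Comment  # assumes your app defines a Comment model with a text field

def show_user(request):
    username = request.GET.get('username')
    try:
        user = User.objects.get(username=username)
    except User.DoesNotExist:
        # VULNERABLE: the raw query parameter is pasted into the HTML (reflected XSS)
        return HttpResponseNotFound(
            '<html><body>User with username "%s" does not exist</body></html>' % username
        )
    ...

def add_comment(request):
    if request.method == 'POST' and 'comment' in request.POST:
        # Storing the raw input is not a problem by itself...
        Comment.objects.create(text=request.POST['comment'])
    return HttpResponse('ok')

def show_comments(request):
    body = '<html><head></head><body>'
    for comment in Comment.objects.all():
        # VULNERABLE: ...but concatenating it into HTML unescaped is (persistent XSS)
        body += '<p>' + comment.text + '</p>'
    body += '</body></html>'
    return HttpResponse(body)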

Can you spot the XSS vulnerabilities and identify their type?

The view show_user reads the query parameter username and if no user with that username can be found, the username is used verbatim to build an error message.

What happens if show_user gets called with the query string username=<script>alert(1)</script>? There is probably no such user, so this error message would be displayed:

<html><body>User with username "<script>alert(1)</script>" does not exist</body></html>

This is a textbook example of a reflected XSS vulnerability.

add_comment and show_comments together form a persistent XSS vulnerability. add_comment takes data from a POST request and stores it in the database without any processing. This by itself is not an issue, but show_comments pastes this data, unescaped, into a (really simple) HTML page.

XSS protection

So how do you prevent XSS attacks? The underlying principle of Cross-Site Scripting, like any injection attack, is that some kind of user input is unintentionally interpreted as code and executed. There are two ways to prevent that: escaping and sanitizing.

When you sanitize, you try to filter out the potentially troublesome data, either by forbidding certain things (blacklisting) or by only allowing certain things (whitelisting). Whitelisting is usually the more secure approach.

Blacklisting

Let’s say you are building a Twitter clone and want to allow people to post status updates containing some formatting using HTML. Obviously, you don’t want people to post JavaScript, so you filter out script tags with a regular expression.

safe_comment = re.sub(r'</?script.*?>', '', comment) # DON'T DO THIS!

Pretty simple, right? Here are two ways to circumvent this simple filter:

<img src="https://httpbin.org/status/404" onerror="javascript:..."> – contains no script tag at all; the browser runs the onerror handler as soon as the image fails to load.
<scri<script>pt>...</scrip</script>t> – the filter runs only once, so removing the inner <script> and </script> tags reassembles the surrounding fragments into valid tags.
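To see both bypasses in action, here is a small sketch that runs the filter from above on the two payloads, with `alert(1)` standing in for the elided script body. It adds a third payload using an uppercase tag: HTML tag names are case-insensitive, but the regular expression is not. The function and variable names are made up for this illustration.

```python
import re


def naive_filter(comment):
    # The blacklist filter from the text: strip <script> and </script> tags, once.
    return re.sub(r'</?script.*?>', '', comment)  # DON'T DO THIS!


# Bypass 1: no script tag at all, the payload lives in an event handler.
img_payload = '<img src="https://httpbin.org/status/404" onerror="alert(1)">'

# Bypass 2: nested fragments that a single pass of the filter reassembles.
nested_payload = '<scri<script>pt>alert(1)</scrip</script>t>'

# Bypass 3: browsers accept <SCRIPT>, but the case-sensitive regex ignores it.
upper_payload = '<SCRIPT>alert(1)</SCRIPT>'
```

`naive_filter(nested_payload)` returns `<script>alert(1)</script>`, and the other two payloads come out unchanged.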

Of course, these could be filtered out as well, but the point is that blacklisting is a losing battle and should be avoided.

Whitelisting

The alternative approach is to whitelist only valid data.

One example can be found in Django’s URL patterns. When you use regular expressions to capture view parameters, make the regular expression as strict as possible.

Let’s say you have a view profile_view(request, user_id) that takes a user ID as a URL parameter. Here are two ways you could define the corresponding URL pattern:

url(r'^profile/(?P<user_id>.+)/$', profile_view),   # Bad, accepts any character
url(r'^profile/(?P<user_id>\d+)/$', profile_view),  # Good, only accepts digits

The second pattern is better because it is the strictest definition and ensures that only digits ever get passed to the view. The URL /profile/<script>alert(1)<%2Fscript>/ wouldn’t do any harm and would just result in a 404 Not Found.
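The difference between the two patterns can be checked with plain `re`. This is a simplified sketch, not Django's resolver: it assumes, as Django does, that the pattern is searched against the already-decoded request path with the leading slash removed. `captured_user_id` is a hypothetical helper that returns what the view would receive, or `None` where Django would answer with a 404.

```python
import re

# The two patterns from the text.
LOOSE = re.compile(r'^profile/(?P<user_id>.+)/$')
STRICT = re.compile(r'^profile/(?P<user_id>\d+)/$')


def captured_user_id(pattern, path):
    """Return the user_id the view would receive, or None (no match, 404)."""
    match = pattern.search(path)
    return match.group('user_id') if match else None


# The attack URL from the text, after %2F has been decoded to '/'.
attack_path = 'profile/<script>alert(1)</script>/'
```

With `LOOSE`, the view receives the whole `<script>alert(1)</script>` string as `user_id`; with `STRICT`, there is no match. Note that in Python 3, `\d` also matches non-ASCII digits such as Arabic-Indic numerals, so `[0-9]+` is stricter still.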

When dealing with HTML, you could allow <i></i>, <b></b> and <u></u>, but reject any input that contains any other tags.
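One way to implement that rule is to accept only the exact, attribute-free tags and reject everything else. The sketch below makes a few assumptions: the input will be placed in an element's text content (not inside an attribute), only lowercase tags are allowed, and `is_allowed` is a hypothetical helper name. Applications that need richer markup usually rely on a maintained HTML sanitizer library instead.

```python
import re

# The only tags we accept: <i>, </i>, <b>, </b>, <u>, </u>. No attributes, lowercase only.
ALLOWED_TAG = re.compile(r'</?[ibu]>')


def is_allowed(text):
    """Return True if every '<' in text starts one of the allowed tags."""
    # Remove the allowed tags. Any '<' that survives belongs to some other
    # markup (a different tag, a tag with attributes, a comment, ...), so reject.
    return '<' not in ALLOWED_TAG.sub('', text)
```

The check is intentionally strict. `<b onclick="alert(1)">` is rejected even though `b` is an allowed tag name, which is exactly the "allowing too much" trap. Nested tricks like `<<b>script>` also fail, because removing `<b>` leaves a `<` behind.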

Whitelisting is easier to get right than blacklisting, but you have to be really careful not to allow too much.

Escaping

The better approach is to escape any user input. Escaping means expressing a symbol in an alternative way that results in a different interpretation. Let’s look at a practical example:

The characters < and > have a special meaning in HTML: they mark the beginning and end of an HTML tag. So if we want to actually display one of these characters on a web page, we have to write them as &lt; and &gt;. The ampersand & has a special meaning too, so we have to escape it as &amp; if we want to display it. Last but not least, you also have to escape the double quote " as &quot; because it is used to delimit attribute values.
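Written out by hand, these replacements fit in a few lines. The sketch below is for illustration only (in real code, use your framework's escaping or Python's `html.escape`). It also escapes the single quote, which can delimit attribute values as well; `html.escape` does the same by default. The order matters: `&` has to be replaced first, or the `&` in a freshly produced `&lt;` would itself be escaped again.

```python
import html


def escape_html(text):
    """Escape the HTML special characters discussed above."""
    return (text.replace('&', '&amp;')    # must come first, see above
                .replace('<', '&lt;')
                .replace('>', '&gt;')
                .replace('"', '&quot;')
                .replace("'", '&#x27;'))  # single quotes can delimit attributes too


payload = '<script>alert("1")</script>'
```

`escape_html(payload)` yields `&lt;script&gt;alert(&quot;1&quot;)&lt;/script&gt;`, which the browser displays as text instead of executing it, and the result matches `html.escape(payload)`.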

There are essentially two places where you should think about escaping and/or sanitizing data: when the data enters the server and when it leaves the server.

Sanitize incoming data
Escape outgoing data

In most cases, it doesn’t make any sense to store escaped data, because the correct escaping depends on the context. When dealing with HTML you have to escape different characters than when constructing a URL.
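A quick sketch shows why escaping belongs at output time. The same value, escaped for an HTML page and encoded for a URL, produces different results. Escaping the value for HTML on the way in and encoding it for a URL later double-encodes the data. The sample string is arbitrary.

```python
import html
from urllib.parse import quote

value = 'Tom & Jerry <3'

as_html = html.escape(value)    # for HTML text content
as_url = quote(value, safe='')  # for a URL path segment or query value

# What happens if the HTML-escaped version is what got stored in the database
# and a URL is built from it later:
double = quote(html.escape(value), safe='')
```

`as_html` is `Tom &amp; Jerry &lt;3` and `as_url` is `Tom%20%26%20Jerry%20%3C3`, but `double` is `Tom%20%26amp%3B%20Jerry%20%26lt%3B3`, so the link would carry a literal `&amp;` instead of `&`.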

XSS protection in Django

Proper escaping can be cumbersome. Luckily, Django has you covered: ever since version 1.0, Django automatically escapes all template variables.

This is easy to confirm in the Django shell (python manage.py shell):

>>> from django.template import engines
>>> django_engine = engines['django']
>>> template = django_engine.from_string("Hello {{ name }}!")
>>> template.render({'name': 'Daniel'})
'Hello Daniel!'
>>> template.render({'name': '<script>alert(1)</script>'})
'Hello &lt;script&gt;alert(1)&lt;/script&gt;!'

So all we have to do to fix those broken views from the example above is to rewrite them with proper templates, as you would probably have done in the first place.

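from django.contrib.auth.models import User
from django.http import HttpResponse
from django.shortcuts import render

from .models import Comment  # assumes your app defines a Comment model with a text field

def show_user(request):
    username = request.GET.get('username')
    try:
        user = User.objects.get(username=username)
    except User.DoesNotExist:
        # The template autoescapes {{ username }}, so the payload is rendered as text
        return render(request, 'user_not_found.html', {'username': username}, status=404)
    ...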

def add_comment(request):
    if request.method == 'POST' and 'comment' in request.POST:
        Comment.objects.create(text=request.POST['comment'])
    return HttpResponse('ok')

def show_comments(request):
    return render(request, 'comment_list.html', {'comments': Comment.objects.all()})

# user_not_found.html
<html>
  <body>
    User with username "{{ username }}" does not exist
  </body>
</html>

# comment_list.html
<html>
  <head></head>
  <body>
    {% for comment in comments %}
    <p>{{ comment.text }}</p>
    {% endfor %}
  </body>
</html>

Potential XSS vulnerabilities in Django

Just because Django automatically escapes template variables doesn’t mean that Django applications can’t have XSS vulnerabilities.

Here are some ways autoescaping can be bypassed:

With the template tag {% autoescape off %}
With the template filters safe and safeseq
Using SafeBytes, SafeString, SafeText, SafeUnicode or mark_safe() from django.utils.safestring
Using format_html(), format_html_join() or html_safe from django.utils.html (format_html() escapes its arguments, but not the format string itself)
By creating a class with a __html__() method
Using the wrong escaping, e.g. the escapejs filter to display HTML
Not using templates to create a response
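Several items on this list rely on the same mechanism. Django's escaping helpers leave alone any object that has an `__html__()` method and treat it as already safe. The standard-library sketch below imitates that idea to show why marking user input as safe is dangerous. `TrustedMarkup` and `render_value` are hypothetical stand-ins, not Django's actual implementation.

```python
import html


class TrustedMarkup(str):
    """Hypothetical stand-in for a 'safe string' such as the result of
    mark_safe(): a str whose __html__() says 'insert me verbatim'."""

    def __html__(self):
        return self


def render_value(value):
    """Roughly the decision autoescaping makes for each template variable."""
    if hasattr(value, '__html__'):
        return value.__html__()       # trusted: no escaping at all
    return html.escape(str(value))    # untrusted: escaped


user_input = '<script>alert(1)</script>'
```

`render_value(user_input)` returns the harmless `&lt;script&gt;alert(1)&lt;/script&gt;`, but once the same string is wrapped in `TrustedMarkup`, the payload passes through untouched.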

All of these have valid use cases. Autoescaping is convenient, but sometimes, it gets in the way. Maybe you build a Content Management System to power your website and want to allow your staff to edit the HTML of individual pages. The HTML is stored in the database and to properly render the page, you do not want it to be escaped.

But don’t turn off escaping without careful consideration. If you find yourself using one of these functions or classes, or encounter any of them in a code review, step back and have a look at the data they are using. Follow the data back to its source and make sure untrusted user input never gets marked as safe.

What exactly is user input?

Let’s talk about user input for a second. The term user input is much broader than it seems at first glance. “User input” implies that it was typed in by a user on a keyboard, but that is misleading.

Everything that comes across the network must be considered user input: the URL, the request body and also headers like user-agent or referer.

Imagine an administrative page that creates statistics about which browsers are used to access your site by reading the User-Agent header. If this page doesn’t implement proper escaping, an attacker can send an HTTP request with a manipulated User-Agent header and hijack the page when an administrator opens it, for example to steal your session cookie (unless it is marked HttpOnly, which is Django’s default) or to perform actions with your administrative privileges.
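Here is a rough sketch of such a statistics page, reduced to a function that turns raw User-Agent values into an HTML table. The function name `browser_stats_html` and its `escape` flag are hypothetical; the flag exists only so that the vulnerable and the fixed variant can be compared side by side.

```python
import html
from collections import Counter


def browser_stats_html(user_agents, escape=True):
    """Render a tiny HTML table counting User-Agent header values.

    user_agents: iterable of raw header strings, exactly as clients sent them.
    escape=False reproduces the vulnerable version of the page.
    """
    rows = []
    for agent, count in Counter(user_agents).most_common():
        cell = html.escape(agent) if escape else agent
        rows.append(f'<tr><td>{cell}</td><td>{count}</td></tr>')
    return '<table>' + ''.join(rows) + '</table>'


forged_agent = '<script>alert(1)</script>'  # sent by the attacker, e.g. with curl -A
```

Without escaping, the forged header ends up in the admin page as a live `<script>` tag. With escaping, it shows up as an odd-looking but harmless browser name.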

Takeaways

This was a long read, but I want you to take away two things:

User input should never be trusted blindly
User input does not only come from input fields

XSS is far from a new thing. It’s been around ever since websites were created dynamically. XSS is so old that even MySpace was hacked through it! But despite its age, XSS is still one of the most prevalent security vulnerabilities and creeps into applications written both by newbies and veterans. Hopefully, this article gave you a solid understanding and helps to make your applications a tiny bit more secure.

Drop any questions in the comment box or reach out via Twitter or Email.

The next article in this series on security acronyms every Django developer should know will discuss how to make your site less secure by using CORS to circumvent the SOP and why that might be a good idea.

Considerate Code