Webinar: Evaluation - 05.12
We continue to develop PVS-Studio as a SAST solution. Thus, one of our major goals is expanding OWASP coverage. You might ask, what's the use when there's no taint analysis? That's exactly what we thought - and decided to implement taint analysis in the C# analyzer. Curious about what we accomplished? Read on!
Note. This article briefly touches upon the topics of SQL injections and working with SQL in C#. This theory serves as context. For in-depth information on these topics, do additional research.
Taint analysis helps track data that a program receives from an external source (taint source). Though such data is considered tainted, it does not necessarily cause damage when spreading. However, programs may have key points (taint sinks). When tainted data hits them, this interaction may result in vulnerabilities: SQLI, XSS, etc. Taint analysis helps find data distribution routes that allow data from the taint source reach the taint sink unverified.
Taint analysis works correctly if it takes the following into account:
Stir, but do not shake. ;)
Here's an example:
var userName = Request.QueryString["userName"];
var rawCommand = "SELECT * FROM users WHERE name = '" + userName + "'";
var sqlCommand = new SqlCommand(rawCommand);
// Execution of the tainted command
The code above is a "classic" SQL injection example. It would be one of the first things you find if you google this topic. The problem here is, data comes from the user and immediately becomes a part of a data query. An intruder may take advantage of this and adjust data so that it changes the SQL query's logic. Did you expect an incoming string that looks like JohnSmith? Instead, you may get the following: Sergey'; DROP TABLE users; --. Now doesn't this look great? :)
The first PVS-Studio analyzer to support taint analysis was the C and C++ version. We founded our V1010 diagnostic rule on taint analysis mechanics. The rule detects cases that conform with the following principle: a program gets data from an outside source and the data travels unverified and unhindered to a location it should not reach. For example, such tainted data may be passed to the command interpreter wrapper function - system. This rule actually helped me find a few interesting cases that I described in a separate article (it even contains a few videos).
Note. The article I mentioned above contains one curious case. The developers were fixing CVE in the code, but something went wrong. As a result, the code was fixed, but the problem did not go away. After a while the problem was assigned a new CVE identifier. Only then was the code fixed for good. :)
We've wanted to implement taint analysis in our C# analyzer for a while. Probably since the moment we added it to the C++ analyzer. The team would occasionally discuss the importance of adding something like V1010 to the C# analyzer - but we always had something more urgent to tackle. This changed in 2021. One of the goals the new roadmap defined for us was improving the C# analyzer's OWASP coverage. OWASP Top 10 2017 was of the top importance. However, we also wanted to keep ASVS in mind. Now that was an excellent excuse to finally get our hands dirty and tackle taint analysis!
We determined that our first taint-based diagnostic would search for possible SQL injections. This looked perfect, because it's a frequently encountered problem, mentioned both in OWASP Top 10 2017 (A1:2017-Injection) and in OWASP ASVS. That settled it.
Taint analysis is based on data flow analysis. The main infrastructure was already there. Now we needed to add information about taint sources, taint sinks, routes tainted data takes, and clearing data's "tainted" status.
While working on taint analysis, we refined some unrelated features we came across: we improved interpolated string support in data flow, enhanced loop counter processing, updated a portion of general mechanics and removed certain false positives. This chance to fine-tune the analyzer here and there, in locations we usually don't get to, was exciting.
But let's get back to taint analysis.
Tainted data distribution chains start from sources. Sources are locations where an application receives data from users. We assume unconditionally that all sources return tainted data.
Here are some common sources:
We assume that taintedVariable in the code below is tainted.
void Example()
{
var taintedVariable = Console.ReadLine();
TaintSink(taintedVariable);
}
Methods whose parameters are exposed to external code are another data source we believe can supply contaminated data. These include parameters of public methods that public classes contain:
public class Helper
{
public void ExecuteCommand(String commandToExecute)
{
TaintSink(commandToExecute);
}
}
In the code above, the ExecuteCommand method takes the commandToExecute parameter. We consider this parameter tainted. Let me elaborate on why we chose this approach. If a method is available for an external assembly, it could be a part of API that helps interact with a library. Alternatively, the reason why someone wrote such code could be that they did not care about access modifiers. :)
A developer who uses the library may hope that the library's method they call checks incoming data. Even if the library is open-source and available on GitHub, the library's user probably does not check how each method is implemented. The user may hope - and rightfully so - that the method they call checks the data.
The developer of this library may expect to get verified data and consider the second check unnecessary.
This may result in a scenario when user data enters an application unverified, because neither the application, nor the library checked it. Such direct use of external data may create a vulnerability.
Unfortunately, the PVS-Studio analyzer will not be able to reach a library method's implementation when the library's code is not available and the method is not annotated. But we still want to detect these cases. Therefore, it's a good idea to warn library developers that data passed to a public method may hit a taint sink unverified.
Here's an example that demonstrates this.
public class DBHelper
{
public void ProcessUserInfo(String userName)
{
....
var command = "SELECT * FROM Users WHERE userName = '" + userName + "'";
ExecuteCommand(command);
....
}
private void ExecuteCommand(String rawCommand)
{
using (SqlConnection connection = new SqlConnection(_connectionString))
{
....
using (var sqlCommand = new SqlCommand(rawCommand, connection))
{
using (var reader = sqlCommand.ExecuteReader())
....
}
}
}
}
External code can call the ProcessUserInfo method, because it is public and is inside a public class. The ProcessUserInfo method takes userName as a parameter. The value becomes part of a SQL query string written to the command variable. The ExecuteCommand method takes command as the rawCommand parameter, uses it to create an SQL command (sqlCommand), and executes it. This code looks unsecure, but this example does not contain an obvious taint source.
Now let's take a closer look at the scenario. Assume the code above is part of the SQLLib.dll library. A sample SQLIApp.exe application uses this library and calls the ProcessUserInfo method:
static void TestHelper(DBHelper helper)
{
var userName = Request.Form["userName"];
helper.ProcessUserInfo(userName);
}
Request.Form["userName"] gets user data that is then passed directly to the ProcessUserInfo method. Since ProcessUserInfo method is declared in the external library, one cannot review the method's code.
As a result, the data travels unverified - directly from the user to a method that uses this data. This looks unsecure.
Yes, the library's developers and its users could agree that, for example, the users check all data before passing it to the library's methods. This may be specified in the API documentation. However, when it comes to security, it's better be safe than sorry.
Unfortunately, when checking the SQLIApp.exe application's code, the analyzer won't know anything about the ProcessUserInfo method's implementation and won't be able to warn about a possible SQL injection. However, the analyzer may issue a warning when analyzing the library's source code.
Each taint-based diagnostic decides independently whether to consider parameters tainted. For a diagnostic that searches for SQL injections, we chose to produce warnings at a low certainty level.
Note. If you do not want to see such warnings, you can disable them in the .pvsconfig file with the following comment: //-V::5608:3. Then the log will not display low certainty level V5608 (SQLI) warnings. You can find detailed information about .pvsconfig files in the documentation article: "Suppression of False Alarms" (the "Suppression of false positives through diagnostic configuration files (.pvsconfig)" section).
And vice versa, if you consider these warnings extremely important, you can up their importance to high level, using //V_LEVEL_1::5608. The details are in the documentation's following article: "Additional Diagnostics Configuration" (the "How to Set Your Level for Specific Diagnostics" chapter).
Each diagnostic covers its unique taint sinks. This means, taint sinks are associated with their diagnostics rather than the entire taint analysis mechanics. As we discussed, it's essential that sinks don't get tainted data. If an application contains a route that can lead data from a taint source to a taint sink - trouble is brewing.
For example, in case with SQL injections, the sink may be the SQLCommand class constructor or the FromSqlRaw method.
For example:
var taintedStr = GetTaintedData();
var sqlCommand = new SqlCommand(taintedStr); // taint sink
....
You could think that the SqlCommand class's constructor is more of a transmitter, while the sink is one of the methods that execute the SqlCommand.ExecuteSomehow command. However, it seems very strange to first create a tainted command - and check it after. It makes more sense to first check incoming data, and then pass it to the SQLCommand class constructor. This is why in our case the SqlCommand constructor is a sink, and not a data transmitter.
The SqlCommand.CommandText property is also a sink. Below is an example of usecure code:
void ProcessUserInfo()
{
using (SqlConnection connection = new SqlConnection(_connectionString))
{
....
String userName = Request.Form["userName"];
using (var command = new SqlCommand()
{
Connection = connection,
CommandText = "SELECT * FROM Users WHERE UserName = '" + userName + "'",
CommandType = System.Data.CommandType.Text
})
{
using (var reader = command.ExecuteReader())
....
}
}
}
The code above creates a SqlCommand type instance. However, the tainted string is not passed as an argument to the constructor. This string is used to initialize the CommandText property.
It's worth saying that not all tainted data chains follow from sources to sinks. There are several reasons why the analyzer may stop tracking tainted data chains:
Note that conditional validation can be even more dangerous than tainted data, because there is an illusion of safety.
Different data types require different verification approaches. The choice depends on what we expect as input: data for a SQL command, a path, etc. For example, you can use parametrized queries to prevent SQLI.
String userName = Request.Form["userName"];
using (var command = new SqlCommand()
{
Connection = connection,
CommandText = "SELECT * FROM Users WHERE UserName = @userName",
CommandType = System.Data.CommandType.Text
})
{
var userNameParam = new SqlParameter("@userName", userName);
command.Parameters.Add(userNameParam);
using (var reader = command.ExecuteReader())
....
}
In this case, the analyzer will lose track of the tainted data chain when a SqlParameter type object is created. The analyzer has no information about whether the object transmits the contamination or is tainted. As a result, the analyzer will not consider the userNameParam variable tainted. The command does not include the userName value directly, which is why the analyzer will not issue a warning.
Tainted data does not travel directly from the taint source to taint sink. Theoretically, this is possible, but it is a somewhat fantastic scenario. :) After entering an application through a taint source, tainted data will most likely spread via various routes, and only then get into a taint sink. There are many ways tainted data can travel in an application. Simple variable assignments is the most obvious route.
In fact, we have already demonstrated this earlier:
void Example()
{
var taintedVariable = Console.ReadLine();
TaintSink(taintedVariable);
}
In the code above, the Console.ReadLine() method call is tagged as a taint source. Then the data is transmitted by assignment to the taintedVariable variable.
Reassignment can also guide tainted data:
var taintedVariable = Console.ReadLine();
var taintedVariable2 = taintedVariable;
There are even more interesting cases of tainted data transmission. For example, tainted strings can be formed through concatenation:
var shipCity = Console.ReadLine();
var resStr
= "select * from OrdersTable where ShipCity = '" + shipCity + "'";
While analyzing string concatenation, we check whether one of the operands is tainted. If this is so, the entire expression is marked as tainted.
Tainted data can also travel through interpolated strings:
var resStr = $"select * from UsersTable where Id = '{id}'";
Here we use a similar approach - we analyze the interpolated elements. If at least one of them is tainted, the entire expression is marked as tainted.
Another way to transmit tainted data is by calling methods. There are countless opportunities here. :)
One can translate tainted data from arguments into the return value. For example:
var resStr = String.Join(separator, nonTaintedStr, taintedStr);
When this code is executed, the contamination is passed from taintedStr to the value the String.Join method returns, and then to resStr.
One can also contaminate an object by passing tainted data to a method called for this object. Typical cases involve StringBuilder.
var sb = new StringBuilder();
sb.AppendLine(taintedStr);
var resStr = sb.ToString();
At first, sb is not tainted. But it becomes tainted if the AppendLine method called for this object receives tainted data as the taintedStr argument. After the sb object is tainted, it can contaminate other entities. In our case, the ToString method call translates the tainted status from the sb object to the returned value, thus contaminating the resStr variable.
Obviously, all these approaches may be combined, while tainted data may move away and on to a different method - such cases are also important to detect.
One of the things we have yet to overcome is the limitations of the value type analysis. Here's why. Currently, C# data flow analysis is limited to enumerations and integer types, such as int, byte, short, etc. If a tainted data chain contains an unknown value type (a structure, for example), the analyzer cannot track this chain any further. This is where the analyzer could truly grow and improve.
Since this is our first release of taint analysis features, we already have new ideas for additional features and enhancements. Step by step we will keep perfecting taint analysis. We also consider adding new diagnostic rules. If you encounter false positives or if the analyzer misses something, let us know. We will research these cases and may support them in the future.
Now let's take a look at how we use these general mechanics to conduct taint analysis. The general algorithm is about the same.
Of course, the diagnostics have additional logic, but they all follow this general algorithm.
As I've mentioned earlier, our first taint-based diagnostic was the rule to search for potential SQL injections.
What's an SQL injection? If you don't know, read up on Wikipedia or docs.microsoft.com. However, I'll still provide some context here.
SQL injections conform to the basic taint analysis theory we discussed earlier. Let's say there is some external taint source. Users are free to pass any data to this source. The data enters the application, moves around and, unverified, becomes a part of a SQL command. If the command allows any data, a user can supply compromised data, thus forcing the program to execute a custom query. That's an injection.
Let's take a closer look at one of the examples from above. If you've already googled queries like "SQLI C#", you have probably seen an example like this:
private HttpRequest Request { get; set; }
void ProcessUserInfo()
{
using (SqlConnection connection = new SqlConnection(_connectionString))
{
....
String userName = Request.Form["userName"];
using (var command = new SqlCommand()
{
Connection = connection,
CommandText = "SELECT * FROM Users WHERE UserName = '" + userName + "'",
CommandType = System.Data.CommandType.Text
})
{
using (var reader = command.ExecuteReader())
....
}
}
}
Here data that comes from an external source is assigned to the userName variable. Then this data, unverified, enters a SQL query - and this is a problem. This may cause consequences if the userName variable receives a compromised command instead of adequate data. For example, the incoming data may contain the following string: ' OR '1'='1. Instead of processing data for one user, the resulting command will process all elements in the data table.
Now let's take a look at this example from the analyzer's perspective. How is it going to detect a SQLI threat here?
In this case the Request.Form property is the taint source. The Request variable is of type HttpRequest. The Form property is of the NameValueCollection type. The analyzer considers NameValueCollection type object tainted. When the object's indexer is called, it retranslates tainted data across the entire expression (the value the indexer returns): Request.Form -> Request.Form["userName"]. Since we know the expression is tainted, the userName variable also becomes tainted.
Then the analyzer checks the SqlCommand constructor call, i.e. property initializations. The CommandText property is the one of interest. In our case CommandText is a sink. We expect the diagnostic to be triggered when data reaches the sink. Therefore, we analyze the right part of the assignment: "SELECT * FROM Users WHERE UserName = '" + userName + "'". What do we see here? That's right, string concatenation. We remember that when at least one operand is tainted, it contaminates the entire expression. As we remember, userName is tainted. This makes the resulting expression contaminated. Thus, the CommandText is assigned a tainted expression, which is exactly what we intended to check.
The described taint analysis is a part of PVS-Studio 7.13. The release also includes our new diagnostic that searches for possible SQLI - V5608. PVS-Studio 7.13 is available here.
Of course, there's much work yet to be done. We plan to improve the taint analysis mechanics, as well as develop new diagnostics. So I have a question for you. What things would you like to see our taint analysis do? If you have any thoughts and ideas, please do let us know!
As usual, I also invite you to follow my Twitter account. ;)
0