To get a trial key
fill out the form below
Team license
Enterprise license
** By clicking this button you agree to our Privacy Policy statement

Request our prices
New License
License Renewal
--Select currency--
USD
EUR
RUB
* By clicking this button you agree to our Privacy Policy statement

Free PVS-Studio license for Microsoft MVP specialists
** By clicking this button you agree to our Privacy Policy statement

To get the licence for your open-source project, please fill out this form
** By clicking this button you agree to our Privacy Policy statement

I am interested to try it on the platforms:
** By clicking this button you agree to our Privacy Policy statement

Message submitted.

Your message has been sent. We will email you at


If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

>
>
>
V5626. OWASP. Possible ReDoS vulnerabil…
Analyzer diagnostics
General Analysis (C++)
General Analysis (C#)
General Analysis (Java)
Diagnosis of micro-optimizations (C++)
Diagnosis of 64-bit errors (Viva64, C++)
Customer specific requests (C++)
MISRA errors
AUTOSAR errors
OWASP errors (C#)
Problems related to code analyzer
Additional information
Contents

V5626. OWASP. Possible ReDoS vulnerability. Potentially tainted data is processed by regular expression that contains an unsafe pattern.

Oct 07 2022

The analyzer detected that a potentially dangerous regular expression is used for processing data from external source. This can make an application vulnerable to a ReDoS attack.

What is a ReDoS attack?

ReDoS is a denial of service attack caused by a vulnerable regular expression. The aim of an attacker is to pass a tainted string to a regular expression. The string is designed in such a way that evaluating it will require the maximum amount of time.

A regular expression is vulnerable if it meets the following conditions:

  • There are two subexpressions, and one of them includes another. One of the following quantifiers is applied to each of them: '*', '+', '*?', '+?', '{...}'. For example, the 'x+' subexpression is included in '(x+)*');
  • There is a string that can be matched with both subexpressions. For example, the 'xxxx' string may fit both the 'x+' and the '(x+)*' templates.

Therefore, when receiving a warning of this diagnostic rule, you should check a regular expression for presence of subexpressions of the following forms:

  • ...(a+)+...
  • ...(b?a*c?)+...
  • ...(.+a+c?)*?...
  • ...(a+){x}...
  • ...(...|a...|...|a?...|...)+..., etc.

Here 'a', 'b', 'c' can be:

  • separate characters;
  • a set of characters in square brackets '[...]';
  • a subexpression in parenthesis '(...)';
  • any class of characters that are supported by a regular expression ('\d', '\w', '.', etc.).

It is also important that after these subexpressions there should be at least one subexpression not marked with quantifiers '?' or '*'. For example: '(x+)+y', '(x+)+$', '(x+)+(...)', ' (x+)+[...]'.

Let's analyze this issue on the example of '(x+)+y'. In this expression, the 'x+' pattern can match any number of 'x' characters. The string that matches the '(x+)+y' pattern consists of any number of substrings matched with 'x+'. As a result, there is a large number of options for matching the same string with a regular expression.

You can see several options for matching the 'xxxx' string with the '(x+)+y' template in the table below:

V5626/image1.png

Every time a regular expression fails to find the 'y' character at the end of the string, it starts checking the next option. Only after checking all of them, the regular expression gives the answer – no matches were found. However, this process can be executed for a catastrophically long time, depending on the length of the substring corresponding to the vulnerable pattern.

The graph below shows how the (x+)+y regular expression's calculation time depends on the number of characters in the input strings of the 'xx....xx' form:

V5626/image2.png

Look at the code example:

Regex _datePattern = new Regex(@"^(-?\d+)*$");

public void ProcessArticle(Article article)
{
  if (IsDateCorrect(article.Date))
    ....
}

public bool IsDateCorrect(string date)
{
  if (_datePattern.IsMatch(date))
    ....
}

In this example, the date of the article is checked with a regular expression. If the date is correct, the regular expression works as expected. The situation changes if the application receives the following string as the date:

3333333333333333333333333333333333333333333333333333333333333 Hello ReDoS!

In this case, processing with a regular expression will take a long time. Receiving several requests with similar data may create a heavy load on the application.

A possible solution is to limit the time a regular expression spends on processing the input string:

Regex _datePattern = new Regex(@"^(-?\d+)*$", 
                               RegexOptions.None, 
                               TimeSpan.FromMilliseconds(10));

Look at another example. The '(\d|[0-9]?)' subexpression was intentionally added to the regular expression to show the essence of the problem.

Regex _listPattern = new Regex(@"^((\d|[0-9]?)(,\s|\.))+$(?<=\.)");
public void ProcessItems(string path)
{
  using (var reader = new StreamReader(path))
  {
    while (!reader.EndOfStream)
    {
      string line = reader.ReadLine();
      if (line != null && _listPattern.IsMatch(line))
      ....
    }
  }
}

Here, the data is read from the file. A regular expression checks the data for compliance with the following pattern: the string should be a list, each element of which is a digit or an empty string. The correct input may look like this:

3, 4, 5, 3, 4, 5, 3, 4, 5, 3, 4, 5, 3, 4, 5, 3, 4, 5, 3, 4, 5, 3, 4, 5, 3, 4.

When processing such data, the regular expression works in normal time. However, if we pass the same string but without the dot at the end, the application will take much more time to process it.

The regular expression uses the '\d' and '[0-9]?' subexpressions, which can be matched with the same values. Note that the '? ' quantifier is applied to the second subexpression and the '+' quantifier is applied to the '((\d|[0-9]?)(,\s|\.))' parent subexpression. This leads to a large number of possible matches in the string. If we remove at least one of these quantifiers, the ReDoS attack won't happen.

In this example, to eliminate the ReDoS vulnerability, it is enough to remove the unnecessary matches:

Regex _listPattern = new Regex(@"^([0-9]?(,\s|\.))+$(?<=\.)");

You can read more about ReDoS vulnerabilities on the OWASP website.

How to eliminate a ReDoS vulnerability?

There are several options. Let's inspect them using the '^(-?\d+)*$' regular expression as an example.

Option 1. Add a limit on the processing time a regular expression spends on a string. This can be done by setting the 'matchTimeout' parameter when you create the 'Regex' object or when you call a static method:

RegexOptions options = RegexOptions.None;
TimeSpan timeout = TimeSpan.FromMilliseconds(10);
    
Regex datePattern = new Regex(@"^(-?\d+)*$", options, timeout);
Regex.IsMatch(date, @"^(-?\d+)*$", options, timeout);

Option 2. Use atomic groups '(?>...)'. Atomic groups disable the search for all possible combinations of characters corresponding to a subexpression, being limited to only one:

Regex datePattern = new Regex(@"^(?>-?\d+)*$");

Option 3. Rewrite the regular expression by removing the dangerous pattern. Suppose that the '^(-?\d+)*$' expression is designed to search for a date of the '27-09-2022' form. In this case it can be replaced with a more reliable analogue:

Regex datePattern = new Regex (@"^(\d{2}-\d{2}-\d{4})$");

Here any substring is matched with no more than one subexpression due to the mandatory check of the '-' character between the '\d{...}' templates.

This diagnostic is classified as:

Unicorn with delicious cookie
Our website uses cookies to enhance your browsing experience.
Accept