>
>
>
Shoot yourself in the foot when handlin…

Sergey Vasiliev
Articles: 96

Shoot yourself in the foot when handling input data

The linking concept of today's article differs from usual. This time it is not one project, the source code of which was analyzed, but a number of warnings related to one and the same diagnostic rule in several projects. What's interesting about this? The point is that some of considered code fragments contain errors reproduced when working with the application and other fragments even represent vulnerabilities (CVE). In addition, at the end of the article you will find a small talk on security defects.

Brief Preface

All errors, which will be regarded today in the article, have a similar pattern:

  • a program receives data from the stdin;
  • a check of successful data reading is performed;
  • if the data is read successfully, newline character is removed from a string.

However, all fragments that will be considered, contain errors and are vulnerable to the intentionally malformed input. As data is received from a user, who can disrupt the logic of application execution, it was extremely tempting to try breaking something. That's what I did.

All problems listed below were detected by a PVS-Studio static analyzer that searches for errors in code, not only for C and C++ languages, but also for C# and Java ones.

It's already great to find a problem by a static analyzer, but to find and reproduce it - that's a totally different level of delight. :)

FreeSWITCH

The first suspicious code fragment was detected in the fs_cli.exe module code, included in the FreeSWITCH distribution:

static const char *basic_gets(int *cnt)
{
  ....
  int c = getchar();
  if (c < 0) {
    if (fgets(command_buf, sizeof(command_buf) - 1, stdin) 
          != command_buf) {
      break;
    }
    command_buf[strlen(command_buf)-1] = '\0'; /* remove endline */
    break;
  }
  ....
}

PVS-Studio warning: V1010 CWE-20 Unchecked tainted data is used in index: 'strlen(command_buf)'.

The analyzer warns about suspicious access to the command_buf array by an index. It is considered suspicious because unchecked external data is used as an index. Data is external as it was received through the fgets function from the stdin. Data is unchecked as there was no check before using. The expression fgets(command_buf, ....) != command_buf doesn't count as in this case we check only the fact of receiving data, not its content.

The problem of this code is that under certain circumstances there will be a recording '\0' outside the array, which will lead to the undefined behavior. For this, it is enough to just enter a zero-length string (a zero-length string in terms of the C language, i.e. the one in which the first character will be '\0').

Let's get a rough estimate of what will happen when feeding a zero-length string to the function:

  • fgets(command_buf, ....) -> command_buf;
  • fgets(....) != command_buf -> false (then-branch of the if statement is ignored);
  • strlen(command_buf) -> 0;
  • command_buf[strlen(command_buf) - 1] -> command_buf[-1].

Ooops!

What is interesting here is that this analyzer warning can be pretty "grasped between fingers". In order to reproduce the problem, you need to:

  • get program execution to this function;
  • adjust the input so that the call of getchar() returned a negative value;
  • pass a string with a terminal null to the fgets function in the beginning and a function must successfully read the string.

Digging in sources for a while, I have formed a specific sequence of the problem reproducing:

  • Run fs_cli.exe in a batch-mode (fs_cli.exe -b). I'd like to note that to perform further steps, you need to make sure the connection to the fs_cli.exe server has been successful. For this purpose it is enough, for example, to locally run FreeSwitchConsole.exe as administrator.
  • After that we need to perform the input so that the call of getchar() returned a negative value.
  • Now let's enter a string with a terminal null in the beginning (for example, '\0Oooops').
  • ....
  • PROFIT!

You can find a video of reproducing the problem below:

NcFTP

A similar problem has been detected in the NcFTP project, but only it occurred already in two places. As the code looks similar, we'll consider only one problem case:

static int NcFTPConfirmResumeDownloadProc(....)
{
  ....
  if (fgets(newname, sizeof(newname) - 1, stdin) == NULL)
    newname[0] = '\0';
  newname[strlen(newname) - 1] = '\0';
  ....
}

PVS-Studio warning: V1010 CWE-20 Unchecked tainted data is used in index: 'strlen(newname)'.

Here, unlike the example from FreeSWITCH, the code is worse and more prone to problems. For example, the recording '\0' occurs regardless of the fact whether the reading using fgets happened successfully or not. So here there are even more possibilities how to disrupt the normal execution logic. Let's follow the proven way of zero-length strings.

The problem is reproduced less harder, than in case of FreeSWITCH. The sequence of steps is described below:

  • running and connection with a server, from which you can download a file. For example, I used speedtest.tele2.net (eventually, start application command looks as follows: ncftp.exe ftp://speedtest.tele2.net);
  • downloading the file from the server. The file with such a name but other properties has to already exist locally. For example, you can download the file from the server, change it and try again running a download command (for example, get 512KB.zip);
  • on the question about choice of action, answer with a string starting with 'N' character (for example, Now let's have some fun);
  • enter '\0' (or something more interesting);
  • ....
  • PROFIT!

Reproducing of a problem is also available on a video:

OpenLDAP

In the OpenLDAP project (more precisely - in one of related utilities) developers make the same errors, as in FreeSWITCH. Attempt to delete newline character occurs only if a string was read successfully but there also is no protection from zero-length strings.

Code fragment:

int main( int argc, char **argv )
{
  char buf[ 4096 ];
  FILE *fp = NULL;
  ....
  if (....) {
    fp = stdin;
  }
  ....
  if ( fp == NULL ) {
    ....
  } else {
    while ((rc == 0 || contoper)
           && 
           fgets(buf, sizeof(buf), fp) != NULL) {
      buf[ strlen( buf ) - 1 ] = '\0'; /* remove trailing newline */

      if ( *buf != '\0' ) {
        rc = dodelete( ld, buf );
        if ( rc != 0 )
          retval = rc;
        }
      }
  }
  ....
}

PVS-Studio warning: V1010 CWE-20 Unchecked tainted data is used in index: 'strlen(buf)'.

Let's omit redundant part so that the point of the problem was more obvious:

while (.... && fgets(buf, sizeof(buf), fp) != NULL) {
  buf[ strlen( buf ) - 1 ] = '\0';
  ....
}

This code is better than in NcFTP, but is still vulnerable. If you feed in a zero-length string when calling fgets:

  • fgets(buf, ....) -> buf;
  • fgets(....) != NULL -> true (the body of the while loop starts executing);
  • strlen(buf) - 1 -> 0 - 1 -> -1;
  • buf[-1] = '\0'.

libidn

The errors, reviewed above, are quite juicy, they can be consistently reproduced, you can "touch" them. Unless, I just didn't get around to reproducing problems on OpenLDAP. Nevertheless, you cannot call them vulnerabilities at least for the reason that these problems aren't assigned CVE-IDs.

However, some real vulnerabilities have the same problem pattern. Both of the code fragments given below, relate to the libidn project.

Code fragment:

int main (int argc, char *argv[])
{
  ....
  else if (fgets (readbuf, BUFSIZ, stdin) == NULL)
  {
    if (feof (stdin))
      break;
  
    error (EXIT_FAILURE, errno, _("input error"));
  }

  if (readbuf[strlen (readbuf) - 1] == '\n')
    readbuf[strlen (readbuf) - 1] = '\0';
  ....
}

PVS-Studio warning: V1010 CWE-20 Unchecked tainted data is used in index: 'strlen(readbuf)'.

The situation is similar, except that unlike the previous examples, where a recording by index -1 took place, a reading is happening here. However, it is still undefined behavior. This error was given its own CVE identifier (CVE-2015-8948).

After the problem has been discovered, the code was changed as follows:

int main (int argc, char *argv[])
{
  ....
  else if (getline (&line, &linelen, stdin) == -1)
  {
    if (feof (stdin))
      break;

    error (EXIT_FAILURE, errno, _("input error"));
  }

  if (line[strlen (line) - 1] == '\n')
    line[strlen (line) - 1] = '\0';
  ....
}

A bit surprised? Well, it happens. A new vulnerability, here is the corresponding CVE: CVE-2016-6262.

PVS-Studio warning: V1010 CWE-20 Unchecked tainted data is used in index: 'strlen(line)'.

After another attempt, the problem was fixed by adding a check of the length of the input string:

if (strlen (line) > 0)
  if (line[strlen (line) - 1] == '\n')
    line[strlen (line) - 1] = '\0';

Let's take a look at the dates. Commit 'closing' CVE-2015-8948 - 10.08.2015. Commit closing CVE-2016-62-62 - 14.01.2016. So the difference between given fixes is 5 months! Here's the moment when you remind yourself about such an advantage of a static analyzer, as a detection of errors at the early stages of writing code...

Static Analysis and Security

I'm not going to give code examples from now on, instead of this - statistics and reasoning. In this section the author's opinion might be much more different than earlier in this article. : )

Note. I recommend checking out another article on the similar topic -"How Can PVS-Studio Help in the Detection of Vulnerabilities?". There are interesting examples of vulnerabilities that look like simple errors. Additionally, in that article, I talked a bit about terminology and the question why static analysis is a must have if you care about security issue.

Let's take a look at statistics about the number of detected vulnerabilities over the past 10 years to assess the situation. I took this data from the CVE Details site.

We've got an interesting situation here. Until 2014, the number of reported CVE hadn't not exceed 6000 units, and since than it hasn't been less. The most interesting thing here is, of course, statistics for the year 2017, which is the absolute leader (14714 units). With regard to the current year 2018, it hasn't ended yet, but already beats records - 15310 units.

Does this mean that all new soft is leaky as a sieve? I don't think so, and here's why:

  • Increased interest in the topic of vulnerabilities. Surely, even if you are not very close to the security issue, you must have repeatedly stumbled on articles, notes, reports and video dedicated to the topic of security. In other words, some sort of hype was created around it. Is that a bad thing? Well, it's not. Ultimately it all comes down to the fact that developers are more concerned about the security of applications, which is good.
  • The increase in the number of applications. Much code - the greater probability of any vulnerability that will fill up statistics.
  • Improved vulnerability searching tools and code quality assurance. The greater supply -> the greater demand. Analyzers, fuzzers and other tools are becoming more sophisticated, that plays into the hands for those who want to look for vulnerabilities (regardless of which side of the barricades they are).

So the emerging trend cannot be described as purely negative - vendors are more concerned about information security, problem searching tools are improving, undoubtedly, in a positive way.

Does this mean that you we relax and take it easy? I think not. If you are concerned about the security of your applications you should take as many security measures as possible. This is especially true when the source code is publicly available, because it:

  • is more prone to introducing vulnerabilities from outside;
  • is more prone to "sensing" by "gentlemen" who are interested in holes in your application with a view of their exploitation. Although well-wishers in this case will be able to help you anymore.

I don't want to say that you do not need to open source your projects. Just be mindful of proper quality control measures / security.

Is static analysis an additional measure in this regard? Yes! Static analysis is good at finding potential vulnerabilities that can later become quite real.

It seems to me (admittedly, that wrong) that many consider vulnerabilities as a fairly high-level phenomenon. Well, yes and no. Problems in code that seem to be simple programming errors, may well be serious vulnerabilities. Again, some examples of such vulnerabilities are listed in the article mentioned earlier. We shouldn't underestimate 'simple' errors.

Conclusion

Don't forget that input data can have a zero length, it's necessary to take it into account.

Draw your own conclusions whether all this hype about vulnerabilities is just a fuss or there is a real problem.

On my part, I'll just suggest trying PVS-Studio on your project if you haven't done so already.

All the best!