To get a trial key
fill out the form below
Team License (standard version)
Enterprise License (extended version)
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Request our prices
New License
License Renewal
--Select currency--
USD
EUR
GBP
RUB
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Message submitted.

Your message has been sent. We will email you at


If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

>
>
>
A Post About Analyzing PHP

A Post About Analyzing PHP

Sept. 1, 2014

PHP is a server-side scripting language designed for web development but also used as a general-purpose programming language. As of January 2013, PHP was installed on more than 240 million websites (39% of those sampled) and 2.1 million web servers. Originally created by Rasmus Lerdorf in 1994, the reference implementation of PHP (powered by the Zend Engine) is now produced by The PHP Group. While PHP originally stood for Personal Home Page, it now stands for PHP: Hypertext Preprocessor, which is a recursive acronym.

When developing compilers and interpreters, their source code and its testing procedure are demanded to comply with especially strict quality and reliability requirements. However, there are still some suspicious fragments found in the PHP interpreter's source code.

In this article, we are going to discuss the results of the check of the PHPhttp://www.asterisk.org/ interpreter by PVS-Studio 5.18.

0277_php/image1.png

Identical conditional expressions

V501 There are identical sub-expressions '!memcmp("auto", charset_hint, 4)' to the left and to the right of the '||' operator. html.c 396

static enum
entity_charset determine_charset(char *charset_hint TSRMLS_DC)
{
  ....
  if ((len == 4) /* sizeof (none|auto|pass) */ && // <=
    (!memcmp("pass", charset_hint, 4) ||
     !memcmp("auto", charset_hint, 4) ||          // <=
     !memcmp("auto", charset_hint, 4)))           // <=
  {
       charset_hint = NULL;
      len = 0;
  }
  ....
}

The conditional expression contains a few calls of the 'memcmp' function with identical arguments. The comment /* sizeof (none|auto|pass) */ suggests that the "none" value should be passed into one of the functions.

Always false condition

V605 Consider verifying the expression: shell_wrote > - 1. An unsigned value is compared to the number -1. php_cli.c 266

PHP_CLI_API size_t sapi_cli_single_write(....)
{
  ....
  size_t shell_wrote;
  shell_wrote = cli_shell_callbacks.cli_shell_write(....);
  if (shell_wrote > -1) {  // <=
    return shell_wrote;
  }
  ....
}

This comparison is an evident error. The value '-1' turns into the largest value of the 'size_t' type, so the condition will always be false, thus making the entire check absolutely meaningless. Perhaps the 'shell_wrote' variable used to be signed earlier but then refactoring was done and the programmer forgot about the specifics of operations over unsigned types.

Incorrect condition

V547 Expression 'tmp_len >= 0' is always true. Unsigned type value is always >= 0. ftp_fopen_wrapper.c 639

static size_t php_ftp_dirstream_read(....)
{
  size_t tmp_len;
  ....
  /* Trim off trailing whitespace characters */
  tmp_len--;
  while (tmp_len >= 0 &&                  // <=
    (ent->d_name[tmp_len] == '\n' ||
     ent->d_name[tmp_len] == '\r' ||
     ent->d_name[tmp_len] == '\t' ||
     ent->d_name[tmp_len] == ' ')) {
       ent->d_name[tmp_len--] = '\0';
  }
  ....
}

The 'size_t' type, being unsigned, allows one to index the maximum number of array items possible under the current application's bitness. The (tmp_len >= 0) check is incorrect. In the worst case, the decrement may cause an index overflow and addressing memory outside the array's boundaries. The code executing correctly is most probably thanks to additional conditions and correct input data; however, there is still the danger of a possible infinite loop or array overrun in this code.

Difference of unsigned numbers

V555 The expression 'out_buf_size - ocnt > 0' will work as 'out_buf_size != ocnt'. filters.c 1702

static int strfilter_convert_append_bucket(
{
  size_t out_buf_size;
  ....
  size_t ocnt, icnt, tcnt;
  ....
  if (out_buf_size - ocnt > 0) { // <=
    ....
    php_stream_bucket_append(buckets_out, new_bucket TSRMLS_CC);
  } else {
    pefree(out_buf, persistent);
  }
  ....
}

It may be that the 'else' branch executes more rarely than it should as the difference of unsigned numbers is almost always larger than zero. The only exception is when the operands are equal. Then the condition should be changed to a more informative version.

Pointer dereferencing

V595 The 'function_name' pointer was utilized before it was verified against nullptr. Check lines: 4859, 4860. basic_functions.c 4859

static int user_shutdown_function_call(zval *zv TSRMLS_DC)
{
  ....
  php_error(E_WARNING, "....", function_name->val);  // <=
  if (function_name) {                               // <=
    STR_RELEASE(function_name);
  }
  ....
}

Checking a pointer after dereferencing always alerts me. If a real error occurs, the program may crash.

Another similar issue:

  • V595 The 'callback_name' pointer was utilized before it was verified against nullptr. Check lines: 5007, 5021. basic_functions.c 5007

Insidious optimization

V597 The compiler could delete the 'memset' function call, which is used to flush 'final' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. php_crypt_r.c 421

/*
 * MD5 password encryption.
 */
char* php_md5_crypt_r(const char *pw,const char *salt, char *out)
{
  static char passwd[MD5_HASH_MAX_LEN], *p;
  unsigned char final[16];
  ....
  /* Don't leave anything around in vm they could use. */
  memset(final, 0, sizeof(final));  // <=
  return (passwd);
}

The 'final' array may contain private password information which is then cleared, but the call of the 'memset' function will be removed by the compiler. To learn more why it may happen and what is dangerous about it, see the article "Overwriting memory - why?" and the description of the V597 diagnostic.

Other similar issues:

  • V597 The compiler could delete the 'memset' function call, which is used to flush 'final' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. php_crypt_r.c 421
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'output' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. crypt.c 214
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'temp_result' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. crypt_sha512.c 622
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'ctx' object. The RtlSecureZeroMemory() function should be used to erase the private data. crypt_sha512.c 625
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'alt_ctx' object. The RtlSecureZeroMemory() function should be used to erase the private data. crypt_sha512.c 626
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'temp_result' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. crypt_sha256.c 574
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'ctx' object. The RtlSecureZeroMemory() function should be used to erase the private data. crypt_sha256.c 577
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'alt_ctx' object. The RtlSecureZeroMemory() function should be used to erase the private data. crypt_sha256.c 578

Can we trust the libraries we use?

Third-party libraries do make a large contribution to project development allowing one to reuse already implemented algorithms, but their quality should be checked as carefully as that of the basic project code. I will cite just a few examples from third-party libraries to meet the article's topic and simply muse over the question of our trust in third-party libraries.

The PHP interpreter employs plenty of libraries, some of them slightly customized by the authors for their needs.

libsqlite

V579 The sqlite3_result_blob function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. sqlite3.c 82631

static void statInit(....)
{
  Stat4Accum *p;
  ....
  sqlite3_result_blob(context, p, sizeof(p), stat4Destructor);
  ....
}

I guess the programmer wanted to get the size of the object, not the pointer. So it should have been sizeof(*p).

pcrelib

V501 There are identical sub-expressions '(1 << ucp_gbL)' to the left and to the right of the '|' operator. pcre_tables.c 161

const pcre_uint32 PRIV(ucp_gbtable[]) = {
  (1<<ucp_gbLF),
  0,
  0,
  ....
  (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbL)|    // <=
    (1<<ucp_gbL)|(1<<ucp_gbV)|(1<<ucp_gbLV)|(1<<ucp_gbLVT), // <=

   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbV)|
     (1<<ucp_gbT),
  ....
};

The expression calculating one array item contains the repeating (1<<ucp_gbL) statement. Judging by the code following this fragment, one of the ucp_gbL variables was meant to be named ucp_gbT, or it is just an unnecessary one.

PDO

V595 The 'dbh' pointer was utilized before it was verified against nullptr. Check lines: 103, 110. pdo_dbh.c 103

PDO_API void pdo_handle_error(pdo_dbh_t *dbh, ....)
{
  pdo_error_type *pdo_err = &dbh->error_code;  // <=
  ....
  if (dbh == NULL || dbh->error_mode == PDO_ERRMODE_SILENT) {
    return;
  }
  ....
}

In this fragment, in the very beginning of the function, a received pointer is dereferenced and then is checked for being null.

libmagic

V519 The '* code' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 100, 101. encoding.c 101

protected int file_encoding(...., const char **code, ....)
{
  if (looks_ascii(buf, nbytes, *ubuf, ulen)) {
    ....
  } else if (looks_utf8_with_BOM(buf, nbytes, *ubuf, ulen) > 0) {
    DPRINTF(("utf8/bom %" SIZE_T_FORMAT "u\n", *ulen));
    *code = "UTF-8 Unicode (with BOM)";
    *code_mime = "utf-8";
  } else if (file_looks_utf8(buf, nbytes, *ubuf, ulen) > 1) {
    DPRINTF(("utf8 %" SIZE_T_FORMAT "u\n", *ulen));
    *code = "UTF-8 Unicode (with BOM)";                     // <=
    *code = "UTF-8 Unicode";                                // <=
    *code_mime = "utf-8";
  } else if (....) {
    ....
  }
}

The character set was twice written into the variable. One of these statements is redundant and may cause incorrect program behavior somewhere later.

Conclusion

Despite that PHP has existed for a long time already and is pretty famous, there are still a few suspicious fragments to be found in its basic code and the third-party libraries it employs, although a project like that is very likely to be checked by various analyzers.

Using static analysis regularly will help you save much time you can spend on solving more useful tasks.

Popular related articles
Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities

Date: 11.21.2018

Author: Andrey Karpov

A brief description of technologies used in the PVS-Studio tool, which let us effectively detect a large number of error patterns and potential vulnerabilities. The article describes the implementati…
The Evil within the Comparison Functions

Date: 05.19.2017

Author: Andrey Karpov

Perhaps, readers remember my article titled "Last line effect". It describes a pattern I've once noticed: in most cases programmers make an error in the last line of similar text blocks. Now I want t…
PVS-Studio for Java

Date: 01.17.2019

Author: Andrey Karpov

In the seventh version of the PVS-Studio static analyzer, we added support of the Java language. It's time for a brief story of how we've started making support of the Java language, how far we've co…
The Ultimate Question of Programming, Refactoring, and Everything

Date: 04.14.2016

Author: Andrey Karpov

Yes, you've guessed correctly - the answer is "42". In this article you will find 42 recommendations about coding in C++ that can help a programmer avoid a lot of errors, save time and effort. The au…
Static analysis as part of the development process in Unreal Engine

Date: 06.27.2017

Author: Andrey Karpov

Unreal Engine continues to develop as new code is added and previously written code is changed. What is the inevitable consequence of ongoing development in a project? The emergence of new bugs in th…
The way static analyzers fight against false positives, and why they do it

Date: 03.20.2017

Author: Andrey Karpov

In my previous article I wrote that I don't like the approach of evaluating the efficiency of static analyzers with the help of synthetic tests. In that article, I give the example of a code fragment…
PVS-Studio ROI

Date: 01.30.2019

Author: Andrey Karpov

Occasionally, we're asked a question, what monetary value the company will receive from using PVS-Studio. We decided to draw up a response in the form of an article and provide tables, which will sho…
Appreciate Static Code Analysis!

Date: 10.16.2017

Author: Andrey Karpov

I am really astonished by the capabilities of static code analysis even though I am one of the developers of PVS-Studio analyzer myself. The tool surprised me the other day as it turned out to be sma…
Free PVS-Studio for those who develops open source projects

Date: 12.22.2018

Author: Andrey Karpov

On the New 2019 year's eve, a PVS-Studio team decided to make a nice gift for all contributors of open-source projects hosted on GitHub, GitLab or Bitbucket. They are given free usage of PVS-Studio s…
How PVS-Studio Proved to Be More Attentive Than Three and a Half Programmers

Date: 10.22.2018

Author: Andrey Karpov

Just like other static analyzers, PVS-Studio often produces false positives. What you are about to read is a short story where I'll tell you how PVS-Studio proved, just one more time, to be more atte…

Comments (0)

Next comments

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
This website uses cookies and other technology to provide you a more personalized experience. By continuing the view of our web-pages you accept the terms of using these files. If you don't want your personal data to be processed, please, leave this site.
Learn More →
Accept