We delved into the world of web development and discovered a bizarre creature—an elephant with the habits of a bug. While exploring the PHP project, we found the reason behind its strange behavior. Let's take a peek at some unusual cases that can lead to unexpected results.
PHP is one of the most popular scripting languages for server-side web programming. As an open-source project, it naturally caught our attention.
We've already analyzed this project a few years ago, but as time passes, the product evolves, our tools become more advanced, and developers don't stop to invent new ways to shoot themselves in the foot.
In this article, we'll talk about the C and C++ languages, where undefined behavior lurks around every corner. We even recently published an entire book on this topic: A C++ Programmer's Guide to Undefined Behavior.
Let's get back to our elephants in a bug's clothing. We checked the PHP project at the c998c36
commit using PVS-Studio analyzer and the Visual Studio Code plugin.
Fragment N1
int fcgi_listen(const char *path, int backlog)
{
char *s;
int tcp = 0;
....
if (!tcp) {
chmod(path, 0777);
}
....
}
The PVS-Studio warning:
V1118 Excessive file permissions can lead to vulnerabilities. Consider restricting file permissions. fastcgi.c 761
Here is a small code fragment—what could be wrong here? Let's figure it out. The chmod
function is used to set file permissions. In this case, the 0777
mask allows all users to read, modify, or execute this file. This can lead to serious consequences: attackers can exploit it and disrupt the program's operation. To avoid this, it's recommended to use more gentle mode and additional conditions for setting access rights.
By the way, this is the first error we detected using this diagnostic rule. It was introduced recently, but is already yielding fruitful results. Try PVS-Studio on your project—it's free for evaluation and can help you catch security issues in your code.
Fragment N2
This fragment contains over a thousand characters in a single line. There's no way to spot the problem like that, so we've split it up for the article. By the way, we can understand what the problem with code sausage is in a short article.
static inline zend_string* phar_get_stub(....)
{
....
static const char newstub1_1[] =
"Extract_Phar::$temp))
{
\nheader('HTTP/1.0 404 Not Found');
\necho
\"<html>
\\n <head>
\\n <title>File Not Found<title> // <=
\\n </head>
\\n <body>
\\n <h1>404 - File Not Found</h1>
\\n </body>
\\n</html>\";
// part of the string is hidden
";
....
}
The PVS-Studio warning:
V735 Possibly an incorrect HTML. The "\</head\>" closing tag was encountered, while the "\</title\>" tag was expected. stub.h 23
We didn't even notice the elephant in the room. The PHP project has caught the warning about an incorrectly closed tag. It's such a tiny error, but it breaks the entire layout, and the browser refuses to show this page. The fix? Just swap the opening tag for the closing one:
\\n <title>File Not Found</title>
Fragment N3
static void zend_compile_class_decl(....)
{
....
if (....) {
if (toplevel) {
if (extends_ast) {
zend_class_entry *parent_ce = zend_lookup_class_ex(
ce->parent_name, NULL, ZEND_FETCH_CLASS_NO_AUTOLOAD); // <=
....
}
}
}
....
if (ce->parent_name) { // <=
/* Lowercased parent name */
zend_string *lc_parent_name = zend_string_tolower(ce->parent_name);
opline->op2_type = IS_CONST;
LITERAL_STR(opline->op2, lc_parent_name);
}
}
Here we can see the following pattern: the ce->parent_name
pointer is passed as an argument to the zend_lookup_class_ex
function, and then checked for NULL
. This implies that the pointer can be NULL
and used only after the check.
But here, it's passed to the zend_lookup_class_ex
function without any checks. If there is an additional condition inside the function, there is nothing wrong with that. Let's take a look inside the function: the analyzer has already checked it and pointed to the line with dereference:
ZEND_API zend_class_entry *zend_lookup_class_ex(zend_string *name,
zend_string *key,
uint32_t flags)
{
zend_class_entry *ce = NULL;
zval *zv;
zend_string *lc_name;
zend_string *autoload_name;
uint32_t ce_cache = 0;
if (( zval_gc_flags((name)->gc.u.type_info) & (1<<5)) // <=
&& __builtin_expect(
!!((zend_gc_refcount(&(name)->gc)-1)/sizeof(void *)
<(compiler_globals.map_ptr_last)), 1)) {
ce_cache = GC_REFCOUNT(name);
ce = GET_CE_CACHE(ce_cache);
if (EXPECTED(ce)) {
return ce;
}
}
....
}
Boom! The name
parameter, which received the ce->parent_name
pointer, is used without any check. As we know, dereferencing a null pointer leads to undefined behavior and it could be anything.
The PVS-Studio warning:
V595 The 'ce->parent_name' pointer was utilized before it was verified against nullptr. Check lines: 'zend_execute_API.c:1169', 'zend_compile.c:9135', 'zend_compile.c:9165'. zend_compile.c 9135
Fragment N4
ZEND_API zend_result zend_register_functions(...)
{
while (ptr->fname) {
reg_function = malloc(sizeof(zend_internal_function));
memcpy(reg_function, &function, sizeof(zend_internal_function));
}
}
The PVS-Studio warning:
V575 The potential null pointer is passed into 'memcpy' function. Inspect the first argument. Check lines: 3069, 3068. zend_API.c 3069
The analyzer issues a warning about undefined behavior when the null pointer is passed to the memcpy
function. "But where's the NULL
pointer here? We can see the memory allocation in the above line," an attentive reader might argue. Unfortunately, here's the catch: if memory allocation fails, the malloc
function may return the null pointer. Therefore, after allocating memory, we need to check for NULL
.
Fragment N5
ZEND_API void zend_collect_module_handlers(void)
{
modules_dl_loaded = realloc(modules_dl_loaded,
sizeof(zend_module_entry*)
* (dl_loaded_count + 1));
modules_dl_loaded[dl_loaded_count] = NULL;
}
The PVS-Studio warning:
V522 There might be dereferencing of a potential null pointer 'modules_dl_loaded'. Check lines: 2527, 2526. zend_API.c 2527
Working with memory is the core of C and C++; that's why it's so easy to shoot ourselves in the foot. Just like the malloc
function, the realloc
function, may return the null pointer.
Fragment N6
ZEND_API void zend_collect_module_handlers(void)
{
modules_dl_loaded = realloc(modules_dl_loaded,
sizeof(zend_module_entry*)
* (dl_loaded_count + 1));
modules_dl_loaded[dl_loaded_count] = NULL;
}
The PVS-Studio warning:
V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'modules_dl_loaded' is lost. Consider assigning realloc() to a temporary pointer. zend_API.c 2526
Let's check your attention! Do you notice that this is the same code fragment, yep? But there's another issue here: developers pass the same pointer with the returned result to the realloc
function. If the function returns NULL
, the original pointer will be lost, leading to a memory leak.
Fragment N7
typedef union _znode_op {
uint32_t num;
....
}
struct _zend_op {
_znode_op op2;
....
}
struct _zend_op_array {
uint32_t num_dynamic_func_defs;
....
}
typedef struct _zend_op_array zend_op_array;
static void preload_remove_declares(zend_op_array *op_array)
{
zend_op *opline = op_array->opcodes;
....
while (opline != end) {
switch (opline->opcode) {
....
case ZEND_DECLARE_FUNCTION:
....
if (func && func == op_array->dynamic_func_defs[opline->op2.num]) {
....
if (op_array->num_dynamic_func_defs == 0) {
dynamic_func_defs = NULL;
} else {
....
if (op_array->num_dynamic_func_defs - opline->op2.num > 0) { // <=
memcpy(
dynamic_func_defs + opline->op2.num,
op_array->dynamic_func_defs + (opline->op2.num + 1),
sizeof(zend_op_array*)
* (op_array->num_dynamic_func_defs - opline->op2.num));
}
// the rest of the code
}
It's time to turn on our mathematical brains. The op_array->num_dynamic_func_defs
and opline->op2.num
variables are unsigned types, which means that the expression will also be unsigned and can never be less than 0
.
What do we see here? It's a subtraction of two unsigned values. If the minuend is less than the subtrahend, the result will be a value less than zero. However, since there are unsigned values, we get the modulo arithmetic, 0u - 1 == UINT_MAX
. So, the condition can only be false when the variables aren't equal to each other.
This expression should be replaced with the following:
if (op_array->num_dynamic_func_defs > opline->op2.num)
Now the code is shorter and cleaner. Maybe the developer made a logical mistake.
The PVS-Studio warning:
V555 The expression of the 'A - B > 0' kind will work as 'A != B'. ZendAccelerator.c 3915
Fragment N8
uint64_t flags
....
PHPDBG_API void phpdbg_delete_breakpoint(zend_ulong num)
{
....
if ((brake = phpdbg_find_breakbase_ex(num, &table, &numkey, &strkey))) {
int type = brake->type;
char *name = NULL;
size_t name_len = 0L;
switch (type) {
....
default: {
if (zend_hash_num_elements(table) == 1) {
PHPDBG_G(flags) &= ~(1<<(brake->type+1)); // <=
}
}
}
....
}
}
The PVS-Studio warnings:
V629 Consider inspecting the '1 << (brake->type + 1)' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. phpdbg_bp.c 1209
V784 The size of the bit mask is less than the size of the first operand. This will cause the loss of higher bits. phpdbg_bp.c 1209
Mathematicians, no time to relax. The flags
variable has the unsigned long int
type, while brake->type
has int
. The code is designed to remove a specific bit in flags
. Now, let's take a closer look at what's really going on:
1
constant of the int
type is shifted left by a certain number of bits. Most often, int
is 32 bits. We hope that the shift isn't by 32 or more bits, otherwise it leads to undefined behavior.int
type.flags
. The loss of significant bits in flags
will occur when the right operand is positive. It'll only be so when there is a 31-bit shift to the left, i.e., when the 31st bit in flags
needs to be cleared. Catch a proof.Notice how much you need to keep in mind for such a harmless expression? The problem lies in the different operand sizes and signedness in some subexpressions. To fix it, developers just need to change the type of the 1
constant from int
to unsigned long long
, and everything will work as intended:
PHPDBG_G(flags) &= ~( 1uLL <<(brake->type+1));
Fragment N9
typedef signed long long timelib_sll;
....
timelib_sll timelib_get_current_offset(timelib_time *t);
....
static void php_do_date_sunrise_sunset(INTERNAL_FUNCTION_PARAMETERS,
bool calc_sunset)
{
double latitude, longitude, zenith, gmt_offset, altitude;
....
if (gmt_offset_is_null) {
gmt_offset = timelib_get_current_offset(t) / 3600; // <=
}
....
}
The PVS-Studio warning:
V636 The 'timelib_get_current_offset(t) / 3600' expression was implicitly cast from 'long long' type to 'double' type. Consider utilizing an explicit type cast to avoid the loss of a fractional part. An example: double A = (double)(X) / Y;. php_date.c 5557
The result of dividing two integer types is an integer type. So why was the result assigned to a floating-point variable?
To preserve calculation accuracy, one of the operands should be converted to the double
type:
gmt_offset = timelib_get_current_offset(t) / 3600.0;
Fragment N10
static zend_always_inline zend_result _zend_update_type_info(....)
{
....
if (ssa_op->result_def >= 0) {
tmp = MAY_BE_ANY | MAY_BE_ARRAY_KEY_ANY
| MAY_BE_ARRAY_OF_ANY | MAY_BE_ARRAY_OF_REF;
if (opline->result_type == IS_TMP_VAR) {
if ( opline->opcode == ZEND_FETCH_R
|| opline->opcode == ZEND_FETCH_IS) {
/* Variable reference counter may be decremented before use */
/* See: ext/opcache/tests/jit/fetch_r_001.phpt */
tmp |= MAY_BE_RC1 | MAY_BE_RCN;
} else {
tmp |= MAY_BE_RC1 | MAY_BE_RCN;
}
}
}
....
}
The PVS-Studio warning:
V523 The 'then' statement is equivalent to the 'else' statement. zend_inference.c 4078
This is a classic copy-paste error. Regardless of how the if
statement evaluates, the code performs exactly the same operation—it adds the MAY_BE_RC1 | MAY_BE_RCN
flags to the tmp
variable.
If there really should be different logic here, it's now missing, and the code doesn't function as expected. If it's not, why do developers use if
?
Fragment N11
#define zend_error_noreturn_impl(type, format) do { \
zend_string *filename; \
uint32_t lineno; \
va_list args; \
get_filename_lineno(type, &filename, &lineno); \
va_start(args, format); \
zend_error_va_list(type, filename, lineno, format, args); \
va_end(args); \
/* Should never reach this. */ \
abort(); \ // <=
} while (0)
....
ZEND_API ZEND_COLD ZEND_NORETURN
void zend_error_noreturn(int type, const char *format, ...)
{
zend_error_noreturn_impl(type, format);
}
....
ZEND_API zend_result zend_startup_module_ex(zend_module_entry *module)
{
....
if (module->module_startup_func) {
EG(current_module) = module;
if (module->module_startup_func(module->type,
module->module_number)==FAILURE) {
zend_error_noreturn(E_CORE_ERROR,
"Unable to start %s module",
module->name);
EG(current_module) = NULL; // <=
return FAILURE;
}
EG(current_module) = NULL;
}
return SUCCESS;
}
The PVS-Studio warning:
V779 Unreachable code detected. It is possible that an error is present. zend_API.c 2437
Unreachable code after calling a function that doesn't return control flow. Such code is not only useless, but also misleading.
After digging through the logs a bit, we found an interesting commit from 10 years ago. Previously, errors were simply logged and execution continued, but then this principle was changed for critical errors. In the commit, developers replaced a call to the zend_error
function, which returns control flow, with a call to zend_error_noreturn
, but the surrounding code wasn't refactored accordingly.
Fragment N12
static void check_conflict(LexState*ls,struct LHS_assign*lh,expdesc*v){
FuncState*fs = ls->fs;
int extra = fs->freereg;
int conflict = 0;
for(; lh; lh = lh->prev){
if(lh->v.k == VINDEXED){
if(lh->v.u.s.info == v->u.s.info){ // <=
conflict = 1;
lh->v.u.s.info = extra;
}
if(lh->v.u.s.aux == v->u.s.info){ // <=
conflict = 1;
lh->v.u.s.aux = extra;
}
}
}
if(conflict){
luaK_codeABC(fs, OP_MOVE, fs->freereg, v->u.s.info, 0);
luaK_reserveregs(fs, 1);
}
}
The PVS-Studio warning:
V778 Two similar code fragments were found. Perhaps, this is a typo and 'aux' variable should be used instead of 'info'. minilua.c 4331
There are two similar code fragments. Manually writing code instead of copying may be a bit tedious—but it helps avoid subtle bugs like this one.
In this case, not all occurrences of info
were correctly replaced with aux
in the condition. As a result, the program may behave incorrectly or produce invalid output.
That wraps up our journey through the tangled jungle of the PHP code. Let's keep working to make software safer. I would like to remind you that PVS-Studio has free licenses for open-source projects and educational purposes. For others, a trial version of the analyzer is available—give it a try!
0