Perl 5 was chosen to expand the list of open source programming languages that have been tested using the PVS-Studio static code analyzer. This article covers the errors found and the difficulties of reviewing the analysis results. The code contains so many macros that it seems to be written not in C but in a peculiar dialect of it. Despite the difficulties of reading the code, it was possible to collect interesting problems, which are demonstrated in this article.
Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. Development of Perl 5 started in 1994. A couple of decades later, C code with this many macros makes today's developers nervous.
Perl 5 source code was taken from the official repository (branch blead). To check the project, the PVS-Studio static code analyzer was used. The analysis was performed on the Linux operating system, but the analyzer is also available on Windows and macOS.
Reviewing the analysis results was not a simple task. The analyzer checks the preprocessed .i files, in which all preprocessor directives are already expanded, but issues warnings against the source code files. This is the correct behavior and nothing needs to be changed, but many of the warnings point at macros, and unreadable code lies behind those macros.
V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '-' operator. toke.c 9494
STATIC char *
S_scan_ident(pTHX_ char *s, char *dest, STRLEN destlen, I32 ck_uni)
{
....
if ((s <= PL_bufend - (is_utf8)
? UTF8SKIP(s)
: 1)
&& VALID_LEN_ONE_IDENT(s, PL_bufend, is_utf8))
{
....
}
....
}
Let's start the overview with a nice error. Every few code reviews I have to repeat that the ternary operator has almost the lowest precedence of all operators.
Let's look at the following code fragment with an error:
s <= PL_bufend - (is_utf8) ? UTF8SKIP(s) : 1
Order of operations that the programmer expects:
s <= (PL_bufend - ((is_utf8) ? UTF8SKIP(s) : 1))
What happens in reality:
(s <= (PL_bufend - (is_utf8))) ? UTF8SKIP(s) : 1
Here is a chart with operations priorities: "Operation priorities in C/C++".
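The difference between the two readings can be reproduced in a few lines. This is a minimal sketch, not Perl code: char_len is a hypothetical stand-in for UTF8SKIP(s), and plain offsets replace the s and PL_bufend pointers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for UTF8SKIP(s): length in bytes of one character. */
static size_t char_len(bool is_utf8) { return is_utf8 ? 3 : 1; }

/* As written: parsed as (pos <= end - is_utf8) ? char_len(...) : 1.
   Both results of ?: are nonzero, so the function always returns true. */
static bool fits_as_written(size_t pos, size_t end, bool is_utf8)
{
    return pos <= end - (is_utf8) ? char_len(is_utf8) : 1;
}

/* As presumably intended: the character length is subtracted first,
   and only then is the comparison made. */
static bool fits_as_intended(size_t pos, size_t end, bool is_utf8)
{
    return pos <= end - ((is_utf8) ? char_len(is_utf8) : 1);
}
```

With pos = 10 and end = 5, fits_as_written still reports true, while fits_as_intended reports false.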
V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator. re_exec.c 9193
STATIC I32
S_regrepeat(pTHX_ regexp *prog, char **startposp, const regnode *p,
regmatch_info *const reginfo, I32 max _pDEPTH)
{
....
assert(STR_LEN(p) == reginfo->is_utf8_pat ? UTF8SKIP(STRING(p)) : 1);
....
}
This code contains a similar error. If you do not keep operator precedence in mind, you can make this mistake in an expression of any size.
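The effect on an assert can be sketched as follows. The function names and the skip parameter are hypothetical; skip plays the role of UTF8SKIP(STRING(p)) and is assumed to be nonzero.

```c
#include <stdbool.h>

/* As written: parsed as (len == is_utf8) ? skip : 1.
   The result is either skip or 1, so with a nonzero skip it is never 0
   and an assert on this expression can never fire. */
static int check_as_written(int len, bool is_utf8, int skip)
{
    return len == is_utf8 ? skip : 1;
}

/* As presumably intended: len is compared with the selected length. */
static int check_as_intended(int len, bool is_utf8, int skip)
{
    return len == (is_utf8 ? skip : 1);
}
```

For len = 5 in a non-UTF-8 pattern, check_as_written yields a nonzero value even though the length is wrong, whereas check_as_intended yields 0.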
Another place with an assert:
V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '&&' operator. pp_hot.c 3036
PP(pp_match)
{
....
MgBYTEPOS_set(mg, TARG, truebase, RXp_OFFS(prog)[0].end);
....
}
And here is a warning for a macro... Even the macro's definition will not help to understand what is happening, because it uses several other macros itself!
Therefore, I cite the fragment of the preprocessed file for this line of code:
(((targ)->sv_flags & 0x00000400) && (!((targ)->sv_flags & 0x00200000) ||
S_sv_only_taint_gmagic(targ)) ? (mg)->mg_len = ((prog->offs)[0].end),
(mg)->mg_flags |= 0x40 : ((mg)->mg_len = (((targ)->sv_flags & 0x20000000)
&& !__builtin_expect(((((PL_curcop)->cop_hints + 0) & 0x00000008) ?
(_Bool)1 :(_Bool)0),(0))) ? (ssize_t)Perl_utf8_length( (U8 *)(truebase),
(U8 *)(truebase)+((prog->offs)[0].end)) : (ssize_t)((prog->offs)[0].end),
(mg)->mg_flags &= ~0x40));
Somewhere in here the analyzer questions the proper use of the ternary operator (there are three of them), but I did not have the energy to work out what is going on in this code. We have already seen that the developers make such errors, so one is likely here as well.
Three more cases of using this macro:
Note by a colleague Andrey Karpov. I have been meditating for 10 minutes on this code and I'm inclined to the view that there are no errors. Anyway, it's very painful to read such code, and it's better not to write this way.
V523 The 'then' statement is equivalent to the 'else' statement. toke.c 12056
static U8 *
S_add_utf16_textfilter(pTHX_ U8 *const s, bool reversed)
{
....
SvCUR_set(PL_linestr, 0);
if (FILTER_READ(0, PL_linestr, 0)) {
SvUTF8_on(PL_linestr);
} else {
SvUTF8_on(PL_linestr);
}
PL_bufend = SvEND(PL_linestr);
return (U8*)SvPVX(PL_linestr);
}
I think you do not need to inspect the contents of the macros to see that the suspiciously duplicated code fragments are really there.
V564 The '|' operator is applied to bool type value. You have probably forgotten to include parentheses or intended to use the '||' operator. op.c 11494
OP *
Perl_ck_rvconst(pTHX_ OP *o)
{
....
gv = gv_fetchsv(kidsv,
o->op_type == OP_RV2CV
&& o->op_private & OPpMAY_RETURN_CONSTANT
? GV_NOEXPAND
: iscv | !(kid->op_private & OPpCONST_ENTERED), iscv // <=
? SVt_PVCV
: o->op_type == OP_RV2SV
? SVt_PV
: o->op_type == OP_RV2AV
? SVt_PVAV
: o->op_type == OP_RV2HV
? SVt_PVHV
: SVt_PVGV);
....
}
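This code is very strange. The analyzer points at the "iscv | !(kid->op_private & OPpCONST_ENTERED)" expression, where the bitwise '|' is applied to the result of a logical negation. If the comma after it is read as the comma operator, the value of this expression is not used at all; note, however, that gv_fetchsv takes three arguments, so this comma may simply separate the flags argument from the type argument. In any case, the line looks like it may contain a typo. For example, it is possible that this should have been written here: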
: iscv = !(kid->op_private & OPpCONST_ENTERED), iscv // <=
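To see why the analyzer singles out '|' next to '!', it helps to compare the two operators on the same operands. This is a sketch with hypothetical names; FLAG_CV is an illustrative flag bit and is not taken from the Perl sources.

```c
#define FLAG_CV 0x2   /* hypothetical flag bit, for illustration only */

/* Bitwise OR: keeps the bits of iscv and adds the 0 or 1 produced by '!'. */
static int combine_bitwise(int iscv, int entered)
{
    return iscv | !entered;
}

/* Logical OR: collapses the whole expression to 0 or 1. */
static int combine_logical(int iscv, int entered)
{
    return iscv || !entered;
}
```

When a flags value is being assembled, the bitwise form may be exactly what is wanted, which is why such a warning has to be judged against the meaning of the operands.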
V547 Expression 'RETVAL == 0' is always true. Typemap.c 710
XS_EUPXS(XS_XS__Typemap_T_SYSRET_pass);
XS_EUPXS(XS_XS__Typemap_T_SYSRET_pass)
{
dVAR; dXSARGS;
if (items != 0)
croak_xs_usage(cv, "");
{
SysRet RETVAL;
#line 370 "Typemap.xs"
RETVAL = 0;
#line 706 "Typemap.c"
{
SV * RETVALSV;
RETVALSV = sv_newmortal();
if (RETVAL != -1) { // <=
if (RETVAL == 0) // <=
sv_setpvn(RETVALSV, "0 but true", 10);
else
sv_setiv(RETVALSV, (IV)RETVAL);
}
ST(0) = RETVALSV;
}
}
XSRETURN(1);
}
The RETVAL variable is checked twice in a row. However, the code shows that this variable is always equal to zero. Perhaps in one or both conditions the developer wanted to check the RETVALSV pointer but made a typo.
The analyzer has several diagnostic rules that search for bugs related to the use of the sizeof operator. In the Perl 5 project, two such diagnostics together issued about a thousand warnings. In this case, macros are to blame, not the analyzer.
V568 It's odd that the argument of sizeof() operator is the 'len + 1' expression. util.c 1084
char *
Perl_savepvn(pTHX_ const char *pv, I32 len)
{
....
Newx(newaddr,len+1,char);
....
}
The code contains many similar macros. I chose one as an example; we are interested in the argument "len + 1".
The macro is expanded by the preprocessor in the following way:
(newaddr = ((void)(__builtin_expect(((((( sizeof(size_t) < sizeof(len+1) ||
sizeof(char) > ((size_t)1 << 8*(sizeof(size_t) - sizeof(len+1)))) ?
(size_t)(len+1) : ((size_t)-1)/sizeof(char)) > ((size_t)-1)/sizeof(char))) ?
(_Bool)1 : (_Bool)0),(0)) && (S_croak_memory_wrap(),0)),
(char*)(Perl_safesysmalloc((size_t)((len+1)*sizeof(char))))));
The analyzer warning is issued for the construction sizeof(len+1). The point is that an expression passed to the sizeof operator is not evaluated. Various macros expand into such code. Probably, this is old legacy code that nobody wants to touch, but current developers continue to use the old macros, assuming they behave differently.
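The behavior of sizeof with an expression operand is easy to check in isolation. The sketch below uses hypothetical function names and relies only on standard C (variable length arrays are the one case where the operand is evaluated, and they are not used here).

```c
#include <stddef.h>

/* The operand of sizeof is not evaluated, so the increment never happens
   and the function returns i unchanged. */
static int value_after_sizeof(int i)
{
    size_t n = sizeof(i++);
    (void)n;              /* only the type of the operand mattered */
    return i;
}

/* sizeof(len + 1) is the size of the type of the expression (int here),
   not the value len + 1. */
static size_t size_of_len_plus_one(int len)
{
    return sizeof(len + 1);
}
```

This is why a construction like sizeof(len+1) looks suspicious to a reader: it tells you how wide the counter type is, and nothing about how many elements are requested.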
V522 Dereferencing of the null pointer 'sv' might take place. pp_ctl.c 577
OP * Perl_pp_formline(void)
{
....
SV *sv = ((void *)0);
....
switch (*fpc++) {
....
case 4:
arg = *fpc++;
f += arg;
fieldsize = arg;
if (mark < sp)
sv = *++mark;
else {
sv = &(PL_sv_immortals[2]);
Perl_ck_warner( (28 ), "....");
}
....
break;
case 5:
{
const char *s = item = ((((sv)->sv_flags & (....)) == 0x00000400) ? ....
....
}
....
}
This code fragment is taken entirely from the preprocessed file, because the macros again make it impossible to verify the problem from the source code.
The sv pointer is initialized to null at declaration. The analyzer detected that this pointer is dereferenced in the switch branch for the value 5 without having been assigned anything else first. The sv pointer is changed in the branch for the value 4, but that block ends with the break operator. Most likely, this place needs additional code.
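One way to make such code safe regardless of the order of operations is an explicit guard in the branch that dereferences the pointer. The sketch below is not the Perl code: it assumes the switch runs inside a loop over a sequence of operation codes, and all names in it (enum op, run_ops, source) are hypothetical.

```c
#include <stddef.h>

enum op { OP_FETCH = 4, OP_USE = 5 };

/* Runs a small sequence of operations. Returns the accumulated value,
   or -1 if OP_USE is reached before any OP_FETCH has set the pointer. */
static int run_ops(const enum op *ops, size_t n, const int *source)
{
    const int *cur = NULL;   /* null until an OP_FETCH assigns it */
    int result = 0;

    for (size_t i = 0; i < n; i++) {
        switch (ops[i]) {
        case OP_FETCH:
            cur = source;
            break;           /* the assignment does not fall through */
        case OP_USE:
            if (cur == NULL)
                return -1;   /* guard instead of dereferencing null */
            result += *cur;
            break;
        }
    }
    return result;
}
```

If the operation sequence always places a fetch before a use, the guard never fires, but it documents the assumption and removes the warning.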
V595 The 'k' pointer was utilized before it was verified against nullptr. Check lines: 15919, 15920. op.c 15919
void
Perl_rpeep(pTHX_ OP *o)
{
....
OP *k = o->op_next;
U8 want = (k->op_flags & OPf_WANT); // <=
if ( k // <=
&& k->op_type == OP_KEYS
&& ( want == OPf_WANT_VOID
|| want == OPf_WANT_SCALAR)
&& !(k->op_private & OPpMAYBE_LVSUB)
&& !(k->op_flags & OPf_MOD)
) {
....
}
In this code fragment, the analyzer detected the pointer k, which is dereferenced one line before it is checked for validity. This is either an error or redundant code.
The V595 diagnostic issues many warnings in any project, and Perl 5 is no exception. There is no way to fit everything into a single article, so we will confine ourselves to one example; developers can check the project themselves if they wish.
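If the null check is really needed, the usual fix is to read from the pointer only after the check. A minimal sketch with hypothetical types follows; struct node and FLAG_WANT merely stand in for OP and OPf_WANT.

```c
#include <stddef.h>

#define FLAG_WANT 0x3u   /* hypothetical mask, standing in for OPf_WANT */

struct node {
    unsigned flags;
    struct node *next;
};

/* Returns the masked flags of the next node, or 0 when there is none. */
static unsigned next_want(const struct node *o)
{
    const struct node *k = o->next;
    /* k->flags is read only after k is known to be non-null. */
    return k ? (k->flags & FLAG_WANT) : 0u;
}
```

If, on the other hand, k can never be null at this point, the check in the condition is the redundant part and can be removed instead.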
V779 Unreachable code detected. It is possible that an error is present. universal.c 457
XS(XS_utf8_valid);
XS(XS_utf8_valid)
{
dXSARGS;
if (items != 1)
croak_xs_usage(cv, "sv");
else {
SV * const sv = ST(0);
STRLEN len;
const char * const s = SvPV_const(sv,len);
if (!SvUTF8(sv) || is_utf8_string((const U8*)s,len))
XSRETURN_YES;
else
XSRETURN_NO;
}
XSRETURN_EMPTY;
}
In the line XSRETURN_EMPTY, the analyzer has detected unreachable code. In this function, there are two return operators, and croak_xs_usage, which is a macro that expands into a noreturn function:
void Perl_croak_xs_usage(const CV *const cv, const char *const params)
__attribute__((noreturn));
In such places of the Perl 5 code, the macro NOT_REACHED is used to specify the unreachable branch.
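The same control flow can be reproduced with a small function. This is a sketch under assumptions: die_usage is a hypothetical helper, and the C11 _Noreturn specifier is used in place of the GCC attribute shown above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper that never returns to its caller. */
_Noreturn static void die_usage(const char *params)
{
    fprintf(stderr, "Usage: f(%s)\n", params);
    exit(EXIT_FAILURE);
}

static int is_positive(int items, int value)
{
    if (items != 1)
        die_usage("value");   /* control never comes back from here */
    else {
        if (value > 0)
            return 1;
        else
            return 0;
    }
    /* Any statement placed here would be unreachable: every path above
       either returns or ends in a noreturn call. */
}
```

Because the compiler knows that die_usage does not return, the function needs no trailing return statement, and a statement added after the if/else would be dead code.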
V784 The size of the bit mask is less than the size of the first operand. This will cause the loss of higher bits. inffast.c 296
void ZLIB_INTERNAL inflate_fast(z_streamp strm, unsigned start)
{
....
unsigned long hold; /* local strm->hold */
unsigned bits; /* local strm->bits */
....
hold &= (1U << bits) - 1;
....
}
The analyzer has detected a suspicious operation in code that works with bit masks. The mask (1U << bits) - 1 has the unsigned type, which can be narrower than the unsigned long type of the hold variable. This results in the loss of higher bits. Developers should pay attention to this code.
Finding errors through the macros was very difficult. Reviewing the report took a lot of time and effort. Nevertheless, the article includes very interesting cases related to real errors. The analyzer report is quite large, and it definitely contains many more exciting things. However, I cannot review it any further :). I recommend that the developers check the project themselves and fix the defects they find.
P.S. We surely want to support this exciting project and we are ready to provide developers with a license for a few months.
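The general effect of a mask that is narrower than its operand can be shown with fixed-width types. The sketch below is an illustration with hypothetical function names and is not zlib code. In the zlib line the mask is meant to keep only the low bits, so whether anything is really lost there depends on the range of bits, which the fragment does not show.

```c
#include <stdint.h>

/* The mask is only 32 bits wide. Before the AND it is zero-extended to
   64 bits, so bits 32..63 of hold are cleared together with the low byte. */
static uint64_t clear_low_byte_narrow(uint64_t hold)
{
    uint32_t mask = ~UINT32_C(0xFF);   /* 0xFFFFFF00 */
    return hold & mask;
}

/* The mask has the same width as hold, so only the low byte is cleared. */
static uint64_t clear_low_byte_wide(uint64_t hold)
{
    uint64_t mask = ~UINT64_C(0xFF);
    return hold & mask;
}
```

For the input 0x1122334455667788 the narrow variant returns 0x55667700, while the wide variant returns 0x1122334455667700.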