I've been looking for a challenge to put the PVS-Studio analyzer through its paces, and that's why I chose GCC, an open-source compiler collection. It's not the first time we check this project. However, as the programming languages supported by the collection evolve, so does the GCC code. So, we will use the PVS-Studio analyzer to check GCC for bugs.
The GCC compiler collection, dating back to 1987, has been a major contributor to many open-source projects. Initially, GCC included only a compiler for the C programming language, C++, Objective-C, Java, Fortran, Ada, Go, GAS, and D were supported later. Currently, the latest major update to GCC is 13. This article discusses the 13.1 release. GCC has attracted many developers and even more users over the course of its long development history. As the result, the project is carefully tested, and looking for errors in such well-debugged and polished code is an exciting challenge.
The compiler code follows special coding conventions. The code teems with macros, which makes reading it a real nightmare for an unprepared user. It also makes the job of the PVS-Studio analyzer more complicated, but not impossible. Here are 10 code snippets that caught my eye when I was looking through the warnings. Unfortunately, it's difficult to examine the report in more detail because of the macros I mentioned. They require a preliminary configuration of the analyzer, which goes beyond the scope of just writing an article. If you'd like to read about the previous check of the GCC project, follow the link.
static ctf_id_t
gen_ctf_function_type (ctf_container_ref ctfc,
dw_die_ref function,
bool from_global_func)
{
....
ctf_funcinfo_t func_info; // <=
....
{
....
if (....)
{
do
{
....
if (....)
....
else if (....)
{
func_info.ctc_flags |= CTF_FUNC_VARARG; // <=
....
}
}
}
....
}
....
}
The PVS-Studio analyzer warning: V614 Uninitialized variable 'func_info.ctc_flags' used. gcc/dwarf2ctf.cc 676
The func_info object is not initialized when it is declared. Further down the code, in one of the branches, the CTF_FUNC_VARARG flag is set in the ctc_flags data member, but it has not been initialized.
static struct gcov_info *
read_gcda_file (const char *filename)
{
....
curr_gcov_info = obj_info =
(struct gcov_info *) xcalloc (sizeof (struct gcov_info) +
sizeof (struct gcov_ctr_info) * GCOV_COUNTERS, 1);
obj_info->version = version;
obj_info->filename = filename;
....
}
The PVS-Studio analyzer warning: V522 There might be dereferencing of a potential null pointer 'obj_info'. Check lines: 290, 287. libgcov-util.c 290. libgcov-util.c 287
The PVS-Studio analyzer has detected that the validity of the obj_info pointer is not checked after allocation. The xmalloc, xcalloc, and xrealloc functions always return a valid pointer. If one of them is unable to allocate memory, it displays an error message and terminates the program. When I saw this warning, I considered it as false alarm and decided to make a minimal reproducible example for future debugging. However, when I opened a preprocessed file, I found a call to calloc in the line where xcalloc should have been. This is what the xcalloc macro actually expands to:
// libgcc/libgcov.h
....
/* work around the poisoned malloc/calloc in system.h. */
#ifndef xmalloc
#define xmalloc malloc
#endif
#ifndef xcalloc
#define xcalloc calloc
#endif
I anticipate future comments, so here is the link to my colleague's article. There, he gave the reasons why we need to check the result of these functions.
static void
generate_results (const char *file_name)
{
....
json::object *root = new json::object ();
root->set ("format_version", new json::string ("1"));
root->set ("gcc_version", new json::string (version_string));
if (....)
root->set ("current_working_directory", new json::string (bbg_cwd));
root->set ("data_file", new json::string (file_name));
json::array *json_files = new json::array ();
root->set ("files", json_files);
....
if (....)
{
if (....)
{
root->dump (stdout);
printf ("\n");
}
else
{
pretty_printer pp;
root->print (&pp);
....
}
}
}
The PVS-Studio analyzer warning: V773 The function was exited without releasing the 'root' pointer. A memory leak is possible. gcov.cc 1525
Developers forgot to delete root, the json object, that was created. This caused a memory leak. Most likely, this JSON output is a one-time event, and the error will have no real impact. I chose this case to demonstrate PVS-Studio's capabilities in memory leak detection.
char m_flag_chars[256];
....
void
flag_chars_t::add_char (char ch)
{
int i = strlen (m_flag_chars);
m_flag_chars[i++] = ch;
m_flag_chars[i] = 0;
}
The PVS-Studio analyzer warning: V557 Array overrun is possible. The value of 'i' index could reach 256. c-format.cc 1994
There may be an array overrun when writing a terminal null if the length of the m_flag_chars string was equal to 255 before calling the add_char function.
....
/* Return True when the format flag CHR has been used. */
bool get_flag (char chr) const
{
unsigned char c = chr & 0xff;
return (flags[c / (CHAR_BIT * sizeof *flags)]
& (1U << (c % (CHAR_BIT * sizeof *flags))));
}
....
static fmtresult
format_floating (const directive &dir, const HOST_WIDE_INT prec[2])
{
....
unsigned flagmin = (1 /* for the first digit */
+ (dir.get_flag ('+') | dir.get_flag (' ')));
....
}
The PVS-Studio analyzer warning: V792 The 'get_flag' function located to the right of the operator '|' will be called regardless of the value of the left operand. Perhaps, it is better to use '||'. gimple-ssa-sprintf.cc 1227
The get_flag function returns an object of the bool type. The bitwise "OR" operator combines the results of two function calls. There is no danger in the code, but the bitwise operator will evaluate both operands. By using the logical "OR", you can eliminate the unnecessary evaluation if the first call returns true:
unsigned flagmin = (1 /* for the first digit */
+ (dir.get_flag ('+') || dir.get_flag (' ')));
struct gcov_info *
gcov_read_profile_dir (const char* dir_name,
int recompute_summary ATTRIBUTE_UNUSED)
{
char *pwd;
int ret;
read_profile_dir_init ();
if (access (dir_name, R_OK) != 0)
{
fnotice (stderr, "cannot access directory %s\n", dir_name);
return NULL;
}
pwd = getcwd (NULL, 0); // <=
gcc_assert (pwd);
ret = chdir (dir_name);
if (ret != 0)
{
fnotice (stderr, "%s is not a directory\n", dir_name);
return NULL; // <=
}
#ifdef HAVE_FTW_H
ftw (".", ftw_read_file, 50);
#endif
chdir (pwd);
free (pwd); // <=
return gcov_info_head;;
}
The PVS-Studio analyzer warning: V773 The function was exited without releasing the 'pwd' pointer. A memory leak is possible. libgcov-util.c 450, libgcov-util.c 434
Let's look at the getcwd function call. If a null pointer is passed to it as the first argument, the function dynamically allocates memory for a buffer of the necessary length and writes the current working directory to it. The release of memory is at the developer's discretion. As you can see, the developer released memory at the very end of the function. However, they forgot to do so in one of the code branches with an early return. This results in a potential, though insignificant, memory leak. Here you can read more about how PVS-Studio can detect memory leaks.
Fixed code:
....
ret = chdir (dir_name);
if (ret != 0)
{
fnotice (stderr, "%s is not a directory\n", dir_name);
free(pwd);
return NULL;
}
....
static conversion *
build_aggr_conv (tree type,
tree ctor,
int flags, // <=
tsubst_flags_t complain)
{
unsigned HOST_WIDE_INT i = 0;
conversion *c;
tree field = next_aggregate_field (TYPE_FIELDS (type));
tree empty_ctor = NULL_TREE;
hash_set<tree, true> pset;
flags = LOOKUP_IMPLICIT|LOOKUP_NO_NARROWING; // <=
....
}
The PVS-Studio analyzer warning: V763 Parameter 'flags' is always rewritten in function body before being used. call.cc 994, call.cc 981
The flags function parameter, received by value, is always overwritten. Perhaps developers planned to set flags in this variable additionally, or another variable should have been used.
void
gt_pch_p_13ctf_container (ATTRIBUTE_UNUSED void *this_obj,
void *x_p,
ATTRIBUTE_UNUSED gt_pointer_operator op,
ATTRIBUTE_UNUSED void *cookie)
{
struct ctf_container * x ATTRIBUTE_UNUSED = (struct ctf_container *)x_p;
{
....
size_t l0 = (size_t)(0); // <=
....
if ((*x).ctfc_vars_list != NULL)
{
size_t i0;
for (i0 = 0; i0 != (size_t)(l0) // <=
&& ((void *)(*x).ctfc_vars_list == this_obj); i0++)
{
if ((void *)((*x).ctfc_vars_list) == this_obj)
op (&((*x).ctfc_vars_list[i0]), NULL, cookie);
}
....
}
}
....
}
The PVS-Studio analyzer warning: V621 Consider inspecting the 'for' operator. It's possible that the loop will be executed incorrectly or won't be executed at all. gtype-desc.cc 11865
The l0 variable, on which the loop state depends, stays at zero. This means that the loop will never be executed.
The PVS-Studio analyzer has issued more of the similar warnings. Here are only some of them:
static bool
combine_reaching_defs (ext_cand *cand,
const_rtx set_pat,
ext_state *state)
{
....
while ( REG_P (SET_SRC (*dest_sub_rtx))
&& (REGNO (SET_SRC (*dest_sub_rtx)) == REGNO (SET_DEST (set))))
{
....
if (....)
break;
if (....)
break;
....
break;
}
}
The PVS-Studio analyzer warning: V612 An unconditional 'break' within a loop. ree.cc 985
The PVS-Studio analyzer has detected an unconditional break within the loop at the first iteration. If we look further, we can also find many other break's that are conditional. Perhaps this is a way to avoid writing the goto statement. It's odd, though, since the statement is used explicitly throughout the project.
void
const_fn_result_svalue::dump_to_pp(pretty_printer *pp,
bool simple) const
{
if (simple)
{
pp_printf (pp, "CONST_FN_RESULT(%qD, {", m_fndecl);
for (unsigned i = 0; i < m_num_inputs; i++)
{
if (i > 0)
pp_string (pp, ", ");
dump_input (pp, i, m_input_arr[i], simple);
}
pp_string (pp, "})");
}
else
{
pp_printf (pp, "CONST_FN_RESULT(%qD, {", m_fndecl);
for (unsigned i = 0; i < m_num_inputs; i++)
{
if (i > 0)
pp_string (pp, ", ");
dump_input (pp, i, m_input_arr[i], simple);
}
pp_string (pp, "})");
}
}
The PVS-Studio analyzer warning: V523 The 'then' statement is equivalent to the 'else' statement. svalue.cc 2009, svalue.cc 1998
Complete logic duplication in conditional branches is detected. I decided to look for the cause of the duplication and used the git log command (‑‑follow -p ./gcc/analyzer/svalue.cc) in the git root directory of the GCC project to see the difference between commits. However, I found nothing there. It turns out that the file had this error all along.
When examining the code that uses macros so actively, one gets tired very quickly. But the satisfaction of working on a project as serious as GCC makes up for the fatigue. PVS-Studio has proven that it can detect hardly noticeable errors in well-debugged codebases, and even a mountain of macros did not become an obstacle for it.
At the end of this article, I'd want to wish everyone as few errors in code and as few issues in releases as possible :). Also, taking the opportunity, I invite you to try out the trial version of the analyzer. And to make sure you don't miss anything of interest, you're welcome to subscribe to our digest.