Checking the GCC 13 compiler with the help of PVS-Studio

Fragment N1. Uninitialized structure
Fragment N2. Potential null pointer
Fragment N3. Forgotten JSON
Fragment N4. Unsafe function that may cause a buffer overflow
Fragment N5. Redundant calculations
Fragment N6. Memory leak
Fragment N7. Rewriting a parameter
Fragment N8. Unexecutable loop
Fragment N9. Suspicious loop
Fragment N10. Strange conditional
Conclusion

I've been looking for a challenge to put the PVS-Studio analyzer through its paces, and that's why I chose GCC, an open-source compiler collection. It's not the first time we check this project. However, as the programming languages supported by the collection evolve, so does the GCC code. So, we will use the PVS-Studio analyzer to check GCC for bugs.

The GCC compiler collection, dating back to 1987, has been a major contributor to many open-source projects. Initially, GCC included only a compiler for the C programming language, C++, Objective-C, Java, Fortran, Ada, Go, GAS, and D were supported later. Currently, the latest major update to GCC is 13. This article discusses the 13.1 release. GCC has attracted many developers and even more users over the course of its long development history. As the result, the project is carefully tested, and looking for errors in such well-debugged and polished code is an exciting challenge.

The compiler code follows special coding conventions. The code teems with macros, which makes reading it a real nightmare for an unprepared user. It also makes the job of the PVS-Studio analyzer more complicated, but not impossible. Here are 10 code snippets that caught my eye when I was looking through the warnings. Unfortunately, it's difficult to examine the report in more detail because of the macros I mentioned. They require a preliminary configuration of the analyzer, which goes beyond the scope of just writing an article. If you'd like to read about the previous check of the GCC project, follow the link.

Fragment N1. Uninitialized structure

static ctf_id_t
gen_ctf_function_type (ctf_container_ref ctfc,
                       dw_die_ref function,
                       bool from_global_func)
{
....
  ctf_funcinfo_t func_info;                                   // <=
  ....
  {
    ....
    if (....)
    {
      do
      {
       ....

        if (....)
          ....
        else if (....)
        {
          func_info.ctc_flags |= CTF_FUNC_VARARG;             // <=
          ....
        }
      }
    }
  ....
  }
....
}

The PVS-Studio analyzer warning: V614 Uninitialized variable 'func_info.ctc_flags' used. gcc/dwarf2ctf.cc 676

The func_info object is not initialized when it is declared. Further down the code, in one of the branches, the CTF_FUNC_VARARG flag is set in the ctc_flags data member, but it has not been initialized.

Fragment N2. Potential null pointer

static struct gcov_info *
read_gcda_file (const char *filename)
{
  ....
  curr_gcov_info = obj_info = 
    (struct gcov_info *) xcalloc (sizeof (struct gcov_info) +
          sizeof (struct gcov_ctr_info) * GCOV_COUNTERS, 1);

  obj_info->version = version;
  obj_info->filename = filename;
  ....
}

The PVS-Studio analyzer warning: V522 There might be dereferencing of a potential null pointer 'obj_info'. Check lines: 290, 287. libgcov-util.c 290. libgcov-util.c 287

The PVS-Studio analyzer has detected that the validity of the obj_info pointer is not checked after allocation. The xmalloc, xcalloc, and xrealloc functions always return a valid pointer. If one of them is unable to allocate memory, it displays an error message and terminates the program. When I saw this warning, I considered it as false alarm and decided to make a minimal reproducible example for future debugging. However, when I opened a preprocessed file, I found a call to calloc in the line where xcalloc should have been. This is what the xcalloc macro actually expands to:

// libgcc/libgcov.h
....
/* work around the poisoned malloc/calloc in system.h.  */
#ifndef xmalloc
#define xmalloc malloc
#endif
#ifndef xcalloc
#define xcalloc calloc
#endif

I anticipate future comments, so here is the link to my colleague's article. There, he gave the reasons why we need to check the result of these functions.

Fragment N3. Forgotten JSON

static void
generate_results (const char *file_name)
{
  ....

  json::object *root = new json::object ();
  root->set ("format_version", new json::string ("1"));
  root->set ("gcc_version", new json::string (version_string));

  if (....)
    root->set ("current_working_directory", new json::string (bbg_cwd));
  root->set ("data_file", new json::string (file_name));

  json::array *json_files = new json::array ();
  root->set ("files", json_files);

....

  if (....)
  {
      if (....)
      {
        root->dump (stdout);
        printf ("\n");
      }
      else
      {
        pretty_printer pp;
        root->print (&pp);
        ....
      }
  }
}

The PVS-Studio analyzer warning: V773 The function was exited without releasing the 'root' pointer. A memory leak is possible. gcov.cc 1525

Developers forgot to delete root, the json object, that was created. This caused a memory leak. Most likely, this JSON output is a one-time event, and the error will have no real impact. I chose this case to demonstrate PVS-Studio's capabilities in memory leak detection.

Fragment N4. Unsafe function that may cause a buffer overflow

char m_flag_chars[256];
....
void
flag_chars_t::add_char (char ch)
{
  int i = strlen (m_flag_chars);
  m_flag_chars[i++] = ch;
  m_flag_chars[i] = 0;
}

The PVS-Studio analyzer warning: V557 Array overrun is possible. The value of 'i' index could reach 256. c-format.cc 1994

There may be an array overrun when writing a terminal null if the length of the m_flag_chars string was equal to 255 before calling the add_char function.

Fragment N5. Redundant calculations

....
/* Return True when the format flag CHR has been used.  */
bool get_flag (char chr) const
{
  unsigned char c = chr & 0xff;
  return (flags[c / (CHAR_BIT * sizeof *flags)]
    & (1U << (c % (CHAR_BIT * sizeof *flags))));
}
....
static fmtresult
format_floating (const directive &dir, const HOST_WIDE_INT prec[2])
{
  ....
  unsigned flagmin = (1 /* for the first digit */
    + (dir.get_flag ('+') | dir.get_flag (' ')));       
  ....
}

The PVS-Studio analyzer warning: V792 The 'get_flag' function located to the right of the operator '|' will be called regardless of the value of the left operand. Perhaps, it is better to use '||'. gimple-ssa-sprintf.cc 1227

The get_flag function returns an object of the bool type. The bitwise "OR" operator combines the results of two function calls. There is no danger in the code, but the bitwise operator will evaluate both operands. By using the logical "OR", you can eliminate the unnecessary evaluation if the first call returns true:

unsigned flagmin = (1 /* for the first digit */
          + (dir.get_flag ('+') || dir.get_flag (' ')));

Fragment N6. Memory leak

struct gcov_info *
gcov_read_profile_dir (const char* dir_name,
                       int recompute_summary ATTRIBUTE_UNUSED)
{
  char *pwd;
  int ret;

  read_profile_dir_init ();

  if (access (dir_name, R_OK) != 0)
  {
    fnotice (stderr, "cannot access directory %s\n", dir_name);
    return NULL;
  }
  pwd = getcwd (NULL, 0);                                     // <=
  gcc_assert (pwd);
  ret = chdir (dir_name);
  if (ret != 0)
  {
    fnotice (stderr, "%s is not a directory\n", dir_name);
    return NULL;                                              // <=
  }
#ifdef HAVE_FTW_H
  ftw (".", ftw_read_file, 50);
#endif
  chdir (pwd);
  free (pwd);                                                 // <=
  return gcov_info_head;;
}

The PVS-Studio analyzer warning: V773 The function was exited without releasing the 'pwd' pointer. A memory leak is possible. libgcov-util.c 450, libgcov-util.c 434

Let's look at the getcwd function call. If a null pointer is passed to it as the first argument, the function dynamically allocates memory for a buffer of the necessary length and writes the current working directory to it. The release of memory is at the developer's discretion. As you can see, the developer released memory at the very end of the function. However, they forgot to do so in one of the code branches with an early return. This results in a potential, though insignificant, memory leak. Here you can read more about how PVS-Studio can detect memory leaks.

Fixed code:

....
ret = chdir (dir_name);
if (ret != 0)
{
  fnotice (stderr, "%s is not a directory\n", dir_name);
  free(pwd);
  return NULL;
}
....

Fragment N7. Rewriting a parameter

static conversion *
build_aggr_conv (tree type,
                 tree ctor,
                 int flags,                               // <=
                 tsubst_flags_t complain)
{
  unsigned HOST_WIDE_INT i = 0;
  conversion *c;
  tree field = next_aggregate_field (TYPE_FIELDS (type));
  tree empty_ctor = NULL_TREE;
  hash_set<tree, true> pset;
  flags = LOOKUP_IMPLICIT|LOOKUP_NO_NARROWING;            // <=
  ....
}

The PVS-Studio analyzer warning: V763 Parameter 'flags' is always rewritten in function body before being used. call.cc 994, call.cc 981

The flags function parameter, received by value, is always overwritten. Perhaps developers planned to set flags in this variable additionally, or another variable should have been used.

Fragment N8. Unexecutable loop

void
gt_pch_p_13ctf_container (ATTRIBUTE_UNUSED void *this_obj,
                          void *x_p,
                          ATTRIBUTE_UNUSED gt_pointer_operator op,
                          ATTRIBUTE_UNUSED void *cookie)
{
  struct ctf_container * x ATTRIBUTE_UNUSED = (struct ctf_container *)x_p;
  {
    ....
    size_t l0 = (size_t)(0);                                 // <=
    ....
    if ((*x).ctfc_vars_list != NULL) 
    {
      size_t i0;
      for (i0 = 0; i0 != (size_t)(l0)                        // <=
           && ((void *)(*x).ctfc_vars_list == this_obj); i0++) 
      {
         if ((void *)((*x).ctfc_vars_list) == this_obj)
           op (&((*x).ctfc_vars_list[i0]), NULL, cookie);
      }
      ....
    }
  }
  ....
}

The PVS-Studio analyzer warning: V621 Consider inspecting the 'for' operator. It's possible that the loop will be executed incorrectly or won't be executed at all. gtype-desc.cc 11865

The l0 variable, on which the loop state depends, stays at zero. This means that the loop will never be executed.

The PVS-Studio analyzer has issued more of the similar warnings. Here are only some of them:

V621 Consider inspecting the 'for' operator. It's possible that the loop will be executed incorrectly or won't be executed at all. gtype-desc.cc 11874
V621 Consider inspecting the 'for' operator. It's possible that the loop will be executed incorrectly or won't be executed at all. gtype-desc.cc 11883
V621 Consider inspecting the 'for' operator. It's possible that the loop will be executed incorrectly or won't be executed at all. gtype-desc.cc 11892

Fragment N9. Suspicious loop

static bool
combine_reaching_defs (ext_cand *cand,
                       const_rtx set_pat,
                       ext_state *state)
{
  ....
  while (   REG_P (SET_SRC (*dest_sub_rtx))
         && (REGNO (SET_SRC (*dest_sub_rtx)) == REGNO (SET_DEST (set))))
  {
    ....
    if (....)
      break;

    if (....)
      break;
    
    ....
    break;
  }
}

The PVS-Studio analyzer warning: V612 An unconditional 'break' within a loop. ree.cc 985

The PVS-Studio analyzer has detected an unconditional break within the loop at the first iteration. If we look further, we can also find many other break's that are conditional. Perhaps this is a way to avoid writing the goto statement. It's odd, though, since the statement is used explicitly throughout the project.

Fragment N10. Strange conditional

void
const_fn_result_svalue::dump_to_pp(pretty_printer *pp,
                                   bool simple) const
{
  if (simple)
  {
    pp_printf (pp, "CONST_FN_RESULT(%qD, {", m_fndecl);
    for (unsigned i = 0; i < m_num_inputs; i++)
    {
      if (i > 0)
        pp_string (pp, ", ");
      dump_input (pp, i, m_input_arr[i], simple);
    }
    pp_string (pp, "})");
  }
  else
  {
    pp_printf (pp, "CONST_FN_RESULT(%qD, {", m_fndecl);
    for (unsigned i = 0; i < m_num_inputs; i++)
    {
      if (i > 0)
        pp_string (pp, ", ");
      dump_input (pp, i, m_input_arr[i], simple);
    }
      pp_string (pp, "})");
  }
}

The PVS-Studio analyzer warning: V523 The 'then' statement is equivalent to the 'else' statement. svalue.cc 2009, svalue.cc 1998

Complete logic duplication in conditional branches is detected. I decided to look for the cause of the duplication and used the git log command (‑‑follow -p ./gcc/analyzer/svalue.cc) in the git root directory of the GCC project to see the difference between commits. However, I found nothing there. It turns out that the file had this error all along.

Conclusion

When examining the code that uses macros so actively, one gets tired very quickly. But the satisfaction of working on a project as serious as GCC makes up for the fatigue. PVS-Studio has proven that it can detect hardly noticeable errors in well-debugged codebases, and even a mountain of macros did not become an obstacle for it.

At the end of this article, I'd want to wish everyone as few errors in code and as few issues in releases as possible :). Also, taking the opportunity, I invite you to try out the trial version of the analyzer. And to make sure you don't miss anything of interest, you're welcome to subscribe to our digest.