Andrey Karpov

Apr 16 2020

Tags:

#Cpp #Compiler

Checking the GCC 10 compiler with PVS-Studio

Apr 16 2020

Author: Andrey Karpov

Time to double-check the GCC compiler code
10 suspicious code fragments
Conclusion
Our other articles about checking compilers

The GCC compiler is written with copious use of macros. Another check of the GCC code using PVS-Studio once again confirms the opinion of our team that macros are evil in the flesh. Not only does the static analyzer struggle with reviewing such code, but also a developer. GCC developers are certainly used to the project and are well versed in it. Nonetheless, it is very difficult to understand something on the third hand. Actually, due to macros, it was not possible to fully perform code checking. However, the PVS-Studio analyzer, as always, showed that it can find errors even in compilers.

Time to double-check the GCC compiler code

The last time I checked the GCC compiler four years ago. Time flies quickly and imperceptibly, and somehow I completely forgot to return to this project and recheck it. The post "Static analysis in GCC 10" pushed me back to this idea.

Actually, it is no secret that compilers have their own built-in static code analyzers and they are also developing. Therefore, from time to time we write articles that the PVS-Studio static analyzer can find errors even inside compilers and that we are worth our salt :).

In fact, one can't compare classic static analyzers with compilers. Static analyzer is not only a search for errors in the code, but also a developed infrastructure. For example, it is also integration with such systems as SonarQube, PlatformIO, Azure DevOps, Travis CI, CircleCI, GitLab CI/CD, Jenkins, Visual Studio. In addition, it is mechanisms of mass warnings suppression, which allows you to quickly start using PVS-Studio even in a large old project. It is a notification mailing list. And it's just to name a few. However, anyway, the first question is: "Can PVS-Studio find something that compilers can't?" Which means that we will write articles over and over again about checking these compilers themselves.

Let's get back to the GCC project check. There is no need to dwell on this project and tell what it is. Let's better talk what's inside of this project.

Inside there is a huge number of macros that interfere with the check. Firstly, the PVS-Studio analyzer generates a large number of false positives. There is nothing wrong about that, but it's not easy to take and start reviewing the resulting report. In a good way, one has to take the effort to suppress false warnings in macros. Otherwise, useful warnings are drowning in a flood of noise. This setup goes beyond the scope of this article. Frankly speaking, I was just too lazy to do this, even though there is nothing complicated about that. Due to the noise, viewing the report was quite superficial.

Secondly, it is very challenging for me, a person who is not familiar with the project, to understand the code. Macros, macros ... I have to check out what they are expanding into in order to understand why the analyzer generates warnings. Very hard. I do not like macros. Someone might say that you can't do without macros in C. But GCC has not been written in C for a while. For historical reasons, files do have the extension .c. At the same time, if we look inside we'll see the following:

// File alias.c
....
struct alias_set_hash : int_hash <int, INT_MIN, INT_MIN + 1> {};
struct GTY(()) alias_set_entry {
  alias_set_type alias_set;
  bool has_zero_child;
  bool is_pointer;
  bool has_pointer;
  hash_map<alias_set_hash, int> *children;
};
....

This is clearly not C, but C++.

In short, macros and coding style make it very tricky to deal with the analyzer report. So this time I will not please the reader with a long list of various errors. Thanks to a few cups of coffee I highlighted 10 interesting fragments right by the skin of my teeth. At that point, I grew weary of that :).

10 suspicious code fragments

Fragment N1, Seems like failed Copy-Paste

static bool
try_crossjump_to_edge (int mode, edge e1, edge e2,
                       enum replace_direction dir)
{
  ....
  if (FORWARDER_BLOCK_P (s->dest))
    s->dest->count += s->count ();

  if (FORWARDER_BLOCK_P (s2->dest))
    s2->dest->count -= s->count ();
  ....
}

PVS-Studio warning: V778 Two similar code fragments were found. Perhaps, this is a typo and 's2' variable should be used instead of 's'. cfgcleanup.c 2126

If I am honest, I'm not sure that this is an error. However, I have a strong suspicion that this code was written using Copy-Paste, and in the second block in one place they forgot to replace s with s2. That is, it seems to me the second block of code should be like this:

if (FORWARDER_BLOCK_P (s2->dest))
  s2->dest->count -= s2->count ();

Fragment N2, Typo

tree
vn_reference_lookup_pieces (....)
{
  struct vn_reference_s vr1;
  ....
  vr1.set = set;
  vr1.set = base_set;
  ....
}

PVS-Studio warning: V519 The 'vr1.set' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 3448, 3449. tree-ssa-sccvn.c 3449

It is very strange that different values are written into the same variable twice in a row. This is an obvious typo. Right in this file, next to the above there is the following code:

vr1.set = set;
vr1.base_set = base_set;

Most likely, the suspicious code fragment should also look like this one.

Fragment N3, Assigning a variable to itself

static omp_context *
new_omp_context (gimple *stmt, omp_context *outer_ctx)
{
  omp_context *ctx = XCNEW (omp_context);

  splay_tree_insert (all_contexts, (splay_tree_key) stmt,
         (splay_tree_value) ctx);
  ctx->stmt = stmt;

  if (outer_ctx)
    {
      ctx->outer = outer_ctx;
      ctx->cb = outer_ctx->cb;
      ctx->cb.block = NULL;
      ctx->local_reduction_clauses = NULL;
      ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;  // <=
      ctx->depth = outer_ctx->depth + 1;
    }
  ....
}

PVS-Studio warning: V570 The 'ctx->outer_reduction_clauses' variable is assigned to itself. omp-low.c 935

It is very strange to assign a variable to itself.

Fragment N4. 0,1,2, Freddy is coming for you.

I recently posted an article "Zero, one, two, Freddy's coming for you". It seems to me that the following code fragment enlarges the collection of errors discussed in that article.

#define GET_MODE(RTX)    ((machine_mode) (RTX)->mode)
....
static int
add_equal_note (rtx_insn *insns, rtx target, enum rtx_code code, rtx op0,
                rtx op1, machine_mode op0_mode)
{
  ....
  if (commutative_p
      && GET_MODE (xop0) != xmode0 && GET_MODE (xop1) != xmode1
      && GET_MODE (xop0) == xmode1 && GET_MODE (xop1) == xmode1)
    std::swap (xop0, xop1);
  ....
}

PVS-Studio warning: V560 A part of conditional expression is always false: ((machine_mode)(xop1)->mode) == xmode1. optabs.c 1053

Pay attention to these two subexpressions:

GET_MODE (xop1) != xmode1
GET_MODE (xop1) == xmode1

The AND operation is performed on the results of these subexpressions, which obviously has no practical meaning. Actually, if the second subexpression gets executed, then we know in advance that it will result in false.

Most likely, there is a typo here in zeros and ones, and in fact the condition should have been like this:

&& GET_MODE (xop0) != xmode0 && GET_MODE (xop1) != xmode1
&& GET_MODE (xop0) == xmode1 && GET_MODE (xop1) == xmode0

Of course, I'm not sure that I changed the code correctly, since I have not taken the project apart.

Fragment N5. Suspicious change in the argument value

bool
ipa_polymorphic_call_context::set_by_invariant (tree cst,
                                                tree otr_type,
                                                HOST_WIDE_INT off)
{
  poly_int64 offset2, size, max_size;
  bool reverse;
  tree base;

  invalid = false;
  off = 0;                // <=
  ....
  if (otr_type && !contains_type_p (TREE_TYPE (base), off, otr_type))
    return false;

  set_by_decl (base, off);
  return true;
}

PVS-Studio warning: V763 Parameter 'off' is always rewritten in function body before being used. ipa-polymorphic-call.c 766

The value of the off argument is immediately replaced by 0. Moreover, there is no explanatory comment. All this is very suspicious. Sometimes this code appears during debugging. The programmer wanted to see how the function was behaving in a certain mode, and temporarily changed the value of the argument, and then forgot to delete that line. This is how the error appeared in the code. Of course, everything may be correct here, but this code clearly needs to be checked and clarified in comments in order to avoid similar concerns in the future.

Fragment N6. Little thing

cgraph_node *
cgraph_node::create_clone (....)
{
  ....
  new_node->icf_merged = icf_merged;
  new_node->merged_comdat = merged_comdat;   // <=
  new_node->thunk = thunk;
  new_node->unit_id = unit_id;
  new_node->merged_comdat = merged_comdat;   // <=
  new_node->merged_extern_inline = merged_extern_inline;
  ....
}

PVS-Studio warning: V519 The 'new_node->merged_comdat' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 406, 409. cgraphclones.c 409

The assignment is accidently duplicated. Most likely, nothing crucial here. However, there is always a risk that in reality the author forgot to perform another assignment.

Fragment N7. Code that looks dangerous

static void
complete_mode (struct mode_data *m)
{
  ....
  if (m->cl == MODE_COMPLEX_INT || m->cl == MODE_COMPLEX_FLOAT)
    alignment = m->component->bytesize;
  else
    alignment = m->bytesize;

  m->alignment = alignment & (~alignment + 1);

  if (m->component)
  ....
}

PVS-Studio warning: V595 The 'm->component' pointer was utilized before it was verified against nullptr. Check lines: 407, 415. genmodes.c 407

First the pointer m->component is dereferenced in one of the branches of the if statement. I mean this expression: m->component->bytesize.

It further turns out that this pointer may be null. This follows from the check: if (m->component).

This code is not necessarily wrong. It is well possible that the dereferencing branch is only executed if the pointer is not null. That is, there is an indirect relationship between the value of the variable m->cl and m->component. But this code looks very dangerous in any case. Besides, there are no explanatory comments.

Fragment N8. Double check

void
pointer_and_operator::wi_fold (value_range &r, tree type,
                               const wide_int &lh_lb,
                               const wide_int &lh_ub,
                               const wide_int &rh_lb ATTRIBUTE_UNUSED,
                               const wide_int &rh_ub ATTRIBUTE_UNUSED) const
{
  // For pointer types, we are really only interested in asserting
  // whether the expression evaluates to non-NULL.
  if (wi_zero_p (type, lh_lb, lh_ub) || wi_zero_p (type, lh_lb, lh_ub))
    r = range_zero (type);
  else 
    r = value_range (type);
}

PVS-Studio warning: V501 There are identical sub-expressions 'wi_zero_p(type, lh_lb, lh_ub)' to the left and to the right of the '||' operator. range-op.cc 2657

Some kind of a strange check. The wi_zero_p function is called twice with the same set of actual arguments. One may suspect that in fact, the second call should use the arguments received from the outside: rh_lb, rh_ub. But no, these arguments are marked as unused (ATTRIBUTE_UNUSED).

Therefore, it is not clear to me why not write the check in a simpler way:

if (wi_zero_p (type, lh_lb, lh_ub))
  r = range_zero (type);
else 
  r = value_range (type);

Or is there a typo here? Or a logical mistake? I don't know, but the code is weird.

Fragment N9. Dangerous array access

struct algorithm
{
  struct mult_cost cost;
  short ops;
  enum alg_code op[MAX_BITS_PER_WORD];
  char log[MAX_BITS_PER_WORD];
};

static void
synth_mult (struct algorithm *alg_out, unsigned HOST_WIDE_INT t,
            const struct mult_cost *cost_limit, machine_mode mode)
{
  int m;
  struct algorithm *alg_in, *best_alg;
  ....
  /* Cache the result.  */
  if (!cache_hit)
  {
    entry_ptr->t = t;
    entry_ptr->mode = mode;
    entry_ptr->speed = speed;
    entry_ptr->alg = best_alg->op[best_alg->ops];
    entry_ptr->cost.cost = best_cost.cost;
    entry_ptr->cost.latency = best_cost.latency;
  }

  /* If we are getting a too long sequence for `struct algorithm'
     to record, make this search fail.  */
  if (best_alg->ops == MAX_BITS_PER_WORD)
    return;
  ....
}

PVS-Studio warning: V781 The value of the 'best_alg->ops' variable is checked after it was used. Perhaps there is a mistake in program logic. Check lines: 3157, 3164. expmed.c 3157

Let's shorten the code to make it clear what the analyzer does not like here:

if (!cache_hit)
{
  entry_ptr->alg = best_alg->op[best_alg->ops];
}
if (best_alg->ops == MAX_BITS_PER_WORD)

At the beginning, the variable best_alg->ops is used to index the array. Only afterwards this variable is checked for a boundary value. An array index out of bounds might potentially happen (a classic type of of the error CWE-193: Off-by-one Error).

Is this a legit error? And as this is constantly happening in this article, I'm not sure :). Perhaps there is a relationship between the value of this index and the cache_hit variable. Perhaps nothing is cached if the index has the maximum value MAX_BITS_PER_WORD). The function code is large, and I did not figure it out.

In any case, this code is best to be checked. Even if it turns out to be correct, I would recommend leaving a comment for the considered section of the program. It can confuse not only me or PVS-Studio, but also someone else.

Fragment N10. Code that has not been fixed for 4 years

Even in the last article, I drew attention to this code:

static bool
dw_val_equal_p (dw_val_node *a, dw_val_node *b)
{
  ....
  case dw_val_class_vms_delta:
    return (!strcmp (a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1)
            && !strcmp (a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1));
  ....
}

PVS-Studio warning: V501 There are identical sub-expressions '!strcmp(a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1)' to the left and to the right of the '&&' operator. dwarf2out.c 1481

Two strcmp functions compare the same pointers. That is, a clearly redundant check is performed. In a previous article, I assumed that it was a typo, and the following should actually have been written:

return (   !strcmp (a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1)
        && !strcmp (a->v.val_vms_delta.lbl2, b->v.val_vms_delta.lbl2));

However, it's already 4 years since this code has not been fixed. By the way, we informed the authors about the suspicious sections of code that we described in the article. Now I'm not so sure that this is a bug. Perhaps this is just redundant code. In this case, the expression can be simplified:

return (!strcmp (a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1));

Let's see if GCC developers will change this piece of code after a new article.

Conclusion

I'd like to kindly remind you that you are welcome to use this free license option for checking open source projects. By the way, there are other options of free PVS-Studio licensing even for closed projects. They are listed here: "Ways to Get a Free PVS-Studio License".

Thank you for your attention. Follow the link and read our blog. Lots of interesting stuff await.