To get a trial key
fill out the form below
Team license
Enterprise license
** By clicking this button you agree to our Privacy Policy statement

Request our prices
New License
License Renewal
--Select currency--
USD
EUR
* By clicking this button you agree to our Privacy Policy statement

Free PVS-Studio license for Microsoft MVP specialists
** By clicking this button you agree to our Privacy Policy statement

To get the licence for your open-source project, please fill out this form
** By clicking this button you agree to our Privacy Policy statement

I am interested to try it on the platforms:
** By clicking this button you agree to our Privacy Policy statement

Message submitted.

Your message has been sent. We will email you at


If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

>
>
>
The dangers of using multi-character co…

The dangers of using multi-character constants

Jun 26 2019

During code analysis, PVS-Studio analyzes the data flow and operates variable values. Values are taken from constants or derived from conditional expressions. We call them virtual values. Recently, we have refined them in order to work with multi-character constants and this has become the reason to create a new diagnostic rule.

0634_MulticharacterLiterals/image1.png

Introduction

Multi-character-literals are implementation-defined, so different compilers can encode them in different ways. For example, GCC and Clang set a value, based on the order of the symbols in the literal, while MSVC moves them depending on the symbol's type (regular or escape).

For example, the 'T\x65s\x74' literal will be encoded in different of ways, depending on the compiler. A similar logic had to be added in the analyzer. As a result, we've made a new diagnostic rule V1039 to identify such literals in the code. These literals are dangerous in cross-platform projects that use multiple compilers for building.

Diagnostic V1039

Let's look at the example. The code below, compiled by different compilers, will behave differently:

#include <stdio.h>

void foo(int c)
{
  if (c == 'T\x65s\x74')                       // <= V1039
  {
    printf("Compiled with GCC or Clang.\n");
  }
  else
  {
    printf("It's another compiler (for example, MSVC).\n");
  }
}

int main(int argc, char** argv)
{
  foo('Test');
  return 0;
}

The program, compiled by different compilers, will print different messages on the screen.

For a project that uses a specific compiler, it won't be noticeable. But when porting, problems may occur, so one should replace such literals with simple numerical constants, such as 'Test' is to be changed with 0x54657374.

To demonstrate the difference between compilers, we'll write a small utility that takes sequences of 3 and 4 symbols, such as 'GHIJ' and 'GHI', and displays their representation in memory after compilation.

Utility code:

#include <stdio.h>

typedef int char_t;

void PrintBytes(const char* format, char_t lit)
{
  printf("%20s : ", format);

  const unsigned char *ptr = (const unsigned char*)&lit;
  for (int i = sizeof(lit); i--;)
  {
    printf("%c", *ptr++);
  }
  putchar('\n');
}

int main(int argc, char** argv)
{
  printf("Hex codes are: G(%02X) H(%02X) I(%02X) J(%02X)\n",'G','H','I','J');
  PrintBytes("'GHIJ'", 'GHIJ');
  PrintBytes("'\\x47\\x48\\x49\\x4A'", '\x47\x48\x49\x4A');
  PrintBytes("'G\\x48\\x49\\x4A'", 'G\x48\x49\x4A');
  PrintBytes("'GH\\x49\\x4A'", 'GH\x49\x4A');
  PrintBytes("'G\\x48I\\x4A'", 'G\x48I\x4A');
  PrintBytes("'GHI\\x4A'", 'GHI\x4A');
  PrintBytes("'GHI'", 'GHI');
  PrintBytes("'\\x47\\x48\\x49'", '\x47\x48\x49');
  PrintBytes("'GH\\x49'", 'GH\x49');
  PrintBytes("'\\x47H\\x49'", '\x47H\x49');
  PrintBytes("'\\x47HI'", '\x47HI');
  return 0;
}

Output of the utility, compiled by Visual C++:

Hex codes are: G(47) H(48) I(49) J(4A)
              'GHIJ' : JIHG
  '\x47\x48\x49\x4A' : GHIJ
     'G\x48\x49\x4A' : HGIJ
        'GH\x49\x4A' : JIHG
        'G\x48I\x4A' : JIHG
           'GHI\x4A' : JIHG
               'GHI' : IHG
      '\x47\x48\x49' : GHI
            'GH\x49' : IHG
         '\x47H\x49' : HGI
            '\x47HI' : IHG

Output of the utility, compiled by GCC or Clang:

Hex codes are: G(47) H(48) I(49) J(4A)
              'GHIJ' : JIHG
  '\x47\x48\x49\x4A' : JIHG
     'G\x48\x49\x4A' : JIHG
        'GH\x49\x4A' : JIHG
        'G\x48I\x4A' : JIHG
           'GHI\x4A' : JIHG
               'GHI' : IHG
      '\x47\x48\x49' : IHG
            'GH\x49' : IHG
         '\x47H\x49' : IHG
            '\x47HI' : IHG

Conclusion

The V1039 diagnostic is added in the PVS-Studio analyzer of 7.03 version, which has been recently released. You can download the latest version of the analyzer on the download page.

Popular related articles
Is there life without RTTI or How we wrote our own dynamic_cast

Date: Oct 13 2022

Author: Vladislav Stolyarov

There aren't many things left in modern C++ that don't fit the "Don't pay for what you don't use" paradigm. One of them is dynamic_cast. In this article, we'll find out what's wrong with it, and afte…
"Our legacy of the past" or why we divided the V512

Date: Aug 12 2022

Author: Mikhail Gelvih

As the saying goes, the first step is always the hardest. That's exactly what happened in our case – after delaying it for so long, we have finally split the V512 diagnostic rule. You can read more a…
Why do arrays have to be deleted via delete[] in C++

Date: Jul 27 2022

Author: Mikhail Gelvih

This note is for C++ beginner programmers who are wondering why everyone keeps telling them to use delete[] for arrays. But, instead of a clear explanation, senior developers just keep hiding behind …
Intermodular analysis of C and C++ projects in detail. Part 2

Date: Jul 14 2022

Author: Oleg Lisiy

In part 1 we discussed the basics of C and C++ projects compiling. We also talked over linking and optimizations. In part 2 we are going to delve deeper into intermodular analysis and discuss its ano…
Intermodular analysis of C and C++ projects in detail. Part 1

Date: Jul 08 2022

Author: Oleg Lisiy

Starting from PVS-Studio 7.14, the C and C++ analyzer has been supporting intermodular analysis. In this two-part article, we'll describe how similar mechanisms are arranged in compilers and reveal s…

Comments (0)

Next comments
Unicorn with delicious cookie
Our website uses cookies to enhance your browsing experience.
Accept