to the top
close form

Fill out the form in 2 simple steps below:

Your contact information:

Step 1
Congratulations! This is your promo code!

Desired license type:

Step 2
Team license
Enterprise license
** By clicking this button you agree to our Privacy Policy statement
close form
Request our prices
New License
License Renewal
--Select currency--
USD
EUR
* By clicking this button you agree to our Privacy Policy statement

close form
Free PVS-Studio license for Microsoft MVP specialists
** By clicking this button you agree to our Privacy Policy statement

close form
To get the licence for your open-source project, please fill out this form
** By clicking this button you agree to our Privacy Policy statement

close form
I am interested to try it on the platforms:
** By clicking this button you agree to our Privacy Policy statement

close form
check circle
Message submitted.

Your message has been sent. We will email you at


If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

>
>
>
The dangers of using multi-character co…

The dangers of using multi-character constants

Jun 26 2019

During code analysis, PVS-Studio analyzes the data flow and operates variable values. Values are taken from constants or derived from conditional expressions. We call them virtual values. Recently, we have refined them in order to work with multi-character constants and this has become the reason to create a new diagnostic rule.

0634_MulticharacterLiterals/image1.png

Introduction

Multi-character-literals are implementation-defined, so different compilers can encode them in different ways. For example, GCC and Clang set a value, based on the order of the symbols in the literal, while MSVC moves them depending on the symbol's type (regular or escape).

For example, the 'T\x65s\x74' literal will be encoded in different of ways, depending on the compiler. A similar logic had to be added in the analyzer. As a result, we've made a new diagnostic rule V1039 to identify such literals in the code. These literals are dangerous in cross-platform projects that use multiple compilers for building.

Diagnostic V1039

Let's look at the example. The code below, compiled by different compilers, will behave differently:

#include <stdio.h>

void foo(int c)
{
  if (c == 'T\x65s\x74')                       // <= V1039
  {
    printf("Compiled with GCC or Clang.\n");
  }
  else
  {
    printf("It's another compiler (for example, MSVC).\n");
  }
}

int main(int argc, char** argv)
{
  foo('Test');
  return 0;
}

The program, compiled by different compilers, will print different messages on the screen.

For a project that uses a specific compiler, it won't be noticeable. But when porting, problems may occur, so one should replace such literals with simple numerical constants, such as 'Test' is to be changed with 0x54657374.

To demonstrate the difference between compilers, we'll write a small utility that takes sequences of 3 and 4 symbols, such as 'GHIJ' and 'GHI', and displays their representation in memory after compilation.

Utility code:

#include <stdio.h>

typedef int char_t;

void PrintBytes(const char* format, char_t lit)
{
  printf("%20s : ", format);

  const unsigned char *ptr = (const unsigned char*)&lit;
  for (int i = sizeof(lit); i--;)
  {
    printf("%c", *ptr++);
  }
  putchar('\n');
}

int main(int argc, char** argv)
{
  printf("Hex codes are: G(%02X) H(%02X) I(%02X) J(%02X)\n",'G','H','I','J');
  PrintBytes("'GHIJ'", 'GHIJ');
  PrintBytes("'\\x47\\x48\\x49\\x4A'", '\x47\x48\x49\x4A');
  PrintBytes("'G\\x48\\x49\\x4A'", 'G\x48\x49\x4A');
  PrintBytes("'GH\\x49\\x4A'", 'GH\x49\x4A');
  PrintBytes("'G\\x48I\\x4A'", 'G\x48I\x4A');
  PrintBytes("'GHI\\x4A'", 'GHI\x4A');
  PrintBytes("'GHI'", 'GHI');
  PrintBytes("'\\x47\\x48\\x49'", '\x47\x48\x49');
  PrintBytes("'GH\\x49'", 'GH\x49');
  PrintBytes("'\\x47H\\x49'", '\x47H\x49');
  PrintBytes("'\\x47HI'", '\x47HI');
  return 0;
}

Output of the utility, compiled by Visual C++:

Hex codes are: G(47) H(48) I(49) J(4A)
              'GHIJ' : JIHG
  '\x47\x48\x49\x4A' : GHIJ
     'G\x48\x49\x4A' : HGIJ
        'GH\x49\x4A' : JIHG
        'G\x48I\x4A' : JIHG
           'GHI\x4A' : JIHG
               'GHI' : IHG
      '\x47\x48\x49' : GHI
            'GH\x49' : IHG
         '\x47H\x49' : HGI
            '\x47HI' : IHG

Output of the utility, compiled by GCC or Clang:

Hex codes are: G(47) H(48) I(49) J(4A)
              'GHIJ' : JIHG
  '\x47\x48\x49\x4A' : JIHG
     'G\x48\x49\x4A' : JIHG
        'GH\x49\x4A' : JIHG
        'G\x48I\x4A' : JIHG
           'GHI\x4A' : JIHG
               'GHI' : IHG
      '\x47\x48\x49' : IHG
            'GH\x49' : IHG
         '\x47H\x49' : IHG
            '\x47HI' : IHG

Conclusion

The V1039 diagnostic is added in the PVS-Studio analyzer of 7.03 version, which has been recently released. You can download the latest version of the analyzer on the download page.

Popular related articles
C++ subtleties: so, you've declared a class...

Date: Feb 07 2023

Author: Sergey Larin

Our team constantly encounters some C++ features that may be unknown to some developers. In this article, we're going to learn how a seemingly typical feature — class forward declarations — works.
Under the hood of SAST: how code analysis tools look for security flaws

Date: Jan 26 2023

Author: Sergey Vasiliev

Here we'll discuss how SAST solutions find security flaws. I'll tell you about different and complementary approaches to detecting potential vulnerabilities, explain why each of them is necessary, an…
Is there life without RTTI or How we wrote our own dynamic_cast

Date: Oct 13 2022

Author: Vladislav Stolyarov

There aren't many things left in modern C++ that don't fit the "Don't pay for what you don't use" paradigm. One of them is dynamic_cast. In this article, we'll find out what's wrong with it, and afte…
Why do arrays have to be deleted via delete[] in C++

Date: Jul 27 2022

Author: Mikhail Gelvih

This note is for C++ beginner programmers who are wondering why everyone keeps telling them to use delete[] for arrays. But, instead of a clear explanation, senior developers just keep hiding behind …
Intermodular analysis of C and C++ projects in detail. Part 2

Date: Jul 14 2022

Author: Oleg Lisiy

In part 1 we discussed the basics of C and C++ projects compiling. We also talked over linking and optimizations. In part 2 we are going to delve deeper into intermodular analysis and discuss its ano…


Comments (0)

Next comments next comments
close comment form
Unicorn with delicious cookie
Our website uses cookies to enhance your browsing experience.
Accept