
The dangers of using multi-character constants

26 Jui 2019

During code analysis, PVS-Studio analyzes the data flow and operates on variable values. These values are taken from constants or derived from conditional expressions; we call them virtual values. We recently refined them to handle multi-character constants, which led us to create a new diagnostic rule.

Introduction

Multi-character literals are implementation-defined, so different compilers can encode them in different ways. For example, GCC and Clang compute the value from the order of the characters in the literal, while MSVC rearranges them depending on each character's type (regular or escape).

For example, the 'T\x65s\x74' literal will be encoded in different ways depending on the compiler. Similar logic had to be added to the analyzer. As a result, we've created a new diagnostic rule, V1039, to identify such literals in code. These literals are dangerous in cross-platform projects that are built with multiple compilers.

Diagnostic V1039

Let's look at an example. The code below behaves differently depending on the compiler used to build it:

#include <stdio.h>

void foo(int c)
{
  if (c == 'T\x65s\x74')                       // <= V1039
  {
    printf("Compiled with GCC or Clang.\n");
  }
  else
  {
    printf("It's another compiler (for example, MSVC).\n");
  }
}

int main(int argc, char** argv)
{
  foo('Test');
  return 0;
}

The program, compiled by different compilers, will print different messages on the screen.

In a project that uses a single compiler, this goes unnoticed. But problems may occur when porting, so such literals should be replaced with plain numeric constants; for example, 'Test' should be replaced with 0x54657374.

To demonstrate the difference between compilers, we'll write a small utility that takes sequences of 3 and 4 characters, such as 'GHIJ' and 'GHI', and displays their representation in memory after compilation.

Utility code:

#include <stdio.h>

typedef int char_t;

void PrintBytes(const char* format, char_t lit)
{
  printf("%20s : ", format);

  const unsigned char *ptr = (const unsigned char*)&lit;
  for (int i = sizeof(lit); i--;)
  {
    printf("%c", *ptr++);
  }
  putchar('\n');
}

int main(int argc, char** argv)
{
  printf("Hex codes are: G(%02X) H(%02X) I(%02X) J(%02X)\n",'G','H','I','J');
  PrintBytes("'GHIJ'", 'GHIJ');
  PrintBytes("'\\x47\\x48\\x49\\x4A'", '\x47\x48\x49\x4A');
  PrintBytes("'G\\x48\\x49\\x4A'", 'G\x48\x49\x4A');
  PrintBytes("'GH\\x49\\x4A'", 'GH\x49\x4A');
  PrintBytes("'G\\x48I\\x4A'", 'G\x48I\x4A');
  PrintBytes("'GHI\\x4A'", 'GHI\x4A');
  PrintBytes("'GHI'", 'GHI');
  PrintBytes("'\\x47\\x48\\x49'", '\x47\x48\x49');
  PrintBytes("'GH\\x49'", 'GH\x49');
  PrintBytes("'\\x47H\\x49'", '\x47H\x49');
  PrintBytes("'\\x47HI'", '\x47HI');
  return 0;
}

Output of the utility, compiled by Visual C++:

Hex codes are: G(47) H(48) I(49) J(4A)
              'GHIJ' : JIHG
  '\x47\x48\x49\x4A' : GHIJ
     'G\x48\x49\x4A' : HGIJ
        'GH\x49\x4A' : JIHG
        'G\x48I\x4A' : JIHG
           'GHI\x4A' : JIHG
               'GHI' : IHG
      '\x47\x48\x49' : GHI
            'GH\x49' : IHG
         '\x47H\x49' : HGI
            '\x47HI' : IHG

Output of the utility, compiled by GCC or Clang:

Hex codes are: G(47) H(48) I(49) J(4A)
              'GHIJ' : JIHG
  '\x47\x48\x49\x4A' : JIHG
     'G\x48\x49\x4A' : JIHG
        'GH\x49\x4A' : JIHG
        'G\x48I\x4A' : JIHG
           'GHI\x4A' : JIHG
               'GHI' : IHG
      '\x47\x48\x49' : IHG
            'GH\x49' : IHG
         '\x47H\x49' : IHG
            '\x47HI' : IHG

Conclusion

The V1039 diagnostic was added in PVS-Studio 7.03, which was released recently. You can download the latest version of the analyzer on the download page.
