V1076. Code contains invisible characters that may alter its logic. Consider enabling the display of invisible characters in the code editor.

The analyzer has detected characters in the code that may mislead the developer. These characters may be invisible and may change how an IDE displays the code. As a result, the developer and the compiler may interpret the same code differently.

Such characters may be inserted on purpose. This type of attack is called Trojan Source.

The analyzer issues a warning if it finds one of the following characters:

| Character | Code | Definition | Description |
|---|---|---|---|
| LRE | U+202A | LEFT-TO-RIGHT EMBEDDING | The text after the LRE character is treated as embedded and displayed left-to-right. The effect of the LRE character is ended by the PDF character or a newline character. |
| RLE | U+202B | RIGHT-TO-LEFT EMBEDDING | The text after the RLE character is treated as embedded and displayed right-to-left. The effect of the RLE character is ended by the PDF character or a newline character. |
| LRO | U+202D | LEFT-TO-RIGHT OVERRIDE | The text after the LRO character is forcibly displayed left-to-right. The effect of the LRO character is ended by the PDF character or a newline character. |
| RLO | U+202E | RIGHT-TO-LEFT OVERRIDE | The text after the RLO character is forcibly displayed right-to-left. The effect of the RLO character is ended by the PDF character or a newline character. |
| PDF | U+202C | POP DIRECTIONAL FORMATTING | The PDF character ends the effect of an LRE, RLE, LRO or RLO character encountered earlier. It ends only the most recent one. |
| LRI | U+2066 | LEFT-TO-RIGHT ISOLATE | The text after the LRI character is displayed left-to-right and treated as isolated. This means that other control characters do not affect the display of this text fragment. The effect of the LRI character is ended by the PDI character or a newline character. |
| RLI | U+2067 | RIGHT-TO-LEFT ISOLATE | The text after the RLI character is displayed right-to-left and treated as isolated. This means that other control characters do not affect the display of this text fragment. The effect of the RLI character is ended by the PDI character or a newline character. |
| FSI | U+2068 | FIRST STRONG ISOLATE | The direction of the text after the FSI character is set by the first strong directional character in this fragment that is not inside a nested isolate. Other control characters do not affect the display of this text. The effect of the FSI character is ended by the PDI character or a newline character. |
| PDI | U+2069 | POP DIRECTIONAL ISOLATE | The PDI character ends the effect of an LRI, RLI or FSI character encountered earlier. It ends only the most recent one. |
| LRM | U+200E | LEFT-TO-RIGHT MARK | An invisible character with a strong left-to-right direction. It affects how adjacent direction-neutral characters, such as spaces and punctuation, are displayed. Its effect does not extend past a newline character. |
| RLM | U+200F | RIGHT-TO-LEFT MARK | An invisible character with a strong right-to-left direction. It affects how adjacent direction-neutral characters, such as spaces and punctuation, are displayed. Its effect does not extend past a newline character. |
| ALM | U+061C | ARABIC LETTER MARK | An invisible character with a strong right-to-left direction that behaves like an Arabic letter. It affects how adjacent characters are displayed. Its effect does not extend past a newline character. |
| ZWSP | U+200B | ZERO WIDTH SPACE | An invisible space character. Using the ZWSP character causes different strings to be displayed the same way. For example, 'str[ZWSP]ing' is displayed as 'string'. |

Look at the following code fragment:

#include <iostream>

int main()
{
  bool isAdmin = false;
  /*[RLO] } [LRI] if (isAdmin)[PDI] [LRI] begin admins only */ // (1)
      std::cout << "You are an admin.\n";
  /* end admins only [RLO]{ [LRI]*/                            // (2)
  return 0;
}

Let's look closer at line (1).

[LRI] if (isAdmin)[PDI]

Here the [LRI] character has effect up to the [PDI] character. The 'if (isAdmin)' string is displayed left-to-right and is isolated. We get 'if (isAdmin)'.

[LRI] begin admins only */

Here there is no matching [PDI], so the [LRI] character has effect up to the end of the line. We get an isolated string: 'begin admins only */'.

[RLO] {space1}, '}', {space2}, 'if (isAdmin)', 'begin admins only */'

Here the [RLO] character has effect up to the end of the line and forces the text to be displayed right-to-left. Each of the isolated strings obtained in the previous steps is treated as a single indivisible character. We get the following sequence:

'begin admins only */', 'if (isAdmin)', {space2}, '{', {space1}

Note that the closing brace character is now displayed as '{' instead of '}', because paired brackets are mirrored when text is displayed right-to-left.

The final view of line (1) that can be displayed in the editor:

/* begin admins only */ if (isAdmin) {

Similar transformations affect line (2), which is displayed like this:

/* end admins only */ }
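
The reordering of line (1) can be mimicked with a small sketch. It is a deliberately simplified model, not an implementation of the Unicode Bidirectional Algorithm. It assumes the text after [RLO] has already been split into atoms (single characters and whole isolated strings), then reverses their order and mirrors paired brackets. The names 'Mirror' and 'DisplayUnderRlo' are hypothetical.

```cpp
#include <string>
#include <vector>

// Returns the mirrored form of a paired bracket; other atoms are unchanged.
std::string Mirror(const std::string &atom)
{
  if (atom == "{") return "}";
  if (atom == "}") return "{";
  if (atom == "(") return ")";
  if (atom == ")") return "(";
  if (atom == "[") return "]";
  if (atom == "]") return "[";
  return atom;
}

// Simplified model of [RLO]: atoms are shown in reverse order and
// single-character brackets are mirrored. An isolated string is passed
// in as one atom, so its contents keep their left-to-right order.
std::string DisplayUnderRlo(const std::vector<std::string> &atoms)
{
  std::string shown;
  for (auto it = atoms.rbegin(); it != atoms.rend(); ++it)
    shown += Mirror(*it);
  return shown;
}
```

Passing the atoms ' ', '}', ' ', 'if (isAdmin)', 'begin admins only */' in that order produces a string in which the two isolated strings come first and the brace appears as '{' near the end, as in the sequence derived for line (1).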

The code fragment that can be displayed in the editor:

#include <iostream>

int main()
{
  bool isAdmin = false;
  /* begin admins only */ if (isAdmin) { 
      std::cout << "You are an admin.\n";
  /* end admins only */ }
  return 0;
}

The reviewer may think that the 'isAdmin' condition is checked before the message is displayed. They will ignore the comments and assume that the code is executed like this:

#include <iostream>

int main()
{
  bool isAdmin = false;
  if (isAdmin) { 
    std::cout << "You are an admin.\n";
  }
  return 0;
}

However, there is no check. For the compiler, the code above looks like this:

#include <iostream>

int main()
{
  bool isAdmin = false;
  std::cout << "You are an admin.\n";
  return 0;
}

Now let's look at a simple yet dangerous example that uses invisible characters:

#include <string>
#include <string_view>

enum class BlockCipherType { DES, TripleDES, AES, /*....*/ };

constexpr BlockCipherType
StringToBlockCipherType(std::string_view str) noexcept
{
  if (str == "AES[ZWSP]")
    return BlockCipherType::AES;
  else if (str == "TripleDES[ZWSP]")
    return BlockCipherType::TripleDES;
  else
    return BlockCipherType::DES;
}

The 'StringToBlockCipherType' function converts a string to one of the values of the 'BlockCipherType' enumeration. You may think that the function can return three different values, but it can't. Since an invisible space character [ZWSP] is added at the end of each string literal, the function compares its argument with 'AES[ZWSP]' and 'TripleDES[ZWSP]', so the checks fail for the strings 'AES' and 'TripleDES'. As a result, for all three expected inputs the function returns only 'BlockCipherType::DES'. At the same time, the code editor may display the code like this:

#include <string>
#include <string_view>

enum class BlockCipherType { DES, TripleDES, AES, /*....*/ };

constexpr BlockCipherType
StringToBlockCipherType(std::string_view str) noexcept
{
  if (str == "AES")
    return BlockCipherType::AES;
  else if (str == "TripleDES")
    return BlockCipherType::TripleDES;
  else
    return BlockCipherType::DES;
}
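
A minimal sketch of why the comparison fails: the literal with [ZWSP] is longer than the one it resembles, even though both may look the same on screen. The universal character name '\u200B' stands in for the invisible character so that it is visible in the listing. The constant names are hypothetical, and the size check assumes that string literals are encoded as UTF-8.

```cpp
#include <string_view>

// "\u200B" is ZERO WIDTH SPACE written as a universal character name,
// so the otherwise invisible character can be seen in the source.
constexpr std::string_view kPlain  = "AES";
constexpr std::string_view kHidden = "AES\u200B";

// The two literals may look identical in an editor, but they are not equal.
static_assert(kPlain != kHidden);

// Assumption: the literal encoding is UTF-8, where U+200B takes
// three bytes (E2 80 8B).
static_assert(kHidden.size() == kPlain.size() + 3);
```

If the code compiles, both assertions hold: a caller that passes 'AES' can never match a literal that ends with [ZWSP].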

If the analyzer has issued a warning about invisible characters in the code, turn on the display of invisible characters in your editor and make sure they don't change the program's logic.

You can look at examples of errors detected by the V1076 diagnostic.