
V1076. Code contains invisible characters that may alter its logic. Consider enabling the display of invisible characters in the code editor.

Dec 07 2021

The analyzer has detected characters in the code that may mislead a developer. These characters can be invisible and can change how the code is rendered in an IDE. As a result, the developer and the compiler may interpret the same code differently.

Such characters can be inserted on purpose. This type of attack is called Trojan Source.

The analyzer issues a warning if it finds one of the following characters:

| Character | Code | Definition | Description |
|---|---|---|---|
| LRE | U+202A | LEFT-TO-RIGHT EMBEDDING | The text after the LRE character is interpreted as embedded and displayed left-to-right. The action of the LRE character is interrupted by the PDF character or a newline character. |
| RLE | U+202B | RIGHT-TO-LEFT EMBEDDING | The text after the RLE character is interpreted as embedded and displayed right-to-left. The action of the RLE character is interrupted by the PDF character or a newline character. |
| LRO | U+202D | LEFT-TO-RIGHT OVERRIDE | The text after the LRO character is forcibly displayed left-to-right. The action of the LRO character is interrupted by the PDF character or a newline character. |
| RLO | U+202E | RIGHT-TO-LEFT OVERRIDE | The text after the RLO character is forcibly displayed right-to-left. The action of the RLO character is interrupted by the PDF character or a newline character. |
| PDF | U+202C | POP DIRECTIONAL FORMATTING | The PDF character interrupts the action of an LRE, RLE, LRO or RLO character encountered earlier. It interrupts only the most recent one. |
| LRI | U+2066 | LEFT-TO-RIGHT ISOLATE | The text after the LRI character is displayed left-to-right and interpreted as isolated. This means that other control characters do not affect the display of this text fragment. The action of the LRI character is interrupted by the PDI character or a newline character. |
| RLI | U+2067 | RIGHT-TO-LEFT ISOLATE | The text after the RLI character is displayed right-to-left and interpreted as isolated. This means that other control characters do not affect the display of this text fragment. The action of the RLI character is interrupted by the PDI character or a newline character. |
| FSI | U+2068 | FIRST STRONG ISOLATE | The text after the FSI character is interpreted as isolated, and its direction is set by the first strong directional character in it that is not inside a nested isolate. Other control characters do not affect the display of this text fragment. The action of the FSI character is interrupted by the PDI character or a newline character. |
| PDI | U+2069 | POP DIRECTIONAL ISOLATE | The PDI character interrupts the action of an LRI, RLI or FSI character encountered earlier. It interrupts only the most recent one. |
| LRM | U+200E | LEFT-TO-RIGHT MARK | The text after the LRM character is displayed left-to-right. The action of the LRM character is interrupted by a newline character. |
| RLM | U+200F | RIGHT-TO-LEFT MARK | The text after the RLM character is displayed right-to-left. The action of the RLM character is interrupted by a newline character. |
| ALM | U+061C | ARABIC LETTER MARK | The text after the ALM character is displayed right-to-left. The action of the ALM character is interrupted by a newline character. |
| ZWSP | U+200B | ZERO WIDTH SPACE | An invisible space character. Because of ZWSP, different strings can be displayed the same way. For example, str[ZWSP]ing is displayed as string. |
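
As an illustration of how such characters can be found, here is a minimal sketch that scans UTF-8 text for the code points listed above. It is not the analyzer's implementation. The names `Finding`, `IsSuspicious` and `FindSuspicious` are hypothetical, and the decoder assumes well-formed UTF-8 input without validating it.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical result type: byte offset in the input and the code point found.
struct Finding {
  std::size_t offset;
  char32_t codePoint;
};

// Returns true for the code points listed in the table above.
constexpr bool IsSuspicious(char32_t cp) noexcept
{
  return (cp >= 0x202A && cp <= 0x202E)  // LRE, RLE, PDF, LRO, RLO
      || (cp >= 0x2066 && cp <= 0x2069)  // LRI, RLI, FSI, PDI
      || cp == 0x200E || cp == 0x200F    // LRM, RLM
      || cp == 0x061C                    // ALM
      || cp == 0x200B;                   // ZWSP
}

// Decodes UTF-8 and records every suspicious code point.
// Assumption: the input is well-formed UTF-8; continuation bytes are not validated.
std::vector<Finding> FindSuspicious(std::string_view utf8)
{
  std::vector<Finding> findings;
  std::size_t i = 0;
  while (i < utf8.size()) {
    const auto lead = static_cast<unsigned char>(utf8[i]);
    std::size_t len = 1;   // sequence length derived from the lead byte
    char32_t cp = lead;
    if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    if (i + len > utf8.size())
      break;               // truncated sequence at the end of the input
    for (std::size_t k = 1; k < len; ++k)
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
    if (IsSuspicious(cp))
      findings.push_back({i, cp});
    i += len;
  }
  return findings;
}
```

Calling `FindSuspicious` on each line of a source file gives the byte offsets where a reviewer should look. Text that contains none of the listed characters yields an empty vector.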

Consider the following example:

#include <iostream>

int main()
{
  bool isAdmin = false;
  /*[RLO] } [LRI] if (isAdmin)[PDI] [LRI] begin admins only */ // (1)
      std::cout << "You are an admin.\n";
  /* end admins only [RLO]{ [LRI]*/                            // (2)
  return 0;
}

Look at line (1):

[LRI] if (isAdmin)[PDI]

The [LRI] character applies up to the [PDI] character. The if (isAdmin) string is displayed left-to-right and treated as isolated. We get if (isAdmin).

[LRI] begin admins only */

The [LRI] character applies up to the end of the line. We get an isolated string: begin admins only */.

[RLO] {space1}, '}', {space2}, 'if (isAdmin)', 'begin admins only */'

The [RLO] character applies up to the end of the line and displays the text right-to-left. Each of the isolated strings obtained earlier is treated as a single indivisible character. We get the following sequence:

'begin admins only */', 'if (isAdmin)', {space2}, '{', {space1}

Note. The closing brace character is now displayed as { instead of }, because bracket characters are mirrored in right-to-left text.

The final view of line (1) as the editor may display it:

/* begin admins only */ if (isAdmin) {

Similar transformations apply to line (2), which is displayed like this:

/* end admins only */ }

The whole code fragment as the editor may display it:

#include <iostream>

int main()
{
  bool isAdmin = false;
  /* begin admins only */ if (isAdmin) { 
      std::cout << "You are an admin.\n";
  /* end admins only */ }
  return 0;
}

Reviewers may think that the isAdmin flag is checked before the message is displayed. Ignoring the comments, they read the code as if it were executed like this:

#include <iostream>

int main()
{
  bool isAdmin = false;
  if (isAdmin) { 
    std::cout << "You are an admin.\n";
  }
  return 0;
}

However, there is no check. For the compiler, the code above looks like this:

#include <iostream>

int main()
{
  bool isAdmin = false;
  std::cout << "You are an admin.\n";
  return 0;
}

Here is a simple yet dangerous example that uses invisible characters:

#include <string>
#include <string_view>

enum class BlockCipherType { DES, TripleDES, AES, /*....*/ };

constexpr BlockCipherType
StringToBlockCipherType(std::string_view str) noexcept
{
  if (str == "AES[ZWSP]")
    return BlockCipherType::AES;
  else if (str == "TripleDES[ZWSP]")
    return BlockCipherType::TripleDES;
  else
    return BlockCipherType::DES;
}

The StringToBlockCipherType function converts a string to one of the BlockCipherType enumeration values. It looks like the function can return three different values, but it cannot. Since an invisible space character [ZWSP] is appended to each string literal, the comparisons evaluate to false when the caller passes the plain strings AES and TripleDES. As a result, instead of the three expected values, the function returns only BlockCipherType::DES. At the same time, the code editor may display the code like this:

#include <string>
#include <string_view>

enum class BlockCipherType { DES, TripleDES, AES, /*....*/ };

constexpr BlockCipherType
StringToBlockCipherType(std::string_view str) noexcept
{
  if (str == "AES")
    return BlockCipherType::AES;
  else if (str == "TripleDES")
    return BlockCipherType::TripleDES;
  else
    return BlockCipherType::DES;
}
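
To make the effect observable, the sketch below repeats the first version of the function with the ZWSP written out as its UTF-8 bytes (`\xE2\x80\x8B`) instead of the raw invisible character. It assumes the original source file is encoded in UTF-8. The enumeration is trimmed to the three values used here, and the `static_assert` lines are added purely for illustration.

```cpp
#include <string_view>

enum class BlockCipherType { DES, TripleDES, AES };

// Same logic as in the example, but the ZWSP (U+200B) is spelled out
// as its UTF-8 bytes so that it cannot hide in the editor.
constexpr BlockCipherType
StringToBlockCipherType(std::string_view str) noexcept
{
  if (str == "AES\xE2\x80\x8B")
    return BlockCipherType::AES;
  else if (str == "TripleDES\xE2\x80\x8B")
    return BlockCipherType::TripleDES;
  else
    return BlockCipherType::DES;
}

// The visible text "AES" is 3 bytes long; with the trailing ZWSP it is 6 bytes.
static_assert(std::string_view("AES\xE2\x80\x8B").size() == 6);

// Plain input never matches the literals, so both lookups fall through to DES.
static_assert(StringToBlockCipherType("AES") == BlockCipherType::DES);
static_assert(StringToBlockCipherType("TripleDES") == BlockCipherType::DES);
```

Since the checks are evaluated at compile time, the sketch compiles only if the mismatch really occurs. Removing the escaped bytes from the literals makes the last two assertions fail.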

If the analyzer issues the warning about invisible characters, enable the display of invisible characters. Ensure that they do not change the logic of the program execution.
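
If the editor cannot display such characters, a small helper can do a similar job. The following sketch replaces each listed character with a bracketed abbreviation, the same notation used in the examples on this page. The names `Invisible`, `kInvisible` and `RevealInvisible` are hypothetical, and the input is assumed to be UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// UTF-8 encoding of a character from the table and its abbreviation.
struct Invisible {
  std::string_view bytes;
  std::string_view name;
};

constexpr Invisible kInvisible[] = {
  {"\xE2\x80\xAA", "LRE"}, {"\xE2\x80\xAB", "RLE"}, {"\xE2\x80\xAC", "PDF"},
  {"\xE2\x80\xAD", "LRO"}, {"\xE2\x80\xAE", "RLO"}, {"\xE2\x81\xA6", "LRI"},
  {"\xE2\x81\xA7", "RLI"}, {"\xE2\x81\xA8", "FSI"}, {"\xE2\x81\xA9", "PDI"},
  {"\xE2\x80\x8E", "LRM"}, {"\xE2\x80\x8F", "RLM"}, {"\xD8\x9C", "ALM"},
  {"\xE2\x80\x8B", "ZWSP"},
};

// Returns a copy of the line in which every listed character
// is replaced with a visible marker such as [RLO] or [ZWSP].
std::string RevealInvisible(std::string_view line)
{
  std::string out;
  std::size_t i = 0;
  while (i < line.size()) {
    bool matched = false;
    for (const Invisible& ch : kInvisible) {
      if (line.substr(i, ch.bytes.size()) == ch.bytes) {
        out += '[';
        out += ch.name;
        out += ']';
        i += ch.bytes.size();
        matched = true;
        break;
      }
    }
    if (!matched)
      out += line[i++];  // ordinary byte: copy it unchanged
  }
  return out;
}
```

For instance, passing the bytes of the literal AES followed by a ZWSP returns AES[ZWSP], which makes the extra character obvious during review.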

This diagnostic is classified as:

You can look at examples of errors detected by the V1076 diagnostic.