Captain Blood's adventures: would Arabella sink?

Project overview
Captain Blood's thirteen
Conclusion

We recently found out that the source code of the Captain Blood game (formerly known as Age of Pirates: Captain Blood) has been opened. We couldn't pass it up so we checked the game's quality with the help of PVS-Studio. Will Captain Blood's legendary ship sink into the sea abyss because of the detected bugs? Let's find out!

Project overview

Captain Blood is an action-adventure game made by 1C Company and based on the novels by Rafael Sabatini. The game follows the adventures of Captain Peter Blood, the main character of Sabatini's novel. The action takes place in New England. The protagonist of the game is an Irish doctor who has been unjustly convicted of treason against the king and exiled to the island of Barbados. Peter Blood managed to escape from the island by capturing a Spanish ship. Thus, he became the most famous pirate of that time.

The project was never released and was closed in 2010. This happened because 1C and Playlogic were unable to decide who owns the rights to the game. There were rumors that the final build of the game had been irretrievably lost. Recently, some news came out proving the opposite. A working build of the game appeared on a popular torrent tracker, and a walkthrough of the game was posted on YouTube. Later, the the game's source code was posted on GitHub under the GPLv3 license.

Loading video

So, after we got learned more about the project, let's discuss the most interesting warnings that PVS-Studio found when analyzing it.

Captain Blood's thirteen

Warning N1

V1075 The function expects the file to be opened for writing, but it was opened for reading. Debugger.cpp 172

We can congratulate our teammate Sergey Larin on the first "trophy" warning of his diagnostic rule (it was released in PVS-Studio 7.15). According to this diagnostic rule, data is written into files that are opened for reading and vice versa. Honestly, we didn't even expect that this warning would ever be issued. Here is this problematic function:

void appDebuger::CopyDump(const char * src, DWORD from, const char * dst)
{
  FILE * file = fopen(src, "rb");
  if(file)
  {
    fseek(file, 0, SEEK_END);
    DWORD size = ftell(file);
    fseek(file, from, SEEK_SET);
    if(size > from)
    {
      FILE * to = fopen(dst, "a+b");
      if(to)
      {
        char buf[128];
        while(from < size)
        {          
          DWORD s = size - from;
          if(s > sizeof(buf)) s = sizeof(buf);
          memset(buf, ' ', sizeof(buf));
          fread(buf, s, 1, file);
          fwrite(buf, s, 1, file);            // <=
          from += s;
        }
        fclose(to);
      }
    }
    fclose(file);
  }
}

So, the function should read data from a file at the src path, starting at the from position, and write it to another file at the dst path. The file variable is responsible for the source file, the to variable is responsible for the resulting file.

Unfortunately, we can see reading from the file variable and writing to the file variable in the code. We can fix it this way:

fwrite(buf, s, 1, to);

Warning N2

V1065 [CWE-682] Expression can be simplified, check 'bi' and similar operands. PreviewAnimation.cpp 146

PreviewAnimation::PreviewAnimation(....)
{
  // ....
  const int bw = 30;
  const int bh = 30;
  const int bi = 3;
  const int cn = 9;

  bar.r.w = 7 + (bw + bi)*cn + 7 - bi + 1 + (bw + bi) + 7 - bi + 1;
  // ....
}

That's one of my favorite diagnostic rules! It indicates that an expression can be simplified by removing the bi variable:

bar.r.w = 7 + (bw + bi)*cn + 7 - bi + 1 + bw + 7 + 1;

Maybe there should be another variable instead of one of the bi. Or the (bw + bi) expression was supposed to be multiplied by something. I wonder how the game animation would change...

I would like to mention that it's a bad practice to give constants such names (bw, bh, bi, cn). They can make the code confusing to the person reading it. It would be better to make the names of the variables more meaningful or leave a comment to identify them.

Warning N3

V570 [CWE-480] The 'ldMaxChars' variable is assigned to itself. StringAttr.cpp 198

void StringAttribute::LoadFromFile (....)
{
  int ldMinChars, ldMaxChars;
  // ....
  minChars = ldMinChars;
  ldMaxChars = ldMaxChars;
  // ....
}

class StringAttribute : public BaseAttribute
{
  int minChars;
  int maxChars;
}

The usual assignment of a variable to itself takes place here. Let's look at the definition of the StringAttribute class to see what should be in place of the variable where the assignment occurs. Here we can see the similar maxChars data member. Obviously, the developer made a mistake and actually should have assigned the ldMaxChars value to the variable. Here's the fixed version of the code:

void StringAttribute::LoadFromFile (....)
{
  int ldMinChars, ldMaxChars;
  // ....
  minChars = ldMinChars;
  maxChars = ldMaxChars;
  // ....
}

Warning N4

V781 [CWE-20, CERT-API00-C] The value of the 'start' index is checked after it was used. Perhaps there is a mistake in program logic. coreloader.h 136

__forceinline void GetBootParams(const char * bootIni)
{
  long start = 0;
  // ....
  if (bootIni[start] && start > 0)
  {
    // ....
  }
  // ....
}

In this code fragment, the developer first used the start value to index over the bootIni array, and only then checked it for more than zero. This check may be pointless, then it would be better to remove it. The check can make sense, then it should be interchanged with the array subscripting:

if (start > 0 && bootIni[start])
{
  // ....
}

Otherwise, the developer runs the risk of crashing the game. In fact, quite a few warnings of this type are issued for this code. Here are a few that are worth paying attention:

V781 [CWE-20, CERT-API00-C] The value of the 'i' index is checked after it was used. Perhaps there is a mistake in program logic. tinyxml.h 352
V781 [CWE-20, CERT-API00-C] The value of the 'start' index is checked after it was used. Perhaps there is a mistake in program logic. coreloader.h 136
V781 [CWE-20, CERT-API00-C] The value of the 'pathLen' index is checked after it was used. Perhaps there is a mistake in program logic. winworkpath.h 50
....

Warning N5

V700 [CWE-480] Consider inspecting the 'mView = mView = chr.Render().GetView()' expression. It is odd that variable is initialized through itself. CharacterItems.cpp 705

void CharacterItems::DrawFlares(float dltTime, const Matrix & toWorld)
{
  // ....
  Matrix mView = mView = chr.Render().GetView();
  // ....
}

Here the mView variable is assigned to itself, that's pretty weird. There must have been a typo. Anyway, this redundant assignment can be removed:

void CharacterItems::DrawFlares(float dltTime, const Matrix & toWorld)
{
  // ....
  Matrix mView = chr.Render().GetView();
  // ....
}

Warning N6

V654 [CWE-834] The condition 'n >= 0' of loop is always true. resourceselect.cpp 42

typedef unsigned int dword;

TResourceSelectorWindow::TResourceSelectorWindow ()
{
  string PakPath;
  // ....
  miss->EditorGetPackPath(PakPath);
  //....
  for (dword n = PakPath.Size()-1; n >= 0; n--)
  {
    if (PakPath[n] == '\\')
    {
      PakPath.Delete(0, n+1);
      break;
    }
  }

  // ....
}

In this case, we see a potentially infinite loop, if there is no \ in the string. The n variable is of unsigned type, which means that it will always be greater than or equal to 0.

But in fact, there will be no infinite loop, because when the n variable overflows, there will be a failure. Overflowing of an unsigned variable does not lead to undefined behavior, but accessing beyond the array boundaries will. Actually, an access violation is likely to occur.

Here the developer has written an algorithm, that searches for the first occurrence of \ from the end. Then the algorithm removes a substring from the beginning to this character inclusively. The best way to fix this code is to use rbegin and rend, but there's a nuance. We are dealing with a self-written string (a std::string analog with extra functionality), that has no iterators. Then, to fix the code, we can take a signed variable twice the size (int64_t) and run a loop with it.

for (int64_t n = PakPath.Size() - 1; n >= 0; n--)
{
  if (PakPath[n] == '\\')
  {
    PakPath.Delete(0, n+1);
    break;
  }
}

However, this will not work if you port the application to a 64-bit system, where the Size function returns a value of the uint64_t type. In this case, using iterators would be a good way to write correct code. For example, here's what the code in C++14 would look like if the self-written string still doesn't have iterators, but at least has functions to access the underlying buffer:

auto r_begin = std::make_reverse_iterator(PakPath.GetBuffer() + PakPath.Size());
auto   r_end = std::make_reverse_iterator(PakPath.GetBuffer());

while (r_begin != r_end )
{
  if (*r_begin == '\\')
  {
    PakPath.Delete(0, std::distance(r_begin, r_end));
    break;
  }

  ++r_begin;
}

Warning N7

V611 [CWE-762, CERT-MEM51-CPP] The memory was allocated using 'new T[]' operator but was released using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] ldName;'. ColorAttr.cpp 198

void ColorAttribute::LoadFromFile (....)
{
  DWORD slen = 0;
  // ....
  char* ldName = NEW char[slen+1];
  // ....
  delete ldName;
}

Here the developer has cleared the dynamically allocated memory, but in the wrong way. Instead of calling the operator delete[] to release previously allocated memory by the operator new[], the developer called a simple delete. Here is the fixed version:

void ColorAttribute::LoadFromFile (....)
{
  DWORD slen = 0;
  // ....
  char* ldName = NEW char[slen+1];
  // ....
  delete ldName[];
}

This error is discussed in more detail in the article written by my colleague: "Why do arrays have to be deleted via delete[] in C++".

Warning N8

V607 Ownerless expression 'isSetPause ? 1 : 0'. SoundsEngine.cpp 741

void SoundsEngine::SetPause(bool isSetPause)
{
  //....
  isSetPause ? 1 : 0;
  //....
}

The analyzer detects dead code. I've noticed that the SoundsEngine class contains the similar data member — isPause, but it has the long type. The code might have done a conversion of the isSetPause parameter from bool to long type earlier, but then it was refactored. This line is harmless and can be removed.

Warning N9

V583 [CWE-783] The '?:' operator, regardless of its conditional expression, always returns one and the same value: 0xffff0000. NoFatality.cpp 63

void _cdecl NoFatality::Draw(float dltTime, long level)
{
  //....
  for (....)
  {
    Render().DrawBox(...., IsActive() ? 0xffff0000 : 0xffff0000);
  }
  //....
}

As we can see, the second and the third operands of the conditional operator are identical. So, the second argument of the DrawBox function is always the 0xffff0000 constant. It's likely that a different value should have been passed in one of the branches, otherwise the code can be simplified.

Warning N10

V523 [CWE-691] The 'then' statement is equivalent to the 'else' statement. TaskViewer.cpp 298

void TaskViewer::SetText(const char * rawText)
{
  // ....
  if (rawText[0] == '#')
  {
    // Let's replace the identifier with a real string
    // then
    str = rawText;
  }
  else
  {
    str = rawText;
  }
  // ....
}

In this code fragment, the analyzer has warned us that the if statement has identical then and else branches. So, regardless the condition, the same code fragment is executed. This may not be the behavior the developer expected.

Warning N11

V614 [CWE-457, CERT-EXP53-CPP] Uninitialized variable 'color.c' used. Color.h 1268

mathinline dword mathcall Color::GetDword() const
{
  DColor color;
  color.r = (byte)(r * 255.0f);
  color.g = (byte)(g * 255.0f);
  color.b = (byte)(b * 255.0f);
  color.a = (byte)(a * 255.0f);

  return color.c;
}

Let's find out what is DColor:

class DColor
{
public:

  union
  {
#ifndef _XBOX
  struct
  {
    ///Blue
    unsigned char b;
    ///Green
    unsigned char g;
    ///Red
    unsigned char r;
    ///Transparency
    unsigned char a;
  };
#else
  struct
    {
    ///Transparency
    unsigned char a;
    ///Red
    unsigned char r;
    ///Green
    unsigned char g;
    ///Blue
    unsigned char b;
  };
#endif
  union
  {
    ///Packed color
    dword c;
    ///Packed color
    dword color;
  };
};

The four RGBA components are written byte-by-byte to the required data members, and then returned from the function as value of the dword type (the 4-byte unsigned type). If this code were written in the C language, it would be harmless and the developer would get the expected behavior. In C++ reading the union inactive data member results in undefined behavior. This is because C++ works with objects that should start their lifetime. Only one data member can be active at any given time. The lifetime of the inactive data member (the last union data member) has not started.

Some compilers allow such code to be written as a non-standard compiler extension, and in most cases this code works as expected. However, please note that this code is non-portable to other compilers. This code may work correctly on the current compiler and platform, but if we change a tiny little thing, it may break.

So, how do we write code in C++ that would work the same way always and everywhere? There are several ways to do it. We can keep the four RGBA components in four data members and (if necessary) get their combination as a dword using bitwise operators:

class DColor
{
public:
  ///Blue
  uint8_t b;
  ///Green
  uint8_t g;
  ///Red
  uint8_t r;
  ///Transparency
  uint8_t a;

  dword getAsDword() const noexcept
  {
#ifndef _XBOX
    return (static_cast<uint32_t>(b) << 0)
         | (static_cast<uint32_t>(g) << 8)
         | (static_cast<uint32_t>(r) << 16)
         | (static_cast<uint32_t>(a) << 24);
#else
    return (static_cast<uint32_t>(a) << 0)
         | (static_cast<uint32_t>(r) << 8)
         | (static_cast<uint32_t>(g) << 16)
         | (static_cast<uint32_t>(b) << 24);
#endif
  }
};

There is another way. We can use a 4-byte array that can be converted to a dword with the memcpy function or std::bit_cast since C++20:

class DColor
{
private:
  uint8_t m_components[4];

public:
  DColor(uint8_t r, uint8_t g, uint8_t b, uint8_t a) noexcept; 
  DColor(dword comp) noexcept; // To construct from dword

  DColor& operator=(dword) noexcept; // To be assigned to dword

  uint8_t& R() noexcept;
  uint8_t& G() noexcept;
  uint8_t& B() noexcept;
  uint8_t& A() noexcept;

  dword getAsDword() const noexcept
  {
#if __cplusplus >= 202002L && __has_cpp_attribute(__cpp_lib_bit_cast)
    return std::bit_cast<dword>(m_components);
#else
    dword result;
    memcpy(&result, m_components, sizeof(m_components));
    return result;
#endif
  }
};

My colleague, Andrey Karpov, made a comment on this. The use of conditional compilation directives makes the code less comprehensible. The nesting is broken and then we have to guess what the code is about by going through various code fragments in our mind.

We need to write code not to make it as short as possible, but to make it easier to read and understand. I would make two different implementations of the function here:

#if __cplusplus >= 202002L && __has_cpp_attribute(__cpp_lib_bit_cast)
  dword getAsDword() const noexcept
  {
    return std::bit_cast<dword>(m_components);
  }
#else
  dword getAsDword() const noexcept
  {
    dword result;
    memcpy(&result, m_components, sizeof(m_components));
    return result;
  }
#endif

The code is a little longer, but it's much more readable that way.

If we are dealing with long functions, this will not work with conditional compilation fragments. In this case, we can enclose these conditional compilation fragments in separate functions, which are implemented differently in different cases. It means that we need to separate the general code from the compilation conditions.

Warning N12

V579 [CWE-687, CERT-ARR01-C] The memset function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. Debugger.cpp 282

void appDebuger::debuggerMakeThreadDump(....)
{
  CONTEXT ct;
  memset (&ct, 0, sizeof(&ct));
  // ....
}
typedef struct DECLSPEC_NOINITALL _CONTEXT 
{
  DWORD ContextFlags;
  // ....
  DWORD   SegCs;              // MUST BE SANITIZED
  DWORD   EFlags;             // MUST BE SANITIZED
  DWORD   Esp;
  DWORD   SegSs;
  BYTE   ExtendedRegisters[MAXIMUM_SUPPORTED_EXTENSION];

} CONTEXT;

The developer intended to initialize all structure data members with zeros. The size of the ct object is 716 bytes, and they passed the size of the pointer – 4 bytes. We can fix the memset call this way:

memset (&ct, 0, sizeof(ct));

P.S. Oddly enough, developers are still using this structure initialization pattern with the memset function, although they could write this way:

CONTEXT ct {};

Warning N13

I would like to finish listing the errors with a few amusing typos:

V501 [CWE-571] There are identical sub-expressions 'lleg.to >= 0' to the left and to the right of the '&&' operator. CharacterLegs.cpp 51
V501 [CWE-571] There are identical sub-expressions 'lleg.fo >= 0' to the left and to the right of the '&&' operator. CharacterLegs.cpp 51

void CharacterLegs::Invalidate()
{
  lleg.th = ani->FindBone("Bedro_left_joint"  ,true);
  lleg.kn = ani->FindBone("Golen_left_joint"  ,true);
  lleg.fo = ani->FindBone("Foolt_left_joint"  ,true);
  lleg.to = ani->FindBone("Foolt_left_joint_2",true);

  rleg.th = ani->FindBone("Bedro_right_joint"  ,true);
  rleg.kn = ani->FindBone("Golen_right_joint"  ,true);
  rleg.fo = ani->FindBone( "Foot_right_joint"  ,true);
  rleg.to = ani->FindBone( "Foot_right_joint_2",true);

  if (   lleg.th >= 0 && lleg.kn >= 0 
      && lleg.fo >= 0 && lleg.to >= 0 
      && rleg.th >= 0 && rleg.kn >= 0 
      && lleg.fo >= 0 && lleg.to >= 0 )
  {
    // ....
  }

  // ....
}

Here the analyzer found identical subexpressions in the condition of the if statement — lleg.to >= 0 and lleg.fo >= 0. As you can see, the developer intended to check all the data members but made a typo. Instead of two identical subexpressions there must be checks for other variables:

if (   lleg.th >= 0 && lleg.kn >= 0 
    && rleg.th >= 0 && rleg.kn >= 0 
    && lleg.fo >= 0 && lleg.to >= 0
    && rleg.fo >= 0 && rleg.to >= 0)
{
  // ....
}

Conclusion

The analyzer found various kinds of errors in the project. I have described the Captain Blood's thirteen in this article, yo-ho-ho! You can find other bugs with the PVS-Studio analyzer as well. That's a good incentive to give the analyzer a try (if you have not yet). And it would be even more compelling to see what you can find in your own code or that of your teammates :)

I'd like to thank people who made it possible to finally see the game, as well as those who developed it and supported the game with reviews, articles and discussions. Well, whether the "Arabella" would fo down or not, let the readers decide for themselves!