Andrey Karpov

Aug 25 2023

Tags:

#Cpp #CSharp #Knowledge #Java

Common patterns of typos in programming

Aug 25 2023

Author: Andrey Karpov

Research background
The last line effect
The most dangerous memset function
The incorrect comparison functions
The incorrect copy functions
The errors related to data and time
Unlucky numbers: 0, 1, 2
The off-by-one error
Share your experience with us!
PVS-Studio
Additional links:

The developers have an endless amount of ways to make mistakes while writing code. However, sometimes we can find obvious and interesting patterns in how and where developers make mistakes. Let's talk about the code as a "magnet" for typos.

Research background

In order to test and promote the PVS-Studio static code analyzer, we check various open-source projects. If we find errors, we report about them to the authors of projects. We collect errors and write articles about the most interesting cases.

When looking at all these errors, I notice various repeating patterns of typos. With only a few exceptions, they are independent of the programming language. At least, they are common to code written in C, C++, C#, and Java. In this article, I'll outline 7 patterns that I've noticed by now:

The last line effect
The most dangerous memset function
The incorrect comparison functions
The incorrect copy functions
The errors related to data and time
The unlucky numbers: 0, 1, 2
The off-by-one error

The fact that these error patterns are noticeable shows that they are common. It's useful to know how to avoid potentially dangerous coding or how to find errors more efficiently during code reviews. In other words, you'll learn which code attracts errors and how to review code more carefully. PVS-Studio can indeed detect a lot of errors, but it cannot detect them all. So, more attention to them isn't out of place.

The last line effect

It's my first found pattern. I first described it in an article of 2014. Here's the description of the pattern: when writing code blocks of the same type, you may make a mistake in the last block.

Let me explain it by an example of code from the Godot Engine project (C++).

String SoftBody::get_configuration_warning() const {
  ....
  Transform t = get_transform();
  if ((ABS(t.basis.get_axis(0).length() - 1.0) > 0.05 ||
       ABS(t.basis.get_axis(1).length() - 1.0) > 0.05 ||
       ABS(t.basis.get_axis(0).length() - 1.0) > 0.05)) {
    if (!warning.empty())
  ....
}

It's a classic copy-paste error. The developers don't want to retype similar lines of code. That's why, they just double the code line:

ABS(t.basis.get_axis(0).length() - 1.0) > 0.05

After that, they replace 0 by 1 in the second line, but they forget to replace 0 by 2 in the last line. When we separately review such code fragment, it seems surprising to find such simple error. But that's exactly how this pattern looks like.

Here's another example found in the unit tests of the LLVM (C++) project.

TEST(RegisterContextMinidump, ConvertMinidumpContext_x86_64) {
  MinidumpContext_x86_64 Context;
  ....
  Context.rax = 0x0001020304050607;
  Context.rbx = 0x08090a0b0c0d0e0f;
  ....
  Context.eflags = 0x88898a8b;
  Context.cs = 0x8c8d;
  Context.fs = 0x8e8f;
  Context.gs = 0x9091;
  Context.ss = 0x9293;    // <=
  Context.ds = 0x9495;
  Context.ss = 0x9697;    // <=
  llvm::ArrayRef<uint8_t> ContextRef(reinterpret_cast<uint8_t *>(&Context),
                                     sizeof(Context));
  ....
}

Let's take a look at the code lines highlighted with the code comments. This code was hardly written using copy-paste, but an error is still in the last line. The developer hurried and inattentively made a typo when repeatedly writing the value into the ss register instead of the es register.

Note. Another interesting thing is that the error shows how a static analyzer complements unit testing. Unit tests can also contain errors that cause certain untested or not fully tested scenarios. However, writing unit tests for unit tests is not practical. Code analyzers will help you here.

To avoid the impression that such errors are common only in C and C++ languages, I will show you an example of the Java code. I found the following typo in the IntelliJ IDEA Community Edition project (Java).

private static boolean isBeforeOrAfterKeyword(String str, boolean trimKeyword) {
  return (trimKeyword ? LoadingOrder.BEFORE_STR.trim() :
           LoadingOrder.BEFORE_STR).equalsIgnoreCase(str) ||
         (trimKeyword ? LoadingOrder.AFTER_STR.trim() :
           LoadingOrder.AFTER_STR).equalsIgnoreCase(str) ||
         LoadingOrder.BEFORE_STR_OLD.equalsIgnoreCase(str) ||         // <=
         LoadingOrder.BEFORE_STR_OLD.equalsIgnoreCase(str);           // <=
}

Here we notice an obvious pattern. You can see other similar examples in the old article "The last line effect".

Why did I name this pattern in such a way? The "last meters effect" from the world of mountaineering — this is my source of inspiration. I've read the observation that climbers often make a mistake at the end of a climb. The reason is not because they are tired — they are less concentrated. They mistakenly think that they've already reached the mountain top and soon they can rest, and enjoy the end of the journey. The climber loses concentration, rushes to finish the climb and does something wrong.

When writing code, developers can experience the same thing. It's boring to write code blocks of the same type. We will prove it below when we talk about the comparison functions. So, developers rush to finish a monotonous job. When writing the last code line, they are glad that they are done with routine and may start to think about what to write next.

In any case, the concentration declines, and developers make errors. Well, they can make errors randomly, but the observation says it's more likely to happen in the last code block of the same type.

The same story happens with a code review. It's boring to review the code of the same type. That's why the concentration quickly declines, and developers carelessly check the last code lines or don't check them at all. This is how errors occur in code and can remain unnoticed. When you find these errors, their simplicity can surprise.

Tip: When reviewing code, you should pay more attention to the last code lines.

The most dangerous memset function

In addition to common patterns, there are patterns which are specific to a particular language. For example, the memset function causes a great amount of errors. It's mostly used in the C programs, but you can often find it in the C++ code as well.

void* memset( void* dest, int ch, size_t count );           // C
void* std::memset( void* dest, int ch, std::size_t count ); // C++

The memset function copies the value (unsigned char) ch into each of the first count characters of the object pointed to by dest. The description:

The memset is the C variant of the function.
The std::memset is the C++ variant of the function.

In my opinion, this function is the leader of C and C++ languages in attracting bugs to itself. Most often the error is related to incorrect calculation of the buffer size.

Typos occur when unnecessary indirection operator (*) is used or when it is absent where it is needed. Or in the redundant address-of operator (&). Now, let's take a look at examples of such errors.

The developer forgot to dereference the pointer (*). The code fragment is taken from the FAR Manager v2 project for Linux (Far2l, C++). Due to a typo, the sizeof operator evaluates the pointer size instead of the structure size.

int64_t FileList::VMProcess(int OpCode,
                            void *vParam,
                            int64_t iParam)
{
  ....
  PluginInfo *PInfo = (PluginInfo *)vParam;
  memset(PInfo, 0, sizeof(PInfo));
  ....
}

The correct version:

memset(PInfo, 0, sizeof(*PInfo));

The unnecessary address-of operator (&). Here is the code from the Energy Checker SDK project (C). The sizeof operator evaluates the pointer size instead of the size of the WIN32_FIND_DATA structure. As a result, only the first 4 or 8 bytes of the structure will be zeroed out. It depends on whether Win32 or Win64 application is being built.

int plh_read_pl_folder(PPLH_PL_FOLDER_INFO pconfig) {
  ....
  WIN32_FIND_DATA file_data;
  ....
  memset(
    &file_data,
    0,
    sizeof(&file_data)
  );
  ....
}

The correct version: sizeof(file_data). It's even better to get rid of the memset function at all. To do this, you can initialize the object with zeros at the declaration:

WIN32_FIND_DATA file_data = { };

The developer confused & with *. The example is taken from the Wolfenstein 3D project (C). Here, the developer had to dereference the pointer to correctly evaluate the structure size instead of using the address-of operator.

void CG_RegisterItemVisuals( int itemNum ) {
  ....
  itemInfo_t *itemInfo;
  ....
  memset( itemInfo, 0, sizeof( &itemInfo ) );
  ....
}

The buffer overflow. In the Open source emulator PS4 (C++) project, the developer carelessly confused the sizeof argument with the countof argument. They thought that the sizeof operator would evaluate the number of elements. That's why they multiplied this value by the size of one element.

struct GnmCmdPSShader
{
  ....
  uint32_t reserved[27];
};

int PS4API sceGnmSetPsShader350(....)
{
  ....
  memset(param->reserved, 0, sizeof(param->reserved) * sizeof(uint32_t)); 
  return SCE_OK;
}

The correct version of the code:

memset(param->reserved, 0, sizeof(param->reserved));

Note. By the way, the memset function attracts not only typos but other errors as well. You can read about them in the article: "The most dangerous function in the C/C++ world". For example, potential vulnerabilities occur when the developers use the memset function at the end to overwrite private data. Here is the example from the WebRTC project (C++).

void AsyncSocksProxySocket::SendAuth() {
  ....
  char * sensitive = new char[len];
  pass_.CopyTo(sensitive, true);
  request.WriteString(sensitive);  // Password
  memset(sensitive, 0, len);
  delete [] sensitive;
  ....
}

The point is that compilers will delete the call of the memset function to optimize the code. In terms of the compiler, since the buffer is freed, we may not fill it. As a result, the private data will remain in memory. This topic is beyond the scope of the article about typos. So, if you want more details, I suggest you click the following links:

V597: the PVS-Studio diagnostic
CWE-14: Compiler Removal of Code to Clear Buffers
Safe clearing of private data

Tip: Carefully review the code which contains the call of the memset function. Moreover, during a code review, you can try to use a different way to fill a memory.

The memset function is frequently used during initialization to fill all structure fields with zero values. Here is an abstract example:

MyStruct foo;
memset(foo, 0, sizeof(foo));

It's easier and safer to immediately fill structures with zeroes at declaration:

MyStruct foo = { };

If you need to zeroize an array in C++, it's better to use the std::fill function. Let me explain it using an example from the Notepad++ project (C++).

#define CONT_MAP_MAX 50
int _iContMap[CONT_MAP_MAX];
....
DockingManager::DockingManager()
{
  ....
  memset(_iContMap, -1, CONT_MAP_MAX);
  ....
}

The developers think that they initialize an array of the int type, containing 50 elements, with the value -1. But the memset function works with bytes. In fact, the 0xFFFFFFFF value will initialize the first 12 elements. They will be filled with the correct value -1, but that's just luck. One more element will get the 0x0000FFFF value. Other array elements will remain uninitialized.

If we use the std::fill function, it's more difficult to make an error:

// since C++11
std::fill(std::begin(_iContMap), std::end(_iContMap), -1);

// since C++20
std::ranges::fill(_iContMap, -1);

The incorrect comparison functions

It's boring to write the comparison functions. In fact, this helper code is simple and uninteresting. So, there's no place for creativity. We just have to compare all the variables to each other.

That's why developers rush when writing such functions. They rush when reviewing code as well. It's hard to focus on such simple code. Such functions are diagonally viewed during a code review. It leads to many unnoticed typos.

Let's start with a classic typo in a large block of comparisons of the same type. The example is taken from the Apache Flink project (Java).

@Override
public boolean equals(Object o) 
{
  ....
  CheckpointStatistics that = (CheckpointStatistics) o;
  return id == that.id &&
    savepoint == that.savepoint &&
    triggerTimestamp == that.triggerTimestamp &&
    latestAckTimestamp == that.latestAckTimestamp &&
    stateSize == that.stateSize &&
    duration == that.duration &&
    alignmentBuffered == that.alignmentBuffered &&
    processedData == processedData &&
    persistedData == that.persistedData &&
    numSubtasks == that.numSubtasks &&
    numAckSubtasks == that.numAckSubtasks &&
    status == that.status &&
    Objects.equals(checkpointType, that.checkpointType) &&
    Objects.equals(
      checkpointStatisticsPerTask, 
      that.checkpointStatisticsPerTask);
}

Found an error? I guess, no. This code is boring to write and boring to read. You didn't probably care to scrutinize it. Here is a string with a typo.

processedData == processedData

The variable is compared to itself. The correct version:

processedData == that.processedData

We can find the same typos in smaller functions from the eShopOnContainers project (C#).

private bool CheckSameOrigin(string urlHook, string url)
{
  var firstUrl = new Uri(urlHook, UriKind.Absolute);
  var secondUrl = new Uri(url, UriKind.Absolute);

  return firstUrl.Scheme == secondUrl.Scheme &&
         firstUrl.Port == secondUrl.Port &&
         firstUrl.Host == firstUrl.Host;
}

Take a look at the last line. The field of a class is compared to itself. By the way, there's a double magnet for error. Firstly, it's the comparison function. Secondly, here is the last line effect. Bingo! That's why the typo may be not found in a quite simple and short function. The simple and boring function is a major reason to pay attention to it during the code review.

But it turns out that a typo can be made even in such short comparison function. Here is the example from the Skia Graphics Engine project (C++).

inline bool operator==(const SkPDFCanon::BitmapGlyphKey& u,
                       const SkPDFCanon::BitmapGlyphKey& v)
{
  return memcmp(&u, &u, sizeof(SkPDFCanon::BitmapGlyphKey)) == 0;
}

The u object is byte-by-byte compared to itself.

However, errors in the comparison functions are not limited to something that is compared to itself. Errors are diverse. I took an interesting typo from the Azure Service Fabric project as an example (C++).

template <typename TK, typename TV>
static bool MapCompare(const std::map<TK, TV>& lhs,
                       const std::map<TK, TV>& rhs)
{
  if (lhs.size() != rhs.size()) { false; }

  return std::equal(lhs.begin(), lhs.end(), rhs.begin());
}

Take a look at the lonely false keyword. The code is compiled, but doesn't make any sense. Here the developer forgot to write the return statement. The correct version:

if (lhs.size() != rhs.size()) { return false; }

Here is the example from the Roslyn project (C#).

protected override bool AreEqual(object other)
{
  var otherResourceString = other as LocalizableResourceString;
  return
    other != null &&
    _nameOfLocalizableResource ==
      otherResourceString._nameOfLocalizableResource &&
    _resourceManager == otherResourceString._resourceManager &&
    _resourceSource == otherResourceString._resourceSource &&
    ....
}

The NullReferenceException may occur here. The developers need to check the otherResourceString reference instead of other for the null inequality.

You can find other examples of errors in the article "The Evil within the Comparison Functions".

If I say "pay more attention to the comparison functions during code reviews", I don't think that it's going to motivate you. So, I'll answer this question in a more practical way.

Tip: You can use static code analyzers, like PVS-Studio, to assist yourself. Such analyzers carefully work on any code.

You can use the table-style formatting of code.

I wrote more about the table-style formatting of code in my mini-book "60 terrible tips for a C++ developer" — see "Terrible Tip N20. Compact code". The tip starts with explaining how not to do, and then gives a useful tip. The table-style formatting does not guarantee the absence of typos, but it definitely reduces their number.

I should note that since C++20, there is a way to request the compiler to generate the code of the comparison operators: equality, inequality, and relation. You rarely need to write your own comparison functions. This reduces the chance of making an error. Use the modern features of the C++ language!

The incorrect copy functions

While comparison functions seem boring for developers, the implementation of copy functions provokes them to an excessive enthusiasm. I occasionally find errors in such functions. It's interesting that the function text seems overcomplicated to me. Actually, this excessive complexity is the source of errors.

Here is a custom implementation of the strdup function from the Zephyr RTOS project (C).

static char *mntpt_prepare(char *mntpt)
{
  char *cpy_mntpt;

  cpy_mntpt = k_malloc(strlen(mntpt) + 1);
  if (cpy_mntpt) {
    ((u8_t *)mntpt)[strlen(mntpt)] = '\0';
    memcpy(cpy_mntpt, mntpt, strlen(mntpt));
  }
  return cpy_mntpt;
}

The memcpy function copies the line but fails to copy the terminal null. So, additional code is written to copy the terminal null:

((u8_t *)mntpt)[strlen(mntpt)] = '\0';

But it doesn't work! The code contains a typo that causes the terminal null to get copied into itself. Notice that the target array is mntpt, not cpy_mntpt. As a result, the mntpt_prepare function returns a non-null-terminated string.

The correct version:

((u8_t *)cpy_mntpt)[strlen(mntpt)] = '\0';

You may be confused why the developer has written this code in such a messy and strange way. We can simplify it to this version:

static char *mntpt_prepare(char *mntpt)
{
  char *cpy_mntpt;

  cpy_mntpt = k_malloc(strlen(mntpt) + 1);
  if (cpy_mntpt) {
    strcpy(cpy_mntpt, mntpt);
  }
  return cpy_mntpt;
}

The code from the LFortran project (C) — the error in the concatenation of two strings in the new buffer.

void _lfortran_strcat(char** s1, char** s2, char** dest)
{
    int cntr = 0;
    char trmn = '\0';
    int s1_len = strlen(*s1);
    int s2_len = strlen(*s2);
    int trmn_size = strlen(&trmn);
    char* dest_char = (char*)malloc(s1_len+s2_len+trmn_size);
    for (int i = 0; i < s1_len; i++) {
        dest_char[cntr] = (*s1)[i];
        cntr++;
    }
    for (int i = 0; i < s2_len; i++) {
        dest_char[cntr] = (*s2)[i];
        cntr++;
    }
    dest_char[cntr] = trmn;
    *dest = &(dest_char[0]);
}

It's hard to say whether some errors are typos or not. The code above is just the case.

I think I'll take it as a typo after all. The developer used strlen instead of sizeof to determine the size of the terminal null in bytes.

The explanation:

char trmn = '\0';
int trmn_size = strlen(&trmn);

Here, the trmn character is interpreted as an empty string with the zero length. Thus, the trmn_size variable, whose name means the size of the terminal null, will always be 0.

You don't need to evaluate the length of an empty string. It's good idea to use the sizeof operator to evaluate how many bytes a terminal character occupies. Here is the correct code:

int trmn_size = sizeof(trmn);

We can improve this code, but it's beyond the scope of this article. For other improvements, get a chance to read the article "A beautiful error in the implementation of the string concatenation function".

And now we are going to discuss the last example. I took it from the QuantConnect Lean project (C#).

/// <summary>
/// Copy contents of the portfolio collection to a new destination.
/// </summary>
/// <remarks>
/// IDictionary implementation calling the underlying Securities collection
/// </remarks>
/// <param name="array">Destination array</param>
/// <param name="index">Position in array to start copying</param>
public void CopyTo(KeyValuePair<Symbol, SecurityHolding>[] array, int index)
{
  array = new KeyValuePair<Symbol, SecurityHolding>[Securities.Count];
  var i = 0;
  foreach (var asset in Securities)
  {
    if (i >= index)
    {
      array[i] = new KeyValuePair<Symbol,SecurityHolding>(asset.Key,
                                                          asset.Value.Holdings);
    }
    i++;
  }
}

The method gets the collection and immediately overwrites its value. The comment and the method name suggest that one array should be copied to another. However, this will not happen, and the array value outside the current method will remain unchanged.

The case is that the array argument is passed to the method by value, not by reference. Thus, after the assignment operation, the array variable which is available within the method will store a reference to the new object. The variable value within the method will remain unchanged.

This is a version of the typo where the developer forgot the out modifier. Here's the correct code:

public void CopyTo(out KeyValuePair<Symbol, SecurityHolding>[] array,
                   int index)
{
  array = new KeyValuePair<Symbol, SecurityHolding>[Securities.Count];
  ....
}

Tip: Write unit tests for your copy functions.

Indeed, unit tests are useful to write for finding errors of other varieties as well. However, they are more relevant here.

For example, I didn't mention unit tests when we discussed comparison functions. It is a thankless task to check comparison functions with unit tests:

It's boring to write unit tests for comparison functions. Anyway, we have already known that boring code is rarely successful :).
In the unit tests, you need to think about many special scenarios. For example, you need to separately check the correctness of comparison to each class member. The unit test code will be dozens of times larger than the comparison functions' code. Thus, the using of unit tests may not be practical from the benefit-cost ratio perspective.

Copy functions are another matter. Take a look at the above errors again. Even the simplest set of unit tests will detect them all. Unit tests will show that functions return the wrong things.

The errors related to data and time

You can find many typos in the code that manages time and dates. These errors are inherently different. That's why it's hard for me to explain why such code is a magnet for errors. So, I'll just give you a few examples. I invite readers to speculate on this topic in comments.

The example from the Umbraco project (C#) — the constructors of the DateTime class are called incorrectly.

public static DateTime TruncateTo(this DateTime dt,
                                  DateTruncate truncateTo)
{
  if (truncateTo == DateTruncate.Year)
    return new DateTime(dt.Year, 0, 0);
  if (truncateTo == DateTruncate.Month)
    return new DateTime(dt.Year, dt.Month, 0);
  ....
}

The developers made a typo because they got used to numbering everything starting from zero — the array indexes, for example. But here we have dates instead of arrays. The first day or month has the number 1 instead of 0. The call of constructors in the above code leads to ArgumentOutOfRangeException. The correct version:

if (truncateTo == DateTruncate.Year)
  return new DateTime(dt.Year, 1, 1);
if (truncateTo == DateTruncate.Month)
  return new DateTime(dt.Year, dt.Month, 1);

The example from the MPC-HC project (C++) — the uninitialized variable may be used here.

void CSyncAP::RenderThread()
{
  ....
  REFERENCE_TIME rtRefClockTimeNow;
  if (m_pRefClock) {
    m_pRefClock->GetTime(&rtRefClockTimeNow);
  }
  LONG lLastVsyncTime =
    (LONG)((m_llEstVBlankTime - rtRefClockTimeNow) / 10000);
  ....

}

If the condition is not met, the rtRefClockTimeNow variable will remain uninitialized. It will still be used in evaluations. The developers might forget to initialize the variable when declaring it with the default value.

Here is a beautiful typo from the Tizen project (C). The developers made a mistake with the data view.

static void preview_down_cb(....)
{
  ....
  int delay = 0.5;
  double fdelay;
  fdelay = ((double)delay / 1000.0f);
  DbgPrint("Long press: %lf\n", fdelay);
  ....
}

The idea to write the 0.5 value into the int type variable is bad. The delay is actually counted in milliseconds. The correct default delay value of 500 milliseconds should be written like this:

int delay = 500;

Here is the example of the incorrect type conversion from the MSBuild project (C#).

internal static void Trace(....)
{
  ....
  long now = DateTime.UtcNow.Ticks;

  float millisecondsSinceLastLog =
    (float)((now - s_lastLoggedTicks)/10000L);
  ....
}

Here the integer division occurs first, and only then the explicit conversion to the float type is performed. Type conversion should be performed before division, not after. The typo is that developer made a mistake with parentheses. The correct code:

float millisecondsSinceLastLog = (float)(now - s_lastLoggedTicks)/10000L;

Here is an example of a forgotten comma from the Linux Kernel project (C).

static ssize_t lp8788_show_eoc_time(struct device *dev,
        struct device_attribute *attr, char *buf)
{
  struct lp8788_charger *pchg = dev_get_drvdata(dev);
  char *stime[] = { "400ms", "5min", "10min", "15min",
      "20min", "25min", "30min" "No timeout" };
  ....
}

There is no comma between the penultimate and the last string literal. So, they stick together and the result is "30minNo timeout".

I think, it's enough. You can see examples of other errors in articles "February 31" and "Date processing attracts bugs or 77 defects in Qt 6".

I have shown how varied the errors related to dates and time are. In this chapter, there is no pattern of errors. At least I don't see one.

Tip: Just be aware that the code that handles time and dates is difficult and attracts errors for some reason. You should pay more attention to such code during code reviews.

And don't forget about unit tests, they are useful too.

Unlucky numbers: 0, 1, 2

The numbers 0, 1, 2 are frequently used in programming as the array indexes or as part of variable names. The presence of such numbers indicates that the code is typical.

Since the code is typical, it's hard to focus on it. As a result, such code attracts various typos. This relates the error pattern in question to the last line effect and incorrect comparisons.

Let's start with the typos in the indexes

Here is a classic copy-paste error from the LibreOffice project (C++).

Sequence< OUString > FirebirdDriver::
  getSupportedServiceNames_Static() throw (RuntimeException)
{
  Sequence< OUString > aSNS( 2 );
  aSNS[0] = "com.sun.star.sdbc.Driver";
  aSNS[0] = "com.sun.star.sdbcx.Driver";
  return aSNS;
}

To avoid retyping the boring line, the developer copies it. They replace sdbc with sdbcx in the duplicated line, but they forget to correct the array index.

Here is another Copy-Paste error from the Quake III Arena project (C).

int VL_FindAdjacentSurface(....)
{
  ....
  if (fabs(dir[0]) > test->radius ||
      fabs(dir[1]) > test->radius ||
      fabs(dir[1]) > test->radius)
  {
  ....
}

You can see the last line effect again. It's a combo. Such code is a magnet for errors.

Here is a more unexpected typo (the array index is 2 instead of 1) from the OpenCOLLADA project (C++).

struct short2
{
  short values[2];
  short2(short s1, short s2)
  {
    values[0] = s1;
    values[2] = s2;
  }
  ....
};

There is an overlap with the off-by-one error, which we will discuss at the end of the article.

Note to self :). Once again, we see how patterns increase the probability of error. It's an interesting topic. It may be good idea to broach it in the future article.

Here is another example of the typo from the .NET Core SDK project (C#).

public bool MatchesXmlType(IList<XPathItem> seq, int indexType)
{
  ....
  typBase = typBase.Prime;
  for (int i = 0; i < seq.Count; i++)
  {
    if (!CreateXmlType(seq[0]).IsSubtypeOf(typBase))
      return false;
  }

  return true;
}

The first element of the seq array is handled inside the loop all the time. The correct version:

if (!CreateXmlType(seq[i]).IsSubtypeOf(typBase))

Let's take a look at typos in the named variables

Here is the example from the Doom project (C++).

uint AltOp::fixedLength()
{
  uint l1 = exp1->fixedLength();
  uint l2 = exp1->fixedLength();

  if (l1 != l2 || l1 == ~0u)
    return ~0;

  return l1;
}

Most likely, the developer copied and pasted code and forgot to replace exp1 with exp2 in the second line.

Here is the example from the Vangers project (C++).

const char* iGetJoyBtnNameText(int vkey, int lang)
{
  ....
  if (vkey >= VK_STICK_SWITCH_1 && vkey <= VK_STICK_SWITCH_9)
  {
     ret = (lang)
      ? iJoystickStickSwitch2[vkey - VK_STICK_SWITCH_1]
      : iJoystickStickSwitch2[vkey - VK_STICK_SWITCH_1];
    return ret;
  }
  ....
}

Here is the opposite case. The correct version:

ret = (lang)
  ? iJoystickStickSwitch2[vkey - VK_STICK_SWITCH_1]
  : iJoystickStickSwitch1[vkey - VK_STICK_SWITCH_1];

Here is the example from the Boost project (C++).

point3D operator/(const point3D &p1, const point3D &p2)
{
  return point3D(p1.x/p2.x, p1.y/p2.y, p1.z/p1.z);
}

There's a typo here: p1.z/p1.z.

The other variants of typos

Here is the error of formatting from the Azure PowerShell project (C#).

protected override void ProcessRecordInternal()
{
  ....
  if (this.ShouldProcess(this.Name,
    string.Format("Creating Log Alert Rule '{0}' in resource group {0}",
      this.Name, this.ResourceGroupName)))
  ....
}

The correct version:

string.Format("Creating Log Alert Rule '{0}' in resource group {1}",
              this.Name, this.ResourceGroupName))

I think we've looked at enough examples. If you want even more, I suggest checking out the article "Zero, one, two, Freddy's coming for you".

If you have to use numbers in the names of variables or in the array indexing, you will probably find such code boring. Of course, sometimes developers have to write such code anyway. Unfortunately, it's difficult to focus on writing and checking boring code.

Let's take a look again at this code:

point3D operator/(const point3D &p1, const point3D &p2)
{
  return point3D(p1.x/p2.x, p1.y/p2.y, p1.z/p1.z);
}

We can't say that the code is difficult. Just the opposite, it's too simple. We can't be just accurate in writing such code. Instead, we start to think about algorithms and how to write any class better. And this is what we get.

Tip: Make yourself to be more attentive during the code reviews where unlucky numbers 0, 1, 2 occur.

A static code analyzer will be a good assistant during code reviews.

The off-by-one error

This is a long-known error pattern. So, I don't take a credit for its discovery. It was described a long time ago:

Wikipedia. Off-by-one error.
Common Weakness Enumeration. CWE-193: Off-by-one Error.

The most common error is accessing a one-past-the-end array element. I think everyone has encountered them when starting to learn programming. These are the same cases where you were confused by the fact that the elements in the array are numbered from zero instead of one.

So, the errors of this type have been known to everyone for a long time. However, I could detect this error pattern even if I hadn't known about it before. Unfortunately, even experienced developers continue to make such errors. I regularly see them in the code of projects. I'll give you a few examples.

Here is an example of the index typo from the Umbraco project (C#).

protected virtual string VisitMethodCall(MethodCallExpression m)
{
    ....
    if (m.Arguments.Count == 2)
    {
      var n1 = Visit(m.Arguments[0]);
      var f = m.Arguments[2];
      ....
}

The correct code:

var f = m.Arguments[1];

By the way, this code could be referred to the part about unlucky numbers 0, 1, 2. I guess that the mutual attraction of different error patterns doesn't surprise you.

Here is an example of the typo when checking the range of values from the CMake project (C).

static int64_t
expand(struct archive_read *a, int64_t end)
{
  ....
  if ((lensymbol = read_next_symbol(a, &rar->lengthcode)) < 0)
    goto bad_data;

  if (lensymbol > (int)(sizeof(lengthbases)/sizeof(lengthbases[0])))
    goto bad_data;
  ....
  len = lengthbases[lensymbol] + 2;
  ....
}

When accessing the lengthbases array, the array overrun is possible because the developers wrote the operator > instead of the operator >= above. The correct version of check:

if (lensymbol >= (int)(sizeof(lengthbases)/sizeof(lengthbases[0])))

Here is an example of the index typo from the ELKI project (Java).

public List<double[]> points;

@Override
public double[] computeMean() {
  // Not supported except for singletons.
  return points.size() == 1 ? points.get(1) : null;
}

The correct code:

return points.size() == 1 ? points.get(0) : null;

Tip: Even though everyone knows about this type of error, they are still common. Be more careful.

Static and dynamic code analyzers are good helpers in detecting errors of array overrun.

Share your experience with us!

Thank you for your attention! If you know similar patterns of errors, feel free to share them in the comments. First, you can help warn your teammate about such patterns. Second, we may write new diagnostics for the PVS-Studio static analyzer to detect such errors.

I may also find some new patterns and describe them in the future. To make sure you don't miss interesting news, you can subscribe to the monthly digest.

PVS-Studio

The PVS-Studio analyzer helped detect all the bugs described in this article. This tool will help you quickly detect typos and other defects in new code, thereby shortening the cost of fixing them. You can try the trial version of PVS-Studio for free: download PVS-Studio.

Additional links:

#Cpp #CSharp #Knowledge #Java

Tags:

#Cpp #CSharp #Knowledge #Java

Fill out the form in 2 simple steps below:

Your contact information:

Desired license type:

Common patterns of typos in programming

Research background

The last line effect

The most dangerous memset function

The incorrect comparison functions

The incorrect copy functions

The errors related to data and time

Unlucky numbers: 0, 1, 2

Let's start with the typos in the indexes

Let's take a look at typos in the named variables

The other variants of typos

The off-by-one error

Share your experience with us!

PVS-Studio

Additional links:

Posts: articles

Poll:

Comments (0)

Want to try PVS‑Studio for free?

Achievements

PVS-Studio

Licensing

Company