>
>
>
The most dangerous function in the C/C+…

Andrey Karpov
Articles: 671

The most dangerous function in the C/C++ world

After checking hundreds of C/C++ projects of various types, I can claim: memset() is the most inefficient and dangerous function. Most errors that I see in projects are related to the usage of this particular memset() function. I understand that my conclusion is probably neither a revolutionary one, nor an extremely useful one, but I think our readers would be interested to find out why I have come to it.

A few words about me

My name is Andrey Karpov. I do a lot of things in my life. But the main thing that I do, is tell programmers about the benefits of using static code analysis. Of course I do it pursuing an additional goal - I try to raise interest in PVS-Studio. However, this should not lessen the usefulness of my articles.

The only form of advertising that can pierce through the armor of programmers' skepticism, is the demonstration of the bugs that were found by PVS-Studio. For this purpose I run the analyzer on a large number of projects, and write articles about the check results. This brings common benefits. Open-source projects are gradually getting better, and our company is obtaining new customers.

You'll see what I'm leaning to. Doing numerous checks of open-source projects, I have gathered a pretty big collection of various bug examples. And now, based on this, I see interesting error patterns.

For example, one of the most amusing observations was that most often programmers make mistakes using Copy-Paste at the very end. On this topic there is an article "The Last Line Effect" for those who may be interested.

New observation

Now I have another interesting observation. Using one or another function, the programmers can make mistakes. That is clear, you may say. But the probability of the error may also depend on the function. In other words, some functions provoke errors, and some don't.

And now I am ready to name the function which causes the most troubles, and which gives the biggest chance of an epic fail in its use.

So, the biggest looser among the functions is the memset function!

It's hard to say where the root of this evil. Apparently it has an unfortunate interface.

On top of it, its very usage is quite toiling, and it's very easy to get wrong, evaluating values of the actual arguments.

The second "Biggest looser" award goes to the printf() function and its variants. I guess it's no surprise. Only lazy people won't write about the danger of the printf() function. It is probable, that the popularity of the issues related to the printf() functions brought it to the second place.

All in all there are 9055 bugs in my storage. These are errors that PVS-Studio is able to detect. It is clear that this list is far from being a complete one. However, such a large number of bugs allows me to be confident, making such statements about the functions. So, I figured that 329 errors are caused by the memset() function.

In total, about 3,6% of bugs are related to this function! That's a lot, I have to say.

Examples

I've decided to enumerate some typical errors. Looking at them, I think you'll agree that there is something wrong with memset() function. It kind of attracts evil.

To begin with, let's brush up on how this function is declared:

void * memset ( void * ptr, int value, size_t num );

  • ptr - Pointer to the block of memory to fill.
  • value - Value to be set. The value is passed as an int, but the function fills the block of memory using the unsigned char conversion of this value.
  • num - Number of bytes to be set to the value. 'size_t' is an unsigned integral type.

Example N1 (ReactOS project)

void
Mapdesc::identify( REAL dest[MAXCOORDS][MAXCOORDS] )
{
  memset( dest, 0, sizeof( dest ) );
  for( int i=0; i != hcoords; i++ )
    dest[i][i] = 1.0;
}

This error occurred because in C and C++ you cannot pass arrays by value (more details). The argument 'dest' is nothing more than an ordinary pointer. That's why the sizeof() operator evaluates the size of the pointer, not the array.

At first glance, it has nothing to do with memset(). But on the other hand, this function will fill with zeroes only 4 or 8 bytes (exotic architectures don't count). We really have a bug here, and it got there when the memset() function was called.

Example N2 (Wolfenstein 3D project)

typedef struct cvar_s {
  char *name;
  ...
  struct cvar_s *hashNext;
} cvar_t;

void Cvar_Restart_f( void ) {
  cvar_t  *var;
  ...
  memset( var, 0, sizeof( var ) );
  ...
}

A similar bug. It most likely occurred because of the carelessness of a programmer. 'var' variable is a pointer here, which means that memset() will zero out only a part of the structure. But in practice, only 'name' member will be zeroed out.

Example N3 (SMTP Client project)

void MD5::finalize () {
  ...
  uint1 buffer[64];
  ...
  // Zeroize sensitive information
  memset (buffer, 0, sizeof(*buffer));
  ...
}

A very common error pattern that only few programmers are aware of. The thing is that memset() function will be removed by the compiler. The buffer is no longer used after the memset() call. And the compiler removes the function call for the sake of optimization. In terms of C/C++ language it doesn't have any impact on the program performance. The fact that the private information will remain in the memory, will not affect the operation of the program.

It is neither an error of the compiler, nor my imagination. The compiler really removes the memset() calls. And every time I write about this vulnerability error, I get e-mails from people who start arguing with me. I am quite tired of replying to such letters. Therefore, I ask those who are still in doubt to read these materials first before starting a new round of discussion.

Example N4 (Notepad++ project)

#define CONT_MAP_MAX 50
int _iContMap[CONT_MAP_MAX];
...
DockingManager::DockingManager()
{
  ...
  memset(_iContMap, -1, CONT_MAP_MAX);
  ...
}

It is often forgotten that the third argument of memset() function is not the number of elements, but the buffer size in bytes. This is exactly what happened in the code fragment given above. As a result, only one-quarter of the buffer will be filled (on condition that the size of the 'int' type is 4 bytes).

Example N5 (Newton Game Dynamics project)

dgCollisionCompoundBreakable::dgCollisionCompoundBreakable(....)
{
  ...
  dgInt32 faceOffsetHitogram[256];
  dgSubMesh* mainSegmenst[256];
  ...
  memset(faceOffsetHitogram, 0, sizeof(faceOffsetHitogram));
  memset(mainSegmenst, 0, sizeof(faceOffsetHitogram));
  ...
}

Here we definitely see a typo. Most likely someone was too lazy to do the memset() function call twice. The string was duplicated. In one fragment the 'faceOffsetHitogram' was replaced with 'mainSegmenst', but in the other case the programmer forgot to do it.

It turns out that sizeof() doesn't evaluate the size of the array, filled with zeros. We may think - "What does it have in common with the memset() function?" But it is this function that will work incorrectly.

Example N6 (CxImage project)

static jpc_enc_tcmpt_t *tcmpt_create(....)
{
  ...
  memset(tcmpt->stepsizes, 0,
    sizeof(tcmpt->numstepsizes * sizeof(uint_fast16_t)));
  ...
}

There is an extra sizeof() operator. It would be correct to evaluate in such a way:

tcmpt->numstepsizes * sizeof(uint_fast16_t)

But instead of it we had an additional sizeof() and some rubbish as a result.

sizeof(tcmpt->numstepsizes * sizeof(uint_fast16_t))

Here the sizeof() operator evaluates the size of the size_t type. Exactly this expression has exactly this type.

I know that you probably want to make an objection. It is not the first time that the error is related to the sizeof () operator, i.e. the programmer makes an error evaluating the buffer size. However, the cause of these errors is still memset() function. It works in such a way that doing these evaluations you can easily make an error.

Example N7 (project WinSCP)

TForm * __fastcall TMessageForm::Create(....)
{
  ....
  LOGFONT AFont;
  ....   
  memset(&AFont, sizeof(AFont), 0);
  ....
}

Memset() function absorbs everything. That's why it is all right if you confuse the 2nd and the 3rd argument. This is exactly what happened here. This function fills 0 bytes.

Example N8 (Multi Theft Auto project)

Here is another similar error. Win32 API developers were joking when they were writing such a macro:

#define RtlFillMemory(Destination,Length,Fill) \
  memset((Destination),(Fill),(Length))

According to the meaning, it's like an alternative to the memset(). But you have to be careful. Note that the 2nd and 3rd argument change their places.

Sometimes when people start using RtlFillMemory(), they treat it as memset(), and think that they have the same parameters. But as a result they get more bugs.

#define FillMemory RtlFillMemory
LPCTSTR __stdcall GetFaultReason ( EXCEPTION_POINTERS * pExPtrs )
{
  ....
  PIMAGEHLP_SYMBOL pSym = (PIMAGEHLP_SYMBOL)&g_stSymbol ;
  FillMemory ( pSym , NULL , SYM_BUFF_SIZE ) ;
  ....
}

NULL is nothing but a 0. That's why the memset() function filled 0 bytes.

Example N9 (IPP Samples project)

I think you understand that I can provide a large list of the errors we've found. However, it won't be very interesting, because it's boring to look at the same errors, most of which you've already heard about. But let's look at just one more case.

Although some of the errors given above were found in the C++ code, they don't have anything to do with C++. In other words, these programming errors are related to the C language style.

The following error is connected with improper use of the memset() in a C++ program. The example is quite a long one, so you don't have to look too thoroughly at it. Read the description below and everything will become clear.

class _MediaDataEx {
  ...
  virtual bool TryStrongCasting(
    pDynamicCastFunction pCandidateFunction) const;
  virtual bool TryWeakCasting(
    pDynamicCastFunction pCandidateFunction) const;
};

Status VC1Splitter::Init(SplitterParams& rInit)
{
  MediaDataEx::_MediaDataEx *m_stCodes;
  ...
  m_stCodes = (MediaDataEx::_MediaDataEx *)
    ippsMalloc_8u(START_CODE_NUMBER*2*sizeof(Ipp32s)+
                  sizeof(MediaDataEx::_MediaDataEx));
  ...
  memset(m_stCodes, 0, 
    (START_CODE_NUMBER*2*sizeof(Ipp32s)+
    sizeof(MediaDataEx::_MediaDataEx)));
  ...
}

Memset() function is used to initialize an array consisting of class objects. The biggest trouble is that the class has virtual functions. Thereafter, the memset() function zeroes out not only the class fields, but the pointer to the virtual methods chart (vptr) as well. What it will lead to is a good question, but there is nothing positive in coding in such a way. It's no good using the classes like this.

Conclusion

As you can see, the memset() function has an extremely tricky interface. This function provokes way more bugs than the other ones. Be careful!

I don't know how beneficial this knowledge will be to you. But I hope you found it interesting to read this note. Perhaps from now on, you will be more careful using memset(), it would certainly be a good thing.

Thank you all for your attention, and please subscribe to my Twitter @Code_Analysis.

Note

Right after the article was posted, one of our readers sent a link to this interesting article "memset is Evil". I've decided to share it with you as well. One more point that proves that memset() is really dangerous.