To get a trial key
fill out the form below
Team License (a basic version)
Enterprise License (extended version)
* By clicking this button you agree to our Privacy Policy statement

Request our prices
New License
License Renewal
--Select currency--
USD
EUR
GBP
RUB
* By clicking this button you agree to our Privacy Policy statement

Free PVS-Studio license for Microsoft MVP specialists
* By clicking this button you agree to our Privacy Policy statement

To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

I am interested to try it on the platforms:
* By clicking this button you agree to our Privacy Policy statement

Message submitted.

Your message has been sent. We will email you at


If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

>
>
>
Safe Clearing of Private Data

Safe Clearing of Private Data

Apr 06 2016

We often need to store private data in programs, for example passwords, secret keys, and their derivatives, and we usually need to clear their traces in the memory after using them so that a potential intruder can't gain access to these data. In this article we will discuss why you can't clear private data using memset() function.

0388_Safe_Clearing_of_Private_Data/image1.png

memset()

You may have already read the article discussing vulnerabilities in programs where memset() is used to erase memory. However, that article doesn't fully cover all the possible scenarios of incorrect use of memset(). You may have problems not only with clearing stack-allocated buffers but with clearing dynamically allocated buffers as well.

The stack

For a start, let's discuss an example from the above-mentioned article that deals with using a stack-allocated variable.

Here is a code fragment that handles a password:

#include <string>
#include <functional>
#include <iostream>

//Private data
struct PrivateData
{
  size_t m_hash;
  char m_pswd[100];
};

//Function performs some operations on password
void doSmth(PrivateData& data)
{
  std::string s(data.m_pswd);
  std::hash<std::string> hash_fn;

  data.m_hash = hash_fn(s);
}

//Function for password entering and processing
int funcPswd()
{
  PrivateData data;
  std::cin >> data.m_pswd;

  doSmth(data);
  memset(&data, 0, sizeof(PrivateData));
  return 1;
}

int main()
{
  funcPswd();
  return 0;
}

This example is rather conventional and completely synthetic.

If we build a debug version of that code and run it in the debugger (I was using Visual Studio 2015), we'll see that it works well: the password and its calculated hash value are erased after they have been used.

Let's take a look at the assembler version of our code in the Visual Studio debugger:

.... 
    doSmth(data);
000000013F3072BF  lea         rcx,[data]  
000000013F3072C3  call        doSmth (013F30153Ch)  
  memset(&data, 0, sizeof(PrivateData));
000000013F3072C8  mov         r8d,70h  
000000013F3072CE  xor         edx,edx  
000000013F3072D0  lea         rcx,[data]  
000000013F3072D4  call        memset (013F301352h)  
  return 1;
000000013F3072D9  mov         eax,1  
....

We see the call of memset() function, that clears the private data after use.

We could stop here, but we'll go on and try to build an optimized release version. Now, this is what we see in the debugger:

....
000000013F7A1035  call
        std::operator>><char,std::char_traits<char> > (013F7A18B0h)  
000000013F7A103A  lea         rcx,[rsp+20h]  
000000013F7A103F  call        doSmth (013F7A1170h)  
    return 0;
000000013F7A1044  xor         eax,eax   
....

All the instructions associated with the call to the memset() function have been deleted. The compiler assumes that there is no need to call a function erasing data since they are no longer in use. It's not an error; it's a legal choice of the compiler. From the language viewpoint, a memset() call is not needed since the buffer is not used further in the program, so removing this call cannot affect its behavior. So, our private data remain uncleared, and it's very bad.

The heap

Now let's dig deeper. Let's see what happens to data when we allocate them in dynamic memory using the malloc function or the new operator.

Let's modify our previous code to work with malloc:

#include <string>
#include <functional>
#include <iostream>

struct PrivateData
{
  size_t m_hash;
  char m_pswd[100];
};

void doSmth(PrivateData& data)
{
  std::string s(data.m_pswd);
  std::hash<std::string> hash_fn;

  data.m_hash = hash_fn(s);
}

int funcPswd()
{
  PrivateData* data = (PrivateData*)malloc(sizeof(PrivateData));
  std::cin >> data->m_pswd;
  doSmth(*data);
  memset(data, 0, sizeof(PrivateData));
  free(data);
  return 1;
}

int main()
{
  funcPswd();
  return 0;
}

We'll be testing a release version since the debug version has all the calls where we want them to be. After compiling it in Visual Studio 2015, we get the following assembler code:

.... 
000000013FBB1021  mov         rcx,
        qword ptr [__imp_std::cin (013FBB30D8h)]  
000000013FBB1028  mov         rbx,rax  
000000013FBB102B  lea         rdx,[rax+8]  
000000013FBB102F  call
        std::operator>><char,std::char_traits<char> > (013FBB18B0h)  
000000013FBB1034  mov         rcx,rbx  
000000013FBB1037  call        doSmth (013FBB1170h)  
000000013FBB103C  xor         edx,edx  
000000013FBB103E  mov         rcx,rbx  
000000013FBB1041  lea         r8d,[rdx+70h]  
000000013FBB1045  call        memset (013FBB2A2Eh)  
000000013FBB104A  mov         rcx,rbx  
000000013FBB104D  call        qword ptr [__imp_free (013FBB3170h)]  
    return 0;
000000013FBB1053  xor         eax,eax  
....

Visual Studio has done well this time: it erases the data as planned. But what about other compilers? Let's try gcc, version 5.2.1, and clang, version 3.7.0.

I've modified our code a bit for gcc and clang and added some code to print the contents of the allocated memory block before and after the cleanup. I print the contents of the block the pointer points to after the memory is freed, but you shouldn't do it in real programs because you never know how the application will respond. In this experiment, however, I'm taking the liberty to use this technique.

....
#include "string.h"
....
size_t len = strlen(data->m_pswd);
for (int i = 0; i < len; ++i)
  printf("%c", data->m_pswd[i]);
printf("| %zu \n", data->m_hash);
memset(data, 0, sizeof(PrivateData));
free(data);
for (int i = 0; i < len; ++i)
  printf("%c", data->m_pswd[i]);
printf("| %zu \n", data->m_hash);
....

Now, here's a fragment of the assembler code generated by gcc compiler:

movq (%r12), %rsi
movl $.LC2, %edi
xorl %eax, %eax
call printf
movq %r12, %rdi
call free

The printing function (printf) is followed by a call to the free() function while the call to the memset() function is gone. If we run the code and enter an arbitrary password (for example "MyTopSecret"), we'll see the following message printed on the screen:

MyTopSecret| 7882334103340833743

MyTopSecret| 0

The hash has changed. I guess it's a side effect of the memory manager's work. As for our password "MyTopSecret", it stays intact in the memory.

Let's check how it works with clang:

movq (%r14), %rsi
movl $.L.str.1, %edi
xorl %eax, %eax
callq printf
movq %r14, %rdi
callq free

Just like in the previous case, the compiler decides to remove the call to the memset() function. This is what the printed output looks like:

MyTopSecret| 7882334103340833743

MyTopSecret| 0

So, both gcc and clang decided to optimize our code. Since the memory is freed after calling the memset() function, the compilers treat this call as irrelevant and delete it.

As our experiments reveal, compilers tend to delete memset() calls for the sake of optimization working with both stack and dynamic memory of the application.

Finally, let's see how the compilers will respond when allocating memory using the new operator.

Modifying the code again:

#include <string>
#include <functional>
#include <iostream>
#include "string.h"

struct PrivateData
{
  size_t m_hash;
  char m_pswd[100];
};

void doSmth(PrivateData& data)
{
  std::string s(data.m_pswd);
  std::hash<std::string> hash_fn;

  data.m_hash = hash_fn(s);
}

int funcPswd()
{
  PrivateData* data = new PrivateData();
  std::cin >> data->m_pswd;
  doSmth(*data);
  memset(data, 0, sizeof(PrivateData));
  delete data;
  return 1;
}

int main()
{
  funcPswd();
  return 0;
}

Visual Studio clears the memory as expected:

000000013FEB1044  call        doSmth (013FEB1180h)  
000000013FEB1049  xor         edx,edx  
000000013FEB104B  mov         rcx,rbx  
000000013FEB104E  lea         r8d,[rdx+70h]  
000000013FEB1052  call        memset (013FEB2A3Eh)  
000000013FEB1057  mov         edx,70h  
000000013FEB105C  mov         rcx,rbx  
000000013FEB105F  call        operator delete (013FEB1BA8h)  
    return 0;
000000013FEB1064  xor         eax,eax

The gcc compiler decided to leave the clearing function, too:

call printf
movq %r13, %rdi
movq %rbp, %rcx
xorl %eax, %eax
andq $-8, %rdi
movq $0, 0(%rbp)
movq $0, 104(%rbp)
subq %rdi, %rcx
addl $112, %ecx
shrl $3, %ecx
rep stosq
movq %rbp, %rdi
call _ZdlPv

The printed output has changed accordingly; the data we have entered are no longer there:

MyTopSecret| 7882334103340833743

| 0

But as for clang, it chose to optimize our code in this case as well and cut out the "unnecessary" function:

movq (%r14), %rsi
movl $.L.str.1, %edi
xorl %eax, %eax
callq printf
movq %r14, %rdi
callq _ZdlPv

Let's print the memory's contents:

MyTopSecret| 7882334103340833743 
MyTopSecret| 0

The password remains, waiting for being stolen.

Let's sum it all up. We have found that an optimizing compiler may remove a call to the memset() function no matter what type of memory is used - stack or dynamic. Although Visual Studio didn't remove memset() calls when using dynamic memory in our test, you can't expect it to always behave that way in real-life code. The harmful effect may reveal itself with other compilation switches. What follows from our small research is that one cannot rely on the memset() function to clear private data.

So, what is a better way to clear them?

You should use special memory-clearing functions, which can't be deleted by the compiler when it optimizes the code.

In Visual Studio, for example, you can use RtlSecureZeroMemory. Starting with C11, function memset_s is also available. In addition, you can implement a safe function of your own, if necessary; a lot of examples and guides can be found around the web. Here are some of them.

Solution No. 1.

errno_t memset_s(void *v, rsize_t smax, int c, rsize_t n) {
  if (v == NULL) return EINVAL;
  if (smax > RSIZE_MAX) return EINVAL;
  if (n > smax) return EINVAL;
  volatile unsigned char *p = v;
  while (smax-- && n--) {
    *p++ = c;
  }
  return 0;
}

Solution No. 2.

void secure_zero(void *s, size_t n)
{
    volatile char *p = s;
    while (n--) *p++ = 0;
}

Some programmers go even further and create functions that fill the array with pseudo-random values and have different running time to hinder attacks based on time measuring. Implementations of these can be found on the web, too.

Conclusion

PVS-Studio static analyzer can detect data-clearing errors we have discussed here, and uses diagnostic V597 to signal about the problem. This article was written as an extended explanation of why this diagnostic is important. Unfortunately, many programmers tend to think that the analyzer "picks on" their code and there is actually nothing to worry about. Well, it's because they see their memset() calls intact when viewing the code in the debugger, forgetting that what they see is still just a debug version.

Popular related articles
The way static analyzers fight against false positives, and why they do it

Date: Mar 20 2017

Author: Andrey Karpov

In my previous article I wrote that I don't like the approach of evaluating the efficiency of static analyzers with the help of synthetic tests. In that article, I give the example of a code fragment…
PVS-Studio for Java

Date: Jan 17 2019

Author: Andrey Karpov

In the seventh version of the PVS-Studio static analyzer, we added support of the Java language. It's time for a brief story of how we've started making support of the Java language, how far we've co…
The Evil within the Comparison Functions

Date: May 19 2017

Author: Andrey Karpov

Perhaps, readers remember my article titled "Last line effect". It describes a pattern I've once noticed: in most cases programmers make an error in the last line of similar text blocks. Now I want t…
PVS-Studio ROI

Date: Jan 30 2019

Author: Andrey Karpov

Occasionally, we're asked a question, what monetary value the company will receive from using PVS-Studio. We decided to draw up a response in the form of an article and provide tables, which will sho…
Static analysis as part of the development process in Unreal Engine

Date: Jun 27 2017

Author: Andrey Karpov

Unreal Engine continues to develop as new code is added and previously written code is changed. What is the inevitable consequence of ongoing development in a project? The emergence of new bugs in th…
How PVS-Studio Proved to Be More Attentive Than Three and a Half Programmers

Date: Oct 22 2018

Author: Andrey Karpov

Just like other static analyzers, PVS-Studio often produces false positives. What you are about to read is a short story where I'll tell you how PVS-Studio proved, just one more time, to be more atte…
Free PVS-Studio for those who develops open source projects

Date: Dec 22 2018

Author: Andrey Karpov

On the New 2019 year's eve, a PVS-Studio team decided to make a nice gift for all contributors of open-source projects hosted on GitHub, GitLab or Bitbucket. They are given free usage of PVS-Studio s…
Characteristics of PVS-Studio Analyzer by the Example of EFL Core Libraries, 10-15% of False Positives

Date: Jul 31 2017

Author: Andrey Karpov

After I wrote quite a big article about the analysis of the Tizen OS code, I received a large number of questions concerning the percentage of false positives and the density of errors (how many erro…
The Last Line Effect

Date: May 31 2014

Author: Andrey Karpov

I have studied many errors caused by the use of the Copy-Paste method, and can assure you that programmers most often tend to make mistakes in the last fragment of a homogeneous code block. I have ne…
Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities

Date: Nov 21 2018

Author: Andrey Karpov

A brief description of technologies used in the PVS-Studio tool, which let us effectively detect a large number of error patterns and potential vulnerabilities. The article describes the implementati…

Comments (0)

Next comments
This website uses cookies and other technology to provide you a more personalized experience. By continuing the view of our web-pages you accept the terms of using these files. If you don't want your personal data to be processed, please, leave this site.
Learn More →
Accept