Overwriting memory - why?

Jan 17 2012
We decided to publish this article in our knowledge base to show programmers how easily private data can leak out of the program that handles them. PVS-Studio has the V597 diagnostic rule, which detects calls of the memset() function that fail to clear memory because the compiler may remove them during optimization. The danger may seem unconvincing and improbable, but this article shows that it is real and must not be ignored.

This is a translation of an article written by an ABBYY employee and first published here: "ABBYY's blog. Overwriting memory - why?". Translation done and published with permission of the copyright holder.

Deep in the Win32 API there is the SecureZeroMemory() function. Its description is rather concise: the function overwrites a memory region with zeroes and is designed so that the compiler never eliminates a call to it during code optimization. The description further says that this function should be used to overwrite memory that previously stored passwords and cryptokeys.
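To make the idea concrete, here is a minimal portable sketch of the same technique, not the Win32 implementation itself. The names scrubMemory() and useSecretAndScrub() are hypothetical and exist only for illustration. The sketch assumes that stores made through a volatile pointer are treated by the compiler as side effects it must keep, which is how compilers behave in practice.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a Win32 or standard function): overwrites `size`
// bytes with zeroes through a volatile pointer, so the stores are not
// treated as removable dead stores.
void scrubMemory(void* data, std::size_t size)
{
    volatile unsigned char* p = static_cast<volatile unsigned char*>(data);
    for (std::size_t i = 0; i < size; ++i) {
        p[i] = 0;
    }
}

// Hypothetical stand-in for a function that handles a secret.
// Returns true if the buffer holds only zeroes after scrubbing.
bool useSecretAndScrub()
{
    char secret[64];
    std::memset(secret, 'S', sizeof(secret));  // pretend a secret was obtained

    // ... the secret would be used here ...

    // A plain memset(secret, 0, sizeof(secret)) at this point may be removed
    // by the optimizer because `secret` is never read afterwards.
    scrubMemory(secret, sizeof(secret));

    for (std::size_t i = 0; i < sizeof(secret); ++i) {
        if (secret[i] != 0) {
            return false;
        }
    }
    return true;
}
```

On Windows, calling SecureZeroMemory( secret, sizeof( secret ) ) plays the role of scrubMemory() here.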

One question remains - why is that needed? One can find some abstract speculation about the risk of the application's memory being written to the swap file, the hibernation file or a crash dump, where an intruder could find it. It looks like paranoia - surely not every intruder can get access to these files.

Actually, there are many more ways to get at data that a program has forgotten to overwrite - sometimes even access to the computer is not needed. Next we will consider an example, and readers can decide for themselves whether this paranoia is reasonable.

All the examples are in pseudocode that suspiciously resembles C++. There is a lot of text and not very clean code below, and later you will see that things are not much better in clean code.

So, in a far-away function we get a cryptokey, a password or a PIN (from here on simply "the secret"), use it and do not overwrite it:

{
    const int secretLength = 1024;
    WCHAR secret[secretLength] = {};
    obtainSecret( secret, secretLength );
    processWithSecret( what, secret, secretLength );
}

In another function that is completely unrelated to the previous one, our application's instance asks another instance for a file with a specified name. This is done using RPC - a dinosaur-age technology available on many platforms and widely used by Windows for interprocess and intercomputer communication.

Usually you have to write an interface specification in IDL to use RPC. It will contain a method specification similar to this:

//MAX_FILE_PATH == 1024
error_status_t rpcRetrieveFile(
    [in] const WCHAR fileName[MAX_FILE_PATH],
    [out] BYTE_PIPE filePipe );

The second parameter here has a special type that facilitates passing data streams of arbitrary length. The first parameter is a character array for the filename.

The MIDL compiler compiles this specification and produces a header file (.h) with this function:

error_status_t rpcRetrieveFile (
  handle_t IDL_handle, 
  const WCHAR fileName[1024], 
  BYTE_PIPE filePipe);

MIDL has added a service parameter here; the second and third parameters are the same as in the specification above.

We call that function:

void retrieveFile( handle_t binding )
{
  WCHAR remoteFileName[MAX_FILE_PATH];
  retrieveFileName( remoteFileName, MAX_FILE_PATH );
  CBytePipeImplementation pipe;
  rpcRetrieveFile( binding, remoteFileName, pipe );           
}

Everything is fine - retrieveFileName() gets a null-terminated string (no, the terminating null character wasn't omitted), the called party receives the string and handles it, i.e. gets the full path to the file, opens it and passes back the data from it.

Everyone is optimistic, and several product releases are shipped with this code, but nobody has noticed the elephant yet. Here it is. From the C++ point of view, the following function parameter

const WCHAR fileName[1024]

is not an array but a pointer to the first array element. The rpcRetrieveFile() function is just a thunk, also generated by MIDL. It packages all its parameters and each time calls the same WinAPI function, NdrClientCall2(), whose meaning is "Windows, could you please execute an RPC call with theeese parameters?", passing it the parameter list. One of the first parameters passed is the format string generated by MIDL from the IDL specification. It looks much like the good old printf().
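The pointer adjustment can be checked at compile time. The following is a small sketch with hypothetical function names (takesArray, takesPointer); char16_t stands in for WCHAR so that the snippet does not depend on Windows headers.

```cpp
#include <cstddef>
#include <type_traits>

// Two declarations that look different but have the same type: the array
// bound in a parameter declaration is discarded by the language.
void takesArray(const char16_t fileName[1024]);
void takesPointer(const char16_t* fileName);

// True when the "array" parameter was adjusted to a pointer.
constexpr bool kParameterIsPointer =
    std::is_same<decltype(takesArray), decltype(takesPointer)>::value;

static_assert(kParameterIsPointer,
              "an array parameter is adjusted to a pointer to its element");

// A real array object, by contrast, does carry its full size.
constexpr std::size_t kRealArrayBytes = sizeof(char16_t[1024]);
static_assert(kRealArrayBytes == 1024 * sizeof(char16_t),
              "a real array has room for all 1024 elements");
```

So the callee has no way to learn how many elements the caller actually filled in; the number 1024 survives only in the IDL-derived type description.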

NdrClientCall2() looks carefully at the received format string and packages the parameters for passing to the other party (this is called marshalling). Each parameter is accompanied by a type specifier, so each parameter is packaged according to its type. In our case, the address of the first array element is passed for the fileName parameter, and the specifier "an array of 1024 items of the WCHAR type" is passed as its type.
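What such type-driven packaging means for a fixed-size array can be sketched roughly as follows. This is not how NdrClientCall2() is implemented; marshalFixedArray() and packetSizeForShortName() are hypothetical names, char16_t stands in for WCHAR, and the file name is made up. The array is zero-initialised here only to keep the sketch well-defined.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

const std::size_t kMaxFilePath = 1024;  // mirrors MAX_FILE_PATH from the IDL

// Hypothetical stand-in for marshalling "an array of 1024 items": it gets
// only the address of the first element and always copies the full array,
// regardless of where the terminating null character is.
std::vector<unsigned char> marshalFixedArray(const char16_t* firstElement)
{
    std::vector<unsigned char> packet(kMaxFilePath * sizeof(char16_t));
    std::memcpy(packet.data(), firstElement, packet.size());
    return packet;
}

// Packs a short, made-up file name and reports how many bytes would be sent.
std::size_t packetSizeForShortName()
{
    char16_t fileName[kMaxFilePath] = {};  // zero-initialised for the sketch
    const char16_t shortName[] = u"C:\\temp\\report.txt";  // 18 chars + null
    std::memcpy(fileName, shortName, sizeof(shortName));
    return marshalFixedArray(fileName).size();
}
```

The packet size depends only on the declared array bound: kMaxFilePath * sizeof(char16_t) bytes, however short the string is.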

Now we have two successive calls in code:

processWithSecret( whatever );
retrieveFile( binding );

The processWithSecret() function occupies 2 Kbytes on the stack to store the secret and forgets about it on return. The retrieveFile() function is then called, and it retrieves a filename whose length is 18 characters (18 characters plus the terminating null - 19 characters in total, i.e. 38 bytes). The filename is again stored on the stack, most likely in the same memory region that the first function used to store the secret.

Then a remote call occurs, and the packing function dutifully packages the whole array (2048 bytes, not 38 bytes) into a packet, which is then sent over the network.

QUITE SUDDENLY the secret is passed over the network. The application never intended to pass the secret over the network, but the secret is passed anyway. This defect is much more convenient to "use" than even looking into the swap file. Who is paranoid now?

The example above looks rather complicated. Here is similar code that you can try on codepad.org:

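#include <cstdio>
#include <cstring>

const int bufferSize = 32;

void first()
{
  char buffer[bufferSize];
  // fill the whole buffer and "forget" about it on return
  memset( buffer, 'A', sizeof( buffer ) );
}

void second()
{
  char buffer[bufferSize];
  // only the first half is written, and no terminating null is stored
  memset( buffer, 'B', bufferSize / 2 );
  // deliberately wrong: reads past the written part (undefined behavior)
  printf( "%s", buffer );
}

int main()
{
  first();
  second();
}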

The code yields undefined behavior. At the moment of writing this post, the result is the following: a string of 16 'B' characters followed by 16 'A' characters.

Now it's just the right time for brandishing pitchforks and torches, for angry shouts that no sane person uses simple arrays and that we must use std::vector, std::string and the CanDoEverything class that handle memory "correctly", and for a holy war worth no fewer than 9 thousand comments.

None of that would actually help in the case above, because the packing function in the depths of RPC would still read more data than the calling code had written. As a result, it would read the data at the adjacent addresses, or (in some cases) the application would crash on an illegal memory access. Those adjacent addresses could again hold data that must not be sent over the network.

Whose fault is it? As usual, it's the developer's fault - it is the developer who misunderstood how the rpcRetrieveFile() function handles the parameters it receives. This results in undefined behavior, which leads to uncontrolled transmission of data over the network. It can be fixed either by changing the RPC interface and altering the code on both sides, or by using an array that is large enough and fully overwriting it before copying the parameter into it.

This is a situation where the SecureZeroMemory() function would help: if the first function overwrote the secret before returning, an error in the second function would at least cause transmission of an overwritten array. Getting a Darwin Award gets harder this way.
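The effect of overwriting the secret can be shown without relying on undefined behavior. In the sketch below a static array, g_region, stands in for the reused stack area; every name in it is hypothetical, and the "transmission" is just a returned string.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

const std::size_t kRegionSize = 32;

// Stands in for the stack area that both functions happen to reuse.
static char g_region[kRegionSize];

// Stores a "secret" in the shared region and optionally scrubs it
// through a volatile pointer before returning.
void handleSecret(bool scrubBeforeReturn)
{
    std::memset(g_region, 'A', kRegionSize);
    if (scrubBeforeReturn) {
        volatile char* p = g_region;
        for (std::size_t i = 0; i < kRegionSize; ++i) {
            p[i] = 0;
        }
    }
}

// Writes a short value into the first half of the region, then "transmits"
// the whole region, as the fixed-size marshalling does.
std::string sendShortName()
{
    std::memset(g_region, 'B', kRegionSize / 2);
    return std::string(g_region, kRegionSize);
}

// Runs both steps in order and returns the bytes that would go out.
std::string simulate(bool scrubBeforeReturn)
{
    std::memset(g_region, 0, kRegionSize);  // start from a clean region
    handleSecret(scrubBeforeReturn);
    return sendShortName();
}
```

Here simulate( false ) returns 16 'B' characters followed by 16 'A' characters of the "secret", while simulate( true ) returns 16 'B' characters followed by 16 zero bytes. The over-read in the second function is still a bug, but it no longer carries the secret.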
