Webinar: Parsing C++ - 10.10
This time I want to speak on the 'printf' function. Everybody has heard of software vulnerabilities and that functions like 'printf' are outlaw. But it's one thing to know that you'd better not use these functions, and quite the other to understand why. In this article, I will describe two classic software vulnerabilities related to 'printf'. You won't become a hacker after that but perhaps you will have a fresh look at your code. You might create similar vulnerable functions in your project without knowing that.
STOP. Reader, please stop, don't pass by. You have seen the word "printf", I know. And you're sure that you will now be told a banal story that the function cannot check types of passed arguments. No! It's vulnerabilities themselves that the article deals with, not the things you have thought. Please come and read it.
The previous post can be found here: Part one.
Have a look at this line:
printf(name);
It seems simple and safe. But actually it hides at least two methods to attack the program.
Let's start our article with a demo sample containing this line. The code might look a bit odd. It is, really. We found it quite difficult to write a program so that it could be attacked then. The reason is optimization performed by the compiler. It appears that if you write a too simple program, the compiler creates a code where nothing can be hacked. It uses registers, not the stack, to store data, creates intrinsic functions and so on. We could write a code with extra actions and loops so that the compiler lacked free registers and started putting data into the stack. Unfortunately, the code would be too large and complicated in this case. We could write a whole detective story about all this, but we won't.
The cited sample is a compromise between complexity and the necessity to create a code that would not be too simple for the compiler to get it "collapsed into nothing". I have to confess that I still have helped myself a bit: I have disabled some optimization options in Visual Studio 2010. First, I have turned off the /GL (Whole Program Optimization) switch. Second, I have used the __declspec(noinline) attribute.
Sorry for such a long introduction: I just wanted to explain why my code is such a crock and prevent beforehand any debates on how we could write it in a better way. I know that we could. But we didn't manage to make the code short and show you the vulnerability inside it at the same time.
The complete code and project for Visual Studio 2010 can be found here.
const size_t MAX_NAME_LEN = 60;
enum ErrorStatus {
E_ToShortName, E_ToShortPass, E_BigName, E_OK
};
void PrintNormalizedName(const char *raw_name)
{
char name[MAX_NAME_LEN + 1];
strcpy(name, raw_name);
for (size_t i = 0; name[i] != '\0'; ++i)
name[i] = tolower(name[i]);
name[0] = toupper(name[0]);
printf(name);
}
ErrorStatus IsCorrectPassword(
const char *universalPassword,
BOOL &retIsOkPass)
{
string name, password;
printf("Name: "); cin >> name;
printf("Password: "); cin >> password;
if (name.length() < 1) return E_ToShortName;
if (name.length() > MAX_NAME_LEN) return E_BigName;
if (password.length() < 1) return E_ToShortPass;
retIsOkPass =
universalPassword != NULL &&
strcmp(password.c_str(), universalPassword) == 0;
if (!retIsOkPass)
retIsOkPass = name[0] == password[0];
printf("Hello, ");
PrintNormalizedName(name.c_str());
return E_OK;
}
int _tmain(int, char *[])
{
_set_printf_count_output(1);
char universal[] = "_Universal_Pass_!";
BOOL isOkPassword = FALSE;
ErrorStatus status =
IsCorrectPassword(universal, isOkPassword);
if (status == E_OK && isOkPassword)
printf("\nPassword: OK\n");
else
printf("\nPassword: ERROR\n");
return 0;
}
The _tmain() function calls the IsCorrectPassword() function. If the password is correct or if it coincides with the magic word "_Universal_Pass_!", then the program prints the line "Password: OK". The purpose of our attacks will be to have the program print this very line.
The IsCorrectPassword() function asks the user to specify name and password. The password is considered correct if it coincides with the magic word passed into the function. It is also considered correct if the password's first letter coincides with the name's first letter.
Regardless of whether the correct password is entered or not, the application shows a welcome window. The PrintNormalizedName() function is called for this purpose.
The PrintNormalizedName() function is of the most interest. It is this function where the "printf(name);" we're discussing is stored. Think of the way we can exploit this line to cheat the program. If you know how to do it, you don't have to read further.
What does the PrintNormalizedName() function do? It prints the name making the first letter capital and the rest letters small. For instance, if you enter the name "andREy2008", it will be printed as "Andrey2008".
Suppose we don't know the correct password. But we know that there is some magic password somewhere. Let's try to find it using printf(). If this password's address is stored somewhere in the stack, we have certain chances to succeed. Any ideas how to get this password printed on the screen?
Here is a tip. The printf() function refers to the family of variable-argument functions. These functions work in the following way. Some amount of data is written into the stack. The printf() function doesn't know how many data is pushed and what type they have. It follows only the format string. If it reads "%d%s", then the function should extract one value of the int type and one pointer from the stack. Since the printf() function doesn't know how many arguments it has been passed, it can look deeper into the stack and print data that have nothing to do with it. It usually causes access violation or printing trash. And we may exploit this trash.
Let's see how the stack might look at the moment when calling the printf() function:
Figure 1. Schematic arrangement of data in the stack.
The "printf(name);" function's call has only one argument which is the format string. It means that if we type in "%d" instead of the name, the program will print the data that lie in the stack before the PrintNormalizedName() function's return address. Let's try:
Name: %d
Password: 1
Hello, 37
Password: ERROR
This action has little sense in it for now. First of all, we have at least to print the return addresses and all the contents of the char name[MAX_NAME_LEN + 1] buffer which is located in the stack too. And only then we may get to something really interesting.
If an attacker cannot disassemble or debug the program, he/she cannot know for sure if there is something interesting in the stack to be found. He/she still can go the following way.
First we can enter: "%s". Then "%x%s". Then "%x%x%s" and so on. Doing so, the hacker will search through the data in the stack in turn and try to print them as a line. It helps the intruder that all the data in the stack are aligned at least on a 4-byte boundary.
To be honest, we won't succeed if we go this way. We will exceed the limit of 60 characters and have nothing useful printed. "%f" will help us - it is intended to print values of the double type. So, we can use it to move along the stack with an 8-byte step.
Here it is, our dear line:
%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%x(%s)
This is the result:
Figure 2. Printing the password. Click on the picture to enlarge it.
Let's try this line as the magic password:
Name: Aaa
Password: _Universal_Pass_!
Hello, Aaa
Password: OK
Hurrah! We have managed to find and print the private data which the program didn't intend to give us access to. Note also that you don't have to get access to the application's binary code itself. Diligence and persistence are enough.
You should give a wider consideration to this method of getting private data. When developing software containing variable-argument functions, think it over if there are cases when they may be the source of data leakage. It can be a log-file, a batch passed on the network and the like.
In the case we have considered, the attack is possible because the printf() function receives a string that may contain control commands. To avoid this, you just need to write it in this way:
printf("%s", name);
Do you know that the printf() function can modify memory? You must have read about it but forgotten. We mean the "%n" specifier. It allows you to write a number of characters, already printed by the printf() function, by a certain address.
To be honest, an attack based on the "%n" specifier is just of a historical character. Starting with Visual Studio 2005, the capability of using "%n" is off by default. To perform this attack, I had to explicitly allow this specifier. Here is this magic trick:
_set_printf_count_output(1);
To make it clearer, let me give you an example of using "%n":
int i;
printf("12345%n6789\n", &i);
printf( "i = %d\n", i );
The program's output:
123456789
i = 5
We have already found out how to get to the needed pointer in the stack. And now we have a tool that allows us to modify memory by this pointer.
Of course, it's not very much convenient to use it. To start with, we can write only 4 bytes at a time (int type's size). If we need a larger number, the printf() function will have to print very many characters first. To avoid this we may use the "%00u" specifier: it affects the value of the current number of output bytes. Let's not go deep into the detail.
Our case is simpler: we just have to write any value not equal to 0 into the isOkPassword variable. This variable's address is passed into the IsCorrectPassword() function, which means that it is stored somewhere in the stack. Do not be confused by the fact that the variable is passed as a reference: a reference is an ordinary pointer at the low level.
Here is the line that will allow us to modify the IsCorrectPassword variable:
%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f %n
The "%n" specifier does not take into account the number of characters printed by specifiers like "%f". That's why we make one space before "%n" to write value 1 into isOkPassword.
Let's try:
Figure 3. Writing into memory. Click on the picture to enlarge it.
Are you impressed? But that's not all. We may perform writing by virtually any address. If the printed line is stored in the stack, we may get the needed characters and use them as an address.
For example, we may write a string containing characters with codes 'xF8', 'x32', 'x01', 'x7F' in a row. It turns out that the string contains a hard-coded number equivalent to value 0x7F0132F8. We add the "%n" specifier at the end. Using "%x" or other specifiers we can get to the coded number 0x7F0132F8 and write the number of printed characters by this address. This method has some limitations, but it is still very interesting.
We may say that an attack of the second type is hardly possible nowadays. As you see, support of the "%n" specifier is off in contemporary libraries by default. But you may create a self-made mechanism subject to this kind of vulnerabilities. Be careful when external data input into your program manage what and where is written into memory.
Particularly in our case, we may avoid the problem by writing the code in this way:
printf("%s", name);
We have considered only two simple examples of vulnerabilities here. Surely, there are much more of them. We don't make an attempt to describe or at least enumerate them in this article; we wanted to show you that even such a simple construct like "printf(name)" can be dangerous.
There is an important conclusion to draw from all this: if you are not a security expert, you'd better follow all the recommendations to be found. Their point might be too subtle for you to understand the whole range of dangers on yourself. You must have read that the printf() function is dangerous. But I'm sure that many of you reading this article have learned only now how deep the rabbit hole is.
If you create an application that is potentially an attack object, be very careful. What is quite safe code from your viewpoint might contain a vulnerability. If you don't see a catch in your code, it doesn't mean there isn't any.
Follow all the compiler's recommendations on using updated versions of string functions. We mean using sprintf_s instead of sprintf and so on.
It's even better if you refuse low-level string handling. These functions are a heritage of the C language. Now we have std::string and we have safe methods of string formatting such as boost::format or std::stringstream.
P.S. Some of you, having read the conclusions, may say: "well, it's as clear as day". But be honest to yourself. Did you know and remember that printf() can perform writing into memory before you read this article? Well, and this is a great vulnerability. At least, it used to. Now there are others, as much insidious.
0