This article discusses the new capabilities of the C++ language described in the C++0x standard and supported in Visual Studio 2010. Using PVS-Studio as an example, we will see how the changes in the language influence static code analysis tools.
The new C++ language standard is about to come into our life. It is still called C++0x, although its final name seems to be C++11. The new standard is partially supported by modern C++ compilers, for example, Intel C++ and Visual C++. This support is far from complete, and it is quite clear why. First, the standard has not been accepted yet, and second, it will take some time to introduce its specifics into compilers even when it is accepted.
Compiler developers are not the only ones for whom support of the new standard is important. Static source code analyzers must also support the language innovations quickly. The new standard promises backward compatibility: existing C++ code is almost guaranteed to compile correctly with new compilers without any modifications. But this does not mean that a program containing no new language constructs can still be processed by a static analyzer that does not support C++0x. We became convinced of this in practice when trying to check a project created in the beta version of Visual Studio 2010 with PVS-Studio. The problem lies in the header files, which already use the new language constructs. For example, you may see that the header file "stddef.h" uses the new operator decltype:
namespace std { typedef decltype(__nullptr) nullptr_t; }
Such constructs are naturally considered syntactically wrong by an analyzer that does not support C++0x, and they either cause the analyzer to abort or lead to incorrect results. It became obvious that we had to provide support for C++0x in PVS-Studio by the time Visual Studio 2010 is released, at least to the extent it is supported in this compiler.
We may say that we have fulfilled this task with success, and by the moment of writing this article, the new version PVS-Studio 3.50, integrating both into Visual Studio 2005/2008 and Visual Studio 2010, has become available on our site. Beginning with the version PVS-Studio 3.50, the tool provides support for the same part of C++0x standard as in Visual Studio 2010. This support is not perfect as, for example, in case of "right-angle brackets", but we will continue the work on developing the support for C++0x standard in the next versions.
In this article, we will study the new features of the language which are supported in the first edition of Visual Studio 2010. We will look at these features from different viewpoints: what this or that new ability is about, if there is a relation to 64-bit errors, how the new language construct is supported in PVS-Studio and how its appearance impacts the library VivaCore.
Note. VivaCore is a library for code parsing, analysis and transformation. It is an open-source library that supports the C and C++ languages. PVS-Studio is based on VivaCore, and other software projects may be built on this library as well.
The article we want to present may be called a report on the investigation and support of the new standard in PVS-Studio. The tool PVS-Studio diagnoses 64-bit and parallel OpenMP errors. But since the topic of moving to 64-bit systems is more relevant at the moment, we will mostly consider examples that show how to detect 64-bit errors with PVS-Studio.
Like in C, the type of a variable in C++ must be defined explicitly. But with the appearance of template types and techniques of template metaprogramming in C++ language, it became usual that the type of an object is not so easy to define. Even in a rather simple case - when searching for array items - we need to define the type of an iterator in the following way:
for (vector<int>::iterator itr = myvec.begin();
itr != myvec.end();
++itr)
Such constructs are very long and cumbersome. To make the notation briefer, we may use typedef, but it spawns new entities and does little for convenience.
C++0x offers its own technique to make this issue a bit less complicated. The meaning of the key word auto is replaced with a different one in the new standard. While auto has meant before that a variable is created in the stack, and it was implied if you had not specified otherwise (for example, register), now it is analogous to var in C# 3.0. The type of a variable defined as auto is determined by the compiler itself relying on what object initializes this variable.
We should notice that an auto-variable cannot store values of different types during one instance of program execution. C++ still remains a statically typed language, and by using auto we just tell the compiler to see to defining the type on its own: once the variable is initialized, its type cannot be changed.
Now the iterator can be defined in this way:
for (auto itr = myvec.begin(); itr != myvec.end(); ++itr)
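That the type is fixed at compile time can be checked directly. The following is only a minimal sketch: the function name SumWithAuto is made up for illustration, and the static_assert simply states the type we expect the compiler to deduce for a constant vector.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical helper: sums a vector using an iterator declared with auto.
int SumWithAuto(const std::vector<int>& myvec)
{
  int sum = 0;
  for (auto itr = myvec.begin(); itr != myvec.end(); ++itr)
  {
    // For a const vector, begin() returns const_iterator,
    // so this is the type auto is deduced to at compile time.
    static_assert(std::is_same<decltype(itr),
                               std::vector<int>::const_iterator>::value,
                  "auto deduced an unexpected iterator type");
    sum += *itr;
  }
  return sum;
}
```

For a vector holding 1, 2 and 3, the check assert(SumWithAuto(v) == 6) should hold.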
Besides mere convenience of writing the code and its simplification, the key word auto makes the code safer. Let us consider an example where auto will be used to make the code safe from the viewpoint of 64-bit software development:
bool Find_Incorrect(const string *arrStr, size_t n)
{
  for (size_t i = 0; i != n; ++i)
  {
    unsigned n = arrStr[i].find("ABC");
    if (n != string::npos)
      return true;
  }
  return false;
}
This code has a 64-bit error: the function behaves correctly when compiling the Win32 version and fails when the code is built in the Win64 mode. The error is in using the type unsigned for the variable "n", although the type string::size_type must be used which is returned by the function find(). In the 32-bit program, the types string::size_type and unsigned coincide and we get correct results. In the 64-bit program, string::size_type and unsigned do not coincide any more. When the substring is not found, the function find() returns the value string::npos that equals 0xFFFFFFFFFFFFFFFFui64. This value is cut to the value 0xFFFFFFFFu and placed into a 32-bit variable. As a result, the condition 0xFFFFFFFFu != 0xFFFFFFFFFFFFFFFFui64 is true and we have the situation when the function Find_Incorrect always returns true.
In this example, the error is not so dangerous because it is detected even by the compiler not to speak of a specialized analyzer Viva64 (included into PVS-Studio).
This is how the compiler detects the error:
warning C4267: 'initializing' :
conversion from 'size_t' to 'unsigned int', possible loss of data
This is how Viva64 does it:
V103: Implicit type conversion from memsize to 32-bit type.
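The truncation described for Find_Incorrect can be reproduced on any platform with fixed-width integer types. This is a sketch under the assumption that a 64-bit string::npos is modelled by UINT64_MAX; the function name TruncatedNposDiffers is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models storing a 64-bit "not found" value in a 32-bit variable
// and then comparing it with the original 64-bit constant.
bool TruncatedNposDiffers()
{
  const std::uint64_t npos64 = UINT64_MAX;                     // 0xFFFFFFFFFFFFFFFF
  const std::uint32_t n = static_cast<std::uint32_t>(npos64);  // 0xFFFFFFFF
  // n is widened back to 0x00000000FFFFFFFF, so the two values differ.
  return n != npos64;
}
```

The function returns true, which corresponds to the condition "n != string::npos" always being fulfilled in the 64-bit build.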
What is most important, this error is quite possible and often occurs in code due to inaccurate choice of a type to store the returned value. The error might appear even because the programmer is reluctant to use a cumbersome construct of the string::size_type kind.
Now we can easily avoid such errors without overloading the code. Using the type auto, we may write the following simple and safe code:
auto n = arrStr[i].find("ABC");
if (n != string::npos)
return true;
The error disappeared by itself. The code has not become more complicated or less effective. Here is the conclusion - it is reasonable in many cases to use auto.
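Put together, the corrected function might look as follows. The name Find_Correct and the parameter name count are our assumptions for this sketch; the parameter was renamed only so that it does not clash with a local variable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical corrected counterpart of Find_Incorrect.
bool Find_Correct(const std::string* arrStr, std::size_t count)
{
  assert(arrStr != nullptr || count == 0);
  for (std::size_t i = 0; i != count; ++i)
  {
    // auto is deduced as std::string::size_type, so npos is not truncated.
    auto pos = arrStr[i].find("ABC");
    if (pos != std::string::npos)
      return true;
  }
  return false;
}
```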
The key word auto will reduce the number of 64-bit errors or let you eliminate them with more grace. But auto does not in itself guarantee that all the 64-bit errors will be eliminated! It is just one more language tool that serves to make programmers' life easier but not to take all their work of managing the types. Consider this example:
void *AllocArray3D(int x, int y, int z,
                   size_t objectSize)
{
  int size = x * y * z * objectSize;
  return malloc(size);
}
The function must calculate the array's size and allocate the necessary memory amount. It is logical to expect that this function will be able to allocate the necessary memory amount for the array of the size 2000*2000*2000 of double type in the 64-bit environment. But the call of the "AllocArray3D(2000, 2000, 2000, sizeof(double));" kind will always return NULL, as if it is impossible to allocate such an amount of memory. The true reason for this is the overflow in the expression "int size = x * y * z * sizeof(double)". The variable size takes the value -424509440 and the further call of the function malloc is senseless. By the way, the compiler will also warn that this expression is unsafe:
warning C4267: 'initializing' :
conversion from 'size_t' to 'int', possible loss of data
Relying on auto, an inaccurate programmer may modify the code in the following way:
void *AllocArray3D(int x, int y, int z,
                   size_t objectSize)
{
  auto size = x * y * z * objectSize;
  return (double *)malloc(size);
}
But it will not eliminate the error at all and will only hide it. The compiler will not generate a warning any more but the function AllocArray3D will still return NULL.
The type of the variable size will automatically turn into size_t. But the overflow occurs when calculating the expression "x * y * z". This subexpression has the type int at first and only then it will be extended to size_t when being multiplied by the variable "objectSize".
Now this hidden error may be found only with the help of Viva64 analyzer:
V104: Implicit type conversion to memsize type in an
arithmetic expression.
The conclusion - you must be attentive even if you use auto.
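One possible way to actually remove the overflow is to convert the operands to size_t before multiplying. This is a sketch rather than the only fix: the names ArrayBytes3D and AllocArray3D_Fixed are hypothetical, the dimensions are assumed to be non-negative, and on a 32-bit system the product may of course still exceed what size_t can hold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: every multiplication is carried out in size_t.
std::size_t ArrayBytes3D(int x, int y, int z, std::size_t objectSize)
{
  assert(x >= 0 && y >= 0 && z >= 0);
  return static_cast<std::size_t>(x) * static_cast<std::size_t>(y)
       * static_cast<std::size_t>(z) * objectSize;
}

// Allocates the array; the caller releases it with free().
void* AllocArray3D_Fixed(int x, int y, int z, std::size_t objectSize)
{
  return std::malloc(ArrayBytes3D(x, y, z, objectSize));
}
```

With a 64-bit size_t, ArrayBytes3D(2000, 2000, 2000, sizeof(double)) yields 64000000000 when double occupies 8 bytes, so malloc receives the intended size.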
Let us now briefly look at how the new key word is supported in the VivaCore library, on which the static analyzer Viva64 is based. The analyzer must be able to understand that the variable AA has the type int in order to warn the programmer (see V101) about the extension of AA to the type size_t:
void Foo(int X, int Y)
{
auto AA = X * Y;
size_t BB = AA; //V101
}
First of all, a new table of lexemes was composed that included the new C++0x key words. This table is stored in the file Lex.cc and has the name tableC0xx. To avoid modifying the obsolete code responsible for processing the lexeme "auto" (tkAUTO), it got the name tkAUTOcpp0x in this table.
With the appearance of the new lexeme, the following functions were modified: isTypeToken, optIntegralTypeOrClassSpec. A new class LeafAUTOc0xx appeared. TypeInfoId has a new object class - AutoDecltypeType.
To code the type auto, the letter 'x' was chosen and it was reflected in the functions of the classes TypeInfo and Encoding. These are, for example, such functions as IsAutoCpp0x, MakePtree.
These corrections let you parse the code with the key word auto that has a new meaning and save the type of objects in the coded form (letter 'x'). But this does not let you know what type is actually assigned to the variable. That is, VivaCore lacks the functionality that would let you make sure that the variable AA in the expression "auto AA = X * Y" will have the type int.
This functionality is implemented in the source code of Viva64 and cannot be integrated into the code of the VivaCore library. The implementation principle lies in additional work of calculating the type in the TranslateAssignInitializer method. After the right side of the expression is calculated, the binding (Bind) between the variable's name and its type is replaced with a new one.
In some cases it is useful to "copy" the type of some object. The key word auto determines the type relying on the expression used to initialize the variable. If the variable is not initialized, you may use the key word decltype to determine the type of the expression during compilation. Here is an example of code where the variable "value" has the type returned by the function Calc():
decltype(Calc()) value;
try {
value = Calc();
}
catch(...) {
throw;
}
You may use decltype to define the type:
void f(const vector<int>& a,
       vector<float>& b)
{
  typedef decltype(a[0]*b[0]) Tmp;
  for (size_t i=0; i<b.size(); ++i)
  {
    Tmp* p = new Tmp(a[i]*b[i]);
    // ...
  }
}
Keep in mind that the type defined with decltype may differ from that defined with auto.
const std::vector<int> v(1);
auto a = v[0];
decltype(v[0]) b = 1;
// type a - int
// type b - const int& (returned value
// std::vector<int>::operator[](size_type) const)
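The difference can be verified with std::is_same. The snippet below is a small sketch; the function name AutoAndDecltypeDiffer is made up for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Reports whether auto and decltype deduce the types stated above
// for an element of a const std::vector<int>.
bool AutoAndDecltypeDiffer()
{
  const std::vector<int> v(1);
  auto a = v[0];            // int: the reference and const are dropped
  decltype(v[0]) b = v[0];  // const int&: the exact type of the expression
  return std::is_same<decltype(a), int>::value
      && std::is_same<decltype(b), const int&>::value
      && a == b;
}
```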
Let us look at another sample where decltype can be useful from the viewpoint of 64 bits. The function IsPresent searches for an element in a sequence and returns true if it is found:
bool IsPresent(char *array,
               size_t arraySize,
               char key)
{
  for (unsigned i = 0; i < arraySize; i++)
    if (array[i] == key)
      return true;
  return false;
}
This function cannot work on a 64-bit system with large arrays. If the variable arraySize has a value greater than UINT_MAX and the key is not found earlier, the condition "i < arraySize" will always be true: the 32-bit counter "i" wraps around to 0 before it can reach arraySize, and an eternal loop occurs.
If we use the key word auto, it will not change anything:
for (auto i = 0; i < arraySize; i++)
if (array[i] == key)
return true;
The variable "i" will have the type int because 0 has int type. The appropriate correction of the error lies in using decltype:
for (decltype(arraySize) i = 0; i < arraySize; i++)
if (array[i] == key)
return true;
Now the counter "i" has the type size_t as well as the variable arraySize.
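For completeness, here is how the whole function might look after the correction. It is a sketch: the name IsPresentFixed is hypothetical, and the static_assert only documents the type we expect decltype to produce.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical corrected version of IsPresent.
bool IsPresentFixed(const char* array, std::size_t arraySize, char key)
{
  assert(array != nullptr || arraySize == 0);
  // The counter copies the type of arraySize, whatever that type is.
  for (decltype(arraySize) i = 0; i < arraySize; i++)
  {
    static_assert(std::is_same<decltype(i), std::size_t>::value,
                  "the counter should have the same type as arraySize");
    if (array[i] == key)
      return true;
  }
  return false;
}
```

If the type of arraySize is changed later, the counter follows it automatically.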
decltype is supported in the VivaCore library much like auto. A new lexeme tkDECLTYPE was added, together with the parsing function rDecltype in the file Parser.cc. With the appearance of the new lexeme, we had to modify the function optIntegralTypeOrClassSpec. A new class LeafDECLTYPE appeared.
To code the type returned by the operator decltype, the character 'X' was chosen (capital 'X' unlike lower-case 'x' used for auto). Because of this, the functionality of the classes TypeInfo and Encoding changed too: for example, the functions WhatIs, IsDecltype, MakePtree.
The functionality of calculating the types for decltype operator is implemented in the class Environment and included into VivaCore library. The type is calculated while writing a new variable/type into Environment (the functions RecordTypedefName, RecordDeclarator, RecordConstantDeclarator). The function FixIfDecltype is responsible for calculating the type.
In the standard C++98, temporary objects can be passed into functions but only as a constant reference (const &). Therefore, a function cannot determine if it is a temporary object or a common one which is also passed as const &.
In C++0x, a new type of reference is added: the R-value reference. It is declared as "TYPE_NAME &&" and may bind to a temporary as a non-constant object that can be legally modified. This innovation lets you take account of temporary objects and implement move semantics. For example, if std::vector is created as a temporary object or returned from a function, its internal data can simply be moved into the new object instead of being copied. The move constructor of std::vector receives a reference to the temporary object, copies the pointer to the array stored in that object, and then leaves the temporary empty.
The move constructor or move operator may be defined in the following way:
template<class T> class vector {
  // ...
  vector(const vector&);            // copy constructor
  vector(vector&&);                 // move constructor
  vector& operator=(const vector&); // copy assignment
  vector& operator=(vector&&);      // move assignment
};
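To show what such operations do inside, here is a minimal sketch of a class that owns a heap array. The class IntBuffer is hypothetical and is not how std::vector is actually implemented; it only illustrates the idea of taking over the pointer and leaving the source empty.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical minimal owner of a heap array of int.
class IntBuffer
{
public:
  explicit IntBuffer(std::size_t n) : data_(new int[n]()), size_(n) {}
  ~IntBuffer() { delete[] data_; }

  // Move constructor: take the pointer, leave the source empty.
  IntBuffer(IntBuffer&& other) : data_(other.data_), size_(other.size_)
  {
    other.data_ = nullptr;
    other.size_ = 0;
  }

  // Move assignment: release own storage, then take over the source's.
  IntBuffer& operator=(IntBuffer&& other)
  {
    if (this != &other)
    {
      delete[] data_;
      data_ = other.data_;
      size_ = other.size_;
      other.data_ = nullptr;
      other.size_ = 0;
    }
    return *this;
  }

  std::size_t size() const { return size_; }
  const int* data() const { return data_; }

private:
  IntBuffer(const IntBuffer&);             // copying is left out of this sketch
  IntBuffer& operator=(const IntBuffer&);
  int* data_;
  std::size_t size_;
};
```

After "IntBuffer b(std::move(a));" the object b owns the array that a used to own, and a.data() is a null pointer; no elements are copied.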
From the viewpoint of analyzing 64-bit errors in code, it does not matter if '&' or '&&' is processed when defining the type. Therefore, the support of this innovation in VivaCore is very simple. Only the function optPtrOperator of Parser class underwent some modifications: we consider '&' and '&&' equally there.
From the viewpoint of C++98 standard, the following construct has a syntactical error:
list<vector<string>> lvs;
To avoid it, we should input a space between the two right angle brackets:
list<vector<string> > lvs;
The standard C++0x makes it legal to use double closing brackets when defining template types without adding a space between them. As a result, it enables us to write a bit more elegant code.
It is important to implement support for this innovation in static analyzers because developers will be very glad to avoid adding a lot of unnecessary spaces.
At the moment, parsing of template type definitions with ">>" is not implemented very well in VivaCore. In some cases the analyzer makes mistakes, and it seems that in time we will have to significantly rework the parts of the analyzer responsible for template parsing. Until that is done, you will meet the following ugly functions, which use heuristics to determine whether we are dealing with the shift operator ">>" or with part of a template type definition such as "A<B<C>> D": IsTemplateAngleBrackets, isTemplateArgs. Those who want to know how to solve this task correctly should see the document "Right Angle Brackets (N1757)".
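The two meanings of ">>" that a parser has to tell apart can be seen side by side in the following sketch. The names Constant, Grid, Two and ShiftedConstant are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical template with a non-type parameter.
template <int N>
struct Constant
{
  static const int value = N;
};

// Here ">>" closes two template argument lists.
typedef std::vector<std::vector<int>> Grid;

// Here ">>" is a shift. Under the C++0x rules it must be parenthesized,
// otherwise the first '>' would end the template argument list.
typedef Constant<(8 >> 2)> Two;

int ShiftedConstant()
{
  return Two::value;  // 8 >> 2
}
```

ShiftedConstant() returns 2, while Grid is an ordinary vector of vectors.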
Lambda-expressions in C++ are a brief way of writing anonymous functors (objects that can be used as functions). Let us touch upon some history. In C, pointers to a function are used to create functors:
/* callback-function */
int compare_function(int A, int B) {
  return A < B;
}
/* definition of sorting function */
void mysort(int* begin_items,
            int num_items,
            int (*cmpfunc)(int, int));
int main(void) {
  int items[] = {4, 3, 1, 2};
  mysort(items,
         sizeof(items)/sizeof(int),
         compare_function);
  return 0;
}
Earlier, the functor in C++ was created with the help of a class with an overloaded operator():
class compare_class {
public:
  bool operator()(int A, int B) {
    return (A < B);
  }
};
// definition of sorting function
template <class ComparisonFunctor>
void mysort (int* begin_items,
             int num_items,
             ComparisonFunctor c);
int main() {
  int items[] = {4, 3, 1, 2};
  compare_class functor;
  mysort(items,
         sizeof(items)/sizeof(int),
         functor);
}
In C++0x, we are enabled to define the functor even more elegantly:
auto compare_function = [](char a, char b)
{ return a < b; };
char Str[] = "cwgaopzq";
std::sort(Str,
Str + strlen(Str),
compare_function);
cout << Str << endl;
We create a variable compare_function which is a functor and whose type is determined by the compiler automatically. Then we may pass this variable to std::sort. We may also reduce the code a bit more:
char Str[] = "cwgaopzq";
std::sort(
Str,
Str + strlen(Str),
[](char a, char b) {return a < b;}
);
cout << Str << endl;
Here "[](char a, char b) {return a < b;}" is that very lambda-function.
A lambda-expression always begins with brackets [] in which you may specify the capture list. Then there is an optional parameter list and optional type of the returned value. The definition is finished with the function's body itself. On the whole, the format of writing lambda-functions is as follows:
'[' [<capture_list>] ']'
[ '(' <parameter_list> ')' ['mutable' ] ]
[ 'throw' '(' [<exception_types>] ')' ]
[ '->' <returned_value_type> ]
'{' [<function_body>] '}'
Note. Specification of exceptions in common and lambda-functions is considered obsolete nowadays. There is a new key word noexcept introduced but this innovation has not been supported in Visual C++ yet.
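The capture list specifies what objects from the exterior scope a lambda-function is allowed to access: [] captures nothing, [=] captures all the used variables by value, [&] captures them by reference, and individual variables may be listed explicitly, for example [x, &y] captures x by value and y by reference.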
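The difference between capturing by value and by reference is easiest to see in a small experiment. The following is a sketch; the function name CaptureModesDemo and the variable names are hypothetical.

```cpp
#include <cassert>

// Returns true if the captures behave as described in the comments.
bool CaptureModesDemo()
{
  int x = 1;
  int y = 10;

  auto byValue = [x]() { return x; };        // copies x at this point
  auto byRef   = [&y]() { y += 1; };         // refers to the original y
  auto mixed   = [=, &y]() { return x + y; };// x by value, y by reference

  x = 2;      // the copies made above are not affected
  byRef();    // y becomes 11

  return byValue() == 1   // still sees the old copy of x
      && y == 11
      && mixed() == 12;   // copied x == 1 plus current y == 11
}
```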
Unfortunately, it is impossible to cover lambda-functions very thoroughly within the scope of this article. You may read about them in detail in the sources given in the references at the end of this article. To demonstrate using lambda-functions, let us look at the code of a program that prints the strings in increasing order of their lengths.
The program creates an array of strings and an array of indexes. Then the program sorts the strings' indexes so that the strings are arranged according to growth of their lengths:
int _tmain(int, _TCHAR*[])
{
  vector<string> strings;
  strings.push_back("lambdas");
  strings.push_back("decltype");
  strings.push_back("auto");
  strings.push_back("static_assert");
  strings.push_back("nullptr");
  vector<size_t> indices;
  size_t k = 0;
  generate_n(back_inserter(indices),
             strings.size(),
             [&k]() { return k++; });
  sort(indices.begin(),
       indices.end(),
       [&](ptrdiff_t i1, ptrdiff_t i2)
       { return strings[i1].length() <
                strings[i2].length(); });
  for_each(indices.begin(),
           indices.end(),
           [&strings](const size_t i)
           { cout << strings[i] << endl; });
  return 0;
}
Note. According to C++0x, you may initialize arrays std::vector in the following way:
vector<size_t> indices = {0,1,2,3,4};
But Visual Studio 2010 has no support for such constructs yet.
The quality of analysis of lambda-functions in static analyzers must correspond to the quality of analysis of common functions. On the whole, analysis of lambda-functions resembles that of common functions with the exception that lambda-functions have a different scope.
In PVS-Studio, we implemented the complete diagnosis of errors in lambda-functions. Let us consider an example of code containing a 64-bit error:
int a = -1;
unsigned b = 0;
const char str[] = "Viva64";
const char *p = str + 1;
auto lambdaFoo = [&]() -> char
{
  return p[a+b];
};
cout << lambdaFoo() << endl;
This code works when compiling the program in the Win32 mode and displays the letter 'V'. In the Win64 mode, the program crashes because of an attempt to access the item with the number 0xFFFFFFFF. To learn more about this kind of errors, see the lessons on development of 64-bit C/C++ applications - "Lesson 13. Pattern 5. Address arithmetic".
When checking the code shown above, PVS-Studio generates the diagnostic message:
error V108: Incorrect index type: p[not a memsize-type]. Use memsize
type instead.
To produce this message, the analyzer must parse the lambda-function and work out the scope of its variables. This functionality is difficult yet necessary.
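One way to correct the sample is to make the index a memsize-type, so that "a+b" is calculated as a signed pointer-sized value. This is a sketch of such a correction, wrapped into a hypothetical function FirstLetterViaLambda; other fixes are possible.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical corrected variant of the sample with lambdaFoo.
char FirstLetterViaLambda()
{
  std::ptrdiff_t a = -1;
  std::ptrdiff_t b = 0;
  const char str[] = "Viva64";
  const char* p = str + 1;
  auto lambdaFoo = [&]() -> char
  {
    // The index has the type ptrdiff_t and equals -1
    // in 32-bit and 64-bit builds alike.
    return p[a + b];
  };
  return lambdaFoo();
}
```

The function returns 'V' regardless of the platform's pointer size.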
The most significant modifications in VivaCore are related to lambda-function support. A new function, rLambdas, participates in building the parse tree. It is located in the class Parser and is called from such functions as rInitializeExpr, rFunctionArguments, rCommaExpression. The function rLambdas parses lambda-functions and adds a new type of object to the tree: PtreeLambda. The class PtreeLambda is defined and implemented in the files PtreeLambda.h and PtreeLambda.
Processing of PtreeLambda in the built tree is performed by TranslateLambda function. The whole logic of working with lambda-functions is concentrated in VivaCore. Inside TranslateLambda, you can see the call of the function GetReturnLambdaFunctionTypeForReturn implemented in PVS-Studio's code. But this function serves for internal purposes of PVS-Studio and an empty function-stub GetReturnLambdaFunctionTypeForReturn does not impact code parsing in VivaCore at all.
There are cases when it is difficult to determine the type returned by a function. Let us consider an example of a template function that multiplies two values by each other:
template<class T, class U>
??? mul(T x, U y)
{
return x*y;
}
The returned type must be the type of the expression "x*y". But it is not clear what to write instead of "???". The first idea is to use decltype:
template<class T, class U>
decltype(x*y) mul(T x, U y) //Scope problem!
{
return x*y;
}
The variables "x" and "y" are defined after "decltype(x*y)" and this code, unfortunately, cannot be compiled.
To solve this issue, we should use a new syntax of returned values:
template<class T, class U>
[] mul(T x, U y) -> decltype(x*y)
{
return x*y;
}
Using the brackets [], we spawn a lambda-function here and say that "the returned type will be determined or defined later". Unfortunately, this sample cannot be compiled in Visual C++ at the time of writing this article. But we can go an alternative way (which also uses the suffix return type syntax):
template<class T, class U>
auto mul(T x, U y) -> decltype(x*y)
{
return x*y;
}
This code will be successfully built by Visual C++ and we will get the needed result.
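The deduced return type can be checked with a static_assert. The snippet below repeats the definition of mul so that it is self-contained; the function MulReturnsCommonType is a hypothetical name used only for the check.

```cpp
#include <cassert>
#include <type_traits>

template <class T, class U>
auto mul(T x, U y) -> decltype(x * y)
{
  return x * y;
}

// Verifies that multiplying int by double yields double.
bool MulReturnsCommonType()
{
  static_assert(std::is_same<decltype(mul(2, 1.5)), double>::value,
                "int * double is expected to give double");
  return mul(2, 1.5) == 3.0;
}
```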
The version PVS-Studio 3.50 supports the new function format only partially. Constructs are fully parsed by VivaCore library but PVS-Studio does not take into consideration the data types returned by these functions in the analysis. To learn about support of an alternative record of functions in VivaCore library, see the function Parser::rIntegralDeclaration.
The standard C++0x has a new key word static_assert. Its syntax is:
static_assert(expression, "error message");
If the expression is false, the mentioned error message is displayed and compilation aborts. Let us consider an example of using static_assert:
template <unsigned n>
struct MyStruct
{
static_assert(n > 5, "N must be more 5");
};
MyStruct<3> obj;
When compiling this code, Visual C++ compiler will display the message:
error C2338: N must be more 5
xx.cpp(33) : see reference to class template
instantiation 'MyStruct<n>' being compiled
with
[
n=3
]
From the viewpoint of code analysis performed by PVS-Studio, the construct static_assert is not very interesting and therefore is ignored. In VivaCore, a new lexeme tkSTATIC_ASSERT is added. On meeting this lexeme, the lexer ignores it and all the parameters referring to the construct static_assert (implemented in the function Lex::ReadToken).
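Although the analyzer ignores static_assert, the construct itself can help prevent 64-bit errors. As a sketch, a hypothetical template function ElementAt may refuse index types that are narrower than size_t; the name and the check are our assumptions, not part of PVS-Studio.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical accessor that rejects narrow index types at compile time.
template <class Index>
char ElementAt(const char* data, Index i)
{
  static_assert(sizeof(Index) >= sizeof(std::size_t),
                "Index must be at least as wide as size_t");
  assert(data != nullptr);
  return data[i];
}
```

A call with an index of the type size_t compiles everywhere, while a call with an unsigned index stops compiling once the code is built for a 64-bit platform where unsigned is narrower than size_t.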
Before the standard C++0x, C++ had no key word to denote a null pointer. The number 0 was used instead, although good style is to use the macro NULL. When the macro NULL is expanded, it turns into 0, so there is no actual difference between them. This is how the macro NULL is defined in Visual Studio:
#define NULL 0
In some cases, absence of a special key word to define a null pointer was inconvenient and even led to errors. Consider an example:
void Foo(int a)
{ cout << "Foo(int a)" << endl; }
void Foo(char *a)
{ cout << "Foo(char *a)" << endl; }
int _tmain(int, _TCHAR*[])
{
  Foo(0);
  Foo(NULL);
  return 0;
}
Although the programmer expects that different Foo functions will be called in this code, this is not the case. NULL is replaced with 0, which has the type int. When launching the program you will see on the screen:
Foo(int a)
Foo(int a)
To eliminate such situations, the key word nullptr was introduced into C++0x. The constant nullptr has the type nullptr_t and is implicitly converted to any pointer type or a pointer to class members. The constant nullptr cannot be implicitly converted to integer data types except for bool type.
Let us return to our example and add the call of the function Foo with the argument nullptr:
void Foo(int a)
{ cout << "Foo(int a)" << endl; }
void Foo(char *a)
{ cout << "Foo(char *a)" << endl; }
int _tmain(int, _TCHAR*[])
{
  Foo(0);
  Foo(NULL);
  Foo(nullptr);
  return 0;
}
Now you will see:
Foo(int a)
Foo(int a)
Foo(char *a)
Although the key word nullptr is not relevant from the viewpoint of searching for 64-bit errors, it must be supported when parsing the code. For this purpose, a new lexeme tkNULLPTR was added to VivaCore, as well as the class LeafNULLPTR. Objects of the LeafNULLPTR type are created in the function rPrimaryExpr. When the function LeafNULLPTR::Typeof is called, the type "nullptr" is coded as "Pv", i.e. "void *". For the existing code analysis tasks in PVS-Studio, this is quite enough.
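The conversion rules for nullptr mentioned above can be checked with std::is_convertible. This is a small sketch; the structure Widget and the function NullptrConversionsHold are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical class used only to form a pointer-to-member type.
struct Widget { int field; };

// Checks which implicit conversions from nullptr_t are available.
bool NullptrConversionsHold()
{
  return std::is_convertible<std::nullptr_t, char*>::value          // any pointer type
      && std::is_convertible<std::nullptr_t, int Widget::*>::value  // pointer to a class member
      && !std::is_convertible<std::nullptr_t, int>::value;          // but not an integer
}
```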
The standard C++0x introduces new standard classes in namespace std. Some of these classes are already supported in Visual Studio 2010, for example:
Since these entities are usual template classes, they do not demand any modification of PVS-Studio or VivaCore library.
At the end of our article, I would like to mention one interesting thing related to using C++0x standard. On the one hand, the new features of the language make code safer and more effective by eliminating old drawbacks, but on the other hand, they create new unknown traps the programmer might fall into. However, I cannot tell you anything about them yet.
But one might fall into already known traps as well because their diagnosis in the new C++0x constructs is implemented much worse or not implemented at all. Consider a small sample showing the use of an uninitialized variable:
{
  int x;
  std::vector<int> A(10);
  A[0] = x; // Warning C4700
}
{
  int x;
  std::vector<int> A(10);
  std::for_each(A.begin(), A.end(),
    [x](int &y)
    { y = x; } // No Warning
  );
}
The programmer might hope to get a warning from the compiler in both cases. But in the example with the lambda-function, there will be no diagnostic message (it was tried on Visual Studio 2010 RC, /W4) - like there have not been many other warnings about various dangerous situations before. It needs some time to implement such diagnosis.
We may expect a new round in development of static analyzers concerning the topic of searching for potentially dangerous constructs that occur when using C++0x constructs. We position our product PVS-Studio as a tool to test contemporary programs. At the moment, we understand 64-bit and parallel technologies by this term. In the future, we plan to carry out an investigation into the question what potential issues one may expect using C++0x. If there are a lot of traps, perhaps we will start developing a new tool to diagnose them.
We think that C++0x brings many good features. Obsolete code does not demand an immediate upgrade, although it may be modified in time during refactoring. As far as new code is concerned, we may already write it with the new constructs. So, it seems reasonable to start employing C++0x right now.