The Microsoft Corporation has recently published, for free, access the source code of the CoreCLR engine, which is a key component of .NET Core. We couldn't help but pay attention to this event. The wider a project's audience is, the worse defects found in the code will seem, won't they? Despite Microsoft themselves being the authors of the product, there are still some issues to examine and think over in their code - just like in any other large project.
CoreCLR is a runtime environment of .NET Core performing such functions as garbage collection, or compilation into target machine code. .Net Core is a modular implementation of .Net, which can be used as the base stack for a wide variety of scenarios.
The source code has recently been uploaded to GitHub, and was analyzed by PVS-Studio 5.23. Just like me, anyone interested can get the complete analysis log through Microsoft Visual Studio Community Edition, the release of which is another recent event by Microsoft.
The custom is that I start my reports with the typos section. Errors of this type have to do with duplicated variables, constants, macros or structure/class fields in conditional expressions. Whether or not there is a real error is subject for debate. Nevertheless, we found a couple of such fragments in the project, and they do look strange.
V501 There are identical sub-expressions 'tree->gtOper == GT_CLS_VAR' to the left and to the right of the '||' operator. ClrJit lsra.cpp 3140
// register variable
GTNODE(GT_REG_VAR , "regVar" ,0,GTK_LEAF|GTK_LOCAL)
// static data member
GTNODE(GT_CLS_VAR , "clsVar" ,0,GTK_LEAF)
// static data member address
GTNODE(GT_CLS_VAR_ADDR , "&clsVar" ,0,GTK_LEAF)
....
void LinearScan::buildRefPositionsForNode(GenTree *tree, ....)
{
....
if ((tree->gtOper == GT_CLS_VAR ||
tree->gtOper == GT_CLS_VAR) && i == 1)
{
registerType = TYP_PTR;
currCandidates = allRegs(TYP_PTR);
}
....
}
Although the 'GenTree' structure has a field with a similar name "tree->gtType", this field has a different type than "tree->gtOper". I guess the mistake was made by copying the constant. That is, there should be another constant besides GT_CLS_VAR in the expression.
V501 There are identical sub-expressions 'DECODE_PSP_SYM' to the left and to the right of the '|' operator. daccess 264
enum GcInfoDecoderFlags
{
DECODE_SECURITY_OBJECT = 0x01,
DECODE_CODE_LENGTH = 0x02,
DECODE_VARARG = 0x04,
DECODE_INTERRUPTIBILITY = 0x08,
DECODE_GC_LIFETIMES = 0x10,
DECODE_NO_VALIDATION = 0x20,
DECODE_PSP_SYM = 0x40,
DECODE_GENERICS_INST_CONTEXT = 0x80,
DECODE_GS_COOKIE = 0x100,
DECODE_FOR_RANGES_CALLBACK = 0x200,
DECODE_PROLOG_LENGTH = 0x400,
DECODE_EDIT_AND_CONTINUE = 0x800,
};
size_t GCDump::DumpGCTable(PTR_CBYTE table, ....)
{
GcInfoDecoder hdrdecoder(table,
(GcInfoDecoderFlags)( DECODE_SECURITY_OBJECT
| DECODE_GS_COOKIE
| DECODE_CODE_LENGTH
| DECODE_PSP_SYM // <=1
| DECODE_VARARG
| DECODE_PSP_SYM // <=1
| DECODE_GENERICS_INST_CONTEXT // <=2
| DECODE_GC_LIFETIMES
| DECODE_GENERICS_INST_CONTEXT // <=2
| DECODE_PROLOG_LENGTH),
0);
....
}
Here we have even two duplicated constants, although the "GcInfoDecoderFlags" enumeration includes other constants which are not used in the condition.
Other similar fragments:
V700 Consider inspecting the 'T foo = foo = ...' expression. It is odd that variable is initialized through itself. cee_wks zapsig.cpp 172
BOOL ZapSig::GetSignatureForTypeHandle(....)
{
....
CorElementType elemType = elemType =
TryEncodeUsingShortcut(pMT);
....
}
Seems just like an excessive assignment, but errors like this are often made when copying code, the programmer forgetting to rename some entity. Anyway, the code doesn't make sense that way.
V523 The 'then' statement is equivalent to the 'else' statement. cee_wks threadsuspend.cpp 2468
enum __MIDL___MIDL_itf_mscoree_0000_0004_0001
{
OPR_ThreadAbort = 0,
OPR_ThreadRudeAbortInNonCriticalRegion = .... ,
OPR_ThreadRudeAbortInCriticalRegion = ....) ,
OPR_AppDomainUnload = .... ,
OPR_AppDomainRudeUnload = ( OPR_AppDomainUnload + 1 ) ,
OPR_ProcessExit = ( OPR_AppDomainRudeUnload + 1 ) ,
OPR_FinalizerRun = ( OPR_ProcessExit + 1 ) ,
MaxClrOperation = ( OPR_FinalizerRun + 1 )
} EClrOperation;
void Thread::SetRudeAbortEndTimeFromEEPolicy()
{
LIMITED_METHOD_CONTRACT;
DWORD timeout;
if (HasLockInCurrentDomain())
{
timeout = GetEEPolicy()->
GetTimeout(OPR_ThreadRudeAbortInCriticalRegion); // <=
}
else
{
timeout = GetEEPolicy()->
GetTimeout(OPR_ThreadRudeAbortInCriticalRegion); // <=
}
....
}
This diagnostic detects identical blocks in if/else constructs. And here, we are also dealing with what seems to be a typo in a constant. In the first case, as suggested by the logic of the code, it is "OPR_ThreadRudeAbortInNonCriticalRegion" that fits best here.
Other similar fragments:
V670 The uninitialized class member 'gcInfo' is used to initialize the 'regSet' member. Remember that members are initialized in the order of their declarations inside a class. ClrJit codegencommon.cpp 92
CodeGenInterface *getCodeGenerator(Compiler *comp);
class CodeGenInterface
{
friend class emitter;
public:
....
RegSet regSet; // <= line 91
....
public:
GCInfo gcInfo; // <= line 322
....
};
// CodeGen constructor
CodeGenInterface::CodeGenInterface(Compiler* theCompiler) :
compiler(theCompiler),
gcInfo(theCompiler),
regSet(theCompiler, gcInfo)
{
}
Under the standard, the class members are initialized in the constructor in the same order as they are declared in the class. To fix the error, we should move the declaration of the 'gcInfo' class member above that of 'regSet'.
V705 It is possible that 'else' block was forgotten or commented out, thus altering the program's operation logics. daccess daccess.cpp 2979
HRESULT Initialize()
{
if (hdr.dwSig == sig)
{
m_rw = eRO;
m_MiniMetaDataBuffSizeMax = hdr.dwTotalSize;
hr = S_OK;
}
else
// when the DAC initializes this for the case where the target is
// (a) a live process, or (b) a full dump, buff will point to a
// zero initialized memory region (allocated w/ VirtualAlloc)
if (hdr.dwSig == 0 && hdr.dwTotalSize == 0 && hdr.dwCntStreams == 0)
{
hr = S_OK;
}
// otherwise we may have some memory corruption. treat this as
// a liveprocess/full dump
else
{
hr = S_FALSE;
}
....
}
The analyzer has detected a suspicious code fragment. You can see that the code is commented ON, and everything works fine. But errors like this are very frequent when the code after 'else' is commented OUT, the operator following it becoming a part of the condition. There is no error in this particular case but it might well appear there, when editing this fragment in the future.
V673 The '0xefefefef << 28' expression evaluates to 1080581331517177856. 60 bits are required to store the value, but the expression evaluates to the 'unsigned' type which can only hold '32' bits. cee_dac _dac object.inl 95
inline void Object::EnumMemoryRegions(void)
{
....
SIZE_T size = sizeof(ObjHeader) + sizeof(Object);
....
size |= 0xefefefef << 28;
....
}
For the definition of the term "64-bit error", please see this note. In the example above, after the shift, the "size |= 0xf0000000" operation will be executed in the 32-bit program and "size |= 0x00000000f0000000" in the 64-bit one. The programmer most likely wanted the following calculations to be done in the 64-bit program: "size |= 0x0efefefef0000000". But where have we lost the most significant part of the number?
The number "0xefefefef" has the 'unsigned' type, as it doesn't fit into the 'int' type. A shift of a 32-bit number occurs, which results in unsigned 0xf0000000. Then this unsigned number is extended to SIZE_T and we get 0x00000000f0000000.
To have the code to work right, we need to execute an explicit type conversion first. This is the fixed code:
inline void Object::EnumMemoryRegions(void)
{
....
SIZE_T size = sizeof(ObjHeader) + sizeof(Object);
....
size |= SIZE_T(0xefefefef) << 28;
....
}
Another issue of the same kind:
Sometimes you may find conditions that literally contradict each other.
V637 Two opposite conditions were encountered. The second condition is always false. Check lines: 31825, 31827. cee_wks gc.cpp 31825
void gc_heap::verify_heap (BOOL begin_gc_p)
{
....
if (brick_table [curr_brick] < 0)
{
if (brick_table [curr_brick] == 0)
{
dprintf(3, ("curr_brick %Ix for object %Ix set to 0",
curr_brick, (size_t)curr_object));
FATAL_GC_ERROR();
}
....
}
....
}
This code never gets control, but it doesn't look that critical, as in the following example:
V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 2353, 2391. utilcode util.cpp 2353
void PutIA64Imm22(UINT64 * pBundle, UINT32 slot, INT32 imm22)
{
if (slot == 0)
{
const UINT64 mask0 = UI64(0xFFFFFC000603FFFF);
/* Clear all bits used as part of the imm22 */
pBundle[0] &= mask0;
UINT64 temp0;
temp0 = (UINT64) (imm22 & 0x200000) << 20; // 1 s
temp0 |= (UINT64) (imm22 & 0x1F0000) << 11; // 5 imm5c
temp0 |= (UINT64) (imm22 & 0x00FF80) << 25; // 9 imm9d
temp0 |= (UINT64) (imm22 & 0x00007F) << 18; // 7 imm7b
/* Or in the new bits used in the imm22 */
pBundle[0] |= temp0;
}
else if (slot == 1)
{
....
}
else if (slot == 0) // <=
{
const UINT64 mask1 = UI64(0xF000180FFFFFFFFF);
/* Clear all bits used as part of the imm22 */
pBundle[1] &= mask1;
UINT64 temp1;
temp1 = (UINT64) (imm22 & 0x200000) << 37; // 1 s
temp1 |= (UINT64) (imm22 & 0x1F0000) << 32; // 5 imm5c
temp1 |= (UINT64) (imm22 & 0x00FF80) << 43; // 9 imm9d
temp1 |= (UINT64) (imm22 & 0x00007F) << 36; // 7 imm7b
/* Or in the new bits used in the imm22 */
pBundle[1] |= temp1;
}
FlushInstructionCache(GetCurrentProcess(),pBundle,16);
}
Perhaps it is a very important piece of code that never gets control, because of a bug in the cascade of conditional operators.
Other suspicious fragments:
V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. bcltype metamodel.h 532
inline static mdToken decodeToken(....)
{
//<TODO>@FUTURE: make compile-time calculation</TODO>
ULONG32 ix = (ULONG32)(val & ~(-1 << m_cb[cTokens]));
if (ix >= cTokens)
return rTokens[0];
return TokenFromRid(val >> m_cb[cTokens], rTokens[ix]);
}
The analyzer has detected a negative-number shift causing undefined behavior.
V610 Undefined behavior. Check the shift operator '<<'. The left operand '(~0)' is negative. cee_dac decodemd.cpp 456
#define bits_generation 2
#define generation_mask (~(~0 << bits_generation))
#define MASK(len) (~((~0)<<len))
#define MASK64(len) ((~((~((unsigned __int64)0))<<len)))
void Encoder::Add(unsigned value, unsigned length)
{
....
value = (value & MASK(length));
....
}
Thanks to the V610 message, I discovered a number of incorrect macros. '~0' is cast to a signed negative number of the int type, after which a shift is executed. Just like in one of the macros, an explicit conversion to the unsigned type is necessary:
#define bits_generation 2
#define generation_mask (~(~((unsigned int)0) << bits_generation))
#define MASK(len) (~((~((unsigned int)0))<<len))
#define MASK64(len) ((~((~((unsigned __int64)0))<<len)))
V579 The DacReadAll function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. daccess dacimpl.h 1688
template<class T>
inline bool MisalignedRead(CORDB_ADDRESS addr, T *t)
{
return SUCCEEDED(DacReadAll(TO_TADDR(addr), t, sizeof(t), false));
}
Here's just a small function which always takes the pointer size. The programmer most likely intended to write it like "sizeof(*t)", or maybe "sizeof(T)".
Another good example:
V579 The Read function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. util.cpp 4943
HRESULT GetMTOfObject(TADDR obj, TADDR *mt)
{
if (!mt)
return E_POINTER;
HRESULT hr = rvCache->Read(obj, mt, sizeof(mt), NULL);
if (SUCCEEDED(hr))
*mt &= ~3;
return hr;
}
When using memXXX-functions, one risks making a variety of mistakes. The analyzer provides a number of diagnostic rules to detect such fragments.
V512 A call of the 'memset' function will lead to underflow of the buffer 'pAddExpression'. sos strike.cpp 11973
DECLARE_API(Watch)
{
....
if(addExpression.data != NULL || aExpression.data != NULL)
{
WCHAR pAddExpression[MAX_EXPRESSION];
memset(pAddExpression, 0, MAX_EXPRESSION);
swprintf_s(pAddExpression, MAX_EXPRESSION, L"%S", ....);
Status = g_watchCmd.Add(pAddExpression);
}
....
}
A very common bug when programmers forget to allow for the type size:
WCHAR pAddExpression[MAX_EXPRESSION];
memset(pAddExpression, 0, sizeof(WCHAR)*MAX_EXPRESSION);
Other similar fragments:
V598 The 'memcpy' function is used to copy the fields of 'GenTree' class. Virtual table pointer will be damaged by this. ClrJit compiler.hpp 1344
struct GenTree
{
....
#if DEBUGGABLE_GENTREE
virtual void DummyVirt() {}
#endif // DEBUGGABLE_GENTREE
....
};
void GenTree::CopyFrom(const GenTree* src, Compiler* comp)
{
....
memcpy(this, src, src->GetNodeSize());
....
}
When the preprocessor variable 'DEBUGGABLE_GENTREE' is declared, a virtual function is defined. The class then contains a pointer to the virtual method table, and cannot be copied that freely.
V598 The 'memcpy' function is used to copy the fields of 'GCStatistics' class. Virtual table pointer will be damaged by this. cee_wks gc.cpp 287
struct GCStatistics
: public StatisticsBase
{
....
virtual void Initialize();
virtual void DisplayAndUpdate();
....
};
GCStatistics g_LastGCStatistics;
void GCStatistics::DisplayAndUpdate()
{
....
memcpy(&g_LastGCStatistics, this, sizeof(g_LastGCStatistics));
....
}
In this fragment, incorrect copying is done in any mode, not only the debugging one.
V698 Expression 'memcmp(....) == -1' is incorrect. This function can return not only the value '-1', but any negative value. Consider using 'memcmp(....) < 0' instead. sos util.cpp 142
bool operator( )(const GUID& _Key1, const GUID& _Key2) const
{ return memcmp(&_Key1, &_Key2, sizeof(GUID)) == -1; }
It's incorrect to compare the result of the 'memcmp' function to 1 or -1. Whether or not such constructs will work depends on the libraries, compiler and its settings, the operating system and its bitness, and so on. In cases like this, you should check one of the three states: '< 0', '0', or '> 0'.
One more issue of the same kind:
V522 Dereferencing of the null pointer 'hp' might take place. cee_wks gc.cpp 4488
heap_segment* gc_heap::get_segment_for_loh (size_t size
#ifdef MULTIPLE_HEAPS
, gc_heap* hp
#endif //MULTIPLE_HEAPS
)
{
#ifndef MULTIPLE_HEAPS
gc_heap* hp = 0;
#endif //MULTIPLE_HEAPS
heap_segment* res = hp->get_segment (size, TRUE);
....
}
When 'MULTIPLE_HEAPS' is not defined, it's no good, because the pointer will equal zero.
V595 The 'tree' pointer was utilized before it was verified against nullptr. Check lines: 6970, 6976. ClrJit gentree.cpp 6970
void Compiler::gtDispNode(GenTreePtr tree, ....)
{
....
if (tree->gtOper >= GT_COUNT)
{
printf(" **** ILLEGAL NODE ****");
return;
}
if (tree && printFlags)
{
/* First print the flags associated with the node */
switch (tree->gtOper)
{
....
}
....
}
....
}
There are many fragments in the project's source code where pointers are checked for being valid - but only after they have been dereferenced.
Here's a complete list of all the fragments of that kind: CoreCLR_V595.txt.
Even if excessive code doesn't do any harm, its mere presence may distract the programmers' attention from working on more important things.
V503 This is a nonsensical comparison: pointer >= 0. cee_wks gc.cpp 21707
void gc_heap::make_free_list_in_brick (BYTE* tree,
make_free_args* args)
{
assert ((tree >= 0));
....
}
A nice pointer check, huh? Two more examples:
V547 Expression 'maxCpuId >= 0' is always true. Unsigned type value is always >= 0. cee_wks codeman.cpp 1219
void EEJitManager::SetCpuInfo()
{
....
unsigned char buffer[16];
DWORD maxCpuId = getcpuid(0, buffer);
if (maxCpuId >= 0)
{
....
}
A similar example, but with the DWORD type.
V590 Consider inspecting the 'wzPath[0] != L'\0' && wzPath[0] == L'\\'' expression. The expression is excessive or contains a misprint. cee_wks path.h 62
static inline bool
HasUncPrefix(LPCWSTR wzPath)
{
_ASSERTE(!clr::str::IsNullOrEmpty(wzPath));
return wzPath[0] != W('\0') && wzPath[0] == W('\\')
&& wzPath[1] != W('\0') && wzPath[1] == W('\\')
&& wzPath[2] != W('\0') && wzPath[2] != W('?');
}
This function can be simplified to the following code:
static inline bool
HasUncPrefix(LPCWSTR wzPath)
{
_ASSERTE(!clr::str::IsNullOrEmpty(wzPath));
return wzPath[0] == W('\\')
&& wzPath[1] == W('\\')
&& wzPath[2] != W('\0')
&& wzPath[2] != W('?');
}
Another fragment:
V571 Recurring check. The 'if (moduleInfo[MSCORWKS].baseAddr == 0)' condition was already verified in line 749. sos util.cpp 751
struct ModuleInfo
{
ULONG64 baseAddr;
ULONG64 size;
BOOL hasPdb;
};
HRESULT CheckEEDll()
{
....
// Do we have clr.dll
if (moduleInfo[MSCORWKS].baseAddr == 0) // <=
{
if (moduleInfo[MSCORWKS].baseAddr == 0) // <=
g_ExtSymbols->GetModuleByModuleName (
MAIN_CLR_MODULE_NAME_A,0,NULL,
&moduleInfo[MSCORWKS].baseAddr);
if (moduleInfo[MSCORWKS].baseAddr != 0 && // <=
moduleInfo[MSCORWKS].hasPdb == FALSE)
{
....
}
....
}
....
}
There's no need to check 'baseAddr' in the second case.
V704 'this == nullptr' expression should be avoided - this expression is always false on newer compilers, because 'this' pointer can never be NULL. ClrJit gentree.cpp 12731
bool FieldSeqNode::IsFirstElemFieldSeq()
{
if (this == nullptr)
return false;
return m_fieldHnd == FieldSeqStore::FirstElemPseudoField;
}
Under the C++ standard, the 'this' pointer can never be null. For details about possible consequences of code such as the sample above, see the description of the V704 diagnostic. Such code working correctly after having been compiled by the Visual C++ compiler is mere luck, and one can't really rely on that.
The complete list of all other fragments of this kind: CoreCLR_V704.txt.
V668 There is no sense in testing the 'newChunk' pointer against null, as the memory was allocated using the 'new' operator. The exception will be generated in the case of a memory allocation error. ClrJit stresslog.h 552
FORCEINLINE BOOL GrowChunkList ()
{
....
StressLogChunk * newChunk = new StressLogChunk (....);
if (newChunk == NULL)
{
return FALSE;
}
....
}
If the 'new' operator has failed to allocate memory, then it must throw the std::bad_alloc() exception, as required by the C++ language standard. Therefore, checking the pointer for being null doesn't make any sense here.
I advise reviewing all the fragments of this kind. Here's a complete list: CoreCLR_V668.txt.
The recently published CoreCLR project is a nice example of what a proprietary software product's code may look like. There are unceasing debates on this subject, so here's another topic for you to think about, and argue on.
What is personally important for us, is the fact that there are always some bugs to be found in any large project, and that the best way to use a static analyzer is to use it regularly. Don't be lazy, go download PVS-Studio, and check your project.