To be honest, I don't know what the TPP project is intended for. As far as I understand, this is a set of tools to assist in research of proteins and their interaction in living organisms. However, that's not so much important. What is important is that their source codes are open. It means that I can check them with the PVS-Studio static analyzer. Which I'm very much fond of.
So, we have checked the Trans-Proteomic Pipeline (TPP) version 4.5.2 project. To learn more about the project, see the following links:
We don't write about every project we analyze. It must have some prominent features for us to make a report of its analysis. Otherwise, description of found errors would be boring. For example, a project has to be well-known, or contain many errors of a certain class, whatever. Usually certain types of defects prevail in projects. In case of TPP there are few repeating errors. They are diverse - and that is what makes the project outstanding.
I should note that many errors refer not to the TPP project itself but to the library it uses to handle XML. But I don't think there is any difference whether it is the program's fault or the XML library's fault when a file is processed incorrectly. That's why I won't specify to which part of the project this or that bug refers. No more talk. Let's see what interesting samples we have.
Unfortunately, I don't know what peptides are. Wikipedia tells me that these are short polymers of amino acid monomers linked by peptide bonds. It's quite expectable that TPP has a class called Peptide which in its turn has a comparison operator. It is realized in the following way:
bool Peptide::operator==(Peptide& p) {
...
for (i = 0, j = 0;
i < this->stripped.length(), j < p.stripped.length();
i++, j++) {
...
}
PVS-Studio: V521 Such expressions using the ',' operator are dangerous. Make sure the expression is correct. tpplib peptide.cpp 191
Note that there is the comma operator ',' between two comparisons. The comma operator returns the value of the expression standing to the right. It means that only one condition is checked: "j < p.stripped.length()". To make the comparison correct we need to rewrite it in the following way:
for (i = 0, j = 0;
i < this->stripped.length() && j < p.stripped.length();
i++, j++) {
An identical mistake is made in the Peptide::strippedEquals() method. I'm worried about the peptides :).
When one handles file paths in a program, one may often want a path to have a slash \ or / at the end. The following code is written in TPP for this purpose:
bool TandemResultsParser::writePepXML(....)
{
...
char c = pathSummary.at(pathSummary.length() - 1);
if (c != '\\' || c != '/')
{
if (pathSummary.find('\\') != string::npos)
pathSummary.append("\\");
else
pathSummary.append("/");
}
...
}
PVS-Studio: V547 Expression 'c != '\\' || c != '/'' is always true. Probably the '&&' operator should be used here. Tandem2XML tandemresultsparser.cxx 787
If you look close at the "if (c != '\\' || c != '/')" condition, you will see a misprint. The condition is always true. The 'c' variable will be either not equal to '\\' or not equal to '/'. As a result, two slashes may appear at the end of the file path. This mistake is not crucial perhaps, yet it is unpleasant.
This is the correct condition:
if (c != '\\' && c != '/')
Consider a code fragment intended to find the " PI " substring in a string:
class basic_string
{
...
size_type find(const _Elem *_Ptr, size_type _Off = 0) const
...
}
void PipelineAnalysis::prepareFields(void) {
...
if (peptideProphetOpts_.find(" PI ", 0)>=0) {
fields_.push_back(Field("PpI_zscore"));
}
...
}
PVS-Studio: V547 Expression 'peptideProphetOpts_.find(" PI ", 0) >= 0' is always true. Unsigned type value is always >= 0. pepXMLViewer pipelineanalysis.cxx 1590
The std::string::find() function used incorrectly. If the substring cannot be found, the find() function returns the value string::npos. Note that this value has an unsigned type.
At the same time, it is supposed in the program that if the substring is not found, the find() function should return a negative number. This will never happen. The "peptideProphetOpts_.find(" PI ", 0)>=0" condition is always true, as an unsigned value is always >= 0.
As a result, regardless what data the 'peptideProphetOpts' string actually contains, it will be marked as "PpI_zscore" anyway. And one more thing - an identical mistake can be found in the same function a bit farther. I'm worried about the peptides again.
This is what the correct substring search should look like:
if (peptideProphetOpts_.find(" PI ", 0) != string::npos)
A code fragment generating random numbers might cause much more random consequences than needed. Consider this code:
int main(int argc, char **argv) {
...
char salt[3];
...
salt[0] = (argc>2)?(argv[1][0]):rndChar[rand() % 64];
salt[1] = (argc>2)?(argv[1][1]):rndChar[rand() % 64];
salt[3] = 0;
...
}
PVS-Studio: V557 Array overrun is possible. The '3' index is pointing beyond array bound. crypt crypt.cxx 567
It should fill a small array with two random numbers and zero. But zero is written outside the array. As a result, the last array item remains uninitialized. That a value is written outside the array is unpleasant too. All this leads to program undefined behavior in future.
This is the fixed code:
salt[2] = 0;
void DIGEST_PROTEIN(char *szSeq, int iLenSeq)
{
...
if (pOptions.bMarkNXST
&& szPeptide[i] == 'N'
&& szPeptide[i + 1] != 'P'
&& (szPeptide[i + 2] == 'S' ||
szPeptide[i + 2] == 'T')
&& szPeptide[i + 1] != 'P')
...
}
PVS-Studio: V501 There are identical sub-expressions 'szPeptide[i + 1] != 'P'' to the left and to the right of the '&&' operator. Comet_fastadb comet-fastadb1.cxx 1906
A cell of the 'szPeptide[i + 1]' array is compared to 'P' twice. This causes the peptide type to be checked only approximately. I think the last string contains a misprint and the code should actually be the following one:
if (pOptions.bMarkNXST
&& szPeptide[i] == 'N'
&& szPeptide[i + 1] != 'P'
&& (szPeptide[i + 2] == 'S' ||
szPeptide[i + 2] == 'T')
&& szPeptide[i + 3] != 'P')
Lines in the program are long, that's why I replaced some text with dots. Don't worry about them - nothing interesting was omitted.
void MascotConverter::init(....) {
...
if(line_len > 8 && .... && line[7] == '=')
if(database_ == NULL)
database_ = strCopy(line+8);
else if(line_len > 5 && .... && line[4] == '=') {
header_ = strCopy(line+5);
...
}
Look at 'else if'. Do you see the trick? The else operator refers to the second 'if' operator, not the first one. If we format the code correctly, it will look like this:
if(line_len > 8 && .... && line[7] == '=')
if(database_ == NULL)
database_ = strCopy(line+8);
else if(line_len > 5 && .... && line[4] == '=') {
header_ = strCopy(line+5);
...
Strange logic? I agree. This is most likely a logical error here, not a code formatting mistake. The correct code seems to look like this:
if(line_len > 8 && .... && line[7] == '=')
{
if(database_ == NULL)
database_ = strCopy(line+8);
}
else if(line_len > 5 && .... && line[4] == '=') {
header_ = strCopy(line+5);
...
Conclusion: don't be greedy trying to skimp on curly braces.
One may often feel a temptation to call one constructor from another constructor in order not to duplicate code. Unfortunately, one may easily make a mistake while doing it. This is how this error looks:
class ExperimentCycleRecord {
public:
ExperimentCycleRecord()
{ ExperimentCycleRecord(0,0,0,True,False); }
...
}
PVS-Studio: V603 The object was created but it is not being used. If you wish to call constructor, 'this->ExperimentCycleRecord::ExperimentCycleRecord(....)' should be used. Mascot2XML mascotconverter.cxx 101
The "ExperimentCycleRecord(0,0,0,True,False);" statement creates a temporary object and successfully initializes it. But it won't initialize the current class's fields. To learn more about this error type see the article: Wade not in unknown waters. Part one. We also offer ways to fix such errors there
Identical errors can be found in some other places:
void TRANSLATE(int iFrame, char *szNewSeq,
char *szSeq, int *iLenSeq)
{
...
*iLenSeq;
}
PVS-Studio: V607 Ownerless expression '* iLenSeq'. Comet_fastadb comet-fastadb1.cxx 2241
There is a strange statement "*iLenSeq;" at the end of the 'TRANSLATE' function. This statement does nothing. Perhaps this is just an odd line. And maybe this fragment misses some code. But what exactly?..
Again, it's time to worry about peptides. Consider this code:
void MixtureModel::assessPeptideProperties(char* filename, Boolean
icat, Boolean glyc)
{
...
double fval;
...
// fval is not initialized
...
if(! icat && strstr(pep, "C") != NULL && fval >= min_fval) {
...
}
PVS-Studio: V614 Uninitialized variable 'fval' used. tpplib mixturemodel.cxx 834
You cannot say how the check will behave. The 'fval' variable is not initialized anywhere.
Here is a loop which will repeat an indefinite number of iterations:
double mscore_c::dot_hr(unsigned long *_v)
{
...
int iSeqSize;
//perform a single pass through each array.
//check every point in m_pfSeq,
//but don't revisit positions in m_vmiType
for (int a = 0; a < iSeqSize; a++) {
...
}
PVS-Studio: V614 Uninitialized variable 'iSeqSize' used. xtandem mscore_c.cpp 552
The 'iSeqSize' variable is not initialized.
There are other uninitialized variables as well. I won't tell you about them in detail - here is just a list:
One can but marvel looking at all this. And feel afraid. Scientific research based on uninitialized variables may produce interesting results :).
The next code fragment is intended to calculate the item sum. But two mistakenly swapped characters prevent it from doing so.
int main(int argc, char **argv)
{
...
ii=0;
for (i=0; pEnvironment.szPeptide[i]!=0; i++)
ii =+ pEnvironment.szPeptide[i];
...
}
PVS-Studio: V588 The expression of the 'A =+ B' kind is utilized. Consider reviewing it, as it is possible that 'A += B' was meant. plot_msms plot-msms1.cxx 478
The error is an elementary one. But this fact doesn't make it stop being an error. The code shows very well that many defects in programs are simple as hell. They are much more numerous than programmers believe. I wrote on this phenomenon in detail here: "Myths about static analysis. The second myth - expert developers do not make silly mistakes".
This is the correct code:
for (i=0; pEnvironment.szPeptide[i]!=0; i++)
ii += pEnvironment.szPeptide[i];
Let's look at an implementation of one iterator.
CharIndexedVectorIterator& operator++()
{ // preincrement
++m_itr;
return (*this);
}
CharIndexedVectorIterator& operator--()
{ // predecrement
++m_itr;
return (*this);
}
PVS-Studio: V524 It is odd that the body of '--' function is fully equivalent to the body of '++' function (charindexedvector.hpp, line 68). pwiz charindexedvector.hpp 81
The '++' operator is written correctly. But the '--' operator seems to be written through the Copy-Paste method. As a result, it behaves just like the '++' operator. However, other '--' operators are written in the same way. Maybe it's not an error but a smart trick.
We have found a loop that iterates only once.
const char* ResidueMass::getStdModResidues(....) {
...
for (rmap::const_iterator i = p.first; i != p.second; ++i) {
const cResidue &r = (*i).second;
if (r.m_masses[0].m_nterm) {
n_term_aa_mod = true;
} else if (r.m_masses[0].m_cterm) {
c_term_aa_mod = true;
}
return r.m_residue.c_str();
}
...
}
PVS-Studio: V612 An unconditional 'return' within a loop. tpplib residuemass.cxx 1442
There is the 'return' operator at the end of the loop body. At the same time, you can see that the loop doesn't contain the 'continue' operator or other mechanisms to continue the loop. It means that the loop iterates only once. I cannot say for sure what this code should actually look like. Perhaps there should be 'else' before the 'return' operator.
void ASAPCGIParser::writeProteinRatio(....)
{
...
pvalue = (double)norm_->normalize(adj_inv_ratio);
double tmp[2];
tmp[0] = adj_inv_ratio[0];
tmp[1] = adj_inv_ratio[1];
adj_inv_ratio[0] = 1/ tmp[0];
adj_inv_ratio[1] = tmp[1]/(tmp[0]*tmp[0]);
pvalue = (double)norm_->normalize(adjratio);
...
}
PVS-Studio: V519 The 'pvalue' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 205, 214. tpplib asapcgiparser.cxx 214 (...)
The 'pvalue' variable is initialized twice in a row with different values. It's strange. Perhaps some other variable should be initialized.
There are several more defects related to comparing unsigned variables to zero. For instance, here is one more fragment where the programmer had a trouble with slashes \, /.
int Dta2mzXML::extractScanNum(const string& dtaFileName)
{
...
std::string::size_type pos = dtaFileName.rfind("/");
if (pos < 0) {
pos = dtaFileName.rfind("\\");
}
...
}
PVS-Studio: V547 Expression 'pos < 0' is always false. Unsigned type value is never < 0. dta2mzXML dta2mzxml.cpp 622
The 'pos' variable is always above or equal to 0. We already touched upon this case above. Here is a list of several more errors of this type:
OK, we're done with search functions. There are a couple of errors on this subject left. The first error:
void SpectraSTReplicates::aggregateStats(....)
{
...
unsigned int numAssignedPeaks =
(*r)->entry->getPeakList()->getNumAssignedPeaks();
if (numAssignedPeaks >= 0) {
sumFracAssigned += (double)numAssignedPeaks/(double)numPeaks;
numAnnotated++;
}
...
}
PVS-Studio: V547 Expression 'numAssignedPeaks >= 0' is always true. Unsigned type value is always >= 0. tpplib spectrastreplicates.cpp 642
I think there's no need in comments and explanations here. This is the second error:
V547 Expression 'pl->getNumAssignedPeaks() >= 0' is always true. Unsigned type value is always >= 0. tpplib spectrastreplicates.cpp 724
We came across a strange fragment where one and the same code is executed regardless of a condition. Maybe this is a consequence of Copy-Paste.
bool KernelDensityRTMixtureDistr::recalc_RTstats(....)
{
...
if (catalog) {
tmp = (*run_RT_calc_)[i]->recalc_RTstats(
(*probs)[i], min_prob, (*ntts)[i], min_ntt, 2700);
}
else {
tmp = (*run_RT_calc_)[i]->recalc_RTstats(
(*probs)[i], min_prob, (*ntts)[i], min_ntt, 2700);
}
...
}
PVS-Studio: V523 The 'then' statement is equivalent to the 'else' statement. tpplib kerneldensityrtmixturedistr.cxx 104
Various errors occur during the process of protein analysis which should be reported to the user. The code below tries to create an error message but unfortunately fails.
RAMPREAL *readPeaks(RAMPFILE *pFI,
ramp_fileoffset_t lScanIndex)
{
...
else
{
const char* pEndAttrValue;
pEndAttrValue = strchr( pBeginData +
strlen( "contentType=\"") + 1 , '\"' );
pEndAttrValue = '\0';
fprintf(stderr, "%s Unsupported content type\n" , pBeginData);
return NULL;
}
...
}
PVS-Studio: V527 It is odd that the '\0' value is assigned to 'char' type pointer. Probably meant: *pEndAttrValue = '\0'. tpplib ramp.cpp 1856
This code searches for double quotes in the string and tries to replace them with terminal null. The error is this: the pEndAttrValue pointer is not dereferenced. Instead of writing zero where it should be written the pEndAttrValue pointer is cleared. As a consequence, the user will get some unnecessary text printed.
This is the fixed code:
*pEndAttrValue = '\0';
Identical errors can be found in some other places:
When writing XML files you need to create a temporary 10-byte buffer. The first byte in this buffer must equal '1', while all the rest bytes must be cleared. The strncpy() function is quite suitable for this purpose. Here is a description of the strncpy function:
char *strncpy (char *dst, const char *src, size_t len);
dst — Destination string.
src — Source string.
len — Number of characters to be copied.
The strncpy function copies the initial count characters of strSource to strDest and returns strDest. If count is less than or equal to the length of strSource, a null character is not appended automatically to the copied string. If count is greater than the length of strSource, the destination string is padded with null characters up to length count.
The XML library contains code which seems correct at first sight:
void Out2XML::writeOutData() {
...
// assume a string of less than
// 9 characters will represent the charge state
char *chg=(char*)malloc(10 * sizeof(char));
//zero-fill the rest of the array
strncpy(chg, "1", sizeof(chg));
...
}
PVS-Studio: V579 The strncpy function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. CombineOut out2xml.cxx 210
The error is this: the sizeof() operator returns the pointer size, not the buffer size. Several bytes at the end will remain uninitialized. This is the fixed code:
strncpy(chg, "1", 10); // zero-fill the rest of the array
An identical error here:
V579 The strncpy function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. CombineOut out2xml.cxx 214
A quick way to check whether a string is empty is to compare its first character to zero. For example: str[0] == '\0'. Programmers often do so, but unfortunately they sometimes forget to dereference the pointer. This is how such errors look:
void SAXSpectraHandler::pushPeaks(....)
{
...
while(*pValue != '\0' && a < m_peaksCount) {
while(*pValue != '\0' && isspace(*pValue))
pValue++;
if(pValue == '\0')
break;
m_vfM.push_back((float)atof(pValue));
...
}
PVS-Studio: V528 It is odd that pointer to 'char' type is compared with the '\0' value. Probably meant: *pValue == '\0'. xtandem saxhandler.cpp 323
The second while() operator serves to skip all the blanks. Then we need to find out if there is something after the blanks. But the 'pValue' pointer is not dereferenced and the check never works.
This is the fixed code:
if(*pValue == '\0')
There are a couple of fragments where pointer dereferencing is missing:
The XML library's developers are careless when handling private data. I don't think it may do harm to TPP in any way, but since this error has been found, I should tell about it.
When private data (passwords, logins, etc.) are not needed anymore, they should be cleared in the memory. Otherwise, they may unexpectedly get into a file or sent over a network, and so on. These are not scary stories - this is reality. Please read this article to find out how it might happen: Overwriting memory - why?
To destroy private data in the buffer you have to write something into it. Many programmers use the memset() function for this purpose. But this is a bad idea. The compiler has the right to delete its call if its result is not used in any way. This subject is discussed in detail in the documentation: V597.
Here is an example of dangerous code:
void CSHA1::Final()
{
UINT_8 finalcount[8];
...
memset(finalcount, 0, 8);
Transform(m_state, m_buffer);
}
PVS-Studio: V597 The compiler could delete the 'memset' function call, which is used to flush 'finalcount' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. pwiz sha1.cpp 205
The compiler can delete the memset() function's call, as the 'finalcount' array is not used after that.
This is the fixed code:
RtlSecureZeroMemory(finalcount, 8);
Other identical defects:
The DiscriminantFunction class contains virtual functions but the destructor is not declared as a virtual one.
V599 The virtual destructor is not present, although the 'DiscriminantFunction' class contains virtual functions. tpplib discrimvalmixturedistr.cxx 201
The analyzer generated a lot of V595 warnings. It means that a pointer is used first and only then is checked for being a null pointer. This is strange but it's far not always an error sign. In many cases pointers in such fragments just cannot be null at all, so a check for 0 is unnecessary.
Unfortunately, I'm lazy and didn't search for places where potential null pointers may get dereferenced and where they may not. I'm sorry :). I think those of you who are interested by the article will do it themselves, having downloaded PVS-Studio. I will also note that this article demonstrates far not all the code fragments that may contain errors. It's very difficult to analyze a project you are absolutely unfamiliar with.
Having inspected this article after finishing it, I had a feeling that readers might misunderstand it. It seems that the article over-stresses silly mistakes and makes programmers look in an unfavorable way. No, its meaning is quite different. I wanted to show that programming is hard. Very hard. So much hard that one may easily make a whole lot of misprints and other slip-ups besides algorithmic errors. The reason for these slip-ups is not silliness or little knowledge at all. You have to keep in mind too much and focus on many unrelated tasks when programming. All this causes not only complex errors but simple mistakes as well. Only the programmer can fight the former. But the static analyzer can well fight the latter. At the same time, by catching simple errors it allows the programmer to pay more attention to program algorithms and structure. Do not ignore this class of tools.
Well, we've got a too long conclusion. So, here it is brief and clear:
0