After heated discussions around the article about "The Big Calculator", I felt like checking some other projects related to scientific computing. The first program that came to hand was OpenMS, an open-source project dealing with protein mass spectrometry. It appears to have been written in a very serious and responsible way: the developers use at least Cppcheck to analyze their code. That's why I didn't expect to find anything sensational left unnoticed by that tool. On the other hand, I was curious to see what bugs PVS-Studio would be able to find in the code after Cppcheck. If you want to know too, follow me.
OpenMP support in PVS-Studio was dropped after version 5.20. If you have any questions, feel free to contact our support.
So, there exists a project named OpenMS. I won't dare to explain what it is designed to do in my own words, for I may say something stupid. Here's just an extract from the product description on Wikipedia:
OpenMS is an open-source project for data analysis and processing in protein mass spectrometry and is released under the 2-clause BSD licence. OpenMS has tools for many common data analysis pipelines used in proteomics, providing algorithms for signal processing, feature finding (including de-isotoping), visualization in 1D (spectra or chromatogram level), 2D and 3D, map mapping and peptide identification. It supports label-free and isotopic-label based quantification (such as iTRAQ and TMT and SILAC). Furthermore, it also supports metabolomics workflows and DIA/SWATH targeted analysis.
Taken from: Wikipedia. OpenMS.
The project is medium-sized yet rather complex. The source code takes up 20 Mbytes, not counting a lot of third-party libraries (Boost, Qt, Zlib, and so on). The project uses templates very extensively. You can download the source code from the SourceForge website.
I can say for sure that static analysis is employed in the OpenMS development process. Presence of the "cppcheck.cmake" file and comments like this:
if (i != peptide.size()) // added for cppcheck
indicate that the developers use Cppcheck at least. I also saw mentions of Cpplint and the file "cpplint.py". That's a really serious way to do the job. Well done!
Now let's see what PVS-Studio has managed to find in the project.
Note. The project C++ files have the '*.C' extension for some reason. So, don't be confused when you see a C++ code sample located in a '*.C' file.
It is very seldom that I come across projects employing the OpenMP technology. You know, I even think sometimes of removing all the OpenMP-related diagnostics from the analyzer. That's why I was genuinely surprised to see these warnings in the message list. I have checked dozens of projects during the last year and I have never seen a warning on OpenMP. Well, glad to see there's somebody using this technology.
There were false positives among those messages, but a few reported genuine bugs.
DoubleReal ILPDCWrapper::compute(....) const
{
....
DoubleReal score = 0;
....
#pragma omp parallel for schedule(dynamic, 1)
for (SignedSize i = 0; i < (SignedSize)bins.size(); ++i)
{
score += computeSlice_(fm, pairs, bins[i].first,
bins[i].second, verbose_level);
}
return score;
}
PVS-Studio's diagnostic message: V1205 Data race risk. Unprotected concurrent operation with the 'score' variable. ilpdcwrapper.c 213
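The sum is calculated incorrectly. The 'score += ...' statement is a read-modify-write operation executed by several threads at once, and the 'score' variable is not protected from simultaneous access, so some of the additions may be lost.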
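One common way to protect such a sum is the OpenMP 'reduction' clause. The sketch below is not the OpenMS code: 'computeSliceStub' and 'computeScore' are hypothetical stand-ins for 'computeSlice_()' and 'compute()', used only to show the idea.

```cpp
#include <vector>

// Hypothetical stand-in for computeSlice_(): returns a partial score for a bin.
double computeSliceStub(int bin)
{
  return bin * 0.5;
}

// Sums the partial scores. With reduction(+ : score) every thread adds to
// its own private copy of 'score'; OpenMP combines the copies after the loop.
double computeScore(const std::vector<int>& bins)
{
  double score = 0;
#pragma omp parallel for schedule(dynamic, 1) reduction(+ : score)
  for (long i = 0; i < (long)bins.size(); ++i)
  {
    score += computeSliceStub(bins[i]);
  }
  return score;
}
```

Keep in mind that a floating-point reduction may add the terms in a different order from run to run, so the last digits of the result can vary. If the compiler is run without OpenMP support, the pragma is ignored and the loop simply runs sequentially.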
Other warnings are not that critical, but I think we still should take a look at them. Every exception must be caught inside parallel sections. If an exception leaves a parallel section, it will most likely lead to a crash. This subject is discussed in more detail in the following posts: "OpenMP and exceptions", "Processing of exceptions inside parallel sections".
An exception can be thrown explicitly with the 'throw' operator, or it may occur when calling the 'new' operator (std::bad_alloc).
The first way. The function getTheoreticalmaxPosition() may throw an exception.
Size getTheoreticalmaxPosition() const
{
if (!this->size())
{
throw Exception::Precondition(__FILE__, __LINE__,
__PRETTY_FUNCTION__,
"There must be at least one trace to ......");
}
....
}
virtual void run()
{
....
#pragma omp parallel for
for (SignedSize i = 0; i < (SignedSize)seeds.size(); ++i)
{
....
f.setMZ(
traces[traces.getTheoreticalmaxPosition()].getAvgMZ());
....
}
....
}
PVS-Studio's diagnostic message: V1301 The 'throw' keyword cannot be used outside of a try..catch block in a parallel section. featurefinderalgorithmpickedhelperstructs.h 199
The second way. Calling the 'new' operator might lead to throwing an exception.
TraceFitter<PeakType>* chooseTraceFitter_(double& tau)
{
// choose fitter
if (param_.getValue("feature:rt_shape") == "asymmetric")
{
LOG_DEBUG << "use asymmetric rt peak shape" << std::endl;
tau = -1.0;
return new EGHTraceFitter<PeakType>();
}
....
}
virtual void run()
{
....
#pragma omp parallel for
for (SignedSize i = 0; i < (SignedSize)seeds.size(); ++i)
{
....
TraceFitter<PeakType>* fitter = chooseTraceFitter_(egh_tau);
....
}
....
}
PVS-Studio's diagnostic message: V1302 The 'new' operator cannot be used outside of a try..catch block in a parallel section. featurefinderalgorithmpicked.h 1926
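Both cases can be handled the same way: catch the exception inside the loop body so that it never leaves the parallel section. Here is a minimal sketch of this idea; 'processSeed' and 'processAll' are hypothetical names and have nothing to do with the real OpenMS functions.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical worker that may throw; it stands in for calls such as
// getTheoreticalmaxPosition() or chooseTraceFitter_().
int processSeed(int seed)
{
  if (seed < 0)
  {
    throw std::runtime_error("negative seed");
  }
  return seed * 2;
}

// Processes all seeds and returns the number of failed iterations.
// Every exception is caught inside the iteration that raised it.
int processAll(const std::vector<int>& seeds, std::vector<int>& results)
{
  results.assign(seeds.size(), 0);
  int failures = 0;
#pragma omp parallel for reduction(+ : failures)
  for (long i = 0; i < (long)seeds.size(); ++i)
  {
    try
    {
      results[i] = processSeed(seeds[i]);
    }
    catch (const std::exception&)
    {
      ++failures;  // remember the failure; report it after the loop
    }
  }
  return failures;
}
```

Counting the failures is only one possible policy. Real code might prefer to store the first error message and rethrow it once the parallel section is over.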
There are other defects of this kind in the project.
std::vector< std::pair<std::string, long> > spectra_offsets;
std::vector< std::pair<std::string, long> > chromatograms_offsets;
template <typename MapType>
void MzMLHandler<MapType>::writeFooter_(std::ostream& os)
{
....
int indexlists;
if (spectra_offsets.empty() && spectra_offsets.empty() )
{
indexlists = 0;
}
else if (!spectra_offsets.empty() && !spectra_offsets.empty() )
{
indexlists = 2;
}
else
{
indexlists = 1;
}
....
}
PVS-Studio's diagnostic messages:
V501 There are identical sub-expressions 'spectra_offsets.empty()' to the left and to the right of the '&&' operator. mzmlhandler.h 5288
V501 There are identical sub-expressions '!spectra_offsets.empty()' to the left and to the right of the '&&' operator. mzmlhandler.h 5292
These checks are very strange: the 'spectra_offsets' container is checked twice in each condition. This must be a misprint, and two different containers were meant to be checked: 'spectra_offsets' and 'chromatograms_offsets'.
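If this guess is right, the corrected logic could look roughly like the sketch below. The 'countIndexLists' function and the 'OffsetList' typedef are hypothetical; they only isolate the condition from the rest of 'writeFooter_()'.

```cpp
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, long> > OffsetList;

// Returns the number of index lists to write: 0, 1 or 2.
// Each condition now looks at both containers.
int countIndexLists(const OffsetList& spectra_offsets,
                    const OffsetList& chromatograms_offsets)
{
  if (spectra_offsets.empty() && chromatograms_offsets.empty())
  {
    return 0;
  }
  if (!spectra_offsets.empty() && !chromatograms_offsets.empty())
  {
    return 2;
  }
  return 1;
}
```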
template <typename MapType>
void MzMLHandler<MapType>::characters(
const XMLCh* const chars, const XMLSize_t)
{
....
if (optionalAttributeAsString_(data_processing_ref,
attributes,
s_data_processing_ref))
{
data_.back().meta.setDataProcessing(
processing_[data_processing_ref]);
}
else
{
data_.back().meta.setDataProcessing(
processing_[data_processing_ref]);
}
....
}
PVS-Studio's diagnostic message: V523 The 'then' statement is equivalent to the 'else' statement. mzmlhandler.h 534
Judging by other similar code fragments, you can guess that the 'else' branch was meant to pass a different value.
Many misprints have to do with exception throwing. These mistakes are trivial: the 'throw' keyword is missing. Because of that, a temporary object is created and destroyed at once, and no exception is thrown. For example:
inline UInt asUInt_(const String & in)
{
UInt res = 0;
try
{
Int tmp = in.toInt();
if (tmp < 0)
{
Exception::ConversionError(
__FILE__, __LINE__, __PRETTY_FUNCTION__, "");
}
res = UInt(tmp);
}
catch (Exception::ConversionError)
{
error(LOAD,
String("UInt conversion error of \"") + in + "\"");
}
return res;
}
PVS-Studio's diagnostic message: V596 The object was created but it is not being used. The 'throw' keyword could be missing: throw ConversionError(FOO); xmlhandler.h 247
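The effect is easy to reproduce outside OpenMS. In the sketch below, 'toUnsignedBuggy' and 'toUnsignedFixed' are hypothetical functions, and std::range_error is used in place of Exception::ConversionError.

```cpp
#include <stdexcept>

// Buggy variant: the exception object is created and destroyed at once,
// so a negative value passes the check unnoticed.
unsigned toUnsignedBuggy(int value)
{
  if (value < 0)
  {
    std::range_error("negative value");  // 'throw' is missing
  }
  return static_cast<unsigned>(value);
}

// Fixed variant: the exception is actually thrown.
unsigned toUnsignedFixed(int value)
{
  if (value < 0)
  {
    throw std::range_error("negative value");
  }
  return static_cast<unsigned>(value);
}
```

The buggy variant quietly converts -1 into a huge unsigned number, while the fixed one reports the problem to the caller.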
Similar misprints can be found in other fragments as well.
The last misprint I've noticed:
inline typename Value<Pipe>::Type const & operator*() {
tmp.i1 = *in.in1;
tmp.i2 = *in.in2;
tmp.i3 = *in.in2;
return tmp;
}
PVS-Studio's diagnostic message: V525 The code containing the collection of similar blocks. Check items 'in1', 'in2', 'in2' in lines 112, 113, 114. pipe_joiner.h 112
The correct code should look like this:
tmp.i1 = *in.in1;
tmp.i2 = *in.in2;
tmp.i3 = *in.in3;
CompressedInputSource::CompressedInputSource(
const String & file_path, const char * header,
MemoryManager * const manager)
: xercesc::InputSource(manager)
{
if (sizeof(header) / sizeof(char) > 1)
{
head_[0] = header[0];
head_[1] = header[1];
}
else
{
head_[0] = '\0';
head_[1] = '\0';
}
....
}
PVS-Studio's diagnostic message: V514 Dividing sizeof a pointer 'sizeof (header)' by another value. There is a probability of logical error presence. compressedinputsource.c 52
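If we divide the pointer size by the byte size, we'll always get a value larger than one. At least, I don't know of any architecture so intricate that this isn't the case. So the condition is always true, which must be a mistake: 'sizeof(header)' is the size of the pointer, not the length of the data it points to.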
A similar strange check can be found here: compressedinputsource.c 104
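The sketch below shows why the check cannot work and one possible alternative. I don't know how the header length is tracked in OpenMS, so the 'length' parameter and both function names are assumptions made for illustration.

```cpp
#include <cstddef>

// sizeof applied to a pointer parameter yields the size of the pointer
// itself (typically 4 or 8), no matter how long the buffer is.
std::size_t pointerSize(const char* header)
{
  return sizeof(header) / sizeof(char);
}

// Possible alternative: the caller passes the buffer length explicitly
// (hypothetical 'length' parameter), and that value is checked instead.
bool hasTwoBytes(const char* header, std::size_t length)
{
  return header != nullptr && length >= 2;
}
```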
template <typename TStringSet, typename TSpec>
inline Iter<TStringSet, ConcatVirtual<TSpec> > const &
operator++(Iter<TStringSet, ConcatVirtual<TSpec> > & me, int)
{
Iter<TStringSet, ConcatVirtual<TSpec> > before = me;
goNext(me);
return before;
}
PVS-Studio's diagnostic message: V558 Function returns the reference to temporary local object: before. iter_concat_virtual.h 277
The function returns a reference to the local variable 'before'. This variable is destroyed when the function returns, so the caller gets a dangling reference, and using it leads to undefined behavior.
The fixed operator looks like this:
template <typename TStringSet, typename TSpec>
inline Iter<TStringSet, ConcatVirtual<TSpec> > const
operator++(Iter<TStringSet, ConcatVirtual<TSpec> > & me, int)
{ ... }
A similar trouble is with the '--' operator: iter_concat_virtual.h 310
typedef size_t Size;
typedef double DoubleReal;
void updateMeanEstimate(const DoubleReal & x_t,
DoubleReal & mean_t, Size t)
{
DoubleReal tmp(mean_t);
tmp = mean_t + (1 / (t + 1)) * (x_t - mean_t);
mean_t = tmp;
}
PVS-Studio's diagnostic message: V636 The '1 / (t + 1)' expression was implicitly casted from 'int' type to 'double' type. Consider utilizing an explicit type cast to avoid the loss of a fractional part. An example: double A = (double)(X) / Y;. masstracedetection.c 129
The "(1 / (t + 1))" expression is always equal to zero or one, because both operands are integers and the division is therefore an integer one. The programmer probably intended to get quite a different value. I'm not familiar with the program logic, but I guess the following was meant:
tmp = mean_t + (1.0 / (t + 1)) * (x_t - mean_t);
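To see the difference between the two forms, here is a small self-contained sketch. The 'weightInteger', 'weightReal' and 'updateMeanFixed' functions are hypothetical; the last one repeats 'updateMeanEstimate()' with the presumed fix applied.

```cpp
#include <cstddef>

// Integer division: the result is 1 for t == 0 and 0 for every larger t.
double weightInteger(std::size_t t)
{
  return 1 / (t + 1);
}

// Floating-point division: the result is the fraction 1/(t+1).
double weightReal(std::size_t t)
{
  return 1.0 / (t + 1);
}

// Running mean with the presumed fix: after the call, 'mean_t' is the
// mean of t + 1 samples, given that it held the mean of the first t ones.
void updateMeanFixed(const double& x_t, double& mean_t, std::size_t t)
{
  mean_t = mean_t + (1.0 / (t + 1)) * (x_t - mean_t);
}
```

With the integer version, only the very first sample would ever change the estimate; every later update would add zero.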
I also didn't like that explicit values are used instead of the M_PI constant; moreover, these values are not very accurate. This is not an error, of course, but it's still no good. Here's an example:
bool PosteriorErrorProbabilityModel::fit(
std::vector<double> & search_engine_scores)
{
....
incorrectly_assigned_fit_param_.A =
1 / sqrt(2 * 3.14159 *
pow(incorrectly_assigned_fit_param_.sigma, 2));
....
}
PVS-Studio's diagnostic message: V624 The constant 3.14159 is being utilized. The resulting value could be inaccurate. Consider using the M_PI constant from <math.h>. posteriorerrorprobabilitymodel.c 92
The analyzer found other similar fragments as well.
static const Int CHANNELS_FOURPLEX[4][1];
static const Int CHANNELS_EIGHTPLEX[8][1];
ExitCodes main_(int, const char **)
{
....
if (itraq_type == ItraqQuantifier::FOURPLEX)
{
for (Size i = 0; i < 4; ++i)
{
std::vector<std::pair<String, DoubleReal> > one_label;
one_label.push_back(std::make_pair<String, DoubleReal>(
String("Channel ") +
String(ItraqConstants::CHANNELS_FOURPLEX[i][0]),
DoubleReal(ItraqConstants::CHANNELS_FOURPLEX[i][0])));
labels.push_back(one_label);
}
}
else //ItraqQuantifier::EIGHTPLEX
{
for (Size i = 0; i < 8; ++i)
{
std::vector<std::pair<String, DoubleReal> > one_label;
one_label.push_back(std::make_pair<String, DoubleReal>(
String("Channel ") +
String(ItraqConstants::CHANNELS_FOURPLEX[i][0]),
DoubleReal(ItraqConstants::CHANNELS_FOURPLEX[i][0])));
labels.push_back(one_label);
}
}
....
}
PVS-Studio's diagnostic message: V557 Array overrun is possible. The value of 'i' index could reach 7. itraqanalyzer.c 232
This bug can actually be put into the category of Copy-Paste related bugs. But anyway, let it be "array index out of bounds" - it sounds scarier this way. And, after all, this classification is pretty relative; you can put one and the same bug into different categories.
In this sample, the 'CHANNELS_EIGHTPLEX' array must have been handled in the 'else' branch. There's a comment proving that:
else //ItraqQuantifier::EIGHTPLEX
However, the copied-and-pasted code fragment was modified only partially. As a result, the loop in the 'else' branch runs eight iterations over the CHANNELS_FOURPLEX array, which has only four rows.
A similar bug can be found here (also caused by Copy-Paste): tmtanalyzer.c 225
One more sample.
DoubleReal masse_[255]; ///< mass table
EdwardsLippertIterator::EdwardsLippertIterator(const
EdwardsLippertIterator & source) :
PepIterator(source),
f_file_(source.f_file_),
actual_pep_(source.actual_pep_),
spec_(source.spec_),
tol_(source.tol_),
is_at_end_(source.is_at_end_),
f_iterator_(source.f_iterator_),
f_entry_(source.f_entry_),
b_(source.b_),
e_(source.e_),
m_(source.m_),
massMax_(source.massMax_)
{
for (Size i = 0; i < 256; i++)
{
masse_[i] = source.masse_[i];
}
}
PVS-Studio's diagnostic message: V557 Array overrun is possible. The value of 'i' index could reach 255. edwardslippertiterator.c 134
The masse_ array is handled incorrectly in the copy constructor: the array consists of 255 items, while 256 items are copied.
The fixed loop looks like this:
for (Size i = 0; i < 255; i++)
{
masse_[i] = source.masse_[i];
}
An even better practice is to avoid using magic constants.
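For instance, the array size and the copy bounds can come from one named constant. The class below is a hypothetical, stripped-down illustration, not the real EdwardsLippertIterator; 'MassTable' and 'kTableSize' are invented names.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical reduced class that keeps only the mass table.
class MassTable
{
public:
  static const std::size_t kTableSize = 255;  // the only place the size is written

  MassTable()
  {
    std::fill(masse_, masse_ + kTableSize, 0.0);
  }

  MassTable(const MassTable& source)
  {
    // The bounds are derived from the same constant as the array size,
    // so they cannot get out of sync with the declaration.
    std::copy(source.masse_, source.masse_ + kTableSize, masse_);
  }

  double masse_[kTableSize];  ///< mass table
};
```

In a class like this one, the compiler-generated copy constructor would copy the array correctly too, so another option is not to write the constructor by hand at all.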
svm_problem * LibSVMEncoder::encodeLibSVMProblem(....)
{
....
node_vectors = new svm_node *[problem->l];
if (node_vectors == NULL)
{
delete[] problem->y;
delete problem;
return NULL;
}
....
}
PVS-Studio's diagnostic message: V668 There is no sense in testing the 'node_vectors' pointer against null, as the memory was allocated using the 'new' operator. The exception will be generated in the case of memory allocation error. libsvmencoder.c 177
The check "if (node_vectors == NULL)" is pointless. If memory cannot be allocated, the 'new' operator throws an exception and the check is never reached. As a result, the program behaves quite differently from what the programmer expects. For instance, a memory leak might occur, because the cleanup code in the 'if' body is skipped.
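If the cleanup is really needed, it has to go into an exception handler. Below is a rough sketch of that approach. The 'Problem' structure and the 'buildProblem' and 'destroyProblem' functions are hypothetical simplifications, not the libsvm or OpenMS types.

```cpp
#include <cstddef>
#include <new>

// Hypothetical simplified counterpart of svm_problem.
struct Problem
{
  double* y;
  int** x;
};

// Allocates a problem with 'count' entries, or returns nullptr if one of
// the arrays cannot be allocated. The cleanup runs in the catch block.
Problem* buildProblem(std::size_t count)
{
  Problem* problem = new Problem();  // value-initialized: both pointers are null
  try
  {
    problem->y = new double[count]();
    problem->x = new int*[count]();
  }
  catch (const std::bad_alloc&)
  {
    delete[] problem->y;  // safe even if 'y' is still null
    delete problem;
    return nullptr;
  }
  return problem;
}

// Releases everything allocated by buildProblem().
void destroyProblem(Problem* problem)
{
  if (problem != nullptr)
  {
    delete[] problem->y;
    delete[] problem->x;
    delete problem;
  }
}
```

Another option is to keep the null check and allocate with 'new (std::nothrow)', which returns a null pointer instead of throwing. Smart pointers or std::vector would remove the manual cleanup altogether.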
There are other similar checks implemented in this obsolete manner.
I think the OpenMS developers would benefit from using PVS-Studio in addition to Cppcheck and Cpplint, especially if they do it regularly. So, I invite you to write to us at support@viva64.com. We can grant you a free registration key so that you can do a complete check of OpenMS.