>
>
>
Checking OpenCV with PVS-Studio

Andrey Karpov
Articles: 671

Checking OpenCV with PVS-Studio

OpenCV is a library of computer vision algorithms, picture processing algorithms, and general-purpose numerical algorithms. The library is written in C/C++ and is free both for academic and commercial use, as it is distributed under the BSD license. The time has come to check this library with the PVS-Studio code analyzer.

OpenCV is a large library. It contains more than 2500 optimized algorithms and consists of more than 1 million code lines. Cyclomatic complexity of the most complex function cv::cvtColor() is 415. It is no wonder that we have found quite many errors and questionable fragments in its code. However, the size of the source code taken into account, we may call this library a high quality one.

Old errors

But here's a small remark to start with. When studying samples of errors detected by PVS-Studio, programmers don't want to believe that these errors are real. Maybe they don't like being aware of the fact that their own and others' programs may be unreliable. They argue: "Ok. There are some real errors found in the program, but they don't actually influence the program behavior. This code doesn't seem to be used. There's no problem".

They are of course wrong, unfortunately. Now it's the good time for me to prove that. While analyzing one project we also checked the OpenCV library integrated into it. It was the old version of the library included into the project, so we studied the errors found in it but didn't describe them in our report. It would be just reasonable to check the new version of the OpenCV library and write a post about it. It's just what we've done now.

The result is quite expected: many errors in the old library version are fixed in the new one. Here are a couple of these.

The first fixed error:

CV_IMPL CvGLCM* cvCreateGLCM(....)
{
  CvGLCM* newGLCM = 0;
  ....
  memset( newGLCM, 0, sizeof(newGLCM) );
  ....
}

V512 A call of the 'memset' function will lead to underflow of the buffer 'newGLCM'. cvtexture.cpp 138

The second fixed error:

CvDTreeSplit* CvDTree::find_split_cat_reg(....)
{
  ....
  double** sum_ptr = 0;
  
  .... // sum_ptr not in use
    for( i = 0; i < mi; i++ )
    {
        R += counts[i];
        rsum += sum[i];
        sum[i] /= MAX(counts[i],1);
        sum_ptr[i] = sum + i;
    }
  ....
}

V522 Dereferencing of the null pointer 'sum_ptr' might take place. mltree.cpp 2001

There are some other examples, but describing already fixed bugs is not interesting. The main point is that this fact allows us to draw the inexorable conclusions:

1. Errors detected by the PVS-Studio analyzer are absolutely real. They torture and suck the blood of both its users and its developers. They must be found and fixed, and this process is sad and slow and starts only after the bugs are discovered by users.

2. These and many other errors can be detected by the PVS-Studio analyzer at the coding stage already, which significantly reduces the development cost. The incremental analysis mode might appear especially useful.

New bugs

Note. When checking a project, we don't distinguish between whether a bug refers to the project itself or to one of the third-party libraries used by it. It's not interesting to describe each small library individually.

Note also that you shouldn't view this article as a complete list of bugs PVS-Studio has managed to find in the OpenCV library. The article cites only those code fragments that we have found most suspicious when scanning through the messages generated by the analyzer. If you participate in development of the OpenCV project, we recommend you to use the demo version of the tool to study the list of the analyzer-generated warnings more thoroughly.

Copy-Paste bugs

The PVS-Studio analyzer is good at detecting errors caused by misprints and copy-paste. Here's a classic example of code copying-and-pasting. We have a set of functions such as augAssignAnd, augAssignOr, augAssignXor, augAssignDivide, and so on. These functions are different only in one operator. Surely you can't but feel a great temptation to copy the body function and then fix the operator responsible for what it must do. The trouble is that the probability of making a mistake is great too.

void MatOp::augAssignAnd(const MatExpr& expr, Mat& m) const
{
    Mat temp;
    expr.op->assign(expr, temp);
    m &= temp;
}

void MatOp::augAssignOr(const MatExpr& expr, Mat& m) const
{
    Mat temp;
    expr.op->assign(expr, temp);
    m |= temp;
}

void MatOp::augAssignDivide(const MatExpr& expr, Mat& m) const
{
    Mat temp;
    expr.op->assign(expr, temp);
    m /= temp;
}

void MatOp::augAssignXor(const MatExpr& expr, Mat& m) const
{
    Mat temp;
    expr.op->assign(expr, temp);
    m /= temp;
}

V524 It is odd that the body of 'augAssignXor' function is fully equivalent to the body of 'augAssignDivide' function (matop.cpp, line 294). matop.cpp 318

Note that the augAssignXor() function does the same thing as the 'augAssignDivide() function. It's certainly not right. The augAssignXor() function should contain this text: "m ^= temp;".

Code logic contradicting code formatting

Here's one more error related to copy-paste. The program lines we need to investigate are too long. If we format them to match the article text, you won't understand where the error is. That's why we have to show it using a picture.

Figure 1. The program logic doesn't correspond to its formatting. Click on the picture to enlarge it.

V640 The code's operational logic does not correspond with its formatting. The second statement will always be executed. It is possible that curly brackets are missing. test_stereomatching.cpp 464

As you can see, the long line was copied and put after the 'if' operator. It results in the program formatting contradicting its execution logic.

A misprint

The following error must be caused by a misprint, not code copying. Perhaps it was the autocomplete that had failed the programmer when writing the variable name.

static jpc_enc_cp_t *cp_create(....)
{
  ....
  ccp->sampgrdsubstepx = 0;
  ccp->sampgrdsubstepx = 0;
  ....
}

V519 The 'ccp->sampgrdsubstepx' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 414, 415. jpc_enc.c 415

The second line must be this one: ccp->sampgrdsubstepy = 0;.

Meaningless loops

typedef struct CvStereoCamera
{
 ....
 float fundMatr[9]; /* fundamental matrix */
 ....
};
CvStereoCamera stereo;

void CvCalibFilter::Stop( bool calibrate )
{
  ....
  for( i = 0; i < 9; i++ )
  {
    stereo.fundMatr[i] = stereo.fundMatr[i];
  }
  .... 
}

V570 The 'stereo.fundMatr[i]' variable is assigned to itself. calibfilter.cpp 339

The loop as it is found here is meaningless. It seems that some other operations should be executed over the array items.

Here's a loop whose body is executed only once:

virtual CvBlob* Process(....)
{
  ....
  while(!m_Collision && m_FGWeight>0)
  {
    ....
    break;
  }
  ....
}

V612 An unconditional 'break' within a loop. blobtrackingmsfg.cpp 600

The loop body doesn't contain 'continue' operators, and there is the 'break' operator at the end of it. All this is very strange, and the function must be incorrect.

The null character and the null pointer confused

int jpc_atoaf(char *s, int *numvalues, double **values)
{
  char *cp;
  ....
  while ((cp = strtok(0, delim))) {
    if (cp != '\0') {
      ++n;
    }
  }
  ....
}

V528 It is odd that pointer to 'char' type is compared with the '\0' value. Probably meant: *cp != '\0'. jpc_util.c 105

The same mistake can be found here: jpc_util.c 123.

The check if(cp != '\0') is meaningless. If the strtok() function returns a null pointer, the loop will terminate. The programmer must have intended to check if the line-end was found. In this case the check must look like this: if(*cp != '\0').

Misprints in conditions

There is a whole class of errors when misprints prevent the values of some variables from being checked.

The dr3dr2 variable is not checked:

CV_IMPL void cvComposeRT(
  const CvMat* _rvec1, const CvMat* _tvec1,
  const CvMat* _rvec2, const CvMat* _tvec2,
  CvMat* _rvec3, CvMat* _tvec3,
  CvMat* dr3dr1, CvMat* dr3dt1,
  CvMat* dr3dr2, CvMat* dr3dt2,
  CvMat* dt3dr1, CvMat* dt3dt1,
  CvMat* dt3dr2, CvMat* dt3dt2)
{
  ....
  if( _rvec3 || dr3dr1 || dr3dr1 )
  ....
}

V501 There are identical sub-expressions to the left and to the right of the '||' operator: _rvec3 || dr3dr1 || dr3dr1 calibration.cpp 415

The cmptlut[2] array item is not checked:

bool Jpeg2KDecoder::readHeader()
{
  ....
  cmptlut[0] = ....
  cmptlut[1] = ....
  cmptlut[2] = ....
  if( cmptlut[0] < 0 || cmptlut[1] < 0 || cmptlut[0] < 0 )
    result = false;
  ....
}

V501 There are identical sub-expressions 'cmptlut[0] < 0' to the left and to the right of the '||' operator. grfmt_jpeg2000.cpp 215

The dst_size.height variable is compared to itself:

CV_IMPL IplImage* icvCreateIsometricImage(....)
{
  ....
  if( !dst || dst->depth != desired_depth ||
      dst->nChannels != desired_num_channels ||
      dst_size.width != src_size.width ||
      dst_size.height != dst_size.height )
  ....
}

V501 There are identical sub-expressions to the left and to the right of the '!=' operator: dst_size.height != dst_size.height epilines.cpp 2118

An absolutely meaningless condition:

void CvDTreeTrainData::read_params(....)
{
  ....
  if( cat_var_count != cat_var_count ||
      ord_var_count != ord_var_count )
    CV_ERROR(CV_StsParseError,
    "var_type is inconsistent with cat_var_count and ord_var_count");
  ....
}

V501 There are identical sub-expressions to the left and to the right of the '!=' operator: cat_var_count != cat_var_count tree.cpp 1415

V501 There are identical sub-expressions to the left and to the right of the '!=' operator: ord_var_count != ord_var_count tree.cpp 1415

For other similar errors, let me just cite the corresponding diagnostic messages:

  • V501 There are identical sub-expressions to the left and to the right of the '==' operator: M.size() == M.size() imgwarp.cpp 3672
  • V501 There are identical sub-expressions to the left and to the right of the '&&' operator: data && dims >= 1 && data mat.hpp 434
  • V501 There are identical sub-expressions to the left and to the right of the '&&' operator: 0 <= d && _sizes && d <= 32 && _sizes matrix.cpp 186
  • V501 There are identical sub-expressions to the left and to the right of the '==' operator: M.size() == M.size() imgwarp.cpp 3685

The pointer is used before the check

It's a very frequent error when a pointer is first used and only then is checked for being a null pointer. The OpenCV library is no exception. This is what these errors look like:

CV_IMPL CvStringHashNode* 
cvGetHashedKey( CvFileStorage* fs, .... )
{
  ....
  CvStringHash* map = fs->str_hash;
  if( !fs )
    return 0;
  ....
}

V595 The 'fs' pointer was utilized before it was verified against nullptr. Check lines: 617, 619. persistence.cpp 617

void CvBlobTrackerAuto1::Process(IplImage* pImg, IplImage* pMask)
{
  ....
  CvBlob* pBN = NewBlobList.GetBlob(i);
  pBN->ID = m_NextBlobID;

  if(pBN &&
     pBN->w >= CV_BLOB_MINW &&
     pBN->h >= CV_BLOB_MINH)
  ....
}

V595 The 'pBN' pointer was utilized before it was verified against nullptr. Check lines: 432, 434. blobtrackingauto.cpp 432

I don't think I should cite any more code fragments the V595 diagnostic message is generated on. They are numerous and look the same. It's better to run PVS-Studio on it and check all these fragments.

Note. The V595 diagnostic doesn't always indicate that the code fragment is certainly incorrect. Sometimes the pointer even theoretically cannot be equal to zero. In this case you can remove the check so that it doesn't cause confusion when reading the code. And it's best to pass an object by reference, not by pointer.

Sizes confused

There are many bugs causing only the first bytes of a buffer to be processed instead of complete buffer processing. In most cases it is caused by an issue when the pointer size is confused with the size of the array it points to, and the former is calculated instead of the latter (examples). We seem to have the same thing here.

CAPDRIVERCAPS caps;
bool CvCaptureCAM_VFW::open( int wIndex )
{
  ....
  memset( &caps, 0, sizeof(caps));
  capDriverGetCaps( hWndC, &caps, sizeof(&caps));
  ....
}

V568 It's odd that the argument of sizeof() operator is the '& caps' expression. cap_vfw.cpp 409

The pointer size is passed instead of the CAPDRIVERCAPS structure's size into the capDriverGetCaps() function.

Here's another code fragment. The error must have been caused by a misprint. It is the 'latestCounts' array which is filled with zeros, while it's the size of the 'latestPoints' array which is calculated instead.

class CV_EXPORTS CvCalibFilter
{
  ....
  enum { MAX_CAMERAS = 3 };
  int latestCounts[MAX_CAMERAS];
  CvPoint2D32f* latestPoints[MAX_CAMERAS];
  ....
};

void CvCalibFilter::SetCameraCount( int count )
{
  ....
  memset( latestCounts, 0, sizeof(latestPoints) );
  ....
}

V512 A call of the 'memset' function will lead to overflow of the buffer 'latestCounts'. calibfilter.cpp 238

This code fragment contains a 64-bit error. The code will work well in the 32-bit program version, as the pointer size coincides with that of the 'int' type in 32-bit applications. But a buffer overflow will occur when compiling the 64-bit program version.

Strange, but these errors may stay unnoticed for a long time. First, the 32-bit program always works correctly. But even if you have the 64-bit version, memory clearing beyond the array may cause no harm. These errors usually reveal themselves when you start using another compiler or perform refactoring of the nearby code fragments.

Poor tests

In the post written not so long ago I told you that errors in tests are one of the vulnerabilities of the TDD technology: tests often only pretend to provide program security. Static code analysis is a very good complement to the TDD methodology. It doesn't only find bugs in the program text, but also helps eliminate many of them from tests.

It's quite natural that errors are found in tests of the OpenCV library as well.

void CV_Resize_Test::resize_1d(....)
{
  ....
  for (int r = 0; r < cn; ++r)
  {
    xyD[r] = 0;
    for (int k = 0; k < ksize; ++k)
      xyD[r] += w[k] * xyS[k * cn + r];
    xyD[r] = xyD[r];
  }
  ....
}

V570 The 'xyD[r]' variable is assigned to itself. test_imgwarp_strict.cpp 560

The "xyD[r] = xyD[r];" expression looks very suspicious. Perhaps this test checks not quite what it is intended to check.

Here's another line: "cls_map[r];". What does it mean?

void ann_get_new_responses(....)
{
  ....
  for( int si = 0; si < train_sidx->cols; si++ )
  {
    int sidx = train_sidx_ptr[si];
    int r = cvRound(responses_ptr[sidx*r_step]);
    CV_DbgAssert(fabs(responses_ptr[sidx*r_step]-r) < FLT_EPSILON);
    int cls_map_size = (int)cls_map.size();
    cls_map[r];
    if ( (int)cls_map.size() > cls_map_size )
      cls_map[r] = cls_count++;
  }
  ....
}

V607 Ownerless expression 'cls_map[r]'. test_mltests2.cpp 342

There are some other strange fragments, for example:

void Core_DetTest::get_test_array_types_and_sizes(....)
{
  ....
  sizes[INPUT][0].width =
  sizes[INPUT][0].height = sizes[INPUT][0].height;
  ....
}

V570 The 'sizes[INPUT][0].height' variable is assigned to itself. test_math.cpp 1356

Unspecified behavior

The code shown below might work in your program quite the way you want it to. But be aware that it won't last forever. We mean the negative number shift. To learn more about these shifts see the article "Wade not in unknown waters. Part three".

CvSeq * cvFindNextContour( CvContourScanner scanner )
{
  ....
  new_mask = INT_MIN >> 1;
  ....
}

V610 Unspecified behavior. Check the shift operator '>>. The left operand '(- 2147483647 - 1)' is negative. contours.cpp 1012

Miscellaneous

void CvFuzzyMeanShiftTracker::SearchWindow::initDepthValues(....)
{
  unsigned int d=0, mind = 0xFFFF, maxd = 0,
           m0 = 0, m1 = 0, mc, dd;
  ....
  for (int j = 0; j < height; j++)
  {
    ....
    if (d > maxd)
      maxd = d;
    ....
  }
}

V547 Expression 'd > maxd' is always false. Unsigned type value is never < 0. fuzzymeanshifttracker.cpp 386

The 'd' variable is not changed in the loop. It means that the 'd > maxd' condition never holds.

void jpc_init_t2state(jpc_enc_t *enc, int raflag)
{
  ....
  for (pass = cblk->passes; pass != endpasses; ++pass) {
    pass->lyrno = -1;
    pass->lyrno = 0;
  }
  ....
}

V519 The 'pass->lyrno' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 539, 540. jpc_t2enc.c 540


void KeyPointsFilter::retainBest(vector<KeyPoint>& keypoints, int
                                 n_points)
{
  ....
  if( n_points > 0 && keypoints.size() > (size_t)n_points )
  {
    if (n_points==0)
    {
      keypoints.clear();
      return;
    }
  ....
}

V637 Two opposite conditions were encountered. The second condition is always false. Check lines: 195, 197. keypoint.cpp 195

void HOGDescriptor::detectMultiScaleROI(....) const
{
  ....
  double *linearwt = new double[totwords+1];
  ....
  delete linearwt;
  ....
}

V611 The memory was allocated using 'new T[]' operator but was released using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] linearwt;'. hog.cpp 2630

Conclusion

Even highly qualified programmers aren't secure from making mistakes, but the PVS-Studio tool can help to eliminate many of them already at the stage of coding. Otherwise detecting and fixing these errors will be ten times more expensive.