
Apache Hadoop code quality: production vs test

Dec 18 2019

To get high-quality production code, it's not enough to ensure maximum test coverage. No doubt, great results require the main project code and the tests to work together efficiently, so tests deserve as much attention as the main code. A decent test is a key success factor, as it will catch regressions before they reach production. Let's take a look at PVS-Studio static analyzer warnings to see why errors in tests are no less important than those in production. Today's focus: Apache Hadoop.


About the project

Anyone who has ever been interested in Big Data has probably heard of, or even worked with, the Apache Hadoop project. In a nutshell, Hadoop is a framework that can be used as a basis for building Big Data systems and working with them.

Hadoop consists of four main modules, each of which performs a specific task required by a big data analytics system:

  • Hadoop Common
  • MapReduce
  • Hadoop Distributed File System
  • YARN

In any case, there is plenty of material about it on the Internet.

About the check

As the documentation shows, PVS-Studio can be integrated into a project in a variety of ways:

  • Using the Maven plugin;
  • Using the Gradle plugin;
  • Using the IntelliJ IDEA plugin;
  • Running the analyzer directly.

Hadoop is built on the Maven build system, so the check presented no obstacles.

After I integrated the script from the documentation and edited one of the pom.xml files (some modules listed in the dependencies were not available), the analysis started!

Once the analysis was complete, I chose the most interesting warnings and noticed that I had the same number of warnings in production code and in tests. Normally, I don't consider analyzer warnings issued for tests. But when I split them up, I couldn't leave the 'test' warnings unattended. "Why not take a look at them?" I thought, since bugs in tests can also have adverse consequences. They can lead to incorrect or partial testing, or even to tests that exist just to check a box and stay green.

After selecting the most intriguing warnings, I divided them into the following groups: production, test, and the four main Hadoop modules. Now I'm pleased to present this review of the analyzer warnings.

Production code

Hadoop Common

V6033 An item with the same key 'KDC_BIND_ADDRESS' has already been added. MiniKdc.java(163), MiniKdc.java(162)

public class MiniKdc {
  ....
  private static final Set<String> PROPERTIES = new HashSet<String>();
  ....
  static {
    PROPERTIES.add(ORG_NAME);
    PROPERTIES.add(ORG_DOMAIN);
    PROPERTIES.add(KDC_BIND_ADDRESS);
    PROPERTIES.add(KDC_BIND_ADDRESS); // <=
    PROPERTIES.add(KDC_PORT);
    PROPERTIES.add(INSTANCE);
    ....
  }
  ....
}

A value added twice to a HashSet is a very common defect in the projects we check. The second addition is simply ignored. I hope this duplication is merely redundant. But what if a different value was meant to be added?
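As a side note, HashSet itself reports the problem at runtime: add returns false when the element is already present. A minimal sketch (the string literal is a hypothetical stand-in for the KDC_BIND_ADDRESS constant):

import java.util.HashSet;
import java.util.Set;

public class DuplicateAddDemo {
  public static void main(String[] args) {
    Set<String> properties = new HashSet<>();
    System.out.println(properties.add("kdc.bind.address")); // true: new element
    System.out.println(properties.add("kdc.bind.address")); // false: duplicate ignored
    System.out.println(properties.size());                  // 1
  }
}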

MapReduce

V6072 Two similar code fragments were found. Perhaps, this is a typo and 'localFiles' variable should be used instead of 'localArchives'. LocalDistributedCacheManager.java(183), LocalDistributedCacheManager.java(178), LocalDistributedCacheManager.java(176), LocalDistributedCacheManager.java(181)

public synchronized void setup(JobConf conf, JobID jobId) throws IOException {
  ....
  // Update the configuration object with localized data.
  if (!localArchives.isEmpty()) {
    conf.set(MRJobConfig.CACHE_LOCALARCHIVES, StringUtils
        .arrayToString(localArchives.toArray(new String[localArchives  // <=
            .size()])));
  }
  if (!localFiles.isEmpty()) {
    conf.set(MRJobConfig.CACHE_LOCALFILES, StringUtils
        .arrayToString(localFiles.toArray(new String[localArchives     // <=
            .size()])));
  }
  ....
}

The V6072 diagnostic sometimes yields interesting findings. Its purpose is to detect similar code fragments that result from copy-pasting and replacing one or two variables. In this case, some variables have been left unchanged.

The code above demonstrates this. In the first block the localArchives variable is used; in the second, similar fragment, it's localFiles. If you study this code diligently, rather than skim it as often happens during code review, you'll notice the fragment where the author forgot to replace the localArchives variable.

This gaffe can lead to the following scenario:

  • Suppose, we have localArchives (size = 4) and localFiles (size = 2);
  • When creating the array localFiles.toArray(new String[localArchives.size()]), the last 2 elements will be null (["pathToFile1", "pathToFile2", null, null]);
  • Then org.apache.hadoop.util.StringUtils.arrayToString will return the string representation of our array, in which the last files names will be presented as "null" ("pathToFile1, pathToFile2, null, null");
  • All this will be passed further and God only knows what kind of checks there are for such cases =).
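Here is a minimal sketch of the second step from the scenario above, showing how toArray pads an oversized target array with null (the list contents are made up for illustration):

import java.util.Arrays;
import java.util.List;

public class ToArrayPaddingDemo {
  public static void main(String[] args) {
    List<String> localFiles = List.of("pathToFile1", "pathToFile2"); // size = 2
    int localArchivesSize = 4; // pretend localArchives.size() == 4

    // The wrong array size leaves trailing null slots in the result.
    String[] result = localFiles.toArray(new String[localArchivesSize]);
    System.out.println(Arrays.toString(result));
    // prints: [pathToFile1, pathToFile2, null, null]
  }
}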

V6007 Expression 'children.size() > 0' is always true. Queue.java(347)

boolean isHierarchySameAs(Queue newState) {
  ....
  if (children == null || children.size() == 0) {
    ....
  }
  else if(children.size() > 0)
  {
    ....
  }
  ....
}

Because the number of elements has already been checked against 0 in the first branch, the subsequent check children.size() > 0 is always true.
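The fix is trivial: since the first branch already covers both null and an empty collection, a plain else is enough. A sketch of the simplified structure:

if (children == null || children.size() == 0) {
  ....
} else {
  // children.size() > 0 is guaranteed here; no extra check needed
  ....
}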

HDFS

V6001 There are identical sub-expressions 'this.bucketSize' to the left and to the right of the '%' operator. RollingWindow.java(79)

  RollingWindow(int windowLenMs, int numBuckets) {
    buckets = new Bucket[numBuckets];
    for (int i = 0; i < numBuckets; i++) {
      buckets[i] = new Bucket();
    }
    this.windowLenMs = windowLenMs;
    this.bucketSize = windowLenMs / numBuckets;
    if (this.bucketSize % bucketSize != 0) {         // <=
      throw new IllegalArgumentException(
          "The bucket size in the rolling window is not integer: windowLenMs= "
              + windowLenMs + " numBuckets= " + numBuckets);
    }
  }

The defect here is that the variable is taken modulo itself. As a result, the divisibility check always succeeds, and for incorrect inputs (windowLenMs, numBuckets) the exception is never thrown.
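Judging by the exception message, the intent was to verify that windowLenMs divides evenly by numBuckets. A sketch of the presumable fix (my guess, not a confirmed patch):

this.bucketSize = windowLenMs / numBuckets;
if (windowLenMs % numBuckets != 0) {  // compare the inputs, not bucketSize with itself
  throw new IllegalArgumentException(
      "The bucket size in the rolling window is not integer: windowLenMs= "
          + windowLenMs + " numBuckets= " + numBuckets);
}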

YARN

V6067 Two or more case-branches perform the same actions. TimelineEntityV2Converter.java(386), TimelineEntityV2Converter.java(389)

 public static ApplicationReport
 convertToApplicationReport(TimelineEntity entity)
{
  ....
  if (metrics != null) {
    long vcoreSeconds = 0;
    long memorySeconds = 0;
    long preemptedVcoreSeconds = 0;
    long preemptedMemorySeconds = 0;

    for (TimelineMetric metric : metrics) {
      switch (metric.getId()) {
      case ApplicationMetricsConstants.APP_CPU_METRICS:
        vcoreSeconds = getAverageValue(metric.getValues().values());
        break;
      case ApplicationMetricsConstants.APP_MEM_METRICS:
        memorySeconds = ....;
        break;
      case ApplicationMetricsConstants.APP_MEM_PREEMPT_METRICS:
        preemptedVcoreSeconds = ....;                         // <=
        break;
      case ApplicationMetricsConstants.APP_CPU_PREEMPT_METRICS:
        preemptedVcoreSeconds = ....;                         // <=
        break;
      default:
        // Should not happen..
        break;
      }
    }
    ....
  }
  ....
}

Identical code fragments in two case branches. They're just all over the place! In most cases this is not a real error, only a reason to consider refactoring the switch. But not here. The repeated fragments set the value of the preemptedVcoreSeconds variable. If you look closely at the names of the variables and constants, you'll probably conclude that when metric.getId() == APP_MEM_PREEMPT_METRICS, the value should be assigned to preemptedMemorySeconds, not preemptedVcoreSeconds. As a result, after the switch statement preemptedMemorySeconds always remains 0, while the value of preemptedVcoreSeconds may be incorrect.
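A sketch of the presumable fix for the memory-preemption branch (an assumption based on the naming, not a confirmed patch; the elided assignments are kept as in the quote above):

case ApplicationMetricsConstants.APP_MEM_PREEMPT_METRICS:
  preemptedMemorySeconds = ....;   // assign to the memory counter
  break;
case ApplicationMetricsConstants.APP_CPU_PREEMPT_METRICS:
  preemptedVcoreSeconds = ....;
  break;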

V6046 Incorrect format. A different number of format items is expected. Arguments not used: 2. AbstractSchedulerPlanFollower.java(186)

@Override
public synchronized void synchronizePlan(Plan plan, boolean shouldReplan)
{
  ....
  try
  {
    setQueueEntitlement(planQueueName, ....);
  }
  catch (YarnException e)
  {
    LOG.warn("Exception while trying to size reservation for plan: {}",
              currResId,
              planQueueName,
              e);
  }
  ....
}

The planQueueName variable isn't used in the logging call. Either too much was copied, or the format string wasn't finished. Still, I blame good old copy-paste, which is occasionally a great way to shoot yourself in the foot.
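If both values were meant to be logged, the format string needs a second placeholder. A sketch of the presumable fix (assuming an SLF4J-style logger, as the {} syntax suggests; the exact message wording is my guess):

LOG.warn("Exception while trying to size reservation for plan: {} in queue: {}",
          currResId,
          planQueueName,
          e);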

Test code

Hadoop Common

V6072 Two similar code fragments were found. Perhaps, this is a typo and 'allSecretsB' variable should be used instead of 'allSecretsA'. TestZKSignerSecretProvider.java(316), TestZKSignerSecretProvider.java(309), TestZKSignerSecretProvider.java(306), TestZKSignerSecretProvider.java(313)

public void testMultiple(int order) throws Exception {
    ....
    currentSecretA = secretProviderA.getCurrentSecret();
    allSecretsA = secretProviderA.getAllSecrets();
    Assert.assertArrayEquals(secretA2, currentSecretA);
    Assert.assertEquals(2, allSecretsA.length);         // <=
    Assert.assertArrayEquals(secretA2, allSecretsA[0]);
    Assert.assertArrayEquals(secretA1, allSecretsA[1]);

    currentSecretB = secretProviderB.getCurrentSecret();
    allSecretsB = secretProviderB.getAllSecrets();
    Assert.assertArrayEquals(secretA2, currentSecretB);
    Assert.assertEquals(2, allSecretsA.length);        // <=
    Assert.assertArrayEquals(secretA2, allSecretsB[0]);
    Assert.assertArrayEquals(secretA1, allSecretsB[1]);
    ....
}

And it's V6072 again. Look closely at the variables allSecretsA and allSecretsB.

V6043 Consider inspecting the 'for' operator. Initial and final values of the iterator are the same. TestTFile.java(235)

private int readPrepWithUnknownLength(Scanner scanner, int start, int n)
    throws IOException {
  for (int i = start; i < start; i++) {
    String key = String.format(localFormatter, i);
    byte[] read = readKey(scanner);
    assertTrue("keys not equal", Arrays.equals(key.getBytes(), read));
    try {
      read = readValue(scanner);
      assertTrue(false);
    }
    catch (IOException ie) {
      // should have thrown exception
    }
    String value = "value" + key;
    read = readLongValue(scanner, value.getBytes().length);
    assertTrue("values nto equal", Arrays.equals(read, value.getBytes()));
    scanner.advance();
  }
  return (start + n);
}

A test that's always green? =). Part of the loop, which is part of the test itself, will never execute. The initial and final counter values in the for statement are equal, so the condition i < start immediately becomes false. Having run through the test file, I concluded that the loop condition should actually have been i < (start + n).
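A sketch of the presumable fix, consistent with the method reading n records and returning start + n:

for (int i = start; i < (start + n); i++) {
  ....
}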

MapReduce

V6007 Expression 'byteAm < 0' is always false. DataWriter.java(322)

GenerateOutput writeSegment(long byteAm, OutputStream out)
    throws IOException {
  long headerLen = getHeaderLength();
  if (byteAm < headerLen) {
    // not enough bytes to write even the header
    return new GenerateOutput(0, 0);
  }
  // adjust for header length
  byteAm -= headerLen;
  if (byteAm < 0) {    // <=
    byteAm = 0;
  }
  ....
}

The condition byteAm < 0 is always false. To see why, look at the code again: if execution reaches the operation byteAm -= headerLen, then byteAm >= headerLen, so after the subtraction byteAm can never be negative. Which is what we had to prove.

HDFS

V6072 Two similar code fragments were found. Perhaps, this is a typo and 'normalFile' variable should be used instead of 'normalDir'. TestWebHDFS.java(625), TestWebHDFS.java(615), TestWebHDFS.java(614), TestWebHDFS.java(624)

public void testWebHdfsErasureCodingFiles() throws Exception {
  ....
  final Path normalDir = new Path("/dir");
  dfs.mkdirs(normalDir);
  final Path normalFile = new Path(normalDir, "file.log");
  ....
  // logic block #1
  FileStatus expectedNormalDirStatus = dfs.getFileStatus(normalDir);
  FileStatus actualNormalDirStatus = webHdfs.getFileStatus(normalDir); // <=
  Assert.assertEquals(expectedNormalDirStatus.isErasureCoded(),
                      actualNormalDirStatus.isErasureCoded());
  ContractTestUtils.assertNotErasureCoded(dfs, normalDir);
  assertTrue(normalDir + " should have erasure coding unset in " + ....);

  // logic block #2
  FileStatus expectedNormalFileStatus = dfs.getFileStatus(normalFile);
  FileStatus actualNormalFileStatus = webHdfs.getFileStatus(normalDir); // <=
  Assert.assertEquals(expectedNormalFileStatus.isErasureCoded(),
                      actualNormalFileStatus.isErasureCoded());
  ContractTestUtils.assertNotErasureCoded(dfs, normalFile);
  assertTrue( normalFile + " should have erasure coding unset in " + ....);
}

Believe it or not, it's V6072 again! Just follow the variables normalDir and normalFile.

V6027 Variables are initialized through the call to the same function. It's probably an error or un-optimized code. TestDFSAdmin.java(883), TestDFSAdmin.java(879)

private void verifyNodesAndCorruptBlocks(
    final int numDn,
    final int numLiveDn,
    final int numCorruptBlocks,
    final int numCorruptECBlockGroups,
    final DFSClient client,
    final Long highestPriorityLowRedundancyReplicatedBlocks,
    final Long highestPriorityLowRedundancyECBlocks)
    throws IOException
{
  /* init vars */
  ....
  final String expectedCorruptedECBlockGroupsStr = String.format(
      "Block groups with corrupt internal blocks: %d",
      numCorruptECBlockGroups);
  final String highestPriorityLowRedundancyReplicatedBlocksStr
      = String.format(
      "\tLow redundancy blocks with highest priority " +
          "to recover: %d",
      highestPriorityLowRedundancyReplicatedBlocks);
  final String highestPriorityLowRedundancyECBlocksStr = String.format(
      "\tLow redundancy blocks with highest priority " +
          "to recover: %d",
      highestPriorityLowRedundancyReplicatedBlocks);
  ....
}

In this fragment, the variables highestPriorityLowRedundancyReplicatedBlocksStr and highestPriorityLowRedundancyECBlocksStr are initialized with the same value. Often that's intentional, but not here. The variable names are long and similar, so it's no surprise the copy-pasted fragment wasn't changed. To fix it, the author should use the input parameter highestPriorityLowRedundancyECBlocks when initializing highestPriorityLowRedundancyECBlocksStr. In addition, they most likely still need to correct the format string.
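A sketch of the presumable fix (the exact wording of the EC message is my guess):

final String highestPriorityLowRedundancyECBlocksStr = String.format(
    "\tLow redundancy blocks with highest priority " +
        "to recover: %d",
    highestPriorityLowRedundancyECBlocks);  // use the EC parameter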

V6019 Unreachable code detected. It is possible that an error is present. TestReplaceDatanodeFailureReplication.java(222)

private void
verifyFileContent(...., SlowWriter[] slowwriters) throws IOException
{
  LOG.info("Verify the file");
  for (int i = 0; i < slowwriters.length; i++) {
    LOG.info(slowwriters[i].filepath + ....);
    FSDataInputStream in = null;
    try {
      in = fs.open(slowwriters[i].filepath);
      for (int j = 0, x;; j++) {
        x = in.read();
        if ((x) != -1) {
          Assert.assertEquals(j, x);
        } else {
          return;
        }
      }
    } finally {
      IOUtils.closeStream(in);
    }
  }
}

The analyzer complains that the i++ increment in the loop can never execute, which means the for (int i = 0; i < slowwriters.length; i++) {....} loop performs at most one iteration. Let's find out why. In the first iteration, we link a stream to the file corresponding to slowwriters[0] for further reading. Then we read the file contents in the loop for (int j = 0, x;; j++):

  • if we read something relevant, we compare the read byte with the current value of the j counter through assertEquals (if the check fails, we drop out of the test);
  • if the file is checked successfully and we get to the end of the file (we read -1), the method exits.

Therefore, whatever happens during the check of slowwriters[0], it won't get to checking subsequent elements. Most likely, break has to be used instead of return.
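With break, only the inner reading loop exits, and the outer loop moves on to the next writer. A sketch of the presumable fix:

for (int j = 0, x;; j++) {
  x = in.read();
  if ((x) != -1) {
    Assert.assertEquals(j, x);
  } else {
    break;  // finish this file and continue with the next slowwriter
  }
}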

YARN

V6019 Unreachable code detected. It is possible that an error is present. TestNodeManager.java(176)

@Test
public void 
testCreationOfNodeLabelsProviderService() throws InterruptedException {
  try {
    ....
  } catch (Exception e) {
    Assert.fail("Exception caught");
    e.printStackTrace();
  }
}

In this situation, the Assert.fail method interrupts the test, and the stack trace will never be printed when an exception is caught. If the message about the caught exception is enough here, it's better to remove the stack trace printing to avoid confusion. If the printing is necessary, the two statements just need to be swapped.
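A sketch of the swapped version, for the case where the stack trace is actually wanted:

} catch (Exception e) {
  e.printStackTrace();             // print first: fail() throws and ends the test
  Assert.fail("Exception caught");
}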

Many similar fragments have been found:

  • V6019 Unreachable code detected. It is possible that an error is present. TestResourceTrackerService.java(928)
  • V6019 Unreachable code detected. It is possible that an error is present. TestResourceTrackerService.java(737)
  • V6019 Unreachable code detected. It is possible that an error is present. TestResourceTrackerService.java(685)
  • ....

V6072 Two similar code fragments were found. Perhaps, this is a typo and 'publicCache' variable should be used instead of 'usercache'. TestResourceLocalizationService.java(315), TestResourceLocalizationService.java(309), TestResourceLocalizationService.java(307), TestResourceLocalizationService.java(313)

@Test
public void testDirectoryCleanupOnNewlyCreatedStateStore()
    throws IOException, URISyntaxException
{
  ....
  // verify directory creation
  for (Path p : localDirs) {
    p = new Path((new URI(p.toString())).getPath());

    // logic block #1
    Path usercache = new Path(p, ContainerLocalizer.USERCACHE);
    verify(spylfs).rename(eq(usercache), any(Path.class), any()); // <=
    verify(spylfs).mkdir(eq(usercache), ....);

    // logic block #2
    Path publicCache = new Path(p, ContainerLocalizer.FILECACHE);
    verify(spylfs).rename(eq(usercache), any(Path.class), any()); // <=
    verify(spylfs).mkdir(eq(publicCache), ....);
    ....
  }
  ....
}

And finally, V6072 again =). The variables to examine in the suspicious fragment: usercache and publicCache.

Conclusion

Hundreds of thousands of lines of code are written during development. Production code is usually kept clean of bugs, defects, and flaws (developers test their code, the code is reviewed, and so on). Tests are definitely inferior in this regard. Defects in tests can easily hide behind a "green tick". As you've probably gathered from today's review of warnings, a green test is not always a successful check.

This time, when checking the Apache Hadoop code base, static analysis proved highly useful both for production code and for tests, which also play an important role in development.

So if you care about the quality of your code and tests, I suggest you set your sights on static analysis. And let PVS-Studio be the first contender in this undertaking.


