>
>
>
Few words about Java code hygiene

Konstantin Volohovsky
Articles: 15

Few words about Java code hygiene

While reviewing code or working on a new project, we can get annoyed by many things: style, approaches, quality. However, the most frustrating is the trite lack of code hygiene. Since the issue is quite common, I'd like to pay attention to it and remind you how to maintain code hygiene.

In lieu of introduction

Indeed, we're all human beings, living in an imperfect world. So, the last thing I'd like to do is to chide the already constrained programmers. Moreover, I wouldn't like to blame the open-source projects' developers for nothing. However, now I suggest you to put yourself in a perfectionist's shoes and recall all cases when you have dealt with such code.

To reproduce the case, let's take a random project from GitHub: NGB, a web genome-viewer tool. To simplify the search for suspicious code fragments, I use the PVS-Studio static analyzer.

Let's dive in

It's not welcome here

So, do you like the code that doesn't do anything? I hope you do, otherwise, the following snippet may ruin your mood:

public NggbIntervalTreeMap<Gene> loadGenesIntervalMap(GeneFile geneFile,
                                                      int startIndex,
                                                      int endIndex,
                                                      Chromosome chromosome) {
    ....
    List<Future<Boolean>> results;
    try {
        results = taskExecutorService.getExecutorService().invokeAll(callables);
    } catch (InterruptedException | AssertionError e) {
        throw new GeneReadingException(geneFile, 
                                       chromosome, 
                                       startIndex, 
                                       endIndex, 
                                       e);
    }

    results.stream().map(future -> {
        try {
            return future != null ? future.get() : null;
        } catch (InterruptedException | ExecutionException e) {
            log.error(getMessage(MessagesConstants.ERROR_GENE_BATCH_LOAD,
                                 geneFile.getId(), 
                                 chromosome.getId(), 
                                 e));
        }
        return null;
    });
    ....
    return genesRangeMap;
}

V6010 The return value of function 'map' is required to be utilized. GffManager.java(477)

You may think this code fragment is OK. The results variable is declared, populated with values, and handled. However, everything is OK until you see that the result of the call to map() isn't used, and another value is returned from the method (I confess, I have omitted its handling for your comfort reading). Does it negatively affect the program execution, or is it still there after refactoring? Guess yourself (as you remember, we imagine that it's our own project.)

Is it a critical error?

Here we go again: the case where something is clearly going wrong. Unfortunately, it has become even more difficult for me to come up with an excuse for this snippet:

private static OrganismType determineHeterozygousGenotypeGA4GH(
    VariantGA4GH ga4GHEntity, List<Allele> alleles, int[] genotypeArray) {

    OrganismType organismType = null;
    for (int i = 0; i < genotypeArray.length; i++) {
        if (alleles.get(i).isReference()) {
            genotypeArray[i] = 0;
            organismType = OrganismType.HETEROZYGOUS;
        } else {
            if (organismType == null) {
                organismType = OrganismType.HETERO_VAR;
            }
            genotypeArray[i] = ga4GHEntity.getAlternateBases()
                                          .indexOf(alleles.get(i)) + 1; // <=
        }
    }

    return organismType;
}

V6066 The type of object passed as argument is incompatible with the type of collection: String, Allele. VcfGa4ghReader.java(535)

The ga4GHEntity.getAlternateBases() method returns List<String>, while we try to find an object of the Allele type. Since the object of a different type is obviously not found, zeros (the result of adding -1 and 1) are written to genotypeArray. Would you walk away from the trouble and pretend that you notice nothing, or would you try to fix it? The moral pressure intensifies when you understand that even IntelliJ IDEA can detect these errors.

How's that even possible?

We continue to play detectives, (unsuccessfully) finding out what the developers have wanted to do. Here are two curious areas of code: Here is the area # 1:

private VariantSet variationMetadata(final String path) {
    final int index = path.indexOf('-');
    VariantSet variantSet;
    final String variantSetId;
    if (index >= 0) {
        variantSetId = path.substring(0, index - 1);
    } else {
        variantSetId = path;
    }
    ....
    return variantSet;
}

And the area # 2:

private TIntv getIntv(final String s) {
    ....
    int end = s.indexOf('\t', beg);
    while (end >= 0 || end == -1) {
        if (col == mSc) {
            ....
        } else {
            if ((mPreset & TYPE_FLAG) == 0) {
                if (col == mEc) {
                    intv.end = Integer.parseInt(end != -1 
                                                ? s.substring(beg, end) 
                                                : s.substring(beg));
                }
            } else if ((mPreset & TYPE_FLAG) == 1) {
                if (col == 6) {
                    String cigar = s.substring(beg, end);
                    ....
                }
            } else if ((mPreset & TYPE_FLAG) == 2) {
                String alt;
                alt = end >= 0 ? s.substring(beg, end) 
                               : s.substring(beg);
                ....
            }
        }
        if (end == -1) {
            break;
        }
        beg = end + 1;
        end = s.indexOf('\t', beg);
    }
    return intv;
}

The phrase in the title is what I'd say if I saw such a case during the code review. So, I don't show you the error on purpose — try to find it yourself (it's easy, I've shortened the code :) ). For now, let me share my opinion on this.

After "digging" into the project, I have no confidence that the app is easy to crash. However, it's still frustrating that developers allow a negative index value (directly in the second case and due to subtraction in the first case) to extract the substring. Especially in the first case, where the index check and substring extraction are literally next to each other. In the second case, developers may have missed a ternary operator, because the other two lines have it.

Here are the warnings for these two pieces of code:

V6009 The 'substring' function could receive the '-1' value while non-negative value is expected. Inspect argument: 2. VcfGa4ghReader.java(212)

V6009 The 'substring' function could receive the '-1' value while non-negative value is expected. Inspect argument: 2. TabixReader.java(430)

Is that the way it's meant to be, or is that the way it's meant to be?

Do you want to fix the DAO infrastructure? You go to the required module, and you see the following method:

private String getGroupByField(final List<VcfFile> files, final String groupBy){
    final VcfIndexSortField sortField = VcfIndexSortField.getByName(groupBy);
    if (sortField == null) {
        ....
        if (   infoItem.getType() == VCFHeaderLineType.Integer
            || infoItem.getType() == VCFHeaderLineType.Float) {
            return infoItem.getName().toLowerCase();
        } else {
            return infoItem.getName().toLowerCase();
        }
    } else {
        if (   sortField.getType() == SortField.Type.INT
            || sortField.getType() == SortField.Type.FLOAT) {
            return sortField.getField().fieldName;
        } else {
            return sortField.getField().fieldName;
        }
    }
}

V6004 The 'then' statement is equivalent to the 'else' statement. FeatureIndexDao.java(852), FeatureIndexDao.java(854)

Misunderstanding. Sadness. Frustration. Judging by the blame, the case has occurred when different developers have edited one part of the code. However, even when we've found someone to blame, the shock is still here: why would someone need two conditions in a row with the same branches? Am I missing something?

There's a good chance that the code doesn't behave incorrectly here. But what if it does? All we have left is to poke unit tests...

How to test unit tests

Speaking of unit tests: it's easy enough to stop trusting them. For example, if we make a mistake in their setup:

public void addFeatureIndexedFileRegistration(Long refId,
                                              String path,
                                              String indexPath,
                                              String name,
                                              Long fileId,
                                              Long fileBioId, 
                                              BiologicalDataItemFormat format,
                                              boolean doIndex,
                                              String prettyName) {
    String absolutePath = getNormalizeAndAbsolutePath(path);
    onRequest().havingMethodEqualTo(HTTP_POST)
               .havingPathEqualTo(....)
               .havingBodyEqualTo(TestDataProvider
                                     .getRegistrationJsonWithPrettyName(
                                         refId, 
                                         absolutePath, 
                                         indexPath,
                                         name, 
                                         doIndex, 
                                         prettyName))
               .respond()
               .withBody(....)
               .withStatus(HTTP_STATUS_OK);
}

To get the hang of the trick, let's look at the method signature: TestDataProvider.getRegistrationJsonWithPrettyName(....):

public static String getRegistrationJsonWithPrettyName(
    Long referenceID,
    String path,
    String name,
    String indexPath,
    boolean doIndex,
    String prettyName) {
    ....
}

Do you see that? If so, you may thank me for formatting the code a bit. Here's the warning:

V6029 Possible incorrect order of arguments passed to method: 'indexPath', 'name'. TestHttpServer.java(139), TestHttpServer.java(140)

When calling the method, the arguments order is mixed up. The order of passing indexPath and name variables doesn't correspond to the order of parameters with similar names in the method. Does another bug compensate the bug, or does it negatively affect the tests somewhere? We can only guess and be glad that it's not our responsibility to fix that.

How to take care of the code

I see these issues all the time. Though it seems clear: write good code, don't write bad code, don't make mistakes. Next time, Captain Obvious will come back to you with a surprising new discovery.

But seriously, everyone knows it. As I've said, my job is just to remind you. Even though you have your good heads on your shoulders, don't ignore the IDE warnings. Then you can think about linters for simple cases and specialized tools for finding more suspicious things. Thus, I use the PVS-Studio static analyzer in this article. Unfortunately, automation isn't a panacea. We shouldn't give up on simple human code review. So, it's still not possible for a machine to replace a human in this case.

Afterword

I haven't had any major concerns about the code quality from the NGB developers. I've just taken the most confusing code snippets. I'll stop grumbling here, but you might want to grumble a little more in the comments. If you'd like to check a project using PVS-Studio, you can try it by following the link.

To follow new articles on the code quality topic, you're welcome to subscribe to: