Kirill Epifanov

Jun 21 2024

Tags:

#Knowledge #Java

Bitwise operators in Java: unpacking ambiguities


Before we start
Once upon a time
Operators and how do they differ
The far it goes...
Why, exactly, is that bad?
Conclusion

The "&" and "|" operators are pretty straightforward and unambiguous when applied correctly. But do you know all the implications of using bitwise operators instead of logical ones in Java? In this article, we will examine both the pros of this approach in terms of performance and the cons in terms of code readability.

Before we start

This is the third and final article in the series. It's based on the results of checking DBeaver version 24 with the PVS-Studio static analyzer. The tool detected a few suspicious code fragments that caught our development team's attention and prompted us to cover them in several articles. If you haven't read the previous ones, you can find them here:

Volatile, DCL, and synchronization pitfalls in Java
How template method can ruin your Java code
Bitwise operators in Java: unpacking ambiguities (you are here)

Once upon a time

While wading through the analyzer warnings, we came across several for logical expressions that use bitwise operators in an if condition. The analyzer considers such code suspicious and issues a second-level (medium) warning, V6030. Here are some of the PVS-Studio messages:

The ExasolSecurityPolicy.java(123) file.

public ExasolSecurityPolicy(....) {
  ....
  String value = JDBCUtils.safeGetString(dbResult, "SYSTEM_VALUE");
  
  if (value.isEmpty() | value.equals("OFF")) {
    this.enabled = false;
  } else {
    assignValues(ExasolSecurityPolicy.parseInput(value));
  }
}

The analyzer reports that the value.equals("OFF") method is called even if value.isEmpty() is true, which is meaningless. The warning:

V6030 The method located to the right of the '|' operator will be called regardless of the value of the left operand. Perhaps, it is better to use '||'.

The ExasolConnectionManager.java(151) file.

@Override
protected void addObjectModifyActions(....) {
  ....
  // url, username or password have changed
  if (com.containsKey("url") | com.containsKey("userName") |
      com.containsKey("password"))
  {
    // possible loss of information - warn
    ....
    if (!(con.getUserName().isEmpty() | con.getPassword().isEmpty())) {
      ....
    }
  }
}

Here we see the com map checked in sequence for the url, userName, and password keys. Then, the code checks in the same way whether one of the connection parameters is empty. As a result, we receive the same old message yet again:

V6030 The method located to the right of the '|' operator will be called regardless of the value of the left operand. Perhaps, it is better to use '||'.

The PVS-Studio documentation explains that the diagnostic fires when a method returning a boolean value sits to the right of a bitwise operator, which is probably a typo. This coding style made me curious. If there had been a single warning of this kind, I'd have written it off as a typo. However, the project contains 16 such code fragments, so I wondered whether there was a hidden meaning behind the pattern.

Operators and how do they differ

Let's take a quick dive into the theory. I think it's reasonable to skip the long list of Java operators and focus directly on the bitwise and logical ones we need:

&& — logical AND
|| — logical OR
& — bitwise AND
| — bitwise OR
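
Before going further, here's a minimal sketch of how & and | behave depending on the operand types. The OperatorKinds class and its method names are made up for illustration. On int operands, the operators combine individual bits. On boolean operands, they produce a boolean but still evaluate both sides.

```java
public class OperatorKinds {
    // On int operands, & keeps only the bits set in both values.
    public static int maskBits(int flags, int mask) {
        return flags & mask;
    }

    // On boolean operands, & yields a boolean; both operands are always evaluated.
    public static boolean both(boolean a, boolean b) {
        return a & b;
    }

    public static void main(String[] args) {
        System.out.println(maskBits(0b1100, 0b1010)); // 8  (0b1000)
        System.out.println(0b1100 | 0b1010);          // 14 (0b1110)
        System.out.println(both(true, false));        // false
        System.out.println(true | false);             // true
    }
}
```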

Any Java reference will tell you that the & and | operators, unlike && and ||, always evaluate both operands. This code is an example of using the bitwise operator:

public static void main(String[] args) {
    // updateA() and updateB() are both always executed
    if (updateA() & updateB()) {
        // Doing something
    }
}

public static boolean updateA() {
    // Changing the state...
    return ....;
}

public static boolean updateB() {
    // Changing the state...
    return ....;
}

If we replace the bitwise operator with the logical one and the first condition returns false, the execution of the checks stops. Let's get a bit imaginative and explore examples that alter the inner state. In the following code, if prepareConnection() returns false, the establishConnection() method isn't executed.

public static void main(String[] args) {
    // establishConnection() is executed only if prepareConnection() returns true
    if (prepareConnection() && establishConnection()) {
        ....
    }
}

public static boolean prepareConnection() {
    ....
}

public static boolean establishConnection() {
    ....
}

So, if the operator can determine the result of the operation based on the evaluation of the left argument alone, the second argument is not evaluated as it's deemed unnecessary. This mechanism is called short-circuit evaluation. Perhaps one of the most popular use cases for short circuits is to check for null before using a variable.

public boolean testString(String str) {
    return str != null && str.equals("test");
}
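
To make the difference observable, here's a small sketch that counts how many operands actually get evaluated. The ShortCircuitDemo class and the check helper are hypothetical names introduced just for this demonstration.

```java
public class ShortCircuitDemo {
    static int calls = 0;

    // Hypothetical helper: records that it was called and returns the given value.
    static boolean check(boolean result) {
        calls++;
        return result;
    }

    // Number of operands evaluated for "false & true".
    public static int evaluatedWithBitwiseAnd() {
        calls = 0;
        boolean ignored = check(false) & check(true);
        return calls;
    }

    // Number of operands evaluated for "false && true".
    public static int evaluatedWithLogicalAnd() {
        calls = 0;
        boolean ignored = check(false) && check(true);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("&  evaluated " + evaluatedWithBitwiseAnd() + " operand(s)"); // 2
        System.out.println("&& evaluated " + evaluatedWithLogicalAnd() + " operand(s)"); // 1
    }
}
```

With &, the second call happens even though the first operand has already decided the result. With &&, it is skipped.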

At this point, it may seem that the DBeaver developers made a mistake that increased the number of checks done in the if blocks. This could have been due to a typo or just plain unawareness. But that's where the ambiguity comes into play.

The far it goes...

Let's dig into the details. It turns out that a bitwise operation can be faster than a logical operation, depending on the context. This is mainly because bitwise operators don't involve a branch prediction mechanism.

Branch prediction is performed by a dedicated unit in the CPU. It lets the processor fetch, and even start executing, the instructions that follow a branch (e.g., an if block) before the outcome of that branch is known. The branch predictor is an integral part of all modern pipelined processors and helps keep their computing resources busy.

As a side note, I should mention that branch prediction is closely related to speculative execution. In short, it enables the processor to run some instructions ahead of time without waiting until it knows whether the branch is taken or not. If the guess is right, we get a performance boost. If not, the speculative work is rolled back, incurring overhead.

So, modern processors have instruction pipelining. Each executed instruction has several stages: fetch, decode, execute, and save.

The fetch stage: the processor retrieves the instruction from memory.
The decode stage: the processor interprets the instruction.
The execute stage: the processor executes the operation specified by the instruction.
The save stage: the processor saves the result of the execution.

Within a processor, different operations can be executed in parallel: one instruction is being executed while another is being fetched from memory. The table below can illustrate this:
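
The same overlap can also be sketched in code. The snippet below is a deliberately idealized model, not a description of any real CPU: it assumes every stage takes exactly one clock cycle and nothing ever stalls. The PipelineDemo class and the stageAt method are hypothetical names.

```java
public class PipelineDemo {
    static final String[] STAGES = {"fetch", "decode", "execute", "save"};

    // Stage occupied by instruction `instr` at clock `cycle` (both 0-based),
    // or "-" if the instruction hasn't entered the pipeline yet or has already left it.
    public static String stageAt(int instr, int cycle) {
        int stage = cycle - instr; // each instruction starts one cycle after the previous one
        return (stage >= 0 && stage < STAGES.length) ? STAGES[stage] : "-";
    }

    public static void main(String[] args) {
        int instructions = 3;
        int cycles = instructions + STAGES.length - 1;
        // One row per instruction, one column per clock cycle.
        for (int i = 0; i < instructions; i++) {
            StringBuilder row = new StringBuilder("instr " + (i + 1) + ":");
            for (int c = 0; c < cycles; c++) {
                row.append(String.format(" %-8s", stageAt(i, c)));
            }
            System.out.println(row.toString().trim());
        }
    }
}
```

Each row is shifted one column to the right relative to the previous one: while the first instruction is being decoded, the second one is already being fetched.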

Everything seems fine until we encounter a branch instruction along the way. That's where our pipelined processor hits a snag. If a conditional branch appears in the middle of the pipeline, the processor can't fetch the next instructions because they depend on the outcome of the condition.

That's why modern processors try to predict the execution flow and anticipate the instructions to be executed next. This speeds up execution without any explicit hints in the code. For example:

private boolean debug = false;

public void test(....) {
    ....
    if (debug) {
        log("....");
    }
    ....
}

In this case, the processor (though the compiler and JVM can also make adjustments, which we'll ignore for now) can predict with high probability that the log method won't be called. Based on execution statistics, it won't prepare to execute that code. However, if the prediction turns out to be wrong, the pipeline has to be flushed and refilled, which hurts performance.

Bitwise operations lack this mechanism: they do not involve branches and are devoid of their overhead.

To confirm this, I ran a tiny benchmark based on the source code provided by a developer known as Rostor (source).

Code

Uploaded to GitHub Gist

Here we call methods with logical expressions one by one, using both bitwise and logical operations.
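
For readers who'd like to try something similar on their own machine, here's a much simpler sketch of the idea. This is not the benchmark from the Gist: the BranchSketch class, its methods, the array size, and the seeds are all my assumptions. It compares && and & over random boolean data and prints rough timings.

```java
import java.util.Random;

public class BranchSketch {
    // Counts positions where both flags are set, using the short-circuit operator.
    public static int countLogical(boolean[] a, boolean[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] && b[i]) count++;
        }
        return count;
    }

    // Same computation, but both operands are always evaluated.
    public static int countBitwise(boolean[] a, boolean[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] & b[i]) count++;
        }
        return count;
    }

    // Pseudo-random flags with a fixed seed, so runs are repeatable.
    public static boolean[] randomFlags(int size, long seed) {
        Random random = new Random(seed);
        boolean[] flags = new boolean[size];
        for (int i = 0; i < size; i++) {
            flags[i] = random.nextBoolean();
        }
        return flags;
    }

    public static void main(String[] args) {
        boolean[] a = randomFlags(1_000_000, 1L);
        boolean[] b = randomFlags(1_000_000, 2L);

        // Warm-up so the JIT has a chance to compile both methods.
        for (int i = 0; i < 50; i++) {
            countLogical(a, b);
            countBitwise(a, b);
        }

        long t0 = System.nanoTime();
        int logical = countLogical(a, b);
        long t1 = System.nanoTime();
        int bitwise = countBitwise(a, b);
        long t2 = System.nanoTime();

        System.out.println("&&: " + logical + " matches, " + (t1 - t0) + " ns");
        System.out.println("&:  " + bitwise + " matches, " + (t2 - t1) + " ns");
    }
}
```

Both methods always return the same count; only the timings may differ. Numbers from a hand-rolled loop like this are indicative at best, since the JIT compiler may well turn both loops into the same machine code. For serious measurements, a harness such as JMH is the usual choice.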

The results on my PC surprised me. In a plain test without functional interfaces and other OOP tricks, bitwise operations appear to be up to 40 percent faster than logical ones. The chart below shows the time taken for each individual operation.

The test compares conjunctions and disjunctions, both separately and all together. The biggest ones are shown at the top for quick reference. One immediate conclusion is apparent: logical operators turn out to be slower in this case.

Why? Branch prediction can often be erroneous and cause performance issues. In such a case, the bitwise operator clearly gains in execution performance due to the absence of this mechanism.

However, it's worth mentioning that this test is not flawless, and outcomes can still vary based on the processor design, performance, and testing conditions, which we'll delve into shortly. I'd love to discuss the test results and hear your thoughts on them in the comments. Nevertheless, it now seems that bitwise operators can run faster.

Why, exactly, is that bad?

After reading all this, you might wonder, "Why does this diagnostic exist in your analyzer if such code can be executed faster? Even if a programmer makes a typo, isn't it just a code smell?"

I'm still against such code, and here's why:

First, let me show you two interesting warnings from the same project that were among several other "false positives".

The ExasolTableColumnManager.java(79) and DB2TableColumnManager.java(77) files.

@Override
public boolean canEditObject(ExasolTableColumn object) {
    ExasolTableBase exasolTableBase = object.getParentObject();
    if (exasolTableBase != null &
        exasolTableBase.getClass().equals(ExasolTable.class)) {
        return true;
    } else {
        return false;
    }
}

V6030 The method located to the right of the '&' operator will be called regardless of the value of the left operand. Perhaps, it is better to use '&&'.

This practice sets the stage for an error and a NullPointerException.

In this case, it's worth considering the side effects and being careful. Sometimes it's easy to overlook the fact that you're writing code incorrectly. In the above case, if the exasolTableBase variable turns out to be null, the right operand is still evaluated and a NullPointerException is thrown. And honestly, I don't think the developer meant for the program to crash over a simple check for editability :)
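
The same failure is easy to reproduce in isolation. Here's a minimal sketch with a plain Object instead of the DBeaver classes. The NullCheckDemo class and its methods are hypothetical and only mirror the shape of the check above.

```java
public class NullCheckDemo {
    // Mirrors the flawed pattern: the right operand is evaluated even when obj is null.
    public static boolean isStringUnsafe(Object obj) {
        return obj != null & obj.getClass().equals(String.class);
    }

    // With &&, the right operand is skipped as soon as the null check fails.
    public static boolean isStringSafe(Object obj) {
        return obj != null && obj.getClass().equals(String.class);
    }

    // Reports whether the unsafe variant throws for a null argument.
    public static boolean unsafeThrowsOnNull() {
        try {
            isStringUnsafe(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isStringSafe(null));   // false
        System.out.println(isStringSafe("test")); // true
        System.out.println(unsafeThrowsOnNull()); // true
    }
}
```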

Besides, one of the developers copied the erroneous code and pulled it into another program module. Let's add copy-pasting to the list of sins. That's why there are two erroneous files above.

Second, another drawback is that we lose short-circuit evaluation, which often improves the performance of the operation.

For example, as in this case, for which the analyzer also issued a warning:

The ExasolDataSource.java(950) file.

@Override
public ErrorType discoverErrorType(@NotNull Throwable error) {
  String errorMessage = error.getMessage();
  if (errorMessage.contains("Feature not supported")) {
    return ErrorType.FEATURE_UNSUPPORTED;
  } else if (errorMessage.contains("insufficient privileges")) {
    return ErrorType.PERMISSION_DENIED;
  } else if (
      errorMessage.contains("Connection lost") | 
      errorMessage.contains("Connection was killed") | 
      errorMessage.contains("Process does not exist") | 
      errorMessage.contains("Successfully reconnected") | 
      errorMessage.contains("Statement handle not found") |
      ....
      )
  {
    return ErrorType.CONNECTION_LOST;
  }
  return super.discoverErrorType(error);
}

V6030 The method located to the right of the '|' operator will be called regardless of the value of the left operand. Perhaps, it is better to use '||'.

In this method, the developer checks the errorMessage string and returns the error type. Everything seems fine, but when using a bitwise operator in an else if statement, we lose the short-circuit evaluation optimization. This is obviously a bad thing, because if any substring is found, we could jump straight to return instead of checking the other options.
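
Besides simply switching to ||, a long chain like this can be written so that it stops at the first match. The sketch below is one possible alternative, not a fix proposed by the DBeaver team. The ConnectionLostCheck class is hypothetical, and the list holds only the substrings visible in the snippet above (the original chain is longer). It requires Java 9 or later for List.of.

```java
import java.util.List;

public class ConnectionLostCheck {
    // Substrings taken from the snippet above; the elided ones are not reproduced here.
    static final List<String> MARKERS = List.of(
        "Connection lost",
        "Connection was killed",
        "Process does not exist",
        "Successfully reconnected",
        "Statement handle not found"
    );

    // anyMatch is a short-circuiting operation: it stops at the first matching substring.
    public static boolean isConnectionLost(String errorMessage) {
        return MARKERS.stream().anyMatch(errorMessage::contains);
    }

    public static void main(String[] args) {
        System.out.println(isConnectionLost("Connection lost: timeout")); // true
        System.out.println(isConnectionLost("insufficient privileges"));  // false
    }
}
```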

Third, there's no guarantee that bitwise operations will be faster for you. And even if they are, it's just a micro-optimization of a few nanoseconds.

Fourth, the synthetic test above favored bitwise operators because the provided values are random and show no apparent patterns. With such input, the branch predictor often guesses wrong, so it only makes things worse.

So, if the branch predictor operates correctly, bitwise operations cannot catch up with the performance of logical operations. This case is very specific.

To sum up all of the above, the final drawback is that using bitwise operators for routine boolean checks looks unconventional from a code style perspective, especially when the code also involves actual bitwise operations. It's easy to picture the reaction of a modern Java programmer who encounters a bitwise operator in an if statement during code review.

Conclusion

That's all for today. I have investigated the relevance of replacing logical operators with bitwise operators, as well as the applicability of the analyzer's diagnostic rule. And you've gained some insights into our favorite Java and its practices =)

If you'd like to search for this or other errors in your project as well, you can try PVS-Studio for free at this link.
