
Code generation for algorithms in Java

Feb 19 2026

Devs often use "magic code" to solve tree problems on LeetCode. However, enterprise code requires readability and maintainability for years. What shall we do when a task is so large that completing it manually would take weeks? Let's take a look at how code generation can help.

What's the problem?

In the previous article, we examined a problem of tree translation. There, we scrutinized how it could be decomposed and proposed a solution. If you want to start with the basics, I suggest taking a peek there.

However, let me briefly recap what we'll discuss.

What shall we do? We need an AST translator from the TypeScript compiler representation to our own.

For what? To develop a static analyzer for JavaScript/TypeScript.

An abstract syntax tree (or AST) is surprisingly not that different from any other tree. I'd like to highlight two important characteristics: it has many different types of nodes (one for each syntactic structure in the language), and the trees can grow quite large, even though operations on the nodes remain standard. In the last article, we defined three types of operations:

  • Structural equivalent transformations (isomorphic transformations): when nothing changes in a tree structure.
  • Decomposition: when we turn one node into several new nodes based on its properties. There was no place in the problem for the inverse operation (aggregation).
  • Normalization: sometimes, we need to perform some arbitrary action that can't be classified on a node. Last time, we normalized () => true to () => { return true }.

In the previous article, we concluded that we needed to translate a tree consisting of 263 nodes into our tree of ~100 nodes. This means that one of the operations would have to be implemented at least a hundred times manually. These are thousands of nearly identical code lines that must be carefully written and then meticulously maintained. Of course, we could do it by hand, but there's a better option: code generation.

How to write code that writes code

Choosing a framework

So, we want to generate code. Java build systems allow us to add generated sources during the build, so we're not limited here.

The task seems simple: take a StringBuilder and call append as many times as needed. For trivial cases, this may work, but even slight complexity, like indentation, causes problems.

To simplify our lives and avoid reinventing the wheel, it's better to use an existing tool. There are several options:

  • JavaPoet is a specialized and lightweight library that solves the task at hand—it generates Java code. So, that's what we'll use.
  • Spoon is a heavier and more powerful framework that helps read, modify, and create Java code. It's powerful enough that we built our Java static analyzer on it. It may be an overkill for our task, but it's worth mentioning.

With the right tool in hand, we can start exploring how to use it.

JavaPoet

Great things have small beginnings, so I'll demonstrate JavaPoet functionality with the well-known "Hello World" example:

public static JavaFile createMain() {
  var main = MethodSpec.methodBuilder("main")
                       .addModifiers(Modifier.PUBLIC, Modifier.STATIC)
                       .returns(void.class)
                       .addParameter(String[].class, "args")
                       .addStatement("$T.out.println($S)",
                                     System.class,
                                     "Hello world!")
                       .build();

  var hello = TypeSpec.classBuilder("Main")
                      .addModifiers(Modifier.PUBLIC, Modifier.FINAL)
                      .addMethod(main)
                      .build();

  return JavaFile.builder("com.example.generated", hello)
                 .indent("  ")
                 .build();
}

After saving it to a file, we get:

package com.example.generated;

import java.lang.String;
import java.lang.System;

public final class Main {
  public static void main(String[] args) {
    System.out.println("Hello world!");
  }
}

To understand what JavaPoet offers, we'll examine the API using the example above; that's enough for our purposes. Along the way, I'll give a brief overview of the framework.

Common API

  • JavaFile is used for determining where we write our code to, including methods for writing to files. We can add a package and comments here.
  • TypeSpec provides builders for creating types. They have everything: modifiers, fields, methods, and so on.
  • FieldSpec defines class fields.
  • MethodSpec is similar to the previous ones; it allows us to define a method. To avoid writing the entire method body in one place, we can immediately pass CodeBlock to addStatement.

A convenient way to create code blocks

Defining the method body in a separate location isn't the only thing that CodeBlock offers. Remember when I mentioned indents? CodeBlock solves that, too. We don't need to worry about where to add braces and indentation: with block.beginControlFlow("if ($L > 0)", "x") and block.endControlFlow(), we automatically open and close the code block:

if (x > 0) {
}

Examples of usage will appear later in the text.
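Here's a minimal sketch of how such a block is built. The statement inside the if is my own addition for illustration; the article's example keeps the block empty:

```java
import com.squareup.javapoet.CodeBlock;

public final class ControlFlowDemo {
    // Builds the if block with beginControlFlow/endControlFlow, so
    // braces and indentation are handled for us.
    public static CodeBlock ifPositive(String var) {
        return CodeBlock.builder()
                .beginControlFlow("if ($L > 0)", var)
                .addStatement("$L--", var)
                .endControlFlow()
                .build();
    }
}
```

Calling ControlFlowDemo.ifPositive("x") produces a properly braced and indented if statement; nested beginControlFlow calls increase the indentation automatically.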

String formatting

In the examples above, we may notice that the format strings contain placeholders such as $L and $T. The first performs literal substitution; the second substitutes a type and takes care of imports.

Literal substitution is a simple thing because it inserts whatever we pass to it. Handling types, however, is a bit trickier. In the "Hello World" example, the class is passed directly. We can use primitive types via TypeName, but the module where we're writing the code generator may have no access to the type we need. The next section explains how to handle it.

How to process imports correctly

If we need to refer to an arbitrary class, we need ClassName. It's defined as simply as possible: ClassName.get("com.example.package", "ClassName"). It can then be used in the previously mentioned $T placeholder.

ClassName will be needed frequently, so it's better to cache it.
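One simple way to do the caching is a small memoizing helper; the package name below is purely illustrative, not from the project:

```java
import com.squareup.javapoet.ClassName;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Caches ClassName instances so each type is created only once.
final class Types {
    private static final Map<String, ClassName> CACHE =
            new ConcurrentHashMap<>();

    static ClassName of(String pkg, String simpleName) {
        return CACHE.computeIfAbsent(pkg + "." + simpleName,
                                     k -> ClassName.get(pkg, simpleName));
    }
}
```

Repeated calls like Types.of("com.example.tree", "JsInvocation") then return the same instance instead of re-parsing the name.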

Tree translation

Where we left off

Finally, we can move on to the promised tree translation. Last time, I showed how to perform the three transformations mentioned earlier using a visitor. I'll repeat the code:

interface Visitor<R, P> {
    R visitBinary(Binary n, P parent);
    R visitLiteral(Literal n, P parent);
}

class Builder extends Scanner<JsNode, JsNode> {
    @Override
    public JsNode visitBinary(Binary n, JsNode parent) {
        var translated = new JsBinary(n.getKind());
        translated.setLeft((JsExpression) scan(n.getLeft()));
        translated.setRight((JsExpression) scan(n.getRight()));
        return translated;
    }

    @Override 
    public JsNode visitLiteral(Literal n, JsNode parent) {
        var literal = new JsLiteral(n.getValue());
        literal.setParent(parent);
        return literal;
    }
}

Here we can see an isomorphic transformation of a binary operation implemented in a visit method:

  • the first argument is the node for translation;
  • the second argument is the pre-created parent;
  • the method returns the newly created target tree node.

By recursively traversing the source tree this way, we obtain the target tree root.

As outlined earlier, we have a plethora of node types that need to be handled manually. Obviously, we can't avoid writing the translation logic itself (collecting and linking nodes), but there's a lot of repetition: method declarations, annotations, similar calls, and type conversions. In short, it's boilerplate, which we can avoid using code generation.

YAML programming

How can we eliminate boilerplate in the Java code while preserving the logic? Just don't use Java! Instead, any other format that our self-written generator can read will be a good option. Personally, I chose YAML because its nested structure fits our needs well.

First, we need to define an isomorphic transformation, i.e., simply specifying which node in the target tree each source node maps into:

CallExpression:
  target: JsInvocation
  args:
    - getQuestionDotToken()

Here, we translate the CallExpression node into JsInvocation, additionally specifying that the result of the getQuestionDotToken() call on the source node should be passed to the constructor.

We simply moved some Java code into a separate file, which the generator then uses to create Java code from the template. As a result, we got a simple and punchy pseudo-DSL. The main advantage lies in preserving a consistent generation pattern: we set the mapping between nodes and their bindings, ensuring its correctness via Java typing and build- and test-level checks.
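To make the idea concrete, here's a tiny sketch of the kind of method such a mapping might produce. The node classes are stubs standing in for the real ones, and the real generated code is richer; only the shape matters here:

```java
// Hypothetical stub mirroring the source node from the YAML entry.
class CallExpression {
    Object getQuestionDotToken() { return "?."; }
}

// Hypothetical stub of the target node.
class JsInvocation {
    final Object questionDotToken;
    JsInvocation(Object token) { this.questionDotToken = token; }
}

class GeneratedVisitorSketch {
    // What the generator might emit for CallExpression -> JsInvocation:
    // call the listed getter and pass its result to the constructor.
    JsInvocation visitCallExpression(CallExpression n) {
        return new JsInvocation(n.getQuestionDotToken());
    }
}
```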

Secondly, we should not forget about decomposition. Let's revisit the string interpolation example from the previous article, where we split TemplateSpan into several nodes.

We can describe it like this:

TemplateSpan:
  - member: Expression
    target: JsInterpolationExpression
  - member: TemplateTail
    target: JsInterpolationText

A JsInterpolationExpression should be created from the Expression property of the TemplateSpan node, and JsInterpolationText from the TemplateTail.
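The generated decomposition could look roughly like this. Again, these are stub classes with made-up signatures; the point is that one source node yields one target node per listed member:

```java
import java.util.List;

// Hypothetical stubs mirroring the YAML entry above.
class TemplateSpan {
    Object getExpression() { return new Object(); }
    Object getTemplateTail() { return new Object(); }
}
class JsInterpolationExpression {
    JsInterpolationExpression(Object expression) { }
}
class JsInterpolationText {
    JsInterpolationText(Object text) { }
}

final class DecompositionSketch {
    // One TemplateSpan produces two target nodes, one per member.
    static List<Object> visitTemplateSpan(TemplateSpan n) {
        return List.of(
            new JsInterpolationExpression(n.getExpression()),
            new JsInterpolationText(n.getTemplateTail()));
    }
}
```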

Thirdly, we need to connect parent and child nodes somehow. Binding children to parents is straightforward: we create a stack of elements and use the top of the stack as the parent node. But what about binding parents to children? How do we know whether an Expression is the right or left operand of a binary expression? Or is it an argument for a function call?

To do this, we define roles:

BINARY_LEFT:
  parentAs: JsBinaryExpression
  nodeAs: Expression
  invoke: setLeft
  
BINARY_RIGHT:
  parentAs: JsBinaryExpression
  nodeAs: Expression
  invoke: setRight

They define how child nodes are attached to a parent and how nodes are cast based on their role. In code, the role will be set when scanning a node property.

The remaining step is to specify roles for the tree node properties:

BinaryExpression:
  - member: Left
    role: BINARY_LEFT
  - member: Right
    role: BINARY_RIGHT

We assign the corresponding roles to the left and right operands of the binary expression.

This way, we reduce the number of handwritten lines from thousands to hundreds, leaving only those that contain actual logic. The next step is to understand how to generate code based on the configuration.

Stack approach

We've decided how we'll generate the code, but we haven't yet figured out what exactly will be generated. Although I've already dropped a hint: we need a stack-based translation context (state).

Let me remind you that we already have a generated visitor and walker for the source tree, as described in the previous article. So, the foundation for the translator is ready.

First, I'll reproduce what a functional visitor looks like.

class Builder extends Scanner<JsNode, JsNode> {
    @Override
    public JsNode visitBinary(Binary n, JsNode parent) {
        var translated = new JsBinary(n.getKind());
        translated.setLeft((JsExpression) scan(n.getLeft()));
        translated.setRight((JsExpression) scan(n.getRight()));
        return translated;
    }

    @Override 
    public JsNode visitLiteral(Literal n, JsNode parent) {
        var literal = new JsLiteral(n.getValue());
        literal.setParent(parent);
        return literal;
    }
}

I called it "functional" because it's always busy returning something. We can rewrite it to stack-based, where stacks are used for parent nodes and roles instead of call results, like this:

class Builder extends Scanner {
    private final Deque<JsNode> elements = new ArrayDeque<>();
    private final Deque<Role> roles = new ArrayDeque<>();
    
    @Override
    public void visitBinary(Binary n) {
        var translated = new JsBinary(n.getKind());
        elements.push(translated);
        
        roles.push(Role.BINARY_LEFT);
        scan(n.getLeft());
        roles.pop();
        
        roles.push(Role.BINARY_RIGHT);
        scan(n.getRight());
        roles.pop();
        
        elements.pop();
        
        if (!elements.isEmpty()) {
            var parent = elements.peek();
            var role = roles.peek();
            translated.setParent(parent);
            
            if (role == Role.BINARY_LEFT) {
                ((JsBinaryExpression) parent).setLeft((Expression) translated);
            } else if (role == Role.BINARY_RIGHT) {
                ((JsBinaryExpression) parent).setRight((Expression) translated);
            }
        }
    }

    @Override 
    public void visitLiteral(Literal n) {
        var literal = new JsLiteral(n.getValue());
        
        var parent = elements.peek();
        var role = roles.peek();
        literal.setParent(parent);
        
        if (role == Role.BINARY_LEFT) {
            ((JsBinaryExpression) parent).setLeft((Expression) literal);
        } else if (role == Role.BINARY_RIGHT) {
            ((JsBinaryExpression) parent).setRight((Expression) literal);
        }
    }
}

Here, the roles stack appears, where we set which role to use when scanning a child node. Depending on it, the child node determines how exactly to connect itself to the parent. The parent, in turn, is taken from the elements stack. Roles are duplicated for both literals and binary expressions because binary expressions can be subexpressions.

Thanks to stacks, the code becomes more uniform and consists of discrete operations, which is good for us because we need to generate code. In addition, there's no need to support several different visitors for cases when the project requires ones that neither accept parents nor return anything.

On the other hand, the example looks clumsy, but I deliberately simplified it for clarity. The code can definitely be improved:

  • We can split Builder into separate parts (node creation, scanning, and finalization) because right now it's a single class doing everything at once.
  • We can move stacks to a separate context class so that they can be conveniently shared between different components of the translator from the previous point.
  • We can add and remove elements from stacks using try-with-resources syntax to reduce visual noise.
  • We can move role binding to a separate method that contains a single large switch statement, rather than clutter nodes with duplicate if statements.
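The third and fourth items could be sketched like this. The names are illustrative, not the project's actual code; the idea is that pushing a value returns a handle that pops it on close, so try-with-resources balances every push with a pop:

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ScopedStack<T> {
    // close() redeclared without checked exceptions,
    // so no catch block is needed at the call site.
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    private final Deque<T> stack = new ArrayDeque<>();

    // Pushes a value; closing the returned handle pops it back off.
    Scope push(T value) {
        stack.push(value);
        return stack::pop;
    }

    T top() { return stack.peek(); }

    boolean isEmpty() { return stack.isEmpty(); }
}
```

A visit method then reads as `try (var scope = roles.push(Role.BINARY_LEFT)) { scan(n.getLeft()); }`, and the role can't be left dangling on the stack.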

Actually, that's exactly what I did in the project. The steps roughly follow this order:

  • EnterVisitor creates a node and places it on the stack;
  • Binder contains a giant switch with all roles and binds parents with children in both directions;
  • Walker (Scan children) places roles on the stack and scans child nodes with those roles. During decomposition, it also places elements created from node properties onto the stack. The scan is recursive, so for each child element we'll again enter EnterVisitor;
  • ExitVisitor removes the element from the stack.
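The Binder step can be sketched as follows. The class and role names mirror the article's examples, everything else is stubbed, and only two roles are shown instead of the real giant switch:

```java
enum Role { BINARY_LEFT, BINARY_RIGHT }

class JsNode { JsNode parent; }
class JsExpression extends JsNode { }
class JsBinaryExpression extends JsExpression {
    JsExpression left, right;
}

final class Binder {
    // One switch over all roles links the finished child to the
    // parent taken from the top of the elements stack, both ways.
    static void bind(JsNode parent, Role role, JsNode child) {
        child.parent = parent;
        switch (role) {
            case BINARY_LEFT ->
                ((JsBinaryExpression) parent).left = (JsExpression) child;
            case BINARY_RIGHT ->
                ((JsBinaryExpression) parent).right = (JsExpression) child;
        }
    }
}
```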

Normalization

We sought to achieve consistent code, but we need to factor in normalization, which by definition is a certain arbitrary action. How can this be addressed?

As the steps above show, I have replaceable EnterVisitor and ExitVisitor, and for Walker, we can define arbitrary code after the traversal but before popping the current element from the stack.

Let's look at this through the example of entry and exit visitors. EnterVisitor and ExitVisitor are created through factories, where the default factory performs a direct conversion from YAML. To completely replace this logic, we just need to register a factory that will generate what we need instead of standardized code. For example:

public class VisitFile implements VisitGenerator {
    @Override
    public CodeBlock generateEnter() {
        return CodeBlock.builder()
            .addStatement("var file = new $T()",
                          ClassName.get(filePkg, "JsCodeFile"))
            .addStatement("ctx.getElements().push(file)")
            .addStatement(
                "file.setPosition($T.startPosition($T.of(src.getPath())))",
                ClassName.get(posPkg, "SourcePosition"),
                ClassName.get(Path.class))
            .build();
    }

    @Override
    public CodeBlock generateExit() {
        return CodeBlock.builder()
            .addStatement("file = (JsCodeFile) ctx.getElements().pop()")
            .build();
    }
}

The generated entry visitor method will look like this:

public void visitFileNode(FileNode src) {
  var file = new JsCodeFile();
  ctx.getElements().push(file);
  file.setPosition(SourcePosition.startPosition(Path.of(src.getPath())));
}

And the exit visitor method will look like this:

public void visitFileNode(FileNode src) {
  file = (JsCodeFile) ctx.getElements().pop();
}

This factory lets us add special handling: setting the source position on entry and assigning the result to an output field on exit.

However, directly modifying Walker methods or replacing visitor methods can lead to issues:

  • The IDE won't highlight errors within strings. JavaPoet makes things easier and reduces errors, but it doesn't eliminate them entirely.
  • If we overuse it, all the benefits of code generation will be lost, as maintenance costs can outweigh them. In this sense, architecture isn't perfect because it doesn't protect against misuse.

Let's sum up. Although the solution isn't perfect, we've reached the desired compromise:

  • Most transformations are covered by YAML configuration.
  • For isolated cases, the generated code is replaced with arbitrary code.
  • If we forgot to take something into account beforehand, we can hook into the process at almost any moment.
  • In total, more than 5,000 lines are generated. When requirements change, all we need to do is to make small adjustments to the architecture via the code generator and rebuild the project.

Why not do all this with an LLM?

It's now 2026, and I'm completely ignoring current trends. Writing a large amount of boilerplate code? That's literally a case for AI agents. The idea did cross my mind, but let me explain why I didn't go that route:

  • AI-generated code is a good option for internal tools where quality isn't as critical, but if the code runs on customers' machines, that's undesirable.
  • AI still struggles with following invariants in code, and there are plenty of them here: the source tree, the target tree, and their properties. The generated code will have to be reviewed by a human in any case.
  • As a result, due to the need for thorough code review, the AI agent option turns out to be essentially the manual option with all its disadvantages and poorly predictable benefits.

Afterword

This concludes the two-part series about tree translation. It turned out to be quite an unusual problem. I hope these articles helped you learn more about trees and code generation. Please share in the comments if you've ever run into similar problems.

While this series ends here, we'll continue publishing articles about the new analyzer. To keep up with us, subscribe to PVS-Studio on X and our monthly article digest. You're welcome to try the tool for free.

If you'd like to follow my posts, subscribe to my personal blog.
