Vladislav Bogdanov

Nov 22 2024

Tags:

#Knowledge #Java

Java serialization: let's dig it up

Introduction
Refreshing our brains
Serializable
- Let's dig a little deeper
- Can we impact the serialization process?
Externalizable
- Digging digging digging!
Conclusion

Java equips developers with convenient tools for serializing objects. Although they seem primitive at first glance, their internal implementation contains a wealth of interesting insights. In this article, we'll explore the essentials of serialization and its nuances. Let's see how it operates under the hood!

Introduction

Java serialization is often treated as a straightforward mechanism: just implement the Serializable interface, and that's it. However, once we dig a little deeper, numerous subtleties and interesting points turn up.

In this article, we'll talk about the two built-in serialization mechanisms in Java, the Serializable and Externalizable interfaces. Let's start with the basics, and then take a peek behind the scenes of each process.

Refreshing our brains

Serialization is the process of representing the object state as a byte sequence. The reverse process, i.e., converting the byte sequence into the object, is called deserialization.
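To make "byte sequence" concrete, here's a minimal sketch that serializes a string into memory instead of a file. The ByteSequenceDemo class and its toBytes helper are hypothetical names used only for this illustration. The one thing it relies on is that every Java serialization stream starts with the magic number 0xACED followed by the stream version 0x0005.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class ByteSequenceDemo {
    // Serializes any serializable object into an in-memory byte array.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = toBytes("Friday");
        // The first four bytes are the stream header: AC ED 00 05.
        System.out.printf("%02X %02X %02X %02X ...%n",
                bytes[0], bytes[1], bytes[2], bytes[3]);
    }
}
```

The same bytes could just as well be written to a socket or a file; the stream format doesn't depend on the destination.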

"Why is this needed?" someone may ask. Serialization allows us to transfer the object over a network. Alternatively, we can serialize the object and save it to a file, so that upon the next program run, we can continue working with the same instance.

In Java, we can't just serialize any object—the object should implement one of the following interfaces:

Serializable
Externalizable.

Let's take a closer look at each one:

Serializable

Serializable is a marker interface that marks a class as serializable.

Marker interfaces don't contain any methods.

What we need to serialize via Serializable:

The object should implement the Serializable interface.
The class fields should be serializable (primitive types are serializable by default).
The first non-serializable superclass should contain a no-argument constructor.

I think the first and second points are clear. The third, however, may not be straightforward. It arises from the specifics of deserialization via Serializable—we'll talk about it a bit later. But first, let's start with examples.

Here's a class that we'll serialize:

public class Target implements Serializable {
    private int intField;
    private String strField;
    private FieldOfTargetClass field;
    private transient int transField;

    public Target(int i, String s, FieldOfTargetClass f, int trn) {
        this.intField = i;
        this.strField = s;
        this.field = f;
        this.transField = trn;
    }

    @Override
    public String toString() {
        return "{Target = " +
               "integerField: " + intField +
               ", stringField: " + strField +
               ", transientField: " + transField +
               ", customObjectField: " + field.toString() + "}";
    }
}

Here's a class whose object serves as the field of the serializable class:

public class FieldOfTargetClass implements Serializable {
    private int fieldInt;
    private String fieldString;

    public FieldOfTargetClass(int i, String s) {
        this.fieldInt = i;
        this.fieldString = s;
    }

    @Override
    public String toString() {
        // Format matches the console output shown in the article
        return "{FieldOfTargetClass = " +
               "fieldId: " + fieldInt +
               ", fieldString: '" + fieldString + "'}";
    }
}

Now that we have the classes, let's create their instances and serialize objects.

Here's how the object is serialized:

public static void main(String[] args) throws .... {
    Target target = new Target(112, "bzzz",
            new FieldOfTargetClass(13, "Friday"), 52);
    System.out.println(target);

    var fileOutput = new FileOutputStream("ser_obj");
    var objectOutput = new ObjectOutputStream(fileOutput);
    objectOutput.writeObject(target);
    objectOutput.flush();
    objectOutput.close(); // also closes the underlying fileOutput
}

The console output:

{Target = integerField: 112, stringField: bzzz, transientField: 52, customObjectField: {FieldOfTargetClass = fieldId: 13, fieldString: 'Friday'}}

We've created an object of the FileOutputStream class. It saves the object byte stream to the file. There's no need to dwell on this for long. As mentioned before, saving the object to the file is just one of the serialization options.

Next, we instantiate the ObjectOutputStream object, which performs the actual serialization.

Now let's look at object deserialization:

public static void main(String[] args) throws .... {
    var fileInput = new FileInputStream("ser_obj");
    var objectInput = new ObjectInputStream(fileInput);
    Target deserializeTarget = (Target) objectInput.readObject();
    objectInput.close(); // also closes the underlying fileInput
    
    System.out.println(deserializeTarget);
}

The console output:

{Target = integerField: 112, stringField: bzzz, transientField: 0, customObjectField: {FieldOfTargetClass = fieldId: 13, fieldString: 'Friday'}}

In this example, fileInput reads the file containing the byte stream; objectInput deserializes the object and returns it.

If we want to exclude a field from serialization, we can mark it with the transient keyword, as we do with transField in the example above (toString prints it as transientField). Before serialization, the field has a value of 52. After deserialization, it has the default value for its type, which is 0 for int.

For the sake of readability, I've omitted exceptions thrown by methods and constructors in the code. Here's a list of potential exceptions:

FileOutputStream, FileInputStream: FileNotFoundException
ObjectOutputStream, ObjectInputStream: IOException
writeObject: IOException
flush, close: IOException
readObject: IOException, ClassNotFoundException.
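For reference, here's a compact sketch of the same round trip with the exceptions spelled out. It's an illustration under a few assumptions rather than the article's exact code: the Ticket class is a hypothetical stand-in for Target, the file is a temporary one, and try-with-resources takes over the flush and close calls.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

class FileRoundTripDemo {
    // Hypothetical stand-in for the article's Target class.
    static class Ticket implements Serializable {
        private static final long serialVersionUID = 1L;
        int number;
        String day;
        transient int pin;   // excluded from default serialization

        Ticket(int number, String day, int pin) {
            this.number = number;
            this.day = day;
            this.pin = pin;
        }
    }

    static void save(Ticket ticket, Path file) throws IOException {
        // try-with-resources flushes and closes the streams, even on failure.
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file.toFile()))) {
            out.writeObject(ticket);
        }
    }

    static Ticket load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file.toFile()))) {
            return (Ticket) in.readObject();
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException {
        Path file = Files.createTempFile("ser_obj", ".bin");
        try {
            save(new Ticket(13, "Friday", 52), file);
            Ticket copy = load(file);
            // The transient pin comes back as 0: prints "13 Friday 0"
            System.out.println(copy.number + " " + copy.day + " " + copy.pin);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

FileNotFoundException is a subclass of IOException, so declaring IOException covers the stream constructors as well.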

Well, we've covered the basics of the Serializable interface and so...

Let's dig a little deeper

Let's take another look at how to serialize via Serializable. Specifically, why should the first non-serializable superclass have a no-arg constructor?

To begin, we need to understand how the deserializable object is created. According to the documentation (Section 3.1, 11th algorithm paragraph), the constructor of the first non-serializable superclass should be called to create the deserializable object.

What if we don't have a non-serializable superclass?

Every object has at least one non-serializable superclass: the Object class. However, if the nearest non-serializable superclass in the hierarchy between Object and the serializable class lacks an accessible no-arg constructor, deserialization throws an InvalidClassException with the "no valid constructor" message.
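A small sketch can reproduce this failure. The Base and Child classes below are hypothetical: Base is not serializable and has only a constructor with a parameter. Note that writing the object succeeds; the exception shows up only when we read it back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class NoValidConstructorDemo {
    // Non-serializable superclass WITHOUT a no-arg constructor.
    static class Base {
        final int id;
        Base(int id) { this.id = id; }
    }

    static class Child extends Base implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Child(int id, String name) {
            super(id);
            this.name = name;
        }
    }

    static Object roundTrip(Object obj)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);            // serialization works fine
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject();          // deserialization fails here
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            roundTrip(new Child(1, "bzzz"));
        } catch (InvalidClassException e) {
            System.out.println("Deserialization failed: " + e.getMessage());
        }
    }
}
```

Adding a no-arg constructor to Base, or making Base serializable as well, would let the round trip succeed.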

What happens if we create an object by calling the constructor of the first non-serializable superclass? Sounds quite interesting. I suggest taking a look at how this is implemented.

The first interesting aspect is how the class descriptor is created.

The descriptor is an instance of the ObjectStreamClass class. It's used to create the deserializable object and restore its state. It contains fields that describe the serializable class, such as the number of primitive and non-primitive fields, as well as the constructor object for the class being restored. By the way, the deserializable object will be created via this constructor.
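Part of the descriptor is visible through the public API. The sketch below uses ObjectStreamClass.lookup to obtain the descriptor for a hypothetical Sample class and prints what it knows about the serializable fields. The constructor object itself is stored in a private field of the descriptor and isn't exposed this way.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;

class DescriptorDemo {
    // Hypothetical class used only to inspect its descriptor.
    static class Sample implements Serializable {
        private static final long serialVersionUID = 42L;
        private int id;
        private String name;
        private transient int cache;   // not part of the serialized form
    }

    public static void main(String[] args) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(Sample.class);
        System.out.println("class: " + desc.getName());
        System.out.println("serialVersionUID: " + desc.getSerialVersionUID());
        // Static and transient fields are not listed.
        for (ObjectStreamField field : desc.getFields()) {
            System.out.println("field: " + field.getName()
                    + " of type " + field.getType().getName());
        }
    }
}
```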

Let's look at the descriptor constructor:

private ObjectStreamClass(final Class<?> cl) {
    ....
    if (externalizable) {
        cons = getExternalizableConstructor(cl);
    } else {
        cons = getSerializableConstructor(cl);
        ....
    }
    ....
}

It clearly shows that, depending on the serialization type, the constructor used to restore the object is created differently.

Let's see how the deserializable object is created. Since a Serializable object is restored, we'll look into the getSerializableConstructor method, which leads us to the newConstructorForSerialization method of the ReflectionFactory class.

Here's the code fragment in question:

public final Constructor<?> newConstructorForSerialization(Class<?> cl) {
    Class<?> initCl = cl;
    while (Serializable.class.isAssignableFrom(initCl)) {
        Class<?> prev = initCl;
        if ((initCl = initCl.getSuperclass()) == null ||
            (!disableSerialConstructorChecks && 
            !superHasAccessibleConstructor(prev))) {
            return null;
        }
    }
    Constructor<?> constructorToCall;
    try {
        constructorToCall = initCl.getDeclaredConstructor();
        int mods = constructorToCall.getModifiers();
        if ((mods & Modifier.PRIVATE) != 0 ||
                ((mods & (Modifier.PUBLIC | Modifier.PROTECTED)) == 0 &&
                        !packageEquals(cl, initCl))) {
            return null;
        }
    } catch (NoSuchMethodException ex) {
        return null;
    }
    return generateConstructor(cl, constructorToCall);
}

This method searches for the first non-serializable superclass and, if that class has an accessible no-arg constructor, builds the serialization constructor on top of it.

Let's see what the Javadoc says about the method:

Returns a constructor that allocates an instance of cl and that then initializes the instance by calling the no-arg constructor of its first non-serializable superclass. This is specified in the Serialization Specification, section 3.1, in step 11 of the deserialization process. If cl is not serializable, returns cl's no-arg constructor. If no accessible constructor is found, or if the class hierarchy is somehow malformed (e.g., a serializable class has no superclass), null is returned.

We can see that the method returns the result of calling generateConstructor, where we pass the class of the deserializable object and the no-arg constructor of the first non-serializable superclass. Take a look at the implementation:

private final Constructor<?> generateConstructor(
                Class<?> cl,
                Constructor<?> constructorToCall) {

    ConstructorAccessor acc = new MethodAccessorGenerator().
        generateSerializationConstructor(
              cl,
              constructorToCall.getParameterTypes(),
              constructorToCall.getExceptionTypes(),
              constructorToCall.getModifiers(),
              constructorToCall.getDeclaringClass());

    Constructor<?> c = newConstructor(
        constructorToCall.getDeclaringClass(),                            
        constructorToCall.getParameterTypes(),
        constructorToCall.getExceptionTypes(),
        constructorToCall.getModifiers(),
        langReflectAccess().
        getConstructorSlot(constructorToCall),
        langReflectAccess().
        getConstructorSignature(constructorToCall),
        langReflectAccess().
        getConstructorAnnotations(constructorToCall),
        langReflectAccess().
        getConstructorParameterAnnotations(constructorToCall));

    setConstructorAccessor(c, acc);
    c.setAccessible(true);
    return c;
}

The second statement is straightforward: it creates a Constructor object. But what does the first one do? It all boils down to the concept of ConstructorAccessor.

ConstructorAccessor is an interface that delegates the process of the object creation to its implementation. For clarity, how about we take a peek at the newInstance method of the Constructor class:

public T newInstance(Object... initargs) .... {
    ....
    ConstructorAccessor ca = constructorAccessor;   // read volatile
    if (ca == null) {
        ca = acquireConstructorAccessor();
    }
    @SuppressWarnings("unchecked")
    T inst = (T) ca.newInstance(initargs);
    return inst;
}

Here's a method where ConstructorAccessor is created:

public SerializationConstructorAccessorImpl
generateSerializationConstructor(Class<?> declaringClass,
                                 Class<?>[] parameterTypes,
                                 Class<?>[] checkedExceptions,
                                 int modifiers,
                                 Class<?> targetConstructorClass)
{
    return (SerializationConstructorAccessorImpl)
        generate(declaringClass,
                 "<init>",
                 parameterTypes,
                 Void.TYPE,
                 checkedExceptions,
                 modifiers,
                 true,
                 true,
                 targetConstructorClass);
}

What is SerializationConstructorAccessorImpl?

Let's move to its declaration:

abstract class SerializationConstructorAccessorImpl
    extends ConstructorAccessorImpl {
}

As we can see, the class is completely empty. The Javadoc brings some clarity:

Java serialization (in java.io) expects to be able to instantiate a class and invoke a no-arg constructor of that class's first non-Serializable superclass. This is not a valid operation according to the VM specification; one can not (for classes A and B, where B is a subclass of A) write "new B; invokespecial A()" without getting a verification error.

In all other respects, the bytecode-based reflection framework can be reused for this purpose. This marker class was originally known to the VM and verification disabled for it and all subclasses, but the bug fix for 4486457 necessitated disabling verification for all of the dynamically-generated bytecodes associated with reflection. This class has been left in place to make future debugging easier.

In other words, developers planned to use SerializationConstructorAccessorImpl as a "marker" for the JVM, indicating that if the object is created through the constructor of its superclass, no warnings should be issued. It's precisely because the ConstructorAccessor is the SerializationConstructorAccessorImpl instance that the object can be created in such an interesting way during deserialization.

To sum up: the deserializable object is created not via its constructor but via the constructor of its first non-serializable superclass. This makes sense because the deserializable object and its superclasses that support serialization are restored from the byte stream. It's meaningless to execute code from constructors and initializers at this point. We're left only with non-serializable superclasses, so to initialize them, we call the constructor of the first non-serializable superclass of the restored object.
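We can observe this behavior without touching JDK internals. In the sketch below, the hypothetical Animal and Dog classes count constructor calls: Animal is not serializable, while Dog is. After one round trip, the Animal constructor has run twice, but the Dog constructor only once. The field initializer of the transient legs field is skipped as well, because initializers are compiled into the constructors.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ConstructorCallsDemo {
    static int animalCtorCalls = 0;
    static int dogCtorCalls = 0;

    // First non-serializable superclass: its no-arg constructor
    // runs during deserialization.
    static class Animal {
        Animal() { animalCtorCalls++; }
    }

    static class Dog extends Animal implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        transient int legs = 4;   // initializer doesn't run on deserialization

        Dog(String name) {
            this.name = name;
            dogCtorCalls++;
        }
    }

    static Dog roundTrip(Dog dog) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(dog);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Dog) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Dog copy = roundTrip(new Dog("Rex"));
        System.out.println("Animal constructor calls: " + animalCtorCalls); // 2
        System.out.println("Dog constructor calls: " + dogCtorCalls);       // 1
        System.out.println("name=" + copy.name + ", legs=" + copy.legs);    // name=Rex, legs=0
    }
}
```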

Can we impact the serialization process?

Although we perceive serialization via Serializable as a purely automatic process, the reality is quite different.

According to the documentation, we can declare methods in the serializable class to control serialization and deserialization processes. Here's an example of two of them:

writeObject is a method that is called during serialization to write data to the byte stream;
readObject is a method that is called during deserialization to initialize the object fields with values from the byte stream.

How can we use them?

Take a look at how to implement readObject and writeObject in the class:

public class Target implements Serializable {

    private int integerField;
    private String stringField;
    private FieldOfTargetClass field;
    private transient int transientField;

    // the same constructor as in the Serializable example 

    private void writeObject(ObjectOutputStream out) throws .... {
        System.out.println("writeObject was executed");
        out.defaultWriteObject();
        out.writeInt(transientField);
    }

    private void readObject(ObjectInputStream input) throws ....{
        System.out.println("readObject was executed");
        input.defaultReadObject();
        this.transientField = input.readInt();
    }

    // the same toString
}

Here's the code with serialization and deserialization:

public static void main(String[] args) throws .... {
    Target target = new Target(112, "bzzz", 
        new FieldOfTargetClass(13, "Friday"), 52);
    System.out.println(target);

    var fileOutput = new FileOutputStream("ser_obj");
    var objectOutput = new ObjectOutputStream(fileOutput);
    objectOutput.writeObject(target);
    objectOutput.flush();
    objectOutput.close(); // also closes the underlying fileOutput

    var fileInput = new FileInputStream("ser_obj");
    var objectInput = new ObjectInputStream(fileInput);
    Target deserializeTarget = (Target) objectInput.readObject();
    objectInput.close(); // also closes the underlying fileInput
    
    System.out.println(deserializeTarget);
}

The console output:

{Target = integerField: 112, stringField: bzzz, transientField: 52, customObjectField: {FieldOfTargetClass = fieldId: 13, fieldString: 'Friday'}}
writeObject was executed
readObject was executed
{Target = integerField: 112, stringField: bzzz, transientField: 52, customObjectField: {FieldOfTargetClass = fieldId: 13, fieldString: 'Friday'}}

I'd like to highlight two key points here. First, we've serialized the transient field. The transient keyword only excludes a field from automatic serialization; it's still possible to write and read its value manually, which is exactly what we've done in the example above. Second, we need to read the values in the readObject method in the same order in which we've written them in the writeObject method.
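Writing a transient value to the stream isn't the only option. As a further illustration (not something taken from the example above), readObject can also recompute a transient field from the restored state and validate what it has read. The Rectangle class below is hypothetical and serves only to sketch the idea.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ReadObjectDemo {
    // Hypothetical class: area is derived, so it is not stored in the stream.
    static class Rectangle implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int width;
        private final int height;
        private transient int area;

        Rectangle(int width, int height) {
            this.width = width;
            this.height = height;
            this.area = width * height;
        }

        int area() { return area; }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();          // restores width and height
            if (width < 0 || height < 0) {   // reject a corrupted stream
                throw new InvalidObjectException("negative size");
            }
            area = width * height;           // rebuild the derived value
        }
    }

    static Rectangle roundTrip(Rectangle r)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(r);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Rectangle) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Rectangle(3, 4)).area()); // 12
    }
}
```

Which approach fits better depends on the field: a value that can be derived from other fields is cheaper to recompute, while independent state has to be written explicitly.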

Externalizable

Its main difference from Serializable is that it's not a marker interface. When we implement Externalizable, we should override the following methods:

writeExternal(ObjectOutput out) is a method that is used to write the values of serializable fields to the out object;
readExternal(ObjectInput input) is a method that is used to restore the object, assigning values from the input object to the previously serializable fields.

Requirements for serializing via Externalizable:

implement the Externalizable interface;
override the readExternal and writeExternal methods;
ensure that all fields involved in serialization support it;
declare a public no-arg constructor in the class.

The fourth point, as with the third requirement for Serializable, stems from the nuances of deserialization. We'll look at them a little later. First, let's look at examples.

I've changed the classes used in the Serializable example.

The code responsible for serialization/deserialization hasn't changed, so for clarity, I'll omit it and show only the console output.

Here's the deserializable class:

public class Target implements Externalizable {

    private int integerField;
    private String stringField;
    private FieldOfTargetClass field;
    private transient int transientField;

    public Target() { }

    public Target(int i, String s, FieldOfTargetClass f, int trn) {
        this.integerField = i;
        this.stringField = s;
        this.field = f;
        this.transientField = trn;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws .... {
        out.writeInt(integerField);
        out.writeUTF(stringField);
        out.writeObject(field);
        out.writeInt(transientField);
    }

    @Override
    public void readExternal(ObjectInput in) throws .... {
        this.integerField = in.readInt();
        this.stringField = in.readUTF();
        this.field = (FieldOfTargetClass) in.readObject();
        this.transientField = in.readInt();
    }

    // the toString method is the same as in the Serializable example
    ....
}

Here's the FieldOfTargetClass:

public class FieldOfTargetClass implements Externalizable {

    private int fieldId;
    private String fieldString;

    public FieldOfTargetClass() { }

    public FieldOfTargetClass(int i, String s) {
        this.fieldId = i;
        this.fieldString = s;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws .... {
        out.writeInt(fieldId);
        out.writeUTF(fieldString);
    }

    @Override
    public void readExternal(ObjectInput in) throws .... {
        this.fieldId = in.readInt();
        this.fieldString = in.readUTF();
    }

    // the toString method is the same as in the Serializable example
    .... 
}

The console output during serialization:

{Target = integerField: 112, stringField: bzzz, transientField: 52, customObjectField: {FieldOfTargetClass = fieldId: 13, fieldString: 'Friday'}}

The console output during deserialization:

{Target = integerField: 112, stringField: bzzz, transientField: 52, customObjectField: {FieldOfTargetClass = fieldId: 13, fieldString: 'Friday'}}

As with the readObject and writeObject methods we defined in Serializable, note that the fields are read in the order they're written to the stream. Since we manually serialize fields by writing their values to the stream, we can serialize the transient field as well.

Note that subclasses of a class that implements Externalizable are Externalizable too. That means they should also have a public no-arg constructor.
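Here's a sketch of what goes wrong when a subclass forgets about this. The Base and Sub classes are hypothetical. Base meets all the requirements, while Sub inherits Externalizable but declares only a constructor with a parameter. As with Serializable, writing succeeds and the InvalidClassException appears on reading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableSubclassDemo {
    public static class Base implements Externalizable {
        protected int id;

        public Base() { }                 // required public no-arg constructor
        public Base(int id) { this.id = id; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(id);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            id = in.readInt();
        }
    }

    // Externalizable through inheritance, but without a public no-arg constructor.
    public static class Sub extends Base {
        public Sub(int id) { super(id); }
    }

    static Object roundTrip(Object obj)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Base base = (Base) roundTrip(new Base(7));
        System.out.println("Base restored, id = " + base.id);   // id = 7
        try {
            roundTrip(new Sub(7));
        } catch (InvalidClassException e) {
            System.out.println("Sub failed: " + e.getMessage());
        }
    }
}
```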

Digging digging digging!

Here, we'll explore how an Externalizable object is deserialized.

Let's go back to how the object constructor is created in the descriptor and take a peek at getExternalizableConstructor:

private static Constructor<?> getExternalizableConstructor(Class<?> cl) {
    try {
        Constructor<?> cons = cl.getDeclaredConstructor((Class<?>[]) null);
        cons.setAccessible(true);
        return ((cons.getModifiers() & Modifier.PUBLIC) != 0) ?
            cons : null;
    } catch (NoSuchMethodException ex) {
        return null;
    }
}

We pass the class of the deserializable object to the method, which returns the no-argument constructor. This constructor will instantiate the object.

The object state is restored in the readExternalData method of the ObjectInputStream class:

private void readExternalData(Externalizable obj, 
                              ObjectStreamClass desc) throws .... {
    ....
    if (obj != null) {
        try {
            obj.readExternal(this);
        } catch (ClassNotFoundException ex) {
            ....
        }
    }
    ....
}
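The sequence described above can be traced from the outside. The sketch below records every constructor and method call of a hypothetical Note class in a list. Unlike with Serializable, the class's own public no-arg constructor runs during deserialization, and readExternal is called right after it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class ExternalizableOrderDemo {
    // Call log shared by all Note instances (illustration only).
    static final List<String> events = new ArrayList<>();

    public static class Note implements Externalizable {
        private String text;

        public Note() { events.add("no-arg constructor"); }

        public Note(String text) {
            this.text = text;
            events.add("arg constructor");
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            events.add("writeExternal");
            out.writeUTF(text);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            events.add("readExternal");
            text = in.readUTF();
        }
    }

    static Note roundTrip(Note note) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(note);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Note) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        roundTrip(new Note("bzzz"));
        // [arg constructor, writeExternal, no-arg constructor, readExternal]
        System.out.println(events);
    }
}
```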

Conclusion

In this article, we got a glimpse behind the scenes of Java serialization. I aimed to introduce the basics of serialization and highlight some interesting aspects, using both documentation and Java code examples. However, it's worth noting that serialization has many more intriguing facets to explore! Unfortunately, I can't cover them all in one article.

I think it's time to wrap up! If you have any thoughts to share, feel free to join the discussion in the comments!
