In the Bulgarian JUG we had an event dedicated to trying out the OpenJDK Valhalla project’s achievements in the area of using primitive parameters of generics. Our colleague and blogger Mihail Stoynov already wrote about our workshop.

In the first part of this three-part series of posts you could read about the reasoning behind not supporting generic classes with primitive parameters. Before I continue with the current proposal for the implementation, I would like to again make a very important disclaimer. I am not an expert in this matter. I just try to follow the project Valhalla mailing list. I also read Brian Goetz’s State of Specialization document. So look at this series more as explaining the generics proposals in layman’s terms.

Project Valhalla

Whenever the OpenJDK developers want to experiment with a concept they first create a dedicated OpenJDK project for that. This project usually has its own source repository, which is a fork of the OpenJDK sources. It has its own page and mailing list and its main purpose is to experiment with ideas for implementing the new concept before creating the Java Enhancement Proposals (JEPs), the Java Specification Requests (JSRs) and committing source code in the real repositories. Features like lambdas, script engine support, method handles and invokedynamic went down this route before entering the official Java release.

One such project is Project Valhalla. Its goal is to research things like value types, enhanced volatiles and primitives (and value types) as parameters of generic types. The last feature of this impressive list is the topic of this blog series. Being part of a research project means that it may or may not exist as such in one of the future releases of Java. As Java 9 will be out quite soon (hopefully in a year or so), it is almost certain that we will not have the opportunity to generify over primitives any time soon. Still, it is a good idea to closely follow the development of this and other features, which is why I wrote this blog post.

Coming back to the topic of primitives in generics: after considering many arguments, the project developers, led by the Java language architect Brian Goetz, decided to make three substantial compromises and came up with a proposal.

Compromise #1: the language syntax

The first syntactic construct that comes to mind when talking about, let’s say, a list of primitive integers is List<int>. However, in the previous installment we saw why this is not possible. Or, to put it another way, it wouldn’t be possible without making big changes in the platform and possibly breaking backward compatibility. The existing rules are strict: a generic type argument can always be converted to Object, it can be assigned null, and so on. As big incompatible changes are not permitted, we come to compromise number one: if a class wants enhanced generic support, i.e. support for primitives as generic type arguments, it has to state so explicitly in its definition rather than rely on the rules for generics being changed. This means that new syntax will be introduced at the language level to distinguish these enhanced generics from the existing ones. Here is the current proposal:

public class Box<any T> {

    private T value;

    public Box(T value) {
        this.value = value;
    }

    T value() {
        return value;
    }
}

You can probably notice the any modifier on the type variable. As per the current proposal it will be used to denote that the Box class can be parameterized with both reference as well as primitive types. Now, you can do things like this with the Box:

Box<int> intBox = new Box<>(42);
System.out.println(intBox.value());
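
For comparison, here is a minimal sketch of what the same snippet has to look like on a standard JDK, where the any modifier does not exist. The class names BoxedBox and BoxingDemo are made up for this illustration. The point is only that the int has to be boxed into an Integer object before it can be stored:

```java
// Hypothetical names; runs on a standard JDK without any Valhalla features.
class BoxingDemo {
    // Stores an int in an erased generic box and reports what actually got stored.
    static Class<?> storedClass(int i) {
        BoxedBox<Integer> box = new BoxedBox<>(i); // i is autoboxed to Integer here
        Object stored = box.value();
        return stored.getClass();
    }

    public static void main(String[] args) {
        System.out.println(storedClass(42)); // prints "class java.lang.Integer"
    }
}

// Ordinary erased generic class: T can only be a reference type.
class BoxedBox<T> {
    private final T value;

    BoxedBox(T value) {
        this.value = value;
    }

    T value() {
        return value;
    }
}
```

With Box<int> from the proposal this wrapper object is exactly what should go away.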

Compromise #2: the runtime representation

Another concern that has to be taken into account is about the runtime representation of an enhanced generic type. Before we go into further details here, let me explain the term specialization.

The process of creating different implementations of a type based on its generic arguments is called specialization. Let’s take C++. There you have templates, and C++ generates a different class for each distinct set of template arguments. This is called heterogeneous translation. Java does the opposite: it creates one and the same runtime class for every type argument. This is called homogeneous translation. (C# sits in between: the CLR shares code between reference type arguments but specializes for value types.) With heterogeneous translation each instantiation is compiled against its concrete type arguments, so the template body can use operations, such as a + b, whose meaning depends on those types. The price is that the instantiations share no common supertype, so there is no counterpart of <? extends Number>, which as we know is perfectly fine in Java.
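
Homogeneous translation is easy to observe on any current JDK. The following small sketch (the class and method names are my own) checks that two differently parameterized lists are backed by the very same runtime class:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; uses only standard java.util collections.
class ErasureDemo {
    // True when both lists are instances of the very same runtime class.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // Homogeneous translation: one class serves every parameterization.
        System.out.println(sameRuntimeClass(strings, numbers)); // prints "true"
        System.out.println(strings.getClass().getName());       // prints "java.util.ArrayList"
    }
}
```

In a heterogeneous translation the two lists would be instances of two unrelated classes.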

So, coming back to the proposed implementation. Homogeneous translation of generic types was possible in Java because of erasure. However, primitive types cannot be erased to Object. This brings compromise number two: there will be a hybrid homogeneous-heterogeneous translation. Reference type arguments will continue to be erased and translated as they are today, while primitive type arguments will be specialized: there will be a separate runtime class for every primitive instantiation of a generic type.

To illustrate this, let’s go back to the code from above:

Box<int> intBox = new Box<>(42);
System.out.println(intBox.value());

Let’s take a look at the bytecode that is emitted (as printed by javap -c), or at least at the relevant parts of it:

0: new           #3                  // class "Box${0=I}"
......
6: invokespecial #4                  // Method "Box${0=I}"."<init>":(I)V
....
14: invokevirtual #6                  // Method "Box${0=I}".value:()I

You can easily notice that whenever Box is parameterized with a primitive int, the class used at runtime is not called just Box (as it would be for the erased Box), but Box${0=I}, where I is the JVM type descriptor for int. So the class name is augmented with specialization info to help the virtual machine generate the right runtime class.
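
To get a feeling for what such a specialized class amounts to, here is a hand-written approximation that runs on a standard JDK. IntBox is my own stand-in for what the specializer would derive for Box${0=I}; it is not the real generated class, and all names in the sketch are hypothetical. Reflection shows the difference in the field type:

```java
import java.lang.reflect.Field;

// Hypothetical sketch: a hand-written "specialization" next to an erased generic class.
class SpecializationSketch {
    // Erased generic: after compilation the field has type Object.
    static class Box<T> {
        private final T value;
        Box(T value) { this.value = value; }
        T value() { return value; }
    }

    // Stand-in for the int specialization: every T is replaced by int.
    static class IntBox {
        private final int value;
        IntBox(int value) { this.value = value; }
        int value() { return value; }
    }

    // Reads the runtime type of the "value" field via reflection.
    static Class<?> fieldType(Class<?> c) {
        try {
            Field f = c.getDeclaredField("value");
            return f.getType();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldType(Box.class));    // prints "class java.lang.Object"
        System.out.println(fieldType(IntBox.class)); // prints "int"
    }
}
```

The difference from the proposal is that nobody has to write IntBox by hand: the runtime class is derived from the single Box<any T> source.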

Compromise #3: subtyping

In generics as they exist today the following subtyping rules hold: Box<Integer> is a subtype of Box<?>, which in turn is a subtype of the raw type Box. This does not apply to Box<int>, though. The reason again lies in the fact that reference and primitive types have no common supertype. If we were to allow Box<int> to be a subtype of raw Box, its value field would have to be of type Object. At the same time this field should be of type int, because that is how it was declared. As int and Object don’t share a common supertype, this is not possible.

So the only subtyping relationship for primitive generics would be of the kind ArrayList<int> is a subtype of List<int>. And, most unfortunately, List<int> cannot be a subtype of List<Integer>, because the transitive inheritance would lead to the raw type.
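
Java arrays already show the same split between primitive and reference types, so they make a handy analogy (this is my own illustration, not part of the proposal): an Integer[] is an Object[], but an int[] is not.

```java
// Hypothetical demo class; plain Java arrays, no Valhalla features needed.
class ArraySubtypingDemo {
    // True only for arrays whose elements are references.
    static boolean isObjectArray(Object candidate) {
        return candidate instanceof Object[];
    }

    public static void main(String[] args) {
        Object boxed = new Integer[] {1, 2, 3};
        Object primitive = new int[] {1, 2, 3};
        System.out.println(isObjectArray(boxed));     // prints "true"
        System.out.println(isObjectArray(primitive)); // prints "false"
    }
}
```

Box<int> and the raw Box would be in the same position as int[] and Object[] here.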

Restrictions and special features

Let’s take again our Box<any T> class. Because the T type can be both primitive as well as reference, there are some restrictions for the things that you can do with it:

  • You cannot assign or even compare the value field of the Box class with null
  • A value of type T cannot be converted to Object or Object[]
  • You cannot use such a value as the lock of a synchronized block
  • It is not possible to convert Box<any T> to Box<?> or Box
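
The first three restrictions become clearer when you look at what an ordinary erased T allows today. In the sketch below (names are my own) every marked operation silently relies on T being a reference type, which is why none of them can be allowed for an any T:

```java
// Hypothetical demo class; shows operations that are legal for an erased T today.
class ErasedOperationsDemo {
    // Each step below only makes sense when T is a reference type.
    static <T> String inspect(T value) {
        if (value == null) {         // null comparison: an int can never be null
            return "null";
        }
        Object asObject = value;     // conversion to Object: an int would need boxing
        synchronized (value) {       // locking: a primitive has no monitor
            return asObject.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(inspect("text")); // prints "String"
        System.out.println(inspect(null));   // prints "null"
    }
}
```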

At the same time, there are some features that are only available to the enhanced generics:

  • You can do things like new T[<size>] (this is not possible with an erased T). This will instantiate an Object[] when T is a reference type and the correct primitive array when T is a primitive type.
  • You can do comparisons with the instanceof operator
  • You can call Box<any T>.class
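
To see why new T[<size>] is a welcome addition: an erased T carries no runtime information, so today the usual workaround is to pass a Class token and create the array reflectively. The sketch below uses java.lang.reflect.Array.newInstance, an existing JDK API that has nothing to do with the Valhalla prototype; the class and method names are hypothetical.

```java
import java.lang.reflect.Array;

// Hypothetical demo class; the reflective workaround available on a standard JDK.
class ArrayCreationDemo {
    // Creates an array whose component type is only known at run time.
    static Object newArray(Class<?> componentType, int size) {
        return Array.newInstance(componentType, size);
    }

    public static void main(String[] args) {
        Object ints = newArray(int.class, 3);
        Object strings = newArray(String.class, 3);
        System.out.println(ints.getClass().getSimpleName());    // prints "int[]"
        System.out.println(strings.getClass().getSimpleName()); // prints "String[]"
    }
}
```

Note that the result has to be typed as Object, because there is no common array supertype for int[] and String[].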

Generic methods

So far we’ve only discussed the implementation proposal for enhanced generic types. But what about enhanced generic methods? They are supported, so you can do things like:

<any T> void printValue(Box<T> box)

While the language syntax mirrors that of generic types, the internal representation is different. With enhanced generic types it is possible to have specialization: a different runtime type for each combination of generic type arguments. But the same trick, a separate runtime method for each kind of call, is not possible with methods. It would change the interface of the class, which would gain more methods than it declares. This is not easy to achieve, as most VM implementations are built on the assumption that the number of methods of a given class is fixed.

That is why the enhanced generic methods take the same approach as lambda expressions: invokedynamic. There will be a special bootstrap class (GenericMethodSpecializer), which receives as arguments all the information needed to decide which specialized method to call.
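
I don't know the actual signature of GenericMethodSpecializer, so the following is only a rough sketch of the general bootstrap idea using the standard java.lang.invoke API. The class BootstrapSketch and its describeInt and describeRef methods are made up for this illustration. Java source code cannot emit invokedynamic directly, so the bootstrap method is called by hand here:

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch; NOT the real GenericMethodSpecializer.
class BootstrapSketch {
    // Two hand-written "specializations" of one logical method.
    static String describeInt(int value) { return "int:" + value; }
    static String describeRef(Object value) { return "ref:" + value; }

    // Bootstrap-style method: picks the target based on the call site's type.
    // The name parameter mirrors the usual bootstrap signature and is unused here.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        boolean primitive = type.parameterType(0) == int.class;
        MethodHandle target = primitive
                ? lookup.findStatic(BootstrapSketch.class, "describeInt",
                        MethodType.methodType(String.class, int.class))
                : lookup.findStatic(BootstrapSketch.class, "describeRef",
                        MethodType.methodType(String.class, Object.class));
        return new ConstantCallSite(target.asType(type));
    }

    // Simulates a call site whose argument type is int.
    static String callWithInt(int value) {
        try {
            MethodType type = MethodType.methodType(String.class, int.class);
            MethodHandle handle = bootstrap(MethodHandles.lookup(), "describe", type).dynamicInvoker();
            return (String) handle.invoke(value);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Simulates a call site whose argument type is a reference.
    static String callWithRef(Object value) {
        try {
            MethodType type = MethodType.methodType(String.class, Object.class);
            MethodHandle handle = bootstrap(MethodHandles.lookup(), "describe", type).dynamicInvoker();
            return (String) handle.invoke(value);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithInt(42));      // prints "int:42"
        System.out.println(callWithRef("hello")); // prints "ref:hello"
    }
}
```

The class itself still declares a fixed set of methods; the decision which one to run is made when the call site is linked.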

Conclusion

In this second installment of the Primitives in Generics series we went quickly through the proposal coming from Project Valhalla on how this feature will be implemented in Java. We saw what the proposed syntax will be and how it will be represented in the virtual machine. Then we discussed some of the restrictions introduced in subtyping and in the operations allowed on generic parameters. We also touched on the topic of generic methods and how they differ in terms of internal representation.

In the final part of the series we’ll walk the migration path of existing JDK APIs, namely the most important of them all: the collections library.

Project Valhalla and Compromises Made – Primitives in Generics Under the Microscope: Part 2

1 Comment

  • nikt

    Prediction: all these corner cases will just eventually make Java as complex as C++. Other JVM languages will increasingly eat Java’s lunch.
