Static Checker for Immutability in Kotlin

An immutable object is an object whose state cannot be modified after it is created. Immutable objects have several desirable properties, of which the two most relevant to this post are:

  • They are inherently thread-safe: being read-only, they can be accessed safely from separate threads without having to worry about unexpected state or overwriting changes.
  • They can act as value objects: two value objects created equal should remain equal, which is guaranteed when they are unable to change state.

Therefore, it is common to implement objects as immutable, and software frameworks may require or expect polymorphic objects passed to their APIs to be immutable. Unfortunately, in most mainstream OOP languages, correctly implementing an object as immutable relies on the developer’s expertise; it cannot be enforced by the programming language.

Data classes in Kotlin certainly simplify creating value objects: equals, hashCode, and copy functions are generated automatically. However, data classes are not immutable by default! Kotlin still allows declaring var members in data classes, as well as members of mutable types.

True guarantees for immutability would have to be built into the programming language/compiler, or verified using a static checker. While proposals to support immutability in Kotlin exist, a fully functional static checker for Kotlin is already available: detekt.

Detekt contains a rule out of the box that verifies whether all members of data classes are declared as val, but it does not verify whether those members are of immutable types. Furthermore, not all immutable objects should necessarily be implemented as data classes, and there might be cases in which mutability in data classes is desirable.

Because I needed to enforce that developers who extend base types in a framework I am working on implement them as immutable and/or as data classes, I implemented a detekt plugin that verifies whether concrete classes are implemented as specified by annotations applied to their base types (e.g., @Immutable and @ImplementAsDataClass).

For example, when running the static checker on the following implementation, it warns that NotImmutable is not implemented as immutable, since Mutable.foobar is declared as var:

class Mutable( var foobar: Int ) 

class NotImmutable( val mutable: Mutable ) : Base

@Immutable
interface Base
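To illustrate why the checker reports this, here is a rough Java translation of the three declarations above. It is an illustration only: the plugin analyzes Kotlin, and the Immutable annotation below is a local stand-in, not the plugin’s own. NotImmutable’s only field is final (the equivalent of val), yet its observable state still changes because the object it references is mutable:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class ShallowFinalDemo {
    // Hypothetical stand-in for the plugin's @Immutable annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Immutable {}

    @Immutable
    interface Base {}

    // Java equivalent of Kotlin's `class Mutable(var foobar: Int)`.
    static class Mutable {
        int foobar;
        Mutable(int foobar) { this.foobar = foobar; }
    }

    // Java equivalent of `class NotImmutable(val mutable: Mutable) : Base`.
    static class NotImmutable implements Base {
        final Mutable mutable; // like `val`: the reference cannot be reassigned...
        NotImmutable(Mutable mutable) { this.mutable = mutable; }
    }

    public static void main(String[] args) {
        NotImmutable value = new NotImmutable(new Mutable(1));
        int before = value.mutable.foobar;
        value.mutable.foobar = 2; // ...but the object it points to can still change
        System.out.println(before + " -> " + value.mutable.foobar); // prints "1 -> 2"
    }
}
```

A deep immutability check therefore has to follow member types transitively, not just look at whether each member is declared val.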

The plugin is definitely not complete yet, in that it does not verify all cases which may be mutable, but it already catches the majority of errors. As it is an open-source project on GitHub, I hope others might take an interest and contribute to it.

Beyond private accessibility

Surely you know about public, protected, internal and private. They are access modifiers often used in Object-Oriented Programming which indicate from where members can be accessed. If you’ve been following this blog you might have noticed I am a big fan of encapsulation. Previously I discussed how anonymous functions can be used to encapsulate reusable logic within the scope of one function. This inspired me to look for more ways in which lambdas can be used for improved encapsulation.

Private access is the least permissive access level. Private members are accessible only within the body of the class or the struct in which they are declared. – msdn (.NET)

Some languages go a step further and support static locals, which allow a value to be retained from one call of a function to another with a static lifetime. The variable can only be accessed from within the function scope where it is declared. I have yet to find a language which supports the same, but with instance lifetime. Such a variable would only be visible to a single method within a single instance.

Function private?

Let’s call such a hypothetical accessibility ‘function private’. When would you use it? Consider those annoying light bulbs that burn out just as they’re turned on.

class LightBulb
{
    public bool IsOn
    {
        get;
        private set;
    }

    private int _lifeTime = 100;
    public void SwitchOn()
    {
        if ( _lifeTime - 1 >= 0 )
        {
            --_lifeTime;
            IsOn = true;
        }
    }

    public void SwitchOff()
    {
        IsOn = false;
    }
}

The _lifeTime variable is only relevant to the SwitchOn method. Notice how C# already offers a concise way to write the IsOn property: an auto-implemented property. This actually already limits the scope of its backing field beyond private! The auto-generated backing field can only be accessed from the getter and setter.

What if the following were possible? It would limit lifeTime to just the scope where it’s needed.

public void SwitchOn()
{
    private int lifeTime = 100;  // function private variable
    // private static int staticVar;  // add 'static' to create a static local

    if ( lifeTime - 1 >= 0 )
    {
        --lifeTime;
        IsOn = true;
    }
}
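Until something like this exists, a closure can get close. The following Java sketch is a different technique from the Private&lt;T&gt; approach described next: the counter lives in a local variable captured by a lambda that is created once per instance, so no other member of the class can read or modify it. The names tryUseLife and makeLifeCounter are hypothetical.

```java
import java.util.function.BooleanSupplier;

class LightBulb {
    private boolean isOn;

    // Created once per instance; the remaining lifetime exists only inside this closure.
    private final BooleanSupplier tryUseLife = makeLifeCounter(100);

    private static BooleanSupplier makeLifeCounter(int lifeTime) {
        // One-element array, because lambdas may only capture effectively final locals.
        int[] remaining = { lifeTime };
        return () -> {
            if (remaining[0] - 1 >= 0) {
                --remaining[0];
                return true;
            }
            return false;
        };
    }

    public boolean isOn() { return isOn; }

    public void switchOn() {
        if (tryUseLife.getAsBoolean()) {
            isOn = true;
        }
    }

    public void switchOff() { isOn = false; }
}
```

The trade-off: the counter itself is unreachable from the rest of the class, but the tryUseLife field is still visible to every member, so the state is hidden while the operation is not.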

Lambda scope

A lambda is something peculiar. It can be used to create delegates, but how does that work? What happens behind the scenes? I asked the question on Stack Exchange and got some really insightful answers. Relevant to this discussion are the following two points:

  • The method that backs the delegate for a given lambda is always the same.
  • The method that backs the delegate for “the same” lambda that appears lexically twice is permitted to be the same, but in practice is not the same in .NET 4.0.

The fact that every lambda has a unique backing method makes it possible to use that method as an identifier of the scope where the lambda is defined.

Unsafe implementation

By (ab)using this behavior I was able to create the following working implementation. The Private class creates and keeps track of its instances by linking them to the delegates passed to its static factory methods. The first time the method is called, the instance is created; on subsequent calls, the instance is retrieved by a lookup in static dictionaries keyed on the passed arguments.

public void SwitchOn()
{
    Private<int> lifeTime = Private<int>.Instance( () => 100, this );  // function private
    // Private<int> staticVar = Private<int>.Static( () => 0 );  // static local

    if ( lifeTime.Value - 1 >= 0 )
    {
        --lifeTime.Value;
        IsOn = true;
    }
}

Why is it unsafe?

  • This behavior is not guaranteed to be preserved in later versions of .NET. As mentioned before: “The method that backs the delegate for ‘the same’ lambda that appears lexically twice is permitted to be the same, but in practice is not the same in .NET 4.0.”
  • Objects that use the code as it stands are never garbage collected, because the static dictionaries keep strong references to them. Using weak references could solve this issue; it seems a dictionary with weak references already exists.
  • The dictionary lookups add a small performance overhead. Ideally this feature would be supported at compile time.
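For comparison, here is a rough Java sketch of the same idea. It is my own translation under stated assumptions, not the original C# implementation. It keys instances on the lambda’s runtime class, relying on the HotSpot JVM generating one class per lambda expression; like the .NET behavior above, that is an implementation detail, not a language guarantee. It uses java.util.WeakHashMap for the per-owner lookup, which addresses the garbage-collection concern. Private, instance and value mirror the C# snippet; everything else is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

// A value tied to (lambda expression, owner instance), mimicking the C# Private<T> idea.
final class Private<T> {
    // Assumption: each lambda expression has its own generated class on HotSpot,
    // reused every time that expression is evaluated. Not guaranteed by the JLS.
    private static final Map<Class<?>, Map<Object, Private<?>>> INSTANCES = new HashMap<>();

    T value;

    private Private(T value) { this.value = value; }

    // Weak keys let owners be garbage collected once nothing else references them.
    @SuppressWarnings("unchecked")
    static synchronized <T> Private<T> instance(Supplier<T> initializer, Object owner) {
        Map<Object, Private<?>> perOwner =
                INSTANCES.computeIfAbsent(initializer.getClass(), k -> new WeakHashMap<>());
        return (Private<T>) perOwner.computeIfAbsent(owner, o -> new Private<>(initializer.get()));
    }
}

class LightBulb {
    private boolean isOn;

    boolean isOn() { return isOn; }

    void switchOn() {
        Private<Integer> lifeTime = Private.instance(() -> 100, this); // "function private"
        if (lifeTime.value - 1 >= 0) {
            lifeTime.value--;
            isOn = true;
        }
    }

    void switchOff() { isOn = false; }
}
```

Note that WeakHashMap compares keys with equals(), so owners that override equals() would end up sharing state; an identity-based weak map would avoid that.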

Improved encapsulation using lambdas

Let’s take a sidestep from the more advanced runtime code generation topic (don’t worry, more will follow later) and move to easier territory. On a blog about software design, what better topic is there than encapsulation?

Abstraction, information hiding, and encapsulation are very different, but highly-related, concepts. One could argue that abstraction is a technique that helps us identify which specific information should be visible, and which information should be hidden. Encapsulation is then the technique for packaging the information in such a way as to hide what should be hidden, and make visible what is intended to be visible. – Edward V. Berard

Modern object-oriented languages have started to borrow features from functional programming languages, like lambdas, which can be used to create anonymous functions with a very concise syntax. One often overlooked advantage of anonymous functions, besides their conciseness, is that they can encapsulate logic even further than a private method. A private method can be accessed from anywhere within the class, while an anonymous function can only be accessed from the scope it is created in, or the scopes it is passed to.

Consider the following highly inefficient “Hello World!” implementation:

public void HelloWorld()
{
    string[] words = new[] { "hello", "pretty", "world" };

    Console.WriteLine(
        ComplexHelloWorldParsing( words[ 0 ] ) +
        words[ 1 ] +
        ComplexHelloWorldParsing( words[ 2 ] ) );
}

private static string ComplexHelloWorldParsing( string input )
{
    string parsed = input;
    // ... plenty of complex specific parsing, only used in HelloWorld.
    return parsed;
}

ComplexHelloWorldParsing is only used (and, let’s assume, should only be used) inside the HelloWorld method, yet it is visible to the entire class. In such a scenario, where encapsulating the specific behavior separately doesn’t make sense, I would use the following approach, regardless of how many lines of code the delegate contains.

public void HelloWorld()
{
    string[] words = new[] { "hello", "pretty", "world" };

    Func<string, string> complexParsing = s =>
    {
        string parsed = s;
        // ... plenty of complex method specific parsing.
        return parsed;
    };

    Console.WriteLine( complexParsing( words[ 0 ] ) + words[ 1 ] + complexParsing( words[ 2 ] ) );
}

It surprises me that not everyone agrees on this advantage. For the same reason that some professional programmers split functions just to make them smaller, some state that lambdas should always be short and that, as a rule of thumb, longer logic should be placed in a private method. If you know any arguments in favor of short delegates, be sure to let me know. I asked a question relating to this topic on Programmers Stack Exchange.

Abstraction is everything

Recently I had a discussion with someone where I bluntly stated I didn’t like virtual functions. Now that I’m reading more about software design principles, I can express myself a bit better. After a few quick Google searches I couldn’t find the following argument anywhere, so I thought it might be worth posting about it.

First, to rephrase myself: “Prefer pure virtual functions over non-pure virtual functions where possible.” As a quick refresher, pure virtual functions must be implemented by a derived class, while non-pure ones don’t have to be. Classes containing pure virtual functions are called abstract classes.

In good tradition I’ll start by quoting a design principle to which my statement relates.

Open/Closed Principle: Software entities should be open for extension, but closed for modification. – Bertrand Meyer

More specifically, it relates to the “Polymorphic Open/Closed Principle”. This definition advocates inheritance from abstract base classes: the existing interface is closed to modifications, and new implementations must, at a minimum, implement that interface.

Although most people understand abstract classes and how they relate to the template method pattern, I often see implementations relying on overridable functions instead. I rarely use non-pure virtual methods; only after considering all other possibilities might I find an overridable virtual method the cleanest approach.
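As a quick reminder of the template method pattern, here is a minimal Java sketch (Report and SalesReport are hypothetical names). The base class fixes the algorithm in a final method and leaves only the varying step abstract, so there is no overridable default implementation for a subclass to forget about or override by accident:

```java
abstract class Report {
    // Template method: the overall algorithm is fixed; `final` prevents overriding it.
    public final String render() {
        return "[" + getClass().getSimpleName() + "] " + body();
    }

    // The one step that varies. It is abstract ("pure virtual"), so a concrete
    // subclass that does not implement it fails to compile.
    protected abstract String body();
}

class SalesReport extends Report {
    @Override
    protected String body() {
        return "sales: 42";
    }
}
```

Calling new SalesReport().render() yields "[SalesReport] sales: 42".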

Consider the following C# example taken from the msdn documentation on override:

public class Square
{
    public double x;

    public Square( double x )
    {
        this.x = x;
    }

    public virtual double Area()
    {
        return x * x;
    }
}

class Cube : Square
{
    public Cube( double x ) : base( x )
    {
    }

    public override double Area()
    {
        return 6 * base.Area();
    }
}

I understand that the sole intent of the example is to demonstrate the usage of the override keyword, so I’m not criticizing it. It just serves as a nice example where I find another approach to be a better implementation. I have also added functionality to highlight the Open/Closed Principle.

using System.Collections.Generic;

public abstract class AbstractShape
{
    public abstract double Area();

    public override string ToString()
    {
        return GetType().ToString() + ": area " + Area();
    }
}

public class Square : AbstractShape
{
    private double _sideLength;

    public Square( double sideLength )
    {
        _sideLength = sideLength;
    }

    public override double Area()
    {
        return _sideLength * _sideLength;
    }
}

public abstract class AbstractCompositeShape : AbstractShape
{
    private List<AbstractShape> _childShapes;

    protected AbstractCompositeShape( List<AbstractShape> shapes )
    {
        _childShapes = shapes;
    }

    public override double Area()
    {
        double totalArea = 0;
        foreach ( AbstractShape shape in _childShapes )
        {
            totalArea += shape.Area();
        }
        return totalArea;
    }
}

public class Cube : AbstractCompositeShape
{
    private Cube( List<AbstractShape> sides ) : base( sides )
    {
    }

    public static Cube FromSideLength( double sideLength )
    {
        List<AbstractShape> sides = new List<AbstractShape>();
        for ( int i = 0; i < 6; ++i )
        {
            sides.Add( new Square( sideLength ) );
        }
        return new Cube( sides );
    }
}

Of course, this example shouldn’t be considered a real-world example. An implementation depends mainly on its usage. This example could be useful where unique identification of every face is important, and where a lot of other shapes are expected. A few other things worth considering: generics, localization, performance, …

The second sample prevents misuse and promotes reuse, two advantages of following the Open/Closed Principle in this way: you can’t forget to implement a required function. It also demonstrates that there are more alternatives to virtual functions than just abstract functions: composition was used here to achieve the same goal as the first example.

As a set of guidelines I would specify:

  • When a function always needs a custom implementation in an extending class, never use non-pure virtual functions.
  • When it is possible to group a certain default implementation in a subclass, do so instead of defining it as a default non-pure virtual function. (e.g. Area() function of AbstractCompositeShape in the example above.)
  • Ask yourself why the behavior of a function needs to be changed. Can this new behavior be encapsulated? (e.g. Usage of composition in AbstractCompositeShape.)
  • Not overriding a certain virtual method shouldn’t have major functional consequences. (e.g. ToString method.)

The lead architect of C#, Anders Hejlsberg, explains why methods are non-virtual by default in C#, as opposed to Java. He also advises caution when making functions virtual.

UPDATE
A more Java-oriented argument can be found here.

OO VS Procedural

Consider this part 2 of a critical review of the book “Clean Code” by Robert C. Martin. Previously I mentioned that I don’t agree with some of the ideas in this book on how to write clean code. I argued that following some of his ideas results in the exact opposite: unmaintainable code. More specifically, his wording of the principle:

Functions should do one thing. They should do it well. They should do it only. – Uncle Bob (Robert C. Martin)

I argued that the main reason to introduce a function should be reuse, not readability. Consequently, I also disagree with his statement that comments should be replaced by functions wherever possible.

It’s important to note that I do agree with most of the other concepts in the book (as far as I have read it), and even gained some new insights.

I will not go into detail about all pros and cons of procedural programming versus OOP, but I do want to point out that his argument against OOP seems flawed.

Procedural code (code using data structures) makes it easy to add new functions without changing the existing data structures. OO code, on the other hand, makes it easy to add new classes without changing existing functions.
Procedural code makes it hard to add new data structures because all the functions must change. OO code makes it hard to add new functions because all the classes must change. – Uncle Bob (Robert C. Martin)

To help you visualize, here is the procedural example from the book. I’m quite sure you’ll be able to figure out the OO example yourself.

public class Geometry {
    public double area(Object shape) throws NoSuchShapeException {
        if (shape instanceof Square) {
            Square s = (Square)shape;
            return s.side * s.side;
        }
        else if (shape instanceof Rectangle) {
            Rectangle r = (Rectangle)shape;
            return r.height * r.width;
        }
        else if (shape instanceof Circle) {
            Circle c = (Circle)shape;
            return Math.PI * c.radius * c.radius;
        }
        throw new NoSuchShapeException();
    }
}
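The OO counterpart could look something like the following Java sketch. This is my own version, not the book’s: the Shape interface is an assumption, and the fields mirror the procedural example. Each class implements area() itself, so adding a new shape touches no existing code, while adding a new operation means adding a method to every class:

```java
interface Shape {
    // Every shape must provide its own area; a class that omits it fails to compile.
    double area();
}

class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    @Override public double area() { return side * side; }
}

class Rectangle implements Shape {
    final double height, width;
    Rectangle(double height, double width) { this.height = height; this.width = width; }
    @Override public double area() { return height * width; }
}

class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    @Override public double area() { return Math.PI * radius * radius; }
}
```

A caller simply invokes shape.area(); no instanceof chain or NoSuchShapeException is needed.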

John already mentions one argument in his post, relating to what “hard” and “easy” exactly mean in this statement. By replacing these words with what he is actually referring to, it becomes obvious the statement doesn’t say anything about whether one approach is better than the other.

Procedural code allows you to extend functionality by adding one function which supports behavior for all the required data structures. OO code, on the other hand, requires you to add the new function to all the different data structures.
Procedural code makes you adjust all functions to support a new data structure. OO code makes you implement a new function in all data structures.

I would even say it becomes clear why this is actually an argument in favor of OO:

  • Handling all data structures inside one function breaks the cohesion principle, which, ironically, as John already stated, is a principle on which Robert C. Martin himself based his Single Responsibility Principle.
  • When adding a new data structure in procedural code, you need to make sure yourself that every function supports it. OO forces you to provide all the required implementations.

Based on the previous quotation, Martin concludes:

The idea that everything is an object is a myth. Sometimes you really do want simple data structures with procedures operating on them. – Uncle Bob (Robert C. Martin)

Everything can be represented as an object (including concepts), so I don’t see how that is a myth. I do agree with the conclusion, but in my opinion it has nothing to do with the first sentence. Instead of the geometry example, a better example would be different functions operating on a set of Point objects to draw various shapes.