Interpolation no more

Mathematics isn’t my forte. That’s why each time I have to implement some form of interpolation, I feel a headache coming on. Even simple linear interpolation has been bugging me, as it’s the same thing over and over again. One of the beauties of software development is that you can write something once, and as long as it works, you don’t have to worry about it anymore. Why then have I already implemented interpolation for around five different data types? With my latest need for interpolation, I finally decided to put some effort into this bugger and solve it once and for all.

Armed with a recently discovered library which enables generic calculations in C#, I managed to come up with the following classes.

Abstract interpolation class diagram.

Implementations of AbstractInterpolation decide on the type of interpolation between certain key points. How key points should be interpreted is decided by the implementation of AbstractKeyPointCollection. The difference between the two implementations will be demonstrated later. In order to know what to interpolate (the TValue template parameter), a trivial implementation of AbstractTypeInterpolationProvider is required for the type which needs to be interpolated. All classes have a base class AbstractBasicArithmetic, which allows basic arithmetic operations on the specified template type TMath. This type determines the value type to use, and to return, for the calculations of the interpolation. These calculations are done through a ‘Calculator‘ object, as specified in the generic calculations article. Normally you have to pass this calculator object yourself, but I created a CalculatorFactory which creates the correct calculator for known value types like double.
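To give a feel for what such a calculator looks like, here is a rough Java sketch of the idea (hypothetical names; the actual C# library resolves these operations for a generic type parameter):

```java
// Hypothetical sketch of the 'Calculator' idea; not the library's actual API.
interface Calculator<T> {
    T add(T a, T b);
    T subtract(T a, T b);
    T multiply(T a, T b);
}

// The kind of instance a calculator factory would hand out for double.
class DoubleCalculator implements Calculator<Double> {
    public Double add(Double a, Double b) { return a + b; }
    public Double subtract(Double a, Double b) { return a - b; }
    public Double multiply(Double a, Double b) { return a * b; }
}
```

The interpolation code only ever talks to the interface, so it never needs to know which concrete value type it is computing with.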

Perhaps a code sample demonstrates the design better. Consider the desired result pictured in the following image. Of course, better solutions exist for simply drawing lines, but for demonstration purposes this works really well. The subsequent code sets up some points to interpolate between, and the interpolation which will be used.

CumulativeKeyPointCollection<Point, double> keyPoints =
    new CumulativeKeyPointCollection<Point, double>( new PointInterpolationProvider() );

keyPoints.Add( new Point( 50, 50 ) );
keyPoints.Add( new Point( 100, 100 ) );
...
keyPoints.Add( new Point( 300, 330 ) );

LinearInterpolation<Point, double> linear = new LinearInterpolation<Point, double>( keyPoints );

CumulativeKeyPointCollection is chosen instead of AbsoluteKeyPointCollection because every key point added to the collection ‘continues’ on the previous point. The line needs to go through the points in the order in which they were added. When using AbsoluteKeyPointCollection, every key point has an absolute position, and it doesn’t matter in which order the key points are added. E.g. interpolating between known locations at specified times, where time is used as the absolute position.
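Concretely, ‘continuing’ on the previous point means each key point’s position becomes the accumulated distance along the line so far. A sketch of that idea (Java here for illustration):

```java
class CumulativePositions {
    /** positions[i] = distance travelled along the polyline up to key point i. */
    static double[] compute(double[][] points) {
        double[] positions = new double[points.length];
        for (int i = 1; i < points.length; i++) {
            double dx = points[i][0] - points[i - 1][0];
            double dy = points[i][1] - points[i - 1][1];
            positions[i] = positions[i - 1] + Math.sqrt(dx * dx + dy * dy);
        }
        return positions;
    }
}
```

With absolute positioning, these positions would instead be given directly (e.g. time stamps), independent of insertion order.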

Next, some interpolated points are calculated as follows.

List<Point> interpolatedPoints = new List<Point>();
double curPercentage = 0;
while ( curPercentage < 1 )
{
    interpolatedPoints.Add( linear.Interpolate( curPercentage ) );
    curPercentage += 0.001;
}
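Per dimension, Interpolate boils down to locating the segment the given percentage falls into and linearly blending between its endpoints. A simplified sketch, assuming evenly spaced key points rather than the library’s cumulative positioning:

```java
class Linear {
    /** Interpolate over key values at overall fraction t in [0, 1],
        assuming evenly spaced key points (a simplification of the
        library's cumulative positioning). */
    static double interpolate(double[] keys, double t) {
        double scaled = t * (keys.length - 1);
        int segment = Math.min((int) scaled, keys.length - 2);
        double local = scaled - segment;  // fraction within the segment
        return keys[segment] + (keys[segment + 1] - keys[segment]) * local;
    }
}
```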

And that’s how easy interpolation can be! To achieve the result as pictured earlier, I just draw line segments between all the calculated points, and draw rectangles on top of the key points. All that remains for the code to work is the type provider for Point which was passed to the key point collection. As you can see, that’s just a simple implementation of AbstractTypeInterpolationProvider.

    /// <summary>
    ///   Allows AbstractInterpolation to interpolate over a list of points.
    /// </summary>
    public class PointInterpolationProvider : AbstractTypeInterpolationProvider<Point, double>
    {
        /// <summary>
        ///   Create a new provider to allow AbstractInterpolation to interpolate
        ///   between a list of points.
        /// </summary>
        public PointInterpolationProvider()
            : base( 2 )  // The number of dimensions this type has.
        {
        }

        protected override double GetDimensionValue( Point value, int dimension )
        {
            switch ( dimension )
            {
                case 0:
                    return value.X;
                case 1:
                    return value.Y;
                default:
                    throw new NotSupportedException(
                        "Unsupported dimension while doing interpolation on type " +
                        typeof(Point) + "." );
            }
        }

        public override double RelativePosition( Point from, Point to )
        {
            // Pythagoras to get distance.
            return Math.Sqrt( Math.Pow( to.X - from.X, 2 ) + Math.Pow( to.Y - from.Y, 2 ) );
        }

        public override Point CreateInstance( double[] interpolated )
        {
            return new Point( interpolated[ 0 ], interpolated[ 1 ] );
        }
    }

To show the flexibility of this approach, cardinal spline interpolation is possible by just changing the used LinearInterpolation class to CardinalSplineInterpolation.

CardinalSplineInterpolation<Point, double> spline =
    new CardinalSplineInterpolation<Point, double>( keyPoints );
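For the curious: one segment of a cardinal spline can be evaluated with the Hermite basis, where a tension parameter scales the tangents. This is my sketch of the math, not the library’s exact code; tension 0 yields a Catmull-Rom spline.

```java
class CardinalSpline {
    /** Evaluate one spline segment between p1 and p2 at s in [0, 1];
        p0 and p3 are the neighbouring key values. */
    static double evaluate(double p0, double p1, double p2, double p3,
                           double tension, double s) {
        double m1 = (1 - tension) * (p2 - p0) / 2;  // tangent at p1
        double m2 = (1 - tension) * (p3 - p1) / 2;  // tangent at p2
        double s2 = s * s, s3 = s2 * s;
        // Hermite basis functions blend the endpoints and tangents.
        return (2 * s3 - 3 * s2 + 1) * p1 + (s3 - 2 * s2 + s) * m1
             + (-2 * s3 + 3 * s2) * p2 + (s3 - s2) * m2;
    }
}
```

At s = 0 the result is exactly p1, and at s = 1 exactly p2, so the curve passes through every key point, unlike e.g. a Bézier approximation.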

Every time I need to implement something a bit more mathematically challenging and search for help, I seem to end up on web pages created during the Middle Ages of the internet. Aren’t there any reusable libraries out there which support e.g. interpolation out of the box on the actual types you want to perform it on, like Point in the sample above? In my experience, many libraries offer concrete implementations instead of generic solutions. For the interpolation to work I had to implement a generic binary search algorithm, because SortedList (used to store the key points) doesn’t support binary search out of the box. To indicate intervals of values I had to create a generic Interval class, which I could immediately use in other source files as well. All this, of course, brings some overhead with it. Instead of Donald Knuth‘s famous “premature optimization is the root of all evil” quote, I find Bill Harlan’s succinct expression even better.

It is easier to optimize correct code than to correct optimized code. – Bill Harlan
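On that note, the segment lookup which that generic binary search enables boils down to the following (a Java sketch for illustration; the real implementation works on any comparable type):

```java
class SegmentSearch {
    /** Binary search for the index i such that
        positions[i] <= x < positions[i + 1]; positions must be sorted ascending. */
    static int findSegment(double[] positions, double x) {
        int low = 0, high = positions.length - 2;
        while (low < high) {
            int mid = (low + high + 1) / 2;  // round up to guarantee progress
            if (positions[mid] <= x) low = mid;
            else high = mid - 1;
        }
        return low;
    }
}
```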

Source code can be found as part of my FCL Extension library in the Whathecode.System assembly and namespace Whathecode.System.Arithmetic.Interpolation.

Abstraction is everything

Recently I had a discussion with someone where I bluntly stated I didn’t like virtual functions. Now that I’m reading more about software design principles I can formulate myself a bit better. After a few quick Google searches, I couldn’t find the following argumentation anywhere, so I thought it might be relevant to post about it.

First, to rephrase myself: “Prefer pure virtual functions over non-pure virtual functions where possible.” As a quick refresher, pure virtual functions are required to be implemented by a derived class, while non-pure aren’t. Classes containing pure virtual functions are called abstract classes.

In good tradition I’ll start by quoting a design principle to which my statement relates.

Open/Closed Principle: Software entities should be open for extension, but closed for modification. – Bertrand Meyer

More specifically, the “Polymorphic Open/Closed Principle”. This definition advocates inheritance from abstract base classes. The existing interface is closed to modifications and new implementations must, at a minimum, implement that interface.

Although most people understand abstract classes and how they relate to the template method pattern, I often see implementations relying on overridable functions instead. I rarely use non-pure virtual methods. Only after considering all other possibilities might I find an overridable virtual method the cleanest approach.

Consider the following C# example taken from the MSDN documentation on override:

public class Square
{
    public double x;

    public Square( double x )
    {
        this.x = x;
    }

    public virtual double Area()
    {
        return x * x;
    }
}

class Cube : Square
{
    public Cube( double x ) : base( x )
    {
    }

    public override double Area()
    {
        return 6 * base.Area();
    }
}

I understand that the sole intent of the example is to demonstrate the usage of the override keyword, so I’m not criticizing it. It just serves as a nice example where I find another approach to be a better implementation. I also added additional functionality to highlight the Open/Closed Principle.


public abstract class AbstractShape
{
    public abstract double Area();

    public override string ToString()
    {
        return GetType().ToString() + ": area " + Area();
    }
}

public class Square : AbstractShape
{
    private double _sideLength;

    public Square( double sideLength )
    {
        _sideLength = sideLength;
    }

    public override double Area()
    {
        return _sideLength * _sideLength;
    }
}

public abstract class AbstractCompositeShape : AbstractShape
{
    private List<AbstractShape> _childShapes;

    protected AbstractCompositeShape( List<AbstractShape> shapes )
    {
        _childShapes = shapes;
    }

    public override double Area()
    {
        double totalArea = 0;
        foreach ( AbstractShape shape in _childShapes )
        {
            totalArea += shape.Area();
        }
        return totalArea;
    }
}

public class Cube : AbstractCompositeShape
{
    private Cube( List<AbstractShape> sides ) : base( sides )
    {
    }

    public static Cube FromSideLength( double sideLength )
    {
        List<AbstractShape> sides = new List<AbstractShape>();
        for ( int i = 0; i < 6; ++i )
        {
            sides.Add( new Square( sideLength ) );
        }
        return new Cube( sides );
    }
}

Of course this example shouldn’t be considered a real-world implementation. An implementation mainly depends on its usage. This example could be useful where unique identification of every face is important, and where a lot of other shapes are expected. Just to state a few things that would also be worth considering: generics, localization, performance, …

The second sample prevents misuse and promotes reuse, which are advantages of following the Open/Closed Principle like this. You can’t forget to implement a required function. It also demonstrates there are more alternatives to virtual functions than just abstract functions. Composition was used here to achieve the same goal as the first example.
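As a quick illustration of the “open for extension” part: a new shape slots into this design without modifying any existing class (a Java sketch mirroring the hierarchy above):

```java
abstract class Shape {
    abstract double area();
}

// Adding Circle requires no modification to Shape or its existing subclasses.
class Circle extends Shape {
    private final double radius;

    Circle(double radius) { this.radius = radius; }

    @Override double area() { return Math.PI * radius * radius; }
}
```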

As a set of guidelines I would specify:

  • When a function always needs a custom implementation in an extending class, never use non-pure virtual functions.
  • When it is possible to group a certain default implementation in a subclass, do so instead of defining it as a default non-pure virtual function. (e.g. Area() function of AbstractCompositeShape in the example above.)
  • Question yourself why the behavior of a function needs to be changed. Can this new behavior be encapsulated? (e.g. Usage of composition in AbstractCompositeShape.)
  • Not overriding a certain virtual method shouldn’t have major functional consequences. (e.g. ToString method.)

The lead architect of C#, Anders Hejlsberg, explains why non-virtual is default in C#, as opposed to Java. He also advises caution when making functions virtual.

UPDATE
A more Java-oriented argumentation can be found here.

OO VS Procedural

Consider this part 2 of a critical review of the book “Clean Code” by Robert C. Martin. Previously I mentioned that I don’t agree with some of the ideas found in this book on how to write clean code. I argued that following some of his ideas results in the exact opposite: unmaintainable code. More specifically, his wording of the principle:

Functions should do one thing. They should do it well. They should do it only. – Uncle Bob (Robert C. Martin)

I argued that the main reason for a function should be reuse, and not readability. As a result of this statement, I also disagree with his statement that comments should be replaced by functions wherever possible.

It’s important to note, that I do agree with most of the other concepts mentioned in the book (as far as I have read it), and even gained some new insights.

I will not go into detail about all pros and cons of procedural programming versus OOP, but I do want to point out that his argument against OOP seems flawed.

Procedural code (code using data structures) makes it easy to add new functions without changing the existing data structures. OO code, on the other hand, makes it easy to add new classes without changing existing functions.
Procedural code makes it hard to add new data structures because all the functions must change. OO code makes it hard to add new functions because all the classes must change. – Uncle Bob (Robert C. Martin)

To help you visualize, here is the procedural example from the book. I’m quite sure you’ll be able to figure out the OO example yourself.

public class Geometry {
    public double area(Object shape) throws NoSuchShapeException {
        if (shape instanceof Square) {
            Square s = (Square)shape;
            return s.side * s.side;
        }
        else if (shape instanceof Rectangle) {
            Rectangle r = (Rectangle)shape;
            return r.height * r.width;
        }
        else if (shape instanceof Circle) {
            Circle c = (Circle)shape;
            return PI * c.radius * c.radius;
        }
        throw new NoSuchShapeException();
    }
}
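For reference, the OO counterpart the book contrasts this with looks roughly like the following (my sketch, not the book’s exact listing):

```java
interface Shape {
    double area();
}

class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

class Rectangle implements Shape {
    final double height, width;
    Rectangle(double height, double width) { this.height = height; this.width = width; }
    public double area() { return height * width; }
}

class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}
```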

John already mentions one argument in his post, relating to what “hard” and “easy” exactly mean in this statement. By replacing these words with what he is actually referring to, it becomes obvious the statement doesn’t say anything about whether or not one is better than the other.

Procedural code allows you to extend functionality by adding one function which supports behavior for all the required data structures. OO code, on the other hand, requires you to add the new function to all the different data structures.
Procedural code makes you adjust all functions to support a new data structure. OO code makes you implement a new function in all data structures.

I would even say, it becomes clear why this is actually a pro argument for OO:

  • Handling all data structures inside one function breaks the cohesion principle, which, ironically, as John already stated, is a principle on which Robert C. Martin himself based his Single Responsibility Principle.
  • When adding a new data structure in procedural code, you need to make sure that every function supports it. OO forces you to do all the required implementations.

Based on his previous quotation Martin concludes:

The idea that everything is an object is a myth. Sometimes you really do want simple data structures with procedures operating on them. – Uncle Bob (Robert C. Martin)

Everything can be represented as an object (also concepts), so I don’t see how that is a myth. I do agree with the conclusion, but in my opinion it has got nothing to do with the first sentence. Instead of the geometry example, a better example would be different functions operating on a set of Point objects to draw various different shapes.

Function Hell

Recently I bought the book “Clean Code” by Uncle Bob which discusses, well …, clean code. I’m very picky when it comes to clean code, which is why I wanted to read opinions of others to perhaps broaden or refine some of the practices I use myself.

I just read through a chapter on functions, of which a presentation by the author himself can be found online. So far I generally agreed to most statements made in the book, but the extent to which the following rule is applied bothers me, … a lot, … and then some more.

Functions should do one thing. They should do it well. They should do it only. – Uncle Bob

To demonstrate, consider the following “good practice” code taken from his blog post about the subject.

class SymbolReplacer {
    protected String stringToReplace;
    protected List alreadyReplaced = new ArrayList();
    private Matcher symbolMatcher;
    private final Pattern symbolPattern = Pattern.compile("\\$([a-zA-Z]\\w*)");

    SymbolReplacer(String s) {
      this.stringToReplace = s;
      symbolMatcher = symbolPattern.matcher(s);
    }

    String replace() {
      replaceAllSymbols();
      return stringToReplace;
    }

    private void replaceAllSymbols() {
      for (String symbolName = nextSymbol(); symbolName != null; symbolName = nextSymbol())
        replaceAllInstances(symbolName);
    }

    private String nextSymbol() {
      return symbolMatcher.find() ? symbolMatcher.group(1) : null;
    }

    private void replaceAllInstances(String symbolName) {
      if (shouldReplaceSymbol(symbolName))
        replaceSymbol(symbolName);
    }

    private boolean shouldReplaceSymbol(String symbolName) {
      return getSymbol(symbolName) != null && !alreadyReplaced.contains(symbolName);
    }

    private void replaceSymbol(String symbolName) {
      alreadyReplaced.add(symbolName);
      stringToReplace = stringToReplace.replace(
        symbolExpression(symbolName),
        translate(symbolName));
    }

    private String symbolExpression(String symbolName) {
      return "$" + symbolName;
    }

    protected String translate(String symbolName) {
      return getSymbol(symbolName);
    }
  }

This code explains the title of this post. Due to the number of functions, the overview of this trivial procedural task is lost. The irony is that the decomposition of the code into so many functions is supposed to make it more readable.

A lot of the replies on his post share my sentiment that over extracting like this isn’t favorable. The replies of “Angry Man”, albeit harsh, describe the problems perfectly, and provide for a fun read in case you don’t get the point I’m trying to get across. The question that arises is: “When should a separate function be created?“.

I’ll try to define my approach by a few rules of thumb, where the emphasis is on reusability, encapsulation and global readability, instead of Uncle Bob’s specification which sadly seems solely based on creating small methods as a means to achieve local readability. What I mean exactly by this is explained later, along with the argumentation.

  • Prevent creating new methods that are guaranteed to only be called from one location.
  • Group strongly associated code together (cohesion/coupling principle), and put a comment above it mentioning its intent. Separate these blocks of code from other blocks by using e.g. newlines.
  • For every ‘block’ of code written, think whether it might be used somewhere else. When it’s reusable, encapsulate it inside a suitable function/class.
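Applied to the SymbolReplacer above, these guidelines lead to something like the following (my hypothetical rewrite, assuming symbols come from a simple lookup map):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SymbolReplacer {
    private final Map<String, String> symbols;

    SymbolReplacer(Map<String, String> symbols) {
        this.symbols = symbols;
    }

    String replace(String input) {
        String result = input;

        // Replace every known $symbol occurrence, but only once per symbol.
        Matcher matcher = Pattern.compile("\\$([a-zA-Z]\\w*)").matcher(input);
        Set<String> alreadyReplaced = new HashSet<>();
        while (matcher.find()) {
            String symbol = matcher.group(1);
            String value = symbols.get(symbol);
            if (value != null && alreadyReplaced.add(symbol)) {
                result = result.replace("$" + symbol, value);
            }
        }
        return result;
    }
}
```

One commented block, one class, and the whole procedure can be read top to bottom in a single glance.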

Every experienced programmer probably follows similar guidelines. Notice how instead of blindly following a rule, you actually think about how the code will be used later on. I would specify this as a design principle:

Always develop towards an API.

Argumentation

One of the main design principles I’d like to use as an argument is “encapsulation“. Creating a new function within a class makes it available to the entire class, and possibly extending classes. When there is no reuse of a certain piece of code, other scopes shouldn’t be bothered by it.

By following the rule of placing code in logical blocks and documenting its intent, the entire block can be interpreted just as easily as, or even more easily than, by extracting it into a function. Its intent can be described in plain English. This is what I defined as local readability. Additionally, global readability is maintained, as you immediately see where, and how often, a block of code is used. Thanks to encapsulation, you know only about the definitions relevant to the current scope.

It’s a well-known fact that, when a method becomes too long, this is a strong indication that it should be refactored, following the Separation of Concerns principle. By blindly following the “extract till you drop” rule of Uncle Bob, you attempt to solve a problem, without reflecting what you are actually trying to solve. Often it just moves the problem to another scope, as too many methods in a class also indicate the need for refactoring.

UPDATE

While continuing reading the book I came across a perfect example to demonstrate this sad fact. Consider the following extracted function which can be found in Chapter 5 on Formatting, page 83.

private String getPageNameOrDefault(Request request, String defaultPageName)
{
    String pageName = request.getResource();
    if (StringUtil.isBlank(pageName))
        pageName = defaultPageName;

    return pageName;
}

Instead of immediately extracting this code into a function, he might have realized this behavior is pretty common. You could consider writing a “StringUtil.ensureValue(string value, string default)” function. Actually, a quick Google search brings up “StringUtils.defaultIfEmpty()” in the apache.commons library. You don’t want to write a new function for every string needing a default value. Actually this behavior is even more common, as you might also need default values for other types than strings. For small conditional checks like this, some languages (including Java) empower you with conditional operators. All you really need is the following:

String pageName = request.getResource();
pageName = StringUtil.isBlank(pageName) ? defaultPageName : pageName;

The thing that amazes me is that this trigger-happy extraction rule violates so many good rules which are mentioned in the book as well. Besides the already mentioned encapsulation problem, it’s not hard to see how the example above violates the “Don’t Repeat Yourself” principle. If you would argue you might need to call this code more than once: don’t. It should be called exactly once, and stored in a variable. Which of the following two is more readable to use after the first call: “getPageNameOrDefault(request, defaultName)” or “pageName“?

As far as I’ve read through the book (chapter 5), a better name for it would be “Readable Code” instead of “Clean Code“. I do hope the remainder contains more valuable tips. I am afraid however of the second section, full with code samples, which probably apply this nonsense “extract till you drop” rule.

Language Oriented Programming

Yesterday I spent several hours thinking about a concept I have been toying with for several days. Every programming language I have ever used has its own limitations. The past few months I have been reaching those a lot, which is part of the reason why I started this blog. It made me wonder whether a programming language could be created which can adjust itself at any given time. One important result is that this would allow you to create new keywords for a set of operations that are used regularly, solving a problem for which only workaround solutions exist now. Another cool aspect I realized is that when you start this process from byte level, the programming language could be its own compiler at runtime.

As a thought experiment I tried to come up with a basic set of rules, allowing for this required flexibility. If I didn’t make any logic errors, the rules I came up with can be implemented with a surprisingly small implementation, which for now, I’ll attempt to do in C#. When this turns out fruitful, the next step would be to do the same in assembly. The C# implementation, and the assembly implementation, would be able to run the same code.

While thinking of a name for the programming paradigm I had in mind (read as: I needed to give the newly created solution in Visual Studio a name), I checked Wikipedia for existing ones, but didn’t find any similar ones at first. Only by a subsequent Google search with some keywords I found appropriate did I stumble upon Language Oriented Programming, and I’m really excited about it! The article reflects a lot (and more) of what I was thinking about and does a really good job of explaining it. As a nice ending brainfuck quotation:

Language Oriented Programming will not just be writing programs, but also creating the languages in which to write our programs. – Sergey Dmitriev, cofounder and CEO of JetBrains Inc.

Why code snippets promote bad design

It is a well established guideline that duplicate code should be prevented. It’s one of the main software design principles, also known as “Don’t Repeat Yourself” (DRY).

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system. – Andy Hunt and Dave Thomas in their book The Pragmatic Programmer

Code snippets are defined as “a small region of re-usable source-code“. A lot of IDEs have a feature to manage and use code snippets. It should be quite obvious why this contradicts DRY; it is actually even worse, as it makes it easier for you to repeat yourself. Yet so many people embrace them as a good ‘solution’. There already exists a solution for having to write duplicate code: it is called encapsulation.

So why do code snippets exist? Sometimes you reach the limitations of a programming language, and a piece of code can’t be encapsulated any further. You want to do a similar task over and over again, but with variable parameters. A good example of this is a C# property, which defines a value, accessible through a getter and optionally a setter.

public string Name
{
    get;
    set;
}

The previous sample is an auto-implemented property. Every keyword is relevant and required by C#. An equivalent would be a simple member field, “public string Name;“, but this wouldn’t give us the flexibility to easily make the setter private, or add additional code to the getter or the setter. The point I’m making is, within the confines of the language, it can’t be simplified without sacrificing flexibility. This should be the only reason to use code snippets!
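Java developers hit the same wall: getter/setter boilerplate that cannot be reduced further within the language itself, which is exactly the niche where snippets (or IDE generation) live:

```java
// The unavoidable Java boilerplate that a C# auto-property collapses into one line.
class Person {
    private String name;

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }
}
```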

Personally, I don’t use snippets at all, as I spend way more time designing my code, than writing it anyhow. Furthermore, I see the following disadvantages when using them:

  • You think less about the code you are writing. E.g. for the property, by default the setter is public, which is definitely not always desirable.
  • It promotes laziness, not caring whether a certain snippet can be encapsulated. E.g. code snippets for dependency properties as I discussed earlier.

Even when you find yourself in a scenario where you write a lot of code which can’t be improved any further, other options are available, e.g. code generation. I might make this the subject of one of my future posts.

Java Generic Observer

Design patterns, you gotta love them! What would we do without our beloved Observer Pattern? Since its usage is so common, most frameworks provide an implementation out of the box. Microsoft even added a language construct for it to C#, called ‘events‘. As a Java developer, it would seem you are less lucky. Through the years, my unconditional trust in the design of frameworks has been wavering. Java’s default implementation of the Observer pattern is a nice example of what I consider bad design. Perhaps that’s why it’s not even used in the framework itself (see ActionListener in AWT).

If you haven’t watched “Despicable Me” yet, stop reading and watch it first! 😉 It’s a great movie. Consider a person (Gru) needing the ability to watch over his minions.

A standard implementation using the framework’s classes would be:

Gru gru = new Gru();
Minion fred = new Minion();
fred.addObserver(gru);
fred.moo();

public class Minion extends Observable
{
    public void moo()
    {
        setChanged();
        notifyObservers("hehehe");
    }
}

public class Gru implements Observer
{
    public void punch(Minion minion) { ... }

    @Override
    public void update(Observable o, Object arg)
    {
        if (o instanceof Minion)
        {
            String s = (String)arg;
            if (s.equals("hehehe"))
            {
                punch((Minion)o);
            }
        }
    }
}

The thing that bothers me (and others) in this implementation is the need for type checking. After browsing around, I found an implementation by Allain Lalonde who addressed this issue by using a dynamic proxy. I found his solution to be really useful, implemented it, and added a small feature which allows extending from a generic AbstractObservable class. The result looks as follows:

Gru gru = new Gru();
Minion fred = new Minion();
fred.addObserver(gru);
fred.moo();

public interface IMinionListener
{
    public void laughing(Minion minion);
}

public class Minion extends AbstractObservable<IMinionListener>
{
    public void moo()
    {
        getEventDispatcher().laughing(this);
    }
}

public class Gru implements IMinionListener
{
    public void punch(Minion minion) { ... }

    public void laughing(Minion minion)
    {
        punch(minion);
    }
}

How it works.

Events are dispatched through an ObserverPool as described by Allain Lalonde. This is a proxy class which forwards calls (as defined by the interface) to a set of listeners, the observers. You can use this pool directly, or extend from AbstractObservable which already creates and wraps it for you.
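In case you’re curious what such a pool looks like, here is a minimal sketch of the idea using java.lang.reflect.Proxy (simplified; Allain Lalonde’s and my actual implementations differ in the details):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class ObserverPool<T> implements InvocationHandler {
    private final List<T> observers = new ArrayList<>();
    private final T dispatcher;

    @SuppressWarnings("unchecked")
    ObserverPool(Class<T> listenerInterface) {
        // The dispatcher implements the listener interface; every call on it
        // is routed through invoke() below to all registered observers.
        dispatcher = (T) Proxy.newProxyInstance(
            listenerInterface.getClassLoader(),
            new Class<?>[] { listenerInterface },
            this);
    }

    void addObserver(T observer) { observers.add(observer); }

    T getDispatcher() { return dispatcher; }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        for (T observer : observers) {
            method.invoke(observer, args);
        }
        return null;  // Listener methods are assumed to return void.
    }
}
```

Because the dispatcher is typed as the listener interface itself, calling getEventDispatcher().laughing(this) notifies every observer without any instanceof checks.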

Possible problems.

As with the original Observable, you might not be able to extend from AbstractObservable since multiple inheritance isn’t allowed. You could use composition instead, and use the IAbstractObservable interface.

AbstractObservable source code

Better solution? (most likely)

I wrote this code a while ago, before I started this blog. While writing, I followed a link on Wikipedia which I wish I found before. It seems there is a free package available called PerfectJPattern, containing componentized design patterns, including the observer pattern. From the looks of it, they also use a proxy approach. The project looks really well maintained, clean, and thoroughly implemented. Next time I develop in Java, I’ll check it out.