Downcasting, what’s up with that?

Just a short post today, or a question rather, on what must be the easiest subject I’ve written about yet. Even so, I’ve often caught myself making mistakes with it. Nowadays I’m extra cautious when writing about inheritance.

What do you think about first when you hear “up”?

[Images: a graph going up; Viagra pills]

What do you think about first when you hear “down”?

[Image: a graph going down]

Now, how do we programmers interpret upcasting and downcasting?

[Image: “updown” diagram]

You upcast to an object which can do less, and you downcast to an object which can do more.

What’s up with that?
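
To make the terminology concrete, here is a minimal C# sketch. Animal and Dog are hypothetical types invented purely for illustration: the upcast is implicit and leaves you with fewer members to call, while the downcast is explicit, gives you more members, and can fail at runtime.

```csharp
using System;

// Hypothetical base type: the "upper" end of the hierarchy, which can do less.
public class Animal
{
    public string Eat() { return "eating"; }
}

// Hypothetical derived type: the "lower" end of the hierarchy, which can do more.
public class Dog : Animal
{
    public string Bark() { return "woof"; }
}

public static class CastingDemo
{
    // Upcast: implicit and always safe, but only Animal members remain visible.
    public static Animal Upcast( Dog dog )
    {
        return dog;
    }

    // Downcast: explicit, and throws an InvalidCastException
    // when the object is not actually a Dog.
    public static Dog Downcast( Animal animal )
    {
        return (Dog)animal;
    }
}
```

Calling CastingDemo.Downcast( new Animal() ) compiles fine but throws at runtime, which is exactly why the direction with “more” is the risky one.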

UPDATE: I have since started a discussion on this on Programmers Stack Exchange.

Massive-scale Online Software Development

Cutting right to the chase: Would it be possible to create software, entirely developed and moderated by an open community?

Call it democratic software development, or open source on steroids if you will. While discussing this the default answer I usually get is “it can’t be done”, which is why I gladly filed this post under the newly created category “Crazy Ideas”. Nevertheless, I find it a valuable exercise to discuss any nutcase ideas, in order to evaluate how far-fetched they actually are.

A person with a new idea is a crank until the idea succeeds. – Mark Twain

So what would such a system look like? What would be some of the requirements?

Easily accessible.
And when I say easy, I mean it. You shouldn’t have to download the repository first. You shouldn’t have to set up a development environment. It should run as a web service on the cloud (more buzzwords coming up!). A user account and an internet connection are all you need to get going.

Motivate people to participate.
Ever heard of gamification? “Gamification is the use of game design techniques and mechanics to solve problems and engage audiences.” If you are a software developer, chances are you ended up on Stack Overflow at some point. It’s a Q&A site for programmers which is quickly becoming one of the main resources for help for professional programmers. Stack Overflow incorporates many aspects of gamification, and its mere existence shows the power of it. A significant number of developers are prepared to share and learn in this fun environment. Quality content is pushed to the top via a voting system, while erroneous posts are addressed by the community.

Divide work in small enough tasks.
The key to dividing work across many people is to divide it in such a way that any person only has to implement one small aspect of it at a time. Traditional software development, where somebody develops a feature from A to Z, won’t work. One programming paradigm which at first sight seems extremely suitable for this is functional programming. A person could implement functions, and define new functions on which they rely. Combining this with aspects of test-driven development, where the caller has to document and write tests for the desired function, would result in automated testing.

Without worrying about the specifics too much (it’s just a nutcase idea after all) consider what the possibilities would be. In his great TED talk, CAPTCHA inventor Luis von Ahn discusses how he repurposed CAPTCHA in order to digitize books. Around 2.5 million books a year can be digitized through this massive-scale collaborative effort. Their next project indicates this doesn’t have to be limited to really mundane tasks. They are now working on translating the web!

Moderation guided by conventions.
Conventions are important in a group effort. Unfortunately, when discussing programming conventions people most often discuss naming and formatting conventions, while there are plenty of other important conventions to agree on. This will most likely be the topic of one of my future posts. Conventions should be as unambiguous as possible in order to know where to expect a certain piece of code, or where to place it. Conventions like these could be agreed upon through a democratic process, which seems to be working pretty well for Stack Overflow through its meta site. This allows for community moderation following the guidelines established by the community.

Couple all the separate work together into one entity.
Going from a set of loosely coupled functions to a working library would bring plenty of extra challenges, but also opportunities. Since nobody wants an all-encompassing library just to use part of its functionality, the system should allow you to extract just the functionality you are interested in.

Beyond the idea

Well, … I went a bit further and attempted to start a small proof of concept. I figured the Stack Exchange platform on which Stack Overflow runs already encompasses much of the desired functionality, and that creating a small-scale library on it would be possible. The idea was to create an extension method library for C#, which consists primarily of a set of functions. Requesting new Stack Exchange sites is possible through Area 51. Not unexpectedly, my idea got shot down since it doesn’t fit the intended Q&A format. Oh well, … one can only try.

UPDATE:

At the UIST 2014 conference, Thomas D. LaToza presented “Microtask Programming: Building Software with a Crowd”, a system encompassing many of these ideas and actually evaluating them, resulting in usable code. The paper is available on ACM.

The code formatting fallacy

Time to stir my category cloud on the right a bit, and publish a post under a category I’ve so far only used once. Hopefully it will become clear what Language Oriented Programming (LOP) has got to do with code formatting after this post.

There are several holy wars on code formatting:

… and many more.

All these wars are about how text should be formatted. The very notion of code being text is so ingrained in programmers, that they often can’t think outside the box. What if the actual problem is code being text?

The problem is that text editors are stupid and don’t know how to work with the underlying graph structure of programs. But with the right tools, the editor could work directly with the graph structure, and give us freedom to use any visual representation we like in the editor. – Sergey Dmitriev, cofounder and CEO of JetBrains Inc.

This is just one of many interesting statements in the must-read article by Sergey introducing LOP. Once you consider the possibility of separating code and its representation, most of the code formatting discussions become obsolete.

  • Identifiers can contain spaces, or can be represented differently based on preferences. Heck, they could be icons in a diagram.
  • Visually separating a piece of code for readability would be possible without having to split into smaller functions.
  • Multiple exit points of a function can be visualized in a diagram.

… and you can think of many more.

Modern IDEs like Visual Studio are slowly adopting alternate visualizations for code made possible by using extensions. As useful as some of these extensions are, they inherit the limitations of having to work on top of text.

If the progress on JetBrains’ Meta Programming System (MPS) is any indication of how feasible LOP will be in the future, my guess is we will be hearing a lot more about this paradigm in the years to come. A new ActionScript editor, Realaxy, seems to be more than capable of competing with existing editors, and is built entirely on top of MPS.

UPDATE: Markus Voelter announced mbeddr on his blog, the C language made extensible thanks to MPS.

UPDATE: It seems the Realaxy website no longer exists (I don’t know why), thus I linked to the Wikipedia article instead.

Like me, you might not have time to try out MPS properly, but I can highly recommend subscribing to their blog. New screencasts appear on a regular basis, showcasing features ranging from implementing new language keywords to creating a visual editor for a state machine.

Attribute metabehavior

Attributes in .NET can be used to add metadata to language elements. In combination with reflection they can be put to use in plenty of powerful scenarios. A separate parser can process the added data and act upon the annotations in any way desired. A common example of this is a serializer which knows how to serialize an instance of a class based on attributes applied to its members. For example, a NonSerialized attribute indicates that a member should not be serialized.
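
As a rough illustration of this “metadata plus parser” pattern, here is a small sketch. SkipAttribute, Account and SimpleSerializer are hypothetical names made up for this example (they are not part of .NET or of my library), and the “serialization” is just a name=value string rather than a real format.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical marker attribute: pure metadata, it contains no behavior.
[AttributeUsage( AttributeTargets.Property )]
public class SkipAttribute : Attribute { }

// Hypothetical class to serialize.
public class Account
{
    public string Name { get; set; }

    [Skip]
    public string Password { get; set; }
}

// The "parser": it reads the metadata through reflection and acts on it.
public static class SimpleSerializer
{
    public static string Serialize( object instance )
    {
        var pairs = instance.GetType()
            .GetProperties( BindingFlags.Public | BindingFlags.Instance )
            // Leave out every property annotated with [Skip].
            .Where( p => !Attribute.IsDefined( p, typeof( SkipAttribute ) ) )
            // Sort by name so the output does not depend on reflection order.
            .OrderBy( p => p.Name, StringComparer.Ordinal )
            .Select( p => p.Name + "=" + p.GetValue( instance, null ) );

        return string.Join( ";", pairs );
    }
}
```

Note that all knowledge about what [Skip] means lives in the serializer; the attribute itself only carries the annotation.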

In my previous post I promised I would show some interesting scenarios where the mandatory enum values in my solution, needed to identify different dependency properties (DPs), can be put to good use. It is not required to grasp the entire subject previously discussed, but it does help to know about DPs and their capabilities. Simply put, they are special properties (used by XAML) which can notify whenever they are changed, allowing other data to be bound to them. They can also be validated (checking whether a value assigned to them is valid or not) and coerced (adjusting the value so it satisfies a certain condition). I consider these capabilities to be metadata, describing particular behavior associated with a given property. After extensive tinkering I found a way to express this behavior in an attribute, but it wasn’t easy. Ordinarily WPF requires you to pass callback methods which implement the desired behavior when ‘registering’ the DPs.

Two solutions can be considered:

  1. The parser knows how to interpret the metadata and acts accordingly.
  2. The attribute implements the actual behavior itself, and is simply called by the parser.

Solution 1 is the easiest to implement, and is how attributes are used most often. Solution 2 however has a great added benefit. It allows you to apply the strategy pattern. Unlike solution 1, the parser doesn’t need to know about the specific implementation, but only the interface. Additional behaviors can be implemented and used without having to modify the parser. In contrast to simple metadata, an actual behavior is attached, hence I am dubbing this metabehavior. I will discuss the hoops you have to jump through to achieve this in a future post. For now, consider the following examples to see how it could be used.

A regular expression validation of a dependency property can be used as follows:

[DependencyProperty( Property.RegexValidation, DefaultValue = "test" )]
[RegexValidation( "test" )]
public string RegexValidation { get; set; }

Coercing a value within a certain range, defined by two other properties can be used as follows:

[DependencyProperty( Property.StartRange, DefaultValue = 0 )]
public int StartRange { get; set; }

[DependencyProperty( Property.EndRange, DefaultValue = 100 )]
public int EndRange { get; set; }

[DependencyProperty( Property.CurrentValue )]
[RangeCoercion( typeof( int ), Property.StartRange, Property.EndRange )]
public int CurrentValue { get; set; }

The greatest thing about all this is that new behaviors can be implemented by extending from ValidationHandlerAttribute and CoercionHandlerAttribute respectively, albeit with some added complexities due to the limitations of attributes which I will discuss later.
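
Stripped of everything related to WPF, the idea of an attribute that carries its own behavior (solution 2) can be sketched as follows. Every name in this sketch (ValidationBehaviorAttribute, MatchesAttribute, Person, BehaviorParser) is hypothetical and chosen for illustration; it is not the actual code of my library, which has to deal with considerably more detail.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical base class: the only contract the parser needs to know about.
[AttributeUsage( AttributeTargets.Property )]
public abstract class ValidationBehaviorAttribute : Attribute
{
    public abstract bool IsValid( object value );
}

// One concrete strategy. Others can be added without touching the parser.
public class MatchesAttribute : ValidationBehaviorAttribute
{
    readonly string _pattern;

    public MatchesAttribute( string pattern )
    {
        _pattern = pattern;
    }

    public override bool IsValid( object value )
    {
        return value is string && Regex.IsMatch( (string)value, _pattern );
    }
}

public class Person
{
    [Matches( "^[A-Z][a-z]+$" )]
    public string Name { get; set; }
}

public static class BehaviorParser
{
    // Calls whatever behavior is attached to each property,
    // without knowing which concrete attribute implements it.
    public static bool Validate( object instance )
    {
        return instance.GetType()
            .GetProperties( BindingFlags.Public | BindingFlags.Instance )
            .All( p => p
                .GetCustomAttributes( typeof( ValidationBehaviorAttribute ), true )
                .Cast<ValidationBehaviorAttribute>()
                .All( a => a.IsValid( p.GetValue( instance, null ) ) ) );
    }
}
```

The parser only ever talks to the abstract base class, which is what makes this an application of the strategy pattern.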

Casting to less generic types

… because yes, there are valid use cases for it! Finally I found the time to write some unit tests, followed by fixing the remaining bugs, and am now excited to report on the result which effectively allows you to break type safety for generic interfaces, if you so please. Previously I discussed the variance limitations for generics, and how to create delegates which overcome those limitations. Along the same lines I will now demonstrate how to overcome those limitations for entire generic interfaces.

Consider the following interface which is used to check whether a certain value is valid or not.

public interface IValidation<in T>
{
    bool IsValid( T value );
}

Notice how T is made contravariant by using the in keyword? Not only values of type T can be validated, but also any extended type. This recent feature of C# won’t help a bit in the following scenario however. During reflection you can only use this interface when you know the complete type, including the generic type parameters. In order to support any type, you would have to check for any possible type and cast to the correct corresponding interface.

object validator;  // An object known to implement IValidation<T>
object toValidate; // The object which can be validated by using the validator.

if ( validator is IValidation<string> )
{
    IValidation<string> validation = (IValidation<string>)validator;
    validation.IsValid( (string)toValidate );
}
else if ( validator is IValidation<int> )
{
    IValidation<int> validation = (IValidation<int>)validator;
    validation.IsValid( (int)toValidate );
}
else if ...

Hardly entertaining, nor maintainable. What we actually need is covariance in T, or at least something that looks like it. We want to treat a more concrete IValidation<string> as IValidation<object>. For perfectly good reasons covariance is only possible when the type parameter is only used as output, otherwise objects of the wrong type could be passed. In the given scenario however, where we know only the correct type will ever be passed, this shouldn’t be a problem.
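
For reference, here is a small sketch of which conversion the in keyword does allow and which one it rules out. NotNullValidation and ContravarianceDemo are hypothetical names used only for this illustration; the interface is repeated to keep the snippet self-contained.

```csharp
using System;

public interface IValidation<in T>
{
    bool IsValid( T value );
}

// Hypothetical validator which can handle any object.
public class NotNullValidation : IValidation<object>
{
    public bool IsValid( object value )
    {
        return value != null;
    }
}

public static class ContravarianceDemo
{
    // Allowed thanks to 'in': something able to validate any object
    // is certainly able to validate strings.
    public static IValidation<string> AsStringValidation( IValidation<object> general )
    {
        return general;
    }

    // The opposite direction is what we are after, but it does not compile:
    // an IValidation<string> might be handed something which isn't a string.
    //
    // public static IValidation<object> AsObjectValidation( IValidation<string> specific )
    // {
    //     return specific;
    // }
}
```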

The solution is using a proxy class which implements our less generic interface and delegates all calls to the actual instance, doing the required casts where necessary.

public class LessGenericProxy : IValidation<object>
{
    readonly IValidation<string> _toWrap;

    public LessGenericProxy( IValidation<string> toWrap )
    {
        _toWrap = toWrap;
    }

    public bool IsValid( object value )
    {
        return _toWrap.IsValid( (string)value );
    }
}

With the power of Reflection.Emit, such classes can be generated at runtime! RunSharp is a great library which makes writing Emit code feel like writing ordinary C#. It’s relatively easy (compared to using Emit) to generate the proxy class. The end result looks as follows:

object validator;  // An object known to implement IValidation<T>
object toValidate; // The object which can be validated by using the validator.

IValidation<object> validation
    = Proxy.CreateGenericInterfaceWrapper<IValidation<object>>( validator );

validation.IsValid( toValidate ); // This works! No need to know about the type.

// Assuming the validator validates strings, this will throw an InvalidCastException.
//validation.IsValid( 10 );

Of course the proxy should be cached when used multiple times. Originally I also attempted to proxy classes instead of just interfaces by extending from them. This only works properly for virtual methods. Since non-virtual methods can’t be overridden there is no way to redirect the calls to the required inner instance.
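
If generating code at runtime is overkill for your scenario, the same idea can be approximated with plain reflection, at the cost of slower calls. The sketch below is not how my library works; ReflectionValidationWrapper and NonEmptyValidation are hypothetical names, and the wrapper is written by hand for this one interface rather than generated for arbitrary interfaces.

```csharp
using System;
using System.Linq;
using System.Reflection;

public interface IValidation<in T>
{
    bool IsValid( T value );
}

// Hypothetical validator, used to try out the wrapper.
public class NonEmptyValidation : IValidation<string>
{
    public bool IsValid( string value )
    {
        return !string.IsNullOrEmpty( value );
    }
}

// Hand-written stand-in for a generated proxy:
// it dispatches through reflection instead of emitted IL.
public class ReflectionValidationWrapper : IValidation<object>
{
    readonly object _inner;
    readonly MethodInfo _isValid;

    public ReflectionValidationWrapper( object validator )
    {
        // Find the closed IValidation<T> the wrapped object implements.
        // First() throws an InvalidOperationException when there is none.
        Type validation = validator.GetType().GetInterfaces().First(
            i => i.IsGenericType && i.GetGenericTypeDefinition() == typeof( IValidation<> ) );

        _inner = validator;
        _isValid = validation.GetMethod( "IsValid" );
    }

    public bool IsValid( object value )
    {
        // Unlike a cast inside a generated proxy, Invoke reports a value
        // of the wrong type with an ArgumentException.
        return (bool)_isValid.Invoke( _inner, new[] { value } );
    }
}
```

The MethodInfo lookup happens once in the constructor, which is the reflection equivalent of caching the proxy.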

Source code can be found in my library: Whathecode.System.Reflection.Emit.Proxy.

Beyond private accessibility

Surely you know about public, protected, internal and private. They are access modifiers often used in Object-Oriented Programming which indicate from where members can be accessed. If you’ve been following this blog you might have noticed I am a big fan of encapsulation. Previously I discussed how anonymous functions can be used to encapsulate reusable logic within the scope of one function. This inspired me to look for more ways in which lambdas can be used for improved encapsulation.

Private access is the least permissive access level. Private members are accessible only within the body of the class or the struct in which they are declared. – msdn (.NET)

Some languages go a step further and support static locals, which allow a value to be retained from one call of a function to another with a static lifetime. The variable can only be accessed from within the function scope where it is declared. I have yet to find a language which supports the same, but with instance lifetime. Such a variable would only be visible to a single method within a single instance.

Function private?

Let’s call such a hypothetical accessibility ‘function private’. When would you use it? Consider those annoying light bulbs that burn out just as they’re turned on.


class LightBulb
{
    public bool IsOn
    {
        get;
        private set;
    }

    private int _lifeTime = 100;

    public void SwitchOn()
    {
        if ( _lifeTime - 1 >= 0 )
        {
            --_lifeTime;
            IsOn = true;
        }
    }

    public void SwitchOff()
    {
        IsOn = false;
    }
}

The _lifeTime variable is only relevant to the SwitchOn method. Notice how C# already offers a way to write the IsOn property in a concise way by using an auto-implemented property. This actually already limits the scope of its backing field beyond private! The auto-generated backing field can only be accessed from the getter and setter.

What if the following would be possible? It would encapsulate lifeTime just to the scope where it’s needed.

public void SwitchOn()
{
    private int lifeTime = 100;  // function private variable
    // private static int staticVar;  // add 'static' to create a static local

    if ( lifeTime - 1 >= 0 )
    {
        --lifeTime;
        IsOn = true;
    }
}

Lambda scope

A lambda is something peculiar. It can be used to create delegates, but how do they work? What happens behind the scenes? I asked the question on Stack Exchange, and got some really insightful answers. Relevant to this discussion are the following two points:

  • The method that backs the delegate for a given lambda is always the same.
  • The method that backs the delegate for “the same” lambda that appears lexically twice is permitted to be the same, but in practice is not the same in .NET 4.0.

The fact that every lambda has a unique backing method gives the possibility to use it as an identifier of the scope where it is defined.

Unsafe implementation

By (ab)using this behavior I was able to create the following working implementation. The Private class creates and keeps track of its instances by linking them to the delegates passed to the static constructor methods. The first time the method is called, the instance is created. On subsequent calls, the instance is retrieved by doing a lookup in static dictionaries based on the passed arguments.

public void SwitchOn()
{
    Private<int> lifeTime = Private<int>.Instance( () => 100, this );  // function private
    // Private<int> staticVar = Private<int>.Static( () => 0 );  // static local

    if ( lifeTime.Value - 1 >= 0 )
    {
        --lifeTime.Value;
        IsOn = true;
    }
}

Why is it unsafe?

  • This behavior is not guaranteed to be supported in later versions of .NET. As mentioned before: “The method that backs the delegate for “the same” lambda that appears lexically twice is permitted to be the same, but in practice is not the same in .NET 4.0.”
  • Objects which use the code as it is now are never garbage collected. Using weak references could solve this issue. It seems a dictionary with weak references already exists.
  • There is a small performance overhead doing the dictionary lookups. Ideally this feature would be supported at compile time.
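
To give an idea of what such a class could look like, here is a minimal sketch of the instance-scoped part only. It is not the implementation from my library: it is a simplified assumption of how the lookup could be done, using ConditionalWeakTable so that owners can still be garbage collected. It still relies on the unguaranteed behavior that each lambda has its own backing method.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public sealed class Private<T>
{
    // Per owner, one slot per lambda backing method.
    // The table holds its keys weakly, so owners can still be collected.
    static readonly ConditionalWeakTable<object, Dictionary<MethodInfo, Private<T>>> PerOwner
        = new ConditionalWeakTable<object, Dictionary<MethodInfo, Private<T>>>();

    public T Value { get; set; }

    Private( T value )
    {
        Value = value;
    }

    public static Private<T> Instance( Func<T> initializer, object owner )
    {
        Dictionary<MethodInfo, Private<T>> slots = PerOwner.GetOrCreateValue( owner );
        lock ( slots )
        {
            Private<T> slot;
            if ( !slots.TryGetValue( initializer.Method, out slot ) )
            {
                // First call from this lambda for this owner: run the initializer.
                slot = new Private<T>( initializer() );
                slots.Add( initializer.Method, slot );
            }
            return slot;
        }
    }
}

// The light bulb from before, with a deliberately short life time of 2.
public class LightBulb
{
    public bool IsOn { get; private set; }

    public void SwitchOn()
    {
        Private<int> lifeTime = Private<int>.Instance( () => 2, this );

        if ( lifeTime.Value - 1 >= 0 )
        {
            --lifeTime.Value;
            IsOn = true;
        }
    }

    public void SwitchOff()
    {
        IsOn = false;
    }
}
```

Each LightBulb gets its own counter, and the dictionary lookup on every call is the performance overhead mentioned above.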

Improved encapsulation using lambdas

Let’s take a sidestep from the more advanced runtime code generation topic (don’t worry, still more to follow later) and move to easier territory. On a blog about software design, what better topic than encapsulation.

Abstraction, information hiding, and encapsulation are very different, but highly-related, concepts. One could argue that abstraction is a technique that helps us identify which specific information should be visible, and which information should be hidden. Encapsulation is then the technique for packaging the information in such a way as to hide what should be hidden, and make visible what is intended to be visible. – Edward V. Berard

Modern object-oriented languages are starting to borrow features from functional programming languages, like lambdas. They can be used to create anonymous functions using a very concise syntax. One of the often overlooked advantages of anonymous functions, besides their conciseness, is that they can be used to encapsulate logic even further than a private method. A private method can be accessed from anywhere within the class, while an anonymous function can only be accessed from the scope it is created in, or passed to.

Consider the following highly inefficient “Hello World!” implementation:

public void HelloWorld()
{
    string[] words = new[] { "hello", "pretty", "world" };

    Console.WriteLine(
        ComplexHelloWorldParsing( words[ 0 ] ) +
        words[ 1 ] +
        ComplexHelloWorldParsing( words[ 2 ] ) );
}

private static string ComplexHelloWorldParsing( string input )
{
    string parsed = input;
    // ... plenty of complex specific parsing, only used in HelloWorld.
    return parsed;
}

ComplexHelloWorldParsing is, and let’s assume should be, only used inside the HelloWorld method. Still, it is visible to the entire class. In such a scenario, where encapsulating the specific behavior doesn’t make sense, I would use the following approach regardless of the number of lines of code inside the delegate.

public void HelloWorld()
{
    string[] words = new[] { "hello", "pretty", "world" };

    Func<string, string> complexParsing = s =>
    {
        string parsed = s;
        // ... plenty of complex method specific parsing.
        return parsed;
    };

    Console.WriteLine( complexParsing( words[ 0 ] ) + words[ 1 ] + complexParsing( words[ 2 ] ) );
}

It surprises me that not everyone agrees on this advantage. For the same reason some professional programmers prefer to split functions just to make them smaller, some state lambdas should always be short. They state longer logic should (as a rule of thumb) be placed in a private method. If you know any pro ‘short delegate’ arguments, be sure to let me know. I started a question relating to this topic on Programmers Stack Exchange.