Monday, December 31, 2012

Top 12 on Maven Central

For my project, which I hope to share more about soon, I have a full copy of Maven Central and some other repositories. Since my work is related to dependencies, I have a list of artifacts in ranking order. I based this ranking on popularity (the number of transitive inbound dependencies) and weight. Dependencies are calculated per program, using its latest version. There were almost 40.000 programs in the database. This is not an exact science and some heuristics were used, but a top list to close 2012 sounded interesting.

#1 Hamcrest Core — Never heard of it before. It turns out that this is a library that adds matchers to JUnit, making test assertions more readable. Its (for me unexpected) popularity is likely caused by JUnit, which depends on it (actually embeds it). The numbers of inbound dependencies are almost equal (27772 versus 27842 for Hamcrest).

#2 JUnit — A regression testing framework written by Erich Gamma and Kent Beck, used by developers who implement unit tests in Java. It has more than 10000 direct dependent projects and is likely the most depended-upon project.

#3 JavaBeans(TM) Activation Framework — The JavaBeans(TM) Activation Framework is used by the JavaMail(TM) API to manage MIME data. It is, for me, a perfect example of a library that was over-designed during the initial excitement around Java. It has a complete command framework, but I doubt it is used anywhere. However, the JavaMail library did provide a useful abstraction and it depended on the activation framework.

#4 JavaMail API — The illustrious JavaMail library, developed before there even was a Java Community Process. It provides functionality to mail text from Java (which few people seem to know can also be done with the URL class, but that is another story). It is still actively maintained, since the artifact was updated less than 10 months ago.

#5 Genesis Configuration :: Logging — Provides the common logging configuration used by the build process, primarily used to collect test output into 'target/test.log'. Surprisingly, it has over 20.000 transitive inbound dependencies, likely caused by the fact that seemingly every Geronimo project depends on it.

#6 oro — I remember using Oro somewhere south of 1999; it was a regular expression library, since Java before 1.4 did not support regular expressions. It turns out that Oro was retired 7 years ago and should not be used anymore. Still, it has over 20.000 inbound dependencies. At first sight, many Apache projects still seem to depend on it even though its documentation recommends using the Java regular expressions instead.

#7 XML Commons External Components XML APIs — xml-commons provides an Apache-hosted set of DOM, SAX, and JAXP interfaces for use in other XML-based projects. The project's hope is to standardize on both a common version and a packaging scheme for these XML standards interfaces to make the lives of both developers and users easier. The External Components portion of xml-commons contains interfaces that are defined by external standards organizations. It has not been updated for 7 years (I guess XML's heyday is over by now).

#8 OpenEJB :: Dependencies :: JavaEE API — An open source, modular, configurable and extendable EJB Container System and EJB Server. The popularity of this library is likely caused by its inbound dependency from log4j.

#9 & #10 mockobjects:mockobjects-core — A library for making mock objects. It was last updated over 8 years ago, but it still has more than 20.000 inbound dependencies.

#11 org.apache.geronimo.specs:geronimo-jms_1.1_spec — Provides a clean-room version of the JMS specification. Since this ended up so surprisingly high, I looked at where its popularity came from. It turns out that, again, log4j is the culprit.

#12 Apache Log4j — Which brings us to the artifact that is pushing all these previous artifacts to greater heights than they deserve. log4j is directly referenced by a very large number of projects. The following image shows its dependency tree:
Why a log library should depend on the Java EE API is a bit of a puzzle. Anyway, happy 2013! Peter Kriens

Monday, December 17, 2012

The Looming Threat to Java

A meteorite likely caused the demise of the dinosaurs; since that time we tend to use the term dinosaur for people who are too set in their ways to see what is coming. Though an awful lot of practitioners still feel Java is the new kid on the block, we must realize that the language is in its midlife after 20 years of heavy use. The young and angry spirits that fought the battle to use Java over C++ have long since ended up in the manager's seat. Java today has become the incumbent. So, can we keep on grazing the green and lush fields without having to worry about any meteorites coming our direction?

In 1996 applets were the driving force behind Java in the browser. They were supposed to bring programmability to the browser in an attempt to kill off Microsoft's dominance on the desktop. While applets got totally messed up by Sun due to a complete lack of understanding of the use case (they did it again with Web Start), Java's silly little brother JavaScript grew up and has recently become an exciting platform for UI applications. With the advent of the Web Hypertext Application Technology Working Group (WHATWG) that specified HTML5, we finally have a desktop environment that achieves the dream of very portable code with an unbelievable graphic environment for a large range of devices.

"Great", you think, "we support HTML5 and JavaScript from our web frameworks. So what's the problem?" Well, the problem (for Java at least) is that AJAX has now grown up and calls itself JSON. Basically, all those fancy Java web frameworks have lost their reason for existence. The consequence of a grown-up programming environment in the browser is that the server architecture must adapt or go extinct. Adapt in a very fundamental way.

One of the primary tenets of our industry is encapsulation. Best practice is to hide your internal data and provide access through get/set methods. On top of these objects we design elaborate APIs to modify them. As long as we remain in a single process things actually work amazingly well, as the success of object oriented technology demonstrates. However, once the objects escape to other processes, the advantages are less clear. Anybody that has worked with object relational mapping (JPA, Hibernate, etc.) or communication architectures knows the pain of ensuring that the receiver properly understands these "private" instance fields. You might have a chance in a homogeneous system under central control, but in an Internet world such systems are rare and will become rarer. Unfortunately, clinging to object oriented technologies has given us APIs that work very badly in large scale distributed systems.

The first time I became aware of this problem was with Java security in 1997. The security model of Java is very object oriented, hiding the semantics of a security grant behind a user defined method call (implies). Though very powerful, its cost is very high. Not only is it impossible to optimize (the method call is not required to return the same answer under the same conditions), it is also virtually impossible to provide the user interface with this authorization information. Though a browser based program cannot be trusted to enforce security, the authorization information is crucial to make good user interfaces. Few things are more annoying than being able to push a button and then being told you're not allowed to push that button. Such an unauthorized button should obviously not have been visible in the first place. Remote procedure calls for such fine grained authorization checks are neither feasible nor desirable from a scalability point of view.
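
A contrived sketch of my own (not from the original post) of why this is so hard to reason about from the outside: the grant is buried in an arbitrary Java method, so nothing outside the JVM can precompute it or show it in a UI.

 import java.security.Permission;
 import java.util.Calendar;

 // Hypothetical permission: the decision lives in Java code, so it cannot be
 // serialized, cached, or shipped to a browser to enable/disable buttons.
 public class BusinessHoursPermission extends Permission {
   public BusinessHoursPermission(String name) { super(name); }

   public boolean implies(Permission p) {
     // Arbitrary logic; it may even give different answers over time
     int hour = Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
     return p instanceof BusinessHoursPermission
         && getName().equals(p.getName())
         && hour >= 9 && hour < 17;
   }

   public boolean equals(Object o) {
     return o instanceof BusinessHoursPermission
         && getName().equals(((Permission) o).getName());
   }

   public int hashCode() { return getName().hashCode(); }

   public String getActions() { return ""; }
 }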

Another, more recent example is the JSR 303 data validation API. This specification uses a very clever technique to create elaborate validation schemes; it is incredibly powerful, but it relies on inheritance and annotations. When the UI is built on the server this is a neat tool, but when the UI is executed remotely you are stuck with a lot of obtuse information that is impossible to transfer to the browser, where the user could be guided in providing the right input. Simple regular expressions might not be nearly as powerful, but they are trivial to share between browser and server.
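
To make the contrast concrete, a small sketch of my own (assuming the standard javax.validation annotations): the constraint below is only meaningful to a JVM with a validator on the classpath, while the bare regular expression can be handed to the browser as-is.

 import javax.validation.constraints.Pattern;

 public class Registration {
   // JSR 303 style: the rule is locked up in Java metadata
   @Pattern(regexp = "[A-Z]{2}[0-9]{4}")
   public String customerCode;

   // Shared style: the same rule as a plain string that can be sent to the
   // browser (new RegExp(...)) and used on the server with java.util.regex
   public static final String CUSTOMER_CODE_RE = "[A-Z]{2}[0-9]{4}";
 }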

The last example is just plain API design. Most of the APIs I've designed rely heavily on object references. A reference works fine in the same VM but has no meaning outside that VM. Once you go to a distributed model you need object identities that can travel between processes. Anybody that needs to provide an API to MBeans knows how painful it is to create a distributed API on top of a pure object oriented API. It requires a lot of mapping and caching code for no obvious purpose. A few weeks ago I tried to use the OSGi User Admin but found myself having to do this kind of busy-work over and over again. In the end I designed a completely new API (and implementation) that assumes that today many Java APIs must be useful in a distributed environment.

To prevent Java from becoming obsolete we must therefore rethink the way we design APIs. For many applications today the norm is being a service in a network of peers, where even the browser is becoming one of the peers. Every access to such a service is a remote procedure call. Despite the unbelievable increase in network speed, a remote procedure call will always be slower than a local call, not to mention the difference in reliability. APIs must therefore be designed to minimize roundtrips and data transfers. Instead of optimizing for local programs I think it is time to start thinking globally so we can avoid this upcoming meteorite called HTML5.

Friday, August 24, 2012

About Versions

A version (like 1.2.3) is a remarkably ill-defined concept; Wikipedia does not even have a proper entry for it, and several readers will remember the extensive discussions about version syntax between OSGi and Sun. For an industry where versioning (a word the spell checker flags) is at the root of its business, this comes as a bit of a shock. This article tries to define the concept of a version and to propose a versioning model for software artifacts, with the idea to start a discussion.

A software version is a promise that a program will change in the future; the version is a discriminator between the different revisions of the same program. A program is conceptual: it represents source code, design documents, people, ideas, etc. A program is intended to be used as a library by other software developers or as an application. If we talk about the Apache Commons project, then it maintains multiple programs, for example commons-lang. A revision is a reified (made concrete) representation of a program, for example a JAR file. The version of a revision discriminates the revision from all other revisions that exist and promises that this will also be true for future revisions.

This last requirement would be easy to fulfill with a unique identifier, for example a sufficiently large digest (e.g. SHA-1) of some identifying file of the revision. This was actually the approach taken in .NET. However, the clients of the program will receive multiple revisions over time; they will not only need to discriminate between the revisions (digests work fine for that), they will in general also need to make decisions about compatibility.
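
For what it is worth, such an identifier is trivial to compute; a minimal sketch of my own using java.security.MessageDigest:

 import java.io.FileInputStream;
 import java.io.InputStream;
 import java.security.DigestInputStream;
 import java.security.MessageDigest;

 public class RevisionDigest {
   // Identify a revision by the SHA-1 of its JAR file
   public static String sha1(String file) throws Exception {
     MessageDigest md = MessageDigest.getInstance("SHA-1");
     InputStream in = new DigestInputStream(new FileInputStream(file), md);
     byte[] buffer = new byte[8192];
     while (in.read(buffer) != -1)
       ; // reading the stream drives the digest
     in.close();

     StringBuilder sb = new StringBuilder();
     for (byte b : md.digest())
       sb.append(String.format("%02x", b & 0xff));
     return sb.toString();
   }
 }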

The most common model for deciding compatibility is to make the version identifier comparable. The assumption behind this is that if version a is higher than version b, then a can substitute for b; higher versions are backward compatible with earlier versions. In this model an integer would suffice. However, a version is a message to the future; it is a promise to evolve the program over time with multiple revisions.

Since the version is such a handy little place to describe an artifact, versions were over time heavily abused to carry more information than just this single integer. Tiny Domain Specific Languages (DSLs) were developed to convey commitments to future users. As usual, the domain specific language ended up as a developer specific language. Versions are especially abused when they convey variations of the same revision, for example an artifact compiled for Java 1.4 and the same source code compiled for Java 7. These are not versions but variations, another dimension.

The lack of a de jure or de facto standard for versioning made relying on the implicit promises in versions hard and haphazard. Worst of all, it makes it impossible to develop tools that take the chores of maintaining versions out of our hands.

A few years ago a movement in the industry coined semantic versioning. At about the same time the OSGi Alliance came out with its Semantic Versioning whitepaper, based on some identical and some very similar ideas. Basically, these are attempts to standardize the version DSL so tools can take over the versioning chores. Tools are important because versioning is hard and humans are really bad at it. And with the exponentially increasing number of dependencies we are going to lose without tools.

Semantic versions consist of 4 parts (following the OSGi definition); the sketch after the list shows how the first two parts translate into version ranges:
  1. major - Signals backward incompatible changes.
  2. minor - Signals backward compatibility for clients of an API, however, it breaks providers of this API.
  3. micro - Bug fix, fully compatible, also called patch.
  4. qualifier - Build identifier.
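
Here is the promised sketch (my own, using the org.osgi.framework Version and VersionRange classes): a consumer of an API tolerates a new minor version, while a provider of that same API does not.

 import org.osgi.framework.Version;
 import org.osgi.framework.VersionRange;

 public class SemanticRanges {
   public static void main(String[] args) {
     Version newMinor = Version.parseVersion("1.3.0"); // backward compatible for clients
     Version newMajor = Version.parseVersion("2.0.0"); // breaks everybody

     VersionRange consumer = new VersionRange("[1.2,2)");   // client of the API
     VersionRange provider = new VersionRange("[1.2,1.3)"); // implementer of the API

     System.out.println(consumer.includes(newMinor)); // true:  clients survive a minor bump
     System.out.println(provider.includes(newMinor)); // false: providers are broken by it
     System.out.println(consumer.includes(newMajor)); // false: nobody survives a major bump
   }
 }
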
In general the industry has largely reached consensus on the first three parts; the contention is in the qualifier. Since the qualifier has only one requirement, being comparable, and, in contrast with the first three parts, can hold more than digits, it is the outlet for developer creativity: long date strings, appended git SHA digests, internal numbers, etc.

The qualifier's flexibility made it a perfect candidate to signal the phase changes any revision has to go through in its existence. The phase of a revision indicates where it fits in the development life cycle: developers sharing revisions because they work closely together, release to quality assurance for testing, approval from management to make it public, retiring of a revision because it is superseded by a newer revision, and in rare cases withdrawal when it contains serious bugs. The qualifier became the discriminator to signal some of these phases: qualifiers like BETA1, RC8, RELEASE, FINAL, etc.
 
Using the qualifier to signal a phase implies a change after the revision has been quality assured, which implies a complete retest, since changing a version can affect resolution processes, which can affect the test results. It also suffers from the qualifiers that invariably pop up in this model, called REALLYFINAL, REALLYREALLYFINAL, and PLEASEGODLETHISBETHEFINALONE. Also, this model does not allow a revision to be retired or withdrawn, since the revision is out there, digested, and unmodifiable.

It should therefore be clear that the phase of a revision can logically not be part of that revision. The phase should instead be maintained in the repository. The process I see is as follows.

It all starts with a plan to build a new revision, let's say Foo-1.2.3, which is a new version of Foo-1.2.2. Since Foo-1.2.3 is a new version, the repository allows the developers to (logically) overwrite the previous revisions. That is, requesting Foo-1.2.3 from the repository returns the latest build, e.g. Foo-1.2.3-201208231010. As soon as possible the revisions should be built as if they are the final released revision.

At a certain point in time the revisions need to be approved by Quality Assurance (QA). The development group then changes the phase of the revisions to be tested to testing or approval. This effectively locks the revision in the repository; it is no longer possible to put new revisions with the same major, minor, micro parts. If QA approves the actual revisions then the phase is set to master, otherwise it is set back to staging so the development group can continue to build. After a revision is in the master phase it becomes available to "outsiders"; before this moment, only a selected group had visibility to the revision, depending on repository policies.

If the revision is valid then existing projects that depend on that revision should never have to change unless they decide to use new functionality that is not available in their current dependency. However, repositories grow over time. Currently Maven Central is more than 500 GB and contains over 4 million files, more than 40.000 programs, and a staggering 350.000 revisions. Most of these revisions have been replaced by later revisions, yet a new user is confronted with all this information. It is clear that we need to archive revisions when they are superseded. Archiving must hide the revision from new users, while it must still be available for existing users of the artifact.

Last but not least, it is also necessary to expire a revision in exceptional cases where the revision causes more harm (for example a significant security bug) than the resulting build failure would.

Summary of the phases:
  1. staging - Available as major.minor.micro without qualifier, not visible in searches
  2. candidate - Can no longer be overwritten but is not searchable. Can potentially move back to staging but in general will move to master.
  3. master - Becomes available for searches and is ok to rely on.
  4. retired - Should no longer be used for new projects but is available for existing references
  5. withdrawn - Revision is withdrawn and might break builds
I am very interested in feedback and pointers.

   Peter Kriens

Thursday, July 19, 2012

Xray again



I am having so much fun developing a system from scratch that it is hard to take time away for writing blogs. However, they say I need to do this, otherwise I'd be forgotten ... Anyway, the reason for this blog is XRay. XRay is a plugin for the Apache Felix Web Console that provides you with a quick overview of the health of your system. Since I first wrote about it I've added some features, because I am using it all the time; this is the best OSGi tool I've ever made. I have the XRay window on my screen all the time. When I make changes in bndtools I see the picture move in the corner of my eye. When things go wrong, the colors change and you know you have to take a deeper look. It is kind of amazing how alive the inner bowels of the framework are. The tool has saved me countless hours because the number of wild goose chases is far lower.

One of the major new features is that it now shows when a bundle is not refreshed. For some reason, one of my bundles sometimes escapes the refresh cycle after a bundle is updated (I still have to figure out why). This caused some services that should have been connected not to be wired up (showing up as white services, dashed when requested but not found, see picture).

When I figured out this was caused by the missing refresh, I just added a red dotted border around the bundle. Now the symptom is highly visible. Those poor lazily activated Eclipse bundles that are in the starting state (now, that was a bad idea ...) are now also color coded, with a slightly lighter touch of orange.

The JavaScript libs are now included so it can be used in environments without internet access. If you use service based programming, check it out! It saves an amazing amount of time.

Some people reported layout problems because they had hundreds of bundles and few services. Though I do not have a lot of time for this, I improved the layout algorithm; it could still use some improvement. The window also became bigger and scrolls better. It probably still won't work for hundreds of bundles, but I have about 30 bundles and it works surprisingly well.

You can download the bundle from

https://github.com/bnd/aQute.repo/raw/master/repo/aQute.xray.plugin/aQute.xray.plugin-1.0.1.jar

Ok, back to development!

Peter Kriens

Wednesday, June 20, 2012

Modules in JavaScript

Working with JavaScript (JS) (well, CoffeeScript) was a bit of a shock. After working with it in the late 90s I always felt that JS was an inferior, hard to learn, and badly standardized language that utterly sucked. Over the years I regularly read up on it because JS was used in the place where I had such great hopes for Java. In '98 I wrote an application for Ericsson Telecom completely based on applets on Netscape's IFC. Many aspects were a nightmare but in the end it all worked very nicely; the application ran for many years. Then Sun screwed everything up by developing Swing (way too heavy at the time, while IFC was doing well). However, most of all they killed Java in the browser with their security warnings before you could run anything. While a native-code ActiveX control was downloading in the background, executing a JAR file felt like playing in the Deer Hunter's most famous scene. Goodbye applets.

Welcome JavaScript. If you look today, JavaScript is quite a mature language and the browser incompatibilities have mostly disappeared (if you ignore previous Internet Explorer versions). Languages like CoffeeScript (which translates 1:1 to JavaScript) have even made the syntax pleasant by removing boilerplate and standardizing the way you do classes in JavaScript (which is extremely flexible but easy to tangle yourself up in). However, the most interesting thing for me is how conventions have emerged over time to let different JS programs collaborate in the browser. This is the software collaboration problem I've been struggling with over the past decade, and it is interesting to see how JS modularity is converging on a service model that is remarkably close to OSGi's service registry.

Modularity is crucial in the browser because JavaScript can be loaded from the HTML, from different frames, via XMLHttpRequests, etc. This is all in a single namespace. Obviously one then has problems with global variables, especially since the default in JS is to create a global variable. In Java this problem was addressed by using really long names (class names) that always have a unique prefix. In JS they like really short names ($ comes to mind, used in jQuery and Prototype), but they quickly found out that mixing libraries caused fatal clashes.

Scoping in JS can only be done with functions. Coming from Java, JS functions take some getting used to. A function is actually a plain JS object (a Map comes closest) with properties, but it can also be executed by invoking it with parentheses:

function f() { ... }   // defines a function
f()                    // call function
f.a = 3                // set a property in the function object
x = f['a']             // get a property, with alt. syntax
Since the function is the only scoping mechanism, a new module concept was born. Fortunately, JS is completely recursive, so you can nest variables and functions in functions ad nauseam.

If the module is a function, the semantics of the bundle activator can be seen as calling the function. Since JS is executed as it goes along it is easy to call a function:
  function foo() {
     var moduleLocal="x";
     Foo = { bar: function() {return moduleLocal; } }
  }
  foo();
  // Foo.bar() == "x"
For a Smalltalker it is quite natural that a local variable remains in existence long after the function is called. For the common Java programmer it is unsettling, as it contradicts the stack model of the language (and has been the subject of fierce debates around Java closures: Bloch et al. wanted to stick to final variables, albeit with a better default, while Gafter et al. wanted real closures). After all, the name closure derives from the fact that it creates a closure around the local scope. With an initialization function, we can provide local names for global variables by passing them as parameters:
  function foo($) {
     var moduleLocal=$("x");
     Foo = { bar: function() {return moduleLocal; } }
  }
  foo(jQuery);
  // Foo.bar() == "x"
Since it is superfluous to define and then call a named function, a clever script kiddie came up with the anonymous function: the declaration of a function combined with directly calling it. This limits the pollution of the global namespace (no initialization function name):
(function ($) {...})(jQuery)
The missing piece of the puzzle is dependencies. In the popular CommonJS framework a module specifies its dependencies via a require(name) function. This function takes the name of a dependency (usually related to a file name searched on a module path), loads it if it is not yet loaded, and executes it. This model uses lazy initialization: each module makes sure its dependencies are loaded before it runs its activation code by calling require() for each of these dependencies. A problem with lazy initialization is that all modules must be loaded sequentially; you only find out about the need for the next module after you've loaded the previous one. Therefore AMD was born. Instead of calling the activator function directly, the module function is registered together with its dependencies.
    define( 'Foo', ['Bar', 'Lib'], function(bar,lib) { ... } );
This very interestingly inverts the control flow. Now the module system can load dependencies at its leisure because it decides when to initialize the module, a bit like Declarative Services in OSGi, which also waits until the dependencies are met. In browsers this is not overly useful because loading is usually in the order of the JS files in the HTML, but the model opens up new possibilities. The most interesting approach I've seen so far is actually part of a JS framework that I had never heard of before: Angular.js. It completely changes your view on how to develop JS applications in the browser. All wiring management (clicking here changes something over there) becomes declarative and the JS code only has to worry about the model. Backed by an extensive set of services, it has the potential to change the way we build JS applications. It is built by Google and is used internally, though the authors are not allowed to say what for.

Back to modules. From a modular perspective, Angular found an intriguing way to inject services. First, dependencies are specified on instances and not modules. A basic concept in Angular is the model controller, a function that is called to set up the model. This function can be declared as follows:
function MyModelController($scope, $http, other) {
  ...
}
Amazingly, this sets up a dependency on objects in a service registry that have the names '$scope', '$http', and 'other'. This puzzled me to no end when I first saw it; I wondered where the heck it got the names of the dependencies, until I read the developer manual completely and found out they use a very clever hack. In (most!) JS implementations you can call the toString() method on a function object, giving you the source code of the function. This source code is then used to parse out the parameter names, which are then used to establish the dependencies. Though all known browsers support this feature, it is not standardized. It is therefore also possible to provide the dependencies by specifying an $inject property on the function object (really handy, those properties!).
function MyModelController($scope, $http, other) {...}
MyModelController.$inject=['$scope', '$http', 'other'];
Declaring a module is done by including a JS file and in the code of that file registering an object or factory with angular:
    angular.
       module('myModule', ['ngResource']).
       factory('MyService', function($resource){ ... })
The Angular approach is awfully close to the OSGi service registry, minus the dynamics. It provides a central point to share and find instances, completely decoupled from the modules they originate from. For me that has always been the greatest benefit of OSGi, since this model significantly reduces dependencies between the code of the modules. This is, as far as I know, the best practice for creating reusable components.

Even closer to OSGi is of course Orion from Eclipse. They've implemented a full-blown OSGi service registry, including dynamics and isolation. Each module is a separate HTML file that runs in a headless iframe. Communication is asynchronous, through promises (like Futures). This model is identical to the OSGi Service Registry and even uses most of the same method names. Though I am very inclined to like it, it feels like they need to learn from what Angular does to make the registry less in your face. This was the same problem OSGi had before we had DS and annotations.

It should be clear that a lot of exciting things are happening in the script kiddies' world; these kids surely have grown up. It is refreshing to see that they've come up with ways of working that resemble the OSGi service registry.

 Peter Kriens

Friday, June 8, 2012

bnd week

Next week Beaulieu will be made unsafe by the bnd(tools) crew. Neil Bartlett (Paremus), PK Sörelde (ComActivity), Bert Bakker (Luminis), Ferry Huberts, Marcel Offermans (Luminis), and Marian Grigoras (Siemens) are coming over to prepare for the next release. Unfortunately, Stuart McCulloch (Sonatype) won't be able to come this time. However, he helped us with a very fresh snapshot release of the maven bundle plugin. It would be highly appreciated if people tested this plugin against their code base. You can find the maven bundle plugin here. Please report any errors or inconsistencies you find on GitHub.
It will be a heavy week, as usual, because a lot of new functions have been added. For bnd, this actually means I will move bndlib to version 2.0.0. Besides the significant new functionality, the API has also changed. When bndlib was small, Map<String,Map<String,String>> worked quite well to maintain the manifest information and package attributes. However, in the current code base it was becoming painful, especially since Java has a naming fetish: org.example.Foo.X, org/example/Foo$X, org/example/Foo$X.class, Lorg/example/Foo.X, and Lorg/example/Foo$X; all identify the same class in different contexts. Just imagine how easy it is to confuse these strings. So now bndlib has Parameters, Instructions, and Packages with lots of convenience methods.

bndlib is used in ant, maven, sbt, osmorc, bndtools, and other products. Though the number of indirect users is quite large, the number of developers that program against its API is quite small. However, it is a fun library to use if you need to work with JAR files and/or bundles. Some examples:

 File asm = new File("asm.jar");
 Jar jar = new Jar(asm);
 jar.getManifest().write(System.out);
This will output the following manifest:

 Manifest-Version: 1.0
 Implementation-Vendor: France Telecom R&D
 Ant-Version: Apache Ant 1.6.2
 Implementation-Title: ASM
 Implementation-Version: 2.2.2
 Created-By: 1.5.0_04-b05 (Sun Microsystems Inc.)

As manifests go, this is actually quite good; most JARs have a significantly lonelier manifest. However, since this is not a bundle, we need to add OSGi headers. The following code sets the bundle version and the version of the org.objectweb.asm packages to 2.2.2. In this case we can use a macro. We could create a special macro, version, for this and use it in the Bundle-Version and Export-Package headers. However, we can also reuse the Bundle-Version header, since any header is also a macro. Notice that we put a time stamp in the version so we can find out the build date later.

 Analyzer analyzer = new Analyzer();
 analyzer.setJar(jar);
 analyzer.setProperty("Bundle-Version", "2.2.2.${tstamp}");
 analyzer.setExportPackage("org.objectweb.asm.*;version=${Bundle-Version}");
 Manifest manifest = analyzer.calcManifest();
 jar.setManifest(manifest);
 jar.getManifest().write(System.out);

This provides the following manifest:

Manifest-Version: 1.0
Export-Package: org.objectweb.asm;version="2.2.2.201206081457",org.obj
 ectweb.asm.signature;version="2.2.2.201206081457"
Implementation-Title: ASM
Implementation-Version: 2.2.2
Tool: Bnd-1.52.2
Bundle-Name: showcase
Created-By: 1.6.0_27 (Apple Inc.)
Implementation-Vendor: France Telecom R&D
Ant-Version: Apache Ant 1.6.2
Bundle-Version: 2.2.2.201206081457
Bnd-LastModified: 1339160222619
Bundle-ManifestVersion: 2
Bundle-SymbolicName: showcase
Originally-Created-By: 1.5.0_04-b05 (Sun Microsystems Inc.)
bnd added defaults for crucial OSGi information that was missing: the name, symbolic name, version, etc. It also copied all the headers from the old JAR so that no information is lost. However, most importantly, it calculated the Export-Package header.

Bundle-Version: 2.2.2.201206081456
Export-Package: 
 org.objectweb.asm;version="2.2.2.201206081457",
 org.objectweb.asm.signature;version="2.2.2.201206081457"
So let's save the JAR to disk, including the digests so the bundle can be verified:

  jar.calcChecksums(new String[] {"SHA", "MD5"});
  jar.write("asm-2.2.2.jar");
So what more can we do? Let's take some of our own code and create a JAR out of it. The following example takes code from the bin directory, packages it, and links it to the asm JAR on disk.

 Builder b = new Builder();
 b.setPrivatePackage("simple");
 b.addClasspath(asm);
 b.addClasspath(new File("bin"));
 Jar simple = b.build();
 simple.getManifest().write(System.out);
The Private-Package instruction copies any package it specifies from the class path into the JAR. Since the asm JAR on disk has no OSGi headers, we do not get import ranges.

Import-Package: org.objectweb.asm

So let's use the Jar we've just created instead, and let's also export the simple package.

 Builder b = new Builder();
 b.setExportPackage("simple");
 b.addClasspath(jar);
 b.addClasspath(new File("bin"));
 Jar simple = b.build();

Since the asm JAR we built earlier has versions, the imports now have version ranges. bnd also calculates the uses constraints on the exported packages:

Export-Package: simple;uses:="org.objectweb.asm";version="1.0.0"
Import-Package: org.objectweb.asm;version="[2.2,3)"

Last, let's say you want to know all the references from a JAR:

 Analyzer analyzer = new Analyzer();
 analyzer.setJar(j3);
 analyzer.analyze();
 System.out.println("Referred    " + analyzer.getReferred());
 System.out.println("Contains    " + analyzer.getContained());
 System.out.println("Uses" );
 for ( Entry<PackageRef, List<PackageRef>> from : analyzer.getUses().entrySet())
   System.out.printf("  %-40s %s\n", from.getKey(), new TreeSet<PackageRef>(from.getValue()) );

Which gives the following output:

Referred    java.lang,org.objectweb.asm
Contains    simple
Imports     [org.objectweb.asm]
Exports     [simple]
Uses
  simple                                   [java.lang, org.objectweb.asm]

This blog could go on forever (ok, for quite a long time); there is quite a lot of useful functionality in the API. For normal usage bnd(tools) works best since it has a nicer user interface, integrates with continuous integration, and is tremendously nice to develop with. However, if you find yourself processing JARs or OSGi bundles, consider working with the API.

Peter Kriens

Wednesday, May 23, 2012

X-Rays for OSGi

One of the frustrations of my OSGi years is that while OSGi provides a tremendous wealth of dynamic data about the state of the system, nobody has taken the time to really visualize it. By far the best tool I know is the Apache Felix Web Console because it provides comprehensive information about the operations. However, even with this tool I have been known to waste hours chasing problems that would have been obvious had I looked at the proper page. Though the Web Console is by far the best tool around to find information, it falls far short of a tool that can inform me when there are problems.

Our intelligence is closely related to our visual brain. We can spot problems in one glance that would take ages to discover in raw data. With a 17" MacBook Pro and an extra 27" screen, I have so much screen space (and many developers have a similar setup) that I gladly devote the Pro's 17" to a continuous X-Ray of my running framework. Alas, such a view does not exist.

Well, it did not exist, past tense. Though not related to my venture, my frustration made me spend some time on creating such an X-Ray for OSGi. The excuse was that I needed to learn JavaScript anyway and this seemed an excellent opportunity to learn the intriguing d3 library. I had hoped to finish this last week, but Richard S. Hall and Karl Pauls spent a worthwhile week here to discuss their plans and see how we can collaborate. This was much more (intense!) talking than I had figured, so I had to postpone the finishing touches till this week.

Since I really like the Apache Felix Web Console I started from there. It turns out that it is trivial to write a plugin for the Web Console since there is a good base class that takes care of most of the chores. The Web Console not only looks good, it is also well designed. Adding a plugin is as simple as registering a service. The Abstract Web Console Plugin they provide was an easy way to get started quickly.
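
As a rough sketch of that registration (my own illustration, not the actual XRay code; it assumes the documented convention of registering the plugin as a javax.servlet.Servlet service with a felix.webconsole.label property):

 import java.io.IOException;
 import java.util.Hashtable;

 import javax.servlet.Servlet;
 import javax.servlet.ServletException;
 import javax.servlet.http.HttpServletRequest;
 import javax.servlet.http.HttpServletResponse;

 import org.apache.felix.webconsole.AbstractWebConsolePlugin;
 import org.osgi.framework.BundleActivator;
 import org.osgi.framework.BundleContext;

 public class MyPluginActivator implements BundleActivator {

   static class MyPlugin extends AbstractWebConsolePlugin {
     public String getLabel() { return "myplugin"; } // served at .../system/console/myplugin
     public String getTitle() { return "My Plugin"; }

     protected void renderContent(HttpServletRequest req, HttpServletResponse res)
         throws ServletException, IOException {
       // The base class renders the console chrome; we only supply the body
       res.getWriter().println("<p>Hello from the plugin</p>");
     }
   }

   public void start(BundleContext context) {
     Hashtable<String, Object> props = new Hashtable<String, Object>();
     props.put("felix.webconsole.label", "myplugin");
     context.registerService(Servlet.class.getName(), new MyPlugin(), props);
   }

   public void stop(BundleContext context) {
     // services registered by this bundle are unregistered automatically on stop
   }
 }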

So after a few hours I had a basic SVG window with bundles and services. Getting it to look nice was the really hard part. I wanted to use the diagramming technique I always use: triangles for services and boxes with rounded corners for bundles. That was the easy part. Getting the wires to run without causing a visual mess was the hard part. The d3 library contains some layout managers, but none was suitable for my purpose (though bundles and services look pretty cool in a force directed graph!). In the end I settled on a grid where bundles go vertical and services go horizontal. In this model you can always wire with a horizontal and a vertical line that do not meet obstacles. This sounds simpler than it is, because the diagramming technique for OSGi requires registering bundles to connect at the sharp side of the triangle, getters at the flat side, and listeners at the angled side. Since wires can overlap, you also need a visual cue to see which wires are joined. Getting this right was quite tricky and required me to dust off my trigonometry books. The result looks like the following picture:


What functions did I implement? In some random order:
  • Objects navigate to the page in the Web Console where they are detailed. Clicking on a bundle takes you to the page for that bundle, clicking on a service takes you to the services page.
  • The state of a bundle is indicated with color. An orange bundle is happy and active, a grey bundle is resolved, and a white bundle is installed only.
  • The information is polled from the system and automatically updated. The update is transitioned so uninstalling a bundle causes the bundle beneath it to smoothly crawl up, automatically adjusting any services if necessary. Useless from a functional point of view but pretty cool to see!
  • You can remove services by dragging them off the screen. Refreshing the window brings them back.
  • If a bundle has recent errors or warnings in the log then a small warning icon is displayed. Clicking on this icon takes you to the log, hovering over it gives you the log messages. The Log Service must obviously be present for this to work.
  • If the Service Component Runtime service is registered then the information about DS components is collected and displayed in the bundle. Each component is summarized with a LED that can be red (not satisfied) or green (active).
  • Listener Hooks are used to find out what bundles are waiting for what service. Services that are not present but still are waited for are drawn with a dashed outline. Services that are only registered but not used are displayed white. Active services used by bundles are yellow.
The xray bundle only requires the Web Console; you can download it from the GitHub repo. Source code for this plugin is inside the JAR. If you just want to try it out quickly, you can:
$ sudo jpm install xray.demo.jar http://dl.dropbox.com/u/2590603/xray.demo.jar
Installed command [xray]
$ # start/stop bundles with the shell ...
$ xray
-> stop 10 ...
Oops, guess jpm is not ubiquitous yet ... So just download it from the URL and use it like:
$ java -jar xray.demo.jar
-> stop 10 ...
If you got it running, open your browser on http://localhost:8080/system/console/xray and have fun. Feedback very much appreciated! Since I am a committer at Apache Felix I will likely move the code over there once it gets a bit more mature.

Peter Kriens

Wednesday, May 9, 2012

Clustering

One of the fundamental features that I need is a way to distribute tasks reliably over the systems in the cluster. The requirements I have are:
  • Load balancing, systems should be evenly loaded between the nodes in the cluster.
  • Persistent, once a task is submitted it should be executed once. If the component can provide this guarantee then transactional results can be achieved without locking.
  • Transient failures should be handled by trying to re-execute the task.
  • Periodic tasks, some tasks must happen on a regular basis.
  • Timed tasks, some tasks should happen after a future time.
  • Asymmetric clusters, that is, no requirement that each cluster is identical. Certain nodes can handle certain tasks that others potentially cannot.
  • Low overhead, though it is clear that a persistent queue is needed,  the mechanism should be useful for simple tasks.
  • Support for non-Java languages.
The model I came up with is a Task Queue service. For example, to queue a task that takes a Charge item:

Charge w = new Charge();
w.card = "6451429121212";
w.exp  = "03/12";
w.ccv  = 887;
w.charge = 1200;

TaskData td = taskQueue.with(w).queue();

Queuing a task persists it first and then finds a node in the cluster to execute it on. I am currently using Hazelcast as the communications library between nodes. Hazelcast has distributed maps with events for insertion and removal. When a new task is added, a node checks if it can handle that task type; if it can, it queues it locally. When the task is ready for execution it is removed from the distributed map; the first one to remove it wins and executes the task. So far I really like Hazelcast because it is a very cohesive library and seems to provide straightforward solutions in a really complex area.
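
A minimal sketch of that claim-by-remove idea (my own illustration, not the actual component; it assumes the Hazelcast 2.x API that was current at the time, where IMap.remove() returns null for every node except the first one to call it):

 import com.hazelcast.core.EntryEvent;
 import com.hazelcast.core.EntryListener;
 import com.hazelcast.core.Hazelcast;
 import com.hazelcast.core.HazelcastInstance;
 import com.hazelcast.core.IMap;

 public class TaskNode {
   public static void main(String[] args) {
     HazelcastInstance hz = Hazelcast.newHazelcastInstance(null); // default config
     final IMap<String, Object> tasks = hz.getMap("tasks");

     tasks.addEntryListener(new EntryListener<String, Object>() {
       public void entryAdded(EntryEvent<String, Object> e) {
         if (!canHandle(e.getValue()))
           return;                                  // not our task type
         Object claimed = tasks.remove(e.getKey()); // claim it: the first node wins
         if (claimed != null)
           execute(claimed);
       }
       public void entryRemoved(EntryEvent<String, Object> e) {}
       public void entryUpdated(EntryEvent<String, Object> e) {}
       public void entryEvicted(EntryEvent<String, Object> e) {}
     }, true);
   }

   static boolean canHandle(Object taskData) { return true; } // placeholder: check registered Worker types
   static void execute(Object taskData) {}                    // placeholder: hand off to the matching Worker
 }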

Connecting the workers with the task is done with one of my favorite mechanisms: the white board. A worker registers a Worker<T> service, where T is the type of task data it can receive. A worker looks like:

@Component
public class CardWorker implements Worker<Charge> {
  public void execute(Charge charge) throws Exception {
    ... take your time
  }
}

This model allows nodes to differ; not all nodes have to implement all types of workers. This is especially important for rolling updates, where different versions must run at the same time. It also automatically load balances the tasks between the different systems that provide the appropriate worker types.

For these systems, the successful execution path is usually not that hard to code; the error handling is the hard part, especially if you also want to keep things efficient. And if you think that is difficult, then wait until you actually have to test the many possible error scenarios!

The component does its basic work at the moment and it was very satisfying when I saw a task being executed after I installed a new bundle that provided the appropriate worker type.

The Task Queue component is fully based on the ideas sketched earlier that basically forbid the use of objects between systems. This in general works better than expected and provides many benefits. Lessons are being learned as well but those will be discussed in another blog.

Peter Kriens

Friday, April 27, 2012

Picking a NoSQL Database

Relational databases and I never got along very well. I think this is caused by the impedance mismatch between objects and the relational model. The fact that the child (e.g. the order line) refers to the parent (the order) has always seemed bizarre; the natural order of traversal is parent to child, the direction we know from the object and network models. This reversal causes a huge amount of accidental complexity in object oriented languages. Add to this the required type conversions and the need to spread what is basically one entity with some fully owned children over many different tables, each with their associated maintenance nightmares. And spending 14.5 MB of code memory and untold megabytes of dynamic memory for each of your customers just to run an ORM like JPA has always struck me as a bit, well, not right. The embedded world goes out of its way to minimize runtime costs.

Now I am old enough to remember CODASYL and its networked database model, but I largely witnessed its demise against the relational model. I've also witnessed the attempts of object oriented databases like Gemstone and others. It is clear that these models failed, while it is hard to deny that relational databases are widely used and very successful. I do realize that the discipline required by, and the type checks that are part of, the relational model have an advantage. I also see the advantage of its maturity. That said, I really think ORMs suck to program with.

So in my new project I decided to start with a NoSQL database. This is a bit of an unfortunate name, because one of the things I really like about the relational model is the query language, which happens to be called SQL. Many NoSQL databases do not have a query language and that is a bit too sparse for me.

So what I am looking for is a store for documents with a query language. These documents will have lots of little data items that will likely vary in type over time. The (obvious) model is basically a JSON store. It should be easy to create a collection of JSON documents, store them, retrieve them efficiently on different aspects (preferably with a query language) and apply changes, preferably partial updates. On top of that I expect horizontal scalability and full fault tolerance.

Though I strongly prefer to have such a store in Java, since that integrates more easily with OSGi, I found a database that on paper looks to fit the bill: MongoDB. Except for the awful name (sometimes you wonder if there is actually a need for marketing people), it offers exactly what I need. The alternatives do not look bad, but I really like the 100% focus on JavaScript.

Clearly JavaScript is now the only language available in the browser and it is geared to play a much larger role on the server. If you have not looked at JavaScript for the last two years, look again. It is incredibly impressive what people are doing nowadays in the browser and also on the server. It seems obvious that we're moving back to fat clients and the server will only provide access to the underlying data; I really fail to see any presentation code on the server in the future. Since JSON is native to JavaScript, it is quickly becoming the lingua franca of software.

Unfortunately, the MongoDB Java API does not play very nicely. Since MongoDB uses JavaScript as the interface it is necessary to interact with special maps (DBObject). This creates really awkward code in Java. Something that looks like the following in JavaScript:

> db.coll.insert( { type: "circle", center: { x:30, y:40 }, r:10 } )
> db.coll.find( { "center.x":{ $gt: 2 } } )
Looks like the following in Java:

  BasicDBObject doc = new BasicDBObject();
  doc.put("type", "circle");
  doc.put("center", new BasicDBObject("x", 30).append("y", 40));
  doc.put("r", 10);
  db.getCollection("coll").insert(doc);

  BasicDBObject filter = new BasicDBObject();
  filter.put("center", new BasicDBObject("$gt", 2));

  for ( DBObject o : db.getCollection("coll").find(filter) ) {
      System.out.println(o);
  }
  

Obviously this kind of code is not what you want to write for a living. The JavaScript is more than twice as concise and therefore more readable. And in this case we do not get bonus points for type safety, since the Java code reverts to strings for the attributes. Not good!

So to get to know MongoDB better I've been playing with a (for me) better Java API, based on my Data objects. Using Java Data objects as the schema enforces consistency throughout an application and helps the developers use the right fields. So the previous example looks like:

public class Shape {
  public String type;
  public Point  center;
  public int r;

  // ... toString, etc
}

Store store = new Store(Shape.class,db.getCollection("coll"));

Shape s = new Shape();
s.type = "circle";
s.center = new Point();
s.center.x = 30;
s.center.y = 40;
s.r = 10;

store.insert(s);
for ( Shape i : store.find("center.x>2") )
  System.out.println(i);

Though maybe not as small as the JavaScript example, it at least provides proper data types and completion in the IDE. It also provides much more safety, since the Store class can do a lot of verification against the type information from the given class.

So after spending two days on MongoDB I am obviously just getting started, but I like the programming model (so far). The key issue is of course how it works in practice. I already got some mails from friends pointing to disgruntled MongoDB users. Among the reviewers there are many Pollyannas not hindered by their lack of knowledge, but there are also some pretty decent references that back up their claims with solid experience. Alas, the proof of the pudding is in the eating.

Peter Kriens

Wednesday, April 25, 2012

I've Sinned

"Forgive me father, for I have sinned against the first law of software: thou shalt reuse." I've written a JSON codec. Yes, I know there are already lots of them around (probably too many already), but out of the myriad of JSON converters I could not find a single one that had the qualities I wanted.

I needed JSON because a few months ago I wrote a blog about what I call Data objects (the C struct). I've come to the firm conclusion that objects and classes are nice, but not between processes. Objects between processes just do not work.

The problem is intrinsic to the object oriented paradigm. Classes provide modularity (hiding of private data) and this is very beneficial in a single program. However, once objects are exchanged between processes there is an implicit requirement that the classes in both processes are compatible. If they are identical then there is no issue, because any hiding is properly maintained. However, any difference between the classes requires an agreement between the processes about the private details. In a distributed system this is very hard to guarantee at all times, since systems must be updated and you can rarely bring down all instances in a cluster. Ergo, the modularity of the class is no longer effective since private data can no longer be changed without affecting others, which implies the loss of all modularity advantages. So we have to get over it and live with public data on the wire.

Dynamically typed languages have a significant advantage in a distributed world, since extra fields do no harm and missing fields are easy to detect. Where type safety can provide significant advantages inside a program, it seems to be in the way when we communicate. Though I like JavaScript (I am a Smalltalker by origin), I've come to like Eclipse and the extensive support it can provide thanks to the type information. Can we combine the best of those worlds?

Clearly, JSON is the current standard in a distributed world. JSON is mostly simple, it only provides numbers, strings, arrays, and maps. Any semantics of this data are up to the receiver. This is very different from XML or ASN.1 where detailed typing is available. The advantages of this simplicity is that it is easy to get something working. The disadvantage is of course that it is also very easy to break things.

The Java JSON libraries I looked at all had a design with an impedance mismatch between Java and the JSON data. As I said, I like Eclipse's powerful support and want to use Java types; I do not want to use maps or special JSON lists in Java, they are awkward to use. Using foo.get("field") combines the worst of all worlds. What I needed was a converter that can take a Data object (an object with only public fields) and turn it into a JSON stream, and take a JSON stream and turn it into a Data object. Fields in this object must be able to handle other data objects, primitives, most value objects, objects that can be constructed from their toString() method, enums, collections, maps, and some special cases like byte[], Pattern, and Date. And all this in a type safe way?

Yes we can!

It turns out that the Java type system can be a tremendous help in generating the JSON (this is really almost trivial), but it is also extremely useful in parsing. The object's fields can provide extensive type information through reflection, and this can be used to convert one of the 4 basic JSON types (strings, numbers, lists, and maps) to the field type. Since the class information is available there is no need for dynamic class loading, evil in an OSGi world. It also works surprisingly well with generics. Though every Java developer knows that types are erased with generics, all generic type information about fields, methods, interfaces, and classes is available reflectively. Erasure is only about instances: from an instance you cannot find out its generic parameters. For example, take a Data object like:
public class Data {
  public List<Person> persons;
}
The persons field provides the full generic information that this is a List with instances of the Person class. When the JSON decoder encounters the persons field with a JSON list, it can find out that the members of that list should be Person instances. For each member of the JSON list it will create an instance and parse the member. Obviously this all works recursively. For example:

public enum Sex { MALE, FEMALE; }
public class Person {
  public String name;
  public Sex    sex;
  public List<Person> children = new ArrayList<Person>();
}

JSONCodec codec = new JSONCodec();
Person user = getUser();
String s = codec.enc().put( user ).toString();

// {"name":"Peter","children":[
//     {"name":"Mischa","sex":"FEMALE"},
//     {"name":"Thomas","sex":"MALE"}
//   ],
//   "sex":"MALE"
// }

Person copy = codec.dec().from(s).get( Person.class );
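
As an aside, the reflective lookup that makes this possible is plain java.lang.reflect; here is a minimal sketch of my own for the Data class above:

 import java.lang.reflect.Field;
 import java.lang.reflect.ParameterizedType;
 import java.lang.reflect.Type;

 public class GenericLookup {
   public static void main(String[] args) throws Exception {
     Field f = Data.class.getDeclaredField("persons");

     Type generic = f.getGenericType();            // java.util.List<Person>, not just List
     ParameterizedType pt = (ParameterizedType) generic;
     Type member = pt.getActualTypeArguments()[0]; // class Person

     System.out.println(generic);
     System.out.println(member); // tells the decoder what to instantiate for each list element
   }
 }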

So what did I learn? First, that the primitives and Java number classes absolutely, totally suck; what a mess. It feels like someone with a lot of knowledge set out to make fluid use of different number types as complex as possible. The compiler can hardly do any type coercion, but with reflection you had better get the types exactly right, and the only way to do that is to have special cases for each primitive and wrapper type. #fail

The other thing I learned was that it makes a lot of sense to stream JSON records. Initially I had the envelope model in mind: if I had multiple persons then I would create a List of Person. However, in this model you need to parse the whole list before you can process the first element. It turns out to work much better to sequence the Person objects. One of the nice things about JSON is that the parser is very simple and does not have to look beyond what it needs. Sequencing records also allows earlier records to provide help in parsing later records. For example, the first record could contain the number of records and maybe a protocol version. It also works very well for digests, signing, and getting the class name that is needed to parse the next record.

So yes, I've sinned against the first law of reuse, because I am confident that somebody will point out that there already exists a library out there that does exactly what I described. Well, it wasn't too much work and I actually really like what I've got; this model turns out to work extremely well.

Peter Kriens

Thursday, April 12, 2012

Crawling to Run

In the previous post I explained why I needed jpm (Just Another Package Manager): so that I could install commands and services on a freshly minted EC2 instance. That works fine now and I am very excited about it.

The next step is to get my OSGi framework running. However, jpm is very consciously kept unrelated to OSGi; there are no dependencies except some metadata reuse, since I do not have the time nor the inclination to reinvent well defined, applicable metadata. That said, jpm is about standard, off-the-shelf, nothing-special JARs. Its only purpose is to make any JAR with a Main-Class header run on any operating system in the world without forcing the developer to learn how to install services and commands for each operating system.

So if jpm is agnostic of OSGi, how do I get my framework running as an OS service or command?

At EclipseCon last year Karl Pauls was nagging me for a function in bnd to create an executable out of a bnd or bndrun file. In bnd(tools) you can launch a bndrun file, so in principle it is possible to create a single JAR that you can then run from the command line with java -jar x.jar. I started, but as with too many things it moved to the background too soon for anything useful to come out of it. However, it is also slightly different when you need it yourself, and I need this now. So this week I perfected the bnd package command. If you can develop a framework application in bnd(tools) then you can now turn it into an executable JAR (once this version of bnd hits the streets, of course).

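In shell terms, the flow I am after looks roughly like this (file names are illustrative and the exact output name may differ):
 ~$ bnd package myapp.bndrun      # creates an executable JAR, e.g. myapp.jar
 ~$ java -jar myapp.jar
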
Once you have this executable JAR, jpm can then be used to install it as a command or a service. I created a short video that demonstrates how I plan to make this all work:

[Embedded video demo]
This starts to look pretty promising (nothing was faked). The video was only edited to protect my thick, slow fingers that are still getting used to my new keyboard. However, don't start using it yet, because it needs some chores to industrialize it (I have only used it on MacOS so far) and I need the freedom to make significant changes. Anyway, feedback is appreciated, and anybody who wants to spend some serious effort (test cases, Windows) on this is welcome to contact me.

Peter Kriens

Friday, April 6, 2012

Just another Package Manager

The first thing you need to do when you go into the cloud is have a plan. Well, I have been brooding for the last three years, so there is no lack of ideas. The second thing is to get a cluster running. Obviously it is not that hard to get some processes going on a Linux machine and start them. However, I ran into by far my greatest frustration with Java. Java is supposed to be "Write once, Run anywhere," but in practice they leave out that pesky little detail of deployment. Java is amazingly poor in bridging that last mile. In the last year I've done some playing with node.js and npm. Let me show you a session that makes me envious:
 ~$ lessc base.less
 -bash: /usr/local/bin/lessc: No such file or directory
 ~$ sudo npm -g install less 
 /usr/local/bin/lessc 
 -> /usr/local/lib/node_modules/less/bin/lessc
 less@1.3.0 /usr/local/lib/node_modules/less 
 ~$ lessc base.less
  #header { ... }
Why on earth can't we do this in Java? Now, surely installing npm itself must be painful? Nope:
 curl http://npmjs.org/install.sh | sh
It is terribly frustrating that these javascript kiddies (kidding, node.js is very interesting!) can do this and we, the enterprise Java developers, can't! This is something I ran into when I tried to launch my image on Amazon EC2.
At Amazon EC2 you can use a standard image with their variation of Linux. They provide one mechanism to customize the start-up and add your software: the user data that is specified during instance configuration. For the standard Amazon Linux image, this user data can also be a shell script that is run at the end of a (re)boot; the perfect place to initialize or synchronize your system. What would my user-data shell script look like in a perfect world?
  #!/bin/sh
  jpm || (curl http://jpm4j.org/install.sh | sh)
  jpm libsync http://www.example.com/system/config.jpm
  jpm schedule --hourly mywatchdog 
  jpm start myservice
  myapp
This jpm (abbreviation of Just another Package Manager) command should store the files in an appropriate place, and it should be able to add commands to the shell path, install services, and schedule them. Obviously, all these things are done differently today on MacOS, Linux, IBM mainframes, Windows, and the other places where Java runs. The jpm command should therefore know about these differences and act as a good guest in the host environment. For example, on the Mac it should store the repository in /Library/Java/PackageManager but on Linux /var/jpm seems more appropriate. Similarly, user-local files should be stored in Library on the Mac and not in a common Unix-style ~/.jpm directory. Daemons should follow the local rules and be manageable from the native management tools.
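To make that concrete, here is a minimal sketch (illustrative only, not jpm's actual code; the Windows location is an assumption) of such a platform-aware choice:
  import java.io.File;

  class Platform {
    // Pick a platform appropriate location for the shared repository.
    static File repositoryDirectory() {
      String os = System.getProperty("os.name").toLowerCase();
      if (os.contains("mac"))
        return new File("/Library/Java/PackageManager");
      if (os.contains("win"))
        return new File(System.getenv("ProgramData"), "jpm"); // assumption
      return new File("/var/jpm");
    }
  }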
It should meet the following requirements:
  • Applications should not require special knowledge of the platform or jpm 
  • Act, look, and feel native
  • Manage a local repository of jars (either apps or libs)
  • Allow (limited) use without being the admin or root
  • Install command line apps written in Java
  • Install services/daemons written in Java
  • Schedule services or apps in time
  • Monitor the Java services
  • Install as GUI app (with proper icon and clickable)
  • Install from the web
  • Support at least any system with a basic *nix shell, MacOS, and Windows
  • Allow others to contribute new platforms 
  • Support JAR signing
  • Full uninstall
  • Simple & small
A wise man would have given a deep sigh and spent his time on writing the scripts to run this on Linux. Since I am not a wise man, and this is such a great frustration of mine with Java, I incubated jpm this week. Given the short time, and some of the false directions I took, it is not yet ready for consumption. However, I am using it on Linux and MacOS. It is not yet ready for others to play with, but if there are people interested in adapting it to Windows and other environments then I am more than willing to talk if you have the time to work on this. (A student project? I can provide mentoring.) It is ASL 2.0 licensed and it will probably end up at bndtools.
Peter Kriens

Monday, April 2, 2012

Hello


Welcome to this blog about Software Simplexity. Since I started developing software in the late seventies I've been on a quest for better ways to do this job, mostly through chasing reuse. I have advocated objects since the early eighties and worked for the OSGi Alliance for a decade. The Alliance work was a fantastic adventure. We've been able to develop a novel architectural paradigm that over time will have a significant influence on our industry.

The last few years were spent adapting the technology for mainstream (Enterprise) developers to solve their problems. However, many of the problems we solved simply could not exist in the original, simple programming model. While these additions made it a lot easier to use OSGi with legacy code, they created a certain amount of uneasiness in me. A feature for one is a conceptual burden for another; I like simple, which usually means small or concise.

Adapting OSGi to mainstream development is therefore not my ambition. Over the past few years I actually came to the conclusion that there are simpler ways to write (enterprise) software than many use today. However, sitting in my ivory tower grumbling about these 'stupid' enterprise developers is not very satisfying either.

A few years ago I saw a business opportunity to work closer with non-OSGi mainstream developers on modularity. This venture, which I would like to keep confidential for the moment, requires a cloud-based system. I know it would probably be wiser to select some existing (web) framework and work from there, but I like to build things myself. Especially since there are a number of things happening in the industry that seem to fundamentally change the architecture of systems: the browser becoming a (very) rich client, cloud-based systems, and of course OSGi maturing. Though I could go off the radar and work on this in secrecy, I've decided to be brave and record my progress (and likely struggles) in this blog.

So if you're interested in seeing my progress and struggles, follow this blog! If you have feedback, do not hesitate to respond with comments. Just one thing: I am not interested in big frameworks, though pointers to cohesive (small) libraries are always welcome.

Peter Kriens