Friday, April 27, 2012

Picking a NoSQL Database

Relational databases and I never got along very well. I think this is caused by the impedance mismatch between objects and the relational model. The fact that the child (e.g. the order line) refers to the parent (the order) has always seemed bizarre; the natural order of traversal is parent to child, the direction we know from the object and network models. This reversal causes huge accidental complexity in object oriented languages. Add to this the required type conversions and the need to spread what is basically one entity with some fully owned children over many different tables, each with its associated maintenance nightmares. And spending 14.5 Mb of code memory and untold megabytes of dynamic memory for each of your customers just to run an ORM like JPA has always struck me as a bit, well, not right. The embedded world goes out of its way to minimize runtime costs.

Now I am old enough to remember CODASYL and its networked database model, but I largely witnessed its demise against the relational model. I've also witnessed the attempts of object oriented databases like Gemstone and others. It is clear that these models failed, while it is hard to deny that relational databases are widely used and very successful. I do realize that the discipline and type checks that are part of the relational model have an advantage. I also see the advantage of its maturity. That said, I really think ORMs suck to program with.

So in my new project I decided to start with a NoSQL database. This is a bit of an unfortunate name because one of the things I really like about the relational model is the query language, which happens to be called SQL. Many NoSQL databases do not have a query language and that is a bit too sparse for me.

So what I am looking for is a store for documents with a query language. These documents will have lots of little data items that will likely vary in type over time. The (obvious) model is basically a JSON store. It should be easy to create a collection of JSON documents, store them, retrieve them efficiently on different aspects (preferably with a query language) and apply changes, preferably partial updates. On top of that I expect horizontal scalability and full fault tolerance.

Though I strongly prefer to have such a store in Java, since that integrates more easily with OSGi, I found a database that on paper looks to fit the bill: MongoDB. Except for the awful name (sometimes you wonder if there actually is a need for marketing people), it offers exactly what I need. The alternatives do not look bad, but I really like the 100% focus on JavaScript.

Clearly JavaScript is now the only language available in the browser, and it is geared to play a much larger role on the server. If you have not looked at JavaScript for the last two years, look again. It is incredibly impressive what people are doing nowadays in the browser and also on the server. It seems obvious that we're moving back to fat clients where the server only provides access to the underlying data; I really fail to see any presentation code on the server in the future. Since JSON is native to JavaScript, it is quickly becoming the lingua franca of software.

Unfortunately, the MongoDB Java API does not play very nicely. Since MongoDB uses JavaScript as the interface, it is necessary to interact with special maps (DBObject). This creates really awkward code in Java. Something that looks like the following in JavaScript:

> db.coll.insert( { type: "circle", center : { x:30, y:40 }, r:10 } )
> db.coll.find( { "center.x":{ $gt: 2 } } )
looks like the following in Java:

  BasicDBObject doc = new BasicDBObject();
  doc.put("type", "circle");
  doc.put("center", new BasicDBObject("x", 30).append("y", 40));
  doc.put("r", 10);
  db.getCollection("coll").insert(doc);

  BasicDBObject filter = new BasicDBObject();
  filter.put("center.x", new BasicDBObject("$gt", 2));

  for ( DBObject o : db.getCollection("coll").find(filter) ) {
    System.out.println(o);
  }

Obviously this kind of code is not what you want to write for a living. The JavaScript is more than twice as concise and therefore more readable. And in this case we do not get bonus points for type safety since the Java code reverts to strings for the attributes. Not good!

So to get to know Mongodb better I've been playing with a (for me) better Java API, based on my Data objects. Using the Java Data objects as the schema enforces consistency throughout an application and helps the developers use the right fields. So the previous example looks like:

public class Shape {
  public String type;
  public Point  center;
  public int    r;

  // ... toString, etc
}

Store store = new Store(Shape.class, db.getCollection("coll"));

Shape s = new Shape();
s.type = "circle";
s.center = new Point();
s.center.x = 30;
s.center.y = 40;
s.r = 10;

for ( Shape i : store.find("center.x>2") ) {
  System.out.println(i);
}

Though maybe not as small as the JavaScript example, it at least provides proper data types and completion in the IDE. It also provides much more safety since the Store class can do a lot of verification against the type information from the given class.
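The kind of verification such a Store can do is straightforward with reflection. A minimal sketch (my illustration, not the actual Store code) of checking that a query's dotted field path actually exists on the class:

```java
import java.lang.reflect.Field;

public class FieldCheck {
  public static class Point { public int x, y; }
  public static class Shape { public String type; public Point center; public int r; }

  // Walk a dotted field path like "center.x" through the class's public
  // fields, failing fast on a typo instead of silently matching nothing.
  static Class<?> check(Class<?> type, String path) throws NoSuchFieldException {
    for (String part : path.split("\\.")) {
      Field f = type.getField(part);
      type = f.getType();
    }
    return type;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(check(Shape.class, "center.x")); // int
  }
}
```

With this check, a query string like "point.x>2" against Shape would throw a NoSuchFieldException at query-build time rather than returning an empty result at runtime.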

So after spending two days on MongoDB I am obviously just getting started, but I like the programming model (so far). The key issue is of course: how does it work in practice? I already got some mails from friends pointing to disgruntled MongoDB users. Among the reviewers there are many Pollyannas not hindered by their lack of knowledge, but there are also some pretty decent references that back up their opinions with solid experience. Alas, the proof of the pudding is in the eating.

Peter Kriens

Wednesday, April 25, 2012

I've Sinned

"Forgive me father, I've sinned against the first law of software: Thou shall reuse." I've written a JSON codec. Yes, I known there are already lots of them around (probably too many already) but out of the myriad of JSON converters I could not find any one that had the qualities I wanted.

I needed JSON because a few months ago I wrote a blog post about what is called Data objects (the C struct). I've come to the firm conclusion that objects and classes are nice, but not between processes. Objects between processes just do not work.

The problem is intrinsic to the object oriented paradigm. Classes provide modularity (hiding of private data) and this is very beneficial in a single program. However, once objects are exchanged between processes there is an implicit requirement that the classes in both processes are compatible. If they are identical then there is no issue because any hiding is properly maintained. However, any difference between the classes requires an agreement between the processes about the private details. In a distributed system this is very hard to guarantee at all times, since systems must be updated and you can rarely bring down all instances in a cluster. Ergo, the modularity of the class is no longer effective since private data can no longer be changed without affecting others, which implies the loss of all modularity advantages. So we have to get over it and live with public data on the wire.

The dynamically typed languages have a significant advantage in a distributed world since extra fields do no harm and missing fields are easy to detect. Where type safety can provide significant advantages inside a program, it seems to be in the way when we communicate. Though I like JavaScript (I am a Smalltalker by origin), I've come to like Eclipse and the extensive support it can provide thanks to type information. Can we combine the best of those worlds?

Clearly, JSON is the current standard in a distributed world. JSON is mostly simple; it only provides numbers, strings, arrays, and maps. Any semantics of this data are up to the receiver. This is very different from XML or ASN.1, where detailed typing is available. The advantage of this simplicity is that it is easy to get something working. The disadvantage is of course that it is also very easy to break things.

The Java JSON libraries I looked at all had a design with an impedance mismatch between Java and the JSON data. As I said, I like Eclipse's powerful support and want to use Java types; I do not want to use maps or special JSON lists in Java, they are awkward to use. Using foo.get("field") combines the worst of all worlds. What I needed was a converter that can take a Data object (an object with only public fields) and turn it into a JSON stream, and take a JSON stream and turn it into a Data object. Fields in this object must be able to handle other data objects, primitives, most value objects, objects that can be constructed from their toString() output, enums, collections, maps, and some special cases like byte[], Pattern, and Date. And all this in a type safe way?

Yes we can!

It turns out that the Java type system can be a tremendous help in generating the JSON (this is really almost trivial) but it is also extremely useful in parsing. The object's fields can provide extensive type information through reflection, and this can be used to convert one of the 4 basic JSON types (strings, numbers, lists, and maps) to the field type. Since the class information is available there is no need for dynamic class loading, which is evil in an OSGi world. It also works surprisingly well with generics. Though every Java developer knows that types are erased with generics, all generic type information about fields, methods, interfaces, and classes is available reflectively. Erasure is only about instances: from an instance you cannot find out its generic parameters. For example, take a Data object like:
public class Data {
  public List<Person> persons;
}
The persons field provides the full generic information that this is a List with instances of the Person class. When the JSON decoder encounters the persons field with a JSON list, it can find out that the members of that list should be Person instances. For each member of the JSON list it will create an instance and parse the member. Obviously this all works recursively. For example:

public enum Sex { MALE, FEMALE }
public class Person {
  public String name;
  public Sex    sex;
  public List<Person> children = new ArrayList<Person>();
}

JSONCodec codec = new JSONCodec();
Person user = getUser();
String s = codec.enc().put( user ).toString();

// {"name":"Peter","children":[
//     {"name":"Mischa","sex":"FEMALE"},
//     {"name":"Thomas","sex":"MALE"}
//   ],
//   sex":"MALE"
// }

Person parsed = codec.dec().from(s).get( Person.class );
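The reflective machinery that makes this possible can be sketched in a few lines; the class names here are just stand-ins to show the actual reflection calls:

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.util.List;

public class GenericDemo {
  public static class Person { public String name; }
  public static class Data { public List<Person> persons; }

  public static void main(String[] args) throws Exception {
    // Despite erasure, the field's declaration keeps its type arguments,
    // which is exactly what a decoder needs to pick the member type.
    Field f = Data.class.getField("persons");
    ParameterizedType pt = (ParameterizedType) f.getGenericType();
    System.out.println(pt.getRawType());                // interface java.util.List
    System.out.println(pt.getActualTypeArguments()[0]); // class GenericDemo$Person
  }
}
```

So when the decoder hits the persons field, one getGenericType() call tells it to instantiate a Person for each list member, no dynamic class loading required.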

So what did I learn? First, that the primitives and Java number types absolutely totally suck; what a mess. It feels like someone with a lot of knowledge set out to make fluid use of different number types as complex as possible. The compiler can hardly do any type coercion, but with reflection you had better get the types exactly right, and the only way to do that is to have special cases for each primitive and wrapper type. #fail
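A sketch of the special-casing this forces (my own helper, not part of any library): a JSON number may arrive as a Long or a Double, but Field.set on an int field only accepts an Integer, since reflection does no narrowing. So every primitive needs its own branch:

```java
import java.lang.reflect.Field;

public class Coerce {
  public static class Holder { public int r; }

  // Route a Number to the exact primitive the field declares. A plain
  // f.set(target, value) throws IllegalArgumentException when, say,
  // a Long meets an int field.
  public static void setNumber(Object target, Field f, Number value) throws Exception {
    Class<?> t = f.getType();
    if (t == int.class)         f.set(target, value.intValue());
    else if (t == long.class)   f.set(target, value.longValue());
    else if (t == double.class) f.set(target, value.doubleValue());
    else if (t == float.class)  f.set(target, value.floatValue());
    else if (t == short.class)  f.set(target, value.shortValue());
    else if (t == byte.class)   f.set(target, value.byteValue());
    else                        f.set(target, value); // wrapper or Number field
  }

  public static void main(String[] args) throws Exception {
    Holder h = new Holder();
    // would fail with a plain f.set: Long cannot be narrowed to int
    setNumber(h, Holder.class.getField("r"), Long.valueOf(10));
    System.out.println(h.r); // 10
  }
}
```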

The other thing I learned was that it makes a lot of sense to stream JSON records. Initially I had the envelope model in mind: if I had multiple persons then I would create a List of Person. However, in this model you need to parse the whole list before you can process the first element. It turns out to work much better to sequence the Person objects. One of the nice things about JSON is that the parser is very simple and does not have to look beyond what it needs. Sequencing records also allows earlier records to provide help in parsing later records. For example, the first record could contain the number of records and maybe a protocol version. It also works very well for digests, signing, and conveying the class name that is needed to parse the next record.
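To illustrate why sequencing is so cheap to parse (this is an illustration, not the codec's actual reader): records can be cut off a stream one at a time by tracking brace depth, so the first record is available before the rest has even arrived:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class RecordStream {
  // Cut the next top-level JSON object off the stream by counting braces,
  // ignoring braces that occur inside string literals.
  public static String nextRecord(Reader r) throws IOException {
    StringBuilder sb = new StringBuilder();
    int depth = 0, c;
    boolean inString = false, escaped = false, started = false;
    while ((c = r.read()) != -1) {
      char ch = (char) c;
      if (!started) {
        if (ch != '{') continue; // skip whitespace between records
        started = true;
      }
      sb.append(ch);
      if (inString) {
        if (escaped) escaped = false;
        else if (ch == '\\') escaped = true;
        else if (ch == '"') inString = false;
      } else if (ch == '"') inString = true;
      else if (ch == '{') depth++;
      else if (ch == '}' && --depth == 0) return sb.toString();
    }
    return null; // end of stream
  }

  public static void main(String[] args) throws IOException {
    Reader in = new StringReader("{\"n\":1} {\"n\":2, \"s\":\"}\"}");
    System.out.println(nextRecord(in)); // {"n":1}
    System.out.println(nextRecord(in)); // {"n":2, "s":"}"}
  }
}
```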

So yes, I've sinned against the first law of reuse because I am confident that somebody will point out that there already exists a library out there that does exactly what I described. Well, it wasn't too much work and I actually really like what I've got; this model turns out to work extremely well.

Peter Kriens

Thursday, April 12, 2012

Crawling to Run

In the previous post I explained how I needed jpm (Just another Package Manager) so that I could install commands and services on a freshly minted EC2 instance. That now works fine and I am very excited about it.

The next step is to get my OSGi framework running. However, jpm is very consciously kept unrelated to OSGi; there are no dependencies except some metadata reuse, since I do not have the time nor the inclination to reinvent well defined, applicable metadata. That said, jpm is about standard, off-the-shelf, nothing-special JARs. Its only purpose is to make any JAR with a Main-Class header run on any operating system in the world without forcing the developer to learn how to install services and commands for each operating system.

So if jpm is agnostic of OSGi, how do I get my framework running as an OS service or command?

At EclipseCon last year Karl Pauls was nagging me for a function in bnd to create an executable out of a bnd or bndrun file. In bnd(tools) you can launch a bndrun file, so in principle it is possible to create a single JAR that you can then run from the command line with the java -jar x.jar command. I started on it but, as with too many things, it moved to the background too soon for anything useful to come out of it. However, it is also slightly different when you need something yourself, and I need this now. So this week I perfected the bnd package command. If you can develop a framework application in bnd(tools) then you can now turn it into an executable JAR (once this bnd hits the streets, of course).

Once you have this executable JAR, jpm can be used to install it as a command or a service. I created a short video that demonstrates how I plan to make this all work:

This starts to look pretty promising (nothing was faked). The video was only edited to protect my thick slow fingers that are trying to get used to my new keyboard. However, don't start using it yet because it needs some chore work to industrialize it (I have only used it on MacOS so far) and I need the freedom to make significant changes. Anyway, feedback is appreciated, and anybody that wants to spend some serious effort (test cases, Windows) on this is welcome to contact me.

Peter Kriens

Friday, April 6, 2012

Just another Package Manager

The first thing you need to do when you go into the cloud is have a plan. Well, I have been brooding for the last three years so there is no lack of ideas. The second thing is to get a cluster running. Obviously it is not that hard to get some processes going on a Linux machine and start them. However, I ran into by far my greatest frustration with Java. Java is supposed to be "Write once, Run anywhere," but in practice they leave out the pesky little detail of deployment. Java is amazingly poor at bridging that last mile. In the last year I've done some playing with node.js and npm. Let me show you a session that makes me envious:
 ~$ lessc base.less
 -bash: /usr/local/bin/lessc: No such file or directory
 ~$ sudo npm -g install less 
 -> /usr/local/lib/node_modules/less/bin/lessc
 less@1.3.0 /usr/local/lib/node_modules/less 
 ~$ lessc base.less
  #header { ... }
Why on earth can't we do this in Java? Now, installing npm might be painful? Nope:
 curl | sh
It is terribly frustrating that these JavaScript kiddies (kidding, node.js is very interesting!) can do this and we enterprise Java developers can't! This is something I ran into when I tried to launch my image on Amazon EC2.
At Amazon EC2 you can use a standard image with their variation of Linux. They provide one mechanism to customize the start-up to add your software: the user-data that is specified during instance configuration. For the standard Amazon Linux image, this user data can also be a shell script that is run at the end of a boot; the perfect place to initialize or synchronize your system. What would my user-data shell script look like in a perfect world?
  jpm || (curl | sh)
  jpm libsync
  jpm schedule --hourly mywatchdog 
  jpm start myservice
This jpm (abbreviation of Just another Package Manager) command should store the files in an appropriate place, it should be able to add commands to the shell path, install services, and schedule tasks. Obviously, all these things are done differently today on MacOS, Linux, IBM mainframes, Windows, and the other places where Java runs. The jpm command should therefore know about these differences and act as a good guest in the host environment. For example, on the Mac it should store the repository in /Library/Java/PackageManager but on Linux /var/jpm seems more appropriate. Similarly, user-local files should be stored in Library on the Mac and not in the common Unix ~/.jpm directory. Daemons should follow the local rules and be manageable from the native management tools.
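The platform dispatch behind this can be as simple as the following sketch (my own code, not jpm's; the Mac and Linux paths are the ones named above, the Windows location is my guess):

```java
public class Platform {
  // Pick the repository location by host OS, so callers of jpm never
  // have to know where their platform keeps shared Java artifacts.
  public static String repoDir() {
    String os = System.getProperty("os.name").toLowerCase();
    if (os.contains("mac"))
      return "/Library/Java/PackageManager";
    if (os.contains("win"))
      return System.getenv("ProgramFiles") + "\\jpm"; // hypothetical Windows choice
    return "/var/jpm"; // default to the Unix convention
  }

  public static void main(String[] args) {
    System.out.println(repoDir());
  }
}
```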
It should meet the following requirements:
  • Applications should not require special knowledge of the platform or jpm 
  • Act, look, and feel native
  • Manage a local repository of jars (either apps or libs)
  • Allow (limited) use without being the admin or root
  • Install command line apps written in Java
  • Install services/daemons written in Java
  • Schedule services or apps in time
  • Monitor the Java services
  • Install as GUI app (with proper icon and clickable)
  • Install from the web
  • Support at least any system with a basic *nix shell, MacOS, and Windows
  • Allow others to contribute new platforms 
  • Support JAR signing
  • Full uninstall
  • Simple & small
A wise man would have given a deep sigh and spent his time on writing the scripts to run this on Linux. Since I am not a wise man and this is such a great frustration for me with Java, I incubated jpm this week. Given the short time, and some of the false directions I took, it is not yet ready for consumption. However, I am using it on Linux and MacOS. It is not yet ready for others to play with, but if there are people interested in adapting it to Windows and other environments then I am more than willing to talk if you have the time to work on this. (A student project? I can provide mentoring.) It is ASL 2.0 licensed and it will probably end up at bndtools.
Peter Kriens

Monday, April 2, 2012


Welcome to this blog about Software Simplexity. Since I started developing software in the late seventies I've been on a quest for better ways to do this job, mostly through chasing reuse. I have advocated objects since the early eighties and worked for the OSGi Alliance for a decade. The Alliance work was a fantastic adventure. We've been able to develop a novel architectural paradigm that over time will have a significant influence on our industry.

The last few years were spent adapting the technology for mainstream (Enterprise) developers to solve their problems. However, many of the problems we solved simply could not exist in the original, simple programming model. Since these additions made it a lot easier to use OSGi with legacy code, they created a certain amount of uneasiness in me. A feature for one is a conceptual burden for another; I like simple, which usually means small or concise.

Adapting OSGi to mainstream development is therefore not my ambition. Over the past few years I have actually come to the conclusion that there are simpler ways to write (enterprise) software than many use today. However, sitting in my ivory tower grumbling about these 'stupid' enterprise developers is not very satisfying either.

A few years ago I saw a business opportunity to work more closely with non-OSGi mainstream developers on modularity. This venture, which I'd like to keep confidential for the moment, requires a cloud based system. I know it would probably be wiser to select some existing (web) framework and work from there, but I like to build things myself. Especially since there are a number of things happening in the industry that seem to fundamentally change the architecture of systems: the browser becoming a (very) rich client, cloud based systems, and of course OSGi maturing. Though I could go off the radar and work on this in secrecy, I've decided to be brave and record my progress (and likely struggles) in this blog.

So if you're interested in seeing my progress and struggles, follow this blog! If you have feedback, do not hesitate to respond with comments. Just one thing: I am not interested in big frameworks, though pointers to cohesive (small) libraries are always welcome.

Peter Kriens