Wednesday, July 23, 2003

Object Relational Mapping (again)

So my ORM system was conceptually validated again yesterday, when I implemented the 'ledger manager' part of the TSG Office Assistant. It was really nice to simply inherit an object that understands the basic database primitives required for business logic, and completely avoid writing SQL in the presentation layer. It isn't quite model/view - I didn't formalize it to that extent - but it is nicely stratified.
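
To make the "inherit an object that understands the basic database primitives" idea concrete, here is a minimal sketch in Python with SQLite. It is not the code from the TSG Office Assistant; the `Record` base class, the `LedgerEntry` subclass and the `ledger` table are all hypothetical names invented for illustration. The point is only that the subclass, and anything above it, never touches SQL.

```python
import sqlite3


class Record:
    """Hypothetical base class: subclasses name a table and its columns."""

    table = ""      # set by subclasses
    columns = ()    # non-key column names, set by subclasses

    def __init__(self, conn, **values):
        self.conn = conn
        self.id = None  # None means "not stored yet"
        for name in self.columns:
            setattr(self, name, values.get(name))

    def save(self):
        """Insert the row if it is new, otherwise update it."""
        values = [getattr(self, name) for name in self.columns]
        if self.id is None:
            marks = ", ".join("?" for _ in self.columns)
            cur = self.conn.execute(
                f"INSERT INTO {self.table} ({', '.join(self.columns)}) VALUES ({marks})",
                values,
            )
            self.id = cur.lastrowid
        else:
            sets = ", ".join(f"{name} = ?" for name in self.columns)
            self.conn.execute(
                f"UPDATE {self.table} SET {sets} WHERE id = ?", values + [self.id]
            )
        self.conn.commit()

    @classmethod
    def load(cls, conn, record_id):
        """Fetch one row by primary key, or return None if it is missing."""
        row = conn.execute(
            f"SELECT {', '.join(cls.columns)} FROM {cls.table} WHERE id = ?",
            (record_id,),
        ).fetchone()
        if row is None:
            return None
        obj = cls(conn, **dict(zip(cls.columns, row)))
        obj.id = record_id
        return obj


class LedgerEntry(Record):
    """Business object: no SQL here, only ledger rules."""

    table = "ledger"
    columns = ("description", "amount")

    def is_credit(self):
        return self.amount > 0


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ledger (id INTEGER PRIMARY KEY, description TEXT, amount REAL)"
    )
    entry = LedgerEntry(conn, description="Invoice 1", amount=120.0)
    entry.save()          # INSERT
    entry.amount = 150.0
    entry.save()          # UPDATE
    return LedgerEntry.load(conn, entry.id)
```

Calling `demo()` stores an entry, changes it, and reads it back through the same base-class primitives; a presentation layer would only ever see `LedgerEntry` objects.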


I did some searching around for other object-relational systems for .NET, preferably a lot more advanced than mine - considering only free software (both as in beer and as in speech). I found two on SourceForge. NHibernate appears to be a dead project, largely because the designers tried to copy a mature Java project class-by-class, rather than realising that the .NET Framework and the Java libraries work very differently in some cases. It does serve as a great example of database-agnosticism, though. (Hibernate for Java appears to be a pretty impressive ORM system, although I think that I might get angry using it; it tries to abstract away all of the little databasey details such as when to cache and when to commit to disk!). A more promising contender - at least in that it is still alive - is OJB.NET. This is based on OJB/Java, part of Apache. It exhibits some very nice design, including transactionality (with explicit commit), not saving until you mark an object as dirty, and caching. It also features some horrible database code (OLE.net only!), but the developers say this is due for fixing in a later release (it is still very pre-alpha). The XML mapping between tables and classes isn't bad, but it looks like it might add a bit more overhead than I would like. Definitely a project to watch!
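
The three design points I liked (explicit commit, nothing saved until an object is marked dirty, and caching) fit together roughly as in the sketch below. This is my own illustration in Python with SQLite, not the API of either project; the `Session` class, its methods and the `item` table are hypothetical.

```python
import sqlite3


class Session:
    """Hypothetical unit of work: caches loaded rows and writes only dirty ones."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}     # id -> dict of column values (identity cache)
        self.dirty = set()  # ids explicitly marked as changed
        self.writes = 0     # counts UPDATE statements, for illustration only

    def get(self, item_id):
        """Return the cached row if present, otherwise load it once."""
        if item_id not in self.cache:
            row = self.conn.execute(
                "SELECT name, price FROM item WHERE id = ?", (item_id,)
            ).fetchone()
            if row is None:
                raise KeyError(item_id)
            self.cache[item_id] = {"id": item_id, "name": row[0], "price": row[1]}
        return self.cache[item_id]

    def mark_dirty(self, item):
        """Nothing is saved unless the caller flags the object."""
        self.dirty.add(item["id"])

    def commit(self):
        """Write every dirty object in one transaction, then clear the flags."""
        with self.conn:  # commits on success, rolls back on an exception
            for item_id in self.dirty:
                item = self.cache[item_id]
                self.conn.execute(
                    "UPDATE item SET name = ?, price = ? WHERE id = ?",
                    (item["name"], item["price"], item_id),
                )
                self.writes += 1
        self.dirty.clear()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    conn.execute("INSERT INTO item (id, name, price) VALUES (1, 'widget', 2.5)")
    conn.execute("INSERT INTO item (id, name, price) VALUES (2, 'gadget', 9.0)")
    conn.commit()

    session = Session(conn)
    widget = session.get(1)
    gadget = session.get(2)
    same_widget = session.get(1)  # served from the cache, same object
    widget["price"] = 3.0
    session.mark_dirty(widget)
    gadget["price"] = 10.0        # changed in memory but never marked dirty
    session.commit()

    prices = dict(conn.execute("SELECT id, price FROM item").fetchall())
    return widget is same_widget, session.writes, prices
```

In `demo()`, only the widget reaches the database: the gadget was modified but never flagged, so the commit issues a single UPDATE. Whether requiring an explicit dirty flag is a feature or a trap is a design choice; it keeps writes predictable at the cost of trusting the caller to remember it.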


This got me thinking about ORM in general. It seems to me that in a traditional n-tier system, several tiers are all struggling to gold-plate their job and take over - and fuzzy thinking has allowed this to happen. Looking at a typical 3-tier system:


  • The Database Tier handles storage. At this level, you want normalized data, formalized set theory to ensure referential integrity, and pure storage worries (replication, etc.). You may also want triggers to help keep everything in order (not strictly necessary if you implement referential integrity correctly), and stored procedures to ease and speed up data access. In other words, just storage and related worries. (This should itself be broken into physical and logical storage, since the two are separate; fortunately, the DBMS should worry about physical for you!).
  • Business Logic Tier. This tier typically needs code to talk to the database tier (preferably in an agnostic way in case the physical medium changes), code to talk to applications, and lots of objects encapsulating business procedures. Lots of safety net code is a good idea here, too, since apps programmers can and will break things!
  • Application Tier. At this level, you worry about things like displaying data, having a user do stuff with it, and then sending the results back (via the business logic tier). Typically, you need a means of talking to the business logic tier, and lots of UI code.


The 3-tier model above makes a great deal of sense. It separates out three very different types of problem. So far, so good. Unfortunately, vendors just don't get it - and seem to be working pretty hard to make it easy to break this mold. For example:


  • Oracle can run fully-fledged Java in the DB server; they even advertise that the database can "help your business logic layer". Likewise, SQL Server will soon be able to host CLR programs. MySQL - barely an RDBMS anyway - can already run C code locally.
  • On the business-logic level, you need to resort to third-party items for truly seamless Object-Relational Mapping - or you need to waste scads of time writing plumbing on every project (in other words, the language vendors don't properly support the model they espouse, maybe because they want to sell bigger databases/database servers!). Worse yet, many business-logic-level applications become concerned with physical storage, particularly caching systems. Even worse are "object stores" designed to avoid having a relational database at all, save as a unit for storing BLOBs (binary large objects) holding serialized class data. You aren't going to get any benefit at all from your RDBMS if you don't let it do what it's good at!
  • On the application level, the sins are countless. .NET offers some really nice platform-agnostic data handling - and then plugs it directly into user interface objects! You can wrangle it to require separation, but I've seen so many projects - particularly ASP projects - that embed some of the business logic IN the display logic that it isn't even funny. (PHP, ASP and similar scripting languages are particularly prone to this). Also, there needs to be a way to have the compiler shoot a programmer who needs a quick query from the database - but doesn't want to go through all the tiers to get it - and decides to embed a direct statement in the display logic.


All of the above problems can be avoided by avoiding fuzzy thinking, and applying some discipline to development. Everyone has made at least one of these errors (myself included), and it is really easy to make them over and over. Vendors screaming and shouting about their latest solution to a nonexistent problem (i.e. a way to break a rational system by offering shortcuts) certainly don't help. (The general disdain for applying scientific method to business computing doesn't help, either!)

What seems to be needed is an easy way to create formally-correct tiers from a logically-correct data representation. Ideally, I would be able to create a logical representation of the data I wish to store - and it would be created in an RDBMS (with full integrity constraints), skeleton object mapping code would be created for the business logic tier, and an easy way to expose objects to apps would be presented to me. Oh, and if the database changes - as we add more requirements (reqs. are never static in the real world!) - I want it to update the framework without (substantially) breaking higher levels of the system. I can do all of this with separate tools and much time/effort - why isn't there a one-stop shop yet? Am I asking too much?
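
For what it's worth, the first two wishes are small enough to sketch. The Python below takes a single logical description and derives both the tables (with foreign keys) and skeleton classes from it. Everything here is a toy under stated assumptions: the `MODEL` dictionary format, the `customer`/`invoice` entities and the function names are hypothetical, SQLite stands in for the RDBMS, and it does nothing about the hard part, which is updating an existing schema when the model changes.

```python
import sqlite3

# Hypothetical logical model: table -> {column: (SQL type, referenced table or None)}
MODEL = {
    "customer": {"name": ("TEXT NOT NULL", None)},
    "invoice": {
        "customer_id": ("INTEGER NOT NULL", "customer"),
        "total": ("REAL NOT NULL", None),
    },
}


def to_ddl(model):
    """Emit one CREATE TABLE statement per entity, with foreign keys."""
    statements = []
    for table, columns in model.items():
        parts = ["id INTEGER PRIMARY KEY"]
        for column, (sql_type, ref) in columns.items():
            clause = f"{column} {sql_type}"
            if ref is not None:
                clause += f" REFERENCES {ref}(id)"
            parts.append(clause)
        statements.append(f"CREATE TABLE {table} ({', '.join(parts)})")
    return statements


def to_class_source(table, columns):
    """Emit skeleton source for a business-tier class mirroring one table."""
    name = table.capitalize()
    args = ", ".join(columns)
    lines = [
        f"class {name}:",
        f"    def __init__(self, {args}, id=None):",
        "        self.id = id",
    ]
    lines += [f"        self.{column} = {column}" for column in columns]
    return "\n".join(lines) + "\n"


def build(model):
    """Create the schema in an in-memory database and compile the skeletons."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce them by default
    for statement in to_ddl(model):
        conn.execute(statement)
    namespace = {}
    for table, columns in model.items():
        exec(to_class_source(table, columns), namespace)  # define the generated class
    return conn, namespace
```

`build(MODEL)` returns a connection whose schema rejects an invoice pointing at a missing customer, plus generated `Customer` and `Invoice` classes. A real tool would write the generated source to files and diff the model against the live schema rather than starting from an empty database each time.
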
Mood: restless
Music: Robert Plant - Tie Dye On The Highway
