Expresso 6.0 Long Range Roadmap

This document presents the proposed feature direction of Expresso for discussion purposes. Some of it is incremental, some of it is sweeping. It contains a high-level overview of the development work planned long term for Expresso. By its very nature, this information will change as new technologies and standards become available, so check back about quarterly for updates.

Version:

Expresso 6.x

Author:

Michael Rimov

Introduction

Expresso 4 had its controller API refactored as part of its integration with Struts. Since then, performance has been way up and the new implementation has proven itself flexible and fast. I propose that by and large we keep the controller API as is.

Some other areas of Expresso are due for a refactoring to reflect technology changes.

While 5.0 offers additional functionality and improvements to the data access and security layers of Expresso, it is desirable to do some refactoring that requires a major release. Here are a few examples of the current limitations:

  • Not easy to integrate with EJBs
  • DBObjects have become one monolithic class that could be broken down into smaller pieces. Maintaining this class has become difficult.
  • Not easy to integrate with non-JDBC data sources.
  • Limited LDAP support
  • No JAAS support as of yet.

So, here's what is proposed:

Revised Security API

The basic goal here is to keep as much as possible of the idea that Shash has started and implement all roles and users as interfaces that can be easily extended. Here are some of the features that should be considered:

  • JAAS Integration. The number one design goal will be to integrate with JAAS. This will allow easy communication with any J2EE servers, as well as other authorization containers.
  • Database Independence. Currently there are still a lot of database dependencies within the Expresso security system. We need to allow the Security Module to manage security with or without a database. Does a person want to plug in an XML-based security module where all the roles and permissions are defined in an XML file? Fine. Expresso won't attempt to implement all these ideas itself, but we will provide a pluggable system that anybody can extend as they wish.
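To make the "roles and users as interfaces" idea concrete, here is a minimal sketch of what a pluggable security back end might look like. All of the type names (SecurityUser, SecurityProvider, InMemorySecurityProvider, SecurityChecker) are hypothetical and are not part of any existing Expresso API; JAAS integration is deliberately left out of the sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical user abstraction; not an existing Expresso type. */
interface SecurityUser {
    String getLoginName();
    Set<String> getRoleNames();
}

/** Hypothetical pluggable back end: database, LDAP, XML file, and so on. */
interface SecurityProvider {
    /** Returns the user, or null if the login name is unknown. */
    SecurityUser findUser(String loginName);
}

/** In-memory provider standing in for an XML- or database-backed one. */
class InMemorySecurityProvider implements SecurityProvider {
    private final Map<String, Set<String>> rolesByUser = new HashMap<String, Set<String>>();

    void addUser(String loginName, String... roles) {
        Set<String> roleSet = new HashSet<String>();
        Collections.addAll(roleSet, roles);
        rolesByUser.put(loginName, roleSet);
    }

    public SecurityUser findUser(final String loginName) {
        final Set<String> roles = rolesByUser.get(loginName);
        if (roles == null) {
            return null;
        }
        return new SecurityUser() {
            public String getLoginName() { return loginName; }
            public Set<String> getRoleNames() { return Collections.unmodifiableSet(roles); }
        };
    }
}

/** Application code depends only on the interfaces, never on the storage. */
class SecurityChecker {
    private final SecurityProvider provider;

    SecurityChecker(SecurityProvider provider) {
        this.provider = provider;
    }

    boolean isUserInRole(String loginName, String role) {
        SecurityUser user = provider.findUser(loginName);
        return user != null && user.getRoleNames().contains(role);
    }
}
```

Swapping the in-memory provider for a database or XML implementation would not change SecurityChecker at all, which is the point of the pluggable design.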

Revised Data Access API

So here are the basic goals:

  • Scalability. We need to be able to deal with systems that can scale to multiple servers both on the web end and lower layers.
  • Multi-Datasource Aware: We need to be able to mesh with JDBC datasources, JDO systems, as well as EJB, and JMS datasources.
  • High Performance: We don't want to sacrifice performance for the single-server setup that is the vast majority of Expresso's current user base.
  • Ease of APIs: Ideally, the system would offer a basic set of APIs that let a programmer get a working system with very few lines of code. However, we also want another layer of APIs that lets a programmer tune for high performance according to their deployment. (It should be noted that we want decent performance even with the basic set of APIs.)
  • Concurrency Control: Currently Expresso DBObjects have no capability to deal with concurrent updates and the potential conflicts arising from them. Although this may be no problem in heavily transacted systems, it becomes more of a problem within widely distributed applications.
  • Take the load off the databases: Commercial database systems are expensive and can easily get bogged down. Adding more database hardware or replicating databases may only be a partial solution, and can easily cost upwards of millions of dollars for many companies. Being able to scale using multiple, inexpensive machines is a highly desirable scalability trait.
  • Good loosely coupled module design: For our goals to work, we must design a system that has loose coupling, i.e., we can replace chunks of the system without seriously affecting other chunks.

DataObject Stack

Some of the most successful systems have been designed around the simple notion of "stacked layers." Probably the most notable example is the TCP/IP network stack. The premise is that each layer by and large only communicates with the layers above and below it. Each layer can be reimplemented as long as the boundaries between the layers are left undisturbed, thus providing loose coupling. So here's what the Expresso DataObject Stack would look like:

Here's a quick summary of what the job of each layer is:

  • Data Access Object Layer: The Data Access Object Layer will be the layer that each programmer communicates with. By and large it will be a "Dynabean" with the setField() and getField() methods that we have all come to know and love. It will have some additional fields that are automatically maintained to support concurrency, but by and large people should recognize it as a DBObject.
  • Cache: Before, the DBObjects would check a cache and go directly to the database if the object wasn't in the cache. In the new API, the DataObject tells the cache "I want such and such object". If the cache has it, it will produce it; if not, it will go to the next layer and ask it to get the object for the Cache. The goal is to have the Cache follow the JCache specification being developed in a JSR.
  • Messaging Middleware: This layer isn't necessary if you're dealing with only a single webserver. The middleware's goal is to keep the cache systems on all the different webservers (or whatever other systems) properly synchronized. Unlike before, where the Cache was passive, the Middleware layer keeps track of who has what objects and, every time another system updates the database, sends the new copy to the various caches that are interested. There can be more than one Messaging Middleware layer on a network, and they all communicate with one another. Usually the idea is that you'll have one Middleware for each 'subdivision' of an enterprise.
  • Transaction Layer: This can be JDBC transactions, entity EJBs, whatever you wish. This is the layer that controls the final writing to the database.
  • Conflict Manager: The job of this guy is to handle the problems that arise when multiple updates take place at once. The conflict manager's algorithm will be pluggable. Example options are: last record always overwrites, or merge records if the modified fields are different fields. Either way, the Conflict Manager will pass back to the system a "resolved" data object, or throw an Exception. All layers can hook into the Conflict Manager because it's much more efficient to detect and resolve a conflict as far up the stack as possible, but some conflicts may not be discovered until the Transaction Layer is reached.

Stack Detail

DataObjects

As said earlier, the DataObject will in some ways look similar to the good old DBObject we know and love. There are some key API differences, however:

  • The DataObject will essentially be a "dyna-bean" with the setField("fieldName") and getField("fieldName") capabilities that we've come to know and love about DBObjects. The data objects will also have basic operations on them: add, update, delete, and anything else that should be done on a single data object. The DataObject will be an interface instead of a concrete class. This will ease integration with the various data sources we're likely to encounter.
  • All functions that operate on multiple data objects for querying would be moved to a separate interface, for example setMaxRecords(), setOffset(), and searchAndRetrieve(). All these could be moved into a DataQuery interface. Some of them are very SQL specific, and will only be available in the JDBC implementation of the DataQuery interface.
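The split between single-object operations and multi-object queries could be sketched roughly as follows. The DataObject and DataQuery names come from the text above; the exact method signatures and the InMemoryTable helper are assumptions made for illustration only, and the operations here are synchronous purely to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Single-record operations only (signatures are assumptions). */
interface DataObject {
    void setField(String fieldName, Object value);
    Object getField(String fieldName);
    void add();
    void update();
    void delete();
}

/** Multi-record operations live in a separate interface. */
interface DataQuery {
    void setMaxRecords(int maxRecords);
    List<DataObject> searchAndRetrieve();
}

/** Hypothetical in-memory data source used only to exercise the interfaces. */
class InMemoryTable {
    private final List<DataObject> rows = new ArrayList<DataObject>();

    DataObject newObject() {
        return new Row();
    }

    DataQuery newQuery() {
        return new DataQuery() {
            private int maxRecords = 0; // 0 means "no limit" in this sketch

            public void setMaxRecords(int maxRecords) {
                this.maxRecords = maxRecords;
            }

            public List<DataObject> searchAndRetrieve() {
                int count = (maxRecords > 0) ? Math.min(maxRecords, rows.size()) : rows.size();
                return new ArrayList<DataObject>(rows.subList(0, count));
            }
        };
    }

    private class Row implements DataObject {
        private final Map<String, Object> fields = new HashMap<String, Object>();

        public void setField(String fieldName, Object value) { fields.put(fieldName, value); }
        public Object getField(String fieldName) { return fields.get(fieldName); }
        public void add() { rows.add(this); }
        public void update() { /* fields already live in memory; nothing to flush */ }
        public void delete() { rows.remove(this); }
    }
}
```

A JDBC-backed implementation could add SQL-specific methods to its own DataQuery subtype without forcing them on every other data source.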
The absolute biggest difference will be the following:

All DataObject Operations other than set/getField() are Asynchronous

To make this design goal clearer, here's a few questions and answers about it:

  • What benefits do you get from asynchronous data operations? Several, actually. First, from the data object integration standpoint, asynchronous communication allows for easy integration with JMS-based transaction systems. Second, there are many places where asynchronous communication can speed up execution on the webserver side. Take for example a job that simply updates a lot of records in the database. The job can issue a batch of update() calls, and it's done. Any special circumstances can be dealt with either through callbacks or through simple measures like reporting any conflicts to the system administrator. It's a contrived example, but why should a job server be bogged down for perhaps hours waiting for each add() or update() to return? And finally, if you have a reliable transport system, it doesn't matter if the database connection dies altogether. All modifications will be resolved when the database comes back online, and until then the Cache system will have the updated copy.
  • Won't asynchronous programming be much harder to work with? Not necessarily. If we do our job and provide the right constructs to work with, your job should be no more difficult than with synchronous programming. It truly is all a matter of design.
  • Won't I have to redo my entire program design because of this? This shouldn't be necessary. Again, this mainly depends on whether we do our design job correctly. An example of a construct that could make your life easier is a synchronous wrapper class that goes around any data object. This wrapper class would wait for the result of the record modification and throw an Exception if no result is obtained within X amount of time, which is exactly how things work today if a DBConnection dies while trying to do a record update. This is an example of our goal to provide simple wrappers and facades that allow a novice web programmer to get results without having to tackle tricky synchronization issues. Constructing a synchronous data wrapper isn't tough; it would be just like wrapping a map with Collections.synchronizedMap(new HashMap()).
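The synchronous wrapper described above might look something like the following sketch. It assumes a modern JDK with java.util.concurrent.CompletableFuture, which did not exist when this roadmap was written; AsyncDataObject, SynchronousDataObject, and DataAccessException are hypothetical names invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical asynchronous data object: update() returns immediately. */
interface AsyncDataObject {
    CompletableFuture<Void> update();
}

/** Hypothetical checked exception reported to synchronous callers. */
class DataAccessException extends Exception {
    DataAccessException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Blocks the caller until the asynchronous update finishes or times out. */
class SynchronousDataObject {
    private final AsyncDataObject delegate;
    private final long timeoutMillis;

    SynchronousDataObject(AsyncDataObject delegate, long timeoutMillis) {
        this.delegate = delegate;
        this.timeoutMillis = timeoutMillis;
    }

    void update() throws DataAccessException {
        try {
            delegate.update().get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new DataAccessException("update timed out after " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            // The asynchronous operation itself failed; surface its cause.
            throw new DataAccessException("update failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new DataAccessException("interrupted while waiting for update", e);
        }
    }
}
```

A caller that wants the old blocking behavior wraps its data object once and then calls update() as before; callers that want throughput use the asynchronous object directly.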

Other under-the-hood changes to a DataObject would be:

  • Before Modification Snapshot: Each DataObject will have a "pristine" copy of itself that is maintained until a transaction is complete. This helps us deal with concurrency, as per one of our goals. The conflict manager can look at this copy and decide what has been modified. The pristine field value copies would be constructed lazily, to save on memory.
  • Date Last Updated: If you're working with multiple machines that are time synchronized, this is a quick and CPU-efficient way of determining which modification came first. Again, this supports concurrent updates.
  • Last Modified By: This will contain the user credentials of the last person to update a particular object. This allows the system to figure out whom to notify if a person's modifications conflict with another concurrent update.
  • Aggregated Concrete Classes for transactional behavior: Somewhere along the line, we often need to be able to get to low-level stuff. This reduces system portability, but it is often necessary for the performance of particular features where the "lowest common denominator" approach to data source systems is insufficient. Each data object knows which underlying datasource it came from, and the programmer can get to that datasource and downcast to the appropriate classes. This is much the same as Servlet vs. HttpServlet, and Controller vs. Servlet Controller. These classes will still provide hooks into the rest of the system, so any special updates are still sent to the cache, etc.
  • Update Priority: The programmer of the system should be given the opportunity to define what priority this object should have if it is updated. For example, security updates probably have a greater priority to propagate throughout an enterprise and get written to a database than, say, "Updated Workforce Slogan of the Week".
  • Routing Information: If multiple middle tiers are used, then the system should have a quick way of figuring out which Middleware and which Transaction layer to route to when doing an update. Needless to say, this information would be null if we were dealing with only a single server. Again, we won't waste memory if we don't need it for a particular configuration.
Of course, as we all think about the problem domain more, we'll come up with other modifications that need to happen.
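As an illustration of the lazily built "pristine" snapshot, here is a minimal sketch. SnapshotDataObject and its method names are hypothetical; the only behavior taken from the text is that original values are copied on first modification and kept until the transaction completes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical data object that remembers original values only for fields that change. */
class SnapshotDataObject {
    private final Map<String, Object> current = new HashMap<String, Object>();
    // Created lazily on the first setField() call to save memory for read-only objects.
    private Map<String, Object> pristine;

    SnapshotDataObject(Map<String, Object> loadedValues) {
        current.putAll(loadedValues);
    }

    Object getField(String fieldName) {
        return current.get(fieldName);
    }

    void setField(String fieldName, Object value) {
        if (pristine == null) {
            pristine = new HashMap<String, Object>();
        }
        // Remember the value as loaded, but only the first time the field is touched.
        if (!pristine.containsKey(fieldName)) {
            pristine.put(fieldName, current.get(fieldName));
        }
        current.put(fieldName, value);
    }

    /** Value of the field as it was before any modification in this transaction. */
    Object getPristineField(String fieldName) {
        if (pristine != null && pristine.containsKey(fieldName)) {
            return pristine.get(fieldName);
        }
        return current.get(fieldName);
    }

    /** Fields whose current value differs from the pristine value. */
    Set<String> modifiedFields() {
        if (pristine == null) {
            return Collections.emptySet();
        }
        Set<String> changed = new HashSet<String>();
        for (Map.Entry<String, Object> entry : pristine.entrySet()) {
            if (!Objects.equals(entry.getValue(), current.get(entry.getKey()))) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    /** Drops the snapshot once the transaction has completed. */
    void transactionComplete() {
        pristine = null;
    }
}
```

A conflict manager could compare modifiedFields() from two concurrent updates to decide whether they touch the same fields.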

Cache

The cache layer will underlie the data object layers. Some of the details of the Cache will be:

  • Make the Cache compliant with the JCache standard being drafted. This is already based upon a working Object Cache module that Oracle is providing for its developers.
  • The Cache Module will have both memory and disk based caches. The sizes for these will be configurable to tune system performance. The increased size will GREATLY improve data access performance since the number of round-trips to the database will be significantly reduced.
  • The Cache Module is ACTIVE. This means that whenever it retrieves a copy of a DataObject from the lower layers, the lower layers now know that this Cache is interested in this data object. If another system updates the same data object, the cache receives the updated copy as well. When an object leaves the Cache, the Cache system notifies the lower layers that it is "unsubscribing", so to speak, and the lower layers will no longer notify this cache of updates. This immediately allows for multiple webservers with a data middle tier, but with the added advantage that data is still cached at each webserver.
  • The Cache Module checks for potential concurrency problems before passing updates to the lower tier and sends the offending records to the ConflictManager for resolution.
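The subscribe/unsubscribe behavior of an active cache can be sketched as follows. The LowerLayer and ActiveCache classes are hypothetical, hold plain strings instead of data objects, and are single-threaded for brevity; they do not use the JCache API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical lower layer that remembers which caches hold which keys. */
class LowerLayer {
    private final Map<String, String> store = new HashMap<String, String>();
    private final Map<String, Set<ActiveCache>> subscribers = new HashMap<String, Set<ActiveCache>>();

    /** Fetching a value implicitly subscribes the requesting cache to it. */
    String fetch(String key, ActiveCache requester) {
        Set<ActiveCache> interested = subscribers.get(key);
        if (interested == null) {
            interested = new HashSet<ActiveCache>();
            subscribers.put(key, interested);
        }
        interested.add(requester);
        return store.get(key);
    }

    void unsubscribe(String key, ActiveCache cache) {
        Set<ActiveCache> interested = subscribers.get(key);
        if (interested != null) {
            interested.remove(cache);
        }
    }

    /** Writes the value and pushes the new copy to every subscribed cache. */
    void update(String key, String value) {
        store.put(key, value);
        Set<ActiveCache> interested = subscribers.get(key);
        if (interested != null) {
            for (ActiveCache cache : interested) {
                cache.onUpdate(key, value);
            }
        }
    }
}

/** Hypothetical cache that receives pushed updates for the entries it holds. */
class ActiveCache {
    private final LowerLayer lower;
    private final Map<String, String> entries = new HashMap<String, String>();

    ActiveCache(LowerLayer lower) {
        this.lower = lower;
    }

    String get(String key) {
        if (!entries.containsKey(key)) {
            entries.put(key, lower.fetch(key, this));
        }
        return entries.get(key);
    }

    /** Evicting an entry tells the lower layer to stop sending updates for it. */
    void evict(String key) {
        if (entries.containsKey(key)) {
            entries.remove(key);
            lower.unsubscribe(key, this);
        }
    }

    void onUpdate(String key, String value) {
        entries.put(key, value);
    }

    boolean holds(String key) {
        return entries.containsKey(key);
    }
}
```

In a multi-server deployment the direct method calls between the two classes would be replaced by whatever transport sits between the layers.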

Messaging Middleware

This "layer" is responsible for queueing up requests to the back-end database, as well as routing data object updates to other Middleware layers if those are responsible for the particular data object. (See the routing information described in the Data Object layer.) Priority queues and multiple threads will be used for:

  • Dispatching updates to the appropriate Caches and other middleware servers.
  • Dispatching updates to the underlying data layer.

Other properties include:

  • Concurrency conflicts. Sometimes the Cache layer won't be able to tell that a data object modification is conflicting, due to routing time differences. The Messaging tier will check for potential conflicts before dispatching updates.
  • Each Middleware tier is only responsible for figuring out concurrency conflicts with its "own" data objects. This prevents multiple systems from "rejecting" posts that should otherwise have taken place.
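One way to picture the priority queues mentioned above is the following sketch, built on java.util.concurrent.PriorityBlockingQueue. UpdateMessage and UpdateDispatcher are hypothetical names, and the convention that a larger number means a more urgent update is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical update message carrying the object's update priority. */
class UpdateMessage implements Comparable<UpdateMessage> {
    private static final AtomicLong SEQUENCE = new AtomicLong();

    final String objectKey;
    final int priority; // assumption: larger value = more urgent
    private final long sequence = SEQUENCE.getAndIncrement();

    UpdateMessage(String objectKey, int priority) {
        this.objectKey = objectKey;
        this.priority = priority;
    }

    public int compareTo(UpdateMessage other) {
        if (priority != other.priority) {
            // Higher priority sorts first, so it reaches the head of the queue.
            return Integer.compare(other.priority, priority);
        }
        // Equal priority: keep submission order.
        return Long.compare(sequence, other.sequence);
    }
}

/** Hypothetical dispatcher; a real one would drain the queue from worker threads. */
class UpdateDispatcher {
    private final PriorityBlockingQueue<UpdateMessage> queue =
            new PriorityBlockingQueue<UpdateMessage>();

    void submit(UpdateMessage message) {
        queue.put(message);
    }

    /** Removes all queued messages and returns their keys in dispatch order. */
    List<String> drain() {
        List<String> order = new ArrayList<String>();
        UpdateMessage message;
        while ((message = queue.poll()) != null) {
            order.add(message.objectKey);
        }
        return order;
    }
}
```

With this ordering a security update submitted after a low-priority update is still dispatched first, while updates of equal priority keep their arrival order.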

Transactional Layer

This layer will do the actual "dirty work". It will be responsible for transactional integrity, and can be programmatically customized. Specifics include:

  • The actual custom behavior of data objects will be used here. Probably this can be encapsulated in an inner class to better define the custom behaviors, but I'm unclear how best to keep "extra classes clutter" out of the situation and still define encapsulations.
  • This layer can speak to entity EJBs, JDBC recordsets, or whatever you need. The current DBConnectionPool would reside at this layer, for example.
  • If JNDI datasources and JTA managers are used, this layer can interact with the transaction architecture this way.

Conflict Manager

The conflict manager will have to choose the appropriate algorithm to apply to each particular data object and see if it can resolve any conflicts or if it has to deny a data object modification. Examples are:

  • In a bank account situation, if two withdrawals happen at the same time, then the conflict manager would apply BOTH withdrawals and perhaps add a penalty to boot if the resulting bank balance is less than zero.
  • In a normal personnel situation, the conflict manager could attempt to merge any fields. If field A was modified by person A and field B was modified by person B, then the manager would just merge the two changes, and we're set. If both person A and person B modified the same field, then the transaction would proceed with the first record, and the second record that arrived would be rejected.
  • In a strict situation, no concurrent updates are acceptable: if any records conflict at all, reject the changes.
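The field-merge option could be sketched as a pluggable strategy like the one below. ConflictStrategy, FieldMergeStrategy, and ConflictException are hypothetical names, records are modeled as plain maps, and treating two identical changes to the same field as compatible is a choice made for this example rather than something the proposal specifies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical exception signalling that the second update must be rejected. */
class ConflictException extends Exception {
    ConflictException(String message) {
        super(message);
    }
}

/** Hypothetical pluggable algorithm used by the conflict manager. */
interface ConflictStrategy {
    Map<String, Object> resolve(Map<String, Object> pristine,
                                Map<String, Object> first,
                                Map<String, Object> second) throws ConflictException;
}

/** Merges two updates when they changed different fields; otherwise rejects the second. */
class FieldMergeStrategy implements ConflictStrategy {
    public Map<String, Object> resolve(Map<String, Object> pristine,
                                       Map<String, Object> first,
                                       Map<String, Object> second) throws ConflictException {
        // Start from the first update, which always proceeds.
        Map<String, Object> merged = new HashMap<String, Object>(first);
        for (Map.Entry<String, Object> entry : second.entrySet()) {
            String field = entry.getKey();
            Object original = pristine.get(field);
            boolean secondChanged = !Objects.equals(original, entry.getValue());
            if (!secondChanged) {
                continue; // the second update left this field alone
            }
            boolean firstChanged = !Objects.equals(original, first.get(field));
            if (firstChanged && !Objects.equals(first.get(field), entry.getValue())) {
                throw new ConflictException("both updates modified field '" + field + "'");
            }
            merged.put(field, entry.getValue());
        }
        return merged;
    }
}
```

Other strategies, such as "last record always overwrites" or "reject on any conflict", would implement the same interface and could be chosen per data object.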

Other Details

Boundaries

The border between each layer on the data object stack is a boundary. The data object must somehow be transmitted to the next layer. The boundaries are pluggable as well. Here are some examples:

  • In a single-webserver system, the boundaries would simply call the next class and hand the data objects to the next layer without any serialization taking place. This will result in a nice, high-speed system. (Actually much faster than local entity beans, since those only bypass the RMI layer.)
  • In a multiple-tier system, JMS would be the ideal way to communicate all the messages between layers. This is especially feasible since the data objects themselves are asynchronous.
  • For systems needing a more open approach, the data objects could be serialized to an XML document or document fragment, so that the underlying system (whatever the person desires to implement) could transport the data in a portable way.
  • For systems needing maximum open-endedness, the XML boundary could have an XSLT Translet added to it to "massage" the data into a format that the rest of the enterprise can speak.

Finally, the boundaries can be chosen individually. For example, you can have an in-memory boundary between the DataObject and the Cache, a JMS boundary between the Cache and the Middleware, and an XML boundary between the Middleware and the Transaction layer. This enables easy separation of work between machines if scalability is required.
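The pluggable boundary idea can be illustrated with a small sketch. Layer, Boundary, InMemoryBoundary, and SerializingBoundary are hypothetical names; the serializing variant uses plain Java serialization as a stand-in for a JMS or XML transport, which the sketch does not attempt to model.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical receiving side of a boundary. */
interface Layer {
    void receive(Serializable message);
}

/** Hypothetical boundary: the only way one layer hands data to the next. */
interface Boundary {
    void send(Serializable message);
}

/** Single-server case: a direct method call, no copying, no serialization. */
class InMemoryBoundary implements Boundary {
    private final Layer next;

    InMemoryBoundary(Layer next) {
        this.next = next;
    }

    public void send(Serializable message) {
        next.receive(message);
    }
}

/** Multi-tier stand-in: the message is turned into bytes and rebuilt on the other side. */
class SerializingBoundary implements Boundary {
    private final Layer next;

    SerializingBoundary(Layer next) {
        this.next = next;
    }

    public void send(Serializable message) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(message);
            out.close();
            // In a real deployment these bytes would travel over JMS, a socket, etc.
            ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            next.receive((Serializable) in.readObject());
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not transport message", e);
        }
    }
}
```

Because the sending layer only sees the Boundary interface, moving a layer to another machine becomes a configuration change rather than a code change.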

Exception Propagation

All data passed between layers is simply an "Encapsulated" message. Any Exceptions are just that, another Message that is bundled and passed back up the stack to the Web Application for appropriate handling. The system should be flexible enough so that Exceptions can be routed to a special machine for handling as well.
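A minimal sketch of "an exception is just another message" might look like this. StackMessage, MessageHandler, and ExceptionCapturingLayer are hypothetical names, and only unchecked exceptions are captured to keep the example short.

```java
/** Hypothetical envelope: carries either a payload or an error, never both. */
class StackMessage {
    private final Object payload;
    private final Exception error;

    private StackMessage(Object payload, Exception error) {
        this.payload = payload;
        this.error = error;
    }

    static StackMessage ofPayload(Object payload) {
        return new StackMessage(payload, null);
    }

    static StackMessage ofError(Exception error) {
        return new StackMessage(null, error);
    }

    boolean isError() { return error != null; }
    Object getPayload() { return payload; }
    Exception getError() { return error; }
}

/** Hypothetical layer contract: take a request message, return a reply message. */
interface MessageHandler {
    StackMessage handle(StackMessage request);
}

/** Wraps a lower layer so that anything it throws travels upward as a message. */
class ExceptionCapturingLayer implements MessageHandler {
    private final MessageHandler lower;

    ExceptionCapturingLayer(MessageHandler lower) {
        this.lower = lower;
    }

    public StackMessage handle(StackMessage request) {
        try {
            return lower.handle(request);
        } catch (RuntimeException e) {
            // Bundle the failure so it can be routed like any other message.
            return StackMessage.ofError(e);
        }
    }
}
```

Since the error is ordinary data at this point, it can be passed back up the stack or forwarded to a dedicated machine through the same boundaries as any other message.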

General Implementation Methodology

Instead of directly continuing Expresso 5 by tearing it down and rebuilding it while everybody waits for bug fixes, we will create a special Expresso 6 module that will eventually take Expresso 5's spot when it's ready. Until then, maintenance work will continue on Expresso 5 so that everybody can continue developing their commercial products on top of Expresso 5.

As each part becomes functional, we will publish a public review release of that particular functionality. Of course, at first, it will look nothing like a full-blown Expresso application. It will initially be just some unit tests to prove that what we created works and to provide a usage example. Once it seems that the API has stabilized sufficiently, and we know that the release date is on the horizon, we will cease work on Expresso 5 (except for completely critical bugs), and flesh out Expresso 6 until it is ready to be released.

Conclusion

This of course is only a high-level overview and will be fleshed out in further design sessions via the community process. The 6.0 plans are, to say the very least, a daring undertaking. The Expresso core team's goal is to reduce risk by still maintaining the Expresso 5 series while we work on the Expresso 6 API.

Given the current climate of requiring J2EE compatibility as well as the need for large enterprises to have a framework that will work well for them, I believe such a refactoring will give Expresso a definite edge for the Java developer. Please email or post to the listserv with any comments and additions about this proposal.

Please feel free to email the author or the opensource mailing list with any comments, ideas and/or suggestions.

Copyright © 2002 Jcorporate Ltd. All rights reserved.

Last Modified: 08-Oct-02 6:08:50 PM