These paradigm changes have greatly increased
my power to express program logic, such that my programs have gotten smaller,
simpler, and much easier to understand, while supporting ever-increasing
user capabilities. When I started programming, I worked with simple command-line
interfaces and text-based “green screens.” Next I produced “fat-client” graphical
user interfaces, and now I work on Web-enabled user interfaces. Again, each
paradigm switch has greatly increased user power, flexibility, and ease of
use while the code required to produce the interfaces has decreased and is
much simpler to understand.
Data Storage and Retrieval Problems
Unfortunately, I haven’t seen the same kinds of advances in data retrieval
and storage. In fact, I think we’ve declined in that area as an increasing
number of data source/data sink technologies such as XML, guaranteed messaging,
and directory services have come into mainstream development. Besides the
user interface, of course, relational databases used to be the sole data
source/sink technology I dealt with. As such, programming environments of
the recent past provided first-class support for them. My PowerBuilder and Oracle
Forms developer friends have extolled the virtues of these environments over
the somewhat primitive JDBC support in Java. My only defense has been the
promise of reusable logic in my Java objects that transcend the hard-coded
data mappings between PowerBuilder or Oracle Forms screens and the database.
Unfortunately, it takes a great deal of JDBC code to map the data involved
in the Java objects to the database. Add XML documents, queued messages,
and LDAP directories to the mix, and things get even worse. Each of these
technologies requires a different Java API, a new learning curve, and a great
deal of code to implement. In a recent code survey at my workplace, I found
that over 50% of a major application was devoted to nothing but data retrieval,
translation, and storage. That left under 50% of the system to do the real
work, namely providing a user interface and logic to do something useful
with the data users provide us.
Another problem I encountered as I tried to modify and extend the systems
at work was the hard-coded data mappings that proliferated throughout. I
couldn’t add inheritance hierarchies or new classes and relationships. Related
classes required inefficient secondary database queries as I moved from one
class to another. It was practically impossible to get the existing data
mapping code to recognize the need to instantiate the correct subclasses
of an object in a class hierarchy as instances were read from a data source.
My most discouraging finding of all was the large number of critical
defects in the data storage, manipulation, and retrieval code. Transaction
boundaries were placed with little care, which invited data integrity problems
whenever the system ran under less-than-ideal conditions.
Resources like database connections, statements, and result sets were not
being freed correctly, resulting in problems as the application ran over
an extended period of time. When processing message queues, the code was
committing transactions to the database without a synchronization strategy,
such as a two-phase commit, to properly remove messages in the same unit
of work. XML documents were not being parsed or generated in an “extensible”
way, thus eliminating the crucial X in XML.
To solve the problem, I started looking into new Java technologies and
APIs like XML data binding, Java data objects, and message-driven EJBs. Each
of these technologies had limitations as I tried to hook them up to the logic
in my application. Where should I put the logic for objects that crossed
data source/data sink technologies? For example, information for my Customer
class came in from both the user interface and a message queue, was created
or updated in the database, and output as XML documents to the user interface
or other enterprise systems. Pretty much every data mapping technology I
tried, including the more traditional commercial object/relational mapping
frameworks on the market, had either a heavy or exclusive bias to a particular
data source/sink technology. I was forced to create multiple Customer classes,
one per data source/sink technology (for example, DatabaseCustomer, XMLCustomer,
MessageCustomer). Then I’d either have to duplicate the application logic
concerned with processing a customer or I’d need to have one Customer class
with the logic and transformations to and from the other Customer classes.
Neither of these designs is object-oriented. In responsibility-driven design,
a Customer class shouldn’t have any logic in it to communicate with a data
storage or retrieval mechanism. Instead it should perform the responsibilities
of a Customer as abstracted from the problem domain. Other classes in the
system should be responsible for the data mapping.
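As a minimal sketch of that separation, the Customer class below carries only problem-domain behavior, and a separate mapper interface owns storage and retrieval. All names here (Customer, CustomerMapper, InMemoryCustomerMapper, canPurchase) are hypothetical illustrations and are not part of JLF.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Domain class: customer responsibilities only, with no JDBC, XML, or JMS code.
final class Customer {
    private final String id;
    private final String name;
    private final long creditLimitCents;

    Customer(String id, String name, long creditLimitCents) {
        this.id = id;
        this.name = name;
        this.creditLimitCents = creditLimitCents;
    }

    String getId() { return id; }
    String getName() { return name; }

    // A problem-domain rule; it knows nothing about where the data came from.
    boolean canPurchase(long amountCents) {
        return amountCents <= creditLimitCents;
    }
}

// The mapping responsibility sits behind an interface (hypothetical name),
// with one implementation per data source/sink technology.
interface CustomerMapper {
    void store(Customer customer);
    Optional<Customer> findById(String id);
}

// Stand-in implementation backed by a map. A JDBC or XML mapper would
// implement the same interface without any change to Customer.
final class InMemoryCustomerMapper implements CustomerMapper {
    private final Map<String, Customer> rows = new HashMap<>();

    public void store(Customer customer) {
        rows.put(customer.getId(), customer);
    }

    public Optional<Customer> findById(String id) {
        return Optional.ofNullable(rows.get(id));
    }
}
```

With this shape, adding a second data source/sink means adding a second CustomerMapper implementation rather than a second Customer class.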
JLF Prototype Data Mapping Framework
Being somewhat of a framework buff, I wondered if I could design a framework
that abstracted the dirty details of data source/sink technologies, but provided
much of the power and flexibility of the native JDBC, XML, JMS, and JNDI
APIs. I came up with the data mapping portion of an open source framework
called Java Layered Frameworks (JLF), located at http://jlf.sourceforge.net.
This framework works to minimize the amount of code in your application
needed to map your Java objects to any number of different data sources/sinks.
It also helps you execute complex mappings in a relatively efficient way.
For example, when using a JDBC data source/sink, JLF can help reduce the
number of SQL statements sent to the database, and it can cache relatively
static data so you don’t have to read the same data every time you use it.
JLF Data Mapping Overview
JLF is a set of layered frameworks designed to help Java application
developers create their applications more quickly and with less code. These frameworks
include the following capabilities:
1. Configuration framework
2. Logging framework
3. Utility library
4. Data mapping framework
5. HTTP request processing framework
The configuration framework basically initializes JLF by identifying
where property files are located. Java property files configure the operation
of the remainder of the frameworks in JLF, and the configuration framework
helps the other frameworks to find those property files.
The logging framework is an evolution of my JLog logging framework.
It helps to instrument events and log errors in your application so you can
detect and correct defects more quickly.
The utility library portion of JLF contains code that performs
some common coding tasks in Java. Examples include properly creating hash
values for complex objects and using the Reflection API.
The data mapping framework is the main framework in JLF and the
focus of this article. It’s designed to help you map data in your Java objects
to any number of different data source/sink technologies. Most of the capabilities
of the current version of the framework deal with the JDBC API, but JLF accommodates
other types of data sources and sinks as well (for example, output to XML
documents or input from servlets). It’s also extensible to fit any number
of other transactional or nontransactional data source/sink technologies.
The framework layers described above are shown graphically in Figure
1. Each layer shows, in parentheses, the Java package that implements it,
so you know which package to import in your code.
To use the data mapping framework in JLF, you must understand three
core concepts:
1. Data mapped objects: These are the Java classes you
create for your application. They hold the data you want to map to your data
source/sink.
2. Data mappers: The JLF framework provides these objects
for you to map your data to and from the data source/sink.
3. Data location property files: These are the Java property
files you create. They tell the data mappers how to map data between the
data mapped objects and the data source/sink.
All three concepts go hand-in-hand to accomplish data mapping. We’ll
now go through each concept in further detail.
Data Mapped Objects
Any Java classes that you want JLF to map to a data source/sink must
be subclasses of JLF’s DataMappedObject class. This class contains all the
core code to help you define and access variables, relationships, and inheritance
hierarchies, so the framework can map these for you. Instead of defining
instance variables in your object, define DataAttributeDescriptors. When
you want to create relationships between DataMappedObjects in your design,
create RelationshipDescriptors. If you have an inheritance hierarchy in your
DataMappedObject subclasses, create a hierarchy table so JLF can instantiate
the proper types of objects automatically. Figure 2 shows the primary classes
in the JLF framework you use to define your DataMappedObjects.
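To make the hierarchy table idea concrete, here is a rough sketch of one way such a table could work: a type code stored with each row selects the subclass to instantiate. This is an assumption made for illustration only. The HierarchyTable class, its register and instantiate methods, the type codes, and the Account classes are all hypothetical and do not reflect JLF's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// A small example hierarchy (hypothetical domain classes).
abstract class Account {
    abstract String describe();
}

final class SavingsAccount extends Account {
    String describe() { return "savings"; }
}

final class CheckingAccount extends Account {
    String describe() { return "checking"; }
}

// Hypothetical stand-in for a hierarchy table: maps a type code read from
// the data source to a factory for the matching subclass.
final class HierarchyTable<T> {
    private final Map<String, Supplier<? extends T>> factories = new HashMap<>();

    HierarchyTable<T> register(String typeCode, Supplier<? extends T> factory) {
        factories.put(typeCode, factory);
        return this;
    }

    // Creates an empty instance of the subclass registered for the type code.
    T instantiate(String typeCode) {
        Supplier<? extends T> factory = factories.get(typeCode);
        if (factory == null) {
            throw new IllegalArgumentException("unknown type code: " + typeCode);
        }
        return factory.get();
    }
}
```

A mapper reading rows could then call instantiate with each row's type code and populate the returned object, so callers receive a SavingsAccount or CheckingAccount rather than a generic Account.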
Once you’ve defined your DataMappedObject subclasses with the proper
attributes, relationships, and an optional hierarchy table, the data mapped
object framework goes to work. It creates DataAttribute and Relationship objects
as it maps data back and forth between your Java
objects and the database. These two classes of objects help the data mapping
framework coordinate the data flowing to and from the database.
DataAttributes are used to replace instance variables in your classes.
You may wonder why you can’t simply use instance variables like any other
JavaBean class would. The answer is twofold. DataAttributes help the data
mapping framework efficiently map the data to a database, and they also help
to do optimistic locking. In the first case, if you don’t change a value
in your object after it’s read from the database, there’s no need to send
an update SQL statement when you store your object back to the database.
Since you’ve made no change to the object, sending a SQL statement to the
database uses up database resources to change a row to the same values it
already contains. Not only would this consume precious database resources,
it would also delay application response time to the application user.
The data mapping framework, in the execution of an update() method,
first checks to see if anything has really changed in the object before it
executes the SQL update statement. If you use simple instance variables in
your design, the JLF data mapping framework would have a much more difficult
time discovering if you’ve updated your object. Second, the most efficient
way to use a database in a very high-volume transactional system is almost
always to use optimistic locking. To use this, execute a locking query before
you update or delete an object in the database. The locking query makes sure
another process hasn’t modified the object since you originally read it from
the database. One common way to do this locking query is to check the values
of the object in the database and make sure they haven’t changed since the
original query. With a simple instance variable in your objects, there’s
no initial value to do the locking query before you update the row with the
new value. DataAttributes keep the original value read from the database,
as well as the new value that you wish to change the object to.
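The two benefits described above can be sketched in a few lines. The example keeps both the original and the current value, skips the update when nothing changed, and otherwise builds an UPDATE whose WHERE clause checks the original value. TrackedValue and OptimisticUpdate are hypothetical names chosen for this illustration; JLF's DataAttribute has its own API, and this sketch ignores details such as NULL column values.

```java
import java.util.Objects;

// Hypothetical stand-in for the idea behind DataAttribute.
final class TrackedValue<T> {
    private final T original; // value as read from the data source
    private T current;        // value the application wants to store

    TrackedValue(T valueReadFromSource) {
        this.original = valueReadFromSource;
        this.current = valueReadFromSource;
    }

    T get() { return current; }
    T getOriginal() { return original; }
    void set(T newValue) { current = newValue; }

    // True only when the current value differs from the value that was read.
    boolean isChanged() {
        return !Objects.equals(original, current);
    }
}

final class OptimisticUpdate {
    // Returns null when nothing changed, so no SQL needs to be sent.
    // Otherwise returns an UPDATE that also matches on the original column
    // value; if another process changed the row, zero rows are affected.
    // Bind order would be: new value, key, original value.
    static String buildSql(String table, String keyColumn, String column,
                           TrackedValue<?> value) {
        if (!value.isChanged()) {
            return null;
        }
        return "UPDATE " + table + " SET " + column + " = ?"
                + " WHERE " + keyColumn + " = ? AND " + column + " = ?";
    }
}
```

A caller would treat an update count of zero as a signal that the row was modified or deleted since it was read, and handle the conflict accordingly.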
DataAttributes have different subclasses to help overcome the limitations
of Java native types. For example, Java string variables do not have a limit
on the number of characters you can store in them. When using a relational
database, you almost always define a maximum string length for any of the
character columns in your database. The StringAttribute subclass of DataAttribute
allows you to define and enforce a maximum string length. Use LongAttribute
for int and long variables, DoubleAttribute for float and double variables,
DateAttribute for Dates, and, of course, StringAttribute for strings.
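A length-limited string attribute might look roughly like the sketch below. BoundedString is a hypothetical class written for this illustration; the real StringAttribute may enforce its limit differently (for example, at a different point or with a different exception).

```java
// Hypothetical sketch of a string attribute with an enforced maximum length,
// mirroring a character column defined with a fixed width in the database.
final class BoundedString {
    private final int maxLength;
    private String value;

    BoundedString(int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must not be negative");
        }
        this.maxLength = maxLength;
    }

    // Rejects values that would not fit the column; null is allowed here.
    void set(String newValue) {
        if (newValue != null && newValue.length() > maxLength) {
            throw new IllegalArgumentException(
                    "value has " + newValue.length()
                    + " characters; limit is " + maxLength);
        }
        value = newValue;
    }

    String get() { return value; }
}
```

Failing fast in the setter surfaces the problem where the bad value is assigned instead of later, when the database rejects or truncates it.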
Relationship objects help you efficiently map related DataMappedObjects
to a database. They help to introduce different database mapping optimizations.
For example, you can use them when you deem it more efficient to use one
query to populate any number of related Java objects. On the other hand,
in cases where you rarely traverse a relationship, you don’t want to take
the time to populate the objects on the other side of the relationship until
you know you need them. Otherwise you’d be inefficiently pulling back large
quantities of unused data from the database. The data mapping framework uses
relationship objects to “lazy read,” or read on demand, such objects when
you deem that approach to be more efficient.
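The read-on-demand behavior can be sketched with a small wrapper that defers loading until the relationship is first traversed. LazyRelationship is a hypothetical name for this illustration and is not JLF's Relationship class. The loader stands in for whatever secondary query would fetch the related objects, and the sketch is not thread-safe.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a lazily read relationship.
final class LazyRelationship<T> {
    private final Supplier<List<T>> loader; // e.g. runs the secondary query
    private List<T> related;                // stays null until first traversal

    LazyRelationship(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    // Runs the loader on first access and reuses the result afterwards
    // (assuming the loader returns a non-null list).
    List<T> get() {
        if (related == null) {
            related = loader.get();
        }
        return related;
    }

    boolean isLoaded() {
        return related != null;
    }
}
```

If the application never calls get(), the loader never runs, so no query is issued for objects that are never used.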
Figure 3 shows how the DataAttribute and Relationship objects described
earlier work with DataMappedObjects.
Data Mappers
The data mapping framework uses a data mapping “plug-in” called a DataMapper.
DataMappers map objects to and from a particular data source/sink technology.
The goal behind the data mapping plug-in design is to hide the complexity
of mapping data to and from that technology. For example, say your Java application
needs to map data in its objects to a relational database using the JDBC
API, to XML documents using an XML-parsing API, from HTML input forms via
the Servlet API, and then send messages to queues using the JMS API. You’d
have to learn the complexities of four different and complex APIs to get
your work done. You’d also need to write a lot of code, as each API is different
and requires completely different code to execute the mapping.
The data mapping framework hides this complexity from you. The code
to map your objects to a relational database looks almost identical to the
code that maps your objects to an XML document or from the input parameters
of a servlet. The DataMapper plug-in deals with the appropriate Java API,
so under ideal circumstances your code has no technology-specific API code
in it. There will always be cases where the framework doesn’t do what you
need it to do when using, for example, the JDBCDataMapper. In those cases
you write a little bit of JDBC code and hopefully the JDBCDataMapper will
do the rest of the work for you. Data mappers in the JLF framework, including
the JDBCDataMapper, are shown in Figure 4.
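The plug-in idea can be illustrated with a deliberately tiny sketch: one interface, two interchangeable implementations, and calling code that looks the same for both. RecordMapper, XmlRecordMapper, and SqlInsertRecordMapper are hypothetical names invented for this example. JLF's DataMapper classes have a richer API and work with DataMappedObjects rather than plain maps.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical plug-in contract: turn a typed record into the target format.
interface RecordMapper {
    String write(String type, Map<String, String> fields);
}

// Renders the record as an XML element with one child element per field.
// Element names are used as given; only text content is escaped.
final class XmlRecordMapper implements RecordMapper {
    public String write(String type, Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<" + type + ">");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            xml.append('<').append(field.getKey()).append('>')
               .append(escape(field.getValue()))
               .append("</").append(field.getKey()).append('>');
        }
        return xml.append("</").append(type).append('>').toString();
    }

    private static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }
}

// Renders the same record as a parameterized SQL INSERT statement.
final class SqlInsertRecordMapper implements RecordMapper {
    public String write(String type, Map<String, String> fields) {
        String columns = String.join(", ", fields.keySet());
        String marks = String.join(", ",
                Collections.nCopies(fields.size(), "?"));
        return "INSERT INTO " + type + " (" + columns + ") VALUES (" + marks + ")";
    }
}
```

Application code depends only on RecordMapper, so switching from the database to XML output means choosing a different implementation, not rewriting the caller.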
Data Location Property Files
Each DataMapper looks to a property file for information on how to map
objects to the data source/sink technology it supports. These property files
are called data locations. They describe how to get to a particular
data location and map the data between Java objects and that location. To
open a connection to a JDBC data location, the data mapper needs information
such as the database URL, the appropriate JDBC driver, and perhaps a user
ID and password. Once the connection is established, the data mapper needs
to know which SQL statements to send to CRUD (create, read, update, delete)
the data. You also tell the data mapper how you want to efficiently map your
relationships – reading them in the same query as the original object, or
perhaps lazy reading them on demand. In a future article, I hope to explain
how each of the data mappers works to rid you of the burdens of data mapping
API code.
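To give a feel for what a data location might contain, the sketch below loads a small property file with java.util.Properties. The key names, the database URL, and the driver class are invented for illustration; JLF defines its own property names, which are not shown in this article.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class DataLocationDemo {
    // Hypothetical data location: connection details, one CRUD statement,
    // and a flag choosing lazy reading for a relationship.
    static final String SAMPLE = String.join("\n",
            "customerDb.url=jdbc:example://localhost/customers",
            "customerDb.driver=com.example.jdbc.Driver",
            "customerDb.Customer.read=SELECT id, name FROM customer WHERE id = ?",
            "customerDb.Customer.orders.lazyRead=true");

    // Parses property-file text into a Properties object.
    static Properties load(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }
}
```

Because the SQL and the connection details live in the property file, a change to the database design can often be absorbed there without touching the Java classes.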
Conclusion
Enterprise Java software developers, exercising due diligence in their
object design, have a difficult task at hand. Java’s APIs for dealing with
data sources and data sinks are quite different from technology to technology.
The JDBC, XML parsing, JNDI, and JMS APIs have only the Java programming
language in common. As a result, object designers typically hard-code the
data mapping between their Java classes and the data source/sink technology
they currently deal with. In most cases, this hard-coding is tedious, error-prone,
and takes quite a bit of code to carry out.
Inheritance hierarchies, involved in almost any nontrivial object design,
are typically abandoned because of data mapping difficulties. In addition,
if the data source/sink design changes, it has a direct impact on the Java
code (for example, the Java code is tightly coupled to the design of a database).
When the same Java class needs to communicate with another data source/sink
technology, it’s often easier to start from scratch rather than incorporate
a second data source/sink mapping into the current class.
The JLF data mapping framework tries to address all these problems by
separating the design of your Java classes from the mapping of them to and
from a data source/sink. JLF abstracts the details of executing different
technology mappings using data mappers. It provides default implementations
of JDBC, XML (currently write only), and servlet data mappers and is hopefully
extensible for you to add your own. This should leave you free to concentrate
on good object design instead of dealing with all of Java’s data mapping
APIs.