Version 5 (modified by gordonrachar, 15 years ago)

--

Mapping Databases to Each Other is Expensive

Status of this document: Working Draft

This document is open for feedback, please post questions and comments in the forum at the bottom of this page. You will need a login to post in the forum.


Contents

  1. Abstract
  2. Mapping In a Perfect World
  3. Mapping in the Real World
  4. A Custom Map
    1. Benefits of a Custom Mapping Solution
    2. Complication - The Real World is Complex
  5. A Confederation of Applications
    1. Benefits of a Confederation of Applications
    2. Complication - Redo Work Every Time
    3. Complication - Short Lifetime, High Maintenance
    4. Complication - Ossification
  6. Common Data Definitions
    1. Complications of Abstraction and Encapsulization
  7. Reuse
    1. Benefits of a Common Standard
    2. Complication - Who Minds the Dictionary?
  8. Industry Reference Data
  9. Summary
  10. Next

Abstract

When we want to exchange information between two software applications, the traditional way is to map the respective databases together. This is expensive and only done in a few high-value situations.

[Note 2008.10.27 - This page is an amalgamation of two pages. I'll clean it up as soon as possible. GPR]


We will introduce several new concepts that are fundamental principles of interoperability and integration. They directly relate to the cost of interoperability and integration. Understanding these is the key to reducing cost.

Conservation of complexityYou can't get away from complexity. You can move it from one place to another, but it will always be there somewhere
AbstractionRepresenting real objects in a general, abstract form
EncapsulationHiding complexity from users who really don't want to know any more
Semantic versus syntactic-
Semantic precision-
Semantic fidelity-
Reuse-

Mapping In a Perfect World

Let us say that you are an engineer working for a construction company planning the erection of some piping spools. Part of your job is to plan the hydrotesting of all the pipelines before commissioning the plant. So, along with a great deal of other information, you need to know the design temperatures and pressures of all the pipelines in the project. Fortunately, the design engineer has given you a line list, a list of all the pipe line numbers and their critical attributes.

Your first job, then, is to simply get the design engineer's line list into a form you are used to dealing with--your company's construction management software. You might decide to bite the bullet and just rekey the information manually. But this gets old really quickly and after a few dozen lines you will be wondering why you just can't download the data from the engineer's design software into your construction software automatically. After all, the engineer did not (in this day and age) have a secretary type the line list onto a piece of paper with a typewriter and fax it to you; it almost certainly exists in electronic form somewhere.

Now it is in the design engineer's best interest to make sure that the construction contractor, has the correct information, so it is easy to imagine that "your IT folks" and "their IT folks" get together to give you a connection over the internet to just "suck in the data." What are the chances of it being just as easy as Windows cut and paste?

"Artificial intelligence is the study of how to make real
 computers act like the ones in the movies."
   - Port 2000 Newsletter, The Information 
     Technology Newsletter for Port Washington Educators, Dec. 96

In a perfect world, everyone developing an application for some part of plant design would use the same column names, and ensure the meaning of the contents of the columns were the same. (In database jargon, the "schemas" would be the same.) If that were the case in the real world, mapping the contents of one application to another would be limited to the relatively "simple" technical aspects of server names, database names, permissions and firewalls.

Error: Macro Image(Mapping01.JPG, 700px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping01.JPG' does not exist.

Figure 1 - A Perfect World

Mapping in the Real World

But even in the rather small scope of a "line list", where there the "lexical scope" is, we think, only a few terms, there is ambiguity.

Error: Macro Image(Mapping02.JPG, 700px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping02.JPG' does not exist.

Figure 2 - Actual Situation

For instance, in Figure 2, two columns have names that do not match. One of them, "tag_no" in the Engineering Application is most probably equivalent to the "id" column in the Construction Application. But what about the column "cl" in the Construction Application? And, for that matter, what does "temp" really mean? Could it mean "Temporary"? It's possible. It is common to have temporary lines erected for commisioning, then demolished when a plant is put in production. And further, if you are being a bit paranoid (and if you have to sign off on the actual hydrotest pressure you better be at least a bit paranoid) what about "press"? It almost certainly means "pressure", but what kind of pressure?

  • Operating Pressure
  • Maximum Allowable Working Pressure
  • Design Pressure

If you have to specify the hydrotest pressure, you have to know for sure.

The reality is that we are inferring everything.

The only solution is to start digging. Fortunately, the lexical scope of a line list is only a dozen or so terms. To determine if "temp" means "temperature" or "temporary", you will have to look at a few rows of data. If the data values are "y/n", or "0/1", it points to "temporary". But if the data values are numbers in the range of, say "-50" to "1500", it probably means "temperature".

"press" will take a bit longer. You will have to look for clues in other documents and use your engineering judgement to determine which kind of pressure it is. You may have to take the time to contact the design engineer.

What looked fairly simple just a while ago is looking to be a bit more complex.

"There is always an easy solution to every human
 problem--neat, plausible, and wrong."
 - H. L. Mencken, 1917 (Variously attributed
   to Albert Einstein, Winston Churchill and others.)

A Custom Map

Error: Macro Image(Mapping03.JPG, 700px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping03.JPG' does not exist.

Figure 3 - A Custom Solution

If the line list were big enough it might warrent a custom solution. In this example someone has examined the data coming from the Engineering Application and determined which columns matched those of your Construction Application. You might simply dump the line list from the Engineering Application into a Windows Excel file, change the two column names, then read the file into your Construction Application. If the line list were subject to periodic change, you might even have a programmer write some software to do this automatically, perhaps once a week.

Benefits of a Custom Mapping Solution

  • You don't have to expend more effort than you actually need right now.
  • The solution might be very simple. For instance, if the information you wanted to load into your Construction Application were truly as simple as the example above and if you don't have to update the information very often.

Complication - The Real World is Complex

In the simplistic example above, writing the mapping software might not even take very long to create. But the real world is more complex. One common problem (from the point of view, a humble construction engineer) is that the data is often mangled across many fields.

Take, for example, a "simple" line number. As a construction engineer you are likely to look at a line number as "the name of this piece of pipe, from here to there" as a single label. You want to see the line number as one string of text:

Line Number
150-HCL-250-1C200-35

But the design engineer needs to be able to sort and filter on the pieces. Her idea of a line number looks more like this:

Line Number
AreaFluid CodeNom DiaService ClassInsulation Thk
150HCL2501C20035

This added complication may not reveal itself until you get "under the hood" and look at the database dump. The design engineer could well have concatenated the components of the line number in the written report to appear as if they were a single string of text. If you are the one transfering information from one application to another, it is up to you to figure out all the pieces and put Humpty together again for your Construction Application.

Your "quick" mapping exercise isn't looking so "quick" any more, but is certainly becoming "dirty". It's a good thing that the lexical scope of a line list is only a few dozen terms.

A Confederation of Applications

Error: Macro Image(Mapping04.JPG, 700px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping04.JPG' does not exist.

Figure 4 - A Confederation of Applications

Even with the potential complications the solution for the above example was not terribly difficult, as these operations go. The next step is to link in another application. This is the beginning of what might be called a "Confederation of Applications", where many applications get information from each other, with custom-built database maps. Eventually, you could conceivably link all of the Construction Applications at your company together in this manner.

Error: Macro Image(Mapping05.JPG, 700px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping05.JPG' does not exist.

Figure 5 - Enterprise-Wide Solution

Benefits of a Confederation of Applications

  • You no longer have to manually rekey information to transfer it between applications

Complication - Redo Work Every Time

To make everything work together you have to know a great deal about all of the applications in your confederation. As the number of linked applications expands, the effort required to connect them expands exponentially.

Since each of your software applications was likely written by a different organization, it is likely that they will all use different terminology for describing the same attributes. For instance, to describe something common like Maxiumum Allowable Working Pressure, one organization may use MAWP, another pMax, or Press_13, or even a very criptic attribute1078. Thus, even if several applications need all or part of the line list, you will have to do the mapping each and every time.

Complication - Short Lifetime, High Maintenance

For a short period of time, say, the duration of a construction project, linking software applications with custom routines may work just fine. But software tends to get revised with "new and improved" capability. Eventually, one of the "improvements" will be to refine the definitions of the data and the names of the database columns. Your hard-coded custom database map will break. And if your applications are daisy-chained together, they may all break. To keep all applications working together someone has to maintain the linking functions.

Complication - Ossification

To reduce your maintenace chores, you organization may try "freezing" the design of your software. This will give short-term relief, but at the expense of stopping progress. (Of course some will see this as an advantage, but eventually your organization will be left with a porfolio of old software than no one knows how to maintain anymore.

Common Data Definitions

To make a better solution, you must understand the concepts of Abstraction and Encapsulation.

Abstraction means that you represent an object with a certain set of attributes that are of interest to you. This invariably means some information is lost. For instance, the pipe in one of the pipe spools may have come from a mill in Italy. Or the drawing of one of the pipe spools may have been made by a drafter by the name of Ford Prefect. When the design engineer created the line list for the Engineering Application in our example, she omitted these two attributes because they were not of interest to the Line List (although other parts of the engineering organization may still be very interested in them).

To make your software applications more robust you can go to another level of abstraction, where you make generalizations of the content of databases. Essentially, you make a "Dictionary" of the meanings of all the columns in your databases.

Error: Macro Image(Mapping06.JPG, 700px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping06.JPG' does not exist.

Figure 6 - Abstraction

In this example, someone has created a dictionary containing the meaning of the attributes of the Line List. Perhaps there has been dictionary design meetings for interested parties throughout the organization, or perhaps someone decided to simply use a public standard.

Using the dictionary, the developer of the Engineering Application has determined the appropriate definitions that apply to the columns in the database. Without having to change the Engineering Application in any way, he builds an export function that extracts the appropropriate data values to some neutral medium. (It could be to a comma-seperated text file, XML, an Excel spreadsheet, or something else.)

DictionaryEngineering App
szTagtag_no
dDiameterdia
dLengthlen
dTemperaturetemp
dPressurepress
oDateIFCifc

Similarly, the developer of the Construction Application would use the same dictionary and determine the appropriate definitions that apply to the columns in her database.

DictionaryConstruction App
szTagid
dDiameterdia
dLengthcl
dTemperaturetemp
dPressurepress
oDateIFCifc

Again, without having to change the Engineering Application in any way, she builds an import function that reads the appropriate data values from a neutral file. One important point here, is that she no longer has to know anything at all about the Engineering Application, or any other application.

This demonstrates the concept of Encapsulation. The complexity is hidden from the user.

This also demonstrates the concept of Conservation of Complexity. "Conservation" here, has the same meaning as it does in the Law of Conservation of Energy. That is, you cannot eliminate complexity. All you can do is move it one place or the other.

In our example above, the complexity of each software application has been contained in ech application. To move information between them, no one has to know anything at all about the application.

=== Benefits of Abstraction and Encapsulation ==

  • Complexity is hidden. The developer of the Construction Application did not have to know anything at all about the Engineering Application.
  • Semantic Precision in terminology. You would know the exact meaning of all the terms. For instance, there would be no doubt in your mind what dTemperature was.
    • Easy reuse. (More on this later.)

Complications of Abstraction and Encapsulization

* Someone has to heard the cats together to decide on the meaning of each term and what to call them. * The developers of both the Engineering Application and the Construction Application had to design import/export functions that matched the common standard.

In short, there's more initial work in making and working to a common standard.

Reuse

The value of a common dictionary is demonstrated when you want to link in another application. Say, when a purchasing agent wants to import information from the Engineering Application.

Instead of having to examine the contents of the Engineering Application's database, all he has to do is consult the dictionary and determine the values he wants to use.

DictionaryProcurement App
szTagtag
dDiameterdia
dLengthlen
oDateIFCifc

The fact that the Engineering Application is exposing other data attributes is not of interest. The Procurement Application will simply choose the four data items and use them.

Error: Macro Image(Mapping07.JPG, 700px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping07.JPG' does not exist.

Figure 7 - Reuse

Benefits of a Common Standard

  • Easy re-use. The developer of the Engineering Application didn't have to do anything at all. All she had to do is make the developer of the Purchasing Application aware of the data dictionary.
  • Robust. If the keepers of the Dictionary decide to extend the standard, for instance, to add another type of Pressure, they will simply make a new term. New linking applications will use the new definition if appropriate, but none of the existing functions will break.

Complication - Who Minds the Dictionary?

Regardless of the size of the organization creating the dictionary, eventually its sphere of interest will intersect the sphere of interest of another organization which may well have made its own dictionary. If the two organizations want to link their respective applications, which of them has to change all their software?

Industry Reference Data

The dictionary in the example above ...

But we are still not at ISO 15926

. . .

Error: Macro Image(Mapping08.jpg, t00px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping08.jpg' does not exist.

Figure 8 - ISO 15926

Summary

In these examples we have gone from manually moving data from one software application to another, all the way to makeing generic definitions of database terms and using them in the original applications to build export and import functions. Making the generic definitions of database terms is an example of abstraction. Hiding the complexity of the original applications by exporting data values using a common dictionary is an example of encapsulation.

Next

Mapping Databases is Impractical?

Even if we accept the expense of mapping applications together, it is usually impractical to do so in any realistic business scenario.


Discussion

You have no rights to see this discussion.

Home
About PCA
Reference Data Services
Projects
Workgroups