Version 44 (modified by gordonrachar, 14 years ago)

--

Mapping Databases to Each Other is Expensive


Contents

  1. Abstract
  2. Mapping In a Perfect World
  3. Mapping in the Real World
  4. A Custom Map
    1. Complication - The Real World is, well, Complex
  5. A Confederation of Applications
    1. Benefits of a Confederation of Applications
    2. Complication - You Have to Redo Work Every Time
    3. Complication - Short Lifetime, High Maintenance
    4. Complication - Ossification
  6. Next

Abstract

When we want to exchange information between two software applications, the traditional way is to map the respective databases together. We preserve context by manually examining the terms in each database to determine which are equivalent. Directly mapping two applications together is often the easiest short-term solution. But as more applications are mapped to form a network, the cost to maintain the individual maps grows exponentially.


Mapping In a Perfect World

Let us say that you are an engineer working for a construction company planning the erection of some piping spools. Part of your job is to plan the hydrotesting of all the pipelines before commissioning the plant. So, along with a great deal of other information, you need to know the design temperatures and pressures of all the pipelines in the project. Fortunately, the design engineer has given you a Line List, a list of all the pipe line numbers and their critical attributes.

Your first job, then, is to simply get the design engineer's line list into a form you are used to dealing with--your company's construction management software. You might decide to bite the bullet and just rekey the information manually. But this gets old really quickly and after a few dozen lines you will be wondering why you can't just download the data from the engineer's design software into your construction software automatically. After all, the design engineer did not (in this day and age) have a secretary type the line list onto a piece of paper with a typewriter and fax it to you; it almost certainly exists in an electronic database somewhere.

Now it is in the design engineer's best interest to make sure that the construction contractor has the correct information, so it is easy to imagine that "your IT folks" and "their IT folks" get together to give you a connection over the Internet to just "suck in the data." And since the "lexical scope" of a Line List is only a few terms, it is easy to imagine that the data will "go right in".

"Artificial intelligence is the study of how to make real computers act like the ones in the movies."
Port 2000 Newsletter, The Information Technology Newsletter for Port Washington Educators, Dec. 96

In a perfect world, everyone developing an application for some part of plant design would use the same column names, and ensure the meaning of the contents of the columns were the same. (In database jargon, the "schemas" would be the same.) Figure 1 shows what this would look like.

Error: Macro Image(Mapping01.JPG, 300px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping01.JPG' does not exist.

Figure 1 - A Perfect World

Mapping in the Real World

But of course in the real world, things seldom work out this nicely. Even in the rather small scope of a "line list" (where there the "lexical scope" is only a dozen or so terms) there is usually ambiguity.

Error: Macro Image(Mapping02.JPG, 300px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping02.JPG' does not exist.

Figure 2 - Actual Situation

For instance, in Figure 2, two columns have names that do not match. One of them, "tag_no" in the Engineering Application is most probably equivalent to the "id" column in the Construction Application. But what about the column "cl" in the Construction Application? And, for that matter, what does "temp" really mean? Could it mean "Temporary"? It's possible. It is common to have temporary lines erected for commissioning, then demolished when a plant is put in production. And further, if you are being a bit paranoid (and if you have to sign off on the actual hydrotest pressure you better be at least a bit paranoid) what about "press"? It almost certainly means "pressure", but what kind of pressure?

  • Operating Pressure
  • Maximum Allowable Working Pressure
  • Design Pressure

If you have to specify the hydrotest pressure, you have to know for sure.

The reality is that we are inferring everything, based on context.

The only solution is to start digging. Fortunately, the lexical scope of a line list is only a couple dozen terms. To determine if "temp" means "temperature" or "temporary", you will have to look at a few rows of data. If the data values are "y/n", or "0/1", it points to "temporary". But if the data values are numbers in the range of, say "-50" to "1500", it probably means "temperature".

"press" will take a bit longer. You will have to look for clues in other documents and use your engineering judgment to determine which kind of pressure it is. You may have to contact the design engineer.

What looked fairly simple just a while ago is looking to be a bit more complex.

"There is always an easy solution to every human problem--neat, plausible, and wrong." - H. L. Mencken, 1917 (Variously attributed to Albert Einstein, Winston Churchill and others.)

A Custom Map

If you have to import information from the Engineering Application more than once, you will probably have someone create a custom mapping application.

Error: Macro Image(Mapping03.JPG, 350px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping03.JPG' does not exist.

Figure 3 - A Custom Solution

In Figure 3, someone has examined the data coming from the Engineering Application and determined which columns matched those of your Construction Application. In software development jargon they have "Mapped the databases together." The red box represents software that uses the map to transfer information from the Design application to the Construction application.

Note that very little data is actually transferred. For instance if we know that both applications expect pipe sizes based on ASME B36.10M, we would only have to transfer numerical digits, such as "6" or "12".

Complication - The Real World is, well, Complex

In the simplistic example above, writing the mapping software would be quite simple, as these things go. But the real world is more complex. One common problem (from the point of view of a humble construction engineer) is that the data is often mangled across many fields.

Take, for example, a "simple" line number such as what might be represented by the column "tag_no". As a construction engineer you are likely to look at a line number as "a label representing the name of this run of pipe, from here to there." You expect the line number to be one string of text, something like:

Line Number
150-HCL-250-1C200-35

But the design engineer needs to be able to sort and filter on the pieces. Her idea of a line number probably looks more like this:

Line Number
AreaFluid CodeNom DiaService ClassInsulation Thk
150HCL2501C20035

This added complication may not reveal itself until you get "under the hood" and look at the database dump. The design engineer could well have concatenated the components of the line number in the written report to appear as if they were a single string of text. If you are the one transferring information from one application to another, it is up to you to figure out all the pieces and put Humpty together for your Construction Application.

It's a good thing that the lexical scope of a line list is only a few dozen terms.

A Confederation of Applications

If linking the first two applications together works out well, you may be tempted to link in another one. Figure 4 shows how the Engineering Application might be mapped to a Procurement Application with a second custom database map.

Error: Macro Image(Mapping04.JPG, 600px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping04.JPG' does not exist.

Figure 4 - A Confederation of Applications

This is the beginning of what might be called a "Confederation of Applications", where many applications get information from each other, with custom-built database maps between each pair of applications. Conceivably, you could eventually link all of the Construction Applications at your company together in this manner.

Error: Macro Image(Mapping05.JPG, 400px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping05.JPG' does not exist.

Figure 5 - Enterprise-Wide Solution

Benefits of a Confederation of Applications

  • You no longer have to manually rekey information to transfer it between applications.

Complication - You Have to Redo Work Every Time

  • To make everything work together you will likely have to re-discover everything about all of the applications in your confederation each time you map between any two of them. What this means is that when you want to, for instance, create the map between the Engineering Application and the Procurement Application (see Figure 4, above) there will likely be new developers. Unless the second mapping exercise is conducted right away, the original developers may not be available. Thus, even if several applications need all or part of the same line list, the new developers will have to re-discover everything about the Engineering Application every time. As the number of linked applications expands, the effort required to connect them expands exponentially.

Complication - Short Lifetime, High Maintenance

  • For a short period of time, say, the duration of a construction project, linking software applications with custom mapping routines may work just fine. But software tends to get revised with "new and improved" capability. Eventually, one of the "improvements" will be to refine the definitions of the data and the names of the database columns. Your hard-coded custom database application map will break. And if your applications are daisy-chained together, they may all break like a row of dominos. To keep all applications working together someone has to maintain the linking functions.

Complication - Ossification

  • To reduce your maintenance chores, you organization may try "freezing" the design of your software. This will give short-term relief, but at the expense of stopping progress. (Of course some may see this as an advantage!) But eventually your organization will be left with a portfolio of old software than no one knows how to maintain anymore.

What we need is a way of transferring information that does not require us to maintain custom maps. This is where ISO 15926 comes in.

Next


Home
About PCA
Reference Data Services
Projects
Workgroups