Version 23 (modified by gordonrachar, 15 years ago)

--

Mapping Databases to Each Other is Expensive

Status of this document: Ready for Cold Eyes Review

This document is open for feedback, please post questions and comments in the forum at the bottom of this page. You will need a login to post in the forum.


Contents

  1. Abstract
  2. Mapping In a Perfect World
  3. Mapping in the Real World
  4. A Custom Map

Abstract

When we want to exchange information between two software applications, the traditional way is to map the respective databases together. We preserve context by examining the terms in each data base to determine which are equivalent. This is expensive and only done in high-value situations.

Mapping In a Perfect World

Let us say that you are an engineer working for a construction company planning the erection of some piping spools. Part of your job is to plan the hydrotesting of all the pipelines before commissioning the plant. So, along with a great deal of other information, you need to know the design temperatures and pressures of all the pipelines in the project. Fortunately, the design engineer has given you a Line List, a list of all the pipe line numbers and their critical attributes.

Your first job, then, is to simply get the design engineer's line list into a form you are used to dealing with--your company's construction management software. You might decide to bite the bullet and just rekey the information manually. But this gets old really quickly and after a few dozen lines you will be wondering why you can't just download the data from the engineer's design software into your construction software automatically. After all, the design engineer did not (in this day and age) have a secretary type the line list onto a piece of paper with a typewriter and fax it to you; it almost certainly exists in an electronic database somewhere.

Now it is in the design engineer's best interest to make sure that the construction contractor has the correct information, so it is easy to imagine that "your IT folks" and "their IT folks" get together to give you a connection over the Internet to just "suck in the data." And since the "lexical scope" of a Line List is only a few terms, it is easy to imagine that the data will "go right in".

"Artificial intelligence is the study of how to make real computers act like the ones in the movies." - Port 2000 Newsletter, The Information Technology Newsletter for Port Washington Educators, Dec. 96

In a perfect world, everyone developing an application for some part of plant design would use the same column names, and ensure the meaning of the contents of the columns were the same. (In database jargon, the "schemas" would be the same.) Figure 1 shows what this would look like.

Error: Macro Image(Mapping01.JPG, 500px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping01.JPG' does not exist.

Figure 1 - A Perfect World

Mapping in the Real World

But of course in the real world, things seldom work out this nicely. Even in the rather small scope of a "line list" (where there the "lexical scope" is only a few terms) there is usually ambiguity.

Error: Macro Image(Mapping02.JPG, 500px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping02.JPG' does not exist.

Figure 2 - Actual Situation

For instance, in Figure 2, two columns have names that do not match. One of them, "tag_no" in the Engineering Application is most probably equivalent to the "id" column in the Construction Application. But what about the column "cl" in the Construction Application? And, for that matter, what does "temp" really mean? Could it mean "Temporary"? It's possible. It is common to have temporary lines erected for commissioning, then demolished when a plant is put in production. And further, if you are being a bit paranoid (and if you have to sign off on the actual hydrotest pressure you better be at least a bit paranoid) what about "press"? It almost certainly means "pressure", but what kind of pressure?

  • Operating Pressure
  • Maximum Allowable Working Pressure
  • Design Pressure

If you have to specify the hydrotest pressure, you have to know for sure.

The reality is that we are inferring everything, based on context.

The only solution is to start digging. Fortunately, the lexical scope of a line list is only a dozen or so terms. To determine if "temp" means "temperature" or "temporary", you will have to look at a few rows of data. If the data values are "y/n", or "0/1", it points to "temporary". But if the data values are numbers in the range of, say "-50" to "1500", it probably means "temperature".

"press" will take a bit longer. You will have to look for clues in other documents and use your engineering judgment to determine which kind of pressure it is. You may have to contact the design engineer.

What looked fairly simple just a while ago is looking to be a bit more complex.

"There is always an easy solution to every human problem--neat, plausible, and wrong." - H. L. Mencken, 1917 (Variously attributed to Albert Einstein, Winston Churchill and others.)

A Custom Map

If you have to import information from the Engineering Application more than once, you will probably have someone create a custom mapping application.

Error: Macro Image(Mapping03.JPG, 600px) failed
Attachment 'wiki:ISO15926Primer_Mapping_Expensive: Mapping03.JPG' does not exist.

Figure 3 - A Custom Solution

In Figure 3, someone has examined the data coming from the Engineering Application and determined which columns matched those of your Construction Application. In software development jargon they have "Mapped the databases together." The red box represents software that uses the map to transfer information from the Design application to the Construction application.

Note that very little data is actually transferred. For instance if we know that both applications expect pipe sizes based on ASME B36.10M, we would only have to transfer numerical digits, such as "6" or "12".


Discussion

You have no rights to see this discussion.

Home
About PCA
Reference Data Services
Projects
Workgroups