Version 1 (modified by gordonrachar, 15 years ago)


Encapsulation and Abstraction

Status of this document: Working Draft

This document is open for feedback. Please post questions and comments in the forum at the bottom of this page. You will need a login to post in the forum.


  1. Abstract
  2. Encapsulation
  3. Abstraction
  4. Back


[Enter abstract]


Roughly speaking, encapsulation means hiding. You hide everything you can about an object and reveal only what others need to see. The purpose of hiding unnecessary information is that it gives all members of the transaction freedom from each other.

Example: A computer subroutine that converts pressure from one system of units to another.

Typically, a program that calls the subroutine only passes the data value to be converted, and receives back only the converted value. The calling program does not have to know the particular algorithm you use. This gives you the freedom to alter the algorithm. Perhaps you need to include a conversion factor that has more significant digits. If the calling program can't see the algorithm, you are free to change the algorithm any time you want. In fact, in this case, it is better for the calling program not to know the algorithm.
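The pressure example can be sketched in a few lines of Python. This is only an illustration, not part of any standard: the function name `psi_to_kpa` and the module-private constant `_PA_PER_PSI` are assumptions made for the example. The caller passes a value and gets a value back, and never sees the factor.

```python
# Hypothetical conversion subroutine. The leading underscore marks the
# conversion factor as private to this module (a Python convention).
_PA_PER_PSI = 6894.757293168  # pascals per pound-force per square inch


def psi_to_kpa(pressure_psi: float) -> float:
    """Convert a pressure from psi to kilopascals.

    Callers depend only on this signature. The factor and the arithmetic
    behind it can change without any caller having to change.
    """
    return pressure_psi * _PA_PER_PSI / 1000.0


# The calling program passes the value and receives the converted value.
reading_kpa = psi_to_kpa(100.0)
print(f"{reading_kpa:.3f} kPa")
```

If you later need a factor with more significant digits, you edit `_PA_PER_PSI` and nothing else. A caller that had copied the factor into its own code would have to be found and changed too.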

An object has many representations. For instance, when an object is stored in a database, the database is a representation of the object. If the same object is stored in a second database, the representation might be completely different, perhaps because of different scopes of work, or different assumptions about the object. If you want to be able to move information between the two databases and you use special knowledge of the two, you create bindings. Your implementation will be fragile--any change to either database runs the risk of breaking the information transfer. But if you move the information in a way that does not depend on any special knowledge of the representations, your implementation will be robust. We say that the two databases each encapsulate the object.

If you have anything less than full encapsulation, your cost of ownership will go up. This runs counter to typical thinking. Usually on an interoperability project, the computer programmers want to know as much as possible about the members of the information exchange.


Abstraction is the result of generalizing an object by reducing the information about it to only that which is relevant to a particular purpose.

Example: A pump in a refinery has many representations: (See Encapsulation, above.)

  • A very detailed drawing from the manufacturer showing the baseplate and nozzle dimensions.
  • A plastic model.
  • A simple symbol on a P&ID.

All of these representations are abstractions of one another.

Example: Five Representations of a Car:

  1. Give me one of these! 2010 Mustang GT
    • 2010
    • Ford
    • Mustang GT
    • Color Red
  2. Give me a Ford car
  3. Give me a car
  4. Give me a vehicle with doors, windows, 4 wheels
  5. Give me a vehicle

Each of these representations is an abstraction of the previous one. Each one increases the abstraction. (Note that #5 includes covered wagons and airplanes.) As you increase the level of abstraction you include more and more things, until eventually the abstraction includes everything.
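One way to see the widening is to treat each of the five requests as a test applied to a small inventory. The sketch below is only an illustration: the `Thing` record, the inventory contents, and the predicates are all made up for the example. Each more abstract request checks fewer attributes, so it matches at least as many items as the one before it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    """Hypothetical inventory record; the attribute names are assumptions."""
    kind: str
    make: str = ""
    model: str = ""
    year: int = 0
    color: str = ""
    wheels: int = 0
    has_doors: bool = False
    has_windows: bool = False


# A made-up inventory in which everything is some kind of vehicle.
INVENTORY = [
    Thing("car", "Ford", "Mustang GT", 2010, "Red", 4, True, True),
    Thing("car", "Ford", "Focus", 2010, "Blue", 4, True, True),
    Thing("car", "Toyota", "Corolla", 2009, "White", 4, True, True),
    Thing("delivery van", wheels=4, has_doors=True, has_windows=True),
    Thing("covered wagon", wheels=4),
    Thing("airplane", wheels=3, has_doors=True, has_windows=True),
]

# The five requests, from most specific to most abstract.
REQUESTS = [
    ("a 2010 red Ford Mustang GT",
     lambda t: (t.kind, t.make, t.model, t.year, t.color)
     == ("car", "Ford", "Mustang GT", 2010, "Red")),
    ("a Ford car", lambda t: t.kind == "car" and t.make == "Ford"),
    ("a car", lambda t: t.kind == "car"),
    ("a vehicle with doors, windows, 4 wheels",
     lambda t: t.has_doors and t.has_windows and t.wheels == 4),
    ("a vehicle", lambda t: True),
]


def count_matches() -> list[int]:
    """Count how many inventory items satisfy each request, in order."""
    return [sum(1 for t in INVENTORY if wanted(t)) for _, wanted in REQUESTS]


for (label, _), n in zip(REQUESTS, count_matches()):
    print(f"Give me {label}: {n} match(es)")
```

With this inventory the counts never decrease from one request to the next, and the last request matches everything, including the covered wagon and the airplane.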

Reference Data

It is appropriate at this point to bring in another idea, Reference Data. When you have to relate abstractions between two representations, you really need a third. (You can say that one of the representations is the abstract one, but that usually leads to compromises you shouldn’t make.) You need a third that truly abstracts the two representations. If you increase the number of representations, it forces you to generalize that abstraction to cover all possible representations. Having a neutral representation of information is the whole point of Reference Data.

Reference data is not constrained by any storage method.

By mapping several representations into abstract content, you achieve integration of information. If two or more representations map to one abstract content, those two representations are said to be related to each other. And having that abstraction enables you to achieve the encapsulation you want.
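A minimal sketch of this mapping, assuming two made-up databases that hold the same pump under different column names and units. None of the names below come from ISO 15926 or any real schema. They are placeholders, and the neutral representation is reduced to a plain dictionary for the sake of the example.

```python
# Each storage point supplies its own mapping into the neutral representation.
# All row layouts, field names, and units here are hypothetical.

def from_engineering_db(row: dict) -> dict:
    """Map an engineering-database row to the neutral representation."""
    return {
        "tag": row["TAG_NO"],
        "design_pressure_kpa": row["DES_PRESS_KPA"],
    }


def from_maintenance_db(row: dict) -> dict:
    """Map a maintenance-database row (pressure stored in bar) to the same form."""
    return {
        "tag": row["equipment_id"],
        "design_pressure_kpa": row["max_press_bar"] * 100.0,  # 1 bar = 100 kPa
    }


eng_row = {"TAG_NO": "P-101", "DES_PRESS_KPA": 1500.0, "DWG_REV": "C"}
mnt_row = {"equipment_id": "P-101", "max_press_bar": 15.0,
           "last_service": "2009-11-02"}

# Two representations are related when they map to the same abstract content.
related = from_engineering_db(eng_row) == from_maintenance_db(mnt_row)
print(related)
```

Neither mapping knows anything about the other database. Adding a third database means writing one more mapping, not one more pairwise binding for each existing database.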

How far do you go? Hmmm... There is an art to finding the Goldilocks point. Not too much. Not too little. Just right.

Look at all the different places where you store the information. You need a point of encapsulation, which is simply a boundary that you put around the information in each system. You can implement that point of encapsulation anywhere: right in the database, in a programming API, in a web service, or in a file.

There are many ways to implement that point of encapsulation. Once you transform a specific representation into an abstracted one there’s the (??? Fine line third party???). What’s in there is just the information itself, with nothing about how it is stored, where it is located, or what its schema is.

Here is the mental picture: There is an infinite variety of representations, and there is a representation at every storage point of information. Every storage point needs to expose an abstracted representation that is understood by everybody else. That abstracted representation is the point of encapsulation.

The point of the abstraction is the encapsulation. Wherever you finally transform your representation into that third-party abstracted representation, that is the point of encapsulation.

What’s left is purely the information you want to make public. Private details such as the schema, column names, table names, and storage location path should all remain private. You want freedom. You don't want anybody depending on that level of detail.
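As a rough sketch, here is a point of encapsulation implemented as a small programming API over a database. Everything in it is hypothetical: the class `PumpStore`, the table `t_eq`, its columns, and the published field names are invented for the example. The only thing that crosses the boundary is the dictionary returned by `publish`.

```python
import sqlite3


class PumpStore:
    """Hypothetical storage point. Table and column names stay private."""

    def __init__(self) -> None:
        # An in-memory database stands in for the real storage location.
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE t_eq (c_tag TEXT PRIMARY KEY, c_flow REAL, c_note TEXT)"
        )
        self._conn.execute(
            "INSERT INTO t_eq VALUES (?, ?, ?)",
            ("P-101", 120.0, "internal vendor note"),
        )

    def publish(self, tag: str) -> dict:
        """Return only the information meant to be public, in neutral terms."""
        row = self._conn.execute(
            "SELECT c_tag, c_flow FROM t_eq WHERE c_tag = ?", (tag,)
        ).fetchone()
        if row is None:
            raise KeyError(tag)
        return {"tag": row[0], "rated_flow_m3_per_h": row[1]}


store = PumpStore()
print(store.publish("P-101"))
```

Renaming `t_eq`, splitting it into two tables, or moving the data to a file changes only the body of `publish`. The internal note column is never exposed, so nobody can come to depend on it.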

Interestingly, we have a convergence of interests. We generally want to keep proprietary information secret. But this principle of encapsulation says that no one wants the proprietary information anyway, because if we use that proprietary knowledge in the data exchange, we are forever shackled by someone else’s proprietary thoughts.

Most people violate this principle because it’s so much easier to violate it. It’s easier to connect systems together without encapsulation. The greatest offenders are the people in IT, because they are project-oriented: “What’s the easiest thing to do right now?”

  • Benefit: Total control of what you publish.
  • Benefit: No one really even wants your proprietary information.


You need abstraction to encapsulate. If you violate encapsulation, your costs go up. You violate interoperability. Most people don’t understand this. Most computer science geeks especially don’t understand this.

The big leap forward with ISO 15926 is that this is provided out of the box.


