This good article by Henrik Liliendahl is completed by a very good comment from John Owens. I decided to re-blog it in order to continue the discussion. As John Owens mentioned:

“QUACKs (Quack Alternative Codes & Keys, also called Structured Codes) are a very useful way in business of referring to frequently used entities, such as products, locations, etc. The problems start when data analysts confuse them with Unique Identifiers and then system designers further compound this error by implementing these QUACKs as Primary Keys in tables. This embeds a flawed data structure into every single record that is created in which these flawed primary keys are used as foreign keys, for example, in tables linking flights to St Petersburg airport. Now, had the designers used an unstructured Primary Key and shown “LED” simply as a QUACK referencing the Leningrad/St Petersburg airport, the code “LED” could at any time have been very simply changed without any negative impact on referential integrity in the data sets involved. All past and future flights would automatically reflect the change in the airport code. The reason the code LED is hard coded in this way is not an innate part of MDM, it is the result of bad data analysis and worse systems design. A major part of current MDM is to move the knowledge, practices and skills into enterprises to enable them to avoid making these totally avoidable and hugely costly errors.”

All of us have faced this kind of problem, consciously or not. It can be a very big issue, especially with very tightly coupled legacy systems. Such original mistakes can lead to decades of re-engineering, trying to decouple the legacy systems in order to gain more flexibility later on.
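To make John Owens's point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names, the flight number "XX100" and the replacement code "SPB" are all hypothetical, chosen only for illustration (the real airport still uses LED). The airport gets a meaningless surrogate key, and the three-letter code is just another attribute, so it can change without touching a single flight record.

```python
import sqlite3

def build_demo():
    """Create an in-memory database with one airport and one flight."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE airport (
            airport_id INTEGER PRIMARY KEY,   -- meaningless surrogate key
            code       TEXT NOT NULL UNIQUE,  -- the QUACK, free to change
            name       TEXT NOT NULL
        );
        CREATE TABLE flight (
            flight_id      INTEGER PRIMARY KEY,
            flight_no      TEXT NOT NULL,
            destination_id INTEGER NOT NULL REFERENCES airport(airport_id)
        );
    """)
    conn.execute(
        "INSERT INTO airport (airport_id, code, name) VALUES (1, 'LED', 'Leningrad')"
    )
    # 'XX100' is a made-up flight number.
    conn.execute(
        "INSERT INTO flight (flight_no, destination_id) VALUES ('XX100', 1)"
    )
    return conn

def destination_code(conn, flight_no):
    """Look up the current code of a flight's destination via the surrogate key."""
    row = conn.execute(
        "SELECT a.code FROM flight f "
        "JOIN airport a ON a.airport_id = f.destination_id "
        "WHERE f.flight_no = ?",
        (flight_no,),
    ).fetchone()
    return row[0]

def rename_airport(conn, airport_id, new_code, new_name):
    """Change the code and name; no flight row is touched."""
    conn.execute(
        "UPDATE airport SET code = ?, name = ? WHERE airport_id = ?",
        (new_code, new_name, airport_id),
    )

conn = build_demo()
print(destination_code(conn, "XX100"))
rename_airport(conn, 1, "SPB", "Saint Petersburg")  # 'SPB' is hypothetical
print(destination_code(conn, "XX100"))
```

Only the single airport row is updated. Every flight, past or future, picks up the new code through the join, which is exactly the behaviour the comment describes.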
As you may notice, this is a two-step approach that brings no business value at first (while decoupling) and might deliver a return on investment only 3 or 5 years after the initiative has started… This is the main reason why most of these initiatives are never started! Actually, the only cases where I have seen such projects initiated, to try to correct the initial error, were due to a peculiar phenomenon called “hit the roof”.

“Hit the roof” is a side effect of the practices of mainframe developers, who most of the time were the same people who made the initial error of using the QUACKs as primary keys. “Hit the roof” is quite simple as well: after several decades of usage, the values of some QUACK tables reach the end of their range. Take the previous example, the airport code based on 3 letters, with one assumption: only letters, no numbers. That gives you a total of 26 × 26 × 26 = 17,576 possible airports, which might sound reasonable if we think only of the main airports, but which will quickly be reached as soon as you include the small aerodromes in the same table… I guess you saw me coming! I am not an airport specialist, but I am quite sure that some politician will one day have the brilliant idea, if it has not already been done, of building a common repository (table) for all the airports and aerodromes of the world… Then we will hit the roof!

“Very good that we hit the roof,” you might tell me, “it finally gives us the opportunity to correct the initial mistake.” Well… I would not be so optimistic if I were you, even though I agree with the reasoning: it is a golden opportunity to put things back in order. Most of the time, the “solution” that wins is the following: let’s introduce numbers into the 3-character code we have! Woohoo! Jackpot! We increase from 17,576 values to 36 × 36 × 36 = 46,656… And that will cost the company only 8 to 10 million dollars! You think I’m joking; unfortunately I am not. So, what to do then?
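As a quick sanity check on the size of the “roof”, the capacity of a fixed-length code is simply the alphabet size raised to the code length. The sketch below assumes the 26 uppercase Latin letters and the 10 decimal digits; the helper name code_capacity is mine.

```python
import string

def code_capacity(alphabet_size: int, length: int = 3) -> int:
    """Number of distinct codes of the given length over the given alphabet."""
    return alphabet_size ** length

# Assumption: 26 uppercase Latin letters, optionally extended with 10 digits.
letters_only = code_capacity(len(string.ascii_uppercase))
letters_and_digits = code_capacity(len(string.ascii_uppercase + string.digits))

print(letters_only)        # 17576
print(letters_and_digits)  # 46656
```

Adding digits buys less than a threefold increase, so the roof is merely raised, not removed.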
To start with, the least silly solution is to create “aliases” in order to move smoothly (this can take years of continuous effort, depending on how tightly your systems are connected to each other) from the “QUACK ID” that was initially set to a real (dummy) object ID used as primary key. It is not the silver bullet that some might be looking for, but it is the only reasonable way forward I know of. Any experience to share on this topic?
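For readers who want something concrete, the alias idea could be sketched roughly as follows. This is an illustration under my own assumptions, not a reference design: the class name AliasRegistry, its methods and the code "SPB" are hypothetical. Every object gets a meaningless ID, any number of codes can point to it, and legacy systems keep resolving the old code until they have been migrated.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class AliasRegistry:
    """Maps business codes (QUACKs) to meaningless object IDs."""
    _next_id: count = field(default_factory=lambda: count(1))
    names: dict = field(default_factory=dict)    # object_id -> name
    aliases: dict = field(default_factory=dict)  # code -> object_id

    def register(self, name, code):
        """Create a new object with a dummy ID and its first code."""
        if code in self.aliases:
            raise ValueError(f"code {code!r} is already in use")
        object_id = next(self._next_id)
        self.names[object_id] = name
        self.aliases[code] = object_id
        return object_id

    def add_alias(self, code, object_id):
        """Let an additional code point to an existing object."""
        if object_id not in self.names:
            raise KeyError(f"unknown object id {object_id}")
        if self.aliases.get(code, object_id) != object_id:
            raise ValueError(f"code {code!r} already points elsewhere")
        self.aliases[code] = object_id

    def retire_alias(self, code):
        """Drop a code once no system depends on it any more."""
        del self.aliases[code]

    def resolve(self, code):
        """Translate any known code into the stable object ID."""
        return self.aliases[code]

registry = AliasRegistry()
led = registry.register("Saint Petersburg", "LED")
registry.add_alias("SPB", led)  # hypothetical new code, coexists with LED
print(registry.resolve("LED") == registry.resolve("SPB"))
```

Systems are moved one at a time to store the object ID instead of the code. When the last one has been migrated, the old alias can be retired without breaking anything.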
Originally posted on Liliendahl on Data Quality:
All airports have a three-letter code, usually a mnemonic of the city name or airport name. The airport at Saint Petersburg in Russia thus has the code LED because the code was assigned when the city was called LEningraD. That’s how it is with master data: names may change but the code of an entity must be kept as it was. And that’s why you usually shouldn’t put meaning into codes.
The Russian MDM (Master Data Management) market has been well described by Dmitry Kovalchuk in a post on the Hub Design Magazine.