In data quality services, a knowledge base (KB) consists of domains. And domains has: domain rules, term based relations, domain values and reference data. In this post, we’ll see a use case of Term based relations. But before we do that, you can consider reading: Difference between Term based relations and Domain values in SQL server 2012 Data Quality Services
Now, Here’s Term Based Relation’s in action
After data profiling you realize that there are records such as “my Company Inc.” and “my Company Incorporated” – so you set out to automatically find these mismatches in terms inside a value and correct them.
BEFORE cleaned data, the report showed that “my company Inc” revenue is less than that of Google:
AFTER cleaned data, the report correctly shows that “my company Inc” revenue is more than that of Google:
Steps taken to clean data:
(Just an overview, not covered in a step by step fashion)
1. Created the Knowledge Base w/ Two domains Company Names & Revenue
2. Term Based Relations Tab of Company Names domain:
3. Published the KB > Let’s create a DQS project
4. Mapped the domains:
5. DQS cleaned following records:
6. Exported the data and created a report out of clean data-set!
Business user is happy :)
In this post, we saw how to correct a word/term within a value in a domain. The example we used was Inc. , Inc and Incorporated . It can be used to correct terms like Jr. and Junior. Sr. and Senior. etc. Things like this are difficult to catch during data entry – But using Term Based Relations, a person who knows the Data can clean it so that it generates correct reports.
After all reports like this are of little to no use, are they? So Let’s NOT create confusing reports anymore.
- The revenue figures shown are just for demo purposes. I pulled up these numbers from Wikipedia. Please don’t make any financial decision based on these reports and if you do, I am not responsible for that.
- The name “my Company Inc” is a fictional firm. It’s not any firm that I am/was associated with in past, future of present. It’s a fictional name!
And Writing disclaimers like this are NO fun – sucks the joy out of “Thinking out Loud” :)
- Difference between Term based relations and Domain values in SQL server 2012 Data Quality Services (parasdoshi.com)
- How to clean records using Regular Expressions in Data Quality Services? (parasdoshi.com)
- How to standardize data using Data Quality Services? (parasdoshi.com)
- How to detect unrealistic or invalid values using Data Quality Services? (parasdoshi.com)
- Guest Blog: How we use Fuzzy Lookup add-in in our company to solve data inconsistency problems: (parasdoshi.com)