Wednesday, 22 January 2014

Synthetic keys are almost always undesirable in qlikview as the calculations in a data model involving Synthetic keys are always resource intensive. Also the data model will look clumsy and often is difficult to understand.
When are they created?
If you have tables that have more than 1 common column then qlikview automatically creates synthetic key tables to have the unique combination of the 2 columns.
How to avoid it?
We can use link tables to avoid them.

SET ThousandSep=',';
SET DecimalSep='.';
SET MoneyThousandSep=',';
SET MoneyDecimalSep='.';
SET MoneyFormat='$#,##0.00;-$#,##0.00';
SET TimeFormat='h:mm:ss TT';
SET DateFormat='D/MM/YYYY';
SET TimestampFormat='D/MM/YYYY h:mm:ss[.fff] TT';
SET MonthNames='Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec';
SET DayNames='Mon;Tue;Wed;Thu;Fri;Sat;Sun';

product:
load * inline
[
product_code, prod_description
1001,Shirt
1002,Pant
1003,Cricket Gear
1004,Belt
1005,Shoes
1006
1007,Pilate wear
1008,Helmet
];

warehouse:
load * inline
[
warehouse_code,warehouse_type, warehouse_description
2001,Retail,DFO Brisbane
2002,Retail,DFO Sydney
2003,Retail,DFO Moorabin
2004,Retail,DFO Chadstone
2005,Retail,Richmond
2006,Retail,Burnley
];

Sales_Fact:
load * inline
[
product_code,warehouse_code,txn_date,sales
1001,2001,10-Jun-2012,20502
1001,2002,10-Jun-2012,2959
1001,2001,11-Jun-2012,6674
1001,2003,11-Jun-2012,6868
1001,2003,11-Jun-2012,2755
1002,2004,12-Jun-2012,43521
1004,2005,13-Jun-2012,2757
1003,2002,13-Jun-2012,6868
1003,2006,13-Jun-2012,4343
1003,2002,13-Jun-2012,9000
1005,2001,15-Jun-2012,8745
1001,2004,15-Jun-2012,6733
1001,2005,15-Jun-2012,8585
1005,2003,15-Jun-2012,2858
];

Stock_Fact:
load * inline
[
product_code,warehouse_code,stock
1001,2001,1000
1001,2002,2000
1001,2001,6868
1001,2003,7857
1001,2003,28727
1002,2004,18285
1004,2005,28585
1003,2002,1010
1003,2006,10
1003,2002,19595
1005,2001,1929
1001,2004,2588
1001,2005,6576
1005,2003,1334
];


if you do this script in qlikview editor and reload the file, you would potentially create synthetic keys.


How to avoid this? We have to create a surrogate key ( _prodwhousekey) based on the combination of the 2 common columns, product_code and warehouse_code. So the sales table will be changed like this. Do the same for the Stock Fact as sell.

Sales_Fact:
load AutoNumber(product_code, warehouse_code) as _prodwhousekey, *;
load * inline
[
product_code,warehouse_code,txn_date,sales
1001,2001,10-Jun-2012,20502
1001,2002,10-Jun-2012,2959
1001,2001,11-Jun-2012,6674
1001,2003,11-Jun-2012,6868
...................................
..................................

Stock_Fact:
load AutoNumber(product_code, warehouse_code) as _prodwhousekey, *;
load * inline
[
product_code,warehouse_code,stock
1001,2001,1000
1001,2002,2000
1001,2001,6868
1001,2003,7857

And now create the link table. This table takes the _prodwhosekey from both the fact tables.

Key:
Load Distinct
  _prodwhousekey,
  product_code,
  warehouse_code
Resident
  Sales_Fact;

Concatenate(Key)
Load Distinct
  _prodwhousekey,
  product_code,
  warehouse_code
Resident
  Stock_Fact;

Now finally drop the natural keys

Drop Fields product_code, warehouse_code From Sales_Fact;
Drop Fields product_code, warehouse_code From Stock_Fact;

Now the data model looks as follows. We have removed the synthetic keys and hence addressed the potential problems it would have caused in terms of the calculations and also the data model is easy to understand.