Locked lesson.
About this lesson
This module shows how to apply de-aggregation to our sample model. The end goal here is to avoid m:m (many to many) joins by creating new dimensional tables that can be linked into the model via 1:m (one to many) relationships.
Exercise files
Download this lesson’s related exercise files. You can download the source data files for the course from the resources section of your Lessons page.
Table De-Aggregation - Application.pbix369.6 KB Table De-Aggregation - Application - Completed.pbix
405.7 KB
Quick reference
Table De-Aggregation - Application
Creating new dimensional tables for our sample model.
When to use
This module is a practical example of retro-fitting an existing model with staging queries, as well as creating new tables for one to many joins.
Instructions
Assume we had a Sales and Budget table where each had a Category column with repeating values
Creating staging queries
- Select the Sales table
- Ensure the last step is “Changed Type” (and set the data types if not)
- Right click the “Changed Type” step -> Extract Previous
- Call the new query Staging-<table name>
- Right click the query in the queries pane (left) and uncheck “Enable Load”
- Repeat for the Budget table
- Move both queries to a new group called “Staging”
Creating the new dimension table
- Create a new query by referencing one of the Staging tables
- Select the column you wish to use for linking (Category)
- Right click the column -> Remove Other Columns
- Right click the column -> Remove Duplicates
- Set the data type(s)
- Rename the table with a descriptive name (Categories)
- Load the new table to the model
Create the relationships
- Link the new (Categories) table to each of the original (not Staging) tables
- Sales[Category] to Categories[Category]
- Budget[Category] to Categories[Category]
- Hide the relationship field on the Many side (Sales and Budget tables)
- Update any visuals that used Sales[Category] or Budget[Category] to use Categories[Category]
- 00:05 Before we get into building our properly structured tables.
- 00:08 Let's quickly create and
- 00:09 example here on a new page of why we're doing this to begin with.
- 00:13 What we're gonna start with, is we are gonna make a nice little cluster column
- 00:17 chart, and for my Budgets I'm gonna put on the category and the amount.
- 00:20 And you can see that I get a nice little chart that runs in it,
- 00:23 picks in around $2 million with the sales in wine.
- 00:26 And then I'm gonna go and create another little column chart,
- 00:30 but this time I'm gonna put my Category and my Total Sale.
- 00:33 Now, you can see that we have a very similar pattern here.
- 00:37 Because we try to budget somewhat of the same as what we're looking for sales,
- 00:41 which makes sense.
- 00:42 But here's the next question, what if I wanna compare budgets and sales?
- 00:46 I'm gonna go copy the Budget chart, paste it down here.
- 00:51 Here we are, we'll move it in here.
- 00:52 And now, this is budget with the amount and the category,
- 00:55 I'm gonna add the total sales that you can see in the chart up here.
- 00:59 And you'll notice that we get these really tall bars that are nowhere close to what
- 01:03 the values are in our original chart.
- 01:05 And if you actually were to go through and count this up, you'll find that the total
- 01:09 data set has about $5.2 million worth of sales, which is what comes up here.
- 01:13 It's actually doesn't know a way to be able to filter this information so
- 01:16 it gives us all of the records in the Sales table.
- 01:18 So plainly that's not good,
- 01:20 we need to have a separate categories table to work with.
- 01:23 So let's go make those changes.
- 01:24 So let's edit our queries.
- 01:27 Now, inside we see the nice little tables of Sales and Budgets and that's great.
- 01:31 But one of the things that I'd like to do at this point is I'd like to make sure
- 01:34 that I can split this into a staging and loading approach.
- 01:37 Before I do that though, I really wanna make sure that my last step here is to
- 01:42 change site step, which sets the data types.
- 01:44 And this is something that's really important to actually do before you load
- 01:48 your data the model is always make sure that last step really is a change type
- 01:52 step just to make sure the data types are correct.
- 01:54 Now, in this case, the data types have been set but what I'm gonna do is,
- 01:57 I'm going to right click change type and
- 01:59 I'm gonna re apply the same data types to these particular columns.
- 02:03 So again, I'm gonna change type here to a decimal number,
- 02:06 we'll reset this again to a date.
- 02:07 I don't need to change type as local if it's already been set to a date,
- 02:12 so that's helpful.
- 02:13 I'm not really happy with this one anyway on Hour,
- 02:15 I think I'm gonna change this to become a whole number.
- 02:18 And Category and Item Name, we can change to be Text.
- 02:22 Units Sold, too, we'll reset that again to a whole number.
- 02:26 And the Unit Price, and the Total Sales column.
- 02:28 Again, we'll change type to be a fixed decimal number.
- 02:32 Now, that this is done, we have our final step being Change Type,
- 02:36 this is a perfect place to right-click and
- 02:40 extract the previous steps into a separate query called Staging-Sales.
- 02:45 And what you'll see when we do this is that the Sales query loses most of
- 02:49 it's steps.
- 02:49 It has Source and Change Type.
- 02:52 Change Type of course sets the data set types.
- 02:54 And when you look at Source it now refers to hashtag quote staging-sales which is
- 02:58 pulling it's results from the Staging-Sales query which is all of
- 03:02 the previous steps that we did.
- 03:04 So that works out really nicely.
- 03:06 And the next thing I'm gonna do is right click on the guy here.
- 03:09 I'm gonna disable the ability to load it, the reason for
- 03:12 this is because I don't wanna separate table in my model for this.
- 03:15 I only want the Sales table to show up.
- 03:18 I'm also gonna put this into a nice of a bucket by saying to
- 03:21 Move To Group New Group, and I'm gonna call this one Staging.
- 03:25 And now I can actually classify my queries nicely.
- 03:28 I'm gonna do a similar thing for Budgets.
- 03:31 We're gonna jump over to the Budgets table.
- 03:32 And you'll notice that down at the bottom here,
- 03:35 the Change Type step is the last one, so this makes it really convenient,
- 03:38 because I can now right-click and say Extract Previous.
- 03:41 I can give this one a nice little name like Staging-Budgets and say OK.
- 03:47 Right click on it, uncheck the enable load and I'm fine with the possible data loss
- 03:51 warning, nothing has been loaded to the model, it's not a big deal.
- 03:55 And we're gonna left click and drag him up and
- 03:57 put him into my nice little Staging pocket as well.
- 04:00 So now I have two individual pieces here.
- 04:03 And if you ever get this warning about a preview here, you can just refresh it and
- 04:06 that will make sure that everything is nice and up to date for you.
- 04:09 So now with these two pieces in place, what can I do?
- 04:13 Well this is the beautiful thing.
- 04:15 I can jump onto the Staging-Sales, right-click on it, and say Reference.
- 04:21 This will create me a new query called Staging-Sales(2) that I'm gonna use to
- 04:25 build up my Categories query.
- 04:27 So I'll change the name to Categories and
- 04:30 I'm gonna move it into my other queries group because I do want this to load.
- 04:34 So how do I make a good categories query?
- 04:36 I'm gonna come over here and find the Category column,
- 04:40 I'm gonna right-click on it and say Remove Other Columns.
- 04:45 I'm gonna right-click on it again and say Remove Duplicates.
- 04:49 And I now have a beautiful little column that shows me just the unique items,
- 04:53 that's great.
- 04:54 I now need to do the same with my locations.
- 04:56 So again I'm gonna go to my Staging-Sales.
- 05:00 Right click, and Reference it.
- 05:03 And in this case, I'm gonna pull over all the information around the location.
- 05:07 So we're gonna grab the Location all the way through to the Longitude.
- 05:10 Right click, remove other columns, and I've got lots of repeating rows.
- 05:15 So why don't I grab the location name only,
- 05:19 right click and then say Remove Duplicates.
- 05:22 And I've now got a nice three line table that shows me just the locations I'm
- 05:25 looking for, that's perfect.
- 05:27 So I'm gonna go and call this one Locations.
- 05:32 And again, I'll move it into my nice little group and just because I kinda like
- 05:36 to have everything ordered nicely in alphabetical order.
- 05:39 I'm gonna move my Sales down as well.
- 05:40 So I now have four tables.
- 05:41 I've got my Budgets, which is the original budget.
- 05:44 I have Categories, which is my unique list of categories.
- 05:47 A unique list of locations with the columns that are needed for
- 05:50 those that are specific to that.
- 05:51 I've also got my Sales table.
- 05:54 But wait, look,
- 05:55 I've got a whole bunch of information about these columns I don't really need.
- 05:59 Now, this information is all in the Staging-Sale's table, but
- 06:04 it's also in the end load.
- 06:06 Because this one is pulling from staging, I can now go back and
- 06:11 say these five columns, right click, and remove, why?
- 06:16 Because when I relate the location name to what's in the locations table
- 06:21 I can pull back these particular properties.
- 06:24 So I'm done now.
- 06:25 I can say Close and Apply.
- 06:28 And it will load into my model, which is great.
- 06:32 And here we go.
- 06:33 And now, I can click on the relationship view and make sure that I have some nice
- 06:38 relationships between my tables that are setup in one to many fashion.
- 06:42 And if I go and take I look now, you can see that everything looks pretty good.
- 06:46 I've got one too many relationships flowing bound on both sides.
- 06:50 Now, let's go back to our model for a second and see.
- 06:53 Nothing has been fixed.
- 06:55 Well, the reason is all around the category that's being used on this chart.
- 07:01 Let me just go to the categories table here, uncheck this Category and
- 07:06 put this Category on the model instead,
- 07:08 and look how much nicer that actually looks and works.
- 07:12 I can now remove these two charts here.
- 07:16 This actually shows the importance of going back into in doing one more thing.
- 07:20 When we go back into our relationship view,
- 07:23 if we find the one to many joined and right click and I do side of the many,
- 07:28 go to this join, just right click and hide the category.
- 07:32 Again we're always doing hiding the star side.
- 07:35 All right, that one in report view and the location.
- 07:39 When we go back over to our model tables now,
- 07:41 you'll notice that the location field does not show on budgets or
- 07:45 in sales it only shows under location as does the category field.
- 07:49 So this prevents you from getting into a situation where you put the wrong thing on
- 07:53 your models
Lesson notes are only available for subscribers.