Locked lesson.
About this lesson
In this module we'll explore how to build dimensional tables that solve the m:m (many to many) linking problem, allowing you to create much more robust models.
Exercise files
There are no related exercise files for this lesson.
Quick reference
Solving m:m Joins via Table De-Aggregation
Creating new tables in order to link tables via one to many relationships.
When to use
Use when you are trying to create relationships, but both tables have many repeating values in the columns you wish to use for the relationship.
Instructions
Before you start
- Creating the new dimension table
- Create a new query by referencing one of the Staging tables
- Select the column you wish to use for linking
- Right click the column --> Remove Other Columns
- Right click the column --> Remove Duplicates
- Set the data type(s)
- Rename the table with a descriptive name
- Load the new table to the model
Create the relationships
- Link the new table to each of the original (not Staging) tables
- Hide the relationship field on the Many side of the relationship
- Update any visuals that used fields from the column you used for creating the relationships
Hints & tips
- You don’t always need to remove all other columns when creating your new table. If it makes sense to take extra fields, then go back to the finalized table and remove the columns you pulled to the dimensional table
- 00:05 Unfortunately, not every many-to-many issue is one that can be solved with
- 00:09 composite keys.
- 00:10 There's another process which we can use,
- 00:13 which allows us to create dimensional tables via what we call deaggregation.
- 00:17 This particular model is a perfect example of this.
- 00:20 We have a Sales table, and inside the Sales table,
- 00:23 we have got approximately 38,000 rows.
- 00:26 Within those 38,000 rows, we have 3 unique locations,
- 00:30 7 unique categories, and 692 unique dates.
- 00:33 What does this mean?
- 00:35 It means the same value must repeat many, many times.
- 00:38 If you've only got 3 locations and
- 00:40 38,000 rows, you know that it's gonna be in there somewhere around 13,000 times.
- 00:44 So that's a many on this particular side.
- 00:48 Likewise, in the Budgets table, we've got 432 rows.
- 00:51 And, again, the same 3 unique locations.
- 00:53 This time we only have 6 unique categories, so 1 must not have been
- 00:57 budgeted, and we've got 24 unique dates that span a couple of years.
- 01:02 Now the challenge is, what we really want to do,
- 01:05 is we want to link these fields together.
- 01:08 But naturally, with a many-to-many, when you try and
- 01:10 do this, you're going to get an error message since it's not supported.
- 01:14 At this point, you'll be told, hey, we can't create the relationship because
- 01:18 one of these columns needs to have unique values on one side, and they don't.
- 01:21 So this is the first problem that we're going to end up running into.
- 01:26 So what do we actually need?
- 01:27 We need tables that have unique entries in between our Sales and Budgets table.
- 01:33 That would allow us to get the one-to-many joined.
- 01:35 So if we still have our 38,000 records on one side and
- 01:39 we still have our 432 records on the other side, can we somehow come up with a way to
- 01:44 get us 3 rows with 3 unique locations on our Locations table, and
- 01:49 7 rows with 7 unique categories on our Categories table?
- 01:52 Does it matters that there's 7 categories here and only 6 in the Budgets table?
- 01:56 No, there's things that don't match, that's fine.
- 01:59 As long as we have the ones we need in the Categories table
- 02:02 then we can get a one-to-many join.
- 02:04 So, this is what we're actually after.
- 02:06 How do we do it?
- 02:07 Well, let's take a look at the model that we've built so far.
- 02:11 What we have so far is we have a data source that feeds into a sales
- 02:16 query inside the power query engine in Power BI.
- 02:19 That lands in the Sales table that feeds our model,
- 02:23 which is then fed into our visuals.
- 02:25 In order to make this work,
- 02:27 what we're going to do is we're going to actually split the sales query so
- 02:32 that we end up with what we call a staging query, which feeds into our sales query.
- 02:38 We're then going to take our sales query and say, you know what?
- 02:41 Don't load this, because we don't really need a separate
- 02:44 table showing up in the model called staging, it's gonna feed into sales.
- 02:48 Why do we do this?
- 02:49 We do it because it allows us to actually go and start building out more logic.
- 02:55 We could go back and say hey look, you know what?
- 02:57 In that Sales table, we have of all of the locations we need.
- 03:01 Now granted, we've got 38,000 rows, but
- 03:04 could we use a query to go and pull out and extract just the unique locations?
- 03:09 So that we could land that in a Locations table,
- 03:12 which we could then use in visuals and link into to our model?
- 03:16 And the answer is, yes, we can.
- 03:19 There's some other stuff that's in that Sales table as well, or
- 03:21 in the Staging table that we create.
- 03:24 And that is that we also know about the unique categories.
- 03:28 So why not go and create a query that calls the Staging query, which,
- 03:31 again, isn't loaded into the eventual tables, and says,
- 03:34 let's go and distill this down to get just our unique categories.
- 03:38 We could then load that out to a Categories table, which, again,
- 03:42 could then be linked into the model and used in our visuals.
- 03:45 So this is the way that we can actually go back and
- 03:47 build that particular structure that we need.
- 03:49 And in the next video, this is exactly what we're going to do.
- 03:53 We're gonna go back and retrofit a model to use this approach so
- 03:57 that we actually have the specific dimensional tables we need
- 04:00 to create a good model to serve visuals in a good way.
Lesson notes are only available for subscribers.