Locked lesson.
About this lesson
In this module we will explore another common Many-To-Many join issue which can be solved by creating a "Bridge" table.
Exercise files
Download this lesson’s related exercise files.
Solving Many-To-Many Joins with Bridge Tables - Theory.docx66.7 KB Solving Many-To-Many Joins with Bridge Tables - Theory - Solution.docx
66.7 KB
Quick reference
Solving Many-To-Many Joins with Bridge Tables - Theory
Bridge tables allow us to create separate tables with unique keys in order to link tables via One to Many relationships.
When to use
When you are trying to create relationships between tables, but both tables have many repeating values in the columns you wish to use for the relationship.
Instructions
Create a Staging Query
- Edit the Power Query that creates the table which has been loaded to Power Pivot
- Make sure that the final step of the query is to set the data type for each column
- Right click on the final step --> Extract Previous
- Name the new query “Staging – “ and the original table name
- Close and load the query as a connection only, but do not choose to load it to the data model
Create the new Dimension table:
- Create a new query by referencing the newly created Staging query
- Select the column(s) you wish to keep --> right click --> Remove Other Columns
- Right click the column you wish to use as the “one” side of the relationship -> Remove Duplicates
- Rename the table with a descriptive name
- Load the new table to the Data Model
Edit the Original query
- If you kept more than one column when creating your new Dimension table, remove the additional columns from the original table
Create the relationships:
- Link the new table to each of the original tables (not the staging query tables)
- Hide the relationship field on the Many side of the relationship
Hints & tips
- If there could be values in each table to be linked that don’t appear in the other, it may be a good idea to create two staging queries, append them, and then remove duplicates again to create your dimension. This way you will ensure you always have all entries in the bridge table
- This technique can be combined with the Composite Key technique covered in the Solving Many-To-Many Joins with Composite Keys section
- 00:04 There's another type of many-to-many problem that can sometimes occur that
- 00:09 doesn't have anything to do with actually building composite keys.
- 00:13 Let's take a look at this scenario.
- 00:14 We've got two tables here, one of Sales and one of Budgets.
- 00:18 The challenge we have is that there is 38,000 and
- 00:20 somewhat rows in the sales table.
- 00:22 And we've got three unique location names.
- 00:25 We've got seven unique categories, 692 unique dates.
- 00:29 In the budgets table, we have 432 rows.
- 00:32 We still got three unique locations and
- 00:34 this time only six unique categories in the budgets and we have 24 unique dates.
- 00:39 The challenge is, is that the only logical keys to merge these guys are those fields.
- 00:45 And everyone one of those would be a many to many join which you've already seen
- 00:49 is not supported by Power Pivot.
- 00:52 So what do you do?
- 00:54 And the answer is that you actually have to go out and
- 00:56 spin off your own tables of unique entries to bridge these two tables together.
- 01:02 And the way that that looks is something like this.
- 01:04 What we want to do is create a separate locations table and
- 01:07 a separate categories table that sit in between sales and budgets.
- 01:12 So that we can actually create the one to many relations we need.
- 01:16 In this case, the Sales table is still the same old table.
- 01:19 It's got a same 38,000 rows with the same three unique locations, and
- 01:22 seven unique categories, and 692 unique dates.
- 01:26 Budgets, it's same 432 row table with all the same characteristics it had before.
- 01:31 But what we've done is we've actually spun out a separate location table
- 01:35 that only has three rows.
- 01:38 It has the three unique locations that exist on both the Sales and
- 01:42 the Budget's table.
- 01:43 And the category's table?
- 01:45 It's got seven rows, which lists the seven unique categories which
- 01:49 are both common between the Sales and the Budgets table.
- 01:55 So how do you go about doing something like that?
- 01:57 I mean, you connect to these tables in the database,
- 01:59 there's really not a lot you can do.
- 02:00 Well, guess what?
- 02:01 Again, this is why Power Query is so
- 02:05 important to use when you're going and collecting your data.
- 02:07 Because with Power Query, we can absolutely do the spinning off individual
- 02:12 tables from the raw data source, even if they don't originally exist.
- 02:17 How do we do it?
- 02:18 Well, let's take a look at our query chain.
- 02:22 We know that when we start with our data, there's always that first.
- 02:26 And then we are creating here a sales query.
- 02:29 That is what the little guy is with the x's and
- 02:31 the manipulation that we are doing there to turn that into query.
- 02:35 That query gets landed into a Sales table which then gets landed into
- 02:39 the Power Pivot, Pivot Table.
- 02:41 So the Sales table is in the data model.
- 02:44 The ales transformation is the query itself and Power Query.
- 02:47 What we do is we go back into Power Query and
- 02:50 we actually break the sales query into two separate components.
- 02:55 So it's gonna split like this, and we call the first one staging.
- 02:58 So anything that we can find that's common we actually break out into the staging
- 03:02 query.
- 03:03 And then we say,
- 03:03 all right, I'm gonna use my sales query is gonna point back to the staging query.
- 03:07 And then say I'm gonna do a little bit more work here before I land this into
- 03:10 the sales table in Power Pivot.
- 03:13 Why is that important?
- 03:14 Well, because I can then reference the staging query and
- 03:19 create a new query to say give me just the locations.
- 03:23 Remove all the extraneous columns.
- 03:25 Remove all the duplicates and
- 03:27 create a table in Power Pivot that's only got the three location rows.
- 03:31 With that, I can now link it into the model and
- 03:34 then use it in the Pivot Table as a separate dimension.
- 03:38 Likewise for my categories, I can go back and reference the staging query again,
- 03:42 remove all of the columns that don't have anything to do with categories.
- 03:45 I can then filter it down to get rid of everything that's a duplicate,
- 03:50 land that into a Power Pivot table for categories, link it into the mode, and
- 03:54 I now have a dimensional table to drive my Pivot Table.
- 03:58 So, this is a fantastic use of Power Query to be able to reshape data
- 04:03 on the fly as needed when those individual tables are not accessible.
- 04:08 The reason this is so important is because, if your connecting to a database,
- 04:13 it's very likely that you can get to the original locations,
- 04:16 sales, categories table.
- 04:18 But what if you're connecting to a database that doesn't have those broken
- 04:21 apart, or you're only allowed to connect to a view or we haven't looked at this but
- 04:26 what if that data source was actually a text file?
- 04:29 You can totally do that with Power Query, connect with text files,
- 04:32 spin seven tables out of the one text file,
- 04:34 and boom, you still got the ability to actually go through here.
- 04:37 So this tends to speak to why Power Query becomes very important and
- 04:41 why you should learn more about it.
- 04:42 And what we're gonna do is we're gonna actually take a look at combining
- 04:45 this technique with the compass key technique,
- 04:49 to fix the Mulligan's model that we've been playing with.
Lesson notes are only available for subscribers.