Locked lesson.
About this lesson
When building models, many users pull in data that is far too granular, resulting a many more records than they need. Power Query offers a tool to group that data for you, so that you aren't wasting storage space on unnecessary records.
Exercise files
Download this lesson’s related exercise files. You can download source data files for the course from the resources section of your Lessons page.
Grouping Data.xlsx19.4 KB Grouping Data - Completed.xlsx
25.8 KB
Quick reference
Grouping Data
Grouping data allows aggregating tables of data to get to a summarized level.
When to use
Use when you have an excess of detailed data and only need the summarized version for your analysis.
Instructions
Getting started
- Create a new query to bring your data to Power Query
- Go to the Transform tab --> Group By --> Advanced
Adding/modifying grouping levels
- Choose the field to group by from the “Group By” drop-down
- To add new grouping levels click Add Grouping
- Mouse over the right side of the drop-down and click the ellipses (…) to
- Change the grouping order
- Delete a grouping level
Adding/modifying aggregation columns
- In the bottom half of the Group By dialog
- Enter the name you wish for your aggregated column
- Choose the type of aggregation for your column
- Choose the column you wish to aggregate
- Add new aggregation levels by clicking Add Aggregation
- Mouse over the right side of the aggregation and click the ellipses (…) to
- Change the grouping order
- Delete a grouping level
Hints & tips
- Changing a grouping operation will not update the new column name for you
- You can set as many grouping levels or aggregations as you like
- There is no need to remove columns you do not wish to preserve before applying grouping as only the grouped columns will remain after grouping your data
- Aggregation operations include
- Calculating the Sum, Average, Median, Min or Max of the column
- Counting the rows or Distinct rows of data in the grouped column
- Preserving All Rows that made up the grouped data (for later expansion)
- 00:04 I'm usually a fan when I'm trying to do some serious grouping of
- 00:08 bringing in all the data and then using a pivot table to group it.
- 00:13 But sometimes you connect to tables that are big enough that you really only need
- 00:17 a summary, so it doesn't make sense to bring all the data across.
- 00:22 And in that case, we should actually probably use power created group things
- 00:25 because we don't need to bring all the data to Excel and land it in Excel first,
- 00:29 consuming variable resources and taking extra space.
- 00:34 So to demo this one, we're gonna go and we're gonna create a new query
- 00:37 against a database, and we're gonna use an Access database.
- 00:42 We're gonna drill down into our GoSkill sample files.
- 00:46 We're gonna find the data from Database Folder, and we're gonna choose POS, and
- 00:51 we'll say Import.
- 00:53 Now, this will open up the list of tables.
- 00:56 We're gonna grab table cheats,
- 00:57 because that's what the majority of the detail is that we need.
- 01:00 And we'll say Edit.
- 01:02 And once this comes into power query with a nice preview here, we can look and
- 01:06 say hmm, what columns do we actually need?
- 01:10 So I'm gonna expand the table categories because that's the one that I'm really
- 01:14 interested in.
- 01:15 And the reason being is because It has the category description and
- 01:19 the category group.
- 01:21 I'm gonna uncheck the use original column name as prefix and we'll say okay.
- 01:26 So this brings me over my category description, sandwich, breakfast,
- 01:29 non-alcoholic beverages, it brings me my point of sale group as well, and
- 01:33 I've got a bunch of other data in here as well.
- 01:35 Most of which, to be honest with you, I really don't even need.
- 01:40 So if I don't need it, should I start attacking it and removing things?
- 01:44 I'll say, well, you know what?
- 01:45 At the end of the day, no I'm really actually not worried about it because of
- 01:49 the way that the grouping algorithm works.
- 01:51 So you'll notice now if we go to Transform,
- 01:55 we have a big option here to group by.
- 01:58 And when I click on Group By, it's gonna bring me up a dialog.
- 02:02 And you'll notice that because I'd already selected two columns here,
- 02:06 it's immediately taken me into advanced and
- 02:09 said, you must want to actually go and group those.
- 02:12 The ironic thing is, is that I didn't even really do anything here.
- 02:16 Those columns got selected when I expanded my category.
- 02:19 So if I don't want that, I would need to go and actually start removing grouping
- 02:23 layers by hitting this little ellipse to make it delete.
- 02:28 As it happens, I'm actually not unhappy with this at all.
- 02:31 What I'm gonna do though is that I'd really rather group
- 02:34 by the point of sale group being food and alcohol first.
- 02:38 So I'm gonna go and say let's move this one up.
- 02:41 You'll notice that I could add other fields if I wanted.
- 02:44 Should their be another field, I would click to add a grouping level and
- 02:47 I would be able to choose whatever the heck it was that I wanted.
- 02:50 In this case, I'm not really fussed about it so I'm gonna go and
- 02:53 say let's delete that grouping level and get it out of there.
- 02:56 Now down the bottom it says, what´s the new column name that you'd like.
- 02:59 So this is the actual grouping itself, how is it gonna aggregate things?
- 03:03 And it says count, would you like to count the rows?
- 03:06 Well I'm not really sure I´m all that interested in the count of rows,
- 03:10 what I'm more interested in is actually doing a sum.
- 03:13 And it says, well, would you like to sum the idea?
- 03:15 No, I don't wanna sum the idea, I wanna sum the number of units that were sold.
- 03:20 Notice that the new column name does not change, it's still staying with count.
- 03:26 I'm not 100% sure how I feel about this.
- 03:28 I think it might upset me more if I'd already put in my name first and
- 03:33 then it changed it on me when I changed the operation.
- 03:35 But regardless, you will need to manually type your new name there, so
- 03:40 go with units.
- 03:41 I'm also gonna add another aggregation as well for this one, and
- 03:45 notice it doesn't give me a description.
- 03:48 I'm now gonna go and say, you know what,
- 03:51 why don't you give me a sum also of the amount.
- 03:55 And we'll call this one total sales.
- 04:00 And that's really what it takes to make grouping.
- 04:02 At this point what you're gonna notice is something very interesting.
- 04:05 All those extra columns that I had, let's watch what happens when I click OK.
- 04:13 You'll notice that all of those columns got dropped,
- 04:16 now it's not quite as quick as you saw because I cut some of the time out.
- 04:20 It took about 30 seconds altogether to execute.
- 04:23 But at the end of the day, it's actually now grouped all of my alcohols together,
- 04:28 each group my category descriptions together.
- 04:30 It's given me my sum of units and my total sales.
- 04:33 So now I can come back and say, hey, my units in sales,
- 04:36 we're gonna right-click Change Type and set them to Currency.
- 04:40 So we'll give this guy here a new name of group.
- 04:45 We'll give this one a new name of description.
- 04:50 And now it's as simple as going and saying home, close and load, and after
- 04:56 a couple of seconds I have a beautiful new table that's pulling just 16 rows of data
- 05:02 from the database rather than the 350,000 that made up the pre-aggregated numbers.
Lesson notes are only available for subscribers.