About this lesson
The groupby function allows us to group different pieces of data together. We'll discuss the function in this video.
Quick reference
Groupby Grouping Data
The .groupby() function allows us to group data in our DataFrame together.
When to use
Use .groupby() whenever you want to split a DataFrame into groups of rows that share a value, then work with each group.
Instructions
To group data in a DataFrame by a certain column:
variable = df.groupby('Column')
Once grouped, we can run aggregate functions such as max, min, mean, std, var, sum, describe, etc. on each group:
variable.sum()
variable.min()
variable.max()
variable.mean()
variable.var()
variable.std()
variable.describe()
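The steps above can be put together in a minimal sketch. The column names and corporations follow the video, but the employee names other than John and Steve, and the individual salaries (in thousands), are assumptions chosen to match the totals mentioned in the video. The Salary column is selected before aggregating because, depending on the pandas version, calling mean() on a group that still contains a text column may raise an error rather than skip that column.

```python
import pandas as pd

# Hypothetical data modeled on the lesson's example. Most employee names and
# the per-person salaries (in thousands) are assumptions for illustration.
data = {
    "Corporation": ["Amazon", "Amazon", "Tesla", "Tesla", "Apple", "Apple"],
    "Employee": ["John", "Steve", "Maria", "Priya", "Alex", "Dana"],
    "Salary": [100, 80, 150, 180, 120, 126],
}
df = pd.DataFrame(data)

# Group the rows by the values in the 'Corporation' column.
by_corp = df.groupby("Corporation")

# Select the numeric column, then aggregate once per group.
salary_sum = by_corp["Salary"].sum()
salary_mean = by_corp["Salary"].mean()

print(salary_sum)
print(salary_mean)
```

Groups appear in sorted order of the grouping column, which is why Amazon comes before Apple and Tesla in the result.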
Hints & tips
- df.groupby('Column')
- 00:05 Okay, in this video, I want to talk about grouping by.
- 00:08 And group by allows us to, just like it sounds, group different data together.
- 00:12 And a lot of times we're going to group by column and
- 00:15 then do something with that grouped data.
- 00:17 So I've just created a basic data frame here, and I created some data.
- 00:23 And we're going to have columns of corporation, employee, and salary.
- 00:27 And the values in there are going to be the corporations.
- 00:32 There's a list of the corporations we're going to have.
- 00:34 For the employees, there's a list of employees.
- 00:36 And salaries, there's a list of their salaries.
- 00:38 Assume this is like in thousands, right?
- 00:41 So, you know, for 100,000, I put 100, for 80,000, 80, etc.,
- 00:45 just to make it easier to look at.
- 00:46 So then we can run this and just create a data frame like we've done before and
- 00:51 pass in that data.
- 00:52 And if we look at this, we can just see we've got Corporation, Employee and
- 00:57 Salary, we got some Amazon, Tesla and Apple, some employees and their salaries.
- 01:02 So let's say we want to group by the company, right?
- 01:05 We can go df.groupby, and then just call whatever column,
- 01:11 so we got the corporation.
- 01:13 Now, if we run this, we get this DataFrameGroupBy object, and
- 01:17 this is just a location in memory.
- 01:20 So to actually do something with this, let's add this to a variable.
- 01:26 Now, we can also just call this, and we'll get that same object, but
- 01:31 now we can do stuff to this variable.
- 01:33 So say, for instance, we want to find the sum of all the salaries for
- 01:38 each company, we could call the sum function.
- 01:42 Remember, we looked at these functions way back at the beginning of the course.
- 01:45 So the sum of Amazon salaries is 180.
- 01:48 So we can confirm that 100 plus 80 is 180.
- 01:50 Apple's 246, Tesla's 330.
- 01:53 We could find the average, the mean, right?
- 01:59 90, 123, and 165, very cool.
- 02:02 Now notice, it's not putting out anything for employees.
- 02:05 And that's because our Employee column, these are strings, right,
- 02:08 this is text, and Python can't find the average of text, right?
- 02:12 It needs numbers.
- 02:12 But interesting, we can call the max function, remember the min and the max?
- 02:17 And we do get Employee column spit out for that.
- 02:20 And that's because Python can take strings and
- 02:24 find the max alphabetical value, so the highest alphabetical value.
- 02:29 So for instance, Steve, S comes later in the alphabet than J, John.
- 02:34 So the max would be Steve, kind of interesting, right?
- 02:36 We could do the same thing for min and we get that.
- 02:40 All kinds of things we could do.
- 02:42 We could do standard deviation, right,
- 02:44 if we wanted to find the standard deviation of our salaries or the variance.
- 02:47 You could do that.
- 02:51 All those universal functions we looked at. We could count, right?
- 02:56 So there are two employees per company.
- 03:01 So it's 2 and there are two salaries per company, so that's 2.
- 03:05 Very cool, so I'm going to put this back to mean.
- 03:07 So that's how to group by.
- 03:09 There's one more I want to show you, and that's describe.
- 03:11 This is kind of a good one because describe,
- 03:14 it spits out all that information in one big thing.
- 03:18 So we can find the count, the mean, the standard deviation, the min.
- 03:22 It looks like quartiles and the max.
- 03:26 So very, very cool.
- 03:27 And just a fun thing to spit out,
- 03:29 just to get a description of all your data in there.
- 03:32 So that's group by.
- 03:35 Pretty simple, but pretty powerful, and
- 03:38 you're going to use things like this a lot.
- 03:40 So in the next video, we'll look at merging, joining, and concatenating.
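The later steps of the walkthrough (max and min on text columns, count, and describe) can be sketched as follows. As before, the corporations and column names come from the video, while most employee names and the individual salaries are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the column names, the corporations, and the names
# John and Steve come from the video. Salaries are in thousands.
df = pd.DataFrame({
    "Corporation": ["Amazon", "Amazon", "Tesla", "Tesla", "Apple", "Apple"],
    "Employee": ["John", "Steve", "Maria", "Priya", "Alex", "Dana"],
    "Salary": [100, 80, 150, 180, 120, 126],
})
by_corp = df.groupby("Corporation")

# max() and min() also work on text: strings are compared alphabetically.
# Each column is aggregated on its own, so the Employee and Salary values
# in one result row do not necessarily belong to the same person.
max_rows = by_corp.max()
min_rows = by_corp.min()

# count() reports the number of non-missing values per column in each group.
counts = by_corp.count()

# describe() on the numeric column gives count, mean, std, min,
# the quartiles, and max for every group in one table.
summary = by_corp["Salary"].describe()

print(max_rows)
print(counts)
print(summary)
```

With this sample data, the Amazon row of max_rows pairs Steve (the alphabetically last name) with 100 (the highest salary), even though in the data that salary belongs to John.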