Locked lesson.
About this lesson
We discuss how to merge and join data frames together in this video.
Exercise files
Download this lesson’s related exercise files.
Concatenating, Merging, and Joining part 2.docx57.2 KB Concatenating, Merging, and Joining part 2 - Solution.docx
57.4 KB
Quick reference
Concatenating, Merging, and Joining part 2
Merging and Joining allow us to combine data based on the specific data in a DataFrame.
When to use
Use Merging when you have a column in common (like a key column). Use Joining when the dataFrames have different indexes.
Instructions
To Merge two or more dataframes based on a similar column (such as Key):
pd.merge(left_df, right_df, how='inner', on='Key')
To Join two or more dataframes with different row indexes:
left_df.join(right_df)
Hints & tips
- Merge: pd.merge(left_df, right_df, how='inner', on='Key')
- Join: left_df.join(right_df)
- Join Inner: left_df.join(right_df, how='inner')
- Join Outer: left_df.join(right_df, how='outer')
- Join Left: left_df.join(right_df, how='left')
- Join Right: left_df.join(right_df, how='join')
- 00:05 Okay, in the last video we looked at concatenating just a way to combine two
- 00:08 different data frames two or
- 00:10 more data frames together without giving it a whole lot of thought.
- 00:13 We're not doing it based on any sort of logic or anything,
- 00:16 or really based on any data inside the data frame.
- 00:18 We're just slapping them together.
- 00:20 In this video, I want to talk about merging and joining.
- 00:23 And with merging and joining, you're going to merge or
- 00:26 join based on the data inside the data frame, right?
- 00:28 So let's just get right to this here.
- 00:31 So I'm going to paste in a couple of data frames that I created.
- 00:34 And I'm calling the first one left data frame and the second one right data frame.
- 00:39 And you'll see we have columns of Monday, Tuesday and key in the first one and
- 00:44 columns of key Wednesday and Thursday in the second one.
- 00:48 And I stuck with the same convention where the actual row data is M1,
- 00:53 M2, M3, based on Monday, T1 for Tuesday, K1 for key, right?
- 00:58 So you'll see right away, we can run these, we can go left_df.
- 01:06 And we can go right_df.
- 01:08 And you'll notice I didn't create any specific row indexes I just left
- 01:13 the default 0, 1, 2, 0, 1, 2.
- 01:15 And but you'll notice here each of these dataframes has this key column in common,
- 01:21 right?
- 01:21 Each one has a K1, K2 and K3, K1, K2 and K3.
- 01:25 So, we may want to merge these two dataframes and
- 01:29 we want to line them up based on the key column, right?
- 01:33 And if you've done any work with databases,
- 01:35 especially SQL databases, you've often merge based on a key, right?
- 01:40 Each record in your database will have its own key value its own primary key.
- 01:45 And so you'll combine different tables based on a key or a primary key.
- 01:49 That's what we're doing here and the actual syntax is very similar.
- 01:54 We're going to join and merge based on SQL syntax of inner, outer left or right.
- 02:00 If you're familiar with SQL syntax for joining things and merging things.
- 02:03 If you're not, it's no big deal, you don't have to know this stuff.
- 02:06 But if you're curious where this syntax is coming from,
- 02:08 they took it straight out of SQL.
- 02:10 So, let's get rid of each of these and let's merge these.
- 02:16 So to merge one or more data frames we just call pd.merge.
- 02:21 And it's a function and we want to pass in whatever we want to merge.
- 02:24 So let's merge the left data frame and the right data frame.
- 02:32 And now we have to designate how we want to join these,
- 02:34 how we want to merge them.
- 02:35 So how = inner.
- 02:38 Now you can merge by inner, outer left or right.
- 02:42 And those are pure SQL things.
- 02:44 So almost always you're going to do inner.
- 02:46 So I'm not even going to talk about what those other things mean,
- 02:48 I'm not even going to talk about what inner means.
- 02:50 It's just the default one we're always going to use so
- 02:54 we'll just go with that for now.
- 02:56 So inner and then you have to tell it what do you want to merge it on.
- 02:59 So we want to merge it on our key column.
- 03:02 So if we do that boom, we get Monday, Tuesday,
- 03:05 Wednesday, Thursday, and the key column is the same for both of them.
- 03:10 So we only have one column of key, so that's merging.
- 03:13 Pretty simple if you have a column in common, a key column or
- 03:17 a primary id column or something like that,
- 03:20 anything similar between these two dataframes we can merge on that column.
- 03:24 If our data is a little bit different,
- 03:25 we may want to join and join is similar to merging, but
- 03:29 with join allows us to sort of merge data frames that may have different indexes.
- 03:34 You notice up here when we merge this one, it had the same index 0, 1, 2.
- 03:39 What if our indexes were different?
- 03:40 Well, let's look at that.
- 03:42 I'm going to paste in two more dataframes, a new left and a new right one.
- 03:47 And again, we're going to have Monday and Tuesday.
- 03:50 I left off Wednesday just to keep it simple.
- 03:52 So Monday, Tuesday in the first one, Wednesday, Thursday in the second one.
- 03:56 Notice there's no key column in this one but
- 03:58 our indexes I changed I didn't take the defaults.
- 04:01 The first one is 1, 2, 3, the second one is 1, 3, 4.
- 04:04 So let's just look at these real quick left_df and
- 04:10 right_df.
- 04:13 So you notice the indexes are different 1, 2, 3 versus 1, 3 and 4.
- 04:17 And our columns are different Monday, Tuesday, Wednesday, Thursday.
- 04:20 So we want to merge these together, we want to join them, but
- 04:24 there's different indexes.
- 04:26 How do we do this?
- 04:28 Well, we can go left_df.join and then designate
- 04:33 what we want to join it to right_df and when we do that,
- 04:39 we see Monday, Tuesday, Wednesday, Thursday.
- 04:45 1, 2, 3, notice 4 is gone, because there was no 4 in common.
- 04:51 And you'll notice our second data frame didn't have a row two.
- 04:54 It's just 1, 3 and 4.
- 04:56 So when we join these together and we have a row two,
- 04:59 we just get null values for Wednesday and Thursday because it didn't have a row two.
- 05:04 So that's kind of cool.
- 05:06 We can also inside of here specify how we want to do this.
- 05:09 So again, we could go inner which is this and
- 05:13 then we just get the two that is similar 1 and 3.
- 05:17 Each one has 1 and 3.
- 05:19 We could go outer and then we get all of them.
- 05:24 We could go left, this is what we got, I think the first time by default and
- 05:29 we can go right and you get different values based on what you choose.
- 05:34 So very interesting and that's how you join.
- 05:36 So right now we went over concatenating merging and joining pretty quickly, but
- 05:40 as you use them more often going forward, they become pretty easy, and
- 05:44 they're not really that difficult.
- 05:45 So that's all for this video.
- 05:46 In the next video, we'll start to look at operations.
Lesson notes are only available for subscribers.