Locked lesson.
About this lesson
Power Query adds the ability to reach into a folder and append all files in the folder based on the transformations of a single file. Before you can leverage it, however, you need to learn the background, and that's what this lesson is all about.
Exercise files
There are no related exercise files for this lesson.
Quick reference
Append All Files in a Folder - Theory
An introduction on how to import and combine all the files in a folder at once.
When to use
When you want to append data from each file in a folder without having to process each file manually.
Instructions
The Combine Binaries process
- Create a FilesList staging query
- Connect to the folder
- Filter to only the desired file types to be combined
- Load the query as a Connection Only query
- Combine the binaries
- Reference the FilesList query
- Rename the new “Master” query
- Click the Combine Binaries button
- Delete the Changed Type step (to prevent it erroring after modifying the Transform Sample)
- Modify the Transform Sample
- Select the Transform Sample query
- Transform the data as needed
- Finalize the Master query
- Return to the Master query
- Modify the Removed Other Columns1 step to keep any desired file properties
- Add any additional transformation steps required to the Master query
- Set the Data Types
- Finalize the query by loading it to the desired location
This Combine Binaries setup allows us to
- Only hard code the file path into the solution once (the Source step of the FilesList query)
- Apply transformations before the data is combined in the Master Query (pre-aggregation)
- Apply other transformations after the data is combined in the Master Query (post-aggregation)
- Preserve file properties that may not be part of the data files being combined
Hints & tips
- The only queries that we ever modify in this technique are the:
- File Listing
- Transform Sample
- Master Query
- 00:05 As great as it is to have the ability to append individual tables,
- 00:08 what if you have an entire folder full of files to work through?
- 00:12 Maybe a whole bunch of CSV's or text or even Excel files.
- 00:16 Are you gonna import each one and transform each one and
- 00:18 then append them individually?
- 00:20 Man, that would take a long time.
- 00:22 So, it's great that Power Query actually has the ability to import and
- 00:25 combine an entire folder full of files, all at once.
- 00:30 It starts like this, we browse for files in a folder,
- 00:32 we click one button to combine them and that creates a master
- 00:37 transform which is where we started, and injects a sample transform in between.
- 00:43 The purpose of the sample transform is that you get one file out of all of
- 00:48 the ones that you have listed as a sample to use to make your transformations.
- 00:52 And then all of those same steps will be applied to every file in the folder before
- 00:57 being appended into the master transform altogether.
- 01:00 That's a great set up and it actually works really well but I do recommend that
- 01:03 you make a couple of changes to it, rather than just getting files from a folder and
- 01:07 just clicking the button right away, to make these beautiful combines.
- 01:10 What I'll do is, I'll make my file to get files from folders, and
- 01:15 I'll rename that query as FilesList.
- 01:17 And I'll load it as a staging query, connection only.
- 01:21 At this point what I'll do is I will reference that and
- 01:25 then run my one-click combine.
- 01:27 That's gonna inject the sample transform and before I finish with that,
- 01:32 I'm gonna delete the change type step in the master transform,
- 01:35 then modify the sample, and then things will work out relatively well.
- 01:38 Now, it sounds a little confusing, I promise we'll go through all this.
- 01:42 The reason why I do this is this setup actually provides a really easy entry
- 01:46 point in the FilesList.
- 01:47 So if somebody throws a different type of file into this thing,
- 01:51 the files list query gives me easy point to figure out what's going on.
- 01:55 It also means that the file path only gets hard coded into the solution once and
- 02:00 since sharing is kind of one of the things we tend to do, it would really be kind
- 02:04 of nasty to actually have to get people to update in two separate places.
- 02:07 Especially because if you follow this set, when you update the second file path,
- 02:12 it actually adds new steps and breaks the whole things, so you don't wanna do that.
- 02:15 So this is a much better method.
- 02:18 So, what's the sample transform all about?
- 02:20 This is the query where we get to go and deal with things on a file level.
- 02:24 So transformations that need to happen before we actually append things.
- 02:28 So you think, what does that mean.
- 02:30 Well, some examples are where you open a file and you may need to split it into
- 02:34 different columns or remove garbage rows and columns.
- 02:37 Maybe you need to get rid of some of the unnecessary data or
- 02:40 unpivot or group the data before the stuff gets appended.
- 02:44 That's the purpose of the sample transform.
- 02:47 We get one file, we can work through all of the steps we need to do to prepare
- 02:51 the data for appending.
- 02:54 That's what the purpose of the sample transform is.
- 02:57 Now, I'll warn you that when you actually start working with this and
- 02:59 you click the the one click combine, things get a little bit concerning.
- 03:02 You're gonna start with this file folder, you're gonna get your files list.
- 03:08 And then when you click this button, it creates a whole bunch of new things.
- 03:11 It creates a sample file parameter, it creates a sample file,
- 03:15 it creates a transform sample query, it creates a transform function query.
- 03:19 And this can be a little bit overwhelming when you've clicked one button and
- 03:22 all of a sudden boom, all this stuff happens on the left hand side.
- 03:26 The purpose of this though is for you to use the transform sample.
- 03:30 When you've made your changes in there,
- 03:32 it automatically copies the steps into the transform function which is then applied
- 03:37 to every file in the file listing before being appended in the master query.
- 03:41 And if you're thinking holy smokes, how am I supposed to remember which one to use?
- 03:45 Well, it's pretty much this, you use the file listing, the transform sample,
- 03:49 and the master query.
- 03:51 And I'm gonna show you how you can identify which one to actually deal with,
- 03:55 the other three we essentially never touch.
- 03:59 This setup, as complicated as it looks,
- 04:01 actually allows us to do a bunch of different things.
- 04:03 It allows us to do transformations before we actually aggregate,
- 04:06 things like filtering, grouping, unpivoting.
- 04:09 It allows us to do transformations post-aggregation.
- 04:11 So once we get them in the master query, we can apply further steps to our data.
- 04:16 It also allows a very easy way to preserve file properties.
- 04:21 And that is actually a really, really useful thing as well especially if you
- 04:24 have properties that aren't maybe in the data of the file but
- 04:26 are contained in something like the file name.
- 04:29 The master query itself has its own purpose.
- 04:33 This is for
- 04:34 use on transformations that can be applied to every row in the dataset.
- 04:37 Or transformations that have to take
- 04:40 place on the entire dataset rather than individual pieces.
- 04:43 Maybe you need to group the entire data set together.
- 04:46 You would prepare all the individual files and
- 04:48 then do grouping in the master query once everything is there.
- 04:52 Some helpful tips that you may wanna remember, to preserve file properties we
- 04:56 always modify the removed other Columns1 step in the master query, right?
- 05:00 So that's always what's done.
- 05:01 And I'll show you where the happens.
- 05:04 We also may need to remove and
- 05:06 replace the change-type step if it errors after modifying the sample transform.
- 05:11 We actually always have to add a change-step anyway because the data types
- 05:16 don't carryover from the sample transform query.
- 05:19 So, oftentimes, honestly, it's easier to just delete the Changed Type step in
- 05:23 the Master Query, before you start making changes.
- 05:25 That's gonna save you seeing an error, and
- 05:27 will actually allow things to work relatively easily.
- 05:31 One caveat I do want to throw out here, though, if you're working with the Excel
- 05:36 2016 non-subscription version, things are gonna look very different for you.
- 05:41 You will still actually combine all the files in the folder like this, but
- 05:44 you're not gonna get any of the transform samples queries being made.
- 05:47 That's something that will happen when you get into the subscription basis.
- 05:50 So for you, it's essentially like buying everything in advance, and
- 05:54 just using the master query
Lesson notes are only available for subscribers.