Locked lesson.
About this lesson
Importing data from a text file into an Excel table.
Exercise files
Download this lesson’s related exercise files. You can download source data files for the course from the resources section of your Lessons page.
Getting data from Text files.xlsx19.7 KB Getting data from Text files - Completed.xlsx
30.5 KB GoSkills Power Query Source Data.zip
8.6 MB
Quick reference
Getting Data from Text Files
An overview of importing and cleaning data from a text file.
When to use
When you use a text file as a data source for your query and must clean up the (often) messy results so the data can be loaded into the data model.
Instructions
Getting started
- Create a new query --> From File --> From Text/CSV --> browse for the file to import
- When greeted by the preview window, click Edit
Helpful tips for cleaning up text files
Removing irrelevant rows
- Go to Home --> Remove Rows
- Can be used to remove Top x, Bottom x, or even alternating rows
Tips for removing characters from text
- Leading and trailing whitespace: right click the column header --> Transform --> Trim
- Internal whitespace: right click the column header --> Replace Values --> replace 2 spaces with 1
- Non-printing characters: right click the column header --> Transform --> Clean
Splitting data into columns
- Right click the column header --> Split Column
- Options include splitting by delimiter or by a specified number of spaces
- You can also split at just the left most, right most or repeating intervals
Merging columns
- Select the columns to merge (hold down CTRL or SHIFT when clicking to select multiple columns)
- Right click any of the selected column headers --> Merge Columns
- Options include adding a separator character
Other tips
- Use Promote First Row to Headers to give the columns decent names
- Rename column headers by double-clicking the header name and typing a replacement
- To change the data type, click on the symbol and choose the data type you want
Setting data types
- The symbol to the right of the column header indicates the data type (e.g. ABC is text, 123 is whole number, $ is currency, a calendar grid is date, etc.).
- Change a data type by clicking the symbol and choosing a new one
- Set a data type with Locale by clicking the symbol and choosing Using Locale --> choose the Data Type (e.g. Date) and the Locale (e.g. English United States)
Errors
- Appear in green font
- Click on the white space next to Error to display the actual error message
- Remove all rows that contain errors via the Remove Rows function --> Remove Errors
- Errors should be removed as soon as they are caused before other filtering operations
Filtering
- Use the filter arrows to the right of the column header to remove rows
- Filters can be used to remove values (including nulls) but not errors
Hints & tips
- You can go back through the steps applied to the query in the Query Settings window
- Any query step can be deleted by clicking on the “X” to the left of the query step
- It is recommended that you always remove rows with errors before filtering out rows with nulls
- If you filter out the nulls first and then trigger an error, the data after the row on which the error was triggered is cut off and removed, and then not all of your data will be loaded into the model
- You may have to remove errors and nulls multiple times after new steps are applied
- 00:04 We're now gonna look at some of Power Query's amazing features for
- 00:08 data clean up, cuz we're gonna import this awful-looking text file.
- 00:11 It's not delimited by anything in particular,
- 00:14 you'll notice it has got non-printing characters in it as
- 00:17 recognized by these little up arrows in Windows 10.
- 00:19 It's got repeating header rows and garbage,
- 00:22 data, all kinds of things and it goes on for we don't know how long.
- 00:27 This is an ASCII style text file print out and it is beautifully formatted to look
- 00:31 great on a dot matrix printer which of course we all still have in our office.
- 00:36 Trying to import a file like this is a painful exercise.
- 00:39 If it's five pages, you'll see people scrolling down and deleting rows and
- 00:43 whatnot, but that doesn't scale well for 500,000 pages.
- 00:46 So let's go look at how we can use Power Query to pull this in.
- 00:50 Once again, we're gonna get data from a file, from a text or CSV file.
- 00:55 And I'm gonna drill down into my GoSkills data,
- 00:59 into data from text and I'm gonna pull in GL Printout for 2006 Q1.
- 01:04 You'll notice that Power Query this time gives me a single column preview,
- 01:09 so this is not starting off so well.
- 01:11 So we're gonna say Edit.
- 01:14 And now, we're gonna need to play around with this and try and
- 01:17 do some different things.
- 01:18 Now, if your data looks like this, don't worry, go to the View tab and
- 01:23 make sure you've checked every single button that's on here.
- 01:26 We particularly want monospaced and show white space.
- 01:30 There were some versions of Power Query that didn't have these very old ones, and
- 01:34 it would actually make your data look like this,
- 01:35 which was really hard to break apart.
- 01:38 So monospace puts every character in the same width, and
- 01:41 showing white space gives us a little bit more to actually work with.
- 01:44 And what we're gonna do right now is, we're gonna try and clean this up.
- 01:48 So here is the first thing I wanna look at.
- 01:50 The first ten rows of this file don't appear to have any value,
- 01:55 they're just headers.
- 01:57 So, there is a feature here called Remove rows, remove top rows,
- 02:01 which will remove the top, in this case, 10 rows of data.
- 02:05 And we'll split those off.
- 02:08 Now we've got a whole bunch of leading white space here, and
- 02:11 possibly some trailing white space here.
- 02:13 I want to get rid of that.
- 02:14 So there is a function for this.
- 02:16 Right click > Transform, we're going to trim it.
- 02:20 Now will remove all leading and trailing white space.
- 02:23 Unlike Excel, it does not trim internal white space.
- 02:27 At this point, I'm gonna transport this into columns.
- 02:30 I'm gonna right click and I'm gonna say, split column,
- 02:34 and I'm gonna split because I don't have a delimiter here, by number of characters.
- 02:40 And I'm gonna make a guess, I'm gonna count number of characters here, 10,
- 02:43 I'm gonna give it a couple extra, 12.
- 02:45 I'm gonna split repeatedly and say OK.
- 02:49 That did not work.
- 02:50 And this is the beautiful thing about Power Query, is we can go back and say,
- 02:54 well, we had a source, I removed the top rows, I trimmed the text,
- 02:58 I split column by position and it automatically added this change type step.
- 03:03 I'm gonna delete these last one, I'm gonna delete that step and
- 03:07 I'm gonna come back to that gear item here.
- 03:10 And I'm gonna try again, I'm gonna try and
- 03:12 split this time here by lets try 15 characters.
- 03:17 And that actually doesn't look so bad, so I'm gonna stick with that.
- 03:20 Now, I've noticed here that these things appear to have been split too far apart.
- 03:27 Actually, all of this stuff here.
- 03:29 Let's take all these and try and put them back together.
- 03:31 So we'll right-click, and
- 03:33 we'll merge these columns back together because, well, no separator, that's fine.
- 03:39 I'll call it Merge, that's okay too.
- 03:42 Because I've noticed that there's dash as the delimiter here.
- 03:46 So I'm gonna now right-click and split column, this time by a delimiter.
- 03:55 And Power Query picks up a custom delimiter of a dash.
- 03:59 But I wanna be careful here because I can see that we have a dash here, but
- 04:02 this looks like it's a category, and then maybe a vendor name.
- 04:06 I don't know, some vendors like to hyphenate their name, so
- 04:09 I'm gonna only split at the left most delimiter here as well,
- 04:12 just in case I get a hyphenated vendor in the future.
- 04:15 So split that apart.
- 04:16 Now that actually doesn't do so bad.
- 04:19 The next thing I'm gonna do is I'm gonna promote my first row to headers to try and
- 04:22 give myself some context.
- 04:24 So I'll click this button here, and then I'm gonna start to do some clean up.
- 04:29 I'm gonna call this first column Date, and I'm gonna call the next column Amount.
- 04:37 This column here looks like it's all full of garbage and blanks.
- 04:42 So I don't think I really need that.
- 04:44 So I'm going to right-click and remove it.
- 04:47 The source, if I double-click it looks like it got some extra spaces at the end,
- 04:51 so I'll just get rid of those.
- 04:54 This looks like it should be called Category.
- 04:59 And this one looks like it should probably be called Vendor.
- 05:05 I'm now gonna play around with these dates.
- 05:06 They're showing this text right now but I need to force this to make
- 05:09 sure it comes in as a proper date that can be read for any of you.
- 05:13 So I'm gonna click on this little ABC.
- 05:14 And I'm gonna go to using local.
- 05:19 And I'm gonna change it to be a date type.
- 05:22 And the locale I'm gonna choose is English US, which we find by pressing F.
- 05:28 Because there is a lots of Englishes, and US is down near the bottom.
- 05:30 And we'll say OK.
- 05:33 Now, when we do that, it converts this into my format, but
- 05:37 it also triggers a bunch of errors.
- 05:40 Now, errors are green word, we also want to click in white space beside green words
- 05:44 to get the actual error message itself.
- 05:46 And it says, we couldn't convert tran date into a valid date, which makes sense.
- 05:51 In this case here, we couldn't covert XYZ company to a date, or general.
- 05:55 All of these make sense because they're not real dates.
- 05:58 And the beautiful thing about this is that everywhere there's an error,
- 06:01 it's garbage going all the way across.
- 06:03 If I highlight this column, and I go to remove rows.
- 06:07 And remove rows with errors,
- 06:09 that would get rid of every one of those rows that had an error in there.
- 06:12 So that's cool.
- 06:13 Errors are beautiful here because you can use it to get rid of data.
- 06:16 I've also got some areas here that are null.
- 06:20 We always remove errors before we remove nulls.
- 06:22 But then we can filter here, say get rid of the null and say OK.
- 06:26 Now there is some extra dates in here that looks a little bit weird but
- 06:31 I'm going to come back to those after I've dealt with the amount column.
- 06:35 The amount column I'm going to convert using currency.
- 06:39 And now you'll notice I have triggered some more errors.
- 06:46 So I'm going to remove those errors.
- 06:50 And then I can see, I've got some null values, so
- 06:54 I'm going to remove the null value.
- 06:57 Now, just as a note, if you're in Europe and you use commas for
- 07:00 your decimal places, you may need to change this to currency using locale.
- 07:03 Because I use periods as decimal numbers and commas upfront, so
- 07:08 just be aware of that.
- 07:10 The next item, we’ve got some dot, dot,
- 07:12 dots on the end of this, that means there’s extra spaces on the end.
- 07:16 So I’m gonna right-click and I’m gonna transform and I’m gonna trim.
- 07:22 My category looks like it's all fine.
- 07:25 I'll just right-click, Transform and Trim that, to be sure.
- 07:30 And my vendor I'm gonna do something similar with as well.
- 07:33 Right-click > Transform > Trim to get rid of any extra spaces.
- 07:38 But you'll notice that I've got spaces in the middle, that's not good.
- 07:41 Right-click, we are gonna replace values, and
- 07:46 I'm gonna place two spaces with one space.
- 07:50 That's a trick to get rid some of these, but
- 07:52 you'll notice it still hasn't fixed this guy.
- 07:54 And this because those none printing characters,
- 07:57 we can get rid of those by right-clicking > transforming and cleaning them.
- 08:04 And at this point,
- 08:05 I now have a beautifully reformatted file that I'm gonna call transactions.
- 08:10 And what you're gonna see here is when I click close and
- 08:13 load, it's going to load to a table.
- 08:15 And the most important thing about this that I want you to recognize
- 08:19 is that this file is contiguous, there are no garbage rows and columns.
- 08:22 It doesn't matter if it was 5 pages or 500,000 pages, I got them all nice and
- 08:27 quick, and totally refreshable.
Lesson notes are only available for subscribers.