About this lesson
We need to install a few more packages before we can run a linear regression, so this video covers what we need.
Exercise files
Download this lesson’s related exercise files.
Linear Regression Installation.docx (59.2 KB)
Linear Regression Installation - Solution.docx (57.4 KB)
Quick reference
Linear Regression Installation
To run a linear regression analysis, we need to install a few Python packages: scikit-learn, matplotlib, and seaborn.
When to use
You only need to do this once. After it's all installed, you can use it whenever you like.
Instructions
From the Terminal:
- pip install scikit-learn
- pip install matplotlib
- pip install seaborn
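Once the installs finish, it can be handy to confirm that each package is importable from the active virtual environment. The following is a minimal sketch, not part of the lesson: the helper name `check_packages` is hypothetical, and it relies only on the standard library's `importlib.util.find_spec`. Note that scikit-learn is imported as `sklearn`, so that is the name to check.

```python
from importlib.util import find_spec

# Import names for the packages this lesson uses.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = ["numpy", "pandas", "scipy", "matplotlib", "sklearn", "seaborn"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


# Hypothetical helper in use: report the status of every required package.
status = check_packages(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'MISSING'}")
```

Anything reported as MISSING can be installed with the matching `pip install` command from the Terminal.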
From your Jupyter Notebook, import everything:
- import numpy as np
- import pandas as pd
- %matplotlib inline
- import scipy.stats as stats
- import matplotlib.pyplot as plt
- import sklearn
- from sklearn.datasets import load_boston
- import seaborn as sns
- boston = load_boston()
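One caveat about the `load_boston` import: it was deprecated in scikit-learn 1.0 and removed in version 1.2, so on a recent install that import raises an `ImportError`. The sketch below is an assumption about how you might guard against that, not something shown in the video. To follow the lesson exactly, you would need a scikit-learn release older than 1.2 in your virtual environment.

```python
# Try the import the lesson uses; fall back gracefully if it is unavailable.
try:
    from sklearn.datasets import load_boston
    boston = load_boston()
except ImportError as err:
    # Raised when scikit-learn is missing, or when it is version 1.2 or later,
    # where load_boston no longer exists.
    boston = None
    print(f"load_boston is not available: {err}")

if boston is not None:
    # Only reached on older scikit-learn releases that still ship the dataset.
    print(boston.data.shape)
```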
Finally, create a DataFrame with the Boston Data:
bost = pd.DataFrame(boston.data, columns=boston.feature_names)
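To see what this call does without depending on the Boston data, here is a minimal sketch that builds a DataFrame the same way from a small stand-in object. The values are made up for illustration, and `SimpleNamespace` only mimics the `data` and `feature_names` attributes of the object that `load_boston()` returns. The three column names are taken from the real dataset.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

# Stand-in for the loaded dataset: a 2-D array plus matching column names.
# These numbers are invented for illustration, not real Boston values.
standin = SimpleNamespace(
    data=np.array([
        [0.10, 18.0, 2.3],
        [0.20, 0.0, 7.1],
        [0.30, 0.0, 7.1],
    ]),
    feature_names=np.array(["CRIM", "ZN", "INDUS"]),
)

# Same pattern as the lesson: rows come from .data, headings from .feature_names.
demo = pd.DataFrame(standin.data, columns=standin.feature_names)

print(demo.shape)  # (rows, columns)
print(demo.head())
```

The number of names passed to `columns` has to match the number of columns in the array, which is why `data` and `feature_names` are used as a pair.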
Hints & tips
- First, install all the packages we need from the Terminal
- Next, import all the modules into your Jupyter Notebook
- Load the Boston data, then create a DataFrame with it
- 00:05 Okay, so I've created a new Jupyter notebook.
- 00:08 I just went to File > New > Python 3 Notebook.
- 00:11 I named it linear, you can name it anything you want.
- 00:13 And I've imported the things that we have done in the prior videos.
- 00:16 So we imported numpy as np, pandas as pd, and matplotlib inline.
- 00:20 We also need SciPy, which is a scientific computing library for Python that includes a stats module.
- 00:24 We need matplotlib, we've probably already got that, but
- 00:27 I'm going to install it again just to make sure.
- 00:29 We need scikit-learn, and that's pretty much it.
- 00:33 So let's head over to our terminal and Ctrl+C to break out of here.
- 00:39 And I'm just going to clear the screen here.
- 00:41 And remember, we're in our C data directory and
- 00:43 our virtual environment is turned on.
- 00:45 So the first thing we want to run is a pip install, and
- 00:48 I'm just going to install matplotlib just to make sure it's in there.
- 00:55 Next, we can pip install scikit-learn, and
- 00:59 this is a machine learning library for Python.
- 01:04 And you'll notice it's also installing scipy, so we don't have to install that.
- 01:10 And finally, I'm going to pip install seaborn.
- 01:15 And like I said, this is just a charting tool,
- 01:17 similar to our pandas charts that we did in the last few videos, but
- 01:20 they're just nicer looking and there's a little bit more functionality.
- 01:24 So we can clear the screen.
- 01:25 Now we can scroll up until we've come to our jupyter notebook command and
- 01:29 run this again so that our Jupyter notebook is now running.
- 01:32 We can come back here and hit reload.
- 01:34 And inside of here, I've just commented these out, I'll go ahead and
- 01:37 uncomment them.
- 01:38 We want to import scipy.stats as stats,
- 01:42 we want to import matplotlib.pyplot as plt, and we want to import sklearn.
- 01:48 Now, this is a little bit different looking than scikit-learn, which we pip
- 01:52 installed, but this is how you import scikit-learn, so it's just sklearn.
- 01:56 And then from sklearn.datasets, we want to import load_boston.
- 02:00 And load_boston, I Shift+Entered to run this, load_boston is
- 02:04 some dummy data that comes with scikit-learn that we can play around with.
- 02:09 And now, finally, to load this, we need to run,
- 02:11 let's just create a variable and I'm going to call it boston.
- 02:15 And we set this equal to load_boston(), with parentheses because it's a function.
- 02:21 Now, later on, we're going to use seaborn, so I'm just going to go ahead and
- 02:24 import this now.
- 02:25 So import seaborn as sns, sns is the convention.
- 02:32 So now we can actually run boston here.
- 02:35 And there's all kinds of stuff.
- 02:36 And first we see data, and it's just got a bunch of data,
- 02:39 we're not really sure what that is.
- 02:40 We also see target, this is a bunch of data, we're not really sure what it is.
- 02:44 target is actually housing prices, I'll just tell you that now.
- 02:47 We have now description, we have feature names,
- 02:51 these are the columns which are going to be our data frame columns later on.
- 02:56 We have a description, which has a whole bunch of information.
- 02:59 So let's just call boston.DESC, was it DESC?
- 03:05 DESCR.
- 03:07 If we run that, we get some stuff.
- 03:11 It's kind of hard to read, so let's wrap this in a print, and
- 03:15 now it's easier to read.
- 03:17 And there's just information about what comes with this load_boston stuff.
- 03:21 So we could see attribute information, like I said, these are the column names.
- 03:26 And we still have per capita crime rate by town,
- 03:30 proportion of residential land zoned, looking through here,
- 03:34 age, owner, owner age of the houses, all kinds of stuff.
- 03:39 And you can read here, median value of owner occupied homes in thousands, etc.
- 03:45 So this data was taken from StatLib library
- 03:48 maintained by Carnegie Mellon University.
- 03:51 It's Boston house-price data from, looks like 1978.
- 03:56 So this is very old data, but we don't really care,
- 03:58 it's interesting to mess with.
- 04:00 Now we can look at description, I think target was one of the other ones.
- 04:06 So these are actually house prices in thousands of dollars,
- 04:09 as the description said.
- 04:12 Let's see what else was there, feature names.
- 04:17 So let's take a look at that real quick, boston.feature_names.
- 04:21 And like I said, these are going to be the headings for our data frame.
- 04:24 So let's just create a data frame for this stuff,
- 04:27 let's call bost =, it's going to be a pd.DataFrame.
- 04:30 We already know how to do this, right?
- 04:33 And what do we want to pass in to our data frame?
- 04:36 Well, we want the Boston data, so let's go boston.data.
- 04:41 We want columns to equal boston.feature_names.
- 04:49 And now if we run this guy, we can see we've got this data frame.
- 04:53 And so it looks like there's 506 entries, indexed from 0 to 505.
- 04:57 And we have each of these things in here.
- 04:59 And if you don't remember what any of these are, the CRIM, the ZN, the INDUS,
- 05:03 etc., you can run that boston.DESCR thing again to read all about it.
- 05:08 So we can also, if we wanted to, add the price on here, we don't really want to,
- 05:12 but we can look at boston.target.
- 05:16 And we've already looked at this before, these are the prices of the actual houses.
- 05:20 So we'll play around with these things a little bit later.
- 05:23 So we've got our stuff installed, we've got scikit-learn, we've got Seaborn,
- 05:27 we've got all the things we need, and we've got our data frame.
- 05:31 Now we just need to set up our linear regression, and
- 05:34 put all of our data into it, and then run the analysis.
- 05:36 So we'll start to look at that in the next video.
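The transcript's step of wrapping `boston.DESCR` in `print` works because a notebook cell that ends with a bare string shows its `repr`, with newlines as literal `\n` escapes, whereas `print` renders them as line breaks. Here is a minimal sketch with a short made-up string standing in for the real description (the text below is illustrative, not the actual `DESCR` contents):

```python
# A short stand-in for boston.DESCR; the real text is much longer.
descr = (
    "Boston house prices\n"
    "- CRIM: per capita crime rate by town\n"
    "- ZN: proportion of residential land zoned"
)

# What a notebook shows for a bare expression: one long line with \n escapes.
as_shown = repr(descr)
print(as_shown)

# What print shows: the same text, broken onto separate lines.
print(descr)
```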