
Simple Linear Regression Using Python

For this example, we will be using salary data from Kaggle. The data consists of two columns, years of experience and the corresponding salary.

First, we will import the Python packages that we will need for this analysis. All we will need is NumPy, to help with the math calculations; Pandas, to store and manipulate the data; and Matplotlib (optional), to plot the data.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

Next, we will load in the data and then assign each column to its appropriate variable. For this example, we will be using the years of experience to predict the salary, so the dependent variable will be the salary (y) and the independent variable will be the years of experience (x).

    data = pd.read_csv('Salary_Data.csv')
    x = data['YearsExperience']
    y = data['Salary']

To get a look at the data, we can use the head() function provided by Pandas, which will show us the first few rows of the data.

Now we will have to translate these two formulas to Python to calculate the regression line. First I will show the full function, then I will break it down further.
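For orientation, here is a minimal sketch of how the two regression formulas might be translated into Python. It assumes they are the usual least-squares estimates, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The function name fit_line and the toy values are hypothetical; they are not taken from the Kaggle data or from the article's own function.

```python
import numpy as np


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y).

    Hypothetical helper for illustration; assumes the standard
    simple-linear-regression formulas.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()

    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

    # Intercept: chosen so the line passes through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up toy values (not the salary data) that lie exactly on y = 2 + 0.5x.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2.5, 3.0, 3.5, 4.0, 4.5]

b0, b1 = fit_line(toy_x, toy_y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

Because np.asarray accepts Pandas Series as well as lists, the same call should work with the x and y columns loaded from the CSV file.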


For example, let's say we have a regression equation of y = 2 + 0.5x. For every 1-unit increase in the independent variable (x), there will be a 0.50 increase in the dependent variable (y).
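The slope interpretation can be checked with a few lines of Python. This is only an illustrative sketch: the function name predict and the chosen x values are hypothetical, while the coefficients 2 and 0.5 come from the example equation.

```python
def predict(x, intercept=2.0, slope=0.5):
    """Evaluate the example regression equation y = 2 + 0.5x."""
    return intercept + slope * x


# For several starting points, compare y at x and at x + 1.
for x in [0, 1, 2, 3]:
    step = predict(x + 1) - predict(x)
    print(f"x = {x}: y = {predict(x):.1f}, change in y for +1 in x = {step:.1f}")
```

Whatever the starting value of x, the change in y for a 1-unit step equals the slope, which is why the slope is read as the effect of a 1-unit increase in x.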
