Gradient Descent Algorithms
It's about to go down! 👇
Series
A gentle introduction to Linear Regression
Bijon Setyawan Raya
February 11, 2022
4 mins
Introduction
Linear Regression
Mathematics of Gradient Descent
Batch Gradient Descent
Mini Batch Gradient Descent
Stochastic Gradient Descent
I am sure most of you reading this post are familiar with the following graph, and we all used this equation in high school or college to find the distance between two points in 2D coordinates:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Not only is it able to determine the distance between two points, but it can also be used to help a machine learning model learn better.
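As a quick illustration, here is a minimal Python sketch of that distance formula (the two example points are made up):

```python
import math

def euclidean_distance(p1, p2):
    """Distance between two (x, y) points in a 2D plane."""
    return math.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)

print(euclidean_distance((1, 2), (4, 6)))  # 5.0
```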
Let's say that we have the Iris dataset from scikit-learn. Plotting it gives us the following visualization.
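A sketch of how such a plot can be produced with scikit-learn and matplotlib (the exact styling of the original figure is assumed):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
sepal_length = iris.data[:, 0]  # column 0: sepal length (cm)
petal_width = iris.data[:, 3]   # column 3: petal width (cm)

plt.scatter(sepal_length, petal_width)
plt.xlabel("sepal_length")
plt.ylabel("petal_width")
plt.show()
```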
Clearly, we can see that there are two features involved: sepal_length and petal_width.
The relationship between these two features can be written as follows:

$$y = \beta_0 + \beta_1 x$$
You should know that the intercept, or $\beta_0$, is the starting point of the regression line. Whether the line goes up or down depends on $\beta_1$ and the data. If $\beta_0 = 0$, our regression line will start from the origin $(0, 0)$.
Expressing the equation like we did above is quite cryptic for people who don't have a strong mathematical background. Since we are using the Iris dataset, we can translate the equation into a more readable form:

$$\text{petal\_width} = \beta_0 + \beta_1 \times \text{sepal\_length}$$
The translation above tells us the relationship between those two variables.
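In code, the same relationship could look like the sketch below; the coefficient values are placeholders for illustration, not fitted ones:

```python
def predict_petal_width(sepal_length, beta_0, beta_1):
    """Predict petal_width from sepal_length with a simple linear model."""
    return beta_0 + beta_1 * sepal_length

# Placeholder coefficients, purely for illustration.
print(predict_petal_width(5.0, beta_0=-3.2, beta_1=0.75))
```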
Now you might be wondering whether sepal_length and petal_width are correlated or inversely correlated. sepal_length and petal_width are said to be correlated when sepal_length increases and petal_width also increases. Conversely, sepal_length and petal_width are said to be inversely correlated when sepal_length increases but petal_width decreases.
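One way to check is Pearson's correlation coefficient, sketched below with NumPy: a positive value means the two features are correlated, a negative one means they are inversely correlated.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
sepal_length = iris.data[:, 0]
petal_width = iris.data[:, 3]

# Pearson correlation coefficient between the two features.
r = np.corrcoef(sepal_length, petal_width)[0, 1]
print(r)  # positive, so the two features are correlated
```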
A regression line can help us predict the $y$ value given a single $x$ value. However, the predictions made by the regression line are not always accurate, since its ability to predict depends heavily on $\beta_0$ and $\beta_1$. If $\beta_0$ and $\beta_1$ are not tweaked correctly, the regression line will sit far from most data points. Let's see an example down below where the data points are far from the regression line.
When the regression line, which is the green line, sees a given $x$, the $\hat{y}$ it predicts is far from the actual $y$. That means our regression line is very bad at prediction, which indicates that we have a huge Mean Squared Error value.
Calculating the MSE for the graph above, we get a large value, confirming the poor fit.
Mean Squared Error (MSE) is commonly used to measure the quality of regression lines. If the MSE of a certain regression line is minuscule, we can say that the regression line is relatively good at prediction. If the MSE of a certain regression line is large, then it's the opposite. Here is the equation for MSE:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$

where $n$ is the number of data points, $\hat{y}_i$ is the predicted value, and $y_i$ is the actual value.
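Translating that equation directly into Python, as a small sketch with NumPy:

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean Squared Error: the average of the squared prediction errors."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return np.mean((y_pred - y_true) ** 2)
```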
Let's see another example where the data points are close to the regression line.
Let's calculate the MSE for this example to see if the MSE is small when the regression line is close to most data points.
From the MSE value above, we can verify that when the regression line is close to most data points, the MSE value will be small. Although this regression line is not that good at predicting, it still predicts better than the one in the previous example.
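To tie both examples together, here is a sketch comparing a far-off line and a close line on the same data; the better-fitting line yields the smaller MSE. All numbers below are invented for illustration.

```python
import numpy as np

# Made-up data points, roughly following y = 0.5 * x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.6, 0.9, 1.6, 1.9, 2.6])

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

far_line = 3.0 + 0.1 * x    # poorly tweaked beta_0 and beta_1
close_line = 0.0 + 0.5 * x  # better tweaked beta_0 and beta_1

print(mse(far_line, y))    # large MSE, bad fit
print(mse(close_line, y))  # small MSE, better fit
```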