Implementing batch gradient descent
This is the next part of my discussion of gradient descent.
As promised in the previous part, I will show how I implemented gradient descent, particularly batch gradient descent.
The implementation uses the nature of gradient descent as described by the following equation
The gradient is calculated using the partial derivatives of the mean squared error cost function which are
The function’s interface consists of the
Y sets of x-y pairs for some dataset, initial
bias hyperparameters, the number of
epochs to run the algorithm for, and the learning rate
gamma. It works by calculating the sum of all successive gradient descents using all the x-y data pairs supplied. The current hyperparameter value is used as the accumulator for each gradient. The number of times the algorithm runs the summing process is based on the number of epochs.
The full source code:
def gradientdescent(X, Y, weight = 0, bias = 0, epochs = 500000, gamma = 0.00001): current_h0 = weight current_h1 = bias for epoch in range(epochs): previous_h0 = current_h0 previous_h1 = current_h1 for i in range(len(X)): current_h0 += -gamma * (-2 * X[i] * (Y[i] - (previous_h0 * X[i] + previous_h1))) current_h1 += -gamma * (-2 * (Y[i] - (previous_h0 * X[i] + previous_h1))) return (current_h0, current_h1)
It converges to the local minimum fairly well and fast enough with a large number of epochs for not-too-many data pairs, although I haven’t measured it in any objective way. And for some reason, a learning rate higher than 0.00001 produces large numbers that yield show-stopping Python
nans during run-time.
Now, I have implementation achievements for simple linear regression, k-means clustering, and batch gradient descent in my portfolio. A portfolio of success that grows bit-by-bit.