# Simple Gradient Descent with Python

This is a simple polynomial gradient descent. It is a naive implementation that is not optimized or vectorized. It is meant to be a simple demo of how gradient descent can be accomplished. To keep it simple, it uses base Python. I don’t intend for it to be used for anything important!

## Some Equations

The “target” polynomial that will be fitted is defined as Eqn. 1:

\[ y_i = \sum_{k=0}^{n} a_k x_i^k \]

The “estimate” polynomial is defined the same way, but with a hat on the y and on the coefficients, Eqn. 2:

\[ \hat y_i = \sum_{k=0}^{n} \hat a_k x_i^k \]

To fit the polynomial, the code will minimize the residual sum of squares (RSS) over the N points of the x-axis, Eqn. 3:

\[ RSS = \sum_{i=1}^{N} (y_i - \hat y_i)^2 \]

Gradient descent needs the partial derivative of the RSS with respect to each coefficient. Applying the chain rule to Eqn. 3, and using \( \partial \hat y_i / \partial \hat a_k = x_i^k \), gives Eqn. 4:

\[ {\partial RSS \over \partial \hat a_k} = -2 \sum_{i=1}^{N} \left(y_i-\hat y_i\right) x_i^{k} \]

To define each polynomial, I will gather the coefficients into a vector, Eqn. 5:

\[ \mathbf a = [a_0, a_1, ..., a_n] \]

## Custom Functions

The code in this example is built around functions, so I will define these functions in the first part of the notebook and use them in the second part.

### Linspace Function

Since I am not using numpy, I am making my own linspace function. Later, this function will make the x-axis on which to evaluate the polynomials.

```
def linspace(min_val, max_val, npoints):
    """
    Creates a list of floating point values between the minimum and
    maximum values, with the number of points (+1) to cover the entire
    span specified.

    Parameters
    ----------
    min_val
        The minimum value in the range as a float.
    max_val
        The maximum value in the range as a float.
    npoints
        The number of intervals in the range as an integer.

    Returns
    -------
    list[float]
        A list of floating point values as specified.
    """
    return [min_val + (max_val - min_val) / npoints * i for i in range(npoints + 1)]
```
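As a quick sanity check, here is the function restated in compact form (so the snippet runs on its own) along with its output for a small range. Note that, unlike numpy's linspace, the third argument counts intervals, so the result has npoints + 1 entries:

```python
def linspace(min_val, max_val, npoints):
    # Compact restatement of the linspace function above.
    return [min_val + (max_val - min_val) / npoints * i for i in range(npoints + 1)]

# Four intervals between 0 and 1 produce five points.
print(linspace(0.0, 1.0, 4))  # → [0.0, 0.25, 0.5, 0.75, 1.0]
```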

### Polynomial Function

This will evaluate a list of y values from a polynomial according to Eqns 1 and 2 with an x-axis and vector of coefficients.

```
def polynomial(x_axis, coeffs):
    """
    Evaluates a polynomial along a given axis given coefficients.

    Parameters
    ----------
    x_axis
        A list of floats that is the x-axis to supply as x_i.
    coeffs
        The coefficients of the polynomial as defined in the vector
        in Eqn. 5.

    Returns
    -------
    list[float]
        A list of floats that are the values of the polynomial.
    """
    ys = []
    for x in x_axis:
        y = 0.
        for i, coeff in enumerate(coeffs):
            y += coeff * x**i
        ys.append(y)
    return ys
```
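For example, evaluating y = 1 + 2x + 3x² at a few points (the function is restated compactly here so the snippet is self-contained):

```python
def polynomial(x_axis, coeffs):
    # Compact restatement of the polynomial function above (Eqns. 1 and 2).
    return [sum(c * x**i for i, c in enumerate(coeffs)) for x in x_axis]

# y = 1 + 2x + 3x^2 at x = 0, 1, 2
print(polynomial([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # → [1.0, 6.0, 17.0]
```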

### RSS Function

This function will calculate the error between the target and estimate polynomials according to Eqn. 3.

```
def rss(x_axis, y_coeffs, y_hat_coeffs):
    """
    Calculates the RSS as defined in Eqn. 3 between the two polynomials
    specified in Eqns. 1 and 2.

    Parameters
    ----------
    x_axis
        The x-axis as a list of floats with which to compare the
        polynomials.
    y_coeffs
        The coefficients as a list of floats for the target polynomial.
    y_hat_coeffs
        The coefficients as a list of floats for the estimate polynomial.

    Returns
    -------
    float
        The RSS value between the target and estimate.
    """
    target_ys = polynomial(x_axis, y_coeffs)
    estimate_ys = polynomial(x_axis, y_hat_coeffs)
    return sum((target_y - estimate_y)**2
               for target_y, estimate_y in zip(target_ys, estimate_ys))
```
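A small check: identical coefficients give an RSS of zero, and a constant offset of 1 across three points gives an RSS of 3. The helpers are restated compactly here so the snippet runs on its own:

```python
def polynomial(x_axis, coeffs):
    # Compact restatement of the polynomial function above.
    return [sum(c * x**i for i, c in enumerate(coeffs)) for x in x_axis]

def rss(x_axis, y_coeffs, y_hat_coeffs):
    # Eqn. 3: sum of squared residuals between the two polynomials.
    target_ys = polynomial(x_axis, y_coeffs)
    estimate_ys = polynomial(x_axis, y_hat_coeffs)
    return sum((t - e)**2 for t, e in zip(target_ys, estimate_ys))

xs = [0.0, 1.0, 2.0]
print(rss(xs, [1.0, 2.0], [1.0, 2.0]))  # → 0.0  (identical polynomials)
print(rss(xs, [1.0, 2.0], [0.0, 2.0]))  # → 3.0  (offset of 1 at each of 3 points)
```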

### Gradient Function

This function calculates the components of the gradient of the RSS error between the target and estimate coefficients. It implements Eqn 4.

```
def gradient(x_axis, target_ys, estimate_ys, y_hat_coeffs):
    """
    Calculates the gradient of the error between the target and estimate
    polynomials and returns the components of the gradient in a list of
    floats.

    Parameters
    ----------
    x_axis
        The x-axis as a list of floats.
    target_ys
        List of floats that is the target polynomial y values.
    estimate_ys
        List of floats that is the estimate polynomial y values.
    y_hat_coeffs
        The estimate coefficients as a list of floats.

    Returns
    -------
    list[float]
        The components of the gradient as a list of floats.
    """
    components = []
    for k, _ in enumerate(y_hat_coeffs):
        component = 0.
        for i, (target_y, estimate_y) in enumerate(zip(target_ys, estimate_ys)):
            component += (target_y - estimate_y) * x_axis[i] ** k
        components.append(-2 * component)
    return components
```
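One way to build confidence in Eqn. 4 is to compare the analytic gradient against a central finite difference of the RSS. This sketch restates the helpers compactly so it runs on its own; the test polynomial and the 1e-4 tolerance are arbitrary choices for the check:

```python
def polynomial(x_axis, coeffs):
    return [sum(c * x**i for i, c in enumerate(coeffs)) for x in x_axis]

def rss(x_axis, y_coeffs, y_hat_coeffs):
    return sum((t - e)**2 for t, e in zip(polynomial(x_axis, y_coeffs),
                                          polynomial(x_axis, y_hat_coeffs)))

def gradient(x_axis, target_ys, estimate_ys, y_hat_coeffs):
    # Eqn. 4: -2 * sum_i (y_i - y_hat_i) * x_i**k for each coefficient k.
    components = []
    for k, _ in enumerate(y_hat_coeffs):
        component = sum((t - e) * x ** k
                        for x, t, e in zip(x_axis, target_ys, estimate_ys))
        components.append(-2 * component)
    return components

xs = [-1.0, 0.0, 1.0, 2.0]
target_coeffs = [1.0, -3.0, 0.5]
estimate_coeffs = [0.0, 0.0, 0.0]

analytic = gradient(xs, polynomial(xs, target_coeffs),
                    polynomial(xs, estimate_coeffs), estimate_coeffs)

# Central finite difference: (RSS(a_k + h) - RSS(a_k - h)) / (2h)
h = 1e-6
numeric = []
for k in range(len(estimate_coeffs)):
    plus = estimate_coeffs[:]
    plus[k] += h
    minus = estimate_coeffs[:]
    minus[k] -= h
    numeric.append((rss(xs, target_coeffs, plus)
                    - rss(xs, target_coeffs, minus)) / (2 * h))

print(all(abs(a - n) < 1e-4 for a, n in zip(analytic, numeric)))  # → True
```

Because the RSS is quadratic in each coefficient, the central difference agrees with the analytic gradient up to floating point noise.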

### Gradient Descent Function

This function uses the gradient to iteratively refine the estimate coefficients, moving the estimate closer to the target. It returns the history of the RSS values along the way.
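The step applied at each iteration is the standard gradient descent update: every coefficient moves a small distance against its gradient component, scaled by the learning rate \(\eta\) (the `learn_rate` parameter):

\[ \hat a_k \leftarrow \hat a_k - \eta \, {\partial RSS \over \partial \hat a_k} \]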

```
def gradient_descent(x_axis, target_ys, target_coeffs, y_hat_coeffs_initial,
                     learn_rate=1e-6, max_iter=1000, rss_threshold=50.):
    """
    Performs gradient descent optimization to make the estimate coefficients
    converge to values that give a polynomial function with a shape similar
    to the target values.
    Training continues until max iterations are reached or the RSS diminishes
    below the given threshold.

    Parameters
    ----------
    x_axis
        List of floats that is the x-axis for the polynomials.
    target_ys
        List of floats that is the target polynomial y values.
    target_coeffs
        The coefficients of the target polynomial as a list of floats.
    y_hat_coeffs_initial
        The initial guess for the estimate coefficients.
    learn_rate
        The rate at which to descend the gradient during fitting. Higher
        numbers descend quicker but may not find the true minimum.
    max_iter
        Integer of the maximum number of iterations the algorithm will
        attempt. Used to prevent infinite loops.
    rss_threshold
        If RSS diminishes below this threshold, training iterations will stop.

    Returns
    -------
    list[dict]
        A list of dictionaries containing the training history at each
        iteration.
    """
    fit_history = []
    y_hat_coeffs = y_hat_coeffs_initial[:]
    for _ in range(max_iter):
        estimate_ys = polynomial(x_axis, y_hat_coeffs)
        estimate_rss = rss(x_axis, target_coeffs, y_hat_coeffs)
        fit_history.append({'rss': estimate_rss,
                            'y_hat_coeffs': y_hat_coeffs[:]})
        if estimate_rss < rss_threshold:
            break
        current_gradient = gradient(x_axis, target_ys, estimate_ys, y_hat_coeffs)
        y_hat_coeffs = [y_hat_coeff - learn_rate * gi
                        for y_hat_coeff, gi in zip(y_hat_coeffs, current_gradient)]
    return fit_history
```

## Define the “Target” and “Estimate” Polynomials

### Create the Common X-Axis

```
common_x_axis = linspace(-5., 5., 100)
```

### Create Initial Coefficients

The target polynomial will be defined by a vector called `target_coeffs`. The initial estimate coefficients will be `estimate_coeffs`. `target_coeffs` will not change, since they represent the truth that the estimate polynomial is fit against. The `estimate_coeffs` will be iteratively updated as the estimate moves closer to the target.

```
target_coeffs = [-2.0, 0.0, 2.5, 1.2]
estimate_coeffs = [-2.5, 0.0, 2.0, -1.7]
```

## Fitting the Polynomial

### Plot the Target and Initial Estimate Polynomials

For the remainder of this notebook, the target polynomial will be in blue and the estimate polynomial will be in orange. I will include the initial RSS in the title.

```
import matplotlib.pyplot as plt

target_0 = polynomial(common_x_axis, target_coeffs)
estimate_0 = polynomial(common_x_axis, estimate_coeffs)
fig, ax = plt.subplots(nrows=1, ncols=1)
ax.scatter(common_x_axis, target_0, color='blue', alpha=0.6, label='target')
ax.scatter(common_x_axis, estimate_0, color='orange', alpha=0.6, label='estimate')
ax.legend()
```

As a reminder, here are the target coefficients and the initial estimate coefficients:

```
print(f'Target coefficients {target_coeffs}')
print(f'Initial estimate coefficients {estimate_coeffs}')
print(f'Initial RSS {rss(common_x_axis, target_coeffs, estimate_coeffs)}')
```

```
Target coefficients [-2.0, 0.0, 2.5, 1.2]
Initial estimate coefficients [-2.5, 0.0, 2.0, -1.7]
Initial RSS 2015004.0007105004
```

### Fit the Estimate Coefficients

```
gradient_descent_history = gradient_descent(common_x_axis, target_0, target_coeffs, estimate_coeffs)
```

### Plot the RSS History

As gradient descent fits the estimate coefficients, the RSS should drop with each iteration.

```
rss_history = [step['rss'] for step in gradient_descent_history]
fig, ax = plt.subplots(nrows=1, ncols=1)
ax.set_yscale('log')
ax.plot(list(range(len(rss_history))), rss_history, color='green')
ax.set_title('RSS vs Iteration')
ax.set_ylabel('RSS')
ax.set_xlabel('iteration')
```


### Plot the Final Estimate and Target Polynomials

This is a graphical representation of how close the fit is after training.

```
estimate_final = polynomial(common_x_axis, gradient_descent_history[-1]['y_hat_coeffs'])
fig, ax = plt.subplots(nrows=1, ncols=1)
ax.scatter(common_x_axis, target_0, color='blue', alpha=0.6, label='target')
ax.scatter(common_x_axis, estimate_final, color='orange', alpha=0.6, label='estimate')
ax.legend()
```

### Report the Final Numbers

These are the numeric results of the training.

```
final_estimate_coeffs = gradient_descent_history[-1]['y_hat_coeffs']
initial_rss = gradient_descent_history[0]['rss']
print(f'Training iterations {len(gradient_descent_history)}')
print(f'Target coefficients {target_coeffs}')
print(f'Final estimate coefficients {final_estimate_coeffs}')
print(f'Initial RSS {initial_rss}')
print(f'Final RSS {rss(common_x_axis, target_coeffs, final_estimate_coeffs)}')
```

```
Training iterations 88
Target coefficients [-2.0, 0.0, 2.5, 1.2]
Final estimate coefficients [-2.4649992311439837, 0.1551299749689758, 2.4784435198598582, 1.1914759552905416]
Initial RSS 2015004.0007105004
Final RSS 48.45559484188128
```

## References

- Göbel, Börge. *Computational Physics*, Section 3.
- James, Gareth, et al. *An Introduction to Statistical Learning with Applications in R*, Eqn. 3.16.