This notebook is adapted from the original tutorial by Justin Johnson.

In this notebook, we will fit a third-order polynomial to y = sin(x). Our polynomial has four parameters, and we will use gradient descent to fit the data by minimizing the squared Euclidean distance (the sum of squared errors) between the predicted output and the true output.
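Concretely, our model is y_pred = a + b x + c x^2 + d x^3, the loss is the sum of squared errors L = sum((y_pred - y)^2) over all sample points, and gradient descent updates each parameter as a <- a - lr * dL/da (and likewise for b, c, d), where lr is the learning rate.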

We will see three different ways of fitting our polynomial.

  1. Using numpy and manually implementing the forward and backward passes with numpy operations,
  2. Using PyTorch Tensors,
  3. Using PyTorch's autograd package, which uses automatic differentiation to automate the computation of backward passes.

Let's start with numpy!


1. Numpy

Numpy is a great tool for scientific computing, but it is not very handy for deep learning as it knows nothing about gradients or computation graphs. Nevertheless, it is very easy to fit a third-order polynomial to our sine function. Let's see how this can be done...

import numpy as np
import math
import matplotlib.pyplot as plt



x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# We randomly initialize weights

a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()

# print randomly initialized weights

print(f'a = {a}, b = {b}, c = {c}, d = {d}')

# learning rate
lr = 1e-6

for i in range(5000):
  # y = a + bx + cx^2 + dx^3

  y_pred = a + b * x + c * x ** 2 + d * x ** 3

  # Compute and print loss

  loss = np.square(y_pred - y).sum()
  if i % 100 == 0:
    print(i, loss)

  # Backprop to compute the gradients of the loss with respect to a, b, c, d


  #dL/da = (dL/dy_pred) * (dy_pred/da) 
  #dL/db = (dL/dy_pred) * (dy_pred/db)
  #dL/dc = (dL/dy_pred) * (dy_pred/dc)
  #dL/dd = (dL/dy_pred) * (dy_pred/dd)
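  # Since y_pred = a + b*x + c*x^2 + d*x^3 and L = sum((y_pred - y)^2):
  #   dL/dy_pred = 2 * (y_pred - y)
  #   dy_pred/da = 1, dy_pred/db = x, dy_pred/dc = x^2, dy_pred/dd = x^3
  # Each gradient below is summed over all sample points.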

  grad_y_pred = 2.0 * (y_pred-y)
  grad_a = grad_y_pred.sum()
  grad_b = (grad_y_pred * x).sum()
  grad_c = (grad_y_pred * x ** 2).sum()
  grad_d = (grad_y_pred * x ** 3).sum()


  # Update Weights

  a -= lr * grad_a
  b -= lr * grad_b
  c -= lr * grad_c
  d -= lr * grad_d


plt.plot(x, y, label='y = sin(x)', c='b')
plt.plot(x, y_pred, label='y = a + bx + cx^2 + dx^3', c='r', linestyle='dashed')
plt.xlabel('x')
plt.ylabel('y')
plt.ylim([-2,2])
plt.legend()
plt.show()

print(f'Result: y = {a} + {b} x + {c} x^2 + {d} x^3')
a = 0.5212317253221784, b = -0.9805915149858428, c = -0.027376927441378273, d = -1.4262831777937377
0 703371.2506724729
100 1990.7685801408975
200 1327.0031477034922
300 885.8603635657721
400 592.5781313206602
500 397.5296011492919
600 267.7642695815589
700 181.39826161321042
800 123.89338756930366
900 85.58851940179167
1000 60.06144593380803
1100 43.041568420184774
1200 31.688032432273822
1300 24.11034884768269
1400 19.049958249677985
1500 15.668641966272823
1600 13.40788464942117
1700 11.895365287550472
1800 10.882761722859414
1900 10.204367249159688
2000 9.749544226693866
2100 9.444380518360695
2200 9.23946886344227
2300 9.101761631434972
2400 9.009139235728266
2500 8.946786269115005
2600 8.904772417197037
2700 8.876436698538871
2800 8.857307622798578
2900 8.844381063434946
3000 8.835637031893949
3100 8.8297160973774
3200 8.825702555431604
3300 8.822979021471838
3400 8.821128846547559
3500 8.819870574849652
3600 8.81901388552776
3700 8.81842995097411
3800 8.818031476578964
3900 8.81775924750458
4000 8.817573052631591
4100 8.817445555564575
4200 8.817358151641226
4300 8.817298164554096
4400 8.817256947450312
4500 8.8172285953216
4600 8.817209070954023
4700 8.817195610956425
4800 8.81718632167016
4900 8.817179903949093
Result: y = 8.989750355029876e-05 + 0.8566886829615072 x + -1.5508815263346535e-05 x^2 + -0.09332296971884167 x^3

2. PyTorch: Tensors

We saw how easy it is to fit a third-order polynomial using numpy. But what about modern deep neural networks? Unfortunately, numpy cannot utilize GPUs to accelerate its numerical computations. This is where PyTorch Tensors are useful. A Tensor is basically an n-dimensional array that, unlike a numpy array, can run on a GPU and can keep track of gradients and computation graphs. To run a PyTorch Tensor on a GPU, we simply need to specify the correct device. For now, we will stick to the CPU.
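For example, here is a minimal sketch (not part of the original tutorial) of placing a Tensor on a device; it assumes nothing beyond a possibly available CUDA GPU:

import torch

t = torch.randn(3, 3)                  # created on the CPU by default
if torch.cuda.is_available():
    t = t.to(torch.device("cuda:0"))   # move the Tensor to the first GPU
print(t.device)                        # prints cpu or cuda:0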

Let's see how we can use PyTorch Tensors to accomplish our task...

import torch
import math
import matplotlib.pyplot as plt

dtype = torch.float
device = torch.device("cpu") 
#device = torch.device("cuda:0") # Uncomment this if GPU is available. 

# Create input and output data

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize weights

a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6


for t in range(5000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute the gradients of the loss with respect to a, b, c, d
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

plt.plot(x, y, label='y = sin(x)', c='b')
plt.plot(x, y_pred, label='y = a + bx + cx^2 + dx^3', c='r', linestyle='dashed')
plt.xlabel('x')
plt.ylabel('y')
plt.ylim([-2,2])
plt.legend()
plt.show()
print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
99 983.1625366210938
199 657.2805786132812
299 440.57037353515625
399 296.4062805175781
499 200.46615600585938
599 136.59274291992188
699 94.05044555664062
799 65.70246124267578
899 46.803749084472656
999 34.19861602783203
1099 25.786611557006836
1199 20.169822692871094
1299 16.41724967956543
1399 13.908618927001953
1499 12.230535507202148
1599 11.107256889343262
1699 10.354836463928223
1799 9.850460052490234
1899 9.512123107910156
1999 9.284987449645996
2099 9.132362365722656
2199 9.02973747253418
2299 8.960660934448242
2399 8.914128303527832
2499 8.882747650146484
2599 8.861570358276367
2699 8.847265243530273
2799 8.837589263916016
2899 8.831039428710938
2999 8.8266019821167
3099 8.823591232299805
3199 8.821544647216797
3299 8.820154190063477
3399 8.819206237792969
3499 8.818561553955078
3599 8.818121910095215
3699 8.817822456359863
3799 8.81761646270752
3899 8.817476272583008
3999 8.81737995147705
4099 8.817313194274902
4199 8.817267417907715
4299 8.817235946655273
4399 8.817214965820312
4499 8.817200660705566
4599 8.817190170288086
4699 8.817183494567871
4799 8.817177772521973
4899 8.817174911499023
4999 8.81717300415039
Result: y = -6.64535109535791e-05 + 0.8567030429840088 x + 1.146564773080172e-05 x^2 + -0.09332501143217087 x^3

3. PyTorch: Tensors and autograd

We saw above how Tensors can also be used to fit a third-order polynomial to our sine function. However, we had to manually implement both the forward and backward passes. This is not so hard for a simple task such as fitting a polynomial, but it can get very messy for deep neural networks. Fortunately, PyTorch's autograd package can be used to automate the computation of backward passes. Let's see how we can do this...
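As a warm-up, here is a minimal sketch (not part of the original code) of what autograd does for a single scalar parameter:

import torch

w = torch.tensor(2.0, requires_grad=True)
f = w ** 2 + 3 * w   # autograd records the operations performed on w
f.backward()         # computes df/dw and stores it in w.grad
print(w.grad)        # tensor(7.) since df/dw = 2*w + 3 = 7 at w = 2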

import torch
import math
import matplotlib.pyplot as plt

dtype = torch.float
device = torch.device("cpu")

# Create Tensors to hold the inputs and outputs

# As we don't need to compute gradients with respect to these Tensors, we can set requires_grad = False. This is also the default setting.

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)


# Create random Tensors for the weights. For these Tensors we need gradients, so we set requires_grad = True

a = torch.randn((), device = device, dtype = dtype, requires_grad=True)
b = torch.randn((), device = device, dtype = dtype, requires_grad=True)
c = torch.randn((), device = device, dtype = dtype, requires_grad=True)
d = torch.randn((), device = device, dtype = dtype, requires_grad=True)



learning_rate = 1e-6

for t in range(5000):
    # Forward pass: we compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss using operations on Tensors.
    # Now loss is a scalar (0-dimensional) Tensor.
    # loss.item() gets the Python number held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call a.grad, b.grad, c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating the weights, since
        # backward() accumulates gradients into .grad by default
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

plt.plot(x, y, label='y = sin(x)', c='b')
# We need tensor.detach().numpy() to convert y_pred, which requires grad, into a numpy array for plotting
plt.plot(x, y_pred.detach().numpy(), label='y = a + bx + cx^2 + dx^3', c='r', linestyle='dashed')
plt.xlabel('x')
plt.ylabel('y')
plt.ylim([-2,2])
plt.legend()
plt.show()
print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
99 1246.2117919921875
199 854.2945556640625
299 587.1728515625
399 404.901123046875
499 280.38446044921875
599 195.22470092773438
699 136.91519165039062
799 96.94408416748047
899 69.5127944946289
999 50.665863037109375
1099 37.70232391357422
1199 28.77569580078125
1299 22.622011184692383
1399 18.375377655029297
1499 15.441656112670898
1599 13.412825584411621
1699 12.008347511291504
1799 11.035103797912598
1899 10.360037803649902
1999 9.891359329223633
2099 9.565652847290039
2199 9.339130401611328
2299 9.181443214416504
2399 9.07158088684082
2499 8.994975090026855
2599 8.941520690917969
2699 8.904190063476562
2799 8.878105163574219
2899 8.859864234924316
2999 8.847099304199219
3099 8.838162422180176
3199 8.831900596618652
3299 8.82751178741455
3399 8.824435234069824
3499 8.822273254394531
3599 8.820756912231445
3699 8.819692611694336
3799 8.818944931030273
3899 8.818416595458984
3999 8.818046569824219
4099 8.817787170410156
4199 8.817604064941406
4299 8.817475318908691
4399 8.817383766174316
4499 8.817319869995117
4599 8.81727409362793
4699 8.817242622375488
4799 8.817219734191895
4899 8.817205429077148
4999 8.817193031311035
Result: y = 0.0001697125844657421 + 0.8567075133323669 x + -2.9278115107445046e-05 x^2 + -0.0933256447315216 x^3

Let me know if you have any comments or suggestions.