PyTorch basics
A simple PyTorch tutorial that fits a third order polynomial to a function.
This notebook is adapted from the original tutorial by Justin Johnson.
In this notebook, we will fit a third order polynomial to y = sin(x). Our polynomial has four parameters, and we will use gradient descent to fit it by minimizing the sum of squared errors (the squared Euclidean distance) between the predicted output and the true output.
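Concretely, the model, the loss, and the gradient descent update we will use are:

y_pred = a + b*x + c*x^2 + d*x^3
loss = sum((y_pred - y)^2) over all sample points
param = param - lr * d(loss)/d(param), for each param in {a, b, c, d}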
We will see three different ways of fitting our polynomial.
- Using numpy, manually implementing the forward and backward passes with numpy operations,
- Using PyTorch Tensors,
- Using PyTorch's autograd package, which uses automatic differentiation to automate the computation of backward passes.
1. Warm-up: numpy
Let's start with numpy!
import numpy as np
import math
import matplotlib.pyplot as plt
x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)
# We randomly initialize weights
a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()
# print randomly initialized weights
print(f'a = {a}, b = {b}, c = {c}, d = {d}')
# learning rate
lr = 1e-6
for i in range(5000):
    # y = a + bx + cx^2 + dx^3
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    if i % 100 == 0:
        print(i, loss)

    # Backprop to compute the gradients of a, b, c, d with respect to loss
    # dL/da = (dL/dy_pred) * (dy_pred/da)
    # dL/db = (dL/dy_pred) * (dy_pred/db)
    # dL/dc = (dL/dy_pred) * (dy_pred/dc)
    # dL/dd = (dL/dy_pred) * (dy_pred/dd)
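    # Since y_pred = a + b*x + c*x^2 + d*x^3, the partial derivatives are
    # dy_pred/da = 1, dy_pred/db = x, dy_pred/dc = x^2, dy_pred/dd = x^3,
    # and dL/dy_pred = 2 * (y_pred - y), which gives the expressions below.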
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights
    a -= lr * grad_a
    b -= lr * grad_b
    c -= lr * grad_c
    d -= lr * grad_d
plt.plot(x, y, label='y = sin(x)', c='b')
plt.plot(x, y_pred, label='y = a + bx + cx^2 + dx^3', c='r', linestyle='dashed')
plt.xlabel('x')
plt.ylabel('y')
plt.ylim([-2,2])
plt.legend()
plt.show()
print(f'Result: y = {a} + {b} x + {c} x^2 + {d} x^3')
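If you want a sanity check on the gradient descent result, numpy can also compute the least-squares coefficients of the same polynomial in closed form. A minimal sketch using np.polyfit (which returns coefficients from highest to lowest degree; the _ls names are only used for this comparison):

# Closed-form least-squares fit of a third order polynomial
d_ls, c_ls, b_ls, a_ls = np.polyfit(x, y, 3)
print(f'Least-squares fit: y = {a_ls} + {b_ls} x + {c_ls} x^2 + {d_ls} x^3')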
2. PyTorch: Tensors
We saw how easy it is to fit a third order polynomial using numpy. But what about modern deep neural networks? Unfortunately, numpy cannot utilize GPUs to accelerate its numerical computation. This is where PyTorch Tensors are useful. A Tensor is essentially an n-dimensional array that can also keep track of gradients and computational graphs. To run a PyTorch Tensor on a GPU, we simply need to specify the correct device; for now, we will stick to the CPU.
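For example, a common pattern (assuming a CUDA-enabled PyTorch installation) is to pick the device at runtime and pass it to every tensor constructor:

import torch
# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)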
Let's see how we can use PyTorch Tensors to accomplish our task...
import torch
import math
dtype = torch.float
device = torch.device("cpu")
#device = torch.device("cuda:0") # Uncomment this if GPU is available.
# Create input and output data
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)
# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)
learning_rate = 1e-6
for t in range(5000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d
plt.plot(x, y, label='y = sin(x)', c='b')
plt.plot(x, y_pred, label='y = a + bx + cx^2 + dx^3', c='r', linestyle='dashed')
plt.xlabel('x')
plt.ylabel('y')
plt.ylim([-2,2])
plt.legend()
plt.show()
print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
3. PyTorch: Tensors and autograd
We saw above how Tensors can also be used to fit a third order polynomial to our sin function. However, we had to manually implement both the forward and backward passes. This is not so hard for a simple task such as fitting a polynomial, but it can get very messy for deep neural networks. Fortunately, PyTorch's autograd package can automate the computation of backward passes. Let's see how we can do this...
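The idea is that when a Tensor has requires_grad=True, PyTorch records the operations performed on it and can compute gradients automatically. A minimal example (with made-up values, just to illustrate):

import torch
t = torch.tensor(2.0, requires_grad=True)
f = t ** 3     # f = t^3
f.backward()   # computes df/dt and stores it in t.grad
print(t.grad)  # tensor(12.) because df/dt = 3 * t^2 = 12 at t = 2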
import torch
import math
dtype = torch.float
device = torch.device("cpu")
# Create tensors to hold input and outputs
# As we don't need to compute gradients with respect to these Tensors, we can set requires_grad = False. This is also the default setting.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)
# Create random Tensors for the weights. For these Tensors we require gradients, so we set requires_grad=True.
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)
learning_rate = 1e-6
for t in range(5000):
    # Forward pass: we compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss using operations on Tensors.
    # Now loss is a scalar (zero-dimensional) Tensor;
    # loss.item() gets the Python number held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of the loss with respect to all Tensors with requires_grad=True.
    # After this call, a.grad, b.grad, c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.
    loss.backward()
    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
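        # (loss.backward() accumulates gradients into .grad, so the gradients
        # must be cleared here; otherwise the next iteration would add to
        # stale values.)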
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None
plt.plot(x, y, label='y = sin(x)', c='b')
# y_pred requires grad, so we use tensor.detach().numpy() to convert it into a numpy array for plotting
plt.plot(x, y_pred.detach().numpy(), label='y = a + bx + cx^2 + dx^3', c='r', linestyle='dashed')
plt.xlabel('x')
plt.ylabel('y')
plt.ylim([-2,2])
plt.legend()
plt.show()
print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
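As a natural next step, the manual weight updates and gradient zeroing above can be handed off to an optimizer from torch.optim. A minimal sketch of such a training loop, assuming the same x, y and randomly initialized a, b, c, d tensors as above:

optimizer = torch.optim.SGD([a, b, c, d], lr=learning_rate)
for t in range(5000):
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    loss = (y_pred - y).pow(2).sum()
    optimizer.zero_grad()  # reset the accumulated gradients
    loss.backward()        # compute gradients via autograd
    optimizer.step()       # update a, b, c, d in place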
Let me know if you have any comments or suggestions.