Building a Simple Neural Network from Scratch with NumPy
In this post, we'll walk through a complete implementation of a simple feedforward neural network using only NumPy. This small project is an excellent way to understand the core concepts of how neural networks learn from data using gradient descent and backpropagation, without relying on high-level libraries like TensorFlow or PyTorch.
What Are We Building?
We're building a neural network with:
- 1 hidden layer
- Sigmoid activation function
- Manual weight updates via gradient descent
- No external libraries beyond NumPy
The network will be trained on a small synthetic dataset to learn a binary classification task.
Prerequisites: The Sigmoid Function
The sigmoid function squashes input values into the range (0, 1), which makes it ideal for binary classification problems.
import numpy as np

def sigmoid(x, derivative=False):
    if derivative:
        # here x is assumed to already be a sigmoid output, so this is s * (1 - s)
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
The derivative of the sigmoid function is essential during backpropagation when calculating gradients.
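A quick illustrative check of the convention used here (not part of the original code, assuming the sigmoid defined above): when derivative=True, the function expects the already-activated sigmoid output rather than the raw input.
z = 0.5
a = sigmoid(z)                       # activation, roughly 0.622
slope = sigmoid(a, derivative=True)  # pass the activation a, not the raw input z
print(slope)                         # roughly 0.235, i.e. a * (1 - a)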
The Dataset
We'll define a small dataset with 6 samples and 3 binary features. The corresponding labels are binary (0 or 1).
X = np.array([
[0, 0, 1],
[0, 1, 1],
[1, 0, 0],
[1, 1, 0],
[1, 0, 1],
[1, 1, 1],
])
y = np.array([[0, 1, 0, 1, 1, 0]]).T
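As a quick sanity check (an optional snippet, not part of the original code), the arrays have the shapes the rest of the walkthrough relies on:
print(X.shape)  # (6, 3): 6 samples, 3 binary features
print(y.shape)  # (6, 1): one binary label per sample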
Network Architecture and Initialization
We'll use a single hidden layer with 3 neurons. We initialize the weights uniformly at random in the range [-1, 1) and set a fixed random seed for reproducibility.
np.random.seed(1)
alpha = 0.1  # learning rate
num_hidden = 3
hidden_weights = 2 * np.random.random((X.shape[1] + 1, num_hidden)) - 1
output_weights = 2 * np.random.random((num_hidden + 1, y.shape[1])) - 1
Note: We add a bias node by increasing the input dimensions by 1.
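If you want to verify the initialization (an optional check, not in the original code), the weight matrices have the following shapes:
print(hidden_weights.shape)  # (4, 3): 3 input features + 1 bias -> 3 hidden neurons
print(output_weights.shape)  # (4, 1): 3 hidden neurons + 1 bias -> 1 output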
Training the Network
We'll train the network for 10,000 iterations of full-batch gradient descent with backpropagation (every iteration uses all 6 samples). Each snippet below is one step of the update that runs inside the training loop; a sketch of the complete loop follows at the end of this section.
Forward Pass
- Add a bias term to the input layer.
- Compute the hidden layer outputs using the sigmoid activation.
- Add a bias term to the hidden layer.
- Compute the final outputs (no activation on the output layer).
# Prepend a column of ones to the inputs as the bias term
input_layer_outputs = np.hstack((np.ones((X.shape[0], 1)), X))
# Hidden activations, again with a bias column prepended
hidden_layer_outputs = np.hstack((np.ones((X.shape[0], 1)), sigmoid(np.dot(input_layer_outputs, hidden_weights))))
# Linear output layer (no activation)
output_layer_outputs = np.dot(hidden_layer_outputs, output_weights)
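As a quick check of the forward pass (an optional snippet, assuming the arrays defined above), the intermediate shapes line up as expected:
print(input_layer_outputs.shape)   # (6, 4): inputs plus bias column
print(hidden_layer_outputs.shape)  # (6, 4): hidden activations plus bias column
print(output_layer_outputs.shape)  # (6, 1): one raw score per sample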
Backpropagation
We calculate the error at each layer and propagate it backward to update the weights.
Output Layer Error
output_error = output_layer_outputs - y  # derivative of a squared-error loss w.r.t. the outputs (up to a constant factor)
Hidden Layer Error
We drop the bias column from the hidden outputs (the bias node has no incoming weights to update) and the corresponding bias weight from the output weights, then apply the derivative of the sigmoid function:
hidden_error = hidden_layer_outputs[:, 1:] * (1 - hidden_layer_outputs[:, 1:]) * np.dot(output_error, output_weights.T[:, 1:])
Gradients
We compute the per-sample partial derivatives and average them over the batch to get the gradients.
# Per-sample products of each layer's inputs with the errors flowing into it
hidden_pd = input_layer_outputs[:, :, np.newaxis] * hidden_error[:, np.newaxis, :]
output_pd = hidden_layer_outputs[:, :, np.newaxis] * output_error[:, np.newaxis, :]
# Average over the 6 samples to get the full-batch gradients
total_hidden_gradient = np.average(hidden_pd, axis=0)
total_output_gradient = np.average(output_pd, axis=0)
Update Weights
# Step the weights opposite the gradient, scaled by the learning rate
hidden_weights += -alpha * total_hidden_gradient
output_weights += -alpha * total_output_gradient
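The snippets above describe a single update step. Here is a minimal sketch of how they might be assembled into the 10,000-iteration training loop (assuming the sigmoid function, dataset, learning rate, and weight initialization defined earlier; the iteration count is taken from the text above):
num_iterations = 10000  # iteration count from the text

for i in range(num_iterations):
    # Forward pass
    input_layer_outputs = np.hstack((np.ones((X.shape[0], 1)), X))
    hidden_layer_outputs = np.hstack((np.ones((X.shape[0], 1)),
                                      sigmoid(np.dot(input_layer_outputs, hidden_weights))))
    output_layer_outputs = np.dot(hidden_layer_outputs, output_weights)

    # Backpropagation
    output_error = output_layer_outputs - y
    hidden_error = (hidden_layer_outputs[:, 1:] * (1 - hidden_layer_outputs[:, 1:])
                    * np.dot(output_error, output_weights.T[:, 1:]))

    # Gradients, averaged over the batch
    hidden_pd = input_layer_outputs[:, :, np.newaxis] * hidden_error[:, np.newaxis, :]
    output_pd = hidden_layer_outputs[:, :, np.newaxis] * output_error[:, np.newaxis, :]
    total_hidden_gradient = np.average(hidden_pd, axis=0)
    total_output_gradient = np.average(output_pd, axis=0)

    # Gradient-descent update
    hidden_weights += -alpha * total_hidden_gradient
    output_weights += -alpha * total_output_gradient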
Results
After training, we print the network's final predictions:
print("Output After Training: \n{}".format(output_layer_outputs))
Example output (exact values depend on the random seed, learning rate, and number of iterations):
Output After Training:
[[0.01]
 [0.97]
 [0.03]
 [0.95]
 [0.98]
 [0.02]]
As you can see, the network correctly predicts values close to 0 or 1, aligning well with the training labels.
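If you want hard 0/1 predictions rather than raw scores, one possible post-processing step (an illustrative addition, not part of the original walkthrough) is to threshold the outputs at 0.5 and compare them with the labels:
predictions = (output_layer_outputs > 0.5).astype(int)
print(predictions.ravel())        # e.g. [0 1 0 1 1 0]
print(np.mean(predictions == y))  # fraction of correct predictions; 1.0 if every sample matches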
Key Takeaways
- Implementing a neural network from scratch builds intuition for how learning happens under the hood.
- The sigmoid activation and its derivative are essential for backpropagation.
- Bias terms play a critical role in allowing the network to shift activation thresholds.
- Vectorized operations with NumPy keep the implementation clean and efficient.