Introduction to Neural Networks
Neural networks are the foundation of modern deep learning. This article explains perceptrons, activation functions, layers, and how a simple neural network learns from data.
Biological Inspiration
Neural networks are loosely inspired by the human brain. The brain contains ~86 billion neurons, each connected to thousands of others via synapses. Information flows as electrical signals, and learning occurs by strengthening or weakening synaptic connections.
Artificial neural networks mimic this structure:
- Artificial neurons (nodes) receive inputs, apply a function, and produce an output.
- Weights represent the strength of connections between neurons.
- Learning adjusts weights to minimize prediction errors.
The Perceptron
The perceptron, invented by Frank Rosenblatt in 1958, is the simplest neural network - a single neuron.
It computes a weighted sum of inputs and passes it through a step function:
```
output = 1 if (w1*x1 + w2*x2 + ... + wn*xn + b) >= 0
output = 0 otherwise
```
Where:
- x1...xn are input features
- w1...wn are weights
- b is the bias term
```
import numpy as np

class Perceptron:
    def __init__(self, learning_rate=0.1, n_epochs=100):
        self.lr = learning_rate
        self.n_epochs = n_epochs
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        for epoch in range(self.n_epochs):
            for xi, yi in zip(X, y):
                prediction = self.predict_single(xi)
                # Update rule: w = w + lr * (y - y_hat) * x
                update = self.lr * (yi - prediction)
                self.weights += update * xi
                self.bias += update

    def predict_single(self, x):
        linear = np.dot(x, self.weights) + self.bias
        return 1 if linear >= 0 else 0

    def predict(self, X):
        return np.array([self.predict_single(xi) for xi in X])

# Learn AND gate
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([0, 0, 0, 1])  # AND: only 1 when both inputs are 1

p = Perceptron(learning_rate=0.1, n_epochs=20)
p.fit(X, y)
print("AND gate predictions:", p.predict(X))
# Output: AND gate predictions: [0 0 0 1]
```

Activation Functions
Activation functions introduce non-linearity, allowing neural networks to learn complex patterns. Without them, a deep network would behave like a single linear layer.
Common activation functions:
- Sigmoid: sigma(x) = 1/(1+e^-x) - Output in (0,1), used for binary classification output.
- Tanh: tanh(x) - Output in (-1,1), zero-centered, better than sigmoid for hidden layers.
- ReLU: max(0, x) - Most popular for hidden layers; cheap to compute and mitigates the vanishing gradient problem.
- Leaky ReLU: max(0.01x, x) - Fixes "dying ReLU" problem.
- Softmax: e^xi / sum(e^xj) - Converts logits to probabilities for multi-class output.
```
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    e_x = np.exp(x - np.max(x))  # Subtract max for numerical stability
    return e_x / e_x.sum()

# Test
x = np.array([-2, -1, 0, 1, 2])
print("Input:     ", x)
print("Sigmoid:   ", sigmoid(x).round(3))
print("ReLU:      ", relu(x))
print("Leaky ReLU:", leaky_relu(x).round(3))

# Softmax for multi-class output
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print("\nSoftmax probabilities:", probs.round(3))
print("Sum:", probs.sum())  # Always sums to 1
```

Multi-Layer Neural Network Architecture
A modern neural network has multiple layers:
1. Input Layer - Receives raw features (e.g., pixel values, word embeddings).
2. Hidden Layers - Learn intermediate representations. More layers = deeper network.
3. Output Layer - Produces final predictions (class probabilities, regression values).
The number of layers and neurons per layer are hyperparameters chosen by the designer.
```
import numpy as np

class SimpleNeuralNetwork:
    """2-layer neural network for binary classification."""

    def __init__(self, input_size, hidden_size, output_size):
        # He initialization (the sqrt(2/fan_in) scaling suited to ReLU layers)
        self.W1 = np.random.randn(input_size, hidden_size) * np.sqrt(2/input_size)
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * np.sqrt(2/hidden_size)
        self.b2 = np.zeros((1, output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

    def forward(self, X):
        # Layer 1: Linear + ReLU
        self.z1 = X @ self.W1 + self.b1
        self.a1 = np.maximum(0, self.z1)  # ReLU
        # Layer 2: Linear + Sigmoid
        self.z2 = self.a1 @ self.W2 + self.b2
        self.a2 = self.sigmoid(self.z2)  # Output probability
        return self.a2

    def predict(self, X):
        probs = self.forward(X)
        return (probs > 0.5).astype(int)

# Example usage
np.random.seed(42)
nn = SimpleNeuralNetwork(input_size=2, hidden_size=4, output_size=1)

# XOR problem (not linearly separable - needs a hidden layer)
X = np.array([[0,0],[0,1],[1,0],[1,1]])
print("Output probabilities:", nn.forward(X).flatten().round(3))
# Before training, outputs are random - training via backprop would fix this
```

How Neural Networks Learn
Neural networks learn by minimizing a loss function that measures prediction error:
1. Forward Pass - Compute predictions from inputs.
2. Compute Loss - Measure error (e.g., cross-entropy for classification, MSE for regression).
3. Backward Pass - Compute gradients of the loss with respect to the weights (backpropagation).
4. Update Weights - Adjust weights using gradient descent: w = w - alpha * dL/dw
This cycle repeats for many epochs until the loss converges.
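The four steps above can be sketched end to end. The following is a minimal, illustrative training loop (not part of the article's code) that trains a 2-layer network with a ReLU hidden layer and sigmoid output on the XOR problem, using binary cross-entropy loss and plain gradient descent; the hidden size, learning rate, and epoch count are arbitrary choices for the demo.

```
import numpy as np

# Sketch of the forward -> loss -> backward -> update cycle on XOR.
np.random.seed(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR labels

W1 = np.random.randn(2, 8) * np.sqrt(2 / 2)   # He-style initialization
b1 = np.zeros((1, 8))
W2 = np.random.randn(8, 1) * np.sqrt(2 / 8)
b2 = np.zeros((1, 1))
lr = 0.5

for epoch in range(5000):
    # 1. Forward pass
    z1 = X @ W1 + b1
    a1 = np.maximum(0, z1)                     # ReLU
    z2 = a1 @ W2 + b2
    a2 = 1 / (1 + np.exp(-z2))                 # sigmoid probabilities

    # 2. Compute loss: binary cross-entropy
    loss = -np.mean(y * np.log(a2 + 1e-9) + (1 - y) * np.log(1 - a2 + 1e-9))
    if epoch == 0:
        initial_loss = loss

    # 3. Backward pass (chain rule)
    dz2 = (a2 - y) / len(X)                    # gradient of BCE+sigmoid w.r.t. z2
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (z1 > 0)              # ReLU derivative is 0 or 1
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # 4. Update weights: w = w - alpha * dL/dw
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("Initial loss:", round(float(initial_loss), 3))
print("Final loss:  ", round(float(loss), 3))
print("Predictions: ", (a2 > 0.5).astype(int).flatten())
```

With these settings the loss typically drops close to zero and the predictions match the XOR labels [0 1 1 0], though the exact trajectory depends on the random initialization.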
Key Takeaways
- Neural networks are inspired by biological neurons and learn by adjusting connection weights.
- The perceptron is the simplest neural network - a single neuron with a threshold function.
- Activation functions (ReLU, Sigmoid, Softmax) introduce non-linearity for complex learning.
- Multi-layer networks have input, hidden, and output layers - more layers = deeper network.
- Learning occurs via forward pass -> loss computation -> backpropagation -> weight update.