AI · Beginner · 12 min read · Updated March 2025

Introduction to Neural Networks

Neural networks are the foundation of modern deep learning. This article explains perceptrons, activation functions, layers, and how a simple neural network learns from data.

Biological Inspiration

Neural networks are loosely inspired by the human brain. The brain contains ~86 billion neurons, each connected to thousands of others via synapses. Information flows as electrical signals, and learning occurs by strengthening or weakening synaptic connections.

Artificial neural networks mimic this structure:

  • Artificial neurons (nodes) receive inputs, apply a function, and produce an output.
  • Weights represent the strength of connections between neurons.
  • Learning adjusts weights to minimize prediction errors.

The Perceptron

The perceptron, invented by Frank Rosenblatt in 1958, is the simplest neural network - a single neuron.

It computes a weighted sum of inputs and passes it through a step function:

```
output = 1 if (w1*x1 + w2*x2 + ... + wn*xn + b) >= 0
output = 0 otherwise
```

Where:

  • x1...xn are input features
  • w1...wn are weights
  • b is the bias term

```python
import numpy as np

class Perceptron:
    def __init__(self, learning_rate=0.1, n_epochs=100):
        self.lr = learning_rate
        self.n_epochs = n_epochs
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for epoch in range(self.n_epochs):
            for xi, yi in zip(X, y):
                prediction = self.predict_single(xi)
                # Update rule: w = w + lr * (y - y_hat) * x
                update = self.lr * (yi - prediction)
                self.weights += update * xi
                self.bias += update

    def predict_single(self, x):
        linear = np.dot(x, self.weights) + self.bias
        return 1 if linear >= 0 else 0

    def predict(self, X):
        return np.array([self.predict_single(xi) for xi in X])

# Learn AND gate
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([0, 0, 0, 1])  # AND: only 1 when both inputs are 1

p = Perceptron(learning_rate=0.1, n_epochs=20)
p.fit(X, y)
print("AND gate predictions:", p.predict(X))
# Output: AND gate predictions: [0 0 0 1]
```
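A single perceptron can only learn linearly separable functions, and the XOR gate is the classic counterexample: no single line separates its 0s from its 1s. A minimal sketch (same update rule as the class above, inlined for brevity) shows the perceptron failing to fit XOR no matter how long it trains:

```python
import numpy as np

# XOR truth table - not linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Same perceptron update rule as above, inlined
w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(100):
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b >= 0 else 0
        w += lr * (yi - pred) * xi
        b += lr * (yi - pred)

preds = np.array([1 if xi @ w + b >= 0 else 0 for xi in X])
print("XOR predictions:", preds)
print("Accuracy:", (preds == y).mean())  # never reaches 1.0
```

The weights simply oscillate forever. This limitation, highlighted by Minsky and Papert in 1969, is exactly what hidden layers fix, as the multi-layer network below demonstrates.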

Activation Functions

Activation functions introduce non-linearity, allowing neural networks to learn complex patterns. Without them, a deep network would behave like a single linear layer.
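The collapse of stacked linear layers is easy to verify numerically: composing two linear maps is itself a linear map. A quick sketch with arbitrary weights (names like `W1`, `W2` are illustrative, not from any library):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((3, 4)), rng.standard_normal((4, 2))
b1, b2 = rng.standard_normal(4), rng.standard_normal(2)
X = rng.standard_normal((5, 3))

# Two "layers" with no activation function in between...
two_layers = (X @ W1 + b1) @ W2 + b2

# ...equal a single linear layer with W = W1 @ W2 and b = b1 @ W2 + b2
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = X @ W + b

print(np.allclose(two_layers, one_layer))  # True
```

Inserting a non-linearity between the two layers breaks this algebraic collapse, which is what gives depth its expressive power.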

Common activation functions:

  • Sigmoid: sigma(x) = 1/(1+e^-x) - Output in (0,1), used for binary classification output.
  • Tanh: tanh(x) - Output in (-1,1), zero-centered, better than sigmoid for hidden layers.
  • ReLU: max(0, x) - Most popular for hidden layers; fast and avoids vanishing gradients.
  • Leaky ReLU: max(0.01x, x) - Fixes "dying ReLU" problem.
  • Softmax: e^xi / sum(e^xj) - Converts logits to probabilities for multi-class output.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    e_x = np.exp(x - np.max(x))  # Subtract max for numerical stability
    return e_x / e_x.sum()

# Test
x = np.array([-2, -1, 0, 1, 2])
print("Input:      ", x)
print("Sigmoid:    ", sigmoid(x).round(3))
print("ReLU:       ", relu(x))
print("Leaky ReLU: ", leaky_relu(x).round(3))

# Softmax for multi-class output
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print("\nSoftmax probabilities:", probs.round(3))
print("Sum:", probs.sum())  # ~1.0 (up to floating-point rounding)
```

Multi-Layer Neural Network Architecture

A modern neural network has multiple layers:

1. Input Layer - Receives raw features (e.g., pixel values, word embeddings).
2. Hidden Layers - Learn intermediate representations. More layers = deeper network.
3. Output Layer - Produces final predictions (class probabilities, regression values).

The number of layers and neurons per layer are hyperparameters chosen by the designer.

```python
import numpy as np

class SimpleNeuralNetwork:
    """2-layer neural network for binary classification."""

    def __init__(self, input_size, hidden_size, output_size):
        # He initialization (scaled for ReLU layers)
        self.W1 = np.random.randn(input_size, hidden_size) * np.sqrt(2/input_size)
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * np.sqrt(2/hidden_size)
        self.b2 = np.zeros((1, output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

    def forward(self, X):
        # Layer 1: Linear + ReLU
        self.z1 = X @ self.W1 + self.b1
        self.a1 = np.maximum(0, self.z1)  # ReLU

        # Layer 2: Linear + Sigmoid
        self.z2 = self.a1 @ self.W2 + self.b2
        self.a2 = self.sigmoid(self.z2)   # Output probability
        return self.a2

    def predict(self, X):
        probs = self.forward(X)
        return (probs > 0.5).astype(int)

# Example usage
np.random.seed(42)
nn = SimpleNeuralNetwork(input_size=2, hidden_size=4, output_size=1)

# XOR problem (not linearly separable - needs hidden layer)
X = np.array([[0,0],[0,1],[1,0],[1,1]])
print("Output probabilities:", nn.forward(X).flatten().round(3))
# Before training, outputs are random - training via backprop would fix this
```

How Neural Networks Learn

Neural networks learn by minimizing a loss function that measures prediction error:

1. Forward Pass - Compute predictions from inputs.
2. Compute Loss - Measure error (e.g., cross-entropy for classification, MSE for regression).
3. Backward Pass - Compute gradients of the loss with respect to the weights (backpropagation).
4. Update Weights - Adjust weights using gradient descent: w = w - alpha * dL/dw

This cycle repeats for many epochs until the loss converges.
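The full cycle can be sketched end-to-end for a 2-layer network like the one above, trained on XOR with binary cross-entropy and plain gradient descent. This is a minimal illustration of the four steps, not a production training loop; the learning rate and epoch count are arbitrary choices:

```python
import numpy as np

np.random.seed(0)

# XOR data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# 2 -> 4 -> 1 network: ReLU hidden layer, sigmoid output (He-initialized)
W1 = np.random.randn(2, 4) * np.sqrt(2 / 2)
b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 1) * np.sqrt(2 / 4)
b2 = np.zeros((1, 1))
lr = 0.5

losses = []
for epoch in range(2000):
    # 1. Forward pass
    z1 = X @ W1 + b1
    a1 = np.maximum(0, z1)           # ReLU
    z2 = a1 @ W2 + b2
    a2 = 1 / (1 + np.exp(-z2))       # sigmoid -> probability

    # 2. Compute loss: binary cross-entropy
    loss = -np.mean(y * np.log(a2 + 1e-9) + (1 - y) * np.log(1 - a2 + 1e-9))
    losses.append(loss)

    # 3. Backward pass (chain rule, layer by layer)
    dz2 = (a2 - y) / len(X)          # dL/dz2 for sigmoid + cross-entropy
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (z1 > 0)    # ReLU gradient: 1 where z1 > 0, else 0
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # 4. Update weights: w = w - alpha * dL/dw
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("Loss: %.4f -> %.4f" % (losses[0], losses[-1]))
print("Predictions:", (a2 > 0.5).astype(int).flatten())
```

The sigmoid-plus-cross-entropy pairing makes the output-layer gradient collapse to the simple form `a2 - y`, which is one reason this combination is standard for binary classification.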

Key Takeaways

  • Neural networks are inspired by biological neurons and learn by adjusting connection weights.
  • The perceptron is the simplest neural network - a single neuron with a threshold function.
  • Activation functions (ReLU, Sigmoid, Softmax) introduce non-linearity for complex learning.
  • Multi-layer networks have input, hidden, and output layers - more layers = deeper network.
  • Learning occurs via forward pass -> loss computation -> backpropagation -> weight update.
