Machine Learning · Intermediate · 16 min read · Updated March 2025

Principal Component Analysis

Principal Component Analysis (PCA) is one of the most widely used dimensionality reduction techniques. It transforms correlated features into a smaller set of uncorrelated principal components that capture the maximum variance in the data.

Why Dimensionality Reduction?

High-dimensional data poses several challenges known as the Curse of Dimensionality:

- Data becomes sparse in high dimensions, making distance metrics unreliable
- Models require exponentially more data to generalize well
- Training becomes slow and memory-intensive
- Visualization is impossible beyond 3 dimensions
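The unreliability of distance metrics can be checked numerically: as dimensionality grows, distances from a query point to its nearest and farthest neighbors converge, so distance-based comparisons lose discriminating power. A minimal sketch (the `distance_contrast` helper and the dimension values are illustrative choices, not from a library):

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(dim, n_points=500):
    """Spread of distances from a random query to random points,
    relative to the nearest distance. Shrinks as dim grows."""
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  contrast={distance_contrast(dim):.3f}")
```

The contrast drops sharply with dimension, which is why nearest-neighbor-style methods degrade on raw high-dimensional data.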

Dimensionality reduction addresses these by compressing data into fewer dimensions while retaining the most important information.

How PCA Works

PCA finds new axes (principal components) that capture the maximum variance in the data. The steps are:

  • Step 1: Standardize - Center the data (subtract mean) and scale to unit variance.
  • Step 2: Covariance Matrix - Compute the covariance matrix to understand feature relationships.
  • Step 3: Eigendecomposition - Find eigenvectors (directions of maximum variance) and eigenvalues (amount of variance explained).
  • Step 4: Sort - Sort eigenvectors by eigenvalue in descending order.
  • Step 5: Project - Project the original data onto the top K eigenvectors (principal components).

Explained Variance Ratio

Each principal component explains a certain percentage of the total variance. The explained variance ratio tells you how much information is retained:

- PC1 always explains the most variance
- PC2 explains the second most, orthogonal to PC1
- The cumulative explained variance helps choose how many components to keep

A common rule: keep enough components to explain 95% of the variance. This often reduces dimensionality by 50-90% while retaining most information.
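The 95% rule amounts to one cumulative sum over the explained variance ratios. A sketch on the digits dataset (the same data used in the full pipeline later in this article):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = StandardScaler().fit_transform(load_digits().data)
pca = PCA().fit(X)  # fit with all components first

cumulative = np.cumsum(pca.explained_variance_ratio_)
# Smallest K whose cumulative explained variance reaches 95%
K = int(np.argmax(cumulative >= 0.95)) + 1
print(f"Components for 95% variance: {K} of {X.shape[1]}")
```

Passing `n_components=0.95` to `PCA` performs this selection automatically; the explicit version above makes the rule visible.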

Implementing PCA in Python

A full PCA pipeline with variance analysis and a classification accuracy comparison:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load high-dimensional dataset (64 features)
digits = load_digits()
X, y = digits.data, digits.target
print(f"Original shape: {X.shape}")  # (1797, 64)

# Step 1: Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 2: Apply PCA
pca = PCA(n_components=0.95)  # Keep 95% of variance
X_pca = pca.fit_transform(X_scaled)
print(f"Reduced shape: {X_pca.shape}")
print(f"Components kept: {pca.n_components_}")
print(f"Variance explained: {pca.explained_variance_ratio_.sum():.4f}")

# Step 3: Compare classification accuracy
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

# Fit PCA on the training split only, to avoid leaking test data
pca_clf = PCA(n_components=0.95).fit(X_train)
X_pca_train = pca_clf.transform(X_train)
X_pca_test = pca_clf.transform(X_test)

# Without PCA
lr_full = LogisticRegression(max_iter=1000)
lr_full.fit(X_train, y_train)
acc_full = accuracy_score(y_test, lr_full.predict(X_test))

# With PCA
lr_pca = LogisticRegression(max_iter=1000)
lr_pca.fit(X_pca_train, y_train)
acc_pca = accuracy_score(y_test, lr_pca.predict(X_pca_test))

print(f"\nAccuracy without PCA (64 features): {acc_full:.4f}")
print(f"Accuracy with PCA ({pca_clf.n_components_} features): {acc_pca:.4f}")

# Scree plot data
print("\nVariance explained by first 10 components:")
for i, var in enumerate(pca.explained_variance_ratio_[:10]):
    print(f"  PC{i+1}: {var:.4f} ({var*100:.1f}%)")
```

PCA for Visualization

One of the most powerful uses of PCA is reducing data to 2 or 3 dimensions for visualization. By projecting to 2D, you can:

- Visually inspect cluster structure before applying K-Means
- Identify outliers and anomalies
- Understand class separability before classification
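A minimal 2D projection of the digits dataset, colored by class label (the `Agg` backend and output filename are illustrative choices so the sketch runs headlessly):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop for a live window
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

digits = load_digits()
X = StandardScaler().fit_transform(digits.data)

# Project 64 features down to 2 for plotting
X_2d = PCA(n_components=2).fit_transform(X)

fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(X_2d[:, 0], X_2d[:, 1], c=digits.target,
                     cmap="tab10", s=10)
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("Digits projected onto the first two principal components")
fig.colorbar(scatter, label="digit class")
fig.savefig("digits_pca.png", dpi=150)
```

Even with only 2 of 64 dimensions, several digit classes already form visible clusters.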

For non-linear structure, t-SNE and UMAP often produce better visualizations than PCA, since they preserve local neighborhood structure that a linear projection cannot.

Limitations of PCA

PCA has important constraints:

  • Linear only - PCA can only capture linear relationships. Use Kernel PCA for non-linear data.
  • Interpretability - Principal components are linear combinations of all features, making them hard to interpret.
  • Sensitive to scaling - Always standardize features before PCA.
  • Gaussian assumption - Variance fully summarizes spread only for roughly Gaussian data, so PCA works best when features are approximately normally distributed.
  • Information loss - Discarded components always contain some information, even if small.
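The linearity limitation is easy to see on concentric circles, which no linear projection can separate. A sketch comparing plain PCA to `KernelPCA` with an RBF kernel (the `gamma=10` value and the `pc1_separability` helper are hand-picked for this toy data, not a general recipe):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA is just a rotation here: the circles stay concentric
X_lin = PCA(n_components=2).fit_transform(X)

# RBF-kernel PCA unfolds the rings, separating them along PC1
X_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

def pc1_separability(Z, labels):
    """Crude score: best class split from thresholding PC1 at its median."""
    predicted = (Z[:, 0] > np.median(Z[:, 0])).astype(int)
    return max(np.mean(predicted == labels), np.mean(predicted != labels))

print(f"Linear PCA PC1 separability: {pc1_separability(X_lin, y):.2f}")
print(f"Kernel PCA PC1 separability: {pc1_separability(X_rbf, y):.2f}")
```

The kernel version scores far higher, because the RBF kernel implicitly maps points into a space where radius becomes a linear direction.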

Key Takeaways

  • PCA transforms correlated features into uncorrelated principal components ordered by variance explained.
  • Always standardize features before PCA - it is sensitive to feature scale.
  • Use explained variance ratio to choose the number of components (typically 95% threshold).
  • PCA reduces overfitting, speeds up training, and enables 2D/3D visualization of high-dimensional data.
  • PCA is linear - for non-linear dimensionality reduction, use t-SNE, UMAP, or Kernel PCA.
