Sentiment Analysis
Sentiment analysis classifies the emotional polarity of text as positive, negative, or neutral. This article covers lexicon-based approaches, machine learning classifiers, and deep learning methods.
What is Sentiment Analysis?
Sentiment analysis (also called opinion mining) is the NLP task of identifying and extracting subjective information from text - primarily whether the expressed opinion is positive, negative, or neutral.
Applications:
- Product review analysis (Amazon, Yelp)
- Social media monitoring (Twitter/X brand sentiment)
- Customer feedback classification
- Financial news sentiment for trading signals
- Political opinion analysis
Granularity levels:
- Document-level: Overall sentiment of an entire review.
- Sentence-level: Sentiment of each sentence.
- Aspect-level: Sentiment toward specific aspects ("The camera is great but battery life is very poor").
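As a rough illustration of aspect-level analysis, a sentence can be split on contrast markers like "but" and each clause scored separately. The tiny lexicon and helper below are invented for illustration, not a real resource:

```python
# Hypothetical sketch: aspect-level scoring by splitting on contrast markers.
LEXICON = {"great": 2, "poor": -1, "amazing": 3, "slow": -1}  # illustrative only

def clause_sentiments(sentence: str) -> list:
    """Score each 'but'-separated clause independently."""
    results = []
    for clause in sentence.lower().replace(",", "").split(" but "):
        score = sum(LEXICON.get(tok, 0) for tok in clause.split())
        results.append((clause.strip(), score))
    return results

for clause, score in clause_sentiments("The camera is great but the battery life is poor"):
    print(f"{score:+d} | {clause}")
# +2 | the camera is great
# -1 | the battery life is poor
```

Real aspect-level systems instead use dependency parsing or fine-tuned transformers to link each opinion word to its target, but the clause-splitting intuition is the same.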
Lexicon-Based Approach
The simplest approach uses a sentiment lexicon - a dictionary mapping words to sentiment scores.
```python
# Lexicon-based sentiment analysis
POSITIVE_WORDS = {
    'good': 1, 'great': 2, 'excellent': 3, 'amazing': 3, 'love': 2,
    'wonderful': 2, 'fantastic': 3, 'best': 2, 'happy': 1, 'perfect': 3,
    'awesome': 2, 'brilliant': 2, 'outstanding': 3, 'superb': 3,
}

NEGATIVE_WORDS = {
    'bad': -1, 'terrible': -3, 'awful': -3, 'hate': -2, 'worst': -3,
    'horrible': -3, 'poor': -1, 'disappointing': -2, 'useless': -2,
    'broken': -2, 'slow': -1, 'expensive': -1, 'ugly': -2,
}

# Note: "n't" only matches if the tokenizer splits contractions;
# plain str.split() does not, so "isn't" would be missed here.
NEGATION_WORDS = {'not', 'no', 'never', "n't", 'neither', 'nor', 'hardly'}
INTENSIFIERS = {'very': 1.5, 'extremely': 2.0, 'quite': 1.2, 'really': 1.3}

def lexicon_sentiment(text: str) -> dict:
    tokens = text.lower().split()
    score = 0
    negated = False
    intensifier = 1.0
    for token in tokens:
        token = token.strip('.,!?')  # strip trailing punctuation so "broken." matches
        # Check for negation (flips the sign of the next sentiment word)
        if token in NEGATION_WORDS:
            negated = True
            continue
        # Check for intensifier (scales the next sentiment word)
        if token in INTENSIFIERS:
            intensifier = INTENSIFIERS[token]
            continue
        # Score the word
        word_score = 0
        if token in POSITIVE_WORDS:
            word_score = POSITIVE_WORDS[token] * intensifier
        elif token in NEGATIVE_WORDS:
            word_score = NEGATIVE_WORDS[token] * intensifier
        if negated:
            word_score = -word_score
            negated = False
        score += word_score
        intensifier = 1.0  # Reset after each non-modifier token
    if score > 0:
        label = "POSITIVE"
    elif score < 0:
        label = "NEGATIVE"
    else:
        label = "NEUTRAL"
    return {"score": round(score, 2), "label": label}

# Test
reviews = [
    "This product is absolutely amazing and works perfectly!",
    "Terrible quality, completely broken after one day.",
    "Not bad, but not great either.",
    "The camera is great but the battery life is very poor.",
]
for review in reviews:
    result = lexicon_sentiment(review)
    print(f"{result['label']:8} ({result['score']:+.1f}) | {review[:50]}")
```

Machine Learning Approach
ML-based sentiment analysis trains a classifier on labeled examples. The pipeline:
1. Preprocessing - Clean and tokenize text.
2. Feature Extraction - Convert text to numerical features (TF-IDF, bag-of-words).
3. Train Classifier - Logistic Regression, SVM, Naive Bayes.
4. Evaluate - Accuracy, F1-score on test set.
```python
from collections import Counter
import math

class TFIDFVectorizer:
    """Simple TF-IDF implementation."""

    def fit(self, corpus: list):
        self.vocab = {}
        self.idf = {}
        N = len(corpus)
        # Build vocabulary from every word in the corpus
        all_words = set()
        for doc in corpus:
            all_words.update(doc.split())
        self.vocab = {w: i for i, w in enumerate(sorted(all_words))}
        # Compute smoothed IDF for each vocabulary word
        for word in self.vocab:
            df = sum(1 for doc in corpus if word in doc.split())
            self.idf[word] = math.log((N + 1) / (df + 1)) + 1
        return self

    def transform(self, corpus: list) -> list:
        vectors = []
        for doc in corpus:
            tokens = doc.split()
            tf = Counter(tokens)
            vec = [0.0] * len(self.vocab)
            for word, count in tf.items():
                if word in self.vocab:
                    tfidf = (count / len(tokens)) * self.idf.get(word, 1)
                    vec[self.vocab[word]] = tfidf
            vectors.append(vec)
        return vectors

class LogisticRegressionSentiment:
    """Binary logistic regression for sentiment."""

    def __init__(self, lr=0.1, epochs=100):
        self.lr = lr
        self.epochs = epochs
        self.weights = None
        self.bias = 0

    def sigmoid(self, x):
        # Clamp x to avoid overflow in math.exp
        return 1 / (1 + math.exp(-max(-500, min(500, x))))

    def fit(self, X, y):
        self.weights = [0.0] * len(X[0])
        for _ in range(self.epochs):
            for xi, yi in zip(X, y):
                pred = self.sigmoid(sum(w * x for w, x in zip(self.weights, xi)) + self.bias)
                error = yi - pred
                # Stochastic gradient step on the log-likelihood
                self.weights = [w + self.lr * error * x for w, x in zip(self.weights, xi)]
                self.bias += self.lr * error

    def predict(self, X):
        preds = []
        for xi in X:
            score = sum(w * x for w, x in zip(self.weights, xi)) + self.bias
            preds.append(1 if self.sigmoid(score) > 0.5 else 0)
        return preds
```
```python
# Training data (tiny toy corpus)
train_texts = [
    "great product love it amazing",
    "excellent quality highly recommend",
    "best purchase ever fantastic",
    "terrible product broke immediately",
    "awful quality waste of money",
    "horrible experience never again",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# Train
vectorizer = TFIDFVectorizer().fit(train_texts)
X_train = vectorizer.transform(train_texts)
clf = LogisticRegressionSentiment(lr=0.5, epochs=200)
clf.fit(X_train, train_labels)

# Test
test_texts = ["amazing quality love it", "terrible broken waste"]
X_test = vectorizer.transform(test_texts)
predictions = clf.predict(X_test)
labels = ["POSITIVE" if p == 1 else "NEGATIVE" for p in predictions]
print("Predictions:", list(zip(test_texts, labels)))
```

Deep Learning for Sentiment Analysis
Modern sentiment analysis uses pre-trained language models fine-tuned on sentiment data:
BERT-based approach:
1. Start with pre-trained BERT (trained on ~3.3B words).
2. Add a classification head on top of the [CLS] token.
3. Fine-tune on labeled sentiment data (e.g., SST-2, IMDB).
This achieves ~95%+ accuracy on standard benchmarks - far better than traditional ML.
```python
# Sentiment analysis with a fine-tuned BERT model (using HuggingFace Transformers)
# pip install transformers torch
from transformers import pipeline

# Load a pre-trained sentiment analysis pipeline
# (downloads the model on first run - ~500MB)
sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Analyze sentiments
texts = [
    "I absolutely love this product! Best purchase ever.",
    "This is the worst thing I've ever bought. Complete waste of money.",
    "It's okay, nothing special but does the job.",
    "The camera quality is outstanding but the battery drains too fast.",
]
results = sentiment_pipeline(texts)
for text, result in zip(texts, results):
    print(f"{result['label']:8} ({result['score']:.3f}) | {text[:60]}")

# Output:
# POSITIVE (0.9998) | I absolutely love this product! Best purchase ever.
# NEGATIVE (0.9997) | This is the worst thing I've ever bought. Complete waste...
# NEGATIVE (0.9811) | It's okay, nothing special but does the job.
# POSITIVE (0.9987) | The camera quality is outstanding but the battery drains...
# Note: this SST-2 model is binary, so the neutral third example is
# forced into POSITIVE or NEGATIVE rather than getting a NEUTRAL label.
```

Evaluation Metrics
Sentiment analysis models are evaluated with standard classification metrics:
- Accuracy - (TP + TN) / Total. Simple but misleading for imbalanced datasets.
- Precision - TP / (TP + FP). Of all predicted positives, how many are correct?
- Recall - TP / (TP + FN). Of all actual positives, how many did we catch?
- F1-Score - 2 * (Precision * Recall) / (Precision + Recall). Harmonic mean.
- AUC-ROC - Area under the ROC curve. Measures discrimination ability.
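The formulas above can be checked with a small worked example; the confusion-matrix counts here are hypothetical, chosen only to make the arithmetic concrete:

```python
# Worked example: computing the listed metrics from confusion-matrix counts.
tp, fp, fn, tn = 40, 10, 5, 45  # hypothetical counts

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.3f}")   # 0.850
print(f"Precision: {precision:.3f}")  # 0.800
print(f"Recall:    {recall:.3f}")     # 0.889
print(f"F1:        {f1:.3f}")         # 0.842
```

Note how the F1 score (0.842) sits between precision and recall, closer to the smaller of the two; the harmonic mean penalizes a model that trades one heavily for the other.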
Key Takeaways
- Sentiment analysis classifies text as positive, negative, or neutral.
- Lexicon-based methods use word sentiment dictionaries - simple but limited.
- ML approaches (TF-IDF + Logistic Regression) learn from labeled examples.
- BERT fine-tuning achieves state-of-the-art accuracy (~95%+) on sentiment benchmarks.
- Aspect-level sentiment analysis identifies sentiment toward specific product features.