MACHINE LEARNING · EMAIL ANALYSIS

Is it Spam or Ham?

Paste any email and let our trained classifiers decide in milliseconds.

EMAIL INPUT

Algorithm



Paste an email and click Analyze Email to begin

ALGORITHM COMPARISON

All 5 Models, One Email

See how each algorithm classifies the same input — spot disagreements and understand model behavior.

Enter an email above and click Run All Models

MODEL PERFORMANCE

Metrics Dashboard

Accuracy, Precision, Recall and F1-Score across all trained classifiers.

Total Samples: —
Spam Emails: —
Ham Emails: —
TF-IDF Features: 5,000
Train Size: —
Test Size: —

Algorithm	Accuracy	Precision	Recall	F1-Score	CV Mean	Performance

Visual Comparison

Accuracy Precision Recall F1-Score

Confusion Matrices

DOCUMENTATION

How It Works

Data Pipeline

1,210 labeled emails (605 spam / 605 ham) are preprocessed: URLs, phone numbers, and currency symbols are normalized, and the cleaned text is then vectorized into a TF-IDF matrix with 5,000 bigram features and sublinear term-frequency scaling.
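The normalization and scaling steps can be sketched in plain Python. This is an illustration only, not the app's code: the placeholder tokens (urltoken, phonetoken, currencytoken), the regular expressions, and the choice to emit unigrams alongside bigrams are all assumptions. Sublinear scaling replaces a raw term count tf with 1 + ln(tf).

```python
import math
import re
from collections import Counter

# Hypothetical patterns and placeholder tokens; the app's real ones are not documented.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
CURRENCY_RE = re.compile(r"[$€£¥]")


def normalize(text: str) -> str:
    """Lowercase the text and replace URLs, phone numbers, and currency symbols."""
    text = text.lower()
    text = URL_RE.sub(" urltoken ", text)
    text = PHONE_RE.sub(" phonetoken ", text)
    text = CURRENCY_RE.sub(" currencytoken ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def ngrams(text: str) -> list[str]:
    """Return the unigrams followed by the bigrams of whitespace-separated tokens."""
    tokens = text.split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def sublinear_tf(terms: list[str]) -> dict[str, float]:
    """Sublinear term-frequency scaling: a raw count tf becomes 1 + ln(tf)."""
    return {term: 1.0 + math.log(count) for term, count in Counter(terms).items()}
```

Replacing every URL or phone number with a single shared token lets the classifier learn that "contains a link" is a signal, instead of treating each distinct address as a separate rare feature.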

Naive Bayes

Multinomial Naive Bayes applies Bayes' theorem with additive smoothing (α=0.1). It estimates each word's conditional probability of appearing in spam vs. ham and treats words as independent given the class, which makes it extremely fast and often surprisingly accurate for text classification.
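To make the role of α concrete, here is a small from-scratch sketch. It uses raw word counts on whitespace tokens rather than the TF-IDF features described above, and the function names are hypothetical; only α=0.1 comes from the text.

```python
import math
from collections import Counter

ALPHA = 0.1  # smoothing factor quoted in the text
LABELS = ("spam", "ham")


def train(docs: list[tuple[str, str]]):
    """Count words per class from (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def log_posterior(text: str, label: str, model) -> float:
    """Unnormalized log P(label | text): log prior plus smoothed log likelihoods."""
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    denom = sum(word_counts[label].values()) + ALPHA * len(vocab)
    for word in text.lower().split():
        if word in vocab:  # words never seen in training are skipped
            score += math.log((word_counts[label][word] + ALPHA) / denom)
    return score


def classify(text: str, model) -> str:
    """Pick the label with the higher log posterior."""
    return max(LABELS, key=lambda label: log_posterior(text, label, model))
```

Without smoothing, a word that never appeared in one class would give that class a probability of zero. Adding α to every count keeps the estimate small but nonzero.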

Logistic Regression

A linear model that learns a weight for each TF-IDF feature. The sigmoid function maps the weighted sum to a probability between 0 and 1. Trained with L2 regularization (C=1.0) and up to 1,000 iterations for convergence.
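The scoring step can be sketched as follows. The weights and bias here are made-up values for illustration, not the trained model's, and the helper names are hypothetical. In scikit-learn, C is the inverse of the regularization strength, so a smaller C shrinks the weights more.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to the open interval (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def spam_probability(features: dict[str, float],
                     weights: dict[str, float],
                     bias: float) -> float:
    """Weighted sum of TF-IDF values plus bias, squashed by the sigmoid."""
    z = bias + sum(weights.get(term, 0.0) * value for term, value in features.items())
    return sigmoid(z)


# Illustrative weights only: positive pushes toward spam, negative toward ham.
EXAMPLE_WEIGHTS = {"free": 2.0, "meeting": -1.5}
EXAMPLE_BIAS = -0.5
```

An email is labeled spam when the probability exceeds 0.5, which is the same as the weighted sum being positive.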

Support Vector Machine

LinearSVC finds the maximum-margin hyperplane separating spam from ham in the high-dimensional TF-IDF feature space. Particularly effective when the number of features exceeds the number of samples.
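The geometry can be illustrated with a few helper functions. This sketch uses the standard hinge loss for readability; scikit-learn's LinearSVC defaults to the squared hinge, and the text does not say which one the app uses. All names below are hypothetical.

```python
import math


def decision_value(x: list[float], w: list[float], b: float) -> float:
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x: list[float], w: list[float], b: float) -> str:
    """Positive side is spam, negative side is ham (an arbitrary convention)."""
    return "spam" if decision_value(x, w, b) >= 0 else "ham"


def hinge_loss(x: list[float], y: int, w: list[float], b: float) -> float:
    """Zero once a point (y is +1 or -1) clears the margin, linear penalty otherwise."""
    return max(0.0, 1.0 - y * decision_value(x, w, b))


def margin_width(w: list[float]) -> float:
    """Distance between the two margin boundaries: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))
```

Training trades off a wide margin (small ||w||) against the total hinge loss over the training emails.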

Random Forest

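An ensemble of 100 decision trees, each trained on a bootstrap sample of the emails and restricted to a random subset of features at each split. Majority voting determines the final label. Combining many decorrelated trees reduces overfitting relative to a single tree, and the forest provides feature importance rankings as a by-product.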
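The sampling and voting mechanics of a typical random forest can be sketched without building real trees. The square-root rule for the feature subset is a common default and an assumption here, as are the helper names; only the figure of 100 trees comes from the text.

```python
import math
import random
from collections import Counter

N_TREES = 100  # ensemble size quoted in the text


def bootstrap_sample(n_rows: int, rng: random.Random) -> list[int]:
    """Draw n_rows row indices with replacement (one such sample per tree)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def random_feature_subset(n_features: int, rng: random.Random) -> list[int]:
    """Pick about sqrt(n_features) distinct feature indices to consider at a split."""
    k = max(1, math.isqrt(n_features))
    return sorted(rng.sample(range(n_features), k))


def majority_vote(votes: list[str]) -> str:
    """Return the most common label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]
```

With 5,000 features, the square-root rule would have each split look at about 70 candidates, which keeps individual trees different from one another.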

K-Nearest Neighbors

KNN (k=5) classifies an email by finding the 5 most similar training examples using cosine distance in TF-IDF space. There is no explicit training phase: the training set is simply stored, and all the work happens at inference time.
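A minimal sketch of the lookup, assuming each email is a sparse dictionary of term weights. The function names and the majority-vote tie handling are assumptions, not the app's code.

```python
import math
from collections import Counter

Vector = dict[str, float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(query: Vector, training: list[tuple[Vector, str]], k: int = 5) -> str:
    """Label the query by majority vote among its k nearest training vectors."""
    nearest = sorted(training, key=lambda item: cosine_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because every prediction compares the query against all stored emails, inference cost grows with the size of the training set, unlike the other four models.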

TECH STACK

Python 3.11 · Flask · scikit-learn · TF-IDF Vectorizer · Naive Bayes · Logistic Regression · LinearSVC · Random Forest · KNN · joblib · HTML/CSS/JS · Canvas API
