MACHINE LEARNING · EMAIL ANALYSIS

Is it Spam or Ham?

Paste any email and let our trained classifiers decide in milliseconds.


All 5 Models, One Email

See how each algorithm classifies the same input — spot disagreements and understand model behavior.

Metrics Dashboard

Accuracy, Precision, Recall and F1-Score across all trained classifiers.

Summary fields: Total Samples, Spam Emails, Ham Emails, TF-IDF Features (5,000), Train Size, Test Size
Table columns: Algorithm, Accuracy, Precision, Recall, F1-Score, CV Mean, Performance
Visual Comparison: Accuracy, Precision, Recall, F1-Score
Confusion Matrices
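
As a rough illustration of how the four dashboard metrics relate to a binary confusion matrix, here is a minimal sketch. The function name and the counts used in the usage note are hypothetical; they are not the app's code or its results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the dashboard metrics from confusion-matrix counts.

    tp: spam predicted as spam    fp: ham predicted as spam
    fn: spam predicted as ham     tn: ham predicted as ham
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Calling `classification_metrics(8, 2, 2, 8)` with these made-up counts gives the same value for all four metrics, because the errors are symmetric. With unbalanced errors, precision and recall diverge and F1 falls between them.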

How It Works

01

Data Pipeline

The 1,210 labeled emails (605 spam, 605 ham) are preprocessed first: URLs, phone numbers, and currency symbols are normalized to placeholder tokens. The cleaned text is then tokenized and vectorized into a TF-IDF matrix with 5,000 bigram features and sublinear term-frequency scaling.
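
The preprocessing described above can be sketched with the standard library alone. The regular expressions, the placeholder token names, and the choice to emit unigrams alongside bigrams are all assumptions for illustration; the app's actual patterns are not shown. The sublinear scaling uses the usual 1 + ln(tf) form.

```python
import math
import re
from collections import Counter

# Hypothetical patterns: deliberately loose, for illustration only.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
CURRENCY_RE = re.compile(r"[$€£¥]")

def normalize(text):
    """Lowercase and replace URLs, phone numbers, currency symbols."""
    text = text.lower()
    text = URL_RE.sub(" urltoken ", text)
    text = PHONE_RE.sub(" phonetoken ", text)
    text = CURRENCY_RE.sub(" currencytoken ", text)
    return text

def tokens_with_bigrams(text):
    """Return word tokens followed by adjacent-word bigrams."""
    words = re.findall(r"[a-z0-9]+", normalize(text))
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams

def sublinear_tf(tokens):
    """Map each term to 1 + ln(count), damping repeated terms."""
    counts = Counter(tokens)
    return {term: 1.0 + math.log(n) for term, n in counts.items()}
```

For example, `tokens_with_bigrams("Win $1000 at http://spam.example now")` replaces the dollar sign and the link with placeholder tokens before building bigrams, so every link looks the same to the classifier regardless of its address.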

02

Naive Bayes

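Multinomial Naive Bayes applies Bayes' theorem under the assumption that words occur independently given the class. Additive smoothing (α=0.1) keeps words unseen in one class from producing a zero probability. The model estimates each word's conditional probability of appearing in spam vs. ham, which makes it very fast and often surprisingly accurate for text classification.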
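
To make the smoothing concrete, here is a small from-scratch sketch of multinomial Naive Bayes with α=0.1. It works on raw token counts rather than TF-IDF weights to stay short, and the toy emails are invented for illustration; none of this is the app's code.

```python
import math
from collections import Counter

ALPHA = 0.1  # smoothing factor quoted in the text

# Made-up toy data, purely illustrative.
TOY_DOCS = [["free", "prize", "now"], ["win", "free", "cash"],
            ["meeting", "at", "noon"], ["lunch", "meeting", "tomorrow"]]
TOY_LABELS = ["spam", "spam", "ham", "ham"]

def train_nb(docs, labels):
    """Estimate log priors and smoothed log likelihoods per class."""
    vocab = {t for doc in docs for t in doc}
    class_counts = Counter(labels)
    token_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    model = {}
    for c in class_counts:
        total = sum(token_counts[c].values())
        denom = total + ALPHA * len(vocab)
        model[c] = {
            "log_prior": math.log(class_counts[c] / len(docs)),
            "log_lik": {t: math.log((token_counts[c][t] + ALPHA) / denom)
                        for t in vocab},
        }
    return model

def predict_nb(model, doc):
    """Pick the class with the highest log posterior score."""
    def score(c):
        params = model[c]
        # Tokens outside the training vocabulary are ignored.
        return params["log_prior"] + sum(
            params["log_lik"][t] for t in doc if t in params["log_lik"])
    return max(model, key=score)
```

With `model = train_nb(TOY_DOCS, TOY_LABELS)`, a word such as "free" never appears in the ham emails, yet smoothing still gives it a small nonzero ham probability, so one unfamiliar word cannot veto a class outright.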

03

Logistic Regression

A linear model that learns a weight for each TF-IDF feature. The sigmoid function maps the weighted sum to a probability between 0 and 1. Trained with L2 regularization (C=1.0) and up to 1,000 iterations for convergence.

04

Support Vector Machine

LinearSVC finds the maximum-margin hyperplane separating spam from ham in the high-dimensional TF-IDF feature space. Linear SVMs tend to work well when the number of features exceeds the number of samples, as it does here (5,000 features vs. 1,210 emails).
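
Once a hyperplane has been learned, classification reduces to checking which side of it an email falls on. The sketch below shows that decision rule, the margin width 2/‖w‖, and the standard hinge loss (LinearSVC itself defaults to the squared variant). The weight vector and bias are assumed inputs, and the "spam is the positive side" convention is an illustrative choice.

```python
import math

def decision_value(x, w, b):
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x, w, b):
    """Assumed convention: non-negative scores are labeled spam."""
    return "spam" if decision_value(x, w, b) >= 0 else "ham"

def margin_width(w):
    """Distance between the two margin boundaries, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def hinge_loss(x, y, w, b):
    """y is +1 for spam and -1 for ham.

    The loss is zero only for points on the correct side of the margin.
    """
    return max(0.0, 1.0 - y * decision_value(x, w, b))
```

Maximizing the margin is equivalent to keeping ‖w‖ small while the hinge loss penalizes points that fall inside the margin or on the wrong side.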

05

Random Forest

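An ensemble of 100 decision trees, each grown on a bootstrap sample of the training emails with a random subset of features considered at each split. The trees' votes are combined to determine the final label. Averaging over many trees makes the model less prone to overfitting than a single tree, and it yields feature importance rankings as a by-product.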
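
The two sources of randomness and the vote can be sketched without building actual trees. The helper names are hypothetical, and the square-root-sized feature subset is an assumed (though common) default. Note that scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities rather than counting hard votes; the hard vote below follows the wording of the text.

```python
import random
from collections import Counter

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement for one tree."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def random_feature_subset(n_features, rng):
    """Pick about sqrt(n_features) distinct feature indices."""
    k = max(1, int(n_features ** 0.5))
    return rng.sample(range(n_features), k)

def majority_vote(votes):
    """Return the label predicted by the most trees."""
    return Counter(votes).most_common(1)[0][0]
```

With 5,000 features, `random_feature_subset(5000, random.Random(0))` returns 70 indices, so each split looks at only a small slice of the vocabulary. That decorrelates the trees and is what makes combining them worthwhile.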

06

K-Nearest Neighbors

KNN (k=5) classifies an email by finding the 5 most similar training examples using cosine distance in TF-IDF space. No explicit training phase — classification is purely instance-based at inference time.
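
A minimal sketch of the lookup, assuming each email is a sparse dict of term weights. The training data passed in is whatever labeled vectors are available; the function names are illustrative and not taken from the app.

```python
import math
from collections import Counter

def cosine_distance(a, b):
    """1 - cosine similarity for sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)

def knn_predict(query, training, k=5):
    """training is a list of (vector, label) pairs.

    All work happens here at query time: sort by distance, keep the
    k closest examples, and return their most common label.
    """
    neighbors = sorted(
        training, key=lambda item: cosine_distance(query, item[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]
```

Because every query is compared against all stored examples, prediction cost grows with the size of the training set, which is the trade-off for skipping a training phase.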

TECH STACK
Python 3.11, Flask, scikit-learn, TF-IDF Vectorizer, Naive Bayes, Logistic Regression, LinearSVC, Random Forest, KNN, joblib, HTML/CSS/JS, Canvas API