Is it Spam or Ham?
Paste any email and let our trained classifiers decide in milliseconds.
Paste an email and click Analyze Email to begin.
All 5 Models, One Email
See how each algorithm classifies the same input — spot disagreements and understand model behavior.
Metrics Dashboard
Accuracy, Precision, Recall and F1-Score across all trained classifiers.
| Algorithm | Accuracy | Precision | Recall | F1-Score | CV Mean | Performance |
|---|---|---|---|---|---|---|
| Loading metrics… | | | | | | |
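As a rough illustration of how the Accuracy, Precision, Recall and F1-Score columns relate to each other, here is a minimal sketch that computes them from true and predicted labels. The function name `classification_metrics` and the toy labels are assumptions made for this example, not values from the dashboard. CV Mean is left out because the page does not describe how cross-validation is set up.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # spam caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # ham flagged as spam
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # spam missed
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels, for illustration only
y_true = ["spam", "spam", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham"]
metrics = classification_metrics(y_true, y_pred)
```

Precision answers "of the emails flagged as spam, how many really were?", while recall answers "of the real spam, how much was caught?". F1 is their harmonic mean.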
How It Works
Data Pipeline
1,210 labeled emails (605 spam / 605 ham) are preprocessed: URLs, phone numbers, and currency symbols are normalized to placeholder tokens, and the text is then tokenized and vectorized into a TF-IDF matrix of 5,000 bigram features with sublinear term-frequency scaling.
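The normalization and scaling steps could look roughly like the sketch below. The regular expressions, the placeholder tokens (`urltoken`, `phonetoken`, `currencytoken`), and the choice to emit single words alongside bigrams are all assumptions made for illustration; the page does not specify them. IDF weighting and the 5,000-feature cap are omitted to keep the example short.

```python
import math
import re
from collections import Counter

# Hypothetical patterns and placeholder tokens (not taken from the page).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
PHONE_RE = re.compile(r"\+?\d[\d\-\s().]{6,}\d")
CURRENCY_RE = re.compile(r"[$€£¥]")


def normalize(text):
    """Lowercase the text and replace URLs, phone numbers and currency symbols."""
    text = text.lower()
    text = URL_RE.sub(" urltoken ", text)
    text = PHONE_RE.sub(" phonetoken ", text)
    text = CURRENCY_RE.sub(" currencytoken ", text)
    return text


def tokens_with_bigrams(text):
    """Return single words followed by adjacent word pairs (bigrams)."""
    words = re.findall(r"[a-z0-9']+", normalize(text))
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def sublinear_tf(tokens):
    """Sublinear term frequency: replace a raw count tf with 1 + ln(tf)."""
    return {term: 1.0 + math.log(count) for term, count in Counter(tokens).items()}


sample = "WIN $1000 now! Visit http://example.com or call 555-123-4567"
tokens = tokens_with_bigrams(sample)
weights = sublinear_tf(["free", "free", "win"])
```

Sublinear scaling dampens repeated words: a term that appears twice gets a weight of about 1.69 rather than 2, so an email that repeats "free" twenty times does not dominate the feature space.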
Naive Bayes
Multinomial Naive Bayes applies Bayes' theorem with additive smoothing (α=0.1), which keeps a word never seen in one class from forcing that class's probability to zero. It models each word's conditional probability of appearing in spam vs. ham, making it extremely fast and often surprisingly accurate for text classification.
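To make the role of α concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes on raw word counts. It is an illustration under simplifying assumptions (tiny made-up corpus, plain counts instead of TF-IDF values, hypothetical function names `train_nb` and `predict_nb`), not the classifier the page actually runs.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=0.1):
    """Estimate log priors and smoothed log likelihoods per class."""
    classes = sorted(set(labels))
    vocab = sorted({word for doc in docs for word in doc})
    log_prior, log_likelihood = {}, {}
    for c in classes:
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(word for doc in class_docs for word in doc)
        # Additive smoothing: every vocabulary word gets alpha extra "pseudo-counts".
        total = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_likelihood


def predict_nb(model, doc):
    """Pick the class with the highest posterior log score; unknown words are skipped."""
    log_prior, log_likelihood = model
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][w] for w in doc if w in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Made-up training data, for illustration only
docs = [
    ["free", "prize", "now"],
    ["win", "free", "cash"],
    ["meeting", "at", "noon"],
    ["lunch", "meeting", "tomorrow"],
]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels, alpha=0.1)
```

Without smoothing, the word "cash" would have probability zero under ham and a single occurrence would rule ham out entirely; with α=0.1 it only counts as strong evidence.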
Logistic Regression
A linear model that learns a weight for each TF-IDF feature. The sigmoid function maps the weighted sum to a probability between 0 and 1. Trained with L2 regularization (inverse regularization strength C=1.0) and a cap of 1,000 solver iterations to allow convergence.
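The prediction step can be sketched in a few lines. The weights and bias below are invented for illustration; a real model would learn them from the training emails.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def spam_probability(features, weights, bias):
    """Weighted sum of feature values, squashed to a probability by the sigmoid."""
    z = bias + sum(weights.get(term, 0.0) * value for term, value in features.items())
    return sigmoid(z)


# Hypothetical learned parameters (assumed for this example)
weights = {"free": 1.8, "prize": 2.1, "meeting": -1.5}
bias = -0.5

p_spammy = spam_probability({"free": 1.0, "prize": 1.0}, weights, bias)
p_hammy = spam_probability({"meeting": 1.0}, weights, bias)
```

Positive weights push an email toward spam and negative weights toward ham; a probability above 0.5 corresponds to a positive weighted sum.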
Support Vector Machine
LinearSVC finds the maximum-margin hyperplane separating spam from ham in the high-dimensional TF-IDF feature space. Particularly effective when the number of features exceeds the number of samples.
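The margin idea can be illustrated with a small linear SVM trained by subgradient descent on the hinge loss. This is a simplified sketch, not LinearSVC's actual solver, and the learning rate, regularization constant `lam`, and toy two-feature data are assumptions made for the example.

```python
def decision(w, b, x):
    """Signed score: positive means spam (+1), negative means ham (-1)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(w, b, x, y):
    """Zero when the point is on the correct side with margin >= 1, linear otherwise."""
    return max(0.0, 1.0 - y * decision(w, b, x))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + hinge loss with per-sample subgradient steps."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violates = yi * decision(w, b, xi) < 1.0  # inside the margin or misclassified
            for j in range(len(w)):
                grad = lam * w[j] - (yi * xi[j] if violates else 0.0)
                w[j] -= lr * grad
            if violates:
                b += lr * yi
    return w, b


# Made-up, linearly separable data: feature 0 is "spammy", feature 1 is "hammy"
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```

Only points that violate the margin contribute to the update, which is why the final hyperplane is determined by the emails closest to the decision boundary.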
Random Forest
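An ensemble of 100 decision trees, each trained on a bootstrap sample of the emails and restricted to a random subset of features at each split. The trees' votes are aggregated to determine the final label. Less prone to overfitting than a single decision tree, and provides feature importance rankings as a by-product.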
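The ensemble mechanics can be sketched with one-split trees (decision stumps) standing in for full decision trees. This is a deliberate simplification: the stump learner, the binary toy features, and the names `fit_stump`, `fit_forest`, and `predict_forest` are assumptions for illustration, and real implementations may average class probabilities rather than count hard votes.

```python
import random
from collections import Counter


def fit_stump(X, y, rows, features):
    """Pick the single-feature rule with the fewest errors on the sampled rows."""
    best = None  # (errors, feature index, label if present, label if absent)
    for f in features:
        for present, absent in (("spam", "ham"), ("ham", "spam")):
            errors = sum((present if X[i][f] > 0 else absent) != y[i] for i in rows)
            if best is None or errors < best[0]:
                best = (errors, f, present, absent)
    return best[1:]


def fit_forest(X, y, n_trees=100, n_sub_features=2, seed=0):
    """Train each stump on a bootstrap sample using a random subset of features."""
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]     # sample with replacement
        features = rng.sample(range(n_features), n_sub_features)  # random feature subset
        forest.append(fit_stump(X, y, rows, features))
    return forest


def predict_forest(forest, x):
    """Each stump votes; the most common label wins."""
    votes = [present if x[f] > 0 else absent for f, present, absent in forest]
    return Counter(votes).most_common(1)[0][0]


# Made-up binary features: columns 0-1 mark spammy words, columns 2-3 hammy words
X = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
y = ["spam", "spam", "ham", "ham"]
forest = fit_forest(X, y)
```

Because every tree sees a different sample and a different slice of the features, their individual mistakes tend not to line up, and the vote smooths them out.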
K-Nearest Neighbors
KNN (k=5) classifies an email by finding the 5 most similar training examples using cosine distance in TF-IDF space and taking the majority label among them. There is no explicit training phase: the model simply stores the training set, and classification is purely instance-based at inference time.
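A minimal sketch of this lookup, using sparse dictionaries as stand-ins for TF-IDF vectors. The vectors, labels, and function names are made up for illustration.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train_vectors, train_labels, query, k=5):
    """Rank stored examples by distance to the query and take a majority vote."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine_distance(query, pair[0]),
    )
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up "training set": storing it is all the training KNN does
train_vectors = [
    {"free": 1.0, "prize": 1.0},
    {"free": 1.0, "cash": 1.0},
    {"prize": 1.0, "win": 1.0},
    {"meeting": 1.0, "noon": 1.0},
    {"lunch": 1.0},
    {"report": 1.0, "meeting": 1.0},
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

label = knn_predict(train_vectors, train_labels, {"free": 1.0, "prize": 1.0}, k=5)
```

Cosine distance compares the direction of two vectors rather than their length, so a long email and a short one with the same vocabulary mix count as close neighbors. The cost is that every prediction scans the stored examples.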