SAGE UNIVERSITY INDORE
Hybrid Multi-Layer
Phishing Defense Engine
Advanced AI-Powered Cybersecurity Detection System
Submitted By
Anurag
B.Tech AI
Submitted To
Dr. Dilip Solanki
Faculty Supervisor
01 — Overview
Introduction
The Threat

Phishing attacks steal sensitive data — passwords, financial info, and personal credentials — using fake websites and deceptive emails that mimic legitimate services.

Our Solution

This project builds a multi-layer detection engine that analyzes URLs and emails, assigns a risk score, and provides clear explanations for every decision.

3.4B
Phishing emails/day globally
36%
Of breaches involve phishing
$4.9M
Avg. cost per breach
02 — Challenge
Problem Statement
Low Accuracy in Legacy Systems
Traditional rule-based and single-layer detection systems fail against modern, sophisticated phishing attacks that constantly evolve.
No Proper Explanation
Existing tools give binary safe/unsafe verdicts without explaining why — leaving users unable to understand or learn from the detection.
Need for Better Detection + Explanation
A hybrid, multi-layer approach combining ML models with explainable AI is required to achieve high accuracy and user trust.
03 — Goals
Objectives
Detect Phishing

Accurately identify phishing URLs and emails with high precision and minimal false positives.

Multi-Layer Approach

Combine URL analysis, content inspection, email headers, and visual similarity checks.

Provide Risk Score

Generate a quantified risk score (0–100) for every analyzed URL or email input.

User Understanding

Use Explainable AI (XAI) to show users exactly why a URL or email was flagged as phishing.

04 — Pipeline Step 1
Input & Collection
INPUT TYPES
URL Input
Direct URL string analysis
Email Input
Full email content + headers
METHODS
Python Input Handling
Robust parsing & validation pipeline
API / Web Scraping
Real-time data collection from live URLs
Input layer validates and normalizes all data before passing to the feature extraction pipeline.
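
The validation and normalization step for URL input could look like the sketch below. It is a minimal illustration, not the project's code: the function name `normalize_url` and the choice to assume `http` for bare hosts are assumptions, and only Python's standard `urllib.parse` module is used.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(raw: str) -> str:
    """Validate a raw URL string and return a normalized form.

    Raises ValueError when the input is not a usable http(s) URL.
    """
    candidate = raw.strip()
    if not candidate:
        raise ValueError("empty input")
    # Assumption: a bare host such as "example.com/login" is treated as http.
    if "://" not in candidate:
        candidate = "http://" + candidate
    parts = urlsplit(candidate)
    scheme = parts.scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if not parts.hostname:
        raise ValueError("missing host")
    # Lowercase only the host part; userinfo and port are kept unchanged
    # so that later layers can still inspect them.
    netloc = parts.netloc
    host_start = netloc.rfind("@") + 1
    netloc = netloc[:host_start] + netloc[host_start:].lower()
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))
```

For instance, `normalize_url("  HTTP://Example.COM/Login ")` returns `http://example.com/Login`, while an `ftp://` input raises `ValueError` before it reaches feature extraction.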
05 — Pipeline Step 2
Feature Extraction
EXTRACTED FEATURES
URL Length · Special Symbols · Domain Age · Subdomains · HTTPS Status · IP Address
METHODS
Lexical Analysis
Pattern-based URL structure parsing
WHOIS Lookup
Domain registration & age data
BeautifulSoup + Regex
HTML parsing & pattern matching
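
Most of the lexical features listed above can be computed from the URL string alone. The sketch below is one possible way to do it with the standard library. The feature names, the set of "special symbols", and the naive subdomain count (every label left of the last two, which miscounts suffixes such as `co.uk`) are assumptions. Domain age is left out because it needs a WHOIS lookup against a live registry.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Assumed set of "special symbols"; the deck does not enumerate them.
SPECIAL_SYMBOLS = re.compile(r"[@\-_%=&?~]")


def host_is_ip(host: str) -> bool:
    """Return True when the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict:
    """Extract string-level features from a URL without any network access."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    is_ip = host_is_ip(host)
    labels = [] if is_ip else host.split(".")
    return {
        "url_length": len(url),
        "special_symbols": len(SPECIAL_SYMBOLS.findall(url)),
        # Naive count: every label left of the registered domain and TLD.
        "subdomains": max(len(labels) - 2, 0),
        "uses_https": parts.scheme == "https",
        "host_is_ip": is_ip,
    }
```

A URL such as `https://secure.login.example.com/verify?id=1@x` yields two subdomains and three special symbols (`?`, `=`, `@`) under these definitions.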
06 — Pipeline Step 3
Multi-Layer Analysis
URL Analysis
Rule-based + Blacklist matching
Rule-based Blacklist
Content Analysis
HTML/JS inspection + NLP
HTML/JS NLP
Email Analysis
Header inspection + SPF/DKIM
Header SPF/DKIM
Visual Analysis
CNN + Image comparison
CNN Image Compare
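
For the Email Analysis layer, the SPF/DKIM check can be sketched by reading the `Authentication-Results` header that a receiving mail server adds. This is a simplified illustration: the function names are hypothetical, the regular expression only captures the `spf=` and `dkim=` verdicts, and it assumes the header was written by a trusted receiving server rather than forged by the sender.

```python
import re
from email import message_from_string

# Matches fragments such as "spf=pass" or "dkim=fail" inside the header value.
RESULT_PATTERN = re.compile(r"\b(spf|dkim)\s*=\s*([a-z]+)", re.IGNORECASE)


def auth_results(raw_email: str) -> dict:
    """Return the SPF and DKIM verdicts found in Authentication-Results."""
    msg = message_from_string(raw_email)
    results = {"spf": "none", "dkim": "none"}
    # If several headers are present, later ones overwrite earlier ones.
    for header in msg.get_all("Authentication-Results", []):
        for method, verdict in RESULT_PATTERN.findall(header):
            results[method.lower()] = verdict.lower()
    return results


def email_layer_flags(raw_email: str) -> list:
    """List every authentication check that did not pass."""
    results = auth_results(raw_email)
    return [f"{method} {verdict}" for method, verdict in results.items()
            if verdict != "pass"]
```

A message whose header reads `spf=pass ...; dkim=fail ...` produces the single flag `dkim fail`, which the layer could then turn into a score for the decision engine.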
07 — Pipeline Step 4
ML Model Training
Supervised Learning Approach
Trained on labeled datasets of phishing and legitimate URLs/emails using linear, kernel-based, and tree-ensemble classifiers to maximize accuracy.
Logistic Regression
Linear Classifier
Random Forest
Ensemble Trees
SVM
Support Vector Machine
XGBoost
Gradient Boosting
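
To make the supervised fit-and-predict loop concrete, the sketch below trains the simplest of the four models, logistic regression, with batch gradient descent in plain Python. A real pipeline would more likely use a library implementation. The learning rate, epoch count, and function names are assumptions, and any rows passed in are expected to be numeric feature vectors with labels 1 (phishing) or 0 (legitimate).

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, row) -> float:
    """Probability that a feature row belongs to the phishing class."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, row)))


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            err = predict_proba(weights, bias, row) - label
            for j, v in enumerate(row):
                grad_w[j] += err * v
            grad_b += err
        # Average the gradient over the batch before stepping.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

Calling `train_logistic_regression(X, y)` on a labeled feature matrix returns a weight per feature plus a bias; `predict_proba` then gives a phishing probability that the decision engine can consume as one layer score.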
08 — Pipeline Step 5
Hybrid Decision Engine
HOW IT WORKS
Combine All Layers

Aggregates outputs from URL, content, email, and visual analysis layers into a unified decision framework.

Voting System
Majority vote across all detection layers
Weighted Scoring
Confidence-weighted aggregation
DECISION FLOW
Layer 1: URL Score
Layer 2: Content Score
Layer 3: Email Score
Layer 4: Visual Score
⚡ Final Hybrid Decision
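
The two aggregation strategies named above, majority voting and confidence-weighted scoring, can be sketched as follows. This is an illustrative version only: the 0.5 vote threshold, the layer names used as dictionary keys, and the convention that each layer reports a score between 0 and 1 are assumptions.

```python
def majority_vote(layer_scores: dict, threshold: float = 0.5) -> bool:
    """Return True when a strict majority of layers votes 'phishing'."""
    votes = [score >= threshold for score in layer_scores.values()]
    return sum(votes) * 2 > len(votes)


def weighted_score(layer_scores: dict, confidences: dict) -> float:
    """Confidence-weighted average of the layer scores, in the range 0 to 1."""
    total = sum(confidences[name] for name in layer_scores)
    if total == 0:
        raise ValueError("at least one layer needs a non-zero confidence")
    weighted = sum(layer_scores[name] * confidences[name] for name in layer_scores)
    return weighted / total
```

With scores `{"url": 0.9, "content": 0.6, "email": 0.2, "visual": 0.4}` the vote is tied two against two, so `majority_vote` returns `False`. With confidences `{"url": 0.8, "content": 0.5, "email": 0.2, "visual": 0.5}` the weighted score is 0.63, which would map to a risk score of 63. Cases like this are one reason to combine both strategies rather than rely on a vote alone.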
09 — Pipeline Step 6
Explainable AI (XAI)

XAI makes the model's decisions transparent and interpretable — users can see exactly which features triggered the phishing alert.

SHAP
SHapley Additive exPlanations — assigns contribution values to each feature
LIME
Local Interpretable Model-agnostic Explanations — local approximations
Feature Importance
Feature ranking by contribution: identifies the top contributing signals
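
The idea behind SHAP can be shown by computing exact Shapley values for a small scoring function. The sketch below does not use the `shap` library. It applies the underlying definition directly, averaging each feature's marginal contribution over every ordering of the features, which is only practical for a handful of features. The function name and the use of a fixed baseline for "absent" features are assumptions.

```python
from itertools import permutations


def shapley_values(score_fn, instance: dict, baseline: dict) -> dict:
    """Exact Shapley contribution of each feature to score_fn(instance).

    Features not yet added to a coalition take their baseline value.
    Cost grows as n! in the number of features.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = score_fn(current)
        for name in order:
            # Switch this feature from its baseline to its actual value
            # and record how much the score moved.
            current[name] = instance[name]
            value = score_fn(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}
```

The contributions always sum to the difference between the score of the instance and the score of the baseline, which is what lets an explanation attribute a risk score to individual features.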
10 — Pipeline Step 7
System Output
SAFE / PHISHING Verdict
Binary classification result with confidence percentage
✗ PHISHING
Risk Score (0–100)
Safe · Moderate · High Risk
72
Explanation
Top features: suspicious domain age (+34%), @ symbol in URL (+28%), mismatched SSL (+18%)
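
The three outputs (verdict, risk score, explanation) could be assembled into one record as in the sketch below. The band boundaries (40 and 70), the verdict threshold (50), and the function names are assumptions chosen for illustration; the deck does not state them.

```python
def risk_band(score: int) -> str:
    """Map a 0-100 risk score to a band (boundaries are assumed)."""
    if not 0 <= score <= 100:
        raise ValueError("risk score must be between 0 and 100")
    if score < 40:
        return "Safe"
    if score < 70:
        return "Moderate"
    return "High Risk"


def build_report(score: int, contributions: dict,
                 top_n: int = 3, verdict_threshold: int = 50) -> dict:
    """Combine verdict, score, band, and the top contributing features."""
    ranked = sorted(contributions.items(),
                    key=lambda item: abs(item[1]), reverse=True)[:top_n]
    return {
        "verdict": "PHISHING" if score >= verdict_threshold else "SAFE",
        "risk_score": score,
        "band": risk_band(score),
        # Contributions are fractions, rendered as signed percentages.
        "explanation": [f"{name} ({value:+.0%})" for name, value in ranked],
    }
```

Given a score of 72 and contributions of 0.34, 0.28, and 0.18 for the three features in the example above, the report carries the band `High Risk` and an explanation list starting with `suspicious domain age (+34%)`.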
11 — Innovation
URL DNA Fingerprinting
THE CONCEPT
Create a "DNA" of URLs

Break each URL into structural patterns — protocol, subdomain, domain, path, parameters — and encode them as a unique fingerprint signature.

EXAMPLE DNA STRAND
HTTPS · SUB×3 · DOM-NEW · PATH-DEEP · PARAM-@
Patent Opportunity

"A structural fingerprinting method for detecting phishing URLs by encoding URL components into a comparable DNA-like signature pattern."

Compare against phishing DNA signatures database
Detect structural similarity to known phishing patterns
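
One way the fingerprinting and comparison steps might work is sketched below. The token vocabulary beyond the example strand (`DOM-OLD`, `PATH-SHALLOW`), the depth threshold of three path segments, the naive subdomain count, and the use of Jaccard similarity as the comparison measure are all assumptions; the deck does not specify them. Domain age is passed in as a flag because it requires a separate WHOIS lookup.

```python
from urllib.parse import urlsplit


def url_dna(url: str, domain_is_new: bool = False) -> list:
    """Encode a URL's structure as an ordered list of fingerprint tokens."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    tokens = [parts.scheme.upper()]
    sub_count = max(len(labels) - 2, 0)  # naive: labels left of the last two
    if sub_count:
        tokens.append(f"SUB×{sub_count}")
    tokens.append("DOM-NEW" if domain_is_new else "DOM-OLD")
    depth = len([segment for segment in parts.path.split("/") if segment])
    if depth >= 3:
        tokens.append("PATH-DEEP")
    elif depth:
        tokens.append("PATH-SHALLOW")
    if "@" in parts.query:
        tokens.append("PARAM-@")
    return tokens


def dna_similarity(a: list, b: list) -> float:
    """Jaccard similarity between two fingerprints (1.0 means identical sets)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```

Under these assumptions, `" · ".join(url_dna("https://a.b.c.example.com/x/y/z/login?user=me@example.org", domain_is_new=True))` reproduces the example strand `HTTPS · SUB×3 · DOM-NEW · PATH-DEEP · PARAM-@`. A new URL would be flagged when its similarity to any stored phishing fingerprint exceeds a chosen threshold.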
12 — Summary
Conclusion
Hybrid System Improves Detection
The multi-layer approach, combining URL, content, email, and visual analysis, is designed to achieve higher accuracy than single-layer systems.
Provides Clear Explanations
SHAP, LIME, and Feature Importance make every decision transparent — building user trust and enabling learning.
Novel URL DNA Fingerprinting
A unique structural fingerprinting innovation with patent potential for next-generation phishing detection.
🛡️ Smarter Detection · Clearer Explanations · Safer Internet