Hybrid Multi-Layer Phishing Defense Engine

SAGE UNIVERSITY INDORE

Advanced AI-Powered Cybersecurity Detection System

Submitted By

Anurag

B.Tech AI

Submitted To

Dr. Dilip Solanki

Faculty Supervisor

01 — Overview

Introduction

The Threat

Phishing attacks steal sensitive data — passwords, financial info, and personal credentials — using fake websites and deceptive emails that mimic legitimate services.

Our Solution

This project builds a multi-layer detection engine that analyzes URLs and emails, assigns a risk score, and provides clear explanations for every decision.

3.4B

Phishing emails/day globally

36%

Of breaches involve phishing

$4.9M

Avg. cost per breach

02 — Challenge

Problem Statement

Low Accuracy in Legacy Systems

Traditional rule-based and single-layer detection systems fail against modern, sophisticated phishing attacks that constantly evolve.

No Proper Explanation

Existing tools give binary safe/unsafe verdicts without explaining why — leaving users unable to understand or learn from the detection.

Need for Better Detection + Explanation

A hybrid, multi-layer approach combining ML models with explainable AI is required to achieve high accuracy and user trust.

03 — Goals

Objectives

Detect Phishing

Accurately identify phishing URLs and emails with high precision and minimal false positives.

Multi-Layer Approach

Combine URL analysis, content inspection, email headers, and visual similarity checks.

Provide Risk Score

Generate a quantified risk score (0–100) for every analyzed URL or email input.

User Understanding

Use Explainable AI (XAI) to show users exactly why a URL or email was flagged as phishing.

04 — Pipeline Step 1

Input & Collection

INPUT TYPES

URL Input

Direct URL string analysis

Email Input

Full email content + headers

METHODS

Python Input Handling

Robust parsing & validation pipeline

API / Web Scraping

Real-time data collection from live URLs

Input layer validates and normalizes all data before passing to the feature extraction pipeline.
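As a rough illustration of the validate-and-normalize step for URL input, the sketch below uses only Python's standard library. The helper name `normalize_url`, the default `http` scheme for scheme-less input, and the decision to drop fragments are assumptions made for this example, not details taken from the project.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(raw: str) -> str:
    """Validate and normalize a URL string (hypothetical helper)."""
    candidate = raw.strip()
    if not candidate:
        raise ValueError("empty input")
    # Assumption: input without a scheme is treated as plain http.
    if "://" not in candidate:
        candidate = "http://" + candidate
    parts = urlsplit(candidate)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing host")
    # Lowercase the network location (assumes no case-sensitive userinfo);
    # path and query are kept as-is, the fragment is dropped.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


if __name__ == "__main__":
    print(normalize_url("  HTTP://Example.COM/Login?x=1#frag "))
```

Rejecting malformed input at this stage keeps the later feature-extraction code free of defensive checks.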

05 — Pipeline Step 2

Feature Extraction

EXTRACTED FEATURES

URL Length · Special Symbols · Domain Age · Subdomains · HTTPS Status · IP Address

METHODS

Lexical Analysis

Pattern-based URL structure parsing

WHOIS Lookup

Domain registration & age data

BeautifulSoup + Regex

HTML parsing & pattern matching
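The lexical part of feature extraction could look roughly like the sketch below. The function name `lexical_features`, the chosen set of "special" characters, and the naive subdomain count are assumptions for illustration. Domain age is left out because it needs a live WHOIS lookup.

```python
import ipaddress
import re
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Extract simple lexical features from a URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    labels = host.split(".") if host and not host_is_ip else []
    return {
        "url_length": len(url),
        # Assumed symbol set; the source does not list which symbols count.
        "special_symbols": len(re.findall(r"[@\-_%=&?]", url)),
        # Naive count: labels beyond the last two (ignores suffixes like co.uk).
        "subdomains": max(len(labels) - 2, 0),
        "uses_https": parts.scheme == "https",
        "host_is_ip": host_is_ip,
    }


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login@secure?id=1"))
```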

06 — Pipeline Step 3

Multi-Layer Analysis

URL Analysis

Rule-based + Blacklist matching

Rule-based Blacklist

Content Analysis

HTML/JS inspection + NLP

HTML/JS NLP

Email Analysis

Header inspection + SPF/DKIM

Header SPF/DKIM

Visual Analysis

CNN + Image comparison

CNN Image Compare
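For the email layer, one lightweight way to inspect SPF and DKIM outcomes is to read the `Authentication-Results` header that a receiving mail server has already recorded. The sketch below does only that and does not perform SPF or DKIM verification itself. The function name `auth_results` and the sample message are made up for illustration.

```python
import re
from email import message_from_string

# Made-up sample message used only for this illustration.
SAMPLE = (
    "Authentication-Results: mx.example.org; spf=fail smtp.mailfrom=bank.example; dkim=none\n"
    "From: Support <support@bank.example>\n"
    "Subject: Verify your account\n"
    "\n"
    "Click the link below.\n"
)


def auth_results(raw_email: str) -> dict:
    """Read the recorded SPF/DKIM results from an email's headers."""
    msg = message_from_string(raw_email)
    header = " ".join(msg.get_all("Authentication-Results", []))
    results = {}
    for mechanism in ("spf", "dkim"):
        match = re.search(rf"\b{mechanism}=(\w+)", header, re.IGNORECASE)
        # "missing" means the header carried no result for this mechanism.
        results[mechanism] = match.group(1).lower() if match else "missing"
    return results


if __name__ == "__main__":
    print(auth_results(SAMPLE))
```

A failed or missing result is a signal to weigh with the other layers rather than proof of phishing, since legitimate senders are sometimes misconfigured.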

07 — Pipeline Step 4

ML Model Training

Supervised Learning Approach

Trained on labeled datasets of phishing and legitimate URLs/emails, comparing linear, kernel-based, and ensemble classifiers.

Logistic Regression

Linear Classifier

Random Forest

Ensemble Trees

SVM

Support Vector Machine

XGBoost

Gradient Boosting
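To make the supervised-learning step concrete, here is a dependency-free sketch of the simplest listed model, logistic regression, trained by batch gradient descent. The toy feature rows (`has_at_symbol`, `host_is_ip`, `uses_https`) and labels are invented for illustration and are not project data. A real pipeline would more likely use an established library and a proper train/test split.

```python
import math

# Invented toy data: [has_at_symbol, host_is_ip, uses_https]; 1 = phishing.
TOY_X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1],
         [0, 0, 1], [0, 0, 1], [0, 0, 0], [0, 0, 1]]
TOY_Y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x) -> float:
    """Probability that feature vector x is phishing."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


if __name__ == "__main__":
    w, b = train_logistic(TOY_X, TOY_Y)
    print(round(predict_proba(w, b, [1, 1, 0]), 3))
```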

08 — Pipeline Step 5

Hybrid Decision Engine

HOW IT WORKS

Combine All Layers

Aggregates outputs from URL, content, email, and visual analysis layers into a unified decision framework.

Voting System

Majority vote across all detection layers

Weighted Scoring

Confidence-weighted aggregation

DECISION FLOW

Layer 1: URL Score

Layer 2: Content Score

Layer 3: Email Score

Layer 4: Visual Score

⚡ Final Hybrid Decision
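The two aggregation strategies named above, majority voting and weighted scoring, can be sketched as follows. The layer names, the fixed weights, and the 50-point vote threshold are assumptions chosen for the example. The source describes the weighting as confidence-based but gives no concrete values.

```python
def weighted_score(layer_scores: dict, weights: dict) -> float:
    """Weighted average of per-layer risk scores (0-100)."""
    total = sum(weights[name] for name in layer_scores)
    return sum(layer_scores[name] * weights[name] for name in layer_scores) / total


def majority_vote(layer_scores: dict, threshold: float = 50.0) -> bool:
    """True if more than half of the layers score at or above the threshold."""
    votes = sum(score >= threshold for score in layer_scores.values())
    return votes > len(layer_scores) / 2


if __name__ == "__main__":
    # Hypothetical per-layer scores and weights.
    scores = {"url": 80, "content": 60, "email": 40, "visual": 30}
    weights = {"url": 0.4, "content": 0.3, "email": 0.2, "visual": 0.1}
    print(weighted_score(scores, weights), majority_vote(scores))
```

Dividing by the sum of the weights actually used means a missing layer (for example, no email headers for a bare URL) does not drag the score down.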

09 — Pipeline Step 6

Explainable AI (XAI)

XAI makes the model's decisions transparent and interpretable — users can see exactly which features triggered the phishing alert.

SHAP

SHapley Additive exPlanations — assigns contribution values to each feature

LIME

Local Interpretable Model-agnostic Explanations — local approximations

FEAT

Feature Importance ranking — identifies top contributing signals
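The idea behind SHAP's contribution values can be shown with a brute-force Shapley computation on a tiny model. This is not the SHAP library, which uses faster approximations. It is an exact enumeration that is only feasible for a handful of features. The scoring function `toy_model` and the all-zero baseline are invented for illustration.

```python
from itertools import permutations
from math import factorial


def toy_model(f) -> float:
    """Invented risk function with an interaction between features 0 and 2."""
    return 10 + 30 * f[0] + 25 * f[1] + 20 * f[0] * f[2]


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i from baseline to its real value
            now = model(current)
            phi[i] += now - prev
            prev = now
    return [p / factorial(n) for p in phi]


if __name__ == "__main__":
    print(shapley_values(toy_model, [1, 1, 1], [0, 0, 0]))
```

The values always sum to the difference between the model's output for the input and for the baseline, which is what lets each feature be shown as a signed share of the final score.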

10 — Pipeline Step 7

System Output

SAFE / PHISHING Verdict

Binary classification result with confidence percentage

✓ SAFE

Risk Score (0–100)

Safe · Moderate · High Risk

72

Explanation

Top features: suspicious domain age (+34%), @ symbol in URL (+28%), mismatched SSL (+18%)
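A sketch of how the three outputs might be assembled into one report is given below. The band cut-offs (35 and 70) and the verdict threshold are assumptions, since the source names the bands but not their boundaries. The feature contributions reuse the figures from the explanation example above.

```python
def risk_band(score: int) -> str:
    """Map a 0-100 risk score to a band (cut-offs are assumed)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 35:
        return "Safe"
    if score < 70:
        return "Moderate"
    return "High Risk"


def build_report(score: int, contributions: dict, top_n: int = 3) -> dict:
    """Combine verdict, score, band, and top contributing features."""
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
    return {
        # Assumed rule: only the High Risk band yields a PHISHING verdict.
        "verdict": "PHISHING" if score >= 70 else "SAFE",
        "risk_score": score,
        "band": risk_band(score),
        "explanation": [f"{name} ({value:+.0%})" for name, value in top],
    }


if __name__ == "__main__":
    contributions = {
        "suspicious domain age": 0.34,
        "@ symbol in URL": 0.28,
        "mismatched SSL": 0.18,
    }
    print(build_report(72, contributions))
```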

11 — Innovation

URL DNA Fingerprinting

THE CONCEPT

Create a "DNA" of URLs

Break each URL into structural patterns — protocol, subdomain, domain, path, parameters — and encode them as a unique fingerprint signature.

EXAMPLE DNA STRAND

HTTPS · SUB×3 · DOM-NEW · PATH-DEEP · PARAM-@

Patent Opportunity

"A structural fingerprinting method for detecting phishing URLs by encoding URL components into a comparable DNA-like signature pattern."

Compare against phishing DNA signatures database

Detect structural similarity to known phishing patterns
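One possible reading of the fingerprinting idea is sketched below: each URL is reduced to a short list of structural tokens, and two fingerprints are compared with Jaccard similarity. The token names beyond those in the example strand (`DOM-OLD`, `PATH-SHALLOW`), the depth threshold of three path segments, and the choice of Jaccard similarity are assumptions. Whether a domain is new is passed in as a flag because it would come from a WHOIS lookup.

```python
from urllib.parse import urlsplit


def url_dna(url: str, domain_is_new: bool = False) -> list:
    """Encode a URL's structure as a list of tokens (hypothetical scheme)."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    tokens = [parts.scheme.upper()]
    sub_count = max(len(labels) - 2, 0)  # naive: labels beyond the last two
    if sub_count:
        tokens.append(f"SUB×{sub_count}")
    tokens.append("DOM-NEW" if domain_is_new else "DOM-OLD")
    depth = len([seg for seg in parts.path.split("/") if seg])
    tokens.append("PATH-DEEP" if depth >= 3 else "PATH-SHALLOW")
    if "@" in parts.query:
        tokens.append("PARAM-@")
    return tokens


def dna_similarity(a: list, b: list) -> float:
    """Jaccard similarity between two fingerprints (1.0 = identical token sets)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


if __name__ == "__main__":
    dna = url_dna("https://a.b.c.example.com/x/y/z/login?user=me@mail.test", True)
    print(" · ".join(dna))
```

A new URL whose fingerprint is highly similar to stored phishing fingerprints could then contribute to the URL layer's score.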

12 — Summary

Conclusion

Hybrid System Improves Detection

The multi-layer approach, combining URL, content, email, and visual analysis, is designed to achieve higher accuracy than single-layer systems.

Provides Clear Explanations

SHAP, LIME, and Feature Importance make every decision transparent — building user trust and enabling learning.

Novel URL DNA Fingerprinting

A unique structural fingerprinting innovation with patent potential for next-generation phishing detection.

🛡️ Smarter Detection · Clearer Explanations · Safer Internet