🛡️

SSH Guardian v3.0

Intelligent SSH Intrusion Detection for SMEs

Student: Md Sohel Rana (TP086217)

Supervisor: Dr. K.C. Arun

Institution: Asia Pacific University

Module: CT095-6-M RMCE

📋Presentation Agenda

1. Introduction: Problem & Research Questions
2. Literature Review: Gap Analysis & Contribution
3. Methodology: DSR & System Design
4. Implementation: System Demo
5. ML Models: Training & Evaluation
6. Results: Performance Metrics
7. Future Work: Roadmap & Open Source
8. Conclusion: Summary & Q&A

⚠️The Threat Landscape

"Every 39 seconds, a server somewhere is being attacked." (Cukier, M., 2007, University of Maryland Study on Computer Hacking)

📊SSH Attack Statistics

43%

Cyberattacks Target SMEs

Verizon DBIR, 2023

60%

SMEs Lack IT Security Staff

Ponemon Institute, 2022

$200K

Average SME Breach Cost

IBM Cost of Data Breach Report, 2023

65%

Attacks via SSH Brute Force

Rapid7 Research, 2022

📚Literature References

1 Standard 2006

The Secure Shell (SSH) Transport Layer Protocol

Ylonen, T. and Lonvick, C.

RFC 4253, IETF

2 Report 2023

43% of breaches involve SMEs

Verizon - Data Breach Investigations Report

Verizon Enterprise Solutions

3 Report 2023

Average SME breach costs $200K+

IBM Security - Cost of a Data Breach Report

IBM Security

4 Report 2022

SSH brute force: 65% of initial access

Rapid7 - Under the Hoodie Report

Rapid7 Research

5 Report 2023

15M+ SSH attacks daily in APAC

Akamai Technologies - State of the Internet / Security Report

Akamai Technical Report

6 Journal 2001

Random Forests

Breiman, L.

Machine Learning, 45(1), 5-32 | DOI: 10.1023/A:1010933404324

7 Conference 2016

XGBoost: A Scalable Tree Boosting System

Chen, T. and Guestrin, C.

Proceedings of the 22nd ACM SIGKDD | DOI: 10.1145/2939672.2939785

8 Journal 2007

A Design Science Research Methodology for Information Systems Research

Peffers, K. et al.

Journal of Management Information Systems, 24(3) | DOI: 10.2753/MIS0742-1222240302

🌏SME & South Asia Context

South Asia Threat Landscape

30%

YoY Increase in Cyberattacks

Kaspersky APT Report, 2023

15M+

SSH Attacks Daily in APAC

Akamai State of Internet, 2023

SME Vulnerability in Region

70%

SMEs Form ASEAN Economy

OECD SME Policy Index, 2021

83%

Lack Cybersecurity Budget

APEC Cybersecurity Report, 2022

🎯The Research Gap: Why This Matters

Affordability Gap: Splunk Enterprise costs $15K-$100K/year; 89% of Malaysian SMEs have total IT budgets under $10K (SME Corp Malaysia, 2022)
Expertise Gap: 72% of ASEAN SMEs report lack of cybersecurity skills as primary barrier (Cisco Cybersecurity for SMBs, 2023)
Tool Complexity: Average SIEM deployment requires 6-12 months and dedicated SOC team (Gartner, 2022)
SSH-Specific Threat: SSH brute-force attacks increased 104% in Asia-Pacific region in 2023 (Akamai SOTI Report, 2023)
No Existing Solution: Current open-source tools (Fail2ban, OSSEC) lack ML capabilities and threat intelligence integration

✅SSH Guardian: Bridging the Gap

🆓

Zero Cost

Open-source under Apache 2.0 license. No licensing fees, no vendor lock-in. Accessible to all SMEs regardless of budget.

⚡

10-Minute Setup

Single-command installation vs. months for an enterprise SIEM. No specialized training required.

🤖

ML-Powered

96.91% detection accuracy with Random Forest. Automated threat response without manual intervention.

🌐

Threat Intelligence

Integrated AbuseIPDB, VirusTotal, GreyNoise APIs. Real-time reputation scoring for every IP.

📊

Intuitive Dashboard

Web-based UI designed for non-security staff. No command-line expertise needed.

💻

Minimal Resources

Runs on 2.3% CPU, 180MB RAM. Works on basic VPS ($5/month) used by most SMEs.

🏢The SME Security Dilemma

💰

Budget Constraints

Enterprise tools cost $15K-$100K/year. SMEs typically have <$10K for all IT security.

👥

Expertise Gap

60% of SMEs have no dedicated security staff. Complex tools require specialists.

⏰

Time Pressure

Limited resources mean security often takes a backseat to core business operations.

🛡️

Introduction

Research Questions & Objectives

❓Research Questions

RQ1: How can machine learning enhance SSH intrusion detection for SMEs while maintaining low resource requirements?
RQ2: Which features are most important for accurately detecting SSH attacks in an SME environment?
RQ3: How can external threat intelligence be integrated to improve detection accuracy?
RQ4: How does an ML-enhanced framework compare to traditional rule-based solutions like Fail2ban?

🎯Research Objectives

🤖

Objective 1

Develop an ML-based SSH intrusion detection system optimized for SME resource constraints

🔬

Objective 2

Identify and evaluate key features for SSH attack detection through feature engineering

🌐

Objective 3

Integrate external threat intelligence APIs for enhanced detection capabilities

📈

Objective 4

Evaluate the system against Fail2ban and document performance improvements

🛡️

Literature Review

Existing Solutions & Research Gap

🔓SSH Protocol Vulnerabilities

🔨

Brute Force

Systematic password guessing using dictionaries or combinations

📋

Credential Stuffing

Using leaked credentials from other breaches

🎭

Key-based Attacks

Exploiting weak or stolen SSH keys

🕳️

Protocol Exploits

Targeting implementation vulnerabilities

🛡️Existing Solution: Fail2ban

✓ Strengths

Free and open-source
Simple to configure
Widely adopted
Low resource usage
Integrates with iptables

✗ Limitations

No machine learning
No threat intelligence
High false positive rates
Threshold-only detection
No centralized dashboard
No behavioral analysis

🏛️Enterprise SIEM Solutions

$15K-$100K

Splunk Annual Cost

$10K-$50K

IBM QRadar Cost

📊Tool Comparison Matrix

Feature	Fail2ban	OSSEC	CrowdSec	Splunk	SSH Guardian
ML Detection	✗	✗	◐	✓	✓
Threat Intel	✗	◐	✓	✓	✓
Dashboard	✗	✓	✓	✓	✓
SME-Friendly	✓	✗	◐	✗	✓
Free	✓	✓	✓	✗	✓
Setup Time	5 min	Hours	30 min	Days	10 min

🔍The Research Gap

📋Gap Analysis Summary

✗ No tool combines ML + Threat Intel + SME-friendly interface
✗ Enterprise solutions are too expensive for SMEs
✗ Open-source tools lack advanced detection capabilities
✗ No solution addresses all four requirements together
✓ SSH Guardian fills this gap

🛡️

Methodology

Design Science Research Approach

🔬Design Science Research (DSR)

🏗️System Architecture

🛡️Three-Layer Detection

📏

Layer 1: Rule-Based

Handles 67% of attacks with sub-millisecond latency. Simple threshold rules for obvious brute force.

🤖

Layer 2: ML Model

Processes the 24% of attacks that are more sophisticated, with 3.2 ms latency. Random Forest with 50 features.

🌐

Layer 3: Threat Intel

Enriches 9% of detections asynchronously. AbuseIPDB, VirusTotal, GeoIP integration.
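
The layering above can be illustrated with a minimal sketch. It assumes a sliding-window failure count for Layer 1, a stub scoring function standing in for the trained Random Forest in Layer 2, and a plain list standing in for the asynchronous enrichment queue of Layer 3. All class names, thresholds, and scores here are hypothetical and are not taken from the SSH Guardian codebase.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class AuthEvent:
    ip: str
    username: str
    timestamp: float  # seconds since epoch
    success: bool


@dataclass
class LayeredDetector:
    # Hypothetical thresholds, chosen only for illustration.
    max_failures: int = 5
    window_seconds: float = 60.0
    ml_threshold: float = 0.5
    failures: dict = field(default_factory=lambda: defaultdict(deque))
    enrichment_queue: list = field(default_factory=list)

    def rule_layer(self, event: AuthEvent) -> bool:
        """Layer 1: sliding-window count of failed logins per IP."""
        window = self.failures[event.ip]
        if not event.success:
            window.append(event.timestamp)
        # Drop failures that have fallen out of the time window.
        while window and event.timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.max_failures

    def ml_layer(self, event: AuthEvent) -> float:
        """Layer 2 placeholder: stands in for a trained classifier's score."""
        score = 0.0
        if event.username in {"root", "admin"}:
            score += 0.4
        if not event.success:
            score += 0.3
        return score

    def classify(self, event: AuthEvent) -> str:
        if self.rule_layer(event):
            verdict = "block:rule"
        elif self.ml_layer(event) >= self.ml_threshold:
            verdict = "block:ml"
        else:
            return "allow"
        # Layer 3: queue the IP for later threat-intelligence enrichment.
        self.enrichment_queue.append(event.ip)
        return verdict
```

Checking the cheap rule first means the model is consulted only for events the threshold does not already catch, which is one plausible reading of the 67% / 24% split described above.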

⚙️Technology Stack

🐍

Python 3.12

Backend & ML

🌶️

Flask

Web Framework

🗄️

MySQL 8.0

Database

⚡

Redis

Caching

🌲

Scikit-learn

ML Library

📊

Chart.js

Visualizations

🛡️

ML Model Selection

Training & Evaluation

🤖Models Evaluated

🌲

Random Forest

Ensemble of decision trees with bootstrap aggregation

⚡

XGBoost

Gradient boosting with regularization

💡

LightGBM

Light gradient boosting with leaf-wise growth

📈Model Performance Comparison

📊Performance Metrics

96.91%

Accuracy

96.81%

Precision

96.86%

Recall

96.84%

F1 Score

97.01%

ROC-AUC
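
For reference, the threshold-based metrics above follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard formulas. The counts in the usage example are made up for illustration and are not the project's confusion matrix. ROC-AUC is omitted because it needs per-sample scores rather than counts.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Illustrative counts only (hypothetical, not measured results).
example = classification_metrics(tp=90, fp=10, tn=880, fn=20)
```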

🌲Why Random Forest?

Interpretability: Feature importance is easily explainable to SME administrators
Stability: Less sensitive to hyperparameter tuning than boosting methods
Low Resources: Efficient inference with minimal CPU/memory requirements
Proven Track Record: Widely used in production intrusion detection systems
Reduced Overfitting: Bagging and random feature selection lower variance compared with a single decision tree

📊Feature Importance

🔧50 Engineered Features

⏰

Temporal (6)

Hour, minute, day of week, weekend, business hours, night

📍

Geographic (6)

Country, distance, domestic, high-risk region

👤

Username (6)

Entropy, common names, patterns, root attempts

🔄

Behavioral (9)

Attempts per hour, velocity, unique users, ratios

🌐

Reputation (6)

AbuseIPDB, VirusTotal, TOR, VPN, proxy flags

🔢

IP Features (6)

Private IP, IPv6, subnet patterns, numeric analysis
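
A few of the features listed above can be derived with the Python standard library alone. The sketch below is an assumption about how such features might be computed, not the project's actual feature code. The night and business-hour boundaries, the common-username list, and the helper names are hypothetical. Only `is_night` is a feature name that appears elsewhere in this presentation.

```python
import ipaddress
import math
from collections import Counter
from datetime import datetime

# Illustrative list only; a real deployment would use a larger dictionary.
COMMON_USERNAMES = {"root", "admin", "test", "ubuntu", "oracle"}


def username_entropy(username: str) -> float:
    """Shannon entropy of the username in bits per character."""
    if not username:
        return 0.0
    counts = Counter(username)
    n = len(username)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def extract_features(ip: str, username: str, when: datetime) -> dict:
    """Build a small feature dictionary for one SSH login attempt."""
    addr = ipaddress.ip_address(ip)
    return {
        # Temporal features (hour boundaries are assumptions)
        "hour": when.hour,
        "day_of_week": when.weekday(),
        "is_weekend": int(when.weekday() >= 5),
        "is_business_hours": int(when.weekday() < 5 and 9 <= when.hour < 17),
        "is_night": int(when.hour < 6 or when.hour >= 22),
        # Username features
        "username_entropy": username_entropy(username),
        "is_common_username": int(username in COMMON_USERNAMES),
        "is_root_attempt": int(username == "root"),
        # IP features
        "is_private_ip": int(addr.is_private),
        "is_ipv6": int(addr.version == 6),
    }
```

Geographic, behavioral, and reputation features need external data (GeoIP databases, event history, API responses), so they are left out of this sketch.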

📊Confusion Matrix

🛡️

System Demo

Dashboard & Features

🔐Login Security

Two-factor authentication with email OTP

📊Dashboard Overview

Real-time security metrics and threat distribution

📡Live Events Stream

Real-time SSH authentication event monitoring

📈Events Timeline

Historical event analysis with filtering

🖥️Agent Management

Remote server agent monitoring and status

🔍Threat Intelligence Lookup

Comprehensive IP analysis with multiple threat feeds

🔥Firewall Management

Unified firewall control with UFW and Fail2ban

🚫Blocked IPs

Active IP blocks with reason and expiration

📋Blocking Rules

Configurable rule-based and ML-driven blocking

✅Trusted IPs

Whitelist management to prevent false positives

🤖ML Intelligence Overview

Model performance metrics and predictions

📢Notification Channels

Multi-channel alert configuration

📱Telegram Alerts

Real-time Telegram bot alerts

🔔Notification Rules

Customizable alert rules and thresholds

📈Security Trends

Historical analysis and security trends

📊Daily Reports

Automated daily security summaries

⚙️System Settings

System configuration and preferences

🔌API Integrations

External threat intelligence API configuration

👥User Management

Role-based access control

📜Audit Log

Complete activity tracking and compliance
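
The Blocked IPs and Trusted IPs views above suggest two pieces of state: blocks that carry a reason and an expiration, and a whitelist that prevents trusted addresses from being blocked. A minimal in-memory sketch of that bookkeeping follows. The class and method names are hypothetical, and the sketch only tracks state without calling UFW or Fail2ban.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class BlockEntry:
    reason: str
    expires_at: float  # seconds since epoch


@dataclass
class BlockList:
    trusted: list = field(default_factory=list)  # trusted networks
    blocked: dict = field(default_factory=dict)  # ip string -> BlockEntry

    def trust(self, cidr: str) -> None:
        """Add a network such as '10.0.0.0/8' to the whitelist."""
        self.trusted.append(ipaddress.ip_network(cidr))

    def is_trusted(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.trusted)

    def block(self, ip: str, reason: str, now: float, ttl: float) -> bool:
        """Record a block unless the IP is trusted; return True if recorded."""
        if self.is_trusted(ip):
            return False
        self.blocked[ip] = BlockEntry(reason, now + ttl)
        return True

    def is_blocked(self, ip: str, now: float) -> bool:
        entry = self.blocked.get(ip)
        if entry is None:
            return False
        if now >= entry.expires_at:
            del self.blocked[ip]  # lazily drop expired blocks
            return False
        return True
```

Passing `now` explicitly keeps the expiry logic easy to test. A real service would more likely read the clock itself and sync the result to the firewall.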

🛡️

Results & Discussion

Performance Evaluation

📊Detection Performance

96.91%

Overall Accuracy

3.1%

False Positive Rate

<30 s

Response Time

500

Events/Second

🎯Attack-Specific Detection

99.2%

Brute Force Detection

98.7%

Tor-based Attack Detection

97.4%

Credential Stuffing Detection

⚔️SSH Guardian vs Fail2ban

Metric	Fail2ban	SSH Guardian	Improvement
Accuracy	78.0%	96.91%	+18.91 pp
False Positives	12.3%	3.1%	-9.2 pp
Detection Latency	Threshold only	Real-time ML	Faster
Threat Intel	✗	4 APIs	New
Dashboard	✗	Full UI	New
Behavioral Analysis	✗	50 features	New

💻Resource Utilization

2.3%

CPU Usage (avg)

180 MB

RAM Usage

<1

Total Processing

99.9%

Uptime

✅Research Questions Answered

✓ RQ1: Random Forest with 50 features achieves 96.91% accuracy with minimal resources
✓ RQ2: Temporal and behavioral features (is_night, failed_attempts_1h) are most predictive
✓ RQ3: AbuseIPDB integration improves detection by 8.4% for known malicious IPs
✓ RQ4: 18.91 percentage-point accuracy improvement over the Fail2ban baseline

🎯Objectives Achieved

✅

Objective 1

Developed ML-based IDS with 2.3% CPU and 180MB RAM usage

✅

Objective 2

Identified 50 features across 9 categories with importance analysis

✅

Objective 3

Integrated 4 threat intelligence APIs with caching strategy

✅

Objective 4

Documented an 18.91 percentage-point accuracy improvement over Fail2ban
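
Objective 3 mentions a caching strategy for the threat intelligence APIs. One common approach is a time-to-live (TTL) cache, so that repeated lookups of the same IP do not consume API quota. The sketch below uses an in-memory dictionary as a stand-in for Redis. The `TTLCache` class and the injected `fetch` function are hypothetical, and no real API is called.

```python
import time
from typing import Callable


class TTLCache:
    """Cache lookups for a fixed time so repeated IPs skip the remote API."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # function that performs the real lookup
        self._ttl = ttl_seconds
        self._clock = clock      # injectable clock, useful for testing
        self._store: dict[str, tuple[float, dict]] = {}

    def get(self, ip: str) -> dict:
        now = self._clock()
        cached = self._store.get(ip)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # still fresh: no remote call
        result = self._fetch(ip)
        self._store[ip] = (now, result)
        return result
```

With Redis, the same idea is usually expressed by setting a key with an expiry, so the store itself discards stale entries.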

🛡️

Future Work

Roadmap & Open Source

⚠️Current Limitations

🖥️

Single-Server

Currently designed for single central server deployment

🐧

Linux-Only

Agent supports Linux only, no Windows/macOS

🔄

Manual Retraining

Model updates require manual intervention

🌍

English-Only

Dashboard and documentation in English

📦Open Source Release

📜

Apache 2.0 License

Enterprise-friendly with patent protection

🔧

GitLab Repository

Full source code and documentation

🤝

Community

Open to contributions and feedback

⚖️License Declaration

Apache License 2.0

Free to Use: Commercial and personal use permitted
Modify & Distribute: Full rights to modify and redistribute
Patent Protection: Express grant of patent rights from contributors
No Trademark Rights: Does not grant permission to use project trademarks

Why Apache 2.0?

Enterprise-Friendly: Compatible with corporate legal requirements
SME Accessible: No licensing fees or restrictions for small businesses
Community Growth: Encourages contributions while protecting contributors
Industry Standard: Used by Kubernetes, TensorFlow, Apache projects

🗺️Technical Roadmap 2026

🐳

Q1: Docker

Containerized deployment

🌐

Q2: Multi-Server

Distributed architecture

🪟

Q3: Windows Agent

Cross-platform support

📱

Q4: Mobile App

iOS/Android dashboard

🧠Future ML Improvements

Deep Learning: LSTM networks for sequence-based attack detection
Online Learning: Continuous model updates from new data
Adversarial Robustness: Defense against ML-aware attackers
Explainable AI: SHAP values for prediction transparency
Transfer Learning: Pre-trained models for faster deployment

🛡️

Conclusion

Summary & Contributions

🏆Key Contributions

🤖

Hybrid Detection

Three-layer architecture combining rules, ML, and threat intelligence

📊

Feature Engineering

50 engineered features across 9 categories for SSH analysis

🎯

SME Focus

Enterprise-grade security accessible to small businesses

📖

Open Source

Full codebase available under Apache 2.0 license

📈

Proven Results

96.91% accuracy, an 18.91 percentage-point improvement over the Fail2ban baseline

🛠️

Production Ready

18,000+ lines of code, 47-table database, complete dashboard

📝Conclusion

Problem: SMEs lack affordable, effective SSH security solutions
Solution: SSH Guardian - ML-powered, open-source, SME-friendly
Results: 96.91% accuracy, 3.1% false positives, <30s response time
Impact: Enterprise-grade security accessible to all organizations
Future: Open source release, community contributions, continuous improvement

🙏Acknowledgments

Supervisor: Dr. Kuruvikulam Chandrasekaran Arun for guidance and support
Co-Supervisor: Ts. Dr. Manimegalai A/P Rajenderan
Institution: Asia Pacific University of Technology & Innovation
Open Source Community: Python, Flask, Scikit-learn contributors
Family & Friends: For continuous encouragement and support

🛡️

Thank You

Questions & Discussion

Email: sohell.ranaa@gmail.com

GitLab: gitlab.com/sohell.ranaa/ssh-guardian

Demo: sshg-app.rpu.solutions