How I Detected Network Attacks with Machine Learning
Date: 2026-06-24 | Category: Python & Data Science
Tags: Python, Data
Thousands of packets flow through a network every second. Most are
harmless — but a small fraction carry real threats. Traditional
rule-based systems either miss those threats entirely or flood
analysts with false alarms. This project tries to fix both.
The Data
I captured real network traffic with Wireshark: 61,256 normal packets
and 3,654 attack packets generated with Nmap port scanning. That makes
64,910 packets in total, of which only about 5.6% are attacks.
The twist: I also pulled in the official IANA port/service registry
and used NLP keyword analysis to assign a threat risk score to every
port. For example, SSH (22) is critical, HTTP (80) is medium, and
unknown ports are low.
Merging the two sources attaches service and risk context to every
packet, which the raw capture alone does not carry.
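The merge can be pictured as a lookup from destination port to a
keyword-derived risk level. The sketch below is an illustration only,
not the project's code: the keyword lists, the `score_service` and
`enrich` helpers, the field names, and the two registry rows are
assumptions. Only the risk levels for SSH, HTTP, and unknown ports come
from the text above.

```python
# Hypothetical keyword lists; the project's real NLP rules are not shown in the post.
CRITICAL_KEYWORDS = ("shell", "remote login", "remote desktop")
MEDIUM_KEYWORDS = ("http", "web", "mail")


def score_service(description: str) -> str:
    """Map a registry service description to a risk level by keyword match."""
    text = description.lower()
    if any(keyword in text for keyword in CRITICAL_KEYWORDS):
        return "critical"
    if any(keyword in text for keyword in MEDIUM_KEYWORDS):
        return "medium"
    return "low"


# Tiny stand-in for the port/service registry: port -> description.
registry = {
    22: "The Secure Shell (SSH) Protocol",
    80: "World Wide Web HTTP",
}

port_risk = {port: score_service(desc) for port, desc in registry.items()}


def enrich(packet: dict) -> dict:
    """Attach a risk level to a packet; ports missing from the registry count as low."""
    return {**packet, "port_risk": port_risk.get(packet["dst_port"], "low")}


# Made-up packets with assumed field names.
packets = [
    {"dst_port": 22, "length": 60},
    {"dst_port": 80, "length": 1514},
    {"dst_port": 54321, "length": 74},
]
enriched = [enrich(p) for p in packets]
```

With a real capture, the same lookup would typically be a left join on
the destination port, so that packets on unregistered ports are kept and
fall back to the low level.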
Feature Engineering
Classifying packets in isolation misses the point. The real signal
is in behavioral patterns across a session.
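To make the idea concrete, here is a minimal sketch of session-level
aggregation. It treats all packets from one source address as a
"session", which is a simplifying assumption. The field names and the
three features computed here are hypothetical and are not necessarily
the ones used in the project.

```python
from collections import defaultdict

# Made-up packet records; field names are assumptions for this sketch.
packets = [
    {"src": "10.0.0.5", "dst_port": 22},
    {"src": "10.0.0.5", "dst_port": 23},
    {"src": "10.0.0.5", "dst_port": 80},
    {"src": "10.0.0.9", "dst_port": 443},
    {"src": "10.0.0.9", "dst_port": 443},
]


def session_features(packets):
    """Aggregate per-source behavior: packet count and distinct destination ports."""
    ports_by_src = defaultdict(set)
    count_by_src = defaultdict(int)
    for p in packets:
        ports_by_src[p["src"]].add(p["dst_port"])
        count_by_src[p["src"]] += 1
    return {
        src: {
            "packet_count": count_by_src[src],
            "unique_dst_ports": len(ports_by_src[src]),
            "ports_per_packet": len(ports_by_src[src]) / count_by_src[src],
        }
        for src in count_by_src
    }


features = session_features(packets)
```

A source that touches a new port with almost every packet looks like a
scanner, while a source that keeps talking to one port looks like an
ordinary client. No single packet carries that information.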
I engineered 5 new features:
The Model
I compared XGBoost and Random Forest. Their F1 scores were nearly
identical (0.3026 vs 0.3034), but XGBoost was the more consistent of
the two.
Results on the test set (12,982 packets):
| Metric | Value |
|--------|-------|
| Accuracy | 86.9% |
| Recall | 50.6% |
| AUC-ROC | 0.825 |
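
A quick piece of arithmetic helps reconcile the high accuracy with the
low F1 score. The sketch below solves the F1 formula for precision. It
assumes the XGBoost F1 of 0.3026 and the 50.6% recall were measured on
the same test set at the same threshold, which the post does not state
explicitly. The `implied_precision` helper is introduced here for
illustration.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P."""
    return f1 * recall / (2 * recall - f1)


# Figures reported in the post; pairing them is an assumption.
precision = implied_precision(f1=0.3026, recall=0.506)
print(f"implied precision: {precision:.3f}")
```

Under that assumption, precision comes out at roughly 0.22, so only
about one alert in five is a real attack. Accuracy still looks good
because attacks are rare: a model can be right about most packets and
still raise mostly false alarms.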
The Real Win: Financial Threshold Optimization
Standard practice uses a 50% decision threshold. But that's not
financially optimal.
The two error types carry very different costs:
After modeling these costs, I lowered the threshold from 50% to 34%.
Total security operations center (SOC) cost dropped from 698,046 TL to
609,081 TL, a net saving of 88,965 TL (about 12.7%) on the test set
alone.
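The threshold search itself is simple: score every candidate threshold
by the total cost of its errors and keep the cheapest. The sketch below
shows the mechanics on toy data. The two cost constants, the labels, the
probabilities, and the helper names are all made up for illustration.
They are not the project's figures and will not reproduce the 34%
result.

```python
# Hypothetical per-error costs in TL; the post's actual figures are not used here.
COST_FALSE_NEGATIVE = 1000.0  # a missed attack
COST_FALSE_POSITIVE = 100.0   # an analyst chasing a false alarm


def total_cost(y_true, y_prob, threshold):
    """Sum error costs when packets with probability >= threshold are flagged."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        flagged = prob >= threshold
        if label == 1 and not flagged:
            cost += COST_FALSE_NEGATIVE
        elif label == 0 and flagged:
            cost += COST_FALSE_POSITIVE
    return cost


def best_threshold(y_true, y_prob):
    """Scan thresholds 0.01..0.99 and return (threshold, cost) for the cheapest."""
    candidates = [t / 100 for t in range(1, 100)]
    return min(
        ((t, total_cost(y_true, y_prob, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Toy labels (1 = attack) and predicted attack probabilities.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.20, 0.45, 0.10, 0.40, 0.90, 0.30, 0.35]

threshold, cost = best_threshold(y_true, y_prob)
baseline = total_cost(y_true, y_prob, 0.50)
```

Because a missed attack is assumed to cost far more than a false alarm,
the cheapest threshold on this toy data falls below 0.50. The shift from
50% to 34% rests on the same logic. In practice the threshold should be
chosen on a validation split and only then evaluated on the test set, so
the reported saving is not optimistic.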
Explaining the Model with SHAP
I didn't want to leave the model as a black box. SHAP analysis
revealed which features actually drive decisions:
1. Dst_Port (0.780): a low port number is the strongest signal
2. Length (0.741): small packets match the SYN probes of a port scan
3. Protocol_Enc (0.236): protocol diversity flags reconnaissance
Is_High_Risk and Port_Category turned out nearly irrelevant.
I'll redesign both in the next version.
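
A ranking like the one above is commonly built by averaging the absolute
SHAP value of each feature over all rows. The sketch below assumes that
is how the scores were computed, which the post does not state. The SHAP
matrix is made up, and `mean_abs_importance` is a hypothetical helper
rather than part of the SHAP library.

```python
def mean_abs_importance(shap_values, feature_names):
    """Rank features by mean |SHAP value| over all rows, largest first."""
    n_rows = len(shap_values)
    totals = [0.0] * len(feature_names)
    for row in shap_values:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    scores = {name: total / n_rows for name, total in zip(feature_names, totals)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up per-packet SHAP values (rows = packets, columns = features).
feature_names = ["Dst_Port", "Length", "Is_High_Risk"]
shap_values = [
    [0.9, -0.7, 0.01],
    [-0.6, 0.8, 0.00],
    [0.8, -0.6, -0.02],
]
ranking = mean_abs_importance(shap_values, feature_names)
```

Taking the absolute value matters: a feature that pushes some
predictions up and others down would otherwise average out to zero and
look irrelevant.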
Takeaway
The biggest lesson: a good model isn't just about high accuracy.
Understanding the cost of each error type and tuning the threshold
accordingly can matter more than the algorithm itself.
Code and data: GitHub