How I Detected Network Attacks with Machine Learning

Thousands of packets flow through a network every second. Most are
harmless, but a small fraction carry real threats. Traditional
rule-based systems either miss those threats entirely or flood
analysts with false alarms. This project tries to fix both.

The Data

I captured real network traffic with Wireshark: 61,256 normal packets
and 3,654 attack packets generated with Nmap port scanning, for
64,910 packets in total.

The twist: I also pulled in the official IANA port/service registry
and used NLP keyword analysis to assign a threat risk score to every
port: SSH (22) critical, HTTP (80) medium, unknown ports low.
Merging the two sources attached a risk score to each packet's
destination port, context the raw capture alone does not carry.
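A minimal sketch of what keyword-based port scoring and the merge could look like. The keyword table, the numeric scores, the tiny registry excerpt, and the `Risk_Score` field name are all assumptions for illustration, not the ones used in the project.

```python
# Hypothetical keyword -> score table; the project's real keywords and
# scores are not shown in this post.
KEYWORD_RISK = {
    "ssh": 90,      # remote shell access: critical
    "telnet": 90,
    "http": 50,     # web traffic: medium
}
DEFAULT_RISK = 10   # no keyword match, or port missing from the registry: low


def risk_score(description: str) -> int:
    """Return the highest score among keywords found in a service description."""
    text = description.lower()
    matches = [score for keyword, score in KEYWORD_RISK.items() if keyword in text]
    return max(matches, default=DEFAULT_RISK)


# Tiny stand-in for the IANA registry: port -> service name plus description.
registry = {
    22: "ssh The Secure Shell (SSH) Protocol",
    80: "http World Wide Web HTTP",
    123: "ntp Network Time Protocol",
}
port_risk = {port: risk_score(text) for port, text in registry.items()}


def annotate(packet: dict) -> dict:
    """Attach the destination port's risk score to one captured packet."""
    score = port_risk.get(packet["Dst_Port"], DEFAULT_RISK)
    return {**packet, "Risk_Score": score}


packets = [{"Dst_Port": 22, "Length": 60}, {"Dst_Port": 51515, "Length": 74}]
annotated = [annotate(p) for p in packets]
```

Taking the maximum over matched keywords means a single alarming keyword is enough to mark a port as risky, which is one possible design choice among several.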

Feature Engineering

Classifying packets in isolation misses the point. The real signal
is in behavioral patterns across a session.

I engineered 5 new features:

Time_Diff — Time between packets. Attacks arrive 28% faster.

Bytes_Per_Sec — Instant traffic density. 2.4× higher in attacks.

Protocol_Enc — Numeric protocol code. Diversity signals reconnaissance.

Is_High_Risk — Flag for traffic targeting ports with risk score ≥ 80.

Port_Category — Port range class (well-known, registered, dynamic).
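The five features above could be derived with pandas roughly as follows. This is a sketch under assumptions: the column names `Time`, `Protocol`, and `Risk_Score` are hypothetical (`Length` and `Dst_Port` appear later in the SHAP results), `Bytes_Per_Sec` is taken to be packet length divided by the gap to the previous packet, and time differences are computed over the whole capture rather than per session.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the five engineered columns to a packet table (assumed schema)."""
    out = df.sort_values("Time").reset_index(drop=True)

    # Time_Diff: seconds since the previous packet (0 for the first one).
    out["Time_Diff"] = out["Time"].diff().fillna(0.0)

    # Bytes_Per_Sec: length over the gap; gaps of 0 would divide by zero,
    # so they are masked to NaN first and the result is filled with 0.
    safe_gap = out["Time_Diff"].where(out["Time_Diff"] > 0)
    out["Bytes_Per_Sec"] = (out["Length"] / safe_gap).fillna(0.0)

    # Protocol_Enc: integer code per protocol name (alphabetical order).
    out["Protocol_Enc"] = out["Protocol"].astype("category").cat.codes

    # Is_High_Risk: destination port's risk score is at least 80.
    out["Is_High_Risk"] = (out["Risk_Score"] >= 80).astype(int)

    # Port_Category: IANA ranges 0-1023, 1024-49151, 49152-65535.
    out["Port_Category"] = pd.cut(
        out["Dst_Port"],
        bins=[-1, 1023, 49151, 65535],
        labels=["well-known", "registered", "dynamic"],
    )
    return out


sample = pd.DataFrame({
    "Time": [0.0, 0.5, 0.6],
    "Length": [60, 100, 60],
    "Protocol": ["TCP", "UDP", "TCP"],
    "Dst_Port": [22, 8080, 50000],
    "Risk_Score": [90, 10, 10],
})
feats = add_features(sample)
```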

The Model

I compared XGBoost and Random Forest. Both scored similarly
(F1: 0.3026 vs 0.3034), but XGBoost was more consistent.

Results on the test set (12,982 packets):

| Metric | Value |
|--------|-------|
| Accuracy | 86.9% |
| Recall | 50.6% |
| AUC-ROC | 0.825 |
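How accuracy can sit near 87% while F1 stays near 0.30 is easier to see from confusion-matrix counts. The sketch below uses assumed counts, which the post does not list; they were picked to sum to the 12,982 test packets and to reproduce the reported accuracy and recall, so treat the precision they imply as illustrative only.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the usual metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Assumed counts (not from the post): attacks caught, false alarms,
# attacks missed, and normal packets correctly passed.
m = classification_metrics(tp=370, fp=1338, fn=361, tn=10913)

for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With a rare positive class, the many correctly passed normal packets dominate accuracy, while false alarms pull precision (and therefore F1) down.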

The Real Win: Financial Threshold Optimization

Standard practice uses a 50% decision threshold. But that default
treats both error types as equally costly, so it is not financially
optimal when their costs differ.

The two error types carry very different costs:

False alarm → 117 TL (15 min of analyst time)

Missed attack → 1,500 TL (estimated regulatory risk under KVKK, Turkey's data protection law)

Modeling these costs, I shifted the threshold from 50% to 34%.
Result: total SOC cost dropped from 698,046 TL to 609,081 TL,
a net saving of 88,965 TL (about 12.7%) on the test set alone.
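A cost-based threshold search can be sketched as a simple sweep: score every candidate threshold by its total cost and keep the cheapest. The two unit costs come from the figures above; the labels, probabilities, and the 0.01-step grid are made up for illustration and are not the project's data or its exact procedure.

```python
import numpy as np

COST_FALSE_ALARM = 117      # TL per false alarm
COST_MISSED_ATTACK = 1500   # TL per missed attack


def total_cost(y_true: np.ndarray, y_prob: np.ndarray, threshold: float) -> int:
    """Total cost of the errors made when flagging y_prob >= threshold."""
    flagged = y_prob >= threshold
    false_alarms = np.sum(flagged & (y_true == 0))
    missed = np.sum(~flagged & (y_true == 1))
    return int(false_alarms * COST_FALSE_ALARM + missed * COST_MISSED_ATTACK)


def best_threshold(y_true: np.ndarray, y_prob: np.ndarray):
    """Return (threshold, cost) for the cheapest threshold on a 0.01 grid."""
    grid = np.linspace(0.01, 0.99, 99)
    costs = [total_cost(y_true, y_prob, t) for t in grid]
    i = int(np.argmin(costs))  # first minimum if several thresholds tie
    return float(grid[i]), costs[i]


# Toy labels (1 = attack) and predicted attack probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0])
y_prob = np.array([0.05, 0.10, 0.40, 0.60, 0.355, 0.45, 0.90, 0.205])

cost_at_default = total_cost(y_true, y_prob, 0.5)
threshold, cost = best_threshold(y_true, y_prob)
```

In this toy data, lowering the threshold trades two missed attacks for one extra false alarm, which is cheaper because a miss costs far more than an alarm. In practice the threshold would be chosen on a validation split rather than on the test set it is evaluated on.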

Explaining the Model with SHAP

I didn't want to leave the model as a black box. SHAP analysis
revealed which features actually drive decisions:

1. Dst_Port (0.780): a low port number is the strongest signal
2. Length (0.741): small packets match the SYN probes of a port scan
3. Protocol_Enc (0.236): protocol diversity flags reconnaissance

Is_High_Risk and Port_Category turned out nearly irrelevant to the
model's decisions. I'll redesign both in the next version.
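One common way to turn per-packet SHAP values into a global ranking like the one above is the mean absolute SHAP value per feature. Whether the scores listed here were computed exactly this way is an assumption, and the matrix below is made up purely to show the mechanics.

```python
import numpy as np

feature_names = ["Dst_Port", "Length", "Protocol_Enc"]

# Made-up SHAP matrix (rows = packets, columns = features), illustration only.
# With a tree model it would typically come from something like
# shap.TreeExplainer(model).shap_values(X).
shap_values = np.array([
    [0.5, -0.2, 0.0],
    [-0.3, 0.4, 0.1],
    [0.4, -0.3, -0.2],
])

# Global importance: average magnitude of each feature's contribution,
# ignoring whether it pushed the prediction up or down.
importance = np.abs(shap_values).mean(axis=0)

# Sort features from most to least important.
order = np.argsort(importance)[::-1]
ranking = [(feature_names[i], float(importance[i])) for i in order]
```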

Takeaway

The biggest lesson: a good model isn't just about high accuracy.
Understanding the cost of each error type and tuning the threshold
accordingly can matter more than the algorithm itself.

Code and data: GitHub