
When Autoencoders Meet Reinforcement Learning: Network Attack Detection

2 min read
Research · Deep Learning · Reinforcement Learning · Cybersecurity

During my time as a research assistant at Monash University, I explored an unconventional approach to network intrusion detection: combining autoencoders for feature engineering with reinforcement learning for classification. The results were compelling — ~97% detection accuracy with a 92% recall rate.

The Problem

Network intrusion detection systems (NIDS) face a fundamental challenge: the feature space is enormous, and attack patterns are constantly evolving. Traditional machine learning approaches work well on known attacks but struggle with novel patterns.

The CICIDS2017 dataset contains realistic network traffic with labeled attack categories, making it an ideal benchmark for testing new approaches.

Autoencoders as Feature Engineers

Rather than using autoencoders purely for anomaly detection (the typical approach), we used them as a dimensionality reduction and feature engineering tool. The encoder learns a compressed representation of normal network behavior, and the reconstruction error itself becomes a powerful feature.

import torch.nn as nn


class NetworkAutoencoder(nn.Module):
    """Symmetric autoencoder over preprocessed network flow features."""

    def __init__(self, input_dim: int, latent_dim: int):
        super().__init__()
        # Encoder: compress flow features into a low-dimensional latent code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Dropout(0.2),  # discourages learning a trivial identity mapping
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, latent_dim),
        )
        # Decoder mirrors the encoder to reconstruct the original features.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        latent = self.encoder(x)
        reconstructed = self.decoder(latent)
        return reconstructed, latent
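
A rough sketch of how the reconstruction error then becomes a feature (the dimensions and the random batch below are illustrative stand-ins, not values from the paper):

import torch

# NetworkAutoencoder as defined above, trained on benign traffic only.
model = NetworkAutoencoder(input_dim=78, latent_dim=32)  # dims are illustrative
flows = torch.randn(256, 78)  # stand-in for a batch of preprocessed flow features

model.eval()
with torch.no_grad():
    reconstructed, latent = model(flows)
    # Per-flow reconstruction error: large values flag traffic unlike
    # anything seen during benign-only training.
    recon_error = ((flows - reconstructed) ** 2).mean(dim=1, keepdim=True)

# The compressed representation plus the error signal forms the state
# vector handed to the downstream classifier.
state = torch.cat([latent, recon_error], dim=1)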

The RL Agent

The reinforcement learning component treats each network flow as a state and the classification (benign vs. attack type) as an action. The reward function balances detection accuracy against false positive rates.
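
The exact reward from the paper is not reproduced here, but a minimal sketch of its shape, assuming a fixed false-positive penalty weight, could look like this:

def reward(action: int, label: int, fp_penalty: float = 2.0) -> float:
    """Reward for classifying a single flow.

    Convention (assumed): 0 = benign, other values index attack types.
    Correct classifications earn +1; flagging benign traffic as an
    attack (a false positive) is penalised more heavily than other
    mistakes, encoding the accuracy / false-positive trade-off.
    """
    if action == label:
        return 1.0
    if label == 0:   # benign flow flagged as an attack
        return -fp_penalty
    return -1.0      # missed or mislabelled attack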

This is where the approach gets interesting: the RL agent can adapt its strategy based on the patterns it encounters, making it more resilient to distribution shifts compared to static classifiers.
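
For concreteness, here is a minimal sketch of one online step of such an agent, framed as a contextual bandit; the dimensions, epsilon, and penalty values are illustrative assumptions rather than the paper's settings:

import torch
import torch.nn as nn

state_dim, num_actions = 33, 8  # e.g. latent_dim + 1 error feature; benign + 7 attack classes (illustrative)
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def agent_step(state: torch.Tensor, label: int, epsilon: float = 0.1) -> int:
    """One online update given a single flow's (state_dim,) feature vector."""
    q_values = q_net(state)
    # Epsilon-greedy: occasional exploration is what lets the policy keep
    # adapting as the traffic distribution drifts.
    if torch.rand(1).item() < epsilon:
        action = int(torch.randint(num_actions, (1,)).item())
    else:
        action = int(q_values.argmax().item())
    r = 1.0 if action == label else (-2.0 if label == 0 else -1.0)  # mirrors the reward sketch above
    loss = (q_values[action] - r) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return action

Because each flow is treated as an independent state, no bootstrapping over successor states is needed, which keeps the update simple.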

Hyperparameter Optimization

We used a combination of grid search and Bayesian optimization to tune the autoencoder architecture; a sketch of the Bayesian stage follows the list. The key parameters that mattered most:

  • Latent dimension size — Too small loses information, too large defeats the purpose
  • Dropout rate — Critical for preventing the autoencoder from learning identity mappings
  • Learning rate scheduling — Cosine annealing outperformed step decay
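
For the Bayesian stage, the wiring might look like this Optuna sketch; train_and_eval is a hypothetical stand-in for the actual training-and-validation loop, and the search ranges are illustrative:

import optuna

def objective(trial: optuna.Trial) -> float:
    latent_dim = trial.suggest_int("latent_dim", 8, 64)
    dropout = trial.suggest_float("dropout", 0.1, 0.5)
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    # train_and_eval (hypothetical) trains the autoencoder + agent with
    # these settings and returns validation recall.
    return train_and_eval(latent_dim=latent_dim, dropout=dropout, lr=lr)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)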

The optimization process raised recall to 92% while cutting false positives by 6%.

Key Takeaways

This research reinforced a principle I keep coming back to: the most interesting results often come from combining techniques in unexpected ways. Autoencoders and RL are well-understood individually, but their combination for network security is relatively unexplored.

Full details of the paper are available upon request.