
A leading platform for return management, this company serves top-tier online retailers and is scaling rapidly. The first quarter is its most critical period, with intense transaction volumes following the holiday season. Customers expect speed and reliability; merchants demand ironclad service-level agreements for uptime, performance, and data integrity, especially during the peak post-holiday return season.
The stakes are high. Even seconds of downtime during peak season can damage customer trust and disrupt revenue.
The company's Kubernetes-based architecture enables agility but also amplifies risk when visibility gaps occur. For the senior vice president (SVP) of technology, visibility and control are top priorities during high-stakes periods. That became urgent when a serious security incident unfolded at the worst possible time.
It began with what seemed like a routine change. During a maintenance window, a Kubernetes service was taken offline and the Sysdig agent responsible for runtime security was disabled. Perimeter protections were intact, but with the environment now blind to runtime activity, attackers found their window.
Unbeknownst to the team, malicious reconnaissance tools were continuously sweeping the internet for just this kind of oversight. The exposed workload, running PHP-FPM, was a known target for remote code execution. In today’s high-speed threat landscape, even minor misconfigurations can become a siren call for opportunistic adversaries scanning billions of endpoints for vulnerable openings.
Initial cryptomining attempts silently failed, likely blocked by default container constraints such as restricted write access or limited privileges. But those failures didn’t deter the attackers. They escalated, impersonating trusted agents and executing lateral movement. Ultimately, they deployed Perfctl, a stealthy rootkit designed to siphon computing resources for cryptomining while evading detection across scanners, logs, and traditional monitoring tools.
While the company’s engineers had missed the initial after-hours alert, Sysdig’s Threat Research team did not. Around midnight, they flagged a high-severity alert: attackers were masquerading as Datadog agents.
The threat escalated quickly. The attackers exploited the impersonation to pivot into a private internal namespace that should never have been exposed. Containers in that namespace were running with root privileges, and no runtime policies were in place to detect or block the intrusion. Without visibility or enforcement controls, the team initially couldn’t assess the blast radius or contain the spread.
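The root-privilege gap described above is the kind of thing a simple manifest audit can surface before an incident. The following is a minimal sketch, not the company's or Sysdig's tooling: it assumes a pod manifest already loaded into a Python dictionary, and the function name `find_root_risks` and the example pod are hypothetical. The fields it checks (`securityContext`, `runAsUser`, `runAsNonRoot`, `privileged`) are standard Kubernetes pod spec fields.

```python
from typing import Any


def find_root_risks(pod: dict[str, Any]) -> list[str]:
    """Return findings for containers in a pod manifest that may run as root."""
    spec = pod.get("spec") or {}
    pod_ctx = spec.get("securityContext") or {}
    containers = (spec.get("initContainers") or []) + (spec.get("containers") or [])

    findings: list[str] = []
    for container in containers:
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}
        # Container-level settings take precedence over pod-level settings.
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))

        if ctx.get("privileged"):
            findings.append(f"{name}: privileged container")
        if run_as_user == 0:
            findings.append(f"{name}: runAsUser is 0 (root)")
        elif run_as_user is None and not run_as_non_root:
            # Nothing prevents the image's default user from being root.
            findings.append(f"{name}: may run as root (no runAsNonRoot or runAsUser)")
    return findings


# Hypothetical manifest for illustration only.
example_pod = {
    "metadata": {"name": "internal-worker"},
    "spec": {
        "containers": [
            {"name": "app", "image": "example/app:1.0"},
            {
                "name": "sidecar",
                "securityContext": {"runAsUser": 1000, "runAsNonRoot": True},
            },
        ]
    },
}

for finding in find_root_risks(example_pod):
    print(finding)
```

A static check like this only covers configuration. It would not have detected the intrusion itself, which, as the incident showed, required runtime telemetry.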
The only viable path forward was a full rebuild. Within approximately 20 minutes, the team wiped and redeployed every pod and container, quickly removing the attackers' immediate foothold.
Once Sysdig agents were restored, the full scope of the attack became clear. Telemetry revealed that the attacker had returned with Perfctl, a cryptomining rootkit engineered to hide its presence. Build-time scanners and cloud security posture management tools never saw it. Network-based intrusion detection and intrusion prevention systems missed it. And standard host and application logs – especially with containers running as root in shared namespaces – offered no insight into in-memory or kernel-level exploits. Only real-time, in-container telemetry could expose and stop this threat.
As they traced the attacker’s movements, the SVP of technology was left confronting difficult questions: Was any customer data accessed? Were multiple services compromised? Could they contain the threat without taking the platform offline?
These were high-stakes questions during the most critical revenue period of the year, when even a small misstep could carry major consequences.
Within seconds of the Sysdig agents being restored, precise, context-rich, and immediately actionable alerts began streaming in. The rules surfaced Perfctl’s behavior patterns, including specific process spawns, system calls, and lateral movement attempts. From there, the investigation unfolded with surgical precision.
With guidance from Sysdig’s incident response experts, the engineering team acted swiftly, wiping and redeploying pods and containers to halt the attack.
Despite contending with a stealthy, evasive adversary during the most critical revenue window of the year, the company emerged unscathed. There was no downtime. No impact to customers. No evidence of data exfiltration.
But the real victory wasn’t just operational; it was strategic. The incident became a vivid proof point that a small, fast-moving engineering team could detect, contain, and recover from a complex cloud-native attack without sacrificing agility or business continuity.
More importantly, it sparked lasting change.
In the days that followed, the SVP of technology and his team translated the lessons of the incident into systemic improvements.
“This incident could have undermined customer trust and our peak-season performance,” said the SVP of Technology. “Instead, with Sysdig, we contained the threat and improved our security posture without missing a beat.”
This wasn’t just a lucky escape. It was cloud security executed the right way.