Square Enix Secures Gaming Platforms at Scale
Runtime visibility enabled faster, evidence-based investigations
Company Overview
Square Enix is a global entertainment company that develops and operates online games, including long-running franchises such as Final Fantasy and Dragon Quest. These platforms support live services used by millions of players worldwide, with highly variable traffic patterns, where availability and stability are critical to the player experience.
The Site Reliability Engineering (SRE) team is responsible for the cloud infrastructure that underpins these services. Operating shared backend platforms on Google Kubernetes Engine (GKE), the team supports multiple game titles while managing rapid scaling, ongoing development, and the security demands that come with running always-on online systems.
Business Challenges
- Limited runtime visibility across a growing container environment, obscuring real security risks
- Manual, log-heavy investigations that slowed response times and increased operational effort
- Poor visibility into vulnerable libraries and dependencies, requiring manual exposure tracking
- Unpredictable traffic spikes tied to live game activity, demanding security that would not slow development or impact availability
Challenges
Operating at Scale Without Clear Signals
Square Enix runs platforms where demand is unpredictable. Traffic can surge suddenly around live events, releases, or player behavior, and the infrastructure must respond immediately. Kubernetes made that level of elasticity possible, but it also changed the security equation.
In containerized environments, traditional perimeter and access controls provide only part of the story. They can limit who connects to systems, but they do not explain what workloads actually do once running.
“Restricting access alone wasn’t enough. We needed to understand what was actually happening inside our systems, not just control who could reach them.”
Masahiro Sato,
Team Leader, Site Reliability Engineering, Square Enix
As the platform grew more dynamic, investigations often began only after suspicious activity or alerts surfaced. Teams had to reconstruct events from logs to determine whether behavior was truly malicious, and if so, how far it had spread across the environment. That process introduced delays that made confident response difficult, particularly during periods of peak activity.
This reactive model carried real consequences. Move too slowly, and exposure increased. Move too aggressively, and live services or development teams could be disrupted. Without clear, real-time context, decisions depended heavily on experience and judgment rather than evidence.
At the same time, application risks became harder to manage. Vulnerabilities surfaced constantly across shared platforms, but the team could not consistently translate that information into clear, actionable guidance for developers. Any security approach also had to operate continuously as workloads changed, without adding manual effort or slowing development.
Solutions
Gaining Real-Time Visibility Into Runtime Behavior
The team prioritized runtime visibility, knowing that they needed direct insight into how workloads behaved in production. With the Sysdig platform, they gained continuous runtime context across their container and Kubernetes workloads, without adding operational friction. Security shifted earlier in the lifecycle, and investigations became faster and more rooted in direct evidence.
Sysdig gave the SRE team direct visibility into what was happening inside their running systems, allowing investigations to begin with real-time context instead of after an incident had already unfolded.
Before Sysdig, incident investigations were labor-intensive and slow. “We would download access logs stored on multiple servers, then meticulously check each line using complex queries,” said Sumito Amagasaki, a member of the SRE team. “It took a significant number of people and a lot of time just to understand the scope of impact.”
Runtime visibility eliminated much of that overhead. Teams could see events unfold directly, assess impact faster, and respond with greater confidence during active incidents.
Bringing Vulnerability Awareness Earlier Into Development
Beyond runtime detection, Sysdig helped Square Enix change how vulnerability management actually worked day to day. Instead of relying on severity scores alone, the team could focus on vulnerabilities that were both present in the environment and actionable in practice. This runtime-powered prioritization reduced alert noise and shifted remediation efforts toward true risk based on exposure.
“The cost of vulnerability inventory management was extremely high. By narrowing alerts to what actually required action, we freed up resources that were previously wasted on verification and validation, and drastically reduced vulnerability response times.”
Chihiro Yamamoto,
Site Reliability Engineering, Square Enix
With clearer context and remediation guidance available immediately, developers no longer had to spend time researching vulnerability details or determining how to address them. SRE team members gained a more consistent way to assess exposure across shared platforms, while development teams could move directly from detection to remediation and knew what to prioritize when planning their work. By aligning vulnerability data with runtime behavior, the organization reduced friction between security and engineering, reducing risk while maintaining production stability.
Supporting Scale Without Adding Operational Burden
Any security solution had to work seamlessly with the infrastructure behind Square Enix’s online games. Sysdig met that requirement by continuing to detect threats and vulnerabilities as workloads expanded and contracted, without requiring operator intervention.
“Game traffic can spike suddenly,” Sato said. “We needed security that would continue working even as the environment changed.”
By removing the need for manual oversight during traffic surges, Sysdig allowed security controls to remain consistent during peak activity, when operational pressure was highest.
Creating Shared Understanding Across the Team
Sysdig also changed how the team approached investigations. With direct access to runtime data, system call activity, and clear remediation guidance, engineers could evaluate risk using the same evidence, regardless of role or experience. When logs and metrics weren’t enough, they could drill into runtime behavior to analyze complex network activity and understand how systems were communicating at the kernel level.
“When a vulnerability is found, it’s not always obvious where to start,” said Hikari Araya, a member of the SRE team. “With Sysdig, the recommendations are clear, so I can begin investigating right away.”
That clarity reduced reliance on a small group of senior engineers and helped maintain consistent investigation quality as responsibilities expanded. Security work became easier to distribute across the team without compromising rigor.
To learn more about Square Enix, visit square-enix.com.
