In the vast and rapidly expanding landscape of the cloud, security is no longer a peripheral concern but the very foundation of digital trust. As organizations migrate from on-premise data centers to hyperscale environments like AWS, the traditional security perimeters have dissolved, replaced by a complex web of API calls, identities, and distributed resources. To navigate this new reality, security practitioners must move beyond basic prevention and embrace a robust, data-driven strategy centered on detective controls.
We sat down with Kailash Havildar, a Senior Security Consultant at AWS, to discuss the critical role of logging and monitoring in modern cybersecurity. With years of experience ranging from Splunk administration to developing native detection controls at AWS, Kailash shared his insights on building scalable security postures, the challenges of log centralization, and the power of anomaly detection.
You can read the complete transcript of the episode here.
What are the Three Pillars of Cloud Security?
Every cloud security strategy is built upon three fundamental pillars: prevention, detection, and remediation.
- Prevention: These are controls designed before an event can occur. In the cloud, this involves setting up guardrails that block unauthorized actions or insecure configurations, such as preventing S3 buckets from being public-facing or ensuring databases are encrypted at launch.
- Detection: This focuses on identifying events that have already occurred. It relies on robust logging and monitoring systems to surface suspicious activities through rules, signatures, or behavior-based alerts.
- Remediation: This is the response to a detected event, which can range from a simple user notification to automated actions such as isolating a compromised resource or reverting a broad firewall rule.
How Should Organizations Structure Preventive Controls in the Cloud?
For prevention, Kailash recommends a top-down approach that has proven successful in large-scale AWS deployments:
- Region Controls: Define which geographic regions are authorized to host your infrastructure (see the policy sketch after this list).
- Service Controls: Determine which specific cloud services (e.g., EC2, S3, RDS) are allowed for use by customers or internal users.
- Configuration Controls: Within allowed services and regions, implement granular rules, such as requiring encryption for every database spun up.
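To make the region-control layer concrete, the sketch below uses boto3 to create and attach a Service Control Policy that denies activity outside two approved regions. It is only an illustration: the region list, policy name, and organizational unit ID are placeholders, and a real guardrail would be tuned to your own landing-zone design.

```python
import json
import boto3

# Hypothetical guardrail: deny all actions outside two approved regions.
# Region list, policy name, and target OU ID are placeholders.
APPROVED_REGIONS = ["us-east-1", "eu-west-1"]

region_guardrail = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOutsideApprovedRegions",
            "Effect": "Deny",
            "NotAction": ["iam:*", "organizations:*", "sts:*"],  # keep global services usable
            "Resource": "*",
            "Condition": {"StringNotEquals": {"aws:RequestedRegion": APPROVED_REGIONS}},
        }
    ],
}

org = boto3.client("organizations")

policy = org.create_policy(
    Name="region-guardrail",
    Description="Block activity outside approved regions",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(region_guardrail),
)

# Attach the guardrail to a (placeholder) organizational unit.
org.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-xxxx-exampleid",
)
```

Service controls follow the same pattern with an explicit deny on unapproved service actions, and configuration controls typically add condition keys such as requiring encryption parameters on resource creation.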
What is the Role of Logging and Monitoring in Detective Controls?
Kailash describes logging and monitoring as the “CCTVs for your digital infrastructure”. They are essential not just for security, but also for compliance and troubleshooting.
- Logging: The act of collecting data from various sources, including resources, audit logs, and application data.
- Monitoring: The act of actively looking into and analyzing the collected data to identify issues.
To implement this effectively, organizations should refer to established frameworks like NIST 800-53, specifically the “Audit and Accountability” (AU) control family, which provides a baseline for best practices. Key steps include centralizing all logs, implementing strict access controls, and ensuring data is encrypted both in transit and at rest.
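As a small illustration of those baseline steps, the boto3 sketch below hardens a hypothetical central log bucket: default KMS encryption at rest plus a full public-access block. The bucket name and key ARN are placeholders, and transport encryption is already handled by TLS when logs are shipped to S3.

```python
import boto3

s3 = boto3.client("s3")

LOG_BUCKET = "example-central-security-logs"  # placeholder bucket name
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/example"  # placeholder key

# Default encryption at rest for every object written to the log bucket.
s3.put_bucket_encryption(
    Bucket=LOG_BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": KMS_KEY_ARN,
                }
            }
        ]
    },
)

# Strict access control: no form of public access to the log archive.
s3.put_public_access_block(
    Bucket=LOG_BUCKET,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```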
What Types of Logs and Events Should Be Prioritized?
While the order of priority may shift depending on whether you are a developer or a security professional, Kailash recommends the following sequence for security-focused logging:
- User Logs: Tracking who is doing what, at what time, and using which IAM role. In AWS, this is primarily achieved through CloudTrail logging across all accounts.
- System/Configuration Changes: Monitoring any modifications to the environment’s configuration.
- Network Logs: Specifically, VPC Flow Logs to understand traffic patterns.
- Application and Database Logs: Capturing security events and audit trails within the software and data layers.
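The first and third items map to two well-known AWS calls. The sketch below enables an organization-wide, multi-region CloudTrail trail and VPC Flow Logs for a single VPC; the trail name, bucket, and VPC ID are placeholders, and an organization trail assumes the call runs from the management or delegated administrator account.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")
ec2 = boto3.client("ec2")

# User/API activity: one organization-wide, multi-region trail (placeholder names).
cloudtrail.create_trail(
    Name="org-security-trail",
    S3BucketName="example-central-security-logs",
    IsMultiRegionTrail=True,
    IsOrganizationTrail=True,
)
cloudtrail.start_logging(Name="org-security-trail")

# Network traffic patterns: VPC Flow Logs delivered to the same central bucket.
ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],  # placeholder VPC ID
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::example-central-security-logs",
)
```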
What are the Biggest Challenges in Implementing Cloud Logging?
Greenfield customers and maturing organizations often encounter significant hurdles when setting up their logging infrastructure:
- Cost Management: Centralizing vast amounts of data into a Security Information and Event Management (SIEM) solution often leads to high license and data retention costs.
- Verbosity and Field Selection: By default, many logs may only provide a few basic fields. The information required for deep analysis is often contained in non-default fields that must be explicitly enabled.
- Alert Fatigue: Generating too many alerts can lead to a team ignoring them. If an alert fires every five minutes, the team will eventually lose trust in the data.
- Log Parsing and Standardization: Parsing complex data from multiple sources requires sophisticated regular expressions (Regex) to ensure that logs from different systems (like two different firewalls) can be cross-correlated effectively.
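The parsing challenge in the last item is easiest to see in a short sketch. The two line formats and field names below are invented for hypothetical firewalls, but they show the idea: raw lines are normalized into one common schema so the SIEM can cross-correlate them.

```python
import re

# Two made-up firewall line formats, normalized into one schema so that
# source/destination fields can be cross-correlated in the SIEM.
PATTERNS = {
    "fw_a": re.compile(
        r"(?P<ts>\S+) DENY src=(?P<src>[\d.]+) dst=(?P<dst>[\d.]+) dport=(?P<dport>\d+)"
    ),
    "fw_b": re.compile(
        r"(?P<ts>\S+) action=drop from (?P<src>[\d.]+):\d+ to (?P<dst>[\d.]+):(?P<dport>\d+)"
    ),
}

def normalize(line: str):
    """Return a common event dict regardless of which firewall produced the line."""
    for source, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            event = match.groupdict()
            event["log_source"] = source
            return event
    return None  # unparsed lines should be counted and reviewed, not silently dropped

print(normalize("2024-05-01T10:00:00Z DENY src=10.0.0.5 dst=203.0.113.9 dport=443"))
print(normalize("2024-05-01T10:00:02Z action=drop from 10.0.0.7:51544 to 198.51.100.2:22"))
```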
How Can SIEM Systems Provide Advanced Security Capabilities?
A SIEM solution goes beyond simple rule-based alerts (e.g., “if X happens, then alert”) to provide anomaly detection.
- Volumetric Anomalies: Detecting sudden spikes in data—for example, if a user who typically downloads 5 MB of data suddenly attempts to download 5 GB (a toy version of this check is sketched after this list).
- Behavioral Cross-Correlation: Linking different log types to uncover threats. A classic example is correlating Windows Event Code 4624 (successful login) with IP geolocation data. If a login occurs from a different country while the user is physically in the office, it signals a major security concern.
- Threat Intel Integration: Matching internal network traffic against regularly updated lists of known malicious IP addresses.
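A toy version of the volumetric check can be written in a few lines of Python. The three-standard-deviation threshold and the sample figures below are arbitrary choices for illustration, not the algorithm any particular SIEM actually uses.

```python
import statistics

def is_volumetric_anomaly(history_mb, latest_mb, sigmas=3.0):
    """Flag the latest download volume if it sits far outside the user's baseline."""
    baseline = statistics.mean(history_mb)
    spread = statistics.stdev(history_mb) or 1.0  # avoid a zero threshold on flat baselines
    return latest_mb > baseline + sigmas * spread

# A user who normally moves ~5 MB suddenly pulls 5 GB (5000 MB).
typical_days = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7]
print(is_volumetric_anomaly(typical_days, 5000.0))  # True -> raise an alert for review
```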
How Does Automation Improve Log Analysis and Incident Response?
Given the volume of data, manual analysis simply does not scale. Automation, delivered through SIEM scheduling features and SOAR (Security Orchestration, Automation, and Response) platforms, is the key to keeping pace.
- Scheduled Tasks: Mission-critical tasks, such as updating malicious IP lists or running machine learning algorithms for behavioral analysis, should be automated to run regularly (e.g., every night).
- Insider Threat Monitoring: Automation can flag suspicious patterns, such as an interactive login by a service account—an event that typically shouldn’t happen and may indicate a compromise.
- Automated Remediation: Starting with a “soft landing” (notifications) and moving toward automated actions ensures that security issues are addressed at scale without alienating application teams.
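As one hedged example of the "soft landing" idea in the last item, the sketch below notifies before it acts and only then quarantines a suspect EC2 instance by swapping its security groups. The SNS topic, security group, and instance IDs are placeholders, and a real SOAR playbook would add approvals, audit logging, and rollback.

```python
import boto3

ec2 = boto3.client("ec2")
sns = boto3.client("sns")

QUARANTINE_SG = "sg-0123456789abcdef0"  # placeholder: deny-all security group
ALERT_TOPIC = "arn:aws:sns:us-east-1:111122223333:security-alerts"  # placeholder topic

def remediate(instance_id, notify_only=True):
    """Soft landing first: notify the owning team, then optionally isolate the instance."""
    sns.publish(
        TopicArn=ALERT_TOPIC,
        Subject="Suspicious activity detected",
        Message=f"Instance {instance_id} flagged; quarantine={'no' if notify_only else 'yes'}",
    )
    if not notify_only:
        # Replace all security groups with a deny-all group to isolate the workload.
        ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[QUARANTINE_SG])

remediate("i-0abc1234def567890", notify_only=True)
```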
How Do You Secure the Logging Systems Themselves?
A critical question is: “Who is going to police the police?” Logging systems are prime targets for attackers looking to cover their tracks.
- Role-Based Access Control (RBAC): Within the SIEM, restrict access so that teams only see the data they need (e.g., security sees Windows logs, while infrastructure sees networking logs).
- Audit Logging for Logs: The SIEM itself must log who is performing queries or trying to delete data.
- Tamper Detection: Set alerts for specific event codes, such as Windows Event 1102, which indicates that log data has been deleted.
- Lifecycle Management: Use tiered storage (e.g., S3 standard for 3 months, then Glacier) to balance the need for fast retrieval with long-term compliance storage requirements, which can range from one to seven years.
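The tiered-storage pattern in the last item maps directly onto an S3 lifecycle rule. The sketch below keeps log objects in S3 Standard for roughly three months, transitions them to Glacier, and expires them after about seven years; the bucket name and retention figures are placeholders to adapt to your compliance requirements.

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket: 90 days hot, then Glacier, expired after ~7 years (2555 days).
s3.put_bucket_lifecycle_configuration(
    Bucket="example-central-security-logs",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "log-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 2555},
            }
        ]
    },
)
```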
What Metrics Justify the Investment in Logging and Monitoring?
To show Return on Investment (ROI) to leadership, security teams should focus on the quality and impact of detections rather than raw volume:
- Alert Accuracy: Achieving a high proportion of true positives relative to false positives, with roughly 90-95% of fired alerts proving genuine. This ensures that executives see the alerts as legitimate issues requiring action.
- Coverage: Demonstrating comprehensive logging across all applications and networking stacks within the organization.
- Incident Response Speed: Tracking how quickly an issue is resolved once an alert is fired.
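These figures are straightforward to compute once alert outcomes and resolution times are tracked. The numbers below are invented purely to show the arithmetic behind an alert-accuracy percentage and a mean time to resolve.

```python
from datetime import timedelta

# Invented monthly figures, just to show the arithmetic behind the metrics.
true_positives = 47
false_positives = 3
resolution_times = [timedelta(minutes=m) for m in (18, 25, 40, 12, 33)]

accuracy = true_positives / (true_positives + false_positives) * 100
mean_time_to_resolve = sum(resolution_times, timedelta()) / len(resolution_times)

print(f"Alert accuracy: {accuracy:.1f}%")               # 94.0% -> within the 90-95% target
print(f"Mean time to resolve: {mean_time_to_resolve}")  # 0:25:36
```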
Conclusion: The Path to Cloud Resilience
Detective controls, fueled by a robust logging and monitoring infrastructure, are the vital link between identifying a threat and successfully remediating it. As Kailash Havildar emphasizes, the cloud’s scale requires a shift from manual oversight to automated, cross-correlated analysis. By mastering the basics—centralizing data, applying the NIST 800-53 framework, and relentlessly fine-tuning alerts to reduce fatigue—organizations can transform their logs from a “black hole” of data into a powerful, proactive defense system. Ultimately, the goal is to build an environment where security isn’t just a hurdle, but an automated, visible, and indispensable part of the business.