GCP Kubernetes Monitoring

The number of Kubernetes clusters you run can grow quickly, and so can your pain points.

The following GCP Kubernetes checks are performed at a configurable frequency.

Web Dashboard Should Be Disabled

Ensures all Kubernetes clusters have the web dashboard disabled. It is recommended to disable the web dashboard because it is backed by a highly privileged service account.

Private Endpoints Should Be Enabled

Ensures the private endpoint setting is enabled for Kubernetes clusters. Kubernetes private endpoints can be used to route all traffic between the Kubernetes worker and control plane nodes over a private VPC endpoint rather than across the public internet.

Private Cluster Should Be Enabled

Ensures private cluster is enabled for all Kubernetes clusters. Kubernetes private clusters only have internal IP ranges, which ensures that their workloads are isolated from the public internet.
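As a rough illustration, the two private-networking checks above could be evaluated together. This is a minimal sketch, assuming the cluster description is available as a Python dictionary shaped like the GKE API Cluster resource. The function name is hypothetical, and the field names (`privateClusterConfig`, `enablePrivateNodes`, `enablePrivateEndpoint`) are assumed from that shape.

```python
def private_cluster_findings(cluster):
    """Return a list of findings; an empty list means both checks pass."""
    # Assumed field names, following the GKE API Cluster resource shape.
    private = cluster.get("privateClusterConfig") or {}
    findings = []
    if not private.get("enablePrivateNodes", False):
        findings.append("private cluster (private nodes) is not enabled")
    if not private.get("enablePrivateEndpoint", False):
        findings.append("private endpoint is not enabled")
    return findings
```

Returning a list of findings rather than a single boolean lets one pass over the cluster report each failed setting separately.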

Pod Security Policy Should Be Enabled

Ensures pod security policy is enabled for all Kubernetes clusters. Kubernetes pod security policy is a resource that controls security-sensitive aspects of the pod configuration.

Network Policy Should Be Enabled

Ensures all Kubernetes clusters have network policy enabled. Kubernetes network policy creates isolation between cluster pods, which provides a more secure environment in which only specified connections are allowed.

Monitoring Should Be Enabled

Ensures all Kubernetes clusters have monitoring enabled.

Master Authorized Network Should Be Enabled

Ensures master authorized networks is enabled on Kubernetes clusters.

Logging Should Be Enabled

Ensures all Kubernetes clusters have logging enabled. This setting should be enabled to ensure Kubernetes control plane logs are properly recorded.

Legacy Authorization Should Be Disabled

Ensures legacy authorization is disabled on Kubernetes clusters. The legacy authorizer in Kubernetes grants broad, statically defined permissions.

Default Service Accounts Should Not Be Used

Ensures Kubernetes cluster nodes do not use the default service account. Kubernetes cluster nodes should use customized service accounts that have the minimal privileges needed to run. This reduces the attack surface in the case of a malicious attack on the cluster.
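The intent of this check, flagging node pools that run as the default service account, can be sketched as follows. This is an illustration only: it assumes a dictionary shaped like the GKE API Cluster resource, treats an unset `serviceAccount` as the default account, and recognizes the Compute Engine default account by its email suffix. The function name is hypothetical.

```python
# Email suffix of the Compute Engine default service account.
DEFAULT_SUFFIX = "-compute@developer.gserviceaccount.com"


def pools_using_default_service_account(cluster):
    """Return the names of node pools that run as the default service account."""
    flagged = []
    for pool in cluster.get("nodePools", []):
        config = pool.get("config") or {}
        # Assumption: a missing or empty value means the default account is used.
        account = config.get("serviceAccount") or "default"
        if account == "default" or account.endswith(DEFAULT_SUFFIX):
            flagged.append(pool.get("name", "<unnamed>"))
    return flagged
```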

Container-Optimized OS Should Be Enabled

Ensures all Kubernetes cluster nodes have Container-Optimized OS enabled. Container-Optimized OS is optimized to enhance node security. It is backed by a team at Google that can quickly patch it.

Cluster Should Have Limited Service Account Access

Ensures Kubernetes clusters are created with limited service account access scopes. Kubernetes service accounts should be limited in scope to the services necessary to operate the clusters.

Basic Authentication Should Be Disabled

Ensures basic authentication is disabled on Kubernetes clusters.

Automatic Node Upgrades Should Be Enabled

Ensures all Kubernetes cluster nodes have automatic upgrades enabled. Automatic upgrades keep each node current with the version running on the cluster master and ensure that the latest security patches are installed.

Automatic Node Repair Should Be Enabled

Ensures all Kubernetes cluster nodes have automatic repair enabled. When automatic repair on nodes is enabled, the Kubernetes engine performs health checks on all nodes, automatically repairing nodes that fail health checks. This ensures that the Kubernetes environment stays optimal.
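Because automatic upgrade and automatic repair are both per-node-pool settings, one pass over the node pools can cover both checks. The sketch below assumes a dictionary shaped like the GKE API Cluster resource, with `management.autoUpgrade` and `management.autoRepair` on each node pool. The function name is hypothetical.

```python
def pools_missing_auto_management(cluster):
    """Map each failing node pool name to the management features it lacks."""
    findings = {}
    for pool in cluster.get("nodePools", []):
        management = pool.get("management") or {}
        # A feature that is absent from the dictionary is treated as disabled.
        missing = [
            feature
            for feature in ("autoUpgrade", "autoRepair")
            if not management.get(feature, False)
        ]
        if missing:
            findings[pool.get("name", "<unnamed>")] = missing
    return findings
```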

Alias IP Ranges Should Be Enabled

Ensures all Kubernetes clusters have alias IP ranges enabled. Alias IP ranges allow users to assign ranges of internal IP addresses as aliases to a network interface.

Kubernetes Node Pool Autoscaling Should Be Enabled

Ensure that node pool autoscaling is enabled.

Kubernetes Boot Disk Should Be Encrypted With Customer Managed Keys

Ensure that boot disks on Kubernetes node pools are encrypted with customer-managed keys (CMK).

Integrity Monitoring Should Be Enabled For Kubernetes Node Pools

Ensure that Kubernetes node pools have Integrity Monitoring enabled.

Secure Boot Should Be Enabled For Kubernetes Node Pools

Ensure that Kubernetes node pools have secure boot enabled.

Shielded Nodes Should Be Used For Kubernetes Cluster

Ensure that shielded nodes are used in node pools.

Autoscaling Profile For Clusters Should Be Set To Optimize_Utilization or Balanced

Ensure that the cluster autoscaling profile is set to OPTIMIZE_UTILIZATION or BALANCED for optimal resource utilization.
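This check reduces to a set-membership test. A minimal sketch, assuming a dictionary shaped like the GKE API Cluster resource with the profile stored under `autoscaling.autoscalingProfile`. The function and constant names are hypothetical.

```python
# The two profile values the check accepts, as named in the description above.
ACCEPTED_PROFILES = {"OPTIMIZE_UTILIZATION", "BALANCED"}


def has_accepted_autoscaling_profile(cluster):
    """Return True when the cluster autoscaling profile is one of the accepted values."""
    autoscaling = cluster.get("autoscaling") or {}
    # A missing or unspecified profile is treated as a failure.
    return autoscaling.get("autoscalingProfile") in ACCEPTED_PROFILES
```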

Cluster Master Endpoint Should Not Be Global

Ensure that the endpoint of the cluster master is not public.

Latest Kubernetes Version Should Be Used

Ensure that the Kubernetes version is up to date.
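Version strings should be compared numerically rather than as text, because "1.27.10" sorts before "1.27.9" when compared as strings. The sketch below assumes GKE-style version strings of the form `MAJOR.MINOR.PATCH-gke.BUILD` and that the caller supplies the version to compare against. Both function names are hypothetical.

```python
import re

# Assumed format: MAJOR.MINOR.PATCH-gke.BUILD, for example "1.27.3-gke.100".
_VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)\.(\d+)-gke\.(\d+)")


def parse_gke_version(version):
    """Split a version string into a tuple of integers for numeric comparison."""
    match = _VERSION_PATTERN.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_up_to_date(current, latest):
    """Return True when `current` is at least as new as `latest`."""
    return parse_gke_version(current) >= parse_gke_version(latest)
```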

Client Certificate Authentication Should Not Be Used For Users

Kubernetes provides the option to use client certificates for user authentication. However, because there is no way to revoke these certificates when a user leaves an organization or loses their credential, they are not suitable for this purpose. It is not possible to fully disable client certificate use within a cluster, as it is used for component-to-component authentication.
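One way to detect whether a cluster issues a client certificate for users is sketched below. It assumes a dictionary shaped like the GKE API Cluster resource, with the setting stored under `masterAuth.clientCertificateConfig.issueClientCertificate`. The function name is hypothetical.

```python
def issues_client_certificate(cluster):
    """Return True when the cluster is configured to issue a client certificate."""
    master_auth = cluster.get("masterAuth") or {}
    cert_config = master_auth.get("clientCertificateConfig") or {}
    # An absent setting is treated as "no client certificate issued".
    return bool(cert_config.get("issueClientCertificate", False))
```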

Ensure Image Vulnerability Scanning Is Performed

Scan images stored in Google Container Registry (GCR) for vulnerabilities.

Minimize Cluster Access To Read-Only For GCR

Configure the Cluster Service Account with the Storage Object Viewer role to allow only read-only access to GCR.

Minimize Container Registries To Only Approved Ones

Use Binary Authorization to allowlist (whitelist) only approved container registries.

Ensure GKE Clusters Are Not Using Default Service Account

Create and use minimally privileged Service accounts to run GKE cluster nodes instead of using the Compute Engine default Service account. Unnecessary permissions could be abused in the case of a node compromise.

Use Dedicated GCP Service Accounts And Workload Identity For Clusters

Kubernetes workloads should not use cluster node service accounts to authenticate to Google Cloud APIs. Each Kubernetes Workload that needs to authenticate to other Google services using Cloud IAM should be provisioned a dedicated Service account. Enabling Workload Identity manages the distribution and rotation of Service account keys for the workloads to use.

Ensure Kubernetes Secrets Are Encrypted Using KMS Keys

Encrypt Kubernetes secrets, stored in etcd, at the application layer using a customer-managed key in Cloud KMS.
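A sketch of how this setting might be verified, assuming a dictionary shaped like the GKE API Cluster resource with `databaseEncryption.state` and `databaseEncryption.keyName`. The function name is hypothetical.

```python
def secrets_encrypted_with_kms(cluster):
    """Return True when application-layer secrets encryption is on and a key is set."""
    encryption = cluster.get("databaseEncryption") or {}
    # Both conditions are required: the ENCRYPTED state and a non-empty key name.
    return encryption.get("state") == "ENCRYPTED" and bool(encryption.get("keyName"))
```

Requiring a non-empty key name in addition to the state guards against a partially populated description being reported as compliant.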

Ensure Legacy Compute Engine Instance Metadata APIs Are Disabled

Disable the legacy GCE instance metadata APIs for GKE nodes. Under some circumstances, these can be used from within a pod to extract the node's credentials.
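On GKE nodes the legacy endpoints are turned off through the `disable-legacy-endpoints` instance metadata key. The sketch below flags node pools where that key is not set to `"true"`, assuming a dictionary shaped like the GKE API Cluster resource with node metadata under `config.metadata`. The function name is hypothetical.

```python
def pools_with_legacy_metadata_api(cluster):
    """Return the names of node pools that have not disabled the legacy metadata APIs."""
    flagged = []
    for pool in cluster.get("nodePools", []):
        metadata = (pool.get("config") or {}).get("metadata") or {}
        # Metadata values are strings, so the comparison is against "true".
        if metadata.get("disable-legacy-endpoints") != "true":
            flagged.append(pool.get("name", "<unnamed>"))
    return flagged
```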

Ensure The GKE Metadata Server Is Enabled

Running the GKE Metadata Server prevents workloads from accessing sensitive instance metadata and facilitates Workload Identity.

Ensure Clusters Use Stable Release Channels

Subscribe to the Regular or Stable Release Channel to automate version upgrades of the GKE cluster and to reduce version management to a choice of the features and level of stability required.

Ensure Integrity Monitoring For Shielded GKE Nodes Is Enabled

Enable Integrity Monitoring for Shielded GKE Nodes to be notified of inconsistencies during the node boot sequence.

Enable VPC Flow Logs And Intranode Visibility

Enable VPC Flow Logs and Intranode Visibility to see pod-level traffic, even for traffic within a worker node.

Ensure Use Of VPC-Native Clusters

Create Alias IPs for the node network CIDR range in order to subsequently configure IP-based policies and firewalling for pods. A cluster that uses Alias IPs is called a 'VPC-native' cluster.

Ensure Stackdriver Kubernetes Logging And Monitoring Is Enabled

Send logs and metrics to a remote aggregator to mitigate the risk of local tampering in the event of a breach.

Ensure Authentication Using Client Certificates Is Disabled

Disable Client Certificates, which require certificate rotation, for authentication. Instead, use another authentication method like OpenID Connect.

Manage Kubernetes RBAC Users With Google Groups

Cluster Administrators should leverage G Suite Groups and Cloud IAM to assign Kubernetes user roles to a collection of users, instead of to individual emails using only Cloud IAM.

Ensure Kubernetes Web UI Is Disabled

The Kubernetes Web UI (Dashboard) has been a historical source of vulnerability and should only be deployed when necessary.

Ensure Alpha Clusters Are Not Used For Production

Alpha clusters are not covered by an SLA and are not production-ready.

Consider GKE Sandbox For Running Untrusted Workloads

Use GKE Sandbox to restrict untrusted workloads as an additional layer of protection when running in a multi-tenant environment.

Ensure Use Of Binary Authorization

Binary Authorization helps to protect supply-chain security by only allowing images with verifiable cryptographically signed metadata into the cluster.

Schedule Maintenance Windows And Exclusions

GKE clusters should schedule maintenance windows and exclusions to improve upgrade predictability and to align updates with off-peak business hours.

Upgrades And Updates Should Be Enabled

GKE clusters should proactively receive updates about GKE upgrades and GKE versions.

Control Plane Endpoint Access Should Be Limited To Authorized Networks

Control plane endpoint access should be limited to authorized networks only.

Enable Regional Redundancy For Maximum Availability

GKE clusters should be regional for maximum availability of the control plane during upgrades and zonal outages.

For Large Clusters L4 ILB Subsetting Should Be Used

GKE clusters with more than 250 nodes should use GKE L4 ILB Subsetting.
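The 250-node threshold makes this a conditional check: small clusters pass regardless of the setting. A minimal sketch, assuming a dictionary shaped like the GKE API Cluster resource, with the node count under `currentNodeCount` and the setting under `networkConfig.enableL4ilbSubsetting`. The function and constant names are hypothetical.

```python
# Threshold taken from the check description: more than 250 nodes.
SUBSETTING_NODE_THRESHOLD = 250


def needs_ilb_subsetting(cluster):
    """Return True when the cluster is large enough to need subsetting but lacks it."""
    node_count = cluster.get("currentNodeCount", 0)
    network = cluster.get("networkConfig") or {}
    enabled = bool(network.get("enableL4ilbSubsetting", False))
    return node_count > SUBSETTING_NODE_THRESHOLD and not enabled
```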

Clusters Should Have Network Policies Or Dataplane V2 Enabled

GKE clusters should have Network Policies or Dataplane V2 enabled.
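Either setting satisfies this check, so it can be expressed as a logical OR. The sketch assumes a dictionary shaped like the GKE API Cluster resource, where network policy is reported under `networkPolicy.enabled` and Dataplane V2 appears as `networkConfig.datapathProvider` set to `ADVANCED_DATAPATH`. The function name is hypothetical.

```python
def has_pod_traffic_controls(cluster):
    """Return True when network policy or Dataplane V2 is enabled."""
    network_policy = cluster.get("networkPolicy") or {}
    network = cluster.get("networkConfig") or {}
    policy_enabled = bool(network_policy.get("enabled", False))
    # Assumption: Dataplane V2 is reported as the ADVANCED_DATAPATH provider.
    dataplane_v2 = network.get("datapathProvider") == "ADVANCED_DATAPATH"
    return policy_enabled or dataplane_v2
```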

Cluster Should Use Node Local DNS Cache

GKE clusters should use node local DNS cache.

Node Pools Should Be Regional For High Availability

GKE node pools should be regional (spanning multiple zones) for maximum node availability during zonal outages.
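Treating "regional" as "spans at least two zones", the check can be sketched by counting the distinct zones of each node pool. This assumes a dictionary shaped like the GKE API Cluster resource, with each node pool listing its zones under `locations`. The function name is hypothetical.

```python
def single_zone_pools(cluster):
    """Return the names of node pools that span fewer than two distinct zones."""
    flagged = []
    for pool in cluster.get("nodePools", []):
        # Deduplicate so a repeated zone name does not count twice.
        zones = set(pool.get("locations") or [])
        if len(zones) < 2:
            flagged.append(pool.get("name", "<unnamed>"))
    return flagged
```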

Workload Identity Should Be Enabled

GKE clusters should have Workload Identity enabled.