Configure the autoscaling feature

Autoscaling enables your application to handle varying loads efficiently by dynamically adjusting the number of replicas. Depending on your needs, you can choose between the Kubernetes Horizontal Pod Autoscaler (HPA) for standard resource metrics and Keda for event-driven scaling on custom metrics.

Autoscale with HPA (Horizontal Pod Autoscaler)

HPA allows you to scale your application based on standard resource metrics, such as CPU or memory utilization. Below is an example of configuring an HPA resource:

hpa.yaml
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-deployment
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
```

Find more guidance in the Kubernetes horizontal-pod-autoscale documentation.
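The scaling decision HPA makes follows its documented formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the configured bounds. A minimal Python sketch (the function name is illustrative; the defaults mirror the min/max and target from hpa.yaml above):

```python
import math

def hpa_desired_replicas(current_replicas, current_utilization, target_utilization=75,
                         min_replicas=1, max_replicas=4):
    """Core HPA formula: desired = ceil(current * (metric / target)),
    clamped to the configured minReplicas/maxReplicas."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

print(hpa_desired_replicas(2, 150))  # 2 pods at 150% CPU -> scale up to 4
print(hpa_desired_replicas(4, 30))   # 4 pods at 30% CPU  -> scale down to 2
```

This also shows why `averageUtilization: 75` leaves headroom: the controller only adds replicas once average CPU across pods exceeds 75% of the requested amount.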

Autoscale with Keda

Keda takes autoscaling a step further by enabling event-driven scaling based on custom metrics, such as queue length, database events, or Prometheus queries. Below is an example configuration using Prometheus as a trigger:

scaled-object.yaml
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sample-keda-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-deployment
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://kube-prometheus-stack-prometheus.k8saas-system.svc.cluster.local.:9090
        metricName: nginx_ingress_requests
        threshold: '30'
        query: |
          sum(rate(nginx_ingress_controller_requests{namespace="sample-namespace",service="sample-service",status!~"[4-5].*"}[2m]))
        activationThreshold: '5'
```

This PromQL query calculates the rate of successful requests to the NGINX Ingress Controller for a specific service in the sample-namespace namespace. It measures the per-second request rate over a 2-minute window, excluding requests with 4xx and 5xx HTTP status codes, and sums the rates across all matching time series. Keda compares that sum against the threshold and fires the trigger when it is met.
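For a Prometheus trigger like this one, Keda derives the target replica count by dividing the observed metric value by the threshold. A rough sketch of that decision (a simplification with illustrative names; the real controller also handles polling intervals, cooldown periods, and the hand-off to HPA, and the activation threshold primarily governs scaling from/to zero):

```python
import math

def keda_target_replicas(metric_value, threshold=30, activation_threshold=5,
                         min_replicas=2, max_replicas=10):
    """Approximate Keda scaling: below the activation threshold the scaler
    is inactive; when active, replicas ~= ceil(metric / threshold),
    clamped to minReplicaCount/maxReplicaCount."""
    if metric_value <= activation_threshold:
        return min_replicas  # inactive: stay at the configured floor
    desired = math.ceil(metric_value / threshold)
    return max(min_replicas, min(max_replicas, desired))

print(keda_target_replicas(90))   # ~90 req/s -> 3 replicas
print(keda_target_replicas(400))  # high load -> clamped at maxReplicaCount, 10
```

With the values from scaled-object.yaml above, each replica is expected to absorb roughly 30 requests per second before another one is added.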

Find more guidance at Keda scaled object specifications.

Note: Make sure to switch the documentation version to your Keda version.

To confirm that autoscaling works as expected, follow these steps:

  1. Load Testing: Use a load testing tool (such as k6 or Locust) to generate traffic and test your application’s ability to scale.
  2. Check Pod Status: Run the following command to verify that additional pods have been created to handle the load:

```shell
kubectl get pods
```
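If a dedicated tool such as k6 or Locust is not at hand, step 1 can be approximated with a small stdlib-only Python load generator. This is a minimal sketch: the URL, request count, and concurrency below are placeholders to point at your own service endpoint.

```python
import concurrent.futures
import urllib.request

def generate_load(url, total_requests, concurrency=10):
    """Send `total_requests` GET requests to `url` using a thread pool
    and return a histogram of the HTTP status codes received."""
    def hit(_):
        with urllib.request.urlopen(url) as resp:
            return resp.status

    counts = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        for status in pool.map(hit, range(total_requests)):
            counts[status] = counts.get(status, 0) + 1
    return counts

# Example (hypothetical endpoint):
# generate_load("http://sample-service.sample-namespace/", 1000, concurrency=20)
```

While the generator runs, watch `kubectl get pods` (or `kubectl get hpa`) in another terminal to see replicas being added.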