SaaS

Kubernetes Platform with Built-In FinOps

How a Series B SaaS team cut non-production AWS spend and gave every squad a clear view of namespace costs.

Client context

Profile
Series B B2B SaaS, ~85 engineers, 40 services on AWS EKS. Six squads spin up namespaces through a self-service portal.
Engagement
Embedded platform engineer + FinOps dashboards
Duration
Multi-month
Team
1 FOO platform engineer embedded with the client platform team (3 engineers)

Stack

  • Amazon EKS
  • Karpenter
  • Terraform
  • Kubecost
  • Grafana
  • AWS Cost Explorer

Challenge

Non-production AWS spend hit ~$48k/month with no breakdown by squad. Namespaces created for demos kept running 24/7 after the demos ended. Rightsizing was ad hoc. Finance got one lump-sum invoice each month.

Milestones

  1. Tagging standard agreed

    Platform and finance agreed on required labels: team, environment, cost-center. We captured baseline spend and fixed untagged resources.

    Next step: Tags enforced at admission time

  2. Guardrails live in the cluster

    ResourceQuotas, LimitRanges, and Karpenter consolidation policies went in. Dev clusters scale down nights and weekends. First two squads onboarded.

    Next step: Per-squad cost visibility

  3. Finance signed off on dashboards

    Each squad got a Grafana board: month-to-date spend, top services by cost, anomaly flags. Monthly export matches finance’s allocation model.

    Next step: Squads can act on their own costs

  4. Squads handle teardown and rightsizing

    We wrote short guides on rightsizing requests and limits, and on when to tear down ephemeral environments. The platform team owns ongoing tuning.
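Milestone 1 ends with tags being enforced at admission time. The case study does not say which mechanism did the enforcing. What follows is a minimal Python sketch of the check a validating admission webhook could run on Namespace objects. The three label keys come from the agreed standard. The function names, the 403 message text, and the sample namespace are hypothetical. The webhook server and its registration are not shown.

```python
# Minimal sketch of a validating-webhook check for required cost labels.
# Assumption: the webhook is registered for Namespace CREATE/UPDATE; the HTTP
# server and the ValidatingWebhookConfiguration are not shown.

# Label keys from the tagging standard agreed with finance.
REQUIRED_COST_LABELS = ("team", "environment", "cost-center")


def missing_cost_labels(labels):
    """Return required label keys that are absent or empty, in a stable order."""
    labels = labels or {}
    return [key for key in REQUIRED_COST_LABELS if not labels.get(key)]


def review_namespace(admission_review):
    """Build an admission.k8s.io/v1 AdmissionReview response for one request."""
    request = admission_review["request"]
    obj = request.get("object") or {}
    metadata = obj.get("metadata") or {}
    missing = missing_cost_labels(metadata.get("labels"))

    response = {"uid": request["uid"], "allowed": not missing}
    if missing:
        # Hypothetical message text; the API server passes it back to the caller.
        response["status"] = {
            "code": 403,
            "message": "missing required cost labels: " + ", ".join(missing),
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


if __name__ == "__main__":
    # Hypothetical request for a namespace that lacks the cost-center label.
    sample = {
        "request": {
            "uid": "example-uid",
            "object": {"metadata": {"name": "demo-checkout",
                                    "labels": {"team": "payments", "environment": "dev"}}},
        }
    }
    result = review_namespace(sample)["response"]
    print(result["allowed"], result["status"]["message"])
```

Rejecting unlabeled namespaces at creation keeps label coverage from drifting once it has been cleaned up. Namespaces that already exist still need the one-time fix-up described in milestone 1.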

Solution summary

We enforced cost tags at the platform layer, added namespace quotas and default resource requests, scaled down non-production clusters outside business hours, and built per-squad cost dashboards in Grafana (fed by Kubecost).
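The summary says non-production clusters scale down outside business hours, but it gives neither the schedule nor the tooling. As an illustration only, the Python sketch below assumes a Monday-to-Friday window from 08:00 to 19:00 local time. It also assumes a scheduled job, such as a Kubernetes CronJob, that decides each dev workload's replica count. The hours, the function names, and the all-or-nothing replica rule are all hypothetical.

```python
from datetime import datetime, time

# Hypothetical business window; the case study does not state the real hours.
BUSINESS_DAYS = range(0, 5)      # datetime.weekday(): Monday=0 .. Friday=4
BUSINESS_START = time(8, 0)      # inclusive
BUSINESS_END = time(19, 0)       # exclusive


def in_business_hours(local_now: datetime) -> bool:
    """True if local_now (already converted to the team's time zone) is inside the window."""
    return (
        local_now.weekday() in BUSINESS_DAYS
        and BUSINESS_START <= local_now.time() < BUSINESS_END
    )


def desired_dev_replicas(local_now: datetime, daytime_replicas: int) -> int:
    """Replica count a scheduled job would apply to a non-production workload."""
    return daytime_replicas if in_business_hours(local_now) else 0


if __name__ == "__main__":
    # A Saturday afternoon falls outside the window, so the workload scales to zero.
    print(desired_dev_replicas(datetime(2024, 3, 2, 14, 0), daytime_replicas=3))
```

The job only decides the target count. Applying it (for example with `kubectl scale`) is left out. Scaling workloads to zero leaves nodes empty, and Karpenter's consolidation can then remove them. That step is where the compute savings actually come from.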

Results

Metric | Before | After
Non-production monthly spend | ~$48k | ~$31k (−35%)
Namespaces with a cost-owner label | ~40% | 100%
Idle namespaces flagged | Ad hoc | Within the first sprint
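The table reports that idle namespaces were flagged within the first sprint, but not how idleness was detected. The Python sketch below shows one simple heuristic under stated assumptions. A day counts as idle when a namespace's average CPU usage stays under a small threshold. A namespace is flagged when its most recent days have all been idle. The thresholds, the daily-CPU input (taken from whatever metrics source is already in place), and the names are hypothetical. The owner lookup assumes the `team` label has already been mapped per namespace.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical thresholds; the case study does not define "idle".
IDLE_CPU_CORES = 0.05     # a day with average CPU below this counts as idle
IDLE_DAYS_REQUIRED = 7    # consecutive most-recent idle days before flagging


@dataclass
class IdleFlag:
    namespace: str
    team: str
    idle_days: int


def trailing_idle_days(daily_cpu: Sequence[float], threshold: float = IDLE_CPU_CORES) -> int:
    """Count consecutive idle days at the end of the series (most recent day last)."""
    count = 0
    for cores in reversed(daily_cpu):
        if cores >= threshold:
            break
        count += 1
    return count


def flag_idle_namespaces(usage: Dict[str, List[float]], owners: Dict[str, str]) -> List[IdleFlag]:
    """Return namespaces idle for at least IDLE_DAYS_REQUIRED days, sorted by name.

    `owners` maps namespace -> value of its `team` label.
    """
    flags = []
    for namespace, daily_cpu in sorted(usage.items()):
        idle = trailing_idle_days(daily_cpu)
        if idle >= IDLE_DAYS_REQUIRED:
            flags.append(IdleFlag(namespace, owners.get(namespace, "unowned"), idle))
    return flags
```

Because every namespace carries an owner label, each flag can be routed straight to the squad that owns it. Deciding whether to tear the namespace down stays with that squad.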

Outcomes

  • Non-production spend down 35% in 60 days, with production capacity untouched
  • Squads trusted self-service because they could see what their environments cost
  • Finance gets monthly reports by squad instead of one opaque total
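The per-squad reports come from the Kubecost-fed Grafana boards and a monthly export matched to finance's allocation model. The case study describes neither schema. The grouping step behind such a report can be sketched in Python as below. The `CostRow` shape is hypothetical and is not Kubecost's export format. Charging unlabeled spend to an `unallocated` bucket is also an assumption, so that this spend stays visible instead of silently dropping out of the totals. Shared costs such as idle node capacity or cluster overhead are not redistributed here.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, NamedTuple


class CostRow(NamedTuple):
    """One day of cost for one namespace (hypothetical export shape)."""
    day: date
    namespace: str
    labels: Dict[str, str]
    cost_usd: float


def spend_by_team(rows: Iterable[CostRow], year: int, month: int) -> Dict[str, float]:
    """Sum cost per `team` label for one calendar month, largest spender first."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        if (row.day.year, row.day.month) != (year, month):
            continue  # outside the reporting month
        # Assumption: spend without a team label is reported, not dropped.
        totals[row.labels.get("team", "unallocated")] += row.cost_usd
    # Round to cents and order by spend for the finance export.
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {team: round(total, 2) for team, total in ordered}
```

Keeping the `unallocated` line in the report gives finance and the platform team a direct measure of how much spend still escapes the tagging standard.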