SaaS
Kubernetes Platform with Built-In FinOps
How a Series B SaaS team cut non-production AWS spend and gave every squad a clear view of namespace costs.
Client context
- Profile: Series B B2B SaaS, ~85 engineers, 40 services on AWS EKS. Six squads spin up namespaces through a self-service portal.
- Engagement: Embedded platform engineer + FinOps dashboards
- Duration: Multi-month
- Team: 1 FOO platform engineer embedded with the client platform team (3 engineers)
Stack
- Amazon EKS
- Karpenter
- Terraform
- Kubecost
- Grafana
- AWS Cost Explorer
Challenge
Non-production AWS spend hit ~$48k/month with no breakdown by squad. Namespaces stayed running 24/7 after demos. Rightsizing was ad hoc. Finance got one lump-sum invoice each month.
Milestones
1. Tagging standard agreed
Platform and finance agreed on required labels: team, environment, cost-center. We captured baseline spend and fixed untagged resources.
Next step: Tags enforced at admission time
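To illustrate what an admission-time label check might look like, here is a minimal sketch in Python. It only shows the decision logic; it is not the webhook or policy used in the engagement. The three label keys come from the agreed standard above, while the function names `missing_cost_labels` and `admit_namespace` are hypothetical.

```python
# Required labels from the tagging standard agreed by platform and finance.
REQUIRED_LABELS = ("team", "environment", "cost-center")


def missing_cost_labels(labels):
    """Return the required labels that are absent or blank, in a stable order."""
    labels = labels or {}
    return [key for key in REQUIRED_LABELS if not (labels.get(key) or "").strip()]


def admit_namespace(labels):
    """Hypothetical admission decision: (allowed, message) for a namespace's labels."""
    missing = missing_cost_labels(labels)
    if missing:
        return False, "missing required labels: " + ", ".join(missing)
    return True, "ok"
```

A namespace labelled only with `team` would be rejected with a message naming `environment` and `cost-center`, which gives the requesting squad a concrete fix rather than a silent failure.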
2. Guardrails live in the cluster
ResourceQuotas, LimitRanges, and Karpenter consolidation policies went in. Dev clusters scale down nights and weekends. First two squads onboarded.
Next step: Per-squad cost visibility
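The nights-and-weekends rule can be sketched as a small scheduling check. The business-hours window below (08:00 to 19:00, Monday to Friday, in the cluster's local time) is an assumption for illustration; the actual window is not stated here, and `should_scale_down` is a hypothetical name.

```python
from datetime import datetime, time

# Assumed business hours; the real window was not specified in this case study.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(19, 0)


def should_scale_down(now: datetime) -> bool:
    """Return True when a dev cluster should be scaled down at the given local time."""
    # Saturday (5) and Sunday (6) are always off-hours.
    if now.weekday() >= 5:
        return True
    # On weekdays, scale down outside the business-hours window (end is exclusive).
    return not (BUSINESS_START <= now.time() < BUSINESS_END)
```

A scheduled job could call a check like this and adjust node pool or workload replica counts accordingly; how the scale-down itself is carried out depends on the cluster setup.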
3. Finance signed off on dashboards
Each squad got a Grafana board: month-to-date spend, top services by cost, anomaly flags. Monthly export matches finance’s allocation model.
Next step: Squads can act on their own costs
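The three dashboard views (month-to-date spend, top services by cost, anomaly flags) reduce to simple aggregations over per-service cost records. The sketch below assumes a hypothetical record shape of `{"team", "service", "cost"}` and a simple "above 1.5x the recent daily average" anomaly rule; neither is taken from the Kubecost data model or from the dashboards actually built.

```python
from collections import defaultdict


def month_to_date_by_squad(records):
    """Sum cost per team from records shaped like {"team", "service", "cost"}."""
    totals = defaultdict(float)
    for record in records:
        totals[record["team"]] += record["cost"]
    return dict(totals)


def top_services(records, team, n=3):
    """Return the n most expensive services for one team as (service, cost) pairs."""
    per_service = defaultdict(float)
    for record in records:
        if record["team"] == team:
            per_service[record["service"]] += record["cost"]
    return sorted(per_service.items(), key=lambda item: item[1], reverse=True)[:n]


def is_anomaly(recent_daily_costs, today_cost, factor=1.5):
    """Flag today's cost if it exceeds factor times the recent daily average."""
    if not recent_daily_costs:
        return False  # no baseline yet, so nothing to compare against
    baseline = sum(recent_daily_costs) / len(recent_daily_costs)
    return today_cost > factor * baseline
```

Keeping the anomaly rule this simple makes a flag easy for a squad to explain; a production setup might well use a different threshold or a seasonal baseline.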
4. Squads handle teardown and rightsizing
We wrote short guides on rightsizing requests/limits and on when to tear down ephemeral environments. The platform team owns ongoing tuning.
Solution summary
We enforced cost tags at the platform layer, added namespace quotas and default resource requests, scaled down non-production clusters outside business hours, and built per-squad cost dashboards in Grafana (fed by Kubecost).
Results
| Metric | Before | After |
|---|---|---|
| Non-production monthly spend | ~$48k | ~$31k (−35%) |
| Namespaces with cost owner label | ~40% | 100% |
| Idle namespaces flagged | Ad hoc | Within first sprint |
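
As a quick check on the headline figure, the reduction can be recomputed from the rounded before and after values in the table. The inputs are the approximate monthly figures in thousands of dollars, so the result is approximate too.

```python
def percent_reduction(before, after):
    """Percentage drop from before to after."""
    return (before - after) / before * 100


before_k = 48  # approximate non-production spend before, in $k/month
after_k = 31   # approximate non-production spend after, in $k/month

monthly_saving_k = before_k - after_k                      # 17, i.e. about $17k/month
reduction_pct = round(percent_reduction(before_k, after_k))  # 35
```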
Outcomes
- Non-production spend down 35% in 60 days, with production capacity untouched
- Squads trusted self-service because they could see what their environments cost
- Finance gets monthly reports by squad instead of one opaque total