Key Takeaways
- Cloud isn't a ripoff—you're paying for flexibility and reduced operational burden
- Owned infrastructure isn't always cheaper—operational costs can wipe out hardware savings
- Cloud wins for unpredictable workloads; owned wins for consistent, predictable demand
- Threshold for owned infrastructure: $500K-$1M+ annual cloud spend
- The best engineers choose based on constraints, not ideology
I spent the last weekend reading comma.ai’s article about owning their $5M data center. It’s a great piece—honest, technical, and refreshingly contrarian. They’re saving $20M by owning their own GPUs and servers instead of renting from AWS or GCP. But here’s the thing: their story is exceptional, not universal.
Let me tell you about the time I helped a company move off the cloud. Spoiler: it didn’t save them money.
The Great Cloud Migration That Wasn’t
Back in 2022, I was working with a fintech startup running their entire stack on AWS. Monthly bills were hitting $18,000. The founders, engineers who had read too many “cloud is a ripoff” tweets, decided to build their own mini data center in a nearby colocation facility.
They spent $250,000 on hardware up front. Eight server racks, redundant power, network switches, the works. The math on paper was beautiful: break-even in 14 months, then pure savings.
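The "beautiful" math is easy to reproduce—and so is the flaw in it. Here's a sketch of both versions; the $8K/month recurring figure for colo space, power, and ops time is my assumption, not theirs:

```python
# "On paper" break-even: treat the $250K hardware purchase as
# the only cost and the full cloud bill as pure savings.
HARDWARE_UPFRONT = 250_000
CLOUD_MONTHLY = 18_000

paper_break_even = HARDWARE_UPFRONT / CLOUD_MONTHLY
print(round(paper_break_even))  # ~14 months, as promised

# Reality: owned hardware has recurring costs too. Assume even
# a modest $8K/month for colo, power, and ops time (my
# assumption for illustration).
OWNED_MONTHLY = 8_000
real_break_even = HARDWARE_UPFRONT / (CLOUD_MONTHLY - OWNED_MONTHLY)
print(round(real_break_even))  # 25 months
```

The paper math only works if running your own hardware costs nothing per month. It never does.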
Eighteen months later, they spent $40,000 more on a rushed migration back to AWS.
What happened? The stuff nobody talks about in those viral “we left the cloud” posts.
The Hidden Costs Nobody Mentions
1. Your Time Has a Price, Too
comma.ai mentions their data center is “maintained and built by only a couple engineers and technicians.” Key word: only a couple.
In reality, finding people who can properly run a data center is brutally hard. I’m not talking about someone who knows how to rack a server—I mean people who understand power redundancy, cooling thermodynamics, network topology, security hardening, and disaster recovery planning.
The fintech startup? They tried to handle data center ops with their two DevOps engineers. Both were cloud-native—great at Terraform, clueless about physical infrastructure. They spent 40% of their time dealing with hardware issues instead of shipping features.
After one month spent dealing with a failed RAID controller, a power surge that took down three servers, and a cooling system failure, they calculated that those two engineers' hardware firefighting was costing the company $15,000. Per month.
Real ops talent—not cloud engineers masquerading as infrastructure specialists—is incredibly rare. You need people who have designed cooling systems, implemented proper power redundancy, set up complex network topologies, and handled hardware procurement at scale.
2. The “Capex Trap”
Buying hardware feels like a one-time cost. It's not. That comma.ai article mentions they spent ~$5M on their data center. Here's what $5M buys you today, and what it looks like over the next few years:
- Day 1: State-of-the-art hardware, warranty coverage, peak performance
- Year 2: Newer GPUs are 30% faster. Your 18-month-old servers are now “mid-tier”
- Year 3: Warranty expires. Power efficiency looks dated compared to new silicon
- Year 4: You’re running “legacy” hardware. Cloud providers have moved on twice
The fintech startup bought their servers right before the GPU shortage eased. Six months later, similar hardware was 40% cheaper. That’s a $100,000 timing mistake.
Cloud providers spread these costs across thousands of customers. You absorb it all yourself.
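One way to see the capex trap: even before any failures or upgrades, the purchase price melts away on a schedule. A rough sketch, assuming straight-line depreciation to 10% residual value over a 4-year refresh cycle (both numbers are typical assumptions, not from the article):

```python
# Effective monthly cost of owned hardware from depreciation
# alone. Assumes straight-line depreciation to a 10% residual
# (resale) value over a 4-year useful life (my assumptions).
PURCHASE_PRICE = 250_000
RESIDUAL = 0.10
USEFUL_LIFE_MONTHS = 48

depreciation_per_month = PURCHASE_PRICE * (1 - RESIDUAL) / USEFUL_LIFE_MONTHS
print(round(depreciation_per_month))  # 4688
```

That's roughly $4,700/month of value evaporating before you pay for power, cooling, or a single engineer-hour—and a badly timed purchase (like buying right before prices drop 40%) compresses the residual value even further.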
3. The Upgrade Nightmare
In the cloud, upgrading is a few clicks. When you own hardware, it’s a logistics project.
I watched a small company spend three months planning a GPU upgrade. Ordering hardware, scheduling installation, migrating data, validating performance—every step required careful coordination. The whole project cost $180,000 in hardware plus $40,000 in engineering time.
In AWS, they could have just launched new instance types over a weekend.
4. Scaling Is Asymmetric
Scaling up with your own hardware takes weeks or months. Scaling down? That’s just servers sitting idle.
comma.ai runs workloads with “consistent” compute needs—training models, processing data. That’s the perfect scenario for owned infrastructure.
The fintech startup? Their traffic fluctuated wildly. During crypto market booms, they needed 4x their usual capacity. During quiet periods, they needed 20%. With owned hardware, they had to either overprovision (paying for idle servers) or underprovision (missing revenue).
Cloud lets you pay for exactly what you use, when you use it.
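The asymmetry is easy to quantify. A sketch with illustrative numbers (the per-server rates and the one-week-peak/one-week-quiet traffic shape are my assumptions): owned hardware must be sized for peak, while cloud is billed for actual usage.

```python
# Owning for peak vs. paying cloud per-use, for a fluctuating
# workload. All rates and the traffic shape are illustrative
# assumptions, not real pricing.
HOURS_PER_MONTH = 730
OWNED_PER_SERVER_MONTH = 500   # amortized cost of an owned server
CLOUD_PER_SERVER_HOUR = 1.0    # on-demand hourly rate

baseline_servers = 20
peak_servers = 80    # 4x baseline during booms
quiet_servers = 4    # 20% of baseline during lulls

# Owned: you provision for the peak and pay for it all month.
owned = peak_servers * OWNED_PER_SERVER_MONTH

# Cloud: pay only for what runs. Assume one week at peak, one
# week quiet, two weeks at baseline.
week = HOURS_PER_MONTH / 4
cloud = CLOUD_PER_SERVER_HOUR * week * (peak_servers + quiet_servers + 2 * baseline_servers)

print(owned, round(cloud))  # 40000 22630
```

Note that the owned server is cheaper *per unit* ($500/month vs. ~$730/month on-demand)—and still loses, because you're paying peak capacity prices for quiet-period traffic.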
When Owning Actually Makes Sense
After all this, you might think I’m anti-owned-infrastructure. I’m not. Here’s when it makes sense:
1. Your Workload Is Predictable and Consistent
If you know you’ll need 500 GPUs 24/7 for the next two years, owning them is a no-brainer. The cost advantage is real.
comma.ai fits this perfectly. They train driving models continuously—consistent, predictable, long-term demand.
2. You Have Ops Talent In-House
Real ops talent, not cloud engineers masquerading as infrastructure specialists. People who have:
- Designed and built cooling systems
- Implemented proper power redundancy
- Set up and maintained complex network topologies
- Handled hardware procurement at scale
If you don’t have these skills, you’re not saving money—you’re shifting it from your cloud bill to your learning curve.
3. Security or Data Sovereignty Requirements
Sometimes, you literally can’t use the cloud. Government contracts, financial data, IP protection—some requirements mandate physical control.
I’ve seen healthcare companies build their own infrastructure because HIPAA compliance was simpler with servers they could audit directly.
4. You’re At Massive Scale
At comma.ai’s level (600 GPUs, 4PB storage), the economics flip. Cloud margins eat you alive. When you’re spending seven figures annually on compute, owning hardware is mathematically the right move.
I’ve seen the threshold around $500K-$1M in annual cloud spend. Below that, operational overhead often outweighs savings.
The Middle Ground Most People Ignore
There’s this false dichotomy: either go full cloud or build your own data center. But there’s a third option that gets little attention: colocation.
Colocation is renting space in a professional data center. They handle:
- Power redundancy and distribution
- Cooling and climate control
- Physical security
- Network connectivity (often with better peering than you can get yourself)
- Fire suppression
You bring your own servers, but skip the facility engineering. The fintech startup I mentioned? They could have colocated for $3,000/month instead of building their own facility. They would have saved $150,000 on power and cooling infrastructure alone.
A Decision Framework
Before committing to owned infrastructure, here are the six questions I ask. Answer honestly.
- What’s your annual cloud spend? (If under $500K, stop here—stay on cloud)
- Is your compute demand predictable (+/- 20%) over the next 24 months?
- Do you have at least two engineers with deep infrastructure experience?
- Can you tolerate a 3-6 month hardware upgrade cycle?
- Is your workload latency-sensitive enough that cloud overhead hurts?
- Do you have regulatory or security requirements preventing cloud use?
If you answered “yes” to at least 4 of these: Consider owned infrastructure or colocation. If you answered “no” to 3 or more: Stay on cloud. Seriously.
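If it helps to make the checklist concrete, here's the same framework as a tiny scoring helper (a sketch; the question keys are my shorthand, and the thresholds mirror the rule of thumb above):

```python
# The six-question framework as a quick scoring helper.
def should_consider_owned(answers: dict) -> str:
    """answers maps each question to True (yes) or False (no)."""
    yes_count = sum(answers.values())
    if yes_count >= 4:
        return "consider owned infrastructure or colocation"
    return "stay on cloud"

# Example: a company that clears the spend threshold and has
# predictable demand, but lacks the team and has no hard
# latency or regulatory constraints.
answers = {
    "annual_cloud_spend_over_500k": True,
    "demand_predictable_within_20pct": True,
    "two_plus_deep_infra_engineers": False,
    "can_tolerate_3_6_month_upgrades": True,
    "latency_sensitive_to_cloud_overhead": False,
    "regulatory_requirements_prevent_cloud": False,
}
print(should_consider_owned(answers))  # stay on cloud (only 3 yes)
```

Three yeses isn't enough—and in my experience, the team question is the one people lie to themselves about most.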
My Personal Take
I’ve helped migrate to cloud, from cloud, and everything in between. Here’s what I’ve learned:
Cloud isn’t a ripoff—you’re paying for flexibility, reduced operational burden, and risk distribution. The premium is real, but so is the value.
Owned infrastructure isn’t always cheaper—your operational costs can easily wipe out hardware savings, especially at smaller scales.
The best engineers I know don’t default to either extreme. They choose based on constraints, not ideology.
What Should You Do?
If you’re running a small-to-medium company (under $1M annual cloud spend), you’re almost certainly better off optimizing your cloud usage than building your own infrastructure:
- Audit your cloud bill – I’ve seen companies save 30% just by eliminating idle resources and using reserved instances
- Right-size your instances – Most people overprovision “just in case”
- Use serverless for variable workloads – Pay per execution, not per hour
- Consider multi-year reserved instances – Cloud providers give huge discounts for commitment
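These optimizations compound. A sketch of how the savings stack on a hypothetical $50K/month bill (every percentage here is an assumption for illustration, not a quoted cloud-provider discount):

```python
# How the optimizations above compound on a hypothetical bill.
# All discount percentages are illustrative assumptions.
monthly_bill = 50_000

IDLE_CLEANUP = 0.15        # eliminate idle resources
RIGHT_SIZING = 0.10        # downsize overprovisioned instances
RESERVED_DISCOUNT = 0.30   # multi-year commitment on what's left

after_cleanup = monthly_bill * (1 - IDLE_CLEANUP) * (1 - RIGHT_SIZING)
after_reserved = after_cleanup * (1 - RESERVED_DISCOUNT)

savings = monthly_bill - after_reserved
print(round(after_reserved), round(savings))
```

Note the order matters: commit to reserved capacity *after* cleaning up waste, or you're locking in multi-year payments for resources you didn't need.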
If you’re at scale ($1M+ annual spend), do the math carefully. Factor in:
- Hardware procurement (3-6 month lead times for GPUs)
- Staffing (real ops engineers are expensive)
- Facility costs or colocation fees
- Insurance and risk management
- Opportunity cost of capital (that $5M could be growing elsewhere)
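Those factors are easy to list and easy to forget to add up. Here's a rough 3-year TCO sketch that includes the one people skip—the opportunity cost of capital. Every number below is an illustrative assumption, not real pricing:

```python
# Rough 3-year TCO: owned vs. cloud, for a seven-figure-spend
# scenario. All figures are illustrative assumptions.
YEARS = 3
cloud_annual = 1_200_000

hardware = 2_000_000            # upfront purchase
colo_and_power_annual = 150_000
staffing_annual = 400_000       # two senior infra engineers
cost_of_capital = 0.08          # what that $2M could earn elsewhere

opportunity_cost = hardware * ((1 + cost_of_capital) ** YEARS - 1)
owned = hardware + YEARS * (colo_and_power_annual + staffing_annual) + opportunity_cost
cloud = YEARS * cloud_annual

print(f"owned: ${owned:,.0f}, cloud: ${cloud:,.0f}")
# owned: $4,169,424, cloud: $3,600,000
```

With these particular assumptions, owning *loses* even at $1.2M/year of cloud spend—which is exactly why "do the math carefully" means your math, with your staffing costs and your hardware quotes, not someone else's blog post.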
The Real Lesson from comma.ai
The impressive thing about comma.ai isn’t that they own their data center—it’s that they have the engineering chops to do it well. They built their own cooling systems, wrote custom storage software, and run a tight ship with minimal staff.
That’s rare.
Most companies trying this end up in the situation I described: distracted engineers, operational chaos, and an eventual migration back to cloud with lighter wallets and bruised egos.
The real lesson isn’t “own your data center”—it’s “build the capabilities your business needs.” For comma.ai, that meant a data center. For most of us, it means getting better at cloud efficiency.
Next Steps
Not sure where you land? Start here:
- Audit your current setup – Where are your biggest costs? Where’s the most waste?
- Run the numbers – Use a TCO calculator (cloud providers have them) to compare options
- Assess your team – Be honest about your in-house capabilities
- Start small – If you want to try owned infrastructure, don’t bet the company. Start with non-critical workloads
Owning infrastructure is awesome when it fits. Cloud is awesome when it fits. The trap is thinking there’s one right answer for everyone.
Pick the one that fits you, not the one trending on Hacker News.
Have you made a big infrastructure decision that looked different on paper than in practice? I’d love to hear your story—drop it in the comments.