DevOps in 2026: From Servers to Infrastructure as Code

How modern DevOps practices have evolved to make infrastructure programmable, reliable, and scalable

Key Takeaways

  • Infrastructure as Code is now table stakes, not optional
  • GitOps has replaced manual deployment workflows
  • Observability matters more than monitoring alone
  • Serverless and edge computing are changing what's possible
  • Security must be integrated throughout the DevOps pipeline

The Old DevOps: SSH and Manual Scripts

Remember when deployment meant:

  1. SSH into server
  2. git pull origin main
  3. npm run build
  4. Copy files to production
  5. nginx -s reload
  6. Pray nothing broke

This wasn’t just slow—it was fragile. Each deployment was a potential disaster. Rollback meant manually restoring old files. Scaling meant buying more servers.

The problem with manual infrastructure isn’t that it doesn’t work—it’s that it doesn’t scale with your team, your traffic, or your complexity.

Infrastructure as Code: Treat Servers Like Source Code

Infrastructure as Code (IaC) changed everything by bringing software development practices to infrastructure.

Why IaC Matters

  1. Version controlled: Every change has git history
  2. Reproducible: Environments are identical across dev, staging, prod
  3. Testable: Preview infrastructure changes before applying
  4. Automated: No more manual SSH sessions

Terraform: Declarative Infrastructure

# main.tf

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  
  tags = {
    Name = "WebServer"
    Environment = var.environment
  }
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

Terraform figures out dependencies, creates resources in parallel, and provides a plan showing exactly what will change.

Applying Changes Safely

# See what will change without applying
terraform plan

# Apply changes (interactive approval)
terraform apply

# Import existing resources into state
terraform import aws_instance.web i-1234567890abcdef

# Destroy resources safely
terraform destroy

Always review terraform plan output before applying. One bad resource change can delete production databases.

Containerization: Docker and Beyond

Containers standardized how we package and deploy applications.

Docker: The Universal Runtime

# Dockerfile
FROM node:22-alpine

WORKDIR /app

# Install all dependencies (cached in their own layer)
COPY package*.json ./
RUN npm ci

# Copy source
COPY . .

# Build, then drop devDependencies from the final image
RUN npm run build && npm prune --omit=dev

# Expose port
EXPOSE 3000

# Run as non-root for security
USER node

CMD ["node", "dist/server.js"]

Benefits:

  • Consistent environments: “It works on my machine” disappears
  • Rapid scaling: Spin up new instances in seconds
  • Isolation: Process crashes don’t affect other services
  • Resource limits: Prevent misbehaving containers from consuming all memory

Kubernetes: Orchestrating at Scale

For production workloads, Kubernetes provides:

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: your-registry/web-app:1.0.0
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10

Kubernetes handles:

  • Self-healing: Restart crashed containers
  • Scaling: Add/remove replicas based on load
  • Rolling updates: Zero-downtime deployments
  • Service discovery: Automatically load balance traffic

CI/CD: GitOps Replaces Manual Deployments

GitOps treats git as the single source of truth. When you merge to main, infrastructure changes automatically.

GitHub Actions: Integrated CI

# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'
          
      - name: Install dependencies
        run: npm ci
        
      - name: Run tests
        run: npm test -- --coverage
        
      - name: Build
        run: npm run build
        
      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/

  deploy:
    needs: test
    runs-on: ubuntu-latest
    environment: production
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Download artifact
        uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist
          
      - name: Deploy to Vercel
        uses: amondnet/vercel-action@v25
        with:
          vercel-token: ${{ secrets.VERCEL_TOKEN }}
          vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
          vercel-project-id: ${{ secrets.VERCEL_PROJECT_ID }}
          vercel-args: '--prod'

GitOps means your git history becomes your deployment history. Rollback is a git revert of the offending commit. No more "which version was deployed?" panic.

Monitoring and Observability: Understanding Your Systems

Monitoring tells you something is broken. Observability tells you why.

Metrics: What’s Happening

// Custom metrics with Prometheus (prom-client)
import express from 'express';
import { Histogram, register } from 'prom-client';

const app = express();

const requestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.1, 0.5, 1, 2.5, 5, 10],
});

app.use((req, res, next) => {
  const start = Date.now();
  
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    requestDuration.observe({
      method: req.method,
      route: req.path,
      status: res.statusCode,
    }, duration);
  });
  
  next();
});

// Expose metrics for Prometheus to scrape
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});

Track:

  • Request rates: Traffic patterns, spikes, anomalies
  • Latency: P50, P95, P99 response times
  • Error rates: 4xx, 5xx status codes
  • Resource usage: CPU, memory, disk, network
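To make the percentile figures concrete, here is a minimal nearest-rank sketch; the `latencies` sample data is hypothetical:

```javascript
// Compute a latency percentile by nearest-rank over a sorted sample.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// Hypothetical response times in milliseconds
const latencies = [12, 15, 18, 22, 25, 30, 45, 60, 120, 800];

console.log(percentile(latencies, 50)); // → 25 (median)
console.log(percentile(latencies, 95)); // → 800
console.log(percentile(latencies, 99)); // → 800
```

Note how one slow outlier dominates both P95 and P99 here, which is exactly why averages hide tail latency.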

Logging: What Happened

Structured logs are queryable, not just readable:

{
  "timestamp": "2026-02-10T15:30:00Z",
  "level": "error",
  "service": "api",
  "trace_id": "abc123def456",
  "user_id": "user-123",
  "error": {
    "type": "ValidationError",
    "message": "Invalid email format",
    "code": "ERR_001"
  },
  "context": {
    "path": "/api/users",
    "method": "POST"
  }
}

Query logs: level:error AND service:api | timestamp > 24h ago
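A structured logger can be as simple as emitting one JSON object per line; a minimal sketch, with field names mirroring the example above:

```javascript
// Minimal structured logger: one JSON object per line, queryable by field.
function createLogger(service) {
  const emit = (level) => (message, context = {}) =>
    console.log(JSON.stringify({
      timestamp: new Date().toISOString(),
      level,
      service,
      message,
      ...context,
    }));
  return { info: emit('info'), warn: emit('warn'), error: emit('error') };
}

const log = createLogger('api');
log.error('Invalid email format', {
  trace_id: 'abc123def456',
  context: { path: '/api/users', method: 'POST' },
});
```

In production you would point this at a library like pino or winston, but the contract is the same: every line is machine-parseable.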

Distributed Tracing: Understanding Flow

import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('payment-service');

async function processPayment(userId, amount) {
  // startActiveSpan parents this span under the active span automatically
  return tracer.startActiveSpan('processPayment', {
    attributes: {
      'user.id': userId,
      'payment.amount': amount,
    },
  }, async (span) => {
    try {
      const result = await paymentGateway.charge(amount);
      span.addEvent('charge_processed');
      return result;
    } catch (error) {
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw error;
    } finally {
      span.end();
    }
  });
}

See the entire request flow: load balancer → API → database → payment gateway → response.

Serverless and Edge: The New Paradigm

Not everything needs a server.

AWS Lambda: Pay-Per-Use Compute

// handler.js
import { DynamoDBClient, GetItemCommand } from '@aws-sdk/client-dynamodb';

const dynamo = new DynamoDBClient({ region: 'us-east-1' });

export const handler = async (event) => {
  const { userId } = event.pathParameters;
  
  const response = await dynamo.send(new GetItemCommand({
    TableName: 'Users',
    Key: { userId: { S: userId } },
  }));
  
  return {
    statusCode: 200,
    body: JSON.stringify(response.Item),
  };
};

No servers to manage. Pay only for actual execution time.
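To make "pay only for execution time" concrete, here is a back-of-envelope cost sketch. The per-GB-second and per-request prices are illustrative assumptions, not current AWS rates:

```javascript
// Rough Lambda cost estimate: billed by memory-seconds plus a per-request fee.
// Both rates below are illustrative, not current AWS pricing.
const PRICE_PER_GB_SECOND = 0.0000166667; // illustrative
const PRICE_PER_REQUEST = 0.0000002;      // illustrative

function estimateLambdaCost({ invocations, avgDurationMs, memoryMb }) {
  const gbSeconds = invocations * (avgDurationMs / 1000) * (memoryMb / 1024);
  return gbSeconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST;
}

// 1 million requests/month, 100 ms average duration, 256 MB memory
const monthly = estimateLambdaCost({
  invocations: 1_000_000,
  avgDurationMs: 100,
  memoryMb: 256,
});
console.log(monthly.toFixed(2)); // well under a dollar at these rates
```

A workload like this costs less than an idle t3.micro, which is the whole appeal for spiky or low-volume traffic.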

Cloudflare Workers: Edge Computing

Run code at edge, closer to users:

// worker.js
export default {
  async fetch(request, env, ctx) {
    // Serve from the edge cache when possible
    const cache = caches.default;
    const cached = await cache.match(request);
    
    if (cached) {
      const response = new Response(cached.body, cached);
      response.headers.set('X-Cache', 'HIT');
      return response;
    }
    
    // Transform the origin response at the edge
    const response = await fetch(request);
    const transformed = transformResponse(response);
    
    // Cache for future requests without blocking this response
    ctx.waitUntil(cache.put(request, transformed.clone()));
    
    return transformed;
  }
};

Edge computing enables:

  • Low latency worldwide: Content cached in 300+ locations
  • Dynamic content: Modify responses at edge (A/B testing, authentication)
  • DDoS protection: Absorb attacks before reaching origin

Security: DevSecOps

Security isn’t a phase—it’s built into everything.

Secrets Management

Never commit secrets:

# .github/workflows/deploy.yml
- name: Deploy
  env:
    # ❌ BAD: Committed to repo
    # DATABASE_URL: postgresql://user:pass@host/db
    
    # ✅ GOOD: From GitHub Secrets
    DATABASE_URL: ${{ secrets.DATABASE_URL }}

Use secret managers:

  • GitHub Secrets: For CI/CD
  • AWS Secrets Manager: For AWS resources
  • Vault: For self-hosted secrets
  • 1Password Connect: For development teams
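Whichever manager you use, secrets typically reach the app as environment variables. A small startup check fails fast when one is missing; the variable names here are hypothetical:

```javascript
// Fail fast at startup if a required secret is missing from the environment.
function requireEnv(names) {
  const missing = names.filter((name) => !process.env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((name) => [name, process.env[name]]));
}

// Hypothetical: in production DATABASE_URL is injected by the secret manager;
// the fallback below exists only so this sketch runs standalone.
process.env.DATABASE_URL ??= 'postgres://example';
const config = requireEnv(['DATABASE_URL']);
console.log('All required secrets present');
```

Crashing at boot with a clear message beats a cryptic connection error ten minutes into a deploy.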

Dependency Scanning

Automated in CI:

# .github/workflows/security.yml
name: Security Scan

on: [push, pull_request]

jobs:
  security:
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'
          
      - name: Upload SARIF file
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'

Block merges with critical vulnerabilities.

Infrastructure Scanning

Scan for misconfigurations:

# Terraform static checks
terraform fmt -check
terraform validate

# tfsec
tfsec .

# Checkov
checkov -d .

Compliance is easier to build in than bolt on later. Use Sentinel, OPA, or CloudFormation Guard for policy-as-code.

Cost Optimization: Don’t Overpay

Cloud costs spiral without visibility.

Rightsizing Resources

# Before: Overprovisioned
resource "aws_instance" "web" {
  instance_type = "m5.large"  # 2 vCPU, 8 GB RAM
  # Actual usage: 5% CPU, 1 GB RAM
}

# After: Rightsized
resource "aws_instance" "web" {
  instance_type = "t3.small"  # 2 vCPU, 2 GB RAM
  # Fits usage, saves $50/month
}
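The savings figure is easy to sanity-check. The hourly rates below are illustrative assumptions, not current AWS prices:

```javascript
// Back-of-envelope rightsizing savings. Hourly rates are illustrative.
const HOURS_PER_MONTH = 730;

const before = { type: 'm5.large', hourly: 0.096 };  // illustrative rate
const after = { type: 't3.small', hourly: 0.0208 };  // illustrative rate

const monthlySavings = (before.hourly - after.hourly) * HOURS_PER_MONTH;
console.log(monthlySavings.toFixed(2)); // roughly $55/month at these rates
```

Multiply that by a fleet of overprovisioned instances and rightsizing quickly becomes one of the highest-leverage cost levers.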

Auto-Scale Policies

# AWS Auto Scaling Group
AutoScalingGroupName: production-web

TargetTrackingConfigs:
  - PredefinedMetricSpecification:
      PredefinedMetricType: ASGAverageCPUUtilization
    TargetValue: 50.0
    ScaleOutCooldown: 300
    ScaleInCooldown: 300

Scale up during peak, down during quiet.
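Target tracking is proportional: desired capacity is current capacity times the ratio of actual to target metric, rounded up. That is the Kubernetes HPA formula; AWS target tracking behaves similarly but is implemented via CloudWatch alarms. A sketch of the rule:

```javascript
// Simplified proportional scaling rule (the Kubernetes HPA formula;
// AWS target tracking approximates the same behavior).
function desiredCapacity(current, actualMetric, targetMetric) {
  return Math.ceil(current * (actualMetric / targetMetric));
}

console.log(desiredCapacity(4, 80, 50)); // CPU at 80% vs 50% target → 7
console.log(desiredCapacity(4, 25, 50)); // CPU at 25% vs 50% target → 2
```

The cooldown settings in the config above exist precisely because this formula, applied on noisy metrics, would otherwise thrash capacity up and down.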

Spot Instances

For fault-tolerant workloads:

resource "aws_instance" "worker" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  instance_market_options {
    market_type = "spot"
    spot_options {
      max_price          = "0.05"  # Maximum bid of $0.05/hour
      spot_instance_type = "one-time"
    }
  }
}

Spot instances cost 70-90% less than on-demand.

The Modern DevOps Stack

Infrastructure: Terraform (or Pulumi)
Containers: Docker (or Podman for security)
Orchestration: Kubernetes (or Nomad for simplicity)
CI/CD: GitHub Actions (or GitLab CI)
Monitoring: Prometheus + Grafana
Logging: Loki + Grafana (or Datadog for simplicity)
Tracing: Jaeger (or Honeycomb)
Secrets: AWS Secrets Manager
Security: Trivy + OWASP ZAP
Scanning: SonarQube for code quality

This stack is cloud-agnostic, battle-tested, and well-documented.

Culture: It’s Not Just Tools

DevOps isn’t about tools—it’s about culture:

  1. Blameless postmortems: Learning from failure without punishment
  2. Shared responsibility: Devs participate in on-call, Ops in code reviews
  3. Documentation: Runbooks for every service
  4. Automation: If you do it twice, automate it
  5. Measurement: You can’t improve what you don’t measure

Conclusion

DevOps in 2026 is mature. The best practices are clear:

Treat infrastructure as code. Automate everything. Monitor comprehensively. Plan for failures. Optimize continuously. Build blameless culture.

The old days of SSH and manual deployments are over. The modern stack is faster, more reliable, and scales far beyond what manual operations ever could.

Your job isn’t keeping servers running—it’s building systems that run themselves.

What will you automate?

Bittalks

Developer and tech enthusiast exploring the intersection of open source, AI, and modern software development.