MatrixGard Blog

Cloud Egress Costs in 2026: AWS vs GCP vs Azure for High-Traffic SaaS Startups

noreply@matrixgard.com (Avinash S) — Tue, 19 May 2026 08:30:00 GMT

Egress is the cloud bill line item that high-traffic SaaS founders almost always underestimate. Compute and database costs are predictable. You provision them, you watch them, you pay for them. Egress is different. It scales with user behaviour, with feature shape, with one accidental misconfiguration in a webhook fanout. It hides under different names on different providers (Data Transfer Out, Internet Egress, Outbound Data Transfer), it has tiered pricing that no built-in dashboard summarises clearly, and it is the single category most likely to surprise an early-stage SaaS team on the bill that arrives after a launch week.

This is the honest 2026 breakdown of cloud egress costs across AWS, GCP, Azure, and the egress-disrupting alternatives (Cloudflare R2, Backblaze B2). What the numbers actually are. What changed after the EU Data Act forced free egress on exit in 2024. The hidden inter-region and inter-AZ bills. The six engineering tactics that move the cost needle for high-traffic SaaS, and the stage-specific recommendations for pre-seed, seed, and Series A teams.

1. What "egress" actually means on a cloud bill

Cloud egress is the umbrella term for outbound data transfer that leaves the cloud provider's network. The bill breaks it into three buckets, priced very differently from each other:

Internet egress. Data that leaves the provider's edge and reaches the public internet (your users, third-party APIs, on-prem systems). Most expensive bucket. Tiered by monthly volume.
Inter-region egress. Data moving between two regions of the same provider (e.g. ap-south-1 to us-east-1 on AWS, or asia-south1 to us-central1 on GCP). Cheaper than internet, more expensive than zero, and almost never visualised on default dashboards.
Inter-AZ egress. Data moving between availability zones of the same region. The smallest line item in any specific request but the largest in volume for HA architectures, since every multi-AZ Postgres replica, every Kafka broker spread across zones, every Application Load Balancer fan-out generates this traffic.

Public sources for the canonical pricing: AWS EC2 Data Transfer pricing, GCP VPC Network Pricing, Azure Bandwidth Pricing. Treat these as the single source of truth; vendor sales decks routinely round in the direction that flatters their position.

2. The 2024 EU Data Act and the new free-egress-on-exit policy

The most important regulatory change for cloud egress in the last two years is the EU Data Act, which entered into force on 11 January 2024. Articles 23 to 25 of the Act target what the EU called "unjustified obstacles to switching" between cloud providers, with the specific goal of removing egress fees as a switching barrier. The Act gave providers a transitional period and a final deadline of January 2027, after which all switching-related data transfer charges must be removed.

The hyperscalers responded in early 2024:

AWS announced free data transfer out to the internet when moving out of AWS on 5 March 2024. Customers must request the credit through AWS Support and complete the migration within the approved window; AWS stated at launch that closing the account is not required.
GCP eliminated data transfer fees when migrating off Google Cloud in January 2024, ahead of AWS. Similar request-based credit model.
Microsoft extended free data transfers out for customers leaving Azure shortly after, with the same pattern.

What this does NOT change: your day-to-day egress bill. The free-egress-on-exit policies cover only the one-time migration scenario when you fully leave the provider. Webhook traffic to a third party, user downloads from your S3 bucket, cross-cloud replication between AWS and a GCP analytics warehouse, none of that is touched by the EU Data Act provisions. The hyperscalers continue to charge their standard tiered egress rates for everything other than the explicit exit case.

Practitioner opinion: the press coverage of "AWS makes egress free" in 2024 was misleading for SaaS operators. Treat the policy as a switching-cost relief valve, not a structural change to your operating bill.

3. Internet egress: the actual per-GB numbers in 2026

The headline numbers from each provider's pricing page, current as of May 2026. Note that GCP expresses internet egress pricing largely by destination (where the traffic lands), while AWS and Azure price by source region or zone (where the traffic originates), so the comparison is region-pair dependent.

AWS internet egress, US East (us-east-1) source:

First 10 TB / month: $0.09 per GB
Next 40 TB: $0.085 per GB
Next 100 TB: $0.07 per GB
Over 150 TB: $0.05 per GB (committed contract pricing can go lower)
First 100 GB per month is free across all AWS accounts

AWS internet egress, Mumbai (ap-south-1) source:

First 10 TB / month: $0.1093 per GB
Next 40 TB: $0.085 per GB
Next 100 TB: $0.082 per GB
Over 150 TB: $0.075 per GB

India egress is roughly 20 percent more expensive than US East at the entry tier, narrowing at higher volume. The same shape repeats for Singapore, Sao Paulo, and other emerging-market AWS regions.

GCP internet egress (worldwide destinations, excluding China and Australia):

First 1 TB / month: $0.12 per GB
Next 9 TB: $0.11 per GB
Over 10 TB: $0.08 per GB
Egress to a Google-owned destination (e.g. user traffic landing on the Google ASN): pricing varies by tier

GCP egress to Australia is priced separately and is higher; GCP egress to China is the most expensive of any provider-destination pair across the three clouds.

Azure internet egress (Zone 1 source: North America, Europe):

First 100 GB / month: free
Next 10 TB: $0.087 per GB
Next 40 TB: $0.083 per GB
Next 100 TB: $0.07 per GB
Over 150 TB: $0.05 per GB

Azure Zone 2 (India, Singapore, Hong Kong, Japan, Korea) is priced separately and runs around 10-15 percent higher than Zone 1 at every tier.

Cloudflare R2 (object storage, designed as an S3 alternative):

Internet egress: $0 per GB. Cloudflare publicly committed to zero egress fees at launch and has held the line.
You pay only for storage at $0.015 per GB-month and operations (Class A writes, Class B reads).

Backblaze B2 (object storage):

Internet egress: $0.01 per GB, with egress up to 3x your average monthly stored volume free.
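
Tiered pricing is easy to misread, so it helps to multiply it out. The sketch below is a minimal calculator, assuming the AWS us-east-1 and GCP worldwide tiers listed in this section, 1 TB treated as 1,000 GB, and the free allowances ignored. The function name and the 60 TB example volume are illustrative, not taken from any provider tool.

```python
# Tiers as (tier_size_in_gb, price_per_gb); None means "everything above".
# Rates are copied from the lists in this section. Assumptions: 1 TB = 1,000 GB
# and free monthly allowances are ignored.
AWS_US_EAST_1 = [(10_000, 0.09), (40_000, 0.085), (100_000, 0.07), (None, 0.05)]
GCP_WORLDWIDE = [(1_000, 0.12), (9_000, 0.11), (None, 0.08)]


def tiered_cost(gb, tiers):
    """Return the monthly cost in USD for `gb` of egress under `tiers`."""
    remaining = gb
    total = 0.0
    for size, price in tiers:
        # Bill as much of the remaining volume as fits in this tier.
        in_tier = remaining if size is None else min(remaining, size)
        total += in_tier * price
        remaining -= in_tier
        if remaining <= 0:
            break
    return total


if __name__ == "__main__":
    volume_gb = 60_000  # hypothetical 60 TB month
    for name, tiers in [("AWS us-east-1", AWS_US_EAST_1), ("GCP worldwide", GCP_WORLDWIDE)]:
        cost = tiered_cost(volume_gb, tiers)
        print(f"{name}: ${cost:,.0f} total, blended ${cost / volume_gb:.4f} per GB")
```

The blended per-GB rate is the number worth tracking month over month, since it shows how much of your volume is still being billed at the entry tier.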

4. Inter-region egress: the bill that surprises HA architectures

The moment your architecture spans two regions, inter-region egress shows up. For high-availability database replicas, cross-region object storage replication, or multi-region Kafka, this becomes a meaningful line item that is rarely visible on default cost dashboards.

Reference numbers as of May 2026:

AWS: $0.02 per GB for inter-region transfer in the same continent (us-east to us-west), $0.05-$0.09 per GB for cross-continent (us-east to ap-south-1). Mumbai outbound to other AWS regions is in the $0.08-$0.09 range. AWS public pricing.
GCP: $0.02 per GB within North America, $0.05-$0.08 per GB cross-continent. India to North America runs at the upper end. Cloud Interconnect changes the calculation; see section 6.
Azure: $0.02 per GB Zone 1 to Zone 1, $0.05 per GB Zone 1 to Zone 2, $0.087 per GB Zone 2 outbound to any other zone.

Operational example. A startup running Aurora Postgres Multi-AZ in ap-south-1 with a cross-region read replica in us-east-1 will pay roughly $0.08-$0.09 per GB of replication traffic shipped to the replica. For a transactional workload generating 200 GB of WAL per day, that is roughly $480-$540 / month (200 GB x 30 days x $0.08-$0.09) on cross-region replication egress alone, on top of the database instance cost. Most early-stage teams do not see this line item because it is bundled into a generic "Data Transfer" category on the Cost Explorer default view.
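
The replication arithmetic is worth keeping as a small helper so you can plug in your own WAL volume. This is a sketch assuming a 30-day month and a flat per-GB rate; the function name is illustrative.

```python
def monthly_replication_cost(gb_per_day, price_per_gb, days=30):
    """Cross-region replication egress in USD per month at a flat per-GB rate."""
    return gb_per_day * days * price_per_gb


# The 200 GB/day example at the low and high end of the quoted rate range.
low = monthly_replication_cost(200, 0.08)
high = monthly_replication_cost(200, 0.09)
print(f"${low:,.0f} to ${high:,.0f} per month")
```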

5. Inter-AZ egress: the invisible HA tax

Same provider, same region, different availability zones. The smallest per-GB number on the bill, the largest cumulative line item for properly-architected HA systems.

AWS: $0.01 per GB billed on each side of the transfer (outbound from the source AZ and inbound to the destination AZ), so effectively $0.02 per GB moved between AZs. Same number across all regions. AWS Data Transfer within the same Region.
GCP: $0.01 per GB within the same region across zones, charged on the sender side only.
Azure: $0.01 per GB Availability Zone egress within a region.

Where this surprises teams:

Kafka clusters spread across three AZs. Default replication factor 3 means every produced byte is shipped to two replica brokers, both in different AZs. A 500 MB / second produce rate becomes 1 GB / second of inter-AZ traffic, or about 86 TB / day. At $0.01 per GB that is about $860 / day, or roughly $26,000 / month, of pure inter-AZ egress on a single Kafka cluster; where both sides of the transfer are billed, as on AWS, the figure doubles. The AWS MSK pricing page does not show this; it appears in EC2 Data Transfer.
Cross-AZ database traffic. On AWS, replication between a Multi-AZ primary and its standby is not billed as data transfer (Aurora replicates through a shared storage layer, and RDS lists Multi-AZ replication transfer as free), but application traffic from a client in one AZ to a database instance in another AZ is billed at the inter-AZ rate. Check Cloud SQL HA and any other managed database against its own pricing page before assuming.
EKS / GKE cluster pods talking to each other across AZs. The default Kubernetes scheduler does not consider AZ-affinity for inter-service traffic. A pod in zone A talking to a service IP that routes to a backend pod in zone B generates inter-AZ egress on every request.
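
The Kafka case in the list above is the one most worth modelling before you size a cluster. The sketch below assumes every follower replica sits in a different AZ from the partition leader and ignores producer-to-leader and consumer traffic. `billed_sides` is an illustrative parameter: 1 for sender-only billing, 2 where both sides of the transfer are charged.

```python
def kafka_inter_az_cost(produce_mb_per_s, replication_factor=3,
                        price_per_gb=0.01, billed_sides=1, days=30):
    """Return (gb_per_day, usd_per_day, usd_per_month) for replication traffic.

    Assumption: each produced byte is copied to (replication_factor - 1)
    followers, all of which live in a different AZ from the leader.
    """
    replicated_gb_per_s = produce_mb_per_s * (replication_factor - 1) / 1000
    gb_per_day = replicated_gb_per_s * 86_400  # seconds in a day
    usd_per_day = gb_per_day * price_per_gb * billed_sides
    return gb_per_day, usd_per_day, usd_per_day * days


gb_day, usd_day, usd_month = kafka_inter_az_cost(500)
print(f"{gb_day:,.0f} GB/day, ${usd_day:,.0f}/day, ${usd_month:,.0f}/month")
```

Rack-aware replica placement and follower fetching change the real number in both directions, so treat the output as an upper-bound estimate for replication alone.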

Practitioner opinion: for a high-throughput SaaS at the seed stage and beyond, inter-AZ egress is often 20-40 percent of total egress spend. The default operating posture should be: place latency-sensitive call graphs in the same AZ via topology-aware routing, and accept the slightly reduced HA blast radius. Spreading a microservice mesh across three AZs by default, with no topology awareness, is operationally expensive and almost never delivers the HA benefit it implies.

6. Private connectivity: Direct Connect, Cloud Interconnect, ExpressRoute

If your egress volume to a specific destination crosses 5-10 TB / month, private connectivity becomes a real cost lever, not a luxury.

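AWS Direct Connect. A 1 Gbps Dedicated Connection from an AWS Direct Connect location runs around $0.30 per port-hour plus data transfer out over the connection at $0.02 per GB (versus $0.09-$0.11 per GB on the standard internet egress path). Note that this traffic lands on your own or your customer's network at the Direct Connect location, not on the public internet. Break-even versus standard egress: about 3 TB / month on port-hours alone (roughly $219 / month divided by $0.07 saved per GB); budget for 5-7 TB / month once carrier circuit and cross-connect fees are added. Public reference: AWS Direct Connect pricing.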

GCP Cloud Interconnect. Dedicated Interconnect at 10 Gbps runs around $1,700 / month for the port (regional availability dependent) plus $0.02 per GB outbound. Partner Interconnect at smaller commits (50 Mbps to 10 Gbps) at proportional pricing. Public reference: GCP Interconnect pricing.

Azure ExpressRoute. Local SKU starts at around $55 / month for 50 Mbps to a metro circuit; Standard SKU at 1 Gbps runs around $300 / month plus $0.025 per GB egress (Zone 1) on Metered plans, or unlimited egress on the Unlimited Data plan. Public reference: Azure ExpressRoute pricing.

For a high-traffic SaaS pushing 50-100 TB / month to a small set of large enterprise customers (typical B2B SaaS shape), private connectivity is the largest single FinOps lever. A 1 Gbps Direct Connect carrying 50 TB / month costs roughly $220 in port-hours ($0.30 x 730 hours) and another $1,000 in data transfer, about $1,220 in total before carrier circuit and cross-connect fees, versus the same 50 TB at standard egress rates of $0.085-$0.09 per GB which runs $4,250-$4,500. The savings compound at higher volumes.
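
A quick way to sanity-check the private connectivity case is to compute the break-even volume from the port fee and the per-GB saving. The sketch below uses the Direct Connect figures quoted in this section and assumes a 730-hour month. It deliberately leaves out carrier circuit, cross-connect, and redundancy costs, which you would add to the fixed component.

```python
HOURS_PER_MONTH = 730  # assumption: average month length


def direct_connect_monthly(gb, port_hourly=0.30, dto_per_gb=0.02, ports=1):
    """Port-hours plus data transfer out, in USD per month."""
    return port_hourly * HOURS_PER_MONTH * ports + gb * dto_per_gb


def break_even_gb(standard_per_gb=0.09, port_hourly=0.30, dto_per_gb=0.02, ports=1):
    """Monthly volume at which the port fee is repaid by the per-GB saving."""
    fixed = port_hourly * HOURS_PER_MONTH * ports
    return fixed / (standard_per_gb - dto_per_gb)


print(f"50 TB over one port: ${direct_connect_monthly(50_000):,.0f}")
print(f"Break-even: {break_even_gb():,.0f} GB per month")
print(f"Break-even with a redundant pair: {break_even_gb(ports=2):,.0f} GB per month")
```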

Caveat: private connectivity adds operational complexity (circuit ordering through a carrier or DC partner, BGP peering, routing policy, redundancy planning). For workloads under 5 TB / month it is rarely worth the engineering time.

7. CDN egress: CloudFront, Cloud CDN, Azure Front Door

For consumer-facing or content-heavy SaaS, the right question is rarely "how do I cut origin egress" and almost always "how do I serve from a cache that does not bill origin egress on every hit." The CDN tier is where this happens.

AWS CloudFront. Per-GB pricing is broadly cheaper than direct S3 / EC2 egress in most regions, especially under the free 1 TB / month tier and the CloudFront Security Savings Bundle. India (Asia Pacific) CloudFront pricing: $0.109 per GB first 10 TB, $0.085 next 40 TB. North America: $0.085 first 10 TB, $0.080 next 40 TB. CloudFront-to-S3 origin pulls are free, which is the key economic property.

GCP Cloud CDN. Cache egress to internet (cache fill from origin is free for GCS origins in the same region). Tier 1 (worldwide destinations excluding Australia, China): $0.08-$0.12 per GB depending on volume. GCP Cloud CDN pricing.

Azure Front Door / Azure CDN. Standard tier egress $0.081-$0.087 per GB for Zone 1 destinations. Azure Front Door pricing.

Cloudflare (used as a CDN in front of any origin). Cloudflare's CDN egress to internet is included in the plan flat fee. The Free, Pro ($25 / month), and Business ($250 / month) plans all carry unmetered bandwidth for typical web traffic. Enterprise plans negotiate. For an early-stage SaaS, putting Cloudflare in front of an AWS / GCP / Azure origin and caching aggressively turns most of the egress bill into a flat monthly Cloudflare fee. This is the largest possible cost lever for a content-heavy or read-heavy workload.

Note: Cloudflare's Terms of Service section 2.8 historically restricted unmetered bandwidth for non-HTML / non-website traffic on lower tiers. Video streaming, large file distribution, and similar workloads can trip the AUP. Read the AUP before betting your architecture on "unmetered."

8. Object storage egress: S3, GCS, Azure Blob, R2, B2

Object storage egress deserves its own treatment because it is the single most common surprise on a startup's bill. Numbers as of May 2026:

S3 internet egress (us-east-1): $0.09 per GB tier 1 (uses the same EC2 Data Transfer Out tiering).
S3 internet egress (ap-south-1): $0.1093 per GB tier 1.
S3 to CloudFront: free ("origin fetch"). This is why CDN-fronted S3 is the standard pattern.
GCS internet egress (worldwide, tier-1 excluding China and Australia): $0.12 per GB first 1 TB, $0.11 per GB next, $0.08 per GB over 10 TB.
Azure Blob internet egress (Zone 1): $0.087 per GB first 10 TB.
Cloudflare R2 internet egress: $0 per GB. Architecturally the most disruptive option for egress-heavy workloads.
Backblaze B2 internet egress: $0.01 per GB, with egress up to 3x your average monthly stored volume free.

For pure object storage backed by frequent egress (CDN origin for static assets, software downloads, media libraries, on-demand video) the gap between R2 / B2 and the hyperscalers is structural. A 100 TB / month egress workload runs roughly $7,700-$8,300 on S3 / GCS / Blob at the US and Zone 1 tiers listed in section 3, $1,000 on B2 before any free allowance, and effectively zero on R2 (only the storage and operations fees, around $1,500 / month for 100 TB stored).
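
The flat-rate side of that comparison can be sketched in a few lines. The code below uses the R2 storage price and B2 egress price quoted above. How the B2 free allowance is measured is modelled here simply as a multiple of stored volume, which is an assumption to verify against Backblaze's pricing page; R2 operation fees are left out.

```python
def b2_egress_cost(stored_gb, egress_gb, price_per_gb=0.01, free_multiple=3):
    """B2 egress in USD per month after the free allowance.

    Assumption: the allowance equals free_multiple x stored volume.
    """
    billable_gb = max(0.0, egress_gb - stored_gb * free_multiple)
    return billable_gb * price_per_gb


def r2_storage_cost(stored_gb, price_per_gb_month=0.015):
    """R2 storage in USD per month; egress is $0 and operations are excluded."""
    return stored_gb * price_per_gb_month


# 100 TB of egress against different amounts of stored data.
print(b2_egress_cost(stored_gb=0, egress_gb=100_000))       # no allowance applied
print(b2_egress_cost(stored_gb=10_000, egress_gb=100_000))  # 30 TB of it is free
print(r2_storage_cost(100_000))
```

The middle case is the interesting one: the smaller your stored library is relative to what you serve, the less the B2 allowance helps and the more R2's zero-egress model matters.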

Practitioner opinion: if you are running a static-asset-heavy SaaS and your egress bill is more than $2,000 / month, R2 or B2 should be on your six-month roadmap. The migration is mechanical and the savings recover the engineering time within one to two billing cycles.

9. Which SaaS workload shapes get hurt most by egress

Some workloads are egress-light by nature; others are structurally egress-heavy. Recognising your shape early matters because the architectural response is different for each.

API-only SaaS (CRM, accounting, project management). Egress is usually 5-15 percent of bill. Response sizes are small, JSON payloads compress well, mostly TLS overhead. Low priority for egress optimisation work.
Webhook-heavy fintech and notification platforms. Outbound webhook delivery to thousands of external endpoints, often retrying on failure. Egress can run 15-30 percent of bill. Look at retry-storm patterns, exponential backoff configuration, and dead-letter queues before optimising the data path itself.
Media-heavy SaaS (video editing, photo sharing, podcast hosting). Egress is often 40-70 percent of the bill once the user base crosses a few thousand active accounts. R2 / B2 plus aggressive CDN caching is the structural fix. Origin-served media without a CDN is a financial mistake at scale.
Data and analytics SaaS (BI, data warehouse, observability). Egress shows up two ways: customer-facing exports (CSV / Parquet downloads) and cross-cloud replication if the analytics tier lives somewhere different from operational data. Cross-cloud replication is the more dangerous of the two because it is steady, predictable, and rarely visible to the engineering team.
AI inference SaaS. Egress includes both the response payloads (large for image / video generation, small for text) and any audio / video streaming back to the client. For a video-generation SaaS pushing 50-200 MB outputs at scale, egress can equal compute spend.
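
For the webhook-heavy shape in the list above, the retry policy is usually where the egress goes. The sketch below shows exponential backoff with full jitter and a dead-letter list, using only the standard library. `send`, `dead_letter`, and the default limits are hypothetical names and values, not any specific queue or HTTP client API.

```python
import random
import time


def retry_delays(max_retries=5, base=1.0, cap=300.0, rng=random):
    """Full-jitter backoff: delay i is uniform in [0, min(cap, base * 2**i)]."""
    return [rng.uniform(0, min(cap, base * 2 ** i)) for i in range(max_retries)]


def deliver(send, payload, dead_letter, max_retries=5, sleep=time.sleep):
    """Try once, retry with backoff, then park the payload instead of looping.

    `send` is any callable returning True on success (hypothetical).
    """
    if send(payload):
        return True
    for delay in retry_delays(max_retries):
        sleep(delay)
        if send(payload):
            return True
    # Bounded retries: a dead endpoint costs max_retries + 1 sends, not a storm.
    dead_letter.append(payload)
    return False
```

The cap and the retry limit together bound how many bytes a single failing endpoint can cost you, which is the property to check in whatever delivery system you actually run.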

10. Six engineering tactics that move the egress bill

In order of leverage, highest first:

(a) Put a CDN in front of everything that can cache. CloudFront, Cloud CDN, Azure Front Door, Cloudflare. The economics of S3-to-CloudFront-free and Cloudflare's flat-fee bandwidth make this the single biggest cost lever for any read-heavy workload. If you are not running a CDN today, this is week-one work.

(b) Move static-asset and download-heavy object storage to R2 or B2. If your egress is dominated by static-asset serving, the price differential is too large to ignore. R2 specifically eliminates the egress line item entirely. The S3 API compatibility makes the migration a config change for most SDK-based workflows.

(c) Topology-aware AZ routing for inter-service traffic. In Kubernetes, use service topology hints or the TopologyAwareRouting feature to keep client-server traffic in the same AZ when possible. In AWS classic VPC architectures, place tightly-coupled services (web server + cache + database) in the same AZ. Accept that one-AZ-down loses that service tier, and rely on multi-AZ ELB / ALB for fan-out resilience rather than mesh-level multi-AZ chatter.
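
As a concrete starting point for the Kubernetes case, the sketch below builds a Service manifest with the topology-aware routing annotation and prints it as JSON, which kubectl accepts. It assumes Kubernetes 1.27 or later, where the annotation is `service.kubernetes.io/topology-mode`; older clusters used `service.kubernetes.io/topology-aware-hints`. The service name and port are hypothetical.

```python
import json


def service_manifest(name, port, topology_aware=True):
    """Build a minimal Service manifest as a plain dict."""
    annotations = {}
    if topology_aware:
        # Asks the control plane to prefer same-zone endpoints when it can.
        annotations["service.kubernetes.io/topology-mode"] = "Auto"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "annotations": annotations},
        "spec": {
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": port}],
        },
    }


print(json.dumps(service_manifest("orders-api", 8080), indent=2))
```

The hint is best-effort: it only takes effect when each zone has enough ready endpoints, so check the EndpointSlices after applying it rather than assuming the traffic stayed local.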

(d) Compress everything. gzip, Brotli, and zstd at the application level for HTTP responses. zstd at the storage tier for cold data. For JSON-heavy APIs, Brotli at quality 4-6 typically compresses 60-75 percent versus uncompressed, and your egress bill drops in roughly the same proportion for that traffic.
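
It is worth measuring the ratio on your own payloads rather than trusting a rule of thumb. The sketch below uses gzip from the standard library as a stand-in for Brotli, which needs a third-party package, and a synthetic JSON payload. Real API responses are less repetitive, so expect a lower ratio in production.

```python
import gzip
import json


def sample_payload(n=500):
    """Synthetic JSON list response; an assumption, not real traffic."""
    rows = [
        {"id": i, "status": "active", "plan": "pro", "email": f"user{i}@example.com"}
        for i in range(n)
    ]
    return json.dumps(rows).encode("utf-8")


def fraction_saved(raw, compressed):
    """Share of bytes, and therefore of per-GB egress, removed by compression."""
    return 1 - len(compressed) / len(raw)


raw = sample_payload()
gz = gzip.compress(raw, compresslevel=6)
print(f"{len(raw):,} bytes raw, {len(gz):,} bytes gzipped, "
      f"{fraction_saved(raw, gz):.0%} saved")
```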

(e) Replace cross-cloud or cross-region replication with private connectivity. If you have a steady 5+ TB / month flowing between AWS and GCP, or between two AWS regions, the economics of Direct Connect / Interconnect / ExpressRoute pay back inside a quarter for most volume tiers. Combine with replication-friendly database engines that ship deltas rather than full rows.

(f) Audit your Cost Explorer for the "Data Transfer" bucket every month. Most early-stage teams look at compute and database costs first, egress last. Flip that order. The biggest single optimisation discovery I have personally found across audits is a misconfigured cross-region replication shipping a database in real time to a region that was supposed to be the cold DR target. Six months of $4,000 / month bills before anyone noticed.
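
The monthly audit can be partly automated once you export Data Transfer costs by usage type. The sketch below flags any usage type whose latest month is well above its own trailing average. The usage-type labels, the sample figures, and the 1.5x threshold are all hypothetical; it works on plain numbers and does not call any billing API.

```python
from statistics import mean


def flag_anomalies(history, threshold=1.5, min_usd=100.0):
    """Return (usage_type, baseline, latest) where latest > threshold x baseline.

    `history` maps a usage type to monthly USD costs, oldest first.
    """
    flagged = []
    for usage_type, costs in history.items():
        if len(costs) < 2:
            continue  # not enough history to form a baseline
        baseline = mean(costs[:-1])
        latest = costs[-1]
        if latest >= min_usd and latest > baseline * threshold:
            flagged.append((usage_type, round(baseline, 2), latest))
    return flagged


history = {
    "internet-out": [1200, 1250, 1300, 1280],
    "inter-region": [50, 60, 55, 4000],  # the misconfigured-replication shape
    "inter-az": [300, 310, 305, 320],
}
for usage_type, baseline, latest in flag_anomalies(history):
    print(f"{usage_type}: ${latest:,.0f} vs ${baseline:,.0f} trailing average")
```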

11. The honest summary table

Workload	AWS	GCP	Azure	Best alternative
Internet egress, US source, <10 TB	$0.09/GB	$0.12/GB	$0.087/GB	Cloudflare R2 ($0)
Internet egress, India source, <10 TB	$0.1093/GB	$0.12/GB	$0.10/GB (Zone 2)	Cloudflare R2 ($0)
Inter-region (same continent)	$0.02/GB	$0.02/GB	$0.02/GB	Private connectivity if >5 TB / mo
Inter-region (cross-continent)	$0.05-$0.09/GB	$0.05-$0.08/GB	$0.05-$0.087/GB	Private connectivity, async replication
Inter-AZ same region	$0.01/GB each way	$0.01/GB sender	$0.01/GB	Topology-aware routing
Free-egress on exit (EU Data Act)	Yes, since Mar 2024	Yes, since Jan 2024	Yes, since 2024	One-time only
Object storage egress (heavy CDN origin)	S3 $0.09/GB direct, free to CloudFront	GCS $0.12/GB direct, free to Cloud CDN	Blob $0.087/GB direct, free to Azure CDN	R2 ($0 egress), B2 ($0.01/GB)
CDN egress (cached delivery)	CloudFront $0.085-$0.109/GB	Cloud CDN $0.08-$0.12/GB	Front Door $0.081-$0.087/GB	Cloudflare (flat plan fee)

12. Stage-specific recommendations

Pre-seed (1-5 engineers, <$5k / month cloud bill). Egress is probably 5-10 percent of your bill. Do not over-engineer. Put Cloudflare in front of your origin (free or $25 Pro tier), enable gzip on every endpoint, leave the rest alone. The opportunity cost of optimising egress at this stage is much higher than the dollar savings.

Seed (5-20 engineers, $5k-$30k / month cloud bill). Egress is probably 10-25 percent of bill. Audit the Data Transfer line on Cost Explorer / Billing once a month. If you are serving static assets, move them behind Cloudflare with aggressive caching; if you are running cross-region replication, verify it is necessary and configured efficiently. For media-heavy workloads, evaluate R2 / B2 migration as a one-quarter project.

Series A (20-50 engineers, $30k-$200k / month cloud bill). Egress is probably 20-40 percent of bill. Hire or assign a part-time FinOps owner. Audit inter-AZ traffic patterns (especially Kafka and Kubernetes service mesh). Evaluate Direct Connect / Cloud Interconnect / ExpressRoute for the top 2-3 destinations. Consider negotiating committed egress pricing with your account team; at this volume, 15-25 percent discounts versus published rates are routinely available with a 1-3 year commit.

Series B and beyond. Egress economics start to drive architectural decisions: where to place compute relative to users, whether to operate your own edge POPs (rare, but real at the scale of Netflix, Cloudflare, Spotify), and whether multi-cloud is paying for itself or quietly bleeding 1.6-1.8x on egress with no offsetting benefit.

The trap: free-egress-on-exit makes the day-to-day bill look smaller than it is

The 2024 EU Data Act coverage made cloud egress sound like a solved problem in the press. It is not. The free-egress-on-exit policy applies only when you fully leave a provider, and even then you need to actively request the credit and meet the provider's conditions, which for some providers include closing the account. Daily operational egress to your users, to third-party APIs, to your other cloud, continues to bill at standard tiered rates and remains one of the largest single optimisable line items on any high-traffic SaaS bill.

Treat egress as you would any other unbounded cost driver: instrument it, tag it, alert on anomalies, and assign a clear owner to optimise it. The teams I have seen most surprised by their egress bill are uniformly the teams that had no one looking at it month-over-month.

If you want a second opinion on your egress posture

I run a free 20-minute cloud cost audit for SaaS founders looking at high-traffic workloads. Pull your Cost Explorer / Billing report for Data Transfer for the last 90 days; bring the breakdown; I will give you a ranked list of the three highest-leverage optimisations specific to your architecture, with rough payback timelines. No NDA needed for the first conversation. Send a note.

Avinash S is the founder of MatrixGard. Fractional DevSecOps for pre-seed and seed startups across India, the GCC, the UK, and the US. Almost a decade of running production workloads across AWS, GCP, and Azure, including egress-heavy CDN, media, and data-replication architectures.

Methodology note. All pricing references taken from public AWS, GCP, and Azure pricing pages, plus the public Cloudflare and Backblaze pricing pages, current as of May 2026. Regulatory references taken from the European Commission's Data Act materials and the public AWS / GCP / Azure announcement blogs on free-egress-on-exit. Vendor sales decks and analyst reports were not used. Cloud pricing changes quarterly; verify the specific numbers against the source pages before committing them to a budget. Operational opinions are mine, labelled inline. The summary table aggregates published prices and rounds to the nearest commonly-cited tier; reasonable practitioners working from the same primary sources will arrive at substantially the same conclusions, though stage-specific recommendations vary by workload shape.

PCI DSS 4.0 in 2026: The 9 Most-Missed Requirements for Pre-Seed Fintech CTOs

noreply@matrixgard.com (Avinash S) — Tue, 19 May 2026 08:00:00 GMT

PCI DSS 4.0 has been fully in force globally since March 31, 2025. By May 2026, every entity touching cardholder data, whether a payment-processing startup or an e-commerce shop accepting card payments, is expected to be compliant against the updated standard. Yet most pre-seed and seed fintech teams are still operating against PCI DSS 3.2.1 mental models. The result: their first formal assessment lands with avoidable failures.

This is not another generic walkthrough of the 12 PCI DSS requirement areas. The PCI Security Standards Council publishes those, and they are exhaustive. This is a focused list of the nine requirements I see pre-seed and seed fintechs most often miss when preparing for their first assessment, drawn from public PCI Council documentation and patterns common across early-stage cloud-native startups.

Each section names the requirement, cites its PCI DSS 4.0 reference, explains why startups miss it, and outlines what "passing" actually looks like at the cloud-native engineering level. Where I am stating practitioner opinion rather than the standard's text, I have labelled it inline.

Quick context: what changed in 4.0

PCI DSS 4.0 was published by the PCI Security Standards Council in March 2022. The transition timeline was: PCI DSS 3.2.1 retired in March 2024, and the "future-dated" requirements (the most operationally demanding changes) became mandatory in March 2025. As of 2026, full 4.0 compliance is required for any merchant or service provider in scope.

Four structural shifts matter for the engineering team:

Customised approach: a new option (Appendix D, with supporting templates in Appendix E) that lets entities meet a requirement through alternative controls, provided they document a Targeted Risk Analysis. This is genuine flexibility but it adds documentation overhead.
Continuous focus, not point-in-time: many controls now require ongoing monitoring rather than annual proof.
Multi-factor authentication everywhere: no longer just for admin access.
Stronger cryptography and inventory requirements: full crypto inventories, mandatory keyed hashing, and longer minimum password lengths.

With that frame, here are the nine requirements I see startups miss most.

1. MFA on ALL access to the CDE, not just admin

PCI DSS 4.0 Reference: Requirement 8.4.2

Under 3.2.1, multi-factor authentication was required only for non-console administrative access to the cardholder data environment (CDE) and for remote access. Under 4.0, MFA is required for all access into the CDE, regardless of whether the user is an administrator or a regular employee. This is the single most common gap I see at startups.

The typical failure pattern: the engineering team has MFA enforced on their cloud console (AWS Console, GCP Console) for admin roles via IAM Identity Center or similar. But the backend admin portal that customer support staff use to look up a transaction, a portal that touches cardholder data, only requires a password. That portal is now non-compliant.

What passing looks like: every system that stores, processes, or transmits cardholder data, plus every system connected to the CDE, enforces MFA for all users. For cloud-native startups this typically means: IAM Identity Center with MFA enforced at the SSO layer for all human access, plus application-level MFA on internal admin portals via your auth provider (Auth0, Clerk, WorkOS).

2. Twelve-character minimum passwords

PCI DSS 4.0 Reference: Requirement 8.3.6

The minimum password length under 3.2.1 was seven characters. Under 4.0 it is twelve characters (or eight where the system cannot support twelve), and passwords must contain both numeric and alphabetic characters. Most pre-seed startups still have their authentication providers configured to the seven- or eight-character minimum that was the industry default a decade ago.

This sounds trivial. It is not. Changing minimum length triggers password resets for the existing user base, which means a forced support workload spike on the day of the change. Startups that defer this hit the deadline scramble at month eleven of their compliance prep.

What passing looks like: your IdP (Okta, Azure AD, Auth0) password policy is updated to at least a 12-character minimum (a longer minimum is fine), and the change is rolled out with sufficient communication time so users do not get locked out. A single configuration change in Auth0 or Okta admin, but plan it for a low-traffic week.
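
If you also validate passwords in application code, for example on an internal admin portal, the check is small. The sketch below assumes the length and composition rule in Requirement 8.3.6 as published (12 characters, both numeric and alphabetic). `min_length` is a parameter so a stricter house policy can be set, and the function name is illustrative.

```python
def meets_length_and_composition(password, min_length=12):
    """True if the password is long enough and mixes digits with letters."""
    has_digit = any(c.isdigit() for c in password)
    has_alpha = any(c.isalpha() for c in password)
    return len(password) >= min_length and has_digit and has_alpha


for candidate in ["short1", "onlylettersonly", "correcthorse42"]:
    print(candidate, meets_length_and_composition(candidate))
```

This covers only length and composition. Reuse history, lockout, and breached-password checks are separate controls and belong in the IdP.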

3. Authenticated internal vulnerability scans

PCI DSS 4.0 Reference: Requirement 11.3.1.2

Under 3.2.1, internal vulnerability scans had to be performed quarterly but could be unauthenticated (the scanner did not need to log in to the systems it was scanning). Under 4.0, internal vulnerability scans must be authenticated, meaning the scanner runs with credentials that allow it to inspect the actual configuration of each host.

The failure mode: startups run Nessus, Qualys, or Tenable scans without configuring credentialed scanning, then submit the results as evidence. The auditor flags this immediately. Authenticated scanning surfaces a different (and larger) set of findings, because it can read configuration files, package versions, and patch levels that surface scanning cannot see.

What passing looks like: your vulnerability scanner is configured with a dedicated service account on each in-scope host (or via cloud-native agents) that has read-only access to package managers, registry/config stores, and OS-level metadata. Scans run at least once every three months, and reports are reviewed within an SLA. For containerised workloads this typically means using Amazon Inspector or equivalent.

4. Targeted Risk Analysis documentation

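PCI DSS 4.0 Reference: Requirements 12.3.1 and 12.3.2

Under 4.0, the entity must perform and document a Targeted Risk Analysis (TRA) for every requirement it meets through the customised approach (Requirement 12.3.2, with the approach itself described in Appendix D), and for every requirement that lets the entity choose how frequently an activity is performed (Requirement 12.3.1). For the customised approach, the TRA must justify the risk-equivalence of the alternative control compared to the defined approach.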

Most startups discover this requirement on the day they realise a particular defined-approach control will not work for them. They reach for the customised approach as a workaround, then learn that customised approach requires extensive TRA documentation: threat modelling, control effectiveness analysis, residual risk justification, annual review.

What passing looks like: a documented TRA for each requirement where you deviate from the defined approach. The PCI Council publishes a TRA template; use it. The TRA is a written artefact, not a verbal explanation to the auditor. Annual review is required, so calendar a TRA refresh review every twelve months.

Practitioner opinion: for a pre-seed startup, the customised approach is usually not worth the documentation overhead. Stick to the defined approach wherever possible and only invoke the customised approach for the one or two genuinely awkward controls.

5. Detection of changes to payment pages (anti-skimming)

PCI DSS 4.0 Reference: Requirements 6.4.3 and 11.6.1

This is the single most consequential new requirement in 4.0 for e-commerce merchants and payment-page integrators. The standard now requires:

Req 6.4.3: a mechanism to authorise all scripts loaded on payment pages, plus an integrity check to detect unauthorised script changes.
Req 11.6.1: a change-and-tamper-detection mechanism that alerts the entity to unauthorised modifications of HTTP headers or the payment-page DOM.

The threat being mitigated here is Magecart-style attacks, where a malicious script is injected into a payment page and silently exfiltrates card data to an attacker-controlled domain. Most startups have no monitoring at all on their payment-page integrity.

What passing looks like: implementation of either Content Security Policy with strict source allowlisting, Subresource Integrity (SRI) hashes for every third-party script, or a payment-page monitoring tool (Source Defense, Imperva Client-Side Protection, Akamai Page Integrity Manager) that detects DOM/script changes in real time. For a pre-seed shop the cheapest viable path is CSP plus SRI, configured carefully and tested against the actual payment integration (Stripe, Razorpay, Adyen). Many fintechs offload this entirely to the payment processor by using a hosted payment page (Stripe Checkout, Razorpay Standard Checkout) where the merchant page never directly handles the card data, narrowing PCI scope.
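
For the CSP plus SRI path, the integrity value is just a base64-encoded digest of the exact script bytes. The sketch below computes it with the standard library and renders a script tag. The URL and script content are placeholders, and the helper names are illustrative.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI integrity value such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(src: str, content: bytes) -> str:
    """Render a script tag pinned to the given bytes."""
    return (f'<script src="{src}" integrity="{sri_hash(content)}" '
            f'crossorigin="anonymous"></script>')


# Placeholder script body; in practice, hash the file you actually serve.
print(script_tag("https://cdn.example.com/widget.js", b"console.log('widget');"))
```

SRI only works for scripts whose bytes are stable at a given URL. Scripts that a vendor updates in place will fail the check after the next release, which is one reason processor-hosted payment pages are the simpler route for many teams.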

6. Cryptographic inventory

PCI DSS 4.0 Reference: Requirement 12.3.3

The entity must maintain a documented inventory of all cryptographic cipher suites and protocols in use, reviewed at least annually. This includes both data-at-rest and data-in-transit cryptography, across all systems in scope.

Most cloud-native startups have no formal inventory. They know "we use TLS 1.2 or higher" and "we encrypt with AES-256" but cannot produce a written document listing: which TLS versions are enabled on which load balancers, which cipher suites are accepted, which KMS keys exist, which symmetric and asymmetric algorithms are used by which application, which hash functions are used for password storage, what the key rotation schedule is for each key.

What passing looks like: a single document (typically a spreadsheet or a Confluence page) listing every cryptographic algorithm, cipher suite, and key in use across the in-scope environment, mapped to the system that uses it, the rotation schedule, and the responsible team. Reviewed annually with a documented sign-off. This document is one of the highest leverage compliance artefacts to build early because it surfaces weak-cipher misconfigurations that would have been failures regardless of PCI DSS.
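One way to keep that inventory reviewable is to hold it as structured records and run a small check over it. The sketch below is an assumption about shape, not a PCI-mandated format: the field names, the example systems, and the list of algorithms treated as weak are all illustrative and should be replaced with your own policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: protocols and algorithms this team has chosen to treat as weak.
WEAK = {"SSLv3", "TLS 1.0", "TLS 1.1", "3DES", "RC4", "MD5", "SHA-1"}


@dataclass
class CryptoEntry:
    system: str                   # which in-scope system uses it
    purpose: str                  # e.g. "data in transit", "password hashing"
    algorithm: str                # protocol, cipher suite, or algorithm name
    rotation_days: Optional[int]  # None when no key rotation applies
    owner: str                    # responsible team
    last_reviewed: date


def findings(entries: list[CryptoEntry], today: date) -> list[tuple[str, str]]:
    """Return (system, issue) pairs for weak entries and overdue reviews."""
    issues = []
    for entry in entries:
        if entry.algorithm in WEAK:
            issues.append((entry.system, f"weak algorithm or protocol: {entry.algorithm}"))
        if today - entry.last_reviewed > timedelta(days=365):
            issues.append((entry.system, "annual review overdue"))
    return issues


if __name__ == "__main__":
    inventory = [
        CryptoEntry("public-alb", "data in transit", "TLS 1.0", None, "platform", date(2026, 1, 10)),
        CryptoEntry("ledger-db", "data at rest", "AES-256", 365, "platform", date(2024, 12, 1)),
    ]
    for system, issue in findings(inventory, today=date(2026, 5, 1)):
        print(system, "->", issue)
```

A spreadsheet export can feed the same check; the point is that the weak-cipher and overdue-review findings fall out of the inventory rather than out of the audit.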

7. Anti-phishing controls

PCI DSS 4.0 Reference: Requirement 5.4.1

The entity must deploy automated mechanisms that detect and protect personnel against phishing attacks. This is a new explicit requirement in 4.0; under 3.2.1, anti-phishing was implicit under broader malware-protection language.

Most startups rely on the default phishing protection that ships with Google Workspace or Microsoft 365. That default is good but does not by itself satisfy the requirement. The standard expects active configuration plus visible evidence of detection capability.

What passing looks like: a documented anti-phishing technology stack (the email provider's protection settings, configured rather than at default, plus optionally a dedicated tool like Abnormal Security, Material Security, or Tessian for higher-risk environments) and quarterly phishing simulation runs with results tracked. For a pre-seed team, the cheapest viable path is enabling Google Workspace's advanced phishing protection settings (Strict mode, external sender warnings, encrypted external email warnings) plus running a quarterly phishing simulation via a free tier of KnowBe4 or GoPhish.

8. Manual code review for bespoke software in the CDE

PCI DSS 4.0 Reference: Requirement 6.2.4

Under 4.0, software developed internally for use in the CDE (custom and bespoke software) must be reviewed at least annually using either manual code review by qualified personnel or automated tools (or both). The wording is important: "either" is acceptable, but pure reliance on automated SAST scanning without any manual review is not sufficient if the SAST tool has known limitations on the language or framework used.

The failure pattern: a startup runs GitHub Advanced Security or Snyk Code, generates a clean scan report, and assumes that suffices. The auditor asks: what does the SAST tool's documentation say about its coverage of your stack? If there are known gaps (and there always are: SAST tools struggle with custom DSLs, complex business-logic vulnerabilities, and certain serverless patterns), some level of manual review is required to compensate.

What passing looks like: automated SAST in CI/CD (GitHub Advanced Security, Snyk, Semgrep), plus an annual targeted manual review of the in-scope code paths by either a qualified team member or an external code-review service. Documented review notes, not just the SAST report.

9. Continuous monitoring for service providers

PCI DSS 4.0 Reference: Requirement A.3.5

For entities that meet the definition of a service provider (which includes most B2B fintech startups that process or store cardholder data on behalf of another entity), 4.0 introduces continuous monitoring obligations that go beyond the annual assessment. Service providers must perform and document ongoing reviews of their PCI DSS scope, the in-scope systems, and the effectiveness of their controls.

Pre-seed and seed fintechs often miss this because they treat PCI DSS compliance as a one-and-done event (pass the assessment, file the AOC, ship). The standard now expects ongoing operational rigour: quarterly internal reviews of scope changes, ongoing control effectiveness validation, change-driven re-assessment when the architecture shifts.

What passing looks like: a documented quarterly compliance review cadence with assigned owner, output artefacts (a quarterly compliance status report), and evidence of scope re-validation when significant architectural changes occur. This is calendar discipline more than engineering work, but startups that skip it find themselves scrambling to reconstruct evidence at re-assessment time.

The honest summary table

Most-missed requirement	PCI 4.0 Section	Typical fix effort
MFA on all CDE access, not just admin	8.4.2	1-2 weeks (IdP reconfiguration + comms)
15-character minimum passwords	8.3.6	1 day (IdP config) + 1-2 weeks for rollout
Authenticated internal vulnerability scans	11.3.1.2	2-4 weeks (scanner credentials, agent rollout, baseline)
Targeted Risk Analysis documentation	12.3.1	1-2 weeks per TRA (compounds quickly)
Payment-page change detection (anti-skimming)	6.4.3 / 11.6.1	2-6 weeks (CSP + SRI + monitoring tool)
Cryptographic inventory	12.3.3	1-2 weeks (audit + documentation)
Anti-phishing controls	5.4.1	1 week (Workspace/M365 config + sim setup)
Manual code review for bespoke software	6.2.4	Annual; 1-2 weeks per cycle
Continuous monitoring (service providers)	A.3.5	Ongoing; quarterly cadence

Stage-specific recommendation

If you are a pre-seed fintech (under 15 engineers) just starting PCI DSS scoping: reduce scope first. Use a hosted payment page (Stripe Checkout, Razorpay Standard, Adyen Drop-in) so your application never touches raw card data. This narrows PCI scope dramatically, typically from SAQ D to SAQ A or A-EP. Several of the requirements above either drop out of scope or become straightforward at that lower SAQ tier.

If you are a seed fintech processing card data through your own systems (SAQ D-merchant or SAQ D-service-provider): the nine requirements above are your highest-priority gaps. Order of operations: passwords (Req 8.3.6, fastest) and MFA (Req 8.4.2, near-fastest), then payment-page anti-skimming (Req 6.4.3 / 11.6.1, the highest-risk if missing), then cryptographic inventory (Req 12.3.3, foundational documentation), then the rest in the order above.

If you are a service-provider fintech with enterprise customers asking for an AOC: the continuous-monitoring requirement (A.3.5) is your enterprise-customer-facing signal. Build the quarterly review cadence early. Enterprise procurement teams will ask for evidence of ongoing compliance posture, not just an annual certificate.

The trap: assuming PCI DSS is a 12-month project

The most expensive mistake I see Indian and GCC fintech founders make is treating PCI DSS compliance as a 12-month preparation project culminating in an audit. The reality is closer to: PCI DSS becomes a baseline operational rhythm from the day cardholder data first touches your infrastructure. The annual assessment is just the visible checkpoint.

The startups that pass cleanly are not the ones who hire a consultant for a final-month sprint. They are the ones who built the controls in continuously from week one of touching card data. The gap requirements above are the ones that compound when deferred: passwords, MFA, crypto inventory, and TRA documentation all become exponentially harder to retrofit after the system has grown.

If you want a second opinion on your PCI DSS 4.0 scope and gaps

MatrixGard runs a free 20-minute PCI DSS scope and gap-readiness audit for pre-seed and seed fintech founders. Your specific cardholder data flow, your current SAQ tier, your most likely gaps against the 4.0 requirements, my honest read in 20 minutes. No NDA required for the first conversation. Send a note.

Avinash S is the founder of MatrixGard. Fractional DevSecOps for pre-seed and seed startups across India, the GCC, the UK, and the US. Almost a decade of building, breaking, and securing cloud infrastructure for fintech, healthtech, and SaaS workloads.

Methodology note. All requirement references taken from the PCI DSS v4.0 specification as published by the PCI Security Standards Council. The "most missed" framing is a practitioner opinion based on pattern frequency, not a published PCI Council statistic. Fix-effort estimates are practitioner ranges; actual effort varies with architecture and team maturity. The list is not exhaustive; PCI DSS 4.0 introduces 64 new requirements across its 12 principal requirements, and full compliance requires meeting all applicable controls for the entity's SAQ tier.

AWS vs GCP for Indian Fintech: The 12 Decision Points No One Writes About

noreply@matrixgard.com (Avinash S) — Fri, 15 May 2026 08:30:00 GMT

The standard AWS-vs-GCP comparisons online miss the realities that matter for an Indian fintech building in 2026. Most are written from a US-enterprise perspective. The factors that actually decide cloud choice for an RBI-regulated, India-incorporated fintech serving Indian users with a 5-50 person engineering team are different.

This is the breakdown across 12 decision points, with honest verdicts per factor. Both clouds are good. Neither is universally right. The right answer depends on which of these 12 you weight highest.

I have shipped production workloads on both AWS and GCP across most of the last decade, including India-region workloads with payment, KYC, and compliance scope. What follows is operational opinion grounded in that, plus public AWS, GCP, RBI, and MeitY documentation. Where I am stating opinion rather than fact, I have labelled it as such.

1. India region maturity and latency

AWS opened Mumbai (ap-south-1) in June 2016 and Hyderabad (ap-south-2) in November 2022. Three Availability Zones in Mumbai, three in Hyderabad. The Mumbai region carries almost every AWS service within months of US launch and has the densest CloudFront edge network in India (Mumbai, Chennai, Delhi, Hyderabad, Bengaluru, Kolkata).

GCP opened Mumbai (asia-south1) in November 2017 and Delhi (asia-south2) in July 2021. Three zones each. Service coverage has caught up substantially since 2022, though a handful of services (some newer Vertex AI features, certain Anthos add-ons) still reach the Mumbai region 3-6 months after their US launch.

Verdict for Indian fintech: AWS wins on maturity, especially if you need active-active across two Indian regions for RBI Business Continuity Planning expectations. Hyderabad as a second AWS region is more mature than Delhi as a second GCP region today. Latency to users in Mumbai, Bengaluru, and Delhi is similar from both providers, and the CDN offerings are comparable. Opinion: the maturity gap closes by another 30-50% per year, so by late 2026 this factor becomes near-neutral.

2. RBI Data Localisation and regulatory comfort

The relevant policies for Indian fintech are: RBI Storage of Payment System Data 2018 (payment data must be stored only in India), RBI Master Direction on Outsourcing of IT Services 2023, and the DPDP Act 2023 rules notification.

Both AWS and GCP are listed as eligible cloud service providers in MeitY's empanelment. Both publish RBI-aligned shared-responsibility models. Both offer India-resident customer data isolation, region-locked storage, and contractual commitments around regulator access. Both have walked through actual RBI bank inspections successfully via customers.

The operational difference is in how much paperwork the vendor already has signed for Indian regulators. AWS has had more Indian banks and NBFCs as customers for longer, which means standard MSAs already include RBI-acceptable clauses (data residency, audit rights, exit assistance, supervisory access). GCP has caught up, but for first-time RBI-regulated buyers the AWS legal package is more out-of-the-box.

Verdict: AWS, narrowly, on regulatory comfort. Once GCP has signed an MSA with you that includes the standard RBI clauses, the difference disappears. Plan an extra 2-4 weeks of legal review if you go GCP-first as a regulated Indian fintech.

3. Pricing for fintech-shaped workloads

The default pricing pages mislead. Indian fintech has a workload shape (compute + managed database + KMS + outbound bandwidth for webhooks + log retention) where the real cost lives in three line items: compute commit discounts, managed-database HA, and egress.

For equivalent on-demand compute (general-purpose VMs in Mumbai), GCP n2-standard pricing runs around 10-20% lower than AWS m6i in 2026, before any commit discount. With Committed Use Discounts (CUDs) of 1-year, GCP can drop another 30-35%. AWS Savings Plans (1-year, all-upfront) typically discount 35-50%. The math evens out at the upper commit tier; GCP wins at the no-commit floor.
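The "math evens out" claim can be checked with a few lines of arithmetic. This is a sketch that normalises AWS on-demand to 100 and applies the discount ranges quoted above; the function name is hypothetical and the ranges are the article's, not live prices.

```python
def effective_range(base_low: float, base_high: float,
                    discount_low: float, discount_high: float) -> tuple[float, float]:
    """Return the (cheapest, dearest) effective price after a commit discount.

    The cheapest case pairs the lowest base price with the deepest discount.
    """
    return (base_low * (1 - discount_high), base_high * (1 - discount_low))


# AWS on-demand normalised to 100; 1-year all-upfront Savings Plan at 35-50% off.
aws_committed = effective_range(100, 100, 0.35, 0.50)

# GCP on-demand 10-20% below AWS (80-90); 1-year CUD at 30-35% off.
gcp_committed = effective_range(80, 90, 0.30, 0.35)

if __name__ == "__main__":
    print("AWS committed:", [round(x, 1) for x in aws_committed])
    print("GCP committed:", [round(x, 1) for x in gcp_committed])
```

Under these assumptions both providers land at roughly half to two-thirds of the AWS on-demand price, with the two ranges overlapping almost entirely, while the uncommitted GCP price stays 10-20 points below AWS.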

Managed databases: Cloud SQL for PostgreSQL HA is about 15-25% cheaper than equivalent AWS RDS Multi-AZ for the same vCPU + memory + storage spec in Mumbai region. Aurora pricing is higher than both but you are buying a different engine architecture. Spanner, GCP's globally distributed SQL database, has no AWS equivalent at the same consistency tier (DynamoDB global tables are eventually consistent at the table level; Spanner is strongly consistent at the row level globally).

Egress bandwidth, the line item most fintech founders ignore until the bill arrives: AWS lists Mumbai egress at $0.1093 per GB up to 10 TB/month. GCP lists Mumbai egress at $0.12 per GB up to 1 TB/month, then $0.11 / $0.08 per GB at higher tiers. AWS's Reserved Instances do not reduce egress; GCP's commits do not either. For a webhook-heavy fintech (payment notifications, account updates, sync to external KYC providers) egress can be 15-30% of the monthly bill.
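To see how the tiering plays out for a given month, a small tiered-cost function is enough. The sketch below uses the per-GB rates quoted above; the GCP tier boundary at 10 TB, the 1 TB = 1,000 GB conversion, and the 5 TB example volume are assumptions for illustration, and AWS tiers beyond the first 10 TB are deliberately not modelled because the article does not quote them.

```python
from typing import Optional

Tier = tuple[Optional[float], float]  # (upper bound in GB or None, USD per GB)

# Rates from the text; tier boundaries marked "assumed" are not from the text.
AWS_MUMBAI: list[Tier] = [(10_000, 0.1093)]   # first 10 TB only
GCP_MUMBAI: list[Tier] = [
    (1_000, 0.12),    # first 1 TB
    (10_000, 0.11),   # assumed boundary: 1-10 TB
    (None, 0.08),     # assumed: everything above 10 TB
]


def tiered_cost(gb: float, tiers: list[Tier]) -> float:
    """Bill each slice of volume at the rate of the tier it falls in."""
    cost, lower = 0.0, 0.0
    for upper, rate in tiers:
        top = gb if upper is None else min(gb, upper)
        if top > lower:
            cost += (top - lower) * rate
        if upper is None or gb <= upper:
            return cost
        lower = upper
    raise ValueError("volume exceeds the last tier supplied")


if __name__ == "__main__":
    monthly_gb = 5_000  # hypothetical webhook-heavy month
    print("AWS:", round(tiered_cost(monthly_gb, AWS_MUMBAI), 2))
    print("GCP:", round(tiered_cost(monthly_gb, GCP_MUMBAI), 2))
```

At 5 TB the two list prices come out within a few percent of each other, which is why egress rarely decides the provider but almost always deserves its own budget line.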

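KMS: AWS KMS charges per key ($1/month per CMK) plus per request ($0.03 per 10,000 requests). GCP KMS charges $0.06 per active key version per month plus $0.03 per 10,000 operations. For a fintech with 50-200 CMKs (one per service per environment), the per-key charge is higher on AWS ($50-200/month versus roughly $3-12/month on GCP with one active version per key), but the request charge is the same and the KMS line item stays small on both.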
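A quick worked example, using the list prices above, shows how request volume dominates once traffic is non-trivial. The key count and the 50 million requests per month are hypothetical, the GCP figure assumes one active version per key, and free tiers are ignored.

```python
def kms_monthly_cost(keys: int, requests: int,
                     per_key: float, per_10k_requests: float) -> float:
    """Monthly KMS cost in USD: key storage plus request charges."""
    return keys * per_key + (requests / 10_000) * per_10k_requests


KEYS = 100               # hypothetical: one key per service per environment
REQUESTS = 50_000_000    # hypothetical monthly encrypt/decrypt calls

aws_kms = kms_monthly_cost(KEYS, REQUESTS, per_key=1.00, per_10k_requests=0.03)
gcp_kms = kms_monthly_cost(KEYS, REQUESTS, per_key=0.06, per_10k_requests=0.03)

if __name__ == "__main__":
    print("AWS KMS:", round(aws_kms, 2))
    print("GCP KMS:", round(gcp_kms, 2))
```

With these inputs the request charge is about $150 on either cloud, so the difference between the two totals is just the per-key charge.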

Verdict: GCP cheaper at the no-commit floor and for moderate workloads. AWS competitive at high commit tiers (3-year Savings Plans). Honest call: a single early-stage fintech burning ₹5-15 lakh/month on cloud will save 10-25% on GCP. Past ₹50 lakh/month, the gap closes or reverses depending on commit posture.

4. Database choices that matter for ledger systems

This is the factor where the two clouds diverge most for fintech. The choice is rarely simple.

AWS: Aurora PostgreSQL/MySQL is the workhorse for transactional workloads. Aurora Serverless v2 scales between 0.5 and 256 ACUs without read-replica downtime. DynamoDB for high-throughput key-value, with Global Tables for multi-region. RDS Proxy for connection pooling. Redshift for analytical workloads. The fintech-standard stack is: Aurora for ledger + DynamoDB for hot lookups + S3 + Athena for cold analytics.

GCP: Cloud SQL for PostgreSQL/MySQL is operationally simpler than RDS, but lacks Aurora's high-throughput storage architecture. Spanner is the unique GCP capability, globally distributed strongly-consistent SQL with five-nines SLA, but pricing starts around $0.90/node-hour minimum, so the floor for a non-toy Spanner instance is roughly $650/month. Firestore for document/key-value. BigQuery for analytics, the strongest analytical database on either cloud by significant margin.

For an Indian fintech building a ledger system that needs strong consistency at scale (think: settling cross-border remittances or running an in-house wallet), Spanner is genuinely a category-of-one product. AWS does not have a direct equivalent.

For a fintech building a simpler ledger + reads-heavy analytics workload, BigQuery beats Redshift on time-to-insight and price-per-query for ad-hoc fraud and risk queries.

Verdict: GCP wins on analytics (BigQuery) and globally-distributed SQL (Spanner). AWS wins on the operational maturity of Aurora and the depth of the surrounding ecosystem (RDS Proxy, Aurora Serverless v2 autoscaling). For most Indian fintechs at seed stage, Aurora is the safer default. For a fintech that will live or die on real-time analytics, GCP is the better long-term bet.

5. IAM, credential management, and secret rotation

This is the factor I have the strongest opinion on, having operationally maintained both.

AWS IAM is more powerful, more granular, and more complex than GCP IAM. SCPs at the Organizations level, permission boundaries, resource-based policies, and policy simulators give you control that GCP cannot match. AWS IAM Access Analyzer surfaces unintended external sharing more comprehensively than GCP's IAM Recommender.

GCP IAM is simpler, more opinionated, and frequently safer-by-default. The killer feature: Workload Identity Federation for GKE, which eliminates static service account keys for pods. Pods authenticate as Kubernetes service accounts; GCP IAM maps those to GCP service accounts; no JSON keys distributed, no secrets to rotate. AWS has IRSA (IAM Roles for Service Accounts on EKS) which achieves similar, but the GCP implementation requires less ceremony.

Secret management: AWS Secrets Manager is mature, integrates with Lambda, RDS auto-rotation, and CloudWatch Events for custom rotation hooks. GCP Secret Manager is simpler, with versioning baked in, but lacks the same depth of automated-rotation hooks.

Verdict: GCP wins on default-safety (Workload Identity, simpler IAM, fewer ways to misconfigure). AWS wins on advanced control surface (SCPs, permission boundaries, organization-level governance). For a startup with a 5-15 person engineering team that does not have a dedicated cloud security engineer, GCP's defaults reduce risk. For a fintech that needs fine-grained policy control across hundreds of accounts, AWS is more capable.

6. PCI DSS scope and shared-responsibility nuances

Both clouds carry PCI DSS 4.0 attestation. Both publish the Responsibility Matrix and the AOC (Attestation of Compliance) for download.

The operational difference: AWS Marketplace has more PCI-scope tooling (log management, file integrity monitoring, vulnerability scanners) that integrates AWS-first. The major compliance-automation platforms used by Indian startups (Sprinto, Scrut, Drata, Vanta) all integrate AWS deeply; GCP integrations exist but cover fewer evidence sources. For a fintech going through a first PCI assessment, my estimate is that AWS reduces evidence-collection friction by 20-40%.

Specific PCI DSS 4.0 control areas where AWS has more out-of-box options: log retention with immutability (S3 Object Lock + S3 Glacier for 1-year retention), file integrity monitoring (CloudWatch + Inspector + third-party tools), and network segmentation (more granular Security Group + NACL options than GCP firewall rules).

Verdict: AWS for a first PCI DSS assessment. GCP is fully capable but you will spend more engineering time wiring up evidence collection.

7. Networking for payment-gateway connectivity patterns

Indian fintech needs hybrid connectivity to: bank partners (often via leased lines or MPLS), payment switches (Mindgate, AGS, FSS), KYC providers (Karza, Hyperverge, Signzy), and Aadhaar AUA/KUA infrastructure (UIDAI-mandated VPN tunnels). The cloud needs to support direct-connect to all of these.

AWS Direct Connect has more India-resident colocation partners (CtrlS, NTT, Sify, Reliance Jio) and more pre-existing private connectivity to NPCI, NSE, BSE, and major Indian banks. AWS Transit Gateway as the hub for multi-VPC + on-prem networking is more mature than GCP's equivalent (Network Connectivity Center + Cloud Router).

GCP's Shared VPC is simpler than AWS's account-per-environment VPC peering pattern, and is a genuine operational advantage at the 5-50 engineer scale.

For Aadhaar-bound workloads (eKYC, Aadhaar-linked payouts), both clouds have customers operating UIDAI-approved AUA/KUA architectures. AWS has more documented reference architectures published by Indian fintechs.

Verdict: AWS for hybrid connectivity to Indian banking infrastructure. GCP for cleaner internal networking when you do not need many partner connections.

8. Kubernetes: EKS vs GKE

This is the clearest verdict on the list. GKE wins.

GKE Autopilot mode runs the control plane and node infrastructure for you, billed per-pod. EKS requires you to either run nodes (more ops) or use Fargate (more cost). GKE upgrades, network policy, and HPA work out-of-the-box without the EKS-typical add-on installation ceremony (aws-load-balancer-controller, cluster-autoscaler, external-dns, kube-state-metrics, etc.).

GKE pricing for the managed control plane is comparable to EKS at $0.10/hour per cluster. The hidden cost difference is operational: a typical Indian fintech engineering team will spend 0.5-1 FTE-equivalent on EKS operational toil that simply does not exist on GKE Autopilot.

Verdict: GKE, unambiguously, for any Indian fintech that does not already have deep EKS operational expertise. It is one of the category-of-one products on GCP.

9. Serverless for India-specific bursty workloads

India has bursty traffic patterns that pure serverless suits well: NPS / TDS deadlines, IPL match windows, festival sale events, salary-day banking traffic.

AWS Lambda has the deepest ecosystem (custom runtimes, Lambda Layers, X-Ray integration, Step Functions for orchestration), the largest set of trigger sources, and the most mature observability tooling.

GCP Cloud Run is operationally simpler. Container-based, autoscale to zero, supports any runtime that builds to a container, billed per request + CPU-second. For a fintech that already builds Docker images for its services, Cloud Run is essentially "Lambda but you bring your own runtime, and the pricing model is cleaner." Cloud Run jobs and Cloud Run for Anthos add long-running and Kubernetes-bound variants.

Verdict: Cloud Run for simple HTTP-triggered services where you already have containerised builds. Lambda for event-driven workflows with rich AWS trigger graph (S3, DynamoDB Streams, SQS, EventBridge). Most Indian fintechs will use both eventually; pick by where the first 5 services need to live.

10. Security observability and threat detection

AWS approach: a stack of independent services. GuardDuty (threat detection), Security Hub (aggregation + CIS benchmark), AWS Config (configuration drift), Amazon Inspector (vulnerability scanning), Macie (data classification), Detective (forensics), Audit Manager (compliance evidence). Each is good. Together, they are powerful but require integration effort.

GCP approach: Security Command Center as the unified pane. Bundled threat detection, vulnerability findings, sensitive-data discovery, posture management, and IAM Recommender all in one product. The Premium tier (required for most of the value) is expensive, but covers what AWS spreads across 5-7 separate services.

For a small fintech team (1-3 engineers responsible for cloud security), GCP's unified surface reduces operational fragmentation. For a larger team with a dedicated security engineer, AWS's specialised services give more depth per domain.

Verdict: GCP Security Command Center wins for small-team operational simplicity. AWS wins for advanced specialisation.

11. Indian talent availability

The hiring market is the factor most cloud-comparison articles ignore. For Indian fintech building in 2026, it is one of the most important.

AWS-certified engineers in India outnumber GCP-certified engineers roughly 5-7 to 1, based on public certification numbers, LinkedIn job posting data, and Naukri search ratios. AWS Solutions Architect is the most common cloud certification on Indian engineering resumes. GCP Professional Cloud Architect is rarer, and commands a 15-25% salary premium in 2026 because supply is constrained.

What this means operationally: if you build on AWS, you can hire mid-level cloud engineers from a pool of ~150,000 in India. If you build on GCP, the pool drops to ~25,000-40,000, and they are more expensive. For senior platform engineers (5+ years cloud-native), the gap narrows somewhat as senior engineers tend to be cloud-agnostic, but the rate premium for GCP senior is real.

The flip side: GCP engineers are often more recent (the certification programmes are newer), and the Indian GCP community runs a tighter set of regular meetups and conferences (GDG, Google Cloud Next India). The talent pool is small but higher-engagement on average.

Verdict: AWS for ease of hiring at mid-level. GCP for a smaller, more recent, more expensive pool. If your hiring runway is short, this factor alone may push you to AWS.

12. Marketplace and ecosystem

The AWS Marketplace has more compliance, security, and observability ISVs available with INR billing through Indian resellers. The major compliance-automation platforms used by Indian startups (Sprinto, Scrut, Drata, Vanta) integrate AWS first; GCP integrations exist but cover fewer evidence sources.

Indian managed-service-provider (MSP) ecosystem: AWS has the larger India MSP community by 3-4x. If you plan to outsource cloud operations to an Indian MSP (TCS, Infosys, Wipro, smaller specialists like Minfy, Searce, BluePi), AWS is the more common skill set.

GCP's marketplace has caught up substantially in 2024-2025 with the launch of GCP Marketplace India billing, but the depth of third-party offerings still trails AWS by roughly 2-3x in count.

Verdict: AWS for ecosystem depth and Indian MSP availability. GCP for native Google integrations (Workspace, BigQuery, Looker).

The honest summary table

Decision factor	AWS	GCP	Lean
India region maturity	2 regions, longer history	2 regions, catching up	AWS
RBI regulatory comfort	More pre-signed MSA paperwork	Capable but newer for Indian regulated buyers	AWS
Pricing (no commit)	Higher floor	10-20% cheaper floor	GCP
Pricing (3-year commit)	Aggressive Savings Plans	Strong CUDs	Roughly even
Ledger DB	Aurora, mature	Spanner, unique at scale	Depends on workload
Analytics DB	Redshift	BigQuery	GCP
IAM (default safety)	Powerful, complex	Simpler, safer defaults	GCP
IAM (advanced control)	SCPs, permission boundaries	Simpler, less granular	AWS
PCI DSS evidence collection	Deeper marketplace tooling	Fewer integrations	AWS
Hybrid connectivity (India banks)	More Direct Connect partners	Cleaner internal VPC model	AWS
Kubernetes	EKS, more ops	GKE Autopilot, less ops	GCP
Serverless	Lambda ecosystem	Cloud Run simplicity	Depends on workload
Security observability	Specialised, fragmented	Unified Security Command Center	GCP for small teams
Indian talent pool	5-7x larger	Smaller, more expensive	AWS
Marketplace + MSP	Deeper	Newer, narrower	AWS

The honest recommendation depending on your fintech stage

If you are a seed-stage Indian fintech with under 15 engineers and your first compliance gate is PCI DSS or RBI Master Direction: default to AWS. Lower legal friction, deeper ecosystem, easier hiring. The savings on GCP do not yet outweigh the operational overhead of a smaller talent pool and fewer integrations.

If you are a fintech where analytics and risk modelling are core differentiators: seriously consider GCP. BigQuery is enough of a category-of-one product that the rest of the trade-offs become acceptable.

If your engineering team has strong Kubernetes preferences and wants to spend zero time on cluster operations: GKE Autopilot makes GCP the better choice on day one, and the operational savings compound.

If you are building a globally-distributed ledger or a strong-consistency cross-region payment switch: Spanner is the right tool, and Spanner only exists on GCP.

If none of the above are decisive: AWS as default for Indian fintech in 2026, GCP for specific workloads where the unique capabilities (Spanner, BigQuery, GKE Autopilot) carry real weight.

The trap: defaulting to both

The mistake I see most often with Indian fintechs at the 30-50 engineer stage is "multi-cloud by accident." One team builds on AWS, another picks GCP for an analytics project, two years later the SRE team is maintaining two sets of IAM, two sets of networking, two sets of monitoring, two sets of compliance evidence. Cost increases roughly 1.6-1.8x for the same workload because the commit discount is split across two providers.

Pick one as primary. Use the other for one specific workload where the unique capability justifies the operational overhead. Resist the rest. Multi-cloud as a strategy is rarely a fit for a seed-stage Indian fintech; it is most often a sign that platform decisions were made by feature-team consensus rather than by an architect with the operational picture.

If you want a second opinion on your specific stack

I run a free 20-minute cloud audit for Indian fintech founders evaluating cloud choices. No NDA needed for the first conversation. Your specific workload, your specific compliance gates, my honest read on AWS vs GCP for your situation. Send a note.

Methodology note. Pricing references taken from public AWS and GCP pricing pages as of May 2026; numbers shift quarterly. Regulatory references taken from public RBI, MeitY, and IRDAI notifications. Operational opinions are mine, labelled inline. Where I have stated a verdict, the underlying tradeoffs are documented above; reasonable practitioners can weight them differently and arrive at the opposite call.

AWS S3 Block Public Access: Four Settings, What Each One Does, and Why You Need All Four

noreply@matrixgard.com (Avinash S) — Tue, 12 May 2026 17:30:00 GMT

The pattern doesn't start with a hacker. It starts with a developer in a hurry.

Someone needs to share a file with a vendor. They right-click the S3 object, click "Make public," see it works, move on. Six weeks later, a security researcher with a search index finds the URL.

That's how most S3 incidents actually begin. The breach is a checkbox that got flipped by someone who didn't know what the checkbox protected against.

AWS knows this. In November 2018, they shipped a feature called Block Public Access to fix it. In April 2023, they made the strict version the default for every new bucket. In 2026, public S3 misconfigurations still appear regularly in disclosed breaches, often on buckets created before 2023 or accounts where Block Public Access was deliberately switched off.

This post is the boring reference your team should have read before configuring a bucket. Four settings, what each one does, and why none of them are individually enough.

The four settings

AWS Block Public Access is a set of four boolean controls. They sit at two levels: the AWS account and the individual bucket. The four:

Setting	What it blocks
`BlockPublicAcls`	New ACLs that grant public access. Existing public ACLs continue to work.
`IgnorePublicAcls`	All public ACLs are ignored at evaluation time. Public ACLs continue to exist but have no effect.
`BlockPublicPolicy`	New bucket policies that grant public access.
`RestrictPublicBuckets`	Cross-account and anonymous public access through bucket policies, regardless of policy contents.

These four are layered, not redundant. Each blocks a different way an S3 object can become public.

One. BlockPublicAcls

S3 has two access models. Bucket policies are JSON IAM-style documents. Bucket ACLs are an older system Amazon kept around for compatibility. ACLs let you grant access to specific AWS accounts, the bucket owner, the special AllUsers group (everyone on the internet), or the special AuthenticatedUsers group (anyone with an AWS account).

BlockPublicAcls=true prevents new ACLs being applied that grant access to AllUsers or AuthenticatedUsers. It also blocks PUT Object requests that include an ACL grant to those groups, and PUT Object requests with --acl public-read arguments. The API call returns AccessDenied instead of silently succeeding.

Important: this setting does not retroactively remove public ACLs that already exist. If a developer set an ACL last year before the setting was enabled, the object is still public until the ACL is removed.

Two. IgnorePublicAcls

This is the retroactive fix. IgnorePublicAcls=true tells S3 to treat any existing public ACL as if it doesn't exist when an access request comes in. The object stays in the bucket, the ACL stays on the object, but the public read never resolves.

Most teams enable BlockPublicAcls and IgnorePublicAcls together. The first blocks new mistakes. The second neutralises old ones.

Three. BlockPublicPolicy

ACLs are one path to a public object. Bucket policies are the other. A bucket policy that allows s3:GetObject to Principal: "*" makes every object in the bucket world-readable.

BlockPublicPolicy=true rejects any new bucket policy that would grant public access. Existing public policies continue to operate. This blocks the most common path teams take to share a bucket with the world: pasting a public-bucket policy template from Stack Overflow.
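To make the "public-bucket policy template" concrete, here is a minimal sketch of such a policy and a rough check for it. The bucket name `example-bucket` and the helper names are hypothetical, and the heuristic is far simpler than the evaluation AWS performs when it decides whether a policy is public. Treat it as an illustration of the shape to look for, not a replacement for the real check.

```python
import json

# The kind of policy the text describes. "example-bucket" is a placeholder.
PUBLIC_READ_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
""")


def has_wildcard_principal(statement: dict) -> bool:
    """True if the statement's Principal is "*" or {"AWS": "*"}."""
    principal = statement.get("Principal")
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        return aws == "*" or (isinstance(aws, list) and "*" in aws)
    return False


def looks_public(policy: dict) -> bool:
    """Rough heuristic (an assumption, not AWS's logic): any Allow statement
    with a wildcard principal and no Condition block."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    return any(
        s.get("Effect") == "Allow"
        and has_wildcard_principal(s)
        and "Condition" not in s
        for s in statements
    )
```

Running `looks_public` over the policies you already have attached is a quick first pass. Anything it flags deserves a manual read.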

Four. RestrictPublicBuckets

The strictest of the four. When enabled on a bucket that has a public policy, S3 limits access to AWS service principals and authorized users within the bucket owner's account. The bucket can still have a public policy attached. The policy is just non-functional for anonymous and cross-account callers.

This is the setting that protects you from a bucket policy that already exists and grants public access. BlockPublicPolicy prevents new ones. RestrictPublicBuckets neutralises old ones.
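The division of labour between the four settings is easier to see as a toy model. The sketch below is a deliberate simplification with hypothetical names. It is not how S3 evaluates requests. It only encodes the rule described above: two settings block new public grants, two neutralise existing ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockPublicAccess:
    """The four Block Public Access booleans (toy model, all off by default)."""
    block_public_acls: bool = False
    ignore_public_acls: bool = False
    block_public_policy: bool = False
    restrict_public_buckets: bool = False


def accepts_new_public_acl(bpa: BlockPublicAccess) -> bool:
    """BlockPublicAcls rejects new ACLs that grant public access."""
    return not bpa.block_public_acls


def accepts_new_public_policy(bpa: BlockPublicAccess) -> bool:
    """BlockPublicPolicy rejects new bucket policies that grant public access."""
    return not bpa.block_public_policy


def anonymous_read_allowed(
    bpa: BlockPublicAccess,
    existing_public_acl: bool,
    existing_public_policy: bool,
) -> bool:
    """An existing public grant only works if its 'neutraliser' is off."""
    via_acl = existing_public_acl and not bpa.ignore_public_acls
    via_policy = existing_public_policy and not bpa.restrict_public_buckets
    return via_acl or via_policy
```

With only the two "block new" settings on, `anonymous_read_allowed` still returns `True` for an object that already carries a public ACL. That is the gap the other two settings close.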

Two levels, not one

These four settings can be configured at the bucket level and at the account level. The account level is an envelope that applies to every bucket.

If BlockPublicAcls=true is set at the account level, every bucket in the account behaves as if it had BlockPublicAcls=true, regardless of what the bucket-level setting says. S3 applies the most restrictive combination: each setting is effectively the OR of its account-level and bucket-level values.
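The OR rule fits in a few lines. This is a sketch of the combination logic only. The function name is hypothetical, and the dictionaries stand in for the settings you would read from each level.

```python
SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def effective_settings(account: dict, bucket: dict) -> dict:
    """A setting is effective if it is true at the account OR the bucket level.
    Missing keys are treated as false."""
    return {
        name: bool(account.get(name, False)) or bool(bucket.get(name, False))
        for name in SETTINGS
    }
```

An account with all four on makes the bucket-level values irrelevant. An account with nothing set leaves every bucket to fend for itself.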

This matters because most accidental exposures happen at the bucket level. A developer with s3:PutBucketPublicAccessBlock permission can disable the bucket setting and turn the bucket public. They cannot do the same at the account level without s3:PutAccountPublicAccessBlock, which is normally restricted to a small group.

The clean rule: set all four at the account level, and only allow exceptions case by case. Most teams skip the account-level step. That's the gap.

The April 2023 default change everyone forgets

In April 2023, AWS changed the defaults for new S3 buckets. All four Block Public Access settings now default to true. ACLs are disabled by default. A new bucket created after the change has both protections on out of the box.

This sounds like the end of the problem. It isn't, for three reasons:

Pre-2023 buckets retain their old configuration. A bucket created in 2019 with all four settings off is still that way unless someone explicitly remediated it.
Account-level defaults were not changed automatically. Your account-level Block Public Access settings are whatever you set them to when you opened the account, or all-off if you never touched them.
The defaults only protect against accidental public access. Deliberately public buckets (static website hosting, public CDN origins) are still common, and once a bucket is intentionally public, every object inside inherits the risk.

The pattern we still see: an Indian seed startup creates an AWS account in 2021, makes a bucket public for a CDN, leaves account-level Block Public Access off, then later creates a private bucket assuming "AWS defaults are safe now." The new bucket is fine. The old one isn't. Account-level was never enabled.

The DPDP and RBI angle

For an Indian startup, public S3 isn't just a security mistake. It's a regulatory event.

Under the DPDP Act 2023, a Data Fiduciary is liable for personal data exposure regardless of intent. The penalty for a significant breach can reach Rs 250 crore. "We left a bucket public by accident" is not a defence under the Act. The duty is to maintain reasonable security safeguards, and exposing personal data through misconfigured S3 fails that test.

For RBI-regulated fintechs, the same exposure also triggers reporting obligations under the Cyber Security and Resilience Framework. The clock starts the moment the misconfiguration is discovered, internally or externally.

The technical fix for both regimes is the same: turn all four Block Public Access settings on, at the account level, and audit existing buckets for pre-2023 settings.

The five-minute audit

For each AWS account you operate:

# Check account-level Block Public Access
aws s3control get-public-access-block --account-id YOUR_ACCOUNT_ID

# Check every bucket
aws s3api list-buckets --query "Buckets[].Name" --output text | \
  tr "\t" "\n" | while read -r bucket; do
    echo "--- $bucket ---"
    aws s3api get-public-access-block --bucket "$bucket" 2>&1
  done

If any of the four settings returns false, or the API returns NoSuchPublicAccessBlockConfiguration, that bucket is in the danger zone unless the account-level settings already cover it.
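If you capture the JSON from those calls, the pass/fail rule can be automated. The sketch below assumes you pass in the raw JSON string from `get-public-access-block`, or `None` when the call failed with NoSuchPublicAccessBlockConfiguration. The function name is hypothetical.

```python
import json
from typing import List, Optional

REQUIRED = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_settings(cli_json: Optional[str]) -> List[str]:
    """Return the settings that are not explicitly true.

    cli_json is the JSON output of get-public-access-block, or None if the
    call returned NoSuchPublicAccessBlockConfiguration (nothing configured).
    """
    if cli_json is None:
        return list(REQUIRED)
    config = json.loads(cli_json).get("PublicAccessBlockConfiguration", {})
    return [name for name in REQUIRED if config.get(name) is not True]
```

An empty list means that level is fully locked down. Anything else is the list of settings to fix.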

The remediation in the AWS Console: S3, Block Public Access settings for this account, Edit, tick all four, Save. Then for each bucket that's intentionally public, document why, and add an exception only at the bucket level.

What this doesn't cover

Block Public Access is necessary, not sufficient. It does nothing about:

Pre-signed URLs that leak personal data
IAM users with overly broad S3 permissions
Cross-account bucket sharing granted to specific AWS accounts through bucket policies or ACLs
Data accidentally written to a bucket that was never meant to hold it
Server-side encryption gaps

If you want the rest of the layered defence, that's the AWS Security Baseline for Indian Startups we maintain. Block Public Access is one of nine controls in it.

TL;DR

Four settings: BlockPublicAcls, IgnorePublicAcls, BlockPublicPolicy, RestrictPublicBuckets. Each blocks a different path to a public object. None of them work alone. Set all four, at the account level, for every AWS account you run.

For Indian operators, this is also a DPDP control. Treat it that way.

I Audited Five OTT Platforms With Browser Devtools. The Cache Headers Told a Story.

noreply@matrixgard.com (Avinash S) — Thu, 07 May 2026 06:30:00 GMT

A few weeks ago I was watching a cricket match on my phone. The stream dropped to what looked like 480p mid-over.

I cursed my wifi. Then I started wondering whether it actually was my wifi.

So I spent three weeks running technical audits across five OTT streaming platforms. Standard browser developer tools, signed in as a paying or registered user. No DRM bypass, no unauthorized access, no clever exploits. Just the network panel, the Performance API, and a careful eye on what each platform's player was actually doing on the wire.

What I found was less about whose stream is "best." It was about how differently platforms make architectural choices when solving the same problem: get video to a paying user reliably.

Same technical problem. Five completely different answers.

This piece pulls together what I observed. Platforms are anonymized A through E. The methodology section at the bottom explains what was measured and what wasn't.

The cache TTL finding that surprised me most

Streaming video works by chopping content into small segments (2 to 10 seconds each) and delivering them on demand. The CDN caches these segments at edge locations close to viewers. How long a segment stays in cache is set by a Cache-Control: max-age header.

Long cache: origin server gets hit rarely, costs are low. Short cache: every edge node goes back to origin each time the TTL expires, and origin costs climb with it.

Across the five platforms, segment cache TTLs ranged from 5 minutes to nearly a year for the same kind of asset.

Platform	Manifest TTL	Segment TTL
A (global hyperscale)	Signed, ~1hr expiry	Signed, ~1hr expiry
B (Indian market leader)	37 minutes	~1 year
C (Indian, mid-market)	2 minutes	5 minutes
D (Indian, regional)	~3 months	~3 months
E (global hyperscale)	Signed via private protocol	Signed

Read that table again.

Platform B caches each video segment for nearly a year. Platform C caches the same kind of object for five minutes. Both serve Indian users. Both run on commercial CDNs.

The difference is a deliberate engineering choice with massive cost implications.

A segment cached for a year hits origin roughly once per edge node and serves from edge for every viewer after that. A segment cached for 5 minutes hits origin every five minutes per edge node, multiplied by every edge node serving traffic. At scale, this is the difference between a CDN bill that works and one that doesn't.
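The arithmetic is worth doing once. The sketch below assumes continuous demand (every edge node re-fetches a segment as soon as its cached copy expires) and no origin shield or tiered caching. The edge-node count of 100 is a made-up figure for illustration, not something I measured.

```python
SECONDS_PER_DAY = 86_400


def origin_fetches_per_day(ttl_seconds: int, edge_nodes: int) -> float:
    """Upper bound on origin fetches per segment per day.

    Assumes each edge node re-fetches on every TTL expiry and that no
    shield layer sits between the edge and the origin.
    """
    return edge_nodes * SECONDS_PER_DAY / ttl_seconds


five_minutes = origin_fetches_per_day(ttl_seconds=300, edge_nodes=100)
one_year = origin_fetches_per_day(ttl_seconds=31_536_000, edge_nodes=100)
```

Under these assumptions the five-minute TTL produces 28,800 origin fetches per segment per day across the hypothetical 100 nodes. The one-year TTL produces well under one.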

The reason Platform B can cache aggressively: they treat segments as immutable. Once packaged, never changed. Platform C re-validates them constantly, probably out of caution about content updates, but the caution is unnecessary if your packaging pipeline is right.

This choice doesn't show up on any architecture diagram. But it separates teams that have thought hard about CDN economics from teams that haven't.
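A minimal sketch of what "segments are immutable" looks like as an origin header policy. The file extensions and the function name are assumptions for illustration. Real pipelines will have their own naming, and the manifest TTL is a judgment call.

```python
from pathlib import PurePosixPath

MANIFEST_SUFFIXES = {".m3u8", ".mpd"}      # HLS and DASH manifests
SEGMENT_SUFFIXES = {".ts", ".m4s", ".mp4"}  # assumed segment extensions
ONE_YEAR_SECONDS = 31_536_000


def cache_control_for(path: str, manifest_max_age: int = 0) -> str:
    """Pick a Cache-Control value by asset type.

    Segments are treated as immutable and cached for a year. Manifests get a
    short TTL, or no-cache when manifest_max_age is 0. Anything unrecognised
    falls back to no-cache.
    """
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in SEGMENT_SUFFIXES:
        return f"public, max-age={ONE_YEAR_SECONDS}, immutable"
    if suffix in MANIFEST_SUFFIXES and manifest_max_age > 0:
        return f"public, max-age={manifest_max_age}"
    return "no-cache"
```

The point of the split is that a manifest can change (new renditions, corrected tracks) while a packaged segment never should. If a segment does need to change, it gets a new URL.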

URL signing: the security layer most platforms skip

When you watch a video, your player fetches segment URLs from the CDN. Whether those URLs are signed determines whether they can be shared.

Platform B signs every segment URL with an HMAC token that expires in about an hour. The URL is bound to a session. Try to use it from a different IP or after expiry, and you get a 403.

Platforms C and D ship plain, unsigned URLs.

Anyone who pulls a URL from their browser's network panel can paste it into another browser, on another network, and stream the content directly. With Platform D's months-long cache TTL, a leaked URL stays valid for an absurdly long time.

The DRM on the segment bytes still protects against re-distribution of decrypted content. But unsigned URLs eliminate the first layer of defense. They make scraping easier. They make casual sharing trivially possible. They turn the CDN into a public file server with extra steps.

Most platforms that skip URL signing aren't doing it deliberately. They inherited a CDN config that didn't include token authorization, and nobody went back to fix it.
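For teams that have never seen one, a signed segment URL can be sketched in a few lines. This is a generic HMAC scheme with hypothetical parameter names (`expires`, `token`). It is not the format any platform in the audit uses, and each commercial CDN has its own token syntax that you should follow instead.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit


def _token(path: str, client_ip: str, expires_at: int, secret: bytes) -> str:
    """HMAC-SHA256 over the path, the client IP, and the expiry timestamp."""
    message = f"{path}|{client_ip}|{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path: str, secret: bytes, client_ip: str, expires_at: int) -> str:
    """Append an expiry and a token bound to this path and client IP."""
    token = _token(path, client_ip, expires_at, secret)
    return f"{path}?{urlencode({'expires': expires_at, 'token': token})}"


def verify_url(url: str, secret: bytes, client_ip: str, now: int) -> bool:
    """What the edge would check before serving; False maps to a 403."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires_at = int(query["expires"][0])
        token = query["token"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    expected = _token(parts.path, client_ip, expires_at, secret)
    return hmac.compare_digest(token, expected)
```

Binding the token to the client IP is what makes a copied URL fail on another network. The expiry is what makes it fail an hour later.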

Where auth tokens live

This is the finding that surprised me least but matters most.

Every modern web platform stores a session token somewhere on the client. Two options: a cookie marked httpOnly (JavaScript on the page cannot read it), or localStorage (any JavaScript on the page can read it).

The pattern was striking:

Platform	Auth storage
A	httpOnly cookies only
B	httpOnly cookies only
C	Tokens duplicated across cookies and localStorage
D	OAuth2 access and refresh tokens in localStorage
E	httpOnly cookies + private protocol

Why does this matter?

If anyone successfully injects JavaScript into the platform's pages, through stored XSS, a compromised third-party SDK, or a malicious browser extension, they can read whatever's in localStorage and exfiltrate it. They cannot read httpOnly cookies. Injected script can still make requests that carry the cookie while the page is open, but it cannot read the raw token or send it elsewhere.

Refresh tokens are the highest-stakes case. An access token is usually short-lived. A refresh token might be valid for days or weeks. An attacker who exfiltrates a refresh token can mint new access tokens long after the user has logged out and gone to bed.

Platforms that get this wrong usually have an architectural reason. A third-party SDK or a legacy OAuth flow that needed JavaScript access at some point. The fix is well-documented. The cost of not fixing it scales with your XSS exposure, which scales with your third-party JS footprint.
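For reference, this is roughly what the server-side half of the fix looks like. The sketch uses Python's standard library to build a Set-Cookie value. The cookie name `session` and the one-hour lifetime are assumptions, not values from any audited platform.

```python
from http.cookies import SimpleCookie


def session_cookie_header(token: str, max_age_seconds: int = 3600) -> str:
    """Build a Set-Cookie value that page JavaScript cannot read."""
    cookie = SimpleCookie()
    cookie["session"] = token
    morsel = cookie["session"]
    morsel["httponly"] = True       # hidden from document.cookie
    morsel["secure"] = True         # only sent over HTTPS
    morsel["samesite"] = "Strict"   # not sent on cross-site requests
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    return morsel.OutputString()
```

The token never has to touch localStorage. The browser attaches it to requests on its own, and script on the page has nothing to steal.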

This is one of those "the cost is invisible until something goes wrong, and then the cost is enormous" patterns.

Player choices: build, buy, or wrap

Three strategies for getting a video player on your platform.

Build it yourself. Platform A built Cadmium, an entirely proprietary player that talks to its CDN over a private protocol. Platform E went the same route. Multi-year investment, dedicated player team, only justified at hyperscale.

Buy a vendor. Platform D uses a commercial player engine bundled into their app. The vendor handles the player, the DRM integration, the ABR controller. The platform handles UI and CMS.

Wrap an open-source player. Platform B uses Shaka Player (Google maintains it) under their own branded wrapper with custom telemetry, DRM orchestration, and UI. Platform C does the same with Video.js.

For the longest time I assumed the "best" platforms wrote their own players. The audit data corrected me.

Platform B is widely considered best-in-class for its market. They use off-the-shelf Shaka with a thin wrapper. They wrote the parts that matter (telemetry, ABR memory, DRM caching) and let Google maintain the player engine.

If you're building an OTT at any scale below Netflix, you almost certainly don't need to write a player from scratch. Pick an open-source engine, wrap it well, ship it.

CDN topology: owning vs renting the wire

This is where Platform A is in a class of its own.

Most platforms (B, C, D) use commercial CDNs. Akamai, CloudFront, Cloudflare. Their video segments live on the CDN's edge servers, which are geographically distributed but run by the CDN, not the platform.

Platform A built and operates Open Connect Appliances. Physical servers shipped to ISPs, who install them inside their own networks.

When you watch Platform A's content from a major Indian ISP, your video doesn't traverse the public internet. It comes from a Platform A appliance physically located inside the ISP's data center, on the ISP's own network, often with zero transit cost.

The hostnames told the story. I observed segments served from clusters in two different Indian cities, inside two different ISPs, simultaneously, on a single playback session. The platform's client was steering between four different appliances mid-playback based on conditions I couldn't see.

This is a 10+ year capital investment that no other platform in my audit comes close to matching. It's not replicable at small scale, and it's not even strictly necessary at small scale.

But it explains why Platform A's streams feel different. They're physically closer to the user than anyone else's, by a wide margin.

Telemetry: centralized vs federated

How does each platform know what's happening with your stream? They send telemetry beacons.

Platform A: small number of beacons per session, all to its own first-party endpoint, in JSON, with an outbox pattern (failed sends queued in localStorage and retried). Telemetry treated as a first-class engineering concern.

Platform B: beacons in Protobuf (a binary wire format) to a single first-party endpoint. Response acknowledgment is two bytes. Beacons are 5 to 12 KB. Under surge conditions, this matters. Telemetry itself becomes a load source if you're not careful.

Platforms C, D, and others: beacons fanned out to multiple third-party SDKs simultaneously. Mixpanel, CleverTap, NPAW Youbora, Branch.io, Facebook, Google Analytics, Comscore, Conviva, AppsFlyer. One platform's watch page made requests to over 30 distinct hosts.

There's a cost to this federation.

During my audit, one platform's video QoE telemetry endpoint was returning HTTP 503 errors. Their pipeline was broken at the moment I measured it, and presumably had been for some time without detection.

Centralized telemetry has fewer points of failure than federated telemetry, and it is easier to observe.

The pattern is consistent. Platforms that take observability seriously consolidate. Platforms that treat telemetry as a checkbox spray it across vendors.
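The outbox pattern mentioned for Platform A is simple to sketch. The version below keeps the queue in memory. In a browser it would be persisted (localStorage, in Platform A's case) so it survives a reload. The class and method names are hypothetical, and `send` stands in for whatever transport posts a beacon and reports success.

```python
from collections import deque
from typing import Callable, Deque, Dict


class TelemetryOutbox:
    """Queue beacons locally and retry failed sends in order."""

    def __init__(self, send: Callable[[Dict], bool], max_pending: int = 100):
        self._send = send
        # When the queue is full, deque(maxlen=...) drops the oldest event.
        self._pending: Deque[Dict] = deque(maxlen=max_pending)

    def emit(self, event: Dict) -> None:
        """Queue the event, then try to drain the queue."""
        self._pending.append(event)
        self.flush()

    def flush(self) -> int:
        """Send queued events oldest-first; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)
```

The useful property is that a broken endpoint shows up as a growing queue on your side rather than as silently lost data.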

Accessibility: the largest gap I observed

I expected to find architectural differences. I didn't expect the gap on accessibility to be this stark.

For a single drama series episode:

Platform	Audio tracks	Subtitle tracks	Audio descriptions
A	35 across 23 languages	42 across 33 languages	14 tracks
B (Indian leader)	1 (English)	1 (English)	None
C (Indian)	1 (regional language)	1 (regional language)	None
D (Indian regional)	1 (English, on a regional drama)	1 (English)	None
E	Multiple	Multiple	Not measurable

Platform A's catalog has been built for a global multi-language audience for over a decade, and it shows.

Platform D, which positions itself as a regional Indian OTT, shipped English-only audio on a regional-language drama series. That's either a packaging mistake on the title I watched, or a capability gap, or a cost choice. Whichever it is, it directly contradicts the platform's stated regional positioning.

Audio descriptions, narration tracks for visually impaired viewers, are present on exactly one of the five platforms. Fourteen tracks across multiple languages on Platform A. Zero on the others.

Accessibility is the dimension where the gap between "platform that takes its users seriously" and "platform that ships the minimum" is most visible.

It's not a hard problem. It's a priority.

What this means if you're building a streaming platform

A few patterns worth taking seriously.

Cache asymmetry is your friend. Manifests should not be cached. Segments should be cached forever, or close to it. They have completely different lifecycles and need completely different cache strategies.

Sign your segment URLs. Every CDN supports it. There's no good reason to ship plain URLs in 2026.

Keep auth out of localStorage. httpOnly cookies have been the right answer for fifteen years. The exceptions are vanishingly rare and almost always trace back to a third-party SDK someone forgot to question.

Don't write a player from scratch unless you're at hyperscale. Wrap Shaka or hls.js. Spend your engineering on the parts users actually feel: telemetry, ABR memory, DRM caching, UI.

Centralize your telemetry. If you're sending the same events to five vendors, you're paying five times for the same insight, debugging five integrations, and giving five third parties access to your user data. Pick one. Build the rest yourself.

Treat accessibility as core, not as an add-on. Multi-language audio and subtitles aren't extras for a global platform. They're the product.

Methodology

All observations were made via standard browser developer tools while signed in as a paying or registered user. No DRM was bypassed. No access controls were circumvented. No license server payloads were captured beyond noting that requests fired and to which endpoints.

Platform identities are anonymized. Findings that could uniquely identify a platform have been described in general terms or omitted.

Single VOD title per platform, on desktop Chrome, on a residential Indian connection. Network throttling and mobile network behavior were not in scope.

If you're building or scaling an OTT, talk to us

The wire tells stories the marketing doesn't. If you recognized your platform in the audit above (good or bad), or if you're building one and want a second set of engineering eyes on your architecture, that is exactly what MatrixGard does.

We do read-only infrastructure audits across cloud, security, and delivery layers. Same methodology as the audit above, but applied to your own stack with full access and a written report at the end. See how a MatrixGard audit works or start with the free 2-minute readiness checklist.

Avinash S is the founder of MatrixGard, a fractional DevSecOps practice helping founder-led teams ship cloud infrastructure that holds up under audit, scale, and incident pressure. Eight-plus years across enterprise and startup cloud environments. M.Tech Cyber Security at SRMIST.

What SOC 2 Actually Costs an Indian Seed Startup in 2026: A Line Item Breakdown

noreply@matrixgard.com (Avinash S) — Thu, 23 Apr 2026 07:30:00 GMT

An Indian seed-stage SaaS founder told me last month that his investor had recommended Vanta + a Big-4 audit firm + a boutique vCISO. The combined quote came to ₹34 lakh. He nearly signed.

We ran the same scope through the Indian-market stack: Sprinto + a small AICPA-licensed Indian audit firm + Astra for the pen test. Total: ₹10 lakh. Same Type II attestation. Same opinion letter. Same customer-facing security page. (Story details changed for anonymity; the price gap is real and recurring.)

This post is the breakdown nobody on a SaaS pricing page will give you, grounded in actual Indian-market quotes (grcdesk.in, neumetric.com, parafoxtechnologies.in, soc2.in), not US-buyer aggregators that overstate Indian pricing 2-4x.

Scope: SOC 2 Type II, the one your enterprise customers actually demand, for an Indian-incorporated SaaS company with a 5-15 person team, in the first audit cycle (12-month observation period).

Why every SOC 2 cost article you've read is misleading

Three reasons, named honestly:

The big-three SaaS (Vanta / Drata / Sprinto) price themselves, not the project. Their pricing page is one bill of five. They don't tell you about the others because the subscription feels like a smaller commitment when you haven't seen the total.
Most "cost of SOC 2" articles are written by the SaaS vendors themselves. Read the byline. The incentive is to make their slice look like the whole pie.
Audit firm quotes are padded. A Big-4 audit typically costs 2-3x what a small AICPA-licensed specialist firm charges for the same scope of Type II opinion under the same standard. Most Indian startups default to the Big-4 they recognise. Most Indian customers don't actually care which audit firm signed the report. They just want to see SOC 2 Type II on a security page.

The result of all three: founders walk in expecting a ₹6 lakh project and walk out three quarters later having written ₹20+ lakh in cheques across five vendors. The over-spend isn't fraud. It's information asymmetry. This post is the symmetry restored.

The five line items, in rupees

1. Compliance automation SaaS, ₹2-5 lakh/year (Indian path)

The platform that automates evidence collection. You'll need one. The choice is which, and the Indian buyer reality is very different from the US-aggregator number you'll see online.

Sprinto (Bengaluru-HQ, Indian-founded, INR billing): ₹2-5L/year for a startup tier with single framework; ₹5-15L for multi-framework setups (grcdesk.in, cybersecify.com). Pricing is gated behind a demo call, so verify directly.
Scrut Automation (also Bengaluru-HQ): ₹2-5L/year at startup tier, comparable feature set to Sprinto for a single-product Indian SaaS (mitigata.com).
Drata (US, no India tier published): Indian buyers report ₹5-15L/year. Built for US mid-market, quoted in USD, no FX cushion (grcdesk.in).
Vanta (US, no India tier): same band as Drata, ₹5-15L/year. Heaviest brand recognition outside India, which is why investors recommend it, not because it's better.

The honest math: Sprinto and Scrut are 2-3x cheaper than Vanta/Drata at the Indian seed tier, with INR billing avoiding FX swing. Capability gap on Trust Services Criteria automation: minimal for a single-product seed-stage SaaS. The reason Western funds push you toward Vanta is unfamiliarity with the Indian alternatives.

What to actually spend the savings on if you have it: a better auditor (next line item).

2. The actual audit (Type II), ₹3-6 lakh from the right firm

This is the part the SaaS pricing page doesn't include and the part most founders forget exists until month four. The auditor, a CPA firm, independently inspects your evidence and issues the opinion letter your customers will ask for.

Indian-market pricing tiers for first-year Type II:

Smaller Indian CA firms / India-resident SOC 2 boutiques (e.g. soc2.in): ₹3-4.2L for a starter package, often bundled with pen-test.
Indian compliance-first shops (Parafox, Neumetric, GRCDesk): ₹4-6L for 10-30 employees; ₹7-10L for 30-100 employees (parafoxtechnologies.in, zcybersecurity.com).
A-LIGN India / Schellman India (US specialist firms with AICPA-licensed Indian teams): buyers report ₹6-10L on calls; neither firm publishes INR pricing.
Big-4 India (PwC / Deloitte / EY / KPMG): ₹15-30L+. They typically don't quote sub-50-FTE SaaS, and when they do, it's at this band.

Picking a smaller Indian CPA firm over Big-4 saves ₹10-25L for the same scope of opinion under the same AICPA standard. The opinion letter has the same legal weight. The customers asking you for SOC 2 won't reject A-LIGN, Schellman, or a credible Indian CA firm: all are on the AICPA's licensed-CPA-firm list.

In our experience, an explicit "Big-4 only" requirement from customers is uncommon. Most enterprise procurement asks for "a recognized AICPA-licensed firm," which any specialist auditor satisfies. When the Big-4-specific demand does appear, it's usually a procurement-team box-tick, and typically negotiable at the contract stage.

3. Consulting / vCISO / readiness, ₹0-15 lakh

This is the line item with the widest range and the highest founder confusion.

DIY with the SaaS tooling: ₹0. The platform's inbuilt readiness assessment + control templates can carry you, if someone on your team can absorb the work.
Boutique vCISO retainer (3-6 month engagement): ₹5-15L. Useful when nobody on your team has done compliance before.
Big-name consultancy (the Deloittes of the world, but for advisory): ₹15-30L. Rare for seed stage. Almost always overkill.

You save ₹5-15L by DIY-ing this. The catch: it requires 80-150 engineering hours across the year, and they need to land on the right person. If your team is 3 backend engineers and a designer, you don't have that person, and the SaaS platform won't carry you the rest of the way.

The honest test: ask whichever of your engineers will own this whether they've ever read AICPA Trust Services Criteria. If yes, DIY. If no, budget vCISO.

4. Engineering hours (the hidden cost), ₹2.4-10 lakh equivalent

This is the cost no SaaS marketing page admits exists.

SOC 2 Type II requires evidence: log retention configs, change-management workflows, access reviews, vulnerability scan outputs, vendor-management documentation, security training records. The SaaS platform pulls a lot of this automatically. It does not pull all of it. The remainder requires engineers.

Plan for 80-200 engineering hours over the 12-month observation period. At a fully-loaded cost of ₹3,000-5,000 per hour for a senior engineer (salary + benefits + opportunity cost), that's ₹2.4-10L in real engineering capacity diverted from product.
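The range above is plain multiplication, shown here so you can plug in your own numbers. The function name is hypothetical. The hours and rates are the ones quoted in the paragraph.

```python
LAKH = 100_000  # one lakh in rupees


def engineering_cost_lakh(hours: int, rate_per_hour: int) -> float:
    """Fully-loaded engineering cost in lakh of rupees."""
    return hours * rate_per_hour / LAKH


low = engineering_cost_lakh(hours=80, rate_per_hour=3_000)
high = engineering_cost_lakh(hours=200, rate_per_hour=5_000)
```

Here `low` is 2.4 and `high` is 10.0, which is where the ₹2.4-10L band comes from. Swap in your own senior engineer's loaded rate before you budget.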

Reduce this by picking the SaaS with the best evidence-collection automation for your stack. Drata generally edges out Vanta on this dimension as of early 2026; Sprinto is improving fast on Indian-stack integrations.

Do not pretend this cost is zero. It's the most common reason a SOC 2 budget triples mid-year.

5. Pen test (auditor will require it), ₹1.5-3 lakh from Indian vendors

The auditor will require a pen test result for the application within scope. You can't skip this. You can choose how to deliver it.

CERT-In empanelled small Indian firms: ₹40K-1.5L for a single web-app VAPT with a usable certificate (Astra India VAPT guide). Cheapest defensible option.
Astra Security (Delhi-HQ, CERT-In + CREST): single VAPT scan ₹40K-2L; continuous pentest plan ~₹5L/year, overkill for a single SOC 2 cycle (getastra.com/pricing).
Payatu / SAFE Security / NotSoSecure: typical Indian VAPT range ₹1.5-3L for a thorough manual + automated SaaS test (neumetric.com, bminfotrade.com).
Western firm: ₹5-8L. Same opinion letter on the auditor's desk. Usually picked by founders unfamiliar with Indian options.

The auditor doesn't care which path you pick. Pick by your team's preference and your stack's complexity.

Bonus line, bridge letters between Type II cycles, ₹50K-1.5L per letter

Customers often ask for bridge letters (mini-attestations the auditor issues between annual Type II cycles, confirming nothing material has changed). Each one your auditor issues costs ₹50K-1.5L.

The cheapest path: negotiate 1-2 bridge letters into the original audit scope at signing. After signing, each one becomes a separate engagement at full price.

The total, three real scenarios

Every Indian seed-stage SaaS founder we've helped through SOC 2 ends up at one of three rough totals. The spread between them is enormous.

Scenario	Automation	Audit	Pen test	Readiness	Total
Cheap DIY (Indian boutique) (Sprinto + soc2.in-style starter + CERT-In small firm)	₹2.5L	₹3L	₹1.5L	₹0 (founder-led)	₹7L
Typical Indian seed-stage (Sprinto/Scrut + mid-tier Indian CPA + Astra/Payatu + light consulting)	₹3L	₹4-5L	₹2L	₹1L	₹10-11L
Western-default-imported (the trap) (Drata/Vanta + Big-4 + vCISO retainer + Western pen-test)	₹8L+	₹15L+	₹5L	₹6L	₹34L+

The headline: the spread between the cheapest defensible Indian path and the Western-default trap is roughly ₹27 lakh. Customers can't tell them apart. The opinion letter reads the same. The Trust Services Criteria coverage is identical. Most Indian seed-stage SaaS land near the middle row, at roughly ₹8-14L all-in.
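The table's totals and the ₹27 lakh spread can be checked with a short script. The figures are the ones in the table, in lakh of rupees, as (low, high) pairs. The Western-default row uses the table's "+" figures as lower bounds, so its real total can only be higher. The short scenario labels are mine.

```python
# Line items per scenario, in lakh of rupees, as (low, high) ranges.
SCENARIOS = {
    "Cheap DIY": {
        "automation": (2.5, 2.5), "audit": (3, 3),
        "pen_test": (1.5, 1.5), "readiness": (0, 0),
    },
    "Typical Indian seed-stage": {
        "automation": (3, 3), "audit": (4, 5),
        "pen_test": (2, 2), "readiness": (1, 1),
    },
    "Western-default-imported": {  # table values are lower bounds ("+")
        "automation": (8, 8), "audit": (15, 15),
        "pen_test": (5, 5), "readiness": (6, 6),
    },
}


def total_range(items: dict) -> tuple:
    """Sum the low ends and the high ends of every line item."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return (low, high)


totals = {name: total_range(items) for name, items in SCENARIOS.items()}
spread = totals["Western-default-imported"][0] - totals["Cheap DIY"][1]
```

Adding bridge letters or a consulting add-on is a matter of adding one more key to the relevant scenario.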

Many US-funded Indian startups default to Vanta or Drata plus a US audit firm, usually because that's what their investors and US customers recognize, not because the Indian alternatives can't deliver the same attestation.

What the SaaS sales reps won't tell you

Five specific things, named:

You don't need their consulting add-on if you have a competent senior engineer. The platform IS the consulting layer for most of the work. The add-on is for companies without infrastructure understanding. If your CTO can read the AICPA Trust Services Criteria PDF without flinching, skip the add-on.
You can switch SaaS platforms mid-year. Evidence portability across compliance platforms is real now: Vanta, Drata, and Sprinto all export evidence in standard formats. If your pricing surprises you at renewal, switch.
The auditor doesn't care which SaaS you use. They care about evidence quality and completeness. You can switch auditors and SaaS independently.
Type II isn't "another full audit" after Type I. Type I confirms your controls exist on a single date; Type II confirms they operated effectively over 6-12 months. Type II typically prices at 1.3-1.5x Type I: same controls, longer observation window, more evidence sampling (Sprinto, Comp AI).
The "you must use a Big-4" customer demand is rare. Specialist firms (A-LIGN, Schellman, Sensiba) appear on the same AICPA-licensed-CPA-firm list, and in our experience the demand usually softens once that status is shown. When it does persist, it's almost always negotiable.

What about ISO 27001? HIPAA? PCI?

Same line items, different multipliers:

ISO 27001: comparable first-year cost in India, with recurring surveillance audits roughly ₹4-10L/year (Wattlecorp) vs SOC 2's annual re-audit cycle. Indian certification bodies (BSI India, TÜV, BV, DNV) compete on price against UK/US bodies.
HIPAA: not a certification, it's compliance with US healthcare regulation. No formal audit unless a Business Associate contract demands one. Tooling cost roughly the same; engineering cost higher because of mandatory encryption and access control depths.
PCI DSS: variable from ₹5L (SAQ A self-assessment for Stripe-style flows where you never touch card numbers) to ₹40L+ (mid-scope QSA assessment). Level 1 (>6M transactions/year) can exceed ₹1Cr and is out of scope for most seed-stage. Most Indian fintech founders dramatically over-scope this. If you can use Stripe / Razorpay / Cashfree as the payment processor, you almost never need a full PCI assessment.

The pattern repeats: SaaS automation, an audit body, optional consulting, engineering hours, and at least one external test. The rupee amounts vary by framework. The five-line structure does not.

When you'd actually want to bring in help

Three triggers where DIY stops being the right call:

You have an enterprise customer demanding SOC 2 in under 90 days. The DIY path takes 6+ months end-to-end. If the timeline is forced, buy your way in with a vCISO retainer and an auditor that has Type II completion in <120 days as a stated capability. A few specialists offer this; most don't.
You don't have a senior engineer who's done compliance work before. The platform won't save you. The engineering hours will quietly compound past the consulting fee you would have paid. A boutique vCISO at ₹1-2L/month for 6 months is often cheaper than 200 untracked engineering hours.
You're targeting HIPAA / PCI DSS / FedRAMP / RBI Master Direction next year. Don't DIY SOC 2 if you'll need a real GRC function in 18 months. Build the muscle now with a vCISO who can carry you across multiple frameworks. The marginal cost of the second framework is much lower than the first if you build the right operating model up front.

If none of those apply, you can probably DIY the first SOC 2 cycle and revisit the question at year two.
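To sanity-check the vCISO comparison in the second trigger, here is a minimal sketch of the break-even arithmetic. The retainer range and the 200-hour figure come from the text above. The function name is mine, and the result is a derived break-even rate, not a quoted market rate.

```python
# Break-even check for "vCISO retainer vs untracked engineering hours".
# Inputs are the figures quoted in the post; the function is illustrative.

def break_even_hourly_rate(monthly_retainer_inr: int, months: int, eng_hours: int) -> float:
    """Hourly engineering cost at which the retainer and the hours cost the same."""
    return monthly_retainer_inr * months / eng_hours

low = break_even_hourly_rate(100_000, 6, 200)   # retainer at 1 lakh/month
high = break_even_hourly_rate(200_000, 6, 200)  # retainer at 2 lakh/month
print(f"Break-even loaded cost: INR {low:,.0f} to INR {high:,.0f} per engineering hour")
```

Plug in your own loaded hourly cost and your own estimate of hidden hours to see which side of the line you fall on.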

What this post is missing

I deliberately didn't cover:

Trust Services Criteria selection (Security only vs Security + Availability + Confidentiality, etc.). That's a separate post. For almost all seed-stage SaaS, Security-only is correct, but the reasoning matters.
Specific control implementation (how to actually configure CloudTrail / Cloud Audit Logs / vendor reviews / change management). Each of those is a post on its own.
The exact AICPA TSC text. It's free at aicpa-cima.com. Read it once. It's 40 pages. It will save you weeks of consulting time.

If you want me to look at your specific SOC 2 path

I do this for ~10 startups a quarter, free, no NDA needed: 30 minutes, your specific stack, where the cheapest viable path lives, what you can DIY, what's worth paying for. Mostly because it's the fastest way I know to find startups who actually need the work I do once the audit cycle starts.

Send me a note with what framework you're targeting and your timeline. I'll reply with a 5-line read on the cheapest viable path for your situation.

Avinash S is the founder of MatrixGard. Cloud and DevSecOps for startups who can't afford the team they need. Almost a decade of building, breaking, and securing cloud infrastructure across India, Singapore, and the US.

Methodology note. Pricing ranges are sourced exclusively from Indian-market public references (GRCDesk, Neumetric, Parafox, soc2.in, Cybersecify, Z Cybersecurity, Astra Security, Neumetric VAPT, BM Infotrade), combined with quotes shared by Indian founders in our network for first-time, single-criterion SOC 2 Type II engagements at seed-stage SaaS (10-50 FTE). US-buyer aggregators (Vendr / Spendflo / ComplyJet / Comp AI / SOC2Auditors.org) are deliberately excluded because their numbers reflect US enterprise tiers that are 2-4x higher than what Indian SaaS companies actually pay. Multi-product, multi-region, or multi-framework scope pushes the upper end significantly. All numbers are directional. Get a real quote before you budget.

Ghost Hunter: The $28,000 Question Your Dashboard Won't Answer

noreply@matrixgard.com (Avinash S) — Sun, 19 Apr 2026 06:30:00 GMT

It's 11:47 PM. The CEO sends a two-word email.

Subject: Bill?

The AWS bill went from $135,000 to $163,000 in a single month. The board call is at 9 AM tomorrow. The CFO wants a cause, not a number.

The on-call engineer opens the console. Sees the spike. Does not see the cause. Starts digging.

Three hours, eleven browser tabs, and one cold coffee later, the answer surfaces. A single forgotten GPU instance in us-east-1, launched nearly three weeks ago by someone who has since left the team. $1.62 an hour. 24 hours a day. 18 days.

This scene plays out in every cloud-native company, every month. The senior SRE it takes to resolve it is one of the most expensive people in engineering.

I built Ghost-hunter to play that SRE. At 11:47 PM. When nobody else is awake.

Dashboards describe. They do not diagnose.

Cloud dashboards are the smoke detector. They tell you there is a fire. They cannot tell you which wire frayed.

The "why" lives in three places the dashboard cannot reach:

Command-line output from service-specific tools (aws, gcloud, kubectl)
Log data the dashboard never ingested
Tribal knowledge. Who launched what. Which account is test. What's normal for this team.

A human SRE walks that terrain by hand. They form a theory. Run a read-only command. Read the output. Adjust.

Ghost-hunter does the same. No human required at 11:47 PM.

Two detectives, not one

Most AI tools wrap a single model. You ask a question. It writes commands. It runs them. It tells you what it thinks.

For a chatbot, that's fine. For anything that touches your cloud, it's reckless.

Picture a detective investigating a scene. If the same person forms theories AND handles raw evidence, two things go wrong. They miss what a fresh eye would catch. And they're one bad assumption away from contaminating the scene.

Ghost-hunter uses two.

The lead detective. Forms theories. Weighs evidence. Decides what to investigate next. Never touches the crime scene directly. (This is Claude Opus.)
The evidence technician. Follows instructions. Collects samples. Writes one-line summaries. Signs off on the chain of custody before anything crosses. (This is Claude Sonnet.)

"Contaminating the scene" in this analogy is running a command that damages your cloud. The detective never writes commands. The technician writes them. A seven-gate safety system verifies them. Nothing runs until every gate signs off.

A case, five scenes

I ran Ghost-hunter against the FinOps Foundation's public FOCUS 1.0 sample. Real shape, anonymized data, no customer exposure. The dollar amounts are scaled down. The mechanics are what you'd see in production.

Scene 1. The scene of the spike

Ghost-hunter in advisor mode. "Will not touch your cloud. Reads your billing export, proposes read-only commands, asks you to run them yourself."

EC2 at the top of the list, up 185.5%. 27 other services scanned and ranked by dollar impact.

The investigation starts with a fact, not a guess.
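The "ranked by dollar impact" step can be sketched roughly as below. This is an illustration under stated assumptions, not Ghost-hunter's actual code: the function name and the monthly totals are made up, chosen only so that EC2 lands on the same 185.5% increase.

```python
from typing import Dict, List, Tuple

def rank_by_dollar_impact(prev: Dict[str, float], curr: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Return (service, dollar_delta, pct_change), largest dollar increase first."""
    rows = []
    for service in set(prev) | set(curr):
        before = prev.get(service, 0.0)
        after = curr.get(service, 0.0)
        delta = after - before
        # A service with no prior spend has no meaningful percentage change.
        pct = (delta / before * 100) if before else float("inf")
        rows.append((service, delta, pct))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical monthly totals, not taken from the FOCUS sample.
previous = {"EC2": 200.0, "S3": 50.0, "RDS": 80.0}
current = {"EC2": 571.0, "S3": 52.0, "RDS": 79.0}
ranked = rank_by_dollar_impact(previous, current)
for service, delta, pct in ranked:
    print(f"{service:<4} {delta:+8.2f} USD  {pct:+.1f}%")
```

Sorting on the dollar delta rather than the percentage keeps a tiny service that doubled from outranking the one that actually moved the bill.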

Scene 2. The suspects

The lead detective pulls the file apart. Top SKUs. Top accounts. Top regions. One account, 11353890204, is responsible for 91% of the spend. 92% of it landed in us-east-1.

Four theories go on the board:

H1 (55%). GPU instances running for ML or rendering, driving most of the bill.
H2 (30%). A general-purpose instance left running longer than it should have.
H3 (35%). A CI or batch pipeline spinning up short-burst instances.
H4 (10%). Storage growth as a secondary contributor.

Each one has a confidence score. Each one is testable. The detective picks the strongest.
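A theory board like this one can be modelled in a few lines. The sketch below is hypothetical, not Ghost-hunter's internals. It assumes each score is judged independently, which is why the four numbers do not need to add up to 100.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hypothesis:
    label: str
    summary: str
    confidence: int  # percent, scored independently of the other theories

board = [
    Hypothesis("H1", "GPU instances running for ML or rendering", 55),
    Hypothesis("H2", "General-purpose instance left running", 30),
    Hypothesis("H3", "CI or batch pipeline spinning up short bursts", 35),
    Hypothesis("H4", "Storage growth as a secondary contributor", 10),
]

def strongest(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Pick the theory to test next: the one with the highest confidence."""
    return max(hypotheses, key=lambda h: h.confidence)

lead = strongest(board)
print(lead.label, lead.confidence)
```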

Scene 3. The interview that goes sideways

The evidence technician drafts a command. Read-only. Validated by four security layers. Copied to the user's clipboard automatically.

The user replies:

"i dont have access to the aws account to run any commands"

Most AI tools break here. Either they freeze. Or they hallucinate a result. Or they quietly pretend the user did run the command.

Ghost-hunter does none of that. The detective takes the refusal as information. Re-reads what's on the board. Updates the confidence scores (H1 climbs from 55 to 75). Concludes with what's actually provable from billing alone.

"Understood. You don't have CLI access. No problem. The billing data is quite revealing on its own. Let me work with what we have and wrap this up."

Faked confidence would be worse than no tool at all. Ghost-hunter lands on 72%. Not 95. Not 100. Seventy-two.

Scene 4. The plan

Not "do these twelve things and good luck." A prioritized ladder.

NOW. Contact the owner of account 11353890204. Check running g5.4xlarge instances in us-east-1.
THIS WEEK. Set a Cost Anomaly Detection monitor. Add a $5 budget with email alerts at 80% and 100%.
THIS MONTH. Evaluate Savings Plans. Add an IAM guardrail to block expensive GPU launches without approval.

Every "NOW" item is under five minutes. Nothing in the list is a write command against production. Ghost-hunter will never tell you to delete, terminate, or modify anything without your finger on the key.

Scene 5. The verdict, with honest gaps

A root cause. Five cited pieces of evidence. A list of five things Ghost-hunter could not verify.

This part matters more than the conclusion itself.

Most AI tools close with false certainty because false certainty feels polished. Ghost-hunter tells you what it does not know. "Could not confirm which specific EC2 instances are running." "Could not determine who launched the GPU instances or for what purpose."

That transparency is what makes the conclusion trustworthy. You can read the transcript, see what was cited, see what was not, and decide if 72% is good enough to act on.

The seven doors

Every command Ghost-hunter proposes passes through a vault with seven doors. Miss any one door, the command dies.

 1. Fast reject      shell metacharacters blocked (;, &&, unquoted $())
 2. Allowlist        is this verb on the read-only list?
 3. Flag check       every flag safe for this verb?
 4. Input hygiene    length, encoding, empty-command?
 5. Budget           caps on commands, cost, time per run
 6. Semantic check   does this actually test the stated hypothesis?
 7. Sandbox          environment isolation (active mode only)

A system that sometimes lets through commands its validator was unsure about is a system that will one day run delete by accident. Ghost-hunter has no "helpful override." A command that cannot pass every door does not run.
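To make the "miss any one door, the command dies" idea concrete, here is a minimal sketch of the first four doors. It is not Ghost-hunter's validator: the allowlist contents, the length cap, and the function name are assumptions for illustration. It runs the hygiene check first for simplicity, and it is stricter than the list above because it blocks every "$(", quoted or not.

```python
import shlex

# Hypothetical allowlist: (tool, service, verb) -> flags considered safe.
READ_ONLY = {
    ("aws", "ec2", "describe-instances"): {"--region", "--filters", "--query", "--output"},
    ("aws", "ec2", "describe-volumes"): {"--region", "--filters", "--query", "--output"},
}
BLOCKED_TOKENS = (";", "&&", "||", "|", "`", "$(", ">", "<")
MAX_LENGTH = 500  # assumed cap, not a documented limit

def validate(command: str):
    """Return (ok, reason). Any failed door rejects the command outright."""
    # Input hygiene: empty or oversized commands never proceed.
    if not command.strip() or len(command) > MAX_LENGTH:
        return False, "input hygiene"
    # Fast reject: shell metacharacters anywhere in the raw string.
    if any(token in command for token in BLOCKED_TOKENS):
        return False, "fast reject"
    try:
        parts = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False, "input hygiene"
    # Allowlist: the verb must be explicitly known as read-only.
    allowed_flags = READ_ONLY.get(tuple(parts[:3]))
    if allowed_flags is None:
        return False, "allowlist"
    # Flag check: every flag must be approved for this verb.
    for token in parts[3:]:
        if token.startswith("-") and token.split("=", 1)[0] not in allowed_flags:
            return False, "flag check"
    return True, "passed"

print(validate("aws ec2 describe-volumes --filters Name=encrypted,Values=false --output table"))
print(validate("aws ec2 terminate-instances --instance-ids i-123"))
```

There is no fallback branch: an unknown verb or flag is a rejection, never a warning.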

Three lines I refuse to cross

No writes. Ever. Read-only is the whole product. The detective does not hold the keys to the cloud.
No hardcoded answers. Most "AI FinOps" tools win benchmarks by memorizing patterns. "If NAT Gateway plus high bytes, the answer is missing VPC endpoint." Ghost-hunter refuses. The CI pipeline literally fails commits that put scenario names in prompts. If the reasoning isn't in the transcript, it isn't in the product.
No raw billing data leaves your machine. Your bill stays local. The only things that move are compressed evidence summaries, sent through your own Anthropic API key.

Why this matters

Most AI tools in this space are lookup tables with a nice voice. They recognize the shapes they were trained on. They miss the shapes they weren't.

Ghost-hunter is slower. On a known pattern, a memorizing tool will beat it every time.

Ghost-hunter wins on the bill nobody has seen before. Your bill. Your configuration. The spike caused by your ML team's experiment, your third-party vendor's bug, the intern who cloned a production pipeline for testing. Every hypothesis, every command, every piece of evidence sits in a transcript you can read.

You do not trust the conclusion because an AI said so. You trust it because you can audit the reasoning yourself.

That's the product.

Private beta

Ghost-hunter is not yet public. If you run cloud infrastructure and you've ever been the person answering the 11:47 PM email, I'll open access to you first.

Book a 20-minute call and I'll walk you through Ghost-hunter against a billing export of your choosing. Or send me a note with what you'd want it to solve first.

Avinash S is the founder of MatrixGard. Cloud and DevSecOps for startups who cannot afford the team they need. Almost a decade of building, breaking, and securing cloud infrastructure.

I Looked at 30 Startups' Infrastructure. Every Single One Had the Same Problem.

noreply@matrixgard.com (Avinash S) — Sun, 12 Apr 2026 10:00:00 GMT

Over the last 8 years working in cloud infrastructure, I have seen the inside of startups at every stage. Seed rounds running on a single AWS account. Series B companies with 40 engineers and no one owning security. Teams that shipped a product customers love, built on infrastructure that keeps the CTO up at night.

Every single one had the same fundamental problem.

Not a specific vulnerability. Not a misconfigured S3 bucket. Something deeper.

Nobody owned security.

The CTO was doing it. The same person writing architecture docs, reviewing PRs, managing the cloud bill, handling incidents at 2 AM, and pitching to investors on Friday. Security was somewhere on the list. Usually at the bottom.

Not because they did not care. Because there was nobody else.

Here are the 7 things I kept finding in startups under 50 engineers

1. The CTO is the entire infrastructure team

In 28 out of 30 startups, the CTO or a co-founder was the only person who understood how the infrastructure worked. No DevOps engineer. No SRE. No security person. Just one technical founder wearing four hats and hoping nothing breaks on the weekend.

The engineering budget went to product engineers. Which makes sense when you are trying to ship features and close customers. But it means the person responsible for security is also the person who has the least time for it.

2. Secrets were everywhere except a vault

API keys in environment variables. Database passwords in config files committed to the repo. AWS credentials shared over Slack. One startup had their production database password in a shared Notion page that the entire team could access.

Not one of the 30 startups was using a proper secrets manager. Not AWS Secrets Manager, not HashiCorp Vault, not even a basic encrypted store. The reason was always the same: "We will set it up when we have time."
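A first pass at finding committed secrets does not need a product. The sketch below is a rough illustration, not a replacement for a dedicated scanner: the two patterns and the function name are my own choices, and the sample key is the placeholder AWS uses in its documentation.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "password_assignment": re.compile(
        r"(?:password|passwd|secret)\w*\s*[=:]\s*['\"]?[^\s'\"]{8,}", re.IGNORECASE
    ),
}

def find_secrets(text: str):
    """Return (line_number, pattern_name) for every suspicious line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits

sample = "\n".join([
    "DEBUG=false",
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",  # documentation placeholder key
    "DB_PASSWORD=correct-horse-battery",
])
hits = find_secrets(sample)
print(hits)
```

Anything this finds should be rotated, not just deleted from the file, because the value still lives in the repo history.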

3. Antivirus was the entire security stack

When I asked about cloud security, the most common answer was: "We have antivirus on our laptops." Endpoint protection was the entire security posture. Nothing in the cloud.

No CloudTrail. No GuardDuty. No WAF. No container scanning. No dependency vulnerability checks. The cloud infrastructure was completely unmonitored. Somebody could be running crypto miners in one of these AWS accounts right now, and the team would not know until the bill arrived.

4. The last security review was never

"When was your last infrastructure security review?"

The most common answer: silence. Followed by: "We have been meaning to do one."

22 out of 30 startups had never done a security review of any kind. Not a penetration test. Not a vulnerability scan. Not even an internal audit. The infrastructure was built to work, not to be secure. And nobody had gone back to check.

5. No incident response plan exists

If a breach happened at 2 AM tonight, what happens?

In most of these startups, the answer is: the CTO's phone rings. Maybe. If someone notices. There is no runbook, no escalation procedure, no communication template, no forensic capability. Just a person waking up and figuring it out in real time.

For fintechs under RBI regulation, the reporting window is 2-6 hours. For DPDP Act compliance, it is 72 hours to the Data Protection Board. You cannot meet those timelines if your incident response plan is "call the CTO."

6. Compliance was a future problem that became a today problem

The pattern repeats: startup builds product, gets traction, raises funding, starts talking to enterprise customers. Enterprise customer sends a vendor assessment. The assessment asks for a SOC 2 Type II report, or an ISO 27001 audit report, or evidence of RBI compliance.

The startup does not have any of these. The deal stalls. The CTO scrambles to figure out what SOC 2 even requires. The timeline is 3-6 months to get through the audit. The enterprise customer moves on.

I have seen this exact scenario play out at 4 startups in the last 2 months alone. The compliance gap is not just a security risk. It is a revenue blocker.

7. The AWS bill was hiding real problems

When I asked to look at cloud costs, every single startup had waste. Dev environments running 24/7. Oversized instances nobody had right-sized since launch. Unattached EBS volumes accumulating charges. Load balancers pointing to nothing.

The average waste I found: 30-40% of the monthly cloud bill. One startup was spending over Rs 5 lakh per month on AWS. Nearly 40% of that was resources nobody was using. That is roughly Rs 2 lakh a month, or about Rs 24 lakh a year, in ghost costs.

The cloud bill is not just a cost problem. Unmonitored resources are also unmonitored attack surface. That idle EC2 instance nobody remembers? It has not been patched in 18 months.
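One of the waste categories above, unattached EBS volumes, is easy to surface from the JSON that `aws ec2 describe-volumes` returns. This is a minimal sketch: the function name and the sample volumes are made up, while the `Volumes`, `VolumeId`, `Size`, and `State` fields follow the shape of that command's output.

```python
import json

def unattached_volumes(describe_volumes_json: str):
    """Return (volume_id, size_gib) for volumes in the 'available' state."""
    volumes = json.loads(describe_volumes_json)["Volumes"]
    return [(v["VolumeId"], v["Size"]) for v in volumes if v["State"] == "available"]

# Hypothetical sample in the shape of `aws ec2 describe-volumes` output.
sample = json.dumps({"Volumes": [
    {"VolumeId": "vol-0aaa", "Size": 100, "State": "in-use"},
    {"VolumeId": "vol-0bbb", "Size": 500, "State": "available"},
    {"VolumeId": "vol-0ccc", "Size": 50, "State": "available"},
]})

orphans = unattached_volumes(sample)
total_gib = sum(size for _, size in orphans)
print(f"{len(orphans)} unattached volumes, {total_gib} GiB still being billed")
```

An "available" volume is attached to nothing, so it is billed for storage while doing no work. Snapshot it if you are unsure, then delete it.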

Why this keeps happening

It is not negligence. It is prioritization under pressure.

When you have 15 engineers and 200 things to build, security does not make the sprint. The CTO knows it should. But there is a product launch next week, three customer bugs to fix, a hiring pipeline to manage, and an investor update due Friday.

Security gets pushed to "next quarter." Next quarter it gets pushed again. Until something forces the issue: an enterprise deal that requires SOC 2, an RBI audit notice, a customer who finds a vulnerability, or worse.

The startups that avoid this trap are the ones that treat security as infrastructure, not as a project. It is not something you "do" once. It is something that runs alongside your product, maintained by someone whose job it is.

What to do about it

If you recognized your startup in the list above, here are three things you can do this week:

1. Take 2 minutes to score yourself. We built a free security readiness quiz that asks 7 questions and tells you exactly where you stand. No signup required to start.

2. Fix the free stuff today. Enable MFA on your AWS root account (5 minutes). Turn on CloudTrail (10 minutes). Check for public S3 buckets (one CLI command). These cost nothing and close the most obvious gaps.

3. Get an outside set of eyes. You are too close to your own infrastructure to see the gaps. Someone who has looked at 30 other startups will spot patterns in 20 minutes that would take you weeks to find on your own. Book a free 20-minute infrastructure review and find out what is actually hiding.

The best time to fix your security was when you launched. The second best time is before the next audit, the next enterprise deal, or the next incident forces your hand.

Avinash S is the founder of MatrixGard, a DevSecOps consultancy that helps startups get infrastructure-ready in weeks, not months. Previously 8+ years in cloud infrastructure across enterprise and startup environments.

RBI Compliance for Fintech Startups: Security Checklist 2026

noreply@matrixgard.com (Avinash S) — Sun, 05 Apr 2026 14:00:00 GMT

If you are building a fintech startup in India, RBI compliance is not optional. It is the difference between getting a banking partnership and getting shut down. The Reserve Bank of India issued three major master directions in 2024-2025 alone, each tightening the technical requirements for payment aggregators, NBFCs, and digital lending platforms.

Most fintech founders treat compliance as a legal problem. It is not. It is an infrastructure problem. The RBI does not care about your privacy policy. They care about whether your data is encrypted, whether your cloud runs in India, whether you can detect and report a breach within 6 hours, and whether you have the audit trails to prove it.

Here is the checklist your CERT-In empanelled auditor will actually check.

Which RBI Framework Applies to You?

Before building anything, know which direction you fall under:

If you are a...	Your governing framework	Compliance deadline
Payment Aggregator	PA Master Direction 2025	Active now
NBFC (Top/Upper/Middle layer)	IT Governance Master Direction 2024	Active since Apr 2024
Non-bank PSO (large)	Cyber Resilience Direction 2024	Active since Apr 2025
Non-bank PSO (medium)	Cyber Resilience Direction 2024	April 1, 2026
Digital lending platform	Digital Lending Directions 2025	Active now

If you process payments, lend money, or route funds through your platform, at least one of these applies to you. Many startups think they are "just an interface." The moment you touch, hold, or settle funds, licensing and compliance requirements kick in.

The Infrastructure Checklist

1. Data Must Live in India

This is non-negotiable. All payment system data must be stored on servers physically located in India. This includes transaction records, card credentials, timestamps, user details, and payment profiles.

What this means for your infrastructure:

AWS: ap-south-1 (Mumbai) only for payment and financial data
Azure: Central India or South India regions
GCP: asia-south1 (Mumbai)
Your Terraform or Pulumi code must enforce region constraints. No exceptions.
If data is processed overseas temporarily, a complete copy must return to India within 24 hours and the foreign copy must be deleted
RBI must have unrestricted audit access to all stored data

The most expensive compliance mistake I see: startups that launch on us-east-1 because it was the default, then discover they need to migrate everything to Mumbai. In my experience, retrofitting costs around 5x more than building it right from day one.
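Region constraints are easiest to keep when something fails loudly on a violation. The sketch below shows the idea as a plain check you could run in CI against an inventory of resources. The allowlist mirrors the regions listed above. The resource records, the field names, and the function name are hypothetical.

```python
# Regions from the checklist above, keyed by provider.
ALLOWED_REGIONS = {
    "aws": {"ap-south-1"},
    "azure": {"centralindia", "southindia"},
    "gcp": {"asia-south1"},
}

def region_violations(resources):
    """Return resources that hold payment data outside the allowed India regions."""
    return [
        r for r in resources
        if r["holds_payment_data"]
        and r["region"] not in ALLOWED_REGIONS.get(r["provider"], set())
    ]

# Hypothetical inventory, e.g. exported from your IaC state.
inventory = [
    {"name": "payments-db", "provider": "aws", "region": "ap-south-1", "holds_payment_data": True},
    {"name": "ledger-backup", "provider": "aws", "region": "us-east-1", "holds_payment_data": True},
    {"name": "marketing-site", "provider": "gcp", "region": "us-central1", "holds_payment_data": False},
]

violations = region_violations(inventory)
for resource in violations:
    print(f"VIOLATION: {resource['name']} is in {resource['region']}")
```

An unknown provider gets an empty allowlist, so anything tagged as payment data there is flagged by default.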

2. Encryption Everywhere

The RBI mandates encryption in transit and at rest. Specifically:

In transit: TLS 1.2 or higher on all connections. No self-signed certificates in production.
At rest: AES-256 encryption for databases, object storage, and volumes. Use AWS KMS, Azure Key Vault, or GCP Cloud KMS for key management.
Card data: Tokenization required. Storing actual card details is banned.
PCI DSS: Compliance is mandatory for payment aggregators and their onboarded merchants.

Quick check: run this against your AWS account to find unencrypted EBS volumes (it only covers the CLI's currently configured region, so repeat it for each region you use):

aws ec2 describe-volumes --filters Name=encrypted,Values=false --query 'Volumes[*].[VolumeId,Size,State]' --output table

If that returns results, you have a compliance gap.

3. Access Controls and MFA

RBI requires access on a need-to-know basis with time-limited duration. In practice:

Multi-factor authentication on everything: AWS console, VPN, admin panels, deployment pipelines
No administrative rights on end-user workstations
Privileged access management with audit logging
Regular access reviews (quarterly minimum)
Service accounts with least-privilege IAM policies

I audit fintech startups where the CEO still has root access to production databases. That is a finding your auditor will flag on page one.

4. 24/7 Security Monitoring

The Cyber Resilience Direction requires a Security Operations Center. This means:

Continuous monitoring with log correlation and threat detection
Automated alerting for suspicious activity
Log management with retention (minimum 1 year)
Threat intelligence integration

You do not need to build an in-house SOC. Outsourced SOC services work and are specifically permitted. But "we check logs when something breaks" is not a SOC.

At minimum, set up CloudWatch Alarms + CloudTrail + GuardDuty on AWS, or the equivalent on Azure/GCP. Configure alerts for: root account usage, IAM policy changes, security group modifications, and unusual API call patterns.
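Three of those four alert types can be expressed as simple matches on CloudTrail event records. The sketch below is illustrative rather than a complete rule set: the event names are real CloudTrail `eventName` values, but the grouping and the function name are my own, and "unusual API call patterns" needs baselining that a static match cannot provide.

```python
IAM_POLICY_EVENTS = {
    "PutUserPolicy", "PutRolePolicy", "PutGroupPolicy",
    "AttachUserPolicy", "AttachRolePolicy", "AttachGroupPolicy",
    "DetachUserPolicy", "DetachRolePolicy", "DetachGroupPolicy",
    "CreatePolicyVersion", "DeletePolicy",
}
SECURITY_GROUP_EVENTS = {
    "AuthorizeSecurityGroupIngress", "AuthorizeSecurityGroupEgress",
    "RevokeSecurityGroupIngress", "RevokeSecurityGroupEgress",
    "CreateSecurityGroup", "DeleteSecurityGroup",
}

def classify(event: dict):
    """Return the alert categories a single CloudTrail record should raise."""
    alerts = []
    if event.get("userIdentity", {}).get("type") == "Root":
        alerts.append("root account usage")
    source, name = event.get("eventSource"), event.get("eventName")
    if source == "iam.amazonaws.com" and name in IAM_POLICY_EVENTS:
        alerts.append("IAM policy change")
    if source == "ec2.amazonaws.com" and name in SECURITY_GROUP_EVENTS:
        alerts.append("security group modification")
    return alerts

# Hypothetical, trimmed-down records for illustration.
root_login = {"eventSource": "signin.amazonaws.com", "eventName": "ConsoleLogin",
              "userIdentity": {"type": "Root"}}
sg_change = {"eventSource": "ec2.amazonaws.com", "eventName": "AuthorizeSecurityGroupIngress",
             "userIdentity": {"type": "IAMUser"}}
print(classify(root_login), classify(sg_change))
```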

5. Incident Response (2-6 Hours)

When a security incident happens, RBI reporting timelines are tight:

Banks and NBFCs: Report within 2-6 hours of discovery
Non-bank PSOs: Report cyber-attacks, outages, internal frauds, and settlement delays within 6 hours

Your incident response plan must include:

Automated breach detection (not a human checking dashboards)
Escalation procedures with named owners
Communication templates pre-approved by legal
Forensic analysis capability for severity, impact, and root cause
Cyber Crisis Management Plan (CCMP) approved by the board

6 hours from detection to RBI notification. If your team's current incident response is "someone posts in Slack and we figure it out," you will miss that window.
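The clock is easier to respect when the incident tooling prints it. A minimal sketch, assuming the 6-hour window described above and timestamps kept in UTC. The function names are mine.

```python
from datetime import datetime, timedelta, timezone

RBI_REPORTING_WINDOW = timedelta(hours=6)  # window quoted in this section

def reporting_deadline(detected_at: datetime) -> datetime:
    """Latest time the RBI notification can go out."""
    return detected_at + RBI_REPORTING_WINDOW

def time_remaining(detected_at: datetime, now: datetime) -> timedelta:
    """Time left before the deadline; negative once it has been missed."""
    return reporting_deadline(detected_at) - now

detected = datetime(2026, 4, 5, 2, 0, tzinfo=timezone.utc)
checked = datetime(2026, 4, 5, 5, 30, tzinfo=timezone.utc)
print("Deadline:", reporting_deadline(detected).isoformat())
print("Remaining:", time_remaining(detected, checked))
```

Stamp `detected_at` automatically when the alert fires. If a human has to fill it in, the window has already started shrinking.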

6. VAPT: Not Once, Not Annually, Continuously

Vulnerability Assessment and Penetration Testing requirements:

Vulnerability Assessment: Every 6 months minimum
Penetration Testing: At least annually, by a CERT-In empanelled auditor
Best practice: Quarterly VAPT, plus after major app or infrastructure changes
Must be performed before regulatory audits and before onboarding banking partners

Integrate vulnerability scanning into your CI/CD pipeline. Tools like Trivy for container scanning, Snyk for dependency vulnerabilities, and OWASP ZAP for web application testing should run on every deployment. The formal CERT-In audit happens annually, but you should be catching issues continuously.
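A scan only helps if the pipeline acts on it. The sketch below shows one way to gate a deployment on a scanner's JSON report. It assumes the report follows the general shape of Trivy's JSON output (a `Results` list whose entries carry a `Vulnerabilities` list with `Severity` fields). The severity threshold, the function name, and the sample findings are made up.

```python
import json

BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}  # assumed policy, tune to your risk appetite

def blocking_findings(report_json: str):
    """Return (target, vulnerability_id, severity) for findings that should fail the build."""
    report = json.loads(report_json)
    findings = []
    for result in report.get("Results") or []:
        # A target with no findings may omit the list or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING_SEVERITIES:
                findings.append((result.get("Target"), vuln.get("VulnerabilityID"), vuln["Severity"]))
    return findings

# Hypothetical report with placeholder IDs.
sample_report = json.dumps({"Results": [
    {"Target": "app:latest", "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL"},
        {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"},
    ]},
    {"Target": "requirements.txt", "Vulnerabilities": None},
]})

blockers = blocking_findings(sample_report)
print("FAIL" if blockers else "PASS", blockers)
```

In a real pipeline the last step would exit non-zero when `blockers` is non-empty, so the deployment stops instead of logging a warning nobody reads.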

7. Business Continuity and Disaster Recovery

The RBI requires:

Board-approved BCP/DR plan
Documented data migration policy with audit trails
Regular DR testing (not just documentation, actual failover tests)
Defined Recovery Time Objective (RTO) and Recovery Point Objective (RPO)

If your DR plan is a document nobody has read since it was written, that is a compliance gap. Test it. Quarterly.

8. Vendor Risk Management

Every vendor that processes data for you is part of your compliance surface. RBI requires:

Security controls to prevent infiltration from vendor networks
Network segmentation between your environment and vendor access
Certified assurance from an independent auditor for vendors involved in critical processes
Regular vendor risk assessments

Your payment gateway, KYC provider, cloud hosting, SMS gateway, analytics tools: each one needs a risk assessment. If your vendor has a breach, it is your compliance problem.

The Annual Audit: What Happens

Every year, a CERT-In empanelled auditor will:

Review your IS (Information Security) policies and whether they are actually followed
Check encryption implementation across your infrastructure
Verify access controls, MFA, and privilege management
Test your incident response readiness
Validate data localization (is all payment data in India?)
Review VAPT reports and whether findings were remediated
Check BCP/DR documentation and testing evidence
Assess vendor risk management practices

The audit report goes to RBI's Regional Office. Material findings can trigger enforcement actions, restrictions on launching new products, or worse.

Penalties That Have Actually Been Enforced

This is not theoretical. RBI issued 79 enforcement actions in FY 2024-25:

Paytm penalized for KYC non-compliance, with additional FIU-IND penalty for AML violations
PhonePe fined Rs 21 lakh for PPI guideline violations
Four NBFCs fined Rs 76.6 lakh combined for P2P lending violations
PAs that missed the December 2025 authorization deadline had to wind down by February 2026

On top of RBI penalties, the DPDP Act adds penalties up to Rs 250 crore for data protection failures.

The 6 Mistakes I See in Every Fintech Audit

Wrong cloud region. Payment data on us-east-1. This is the most expensive mistake to fix after the fact.
No MFA on the AWS root account. First thing every auditor checks. Takes 5 minutes to fix.
Production database accessible from the internet. Security groups with 0.0.0.0/0 on port 5432 or 3306.
No audit logging. CloudTrail not enabled, or enabled but nobody reviews the logs.
VAPT reports with open critical findings. Getting the test done is not enough. You must remediate the findings.
"We will do compliance later." By the time a banking partner asks for your audit report, it is too late to start.

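The third mistake above, a production database reachable from the internet, can be checked mechanically. This sketch walks security groups in the shape returned by `aws ec2 describe-security-groups` and flags any rule that opens 5432 or 3306 to the world. The function name and the sample groups are hypothetical.

```python
DB_PORTS = {5432, 3306}  # PostgreSQL and MySQL, as named in the list above

def exposed_db_groups(security_groups):
    """Return (group_id, port) pairs where a database port is open to any address."""
    exposed = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            open_v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
            open_v6 = any(r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", []))
            if not (open_v4 or open_v6):
                continue
            protocol = perm.get("IpProtocol")
            if protocol == "-1":  # "-1" means all protocols and all ports
                ports = DB_PORTS
            elif protocol == "tcp" and perm.get("FromPort") is not None and perm.get("ToPort") is not None:
                ports = {p for p in DB_PORTS if perm["FromPort"] <= p <= perm["ToPort"]}
            else:
                ports = set()
            exposed.extend((group["GroupId"], port) for port in sorted(ports))
    return exposed

# Hypothetical sample data.
groups = [
    {"GroupId": "sg-1", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-2", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-3", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 3306, "ToPort": 3306, "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}]},
]
findings = exposed_db_groups(groups)
print(findings)
```
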
Start Here

If you are a fintech startup preparing for your first RBI audit, or a growing platform that knows the infrastructure has gaps, here is what to do this week:

Verify all payment data is on India-region servers
Enable MFA on every admin account
Turn on CloudTrail and GuardDuty (or equivalent)
Check for unencrypted storage volumes
Document your incident response process

If you want someone to do a full audit and tell you exactly where the gaps are, book a free 20-minute infrastructure review. We specialize in getting fintech startups audit-ready in 4-6 weeks.

MatrixGard is a DevSecOps consultancy for funded startups. See our services or view pricing.

DPDP Act Compliance for Startups: What Your Dev Team Needs to Build Before May 2027

noreply@matrixgard.com (Avinash S) — Sun, 05 Apr 2026 10:00:00 GMT

The Digital Personal Data Protection Act is not coming. It is here. The Rules were notified in November 2025, the Data Protection Board is operational, and full enforcement begins May 13, 2027. That gives your startup roughly 13 months to get compliant or face penalties that can reach INR 250 crore (about $30 million) per violation.

Most founders I talk to think this only applies to large enterprises. It does not. The DPDP Act applies to every business processing digital personal data in India, regardless of size. If your SaaS product collects user emails, if your fintech app stores KYC data, if your healthtech platform handles patient records, you are a Data Fiduciary under this law.

Here is what your dev team actually needs to build.

The Timeline You Cannot Ignore

The enforcement rolls out in three phases:

Phase 1 (November 2025, already live): The Data Protection Board of India is established and operational. Administrative provisions are in effect.

Phase 2 (November 2026): Consent Manager registration framework goes live. If your business acts as a consent intermediary, this is your deadline.

Phase 3 (May 13, 2027): Everything else. Consent requirements, Data Principal rights, security safeguards, breach notification, data retention and erasure, cross-border transfer rules. This is the date that matters for most startups.

The 18-month transition window from November 2025 sounds generous. It is not. Building consent infrastructure, auditing data flows, training teams, and implementing security safeguards takes longer than founders expect.

What the DPDP Act Actually Requires From Your Startup

1. Consent Management

Every time you collect personal data, you need explicit, informed, purpose-specific consent. Not a pre-ticked checkbox buried in your terms of service.

The requirements:

Consent must be free, specific, informed, and unambiguous
Each purpose needs separate consent (no bundling)
Withdrawal must be as easy as giving consent
You must provide a clear privacy notice listing exactly what data you collect and why
Consent records must be retained

If you process data from users under 18, you need verifiable parental consent. OTP to parent's mobile, identity document upload, digital signature, or Aadhaar-based authentication. No exceptions.
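In data-model terms, the consent requirements above come down to one record per purpose, a withdrawal path as short as the grant path, and records that survive withdrawal. A minimal sketch follows. The class names, field names, and purposes are hypothetical, and it is a starting shape rather than a compliant implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

@dataclass
class ConsentRecord:
    purpose: str
    notice_version: str            # which privacy notice the user actually saw
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None

@dataclass
class ConsentLedger:
    user_id: str
    records: Dict[str, ConsentRecord] = field(default_factory=dict)

    def grant(self, purpose: str, notice_version: str, at: datetime) -> None:
        # One record per purpose: no bundled "agree to everything" entry.
        self.records[purpose] = ConsentRecord(purpose, notice_version, at)

    def withdraw(self, purpose: str, at: datetime) -> None:
        # Withdrawal is a single call, and the record is kept for audit.
        record = self.records.get(purpose)
        if record is not None and record.withdrawn_at is None:
            record.withdrawn_at = at

    def is_active(self, purpose: str) -> bool:
        record = self.records.get(purpose)
        return record is not None and record.withdrawn_at is None

now = datetime(2026, 4, 5, tzinfo=timezone.utc)
ledger = ConsentLedger(user_id="user-123")
ledger.grant("order_emails", "notice-v1", now)
ledger.grant("marketing_emails", "notice-v1", now)
ledger.withdraw("marketing_emails", now)
print(ledger.is_active("order_emails"), ledger.is_active("marketing_emails"))
```

This version overwrites the record if a user re-grants a purpose. A production ledger would more likely append events so the full history is preserved.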

2. Security Safeguards

This is where the biggest penalty sits: INR 250 crore for failure to implement "reasonable security safeguards." The Rules specify:

Encryption of data at rest and in transit
Access controls with access logs and regular reviews
Intrusion detection systems
Data masking and obfuscation
Regular data backups
Data retention for a minimum of 1 year to support breach investigation

If you are running a startup on AWS or Azure, this translates to: enable encryption everywhere, implement IAM properly, set up CloudTrail or Azure Monitor, configure alerts, and actually review access logs. Most startups I audit have none of this in place.

3. Breach Notification

When (not if) a breach happens, you have two deadlines:

Immediately: First intimation to the Data Protection Board and affected individuals. No delay.
Within 72 hours: Detailed report including what happened, what data was affected, and what you are doing about it.

Without automated detection tools and pre-built incident response templates, most startups will miss the 72-hour window. Build this infrastructure now, not after the breach.

4. Data Principal Rights

Your users have the right to:

Access a summary of their personal data and know who you have shared it with
Correct inaccurate data
Request erasure when the purpose is fulfilled
Withdraw consent at any time
File complaints with the Data Protection Board

You need to build these capabilities into your product. A "delete my data" button is not optional anymore.

5. Data Inventory

You cannot comply with a law about data protection if you do not know what data you have. Map every piece of personal data your startup collects: what data, where stored, who accesses, which vendors touch it, how long you retain it, and whether you can delete it on request.

Every vendor processing personal data for you is part of your risk surface.
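The inventory does not need a GRC tool to get started. A spreadsheet works, and so does a small typed structure like the sketch below. The fields mirror the questions in this section. The class name, the helper, and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InventoryEntry:
    data_item: str              # what data
    stored_in: str              # where it is stored
    accessed_by: List[str]      # who accesses it
    vendors: List[str]          # which vendors touch it
    retention_days: int         # how long you retain it
    deletable_on_request: bool  # whether you can delete it on request

def erasure_gaps(inventory: List[InventoryEntry]) -> List[str]:
    """Data items you could not erase today if a user asked."""
    return [entry.data_item for entry in inventory if not entry.deletable_on_request]

# Hypothetical entries for illustration.
inventory = [
    InventoryEntry("user email", "primary Postgres", ["backend", "support"], ["email provider"], 730, True),
    InventoryEntry("KYC documents", "object storage", ["compliance"], ["KYC provider"], 1825, False),
]

gaps = erasure_gaps(inventory)
print("Cannot erase on request:", gaps)
```

Every entry in `gaps` needs either an engineering fix or a documented legal reason for retention before an erasure request arrives.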

The Penalty Table

These are per violation, per instance. A single incident can trigger multiple penalties:

Violation	Maximum Penalty
Failure to implement security safeguards	INR 250 crore (~$30M)
Failure to notify breach within 72 hours	INR 200 crore (~$24M)
Breach of children's data obligations	INR 200 crore (~$24M)
Breach of Significant Data Fiduciary obligations	INR 150 crore (~$18M)
Any other Data Fiduciary violation	INR 50 crore (~$6M)

The Board considers: gravity of breach, data sensitivity, whether it was repeated, what mitigation efforts were taken, and proportionality to your turnover. Being a startup does not give you a pass, but showing good-faith compliance efforts matters.

DPDP Act vs GDPR: Key Differences

If you are already GDPR compliant, you are not automatically DPDP compliant. Critical differences:

No "legitimate interests" basis. Under GDPR, you can process data without consent if you have a legitimate business reason. Under DPDP, it is consent or nothing (with narrow exceptions).
All breaches must be reported. GDPR only requires notification for breaches that risk individual rights. DPDP requires notification for every breach, regardless of severity.
Children's age threshold is 18. GDPR allows 13-16 depending on the member state. DPDP says 18 across the board.
Consent Managers are a new concept. GDPR has no equivalent. DPDP creates registered intermediaries specifically for consent management.
No data portability right. Unlike GDPR, DPDP does not include the right to data portability.
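Cross-border transfers use a blacklist model. GDPR restricts transfers to countries with an adequacy decision (whitelist) unless you put safeguards such as standard contractual clauses in place. DPDP allows transfers everywhere unless a country is specifically restricted.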

The 7 Mistakes Startups Make With DPDP Compliance

Assuming it is only for big companies. It is not. Every business processing digital personal data in India is covered.
Copy-pasting a GDPR privacy policy. The consent and notice requirements are different. Generic policies will not satisfy the itemized disclosure requirements.
Bundling consent. "By signing up, you agree to everything" is non-compliant. Each processing purpose needs separate consent.
No data inventory. If you do not know what personal data you have, where it is, and who can access it, you cannot comply.
Ignoring vendor risk. Your AWS account, analytics tools, CRM, payment processor: every third party that touches user data is your responsibility.
No breach response plan. The 72-hour notification window starts from when the breach is detected. Without automated detection and pre-built templates, you will miss it.
Treating security as a Phase 2 problem. The highest penalty (INR 250 crore) is for inadequate security safeguards. This is not something you bolt on later.

Your 6-Month Compliance Roadmap

Month 1: Data Discovery

Complete data inventory: what personal data, where stored, who accesses, which vendors
Map data flows across your application and infrastructure
Identify gaps in your current privacy notice

Month 2: Consent Infrastructure

Build purpose-specific consent collection
Implement consent withdrawal mechanism
Create itemized privacy notice per DPDP requirements
If handling children's data, implement parental consent verification
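
To make "purpose-specific consent" and "withdrawal" concrete, here is a minimal in-memory sketch. It assumes an append-only event log where the latest event per user and purpose wins; ConsentLedger and its methods are hypothetical names, and a real system would persist the events in a database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ConsentEvent:
    user_id: str
    purpose: str      # one processing purpose per event, never a bundle
    granted: bool     # True = consent given, False = consent withdrawn
    at: datetime


@dataclass
class ConsentLedger:
    """Append-only log, so every grant and withdrawal stays on record as evidence."""
    events: List[ConsentEvent] = field(default_factory=list)

    def record(self, user_id: str, purpose: str, granted: bool,
               at: Optional[datetime] = None) -> None:
        when = at or datetime.now(timezone.utc)
        self.events.append(ConsentEvent(user_id, purpose, granted, when))

    def has_consent(self, user_id: str, purpose: str) -> bool:
        # The most recently recorded event for this user and purpose decides.
        for event in reversed(self.events):
            if event.user_id == user_id and event.purpose == purpose:
                return event.granted
        return False  # no record means no consent
```

Keeping withdrawals as new events rather than deleting the original grant means the log can later show when consent existed and when it stopped.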

Month 3: Security Hardening

Enable encryption at rest and in transit across all services
Implement proper IAM with least-privilege access
Set up access logging and monitoring
Configure intrusion detection

Month 4: Breach Response

Build automated breach detection
Create incident response playbook with clear roles
Prepare notification templates for the Board and affected users
Run a tabletop exercise

Month 5: Data Principal Rights

Build data access, correction, and deletion capabilities
Create user-facing dashboard for consent management
Test the full lifecycle: user requests data, receives it, requests deletion, data is deleted

Month 6: Audit and Documentation

Internal compliance audit
Document everything (the Board wants to see evidence of good-faith effort)
Train team members who handle personal data
Set up ongoing monitoring and review cadence

Do Not Wait Until 2027

The startups that start now will be compliant by May 2027. The startups that wait will be scrambling, cutting corners, and hoping the Board does not come knocking.

If you want a clear picture of where your startup stands today, book a free 20-minute infrastructure review. We will tell you exactly what is broken and what it costs to fix. No pitch, just a practical assessment.

MatrixGard helps funded startups get audit-ready in 4-6 weeks. See how we work or view our pricing.

AWS IAM Audit for Startups: A Step-by-Step Guide to Finding and Fixing Risky Permissions

noreply@matrixgard.com (Avinash S) — Thu, 26 Mar 2026 13:02:17 GMT

Most startups don't have an IAM problem. They have ten IAM problems, and they don't know about any of them. A developer needed S3 access six months ago, got AdministratorAccess because it was faster, and that credential is still active. A Lambda function has a role that can write to every DynamoDB table in the account. An intern who left in March still has a login. This is the normal state of AWS IAM at a Series A company, and it is a serious liability.

This guide walks you through an AWS IAM audit for your startup using the AWS CLI and the IAM console. No paid tools required to start. You will know exactly what to look for, what to fix first, and what mistakes to avoid.

Why IAM Audits Matter More at Startups

Larger companies have dedicated security teams running automated compliance checks. Startups move fast, give developers broad access to unblock them, and rarely clean up afterward. That combination means your AWS blast radius, the scope of damage an attacker can do with one compromised credential, is usually much larger than it should be.

IAM misconfigurations are consistently among the top causes of AWS-related breaches. Stolen credentials with overly broad permissions turn a phishing email or a leaked .env file into a full account compromise. An audit does not take days. A focused review takes two to four hours and can significantly reduce your exposure.

Step 1: Generate the IAM Credential Report

Start here. Run this command to generate a CSV of every IAM user, their last activity, and whether MFA is enabled:

aws iam generate-credential-report

Then download it:

aws iam get-credential-report --query Content --output text | base64 -d > iam_report.csv

Open the CSV and look for three things immediately. First, any user with password_enabled set to true where password_last_used is more than 90 days ago or shows no_information instead of a date. Those accounts are dormant and should be disabled or deleted. Second, any user where mfa_active is false and password_enabled is true. That is a human login without MFA, which is unacceptable. Third, any active access key where access_key_1_last_used_date or access_key_2_last_used_date is older than 90 days or shows N/A. Rotate or delete it.

Step 2: Find Overprivileged Users and Roles

Run this to list all users with attached managed policies:

aws iam list-users --query 'Users[*].UserName' --output text | tr '\t' '\n' | xargs -I{} aws iam list-attached-user-policies --user-name {}

You are specifically looking for AdministratorAccess or PowerUserAccess attached to any user who is not a break-glass emergency account. If a developer has AdministratorAccess for day-to-day work, that is the first thing to fix.

For roles, do the same check:

aws iam list-roles --query 'Roles[*].RoleName' --output text | tr '\t' '\n' | xargs -I{} aws iam list-attached-role-policies --role-name {}

Pay close attention to roles used by Lambda functions, ECS tasks, and EC2 instances. These are frequently over-permissioned because they were set up quickly and never revisited.
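
Once you have the JSON from those commands, spotting the broad grants is a simple set intersection. The sketch below assumes you have collected each principal's list-attached-user-policies or list-attached-role-policies response into a dictionary keyed by user or role name; the function name and the break_glass parameter are hypothetical.

```python
BROAD_POLICIES = {"AdministratorAccess", "PowerUserAccess"}


def find_broad_grants(attached_by_principal, break_glass=frozenset()):
    """Map each user or role to the broad managed policies attached to it.

    `attached_by_principal` maps a principal name to the parsed JSON response,
    which carries an "AttachedPolicies" list of {"PolicyName", "PolicyArn"}.
    Names listed in `break_glass` are skipped on purpose.
    """
    flagged = {}
    for principal, response in attached_by_principal.items():
        if principal in break_glass:
            continue
        names = {policy["PolicyName"] for policy in response.get("AttachedPolicies", [])}
        hits = sorted(names & BROAD_POLICIES)
        if hits:
            flagged[principal] = hits
    return flagged
```

This only covers directly attached managed policies. Inline policies and policies inherited through groups need their own pass.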

Step 3: Use IAM Access Analyzer

Enable IAM Access Analyzer in the IAM console if you have not already. Its external access analysis comes at no additional charge, and it will flag any resource policies that allow access from outside your AWS account or organization. Go to IAM, click Access Analyzer, create an analyzer for your account or organization, and review the findings. Any finding labeled as external access to an S3 bucket, KMS key, or Lambda function deserves immediate attention.

Step 4: Review Inline Policies and Old Roles

Inline policies are easy to miss because they do not show up in managed policy lists. Check them with:

aws iam list-user-policies --user-name YOURUSERNAME

Also audit roles that have not been used recently. AWS records last role activity in the console under IAM, Roles. Sort by last activity and flag anything unused for 60 days or more for deletion.

Common Mistakes Startups Make

Using the root account for anything operational. Create an admin IAM user or use AWS IAM Identity Center (formerly AWS SSO). Lock down root and store those credentials offline.
Sharing access keys across team members. Every person and every service should have its own credential. Shared keys make audit logs useless.
Attaching policies directly to users instead of groups or roles. This makes permissions impossible to manage at scale. Use groups for humans and roles for services.
Skipping the permissions boundary on developer roles. If developers can create IAM roles themselves, they can escalate their own privileges. Use permissions boundaries to cap what they can grant.
Never reviewing third-party cross-account roles. Every SaaS tool you connected to AWS may have a cross-account role sitting in your account with broad access. List them and verify they are still needed and still scoped correctly.

Run this audit quarterly at minimum. If you are preparing for SOC 2 or a security review from an enterprise customer, you will need evidence that you do this regularly. A spreadsheet log of findings and remediations is enough to start.

Need help?

If you'd rather have someone do this for you, book a free 20-minute call with MatrixGard. We'll tell you what's broken and what it costs to fix.

Cloud Cost Optimization for Startups: Cut AWS Bills Fast

noreply@matrixgard.com (Avinash S) — Thu, 26 Mar 2026 13:00:38 GMT

Cloud bills have a way of sneaking up on you. One quarter you are running lean, and the next you are staring at a $40,000 AWS invoice wondering where it all went. For startups, that kind of surprise can derail a runway projection and trigger uncomfortable conversations with your board. The good news is that most cloud waste follows predictable patterns, and fixing them does not require a dedicated FinOps team.

Start With Visibility Before You Cut Anything

The single biggest mistake I see startup teams make is jumping straight to reserved instances or savings plans without first understanding where money is actually going. Turn on AWS Cost Explorer or the equivalent in your cloud provider and tag every resource by environment, team, and service. Without tagging, you are flying blind.

A practical first step: run this AWS CLI command to find untagged EC2 instances.

aws ec2 describe-instances --query 'Reservations[*].Instances[?!not_null(Tags)]'

Once you have tagging in place, set up a weekly cost report delivered to a Slack channel. Visibility alone tends to change behavior. Engineers who see their service costs start making smarter decisions about instance sizes and data transfer.
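
Finding fully untagged instances is only half the job. You also want instances that are missing one of the three tags. Here is a small sketch that works on the saved JSON output of aws ec2 describe-instances. The exact tag keys (environment, team, service) and the case-insensitive match are assumptions, so adjust them to your own convention.

```python
REQUIRED_TAGS = ("environment", "team", "service")  # assumed tag keys


def missing_tags(describe_instances_output, required=REQUIRED_TAGS):
    """Map each instance ID to the required tag keys it lacks."""
    report = {}
    for reservation in describe_instances_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Instances with no tags at all simply have no "Tags" key.
            present = {tag["Key"].lower() for tag in instance.get("Tags", [])}
            absent = [key for key in required if key not in present]
            if absent:
                report[instance["InstanceId"]] = absent
    return report
```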

Right-Size Your Compute First

Compute is almost always the largest line item for early-stage startups, and it is almost always over-provisioned. A team will launch a service on an m5.2xlarge during a high-traffic test and forget to scale it back down. That single instance running idle costs roughly $280 per month.

Use AWS Compute Optimizer or Datadog's infrastructure recommendations to find instances running below 20 percent CPU utilization for more than two weeks. Those are your first targets. Downsizing from an m5.2xlarge to an m5.large on a low-traffic internal service can save over $200 per month per instance.

Check CPU and memory utilization over a 30-day window, not just peak hours
Consider Graviton-based instances (m7g, c7g), which are typically priced below comparable x86 instances and which AWS markets as delivering up to 40 percent better price-performance
Use Spot Instances for batch jobs, data pipelines, and non-critical background workers
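
The right-sizing numbers above are easy to reproduce. This sketch assumes on-demand Linux rates in us-east-1 of $0.384 per hour for m5.2xlarge and $0.096 per hour for m5.large, and 730 hours in a month. The rates are assumptions hard-coded for illustration, so check current pricing before you rely on them.

```python
HOURS_PER_MONTH = 730

# Assumed on-demand hourly rates in USD; verify against current AWS pricing.
ASSUMED_HOURLY_USD = {
    "m5.2xlarge": 0.384,
    "m5.large": 0.096,
}


def monthly_cost(instance_type, count=1):
    """Approximate monthly on-demand cost for `count` always-on instances."""
    return round(ASSUMED_HOURLY_USD[instance_type] * HOURS_PER_MONTH * count, 2)


def downsizing_saving(current, target, count=1):
    """Monthly saving from moving `count` instances from `current` to `target`."""
    return round(monthly_cost(current, count) - monthly_cost(target, count), 2)


if __name__ == "__main__":
    print("m5.2xlarge per month:", monthly_cost("m5.2xlarge"))
    print("saving after downsizing:", downsizing_saving("m5.2xlarge", "m5.large"))
```

Under those assumed rates the idle m5.2xlarge comes to about $280 per month and the downsizing saves about $210, which matches the figures quoted above.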

Storage Costs Compound Quietly

S3 buckets, EBS volumes, and RDS snapshots accumulate over time without anyone noticing. A startup I worked with was spending $3,200 per month on S3 alone, and nearly half of it was old build artifacts and test data nobody had touched in over a year.

Set lifecycle policies on every S3 bucket. For most engineering assets, moving objects to S3 Intelligent-Tiering after 30 days and to Glacier after 90 days cuts storage costs by 60 percent or more with zero code changes.
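
As a sketch of what that policy looks like, the snippet below builds a lifecycle configuration in the JSON shape S3 expects and prints it. The rule ID and the build-artifacts/ prefix are hypothetical, and the 30 and 90 day values are simply the ones from the paragraph above.

```python
import json


def build_lifecycle_config(prefix="", tiering_after_days=30, glacier_after_days=90):
    """Build an S3 lifecycle configuration with two transitions."""
    if glacier_after_days <= tiering_after_days:
        raise ValueError("the Glacier transition must come after the tiering transition")
    return {
        "Rules": [
            {
                "ID": "tier-then-archive",      # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},   # empty prefix = whole bucket
                "Transitions": [
                    {"Days": tiering_after_days, "StorageClass": "INTELLIGENT_TIERING"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_lifecycle_config(prefix="build-artifacts/"), indent=2))
```

Save the output as lifecycle.json and apply it with aws s3api put-bucket-lifecycle-configuration --bucket YOURBUCKET --lifecycle-configuration file://lifecycle.json. Try it on a non-critical bucket first, since Glacier retrievals are slower and billed separately.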

For RDS, audit your automated backup retention settings. The default is often 7 days, but teams raise it to the 35-day maximum and forget. Also check for unattached EBS volumes using:

aws ec2 describe-volumes --filters Name=status,Values=available

Available volumes are not attached to any instance. You are paying for storage that is doing nothing.

Data Transfer Is a Hidden Budget Killer

Data transfer fees are confusing by design, and they catch a lot of startup teams off guard. Traffic leaving AWS to the public internet costs $0.09 per GB in us-east-1 for the first 10 TB each month. If your application is pulling data from S3 in one region and processing it in another, you are paying cross-region transfer fees as well.

Use VPC Endpoints for S3 and DynamoDB to eliminate NAT Gateway data processing charges
Co-locate your compute and storage in the same region and availability zone where possible
Enable S3 Transfer Acceleration only when users are globally distributed, not as a default

A single NAT Gateway processing 10 TB per month adds roughly $450 in processing fees alone, separate from the hourly charge. Switching S3 and DynamoDB traffic to gateway VPC Endpoints removes that cost entirely for those services. Interface endpoints for other services carry their own, smaller per-GB charge.
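
To see how much of that bill you can move, here is the arithmetic as a short sketch. It assumes a processing rate of $0.045 per GB, which is the rate implied by the $450 figure above, and counts 1 TB as 1,000 GB. The share parameter is a hypothetical estimate of how much of your NAT traffic is actually S3 or DynamoDB.

```python
ASSUMED_NAT_PROCESSING_USD_PER_GB = 0.045  # implied by $450 for 10 TB; verify current pricing
GB_PER_TB = 1000


def nat_processing_cost(tb_per_month, share_moved_to_gateway_endpoints=0.0):
    """Monthly NAT Gateway processing fee after moving a share of traffic away.

    `share_moved_to_gateway_endpoints` is the fraction (0.0 to 1.0) of traffic
    rerouted through gateway VPC endpoints, which add no processing fee.
    """
    if not 0.0 <= share_moved_to_gateway_endpoints <= 1.0:
        raise ValueError("share must be between 0 and 1")
    remaining_gb = tb_per_month * GB_PER_TB * (1 - share_moved_to_gateway_endpoints)
    return round(remaining_gb * ASSUMED_NAT_PROCESSING_USD_PER_GB, 2)


if __name__ == "__main__":
    print("all traffic through NAT:", nat_processing_cost(10))
    print("60 percent moved to endpoints:", nat_processing_cost(10, 0.6))
```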

Build Cost Checks Into Your Engineering Workflow

Cloud cost optimization for startups is not a one-time audit. It is a habit. The teams that keep bills under control treat infrastructure spend the same way they treat security, which means they review it regularly and they catch regressions early.

Add Infracost to your Terraform pull requests so engineers see cost diffs before merging
Set billing alerts at 80 percent and 100 percent of your monthly budget with AWS Budgets, or with CloudWatch billing alarms set to the matching dollar amounts
Schedule a 30-minute monthly cost review with your lead engineer and someone from finance
Use AWS Budgets with service-level breakdowns so you can spot anomalies by resource type
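
For the weekly report or the monthly review, it helps to know not just what you have spent but where the month is heading. The sketch below is a deliberately naive linear projection paired with the 80 and 100 percent thresholds mentioned above. The function names are hypothetical, and it is no substitute for the alerts the cloud provider sends on its own.

```python
import calendar
from datetime import date

ALERT_THRESHOLDS = (0.8, 1.0)  # 80 percent and 100 percent of budget


def projected_month_end_spend(spend_to_date, today):
    """Naive projection: assume the daily run rate so far holds all month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def crossed_thresholds(amount, monthly_budget, thresholds=ALERT_THRESHOLDS):
    """Return the budget fractions that `amount` has reached or passed."""
    return [t for t in thresholds if amount >= t * monthly_budget]
```

Calling crossed_thresholds on the projected figure rather than on actual spend gives a warning while there is still time in the month to react.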

The goal is not to make engineers afraid to provision resources. The goal is to make costs visible so that decisions are intentional. A startup that builds this muscle early will scale infrastructure spending in proportion to revenue instead of in spite of it.

Need help?

If you would rather have someone do this for you, book a free 20-minute call with MatrixGard. We will tell you what is broken and what it costs to fix.