Most enterprise AWS estates did not move to the cloud; they extended into it.
Direct Connect lands, someone writes a 10.0.0.0/16 rule, and that
rule sits there through every reorg and audit, silently trusting 65,536 addresses
for years. sg-tightener reads your VPC flow logs,
observes which IPs have actually connected, and replaces those broad rules with
the tightest covering CIDRs, without breaching AWS rule limits.
Worked example from a typical three-year-old account with a Direct Connect path from a corporate datacenter. Every replacement CIDR was empirically derived from 90 days of accepted connections in VPC flow logs.
# Before                                    After
10.0.0.0/16 on tcp/443  (65,536 trusted)  → 10.0.10.0/27, 10.0.20.10/31 (34 trusted)
10.4.0.0/16 on tcp/5432 (65,536 trusted)  → 10.4.8.0/29 (8 trusted)
10.8.0.0/16 on tcp/8443 (65,536 trusted)  → 10.8.2.0/28, 10.8.16.0/28 (32 trusted)
--------------------------------------------------------------------
Total trusted address space:              196,608 → 74
Lateral-movement surface area reduction:  99.96%
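The totals in the worked example can be re-derived with Python's standard ipaddress module. This is only an arithmetic check on the CIDRs listed above, not part of the tool itself.

```python
import ipaddress

# Before/after CIDRs copied from the worked example.
EXAMPLE = {
    "tcp/443": ("10.0.0.0/16", ["10.0.10.0/27", "10.0.20.10/31"]),
    "tcp/5432": ("10.4.0.0/16", ["10.4.8.0/29"]),
    "tcp/8443": ("10.8.0.0/16", ["10.8.2.0/28", "10.8.16.0/28"]),
}


def trusted(cidrs):
    """Count the addresses covered by a list of CIDR strings."""
    return sum(ipaddress.ip_network(c).num_addresses for c in cidrs)


before = sum(trusted([broad]) for broad, _ in EXAMPLE.values())
after = sum(trusted(tight) for _, tight in EXAMPLE.values())
reduction = 100 * (1 - after / before)

print(f"{before:,} -> {after:,} ({reduction:.2f}% reduction)")
# prints: 196,608 -> 74 (99.96% reduction)
```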
Each tool covers a single responsibility. They share an approved-IPs file as the source of truth, so the same observed evidence flows through analyse, plan, apply, diagnose, and revert, with break-glass extend and rule compaction on either side for live incidents.
Reads VPC flow logs (CloudWatch Logs or S3), builds a plan, applies it atomically per group, and halts loudly with a revert command if anything fails partway through.
Surfaces private source IPs being REJECTED and not covered by any current rule. Lets you merge them into the approved list and re-apply in one loop.
Walks the entire Organisation, assumes a cross-account role per account, scans every region in parallel, and ranks accounts by severity-weighted risk score. Exits non-zero on any CRITICAL, usable as a pipeline gate.
For DR failovers and live incidents. Reads the last 24h of flow logs,
finds the source IPs being REJECTED, and, when --groups
is omitted, looks up each REJECTED destination ENI to derive which
SGs need the rule. Lambda / Route53-healthcheck traffic collapses into
the AWS service prefix instead of /32s. Strictly additive; the manifest is
folded back into the evidence base on the next cycle.
Reclaims rule budget when a group nears the 60-rule cap by widening existing RFC 1918 CIDRs into fewer blocks. You pick a compaction ratio; plan mode ranks the busiest groups and sweeps ratios so you can see the trade-off before applying. Coverage is always preserved.
The flow is intentionally boring: read evidence, propose changes, apply with halt-on-error, diagnose anything that breaks. Every step writes a JSON artefact that the next step reads, with no hidden state.
Read 90+ days of flow logs from CloudWatch Logs or S3. Write a sorted, deduplicated list of accepted source IPs to approved.json.
Collapse observed IPs into the smallest CIDR set within each security group's rule budget. Emit a signed plan.json showing every revoke and authorise.
Revoke broad rules and authorise tight replacements per group. If any single group fails, halt immediately and print the revert command. No partial silent state.
If anything legitimate gets caught, sg_diagnose.py scans REJECT entries, surfaces uncovered private sources, and merges them back into approved.json.
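The first step, turning flow logs into a deduplicated list of accepted sources, can be sketched as follows. It assumes the default (version 2) VPC flow log field order and assumes the approved list is a plain JSON array; the real approved.json schema and the function name accepted_private_sources are illustrative guesses, not the tool's own.

```python
import ipaddress
import json

# Field order of the default (version 2) VPC flow log format.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def accepted_private_sources(lines):
    """Return sorted, deduplicated RFC 1918 source IPs from ACCEPT records."""
    seen = set()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # malformed line or a custom log format
        record = dict(zip(FIELDS, parts))
        if record["action"] != "ACCEPT":
            continue
        try:
            ip = ipaddress.ip_address(record["srcaddr"])
        except ValueError:
            continue  # NODATA / SKIPDATA records carry "-" here
        if any(ip in block for block in RFC1918):
            seen.add(ip)
    # Sort numerically, so 10.0.10.5 comes before 10.0.10.17.
    return [str(ip) for ip in sorted(seen)]


# Made-up sample records for illustration only.
SAMPLE = [
    "2 123456789012 eni-0a1 10.0.10.17 10.1.0.20 49152 443 6 10 840 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0a1 10.0.10.5 10.1.0.20 49153 443 6 8 640 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0a1 10.0.10.5 10.1.0.20 49154 443 6 8 640 1700000060 1700000120 ACCEPT OK",
    "2 123456789012 eni-0a1 10.0.77.9 10.1.0.20 49155 22 6 1 40 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0a1 198.51.100.7 10.1.0.20 49156 443 6 5 300 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0a1 - - - - - - - 1700000000 1700000060 - NODATA",
]

approved = accepted_private_sources(SAMPLE)
print(json.dumps(approved))
```

The rejected source, the public source, and the NODATA record are all dropped, and the duplicate collapses to one entry.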
# 1. Read 90 days of flow logs
python sg_tightener.py analyse \
  --region us-east-1 \
  --log-group /aws/vpc/flowlogs \
  --days 90 \
  --out approved.json

# 2. Build a plan (no AWS writes)
python sg_tightener.py plan \
  --region us-east-1 \
  --approved approved.json \
  --max-rules 60 \
  --out plan.json

# 3. Review plan.json, then apply
python sg_tightener.py apply --plan plan.json

# 4. If something legitimate is now being blocked
python sg_diagnose.py --region us-east-1 \
  --log-group /aws/vpc/flowlogs --hours 24

# Worst case: full revert from the manifest written by apply
python sg_tightener.py revert --manifest manifest-20260528T120000Z.json
# Outage: at 2am you know things are broken, but not which SGs, which CIDRs,
# or which ports. Omit --groups and sg_extend discovers everything itself
# from the last 24h of REJECT flow logs (destination ENI -> attached SGs).
python sg_extend.py \
  --region us-east-1 \
  --log-group /aws/vpc/flowlogs

# When you do know which SGs, scope it explicitly. --include-public also
# turns on AWS-service summarisation: Lambda Hyperplane / R53 health-check
# sources collapse into the published service prefix, not /32 host routes.
python sg_extend.py \
  --region us-east-1 \
  --groups sg-aaaa,sg-bbbb \
  --log-group /aws/vpc/flowlogs \
  --hours 24 \
  --tolerance 0.5 \
  --ports 443,5432 \
  --include-public

# Afterwards a group may be near the 60-rule cap. See where the rules are and
# what each compaction ratio would reclaim (no AWS writes):
python sg_compact.py plan --region us-east-1

# Pick a ratio, write a plan, review it, then apply:
python sg_compact.py plan --region us-east-1 --ratio 0.5 --out plan.json
python sg_compact.py apply --plan plan.json

# Reverts via the same manifest machinery as sg_tightener.
python sg_compact.py revert --manifest sg_compact-manifest-20260528T120000Z.json
AWS caps a security group at 60 inbound rules by default. If 200 IPs have connected, you can't write 200 /32s, and a /16 reintroduces the permissiveness you're trying to remove. The algorithm finds the middle ground.
For each observed IP, walk outward to the widest containing prefix where the gap fraction (addresses in the block that were never observed) stays within the configured tolerance (default 30%).
Densely-populated subnets collapse aggressively. Sparse outliers stay as /32 host routes. Nothing widens past the IP's RFC 1918 home block.
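A minimal sketch of this first layer, under one reading of the description: for each observed IP, every supernet up to the RFC 1918 home block is tested, and the widest one whose gap fraction is within tolerance wins. The function names (widen, collapse, gap_fraction) are illustrative, and the real implementation may differ in how it walks and deduplicates.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def home_block(ip):
    """Return the RFC 1918 block containing ip, or None for public addresses."""
    for block in RFC1918:
        if ip in block:
            return block
    return None


def gap_fraction(net, observed):
    """Fraction of addresses in net that were never observed."""
    seen = sum(1 for ip in observed if ip in net)
    return 1 - seen / net.num_addresses


def widen(ip, observed, tolerance):
    """Widest prefix containing ip whose gap fraction is within tolerance."""
    best = candidate = ipaddress.ip_network(f"{ip}/32")
    home = home_block(ip)
    if home is None:
        return best  # not private: leave as a host route
    while candidate.prefixlen > home.prefixlen:
        candidate = candidate.supernet()
        if gap_fraction(candidate, observed) <= tolerance:
            best = candidate
    return best


def collapse(ips, tolerance=0.30):
    """Collapse observed IPs into covering CIDRs at the given tolerance."""
    observed = {ipaddress.ip_address(i) for i in ips}
    blocks = {widen(ip, observed, tolerance) for ip in observed}
    # collapse_addresses drops blocks already contained in a wider one.
    return sorted(ipaddress.collapse_addresses(blocks))


# Six of the eight addresses in 10.0.10.0/29, plus one sparse outlier.
dense = [f"10.0.10.{i}" for i in range(6)]
result = [str(n) for n in collapse(dense + ["10.0.99.1"])]
print(result)
```

The dense subnet has a 25% gap, so it collapses to a single /29, while the outlier's /31 would be half empty and it stays a /32.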
If layer 1 produces more rules than the group's budget, widen the tolerance in 5% steps up to 95%, recomputing each time. Every step is logged so the operator can see exactly what trade-off was made.
If 95% tolerance is still over budget, merge the closest pair of blocks whose union introduces the smallest amount of new untrusted space. Merges never cross an RFC 1918 boundary, so a 10/8 block is never fused with a 172.16/12 block.
Force-fit prints a loud warning recommending an AWS Support quota increase.
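The force-fit step can be sketched as a greedy pairwise merge. This version assumes the "union" of two blocks is their smallest common supernet and measures new untrusted space as the supernet's size minus the two blocks' sizes; the names force_fit and common_supernet are hypothetical, and the warning text is a placeholder.

```python
import ipaddress
from itertools import combinations

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def home_block(net):
    """Return the RFC 1918 block that fully contains net, or None."""
    for block in RFC1918:
        if net.subnet_of(block):
            return block
    return None


def common_supernet(a, b):
    """Smallest single CIDR that contains both a and b."""
    net, other = (a, b) if a.prefixlen <= b.prefixlen else (b, a)
    while not other.subnet_of(net):
        net = net.supernet()
    return net


def force_fit(blocks, budget):
    """Merge the cheapest pair repeatedly until the block count fits budget."""
    blocks = sorted(ipaddress.collapse_addresses(blocks))
    while len(blocks) > budget:
        best = None
        for a, b in combinations(blocks, 2):
            home = home_block(a)
            if home is None or home != home_block(b):
                continue  # never merge across an RFC 1918 boundary
            merged = common_supernet(a, b)
            cost = merged.num_addresses - a.num_addresses - b.num_addresses
            if best is None or cost < best[0]:
                best = (cost, a, b, merged)
        if best is None:
            raise ValueError("budget cannot be met within RFC 1918 boundaries")
        cost, a, b, merged = best
        print(f"WARNING: force-fit merged {a} + {b} -> {merged} "
              f"(+{cost} untrusted addresses); consider a quota increase")
        remaining = [n for n in blocks if n not in (a, b)]
        blocks = sorted(ipaddress.collapse_addresses(remaining + [merged]))
    return blocks


nets = [ipaddress.ip_network(c) for c in
        ("10.8.2.0/28", "10.8.2.64/28", "10.8.16.0/28", "172.16.5.0/28")]
fitted = [str(n) for n in force_fit(nets, budget=3)]
print(fitted)
```

Here the two neighbours in 10.8.2.0/25 merge at a cost of 96 new addresses, far cheaper than fusing either with 10.8.16.0/28, and the 172.16/12 block is never a merge candidate for a 10/8 block.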
The budget for replacement rules is computed from the current state of each group, not the global limit. If a group has 25 rules being left alone (SG references, public 0.0.0.0/0, already-tight CIDRs), the budget for replacements is 60 − 25 − (broad rules removed).
Prevents the failure mode where apply succeeds in revoking but fails in authorising because the destination can't hold the new rules.
Eligibility uses strict subset semantics, not overlap: a rule qualifies
only if its CIDR sits entirely inside an RFC 1918 block, so ranges like
192.0.0.0/4 that merely overlap private space are correctly excluded. Rules at /24 or tighter are not modified.
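The difference between subset and overlap is easy to see with the standard ipaddress module. The check below is a sketch of the rule as described, with is_eligible as an illustrative name rather than the tool's own function.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_eligible(cidr):
    """True if the rule is wholly private and broader than /24."""
    net = ipaddress.ip_network(cidr, strict=False)
    if net.version != 4:
        return False
    inside_private = any(net.subnet_of(block) for block in RFC1918)
    return inside_private and net.prefixlen < 24


for cidr in ("10.0.0.0/16", "192.0.0.0/4", "10.1.2.0/24", "0.0.0.0/0"):
    print(cidr, is_eligible(cidr))
```

192.0.0.0/4 overlaps 192.168.0.0/16 but is not contained in it, so an overlap test would wrongly flag it while the subset test leaves it alone.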
NACLs are scanned and labelled in the OU report. Automated NACL tightening is a planned phase two: NACLs are stateless, subnet-scoped, and limited to 20 rules by default, and each of those needs separate care.
sg-tightener does not evaluate whether services should be reachable at all, only whether the source CIDR on existing private rules is broader than the evidence supports.
The default 90-day window is long enough to catch most regular traffic and not long enough to catch everything. The categories of traffic most likely to be missed are the ones that matter most in a crisis, and the tool is built around that risk, not in spite of it.
Extend with --days 180 or longer for accounts where you
know seasonal or infrequent traffic patterns exist: quarterly DR
tests, month-end batches, blue-green failovers where the dormant
environment was inactive during analysis.
If any single security group fails to update cleanly, apply halts immediately and prints the revert command. Partial silent state is impossible. Every apply writes a timestamped manifest of every change so revert can be one command.
Plans are signed with a SHA-256 hash of the security group snapshot they were built from. If anyone touches a relevant rule between plan and apply, apply refuses to run.
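One way such a staleness check could work is sketched below. The rule dictionaries, the snapshot_sha256 key, and the function names are assumptions for illustration; the actual plan.json layout is not shown here.

```python
import hashlib
import json


def snapshot_hash(rules):
    """SHA-256 over a canonical, order-independent encoding of the rules."""
    canonical = sorted(json.dumps(rule, sort_keys=True) for rule in rules)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def verify_plan(plan, live_rules):
    """Refuse to proceed if the live rules differ from the planned snapshot."""
    if snapshot_hash(live_rules) != plan["snapshot_sha256"]:
        raise RuntimeError("security group changed since plan; re-run plan")


# Hypothetical snapshot taken at plan time.
rules = [
    {"group_id": "sg-aaaa", "cidr": "10.0.0.0/16", "port": 443},
    {"group_id": "sg-aaaa", "cidr": "10.4.0.0/16", "port": 5432},
]
plan = {"snapshot_sha256": snapshot_hash(rules)}

verify_plan(plan, list(reversed(rules)))  # same rules, different order: passes

drifted = rules + [{"group_id": "sg-aaaa", "cidr": "10.9.0.0/16", "port": 22}]
try:
    verify_plan(plan, drifted)
    drift_detected = False
except RuntimeError:
    drift_detected = True
print("drift detected:", drift_detected)
```

Sorting the encoded rules before hashing keeps the digest stable when the API returns the same rules in a different order.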
sg_extend.py exists for the cases the standard loop can't
cover: DR failovers, supplier IP cutovers, on-call moments where
connectivity must come back in minutes. It reads the last 24h of flow
logs, adds the REJECTED private sources — collapsed into CIDRs by a
configurable tolerance — strictly additively, and logs a manifest.
At 2am the operator often knows something is broken but not
which security groups need patching. Omit --groups
and sg_extend looks up each REJECTed destination ENI and
derives the attached SGs itself; flows are attributed only to the SGs
whose ENIs actually saw them, so a typo can't fan rules out across the
estate. A --max-groups cap (default 20) is the hard ceiling.
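The attribution step can be sketched as a pure function over two inputs: the rejected flows and a mapping from ENI to attached security groups (which in practice could come from the EC2 DescribeNetworkInterfaces API). The function name is hypothetical, and raising an error when the cap is exceeded is an assumption about what "hard ceiling" means.

```python
from collections import defaultdict


def attribute_rejects(rejects, eni_groups, max_groups=20):
    """Map each security group to the rejected sources its own ENIs saw.

    rejects:    iterable of (interface_id, srcaddr) pairs from REJECT records
    eni_groups: dict of interface_id -> list of attached security group IDs
    """
    per_group = defaultdict(set)
    for eni, src in rejects:
        # Only groups attached to the ENI that saw the flow get the source.
        for sg in eni_groups.get(eni, ()):
            per_group[sg].add(src)
    if len(per_group) > max_groups:
        raise RuntimeError(
            f"{len(per_group)} groups affected, above the cap of {max_groups}")
    return {sg: sorted(srcs) for sg, srcs in per_group.items()}


# Made-up identifiers for illustration.
eni_groups = {"eni-1": ["sg-aaaa"], "eni-2": ["sg-aaaa", "sg-bbbb"]}
rejects = [("eni-1", "10.0.77.9"), ("eni-2", "10.0.77.12"), ("eni-9", "10.0.1.1")]

attributed = attribute_rejects(rejects, eni_groups)
print(attributed)
```

The flow seen by the unknown eni-9 is attributed to nothing, and sg-bbbb only receives the source that reached its own ENI.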
VPC Lambda traffic, Route53 health-checkers, and other managed-ENI
sources arrive from AWS-published IP ranges that rotate over time.
When --include-public is on, sg_extend
classifies each public source against the AWS ip-ranges.json
and collapses every flow that falls inside a service prefix into one
rule per service — instead of a fistful of /32s that go stale on
the next AWS rotation. The rule description tags the service and region
for audit visibility.
The AMAZON catch-all is deliberately blocklisted —
it covers essentially all of AWS and is too broad to be a trust source.
Pass --no-aws-summarise to revert to per-IP host routes.
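A rough sketch of the classification follows, using the published ip-ranges.json layout (a "prefixes" list with ip_prefix, region, and service keys). The sample prefixes are documentation addresses, not real AWS ranges, and emitting one rule per matched prefix with the most specific match winning is an assumption about how the summarisation behaves.

```python
import ipaddress

BLOCKLIST = {"AMAZON"}  # catch-all service, too broad to trust


def classify(ip, ip_ranges):
    """Return (network, service, region) for the most specific match, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for entry in ip_ranges["prefixes"]:
        if entry["service"] in BLOCKLIST:
            continue
        net = ipaddress.ip_network(entry["ip_prefix"])
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, entry["service"], entry["region"])
    return best


def summarise(sources, ip_ranges):
    """Map rule CIDR -> description, collapsing sources into service prefixes."""
    rules = {}
    for ip in sources:
        match = classify(ip, ip_ranges)
        if match is None:
            rules[f"{ip}/32"] = "unclassified public source"
        else:
            net, service, region = match
            rules[str(net)] = f"{service} {region}"
    return rules


# Made-up sample in the ip-ranges.json shape (documentation address space).
SAMPLE = {"prefixes": [
    {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "AMAZON"},
    {"ip_prefix": "203.0.113.0/26", "region": "us-east-1",
     "service": "ROUTE53_HEALTHCHECKS"},
]}

summary = summarise(["203.0.113.5", "203.0.113.9", "203.0.113.200"], SAMPLE)
print(summary)
```

Two health-checker sources collapse into one service-tagged rule, while the address that only matches the blocklisted AMAZON entry falls back to a /32.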
A noisy incident can push a group toward the 60-rule cap.
sg_compact.py reclaims budget by widening existing RFC 1918
CIDRs into fewer blocks, gated by a compaction ratio — the fraction
of unused space you'll tolerate. Plan mode ranks the busiest groups and
sweeps ratios first; coverage is never reduced.
The CIDR-collapsing algorithm has enough edge cases that a regression suite is essential before any change. Every algorithmic invariant, parser, and validation rule is covered.
cd sg-tightener/
./install.sh
source .venv/bin/activate
python sg_tightener_test.py
Ran 66 tests in 0.025s
OK
Most organisations spend considerable effort building security controls at the perimeter: WAFs, DDoS protection, identity federation. What receives far less attention is the internal trust model once traffic is past the perimeter. The implicit assumption in most hybrid cloud estates is that the corporate network is trusted, and that assumption is encoded directly into security group rules as broad RFC 1918 CIDR blocks that nobody has revisited since they were written.
Modern threat models assume the corporate network is already compromised, or will be. Ransomware operators routinely move laterally across flat trusted networks before triggering payloads. Compromised build agents are a standard initial-access vector precisely because they sit in trusted ranges with broad permissions into production. The cloud did not eliminate flat networks; it gave many organisations the tools to build more sophisticated ones while quietly replicating the same trust assumptions they always made.
sg-tightener exists because trust should be earned through observed behaviour, not inherited from a datacenter subnet designed fifteen years ago.