I Reviewed 44,305 IDS Signatures So You Don’t Have To: Here’s What I Learned
IDS signatures are a living archive of attacker behavior. They tell us not just what threats have been seen, but how adversaries evolve their techniques. Reviewing over 44,000 Suricata IDS signatures reveals a truth defenders often ignore: attackers make minor, iterative changes to what already works.
If we only chase signatures, we’ll always be behind. This isn’t a slight against Suricata; the project is incredible, open, and community-driven. The rules encode valuable lessons from hard-fought battles, and we should be mining that data to make our networks more secure.
The real issue is how the industry uses IDS: too often it is treated as a compliance checkbox rather than a critical sensor worth tuning, measuring, and learning from. By lifting the hood and examining the metrics, we can see how much potential goes untapped when SOCs accept alerts without ever questioning what’s underneath.
What IDS Is (and How It Works)
Intrusion Detection Systems (IDS) are designed to inspect network traffic and trigger alerts when malicious activity is suspected. They operate primarily in two modes:
Signature-based detection: Rules written to identify known attack patterns, such as exploit payloads, specific CVE exploits, or malware communication fingerprints.
Anomaly-based detection: Alerts generated when traffic deviates from expected baselines.
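To contrast this with the rule matching described next, here is a minimal Python sketch of the anomaly-based idea. Everything in it is hypothetical: the metric (hourly outbound DNS query counts for one host), the sample numbers, and the 3-sigma threshold. Real anomaly engines use far richer baselines.

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Return True if `current` is more than `threshold` standard deviations
    above the mean of `history` (a deliberately simple baseline)."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two points
    if sigma == 0:
        # A perfectly flat baseline: treat any increase as a deviation.
        return current > mu
    return (current - mu) / sigma > threshold

# Hypothetical hourly DNS query counts from one workstation.
baseline = [110, 95, 102, 98, 120, 105, 99, 101]

print(is_anomalous(baseline, 104))   # an ordinary hour
print(is_anomalous(baseline, 2400))  # a sudden spike worth investigating
```

The trade-off is visible even at this size. No signature is needed to notice the spike, but the alert only says that something changed, not what it is.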
Suricata (and systems like it) rely heavily on signatures. Each signature is essentially a pattern-matching rule: match this string, regex, or packet condition, and fire an alert. These signatures are community-driven, often created in response to real-world attacks. When a vulnerability is exploited, researchers craft a signature to catch that behavior in the future.
Where the industry stumbles is in what happens after. Instead of tuning, validating, or analyzing the quality of signatures, SOCs tend to dump them in, enable logging, and move on. The result? Alerts that are:
Too broad: false positives everywhere.
Too narrow: attackers change one byte and bypass the rule.
Too old: 14+ year-old CVEs still wasting cycles.
When IDS is reduced to a checkbox, it shifts from a shield to a noise machine. The lesson from 44,305 signatures is not that IDS doesn’t work; it’s that we’re not using it to its full potential.
Sample Packet Walkthrough
To see how these signatures actually work, let’s walk through a simplified example. Imagine a packet carrying a suspicious HTTP request:
GET /admin.php?id=1' OR '1'='1 HTTP/1.1
Host: vulnerable.example.com
User-Agent: Mozilla/5.0
This is a classic SQL injection attempt. A Suricata signature for this might look like:
alert http any any -> any any (msg:"ET WEB_SPECIFIC_APPS SQL Injection Attempt"; content:"' OR '1'='1"; nocase; sid:2025001; rev:1;)
When Suricata inspects the traffic, the content keyword searches the payload for the SQLi string ' OR '1'='1, and nocase makes that comparison case-insensitive. On a match, the rule fires and generates an alert labeled with the msg text. That alert gets logged, forwarded to a SIEM such as Splunk, and counted toward the SOC’s alert volume.
This simple case illustrates the larger point: signatures fire when known bad patterns appear. But an attacker only needs to change the payload to /admin.php?id=1' OR 'a'='a, which is the same injection expressed with different bytes, and the rule no longer matches. That is why signatures are valuable intelligence, but not a guarantee of prevention.
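To make that brittleness concrete, the following Python sketch models the rule's content plus nocase as a case-insensitive substring search. That is a deliberate simplification of Suricata's detection engine. The request strings are the hypothetical ones from the walkthrough, plus a lowercase variant.

```python
# A hypothetical, heavily simplified model of a Suricata `content` match with
# `nocase`: a case-insensitive substring search over the request text.
# Real Suricata adds buffers, normalization, offsets, and fast-pattern logic.

RULE_CONTENT = "' OR '1'='1"  # the content string from the example rule

def content_matches(payload: str, content: str = RULE_CONTENT) -> bool:
    """Return True if `content` occurs anywhere in `payload`, ignoring case."""
    return content.lower() in payload.lower()

requests = {
    "original": "GET /admin.php?id=1' OR '1'='1 HTTP/1.1",
    "lowercase": "GET /admin.php?id=1' or '1'='1 HTTP/1.1",  # still caught thanks to nocase
    "variant": "GET /admin.php?id=1' OR 'a'='a HTTP/1.1",    # same logic, new bytes
}

for label, request in requests.items():
    verdict = "ALERT" if content_matches(request) else "no match"
    print(f"{label:>9}: {verdict}")
```

Because of nocase, the lowercase variant still alerts. The 'a'='a variant is logically the same injection, yet it sails through. Catching that whole family of tautologies takes something broader, such as a pcre or protocol-aware SQLi detection, and broader patterns bring their own cost and false-positive trade-offs.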
Key Metrics From the Dataset
Total signatures: 44,305
2,031 CVE references spanning 1,132 unique CVEs.
CVE call-out: Those 2,031 mentions map to 1,132 distinct issues. The same CVE often appears across multiple rules (different protocols, payload variants, or revisions), which means analysts can end up triaging the same vulnerability multiple times. Use this duplication to your advantage:
(1) patch first where duplicate coverage is highest,
(2) de-duplicate/tune overlapping rules, and
(3) measure alert volume reduction after remediation.
Earliest CVE: CVE-2006-2149; latest CVE: CVE-2025-57218.
Historical span: This dataset covers nearly twenty years of vulnerabilities, from CVE-2006-2149 through CVE-2025-57218. Attackers continue to recycle older CVEs long after disclosure, underscoring the importance of patching and monitoring legacy systems.
17,085 signatures (38.6%) remain at Rev 1: never tuned or updated from their first draft.
4,214 signatures (9.5%) contain regex/PCRE patterns, many of them overly broad and computationally expensive.
src_ip breakdown: 2,203 (5.0%) use literal any; 41,150 (92.9%) use Suricata macros like $HOME_NET/$EXTERNAL_NET; 952 (2.1%) enumerate literal IPv4 addresses (often as lists). All 44,305 entries are non-empty.
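Metrics like these can be reproduced from a rules file with a few regular expressions. The Python sketch that follows is one way to do it, not the method used for this dataset, and it rests on several assumptions:

- The sample rules are hypothetical.
- Commented-out (disabled) rules are skipped.
- CVE IDs appear either in the msg text or as reference:cve options.
- Each CVE is counted once per rule, which yields both "mentions" and "unique CVEs". The sketch also reports the earliest and latest IDs and the most duplicated ones.

```python
import re
from collections import Counter

# Hypothetical sample rules (not from the dataset); in practice, read your .rules files.
SAMPLE_RULES = r"""
alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"ET WEB_SPECIFIC_APPS Example SQLi (CVE-2006-2149)"; flow:established,to_server; pcre:"/id=\d+'\s+OR/i"; reference:cve,2006-2149; sid:1000001; rev:1;)
alert tls any any -> $HOME_NET any (msg:"ET EXAMPLE TLS Certificate Observed"; sid:1000002; rev:3;)
alert dns [192.0.2.10,198.51.100.7] any -> any any (msg:"ET EXAMPLE CnC Domain in DNS Lookup"; reference:cve,CVE-2025-57218; sid:1000003; rev:2;)
alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"ET EXAMPLE Exploit Attempt (CVE-2006-2149)"; reference:cve,2006-2149; sid:1000004; rev:1;)
""".strip().splitlines()

REV_RE = re.compile(r"(?:^|;)\s*rev\s*:\s*(\d+)\s*;")
PCRE_RE = re.compile(r"(?:^|;)\s*pcre\s*:")
# Matches "CVE-2006-2149", "cve,2006-2149", and "cve,CVE-2006-2149".
CVE_RE = re.compile(r"cve[-,]\s*(?:cve-)?(\d{4})-(\d{4,})", re.IGNORECASE)
IPV4_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def classify_src(src: str) -> str:
    if src == "any":
        return "any"
    if "$" in src:
        return "macro"  # $HOME_NET, !$HOME_NET, [$HOME_NET,...]
    if IPV4_RE.search(src):
        return "literal_ipv4"
    return "other"

def summarize(rules):
    src_kinds, cve_counts = Counter(), Counter()
    rev1 = pcre = 0
    for rule in rules:
        rule = rule.strip()
        if not rule or rule.startswith("#"):
            continue  # skip blanks and disabled/commented-out rules
        header, _, rest = rule.partition("(")
        options = rest.rsplit(")", 1)[0]
        # Header layout: action proto src_ip src_port direction dst_ip dst_port
        src_kinds[classify_src(header.split()[2])] += 1
        rev = REV_RE.search(options)
        if rev and int(rev.group(1)) == 1:
            rev1 += 1
        if PCRE_RE.search(options):
            pcre += 1
        # A set, so a CVE named in both msg and reference counts once per rule.
        cve_counts.update({(int(y), int(n)) for y, n in CVE_RE.findall(rule)})
    return {
        "total": sum(src_kinds.values()),
        "rev1": rev1,
        "pcre": pcre,
        "src_ip": dict(src_kinds),
        "cve_mentions": sum(cve_counts.values()),
        "unique_cves": len(cve_counts),
        "earliest": min(cve_counts, default=None),
        "latest": max(cve_counts, default=None),
        "most_duplicated": cve_counts.most_common(3),
    }

def cve_id(key):
    year, number = key
    return f"CVE-{year}-{number:04d}"

if __name__ == "__main__":
    stats = summarize(SAMPLE_RULES)
    total = stats["total"]
    print(f"total rules: {total}")
    print(f"rev 1: {stats['rev1']} ({100 * stats['rev1'] / total:.1f}%)")
    print(f"pcre: {stats['pcre']} ({100 * stats['pcre'] / total:.1f}%)")
    print(f"src_ip: {stats['src_ip']}")
    print(f"CVE mentions: {stats['cve_mentions']} across {stats['unique_cves']} unique CVEs")
    if stats["earliest"]:
        print(f"earliest: {cve_id(stats['earliest'])}, latest: {cve_id(stats['latest'])}")
    for key, count in stats["most_duplicated"]:
        print(f"{cve_id(key)} appears in {count} rules")
```

Run against a full rules file, the most_duplicated list becomes a starting patch queue for step (1) of the CVE call-out.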
Attack Surface Metrics: Where the Fire Is Hottest
Top Services Targeted:
HTTP/HTTPS: Constant probing for vulnerable CMS, outdated plugins, and known exploits.
DNS: 14,000+ signatures tied to tunneling, exfiltration, and command-and-control.
SMB (445/139): Legacy file shares remain ransomware’s favorite launchpad.
RDP (3389): Still a goldmine for brute force and credential stuffing attacks.
Mobile Services (push, sync, device APIs): Attacks on app sync services, push notification endpoints, and device management APIs.
Top Ports Seen in Signatures:
445/139: SMB-based ransomware deployment paths.
3389: RDP brute forcing and abuse.
443: TLS-encrypted traffic hides malicious payloads from weak inspection.
80: That dusty web app you forgot about? Attackers didn’t.
53: DNS exfiltration and tunneling are evergreen.
5228/5223: Mobile push notification and app sync services, exploited for covert comms.
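Port counts like these come from the rule headers. The following is a minimal Python sketch, assuming hypothetical sample rules and counting only literal ports. Variables such as $HTTP_PORTS, ranges, negations, and any are skipped. One caveat applies: app-layer rules such as alert http or alert dns often leave ports as any because Suricata identifies the protocol itself, so a literal-port tally understates those services.

```python
from collections import Counter

# Hypothetical sample rules; header layout is
# action proto src_ip src_port direction dst_ip dst_port (options)
SAMPLE_RULES = [
    'alert smb any any -> $HOME_NET [139,445] (msg:"ET EXAMPLE SMB"; sid:1; rev:1;)',
    'alert tcp $EXTERNAL_NET any -> $HOME_NET 3389 (msg:"ET EXAMPLE RDP"; sid:2; rev:1;)',
    'alert tcp $HOME_NET any -> $EXTERNAL_NET 5228 (msg:"ET EXAMPLE Push"; sid:3; rev:1;)',
    'alert tcp any 445 -> any [445,!139,1024:] (msg:"ET EXAMPLE Mixed"; sid:4; rev:1;)',
    'alert http any any -> any $HTTP_PORTS (msg:"ET EXAMPLE HTTP"; sid:5; rev:1;)',
]

def literal_ports(field: str) -> set[int]:
    """Literal, non-negated ports from a field like 445 or [139,445];
    'any', variables, negations (!139), and ranges (1024:) are skipped."""
    return {int(tok) for tok in field.strip("[]").split(",") if tok.strip().isdigit()}

def port_tally(rules):
    counts = Counter()
    for rule in rules:
        header = rule.partition("(")[0].split()
        src_port, dst_port = header[3], header[6]
        # Count each port at most once per rule, whether it is the source or destination.
        counts.update(literal_ports(src_port) | literal_ports(dst_port))
    return counts

for port, n in port_tally(SAMPLE_RULES).most_common():
    print(f"{port}: {n} rules")
```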
Top Destination Identifiers Triggering Signatures:
Concentrated in Tor exit nodes, bulletproof hosting, dynamic residential IP ranges used by botnets, and cloud-hosted command-and-control servers.
The Hidden Value in the msg Field
Every Suricata rule carries a msg: not just a label, but a compact threat-intel sentence. Mining it at scale shows where coverage is concentrated and which attack themes dominate.
Coverage by technology (from msg; a rule can match several categories, so the percentages sum to more than 100%):
HTTP/HTTPS: 24,195 (54.6%)
DNS: 14,082 (31.8%)
TLS/SNI: 7,519 (17.0%)
Databases (SQL/NoSQL): 4,287 (9.7%)
Mobile APIs/Platforms: 2,550 (5.8%)
Long tail: SMB 473 (1.1%), RDP 691 (1.6%), Email (IMAP/POP3/SMTP) 166 (0.4%), Cloud 782 (1.8%), FTP 294 (0.7%), VPN 95 (0.2%), SSH 73 (0.2%), VNC 27 (0.1%).
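Tallies like these come from keyword-tagging each msg. A single message can name several technologies, so tags are non-exclusive. That is why HTTP/HTTPS, DNS, and TLS/SNI alone already add up to more than 100%. The keyword patterns in this sketch are hypothetical, since the exact ones behind these numbers aren't published. They use whole-word regexes for a reason: a naive substring test for "rdp" also matches "WordPress".

```python
import re
from collections import Counter

MSG_RE = re.compile(r'msg\s*:\s*"((?:[^"\\]|\\.)*)"')  # handles escaped quotes

# Hypothetical keyword patterns per technology; tune these to your own taxonomy.
TECH_PATTERNS = {
    "HTTP/HTTPS": re.compile(r"\b(?:https?|web_\w+)\b", re.IGNORECASE),
    "DNS": re.compile(r"\bdns\b", re.IGNORECASE),
    "TLS/SNI": re.compile(r"\b(?:tls|ssl|sni|ja3s?)\b", re.IGNORECASE),
    "SMB": re.compile(r"\bsmb\w*", re.IGNORECASE),
    "RDP": re.compile(r"\brdp\b", re.IGNORECASE),
}

def extract_msg(rule: str) -> str:
    match = MSG_RE.search(rule)
    return match.group(1) if match else ""

def tag_technologies(msg: str) -> set[str]:
    return {tech for tech, pattern in TECH_PATTERNS.items() if pattern.search(msg)}

def technology_coverage(rules: list[str]) -> dict[str, tuple[int, float]]:
    counts = Counter()
    for rule in rules:
        counts.update(tag_technologies(extract_msg(rule)))  # a rule may get several tags
    return {tech: (n, round(100 * n / len(rules), 1)) for tech, n in counts.most_common()}

SAMPLE_RULES = [  # hypothetical
    'alert http any any -> any any (msg:"ET WEB_SPECIFIC_APPS WordPress Plugin SQL Injection Attempt"; sid:1; rev:1;)',
    'alert dns any any -> any any (msg:"ET MALWARE Example CnC Domain in DNS Lookup"; sid:2; rev:1;)',
    'alert tls any any -> any any (msg:"ET MALWARE Example TLS SNI Observed"; sid:3; rev:1;)',
    'alert tcp any any -> any 3389 (msg:"ET SCAN Possible RDP Brute Force"; sid:4; rev:1;)',
]

for tech, (n, pct) in technology_coverage(SAMPLE_RULES).items():
    print(f"{tech}: {n} ({pct}%)")
```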
Attack style in msg:
C2 / beaconing: 1,092 (2.5%)
Web exploitation (SQLi/XSS/LFI/RCE/etc.): 5,652 (12.8%)
Overflow: 677 (1.5%)
Tunneling / exfiltration: 571 (1.3%)
Scanner / recon: 476 (1.1%)
Webshells: 336 (0.8%)
Brute force: 47 (0.1%)
Family/tool concentration (named in msg):
Sliver: 433 (1.0%), Predator (Android): 291 (0.7%), Cobalt Strike: 279 (0.6%), Magecart: 215 (0.5%), IcedID: 179 (0.4%), Gh0st: 125 (0.3%), Phorpiex: 78 (0.2%), TrickBot: 36 (0.1%), PlugX: 23 (0.05%), QakBot: 6 (<0.1%).
Feed/source tags observed in msg: mostly ET (~97%), with smaller slices from GPL, Community, and Cyber Threat Intelligence | Threatview.io. (Messages can carry multiple tags, so counts aren’t exclusive.)
Where efforts are visibly concentrated:
The big three surfaces are web (HTTP/HTTPS), name resolution (DNS), and encrypted traffic context (TLS/SNI), which together appear in roughly 70% or more of all messages.
Command and control still shows up, but as an explicit msg label it is a much smaller fraction (~2.5%) than assumed in prior datasets; many beaconing detections are blended into the web/DNS/TLS categories instead.
Named families like Sliver/Cobalt Strike/Magecart/IcedID indicate sustained detection work against operator frameworks and crimeware, not just CVE-driven exploits.
Examples of high-frequency msg prefixes (signals of focus):
ET ATTACK_RESPONSE Havoc/Sliver Framework TLS Certificate Observed: 429 rules
ET MOBILE_MALWARE Android Spy PREDATOR CnC Domain in DNS Lookup: 226 rules
ET JA3 Hash: 102 rules
ET MALWARE Phorpiex CnC Domain in DNS Query: 62 rules
ET WEB_SPECIFIC_APPS … SQL Injection Attempt: dozens per app family
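Grouping by prefix is what turns thousands of near-identical messages into counts like these. This Python sketch shows one plausible heuristic, not the grouping rule used for this dataset, which is not specified. It assumes the variable part of a message (such as a domain or a family name) sits in a trailing parenthetical or after a " - " separator. The sample messages are hypothetical.

```python
import re
from collections import Counter

# One trailing "(...)" group, e.g. a defanged domain at the end of the message.
TRAILING_DETAIL = re.compile(r"\s*\([^()]*\)\s*$")

def msg_prefix(msg: str) -> str:
    """Heuristic prefix: drop a trailing parenthetical, then any ' - ' suffix."""
    msg = TRAILING_DETAIL.sub("", msg)
    return msg.split(" - ", 1)[0]

messages = [  # hypothetical examples shaped like common ET messages
    "ET MOBILE_MALWARE Android Spy PREDATOR CnC Domain in DNS Lookup (example-a .test)",
    "ET MOBILE_MALWARE Android Spy PREDATOR CnC Domain in DNS Lookup (example-b .test)",
    "ET JA3 Hash - [Abuse.ch] Possible Example Family A",
    "ET JA3 Hash - [Abuse.ch] Possible Example Family B",
    "ET MALWARE Phorpiex CnC Domain in DNS Query (example-c .test)",
]

for prefix, n in Counter(msg_prefix(m) for m in messages).most_common():
    print(f"{prefix}: {n} rules")
```

The heuristic cuts both ways. It merges messages that differ only in their suffix, which is the goal, but it can over-merge when a meaningful detail sits after the separator. Spot-check the biggest groups before acting on them.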
Takeaway: The msg field is a taxonomy waiting to be mined. It shows that coverage is heaviest on web/DNS/TLS and C2 behaviors, with sustained attention on operator frameworks. Use it to steer tuning and hardening toward where attackers actually operate, rather than treating rules as a flat, noisy list.
Cloud Metrics: Attacker Use of Major CSPs
Focusing only on major cloud service providers (CSPs) avoids noise from general SaaS or unrelated mentions. In this dataset, ~2,031 rules (~4.6%) directly reference Microsoft/Azure, AWS, or Google Cloud services in the msg field.
Breakdown by provider (non-exclusive):
Microsoft / Azure (Office365, Graph API, SharePoint, OneDrive, Azure blobs, etc.): ~1,154 rules (~56.8% of CSP-related)
AWS (S3, CloudFront, API Gateway, EC2, IAM, etc.): ~89 rules (~4.4%)
Google Cloud (Google APIs, Drive, Docs, Appspot, Firebase, etc.): ~540 rules (~26.6%)
Why SOCs Are Drowning
It’s brutally difficult for defenders to get IDS right, and attackers know it. They only need to change a byte or tweak a payload to slide past rules, while we wrestle with brittle signatures and noisy alerts. The data shows why:
Context is missing. Only ~2% of signatures pin src_ip to literal IPv4 addresses. About 93% rely on macros like $HOME_NET/$EXTERNAL_NET, and another ~5% use a bare any. That means broad matching with little precision.
Regex rules are heavy but not precise. They chew CPU cycles, trigger floods of false positives, and still miss clever variations.
Overlapping signatures pile up. Dozens of rules catch the same family of attacks with slight variations.
Ancient CVEs still generate alerts. Proof that attackers don’t need zero-days when defenders fail to patch.
Revisions stall. Nearly 40% of rules remain at Rev 1, never tuned or improved after the first draft.
For SOC analysts, this means IDS often feels like background noise: drowning in alerts and chasing ghosts while attackers slip past with small changes.
Action Plan: Hardening Over Chasing
Signatures are useful, but by the time one fires, the attacker is already knocking. The real value comes from studying them and using those lessons to harden ahead of time.
Patch Beyond the Headlines. We saw CVEs from as far back as 2006 still in play. That means attackers still find unpatched systems. Vulnerability remediation remains the #1 control.
Kill Service-Port Weak Links. Thousands of signatures target SMB (445/139), RDP (3389), and legacy web apps. Blocking or segmenting these services erases entire clusters of rules.
Monitor High-Risk Combos. DNS tunneling + TLS beaconing show up repeatedly. Enforce DNS egress policies, monitor JA3/JA4 TLS fingerprints, and baseline outbound connections.
Harden Before They Knock. Don’t wait for an IDS alert to tell you RDP is open to the world. Inventory exposed services, shut down what isn’t business critical, and enforce MFA. Closing RDP at the firewall alone wipes out hundreds of signatures from this dataset.
Reassess Rule Quality. Regex analysis showed 99% of PCREs were blind pattern matches. Invest in higher fidelity detections and push vendors to do the same.
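Estimating how many rules a hardening step makes irrelevant is straightforward to sketch. This Python example ties a rule to a service if its protocol keyword, a literal destination port, or a whole-word msg keyword matches. The service map and sample rules are hypothetical. The estimate is rough in both directions: for example, a rule about outbound RDP may still matter after you close inbound 3389.

```python
import re

# Hypothetical evidence linking a rule to a service you plan to close or segment.
SERVICES = {
    "RDP": {"protos": {"rdp"}, "ports": {"3389"}, "keyword": re.compile(r"\brdp\b", re.IGNORECASE)},
    "SMB": {"protos": {"smb"}, "ports": {"139", "445"}, "keyword": re.compile(r"\bsmb\w*", re.IGNORECASE)},
}
MSG_RE = re.compile(r'msg\s*:\s*"([^"]*)"')

def literal_ports(field: str) -> set[str]:
    """Literal, non-negated ports in a field like 3389 or [139,445]."""
    return {tok.strip() for tok in field.strip("[]").split(",") if tok.strip().isdigit()}

def rules_tied_to(rules: list[str], closed: list[str]) -> dict[str, list[str]]:
    hits = {name: [] for name in closed}
    for rule in rules:
        header, _, options = rule.partition("(")
        # Assumes list fields contain no spaces, so the header splits into 7 tokens.
        _action, proto, _src, _sport, _direction, _dst, dport = header.split()
        match = MSG_RE.search(options)
        msg = match.group(1) if match else ""
        for name in closed:
            svc = SERVICES[name]
            if (proto in svc["protos"]
                    or literal_ports(dport) & svc["ports"]
                    or svc["keyword"].search(msg)):
                hits[name].append(rule)
    return hits

SAMPLE_RULES = [  # hypothetical
    'alert tcp $EXTERNAL_NET any -> $HOME_NET 3389 (msg:"ET SCAN Example RDP Scan"; sid:1; rev:1;)',
    'alert rdp any any -> any any (msg:"ET INFO Example Remote Desktop Session"; sid:2; rev:1;)',
    'alert smb any any -> $HOME_NET [139,445] (msg:"ET EXPLOIT Example SMB Exploit"; sid:3; rev:1;)',
    'alert http any any -> any any (msg:"ET WEB_SPECIFIC_APPS Example WordPress Plugin RCE"; sid:4; rev:1;)',
]

for name, matched in rules_tied_to(SAMPLE_RULES, ["RDP", "SMB"]).items():
    print(f"Closing {name}: {len(matched)} of {len(SAMPLE_RULES)} rules lose relevance")
```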
Closing the Loop
After parsing through 44,305 signatures, one truth stands out: signatures are not the endgame, they are the breadcrumbs. Every rule is a story of an attack that already worked somewhere else. Treating IDS as a checkbox or drowning in its noise means we miss the real opportunity: to mine those stories and strengthen defenses before the next wave hits.
Hardening isn’t glamorous, but it is decisive. Close RDP and you erase hundreds of signatures from relevance. Patch a single CMS plugin and whole families of SQL injection rules disappear. Shut down legacy SMB and ransomware loses a highway. These aren’t theoretical wins; they’re visible in the dataset itself.
If we learn anything from this research, it’s that chasing signatures keeps us reactive. Studying them, tuning them, and most importantly acting on their lessons moves us from reaction to resilience. IDS will always be valuable, but only if we use it as a map of where we’ve been, not a crutch for where we’re going.