April 24, 2026

7 Signs Your S3 May Be Slowing Down Your SOC

Search debt is real, widely documented, and quietly reshaping security operations in ways that never get expensed. Or rather, in ways that get expensed across hidden invoices: the real cost of running multi-year searches, the investigations your team decided not to run because the query would have taken too long to matter, the answers that weren't chased, the pivots that got sent to query purgatory.

A real investigation is a relay race of pivots. Alert fires, analyst asks a question, gets an answer, the answer shapes the next question. Good investigations chase five to ten hops. Great ones keep pivoting until every asset is accounted for. The speed of those hops governs whether the investigation finds the full blast radius or closes with half the story.

The SOC hunting ground has a typical setup. Your SIEM (Splunk, Elastic, Datadog Cloud SIEM) holds two to four weeks of essential logs, hot and fast. Everything else lives in S3. Two kinds of data usually end up there: the high-volume sources that would bankrupt the SIEM if you indexed them (DNS, WAF, VPC Flow, cloud audit), and the historical data that ages out of the hot window once those few weeks pass.

When an analyst needs DNS logs from last Thursday, or a pivot from six weeks ago, they drop into Athena. Queries range from several minutes to hours. Often they time out. Most SOCs only pivot at SIEM speed across roughly 10% of their data. The other 90% is cold storage masquerading as visibility. Analysts are trying to think at human speed against infrastructure running at geological speed.
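For concreteness, that cold-tier pivot usually looks something like the sketch below: submit the query through the Athena API, then poll until it finishes or dies. The table, column, workgroup, and results-bucket names (dns_logs, query_name, dt, primary, s3://my-athena-results/) are stand-ins for whatever lives in your Glue catalog, not a reference setup.

```python
import time
import boto3

athena = boto3.client("athena")

# Submit the pivot: DNS logs from last Thursday, one suspicious domain.
resp = athena.start_query_execution(
    QueryString="""
        SELECT src_ip, query_name, count(*) AS hits
        FROM dns_logs
        WHERE dt = '2026-04-16'                               -- last Thursday's partition
          AND query_name LIKE '%suspicious-domain.example%'
        GROUP BY src_ip, query_name
    """,
    WorkGroup="primary",
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = resp["QueryExecutionId"]

# The analyst's wait loop. On a broad hunt this is where the coffee run happens.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(10)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id, MaxResults=1000)
```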

Here are seven signs it may be costing you more than you realize.

1. Your analysts schedule their coffee runs around query execution.

The "fire the query, go to Slack, come back in four minutes" rhythm is the most universal symptom. Splunk users have a phrase for it ("let's get some coffee before this search completes"). On S3, the rhythm is the same but the ceiling is harder.

Firebolt's engineering team documented that Athena times out after 30 minutes, and when it does you get no query results. On a broad security hunt, that means the real latency on a meaningful question is effectively "never." AWS staff on re:Post have acknowledged the same pattern: CloudTrail queries in Athena can take a long time for several reasons, even with WHERE clauses. That is AWS conceding that its own service runs slow for the most common security log source in the cloud.

Every wait cycle breaks working memory. The analyst returns to their terminal having forgotten half the context of why they ran that query in the first place. Multiply by forty queries a shift and you have a team that never enters flow state.

2. Your default lookback window keeps shrinking and LIMIT 1000 appears in 80% of your query history.

The team used to search ninety days. Now it's fourteen. The threat model did not change. Athena started timing out. This is dangerous because it never shows up in your MTTR dashboard, yet it quietly sets your dwell time ceiling. Supply chain compromises take an average of 267 days to detect and contain. Lookback windows under thirty days functionally guarantee you will not find persistence mechanisms or slow burn campaigns. The 90% you already could not search at speed now gets queried across a two-week slice, when it gets queried at all.

The compression repeats at the row level. Open your team's query history and count the LIMIT clauses. Analysts learned that bounded queries return something and unbounded queries just hang, so they cap everything. Sampling is fine for exploration. Sampling an incident is how the post mortem ends with "we did not realize the scope."
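If you want to put a number on it, Athena's own query history is enough. A rough sketch, assuming your analysts run through the primary workgroup and that the last 500 queries are a fair sample:

```python
import re
import boto3

athena = boto3.client("athena")

# Pull recent query IDs from the workgroup (the 500-query sample size is arbitrary).
query_ids = []
for page in athena.get_paginator("list_query_executions").paginate(WorkGroup="primary"):
    query_ids.extend(page["QueryExecutionIds"])
    if len(query_ids) >= 500:
        break
query_ids = query_ids[:500]

# Count how many statements are capped with a LIMIT clause.
limited = total = 0
for i in range(0, len(query_ids), 50):  # batch_get_query_execution takes up to 50 IDs
    batch = athena.batch_get_query_execution(QueryExecutionIds=query_ids[i:i + 50])
    for qe in batch["QueryExecutions"]:
        total += 1
        if re.search(r"\bLIMIT\s+\d+\b", qe.get("Query", ""), re.IGNORECASE):
            limited += 1

print(f"{limited}/{total} recent queries are capped with LIMIT")
```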

The flip side shows up at audit time. SOC 2 wants a year of retention. PCI DSS wants twelve months with three months immediately available. HIPAA requires six years. FedRAMP requires three. The auditor asks a follow-up question that joins thirteen months of CloudTrail against a user directory, and Athena begins its journey toward the heat death of the universe. Teams cope by pre-running quarterly reports and stockpiling canned answers, until the auditor asks something new.
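For a sense of what that follow-up looks like, here is a sketch. It assumes a cloudtrail_logs table partitioned by a dt column and a user_directory table exported from your identity provider; both names are hypothetical, and the pain is the scan range, not the SQL.

```python
# Rough shape of the auditor's follow-up question as an Athena query.
audit_query = """
SELECT u.department,
       ct.useridentity.arn AS actor,
       count(*)            AS events
FROM cloudtrail_logs ct
JOIN user_directory u
  ON ct.useridentity.arn = u.arn
WHERE ct.dt BETWEEN '2025-03-01' AND '2026-04-01'   -- thirteen months of partitions
GROUP BY u.department, ct.useridentity.arn
"""
```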

The same query engine is now writing your retention policy and your compliance narrative.

3. Investigations die at hop 3 or 4.

This is the most consequential sign. The relay race stops. Every investigation hits a wall at the same place. Analyst follows the IP, finds the user, finds the session, then stops. There is more to find. Each additional pivot costs five more minutes of wall time plus the cognitive reload of re-establishing context. By hop five the thread is gone. The investigation closes prematurely. Scope gets called at five affected systems when the real answer was fifty. Remediation misses half the footholds. Three weeks later, the same adversary returns through a dormant persistence mechanism nobody had time to find.

4. You cannot tell if your query engine is broken.

Athena reports per-query execution time, but the data often sits there ungathered. Most SOCs do not aggregate it, do not fingerprint queries to compare today's runs against last quarter's, and do not alert when a regression shows up against a baseline. The metric exists. The discipline does not.

The lived experience is universal: a query that ran in seconds last week takes minutes today, and there's no way to prove it. Athena performance varies with partition layout, file size distribution, and whatever else AWS is doing in the background that day.
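The fix is not exotic. Everything needed for a baseline is already in the Athena API; what's missing is someone collecting it. A minimal sketch, assuming queries run through the primary workgroup and that a literal-stripped fingerprint is good enough to group repeated runs of the same query:

```python
import re
import statistics
from datetime import datetime, timedelta, timezone

import boto3

athena = boto3.client("athena")

def fingerprint(sql: str) -> str:
    """Collapse literals so repeated runs of the 'same' query group together."""
    sql = re.sub(r"'[^']*'", "'?'", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)     # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()[:200]

# Grab the last ~1000 query executions from the workgroup.
ids = []
for page in athena.get_paginator("list_query_executions").paginate(WorkGroup="primary"):
    ids.extend(page["QueryExecutionIds"])
    if len(ids) >= 1000:
        break

# Bucket engine runtimes by fingerprint into "this week" vs "last week".
now = datetime.now(timezone.utc)
windows = {"this_week": {}, "last_week": {}}
for i in range(0, len(ids), 50):
    for qe in athena.batch_get_query_execution(QueryExecutionIds=ids[i:i + 50])["QueryExecutions"]:
        age = now - qe["Status"]["SubmissionDateTime"]
        bucket = ("this_week" if age <= timedelta(days=7)
                  else "last_week" if age <= timedelta(days=14) else None)
        millis = qe.get("Statistics", {}).get("EngineExecutionTimeInMillis")
        if bucket and millis is not None:
            windows[bucket].setdefault(fingerprint(qe["Query"]), []).append(millis)

# Flag fingerprints whose median runtime regressed against last week's baseline.
for fp, recent in windows["this_week"].items():
    baseline = windows["last_week"].get(fp)
    if baseline and statistics.median(recent) > 1.5 * statistics.median(baseline):
        print(f"regression: {fp}")
```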

5. One person writes all the fast queries.

The partition layout whisperer. They know which predicates hit partition keys, which columns have the right Parquet layout, which time ranges won't blow past the timeout. When they take PTO, investigation quality measurably drops. This is the most underrated operational risk in an S3 plus Athena SOC, and it doubles as a career development crisis. Juniors cannot develop investigative craft when their iteration loop is five minutes wide. The SANS 2025 SOC Survey found that 70% of SOC analysts with five or fewer years of experience leave within three years. Tooling friction is usually hiding somewhere in their exit interviews.
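Written down, the whisperer's knowledge is not magic. A sketch of the difference, assuming a CloudTrail table partitioned by a dt string column (the names are placeholders, not a prescription):

```python
# What the whisperer knows: the first query prunes partitions, the second makes
# Athena scan every partition because the filter never touches the partition key.
fast_query = """
SELECT useridentity.arn, eventname, count(*) AS events
FROM cloudtrail_logs
WHERE dt BETWEEN '2026-04-10' AND '2026-04-17'        -- hits the dt partition key
  AND eventsource = 'iam.amazonaws.com'
GROUP BY useridentity.arn, eventname
"""

slow_query = """
SELECT useridentity.arn, eventname, count(*) AS events
FROM cloudtrail_logs
WHERE date_parse(eventtime, '%Y-%m-%dT%H:%i:%sZ')     -- partition key never used,
      > current_timestamp - INTERVAL '7' DAY          -- so every partition gets scanned
GROUP BY useridentity.arn, eventname
"""
```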

6. Your S3 savings quietly became 3 invoices.

These are the hidden invoices from the opening, itemized. You pay SIEM prices for recent data. You pay S3 prices for the archive. You pay Athena scan fees every time an investigation crosses the tier boundary, which is every investigation worth running. Ask the person who signs those invoices how much your "S3 cost savings" actually saved once the rest of the stack is accounted for. The math usually flips.

An entire cottage industry of "5X faster Athena" blog posts exists because the baseline is unlivable. A cottage industry of workaround ebooks is itself a symptom of the problem, dressed up as a resource.

Then there is the rehydration tax. A Splunk community user calls thawing archived S3 data back into searchable form "a real PITA process". Observe documents the same: rehydration "can take hours and consume compute and storage resources." Every cross-tier investigation pays this tax before it even starts.

7. Your threat hunts lost their joy and became compliance theatre.

The hunt is on the calendar. The template gets filled. The queries come from last quarter's hunt library because writing new hypotheses across longer term cold data takes too long. Hunting requires fast pivoting against broad data, especially the 90% nobody else is looking in. Without that, hunting becomes a reporting exercise.

On paper your SOC is hunting. In practice it is going through the motions, and the spark is lost. The hunt is why people get into this work. The people you hired for their pattern recognition are doing data entry, and the ones who still have the hunch are the ones who leave first for a proactive, forward-thinking startup that values the game.

What all this adds up to

Business cost. Slow search shows up as unproductive hours, extended MTTR, and a list of investigations that quietly close early. The exact number varies across every environment. If your MTTR is trending the wrong way quarter over quarter, the query engine is a live hypothesis worth testing.

Operational degradation. Detections ship to production without backtesting because validating against ninety days of S3 data takes overnight. False positive rates already exceed 50% in most enterprise SOCs. Forensic reports arrive late to Legal and Exec. Dashboards time out. Trust in the security team's data may slowly erode one board meeting at a time.

Human cost. 71% of SOC analysts report burnout, and 83% of IT security professionals admit that burnout has led to errors resulting in security breaches. Staring at loading spinners for four minutes, forty times a day, is slow motion burnout that shows up in exit interviews as "tooling frustration." Juniors plateau because feedback loops are too wide for the reps they need. The query whisperer eventually burns out from carrying the practice alone. Morale drops in Q2. Attrition hits the board deck in Q4.

Detection engineering tax. In an ideal world, new detections ship on a weekly cadence. The ceiling isn't always analyst capacity. Every new rule competes for hot tier compute or Athena scan budget, so detection velocity gets capped at the rate the infrastructure can absorb rather than the rate the threat landscape demands. Your detection backlog ends up shaped by the budget instead of the MITRE ATT&CK matrix.

Benchmarking gap. Application teams have monitored production latency for fifteen years. SOC teams have not extended that practice to their own search engine. Most teams cannot answer "is Athena slower this week than last week?" with data, because nobody is collecting it. Performance regressions land quietly, and the only signal is an analyst noticing that an investigation feels stuck.


Search speed is a security control

Scanner indexes your logs directly in S3 and returns full text search results across petabytes in seconds. You get the full visibility most SOCs have been trading away for budget, at the S3 economics that made the tradeoff necessary in the first place, with the search speed that makes the data actually usable.

The 90% becomes searchable. The three invoices collapse back into one. Every pivot returns before your analyst loses the thread. Every hunt tests real hypotheses against real data. Lookback windows open back up to a full year by default. The shadow stack comes down. Your analysts reach flow state. The investigation runs to its actual conclusion. And the best hunches do what they were always supposed to do: become detections.

Cliff Crosland
CEO, Co-founder
Scanner, Inc.
Cliff is the CEO and co-founder of Scanner.dev, which provides fast search and threat detections for log data in S3. Prior to founding Scanner, he was a Principal Engineer at Cisco where he led the backend infrastructure team for the Webex People Graph. He was also the engineering lead for the data platform team at Accompany before its acquisition by Cisco. He has a love-hate relationship with Rust, but it's mostly love these days.