The Low-Flow Showerhead for AI
How SASI Could Make Artificial Intelligence Sustainable
Every time you ask an AI a question, somewhere a data center burns electricity. A lot of it.
AI inference, the process of an LLM generating a response, is one of the most energy-intensive computing workloads that exist at scale. Data centers running large language models consume electricity at rates that rival small cities. And because those data centers generate enormous heat, they cool themselves with something equally precious: fresh water. Estimates suggest a single conversation with a large language model can consume the equivalent of a small bottle of water in cooling alone, on top of the electricity cost. Multiply that by billions of daily queries, and AI's environmental footprint stops looking like a rounding error and starts looking like a policy problem.
Here's what surprised me: while building SASI, a clinical-grade AI safety middleware, I realized the architecture we designed for human safety might also have a meaningful impact on both electricity consumption and water use. This post is my honest attempt to think through what that could mean, where the claims hold up, and where they need more work.
"The architecture we designed for human safety might also have a meaningful impact on resource efficiency."
1. Symbolic Logic Is Cheap. Neural Networks Are Not.
SASI is built on a neuro-symbolic architecture. At its core, this means a lightweight, rule-based symbolic layer governs when and how the heavy neural AI gets involved. Think of it as a bouncer at the door of an expensive nightclub: most requests are triaged quickly and cheaply before the full compute machinery ever spins up.
Symbolic logic (rules, conditions, logical checks) requires orders of magnitude less compute than a large neural network inference pass. Early research in neuro-symbolic AI suggests this kind of hybrid approach can meaningfully reduce the energy cost of AI safety evaluation specifically: not the underlying model, but the governance overhead on top of it.
Important caveat: SASI wraps the LLM; it doesn't replace it. The neural model still runs. But if the safety evaluation that used to require a secondary AI model can now be handled by a symbolic ruleset, that's a real and measurable efficiency gain. We're not claiming improvements across the whole system, but on the governance layer specifically, the numbers look compelling.
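To make the bouncer metaphor concrete, here is a minimal sketch of symbolic triage in Python. The rule patterns and tier names are invented for illustration (SASI's actual ruleset is not public), but the shape is the point: a few regex checks cost microseconds of CPU time, while a neural inference pass costs real energy.

```python
import re

# Hypothetical rules for illustration only; a real ruleset is far richer.
BLOCK_PATTERNS = [r"\bdosage\s+above\b", r"\bself-harm\s+method\b"]
FAST_PATH_PATTERNS = [r"^\s*(hi|hello|thanks)\b"]

def symbolic_triage(prompt: str) -> str:
    """Cheap rule checks decide whether the expensive LLM runs at all."""
    lowered = prompt.lower()
    if any(re.search(p, lowered) for p in BLOCK_PATTERNS):
        return "block"          # refuse without any neural inference
    if any(re.search(p, lowered) for p in FAST_PATH_PATTERNS):
        return "canned_reply"   # answer from a template, zero GPU time
    return "forward_to_llm"     # only now does the heavy model spin up
```

Only the third outcome ever touches the neural model; the first two are resolved entirely in the symbolic layer.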
2. Fewer Wrong Answers Means Less Electricity, and Less Water
One of the most overlooked inefficiencies in AI usage is the re-roll: the frustrated user who asks the same question three times because the first two answers were wrong or confusing.
Each retry is a fresh inference call. Fresh inference means fresh electricity drawn from the grid, fresh heat generated in the data center, fresh water consumed in cooling. Every redundant query is waste, paid for in kilowatt-hours and liters.
SASI's core value proposition (ensuring that AI responses are safe, aligned, and clinically appropriate before they reach the user) is also, indirectly, an efficiency proposition. If we can dramatically improve first-time accuracy, we reduce the number of retries. The math is simple: 30–50% fewer redundant queries means 30–50% less electricity and water wasted on the same task.
This is speculative in terms of exact numbers. But the directional logic is sound, and it's the kind of claim that deserves real measurement once SASI is deployed at scale.
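A back-of-envelope sketch of the retry argument. The per-query energy figure and the litres-per-kWh ratio below are published ballpark numbers (the latter is Google's reported WUE), not SASI measurements, and the retry rates are illustrative:

```python
def wasted_resources(daily_queries: int, retry_rate: float,
                     kwh_per_query: float = 0.01,
                     litres_per_kwh: float = 1.15):
    """Estimate daily waste from redundant retries (ballpark figures)."""
    retries = daily_queries * retry_rate
    kwh = retries * kwh_per_query
    litres = kwh * litres_per_kwh
    return retries, kwh, litres

# 1M queries/day: 30% retry rate vs 10% after better first answers
before = wasted_resources(1_000_000, 0.30)
after = wasted_resources(1_000_000, 0.10)
print(f"daily kWh saved: {before[1] - after[1]:.0f}")     # 2000
print(f"daily litres saved: {before[2] - after[2]:.0f}")  # 2300
```

The point is not the exact figures but that retry rate multiplies directly into both the electricity and the water columns.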
3. Edge Governance: Move the Safety Check Closer to the User
Right now, most AI safety evaluation happens in massive cloud data centers. Every request goes out, gets evaluated, and comes back, all in the cloud, all in facilities that require industrial-scale cooling.
Because SASI is a deterministic middleware SDK, not a giant neural model, it's small enough to run at the edge. On a laptop. On a phone. On a hospital workstation.
This is the most concrete efficiency argument in SASI's favor. If the safety audit happens locally, before the request ever hits the cloud, you offload real governance compute from centralized infrastructure. For enterprise deployments (think hospitals, schools, therapy platforms), this could represent a meaningful reduction in cloud dependency for compliance and safety overhead.
4. The Idea I'm Most Excited About: Resource-Aware Governance
Here's where we go from efficiency side effect to deliberate design choice.
What if SASI could tell you, before you hit send, that a particular query is computationally expensive? What if the middleware could say: "This request will consume significant resources. Would you like to route it to a smaller, more efficient model instead?"
This would turn AI safety infrastructure into a genuine sustainability tool. Not as a constraint (nobody wants to be told they can't ask questions), but as informed choice. The same way a smart thermostat doesn't stop you from cranking the heat but tells you what it costs.
We're calling this concept Resource-Aware Governance, and it's on our roadmap. It extends SASI's existing framework, where safety is defined multidimensionally, to include resource safety as a genuine governance dimension.
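A minimal sketch of what Resource-Aware Governance could look like in practice. Since this feature is still on the roadmap, the token budget, model names, and chars-to-tokens heuristic below are all hypothetical placeholders:

```python
# Hypothetical model names and threshold; actual policy is on the roadmap.
SMALL_MODEL = "small-efficient-model"
LARGE_MODEL = "large-model"

def route_by_cost(prompt: str, token_budget: int = 500) -> dict:
    """Estimate request cost up front and suggest a cheaper route."""
    est_tokens = max(1, len(prompt) // 4)  # rough chars-to-tokens heuristic
    if est_tokens > token_budget:
        return {
            "model": SMALL_MODEL,
            "notice": (f"~{est_tokens} tokens: this request is expensive. "
                       "Routed to a smaller model; override to use the "
                       "large model?"),
        }
    return {"model": LARGE_MODEL, "notice": None}
```

The key design choice is that the user is informed and can override, rather than being blocked, which matches the smart-thermostat framing above.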
"What if AI safety infrastructure could also be a sustainability tool?"
5. The Simplest Argument: Fewer Tokens In, Less Electricity and Water Out
Here's the efficiency story that doesn't require any research papers to defend.
Every high-risk AI application (a mental health chatbot, an AI tutor, a clinical decision support tool) needs safety guardrails. Today, developers build those guardrails into system prompts. A typical safety system prompt for a high-stakes application runs 1,500 to 2,000 tokens or more. That prompt gets sent to the LLM on every single API call, every single time a user sends a message.
SASI moves those guardrails out of the prompt and into the infrastructure layer. The symbolic middleware handles governance before the request hits the LLM. The prompt gets shorter. The LLM does less work per call. Less work means less electricity consumed, and less electricity means less heat generated, less cooling required, and less water used.
That's a real, measurable reduction in compute, and compute is what drives both your electricity bill and your water footprint.
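The token math can be sketched in a few lines. The energy-per-prefill-token figure below is an illustrative assumption chosen to be consistent with the per-call estimates in the next section, not a measured benchmark:

```python
def prefill_savings(prompt_tokens_removed: int, calls_per_day: int,
                    kwh_per_1k_prefill_tokens: float = 0.0005):
    """Energy no longer spent re-reading a safety rulebook on every call.

    kwh_per_1k_prefill_tokens is an illustrative assumption.
    """
    kwh_per_call = prompt_tokens_removed / 1000 * kwh_per_1k_prefill_tokens
    return kwh_per_call * calls_per_day

# Removing a 2,000-token safety prompt across 100,000 calls/day:
print(f"{prefill_savings(2000, 100_000):.0f} kWh/day")  # prints "100 kWh/day"
```

Under these assumptions, each call saves 0.001 kWh, which sits at the low end of the per-call range quoted below.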
The Numbers, Honestly Presented
Research from UC Riverside, published in Communications of the ACM, puts GPT-3-scale inference water consumption at roughly 500ml per 10–50 queries. Google's own infrastructure data puts their Water Usage Effectiveness at 1.15 liters of water per kWh of electricity. A medium-sized AI request consumes approximately 0.004–0.016 kWh of server energy depending on model size.
A 2,000-token safety system prompt represents conservatively 20–35% of a typical request's prefill compute cost. Eliminating it saves roughly 0.001–0.004 kWh of electricity per call, and 3–8ml of water. Small per call, but the electricity savings translate directly to cost savings for developers, and the water savings compound at scale:
- 1 app, 10,000 daily users: saves ~4,000 kWh of electricity and ~15,000–30,000 liters of water per year
- 100 apps on SASI: ~400,000 kWh and 1.5–3 million liters per year
- 1,000 apps on SASI: ~4 million kWh and 15 million liters per year, enough electricity to power 370 homes annually
To make the water number human: 1.5 million liters is 20 residential swimming pools. To make the electricity number human: 400,000 kWh is what roughly 37 average US homes use in an entire year. Both figures are byproducts of simply building AI safety the right way.
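The projections above can be reproduced from the per-call figures. This sketch assumes one query per user per day and uses the 4–8ml end of the water range; both are modeling assumptions, not telemetry:

```python
# Per-call savings from the estimates above (assumptions, not telemetry).
KWH_SAVED_PER_CALL = 0.001        # low end of the 0.001-0.004 kWh range
ML_WATER_PER_CALL = (4, 8)        # upper end of the 3-8 ml range
CALLS_PER_USER_PER_DAY = 1        # simplifying assumption

def yearly_savings(apps: int, daily_users_per_app: int = 10_000):
    """Scale per-call savings to yearly totals across a fleet of apps."""
    calls = apps * daily_users_per_app * CALLS_PER_USER_PER_DAY * 365
    kwh = calls * KWH_SAVED_PER_CALL
    litres = (calls * ML_WATER_PER_CALL[0] / 1000,
              calls * ML_WATER_PER_CALL[1] / 1000)
    return kwh, litres

kwh, (lo, hi) = yearly_savings(1)
print(f"1 app: ~{kwh:,.0f} kWh, {lo:,.0f}-{hi:,.0f} L per year")
# 1 app: ~3,650 kWh, 14,600-29,200 L per year
```

Rounding those outputs recovers the headline figures: roughly 4,000 kWh and 15,000–30,000 liters per app per year, scaling linearly with the number of apps.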
More importantly, these savings compound as AI usage grows. The efficiency gain is baked into the architecture; it scales automatically with adoption.
"SASI moves safety out of the prompt and into the infrastructure layer, which is better for users, developers, and the planet."
Where This Is Solid. Where It Needs Work.
I want to be honest about the state of these claims, because I think the AI industry has a credibility problem with overselling environmental commitments.
What's defensible right now:
• Edge governance is real: symbolic middleware can legitimately run locally, saving cloud electricity
• Fewer retries = less compute: the directional logic on electricity and water savings is sound
• Symbolic evaluation overhead is cheaper than neural evaluation overhead: well supported in research
• Shorter prompts = less electricity per call: straightforward token math, no speculation needed
What needs more evidence:
• Exact kWh and liter figures: we need real deployment telemetry, not just published benchmarks
• Hallucination interception: SASI catches policy violations, not all hallucinations
• Infrastructure-level measurement: electricity and water savings vary significantly by data center location and model size
6. The Competitive Reality: Every Other Safety Solution Makes AI Heavier
Here's something the AI safety industry doesn't talk about enough: most conventional safety approaches don't just fail to reduce compute overhead; they actively increase it. Sometimes dramatically.
Understanding this contrast is what makes SASI's efficiency story genuinely compelling rather than just directionally interesting.
Cloud Guardrails (Azure AI Content Safety, AWS Guardrails, etc.)
These services run a second AI model in parallel with your primary LLM. Your request goes to the main model and simultaneously to a separate safety model sitting in Microsoft's or Amazon's infrastructure. You are literally running two models for every query. The safety overhead isn't reduced; it's doubled. Both models drink water. Both consume energy. The user gets one answer.
MCP Servers (Model Context Protocol)
MCP adds external server roundtrips on every tool call. The request leaves your infrastructure, hits an external server, comes back, and the results get injected back into the context window, making it larger and more expensive to process on the next turn. Every MCP integration compounds the compute cost of a conversation over time.
Long Safety System Prompts
The most common approach and the most wasteful. A developer writes a 2,000–6,000 token safety rulebook and pastes it at the top of every conversation. The LLM re-reads the entire rulebook before answering every single message, from every single user, all day long. It's the equivalent of an employee re-reading the entire company policy manual before responding to each email. The compute waste is continuous and invisible.
RAG-Based Safety Layers
Retrieval Augmented Generation safety approaches add a vector database lookup, an embedding computation step, and retrieved safety content injected into context on each call. Each of those steps has its own energy and water cost, stacked on top of the base inference cost.
"Every other safety solution makes AI heavier. SASI makes it lighter."
The pattern is consistent across every conventional approach: safety is implemented as something added on top of the LLM. More layers, more compute, more energy, more water.
SASI's architecture inverts this entirely. The symbolic middleware intercepts before the LLM, handles governance with lightweight logical rules rather than neural compute, and returns a leaner, cleaner request. The LLM does less work, not more.
To put a rough percentage on it: conventional cloud guardrail approaches add 50–100% overhead to the safety evaluation cost. Long system prompts add 20–35% overhead to every single inference call. SASI's symbolic layer costs a fraction of either and can run locally, without a cloud round-trip.
This isn't just an environmental argument. It's a performance argument, a cost argument, and a latency argument all at once. Safer AI that is also faster, cheaper, and less resource-intensive is a genuinely unusual value proposition. It exists because the architecture was designed right from the start.
Why This Matters for ESG-Focused Investors
Environmental, Social, and Governance criteria are increasingly central to how institutional capital evaluates technology companies. AI infrastructure is coming under growing scrutiny for its electricity consumption, its carbon footprint, and the water stress it creates in the regions where data centers are concentrated.
SASI isn't primarily an environmental product. It's a clinical-grade safety layer for AI systems operating in high-stakes domains: mental health, education, healthcare. But the architecture that makes it safe also makes it efficient. Less electricity consumed. Less water used. Less cost per call. That's not a coincidence; it's a consequence of building with symbolic AI principles from the ground up.
The pitch to ESG investors isn't "SASI will save the planet." It's: "The same architectural principles that make AI safer also make it more efficient. As AI scale increases, the electricity and water savings become genuinely significant, and they compound automatically with every new app that adopts SASI as its safety layer."
That's a story worth telling, carefully, with evidence, and without exaggeration.
