What’s new today…

VentureBeat: Transformative tech coverage that matters

  • Google and AWS split the AI agent stack between control and execution
    on April 22, 2026 at 9:37 pm

    The era of enterprises stitching together prompt chains and shadow agents is nearing its end as more options for orchestrating complex multi-agent systems emerge. As organizations move AI agents into production, the question remains: “How will we manage them?”

    Google and Amazon Web Services offer fundamentally different answers, illustrating a split in the AI stack. Google’s approach is to run agentic management in the system layer, while AWS’s harness approach operates in the execution layer. The debate over how to manage and control agents gained new energy this past month as competing companies released or updated their agent builder platforms: Anthropic with the new Claude Managed Agents and OpenAI with enhancements to the Agents SDK, giving developer teams options for managing agents. AWS, with new capabilities added to Bedrock AgentCore, is optimizing for velocity, relying on harnesses to bring agents to production faster while still offering identity and tool management. Meanwhile, Google’s Gemini Enterprise adopts a governance-focused approach using a Kubernetes-style control plane. Each method offers a glimpse into how agents move from short-burst task helpers to longer-running entities within a workflow.

    Upgrades and umbrellas

    To understand where each company stands, here’s what’s actually new. Google released a new version of Gemini Enterprise, bringing its enterprise AI agent offerings, Gemini Enterprise Platform and Gemini Enterprise Application, under one umbrella. The company has rebranded Vertex AI as Gemini Enterprise Platform, though it insists that, aside from the name change and new features, it’s still fundamentally the same interface.

    “We want to provide a platform and a front door for companies to have access to all the AI systems and tools that Google provides,” Maryam Gholami, senior director of product management for Gemini Enterprise, told VentureBeat in an interview. “The way you can think about it is that the Gemini Enterprise Application is built on top of the Gemini Enterprise Agent Platform, and the security and governance tools are all provided for free as part of Gemini Enterprise Application subscription.”

    AWS, on the other hand, added a new managed agent harness to Bedrock AgentCore. The company said in a press release shared with VentureBeat that the harness “replaces upfront build with a config-based starting point powered by Strands Agents, AWS’s open source agent framework.” Users define what the agent does, the model it uses and the tools it calls, and AgentCore does the work of stitching all of that together to run the agent.
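    That config-first pattern can be sketched in a few lines with Strands Agents, the open-source Python framework the harness is built on. The sketch below is illustrative only: the model ID, tool choice and prompts are our assumptions, not AgentCore’s actual harness schema.

        # Rough sketch of a config-style agent definition with Strands Agents.
        # The developer declares the model, tools and instructions; the
        # framework (and, per AWS, the AgentCore harness above it) wires the
        # pieces together and runs the agent loop.
        from strands import Agent
        from strands_tools import calculator  # sample tool from strands-agents-tools

        agent = Agent(
            model="us.anthropic.claude-sonnet-4-20250514-v1:0",  # hypothetical Bedrock model ID
            tools=[calculator],
            system_prompt="You reconcile invoice totals against purchase orders.",
        )

        # Invoking the agent hands the request to the framework's agent loop.
        result = agent("Does invoice INV-1042 match its purchase order total?")
        print(result)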
    Agents are now becoming systems

    The shift toward stateful, long-running autonomous agents has forced a rethink of how AI systems behave. As agents move from short-lived tasks to long-running workflows, a new class of failure is emerging: state drift. As agents continue operating, they accumulate state: memory, tool responses and evolving context. Over time, that state becomes outdated. Data sources change, or tools return conflicting responses, and the agent becomes more vulnerable to inconsistencies and less truthful. Agent reliability becomes a systems problem, and managing that drift may need more than faster execution; it may require visibility and control. It is this failure point that platforms like Gemini Enterprise and AgentCore try to prevent.

    Though this shift is already happening, Gholami acknowledged that customers will dictate how they want to run and control any long-running agent. “We are going to learn a lot from customers where they would be using long-running agents, where they just assign a task to these autonomous agents to just go ahead and do,” Gholami said. “Of course, there are tricks and balances to get right, and the agent may come back and ask for more input.”

    The new AI stack

    What’s becoming increasingly clear is that the AI stack is separating into distinct layers solving different problems. AWS and, to a certain extent, Anthropic and OpenAI optimize for faster deployment. Claude Managed Agents abstracts much of the backend work for standing up an agent, while the Agents SDK now includes support for sandboxes and a ready-made harness. These approaches aim to lower the barrier to getting agents up and running. Google, by contrast, offers a centralized control plane to manage identity, enforce policies and monitor long-running behaviors.

    Enterprises likely need both. As some practitioners see it, businesses have to have a serious conversation about how much risk they are willing to take. “The main takeaway for enterprise technology leaders considering these technologies at the moment may be formulated this way: while the agent harness vs. runtime question is often perceived as build vs. buy, this is primarily a matter of risk management. If you can afford to run your agents through a third-party runtime because they do not affect your revenue streams, that is okay. On the contrary, in the context of more critical processes, the latter option will be the only one to consider from a business perspective,” Rafael Sarim Oezdemir, head of growth at EZContacts, told VentureBeat in an email.

    Iterating quickly lets teams experiment and discover what agents can do, while centralized control adds a layer of trust. What enterprises need is to ensure they are not locked into systems designed purely for a single way of executing agents.
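    To make the state-drift failure mode concrete, here is a toy freshness check of the kind an execution layer or control plane might enforce. It is a minimal sketch under our own assumptions; neither platform documents its mechanism this way.

        # Toy illustration of guarding against state drift: a long-running
        # agent caches tool responses, and entries older than a TTL are
        # refetched from the live source instead of being reasoned over stale.
        import time

        class AgentMemory:
            def __init__(self, ttl_seconds: float):
                self.ttl = ttl_seconds
                self.cache: dict[str, tuple[float, str]] = {}

            def get(self, key: str, refetch) -> str:
                """Return a cached tool response, refreshing it once stale."""
                entry = self.cache.get(key)
                if entry is not None and time.time() - entry[0] < self.ttl:
                    return entry[1]
                fresh = refetch(key)  # hit the live data source again
                self.cache[key] = (time.time(), fresh)
                return fresh

        memory = AgentMemory(ttl_seconds=300)  # tolerate 5 minutes of drift
        price = memory.get("SKU-7731", refetch=lambda k: f"live price for {k}")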

  • Are you paying an AI ‘swarm tax’? Why single agents often beat complex systems
    by Ben Dickson on April 22, 2026 at 9:24 pm

    Enterprise teams building multi-agent AI systems may be paying a compute premium for gains that don’t hold up under equal-budget conditions. New Stanford University research finds that single-agent systems match or outperform multi-agent architectures on complex reasoning tasks when both are given the same thinking-token budget.

    Multi-agent systems, however, come with the added baggage of computational overhead. Because they typically use longer reasoning traces and multiple interactions, it is often unclear whether their reported gains stem from architectural advantages or simply from consuming more resources. To isolate the true driver of performance, the Stanford researchers compared single-agent systems against multi-agent architectures on complex multi-hop reasoning tasks under equal “thinking token” budgets.

    Their experiments show that in most cases, single-agent systems match or outperform multi-agent systems when compute is equal. Multi-agent systems gain a competitive edge only when a single agent’s context becomes too long or corrupted. In practice, this means that a single-agent model with an adequate thinking budget can deliver more efficient, reliable and cost-effective multi-hop reasoning. Engineering teams should reserve multi-agent systems for scenarios where single agents hit a performance ceiling.

    Understanding the single versus multi-agent divide

    Multi-agent frameworks, such as planner agents, role-playing systems or debate swarms, break down a problem by having multiple models operate on partial contexts. These components communicate with each other by passing their answers around. While multi-agent solutions show strong empirical performance, comparing them to single-agent baselines is often imprecise: comparisons are heavily confounded by differences in test-time computation. Multi-agent setups require multiple agent interactions and generate longer reasoning traces, meaning they consume significantly more tokens. Consequently, when a multi-agent system reports higher accuracy, it is difficult to determine whether the gains stem from better architecture design or from spending extra compute.

    Recent studies show that when the compute budget is fixed, elaborate multi-agent strategies frequently underperform strong single-agent baselines. However, those are mostly very broad comparisons that don’t account for nuances such as different multi-agent architectures or the difference between prompt and reasoning tokens. “A central point of our paper is that many comparisons between single-agent systems (SAS) and multi-agent systems (MAS) are not apples-to-apples,” paper authors Dat Tran and Douwe Kiela told VentureBeat. “MAS often get more effective test-time computation through extra calls, longer traces, or more coordination steps.”

    Revisiting the multi-agent challenge under strict budgets

    To create a fair comparison, the Stanford researchers set a strict “thinking token” budget. This metric counts the total number of tokens used exclusively for intermediate reasoning, excluding the initial prompt and the final output. The study evaluated single- and multi-agent systems on multi-hop reasoning tasks: questions that require connecting multiple pieces of disparate information to reach an answer.

    During their experiments, the researchers noticed that single-agent setups sometimes stop their internal reasoning prematurely, leaving available compute budget unspent. To counter this, they introduced a technique called SAS-L (single-agent system with longer thinking). Rather than jumping to multi-agent orchestration when a model gives up early, the researchers suggest a simple prompt-and-budgeting change. “The engineering idea is simple,” Tran and Kiela said. “First, restructure the single-agent prompt so the model is explicitly encouraged to spend its available reasoning budget on pre-answer analysis.” By instructing the model to explicitly identify ambiguities, list candidate interpretations and test alternatives before committing to a final answer, developers can recover the benefits of collaboration inside a single-agent setup.
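    As a rough illustration of that prompt restructuring, the template below encodes the behaviors the authors describe. It is our paraphrase, not the paper’s actual prompt, and the budget figure is an arbitrary example.

        # Illustrative SAS-L-style prompt: push the model to spend its
        # reasoning budget on pre-answer analysis instead of stopping early.
        SAS_L_TEMPLATE = """You have a budget of roughly {budget} reasoning tokens.
        Spend it deliberately before answering:
        1. Identify any ambiguities in the question.
        2. List candidate interpretations and the intermediate facts each needs.
        3. Test alternative answers against the evidence before committing.

        Question: {question}

        End with a single line in the form: FINAL ANSWER: <answer>"""

        prompt = SAS_L_TEMPLATE.format(budget=8000, question="...")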
    The results of their experiments confirm that a single agent is the strongest default architecture for multi-hop reasoning tasks: it produces the highest-accuracy answers while consuming fewer reasoning tokens. When paired with specific models like Google’s Gemini 2.5, the longer-thinking variant produces even better aggregate performance.

    The researchers rely on a concept called the “data processing inequality” to explain why a single agent outperforms a swarm. Multi-agent frameworks introduce inherent communication bottlenecks: every time information is summarized and handed off between agents, there is a risk of data loss. In contrast, a single agent reasoning within one continuous context avoids this fragmentation. It retains access to the richest available representation of the task and is thus more information-efficient under a fixed budget.

    The authors also note that enterprises often overlook the secondary costs of multi-agent systems. “What enterprises often underestimate is that orchestration is not free,” they said. “Every additional agent introduces communication overhead, more intermediate text, more opportunities for lossy summarization, and more places for errors to compound.”

    On the other hand, they found that multi-agent orchestration is superior when a single agent’s environment gets messy. If an enterprise application must handle highly degraded contexts, such as noisy data, long inputs filled with distractors or corrupted information, a single agent struggles. In these scenarios, the structured filtering, decomposition and verification of a multi-agent system can recover relevant information more reliably.

    The study also warns about hidden evaluation traps that falsely inflate multi-agent performance. Relying purely on API-reported token counts can heavily distort how much computation an architecture is actually spending. The researchers encountered these accounting artifacts when testing models like Gemini 2.5, showing this is an active issue for enterprise applications today. “For API models, the situation is trickier because budget accounting can be opaque,” the authors said. To evaluate architectures reliably, they advise developers to “log everything, measure the visible reasoning traces where available, use provider-reported reasoning-token counts when exposed, and treat those numbers cautiously.”
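    A minimal sketch of that logging discipline, under our own assumptions about what a team records per call (the field names are illustrative, not any provider’s schema):

        # Minimal per-call token ledger so SAS, SAS-L and MAS runs can be
        # compared on actual thinking-token spend rather than reported accuracy.
        import csv
        from dataclasses import dataclass, asdict, fields

        @dataclass
        class CallRecord:
            system: str            # "SAS", "SAS-L" or "MAS"
            call_id: int
            prompt_tokens: int     # excluded from the thinking budget
            reasoning_tokens: int  # provider-reported where exposed; treat cautiously
            output_tokens: int     # final answer, excluded from the thinking budget

        def log_call(record: CallRecord, path: str = "token_ledger.csv") -> None:
            """Append one model call to a CSV ledger for post-hoc budget audits."""
            with open(path, "a", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=[fl.name for fl in fields(CallRecord)])
                if f.tell() == 0:
                    writer.writeheader()
                writer.writerow(asdict(record))

        log_call(CallRecord("MAS", 1, prompt_tokens=900, reasoning_tokens=2400, output_tokens=120))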
    What it means for developers

    If a single-agent system matches the performance of multiple agents under equal reasoning budgets, it wins on total cost of ownership by offering fewer model calls, lower latency and simpler debugging. Tran and Kiela warn that without this baseline, “some enterprises may be paying a large ‘swarm tax’ for architectures whose apparent advantage is really coming from spending more computation rather than reasoning more effectively.”

    Another way to frame the decision boundary is not how complex the overall task is, but where the exact bottleneck lies. “If it is mainly reasoning depth, SAS is often enough. If it is context fragmentation or degradation, MAS becomes more defensible,” Tran said. Engineering teams should stay with a single agent when a task can be handled within one coherent context window; multi-agent systems become necessary when an application must handle highly degraded contexts.

    Looking ahead, multi-agent frameworks will not disappear, but their role will evolve as frontier models improve their internal reasoning capabilities. “The main takeaway from our paper is that multi-agent structure should be treated as a targeted engineering choice for specific bottlenecks, not as a default assumption that more agents automatically means better intelligence,” Tran said.

  • OpenAI launches Privacy Filter, an open source, on-device data sanitization model that removes personal information from enterprise datasets
    by Carl Franzen on April 22, 2026 at 6:01 pm

    In a significant shift toward local-first privacy infrastructure, OpenAI has released Privacy Filter, a specialized open-source model designed to detect and redact personally identifiable information (PII) before it ever reaches a cloud-based server. Launched today on the AI code-sharing community Hugging Face under a permissive Apache 2.0 license, the tool addresses a growing industry bottleneck: the risk of sensitive data “leaking” into training sets or being exposed during high-throughput inference.

    By providing a 1.5-billion-parameter model that can run on a standard laptop or directly in a web browser, the company is effectively handing developers a “privacy-by-design” toolkit that functions as a sophisticated, context-aware digital shredder.

    Though OpenAI was founded with a focus on open source models such as this, the company shifted during the ChatGPT era to providing more proprietary (“closed source”) models available only through its website, apps and API, only to return to open source in a big way last year with the launch of the gpt-oss family of language models. In that light, and combined with OpenAI’s recent open sourcing of agentic orchestration tools and frameworks, it’s safe to say that the generative AI giant is clearly still heavily invested in fostering this less immediately lucrative part of the AI ecosystem.

    Technology: a gpt-oss variant with a bidirectional token classifier

    Architecturally, Privacy Filter is a derivative of OpenAI’s gpt-oss family, a series of open-weight reasoning models released earlier this year. However, while standard large language models (LLMs) are typically autoregressive, predicting the next token in a sequence, Privacy Filter is a bidirectional token classifier that reads from both directions. This distinction is critical for accuracy: by looking at a sentence from both directions simultaneously, the model gains a deeper understanding of context that a forward-only model might miss. For instance, it can better distinguish whether “Alice” refers to a private individual or a public literary character based on the words that follow the name, not just those that precede it.

    The model uses a sparse mixture-of-experts (MoE) architecture. Although it contains 1.5 billion total parameters, only 50 million are active during any single forward pass. This sparse activation allows for high throughput without the massive computational overhead typically associated with LLMs. Furthermore, it features a 128,000-token context window, enabling it to process entire legal documents or long email threads in a single pass without fragmenting the text, a process that often causes traditional PII filters to lose track of entities across page breaks.

    To ensure the redacted output remains coherent, OpenAI implemented a constrained Viterbi decoder. Rather than making an independent decision for every single word, the decoder evaluates the entire sequence to enforce logical transitions. It uses a “BIOES” (Begin, Inside, Outside, End, Single) labeling scheme, which ensures that if the model identifies “John” as the start of a name, it is statistically inclined to label “Smith” as the continuation or end of that same name, rather than as a separate entity.
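    To show what BIOES-constrained decoding means in practice, here is a self-contained sketch for a single entity type. The label set and scores are toy values of our own; OpenAI has not published the decoder’s implementation.

        # BIOES-constrained Viterbi over toy per-token label scores (log-probs).
        # The constraint forces coherent spans: B must be followed by I or E,
        # so "John" (B-NAME) pulls "Smith" toward E-NAME rather than a new entity.
        import math

        LABELS = ["O", "B-NAME", "I-NAME", "E-NAME", "S-NAME"]

        def allowed(prev: str, cur: str) -> bool:
            """Valid BIOES transitions for a single entity type."""
            if prev in ("B-NAME", "I-NAME"):         # inside an open span
                return cur in ("I-NAME", "E-NAME")
            return cur in ("O", "B-NAME", "S-NAME")  # outside, or start a span

        def viterbi(scores: list[dict]) -> list[str]:
            """Return the best label sequence that obeys the BIOES grammar."""
            best = [{} for _ in scores]
            back = [{} for _ in scores]
            for lab in LABELS:  # a sequence cannot begin mid-span...
                best[0][lab] = scores[0][lab] if lab in ("O", "B-NAME", "S-NAME") else -math.inf
            for t in range(1, len(scores)):
                for cur in LABELS:
                    best[t][cur], back[t][cur] = max(
                        (best[t - 1][prev] + scores[t][cur], prev)
                        for prev in LABELS if allowed(prev, cur))
            # ...and cannot end mid-span either.
            lab = max((s, l) for l, s in best[-1].items() if l in ("O", "E-NAME", "S-NAME"))[1]
            path = [lab]
            for t in range(len(scores) - 1, 0, -1):
                path.append(back[t][path[-1]])
            return path[::-1]

        # Toy scores for the tokens "call John Smith today".
        toy = [
            {"O": -0.1, "B-NAME": -3.0, "I-NAME": -4.0, "E-NAME": -4.0, "S-NAME": -3.0},
            {"O": -2.0, "B-NAME": -0.3, "I-NAME": -1.0, "E-NAME": -1.2, "S-NAME": -0.9},
            {"O": -2.5, "B-NAME": -1.5, "I-NAME": -0.8, "E-NAME": -0.4, "S-NAME": -1.0},
            {"O": -0.2, "B-NAME": -3.0, "I-NAME": -2.0, "E-NAME": -2.5, "S-NAME": -3.0},
        ]
        print(viterbi(toy))  # ['O', 'B-NAME', 'E-NAME', 'O']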
    On-device data sanitization

    Privacy Filter is designed for high-throughput workflows where data residency is a non-negotiable requirement. It currently supports the detection of eight primary PII categories, grouped as follows:

      • Private names: individual persons.
      • Contact info: physical addresses, email addresses and phone numbers.
      • Digital identifiers: URLs, account numbers and dates.
      • Secrets: a specialized category for credentials, API keys and passwords.

    In practice, this allows enterprises to deploy the model on premises or within their own private clouds. By masking data locally before sending it to a more powerful reasoning model (like GPT-5 or gpt-oss-120b), companies can maintain compliance with strict GDPR or HIPAA standards while still leveraging the latest AI capabilities. For developers, the model is available via Hugging Face, with native support for transformers.js, allowing it to run entirely within a user’s browser using WebGPU.

    Fully open source, commercially viable Apache 2.0 license

    Perhaps the most significant aspect of the announcement for the developer community is the Apache 2.0 license. Unlike “available-weight” licenses that often restrict commercial use or require “copyleft” sharing of derivative works, Apache 2.0 is one of the most permissive licenses in the software world. For startups and dev-tool makers, this means:

      • Commercial freedom: companies can integrate Privacy Filter into their proprietary products and sell them without paying royalties to OpenAI.
      • Customization: teams can fine-tune the model on their specific datasets (such as medical jargon or proprietary log formats) to improve accuracy for niche industries.
      • No viral obligations: unlike the GPL license, builders do not have to open-source their entire codebase if they use Privacy Filter as a component.

    By choosing this licensing path, OpenAI is positioning Privacy Filter as a standard utility for the AI era, essentially the “SSL for text.”

    Community reactions

    The tech community reacted quickly to the release, with many noting the impressive technical constraints OpenAI managed to hit. Elie Bakouch (@eliebakouch), a research engineer at agentic model training platform startup Prime Intellect, praised the efficiency of Privacy Filter’s architecture on X: “Very nice release by @OpenAI! A 50M active, 1.5B total gpt-oss arch MoE, to filter private information from trillion scale data cheaply. keeping 128k context with such a small model is quite impressive too”.

    The sentiment reflects a broader industry trend toward “small but mighty” models. While the world has focused on massive frontier-scale giants, the practical reality of enterprise AI often requires small, fast models that can perform one task, like privacy filtering, exceptionally well and at low cost.

    However, OpenAI included a “High-Risk Deployment Caution” in its documentation. The company warned that the tool should be viewed as a “redaction aid” rather than a “safety guarantee,” noting that over-reliance on a single model could lead to “missed spans” in highly sensitive medical or legal workflows.

    OpenAI’s Privacy Filter is clearly an effort by the company to make the AI pipeline fundamentally safer. By combining the efficiency of a mixture-of-experts architecture with the openness of an Apache 2.0 license, OpenAI is providing a way for many enterprises to more easily, cheaply and safely redact PII.
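    As a closing illustration of the local mask-then-send workflow described above, here is a short sketch using the Hugging Face transformers token-classification pipeline. The model ID and the entity label names are assumptions based on the announcement, not confirmed identifiers.

        # Hedged sketch: redact PII locally, then ship only the sanitized text
        # to a more powerful cloud model. "openai/privacy-filter" is a
        # hypothetical model ID used for illustration.
        from transformers import pipeline

        redactor = pipeline(
            "token-classification",
            model="openai/privacy-filter",  # hypothetical model ID
            aggregation_strategy="simple",  # merge subword pieces into spans
        )

        def redact(text: str) -> str:
            """Replace each detected PII span with a [CATEGORY] placeholder."""
            spans = sorted(redactor(text), key=lambda s: s["start"], reverse=True)
            for s in spans:  # replace right-to-left so offsets stay valid
                text = text[:s["start"]] + f"[{s['entity_group']}]" + text[s["end"]:]
            return text

        safe = redact("Email John Smith at john@example.com about account 4417.")
        # Only `safe` (e.g. "Email [NAME] at [EMAIL] about account [NUMBER].")
        # ever leaves the machine.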

AWS News Blog: Announcements, Updates, and Launches

    Feed has no items.