What’s new today…

VentureBeat Transformative tech coverage that matters

  • Apple’s new Siri AI is more than just a smarter assistant — it’s a new enterprise app layer
    by carl.franzen@venturebeat.com (Carl Franzen) on June 9, 2026 at 9:49 pm

    Apple’s new Siri AI, unveiled yesterday at Apple’s annual Worldwide Developers Conference (WWDC 2026), may look like a consumer product story on the surface. But for enterprise developers and IT leaders, the bigger news from WWDC26 is that Apple is turning Siri into a systemwide AI interface for apps, data and workplace actions across iPhone, iPad, Mac, Apple Watch and Vision Pro, as revealed in the WWDC26 Apple Intelligence developer guide.

    In other words, if your company offers an application on Apple devices, whether it runs on an iOS mobile device or a Mac, the new Siri AI may force you to change how that application is discovered and served, and how its contents and workflows are made available to end users. Enterprise developers can expose app content through App Entities, make it available to Apple’s Spotlight semantic index, define actions through App Intents and App Schemas, and map onscreen user interface elements to app objects through View Annotations.

    That makes Siri AI much more than a voice assistant. Apple is positioning it as an AI-powered app action and content-discovery layer built into its operating systems.

    Siri becomes an app action layer

    For enterprise developers, the shift could be significant. A business app that properly adopts Apple’s new frameworks could let users ask Siri to find, summarize, update or act on app content without the developer having to build a separate chatbot interface. Apple says App Intents, its existing framework for exposing app actions to system features like Siri and Shortcuts, is the path for connecting apps to Apple Intelligence and Siri AI, while schemas make app content and actions usable through natural language.

    In practical terms, that could apply to customer records in a CRM, open tickets in an IT service desk, project tasks, invoices, calendar events, documents, expenses, notes, messages or field-service records.
    Instead of opening an app, searching manually and clicking through menus, an employee could ask Siri to act on the specific object they are viewing or retrieve a related item from another app.

    Spotlight becomes the enterprise search hook

    Apple says in its WWDC26 Apple Intelligence guide that entity schemas contribute app content to the Spotlight semantic index, while intent schemas let users take action on that indexed content without developers defining a rigid list of command phrases. Apple also says the new View Annotations API lets developers map views to entities so users can refer to what is onscreen conversationally, for example: “summarize this customer thread,” “add this invoice to my expenses,” or “follow up on this task tomorrow.”

    That is an important distinction from earlier voice-assistant integrations, which often required narrow command structures and explicit invocation phrases. Apple is instead giving developers a way to describe an app’s data and capabilities so Siri, Spotlight and Shortcuts can use them through the system.

    Developers get testing tools for Siri and app actions

    Apple is also adding AppIntentsTesting, a framework that validates App Intents through the same infrastructure used by Siri, Shortcuts and Spotlight without requiring UI automation. That matters for enterprise software teams because natural-language app actions need to be testable, repeatable and reliable before they are trusted in production workflows. It also gives developers a path to include Siri and Spotlight behavior in ordinary testing pipelines instead of treating assistant integration as a manual demo feature.

    The result is a clearer developer mandate: if an app wants to show up well inside Siri AI, it will likely need to expose its data, actions and onscreen context through Apple’s system frameworks.
    For enterprise SaaS vendors, that could become an important part of Apple-platform competitiveness, especially in categories such as productivity, collaboration, CRM, project management, finance, design, knowledge management, healthcare, logistics and field operations.

    Apple expands its model stack for developers

    Apple is also using WWDC26 to expand its AI developer stack beyond Siri. The updated Foundation Models framework gives Swift developers access to Apple’s on-device models, Apple models running through Private Cloud Compute and third-party model providers that conform to Apple’s Language Model protocol. That gives developers more flexibility than a single Apple-only model path. Apple says in its Apple Intelligence developer guide that the framework now supports multimodal prompts, Vision tools, dynamic model profiles and evaluations. In theory, an enterprise app could use an Apple on-device model for private or lightweight tasks, call Apple’s Private Cloud Compute for heavier reasoning, or plug in an outside provider such as Claude, Gemini, an open-source model or a company-controlled model through Apple’s model-provider interface.

    Core AI brings custom models onto Apple silicon

    Apple is also introducing Core AI, an operating system-level framework for running developers’ own models on Apple silicon. For enterprises that do not want sensitive data sent to a cloud model at all, local inference remains one of Apple’s most important advantages. Core AI gives developers a first-party way to deploy custom models with Swift APIs, memory controls and optimized execution on Apple hardware.

    Evaluations signal a more mature enterprise AI posture

    The company’s new Evaluations framework also points at a more mature enterprise AI posture. AI features are difficult to test with conventional unit tests because model outputs can vary. Apple says the framework helps developers define metrics, automatically grade outputs and aggregate statistics.
    For enterprise buyers, that matters because AI features need measurable reliability, not just impressive demos.

    Apple is also explicitly addressing the security risks of app agents. WWDC26 developer materials include a session on how developers can mitigate risks to agentic features, covering indirect prompt injection, data exfiltration, unintended actions, threat modeling, user confirmations, authentication and safeguards for App Intents and Foundation Models. That is a notable acknowledgement that AI assistants able to read context and take action across apps create new attack surfaces.

    Enterprise IT gets new Apple Intelligence controls

    For enterprise IT, Apple also answered some of the governance questions raised by Siri AI’s initial announcement.

    Its WWDC26 device management documentation describes new management controls for Apple Intelligence, Siri and external intelligence integrations. Supervised devices can use Apple’s intelligence settings configuration to allow or deny features such as Genmoji, Image Playground, Writing Tools, Image Wand, app-specific intelligence in Mail, Notes and Safari, Apple Intelligence Report, Visual Intelligence Summary and on-device-only processing for dictation and translation.

    Apple says additional management for Siri AI and Visual Intelligence will arrive in later beta releases. That means enterprise controls are not complete yet, but Apple is clearly building Siri AI into its managed-device architecture rather than treating it as an unmanaged consumer feature.

    Apple also adds controls for outside AI services

    Apple is also adding controls for external intelligence services. Its deployment docs describe a configuration for managing external intelligence integrations, including whether users can access outside AI services and whether they can sign in to those services.
    That will matter for organizations trying to control when employees use Apple’s own models, Apple’s private cloud architecture or third-party AI systems.

    Those controls could help Apple compete with Microsoft and Google in enterprise AI, but with a different pitch. Microsoft Copilot and Google Gemini are tied deeply to their respective productivity clouds. Apple’s strategy is more device- and OS-centered: make AI available where the user already works, expose app actions through system frameworks and emphasize on-device processing and Private Cloud Compute as privacy advantages.

    Apple’s privacy pitch remains central

    Apple’s privacy architecture remains central to that pitch. Siri AI uses Apple Foundation Models on device and through Private Cloud Compute. Apple says in its Siri AI announcement that Private Cloud Compute does not store personal data from the requests it handles or make that data accessible to Apple. For industries such as healthcare, financial services, legal, education and government, that claim may be more important than any single assistant feature.

    But enterprises will still need more detail before treating Siri AI as a fully governed workplace assistant. Apple’s WWDC26 materials show progress on management controls, external AI restrictions and app-level governance, but the full picture is still emerging. Key questions remain around auditability, retention, work-versus-personal data boundaries, role-based access, compliance certifications, and how much control IT departments will have over Siri’s ability to act inside specific business apps.

    Availability limits could complicate rollout

    Availability also complicates enterprise rollout. Siri AI is in developer testing now for iOS 27, iPadOS 27, macOS 27 and visionOS 27, with watchOS support coming in a later beta. Apple says the user-facing beta arrives later this year. The feature requires Apple Intelligence-capable hardware, which means many older corporate devices will not support it.
    Apple also says Siri AI will not initially be available on iPhone and iPad in the European Union, and that Siri AI and other new Apple Intelligence features are not available in China while the company works through regulatory requirements.

    That means global enterprises may face fragmented deployment, with different feature availability by hardware, operating system, language and region.

    App Store changes give business software vendors another opening

    Apple also introduced enterprise-adjacent App Store changes that could matter for business software vendors. StoreKit 2 will support subscriptions for groups and organizations, including volume purchasing through Apple Business and Apple School Manager. IT teams will be able to buy and assign App Store subscriptions through device management workflows, while developers will be able to manage subscription availability for organizations. That gives Apple a more business-friendly path for selling app subscriptions into managed environments.

    The company is also unifying Apple Business Manager, Apple Business Essentials and Apple Business Connect under Apple Business, which Apple describes as a broader platform for Managed Apple Accounts, device management, volume licensing, Admin APIs, Apple Maps locations, Tap to Pay on iPhone, Branded Mail and multi-seat subscriptions.

    Apple’s enterprise AI strategy comes into focus

    Taken together, the WWDC26 enterprise story is bigger than Siri alone. Apple is building an AI stack that spans user-facing assistant features, developer integration frameworks, local and private-cloud model infrastructure, AI testing, App Store business subscriptions and device-management controls.

    The strategic question is whether Apple can make this more than another Siri reset. Developers will need to adopt Apple’s app-intelligence frameworks. Enterprises will need stronger governance assurances.
    Users will need the assistant to work reliably across real workflows, not just Apple’s own apps.

    But the direction is now much clearer. Apple is not trying to compete in enterprise AI by launching a standalone chatbot. It is embedding AI into the operating system, making apps addressable through Siri and Spotlight, giving developers model and testing tools, and giving IT teams at least the beginnings of policy controls.

    For enterprise developers, that means App Intents, App Schemas, App Entities, Spotlight indexing and View Annotations may become core parts of building competitive Apple-platform apps. For enterprise technology leaders, it means Apple’s devices could soon include a native AI assistant that can act across business workflows, if Apple can prove that the privacy, security and management model is strong enough for production use.
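
    The article's description of the Evaluations framework (define metrics, automatically grade outputs, aggregate statistics) can be illustrated with a small language-neutral sketch. This is not Apple's API, which is a Swift framework whose signatures are not given in the source. Everything here (`evaluate`, `toy_model`, `exact_match`, the invoice prompt) is a hypothetical stand-in, written in Python only to show why repeated sampling plus grading replaces a single pass/fail unit test when outputs vary.

```python
import random
import statistics
from typing import Callable

# A grader maps one model output to a score between 0.0 and 1.0.
Grader = Callable[[str], float]


def evaluate(generate: Callable[[str, random.Random], str],
             cases: list[tuple[str, Grader]],
             samples_per_case: int = 20,
             seed: int = 0) -> dict[str, float]:
    """Sample each prompt several times, grade every output, aggregate."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    per_case_means = []
    for prompt, grader in cases:
        scores = [grader(generate(prompt, rng)) for _ in range(samples_per_case)]
        per_case_means.append(statistics.mean(scores))
    return {
        "mean_score": statistics.mean(per_case_means),
        "worst_case": min(per_case_means),
        "num_cases": float(len(per_case_means)),
    }


def toy_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model whose output varies between calls."""
    return "PAID" if rng.random() < 0.9 else "UNKNOWN"


def exact_match(expected: str) -> Grader:
    """Metric: 1.0 if the output equals the expected answer, else 0.0."""
    return lambda output: 1.0 if output.strip() == expected else 0.0


report = evaluate(toy_model, [("What is the status of this invoice?", exact_match("PAID"))])
print(report)
```

    Reporting the worst-performing case alongside the mean is a design choice in this sketch: an average can hide one workflow that fails most of the time.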

  • Cohere open-sources a coding agent that runs on a single H100
    on June 9, 2026 at 9:41 pm

    Engineering teams building agentic coding pipelines now have a concrete open-source alternative to managed models like Claude Fable 5, one that runs on a single H100. The tradeoff: Cohere’s North Mini Code, which launched Tuesday, generated three times the output tokens of comparable models in independent testing, a verbosity cost that compounds in high-volume production workloads.

    The new open-source model is a 30 billion parameter mixture-of-experts (MoE) model with 3 billion parameters active per token, built for agentic software engineering including sub-agent orchestration, architecture mapping, code review and terminal work. The model supports a 256,000 token context window with a 64,000 token maximum generation length, and is available on Hugging Face under an Apache 2.0 license.

    What North Mini Code can do

    North Mini Code targets the full agentic coding stack. Here is what the model does and what it runs on.

    Software engineering. Cohere built North Mini Code specifically for agentic software engineering, not adapted from a general-purpose base. It has integrated tool-use capabilities and supports interleaved thinking, which Cohere says improves performance across multi-step agentic work.

    Architecture mapping and code review. North Mini Code can analyze and map systems architecture, surface dependencies and perform code review across large codebases. With a 256,000 token context window, it can hold substantial multi-file projects in a single context pass.

    Terminal-based agentic tasks. The model is trained for terminal environments, handling shell interactions, package scripts and command-line tooling. Cohere benchmarked it on Terminal-Bench v2, which tests agents in real terminal environments rather than synthetic code generation tasks.

    How it was built

    North Mini Code is a sparse mixture-of-experts model with 128 experts, of which 8 activate per token.
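
    The 8-of-128 routing just described can be sketched in a few lines. The expert counts come from the article; the rest is assumption. Cohere's router design is not described in the source, so the softmax over only the selected logits is a common top-k convention used here for illustration, and the random logits stand in for a learned router's scores.

```python
import math
import random

NUM_EXPERTS = 128  # from the article
TOP_K = 8          # from the article


def route(router_logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Return (expert_index, mixing_weight) for the k highest-scoring experts.

    ASSUMPTION: weights are a softmax over the selected logits only.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(logits)
active_fraction = TOP_K / NUM_EXPERTS  # share of experts that run for this token

print(len(selected), "experts selected;", f"{active_fraction:.2%} of the expert pool")
```

    Only the selected experts' feed-forward weights take part in that token's forward pass, so per-token compute tracks the active subset rather than the full expert pool.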
    The compute requirement at inference time is closer to a 3 billion parameter model despite 30 billion total parameters. Nick Frosst, co-founder of Cohere, demoed it running on a Mac Studio via MLX at around 20 gigabytes of RAM, the same machine he uses for his own local coding work.

    Cohere trained the model through two stages of supervised fine-tuning followed by reinforcement learning with verifiable rewards across more than 70,000 verifiable tasks spanning approximately 5,000 repositories, deduplicated against SWE-Bench. Rather than optimizing against a single agent scaffold, Cohere trained across three. SWE-Agent uses a rich CLI with specialized commands. Mini-SWE-Agent uses a single bash tool with raw shell output. OpenCode uses individually typed tools returning structured JSON. Cohere reports a 10 percentage point gain on OpenCode evaluation from the multi-harness approach while maintaining SWE-Agent performance.

    Where it fits

    North Mini Code enters a market that now includes Mistral Devstral Small 2, GitHub Copilot, Cursor, and Claude Fable 5, each with distinct cost and deployment tradeoffs.

    Cohere’s primary benchmark comparison is against Mistral Devstral Small 2, a 24 billion parameter dense model. In vendor-reported internal tests under identical hardware configurations, Cohere claims 2.8x higher output throughput and a 30% inter-token latency advantage over Devstral Small 2. Cohere also claims, in its Hugging Face technical post, that North Mini Code outperforms open-source models up to four times its parameter count on its reported benchmarks, including models at 120 billion parameters. Artificial Analysis independently ranks it eighth of 127 comparable open-weight models on output speed at 210 tokens per second, with a time to first token of 0.25 seconds against a class median of 1.95 seconds. It places 18th of 127 on the Artificial Analysis Intelligence Index.
    One flag from the same data: the model generated 75 million output tokens to complete the Intelligence Index against a class median of 25 million. In high-volume agentic pipelines, that verbosity compounds into inference cost and latency.

    “Suddenly people are thinking like hey, am I getting enough economic value out of the tokens from a model?” Frosst said during the launch video. “Local deployment is one way of empowering people and making AI really something that works for them.”

    GitHub Copilot, Cursor and Claude Code operate on per-usage or subscription pricing with no on-premises option. Anthropic’s Claude Fable 5, now the most capable publicly available managed coding model, runs at $50 per million output tokens. For Frosst, the model is the polar opposite of Fable.

    “Its small, cost effective, apache 2.0, and locally deployable. This is the way LLMs should go. small, open source, transparent and sovereign, vs large, expensive, proprietary and hegemonic,” Frosst wrote in a post on X.

    What this means for enterprises

    For teams building production agentic coding pipelines, North Mini Code’s release clarifies a set of decisions that have been forming for months.

    Purpose-built agentic training is now a baseline to evaluate against. The distinction between models fine-tuned for code and models trained specifically for agentic workflows, with verified tool calls and multi-harness robustness, is now a material factor in pipeline decisions. Any model vendor claiming agentic coding capability should be able to answer whether its training used verifiable agentic tasks or was adapted from a general-purpose base.

    Verbosity is a hidden pipeline cost that benchmarks do not surface. Artificial Analysis measured North Mini Code generating three times the output tokens of comparable models. That verbosity compounds across inference cost and latency in high-volume pipelines.
    Throughput testing against actual workload volume is the evaluation step the benchmark rankings skip.

    The frontier pricing split is now a real architectural decision. Fable 5 at $50 per million output tokens and North Mini Code on a single H100 represent a genuine tradeoff between cost control and data residency on one side, and managed infrastructure overhead on the other. Teams running high-volume agentic coding pipelines should model both cost paths against their actual workload before committing to either.
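
    One way to "model both cost paths" is a back-of-the-envelope calculation like the sketch below. Three inputs come from the article: the $50 per million output tokens price, the 75-million-versus-25-million token verbosity gap, and the 210 tokens per second output speed. Everything else is an assumption made for illustration: the GPU hourly rate is a placeholder, the measured output speed is assumed to carry over to a single self-hosted H100 serving one stream, and the managed model is assumed to sit at the class-median verbosity. Input tokens, batching, idle GPU time and operations staffing are all ignored.

```python
from dataclasses import dataclass

MANAGED_USD_PER_M_OUTPUT = 50.0  # Fable 5 price quoted in the article
VERBOSITY_RATIO = 75 / 25        # 75M vs. 25M output tokens, from the article
TOKENS_PER_SECOND = 210.0        # output speed reported by Artificial Analysis
GPU_USD_PER_HOUR = 3.0           # ASSUMPTION: placeholder rate, not from the article


@dataclass
class Workload:
    # Output tokens a median-verbosity model would emit for this workload.
    baseline_output_tokens: float


def managed_cost(w: Workload,
                 usd_per_million: float = MANAGED_USD_PER_M_OUTPUT) -> float:
    """Per-token pricing: cost scales with the tokens generated."""
    return w.baseline_output_tokens / 1e6 * usd_per_million


def self_hosted_cost(w: Workload,
                     verbosity: float = VERBOSITY_RATIO,
                     tokens_per_second: float = TOKENS_PER_SECOND,
                     gpu_usd_per_hour: float = GPU_USD_PER_HOUR) -> float:
    """GPU-time pricing: a more verbose model needs more GPU hours."""
    tokens = w.baseline_output_tokens * verbosity
    gpu_hours = tokens / tokens_per_second / 3600.0
    return gpu_hours * gpu_usd_per_hour


workload = Workload(baseline_output_tokens=25e6)
managed = managed_cost(workload)
local = self_hosted_cost(workload)
print(f"managed: ${managed:,.2f}   self-hosted: ${local:,.2f}")
```

    The verbosity multiplier lands entirely on the self-hosted side in this sketch, which is why it is worth replacing these constants with throughput and token counts measured on the team's own workload.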

  • On-device AI agents hit a hard memory limit. Apple’s new architecture routes around it.
    on June 9, 2026 at 5:49 pm

    On-device AI models have stayed small because the entire weight set has to live in DRAM, capping practical parameter counts well below what server-side deployments use. Enterprise architects evaluating agentic workloads have had to choose between capable cloud-dependent models and limited on-device ones. Apple’s third-generation foundation models, announced at WWDC26, break that constraint by moving the weight set off DRAM entirely.

    The AFM 3 family was developed in collaboration with Google and spans five models: two on-device and three server-based, all running within Apple’s Private Cloud Compute boundary. The server-side models, including AFM 3 Cloud Pro for agentic tool use and complex reasoning, run on Nvidia GPUs in Google Cloud. The on-device architecture is Apple’s own. AFM 3 Core Advanced is a 20-billion-parameter model that stores weights in NAND flash rather than DRAM.

    “Instead of forcing the entire model into DRAM, the full model is stored in flash memory,” Apple’s research team wrote. “Because NAND-to-DRAM bandwidth is too slow to swap weights token by token, as standard MoE models require, AFM 3 Core Advanced makes routing decisions per prompt.”

    How the architecture actually works

    The memory wall Apple is working around is one every local AI developer runs into. “You can’t put 20B parameters in RAM at any reasonable precision,” Awni Hannun, a researcher at Anthropic and former Apple research scientist, posted on X. “To make it work they are using pretty exotic architecture by today’s standards. A small model predicts from the query (or prompt) which experts to load from NAND into RAM.”

    That prediction-and-load mechanism has three distinct components, each driven by the hardware constraints of consumer silicon.

    The full 20B weight set lives in flash, not DRAM. AFM 3 Core Advanced stores its entire parameter set in NAND flash rather than active memory.
    Standard on-device deployments require the full model to fit in DRAM, which is what caps their parameter counts. Apple’s approach, which it calls Instruction-Following Pruning (IFP) and developed with its own researchers, treats flash as the model’s permanent home and DRAM as a working buffer for whichever experts a given prompt requires.

    Expert routing happens once per prompt, not per token. In a conventional Mixture of Experts model, a router selects different experts for every token generated, which would require continuous weight movement between flash and DRAM at inference speed. NAND-to-DRAM bandwidth cannot support that. AFM 3 Core Advanced routes once at prompt time, selects a fixed expert set, loads it into DRAM alongside always-active shared experts, and generates all tokens from that same configuration. “The key distinction from a typical MoE is that you do this once per query and then generate all the tokens with the same experts,” Hannun wrote.

    Active parameter count scales from 1B to 4B depending on task complexity. Rather than running a fixed model size for every request, AFM 3 Core Advanced adjusts how many parameters it activates based on what the task requires: 1 billion for simpler operations, up to 4 billion for harder ones, all drawn from the 20-billion-parameter pool in flash.

    What Apple has and hasn’t disclosed

    The architecture paper is detailed on the memory design and sparse activation mechanism. It is less forthcoming on practical deployment constraints.

    Apple’s profiling tools expose timing but not the metrics that decide production viability. “Energy, memory bandwidth, thermal? Not in the docs,” Marco Abis, who is building Ziraph, a profiler for local AI on Apple silicon, posted on X.
    “A notable gap, given those decide most of on-device performance.” Abis also did not find a statement in Apple’s documentation (across the Core AI docs, the Foundation Models docs or the Private Cloud Compute security post) of when an on-device request transparently offloads, or whether that routing is visible to the developer or the user. For enterprises that need to document where inference runs, that is a direct compliance problem.

    Not all the information is currently available. Apple has indicated a full technical report with benchmarks is coming later this summer.

    What this means for enterprise architects

    Regulated industries evaluating agentic AI deployments now have a concrete architectural decision to make.

    The DRAM wall for on-device agents just moved. Enterprises evaluating agents that need to run without a cloud round-trip now have a 20-billion-parameter local option to evaluate. The constraint shifts from model capability to device hardware.

    The private/cloud boundary is now an architectural decision, not a default. Simpler requests stay on-device; complex agentic tasks route to AFM 3 Cloud Pro on Private Cloud Compute. Apple has not publicly specified when a request offloads or whether that routing is visible to the developer, a gap that complicates policy decisions for organizations that need to document where inference runs.

    The agentic server tier depends on Google Cloud. AFM 3 Cloud Pro runs on Nvidia GPUs in Google Cloud. The Private Cloud Compute guarantee covers data privacy. It does not eliminate the Google Cloud dependency for server-side inference.

    AFM 3 Core Advanced gives enterprises a 20-billion-parameter on-device option that did not exist before WWDC26. Whether it is deployable at scale depends on answers Apple has not yet published. Those details are due in the summer technical report.
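
    The memory argument in this piece can be made concrete with some arithmetic and a toy simulation. The parameter counts (20 billion total, 1 billion to 4 billion active) come from the article; the rest is assumption. Apple's weight precision is not stated, so the bit widths below are illustrative, and the expert count, buffer policy and random expert selection are hypothetical stand-ins for the learned predictor Hannun describes.

```python
import random


def weight_gigabytes(params: float, bits_per_weight: int) -> float:
    """Storage needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Full 20B pool versus the 1B-4B active subset, at illustrative precisions.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: full pool {weight_gigabytes(20e9, bits):5.1f} GB, "
          f"active {weight_gigabytes(1e9, bits):.2f}-{weight_gigabytes(4e9, bits):.2f} GB")


def flash_loads_per_token(num_tokens: int, num_experts: int, k: int,
                          rng: random.Random) -> int:
    """Conventional MoE: every token may pick different experts.

    ASSUMPTION: DRAM holds only the k experts used by the previous token.
    """
    resident: set[int] = set()
    loads = 0
    for _ in range(num_tokens):
        wanted = set(rng.sample(range(num_experts), k))
        loads += len(wanted - resident)  # experts that must come from flash
        resident = wanted
    return loads


def flash_loads_per_prompt(num_tokens: int, num_experts: int, k: int,
                           rng: random.Random) -> int:
    """Per-prompt routing: choose experts once, reuse them for every token."""
    chosen = set(rng.sample(range(num_experts), k))
    return len(chosen)  # does not grow with num_tokens


rng = random.Random(0)
per_token = flash_loads_per_token(num_tokens=500, num_experts=64, k=8, rng=rng)
per_prompt = flash_loads_per_prompt(num_tokens=500, num_experts=64, k=8, rng=rng)
print(f"expert loads from flash: per-token routing {per_token}, per-prompt routing {per_prompt}")
```

    The per-prompt load count stays fixed however long the response runs, which is the property that makes slow NAND-to-DRAM bandwidth tolerable in the design the article describes.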
