Claude for Apple Foundation Models Integration

Anthropic has introduced the ClaudeForFoundationModels Swift package, which integrates Claude into Apple's Foundation Models framework. Developers can use Claude as a server-side language model through the same LanguageModelSession API that drives Apple's own on-device models, so moving between local and cloud-based inference does not require a second code path.

Unified API for Local and Cloud Inference

The primary advantage of this integration is the abstraction layer provided by Apple's LanguageModel protocol. By conforming Claude to this protocol, developers can drive the model using the same session API for responding to prompts, streaming, guided generation, and tool calling.

Developers can choose between Claude and Apple's on-device model based on task complexity:

On-device models: Optimized for speed, privacy, and offline availability for lightweight tasks.
Claude: Recommended for tasks requiring larger context windows, frontier reasoning, or server-side tools like web search and code execution.

Because both providers use the same LanguageModelSession API, switching between the two requires only changing the model: argument passed to the session.
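The swap described above can be sketched with stand-in types. Everything below (SketchModel, SketchSession, the two stub providers, and the prompt-length routing rule) is hypothetical and written only for illustration; it is not the FoundationModels or ClaudeForFoundationModels API, and it only mimics the idea that a session depends on a protocol rather than on a concrete provider.

```swift
// Stand-in for the role Apple's LanguageModel protocol plays (hypothetical, not the real API).
protocol SketchModel {
    var name: String { get }
    func complete(_ prompt: String) -> String
}

// Stub providers that just echo the prompt, tagged with where it "ran".
struct OnDeviceStub: SketchModel {
    let name = "on-device"
    func complete(_ prompt: String) -> String { "[\(name)] \(prompt)" }
}

struct CloudStub: SketchModel {
    let name = "cloud"
    func complete(_ prompt: String) -> String { "[\(name)] \(prompt)" }
}

// Stand-in for a session: it knows only the protocol, never the concrete provider.
struct SketchSession {
    let model: any SketchModel
    func respond(to prompt: String) -> String { model.complete(prompt) }
}

// Routing policy is an assumption for illustration: long prompts go to the cloud stub.
func makeSession(forPromptLength length: Int) -> SketchSession {
    length > 2_000
        ? SketchSession(model: CloudStub())
        : SketchSession(model: OnDeviceStub())
}
```

The call site (respond(to:)) is identical for both providers; only the value passed as model: differs.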

Technical Requirements and Installation

This package is currently in beta and targets the server-side language model API introduced in the OS 27 betas. The following requirements must be met:

Operating Systems: iOS 27, macOS 27, visionOS 27, or watchOS 27 (all in beta).
Development Tool: Xcode 27 (beta).
Authentication: A Claude API key from the Claude Console for development.

To install, developers add the ClaudeForFoundationModels package to their Package.swift or via Xcode's "Add Package Dependencies" menu, then import it alongside the FoundationModels framework.

Key Features and Implementation Details

Model Selection and Capabilities

Model identifiers are managed via the ClaudeModel enum. Constants like .opus4_8 (mapping to claude-opus-4-8) carry specific capabilities, ensuring the package only sends request fields that the model supports. This prevents hard errors caused by sending unsupported fields to the API.
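A minimal sketch of this kind of capability gating, assuming each model carries a set of optional request fields it accepts. SketchModelInfo, buildBody, and the field names are hypothetical; they are not the package's types or the Messages API wire format.

```swift
// Hypothetical model description: an identifier plus the optional fields it accepts.
struct SketchModelInfo {
    let id: String
    let supportedOptionalFields: Set<String>
}

// Build a request body, silently dropping optional fields the model does not support,
// so the server never sees a field it would reject.
func buildBody(for model: SketchModelInfo,
               prompt: String,
               optionalFields: [String: String]) -> [String: String] {
    var body = ["model": model.id, "prompt": prompt]
    for (key, value) in optionalFields where model.supportedOptionalFields.contains(key) {
        body[key] = value
    }
    return body
}
```

For example, with a model whose supported set is only ["effort"] (an invented value), passing both "effort" and some other optional field yields a body that contains "effort" but not the other field.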

Effort Levels

Developers can pin a specific effort level (low, medium, high, xhigh, max) using the fixedEffort: parameter. This takes precedence over the framework's general reasoning hints. While the framework's reasoning levels stop at "high," the fixedEffort parameter allows access to .xhigh and .max levels.
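The precedence rule can be sketched as follows. SketchEffort, SketchReasoningHint, and resolveEffort are hypothetical names, and the one-to-one mapping from hint to effort is an assumption; the only behavior taken from the text is that a pinned effort wins and that the hint scale stops at "high".

```swift
// Effort levels named in the text, from lowest to highest.
enum SketchEffort {
    case low, medium, high, xhigh, max
}

// Stand-in for the framework's reasoning hint, which stops at "high".
enum SketchReasoningHint {
    case low, medium, high
}

// A pinned effort always wins; otherwise fall back to the hint, if there is one.
func resolveEffort(fixedEffort: SketchEffort?, hint: SketchReasoningHint?) -> SketchEffort? {
    if let fixed = fixedEffort { return fixed }
    guard let hint = hint else { return nil }
    switch hint {
    case .low: return .low
    case .medium: return .medium
    case .high: return .high
    }
}
```

Because the hint type has no xhigh or max case, those two levels are reachable in this sketch only through fixedEffort.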

Structured Output and Streaming

Structured Output: By annotating a type with @Generable, developers can request structured outputs. If the chosen model does not support this capability, the package throws a LanguageModelError.unsupportedGenerationGuide error.
Streaming: The streamResponse(to:) method provides incremental responses. Each element returned is a cumulative snapshot of the response rather than a delta.
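Since each streamed element is a cumulative snapshot, a UI that wants only the newly arrived text has to diff consecutive snapshots. A minimal sketch, assuming snapshots are plain strings (the real framework's snapshot type may be richer); deltas(fromSnapshots:) is a hypothetical helper, not part of either library.

```swift
// Given cumulative snapshots, return the text newly added by each one.
func deltas(fromSnapshots snapshots: [String]) -> [String] {
    var previous = ""
    var result: [String] = []
    for snapshot in snapshots {
        if snapshot.starts(with: previous) {
            // Normal case: the snapshot extends the previous one.
            result.append(String(snapshot.dropFirst(previous.count)))
        } else {
            // The snapshot rewrote earlier text; fall back to emitting it whole.
            result.append(snapshot)
        }
        previous = snapshot
    }
    return result
}
```

For the snapshots "Hel", "Hello", "Hello, world" this yields "Hel", "lo", ", world". Code that simply replaces the displayed text with the latest snapshot needs no diffing at all.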

Tool Use: Client-side vs. Server-side

Client-side Tools: The framework's standard tools: array is used. The framework invokes these tools on the device when Claude requests them.
Server-side Tools: Tools such as web search and web fetch run on Anthropic's infrastructure. These are configured via ClaudeLanguageModel and surface in the transcript as ClaudeServerToolSegment custom segments.
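The distinction can be illustrated with a small dispatch sketch. SketchTool, SketchToolCall, and runClientTool are hypothetical names, and the tool names in the usage note are invented; the sketch only shows that a client-side tool is looked up and executed locally, while a request that matches no local tool is not run on the device.

```swift
// Hypothetical client-side tool: takes string arguments, returns a string result.
typealias SketchTool = ([String: String]) -> String

// Hypothetical shape of a tool request coming back from the model.
struct SketchToolCall {
    let name: String
    let arguments: [String: String]
}

// Run the tool locally if it is registered on the device; return nil otherwise.
func runClientTool(_ call: SketchToolCall, registry: [String: SketchTool]) -> String? {
    guard let tool = registry[call.name] else { return nil }  // not a client-side tool
    return tool(call.arguments)
}
```

With a registry containing only a "local_time" tool, a call named "local_time" returns that closure's result, while a call named "web_search" returns nil because nothing on the device handles it.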

Authentication and Security

The package provides two authentication modes. The proxy mode exists so that shipping binaries never contain an API key:

API Key (Development): Used for rapid prototyping. Not recommended for production as keys are extractable from binaries.
Proxy (Production): Requests are routed through a developer's own backend. The relay adds the Claude API credential server-side, ensuring the app ships no keys. Custom headers can be provided for the proxy to authorize the caller.
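From the app's side, the proxy mode amounts to sending requests to the developer's relay with a caller credential instead of a Claude API key. The sketch below uses plain URLRequest from Foundation rather than the package's configuration API, which is not shown in this article; makeProxyRequest, the bearer-token scheme, and the relay URL in the usage note are assumptions.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

// Build a request aimed at the developer's own relay. The relay is assumed to
// attach the Claude API credential server-side before forwarding the call.
func makeProxyRequest(relayURL: URL, userToken: String, body: Data) -> URLRequest {
    var request = URLRequest(url: relayURL)
    request.httpMethod = "POST"
    request.httpBody = body
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // A per-user token lets the relay authorize the caller; no Claude API key is set here.
    request.setValue("Bearer \(userToken)", forHTTPHeaderField: "Authorization")
    return request
}
```

A call such as makeProxyRequest(relayURL: URL(string: "https://relay.example.com/v1/messages")!, userToken: token, body: payload) produces a request that carries only the user's token, so extracting strings from the app binary reveals no Claude credential.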

Error Handling and Limitations

The package maps Claude API errors to Apple's LanguageModelError cases. For example, HTTP 429 is mapped to .rateLimited and context-window overflow is mapped to .contextSizeExceeded.
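A sketch of such a mapping, using stand-in types. SketchModelError mirrors only the two case names mentioned above plus a catch-all; SketchAPIFailure and its isContextOverflow flag are hypothetical, since the article does not say how the package detects context-window overflow.

```swift
// Stand-in for the two LanguageModelError cases named in the text, plus a catch-all.
enum SketchModelError: Error, Equatable {
    case rateLimited
    case contextSizeExceeded
    case other(status: Int)
}

// Hypothetical shape of a failed API call.
struct SketchAPIFailure {
    let status: Int
    let isContextOverflow: Bool
}

// Translate a transport-level failure into a framework-style error.
func mapError(_ failure: SketchAPIFailure) -> SketchModelError {
    if failure.isContextOverflow { return .contextSizeExceeded }
    if failure.status == 429 { return .rateLimited }
    return .other(status: failure.status)
}
```

Callers can then handle a single error type regardless of which provider backed the session, for example backing off on .rateLimited and trimming the transcript on .contextSizeExceeded.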

Certain Messages API features are not available because they are not represented in Apple's protocol, including prompt caching controls (though caching is applied automatically), stop sequences, batch processing, and the token counting API.

Community Insights and Perspectives

Commenters have framed the move as a strategic shift by Apple:

"This is Apple commoditizing LLMs while keeping control of the UX. They are a hardware company and will keep selling the best machine for AI use."

Other developers have raised concerns about shipping API keys in apps and about the user experience of asking users to supply their own keys. Some suggest the abstraction layer is a long-term play by Apple: if developers route their external LLM calls through it, individual call sites can later move to Apple's on-device model as it improves.

"I think this is just Apple planning for their on-device models getting better... If developers use this for all their code calling an external LLM, then as Apple's model becomes more capable and covers more use cases it'll be easy to switch to it at individual call sites."
