
Two years ago, running an LLM on-device was a science project. In 2026 it is a deploy target. Both Google and Apple ship first-party on-device models with public APIs, typical device RAM finally accommodates 2-4B-parameter models, and power draw is defensible for features you run a few times a minute.

What we ship on-device today

  • Smart reply drafting in chat apps · 80-150ms, no spinner.
  • Receipt / invoice field extraction · fully offline, GDPR-trivial.
  • Photo caption + search index on-device.
  • Meeting transcription + bullet summary (whisper.cpp + a local summariser).

Gemini Nano · where it wins

  • Available on a much wider device matrix · Android 15+ with 8GB+ RAM.
  • AICore handles model updates without a Play Store release.
  • Summarisation + rewriting APIs are stable and predictable.
  • Stronger non-English output than Apple Intelligence, and it runs on mid-range hardware.

Apple Intelligence · where it wins

  • A17 Pro / M-series only · narrower matrix, but the models are meaningfully better.
  • The Writing Tools API is a drop-in replacement for a cloud call · zero glue code.
  • Private Cloud Compute fallback is automatic and audit-friendly.
  • Foundation-models API surface is more coherent · one SDK, not three.

Hungarian-language reality check

Both models underperform on Hungarian vs English. In our evals Gemini Nano gives ~85% acceptable-output rate on Hungarian summarisation, Apple Intelligence ~80%. For comparison, Claude 3.7 Haiku is ~97%. For Hungarian-heavy features we keep a cloud fallback for now.
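The gate behind those numbers is simple arithmetic: acceptable-output rate is the fraction of eval outputs a human rater approved, and a language only ships on-device if it clears a bar. A minimal sketch; the function names and the 95% threshold are illustrative assumptions, not values from our harness:

```kotlin
// Illustrative eval gate: a feature/language pair goes on-device only if the
// human-rated acceptable-output rate clears a threshold.
// NOTE: `acceptableRate`, `shipOnDevice`, and the 0.95 bar are assumptions
// for illustration, not part of any SDK or the eval described above.

fun acceptableRate(ratings: List<Boolean>): Double =
    if (ratings.isEmpty()) 0.0
    else ratings.count { it }.toDouble() / ratings.size

fun shipOnDevice(ratings: List<Boolean>, threshold: Double = 0.95): Boolean =
    acceptableRate(ratings) >= threshold
```

With an assumed 95% bar, an ~85% Hungarian rate keeps the cloud fallback and an ~97% rate would clear it.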

When we still call the cloud

  1. Any agentic flow with tool calls · on-device tool-use is fragile.
  2. Long-context tasks (> 8k tokens effective) · on-device context windows are still small.
  3. Safety-critical outputs · medical, legal, financial advice · we route to a policy-gated cloud call with audit logs.
  4. Multilingual features where non-English quality matters for conversion.
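The four rules above collapse into a single routing decision per request. A minimal sketch; `InferenceRequest`, `Route`, `routeRequest`, and the 8k-token / English-only thresholds are all illustrative names and assumptions, not any vendor's API:

```kotlin
// Hypothetical routing policy mirroring the four cloud-fallback rules.
// All names and thresholds here are illustrative assumptions.

data class InferenceRequest(
    val estimatedTokens: Int,
    val language: String,          // BCP-47 tag, e.g. "en", "hu"
    val needsToolCalls: Boolean,
    val safetyCritical: Boolean,   // medical / legal / financial
)

enum class Route { ON_DEVICE, CLOUD, CLOUD_POLICY_GATED }

fun routeRequest(req: InferenceRequest): Route = when {
    req.safetyCritical -> Route.CLOUD_POLICY_GATED  // rule 3: audit-logged path
    req.needsToolCalls -> Route.CLOUD               // rule 1: on-device tool-use is fragile
    req.estimatedTokens > 8_000 -> Route.CLOUD      // rule 2: small context windows
    req.language != "en" -> Route.CLOUD             // rule 4: non-English quality
    else -> Route.ON_DEVICE
}
```

Keeping the policy in one pure function makes it trivial to unit-test and to tighten per-platform as on-device quality improves.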

Always design the UI for a cloud fallback from day one. The right on-device feature feels instant when the model is present and works anyway when it is not.
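The day-one fallback shape can be as small as a try/catch around the on-device call. A sketch under stated assumptions: both function parameters stand in for real SDK calls (which would be async in production), and it assumes the on-device path throws when the model is absent or inference fails:

```kotlin
// Day-one fallback pattern: try the on-device model first, fall back to a
// cloud call on any failure (model not downloaded, unsupported device,
// inference error). The two function parameters are illustrative stand-ins
// for real SDK calls, not a real API.

fun summariseWithFallback(
    text: String,
    onDevice: (String) -> String,   // e.g. a Gemini Nano or Writing Tools call
    cloud: (String) -> String,      // the existing cloud endpoint
): String = try {
    onDevice(text)
} catch (e: Exception) {
    // The feature still works without the model; it just shows a spinner.
    cloud(text)
}
```

Because the UI only ever sees one function, the "instant when present, works anyway when not" behaviour comes for free.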

By

Dezso Mezo

Founder, DField Solutions

I've shipped production products from fintech to creator-tooling · for startups and enterprises, from Budapest to San Francisco.
