AI Sovereignty Is Two Questions — Most Teams Only Answer One

AI sovereignty usually gets framed as a data-residency question. That half gets attention. The other half — whether the workflow you built today still runs next quarter — often doesn't. Here's how we think about both at umage, and how we actually run it.

In February 2025, the US administration sanctioned the chief prosecutor of the International Criminal Court. Within days, he reportedly lost access to his Microsoft 365 account. Email, documents, the lot.

That is a sovereignty story — but only one kind. A pure continuity story. Nothing had leaked. Nothing had been misused. The service was just withdrawn.

Meanwhile, the sovereignty conversation most compliance teams are already having is the other one: where is your confidential data, and who can see it? That concern is not paranoid either. A US court order in the New York Times lawsuit forced OpenAI to preserve user conversations that would otherwise have been deleted — including from paying customers. If your team is pasting meeting transcripts, internal presentations, draft strategy documents, or unreleased product ideas into a chat interface, those are now assets sitting on somebody else’s disk for reasons that have nothing to do with you.

Both stories are sovereignty. The data one gets serious attention. The continuity one, often, does not — until it bites.

This post is how we think about both, and how we actually run them at umage.

Two risks, not one

The first risk — where your data lives and who can see it — is the one most compliance conversations already take seriously. Personal data leaking into a model training pipeline in a jurisdiction you cannot audit is a real concern. Internal strategy documents sitting on infrastructure you cannot inspect is a real concern. We take both seriously, and so do most of our clients.

The second risk is continuity. What happens to your operations if OpenAI changes its terms, Anthropic deprecates the model your agent depends on, Microsoft gets caught in a new sanctions regime, or a vendor quietly disappears from your market because of an embargo? You can answer every data question perfectly and still be one vendor decision away from a critical workflow not running next Tuesday.

These risks also compound. The vendor you trusted on the data question can become a continuity risk overnight, and the continuity fallback you reach for — “just switch providers” — can re-introduce the data problem you thought you had solved.

The fix is not moving everything local. It is being clear about both risks, and having fallbacks you actually control for the workflows that matter.
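A fallback you actually control can be as simple as an ordered provider list: try the hosted API first, and if it fails for any reason, answer from a model on your own hardware. A minimal sketch, with stub clients standing in for real ones (the provider names and the stubs are illustrative, not our production code):

```python
# Sketch of a continuity fallback: try providers in order, fall back to a
# model you control. Provider names and stub clients are illustrative.

def complete(prompt, providers):
    """Try each provider in order; return the first answer that comes back."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # timeout, revoked key, deprecated model, ...
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

# Stand-ins for real clients. In practice the frontier entry would wrap a
# hosted API, and the local entry an OpenAI-compatible server on your rig.
def frontier_api(prompt):
    raise ConnectionError("terms changed, key revoked, or model deprecated")

def local_model(prompt):
    return f"[local 30B] summary of: {prompt[:40]}"

providers = [("frontier", frontier_api), ("local", local_model)]
used, answer = complete("Summarise the Q3 incident report", providers)
print(used)  # prints "local": the frontier stub fails, the fallback answers
```

The point of the sketch is the shape, not the stubs: the local entry only counts as a fallback if the model, the endpoint, and the hardware behind it are things you can rebuild without anyone's permission.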

Where it matters — and where it doesn’t

We are not purists. Most of our development runs against Claude Code, Codex, and GitHub Copilot. For day-to-day engineering work, the best available model on the best available infrastructure wins every time. We would be slower and worse without it.

Sovereignty shows up in three specific places:

Sensitive data. Client IP, personal data, internal strategy. If the worst-case exposure is “it ends up training a model in a jurisdiction we do not control,” we keep it local.

Business-critical workflows. If a workflow stopping tomorrow would actually hurt, it gets a fallback that does not depend on someone else’s roadmap.

Scheduled, unattended agents. Anything that runs overnight, on a cron, or in a long loop. Per-token pricing surprises and rate limits are fine when a human is watching. They are not fine at 3 a.m. in the middle of a batch job.

Everything else is fair game for the frontier providers. Sovereignty is not a religion. It is a specific answer to specific questions.
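The third case above, scheduled unattended agents, is where per-token arithmetic turns ugly fastest. A back-of-envelope sketch; every number here is an illustrative assumption, not any vendor's actual pricing:

```python
# Back-of-envelope: why per-token pricing bites on unattended batch jobs.
# All numbers are illustrative assumptions, not real vendor rates.

pages = 5_000                 # pages reviewed in one overnight site audit
tokens_per_page = 3_000       # prompt + completion, rough average
price_per_1m_tokens = 10.00   # USD, hypothetical frontier-API rate

api_cost_per_run = pages * tokens_per_page / 1_000_000 * price_per_1m_tokens
print(f"one nightly run: ${api_cost_per_run:,.0f}")           # $150
print(f"per month (30 runs): ${api_cost_per_run * 30:,.0f}")  # $4,500
```

A rig that paid for itself in a few months of this, and that cannot be rate-limited mid-batch, is the comparison that matters — not benchmark scores.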

You don’t need a frontier model

The default assumption is that running AI in-house means running a worse ChatGPT. It does not.

For most real tasks — summarisation, extraction, classification, code navigation, a specific agent loop — a smaller dedicated model is within a few percentage points of the frontier. Sometimes it is faster. Often it is more predictable. Always it is yours.

A 30B model that does one thing well is usually a better production choice than a 400B generalist that does the same thing and also knows the capital of Burkina Faso. You are not hiring a trivia champion. You are running a workload.

What our kit actually looks like

We run three local rigs, sized for different trade-offs.

  • A desktop with 128 GB of unified memory. Runs very large models, slowly. Good for experimentation and one-off deep runs.
  • A dedicated AI rig with two (soon three) used NVIDIA RTX 3090s. Runs ~30B models fast because the VRAM is where it needs to be. This is the workhorse.
  • A backup server with a built-in mobile GPU. Smaller models at a reasonable pace. Picks up slack when the primary rigs are busy.

Used GPUs and a bit of thermal engineering go a long way. The total hardware bill is less than what plenty of teams spend per month on API credits.

On top of that, we lean on an open-source stack that has quietly become excellent:

  • Open WebUI — a local alternative to the ChatGPT web interface, pointed at whatever models we have running. Most of the team uses it as their daily driver for internal chat.
  • Vexa — meeting transcription and summarisation, end-to-end local. We maintain a fork with a few extras; worth a look if you want the setup to be less fiddly. A proper write-up is coming.
  • Invoke AI and ComfyUI — image, video, audio, and music generation. Everything creative-adjacent that we would otherwise be paying per-call for.
  • Chunkhound — indexing and semantic search across large codebases. Makes local agents useful on real repos instead of toy ones.

The workloads we actually run on this kit:

  • An image-recognition model we train ourselves, plus the agentic engine behind a client’s customer-support system. Both sit on our rigs, and we can rebuild them if something upstream changes.
  • Overnight batch jobs where local agents review entire websites for SEO, GEO, accessibility, content consistency, and PII leakage. The kind of work nobody wants to pay per-token for at any scale that matters.
  • Scheduled agents producing weekly digests of Optimizely community activity and the wider AI world. Local model, local cron, local storage.
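The overnight review jobs above all share one shape: iterate over pages, ask a local model for findings, write a report to local disk. A minimal sketch of that loop, with a stub standing in for the model call (the rules, URLs, and names here are placeholders, not our actual checks):

```python
# Sketch of an unattended batch review loop. check_page is a stub standing
# in for a call to a local model endpoint on the rig; pages would come from
# a crawl. All names and rules here are illustrative.
import datetime
import json

def check_page(url, html):
    """Stub reviewer: a real version would send the page to a local model."""
    findings = []
    if "ssn" in html.lower():
        findings.append("possible PII leakage")
    if "<h1" not in html.lower():
        findings.append("missing top-level heading (SEO)")
    return findings

pages = {
    "https://example.com/":      "<h1>Hello</h1>",
    "https://example.com/staff": "<p>SSN on file</p>",
}

report = {url: check_page(url, html) for url, html in pages.items()}
out = {"generated": datetime.date.today().isoformat(), "findings": report}
print(json.dumps(out, indent=2))  # written to local disk, not posted anywhere
```

Local model, local cron, local storage: the loop runs the same way whether or not any vendor still likes you.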

None of this is exotic. The pieces have existed for more than a year. What changed is that they got good enough.

Scaling past the home lab

Rigs under a desk are fine for R&D, internal tooling, and some client workloads. For production-grade sovereign AI at scale, the next step is renting dedicated GPU capacity from an EU — ideally Danish — datacentre.

You get bare-metal GPUs, a physical location you can point to in a compliance document, a jurisdiction your legal team already understands, and, crucially, infrastructure that is not shared with a vendor whose business model might change. The models you run are not frontier. They also do not need to be. A well-tuned 30B or 70B model on dedicated infrastructure beats a frontier API on the specific task you care about, most of the time, and keeps running regardless of which way the political wind is blowing this quarter.

You can answer the data question perfectly and still be one vendor decision away from a workflow that does not run on Monday.

How to start

If this lands and you want a sovereign fallback for at least your most important workflows by end of quarter, the short version:

Inventory both ends. List the workflows that touch confidential data, and list the workflows that would actually hurt if they stopped tomorrow. The two lists overlap, but not fully. The intersection is where you start.
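The inventory step is literally a set intersection. A toy sketch with placeholder workflow names:

```python
# Sketch of the inventory step: two lists, start at the intersection.
# Workflow names are placeholders.
confidential = {"meeting-transcripts", "client-ip-review", "hr-screening"}
critical     = {"client-ip-review", "support-agent", "meeting-transcripts"}

start_here = sorted(confidential & critical)
print(start_here)  # ['client-ip-review', 'meeting-transcripts']
```

Everything in the intersection gets a sovereign fallback first; everything in only one set gets triaged after.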

Pilot one workload locally. One agent, one model, one rented GPU. Open WebUI plus a 30B model on a rented EU dedicated server is a weekend project, not a six-month procurement cycle.

Keep the frontier. We still do most of our development against Claude, GPT, and Gemini. Moving away from them for everything would be slower and more expensive. Moving away from them for the workflows that cannot fail is common sense.

You do not need to win an argument about AI sovereignty. You just need an answer for the Monday morning when the argument stops being hypothetical.