mra.studio
~ $ cat ./research/stablecoin-regtech.md
Research · 2026-03-30 · 11 min

Stablecoin compliance, automated

An evaluation harness for LLM-driven Travel Rule and beneficial-owner checks at the speed of settlement.

Stablecoin · Travel Rule · RegTech · Evals

Regulated stablecoins are quietly becoming the rails for cross-border payments that don’t need a card network. The compliance overhead, however, has not gone anywhere. If you are moving USDC between a Canadian counterparty and a European one, you still owe the same Travel Rule attestations, beneficial-owner checks, and source-of-funds reasoning as if you were sending a wire. The difference is that the rail is faster than the compliance step.

That mismatch is where AI helps. Compliance steps that used to take a human five minutes can be drafted by an LLM in seconds, with the analyst’s time reserved for the cases where automation legitimately doesn’t know enough to decide. The trick is building an evaluation harness that lets you trust the model without the automation itself becoming the next regulatory finding.

What the Travel Rule actually asks

FATF’s Recommendation 16 (the “Travel Rule”) requires originators and beneficiaries of virtual-asset transfers above a threshold to exchange identifying information. In practice this means name, address, account or wallet identifier, and a reference to a verified identity record. Different jurisdictions layer additional requirements on top: beneficial-owner attestation, purpose-of-payment categorisation, source-of-funds reasoning.
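In code, that minimum payload might be modelled as a typed record. This is a sketch, not a real message format (production systems would use a standard such as IVMS 101); all field names here are illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TravelRuleMessage:
    """Minimal Travel Rule payload; field names are illustrative."""
    originator_name: str
    originator_address: str
    originator_wallet: str           # account or wallet identifier
    beneficiary_name: str
    beneficiary_wallet: str
    identity_record_ref: str         # reference to a verified identity record
    # Jurisdiction-specific extras, present only where a regime requires them:
    beneficial_owner_attestation: Optional[str] = None
    purpose_of_payment: Optional[str] = None
    source_of_funds_note: Optional[str] = None
```

The point of the typed object is that completeness checks become field checks: a missing jurisdiction-specific extra is an explicit `None`, not an absent key.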

The hard part isn’t the data; it’s the reconciliation. Counterparties send messages in different formats, with different levels of completeness, often with names transliterated awkwardly between scripts. A human used to spend their day matching the message to the wallet to the regulatory record. An LLM can do most of that match for you, if you let it.

The evaluation harness

Before we let any model touch a Travel Rule message we run it through an eval suite that we maintain in Vertex AI’s evaluation service. The corpus has a few thousand labelled cases drawn from public examples, anonymised production data, and synthetic edge cases (deliberate spelling errors, sanctioned counterparties, deliberately incomplete beneficial-owner records).

Each case has a ground-truth disposition: auto-approve, escalate, or reject. The harness runs the candidate model and emits a confusion matrix with three diagonal cells we care about and six off-diagonal cells we hate. We do not deploy a change unless the off-diagonal counts get smaller.
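The deployment gate above can be sketched in a few lines. This is illustrative, not the production harness; `confusion` and `passes_gate` are hypothetical names, and the input is assumed to be (ground truth, predicted) pairs:

```python
from collections import Counter

DISPOSITIONS = ("auto-approve", "escalate", "reject")

def confusion(pairs):
    """pairs: iterable of (ground_truth, predicted) dispositions.
    Returns the full cell counts, the diagonal total (correct),
    and the off-diagonal total (the six cells we hate)."""
    cells = Counter(pairs)
    diagonal = sum(cells[(d, d)] for d in DISPOSITIONS)
    off_diagonal = sum(n for k, n in cells.items() if k[0] != k[1])
    return cells, diagonal, off_diagonal

def passes_gate(baseline_off_diagonal, candidate_off_diagonal):
    """Deploy only if the candidate shrinks the off-diagonal count."""
    return candidate_off_diagonal < baseline_off_diagonal
```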

We also score on calibration. A model that auto-approves 95% of cases and is right 95% of the time isn’t the same as a model that auto-approves 50% of cases and is right 99% of the time. The second one is a better operator, even if its headline accuracy is lower. The harness reports calibration alongside accuracy.
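The arithmetic behind that comparison is worth making explicit. Per 1,000 cases, the 95%/95% model wrongly auto-approves about 47.5 cases; the 50%/99% model wrongly auto-approves about 5, at the cost of sending more work to analysts. A tiny helper (hypothetical name) makes the trade-off computable:

```python
def wrong_auto_approvals(coverage, precision, n_cases):
    """Expected incorrectly auto-approved cases out of n_cases.
    coverage: fraction of cases the model auto-approves.
    precision: fraction of auto-approvals that are correct."""
    return n_cases * coverage * (1.0 - precision)

# Model A: 95% coverage at 95% precision -> ~47.5 bad approvals per 1,000.
# Model B: 50% coverage at 99% precision -> ~5 bad approvals per 1,000.
```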

Beneficial owner reasoning

Beneficial-owner verification is the most LLM-amenable surface. The evidence is heterogeneous (corporate registries, government gazettes, scanned articles of association) and the answer is structured: a list of natural persons with ownership percentages and the chain of entities linking them to the counterparty. Document AI extracts the fields; Gemini Pro reasons over them; the output is a typed object the rest of the system consumes.
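A sketch of what that typed object might look like, assuming a deliberately linear chain for simplicity (real chains branch, and none of these names are the production schema):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class OwnershipLink:
    owner: str               # entity or natural-person name
    pct: float               # ownership percentage of the entity below it
    is_natural_person: bool

@dataclass
class BeneficialOwnerChain:
    links: List[OwnershipLink]  # ordered counterparty -> ... -> ultimate owner

    def unresolved(self) -> bool:
        """True when the chain does not bottom out at a natural person."""
        return not self.links or not self.links[-1].is_natural_person

    def escalation_summary(self) -> Optional[str]:
        """One-line summary of what stopped the chain, for the human queue."""
        if not self.unresolved():
            return None
        stop = self.links[-1].owner if self.links else "the counterparty"
        return f"chain stops at {stop}; no natural person resolved"
```

The `escalation_summary` method is the hand-off described below: the analyst sees the gap, not the boilerplate that resolved cleanly.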

Where the model isn’t confident—say a holding company introduces a layer the registry doesn’t resolve—the case kicks to a human with a one-line summary of what stopped the chain. The human spends their time on the gap, not on the boilerplate.

Source of funds, structured

Source of funds is the surface where models hallucinate most. It’s tempting to ask “is this consistent with a legitimate business?” and let the LLM riff. We’ve learned to constrain it harder. The model produces a bounded narrative with citations to specific source documents, and the narrative is checked against a checklist of red-flag categories (cash-intensive industries, high-risk geographies, recent rapid growth without funding events). Every red flag must be either confirmed by the document or recorded as “not present”.
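The “every flag confirmed or recorded as not present” rule is easy to enforce mechanically. A sketch, using the three red-flag categories from the paragraph above as the checklist (the function name and entry shape are illustrative):

```python
RED_FLAGS = (
    "cash_intensive_industry",
    "high_risk_geography",
    "rapid_growth_without_funding",
)

def validate_checklist(findings):
    """Reject any source-of-funds output that leaves a red flag
    undispositioned, or confirms one without a document citation."""
    for flag in RED_FLAGS:
        entry = findings.get(flag)
        if entry is None:
            raise ValueError(f"missing disposition for {flag}")
        status = entry.get("status")
        if status not in ("confirmed", "not_present"):
            raise ValueError(f"invalid status {status!r} for {flag}")
        if status == "confirmed" and not entry.get("citation"):
            raise ValueError(f"{flag} confirmed without a document citation")
    return True
```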

The checklist is what makes this defensible. A regulator examining the case six months later sees the structured output, the red-flag enumeration, and the supporting documents—not a free-form essay.

What stops a stablecoin transfer

Across the cases we’ve seen, three patterns trigger an escalation:

  • The Travel Rule message is incomplete or inconsistent with the wallet’s history.
  • The beneficial-owner chain bottoms out before resolving to natural persons.
  • The source-of-funds narrative cites documents the model can’t verify against the registry.

Roughly two-thirds of escalations are the first pattern, which also has the highest auto-resolution rate after the analyst pulls one missing field. The second pattern is rare but expensive when it happens. The third pattern is the one we keep iterating on; the eval set has grown disproportionately around it.
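The three triggers reduce to a single ordered check. A sketch (the boolean inputs and pattern names are illustrative; in practice each input is itself the output of a subsystem described above):

```python
def escalation_reason(message_consistent, chain_resolved, sources_verified):
    """Return the first triggered escalation pattern, or None to proceed.
    Order matches observed frequency: message issues dominate."""
    if not message_consistent:
        return "incomplete_or_inconsistent_message"
    if not chain_resolved:
        return "unresolved_beneficial_owner_chain"
    if not sources_verified:
        return "unverifiable_source_documents"
    return None
```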

What we will not do

We will not auto-approve a Travel Rule message on the strength of an LLM’s reasoning alone. The model gets a vote, the rules engine gets a vote, and a human gets a vote on the cases the rules engine flags. Three votes, two of them deterministic, one explainable. That is the only configuration that has survived contact with a regulatory examination so far.
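One way the three-vote composition could look in code. This is a sketch under stated assumptions, not the actual rules engine: I am assuming the rules engine emits `pass`, `flag`, or `reject`, and that a flagged case always collects a human vote before disposition:

```python
def disposition(model_vote, rules_vote, human_vote=None):
    """Combine the three votes. The deterministic rules engine can
    always veto; the human decides every flagged case; auto-approval
    requires the model and the rules engine to agree."""
    if rules_vote == "flag":
        if human_vote is None:
            raise ValueError("flagged cases require a human vote")
        return human_vote                      # human is final on flags
    if rules_vote == "reject" or model_vote == "reject":
        return "reject"                        # either voter can reject
    if rules_vote == "pass" and model_vote == "auto-approve":
        return "auto-approve"                  # both must agree to approve
    return "escalate"                          # anything ambiguous goes up
```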

We will not train on customer transaction data without an explicit opt-in. Most of our improvement signal comes from synthetic data and the public corpus; the marginal lift from real data is not worth the trust cost.

What’s next

We’re building a regulator-facing reproducer: a small CLI tool that takes a case ID, replays the exact prompt, retrieval, and model version, and produces the same verdict to within floating-point noise. The point is not to convince a regulator the model is right. It is to show that nothing is hiding.
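The core of such a reproducer might look like the following. Everything here is illustrative: `store` stands in for whatever archive pins the prompt, retrieval snapshot, and model version per case, and the fingerprint simply proves those pinned inputs are byte-identical on replay:

```python
import hashlib
import json

def replay(case_id, store):
    """Replay a case from its pinned inputs. Returns the recorded
    verdict and a fingerprint over (prompt, retrieval, model), so two
    replays of the same case are provably running the same inputs."""
    case = store[case_id]
    pinned = {k: case[k] for k in ("prompt", "retrieval", "model")}
    fingerprint = hashlib.sha256(
        json.dumps(pinned, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return case["verdict"], fingerprint
```

A CLI wrapper around this is a few lines of argument parsing; the substance is that the fingerprint is deterministic, so a regulator running the tool twice gets the same answer for the same case.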