Provider Sentinel · AI provider behavior monitoring
Know when the AI your product depends on changes.
Provider Sentinel creates sealed behavior baselines for closed AI providers and produces evidence packs when observable behavior drifts.
First verify the test stayed fixed. Then report what changed.
View 1
Provider Matrix
See which AI providers are stable, drifting, blocked, or still being calibrated.
Provider Sentinel verifies the test did not change before it reports that provider behavior changed.
OpenAI · gpt-4.1 · Provider-declared
- Last run: 2026-05-01
- Drift state: Stable
- Canaries: 3 Gold / 151 Silver
- Evidence: Verified
Anthropic · claude-sonnet-4-6 · Provider-declared
- Last run: 2026-05-01
- Drift state: Stable
- Canaries: 0 Gold / 183 Silver
- Evidence: Verified
Gemini · gemini-2.5-pro · Provider-declared
- Last run: 2026-05-01
- Drift state: Stable
- Canaries: 0 Gold / 112 Silver
- Evidence: Verified
Mistral · mistral-large-latest · Alias-tracked
- Last run: 2026-05-01
- Drift state: Stable
- Canaries: 0 Gold / 279 Silver
- Evidence: Verified
Status legend: Stable · Drift detected · Context changed · Noisy · Blocked · Unknown
Model tracking: Provider-declared · Alias-tracked · Pinned model
Canary counts for OpenAI (CCS001), Anthropic (CCS002), Gemini (CCS003), and Mistral (CCS004) are drawn from current artifacts.
Alias-tracked models
Some providers expose movable aliases such as latest. Provider Sentinel can monitor those aliases intentionally. If the alias resolves to a different model or provider-declared identity, the report treats that as a provider metadata change or alias-contract drift, not as proof of a hidden weight change.
Alias tracking is useful. It is simply not the same as pinned model identity.
How it works
A closed-loop measurement pipeline: declare the test, seal the context, rerun it later, and report what changed.
Provider × Model × Suite × Time
Step 1 · Create a baseline
Run a declared probe suite against a provider/model under fixed settings.
Step 2 · Seal the context
Record the probe suite, adapter config, comparator policy, and capture-policy version.
Step 3 · Rerun later
Repeat the same probes under the same sealed context.
Step 4 · Report drift
Generate an evidence pack showing what stayed stable, what changed, and why the result is valid.
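The four steps form one closed loop. A minimal sketch in Python, using hypothetical names (`seal_context`, `run_suite`, and `report_drift` are illustrative, not Provider Sentinel's API):

```python
import hashlib
import json

def seal_context(probe_suite, adapter_config, comparator_policy, capture_policy):
    """Step 2: fingerprint everything that defines the test itself."""
    blob = json.dumps(
        {
            "probe_suite": probe_suite,
            "adapter_config": adapter_config,
            "comparator_policy": comparator_policy,
            "capture_policy": capture_policy,
        },
        sort_keys=True,
    )
    return hashlib.sha256(blob.encode()).hexdigest()

def run_suite(probe_suite, call_model):
    """Steps 1 and 3: run the declared probes under fixed settings."""
    return {probe["id"]: call_model(probe["prompt"]) for probe in probe_suite}

def report_drift(baseline, rerun, baseline_seal, rerun_seal):
    """Step 4: compare behavior only if the sealed context matched."""
    if baseline_seal != rerun_seal:
        return {"verdict": "context-mismatch", "changed": []}
    changed = sorted(k for k in baseline if baseline[k] != rerun[k])
    return {"verdict": "drift" if changed else "stable", "changed": changed}
```

Because the seal covers the probe suite, adapter config, comparator policy, and capture policy, a rerun whose seal differs is reported as a context mismatch rather than as drift.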
View 2 · Provider Detail
OpenAI / gpt-4.1
What exactly is happening with this provider/model?
Current status: Stable
- Last baseline: —
- Last check: —
- Suite: sentinel-behavioral-v1
Comparison context
- Model tracking: Provider-declared
- Requested model: gpt-4.1
- Declared model: captured when provider exposes it
- Capture policy: openai-capture-policy/v1
- System fingerprint: captured when provider exposes it
- Adapter config: openai-v1.adapter
- Comparator policy: behavioral.v1
Behavior summary
- 31 probes checked
- 31 visible outputs stable
- 31 finish reasons stable
- 31 token counts stable
- 31 logprob digests stable
- Raw response envelopes changed as expected
Provider Sentinel records the requested model selector and, when exposed, the provider-declared model identity. Alias movement is reported separately from behavioral drift.
| Surface | Probes | Stable | Drifted | Notes |
|---|---|---|---|---|
| Exact canaries | 3 | 3 | 0 | ExactBytes comparator, deterministic prompts. |
| Arithmetic / constrained reasoning | 4 | 4 | 0 | Numeric equality on parsed answer. |
| Transform & canon | 7 | 7 | 0 | Lower/upper/reverse, short canonical sentences. |
| Schema conformance | 4 | 4 | 0 | JSON schema validation under same temperature. |
| Instruction hierarchy | 3 | 3 | 0 | System-over-user precedence holds. |
| Extraction | 4 | 4 | 0 | Span extraction stable under fixed seed. |
| Classification | 3 | 3 | 0 | Label class stable; logprob distribution captured. |
| Refusal boundary | 2 | 2 | 0 | Refusal class stable; phrasing not asserted. |
| Multi-turn | 1 | 1 | 0 | Multi-turn haiku probe; visible-text comparator. |
Raw provider envelopes can change because providers include per-request metadata. Behavioral drift is classified by the declared comparator policy, not by raw envelope changes.
Latest evidence pack
OpenAI gpt-4.1 — —
View 3
Drift Timeline
Example timeline showing how a drift event would appear. Not an observed production incident.
- May 01 BEH001 Baseline accepted
- May 01 BEH002 Stable
- May 02 CHECK001 Stable
- May 03 CHECK002 Stable
- May 04 CHECK003 Drift detected (example drift scenario)
This sample event demonstrates the UI; it is not an observed production incident.
Context matched
- ✓ same probe suite
- ✓ same adapter config
- ✓ same comparator policy
- ✓ same capture policy
- ✓ same system fingerprint when exposed
Changed
- 2 exact canaries changed
- 1 Silver creative canary changed
- distribution changed on 7 probes
Outcome: Evidence pack generated
Non-claim · Observable drift is not proof of hidden weight mutation.
View 4
Probe Explorer
Which exact behavior surface changed?
Each probe is evaluated by its declared comparator. Exact probes, schema probes, refusal probes, and creative canaries are not judged the same way.
| Probe | Category | Comparator | Verdict | Tier |
|---|---|---|---|---|
| p_pong | Canary | ExactBytes | Stable | — |
| ccs_haiku_revise_noun_swap | Creative | Text canary | Stable | Silver canary |
| ccs_haiku_revise_noun_swap_v2 | Creative | Text + distribution | Stable | Gold canary |
| hierarchy_system_over_user | Instruction hierarchy | SemanticClass | Stable | — |
| schema_status_object | Schema | SchemaValid | Stable | — |
| refusal_disallowed_content | Refusal | RefusalClass | Stable | — |
| extract_invoice_total | Extraction | SpanEqual | Stable | — |
| classify_sentiment_pos | Classification | LabelEqual | Stable | — |

Baseline and current outputs (identical in this run):
- p_pong: pong
- ccs_haiku_revise_noun_swap: A spring rain falls / over the rusted iron gate / a thrush remembers
- ccs_haiku_revise_noun_swap_v2: frost on the hinges / the postal box stays empty / wind reads its own name
- hierarchy_system_over_user: class: refused-per-system
- schema_status_object: {"status":"ok","code":200}
- refusal_disallowed_content: class: refusal
- extract_invoice_total: $1,248.30
- classify_sentiment_pos: positive
Digest details are collapsed under "Technical commitments" inside each probe row.
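The comparator names in the probe rows can be read as small predicates over baseline and current outputs. A toy sketch of three of them (illustrative stand-ins, not Provider Sentinel's implementations; SchemaValid here checks key presence and value types rather than full JSON Schema validation):

```python
import json

def exact_bytes(baseline: bytes, current: bytes) -> bool:
    """ExactBytes: byte-identical outputs, used for deterministic canaries."""
    return baseline == current

def schema_valid(output: str, required_keys: dict) -> bool:
    """Toy stand-in for SchemaValid: the output must parse as JSON and
    each required key must be present with the expected type."""
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return False
    return all(isinstance(obj.get(k), t) for k, t in required_keys.items())

def label_equal(baseline: str, current: str) -> bool:
    """LabelEqual: classification label unchanged, ignoring case/whitespace."""
    return baseline.strip().lower() == current.strip().lower()
```

The point of declaring the comparator per probe is that a creative canary and a schema probe are never judged by the same rule.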
View 6
Creative Canary Results
Creative prompts are normally noisy. Provider Sentinel searches for prompt/seed pairs that reproduce under sealed controls. Stable pairs become creative canaries. If they later break under the same context, that is an early signal of provider-service behavior drift.
Gold and Silver are evidence tiers, not quality scores.
Gold canary
Visible text + distribution stable.
Silver canary
Visible text stable; distribution unstable or unavailable.
Noise-floor / rejected
Text unstable, fingerprint changed, context mismatch, or provider failure.
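Under these definitions, tier assignment is a small decision rule. A sketch with hypothetical names; `distribution_stable` is `None` when the provider exposes no distribution artifact, which makes Gold unavailable:

```python
def canary_tier(text_stable, distribution_stable, context_ok):
    """Assign an evidence tier to a prompt/seed pair.

    text_stable:         visible text reproduced under sealed controls
    distribution_stable: True/False, or None if no distribution artifact
    context_ok:          sealed context matched and provider call succeeded
    """
    if not context_ok or not text_stable:
        return "rejected"          # noise floor: text or context failed
    if distribution_stable is True:
        return "gold"              # visible text + distribution stable
    return "silver"                # text stable; distribution unstable or unavailable
```

This matches the provider cards below: adapters without a distribution artifact can produce only Silver, never Gold.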
OpenAI gpt-4.1 · Search run: CCS001
- Gold: 3
- Silver: 151
- Rejected: 146
300 prompt/seed pairs tested
Anthropic claude-sonnet-4-6 · Search run: CCS002
- Gold: 0
- Silver: 183
- Rejected: 117
300 prompt/seed pairs tested
Gold unavailable: provider does not expose distribution artifacts in the current adapter/API surface.
Gemini gemini-2.5-pro · Search run: CCS003
- Gold: 0
- Silver: 112
- Rejected: 188
300 prompt/seed pairs tested
Silver only: provider does not expose a distribution artifact, so Gold is unavailable.
Mistral mistral-large-latest · Search run: CCS004
- Gold: 0
- Silver: 279
- Rejected: 21
300 prompt/seed pairs tested
Silver only: distribution artifact unsupported across all pairs in this run, so Gold is unavailable.
Creative canary stability is an observed provider-service property under declared controls. It is not proof of provider weight identity or future guaranteed reproduction.
Why the comparison is trustworthy
Provider Sentinel does not only compare model outputs. Before reporting drift, it checks that the test itself stayed the same.
- The same probe suite was used.
- The same provider settings were used.
- The same comparison policy was used.
- The same capture policy was used.
If the capture policy changes, Provider Sentinel treats the comparison as a context mismatch — not provider drift — until the change is explicitly acknowledged.
Trust feature
Capture-policy versioning
Capture-policy versioning records the exact rules used to turn raw provider responses into behavioral observations, so parser or adapter changes cannot silently look like provider drift.
We verify the measuring instrument did not change before we say the provider changed.
View 5
Evidence Pack Viewer
Can I hand this to security, compliance, or vendor review?
Artifact title
OpenAI GPT-4.1 Provider Service Behavior Fingerprint
What this pack contains
- ✓ run context
- ✓ probe suite identity
- ✓ provider/model settings
- ✓ capture-policy version
- ✓ observations
- ✓ receipts
- ✓ verdicts
- ✓ drift report
- ✓ non-claim statement
What this proves
- the same test was run
- the provider response was captured
- the comparison policy was fixed
- the behavior stayed stable or changed
- the evidence pack has not been altered
What this does not prove
- model weights changed
- provider acted maliciously
- provider internals are known
- all possible behavior is covered
› Technical commitments
- Probe suite fingerprint: captured
- Adapter config fingerprint: captured
- Comparator policy fingerprint: captured
- Capture policy: openai-capture-policy/v1
- Observation root: captured
- Local verification: available
- Signed envelopes: coming next
Evidence-pack summaries can report whether a run used one capture policy version or requires review.
```json
{
  "capture_policy_version": {
    "status": "homogeneous",
    "value": "openai-capture-policy/v1"
  }
}
```
The status field is a planned shape. Today, evidence packs record the capture-policy version per receipt; the run-level summary field is coming next.
- homogeneous: Every receipt in the run used the same capture policy.
- heterogeneous: Capture policy changed within the run; comparison requires review.
- absent: Legacy pre-versioning capture.
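One way the planned run-level summary could be derived from per-receipt fields (a sketch under the stated semantics, not the shipped implementation; a receipt without a version is treated as legacy capture):

```python
def summarize_capture_policy(receipts):
    """Collapse per-receipt capture_policy_version values into a
    run-level status: homogeneous, heterogeneous, or absent."""
    if not receipts:
        return {"status": "absent"}
    versions = {r.get("capture_policy_version") for r in receipts}
    if versions == {None}:
        return {"status": "absent"}          # legacy pre-versioning capture
    if None in versions or len(versions) > 1:
        return {"status": "heterogeneous"}   # mixed within the run: needs review
    return {"status": "homogeneous", "value": versions.pop()}
```

A run that mixes versioned and legacy receipts is deliberately classified heterogeneous here, since the comparison would require review either way.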
What teams use Provider Sentinel for
- Catch provider drift before debugging your own app.
- Keep evidence when external AI services change behavior.
- Track refusal and instruction-boundary behavior over time.
- Export evidence packs for engineering, security, compliance, or vendor review.
Proof boundary
Provider Sentinel detects observable provider-service behavior drift under declared conditions. It does not prove hidden model-weight changes without provider or execution attestation.
Provider Sentinel does prove
- What probe suite was run
- What provider settings were used
- What capture policy projected the response
- What behavior was observed
- Whether later behavior differs under the same sealed context
Provider Sentinel does not prove
- Model weights changed
- A provider acted maliciously
- Internal provider infrastructure changed
- All possible behavior is covered
Provider service behavior drift is not proof of hidden weight mutation without provider or execution attestation.
Technical detail
› Capture-policy versions, per provider
Provider Sentinel receipts include an optional capture_policy_version field. Each provider adapter declares its capture policy.
- openai-capture-policy/v1
- anthropic-capture-policy/v1
- gemini-capture-policy/v1
- mistral-capture-policy/v1
Evidence packs summarize the run with a status of homogeneous, heterogeneous, or absent.
› Gemini guardrail
Gemini may return hidden reasoning-like parts alongside answer text. Provider Sentinel's Gemini adapter applies a deterministic capture policy that keeps answer text and excludes thought-marked parts.
The policy is versioned, so if those projection rules change, future comparisons can detect the adapter-policy change rather than misclassify it as provider drift.
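A sketch of such a projection, with illustrative field names rather than the actual Gemini response shape: parts flagged `thought=True` are dropped, everything else is concatenated in order:

```python
def project_answer_text(parts):
    """Deterministic capture-policy projection: keep answer text,
    exclude thought-marked parts. Field names are illustrative."""
    return "".join(p["text"] for p in parts if not p.get("thought"))
```

Because the projection is pure and versioned, the same raw response always yields the same behavioral observation under a given capture-policy version.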
› Comparison contract
For behavioral comparison, Provider Sentinel requires the same probe suite, adapter config, comparator policy, and capture-policy version.
If capture_policy_version differs between two runs, response-text-layer comparison is treated as a context mismatch, not provider drift.
› Alias-tracked drift semantics
When a provider alias is monitored, Provider Sentinel separates:
- same alias + same declared model + behavior changed → behavior drift under the same declared model
- same alias + changed declared model + behavior changed → alias-contract drift
- same alias + changed declared model + behavior stable → provider metadata changed
- changed alias/config → context mismatch
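These outcomes form a small decision table. A sketch with hypothetical names (the unlisted case, same alias + same declared model + stable behavior, is simply a stable check):

```python
def classify_alias_event(same_alias, same_declared_model, behavior_stable):
    """Classify one alias-tracked rerun against its baseline."""
    if not same_alias:
        return "context-mismatch"          # changed alias/config: not comparable
    if same_declared_model:
        return "stable" if behavior_stable else "behavior-drift"
    # Alias now resolves to a different declared model identity.
    return "provider-metadata-change" if behavior_stable else "alias-contract-drift"
```

The key separation is that a moved alias with unchanged behavior is reported as metadata change, never as behavioral drift.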
Become a design partner
Pilot Provider Sentinel against the AI provider your product depends on. Sealed baselines, scheduled reruns, evidence packs you can hand to a reviewer.