Regulator-Grade AI Agent Certification
Not a vendor self-attestation. An independent third-party evaluation with a signed, tamper-evident certificate and public registry listing.
Enterprise procurement teams, institutional investors, and regulated-industry buyers require third-party evaluation because vendors cannot audit themselves, for the same reason you need a Big Four auditor for your financials.
$4,990 one-time evaluation · $1,990 quarterly retest · 5-suite, 250-test assessment · 90-day cert validity
Why buyers cannot accept a vendor's own eval
Vendor self-attestation
- ✗ Conflict of interest — vendor controls methodology and results
- ✗ No replay verification — scores cannot be independently confirmed
- ✗ No tamper detection — report can be modified post-generation
- ✗ Fails enterprise procurement requirements for third-party audit
Independent third-party evaluation
- ✓ Independent — no commercial relationship with model providers
- ✓ Deterministic replay — any party can verify scores match the cert
- ✓ Tamper-evident — HMAC-SHA256 hash checked on every public view
- ✓ Public registry — permanent, citable record for procurement RFPs
Frameworks this evaluation addresses
The evaluation produces documented technical evidence relevant to the following frameworks.
Mention of regulatory frameworks does not constitute legal advice. Consult qualified legal counsel for your specific requirements.
Anthropic Responsible Scaling Policy
Third-party capability and safety threshold documentation required for frontier model deployment decisions.
OpenAI Preparedness Framework
Independent frontier risk assessment documentation supporting safe deployment decisions.
EU AI Act Article 50
Transparency obligations for general-purpose AI systems require documented technical evidence.
ISO/IEC 42001 AI Management System
Third-party evaluation evidence for AI management system audit and certification.
5 evaluation suites · 250 test cases total
Each suite runs 50 deterministic test cases against open benchmark datasets.
1. Capability Benchmarking
MMLU-Pro, MATH, HumanEval, and agentic task datasets. Measures raw task capability across domains.
Datasets: MMLU-Pro, MATH-500, HumanEval, AgentBench
2. Safety Alignment
HarmBench harmful request categories. Measures alignment quality and refusal on clearly harmful prompts.
Datasets: HarmBench, AdvBench, SafetyBench
3. HarmBench Jailbreak Resistance
Adversarial jailbreak attempts from HarmBench. Measures robustness against prompt injection and manipulation.
Datasets: HarmBench Jailbreaks, JailbreakBench, WildJailbreak
4. Tool-Use Correctness
Structured tool-calling scenarios. Measures API call format accuracy, parameter correctness, and error handling.
Datasets: ToolBench, APIBench, ToolEval
5. Refusal Calibration
Borderline prompts testing calibration. Measures appropriate refusal without over-refusing benign requests.
Datasets: RefusalBench, XSTest, TruthfulQA
Pricing
Simple, transparent pricing. No hidden fees.
One-time evaluation · $4,990
- 5-suite, 250-test evaluation
- Signed, tamper-evident certificate
- Public registry listing
- Replay verification endpoints
- 90-day certificate validity
- Badge embed code
- Full methodology report (PDF)
Quarterly retest · $1,990
- Re-runs all 5 suites
- Certificate renewed for 90 days
- Registry listing updated
- Score delta comparison report
- Covers model version and prompt changes
- Required for continuous registry status
Trust architecture
Deterministic replay
Every run signed with HMAC-SHA256. Any party can verify scores match.
Tamper detection
Hash recomputed on every public view. Mismatch triggers automatic REVOKED status.
Public registry
Permanent public record at regulatorysignals.com/agent-eval-registry.
Independence
We have no commercial relationship with model providers. Evaluation is conflict-free.
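For illustration, here is a minimal sketch of how the per-view tamper check could work, assuming the certificate payload is flat canonical JSON and the registry holds the signing key. All names here are illustrative, not the production implementation.

// Tamper-check sketch (TypeScript, Node.js). Assumes a flat payload with
// a stable serialization and a registry-held signing key; names illustrative.
import { createHmac, timingSafeEqual } from "node:crypto";

interface StoredCert {
  payload: Record<string, unknown>; // evaluation payload as issued (flat)
  certHash: string;                 // hex HMAC-SHA256 recorded at issuance
}

// Recompute the HMAC over a stable serialization of the payload.
// Sorting top-level keys keeps the hash reproducible across services.
function recomputeHash(payload: Record<string, unknown>, key: Buffer): string {
  const canonical = JSON.stringify(payload, Object.keys(payload).sort());
  return createHmac("sha256", key).update(canonical).digest("hex");
}

// Run on every public view; any mismatch flips the listing to REVOKED.
function checkOnView(cert: StoredCert, key: Buffer): "VALID" | "REVOKED" {
  const expected = Buffer.from(recomputeHash(cert.payload, key), "hex");
  const recorded = Buffer.from(cert.certHash, "hex");
  const ok = expected.length === recorded.length && timingSafeEqual(expected, recorded);
  return ok ? "VALID" : "REVOKED";
}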
Add the badge to your README or website
After certification, embed a verifiable trust badge that links directly to your public registry listing.
<!-- Agent Eval Pass Badge -->
<a href="https://www.regulatorysignals.com/agent-eval-registry/{your-agent-slug}">
<img
src="https://www.regulatorysignals.com/badges/agent-eval-pass.svg"
alt="Agent Eval PASS — Regulatory Signals"
width="200"
height="28"
/>
</a>
Frequently asked questions
What is evaluated in the AI Agent Evaluator-of-Record assessment?
The evaluation covers five suites: (1) Capability Benchmarking — 50 tasks drawn from MMLU-Pro, MATH, HumanEval, and agentic task datasets measuring raw capability. (2) Safety Alignment — 50 prompts from the HarmBench dataset measuring safety behavior on harmful request categories. (3) HarmBench Jailbreak Resistance — 50 adversarial jailbreak attempts measuring robustness to prompt attacks. (4) Tool-Use Correctness — 50 structured tool-calling scenarios measuring API call accuracy and error handling. (5) Refusal Calibration — 50 borderline prompts measuring whether the agent refuses appropriately without over-refusing benign requests.
Why can't we just use our own internal eval?
Enterprise procurement teams, institutional investors, and regulated-industry customers increasingly require third-party evaluation because vendor self-assessments create a conflict of interest. It is the same reason you need a Big Four auditor for your financials: you cannot audit yourself. Anthropic's RSP explicitly distinguishes between developer-run evals and independent third-party assessments. The EU AI Act Article 50 transparency obligations presuppose documented technical evidence that a third party can verify.
How long does the evaluation take?
Automated suite execution completes in 24–72 hours depending on agent response latency. You receive a draft score report for review, then a signed certificate and public registry listing within 5 business days of submission. Expedited 48-hour processing is available on request.
What does the signed certificate include?
The certificate includes: agent name, version, endpoint domain (truncated for privacy), evaluation date, 90-day expiry, overall score (0–100), per-suite scores, pass/conditional/fail tier, a tamper-evident HMAC-SHA256 hash of the evaluation payload, and a public registry URL. The cert hash is recomputed on every public view — any tampering triggers an automatic REVOKED status.
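As a rough illustration, the fields listed above could map to a shape like the following. The actual payload schema is not published, so every field name here is an assumption.

// Illustrative certificate shape based on the fields listed above.
// Field names in the real signed payload are assumptions.
interface AgentEvalCertificate {
  agentName: string;
  agentVersion: string;
  endpointDomain: string;  // truncated for privacy
  evaluationDate: string;  // ISO 8601
  expiresAt: string;       // evaluationDate + 90 days
  overallScore: number;    // 0-100
  suiteScores: {
    capability: number;
    safetyAlignment: number;
    jailbreakResistance: number;
    toolUseCorrectness: number;
    refusalCalibration: number;
  };
  tier: "PASS" | "CONDITIONAL" | "FAIL";
  certHash: string;        // tamper-evident HMAC-SHA256 of the payload
  registryUrl: string;     // public registry listing
}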
How does deterministic replay verification work?
Each evaluation run is seeded with a cryptographic nonce and the full prompt payload is HMAC-SHA256 signed before execution. The signed payload is stored immutably. Any third party can call /api/agent-eval/replay/{runId} to re-execute the identical test suite against the original signed prompt set and verify that scores match within a 0.5% tolerance. This proves the certificate reflects a real, unmodified evaluation run.
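A third-party verifier could exercise that endpoint with something like the sketch below. The endpoint path comes from the answer above; the HTTP method and response shape are assumptions, and the 0.5% tolerance is interpreted here as 0.5 points on the 0–100 scale.

// Replay-verification sketch (TypeScript). Endpoint path is documented above;
// the HTTP method and response body shape are assumptions.
async function verifyReplay(runId: string, certifiedScore: number): Promise<boolean> {
  const res = await fetch(
    `https://www.regulatorysignals.com/api/agent-eval/replay/${runId}`,
    { method: "POST" },
  );
  if (!res.ok) throw new Error(`Replay request failed: HTTP ${res.status}`);
  // Assumed response body: { overallScore: number } for the re-executed run.
  const { overallScore } = (await res.json()) as { overallScore: number };
  // Accept if the replayed score matches the certificate within tolerance
  // (0.5 points on the 0-100 scale).
  return Math.abs(overallScore - certifiedScore) <= 0.5;
}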
Which regulatory frameworks does this address?
The evaluation is designed to produce documented evidence relevant to: Anthropic's Responsible Scaling Policy (RSP) — third-party capability and safety threshold documentation; OpenAI Preparedness Framework — frontier risk assessment documentation; EU AI Act Article 50 — transparency obligations for general-purpose AI systems; ISO/IEC 42001 — AI management system audit evidence. Mention of these frameworks does not constitute legal advice. Buyers should consult qualified legal counsel for their specific regulatory requirements.
What is the quarterly retest, and why is it required?
AI agents change rapidly — new model versions, updated system prompts, and fine-tuning can alter safety and capability profiles significantly. Certificates expire after 90 days. The $1,990 quarterly retest re-runs all five suites against the current production version and renews the certificate if scores meet thresholds. This ensures your registry listing always reflects the current production agent, not a point-in-time snapshot.
Enterprise buyers are demanding independent evals now
Procurement teams at banks, healthcare orgs, and government agencies are adding "third-party AI agent evaluation" to their vendor RFPs. Get certified before your deal hits that requirement.
Request Evaluation — $4,990
Questions? [email protected]