ClarityLift · Service Standards
The criteria under which we would shut this product down.
Every Startvest product publishes its kill criteria. Specific thresholds. Written in advance. So you know we would sunset ClarityLift before damaging your compliance posture, not after.
This page is the inverse of growth-at-all-costs. It is the promise that we would walk away rather than ship a compromised product. Hold us accountable to it.
Quality and accuracy
Classifier error rate floor (overall classification agreement)
ThresholdIf the rolling 90-day classifier-vs-rule agreement rate (sampled via the reproducibility job) drops below 80 percent, we suspend AI-classified signal surfacing for the affected signal types and run rules-only until the rate recovers above 85 percent for two consecutive 30-day windows.
WhyThe classifier is supposed to enrich, not replace, the rule-based path. A sustained sub-80 percent agreement rate means the LLM is contributing more drift than insight. The 5-point hysteresis (80 to suspend, 85 to resume) prevents flip-flopping at the boundary. Numbers are first-pass defensible thresholds; the reviewer should adjust them on PR if the empirical baseline looks different.
Classifier false-positive rate on adverse signals (friction, retention)
ThresholdIf sustained false-positive rate exceeds 25 percent over a calendar quarter on any of the adverse signal types, we sunset that signal type.
WhyA signal that fires more wrong than right is worse than no signal. It pulls manager attention into manufactured problems and discredits the real ones.
Sub-floor leakage
ThresholdAny single confirmed instance of team-level data appearing for a team below the 10-person floor triggers a Sev 1 public incident. Three confirmed instances in a calendar year sunset the product.
WhyThe floor is the load-bearing privacy commitment. One verified breach is recoverable with public disclosure. Three is structural failure that no remediation closes.
Insight quality
ThresholdIf the rolling 90-day customer rating of insight usefulness drops below 40 percent useful, we pull the LLM-generated insight surface and run rules-only until quality recovers.
WhyLLM insights presented as actionable but consistently wrong are the same trust failure as wrong audit findings. Pulling the feature is preferable to shipping noise.
Privacy and regulatory
Regulatory change response window
ThresholdIf a material regulatory change (EU AI Act, state-level employee-monitoring law, NLRB guidance shift) requires a product change to remain compliant, we either ship the change within 90 days or sunset operations in the affected jurisdiction.
WhyCompliance products that lag the regulation become legal liabilities for the customer. Faster than 90 days is fine. Slower is not.
Data subject request fulfillment
ThresholdIf we cannot fulfill a verified data subject access or deletion request within 30 days, we publicly disclose the failure and engage outside counsel to resolve. Two unresolved DSAR failures in a calendar year sunset the product.
WhyThe retention-zero architecture is supposed to make DSAR trivial because there is no individual data to retrieve or delete. A failure to deliver on that posture means the architecture has broken.
LLM provider boundary breach
ThresholdIf a provider sub-processor is found to be storing or training on customer-routed prompts in violation of the published sub-processor list, we cut the integration within 7 days. If no compliant alternative exists, we sunset the LLM-classifier path and run rules-only.
WhyThe LLM provider boundary is the load-bearing claim of the trust architecture. A breach there is the same class of failure as a sub-floor leakage.
Customer trust
Customer satisfaction floor
ThresholdIf customer-reported integrity concerns exceed 10 percent of the active customer base in a single quarter, we trigger an external review. If the external review confirms a structural integrity problem, we sunset the affected feature or the product as warranted by the finding.
WhyCustomers pay attention to integrity. When they raise concerns at scale, the data is more reliable than internal self-assessment.
Audit finding response
ThresholdAny finding from the annual independent third-party audit that is not resolved or formally accepted with mitigation within 180 days triggers public disclosure. Critical findings unresolved at 365 days sunset the product.
WhyThe annual audit is the structural commitment that ties our revenue to the quality of our work. Letting findings rot kills the commitment.
Availability and infrastructure
LLM provider availability and fallback
ThresholdPrimary LLM path (Azure OpenAI eastus) targets 99.5 percent monthly availability for classification calls. If primary availability drops below 99 percent for two consecutive months, we activate the documented Anthropic fallback path within 30 days. If neither path can meet the 99 percent monthly bar for a calendar quarter, we run rules-only until a compliant provider is restored.
WhyThe product still works without the LLM — rule-based signals and rule-based insights are the always-on baseline (see /methodology). The LLM is enrichment. Losing it temporarily degrades insight quality but does not break privacy or the floor. Numbers reflect Azure OpenAI's published SLA tier; 99 percent for two months is a real signal something has structurally changed at the provider, not noise.
Application uptime
ThresholdApplication uptime target is 99.5 percent monthly (excluding pre-announced maintenance windows). Three consecutive months below 99 percent monthly uptime triggers a public root-cause review and credit to active customers per the MSA. Sustained uptime below 99 percent for a calendar quarter triggers an internal viability review.
WhyUptime is table stakes for a daily-digest product. Numbers are deliberately modest to avoid the aspirational-SLA failure mode — 99.5 percent is roughly 3.6 hours of unplanned downtime per month, which is achievable on a single-region App Service deploy without overpromising.
API latency
ThresholdP95 latency on the public dashboard API is under 1500 ms over a rolling 7-day window. Sustained P95 above 3000 ms for two consecutive 7-day windows triggers a performance incident and a public note on the next quarterly trust report.
WhyLatency above 3 seconds on a dashboard read is the user-perceptible threshold where admins start to disengage from the daily check. The rolling window prevents a single bad deploy from triggering, while two consecutive windows catches a real regression.
Operational
Pricing-rigor breach
ThresholdIf commercial pressure ever requires us to ship under terms that violate the Trust Principles ('unlimited audits,' AI without review gates, customer attestation as proof), we sunset rather than ship. We do not amend the Trust Principles to enable a deal.
WhyThe Trust Principles are constraints, not goals. Loosening them to close revenue is the failure pattern this whole framework defends against.
Capacity overrun
ThresholdIf we onboard customers faster than we can meet the published response SLAs (privacy email, AUP enforcement, incident response), we pause new-customer onboarding until staffing catches up.
WhyThe SLAs are commitments, not aspirations. Breaking them quietly because we are growing is the volume-business failure mode.
How a sunset actually happens
If a kill criterion above is triggered, the process is:
- Founder confirms the trigger within 7 business days. The audit log entry citing the criterion lands the same day.
- Customers are notified directly within 14 business days. Notice covers the trigger, the timeline, and the data migration / refund path.
- Public disclosure on
claritylift.ai/incidentswithin 30 days. - Operational sunset within 90 days unless the criterion specifies a different window. Customer data is exported and deleted per the published retention policy.
- Quarterly trust report covers the full sunset event in detail, including what we missed about the failure mode and what changes to the Trust Principles or runbooks followed.
What this page is not
Marketing copy. Aspirational SLAs. “Best effort” language. Soft commitments that quietly slip.
The thresholds above are specific because vague kill criteria are an excuse to never trigger. A criterion you cannot operationalize is not a criterion. It is a vibe.
Hold us to these. If we ever soften the thresholds without a paired update to the Trust Principles and a new version log entry on the audit log, that is the failure pattern itself.
Version
Current version: v1.1
Effective date: 2026-04-29
Threshold changes ship with a version bump on this page and a paired entry in the changelog below. Soft-loosening a threshold without a version bump and a corresponding Trust Principles update is the failure pattern this page exists to defend against.
Changelog
v1.1 · 2026-04-29
Added explicit thresholds for classifier overall agreement floor, LLM provider availability and fallback SLA, application uptime, and API latency. Made the version + changelog structure explicit per the framework's HIGH-SV-METHODOLOGY-VERSIONED rule pattern. Numbers throughout this revision are first-pass defensible thresholds; reviewers may adjust on PR if the empirical baseline differs.
v1.0 · 2026-04-25
Initial publication. Quality and accuracy thresholds (false-positive rate, sub-floor leakage, insight quality), privacy and regulatory thresholds (regulatory response window, DSAR fulfillment, LLM provider boundary), customer trust thresholds (customer satisfaction floor, audit finding response), operational thresholds (pricing-rigor breach, capacity overrun), and the sunset mechanics.
Related
- Startvest Trust Principles
- ClarityLift methodology
- ClarityLift transparency receipt
- Privacy architecture
Concerns about whether we are living up to these standards: integrity@startvest.ai