ClarityLift · Methodology · v1.0

How a message becomes a team-level signal.

Public methodology, written so a procurement reviewer or security engineer can verify each step. Not the source code. The methodology. Hidden methodologies are how shortcuts get hidden.

Version 1.0 · Last updated 2026-04-25 · Cross-checks against /transparency

The pipeline

Every inbound message follows the same path. The path runs in this order, every time. Each step is described below.

Webhook arrives at our handler.
DM rejection at ingest. Direct messages drop here.
Channel opt-in check. Disabled channels drop here.
Per-employee consent gate. Non-consenting senders drop here.
Silence-baseline counter increments. Metadata only.
Cross-channel dedup. Reposts of the same text drop here.
Fast classifier. Regex-based, rule-only, no I/O.
LLM classifier. Runs only if the fast classifier marks the message as signal-worthy.
Floor check at write. Sub-10 teams produce no row.
HealthSignal row persisted with seven fields. No text.

Every step has a deliberate purpose and a deliberate place in the order. Reordering changes the privacy posture. The order itself is part of the methodology.
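Here is a minimal TypeScript sketch of why the order matters. Each gate either passes the event on or drops it with a reason, and the first drop wins. The `InboundEvent` shape, the gate list, and the drop-reason strings are hypothetical. Only the first three gates are shown.

```typescript
// Hypothetical event shape with only the fields the first gates need.
interface InboundEvent {
  channelType: "channel" | "dm" | "group_dm";
  channelEnabled: boolean;
  senderConsented: boolean;
}

// A gate returns null to pass the event on, or a drop reason to stop it.
type Gate = (e: InboundEvent) => string | null;

// Fixed order: the DM gate sees the event before anything else does.
const GATES: ReadonlyArray<Gate> = [
  (e) => (e.channelType === "channel" ? null : "drop:dm"),
  (e) => (e.channelEnabled ? null : "drop:channel-disabled"),
  (e) => (e.senderConsented ? null : "drop:no-consent"),
  // Silence counter, dedup, classifiers, and the floor check would follow.
];

function runGates(e: InboundEvent): string {
  for (const gate of GATES) {
    const reason = gate(e);
    if (reason !== null) return reason; // first drop wins; later gates never run
  }
  return "pass";
}

// A DM in a disabled channel is reported as a DM drop, because that gate runs first.
console.log(runGates({ channelType: "dm", channelEnabled: false, senderConsented: true })); // prints drop:dm
```

Suppose the opt-in gate were swapped ahead of the DM gate. The DM would still drop, but a later consent lookup could run on DM senders in enabled channels. That is the sense in which reordering changes the privacy posture.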

The classifier path

Step 01

Webhook arrives at our handler.

Slack and Microsoft Teams send platform events to our webhook endpoints. The handler ACKs within 3 seconds and schedules classification asynchronously, so platform retries do not pile up. A durable queue (`CL_WebhookEvents`) holds metadata about the event in case the worker crashes mid-classification. The queue stores no message text.
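The following is a minimal sketch of the ACK-first pattern. It assumes a hypothetical payload shape and uses an array as a stand-in for `CL_WebhookEvents`. The queued record carries identifiers only, and the text is held in memory by the scheduled closure. The methodology does not describe how a crashed job is recovered, so that part is not modeled.

```typescript
// Metadata-only record for the durable queue: routing identifiers, no text.
interface QueuedEvent {
  platform: "slack" | "teams";
  eventId: string;
  channelId: string;
  receivedAt: number;
}

// Stand-in for the durable queue; a real one would persist across restarts.
const queue: QueuedEvent[] = [];

// Hypothetical inbound payload.
interface WebhookPayload {
  platform: "slack" | "teams";
  eventId: string;
  channelId: string;
  text: string;
}

// Record metadata, schedule classification, and return the ACK status
// immediately so the platform has no reason to retry.
function handleWebhook(
  p: WebhookPayload,
  schedule: (fn: () => void) => void,
  classify: (text: string) => void,
): number {
  queue.push({ platform: p.platform, eventId: p.eventId, channelId: p.channelId, receivedAt: Date.now() });
  const text = p.text; // held only in memory by the scheduled closure
  schedule(() => classify(text));
  return 200;
}
```

In a Node handler, `schedule` could be `(fn) => setImmediate(fn)`, so that the HTTP response goes out before classification starts.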

Step 02

DM rejection at ingest.

The first filter checks the channel type. If it is a 1:1 DM or a group DM, the handler returns 200 and drops the event before any classifier touches it. DM scopes are not requested at OAuth install in the first place; this filter exists as a redundant guard in case a platform delivers something we did not ask for. A CI rule fails the build if DM scope strings (`im:*`, `mpim:*`, `Chat.*`) are added to any manifest.
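Both guards can be sketched as follows. The sketch assumes that Slack's `channel_type` values (`im`, `mpim`) and Teams' `conversationType` values (`personal`, `groupChat`) identify DMs in event payloads. The scope check mirrors the CI rule's patterns. The function names are hypothetical.

```typescript
type Platform = "slack" | "teams";

// DM channel types as they appear in event payloads (assumed mapping).
const DM_TYPES: Record<Platform, ReadonlySet<string>> = {
  slack: new Set(["im", "mpim"]),
  teams: new Set(["personal", "groupChat"]),
};

// True means: answer 200, drop the event, and never hand it to a classifier.
function isDirectMessage(platform: Platform, channelType: string): boolean {
  return DM_TYPES[platform].has(channelType);
}

// CI-style guard: any requested scope matching a DM scope pattern fails the build.
const FORBIDDEN_SCOPE = /^(im:|mpim:|Chat\.)/;

function forbiddenScopes(scopes: string[]): string[] {
  return scopes.filter((s) => FORBIDDEN_SCOPE.test(s));
}
```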

Step 03

Channel opt-in check.

A channel is analyzed only if a workspace admin has explicitly enabled it on the dashboard. Disabled channels drop here. The default state for a newly connected channel is disabled. Disabling a channel stops analysis within 30 seconds and removes that channel's pending jobs from the durable queue.
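Here is a sketch of the opt-in state, with an in-memory job list standing in for the durable queue. `ChannelRegistry` is hypothetical, and the sketch does not model the 30-second propagation to running workers.

```typescript
// Hypothetical pending-job record in the durable queue.
interface PendingJob {
  jobId: string;
  channelId: string;
}

class ChannelRegistry {
  private enabled = new Set<string>(); // a newly connected channel starts disabled
  private pending: PendingJob[];

  constructor(pending: PendingJob[]) {
    this.pending = pending;
  }

  enable(channelId: string): void {
    this.enabled.add(channelId);
  }

  // Disabling drops only this channel's pending jobs; returns how many were removed.
  disable(channelId: string): number {
    this.enabled.delete(channelId);
    const before = this.pending.length;
    this.pending = this.pending.filter((j) => j.channelId !== channelId);
    return before - this.pending.length;
  }

  isAnalyzed(channelId: string): boolean {
    return this.enabled.has(channelId);
  }

  pendingCount(): number {
    return this.pending.length;
  }
}
```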

Step 04

Per-employee consent gate.

For organizations in consent mode (the default), each sender is resolved against the workspace member table and their consent status is checked. Non-consenting senders drop here. Communication mode skips the individual check (admin consent, a handbook policy, and an announcement substitute for it) but still lets any employee opt out via /my-data. Every change to a consent status is audit-logged.
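The gate can be sketched under both modes. The sketch assumes a hypothetical three-state consent status, where `opted_out` records a /my-data opt-out, and treats a sender with no record as not consented. Every status change appends an audit entry.

```typescript
type Mode = "consent" | "communication";
type Status = "consented" | "not_consented" | "opted_out";

interface AuditEntry {
  memberId: string;
  from: Status;
  to: Status;
  at: Date;
}

const auditLog: AuditEntry[] = [];
const members = new Map<string, Status>(); // stand-in for the workspace member table

// Every consent change is written to the audit log.
function setConsent(memberId: string, to: Status): void {
  const from = members.get(memberId) ?? "not_consented";
  members.set(memberId, to);
  auditLog.push({ memberId, from, to, at: new Date() });
}

// Consent mode needs an explicit "consented". Communication mode admits
// everyone except members who opted out via /my-data.
function passesConsentGate(mode: Mode, senderId: string): boolean {
  const status = members.get(senderId) ?? "not_consented";
  if (status === "opted_out") return false;
  return mode === "communication" || status === "consented";
}
```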

Step 05

Silence-baseline counter increments.

Channel-volume metadata gets a tick: an aggregate time series with no message content. The silence classifier (a cron job, not the per-message pipeline) reads it to detect drops in channel volume. The counter records no sender id, message id, or text.
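Here is a sketch of a metadata-only counter. It assumes hourly buckets keyed by channel id, since the methodology does not state a bucket size. Note that `tick` takes no sender or message argument at all.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Key: channel id plus hour bucket. Value: message count. Nothing else is stored.
const volume = new Map<string, number>();

function bucketKey(channelId: string, at: Date): string {
  return `${channelId}:${Math.floor(at.getTime() / HOUR_MS)}`;
}

// Deliberately has no parameter for sender id, message id, or text.
function tick(channelId: string, at: Date): void {
  const key = bucketKey(channelId, at);
  volume.set(key, (volume.get(key) ?? 0) + 1);
}

// What a silence cron would read when comparing against the channel's baseline.
function countFor(channelId: string, at: Date): number {
  return volume.get(bucketKey(channelId, at)) ?? 0;
}
```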

Step 06

Cross-channel dedup.

If the same or a near-identical message body has been seen in another opted-in channel within the last hour, this copy drops. Matching uses a per-org Levenshtein-similarity cache with an 85 percent similarity threshold. The cache prevents a reposted announcement from inflating signal counts. Dedup compares message text only; it is not a defense against temporal correlation across messages.
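Here is a sketch of the dedup check. The methodology gives the metric and the threshold but not the normalization. This sketch assumes similarity = 1 - distance / length of the longer string, and it uses an in-memory per-org array as the cache.

```typescript
// Standard dynamic-programming Levenshtein distance with two rolling rows.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 - distance / length of the longer string.
function similarity(a: string, b: string): number {
  const longer = Math.max(a.length, b.length);
  return longer === 0 ? 1 : 1 - levenshtein(a, b) / longer;
}

interface Seen {
  channelId: string;
  text: string;
  at: number;
}

const WINDOW_MS = 60 * 60 * 1000;
const THRESHOLD = 0.85;

// Returns true if this copy should drop: a similar body was seen in a
// different channel of the same org within the last hour.
function isCrossChannelDuplicate(cache: Seen[], channelId: string, text: string, now: number): boolean {
  for (let i = cache.length - 1; i >= 0; i--) {
    if (now - cache[i].at > WINDOW_MS) cache.splice(i, 1); // evict expired entries
  }
  const dup = cache.some((s) => s.channelId !== channelId && similarity(s.text, text) >= THRESHOLD);
  if (!dup) cache.push({ channelId, text, at: now });
  return dup;
}
```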

Step 07

Fast classifier.

Regex-based rule classifier. Runs in process, with no network I/O and no LLM. Marks the message as signal-worthy or not based on token patterns, escalation language, and communication features. Most messages are not signal-worthy and exit the pipeline here. Fast-classifier output is cached briefly, so duplicate classification calls within a short window short-circuit.
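Here is an illustrative sketch of a rule-only classifier with a short result cache. The patterns, the 5-second TTL, and keying the cache by event id are all assumptions, not ClarityLift's rules.

```typescript
// Illustrative patterns only.
const RULES: ReadonlyArray<RegExp> = [
  /\b(escalat(e|ed|ing)|blocked on|still waiting)\b/i, // escalation language
  /\b(not my (fault|problem)|who broke)\b/i, // blame language
  /\b(i quit|burn(ed|t) out|exhausted)\b/i, // withdrawal cues
];

const CACHE_TTL_MS = 5000; // assumed length of the "short window"
// Keyed by event id; stale entries are overwritten, and eviction is omitted.
const resultCache = new Map<string, { worthy: boolean; at: number }>();

// Regex tests only: no network I/O, no LLM.
function isSignalWorthy(eventId: string, text: string, now: number): boolean {
  const hit = resultCache.get(eventId);
  if (hit !== undefined && now - hit.at < CACHE_TTL_MS) return hit.worthy; // duplicate call short-circuits
  const worthy = RULES.some((r) => r.test(text));
  resultCache.set(eventId, { worthy, at: now });
  return worthy;
}
```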

Step 08

LLM classifier (only when needed).

For messages the fast classifier flagged as signal-worthy, the message body is passed once to an LLM, which assigns one of the six signal types (friction, disengagement, communication, culture, retention, alignment) plus a severity and a confidence. The LLM provider is named on /privacy-architecture. Provider configuration:

Azure OpenAI. Default. Prompts stay inside Microsoft's Azure boundary; OpenAI as a company does not receive the data. 30-day abuse-monitor retention only when safety systems flag a prompt; near zero in practice for workplace conversation.
Anthropic. Available under a contracted zero-data-retention (ZDR) agreement. Used when the customer prefers Anthropic and the ZDR contract is in force.
OpenAI direct. Phase-0 legacy path. 30-day default retention. Production workspaces are not on this path.
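The provider rules above can be sketched as a selection function over hypothetical workspace settings. It encodes only what the list states: Azure OpenAI by default, Anthropic only with a stated preference and an in-force ZDR contract, and OpenAI direct never for production.

```typescript
type Provider = "azure-openai" | "anthropic" | "openai-direct";

// Hypothetical per-workspace settings.
interface WorkspaceLlmConfig {
  production: boolean;
  prefersAnthropic: boolean;
  anthropicZdrInForce: boolean;
  legacyPhase0: boolean;
}

function selectProvider(c: WorkspaceLlmConfig): Provider {
  // A preference alone is not enough; the ZDR contract must be in force.
  if (c.prefersAnthropic && c.anthropicZdrInForce) return "anthropic";
  // The legacy path is reachable only outside production.
  if (c.legacyPhase0 && !c.production) return "openai-direct";
  return "azure-openai"; // default
}
```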

The provider call sets retention-zero request options where the provider supports them (for OpenAI, `store: false`). When the classifier returns, the message body goes out of scope and becomes eligible for garbage collection. No row in our database ever contains the message text.

Step 09

Floor check at write.

Before the classified signal can be persisted, a TypeORM subscriber checks the team's membership count. If the team has fewer than 10 members, the insert is rejected and the signal is dropped. This is the k-anonymity floor, enforced at the database write layer, not the UI. A team of 9 cannot produce a stored signal. The CI rule `CRIT-C-MIN-GROUP-SIZE` fails the build if `MIN_GROUP_SIZE` is altered.
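Here is a framework-free sketch of the floor check. In the setup described here it would sit inside a TypeORM subscriber's `beforeInsert` hook, where a thrown error would reject the insert. `resolveMemberCount` is a hypothetical lookup. A count that cannot be resolved is treated as below floor, matching the fail-closed rule under Failure modes.

```typescript
const MIN_GROUP_SIZE = 10;

class BelowFloorError extends Error {}

// Throwing rejects the insert. Lookup failures and non-integer counts are
// handled exactly like a team below the floor (fail-closed).
function checkFloor(teamId: string, resolveMemberCount: (teamId: string) => number): void {
  let count: number | null = null;
  try {
    count = resolveMemberCount(teamId);
  } catch {
    count = null; // membership count unresolvable
  }
  if (count === null || !Number.isInteger(count) || count < MIN_GROUP_SIZE) {
    throw new BelowFloorError(`team ${teamId}: rejected by group-size floor`);
  }
}
```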

Step 10

HealthSignal row persisted.

The seven-field row that lands in the database: organization id, team id, channel id, signal type, severity, confidence, detection timestamp. That is it. No user id. No message id that resolves to a user. No text. No quote. No author. The schema has no column for any of those, and a CI rule blocks adding them.
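Here is a sketch of the row as a TypeScript type, plus a hypothetical CI-style check that fails when an entity declares any column outside the allowed seven. The camelCase field names are assumptions.

```typescript
type SignalType = "friction" | "disengagement" | "communication" | "culture" | "retention" | "alignment";
type Severity = "low" | "medium" | "high";

// The seven fields, and nothing else: no user id, message id, text, quote, or author.
interface HealthSignal {
  organizationId: string;
  teamId: string;
  channelId: string;
  signalType: SignalType;
  severity: Severity;
  confidence: number; // 0 to 1
  detectedAt: Date;
}

const ALLOWED_COLUMNS: ReadonlySet<string> = new Set([
  "organizationId", "teamId", "channelId", "signalType", "severity", "confidence", "detectedAt",
]);

// A non-empty result (for example, ["userId"]) should fail the build.
function unexpectedColumns(columns: string[]): string[] {
  return columns.filter((c) => !ALLOWED_COLUMNS.has(c));
}
```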

The six signal types

The LLM classifier emits one of six signal types. Each answers a different question about the team.

Friction

Recurring cross-team tension, escalation frequency, blame language. Computed at the team level only; never per individual.

Disengagement

Declining participation across the channels where team members normally engage, withdrawal from strategic conversations, shortening responses. Aggregated across the team, not per person.

Communication health

Cross-functional dialogue frequency, information bottlenecks, response-time degradation, siloing patterns.

Culture drift

Tone-pattern shifts, values-alignment signals, psychological-safety indicators. Drift over time relative to a team's own baseline, not a peer comparison.

Retention signals

Team-level stability indicators based on communication-pattern aggregates. Surfaces dynamics that historically correlate with team-level turnover. Never an individual flight-risk score; the schema cannot represent one.

Alignment

Direction of team activity relative to admin-defined organizational goals: reinforce, contradict, or neutral. Opt-in feature; off by default until an admin defines goals at /dashboard/strategy.

Severity for each is one of low / medium / high. Confidence is a 0-1 score from the classifier. Both fields land on the HealthSignal row.
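Here is a sketch of validating the LLM's reply before it can reach the row. It assumes a JSON reply with hypothetical `signalType`, `severity`, and `confidence` keys. Any value outside the allowed sets yields `null`, and nothing is persisted.

```typescript
const SIGNAL_TYPES = ["friction", "disengagement", "communication", "culture", "retention", "alignment"] as const;
const SEVERITIES = ["low", "medium", "high"] as const;

type SignalType = (typeof SIGNAL_TYPES)[number];
type Severity = (typeof SEVERITIES)[number];

interface Classification {
  signalType: SignalType;
  severity: Severity;
  confidence: number;
}

// Returns null for malformed JSON, unknown types or severities, or a
// confidence outside [0, 1].
function parseClassification(raw: string): Classification | null {
  let v: unknown;
  try {
    v = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof v !== "object" || v === null) return null;
  const o = v as Record<string, unknown>;
  const t = o["signalType"];
  const s = o["severity"];
  const c = o["confidence"];
  if ((SIGNAL_TYPES as readonly unknown[]).indexOf(t) === -1) return null;
  if ((SEVERITIES as readonly unknown[]).indexOf(s) === -1) return null;
  if (typeof c !== "number" || !(c >= 0 && c <= 1)) return null;
  return { signalType: t as SignalType, severity: s as Severity, confidence: c };
}
```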

Aggregation rules

Every output the dashboard surfaces is bound by the same aggregation rules. The first is the floor that step 09 of the classifier path enforces at write time; the others extend the same commitment to filtered cohorts and cross-customer benchmarks.

Minimum team size of 10. Teams below 10 produce no signal at all. Not a hidden output. Not a dimmed reading. No row.
Cohort filtering re-applies the floor. If an admin filters the dashboard by HRIS metadata (department, team type, tenure bucket), the resulting cohort must still be ≥ 10 members. Cohorts below the floor render “below floor” with no aggregate values.
Cross-customer aggregation is opt-in only. By default, your data does not contribute to platform-level benchmarks. See /transparency § 5 for the disclosure.
Customer-level floor of 10 for platform-level benchmarks. No benchmark is published from fewer than 10 opted-in customers. Same k-anonymity principle, applied at the customer tier.
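Here is a sketch of the cohort and benchmark floors. `cohortView` and `benchmarkContributors` are hypothetical helpers. The sketch shows two things: the aggregate callback never runs for a sub-floor cohort, and only opted-in customers count toward the benchmark floor.

```typescript
const MIN_GROUP_SIZE = 10; // members per team or filtered cohort
const MIN_CUSTOMER_COUNT = 10; // opted-in customers per platform benchmark

// Re-applies the floor after HRIS filters. Below the floor, nothing is computed.
function cohortView<T>(cohortSize: number, aggregate: () => T): T | "below floor" {
  return cohortSize >= MIN_GROUP_SIZE ? aggregate() : "below floor";
}

// Hypothetical customer record; benchmark participation is opt-in.
interface Customer {
  id: string;
  benchmarkOptIn: boolean;
}

// Contributing customer ids, or null when the benchmark must not publish.
function benchmarkContributors(customers: Customer[]): string[] | null {
  const ids = customers.filter((c) => c.benchmarkOptIn).map((c) => c.id);
  return ids.length >= MIN_CUSTOMER_COUNT ? ids : null;
}
```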

Calibration

New channels enter a 30-day calibration window. During calibration, the per-message pipeline runs end to end (the DM gate and the floor still apply) but signals are not persisted. Calibration is how the classifier learns the channel's baseline for tone, response time, and participation distribution. Without calibration, the first week of signals would all read as anomalies, because there would be no baseline to compare against.

After calibration completes, the channel transitions to active and signals start firing. Admins can pause a channel at any time, which suspends signal generation but preserves historical scores.
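Here is a sketch of the channel lifecycle. It assumes the 30-day clock starts when the channel is enabled and that pausing does not reset it; the methodology does not say either way. Only an active channel persists signals.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const CALIBRATION_DAYS = 30;

// Hypothetical channel record.
interface ChannelState {
  enabledAt: number; // epoch ms when the admin enabled the channel
  paused: boolean;
}

type Phase = "calibrating" | "active" | "paused";

function phase(ch: ChannelState, now: number): Phase {
  if (ch.paused) return "paused"; // signal generation suspended, history kept
  return now - ch.enabledAt < CALIBRATION_DAYS * DAY_MS ? "calibrating" : "active";
}

// During calibration the pipeline runs, but nothing is written.
function shouldPersist(ch: ChannelState, now: number): boolean {
  return phase(ch, now) === "active";
}
```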

Retention

Message text: zero retention. Processed in memory during the LLM call. Discarded the moment the classifier returns.
HealthSignal rows: retained indefinitely by default. Per-org retention policies are on the enterprise roadmap.
Audit log: 7-year retention floor. Survives customer offboarding (disclosed to the customer at offboarding).
Consent records: 7-year retention floor. Same survival rule as audit log.
Channel-volume metadata (silence baseline): retained as long as the channel is enabled, plus 30 days.
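Two of these retention rules involve date arithmetic. Both are sketched here with hypothetical helper names: the 30-day tail on channel-volume metadata, and the 7-year floor, which is computed in calendar years (UTC).

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Channel-volume metadata: kept while the channel is enabled, then 30 more days.
// Returns null while the channel is still enabled (no deletion scheduled).
function volumeMetadataDeleteAfter(disabledAt: Date | null): Date | null {
  return disabledAt === null ? null : new Date(disabledAt.getTime() + 30 * DAY_MS);
}

// Audit log and consent records: the earliest moment a record may be deleted
// under the 7-year floor.
function earliestDeletion(createdAt: Date): Date {
  const d = new Date(createdAt.getTime());
  d.setUTCFullYear(d.getUTCFullYear() + 7);
  return d;
}
```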

Failure modes

When something in the pipeline fails, the failure is visible, not hidden.

LLM call fails or times out: the message drops. The fast-classifier output is logged with `skip:llm-fail` and no signal is persisted. The signal does NOT default to “verified” or any positive state.
Floor check throws (membership count unresolvable): the insert is rejected. We treat the signal as if the team were below floor. Fail-closed.
Provider boundary breach (e.g., a sub-processor is found storing prompts in violation of the published terms): the affected path is disabled, and the integration is cut within 7 days per /service-standards.
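Here is a sketch of the LLM failure path. The provider call (a hypothetical `call` function) races a timeout. Any failure logs `skip:llm-fail` and returns `null` rather than a default classification.

```typescript
// Hypothetical classifier result shape.
interface Classification {
  signalType: string;
  severity: string;
  confidence: number;
}

// Rejects if the wrapped promise does not settle within `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

const skipLog: string[] = [];

// Failure or timeout: log the skip, persist nothing, and never fall back to
// a default or positive result.
async function classifyOrSkip(call: () => Promise<Classification>, timeoutMs: number): Promise<Classification | null> {
  try {
    return await withTimeout(call(), timeoutMs);
  } catch {
    skipLog.push("skip:llm-fail");
    return null;
  }
}
```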

Version

Current version: v1.0

Last updated: 2026-04-25

Methodology changes ship with a version bump on this page and a paired entry in the changelog below. The CI rule `HIGH-SV-METHODOLOGY-VERSIONED` blocks merges that update this page without a Version and a Changelog header.

Changelog

v1.0 (2026-04-25). Initial publication. Closes the methodology gap on INTEGRITY.md. Documents the 10-step classifier path, the six signal types, the aggregation rules, calibration, retention, and failure modes.