PII redaction and identifier hashing
Otis aims to never hold Personally Identifiable Information (PII), since it has little to no value in product analytics.
Otis applies two independent privacy mechanisms, each with a client-side and a server-side layer:
- PII redaction: scrubs PII content from span text before it reaches Otis storage.
  - Client-side: runs in the SDK before spans are exported. Regex patterns catch structured PII (credit cards, emails, phone numbers, SSNs, API keys, JWTs, etc.).
  - Server-side: runs when Otis receives spans and before they touch persistent storage. A machine-learning model catches unstructured PII that regex can't detect (names, addresses, organizations, other contextual PII) and composes with the client-side layer.
- Identifier hashing: pseudonymizes raw user, session, and group IDs so they're opaque to Otis.
  - Client-side: HMAC applied by the SDK before values are exported, using a salt derived from your API key.
  - Server-side: HMAC applied when Otis receives spans and before they touch persistent storage, using a separate per-project secret held by Otis. Double-hashing defeats rainbow-table attacks on the space of likely raw identifiers.
Each layer provides independent defense in depth in case the previous one fails or misses something.
Client-side PII redaction
Enabled by default. 40+ regex patterns with Luhn and similar validators to minimize false positives. Entity linking ensures the same PII value gets the same placeholder within a span.
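To illustrate how a validator can cut false positives from a regex match, here is a minimal sketch of the standard Luhn checksum applied to card-number candidates. The function names and the candidate regex are hypothetical and are not the SDK's actual patterns.

```typescript
// Hypothetical helper: the standard Luhn checksum, not the SDK's own code.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  let doubleNext = false; // every second digit from the right is doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Hypothetical pattern: 12 to 19 digits, optionally separated by spaces or hyphens.
function findCardNumbers(text: string): string[] {
  const candidates = text.match(/\b\d(?:[ -]?\d){11,18}\b/g) ?? [];
  // The regex over-matches on purpose; the checksum rejects digit runs
  // (order numbers, timestamps) that are not plausible card numbers.
  return candidates.filter(passesLuhn);
}
```

With this sketch, a 16-digit order number that fails the checksum is left alone, while a well-formed test card number is reported as a candidate for redaction.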
initOtis({
  serviceName: "my-app",
  piiRedaction: {
    enabled: true, // default
    disabledPatterns: ["ipv4"], // skip specific patterns
    customPatterns: [{ // add custom patterns
      name: "internal_id",
      regex: /INT-\d{10}/g,
      placeholder: "[INTERNAL_ID_N]",
      priority: 50,
    }],
  },
});

Entity linking
Within a single span, the same PII value receives the same placeholder. Maps reset between spans to maintain context separation.
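One way to picture this behavior is a per-span map from PII value to placeholder. The sketch below is an assumption about the general technique, not the SDK's implementation; the email regex and the function name are hypothetical.

```typescript
// Hypothetical email pattern, simplified for illustration.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Redacts emails in one span's text. The map is created inside the call,
// so placeholder numbering restarts for every span.
function redactEmailsInSpan(spanText: string): string {
  const placeholders = new Map<string, string>();
  return spanText.replace(EMAIL_PATTERN, (match) => {
    let placeholder = placeholders.get(match);
    if (placeholder === undefined) {
      placeholder = `{REDACTED_EMAIL_${placeholders.size + 1}}`;
      placeholders.set(match, placeholder);
    }
    return placeholder;
  });
}
```

Because the map lives only for the duration of one call, two spans that mention different people can both contain `{REDACTED_EMAIL_1}` without implying they refer to the same address.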
Input: "Contact john@acme.com for help. CC john@acme.com for the team."
Output: "Contact {REDACTED_EMAIL_1} for help. CC {REDACTED_EMAIL_1} for the team."

Which attributes are scanned
AI prompt/response attributes, user-corrected text, and user/group property values are scanned.
Scanned attributes:
ai.prompt, ai.prompt.messages, ai.prompt.lastUserMessage, ai.response, ai.response.text, gen_ai.input.messages, gen_ai.prompt, gen_ai.prompt.messages, gen_ai.output.messages, gen_ai.response, gen_ai.response.text, gen_ai.completion, user_message, response_message
Scanned feedback fields:
The comment and expected attributes in sendFeedbackSignal are scanned.
Scanned prefixes: traits.*, metadata.*, properties.*, session_properties.*
Inverse contract. Anything else — top-level attributes on sendEvent calls, OTel-semantic identifiers (user.id, session.id), structural fields (format, doc.id, tool.name), the artifact.id / artifact.stage / artifact.type set — is treated as identifier-shaped and ships unscanned. To route a freeform value through redaction without changing the namespace, use the opt-in segment rule below.
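The default contract can be sketched as an exact-name check plus a prefix check. This is an illustrative assumption about how such a lookup might be written; the constant and function names are hypothetical, the attribute set is abbreviated, and the sensitive_ opt-in rule is deliberately left out.

```typescript
// Abbreviated, hypothetical copy of the default scan lists described above.
const SCANNED_ATTRIBUTES = new Set<string>([
  "ai.prompt",
  "ai.response",
  "gen_ai.prompt",
  "gen_ai.completion",
  "user_message",
  "response_message",
]);
const SCANNED_PREFIXES = ["traits.", "metadata.", "properties.", "session_properties."];

// Returns true when the attribute value would be scanned under the defaults.
function isScannedByDefault(key: string): boolean {
  return (
    SCANNED_ATTRIBUTES.has(key) ||
    SCANNED_PREFIXES.some((prefix) => key.startsWith(prefix))
  );
}
```

Under this sketch, `properties.notes` is scanned, while `user.id`, `format`, and `artifact.id` ship unscanned, matching the inverse contract.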
Opt-in scanning with sensitive_
Any attribute key with a sensitive_ or sensitive. segment prefix is value-scanned for PII, regardless of namespace. The match is case-insensitive and applies at any depth.
sendEvent("document.export", {
  format: "pdf", // not scanned
  sensitive_note: "Drafted with alice@acme.com", // scanned
});
sendArtifactEvent("doc-abc123", {
  stage: "share",
  sensitive_note: "Sent to alice@acme.com", // becomes artifact.sensitive_note, still scanned
});

| Key | Scanned |
|---|---|
| sensitive_note | yes |
| sensitive.note | yes |
| artifact.sensitive_note | yes |
| hello.world.sensitive_email | yes |
| Foo.SENSITIVE.bar | yes (case-insensitive) |
| nonsensitive_thing | no (no segment boundary) |
| email.sensitive | no (prefix the parent, not the leaf) |
The same rule applies on the server-side layer, so opting in covers both regex and NER passes.
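The rows of the table can be reproduced with a small segment check. This is a sketch inferred from the table alone, not the SDK's matcher, and the function name is hypothetical: a key opts in when any dot-separated segment starts with sensitive_, or when a segment equals sensitive and has a child segment after it.

```typescript
// Hypothetical matcher derived from the table above.
function hasSensitiveSegment(key: string): boolean {
  const segments = key.toLowerCase().split(".");
  return segments.some((segment, index) => {
    // "sensitive_note", "artifact.sensitive_note"
    if (segment.startsWith("sensitive_")) return true;
    // "sensitive.note", "Foo.SENSITIVE.bar", but not the leaf in "email.sensitive"
    return segment === "sensitive" && index < segments.length - 1;
  });
}
```

Splitting on dots before testing the prefix is what gives the segment boundary: `nonsensitive_thing` contains the text sensitive_ but no segment begins with it, so it does not match.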
Property key dropping
Property keys set via setUserProperties, setGroupProperties, and identifyUser are checked against a curated list of well-known PII field names (e.g. ssn, email, first_name, phone_number, date_of_birth, credit_card, address). Matching entries are dropped entirely, since the key name itself reveals PII intent even if the value is redacted.
Hyphens are normalized to underscores, so first-name matches first_name. Suffix matching catches compound keys like customer_ssn or primary_email.
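The normalization and suffix rules can be sketched as follows. The key list is the abbreviated set of examples given above rather than the full curated list, lowercasing is an assumption the text does not state, and both function names are hypothetical.

```typescript
// Abbreviated example list; the real curated list is longer.
const PII_PROPERTY_KEYS = [
  "ssn", "email", "first_name", "phone_number",
  "date_of_birth", "credit_card", "address",
];

// True when the key equals a PII field name or ends with "_<name>".
function isPIIPropertyKey(key: string, additionalKeys: string[] = []): boolean {
  // Assumption: keys are compared case-insensitively.
  const normalized = key.toLowerCase().replace(/-/g, "_");
  return [...PII_PROPERTY_KEYS, ...additionalKeys].some(
    (name) => normalized === name || normalized.endsWith("_" + name),
  );
}

// Returns a copy of the properties without the matching entries.
function dropPIIProperties(
  properties: Record<string, unknown>,
  additionalKeys: string[] = [],
): Record<string, unknown> {
  const kept: Record<string, unknown> = {};
  for (const key of Object.keys(properties)) {
    if (!isPIIPropertyKey(key, additionalKeys)) kept[key] = properties[key];
  }
  return kept;
}
```

Requiring an underscore before the suffix keeps a key such as `emailed` from matching `email`, at the cost of missing keys that join words without a separator.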
initOtis({
  serviceName: "my-app",
  piiRedaction: {
    dropPIIPropertyKeys: true, // default: true
    additionalPIIPropertyKeys: ["employee_id"], // merge with defaults
  },
});

Property values not dropped by key matching are still regex-scanned. For example, properties.notes = "Call 415-555-1234" redacts the phone number.
Use debug: "filter" to see which keys are dropped:
[otis:filter] { piiKeyDropped: 'properties.email' }

Override scanned attributes
scanAttributes and scanAttributePrefixes replace the defaults when provided:
initOtis({
  serviceName: "my-app",
  piiRedaction: {
    scanAttributes: ["custom.user_input", "custom.response"],
    scanAttributePrefixes: ["user_data."],
  },
});

To extend defaults rather than replace, spread the exported constants:
import {
  initOtis,
  DEFAULT_SCAN_ATTRIBUTES,
  DEFAULT_SCAN_ATTRIBUTE_PREFIXES,
} from "@runotis/sdk";

initOtis({
  serviceName: "my-app",
  piiRedaction: {
    scanAttributes: [...DEFAULT_SCAN_ATTRIBUTES, "custom.user_input"],
    scanAttributePrefixes: [...DEFAULT_SCAN_ATTRIBUTE_PREFIXES, "user_data."],
  },
});

Server-side PII redaction
Enabled by default on every project environment. A second pass runs on the ingest collector before data is persisted, using a machine-learning model to detect unstructured PII that regex patterns can't reliably catch: personal names, street addresses, organization names, and other contextually identified entities. The server-side pass runs on the same set of attributes as the client-side layer.
What it adds on top of client-side redaction
- Natural-language entities. Names, addresses, and organizations that don't match any regex pattern.
- Cross-attribute entity linking. The same person name appearing in user_message and ai.response.text within the same span receives the same placeholder.
- Composition with client-side redaction. If the SDK has already redacted part of a string (e.g. replacing an email with {REDACTED_EMAIL_1}), the server-side layer still analyzes the surrounding text and can detect adjacent PII the SDK missed, without producing overlapping or conflicting replacements.
- Always-on regex union. The same structured-PII regex patterns the SDK uses also run server-side, so a client with PII redaction disabled is still protected.
Configuration
Server-side redaction is configured per project environment from the Otis UI (Settings → Ingestion → PII Redaction). Enabled by default; the other settings below have sensible defaults and rarely need to change.
| Setting | Default | Effect |
|---|---|---|
| Enabled | On | Run server-side redaction on every span for this environment |
| Confidence threshold | Tuned for balanced precision/recall | Minimum model confidence for an entity to be redacted (higher = fewer false positives, possibly lower recall) |
| Disabled entity types | None | Entity types to skip (e.g. keep organization names if they're never sensitive for your app) |
| On error | passthrough | passthrough: keep the span unredacted if the redaction model is unavailable. drop: discard the span if redaction can't run. |
Turning server-side redaction off is supported but not recommended. Client-side redaction alone will not catch unstructured PII like names or addresses.
Behavior on failure
If the server-side model is unavailable, the regex union still runs, so structured PII (credit cards, SSNs, emails, phone numbers, etc.) is always caught. The on error setting determines what happens if both layers fail: passthrough accepts some risk of PII reaching storage in exchange for no data loss; drop discards the span.
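The failure handling described above can be sketched as a small decision function. This is an assumption about the control flow, not the collector's actual code; the type and function names are hypothetical, and each pass is modeled as a function that throws when it cannot run.

```typescript
type OnError = "passthrough" | "drop";
type RedactionPass = (text: string) => string; // throws when the pass is unavailable

// Returns the redacted text, the original text (passthrough), or null (drop).
function redactOnIngest(
  text: string,
  regexPass: RedactionPass,
  modelPass: RedactionPass,
  onError: OnError,
): string | null {
  let current = text;
  let regexRan = true;
  let modelRan = true;
  try {
    current = regexPass(current);
  } catch {
    regexRan = false;
  }
  try {
    current = modelPass(current);
  } catch {
    modelRan = false; // the regex result, if any, is still kept
  }
  if (regexRan || modelRan) return current;
  // Both layers failed: the on error setting decides.
  return onError === "drop" ? null : text;
}
```

In this sketch a model outage alone never triggers the on error branch, because the regex pass has already produced a partially redacted result.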
Keep client-side on too
Even with server-side redaction enabled, keep client-side redaction on. Client-side catches structured PII before it leaves your infrastructure, which reduces your surface area in the event of a transport-layer issue or misconfiguration. The two layers are designed to compose.
Identifier hashing
All identifiers — user IDs, session IDs, and group IDs — are pseudonymized via HMAC-SHA256 before they leave your application. This lets you correlate activity for analytics without exposing raw identifiers (emails, internal user IDs, etc.) in analytics storage.
Enabled by default
Identifier hashing is enabled by default. The HMAC salt is derived automatically from your API key, so no additional configuration is needed.
initOtis({
  serviceName: "my-app",
  // Identifier hashing is enabled by default.
  // To disable (not recommended): identifierHashing: false
});

Hashing happens at entry points (contextFromChatRequest, withContext, wrap context, identifyUser), so downstream code always sees already-hashed values.
What gets hashed
| Input | Output prefix |
|---|---|
| User ID | usr_v1_<base64url-hmac-sha256> |
| Session ID | ses_v1_<base64url-hmac-sha256> |
| Group ID | grp_v1_<base64url-hmac-sha256> |
HMAC input is domain-separated, so the same raw string passed as a user ID vs a session ID produces different hashes.
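A minimal sketch of domain-separated HMAC-SHA256 hashing with the prefixes from the table, using Node's built-in crypto module. The exact domain tag format ("user:", "session:", "group:"), the salt handling, and the function name are assumptions made for illustration; the SDK's real input encoding is not documented here.

```typescript
import { createHmac } from "node:crypto";

type IdKind = "user" | "session" | "group";

const OUTPUT_PREFIX: Record<IdKind, string> = {
  user: "usr_v1_",
  session: "ses_v1_",
  group: "grp_v1_",
};

// Hypothetical helper: HMAC-SHA256 over "<kind>:<rawId>", base64url-encoded.
function hashIdentifier(kind: IdKind, rawId: string, salt: string): string {
  const digest = createHmac("sha256", salt)
    .update(`${kind}:${rawId}`) // assumed domain tag; keeps ID kinds from colliding
    .digest("base64url");
  return OUTPUT_PREFIX[kind] + digest;
}
```

Because the kind is mixed into the HMAC input, the digest portion differs between `hashIdentifier("user", "42", salt)` and `hashIdentifier("session", "42", salt)`, so a user cannot be correlated with a session merely because both share a raw value.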
Helpers to check whether a value is already hashed
import { isHashedUserId, isHashedSessionId, isHashedGroupId } from "@runotis/sdk";

Already-hashed values (with the correct prefix) pass through without re-hashing. This means it's safe to call identifyUser(hashedId) in code paths where the ID might have already been hashed upstream.
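The pass-through behavior amounts to making the hashing step idempotent. The sketch below shows the idea with a hypothetical prefix check and a caller-supplied hash function; it does not use or reproduce the SDK's isHashedUserId.

```typescript
// Hypothetical check: prefix followed by base64url characters.
const HASHED_USER_ID = /^usr_v1_[A-Za-z0-9_-]+$/;

// Hashes the value only if it does not already look hashed.
function ensureHashedUserId(
  value: string,
  hash: (rawId: string) => string,
): string {
  return HASHED_USER_ID.test(value) ? value : hash(value);
}
```

A consequence of prefix-based detection is that a raw ID that happens to begin with usr_v1_ would be treated as already hashed, so raw identifiers should not use that prefix.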
Anonymous users
The browser SDK generates anonymous user IDs for unauthenticated sessions, prefixed anon_. Anonymous IDs are not hashed (they're already opaque) but flow through the same pipeline.
Server-side hashing layer
After values leave the SDK, a second hashing pass is applied in the ingest collector using a server-side secret. This is transparent to your application (you never see the second hash) and means raw identifiers would need to be compromised on both ends to be recoverable. You don't need to configure this layer; it happens automatically.
Why double-hash?
A single HMAC whose secret becomes known is vulnerable to precomputation (rainbow-table style) attacks on the space of likely raw identifiers: an attacker with the hashed values plus a guess at the input domain (email addresses, sequential internal IDs) can precompute hashes and match them against analytics storage. Double-hashing defeats this. An attacker would need to compromise both the API-key-derived salt (held by your application) and the server-side ingest secret (held by Otis, and never transmitted to clients) to reverse any hash. Since the two secrets live in independent security boundaries, a breach of one side is insufficient.
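The argument can be made concrete with a small sketch of two chained HMACs and a dictionary attack against the stored value. All names, the way the layers are chained, and the stored format are assumptions for illustration only.

```typescript
import { createHmac } from "node:crypto";

function hmacBase64Url(key: string, message: string): string {
  return createHmac("sha256", key).update(message).digest("base64url");
}

// Layer 1: applied in the SDK with the API-key-derived salt.
function clientHashUserId(rawId: string, clientSalt: string): string {
  return "usr_v1_" + hmacBase64Url(clientSalt, rawId);
}

// Layer 2: applied at ingest with a secret that never leaves the server.
function serverHashUserId(clientHashed: string, serverSecret: string): string {
  return hmacBase64Url(serverSecret, clientHashed);
}

// Dictionary attack: hash every guess and compare with a stored value.
// Without serverSecret the attacker can only reproduce layer 1.
function matchGuesses(
  stored: string,
  guesses: string[],
  clientSalt: string,
  serverSecret?: string,
): string[] {
  return guesses.filter((guess) => {
    const first = clientHashUserId(guess, clientSalt);
    const candidate =
      serverSecret === undefined ? first : serverHashUserId(first, serverSecret);
    return candidate === stored;
  });
}
```

With only the client salt, `matchGuesses` finds nothing even when the guess list contains the right identifier; the match appears only when both secrets are supplied.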
Recommended configuration
For production use, we recommend:
- Leave client-side PII redaction on (the default).
- Leave server-side PII redaction on (the default).
- Leave identifier hashing enabled (the default).
- Use on error: passthrough for analytics environments where data loss is costly (the default).
- Use on error: drop for regulated environments where PII exposure is costlier than missing spans.