Security & Privacy Whitepaper (Overview)
Chapter 10: Logging, Monitoring & Incident Response
FileBolt's observability must align with Zero-Knowledge boundaries: the system must identify faults, detect attacks, and quantify performance without introducing new sensitive data leak vectors. This chapter defines which events are logged, explicitly prohibits recording certain information, outlines alerting and anomaly detection methods, and details the Security Incident Response (IR) process and external communication principles.
10.0 Summary
- SHOULD: Logs and monitoring cover only necessary operational and security events, utilizing aggregation and sampling to minimize raw detail retention.
- MUST NOT: Logs, APM, and error reports must not contain the CEK, URL fragments (location.hash), or full URLs containing fragments.
- MUST NOT: Do not record plaintext content, decrypted chunks, or derivable key material.
- MUST: Establish detection and alerting for enumeration, brute force, traffic spiking, and anomalous patterns, with containment measures in place.
- MUST: Maintain an Incident Response process (Triage, Containment, Forensics, Remediation, Review, Communication).
10.1 Logged Events (Minimalism)
System logs and monitoring SHOULD cover only the set of events "necessary for operations and security," avoiding the ingestion of user content or sensitive credentials into observability systems. Events are categorized into Lifecycle, Authentication, Storage/Network, and Security Protection, with aggregation recommended.
10.1.1 Transfer Lifecycle Events
- Create transfer (generate transferId)
- Upload chunk complete (logged at ciphertext level, not plaintext)
- Manifest write/update
- Expiration cleanup, user deletion/revocation (including cleanup task trigger & completion)
- Download start/complete (measured at ciphertext download level)
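The lifecycle events above lend themselves to a small structured schema. A minimal sketch follows; the field and event-type names are illustrative, not a normative interface, and all size measurements are taken on ciphertext:

```typescript
// Illustrative structured lifecycle event. Names (eventType values,
// transferId, ciphertextBytes) are assumptions for this sketch.
type LifecycleEvent = {
  eventType:
    | "transfer.create"
    | "chunk.upload.complete"
    | "manifest.write"
    | "transfer.expire"
    | "download.complete";
  transferId: string;        // opaque identifier, never the share URL
  ciphertextBytes?: number;  // measured on ciphertext, never plaintext
  timestamp: string;         // ISO 8601, UTC
};

function makeEvent(
  eventType: LifecycleEvent["eventType"],
  transferId: string,
  ciphertextBytes?: number
): LifecycleEvent {
  return {
    eventType,
    transferId,
    ciphertextBytes,
    timestamp: new Date().toISOString(),
  };
}
```

Keeping eventType values as a closed union gives downstream aggregation a stable classification field (see 10.1.5).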
10.1.2 Authentication & Authorization Events
- Session token issuance, validation failure (scope mismatch, expired, revoked, etc.)
- Long-term login token validation failure (admin action denied)
- Suspicious access patterns: Anomalous spikes in 401/403/404
Note: Record "result & reason code" only. MUST NOT record token plaintext, Authorization header plaintext, or signature parameter plaintext.
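The "result and reason code only" rule can be enforced structurally: if the event constructor simply never accepts a token parameter, the call site cannot leak one. A sketch, with illustrative reason codes:

```typescript
// Auth-failure event that records only the outcome and a reason code.
// The reason codes are illustrative. Note the function deliberately
// takes no token argument, so token plaintext cannot reach the logger.
type AuthFailureReason = "scope_mismatch" | "expired" | "revoked" | "malformed";

function authFailureEvent(reason: AuthFailureReason, route: string) {
  return {
    eventType: "auth.failure",
    reasonCode: reason, // classification only, never the credential
    route,
  };
}
```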
10.1.3 System Health & Performance Metrics
- Request latency (p50/p95/p99), error rates, throughput
- Object storage read/write failure rates, origin fetch failure rates, retry counts
- Edge node anomalies, queue backlogs, background cleanup task success rates
- Bandwidth and traffic trends (aggregated dimensions)
10.1.4 Security Protection Events
- Rate limiting triggers, WAF/firewall rule hits
- Threshold triggers for suspected enumeration/brute-force behavior
- Anomalous concurrency and traffic patterns (high concurrency from a single source, abnormally hot resources)
10.1.5 Aggregation & Sampling
- SHOULD: Aggregate counts and latency quantiles by minute/hour to reduce granular logs.
- SHOULD: Sample high-frequency successful requests, retaining necessary details only for errors and anomalous paths.
- SHOULD: Retain stable classification fields (e.g., eventType, reasonCode) for anomalous events to facilitate auditing and machine analysis.
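The error-biased sampling recommended above can be sketched in a few lines; the sample rate shown is an arbitrary placeholder, not a recommended value:

```typescript
// Error-biased sampling sketch: retain every error/anomaly, sample
// successful requests at a fixed rate. The default rate and the
// injectable rng are assumptions for this illustration.
function shouldLog(
  isErrorOrAnomaly: boolean,
  successSampleRate = 0.01,
  rng: () => number = Math.random
): boolean {
  if (isErrorOrAnomaly) return true;   // always keep error/anomalous paths
  return rng() < successSampleRate;    // sample high-frequency successes
}
```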
10.2 Prohibited Information (CEK, Fragment, Credentials)
A key to maintaining the Zero-Knowledge commitment is strictly enforcing a "Sensitive Data Prohibition" policy across all observability systems. This policy must cover: server logs, APM, error reporting, third-party monitoring, client logs, and browser console output.
10.2.1 Server-Side MUST NOT Record
- CEK (Content Encryption Key) and any derivable material
- URL fragment (#...), location.hash, and full URLs containing fragments
- Plaintext Content: File plaintext, preview content, decrypted chunks, or any plaintext summaries
- Sensitive Credential Plaintext: Session tokens, long-term tokens, Authorization header plaintext, signature field plaintext in signed URLs
If URLs must be logged for debugging, they MUST be sanitized (fragment removed; sensitive query parameters redacted or removed), with sanitization enforced at the code level rather than by manual convention.
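Code-level enforcement can mean routing every URL through one sanitizer before it reaches any logger. A minimal sketch, where the deny-listed parameter names are assumptions:

```typescript
// URL sanitizer to be applied before any logging. The parameter
// deny-list is illustrative; extend it to match the actual signed-URL
// scheme in use.
const SENSITIVE_PARAMS = new Set(["token", "signature", "sig", "key"]);

function sanitizeUrl(raw: string): string {
  const url = new URL(raw);
  url.hash = ""; // never log the fragment
  for (const name of [...url.searchParams.keys()]) {
    if (SENSITIVE_PARAMS.has(name.toLowerCase())) {
      url.searchParams.set(name, "REDACTED"); // redact signed-URL fields
    }
  }
  return url.toString();
}
```

Wrapping the logger so that raw URL strings are rejected at the type level is one way to make "not relying on manual convention" concrete.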
10.2.2 Client-Side MUST NOT Record
- Do not print location.href (may contain a fragment) in production environments
- Do not include URLs with fragments, CEK, or noncePrefix in error reports
- Debug logs and dev tool outputs should be disabled or minimized by default
SHOULD: Encapsulate a unified log/report function on the frontend that performs URL sanitization and sensitive field filtering internally before outputting/reporting.
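Such a unified report function can be sketched as follows; the blocked key names and the payload shape are assumptions for illustration:

```typescript
// Unified front-end report sanitizer: drops deny-listed keys and strips
// fragments from string values before any payload leaves the page.
// Key names (cek, noncePrefix) follow this whitepaper's terminology;
// the payload shape is an assumption.
const BLOCKED_KEYS = new Set(["cek", "noncePrefix", "fragment"]);

function sanitizeReport(
  payload: Record<string, unknown>
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(payload)) {
    if (BLOCKED_KEYS.has(key)) continue; // drop sensitive keys entirely
    if (typeof value === "string" && value.includes("#")) {
      out[key] = value.split("#")[0];    // strip any URL fragment
    } else {
      out[key] = value;
    }
  }
  return out;
}
```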
10.3 Monitoring Metrics & Alerts (Availability, Performance, Security)
10.3.1 Availability & Performance Alerts
- Error Rate Alerts: Spikes in 5xx, abnormal rise in 4xx (distinguish auth failure from abuse)
- Latency Alerts: p95/p99 threshold breaches, regional origin fetch anomalies
- Storage Alerts: Rise in object storage write failures, read timeouts
- Cleanup Task Alerts: Expiration cleanup backlog, excessive delete task retries
10.3.2 Security Alerts
- Abnormal rise in rate limit hit rate
- High frequency 401/403/404 from same source, suspected enumeration/brute force
- Abnormal single-resource download bandwidth/concurrency, suspected traffic flooding or hotlinking
- Critical policy changes (e.g., WAF rules, CSP/header changes) should trigger change audits and alerts (if applicable)
10.3.3 Alert Governance
- SHOULD: Categorize alerts (P0/P1/P2) with rotation/escalation paths.
- SHOULD: Apply debounce/window aggregation to noisy metrics to avoid alert fatigue.
- SHOULD: Attach minimal diagnostic info (reason code, trend chart, affected region/route) to each alert, excluding sensitive data.
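The debounce/window-aggregation recommendation above reduces to "fire at most once per window per alert key". A minimal sketch, with an arbitrary default window:

```typescript
// Window-based alert debouncer: suppresses repeat alerts for the same
// key within the window. The 5-minute default is an illustrative value.
class AlertDebouncer {
  private lastFired = new Map<string, number>();
  constructor(private windowMs = 5 * 60 * 1000) {}

  // Returns true if the alert should fire now; records the firing time.
  shouldFire(alertKey: string, now: number): boolean {
    const last = this.lastFired.get(alertKey);
    if (last !== undefined && now - last < this.windowMs) return false;
    this.lastFired.set(alertKey, now);
    return true;
  }
}
```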
10.4 Anomaly Detection & Abuse Identification
File transfer services naturally face risks of enumeration, brute force, traffic flooding, and resource abuse. This section outlines detectable signals and response principles. Specific thresholds and strategies must be dynamically adjusted based on business scale and false positive costs.
10.4.1 Common Anomaly Patterns
- Enumeration: Massive probing of transferId / resource paths in short time, spikes in 404/401/403.
- Brute Force: High frequency attempts on short codes/passwords/tokens, abnormal failure rate curve.
- Traffic Flooding: Abnormal bandwidth for single resource, massive concurrent connections from same source, repeated downloads.
- Protocol Anomalies: Implausible chunk ordering, duplicate uploads, frequent interrupts/retries, abnormal UA patterns.
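Enumeration detection of the kind described above is often a per-source sliding-window counter over 401/403/404 responses. A sketch; the window and threshold are illustrative placeholders, not tuned values:

```typescript
// Per-source sliding-window miss counter for enumeration detection.
// Window length and threshold are illustrative; real values must be
// tuned to traffic scale and false-positive cost (see 10.4).
class EnumerationDetector {
  private hits = new Map<string, number[]>();
  constructor(private windowMs = 60_000, private threshold = 100) {}

  // Record a 401/403/404 from `source`; returns true when the count
  // within the window reaches the threshold (suspected enumeration).
  recordMiss(source: string, now: number): boolean {
    const recent = (this.hits.get(source) ?? []).filter(
      (t) => now - t < this.windowMs
    );
    recent.push(now);
    this.hits.set(source, recent);
    return recent.length >= this.threshold;
  }
}
```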
10.4.2 Mitigation Strategies (Principles)
- SHOULD: Rate limiting, tiered banning, challenge mechanisms (if applicable), dynamic policies based on ASN/Region/Fingerprint.
- SHOULD: Adopt gentler strategies (e.g., delay, segmented throttling) for paths sensitive to false positives (normal users may download frequently).
- MUST: Retain auditable triggers (reason code, threshold category) for actions taken and provide support channels (if applicable).
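The auditability requirement above implies every mitigation action is recorded with its trigger. A sketch of such a record; action names and reason codes are illustrative:

```typescript
// Auditable mitigation record: every action carries a reason code and
// threshold category so support channels can explain or revert it.
// Action names and codes are illustrative examples.
type MitigationAction = {
  action: "rate_limit" | "tiered_ban" | "challenge";
  reasonCode: string;        // e.g. "enum_404_spike" (illustrative)
  thresholdCategory: string; // which rule family fired
  source: string;            // ASN/region/fingerprint bucket, not raw PII
};

function recordMitigation(
  action: MitigationAction["action"],
  reasonCode: string,
  thresholdCategory: string,
  source: string
): MitigationAction {
  return { action, reasonCode, thresholdCategory, source };
}
```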
10.5 Incident Response Process
FileBolt MUST establish a Security Incident Response (IR) process to ensure rapid damage control, reviewability, and transparent communication. Events potentially affecting Zero-Knowledge commitments (e.g., XSS, supply chain injection, log leakage of fragments) should be treated as highest priority.
- Triage: Classify by impact scope and severity (e.g., P0/P1/P2), define escalation paths and owners.
- Containment: Rapidly isolate risk sources (ban attack sources, temporarily disable features, tighten policies, rotate keys/tokens).
- Forensics & Review: Reconstruct timeline based on minimal logs & metrics; document decisions, impact, remediation, and improvements.
- Remediation & Verification: Deploy patches, regression testing, security verification, third-party scan re-testing if necessary.
- Communication & Disclosure: Explain transparently to users and public without expanding attack surface; publish status page incident reports if necessary.
External communication should clarify: whether key material or plaintext was potentially exposed, impact scope, measures taken, and actions users need to take (if any).
10.6 Data Retention & Access Control
Logs and monitoring data are sensitive assets themselves. Shortest useful retention periods and least privilege access control should be applied to minimize leakage and abuse risks.
- SHOULD: Set differentiated retention periods for log types (slightly longer for errors & security events, shorter or no retention for successful request details).
- SHOULD: Use Least Privilege Access (RBAC) for log & monitoring platforms and retain access audits.
- SHOULD: Perform sanitization and minimization when exporting/sharing logs (prohibit carrying sensitive credentials and PII).
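The differentiated-retention recommendation can be expressed as a simple policy map; the categories and durations below are illustrative assumptions, not prescribed values:

```typescript
// Illustrative retention policy: longer for security and error events,
// no raw-detail retention for successful requests (aggregates only).
// All category names and day counts are assumptions for this sketch.
const RETENTION_DAYS: Record<string, number> = {
  "security.event": 180,
  "error.detail": 90,
  "access.aggregate": 30,
  "success.detail": 0, // raw detail not retained; see 10.1.5 sampling
};
```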
10.7 Third-Party Monitoring & Integration Principles
Third-party APM/Error tracking tools may collect URLs, headers, form fields, and device info by default. If using third-party monitoring, ensure its collection boundaries do not violate Zero-Knowledge commitments.
- MUST: Do not load third-party scripts on download/decryption pages (Consistent with Chapter 9).
- MUST: Explicitly configure third-party collection: No fragments, no sensitive header plaintext, no plaintext content.
- SHOULD: Unify URL sanitization for reporting; prohibit or strongly mask payloads potentially containing sensitive fields.
- SHOULD: Prioritize server-side aggregated metrics over client-side granular tracking to reduce endpoint collection surface.
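Many error-tracking SDKs expose a pre-send hook (for example, Sentry's beforeSend) where such scrubbing can be enforced. A sketch against an illustrative event shape:

```typescript
// beforeSend-style scrubber for third-party error reporting: strips URL
// fragments and credential headers before any event leaves the client.
// The event shape here is illustrative, not a specific SDK's type.
type MonitoringEvent = {
  url?: string;
  headers?: Record<string, string>;
};

function beforeSend(event: MonitoringEvent): MonitoringEvent {
  if (event.url) {
    event.url = event.url.split("#")[0]; // no fragments leave the page
  }
  if (event.headers) {
    delete event.headers["authorization"]; // no credential plaintext
    delete event.headers["cookie"];
  }
  return event;
}
```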
10.8 Related Claim IDs (Reserved)
This chapter covers "Prohibition of Sensitive Data Logging, Alerting & IR, Third-Party Monitoring Constraints." Corresponding Claim IDs will be added to the Appendix: Master List of Claim IDs as the sole authoritative source.