feat(gdpr): author erasure (PR5 of #6701) #7550

Merged
JohnMcLear merged 9 commits into develop from feat-gdpr-author-erasure
May 3, 2026

@JohnMcLear
Member

Summary

  • New authorManager.anonymizeAuthor(authorID) zeroes the display identity on globalAuthor:<id> (keeps the record as an opaque stub so existing changeset references still resolve), deletes every token2author:* and mapper2author:* binding that points at the author, and nulls authorId on chat messages they posted. Pad content, revisions, and attribute pool are kept intact.
  • New REST endpoint POST /api/1.3.1/anonymizeAuthor?authorID=…, authenticated as admin through the existing apikey/JWT pipeline.
  • Idempotent: a second call returns zero counters.
  • doc/privacy.md explains exactly what the call does and does not do.
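
The summary's steps can be pictured with a minimal sketch. It is not the PR's code. A synchronous in-memory Map stands in for the async `db` wrapper used in the excerpts. The pad record shape (`pad:<padID>` holding `chatHead`) is simplified, and `anonymizeAuthorSketch` and `dropMappings` are hypothetical names. The ordering follows the final version of the PR: unbind mappings, zero the identity, scrub chat, then stamp `erased: true`.

```typescript
// Hypothetical sketch: a synchronous Map stands in for the real async `db`.
type Store = Map<string, unknown>;

type AuthorRecord = {
  name: string | null;
  colorId: number | string;
  timestamp: number;
  padIDs?: Record<string, unknown>;
  erased?: boolean;
  erasedAt?: string;
};

type ChatRecord = { text: string; time: number; authorId: string | null };

type ErasureCounters = {
  affectedPads: number;
  removedTokenMappings: number;
  removedExternalMappings: number;
  clearedChatMessages: number;
};

// Deletes every `<prefix>*` key whose value is authorID and returns how many were removed.
function dropMappings(db: Store, prefix: string, authorID: string): number {
  let removed = 0;
  for (const key of Array.from(db.keys())) {
    if (key.startsWith(prefix) && db.get(key) === authorID) {
      db.delete(key);
      removed++;
    }
  }
  return removed;
}

function anonymizeAuthorSketch(db: Store, authorID: string): ErasureCounters {
  const existing = db.get(`globalAuthor:${authorID}`) as AuthorRecord | undefined;
  // Unknown or already-erased authors are a no-op, so repeat calls return zeros.
  if (existing == null || existing.erased) {
    return {affectedPads: 0, removedTokenMappings: 0, removedExternalMappings: 0, clearedChatMessages: 0};
  }

  // 1. Unbind session tokens and external mappers first.
  const removedTokenMappings = dropMappings(db, 'token2author:', authorID);
  const removedExternalMappings = dropMappings(db, 'mapper2author:', authorID);

  // 2. Zero the display identity but keep the record, so changesets still resolve the ID.
  const padIDs = existing.padIDs ?? {};
  const stub: AuthorRecord = {colorId: 0, name: null, timestamp: Date.now(), padIDs};
  db.set(`globalAuthor:${authorID}`, stub);

  // 3. Null authorId on this author's chat messages; skip pads that no longer exist.
  //    affectedPads counts existing pads here; the PR's exact definition may differ.
  let affectedPads = 0;
  let clearedChatMessages = 0;
  for (const padID of Object.keys(padIDs)) {
    const pad = db.get(`pad:${padID}`) as { chatHead?: number } | undefined;
    if (pad == null) continue;
    affectedPads++;
    const chatHead = pad.chatHead;
    if (typeof chatHead !== 'number' || chatHead < 0) continue;
    for (let i = 0; i <= chatHead; i++) {
      const key = `pad:${padID}:chat:${i}`;
      const msg = db.get(key) as ChatRecord | undefined;
      if (msg != null && msg.authorId === authorID) {
        db.set(key, {...msg, authorId: null});
        clearedChatMessages++;
      }
    }
  }

  // 4. Stamp the sentinel last, so a failure in steps 1-3 leaves the call retryable.
  db.set(`globalAuthor:${authorID}`, {...stub, erased: true, erasedAt: new Date().toISOString()});
  return {affectedPads, removedTokenMappings, removedExternalMappings, clearedChatMessages};
}
```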

Final PR of the #6701 GDPR work. PR1 #7546 (deletion controls), PR2 #7547 (IP/privacy audit), PR3 #7548 (HttpOnly author cookie), PR4 #7549 (privacy banner) complete the set.

Design: docs/superpowers/specs/2026-04-19-gdpr-pr5-author-erasure-design.md
Plan: docs/superpowers/plans/2026-04-19-gdpr-pr5-author-erasure.md

Test plan

  • ts-check
  • AuthorManager unit (identity zeroing / mapping removal / idempotence / unknown authorID) — 4 passing
  • REST integration (successful erasure + missing-authorID error) — 2 passing
  • api.ts regression — passes

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Apr 19, 2026

Review Summary by Qodo

(Agentic_describe updated until commit 16cd84a)

GDPR author erasure with resumable partial-failure recovery

✨ Enhancement

Walkthroughs

Description
• Implement GDPR Art. 17 right-to-erasure via anonymizeAuthor(authorID) function
  - Zeroes display identity (name, colorId) on globalAuthor:<id> record
  - Deletes all token2author:* and mapper2author:* bindings pointing to author
  - Nulls authorId on chat messages authored by the person
  - Preserves pad content, revisions, and attribute pools intact
• Add REST endpoint POST /api/1.3.1/anonymizeAuthor?authorID=… with admin auth
• Implement idempotent erasure with resumable partial-failure recovery
• Add comprehensive unit and integration tests plus privacy documentation
Diagram
flowchart LR
  A["Author Request<br/>anonymizeAuthor"] --> B["Drop token2author<br/>& mapper2author"]
  B --> C["Zero display identity<br/>name & colorId"]
  C --> D["Scrub chat messages<br/>null authorId"]
  D --> E["Set erased=true<br/>sentinel"]
  E --> F["Return counters<br/>& idempotent"]
  G["Partial Failure"] -.->|Resume| B

File Changes

1. src/node/db/AuthorManager.ts ✨ Enhancement +102/-0

Add GDPR Art. 17 author anonymization with resumable failure handling

• Add anonymizeAuthor(authorID) function implementing GDPR Art. 17 erasure
• Zeroes name and colorId on globalAuthor:<authorID> record
• Deletes all token2author:* and mapper2author:* mappings pointing to author
• Iterates author's pads and nulls authorId on chat messages they posted
• Returns counters for affected pads, removed mappings, and cleared messages
• Implements resumable partial-failure recovery via two-phase write strategy


2. src/node/db/API.ts ✨ Enhancement +14/-0

Expose anonymizeAuthor on programmatic API surface

• Expose anonymizeAuthor on programmatic API surface
• Validate that the authorID parameter is a non-empty string
• Throw CustomError with 'apierror' code if validation fails
• Delegate to authorManager.anonymizeAuthor for core logic


3. src/node/handler/APIHandler.ts ✨ Enhancement +6/-1

Register anonymizeAuthor in API version 1.3.1

• Create new API version 1.3.1 extending 1.3.0
• Register anonymizeAuthor endpoint with ['authorID'] parameter spec
• Update latestApiVersion from '1.3.0' to '1.3.1'
• Automatically picked up by OpenAPI document generation
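
The version-map pattern can be sketched as follows. The 1.3.0 entries are placeholders, not the real method list. The sketch also shows why a later merge commit had to consolidate two `version['1.3.1']` declarations: a second assignment replaces the first object instead of merging into it.

```typescript
type ApiVersionMap = Record<string, Record<string, string[]>>;

const version: ApiVersionMap = {};
// Placeholder methods standing in for everything registered up to 1.3.0.
version['1.3.0'] = {
  exampleMethod: ['padID'],
  anotherExample: ['padID', 'rev'],
};

// Each version spreads its predecessor, so older methods stay callable under the new number.
version['1.3.1'] = {
  ...version['1.3.0'],
  anonymizeAuthor: ['authorID'],
};

const latestApiVersion = '1.3.1';

// Pitfall: if another branch also writes version['1.3.1'], the later assignment
// replaces the whole object and silently drops methods added by the earlier one.
const clobbered: ApiVersionMap = {...version};
clobbered['1.3.1'] = {...version['1.3.0'], otherBranchMethod: ['padID']};
```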


4. src/tests/backend/specs/anonymizeAuthor.ts 🧪 Tests +93/-0

Add AuthorManager.anonymizeAuthor unit tests

• Add unit tests for AuthorManager.anonymizeAuthor function
• Test identity zeroing: verify name=null, colorId=0, erased=true sentinel
• Test mapping deletion: confirm token2author:* and mapper2author:* removed
• Test idempotence: second call returns zero counters
• Test unknown authorID: returns zero counters without error
• Test partial-failure resumption: verify retry completes after partial run


5. src/tests/backend/specs/api/anonymizeAuthor.ts 🧪 Tests +51/-0

Add REST anonymizeAuthor end-to-end integration tests

• Add REST integration tests for POST /api/1.3.1/anonymizeAuthor endpoint
• Test successful erasure: verify author name becomes null post-call
• Test counter return: confirm affectedPads and mapping counts returned
• Test missing authorID error: verify error code 1 and message validation
• Use JWT admin auth via common.generateJWTToken() pattern


6. doc/privacy.md 📝 Documentation +37/-0

Add GDPR right-to-erasure documentation and API example

• Create new privacy documentation file
• Add "Right to erasure (GDPR Art. 17)" section
• Document what the anonymizeAuthor call does: zero identity, delete mappings, null chat
• Document what it does not do: preserve pad content, revisions, attribute pools
• Provide curl example for operators triggering erasure via REST API
• Note idempotent behavior and reference related GDPR PR work


7. docs/superpowers/specs/2026-04-19-gdpr-pr5-author-erasure-design.md 📝 Documentation +222/-0

Add GDPR PR5 author erasure design specification

• Add comprehensive design specification for author erasure feature
• Audit what links an authorID to a real person: globalAuthor, token2author, mapper2author
• Define goals: anonymize identity, delete mappings, null chat, keep pad content
• Specify implementation: AuthorManager.anonymizeAuthor, REST endpoint, OpenAPI pickup
• Detail testing strategy: unit tests, REST integration, chat regression
• Document risk mitigation: idempotence, partial-failure recovery, performance considerations


8. docs/superpowers/plans/2026-04-19-gdpr-pr5-author-erasure.md 📝 Documentation +510/-0

Add GDPR PR5 author erasure implementation plan

• Add detailed implementation plan with 6 tasks and self-review checklist
• Task 1: Implement anonymizeAuthor on AuthorManager with two-phase write
• Task 2: Write unit tests covering identity, mappings, idempotence, unknown ID
• Task 3: Expose on REST API via API.ts and register version 1.3.1
• Task 4: Add REST integration tests with JWT admin auth
• Task 5: Document in doc/privacy.md with curl examples
• Task 6: Verification, type-checking, and PR submission steps


@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Apr 19, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0)

Action required

1. Legacy chat IDs not cleared 🐞 Bug ≡ Correctness
Description
AuthorManager.anonymizeAuthor() only nulls msg.authorId, so chat records stored with legacy
userId (explicitly supported as an older DB shape) will not be anonymized and will keep
referencing the erased authorID.
Code

src/node/db/AuthorManager.ts[R388-395]

+    for (let i = 0; i <= chatHead; i++) {
+      const chatKey = `pad:${padID}:chat:${i}`;
+      const msg = await db.get(chatKey);
+      if (msg != null && msg.authorId === authorID) {
+        msg.authorId = null;
+        await db.set(chatKey, msg);
+        clearedChatMessages++;
+      }
Evidence
The new scrub loop only checks msg.authorId, but ChatMessage.fromObject() explicitly states that
the DB might contain older records with userId/userName and maps them for compatibility. Those
records will not match the msg.authorId === authorID condition and therefore will not be updated
by anonymizeAuthor().

src/node/db/AuthorManager.ts[388-395]
src/static/js/ChatMessage.ts[17-31]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`src/node/db/AuthorManager.anonymizeAuthor()` only detects chat messages authored by the target author via `msg.authorId`, but chat records might use the legacy field names (`userId`/`userName`). Those legacy records will not be scrubbed, leaving author linkage behind.
### Issue Context
`ChatMessage.fromObject()` explicitly supports old DB records where `userId` was renamed to `authorId` and `userName` to `displayName`.
### Fix Focus Areas
- src/node/db/AuthorManager.ts[388-395]
### Implementation notes
- When loading `msg` from `pad:<padId>:chat:<i>`, treat the author field as `msg.authorId ?? msg.userId`.
- If it matches `authorID`, clear **both** `authorId` and `userId` (and consider clearing `displayName`/`userName` if present) before writing back.
- Keep the stored value a plain object (avoid storing a `ChatMessage` instance that could serialize differently).

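Below is a sketch of the remediation in the implementation notes, written as a pure function over a plain object so it can drop into the chat loop. The `StoredChat` type and the name `scrubChatRecord` are assumptions built from the four field names the review mentions. Only fields already present on the record are cleared, so no new keys are written.

```typescript
// Assumed stored-chat shape covering both the current and the legacy field names.
type StoredChat = {
  text: string;
  time: number;
  authorId?: string | null;
  userId?: string | null;
  displayName?: string | null;
  userName?: string | null;
};

// Returns a plain-object copy with every author-identifying field cleared,
// or null if the message was not posted by authorID.
function scrubChatRecord(msg: StoredChat, authorID: string): StoredChat | null {
  const owner = msg.authorId ?? msg.userId; // legacy records only carry userId
  if (owner !== authorID) return null;
  const scrubbed: StoredChat = {...msg}; // copy; never mutate the loaded record in place
  if ('authorId' in scrubbed) scrubbed.authorId = null;
  if ('userId' in scrubbed) scrubbed.userId = null;
  if ('displayName' in scrubbed) scrubbed.displayName = null;
  if ('userName' in scrubbed) scrubbed.userName = null;
  return scrubbed;
}
```

In the loop, a non-null result would be written back with `db.set(chatKey, scrubbed)` and counted.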


2. anonymizeAuthor lacks feature flag 📘 Rule violation ☼ Reliability
Description
The new anonymizeAuthor REST/API surface is registered unconditionally and becomes available by
default, without any enable/disable mechanism. This violates the requirement that new features be
gated behind a feature flag and disabled by default.
Code

src/node/handler/APIHandler.ts[R146-152]

+version['1.3.1'] = {
+  ...version['1.3.0'],
+  anonymizeAuthor: ['authorID'],
+};
+
// set the latest available API version here
-exports.latestApiVersion = '1.3.0';
+exports.latestApiVersion = '1.3.1';
Evidence
PR Compliance ID 5 requires new features to be behind a feature flag and disabled by default. The PR
registers anonymizeAuthor in the API version map and sets latestApiVersion to 1.3.1 without
any conditional gating, and also exports the new API function directly.

src/node/handler/APIHandler.ts[146-152]
src/node/db/API.ts[65-77]
Best Practice: Repository guidelines

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A new feature (`anonymizeAuthor` API/REST endpoint) is enabled by default and has no feature-flag gating.
## Issue Context
Compliance requires new features to be behind a feature flag and disabled by default.
## Fix Focus Areas
- src/node/handler/APIHandler.ts[146-152]
- src/node/db/API.ts[65-77]

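One way to satisfy the rule is the approach the final merge commit describes: read a `gdprAuthorErasure.enabled` setting that defaults to false, and reject the call with an apierror when it is off. The sketch below assumes that settings shape. The `ApiError` class, the `assertErasureEnabled` name and the message text are illustrative.

```typescript
// Assumed settings shape; only the flag relevant to this endpoint is modelled.
type Settings = { gdprAuthorErasure?: { enabled?: boolean } };

// Illustrative error type tagged 'apierror'.
class ApiError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'apierror';
  }
}

// Anything other than an explicit `true` counts as disabled, so the feature is off by default.
function assertErasureEnabled(settings: Settings): void {
  if (settings.gdprAuthorErasure?.enabled !== true) {
    throw new ApiError('anonymizeAuthor is disabled; set gdprAuthorErasure.enabled to true');
  }
}
```

Checking for a strict `true` means a missing block, a missing key, or a mistyped value such as the string "true" all leave the endpoint closed.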


3. Non-resumable partial erasure 🐞 Bug ☼ Reliability
Description
AuthorManager.anonymizeAuthor() persists erased: true before the chat-scrub loop, so any error
during chat scrubbing can leave chat messages unchanged while subsequent calls short-circuit on
existing.erased and never finish the scrub. This contradicts the documented behavior that chat
message authorId is nulled, and makes failures non-recoverable without manual DB intervention.
Code

src/node/db/AuthorManager.ts[R336-395]

+  const existing = await db.get(`globalAuthor:${authorID}`);
+  if (existing == null || existing.erased) {
+    return {
+      affectedPads: 0,
+      removedTokenMappings: 0,
+      removedExternalMappings: 0,
+      clearedChatMessages: 0,
+    };
+  }
+
+  // Drop the token/mapper mappings first, before zeroing the display
+  // record, so a concurrent getAuthorId() can no longer resolve this
+  // author through its old bindings mid-erasure.
+  let removedTokenMappings = 0;
+  const tokenKeys: string[] = await db.findKeys('token2author:*', null);
+  for (const key of tokenKeys) {
+    if (await db.get(key) === authorID) {
+      await db.remove(key);
+      removedTokenMappings++;
+    }
+  }
+  let removedExternalMappings = 0;
+  const mapperKeys: string[] = await db.findKeys('mapper2author:*', null);
+  for (const key of mapperKeys) {
+    if (await db.get(key) === authorID) {
+      await db.remove(key);
+      removedExternalMappings++;
+    }
+  }
+
+  // Zero the display identity. Keep `padIDs` so future maintenance (or a
+  // pad-delete batch) can still find the set of pads this authorID touched.
+  await db.set(`globalAuthor:${authorID}`, {
+    colorId: 0,
+    name: null,
+    timestamp: Date.now(),
+    padIDs: existing.padIDs || {},
+    erased: true,
+    erasedAt: new Date().toISOString(),
+  });
+
+  // Null authorship on chat messages the author posted.
+  const padIDs = Object.keys(existing.padIDs || {});
+  let clearedChatMessages = 0;
+  for (const padID of padIDs) {
+    if (!await padManager.doesPadExist(padID)) continue;
+    const pad = await padManager.getPad(padID);
+    const chatHead = pad.chatHead;
+    if (typeof chatHead !== 'number' || chatHead < 0) continue;
+    for (let i = 0; i <= chatHead; i++) {
+      const chatKey = `pad:${padID}:chat:${i}`;
+      const msg = await db.get(chatKey);
+      if (msg != null && msg.authorId === authorID) {
+        msg.authorId = null;
+        await db.set(chatKey, msg);
+        clearedChatMessages++;
+      }
+    }
+  }
+
Evidence
The function returns immediately if existing.erased is set, but it sets erased: true on the
global author record *before* iterating pads and rewriting chat messages. If an exception occurs
anywhere after the author record update (DB error, pad load error, etc.), retries will short-circuit
and skip the chat scrub step, leaving chat messages with the original authorId despite docs
stating they are nulled.

src/node/db/AuthorManager.ts[336-395]
doc/privacy.md[18-27]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`anonymizeAuthor()` marks the author record as `erased: true` before finishing the chat scrub. If any error occurs during the chat loop, retries will short-circuit on `existing.erased` and never finish nulling `authorId` on chat messages.
### Issue Context
- Current behavior uses `existing.erased` as the idempotency guard.
- Docs state chat message `authorId` is nulled.
- The implementation should either (a) only mark `erased: true` once all steps have completed, or (b) track per-step completion so retries can resume unfinished work.
### Fix Focus Areas
- src/node/db/AuthorManager.ts[336-395]
### Suggested implementation direction
- Introduce an intermediate state (e.g., `erasureInProgress: true`) and set it before starting work.
- Perform token/mapper cleanup + chat scrub.
- Only after successful completion, update the author record to `{erased: true, erasureInProgress: false}`.
- Alternatively: keep `erased: true` but add a separate flag (e.g., `chatScrubbed: true`) and only short-circuit when both are complete; otherwise resume the missing steps.

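Option (a) reduces to a small ordering rule, and commit 16cd84a later adopted it: do the work, then write the completion sentinel. The sketch below is a generic illustration with hypothetical names (`runErasure`, `ErasureState`). It assumes every step is safe to repeat, because a retry re-runs all steps from the top.

```typescript
// Hypothetical completion state; in the PR this role is played by `erased` on globalAuthor.
type ErasureState = { erased: boolean };

// Runs every step, then marks completion. A throw from any step propagates
// before `erased` is set, so the next call starts the work again.
function runErasure(state: ErasureState, steps: Array<() => void>): boolean {
  if (state.erased) return false; // already complete: short-circuit
  for (const step of steps) step();
  state.erased = true; // written only after every step succeeded
  return true;
}
```

Deleting a mapping key or setting `authorId` to null both move data only toward the erased state, so re-running them after a partial failure is harmless. That idempotence is what makes restarting from the top acceptable.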



Remediation recommended

4. Bearer apikey docs mismatch 🐞 Bug ⚙ Maintainability
Description
doc/privacy.md instructs users to send an apikey as Authorization: Bearer <apikey>, but the API key path
compares the raw header value to the configured apikey and does not strip the "Bearer " prefix, so
the documented example will fail on apikey-based deployments.
Code

doc/privacy.md[R13-15]

+curl -X POST \
+  -H "Authorization: Bearer <admin JWT / apikey>" \
+  "https://<instance>/api/1.3.1/anonymizeAuthor?authorID=a.XXXXXXXXXXXXXX"
Evidence
The documentation example includes Bearer for apikey usage, but the API key authentication branch
uses fields.authorization directly as the apikey value and compares it verbatim to the configured
key; only the JWT branch strips "Bearer ". Therefore a "Bearer "-prefixed apikey will not authenticate
when apikey auth is enabled.

doc/privacy.md[12-16]
src/node/handler/APIHandler.ts[180-185]
src/node/handler/APIHandler.ts[192-201]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The privacy docs show `Authorization: Bearer <admin JWT / apikey>`, but when apikey auth is enabled the server compares the Authorization value verbatim to the configured apikey (no `Bearer ` stripping). This makes the documented apikey invocation fail.
### Issue Context
- JWT path strips `Bearer `.
- apikey path does not.
### Fix Focus Areas
- doc/privacy.md[12-16]
### Implementation notes
Update the example to either:
- Show two separate examples (one for JWT Bearer, one for apikey via `?apikey=...`), or
- If documenting apikey in the Authorization header, omit the `Bearer ` prefix for apikey usage.

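The two invocation styles suggested in the implementation notes can be sketched as a request builder. It only assembles the method, URL and headers; sending them, for example with fetch, is left to the caller. `buildAnonymizeRequest` and the `Auth` union are hypothetical. The instance host, token and key are placeholders.

```typescript
// Hypothetical credential union: JWT goes in the header, apikey in the query string.
type Auth = { kind: 'jwt'; token: string } | { kind: 'apikey'; key: string };

function buildAnonymizeRequest(instance: string, authorID: string, auth: Auth) {
  let url = `https://${instance}/api/1.3.1/anonymizeAuthor?authorID=${encodeURIComponent(authorID)}`;
  const headers: Record<string, string> = {};
  if (auth.kind === 'jwt') {
    // Per the review, the JWT branch strips "Bearer ", so the prefix is sent here.
    headers.Authorization = `Bearer ${auth.token}`;
  } else {
    // Per the review, the apikey branch compares the raw value, so no "Bearer " prefix is used.
    url += `&apikey=${encodeURIComponent(auth.key)}`;
  }
  return {method: 'POST', url, headers};
}
```

Keeping the apikey out of the Authorization header sidesteps the verbatim comparison entirely.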


5. O(N) mapping key scans 🐞 Bug ➹ Performance
Description
anonymizeAuthor() deletes mappings by scanning all token2author:* and mapper2author:* keys and
issuing a db.get() for each, which is O(N) over the entire keyspace and can be extremely slow on
large instances. This can cause long-running requests/timeouts during GDPR erasure operations.
Code

src/node/db/AuthorManager.ts[R349-364]

+  let removedTokenMappings = 0;
+  const tokenKeys: string[] = await db.findKeys('token2author:*', null);
+  for (const key of tokenKeys) {
+    if (await db.get(key) === authorID) {
+      await db.remove(key);
+      removedTokenMappings++;
+    }
+  }
+  let removedExternalMappings = 0;
+  const mapperKeys: string[] = await db.findKeys('mapper2author:*', null);
+  for (const key of mapperKeys) {
+    if (await db.get(key) === authorID) {
+      await db.remove(key);
+      removedExternalMappings++;
+    }
+  }
Evidence
Mappings are created via mapAuthorWithDBKey() as individual token2author: / mapper2author:
records, with no reverse index from authorID to tokens/mappers. As a result, anonymization must
enumerate all keys (findKeys('token2author:*'), findKeys('mapper2author:*')) and check each
value to find those that point to the target authorID.

src/node/db/AuthorManager.ts[117-137]
src/node/db/AuthorManager.ts[349-364]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`anonymizeAuthor()` currently performs full keyspace scans over `token2author:*` and `mapper2author:*` and then does per-key `db.get()` checks. On large databases this is slow and can lead to request timeouts.
### Issue Context
Mappings are created one-way (`token2author:<token> -> authorID`, `mapper2author:<mapper> -> authorID`), so there is no efficient way to enumerate mappings for a single author.
### Fix Focus Areas
- src/node/db/AuthorManager.ts[117-137]
- src/node/db/AuthorManager.ts[349-364]
### Suggested implementation direction
- Maintain reverse indexes when creating mappings (e.g., `author2tokens:<authorID>` and `author2mappers:<authorID>` as sets/lists).
- On anonymization, read those reverse-index keys and delete only the relevant `token2author:*` / `mapper2author:*` entries.
- Optionally: keep the scan as a fallback for existing data (migration-free), but prefer the reverse index when present.

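Below is a sketch of the suggested reverse index, again with a Map standing in for the database. `author2tokens:<authorID>` is the key name proposed in the review, not an existing key. `mapToken` is a hypothetical helper playing the role of `mapAuthorWithDBKey()`. Only the token side is shown; `mapper2author` would follow the same pattern.

```typescript
type Db = Map<string, unknown>;

// Writes the forward mapping and records the token in the author's reverse index.
function mapToken(db: Db, token: string, authorID: string): void {
  db.set(`token2author:${token}`, authorID);
  const idxKey = `author2tokens:${authorID}`;
  const tokens = (db.get(idxKey) as string[] | undefined) ?? [];
  if (tokens.indexOf(token) === -1) db.set(idxKey, [...tokens, token]);
}

// Deletes only this author's token mappings via the index, falling back to a
// full prefix scan for authors whose mappings predate the index.
function removeTokenMappings(db: Db, authorID: string): number {
  const idxKey = `author2tokens:${authorID}`;
  const indexed = db.get(idxKey) as string[] | undefined;
  const keys = indexed != null
    ? indexed.map((t) => `token2author:${t}`)
    : Array.from(db.keys()).filter((k) => k.startsWith('token2author:'));
  let removed = 0;
  for (const key of keys) {
    // Re-check the value: a token may since have been re-mapped to someone else.
    if (db.get(key) === authorID) {
      db.delete(key);
      removed++;
    }
  }
  db.delete(idxKey);
  return removed;
}
```

Preferring the index whenever it exists has a caveat. Tokens created before the index existed are missed for authors who also have indexed tokens. Full coverage would need a one-time backfill, or a union with the scan until one has run.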


Comment on lines +146 to +152
+version['1.3.1'] = {
+  ...version['1.3.0'],
+  anonymizeAuthor: ['authorID'],
+};
+
// set the latest available API version here
-exports.latestApiVersion = '1.3.0';
+exports.latestApiVersion = '1.3.1';

Action required

1. anonymizeAuthor lacks feature flag 📘 Rule violation ☼ Reliability

The new anonymizeAuthor REST/API surface is registered unconditionally and becomes available by
default, without any enable/disable mechanism. This violates the requirement that new features be
gated behind a feature flag and disabled by default.
Agent Prompt
## Issue description
A new feature (`anonymizeAuthor` API/REST endpoint) is enabled by default and has no feature-flag gating.

## Issue Context
Compliance requires new features to be behind a feature flag and disabled by default.

## Fix Focus Areas
- src/node/handler/APIHandler.ts[146-152]
- src/node/db/API.ts[65-77]


Comment thread src/node/db/AuthorManager.ts
Qodo review: the `erased: true` sentinel was written before the chat
scrub loop, so a throw during scrub left chat messages untouched
while subsequent calls short-circuited on `existing.erased` and never
finished. Split the write: zero the display identity first (still
hides the name), run the chat scrub, and only then stamp
`erased: true` so a retry resumes the sweep. Regression test
covers the partial-run → retry path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear JohnMcLear marked this pull request as draft April 27, 2026 09:55
@JohnMcLear JohnMcLear marked this pull request as ready for review May 3, 2026 06:05
@qodo-code-review

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

@JohnMcLear JohnMcLear requested a review from SamTV12345 May 3, 2026 06:05
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 3, 2026

Persistent review updated to latest commit 16cd84a

Comment on lines +388 to +395
+    for (let i = 0; i <= chatHead; i++) {
+      const chatKey = `pad:${padID}:chat:${i}`;
+      const msg = await db.get(chatKey);
+      if (msg != null && msg.authorId === authorID) {
+        msg.authorId = null;
+        await db.set(chatKey, msg);
+        clearedChatMessages++;
+      }

Action required

1. Legacy chat IDs not cleared 🐞 Bug ≡ Correctness

AuthorManager.anonymizeAuthor() only nulls msg.authorId, so chat records stored with legacy
userId (explicitly supported as an older DB shape) will not be anonymized and will keep
referencing the erased authorID.
Agent Prompt
### Issue description
`src/node/db/AuthorManager.anonymizeAuthor()` only detects chat messages authored by the target author via `msg.authorId`, but chat records might use the legacy field names (`userId`/`userName`). Those legacy records will not be scrubbed, leaving author linkage behind.

### Issue Context
`ChatMessage.fromObject()` explicitly supports old DB records where `userId` was renamed to `authorId` and `userName` to `displayName`.

### Fix Focus Areas
- src/node/db/AuthorManager.ts[388-395]

### Implementation notes
- When loading `msg` from `pad:<padId>:chat:<i>`, treat the author field as `msg.authorId ?? msg.userId`.
- If it matches `authorID`, clear **both** `authorId` and `userId` (and consider clearing `displayName`/`userName` if present) before writing back.
- Keep the stored value a plain object (avoid storing a `ChatMessage` instance that could serialize differently).


* Resolve doc/privacy.md conflict by folding the GDPR Art. 17 section
  into develop's expanded privacy doc (kept IP/banner content).
* Consolidate the duplicated `version['1.3.1']` declaration in
  APIHandler.ts so both compactPad (from develop) and anonymizeAuthor
  (from this branch) live in one map.
* Address Qodo rule violation: gate anonymizeAuthor on a new
  `gdprAuthorErasure.enabled` setting (default false). API.ts now
  rejects calls with an apierror when disabled; settings.json.template
  and settings.json.docker document the toggle. Integration test
  flips the flag in `before()` and asserts the disabled-flag error
  path.
* Qodo bug 'non-resumable partial erasure' was already fixed by
  16cd84a — `erased: true` is stamped only after the chat scrub
  loop completes, so a thrown scrub now resumes on retry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear JohnMcLear merged commit 69bb1e1 into develop May 3, 2026
50 checks passed
@JohnMcLear JohnMcLear deleted the feat-gdpr-author-erasure branch May 3, 2026 11:30