fix: improve harvester reports access and log downloads by TahaKhan998 · Pull Request #773 · CERNDocumentServer/cds-rdm

TahaKhan998 · 2026-04-21T12:10:19Z

Closes #767

Email template, removed "View Full Details"

The email had a "View Full Details" button pointing to /administration/jobs/<job_id>/<run_id>, but that page requires administration-access, which most recipients don’t have → leads to a permission error.

Removed the button from both HTML and plain-text templates. Summary + safe links are still there.

"View Changes" works for curators

Curators could access the Harvester Reports page, but "View Changes" failed due to can_search_revisions not including the harvester-curator path (blank page / 403).

Added HarvesterCurator to can_search_revisions, so curators can now actually see revision history.

Download button now returns real run logs

Previously, download used audit logs - only showed what got published.

Now it queries JOBS_LOGGING_INDEX using run_id + job_id and formats output like the admin UI:

[timestamp] LEVEL message
grouped by task_id
includes run status + failure message
respects max results limit

The .log is now a proper offline copy of the run logs.

Harvester Reports shows only system publishes

Page was showing all record.publish logs (including manual ones).

Now also filters user_id = "system" (backend + frontend), so it only shows harvester activity.

sakshamarora1 · 2026-04-22T08:35:55Z

+        if not run:
+            return {"message": "Run not found"}, 404
+
+        full_index_name = prefix_index(current_app.config["JOBS_LOGGING_INDEX"])


We should use the service layer here.

sakshamarora1 · 2026-04-22T08:38:03Z

+            )
+        if status in ("FAILED", "PARTIAL_SUCCESS"):
+            banner = "Job failed" if status == "FAILED" else "Job partially succeeded"
+            header.append("")


How does the end result look like? Can you attach an example file?

harvester_logs_81eb2c75-b5c2-40a4-a3c1-7d4c375d6876_20260422_110359.log
Check this

sakshamarora1 · 2026-04-22T08:38:22Z

    can_create = [AuthenticatedRegularUser(), SystemProcess()]
    can_read = RDMRecordPermissionPolicy.can_read + [ArchiverRead()]
    can_search = RDMRecordPermissionPolicy.can_search + [ArchiverRead()]
+    can_search_revisions = RDMRecordPermissionPolicy.can_search_revisions + [


Why is this needed?

So curators can use the "View Changes" button on the Harvester Reports page.

can you make sure they can only see revisions made by system user?
if they can see more, let's discuss on how to tackle it before jumping to implementations

kpsherva · 2026-04-22T09:50:50Z

-        )
+            hidden_params=[
+                ["action", "record.publish"],
+                ["user_id", "system"],


did you check by any chance what does the REST API return for audit logs? we need to make sure REST API does not return wrong entries for curators role (on audit logs endpoint directly, not on this admin one)

kpsherva · 2026-04-22T09:53:57Z

+        except ValueError:
+            return {"message": "Invalid harvester run query"}, 400
+
+        run_query = Run.query.filter_by(started_at=started_at, parent_run_id=None)


I am not sure I understand the intention correctly - are you selecting the run based only on the started at time? if yes - I would make it more precise - it needs to be a run only of the specific harvester type of task (not a job - because a job is an instance of celery task, and if it is not created on the environment, then it will not exist)
otherwise, if there will be another run (of other type) which started the same time - unlikely but not impossible, then you will return logs of that other run

also, after additional thought... wouldn't it be better to rely on the run identifier rather than on it's timestamp? identifier would be more reliable, no?

kpsherva · 2026-04-22T09:56:58Z

+            header.append(
+                f"Finished: {_format_timestamp(run.finished_at.isoformat())}"
+            )
+        if status in ("FAILED", "PARTIAL_SUCCESS"):


I would include the success ones too, because it might contain warnings (or debug or info, depending on the settings), even if it was successful run

kpsherva · 2026-04-22T09:57:48Z

+                return raw
+
+        # Group by context.task_id in first-seen order (RunsLogs.js buildLogTree).
+        task_groups = {}


wasn't this grouping already done somewhere in the code? I am having flashbacks :)

TahaKhan998 force-pushed the fix-harvester-report-flow branch 2 times, most recently from d0ad3f5 to da34f38 Compare April 22, 2026 07:33

sakshamarora1 reviewed Apr 22, 2026

View reviewed changes

fix: improve harvester reports access and log downloads

ddd6fd8

TahaKhan998 force-pushed the fix-harvester-report-flow branch from da34f38 to ddd6fd8 Compare April 22, 2026 09:10

kpsherva reviewed Apr 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: improve harvester reports access and log downloads#773

fix: improve harvester reports access and log downloads#773
TahaKhan998 wants to merge 1 commit intoCERNDocumentServer:masterfrom
TahaKhan998:fix-harvester-report-flow

TahaKhan998 commented Apr 21, 2026 •

edited

Loading

Uh oh!

Uh oh!

sakshamarora1 Apr 22, 2026

Uh oh!

kpsherva Apr 22, 2026

Uh oh!

sakshamarora1 Apr 22, 2026

Uh oh!

TahaKhan998 Apr 22, 2026

Uh oh!

sakshamarora1 Apr 22, 2026

Uh oh!

TahaKhan998 Apr 22, 2026

Uh oh!

kpsherva Apr 22, 2026

Uh oh!

kpsherva Apr 22, 2026

Uh oh!

kpsherva Apr 22, 2026 •

edited

Loading

Uh oh!

kpsherva Apr 22, 2026

Uh oh!

kpsherva Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

TahaKhan998 commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Email template, removed "View Full Details"

"View Changes" works for curators

Download button now returns real run logs

Harvester Reports shows only system publishes

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kpsherva Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

TahaKhan998 commented Apr 21, 2026 •

edited

Loading

kpsherva Apr 22, 2026 •

edited

Loading