Skip to content

fix: improve harvester reports access and log downloads#773

Open
TahaKhan998 wants to merge 1 commit intoCERNDocumentServer:masterfrom
TahaKhan998:fix-harvester-report-flow
Open

fix: improve harvester reports access and log downloads#773
TahaKhan998 wants to merge 1 commit intoCERNDocumentServer:masterfrom
TahaKhan998:fix-harvester-report-flow

Conversation

@TahaKhan998
Copy link
Copy Markdown

@TahaKhan998 TahaKhan998 commented Apr 21, 2026

Closes #767

Email template, removed "View Full Details"

The email had a "View Full Details" button pointing to /administration/jobs/<job_id>/<run_id>, but that page requires administration-access, which most recipients don’t have → leads to a permission error.

Removed the button from both HTML and plain-text templates. Summary + safe links are still there.

"View Changes" works for curators

Curators could access the Harvester Reports page, but "View Changes" failed due to can_search_revisions not including the harvester-curator path (blank page / 403).

Added HarvesterCurator to can_search_revisions, so curators can now actually see revision history.

Download button now returns real run logs

Previously, download used audit logs - only showed what got published.

Now it queries JOBS_LOGGING_INDEX using run_id + job_id and formats output like the admin UI:

[timestamp] LEVEL message
grouped by task_id
includes run status + failure message
respects max results limit

The .log is now a proper offline copy of the run logs.

Harvester Reports shows only system publishes

Page was showing all record.publish logs (including manual ones).

Now also filters user_id = "system" (backend + frontend), so it only shows harvester activity.

@TahaKhan998 TahaKhan998 force-pushed the fix-harvester-report-flow branch 2 times, most recently from d0ad3f5 to da34f38 Compare April 22, 2026 07:33
Comment thread site/cds_rdm/harvester_download/resources/resource.py Outdated
if not run:
return {"message": "Run not found"}, 404

full_index_name = prefix_index(current_app.config["JOBS_LOGGING_INDEX"])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the service layer here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

)
if status in ("FAILED", "PARTIAL_SUCCESS"):
banner = "Job failed" if status == "FAILED" else "Job partially succeeded"
header.append("")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the end result look like? Can you attach an example file?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can_create = [AuthenticatedRegularUser(), SystemProcess()]
can_read = RDMRecordPermissionPolicy.can_read + [ArchiverRead()]
can_search = RDMRecordPermissionPolicy.can_search + [ArchiverRead()]
can_search_revisions = RDMRecordPermissionPolicy.can_search_revisions + [
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So curators can use the "View Changes" button on the Harvester Reports page.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make sure they can only see revisions made by system user?
if they can see more, let's discuss on how to tackle it before jumping to implementations

@TahaKhan998 TahaKhan998 force-pushed the fix-harvester-report-flow branch from da34f38 to ddd6fd8 Compare April 22, 2026 09:10
) No newline at end of file
hidden_params=[
["action", "record.publish"],
["user_id", "system"],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you check by any chance what does the REST API return for audit logs? we need to make sure REST API does not return wrong entries for curators role (on audit logs endpoint directly, not on this admin one)

except ValueError:
return {"message": "Invalid harvester run query"}, 400

run_query = Run.query.filter_by(started_at=started_at, parent_run_id=None)
Copy link
Copy Markdown
Contributor

@kpsherva kpsherva Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the intention correctly - are you selecting the run based only on the started at time? if yes - I would make it more precise - it needs to be a run only of the specific harvester type of task (not a job - because a job is an instance of celery task, and if it is not created on the environment, then it will not exist)
otherwise, if there will be another run (of other type) which started the same time - unlikely but not impossible, then you will return logs of that other run

also, after additional thought... wouldn't it be better to rely on the run identifier rather than on it's timestamp? identifier would be more reliable, no?

header.append(
f"Finished: {_format_timestamp(run.finished_at.isoformat())}"
)
if status in ("FAILED", "PARTIAL_SUCCESS"):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would include the success ones too, because it might contain warnings (or debug or info, depending on the settings), even if it was successful run

return raw

# Group by context.task_id in first-seen order (RunsLogs.js buildLogTree).
task_groups = {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasn't this grouping already done somewhere in the code? I am having flashbacks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sandbox: harvester notification email includes inaccessible "view full details" link

3 participants