fix: improve harvester reports access and log downloads#773
fix: improve harvester reports access and log downloads#773TahaKhan998 wants to merge 1 commit intoCERNDocumentServer:masterfrom
Conversation
d0ad3f5 to
da34f38
Compare
| if not run: | ||
| return {"message": "Run not found"}, 404 | ||
|
|
||
| full_index_name = prefix_index(current_app.config["JOBS_LOGGING_INDEX"]) |
There was a problem hiding this comment.
We should use the service layer here.
| ) | ||
| if status in ("FAILED", "PARTIAL_SUCCESS"): | ||
| banner = "Job failed" if status == "FAILED" else "Job partially succeeded" | ||
| header.append("") |
There was a problem hiding this comment.
How does the end result look like? Can you attach an example file?
There was a problem hiding this comment.
| can_create = [AuthenticatedRegularUser(), SystemProcess()] | ||
| can_read = RDMRecordPermissionPolicy.can_read + [ArchiverRead()] | ||
| can_search = RDMRecordPermissionPolicy.can_search + [ArchiverRead()] | ||
| can_search_revisions = RDMRecordPermissionPolicy.can_search_revisions + [ |
There was a problem hiding this comment.
So curators can use the "View Changes" button on the Harvester Reports page.
There was a problem hiding this comment.
can you make sure they can only see revisions made by system user?
if they can see more, let's discuss on how to tackle it before jumping to implementations
da34f38 to
ddd6fd8
Compare
| ) No newline at end of file | ||
| hidden_params=[ | ||
| ["action", "record.publish"], | ||
| ["user_id", "system"], |
There was a problem hiding this comment.
did you check by any chance what does the REST API return for audit logs? we need to make sure REST API does not return wrong entries for curators role (on audit logs endpoint directly, not on this admin one)
| except ValueError: | ||
| return {"message": "Invalid harvester run query"}, 400 | ||
|
|
||
| run_query = Run.query.filter_by(started_at=started_at, parent_run_id=None) |
There was a problem hiding this comment.
I am not sure I understand the intention correctly - are you selecting the run based only on the started at time? if yes - I would make it more precise - it needs to be a run only of the specific harvester type of task (not a job - because a job is an instance of celery task, and if it is not created on the environment, then it will not exist)
otherwise, if there will be another run (of other type) which started the same time - unlikely but not impossible, then you will return logs of that other run
also, after additional thought... wouldn't it be better to rely on the run identifier rather than on it's timestamp? identifier would be more reliable, no?
| header.append( | ||
| f"Finished: {_format_timestamp(run.finished_at.isoformat())}" | ||
| ) | ||
| if status in ("FAILED", "PARTIAL_SUCCESS"): |
There was a problem hiding this comment.
I would include the success ones too, because it might contain warnings (or debug or info, depending on the settings), even if it was successful run
| return raw | ||
|
|
||
| # Group by context.task_id in first-seen order (RunsLogs.js buildLogTree). | ||
| task_groups = {} |
There was a problem hiding this comment.
wasn't this grouping already done somewhere in the code? I am having flashbacks :)
Closes #767
Email template, removed "View Full Details"
The email had a "View Full Details" button pointing to /administration/jobs/<job_id>/<run_id>, but that page requires administration-access, which most recipients don’t have → leads to a permission error.
Removed the button from both HTML and plain-text templates. Summary + safe links are still there.
"View Changes" works for curators
Curators could access the Harvester Reports page, but "View Changes" failed due to can_search_revisions not including the harvester-curator path (blank page / 403).
Added HarvesterCurator to can_search_revisions, so curators can now actually see revision history.
Download button now returns real run logs
Previously, download used audit logs - only showed what got published.
Now it queries JOBS_LOGGING_INDEX using run_id + job_id and formats output like the admin UI:
[timestamp] LEVEL message
grouped by task_id
includes run status + failure message
respects max results limit
The .log is now a proper offline copy of the run logs.
Harvester Reports shows only system publishes
Page was showing all record.publish logs (including manual ones).
Now also filters user_id = "system" (backend + frontend), so it only shows harvester activity.