Skip to content

PT download: check "just use a link" with HEAD request#192

Draft
carylwyatt wants to merge 12 commits intomainfrom
pt-download-head-request
Draft

PT download: check "just use a link" with HEAD request#192
carylwyatt wants to merge 12 commits intomainfrom
pt-download-head-request

Conversation

@carylwyatt
Copy link
Member

@aelkiss Question for you regarding the new "just use a link" choice for "simple" downloads that don't use the callback function from imgsrv.

If the link is broken/malformed or imgsrv gives a 403/404/500, there isn't anything we can do with that if it's "just a link." There isn't a way to catch an error or make a suggestion to the user to try again because it's a link, not a fetch.

We probably don't want to assume that if the link is correct that imgsrv will always return what is requested. I think we pursued this path because my implementation was fetching and returning a blob, and we were concerned about memory/performance if the download/images were large.

In my little bit of research on this, I've seen an intermediary option that checks the link via fetch with a HEAD request instead of GET request to just see if the request is valid. In the small bit of testing I did on this locally, the HEAD request is slowwww (like 3-4 seconds), but I think that could maybe work since we're using the download modal to "build" the download. In the current implementation, the "build" is instantaneous since it builds the link for the download button on the fly, but if we need a slight delay to send a HEAD request and check if the request is valid before allowing the download (and setting an error in the modal if we get an error), then the download modal is the perfect place to do that.

Will an extra request to imgsrv add significant problems? Any other gotchas I haven't thought through here?

Just some thoughts on this Friday afternoon. I know this process has already taken some time, but this is my last lingering question.

P.S. This is staged on dev-3 so I could test the download.

@carylwyatt carylwyatt requested a review from aelkiss March 6, 2026 20:56
return `${request}${newAction}?${params.toString()}`
})

async function checkFetch(url) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moseshll already reviewed the rest of the code, so I'll just comment on the parts that are new on this branch. Here's an async function with the HEAD request that also sends the error to matomo and updates some variables that will later make sure the user gets an error message and can't click the download button.

if (isSimpleDownload()) {
downloadInProgress = true;

checkFetch(simpleUrl)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the checkFetch functionality to the submitDownload function fired by the Download button in the form in the sidebar.

<div xxstyle="width: 30rem">
<div>
{#if status.percent < 100}
{#if simpleDownload && downloadInProgress}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New error handling in the modal.

{/if}
{#if downloadInProgress}
<span class="btn btn-primary disabled"> Download </span>
{:else if simpleDownload && downloadError}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a "cancel" button if the download has an error.

@carylwyatt
Copy link
Member Author

The UI on this needs some additional polishing, but I didn't want to go too far down that path if this isn't a viable option. Thanks!

@carylwyatt carylwyatt marked this pull request as draft March 9, 2026 13:09
@aelkiss
Copy link
Member

aelkiss commented Mar 10, 2026

Thoughts on general implementation:

  • What's the behavior in the interface if the GET returns a 403/404/500? Does the download just silently fail / download an empty file?
  • How does imgsrv handle a HEAD request? Ideally, it might check some things like if that volume/page exists and the user is authorized to get it?

Trying to think through what we'd learn about failures if we implement the HEAD option:

  • if imgsrv is in a bad state or generally timing out (although there's no guarantee it would be the same as a subsequent GET request)
  • if there's some problem with session & authentication (although I think pageturner already tries to compensate for this)
  • if requests to imgsrv are just getting blocked entirely (throttling, cloudflare, etc)

However:

  • The HEAD request should tell us if imgsrv thinks the download should succeed, but generally if the user is being presented a download link then at least pageturner thinks it should succeed.
  • We could certainly end up with cases where the HEAD request succeeds but the GET request ultimately fails.
  • We want to know about scenarios where the download ultimately doesn't succeed.

I'll think about this some more.

@carylwyatt
Copy link
Member Author

What's the behavior in the interface if the GET returns a 403/404/500? Does the download just silently fail / download an empty file?

Before I added the download attribute to the link, the browser would load the page, which, in my case looked like a single word on an unstyled page: Restricted. (I also occasionally got an unstyled page with "Your download is in progress. Please try again later." message from _in_progress_alert. But that's a status 200, I just haven't seen it again since I added the download attribute).
image

After I added the download attribute to the link, yes, just a silent fail with an empty download (which looks like this in Brave-flavored Chrome).
image

With the download attribute on the link, it's not immediately obvious that anything went wrong with the download because the user is still presented with the "Save As" dialog (or at least I was presented with that window).

How does imgsrv handle a HEAD request? Ideally, it might check some things like if that volume/page exists and the user is authorized to get it?

For a valid request, it returned a 200 and all the response headers it would return with a GET request.
image

On this request, I removed the id param, so now I'm getting a 500:
image

Trying to think through what we'd learn about failures if we implement the HEAD option

For me, it's more about letting the user know that something went wrong, and the "just a link" failures don't make that obvious since something downloaded (kind of). We get complaints from people that they tried to download and "nothing happened" and we can't recreate that on our end. With this HEAD request step, I can update the UI to tell the user that something went wrong and to try again. This also gives me a place to push the matomo tag with the error code and the URL that was attempted for download so we can track when downloads fail. Without this step, I'm sure there's a way to tell from the logs that someone attempted to reach this URL but was denied... right?

We could certainly end up with cases where the HEAD request succeeds but the GET request ultimately fails.

I didn't think of that. But of course that's true! Hmm...

We want to know about scenarios where the download ultimately doesn't succeed.

Yes. And in the best case scenario also let the user know the download didn't succeed.

@aelkiss
Copy link
Member

aelkiss commented Mar 10, 2026

OK -- I think it probably makes sense to go ahead and try the HEAD option, probably combined with some work in imgsrv to optimize HEAD requests. I'm looking at some other things in imgsrv anyway so I can take a look at options there.

@aelkiss
Copy link
Member

aelkiss commented Mar 10, 2026

For imgsrv I think what we want to do is bail out before generating the actual result if it's a HEAD request -- that is, do all the setup & check access, but don't do the actual generation, for example:

https://github.com/hathitrust/babel/blob/main/imgsrv/lib/SRV/Volume/Base.pm#L129

We probably need to put such a check in every run method?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants