
Upgrade lsdb to >=0.8 #331

Open
jaladh-singhal wants to merge 2 commits into Caltech-IPAC:main from jaladh-singhal:lsdb-upgrade


@jaladh-singhal jaladh-singhal commented Jun 3, 2026

Member

I chose 0.8.1 because that's what we're using in fornax-demo-notebooks, and it was a hotfix for problems in 0.8.0 (https://github.com/astronomy-commons/lsdb/releases/tag/v0.8.1).

Closes #197

@jaladh-singhal jaladh-singhal self-assigned this Jun 3, 2026
@jaladh-singhal jaladh-singhal added the maintenance (General maintenance of the content and/or infrastructure) and infrastructure (Infrastructure related issues/PRs.) labels Jun 3, 2026

@troyraen troyraen left a comment

Contributor


irsa-hats-with-lsdb.md
ztf_lc_df = ztf_lc.compute()

    memory_limit=None  # to prevent it from running out of memory
) as client:
    print(f"This may take more than a few minutes to complete. You can monitor progress in Dask dashboard at {client.dashboard_link}")
    ztf_lcs_df = ztf_lcs.compute()

Contributor


This is causing CircleCI to run out of memory and fail. I'll try running it when I'm back at a computer.

Member


RAM usage seems to peak a few times in the latest run, but I have never seen this dask traceback before, so it's a bit different right now.

Contributor


I tried a few different things to reduce the memory usage of this call. Running it without a client used the least for me (5.5G max). In the other notebook, it seems the cumulative effect of multiple .compute() calls put us over the limit, so we needed a client for the index call so that it would release the memory. Here, it's the added overhead of using a client that puts us over the limit.

FWIW, the only other ways I found to keep this call under the 8G limit of our CircleCI machine were to use n_workers=1 (7.3G max) or to reduce to a very minimal set of columns, columns = ['objectid', 'lightcurve.mag', 'lightcurve.hmjd'] (7.7G max). I think both of those are worse options for the end user.
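
A rough way to compare alternatives like these is to record the peak memory of each variant. The sketch below is only an illustration using Python's standard `tracemalloc` module. `peak_python_memory` and `build_rows` are hypothetical names, and the row-building function is a stand-in for the real `.compute()` call. `tracemalloc` sees only allocations made through Python's allocator in the current process, so it would not capture memory held by separate Dask worker processes. It is not how the figures quoted above were measured.

```python
import tracemalloc
from typing import Any, Callable, List, Tuple


def peak_python_memory(func: Callable[[], Any]) -> Tuple[Any, int]:
    """Run func() and return (result, peak traced bytes while it ran)."""
    tracemalloc.start()
    try:
        result = func()
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_rows(n_rows: int, n_columns: int) -> List[List[float]]:
    """Hypothetical stand-in workload: a table with n_columns values per row."""
    return [[float(i)] * n_columns for i in range(n_rows)]


# Compare a wide selection against a minimal one, analogous to trimming columns.
wide, wide_peak = peak_python_memory(lambda: build_rows(20_000, 12))
narrow, narrow_peak = peak_python_memory(lambda: build_rows(20_000, 3))

print(f"12 columns: {wide_peak / 1e6:.1f} MB peak")
print(f" 3 columns: {narrow_peak / 1e6:.1f} MB peak")
```

For a whole notebook run that includes worker processes, a process-level or system-level monitor would be the more appropriate tool.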

Member


OK, let's skip this on the CircleCI job then and see if it passes on GHA. I'm planning to drop CircleCI altogether anyway in the near future.
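
One possible way to express "skip on CircleCI, run on GHA" is an environment check, sketched below. This is an assumption about approach, not what this repository does; the skip may instead live in the CI configuration. CircleCI sets the built-in variable `CIRCLECI=true` and GitHub Actions sets `GITHUB_ACTIONS=true`. The helper names `running_on_circleci` and `should_run_heavy_cell` are hypothetical.

```python
import os
from typing import Mapping, Optional


def running_on_circleci(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the CircleCI built-in variable CIRCLECI is 'true'."""
    env = os.environ if environ is None else environ
    return env.get("CIRCLECI", "").lower() == "true"


def should_run_heavy_cell(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical policy: run the memory-heavy step everywhere except CircleCI."""
    return not running_on_circleci(environ)


if should_run_heavy_cell():
    print("Running the memory-heavy step.")
else:
    print("Skipping the memory-heavy step on CircleCI.")
```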

Contributor


If we just remove the client here so that this notebook uses a max of 5.5G, will that be enough to leave it in the CircleCI job?

Member


I think that may just work, but it will be easier to tell if you push a commit and try it.

@bsipocz bsipocz added the GHA buildhtml (Enable extra buildhtml job on GHA) label Jun 8, 2026

Labels

GHA buildhtml: Enable extra buildhtml job on GHA
infrastructure: Infrastructure related issues/PRs.
maintenance: General maintenance of the content and/or infrastructure


Development

Successfully merging this pull request may close these issues.

Update notebooks for lsdb v0.8 crossmatch() changes

3 participants