Optimize quarter column computation using vectorized NumPy operations #1065

Open
vshnvii wants to merge 2 commits into malariagen:master from vshnvii:optimize-quarter-computation

vshnvii commented Mar 6, 2026

This PR replaces the row-wise pandas "apply()" used to compute the "quarter"
column with a vectorized NumPy implementation.

Previous implementation:

df["quarter"] = df.apply(
    lambda row: ((row.month - 1) // 3) + 1 if row.month > 0 else -1,
    axis="columns",
)

Row-wise "apply()" with axis="columns" calls the lambda once per row in Python, passing each row in as a separate Series, so it becomes slow on large datasets.

The new implementation uses NumPy vectorization:

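import numpy as np

df["quarter"] = np.where(
    df["month"] > 0,  # condition: month is positive
    ((df["month"] - 1) // 3) + 1,  # quarter 1-4 for months 1-12
    -1,  # value used where the condition is False
)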

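The vectorized version replaces one Python call per row with a few whole-column operations, which should be considerably faster on large tables, while returning the same values as before, including -1 wherever month is not positive. It also follows the common pandas practice of preferring vectorized operations to row-wise "apply()".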
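A rough way to check the speed-up locally, sketched with the standard-library timeit module. The DataFrame is synthetic (a hypothetical 20,000 rows of random months from -1 to 12), and the helper names quarter_apply and quarter_where are hypothetical, so any timings it prints are illustrative only and will vary with the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic table: hypothetical size, months drawn uniformly from -1..12.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"month": rng.integers(-1, 13, size=20_000)})


def quarter_apply(frame):
    # Previous approach: one Python-level lambda call per row.
    return frame.apply(
        lambda row: ((row.month - 1) // 3) + 1 if row.month > 0 else -1,
        axis="columns",
    )


def quarter_where(frame):
    # New approach: condition and arithmetic run over whole columns.
    return np.where(frame["month"] > 0, ((frame["month"] - 1) // 3) + 1, -1)


# Both approaches must agree before their timings are worth comparing.
assert np.array_equal(quarter_apply(df).to_numpy(), quarter_where(df))

t_apply = timeit.timeit(lambda: quarter_apply(df), number=3)
t_where = timeit.timeit(lambda: quarter_where(df), number=3)
print(f"apply: {t_apply:.3f}s  np.where: {t_where:.3f}s")
```

Checking equality first means a faster timing can only come from the same result, not from a subtly different one.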

vshnvii (Author) commented Mar 6, 2026

Hello @jonbrenas @leehart, I'm exploring the repository to start contributing in preparation for GSoC 2026.
I've opened a couple of small PRs to get familiar with the codebase and contribution workflow.
I would love to work on more substantial improvements or features related to metadata handling or data processing in the project.
If there are any issues or areas where contributors are particularly needed, I'd be happy to help.

Labels: None yet
Projects: None yet

1 participant