added miniray support for legacy codedir #59

Draft
MarkNDeaconu wants to merge 28 commits into master from legacy-codedir-support


Conversation

@MarkNDeaconu

Contributor

No description provided.

TheConverseEngineer and others added 28 commits June 10, 2026 14:51
Bugfix: Check for pending model loads before re-attempting
Added a warning for an empty queue, as a notice to people who have not pulled master and are on the wrong queue.
PYTHONPYCACHEPREFIX redirects all bytecode reads and writes to a per-node dir under /var/cache/miniray, so imports never write to the codedir. Writing to the codedir can cause issues on NFS with an older Linux kernel (torvalds/linux@99bc9f2).

Each job gets its own subdir, so tasks sharing a venv share the cache, and cleanup_venvs deletes the local subdir when the venv is evicted. Task processes run as TASK_UID so they can write to the user-owned pycache dirs.

PYTHONDONTWRITEBYTECODE is slower by comparison, since recompiling happens per task (2.5x more time than with a warm cache for a torch + pandas + numpy import).
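
The redirect described above can be sketched roughly as follows. This is a minimal illustration, not the miniray implementation: `pycache_env`, `run_import_demo`, the `job-123` id, and the temporary directories standing in for the codedir and /var/cache/miniray are all hypothetical. It assumes Python 3.8 or newer, where PYTHONPYCACHEPREFIX is available.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def pycache_env(cache_root: str, job_id: str) -> dict:
    """Return a copy of the environment that sends bytecode to cache_root/job_id.

    cache_root and job_id are hypothetical names for this sketch.
    """
    prefix = Path(cache_root) / job_id
    prefix.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ)
    # If this variable were set, no bytecode would be written at all.
    env.pop("PYTHONDONTWRITEBYTECODE", None)
    env["PYTHONPYCACHEPREFIX"] = str(prefix)
    return env


def run_import_demo() -> tuple:
    """Import a module from a throwaway codedir and report where the .pyc landed."""
    with tempfile.TemporaryDirectory() as codedir, \
            tempfile.TemporaryDirectory() as cache_root:
        Path(codedir, "hello_mod.py").write_text("VALUE = 1\n")
        env = pycache_env(cache_root, "job-123")
        script = f"import sys; sys.path.insert(0, {codedir!r}); import hello_mod"
        subprocess.run([sys.executable, "-c", script], env=env, check=True)
        # With the prefix set, no __pycache__ directory should appear in the codedir.
        wrote_to_codedir = Path(codedir, "__pycache__").exists()
        cached_files = list(Path(cache_root).rglob("*.pyc"))
        return wrote_to_codedir, len(cached_files)


if __name__ == "__main__":
    print(run_import_demo())
```

A second run with the same prefix would reuse the cached bytecode, which is the warm-cache case the commit message compares against.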
https://docs.python.org/3/library/os.html#os.waitpid

"options is an OR combination of flags. If it contains WNOHANG and there are no matching children in the requested state, (0, 0) is returned."

pid, returncode = os.waitpid(self.proc.pid, os.WNOHANG) returns (0, 0) for a task that has not terminated. On a timeout error, the returncode is therefore set to 0 for each task, leaving them as zombies. poll() is a better alternative since it returns None when the task has not terminated: https://docs.python.org/3/library/subprocess.html#subprocess.Popen.poll
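
The difference between the two checks can be seen in a small, POSIX-only sketch. The helper name `compare_liveness_checks` and the sleeping child process are made up for illustration and are not taken from the miniray code.

```python
import os
import subprocess
import sys


def compare_liveness_checks():
    """Start a long-running child and compare os.waitpid(WNOHANG) with Popen.poll()."""
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        # The child is still running, so waitpid reports (0, 0). Reading the
        # second element as a return code would look like a clean exit.
        waitpid_result = os.waitpid(proc.pid, os.WNOHANG)
        # poll() returns None for a running child, which cannot be mistaken
        # for exit status 0.
        poll_result = proc.poll()
    finally:
        # Kill and reap the child so the demo itself leaves no zombie behind.
        proc.kill()
        proc.wait()
    return waitpid_result, poll_result, proc.returncode


if __name__ == "__main__":
    print(compare_liveness_checks())
```

After `kill()` and `wait()`, `proc.returncode` holds the real exit status (a negative signal number on POSIX), so a timeout handler built on `poll()` can tell "still running" apart from "exited with 0".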
@MarkNDeaconu

Contributor Author

We decided not to go in this direction. Leaving the PR open for a bit in case it comes up again.



2 participants