-
Notifications
You must be signed in to change notification settings - Fork 4
fix: simplify URI handling when the same deployment URL is already opened #227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Small improvement where we get rid of emitting environment and ssh connection trigger events from new coroutines. StateFlow in Kotlin is a hot, conflated flow that keeps only the most recent value. In other words we can immediately update the value without needing to launch a new coroutine, and we won't block the current thread.
…ened Netflix reported that only seems to reproduce on Linux (we've only tested Ubuntu so far). I can’t reproduce it on macOS. First, here’s some context: 1. Polling workspaces: Coder Toolbox polls the deployment every 5 seconds for workspace updates. These updates (new workspaces, deletions,status changes) are stored in a cached “environments” list (an oversimplified explanation). When a URI is executed, we reset the content of the list and run the login sequence, which re-initializes the HTTP poller and CLI using the new deployment URL and token. A new polling loop then begins populating the environments list again. 2. Cache monitoring: Toolbox watches this cached list for changes—especially status changes, which determine when an SSH connection can be established. In Netflix’s case, they launched Toolbox, created a workspace from the Dashboard, and the poller added it to the environments list. When the workspace switched from starting to ready, they used a URI to connect to it. The URI reset the list, then the poller repopulated it. But because the list had the same IDs (but new object references), Toolbox didn’t detect any changes. As a result, it never triggered the SSH connection. This issue only reproduces on Linux, but it might explain some of the sporadic macOS failures Atif mentioned in the past. I need to dig deeper into the Toolbox bytecode to determine whether this is a Toolbox bug, but it does seem like Toolbox wasn’t designed to switch cleanly between multiple deployments and/or users. The current Coder plugin behavior—always performing a full login sequence on every URI—is also ...sub-optimal. It only really makes sense in these scenarios: 1. Toolbox started with deployment A, but the URI targets deployment B. 2. Toolbox started with deployment A/user X, but the URI targets deployment A/user Y. But this design is inefficient for the most common case: connecting via URI to a workspace on the same deployment and same user. While working on the fix, I realized that scenario (2) is not realistic. On the same host machine, why would multiple users log into the same deployment via Toolbox? The whole fix revolves around the idea of just recreating the http client and updating the CLI with the new token instead of going through the full authentication steps when the URI deployment URL is the same as the currently opened URL
With this commit we split the responsibilities. `CoderProtocolHander` is now only responsible searching the workspace and agent and orchestrating the IDE installation and launching. The code around initializing the http client and cli with a new URL and a new token, plus cleaning up the old resources like the polling loop and the list of environments. There are two major benefits to this approach: - allows us to easily share/reuse the logic around cleaning up resources and re-initializing the http client and cli without passing so many callbacks to CoderProtocolHandler (less coupling, code that is cleaner and easier to read, easier to maintain and test) - provides a nice and easy way to check whether the URI url is the same as the one from current deployment and properly react if they differ.
When Toolbox is closed and URI is executed the handler waits for the plugin to fully initialize with the last successful deployment after which it goes on with the URI handling logic. We can improve this situation by skipping the last successful deployment which anyway. This reduces a lot the time to an actual IDE connection.
Auto connect was running after logout on deployments that were opened when Toolbox was launched by a URI. This fix forces URI handler to disable auto connect for the URI connections.
code-asher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice 👌
| * Also called as part of our own logout. | ||
| */ | ||
| override fun close() { | ||
| softClose() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just kind of thinking out loud, not something we have to do now or in this PR, but it looks like the pattern we have is to close the client and cli and then immediately set/unset them, so I wonder if we would benefit from formalizing that, like if we had something like update(newClient, newCli) that closed the old ones if any and set new ones, so it is never possible to accidentally be in a sort of desynced state where you have a closed client/cli but have not updated or unset the client/cli.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be possible, I'll add it on my TODO list.
Netflix reported that only seems to reproduce on Linux (we've only tested Ubuntu so far).
I can’t reproduce it on macOS. First, here’s some context:
Polling workspaces:
Coder Toolbox polls the deployment every 5 seconds for workspace updates.
These updates (new workspaces, deletions,status changes) are stored in a
cached “environments” list (an oversimplified explanation). When a URI is executed,
we reset the content of the list and run the login sequence, which re-initializes
the HTTP poller and CLI using the new deployment URL and token. A new polling loop
then begins populating the environments list again.
Cache monitoring:
Toolbox watches this cached list for changes—especially status changes, which determine
when an SSH connection can be established.
In Netflix’s case, they launched Toolbox, created a workspace from the Dashboard, and the
poller added it to the environments list. When the workspace switched from starting to ready,
they used a URI to connect to it. The URI reset the list, then the poller repopulated it. But
because the list had the same IDs (but new object references), Toolbox didn’t detect any changes.
As a result, it never triggered the SSH connection. This issue only reproduces on Linux, but it
might explain some of the sporadic macOS failures Atif mentioned in the past.
I need to dig deeper into the Toolbox bytecode to determine whether this is a Toolbox bug, but
it does seem like Toolbox wasn’t designed to switch cleanly between multiple deployments and/or users.
The current Coder plugin behavior—always performing a full login sequence on every URI—is also ...sub-optimal.
It only really makes sense in these scenarios:
But this design is inefficient for the most common case: connecting via URI to a workspace on the
same deployment and same user. While working on the fix, I realized that scenario (2) is not realistic.
On the same host machine, why would multiple users log into the same deployment via Toolbox? The whole
fix revolves around the idea of just recreating the http client and updating the CLI with the new token
instead of going through the full authentication steps when the URI deployment URL is the same as the
currently opened URL
The fix focuses on simply recreating the HTTP client and updating the CLI token when the URI URL matches the existing deployment URL, instead of running a full login.
This PR splits responsibilities more cleanly:
The benefits would be: