Skip to content

[netrid] Check response validity when calling all_flights#1388

Open
the-glu wants to merge 1 commit intointeruss:mainfrom
Orbitalize:fix_1386
Open

[netrid] Check response validity when calling all_flights#1388
the-glu wants to merge 1 commit intointeruss:mainfrom
Orbitalize:fix_1386

Conversation

@the-glu
Copy link
Contributor

@the-glu the-glu commented Mar 10, 2026

Fix #1386 by checking all_flights responses for errors, before doing anything else.

A few points:

  • Most of steps already had a ISA query check, but is it really an 'ISA query' since it's also querying flights?
  • Some checks expects the query to fail (e.g. with too large area checks) so checks are not done every time. I noticed that some Misbehavior won't work as they expect the initial query to pass, no? (Here there will no flight if area is too large)(But that previous behavior).
  • diagonal_km is computed twice to avoid returning two values, but we could change that.

Copy link
Member

@BenjaminPelletier BenjaminPelletier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some checks expects the query to fail (e.g. with too large area checks) so checks are not done every time. I noticed that some Misbehavior won't work as they expect the initial query to pass, no? (Here there will no flight if area is too large)(But that previous behavior).

Yes, making sure our documentation is correct and actual actions match documentation are both important, and I think there is probably some existing tech debt that will need to be resolved as part of the main issue. I think the core scenario does work currently though; here is the sequence view artifact from the latest CI run.

Note though that there seem to be 6 queries for the test step, but only 1 check for the whole test step. That's a strong code smell as I would expect there to be something we would want to check regarding the reasonableness of the response for every query we make. Of course, this is existing tech debt and not caused by this PR, so any fixes could be in a separate PR.

diagonal_km is computed twice to avoid returning two values, but we could change that.

I think the strong presumption to favor clarity over maximum performance holds here and computing twice is no problem if that makes the code easier to read, understand, maintain, etc.


self.record_queries(sp_observation.queries)

if not sp_observation.success and not self._is_area_too_large(rect):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you noticed in your PR comment, rid.all_flights doesn't just perform an ISA query; it performs an ISA query (potentially) followed by a bunch of queries for flights and flight details from individual USSs. The DSS failing to return ISAs is a violation of different requirements than the core USS failing to respond correctly to a flights query or a flight details query. Probably part of #1386 involves resolving the TODO just above this section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think the todo is resolved, however the name is wrong (and probably wrong for existing ISA query as well, since they do the same).

Do you have a better name in mind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For misbehaviour I switched to "Initial queries" and improved description based on comment bellow, but I think the point still hold for display data evaluator ones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember that test steps are where actions are performed, and then checks (following the actions) determine if any requirements were violated based on the information obtained from those actions. With that in mind, I'm not sure what "Initial queries" would be checking. I think this first check description makes sense:

interuss.f3411.dss_endpoints.SearchISAs requires a USS providing a DSS instance to implement the DSS endpoints of the OpenAPI specification. If uss_qualifier is unable to query the DSS for ISAs, this check will fail.

But, it seems like the thing we're checking is "Did the (single) DSS query to search for ISAs succeed?" I would expect that check to be named something like "Successful ISA search" or "DSS ISAs search succeeded" or "ISA query succeeded" or "ISA query success".

This check description makes less sense to me (additional emphasis added):

interuss.f3411.dss_endpoints.SearchISAs requires a USS providing a DSS instance to implement the DSS endpoints of the OpenAPI specification. Then, in order to properly test whether the SP handles authentication correctly, after identifying the SP contact information via its ISA in the DSS, this step will first attempt to do a flights request with the proper credentials to confirm that the requested data is indeed available to any authorized query. If uss_qualifier is unable to query the DSS for ISAs or flights, this check will fail.

This seems to be documenting a check (the first sentence) then another, separate test step (second sentence). Actions in a test step belong in the test step documentation (one documentation level higher than check documentation) and then checks should describe only how the information already obtained will be evaluated to determine if a requirement was violated.

So, it seems like there are three actions in this test step (due to rid.all_flights):

  1. Query DSS for ISAs
  2. For each ISA found, query the USS for flights
  3. For each flight found, query the USS for flight details

The InterUSS requirement interuss.f3411.dss_endpoints.SearchISAs applies only to the information obtained from action 1. interuss.f3411.dss_endpoints.SearchISAs doesn't relate at all to the information obtained in actions 2 nor 3 -- instead, presumably requirements NET0710,1 and NET0710,2 apply to actions 2 and 3, respectively (when those queries are expected to succeed). But, these should be two separate checks since the information we're evaluating for NET0710,1 is coming from action 2 and the information we're evaluating for NET0710,2 is coming from action 3 -- the action, information, and relevant requirement are all different between action 2/NET0710,1 and action 3/NET0720,2 so they shouldn't be combined into a single check.

The original problem prompting #1386 was that these two checks AND the ISA query check were all combined into a single check. When that check failed, it was extremely unclear what had actually gone wrong. That's because a check for A && B && C && D failing provides little information as to whether A failed, B failed, and/or C failed. This was especially problematic because the check only documented that D was being checked, so the fact that B was failing was extremely difficult to determine.

@the-glu
Copy link
Contributor Author

the-glu commented Mar 11, 2026

Yes, making sure our documentation is correct and actual actions match documentation are both important, and I think there is probably some existing tech debt that will need to be resolved as part of the main issue. I think the core scenario does work currently though; here is the sequence view artifact from the latest CI run.

I think that the issue that in misbehavior, the test should not try to do authentication tests with too large area (nor multiple ones since it's the same), because it will directly fail at initial step, therefore not finding any flights to test.

image

Commenting line 100/101, and doing on "self._rid_version.max_details_diagonal_km * 1000 - 100" check seems to do it more cleanly:

image

(Also print-based-debugging show similar result, however I see the 3 pass/pooling (too large without flights to test, clustered ok and details ok, but I don't understand why the rapport show only 2 "Initial queries" (and one check)), the two OK pass seems to be merged)

Note though that there seem to be 6 queries for the test step, but only 1 check for the whole test step. That's a strong code smell as I would expect there to be something we would want to check regarding the reasonableness of the response for every query we make. Of course, this is existing tech debt and not caused by this PR, so any fixes could be in a separate PR.

Yes, with the new explicit check it will be better (see previous screenshots) so we're probably better now.

@the-glu
Copy link
Contributor Author

the-glu commented Mar 11, 2026

(Also print-based-debugging show similar result, however I see the 3 pass/pooling (too large without flights to test, clustered ok and details ok, but I don't understand why the rapport show only 2 "Initial queries" (and one check)), the two OK pass seems to be merged)

Answering to myself:

Because the pooler stop at the first function returning result.

That seems wrong, I would just switch to a fix, self._rid_version.max_diagonal_km * 1000 - 100 area for all tests no?

The goal is to evaluate authentification, not if initial query fail because it's too large; and the dedicated function to test search area is already restricting to this for that reason.

(I can side fix it there or in another PR)

@BenjaminPelletier
Copy link
Member

I think that the issue that in misbehavior, the test should not try to do authentication tests with too large area (nor multiple ones since it's the same), because it will directly fail at initial step, therefore not finding any flights to test.

The authentication step still seems valid with area-too-large. The correct response to a properly-authenticated "GET /flights" request is 413 (area too large). When that same "GET /flights" request is made without authentication, the correct response is 401 (413 reveals some information to an unauthenticated user).

It does seem like good coverage would probably (also) include a correct response of 200 to compare against the correct unauthenticated response of 401, but the current test scenario documentation doesn't specifically call for a 200 or 413 to compare against 401, so comparing 413 to 401 does seem to accurately fulfill/implement the test scenario documentation.

Note though that there seem to be 6 queries for the test step, but only 1 check for the whole test step. That's a strong code smell as I would expect there to be something we would want to check regarding the reasonableness of the response for every query we make. Of course, this is existing tech debt and not caused by this PR, so any fixes could be in a separate PR.

Yes, with the new explicit check it will be better (see previous screenshots) so we're probably better now.

Yes, I think a separate, explicit check for the DSS query is an improvement and could be sufficient for a single PR. But, to fully fix the scenario, I think the DSS query check should point to the DSS query it's checking (specify query_timestamp) and each query should have at least 1 check. A query with no check means "the result of this query could be literally anything and there would be nothing of concern to report" (in that case, why make the query at all?) In the new screenshot, the USS /flights query in e29 appears to have no checks. So, it could return literally anything (413: error, 200: flights, 200: cake recipe) and we wouldn't be justified reporting any concerns. If that's actually true, we probably shouldn't make the query. In the more likely event that's not true, we should document the check we'd perform -- presumably we expect that query to work properly, which means if it didn't return 413 we'd be concerned per NET0710,1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Check response validity separate from flight matching in RID tests

2 participants