
Relax parsing requirements to allow hyphens and periods#139

Merged
ColtonPayne merged 7 commits into main from update_parsing_logic on Apr 22, 2026

@ColtonPayne commented Apr 17, 2026

Summary

Relaxes identifier/entity validation across the fact and rule parsers, splits the rule parser's single identifier regex into separate predicate/component regexes, and aligns fact and rule component rules so any entity valid as a fact can also appear as a grounded atom in a rule.

Final regex layout

| Role | Regex | Leading digit? | . / - | @ |
|---|---|---|---|---|
| Fact predicate | `[a-zA-Z_][a-zA-Z0-9_.\-]*` | no | yes | no |
| Fact component | `[a-zA-Z0-9_][a-zA-Z0-9_.@\-]*` | yes | yes | yes |
| Rule predicate | `[a-zA-Z_][a-zA-Z0-9_.\-]*` | no | yes | no |
| Rule component | `[a-zA-Z0-9_][a-zA-Z0-9_.@\-]*` | yes | yes | yes |

Fact and rule components share the same regex. Predicates are stricter than components (no leading digit, no @).
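
To make the layout concrete, here is a minimal standalone sketch of the two patterns. The constant names follow this description, but the `is_predicate` / `is_component` helpers and the use of `fullmatch` are assumptions for illustration, not the PyReason source.

```python
import re

# Patterns copied from the regex layout above.
_PREDICATE_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_.\-]*")
_COMPONENT_RE = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9_.@\-]*")


def is_predicate(name: str) -> bool:
    # Hypothetical helper: fullmatch requires the whole string to fit.
    return _PREDICATE_RE.fullmatch(name) is not None


def is_component(name: str) -> bool:
    # Hypothetical helper, same idea for entities.
    return _COMPONENT_RE.fullmatch(name) is not None


print(is_predicate("has-vuln"))           # True
print(is_predicate("cve.2024.1234"))      # True
print(is_predicate("1abc"))               # False: leading digit
print(is_predicate("a@b"))                # False: @ is component-only
print(is_component("123"))                # True
print(is_component("alice@example.com"))  # True
print(is_component("@alice"))             # False: leading @
```

Because both character classes of the predicate pattern are subsets of the component ones, any string that passes as a predicate also passes as a component, but not the reverse.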

Changes

pyreason/scripts/utils/fact_parser.py

  • Added _PREDICATE_RE and _COMPONENT_RE.
  • Added _validate_predicate() and _validate_component() helpers.
  • Replaced the inline predicate checks and the ad-hoc bans on (, ), and : in components; both are now covered by the regexes.
  • Node components are now validated (previously only edge components had any validation, and only an empty-check).

pyreason/scripts/utils/rule_parser.py

  • Split _IDENTIFIER_RE into _PREDICATE_RE (identifiers) and _COMPONENT_RE (entities — matches fact_parser._COMPONENT_RE).
  • _validate_component_name now uses _COMPONENT_RE; removed the now-unreachable digit-start error branch.
  • Error messages updated to reflect the new allowed sets.

tests/unit/dont_disable_jit/test_rule_parser.py

  • Negative-case inputs switched from - (now valid) to ! (still invalid) in three tests.
  • Removed test_head_variable_starts_with_digit — digit-leading rule components are now valid by design.

tests/api_tests/test_pyreason_reasoning.py

  • 13 facts changed from person("A") to person(A): quoted entity names are no longer valid because " is not in the component allowlist, which is stricter here than the earlier ad-hoc bans on (, ), and :.

Test plan

  • Parser unit tests pass (168/168)
  • All 11 previously-failing API reasoning tests pass after the quoted→unquoted fix
  • Smoke tests confirm:
    • Facts with hyphens/periods/digits/@ parse: has-vuln(node-1), cve.2024.1234(host.a, host.b), person(123), user(alice@example.com)
    • Rule components accept the same entity shapes: p(1X) <- b(1X), p(a@b) <- q(a@b)
    • Invalid chars (!, ", leading @, ~, /) are still rejected
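
The smoke-test inputs above can be reproduced with a small standalone checker. This is only a sketch: `parse_atom` and `_ATOM_SHAPE_RE` are hypothetical names, and the real parsers handle more syntax (for example bounds and multi-clause rule bodies) that is left out here.

```python
import re
from typing import List, Tuple

_PREDICATE_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_.\-]*")
_COMPONENT_RE = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9_.@\-]*")
# Hypothetical outer shape: predicate(component) or predicate(c1, c2).
_ATOM_SHAPE_RE = re.compile(r"\s*([^()]*)\((.*)\)\s*")


def parse_atom(text: str) -> Tuple[str, List[str]]:
    """Split 'pred(a, b)' into its parts and validate each one."""
    shape = _ATOM_SHAPE_RE.fullmatch(text)
    if shape is None:
        raise ValueError(f"Expected predicate(component, ...): {text!r}")
    predicate = shape.group(1).strip()
    components = [part.strip() for part in shape.group(2).split(",")]
    if not _PREDICATE_RE.fullmatch(predicate):
        raise ValueError(f"Invalid predicate: {predicate!r}")
    for component in components:
        if not _COMPONENT_RE.fullmatch(component):
            raise ValueError(f"Invalid component: {component!r}")
    return predicate, components


print(parse_atom("has-vuln(node-1)"))
# ('has-vuln', ['node-1'])
print(parse_atom("cve.2024.1234(host.a, host.b)"))
# ('cve.2024.1234', ['host.a', 'host.b'])
print(parse_atom("user(alice@example.com)"))
# ('user', ['alice@example.com'])

# Both atoms of a single-clause rule accept the same entity shapes.
print([parse_atom(atom) for atom in "p(1X) <- b(1X)".split("<-")])
# [('p', ['1X']), ('b', ['1X'])]

for bad in ['person("A")', "p(a!)", "p(@a)", "p(~a)", "p(a/b)"]:
    try:
        parse_atom(bad)
    except ValueError as err:
        print("rejected:", err)
```

The person("A") case is the same failure that required the quoted-to-unquoted change in the API tests.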

Note

The original PR title and first commit mention "spaces", but spaces are not allowed; only -, ., and @ were added across the changes.

@ColtonPayne changed the title from "Relax parsing requirements to allow hyphens and spaces" to "Relax parsing requirements to allow hyphens and periods" on Apr 18, 2026

@jaikrishnap98 left a comment

Looks good to me. I ran the tests and also verified the changes.

@ColtonPayne merged commit c7ff782 into main on Apr 22, 2026
3 checks passed