@alexcrocha alexcrocha commented Jan 15, 2026

Description

This PR introduces ruby-rbs, a safe Rust wrapper for the RBS parser. It builds on #2807 (ruby-rbs-sys); please refer to that PR for the motivation and background on the two-crate approach.

ruby-rbs Overview

  • build.rs reads config.yml and generates Rust structs matching the C AST node types
  • Each node struct holds a pointer to the C node with lifetime bounds
  • Lifetimes ('a) tie all nodes to the parser, preventing use-after-free
  • SignatureNode implements Drop to free the parser when dropped
  • Includes a Visit trait for traversing the AST

Changes to config.yml

This PR adds two new fields to node definitions:

  • rust_name: Specifies the Rust struct name (e.g., BoolNode for RBS::AST::Bool)
  • optional: Documents which fields can be NULL in the C parser

These fields have no impact on existing code: the Ruby/C code generators ignore them.

alexcrocha and others added 30 commits January 14, 2026 14:31
This commit introduces the `ruby-rbs` crate, which will provide a safe,
high-level Rust API for the RBS C library. It follows the common Rust
pattern of separating the safe wrapper from the `*-sys` crate that
provides the raw FFI bindings.

The `ruby-rbs` crate will depend on `ruby-rbs-sys` for the unsafe C
bindings and will expose a safe, idiomatic Rust interface. This commit
sets up the foundation for that structure.

The initial implementation includes:
- The basic crate structure with its own Cargo.toml, declaring a
  dependency on `ruby-rbs-sys`.
- A build script (`build.rs`) that will be responsible for generating
  safe Rust wrappers from the C API. Currently, it only generates an
  empty `bindings.rs` file.
- The `ruby-rbs` crate is added to the main workspace `Cargo.toml`.

While the interaction is not yet implemented, this setup paves the way
for providing a robust Rust interface for RBS, which will improve safety
and developer experience.

The build script now reads the config.yml file and generates corresponding
Rust struct definitions for all RBS AST nodes.

Implementation details:
- Parse config.yml using serde to extract node definitions
- Generate proper Rust module hierarchy from :: namespace separators
- Apply Rust naming conventions:
  - Modules use snake_case
  - Structs remain PascalCase
- Handle Rust reserved keywords (Use -> UseDirective, Self -> SelfType)
- Smart PascalCase to snake_case conversion that correctly handles acronyms
  (e.g., 'AST' -> 'ast', not 'a_s_t')

The generated bindings create empty struct definitions organized in the
correct module hierarchy, laying the foundation for the safe Rust API
that will wrap the ruby-rbs-sys FFI bindings.
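
The naming rules above can be sketched as follows. This is an illustration only, not the code in `build.rs`: the helper names `to_snake_case` and `rename_reserved` are hypothetical, and the two renames are the ones listed in the commit message.

```rust
/// Converts a PascalCase identifier to snake_case, keeping runs of
/// capitals (acronyms) together: "AST" -> "ast", not "a_s_t".
fn to_snake_case(name: &str) -> String {
    let chars: Vec<char> = name.chars().collect();
    let mut out = String::with_capacity(name.len() + 4);
    for (i, &c) in chars.iter().enumerate() {
        if c.is_ascii_uppercase() && i > 0 {
            let prev = chars[i - 1];
            let next_is_lower = chars
                .get(i + 1)
                .map_or(false, |n| n.is_ascii_lowercase());
            // Start a new word after a lowercase letter or digit ("TypeAlias"),
            // or at the last capital of an acronym that is followed by a
            // lowercase letter ("ASTNode" -> "ast_node").
            let after_word = prev.is_ascii_lowercase() || prev.is_ascii_digit();
            let acronym_end = prev.is_ascii_uppercase() && next_is_lower;
            if after_word || acronym_end {
                out.push('_');
            }
        }
        out.push(c.to_ascii_lowercase());
    }
    out
}

/// Maps names that would clash with Rust keywords to safe struct names.
/// Only the two renames mentioned in the commit message are covered.
fn rename_reserved(name: &str) -> &str {
    match name {
        "Use" => "UseDirective",
        "Self" => "SelfType",
        other => other,
    }
}
```

Looking one character ahead is what keeps an acronym intact while still splitting it from the word that follows.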

Instead of auto-generating nested module paths from RBS nested naming
conventions, use explicit `rust_name` fields in `config.yml` and
generate flat structs.

- Add `rust_name` field to all node definitions in `config.yml`
- Remove complex module/path parsing logic from build.rs
- Generate flat structs (e.g., `ClassNode`) instead of nested modules
- Add `Node` enum to wrap all node types

This makes the generated Rust code easier to work with.

Handle rbs_string field types when generating Rust structs
from config.yml. The RBSString struct wraps rbs_string_t pointers and
provides an as_bytes() method that safely calculates string length using
pointer arithmetic.
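
A minimal sketch of the pointer-arithmetic idea, assuming the C string is described by a pair of `start`/`end` pointers. `RawString` is a hypothetical stand-in for `rbs_string_t` (its layout here is an assumption), and the wrapper is simplified compared to the real crate.

```rust
use std::os::raw::c_char;

/// Hypothetical stand-in for the C `rbs_string_t`; the start/end layout
/// is an assumption made for this sketch.
#[repr(C)]
struct RawString {
    start: *const c_char,
    end: *const c_char,
}

/// Thin wrapper around a pointer to the C string descriptor.
struct RBSString {
    pointer: *const RawString,
}

impl RBSString {
    /// # Safety
    /// `pointer` must reference a valid `RawString` whose `start..end`
    /// range is readable for as long as the wrapper is used.
    unsafe fn new(pointer: *const RawString) -> Self {
        RBSString { pointer }
    }

    /// Returns the string contents; the length is `end - start`.
    fn as_bytes(&self) -> &[u8] {
        // SAFETY: validity is guaranteed by the contract of `new`.
        unsafe {
            let raw = &*self.pointer;
            let len = raw.end.offset_from(raw.start);
            std::slice::from_raw_parts(raw.start as *const u8, len as usize)
        }
    }
}
```

Returning bytes rather than `&str` avoids assuming that the source is valid UTF-8.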

The `parse` function enables parsing RBS code from Rust.
This provides a safe Rust interface to the C parser, handling memory
management and encoding setup.

Since `bool` is a primitive type with direct FFI mapping between C and
Rust, we don't need a wrapper struct like we do for complex types
(`rbs_string_t`, etc.).

Symbol fields in RBS AST nodes store their values as constant IDs that
need to be resolved through the parser's constant pool. This safe
Rust wrapper (`RBSSymbol`) maintains a reference to the parser and
provides access to the symbol's name bytes, similar to how `RBSString`
handles string types.

The build script now generates accessors for `rbs_ast_symbol` fields
that properly pass both the symbol pointer and parser reference to
enable constant pool lookups.
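
The lookup described above can be pictured with a pure-Rust model. `Parser`, `intern`, and `constant` are hypothetical stand-ins for the C parser and its constant pool, and using the vector index as the ID is an assumption of this sketch.

```rust
/// Hypothetical model of a parser that owns a constant pool of
/// interned byte strings.
struct Parser {
    constant_pool: Vec<Vec<u8>>,
}

impl Parser {
    /// Interns `name` and returns its ID, reusing an existing entry.
    fn intern(&mut self, name: &[u8]) -> u32 {
        if let Some(pos) = self.constant_pool.iter().position(|c| c == name) {
            return pos as u32;
        }
        self.constant_pool.push(name.to_vec());
        (self.constant_pool.len() - 1) as u32
    }

    /// Resolves an ID back to the interned bytes.
    fn constant(&self, id: u32) -> Option<&[u8]> {
        self.constant_pool.get(id as usize).map(|c| c.as_slice())
    }
}

/// A symbol stores only the ID, so it needs the parser to find its name.
struct RBSSymbol<'p> {
    constant_id: u32,
    parser: &'p Parser,
}

impl<'p> RBSSymbol<'p> {
    fn name(&self) -> &'p [u8] {
        let parser: &'p Parser = self.parser;
        parser
            .constant(self.constant_id)
            .expect("symbol ID should come from this parser's pool")
    }
}
```

This is why the generated accessor has to pass both the symbol and the parser: the ID alone carries no text.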

Refactor node structs to use pointer-based access and add NodeList
iterator

Changes node generation from storing individual fields to holding a
single pointer to the C struct. This avoids duplicating data in Rust
structs and matches the pattern used in Prism's bindings. We just
maintain a thin wrapper around the C pointer and dereference it in
accessor methods.

Adds NodeList/NodeListIter to enable idiomatic Rust iteration over RBS's
linked list structures, and implements Node::new() factory method that
type-checks the C node pointer and constructs the appropriate Rust
variant with proper pointer casting.

Also adds convert_name() helper to generate C identifiers from RBS node
names (snake_case_t for types, UPPER_CASE for enum constants).
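
As a rough sketch of the iterator half of this change, the following walks a C-style singly linked list through raw pointers. `RawListNode` and `RawList` are hypothetical stand-ins for the C list types, and an `i32` payload replaces the node pointer to keep the example short.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for one cell of the C linked list.
#[repr(C)]
struct RawListNode {
    value: i32, // the real list would hold a node pointer here
    next: *mut RawListNode,
}

/// Hypothetical stand-in for the C list header.
#[repr(C)]
struct RawList {
    head: *mut RawListNode,
}

/// Borrowing view over a C list.
struct NodeList<'a> {
    raw: &'a RawList,
}

impl<'a> NodeList<'a> {
    fn iter(&self) -> NodeListIter<'a> {
        NodeListIter { current: self.raw.head, _list: PhantomData }
    }
}

/// Cursor that follows `next` pointers until it reaches NULL.
struct NodeListIter<'a> {
    current: *mut RawListNode,
    _list: PhantomData<&'a RawList>,
}

impl<'a> Iterator for NodeListIter<'a> {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        if self.current.is_null() {
            return None;
        }
        // SAFETY: non-null cells are assumed to stay valid for 'a.
        let cell = unsafe { &*self.current };
        self.current = cell.next;
        Some(cell.value)
    }
}
```

Implementing `Iterator` gives callers `for` loops, `map`, `collect`, and the rest of the adapter methods without any extra code.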

Many AST nodes in `config.yml` have location fields (`rbs_location`,
`rbs_location_list`). This change adds the necessary wrapper structs
(`RBSLocation`, `RBSLocationList`) and updates `build.rs` to generate
accessors for these fields.

The `RBSLocation` wrapper includes a reference to the parser to support
future functionality like source extraction.

Enable nested AST traversal by exposing rbs_node and rbs_node_list
fields.

Nested structure traversal (e.g., class members, constant types) depends
on access to rbs_node and rbs_node_list fields. Making these fields
accessible aligns the Rust bindings with the C API. Fields named "type"
are accessible via type_ to avoid a Rust keyword collision.

Adds `test_parse_integer()` which parses an integer literal type alias
and traverses the AST (`TypeAlias` -> `LiteralType` -> `Integer`) using
pattern matching to verify node types and extract values.

This validates that the generated node wrappers enable AST traversal in
pure Rust with proper type safety.

Also adds `Debug` derives and refactors memory management by returning
`SignatureNode` instead of raw pointer, with `Drop` impl to free parser.

Refactor the previous implementation of `Symbol`/`Keyword` handling to
treat them as first-class nodes in the build configuration.

`Keyword` and `Symbol` represent identifiers (interned strings), not
traditional AST nodes. However, the C parser defines them in
`rbs_node_type` (as `RBS_KEYWORD` and `RBS_AST_SYMBOL`) and treats them
as nodes (`rbs_node_t*`) in many contexts (lists, hashes).

Instead of manually defining `RBSSymbol`/`RBSKeyword` structs, we now
inject them into the `config.yml` node list in `build.rs`. This allows
them to be generated as `SymbolNode`/`KeywordNode` variants in the
`Node` enum, enabling polymorphic handling (in Node lists and Hashes).

Add support for RBS hashes (`rbs_hash_t`), which are used in Record
types and Function keyword arguments.

Enable walking the AST by generating a `Visit` trait with per-node
visitor methods. It uses double dispatch to route each node type to its
corresponding visitor method. This avoids consumers needing to manually
match on Node variants and allows overriding specific visits while
inheriting default behaviour for others.
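
The dispatch can be sketched with a two-variant enum. The node types and the `ClassCollector` visitor below are hypothetical and much smaller than the generated code; the point is only that a consumer overrides the methods it cares about and inherits the rest.

```rust
struct ClassNode {
    name: String,
}

struct TypeAliasNode {
    name: String,
}

enum Node {
    Class(ClassNode),
    TypeAlias(TypeAliasNode),
}

trait Visit {
    /// Entry point: routes each variant to its own visitor method.
    fn visit(&mut self, node: &Node) {
        match node {
            Node::Class(n) => self.visit_class_node(n),
            Node::TypeAlias(n) => self.visit_type_alias_node(n),
        }
    }

    // Default implementations do nothing, so visitors only override
    // the node types they are interested in.
    fn visit_class_node(&mut self, _node: &ClassNode) {}
    fn visit_type_alias_node(&mut self, _node: &TypeAliasNode) {}
}

/// Example visitor: records class names and ignores everything else.
struct ClassCollector {
    names: Vec<String>,
}

impl Visit for ClassCollector {
    fn visit_class_node(&mut self, node: &ClassNode) {
        self.names.push(node.name.clone());
    }
}
```

The `match` lives in one place (`visit`), so adding a visitor never requires another match over `Node`.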

Some C struct pointer fields can be NULL (super_class when no parent
class, comment when no doc comment). This metadata allows our Rust
codegen to generate Option<T> return types for these accessors instead
of unconditionally wrapping potentially NULL pointers.

Read `optional: true` annotations from `config.yml` and generate
`Option<T>` return types with null checks, so we don't crash at runtime.

The extracted helper function centralizes the accessor generation logic
for pointer-based field types.
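
A sketch of what an accessor for an optional field might look like. `RawClass` and `RawComment` are hypothetical stand-ins for the C structs, and the wrappers are simplified; the generated code may differ.

```rust
use std::ptr::NonNull;

/// Hypothetical stand-ins for C structs; `comment` may be NULL.
#[repr(C)]
struct RawComment {
    line: u32,
}

#[repr(C)]
struct RawClass {
    comment: *mut RawComment,
}

struct CommentNode<'a> {
    raw: &'a RawComment,
}

struct ClassNode<'a> {
    raw: &'a RawClass,
}

impl<'a> ClassNode<'a> {
    /// Accessor for a field marked `optional: true`: a NULL pointer
    /// becomes `None` instead of being wrapped and dereferenced later.
    fn comment(&self) -> Option<CommentNode<'a>> {
        let ptr = NonNull::new(self.raw.comment)?;
        // SAFETY: non-null child pointers are assumed valid for 'a.
        Some(CommentNode { raw: unsafe { &*ptr.as_ptr() } })
    }
}
```

With `Option<T>` in the signature, the compiler forces callers to handle the missing case.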

The Visit trait added in #69 provided the scaffolding for AST traversal,
but the visitor functions were empty stubs that didn't recurse into
child nodes. Without this, the visitor pattern is incomplete as we'd
have to manually write traversal logic every time we want to walk the
tree.

This commit adds the generation of visitor functions for child node
traversal. We handle four field types:
- `rbs_node`: single child node
- `rbs_node_list`: list of child nodes
- `rbs_hash`: key-value pairs of nodes
- Wrapper types (`rbs_type_name`, `rbs_namespace`, etc): each with its
own visitor method

Each case handles optional fields to safely skip NULL pointers.
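
To illustrate the recursion, here is a reduced sketch with one optional child and one child list. The node shapes, `walk_class_node`, and `AliasCounter` are hypothetical; owned Rust values stand in for the C pointers.

```rust
struct AliasNode;

struct ClassNode {
    super_class: Option<Box<Node>>, // optional single child
    members: Vec<Node>,             // list of children
}

enum Node {
    Class(ClassNode),
    Alias(AliasNode),
}

trait Visit {
    fn visit(&mut self, node: &Node) {
        match node {
            Node::Class(n) => self.visit_class_node(n),
            Node::Alias(n) => self.visit_alias_node(n),
        }
    }

    /// The default recurses into children, so a visitor that does not
    /// override this method still reaches nested nodes.
    fn visit_class_node(&mut self, node: &ClassNode) {
        walk_class_node(self, node);
    }

    fn visit_alias_node(&mut self, _node: &AliasNode) {}
}

/// Child traversal for `ClassNode`, kept as a free function so that an
/// overriding visitor can still call it to continue the walk.
fn walk_class_node<V: Visit + ?Sized>(visitor: &mut V, node: &ClassNode) {
    // Optional field: skipped when absent.
    if let Some(parent) = &node.super_class {
        visitor.visit(parent);
    }
    for member in &node.members {
        visitor.visit(member);
    }
}

/// Example visitor: counts alias nodes anywhere in the tree.
struct AliasCounter(usize);

impl Visit for AliasCounter {
    fn visit_alias_node(&mut self, _node: &AliasNode) {
        self.0 += 1;
    }
}
```

Splitting the default method from the free traversal function lets an override do its own work and then delegate the recursion.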

Each node already has location data in its C struct, but it wasn't
exposed through the Rust API. This adds a generated `location()` method
to every node type, making it easy to get source ranges for any part of
the AST.

Also removes `parser` from the location structs, as it is not needed.

Addressing some linting warnings

Adds `location()` accessor to the `Node` enum, delegating to
each variant's `location()` method.

A previous commit added `location()` to individual node types
but missed the enum itself. This allows getting the location of the
entire node definition when working with the `Node` enum directly.

Reorder lib.rs structs alphabetically

Improve bindings code formatting

Remove TODO comments from rust crate

Some nodes don't use their parser field, but conditionally omitting it
adds significant complexity. Keep parser on all nodes and suppress the
warning on the parser field.

Remove debug comment from generated bindings

Adds lifetimes to make borrowing relationships clearer so the
Rust compiler can validate and enforce them.

Replaced `*mut T` with `NonNull<T>` for the parser pointer to make the
‘never null’ assumption explicit.

`NonNull<T>` represents a non-null raw pointer (a wrapper around `*mut
T`) that guarantees the pointer is never null.
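
The lifetime and `NonNull` changes fit together roughly as in this sketch. `RawParser` is a hypothetical stand-in for the C parser (a `Box` allocation replaces the C allocator), and `RootNode` and `source_len` are illustrative names; only `SignatureNode` and its `Drop` behaviour come from the PR description.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Hypothetical stand-in for the C parser state.
struct RawParser {
    source: String,
}

/// Owns the parser and frees it on drop.
struct SignatureNode {
    parser: NonNull<RawParser>,
}

impl SignatureNode {
    fn parse(source: &str) -> Self {
        // Stand-in for the C allocation: leak a Box and keep its pointer.
        let raw = Box::leak(Box::new(RawParser { source: source.to_owned() }));
        SignatureNode { parser: NonNull::from(raw) }
    }

    /// The returned node borrows `self`, so it cannot outlive the parser.
    fn root(&self) -> RootNode<'_> {
        RootNode { parser: self.parser, _owner: PhantomData }
    }
}

impl Drop for SignatureNode {
    fn drop(&mut self) {
        // Stand-in for the C free function.
        // SAFETY: the pointer came from `Box::leak` in `parse`.
        unsafe { drop(Box::from_raw(self.parser.as_ptr())) };
    }
}

/// A node that is only valid while its `SignatureNode` is alive.
struct RootNode<'a> {
    parser: NonNull<RawParser>,
    _owner: PhantomData<&'a SignatureNode>,
}

impl<'a> RootNode<'a> {
    fn source_len(&self) -> usize {
        // SAFETY: 'a guarantees the parser has not been freed yet.
        unsafe { self.parser.as_ref() }.source.len()
    }
}
```

With these definitions, dropping the `SignatureNode` while a `RootNode` obtained from it is still in use is rejected by the borrow checker, which is the use-after-free protection the lifetimes are meant to provide.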

TypeApplicationAnnotation, InstanceVariableAnnotation,
ClassAliasAnnotation, and ModuleAliasAnnotation also need rust_name
fields for Rust binding code generation.

@alexcrocha alexcrocha marked this pull request as ready for review January 15, 2026 01:43
@soutaro soutaro self-assigned this Jan 15, 2026