This document outlines the path to supporting Moose on PerlOnJava. Moose is Perl's most popular object system, providing a rich meta-object protocol (MOP) for defining classes, attributes, roles, and method modifiers.
Goal: pass 477 / 478 Moose 2.4000 test files, i.e. everything
except t/todo_tests/moose_and_threads.t (already TODO upstream;
PerlOnJava does not implement threads).
Today: 71 / 478 fully-green via the Moose-as-Moo shim (after Phase 3).
The path from 56 to 477:
- Phases 3 → 6 (incremental shim widening, ~1 week total) take us to ~110–130 fully-green files. Ships immediate value to real-world Moose-using CPAN modules without bundling upstream Moose.
- Phase D (bundle pure-Perl Moose, ~5 days) takes us from
~110–130 to 477 / 478. Replaces the shim with the real
upstream Moose distribution plus a single new file
(
Class::MOP::PurePerl, ~500 lines) that implements what Moose's 710 lines of XS would have provided.
The original "Quick path vs. real path" framing in earlier revisions of this doc is now obsolete: we did the Quick path, and we will do the real port — they're complementary, not alternatives.
threads-only Moose tests (1 file:t/todo_tests/moose_and_threads.t, already TODO upstream). PerlOnJava does not implement Perl threads; this test will be added to the distroprefs skip list during Phase D.forksemantics. Zero Moose tests usefork; not relevant here.- Real JVM-level class generation (Byte Buddy / Javassist / additional
ASM use beyond what PerlOnJava already does).
Class::MOPoperates on Perl stashes, notjava.lang.Class, so no third-party bytecode library is required for correctness. The optional "make_immutable inlining" optimization can reuse the existing ASM infrastructure if/when pursued.
These are listed only because they were "out of scope" / "blockers" in earlier revisions of this document; they no longer are:
weaken/isweak— implemented in core (selective reference counting on top of JVM GC). Seedev/architecture/weaken-destroy.md.DESTROY/DEMOLISHtiming — implemented in core; fires deterministically for tracked blessed objects. Moose'sDEMOLISHchain falls out ofDESTROYworking correctly; nothing Moose-specific is needed.Bmodule subroutine name/stash introspection — done (Phase 1).
| Component | Status | Verification |
|---|---|---|
B::CV->GV->NAME |
Works | ./jperl -e 'sub f{} use B; print B::svref_2object(\&f)->GV->NAME' → f |
Sub::Identify::get_code_info |
Works | Returns ("main","f") for \&f |
| Moo | Works | use Moo; has ...; ->new(...) (~96% of upstream test suite) |
Try::Tiny |
Works | use Try::Tiny succeeds |
Module::Runtime |
Works | use Module::Runtime succeeds |
Devel::GlobalDestruction |
Works | use Devel::GlobalDestruction succeeds |
Devel::StackTrace |
Works | use Devel::StackTrace succeeds |
Devel::OverloadInfo |
Works | use Devel::OverloadInfo succeeds |
Sub::Exporter |
Works | use Sub::Exporter succeeds |
Sub::Install |
Works | use Sub::Install succeeds |
Sub::Identify |
Works | use Sub::Identify succeeds |
Data::OptList |
Works | use Data::OptList succeeds |
Class::Load |
Works | use Class::Load succeeds |
Package::Stash |
Works | use Package::Stash succeeds |
Eval::Closure |
Works | use Eval::Closure succeeds |
Params::Util |
Works (no env var needed) | _CLASS("Foo") returns truthy |
B::Hooks::EndOfScope |
Works | use B::Hooks::EndOfScope succeeds |
Package::DeprecationManager |
Loads, requires -deprecations => {...} import args (normal upstream behavior) |
— |
Class::MOP |
Missing | Not bundled |
Moose |
Missing | Not bundled; CPAN install fails (XS) |
ExtUtils::HasCompiler |
Returns false in practice | Returns undef early because $Config{usedl} is empty |
| Blocker | Severity | Description |
|---|---|---|
Class::MOP not bundled |
Critical | Moose can't load; even simple use Moose fails |
Moose's Makefile.PL builds 13 .xs files |
Critical | Compiler-check bypass alone is insufficient; MM still tries to compile |
Moose.pm not bundled |
Critical | No alternative entry point on disk |
MAGIC-based export tracking in Moose::Exporter |
Low | Affects re-export warnings only |
Moose uses Class::MOP::get_code_info($coderef) (and Sub::Identify's
identical helper) to:
- Decide whether a method belongs to a class or was imported.
- Track method origins during role composition.
- Tell defined subs from re-exported ones.
- Build method maps and override tables.
PerlOnJava now stores subName/packageName on RuntimeCode
(src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeCode.java), and
B.pm's B::CV/B::GV accessors read them. This is the foundation on which
either path below can build.
Goal: get use Moose; working for the common subset (attributes,
inheritance, roles, method modifiers) by delegating to Moo.
src/main/perl/lib/Moose.pm— shim that translatesMooseAPI into Moo calls.src/main/perl/lib/Moose/Role.pm— delegates toMoo::Role.src/main/perl/lib/Moose/Util/TypeConstraints.pm— minimal stub providingsubtype,enum,as,where,coerce,class_type,role_type.src/main/perl/lib/Moose/Object.pm— base class withnew,BUILD,BUILDARGS,meta(returning a stub metaclass).
Reference implementations to mine: Mo::Moose, Any::Moose, and
MouseX::Types::Moose (already present in ~/.perlonjava/lib/).
./jcpan -t ANSI::Unicode # currently FAIL → must PASS
./jcpan -t Test::Class::Moose # smoke test- No real metaclass introspection (
$class->meta->get_all_attributesetc.). - Type constraints are name-only; no
MooseX::Types. Moose::Exporter-based modules that call deep MOP APIs won't work.
This is enough for the long tail of CPAN modules that just declare attributes and method modifiers.
The upstream module currently lives at ~/.perlonjava/lib/ExtUtils/HasCompiler.pm
and returns false only because $Config{usedl} happens to be empty. Make this
explicit and authoritative:
# src/main/perl/lib/ExtUtils/HasCompiler.pm
package ExtUtils::HasCompiler;
our $VERSION = '0.025';
use strict; use warnings;
use base 'Exporter';
our @EXPORT_OK = qw/can_compile_loadable_object can_compile_static_library can_compile_extension/;
our %EXPORT_TAGS = (all => \@EXPORT_OK);
sub can_compile_loadable_object { 0 }
sub can_compile_static_library { 0 }
sub can_compile_extension { 0 }
1;Verify with:
./jperl -MExtUtils::HasCompiler=can_compile_loadable_object \
-e 'print can_compile_loadable_object(quiet=>1) ? "yes" : "no"'
# → noImportant: this alone does not unblock Moose 2.4000. Its generated
Makefile.PL contains:
OBJECT => "xs/Attribute\$(OBJ_EXT) ... mop\$(OBJ_EXT)",
XS => { "xs/Attribute.xs" => "xs/Attribute.c", ... }, # 13 .xs files
C => [ ... ],After the compiler-check bypass, WriteMakefile will still attempt to compile
those. We need Phase B too.
Two sub-options:
B1. Patch src/main/perl/lib/ExtUtils/MakeMaker.pm (already PerlOnJava's
own copy) so it scrubs OBJECT, XS, C, H, XSPROTOARG, XSOPT from
the args before generating the Makefile. Gate behind a config flag so other
modules with optional XS keep working.
B2. Bundle our own Moose.pm (Phase D) so the upstream
Moose-2.4000/Makefile.PL never runs.
Preferred: B1 — it's a one-time investment that helps every XS module that ships pure-Perl fallbacks.
Create src/main/java/org/perlonjava/runtime/perlmodule/ClassMOP.java. Most
of Class::MOP is already pure Perl upstream and runs unmodified once the
helpers exist. We only need Java for the irreducible pieces:
| Function | Trivial after Phase 1? | Notes |
|---|---|---|
get_code_info($cv) |
Yes | Read RuntimeCode.packageName/subName |
is_stub($cv) |
Yes | Check that the code body is empty |
_definition_context() |
Yes | Capture caller(1) |
_flag_as_reexport($glob) |
No | Needs MAGIC; defer to Phase E |
_export_is_flagged($glob) |
No | Needs MAGIC; defer to Phase E |
INSTALL_SIMPLE_READER etc. |
Optional | Pure-Perl version is fine; Java only for speed |
Skeleton:
package org.perlonjava.runtime.perlmodule;
public class ClassMOP extends PerlModuleBase {
public ClassMOP() { super("Class::MOP", false); }
public static void initialize() {
ClassMOP m = new ClassMOP();
try { m.registerMethod("get_code_info", null); }
catch (NoSuchMethodException e) { throw new RuntimeException(e); }
}
public static RuntimeList get_code_info(RuntimeArray args, int ctx) {
RuntimeScalar cv = args.get(0);
if (cv.type != RuntimeScalarType.CODE) return new RuntimeList();
RuntimeCode code = (RuntimeCode) cv.value;
RuntimeList r = new RuntimeList();
r.add(new RuntimeScalar(code.packageName != null ? code.packageName : "main"));
r.add(new RuntimeScalar(code.subName != null ? code.subName : "__ANON__"));
return r;
}
}Wire it up in org.perlonjava.runtime.perlmodule initialization next to
Mro, Internals, etc.
Drop the upstream .pm tree (without the xs/ and mop.c) into
src/main/perl/lib/Class/MOP* and src/main/perl/lib/Moose*. Add a small
PERL_CLASSMOP_PP=1-style wrapper that forces every Class::MOP::* module
to skip Class::Load::XS/XS-only branches.
Lower priority; only affects Moose::Exporter re-export tracking. Implement
as a WeakHashMap<GlobReference, ExportFlags> on the Java side, exposed
through helper subs Moose::Exporter::_set_flag/_get_flag.
# Phase 1 sanity (already passes)
./jperl -e 'sub f{} use B; print B::svref_2object(\&f)->GV->NAME' # → f
# After Quick path (shim)
./jcpan -t ANSI::Unicode # → PASS
./jperl -MMoose -e 'package P { use Moose; has x => (is=>"rw") } P->new(x=>1)'
# Run upstream Moose's full test suite against the shim (no install)
./jcpan -t Moose # → see baseline below
# After Phase A (HasCompiler stub)
./jperl -MExtUtils::HasCompiler=can_compile_loadable_object \
-e 'print can_compile_loadable_object(quiet=>1) ? "yes" : "no"' # → no
# After Phase B (XS-strip in WriteMakefile)
./jcpan -t Moose # → install OK, more tests pass
# After Phase C+D (real Class::MOP / Moose)
./jperl -MClass::MOP -e 'print Class::MOP::get_code_info(\&Class::MOP::class_of)'
./jcpan -t Moose
./jcpan -t MooseX::Types./jcpan -t Moose is wired up via a CPAN distroprefs entry shipped from
src/main/perl/lib/CPAN/Config.pm (auto-bootstrapped to
~/.perlonjava/cpan/prefs/Moose.yml on first run). It:
- ensures
Moois installed before testing — the shim delegates to Moo, so thepl:step calls a tiny Perl helper (PerlOnJava::Distroprefs::Moose::bootstrap_pl_phase) that doesrequire Mooand falls back tosystem $ENV{JCPAN_BIN}, "Moo"if Moo isn't loadable; - creates a stub
Makefileso CPAN.pm's "no Makefile created" fallback path doesn't kick in (also done by the same helper); - skips
makeandinstall(PerlOnJava::Distroprefs::Moose::noop, cross-platform replacement for POSIXtrue); - runs
prove --exec jperl -r t/against the unpacked tarball.
jcpan / jcpan.bat prepend the project directory to PATH so
shell-spawned subprocesses (CPAN's distroprefs commandlines, prove's
child processes) find jperl on both Unix and Windows. They also
export JCPAN_BIN for the helper to recursively call jcpan when Moo
needs installing.
This design avoids POSIX-only shell constructs — ||, ;, touch,
/dev/null, $VAR — that don't work in Windows cmd.exe. Each phase
is a single jperl -MPerlOnJava::Distroprefs::Moose -e '...' (or
prove --exec jperl ...) invocation, parsed identically by bash,
sh, cmd.exe, and PowerShell.
We deliberately avoid a CPAN depends: block — it would force CPAN to
resolve Moose's full upstream prereq tree (Package::Stash::XS,
MooseX::NonMoose, …), most of which is XS and unsatisfiable. The
helper installs only Moo, the real runtime dependency of the shim.
Because prove --exec invokes jperl per test file without adding
lib/ or blib/lib/ to @INC, the bundled shim from the jar wins
over the unpacked upstream lib/Moose.pm. So you can run the entire
upstream suite end-to-end and see honestly which tests pass, without
patching Moose's Makefile.PL or shipping a fragile diff.
The same recipe is the model for any future "test against shim, don't
install" scenario — define a distroprefs entry that overrides pl /
make / install with no-ops and test with a prove --exec line.
Snapshot history from ./jcpan -t Moose against the current shim:
| Metric | Initial shim | After refcount/DESTROY (Apr 2026) | After Phase A + C-mini (Apr 2026) | After Phase 2 stubs (Apr 2026) | After Phase 3 (Apr 2026) |
|---|---|---|---|---|---|
| Test files executed | 478 | 478 | 478 | 478 | 478 |
| Individual assertions executed | 616 | 616 | 667 | 1419 | 2450 |
| Fully passing files | ~29 | 35 | 36 | 56 | 71 |
| Partially passing files | ~44 | 94 | 98 | 184 | 259 |
Compile/load fail (missing Class::MOP::*, Moose::Meta::*) |
~405 | ~349 | ~344 | ~238 | ~148 |
| Assertions ok | 370 | 372 | 419 | 953 | 1562 |
| Assertions fail | 246 | 244 | 248 | 466 | 888 |
The initial 29 fully-passing files covered BUILDARGS / BUILD chains,
immutable round-trips, anonymous role creation, several Moo↔Moose bug
regressions, the cookbook recipes for basic attribute / inheritance /
subtype use, and the Type::Tiny integration test. The 44 partials
included high-value chunks such as basics/import_unimport.t (31/48),
basics/wrapped_method_cxt_propagation.t (6/7), and
recipes/basics_point_attributesandsubclassing.t (28/31).
The refcount/DESTROY merge (PRs #565, #566, plus weaken/destroy work)
moved the structural picture meaningfully even though the assertion
total only nudged: ~56 files that previously failed at compile/load
time now run subtests. Most ended up partial rather than fully green
(partials roughly doubled, 44 → 94), but six more files are fully
passing (29 → 35). The shim's per-test infrastructure (BUILD chains,
DEMOLISH ordering, weak refs) is now solid; the remaining failures
are dominated by missing Class::MOP::* and Moose::Meta::*
introspection APIs.
Phase A + Phase C-mini (this PR) added two pieces:
ExtUtils::HasCompilerdeterministic stub (src/main/perl/lib/ExtUtils/HasCompiler.pm) — always reports "no compiler", instead of relying on$Config{usedl}happening to be empty.Class::MOPshim (src/main/perl/lib/Class/MOP.pm) — providesclass_of,get_metaclass_by_name,store_metaclass_by_name,remove_metaclass_by_name,does_metaclass_exist,get_all_metaclasses(and friends),get_code_info,is_class_loaded,load_class,load_first_existing_class. Returns "no metaclass" everywhere, which is the correct answer under the Moose-as-Moo shim. The previous behavior was a hard "Undefined subroutine &Class::MOP::class_of called" the moment Moo's_Utils::_load_modulehit a not-installed dependency on a class that already hadMoose.pmloaded.
Net effect of Phase A + C-mini: +51 individual assertions now execute (616 → 667), +47 newly pass (372 → 419), and one more file goes fully green (35 → 36). The four extra failures are upstream tests that previously bailed before reaching their assertion phase and now reach it; none are real regressions.
Phase 2 stubs (a follow-up PR) added the next batch of compile-time blockers and a bailout fix:
Moose.pm/Moose::Rolenowuse Class::MOP ()at top-level so Moo's runtime calls toClass::MOP::class_of(made whenever$INC{"Moose.pm"}is set) are always defined. This was the cause of ~50+ "Undefined subroutine &Class::MOP::class_of" runtime errors on the previous baseline.metaclass.pmstub — installs a no-opmetamethod on the caller.Test::Moose.pm— coversmeta_ok,does_ok,has_attribute_ok,with_immutable. Falls back to$class->can($attr)when no real metaclass is available.Moose::Util.pm— coversfind_meta,is_role,does_role,apply_all_roles,english_list,throw_exception, plus trait/metaclass alias passes-through.- Skeleton stubs for
Class::MOP::Class,Class::MOP::Attribute,Moose::Meta::Class,Moose::Meta::TypeConstraint::Parameterized,Moose::Meta::Role::Application::RoleSummation, andMoose::Exporter— enough surface thatrequire Xsucceeds andX->new(...)returns something with the methods upstream tests inspect. - Pre-populated standard type-constraint stubs in
Moose::Util::TypeConstraints(Any,Item,Defined,Bool,Str,Num,Int,ArrayRef,HashRef,Object, …). Without these,t/type_constraints/util_std_type_constraints.twouldBAIL_OUT("No such type ...")and prove would stop, losing every test file that followed alphabetically (≈7 files / 50+ assertions).
Net effect of Phase 2: +752 individual assertions now execute
(667 → 1419), +534 newly pass (419 → 953), +20 fully-green
files (36 → 56), and -106 files now compile that previously
errored out at compile time. The +218 newly failing assertions are
mostly tests that hadn't reached their assertion phase before (so
"failure" is the honest answer); they include real shortcomings of
the stub (e.g. Test::Moose::has_attribute_ok doesn't know about
inherited Moo attributes) which would only be fixed by Phase D
(real Class::MOP / Moose port).
Phase 3 (this PR's third step) added:
- Rich
Moose::_FakeMeta:@ISAnow includesClass::MOP::ClassandMoose::Meta::Class, soisa_ok($meta, ...)checks pass. Implementsadd_attribute,get_attribute,find_attribute_by_name,has_attribute,remove_attribute,get_attribute_list,get_all_attributes(walks @ISA),get_method(returns aClass::MOP::Method),has_method,get_method_list,new_object,superclasses,linearized_isa,is_immutable,is_mutable,roles,does_role, plus a per-class meta cache so$class->metareturns the same object each call. Moose.pmandMoose/Role.pmnow record eachhasdeclaration on the target's_FakeMeta, so$meta->get_attribute_listandfind_attribute_by_nameactually return useful data.- New compile-time stubs:
Class::MOP::Method,Class::MOP::Instance,Class::MOP::Method::Accessor,Class::MOP::Package,Moose::Meta::Method,Moose::Meta::Attribute,Moose::Meta::Role(withcreate_anon_role),Moose::Meta::Role::Composite,Moose::Meta::TypeConstraint,Moose::Meta::TypeConstraint::Enum,Moose::Util::MetaRole(withapply_metarolesno-op),Moose::Exception(with overloaded stringification +throw). Moose::Util::TypeConstraints::_Stubnow@ISA-inherits fromMoose::Meta::TypeConstraint, so type stubs passisa_ok($t, 'Moose::Meta::TypeConstraint').Moose::Util::TypeConstraints::_storenow blesses results into_Stub(was returning unblessed hashrefs, which produced "Can't call method 'check' on unblessed reference" failures).- New:
find_or_parse_type_constraint(handlesMaybe[Foo],Foo|Bar,ArrayRef[Foo],HashRef[Foo],ScalarRef[Foo]). - New:
export_type_constraints_as_functions. Moose.pmpre-loadsMoose::Util::MetaRoleso MooseX::* extensions that callapply_metaroleswithout an explicituseline don't fail with "Undefined subroutine".
Net effect of Phase 3: +1031 individual assertions now execute (1419 → 2450), +609 newly pass (953 → 1562), +15 fully-green files (56 → 71), and -90 files now compile that previously errored out at compile time. The +422 newly-failing assertions are again mostly tests that hadn't reached their assertion phase before.
Cumulative across this PR (master baseline → end of Phase 3): +36 fully-green files (35 → 71), +1834 assertions executed (616 → 2450), +1190 newly passing (372 → 1562).
Phase 3 hit clear diminishing returns toward the end (the last iteration added only +2 fully-green files while +90 files now compile that previously didn't), confirming the doc's call: shim widening is losing leverage, and Phase D (bundle pure-Perl Moose) is the next move that meaningfully advances the pass count.
Phases 4 → 6 (more shim widening) and Phase D (bundle pure-Perl Moose) should move these numbers further; record the new totals here whenever they shift.
src/test/resources/module/{Distribution}/ is reserved for unmodified
upstream test files of CPAN distributions we actually bundle. Use it
only when both apply:
- The distribution itself is bundled (its
.pmfiles live insrc/main/perl/lib/, or the test directory ships its ownlib/). - The tests being copied are the upstream tests for that distribution.
So:
- When we eventually bundle a pure-Perl
Moose(Phase D), copyMoose-2.4000/t/...intosrc/test/resources/module/Moose/t/. - Same for
Class::MOP,MooseX::Types, etc., as each gets bundled. - Do not snapshot tests for downstream consumers we don't bundle
(e.g. ANSI::Unicode). Those stay as
./jcpan -tsmoke checks. - Do not put shim-specific or PerlOnJava-specific tests under
module/. Shim coverage belongs insrc/test/resources/unit/if it's needed beyond thejcpan -tsmoke.
Conventions for bundled-distribution snapshots (see existing layouts under
src/test/resources/module/, e.g. Clone-PP/, Math-BigInt/,
XML-Parser/):
- One directory per CPAN distribution (
Moose/,Class-MOP/, …); use the dist name with::replaced by-. - Mirror the upstream
t/layout exactly. Don't edit the test files; if a test is genuinely incompatible, prefer fixing the runtime over editing the test (per AGENTS.md). - Tests are picked up automatically by the Gradle
testModuletask — no JUnit wiring is needed.
Verify with:
make test-bundled-modulesThis gives us a regression net: every newly-passing upstream Moose-ecosystem
test we vendor in becomes guarded against regressions, and git log src/test/resources/module/Moose* becomes the historical record of progress.
For the current PR (Quick path / shim only) there are no bundled
upstream distributions yet, so nothing is snapshotted under module/.
The regression net for the shim is make plus the ./jcpan -t ANSI::Unicode smoke check.
Moose ← MISSING
└── Class::MOP ← MISSING (Phase C+D)
├── MRO::Compat ← upstream copy works
├── Class::Load ← works
│ ├── Module::Runtime ← works
│ ├── Data::OptList ← works
│ │ ├── Params::Util ← works (no env var)
│ │ └── Sub::Install ← works
│ └── Try::Tiny ← works
├── Devel::GlobalDestruction ← works
├── Devel::OverloadInfo ← works
├── Devel::StackTrace ← works
├── Dist::CheckConflicts ← works
├── Eval::Closure ← works
├── Package::DeprecationManager← works (normal import-arg requirement)
├── Package::Stash ← works
├── Sub::Exporter ← works
├── Sub::Identify ← works (Phase 1)
├── List::Util ← built-in
├── Scalar::Util ← built-in
└── B::Hooks::EndOfScope ← works
The whole "needs investigation" / "needs PP flag" column from the previous
revision of this doc is gone — every Class::MOP runtime dependency that
isn't Class::MOP itself loads cleanly today.
Goal: pass 477 / 478 Moose 2.4000 test files (everything except
t/todo_tests/moose_and_threads.t, which is already TODO upstream
and PerlOnJava doesn't implement threads). Today: 56 / 478.
- Phase 1 — DONE. B-module subroutine name/stash introspection works.
- Quick path — DONE.
Moose.pmshim ships, ANSI::Unicode-class modules unblocked. - Phase A — DONE.
ExtUtils::HasCompilerdeterministic stub ships atsrc/main/perl/lib/ExtUtils/HasCompiler.pm. - Phase C-mini — DONE.
Class::MOPshim withclass_of/get_metaclass_by_name/get_code_info/is_class_loadedand friends; ships atsrc/main/perl/lib/Class/MOP.pm. - Phase 2 stubs — DONE.
metaclass.pm,Test::Moose.pm,Moose::Util.pm, plus skeletonClass::MOP::Class/Class::MOP::Attribute/Moose::Meta::Class/Moose::Meta::TypeConstraint::Parameterized/Moose::Meta::Role::Application::RoleSummation/Moose::Exporter. Pre-populated standard type-constraint stubs to avoidBAIL_OUTin upstream test suite. - Phase 3 — DONE. Rich
Moose::_FakeMeta(with@ISAand full method surface), attribute-tracking viahaswrapper, plus the next batch of compile-time stubs (Class::MOP::Method/::Instance/::Method::Accessor/::Package,Moose::Meta::Method/::Method::Constructor/::Method::Destructor/::Method::Accessor/::Method::Delegation/::Attribute/::Role/::Role::Composite/::TypeConstraint/::TypeConstraint::Enum,Moose::Util::MetaRole,Moose::Exception),_Stubblessed intoMoose::Meta::TypeConstraint,find_or_parse_type_constraint+export_type_constraints_as_functions+_parse_parameterized_type_constraint+get_type_constraint_registry, method-modifier hooks on_FakeMeta,Class::MOP.pmpre-loads its submodules,Moose.pmpre-loadsMoose::Meta::*andMoose::Util*. 71 / 478 fully-green. - Phases 4 / 5 / 6 — not started. Incremental shim widening. Ship value (~110–130 fully-green) but do not pass all tests on their own.
- Phase D — not started. Bundle pure-Perl Moose. This is the phase that gets us to 477 / 478. Now sized at ~5 days (was previously framed as much larger). See "Phase D plan" below.
- Phase B — deferred. Strip XS keys in
WriteMakefile. Not on the Moose pass-all-tests critical path; the bundled Moose ships from the JAR. - Phase E — deferred. Export-flag MAGIC. Affects warnings only.
- Phase 1: B-module subroutine name introspection
- Verified working dependency tree (Apr 2026)
- Quick path:
Moose.pm/Moose::Role/Moose::Object/Moose::Util::TypeConstraintsshims - Phase A:
ExtUtils::HasCompilerdeterministic stub - Phase C-mini:
Class::MOPshim (no metaclass instances; just enough surface to keep Moo happy) - Phase 2 stubs:
metaclass.pm,Test::Moose.pm,Moose::Util.pm, skeletonClass::MOP::Class/Class::MOP::Attribute/Moose::Meta::Class/Moose::Exporter/ friends, and standard-type stubs inMoose::Util::TypeConstraintsto suppress upstreamBAIL_OUT.
During the Phase 3 → Phase D push, two "core fixes" were attempted to unblock Class::Load / Class::MOP bootstrap, both later reverted:
*GLOB{SCALAR}returns a SCALAR reference, not the value (commit880bf65c7, reverted in3d02203dc). Motivation: Class::Load::PP line 38 does${ *{...}{SCALAR} }and our impl returned a copy. The "fix" returned a fresh\$valuereference each call. This silently broke Path::Class (and DBIC by extension) because Path::Class's overload code does*$sym = \&nil; $$sym = $arg{$_};— assignments through the glob deref expect to land on the package's actual SV slot, not a throwaway reference. Lesson: any change to typeglob slot semantics must be validated against the full DBIC suite, which exercises Path::Class heavily.- Auto-sweep weaken / walker-gated destroy
(commits
ca3af1ad3+ecb5c6400, reverted inf8ef367e4/d3743a11c). Motivation: Class::MOP bootstrap died because the metaclass was being destroyed mid-construction. The "fix" coupled destroy timing to the reachability walker's view of refcount. It passed targeted refcount unit tests but introduced regressions in DBIC'st/52leaks.tthat the unit tests didn't catch. Reverted pending a more disciplined design (see "Refcount fix plan" later in this document).
Common failure mode: my measurement methodology was wrong. I was
running partial DBIC subsets and treating "fast-fail at compile time"
as "no regression". The correct gate is the full ./jcpan -t DBIx::Class (~24 min, 314 files / ~13858 assertions). After both
reverts, DBIC is back at master parity.
The two iterative shim PRs (#570, #572) turned the formal phase plan above on its head: paths C-full / D were originally framed as the "real fix", but in practice incremental shim widening has paid out much faster than a full pure-Perl port would have. Concrete takeaways:
- Compile-time stubs are the highest-leverage move. Each round
of "let
require Xsucceed" cleared dozens of files at once (Phase 2 alone: 344 → 238 files that fail before any subtest). - Pre-loading is as important as having the stub. Once
Moose.pmset$INC{Moose.pm}, Moo's runtime probes calledClass::MOP::class_offrom random call sites. Addinguse Class::MOP ()at the top ofMoose.pm/Moose/Role.pmkilled ~50+ runtime errors that would otherwise have masked any shim widening. - One BAIL_OUT can hide an arbitrary number of test files.
t/type_constraints/util_std_type_constraints.tcallingBAIL_OUT("No such type ...")was costing us ~7 trailing files per run. Pre-populating standard type-constraint stubs cleanly contained that — but the lesson is general: any new failure mode that hitsBAIL_OUTshould be treated as a high-priority block. - The Moose-as-Moo gap is mostly method surface, not metaclass
semantics. A large fraction of upstream tests just want
$meta->add_attribute,$meta->get_method,$meta->is_mutableto exist and return a sensible-shaped value. They rarely care that the metaclass is "real". Ergo: enrichingMoose::_FakeMetais high-leverage and low-risk. - Stub objects must
isathe right things. Upstream tests doisa_ok($meta, 'Moose::Meta::Class')andisa_ok($attr, 'Moose::Meta::Attribute'). Returning a plain blessed hashref isn't enough; the stub needs@ISAset to the real upstream class names soisachecks pass.
The goal is to pass all 478 Moose 2.4000 test files except the
threads-only test (t/todo_tests/moose_and_threads.t, already a
TODO upstream). PerlOnJava does not implement fork or threads, but
the Moose suite is forgiving: zero tests use fork, and only that one
file uses threads. Everything else is in scope.
Phases 3 → 6 below are incremental shim widening. They ship value quickly and are projected to take us from today's 56 / 478 fully-green files to roughly 110–130 / 478 (~25–28%) — covering ordinary Moose consumers (attributes, roles, method modifiers).
To pass the rest of the suite (immutable inlining, MOP self-tests,
role conflict messages, native traits, type-constraint coercion
graphs, Class::MOP self-bootstrap, ...) we then do Phase D — bundle
pure-Perl Moose. With weaken/DESTROY now in core PerlOnJava and only
710 lines of XS to replace (most of it generic hashref accessors),
Phase D is much smaller than its earlier "the real fix" framing
suggested. See "Phase D plan" below for the concrete breakdown.
Target outcome:
- Phases 3 → 6: ~110–130 / 478 fully-green files. Ships value to real-world Moose-using CPAN modules immediately.
- Phase D (bundle + XS replacement): 477 / 478 fully-green files
(everything except
moose_and_threads.t). Anything still failing is a real bug in PerlOnJava core, not in the Moose port.
Phases in priority order:
Estimated payoff: similar to Phase 2 (+15–25 fully-green files, +200–500 newly-passing assertions). Estimated effort: ~1 day.
3a. Enrich Moose::_FakeMeta so isa_ok($meta, 'Moose::Meta::Class')
passes and the methods upstream tests reach for actually exist:
| Method | Failure count in last run |
|-------------------------|---------------------------|
| `add_attribute` | 24 |
| `get_attribute` | 8 |
| `new_object` | 4 |
| `is_mutable` | 3 |
| `get_method` | 3 |
| `meta` | 4 |
| (FakeMeta isa Class::MOP::Class) | 6 |
| (FakeMeta isa Moose::Meta::Class) | 4 |
Fix: add `our @ISA = ('Class::MOP::Class', 'Moose::Meta::Class');`
to `Moose::_FakeMeta`, and implement the missing methods either
as pass-throughs to the underlying Moo metaclass (via
`Moo->_constructor_maker_for($class)->all_attribute_specs`) or as
minimal "remember what `has` declared" tracking inside
`Moose.pm`'s `import`.
3b. Add the next batch of compile-time .pm stubs for the most
common "Can't locate" failures:
| Stub | Errors |
|-------------------------------------------|--------|
| `Moose::Meta::Attribute` | 8 |
| `Moose::Meta::Role` | 6 |
| `Moose::Meta::Role::Composite` | 7 |
| `Class::MOP::Method` | 7 |
| `Class::MOP::Instance` | 4 |
| `Moose::Util::MetaRole` (with `apply_metaroles` no-op) | 4 + 9 calls |
| `Moose::Meta::TypeConstraint` | 3 |
| `Moose::Exception` (and the most-thrown subclasses) | 3 + many `throw_exception` calls |
Each is the same shape as the existing skeleton stubs:
`package X; require Y; our @ISA = (Y); sub new { bless {...} } 1;`.
3c. Bless Moose::Util::TypeConstraints::_Stub into
Moose::Meta::TypeConstraint so isa_ok($t, 'Moose::Meta::TypeConstraint') passes (5 errors today).
3d. Add the missing methods on Moose::Util::TypeConstraints:
| Method | Errors |
|----------------------------------|--------|
| `export_type_constraints_as_functions` | 5 |
| `find_or_parse_type_constraint` | 3 |
3e. Moose::Meta::Role->create_anon_role as a no-op returning
a _FakeRole (4 errors).
Estimated payoff: medium-high (+100–300 newly-passing assertions,
mostly under t/basics/, t/attributes/, t/cmop/attribute*).
Estimated effort: ~2 days.
By Phase 3, $meta->add_attribute(name => ..., is => 'rw') exists
but is a no-op. To make $meta->get_attribute_list / $meta->get_attribute(...)
return useful values, hook into Moo's actual attribute store:
sub get_attribute_list {
my $self = shift;
my $name = $self->name;
require Moo::_Utils;
return keys %{ Moo->_constructor_maker_for($name)->all_attribute_specs // {} };
}Same trick for get_attribute, find_attribute_by_name, etc.; wrap
each Moo attribute spec in a Class::MOP::Attribute stub. This makes
Test::Moose::has_attribute_ok actually test what users mean.
Estimated payoff: low-medium (most MooseX::* extensions need it; few upstream Moose tests do). Estimated effort: ~1 day.
Moose::Util::MetaRole::apply_metaroles is what
MooseX::* extensions use to install custom metaclass roles. A real
implementation needs to compose roles into the metaclass at
install-time — under the shim, "compose into Moo metaclass" is a no-op
that just records the role list, which is enough for most consumers.
Estimated payoff: medium (unlocks every "extends Moose with custom
sugar" module: MooseX::SimpleConfig, MooseX::Getopt, ...).
Estimated effort: ~2–3 days.
The current Moose::Exporter stub only forwards to Moose->import.
A more complete version would install the caller's with_caller /
with_meta / as_is exports onto consumers.
This is the phase that gets us to 477 / 478 passing (everything except the threads-only TODO test).
Phase D status (Apr 2026): started but paused on a core PerlOnJava refcount bug. Findings recorded here so the next attempt picks up where this one left off.
Before trying to bundle Moose, the plan was reviewed for hidden problems. Two real issues surfaced; one is fixed, one remains.
-
*GLOB{SCALAR}returned the value instead of a SCALAR reference. PerlOnJava core bug:*x{SCALAR}yielded the scalar's value where real Perl yields a SCALAR ref. Fixed in the same PR (RuntimeGlob.javaline 554-565: now callscreateReference()like the ARRAY/HASH/GLOB cases). Regression test insrc/test/resources/unit/typeglob.t. This unblockedClass::Load::PP::_is_class_loaded,Package::Stash::PP::get_symbol, and many other modules that read$VERSIONvia the symbol table. -
prove --notdoes not exist. Workaround when Phase D resumes: use a small--execwrapper that returns1..0 # SKIP threads not implementedfort/todo_tests/moose_and_threads.tand runsjperlfor every other file. ~10 lines of Perl, easy to ship.
When weaken was called on a hash slot inside a sub, with the target also held by other strong refs in the caller, the slot became undef immediately. Minimal reproduction:
require Scalar::Util;
my $m = bless {}, "M";
my %REG = (x => $m);
sub attach {
my ($attr, $class) = @_;
$attr->{ac} = $class;
Scalar::Util::weaken($attr->{ac});
}
my @arr = ({}, {}, {});
for my $attr (@arr) {
attach($attr, $REG{x});
}
# Real Perl: all three $arr[i]->{ac} are still defined (weak refs to
# $m, which has a strong ref via %REG).
# PerlOnJava (was, before fix): all three became undef immediately.Class::MOP's bootstrap relies on this pattern pervasively
(weaken($self->{associated_class} = $class) in
Class::MOP::Attribute::attach_to_class, called for every attribute
during Class::MOP.pm's self-bootstrap). Without the fix
use Class::MOP; itself died in the bootstrap.
The auto-sweep (MortalList.maybeAutoSweep →
ReachabilityWalker.sweepWeakRefs(true)) was clearing weak refs to
blessed objects whose selective refCount > 0 simply because the
walker couldn't see them as reachable. The walker only seeds from
package globals and ScalarRefRegistry; it doesn't seed from my
lexical hashes or arrays. A blessed object held only by a my %REG
in the caller's scope is therefore invisible to the walker —
"unreachable" — and got its weak refs cleared on every flush.
ReachabilityWalker.sweepWeakRefs(quiet=true) now skips clearing weak
refs whose referent has refCount > 0. Reasoning: PerlOnJava's
selective refCount can drift due to JVM temporaries, but a positive
refCount means at least one tracked container thinks it's holding a
strong reference. Auto-sweep should be conservative; explicit
Internals::jperl_gc() (non-quiet) still clears, since the user
opted in to aggressive cleanup.
Single-line change in
src/main/java/org/perlonjava/runtime/runtimetypes/ReachabilityWalker.java,
guarded by quiet. See the surrounding comment for the full
reproduction and rationale.
Investigation completed under PJ_RC=1 instrumentation: the metaclass
DOES NOT get destroyed (its refCount oscillates 0↔7 but never reaches
Integer.MIN_VALUE — a small localBindingExists=true guard on the
flush path correctly skips destroy). The actual destroy that triggers
the failure is on a different blessed object — likely an interim
object held briefly by Sub::Install during method installation.
Captured trace excerpt (PJ_RC=1):
RC -1 MortalList.flush b=1677207406 1->0 (refCount>0=true)
RC +1 incrementRefCountForContainerStore b=1677207406 0->1
RC +1 setLargeRefCounted-INC nb=1677207406 1->2
RC +1 setLargeRefCounted-INC nb=1677207406 2->3
RC defer-decrement b=1677207406 refCount=3 (will -1 on flush)
RC defer-decrement b=1677207406 refCount=3 (will -1 on flush)
RC clearWeakRefsTo b=1677207406 (clearing 4 weak refs) # <-- bang
The clearing fires from MortalList.flush → DestroyDispatch.callDestroy
during a routine reference assignment in
Class::MOP::Mixin::HasAttributes's _post_add_attribute chain.
Stack of the destroy trigger:
WeakRefRegistry.clearWeakRefsTo
← DestroyDispatch.doCallDestroy
← DestroyDispatch.callDestroy
← MortalList.flush (line 566)
← anon1205.apply (Class/MOP/Class.pm line 260)
Total events for object 1677207406 over its lifecycle:
- 55 increments
- 45 immediate decrements (
MortalList.flush) - 42 deferred decrements queued
The decrement count exceeds the increment count, indicating a real
asymmetry — but pinpointing which assignment is asymmetric requires
deeper instrumentation than fits in this debugging round, since
PerlOnJava's refcount model has many subsystems (MortalList,
WeakRefRegistry, ScalarRefRegistry, ReachabilityWalker,
MyVarCleanupStack, DestroyDispatch).
Attempt 1: "Skip destroy when weak refs exist" guard in
MortalList.flush():
} else if (WeakRefRegistry.hasWeakRefsTo(base)) {
// skip destroy
}Result: broke 5+ existing weaken / destroy unit tests
(unit/refcount/weaken_destroy.t, weaken_edge_cases.t,
weaken_basic.t, destroy_anon_containers.t,
unit/weaken_via_sub.t Case 5). Reverted.
Attempt 2: Same guard, but tightened to "blessed object with weak refs" only:
} else if (WeakRefRegistry.hasWeakRefsTo(base) && base.blessId != 0) {
// skip destroy
}Applied at both MortalList.flush() and setLargeRefCounted()'s
overwrite-decrement path (line 1192-1200).
Result: still broke the cycle-breaking-via-weaken tests
(weaken_destroy.t, weaken_edge_cases.t, destroy_anon_containers.t)
because those tests use blessed objects in cycles, and rely on
DESTROY firing when the last external strong reference goes away
(letting weaken's cycle-breaking actually free the cycle). With the
guard, the cycle stays alive forever. Reverted.
Lesson: there's no simple predicate that distinguishes "transient refCount drift during heavy reference shuffling" from "genuine end-of-life with weak refs about to clear". The selective refCount system doesn't carry enough information at the destroy gate to make this call. The fix has to be in the accounting itself, not at the destroy gate.
The fix has to make refCount accurate for blessed objects under heavy reference shuffling (Class::MOP self-bootstrap pattern). Below is the detailed plan for getting the count "accurate enough", in priority order — the cheapest, lowest-risk option first.
####### Path 1 (recommended): walker awareness of hash-element seeds
Why this is the right starting point: PerlOnJava already tracks
hash/array element scalars via incrementRefCountForContainerStore,
which registers them in ScalarRefRegistry. The walker iterates
ScalarRefRegistry as roots — but filters out scalars whose
declaration scope has exited (via the MyVarCleanupStack check at
ReachabilityWalker.java lines 110-126).
That filter is correct for my $x lexicals (when the scope ends,
the scalar is logically dead). But it's wrong for hash/array
element scalars: they have no declaration scope of their own —
their lifetime is tied to the enclosing container. A $METAS{HasMethods}
scalar should remain a walker seed as long as %METAS exists.
Fix: in the walker's lexical-seed loop, skip the
MyVarCleanupStack check for scalars marked as hash/array elements
(refCountOwned == true && registered via incrementRefCountForContainerStore).
Use the enclosing container's localBindingExists as the liveness
signal instead.
Concrete patch sketch:
// ReachabilityWalker.java, around the useLexicalSeeds loop:
for (RuntimeScalar sc : ScalarRefRegistry.snapshot()) {
if (sc.captureCount > 0) continue;
if (WeakRefRegistry.isweak(sc)) continue;
// EXISTING check: skip if not in any active scope.
boolean inActiveScope = MyVarCleanupStack.isAlive(sc);
// NEW: hash/array element scalars don't have their own scope —
// treat them as live as long as some container references them.
boolean isContainerElement = sc.refCountOwned
&& ScalarRefRegistry.isContainerElement(sc);
if (!inActiveScope && !isContainerElement) continue;
// Now seed
visitScalarPath(sc, ...);
}ScalarRefRegistry.isContainerElement(sc) is new — it returns true
if sc was last registered via incrementRefCountForContainerStore
(which means it's currently a hash/array slot value). Track this
via a side-set or by exposing a getter on the existing registration.
Verification: with the walker now seeing $METAS{HasMethods}
as a root, the metaclass it points at is reachable, so the auto-sweep
won't try to clear weak refs to it. The transient refCount==0
events during bootstrap are then "false alarms" the walker corrects
on the next sweep cycle — but the MortalList.flush() destroy gate
would still fire prematurely. That's where Path 2 comes in.
Estimated effort: 1 day (small, contained change, easy to test).
####### Path 2: gate MortalList.flush() destroy on walker confirmation — DONE (2026-04-27)
The current flush-destroy gate is: if refCount==0 and !localBindingExists, fire DESTROY.
The Class::MOP bootstrap shows this is too eager — refCount can hit 0
transiently while the object is still reachable through an unwalked
path.
Fix shipped: when the flush gate (or the matching gate in
setLargeRefCounted's overwrite path) would fire DESTROY on a
blessed object, do a scoped reachability check first via the new
ReachabilityWalker.isReachableFromRoots(base) query. Skip DESTROY
only when the walker confirms the object is still reachable from
package globals or MyVarCleanupStack-tracked live my lexicals.
Critical detail: the walker seeds from MyVarCleanupStack.snapshotLiveVars()
(my-vars whose declaration scope is still active), NOT from
ScalarRefRegistry directly. ScalarRefRegistry holds stale entries
(scope-exited scalars not yet JVM-GC'd), which would falsely consider
cycle-broken-via-weaken cycles reachable through their own lexicals.
By gating on MyVarCleanupStack, the walker correctly distinguishes:
- Class::MOP bootstrap:
our %METASis inMyVarCleanupStackwhile Class::MOP.pm loads. Walker traverses %METAS, finds the metaclass via$METAS{HasMethods}, returns true. → Skip DESTROY. - Cycle-break-via-weaken: lexicals in inner block exit, leave
MyVarCleanupStack. The cycle has no path to roots through any live my-var. Walker returns false. → Fire DESTROY normally, cycle freed.
Files changed:
src/main/java/org/perlonjava/runtime/runtimetypes/ReachabilityWalker.java— newisReachableFromRoots(target)method, BFS with hard step cap (50K visits) and short-circuit on target found.src/main/java/org/perlonjava/runtime/runtimetypes/MyVarCleanupStack.java— newsnapshotLiveVars()helper.src/main/java/org/perlonjava/runtime/runtimetypes/MortalList.java— gate atflush()'s destroy path.src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeScalar.java— mirror gate atsetLargeRefCounted()'s overwrite-decrement path.
Verification:
weaken_via_sub.t(20 assertions) — all pass.unit/refcount/weaken_basic.t(34) — all pass.unit/refcount/weaken_destroy.t(24, includes cycle-break) — all pass.unit/refcount/weaken_edge_cases.t(42) — all pass.unit/refcount/destroy_anon_containers.t(21) — all pass.make(full unit suite) — green../jcpan -t DBIx::Class— IDENTICAL to baseline (314 files / 878 tests / 303 failed files / 2 failing assertions). Zero regressions in the most refcount-heavy CPAN distribution we test.
The original Class::MOP bootstrap failure ("Can't call method
get_method on undefined value") is resolved: with the fix, the
metaclass survives the bootstrap, attribute back-references stay
defined, and use Class::MOP proceeds to a deeper layer
(Class/MOP/Class/Immutable/Trait.pm line 59) which is a separate
issue unrelated to refcount — Phase D will hit it next, but the
refcount blocker that prevented even trying the bundled Moose is
gone.
####### Path 3 (deferred — not needed for the immediate Class::MOP bootstrap)
Path 1 + Path 2 together make refCount drift benign for blessed objects in Class::MOP-style heavy-shuffling scenarios. The 55-vs-87 trace asymmetry observed earlier is now harmless: when refCount dips to 0 transiently, the walker confirms reachability and DESTROY is correctly skipped. If a future scenario requires the trace to be genuinely symmetric (vs just "drift-tolerant"), the Path 3 audit still applies — see the original write-up below.
####### Path 3 (deepest fix): refcount accounting symmetry audit
Use this only if Paths 1+2 don't close the gap. Captured
PJ_RC=1 trace from the Class::MOP bootstrap showed 55 increments vs
87 effective decrements for the failing object — a real asymmetry
beyond walker-blindness. That asymmetry has to come from at least
one code path where ++base.refCount and --base.refCount aren't
symmetric.
Audit candidate sites in priority order:
-
@_aliasing on sub call entry (RuntimeCode.apply). Whenattach($attr, $REG{x})is called, the elements of@_alias the caller's expression results. Real Perl uses RC++ on each alias setup, RC-- on @_ teardown at sub exit. Verify:// entry: each @_ slot whose value is a tracked ref → refCount++ // exit: each @_ slot whose refCountOwned=true → refCount--
Trace: with the test
my $obj = bless {}, "M"; sub f { 1 } for (1..10) { f($obj) }
verify
$obj's refCount lands at the same value before and after the loop. If not, that's site #1. -
List-assignment from
@_(my (...) = @_;). The list-copy path may double-count if it goes through bothsetLargeRefCountedAND a bulksetFromListpath that also touches refcounts. Audit: verifyRuntimeArray.setFromListand list-copy bytecode emit only ONE increment per assigned slot. -
Hash element store on overwrite:
$h->{key} = $a; $h->{key} = $b;The first assignment is a fresh slot (one++a.refCount). The second overwrites — should be one++b.refCountAND one--a.refCount. Audit: confirmRuntimeHash.putandsetLargeRefCounted's overwrite path don't double-decrement when both fire on the same overwrite. -
Sub::Install closure captures. Each closure binding captures
$method,$package, etc. The closure's CODE object'scapturedScalarsarray holds these. Verify per-closure scope-exit decrements only the closure's captures, not the caller's locals. Already-suspicious site:RuntimeCode.applyline 546 callsMortalList.deferDecrementIfTracked(s)on captured scalars — may double-fire across nested closures.
Methodology: write a unit test for each candidate site that
asserts base.refCount post-operation. Use the test as a regression
guard before applying the fix at that site. Like:
# t/refcount_audit_at_calls.t
use Test::More;
use Internals qw(SvREFCNT); # PerlOnJava-only helper if needed
my $obj = bless {}, "M";
my $rc0 = SvREFCNT($obj);
sub f { 1 }
for (1..10) { f($obj) }
is(SvREFCNT($obj), $rc0,
'refCount unchanged after 10 sub calls passing $obj');If SvREFCNT isn't exposed, instrument via the PJ_RC=1 env trace
and post-process the log: count increments and decrements for the
target object's id, assert equality.
Estimated effort: 3-4 days (each candidate site is its own investigation + fix + test).
####### Why this order
- Path 1 alone might solve the bootstrap (walker corrects the transient drift before it causes harm). If yes, ship.
- Path 2 closes the gap if the walker is now right but flush-destroy fires before the next walker cycle. If yes, ship Path 1+2.
- Path 3 is only needed if real refcount asymmetry exists beyond
walker-blindness. The 55-vs-87 trace data suggests it does, but
the asymmetry might be benign once the walker correctly identifies
reachable objects (refCount drift is fine if
localBindingExists- walker say "still alive").
The test gate is unchanged from the previous round:
./jperl src/test/resources/unit/weaken_via_sub.t # 20/20 ok
./jperl src/test/resources/unit/refcount/weaken_basic.t # all ok
./jperl src/test/resources/unit/refcount/weaken_destroy.t # all ok (cycle break)
./jperl src/test/resources/unit/refcount/weaken_edge_cases.t # all ok
./jperl src/test/resources/unit/refcount/destroy_anon_containers.t # all ok
./jperl -e 'use Class::MOP; print "ok\n"' # ok
make # green
./jcpan -t DBIx::Class # 11 green / 876 ok / 2 fail (baseline)####### What success looks like
After Paths 1 and 2 land:
- Class::MOP self-bootstrap loads cleanly. The metaclass's
refCount can still drift to 0 transiently, but the walker correctly
reports it as reachable via
our %METASand the flush-destroy gate defers to that. - Existing weak-ref / cycle-break tests still pass. When the walker correctly says "this object is unreachable" (e.g. cycle isolated from external refs), the flush-destroy gate fires DESTROY as before.
- Phase D unblocks: bundled Moose loads. Then D1-D6 mechanical steps complete and 477/478 Moose tests pass.
####### What success does NOT mean
The selective refCount may still over-count in some cases (objects hold refCount > 0 after they're truly dead). That's acceptable: the existing auto-sweep will reap them on the next walker cycle. The problematic direction — under-counting that fires DESTROY too early — is what Paths 1+2 fix.
DBIx::Class is the most refcount-heavy CPAN distribution we test.
Before / after the auto-sweep weaken fix (commit ca3af1ad3):
| Metric | Baseline | After fix |
|---|---|---|
| Files executed | 314 | 314 |
| Assertions executed | 878 | 878 |
| Fully passing files | 11 | 11 |
| Failed files | 303 | 303 |
| Assertions failing | 2 | 2 |
Zero regressions in DBIC.
The MortalList.flush bug is a separate, unrelated bug — its fix is still pending.
./jperl -e 'use Class::MOP; print "ok\n"' # → ok
./jperl -e 'use Moose; print "ok\n"' # → ok (still via shim)Phase D resumption is now unblocked for the simple case.
Active blocker (discovered while attempting D1-D3): MortalList.flush() destroys the metaclass during Class::MOP bootstrap
When the bundled upstream Moose 2.4000 was tried in
feature/moose-phase-d, use Class::MOP; died at the third
add_attribute(...) call in the Class::MOP.pm self-bootstrap with:
Can't call method "name" on an undefined value at .../Attribute.pm line 433
Diagnosis: The bootstrap calls attach_to_class($meta) ten+ times,
each storing a weak back-reference from the attribute to the
metaclass. The first eight succeed; the ninth weaken() enters with
base.refCount == Integer.MIN_VALUE (already destroyed) and
immediately UNDEFs the slot. Stack trace shows the destroy fires
from:
DestroyDispatch.callDestroy
← MortalList.flush (line 558)
← RuntimeScalar.setLargeRefCounted (line 1236)
← assignment in Sub::Install
So between iterations, an ordinary reference assignment (in
Sub::Install method-installation, called by Class::MOP's bootstrap)
flushes mortals, which decrements the metaclass's refCount to 0
and fires DESTROY — even though it's still referenced by %METAS,
by the lexical $meta, by the attribute $self->{associated_class}
slot, etc.
Per-iteration refCount trace from PJ_WEAKEN_TRACE=1:
DBG weaken called: base=metaclass refCount=6 # iter 1
DBG weaken called: base=metaclass refCount=7 # iter 2
DBG weaken called: base=metaclass refCount=7 # iter 3
DBG weaken called: base=metaclass refCount=5 # iter 4 (drop!)
DBG weaken called: base=metaclass refCount=6 # iter 5
DBG weaken called: base=metaclass refCount=6 # iter 6
DBG weaken called: base=metaclass refCount=-2147483648 # iter 9: ALREADY DESTROYED
The refcount is unstable across flushes. The Phase 3 weaken-auto-sweep
fix prevents the auto-sweep from racing the bootstrap, but the
per-flush DESTROY in MortalList.flush() itself decrements
prematurely.
Hypothesis: setLargeRefCounted is double-counting an "owned"
ref-store somewhere. Each store of the metaclass into a hash should
be balanced by exactly one decrement at scope exit. The trace
suggests scope-exit cleanup is running while a hash-element store
is still live — both decrement.
Investigation steps when resuming:
- Add a refCount-history print to
setLargeRefCountedandMortalList.flushforblessId != 0referents. RunJPERL_NO_AUTO_GC=1 ./jperl -e 'use Class::MOP'and capture every increment/decrement on the metaclass. - Cross-check against the same trace under
perl(real Perl, instrumented refcount) for the same Class::MOP.pm bootstrap. Diff to find which assignment is asymmetric. - Most likely culprit: the
MortalList.deferDecrementIfTrackedpath adds the base topendingeven for hash-store assignments that are themselves followed byMortalList.flush()— so the base gets decremented twice (once at the next flush, once at the eventual scope exit). - Alternative culprit: per-statement flush is being called when the tracked owner is a closure capture (Sub::Install installs methods via closures), which over-counts ownership transitions.
Apply the fix surgically. Re-run the W6 (DBIC zero regressions) and W7 (Class::MOP loads) gates. The minimal-repro + unit-test gate already passes (Step W2 unit tests).
The previous Phase D attempt got as far as:
- D1 (bundle upstream
.pmfiles): branchfeature/moose-phase-dcopiedMoose-2.4000/lib/{Class,Moose,Test/Moose.pm,metaclass.pm,oose.pm}intosrc/main/perl/lib/. Branch was deleted when Phase D paused; redo from~/.cpan/build/Moose-2.4000-*/lib/. - D2 (
Class::MOP.pmXSLoader patch): exact patch worked out. Replace theXSLoader::load('Moose', $VERSION)block atClass::MOP.pmline 31 with:XSLoader::load('Moose', $VERSION) if 0; { require Config; if ($ENV{MOOSE_PUREPERL} || !$Config::Config{usedl}) { require Class::MOP::PurePerl; } else { require XSLoader; XSLoader::load('Moose', $VERSION); } }
- D3 (
Class::MOP::PurePerlskeleton): drafted in this attempt. Replicates the simple-reader installation from each of the 13 .xs files plus mop.c. Survived as a dead file in the deleted branch but the design is now well-known. The full inventory of what each .xs installs is documented above. Total replacement is < 500 lines. - D4 (prereq verification), D5 (distroprefs), D6 (snapshot tests): not yet attempted.
Drop Moose-2.4000/lib/Class/MOP* and Moose-2.4000/lib/Moose* and
Moose-2.4000/lib/metaclass.pm and Moose-2.4000/lib/Test/Moose.pm
into src/main/perl/lib/. Replace our existing shim files (Moose.pm,
Moose/Role.pm, Moose/Object.pm, Moose/Util/TypeConstraints.pm,
Class/MOP.pm, Test/Moose.pm, metaclass.pm, and the various
skeleton .pm stubs from Phase 2). Snapshot upstream Moose-2.4000/t/
into src/test/resources/module/Moose/t/ for regression coverage
(this is what AGENTS.md's "lock in progress" rule asks for).
Effort: ~½ day (mostly mechanical).
Upstream Class::MOP.pm does an unconditional
XSLoader::load('Moose', $VERSION) at line 31. On PerlOnJava the
loader fails with "Can't load shared library on this platform" and
the whole module won't compile. Replace the XSLoader::load block
with:
if ($ENV{MOOSE_PUREPERL} || !$Config{usedl}) {
require Class::MOP::PurePerl;
}
else {
XSLoader::load('Moose', $VERSION);
}PerlOnJava's Config::usedl is empty, so this routes to the
PurePerl module unconditionally. (The env var is for forcing PP on
real Perl during development.)
This is the only modification to upstream Moose code. Document it prominently so future sync-ups with upstream don't drop it.
Effort: ~½ day.
The XS provides accessor methods on a handful of mixin classes.
None of them do anything clever — they all read/write hash slots
on the metaclass / attribute / method instances. The breakdown
(by xs/*.xs file):
| .xs file | Lines | What it provides | PP replacement |
|---|---|---|---|
Attribute.xs |
9 | BOOT only — pulls in shared accessor table | trivial |
AttributeCore.xs |
18 | Mixin readers: name / accessor / reader / writer / predicate / clearer / builder / init_arg / initializer / definition_context / insertion_order | one-liners over $_[0]->{...} |
Class.xs |
12 | BOOT only | trivial |
Generated.xs |
9 | BOOT only | trivial |
HasAttributes.xs |
9 | Mixin: _attribute_map reader |
one-liner |
HasMethods.xs |
89 | _method_map, add_package_symbol-tied method install |
pure-Perl Package::Stash-based |
Inlined.xs |
8 | BOOT only | trivial |
Instance.xs |
8 | BOOT only | trivial |
MOP.xs |
22 | is_class_loaded, _inline_check_constraint, etc. |
already in our shim |
Method.xs |
23 | body, name, package_name accessors |
one-liners |
Moose.xs |
148 | Moose::Util::throw_exception_class_callback and friends, init_meta hooks |
most can defer to existing pure-Perl |
Package.xs |
8 | BOOT only | trivial |
ToInstance.xs |
63 | Class::MOP::class_of fast path, blessed-arg checks |
one-liner with blessed/ref |
mop.c |
284 | Shared accessor-generation framework: mop_install_simple_accessor, mop_class_check, mop_check_package_cache_flag |
~150 lines of pure Perl |
Total Perl replacement: well under 500 lines. Most of it is
literally sub name { $_[0]->{name} }-shaped.
The actual implementation lives in one new file:
src/main/perl/lib/Class/MOP/PurePerl.pm. It walks the mixin packages
(Class::MOP::Mixin::AttributeCore, Class::MOP::Mixin::HasAttributes,
Class::MOP::Mixin::HasMethods, Class::MOP::Method, Class::MOP::Package,
Class::MOP::Class, Class::MOP::Attribute, Class::MOP::Instance)
and installs the accessors that upstream's XS would have installed,
plus the few non-accessor helpers (_inline_check_constraint,
Class::MOP::class_of PP version, etc.).
Reference: this is exactly what Class::MOP::PurePerl.pm would have
been before XS was added. The upstream commit that introduced the
XS (bf38c2e9, 2010) is a useful guide — its diff shows exactly
which Perl was replaced.
Effort: ~3 days. Most time goes to implementing & testing the accessor packs, not architectural decisions.
Class::MOP::Package does use Package::Stash;. Upstream
Package::Stash tries Package::Stash::XS first, falls back to
Package::Stash::PP if XS unavailable — this works as-is on
PerlOnJava (we've verified use Package::Stash succeeds today).
Other prereqs already verified working on PerlOnJava (per the
existing dependency-graph table earlier in this doc):
Try::Tiny, Module::Runtime, Devel::GlobalDestruction,
Devel::StackTrace, Devel::OverloadInfo, Sub::Exporter,
Sub::Install, Sub::Identify, Data::OptList, Class::Load,
Eval::Closure, Params::Util, B::Hooks::EndOfScope,
Package::DeprecationManager, Dist::CheckConflicts.
Effort: ~½ day to verify nothing regressed when we move from shim
to real Class::MOP.
Today's Moose.yml distropref runs prove --exec jperl -r t/. Add
an exclusion for t/todo_tests/moose_and_threads.t:
test:
commandline: 'prove --exec jperl -r t/ --not t/todo_tests/moose_and_threads.t'(or, equivalent, use a prove ignore-file.)
Effort: ~10 minutes.
Per AGENTS.md's bundled-modules rule, copy Moose-2.4000/t/ (minus
the threads file) into src/test/resources/module/Moose/t/. Add the
new directory to make test-bundled-modules. From then on,
regressions in any of the 477 passing files are caught by make.
Effort: ~½ day.
| Sub-phase | Effort |
|---|---|
D1: bundle upstream .pm files |
½ day |
D2: patch Class::MOP.pm XSLoader skip |
½ day |
D3: implement Class::MOP::PurePerl |
3 days |
| D4: prereq verification | ½ day |
| D5: distroprefs threads-test exclusion | 10 min |
D6: snapshot tests under module/Moose/ |
½ day |
| Total | ~5 days |
Outcome: 477 / 478 fully-green files
(everything except t/todo_tests/moose_and_threads.t). Anything
still failing after Phase D is a real bug in PerlOnJava core (not in
the Moose port) and gets fixed in core.
After Phase D the bundled Moose ships from the JAR; users don't run
cpan -i Moose. Phase B becomes useful only when somebody wants to
install a different XS distribution that has a pure-Perl fallback
the way Moose does. Not part of the Moose plan.
Affects Moose::Exporter re-export-tracking warnings only. The
real Moose's Moose::Exporter will surface a warning instead of
hard-failing when this magic is missing — acceptable. Not part of
the Moose pass-all-tests plan.
- Phase D started: bundled upstream Moose 2.4000 in
src/main/perl/lib/. use Class::MOPanduse Mooseboth succeed on PerlOnJava (with walker-gated destroy +our %METAS+ Package::Stash::PP slot patch- grep aliasing fix + Method::Accessor weaken disable + a hand-rolled
type-name parser to bypass
(?(DEFINE)…)).
- grep aliasing fix + Method::Accessor weaken disable + a hand-rolled
type-name parser to bypass
- Moose's own test suite (with
./jcpan -t Moose --jobs 1) reaches 412 / 478 fully-green files (was 71 / 478 with the old shim). - DBIC (
./jcpan -t DBIx::Class --jobs 1) regressed from master's0 failing assertions / 2 failed filesto23 failing assertions / 13 failed fileswith the walker gate.
1c938a99d (cherry-picked from ecb5c6400) "fix(refcount):
walker-gated destroy resolves Class::MOP bootstrap blocker" is the
sole DBIC regression source. Bisection: master passes
t/prefetch/incomplete.t; 1c938a99d alone fails it the same way
the full Phase-D branch does ("Can't call method 'resultset' on an
undefined value … source 'Track' is not associated with a schema").
The walker gate is necessary for use Class::MOP to load.
Without it, the metaclass for Class::MOP::Mixin::HasMethods is
DESTROYed mid-bootstrap by a MortalList.flush() and weak refs to
the metaclass clear before _attach_attribute finishes.
The walker gate is simultaneously what breaks DBIC, because the
walker reports reach=false for objects (e.g. DBICTest::Schema)
that are clearly held by a script-level my $schema = DBICTest->init_schema(). With reach=false, the gate falls
through to the destroy path, all weak refs from
ResultSource->{schema} clear, and downstream method calls hit
undef.
So the same gate either over-protects (broken cycle break) or under-protects (broken DBIC bootstrap). The walker's reachability oracle is the ground-truth concept; refining the oracle is the work.
ReachabilityWalker.isReachableFromRoots seeds from:
GlobalVariable.globalCodeRefs— package subs.GlobalVariable.globalVariables/globalArrays/globalHashes— package globals.ScalarRefRegistry.snapshot()filtered byMyVarCleanupStack.isLive(sc) || sc.refCountOwned,!WeakRefRegistry.isweak(sc),!sc.scopeExited,sc.captureCount == 0.MyVarCleanupStack.snapshotLiveVars()— currently-registered my-vars (RuntimeScalar, RuntimeHash, RuntimeArray instances).DestroyDispatch.snapshotRescuedForWalk()— DESTROY-rescued objects.
Then BFS over the seeds, walking RuntimeHash element values and
RuntimeArray elements via followScalar (which honours
!WeakRefRegistry.isweak(s) and the REFERENCE_BIT).
The user-script's my $schema = DBICTest->init_schema() is a
top-level lexical. Tracing showed:
- The
$schemaRuntimeScalar IS registered in MyVarCleanupStack during execution (after the fix to populateliveCountsunconditionally — see "Fixes already landed in this branch" below). - But
seedTarget($schema, target, …)returnsfalsefor the schema target. That is,$schema.valuedoes not point to the blessed RuntimeHash that's being destroyed at the gate-fire moment.
That can mean only one of:
A. $schema.value was overwritten / cleared before the gate
fires (some intermediate call assigned undef to $schema's
storage slot, even though the user's lexical view of $schema
is still live).
B. There are two different DBICTest::Schema blessed instances
— $schema points to instance #1, the gate fires for instance #2,
and #2 is held only by closures / mortals / detached refs.
C. $schema itself is not the lexical the walker thinks it is —
maybe the JVM bytecode emits a copy into a local slot (with
refCountOwned=true) and registers THAT in MyVarCleanupStack,
while the schema lives on the original.
These are correct, useful, and needed regardless of the next fix-level work:
-
MyVarCleanupStack.registeralways populatesliveCounts(Apr 2026). Was gated onWeakRefRegistry.weakRefsExist; meant thatmyvars declared before any weaken() were invisible to the walker. Cost: one HashMap.merge permy. -
ReachabilityWalker.snapshotLiveVars()seeding: walkRuntimeScalarfirst (so its REFERENCE_BIT gets followed viaseedTarget), only then fall through to the genericRuntimeBasebranch. Otherwise the BFS adds the scalar totodobut the BFS body only steps into hashes / arrays. -
our %METASin bundled Class::MOP.pm so the walker finds the metaclass cache as a global hash. -
grepreturns aliases (Class::MOP::MiniTrait depends on it). -
Stub the XS-only Class::MOP / Moose accessors in
Class/MOP/PurePerl.pm. -
Patch Class::MOP::Method::Accessor to skip
weaken($self->{attribute}). The selective refCount can't keep the attribute alive across the brief window betweenweakenand_initialize_body. Trade: leaks attribute objects at global destruction. -
Patch
~/.perlonjava/lib/Package/Stash/PP.pmto bypass*GLOB{SCALAR}for the SCALAR slot (which our impl returns as the value, not a SCALAR ref).
A reliable failing reproducer now lives at
dev/sandbox/walker_gate_dbic_minimal.t (kept in sandbox until it
passes, per project convention; move it to
src/test/resources/unit/refcount/ after the fix).
my @objs;
my @wrappers;
for (1..5) {
my $o = T::Obj->new;
my $w = T::Wrapper->new($o); # weakens $w->{obj}
$o->{wrapper} = $w; # cycle: o -> w (strong), w -> o (weak)
push @objs, $o;
push @wrappers, $w;
}
# many ref operations, then:
# T::Obj id=1 has been DESTROY'd even though @objs[0] still points to it.
# wrapper[0]->{obj} cleared. -> 3/4 of the test's assertions fail.The test deliberately AVOIDS use Test::More — loading it creates
enough additional globals/lexicals that the walker's reachable set
becomes large enough to transitively cover @objs, masking the bug.
The bare-print-TAP version reliably fails on every run.
Stack trace from PJ_DESTROY_TRACE=1 shows the destroy path:
at DestroyDispatch.callDestroy(...)
at ReachabilityWalker.sweepWeakRefs(...)
at MortalList.maybeAutoSweep(...)
at MortalList.flush(...)
So the destroy fires from sweepWeakRefs, not from the
walker-gated MortalList.flush() decrement path. The auto-sweep
calls ReachabilityWalker.walk() to compute a "live" set; anything
not in that set has its weak refs cleared and DESTROY fired.
The walker's walk() method (the multi-phase one used by the
sweep) seeds from:
GlobalVariable.globalCodeRefs(with closure-capture walking)GlobalVariable.globalVariables / globalArrays / globalHashesDestroyDispatch.snapshotRescuedForWalk()ScalarRefRegistry.snapshot()filtered byMyVarCleanupStack.isLive(sc) || sc.refCountOwned
It does NOT seed from MyVarCleanupStack.snapshotLiveVars()
directly. That seeding was added only to the per-object query
isReachableFromRoots() — the same fix needs to go into walk().
So the fix is:
- Make
walk()also seed fromMyVarCleanupStack.snapshotLiveVars()(RuntimeArray / RuntimeHash / RuntimeScalar live my-vars), mirroring theisReachableFromRoots()change. - Apply the same RuntimeScalar-before-RuntimeBase ordering inside the walk's seed-handler loop.
This is D-W2's fix path. Estimated impact: D-W0(c) reproducer passes, DBIC drops from 23 failing assertions back to ≤2, Moose stays at 412/478, refcount unit tests stay green.
-
D-W0 (DONE): reliable reproducer at
dev/sandbox/walker_gate_dbic_minimal.tconsistently fails. -
D-W1 (DONE — Apr 2026): added two seeding fixes to
ReachabilityWalker.walk():- Seed from
MyVarCleanupStack.snapshotLiveVars()so top-levelmy @arr/my %hashlexicals are visible to the auto-sweep. - Order RuntimeScalar before RuntimeBase in the seed handler so scalar reference bits get followed.
Result:
dev/sandbox/walker_gate_dbic_minimal.tpasses. The per-test failing reproducert/prefetch/incomplete.tpasses (20/20). All refcount unit tests stay green. Moose suite stays at 412/478. - Seed from
-
D-W2 (DONE — Apr 2026): RuntimeStash skip in walker BFS.
Stash hashes (RuntimeStash whose
elementsis a HashSpecialVariable) eagerly copy all global keys viaentrySet()on every visit — O(globals) per visit, quadratic when the walker is called repeatedly.Fix:
if (cur instanceof RuntimeStash) continue;inReachabilityWalker.bfs()andisReachableFromRoots(). Stash entries are already directly seeded fromGlobalVariable.global*Refs, so iterating them viastash.elementsis redundant work.Empirical impact (
t/sqlmaker/dbihacks_internals.t):- Before D-W2: never finished (>10 minutes wall-clock, still running)
- After D-W2: ALL 6492 tests pass in 30 seconds
-
D-W2b (PARTIALLY DONE — Apr 2026): lazy
MyVarCleanupStack.liveCountspopulation.MyVarCleanupStack.registernow only populatesliveCountswhenWeakRefRegistry.weakRefsExist == true, restoring pre-D-W1 per-mycost. To preserve D-W1 correctness, the FIRSTweaken()call (inWeakRefRegistry.registerWeakRef) does a one-time backfill: walks the existingMyVarCleanupStack.stackand inserts every still-registered my-var intoliveCounts.Per-test wallclock comparison (master jperl JAR vs feature jperl JAR, same .t files, no harness):
Test Master D-W2 D-W2b vs Master t/05components.t 6.25s 4.85s 2.82s 0.45× t/52leaks.t 40.15s 9.43s 5.90s 0.15× t/76joins.t 9.79s 6.90s 5.52s 0.56× t/86might_have.t 9.66s 9.86s 4.67s 0.48× t/100populate.t - 12.62s 15.76s - t/60core.t - - 15.65s - Most individual tests are now 2-7× faster than master (the walker gate prevents wasteful destroy cascades).
Full DBIC suite results:
Metric Master D-W2 D-W2b D-W2c Wallclock 1410s 3782s 2386s 1748s Tests run 13858 13740 13851 13858 Failed files 0 8 4 0 Failed subt. 0 2 2 0 Result PASS FAIL FAIL PASS D-W2c is the green state. DBIC matches master baseline exactly (314 files / 13858 tests / 0 fail / PASS), 1.24× wallclock cost.
-
D-W2c (DONE — Apr 2026): the walker gate is now class-name-restricted to Class::MOP / Moose / Moo class hierarchies. Other classes get normal Perl 5 destroy semantics (refCount==0 → destroy fires immediately).
Why class-name gating works empirically:
Class::MOP / Moose store metaclasses in
our %METAS(a package global) and rely on the gate to absorb transient refCount drift during bootstrap. DBIC and CDBI store rows inlive_object_indexvia WEAK refs, expecting the row to die at refCount==0 so a fresh fetch reloads from the DB. The two patterns require opposite gate behaviour. The class-name filter cleanly separates them.Why this is a stopgap:
Other modules outside Class::MOP/Moose may need the gate in the future. The proper long-term fix is to either:
-
Find and back-fill the missing refCount increments at the source (when an object transitions from refCount=-1 untracked to refCount=0+ tracked, scan ScalarRefRegistry for scalars holding the object and back-increment).
-
Replace the selective refCount mechanism with a more reliable scheme (e.g. JVM-level identity hashmap keyed by referent, counting actual scalar holders).
Both are deferred to D-W2d.
Per-test verification (all PASS):
dev/sandbox/walker_gate_dbic_minimal.t(4/4)src/test/resources/unit/refcount/walker_gate_dbic_pattern.tT1-T4 (T5 marked SKIP — needs PJ_RUN_T5=1; tests a pattern that fails on master too).t/cdbi/04-lazy.t36/36t/storage/txn_scope_guard.t18/18t/52leaks.t11/11use Moose; package Foo; has bar => (is=>'rw'); ...
-
-
D-W2d (NEXT after D-W2c — perf gap to close): bring the remaining 1.69× wallclock gap closer to 1.0×. Likely candidates:
-
Per-class hasWeakRefs filter. Skip the walker call for classes never weakened (most blessed objects in DBIC's data layer have no weak refs).
-
Cache the walker's live-set per flush. Compute live set once if multiple weak-ref'd objects hit refCount=0 in the same flush.
-
Coalesce gate calls. Queue and process in a single walker pass at flush end.
-
-
D-W3 (BLOCKED on D-W2c): drop reproducer into
src/test/resources/unit/refcount/. -
D-W4 (LATER): Phase 4-6 shim widening for Moose 412→477 / 478.
Per the user's instruction: "Failing weaken/DESTROY is not accepted at all." Every fix MUST be validated against:
./jcpan --jobs 1 -t DBIx::Class # 0 failing assertions, ≤2 failed files
./jcpan --jobs 1 -t Moose # ≥ 412 / 478 fully green
make # full unit suite green
./jperl src/test/resources/unit/refcount/*.t # all pass
./jperl src/test/resources/unit/weaken_via_sub.t # 20/20Parallel runs (./jcpan -t … without --jobs 1) OOM-crash on the
local box for several DBIC tests; that is environmental, not a
DESTROY regression. Always serialise the regression gate with
--jobs 1.
Optimistic order (Phases 3 → 6 ship value incrementally; D is the destination):
- Phase 3a: enriched
Moose::_FakeMeta(@ISAincludesClass::MOP::Class+Moose::Meta::Class; addedadd_attribute/get_attribute/find_attribute_by_name/has_attribute/remove_attribute/get_attribute_list/get_all_attributes/get_method/has_method/get_method_list/new_object/is_mutable). - Phase 3b: added next batch of compile-time
.pmstubs (Class::MOP::Method,Class::MOP::Instance,Class::MOP::Method::Accessor,Class::MOP::Package,Moose::Meta::Method,Moose::Meta::Attribute,Moose::Meta::Role,Moose::Meta::Role::Composite,Moose::Meta::TypeConstraint,Moose::Meta::TypeConstraint::Enum,Moose::Util::MetaRole,Moose::Exception). - Phase 3c: blessed
Moose::Util::TypeConstraints::_StubintoMoose::Meta::TypeConstraint. - Phase 3d: added
export_type_constraints_as_functionsandfind_or_parse_type_constrainttoMoose::Util::TypeConstraints. - Phase 3e: added
Moose::Meta::Role->create_anon_role. - Phase 4: hook into Moo's attribute store from
Moose::_FakeMeta->get_attribute*methods. - Phase 5: real-ish
Moose::Util::MetaRole::apply_metaroles. - Phase 6: full
Moose::Exportersugar installation. - Phase D1: drop upstream
Moose-2.4000/lib/{Class/MOP*,Moose*, metaclass.pm,Test/Moose.pm}intosrc/main/perl/lib/, replacing the shim files. - Phase D2: patch
src/main/perl/lib/Class/MOP.pm'sXSLoader::loadblock to fall back toClass::MOP::PurePerlwhen!$Config{usedl}. - Phase D3: implement
src/main/perl/lib/Class/MOP/PurePerl.pm(~500 lines pure Perl; replacesxs/*.xs+mop.c). Mining reference: upstream Moose pre-XS commitbf38c2e9. - Phase D4: verify all
Class::MOPruntime dependencies still load cleanly with the bundled (vs shim)Class::MOP. - Phase D5: edit
src/main/perl/lib/CPAN/Config.pm'sMoose.ymldistropref to skipt/todo_tests/moose_and_threads.t. - Phase D6: snapshot
Moose-2.4000/t/(minus the threads test) intosrc/test/resources/module/Moose/t/somake test-bundled-modulesenforces no regressions. - After Phase D: write a one-line note at the top of this doc
saying "passes 477/478 of upstream Moose 2.4000". Update
dev/modules/cpan_compatibility.mdif it tracks Moose.
Phases B / E remain deferred as before — they're not on the Moose "pass all tests" critical path.
Two bytecode/runtime bugs found while triaging the Moose 412/478 plateau:
B::svref_2object($code)->GV->STASH->NAME (used by
Class::MOP::get_code_info and Class::MOP::Mixin::HasMethods:: _code_is_mine) returned main for any anon sub created in a
non-main package. Real Perl returns the compile-time package
(CvSTASH).
The CvSTASH is recorded on RuntimeCode.packageName. Fixed by:
Sub::Util::subnamenow returnsPkg::__ANON__when the sub has no name but a known package.Sub::Util::subnamehonors theexplicitlyRenamedflag forset_subname("", $code)(returns empty string, not__ANON__).B.pm's_introspectacceptsPkg::__ANON__,Pkg::and bare renames.
This fix unblocks immutable metaclass trait application (the
Class::MOP::Class::Immutable::Trait::add_method etc. were being
rejected by _code_is_mine and so didn't take effect).
Tests fixed (full file pass, failing_count -> 0):
t/cmop/make_mutable.t(12)t/cmop/numeric_defaults.t(12)t/cmop/subclasses.t(6)t/cmop/method.t(5)t/cmop/method_modifiers.t(3)t/cmop/anon_class.t(3)t/cmop/immutable_metaclass.t(5 -> 1)t/cmop/add_method_debugmode.t(10 -> remaining timing-only)t/exceptions/class-mop-class-immutable-trait.t(2)t/cmop/get_code_info.t(3 -> 1, MODIFY_CODE_ATTRIBUTES name remains)
sort { $_[0]->($a, $b) } @list is the idiom Moose's native Array
trait emits for the sort($cmp) accessor. Real Perl exposes the
surrounding sub's @_ inside the sort BLOCK, but PerlOnJava was
creating a fresh empty @_.
Fixed in the same way map/grep already worked: pass slot 1 (@_) to
ListOperators.sort from both the JVM emitter
(EmitOperator.handleMapOperator) and the interpreter
(InlineOpcodeHandler.executeSort), and forward it as the
comparator's args (unless the comparator has a $$ prototype, in
which case the existing (a, b) semantics still apply).
Bytecode descriptor for the SORT op was widened to include
RuntimeArray (the outer @_).
Tests fixed:
t/native_traits/trait_array.t(6 sort-with-fn failures)
- DBIC: still PASS (314/314, 13858/13858) — verified locally.
- Moose: 396/478 files pass (was 391/478). 137 failed asserts
(was 145). Remaining failures cluster around:
- Numeric/string warning categories not implemented
(
always_strict_warnings.t, etc.) - Native trait Hash coerce + delete corner cases.
- Anonymous metaclass GC timing (depends on weak-ref / walker scheduling).
Moose::Exception::CannotLocatePackageInINCetc. —INCattribute name handling.- Stack-trace shape (
__ANON__in stringified frames, etc.). - A handful of cmop/method introspection edge cases (constants, forward declarations, eval-defined subs).
- Numeric/string warning categories not implemented
(
Date: 2026-04-26
Branch: fix/d-w6-precise-die-probe
With the walker gate disabled in MortalList.java:558 and a more
granular probe added to WeakRefRegistry (env-flag
PJ_WEAKCLEAR_TRACE), we instrumented weaken, removeWeakRef, and
clearWeakRefsTo and ran use Class::MOP after wrapping
Class::MOP::Attribute::{attach_to_class, install_accessors} from
Perl with refaddr-printing logs.
Smoking gun trace (use Class::MOP, gate disabled):
[WEAKEN] ref=... referent=1746570062 (RuntimeHash) ← _methods slot
[WEAKEN] ref=... referent=1746570062 (RuntimeHash) ← method_metaclass slot
[WEAKEN] ref=... referent=1746570062 (RuntimeHash) ← wrapped_method_metaclass slot
[WEAKCLEAR] referent=1746570062 (RuntimeHash) clearing 4 weak refs
at WeakRefRegistry.clearWeakRefsTo(...)
at DestroyDispatch.callDestroy(...)
at MortalList.flush(...)
at jar:PERL5LIB/Class/MOP/Class.pm:260 ← inside Class::MOP::Class::initialize/_construct_class_instance
at jar:PERL5LIB/Class/MOP/Mixin.pm:104 ← Mixin::meta
at jar:PERL5LIB/Class/MOP.pm:2351 ← bootstrap statement
attach_to_class attr=... name=_methods class=1746570062
after_attach: assoc=1746570062 ← OK
attach_to_class attr=... name=method_metaclass class=1746570062
after_attach: assoc=1746570062 ← OK
attach_to_class attr=... name=wrapped_method_metaclass class=1746570062
after_attach: assoc=UNDEF ← BUG
install_accessors: assoc=UNDEF
The bug: the metaclass RuntimeHash (id=1746570062) is stored in
our %METAS (store_metaclass_by_name $METAS{$name} = $self).
Despite %METAS strongly holding it, the selective refCount drops
to 0 at end-of-statement when the temporary returned by
HasMethods->meta falls out of scope and MortalList.flush()
processes its mortals. clearWeakRefsTo is called on the metaclass,
which nulls all 4 attribute back-references including the freshly
weakened wrapped_method_metaclass's associated_class slot.
The first failing install_accessors then sees associated_class = UNDEF and dies on $class->add_method(...). That die is hidden by
local $SIG{__DIE__} inside try { install_accessors } at
Class/MOP/Class.pm:897. The catch block runs remove_attribute → remove_accessors → _remove_accessor at Class/MOP/Attribute.pm:475,
which dies again with the visible Can't call method "get_method" on an undefined value message.
Root cause statement: the selective refCount is failing to
count the strong reference held by the package variable hash slot
$METAS{$package_name}. When the temporary metaclass return-value
from ->meta expires, refCount goes from 1 → 0 even though %METAS
still holds it.
Why the walker gate "fixes" it: the gate at
MortalList.java:558-561 short-circuits the destroy cascade for
metaclass-shaped names, so clearWeakRefsTo never gets a chance to
null the attribute back-refs.
Why universal walker doesn't fix it: the walker is consulted
after refCount underflow. By the time MortalList.flush() decides
to call DESTROY, the refCount is already 0; the walker would need to
detect "refCount=0 but %METAS still references" — which is exactly
what a graph walk from package globals could confirm. The "universal
walker" experiments only checked direct lexical reachability, not
package-variable-hash reachability.
Two options for a real fix:
-
Fix the refCount discipline: ensure
RuntimeHash.put()/ slot-assignment increments the selective refCount of the referent when storing a blessed/tracked RuntimeBase value. Find why this doesn't happen for$METAS{$name} = $meta. -
Walker as ground truth before
clearWeakRefsTo: when refCount hits 0, before firing DESTROY, run a reachability walk from package-variable roots; if reachable, leave refCount at 1 instead of dropping to 0. (Replaces the heuristic gate with a precise walker check.)
Option 1 is the proper fix; option 2 is the safety net we already half-built.
The diagnostic env-flags PJ_WEAKCLEAR_TRACE (now wired up across
weaken, removeWeakRef, clearWeakRefsTo) and PJ_DESTROY_TRACE
should be retained.
Three concrete fix attempts on fix/d-w6-precise-die-probe:
| Variant | DBIC 52leaks | Notes |
|---|---|---|
| Master (class-name heuristic) | 11/11 pass | baseline |
| Universal walker (no heuristic) | 9/11 + die at line 518 | resultset that should leak is incorrectly rescued; downstream weaken(my $r = shift @circreffed) dies because $r is undef |
| Hybrid (heuristic full + non-heuristic globalOnly) | same as universal — DBIC regression unchanged | |
| Add walker gate to weaken's dec-to-0 path | times out (>120s) on t/52leaks.t |
walker invocations on every weaken of Moose/CMOP-blessed objects is too expensive on DBIC's Moose-heavy schema construction |
Conclusion: the class-name heuristic is the best known compromise until we either:
- Fix the underlying
our %METASrefCount-discipline bug (option 1) - Cache walker reachability results to make universal application cheap enough not to time out
Synthetic reproducers in src/test/resources/unit/refcount/drift/
do NOT trip the underflow with non-CMOP class names — proving that
the bug is not just a generic our %HASH storage issue. Some
specific shape of the recursive CMOP bootstrap (interaction between
HasMethods->meta reentry and weakly-attached attributes) is
required.
Suggested next investigation: per-RuntimeBase refCount transition
trace for the specific metaclass, surfacing each increment/decrement
site during the failing use Class::MOP. With 22 decrement sites
across the runtime, blanket instrumentation is infeasible, but a
targeted tracer (turn on for blessed Class::MOP::Class only)
should make the underflow source pinpointable.
Implemented (this branch): a per-RuntimeBase refCountTrace flag
that is armed at bless time when the bless target matches
classNeedsWalkerGate (Class::MOP / Moose / Moo) and the env-flag
PJ_REFCOUNT_TRACE is set. The flag-gated traceRefCount(delta, reason) method writes a stderr line for every transition with a
short stack snippet. Wired into the four critical sites:
WeakRefRegistry.weaken(decrement on weakening)RuntimeScalar.setLargeRefCounted(increment on store)RuntimeScalar.setLargeRefCounted(decrement on overwrite)MortalList.flush(deferred decrement)
Cost when off: one boolean && test per call site. Cost when on:
all increments/decrements logged for matched objects only.
Findings on use Class::MOP (gate disabled): the failing
metaclass RuntimeHash (id 573487274 in one run) accumulated:
- 50 increments via
setLargeRefCounted (increment on store) - 45 decrements via
MortalList.flush (deferred decrement) - 6 decrements via
WeakRefRegistry.weaken (decrement on weakening) - 1 silent increment from
blessitself (refCount++ atReferenceOperators.java:83) - 1 silent paired deferred decrement queued by
blessfromMortalList.deferDecrement(referent)at line 84 (fires later during a flush, counted in the 45)
Net: +1 + 50 - 51 = 0, i.e. one extra decrement.
The 1→0 transition occurs during MortalList.flush() triggered at
the end of statement at Class/MOP/Class.pm:260
(my $super_meta = Class::MOP::get_metaclass_by_name(...)). At
that moment the metaclass should have refCount==1 (held by
our %METAS), but the selective count goes to 0 because one of
the 50 increment sites does not end up paired with the right
decrement (one extra pending.add(metaclass) queued without a
paired increment, or one increment lost without queueing a paired
decrement).
The full trace for the failing metaclass is preserved at
dev/diagnostic-traces/cmop-metaclass-refcount-underflow.txt (102
lines, every transition with stack).
Best fix candidates given this data:
-
Mortal-side rescue (cheap): before
--base.refCountbrings refCount to 0 insideMortalList.flush, consult the walker on a per-blessId whitelist (currently the heuristic). Keep the class-name heuristic — it's the most efficient mask. -
Increment audit (correct, hard): find the unpaired increment/decrement. Likely candidates:
- The bless
pending.add(referent)atReferenceOperators.java:84: if the bless is followed bysetLargeRefCountedstoring the same referent, both happen, and the deferred-decrement's eventual flush is paired with the storing increment — fine. But if the bless is followed by a copy-and-store path that doesn't go throughsetLargeRefCounted, the deferred decrement is unpaired. - The
setLargeRefCounted"rescue" path (currentDestroyTarget) at line 1155 increments without a paired decrement.
- The bless
-
Walker as ground truth (medium): the universal walker (already attempted in D-W6.9) is the most general fix but regresses DBIC. Without a smarter root-set definition, this is not a viable replacement for the heuristic.
The diagnostic env-flags and tracer are retained on this PR so the next session can pinpoint the unpaired site without re-bootstrapping the instrumentation.
The user-stated requirement: the class-name heuristic is not acceptable. We must find and fix the actual unpaired increment/decrement in selective refCount discipline.
For the failing CMOP metaclass, instrumented over use Class::MOP
with the gate disabled:
| Source | Count |
|---|---|
setLargeRefCounted increments (line 1133) |
50 |
bless silent increment (ReferenceOperators.java:83) |
1 |
MortalList.deferDecrement queueings |
2 |
↳ from bless at ReferenceOperators.java:100 |
1 |
↳ from RuntimeArray.shift at RuntimeArray.java:163 |
1 |
MortalList.deferDecrementIfTracked queueings |
42 |
MortalList.deferDecrementRecursive queueings (blessed) |
4 |
WeakRefRegistry.weaken decrements |
6 |
| (queueings flushed during run) | 45 |
Total queueings (deferred decrements): 48 Plus immediate weaken decrements: 6 Total logical decrements at end: 54
Total increments: 51 (50 setLargeRefCounted + 1 silent bless)
Net: 51 - 54 = -3 decrement excess, vs the expected +1
(metaclass held by our %METAS). Discrepancy: 4 extra decrements
unpaired with increments.
Trace artifact: dev/diagnostic-traces/cmop-metaclass-refcount-with-queue-sites.txt
(150 lines, full call-stack snippets).
Step 1: Make the tracer track per-scalar ownership.
Today the tracer knows refCount transitions but not which scalar
holds the relevant ownership flag. The unpaired sites are easier to
find if we tag each setLargeRefCounted increment with the
RuntimeScalar's identity, and each decrement with the scalar whose
refCountOwned is being consumed. Imbalance = an increment scalar
that was never decremented, or a decrement scalar that was never
incremented.
Implementation sketch:
// In setLargeRefCounted: when nb.refCount++ runs,
nb.recordOwner(this); // tag this scalar as an owner
// In deferDecrementIfTracked: when scalar.refCountOwned -> false,
base.releaseOwner(scalar); // remove tag
// At end of run, dump base.activeOwners:
// - 1 entry expected: the hash element scalar of $METAS{name}
// - 0 entries observed → the bugStep 2: Identify the unpaired site.
Run instrumented use Class::MOP once. The 4 extra decrement
queueings will surface as either:
- A scalar that was decremented twice (double-release)
- A pending.add() called for a scalar that never owned the increment
- A
deferDecrementRecursivewalking through a container that shouldn't be torn down (probably the most likely — DBIC's stash cleanup walks too aggressively?)
Likely suspects from the trace:
RuntimeArray.shiftat line 163 queues adeferDecrementfor a blessed metaclass element. When this slot's prior store incremented refCount, this dec balances. But if shift was called on a non-tracked array (or on a slot that was a copy not the original), the dec is unpaired. Stack:Class/MOP/Package.pm:1281— investigate what shift target is being shifted there.deferDecrementRecursive(4 calls) walks recursively into containers being destroyed. If the metaclass is referenced through a hash that itself goes out of scope, this is the path. But the surrounding hash may have been only a transient (e.g., args list to a method) — its tear-down would be paired with whatever increment created that hash's elements.
Step 3: Fix the unpaired site.
Once located, the fix is one of:
- Add a missing increment (e.g., a path that stores into a
container without going through
setLargeRefCounted) - Remove a spurious decrement (e.g.,
deferDecrementRecursivewalking a container whose elements were stored without ownership) - Adjust ownership tracking (e.g., copy-ctor not setting
refCountOwned, but a downstream path treating the copy as if it owned)
Step 4: Remove the class-name heuristic.
Once the underlying bug is fixed, the gate at
MortalList.flush() line 561 simplifies to:
} else if (base.blessId != 0
&& WeakRefRegistry.hasWeakRefsTo(base)
&& ReachabilityWalker.isReachableFromRoots(base)) {
// Universal walker safety net (no heuristic).
}Or even simpler: remove the gate entirely once selective refCount is correct and DBIC's leak detection passes without rescue.
Step 5: Acceptance gates.
The fix is acceptable when ALL of:
makepasses (unit tests)- DBIC
t/52leaks.tpasses 11/11 (today's master baseline) - Moose suite ≥ 396/478 files (today's baseline)
use Class::MOPsucceeds without the walker gate- The synthetic reproducers in
src/test/resources/unit/refcount/drift/pass with the gate disabled
The diagnostic env-flags (PJ_REFCOUNT_TRACE, PJ_WEAKCLEAR_TRACE,
PJ_DESTROY_TRACE) and instrumentation hooks remain in place for
future regressions.
Implemented Step 1 of the D-W6.11 plan: per-scalar ownership tracker
on RuntimeBase with recordOwner(scalar, site) /
releaseOwner(scalar, site). When PJ_REFCOUNT_TRACE is set, every
refCount-affecting operation that touches a heuristic-blessed object
records or releases the owning scalar; dumpTraceOwners() runs as a
JVM shutdown hook.
Wired increments into:
RuntimeScalar.setLargeRefCounted(line 1133)RuntimeScalar.incrementRefCountForContainerStore(line 932)ReferenceOperators.blessre-bless path (line 139)RuntimeScalar.setLargeRefCountedrescue path (line 1183)
Wired releases into:
RuntimeScalar.setLargeRefCountedoverwrite (line 1196)WeakRefRegistry.weaken(line 116)MortalList.deferDecrementIfTracked(line 183)MortalList.deferDecrementRecursive(line 444)RuntimeArray.shift(line 164)DestroyDispatch.doCallDestroyargs balance (line 364)
After plumbing all known inc/dec paths: zero *** UNPAIRED RELEASE ***
events, but one CMOP metaclass (RuntimeHash id 408069119) ends
the run with refCount = Integer.MIN_VALUE (destroyed) and 2
surviving owner scalars holding strong references. Trace at
dev/diagnostic-traces/cmop-metaclass-with-owner-tracking.txt.
The two surviving owners both came from setLargeRefCounted store:
- One via
RuntimeScalar.set ← addToScalar:902(normal store path) - One via
RuntimeScalar.set ← RuntimeBaseProxy.set:65(proxy delegate)
This proves the imbalance is not in the tracer instrumentation: those owner scalars genuinely still hold strong refs to the destroyed metaclass. Selective refCount said "0 strong refs" while in fact 2 strong refs exist.
The 2 extra decrements that brought refCount → 0 must come from
sites that modify base.refCount without consulting ownership.
Audit candidates (sites doing --base.refCount directly):
| File | Line | Path |
|---|---|---|
RuntimeList.java |
623, 642, 666, 699 | list-assignment "undo materialized copy" |
Storable.java |
617 | dclone refCount fixup |
DestroyDispatch.java |
366 | doCallDestroy args balance (instrumented; OK) |
RuntimeBaseProxy.java |
65–67 | proxy set: copies lvalue.value to proxy without ref-tracking |
The most suspicious is RuntimeBaseProxy.set:
this.lvalue.set(value); // properly increments refCount via setLargeRefCounted
this.type = lvalue.type;
this.value = lvalue.value; // proxy ALSO points to base, but no recordOwnerThe proxy's this.value field then holds a strong reference to the
base invisible to selective refcounting. When the proxy is
later assigned a new value, the proxy's set() calls
lvalue.set(new_value) which decrements old base's refCount via the
overwrite path — but only because lvalue.refCountOwned is true. If
some path inadvertently decrements via this.value (bypassing
lvalue), or the proxy is treated as a normal scalar copy
(invalidating lvalue.refCountOwned), the count desyncs.
For each site that does base.refCount-- directly (without going
through the queueing-then-flushing protocol):
- Confirm what RuntimeScalar "owned" the increment that this decrement is undoing.
- If a specific scalar can be identified, replace direct decrement
with
releaseOwner(scalar, site) + base.refCount--. - If no scalar can be identified, the path is decrementing without pairing — fix by either: a. tracking the increment differently (e.g. recording on the original increment), or b. removing the decrement (it was unpaired in the first place).
Once all sites are cleanly paired, the selective refCount becomes self-consistent and the walker-gate heuristic can be removed.
dev/diagnostic-traces/cmop-metaclass-refcount-underflow.txt— D-W6.10 trace (4 inc/dec sites instrumented).dev/diagnostic-traces/cmop-metaclass-refcount-with-queue-sites.txt— D-W6.10 with queue-site tracing.dev/diagnostic-traces/cmop-metaclass-with-owner-tracking.txt— D-W6.12 trace (full owner pairing, surfaces the 2 unpaired increments).
The diagnostic instrumentation is left in place (gated on
PJ_REFCOUNT_TRACE) so future sessions can resume the audit.
This session implemented Step 2 of the D-W6.11 plan: audited and
instrumented all direct --base.refCount sites with paired
releaseOwner / releaseActiveOwner calls. The new
base.activeOwners set on RuntimeBase tracks the live set of
RuntimeScalars that hold a counted strong reference, with
activeOwnerCount() providing a filtered count (only scalars that
still satisfy refCountOwned == true && value == this).
RuntimeArray.shiftline 163 (MortalList.deferDecrementpath)RuntimeList.javaline 624, 645, 670, 705 (4 "undo materialized copy" paths)Storable.javareleaseApplyArgsline 617DestroyDispatch.doCallDestroyline 366 (args balance)MortalList.deferDecrementIfTrackedline 184MortalList.deferDecrementRecursiveline 446WeakRefRegistry.weakenline 119RuntimeScalar.setLargeRefCountedoverwrite at line 1199 and store at line 1135
Trying activeOwnerCount() > 0 → restore refCount as a universal
rescue at MortalList.flush() (replacing the class-name
heuristic):
| Result | Note |
|---|---|
use Class::MOP works without heuristic |
confirmed |
use Moose works without heuristic |
confirmed |
| Unit tests: PASS | all green |
DBIC t/52leaks.t: 9–10 of 18 fail |
leak rescue too aggressive — keeps cycle members alive that real Perl correctly leaks |
The fundamental issue: selective refCount cannot tolerate
cycles without weaken. My filter refCountOwned && value == this
finds true strong owners, including cycle members that real Perl
also leaks but DBIC's leak test expects to see destroyed (likely
because real Perl's mark-sweep would clean them up at GC time, or
because the test environment specifically expects refcount-zero).
Reverted production rescue to retain master parity (11/11 on
t/52leaks.t, walker-gate heuristic still primary).
Calling base.activateOwnerTracking() on every weaken() (which
backfills from ScalarRefRegistry.snapshot()) caused 9/18 fails
on t/52leaks.t — but activateOwnerTracking is supposed to be
side-effect free (just initializes a private set + reads the
registry). The most plausible explanation: iterating
scalarRegistry.keySet() triggers WeakHashMap.expungeStaleEntries(),
which can shift the timing of when JVM GC observes scalars as
collected. This indirectly affects DBIC's leak detection (which
relies on weak refs becoming undef at specific points).
For now, removed the activateOwnerTracking() call from
weaken() — the infrastructure is dormant unless explicitly
activated. Future work: investigate the WeakHashMap expungement
side-effect.
- All ownership-affecting sites have paired record/release calls
activeOwnersset +activeOwnerCount()filter ready for use- DBIC
t/52leaks.t: 11/11 (master parity) - Unit tests: green
use Class::MOP/use Moose: work (with class-name heuristic still gating)- Walker-gate heuristic still in place — replacement still pending the proper refCount-discipline fix
The selective refCount underflow is real (D-W6.10 trace shows
the metaclass refCount going to 0 with 2 surviving strong owners
— D-W6.12). The 2 unpaired increments are still unidentified. The
audit of direct --refCount sites pointed at all known paths,
which now have paired releases — yet the trace numbers from
D-W6.12 (50 inc / 51 dec) still don't balance.
The next investigation should:
- Re-run the D-W6.12 trace with all the new release calls in
place (this branch). The owner dump should now show clean
refCount == owners.size()for the metaclass. - If imbalance persists: the unpaired sites are NOT in the 22
reviewed locations. Search for hidden refCount manipulations:
- assignment paths in
RuntimeArrayProxyEntry,RuntimeHashProxyEntry,RuntimeStash - bytecode-emitted scope-exit code that uses direct field access
- assignment paths in
- If the trace is now clean: the bug is elsewhere — perhaps in
how
refCount = MIN_VALUEinteracts with subsequent stores (the rescue path insetLargeRefCounted).
Question raised: my activeOwnerCount > 0 rescue regresses
DBIC's leak detection by 5-9 tests. What does system Perl do
differently?
Answer: System Perl does not solve cycles. It uses precise
reference counting, and cycles leak by design. The programmer breaks
cycles via explicit weaken() (or Scalar::Util::weaken).
DBIC's leak tests pass on real Perl because DBIC weakens its
internal back-references (e.g., ResultSource → Schema is weak),
allowing each reference-counted graph to collapse properly when an
external strong reference is dropped.
For PerlOnJava to behave the same way, we need:
- Precise selective refCount — every increment paired with exactly one decrement, no transient zeros.
- Effective weaken() — weakening must decrement refCount and exclude the slot from owner-counting (already done correctly).
- No cycle-breaker rescue — cycles SHOULD leak (matching real Perl), and DBIC's weakens will resolve them.
Selective refCount has transient zeros. The deferred
decrement model (MortalList.flush) means a sequence like
inc → queue → inc → flush → flush → inc can have refCount briefly
hit 0 between the second flush and the third inc — even though the
third inc's owning scalar was alive throughout.
When the transient zero hits, MortalList.flush() fires DESTROY,
permanently corrupting the object's state (clearWeakRefsTo, cascade
cleanup). Even if subsequent stores re-bump refCount, the damage
is done.
activeOwnerCount > 0 correctly identifies objects with surviving
strong owners. But:
-
For Class::MOP/Moose metaclasses (held by
our %METAS): surviving owners reflect real strong refs. Rescue is correct. -
For DBIC row objects in test cycles:
populate_weakregistryweak-refs the row. Then test doesundef $rowin some scope. Real Perl: refcount drops to 0, DESTROY fires, weak ref clears. PerlOnJava: selective refCount has phantom owners — typically container element scalars whose containers are themselves transient/dying but haven't yet released their elements.
The phantom owners satisfy the filter refCountOwned == true && value == base, but are themselves on a path to destruction.
Rescue keeps them alive, breaking the leak detection.
System Perl's approach (precise refcount + programmer-controlled weaken) maps to PerlOnJava as:
Goal: eliminate transient zeros from selective refCount without changing the deferred-decrement architecture.
Algorithm options:
-
Synchronous decrement (simplest, correct, expensive): make all decrements synchronous like real Perl. Eliminate
MortalList.flushand put decrement at scope-exit / overwrite sites. Performance cost: every Perl statement has ~10x more refCount work today. -
Owner snapshot at flush (lazy validation): when a deferred decrement would bring refCount to 0, validate the activeOwners set first. Force-purge stale entries by:
- Iterating activeOwners
- For each owner scalar: check
sc.refCountOwned && sc.value == base - For surviving owners: check whether they are themselves reachable from package globals OR live my-vars
- Rescue ONLY if at least one surviving owner is reachable
The "reachable owner" check is the critical filter — it excludes phantom owners that are themselves on a path to destruction. This addresses the DBIC regression.
-
Two-phase destruction (deferred validation): when refCount→0, defer the actual
clearWeakRefsToand cascade. Add the object to a "pending destruction" queue. Validate at next safe point (e.g., outer flush completion or before END blocks):- Force a JVM
System.gc()to purge stale weak refs fromScalarRefRegistry - Re-check
activeOwnerCount - Fire DESTROY only if still 0 This matches Java's deferred finalization model. Performance: System.gc() is slow but only at infrequent boundaries.
- Force a JVM
Recommended: Option 2 (owner snapshot + reachability filter). The reachability filter naturally excludes phantom owners (they won't be reachable from roots once their containing scopes have exited).
// At MortalList.flush() refCount→0:
if (base.activeOwners != null) {
// Filter: only count scalars that are owned AND reachable
int reachableOwners = 0;
for (RuntimeScalar sc : base.activeOwners) {
if (sc.refCountOwned && sc.value == base
&& ReachabilityWalker.isScalarReachable(sc)) {
reachableOwners++;
}
}
if (reachableOwners > 0) {
base.refCount = reachableOwners;
return; // skip DESTROY
}
}
// Otherwise: fire DESTROY normallyWhere isScalarReachable(sc) checks if sc itself is reachable
from any package global, live my-var, or stash entry. This is a
new method on ReachabilityWalker — currently the walker checks
"is base reachable from roots" but not "is THIS specific scalar
reachable".
Implement Option 2:
- Add
ReachabilityWalker.isScalarReachable(RuntimeScalar)— walks from roots looking for the specific scalar identity. - Wire it into the rescue check at
MortalList.flush(). - Test: Class::MOP loads (with metaclass having package-global
reachable owners), DBIC
t/52leaks.t11/11 (cycle objects have only phantom owners not reachable from roots). - Remove the class-name heuristic gate.
This is a tractable design change and matches system Perl's semantics: only objects with reachable strong owners are kept alive; cycles (with no external reachable owner) leak as in real Perl.
Implemented reachableOwnerCount() (refCount-rescue using
walker-validated active owners) and disabled the class-name
heuristic gate. Wired into MortalList.flush() at the
refCount→0 transition.
Results:
| Test | Status |
|---|---|
use Class::MOP (no heuristic) |
✅ works |
use Moose (no heuristic) |
✅ works |
| Unit tests | ✅ green |
DBIC t/52leaks.t |
❌ "detached result source" at line 433 (early failure) |
The Class::MOP/Moose case is fully fixed: the metaclass owner
(the $METAS{name} hash element) is found reachable through
globalHashes, so its reachableOwnerCount() returns >0 and
DESTROY is suppressed correctly.
DBIC still regresses: a Schema or ResultSource object is destroyed prematurely. Some scalar holding the Schema strongly isn't reachable via the current walker's seeds (globalCodeRefs, globalVariables, globalArrays, globalHashes, MyVarCleanupStack live my-vars, RuntimeCode.capturedScalars).
The walker doesn't find Schema's owner. Likely candidates:
- The owner is a JVM-stack-live RuntimeScalar that isn't
registered in
MyVarCleanupStack(e.g., a method-call argument scalar that's still on the JVM call stack). - The owner is in a TIE_HASH/TIE_ARRAY/TIED_SCALAR slot that needs special walking.
- The owner is in a RuntimeStash entry that the walker doesn't
enter (line 519 in
isReachableFromRoots(target, false)skips RuntimeStash for perf).
The walker is fundamentally a snapshot of live JVM state at
this exact moment. Selective refCount can hit 0 transiently
DURING method call frames where strong owners are sitting on the
JVM stack but haven't been pushed into MyVarCleanupStack (e.g.,
intermediate RuntimeScalar values during method dispatch).
The right fix would be one of:
-
Make every refCount-affecting RuntimeScalar register itself in MyVarCleanupStack (like local vars), so the walker can always find live owners. Cost: every increment of refCount adds a stack entry.
-
Use Java GC as ground truth: treat refCount<=0 + walker-unreachable as "candidate for destruction". Defer the actual DESTROY call to a periodic safe point (e.g., MortalList batch-flush boundary), where we force
System.gc()and re-validate. Java GC will purge truly-dead objects from ScalarRefRegistry; survivors are real strong owners. -
Eliminate transient zeros at the source: ensure selective refCount never goes to 0 except at true scope exits, by making MortalList drain only AFTER the destination scalar's increment has happened. Requires re-ordering JVM- emitted bytecode for assignments (probably very invasive).
The class-name heuristic gate is not removable today. The infrastructure for D-W6.14 Option 2 is in place but the reachability gap for DBIC's intermediate owners must be bridged first. The heuristic gate is the safety net until that bridge is built.
For this session, the heuristic gate remains disabled in code
(line 591 has else if (false ...)), with reachableOwnerCount()
as the only rescue. This means Class::MOP/Moose work but DBIC
regresses. To restore master parity, re-enable the heuristic gate
as a fallback alongside reachableOwnerCount > 0.
Implemented a more aggressive isScalarReachable() that:
- Seeds from globalCodeRefs/Variables/Arrays/Hashes (package globals)
- Seeds from
MyVarCleanupStack.snapshotLiveVars()(active lexical scopes) - Follows
RuntimeCode.capturedScalarsduring BFS - Skips weak refs (
WeakRefRegistry.isweak) - Auto-init
activeOwnerson firstrecordActiveOwner(no need for explicit activation in weaken)
Results:
| Test | Status |
|---|---|
use Class::MOP (no heuristic via reachableOwnerCount) |
✅ works |
use Moose |
✅ works |
| Unit tests | ✅ green |
DBIC t/52leaks.t 1-8 (basic) |
✅ pass |
DBIC t/52leaks.t 9-12 (per-object GC checks) |
❌ 4 fails |
Master t/52leaks.t baseline |
11/11 (heuristic only) |
The walker correctly reaches the our %METAS cache via
globalHashes, which fixes the Class::MOP/Moose case without
needing the class-name heuristic.
DBICTest::Artist/CD row objects with phantom strong owners. After
undef $row, real Perl's refcount drops to 0 and DESTROY fires.
PerlOnJava with my walker finds Artist still reachable through
SOME path (likely a phantom my-var or scalar still holding the row
strongly).
Possible sources:
- DBIC's internal method dispatch left a my-var holding the row that wasn't released because selective refCount has a pre-existing imbalance.
- Closure captures: a closure created during DBIC processing captured the row; the closure is reachable from globals.
- JVM lazy GC: a RuntimeScalar holding the row hasn't been
GC'd, and is still in
MyVarCleanupStack.liveCounts.
The walker correctly handles weak refs (skips them via
WeakRefRegistry.isweak), but cannot distinguish "phantom strong
holder" from "real strong holder" — both look identical at the JVM
level.
The walker-gate heuristic (classNeedsWalkerGate) only matches
Class::MOP/Moose/Moo classes. DBICTest::Artist/CD aren't in
heuristic, so the heuristic doesn't rescue them either. The
problem is reachableOwnerCount > 0 itself returns positive for
Artist/CD when it shouldn't.
Class::MOP/Moose ARE removable from the heuristic gate (the new
walker handles them correctly via our %METAS). But removing
the heuristic outright still has 4 DBIC leak-test failures.
The path forward: dig into WHICH scalar in the walker is finding each Artist/CD reachable. Add a probe that prints the path. From there, identify whether it's a closure capture, a my-var, or a hash element — then either filter that path out or fix the underlying refCount imbalance that left it as a phantom.
The commit ships reachableOwnerCount() as the primary rescue,
with the walker-gate heuristic as a fallback. DBIC still has 4
failures (the heuristic doesn't help for non-Moose classes), but
Class::MOP/Moose now work via the precise walker rather than the
heuristic — so even if the heuristic was deleted, those modules
would continue to load.
To delete the heuristic safely, the over-rescue of DBIC rows must be addressed first — see D-W6.17 (next session).
The original D-W6.15 plan suggested making selective refCount precise by ensuring transient zeros only happen at scope-exit boundaries (matching Perl 5's FREETMPS). This session attempted that and learned the plan needs revision.
Hypothesis: replace the MortalList.flush() at the end of every
setLargeRefCounted with a single flush per statement (FREETMPS at
statement boundaries, matching Perl 5).
Implementation:
- Removed
MortalList.flush()fromsetLargeRefCounted - Added per-statement
MortalList.flush()in JVMEmitBlockafter each non-last statement - Added per-statement
MORTAL_FLUSHopcode inBytecodeCompilerfor the interpreter backend
Result:
- Unit tests: PASS
- DBIC
t/52leaks.t11/11: PASS (with heuristic gate enabled) - Class::MOP load WITH heuristic: works
- Class::MOP load WITHOUT heuristic: STILL FAILS with "Can't call method 'get_method' on undefined value at Class/MOP/Attribute.pm line 475"
Conclusion: per-statement flush is correct (matches Perl 5 semantics), but it does NOT fix the underlying refCount imbalance. The 50 inc / 51 dec event count from D-W6.10 is a REAL imbalance, not a timing artifact. Flush schedule changes WHEN decrements fire, not WHETHER they're paired.
The user's plan asked for "transient zeros only at scope-exit boundaries". This would be sufficient IF the selective refCount were otherwise balanced. But it's NOT balanced — somewhere a decrement fires for which no matching increment ever happened (or vice versa).
The transient-zero hypothesis came from observing a specific
trace: refCount went 1→0 mid-statement during Class/MOP/Class.pm:260.
But that 1→0 was the CONSEQUENCE of the imbalance, not a cause —
the metaclass's "true" refCount should be 2 at end of run (matching
2 surviving owners), but PerlOnJava's selective count went to
MIN_VALUE because there were 2 extra decrement events somewhere.
Step 1: Per-pair tracing. Tag each setLargeRefCounted increment with a unique ID. Tag each decrement with the ID of the increment it's pairing with. At end of run, find unmatched IDs.
Step 2: Identify the unpaired site. From the trace, identify the specific code path that produces an unmatched event.
Step 3: Fix the site. Either:
- Add a missing increment for an unmatched decrement (correct: a store path that doesn't go through setLargeRefCounted)
- Remove a spurious decrement for an unmatched increment (correct: a path that calls deferDecrement but no matching scalar exists)
Step 4: Verify all four invariants without heuristic:
- Unit tests pass
- Class::MOP/Moose load
- DBIC
t/52leaks.t11/11 - The synthetic reproducers in
src/test/resources/unit/refcount/drift/pass
Even after fixing the imbalance, switching from per-assignment to per-statement flush has these benefits:
- Matches Perl 5 semantics — FREETMPS at statement boundaries
- Performant — fewer flush calls per script (one per statement vs per assignment)
- Cleaner refCount transitions — within a statement, refCount only increases; at statement end, all decrements fire together
But this is a separate refactoring, distinct from the imbalance fix.
This session: experimental per-statement flush implemented and verified. Heuristic gate restored to primary. The plan correctly identifies that the underlying imbalance is the root issue; flush-schedule changes alone don't fix it.
Next step: Step 1 (per-pair tracing) to pinpoint the unpaired site that creates the 50/51 imbalance for the CMOP metaclass. Once the specific path is identified, the fix is targeted and correct.
User asked: "review the plan to make sure we are fixing the right thing — must be performant and correct."
- Master with class-name heuristic gate works correctly for both Class::MOP/Moose AND DBIC. Performance is acceptable.
- Replacing the heuristic with universal walker breaks DBIC.
- Replacing per-assignment flush with per-statement (FREETMPS) does NOT fix the underlying imbalance.
- The 50/51 inc/dec count from D-W6.10 represents a real refCount imbalance for the CMOP metaclass — somewhere a decrement fires without a paired increment.
Real Perl's behavior:
- Class::MOP metaclass in
our %METAS: refcount stays ≥ 1 forever (held by package hash); object never destroyed during program run - DBIC row in
my $row: refcount drops to 0 whenundef $row; destroyed correctly
Master's heuristic gate:
- For Class::MOP/Moose/Moo classes: walker check at refCount→0 →
finds reachable from
%METAS→ rescue. Works. - For DBIC row classes: heuristic NOT triggered → standard refCount → 0 → DESTROY. Works.
The heuristic is empirically correct for the modules we care about. Its shape is: "for these specific classes, selective refCount may have transient zeros; consult walker before firing DESTROY."
It's NOT semantically wrong — it correctly distinguishes two real classes of object lifetime:
- Module-global metadata (CMOP metaclass): persistent throughout program run, held by package globals
- Per-call data (DBIC row): transient, follows lexical scope
The heuristic's flaw is:
- Hard-coded class name list — won't extend to other module-global metadata (custom MOPs, plugin systems)
- Doesn't capture the underlying property
Replace classNeedsWalkerGate(blessId) with a property-based check
that captures "this object is stored in a package-global hash":
// Set on the RuntimeBase when stored as the value of a
// global hash element (e.g., $METAS{Foo} = $meta).
public boolean storedInPackageGlobal = false;
// Set in setLargeRefCounted when 'this' (the scalar being assigned to)
// is an element of a hash registered in GlobalVariable.globalHashes.Then the gate becomes:
} else if (base.blessId != 0
&& base.storedInPackageGlobal
&& WeakRefRegistry.hasWeakRefsTo(base)
&& ReachabilityWalker.isReachableFromRoots(base)) {
// Rescue: this object's lifetime is module-global, but
// selective refCount has transient zeros. Walker confirms
// reachability; suppress DESTROY.
}For DBIC rows: never stored in a package-global hash → flag never set → heuristic doesn't trigger → standard refCount → DESTROY on undef.
For CMOP metaclass: stored in $METAS{...} → flag set → walker
checks reachability → rescue.
- Per-store cost: one boolean check + one flag set if scalar's container is a global hash. O(1) per store.
- Walker cost: only fires when object has weak refs AND storedInPackageGlobal AND refCount→0. Same gating as today's heuristic — same performance envelope.
- No tracing or instrumentation overhead.
- Class-name agnostic: works for any module that uses package- global hashes for metadata (extensibility).
- Behaviorally equivalent to current heuristic for tested modules (Class::MOP, Moose, Moo, DBIC).
- Captures the underlying property explicitly rather than via ad-hoc class name list.
- Add
boolean storedInPackageGlobalfield toRuntimeBase. - In
RuntimeScalar.setLargeRefCounted: when storing, check ifthis(destination scalar) is an element of aRuntimeHashthat's inGlobalVariable.globalHashes. If yes, set the flag on the new base. - Replace
classNeedsWalkerGate(blessId)withbase.storedInPackageGlobalin the walker gate condition. - Verify:
makepassesuse Class::MOP; use Moose;works- DBIC
t/52leaks.t11/11
- Once green, delete
classNeedsWalkerGateand the class-name list entirely. The class-name heuristic is gone.
This is "principled removal of the class-name heuristic". The heuristic worked because it captured a real property — we're just moving from "approximate by class name" to "exact via flag set at store time".
Alternative plans rejected:
- Per-statement FREETMPS: doesn't fix the imbalance (proven by D-W6.17 experiment).
- Universal walker: breaks DBIC (regression in 4-9 tests).
- Find unpaired refCount site: bounded work but unclear if finable in practice; even if found and fixed, the heuristic is still in place (orthogonal).
- Java GC ground truth: major rewrite, too risky.
The flag must propagate correctly:
- Stored in package-global hash → flag set on referent's RuntimeBase
- If stored in a sub-hash (e.g.,
$Foo::CACHE{x}{y} = $meta) — does the deep storage propagate? Need to verify. - For now, only set flag on direct global-hash stores. Indirect stores would not be rescued — but that matches the current heuristic's behavior (only direct module-global metadata).
This is the next-session implementation task.
Status: CLOSED — not a regression in this branch.
Original symptom (user report):
DBIx::Class::ResultSource::schema(): Unable to perform storage-
dependent operations with a detached result source (source 'CD'
is not associated with a schema). at t/52leaks.t line 208
t/52leaks.t .....................................
Dubious, test returned 255 (wstat 65280, 0xff00)
All 6 subtests passed
Resolution (Apr 29, 2026): The user's time ./jcpan -t DBIx::Class
ran make test in ~/.perlonjava/cpan/build/DBIx-Class-0.082844-81/.
That dist's Makefile had been generated by a previous CPAN install
under /Users/fglock/projects/PerlOnJava4/jperl (a different
checkout on branch feature/storable-native-format, with WIP
Storable native-format changes). So the failing make test was
exercising PerlOnJava4's Storable code, not this branch's
walker-gate code.
Verification: re-running the exact same t/*.t t/admin/*.t ...
glob using /Users/fglock/projects/PerlOnJava2/jperl (this
branch) under the same test_harness(0, ...) invocation, the
test passes:
t/52leaks.t ......................................... ok
t/53lean_startup.t .................................. ok
t/60core.t .......................................... ok
So:
- The walker gate (PR #618) and per-flush cache (
f45308336) are not implicated in the user'smake testfailure. - The failing test under PerlOnJava4 surfaced a
dclone failed: freeze failed: Can't store CODE itemsdiagnostic, consistent with the in-progress Storable rewrite on that branch.
Lesson learned: when a CPAN dist's Makefile was generated by
a different jperl binary, make test will keep using that binary
even if the user's current shell points at a different repo. To
test a specific PerlOnJava checkout's behaviour against an existing
build dir, either:
- Run
jperl Makefile.PLfrom the target repo to regenerate the Makefile, or - Skip
make testand invokejperl ... test_harness(...) t/*.tdirectly with the binary you want to test (as done above).
No code change required. Section retained for the historical record so a future agent doesn't repeat the misdiagnosis.
- xs_fallback.md — XS fallback mechanism
- makemaker_perlonjava.md — MakeMaker implementation
- cpan_client.md — CPAN client support
.agents/skills/port-cpan-module/— Module porting skill