
Conversation

@tkarna tkarna commented Nov 18, 2025

Implements #16.

  • Adds a Workload object and execution-engine utility functions that can execute it.
  • Adds two CPU examples that demonstrate the usage:
    • One allocates input data with NumPy; the other uses MLIR helper functions to allocate/fill/deallocate input memrefs.

@rolfmorel rolfmorel left a comment

Here's a partial pass through it. Will try to complete the first pass through tomorrow!

pass

@abstractmethod
def get_input_arrays(self, execution_engine) -> list:
Contributor

-> abc.Sequence[ir.Value]?

Contributor

If each Workload were explicitly associated with a Target (or more than one), i.e. had a Target member, and each Target kept track of its associated execution_engine, this signature would become simpler.

Contributor Author

Creating the execution engine requires a fully lowered payload IR. So keeping the engine in the Workload object complicates the re-use of this object, e.g., for different schedule parameters.

The flow is something like: ParametrizedWorkload(schedule with unknown params, initial payload IR, correctness test, target specific info, ...) -> ConcreteWorkload(lowered payload IR, target specific info, ...).

In this proposal the Workload object addresses the first part: it is a reusable, high-level description of the problem. Every time one asks for a payload or schedule module, a new module is generated (the payload module is lowered in place, so it cannot be reused; the schedule may depend on the given parameters). The latter part, lowering to a final payload IR and executing it, is handled via the execution helper functions. We could design this differently, e.g. by adding another "ConcreteWorkload" object that is associated with an execution engine (either in this PR or later). All ideas are much appreciated!
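
To make that two-stage flow concrete, a minimal Python sketch follows; all class and method names are illustrative and not the API proposed in this PR.

# Hypothetical sketch only; names and signatures are invented for illustration.
from abc import ABC, abstractmethod


class ParametrizedWorkload(ABC):
    """Reusable high-level problem description (schedule parameters unknown)."""

    @abstractmethod
    def payload_module(self):
        """Build a fresh payload module (lowered in place, so not reusable)."""

    @abstractmethod
    def schedule_module(self, **schedule_parameters):
        """Build a schedule module for the given parameters."""

    def concretize(self, **schedule_parameters) -> "ConcreteWorkload":
        payload = self.payload_module()
        schedule = self.schedule_module(**schedule_parameters)
        # The schedule would be applied and the payload lowered here; only
        # the result is suitable for creating an execution engine.
        return ConcreteWorkload(payload, schedule)


class ConcreteWorkload:
    """Lowered payload IR plus target-specific info; ready for execution."""

    def __init__(self, lowered_payload, schedule):
        self.lowered_payload = lowered_payload
        self.schedule = schedule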

@rolfmorel rolfmorel Dec 5, 2025

Thank you for explaining!

Indeed, there appears to be this "phase change" in interacting with Workloads: only after lowering do certain actions, like executing, become available. It would be nice if we modelled this in the code somehow. I agree that having separate classes would work (I don't have much of a preference for the mechanism at the moment; e.g. making methods take arguments of a type that is only produced by methods that must have run earlier is a fine mechanism as well).
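
As a rough illustration of that second mechanism (hypothetical names, not part of this PR), execution could require a value that only the lowering step produces:

# Hypothetical sketch; types and function names are invented for illustration.
from dataclasses import dataclass


@dataclass
class LoweredPayload:
    # Wraps a fully lowered payload module; only lower() constructs this.
    module: object


def lower(workload, schedule_parameters) -> LoweredPayload:
    # Apply the schedule and lower the payload IR, then wrap the result.
    ...


def execute(lowered: LoweredPayload, inputs):
    # The argument type makes it clear that lowering must have happened
    # before execution is possible.
    ...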

pass

@abstractmethod
def get_complexity(self) -> list:
Contributor

If the different "dimensions" of the returned thing are known, and have sensible names, I would go with a NamedTuple.

Contributor Author

Changed to tuple[int, int, int]
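
For reference, a minimal sketch of the NamedTuple alternative mentioned above; the field names here are invented for illustration and may not match the actual complexity dimensions.

from typing import NamedTuple


class Complexity(NamedTuple):
    flops: int          # floating-point operations per kernel invocation
    bytes_read: int     # bytes read from memory
    bytes_written: int  # bytes written to memory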

Comment on lines 141 to 143
if dump_kernel == "bufferized":
    transform.YieldOp()
    return schedule_module
Contributor

Do we want this in the committed version?

An alternative to consider is inserting transform.print %mod if some verbosity flag has been set.

Contributor Author

This PR has the dump_payload and dump_schedule arguments in the Workload class. As such we should demonstrate their use in the examples. We could also drop this feature for now, but personally I find this feature useful, e.g. in the xegpu matmul example.

Contributor

I agree this kind of functionality is very useful while developing! I am not sure if these arguments are necessary on the most general version of Workload. Though as above, let's iterate in-tree. That is, it's up to follow-up PRs to demonstrate that getting rid of such arguments still gives us all the functionality we care about.

@tkarna tkarna marked this pull request as ready for review December 4, 2025 14:41

@rolfmorel rolfmorel left a comment


Have left a couple more medium-level comments.

We are getting close to this being ready, IMO.

return execution_engine


def execute(
Contributor

Could we move execute and the other functions taking a Workload to lighthouse.workload (creating a folder if needed)?

To me these functions seem closely tied to the Workload interface (and as such have a bit of an opinionated way of interacting with schedules, e.g. how schedule_parameters are to be taken).

raise ValueError("Benchmark verification failed.")


def emit_benchmark_function(
Contributor

This function seems to be largely independent of the workload argument.

I would probably still prefer it in lighthouse.workload over lighthouse.utils though.

benchmark.func_op.attributes["llvm.emit_c_interface"] = ir.UnitAttr.get()


def benchmark(
Contributor

This feels like it is implementing a little "protocol" for how to interact with Workload objects. Given its close tie to Workload, we could add it as a method on Workload. We could even mark that method @final: https://peps.python.org/pep-0591/#id1

Same goes for execute, I think.

(Sorry if I am repeating myself from a couple weeks back.)
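
A minimal sketch of the PEP 591 idea, assuming benchmark were promoted to a method; names are illustrative, not the interface in this PR.

from abc import ABC
from typing import final


class Workload(ABC):
    @final
    def benchmark(self, execution_engine, nruns: int = 10):
        # A fixed "protocol" for driving any Workload; subclasses customize
        # behaviour through the abstract methods this calls, not by
        # overriding benchmark itself (type checkers flag such overrides).
        ...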

Contributor

For me personally, such protocol-implementing functions also help me understand the different parts of an interface, e.g. which purpose(s) they serve. Having them close to the definition of the interface could help people (like me).

Contributor Author

We could promote the benchmark and execute functions to the Workload class. For the time being I've added them as "examples" of common utility functions because we don't know if these two methods are actually sufficient for all use cases. If not, the developer can just implement their own version of the execute/benchmark function and use it instead (execute is probably always fairly similar, though).

Also semantically, the workload definition and the way we want to, say, benchmark it seem somewhat orthogonal to me.

print(schedule_module)
return payload_module

@abstractmethod
Contributor

If this is an abstractmethod, do we need a default implementation? If the default implementation is useful, i.e. subclasses would sometimes just use it as is, then we can remove @abstractmethod, I think.

Contributor Author

Yes, we do not need an implementation here; I've included the dummy code just for the comments. I can remove it.
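
For the record, an abstract method can keep its explanatory text without any dummy code by carrying only a docstring; the method name below is illustrative.

from abc import ABC, abstractmethod


class Workload(ABC):
    @abstractmethod
    def get_payload_module(self):
        """Return a freshly built payload module for this workload."""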

func.FuncOp("rtclock", ((), (f64_t,)), visibility="private")
# emit benchmark function
time_memref_t = ir.MemRefType.get((nruns,), f64_t)
args = payload_arguments + [time_memref_t]
Contributor

An alternative to modifying the signature is to add a memref.global @time_deltas : memref<10xf64> = dense<0.000000e+00> alongside the benchmark function and just load and store from/to there. In Python you should be able to use https://numpy.org/doc/stable/reference/routines.ctypeslib.html to get a numpy array backed by this global, so that you can do your means etc easily.
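
A rough sketch of the read-back side of this suggestion; whether the lowered @time_deltas global is visible as a data symbol, and the raw_lookup call used here, are assumptions rather than verified API.

import ctypes

import numpy as np

NRUNS = 10  # must match the size of the @time_deltas global

# execution_engine: an mlir.execution_engine.ExecutionEngine with the
# lowered module loaded (assumed to be in scope).
# Assumption: the engine can return the raw address of the global's storage.
addr = execution_engine.raw_lookup("time_deltas")
ctypes_buf = (ctypes.c_double * NRUNS).from_address(addr)

# NumPy view backed by the global buffer; no copy is made.
times = np.ctypeslib.as_array(ctypes_buf)
print(times.mean(), times.min())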

Contributor Author

So the benefit of using a global buffer is to avoid pre-allocating the buffer to the expected size with numpy? The global buffer will live as long as the execution engine is alive?

Contributor

To me the primary benefit is that the function can have the same signature as the benchmarked function. The time tracking becomes a side-effect of the function, meaning that, for their functional behaviour, we could interchange them.

I believe the global buffer would indeed live as long as the execution engine (i.e. as long as the shared object remains loaded). Might not be great if we were to decide we want to run parallel execution within the same process, though that doesn't sound great for benchmarking anyway.

Comment on lines +112 to +113
def payload(*args):
    A, B, C = args
Contributor

Suggested change:
-def payload(*args):
-    A, B, C = args
+def payload(A, B, C):

Super nit 😄

min_cst = arith.constant(f64_t, min)
max_cst = arith.constant(f64_t, max)
seed_cst = arith.constant(i32_t, seed)
linalg.fill_rng_2d(min_cst, max_cst, seed_cst, outs=[buffer])
Contributor

Nice! Didn't know this existed. Wonder why it's just "2d".


@abstractmethod
@contextmanager
def allocate_inputs(self, execution_engine: ExecutionEngine):

@rolfmorel rolfmorel Dec 5, 2025

It's debatable whether this method should be on the most generic version of Workload. That is, another conceptually simple way of implementing workloads is to represent the benchmark fully in IR, where the payload contains the kernel plus a main function that calls it and is also responsible for setting up the inputs. In that case this method is extraneous.

When we start adding example workloads like that, we can have a look at whether we want a more generic version of the Workload interface alongside more concrete ones for particular ways of implementing workloads (e.g. one where the kernel-calling logic is implemented as Python methods on Workload and another where the calling logic resides fully in IR).

Contributor Author

Yes, anticipating all the possible kinds of workloads is kind of hard. For the generic case, however, we do need to assume that the payload function takes input memrefs and that we need to be able to handle user-defined allocation and deallocation. For the latter, a context manager seems like a good choice for the generic interface, e.g. in the runner, as shown in this PR.

Then it's a question of defining a nice interface that would provide a context manager for the runner (even if a no-op) and avoid having the user write a lot of unnecessary boilerplate for the simple workloads. One option could be to add a default context manager that calls allocate/deallocate functions that are no-ops by default. The user would then only need to implement the alloc/dealloc functions if actually needed. In addition we would probably also need something like get_input_arrays to get the memrefs for the payload function. I had something like this in an earlier version.
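
A minimal sketch of that default-context-manager idea; the method names are placeholders, not the interface in this PR.

from abc import ABC
from contextlib import contextmanager


class Workload(ABC):
    def allocate(self, execution_engine) -> list:
        # No-op by default; override when MLIR-side allocation is needed.
        return []

    def deallocate(self, execution_engine, buffers) -> None:
        # No-op by default; override to free whatever allocate() created.
        pass

    @contextmanager
    def allocate_inputs(self, execution_engine):
        buffers = self.allocate(execution_engine)
        try:
            yield buffers
        finally:
            self.deallocate(execution_engine, buffers)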
