Added method for prompt generation from raw data #272

Open

alright-code wants to merge 4 commits into main from dev/unstructured-translation

Conversation

@alright-code

Two-phase approach for prompt generation:

  1. Extract the relevant information from the raw dataset
  2. Simplify the extracted entries into a concise list

phase2_direct_gen.yaml shows how this can be configured for the medical KDMA; a rough sketch of the two phases follows below.
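
A minimal sketch of the two-phase flow described above. The function name, the `inference_engine.generate` call, and the prompt wiring are illustrative assumptions, not the PR's actual API:

```python
def generate_prompt(raw_dataset, inference_engine,
                    extract_prompt, simplify_prompt):
    # Phase 1: pull the relevant information out of each raw record
    extracted = [
        inference_engine.generate(f"{extract_prompt}\n\n{record}")
        for record in raw_dataset
    ]
    # Phase 2: condense the extracted entries into a concise list
    joined = "\n".join(extracted)
    return inference_engine.generate(f"{simplify_prompt}\n\n{joined}")
```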

extract_cacher = ub.Cacher(
    "generated_system_prompt_template_extract",
    depends=self._cache_repr(self.dataset, self.extract_prompt),
    enabled=self.enable_caching,
)
Contributor
Oh interesting. So this enabled flag turns the caching on or off? Didn't know that was a feature; it could simplify the caching code I have elsewhere.
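
For reference, a minimal sketch of how ubelt's `Cacher` behaves with the `enabled` flag (the function and cache names here are illustrative, not from the PR):

```python
import ubelt as ub

def expensive_extraction():
    # Stand-in for the real extraction step
    return {'entries': ['...']}

# When enabled=False, tryload() always returns None and save() is a
# no-op, so the computation simply runs fresh every time.
cacher = ub.Cacher(
    'demo_cache',
    depends='some-hash-of-the-inputs',  # cache key; recompute when it changes
    enabled=True,
)
data = cacher.tryload()
if data is None:
    data = expensive_extraction()
    cacher.save(data)
```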

dialog.insert(0, DialogElement(
    role='system',
    content=self.per_attribute_templates[attribute]['system_prompt']))
system_prompt_template = self.per_attribute_templates[attribute].get('system_prompt_template')
Contributor

Hmm, I think this change (expecting a system_prompt_template instead of system_prompt) might break some existing configs. Would much prefer to support both as we do with the baseline component here: https://github.com/ITM-Kitware/align-system/blob/main/align_system/algorithms/outlines_baseline_adm_component.py#L74-L86
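
A sketch of the suggested fallback, modeled loosely on the linked baseline component (the exact attribute and key names are assumptions):

```python
templates = self.per_attribute_templates[attribute]
# Prefer the new callable template, but fall back to the legacy
# plain-string 'system_prompt' key so existing configs keep working.
system_prompt_template = templates.get('system_prompt_template')
if system_prompt_template is not None:
    system_prompt = call_with_coerced_args(
        system_prompt_template,
        {'model': self.structured_inference_engine.model})
else:
    system_prompt = templates['system_prompt']
```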

if callable(system_prompt_template):
    system_prompt = call_with_coerced_args(
        system_prompt_template,
        {'model': self.structured_inference_engine.model})
Contributor

Wondering why we aren't initializing this component with structured_inference_engine and are instead opting to pass in the model and create a new generator (looks like we're just passing in structured_inference_engine.model anyway). Are there parameters that need to be different, or are we otherwise invoking this differently in a way that makes this necessary? The main drawback of doing it this way is that we can't freely swap out the inference engine backend.
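
For illustration, the kind of constructor injection being suggested here; the class name and the `run_inference` method are hypothetical placeholders, not the actual align-system API:

```python
class RawDataPromptGenerator:
    def __init__(self, structured_inference_engine):
        # Keep a reference to the whole engine rather than engine.model,
        # so a different backend can be dropped in without code changes.
        self.structured_inference_engine = structured_inference_engine

    def generate(self, prompt):
        # Delegate to the injected engine's own generation entry point
        # (method name is an assumption).
        return self.structured_inference_engine.run_inference(prompt)
```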

Contributor

This file probably makes more sense to go into the prompt_engineering directory (instead of algorithms)
