Skip to content

Conversation

@lyrixx
Copy link
Member

@lyrixx lyrixx commented Nov 26, 2025

Q A
Bug fix? no
New feature? yes
Docs? no
Issues Fix #911 ; fix #875
License MIT

Life is always easier without storing state in a service

@carsonbot carsonbot added Feature New feature Store Issues & PRs about the AI Store component Status: Needs Review labels Nov 26, 2025
@lyrixx lyrixx force-pushed the store-indexer-no-state branch from e11d79f to e2cdac7 Compare November 26, 2025 14:42
return (new PhpCsFixer\Config())
// @see https://github.com/PHP-CS-Fixer/PHP-CS-Fixer/pull/7777
->setParallelConfig(PhpCsFixer\Runner\Parallel\ParallelConfigFactory::detect())
->setUnsupportedPhpVersionAllowed(true)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using PHP8.5. This allow me to run php-cs-fixer.
Obviously, we must not use PHP8.5 features. The CI will catch us if we do :)

$platform = PlatformFactory::create(env('OPENAI_API_KEY'), http_client());
$vectorizer = new Vectorizer($platform, $embeddings = 'text-embedding-3-small');
$indexer = new Indexer(new InMemoryLoader($documents), $vectorizer, $store, logger: logger());
$indexer->index($documents);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$documents, are already set in the Loader. No need to repeat it here.

@lyrixx lyrixx force-pushed the store-indexer-no-state branch 2 times, most recently from 645485a to 2bf1345 Compare December 1, 2025 08:28
Life is always easier without storing state in a service
@lyrixx lyrixx force-pushed the store-indexer-no-state branch from 2bf1345 to e56e022 Compare December 1, 2025 08:32
@chr-hertel
Copy link
Member

Yes, i agree on the issue with the statefulness, but the slim configuration is quite nice and methods names like loadAndIndex turn me down as well.

what do you think about a concept of an abstract Source as object with different implementations and factory methods?

Indexer::index(SourceInterface $source, array $options = []): void

the indexer could even know a set of loaders with:

interface LoaderInterface
{
    public function supports(SourceInterface $source): bool;
    public function createSource(string|string[] $source): SourceInterface;
    public function load(SourceInterface $source, array $options = []): EmbeddableDocumentInterface[];
}

we could have

  • class RssFeed implements SourceInterface
  • class DocumentCollection implements SourceInterface <= enabling to hand over documents
  • class Url implements SourceInterface
  • class Folder implements SourceInterface
  • class TextFile implements SourceInterface
  • class Pdf implements SourceInterface
  • ...

the createSource might be helpful to keep that slim configuration alive.

would that solve our concerns here? @lyrixx @OskarStark

@lyrixx lyrixx closed this Dec 10, 2025
@lyrixx lyrixx deleted the store-indexer-no-state branch December 10, 2025 13:47
@lyrixx lyrixx restored the store-indexer-no-state branch December 10, 2025 13:47
@lyrixx lyrixx reopened this Dec 10, 2025
@lyrixx
Copy link
Member Author

lyrixx commented Dec 10, 2025

What do you think about a concept of an abstract Source as object with different implementations and factory methods?

At first glance, it seems okay, but I fail to see how to implement that. It's too specific. We would implement Rss, Url, Folder, etc., but the list could be infinite, right? We don't want to support everything. In my humble opinion, it lacks genericness.

Then, to me, Rss, Url, or Source are data objects, not stateless services. The naming is confusing.

Moreover, I'm not a fan of the Strategy Design Pattern. You have to inject a bunch of Loaders, and call supports() on each of them. Performance is low. We are fighting this pattern so much in symfony/serializer to avoid performance issues!


I understand loadAndIndex() is not perfect. But it shows a weakness in the design. This class does too much. It should be split:

  • One will take documents, and index them.
  • Another one will take a loader and the new class, and load + index.

So I think the first, small step, is to break this class in two:

  • The diff will be very small.
  • The API will stay the same.
  • I (and others) will be able to re-use the indexer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature New feature Status: Needs Review Store Issues & PRs about the AI Store component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Store] Indexer is nice, but hard to use directly

4 participants