Skip to content

Fix Jindo DataLoad without target for multi mount Datasets#5631

Draft
Nakshatra480 wants to merge 2 commits intofluid-cloudnative:masterfrom
Nakshatra480:fix-jindo-dataload-multi-mount
Draft

Fix Jindo DataLoad without target for multi mount Datasets#5631
Nakshatra480 wants to merge 2 commits intofluid-cloudnative:masterfrom
Nakshatra480:fix-jindo-dataload-multi-mount

Conversation

@Nakshatra480
Copy link

Closes #4439

Summary

This PR fixes Jindo DataLoad behavior when spec.target is not set but the target Dataset has multiple mounts.

  • If DataLoad.spec.target is specified, behavior is unchanged.
  • If DataLoad.spec.target is empty, the controller now derives targetPaths from Dataset.spec.mounts (one entry per mount path, replicas default to 1).
  • Adds a unit test to cover the no target, multi mount case for genDataLoadValue.

This addresses the issue where a JindoRuntime DataLoad could fail when no target was provided for a dataset with multiple mount points (see issue #4439).

Implementation details

  • Updated JindoEngine.genDataLoadValue to:

    • Keep existing logic for explicit spec.target.
    • Fallback to iterating targetDataset.Spec.Mounts when spec.target is empty, building cdataload.TargetPath entries with:
      • Path from the mount Path
      • Replicas set to 1
      • FluidNative decided via utils.IsTargetPathUnderFluidNativeMounts.
  • Extended Test_genDataLoadValue to include:

    • "dataset with multiple mounts and no explicit target":
      • Verifies that two mount paths (/mnt0, /mnt1) are translated into TargetPaths with correct Replicas and FluidNative values.
      • Confirms options such as loadMemorydata and hdfsConfig are still set as before.

Why this change

For JindoRuntime, users reasonably expect a DataLoad without spec.target to preload all dataset mounts. Previously, this scenario was not handled explicitly, leading to failures when the dataset had multiple mount points. By falling back to dataset mounts, we align the controller behavior with user expectations and the other runtimes’ patterns.

Testing

  • Unit tests:
    • go test ./pkg/ddc/jindo -run Test_genDataLoadValue$
  • All existing Test_genDataLoadValue cases continue to pass.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @Nakshatra480, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the JindoRuntime's data loading capabilities by introducing a more robust handling mechanism for datasets with multiple mount points when no explicit target is specified in the DataLoad configuration. Previously, such configurations could lead to failures; now, the system intelligently infers the intended target paths from the dataset's mounts, aligning its behavior with user expectations and other runtime patterns. This change improves the reliability and user-friendliness of data loading for multi-mount Jindo datasets.

Highlights

  • Jindo DataLoad Fix: Resolved an issue where Jindo DataLoad operations would fail when spec.target was not explicitly set, but the target Dataset had multiple mount points.
  • Automatic Target Path Derivation: When DataLoad.spec.target is empty, the controller now automatically derives targetPaths from the Dataset.spec.mounts, creating one entry per mount path with replicas defaulting to 1.
  • Unit Test Coverage: A new unit test has been added to cover the specific scenario of a DataLoad without a target on a dataset with multiple mounts, ensuring correct behavior for genDataLoadValue.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes an issue where a Jindo DataLoad would fail if no target paths were specified for a dataset with multiple mounts. The logic now correctly falls back to using the dataset's mount points, with a default replica count of 1. The implementation is clear and the addition of a unit test for the multi-mount scenario is a great improvement. I've added one suggestion to further enhance test coverage by including an edge case.

Signed-off-by: Nakshatra Sharma <nakshatra.sharma3012@gmail.com>
@Nakshatra480 Nakshatra480 force-pushed the fix-jindo-dataload-multi-mount branch from 6f7a6bf to 406ea8d Compare January 29, 2026 09:48
Signed-off-by: Nakshatra Sharma <nakshatrasharma@Nakshatras-MacBook-Air.local>
@sonarqubecloud
Copy link

@Nakshatra480
Copy link
Author

@cheyang pls review these changes and give me the feedback if anything comes up thanks

@codecov
Copy link

codecov bot commented Feb 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.10%. Comparing base (e807509) to head (4dd26ce).
⚠️ Report is 16 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5631   +/-   ##
=======================================
  Coverage   59.10%   59.10%           
=======================================
  Files         444      444           
  Lines       30538    30538           
=======================================
  Hits        18049    18049           
  Misses      10992    10992           
  Partials     1497     1497           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug in JindoRuntime where DataLoad operations would fail when no explicit spec.target was provided for datasets with multiple mount points. The fix adds fallback logic to derive target paths from the dataset's mounts when no explicit targets are specified.

Changes:

  • Added conditional logic to fallback to dataset mounts when DataLoad.spec.target is empty
  • Added unit test coverage for datasets with multiple mounts and no explicit target

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
pkg/ddc/jindo/load_data.go Implements fallback logic to use dataset mounts as target paths when no explicit target is specified in DataLoad
pkg/ddc/jindo/load_data_test.go Adds two new test cases covering datasets without mounts and datasets with multiple mounts when no explicit target is provided

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +174 to +179
if m.Path == "" {
continue
}
fluidNative := utils.IsTargetPathUnderFluidNativeMounts(m.Path, *targetDataset)
targetPaths = append(targetPaths, cdataload.TargetPath{
Path: m.Path,
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When mount.Path is empty, the code skips that mount entirely. According to the UFSPathBuilder.GenUFSPathInUnifiedNamespace convention, if mount.Path is not set, the path should default to /{mount.Name}. Consider using GenUFSPathInUnifiedNamespace to get the correct path instead of skipping mounts with empty paths, to ensure consistency with how mount paths are handled elsewhere in the codebase.

Suggested change
if m.Path == "" {
continue
}
fluidNative := utils.IsTargetPathUnderFluidNativeMounts(m.Path, *targetDataset)
targetPaths = append(targetPaths, cdataload.TargetPath{
Path: m.Path,
mountPath := m.Path
if mountPath == "" {
mountPath = "/" + m.Name
}
fluidNative := utils.IsTargetPathUnderFluidNativeMounts(mountPath, *targetDataset)
targetPaths = append(targetPaths, cdataload.TargetPath{
Path: mountPath,

Copilot uses AI. Check for mistakes.
Comment on lines +172 to 184
} else {
for _, m := range targetDataset.Spec.Mounts {
if m.Path == "" {
continue
}
fluidNative := utils.IsTargetPathUnderFluidNativeMounts(m.Path, *targetDataset)
targetPaths = append(targetPaths, cdataload.TargetPath{
Path: m.Path,
Replicas: 1,
FluidNative: fluidNative,
})
}
}
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same issue likely exists in JindoCache and JindoFsx runtimes. Both pkg/ddc/jindocache/load_data.go and pkg/ddc/jindofsx/load_data.go iterate directly over dataload.Spec.Target without handling the empty target case. Consider applying this same fix to those runtimes for consistency and to prevent the same bug from affecting users of those runtime types.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] JindoRuntime Dataload failed without specifying the target field for multiple mountpoints in Dataset

1 participant