@Shekharrajak
@Shekharrajak Shekharrajak commented Nov 13, 2025

Contributor

@comphead comphead left a comment

Thanks @Shekharrajak for your contribution. Please add a function to the fuzz testing kit, similar to #2755.

@mbutrovich
Contributor

mbutrovich commented Nov 13, 2025

In the past I think we've encountered differences in Java and Rust's regex engines wrt graphemes. Could we get some larger UTF-8 characters in the tests?
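
To make the request above concrete, here is a minimal sketch of what "larger UTF-8 characters" could look like as test inputs, assuming the phrase means multi-byte sequences. It uses only the Rust standard library and a literal delimiter, so it does not exercise any regex engine. The names `Lengths`, `lengths`, and `split_literal` and the sample strings are hypothetical and not taken from the PR. It shows that the same string has different lengths in UTF-8 bytes (Rust's view), UTF-16 code units (the JVM's view), and Unicode scalar values, which is one place where the two sides can disagree.

```rust
/// Summarizes how one string is measured under three different units.
#[derive(Debug, PartialEq)]
struct Lengths {
    utf8_bytes: usize,    // what Rust's `str::len` reports
    utf16_units: usize,   // what Java's `String.length()` reports
    scalar_values: usize, // Unicode scalar values (`char`s in Rust)
}

fn lengths(s: &str) -> Lengths {
    Lengths {
        utf8_bytes: s.len(),
        utf16_units: s.encode_utf16().count(),
        scalar_values: s.chars().count(),
    }
}

/// Splits on a literal delimiter; no regex engine is involved here.
fn split_literal<'a>(input: &'a str, delimiter: &str) -> Vec<&'a str> {
    input.split(delimiter).collect()
}

fn main() {
    // Hypothetical test inputs covering 2-, 3-, and 4-byte UTF-8 sequences,
    // plus a combining sequence (one grapheme built from two scalar values).
    let samples = ["é,ñ", "日本,語", "😀,🎉", "e\u{301},a"];
    for &s in samples.iter() {
        println!("{:?} -> {:?} {:?}", s, split_literal(s, ","), lengths(s));
    }
}
```

Inputs like these would still need to be run through both Spark and the native implementation with real regex patterns to surface any engine differences; the sketch only illustrates the kind of data involved.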

@andygrove
Member

> In the past I think we've encountered differences in Java and Rust's regex engines wrt graphemes. Could we get some larger UTF-8 characters in the tests?

We probably need to fall back to Spark unless this config is enabled:

  val COMET_REGEXP_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
    conf("spark.comet.regexp.allowIncompatible")
      .category(CATEGORY_EXEC)
      .doc("Comet is not currently fully compatible with Spark for all regular expressions. " +
        s"Set this config to true to allow them anyway. $COMPAT_GUIDE.")
      .booleanConf
      .createWithDefault(false)
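
The suggestion above amounts to a simple gate: use the native path only when the user has opted in, otherwise leave the expression to Spark. The real check would live on the Scala side next to the config entry quoted above; the following is only a language-neutral sketch of that decision, written in Rust. `Placement` and `place_regex_expr` are hypothetical names and not part of Comet.

```rust
/// Where a regex-based expression should run (hypothetical type).
#[derive(Debug, PartialEq)]
enum Placement {
    Native,
    FallBackToSpark(&'static str),
}

/// Hypothetical gate: run natively only when the user has opted in
/// via `spark.comet.regexp.allowIncompatible`.
fn place_regex_expr(allow_incompatible: bool) -> Placement {
    if allow_incompatible {
        Placement::Native
    } else {
        // The reason string is illustrative; it is not Comet's actual message.
        Placement::FallBackToSpark("spark.comet.regexp.allowIncompatible is false")
    }
}

fn main() {
    // With the default (false), the expression stays on the Spark side.
    println!("{:?}", place_regex_expr(false));
    println!("{:?}", place_regex_expr(true));
}
```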

@Shekharrajak
Author

> Thanks @Shekharrajak for your contribution, please add a function to the fuzz testing kit, similar to #2755

Thanks! Added in commit 8eddd29

@Shekharrajak
Author

> In the past I think we've encountered differences in Java and Rust's regex engines wrt graphemes. Could we get some larger UTF-8 characters in the tests?

Added tests in commit 987b646.

@Shekharrajak
Author

> > In the past I think we've encountered differences in Java and Rust's regex engines wrt graphemes. Could we get some larger UTF-8 characters in the tests?
>
> We probably need to fall back to Spark unless this config is enabled:
>
>   val COMET_REGEXP_ALLOW_INCOMPATIBLE: ConfigEntry[Boolean] =
>     conf("spark.comet.regexp.allowIncompatible")
>       .category(CATEGORY_EXEC)
>       .doc("Comet is not currently fully compatible with Spark for all regular expressions. " +
>         s"Set this config to true to allow them anyway. $COMPAT_GUIDE.")
>       .booleanConf
>       .createWithDefault(false)

@andygrove, how can we verify that the expression is not falling back to Spark's JVM execution?

@wForget changed the title from "Support for StringSplit" to "feat: Support for StringSplit" on Nov 17, 2025
@Shekharrajak force-pushed the feature/add-string-split-support branch from dbb34d5 to 1f8f2b2 on November 17, 2025 18:52
@codecov-commenter

codecov-commenter commented Nov 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.40%. Comparing base (f09f8af) to head (9d149fd).
⚠️ Report is 732 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2772      +/-   ##
============================================
- Coverage     56.12%   54.40%   -1.73%     
- Complexity      976     1444     +468     
============================================
  Files           119      167      +48     
  Lines         11743    15283    +3540     
  Branches       2251     2531     +280     
============================================
+ Hits           6591     8315    +1724     
- Misses         4012     5744    +1732     
- Partials       1140     1224      +84     

@kazuyukitanimura
Contributor

Thanks @Shekharrajak.
It looks like there are Rust check failures:
https://github.com/apache/datafusion-comet/actions/runs/19441578149/job/55638326879?pr=2772

Perhaps you can try cargo fix?

Development

Successfully merging this pull request may close these issues.

Add support for StringSplit

7 participants