@shah-siddd

Pull Request

Summary

Adds CSV format support to the CLI dataset loading and writing functionality. The CLI now automatically detects and handles both JSON and CSV datasets based on file extension, maintaining format consistency between input and output files.

Changes

  • Added CSV parsing with proper quote and comma handling (parseCsv, parseCsvLine)
  • Added CSV serialization with automatic quoting for special characters (serializeCsv, escapeCsvValue)
  • Added format detection based on file extension (detectDatasetFormat)
  • Refactored loadDataset to support both JSON and CSV formats
  • Refactored writeDataset to preserve input format in output files
  • Updated CLIHandler to use the new format-aware dataset functions
  • Added comprehensive unit tests for CSV parsing, serialization, and round-trip operations
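For readers without the diff open, extension-based detection could look roughly like the sketch below. Only the name `detectDatasetFormat` comes from this PR. The signature, the `DatasetFormat` type, and the fallback to JSON for unknown extensions are assumptions made for illustration.

```typescript
// Hypothetical sketch: the PR names detectDatasetFormat but its real
// signature and fallback behaviour are not shown in this description.
type DatasetFormat = "json" | "csv";

function detectDatasetFormat(filePath: string): DatasetFormat {
  // Compare case-insensitively so "DATA.CSV" and "data.csv" are treated alike.
  const lower = filePath.toLowerCase();
  if (lower.endsWith(".csv")) {
    return "csv";
  }
  // Assumption: anything else is treated as JSON, the previously supported format.
  return "json";
}
```

Under this sketch, `detectDatasetFormat("runs/input.CSV")` yields `"csv"`, and the same value could be reused when choosing the output serializer so that input and output formats match.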

Context

Previously, the CLI supported only JSON datasets. This change lets users work with CSV datasets, which are common in data science workflows. The implementation follows RFC 4180 for edge cases such as quoted values, commas within fields, and escaped quotes.
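The quoting rules mentioned above can be illustrated with a minimal single-line parser. This is a sketch, not the code in this PR: the name `parseCsvLine` is taken from the change list, but the signature and body are assumed. It works one line at a time, so a quoted field that contains a newline would need handling in whatever splits the input into lines.

```typescript
// Hypothetical sketch of a line-level CSV parser in the style of RFC 4180.
function parseCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') {
          // A doubled quote inside a quoted field is a literal quote.
          current += '"';
          i++;
        } else {
          // A single quote closes the quoted field.
          inQuotes = false;
        }
      } else {
        // Commas inside quotes are ordinary characters.
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      // An unquoted comma ends the current field.
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  // Push the final field (an empty line yields one empty field).
  fields.push(current);
  return fields;
}
```

For example, `parseCsvLine('a,"b,c","say ""hi"""')` returns the three fields `a`, `b,c`, and `say "hi"`.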

Testing

  • Unit tests
  • Manual testing
  • Postman CI/CD
  • Other (please specify)

Test Coverage:

  • JSON dataset loading and writing
  • CSV dataset loading and writing
  • CSV parsing with quoted values and commas
  • CSV serialization with proper escaping
  • Round-trip CSV serialization/parsing

Monitoring

  • No expected impact
  • Added/updated relevant monitoring (Sentry alerts, logs, dashboards)

Notes

  • CSV values are automatically quoted when they contain commas, quotes, or newlines
  • CSV parsing handles escaped quotes ("") correctly
  • Output format matches input format (CSV input → CSV output, JSON input → JSON output)
  • All CSV parsing/serialization is implemented without external dependencies
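As a rough illustration of the quoting behaviour described in these notes, a dependency-free serializer might look like the following. The names `escapeCsvValue` and `serializeCsv` come from the change list. The `Row` type, the signatures, the choice to take headers from the first row, and the `\n` line separator are all assumptions.

```typescript
// Hypothetical row shape: the PR does not show how dataset rows are typed.
type Row = Record<string, string>;

function escapeCsvValue(value: string): string {
  // Quote only when the value contains a comma, a quote, or a line break.
  if (/[",\r\n]/.test(value)) {
    // Embedded quotes are escaped by doubling them.
    return '"' + value.replace(/"/g, '""') + '"';
  }
  return value;
}

function serializeCsv(rows: Row[]): string {
  if (rows.length === 0) {
    return "";
  }
  // Assumption: column order is taken from the keys of the first row.
  const headers = Object.keys(rows[0]);
  const lines = [headers.map(escapeCsvValue).join(",")];
  for (const row of rows) {
    // Missing keys are written as empty fields.
    lines.push(headers.map((h) => escapeCsvValue(row[h] ?? "")).join(","));
  }
  return lines.join("\n");
}
```

With this sketch, `escapeCsvValue('say "hi"')` produces `"say ""hi"""`, and plain values such as `x` are left unquoted.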
