1b2bee5 to 8e88f65
englehardt reviewed on Jun 30, 2020 (Contributor)
@vringar do you have a sample notebook I can use to check the outputs of these? I know we don't have tests (which is okay for now), but that would help me check whether everything is what I'd expect.
Contributor, Author
This function is used in the dataquality notebook on Databricks.
3220d0f to d8eb4bb
Downloading files via the SparkContext was much slower than downloading via boto (which is what S3Dataset does). Both classes now use the same download method, since PySparkS3Dataset inherits from S3Dataset.
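To illustrate the inheritance point, here is a minimal sketch. `BaseDataset`, `SparkBackedDataset`, `get_content`, and the `fetch` callable are hypothetical stand-ins, not the actual S3Dataset or PySparkS3Dataset API; the real classes download via boto, which is replaced here by an in-memory dictionary lookup.

```python
from typing import Callable, Optional


class BaseDataset:
    """Hypothetical stand-in for S3Dataset: owns the download logic."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        # `fetch` is an assumed placeholder for a boto-based download.
        self._fetch = fetch

    def get_content(self, key: str) -> bytes:
        return self._fetch(key)


class SparkBackedDataset(BaseDataset):
    """Hypothetical stand-in for PySparkS3Dataset."""

    def __init__(self, fetch: Callable[[str], bytes],
                 spark_context: Optional[object] = None) -> None:
        super().__init__(fetch)
        # The Spark context would serve table queries only; file downloads
        # go through the inherited get_content.
        self.spark_context = spark_context


fake_store = {"content/abc": b"hello"}
dataset = SparkBackedDataset(fake_store.__getitem__)
content = dataset.get_content("content/abc")
```

Because the subclass does not override `get_content`, any speed-up in the base class's download path applies to both classes.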
This parameter allows filtering out VisitIds that are part of `incompleted_visits` or that had a command with a command_status other than "ok", since users probably shouldn't consider them for analysis. The filtering functionality is extracted into the TableFilter class so that other Datasets can reuse it.
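The filtering rule described above could look roughly like the following sketch. It uses plain Python dictionaries instead of Spark tables, and `VisitFilter`, `clean`, and the row layout are assumptions made for illustration; the actual TableFilter interface may differ.

```python
from typing import Dict, Iterable, List, Set


class VisitFilter:
    """Hypothetical stand-in for the TableFilter idea described above."""

    def __init__(self, incomplete_visit_ids: Iterable[int],
                 crawl_history: Iterable[Dict]) -> None:
        # Visits with at least one command whose status is not "ok".
        failed = {
            row["visit_id"]
            for row in crawl_history
            if row["command_status"] != "ok"
        }
        # Exclude a visit if it is incomplete or had a failed command.
        self.excluded: Set[int] = set(incomplete_visit_ids) | failed

    def clean(self, rows: Iterable[Dict]) -> List[Dict]:
        """Keep only rows whose visit_id is not excluded."""
        return [row for row in rows if row["visit_id"] not in self.excluded]


crawl_history = [
    {"visit_id": 1, "command_status": "ok"},
    {"visit_id": 2, "command_status": "timeout"},
    {"visit_id": 3, "command_status": "ok"},
]
visit_filter = VisitFilter(incomplete_visit_ids=[3], crawl_history=crawl_history)

http_requests = [
    {"visit_id": 1, "url": "https://example.com/a"},
    {"visit_id": 2, "url": "https://example.com/b"},
    {"visit_id": 3, "url": "https://example.com/c"},
]
kept = visit_filter.clean(http_requests)
```

Building the excluded set once and applying it to any table with a `visit_id` column is what makes the filter reusable across datasets.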
cb8a25f to 00a3d47
00a3d47 to 247adea
This function should give the user a general overview of the crawl_history table and of what kind of data loss to expect.
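As a rough sketch of what such an overview might compute, the snippet below counts commands per command_status and reports the share of visits with at least one command that did not finish with "ok". The function name `summarize_crawl_history` and the dictionary row layout are assumptions for illustration; the function in this PR operates on the actual crawl_history table and may report different figures.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_crawl_history(crawl_history: Iterable[Dict]) -> Dict:
    """Summarize command outcomes and the share of affected visits."""
    rows = list(crawl_history)
    status_counts = Counter(row["command_status"] for row in rows)
    all_visits = {row["visit_id"] for row in rows}
    failed_visits = {
        row["visit_id"] for row in rows if row["command_status"] != "ok"
    }
    share = len(failed_visits) / len(all_visits) if all_visits else 0.0
    return {
        "status_counts": dict(status_counts),
        "total_visits": len(all_visits),
        "visits_with_failures": len(failed_visits),
        "failure_share": share,
    }


sample_history = [
    {"visit_id": 1, "command_status": "ok"},
    {"visit_id": 1, "command_status": "ok"},
    {"visit_id": 2, "command_status": "timeout"},
    {"visit_id": 3, "command_status": "ok"},
    {"visit_id": 4, "command_status": "error"},
]
summary = summarize_crawl_history(sample_history)
```

The share of visits with failures gives a first estimate of how much data a user would lose by dropping those visits from analysis.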