Detect gibberish copyright #2402 #4610
base: develop
@pombredanne after removing the tests that I linked above, we only fail these data-driven tests:
Signed-off-by: Jono Yang <[email protected]>
* Remove unnecessary tests Signed-off-by: Jono Yang <[email protected]>
f9e70b3 to f3fd656
@@ -0,0 +1,18 @@
about_resource: gibberish.py
I think this is fine provenance research!
The original from @rrenaud at https://github.com/rrenaud/Gibberish-Detector references a SO answer:
"This is a nice (IMO) answer to this guys question on stackoverflow." http://stackoverflow.com/questions/6297991/is-there-any-way-to-detect-strings-like-putjbtghguhjjjanika/6298040#comment-7360747
And the SO author is the same as the GH author: https://stackoverflow.com/users/286449/rob-neuhaus
So this settles the original license as MIT, per @rrenaud's choice.
Then we have this chain of forks and derivations to document:
- this file
- derived from @yapus https://github.com/yapus/gibberish
- derived from @vsobolmaven https://github.com/vsobolmaven/gibberish
- derived from @mgreenw https://github.com/mgreenw/gibberish
- derived from @pjanata https://github.com/pjanata/Gibberish-Detector
- ultimately derived from @rrenaud https://github.com/rrenaud/Gibberish-Detector
It would be nice, and the right thing to do, to keep the credits to each author in this chain of forks and refinements. I guess we could either:
- add that as extra doc in the ABOUT file
- OR get a full git history for these files with some git-fu and git filter-repo?
(The license has stayed MIT all the way so this is about credits, not the license itself)
Happy to help with this, if guided as to what to do.
This PR adds a gibberish detector to textcode to avoid processing nonsense copyright strings detected from binaries.
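For readers unfamiliar with the approach, here is a minimal sketch of a character-transition (Markov chain) gibberish detector in the spirit of the upstream Gibberish-Detector. It is not the code added by this PR: the function names, the tiny training corpus, the smoothing value, and the threshold are all hypothetical and chosen only for illustration.

```python
import math

# Assumption: only lowercase ASCII letters and the space are modelled.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def normalize(text):
    """Lowercase the text and drop every character outside ALPHABET."""
    return [ch for ch in text.lower() if ch in INDEX]


def train(corpus_lines, smoothing=1.0):
    """Return a matrix of log-probabilities, one per character transition."""
    n = len(ALPHABET)
    # Start every count at `smoothing` so unseen transitions keep a
    # small non-zero probability.
    counts = [[smoothing] * n for _ in range(n)]
    for line in corpus_lines:
        chars = normalize(line)
        for a, b in zip(chars, chars[1:]):
            counts[INDEX[a]][INDEX[b]] += 1
    log_probs = []
    for row in counts:
        total = sum(row)
        log_probs.append([math.log(c / total) for c in row])
    return log_probs


def avg_transition_prob(text, log_probs):
    """Geometric mean of the transition probabilities found in text."""
    chars = normalize(text)
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        # Nothing to score: treated as the lowest possible score.
        return 0.0
    total = sum(log_probs[INDEX[a]][INDEX[b]] for a, b in pairs)
    return math.exp(total / len(pairs))


def is_gibberish(text, log_probs, threshold):
    """True when the text scores below the (hypothetical) threshold."""
    return avg_transition_prob(text, log_probs) < threshold


# Hypothetical toy corpus; a real model would need far more text.
CORPUS = [
    "copyright the free software foundation",
    "all rights reserved",
    "permission is hereby granted free of charge",
    "the software is provided as is without warranty of any kind",
    "copyright holders and contributors",
]
MODEL = train(CORPUS)
THRESHOLD = 0.05  # hypothetical cut-off, tuned by hand for this toy corpus
```

With this toy model, `is_gibberish("zxqj qzxj", MODEL, THRESHOLD)` flags the string while `is_gibberish("copyright holders", MODEL, THRESHOLD)` does not. In practice the threshold would be derived from labelled good and bad samples rather than picked by hand.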