@martinreynaert provided the following examples:
<mre> veroor_zaakt#1#veroorzaakt#100000002#1#0.815385
<mre> veroor_zaakt_door#1#veroorzaakt_door#100000001#1#1
<mre> veroor#1#verloor#100000024#1#0.998869
The last entry is undesirable.
<mre> veroor_zaakt#1#veroorzaakt#100000002#1#0.815385
<mre> veroor_zaakt_door#1#veroorzaakt_door#100000001#1#1
<mre> zaakt_door#1#zaak_voor#100000001#2#1
<mre> zaakt#1#nazakt#100000000#2#0.998757
The last entry is undesirable.
<mre> verlaa_ten#1#verlaaten#100000010#1#0.984416
<mre> verlaa#1#verlaan#100000000#1#0.998726
Same here: the last entry is undesirable.
<mre> acobs_Nakomelingen#1#j_acobs_Nakomelingen#1#2#1
<mre> acobs#1#Jacobs#100000001#1#0.993398
<mre> j_acobs#1#Jacobs#100000001#1#0.977545
Here the second is undesirable.
This last one also illustrates why filtering out is not that easy.
It would be handy if this were a sequential process, but unfortunately it is not.
At the moment TICCL-rank processes its input and output in chunks, but we would have to change that and store all results so that the above cases can be filtered out afterwards.
A major change! It would consume more memory and be harder to handle multithreaded.
Some more investigation is needed.
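
To make the "store everything, filter afterwards" idea concrete, here is a minimal sketch in Python. It is not how TICCL-rank works today. The rule it applies is an assumption read off the examples above: drop a single-word record when its variant is also one `_`-separated part of a longer variant in the same result set. The function name `filter_ngram_parts` is hypothetical, and only the first `#`-separated field (the variant) is interpreted.

```python
def filter_ngram_parts(records):
    """Drop single-word records whose variant is a part of a longer n-gram variant.

    Assumption: each record is a '#'-separated line whose first field is the
    variant, and the words of an n-gram variant are joined by '_'.
    """
    variants = [line.split("#", 1)[0] for line in records]

    # Collect every part of every multi-word variant. This needs the full
    # result set, because the n-gram record may come before or after the
    # single-word record it invalidates.
    parts = set()
    for variant in variants:
        pieces = variant.split("_")
        if len(pieces) > 1:
            parts.update(pieces)

    # Keep all multi-word records; keep single-word records only when the
    # word does not occur as a part of some multi-word variant.
    return [
        line
        for line, variant in zip(records, variants)
        if "_" in variant or variant not in parts
    ]


records = [
    "acobs_Nakomelingen#1#j_acobs_Nakomelingen#1#2#1",
    "acobs#1#Jacobs#100000001#1#0.993398",
    "j_acobs#1#Jacobs#100000001#1#0.977545",
]
kept = filter_ngram_parts(records)
```

On the last example this keeps the `acobs_Nakomelingen` and `j_acobs` records and drops the `acobs` one. The sketch also shows the cost mentioned above: the set of parts is only complete once all records have been seen, so the filter cannot run per chunk.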