Some of the DB tables could have their PRIMARY KEYs changed to improve query performance.
For example the PRIMARY KEY of the obs_web table is this:
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/{cluster}/tables/ooni/obs_web_repl/{shard}', '{replica}')
PARTITION BY concat(substring(bucket_date, 1, 4), substring(bucket_date, 6, 2))
PRIMARY KEY (measurement_uid, observation_idx)
while the analysis table is this:
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/{cluster}/tables/ooni/analysis_web_measurement/{shard}', '{replica}')
PARTITION BY substring(measurement_uid, 1, 6)
PRIMARY KEY measurement_uid
ORDER BY (measurement_uid, measurement_start_time, probe_cc, probe_asn, domain)
What this means is that it's triggering very large scans (effectively a full scan), when performing very common filtering operations over the measurement_start_time, probe_cc, probe_asn and domain.
Unfortunately it's not possible to change these with an ALTER query, so instead we are going to have to create the new table and copy the data over.
Some of the DB tables could have their PRIMARY KEYs changed to improve query performance.
For example the PRIMARY KEY of the obs_web table is this:
while the analysis table is this:
What this means is that it's triggering very large scans (effectively a full scan), when performing very common filtering operations over the
measurement_start_time,probe_cc,probe_asnanddomain.Unfortunately it's not possible to change these with an ALTER query, so instead we are going to have to create the new table and copy the data over.