Greenplum secrets🎩

Secret 41 (Trust, but verify: the nuance of collecting statistics in partitions)
A mishap with the poll above gave birth to a new secret, for those who aren't fixated on reading documentation.

An unexpected quirk turned up in Greenplum 6:
gp_autostats_mode = on_no_stats does not trigger statistics collection on a partitioned table when the data is inserted through the table itself (the top-level parent). The documentation says:

For partitioned tables, automatic statistics collection is not triggered if data is inserted from the top-level parent table of a partitioned table.
 
But automatic statistics collection is triggered if data is inserted directly in a leaf table (where the data is stored) of the partitioned table.

This is precisely what led to my false conclusion: I was certain that the parameter above had already collected
stats in the partitions, when in fact I was comparing objects with stats against objects without them.
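
A check along these lines would have caught the problem before the comparison. It is only a rough sketch: the table public.sales is a hypothetical name used for illustration, and the query assumes the Greenplum 6 catalog views pg_partitions and pg_stats.

```sql
-- Hypothetical partitioned table: public.sales (not from the original post).

-- Confirm the current autostats mode for the session.
SHOW gp_autostats_mode;

-- For each child partition, count the columns that have collected statistics.
-- A count of 0 means the partition has no column stats yet.
SELECT p.partitiontablename,
       count(s.attname) AS columns_with_stats
FROM pg_partitions p
LEFT JOIN pg_stats s
       ON s.schemaname = p.partitionschemaname
      AND s.tablename  = p.partitiontablename
WHERE p.schemaname = 'public'
  AND p.tablename  = 'sales'
GROUP BY p.partitiontablename
ORDER BY p.partitiontablename;

-- If autostats did not fire (data was inserted through the parent table),
-- collect statistics explicitly before comparing plans or timings.
ANALYZE public.sales;
```

Running the SELECT before and after the ANALYZE shows whether the partitions actually had stats at the time of the test, instead of relying on what the parameter was expected to do.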

So the correct answer in the poll is option 2; the wisdom of the crowd has triumphed!
🔥Kudos to @alias cd='rm -rf ' for spotting the flaw in my logic, and to @Vasily Antonov for pointing out that
this behavior of gp_autostats_mode is documented, not a bug.

In the end, the gain from DISTINCT bisection was 36% based on two runs (calculations in the file below).
I'll consider patenting this optimization.
