Greenplum secrets🎩

Let's check whether the window function is always doomed to such a sad outcome.
We'll redistribute foo by the invalid_id key (synthetic data, for illustration):

-- Synthetic table: 1000 unique invalid_id values, hash-distributed by invalid_id
create table public.tst WITH (appendonly = true, orientation = column, compresstype = zstd, compresslevel = 1)
as
select generate_series(1, 1000) as invalid_id, 1 as version_id, random() as hash_diff
distributed by (invalid_id);
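
For a table that already exists, the same redistribution can be done in place instead of through CTAS. A minimal sketch, assuming foo is the table from the original query and has an invalid_id column (its definition is not shown in this post):

```sql
-- Assumption: foo exists and has an invalid_id column.
-- Rewrites the table so that rows are hash-distributed by invalid_id.
ALTER TABLE foo SET DISTRIBUTED BY (invalid_id);

-- Rough skew check: row counts per segment should be roughly even.
SELECT gp_segment_id, count(*) AS rows_on_segment
FROM foo
GROUP BY gp_segment_id
ORDER BY rows_on_segment DESC;
```

The skew check matters because a distribution key with few distinct values can remove the Motion only at the price of piling rows onto a handful of segments.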

We see that the Redistribute Motion is gone: with this distribution key, every segment already holds all the rows of each invalid_id partition, so the query runs locally:

explain analyze
SELECT row_number() OVER (PARTITION BY rdv_src.invalid_id ORDER BY rdv_src.version_id DESC) AS rdv_wf$vsn_rank,
       rdv_src.hash_diff  AS hash_diff,
       rdv_src.invalid_id AS invalid_id
FROM public.tst AS rdv_src;

Gather Motion 720:1  (slice1; segments: 720)  (cost=0.00..431.04 rows=1000 width=20) (actual time=0.094..381.777 rows=1000 loops=1)
  ->  Result  (cost=0.00..431.00 rows=2 width=20) (actual time=0.192..0.203 rows=6 loops=1)
        ->  WindowAgg  (cost=0.00..431.00 rows=2 width=20) (actual time=0.188..0.197 rows=6 loops=1)
              Partition By: invalid_id
              Order By: version_id
              ->  Sort  (cost=0.00..431.00 rows=2 width=16) (actual time=0.178..0.179 rows=6 loops=1)
                    Sort Key: invalid_id, version_id
                    Sort Method:  quicksort  Memory: 23760kB
                    ->  Seq Scan on tst  (cost=0.00..431.00 rows=2 width=16) (actual time=0.146..0.154 rows=6 loops=1)
Planning time: 13.253 ms
  (slice0)    Executor memory: 679K bytes.
  (slice1)    Executor memory: 420K bytes avg x 720 workers, 428K bytes max (seg0).  Work_mem: 33K bytes max.
Memory used:  229376kB
Optimizer: Pivotal Optimizer (GPORCA)
Execution time: 400.504 ms
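
To see the contrast for yourself, one option is to build a copy of the same data with a different distribution and compare the plans. This is only a sketch; tst_rnd is a hypothetical name that does not appear in the original post:

```sql
-- Hypothetical helper table: same rows as public.tst, but spread randomly
-- across segments, so rows of one invalid_id are not guaranteed to be co-located.
create table public.tst_rnd as
select * from public.tst
distributed randomly;

explain
select row_number() over (partition by invalid_id order by version_id desc) as vsn_rank,
       hash_diff,
       invalid_id
from public.tst_rnd;
```

With this copy the planner would be expected to put a Redistribute Motion on invalid_id below the WindowAgg, though the exact plan depends on the Greenplum version and optimizer settings.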

In the end, the original query cannot be cured: the problem has to be solved upstream, at the business-analysis level, because ranking by this technical field makes no sense. It's a bug!
But the moral of the story is that in GP everything is in your hands.
