Greenplum secrets🎩

The same secret as a picture, Valentine's edition :) All the best, everyone!



Secret 29 (Beautiful does not mean correct)

It's been a while since I last bored you with query plans! Here are some new ones.
When building a report, you often need to pass parameters into a query. Some people materialize them in a separate table and join it into the query, which is not as fast as hard-coding the values; others manage to turn a hard-code into a "work of art".
I came across one of these last week: the author wrapped the parameters in a subquery, apparently assuming it would perform just as well as explicitly hard-coded parameters. It's not that simple.

It turns out that, depending on the structure of the table we read the data from (here, whether it has a distribution key), you get either a Broadcast or a Redistribute, as we'll see below.
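For reference, the explicit hard-code the author was presumably aiming for might look like the sketch below. It is a hypothetical rewrite of the query shown further down (same test table, same single parameter row, with mdmId = 10 written as the text literal '10'), not something taken from the original report:

```sql
-- Hypothetical hard-coded variant: the parameters are written as literals
-- instead of being joined in from a VALUES subquery (CTE).
explain analyze
select 0::int8             as rn
      ,'2023-01-01'::date  as report_from_dt
      ,'2023-12-31'::date  as report_to_dt
      ,'2024-01-01'::date  as report_dt
      ,s0.*
from tst_1k_nk_dat_no_stat2 s0
where s0.n_txt = '10';  -- mdmId = 10, cast to text as in the CTE
```

With no join left to plan, there is nothing to broadcast or redistribute, so one would typically expect just a filtered Seq Scan under the final Gather Motion.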

As usual, let's create a synthetic table with no distribution key and no statistics:

create table tst_1k_nk_dat_no_stat2   WITH (appendonly=true,orientation=column,compresstype=zstd,compresslevel=1)
as select (generate_series(1,1000))::text n_txt;
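Before running the query, it can help to see how the table actually ended up laid out. A minimal sketch using Greenplum's gp_segment_id system column and the gp_distribution_policy catalog (its columns differ between Greenplum versions, hence the select *):

```sql
-- How are the 1000 rows spread across the segments?
select gp_segment_id, count(*) as rows_on_segment
from tst_1k_nk_dat_no_stat2
group by gp_segment_id
order by gp_segment_id;

-- Which distribution policy did CREATE TABLE AS end up with?
select localoid::regclass as table_name, *
from gp_distribution_policy
where localoid = 'tst_1k_nk_dat_no_stat2'::regclass;
```

Running the same two queries again after the alter table further down shows what setting the key changes.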

and run the query:

explain analyze
with data_batch as (
select rn::int8 rn, mdmId::text mdmId, report_dt::date report_dt, report_from_dt::date report_from_dt, report_to_dt::date report_to_dt
  from (
    values
    (0, 10, '2024-01-01', '2023-01-01', '2023-12-31')
     ) t(rn, mdmId, report_dt,report_from_dt, report_to_dt)
          )
select   b.rn
  ,b.report_from_dt
  ,b.report_to_dt
  ,b.report_dt
  ,s0.*
from tst_1k_nk_dat_no_stat2 s0
join data_batch b
on b.mdmId = s0.n_txt;

We got the dreaded Broadcast in the plan, and what gets replicated is not the single parameter row but the table's data:

Gather Motion 348:1  (slice2; segments: 348)  (cost=0.00..431.00 rows=1 width=28) (actual time=21.969..29.806 rows=1 loops=1)
  ->  Hash Join  (cost=0.00..431.00 rows=1 width=28) (actual time=18.992..20.031 rows=1 loops=1)
        Hash Cond: (((column2)::text) = n_txt)
        Extra Text: (seg152) Hash chain length 1.0 avg, 2 max, using 999 of 524288 buckets.
        ->  Result  (cost=0.00..0.00 rows=1 width=28) (actual time=0.341..0.341 rows=1 loops=1)
              ->  Result  (cost=0.00..0.00 rows=1 width=32) (actual time=0.267..0.267 rows=1 loops=1)
                    One-Time Filter: (gp_execution_segment() = 152)
                    ->  Result  (cost=0.00..0.00 rows=1 width=32) (actual time=0.041..0.041 rows=1 loops=1)
        ->  Hash  (cost=431.00..431.00 rows=1 width=8) (actual time=16.519..16.519 rows=1000 loops=1)
              ->  Broadcast Motion 348:348  (slice1; segments: 348)  (cost=0.00..431.00 rows=1 width=8) (actual time=0.048..16.214 rows=1000 loops=1)
                    ->  Seq Scan on tst_1k_nk_dat_no_stat2  (cost=0.00..431.00 rows=1 width=8) (actual time=0.093..0.098 rows=9 loops=1)
Planning time: 17.846 ms
  (slice0)    Executor memory: 447K bytes.
  (slice1)    Executor memory: 172K bytes avg x 348 workers, 172K bytes max (seg1).
  (slice2)    Executor memory: 4306K bytes avg x 348 workers, 4307K bytes max (seg0).  Work_mem: 32K bytes max.
Memory used:  311296kB
Optimizer: Pivotal Optimizer (GPORCA)
Execution time: 101.132 ms
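A back-of-the-envelope count of why this hurts, using only numbers from the plan above (348 segments, 1000 rows in the table, Hash receiving rows=1000 on a segment): every segment receives the whole table to build its hash table, whereas moving the single parameter row instead (a hypothetical alternative) would cost almost nothing:

```latex
\begin{aligned}
\text{Broadcast of the table:}\quad & 1000 \times 348 = 348\,000 \text{ rows received in total} \\
\text{Broadcast of the parameter row:}\quad & 1 \times 348 = 348 \text{ rows received in total}
\end{aligned}
```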

Let's reduce the entropy by setting a distribution key:

alter table tst_1k_nk_dat_no_stat2 set distributed by (n_txt);

Run the query again, and we get the optimal plan we were waiting for:
