Greenplum secrets🎩

Секрет 37 (Готовь сани летом, а телегу зимой) Неожиданно и случайно новый созрел секрет Поговорим про проактивность, ведь как известно 90% расходов на бюджет ИТ продукта приходится на ошибки проектирования. Бывший коллега по Ламоде, канал которого ("Инжиниринг…

And the reason? It's the same, the query doesn't have a src_cd cutoff, i.e. partition prunning which wouldn't be out of place for a partitioned big_tbl:

UPDATE big_tbl tgt
SET version_id = :1
FROM dlt
WHERE 1 = 1
AND (dlt.account_rk = tgt.account_rk)
AND tgt.src_cd IN (<prt_list>)

It remains to say that for UPDATE queries that have a partition cutoff, the time has been reduced from minutes to seconds.
📌
The moral is that if your target tables are not partitioned today, use a cutoff for the incoming set in your update queries if appropriate.
Tomorrow, they will be cut into sections, as was the case with our DW.

Greenplum secrets🎩

Секрет 1 (Проблема пустой таблицы)
Даже select из пустой таблицы с фильтром по тяжелой таблице может зависнуть, вызвав спилл (утечка памяти)

Secret 1 (Empty table surprise)
Even empty table query filtered by heavy table may run slow and create spill
--bad…

423 views21:19

Greenplum secrets🎩

#true MPP Секрет 38 (Случай сортировки без мастера) Я тут подумал, GP - это про BigData. А где много данных, там не обходится без статистики, которой до сих пор не уделялось должного внимания. В продолжение находки загорелся идеей проверить теорию автора…

Translation
#true MPP
Secret 38 (A case of sorting without a master mode)
I believe, conceptually GP is about BigData. And where there is a lot of data, there is no way around statistics,
which has not been given due attention until now at the channel.
Continuing with the discovery
I got the idea to test the author's theory in practice.

In general, the median, unlike the average, is often used for a more adequate interpretation of reality.
For example, the average salary of the working population of the Russian Federation in 2024 may differ greatly from the median due to the presence of oligarchy at the tail
of the distribution.
In other words, if out of 76 million people, 50% have a salary no higher than 50k RUR, then the median will not be higher than 50k,
with an average of 88k.

Statistical formulas, including the median, are also widely used in AI algorithms:
-- Outlier processing:
-- in time series processing and filtering
-- In image processing algorithms for smoothing without strong blurring of edges
etc.

Now to the point.

Let me remind you that calculating the median requires sorting the set.
For example, if you have a series of natural numbers from 1 to 1000 in your table, then the function

select percentile_disc(0.5) within group(order by n) as median from tst

will return 500, and its analogue percentile_cont(0.5) - 500.5 = the average between the upper boundary of the left set and the lower boundary of the right set.

I checked that the median calculation is indeed performed on the master by creating a set of 100 million numbers, repeating 100,000 times the first 1000:

create table tst
as
select generate_series(1, 1000) n;

insert into tst
select a.*
from tst a
join generate_series(1, 100000) on 1 = 1

I intentionally created it DISTRIBUTED RANDOMLY, explain why later.

Calculating the median gave me the plan:

explain analyze
select percentile_disc(0.5) within group(order by n) as median from tst

Aggregate (cost=3743.50..3743.51 rows=1 width=4) (actual time=73917.174..73917.175 rows=1 loops=1)
-> Gather Motion 372:1 (slice1; segments: 372) (cost=0.00..3741.00 rows=1000 width=4) (actual time=0.539..8346.740 rows=100001000 loops=1)
-> Seq Scan on tst (cost=0.00..11.00 rows=3 width=4) (actual time=0.118..18.218 rows=270117 loops=1)
Planning time: 0.876 ms
(slice0) Executor memory: 34184K bytes.
" (slice1) Executor memory: 218K bytes avg x 372 workers, 218K bytes max (seg0)."
Memory used: 983040kB
Optimizer: Postgres query optimizer
Execution time: 74008.427 ms

median=500
Now we calculate local medians and take their average, because on each node we have a random sample from the original set:

select avg(median) from
(
select percentile_disc(0.5) within group(order by n) as median from dbg.tst
group by gp_segment_id) a;
Execution time: 2332.995 ms

median=500.4919354838709677

Acceleration by 31.2 times, which is cool, but until we remember that we have 372 segments, and it is not entirely clear why the result
turned out to be an order of magnitude slower than the reduced data volume on each node.

I checked the calculation for a set of 1 billion by multiplying the existing set x10

insert into tst
select a.*
from tst a
join generate_series(1, 10) on 1 = 1

Exec time is 15 m 12 s 474 ms and 28 s 268 ms for the MPP approach respectively.
Acceleration x 32.35.
The results also almost matched: 500 and 500.4704301075268817 respectively. By the way, for this MPP test, all calculated
medians on nodes lie within the range from 499(min) to 501(max)

📌It is clear that the median calculation scales well with a multiple increase in the sample and the acceleration is stably kept in the range of x31 - 32.
By the way, in our case, the volume of a random sample on each shard does not exceed 0.27% (= 1/372 nodes), while the author in the original recommends not to fall below 2%,
but in his case the result is the median of only a subsample, while I eat the whole elephant in parts.

551 views21:39

Greenplum secrets🎩

📌It remains to add that in Kahneman's book "Noise: A Flaw in Human Judgment" there are many examples when the "wisdom of the crowd" turns out to be closer to the truth than individuals point of view.
I think I gave another worthy example for his book, where - the crowd is a set of cluster nodes, in which the error in estimating the answer to the question posed by each of them is leveled out when averaging to an almost flawless level for such a ridiculous price! -)

Happy weekend to everyone and have a good rest!

Greenplum secrets🎩

https://eax.me/approximate-percentile/

❤1

676 views21:39

Greenplum secrets🎩

This media is not supported in your browser

VIEW IN TELEGRAM

Друзья, Ровно через неделю, в Пт, 20 июн, поговорим тут в прямом эфире с автором о ~~черных дырах~~ базах данных, in-memory и не очень. Приглашенный гость Director, R&D Arenadata; founder & CTO Picodata - Константин Осипов. Начало в 19.00 мск

🔥7

837 views21:48

Greenplum secrets🎩

На заметку
#alter #table
В продолжение,
предположим, вы создали таблицу foo AO/(CO|RO) и собрали стат-ку, потом добавили столбец в который либо внесли данные либо оставили as is.
При любом запросе к таблице получим предупреждение:

Continuing with,
let's assume you created a table foo with AO/(CO|RO) option and collected statistics, then added a column and either partially updated it or left it as is.
For any query to the table, we will receive a warning

One or more columns in the following table(s) do not have statistics: foo

Возникает вопрос - как узнать какие колонки в табл-е без статы, ответ :
The question arises - how to find out what columns are in a table without statistics, the answer:

SELECT
    att.attname AS column_name,
    att.attnum AS column_id,
    rel.relname AS table_name
FROM
    pg_attribute att
JOIN
    pg_class rel ON att.attrelid = rel.oid
WHERE
    rel.relname = 'foo' -- имя вашей таблицы
    AND att.attnum > 0 -- исключая системные колонки
    AND NOT att.attisdropped -- исключая удалённые колонки
    AND NOT EXISTS (
        SELECT 1
        FROM pg_stats stats
        WHERE
            stats.schemaname = 'public' -- ваша схема
            AND stats.tablename = rel.relname
            AND stats.attname = att.attname
    )

Greenplum secrets🎩

Секрет 15 ( 2 в 1 или Точность - вежливость королей )
ч.1
На днях прекрасная коллега, а также подписчик канала, попросила чекнуть запрос на удаление дублей в 2 млрд табл-е.
Я предупредил что, DELETE 99% записей не лучшая идея, но по регламенту TRUNCATE +…

❤2🔥2

1.25K viewsedited 18:53

Greenplum secrets🎩

Друзья, Ровно через неделю, в Пт, 20 июн, поговорим тут в прямом эфире с автором о черных дырах базах данных, in-memory и не очень. Приглашенный гость Director, R&D Arenadata; founder & CTO Picodata - Константин Осипов. Начало в 19.00 мск

Друзья, через 8 ч начинаем, ссылка на стрим появится под этим постом в 19.00

❤3

1.15K views07:59

Greenplum secrets🎩

Live stream scheduled for Jun 20, 2025 at 16:00

15:31

Greenplum secrets🎩

Live stream started

15:58

Greenplum secrets🎩

Live stream finished (1 hour)

17:33

Greenplum secrets🎩

Друзья, через 8 ч начинаем, ссылка на стрим появится под этим постом в 19.00

Arenadata_with_Osipov_Konstantin

52.7 MB

Друзья, ниже стрим с Дэвидом Линчем в мире БД, состоявшийся в прошлую Пт. Прошлись по технологиям от старых флопов до супер компьютеров, не забыв передать привет ИИ.

🔥8

826 viewsedited 15:54

Greenplum secrets🎩

Друзья, через 8 ч начинаем, ссылка на стрим появится под этим постом в 19.00

Audio

👍5

713 views17:59

Greenplum secrets🎩

#GUC
На заметку
Часто встает вопрос при анализе деградации GP, вызванной динамическим кодом не твоих рук, была ли собрана статистика по измененной таблице, которая аффектит LA.
Глобально, для этого предусмотрен параметр (в show all)
Note
A question often arises when analyzing GP degradation caused by dynamic code made not by your hands, whether statistics were collected on the modified table that affects LA.
Globally, there is a parameter for this (in show all)

gp_autostats_mode_in_functions,none
gp_autostats_on_change_threshold,2147483647,Threshold for number of tuples added to table by CTAS or Insert-to

Но есть нюанс, если вы знаете под каким пользаком был выполнен запрос (а вы почти всегда это знаете) И запрос вернул данные
But there is a nuance, if you know under what role the query was executed (and you almost always know this) AND the query returned data

select setconfig
from pg_db_role_setting
where setdatabase = <pg_database.oid> -- имя БД, на которую действует настройка
  and setrole in (select usesysid from pg_user where usename = 'lady_bird_user') -- имя роли (или юзера)

напр. вида
eg. kinda

{gp_autostats_mode=on_change,gp_autostats_mode_in_functions=on_change,gp_autostats_on_change_threshold=100000}

то это значит, если данные в таблицу записаны в рамках PL/pgSQL функции сбор статистики для транзакции от 100 000 новых строк гарантирован.
Соотв-но для созданных в таком режиме работы БД крохотных таблиц важно помнить, что стата собрана не будет, что может выйти боком, если табл-а сопрягается далее с массивными табл-ами, о чем писал неоднократно
this means that if the data is written to the table within the PL/pgSQL function, statistics for a transaction of 100,000 new rows is guaranteed.
Accordingly, for tiny tables created in this mode of operation of the database, it is important to remember that statistics will not be collected,
which can backfire if the table is further coupled with massive tables, which I have written about many times

Greenplum secrets🎩

Секрет 30 (Девиз GUCCI) Secret 30 (GUCCI motto)

Сегодня на проме попалась PL/pgSQL ф-я на 1k строк, которая упорно давала осечку с ошибкой workfile per query size limit exceeded
, генеря каждый раз спилл более 30 TB.

Я прогнал ее под своей учеткой в песочнице…

❤4👍1

840 viewsedited 22:05

Greenplum secrets🎩

В заключение этого удивительного разговора, свежего как озон после осеннего июньского дождя, оставлю wish-list того, что хотелось бы иметь в GP :
1) Поддержка Oracle хинта /*+ materialize */ - напр., часто в запрос надо прокинуть 2 константных DATE поля для
ограничения between по диапазону дат в join-е с партицированной табл-ой и для этого приходится этот диапазон явно сохранить в отдельную таблицу,
из которой вычитать в переменные, которые передать в запрос

2) Поддержка Oracle хинта /*+ RESULT_CACHE*/, чтобы рез-ты одного и того же запроса можно достать из кэша.

3) Поддержка Oracle опции ALTER TABLE ... SET UNUSED для мгновенного удаления колонки и очистка этого удаленного поля при вакуум

4) Есть удобный drop table if exists, но хорошо бы иметь create table if not exists (в PostgreSQL 17 это сделали)

5) Поправить баг в выдаче расхода RAM в JSON версии плана запроса explain. Сейчас он ограничен int4, значения выше 2^32-1 отображаются <0 (Аренадата уже фиксит )
Если в вас есть, что добавить - you are welcome!!!

👍6

1K viewsedited 14:43

Greenplum secrets🎩

#DATA#Фото дня#

741 views11:14

Greenplum secrets🎩

Друзья, прошла аномально крупная японская свеча на покупку акций Аренадата ( тикер #DATA ).
Как бы ни развивалась история с обвинением Сергею Мацоцкому( основатель IBS, и, если верить СМИ, держатель крупного пакета акций #DATA )
по даче взятки, безапеляционно верю в компанию, в технологию MPP, продолжаю в нее инвестировать.

❤1

737 viewsedited 11:14

Greenplum secrets🎩

Секрет 39 ( Не перегружайте запрос лишними полями )
В продолжение wish-list, пример на проме не заставил себя ждать
Запрос

with cte as (
SELECT 
    a.report_dt,
    a.c1, ... , a.c41
   FROM foo a
     JOIN ( SELECT b.report_dt,
            b.agreement_rk,
            max(b.version_id) AS version_id
           FROM foo b
          GROUP BY b.report_dt, b.agreement_rk) c USING (report_dt, agreement_rk, version_id)
)
select 'total' as nm, '2999-12-31' as report_dt, count(*) from cte
union all
select 'q1' as nm, report_dt, count(*) from cte
where report_dt between '2025-01-01' and '2025-03-31'
group by 2
union all
select 'q2' as nm, report_dt, count(*) from cte
where report_dt between '2025-04-01' and '2025-06-30'
group by 2;

создал спилл 20 TB.

Почему меня зацепил этот запрос ?
По ряду причин.
Если исключить список лишних полей a.c1, ... , a.c41( которые разработчик видимо на автомате скопипастил из кода аксессора, который наш фреймворк использует для доступа к витринам), не влияющих на результат , спилл падает до 3.5 TB:

with cte as (
SELECT 
    a.report_dt
   FROM foo a
     JOIN ( SELECT b.report_dt,
            b.agreement_rk,
            max(b.version_id) AS version_id
           FROM foo b
          GROUP BY b.report_dt, b.agreement_rk) c USING (report_dt, agreement_rk, version_id)
)
<тот же набор сабтоталов из 3х селектов, что в исходном запросе>

Если оставить все поля, но выбрать из cte
либо

select 'q1' as nm, report_dt, count(*)
where report_dt between '2025-01-01' and '2025-03-31'
group by 2

либо

select 'total' as nm, '2999-12-31' as report_dt, count(*)

спилл падает до 2.8 TB и 2.6 TB соотв-но, т.е. с учетом погрешности нашего коллекторра спиллов, он одинаков.

Наверное, если бы была поддержка хинта MATERIALIZE как в Oracle, таких аномалий не было бы.

Действительно, если сохранить cte

with cte as (
SELECT 
    a.report_dt,
   FROM foo a
     JOIN ( SELECT b.report_dt,
            b.agreement_rk,
            max(b.version_id) AS version_id
           FROM foo b
          GROUP BY b.report_dt, b.agreement_rk) c USING (report_dt, agreement_rk)
)

в таблу AO/CO zstd1 (24 млрд туплов), то получим спилл данной операции CTAS 2.3 TB, выборка сабтоталов из которой спила уже не создает.

В исходном же варианте, планы запросов для
вариантов

 
select 'total' as nm, '2999-12-31' as report_dt, count(*) from cte

select 'total' as nm, '2999-12-31' as report_dt, count(*) from cte
union all
select 'q1' as nm, report_dt, count(*) from cte
where report_dt between '2025-01-01' and '2025-03-31'
group by 2

разительно отличаются.

Greenplum secrets🎩

👍3❤1👏1

765 viewsedited 21:10

Greenplum secrets🎩

Друзья, лето в самом разгаре и для тех кто еще думает у меня 2 хорошие новости. Во первых, несмотря на спад туристов на 70% в Анапе, на других ККК, как-то в Кабардинке и Дивноморском аншлаг, по мазуту норма, вода прогрелась до 25, самое то! А с открытием аэроопорта в Геленджике море стало еще доступнее. Во вторых, если кто-то пресытился морем или горами Кавказа И хочет отдохнуть семьей, компанией до 8 чел, то рекомендую классное, уникальное место на берегу р.Кубань, за качество отвечаю, ссылка ниже.

❤3👍2

926 views21:02

Greenplum secrets🎩

https://t.iss.one/tsatse01/92

ЦАЦЭ - территория Вашего отдыха

🔠🔠🔠🔠🔠🔠🔠

⏳ половина лета позади, но это не повод грустить - впереди ещё ровно столько же дней классно отдохнуть.

Всё лето мы работаем для вас в двух режимах по предварительной брони:

➡️ дневное пребывание с понедельника по четверг с 11-00 до 21-00
➡️ суточное…

👍1

1.08K views21:04

Greenplum secrets🎩

На заметку
А вы знали, что не каждая транзакция откатывается при ошибке ?
Если pl/pgsql функция абортнулась из-за нехватки места
schema's disk space quota exceeded with name : public
то таблицы, которые были ей созданы не откатываются, а остаются как будто транзакция корректно завершилась.
Note:
Did you know that not every transaction is rolled back on error?
If the pl/pgsql function aborted due to lack of space with error
"schema's disk space quota exceeded with name : public"
then the tables that were created by it are not rolled back, but remain as if the transaction had completed correctly.

😱9❤1

1.53K viewsedited 10:09

Greenplum secrets🎩

На заметку
Если вы трассируете pl/pgsql ф-ю в поисках ее узких мест, скажем неоптимального SQL оператора CTAS, имейте ввиду, что CTAS выполненный в режиме explain analyze
создает таблицу x без статы, что чревато тем, что для больших таблиц ( от 40 млрд строк в моем случае ),

select count(*) from x

может не выполниться, т.к
выполняется без предагрегации на сегментах.
Note:
If you optimize a pl/pgsql function searching its bottlenecks, say a non-optimal SQL CTAS operator, keep in mind that CTAS executed in Explain-Analyze mode
creates table x without stats, which has consequences, for example for large tables (from 40 billion rows in my case),

select count(*) from x

may not execute, because
it is executed without pre-aggregation on segments.

🤔3👍1

757 views20:12

Greenplum secrets🎩

Секрет 40 (Чем раньше, тем лучше или почему ИИ не лучше джуна)

У нас в команде разработчиков работаеют талантливые и креативные ребята, которые пишут много динамического кода, который можно использовать в парадигме LEGO для генерации целевого кода, но иногда он местами сыроват.

Попросили посмотреть очередь "пациентов":
Симптомы те же у каждого, код работает 1ч и рвет с ошибкой,
Command could not be dispatch to segment entry db <127.0.0.1:5432 pid=154069: server closed the connection unexpectedly
У каждого пациента выполняется один и тот же динамический код с точностью до названия таблицы.

Собственно, сам код банален - получить список уникальных версий в CSV формате:

select string_agg(DISTINCT version_id::text, ',') from x

Однако, даже если в рез-те получим мизерный список, скажем дюжину элементов, сначала мастер должен отсортировать всю колонку в рамках DISTINCT.
В нашем случае, для табл-ы из 2 млрд строк, оный этого не переварил.

Незатейливая декомпозиция делает этот код масштабируеммым, если на конкатенацию подать уникальный список, который получим в режиме MPP :

select string_agg( t.version_id::text, ',') from
(select DISTINCT version from x) t

Кстати, ИИ на запрос "напиши код для Greenplum, который создает из колонки таблицы список ее уникальных значений в формате csv"
выдал код по первому варианту, т.е. без использования MPP

👍9❤1

837 views07:47

About

Blog

Apps

Platform