Блог*

Column: interesting compilers

Let's talk simply about addition. On x86, addition takes 2 operands: addq %reg1, %reg2 means reg2 += reg1. On ARM, addition takes 3 operands: add x3, x2, x1 means x3 = x2 + x1.

When you add several numbers in a row (say, four for simplicity: x1 = x1 + x2 + x3 + x4), on ARM it can be done in 3 instructions like this:

x5 = x1 + x2
x6 = x3 + x4
x1 = x5 + x6

This optimization is called tree reduction: the first two operations do not depend on each other, so the processor can execute them in parallel. As a result, the dependency chain takes 2 cycles instead of the 3 that a serial chain of adds would need (assuming one cycle per add).
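As a minimal sketch of the two shapes in C++ (the function names `sum_chain` and `sum_tree` are made up for this illustration, they are not from the linked example): both functions return the same value, and the only difference is which additions depend on which.

```cpp
#include <cstdint>

// Serial chain: every add needs the result of the previous one,
// so the dependency chain is three adds long.
constexpr std::uint64_t sum_chain(std::uint64_t a, std::uint64_t b,
                                  std::uint64_t c, std::uint64_t d) {
    return ((a + b) + c) + d;
}

// Tree reduction: (a + b) and (c + d) are independent of each other,
// so only two adds sit on the longest dependency chain.
constexpr std::uint64_t sum_tree(std::uint64_t a, std::uint64_t b,
                                 std::uint64_t c, std::uint64_t d) {
    return (a + b) + (c + d);
}

// Unsigned addition wraps modulo 2^64 and is associative,
// so both groupings always agree.
static_assert(sum_chain(1, 2, 3, 4) == sum_tree(1, 2, 3, 4),
              "both groupings give the same sum");
```

Because unsigned addition is associative, a compiler is free to regroup either function, so the source shape alone does not decide which instruction sequence comes out. In general, summing n values takes n - 1 adds either way, but the longest dependency chain drops from n - 1 adds to ⌈log2 n⌉.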

Unfortunately, since addition on x86 takes only 2 operands, this does not work directly: you either add the numbers as x1 += x2, x1 += x3, x1 += x4 (a chain of 3), or compute x3 += x4 first, which is not always desirable or possible (say, you don't want to modify x2, x3, x4). There is the lea instruction, which can write the sum of two registers into a third one, but x86 doesn't always have enough registers to do this quickly, so on the whole tree reduction is not used much there.

So, because clang has been optimized so heavily and for so many years by the Valley for x86, such additions were rarely optimized at all and were simply left as chains of add.

And yes, clang optimizes the addition of 4 numbers rather crudely on Arm, where we have as many as 31 general-purpose registers:

        add     x13, x13, x9
        add     x13, x13, x10
        add     x13, x13, x12

That is, a chain of 3. When I saw this, I could really feel the optimizations that were neglected while everyone was looking at x86 code. Things like this are amusing to trace when you read clang's disassembly for Arm: many optimizations, or their absence, are the same as on x86.

GCC, by the way, does this better:

        add     x1, x4, x1
        add     x6, x3, x2
        add     x1, x6, x1
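The difference between the clang and GCC listings can be checked with a toy model. This is only a sketch under loud assumptions: every add takes one cycle, and the core has enough execution units to run independent adds at the same time. The `Add` struct and `critical_path` function are hypothetical helpers written for this illustration.

```cpp
#include <cstddef>

// One three-operand add: dst = src1 + src2, registers given by number (x0..x30).
struct Add {
    int dst, src1, src2;
};

// Length of the longest dependency chain, counted in adds.
// Assumption: one cycle per add, unlimited parallel execution units.
constexpr int critical_path(const Add* insns, std::size_t n) {
    int ready[31] = {};  // cycle at which each register's value becomes available
    int longest = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const int a = ready[insns[i].src1];
        const int b = ready[insns[i].src2];
        const int done = (a > b ? a : b) + 1;  // start once both inputs are ready
        ready[insns[i].dst] = done;
        if (done > longest) longest = done;
    }
    return longest;
}

// The clang sequence: add x13, x13, x9 / add x13, x13, x10 / add x13, x13, x12
constexpr Add clang_seq[] = {{13, 13, 9}, {13, 13, 10}, {13, 13, 12}};
// The GCC sequence: add x1, x4, x1 / add x6, x3, x2 / add x1, x6, x1
constexpr Add gcc_seq[] = {{1, 4, 1}, {6, 3, 2}, {1, 6, 1}};

static_assert(critical_path(clang_seq, 3) == 3, "serial chain: three dependent adds");
static_assert(critical_path(gcc_seq, 3) == 2, "tree: two adds on the longest chain");
```

Both sequences contain three instructions. Under this model only the length of the dependency chain differs, which is where the 2-versus-3 figure comes from.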

The interesting closing moral is that in 2022 we still can't properly optimize even the addition of numbers. Well, it happens.

Play with it: https://gcc.godbolt.org/z/1nozoz1M4

gcc.godbolt.org

Compiler Explorer - C++

// We're going to do these two things in almost every function. Prevents
// compiler from optimizing away.
#define DECLARE5 \
uint64_t data1, data2, data3, data4, data5;
#define FAKE_DEP5 \
asm volatile("" \
: "+r" (data1), "+r" (data2), "+r"…


Блог*

#prog #rust #article

I've wanted to talk about this for a long time, but never got around to it.

Fast and Extensible Equality Saturation with egg

At a high level, compilers, optimizers, and synthesizers are all trying to do the same thing: transform programs into better, equivalent programs. We see this pattern all over the place: in vectorizing computations for digital signal processing, performing operator fusion for machine learning kernels, and automating numerical analysis to improve floating point accuracy. This pattern even shows up in CAD when we want to reverse engineer some low-level, geometric description of an object back up to a high-level, editable design (above).

What if, rather than developing ad hoc, application-specific search strategies for each problem, we could adopt a general, principled approach for searching through the space of programs equivalent to our input to find the “best” simplified candidate?

It may seem that such searches will inevitably suffer exponential blow up. However, if we use a rewriting technique known as Equality Saturation (EqSat) coupled with recent insights for efficiently implementing equality graphs (e-graphs, the core data structure driving EqSat), we can develop a generic, reusable library for building compilers, optimizers, and synthesizers across many domains.

This post introduces egg, our fast and flexible e-graph library implemented in Rust. egg lets you bring the power of EqSat to your problem domain. egg is open-source, packaged, and documented, and many research projects across diverse domains have used its speed and flexibility to achieve state-of-the-art results:

* Szalinski shrinks 3D CAD programs to make them more editable. [PLDI 2020]
* Diospyros automatically vectorizes digital signal processing code. [ASPLOS 2021]
* Tensat optimizes deep learning compute graphs both better and faster (up to 50x) than the state of the art. [MLSys 2021]
* Herbie improves the accuracy of floating point expressions. The egg-herbie library made parts of Herbie over 3000x faster! [PLDI 2015]
* SPORES optimizes linear algebra expressions up to 5x better than state-of-the-art. [VLDB 2020]

Since this is a purely introductory article, I strongly recommend also reading the paper on the library, which explains why it managed to make equality saturation a practical optimization technique.

SIGPLAN Blog

Fast and Extensible Equality Saturation with egg

Recent developments in e-graphs and equality saturation make a compelling case for a new way to build optimizers and synthesizers.


Блог*

No, Alisa, the name "egg" has nothing whatsoever to do with what you were thinking of.


Блог*

#prog #article

Provably Space-Efficient Parallel Functional Programming

<...> Using functional languages also helps the programmer with an important safety concern — data races — by allowing greater control over effects.

<...>

Implicitly-parallel functional programming can therefore be a game changer for parallelism, but there is the elephant in the room: poor performance. The primary reason for poor performance is memory.

<...>

We have been working on this problem by utilizing a memory property of parallel functional programs called disentanglement. <...> Using disentanglement, we partition memory into heaps and distribute heaps among threads with a heap scheduler. Each thread manages its assigned heap independently, in parallel, and without communicating with other threads.<...> We implemented these techniques in the MPL compiler for parallel ML and were able to obtain impressive speedups: up to 50x on 70 cores with respect to our sequential baseline


Блог*

https://ura.news/news/1052582394

Pupils at one of Kurgan's schools were told to buy school diaries bearing state symbols. Diaries with pictures of kittens and superheroes are banned even for primary classes, parents of the schoolchildren told URA.RU.

"At the parents' meeting at school we were told that diaries must carry only Russian symbols, the flag or the coat of arms. Diaries with kitties and supermen are not allowed. For first grade it's still fine if you have already bought one, but starting from second grade it is strict for everyone: only symbols of the Russian Federation," said the mother of a primary school pupil.

Russia for the sad.

---

Admittedly, what bothers me about this story is the lack of details ("one of Kurgan's schools"). It may simply be made up.

ura.news

Kurgan pupils banned from buying diaries with kittens

Read on URA.RU


Блог*

#prog #meme


Блог*

Forwarded from RWPS::Mirror
