Just links
That's just a link aggregator of everything I consider interesting, especially DL and topological condensed matter physics. @EvgeniyZh
>We throw away gradient updates randomly
>Outperforms Muon with RMSProp

paper
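To make the one-line claim concrete, here is a minimal sketch of "randomly throwing away updates" on top of RMSProp. The masking rule is an assumption for illustration and is not taken from the paper: the squared-gradient average is always updated from the full gradient, each coordinate's step is kept independently with probability `keep_prob` (a hypothetical parameter name), and kept steps are not rescaled. The paper's exact rule may differ.

```python
import math
import random

def masked_rmsprop_step(params, grads, sq_avg, lr=1e-3, beta=0.99, eps=1e-8,
                        keep_prob=0.5, rng=random):
    """One RMSProp step in which each coordinate's update is randomly dropped.

    Assumptions for illustration (not taken from the paper): the squared-gradient
    average is always updated from the full gradient, each coordinate's step is
    kept independently with probability keep_prob, and kept steps are not rescaled.
    """
    new_params, new_sq = [], []
    for p, g, v in zip(params, grads, sq_avg):
        v = beta * v + (1 - beta) * g * g      # running mean of squared gradients
        step = lr * g / (math.sqrt(v) + eps)   # plain RMSProp step
        if rng.random() < keep_prob:           # Bernoulli mask: apply this coordinate's step
            p = p - step
        new_params.append(p)
        new_sq.append(v)
    return new_params, new_sq

# Usage: minimize f(x) = sum(x_i^2), whose gradient is 2x.
rng = random.Random(0)
x, v = [3.0, -1.5], [0.0, 0.0]
for _ in range(500):
    x, v = masked_rmsprop_step(x, [2 * xi for xi in x], v, lr=0.05, rng=rng)
print(sum(xi * xi for xi in x))
```

With `keep_prob=1.0` this reduces to ordinary RMSProp; with `keep_prob=0.0` the parameters never move but the second-moment estimate still tracks the gradients.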
Forwarded from Градиент обреченный (Sergei Averkiev)
🔺 hf-mem

A utility that shows how much memory is needed to run a model from HF, its parameter count, and a breakdown of those parameters. It downloads only the metadata and computes everything from that.

uvx hf-mem --model-id Qwen/Qwen-Image


(here uvx runs hf-mem without installing it system-wide)
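Why metadata alone is enough: a safetensors file starts with an 8-byte little-endian length N, followed by N bytes of JSON that list every tensor's `dtype` and `shape`. Those first bytes can be fetched without downloading the weights (e.g. with an HTTP Range request), which is presumably what "downloads only the metadata" means here; we have not checked hf-mem's actual implementation. A minimal sketch of the counting step, assuming single-file safetensors weights (sharded models would need the index file too); `DTYPE_BYTES`, `parse_safetensors_header` and `summarize` are hypothetical helpers, not part of hf-mem.

```python
import json
import math
import struct

# Bytes per element for common safetensors dtype strings.
DTYPE_BYTES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2, "I64": 8, "I32": 4,
               "I16": 2, "I8": 1, "U8": 1, "BOOL": 1, "F8_E4M3": 1, "F8_E5M2": 1}

def parse_safetensors_header(blob: bytes) -> dict:
    """Return the JSON header of a safetensors file, given its leading bytes."""
    (header_len,) = struct.unpack("<Q", blob[:8])  # little-endian u64 header length
    return json.loads(blob[8:8 + header_len])

def summarize(header: dict) -> dict:
    """Count parameters and bytes per dtype, skipping the __metadata__ entry."""
    params, nbytes = {}, {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        n = math.prod(info["shape"])  # a scalar (shape []) counts as one element
        dt = info["dtype"]
        params[dt] = params.get(dt, 0) + n
        nbytes[dt] = nbytes.get(dt, 0) + n * DTYPE_BYTES[dt]
    return {"params": params, "bytes": nbytes}

# Usage with a tiny synthetic header (no real model involved).
header = {"__metadata__": {"format": "pt"},
          "w": {"dtype": "BF16", "shape": [1024, 1024], "data_offsets": [0, 2097152]},
          "b": {"dtype": "F32", "shape": [1024], "data_offsets": [2097152, 2101248]}}
raw = json.dumps(header).encode()
blob = struct.pack("<Q", len(raw)) + raw
print(summarize(parse_safetensors_header(blob)))
# {'params': {'BF16': 1048576, 'F32': 1024}, 'bytes': {'BF16': 2097152, 'F32': 4096}}
```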

There is an --experimental flag (it works for ForCausalLM and ForConditionalGeneration classes); with it, the tool also computes the size of the KV cache needed for inference at a given max-length and batch-size.
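For a dense attention cache, the usual back-of-the-envelope estimate is one K and one V tensor per layer, each of shape (batch_size, num_kv_heads, max_length, head_dim). A minimal sketch of that estimate, assuming Llama-style `config.json` field names (`num_hidden_layers`, `num_attention_heads`, `num_key_value_heads`, `hidden_size`, `head_dim`); other architectures name these differently, and hf-mem's exact formula (e.g. for sliding-window or compressed caches) may not match this one. The config values in the usage line are hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, max_length, batch_size,
                   bytes_per_elem=2):
    """Bytes for a dense K/V cache: the factor 2 is one K and one V tensor per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * max_length * batch_size * bytes_per_elem

def kv_cache_bytes_from_config(config, max_length, batch_size, bytes_per_elem=2):
    """Same estimate, reading Llama-style config field names (an assumption)."""
    n_heads = config["num_attention_heads"]
    n_kv = config.get("num_key_value_heads", n_heads)  # falls back to plain multi-head attention
    head_dim = config.get("head_dim") or config["hidden_size"] // n_heads
    return kv_cache_bytes(config["num_hidden_layers"], n_kv, head_dim,
                          max_length, batch_size, bytes_per_elem)

# Usage with a hypothetical config: 32 layers, 8 KV heads, head_dim 4096 // 32 = 128, bf16.
cfg = {"num_hidden_layers": 32, "num_attention_heads": 32,
       "num_key_value_heads": 8, "hidden_size": 4096}
print(kv_cache_bytes_from_config(cfg, max_length=4096, batch_size=1) / 2**20, "MiB")
# 512.0 MiB
```

The cache grows linearly in both max-length and batch-size, which is why those two are the flag's inputs.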

👉 https://github.com/alvarobartt/hf-mem
Learned from https://t.iss.one/tropicalgeometry/1095 that the optimality of the E8 lattice has also been formalized!* The blueprint is an interesting read (and this seems like a pleasant genre in general): the file lays out the outline of the proof, in a form accessible to an outsider. It is also funny in places (see the image).

https://thefundamentaltheor3m.github.io/Sphere-Packing-Lean/blueprint.pdf
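The theorem being formalized (Viazovska, 2016) says that no packing of equal spheres in R^8 has density above π^4/384 ≈ 0.2537, the density of the E8 lattice packing. As an independent sanity check (not taken from the blueprint), the sketch below enumerates the 240 minimal vectors of E8 and recomputes that density: spheres of radius √2/2 sit at the lattice points, and E8 is unimodular, so each sphere owns one unit of volume. The function names are illustrative.

```python
import itertools
import math

def e8_minimal_vectors():
    """Enumerate E8 vectors of squared norm 2, in doubled coordinates.

    E8 = {x in Z^8 or (Z + 1/2)^8 : sum(x) is even}. With y = 2x everything stays
    integral: |x|^2 = 2 becomes |y|^2 = 8, and "sum(x) even" becomes sum(y) % 4 == 0.
    """
    vecs = []
    # Integer points: entries of y in {-2, 0, 2}, so exactly two entries are nonzero.
    for y in itertools.product((-2, 0, 2), repeat=8):
        if sum(c * c for c in y) == 8 and sum(y) % 4 == 0:
            vecs.append(y)
    # Half-integer points: entries of y in {-1, 1}; |y|^2 = 8 holds automatically.
    for y in itertools.product((-1, 1), repeat=8):
        if sum(y) % 4 == 0:
            vecs.append(y)
    return vecs

def e8_packing_density():
    """Ball volume at radius (minimal distance)/2 divided by the covolume (1 for E8)."""
    n, r, covolume = 8, math.sqrt(2) / 2, 1.0
    ball = math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n
    return ball / covolume

print(len(e8_minimal_vectors()))                 # 240 (112 integer + 128 half-integer)
print(e8_packing_density(), math.pi ** 4 / 384)  # both about 0.2537
```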
Forwarded from 
Bayesians Commit the Gambler's Fallacy

Abstract:

The gambler's fallacy is the tendency to expect random processes to switch more often than they actually do—for example, to assign a higher probability to heads after a streak of tails. It's often taken to be evidence for irrationality. It isn't. Rather, it's to be expected from a group of Bayesians who begin with causal uncertainty, and then observe unbiased data from an (in fact) statistically independent process. Although they increase their confidence that the outcomes are independent, they do so in an asymmetric way—ruling out “streaky” hypotheses more quickly than “switchy” ones. Their expectations depend on this balance of uncertainty; as a result, the majority (and the average) exhibit the gambler's fallacy, expecting a heads after a string of tails. If they have limited memory, this tendency persists even with arbitrarily-large amounts of data. In fact, such Bayesians exhibit a variety of the empirical trends found in studies of the gambler's fallacy. They expect switches after short streaks but continuations after long ones; these nonlinear expectations vary with their familiarity with the causal system; their predictions depend on the sequence they've just seen; they produce sequences that are too switchy; and they exhibit greater rates of the gambler's fallacy in binary predictions than in probability estimates. In short: what's been thought to be evidence for irrationality may instead be rational responses to limited data and memory.

https://onlinelibrary.wiley.com/doi/10.1111/cogs.70171