QC 的小树林

Found some time to refactor Detypify and cut a new release.

More symbols, higher accuracy, and a smoother page.

The new model was built by @fxck_durov; its top-3 accuracy (the correct symbol is among the first three candidates) is 99%.

Typst users, give it a try at detypify.quarticcat.com

GitHub

GitHub - QuarticCat/detypify: Typst symbol classifier

Typst symbol classifier. Contribute to QuarticCat/detypify development by creating an account on GitHub.

1.35K views · 16:40

QC 的小树林

Forwarded from Welcome to the Black Parade

Learned the correct way to write a spinlock: adding an inner read-only spin, while (*xp == 1) (read_once in the diff below), speeds it up by 30% on the spot👍

--- a/bad.c
+++ b/good.c
@@ -34,8 +34,10 @@ static inline int read_once(const xchglock_t *p)
 
 static inline void xchg_lock(xchglock_t *xp)
 {
-    while (xchg(xp, 1) == 1)
-        ;
+    while (xchg(xp, 1) == 1) {
+        while (read_once(xp) == 1)
+            ;
+    }
 }

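The cause is cache-line bouncing. xchg is a write, so every waiting core keeps pulling the lock's cache line over in exclusive state; the inner loop only reads, so waiters can spin on a shared copy and retry the xchg only once the lock looks free. perf c2c confirms it: the good version has only 1/5 as many HITMs as the bad one. Very sophisticated stuff.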

ref: perfbook 7.3.1
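For readers who want something they can compile, here is a minimal C11 sketch of the same test-and-test-and-set pattern. It is not the original xchglock_t code: atomic_exchange stands in for xchg, atomic_load for read_once, and the names ttas_lock_t, ttas_lock, ttas_unlock and run_demo are hypothetical. The demo only checks mutual exclusion (the counter comes out exact); it does not reproduce the 30% measurement.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names; atomic_exchange plays xchg, atomic_load plays read_once. */
typedef struct {
    atomic_int locked; /* 0 = free, 1 = held */
} ttas_lock_t;

static void ttas_lock(ttas_lock_t *l)
{
    /* Outer loop: the exchange writes the line, so it needs exclusive ownership. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 1) {
        /* Inner loop: read-only spin on a shared copy of the cache line;
         * retry the exchange only once the lock looks free. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) == 1)
            ;
    }
}

static void ttas_unlock(ttas_lock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Demo: up to 16 threads each increment a shared counter iters times. */
static ttas_lock_t demo_lock; /* static storage: starts unlocked (0) */
static long demo_counter;

static void *demo_worker(void *arg)
{
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        ttas_lock(&demo_lock);
        demo_counter++; /* protected by demo_lock */
        ttas_unlock(&demo_lock);
    }
    return NULL;
}

static long run_demo(int nthreads, long iters)
{
    pthread_t tids[16];
    int started = 0;
    if (nthreads > 16)
        nthreads = 16;
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, demo_worker, &iters) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return demo_counter; /* started * iters if the lock is correct */
}
```

run_demo(4, 100000) should return 400000. Production spinlocks usually also put a CPU pause hint (for example _mm_pause() on x86) in the inner loop; it is left out here to keep the sketch portable.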

244 views · 08:24

QC 的小树林

https://github.com/NVlabs/cutile-rs

GitHub

GitHub - NVlabs/cutile-rs: cuTile Rust provides a safe, tile-based kernel programming DSL for the Rust programming language. It…

cuTile Rust provides a safe, tile-based kernel programming DSL for the Rust programming language. It features a safe host-side API for passing tensors to asynchronously executed kernel functions. -...

319 views · 08:49

QC 的小树林

In reply to Welcome to the Black Parade: "Learned the correct way to write a spinlock: adding an inner while (*xp == 1) spin speeds it up by 30% on the spot👍 …"

https://www.siliceum.com/en/blog/post/spinning-around

Siliceum

Spinning around: Please don't! - siliceum

Embark on a journey about why you should sometimes trust your OS more than yourself.

355 views · 06:41

QC 的小树林

Forwarded from Lancern's Treasure Chest

How many branches can your CPU predict?

via Daniel Lemire's blog

Daniel Lemire's blog

How many branches can your CPU predict?

Modern processors have the ability to execute many instructions per cycle, on a single core. To be able to execute many instructions per cycle in practice, processors predict branches. I have made the point over the years that modern CPUs have an incredible…

350 views · 05:24

QC 的小树林

https://chuanqixu9.github.io/c++/2026/03/27/C++20-Coroutines-from-compiler-and-library-authors-perspective.html

327 views · 07:02

QC 的小树林

Forwarded from RSS bot

A Fast Immutable Map in Go
https://lemire.me/blog/2026/03/29/a-fast-immutable-map-in-go/

Daniel Lemire's blog

A Fast Immutable Map in Go

Consider the following problem. You have a large set of strings, maybe millions. You need to map these strings to 8-byte integers (uint64). These integers are given to you. If you are working in Go, the standard solution is to create a map. The construction…

263 views · 07:15

QC 的小树林

In reply to RSS bot: A Fast Immutable Map in Go https://lemire.me/blog/2026/03/29/a-fast-immutable-map-in-go/

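In case subscribers see the title and just scroll past: I'm mainly forwarding this for the Binary Fuse Filter it mentions. It suits build-once, query-many workloads and is more efficient than a Bloom filter (in particular, it needs less memory for the same false-positive rate).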

arXiv.org

Binary Fuse Filters: Fast and Smaller Than Xor Filters

Bloom and cuckoo filters provide fast approximate set membership while using little memory. Engineers use them to avoid expensive disk and network accesses. The recently introduced xor filters can...
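To make the filter less abstract, here is a compact C sketch of a 3-wise binary fuse filter with 8-bit fingerprints (roughly 0.4% false positives). It follows the general construction: each key maps to one slot in each of three consecutive segments, slots holding a single key are peeled off, and fingerprints are then assigned in reverse peel order. The sizing rule (segment length about 2·sqrt(n), about 1.25 slots per key), the fmix64 hash, the seed schedule and every name (fuse8_t, fuse8_build, fuse8_contains) are assumptions for illustration, not the paper's tuned parameters or any library's API. It needs unsigned __int128 (GCC/Clang on 64-bit targets) and distinct keys.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint64_t seed;
    uint32_t seg_len;       /* power of two */
    uint32_t seg_count_len; /* segment count * seg_len */
    uint32_t array_len;     /* (segment count + 2) * seg_len */
    uint8_t *fp;            /* one 8-bit fingerprint slot per position */
} fuse8_t;

static uint64_t mix64(uint64_t x) /* MurmurHash3 fmix64 finalizer */
{
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
    return x ^ (x >> 33);
}

static uint8_t fuse8_fingerprint(uint64_t h) { return (uint8_t)(h ^ (h >> 32)); }

/* One slot in each of three consecutive segments, so the three slots are distinct. */
static void fuse8_slots(const fuse8_t *f, uint64_t h, uint32_t s[3])
{
    uint32_t mask = f->seg_len - 1;
    s[0] = (uint32_t)(((unsigned __int128)h * f->seg_count_len) >> 64);
    s[1] = (s[0] + f->seg_len) ^ (uint32_t)((h >> 18) & mask);
    s[2] = (s[0] + 2 * f->seg_len) ^ (uint32_t)(h & mask);
}

/* 0 = definitely absent, 1 = probably present. */
static int fuse8_contains(const fuse8_t *f, uint64_t key)
{
    uint64_t h = mix64(key + f->seed);
    uint32_t s[3];
    fuse8_slots(f, h, s);
    return fuse8_fingerprint(h) == (uint8_t)(f->fp[s[0]] ^ f->fp[s[1]] ^ f->fp[s[2]]);
}

/* Keys must be distinct. Returns 0 on success, -1 on failure. */
static int fuse8_build(fuse8_t *f, const uint64_t *keys, uint32_t n)
{
    uint32_t L = 4, seg_count;
    while ((uint64_t)L * L < 4ull * n && L < (1u << 18)) L <<= 1; /* ~2*sqrt(n) */
    seg_count = (uint32_t)(((uint64_t)n * 5 / 4 + L - 1) / L);     /* ~1.25n slots */
    if (seg_count == 0) seg_count = 1;
    f->seg_len = L;
    f->seg_count_len = seg_count * L;
    f->array_len = (seg_count + 2) * L;
    uint32_t m = f->array_len;
    uint32_t *count = malloc(m * sizeof *count), *queue = malloc(m * sizeof *queue);
    uint32_t *peel_slot = malloc((n + 1) * sizeof *peel_slot);
    uint64_t *xorh = malloc(m * sizeof *xorh), *peel_h = malloc((n + 1) * sizeof *peel_h);
    f->fp = calloc(m, 1);
    int rc = -1;
    for (uint64_t attempt = 1; rc != 0 && attempt <= 100; attempt++) {
        if (!count || !queue || !peel_slot || !xorh || !peel_h || !f->fp) break;
        f->seed = mix64(attempt);
        memset(count, 0, m * sizeof *count);
        memset(xorh, 0, m * sizeof *xorh);
        for (uint32_t i = 0; i < n; i++) { /* per slot: key count and XOR of key hashes */
            uint64_t h = mix64(keys[i] + f->seed);
            uint32_t s[3];
            fuse8_slots(f, h, s);
            for (int j = 0; j < 3; j++) { count[s[j]]++; xorh[s[j]] ^= h; }
        }
        uint32_t qlen = 0, top = 0;
        for (uint32_t i = 0; i < m; i++) if (count[i] == 1) queue[qlen++] = i;
        while (qlen > 0) { /* peel: a slot with exactly one key reveals that key's hash */
            uint32_t i = queue[--qlen], s[3];
            if (count[i] != 1) continue;
            uint64_t h = xorh[i];
            peel_h[top] = h; peel_slot[top++] = i;
            fuse8_slots(f, h, s);
            for (int j = 0; j < 3; j++) {
                xorh[s[j]] ^= h;
                if (--count[s[j]] == 1) queue[qlen++] = s[j];
            }
        }
        if (top != n) continue; /* peeling got stuck: retry with a new seed */
        memset(f->fp, 0, m);
        while (top-- > 0) { /* reverse peel order: the other two slots are already final */
            uint32_t s[3];
            uint8_t v = fuse8_fingerprint(peel_h[top]);
            fuse8_slots(f, peel_h[top], s);
            for (int j = 0; j < 3; j++) if (s[j] != peel_slot[top]) v ^= f->fp[s[j]];
            f->fp[peel_slot[top]] = v;
        }
        rc = 0;
    }
    free(count); free(queue); free(peel_slot); free(xorh); free(peel_h);
    return rc;
}
```

Build once with fuse8_build(&f, keys, n), then query many times with fuse8_contains: a miss is definitive, a hit is only probable. The table is never updated after construction, which is exactly the build-once, query-many case described above.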

1.2K views · 01:34

QC 的小树林

Happy birthday, 麻薯🎂

242 views · 16:06

QC 的小树林

https://github.com/envidera/zench

GitHub

GitHub - envidera/zench: Run benchmarks anywhere in your codebase and integrate performance checks directly into your cargo test…

Run benchmarks anywhere in your codebase and integrate performance checks directly into your cargo test pipeline. - envidera/zench

286 views · 03:54

QC 的小树林

https://github.com/envidera/zench

When will Rust's official #[bench] finally land on stable? (It is still nightly-only.)

286 views · 04:07

QC 的小树林

https://chrisdown.name/2026/03/24/zswap-vs-zram-when-to-use-what.html

chrisdown.name

Debunking zswap and zram myths

zswap and zram are fundamentally different approaches with different philosophies. If in doubt, use zswap.

332 views · 10:07

QC 的小树林

Forwarded from slanterns w/ 🦀

https://www.e6data.com/blog/deadlocking-tokio-mutex-without-holding-lock

E6Data

Deadlocking a Tokio Mutex without Holding a Lock | e6data

Learn why a Tokio mutex in async Rust can appear deadlocked even when unlocked, how a waker-contract violation traps permits, and how to fix it safely.

224 views · 22:33

QC 的小树林

Forwarded from 少数派sspai

The biggest problem with the Type-C port is that it looks "unified" [by 流歌]
https://sspai.com/post/108325

少数派 - A Guide to High-Quality Digital Consumption

C-to-C adapters that sold out as soon as they were listed: is the Type-C port really unified? - 少数派

The real problem is that Type-C's single, unified connector shape hides the complex and fragmented implementations and protocols behind it.

274 views · 08:04

QC 的小树林

In reply to 少数派sspai: The biggest problem with the Type-C port is that it looks "unified" [by 流歌] https://sspai.com/post/108325

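Only just learned that 5.1K adapters exist. Great stuff: now I can throw away the dedicated charging cables for all the cheap little gadgets at home. (Many cheap USB-C devices omit the 5.1 kΩ pull-down resistors on the CC pins, so a C-to-C charger never enables VBUS; the adapter supplies those resistors.)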

339 views · 08:06

QC 的小树林

Forwarded from Hacker News (yahnc_bot)

Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets https://arxiv.org/abs/2604.07902

arXiv.org

Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets

Granlund and Montgomery proposed an optimization method for unsigned integer division by constants [3]. Their method (called the GM method in this paper) was further improved in part by works such...
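As background for the abstract, here is a sketch of the classic Granlund-Montgomery idea for 32-bit unsigned division: with l = ceil(log2 d) and m = floor(2^(32+l) / d) + 1, floor(n / d) = floor(m·n / 2^(32+l)) for every 32-bit n, so a division becomes one multiply and one shift. This is the baseline method, not the paper's improved technique; the helper names udiv32_prepare and udiv32_apply are hypothetical, and unsigned __int128 (a GCC/Clang extension) holds the wide product.

```c
#include <stdint.h>

/* Hypothetical names. m fits in 33 bits; shift = 32 + l. */
typedef struct {
    uint64_t m;
    unsigned shift;
} udiv32_magic;

/* Precompute the magic pair for divisor d (d must be nonzero). */
static udiv32_magic udiv32_prepare(uint32_t d)
{
    unsigned l = 0;
    while ((UINT64_C(1) << l) < d) /* l = ceil(log2(d)) */
        l++;
    udiv32_magic mg;
    mg.shift = 32 + l;
    mg.m = (uint64_t)(((unsigned __int128)1 << mg.shift) / d) + 1;
    return mg;
}

/* floor(n / d) as one widening multiply and one shift. */
static uint32_t udiv32_apply(udiv32_magic mg, uint32_t n)
{
    return (uint32_t)(((unsigned __int128)mg.m * n) >> mg.shift);
}
```

Why it is exact: this m satisfies 2^(32+l) <= m·d <= 2^(32+l) + 2^l, so m / 2^(32+l) overestimates 1/d by less than 1/(d·2^32), and for n < 2^32 that error never pushes the product past the next multiple of d. When d is a compile-time constant, compilers apply this kind of transformation on their own.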

214 views · 09:18

QC 的小树林

Forwarded from Lancern's Treasure Chest

https://isocpp.org//blog/2026/04/announcement-cppreference.com-update

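cppreference has been down for over a year, and now the old hands on the standards committee couldn't sit still any longer and have stepped in to help. The full update is expected to be finished and the site back online later this month.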

226 views · 03:18

QC 的小树林

Forwarded from Welcome to the Black Parade

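perf has a very handy subcommand, annotate. If you haven't used it before, take a look at how to use it to debug CPython [1] and to find the hot fields of a data structure [2]. What some people may not know, though, is that perf annotate's results can be “wrong”.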

Consider the following code:

#include <stdint.h>
#include <stdio.h>

__attribute__((noinline))
uint64_t hot_div(uint64_t iters) {
    uint64_t a = 0x123456789abcdefULL;
    uint64_t b = 7;
    uint64_t acc = 0;

    for (uint64_t i = 0; i < iters; i++) {
        asm volatile(
            "xor %%rdx, %%rdx\n\t"
            "idivq %[b]\n\t"
            "add %%rax, %[acc]\n\t"
            "nop\n\t"
            "nop\n\t"
            "nop\n\t"
            : "+a"(a), [acc]"+r"(acc)
            : [b]"r"(b)
            : "rdx", "cc");
        a += 0x9e3779b97f4a7c15ULL;
    }
    return acc;
}

int main(void) {
    volatile uint64_t x = hot_div(400000000ULL);
    printf("%llu\n", (unsigned long long)x);
    return 0;
}

The idivq instruction should be the slowest hotspot, so we expect perf record + perf annotate to point at it. Let's run the commands on x64 Linux 6.17:

gcc main.c
perf record -e cycles:u -- taskset -c 2 ./a.out 
perf annotate --stdio -l -s hot_div

Yet the output shows 62% on add and 0% on idiv?

   16.63 :   401165: xor    %rdx,%rdx // a.out[401165]
    0.00 :   401168: idiv   %rsi
   62.06 :   40116b: add    %rax,%rcx // a.out[40116b]

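My (admittedly shallow) understanding of this: to execute an instruction, the CPU has to decode it and advance rip at the same time, so while the CPU is executing an instruction, rip already points at the next one. The same thing is visible with call: the return address pushed onto the stack (the RA, or LR on ARM) points at the instruction after the call, again because rip already points there when call executes.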

So in the perf annotate output above, the real hotspot is the instruction just before the 62% line: idiv.

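The real world is more complicated than that, though. Let's try again on the same x64 Linux 6.17 machine, this time with the default event instead of -e cycles:u: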

# perf record -- taskset -c 2 ./a.out 
# perf annotate --stdio -l -s hot_div

    4.20 :   401162: mov    %rdx,%rcx // a.out[401162]
    0.00 :   401165: xor    %rdx,%rdx
   70.35 :   401168: idiv   %rsi // a.out[401168]
    0.00 :   40116b: add    %rax,%rcx

This time perf annotate correctly points at the idiv hotspot (70%). Why?

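In short, perf on x64 implements something called precise_ip in the kernel. The small difference between the two perf record commands above changes the arguments passed to the perf_event_open syscall: the first command (-e cycles:u) runs with precise_ip=0, the second (the default event) with precise_ip=3. With precise_ip=3 the kernel corrects rip, building on Intel's PEBS (Precise Event-Based Sampling). The fixup is remarkably careful: because the current rip may have been reached by a jump rather than by falling through from the instruction before it, the kernel consults the CPU's LBR (Last Branch Record) to trace back to the most recent branch and fix up rip precisely. Very slick.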
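To make the precise_ip discussion concrete, here is a hedged C sketch of the kind of perf_event_attr that ends up in perf_event_open for a user-space-only cycles event. perf record fills in many more fields than this; the helper names, the sample period and the exact field choices are assumptions. precise_ip is the 2-bit field discussed above: per the perf_event_open(2) man page, 0 allows arbitrary skid, 1 requests constant skid, 2 requests zero skid and 3 requires zero skid. If the PMU cannot provide the requested level, the kernel rejects the open instead of silently downgrading.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: a user-space-only CPU-cycles sampling event, roughly
 * what "-e cycles:u" asks for. perf record sets many more fields than this. */
static void fill_cycles_attr(struct perf_event_attr *attr, unsigned precise_ip)
{
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_CPU_CYCLES;
    attr->sample_period = 100000;      /* assumption: one sample every 100k cycles */
    attr->sample_type = PERF_SAMPLE_IP;
    attr->disabled = 1;                /* start stopped; enable later via ioctl */
    attr->exclude_kernel = 1;          /* the ":u" part: user space only */
    attr->exclude_hv = 1;
    attr->precise_ip = precise_ip & 3; /* 2-bit field: 0 = arbitrary skid ... 3 = zero skid */
}

/* glibc has no perf_event_open wrapper, so go through syscall(2). Returns an fd,
 * or -1 (e.g. unsupported precise_ip, or blocked by perf_event_paranoid). */
static int open_cycles_event(unsigned precise_ip)
{
    struct perf_event_attr attr;
    fill_cycles_attr(&attr, precise_ip);
    /* pid 0 = this process, cpu -1 = any CPU, no group leader, no flags */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}
```

Comparing the perf evlist -v output for a recording against fields like these is a quick way to see what a given perf record invocation actually requested.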

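In practice, though, you don't need to watch the syscall arguments: perf evlist can read the recorded data file directly and tell you whether the precise_ip fixup was enabled: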

perf evlist -v -i perf.data

If the output contains "precise_ip: 3", the samples have been corrected; if there is no precise_ip, expect the off-by-one.

All of the above applies only to x64. What about arm64?

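No idea, I don't have an arm64 machine. But gpt-5.3-codex-medium says arm64 has a feature called SPE (Statistical Profiling Extension) that can do the correction at report/annotate time. That's beyond me😋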

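Bottom line: be careful with perf. If you don't understand these implementation details, you can easily draw completely wrong profiling conclusions😉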

[1] Debugging CPython: https://t.iss.one/c/1459082815/900
[2] Finding hot fields in data structures: https://t.iss.one/c/1459082815/1015

166 views · 08:59

QC 的小树林

https://emschwartz.me/your-clippy-config-should-be-stricter/

Evan Schwartz

Your Clippy Config Should Be Stricter

“If it compiles, it works.” This feeling is one of the main things Rust engineers love most about Rust, and a reason why using it with coding agents is espec...

394 views · 03:32

QC 的小树林

https://emschwartz.me/your-clippy-config-should-be-stricter/

TIL there's also #[expect(..., reason = "…")]

138 views · 03:32
