#golang team plans for 1st half 2023: dropping non-unified IR and removing associated GOEXPERIMENT, next steps for PGO, revamping the inliner, core health CI/CD tasks, batched write barriers, <...> long-term RAM efficiency efforts. https://github.com/golang/go/issues/43930#issuecomment-1386010267
GitHub
Go compiler and runtime meeting notes · Issue #43930 · golang/go
Google's Go compiler and runtime team meets periodically (roughly weekly) to discuss ongoing development of the compiler and runtime. While not open to the public, there's been desire by th...
Reducing #golang Execution Tracer Overhead With Frame Pointer Unwinding (by @felixge )
https://blog.felixge.de/reducing-gos-execution-tracer-overhead-with-frame-pointer-unwinding/
Felix Geisendörfer
Reducing Go Execution Tracer Overhead With Frame Pointer Unwinding
Learn how frame pointer unwinding could significantly reduce the CPU overhead of the Go execution tracer.
Go 1.20 released https://twitter.com/golang/status/1620875197569187840
In #Golang 1.20 the Go team introduced an experimental new method of memory management called Go arenas.
In this blog post we show how we combined continuous profiling with memory arenas to improve the performance of one of our cloud services by ~8%!
https://pyroscope.io/blog/go-1-20-memory-arenas/
pyroscope.io
Go 1.20 Experiment: Memory Arenas vs Traditional Memory Management | Open Source Continuous Profiling Platform
Go 1.20 Experiment with Memory Arenas
Fast and dynamic encoding of Protocol Buffers in #golang
https://vincent.bernat.ch/en/blog/2023-dynamic-protobuf-golang
vincent.bernat.ch
Fast and dynamic encoding of Protocol Buffers in Go
Encoding to Protocol Buffers usually requires a proto definition file. But, by using low-level primitives and a code-defined schema, fast encoding...
Exploring Go's Profile-Guided Optimizations
https://www.polarsignals.com/blog/posts/2022/09/exploring-go-profile-guided-optimizations/
Efficient Go APIs with the mid-stack inliner (2019) by Filippo Valsorda
https://words.filippo.io/efficient-go-apis-with-the-inliner/
Filippo Valsorda
Efficient Go APIs with the mid-stack inliner
A common task in Go API design is returning a byte slice. In this post I will explore some old techniques and a new one. In particular, we'll see how the mid-stack inliner interacts with escape analysis to make it possible for the most natural API to be also…
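A rough sketch of the pattern that description hints at: keep the exported function tiny so it can be inlined, declare the buffer there, and move the real work into an unexported helper. The names EncodeID and encodeID and the 8-byte ID format are hypothetical, chosen only for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const idSize = 8

// EncodeID is deliberately tiny so the compiler can inline it into callers.
// Once inlined, buf is declared in the caller's frame, and escape analysis
// may keep it on the stack if the caller does not let the result escape.
func EncodeID(id uint64) []byte {
	var buf [idSize]byte
	return encodeID(&buf, id)
}

// encodeID does the actual work on a caller-provided buffer.
func encodeID(out *[idSize]byte, id uint64) []byte {
	binary.BigEndian.PutUint64(out[:], id)
	return out[:]
}

func main() {
	b := EncodeID(0x0102030405060708)
	fmt.Printf("%x\n", b) // 0102030405060708
}
```

Whether the buffer actually stays on the stack depends on the compiler version and the call site; building with go build -gcflags=-m prints the inlining and escape-analysis decisions.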
- #golang PGO will be on by default (-pgo=auto) in Go 1.21
- First PGO result from a Google-internal app: -2.75% CPU
- We might get a better regalloc (register allocator) (5% CPU gain)
Do you have any success stories from running PGO? Please share!
Source: https://github.com/golang/go/issues/43930#issuecomment-1468713261
runtime/cgo: store M for C-created thread in pthread key
BenchmarkCGoInCThread results:
1. it's 28x faster, 3395 ns/op -> 121 ns/op, macOS & Intel i7-9750H CPU
2. it's 6.5x faster, 1495 ns/op -> 230 ns/op, Linux & Intel Xeon CPU E5-2630
https://go-review.googlesource.com/c/go/+/392854 #golang
Great #golang guides may be coming! From https://github.com/golang/go/issues/43930#issuecomment-1487438236
We are now using Swissmap, a new @golang hash table based on SwissTable that is faster and uses less memory than Golang's built-in map.
This blog by @AndyArt58355407 covers the motivation, design, and implementation of SwissMap for Dolt.
https://dolthub.com/blog/2023-03-28-swiss-map/
Dolthub
SwissMap: A smaller, faster Golang Hash Table
Initial release of SwissMap, a Golang port of Abseil's flat_hash_map.
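A simplified sketch of the core SwissTable idea, for orientation only. The hash is split into a group selector (h1) and a 7-bit fingerprint (h2), and a lookup compares h2 against a small group of one-byte control slots before touching any keys. This is not SwissMap's actual code: the names and the group size of 8 are assumptions, and the scalar loop stands in for the SIMD comparison that real implementations use.

```go
package main

import "fmt"

const (
	groupSize = 8    // control bytes probed together (assumed for this sketch)
	empty     = 0x80 // control byte for an empty slot: high bit set
)

// splitHash splits a 64-bit hash into h1 (upper 57 bits, selects the group)
// and h2 (lower 7 bits, stored in the control byte of an occupied slot).
func splitHash(h uint64) (h1 uint64, h2 uint8) {
	return h >> 7, uint8(h & 0x7f)
}

// matchGroup returns a bitmask with bit i set when ctrl[i] equals h2.
// Only slots whose bit is set need a full key comparison.
func matchGroup(ctrl [groupSize]uint8, h2 uint8) uint8 {
	var mask uint8
	for i, c := range ctrl {
		if c == h2 {
			mask |= 1 << uint(i)
		}
	}
	return mask
}

func main() {
	ctrl := [groupSize]uint8{0x12, empty, 0x12, 0x05, empty, empty, empty, empty}
	fmt.Printf("%08b\n", matchGroup(ctrl, 0x12)) // 00000101
}
```

Since h2 is at most 0x7f, it can never collide with the empty marker, so a single byte comparison filters out most non-matching slots.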
High-performance JSON parsing in #golang by @CockroachDB
https://www.cockroachlabs.com/blog/high-performance-json-parsing/
Cockroachlabs
High-performance JSON parsing in Go
This blog post is an exploration of JSON parser performance, and, ultimately, a description of the high-performance JSON parser used in CockroachDB.
crypto/sha256: add native SHA256 instruction implementation for AMD64 #golang merged 🚀
https://github.com/golang/go/issues/50543
GitHub
crypto/sha256: add native SHA256 instruction implementation for AMD64 · Issue #50543 · golang/go
The sha_ni sha256 instructions have been shown to provide an ~4x increase in hash rate on newer amd64 systems versus the avx2 implementation. Transliterating the Linux implementation shows an up to...