
cmd/compile: slower bit operations (regression in go 1.16.3)

What version of Go are you using (go version)?

<pre> $ go version 1.16.3 </pre>

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

<details><summary><code>go env</code> Output</summary><br><pre>
$ go env
GO111MODULE="auto"
GOARCH="amd64"
GOBIN=""
GOCACHE="/mnt/go-cache"
GOENV="/mnt/ubuntu/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/mnt/jenkins/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/mnt/jenkins"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16.3"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build4206356499=/tmp/go-build -gno-record-gcc-switches"
</pre></details>

What did you do?


We are upgrading from Go 1.15.8 to Go 1.16.3 and found a noticeable benchmark regression. Here is a bit set with methods that set and clear the i-th bit:

package main

const remMask = uint64(1)<<6 - 1

type bitset struct {
	b []uint64
}

func (b *bitset) set(i int) {
	m := uint64(1) << (uint64(i) & remMask)
	b.b[i>>6] |= m
}

func (b *bitset) unset(i int) {
	m := uint64(1) << (uint64(i) & remMask)
	b.b[i>>6] &^= m
}

The benchmark, in a test file of the same package:

package main

import (
	"math/rand"
	"testing"
)

func BenchmarkSet(b *testing.B) {
	const n = 10
	const m = 1<<n - 1
	const numWords = (1 << n + 63) >> 6
	bs := &bitset{
		b: make([]uint64, numWords),
	}
	r := rand.New(rand.NewSource(0))
	bits := make([]bool, 1<<n)
	for i := range bits {
		bits[i] = r.Intn(2) == 1
	}
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		if bits[i&m] {
			bs.set(i & m)
		} else {
			bs.unset(i & m)
		}
	}
}
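
For readers who want to check what `set` and `unset` do before looking at timings, here is a minimal, self-contained sketch. The `bitset` type and its two methods are copied from the report; the `get` helper and the `main` function are hypothetical additions for illustration only and are not part of the original code or benchmark.

```go
package main

import "fmt"

// remMask selects the low 6 bits of an index, i.e. the bit position inside a 64-bit word.
const remMask = uint64(1)<<6 - 1

type bitset struct {
	b []uint64
}

// set turns on bit i: word i>>6, bit i&63.
func (b *bitset) set(i int) {
	m := uint64(1) << (uint64(i) & remMask)
	b.b[i>>6] |= m
}

// unset clears bit i using AND NOT.
func (b *bitset) unset(i int) {
	m := uint64(1) << (uint64(i) & remMask)
	b.b[i>>6] &^= m
}

// get is a hypothetical helper (not in the original report) that reads bit i.
func (b *bitset) get(i int) bool {
	m := uint64(1) << (uint64(i) & remMask)
	return b.b[i>>6]&m != 0
}

func main() {
	bs := &bitset{b: make([]uint64, 2)}

	bs.set(70) // 70>>6 == 1 and 70&63 == 6, so this sets bit 6 of word 1
	fmt.Println(bs.get(70), bs.b[0], bs.b[1]) // true 0 64

	bs.unset(70)
	fmt.Println(bs.get(70), bs.b[0], bs.b[1]) // false 0 0
}
```

The mask with `remMask` is what the compiler sees as the explicit AND on the shift count; it keeps the shift count in the range 0 to 63.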

Here is the benchmark result comparing 1.15.8 (left column) and 1.16.3 (right column).

goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz

Set-36     1.64ns ± 1%    2.89ns ± 1%  +76.54%  (p=0.001 n=7+7)

I bisected it; the benchmark regression started at https://github.com/golang/go/commit/96139f25993f3d8122e27a6fec877a4d4f69f83b

What did you expect to see?

Less performance regression.

What did you see instead?

+76.54% performance regression.

golang/go

Answer from ulikunitz:

One observation: the ANDQ is not required; with a register destination, BTSQ already takes the bit index modulo 64 itself.

Another: BTSQ is probably multiple micro-ops including a write of the CF flag. The upper sequence is probably translated 1-to-1 into micro-ops and will have a smaller micro-op count.
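
To make the first observation concrete, here is a small sketch at the Go level. It is not compiler output: `btsModel` is a hypothetical software model of BTSQ with a register destination (bit index taken modulo 64), and `goShift` and `maskedShift` are illustrative names. It shows why the source needs the mask (a Go shift by 64 or more yields 0) and why the same mask is redundant once the instruction wraps the index on its own.

```go
package main

import "fmt"

// goShift is a plain Go shift with no mask. For counts >= 64 the result is 0,
// which is why the bitset code masks the count with 63.
func goShift(i uint64) uint64 {
	return uint64(1) << i
}

// maskedShift mirrors the expression in set: 1 << (i & 63).
func maskedShift(i uint64) uint64 {
	return uint64(1) << (i & 63)
}

// btsModel is a hypothetical model of BTSQ on a register operand:
// it sets bit (i mod 64) of x. The explicit modulo stands in for the
// wrapping the instruction performs, so no separate AND is needed.
func btsModel(x, i uint64) uint64 {
	return x | uint64(1)<<(i%64)
}

func main() {
	i := uint64(70)
	fmt.Println(goShift(i))     // 0
	fmt.Println(maskedShift(i)) // 64
	fmt.Println(btsModel(0, i)) // 64
}
```

Under this model, `btsModel(x, i)` and `x | maskedShift(i)` agree for every `i`, which is the sense in which a separate ANDQ before BTSQ does no extra work.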

Answerer: Ulrich Kunitz (ulikunitz), Germany. Go developer interested in compression; DevOps manager for identity & authentication.