rhysh/aws-sdk-go 1

AWS SDK for the Go programming language.

rhysh/2017-talks 0

Slides and links for 2017 talks

rhysh/docker 0

Docker - the Linux container runtime

rhysh/FlameGraph 0

stack trace visualizer

rhysh/goleveldb 0

LevelDB key/value database in Go.

rhysh/grpc-go 0

The Go language implementation of gRPC. HTTP/2 based RPC

rhysh/heka 0

Data collection and processing made easy.

rhysh/protobuf 0

Go support for Google's protocol buffers

issue comment golang/go

proposal: errors: add ErrUnimplemented as standard way for interface method to fail

The result of an optional interface check is always the same for a particular value. Code can branch based off of that and know that the value won't suddenly implement or un-implement the optional method. (Consider an http.Handler that uses http.Flusher several times per response.)

What are the rules for methods that return ErrUnimplemented? If a method call on a value returns it, does that method need to return it on every subsequent call? If a method call on a value doesn't return it (maybe does a successful operation, maybe returns a different error), is the method allowed to return it on a future call?

If there were a way to construct types with methods at runtime (possibly by trimming the method set of an existing type), with static answers for "does it support this optional feature", would that address the need?

ianlancetaylor

comment created time in 2 months

issue comment golang/go

runtime: deprecate SetCPUProfileRate and replace body with panic

If you later call pprof.StartCPUProfile, that errors out if the profiler is on

That checks runtime/pprof.cpu.profiling. It does not check runtime.cpuprof.on.

An early call to runtime.SetCPUProfileRate does not cause a subsequent call to pprof.StartCPUProfile to return an error (though it does cause it to print a log line). CPU profiling data at the rate requested by the first call will be written to the io.Writer provided in the second call.

rsc

comment created time in 2 months

issue comment golang/go

proposal: net/http, net/http/httptrace: add mechanism for tracing request serving

GotFirstRequestByte or similar will be an important tracing point. Connecting that event to the forthcoming http.Request value looks like one of the main API challenges. There's a start at that via http.Server.ConnState, but it relies on HTTP/1.x (and no pipelining). Also, would instrumentation that acts only at the http.Handler level have any way to access GotFirstRequestByte events after the fact?

Middleware that works at the http.Handler level should have a way to subscribe to future events (such as WroteStatus(code int)). If the code to do that is r = r.WithContext(httptrace.WithServerTrace(ctx, trace)), and if we cannot require that dispatch(w, r) examine r.Context, then it looks like that pair of Context-changing calls would need to have side-effects. This is the other significant API challenge I see.


Here's what the struct might look like, provided we're able to address those challenges:

package httptrace // import "net/http/httptrace"

type ServerTrace struct {
  GotFirstRequestByte func(req *ServerTraceRequestKey)
  GotHeaders func(req *ServerTraceRequestKey)
  GotRequestBody func(req *ServerTraceRequestKey, length int64)
  GotTrailers func(req *ServerTraceRequestKey)
  WroteStatus func(req *ServerTraceRequestKey, code int)
  WroteHeaders func(req *ServerTraceRequestKey)
  WroteBody func(req *ServerTraceRequestKey, length int64)
  WroteTrailers func(req *ServerTraceRequestKey)
}

type ServerTraceRequestKey struct {
  // Maybe this struct doesn't need to hold any information,
  // and only needs to have a unique address.
  _ byte

  // Or maybe it could hold uniqueness in its fields, such as
  // the remote Addr or net.Conn plus HTTP/2 stream ID.
}

Plus a way to access the ServerTraceRequestKey once the Request is available:

package http // import "net/http"

// RequestKeyContextKey is a context key.
// The associated value will be of type *httptrace.ServerTraceRequestKey
var RequestKeyContextKey = &contextKey{"request-key"}
CAFxX

comment created time in 2 months

issue comment golang/go

runtime/pprof: Linux CPU profiles inaccurate beyond 250% CPU use

It looks like timer_create with CLOCK_THREAD_CPUTIME_ID can work for pure-Go programs:

• The timers seem to correctly track the CPU usage of single threads.
• An extension specific to Linux (SIGEV_THREAD_ID for the sigev_notify field) can steer the signal to the thread that caused the timer to expire.
• The kernel will successfully deliver these signals, even when many of them trigger in a 4ms window.

There's more:

• The clockid for a thread's CPU usage can be calculated from its pid/tid. That means a single goroutine/thread in the program can create a timer to track the CPU usage of each other thread in the program (and ask that those timers deliver SIGPROF to the respective threads).
• Go registers its signal handlers with SA_SIGINFO, and the value of si_code shows whether this SIGPROF delivery is from setitimer (SI_KERNEL) or from timer_create (SI_TIMER). That means it's possible to use setitimer and timer_create at the same time, and to de-conflict the results. (I've tested this on Linux 4.14, and have not confirmed on older kernels.)

CLOCK_THREAD_CPUTIME_ID is probably most promising. The only caveat is handling threads that the runtime is not aware of.

If we use setitimer for the whole process, and use timer_create for the threads the runtime can discover (either through runtime.allm or /proc/self/task), the signal handler can choose to no-op on SIGPROF signals that 1) are caused by setitimer and 2) arrive on threads that have a per-thread timer active.

The result of this would be profiles of work done on threads created by the Go runtime that are more accurate (not cutting off at 250% CPU usage) and precise (not under-sampling GC work on large, mostly-idle machines by a factor of 5), while keeping the current behavior for non-Go-runtime-created threads.

If there's interest in improving profiles for work done on threads that the Go runtime did not create (or are otherwise not in runtime.allm), and the project has appetite for the corresponding complexity, there are a few possible paths after that:

• When setitimer causes a SIGPROF delivery on a previously-unknown thread, the Go code that samples the stack could also request a per-thread timer for that thread.
• The program could poll /proc/self/task, looking for threads that were created without going through runtime.newm or similar.
• The program could use a mechanism to subscribe to new thread creation (it looks like perf_event_open can do this, if the runtime can call it when there's still only one thread in the process .. so maybe it doesn't help for the applications that would need it).


I'm working on code to make this happen, but it's resulting in a lot of new branching between Linux and the rest of the Unix support; there's a lot that was shared in src/runtime/signal_unix.go that's now different. If you're reading this and can think of other caveats or blockers that might keep that contribution out of the tree, or reasons that branching wouldn't be allowed, I'd be interested to hear and discuss them. Thanks!

rhysh

comment created time in 2 months

pull request comment twitchtv/twirp

Add support for TooManyRequests (429) error code

Thank you for the survey, @thesilentg.

My read of the results is that having ResourceExhausted map to something other than 429 is a persistent source of confusion.

· Some users specifically want a 429 status on the wire, and go out of their way to build their own (using ResourceExhausted as the trigger).
· Some users don't care how they get a 403 status on the wire, and are confused that there are two options.
· Some users don't see a way to communicate the idea of a 429, so make something else up instead.

What would a new Twirp status code for "please send fewer requests" say about uses of the old one? MDN is pretty clear on 429 being "the user has sent too many requests in a given amount of time" and 403 being "access is permanently forbidden", but the RFC for 403 is less clear (it doesn't say "permanent", and does say "a request might be forbidden for reasons unrelated to the credentials"). Twirp's description of ResourceExhausted doesn't mention time one way or another, saying "some resource has been exhausted".

I think that @marioizquierdo's compatibility analysis is technically correct (the best kind), but I don't agree with the implications of how important various points are. For example, do we know of any code that would ignore a Twirp status code in the message body and instead calculate its own Twirp status code based on the HTTP status (as if the response were from an intermediary)? If that code were to trigger today, per the protocol spec it would convert ResourceExhausted into PermissionDenied every time, even with no change here. It would convert InvalidArgument, Malformed, and OutOfRange into Internal. It would convert Internal into Unknown. I don't think earning a "❌" from breaking that code further (by changing the remapping of ResourceExhausted from PermissionDenied to Unavailable) says much about the proposed change.

How much of the problem could be solved by clarifying the meaning of ResourceExhausted in documentation? That would cover goals 1 and 3 very well. It doesn't cover goal 2, to be consistent with other web/http standards. Can you say more about why that goal is important, @thesilentg?

thesilentg

comment created time in 2 months

issue closed golang/go

runtime/pprof: under-samples work on short-lived threads

What version of Go are you using (go version)?

<pre>
$ go version
go version go1.15rc1 darwin/amd64

$ uname -a
Linux ip-172-31-18-196.us-west-2.compute.internal 4.14.123-111.109.amzn2.x86_64 #1 SMP Mon Jun 10 19:37:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
</pre>

Does this issue reproduce with the latest release?

Yes, this is present in go1.13.14 and go1.14.6, though with slightly different shapes. When I ran with Go 1.13, the profile contained zero samples. When I ran with Go 1.14, the process took about 3 seconds longer to run (7.8s vs 5s) and the profile included 2.2s worth of samples for the function of interest. (Though if this worked as I expect, the profiles would cover the same amount of time as the "time" shell built-in reported.)

What operating system and processor architecture are you using (go env)?

<details><summary><code>go env</code> Output</summary><br><pre> $ go env GO111MODULE="" GOARCH="amd64" GOBIN="" GOCACHE="/Users/rhys/Library/Caches/go-build" GOENV="/Users/rhys/Library/Application Support/go/env" GOEXE="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="darwin" GOINSECURE="" GOMODCACHE="/Users/rhys/go/pkg/mod" GONOPROXY="" GONOSUMDB="" GOOS="darwin" GOPATH="/Users/rhys/go" GOPRIVATE="*" GOPROXY="direct" GOROOT="/usr/local/go" GOSUMDB="off" GOTMPDIR="" GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64" GCCGO="gccgo" AR="ar" CC="clang" CXX="clang++" CGO_ENABLED="1" GOMOD="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/49/zmds5zsn75z1283vtzxyfr5hj7yjq4/T/go-build464588215=/tmp/go-build -gno-record-gcc-switches -fno-common"

</pre></details>

What did you do?

On a linux/amd64 machine, I ran a program that does 5ms of work at a time on each of 1000 worker goroutines, and uses runtime.LockOSThread and runtime.Goexit to force each worker goroutine to be on a different OS thread.

package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

func main() {
	profileName := flag.String("profile", "./hops.pb.gz", "File name for CPU profile")
	tickTime := flag.Duration("tick", 5*time.Millisecond, "How much work to run on each thread")
	iterations := flag.Int("iterations", 1000, "Number of threads to use for work")
	flag.Parse()

	pf, err := os.Create(*profileName)
	if err != nil {
		log.Fatalf("Create; err = %v", err)
	}
	defer func() {
		err := pf.Close()
		if err != nil {
			log.Fatalf("Close; err = %v", err)
		}
	}()
	err = pprof.StartCPUProfile(pf)
	if err != nil {
		log.Fatalf("StartCPUProfile; err = %v", err)
	}
	defer pprof.StopCPUProfile()

	tick := time.Tick(*tickTime)

	var p int64 = 1
	var wg sync.WaitGroup
	for i := 0; i < *iterations; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()

			// Have this code run on its own thread, force the runtime to
			// terminate the thread when this goroutine completes (so the next
			// run will be on a different thread).
			runtime.LockOSThread()
			defer runtime.Goexit()

			for {
				select {
				case <-tick:
					return
				default:
				}

				work(&p)
			}
		}()
		wg.Wait()
	}

	fmt.Printf("%d\n", p)
}

func work(p *int64) {
	for i := 0; i < 1e5; i++ {
		*p *= 3
	}
}

What did you expect to see?

I expected the CPU time reported by the shell's time built-in and the CPU time reported by go tool pprof to roughly match.

What did you see instead?

When doing 5ms of work on each of 1000 unique threads, the shell's time built-in reports 5103ms of user-space CPU time, and go tool pprof reports only 50ms of CPU time.

$ time ./hops.linux_amd64 -profile=./hops.pb.gz -tick=5ms -iterations=1000
-6036671839377369855

real	0m5.105s
user	0m5.103s
sys	0m0.085s

$ go tool pprof -top ./hops.pb.gz
File: hops.linux_amd64
Type: cpu
Time: Jul 27, 2020 at 2:04pm (PDT)
Duration: 5.10s, Total samples = 50ms ( 0.98%)
Showing nodes accounting for 50ms, 100% of 50ms total
      flat  flat%   sum%        cum   cum%
      30ms 60.00% 60.00%       30ms 60.00%  main.work
      10ms 20.00% 80.00%       10ms 20.00%  runtime.(*randomEnum).next (inline)
      10ms 20.00%   100%       10ms 20.00%  runtime.futex
         0     0%   100%       10ms 20.00%  main.main
         0     0%   100%       30ms 60.00%  main.main.func2
         0     0%   100%       10ms 20.00%  runtime.convT64
         0     0%   100%       10ms 20.00%  runtime.findrunnable
         0     0%   100%       10ms 20.00%  runtime.futexsleep
         0     0%   100%       10ms 20.00%  runtime.gcBgMarkStartWorkers
         0     0%   100%       10ms 20.00%  runtime.gcStart
         0     0%   100%       10ms 20.00%  runtime.main
         0     0%   100%       10ms 20.00%  runtime.mallocgc
         0     0%   100%       10ms 20.00%  runtime.mstart
         0     0%   100%       10ms 20.00%  runtime.mstart1
         0     0%   100%       10ms 20.00%  runtime.notetsleep_internal
         0     0%   100%       10ms 20.00%  runtime.notetsleepg
         0     0%   100%       10ms 20.00%  runtime.schedule

closed time in 2 months

rhysh

issue closed golang/go

runtime/pprof: under-samples all work when runtime creates threads

What version of Go are you using (go version)?

<pre>
$ go1.15 version
go version go1.15 darwin/amd64

$ uname -a
Linux ip-172-31-18-196.us-west-2.compute.internal 4.14.123-111.109.amzn2.x86_64 #1 SMP Mon Jun 10 19:37:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ cat /boot/config* | grep HIGH_RES_TIMERS
CONFIG_HIGH_RES_TIMERS=y
</pre>

I'm cross-compiling to GOOS=linux GOARCH=amd64 for the test.

Does this issue reproduce with the latest release?

Yes, this is present in Go 1.15 which is currently the latest release.

What operating system and processor architecture are you using (go env)?

Here's the view from my laptop; the bug is on linux/amd64.

<details><summary><code>go env</code> Output</summary><br><pre> $ go1.15 env GO111MODULE="" GOARCH="amd64" GOBIN="" GOCACHE="/Users/rhys/Library/Caches/go-build" GOENV="/Users/rhys/Library/Application Support/go/env" GOEXE="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="darwin" GOINSECURE="" GOMODCACHE="/Users/rhys/go/pkg/mod" GONOPROXY="" GONOSUMDB="" GOOS="darwin" GOPATH="/Users/rhys/go" GOPRIVATE="*" GOPROXY="direct" GOROOT="/Users/rhys/go/version/go1.15" GOSUMDB="off" GOTMPDIR="" GOTOOLDIR="/Users/rhys/go/version/go1.15/pkg/tool/darwin_amd64" GCCGO="gccgo" AR="ar" CC="clang" CXX="clang++" CGO_ENABLED="1" GOMOD="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/49/zmds5zsn75z1283vtzxyfr5hj7yjq4/T/go-build016610480=/tmp/go-build -gno-record-gcc-switches -fno-common" </pre></details>

What did you do?

This program does work in a single goroutine and collects a CPU profile of itself. It periodically forces a thread to be created to do a trivial amount of processing in a separate goroutine.

I altered the period between new thread creations to be less than the CPU profiling rate (100 Hz, 10ms), somewhat longer than that period, and much longer than that period.

The code for the test is in the "details" immediately below:

<details><summary>bumps.go</summary>

package main

import (
	"context"
	"flag"
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

func main() {
	profileName := flag.String("profile", "./bumps.pb.gz", "File name for CPU profile")
	tickTime := flag.Duration("tick", 5*time.Millisecond, "How long to work before creating a new thread")
	duration := flag.Duration("duration", 5*time.Second, "Total duration of the test")
	flag.Parse()

	pf, err := os.Create(*profileName)
	if err != nil {
		log.Fatalf("Create; err = %v", err)
	}
	defer func() {
		err := pf.Close()
		if err != nil {
			log.Fatalf("Close; err = %v", err)
		}
	}()
	err = pprof.StartCPUProfile(pf)
	if err != nil {
		log.Fatalf("StartCPUProfile; err = %v", err)
	}
	defer pprof.StopCPUProfile()

	ctx, cancel := context.WithTimeout(context.Background(), *duration)
	defer cancel()

	tick := time.Tick(*tickTime)

	var p int64 = 1
	workUntilTick := func() {
		for {
			select {
			case <-tick:
				return
			default:
			}

			work(&p)
		}
	}

	// All work takes place on goroutine 1. Lock it to the current thread so
	// we're sure the work stays on a single, known thread.
	runtime.LockOSThread()

	var wg sync.WaitGroup
	for ctx.Err() == nil {
		workUntilTick()

		wg.Add(1)
		go func() {
			defer wg.Done()

			// Have this code run on its own thread, force the runtime to
			// terminate the thread when this goroutine completes (so the next
			// run will be on a different thread).
			runtime.LockOSThread()
			defer runtime.Goexit()

			// Yield for a moment, try to trigger a call to runtime·execute on
			// this thread.
			runtime.Gosched()
		}()
		wg.Wait()
	}

	fmt.Printf("%d\n", p)
}

func work(p *int64) {
	for i := 0; i < 1e5; i++ {
		*p *= 3
	}
}

</details>

What did you expect to see?

I expected the profile to show the CPU cost of the program's work. I expected it to show the CPU cost of the work equally well as I varied the "tick" parameter: without regard to how often the program created new OS threads.

What did you see instead?

When the program creates threads infrequently, its CPU profile reports a number of samples (each representing 10ms of work) that corresponds to the OS's view of how much CPU time the program spent.

When the program creates threads every 20ms, the resulting CPU profile has about half as many 10ms samples as I'd expect.

When the program creates threads faster than it earns SIGPROF deliveries (less than 10ms), the CPU profile ends up with almost no samples.

I think the problem is that runtime.execute calls runtime.setThreadCPUProfiler when the current thread's profilehz is different from what the program requests, which leads to a setitimer syscall to adjust the profiling rate for the entire process. The test program creates new threads while it is being profiled, and because those threads have never before run Go code their profilehz values are still 0. The result of the setitimer syscall they then make is to reset the kernel's accounting of how much CPU time the process has spent.

$ time ./bumps -tick=1s
40153154464199553

real	0m5.118s
user	0m4.978s
sys	0m0.036s
$ go tool pprof -top /tmp/bumps.pb.gz 
File: bumps
Type: cpu
Time: Aug 21, 2020 at 12:05pm (PDT)
Duration: 5.11s, Total samples = 4.97s (97.19%)
Showing nodes accounting for 4.92s, 98.99% of 4.97s total
Dropped 14 nodes (cum <= 0.02s)
      flat  flat%   sum%        cum   cum%
     4.92s 98.99% 98.99%      4.93s 99.20%  main.work
         0     0% 98.99%      4.93s 99.20%  main.main
         0     0% 98.99%      4.93s 99.20%  main.main.func2
         0     0% 98.99%      4.93s 99.20%  runtime.main
         0     0% 98.99%      0.04s   0.8%  runtime.mstart
         0     0% 98.99%      0.04s   0.8%  runtime.mstart1
         0     0% 98.99%      0.03s   0.6%  runtime.sysmon

$ time ./bumps -tick=20ms
-5752685429589199231

real	0m5.135s
user	0m5.057s
sys	0m0.008s
$ go tool pprof -top ./bumps.pb.gz 
File: bumps
Type: cpu
Time: Aug 21, 2020 at 12:06pm (PDT)
Duration: 5.13s, Total samples = 2.68s (52.25%)
Showing nodes accounting for 2.66s, 99.25% of 2.68s total
Dropped 8 nodes (cum <= 0.01s)
      flat  flat%   sum%        cum   cum%
     2.65s 98.88% 98.88%      2.65s 98.88%  main.work
     0.01s  0.37% 99.25%      0.02s  0.75%  runtime.findrunnable
         0     0% 99.25%      2.65s 98.88%  main.main
         0     0% 99.25%      2.65s 98.88%  main.main.func2
         0     0% 99.25%      2.65s 98.88%  runtime.main
         0     0% 99.25%      0.02s  0.75%  runtime.mstart
         0     0% 99.25%      0.02s  0.75%  runtime.mstart1
         0     0% 99.25%      0.03s  1.12%  runtime.schedule

$ time ./bumps -tick=3ms
7665148222646782337

real	0m5.007s
user	0m4.977s
sys	0m0.344s
$ go tool pprof -top ./bumps.pb.gz 
File: bumps
Type: cpu
Time: Aug 21, 2020 at 12:08pm (PDT)
Duration: 5s, Total samples = 0 
Showing nodes accounting for 0, 0% of 0 total
      flat  flat%   sum%        cum   cum%

closed time in 2 months

rhysh

issue comment golang/go

runtime/pprof: under-samples work on short-lived threads

It looks like this is fixed in tip, likely thanks to CL 240003 / commit bd519d0c8734c3e30cb1a8b8217dd9934cd61e25. The pprof output claims "Total samples = 5.11s", which is a close match to the "user 0m5.106s" that the time shell built-in reports.

$ go-tip version ./hops
./hops: devel +9679b30733 Fri Aug 21 16:52:08 2020 +0000

$ time ./hops -tick=5ms -iterations=1000
-8662753420010608383

real	0m5.132s
user	0m5.106s
sys	0m0.097s

$ go tool pprof -top /tmp/hops.pb.gz 
File: hops
Type: cpu
Time: Aug 21, 2020 at 1:11pm (PDT)
Duration: 5.13s, Total samples = 5.11s (99.65%)
Showing nodes accounting for 4.99s, 97.65% of 5.11s total
Dropped 47 nodes (cum <= 0.03s)
      flat  flat%   sum%        cum   cum%
     4.87s 95.30% 95.30%      4.87s 95.30%  main.work
     0.06s  1.17% 96.48%      0.06s  1.17%  runtime.rtsigprocmask
     0.04s  0.78% 97.26%      0.04s  0.78%  runtime.runqgrab
     0.01s   0.2% 97.46%      0.05s  0.98%  runtime.runqsteal
     0.01s   0.2% 97.65%      0.12s  2.35%  runtime.schedule
         0     0% 97.65%      4.88s 95.50%  main.main.func2
         0     0% 97.65%      0.10s  1.96%  runtime.findrunnable
         0     0% 97.65%      0.05s  0.98%  runtime.minit
         0     0% 97.65%      0.04s  0.78%  runtime.minitSignalMask
         0     0% 97.65%      0.04s  0.78%  runtime.minitSignals
         0     0% 97.65%      0.20s  3.91%  runtime.mstart
         0     0% 97.65%      0.18s  3.52%  runtime.mstart1
         0     0% 97.65%      0.06s  1.17%  runtime.sigprocmask (inline)
rhysh

comment created time in 2 months

issue comment golang/go

runtime/pprof: under-samples all work when runtime creates threads

Yes, this works as I'd expect in tip. CL 240003 / commit bd519d0c8734c3e30cb1a8b8217dd9934cd61e25 looks like what fixed it. Thanks!

$ go-tip version ./bumps
./bumps: devel +9679b30733 Fri Aug 21 16:52:08 2020 +0000

$ time ./bumps -tick=3ms
-6275116732956957439

real	0m5.128s
user	0m5.149s
sys	0m0.209s

$ go tool pprof -top ./bumps.pb.gz 
File: bumps
Type: cpu
Time: Aug 21, 2020 at 12:56pm (PDT)
Duration: 5.12s, Total samples = 5.20s (101.49%)
Showing nodes accounting for 5.11s, 98.27% of 5.20s total
Dropped 26 nodes (cum <= 0.03s)
      flat  flat%   sum%        cum   cum%
     4.84s 93.08% 93.08%      4.84s 93.08%  main.work
     0.12s  2.31% 95.38%      0.12s  2.31%  runtime.futex
     0.07s  1.35% 96.73%      0.07s  1.35%  runtime.runqgrab
     0.02s  0.38% 97.12%      0.03s  0.58%  runtime.checkTimers
     0.02s  0.38% 97.50%      0.03s  0.58%  runtime.lock2
     0.02s  0.38% 97.88%      0.09s  1.73%  runtime.runqsteal
     0.01s  0.19% 98.08%      0.18s  3.46%  runtime.findrunnable
     0.01s  0.19% 98.27%      0.10s  1.92%  runtime.notesleep
         0     0% 98.27%      4.84s 93.08%  main.main
         0     0% 98.27%      4.84s 93.08%  main.main.func2
         0     0% 98.27%      0.09s  1.73%  runtime.futexsleep
         0     0% 98.27%      0.03s  0.58%  runtime.futexwakeup
         0     0% 98.27%      0.03s  0.58%  runtime.lock (inline)
         0     0% 98.27%      0.03s  0.58%  runtime.lockWithRank (inline)
         0     0% 98.27%      4.84s 93.08%  runtime.main
         0     0% 98.27%      0.11s  2.12%  runtime.mcall
         0     0% 98.27%      0.23s  4.42%  runtime.mstart
         0     0% 98.27%      0.23s  4.42%  runtime.mstart1
         0     0% 98.27%      0.03s  0.58%  runtime.notewakeup
         0     0% 98.27%      0.10s  1.92%  runtime.park_m
         0     0% 98.27%      0.33s  6.35%  runtime.schedule
         0     0% 98.27%      0.04s  0.77%  runtime.startlockedm
         0     0% 98.27%      0.10s  1.92%  runtime.stoplockedm
         0     0% 98.27%      0.03s  0.58%  runtime.stopm
rhysh

comment created time in 2 months

issue comment golang/go

runtime/pprof: under-samples work on short-lived threads

I'm curious if we know why this issue happens?

I think it's the call from runtime.execute to runtime.setThreadCPUProfiler, which calls setitimer(2); each M makes that call, and this test program ensures a steady supply of new Ms. It looks like the result of the setitimer syscall is to reset the CPU-time count for the whole process. I filed #40963 for that flavor of the problem.

rhysh

comment created time in 2 months

issue opened golang/go

runtime/pprof: under-samples all work when runtime creates threads

created time in 2 months

pull request comment twitchtv/twirp

Add support for TooManyRequests (429) error code

The example @marioizquierdo shared from grpc-gateway makes it look like "resource exhausted" is synonymous with "too many requests"—that an API responding with error code codes.ResourceExhausted means the gateway converts it unconditionally to HTTP status 429. Does that match how you read it?

@thesilentg , can you share what Google (as a notable gRPC user) does to communicate that a caller is exceeding their rate limit, either for public APIs (maps?) or for GCP? Do they send codes.ResourceExhausted alone, do they send that code but with meta info that spells out the rest of the story, or do they do something else? What other examples can you find for how API providers that ship type-safe standardized/generated SDKs communicate that condition?

What examples are there of Twirp services communicating the idea of "too many requests" to their callers today? What status codes do they use, what additional information do they add to the response?

What examples are there of Twirp services returning the status code twirp.ResourceExhausted to callers today? What do they intend to communicate when they return that response?

Adding a new status code is a wire protocol change. Changing the HTTP code mapping is also a wire protocol change—and it's one of the details of HTTP that Twirp allows developers to ignore. If changes to the wire protocol are on the table, let's consider that whole set of options.

thesilentg

comment created time in 2 months

issue comment golang/go

runtime/pprof: Linux CPU profiles inaccurate beyond 250% CPU use

Work that comes in bursts—causing the process to spend more than 10ms of CPU time in one kernel tick—is systematically under-sampled.

I have a real-world example of this type of work: the garbage collection done by the runtime. One of the apps I work with runs on a machine with a large number of hyperthreads, but typically does only a small amount of application-specific work. A CPU profile (from runtime/pprof.StartCPUProfile) that covers several GC runs shows about 4% of the process's CPU time is spent within runtime.gcBgMarkWorker. An execution trace (from runtime/trace.Start) that covers several GC runs shows (in the "Goroutine analysis" view) that the runtime.gcBgMarkWorker goroutines accounted for about 20% of the total program execution time.

The 4% vs 20% figures don't account for edge effects (the trace covers 3 GC runs and 4 long periods of application work), or for cases where the Go runtime believed it had scheduled a goroutine to a thread but the kernel had in fact suspended the thread. But a 5x difference between the two reports is significant; it aligns with the behavior I've described in this issue, and it appears in a workload that affects any Go program that 1) uses the GC, 2) has many hyperthreads available for use, and 3) is provisioned to use less than 100% CPU.

rhysh

comment created time in 3 months

issue opened golang/go

x/xerrors: tests broken (via vet error) with go1.15rc1

What version of Go are you using (go version)?

<pre>
$ go1.15 version
go version go1.15rc1 darwin/amd64
$ go1.14 version
go version go1.14.6 darwin/amd64
</pre>

Does this issue reproduce with the latest release?

The issue is present in the latest release candidate, go1.15rc1. It is a regression from the latest release, go1.14.6.

What operating system and processor architecture are you using (go env)?

<details><summary><code>go env</code> Output</summary><br><pre>
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/rhys/Library/Caches/go-build"
GOENV="/Users/rhys/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/rhys/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/rhys/go"
GOPRIVATE="*"
GOPROXY="direct"
GOROOT="/usr/local/go"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/tmp/xerr/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/49/zmds5zsn75z1283vtzxyfr5hj7yjq4/T/go-build982147191=/tmp/go-build -gno-record-gcc-switches -fno-common"
</pre></details>

What did you do?

$ mkdir /tmp/xerr && cd /tmp/xerr && go mod init xerr
go: creating new go.mod: module xerr

$ go get golang.org/x/xerrors@master
go: golang.org/x/xerrors master => v0.0.0-20191204190536-9bdfabe68543

$ go1.15 test golang.org/x/xerrors
# golang.org/x/xerrors_test
/Users/rhys/go/pkg/mod/golang.org/x/xerrors@v0.0.0-20191204190536-9bdfabe68543/fmt_test.go:398:43: Sprint arg e causes recursive call to Error method
FAIL	golang.org/x/xerrors [build failed]
FAIL

$ go1.14 test golang.org/x/xerrors
ok  	golang.org/x/xerrors	0.302s

$ go1.15 version
go version go1.15rc1 darwin/amd64

$ go1.14 version
go version go1.14.6 darwin/amd64

What did you expect to see?

I expected the tests for the golang.org/x/xerrors package to succeed without errors.

What did you see instead?

A new vet check in Go 1.15, enabled by default when running tests, detects what looks like a bug in a type that x/xerrors defines for the purpose of testing itself. https://github.com/golang/xerrors/blob/9bdfabe68543/fmt_test.go#L398

created time in 3 months
