
dgl/cgiirc 47

CGI:IRC web based IRC client

dgl/cpangrep 37

Search code on CPAN with regexps. No longer maintained; I suggest using http://grep.metacpan.org/ instead.

dgl/javascript-v8 24

Perl interface to V8 JavaScript engine

dgl/AnyEvent-Redis 5

Asynchronous Redis client

dgl/alertmanager-webhook-signald 4

Alertmanager webhook server for Signald

dgl/ircd_exporter 2

Prometheus exporter for IRC server state

dgl/App-redisp 1

Perl and Redis REPL style shell

dgl/circ 1

An IRC client packaged as a Chrome app

dgl/cpanminus 1

cpanminus - get, unpack, build and install modules from CPAN

issue comment prometheus/alertmanager

Doesn't Prometheus support using rule labels in annotations?

Hi @szediktam, I've hit the same issue as you. Would you mind sharing some information about this? Did you solve it with some configuration change, or by some other means?
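For context, Prometheus makes rule labels available to annotation templates via $labels; a minimal, hypothetical rule sketch (metric, label, and alert names are made up for illustration, not taken from this issue):

groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: rate(http_errors_total[5m]) > 0.1
        labels:
          severity: warning
        annotations:
          # All of the alert's labels, including those set under "labels:" on
          # the rule, are available as $labels; the query value as $value.
          summary: "High error rate on {{ $labels.instance }} ({{ $labels.severity }})"
          description: "Error rate is {{ $value }} for job {{ $labels.job }}."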

szediktam

comment created time in 22 minutes

issue comment prometheus/prometheus

Proposal: Completely remove series after deleting and cleaning

@roidelapluie We're seeing the same behaviour. I deleted the series but it still pops up in the suggestion box.
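For reference, deletion goes through the TSDB admin API and initially only writes tombstones; a hedged sketch of the two calls involved (requires --web.enable-admin-api, and the matcher is just a placeholder). The behaviour discussed in this issue is that the series name can still show up in autocompletion even after both steps:

# Mark matching series as deleted (writes tombstones only).
curl -X POST 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=up{job="some_job"}'
# Remove the tombstoned data from disk.
curl -X POST 'http://localhost:9090/api/v1/admin/tsdb/clean_tombstones'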

trallnag

comment created time in 2 hours

pull request comment prometheus/prometheus

CNAME responses can occur with "Type: A" dns_sd_config requests

Looks like all tests passed, and the proper text was updated, @brian-brazil. Please let me know if there's anything further needed on this PR.

mattberther

comment created time in 3 hours

issue opened prometheus/prometheus

Prometheus raises an out-of-bounds error for all targets after resuming the system from suspend

What did you do?

After suspending the system and resuming it again, Prometheus reports the following error and cannot scrape any new metrics unless the Prometheus service is restarted.

What did you expect to see?

Prometheus should continue to scrape new metrics.

What did you see instead? Under which circumstances?

Check the logs (see the Logs section); in the web UI, I got the following:

[screenshot of the web UI omitted]

Environment

  • System information:
Linux t470p 5.4.79-1-lts #1 SMP Sun, 22 Nov 2020 14:22:21 +0000 x86_64 GNU/Linux
  • Prometheus version:
prometheus, version 2.22.2 (branch: tarball, revision: 2.22.2)
  build user:       someone@builder
  build date:       20201117-18:44:08
  go version:       go1.15.5
  platform:         linux/amd64
# prometheus.service

# /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus service
Requires=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Restart=on-failure
WorkingDirectory=/usr/share/prometheus
EnvironmentFile=-/etc/conf.d/prometheus
ExecStart=/usr/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/dat>
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65535
NoNewPrivileges=true
ProtectHome=true
ProtectSystem=full
ProtectHostname=true
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=true
LockPersonality=true
RestrictRealtime=yes
RestrictNamespaces=yes
MemoryDenyWriteExecute=yes
PrivateDevices=yes
CapabilityBoundingSet=

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/prometheus.service.d/prometheus.conf
[Service]
ExecStart=
ExecStart=/usr/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/home/prometheus $PROME>
ProtectHome=False
  • Prometheus configuration file:
---
global:
  scrape_interval: 300s
  evaluation_interval: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets:
          - localhost:9100
        labels:
          uid: t470
  • Logs:
Dec 01 09:46:29 t470p prometheus[1629652]: level=info ts=2020-12-01T01:46:29.714Z caller=head.go:889 component=tsdb msg="WAL checkpoint complete" first=1668 last=1669 duration=32.511167ms
Dec 01 09:46:29 t470p prometheus[1629652]: level=info ts=2020-12-01T01:46:29.763Z caller=head.go:809 component=tsdb msg="Head GC completed" duration=739.34µs
Dec 01 09:46:29 t470p prometheus[1629652]: level=info ts=2020-12-01T01:46:29.812Z caller=head.go:809 component=tsdb msg="Head GC completed" duration=802.06µs
Dec 01 09:46:29 t470p prometheus[1629652]: level=info ts=2020-12-01T01:46:29.812Z caller=checkpoint.go:96 component=tsdb msg="Creating checkpoint" from_segment=1670 to_segment=1671 mint=1606780800000
Dec 01 09:46:29 t470p prometheus[1629652]: level=info ts=2020-12-01T01:46:29.822Z caller=head.go:889 component=tsdb msg="WAL checkpoint complete" first=1670 last=1671 duration=10.152588ms
Dec 01 09:46:47 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:46:47.293Z caller=scrape.go:1378 component="scrape manager" scrape_pool=ssh target="http://127.0.0.1:9115/probe?module=ssh_banner&target=172.20.149.141%3A22" msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=6
Dec 01 09:46:47 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:46:47.293Z caller=scrape.go:1145 component="scrape manager" scrape_pool=ssh target="http://127.0.0.1:9115/probe?module=ssh_banner&target=xxxx%3A22" msg="Append failed" err="out of bounds"
Dec 01 09:46:47 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:46:47.293Z caller=scrape.go:1094 component="scrape manager" scrape_pool=ssh target="http://127.0.0.1:9115/probe?module=ssh_banner&target=xxx%3A22" msg="Appending scrape report failed" err="out of bounds"
Dec 01 09:46:56 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:46:56.162Z caller=scrape.go:1378 component="scrape manager" scrape_pool=blackbox target="http://127.0.0.1:9115/probe?module=http_2xx&target=http%3A%2F%2Fxxx" msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=17
Dec 01 09:46:56 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:46:56.162Z caller=scrape.go:1145 component="scrape manager" scrape_pool=blackbox target="http://127.0.0.1:9115/probe?module=http_2xx&target=http%3A%2F%2Fwww.baidu.com" msg="Append failed" err="out of bounds"
Dec 01 09:46:56 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:46:56.162Z caller=scrape.go:1094 component="scrape manager" scrape_pool=blackbox target="http://127.0.0.1:9115/probe?module=http_2xx&target=http%3A%2F%2Fwww.baidu.com" msg="Appending scrape report failed" err="out of bounds"
Dec 01 09:47:01 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:47:01.836Z caller=scrape.go:1378 component="scrape manager" scrape_pool=gitea target=http://localhost:10080/metrics msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=67
Dec 01 09:47:01 t470p prometheus[1629652]: level=warn ts=2020-12-01T01:47:01.836Z caller=scrape.go:1145 component="scrape manager" scrape_pool=gitea target=http://localhost:10080/metrics msg="Append failed" err="out of bounds"

created time in 3 hours

Pull request review comment prometheus/prometheus

CNAME responses can occur with "Type: A" dns_sd_config requests

 func (d *Discovery) refreshOne(ctx context.Context, name string, ch chan<- *targ
 			target = hostPort(addr.A.String(), d.port)
 		case *dns.AAAA:
 			target = hostPort(addr.AAAA.String(), d.port)
+		case *dns.CNAME:
+			// Ignore to prevent warning message from default case.

I misunderstood what you meant by the comment; I thought you were referring to the title of the PR. My mistake, I'll get it adjusted.

mattberther

comment created time in 4 hours

Pull request review comment prometheus/prometheus

CNAME responses can occur with "Type: A" dns_sd_config requests

 func (d *Discovery) refreshOne(ctx context.Context, name string, ch chan<- *targ
 			target = hostPort(addr.A.String(), d.port)
 		case *dns.AAAA:
 			target = hostPort(addr.AAAA.String(), d.port)
+		case *dns.CNAME:
+			// Ignore to prevent warning message from default case.

This comment is still not useful. I can already tell this from the code alone; instead, explain why this is the right thing to do.
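For illustration only, one possible wording of the kind of explanatory comment being asked for (my own assumption about the reasoning, not the text that ended up in the PR):

+		case *dns.CNAME:
+			// A query for A/AAAA records on a name that is an alias also returns
+			// the CNAME record itself. The addresses we need arrive as the
+			// accompanying A/AAAA records, so the CNAME entry carries no target
+			// and can be skipped without triggering the default-case warning.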

mattberther

comment created time in 4 hours

pull request comment prometheus/prometheus

CNAME responses can occur with "Type: A" dns_sd_config requests

I've kicked it off.

mattberther

comment created time in 4 hours

pull request comment prometheus/prometheus

CNAME responses can occur with "Type: A" dns_sd_config requests

@brian-brazil I've made the proposed changes. There seems to be a CircleCI test that is failing. I don't expect that my change to an error message caused the described failure (since the pipeline passed on the initial PR):

Failed
=== RUN   TestHandleMultipleQuitRequests
    web_test.go:492: 
        	Error Trace:	web_test.go:492
        	            				asm_amd64.s:1374
        	Error:      	Received unexpected error:
        	            	Post "http://localhost:9090/-/quit": EOF
        	Test:       	TestHandleMultipleQuitRequests
--- FAIL: TestHandleMultipleQuitRequests (5.01s)

However, I see no way to re-run the workflow (presumably because I do not have write access). Is this something that you can kick off, or is there another way for me to re-run the workflow?

mattberther

comment created time in 4 hours

issue closed prometheus/prometheus

Prometheus hangs without log message

What did you do?

$ while curl 10.10.2.4:9090/-/healthy; do date; done
Sun May 31 18:18:11 UTC 2020
Prometheus is Healthy.
Sun May 31 18:18:11 UTC 2020
Prometheus is Healthy.
Sun May 31 18:18:11 UTC 2020
Prometheus is Healthy.
Sun May 31 18:18:29 UTC 2020
Prometheus is Healthy.
Sun May 31 18:18:29 UTC 2020
Prometheus is Healthy.
Sun May 31 18:18:29 UTC 2020

What did you expect to see?

Either a healthy message at least once every 2 seconds or a warning message in the Prometheus logs.

What did you see instead? Under which circumstances?

Prometheus unresponsive, here for 18 seconds, later for over 30 seconds (long enough to get it killed), and nothing in the Prometheus log between "Server is ready to receive web requests" and "Received SIGTERM, exiting gracefully".

Prometheus is idle, with compaction disabled because it is running Thanos sidecar to upload data to S3.

Prometheus has 1 full CPU and 4Gi memory allocated, and no indication it is using more than 1.5Gi or being killed because the node is OOM. This is a quiet cluster with total allocated memory Limits lower than total available memory.

Environment

Prometheus 2.18.1 on Kubernetes 1.15.10 EKS. Running 2 replicas; both replicas (on separate nodes) exhibit the same behavior.

  • System information:
$ uname -srm
Linux 4.14.165-133.209.amzn2.x86_64 x86_64
  • Prometheus version:
$ prometheus --version
prometheus, version 2.18.1 (branch: HEAD, revision: ecee9c8abfd118f139014cb1b174b08db3f342cf)
  build user:       root@2117a9e64a7e
  build date:       20200507-16:51:47
  go version:       go1.14.2
  • Prometheus configuration file:
global:
  evaluation_interval: 30s
  scrape_interval: 30s
  external_labels:
    cluster: test
    prometheus: monitoring/po-prometheus
    prometheus_replica: prometheus-po-prometheus-1

plus jobs and rules from CoreOS Prometheus Operator

  • Prometheus command line args:
      args:
        - '--web.console.templates=/etc/prometheus/consoles'
        - '--web.console.libraries=/etc/prometheus/console_libraries'
        - '--config.file=/etc/prometheus/config_out/prometheus.env.yaml'
        - '--storage.tsdb.path=/prometheus'
        - '--storage.tsdb.retention.time=7h'
        - '--web.enable-lifecycle'
        - '--storage.tsdb.no-lockfile'
        - '--web.enable-admin-api'
        - '--web.external-url=https://prometheus.redacted.com'
        - '--web.route-prefix=/'
        - '--log.format=json'
        - '--storage.tsdb.max-block-duration=2h'


  • Logs: log extract (omissions noted with "snip")

{"caller":"main.go:337","level":"info","msg":"Starting Prometheus","ts":"2020-05-31T18:45:54.980Z","version":"(version=2.18.1, branch=HEAD, revision=ecee9c8abfd118f139014cb1b174b08db3f342cf)"}
{"build_context":"(go=go1.14.2, user=root@2117a9e64a7e, date=20200507-16:51:47)","caller":"main.go:338","level":"info","ts":"2020-05-31T18:45:54.980Z"}
{"caller":"main.go:339","host_details":"(Linux 4.14.165-133.209.amzn2.x86_64 #1 SMP Sun Feb 9 00:21:30 UTC 2020 x86_64 prometheus-po-prometheus-0 (none))","level":"info","ts":"2020-05-31T18:45:54.980Z"}
{"caller":"main.go:340","fd_limits":"(soft=65536, hard=65536)","level":"info","ts":"2020-05-31T18:45:54.981Z"}
{"caller":"main.go:341","level":"info","ts":"2020-05-31T18:45:54.981Z","vm_limits":"(soft=unlimited, hard=unlimited)"}
{"caller":"query_logger.go:79","component":"activeQueryTracker","level":"info","msg":"These queries didn't finish in prometheus' last run:","queries":"[{\"query\":\"sum(rate(container_network_transmit_bytes_total{pod=~\\\"ingress-nginx-ingress-controller-hffqg\\\",namespace=\\\"kube-system\\\"}[1m])) by (container, namespace)\",\"timestamp_sec\":1590950448},{\"query\":\"sum(kube_pod_container_resource_requests{pod=~\\\"prometheus-po-prometheus-0\\\",resource=\\\"memory\\\",namespace=\\\"monitoring\\\"}) by (container, namespace)\",\"timestamp_sec\":1590950448},{\"query\":\"sum(rate(container_cpu_usage_seconds_total{container!=\\\"POD\\\",container!=\\\"\\\",pod=~\\\"prometheus-po-prometheus-0\\\",namespace=\\\"monitoring\\\"}[1m])) by (container, namespace)\",\"timestamp_sec\":1590950448}]","ts":"2020-05-31T18:45:54.985Z"}
{"caller":"main.go:678","level":"info","msg":"Starting TSDB ...","ts":"2020-05-31T18:45:55.015Z"}

snip

{"caller":"head.go:627","component":"tsdb","duration":"31.459049573s","level":"info","msg":"WAL replay completed","ts":"2020-05-31T18:46:26.704Z"}
{"caller":"main.go:694","fs_type":"NFS_SUPER_MAGIC","level":"info","ts":"2020-05-31T18:46:26.964Z"}
{"caller":"main.go:695","level":"info","msg":"TSDB started","ts":"2020-05-31T18:46:26.964Z"}
{"caller":"main.go:799","filename":"/etc/prometheus/config_out/prometheus.env.yaml","level":"info","msg":"Loading configuration file","ts":"2020-05-31T18:46:26.964Z"}
{"caller":"kubernetes.go:253","component":"discovery manager scrape","discovery":"k8s","level":"info","msg":"Using pod service account via in-cluster config","ts":"2020-05-31T18:46:26.968Z"}
{"caller":"kubernetes.go:253","component":"discovery manager scrape","discovery":"k8s","level":"info","msg":"Using pod service account via in-cluster config","ts":"2020-05-31T18:46:26.970Z"}
{"caller":"kubernetes.go:253","component":"discovery manager scrape","discovery":"k8s","level":"info","msg":"Using pod service account via in-cluster config","ts":"2020-05-31T18:46:26.970Z"}
{"caller":"kubernetes.go:253","component":"discovery manager notify","discovery":"k8s","level":"info","msg":"Using pod service account via in-cluster config","ts":"2020-05-31T18:46:26.971Z"}
{"caller":"main.go:827","filename":"/etc/prometheus/config_out/prometheus.env.yaml","level":"info","msg":"Completed loading of configuration file","ts":"2020-05-31T18:46:27.184Z"}
{"caller":"main.go:646","level":"info","msg":"Server is ready to receive web requests.","ts":"2020-05-31T18:46:27.184Z"}
{"caller":"main.go:524","level":"warn","msg":"Received SIGTERM, exiting gracefully...","ts":"2020-05-31T18:48:26.766Z"}

Turning on debugging, I can see this (the healthy endpoint was unresponsive from 2020-05-31T20:35:25 to 2020-05-31T20:35:57, the SIGTERM coming presumably because the health probe failureThreshold was exceeded):

{"caller":"klog.go:53","component":"k8s_client_runtime","func":"Verbose.Infof","level":"debug","msg":"caches populated","ts":"2020-05-31T20:34:49.570Z"}
{"caller":"scrape.go:962","component":"scrape manager","err":"Get \"http://10.10.15.65:9090/metrics\": context deadline exceeded","level":"debug","msg":"Scrape failed","scrape_pool":"monitoring/po-prometheus/0","target":"http://10.10.15.65:9090/metrics","ts":"2020-05-31T20:35:25.783Z"}
{"alertname":"KubeSchedulerDown","caller":"manager.go:783","component":"rule manager","group":"kubernetes-system-scheduler","labels":"{alertname=\"KubeSchedulerDown\", severity=\"critical\"}","level":"debug","msg":"'for' state restored","restored_time":"Saturday, 30-May-20 11:20:31 UTC","ts":"2020-05-31T20:35:57.170Z"}
{"alertname":"KubeControllerManagerDown","caller":"manager.go:783","component":"rule manager","group":"kubernetes-system-controller-manager","labels":"{alertname=\"KubeControllerManagerDown\", severity=\"critical\"}","level":"debug","msg":"'for' state restored","restored_time":"Saturday, 30-May-20 11:20:05 UTC","ts":"2020-05-31T20:35:57.170Z"}
{"alertname":"KubeVersionMismatch","caller":"manager.go:783","component":"rule manager","group":"kubernetes-system","labels":"{alertname=\"KubeVersionMismatch\", severity=\"warning\"}","level":"debug","msg":"'for' state restored","restored_time":"Saturday, 30-May-20 11:20:14 UTC","ts":"2020-05-31T20:35:57.175Z"}
{"alertname":"TargetDown","caller":"manager.go:783","component":"rule manager","group":"general.rules","labels":"{alertname=\"TargetDown\", job=\"po-prometheus\", namespace=\"monitoring\", service=\"po-prometheus\", severity=\"warning\"}","level":"debug","msg":"'for' state restored","restored_time":"Sunday, 31-May-20 20:35:57 UTC","ts":"2020-05-31T20:35:57.182Z"}
{"alertname":"TargetDown","caller":"manager.go:783","component":"rule manager","group":"general.rules","labels":"{alertname=\"TargetDown\", job=\"kubelet\", namespace=\"kube-system\", service=\"po-kubelet\", severity=\"warning\"}","level":"debug","msg":"'for' state restored","restored_time":"Sunday, 31-May-20 20:35:57 UTC","ts":"2020-05-31T20:35:57.182Z"}
{"alertname":"KubePodCrashLooping","caller":"manager.go:783","component":"rule manager","group":"kubernetes-apps","labels":"{alertname=\"KubePodCrashLooping\", container=\"prometheus\", endpoint=\"http\", instance=\"10.10.17.202:8080\", job=\"kube-state-metrics\", namespace=\"monitoring\", pod=\"prometheus-po-prometheus-1\", service=\"stable-po-kube-state-metrics\", severity=\"critical\"}","level":"debug","msg":"'for' state restored","restored_time":"Sunday, 31-May-20 20:30:57 UTC","ts":"2020-05-31T20:35:57.247Z"}
{"alertname":"KubePodCrashLooping","caller":"manager.go:783","component":"rule manager","group":"kubernetes-apps","labels":"{alertname=\"KubePodCrashLooping\", container=\"prometheus\", endpoint=\"http\", instance=\"10.10.17.202:8080\", job=\"kube-state-metrics\", namespace=\"monitoring\", pod=\"prometheus-po-prometheus-0\", service=\"stable-po-kube-state-metrics\", severity=\"critical\"}","level":"debug","msg":"'for' state restored","restored_time":"Sunday, 31-May-20 20:30:57 UTC","ts":"2020-05-31T20:35:57.247Z"}
{"caller":"main.go:524","level":"warn","msg":"Received SIGTERM, exiting gracefully...","ts":"2020-05-31T20:37:52.433Z"}


closed time in 7 hours

Nuru

issue comment prometheus/prometheus

Prometheus hangs without log message

I am closing this bug. Please reopen it if you can still reproduce it after upgrading to 2.21.

Nuru

comment created time in 7 hours

Pull request review comment prometheus/prometheus

Guard closing quitCh with sync.Once to prevent double close

 func (h *Handler) version(w http.ResponseWriter, r *http.Request) {
 }
 
 func (h *Handler) quit(w http.ResponseWriter, r *http.Request) {
-	select {
-	case <-h.quitCh:
+	var stopped bool
+	h.quitOnce.Do(func() {
+		stopped = true
+		close(h.quitCh)
+	})
+	if stopped {

This should be the opposite

johejo

comment created time in 9 hours

pull request comment prometheus/prometheus

Consider status code 429 as recoverable errors to avoid resharding

Ah, I feel that rate limiting is something driven by the remote storage. That means the remote storage should get more control over how it wants a particular request to be handled, which gives it the ability to deal with the situation and recover from it. So the response header should be a good way forward here.

Harkishen-Singh

comment created time in 10 hours

pull request comment prometheus/prometheus

Consider status code 429 as recoverable errors to avoid resharding

The Retry-After response header may be returned by remote storage to tell the client (remote write) how long to wait before trying another send. It is called out in https://tools.ietf.org/html/rfc6585#section-4.

To be clear, I am not saying implement the Retry-After logic. I am just not sure the 429 behavior should be the exact same as 5XX retry behavior, and would like a discussion. I am curious what @cstyan thinks too.

Harkishen-Singh

comment created time in 10 hours

pull request comment prometheus/prometheus

Guard closing quitCh with sync.Once to prevent double close

Thanks, good suggestion. I think it's better that only Handler.Quit (e.g. from the main module) can read quitCh.

johejo

comment created time in 12 hours

pull request comment prometheus/prometheus

Consider status code 429 as recoverable errors to avoid resharding

I didn't get the Retry-After thing. Is it something the remote storage returns to the remote-write component, a time before which it should not retry?

Harkishen-Singh

comment created time in 12 hours

pull request comment prometheus/prometheus

Guard closing quitCh with sync.Once to prevent double close

You could instead revert #8166 and apply this to the full function?

pseudo code

var stopped bool
quitOnce.Do {
  stopped = true
  print(Quitting)
  close(quit)
}
if !stopped {
  print(Exit in progress)
}
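A rough, self-contained Go sketch of that suggestion (the handler shape, field names, and response messages are illustrative, not the exact Prometheus code):

package web

import (
	"fmt"
	"net/http"
	"sync"
)

type Handler struct {
	quitCh   chan struct{}
	quitOnce sync.Once
}

func (h *Handler) quit(w http.ResponseWriter, r *http.Request) {
	var stopped bool
	// The Once guards both the response message and the close, so a second
	// POST /-/quit cannot close the channel again and panic.
	h.quitOnce.Do(func() {
		stopped = true
		fmt.Fprintf(w, "Requesting termination... Goodbye!")
		close(h.quitCh)
	})
	if !stopped {
		fmt.Fprintf(w, "Termination already in progress.")
	}
}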
johejo

comment created time in 13 hours

issue comment prometheus/alertmanager

Old peer is not forgotten properly

the error is NOT resolved: old peers are not forgotten

How long have you waited? Alertmanager removes permanently failed peers every 15 minutes (irrespective of --cluster.reconnect-timeout). Also, this doesn't apply to the peers given via --cluster.peer.

csonp

comment created time in 13 hours

issue comment prometheus/alertmanager

AlertManager not sending all alerts to Webhook endpoint.

@andrewipmtl can you share the new config?

andrewipmtl

comment created time in 13 hours

PR opened prometheus/prometheus

Guard closing quitCh with sync.Once to prevent double close

related #8144

See https://github.com/prometheus/prometheus/issues/8144#issuecomment-735814282

Signed-off-by: Mitsuo Heijo mitsuo.heijo@gmail.com


+4 -1

0 comment

1 changed file

pr created time in 14 hours

pull request comment prometheus/prometheus

Consider status code 429 as recoverable errors to avoid resharding

First, thanks for working on this; I think something like this behavior is certainly useful, as dropping 429s indiscriminately is not ideal.

That said, I am not sure we ever decided what the best behavior on a 429 is. If you encounter a 429 in regular operation, chances are your remote write will just continue to fall behind until you force a restart, dropping a significant chunk of data. Right now the 429 behavior can be useful because dropping some samples basically means less resolution in your data, which for many people might be better than not having any recent data at all.

Configuration of what to do on a 429 would be OK, but are there other ideas for handling rate limits more gracefully? Sending a new request that will probably fail 30ms later doesn't seem great. A limited number of retries plus respect for the Retry-After header could be one option? This might end up relating to #7912.
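A rough sketch of the direction floated above, bounded retries plus honouring Retry-After; this is an assumption about one possible approach, not what the remote-write code actually does:

package remote

import (
	"net/http"
	"strconv"
	"time"
)

// retryDelay prefers the server-supplied Retry-After value (either delta-seconds
// or an HTTP-date) and falls back to a default delay otherwise.
func retryDelay(resp *http.Response, fallback time.Duration) time.Duration {
	if v := resp.Header.Get("Retry-After"); v != "" {
		if secs, err := strconv.Atoi(v); err == nil && secs >= 0 {
			return time.Duration(secs) * time.Second
		}
		if t, err := http.ParseTime(v); err == nil {
			return time.Until(t)
		}
	}
	return fallback
}

// A caller could then treat 429 as recoverable but cap the attempts: retry while
// resp.StatusCode == http.StatusTooManyRequests and attempts < max, sleeping
// retryDelay(resp, fallback) between sends.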

Harkishen-Singh

comment created time in 14 hours

issue comment prometheus/prometheus

Error: http: superfluous response.WriteHeader call

I get the same error

Client: Docker Engine - Community
 Cloud integration: 1.0.2
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46d9d
 Built:             Wed Sep 16 16:58:31 2020
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:07:04 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.3.7
  GitCommit:        8fba4e9a7d01810a393d5d25a3621dc101981175
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

I'm using Skaffold.

bandesz

comment created time in 14 hours

issue comment prometheus/alertmanager

Web UI: JavaScript infinite loop on Firefox

Can you share more details on your Alertmanager setup (CLI flags, config, ...)?

brpaz

comment created time in 14 hours

issue comment prometheus/prometheus

panic - close of closed channel - after POST /-/quit

This issue doesn't seem to be fixed.

In my env, the test added in #8166 is flaky.

$ git log -n 1
commit a6e18916ab4090a21e222e040458b52ca5053cf7 (HEAD -> master, origin/master, origin/HEAD)
Author: johncming <johncming@yahoo.com>
Date:   Mon Nov 30 16:55:33 2020 +0800

    tsdb: Remove duplicate variables. (#8239)

    Signed-off-by: johncming <johncming@yahoo.com>
$ for i in `seq 10`; do go test ./web -run TestHandleMultipleQuitRequests -count=1 -v; done
=== RUN   TestHandleMultipleQuitRequests
--- PASS: TestHandleMultipleQuitRequests (5.00s)
PASS
ok      github.com/prometheus/prometheus/web    5.027s
=== RUN   TestHandleMultipleQuitRequests
--- PASS: TestHandleMultipleQuitRequests (5.00s)
PASS
ok      github.com/prometheus/prometheus/web    5.026s
=== RUN   TestHandleMultipleQuitRequests
    web_test.go:492:
                Error Trace:    web_test.go:492
                                                        asm_amd64.s:1374
                Error:          Received unexpected error:
                                Post "http://localhost:9090/-/quit": EOF
                Test:           TestHandleMultipleQuitRequests
--- FAIL: TestHandleMultipleQuitRequests (5.00s)
FAIL
FAIL    github.com/prometheus/prometheus/web    5.028s
FAIL
=== RUN   TestHandleMultipleQuitRequests
    web_test.go:492:
                Error Trace:    web_test.go:492
                                                        asm_amd64.s:1374
                Error:          Received unexpected error:
                                Post "http://localhost:9090/-/quit": EOF
                Test:           TestHandleMultipleQuitRequests
--- FAIL: TestHandleMultipleQuitRequests (5.00s)
FAIL
FAIL    github.com/prometheus/prometheus/web    5.026s
FAIL
=== RUN   TestHandleMultipleQuitRequests
    web_test.go:492:
                Error Trace:    web_test.go:492
                                                        asm_amd64.s:1374
                Error:          Received unexpected error:
                                Post "http://localhost:9090/-/quit": EOF
                Test:           TestHandleMultipleQuitRequests
    web_test.go:492:
                Error Trace:    web_test.go:492
                                                        asm_amd64.s:1374
                Error:          Received unexpected error:
                                Post "http://localhost:9090/-/quit": EOF
                Test:           TestHandleMultipleQuitRequests
--- FAIL: TestHandleMultipleQuitRequests (5.00s)
FAIL
FAIL    github.com/prometheus/prometheus/web    5.027s
FAIL
=== RUN   TestHandleMultipleQuitRequests
--- PASS: TestHandleMultipleQuitRequests (5.00s)
PASS
ok      github.com/prometheus/prometheus/web    5.024s
=== RUN   TestHandleMultipleQuitRequests
    web_test.go:492:
                Error Trace:    web_test.go:492
                                                        asm_amd64.s:1374
                Error:          Received unexpected error:
                                Post "http://localhost:9090/-/quit": EOF
                Test:           TestHandleMultipleQuitRequests
--- FAIL: TestHandleMultipleQuitRequests (5.00s)
FAIL
FAIL    github.com/prometheus/prometheus/web    5.027s
FAIL
=== RUN   TestHandleMultipleQuitRequests
    web_test.go:492:
                Error Trace:    web_test.go:492
                                                        asm_amd64.s:1374
                Error:          Received unexpected error:
                                Post "http://localhost:9090/-/quit": EOF
                Test:           TestHandleMultipleQuitRequests
--- FAIL: TestHandleMultipleQuitRequests (5.00s)
FAIL
FAIL    github.com/prometheus/prometheus/web    5.026s
FAIL
=== RUN   TestHandleMultipleQuitRequests
    web_test.go:492:
                Error Trace:    web_test.go:492
                                                        asm_amd64.s:1374
                Error:          Received unexpected error:
                                Post "http://localhost:9090/-/quit": EOF
                Test:           TestHandleMultipleQuitRequests
--- FAIL: TestHandleMultipleQuitRequests (5.00s)
FAIL
FAIL    github.com/prometheus/prometheus/web    5.026s
FAIL
=== RUN   TestHandleMultipleQuitRequests
    web_test.go:492:
                Error Trace:    web_test.go:492
                                                        asm_amd64.s:1374
                Error:          Received unexpected error:
                                Post "http://localhost:9090/-/quit": EOF
                Test:           TestHandleMultipleQuitRequests
    web_test.go:492:
                Error Trace:    web_test.go:492
                                                        asm_amd64.s:1374
                Error:          Received unexpected error:
                                Post "http://localhost:9090/-/quit": EOF
                Test:           TestHandleMultipleQuitRequests
--- FAIL: TestHandleMultipleQuitRequests (5.00s)
FAIL
FAIL    github.com/prometheus/prometheus/web    5.027s
FAIL
jamiecyber

comment created time in 15 hours

pull request comment prometheus/prometheus

Adding the option to scrape per network

Unfortunately, I checked, and if the label doesn't exist it doesn't catch the instance.

I don't understand what you mean here; I've not worked with this bit of GCE.

Don't you think that the network is a basic thing that should be configured the same as the zone and not as a label?

I also don't know this; however, I know that any filtering like this should be done with relabelling. If we lack the metadata to make that possible today, I'm open to adding it.

shirpx

comment created time in 16 hours

pull request comment prometheus/prometheus

Adding the option to scrape per network

Unfortunately, I checked, and if the label doesn't exist the filter doesn't catch the instance (feel free to check as well through the API: https://cloud.google.com/compute/docs/reference/rest/v1/instances/list). But back to the PR: don't you think that the network is a basic thing that should be configured the same way as the zone, and not as a label?

shirpx

comment created time in 16 hours

pull request comment prometheus/prometheus

Adding the option to scrape per network

I'm not familiar with this part of GCE, but relabelling can match the empty string as it's using regexes. So you shouldn't need to change anything about your setup, only choose appropriate relabelling actions.
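A hypothetical relabel_configs sketch of that approach, dropping instances by a GCE instance label while keeping instances that do not carry the label at all (project, zone, and label values are placeholders taken from this thread, not tested against the PR):

scrape_configs:
  - job_name: gce
    gce_sd_configs:
      - project: my-project   # placeholder
        zone: us-central1-a   # placeholder
    relabel_configs:
      # __meta_gce_label_<name> exposes GCE instance labels; instances without
      # a "network" label have an empty value here, don't match, and are kept.
      - source_labels: [__meta_gce_label_network]
        regex: example-vpc
        action: drop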

shirpx

comment created time in 17 hours

pull request comment prometheus/prometheus

Adding the option to scrape per network

You are right that this can be done through relabelling, but when trying to filter out with !=, e.g. filter: 'labels.network!=example-vpc', it does not find instances that don't have the network label configured. That means we would need to add a network label to every instance, which is OK for some environments, but for us it's a problem in our current setup (a huge environment, all baked), which is what led me to open this PR. I thought others might find this useful, but maybe it's only good for our use case. Thanks anyway.

shirpx

comment created time in 17 hours

pull request comment prometheus/prometheus

Adding the option to scrape per network

This appears to be custom client-side filter logic; if you're looking to do something like this, it should be done via relabelling rather than each SD reinventing ways to do arbitrary target manipulation. To me it looks like the required metadata is already available to relabelling for this, but if not it can be added.

shirpx

comment created time in 18 hours

PR opened prometheus/prometheus

Adding the option to scrape per network

Currently it's not possible to filter GCE instances by network, because the GCE API does not support filtering on the fields of repeated message fields. Public issue tracker: https://issuetracker.google.com/issues/73455339

In the meantime, this PR will allow filtering by network.

+10 -0

0 comment

1 changed file

pr created time in 18 hours

pull request comment prometheus/prometheus

Update remote-write grafana mixin

Should we keep that backwards-compatible?

To clarify, that would be using or: rate(new[5m]) or rate(old[5m])

done.
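Spelled out with placeholder metric names (illustrative only, not the actual mixin metrics), the backwards-compatible form reads:

# "or" keeps the new series where they exist and falls back to the old name
# on Prometheus versions that predate the rename.
rate(new_metric_name_total[5m]) or rate(old_metric_name_total[5m])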

Allex1

comment created time in 20 hours
