Simon Pasquier (simonpasquier), Red Hat. Working on logging, monitoring, alerting. Twitter: https://twitter.com/SimonHiker

rhobs/kube-events-exporter 21

Kubernetes events aggregator and exporter

simonpasquier/alertmanager 0

Prometheus Alertmanager

simonpasquier/alertmanager2es 0

Receives HTTP webhook notifications from AlertManager and inserts them into an Elasticsearch index for searching and analysis
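As a rough illustration of what such a bridge has to do, here is a minimal Go sketch that decodes the JSON body Alertmanager POSTs to a `webhook_configs` URL and flattens it into one document per alert, ready to be indexed. The Elasticsearch write itself is left out, and the `alertDoc` type and `toDocs` function are hypothetical names, not part of alertmanager2es.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// webhookMessage models the subset of the Alertmanager webhook payload used here.
type webhookMessage struct {
	Version  string         `json:"version"`
	GroupKey string         `json:"groupKey"`
	Status   string         `json:"status"`
	Receiver string         `json:"receiver"`
	Alerts   []webhookAlert `json:"alerts"`
}

// webhookAlert is a single alert inside the webhook payload.
type webhookAlert struct {
	Status      string            `json:"status"`
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
	StartsAt    time.Time         `json:"startsAt"`
	EndsAt      time.Time         `json:"endsAt"`
}

// alertDoc is a hypothetical flattened document, one per alert, for indexing.
type alertDoc struct {
	Receiver    string            `json:"receiver"`
	GroupKey    string            `json:"groupKey"`
	Status      string            `json:"status"`
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
	StartsAt    time.Time         `json:"startsAt"`
}

// toDocs decodes a webhook body and returns one document per alert,
// copying the group-level receiver and group key into each document.
func toDocs(body []byte) ([]alertDoc, error) {
	var msg webhookMessage
	if err := json.Unmarshal(body, &msg); err != nil {
		return nil, fmt.Errorf("decoding webhook payload: %w", err)
	}
	docs := make([]alertDoc, 0, len(msg.Alerts))
	for _, a := range msg.Alerts {
		docs = append(docs, alertDoc{
			Receiver:    msg.Receiver,
			GroupKey:    msg.GroupKey,
			Status:      a.Status,
			Labels:      a.Labels,
			Annotations: a.Annotations,
			StartsAt:    a.StartsAt,
		})
	}
	return docs, nil
}

func main() {
	// Illustrative payload; a real one carries more fields (groupLabels, externalURL, ...).
	body := []byte(`{"version":"4","groupKey":"{}:{alertname=\"Watchdog\"}","status":"firing","receiver":"es",
"alerts":[{"status":"firing","labels":{"alertname":"Watchdog"},"startsAt":"2020-12-08T10:00:00Z"}]}`)
	docs, err := toDocs(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(docs), docs[0].Labels["alertname"], docs[0].Status)
}
```

Emitting one document per alert (rather than one per notification) keeps each indexed record searchable by its own labels; that split is a design choice of this sketch.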

simonpasquier/ambench 0

Tool to perform load tests on the Prometheus Alertmanager project.

simonpasquier/appdash 0

Application tracing system for Go, based on Google's Dapper.

simonpasquier/azure-pipeline-go 0

Package pipeline implements an HTTP request/response middleware pipeline whose policy objects mutate an HTTP request's URL, query parameters, and/or headers before the request is sent over the wire. This is the Go implementation.

simonpasquier/azure-storage-blob-go 0

Microsoft Azure Blob Storage Library for Go

pull request comment openshift/cluster-monitoring-operator

[release-4.6] Bug 1904091: Exporting registry v1 protocol usage metric

@openshift-bot: This pull request references Bugzilla bug 1904091, which is invalid:

  • expected dependent Bugzilla bug 1885856 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

Comment <code>/bugzilla refresh</code> to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

<details>

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. </details>

ricardomaraschini

comment created time in 20 minutes

pull request comment openshift/cluster-monitoring-operator

[release-4.6] Bug 1904091: Exporting registry v1 protocol usage metric

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

ricardomaraschini

comment created time in 20 minutes

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

@paulfantom: The following test failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
ci/prow/e2e-agnostic-operator | 797724ff41687cdb6620e94fbfb682f6b7c1fd5c | link | `/test e2e-agnostic-operator`

Full PR test history. Your PR dashboard.

<details>

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. </details> <!-- test report -->

paulfantom

comment created time in 21 minutes
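For context on the error named in this PR title: in PromQL vector matching, series are grouped by their matching labels (those listed in `on(...)`, or all labels except those in `ignoring(...)`), and the side expected to be the "one" side (the right-hand side for plain one-to-one or `group_left` matching) must not contain two series with the same matching label set; otherwise evaluation fails with "many-to-many matching not allowed". The Go sketch below is a simplified illustration of that uniqueness check, not Prometheus's implementation; `series`, `signature`, and `duplicateSignatures` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// series is a hypothetical stand-in for a time series' label set.
type series map[string]string

// signature builds a key from the values of the on(...) labels;
// a missing label contributes an empty value.
func signature(s series, on []string) string {
	parts := make([]string, 0, len(on))
	for _, name := range on {
		parts = append(parts, name+"="+s[name])
	}
	return strings.Join(parts, ",")
}

// duplicateSignatures returns the matching label sets that occur more than once,
// i.e. the groups that make this side ineligible as the "one" side of a match.
func duplicateSignatures(side []series, on []string) []string {
	counts := map[string]int{}
	for _, s := range side {
		counts[signature(s, on)]++
	}
	var dups []string
	for sig, n := range counts {
		if n > 1 {
			dups = append(dups, sig)
		}
	}
	sort.Strings(dups) // deterministic output
	return dups
}

func main() {
	// Two series share namespace="a", so on(namespace) is not unique on this side.
	rhs := []series{
		{"namespace": "a", "pod": "p1"},
		{"namespace": "a", "pod": "p2"},
		{"namespace": "b", "pod": "p3"},
	}
	fmt.Println(duplicateSignatures(rhs, []string{"namespace"}))
	fmt.Println(duplicateSignatures(rhs, []string{"namespace", "pod"}))
}
```

Adding a label that distinguishes the duplicates (here `pod`) to the matching set, or aggregating that side first, removes the ambiguity; which of those a given recording rule needs depends on its intent.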

PR opened prometheus/prometheus

UI: Remove useless else-if

That else-if can never be reached.

Signed-off-by: Julien Pivotto roidelapluie@inuits.eu

<!-- Don't forget!

- If the PR adds or changes a behaviour or fixes a bug of an exported API it would need a unit/e2e test.

- Where possible use only exported APIs for tests to simplify the review and make it as close as possible to an actual library usage.

- No tests are needed for internal implementation changes.

- Performance improvements would need a benchmark test to prove it.

- All exposed objects should have a comment.

- All comments should start with a capital letter and end with a full stop.

-->

+0 -2

0 comment

1 changed file

pr created time in an hour

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

/retest

Please review the full test history for this PR and help us cut down flakes.

paulfantom

comment created time in an hour

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

@paulfantom: The following test failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
ci/prow/e2e-agnostic-operator | 797724ff41687cdb6620e94fbfb682f6b7c1fd5c | link | `/test e2e-agnostic-operator`

paulfantom

comment created time in 2 hours

issue opened prometheus/alertmanager

How to disable Watchdog initial notification in Alertmanager

<!--

Please do *NOT* ask usage questions in Github issues.

If your issue is not a feature request or bug report use:
https://groups.google.com/forum/#!forum/prometheus-users. If
you are unsure whether you hit a bug, search and ask in the
mailing list first.

You can find more information at: https://prometheus.io/community/

-->

**I installed Alertmanager in my EKS cluster along with Prometheus and set up some alerts. They all work fine except one annoying alert that keeps firing: the Watchdog notification, which tells us that the entire pipeline is working fine. I know it's an important alert, but we have one receiver that accepts all kinds of alerts, and it's really annoying to get notified at 12pm only to see that one alert. I tried to get rid of it by redirecting it to a null receiver, but it doesn't work.**

Disable the Watchdog alert

The Watchdog alert keeps firing all the time

Environment

  • System information:

    EKS cluster v1.16

  • Alertmanager version:

    v0.21.0

  • Prometheus version:

    v2.21.0

  • Alertmanager configuration file:

config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      receiver: prometheus-msteams
      routes:
      - match:
          alertname: Watchdog
        receiver: prometheus-msteams
    receivers:
    - name: prometheus-msteams
      webhook_configs:
      - url: "http://prometheus-msteams:2000/alertmanager"
        send_resolved: true
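Note that the configuration above routes Watchdog to `prometheus-msteams`, the same receiver as the default route, so the alert still reaches Teams. The usual way to drop it is to declare a receiver that has only a name and no integrations (commonly called `null`; the name must be quoted in YAML, e.g. `'null'`) and point the Watchdog route at it. The Go sketch below is a toy model of that first-match routing (child routes checked in order, falling back to the parent's receiver); it ignores `continue`, nested routes, grouping, and regex matchers, and `route` and `deliver` are hypothetical names, not Alertmanager code.

```go
package main

import "fmt"

// route is a toy child route: exact-match label matchers plus a receiver name.
type route struct {
	match    map[string]string
	receiver string
}

// matches reports whether every matcher label equals the alert's label.
func (r route) matches(labels map[string]string) bool {
	for k, v := range r.match {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// deliver picks the receiver for an alert (the first matching child route wins,
// otherwise the default receiver) and returns the integrations it would notify.
func deliver(labels map[string]string, defaultReceiver string, routes []route,
	integrations map[string][]string) (string, []string) {
	receiver := defaultReceiver
	for _, r := range routes {
		if r.matches(labels) {
			receiver = r.receiver
			break
		}
	}
	return receiver, integrations[receiver]
}

func main() {
	integrations := map[string][]string{
		"prometheus-msteams": {"webhook http://prometheus-msteams:2000/alertmanager"},
		"null":               nil, // a receiver with no integrations sends nothing
	}
	watchdog := map[string]string{"alertname": "Watchdog"}

	// As configured in the issue: Watchdog is routed back to the Teams receiver.
	asIs := []route{{match: map[string]string{"alertname": "Watchdog"}, receiver: "prometheus-msteams"}}
	fmt.Println(deliver(watchdog, "prometheus-msteams", asIs, integrations))

	// Routing Watchdog to the empty "null" receiver means no notification is sent.
	fixed := []route{{match: map[string]string{"alertname": "Watchdog"}, receiver: "null"}}
	fmt.Println(deliver(watchdog, "prometheus-msteams", fixed, integrations))
}
```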

created time in 2 hours

PR opened prometheus/docs

Add Vector to tools that export Prometheus metrics

Vector has both a Prometheus source and sink and a remote_write source and sink.

@brian-brazil

+1 -0

0 comment

1 changed file

pr created time in 3 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

/retest

Please review the full test history for this PR and help us cut down flakes.

paulfantom

comment created time in 3 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

@paulfantom: The following test failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
ci/prow/e2e-agnostic-operator | 797724ff41687cdb6620e94fbfb682f6b7c1fd5c | link | `/test e2e-agnostic-operator`

paulfantom

comment created time in 3 hours

issue opened prometheus/haproxy_exporter

Crashes in http.ListenAndServe (while accepting new connections)

Version: 0.11.0

It crashes for me every few hours during the ListenAndServe call, apparently while trying to accept new incoming connections:

runtime: checkdead: find g 491671 in status 1
fatal error: checkdead: runnable g

runtime stack:
runtime.throw(0x9f811e, 0x15)
	/usr/local/go/src/runtime/panic.go:1116 +0x72
runtime.checkdead()
	/usr/local/go/src/runtime/proc.go:4407 +0x390
runtime.mput(...)
	/usr/local/go/src/runtime/proc.go:4824
runtime.stopm()
	/usr/local/go/src/runtime/proc.go:1832 +0x95
runtime.exitsyscall0(0xc000001500)
	/usr/local/go/src/runtime/proc.go:3268 +0x111
runtime.mcall(0x0)
	/usr/local/go/src/runtime/asm_amd64.s:318 +0x5b

goroutine 1 [IO wait, 25002 minutes]:
internal/poll.runtime_pollWait(0x7fb060191f18, 0x72, 0x0)
	/usr/local/go/src/runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xc0000b8918, 0x72, 0x0, 0x0, 0x9ef93b)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0xc0000b8900, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/go/src/internal/poll/fd_unix.go:384 +0x1d4
net.(*netFD).accept(0xc0000b8900, 0xf0146b39697dc950, 0x1000000000000, 0xf0146b39697dc950)
	/usr/local/go/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc000130ac0, 0x5fb30338, 0xc00004bb60, 0x4cf116)
	/usr/local/go/src/net/tcpsock_posix.go:139 +0x32
net.(*TCPListener).Accept(0xc000130ac0, 0xc00004bbb0, 0x18, 0xc000000180, 0x6c79fc)
	/usr/local/go/src/net/tcpsock.go:261 +0x64
net/http.(*Server).Serve(0xc00014a000, 0xaba1a0, 0xc000130ac0, 0x0, 0x0)
	/usr/local/go/src/net/http/server.go:2901 +0x25d
net/http.(*Server).ListenAndServe(0xc00014a000, 0xc00014a000, 0x1)
	/usr/local/go/src/net/http/server.go:2830 +0xb7
net/http.ListenAndServe(...)
	/usr/local/go/src/net/http/server.go:3086
main.main()
	/app/haproxy_exporter.go:616 +0x1788

goroutine 491668 [IO wait, 1 minutes]:
internal/poll.runtime_pollWait(0x7fb060191e38, 0x72, 0xffffffffffffffff)
	/usr/local/go/src/runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xc0006b1298, 0x72, 0x0, 0x1, 0xffffffffffffffff)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45

created time in 3 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

/retest

Please review the full test history for this PR and help us cut down flakes.

paulfantom

comment created time in 4 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

@paulfantom: The following tests failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
ci/prow/e2e-agnostic | 797724ff41687cdb6620e94fbfb682f6b7c1fd5c | link | `/test e2e-agnostic`
ci/prow/e2e-agnostic-operator | 797724ff41687cdb6620e94fbfb682f6b7c1fd5c | link | `/test e2e-agnostic-operator`

paulfantom

comment created time in 4 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

/retest

Please review the full test history for this PR and help us cut down flakes.

paulfantom

comment created time in 5 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

/retest

Please review the full test history for this PR and help us cut down flakes.

paulfantom

comment created time in 5 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

@paulfantom: The following tests failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
ci/prow/e2e-agnostic-operator | 797724ff41687cdb6620e94fbfb682f6b7c1fd5c | link | `/test e2e-agnostic-operator`
ci/prow/e2e-agnostic | 797724ff41687cdb6620e94fbfb682f6b7c1fd5c | link | `/test e2e-agnostic`

paulfantom

comment created time in 6 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

@paulfantom: The following test failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
ci/prow/e2e-agnostic-operator | 797724ff41687cdb6620e94fbfb682f6b7c1fd5c | link | `/test e2e-agnostic-operator`

paulfantom

comment created time in 6 hours

pull request comment openshift/cluster-monitoring-operator

WIP: IBM Cloud manifest profile patch

@csrwng: PR needs rebase.

csrwng

comment created time in 8 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1900792: Collect all resource counts for telemetry

@smarterclayton: PR needs rebase.

smarterclayton

comment created time in 8 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1904161: Use alertmanager_integrations metric instead of alertmanager_notifications_total for AlertmanagerReceiversNotConfigured

@RiRa12621: PR needs rebase.

RiRa12621

comment created time in 8 hours

pull request comment openshift/cluster-monitoring-operator

Add pv_collector_total_pv_count storage metric

@tsmetana: PR needs rebase.

tsmetana

comment created time in 8 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

/retest

Please review the full test history for this PR and help us cut down flakes.

paulfantom

comment created time in 9 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1903464: jsonnet: fix recording rules with many-to-many matching errors

/retest

Please review the full test history for this PR and help us cut down flakes.

paulfantom

comment created time in 9 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1900792: Collect all resource counts for telemetry

/lgtm cancel

Needs rebase anyway.

smarterclayton

comment created time in 11 hours

pull request comment openshift/cluster-monitoring-operator

[release-4.6] Bug 1904091: Exporting registry v1 protocol usage metric

/retest

Please review the full test history for this PR and help us cut down flakes.

ricardomaraschini

comment created time in 12 hours

issue comment prometheus-operator/kube-prometheus

Errors using `kube-prometheus-thanos-sidecar.libsonnet`

It prevents the user from including the other prometheusAlerts and prometheusRules from thanos-mixin/mixin.libsonnet entirely, as Prometheus will complain that the thanos-sidecar.rules group name is declared twice.

This should be fixed upstream in https://github.com/thanos-io/thanos/pull/3542
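The failure described here comes from Prometheus rejecting a rule file in which two rule groups share the same name. A minimal Go sketch of that check, as one might run it after concatenating groups from several mixins and before rendering them, could look like this; `ruleGroup` and `duplicateGroupNames` are hypothetical names, the group and alert names in `main` are illustrative only, and real rule files would be parsed from YAML rather than built in code.

```go
package main

import (
	"fmt"
	"sort"
)

// ruleGroup is a hypothetical, minimal stand-in for a Prometheus rule group.
type ruleGroup struct {
	Name  string
	Rules []string
}

// duplicateGroupNames returns the group names that appear more than once.
func duplicateGroupNames(groups []ruleGroup) []string {
	seen := make(map[string]int, len(groups))
	for _, g := range groups {
		seen[g.Name]++
	}
	var dups []string
	for name, n := range seen {
		if n > 1 {
			dups = append(dups, name)
		}
	}
	sort.Strings(dups) // deterministic output
	return dups
}

func main() {
	// Illustrative merge of sidecar-specific groups with the full mixin's groups.
	merged := []ruleGroup{
		{Name: "thanos-sidecar.rules", Rules: []string{"ExampleAlertA"}},
		{Name: "thanos-compact.rules"},
		{Name: "thanos-sidecar.rules", Rules: []string{"ExampleAlertB"}},
	}
	if dups := duplicateGroupNames(merged); len(dups) > 0 {
		fmt.Println("duplicate rule group names:", dups)
	}
}
```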

sdurrheimer

comment created time in 12 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1900792: Collect all resource counts for telemetry

@smarterclayton: The following tests failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
ci/prow/e2e-agnostic | 73e45b6a133c696d1200a0566ca78c830fcd28cd | link | `/test e2e-agnostic`
ci/prow/e2e-aws-operator | 73e45b6a133c696d1200a0566ca78c830fcd28cd | link | `/test e2e-aws-operator`

smarterclayton

comment created time in 14 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1900792: Collect all resource counts for telemetry

@smarterclayton: The following tests failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
ci/prow/e2e-aws-operator | 73e45b6a133c696d1200a0566ca78c830fcd28cd | link | `/test e2e-aws-operator`
ci/prow/e2e-agnostic | 73e45b6a133c696d1200a0566ca78c830fcd28cd | link | `/test e2e-agnostic`

smarterclayton

comment created time in 14 hours

pull request comment openshift/cluster-monitoring-operator

Bug 1900792: Collect all resource counts for telemetry

/retest

Please review the full test history for this PR and help us cut down flakes.

smarterclayton

comment created time in 14 hours
