Johannes 'fish' Ziemke
discordianfish
Berlin
https://5pi.de
Freelance Cloud Native Consultant, Founder of @prometheus node_exporter.

discordianfish/blackbox_prober 21

Export availability, request latencies and size for remote services

discordianfish/check_graphite.r 8

holt-winters forecast nagios check based on graphite

discordianfish/blackbox-exporter-lambda 7

Run the Prometheus Blackbox Exporter as AWS Lambda

discordianfish/alpine-armhf-docker 3

Alpine Docker base images for ARM

discordianfish/banksman 3

Render iPXE from collins attributes

discordianfish/alpine-armhf-docker-dumb-init 1

Alpine armhf base image with dumb-init installed

discordianfish/bootylicious 1

One-file weblog on Mojo steroids!

discordianfish/bootylicious-plugin-top_pages 1

plugin to render a "page" on each blog page

pull request comment prometheus/procfs

proc/sys/kernel: adds support for parsing core_pattern

@SuperQ It does seem trivial, to be honest. On the other hand, when I was recently looking for this functionality, I was certain it would be something I'd find in this project. Writing a wrapper is fine with me; in fact, I was a bit skeptical about adding this myself and opened an issue (#342) beforehand to clear this up. cc: @discordianfish

danishprakash

comment created time in an hour

pull request comment prometheus/node_exporter

Expose zfs zpool state

@SuperQ Hello. Could you please review the PR?

Hexta

comment created time in 8 hours

issue comment prometheus/node_exporter

can node_exporter support AIX os?

Interesting, I don't have any idea about static linking in Go/AIX. Again, I have no access to any AIX systems to try any of this on.

We don't have any CGO in the node_exporter for Linux, but we do allow it for other UNIX platforms like the BSDs. You can see the separate .promu-cgo.yml configuration for those platforms.

I don't see a problem with adding AIX CGO to our main codebase.

william-yang

comment created time in 10 hours

issue comment prometheus/node_exporter

can node_exporter support AIX os?

Small update: this seems to be related to statically linking the library into the Go code. If I compile without the static flags, the binary runs correctly (or at least doesn't crash, and returns data). The same behaviour is observed in the libperfstat module: it crashes with static linking. The good news is that I am now getting disk statistics from the libperfstat library through node_exporter, so the basic functionality is working.

I'm not sure where to look or what to test next. I haven't been able to find much information on statically linking AIX libraries into Go, or on whether there are any special considerations. It may be time to write a minimal test case to see whether this affects all statically linked Go programs that use external libraries on AIX.

william-yang

comment created time in 13 hours

fork juliusliunz/docker-backup

Tool for backing up docker volume / data containers

fork in a day

Pull request review comment prometheus/node_exporter

Add ErrorLog plumbing to promhttp

 func (h *handler) innerHandler(filters ...string) (http.Handler, error) {
 	handler := promhttp.HandlerFor(
 		prometheus.Gatherers{h.exporterMetricsRegistry, r},
 		promhttp.HandlerOpts{
+			ErrorLog:            stdlog.New(log.NewStdlibAdapter(level.Error(h.logger)), "", 0),

Upstream go-kit PR: https://github.com/go-kit/kit/pull/1041

SuperQ

comment created time in 2 days

Pull request review comment prometheus/procfs

proc/sys/kernel: adds support for parsing core_pattern

+// Copyright 2020 The Prometheus Authors
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// +build !windows
+
+package procfs
+
+import (
+	"bytes"
+	"os"
+
+	"github.com/prometheus/procfs/internal/util"
+)
+
+// KernelCorePattern returns value from /proc/sys/kernel/core_pattern
+func (fs FS) KernelCorePattern() (pattern []byte, err error) {

Don't name these returns.

func (fs FS) KernelCorePattern() ([]byte, error) {
danishprakash

comment created time in 2 days

Pull request review comment prometheus/procfs

proc/sys/kernel: adds support for parsing core_pattern

+// Copyright 2019 The Prometheus Authors

New files should get the current year.

// Copyright 2020 The Prometheus Authors
danishprakash

comment created time in 2 days

pull request comment prometheus/node_exporter

Support for CPU, network and filesystem stat collection on NetBSD

Thank you for looking into fixing my mess, sorry I can't really help much

sthomen

comment created time in 2 days

pull request comment prometheus/node_exporter

collector/bonding_linux: Monitor bond mii_status not link operstate

@nakato thanks for the reply!

Linux HLR1 2.6.32-504.12.2.el6.x86_64 #1 SMP Sun Feb 1 12:14:02 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

tree.txt

The bond runs in active-backup mode. In my humble opinion, line 88 should be left unchanged.

nakato

comment created time in 4 days

started mvndaemon/mvnd

started time in 4 days

issue comment prometheus/node_exporter

Proposal: Use shell commands to extend node_exporter

From a security perspective, adding a helper tool for the textfile collector, in place of a shell-type collector, may be a good choice.

This way, system administrators can still easily extend node_exporter's ability to collect metrics, without having to combine crontab and the textfile collector as before. After all, crontab is only accurate to the minute.

Of course, it would be better to have a shell-type collector.

idweball

comment created time in 4 days

issue comment prometheus/node_exporter

Proposal: Use shell commands to extend node_exporter

That would be great, especially if that tool can fork as different users to execute processes, but still write files with the running user.

idweball

comment created time in 4 days

fork iranzoferri/docker-backup

Tool for backing up docker volume / data containers

fork in 4 days

started MEGA65/mega65-user-guide

started time in 4 days

issue opened prometheus/node_exporter

Processes exporter logs no such process errors


Host operating system: output of uname -a

Linux 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux CentOS 7

node_exporter version: output of node_exporter --version

node_exporter --version
node_exporter, version 1.0.1 (branch: HEAD, revision: 3715be6ae899f2a9b9dbfd9c39f3e09a7bd4559f)
  build user: root@1f76dbbcfa55
  build date: 20200616-12:44:12
  go version: go1.14.4

node_exporter command line flags

node_exporter --collector.processes --collector.qdisc --collector.systemd

Are you running node_exporter in Docker?

No

What did you do that produced an error?

Simply ran node_exporter for an extended period of time

What did you expect to see?

No error logs

What did you see instead?

node_exporter[15746]: level=error ts=2020-11-25T13:12:02.291Z caller=collector.go:161 msg="collector failed" name=processes duration_seconds=0.027611184 err="unable to retrieve number of allocated threads: "read /proc/2054/stat: no such process""

Analysis

This is very closely related to #1043: that change fixed the case of processes disappearing between listing the /proc directory and reading the actual process stats. But a second race condition is possible: the process can exit between opening the /proc/<process id>/stat file and actually reading it, and in that case the error code returned is different. Below is a small code snippet to reproduce that race condition.

The recommended fix is to modify getAllocatedThreads() in collector/processes_linux.go to continue after stat, err := pid.Stat() if the error meets this condition: strings.Contains(err.Error(),syscall.ESRCH.Error()).

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
)

func main() {
	const maxBufferSize = 1024 * 512

	// Start a short-lived child process to race against.
	fmt.Println("Starting process sleep")
	cmd := exec.Command("sleep", "1")
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	procPath := "/proc/" + strconv.Itoa(cmd.Process.Pid) + "/stat"

	// Open the stat file while the process is still alive.
	fmt.Printf("Read stat for %s\n", procPath)
	f, err := os.Open(procPath)
	if err != nil {
		log.Fatal(err)
	}
	// Defer the close only after the open is known to have succeeded.
	defer f.Close()

	// Wait for the process to exit before reading the already-open file.
	if err := cmd.Wait(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("Sleep process exited, reading opened stat file")
	_, err = ioutil.ReadAll(io.LimitReader(f, maxBufferSize))

	if err != nil {
		if strings.Contains(err.Error(), syscall.ESRCH.Error()) {
			fmt.Println("Got error no such process:", err)
		} else {
			fmt.Println("Read stat failed:", err)
		}
	} else {
		fmt.Println("No error reading stat")
	}
}

created time in 5 days

pull request comment prometheus/node_exporter

Update install instructions in README

:heart: Thanks!

SuperQ

comment created time in 5 days

Pull request review comment prometheus/node_exporter

Update install instructions in README

 To expose NVIDIA GPU metrics, [prometheus-dcgm ](https://github.com/NVIDIA/gpu-monitoring-tools#dcgm-exporter) can be used.

+## Installation and Usage
+
+If you are new to Prometheus and `node_exporter` there is a [simple step-by-step guide](https://prometheus.io/docs/guides/node-exporter/).
+
+### Ansible
+
+For automated installs with [Ansible](https://www.ansible.com/), there is the [Cloud Alchemy role](https://github.com/cloudalchemy/ansible-node-exporter).

:heart: Thanks!

SuperQ

comment created time in 5 days

issue comment prometheus/node_exporter

Proposal: Use shell commands to extend node_exporter

We explicitly have a policy against using external processes in the node_exporter for a variety of safety reasons.

What I have thought about is adding a helper tool to make managing and running textfile helper scripts easier.

A config file like this:

---
global:
  interval: 1m
  timeout: 10s
scripts:
- cmd: /path/to/textfile_helper.sh
  interval: 5m
  timeout: 1m
  file: textfile.prom

Each script would be managed by a goroutine, and the stdout would be sent atomically to the textfile.

idweball

comment created time in 5 days

Pull request review comment prometheus/node_exporter

Update install instructions in README

 To expose NVIDIA GPU metrics, [prometheus-dcgm ](https://github.com/NVIDIA/gpu-monitoring-tools#dcgm-exporter) can be used.

+## Installation and Usage
+
+If you are new to Prometheus and `node_exporter` there is a [simple step-by-step guide](https://prometheus.io/docs/guides/node-exporter/).
+
+### Ansible
+
+For automated installs with [Ansible](https://www.ansible.com/), there is the [Cloud Alchemy role](https://github.com/cloudalchemy/ansible-node-exporter).

LGTM :100: :tada:

SuperQ

comment created time in 5 days

issue comment prometheus/node_exporter

Proposal: Use shell commands to extend node_exporter

Since ozhiwei/shell_exporter already exists, why not use it instead of adding complexity to node_exporter? Alternatively, you can always run shell scripts alongside node_exporter and use the textfile collector to ingest their output.

idweball

comment created time in 5 days

pull request comment prometheus/node_exporter

collector/bonding_linux: Monitor bond mii_status not link operstate

@berghauz What kernel version?

It's unlikely I'll be able to bring up an old kernel with bonding for investigation. Could you also provide the tree at /sys/class/net/bondX/ and details about your bonding configuration?

nakato

comment created time in 5 days

issue opened prometheus/node_exporter

Proposal: Use shell commands to extend node_exporter

Extending the node_exporter collectors requires writing Go code, which is very difficult for operations and maintenance staff who are not familiar with Go.

Is it possible to add a shell-type collector to collect metrics by executing shell commands? Such as: https://github.com/ozhiwei/shell_exporter

My English is not good, so I used Google Translate for this proposal. I hope you can understand what I mean.

created time in 5 days

issue comment prometheus/node_exporter

can node_exporter support AIX os?

I went ahead and tried to implement diskstats_aix.go, see https://github.com/thorhs/node_exporter/tree/diskstats_aix. I'm new to the node_exporter build process, but I followed the instructions, which led me to just run 'make'. After a few bumps with Go modules, I got it to compile and it starts up correctly.

When I try to curl it, I get a crash where the most likely suspect is this goroutine:

goroutine 55 [syscall]:
runtime.cgocall(0x1105e7380, 0xa00010000193808, 0x0)
        /opt/freeware/lib/golang/src/runtime/cgocall.go:133 +0x58 fp=0xa000100001937a8 sp=0xa00010000193760 pc=0x1000030f8
github.com/thorhs/aix_libperfstat/generated._Cfunc_perfstat_disk(0x0, 0x0, 0x1f000000000, 0x0)
        _cgo_gotypes.go:604 +0x48 fp=0xa000100001937e8 sp=0xa000100001937a8 pc=0x1004c3158
github.com/thorhs/aix_libperfstat/generated.CollectDisks(0x0)
        /home/local/REIKNISTOFA/rb747/go/src/github.com/prometheus/node_exporter/vendor/github.com/thorhs/aix_libperfstat/generated/disk.go:48 +0x3c fp=0xa00010000193b50 sp=0xa000100001937e8 pc=0x1004c322c
github.com/prometheus/node_exporter/collector.(*diskstatsCollector).Update(0xa00010000230a20, 0xa0001000043a180, 0x11060ea80, 0x0)
        /home/local/REIKNISTOFA/rb747/go/src/github.com/prometheus/node_exporter/collector/diskstats_aix.go:55 +0x28 fp=0xa00010000193de8 sp=0xa00010000193b50 pc=0x1004e8a18

The thing that strikes me as odd is the third parameter to the _Cfunc_perfstat_disk function. It should be the size (sizeof) of the structure being passed in, but it seems to be 0x1f000000000. I don't have any experience with Go and C interop, so I don't know if that is normal. The Go code in question is: num := C.perfstat_disk(nil, nil, C.sizeof_perfstat_disk_t, 0)

The aix_libperfstat module is working if I run tests in that directly. I'm wondering if there is something different with the builds of node_exporter than with plain go. For one, I had to install a package to get /lib/syscalls.exp, which was not needed when building the standalone module.

I would appreciate if anyone has any insights or hints as to what to look at. If I get this working then getting much more AIX coverage should be easy.

william-yang

comment created time in 5 days

pull request comment prometheus/node_exporter

Support for CPU, network and filesystem stat collection on NetBSD

@discordianfish I have a local branch where I have done most of the work so that it applies properly on master. I'll see about submitting it as new pull request(s) and then we can close this.

sthomen

comment created time in 5 days

push event prometheus/node_exporter

Ben Kochie

commit sha 35f2e3d83c62a71c415384ea8e389e9b31a7741a

Update install instructions in README

Move end-user install instructions to the top of the README.
* Add a Docker Compose example.
* Improve some wording.
* Link to the Cloud Alchemy Ansible role.
* Update to git clone method for dev/building

Signed-off-by: Ben Kochie <superq@gmail.com>


push time in 6 days

pull request commentprometheus/node_exporter

Update install instructions in README

CC @anthonyeleven @osg

Thanks for the inspiration to really look over the README a bit. Here are some improvements.

SuperQ

comment created time in 6 days

PR opened prometheus/node_exporter

Update install instructions in README

Move end-user install instructions to the top of the README.

  • Add a Docker Compose example.
  • Improve some wording.
  • Link to the Cloud Alchemy Ansible role.
  • Update to git clone method for dev/building

Signed-off-by: Ben Kochie <superq@gmail.com>

+62 -29

0 comment

1 changed file

pr created time in 6 days
