If you are wondering where the data on this site comes from, please visit https://api.github.com/users/pstibrany/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.
Peter Štibraný pstibrany @grafana Software Engineer at Grafana Labs

pstibrany/tsdb-example 2

Example that shows how to generate Prometheus TSDB block

pstibrany/OWASP-CSRFGuard 1

OWASP CSRFGuard is a library that implements a variant of the synchronizer token pattern to mitigate the risk of Cross-Site Request Forgery (CSRF) attacks.

pstibrany/promfreq 1

Display frequency distributions from the command-line.

pstibrany/services 1

Go implementation of Service from the Google Guava library

pstibrany/alertmanager 0

Prometheus Alertmanager

pstibrany/baseds-series 0

A compiled list of resources from the baseds series

pstibrany/chunks-inspect 0

Tool for inspecting Loki and Cortex chunks

pstibrany/common 0

Libraries used in multiple Weave projects

pstibrany/cortex 0

A horizontally scalable, highly available, multi-tenant, long term Prometheus.

PR closed cortexproject/cortex

Make `cortex_discarded_samples_total` Independent of the Replication Factor size/L stale

What this PR does:

Because all ingesters receive and (mostly) discard the same samples, the number of discarded samples reported on each push is multiplied by the number of ingesters. This PR changes how `cortex_discarded_samples_total` is reported: instead of the metric being generated by both ingesters and distributors, the ingesters now return the counts they used to report on their own to the distributor, which deduplicates them and reports the metric on their behalf.

Two new fields are added to `cortexpb.WriteResponse`. `succeeded` is the number of samples that were successfully added; although it's not used in this PR, I added it because it was being discussed. `discarded` is an array of all the reasons that samples were discarded, each accompanied by the number of samples discarded for that reason. The distributor gathers these arrays from the ingesters' responses, finds the highest `discarded_samples` count for each reason, and adds it to the `cortex_discarded_samples_total` metric labelled with the reason and the user ID.

message WriteResponse {
	int64 succeeded = 1;
	repeated DiscardedMetric discarded = 2 [(gogoproto.nullable) = false];
}
message DiscardedMetric {
	string reason = 1;
	int64 discarded_samples = 2;
}
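
To make the aggregation concrete, here is a rough Go sketch of the distributor-side logic described above. It is not the PR's actual code: the struct definitions stand in for the generated protobuf types, and discardedSamples is a hypothetical handle for cortex_discarded_samples_total.

package distributor

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical handle for cortex_discarded_samples_total; in Cortex the metric
// is defined elsewhere in the codebase.
var discardedSamples = prometheus.NewCounterVec(prometheus.CounterOpts{
	Name: "cortex_discarded_samples_total",
	Help: "The total number of samples that were discarded.",
}, []string{"reason", "user"})

// DiscardedMetric and WriteResponse mirror the protobuf messages above.
type DiscardedMetric struct {
	Reason           string
	DiscardedSamples int64
}

type WriteResponse struct {
	Succeeded int64
	Discarded []DiscardedMetric
}

// reportDiscarded takes the responses returned by the ingesters for one push,
// keeps the highest count per reason (the replicas mostly discard the same
// samples), and adds it to the metric on the ingesters' behalf.
func reportDiscarded(userID string, responses []*WriteResponse) {
	maxByReason := map[string]int64{}
	for _, resp := range responses {
		for _, d := range resp.Discarded {
			if d.DiscardedSamples > maxByReason[d.Reason] {
				maxByReason[d.Reason] = d.DiscardedSamples
			}
		}
	}
	for reason, count := range maxByReason {
		discardedSamples.WithLabelValues(reason, userID).Add(float64(count))
	}
}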

Which issue(s) this PR fixes: Fixes #3955

Checklist

  • [x] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+581 -69

1 comment

8 changed files

LeviHarrison

pr closed time in 13 minutes

pull request comment cortexproject/cortex

Make `cortex_discarded_samples_total` Independent of the Replication Factor

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

LeviHarrison

comment created time in an hour

PR opened cortexproject/cortex

Add CORTEX_CHECKOUT_PATH env variable to CI

Signed-off-by: Gábor Lipták gliptak@gmail.com

What this PR does:

Which issue(s) this PR fixes: Fixes #<issue number>

Checklist

  • [ ] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+15 -13

0 comments

1 changed file

pr created time in 2 hours

PR closed cortexproject/cortex

Add Kustomize Support size/XXL

Signed-off-by: Weifeng Wang qclaogui@gmail.com

What this PR does:

Add Kustomize Support

Kustomize background

Kustomize is a CNCF project that is a part of Kubernetes. It's included in kubectl to allow users to customize their configurations without introducing templates.

Usage

Kustomize encourages defining multiple variants - e.g. dev, staging, and prod - as overlays on a common base.

It’s possible to create an additional overlay to compose these variants together - just declare the overlays as the bases of a new kustomization.

cortex-kustomize provides a common base for a blocks storage deployment on Kubernetes. Users should create variants using overlays to deploy Cortex in their own environments.

An overlay is just another kustomization, referring to the base, and referring to patches to apply to that base. This arrangement makes it easy to manage your configuration with git. The base could have files from an upstream repository managed by someone else. The overlays could be in a repository you own. Arranging the repo clones as siblings on disk avoids the need for git submodules (though that works fine, if you are a submodule fan).

Example

This is an example of monitoring Cortex by adding Prometheus and Grafana using kustomize:

  1. Create a development environment

    mkdir -p deploy/overlays/dev

  2. Create kustomization.yaml

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    namespace: cortex-monitoring-system
    
    resources:
    - github.com/qclaogui/cortex-kustomize/deploy/base/blocks?ref=main
    - add-grafana-dep.yaml
    - add-grafana-svc.yaml
    - add-retrieval-dep.yaml
    - add-retrieval-svc.yaml
    
    patchesStrategicMerge:
    - patch-nginx-svc.yaml
    
    images:
    - name: quay.io/cortexproject/cortex
      newTag: master-b6eea5f
    - name: minio/minio
      newTag: RELEASE.2021-06-17T00-10-46Z
    

File structure:

└── deploy
   └── overlays
       └── dev
           ├── add-grafana-dep.yaml
           ├── add-grafana-svc.yaml
           ├── add-retrieval-dep.yaml
           ├── add-retrieval-svc.yaml
           ├── kustomization.yaml
           └── patch-nginx-svc.yaml

  3. Deploy to a cluster

kustomize build deploy/overlays/dev | kubectl apply -f -

A more detailed example can be found at https://github.com/qclaogui/cortex-kustomize-demo.

Happy Cortex

Which issue(s) this PR fixes:

Checklist

  • [x] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+1489 -0

3 comments

35 changed files

qclaogui

pr closed time in 8 hours

pull request comment cortexproject/cortex

Add Kustomize Support

I have moved the files to a separate repo called cortex-kustomize.

qclaogui

comment created time in 14 hours

issue comment cortexproject/cortex

Deprecated endpoints in configs

@bboreham So the other one got closed, but this is still not in the docs? I tried to follow the PRs. Can you re-open this?

till

comment created time in 16 hours

issue comment cortexproject/cortex

Cortex can read rules but doesn't activate them

@pracucci

In my S3 (MinIO) monitoring bucket I am getting 0; I'm not sure whether that is the tenant ID or not.

And I made modifications in the ruler YAML, but NO LUCK.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cortex-ruler-configmap
  namespace: monitoring
data:
  rules.yml: |-
    groups:
      - name: "centralmonitoring"
        rules:
          - alert: "PrometheusDown"
            annotations:
              message: Prometheus replica in cluster {{$labels.cluster}} has disappeared.
            expr: sum(up{cluster!="", pod=~"prometheus.*"}) by (cluster) < 3
            for: 15s
            labels:
              severity: critical
              category: metrics
          - alert: "TooManyPods"
            annotations:
              message: Too many pods in cluster {{$labels.cluster}} on node {{$labels.instance}}
            expr: sum by(cluster,instance) (kubelet_running_pods{cluster!="",instance!=""}) > 5
            for: 15s
            labels:
              severity: warning
              category: metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ruler
spec:
  replicas: 1
  selector:
    matchLabels:
      name: ruler
  template:
    metadata:
      labels:
        name: ruler
    spec:
      containers:
      - name: ruler
        image: quay.io/cortexproject/cortex:v1.9.0
        imagePullPolicy: IfNotPresent
        args:
        - -target=ruler
        - -log.level=debug
        - -server.http-listen-port=80
        - -ruler.configs.url=http://configs.monitoring.svc.cluster.local:80
        - -ruler.alertmanager-url=http://alertmanager.monitoring.svc.cluster.local:9093
        - -ruler-storage.backend=local
        - -ruler-storage.local.directory=/etc/cortex/rules/0
        - -ruler.rule-path=/rules
        - -consul.hostname=consul.monitoring.svc.cluster.local:8500
        - -s3.url=s3://admin:admin2675@172.31.40.72:9000/monitoring
        - -s3.force-path-style=true
        - -dynamodb.url=dynamodb://user:pass@dynamodb.monitoring.svc.cluster.local:8000
        - -schema-config-file=/etc/cortex/schema.yaml
        - -store.chunks-cache.memcached.addresses=memcached.monitoring.svc.cluster.local:11211
        - -store.chunks-cache.memcached.timeout=100ms
        - -store.chunks-cache.memcached.service=memcached
        - -distributor.replication-factor=1
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /etc/cortex
          name: config
        - mountPath: /etc/cortex/rules/0
          name: alert
        - mountPath: /rules
          name: rules
      volumes:
        - configMap:
            name: schema-config
          name: config
        - configMap:
            name: cortex-ruler-configmap
          name: alert
        - emptyDir: {}
          name: rules
  • Error
[root@ip-172-31-40-72 monitoring]# oc exec -it ruler-7fb94dd7d7-8t6qc sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # ls -ltr /rules/
total 0
/ # ls -ltr /etc/cortex/rules/0/
total 0
lrwxrwxrwx    1 root     root            16 Jun 19 00:42 rules.yml -> ..data/rules.yml
/ # exit
[root@ip-172-31-40-72 monitoring]# oc logs -f ruler-7fb94dd7d7-8t6qc
level=info ts=2021-06-19T00:42:53.016332668Z caller=main.go:188 msg="Starting Cortex" version="(version=1.9.0, branch=HEAD, revision=ed4f339)"
level=info ts=2021-06-19T00:42:53.017559897Z caller=server.go:239 http=[::]:80 grpc=[::]:9095 msg="server listening on addresses"
level=debug ts=2021-06-19T00:42:53.019175345Z caller=api.go:128 msg="api: registering route" methods=GET path=/config auth=false
level=debug ts=2021-06-19T00:42:53.021682252Z caller=api.go:128 msg="api: registering route" methods=GET path=/ auth=false
level=debug ts=2021-06-19T00:42:53.021814945Z caller=api.go:128 msg="api: registering route" methods=GET path=/debug/fgprof auth=false
level=debug ts=2021-06-19T00:42:53.021959288Z caller=api.go:128 msg="api: registering route" methods=GET path=/memberlist auth=false
level=debug ts=2021-06-19T00:42:53.023008694Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ingester/ring auth=false
level=debug ts=2021-06-19T00:42:53.023055416Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ring auth=false
level=warn ts=2021-06-19T00:42:53.023883349Z caller=experimental.go:19 msg="experimental feature in use" feature="DNS-based memcached service discovery"
level=info ts=2021-06-19T00:42:53.030612161Z caller=mapper.go:46 msg="cleaning up mapped rules directory" path=/rules
level=debug ts=2021-06-19T00:42:53.030753864Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ruler/ring auth=false
level=debug ts=2021-06-19T00:42:53.030790066Z caller=api.go:128 msg="api: registering route" methods=POST path=/ruler/delete_tenant_config auth=true
level=debug ts=2021-06-19T00:42:53.030835253Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ruler_ring auth=false
level=debug ts=2021-06-19T00:42:53.030860276Z caller=api.go:128 msg="api: registering route" methods=GET path=/ruler/rule_groups auth=false
level=debug ts=2021-06-19T00:42:53.030909584Z caller=api.go:128 msg="api: registering route" methods=GET path=/services auth=false
level=info ts=2021-06-19T00:42:53.031585317Z caller=module_service.go:59 msg=initialising module=server
level=debug ts=2021-06-19T00:42:53.031690112Z caller=module_service.go:49 msg="module waiting for initialization" module=ring waiting_for=memberlist-kv
level=debug ts=2021-06-19T00:42:53.031718025Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=memberlist-kv
level=debug ts=2021-06-19T00:42:53.031758011Z caller=module_service.go:49 msg="module waiting for initialization" module=store waiting_for=server
level=debug ts=2021-06-19T00:42:53.031776355Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=distributor-service
level=info ts=2021-06-19T00:42:53.031791076Z caller=module_service.go:59 msg=initialising module=store
level=debug ts=2021-06-19T00:42:53.031587773Z caller=module_service.go:49 msg="module waiting for initialization" module=memberlist-kv waiting_for=server
level=info ts=2021-06-19T00:42:53.031937836Z caller=module_service.go:59 msg=initialising module=memberlist-kv
level=debug ts=2021-06-19T00:42:53.032078255Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=ring
level=debug ts=2021-06-19T00:42:53.032146949Z caller=module_service.go:49 msg="module waiting for initialization" module=ring waiting_for=server
level=info ts=2021-06-19T00:42:53.032254539Z caller=module_service.go:59 msg=initialising module=ring
level=debug ts=2021-06-19T00:42:53.047187773Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=server
level=info ts=2021-06-19T00:42:53.047280661Z caller=module_service.go:59 msg=initialising module=distributor-service
level=debug ts=2021-06-19T00:42:53.047490102Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=memberlist-kv
level=debug ts=2021-06-19T00:42:53.047527989Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=ring
level=debug ts=2021-06-19T00:42:53.047539583Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=server
level=debug ts=2021-06-19T00:42:53.047727464Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=store
level=info ts=2021-06-19T00:42:53.047738286Z caller=module_service.go:59 msg=initialising module=ruler
level=info ts=2021-06-19T00:42:53.047768984Z caller=ruler.go:438 msg="ruler up and running"
level=debug ts=2021-06-19T00:42:53.047783448Z caller=ruler.go:476 msg="syncing rules" reason=initial
level=info ts=2021-06-19T00:42:53.047888139Z caller=cortex.go:414 msg="Cortex started"
level=debug ts=2021-06-19T00:43:53.048652595Z caller=ruler.go:476 msg="syncing rules" reason=periodic
level=debug ts=2021-06-19T00:44:53.047992537Z caller=ruler.go:476 msg="syncing rules" reason=periodic
level=debug ts=2021-06-19T00:45:53.048070935Z caller=ruler.go:476 msg="syncing rules" reason=periodic

NOTE: I tried with both 0 and fake, but the result is the same :(

jakubgs

comment created time in a day

issue closed cortexproject/cortex

compactor went into tailspin

Describe the bug

Consul was down at the time the compactor started, and it never recovered:

level=info ts=2021-03-04T20:23:01.350819279Z caller=main.go:188 msg="Starting Cortex" version="(version=1.6.0, branch=master, revision=56f794d)"
level=info ts=2021-03-04T20:23:01.35627101Z caller=module_service.go:59 msg=initialising module=server
level=info ts=2021-03-04T20:23:01.358032071Z caller=module_service.go:59 msg=initialising module=compactor
level=info ts=2021-03-04T20:23:01.351964058Z caller=server.go:229 http=[::]:80 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2021-03-04T20:23:01.356491008Z caller=module_service.go:59 msg=initialising module=memberlist-kv
level=info ts=2021-03-04T20:23:01.36340712Z caller=compactor.go:373 component=compactor msg="waiting until compactor is ACTIVE in the ring"
level=info ts=2021-03-04T20:23:01.366220882Z caller=lifecycler.go:527 msg="not loading tokens from file, tokens file path is empty"
level=error ts=2021-03-04T20:23:01.391200499Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.394262836Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.391467754Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.399955867Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.397642587Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.402454173Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.404896477Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.407444107Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.410901197Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.413247689Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.4157827Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:02.848432818Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:06.895736201Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:14.397161217Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:25.078769837Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:49.041899802Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:24:35.634427793Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:25:35.127795679Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:26:37.176005708Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:27:42.508689928Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:28:35.350979964Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:29:41.117751261Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:30:39.87687265Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=info ts=2021-03-04T20:31:36.036454597Z caller=client.go:247 msg="value is nil" key=compactor index=22
level=info ts=2021-03-04T20:31:36.930199939Z caller=client.go:247 msg="value is nil" key=compactor index=24
level=info ts=2021-03-04T20:31:37.932395975Z caller=client.go:247 msg="value is nil" key=compactor index=25
level=info ts=2021-03-04T20:31:41.887641219Z caller=client.go:247 msg="value is nil" key=compactor index=28
level=info ts=2021-03-04T20:31:41.941646251Z caller=client.go:247 msg="value is nil" key=compactor index=29
level=info ts=2021-03-04T20:31:46.926493102Z caller=client.go:247 msg="value is nil" key=compactor index=31
level=info ts=2021-03-04T20:31:46.967013886Z caller=client.go:247 msg="value is nil" key=compactor index=32
...
level=info ts=2021-03-05T10:12:35.156472897Z caller=client.go:247 msg="value is nil" key=compactor index=78733
level=info ts=2021-03-05T10:12:35.455228914Z caller=client.go:247 msg="value is nil" key=compactor index=78734
level=info ts=2021-03-05T10:12:36.157313862Z caller=client.go:247 msg="value is nil" key=compactor index=78736
level=info ts=2021-03-05T10:12:37.157305834Z caller=client.go:247 msg="value is nil" key=compactor index=78738

We have compactor sharding turned on:

        - -compactor.ring.consul.hostname=consul.cortex.svc.cluster.local:8500
        - -compactor.ring.prefix=
        - -compactor.ring.store=consul
        - -compactor.sharding-enabled=true

Expected behavior

I think it should exit with an error in this situation; crashlooping would make the fault more obvious to the operator, and after a few restarts it would have managed to talk to Consul in my case.

closed time in a day

bboreham

Pull request review comment cortexproject/cortex

Proposal for time series deletion with block storage

 Using block store, the different caches available are:
 - Chunks cache (stores the potentially to be deleted chunks of data)
 - Query results cache (stores the potentially to be deleted data)
-Using the tombstones, the queriers filter out the data received from the ingesters and store-gateway. The cache not being processed through the querier needs to be invalidated to prevent deleted data from coming up in queries. There are two potential caches that could contain deleted data, the chunks cache, and the query results cache.
+There are two potential caches that could contain deleted data, the chunks cache, and the query results cache. Using the tombstones, the queriers filter out the data received from the ingesters and store-gateway. The cache not being processed through the querier needs to be invalidated to prevent deleted data from coming up in queries.
-Firstly, the query results cache needs to be invalidated for each new delete request. This can be done using the same mechanism currently used for chunk storage by utilizing the cache generation numbers. For each tenant, their cache is prefixed with a cache generation number. This is already implemented into the middleware and would be easy to use for invalidating the cache. When the cache needs to be invalidated due to a delete or cancel delete request, the cache generation numbers would be increased (to the current timestamp), which would invalidate all the cache entries for a given tenant. The cache generation numbers are currently being stored in an Index table (e.g. DynamoDB or Bigtable). One option for block store is to store a per tenant key using the KV-store with the ring backend and propogate it using a Compare-And-Set/Swap (CAS) operation. If the current cache generation number is older than the KV-store is older or it is empty, then the cache is invalidated and the current timestamp becomes the cache generation number.
+Firstly, the query results cache needs to be invalidated for each new delete request. This can be done using the same mechanism currently used for chunk storage by utilizing the cache generation numbers. For each tenant, their cache is prefixed with a cache generation number. This is already implemented into the middleware and would be easy to use for invalidating the cache. When the cache needs to be invalidated due to a delete or cancel delete request, the cache generation numbers would be increased (to the current timestamp), which would invalidate all the cache entries for a given tenant. With chunk store, the cache generation numbers are currently being stored in an Index table (e.g. DynamoDB or Bigtable). One option for block store is to save a per tenant key using the KV-store with the ring backend and propagate it using a Compare-And-Set/Swap (CAS) operation. If the current cache generation number is older than the KV-store is older or it is empty, then the cache is invalidated and the current timestamp becomes the cache generation number.

@pracucci - I was thinking of an alternative approach.

The purger would write new tombstones to the 3 ingesters assigned to a user and wait until at least 2 succeed. While executing a query, a querier would fetch the tombstones for a user from all ingesters in the cluster. If the tombstoneTimestamp > currentCacheGenNumber, the querier would update the currentCacheGenNumber to the current timestamp.
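
A small Go sketch of that comparison (illustrative only; the type and function names are made up, not existing Cortex APIs):

package purger

import "time"

// Tombstone is a stand-in type for a per-tenant deletion tombstone.
type Tombstone struct {
	CreatedAt time.Time
}

// maybeBumpCacheGen compares the newest tombstone fetched from the ingesters
// with the tenant's current cache generation number (both as Unix timestamps).
// If a newer tombstone exists, the generation number is moved forward to the
// current timestamp, which invalidates the query results cache for the tenant.
func maybeBumpCacheGen(tombstones []Tombstone, currentCacheGenNumber int64) int64 {
	var newestTombstone int64
	for _, t := range tombstones {
		if ts := t.CreatedAt.Unix(); ts > newestTombstone {
			newestTombstone = ts
		}
	}
	if newestTombstone > currentCacheGenNumber {
		return time.Now().Unix()
	}
	return currentCacheGenNumber
}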

ilangofman

comment created time in a day

push event cortexproject/cortex

ci

commit sha 81a74f073b0735646591579cdf5a66b174d898ac

Deploy to GitHub pages

view details

push time in a day

push event cortexproject/cortex

Niclas Schad

commit sha 83b51d1de55aa5174da87c6bc2bf7a9e84d125b4

docs: added alertmanager_storage docs (#4264) Signed-off-by: ShuzZzle <niclas.schad@gmail.com>

view details

push time in a day

PR merged cortexproject/cortex

docs: added alertmanager_storage docs size/XS

Signed-off-by: ShuzZzle niclas.schad@gmail.com

What this PR does: Update documentation on how to configure alertmanager_storage

Which issue(s) this PR fixes: Fixes #4206

Checklist

  • [ NA ] Tests updated
  • [ X ] Documentation added
  • [ NA ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+8 -0

1 comment

1 changed file

ShuzZzle

pr closed time in a day

issue closed cortexproject/cortex

Document how to drive Ruler and AlertManager config

Best I could find is https://cortexmetrics.io/docs/guides/alertmanager-configuration/, which doesn't cover what changed in #3888

There are several issues which boil down to "it isn't documented": #3401, #3395, #3148

closed time in a day

bboreham

pull request comment cortexproject/cortex

Remove inadvertent global variable in integration tests

It's interesting to see that CI fails constantly on this PR (apparently due to flaky tests) but not on other PRs 🤔 @56quarters, is this rebased to master?

56quarters

comment created time in a day

PR opened cortexproject/cortex

Add ability to support strict JSON unmarshal for `limits`

What this PR does: Adds the ability to support strict JSON unmarshalling for the limits struct. This is the default behaviour for YAML now:

	in := `{"unknown_fields": 100}`
	l := validation.Limits{}

	fmt.Println(yaml.UnmarshalStrict([]byte(in), &l))

	// yaml: unmarshal errors:
	// line 1: field unknown_fields not found in type validation.plain

This PR adds the same behaviour when unmarshalling from JSON input as well.
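
For illustration, a minimal sketch of how strict JSON unmarshalling can be achieved with the standard library (this is not necessarily how the PR implements it for validation.Limits; the Limits struct below is a placeholder):

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Limits is a placeholder standing in for validation.Limits.
type Limits struct {
	IngestionRate float64 `json:"ingestion_rate"`
}

// strictUnmarshalJSON rejects any field in the input that is not defined on v.
func strictUnmarshalJSON(data []byte, v interface{}) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(v)
}

func main() {
	in := `{"unknown_fields": 100}`
	l := Limits{}

	fmt.Println(strictUnmarshalJSON([]byte(in), &l))
	// json: unknown field "unknown_fields"
}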

Which issue(s) this PR fixes: Fixes NA

Checklist

  • [x] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+15 -1

0 comments

2 changed files

pr created time in a day

PR opened cortexproject/cortex

chore: Define interface api.Distributor

What this PR does:

This allows reusing the API handler while swapping out the Distributor implementation.

In my use case I would like to use NewQuerierHandler with my own queryable, but not provide a "real" distributor implementation.
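
A hypothetical sketch of the pattern (the actual api.Distributor interface and method set defined by the PR may differ): the handler constructor depends only on a small interface, so any implementation, including a fake one, can be passed in.

package api

import (
	"context"
	"fmt"
	"net/http"
)

// Distributor is an invented, minimal interface for illustration; the PR
// derives the real one from what NewQuerierHandler actually needs.
type Distributor interface {
	LabelNames(ctx context.Context) ([]string, error)
}

// newHandler depends only on the interface, so callers can swap in their own
// implementation instead of a "real" distributor.
func newHandler(d Distributor) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		names, err := d.LabelNames(r.Context())
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		for _, name := range names {
			fmt.Fprintln(w, name)
		}
	})
}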

Signed-off-by: Christian Simon simon@swine.de

Checklist

  • ~[ ] Tests updated~
  • ~[ ] Documentation added~
  • ~[ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]~
+7 -3

0 comments

2 changed files

pr created time in a day

issue comment cortexproject/cortex

Looking for Recommendation on Ingester Resource Configuration/Optimization

We have a Cortex (v1.7)

Could you upgrade to Cortex 1.9? The S3 client memory utilization when uploading large objects is something that has been improved at some point, so I would suggest testing with the latest version (generally speaking, it's always a good idea to run the latest stable version).

But when we increase the rate to ~30K metrics per second

I guess you mean samples/sec. However, the ingesters' memory utilisation is primarily influenced by the number of in-memory series (the number of different series held in the ingester memory). You can query the cortex_ingester_memory_series metric exposed by Cortex ingesters to see the actual number of in-memory series. With that number I can give suggestions about the sizing of the ingesters.

To give you an idea, we target 1.5M series per ingester with a 25GB memory limit per ingester.

prprasad2020

comment created time in a day

pull request comment cortexproject/cortex

Add support for running cortex through docker image without root uid

This PR looks in good shape. @simonswine, do you have any plan to progress on it?

@pracucci thanks for the reminder. The outstanding work is to document the way to enable it for Kubernetes / docker run. I will hopefully get a chance to do that later today or early next week.

simonswine

comment created time in a day

issue comment cortexproject/cortex

Problems with alertmanager in v1.9

You got this error:

level=error ts=2021-06-14T20:32:16.470699236Z caller=cortex.go:426 msg="module failed" module=alertmanager err="invalid service state: Failed, expected: Running, failure: failed to list users with alertmanager configuration: Get "http://spi-mon-storage-cortex-configs.mon.svc.cluster.local:8080/private/api/prom/configs/alertmanager": dial tcp: lookup spi-mon-storage-cortex-configs.mon.svc.cluster.local on 169.254.25.10:53: no such host"

Looking at your config:

  alertmanager_storage:
    backend: s3
    configdb:
      configs_api_url: http://spi-mon-storage-cortex-configs.mon.svc.infra.(...):8080
    s3:
      endpoint: s3.i02.estaleiro.serpro.gov.br
      bucket_name: prod-cortex-alertmanager

configdb and S3 are mutually exclusive. You should remove the configdb entry from the config if you want to use S3 (suggested) to store your config.

mrmassis

comment created time in a day

push event cortexproject/cortex

ci

commit sha c6127786552532199b412933b76e69e895936071

Deploy to GitHub pages

view details

push time in a day

push event cortexproject/cortex

Marco Pracucci

commit sha 8ddf6141470fbdf3602d5954e6900a1176721412

Fixed 'make doc' Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

push time in a day

push event cortexproject/cortex

Marco Pracucci

commit sha 5db985781ab0f6188685f3a07d705c8b88456a9e

Deleted unused code Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

push time in a day

delete branch cortexproject/cortex

delete branch: update-chunks-storage-deprecation-doc

delete time in a day

push event cortexproject/cortex

Marco Pracucci

commit sha b15782ef8cf42ac6653efbf0bd1cf6b3489ea5aa

Updated doc about chunks storage deprecation (#4294) Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

push time in a day

PR merged cortexproject/cortex

Updated doc about chunks storage deprecation size/XS

What this PR does: In this PR I'm addressing the feedback received in https://github.com/cortexproject/cortex/pull/4268#issuecomment-856942861.

Which issue(s) this PR fixes: N/A

Checklist

  • [ ] Tests updated
  • [x] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+4 -2

0 comments

3 changed files

pracucci

pr closed time in a day

issue comment cortexproject/cortex

Cortex can read rules but doesn't activate them

With your current config, the rules file is stored at: /etc/cortex/rules/rules.yml

The expected (correct) filepath is: /etc/cortex/rules/<tenant id>/<filename>.yml

If you're running with auth enabled, then <tenant id> is your tenant ID; otherwise it's hardcoded to fake. So if you're running with auth disabled, the file should be stored at the following path to make it work: /etc/cortex/rules/fake/rules.yml

jakubgs

comment created time in a day

push event cortexproject/cortex

Marco Pracucci

commit sha 34a933ca0141aa402d650c5a5289a8754153c556

Fixed userID in integration tests Signed-off-by: Marco Pracucci <marco@pracucci.com>

view details

push time in 2 days

PR closed cortexproject/cortex

when msi is enabled, use a msi authorizer to fetch the storage accoun… storage/chunks

What this PR does: In some Azure environments, sensitive information such as the Storage Account Key is not allowed in the application configuration file. Azure recommends using MSI to obtain read and write permissions for the Storage Account.

The following config options are added to BlobStorageConfig:

  • MSIEnabled: whether to use MSI to access the storage account
  • MSIResource: the Azure Management URI
  • ResourceGroupName: the resource group the target storage account belongs to
  • SubscriptionId: the subscription ID the current environment belongs to

Imported two dependencies: azure-sdk-for-go and go-autorest.

With the above changes, an MSI authorizer is used to interact with the storage account and fetch its key, which is then written back into the original field. Thus I think we can achieve the goal with minimal cost.

Checklist

  • [ ] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+32867 -0

5 comments

144 changed files

guojing013214

pr closed time in 2 days

pull request comment cortexproject/cortex

when msi is enabled, use a msi authorizer to fetch the storage accoun…

We've deprecated the chunks storage in Cortex, and Loki is forking it into their own repo (see https://github.com/grafana/loki/pull/3842). I would suggest discussing this change with the Loki community as soon as https://github.com/grafana/loki/pull/3842 is merged.

guojing013214

comment created time in 2 days

PR opened cortexproject/cortex

Reduce chunks storage usage in integration tests

What this PR does: Following up on the deprecation of the chunks storage, in this PR I'm proposing to:

  1. Delete chunks storage tests for which we also have a blocks storage counterpart
  2. Delete docs/configuration/single-process-config.md from doc and move docs/configuration/single-process-config.yaml to docs/chunks-storage/single-process-config.yaml

Which issue(s) this PR fixes: N/A

Checklist

  • [x] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+13 -1118

0 comments

13 changed files

pr created time in 2 days