If you are wondering where the data of this site comes from, please visit https://api.github.com/users/pracucci/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.
Marco Pracucci (pracucci) - Grafana Labs, Italy - https://pracucci.com - Software Engineer at Grafana Labs

grafana/cortex-jsonnet 56

This repo has the jsonnet for deploying and also the mixin for monitoring Cortex

pracucci/node-cidr-matcher 21

Fast CIDR matcher. Given an input IPv4 or IPv6 address, it checks if it's inside a set of IP ranges, expressed in CIDR notation.

grafana/puppet-promtail 8

Deploy and configure Grafana's Promtail with Puppet

pracucci/php-on-kubernetes 8

Lessons learned running PHP on Kubernetes in production

pracucci/lokitool 5

Tooling for Grafana Loki

pracucci/elasticsearch-playstore 3

Google Play Store App Analytics importer for ElasticSearch

pracucci/alertmanager 1

Prometheus Alertmanager

pracucci/avalanche 1

Prometheus/OpenMetrics endpoint series generator for load testing.

pracucci/cortex 1

A multitenant, horizontally scalable Prometheus as a Service

pracucci/etcd 1

Distributed reliable key-value store for the most critical data of a distributed system

pull request comment cortexproject/cortex

Make `cortex_discarded_samples_total` Independent of the Replication Factor

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

LeviHarrison

comment created time in 9 minutes

PR opened cortexproject/cortex

Add CORTEX_CHECKOUT_PATH env variable to CI

Signed-off-by: Gábor Lipták gliptak@gmail.com

What this PR does:

Which issue(s) this PR fixes: Fixes #<issue number>

Checklist

  • [ ] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+15 -13

0 comments

1 changed file

pr created time in an hour

started slidevjs/slidev

started time in 5 hours

PR closed cortexproject/cortex

Add Kustomize Support (size/XXL)

Signed-off-by: Weifeng Wang qclaogui@gmail.com

What this PR does:

Add Kustomize Support

Kustomize background

Kustomize is a CNCF project that is part of Kubernetes. It's included in kubectl to allow users to customize their configurations without introducing templates.

Usage

kustomize encourages defining multiple variants - e.g. dev, staging and prod - as overlays on a common base.

It’s possible to create an additional overlay to compose these variants together - just declare the overlays as the bases of a new kustomization.

cortex-kustomize provides a common base for a blocks storage deployment to Kubernetes. Users should create variants using overlays to deploy Cortex in their own environments.

An overlay is just another kustomization, referring to the base, and referring to patches to apply to that base. This arrangement makes it easy to manage your configuration with git. The base could have files from an upstream repository managed by someone else. The overlays could be in a repository you own. Arranging the repo clones as siblings on disk avoids the need for git submodules (though that works fine, if you are a submodule fan).

Example

This is an example of monitoring Cortex by adding Prometheus and Grafana using kustomize.

  1. Create a development environment

    mkdir -p deploy/overlays/dev

  2. Create kustomization.yaml

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    namespace: cortex-monitoring-system
    
    resources:
    - github.com/qclaogui/cortex-kustomize/deploy/base/blocks?ref=main
    - add-grafana-dep.yaml
    - add-grafana-svc.yaml
    - add-retrieval-dep.yaml
    - add-retrieval-svc.yaml
    
    patchesStrategicMerge:
    - patch-nginx-svc.yaml
    
    images:
    - name: quay.io/cortexproject/cortex
      newTag: master-b6eea5f
    - name: minio/minio
      newTag: RELEASE.2021-06-17T00-10-46Z
    

File structure:

└── deploy
   └── overlays
       └── dev
           ├── add-grafana-dep.yaml
           ├── add-grafana-svc.yaml
           ├── add-retrieval-dep.yaml
           ├── add-retrieval-svc.yaml
           ├── kustomization.yaml
           └── patch-nginx-svc.yaml

  3. Deploy to a cluster

kustomize build deploy/overlays/dev | kubectl apply -f -

A more detailed example is available at https://github.com/qclaogui/cortex-kustomize-demo.

Happy Cortex!

Which issue(s) this PR fixes:

Checklist

  • [x] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+1489 -0

3 comments

35 changed files

qclaogui

pr closed time in 7 hours

pull request comment cortexproject/cortex

Add Kustomize Support

I have moved the files to a separate repo called cortex-kustomize.

qclaogui

comment created time in 14 hours

started gitpod-io/gitpod

started time in 14 hours

issue comment cortexproject/cortex

Deprecated endpoints in configs

@bboreham So the other got closed, but this is still not in the docs? I tried to follow PRs. Can you re-open this?

till

comment created time in 15 hours

issue closed thanos-io/thanos

compact: Ensure downsampled chunks are not larger than 120 samples; stream downsampling more

During bug fixing on https://github.com/thanos-io/thanos/pull/2528 I found that downsampling always encodes whatever is given in the block into huge chunks. This can lead to inefficiency during query time when only a small part of chunk data is needed, but store GW needs to fetch and decode everything.

See: https://github.com/thanos-io/thanos/blob/55cb8ca38b3539381dc6a781e637df15c694e50a/pkg/compact/downsample/downsample.go#L141

AC:

  • Downsampled chunks are not larger than 120 samples
  • Chunks are expanded on demand in iterator.
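
For illustration, a minimal Go sketch of the 120-samples-per-chunk rule from the acceptance criteria above, using Prometheus' chunkenc package; cutChunks and the sample type are hypothetical and this is not the actual Thanos downsampling code:

    package main

    import (
        "fmt"

        "github.com/prometheus/prometheus/tsdb/chunkenc"
    )

    // maxSamplesPerChunk mirrors the acceptance criterion above: never encode
    // more than 120 samples into a single chunk.
    const maxSamplesPerChunk = 120

    type sample struct {
        t int64
        v float64
    }

    // cutChunks re-encodes a flat list of samples into XOR chunks of at most
    // maxSamplesPerChunk samples each.
    func cutChunks(samples []sample) ([]chunkenc.Chunk, error) {
        var chunks []chunkenc.Chunk
        for start := 0; start < len(samples); start += maxSamplesPerChunk {
            end := start + maxSamplesPerChunk
            if end > len(samples) {
                end = len(samples)
            }
            c := chunkenc.NewXORChunk()
            app, err := c.Appender()
            if err != nil {
                return nil, err
            }
            for _, s := range samples[start:end] {
                app.Append(s.t, s.v)
            }
            chunks = append(chunks, c)
        }
        return chunks, nil
    }

    func main() {
        samples := make([]sample, 300)
        for i := range samples {
            samples[i] = sample{t: int64(i) * 1000, v: float64(i)}
        }
        chunks, _ := cutChunks(samples)
        fmt.Println("chunks:", len(chunks)) // 3 chunks: 120 + 120 + 60 samples
    }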

closed time in 17 hours

bwplotka

issue comment thanos-io/thanos

compact: Ensure downsampled chunks are not larger than 120 samples; stream downsampling more

Closing for now as promised, let us know if you need this to be reopened! 🤗

bwplotka

comment created time in 17 hours

Pull request review comment thanos-io/thanos

Block replication concurrently

 func (rs *replicationScheme) execute(ctx context.Context) error {
 		return availableBlocks[i].BlockMeta.MinTime < availableBlocks[j].BlockMeta.MinTime
 	})
+	// Replicate concurrently
+	var wg sync.WaitGroup
+	errChan := make(chan error)
+	finishChan := make(chan struct{})
+
 	for _, b := range availableBlocks {
-		if err := rs.ensureBlockIsReplicated(ctx, b.BlockMeta.ULID); err != nil {
-			return errors.Wrapf(err, "ensure block %v is replicated", b.BlockMeta.ULID.String())
+		wg.Add(1)
+		go func(b *metadata.Meta) {
+			defer wg.Done()
+			if err := rs.ensureBlockIsReplicated(ctx, b.BlockMeta.ULID); err != nil {
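
As a side note, a common way to express this kind of fan-out while still propagating the first error is golang.org/x/sync/errgroup; the sketch below uses hypothetical names (blockMeta, replicator, fakeReplicator) and is not the actual Thanos patch:

    package main

    import (
        "context"
        "fmt"

        "golang.org/x/sync/errgroup"
    )

    // blockMeta and replicator are hypothetical stand-ins for the real Thanos
    // types; only the concurrency pattern is the point here.
    type blockMeta struct{ ULID string }

    type replicator interface {
        ensureBlockIsReplicated(ctx context.Context, id string) error
    }

    // replicateAll replicates blocks concurrently, bounds the number of
    // in-flight replications, and returns the first error once all goroutines
    // have finished.
    func replicateAll(ctx context.Context, rs replicator, blocks []blockMeta, concurrency int) error {
        g, ctx := errgroup.WithContext(ctx)
        sem := make(chan struct{}, concurrency) // simple concurrency limiter

        for _, b := range blocks {
            b := b // capture the loop variable for the goroutine
            g.Go(func() error {
                sem <- struct{}{}
                defer func() { <-sem }()
                if err := rs.ensureBlockIsReplicated(ctx, b.ULID); err != nil {
                    return fmt.Errorf("ensure block %v is replicated: %w", b.ULID, err)
                }
                return nil
            })
        }
        // Wait returns the first non-nil error after all goroutines finish.
        return g.Wait()
    }

    type fakeReplicator struct{}

    func (fakeReplicator) ensureBlockIsReplicated(context.Context, string) error { return nil }

    func main() {
        blocks := []blockMeta{{ULID: "01A"}, {ULID: "01B"}, {ULID: "01C"}}
        fmt.Println(replicateAll(context.Background(), fakeReplicator{}, blocks, 2)) // <nil>
    }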

Thanks for the comment. Working on these.

Hangzhi

comment created time in 17 hours

issue comment thanos-io/thanos

query: exemplar support file-sd

@yeya24 Your work on exemplars is great, thank you. file-sd is very useful in some scenarios: some cloud providers place deployments in different k8s clusters, so a k8s headless service cannot be used. I deploy a Thanos cluster (with 70+ sidecars) and use --store.sd-files to discover the sidecar endpoints. Recently I tried to introduce the exemplar feature into our metrics system, but --exemplar cannot use file-sd. I have tried to add support on a fork branch by repeating the store.file-sd code.

When #4282 is merged, adding file-sd support will be easier.

hanjm

comment created time in 18 hours

issue comment cortexproject/cortex

Cortex can read rules but doesn't activate them

@pracucci

In my S3 (MinIO) monitoring bucket I am getting 0; I am not sure whether it is the tenant ID or not.

[image]

And I modified the ruler YAML, but no luck.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cortex-ruler-configmap
  namespace: monitoring
data:
  rules.yml: |-
    groups:
      - name: "centralmonitoring"
        rules:
          - alert: "PrometheusDown"
            annotations:
              message: Prometheus replica in cluster {{$labels.cluster}} has disappeared.
            expr: sum(up{cluster!="", pod=~"prometheus.*"}) by (cluster) < 3
            for: 15s
            labels:
              severity: critical
              category: metrics
          - alert: "TooManyPods"
            annotations:
              message: Too many pods in cluster {{$labels.cluster}} on node {{$labels.instance}}
            expr: sum by(cluster,instance) (kubelet_running_pods{cluster!="",instance!=""}) > 5
            for: 15s
            labels:
              severity: warning
              category: metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ruler
spec:
  replicas: 1
  selector:
    matchLabels:
      name: ruler
  template:
    metadata:
      labels:
        name: ruler
    spec:
      containers:
      - name: ruler
        image: quay.io/cortexproject/cortex:v1.9.0
        imagePullPolicy: IfNotPresent
        args:
        - -target=ruler
        - -log.level=debug
        - -server.http-listen-port=80
        - -ruler.configs.url=http://configs.monitoring.svc.cluster.local:80
        - -ruler.alertmanager-url=http://alertmanager.monitoring.svc.cluster.local:9093
        - -ruler-storage.backend=local
        - -ruler-storage.local.directory=/etc/cortex/rules/0
        - -ruler.rule-path=/rules
        - -consul.hostname=consul.monitoring.svc.cluster.local:8500
        - -s3.url=s3://admin:admin2675@172.31.40.72:9000/monitoring
        - -s3.force-path-style=true
        - -dynamodb.url=dynamodb://user:pass@dynamodb.monitoring.svc.cluster.local:8000
        - -schema-config-file=/etc/cortex/schema.yaml
        - -store.chunks-cache.memcached.addresses=memcached.monitoring.svc.cluster.local:11211
        - -store.chunks-cache.memcached.timeout=100ms
        - -store.chunks-cache.memcached.service=memcached
        - -distributor.replication-factor=1
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /etc/cortex
          name: config
        - mountPath: /etc/cortex/rules/0
          name: alert
        - mountPath: /rules
          name: rules
      volumes:
        - configMap:
            name: schema-config
          name: config
        - configMap:
            name: cortex-ruler-configmap
          name: alert
        - emptyDir: {}
          name: rules
  • Error
[root@ip-172-31-40-72 monitoring]# oc exec -it ruler-7fb94dd7d7-8t6qc sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # ls -ltr /rules/
total 0
/ # ls -ltr /etc/cortex/rules/0/
total 0
lrwxrwxrwx    1 root     root            16 Jun 19 00:42 rules.yml -> ..data/rules.yml
/ # exit
[root@ip-172-31-40-72 monitoring]# oc logs -f ruler-7fb94dd7d7-8t6qc
level=info ts=2021-06-19T00:42:53.016332668Z caller=main.go:188 msg="Starting Cortex" version="(version=1.9.0, branch=HEAD, revision=ed4f339)"
level=info ts=2021-06-19T00:42:53.017559897Z caller=server.go:239 http=[::]:80 grpc=[::]:9095 msg="server listening on addresses"
level=debug ts=2021-06-19T00:42:53.019175345Z caller=api.go:128 msg="api: registering route" methods=GET path=/config auth=false
level=debug ts=2021-06-19T00:42:53.021682252Z caller=api.go:128 msg="api: registering route" methods=GET path=/ auth=false
level=debug ts=2021-06-19T00:42:53.021814945Z caller=api.go:128 msg="api: registering route" methods=GET path=/debug/fgprof auth=false
level=debug ts=2021-06-19T00:42:53.021959288Z caller=api.go:128 msg="api: registering route" methods=GET path=/memberlist auth=false
level=debug ts=2021-06-19T00:42:53.023008694Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ingester/ring auth=false
level=debug ts=2021-06-19T00:42:53.023055416Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ring auth=false
level=warn ts=2021-06-19T00:42:53.023883349Z caller=experimental.go:19 msg="experimental feature in use" feature="DNS-based memcached service discovery"
level=info ts=2021-06-19T00:42:53.030612161Z caller=mapper.go:46 msg="cleaning up mapped rules directory" path=/rules
level=debug ts=2021-06-19T00:42:53.030753864Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ruler/ring auth=false
level=debug ts=2021-06-19T00:42:53.030790066Z caller=api.go:128 msg="api: registering route" methods=POST path=/ruler/delete_tenant_config auth=true
level=debug ts=2021-06-19T00:42:53.030835253Z caller=api.go:128 msg="api: registering route" methods=GET,POST path=/ruler_ring auth=false
level=debug ts=2021-06-19T00:42:53.030860276Z caller=api.go:128 msg="api: registering route" methods=GET path=/ruler/rule_groups auth=false
level=debug ts=2021-06-19T00:42:53.030909584Z caller=api.go:128 msg="api: registering route" methods=GET path=/services auth=false
level=info ts=2021-06-19T00:42:53.031585317Z caller=module_service.go:59 msg=initialising module=server
level=debug ts=2021-06-19T00:42:53.031690112Z caller=module_service.go:49 msg="module waiting for initialization" module=ring waiting_for=memberlist-kv
level=debug ts=2021-06-19T00:42:53.031718025Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=memberlist-kv
level=debug ts=2021-06-19T00:42:53.031758011Z caller=module_service.go:49 msg="module waiting for initialization" module=store waiting_for=server
level=debug ts=2021-06-19T00:42:53.031776355Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=distributor-service
level=info ts=2021-06-19T00:42:53.031791076Z caller=module_service.go:59 msg=initialising module=store
level=debug ts=2021-06-19T00:42:53.031587773Z caller=module_service.go:49 msg="module waiting for initialization" module=memberlist-kv waiting_for=server
level=info ts=2021-06-19T00:42:53.031937836Z caller=module_service.go:59 msg=initialising module=memberlist-kv
level=debug ts=2021-06-19T00:42:53.032078255Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=ring
level=debug ts=2021-06-19T00:42:53.032146949Z caller=module_service.go:49 msg="module waiting for initialization" module=ring waiting_for=server
level=info ts=2021-06-19T00:42:53.032254539Z caller=module_service.go:59 msg=initialising module=ring
level=debug ts=2021-06-19T00:42:53.047187773Z caller=module_service.go:49 msg="module waiting for initialization" module=distributor-service waiting_for=server
level=info ts=2021-06-19T00:42:53.047280661Z caller=module_service.go:59 msg=initialising module=distributor-service
level=debug ts=2021-06-19T00:42:53.047490102Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=memberlist-kv
level=debug ts=2021-06-19T00:42:53.047527989Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=ring
level=debug ts=2021-06-19T00:42:53.047539583Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=server
level=debug ts=2021-06-19T00:42:53.047727464Z caller=module_service.go:49 msg="module waiting for initialization" module=ruler waiting_for=store
level=info ts=2021-06-19T00:42:53.047738286Z caller=module_service.go:59 msg=initialising module=ruler
level=info ts=2021-06-19T00:42:53.047768984Z caller=ruler.go:438 msg="ruler up and running"
level=debug ts=2021-06-19T00:42:53.047783448Z caller=ruler.go:476 msg="syncing rules" reason=initial
level=info ts=2021-06-19T00:42:53.047888139Z caller=cortex.go:414 msg="Cortex started"
level=debug ts=2021-06-19T00:43:53.048652595Z caller=ruler.go:476 msg="syncing rules" reason=periodic
level=debug ts=2021-06-19T00:44:53.047992537Z caller=ruler.go:476 msg="syncing rules" reason=periodic
level=debug ts=2021-06-19T00:45:53.048070935Z caller=ruler.go:476 msg="syncing rules" reason=periodic

NOTE: I tried with both 0 and fake, but the result is the same :(

jakubgs

comment created time in a day

pull request comment thanos-io/thanos

Replaces errutil with tools/pkg/merrors; Fixes very ugly misuse of multi-error

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

bwplotka

comment created time in a day

issue closed cortexproject/cortex

compactor went into tailspin

Describe the bug

Consul was down at the time compactor started, and it never recovered:

level=info ts=2021-03-04T20:23:01.350819279Z caller=main.go:188 msg="Starting Cortex" version="(version=1.6.0, branch=master, revision=56f794d)"
level=info ts=2021-03-04T20:23:01.35627101Z caller=module_service.go:59 msg=initialising module=server
level=info ts=2021-03-04T20:23:01.358032071Z caller=module_service.go:59 msg=initialising module=compactor
level=info ts=2021-03-04T20:23:01.351964058Z caller=server.go:229 http=[::]:80 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2021-03-04T20:23:01.356491008Z caller=module_service.go:59 msg=initialising module=memberlist-kv
level=info ts=2021-03-04T20:23:01.36340712Z caller=compactor.go:373 component=compactor msg="waiting until compactor is ACTIVE in the ring"
level=info ts=2021-03-04T20:23:01.366220882Z caller=lifecycler.go:527 msg="not loading tokens from file, tokens file path is empty"
level=error ts=2021-03-04T20:23:01.391200499Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.394262836Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.391467754Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.399955867Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.397642587Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.402454173Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.404896477Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.407444107Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.410901197Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.413247689Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.4157827Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:02.848432818Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:06.895736201Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:14.397161217Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:25.078769837Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:49.041899802Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:24:35.634427793Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:25:35.127795679Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:26:37.176005708Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:27:42.508689928Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:28:35.350979964Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:29:41.117751261Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:30:39.87687265Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=info ts=2021-03-04T20:31:36.036454597Z caller=client.go:247 msg="value is nil" key=compactor index=22
level=info ts=2021-03-04T20:31:36.930199939Z caller=client.go:247 msg="value is nil" key=compactor index=24
level=info ts=2021-03-04T20:31:37.932395975Z caller=client.go:247 msg="value is nil" key=compactor index=25
level=info ts=2021-03-04T20:31:41.887641219Z caller=client.go:247 msg="value is nil" key=compactor index=28
level=info ts=2021-03-04T20:31:41.941646251Z caller=client.go:247 msg="value is nil" key=compactor index=29
level=info ts=2021-03-04T20:31:46.926493102Z caller=client.go:247 msg="value is nil" key=compactor index=31
level=info ts=2021-03-04T20:31:46.967013886Z caller=client.go:247 msg="value is nil" key=compactor index=32
...
level=info ts=2021-03-05T10:12:35.156472897Z caller=client.go:247 msg="value is nil" key=compactor index=78733
level=info ts=2021-03-05T10:12:35.455228914Z caller=client.go:247 msg="value is nil" key=compactor index=78734
level=info ts=2021-03-05T10:12:36.157313862Z caller=client.go:247 msg="value is nil" key=compactor index=78736
level=info ts=2021-03-05T10:12:37.157305834Z caller=client.go:247 msg="value is nil" key=compactor index=78738

We have compactor sharding turned on:

        - -compactor.ring.consul.hostname=consul.cortex.svc.cluster.local:8500
        - -compactor.ring.prefix=
        - -compactor.ring.store=consul
        - -compactor.sharding-enabled=true

Expected behavior

I think it should exit with an error in this situation; crashlooping would make the fault more obvious to the operator, and after a few restarts it would have managed to talk to Consul in my case.
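
For illustration, a minimal Go sketch of the fail-fast behaviour suggested above: retry the ring KV store a bounded number of times and then return an error so the process exits and crash-loops. waitForRing and the check function are hypothetical stand-ins, not actual Cortex code:

    package main

    import (
        "context"
        "fmt"
        "time"
    )

    // waitForRing retries a KV lookup a bounded number of times instead of
    // forever, so a persistent failure surfaces as a process exit.
    func waitForRing(ctx context.Context, check func(ctx context.Context) error) error {
        const maxAttempts = 10
        var lastErr error
        for attempt := 1; attempt <= maxAttempts; attempt++ {
            if lastErr = check(ctx); lastErr == nil {
                return nil
            }
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(time.Duration(attempt) * time.Second): // simple linear backoff
            }
        }
        return fmt.Errorf("ring KV store unreachable after %d attempts: %w", maxAttempts, lastErr)
    }

    func main() {
        err := waitForRing(context.Background(), func(context.Context) error {
            return fmt.Errorf("dial tcp: lookup consul: no such host")
        })
        fmt.Println(err) // the real binary would exit non-zero here, making the crash loop visible
    }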

closed time in a day

bboreham

Pull request review comment thanos-io/thanos

support build docker images for arm64 arch

+ARG OS="linux"

Agree 100%

daixiang0

comment created time in a day

pull request comment thanos-io/thanos

Fix for parsing port at log middleware

Thanks for the fix. This PR is worth a unit test. Would you mind adding one?

Added a unit test that uses a URL that does not explicitly contain a port number.
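
For context, a minimal Go sketch (runnable with go test) of the kind of port-parsing fallback and test this refers to; hostPort and the test below are hypothetical, not the actual Thanos middleware code:

    package middleware

    import (
        "net"
        "testing"
    )

    // hostPort splits addr into host and port, tolerating addresses that do
    // not explicitly contain a port.
    func hostPort(addr string) (host, port string) {
        host, port, err := net.SplitHostPort(addr)
        if err != nil {
            // No explicit port (e.g. "example.com"): treat the whole address as the host.
            return addr, ""
        }
        return host, port
    }

    func TestHostPortWithoutExplicitPort(t *testing.T) {
        if host, port := hostPort("example.com"); host != "example.com" || port != "" {
            t.Fatalf("unexpected result: host=%q port=%q", host, port)
        }
        if host, port := hostPort("example.com:8080"); host != "example.com" || port != "8080" {
            t.Fatalf("unexpected result: host=%q port=%q", host, port)
        }
    }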

spaparaju

comment created time in a day

Pull request review comment cortexproject/cortex

Proposal for time series deletion with block storage

 Using block store, the different caches available are:
 - Chunks cache (stores the potentially to be deleted chunks of data)
 - Query results cache (stores the potentially to be deleted data)

-Using the tombstones, the queriers filter out the data received from the ingesters and store-gateway. The cache not being processed through the querier needs to be invalidated to prevent deleted data from coming up in queries. There are two potential caches that could contain deleted data, the chunks cache, and the query results cache.
+There are two potential caches that could contain deleted data, the chunks cache, and the query results cache. Using the tombstones, the queriers filter out the data received from the ingesters and store-gateway. The cache not being processed through the querier needs to be invalidated to prevent deleted data from coming up in queries.

-Firstly, the query results cache needs to be invalidated for each new delete request. This can be done using the same mechanism currently used for chunk storage by utilizing the cache generation numbers. For each tenant, their cache is prefixed with a cache generation number. This is already implemented into the middleware and would be easy to use for invalidating the cache. When the cache needs to be invalidated due to a delete or cancel delete request, the cache generation numbers would be increased (to the current timestamp), which would invalidate all the cache entries for a given tenant. The cache generation numbers are currently being stored in an Index table (e.g. DynamoDB or Bigtable). One option for block store is to store a per tenant key using the KV-store with the ring backend and propogate it using a Compare-And-Set/Swap (CAS) operation. If the current cache generation number is older than the KV-store is older or it is empty, then the cache is invalidated and the current timestamp becomes the cache generation number.
+Firstly, the query results cache needs to be invalidated for each new delete request. This can be done using the same mechanism currently used for chunk storage by utilizing the cache generation numbers. For each tenant, their cache is prefixed with a cache generation number. This is already implemented into the middleware and would be easy to use for invalidating the cache. When the cache needs to be invalidated due to a delete or cancel delete request, the cache generation numbers would be increased (to the current timestamp), which would invalidate all the cache entries for a given tenant. With chunk store, the cache generation numbers are currently being stored in an Index table (e.g. DynamoDB or Bigtable). One option for block store is to save a per tenant key using the KV-store with the ring backend and propagate it using a Compare-And-Set/Swap (CAS) operation. If the current cache generation number is older than the KV-store is older or it is empty, then the cache is invalidated and the current timestamp becomes the cache generation number.

@pracucci - I was thinking of an alternative approach.

The purger would write new tombstones to the 3 ingesters assigned to a user and wait until at least 2 succeed. While executing a query, a querier would fetch the tombstones for a user from all ingesters in the cluster. If the tombstoneTimestamp > currentCacheGenNumber, the querier would update the currentCacheGenNumber to currentTimestamp.
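
To make the cache-generation idea above concrete, here is a minimal Go sketch of propagating a per-tenant cache generation number with a CAS operation; kvStore, memKV, and the key layout are hypothetical stand-ins, not the real Cortex KV client:

    package main

    import (
        "context"
        "fmt"
        "strconv"
        "time"
    )

    // kvStore is a hypothetical stand-in for the ring KV client; only the
    // CAS-based propagation of a per-tenant cache generation number is shown.
    type kvStore interface {
        CAS(ctx context.Context, key string, f func(current string) (next string, err error)) error
    }

    // bumpCacheGenNumber advances the tenant's cache generation number to
    // "now", which invalidates every previously cached result for that tenant.
    func bumpCacheGenNumber(ctx context.Context, kv kvStore, tenantID string) error {
        key := "cache-gen/" + tenantID // hypothetical key layout
        return kv.CAS(ctx, key, func(current string) (string, error) {
            cur, _ := strconv.ParseInt(current, 10, 64)
            now := time.Now().Unix()
            if cur >= now {
                return current, nil // never move the generation number backwards
            }
            return strconv.FormatInt(now, 10), nil
        })
    }

    // memKV is a trivial in-memory stand-in so the sketch runs.
    type memKV map[string]string

    func (m memKV) CAS(_ context.Context, key string, f func(string) (string, error)) error {
        next, err := f(m[key])
        if err != nil {
            return err
        }
        m[key] = next
        return nil
    }

    func main() {
        kv := memKV{}
        _ = bumpCacheGenNumber(context.Background(), kv, "tenant-1")
        fmt.Println(kv["cache-gen/tenant-1"]) // current unix timestamp
    }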

ilangofman

comment created time in a day

Public event

PR opened cortexproject/cortex

Add ability to support strict JSON unmarshal for `limits`

What this PR does: Adds the ability to support strict JSON unmarshal for the limits struct. This is the default behaviour for YAML now:

	in := `{"unknown_fields": 100}`
	l := validation.Limits{}

	fmt.Println(yaml.UnmarshalStrict([]byte(in), &l))

	// yaml: unmarshal errors:
	// line 1: field unknown_fields not found in type validation.plain

This PR adds the same behaviour when unmarshalling from JSON input as well.
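
One standard way to get this behaviour in Go is json.Decoder with DisallowUnknownFields; the sketch below illustrates the mechanism with a stand-in limits struct and is not necessarily how the PR implements it:

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
    )

    // limits is a stand-in for validation.Limits; only the strict-decoding
    // mechanism matters here, not the actual Cortex struct.
    type limits struct {
        IngestionRate float64 `json:"ingestion_rate"`
    }

    // strictUnmarshalJSON decodes data into v and rejects any field that is
    // not present in the target struct.
    func strictUnmarshalJSON(data []byte, v interface{}) error {
        dec := json.NewDecoder(bytes.NewReader(data))
        dec.DisallowUnknownFields()
        return dec.Decode(v)
    }

    func main() {
        in := []byte(`{"unknown_fields": 100}`)
        l := limits{}
        fmt.Println(strictUnmarshalJSON(in, &l))
        // json: unknown field "unknown_fields"
    }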

Which issue(s) this PR fixes: Fixes NA

Checklist

  • [x] Tests updated
  • [ ] Documentation added
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
+15 -1

0 comments

2 changed files

pr created time in a day

PR opened cortexproject/cortex

chore: Define interface api.Distributor

What this PR does:

This allows reusing the API handler while swapping out the Distributor implementation.

In my use case I would like to use NewQuerierHandler with my own queryable, but not provide a "real" distributor implementation.
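
As an illustration of the pattern, a minimal Go sketch where a handler depends on a small Distributor interface instead of the concrete type; the method set, WriteRequest/WriteResponse, and noopDistributor are hypothetical, not the actual interface added by the PR:

    package main

    import (
        "context"
        "fmt"
    )

    // The handler depends on this interface rather than the concrete
    // distributor type, so a caller can plug in its own implementation
    // (e.g. when only a custom queryable is needed).
    type WriteRequest struct{}
    type WriteResponse struct{}

    type Distributor interface {
        Push(ctx context.Context, req *WriteRequest) (*WriteResponse, error)
    }

    type Handler struct{ distributor Distributor }

    // NewHandler accepts any Distributor implementation, including test fakes.
    func NewHandler(d Distributor) *Handler { return &Handler{distributor: d} }

    // noopDistributor satisfies the interface without a "real" distributor.
    type noopDistributor struct{}

    func (noopDistributor) Push(context.Context, *WriteRequest) (*WriteResponse, error) {
        return &WriteResponse{}, nil
    }

    func main() {
        h := NewHandler(noopDistributor{})
        resp, err := h.distributor.Push(context.Background(), &WriteRequest{})
        fmt.Println(resp != nil, err) // true <nil>
    }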

Signed-off-by: Christian Simon simon@swine.de

Checklist

  • ~[ ] Tests updated~
  • ~[ ] Documentation added~
  • ~[ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]~
+7 -3

0 comments

2 changed files

pr created time in a day

pull request comment cortexproject/cortex

Add support for running cortex through docker image without root uid

This PR looks in good shape. @simonswine do you have any plan to progress on it?

@pracucci thanks for the reminder, the outstanding work is to document the way to enable it for Kubernetes / docker run. Will hopefully get a chance to do that later today or early next week

simonswine

comment created time in a day

started pracucci/www-pracucci

started time in a day

started pracucci/alertmanager

started time in a day

started pracucci/prometheus

started time in a day

started pracucci/avalanche

started time in a day

started pracucci/cortex

started time in a day

issue comment thanos-io/thanos

katacoda: Add a tutorial to demonstrate Receiver modes (Ingester, Router)

This is blocked on https://github.com/thanos-io/thanos/issues/4359

kakkoyun

comment created time in a day

issue opened thanos-io/thanos

receive: 'quorum not reached' when --receive.replication-factor > 1

Thanos, Prometheus and Golang version used:

  • Thanos: HEAD
  • Prometheus: v2.26.0
  • Golang: go version go1.16.4 linux/amd64

Object Storage Provider: N/A

What happened:

Running the TestReceive/receive_distributor_ingestor_mode e2e test, but specifying --receive.replication-factor=2 in the routing component throws a huge number of quorum not reached errors (details below), and the data is not replicated beyond one ingesting component.

What you expected to happen:

Data should be replicated twice per source across 'receive' instances.

How to reproduce it (as minimally and precisely as possible):

See https://github.com/thanos-io/thanos/pull/4358 for the minimal failing test case.

Pull the branch, and run:

make docker && go test ./test/e2e -run TestReceive/receive_distributor_ingestor_mode -v

Full logs to relevant components:

Routing logs:


09:44:41 receive-d1: level=error name=receive-d1 ts=2021-06-18T08:44:41.943443895Z caller=handler.go:352 component=receive component=receive-handler 
      err="3 errors: 
          replicate write request for endpoint receive_distributor_ingestor_mode-receive-i1:9091: quorum not reached: 
          2 errors: forwarding request to endpoint receive_distributor_ingestor_mode-receive-i2:9091: rpc error: code = InvalidArgument 
          desc = replica count exceeds replication factor; forwarding request to endpoint receive_distributor_ingestor_mode-receive-i1:9091: rpc error: code = AlreadyExists 
          desc = store locally for endpoint receive_distributor_ingestor_mode-receive-i1:9091: conflict; 
           
          replicate write request for endpoint receive_distributor_ingestor_mode-receive-i2:9091: quorum not reached: 
          2 errors: forwarding request to endpoint receive_distributor_ingestor_mode-receive-i3:9091: rpc error: code = InvalidArgument 
          desc = replica count exceeds replication factor; forwarding request to endpoint receive_distributor_ingestor_mode-receive-i2:9091: rpc error: code = AlreadyExists 
          desc = store locally for endpoint receive_distributor_ingestor_mode-receive-i2:9091: conflict; 
           
          replicate write request for endpoint receive_distributor_ingestor_mode-receive-i3:9091: quorum not reached: 
          2 errors: forwarding request to endpoint receive_distributor_ingestor_mode-receive-i1:9091: rpc error: code = InvalidArgument 
          desc = replica count exceeds replication factor; forwarding request to endpoint receive_distributor_ingestor_mode-receive-i3:9091: rpc error: code = AlreadyExists 
          desc = store locally for endpoint receive_distributor_ingestor_mode-receive-i3:9091: conflict" msg="internal server error"

Anything else we need to know:

Recently we've implemented 'dual mode' (https://github.com/thanos-io/thanos/pull/4231) in receiver, whereby it can run in the following modes:

  • routing - forward remote_write requests only.
  • ingesting - ingest metrics data only, no routing.
  • routing & ingesting - each receive instance both forwards and ingests data - this led to the split proposal.

The current e2e tests do not exercise this configuration.

Why is this error happening?

When an ingesting-only component is started up, it gets the following defaults:

  • --receive.replication-factor set to 1
  • SingleNodeHashring here

When a routing component configured with --receive.replication-factor=2 routes a remote_write request to two ingesting components, the first one will succeed and the second one will fail.

  • In the first request, replica = 1 in the storepb.WriteRequest; since data-replication is 1 in the ingester, it accepts the request.

  • In the second request, replica = 2 in the storepb.WriteRequest; since data-replication is 1 in the ingester, it does not accept the request.

Since the second request always fails, the router can never reach quorum.
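
A back-of-the-envelope Go sketch of the quorum arithmetic described above (not the actual Thanos receive code):

    package main

    import "fmt"

    // writeQuorum is the usual majority rule: more than half of the
    // replicated writes must succeed.
    func writeQuorum(replicationFactor int) int {
        return replicationFactor/2 + 1
    }

    func main() {
        rf := 2
        fmt.Println("successful writes needed:", writeQuorum(rf)) // 2

        // With --receive.replication-factor=2 both replicated writes must
        // succeed, but the ingesters run with replication factor 1, so the
        // request carrying replica=2 is always rejected: at most 1 of 2
        // writes succeeds and quorum (2) is never reached.
        succeeded := 1
        fmt.Println("quorum reached:", succeeded >= writeQuorum(rf)) // false
    }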

Can we not just set a high replication factor in each of the ingesters?

This doesn't work either, because by default they are started up with a SingleNodeHashring, so if they receive a storepb.WriteRequest with replica=2, they cannot accept this because they think they only have one node in their hashring.

What is the solution?

Chatting with @squat earlier today, we thought that we could remove the strict check an ingester makes before accepting the storepb.WriteRequest, which has implications we will discuss in the PR.

created time in a day

release grafana/cortex-tools

v0.10.2

released time in 2 days