Tamara Boehm b1tamara @sap-cloudfoundry Walldorf, Germany

b1tamara/cf-deployment 0

The canonical open source deployment manifest for Cloud Foundry

b1tamara/cf-for-k8s 0

The open source deployment manifest for Cloud Foundry on Kubernetes

b1tamara/cf-k8s-networking 0

building a cloud foundry without gorouter....

b1tamara/haproxy-boshrelease 0

A BOSH release for haproxy (based on cf-release's haproxy job)

issue comment cloudfoundry/cf-for-k8s

As an operator I would like my apps to stay online during Kubernetes upgrades

I see this as an opportunity to improve what CF does with regard to HA. Keeping "at least one app instance up and running" doesn't consider the load the app may be experiencing and could still result in downtime if the app is under high load. I also see the argument for keeping the status quo, but this might be an opportunity to do better.

braunsonm

comment created time in 34 minutes

issue comment cloudfoundry/cf-for-k8s

As an operator I would like my apps to stay online during Kubernetes upgrades

Reading earlier comments in this thread, I came here to say what @9numbernine9 has meanwhile said: I believe a good (IMHO: the best) default is to do what Diego does, because people moving over from CF-for-VMs won't expect a behavior change when it comes to CF app scaling. If we later want to add flexibility by considering something like dev vs. prod foundations, that's fine, but I'd advocate for keeping the status quo first.

Re-reading @9numbernine9's comment, I'm not sure whether it suggests keeping the exact same Diego behavior or merely keeping at least one app instance up and running to serve requests. As mentioned above, my strong preference would be to retain the Diego behavior.

braunsonm

comment created time in an hour

push event cloudfoundry/cf-for-k8s

Dave Walter

commit sha bf724363899931668a1b546aa87d90a87de7462e

CI-FIX: Update the slack notification to be more helpful ... when there aren't any claimed environments.

Dave Walter

commit sha f21aede78475f578e2684d8cc28e04a5f47eb70a

CI-FIX: Remove the unused color codes from the slack message

push time in 8 hours

push event cloudfoundry/cf-for-k8s

relint-ci

commit sha f7a4510bcb97313282ae033c3705f6c4959fa1fd

Autobump buildpacks

push time in 8 hours

push event cloudfoundry/cf-for-k8s

relint-ci

commit sha 51f19e0141bb832eb4e18c2be0eee54b7ab98425

Bump capi-k8s-release to 1c62ed4e700eb0ce89821f1a4e90a9957f6ab6ba

relint-ci

commit sha e14e2122ee57bc58147925b7d94df421e9267a09

Autobump stack images

Andrew Wittrock

commit sha 2ecdc51eb9a0233634a856f6439f77b6b4ef6a28

Update kbld minimum version to latest release. [kbld v0.28.0](https://github.com/vmware-tanzu/carvel-kbld/releases/tag/v0.28.0) Signed-off-by: James Pollard <pollardja@vmware.com>

relint-ci

commit sha 2da32e0c9fb011f826939fcea19338b7c35d4c58

Bump uaa to v74.31.0

Andrew Wittrock

commit sha a9ac8d0f749867bba5728fa103a98a0cd40058fc

Add leading 'v' character to CI version - Merged PR function had invalid ref with semver resource.

relint-ci

commit sha 61edddf5a739d7b1e7ecee742fc2bd2bc112127d

Autobump stack images

relint-ci

commit sha 9e96768d31ac0b9313db218adc496c576f3d520f

Autobump stack images

Andrew Wittrock

commit sha f1e382e134c2f9929bc764ec407c15d5dc009237

Changed merged PRs pattern match

relint-ci

commit sha 04904218627c653fe12cb9856b4d09b4c6f7d2df

Autobump buildpacks

relint-ci

commit sha 76aa540eee3a91bf40dbb726e7b670592109dbc1

Bump capi-k8s-release to e592f00bb4428b6e7efd15854c4082114da307eb

relint-ci

commit sha ffb4d36efecccee4ec82661393efaf87921058cd

Autobump buildpacks

relint-ci

commit sha ca44e5609a311e55ead39279da3e3f48ec4a8bdd

Autobump stack images

James Pollard

commit sha d441e6a3ef3dec59d188b1ecc6cc3ad2c0ca3fe9

CI-MAINT: bump dependencies of KinD tests - kind v0.8.0 -> v0.9.0 - ginkgo v1.110.0 -> v1.14.2 - add retry to k14s download [#174887425](https://www.pivotaltracker.com/story/show/174887425)

James Pollard

commit sha 89c24f4e5d2154e570b406c6df59fff018df4d8a

ENH: be consistent about python yq usage - removed yq usage from the getting-started-tutorial [#176520180](https://www.pivotaltracker.com/story/show/176520180)

James Pollard

commit sha 08f094cd4a02c52277fb9c364157beb7fd06d57d

DOC: add info about app capacity on small local cf-for-k8s clusters [#175401585](https://www.pivotaltracker.com/story/show/175401585)

James Pollard

commit sha b39e576a383831f81562c99478175e232f83065e

Merge pull request #612 from cloudfoundry/ci-maint/bump-kind-dependencies CI-MAINT: bump dependencies of KinD tests

James Pollard

commit sha f63e71a969768b04599b5784ad3b1ec1c7e88af5

ENH: default to 2 replicas for istiod for non-local clusters Fixes #604 [#176366929](https://www.pivotaltracker.com/story/show/176366929) Co-authored-by: Andrew Wittrock <awittrock@vmware.com>

relint-ci

commit sha b13ef2349a712e2d80eabc82e288e1beeda80dbf

Autobump buildpacks

relint-ci

commit sha a4b2fc310bea112ed50394d5072903251633a5f0

Autobump buildpacks

Jaskanwal Pawar

commit sha 97a0156c110cb9c0a58c01fa665a74cdf41bbaf0

Migrate CAKE pipeline

Mostly lifted and shifted, with these exceptions:
- Tasks to deploy CF were removed in favor of our tasks
- Tasks to provision a persistent-ish cluster to use for testing were removed in favor of using our pools and their associated tasks
- Removed jobs related to building the `backup-metadata-generator` stuff since it looks like it is unused (will bring back as needed, though)
- CAKE tasks which we needed were more or less inlined in `ci/tasks/cake`
  - We changed the image some of them used to one of our CI images instead
  - TODO: add `.sh` extensions to their task scripts to conform to our conventions
- Had to add some peripheral stuff in order to get the main job to work with our pools:
  - Had to generate a config file for the BARAS to use
  - Had to ensure some dependencies in images were installed, e.g. the log-cache plugin for the cf CLI

[#176354661]

Co-authored-by: Andrew Wittrock <awittrock@vmware.com> Co-authored-by: Jaskanwal Pawar <jpawar@pivotal.io> Co-authored-by: James Pollard <pollardja@vmware.com>

push time in 8 hours

push event cloudfoundry/cf-for-k8s

Andrew Wittrock

commit sha 80dfc9236e108121aebf284a3273608d45f71cd9

WIP - exploratory for building all images.

push time in 9 hours

push event cloudfoundry/cf-for-k8s

Jaskanwal Pawar

commit sha 405cbbc98132cfd0a3f44f9a04eafd3c9cc5f1b8

remove last vestiges of samus Authored-by: Jaskanwal Pawar <jpawar@pivotal.io>

push time in 9 hours

push event cloudfoundry/cf-for-k8s

Jaskanwal Pawar

commit sha ecec743a54e36fa99be853ba49b70fa5a2380f2b

don't touch backup-metadata stuff on promotion - since we're not building its image anymore Authored-by: Jaskanwal Pawar <jpawar@pivotal.io>

push time in 11 hours

issue comment cloudfoundry/cf-for-k8s

As an operator I would like my apps to stay online during Kubernetes upgrades

@loewenstein In Diego-land, an app developer doesn't need to specify anything more than instances: 2 in their app manifest (or cf push --instances 2). My understanding is that when a Diego cell needs to be shut down for maintenance/upgrades, if all of an app's instances are on that one cell, they are terminated sequentially and started back up on a different cell (or cells), using the health check defined for that app to know when the new instances have been successfully migrated. Essentially, at least one instance of a given app is always guaranteed to be running as a result.

FYI, I'm a colleague of @braunsonm and a CF-for-VMs operator, in case you're wondering where I'm coming from. 😄 In our CF-for-VMs foundations I don't think we've ever seen an app suffer total failure during a CF upgrade if it was running 2+ instances; this kind of behaviour is ultimately what @braunsonm (and I, by extension!) are looking to have with cf-for-k8s.
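For concreteness, here is a minimal app manifest sketch of the kind of configuration being discussed; the app name and health-check endpoint are illustrative placeholders, not taken from this thread:

```yaml
# manifest.yml - minimal sketch of a CF app manifest with two instances.
# With Diego, at least one of these instances stays up while cells are
# drained during an upgrade, using the health check to gate the migration.
applications:
- name: my-app                          # hypothetical app name
  instances: 2
  memory: 256M
  health-check-type: http
  health-check-http-endpoint: /healthz  # illustrative endpoint
```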

braunsonm

comment created time in 12 hours

issue comment cloudfoundry/cf-for-k8s

As an operator I would like my apps to stay online during Kubernetes upgrades

@loewenstein Ah yeah, I still don't think that would be preferred. For instance, I might have an app with two instances but not really worry about downtime (perhaps it's a consumer for a RabbitMQ queue and not user-facing). I wouldn't want to make assumptions about what availability it needs just because some other app needs 75%.

I'd prefer not to expose PDBs either, and I'm not sure how Diego handles this. The only thing I can think of would be a new manifest property, minAvailable or similar, that accepts a number or a percentage like a PDB does; Eirini would then create the PDB for us.
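To make the idea concrete, here is a sketch of the kind of PodDisruptionBudget Eirini could generate from such a hypothetical minAvailable property. The apiVersion matches pre-1.21 clusters like the ones discussed in this thread, and the app_guid selector is an assumption about how Eirini labels app pods:

```yaml
# Sketch only: a per-app PDB that a hypothetical manifest property
# (e.g. minAvailable: 50%) could be translated into by Eirini.
apiVersion: policy/v1beta1          # policy/v1 on Kubernetes 1.21+
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb                  # hypothetical name
  namespace: cf-workloads
spec:
  minAvailable: 50%                 # number or percentage, as with any PDB
  selector:
    matchLabels:
      cloudfoundry.org/app_guid: "<app-guid>"  # assumed Eirini pod label
```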

braunsonm

comment created time in 13 hours

issue comment cloudfoundry/cf-for-k8s

As an operator I would like my apps to stay online during Kubernetes upgrades

@braunsonm Good point. I was thinking of dev foundations vs. prod foundations; with dev spaces and prod spaces in the same foundation, this of course looks different.

I'd still prefer not to expose PDBs to app developers. They shouldn't need to know anything about Pods or the details of Kubernetes node updates. How is this handled with Diego, BTW?

braunsonm

comment created time in 13 hours

issue comment cloudfoundry/cf-for-k8s

As an operator I would like my apps to stay online during Kubernetes upgrades

@loewenstein Hmm, I'm confused by your reply. What you said makes perfect sense, but that's exactly why I was thinking it's better as an individual app setting rather than a foundation-wide one. I don't care about some app in my dev space going offline during an upgrade (I don't need a PDB there), but for my prod app I would want control over the PDB. For exactly the reason you gave, apps may be under different loads and need different amounts of availability at any time.

braunsonm

comment created time in 13 hours

issue comment cloudfoundry/cf-for-k8s

As an operator I would like my apps to stay online during Kubernetes upgrades

@braunsonm One thought I had was that this might rather be a setting of the foundation instead of the individual app. If you run an app in production on 20 instances, you probably have a reason for it and would likely fail to keep the app available if you dropped below, say, 15 instances. If you didn't, why would you run 20 instances in the first place?

This might be different for staging, QA, or playground systems, though. In short: if you want app HA, you probably want min available >50%; if you don't want HA, you might be fine without any PDB, or at least with a different minimum percentage.

WDYT?

braunsonm

comment created time in 13 hours

push event cloudfoundry/cf-for-k8s

Jaskanwal Pawar

commit sha 6931fc8196f1ad9c4bb6cae4ce317ca645476306

fix: need to install log-cache plugin w/o prompt Authored-by: Jaskanwal Pawar <jpawar@pivotal.io>

push time in 14 hours

push event cloudfoundry/cf-for-k8s

Jaskanwal Pawar

commit sha 47ff8647fa4aa63a551077459a7bb5e0647db2ed

install log-cache plugin as needed for BARAS Authored-by: Jaskanwal Pawar <jpawar@pivotal.io>

push time in 14 hours

push event cloudfoundry/cf-for-k8s

relint-ci

commit sha a4b2fc310bea112ed50394d5072903251633a5f0

Autobump buildpacks

push time in 16 hours

push event cloudfoundry/cf-for-k8s

relint-ci

commit sha b13ef2349a712e2d80eabc82e288e1beeda80dbf

Autobump buildpacks

push time in 17 hours

push event cloudfoundry/cf-for-k8s

Jaskanwal Pawar

commit sha 9ce0fb482ced0a881a3459d18bc5f6206c58c9c1

use our CI images instead of CAKE ones where possible - remove unused tasks as well (backup-metadata-generator stuff and tasks to deploy/delete CF) - remove `backup-metadata-generator` stuff from pipeline Co-authored-by: Jaskanwal Pawar <jpawar@pivotal.io> Co-authored-by: James Pollard <pollardja@vmware.com>

push time in a day

push event cloudfoundry/cf-for-k8s

Jaskanwal Pawar

commit sha 9c54f4ebd2f3f304c6b06a1eb5948a02cdf9bf4a

wip: refactor config generation to just heredoc - also migrated over a task script we were missing from CAKE pipeline Co-authored-by: Jaskanwal Pawar <jpawar@pivotal.io> Co-authored-by: James Pollard <pollardja@vmware.com>

push time in a day

issue comment cloudfoundry/cf-for-k8s

"info.Labels: label key and value greater than maximum size" on docker pull

Oh interesting - thanks @joscha-alisch for filing this issue!

From what I remember from #444, it looked like this issue was going to be fixed by an update in containerd. So hopefully that'll get into one of the next GKE releases / cos_containerd.

Until then, you may need to use the default cos (Docker) image type for your GKE cluster.

Does that sound reasonable?

joscha-alisch

comment created time in a day

push event cloudfoundry/cf-for-k8s

Jaskanwal Pawar

commit sha 9356fd4cffba95ffbb1b91503f717c7608a10eeb

wip: attempt to fix weirdness with running baras - switching to an image with a newer version of `kubectl` Co-authored-by: Jaskanwal Pawar <jpawar@pivotal.io> Co-authored-by: James Pollard <pollardja@vmware.com>

push time in 2 days

issue comment cloudfoundry/cf-for-k8s

CF CLI frequently gets 'stuck' during push commands while doing concurrent pushes

@jamespollard8 We believe this is still happening. It's difficult to catch the timeout, since you need two apps doing rolling deploys at the same time. However, we are seeing some pretty long deployment times during peak business hours that we think could be related to this issue: deployments still seem to wait for other rolling deployments to finish instead of running in parallel.

Unfortunately it's hard to catch this in a busy cluster, and I haven't been able to see any errors in registry buddy (its logs are primarily filled with the registry deletion logic).

braunsonm

comment created time in 2 days

issue comment cloudfoundry/cf-for-k8s

As an operator I would like my apps to stay online during Kubernetes upgrades

Could we solve this without exposing PDBs to the end-user?

50% could work, but it makes assumptions about the load that an app can handle. I would prefer it to be user-configurable.

braunsonm

comment created time in 2 days

issue comment cloudfoundry/cf-for-k8s

"info.Labels: label key and value greater than maximum size" on docker pull

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/176592119

The labels on this github issue will be updated when the story is started.

joscha-alisch

comment created time in 2 days

issue opened cloudfoundry/cf-for-k8s

"info.Labels: label key and value greater than maximum size" on docker pull

Unfortunately, we seem to be hitting a bug similar to #444, but on GKE instead of KinD.

We run cf-for-k8s version 1.1.0 on GKE version 1.18.12-gke.1201 with nodes running on image type cos_containerd.

When we push an app via cf push, we see the following error:

Failed to pull image "gcr.io/PROJECT_ID/cf-workloads/cf-default-builder@sha256:b65a50427c08b4ade4e75582a4d6fa86c6da0eec7c1f1932c1881154c3527fb5": 
[rpc error: code = InvalidArgument desc = failed to pull and unpack image "gcr.io/PROJECT_ID/cf-workloads/cf-default-builder@sha256:b65a50427c08b4ade4e75582a4d6fa86c6da0eec7c1f1932c1881154c3527fb5": failed to prepare extraction snapshot "extract-929517039-zaeh sha256:e693876ebf739f21944936aabae94530f008be3cb7f14f66c4a2f4fd9b4bcf54": 
info.Labels: label key and value greater than maximum size (4096 bytes), key: containerd: invalid argument, rpc error: code = InvalidArgument desc = failed to pull and unpack image "gcr.io/PROJECT_ID/cf-workloads/cf-default-builder@sha256:b65a50427c08b4ade4e75582a4d6fa86c6da0eec7c1f1932c1881154c3527fb5": failed to prepare extraction snapshot "extract-809081699-XqHo sha256:e693876ebf739f21944936aabae94530f008be3cb7f14f66c4a2f4fd9b4bcf54": 
info.Labels: label key and value greater than maximum size (4096 bytes), key: containerd: invalid argument]

When we remove buildpack groups from the cf-default-builder in cf-for-k8s/config/kpack/default-buildpacks.yml (keeping only the go buildpack, for example), it works fine. So the assumption is that these groups cause too much metadata in the resulting image.
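To illustrate the workaround described above, here is a sketch of a builder order trimmed to a single Go buildpack group. The apiVersion, kind, and buildpack ID are assumptions for illustration; mirror whatever cf-for-k8s/config/kpack/default-buildpacks.yml actually defines:

```yaml
# Sketch only: a kpack builder whose order keeps a single Go buildpack
# group; per the report above, a reduced order like this avoided the
# oversized containerd snapshot labels.
apiVersion: kpack.io/v1alpha1
kind: Builder
metadata:
  name: cf-default-builder
spec:
  order:
  - group:
    - id: paketo-community/go       # placeholder buildpack ID
```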

created time in 2 days

issue comment cloudfoundry/cf-for-k8s

As an operator I would like my apps to stay online during Kubernetes upgrades

@loewenstein Yes, it is parallel. As PDBs are defined per app, one option would be for Eirini to create a PDB for every app that has more than one instance.

braunsonm

comment created time in 2 days

issue comment cloudfoundry/cf-for-k8s

As an operator I would like my apps to stay online during Kubernetes upgrades

@herrjulz I guess this is still parallel to the question of PDBs, isn't it?

braunsonm

comment created time in 2 days

issue comment cloudfoundry/cf-for-k8s

As an operator I would like my apps to stay online during Kubernetes upgrades

That's true. If we deprecate routing to individual instances, we could switch to Deployments instead of StatefulSets. @bkrannich @voelzmo

braunsonm

comment created time in 2 days
