Norman Joyner normanjoyner @argoai Pittsburgh, PA http://normanjoyner.com Software Engineering Manager, Cloud Platform @argoai. Previously, Founder and CTO @containership.

jeremykross/konstellate 1392

Free and Open Source GUI to Visualize Kubernetes Applications.

normanjoyner/chef-api 27

A simple client providing access to the Chef server API, implemented in Node.js.

brosandilabs/kubectl-tmux-logs 17

A kubectl plugin to display container logs within separate tmux panes

normanjoyner/circle-api 5

A nodejs module for interacting with the Circle API

mattkelly/deckconverter 2

A utility for converting between Magic: The Gathering deck formats.

normanjoyner/clifx 2

Command line interface for interacting with LIFX bulbs using LIFX Cloud

normanjoyner/cats-as-a-service 1

Cats. As a service.

normanjoyner/chef-xen 1

A chef cookbook for setting up xen and managing guests

brosandilabs/kubectl-decode-secret 0

A kubectl plugin to decode Kubernetes secrets

brosandilabs/kubectl-tmux-exec 0

A kubectl plugin to exec into Kubernetes pods within separate tmux panes

issue comment NVIDIA/k8s-device-plugin

Error: failed to start container "nvidia-device-plugin-ctr": Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: signal: segmentation fault (core dumped), stdout: , stderr: \\\"\"": unknown

Hello, after I installed the nvidia-container-toolkit package, there is still this problem:

```
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled
```


$ nvidia-container-runtime-hook
Usage of nvidia-container-runtime-hook:
  -config string
        configuration file
  -debug
        enable debug output

Commands:
  prestart
        run the prestart hook
  poststart
        no-op
  poststop
        no-op

$ nvidia-container-toolkit
Usage of nvidia-container-toolkit:
  -config string
        configuration file
  -debug
        enable debug output

Commands:
  prestart
        run the prestart hook
  poststart
        no-op
  poststop
        no-op

My system is Ubuntu 20.04 (Windows 10, WSL2).
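
For what it's worth, here is roughly how driver availability can be checked from inside WSL2; a minimal diagnostic sketch based on typical WSL2 GPU setups, where the paths below are assumptions rather than anything from this thread:

```
# The WSL2 GPU paravirtualization device should exist inside the distro.
ls -l /dev/dxg

# nvidia-smi should work inside WSL2 when a WSL-enabled NVIDIA driver is
# installed on the Windows side; there is no separate Linux kernel driver
# to load inside the WSL2 guest.
nvidia-smi

# The GPU userspace libraries are mounted by Windows under /usr/lib/wsl/lib.
ls /usr/lib/wsl/lib | grep -i nvidia
```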

What can I do to solve this problem? @klueska 
wxitzxg

comment created time in 2 hours

issue opened rook/rook

md raid as metadataDevice not working

Hello, I'm trying to set up rook-ceph with a metadataDevice on an md RAID device, like this:

  storage:
    useAllDevices: false
    useAllNodes: true
    devicePathFilter: "^/dev/disk/by-id/ata-ST.*"
    config:
      metadataDevice: "md0p3"

It looks like Rook is skipping md devices somehow. However, keeping the db on a raw device is not reliable, since a single drive failure could destroy all of the data. Is it possible to use RAID here, or is there another reliable solution?
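
The `unsupported diskType` warnings in the osd-prepare log below come from the lsblk probe Rook runs during inventory, so the check can be reproduced by hand to confirm why the md partitions are rejected. A minimal sketch, using the same command and device as the log:

```
# Ask lsblk for the same fields Rook's inventory queries; TYPE is
# reported as "md" for md-raid partitions, which Rook then skips.
lsblk /dev/md0p3 --bytes --nodeps --pairs --paths \
  --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
```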

osd-prepare log:

2020-11-28 14:31:15.680911 I | rookcmd: starting Rook v1.5.1 with arguments '/rook/rook ceph osd provision'
2020-11-28 14:31:15.681141 I | rookcmd: flag values: --cluster-id=9850d371-0108-45b1-822b-07536465547b, --data-device-filter=, --data-device-path-filter=^/dev/disk/by-id/ata-ST.*, --data-devices=, --drive-groups=, --encrypted-device=false, --force-format=false, --help=false, --location=, --log-flush-frequency=5s, --log-level=DEBUG, --metadata-device=md0p3, --node-name=hosting617717, --operator-image=, --osd-database-size=0, --osd-store=, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=false, --service-account=
2020-11-28 14:31:15.681254 I | op-mon: parsing mon endpoints: c=10.220.223.205:6789,a=10.220.93.72:6789,b=10.220.97.126:6789
2020-11-28 14:31:15.694212 I | op-osd: CRUSH location=root=default host=hosting617717
2020-11-28 14:31:15.694233 I | cephcmd: crush location of osd: root=default host=hosting617717
2020-11-28 14:31:15.694251 D | exec: Running command: nsenter --mount=/rootfs/proc/1/ns/mnt -- /usr/sbin/lvm --help
2020-11-28 14:31:15.705134 I | cephosd: successfully called nsenter
2020-11-28 14:31:15.705157 I | cephosd: binary "/usr/sbin/lvm" found on the host, proceeding with osd preparation
2020-11-28 14:31:15.705164 D | exec: Running command: dmsetup version
2020-11-28 14:31:15.707318 I | cephosd: Library version:   1.02.169-RHEL8 (2020-02-11)
Driver version:    4.42.0
2020-11-28 14:31:15.716651 D | cephclient: No ceph configuration override to merge as "rook-config-override" configmap is empty
2020-11-28 14:31:15.716679 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2020-11-28 14:31:15.716811 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2020-11-28 14:31:15.716972 D | cephosd: config file @ /etc/ceph/ceph.conf: [global]
fsid                = a53c7a93-60cb-42c7-ad45-2b7cfd9fb1a0
mon initial members = c a b
mon host            = [v2:10.220.223.205:3300,v1:10.220.223.205:6789],[v2:10.220.93.72:3300,v1:10.220.93.72:6789],[v2:10.220.97.126:3300,v1:10.220.97.126:6789]
public addr         = 10.30.210.178
cluster addr        = 10.30.210.178

[client.admin]
keyring = /var/lib/rook/rook-ceph/client.admin.keyring

2020-11-28 14:31:15.716982 I | cephosd: discovering hardware
2020-11-28 14:31:15.716989 D | exec: Running command: lsblk --all --noheadings --list --output KNAME
2020-11-28 14:31:15.720704 D | exec: Running command: lsblk /dev/sda --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.722588 D | exec: Running command: sgdisk --print /dev/sda
2020-11-28 14:31:15.728945 D | exec: Running command: udevadm info --query=property /dev/sda
2020-11-28 14:31:15.734907 D | exec: Running command: lsblk --noheadings --pairs /dev/sda
2020-11-28 14:31:15.738843 D | exec: Running command: lsblk /dev/sdb --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.740650 D | exec: Running command: sgdisk --print /dev/sdb
2020-11-28 14:31:15.744433 D | exec: Running command: udevadm info --query=property /dev/sdb
2020-11-28 14:31:15.749730 D | exec: Running command: lsblk --noheadings --pairs /dev/sdb
2020-11-28 14:31:15.753716 D | exec: Running command: lsblk /dev/sdc --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.755832 D | exec: Running command: sgdisk --print /dev/sdc
2020-11-28 14:31:15.758183 D | exec: Running command: udevadm info --query=property /dev/sdc
2020-11-28 14:31:15.763772 D | exec: Running command: lsblk --noheadings --pairs /dev/sdc
2020-11-28 14:31:15.768409 I | inventory: skipping device "sdc" because it has child, considering the child instead.
2020-11-28 14:31:15.768453 D | exec: Running command: lsblk /dev/sdc1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.770352 D | exec: Running command: udevadm info --query=property /dev/sdc1
2020-11-28 14:31:15.775801 D | exec: Running command: lsblk /dev/sdd --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.777738 D | exec: Running command: sgdisk --print /dev/sdd
2020-11-28 14:31:15.780234 W | inventory: skipping device "sdd". exit status 2
2020-11-28 14:31:15.780253 D | exec: Running command: lsblk /dev/sdd1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.782179 D | exec: Running command: udevadm info --query=property /dev/sdd1
2020-11-28 14:31:15.787443 D | exec: Running command: lsblk /dev/md0 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.790455 W | inventory: skipping device "md0". unsupported diskType raid1
2020-11-28 14:31:15.790470 D | exec: Running command: lsblk /dev/md0 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.792607 W | inventory: skipping device "md0". unsupported diskType raid1
2020-11-28 14:31:15.792660 D | exec: Running command: lsblk /dev/nbd0 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.794484 W | inventory: skipping device "nbd0". diskType is empty
2020-11-28 14:31:15.794503 D | exec: Running command: lsblk /dev/nbd1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.796414 W | inventory: skipping device "nbd1". diskType is empty
2020-11-28 14:31:15.796433 D | exec: Running command: lsblk /dev/nbd2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.798135 W | inventory: skipping device "nbd2". diskType is empty
2020-11-28 14:31:15.798155 D | exec: Running command: lsblk /dev/nbd3 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.799932 W | inventory: skipping device "nbd3". diskType is empty
2020-11-28 14:31:15.799951 D | exec: Running command: lsblk /dev/nbd4 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.802125 W | inventory: skipping device "nbd4". diskType is empty
2020-11-28 14:31:15.802145 D | exec: Running command: lsblk /dev/nbd5 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.803853 W | inventory: skipping device "nbd5". diskType is empty
2020-11-28 14:31:15.803867 D | exec: Running command: lsblk /dev/nbd6 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.805608 W | inventory: skipping device "nbd6". diskType is empty
2020-11-28 14:31:15.805629 D | exec: Running command: lsblk /dev/nbd7 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.807348 W | inventory: skipping device "nbd7". diskType is empty
2020-11-28 14:31:15.807368 D | exec: Running command: lsblk /dev/md0p1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.809269 W | inventory: skipping device "md0p1". unsupported diskType md
2020-11-28 14:31:15.809289 D | exec: Running command: lsblk /dev/md0p1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.811178 W | inventory: skipping device "md0p1". unsupported diskType md
2020-11-28 14:31:15.811206 D | exec: Running command: lsblk /dev/md0p2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.813094 W | inventory: skipping device "md0p2". unsupported diskType md
2020-11-28 14:31:15.813112 D | exec: Running command: lsblk /dev/md0p2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.814974 W | inventory: skipping device "md0p2". unsupported diskType md
2020-11-28 14:31:15.814991 D | exec: Running command: lsblk /dev/md0p3 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.816910 W | inventory: skipping device "md0p3". unsupported diskType md
2020-11-28 14:31:15.816930 D | exec: Running command: lsblk /dev/md0p3 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.819256 W | inventory: skipping device "md0p3". unsupported diskType md
2020-11-28 14:31:15.819274 D | exec: Running command: lsblk /dev/nbd8 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.821192 W | inventory: skipping device "nbd8". diskType is empty
2020-11-28 14:31:15.821212 D | exec: Running command: lsblk /dev/nbd9 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.823093 W | inventory: skipping device "nbd9". diskType is empty
2020-11-28 14:31:15.823112 D | exec: Running command: lsblk /dev/nbd10 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.824941 W | inventory: skipping device "nbd10". diskType is empty
2020-11-28 14:31:15.824971 D | exec: Running command: lsblk /dev/nbd11 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.827010 W | inventory: skipping device "nbd11". diskType is empty
2020-11-28 14:31:15.827065 D | exec: Running command: lsblk /dev/nbd12 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.828948 W | inventory: skipping device "nbd12". diskType is empty
2020-11-28 14:31:15.828967 D | exec: Running command: lsblk /dev/nbd13 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.830966 W | inventory: skipping device "nbd13". diskType is empty
2020-11-28 14:31:15.830985 D | exec: Running command: lsblk /dev/nbd14 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.833007 W | inventory: skipping device "nbd14". diskType is empty
2020-11-28 14:31:15.833027 D | exec: Running command: lsblk /dev/nbd15 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.834831 W | inventory: skipping device "nbd15". diskType is empty
2020-11-28 14:31:15.834862 D | inventory: discovered disks are [0xc0003ef320 0xc0003ef7a0 0xc00019dd40 0xc00017a480]
2020-11-28 14:31:15.834868 I | cephosd: creating and starting the osds
2020-11-28 14:31:15.841165 D | cephosd: No Drive Groups configured.
2020-11-28 14:31:15.841199 D | cephosd: desiredDevices are [{Name:^/dev/disk/by-id/ata-ST.* OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:true}]
2020-11-28 14:31:15.841207 D | cephosd: context.Devices are [0xc0003ef320 0xc0003ef7a0 0xc00019dd40 0xc00017a480]
2020-11-28 14:31:15.841215 D | exec: Running command: lsblk /dev/sda --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:15.843098 D | exec: Running command: ceph-volume inventory --format json /dev/sda
2020-11-28 14:31:16.502800 I | cephosd: device "sda" is available.
2020-11-28 14:31:16.502915 I | cephosd: device "sda" (aliases: "/dev/disk/by-id/wwn-0x5000c500932fac52 /dev/disk/by-id/ata-ST8000NM0055-1RM112_ZA15A14X /dev/disk/by-path/pci-0000:05:00.0-sas-phy0-lun-0") matches device path filter "^/dev/disk/by-id/ata-ST.*"
2020-11-28 14:31:16.502927 I | cephosd: device "sda" is selected by the device filter/name "^/dev/disk/by-id/ata-ST.*"
2020-11-28 14:31:16.502947 D | exec: Running command: lsblk /dev/sdb --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2020-11-28 14:31:16.505561 D | exec: Running command: ceph-volume inventory --format json /dev/sdb
2020-11-28 14:31:17.130993 I | cephosd: device "sdb" is available.
2020-11-28 14:31:17.131063 I | cephosd: device "sdb" (aliases: "/dev/disk/by-id/ata-ST8000NM0055-1RM112_ZA15M9Y7 /dev/disk/by-id/wwn-0x5000c50093484442 /dev/disk/by-path/pci-0000:05:00.0-sas-phy1-lun-0") matches device path filter "^/dev/disk/by-id/ata-ST.*"
2020-11-28 14:31:17.131071 I | cephosd: device "sdb" is selected by the device filter/name "^/dev/disk/by-id/ata-ST.*"
2020-11-28 14:31:17.131080 I | cephosd: skipping device "sdc1" because it contains a filesystem "linux_raid_member"
2020-11-28 14:31:17.131085 I | cephosd: skipping device "sdd1" because it contains a filesystem "linux_raid_member"
2020-11-28 14:31:17.131226 I | cephosd: configuring osd devices: {"Entries":{"sda":{"Data":-1,"Metadata":null,"Config":{"Name":"^/dev/disk/by-id/ata-ST.*","OSDsPerDevice":1,"MetadataDevice":"","DatabaseSizeMB":0,"DeviceClass":"","IsFilter":false,"IsDevicePathFilter":true},"PersistentDevicePaths":["/dev/disk/by-id/wwn-0x5000c500932fac52","/dev/disk/by-id/ata-ST8000NM0055-1RM112_ZA15A14X","/dev/disk/by-path/pci-0000:05:00.0-sas-phy0-lun-0"]},"sdb":{"Data":-1,"Metadata":null,"Config":{"Name":"^/dev/disk/by-id/ata-ST.*","OSDsPerDevice":1,"MetadataDevice":"","DatabaseSizeMB":0,"DeviceClass":"","IsFilter":false,"IsDevicePathFilter":true},"PersistentDevicePaths":["/dev/disk/by-id/ata-ST8000NM0055-1RM112_ZA15M9Y7","/dev/disk/by-id/wwn-0x5000c50093484442","/dev/disk/by-path/pci-0000:05:00.0-sas-phy1-lun-0"]}}}
2020-11-28 14:31:17.131299 I | cephclient: getting or creating ceph auth key "client.bootstrap-osd"
2020-11-28 14:31:17.131491 D | exec: Running command: ceph auth get-or-create-key client.bootstrap-osd mon allow profile bootstrap-osd --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --name=client.admin --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/472627608
2020-11-28 14:31:17.695956 I | cephosd: configuring new device sda
2020-11-28 14:31:17.695983 I | cephosd: using md0p3 as metadataDevice for device /dev/sda and let ceph-volume lvm batch decide how to create volumes
2020-11-28 14:31:17.695991 I | cephosd: configuring new device sdb
2020-11-28 14:31:17.695996 I | cephosd: using md0p3 as metadataDevice for device /dev/sdb and let ceph-volume lvm batch decide how to create volumes
2020-11-28 14:31:17.696010 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sda /dev/sdb --db-devices /dev/md0p3 --report
2020-11-28 14:31:18.791881 D | exec: Traceback (most recent call last):
2020-11-28 14:31:18.791940 D | exec:   File "/usr/sbin/ceph-volume", line 11, in <module>
2020-11-28 14:31:18.791945 D | exec:     load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
2020-11-28 14:31:18.791950 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 40, in __init__
2020-11-28 14:31:18.791954 D | exec:     self.main(self.argv)
2020-11-28 14:31:18.791957 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
2020-11-28 14:31:18.791960 D | exec:     return f(*a, **kw)
2020-11-28 14:31:18.791964 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 151, in main
2020-11-28 14:31:18.791967 D | exec:     terminal.dispatch(self.mapper, subcommand_args)
2020-11-28 14:31:18.791971 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2020-11-28 14:31:18.791974 D | exec:     instance.main()
2020-11-28 14:31:18.791977 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 42, in main
2020-11-28 14:31:18.791981 D | exec:     terminal.dispatch(self.mapper, self.argv)
2020-11-28 14:31:18.791984 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2020-11-28 14:31:18.791988 D | exec:     instance.main()
2020-11-28 14:31:18.791991 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
2020-11-28 14:31:18.791994 D | exec:     return func(*a, **kw)
2020-11-28 14:31:18.791997 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 322, in main
2020-11-28 14:31:18.792001 D | exec:     self._get_explicit_strategy()
2020-11-28 14:31:18.792004 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 332, in _get_explicit_strategy
2020-11-28 14:31:18.792008 D | exec:     self._filter_devices()
2020-11-28 14:31:18.792011 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 385, in _filter_devices
2020-11-28 14:31:18.792014 D | exec:     raise RuntimeError(err.format(len(devs) - len(usable)))
2020-11-28 14:31:18.792018 D | exec: RuntimeError: 1 devices were filtered in non-interactive mode, bailing out
failed to configure devices: failed to initialize devices: failed ceph-volume report: exit status 1
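
For what it's worth, the "1 devices were filtered" error comes from ceph-volume's own device screening, so the rejection reason can be inspected directly. A minimal sketch (run wherever ceph-volume is available, e.g. the osd-prepare container; the command mirrors the inventory call in the log above):

```
# "available" and "rejected_reasons" in the JSON output explain why
# ceph-volume refuses to use the device as a db device.
ceph-volume inventory --format json /dev/md0p3
```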

I even tried to patch daemon/ceph/osd/volume.go:

2020-11-28 13:23:18.693154 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --osds-per-device 1 /dev/sda /dev/sdb --db-devices /dev/md0p3 --report --format json
2020-11-28 13:23:19.667110 D | cephosd: ceph-volume report: {
    "changed": true,
    "osds": [
        {
            "block.db": {},
            "data": {
                "human_readable_size": "7.28 TB",
                "parts": 1,
                "path": "/dev/sda",
                "percentage": 100.0,
                "size": 7451
            }
        },
        {
            "block.db": {},
            "data": {
                "human_readable_size": "7.28 TB",
                "parts": 1,
                "path": "/dev/sdb",
                "percentage": 100.0,
                "size": 7451
            }
        }
    ],
    "vgs": []
}
failed to configure devices: failed to initialize devices: ceph-volume did not use the expected metadataDevice [md0p3]

ceph_version: "v15.2.6"
rook_version: "v1.5.1"

created time in 3 hours

issue closed NVIDIA/k8s-device-plugin

Automatically provisioning /dev/dri devices of GPU inside container?

Tagging: VirtualGL maintainer @dcommander and personal collaborator @dimm0

1. Issue or feature description

In Docker and Kubernetes, people have had to do manual host setup to provision the X server, using a hostPath directive for /tmp/X11-unix. This is quite tedious for sysadmins and at the same time a security threat, as people can spoof the host.
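
For context, the manual setup referred to above usually looks like the following pod fragment; a hypothetical sketch where all names are illustrative (the conventional socket directory is /tmp/.X11-unix):

```
# Hypothetical example of the hostPath-based X11 socket sharing
# criticized above; every name here is illustrative.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: x11-app
spec:
  containers:
  - name: app
    image: example/x11-app
    env:
    - name: DISPLAY
      value: ":0"
    volumeMounts:
    - name: x11
      mountPath: /tmp/.X11-unix
  volumes:
  - name: x11
    hostPath:
      path: /tmp/.X11-unix   # shares the host X server's socket directory
EOF
```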

There have also been attempts (https://github.com/ryought/glx-docker-headless-gpu) to execute an X server and use GLX inside the container after the GPU has been provisioned by the k8s-device-plugin. This requires privileged access, or at least access to one or more of the host's virtual terminal console devices /dev/ttyX.

At last, however, the developers of VirtualGL (used widely in HPC to enable GPU-based rendering in VNC virtual display environments) have managed to develop a feature that uses the EGL API to enable 3D GL rendering, for applications such as Blender, MATLAB, and Unity, that was previously only possible with GLX and thus an X server. As you know well, nvidia-docker does not support GLX, but introduced the EGL API just under two years ago.

See EGL config section of https://github.com/VirtualGL/virtualgl/issues/113#issuecomment-693127236

This requires access to the /dev/dri/cardX devices corresponding to each GPU provisioned using the k8s-device-plugin. However, it does not seem like the k8s-device-plugin provisions these automatically. I would like to ask whether this is possible, and how it can be configured.
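
For reference, the EGL back end is pointed at a DRI device roughly like this; a sketch based on the VirtualGL issue linked above, where the exact flags may differ between VirtualGL versions:

```
# List the DRM devices the EGL back end can target.
ls -l /dev/dri/

# Run an OpenGL application through VirtualGL's EGL back end, selecting
# a device instead of a 3D X display (per VirtualGL's EGL documentation).
vglrun -d /dev/dri/card0 glxinfo | head
```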

2. Steps to reproduce the issue

Provision nvidia.com/gpu: 1 inside the container nvidia/cudagl:11.0-devel-ubuntu20.04 or 1.2-glvnd-devel-ubuntu20.04, with k8s-device-plugin v0.7.0 in its default configuration, on Kubernetes v1.18.6, e.g. with a pod like the sketch below.
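
A hypothetical reproduction manifest (pod and container names are illustrative; the image and plugin version are the ones listed above):

```
# Only nvidia.com/gpu is requested; no /dev/dri mounts are declared.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-repro
spec:
  containers:
  - name: cudagl
    image: nvidia/cudagl:11.0-devel-ubuntu20.04
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
```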

Do: ls /dev

Result: inside the container you see /dev/nvidiaX, /dev/nvidia-modeset, /dev/nvidia-uvm, and /dev/nvidia-uvm-tools; however, /dev/dri does not exist. VirtualGL does not work through any device other than /dev/dri/cardX.

3. Information to attach (optional if deemed irrelevant)

Other issues and repos:

  • Example of a VirtualGL EGL configuration that requires /dev/dri/cardX: https://github.com/ffeldhaus/docker-xpra-html5-gpu-minimal
  • Unsuccessful attempts to run an NVIDIA X server inside a container: https://github.com/VirtualGL/virtualgl/issues/98

closed time in 4 hours

ehfd

issue comment NVIDIA/k8s-device-plugin

Automatically provisioning /dev/dri devices of GPU inside container?

Moving to https://github.com/NVIDIA/libnvidia-container/issues/118

ehfd

comment created time in 4 hours

pull request comment firecracker-microvm/firecracker

[RFC] Add prototype support for macvtap interfaces

I've updated the PR; unless you're happy with the current approach (which was just a PoC, and I think is a bit too simple-minded), it would be useful to have a steer on how you want to proceed, e.g.:

  • Whether you want to support both the current /dev/net/tun approach and macvtap interfaces, or see the macvtap approach replacing the existing one.

  • Whether checking for an absolute path is OK as the trigger for the macvtap implementation, or whether it should look up the interface index for the given interface name and check whether a suitable /dev/tapN exists (see the sketch below). (Would probably play better with snapshotting.)
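
For the name-based lookup, the plumbing would be roughly the following; this is a sketch of standard iproute2/macvtap behavior, not code from this PR:

```
# Create a macvtap interface on top of an existing NIC (names are
# illustrative). The kernel exposes a matching character device
# /dev/tapN, where N is the new interface's index.
ip link add link eth0 name mvtap0 type macvtap mode bridge
ip link set mvtap0 up

# Look up the interface index by name, then derive the tap device path.
IFINDEX=$(cat /sys/class/net/mvtap0/ifindex)
ls -l "/dev/tap${IFINDEX}"
```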

Apart from the above, I'm aware at the moment that I need to:

  • Add jailer support.
  • Ensure that snapshotting works. This will have to be aware of MAC addresses: the current code appears to persist the MAC address, in which case snapshot restoration will also have to set the host MAC address of the macvtap interface. I think that's fine, though.
  • Look into unit testing.
  • Consider doc changes.
  • Sign commits.
upxe

comment created time in 6 hours

push event kubevirt/kubevirt

Miguel Duarte Barroso

commit sha 6d2b097365af05c7d128c0d40253c0c4c6ef1b0b

dev guide, networking: no capabilities are required

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>

view details

kubevirt-bot

commit sha 373b594019328a7636db4ace4762fae0f1c43c5b

Merge pull request #4579 from maiqueb/remove-capabilities-net-dev-guide

dev guide, networking: no capabilities are required

view details

push time in 7 hours

PR merged kubevirt/kubevirt

dev guide, networking: no capabilities are required

Labels: approved, dco-signoff: yes, lgtm, release-note-none, sig/network, size/XS


What this PR does / why we need it: Keep the networking developer guide up to date by indicating that no network-related capabilities are required on the virt-launcher pod.

Release note: NONE
+1 -3

20 comments

1 changed file

maiqueb

pr closed time in 7 hours

issue comment firecracker-microvm/firecracker

Use Python generated client in the testing framework

I wouldn't mind taking a look at this. Is there a specific method we would want to use to generate the code? I assume the YAML file mentioned is the firecracker.yaml file?
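
If it helps, one sketch of what the generation step could look like with a stock OpenAPI tool; the tool choice and output path here are assumptions, not project decisions:

```
# Generate a Python client from Firecracker's swagger definition using
# openapi-generator (one possible tool among several).
openapi-generator-cli generate \
  -i src/api_server/swagger/firecracker.yaml \
  -g python \
  -o tests/framework/generated_client
```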

gc-plp

comment created time in 10 hours

pull request comment cncf/k8s-conformance

Send conformance test for minikube

This conformance request is not yet verifiable. Please ensure that the PR title references the Kubernetes release and that the supplied logs refer to the specified release.

azhao155

comment created time in 10 hours

PR opened cncf/k8s-conformance

Send conformance test for minikube

Pre-submission checklist:

Please check each of these after submitting your pull request:

  • [ ] If this is a new entry, have you submitted a signed participation form?
  • [ ] Did you include the product/project logo in SVG, EPS or AI format?
  • [ ] Does your logo clearly state the name of the product/project and follow the other logo guidelines?
  • [ ] If your product/project is open source, did you include the repo_url?
  • [ ] Did you copy and paste the installation and configuration instructions into the README.md file in addition to linking to them?
+27944 -0

0 comments

4 changed files

pr created time in 10 hours

issue comment firecracker-microvm/firecracker

[Code improvement] deduplicate literal HTTP responses in tests

Would we want to be parsing the [firecracker.yaml](https://github.com/firecracker-microvm/firecracker/blob/master/src/api_server/swagger/firecracker.yaml) file in case the API changes, or should we just keep it simple and hard-code the function definitions first?

An example function definition might be fn call_machine_config(request: Request, ht_enabled: bool, mem_size_mib: i32, track_dirty_pages: Option<bool>, vcpu_count: i32) -> String, where Request is simply an enum for PATCH, POST, etc.

If the usage of Option for the optional params seems good to y'all, I'll go ahead with implementing these.
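
For concreteness, the literal request such a helper would replace looks roughly like this against a running Firecracker; the socket path is illustrative, while the endpoint and field names come from the machine-config API:

```
# The kind of hand-written HTTP request the proposed helpers would
# deduplicate; Firecracker's API server listens on a unix socket.
curl --unix-socket /tmp/firecracker.socket \
  -X PUT "http://localhost/machine-config" \
  -H "Content-Type: application/json" \
  -d '{"vcpu_count": 2, "mem_size_mib": 1024, "ht_enabled": false}'
```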

acatangiu

comment created time in 12 hours

pull request comment kubevirt/kubevirt

dev guide, networking: no capabilities are required

/retest

This bot automatically retries jobs that failed/flaked on approved PRs. Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

maiqueb

comment created time in 14 hours

pull request comment alexellis/awesome-baremetal

Update Talos Systems

This seems to be removing the Talos Systems entry, replacing it with one for Sidero. Is that correct?

That is correct. Talos Systems is the company behind Sidero.

andrewrynhard

comment created time in 16 hours

pull request comment kubevirt/kubevirt

dev guide, networking: no capabilities are required

/retest

This bot automatically retries jobs that failed/flaked on approved PRs. Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

maiqueb

comment created time in 16 hours

issue comment rook/rook

Error: cannot find volume "rook-binaries" to mount into container "copy-bins"

The same error is happening about 3 times a day.

vldanch

comment created time in 17 hours

pull request comment kubevirt/kubevirt

dev guide, networking: no capabilities are required

@maiqueb: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubevirt-e2e-k8s-1.19 6d2b097365af05c7d128c0d40253c0c4c6ef1b0b link /test pull-kubevirt-e2e-k8s-1.19

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

maiqueb

comment created time in 18 hours

pull request comment alexellis/awesome-baremetal

Update Talos Systems

This seems to be removing the Talos Systems entry, replacing it with one for Sidero. Is that correct?

andrewrynhard

comment created time in 18 hours

pull request comment kubevirt/kubevirt

Volume hotplug

@awels: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubevirt-generate 910ac37b039614017d11417ae5b0bc8314f4576b link /test pull-kubevirt-generate
pull-kubevirt-e2e-k8s-1.18 910ac37b039614017d11417ae5b0bc8314f4576b link /test pull-kubevirt-e2e-k8s-1.18
pull-kubevirt-e2e-k8s-1.19 910ac37b039614017d11417ae5b0bc8314f4576b link /test pull-kubevirt-e2e-k8s-1.19
pull-kubevirt-e2e-k8s-1.17 910ac37b039614017d11417ae5b0bc8314f4576b link /test pull-kubevirt-e2e-k8s-1.17

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

awels

comment created time in 18 hours

pull request comment kubevirt/kubevirt

Volume hotplug

@awels: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubevirt-e2e-k8s-1.17 caf3d079723d3834ffcc1cf9664d0f9d9aee007f link /test pull-kubevirt-e2e-k8s-1.17
pull-kubevirt-generate 910ac37b039614017d11417ae5b0bc8314f4576b link /test pull-kubevirt-generate
pull-kubevirt-e2e-k8s-1.18 910ac37b039614017d11417ae5b0bc8314f4576b link /test pull-kubevirt-e2e-k8s-1.18
pull-kubevirt-e2e-k8s-1.19 910ac37b039614017d11417ae5b0bc8314f4576b link /test pull-kubevirt-e2e-k8s-1.19

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

awels

comment created time in 18 hours

pull request comment kubevirt/kubevirt

Volume hotplug

@awels: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubevirt-e2e-k8s-1.17 caf3d079723d3834ffcc1cf9664d0f9d9aee007f link /test pull-kubevirt-e2e-k8s-1.17
pull-kubevirt-e2e-k8s-1.19 caf3d079723d3834ffcc1cf9664d0f9d9aee007f link /test pull-kubevirt-e2e-k8s-1.19
pull-kubevirt-generate 910ac37b039614017d11417ae5b0bc8314f4576b link /test pull-kubevirt-generate
pull-kubevirt-e2e-k8s-1.18 910ac37b039614017d11417ae5b0bc8314f4576b link /test pull-kubevirt-e2e-k8s-1.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

awels

comment created time in 18 hours

pull request comment kubevirt/kubevirt

Volume hotplug

@awels: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubevirt-e2e-k8s-1.17 caf3d079723d3834ffcc1cf9664d0f9d9aee007f link /test pull-kubevirt-e2e-k8s-1.17
pull-kubevirt-e2e-k8s-1.18 caf3d079723d3834ffcc1cf9664d0f9d9aee007f link /test pull-kubevirt-e2e-k8s-1.18
pull-kubevirt-e2e-k8s-1.19 caf3d079723d3834ffcc1cf9664d0f9d9aee007f link /test pull-kubevirt-e2e-k8s-1.19
pull-kubevirt-generate 910ac37b039614017d11417ae5b0bc8314f4576b link /test pull-kubevirt-generate

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

awels

comment created time in 19 hours

pull request comment firecracker-microvm/firecracker

Enabled describeInstance to show VM pause/resume

Please rebase your PR and fix conflicts. It's currently failing some build steps.

Done :)

HQ01

comment created time in 19 hours

issue opened pelotom/runtypes

BigInt support

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt

created time in 19 hours

pull request comment kubevirt/kubevirt

Volume hotplug

New changes are detected. LGTM label has been removed.

awels

comment created time in 19 hours

push event kubevirt/kubevirt

Quique Llorente

commit sha 7114ce1489a64590a7e7152dd38b30fe398e830b

tests, network: Remove vmi Status ip normalization

Now that the CIDR is not included in the VMI Status IPs, we don't need to remove it before comparing it with the pod IPs.

Signed-off-by: Quique Llorente <ellorent@redhat.com>

view details

kubevirt-bot

commit sha 0c67402f836b0a742e61345d5fd3a52279d1a562

Merge pull request #4596 from qinqon/remove-status-ips-normalization

tests, network: Remove vmi Status ip normalization

view details

push time in 19 hours

PR merged kubevirt/kubevirt

tests, network: Remove vmi Status ip normalization

Labels: approved, dco-signoff: yes, lgtm, release-note-none, size/S


What this PR does / why we need it: Now that the CIDR is not included in the VMI Status IPs (https://github.com/kubevirt/kubevirt/pull/4217), we don't need to remove it before comparing it with the pod IPs.

Release note: NONE
+2 -19

4 comments

1 changed file

qinqon

pr closed time in 19 hours

PR closed google/eng-practices

Update looking-for.md

I found this https://google.github.io/eng-practices/review/reviewer/looking-for.html and was confused about what CL stood for.

+1 -1

3 comments

1 changed file

oscarmandley

pr closed time in 20 hours

pull request comment google/eng-practices

Update looking-for.md

Hi, thanks for the PR. We're planning to address the widespread use of "CL" within the content with a fix to issue #22, which will involve a tooling fix on our side. (Apologies for the slow movement on the issue; 2020 has been...something.)

oscarmandley

comment created time in 20 hours

pull request comment kubevirt/kubevirt

Volume hotplug

@awels: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubevirt-e2e-k8s-1.17 caf3d079723d3834ffcc1cf9664d0f9d9aee007f link /test pull-kubevirt-e2e-k8s-1.17
pull-kubevirt-e2e-k8s-1.18 caf3d079723d3834ffcc1cf9664d0f9d9aee007f link /test pull-kubevirt-e2e-k8s-1.18
pull-kubevirt-e2e-k8s-1.19 caf3d079723d3834ffcc1cf9664d0f9d9aee007f link /test pull-kubevirt-e2e-k8s-1.19

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

awels

comment created time in 20 hours

pull request comment kubevirt/kubevirt

Volume hotplug

@awels: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubevirt-e2e-k8s-1.19 caf3d079723d3834ffcc1cf9664d0f9d9aee007f link /test pull-kubevirt-e2e-k8s-1.19
pull-kubevirt-e2e-k8s-1.17 caf3d079723d3834ffcc1cf9664d0f9d9aee007f link /test pull-kubevirt-e2e-k8s-1.17
pull-kubevirt-e2e-k8s-1.18 caf3d079723d3834ffcc1cf9664d0f9d9aee007f link /test pull-kubevirt-e2e-k8s-1.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

awels

comment created time in 20 hours
