4.16.23

Jump to: Complete Features | Incomplete Features | Complete Epics | Incomplete Epics | Other Complete | Other Incomplete |

Changes from 4.15.41

Note: this page shows the Feature-Based Change Log for a release

Complete Features

These features were completed when this image was assembled

This outcome tracks the overall CoreOS Layering story as well as the technical items needed to converge CoreOS with RHEL image mode. This will provide operational consistency across the platforms.

ROADMAP for this Outcome: https://docs.google.com/document/d/1K5uwO1NWX_iS_la_fLAFJs_UtyERG32tdt-hLQM8Ow8/edit?usp=sharing
 

 

 

 

Note: phase 2 target is tech preview.

Feature Overview

In the initial delivery of CoreOS Layering, it is required that administrators provide their own build environment to customize RHCOS images. That could be a traditional RHEL environment, or an enterprising administrator with some knowledge of OCP Builds could potentially set one up on-cluster.

The primary virtue of an on-cluster build path is to continue using the cluster to manage the cluster. No external dependency, batteries-included.

On-cluster, automated RHCOS Layering builds are important for multiple reasons:

  • One-click/one-command upgrades of OCP are very popular. Many customers may want to make one or just a few customizations but also want to keep that simplified upgrade experience. 
  • Customers who only need to customize RHCOS temporarily (hotfix, driver test package, etc) will find off-cluster builds to be too much friction for one driver.
  • One of OCP's virtues is that the platform and OS are developed, tested, and versioned together. Off-cluster building breaks that connection and leaves it up to the user to keep the OS up-to-date with the platform containers. We must make it easy for customers to add what they need and keep the OS image matched to the platform containers.

Goals & Requirements

  • The goal of this feature is primarily to bring the 4.14 progress (OCPSTRAT-35) to a Tech Preview or GA level of support.
  • Customers should be able to specify a Containerfile with their customizations and "forget it" as long as the automated builds succeed. If they fail, the admin should be alerted and pointed to the logs from the failed build.
    • The admin should then be able to correct the build and resume the upgrade.
  • Intersect with the Custom Boot Images such that a required custom software component can be present on every boot of every node throughout the installation process including the bootstrap node sequence (example: out-of-box storage driver needed for root disk).
  • Users can return a pool to an unmodified image easily.
  • RHEL entitlements should be wired in or at least simple to set up (once).
  • Parity with current features – including the current drain/reboot suppression list, CoreOS Extensions, and config drift monitoring.
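
To illustrate the "return a pool to an unmodified image" requirement above: in the current tech preview the opt-in is a pool label (shown in the reports later on this page), so the assumed revert path is simply removing that label and letting the pool roll back. A minimal sketch, assuming that label remains the mechanism:

$ # remove the opt-in label; the pool is assumed to roll back to the stock image
$ oc label mcp/worker machineconfiguration.openshift.io/layering-enabled-
$ oc get mcp worker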

This work describes the tech preview state of On Cluster Builds. Major interfaces should be agreed upon at the end of this state.

Description of problem:


In clusters with OCB functionality enabled, sometimes the machine-os-builder pod is not restarted when we update the imageBuilderType.

What we have observed is that the pod is restarted if a build is running, but it is not restarted if we are not building anything.

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-09-12-195514   True        False         88m     Cluster version is 4.14.0-0.nightly-2023-09-12-195514

How reproducible:

Always

Steps to Reproduce:

1. Create the configuration resources needed by the OCB functionality.

To reproduce this issue we use an on-cluster-build-config configmap with an empty imageBuilderType

 oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": ""}}'


2. Create an infra pool and label it so that it can use OCB functionality

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""


 oc label mcp/infra machineconfiguration.openshift.io/layering-enabled=

3. Wait for the build pod to finish.

4. Once the build has finished and it has been cleaned, update the imageBuilderType so that we now use the "custom-pod-builder" type.

oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": "custom-pod-builder"}}'


Actual results:


We waited for one hour, but the pod is never restarted.

$ oc get pods |grep build
machine-os-builder-6cfbd8d5d-xk6c5           1/1     Running   0          56m

$ oc logs machine-os-builder-6cfbd8d5d-xk6c5 |grep Type
I0914 08:40:23.910337       1 helpers.go:330] imageBuilderType empty, defaulting to "openshift-image-builder"

$ oc get cm on-cluster-build-config -o yaml |grep Type
  imageBuilderType: custom-pod-builder


Expected results:


When we update the imageBuilderType value, the machine-os-builder pod should be restarted.
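
Until that is fixed, a possible manual workaround (an assumption, based on the pod being managed by a Deployment, as the ReplicaSet-style name suggests) is to delete the machine-os-builder pod so that its replacement re-reads the imageBuilderType:

$ # deleting the pod forces its controller to recreate it with the new setting (assumed workaround)
$ oc -n openshift-machine-config-operator delete pod machine-os-builder-6cfbd8d5d-xk6c5
$ oc -n openshift-machine-config-operator get pods | grep machine-os-builder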

Additional info:


Description of problem:


MachineConfigs that use ignition 3.4.0 with a kernelArguments section are not currently allowed by the MCO.

In on-cluster build pools, when we create a 3.4.0 MC with kernelArguments, the pool is not degraded.

No new rendered MC is created either.
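
One way to confirm this (a verification sketch, not part of the original report) is to compare the newest rendered MachineConfigs and the pool status before and after applying the MC:

$ # the newest rendered configs; no new rendered-worker entry should appear
$ oc get mc --sort-by=.metadata.creationTimestamp | tail -n 3
$ oc get mcp worker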

 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-06-065940
 

How reproducible:

Always
 

Steps to Reproduce:


1. Enable on-cluster build in the "worker" pool
2. Create a MC using 3.4.0 ignition version with kernelArguments

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: "2023-09-07T12:52:11Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: mco-tc-66376-reject-ignition-kernel-arguments-worker
  resourceVersion: "175290"
  uid: 10b81a5f-04ee-4d7b-a995-89f319968110
spec:
  config:
    ignition:
      version: 3.4.0
    kernelArguments:
      shouldExist:
      - enforcing=0

Actual results:

The build process is triggered and a new image is built and deployed.

The pool is never degraded.
 

Expected results:


MCs with ignition 3.4.0 kernelArguments are not currently allowed. The MCP should be degraded, reporting a message similar to this one (this is the error reported if we deploy the MC in the master pool, which is a normal pool):

oc get mcp -o yaml
....
 - lastTransitionTime: "2023-09-07T12:16:55Z"
    message: 'Node sregidor-s10-7pdvl-master-1.c.openshift-qe.internal is reporting:
      "can''t reconcile config rendered-master-57e85ed95604e3de944b0532c58c385e with
      rendered-master-24b982c8b08ab32edc2e84e3148412a3: ignition kargs section contains
      changes"'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded

 

Additional info:



When the image is deployed (it shouldn't be deployed), the kernel argument enforcing=0 is not present:
sh-5.1# cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-05f51fadbc7fe74fa1e2ba3c0dbd0268c6996f0582c05dc064f137e93aa68184/vmlinuz-5.14.0-284.30.1.el9_2.x86_64 ostree=/ostree/boot.0/rhcos/05f51fadbc7fe74fa1e2ba3c0dbd0268c6996f0582c05dc064f137e93aa68184/0 ignition.platform.id=gcp console=tty0 console=ttyS0,115200n8 root=UUID=95083f10-c02f-4d94-a5c9-204481ce3a91 rw rootflags=prjquota boot=UUID=0440a909-3e61-4f7c-9f8e-37fe59150665 systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=1

 

Description of problem:

When opting into on-cluster builds on both the worker and control plane MachineConfigPools, the maxUnavailable value on the MachineConfigPools is not respected when the newly built image is rolled out to all of the nodes in a given pool.

Version-Release number of selected component (if applicable):

    

How reproducible:

Sometimes reproducible. I'm still working on figuring out what conditions need to be present for this to occur.

Steps to Reproduce:

    1. Opt an OpenShift cluster into on-cluster builds by following these instructions: https://github.com/openshift/machine-config-operator/blob/master/docs/OnClusterBuildInstructions.md
    2. Ensure that both the worker and control plane MachineConfigPools are opted in.    

Actual results:

Multiple nodes in both the control plane and worker MachineConfigPools are drained and cordoned simultaneously, irrespective of the maxUnavailable value. This is particularly problematic for control plane nodes, since draining more than one control plane node at a time can cause etcd issues; in addition, PDBs (Pod Disruption Budgets) can make the config change take substantially longer or block it completely.

I've mostly seen this issue affect control plane nodes, but I've also seen it impact both control plane and worker nodes.

Expected results:

I would have expected the new OS image to be rolled out in a similar fashion as new MachineConfigs are rolled out. In other words, a single node (or nodes up to maxUnavailable for non-control-plane nodes) is cordoned, drained, updated, and uncordoned at a time.
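
For reference, maxUnavailable is part of the MachineConfigPool spec, and the number of simultaneously cordoned nodes can be counted directly while the image rolls out; a rough check along these lines (the values are only examples):

$ oc patch mcp/worker --type merge -p '{"spec":{"maxUnavailable":1}}'
$ # during the rollout, this count should never exceed maxUnavailable for the pool
$ oc get nodes | grep -c SchedulingDisabled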

Additional info:

I suspect the bug may be someplace within the NodeController since that's the part of the MCO that controls which nodes update at a given time. That said, I've had difficulty reliably reproducing this issue, so finding a root cause could be more involved. This also seems to be mostly confined to the initial opt-in process. Subsequent updates seem to follow the original "rules" more closely.

The current OCB approach is a private, MCO-only API. Making it a public API would introduce the following benefits:

1. Transparent update information linked with the proposed MachineOSUpdater API
2. Follow the MCO migration to openshift/api. We should not have private APIs in the MCO anymore, especially if the feature is publicly used.
3. Consolidate build information into one place that both the MCO and other users can pull from

The general proposal of changes here is as follows:

1. Move global build settings to the ControllerConfig object or to this object. These include `finalImagePushSecret` and `finalImagePullspec`.
2. Create a MachineOSBuild CRD which will include the Dockerfile field, the MachineConfig to build from, etc.
3. Add these fields to MCP as well. Rather than thinking of this as two sources of truth, you can view the MCP fields as triggers to create or modify an existing MachineOSBuild object. This is similar to the mechanism that OpenShift BuildV1 uses with its BuildConfigs and Builds CRDs; the BuildConfig houses all of the necessary configs and a new Build is created with those configs. One does not need a BuildConfig to do a build, but one can use a BuildConfig to launch multiple builds.

Making these changes will enforce a proper system for builds, rather than the appendage to the MCO that the build API currently is. The aim here is visibility rather than hidden operations.

Description of problem:


In a cluster with a pool using OCB functionality, if we update the imageBuilderType value while an openshift-image-builder pod is building an image, the build fails.


It can fail in 2 ways:

1. The running pod that is building the image is removed, and what we get is a failed build reporting "Error (BuildPodDeleted)".
2. The machine-os-builder pod is restarted but the build pod is not removed. Then the build is never removed.


Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-09-12-195514   True        False         154m    Cluster version is 4.14.0-0.nightly-2023-09-12-195514

How reproducible:


Steps to Reproduce:

1. Create the needed resources to make OCB functionality work (on-cluster-build-config configmap, the secrets and the imageSpec)

We reproduced it using imageBuilderType=""

oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": ""}}'


2. Create an infra pool and label it so that it can use OCB functionality

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""


 oc label mcp/infra machineconfiguration.openshift.io/layering-enabled=



3. Wait until the triggered build has finished.

4. Create a new MC to trigger a new build. This one, for example:


apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-machine-config
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,dGVzdA==
        filesystem: root
        mode: 420
        path: /etc/test-file.test



5. Just after a new build pod is created, configure the on-cluster-build-config configmap to use the "custom-pod-builder" imageBuilderType

oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": "custom-pod-builder"}}'



Actual results:


We have observed 2 behaviors after step 5:


1. The machine-os-builder pod is restarted and the build is never removed.

build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   10 seconds ago
NAME                                                              READY   STATUS              RESTARTS   AGE
pod/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855-build   1/1     Running             0          12s
pod/machine-config-controller-5bdd7b66c5-dl4hh                    2/2     Running             0          90m
pod/machine-config-daemon-5wbw4                                   2/2     Running             0          90m
pod/machine-config-daemon-fqr8x                                   2/2     Running             0          90m
pod/machine-config-daemon-g77zd                                   2/2     Running             0          83m
pod/machine-config-daemon-qzmvv                                   2/2     Running             0          83m
pod/machine-config-daemon-w8mnz                                   2/2     Running             0          90m
pod/machine-config-operator-7dd564556d-mqc5w                      2/2     Running             0          92m
pod/machine-config-server-28lnp                                   1/1     Running             0          89m
pod/machine-config-server-5csjz                                   1/1     Running             0          89m
pod/machine-config-server-fv4vk                                   1/1     Running             0          89m
pod/machine-os-builder-6cfbd8d5d-2f7kd                            0/1     Terminating         0          3m26s
pod/machine-os-builder-6cfbd8d5d-h2ltd                            0/1     ContainerCreating   0          1s

NAME                                                                             TYPE     FROM         STATUS    STARTED          DURATION
build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   12 seconds ago




2. The build pod is removed and the build fails with Error (BuildPodDeleted):

NAME                                                                             TYPE     FROM         STATUS    STARTED          DURATION
build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   10 seconds ago
NAME                                                              READY   STATUS        RESTARTS   AGE
pod/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855-build   1/1     Terminating   0          12s
pod/machine-config-controller-5bdd7b66c5-dl4hh                    2/2     Running       0          159m
pod/machine-config-daemon-5wbw4                                   2/2     Running       0          159m
pod/machine-config-daemon-fqr8x                                   2/2     Running       0          159m
pod/machine-config-daemon-g77zd                                   2/2     Running       8          152m
pod/machine-config-daemon-qzmvv                                   2/2     Running       16         152m
pod/machine-config-daemon-w8mnz                                   2/2     Running       0          159m
pod/machine-config-operator-7dd564556d-mqc5w                      2/2     Running       0          161m
pod/machine-config-server-28lnp                                   1/1     Running       0          159m
pod/machine-config-server-5csjz                                   1/1     Running       0          159m
pod/machine-config-server-fv4vk                                   1/1     Running       0          159m
pod/machine-os-builder-6cfbd8d5d-g62b6                            1/1     Running       0          2m11s

NAME                                                                             TYPE     FROM         STATUS    STARTED          DURATION
build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   12 seconds ago

.....



NAME                                                                             TYPE     FROM         STATUS                    STARTED          DURATION
build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Error (BuildPodDeleted)   17 seconds ago   13s




Expected results:

Updating the imageBuilderType while a build is running should not leave the OCB functionality in a broken state.


Additional info:


Must-gather files are provided in the first comment in this ticket.

There are a few situations in which a cluster admin might want to trigger a rebuild of their OS image in addition to situations where cluster state may dictate that we should perform a rebuild. For example, if the custom Dockerfile changes or the machine-config-osimageurl changes, it would be desirable to perform a rebuild in that case. To that end, this particular story covers adding the foundation for a rebuild mechanism in the form of an annotation that can be applied to the target MachineConfigPool. What is out of scope for this story is applying this annotation in response to a change in cluster state (e.g., custom Dockerfile change).

 

Done When:

  • BuildController is aware of and recognizes a special annotation on layered MachineConfigPools (e.g., machineconfiguration.openshift.io/rebuildImage).
  • Upon recognizing that a MachineConfigPool has this annotation, BuildController will clear any failed build attempts, delete any failed builds and their related ephemeral objects (e.g., rendered Dockerfile / MachineConfig ConfigMaps), and schedule a new build to be performed.
  • This annotation should be removed when the build process completes, regardless of outcome. In other words, whether the build succeeds or fails, the annotation should be removed.
  • [optional] BuildController keeps track of the number of retries for a given MachineConfigPool. This can occur via another annotation, e.g., machineconfiguration.openshift.io/buildRetries=1. For now, the retry limit can be a hard-coded value (e.g., 5), but in the future, this could be wired up to an end-user facing knob. This annotation should be cleared upon a successful rebuild. If the retry limit is reached, then we should degrade.
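
A sketch of how an admin would trigger a rebuild once this lands, using the example annotation name from the list above (the name is illustrative, per the story; the controller is expected to remove the annotation when the build finishes):

$ oc annotate mcp/worker machineconfiguration.openshift.io/rebuildImage=""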

Description of problem:

When an MCP has the on-cluster-build functionality enabled, if we configure a valid imageBuilderType in the on-cluster-build configmap and later update this configmap with an invalid imageBuilderType, the machine-config ClusterOperator is not degraded.

 

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-09-12-195514   True        False         3h56m   Cluster version is 4.14.0-0.nightly-2023-09-12-195514

 

How reproducible:

Always
 

Steps to Reproduce:

1. Create a valid OCB configmap, and 2 valid secrets. Like this:

apiVersion: v1
data:
  baseImagePullSecretName: mco-global-pull-secret
  finalImagePullspec: quay.io/mcoqe/layering
  finalImagePushSecretName: mco-test-push-secret
  imageBuilderType: ""
kind: ConfigMap
metadata:
  creationTimestamp: "2023-09-13T15:10:37Z"
  name: on-cluster-build-config
  namespace: openshift-machine-config-operator
  resourceVersion: "131053"
  uid: 1e0c66de-7a9a-4787-ab98-ce987a846f66

3. Label the "worker" MCP in order to enable the OCB functionality in it.

$ oc label mcp/worker machineconfiguration.openshift.io/layering-enabled=

4. Wait for the machine-os-builder pod to be created, and for the build to be finished. Just wait for the pods; do not wait for the MCPs to be updated. As soon as the build pod has finished the build, go to step 5.

5. Patch the on-cluster-build configmap to use an invalid imageBuilderType

 oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": "fake"}}'


Actual results:

The machine-os-builder pod crashes
$ oc get pods
NAME                                         READY   STATUS             RESTARTS        AGE
machine-config-controller-5bdd7b66c5-6l7sz   2/2     Running            2 (45m ago)     63m
machine-config-daemon-5ttqh                  2/2     Running            0               63m
machine-config-daemon-l95rj                  2/2     Running            0               63m
machine-config-daemon-swtc6                  2/2     Running            2               57m
machine-config-daemon-vq594                  2/2     Running            2               57m
machine-config-daemon-zrf4f                  2/2     Running            0               63m
machine-config-operator-7dd564556d-9smk4     2/2     Running            2 (45m ago)     65m
machine-config-server-9sxjv                  1/1     Running            0               62m
machine-config-server-m5sdl                  1/1     Running            0               62m
machine-config-server-zb2hr                  1/1     Running            0               62m
machine-os-builder-6cfbd8d5d-t6g8w           0/1     CrashLoopBackOff   6 (3m11s ago)   9m16s


But the machine-config ClusterOperator is not degraded


$ oc get co machine-config
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.14.0-0.nightly-2023-09-12-195514   True        False         False      63m     
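
To see why the builder pod is crash-looping while the ClusterOperator stays healthy, the previous container logs can be inspected (pod name taken from the output above):

$ oc -n openshift-machine-config-operator logs machine-os-builder-6cfbd8d5d-t6g8w --previous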


 

Expected results:


The machine-config ClusterOperator  should become degraded when an invalid imageBuilderType is configured.

 

Additional info:


If we configure an invalid imageBuilderType directly (not by patching/editing the configmap), then the machine-config CO is degraded, but when we edit the configmap it is not.

A link to the must-gather file is provided in the first comment in this issue


PS: If we wait for the MCPs to be updated in step 4, the machine-os-builder pod is not restarted with the new "fake" imageBuilderType, but the machine-config CO is not degraded either, and it should be. Does that make sense?

 

Proposed title of this feature request

Add support to OpenShift Telemetry to report the provider that has been added via "platform: external"

What is the nature and description of the request?

There is a new platform called "external" that we added support for in OpenShift 4.14. It has been added for partners to enable and support their own integrations with OpenShift rather than requiring Red Hat to develop and support them.

When deploying OpenShift using "platform: external" we don't currently have the ability to identify the provider on which the platform has been deployed, which is key for the product team to analyze demand and other metrics.
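
For context, the provider name is expected to be recorded on the Infrastructure resource when the external platform is used; a query along these lines could surface the value that telemetry would need to report (the exact field path is an assumption, not confirmed by this request):

$ # field path assumed; adjust to wherever the external platform records the provider name
$ oc get infrastructure cluster -o jsonpath='{.spec.platformSpec.external.platformName}'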

Why does the customer need this? (List the business requirements)

OpenShift Product Management needs this information to analyze adoption of these new platforms as well as other metrics specifically for these platforms to help us to make decisions for the product development.

List any affected packages or components.

Telemetry for OpenShift

 

There is some additional information in the following Slack thread --> https://redhat-internal.slack.com/archives/CEG5ZJQ1G/p1698758270895639

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision PowerVS infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

  • The PowerVS IPI Installer no longer contains or uses Terraform.
  • The new provider should aim to provide the same results and have parity with the existing PowerVS Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for OpenStack deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision OpenStack infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

  • The OpenStack Installer no longer contains or uses Terraform.
  • The new provider should aim to provide the same results and have parity with the existing OpenStack Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Goal

  • Create the Cluster and OpenStackCluster resources for the install-config.yaml
  • Create OpenStackMachine
  • Remove terraform dependency for OpenStack

Why is this important?

  • To have a CAPO cluster functionally equivalent to the installer

Scenarios


  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Rebase Installer onto the development branch of cluster-api-provider-openstack to provide CI signal to the CAPO maintainers.

Right now, when trying an installation with this work (https://github.com/openshift/installer/pull/7939), the bootstrap machine is not getting deleted. We need to ensure it is gone once the bootstrap is finalized.

Essentially: bring the upstream-master branch of shiftstack/cluster-api-provider-openstack under the github.com/openshift organisation.

Feature Overview (aka. Goal Summary)

Today we expose two main APIs for HyperShift, namely `HostedCluster` and `NodePool`. We also have metrics to gauge adoption by reporting the # of hosted clusters and nodepools.

But we are still missing other metrics to be able to make correct inference about what we see in the data.

Goals (aka. expected user outcomes)

  • Provide Metrics to highlight # of Nodes per NodePool or # of Nodes per cluster
  • Make sure the error between what appears in CMO via `install_type` and what we report as # Hosted Clusters is minimal.

Use Cases (Optional):

  • Understand product adoption
  • Gauge Health of deployments
  • ...

 

Overview

Today we have hypershift_hostedcluster_nodepools as a metric exposed to provide information on the # of nodepools used per cluster. 

 

Additional NodePools metrics such as hypershift_nodepools_size and hypershift_nodepools_available_replicas are available but not ingested in Telemetry.

In addition to knowing how many nodepools per hosted cluster, we would like to expose the knowledge of the nodepool size.

 

This will help inform our decision making and provide some insights on how the product is being adopted/used.

Goals

The main goal of this epic is to show the following NodePools metrics on Telemeter, ideally as recording rules: 

  • hypershift_nodepools_size
  • hypershift_nodepools_available_replicas

Requirements

The implementation involves creating updates to the following GitHub repositories:

similar PRs:
https://github.com/openshift/hypershift/pull/1544
https://github.com/openshift/cluster-monitoring-operator/pull/1710

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer - specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision GCP infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

  • The GCP IPI Installer no longer contains or uses Terraform.
  • The new provider should aim to provide the same results and have parity with the existing GCP Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Provision GCP infrastructure without the use of Terraform

Why is this important?

  • Removing Terraform from Installer

Scenarios

  1. The new provider should aim to provide the same results as the existing GCP

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

When testing GCP using the CAPG provider (not Terraform) in 4.16, it was found that the master VM instances were not distributed across instance groups but were all assigned to the same instance group.

Here is a (partial) CAPG install vs. an installation completed using Terraform. The CAPG installation (bfournie-capg-test-5ql8j) has VMs all using us-east1-b:

$ gcloud compute instances list | grep bfournie
bfournie-capg-test-5ql8j-bootstrap     us-east1-b     n2-standard-4               10.0.0.4     34.75.212.239  RUNNING
bfournie-capg-test-5ql8j-master-0      us-east1-b     n2-standard-4               10.0.0.5                    RUNNING
bfournie-capg-test-5ql8j-master-1      us-east1-b     n2-standard-4               10.0.0.6                    RUNNING
bfournie-capg-test-5ql8j-master-2      us-east1-b     n2-standard-4               10.0.0.7                    RUNNING
bfournie-test-tf-pdrsw-master-0        us-east4-a     n2-standard-4               10.0.0.4                    RUNNING
bfournie-test-tf-pdrsw-worker-a-vxjbk  us-east4-a     n2-standard-4               10.0.128.2                  RUNNING
bfournie-test-tf-pdrsw-master-1        us-east4-b     n2-standard-4               10.0.0.3                    RUNNING
bfournie-test-tf-pdrsw-worker-b-ksxfg  us-east4-b     n2-standard-4               10.0.128.3                  RUNNING
bfournie-test-tf-pdrsw-master-2        us-east4-c     n2-standard-4               10.0.0.5                    RUNNING
bfournie-test-tf-pdrsw-worker-c-jpzd5  us-east4-c     n2-standard-4               10.0.128.4                  RUNNING

User Story:

As a (user persona), I want to be able to:

  • Ensure that when the cluster should use CAPI installs, the correct path is chosen. In other words: no longer use terraform for installations in GCP when the user has selected the featureSet for CAPI.

so that I can achieve

  • CAPI gcp installation

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

When using the CAPG provider the ServiceAccounts created by the installer for the master and worker nodes do not have the role bindings added correctly.

For example this query shows that the SA for the master nodes has no role bindings.

$ gcloud projects get-iam-policy openshift-dev-installer --flatten="bindings[].members" --format='table(bindings.role)' --filter='bindings.members:bfournie-capg-test-lk5t5-m@openshift-dev-installer.iam.gserviceaccount.com'
$

User Story:

I want to destroy the load balancers created by capg

Acceptance Criteria:

Description of criteria:

  • destroy deletes all capg related resources
  • backwards compatible: continues to destroy terraform clusters
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

  • CAPG creates: global proxy load balancers
  • our destroy code only looks for regional passthrough load balancers
  • need to:
    • update destroy code to not look at regional load balancers (only)
    • destroy tcp proxy resource, which is not created in passthrough load balancers, so is not currently deleted
  • Engineering detail 2

This requires/does not require a design proposal.
This requires/does not require a feature gate.
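
A quick way to check for the CAPG-created global proxy resources described in the engineering details above after a destroy run (the name filter is illustrative):

$ # any surviving entries indicate resources the destroy code missed
$ gcloud compute target-tcp-proxies list --filter="name~<infraID>"
$ gcloud compute forwarding-rules list --filter="name~<infraID>"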

User Story:

I want to create the public and private DNS records using one of the CAPI interface SDK hooks.

Acceptance Criteria:

Description of criteria:

  • Vanilla install: create public DNS record, private zone, private record
  • Shared VPC: check for existing private DNS zone
  • Publish Internal: create private zone, private record
  • Custom DNS scenario: create no records, or zone

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

  • Value for DNS record for capi-provisioned LB, can be grabbed from cluster manifest
  • The value for the SDK provisioned LB will need to be handled within our hook

This requires/does not require a design proposal.
This requires/does not require a feature gate.

As an installer user, I want my gcp creds used for install to be used by the CAPG controller when provisioning resources.

Acceptance Criteria: 

  • Users can authenticate using the service account from ~/.gcp/osServiceAccount.json
  • users can authenticate with default application credentials
  • Docs team is updated on whether existing credential methods will continue to work (specifically environment variables); see official docs

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.
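
The two authentication paths from the acceptance criteria, sketched as shell commands (the environment-variable route relies on standard Application Default Credentials behaviour; whether it keeps working is exactly the open docs question above):

$ # service account file produced for the install
$ export GOOGLE_APPLICATION_CREDENTIALS=~/.gcp/osServiceAccount.json
$ # or: application default credentials
$ gcloud auth application-default login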

When installing on GCP, I want control-plane (including bootstrap) machines to bootstrap using ignition.

I want bootstrap ignition to be secured so that sensitive data is not publicly available.

Acceptance Criteria:

Description of criteria:

  • Control-plane machines pull ignition (boot successfully)
  • Bootstrap ignition is not public (typically signed url)
  • Service account is not required for signed url (stretch goal)
  • Should be labeled (with owned and user tags)

(optional) Out of Scope:

Destroying bootstrap ignition can be handled separately.

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.
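
For reference, a signed URL is the typical way to keep bootstrap ignition private on GCS; a sketch with gsutil, where the bucket and object names are hypothetical and not the installer's actual naming:

$ # generates a time-limited URL without making the object public
$ gsutil signurl -d 1h service-account-key.json gs://<infraID>-bootstrap/bootstrap.ign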

https://issues.redhat.com/browse/CORS-3217 covers the upstream changes to CAPG needed to add disk encryption. In addition, changes will be needed in the installer to set the GCPMachine disk encryption based on the machinepool settings.

Notes on the required changes are at https://docs.google.com/document/d/1kVgqeCcPOrq4wI5YgcTZKuGJo628dchjqCrIrVDS83w/edit?usp=sharing

 

Once the upstream changes from CORS-3217 have been accepted:

  • plumb through existing encryption key fields
  • vendor updated CAPG
  • update infrastructure-components.yaml (CRD definitions) if necessary

User Story:

I want to create a load balancer to provide split-horizon DNS for the cluster.

Acceptance Criteria:

Description of criteria:

  • In a vanilla install, we have both ext and int load balancers
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Description of problem:

The bootstrap machine never contains a public IP address. When the publish strategy is set to External, the bootstrap machine should contain a public IP address.

Version-Release number of selected component (if applicable):

    

How reproducible:

always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Create the GCP Infrastructure controller in /pkg/clusterapi/system.go.
It will be based on the AWS controller in that file, which was added in https://github.com/openshift/installer/pull/7630.

User Story:

I want the installer to create the service accounts that would be assigned to control plane and compute machines, similar to what is done in terraform now. 

Acceptance Criteria:

Description of criteria:

  • Control plane and compute service accounts are created with appropriate permissions (see details)
  • Service accounts are attached to machines
  • Skip creation of control-plane service accounts when they are specified in the install config

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

When a GCP cluster is created using CAPI, upon destroy the addresses associated with the apiserver LoadBalancer are not removed. For example here are addresses left over after previous installations

$ gcloud compute addresses list --uri | grep bfournie
https://www.googleapis.com/compute/v1/projects/openshift-dev-installer/global/addresses/bfournie-capg-test-27kzq-apiserver
https://www.googleapis.com/compute/v1/projects/openshift-dev-installer/global/addresses/bfournie-capg-test-6jrwz-apiserver
https://www.googleapis.com/compute/v1/projects/openshift-dev-installer/global/addresses/bfournie-capg-test-gn6g7-apiserver
https://www.googleapis.com/compute/v1/projects/openshift-dev-installer/global/addresses/bfournie-capg-test-h96j2-apiserver
https://www.googleapis.com/compute/v1/projects/openshift-dev-installer/global/addresses/bfournie-capg-test-k7fdj-apiserver
https://www.googleapis.com/compute/v1/projects/openshift-dev-installer/global/addresses/bfournie-capg-test-nh4z5-apiserver
https://www.googleapis.com/compute/v1/projects/openshift-dev-installer/global/addresses/bfournie-capg-test-nls2h-apiserver
https://www.googleapis.com/compute/v1/projects/openshift-dev-installer/global/addresses/bfournie-capg-test-qrhmr-apiserver

Here is one of the addresses:

$ gcloud compute addresses describe https://www.googleapis.com/compute/v1/projects/openshift-dev-installer/global/addresses/bfournie-capg-test-27kzq-apiserver
address: 34.107.255.76
addressType: EXTERNAL
creationTimestamp: '2024-04-15T15:17:56.626-07:00'
description: ''
id: '2697572183218067835'
ipVersion: IPV4
kind: compute#address
labelFingerprint: 42WmSpB8rSM=
name: bfournie-capg-test-27kzq-apiserver
networkTier: PREMIUM
selfLink: https://www.googleapis.com/compute/v1/projects/openshift-dev-installer/global/addresses/bfournie-capg-test-27kzq-apiserver
status: RESERVED
[bfournie@bfournie installer-patrick-new]$ gcloud compute addresses describe https://www.googleapis.com/compute/v1/projects/openshift-dev-installer/global/addresses/bfournie-capg-test-6jrwz-apiserver
address: 34.149.208.133
addressType: EXTERNAL
creationTimestamp: '2024-03-27T09:35:00.607-07:00'
description: ''
id: '1650865645042660443'
ipVersion: IPV4
kind: compute#address
labelFingerprint: 42WmSpB8rSM=
name: bfournie-capg-test-6jrwz-apiserver
networkTier: PREMIUM
selfLink: https://www.googleapis.com/compute/v1/projects/openshift-dev-installer/global/addresses/bfournie-capg-test-6jrwz-apiserver
status: RESERVED
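
Until destroy handles these, the leftover apiserver addresses can be removed manually; they are global addresses, so the --global flag is required:

$ gcloud compute addresses delete bfournie-capg-test-27kzq-apiserver --global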


Now that https://issues.redhat.com/browse/CORS-3447 provides the ability to override the APIServer instance group to be compatible with MAPI, we need to set the override in the installer when the Internal LoadBalancer is created.

User Story:

As a (user persona), I want to be able to:

  • Create the GCP cluster manifest for CAPI installs

so that I can achieve

  • The manifests in <assets-dir>/cluster-api will be applied to bootstrap ignition and the files will find their way to the machines.

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

When GCP workers are created they are not able to pull ignition over the internal subnet, as it's not allowed by the firewall rules created by CAPG. The allow-<infraID>-cluster rule allows all TCP traffic for instances tagged <infraID>-node and <infraID>-control-plane, but the workers that are created have the tag <infraID>-worker.

We need to either add the worker tags to this firewall rule or add node tags to the worker. We should decide on a general use of CAPG firewall rules.
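
For illustration, the first option is a one-line change to the existing rule (note that --target-tags replaces the whole tag list, so all tags must be repeated; the rule name is taken from above):

$ gcloud compute firewall-rules update allow-<infraID>-cluster --target-tags=<infraID>-control-plane,<infraID>-node,<infraID>-worker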

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision AWS infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

  • The AWS IPI Installer no longer contains or uses Terraform.
  • The new provider should aim to provide the same results and have parity with the existing AWS Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Epic Goal

  • Provision AWS infrastructure without the use of Terraform

Why is this important?

  • This is a key piece in producing a terraform-free binary for ROSA. See parent epic for more details.

Scenarios

  1. The new provider should aim to provide the same results as the existing AWS terraform provider.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

 

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Use cases to ensure:

  • When installconfig.controlPlane.platform.aws.zones is specified, control plane nodes are correctly placed in the zones.
  • When installconfig.controlPlane.platform.aws.zones isn't specified, the control plane nodes are correctly balanced across the zones available in the region, avoiding single-zone placement when possible.
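
To verify either case above, the control plane placement can be checked after install (the tag filter is illustrative):

$ aws ec2 describe-instances --filters "Name=tag:Name,Values=<infraID>-master-*" \
    --query "Reservations[].Instances[].Placement.AvailabilityZone" --output text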

User Story:

As a (user persona), I want to be able to:

  • Deploy AWS cluster with IPI with minimum customizations without terraform

Acceptance Criteria:

Description of criteria:

  • Install complete, e2e pass
  • Production-ready
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

The IAM role is correctly attached to the control plane nodes when installconfig.controlPlane.platform.aws.iamRole is specified.

User Story:

As a (user persona), I want to be able to:

  • make sure ignition respects proxy configuration

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

When installconfig.controlPlane.platform.aws.metadataService is set, the metadata service is correctly configured for control plane machines.

Security group IDs are added to control plane nodes when installconfig.controlPlane.platform.aws.additionalSecurityGroupIDs is specified.

User Story:

As a (user persona), I want to be able to:

  • implement custom endpoint support if it's still needed.

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

CAPA shows

 

I0312 18:00:13.602972     109 s3.go:220] "Deleting S3 object" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api-guests/rdossant-installer-03-jjf6b-master-2" namespace="openshift-cluster-api-guests" name="rdossant-installer-03-jjf6b-master-2" reconcileID="9cda22be-5acd-4670-840f-8a6708437385" machine="openshift-cluster-api-guests/rdossant-installer-03-jjf6b-master-2" cluster="openshift-cluster-api-guests/rdossant-installer-03-jjf6b" bucket="openshift-bootstrap-data-rdossant-installer-03-jjf6b" key="control-plane/rdossant-installer-03-jjf6b-master-2"
I0312 18:00:13.608919     109 s3.go:220] "Deleting S3 object" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api-guests/rdossant-installer-03-jjf6b-master-0" namespace="openshift-cluster-api-guests" name="rdossant-installer-03-jjf6b-master-0" reconcileID="1ed0ad52-ffc1-4b62-97e4-876f8e8c3242" machine="openshift-cluster-api-guests/rdossant-installer-03-jjf6b-master-0" cluster="openshift-cluster-api-guests/rdossant-installer-03-jjf6b" bucket="openshift-bootstrap-data-rdossant-installer-03-jjf6b" key="control-plane/rdossant-installer-03-jjf6b-master-0"
[...]
E0312 18:04:25.282967     109 awsmachine_controller.go:576] "controllers/AWSMachine: unable to delete secrets" err=<
    deleting bootstrap data object: deleting S3 object: NotFound: Not Found
        status code: 404, request id: 9QYY3QSWKBBDZ7R8, host id: 2f3HawFbPheaptP9E+WRbu3fhEXTMwyZQ1DBPGBG7qlg74ssQR0XISM4OSlxvrn59GeFREtN4hp9C+S5LgQD2g==
 >
E0312 18:04:25.284197     109 controller.go:329] "Reconciler error" err=<
    deleting bootstrap data object: deleting S3 object: NotFound: Not Found
        status code: 404, request id: 9QYY3QSWKBBDZ7R8, host id: 2f3HawFbPheaptP9E+WRbu3fhEXTMwyZQ1DBPGBG7qlg74ssQR0XISM4OSlxvrn59GeFREtN4hp9C+S5LgQD2g==
 > controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api-guests/rdossant-installer-03-jjf6b-master-0" namespace="openshift-cluster-api-guests" name="rdossant-installer-03-jjf6b-master-0" reconcileID="7fac94a1-772a-4c7b-a631-5ef7fc015d5b"
E0312 18:04:25.286152     109 awsmachine_controller.go:576] "controllers/AWSMachine: unable to delete secrets" err=<
    deleting bootstrap data object: deleting S3 object: NotFound: Not Found
        status code: 404, request id: 9QYPFY0EQBM42VYH, host id: nJZakAhLrbZ1xrSNX3tyk0IKmMgFjsjMSs/D9nzci90GfRNNfUnvwZTbcaUBQYiuSlY5+aysCuwejWpvi8FmGusbQCK1Qtjr9pjqDQfxzY4=
 >
E0312 18:04:25.287353     109 controller.go:329] "Reconciler error" err=<
    deleting bootstrap data object: deleting S3 object: NotFound: Not Found
        status code: 404, request id: 9QYPFY0EQBM42VYH, host id: nJZakAhLrbZ1xrSNX3tyk0IKmMgFjsjMSs/D9nzci90GfRNNfUnvwZTbcaUBQYiuSlY5+aysCuwejWpvi8FmGusbQCK1Qtjr9pjqDQfxzY4=
 > controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api-guests/rdossant-installer-03-jjf6b-master-2" namespace="openshift-cluster-api-guests" name="rdossant-installer-03-jjf6b-master-2" reconcileID="b6c792ad-5519-48d5-a994-18dda76d8a93"
E0312 18:04:25.291383     109 awsmachine_controller.go:576] "controllers/AWSMachine: unable to delete secrets" err=<
    deleting bootstrap data object: deleting S3 object: NotFound: Not Found
        status code: 404, request id: 9QYGWSJDR35Q4GWX, host id: Qnltg++ia3VapXjtENZOQIwfAxbxfwVLPlC0DwcRBx+L60h52ENiNqMOkvuNwJyYnPxbo/CaawzMT11oIKGO9g==
 >
E0312 18:04:25.292132     109 controller.go:329] "Reconciler error" err=<
    deleting bootstrap data object: deleting S3 object: NotFound: Not Found
        status code: 404, request id: 9QYGWSJDR35Q4GWX, host id: Qnltg++ia3VapXjtENZOQIwfAxbxfwVLPlC0DwcRBx+L60h52ENiNqMOkvuNwJyYnPxbo/CaawzMT11oIKGO9g==
 > controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api-guests/rdossant-installer-03-jjf6b-master-1" namespace="openshift-cluster-api-guests" name="rdossant-installer-03-jjf6b-master-1" reconcileID="92e1f8ed-b31f-4f75-9083-59aad15efe79"
E0312 18:04:25.679859     109 awsmachine_controller.go:576] "controllers/AWSMachine: unable to delete secrets" err=<
    deleting bootstrap data object: deleting S3 object: NotFound: Not Found
        status code: 404, request id: 9QYSBZGYPC7SNJEX, host id: EplmtNQ+RxmbU88z+4App6YEVvniJpyCeMiMZuUegJIMqZgbkA1lmCjHntSLDm4eA857OdhtHsn+zD6AX7uelGIsogzN2ZziiAZXZrbIIEg=
 >
E0312 18:04:25.680663     109 controller.go:329] "Reconciler error" err=<
    deleting bootstrap data object: deleting S3 object: NotFound: Not Found
        status code: 404, request id: 9QYSBZGYPC7SNJEX, host id: EplmtNQ+RxmbU88z+4App6YEVvniJpyCeMiMZuUegJIMqZgbkA1lmCjHntSLDm4eA857OdhtHsn+zD6AX7uelGIsogzN2ZziiAZXZrbIIEg=
 > controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="openshift-cluster-api-guests/rdossant-installer-03-jjf6b-master-0" namespace="openshift-cluster-api-guests" name="rdossant-installer-03-jjf6b-master-0" reconcileID="9e436c67-aca0-409c-9179-0ce4cccce9ad"

Even though we are not creating S3 buckets for the master nodes, these errors occur and they prevent the bootstrap process from finishing.

 

CAPA assumes that subnets have auto-assign public IPs turned on, because that is how it configures the subnets it creates. Supplying your own VPC where that is not the case causes the bootstrap node to not get a public IP and therefore not be able to download the release image (no internet connection).

The bootstrap node needs a public IP because the public subnets are connected only to the internet gateway, which does not provide NAT.
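
A quick way to confirm whether a user-supplied VPC is affected is to check the MapPublicIpOnLaunch attribute on its public subnets; this is a hedged example and the VPC ID is a placeholder:

$ aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=vpc-0123456789abcdef0" \
    --query 'Subnets[].[SubnetId,AvailabilityZone,MapPublicIpOnLaunch]' \
    --output table

Subnets reporting False will not hand out public IPs to instances launched into them unless the launch request asks for one explicitly, which is the situation that breaks the bootstrap node here.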

User Story:

Destroy all bootstrap resources created through the new non-terraform provider.

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Issue:

  • The Kubernetes API on OpenShift requires advanced health check configuration to ensure graceful termination[1] works correctly. The health check protocol must be HTTPS, testing the path /readyz.
    • Currently CAPA supports only the standard configuration (health check probe timers), and a custom path only when HTTP or HTTPS is selected.
  • The MCS listener/target must also use an HTTPS health check with custom probe periods.

Steps to reproduce:

  • Create CAPA cluster
  • Check the Target Groups created to route traffic to API (6443), for both Load Balancers (int and ext)

Actual results:

  • TCP health check w/ default probe config

Expected results:

  • HTTPS health check evaluating /readyz path
  • Check parameters: 2 checks at a 10s interval, with a 10s timeout
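
One way to verify what CAPA actually configured is to list the health check attributes of the API target groups; this is a hedged example and the load balancer ARN is a placeholder:

$ aws elbv2 describe-target-groups \
    --load-balancer-arn <api-load-balancer-arn> \
    --query 'TargetGroups[].[TargetGroupName,Port,HealthCheckProtocol,HealthCheckPath,HealthCheckIntervalSeconds,HealthyThresholdCount]' \
    --output table

With the expected configuration, the port 6443 target groups should report HTTPS, the /readyz path, and the probe parameters listed above.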

 

References:

Goal:

  • As an OCP developer, I would like to deploy OpenShift using the installer/CAPA to provision the infrastructure with the health check parameters set to match the existing Terraform implementation

Issue:

  • The API listeners are created with different health check attributes than the Terraform implementation. There is a bug to fix so that the correct path is used when the health check protocol is HTTP or HTTPS (CORS-3289)
  • The MCS listener, created using the AdditionalListeners option from the *LoadBalancer, requires advanced health check configuration to match the Terraform implementation. Currently CAPA sets the health check to standard TCP, or to the default path ("/") when the HTTP or HTTPS protocol is used.

Steps to reproduce:

  • Create CAPA cluster
  • Check the Target Groups created to route traffic to MCS (22623), for internal Load Balancers

Actual results:

  • TCP or HTTP/S health check w/ default probe config

Expected results:

  • API listeners with target health check using HTTPS evaluating /healthz path
  • MCS listener with target health check using HTTPS evaluating /healthz path
  • Check parameters: 2 checks at a 10s interval, with a 10s timeout

 

References:

CAPA creates 4 security groups:

$ aws ec2 describe-security-groups --region us-east-2 --filters "Name = group-name, Values = *rdossant*" --query "SecurityGroups[*].[GroupName]" --output text
rdossant-installer-03-tvcbd-lb
rdossant-installer-03-tvcbd-controlplane
rdossant-installer-03-tvcbd-apiserver-lb
rdossant-installer-03-tvcbd-node

Given that the maximum number of SGs in a network interface is 16, we should update the max number validation in the installer:

https://github.com/openshift/installer/blob/master/pkg/types/aws/validation/machinepool.go#L66

Patrick says:

I think we want to update this to cap the user limit to 10 additional security groups:

More context: https://redhat-internal.slack.com/archives/C68TNFWA2/p1697764210634529?thread_ts=1697471429.293929&cid=C68TNFWA2

When installconfig.platform.aws.userTags is specified, all taggable resources should have the specified user tags.

  • Manifests:
    • cluster
    • machines
  • Non-capi provisioned resources:
    • IAM roles
    • Load balancer resources
    • DNS resources (private zone is tagged)
  • Check compatibility of how CAPA provider is tagging bootstrap S3 bucket
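
A simple spot check for the tagging criteria above is to list every resource carrying one of the user tags through the Resource Groups Tagging API; this is a hedged example and the tag key/value are placeholders:

$ aws resourcegroupstaggingapi get-resources \
    --tag-filters Key=owner,Values=installer-qe \
    --query 'ResourceTagMappingList[].ResourceARN' \
    --output text

Any taggable resource created for the cluster that is missing from this list would point to a gap in tag propagation.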

 

When using Wavelength Zones, networks are CIDR'd differently than in vanilla installs. Ensure Wavelength Zone support.

Private hosted zone and cross-account shared VPC work when installconfig.platform.aws.hostedZone is specified.

The scheme check[1] in the LB reconciliation is hardcoded to check only the primary Load Balancer. As a result, it always filters the subnets by the primary's scheme, ignoring additional Load Balancers ("SecondaryControlPlaneLoadBalancer").

How to reproduce:

  • Create a cluster w/ secondaryLoadBalancer w/ internet-facing
  • Check the subnets for the secondary load balancers: "ext" (public) API Load Balancer's subnets

Actual results:

  • Private subnets attached to the SecondaryControlPlaneLoadBalancer

Expected results:

  • Public subnets attached to the SecondaryControlPlaneLoadBalancer

 

References:

 

As an OpenShift admin, I want to leverage /dev/fuse in unprivileged containers so that I can integrate cloud storage into OpenShift applications in a secure, efficient, and scalable manner. This approach simplifies application architecture and allows developers to interact with cloud storage as if it were a local filesystem, all while maintaining strong security practices.

Epic Goal

  • Give users the ability to mount /dev/fuse into a pod by default with the `io.kubernetes.cri-o.Devices` annotation
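
As an illustration of the intended user experience, a pod could request the device through the annotation as sketched below; this assumes the annotation is allowed by CRI-O on the target nodes, and the image is just an example:

$ oc apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: fuse-demo
  annotations:
    io.kubernetes.cri-o.Devices: "/dev/fuse"
spec:
  containers:
  - name: build
    image: registry.access.redhat.com/ubi9/ubi
    command: ["sleep", "infinity"]
EOF

Inside the container, /dev/fuse should then be available without running the pod as privileged.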

Why is this important?

  • It's the first step in a series of steps that allows users to run unprivileged containers within containers
  • It also gives access to faster builds within containers

Scenarios

  1. As a developer on OpenShift, I would like to run builds within containers in a performant way

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • Through this epic, we will update our CI to use a UPI workflow instead of the libvirt openshift-installer, allowing us to eliminate the use of terraform in our deployments.

Why is this important?

  • There is an active initiative in openshift to remove terraform from the openshift installer.

Acceptance Criteria

  • All tasks within the epic are completed.

Done Checklist

  • CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
  • All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.

As a multiarch CI-focused engineer, I want to create a workflow in `openshift/release` that will enable creating the backend nodes for a cluster installation.

A customer has escalated the following issues where ports don't have TLS support. This feature request lists all the component ports raised by the customer.

Details here https://docs.google.com/document/d/1zB9vUGB83xlQnoM-ToLUEBtEGszQrC7u-hmhCnrhuXM/edit

https://access.redhat.com/solutions/5437491

Feature Overview (aka. Goal Summary)

Reduce the resource footprint of LVMS with regard to CPU, memory, and image count and size by collapsing the current set of containers and deployments into a small number of highly integrated ones.

Goals (aka. expected user outcomes)

  • Reduce the resource footprint while keeping the same functionality
  • Provide seamless migration for existing customers

Requirements (aka. Acceptance Criteria):

  • Resource reduction:
    • Reduce memory consumption
    • Reduce CPU consumption (stressed, idle, requested)
    • Reduce container count (to reduce API/scheduler/CRI-O load)
    • Reduce container image sizes (to speed up deployments)
  • The new version must be functionally equivalent: no current feature/function is dropped
  • Day-1 operations (installation, configuration) must be the same
  • Day-2 operations (updates, config changes, monitoring) must be the same
  • Seamless migration for existing customers
  • Special care must be taken with MicroShift, as it uses LVMS in a special way.

Questions to Answer (Optional):

  1. Do we need some sort of DP/TP release, making it an opt-in feature for customers to try?

Out of Scope

tbd

Background

The idea was created during a ShiftWeek project. Potential savings / reductions are documented here: https://docs.google.com/presentation/d/1j646hJDVNefFfy1Z7glYx5sNBnSZymDjCbUQVOZJ8CE/edit#slide=id.gdbe984d017_0_0

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Resource Requirements should be added/updated to the documentation in the requirements section.

Interoperability Considerations

Interoperability with MicroShift is a challenge, as it allows way more detailed configuration of topolvm with direct access to lvmd.conf

 

Feature Overview (aka. Goal Summary)

Enable developers to perform actions in a faster and better way than before.

Goals (aka. expected user outcomes)

Developers will be able to reduce the time or clicks spent on the UI to perform specific set of actions.

Requirements (aka. Acceptance Criteria):

The Search->Resources option should show the 5 most recently searched resources across all of the user's sessions.

The recently searched resources should be clearly visible and separated from the rest.

Pinning resources capability should be removed.

Getting Started menu on Add page can be collapsed and expanded

Use Cases (Optional):

 

Questions to Answer (Optional):

 

Out of Scope

 

Background

 

Customer Considerations

 

Documentation Considerations

 

Interoperability Considerations


Problem:

Developers have to repeatedly search for resources and pin them separately in order to view details of a resource that they have seen in the past.

Goal:

Provide developers with the ability to see the last 5 resources that they have seen in the past so they can quickly view their details without any further actions.

Why is it important?

Provides a better user experience for developers when using the console.

Use cases:

  1. <case>

Acceptance criteria:

  1. Same as listed in OCPSTRAT-1024

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description

As a user, I want the console to remember the resources I have recently searched so that I don't have to type the names of the same resources I use frequently in the Search Page.

Acceptance Criteria

  1. Modify the Search dropdown to add a new section to display the recently searched items in the order they were searched.

Additional Details:

Problem:

The Getting Started menu on the Add page cannot be restored after users click on the "X" symbol once it is hidden.

Goal:

Users should be able to collapse and expand the Getting Started menu on Add Page.

Why is it important?

The current behavior causes confusion.

Use cases:

  1. <case>

Acceptance criteria:

  1. The Getting Started menu on Add page can be collapsed by clicking on the screen.
  2. The collapsed Getting Started page can be expanded by clicking on the screen.
  3. When collapsed, it is visibly available for someone to expand again.

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description

As of now, the Getting Started section can be hidden and shown again using a button, but the user can dismiss that button and it will not come back, which is confusing. So add an expandable section instead of the hide/show button, similar to the Functions List page.

Acceptance Criteria

  1. Update the Getting Started section to use an expandable section on the Add page
  2. Update the Getting Started section to use an expandable section on the Cluster tab of the Overview page in the Admin perspective
  3. Update the test cases

Additional Details:

Check Functions list page in Dev perspective for the design

Phase 2 Deliverable:

GA support for a generic interface for administrators to define custom reboot/drain suppression rules. 

Epic Goal

  • Allow administrators to define which machineconfigs won't cause a drain and/or reboot.
  • Allow administrators to define which ImageContentSourcePolicy/ImageTagMirrorSet/ImageDigestMirrorSet won't cause a drain and/or reboot
  • Allow administrators to define alternate actions (typically restarting a system daemon) to take instead.
  • Possibly (pending discussion) add a switch that allows the administrator to choose a kexec "restart" instead of a full hardware reset via reboot.

Why is this important?

  • There is a demonstrated need from customer cluster administrators to push configuration settings and restart system services without restarting each node in the cluster. 
  • Customers are modifying ICSP/ITMS/IDMS objects, or adding new ones, post day 1
  • (kexec - we are not committed on this point yet) Server class hardware with various add-in cards can take 10 minutes or longer in BIOS/POST. Skipping this step would dramatically speed-up bare metal rollouts to the point that upgrades would proceed about as fast as cloud deployments. The downside is potential problems with hardware and driver support, in-flight DMA operations, and other unexpected behavior. OEMs and ODMs may or may not support their customers with this path.

Scenarios

  1. As a cluster admin, I want to reconfigure sudo without disrupting workloads.
  2. As a cluster admin, I want to update or reconfigure sshd and reload the service without disrupting workloads.
  3. As a cluster admin, I want to remove mirroring rules from an ICSP, ITMS, or IDMS object without disrupting workloads, because the scenario in which this might lead to non-pullable images at an undefined later point in time doesn't apply to me.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Follow up epic to https://issues.redhat.com/browse/MCO-507, aiming to graduate the feature from tech preview and GA'ing the functionality.

This status was added as a legacy field and isn't currently used for anything, nor should it be there. We'd like to remove this, so:

  1. we get rid of unused fields
  2. we can use the proper API condition types for statuses

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision OpenShift on the existing supported providers' infrastructure without the use of Terraform.

This feature will be used to track all the CAPI preparation work that is common for all the supported providers

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Epic Goal

  • Day 0 Cluster Provisioning
  • Compatibility with existing workflows that do not require a container runtime on the host

Why is this important?

  • This epic would maintain compatibility with existing customer workflows that do not have access to a management cluster and do not have the dependency of a container runtime

Scenarios

  1. openshift-install running in customer automation

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a CAPI install user, I want to be able to:

  • gather bootstrap logs
    • automatically after bootstrap failure during install
    • when running `gather bootstrap` command

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Define CAPI provider interface for grabbing IP addresses
  • Handle loading manifests from disk
  • Confirm that IP addresses are specified in manifests
  • Point 3

(optional) Out of Scope:

This is intended to be platform agnostic. If there is a common way for obtaining ip addresses from capi manifests, this should be sufficient. Otherwise, this should enable other platforms to implement their specific logic.

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

 I want hack/build.sh to embed the kube-apiserver and etcd dependencies in openshift-install without making external network calls so that ART/OSBS can build the installer with CAPI dependencies.

Acceptance Criteria:

Description of criteria:

  • dependencies are not obtained over the internet
  • gated by OPENSHIFT_INSTALL_CLUSTER_API env var
  • should work when building for various architectures

(optional) Out of Scope:

Engineering Details:

  • Currently the dependencies are obtained through the sync_envtest function in build-cluster-api.sh
  • Cluster API provider dependencies are vendored and built here

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Write CAPI manifests to disk during create manifests so that they can be user edited and users can also provide their own set of manifests. In general, we think of manifests as an escape hatch that should be used when a feature is missing from the install config, and users accept the degraded user experience of editing manifests in order to achieve non-install-config-supported functionality.

Acceptance criteria:

Manifests should be generated correctly (and applied correctly to the control plane):

  • when create manifests is run before create cluster
  • only when create cluster is run

 

There is some WIP for this, but there are issues with the serialization/deserialization flow when writing the GVK in the manifests.

All CAPI artifacts should be collected in a hidden directory.

They should always be written (regardless of success or failure).

They should be collected in the installer log bundle.

Right now the CAPI providers will run indefinitely. We need to stop the installer if installs fail, based on either a timeout or more sophisticated analysis.

We cannot execute cross-compiled binaries, otherwise we get an error saying:

hack/build-cluster-api.sh: line 26: /go/src/github.com/openshift/installer/cluster-api/bin/darwin_amd64/kube-apiserver: cannot execute binary file: Exec format error

The check should be skipped in those cases.
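
A minimal sketch of that skip, assuming the bin_dir variable already used by the build script (this is not the actual patch), is to compare the host and target platforms before executing the binary:

host="$(go env GOHOSTOS)_$(go env GOHOSTARCH)"
target="$(go env GOOS)_$(go env GOARCH)"
if [ "${host}" != "${target}" ]; then
  echo "cross-compiling for ${target}; skipping kube-apiserver execution check"
else
  "${bin_dir}/kube-apiserver" --version
fi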

Fit provisioning via the CAPI system into the infrastructure.Provider interface

so that:

  • code path to selecting an infrastructure provider is simplified
  • enabling/disabling infrastructure providers via feature gates is centralized in platform package

We want CAPI providers to be built and embedded in the installer binary when ART builds the installer with OSBS.

Currently CAPI providers are only built when the OPENSHIFT_INSTALL_CLUSTER_API env var is set for hack/build.sh. One way of resolving this would be to always build the providers, but gate the kas/etcd dependencies on the environment variable. Some prior art for that here: https://github.com/openshift/installer/pull/8273

The main issue with enabling CAPI provider builds will be the effect on build time, which is already long-running and somewhat unstable.

The 100.88.0.0/14 IPv4 subnet is currently reserved for the transit switch in OVN-Kubernetes for east-west traffic in the OVN Interconnect architecture. We need to make this value configurable so that users can avoid conflicts with their local infrastructure. We need to support this configuration both prior to installation and post installation (day 2).

This epic will include stories for the upstream ovn-org work, getting that work downstream, an api change, and a cno change to consume the new api

The scope of this card is to track the work to get the required pieces for the transit switch subnet into CNO, so that users can apply custom transit switch subnet configurations on both day 0 (install) and day 2 (post-install).

This card will complement https://issues.redhat.com/browse/SDN-4156 

You can create the cluster-bot cluster with Ben's PR and do CNO changes locally and test them out.
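
For illustration only, a day-2 change might look like the patch below; the internalTransitSwitchSubnet field name and the replacement CIDR are assumptions here, and the authoritative field comes from the API change tracked in this epic:

$ oc patch network.operator.openshift.io cluster --type=merge \
    -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalTransitSwitchSubnet":"100.69.0.0/16"}}}}}'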

Feature Overview

Allow setting custom tags to machines created during the installation of an OpenShift cluster on vSphere.

Why is this important

Just as labeling is important in Kubernetes for organizing API objects and compute workloads (pods/containers), the same is true for the Kube/OCP node VMs running on the underlying infrastructure in any hosted or cloud platform.

Reporting, auditing, troubleshooting and internal organization processes all require ways of easily filtering on and referencing servers by naming, labels or tags. Ensuring appropriate tagging is added to all OCP nodes in VMware ensures those troubleshooting, reporting or auditing can easily identify and filter Openshift node VMs.

For example we can use tags for prod vs. non-prod, VMs that should have backup snapshots vs. those that shouldn't, VMs that fall under certain regulatory constraints, etc.

Epic Goal

  • Customers need to assign additional tags to machines that are reconciled via the machine API.

Why is this important?

  • Just as labeling is important in Kubernetes for organizing API objects and compute workloads (pods/containers), the same is true for the Kube/OCP node VMs running on the underlying infrastructure in any hosted or cloud platform.

Reporting, auditing, troubleshooting and internal organization processes all require ways of easily filtering on and referencing servers by naming, labels or tags. Ensuring appropriate tagging is added to all OCP nodes in VMware ensures those troubleshooting, reporting or auditing can easily identify and filter Openshift node VMs.

For example we can use tags for prod vs. non-prod, VMs that should have backup snapshots vs. those that shouldn't, VMs that fall under certain regulatory constraints, etc.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)  

Add ControlPlaneMachineSet for vSphere

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Promote vSphere control plane machinesets from tech preview to GA

Why is this important?

Scenarios

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Promotion PRs collectively pass payload testing

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)

As an OpenShift administrator, when I run must-gather on a large cluster the output tends to be gigabytes in size, which fills up my master node. I want the ability to define a timestamp or duration for when the problem happened, so that the gathered logs are targeted and take less space.

Goals (aka. expected user outcomes)

have "--since and --until " option in mustgather 

 

Epic Goal*

As an OpenShift administrator, when I run must-gather on a large cluster the output tends to be gigabytes in size, which fills up my master node. I want the ability to define a timestamp or duration for when the problem happened, so that the gathered logs are targeted and take less space.

 
Why is this important? (mandatory)
Reduce must-gather size

 
Scenarios (mandatory) 

Must-gather running over a cluster with many logs can produce tens of GBs of data in cases where only a few MBs are needed. Such a huge must-gather archive takes too long to collect and too long to upload, which makes the process impractical.
 
Dependencies (internal and external) (mandatory)

The default must-gather images need to implement this functionality. Custom images will be asked to implement the same.

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - workloads team
  • Documentation - docs team
  • QE - workloads QE team
  • PX - 
  • Others -

Acceptance Criteria (optional)

Must-gather contains only logs from the requested time interval.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Must-gather currently does not allow a customer to limit the amount of data collected. This can lead to collecting tens of GBs even when only a limited set of data (e.g. for the last 2 days) is required, in some cases ending with a master node down.

Suggested solution:

  • pass envs such as MUST_GATHER_SINCE_TIME (Only return logs after a specific date (RFC3339)) or MUST_GATHER_SINCE (Only return logs newer than a relative duration like 5s, 2m, or 3h).

Acceptance criteria:

rotated-pod-logs does not interact with since/since-time. This causes inspect (and must-gather) outputs to considerably increase in size, even when users attempt to filter them by using time constraints.

Suggested solution:

  • parse the date from the log filename and only gather the logs when they are within the provided since or since-time flags

Acceptance criteria:

  • `oc adm inspect` --since and --since-time flags work as expected even when used in conjunction with --rotated-pod-logs
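
A hedged usage example of the flags referenced above (the resource argument is arbitrary):

$ oc adm inspect clusteroperators/kube-apiserver --since=2h --rotated-pod-logs

The must-gather side of the feature is expected to surface the same filtering through the MUST_GATHER_SINCE / MUST_GATHER_SINCE_TIME environment variables described in the previous card.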

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

We have decided to remove IPI and UPI support for Alibaba Cloud, which until recently has been in Tech Preview due to the following reasons:

(1) Low customer interest in using OpenShift on Alibaba Cloud

(2) Removal of Terraform usage

(3) MAPI to CAPI migration

(4) CAPI adoption for installation (Day 1) in OpenShift 4.16 and Day 2 (post-OpenShift 4.16)

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

  • We want to remove official UPI and IPI support for the Alibaba Cloud provider, which is in Tech Preview in OpenShift 4.15 and earlier. Going forward, we are recommending installations on Alibaba Cloud with either the external platform or the agnostic platform installation method.
  • There will be no upgrades from OpenShift 4.14 / OpenShift 4.15 to OpenShift 4.16.
  • For OpenShift 4.16, we will instead offer customers the ability to install OpenShift on Alibaba Cloud with the agnostic platform (platform=none) using the Assisted Installer as Tech Preview - see OCPSTRAT-1149 for details. Note: we cannot remove IPI (Tech Preview) install method until we provide the Assisted Installer (Tech Preview) solution.

 

Impacted areas based on CI:

  • alibaba-cloud-csi-driver/openshift-alibaba-cloud-csi-driver-release-4.16.yaml
  • alibaba-disk-csi-driver-operator/openshift-alibaba-disk-csi-driver-operator-release-4.16.yaml
  • cloud-provider-alibaba-cloud/openshift-cloud-provider-alibaba-cloud-release-4.16.yaml
  • cluster-api-provider-alibaba/openshift-cluster-api-provider-alibaba-release-4.16.yaml
  • cluster-cloud-controller-manager-operator/openshift-cluster-cloud-controller-manager-operator-release-4.16.yaml
  • machine-config-operator/openshift-machine-config-operator-release-4.16.yaml

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

  1. Update cloud.redhat.com (Hybrid Cloud Console) to remove Alibaba support.
  2. Update/removal from Alibaba IPI installer code.
  3. Update release notes with removal information and no updates/upgrades offered from OpenShift 4.14 or OpenShift 4.15.
  4. Remove Alibaba installation instructions from Openshift documentation including updating OpenShift Container Platform 4.x Tested Integrations.

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations (N/A = not applicable):

  • Self-managed, managed, or both: Self-managed
  • Classic (standalone cluster): Classic (standalone)
  • Hosted control planes: N/A
  • Multi node, Compact (three node), or Single node (SNO), or all: All
  • Connected / Restricted Network: All
  • Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x): All
  • Operator compatibility: N/A
  • Backport needed (list applicable versions): N/A
  • Other (please specify): N/A

 

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • Remove any Alibaba-related operator
  • Remove Alibaba-related driver
  • Remove Alibaba related code from CSO, CCM, and other repos
  • Remove container images
  • Remove related CI jobs
  • Remove documentation
  • Stop building images in ART pipeline
  • Archive github repositories

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context that is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

Epic Goal*

Per OCPSTRAT-1042, we are removing Alicloud IPI/UPI support in 4.16 and removing the code in 4.16. This epic tracks the necessary actions to remove the Alicloud Disk CSI driver.

 
Why is this important? (mandatory)

Since we are removing Alicloud as a supported provider, we need to clean up all the storage artifacts related to Alicloud.

  

Dependencies (internal and external) (mandatory)

Alicloud IPI/UPI support removal must be confirmed. IPI/UPI code should be removed.

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

 

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • Remove operator
  • Remove driver
  • Remove Alibaba related code from CSO.
  • Remove container images
  • Remove related CI jobs
  • Remove documentation
  • Stop building images in ART pipeline
  • Archive github repositories

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • We want to remove official UPI and IPI support for the Alibaba Cloud provider. Going forward, we are recommending installations on Alibaba Cloud with either the external platform or the agnostic platform installation method.

Why is this important?

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

We have decided to remove IPI and UPI support for Alibaba Cloud, which until recently has been in Tech Preview due to the following reasons:

(1) Low customer interest in using OpenShift on Alibaba Cloud

(2) Removal of Terraform usage

(3) MAPI to CAPI migration

(4) CAPI adoption for installation (Day 1) in OpenShift 4.16 and Day 2 (post-OpenShift 4.16)

Scenarios

Impacted areas based on CI:

alibaba-cloud-csi-driver/openshift-alibaba-cloud-csi-driver-release-4.16.yaml
alibaba-disk-csi-driver-operator/openshift-alibaba-disk-csi-driver-operator-release-4.16.yaml
cloud-provider-alibaba-cloud/openshift-cloud-provider-alibaba-cloud-release-4.16.yaml
cluster-api-provider-alibaba/openshift-cluster-api-provider-alibaba-release-4.16.yaml
cluster-cloud-controller-manager-operator/openshift-cluster-cloud-controller-manager-operator-release-4.16.yaml
machine-config-operator/openshift-machine-config-operator-release-4.16.yaml

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI jobs are removed
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)

Add support for snapshots into kubevirt-csi when the underlying infra csi driver supports snapshots.

Goals (aka. expected user outcomes)

  • Users of kubevirt-csi should be able to use csi snapshots within their HCP Kubevirt cluster if the underlying infra storageclass mapped to kubevirt-csi supports snapshots.

Requirements (aka. Acceptance Criteria):

  • snapshot support
  • downstream ci
  • documentation

 

Goal

Add support for snapshots into kubevirt-csi when the underlying infra csi driver supports snapshots.

User Stories

  • As an HCP KubeVirt cluster admin, I would like to be able to create snapshots if the underlying infra cluster CSI driver mapped to my kubevirt-csi storageclass supports snapshots.
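
A sketch of what the guest-cluster experience could look like; the snapshot class and PVC names are hypothetical, and the PVC is assumed to be bound through a kubevirt-csi storage class whose infra class supports snapshots:

$ oc apply -f - <<'EOF'
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snapshot
spec:
  volumeSnapshotClassName: kubevirt-csi-snapclass
  source:
    persistentVolumeClaimName: data-pvc
EOF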

Non-Requirements

  • Scale testing should be handled in the future after this lands as part of the ongoing HCP scale testing effort.

Notes

  • Any additional details or decisions made/needed

Done Checklist

  • DEV - Upstream roadmap issue (or individual upstream PRs): <link to GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR>
  • DEV - gap doc updated: <name sheet and cell>
  • DEV - Upgrade consideration: <link to upgrade-related test or design doc>
  • DEV - CEE/PX summary presentation: label epic with cee-training and add a <link to your support-facing preso>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

This is a new section in the "Configuring storage for HCP OpenShift Virtualization" section of the ACM docs. https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.10/html/clusters/cluster_mce_overview#configuring-storage-kubevirt

 

There's a new feature landing in 4.16 that gives HCP Openshift Virtualization's kubevirt-csi component the ability to perform csi snapshots. We need this feature to be documented downstream.

The downstream documentation should follow closely to upstream hypershift documentation that is being introduced in this issue, https://issues.redhat.com/browse/CNV-36075 

Feature Overview (aka. Goal Summary)

Streamline and secure CLI interactions, improve backend validation processes, and refine user guidance and documentation for the hcp command and related functionalities.

Use Cases :

  • Backend systems robustly handle validations, especially API-address fetching and logic previously managed by the CLI.
  • Users can use the --render command independently of cluster connections, suitable for initial configurations. 
  • Clear guidance and documentation are available for configuring API server addresses and understanding platform-specific behaviors.
    • Clarify which API server address is required for different platforms and configurations.
    • Ensure the correct selection of nodes for API server address derivation.

 

  • Improve the description of the --api-server-address flag; be concrete about which API server address is needed
  • Add a note (plus the above) to let folks know they need to set it if they don't want to be "connected" to a cluster.
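
For example, a disconnected invocation could pass the address explicitly; this is a hedged sketch in which the cluster name and address are placeholders and only the flags under discussion are shown:

$ hcp create cluster kubevirt \
    --name my-guest \
    --api-server-address api.mgmt.example.com \
    --render > hostedcluster.yaml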

Future:

  • Improve APIServerAddress Selection?
  • Add documentation about the restriction to have the management cluster being standalone

As a customer, I would like to deploy OpenShift on OpenStack using the IPI workflow, where my control plane would have 3 machines and each machine would use a root volume (a Cinder volume attached to the Nova server) and also an attached ephemeral disk using local storage that would only be used by etcd.

As this feature will be TechPreview in 4.15, this will only be implemented as a day 2 operation for now. This might or might not change in the future.

 

We know that etcd requires storage with strong performance capabilities, and currently a root volume backed by Ceph has difficulty providing these capabilities.

Attaching local storage to the machine as well and mounting it for etcd would solve the performance issues that we saw when customers were using Ceph as the backend for the control plane disks.

Gophercloud already supports creating a server with multiple ephemeral disks:

https://github.com/gophercloud/gophercloud/blob/master/openstack/compute/v2/extensions/bootfromvolume/doc.go#L103-L151

 

We need to figure out how we want to address that in CAPO, probably involving a new API, which would later be used in OpenShift (MAPO, and probably the installer).

We'll also have to update the OpenStack Failure Domain in CPMS.

 

ARO (Azure) has conducted some benchmarks and is now recommending putting etcd on a separate data disk:

https://docs.google.com/document/d/1O_k6_CUyiGAB_30LuJFI6Hl93oEoKQ07q1Y7N2cBJHE/edit

Also interesting thread: https://groups.google.com/u/0/a/redhat.com/g/aos-devel/c/CztJzGWdsSM/m/jsPKZHSRAwAJ

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Epic Goal

  • Log linking is a feature where a container can get access to the logs of other containers in the pod. To be usable in supported configurations, this feature has to be enabled by default in CRI-O.

Why is this important?

  • Allow this feature for most configurations without a support exception

Scenarios

  1. as a cluster admin, I would like to easily configure a log forwarder for containers in a pod

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)

Explore providing a consistent developer experience within OpenShift and Kubernetes via OpenShift console.

Goals (aka. expected user outcomes)

Enable running OpenShift console on Kubernetes clusters and explore the user experience that can be consistent across clusters.

Requirements (aka. Acceptance Criteria):

Install and run OpenShift console on Kubernetes

Enable selection of menu items in Dev and Admin perspective

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

Fully working OpenShift console on Kubernetes.

Background

 

Customer Considerations

Provide a simple way to install and access OpenShift console on Kubernetes.

Documentation Considerations

None

Interoperability Considerations

N/A

Goal:

Enable running OpenShift console on Kubernetes clusters and explore the user experience that can be consistent across clusters.

Explore providing a consistent developer experience within OpenShift and Kubernetes via OpenShift console.

Out of Scope: The OpenShift console is fully working on Kubernetes.

Acceptance criteria:

  1. Install and run the OpenShift console on Kubernetes (provide a helm or yaml setup)
  2. General console should work, and users should see the Dev and Admin perspective (fix obvious crashes and hide features that depend on OpenShift)
  3. Explore and extend (when possible) the setup without authentication with one that supports a user login
  4. List missing (hidden) or non-working features and dependencies
  5. Check how well our console works with just the upstream operators. Recommended order:
    1. Knative instead of OpenShift Serverless
    2. Tekton instead of OpenShift Pipelines
    3. Shipwright instead of OpenShift Builds
    4. Operator Lifecycle Manager (OLM)
    5. Web Terminal Operator
    6. Topology export with Primer operator

Description of problem:
When running the console against a non-OpenShift/non-OKD cluster, the admin perspective almost works fine, but the developer perspective has a lot of small issues.

Version-Release number of selected component (if applicable):
Almost all versions

How reproducible:
Always

Steps to Reproduce:
You need a non-OpenShift cluster. You can test this with any other Kubernetes distribution. I used kubeadm to create a local cluster, but it takes some time until I can start a Pod locally. (That's the precondition.)

You might want to test kind or k3s instead.

To run the console on your local machine against a non-OpenShift k8s cluster on your local machine you can use this script: https://github.com/jerolimov/openshift/commit/e6fe0924807017ff1320cfc8d82bde23c162eba3

Actual results:

  1. The Add page, Topology, and Search weren't shown in the navigation. The Add page is opened anyway when switching to the developer perspective.
  2. The Topology page and Search already worked fine and are available when changing the URL manually.
  3. The Add page quick search doesn't find anything.
  4. The Add page contains actions like Import from Git that don't work.
  5. The Namespace dropdown says "Namespaces" multiple times, but one "Projects" is left.
  6. The masthead navigation contains a Quick Start item that shows just a "No Quick Starts found" page.

Expected results:

  1. Add page, Topology and Search navigation items should be shown.
  2. Add page should not show quick search button if it doesn't find anything
  3. Add page should not show any broken actions (better is to fix them, but this might be more complex)
  4. The namespace dropdown should only show Namespace labels.
  5. The masthead navigation should not show a Quick Start item if the Quick Start CRD is not installed

Additional info:

Feature Overview (aka. Goal Summary)  

Graduate the new PV access mode ReadWriteOncePod to GA.

Such a PV/PVC can be used only by a single pod on a single node, compared to the traditional ReadWriteOnce access mode, where such a PV/PVC can be used on a single node by many pods.

Goals (aka. expected user outcomes)

The customers can start using the new ReadWriteOncePod access mode.

This new mode allows customers to provision and attach PV and get the guarantee that it cannot be attached to another local pod.
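
For example, a claim using the new access mode looks like this (the storage class is omitted, so the cluster default is assumed):

$ oc apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-claim
spec:
  accessModes:
  - ReadWriteOncePod
  resources:
    requests:
      storage: 1Gi
EOF

If a second pod references single-writer-claim while the first pod is still running, the second pod fails to schedule instead of the volume being shared.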

 

Requirements (aka. Acceptance Criteria):

This new mode should support the same operations as regular ReadWriteOnce PVs therefore it should pass the regression tests. We should also ensure that this PV can't be accessed by another local-to-node pod.

 

Use Cases (Optional):

As a user I want to attach a PV to a pod and ensure that it can't be accessed by another local pod.

Background

We are getting this feature from upstream as GA. We need to test it and fully support it.

Customer Considerations

 

Check that there is no limitations / regression.

Documentation Considerations

Remove tech preview warning. No additional change.

 

Interoperability Considerations

N/A

Epic Goal

Support upstream feature "New RWO access mode " in OCP as GA, i.e. test it and have docs for it.

This is a continuation of STOR-1171 (Beta/Tech Preview in 4.14); now we just need to mark it as GA and remove all Tech Preview notes from the docs.

Why is this important?

  • We get this upstream feature through Kubernetes rebase. We should ensure it works well in OCP and we have docs for it.

Upstream links

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. External: the feature is currently scheduled for GA in Kubernetes 1.29, i.e. OCP 4.16, but it may change before Kubernetes 1.29 GA.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview

Currently it's not possible to modify the BMC address of a BareMetalHost after it has been set.

Users need to be able to update the BMC entries in the BareMetalHost objects after they have been created.

Use Cases

There are at least two scenarios in which this causes problems:

1. When deploying baremetal clusters with the Assisted Installer, Metal3 is deployed and BareMetalHost objects are created but with empty BMC entries. We can modify these BMC entries after the cluster has come up to bring the nodes "under management", but if you make a mistake like I did with the URI then it's not possible to fix that mistake.

2. BMC addresses may change over time - whilst it may appear unlikely, network readdressing, or changes in DNS names may mean that BMC addresses will need to be updated.

Currently it's not possible to modify the BMC address of a BareMetalHost after it has been set. It's understood that this was an initial design choice, and there's a webhook that prevents any modification of the object once it has been set for the first time. There are at least two scenarios in which this causes problems:

1. When deploying baremetal clusters with the Assisted Installer, Metal3 is deployed and BareMetalHost objects are created but with empty BMC entries. We can modify these BMC entries after the cluster has come up to bring the nodes "under management", but if you make a mistake like I did with the URI then it's not possible to fix that mistake.

2. BMC addresses may change over time - whilst it may appear unlikely, network readdressing, or changes in DNS names may mean that BMC addresses will need to be updated.

Thanks!

Currently, the baremetal operator does not allow the BMC address of a node to be updated after BMH creation. This ability needs to be added in BMO.

Feature Overview (aka. Goal Summary)

 

Currently the maximum number of snapshots per volume in vSphere CSI is set to 3 and cannot be configured. Customers find this default limit too low and are asking us to make this setting configurable.

The maximum number of snapshots is 32 per volume.

Goals (aka. expected user outcomes)

Customers can override the default (three) value and set it to a custom value.

Make sure we document (or link) the VMware recommendations in terms of performance.

https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-E0B41C69-7EEB-450F-A73D-5FD2FF39E891.html#GUID-7BA0CDAE-E031-470E-A685-60C82DAE36D2__GUID-D9A97A90-2777-46EA-94EB-F04A27FBB76D

 

https://kb.vmware.com/s/article/1025279

Requirements (aka. Acceptance Criteria):

The setting can be easily configured by the OCP admin, and the configuration is automatically updated. Test that the setting is indeed applied and that the maximum number of snapshots per volume is indeed changed.

No change in the default
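
A hedged sketch of what overriding the default could look like; the globalMaxSnapshotsPerBlockVolume field name is an assumption pending the ClusterCSIDriver API extension described in the epic below:

$ oc patch clustercsidriver csi.vsphere.vmware.com --type=merge \
    -p '{"spec":{"driverConfig":{"driverType":"vSphere","vSphere":{"globalMaxSnapshotsPerBlockVolume":10}}}}'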

Use Cases (Optional):

As an OCP admin, I would like to change the maximum number of snapshots per volume.

Out of Scope

Anything outside of 

https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-E0B41C69-7EEB-450F-A73D-5FD2FF39E891.html#GUID-7BA0CDAE-E031-470E-A685-60C82DAE36D2__GUID-D9A97A90-2777-46EA-94EB-F04A27FBB76D

Background

The default value can't be overridden; reconciliation prevents it.

Customer Considerations

Make sure the customers understand the impact of increasing the number of snapshots per volume.

https://kb.vmware.com/s/article/1025279

Documentation Considerations

Document how to change the value as well as a link to the best practice. Mention that there is a 32 hard limit. Document other limitations if any.

Interoperability Considerations

N/A

Epic Goal*

The goal of this epic is to allow admins to configure the maximum number of snapshots per volume in vSphere CSI, and to find a way to add such an extension to the OCP API.

Possible future candidates:

  • configure EFS volume size monitoring (via driver cmdline arg.) - STOR-1422
  • configure OpenStack topology - RFE-11

 
Why is this important? (mandatory)

Currently the maximum number of snapshots per volume in vSphere CSI is set to 3 and cannot be configured. Customers find this default limit too low and are asking us to make this setting configurable.

The maximum number of snapshots is 32 per volume.

https://kb.vmware.com/s/article/1025279

https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-E0B41C69-7EEB-450F-A73D-5FD2FF39E891.html#GUID-7BA0CDAE-E031-470E-A685-60C82DAE36D2__GUID-D9A97A90-2777-46EA-94EB-F04A27FBB76D

 

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1. As an admin I would like to configure the maximum number of snapshots per volume.
  2. As a user I would like to create more than 3 snapshots per volume

 
Dependencies (internal and external) (mandatory)

1) Write OpenShift enhancement (STOR-1759)

2) Extend ClusterCSIDriver API (TechPreview) (STOR-1803)

3) Update vSphere operator to use the new snapshot options (STOR-1804)

4) Promote feature from Tech Preview to Accessible-by-default (STOR-1839)

  • prerequisite: add e2e test and demonstrate stability in CI (STOR-1838)

 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation - STOR
  • QE - STOR
  • PX - Enablement
  • Others -

Acceptance Criteria (optional)

Configure the maximum number of snapshots to a higher value. Check that the config has been updated and verify that the maximum number of snapshots per volume maps to the new setting value.

Drawbacks or Risk (optional)

Setting this option to a high value can introduce performance issues. This needs to be documented.

https://kb.vmware.com/s/article/1025279

 

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview (aka. Goal Summary)

 

To support volume provisioning and usage in multi-zonal clusters, the deployment should match certain requirements imposed by CSI driver - https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-4E5F8F65-8845-44EE-9485-426186A5E546.html

The requirements have slightly changed in 3.1.0 - https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-162E7582-723B-4A0F-A937-3ACE82EAFD31.html

We need to ensure that the cluster is compliant with the topology requirements and, if not, vsphere-problem-detector should detect the invalid configuration and create warnings and alerts.

A patch to the vSphere CSI driver was added to accept both the old and new tagging methods in order to avoid regressions. A warning is thrown if the old way is used.

Goals (aka. expected user outcomes)

Ensure that if the customer's configuration is not compliant, OCP raises a warning. The goal is to validate the customer's config. Improve VPD to detect these misconfigurations.

Ensure the driver keeps working with the old configuration. Raise a warning if customers are still using the old tagging method.

Requirements (aka. Acceptance Criteria):

This feature should be able to detect any anomalies when customers are configuring vSphere topology.

The driver should work with both the new and the old way of defining zones.

Out of Scope

This epic is not about testing topology which is already supported.

Background

The vSphere CSI driver changed the way tags are applied to nodes and clusters. This feature ensures that the customer's config matches what the driver expects.

Customer Considerations

This will help customers get the guarantee that their configuration is compliant, especially those who are used to the old way of configuring topology.

Documentation Considerations

Update the topology documentation to match the new driver requirements.
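
For reference, a zonal deployment declares its topology through failureDomains in the install-config, and the tagging expectations described above apply to the vSphere objects referenced there. A minimal sketch with placeholder values:

platform:
  vsphere:
    failureDomains:
    - name: zone-1a
      region: us-east
      zone: us-east-1a
      server: vcenter.example.com
      topology:
        datacenter: dc1
        # the CSI driver expects the zone/region tags on the compute cluster, not on hosts
        computeCluster: /dc1/host/zone-1a-cluster
        datastore: /dc1/datastore/ds1
        networks:
        - ci-segment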

Interoperability Considerations

OCP on vSphere

Epic Goal*

To support volume provisioning and usage in multi-zonal clusters, the deployment should match certain requirements imposed by CSI driver - https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-4E5F8F65-8845-44EE-9485-426186A5E546.html

The requirements have slightly changed in 3.1.0 - https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-162E7582-723B-4A0F-A937-3ACE82EAFD31.html

We need to ensure that the cluster is compliant with the topology requirements and, if not, vsphere-problem-detector should detect the invalid configuration and create warnings and alerts.

What is our purpose in implementing this?  What new capability will be available to customers?

 
Why is this important? (mandatory)

This is important because clusters could be misconfigured, and it would be tricky to determine whether volume provisioning is failing because of a misconfiguration or because of some other error. Having a way to validate the customer's topology will ensure that we have the right topology.

We already have checks in VPD, but we need to enhance those checks to ensure we are compliant.

 
Scenarios (mandatory)

In 4.15: make cluster Upgradeable=False when:

  • Customer has deployed OCP in multizonal clusters and has forgotten to tag compute clusters.
  • Customer has deployed OCP in multizonal clusters and has tagged compute clusters as well as hosts.

4.16: mark the cluster degraded in the conditions above.

These scenarios will result in invalid cluster configuration.
 
Dependencies (internal and external) (mandatory)

  • No dependencies on other teams.

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview (aka. Goal Summary)

Support the SMB CSI driver through an OLM operator as tech preview. The SMB CSI driver allows OCP to consume SMB/CIFS storage with a dynamic CSI driver. This enables customers to leverage their existing storage infrastructure with either a Samba or a Microsoft environment.

https://github.com/kubernetes-csi/csi-driver-smb

Goals (aka. expected user outcomes)

Customers can start testing connecting OCP to their backends exposing CIFS. This allows them to consume net-new volumes or existing data produced outside OCP.
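
As a rough sketch of the storage-class login/password authentication path, based on the upstream csi-driver-smb examples (the server, share, secret names, and namespace are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: smbcreds
  namespace: openshift-cluster-csi-drivers
stringData:
  username: "SMBUSER"
  password: "SMBPASSWORD"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: smb
provisioner: smb.csi.k8s.io
parameters:
  source: //smb-server.example.com/share
  csi.storage.k8s.io/provisioner-secret-name: smbcreds
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-cluster-csi-drivers
  csi.storage.k8s.io/node-stage-secret-name: smbcreds
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-cluster-csi-drivers
reclaimPolicy: Delete
volumeBindingMode: Immediate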

Requirements (aka. Acceptance Criteria):

The driver already exists and is under the storage SIG umbrella. We need to make sure the driver meets OCP quality requirements and, if so, develop an operator to deploy and maintain it.

Review and clearly define all driver limitations and corner cases.

Use Cases (Optional):

  • As an OCP admin, I want OCP to consume storage exposed via SMB/CIFS to capitalise on my existing infrastructure.
  • As a user, I want to consume external data stored on an SMB/CIFS backend.

Questions to Answer (Optional):

Review the different authentication methods.

Out of Scope

Windows containers support.

Authentication methods other than the storage class login/password method; those can be reviewed and considered for GA.

Background

Customers are expecting to consume storage and possibly existing data via SMB/CIFS. As of today, vendors' driver support for CIFS is quite limited, whereas this protocol is widely used on premises, especially by MS/AD customers.

Customer Considerations

Need to understand what customers expect in terms of authentication.

How to extend this feature to windows containers.

Documentation Considerations

Document the operator and driver installation, usage capabilities and limitations.

Interoperability Considerations

Future: How to manage interoperability with windows containers (not for TP)

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Feature Overview:

Hypershift-provisioned clusters, regardless of the cloud provider, support the proposed OLM-managed integration outlined in OCPBU-559 and OCPBU-560.

 

Goals 

There is no degradation in capability or coverage for OLM-managed operators supporting short-lived token authentication on clusters that are lifecycled via Hypershift.

 

Requirements:

  • the flows in OCPBU-559 and OCPBU-560 need to work unchanged on Hypershift-managed clusters
  • most likely this means that Hypershift needs to adopt the CloudCredentialOperator
  • all operators enabled as part of OCPBU-563, OCPBU-564, OCPBU-566 and OCPBU-568 need to be able to leverage short-lived authentication on Hypershift-managed clusters without being aware that they are on Hypershift-managed clusters
  • also OCPBU-569 and OCPBU-570 should be achievable on Hypershift-managed clusters

 

Background

Currently, Hypershift lacks support for CCO.

Customer Considerations

Currently, Hypershift will be limited to deploying clusters in which the cluster core operators are leveraging short-lived token authentication exclusively.

Documentation Considerations

If we are successful, no special documentation should be needed for this.

 

Outcome Overview

Operators on guest clusters can take advantage of the new tokenized authentication workflow that depends on CCO.

Success Criteria

CCO is included in HyperShift and its footprint is minimal while meeting the above outcome.

 

Expected Results (what, how, when)

 

 

Post Completion Review – Actual Results

After completing the work (as determined by the "when" in Expected Results above), list the actual results observed / measured during Post Completion review(s).

 

Feature Overview (aka. Goal Summary)  

Provide a way to automatically recover a cluster with expired etcd server and peer certificates.

Goals (aka. expected user outcomes)

 

A cluster has etcd serving, peer, and serving-metrics certificates that are expired. There should be a way to either trigger certificate rotation or have a process that automatically does the rotation.

Requirements (aka. Acceptance Criteria):

Deliver rotation and recovery requirements from OCPSTRAT-714 

 

Epic Goal*

Provide a way to automatically recover a cluster with expired etcd server and peer certs

 
Why is this important? (mandatory)

Currently, the EtcdCertSigner controller, which is part of the CEO, renews the aforementioned certificates roughly every 3 years. However, if the cluster is offline for a period longer than the certificate's validity, upon restarting the cluster, the controller won't be able to renew the certificates since the operator won't be running at all.

We have scenarios where the customer, partner, or service delivery needs to recover a cluster that is offline, suspended, or shutdown, and as part of the process requires a supported way to force certificate and key rotation or replacement.

See the following doc for more use cases of when such clusters need to be recovered:
https://docs.google.com/document/d/198C4xwi5td_V-yS6w-VtwJtudHONq0tbEmjknfccyR0/edit

Required to enable emergency certificate rotation.
https://issues.redhat.com/browse/API-1613
https://issues.redhat.com/browse/API-1603

 
Scenarios (mandatory) 

A cluster has etcd serving, peer and serving-metrics certificates that are expired. There should be a way to either trigger certificate rotation or have a process that automatically does the rotation.
This does not cover the expiration of etcd-signer certificates at this time.
That will be covered under https://issues.redhat.com/browse/ETCD-445

 
Dependencies (internal and external) (mandatory)

While the etcd team will implement the automatic recovery for the etcd certificates, other control-plane operators will be handling their own certificate recovery.

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - etcd team
  • Documentation - etcd docs team
  • QE - etcd qe
  • PX - 
  • Others -

Acceptance Criteria (optional)

When an OpenShift etcd cluster that has expired etcd server and peer certs is restarted, it is able to regenerate those certs.

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing -  Having an e2e test that puts a cluster into the expired certs failure mode and forces it to recover.
  • Documentation - Docs that explain the cert recovery procedure
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Given the scope creep of the work required to enable an offline cert rotation (or an automated restore), we are going to rely on online cert rotation to ensure that etcd certs don't expire during a cluster shutdown/hibernation.

Slack thread for background:
https://redhat-internal.slack.com/archives/C851TKLLQ/p1712533437483709?thread_ts=1712526244.614259&cid=C851TKLLQ

The estimated maximum shutdown period is 9 months. The refresh rate for the etcd certs can be increased so that there are always, e.g., 10 months left on the cert validity in the worst case, i.e., we shut down right before the controller does its rotation.

Feature Overview (aka. Goal Summary)  

The etcd CA must be rotatable both on demand and automatically when expiry approaches.

Goals (aka. expected user outcomes)

 

  • Have a tested path for customers to rotate certs manually
  • We must have a tested path for auto rotation of certificates when certs need rotation due to age

 

Requirements (aka. Acceptance Criteria):

Deliver rotation and recovery requirements from OCPSTRAT-714 

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Refactoring in ETCD-512 creates a rotation due to incompatible certificate creation processes. We should update the render [1] the same way the controller manages the certificates. Keep in mind that important information is always stored in annotations, which means we also need to update the manifest template itself (just exchanging file bytes isn't enough).

AC:

  • CEO should render the same certificates it would otherwise when the refactored CertSignerController runs
  • test that a fresh installation avoids re-creating certificates after the bootstrap phase

[1] https://github.com/openshift/cluster-etcd-operator/blob/master/pkg/cmd/render/render.go#L347-L365

This spike explores using the library-go cert rotation utils in the etcd-operator to replace or augment the existing etcdcertsigner controller.

https://github.com/openshift/library-go/blob/master/pkg/operator/certrotation/client_cert_rotation_controller.go
https://github.com/openshift/cluster-etcd-operator/pull/1177

The goal of this spike is to evaluate if the library-go cert rotation util gives us rotation capabilities for the signer cert along with the peer and server certs.

There are a couple of issues to explore with the use of the library-go cert signer controller:

  • The etcd cluster is currently configured with a single CA for etcd's peer and server certs, whereas the library-go controller would require using different CAs for the peer and server certs.
  • We also need to consider how upgrades would be handled, i.e if we change to using two new CAs, would our new certsignercontroller handle that transparently?

Given the manual rotation of signers for 4.16, we should add an alert that proactively tells customers to run the manual rotation procedure.

AC:

  • Add a metric that denotes how many days the signer certs are still valid
  • Add an alert over that metric (e.g. 300 days before expiry); see the sketch below
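
A rough sketch of what the alerting rule could look like, assuming a hypothetical metric name such as openshift_etcd_operator_signer_ca_expiry_days exported by the operator (the real metric is what this card adds):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: etcd-signer-ca-expiry
  namespace: openshift-etcd-operator
spec:
  groups:
  - name: etcd-certificates
    rules:
    - alert: EtcdSignerCAExpirationApproaching
      # metric name is hypothetical; see the AC above for the intended semantics
      expr: openshift_etcd_operator_signer_ca_expiry_days < 300
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: The etcd signer CA expires in less than 300 days; run the manual signer rotation procedure.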

 

After merging ETCD-512, we need to ensure the certs are regenerated when the signer changes.

Current logic in library-go only changes when the bundle is updated, which is not a sufficient criterion for the etcd rotation.

Some initial take: https://github.com/openshift/library-go/pull/1674

discussion in: https://redhat-internal.slack.com/archives/CC3CZCQHM/p1706889759638639

 

AC:

 

Refactoring in ETCD-512 does not clean up certificates that are dynamically generated. Imagine you're recreating all your master nodes every day: we would create new peer/serving/metrics certificates for each node and never clean them up.

We should try to be conservative when cleaning them up, so keep them around for a certain retention period (7-10 days?) after the node goes away.

AC:

  • CEO should clean up old-enough "node" certificates

Testing in ETCD-512 revealed that CEO does not react to changes in the CA bundle or the client certificates.

The current mounts are defined here:
https://github.com/openshift/cluster-etcd-operator/blob/60b7665a26610a095722d3b12b2bb08dcae6965f/manifests/0000_20_etcd-operator_06_deployment.yaml#L90-L106

A simple fix would be to watch the respective resources in a controller and exit the container on changes. This is how we did it with feature gates as well: (https://github.com/openshift/cluster-etcd-operator/blob/60b7665a26610a095722d3b12b2bb08dcae6965f/pkg/operator/starter.go#L174C1-L174C1)

If hot-reload is feasible we should take a look at it, but it seems like a larger refactoring.

AC:

  • CEO needs to react (restart) when it detects changes in certificate related secrets
  • add an e2e testcase for it

To make better decisions on revision rollouts and on when to re-issue leaf certificates, we should store the revision at which a given CA was rotated.

 

AC:

  • when a new CA is generated, store the static pod revision in the operator status

Feature Overview (aka. Goal Summary)  

As cluster admin I would like to configure machinesets to allocate instances from pre-existing Capacity Reservation in Azure.
I want to create a pool of reserved resources that can be shared between clusters of different teams based on their priorities. I want this pool of resources to remain available for my company and not get allocated to another Azure customer.

https://docs.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-associate-vm?tabs=api1%2Capi2%2Capi3

Additional background on the feature for considering additional use cases

https://techcommunity.microsoft.com/t5/azure-compute-blog/guarantee-capacity-access-with-on-demand-capacity-reservations/ba-p/3269202

 

 

  1. Proposed title of this feature request

Machine API support for Azure Capacity Reservation Groups

  2. What is the nature and description of the request?

The customer would like to configure machinesets to allocate instances from pre-existing Capacity Reservation Groups, see Azure docs below

  3. Why does the customer need this? (List the business requirements here)

This would allow the customer to create a pool of reserved resources which can be shared between clusters of different priorities. Imagine a test and prod cluster where the demands of the prod cluster suddenly grow. The test cluster is scaled down freeing resources and the prod cluster is scaled up with assurances that those resources remain available, not allocated to another Azure customer.

  4. List any affected packages or components.

MAPI/CAPI Azure

In this use case, there's no immediate need for install-time support to designate a reserved capacity group for control plane resources; however, we should consider whether that's desirable from a completeness standpoint. We should also consider whether this should be added as an attribute of the install-config compute machine pool or whether altering generated MachineSet manifests is sufficient; this appears to be a relatively new Azure feature which may or may not see wider customer demand. This customer's primary use case is centered around scaling existing clusters up and down; however, others may have different uses for this feature.

https://docs.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-associate-vm?tabs=api1%2Capi2%2Capi3

Additional background on the feature for considering additional use cases

https://techcommunity.microsoft.com/t5/azure-compute-blog/guarantee-capacity-access-with-on-demand-capacity-reservations/ba-p/3269202
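
If the MachineSet manifest route is taken, the provider spec would presumably carry the reservation group reference. A minimal sketch; the capacityReservationGroupID field name, the IDs, and the other values are illustrative, not the final API:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: mycluster-worker-reserved
  namespace: openshift-machine-api
spec:
  template:
    spec:
      providerSpec:
        value:
          apiVersion: machine.openshift.io/v1beta1
          kind: AzureMachineProviderSpec
          vmSize: Standard_D4s_v3
          # illustrative field added by this feature; points at a pre-existing Capacity Reservation Group
          capacityReservationGroupID: /subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Compute/capacityReservationGroups/<group-name>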

User Story

As a developer I want to add support of capacity reservation group in openshift/machine-api-provider-azure so that azure VMs can be associated to a capacity reservation group during the VM creation.

Background

CFE-1036 adds support for Capacity Reservation in upstream CAPZ (PR). The same support needs to be added downstream as well. Please refer to the upstream PR when adding support downstream.

Steps

  • Import the latest API changes from openshift/api to get the new field "CapacityReservationGroupID" into openshift/machine-api-provider-azure.
  • If a value is assigned to the field, use it to associate the VM with the capacity reservation group during VM creation.

Stakeholders

  • Cluster Infra
  • CFE

Definition of Done

  • The PR should be reviewed and approved.
  • Testing
  • Add unit tests to validate the implementation.

As a developer I want to add the webhook validation for the "CapacityReservationGroupID" field of "AzureMachineProviderSpec" in openshift/machine-api-operator so that Azure capacity reservation can be supported.

Background

CFE-1036 adds support for Capacity Reservation in upstream CAPZ (PR). The same support needs to be added downstream as well. Please refer to the upstream PR when adding support downstream.

Slack discussion regarding the same: https://redhat-internal.slack.com/archives/CBZHF4DHC/p1713249202780119?thread_ts=1712582367.529309&cid=CBZHF4DHC_

Steps

  • Add the validation for "CapacityReservationGroupID" to "AzureMachineProviderSpec"
  • Add tests to validate.

Stakeholders

  • Cluster Infra
  • CFE

User Story

As a developer I want to add the field "CapacityReservationGroupID" to "AzureMachineProviderSpec" in openshift/api so that Azure capacity reservation can be supported.

Background

CFE-1036 adds support for Capacity Reservation in upstream CAPZ (PR). The same support needs to be added downstream as well. Please refer to the upstream PR when adding support downstream.

Slack discussion regarding the same: https://redhat-internal.slack.com/archives/CBZHF4DHC/p1713249202780119?thread_ts=1712582367.529309&cid=CBZHF4DHC_

Steps

  • Add the field "CapacityReservationGroupID" to "AzureMachineProviderSpec"
  • The new field should be immutable. Add validations for the same.
  • Add tests to validate the immutability.

Stakeholders

  • Cluster Infra
  • CFE

Definition of Done

  • The PR should be reviewed and approved.
  • Docs
  • Add appropriate godoc for the field explaining its purpose
  • Testing
  • Add tests to validate the immutability.

Market Problem

  • As an OpenShift cluster administrator for a self-managed standalone OpenShift cluster, I want to use the Priority based expander for cluster-autoscaler to select instance types based on priorities assigned by a user to scaling groups.
    The Configuration is based on the values stored in a ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10: 
      - .*t2\.large.*
      - .*t3\.large.*
    50: 
      - .*m4\.4xlarge.*

Expected Outcomes

  • When creating a MachineSet, customers would like to define their preferred instance types with priorities so that if one is not available a fallback instance type option is available down the list.

https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/expander/priority/readme.md

Epic Goal

  • Allow users to choose the "priority" and "least-waste" expanders when creating a ClusterAutoscaler resource.

Why is this important?

  • The priority expander gives users the capability to instruct the cluster autoscaler about preferred node groups that it should expand. In this manner, a user can dictate to the cluster autoscaler which MachineSets should take priority when scaling up the cluster.
  • The least-waste expander gives users the capability to instruct the cluster autoscaler to choose instance sizes which will have the least amount of cpu and memory resource wastage when choosing instances for pending pods. It is a great supplement to the priority expander, giving a backup option when multiple instance sizes might work.

Scenarios

  1. As a user I would like the cluster autoscaler to prefer MachineSets that use spot instances as it will lower my costs. By using the priority expander, I can indicate to the cluster autoscaler which MachineSets should be utilized first when scaling up the cluster.
  2. As a user I want the autoscaler to prefer instance sizes which produce the least amount of wasted cpu and memory resources. By using the least-waste expander I can instruct the autoscaler to make this decision.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • New e2e tests added to exercise this functionality

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

  1. Is there any automation we want to add to the CAO to make this easier for users? (eg a new field on MachineAutoscaler to indicate a priority)

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

As a user, I would like to specify different expanders for the cluster autoscaler. Having an API field on the ClusterAutoscaler resource to specify the expanders and their precedence will solve this issue for me.

Background

The cluster autoscaler allows users to specify a list of expanders to use when creating new nodes. This list is expressed as a command line flag that takes multiple comma-separated options, eg "--expander=priority,least-waste,random".

We need to add a new field to the ClusterAutoscaler resource that allows users to specify an ordered list of expanders to use. We should limit values in that list to "priority", "least-waste", and "random" only. We should limit the length of the list to 3 items.
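
A minimal sketch of the proposed API surface, assuming the new field is named expanders and accepts the ordered values described above (the exact field name and value casing are assumptions):

apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  # ordered list; later expanders refine the choices left by earlier ones
  expanders:
  - Priority
  - LeastWaste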

Steps

  • add a new API field for expander options
  • write unit tests to confirm behavior in CAO

Stakeholders

  • openshift eng

Definition of Done

  • user can express priority, least-waste, and random expanders through the ClusterAutoscaler resource
  • Docs
  • need docs on the expanders, what they are, how to select them
  • need docs describing how to configure the priority expander
  • Testing
  • we will add unit tests in this card, e2e tests will follow in another card

Feature Overview (aka. Goal Summary)  

Due to historical reasons, the etcd operator uses a static definition of an 8GB limit. Nowadays, customers with dense cluster configurations regularly reach the limit. OpenShift should provide a mechanism for increasing the database size while maintaining a validated configuration.

Goals (aka. expected user outcomes)

This feature aims to provide validated selectable sizes for the etcd database, allowing cluster admins to opt-in for larger sizes.

 

Requirements (aka. Acceptance Criteria):

Since using larger etcd database sizes may impact the defragmentation process, causing more noticeable transaction "pauses", this should be an opt-in configuration.

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both Yes
Classic (standalone cluster) Yes
Hosted control planes No
Multi node, Compact (three node), or Single node (SNO), or all Yes
Connected / Restricted Network Yes
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) Yes
Operator compatibility N/A
Backport needed (list applicable versions) N/A
UI need (e.g. OpenShift Console, dynamic plugin, OCM) No
Other (please specify) N/A

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

Epic Goal*

Provide a way to change the etcd database size limit from the default 8GB which is non-configurable.
https://etcd.io/docs/v3.5/dev-guide/limit/#storage-size-limit

https://github.com/openshift/cluster-etcd-operator/blob/dbfa8262a0a48846e8519425a937a7d4e9fd52e0/pkg/etcdenvvar/etcd_env.go#L42

This will likely be done through the API as a new field in the `cluster` `etcds.operator.openshift.io` CustomResource object. Similar to the etcd latency tuning profiles, which allow a selectable set of configurations, this limit should also be bound within reasonable limits or levels.
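
A sketch of what the opt-in could look like on the cluster etcd operator resource; the field name and value here are assumptions, not the final API:

apiVersion: operator.openshift.io/v1
kind: Etcd
metadata:
  name: cluster
spec:
  # assumed field name; bounded to validated sizes rather than free-form values
  backendQuotaGiB: 16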

 
Why is this important? (mandatory)

Due to historical reasons, the etcd operator uses a static definition of an 8GB limit. Nowadays, customers with dense cluster configurations regularly reach the limit. OpenShift should provide a mechanism for increasing the database size while maintaining a validated configuration.

The 8GB limit was due to historical limitations of the bbolt store, which have been improved upon for a while, and there are examples and discussions upstream suggesting that the quota limit can be set higher.

See:

 

Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1. As an admin I can increase the etcd database size via the openshift etcd API

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - etcd team
  • Documentation - etcd docs team
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Include the version of the oc binary used, and the logs generated by the command into the must-gather directory when running the oc adm must-gather command.

  • The version of the oc binary can help to identify if an issue could be caused because the oc version is different than the version of the cluster.
  • The logs generated by the oc adm must-gather command will help to identify if some information could not be collected, and also the exact image used (while the image used is currently in the directory name, I think it could be better to use a short directory name, especially for customers running oc on Windows, as the overly long directory name can cause issues there).

The version of the oc binary could be included in the oc adm must-gather output [1], and if it's 2 or more minor versions away from the running cluster, a warning should be shown.
 

1. Proposed title of this feature request

Include additional info into must-gather directory

2. What is the nature and description of the request?

Include the version of the oc binary used, and the logs generated by the command into the must-gather directory when running the oc adm must-gather command.

The version of the oc binary can help to identify if an issue could be caused because the oc version is different than the version of the cluster.
The logs generated by the oc adm must-gather command will help to identify if some information could not be collected, and also the exact image used

4. List any affected packages or components.

oc

[1] https://github.com/openshift/oc/blob/1d4a9afe8cf066b4c34018b3e0f4919d0157f091/pkg/cli/admin/mustgather/summary.go#L16-L45

Include logs generated by the command into the must-gather directory when running the oc adm must-gather command.

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

The Azure File CSI driver currently lacks cloning and snapshot restore features. The goal of this feature is to support the cloning feature as Technology Preview. This will help support snapshot restore in a future release.

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

As a user I want to easily clone an Azure File volume by creating a new PVC with spec.dataSource referencing the origin volume.
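
For illustration, cloning follows the standard CSI dataSource pattern; a minimal sketch (the PVC names and the storage class name are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: azurefile-csi
  resources:
    requests:
      storage: 10Gi
  dataSource:
    # reference the origin PVC in the same namespace
    kind: PersistentVolumeClaim
    name: source-pvc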

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

This feature only applies to OCP running on Azure / ARO and File CSI.

The usual CSI cloning CI must pass.

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both both
Classic (standalone cluster) yes
Hosted control planes yes
Multi node, Compact (three node), or Single node (SNO), or all all although SNO is rare on Azure
Connected / Restricted Network both
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) x86
Operator compatibility Azure File CSI operator
Backport needed (list applicable versions) No
UI need (e.g. OpenShift Console, dynamic plugin, OCM) No
Other (please specify) ship downstream images with azcopy from the forked repo

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

Restoring snapshots is out of scope for now.

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Update the CSI capability matrix and any language that mentions that Azure File CSI does not support cloning.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Not impact but benefit Azure / ARO customers.

Epic Goal*

Azure File added support for cloning volumes, which relies on the azcopy command upstream. We need to fork azcopy so we can build and ship downstream images with azcopy from the forked repo. The AWS driver does the same with efs-utils.

Upstream repo: https://github.com/Azure/azure-storage-azcopy

NOTE: using snapshots as a source is currently not supported: https://github.com/kubernetes-sigs/azurefile-csi-driver/blob/7591a06f5f209e4ef780259c1631608b333f2c20/pkg/azurefile/controllerserver.go#L732 

 

Why is this important? (mandatory)

This is required for adding Azure File cloning feature support.

 

Scenarios (mandatory) 

1. As a user I want to easily clone an Azure File volume by creating a new PVC with spec.dataSource referencing the origin volume.

 
Dependencies (internal and external) (mandatory)

1) Write OpenShift enhancement (STOR-1757)

2) Fork upstream repo (STOR-1716)

3) Add ART definition for OCP Component (STOR-1755)

  • prerequisite: Onboard image with DPTP/CI (STOR-1752)
  • prerequisite: Perform a threat model assessment (STOR-1753)
  • prerequisite: Establish common understanding with Product Management / Docs / QE / Product Support (STOR-1753)
  • requirement: ProdSec Review (STOR-1756)

4) Use the new image as base image for Azure File driver (STOR-1794)

5) Ensure e2e cloning tests are in CI (STOR-1818)

 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - yes
  • Documentation - yes
  • QE - yes
  • PX - ???
  • Others - ART

 

Acceptance Criteria (optional)

Downstream Azure File driver image must include azcopy and cloning feature must be tested.

 

Drawbacks or Risk (optional)

No risks detected so far.

 

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be "Release Pending" 

Feature Overview (aka. Goal Summary)  

Add support for standalone secondary networks for HCP kubevirt.

Advanced multus integration involves the following scenarios

1. Secondary network as single interface for VM
2. Multiple Secondary Networks as multiple interfaces for VM

Goals (aka. expected user outcomes)

Users of HCP KubeVirt should be able to create a guest cluster that is completely isolated on a secondary network outside of the default pod network. 

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

<enter general Feature acceptance here>

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both self-managed
Classic (standalone cluster) na
Hosted control planes yes
Multi node, Compact (three node), or Single node (SNO), or all na
Connected / Restricted Network yes
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) x86
Operator compatibility na
Backport needed (list applicable versions) na
UI need (e.g. OpenShift Console, dynamic plugin, OCM) na
Other (please specify) na

Documentation Considerations

ACM documentation should include how to configure secondary standalone networks.

 

This is a continuation of CNV-33392.

Multus Integration for HCP KubeVirt has three scenarios.

1. Secondary network as single interface for VM
2. Multiple Secondary Networks as multiple interfaces for VM
3. Secondary network + pod network (default for kubelet) as multiple interfaces for VM

Item 3 is the simplest use case because it does not require any additional considerations for ingress and load balancing. This scenario [item 3] is covered by CNV-33392.

Items [1,2] are what this epic is tracking, which we are considering advanced use cases.

Now that the HyperShift KubeVirt provider has a way to expose secondary-network services by generating EndpointSlices, we should document it.

Enabling the --attach-default-network option in the "hcp" command-line tool is also needed.
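
A rough sketch of what a NodePool attached only to a secondary network might look like; the attachDefaultNetwork and additionalNetworks field names are assumptions based on the upstream HyperShift KubeVirt platform API, and the manifest is abbreviated:

apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: example
  namespace: clusters
spec:
  clusterName: example
  replicas: 2
  platform:
    type: KubeVirt
    kubevirt:
      # assumed field names: detach the default pod network and use only the secondary network
      attachDefaultNetwork: false
      additionalNetworks:
      - name: my-namespace/my-net-attach-def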

When creating a cluster with a secondary network, ingress is broken since the created service cannot reach the VMs' secondary addresses.

We need to create a controller that manually creates and updates the service endpoints so they always point to the VMs' IPs.

When no default pod network is used, we need the LB mirroring that cloud-provider-kubevirt performs to create custom endpoints that map to the secondary network. This will allow the LB service to route to the secondary network. Otherwise, the LB service will not be able to pass traffic if the pod network interface is not attached to the VM.

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

Workload partitioning currently does not support mutating containers that have cpu limits. That can cause platform/control plane components to use the wrong core budget (application instead of platform).

Goals (aka. expected user outcomes)

Apply the workload partitioning also to containers that have cpu limits.

Requirements (aka. Acceptance Criteria):

1) Add another annotation to the crio drop in to represent the limit, "cpuquota"

[crio.runtime.workloads.management]
activation_annotation = "target.workload.openshift.io/management"
annotation_prefix = "resources.workload.openshift.io"
resources = { "cpushares" = 0,  "cpuquota"=0, "cpuset" = "0-1,52-53" }

2) Modify the admission plugin to take cpu limits and add a cpuquota annotation. The cpu limits would be stripped. Either add a new annotation or extend the existing one, i.e.:

annotations:
    resources.workload.openshift.io/foo: {"cpushares": 20, "cpuquota": 50}

3) Modify crio to set cfs.quota for the container to the value of cpuquota
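
To illustrate the intended end-to-end behaviour, a sketch of a management pod before admission (with a cpu limit) and the annotation the plugin would produce; the pod name, namespace, image, and values are illustrative:

# Pod as submitted (cpu request and limit set):
apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: openshift-example
  annotations:
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
spec:
  containers:
  - name: foo
    image: registry.example.com/foo:latest
    resources:
      requests:
        cpu: 20m
      limits:
        cpu: 50m

# After admission, the cpu request and limit would be stripped and replaced with:
#   annotations:
#     resources.workload.openshift.io/foo: '{"cpushares": 20, "cpuquota": 50}'
# crio then applies the configured cpuset and sets the cfs quota from cpuquota.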

 

Questions to Answer (Optional):

  1. Any consideration needed for upgrades from previous versions?

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

n/a

Background

The original premise was that all OCP pods did not set limits; however, it has been found that at least one klusterlet container does set limits. This will be worked around in 4.8, but in 4.9 a proper solution is required to deal with these exceptions. In addition, the desire in the future is to support different workload types, which would require limit support.

Customer Considerations

Requested by telco customers

 

Documentation Considerations

No docs changed expected

Interoperability Considerations

No impact on other projects expected

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

Workload partitioning currently does not support mutating containers that have cpu limits. The workload partitioning admission plugin will skip over those containers.

The original premise was that all OCP pods did not set limits; however, it has been found that at least one klusterlet container does set limits. This will be worked around in 4.8, but in 4.9 a proper solution is required to deal with these exceptions. In addition, the desire in the future is to support different workload types, which would require limit support.

Summary of changes: 

1) Add another annotation to the crio drop in to represent the limit, "cpuquota"

[crio.runtime.workloads.management]
activation_annotation = "target.workload.openshift.io/management"
annotation_prefix = "resources.workload.openshift.io"
resources = { "cpushares" = 0,  "cpuquota"=0, "cpuset" = "0-1,52-53" }

2) Modify the admission plugin to take cpu limits and add a cpuquota annotation. The cpu limits would be stripped. Either add a new annotation or extend the existing one, i.e.:

annotations:
    resources.workload.openshift.io/foo: {"cpushares": 20, "cpuquota": 50}

3) Modify crio to set cfs.quota for the container to the value of cpuquota

Assumption: 

4.8 -> 4.9 SNO upgrades are not supported; therefore the upgrade scenario will not have to be dealt with.

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • All SNO CI jobs must be at 95% pass rate or better.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1.  

Open questions:

  1.  

Done Checklist

  • CI - CI is running, tests are automated and merged.

 

 

 

 

Feature Overview (aka. Goal Summary)  

The default OpenShift installation on AWS uses multiple IPv4 public IPs which Amazon will start charging for starting in February 2024. As a result, there is a requirement to find an alternative path for OpenShift to reduce the overall cost of a public cluster while this is deployed on AWS public cloud.

Goals (aka. expected user outcomes)

Provide an alternative path to reduce the new costs associated with public IPv4 addresses when deploying OpenShift on AWS public cloud.

Requirements (aka. Acceptance Criteria):

There is a new path for "external" OpenShift deployments on AWS public cloud where the new costs associated with public IPv4 addresses have a minimal impact on the total cost of the required infrastructure on AWS.
 

Background

Ongoing discussions on this topic are happening in Slack in the #wg-aws-ipv4-cost-mitigation private channel

Documentation Considerations

Usual documentation will be required in case there are any new user-facing options available as a result of this feature.

Epic Goal

  • Implement support in the install config to receive a Public IPv4 Pool ID and create resources* which consume public IPs when the publish strategy is "External".
  • The implementation must cover the installer changes in Terraform to provision the infrastructure

*Resources which consume public IPv4 addresses: bootstrap, API public NLB, NAT gateways
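
A minimal sketch of the expected install-config surface, assuming the pool is supplied as a platform.aws field (the field name and pool ID are illustrative):

apiVersion: v1
baseDomain: example.com
metadata:
  name: mycluster
publish: External
platform:
  aws:
    region: us-east-1
    # BYO public IPv4 pool; ID format is a placeholder
    publicIpv4Pool: ipv4pool-ec2-0123456789abcdef0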

Why is this important?

  • The default OpenShift installation on AWS uses multiple IPv4 public IPs which Amazon will start charging for starting in February 2024. 

Scenarios

  1. As a customer with BYO Public IPv4 pools in AWS, I would like to install OpenShift clusters on AWS consuming public IPs from my own CIDR blocks, so I can have control of the IPs used by the services I provide and will not be impacted by AWS public IPv4 charges
  2.  

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

  1. Will the terraform implementation be backported to supported releases to be used in CI for previous e2e test infra?
  2. Is there a method to use the pool by default for all Public IPv4 claims from a given VPC/workload? So the implementation doesn't need to create EIP and associations for each resource and subnet/zone.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

USER STORY:

  • As a customer with custom Public IPv4 blocks presented in AWS, I would like to install OpenShift clusters on AWS with publish strategy Public consuming public IPv4 blocks from my own pool, so I will not be impacted by additional Public IPv4 charges
  •  

DESCRIPTION:

<!--

Provide as many details as possible, so that any team member can pick it up
and start to work on it immediately without having to reach out to you.

-->

Required:

  • e2e workflow with a presubmit job testing the new configuration
  •  

Nice to have:

...

ACCEPTANCE CRITERIA:

  • Installer PR reviewed, accepted and merged
  • e2e step testing the PR
    • Presubmit job testing that flow (frequency should be defined by Test platform team)
    •  

ENGINEERING DETAILS:

 

Template:

Networking Definition of Planned

Epic Template descriptions and documentation

Epic Goal

Support EgressIP feature with ExternalTrafficPolicy=Local and External2Pod direct routing in OVNKubernetes.

Why is this important?

We see a lot of customers using Multi-Egress Gateway with EgressIP. 

Currently, connections which reach a pod via the OVN routing gateway are sent back via the EgressIP if one is associated with that namespace.

Multiple bugs have been reported by customers: 

https://issues.redhat.com/browse/OCPBUGS-16792 

https://issues.redhat.com/browse/OCPBUGS-7454

https://issues.redhat.com/browse/OCPBUGS-18400

This has also resulted in RFEs being filed, as the problem was too complicated to fix via a bug:

https://issues.redhat.com/browse/RFE-4614

https://issues.redhat.com/browse/RFE-3944

This is observed by multiple customers using MetalLB and F5 load balancers. We haven't really tested this combination.

From the initial discussion, it looks like the fix is needed in OVN. We request the team to expedite this fix, given that a number of customers are hitting it.
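
For context, a minimal sketch of the combination customers are running: an EgressIP assigned to a namespace plus a Service with ExternalTrafficPolicy=Local (names, labels, and addresses are illustrative):

apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-sample
spec:
  egressIPs:
  - 192.0.2.10
  namespaceSelector:
    matchLabels:
      env: production
---
apiVersion: v1
kind: Service
metadata:
  name: ingress-lb
  namespace: production
spec:
  type: LoadBalancer
  # reply traffic for connections arriving via this Service is what gets
  # incorrectly sent back through the EgressIP today
  externalTrafficPolicy: Local
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080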

Planning Done Checklist

The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

  • Priority is set by engineering
  • Epic must be linked to a Parent Feature
  • Target version must be set
  • Assignee must be set
  • Enhancement Proposal is Implementable
  • No outstanding questions about major work breakdown
  • Are all Stakeholders known? Have they all been notified about this item?
  • Does this epic affect SD? Have they been notified? (View plan definition for current suggested assignee)
    1. Please use the “Discussion Needed: Service Delivery Architecture Overview” checkbox to facilitate the conversation with SD Architects. The SD architecture team monitors this checkbox which should then spur the conversation between SD and epic stakeholders. Once the conversation has occurred, uncheck the “Discussion Needed: Service Delivery Architecture Overview” checkbox and record the outcome of the discussion in the epic description here.
    2. The guidance here is that unless it is very clear that your epic doesn’t have any managed services impact, default to use the Discussion Needed checkbox to facilitate that conversation.

Additional information on each of the above items can be found here: Networking Definition of Planned

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement
    details and documents.

...

Dependencies (internal and external)

  1. OVN team has to do https://issues.redhat.com/browse/FDP-42 and only then can we consume that into OVNKubernetes
  2. Design discussions Doc: https://docs.google.com/document/d/1VgDuEhkDzNOjIlPtwfIhEGY1Odatp-rLF6Pmd7bQtt0/edit 

...

Previous Work (Optional):

1. …

Open questions::

1. …

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview

Rename OpenShift update channels:

  • Reposition Conditional Update Risks as “Known Issues”, and reduce UX differentiation
  • Remove references to “supported but not recommended”  in UX and CLI
  • Conditional update experience improved in the UX

    Do not hide conditional updates; it is better to show all the OCP versions in the UI rather than hiding them.

Epic Goal*

Rename “supported but not recommended” to  "known issues"

Why is this important? (mandatory)

What are the benefits to the customer or Red Hat?   Does it improve security, performance, supportability, etc?  Why is work a priority?

Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Other 

oc adm upgrade --include-not-recommended today includes a "Supported but not recommended updates:" header when introducing those updates. It also renders the Recommended condition. OTA-1191 is about what to do with the --include-not-recommended flag. This ticket is about addressing the header and possibly about adjusting or contextualizing the Recommended condition type.

 

Here is a current output

 

$ oc adm upgrade --include-not-recommended
Cluster version is 4.10.0-0.nightly-2021-12-23-153012

Upstream: https://raw.githubusercontent.com/wking/cincinnati-graph-data/cincinnati-graph-for-targeted-edge-blocking-demo/cincinnati-graph.json
Channel: stable-4.10

Recommended updates:

  VERSION                   IMAGE
  4.10.0-always-recommended quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000

Supported but not recommended updates:

  Version: 4.10.0-conditionally-recommended
  Image: quay.io/openshift-release-dev/ocp-release@sha256:1111111111111111111111111111111111111111111111111111111111111111
  Recommended: Unknown
  Reason: EvaluationFailed
  Message: Exposure to SomeChannelThing is unknown due to an evaluation failure: client-side throttling: only 16.3µs has elapsed since the last match call completed for this cluster condition backend; this cached cluster condition request has been queued for later execution
  On clusters with the channel set to 'buggy', this imaginary bug can happen. https://bug.example.com/b

  Version: 4.10.0-fc.2
  Image: quay.io/openshift-release-dev/ocp-release@sha256:85c6ce1cffe205089c06efe363acb0d369f8df7ad48886f8c309f474007e4faf
  Recommended: False
  Reason: ModifiedAWSLoadBalancerServiceTags
  Message: On AWS clusters for Services in the openshift-ingress namespace… This will not cause issues updating between 4.10 releases.  This conditional update is just a demonstration of the conditional update system. https://bugzilla.redhat.com/show_bug.cgi?id=2039339

 

 

Definition of done:

After this change the output will look similar to below

 

$ oc adm upgrade --include-not-recommended
Cluster version is 4.10.0-0.nightly-2021-12-23-153012

Upstream: https://raw.githubusercontent.com/wking/cincinnati-graph-data/cincinnati-graph-for-targeted-edge-blocking-demo/cincinnati-graph.json
Channel: stable-4.10

Recommended updates:

  VERSION                   IMAGE
  4.10.0-always-recommended quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000


Updates with known issues:

  Version: 4.10.0-conditionally-recommended
  Image: quay.io/openshift-release-dev/ocp-release@sha256:1111111111111111111111111111111111111111111111111111111111111111
  Recommended: Unknown
  Reason: EvaluationFailed
  Message: Exposure to SomeChannelThing is unknown due to an evaluation failure: client-side throttling: only 16.3µs has elapsed since the last match call completed for this cluster condition backend; this cached cluster condition request has been queued for later execution
  On clusters with the channel set to 'buggy', this imaginary bug can happen. https://bug.example.com/b

  Version: 4.10.0-fc.2
  Image: quay.io/openshift-release-dev/ocp-release@sha256:85c6ce1cffe205089c06efe363acb0d369f8df7ad48886f8c309f474007e4faf
  Recommended: False
  Reason: ModifiedAWSLoadBalancerServiceTags
  Message: On AWS clusters for Services in the openshift-ingress namespace… This will not cause issues updating between 4.10 releases.  This conditional update is just a demonstration of the conditional update system. https://bugzilla.redhat.com/show_bug.cgi?id=2039339

 

 

Feature Overview

OCP 4 clusters still maintain pinned boot images. We have numerous clusters installed that have boot media pinned to first boot images as early as 4.1. In the future these boot images may not be certified by the OEM and may fail to boot on updated datacenter or cloud hardware platforms. These "pinned" boot images should be updateable so that customers can avoid this problem and better still scale out nodes with boot media that matches the running cluster version.

Phase 1 provided a tech preview for GCP.

In phase 2, GCP support goes to GA. Support for other IPI footprints is new and tech preview.

Requirements

This will pick up stories left off from the initial Tech Preview(Phase 1): https://issues.redhat.com/browse/MCO-589

We'll want to add some tests to make sure that managing boot images hasn't broken our existing functionality and that our new feature works. Proposed flow:

  • opt in an existing machineset to updates
  • update a GCP machineset with a dummy "bootimage"
  • wait for machineset to be reconciled
  • check if the machineset is restored to original value
  • if it matches -> success

1/30/24: Updated based on enhancement discussions

This is the MCO side of the changes. Once the API PR lands, the MSBIC should start watching for the new API object. 

It is also important to note that MachineSets that have an ownerReference should not be opted in to this mechanism, even if they are opted in via the API. See discussion here: https://github.com/openshift/enhancements/pull/1496#discussion_r1463386593

Done when:

  • user has a way to switch update mechanism on and off
  • MachineSets with an OwnerRef are ignored.

 

Update 3/26/24 - Moved ValidatingAdmissionPolicy bit into a separate story as that got a bit more involved.

This will be implemented via a global knob in the API. This is required in addition to the feature gate as we expect customers to still want to toggle this feature when it leaves tech preview.

Done when:

  • a new API object is created for opting in machine resources
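
As a sketch of what the opt-in could look like (assuming the knob is exposed on the operator.openshift.io/v1 MachineConfiguration resource that the ValidatingAdmissionPolicy below targets; field names are illustrative, not the final API):

apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
spec:
  # illustrative: opt all MachineSets in to managed boot image updates;
  # MachineSets with an ownerReference would still be ignored by the MSBIC
  managedBootImages:
    machineManagers:
    - resource: machinesets
      apiGroup: machine.openshift.io
      selection:
        mode: All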

 

A ValidatingAdmissionPolicy should be implemented (via an MCO manifest) for changes to this new API object, so that the feature is not turned on in unsupported platforms. The only platform currently supported is GCP. The ValidatingAdmissionPolicy is Kubernetes native and is behind its own feature gate, so this will have to be checked while applying these manifests. Here is what the YAML for these manifests would look like:

---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicy
metadata:
  name: "managed-bootimages-platform-check"
spec:
  failurePolicy: Fail
  paramKind:
    apiVersion: config.openshift.io/v1
    kind: Infrastructure
  matchConstraints:
    resourceRules:
    - apiGroups:   ["operator.openshift.io"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["MachineConfiguration"]
  validations:
    - expression: "!has(object.spec.managedBootImages) || params.status.platformStatus.type == 'GCP'"
      message: "This feature is only supported on these platforms: GCP"
---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "managed-bootimages-platform-check-binding"
spec:
  policyName: "managed-bootimages-platform-check"
  validationActions: [Deny]
  paramRef:
    name: "cluster"     
    parameterNotFoundAction: "Deny"

 

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

In 4.15, before conducting the live migration, CNO will check if a cluster is managed by the SD team. We need to remove this check to support unmanaged clusters.

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

As a cluster administrator, I would like to migrate my existing (that does not currently use Azure AD Workload Identity) cluster to use Azure AD Workload Identity

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

Many customers would like to migrate to Azure AD Workload Identity with minimal downtime but have numerous existing clusters and an aversion to supporting two concurrent operational requirements. Therefore they would like to migrate existing Azure clusters to Azure AD Workload Identity in a safe manner after they have been upgraded to a version of OCP supporting that feature (4.14+).

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

Provide a documented method for migration to Azure AD Workload Identity for OpenShift 4.14+ with minimal downtime, and without customers having to start over with a new cluster using Azure AD Workload Identity and migrating their workloads over. If there is a risk of workload disruption or downtime, we will inform customers of this risk and have them accept it.

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  Self-managed
Classic (standalone cluster)  Classic
Hosted control planes  N/A
Multi node, Compact (three node), or Single node (SNO), or all   All
Connected / Restricted Network, or all  All
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)   All applicable architectures
Operator compatibility  
Backport needed (list applicable versions)  4.14+
UI need (e.g. OpenShift Console, dynamic plugin, OCM) N/A
Other (please specify)  

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

Goal

Spike to evaluate whether we can provide an automated way to support migration to Azure Managed Identity (preferred), or alternatively a documented and supported manual method for customers to perform the migration themselves (second option), or conclude that we will not support migration at all.

This spike will evaluate, scope the level of effort (sizing), and make recommendation on next steps.

Feature request

Support migration to Azure Managed Identity

Feature description

Many customers would like to migrate to Azure Managed Identity but have numerous existing clusters and an aversion to supporting two concurrent operational requirements. Therefore they would like to migrate existing Azure clusters to Managed Identity in a safe manner after they have been upgraded to a version of OCP supporting that feature (4.14+).

Why?

Provide a uniform operational experience for all clusters running versions which support Azure Managed Identity without having to decommission long running clusters

Other considerations

  • Disruption to customer's workload.
  • Has to be closely coordinated with update effort to minimize disruption.
  • Tokenized operators and other layered products - this work is not yet done (planned for OCP 4.15/4.16), has to be done manually for now, and may not cover the full set.
  • If we grant this for Azure MI/WI, we will likely need to also do this for STS and GCP WIF.
  • If we grant this, would we do this for self-managed and managed OpenShift (ARO)?

The cloud-credential-operator repository documentation can be updated for installing and/or migrating a cluster with Azure workload identity integration.

  • The document currently uses manually numbered lists, which creates more work when we want to add/remove steps. Migrating the document to use dynamically numbered lists where each number is (1.) will remove this extra work in future changes. 
  • During installation, we can take advantage of the new(ish) --included parameter of the `oc adm release extract` command by creating an install-config prior to executing the command.
  • During migration, we can take advantage of the new(ish) `reboot-machine-config-pool` sub-command of `oc adm` to restart all of the pods on a cluster and reduce the risks. The sub-command restarts each node in series, resulting in highly available services being restarted one pod at a time.

Feature Overview (aka. Goal Summary)  

Add support for Johannesburg, South Africa (africa-south1) in GCP

Goals (aka. expected user outcomes)

As a user I'm able to deploy OpenShift in Johannesburg, South Africa (africa-south1) in GCP and this region is fully supported

Requirements (aka. Acceptance Criteria):

A user can deploy OpenShift in GCP Johannesburg, South Africa (africa-south1) using all the supported installation tools for self-managed customers.

The support of this region is backported to the previous OpenShift EUS release.

Background

Google Cloud has added support for a new region in their public cloud offering, and this region needs to be supported for OpenShift deployments like other regions.

Documentation Considerations

The information about the new region needs to be added to the documentation so this is supported.

Epic Goal

  • Validate the Installer can deploy OpenShift on Johannesburg, South Africa (africa-south1) in GCP
  • Create the corresponding OCPBUGs if there is any issue while validating this

The goal of this feature is to add support for:

  • telemetry
  • nmstate ipv6
  • nmstate net2net

Why is this important?

  • Without an API, customers are forced to use the MCO. This brings with it a set of limitations (mainly a reboot per change and the fact that config is shared within each pool, so per-node configuration isn't possible).
  • A better upgrade solution will give us the ability to support a single-host-based implementation.
  • Telemetry will give us more info on how widely IPsec is used.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Must allow for the possibility of offloading the IPsec encryption to a SmartNIC.

 

  • nmstate
  • k8s-nmstate
  • easier mechanism for cert injection (??)
  • telemetry
  • improve ci and test coverage
     

Dependencies (internal and external)

  1.  nmstate tasks

Related:

  • ITUP-44 - OpenShift support for North-South OVN IPSec
  • HATSTRAT-33 - Encrypt All Traffic to/from Cluster (aka IPSec as a Service)

Previous Work (Optional):

  1. SDN-717 - Support IPSEC on ovn-kubernetes
  2. SDN-3604 - Fully supported non-GA N-S IPSec implementation using machine config.

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • telemetry
  • nmstate ipv6
  • nmstate net2net

Why is this important?

  • Without an API, customers are forced to use the MCO. This brings with it a set of limitations (mainly a reboot per change and the fact that config is shared within each pool, so per-node configuration isn't possible).
  • A better upgrade solution will give us the ability to support a single-host-based implementation.
  • Telemetry will give us more info on how widely IPsec is used.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Must allow for the possibility of offloading the IPsec encryption to a SmartNIC.

 

  • nmstate
  • k8s-nmstate
  • easier mechanism for cert injection (??)
  • telemetry
  • improve ci and test coverage
     

Dependencies (internal and external)

  1.  nmstate tasks

Related:

  • ITUP-44 - OpenShift support for North-South OVN IPSec
  • HATSTRAT-33 - Encrypt All Traffic to/from Cluster (aka IPSec as a Service)

Previous Work (Optional):

  1. SDN-717 - Support IPSEC on ovn-kubernetes
  2. SDN-3604 - Fully supported non-GA N-S IPSec implementation using machine config.

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)  

Add support for Dammam, Saudi Arabia, Middle East (me-central2) region in GCP

Goals (aka. expected user outcomes)

As a user I'm able to deploy OpenShift in Dammam, Saudi Arabia, Middle East (me-central2) region in GCP and this region is fully supported

Requirements (aka. Acceptance Criteria):

A user can deploy OpenShift in GCP Dammam, Saudi Arabia, Middle East (me-central2) region using all the supported installation tools for self-managed customers.

The support of this region is backported to the previous OpenShift EUS release.

Background

Google Cloud has added support for a new region in their public cloud offering, and this region needs to be supported for OpenShift deployments like other regions.

Documentation Considerations

The information about the new region needs to be added to the documentation so this is supported.

Epic Goal

  • Validate the Installer can deploy OpenShift on Dammam, Saudi Arabia, Middle East (me-central2) region in GCP
  • Create the corresponding OCPBUGs if there is any issue while validating this

Epic Goal*

Drive the technical part of the Kubernetes 1.29 upgrade, including rebasing the openshift/kubernetes repository and coordination across the OpenShift organization to get e2e tests green for the OCP release.

 
Why is this important? (mandatory)

OpenShift 4.16 cannot be released without Kubernetes 1.29

 
Scenarios (mandatory) 

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

PRs:

Feature Overview (aka. Goal Summary)  

Base RHCOS for OpenShift 4.16 on RHEL 9.4 content.

Goals (aka. expected user outcomes)

New RHEL minors bring additional features and performance optimizations, and most importantly, new hardware enablement. Customers and partners need to be able to install on the latest hardware.

We want to start looking at testing OpenShift on RHCOS built from RHEL 9.4 packages. While it is currently possible to do most of that testing work using OKD-SCOS, only the x86_64 architecture is available there. We'll publish OCP release images and boot images that include RHCOS builds made out of CentOS Stream packages to prepare for the RHEL 9.4 release.

Summary of the steps:
1. Add manifests for a new rhel-9.4 variant in openshift/os, based on the existing manifests for rhel-9.2 and c9s. We'll use as many C9S packages as possible and re-use existing OpenShift-specific packages.
2. Update the staging pipeline configuration to add a 4.15-9.4 stream.
3. Trigger an RHCOS build and use it to replace an existing image from a nightly OCP release image. Upload this release image to the rhcos-devel namespace on registry.ci.openshift.org
4. Ask for as many people as possible to test this release image. Write an email to aos-devel, publish updated instructions in this Epic, publish the same instructions in the Slack channel.
5. Repeat steps 3 and 4 every two weeks

Discussed this with Michael Nguyen 
 
This would work and ensure that we don't promote a release publicly with 9.4 GA content prior to RHEL 9.4 GA on April 30th

  1. April 23 4.16.0-ec.6 nightly selected, RHCOS is based on public 9.4 Beta
  2. April 24-25 RHCOS switches from 9.4 Beta repos to 9.4 GA repos
  3. April 26 4.16.0-ec.6 promoted and 4.16 branched from master assuming green 4.17 nightlies
  4. April 30 - RHEL 9.4 GA
  5. May 03 first 4.16 RC on RHEL 9.4 GA

The thing we want to be careful about is checking on the selection of 4.16.0-ec.6, which should be tracked via https://issues.redhat.com/browse/FDN-623, but we could just reach out to the TRT team on #forum-ocp-release-oversight to confirm. This is because we don't want to switch to 9.4 GA content until the EC build that will be made public ahead of 9.4 GA has been selected.

Feature Overview (aka. Goal Summary)  

Graduate the etcd tuning profiles feature delivered in https://issues.redhat.com/browse/ETCD-456 to GA

Goals (aka. expected user outcomes)

Remove the feature gate flag and make the feature accessible to all customers.

Requirements (aka. Acceptance Criteria):

Requires fixes to apiserver to handle etcd client retries correctly

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both yes
Classic (standalone cluster) yes
Hosted control planes no
Multi node, Compact (three node), or Single node (SNO), or all Multi node and compact clusters
Connected / Restricted Network Yes
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) Yes
Operator compatibility N/A
Backport needed (list applicable versions) N/A
UI need (e.g. OpenShift Console, dynamic plugin, OCM) N/A
Other (please specify) N/A

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

Epic Goal*

Graduate the etcd tuning profiles feature delivered in https://issues.redhat.com/browse/ETCD-456 to GA

https://github.com/openshift/api/pull/1538
https://github.com/openshift/enhancements/pull/1447

 
Why is this important? (mandatory)

Graduating the feature to GA makes it accessible to all customers and not hidden behind a feature gate.

As further outlined in the linked stories the major roadblock for this feature to GA is to ensure that the API server has the necessary capability to configure its etcd client for longer retries on platforms with slower latency profiles. See: https://issues.redhat.com/browse/OCPBUGS-18149

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1. As an openshift admin I can change the latency profile of the etcd cluster without causing any downtime to the control-plane availability

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - etcd
  • Documentation - etcd docs team
  • QE - etcd qe
  • PX - 
  • Others -

Acceptance Criteria (optional)

Once the cluster is installed, we should be able to change the default latency profile on the API to a slower one and verify that etcd is rolled out with the updated leader election and heartbeat timeouts. During this rollout there should be no disruption or unavailability to the control-plane.

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

The linked issue was slated to be released in 4.16, but missed the branch date due to a late merge conflict. This bug tracks the backporting of this slated feature to 4.16

Once https://issues.redhat.com/browse/ETCD-473 is done this story will track the work required to move the "operator/v1 etcd spec.hardwareSpeed" field from behind the feature gate to GA.
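
A minimal sketch of what selecting a profile could look like once the field is GA (assuming it surfaces as controlPlaneHardwareSpeed on the operator.openshift.io/v1 Etcd resource, referred to as spec.hardwareSpeed in this story; the exact field name and allowed values may differ):

apiVersion: operator.openshift.io/v1
kind: Etcd
metadata:
  name: cluster
spec:
  # illustrative: select the slower latency profile, which lengthens the
  # etcd leader election and heartbeat timeouts described above
  controlPlaneHardwareSpeed: Slower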

Feature Overview (aka. Goal Summary)  

Graduate the etcd tuning profiles feature delivered in https://issues.redhat.com/browse/ETCD-456 to GA

Goals (aka. expected user outcomes)

Remove the feature gate flag and make the feature accessible to all customers.

Requirements (aka. Acceptance Criteria):

Requires fixes to apiserver to handle etcd client retries correctly

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both yes
Classic (standalone cluster) yes
Hosted control planes no
Multi node, Compact (three node), or Single node (SNO), or all Multi node and compact clusters
Connected / Restricted Network Yes
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) Yes
Operator compatibility N/A
Backport needed (list applicable versions) N/A
UI need (e.g. OpenShift Console, dynamic plugin, OCM) N/A
Other (please specify) N/A

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

Feature Overview (aka. Goal Summary)  

Enable Hosted Control Planes guest clusters to support up to 500 worker nodes. This enables customers to have clusters with a large number of worker nodes.

Goals (aka. expected user outcomes)

Max cluster size 250+ worker nodes (mainly about control plane). See XCMSTRAT-371 for additional information.
Service components should not be overwhelmed by additional customer workloads; they should use larger cloud instances when the worker node count exceeds a threshold and smaller cloud instances when it is below it.

Requirements (aka. Acceptance Criteria):

 

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both Managed
Classic (standalone cluster) N/A
Hosted control planes Yes
Multi node, Compact (three node), or Single node (SNO), or all N/A
Connected / Restricted Network Connected
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) x86_64 ARM
Operator compatibility N/A
Backport needed (list applicable versions) N/A
UI need (e.g. OpenShift Console, dynamic plugin, OCM) N/A
Other (please specify)  

Questions to Answer (Optional):

Check with OCM and CAPI requirements to expose larger worker node count.

 

Documentation:

  • Design document detailing the autoscaling mechanism and configuration options
  • User documentation explaining how to configure and use the autoscaling feature.

Acceptance Criteria

  • Configure max-node size from CAPI
  • Management cluster nodes automatically scale up and down based on the hosted cluster's size.
  • Scaling occurs without manual intervention.
  • A set of "warm" nodes are maintained for immediate hosted cluster creation.
  • Resizing nodes should not cause significant downtime for the control plane.
  • Scaling operations should be efficient and have minimal impact on cluster performance.

 

Goal

  • Dynamically scale the serving components of control planes

Why is this important?

  • To be able to have clusters with a large number of worker nodes

Scenarios

  1. When a hosted cluster's worker node count increases past X, the serving components are moved to larger cloud instances
  2. When a hosted cluster's worker node count falls below a threshold, the serving components are moved to smaller cloud instances.

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of problem:

    HyperShift operator crashes with size tagging enabled and a clustersizingconfiguration that does not have an effects section under the size configuration.
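
For reference, a minimal sketch of the kind of configuration that triggers this: a ClusterSizingConfiguration whose size entries carry only criteria and omit the optional effects stanza (the API group and field names here are illustrative):

apiVersion: scheduling.hypershift.openshift.io/v1alpha1
kind: ClusterSizingConfiguration
metadata:
  name: cluster
spec:
  sizes:
  - name: small
    criteria:
      from: 0
      to: 10
    # no effects stanza; the operator should tolerate this instead of crashing
  - name: large
    criteria:
      from: 11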

Version-Release number of selected component (if applicable):

    4.16.0

How reproducible:

Always    

Steps to Reproduce:

    1. Install hypershift operator with size tagging enabled
    2. Create a hosted cluster with request serving isolation topology
    3.
    

Actual results:

    HyperShift operator crashes

Expected results:

    Cluster creation succeeds

Additional info:

    

User Story:

When a HostedCluster is given a size label, the scheduler needs to ensure that request serving nodes exist for that size label and, when they do, reschedule request serving pods to the appropriate nodes.

Acceptance Criteria:

Description of criteria:

  • HostedClusters reconcile the content of hypershift.openshift.io/request-serving-node-additional-selector label
    • Reschedule the request serving pods to run to instances matching the label

This does not require a design proposal.
This does not require a feature gate.
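
As an illustration of the trigger, a HostedCluster carrying the label referenced above (the size label name and both values are illustrative):

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
  labels:
    # size label applied by the sizing controller (name illustrative)
    hypershift.openshift.io/hosted-cluster-size: large
    # selector the scheduler reconciles into request serving pod placement
    hypershift.openshift.io/request-serving-node-additional-selector: request-serving-large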

Description of problem

When provisioning an HCP on an MC with sizing enabled (that has no existing HCPs), the HCP install can get stuck trying to schedule the kube-apiserver for the hosted control plane. It seems the placeholder deployment cannot be created because of an empty selector value in the NodeAffinity:

operator-56b7ccb598-4hqz4 operator {"level":"error","ts":"2024-04-23T13:35:42Z","msg":"Reconciler error","controller":"DedicatedServingComponentSchedulerAndSi
zer","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedCluster","HostedCluster":{"name":"dry3","namespace":"ocm-staging-2aqkcjamdtbcmjtp0lk1
il3vo9hfd4n1"},"namespace":"ocm-staging-2aqkcjamdtbcmjtp0lk1il3vo9hfd4n1","name":"dry3","reconcileID":"0772c093-ceef-46c1-a450-6bc8184ba633","error":"failed t
o ensure placeholder deployment: Deployment.apps \"ocm-staging-2aqkcjamdtbcmjtp0lk1il3vo9hfd4n1-dry3\" is invalid: spec.template.spec.affinity.nodeAffinity.re
quiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].values: Required value: must be specified when `operator` is 'In' or 'No
tIn'","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-r
untime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/opt/app-root/sr
c/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.
func2.2\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}

This appears to be due to the In selector of the nodeAffinity being populated with an empty list in the source: https://github.com/openshift/hypershift/blob/main/hypershift-operator/controllers/scheduler/dedicated_request_serving_nodes.go#L704

The value of unavailableNodePairs can be an empty list in the case that no HCPs exist on the cluster already, and therefore no nodes are labelled with both a cluster and a serving pair label. In this case, the empty list is passed in the NodeAffinity and results in the error above
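
Concretely, the rejected placeholder Deployment ends up with a node affinity term like the following, where the values list for the In operator is empty, which the API server rejects (the label key shown is illustrative):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: osd-fleet-manager.openshift.io/paired-nodes
          operator: In
          values: []   # empty because no nodes carry both a cluster and a serving pair label yet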

User Story:

As a service provider, I want to be able to:

  • Configure priority and fairness settings per HostedCluster size and force these settings to be applied on the resulting hosted cluster.

so that I can achieve

  • Prevent users of the hosted cluster from bringing down the HostedCluster kube-apiserver with their workload.

Acceptance Criteria:

Description of criteria:

  • HostedCluster priority and fairness settings should be configurable per cluster size in the ClusterSizingConfiguration CR
  • Any changes in priority and fairness inside the HostedCluster should be prevented and overridden by whatever is configured on the provider side.
  • With the proper settings, heavy use of the API from user workloads should not result in the KAS pod getting OOMKilled due to lack of resources.

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Description of problem:

    When a single placeholder is created for a request serving node (because a previous node existed of the same size), no machineset scale up occurs.

Version-Release number of selected component (if applicable):

    4.16.0

How reproducible:

    Always

Steps to Reproduce:

    1. Setup a management cluster with request serving and autoscaling machinesets
    2. Create a HostedCluster of size small 
    3. Manually scale up the corresponding large machineset (in same osdfleetmanager pair)
     4. Increase the HostedCluster size to large

Actual results:

    The large machinesets are never scaled up for the cluster

Expected results:

    The hosted cluster moves up to large machines for request serving

Additional info:

    When a single placeholder is created for request serving nodes, no machineset scale up occurs.

Description of problem:

  For large cluster sizes, non-request serving pods such as OVN, etcd, etc. require more resources. Because these pods live on the same nodes as other hosted clusters' non-request serving pods, we can run into resource exhaustion unless requests for these pods are properly sized.

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Always

Steps to Reproduce:

    1. Create large hosted cluster (200+ nodes)
    2. Observe resource usage of non-request-serving pods
    

Actual results:

    Resource usage is large, while resource requests remain the same

Expected results:

    Resource request grows corresponding to cluster size

Additional info:

    

User Story:

As the HyperShift scheduler, I want to be able to:

  • see the size of each HostedCluster as translated to "t-shirt" sizes

so that I can achieve

  • correct scheduling outcomes and warm-node reserve

As the HyperShift administrator, I want to be able to:

  • configure how the size of each HostedCluster is translated to "t-shirt" sizes
  • configure how often the size of each HostedCluster is translated to "t-shirt" sizes, both as the size gets larger and as it gets smaller

so that I can achieve

  • tuning for auto-scaling of my management cluster

Acceptance Criteria:

Description of criteria:

  • allow configuration for sizing buckets and sizing change throughput
  • document the configuration above
  • expose t-shirt sizes of hosted clusters

Engineering Details:

This requires a design proposal.
This does not require a feature gate.

User Story:

As a HyperShift controller in the management plane, I want to be able to:

  • read the capacity of a cluster measured in the count of nodes from a HostedControlPlane or HostedCluster

so that I can achieve

  • automation that reacts to the number of cluster workers changing at runtime

Acceptance Criteria:

Description of criteria:

  • the count of nodes can be read from the HostedCluster or HostedControlPlane status

Engineering Details:

This requires a design proposal.
This does not require a feature gate.

Description of problem:

  When using cluster size tagging and the related request serving node scheduler, if a cluster is deleted while in the middle of resizing its request serving pods, the placeholder deployment that was created for it is not cleaned up.

Version-Release number of selected component (if applicable):

   HyperShift main (4.16)

How reproducible:

   Always 

Steps to Reproduce:

    1. Setup a management cluster with request-serving machinesets
    2. Create a hosted cluster
    3. Add workers to the hosted cluster so that it changes size
    4. Delete the hosted cluster after it's tagged with the new size but before nodes for its corresponding placeholder pods are created.
    

Actual results:

    The placeholder deployment is never removed from the `hypershift-request-serving-autosizing-placeholder` namespace

Expected results:

    The placeholder deployment is removed when the cluster is deleted.

Additional info:

 

 

 

Feature Overview (aka. Goal Summary)  

Allow supporting RWX block PVCs with kubevirt csi when the underlying infra storageclass supports RWX Block

Goals (aka. expected user outcomes)

Requirements (aka. Acceptance Criteria):

  • Allow users to create RWX Block PVCs within HCP KubeVirt guest clusters when the underlying infra storage class mapped to the guest supports RWX Block
  • Add presubmit and conformance periodics exercising HCP KubeVirt with RWX Block PVCs in the guest.

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both self-managed
Classic (standalone cluster) no
Hosted control planes yes
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Use Cases (Optional):

Questions to Answer (Optional):

Out of Scope

Background

Customer Considerations

Documentation Considerations

This feature should be documented as a capability of HCP OpenShift Virtualization in the ACM HCP docs

Interoperability Considerations

 

Currently, kubevirt-csi is limited to ReadWriteOnce for the pvcs within the guest cluster. This is true even when the infra storageclass supports RWX.

We should expand the ability for the guest cluster to use RWX block when the underlying infra storage class supports RWX block

Currently, kubevirt-csi is limited to ReadWriteOnce for the pvcs within the guest cluster. This is true even when the infra storageclass supports RWX.

We should expand the ability for the guest cluster to use RWX when the underlying infra storage class supports RWX 
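
For example, this change would allow a guest cluster PVC like the following (a minimal sketch; the storage class name is illustrative and must map to an infra storage class that supports RWX Block):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-block-pvc
spec:
  accessModes:
  - ReadWriteMany
  volumeMode: Block
  storageClassName: kubevirt-csi-infra-rwx
  resources:
    requests:
      storage: 10Gi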

Feature Overview

Move to using the upstream Cluster API (CAPI) in place of the current implementation of the Machine API for standalone OpenShift.

Prerequisite work: goals completed in OCPSTRAT-1122.
Complete the design of the Cluster API (CAPI) architecture and build the core operator logic needed for Phase-1, incorporating the assets from different repositories to simplify asset management.

Phase 1 & 2 covers implementing base functionality for CAPI.

Background, and strategic fit

  • Initially CAPI did not meet the requirements for cluster/machine management that OCP had; the project has moved on, and CAPI is a better fit now and also has better community involvement.
  • CAPI has much better community interaction than MAPI.
  • Other projects are considering using CAPI and it would be cleaner to have one solution
  • Long term it will allow us to add new features more easily in one place vs. doing this in multiple places.

Acceptance Criteria

There must be no negative effect on customers/users of the MAPI; this API must continue to be accessible to them, though how it is implemented "under the covers", and whether that implementation leverages CAPI, is open.

So far we haven't tested this provider at all. We have to run it and identify any issues with it.

Steps:

  • Try to run the vSphere provider on OCP
  • Try to create MachineSets
  • Try various features out and note down bugs
  • Create stories for resolving issues upstream and downstream

Outcome:

  • Create stories in epic of items for vSphere that need to be resolved
  •  

The vSphere provider is not present in the downstream operator; someone has to add it there.

This will include adding a tech preview e2e job in the release repo and going through the process described here: https://github.com/openshift/cluster-capi-operator/blob/main/docs/provideronboarding.md

 

Feature Overview (aka. Goal Summary)  

Enable selective management of HostedCluster resources via annotations, allowing hypershift operators to operate concurrently on a management cluster without interfering with each other. This feature facilitates testing new operator versions or configurations in a controlled manner, ensuring that production workloads remain unaffected.

Goals (aka. expected user outcomes)

  • Primary User Type/Persona: Cluster Service Providers
  • Observable Functionality: Administrators can deploy "test" and "production" hypershift operators within the same management cluster. The "test" operator will manage only those HostedClusters annotated according to a predefined specification, while the "production" operator will ignore these annotated HostedClusters, thus maintaining the integrity of existing workloads.

Requirements (aka. Acceptance Criteria):

  • The "test" operator must respond only to HostedClusters that carry a specific annotation defined by the HOSTEDCLUSTERS_SCOPE_ANNOTATION environment variable.
  • The "production" operator must ignore HostedClusters with the specified annotation.
  • The feature is activated via the ENABLE_HOSTEDCLUSTERS_ANNOTATION_SCOPING environment variable. When not set, the operators should behave as they currently do, without any annotation-based filtering.
  • Upstream documentation to describe how to set up and use annotation-based scoping, including setting environment variables and annotating HostedClusters appropriately.
  • The solution should not impact the core functionality of HCP for self-managed and cloud-services (ROSA/ARO)

 

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster)  
Hosted control planes Applicable
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Use Cases (Optional):

  • Testing new versions or configurations of hypershift operators without impacting existing HostedClusters.
  • Gradual rollout of operator updates to a subset of HostedClusters for performance and compatibility testing.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

  • Automatic migration of HostedClusters between "test" and "production" operators.
  • Non-IBM integrations (e.g., MCE)

Background

Current hypershift operator functionality does not allow for selective management of HostedClusters, limiting the ability to test new operator versions or configurations in a live environment without affecting production workloads.
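
A minimal sketch of the intended usage, assuming the environment variables named in the requirements above and an administrator-chosen annotation key (the key, value, and exact variable semantics are illustrative):

# On the "test" hypershift operator deployment (illustrative values):
#   ENABLE_HOSTEDCLUSTERS_ANNOTATION_SCOPING: "true"
#   HOSTEDCLUSTERS_SCOPE_ANNOTATION: "test"
---
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: canary
  namespace: clusters
  annotations:
    # per the requirements above, only the "test" operator reconciles clusters
    # carrying this annotation; the "production" operator ignores them
    hypershift.openshift.io/scope: "test"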

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Upstream only for now

Interoperability Considerations

  • The solution should not impact the core functionality of HCP for self-managed and cloud-services (ROSA/ARO)

Feature Overview (aka. Goal Summary)  

Create an Installer RHEL9-based build for FIPS-enabled OpenShift installations

Goals (aka. expected user outcomes)

As a user, I want to enable FIPS while deploying OpenShift on any platform that supports this standard, so the resultant cluster is compliant with FIPS security standards

Requirements (aka. Acceptance Criteria):

Provide a dynamically linked build of the Installer for RHEL 9 in the release payload

 

Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, ensure you provide the OCPSTRAT (for the configuration to be supported in the future) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both both
Classic (standalone cluster) yes
Hosted control planes n/a
Multi node, Compact (three node), or Single node (SNO), or all all
Connected / Restricted Network all
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) all
Operator compatibility  
Backport needed (list applicable versions) no
UI need (e.g. OpenShift Console, dynamic plugin, OCM) OCM
Other (please specify)  

Documentation Considerations

Docs will need to guide users on which Installer binary to use for FIPS-enabled clusters

Epic Goal

  • Provide a dynamically-linked build of the installer for RHEL 9 in the release payload

Why is this important?

  • RHEL 9 is out and users are switching to it
  • When the host is in FIPS mode, a dynamically-linked build of the installer is required
  • RHEL 8 & 9 have different versions of OpenSSL, so there's no one build that can work on both.

Scenarios

  1. Installing a FIPS mode cluster with the installer running on RHEL 9 with FIPS enabled
  2. Installing a baremetal IPI cluster with the installer running on RHEL 9

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ART-7207

Open questions:

  1. Do we need to build for both RHEL 8 & RHEL 9, or could we just switch the build to RHEL 9 at this point?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

As a user with no FIPS requirement, I want to be able to use the same openshift-installer binary on both RHEL 8 and RHEL 9, as well as other common Linux distributions.

Currently to use baremetal IPI, a user must retrieve the openshift-baremetal-installer binary from the release payload. Historically, this was due to it needing to dynamically link to libvirt. This is no longer the case, so we can make baremetal IPI available in the standard openshift-installer binary.

libvirt is not a supported platform for openshift-installer. Nonetheless, it appears in the platforms list (likely inadvertently?) when running the openshift-baremetal-installer binary because the code for it was enabled in order to link against libvirt.
Now that linking against libvirt is no longer required, there is no reason to continue shipping this unsupported code.

We will need to come up with a separate build tag to distinguish between the openshift-baremetal-install (dynamic) and openshift-install (static) builds. Currently these are distinguished by the libvirt tag.

Allow the user to run oc adm release extract --command=openshift-install-fips to obtain an installer binary that is FIPS-ready.
The binary extracted will be the same one as is extracted when the command is openshift-baremetal-install; this name is provided for convenience.
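
For illustration, extracting the FIPS-ready binary from a release payload would look something like this (the release pullspec is illustrative):

$ oc adm release extract --command=openshift-install-fips --to=. quay.io/openshift-release-dev/ocp-release:<version>-x86_64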

As a user, I want to know how to download and use the correct installer binary to install a cluster with FIPS mode enabled. If I use the wrong binary or don't have FIPS enabled, I need instructions at the point I am trying to create a FIPS-mode cluster.

User Story:

As a customer, I want to be able to:

  • Have installer binaries in the release payload based on RHEL9

so that I can achieve

Acceptance Criteria:

Description of criteria:

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This does not require a design proposal.
This does not require a feature gate.

Feature Overview

Adding nodes to on-prem clusters in OpenShift is, in general, a complex task. We have numerous methods, and the field keeps adding automation around these methods with a variety of solutions, sometimes unsupported (see "Why is this important" below). Making cluster expansions easier will let users add nodes often and fast, leading to a much improved UX.

This feature adds nodes to any on-prem cluster, regardless of its installation method (UPI, IPI, Assisted, Agent), by booting an ISO image that will add the node to the cluster specified by the user.

Goals and requirements

  • Users can install a host on day 2 using a bootable image to an OpenShift cluster.
  • At least the baremetal, vSphere, none, and Nutanix platforms are supported
  • Clusters installed with any installation method can be expanded with the image
  • Clusters don't need to run any special agent to allow the new nodes to join.

What this workflow could look like

1. Create image:

$ export KUBECONFIG=kubeconfig-of-target-cluster
$ oc adm node-image -o agent.iso --network-data=worker-n.nmstate --role=worker

2. Boot image

3. Check progress

$ oc adm add-node 

Consolidate options

An important goal of this feature is to unify and eliminate some of the existing options to add nodes, aiming to provide a much simpler experience (see "Why is this important" below). We have official and field-documented ways to do this that could be removed once this feature is in place, simplifying the experience, our docs, and the maintenance of said official paths:

  • UPI: Adding RHCOS worker nodes to a user-provisioned infrastructure cluster
    • This feature will replace the need to use this method for the majority of UPI clusters. The current UPI method consists of many manual steps. The new method would replace it with a couple of commands and would apply to probably more than 90% of UPI clusters.
  • Field-documented methods and asks
  • IPI:
    • There are instances where adding a node to a bare metal IPI-deployed cluster can't be done via its BMC. This new feature, while not replacing the day-2 IPI workflow, solves the problem for this use case.
  • MCE: Scaling hosts to an infrastructure environment
    • This method is the most time-consuming and in many cases overkill, but currently, along with the UPI method, it is one of the two options we can give to users.
    • We shouldn't need to ask users to install and configure the MCE operator and its infrastructure for single clusters, as it becomes a project even larger than the UPI method; we should save this for when there's more than one cluster to manage.

With this proposed workflow we eliminate the need to use the UPI method in the vast majority of cases. We also eliminate the field-documented methods that keep popping up trying to solve this in multiple formats, and the need to recommend MCE to all on-prem users, and finally we add a simpler option for IPI-deployed clusters.

In addition, all the built-in validations in the assisted service would be run, improving the installation success rate and overall UX.

This work would have an initial impact on bare metal, vSphere, Nutanix and platform-agnostic clusters, regardless of how they were installed.

Why is this important

This feature is essential for several reasons. Firstly, it enables easy day-2 installation without burdening the user with additional technical knowledge. This simplifies the process of scaling cluster resources with new nodes, which today is overly complex and presents multiple options (https://docs.openshift.com/container-platform/4.13/post_installation_configuration/cluster-tasks.html#adding-worker-nodes_post-install-cluster-tasks).

Secondly, it establishes a unified experience for expanding clusters, regardless of their installation method. This streamlines the deployment process and enhances user convenience.

Another advantage is the elimination of the requirement to install the Multicluster Engine and Infrastructure Operator, which, besides demanding additional system resources, are overkill for use cases where the user simply wants to add nodes to their existing cluster but isn't managing multiple clusters yet. This results in a more efficient and lightweight cluster scaling experience.

Additionally, in the case of IPI-deployed bare metal clusters, this feature eradicates the need for nodes to have a Baseboard Management Controller (BMC) available, simplifying the expansion of bare metal clusters.

Lastly, this problem is often brought up in the field, where Red Hatters working with customers have put different custom automations in place trying to solve it, adding to inconsistent processes to scale clusters.

Oracle Cloud Infrastructure

This feature will solve the problem of cluster expansion for OCI. OCI doesn't have MAPI, and CAPI isn't in the mid-term plans. Mitsubishi shared feedback that makes solving the lack of cluster expansion a requirement for Red Hat and Oracle.

Existing work

We already have the basic technologies to do this with the assisted-service and the agent-based installer, which already do this work for new clusters, and from which we expect to leverage the foundations for this feature.

Day 2 node addition with agent image.

Yet Another Day 2 Node Addition Commands Proposal

Enable day2 add node using agent-install: AGENT-682

 

Epic Goal

  • Cleanup/carryover work from AGENT-682 for the GA release

Why is this important?

  • Address all the required elements for GA, such as FIPS compliance. This will allow a smoother integration of the node-joiner into the oc tool, as planned in OCPSTRAT-784

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

  1. None

Previous Work (Optional):

  1. https://issues.redhat.com/browse/AGENT-682

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Modify dev-scripts and add a new job that tries to add a node to an existing cluster (using the scripts)

The first and second CSRs pending approval have the node name (hostname) embedded in their specs. monitor-add-nodes should only show CSRs pending approval for a specific node. Currently it shows all CSRs pending approval.

We can assume the hostname is provided in node-config.yaml. This allows us to tie a hostname to an IP address, as both are required in node-config.yaml. If not provided, we can make a best effort to find the IP address of the node name using nslookup or an equivalent in golang.
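
For illustration, the node name can be matched against pending CSRs with standard tooling; the exact filtering logic in monitor-add-nodes may differ, and the CSR name below is a placeholder:

$ oc get csr | grep Pending
$ oc get csr <csr-name> -o jsonpath='{.spec.request}' | base64 -d | openssl req -noout -subject

The subject of the kubelet CSRs contains the node name (CN=system:node:<hostname>), which is what allows showing only the CSRs that belong to the node being added.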

Feature Overview (aka. Goal Summary)  

Promote secure authentication methods in HCP CLI by deprecating the use of long-term `AWS credentials` and favoring STS assume role flows. This change aims to enhance security by encouraging the adoption of short-term token-based authentication, reducing the risk associated with promoting insecure usage patterns.

Goals (aka. expected user outcomes)

  • Switch HCP CLI to STS mode to promote secure usage and discourage sudo creds.

 

Requirements (aka. Acceptance Criteria):

  • Implement `sts` mode that strictly uses AWS STS for interacting with infrastructure.
  • Deprecate the --aws-creds flag in the HCP CLI to discourage the use of long-term AWS credentials.
  • Ensure that the HCP CLI provides clear error messages and guidance when deprecated methods are used

Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, ensure you provide the OCPSTRAT (for the configuration to be supported in the future) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both self-managed
Classic (standalone cluster)  
Hosted control planes Yes
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network all
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) all
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM) CLI
Other (please specify)  

Use Cases (Optional):

Admin initiates a setup command via the HCP CLI. The CLI automatically uses STS to assume a temporary role that has permission to create the necessary roles and policies for the new environment. Once the roles are in place, the CLI seamlessly continues the setup process.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

Background

In light of security best practices and evolving compliance requirements, transitioning to STS assume role flows is important. This shift aims to align the HCP CLI with industry standards and security best practices.

Customer Considerations

Support must be provided to assist customers in transitioning from using long-term AWS credentials to STS. This includes comprehensive documentation and responsive support channels.

Documentation Considerations

Revise documentation sections related to authentication to remove references to long-term credential usage `--aws-creds` / deprecate it and emphasize STS assume role processes. Include examples and common troubleshooting tips.

Goal

  • Promote secure authentication methods in HCP CLI by deprecating the use of long-term `AWS credentials` and favoring STS assume role flows. This change aims to enhance security by encouraging the adoption of short-term token-based authentication, reducing the risk associated with promoting insecure usage patterns.

Why is this important?

  • promote secure usage and discourage sudo creds.

Scenarios

  1. ...

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
  • Deprecate the --aws-creds flag and replace with --session-token and --role-arn
  • Create/destroy command should use --session-token and --role-arn to call the AssumeRole API and acquire temp credentials to use when calling AWS services (see the sketch after this list).
  1. collect/document all the permissions the CLI needs for create + destroy commands.
  2. create a new cmd to automate the creation of a role with the required permissions. 
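
For illustration, the AssumeRole call the create/destroy commands are expected to make (per the items above) looks like this with the plain AWS CLI; the account ID, role name, and session name are made up:

$ aws sts assume-role --role-arn arn:aws:iam::123456789012:role/hcp-cli --role-session-name hcp-create-cluster

The temporary AccessKeyId/SecretAccessKey/SessionToken returned by that call are what the CLI would use when talking to AWS services, instead of the long-lived credentials previously supplied via --aws-creds.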

Feature Overview

With this feature, MCE will be an additional operator ready to be enabled at cluster creation for both the AI SaaS and disconnected installations with the Agent-based installer.

Currently 4 operators have been enabled for the Assisted Service SaaS create cluster flow: Local Storage Operator (LSO), OpenShift Virtualization (CNV), OpenShift Data Foundation (ODF), Logical Volume Manager (LVM)

The Agent-based installer doesn't leverage this framework yet.

Goals

When a user performs the creation of a new OpenShift cluster with the Assisted Installer (SaaS) or with the Agent-based installer (disconnected), provide the option to enable the multicluster engine (MCE) operator.

The cluster deployed can add itself to be managed by MCE.

Background, and strategic fit

Deploying an on-prem cluster 0 easily is a key operation for the rest of the OpenShift infrastructure.

While MCE/ACM are strategic in the lifecycle management of OpenShift, including the provisioning of all the clusters, the first cluster, where MCE/ACM are hosted along with other tools supporting the rest of the clusters (GitOps, Quay, log centralisation, monitoring...), must be easy to deploy and have a high success rate.

The Assisted Installer and the Agent-based installers cover this gap and must present the option to enable MCE to keep making progress in this direction.

Assumptions

MCE engineering is responsible for adding the appropriate definition as an olm-operator-plugin

See https://github.com/openshift/assisted-service/blob/master/docs/dev/olm-operator-plugins.md for more details

Feature Goal

As an OpenShift administrator I want to deploy OpenShift clusters with Assisted Installer that have the Multicluster Engine Operator (MCE) enabled with support for managing bare metal clusters.

As an OpenShift administrator I want to have the bare metal clusters deployed with the Assisted Installer managed by MCE, i.e. MCE managing its local cluster.

Definition of Done

  • Assisted Installer allows enabling the MCE operator and configuring it with the Infrastructure Operator
  • The deployed clusters can deploy bare metal clusters from MCE successfully
  • The deployed clusters are managed by MCE
  • Documentation exists.

Feature Origin

MCE is strategic to OpenShift adoption in different scenarios. For Edge use cases it has ZTP to automate the provisioning of OpenShift clusters from a central cluster (hub cluster). MCE is also key for lifecycle management of OpenShift clusters. MCE is also available with the OpenShift subscriptions to every customer.

Additionally, MCE will be key in the deployment of Hypershift, so it serves a double strategic purpose.

Lastly, day-2 operations on newly deployed clusters (without the need to manage multiple clusters) can be covered with MCE too.

We expect MCE to enable our customers to grow their OpenShift installation-base more easily and manage their lifecycle.

Reasoning

When enabling the MCE operator in the Assisted Installer we need to add the required storage with the installation to be able to use the Infrastructure Operator to create bare metal/vSphere/Nutanix clusters.

 
Automated storage configuration workflows
The Infrastructure Operator, a dependency of MCE to deploy bare metal, vSphere and Nutanix clusters, requires storage.

There are multiple scenarios:

  • Install with ODF:
    • ODF is the ideal storage for clusters but requires an additional subscription.
    • When selected along with MCE it will be configured as the storage required by the Infrastructure Operator and the Infrastructure Operator will be deployed along with MCE.
  • Install on SNO
    • If the user also chooses ODF, then ODF is used for the Infrastructure Operator
    • If ODF isn't configured then LVMS is enabled and the Infrastructure Operator will use it.
  • User doesn't install ODF and isn't installing an SNO cluster
    • They have to choose their storage and then install the Infrastructure Operator on day 2; we don't configure storage with the installation (details of this need to be reviewed).

 

Note from planning: alternatively, we can use a new feature in install-config that allows enabling some operators on day 2 and let the user configure it.

Feature Overview (aka. Goal Summary)  

Red Hat allows the following roles for the system:anonymous user and the system:unauthenticated group:

oc get clusterrolebindings -o json | jq '.items[] | select(.subjects[]?.kind == "Group" and .subjects[]?.name == "system:unauthenticated") | .metadata.name' | uniq

This returns what unauthenticated users can do, which is the following:

  • "self-access-reviewers"
  • "system:oauth-token-deleters"
  • "system:openshift:public-info-viewer"
  • "system:public-info-viewer"
  • "system:scope-impersonation"
  • "system:webhooks"

Customers would like to minimize the allowed permissions to unauthenticated groups and users. 

It was determined after initial analysis that the following roles are necessary for OIDC access and version information and will not change:

  • "system:openshift:public-info-viewer"
  • "system:public-info-viewer"

 

Workaround available: Gating the access with policy engines 

Expected: Minimize the allowed roles for unauthenticated access 

Goals (aka. expected user outcomes)

Reduce use of cluster-wide permissions for the system:anonymous user and the system:unauthenticated group for the following roles (see the sketch after this list):

  • "self-access-reviewers"
  • "system:oauth-token-deleters"
  • "system:scope-impersonation"
  • "system:webhooks"

 

Requirements (aka. Acceptance Criteria):

Customers would like to minimize the allowed permissions to unauthenticated groups and users. 

Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, ensure you provide the OCPSTRAT (for the configuration to be supported in the future) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster) Yes
Hosted control planes tbd
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

It was determined that the following roles will not change:

  • "system:openshift:public-info-viewer"
  • "system:public-info-viewer"

 

Background

Provide any additional context needed to frame the feature.  Initial completion during Refinement status.

Many security teams have flagged anonymous access to the apiserver as a security risk. Reducing the permissions granted at the cluster level helps harden access to the apiserver.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

This feature should not impact upgrade from previous versions. 

This feature will allow enabling the new functionality for fresh installs. 

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Impact to existing use cases should be documented.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

Red Hat allows the following roles for the system:anonymous user and the system:unauthenticated group:

oc get clusterrolebindings -o json | jq '.items[] | select(.subjects[]?.kind == "Group" and .subjects[]?.name == "system:unauthenticated") | .metadata.name' | uniq

This returns what unauthenticated users can do, which is the following:

  • "self-access-reviewers"
  • "system:oauth-token-deleters"
  • "system:openshift:public-info-viewer"
  • "system:public-info-viewer"
  • "system:scope-impersonation"
  • "system:webhooks"

Customers would like to minimize the allowed permissions to unauthenticated groups and users. 

Workaround available: Gating the access with policy engines 

Outcome: Minimize the allowed roles for unauthenticated access 

Goals of spike:

  1. Investigate impact of disabling the roles listed above for new and existing clusters
  2. Document risks and feasibility 

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer - specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision GCP infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

  • The GCP IPI Installer no longer contains or uses Terraform.
  • The new provider should aim to provide the same results and have parity with the existing GCP Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Provision GCP infrastructure without the use of Terraform

Why is this important?

  • Removing Terraform from Installer

Scenarios

  1. The new provider should aim to provide the same results as the existing GCP

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

In order to SSH to the bootstrap node, we'll need to create a firewall rule. This should be cleaned up with bootstrap destroy.
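
For illustration, the rule in question is roughly equivalent to the following gcloud commands (the rule name, network, and target tag are illustrative; the installer will create and remove the resource through the GCP API rather than gcloud):

$ gcloud compute firewall-rules create <infra-id>-bootstrap-ssh --network=<infra-id>-network --allow=tcp:22 --target-tags=<infra-id>-bootstrap
$ gcloud compute firewall-rules delete <infra-id>-bootstrap-ssh

The delete corresponds to the cleanup that must happen during bootstrap destroy.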

Feature Overview (aka. Goal Summary)  

As a product manager or business owner of OpenShift Lightspeed, I want to track who is using which features of OLS and why. I also want to track the product adoption rate so that I can make decisions about the product (add/remove features, add new investment).

Requirements (aka. Acceptance Criteria):

Notes:

Enable monitoring of OLS by default when a user installs the OLS operator ---> check the box by default

Users will have the ability to disable the monitoring ----> by unchecking the box

 

Refer to this Slack conversation: https://redhat-internal.slack.com/archives/C068JAU4Y0P/p1723564267962489

 

Description of problem:

When installing the OpenShift Lightspeed operator, cluster monitoring should be enabled by default.

Version-Release number of selected component (if applicable):

    

How reproducible:

Always

Steps to Reproduce:

    1. Click OpenShift Lightspeed in operator catalog
    2. Click Install
    

Actual results:

"Enable Operator recommended cluster monitoring on this Namespace" checkbox is not selected by default.

Expected results:

"Enable Operator recommended cluster monitoring on this Namespace" checkbox should be selected by default.

Additional info:

    

In this feature will follow up OCPBU-186 Image mirroring by tags.

OCPBU-186 implemented new API ImageDigestMirrorSet and ImageTagMirrorSet and rolling of them through MCO.

This feature will update the components using ImageContentSourcePolicy to use ImageDigestMirrorSet.

The list of the components: https://docs.google.com/document/d/11FJPpIYAQLj5EcYiJtbi_bNkAcJa2hCLV63WvoDsrcQ/edit?usp=sharing.

 

Migrate OpenShift Components to use the new Image Digest Mirror Set (IDMS)

This doc lists the OpenShift components that currently use ICSP: https://docs.google.com/document/d/11FJPpIYAQLj5EcYiJtbi_bNkAcJa2hCLV63WvoDsrcQ/edit?usp=sharing

Plan for ImageDigestMirrorSet Rollout
Epic: https://issues.redhat.com/browse/OCPNODE-521

4.13: Enable ImageDigestMirrorSet, both ICSP and ImageDigestMirrorSet objects are functional

  • Document that ICSP is being deprecated and will be unsupported by 4.17 (to allow for EUS to EUS upgrades)
  • Reject write to both ICSP and ImageDigestMirrorSet on the same cluster

4.14: Update OpenShift components to use IDMS

4.17: Remove support for ICSP within MCO

  • Error out if an old ICSP object is used

As an OpenShift developer, I want an --idms-file flag so that I can fetch image info from an alternative mirror if --icsp-file gets deprecated.
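
A sketch of the intended usage, assuming the new flag mirrors the existing --icsp-file behaviour (the file name and release pullspec are illustrative):

$ oc adm release info --idms-file=idms.yaml quay.io/openshift-release-dev/ocp-release:<version>-x86_64

where idms.yaml contains an ImageDigestMirrorSet whose source/mirror mapping is consulted when resolving the release images.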

Goal:
Support enablement of dual-stack VIPs on existing clusters created as dual-stack but at a time when it was not possible to have both v4 and v6 VIPs at the same time.

Why is this important?
This is a followup to SDN-2213 ("Support dual ipv4 and ipv6 ingress and api VIPs").

We expect that customers with existing dual stack clusters will want to make use of the new dual stack VIPs fixes/enablement, but it's unclear how this will work because we've never supported modifying on-prem networking configuration after initial deployment. Once we have dual stack VIPs enabled, we will need to investigate how to alter the configuration to add VIPs to an existing cluster.

We will need to make changes to the VIP fields in the Infrastructure and/or ControllerConfig objects. Infrastructure would be the first option since that would make all of the fields consistent, but that relies on the ability to change that object and have the changes persist and be propagated to the ControllerConfig. If that's not possible, we may need to make changes just in ControllerConfig.

For epics https://issues.redhat.com/browse/OPNET-14 and https://issues.redhat.com/browse/OPNET-80 we need a mechanism to change configuration values related to our static pods. Today that is not possible because all of the values are put in the status field of the Infrastructure object.

We had previously discussed this as part of https://issues.redhat.com/browse/OPNET-21 because there was speculation that people would want to move from internal LB to external, which would require mutating a value in Infrastructure. In fact, there was a proposal to put that value in the spec directly and skip the status field entirely, but that was discarded because a migration would be needed in that case and we need separate fields to indicate what was requested and what the current state actually is.

There was some followup discussion about that with Joel Speed from the API team (which unfortunately I have not been able to find a record of yet) where it was concluded that if/when we want to modify Infrastructure values we would add them to the Infrastructure spec and when a value was changed it would trigger a reconfiguration of the affected services, after which the status would be updated.

This means we will need new logic in MCO to look at the spec field (currently there are only fields in the status, so spec is ignored completely) and determine the correct behavior when they do not match. This will mean the values in ControllerConfig will not always match those in Infrastructure.Status. That's about as far as the design has gone so far, but we should keep the three use cases we know of (internal/external LB, VIP addition, and DNS record overrides) in mind as we design the underlying functionality to allow mutation of Infrastructure status values.

Depending on how the design works out, we may only track the design phase in this epic and do the implementation as part of one of the other epics. If there is common logic that is needed by all and can be implemented independently we could do that under this epic though.

Infrastructure.Spec will be modified by the end user. CNO needs to validate those changes and, if valid, propagate them to Infrastructure.Status.

For clusters that are installed as fresh 4.15, o/installer will propagate Infrastructure.Spec and Infrastructure.Status based on the install-config. However, for clusters that are upgraded, this code in o/installer will never run.

In order to have a consistent state at upgrade, we will make CNO propagate Status back to Spec when the cluster is upgraded to OCP 4.15.

As we already did when introducing multiple VIPs (an API change that created a plural field next to the singular one), all the necessary code scaffolding is already in place.
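
A sketch of what adding the second pair of VIPs could look like once the spec fields are wired up, assuming the plural fields are exposed under spec.platformSpec the same way they already exist in status (field paths and addresses are illustrative, not the final API):

$ oc patch infrastructure cluster --type=merge -p '{"spec":{"platformSpec":{"baremetal":{"apiServerInternalIPs":["192.0.2.10","2001:db8::10"],"ingressIPs":["192.0.2.11","2001:db8::11"]}}}}'

CNO would then validate the change and, if valid, propagate it to Infrastructure.Status as described above.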

TLDR: cluster-bootstrap and network-operator fight over the same resource via the `fieldManager` property of k8s objects. We need to take over everything that has been owned by cluster-bootstrap and manage it ourselves.
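
For illustration, the takeover relies on standard server-side apply semantics; the manager name and manifest file here are illustrative:

$ oc apply --server-side --field-manager=cluster-network-operator --force-conflicts -f infrastructure.yaml

With --force-conflicts, fields previously owned by the cluster-bootstrap field manager are transferred to the new manager, which can then keep managing them on its own.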

Feature Overview (aka. Goal Summary)  

The Agent Based installer is a clean and simple way to install new instances of OpenShift in disconnected environments, guiding the user through the questions and information needed to successfully install an OpenShift cluster. We need to bring this highly useful feature to the IBM Power and IBM zSystem architectures

 

Goals (aka. expected user outcomes)

Agent based installer on Power and zSystems should reflect what is available for x86 today.

 

Requirements (aka. Acceptance Criteria):

Able to use the agent based installer to create OpenShift clusters on Power and zSystem architectures in disconnected environments

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal

  • The goal of this Epic is to enable Agent Based Installer for P/Z

Why is this important?

  • The Agent Based installer is a research Spike item for the Multi-Arch team during the 4.12 release and later

Scenarios
1. …

Acceptance Criteria

  • See "Definition of Done" below

Dependencies (internal and external)
1. …

Previous Work (Optional):
1. …

Open questions:
1. …

Done Checklist

  • CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
  • Release Enablement: <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - If the Epic is adding a new stream, downstream build attached to advisory: <link to errata>
  • QE - Test plans in Test Plan tracking software (e.g. Polarion, RQM, etc.): <link or reference to the Test Plan>
  • QE - Automated tests merged: <link or reference to automated tests>
  • QE - QE to verify documentation when testing
  • DOC - Downstream documentation merged: <link to meaningful PR>
  • All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.

As the multi-arch engineer, I would like to build an environment and deploy using Agent Based installer, so that I can confirm if the feature works per spec.

Entrance Criteria

  • (If there is research) research completed and proven that the feature could be done

Acceptance Criteria

  • “Proof” of verification (Logs, etc.)
  • If independent test code written, a link to the code added to the JIRA story

Feature Overview:

Ensure CSI Stack for Azure is running on management clusters with hosted control planes, allowing customers to associate a cluster as "Infrastructure only" and move the following parts of the stack:

  • Azure Disk CSI driver
  • Azure File CSI driver
  • Azure File CSI driver operator

Value Statement:

This feature enables customers to run their Azure infrastructure more efficiently and cost-effectively by using hosted control planes and supporting infrastructure without incurring additional charges from Red Hat. Additionally, customers should care most about workloads, not the management stack needed to operate their clusters; this feature gets us closer to that goal.

Goals:

  1. Ability for customers to associate a cluster as "Infrastructure only" and pack control planes on role=infra nodes.
  2. Ability to run cluster-storage-operator (CSO) + Azure Disk CSI driver operator + Azure Disk CSI driver control-plane Pods in the management cluster.
  3. Ability to run the driver DaemonSet in the hosted cluster.

Requirements:

  1. The feature must ensure that the CSI Stack for Azure is installed and running on management clusters with hosted control planes.
  2. The feature must allow customers to associate a cluster as "Infrastructure only" and pack control planes on role=infra nodes.
  3. The feature must enable the Azure Disk CSI driver, Azure File CSI driver, and Azure File CSI driver operator to run on the appropriate clusters.
  4. The feature must enable the cluster-storage-operator (CSO) + Azure Disk CSI driver operator + Azure Disk CSI driver control-plane Pods to run in the management cluster.
  5. The feature must enable the driver DaemonSet to run in the hosted cluster.
  6. The feature must ensure security, reliability, performance, maintainability, scalability, and usability.

Use Cases:

  1. A customer wants to run their Azure infrastructure using hosted control planes and supporting infrastructure without incurring additional charges from Red Hat. They use this feature to associate a cluster as "Infrastructure only" and pack control planes on role=infra nodes.
  2. A customer wants to use Azure storage without having to see/manage its stack, especially on a managed service. This would mean that we need to run the cluster-storage-operator (CSO) + Azure Disk CSI driver operator + Azure Disk CSI driver control-plane Pods in the management cluster and the driver DaemonSet in the hosted cluster. 

Questions to Answer:

  1. What Azure-specific considerations need to be made when designing and delivering this feature?
  2. How can we ensure the security, reliability, performance, maintainability, scalability, and usability of this feature?

Out of Scope:

Non-CSI Stack for Azure-related functionalities are out of scope for this feature.

Workload identity authentication is not covered by this feature - see STOR-1748

Background

This feature is designed to enable customers to run their Azure infrastructure more efficiently and cost-effectively by using HyperShift control planes and supporting infrastructure without incurring additional charges from Red Hat.

Documentation Considerations:

Documentation for this feature should provide clear instructions on how to enable the CSI Stack for Azure on management clusters with hosted control planes and associate a cluster as "Infrastructure only." It should also include instructions on how to move the Azure Disk CSI driver, Azure File CSI driver, and Azure File CSI driver operator to the appropriate clusters.

Interoperability Considerations:

This feature impacts the CSI Stack for Azure and any layered products that interact with it. Interoperability test scenarios should be factored by the layered products.

 

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

Run Azure File CSI driver operator + Azure File CSI driver control-plane Pods in the management cluster, run the driver DaemonSet in the hosted cluster allowing customers to associate a cluster as "Infrastructure only".

 

 
Why is this important? (mandatory)

This allows customers to run their Azure infrastructure more efficiently and cost-effectively by using hosted control planes and supporting infrastructure without incurring additional charges from Red Hat. Additionally, customers should care most about workloads, not the management stack needed to operate their clusters; this feature gets us closer to that goal.

 
Scenarios (mandatory) 

When leveraging Hosted control planes, the Azure File CSI driver operator + Azure File CSI driver control-plane Pods should run in the management cluster. The driver DaemonSet should run on the managed cluster. This deployment model should provide the same feature set as the regular OCP deployment.

 
Dependencies (internal and external) (mandatory)

Hosted control plane on Azure.

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation -
  • QE - 
  • PX - 
  • Others -

 

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

Run Azure Disk CSI driver operator + Azure Disk CSI driver control-plane Pods in the management cluster, run the driver DaemonSet in the hosted cluster allowing customers to associate a cluster as "Infrastructure only".

 

 
Why is this important? (mandatory)

This allows customers to run their Azure infrastructure more efficiently and cost-effectively by using hosted control planes and supporting infrastructure without incurring additional charges from Red Hat. Additionally, customers should care most about workloads, not the management stack needed to operate their clusters; this feature gets us closer to that goal.

 
Scenarios (mandatory) 

When leveraging Hosted control planes, the Azure Disk CSI driver operator + Azure Disk CSI driver control-plane Pods should run in the management cluster. The driver DaemonSet should run on the managed cluster. This deployment model should provide the same feature set as the regular OCP deployment.

 
Dependencies (internal and external) (mandatory)

Hosted control plane on Azure.

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation -
  • QE - 
  • PX - 
  • Others -

 

Done - Checklist (mandatory)

As part of this epic, Engineers working on Azure Hypershift should be able to build and use Azure Disk storage on hypershift guests via developer preview custom build images.

BU Priority Overview

Create custom roles for GCP with minimal set of required permissions.

Goals

Enable customers to better scope credential permissions and create custom roles on GCP that only include the minimum subset of what is needed for OpenShift.

State of the Business

Some of the service accounts that CCO creates, e.g. the service account with role roles/iam.serviceAccountUser, provide elevated permissions that are not required/used by the requesting OpenShift components. This is because we use predefined roles for GCP that come with a bunch of additional permissions. The goal is to create custom roles with only the required permissions.

Execution Plans

TBD

 

Evaluate if any of the GCP predefined roles in the credentials request manifest of Cluster Network Operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

type GCPProviderSpec struct {
   metav1.TypeMeta `json:",inline"`
   // PredefinedRoles is the list of GCP pre-defined roles
   // that the CredentialsRequest requires.
   PredefinedRoles []string `json:"predefinedRoles"`
   // Permissions is the list of GCP permissions required to
   // create a more fine-grained custom role to satisfy the
   // CredentialsRequest.
   // When both Permissions and PredefinedRoles are specified
   // service account will have union of permissions from
   // both the fields
   Permissions []string `json:"permissions"`
   // SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
   // have the necessary services enabled
   // +optional
   SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
} 

we can use the following command to check permissions associated with a GCP predefined role

gcloud iam roles describe <role_name>

 

The sample output for role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer

Update the GCP Credentials Request manifest of the Cluster Network Operator to use the new API field for requesting permissions, for example:
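
A sketch of what the updated providerSpec in the credentials request manifest could look like once a predefined role is replaced with explicit permissions (the permission entries are illustrative, not the audited list for the Cluster Network Operator):

providerSpec:
  apiVersion: cloudcredential.openshift.io/v1
  kind: GCPProviderSpec
  predefinedRoles: []
  permissions:
  - compute.firewalls.get
  - compute.firewalls.update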

Evaluate if any of the GCP predefined roles in the credentials request manifests of OpenShift cluster operators give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

type GCPProviderSpec struct {
   metav1.TypeMeta `json:",inline"`
   // PredefinedRoles is the list of GCP pre-defined roles
   // that the CredentialsRequest requires.
   PredefinedRoles []string `json:"predefinedRoles"`
   // Permissions is the list of GCP permissions required to
   // create a more fine-grained custom role to satisfy the
   // CredentialsRequest.
   // When both Permissions and PredefinedRoles are specified
   // service account will have union of permissions from
   // both the fields
   Permissions []string `json:"permissions"`
   // SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
   // have the necessary services enabled
   // +optional
   SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
} 

we can use the following command to check permissions associated with a GCP predefined role

gcloud iam roles describe <role_name>

 

The sample output for role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer

 

Evaluate if any of the GCP predefined roles in the credentials request manifest of Cloud Controller Manager Operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

type GCPProviderSpec struct {
   metav1.TypeMeta `json:",inline"`
   // PredefinedRoles is the list of GCP pre-defined roles
   // that the CredentialsRequest requires.
   PredefinedRoles []string `json:"predefinedRoles"`
   // Permissions is the list of GCP permissions required to
   // create a more fine-grained custom role to satisfy the
   // CredentialsRequest.
   // When both Permissions and PredefinedRoles are specified
   // service account will have union of permissions from
   // both the fields
   Permissions []string `json:"permissions"`
   // SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
   // have the necessary services enabled
   // +optional
   SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
} 

we can use the following command to check permissions associated with a GCP predefined role

gcloud iam roles describe <role_name>

 

The sample output for role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer

 

Evaluate if any of the GCP predefined roles in the credentials request manifest of machine api operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

type GCPProviderSpec struct {
   metav1.TypeMeta `json:",inline"`
   // PredefinedRoles is the list of GCP pre-defined roles
   // that the CredentialsRequest requires.
   PredefinedRoles []string `json:"predefinedRoles"`
   // Permissions is the list of GCP permissions required to
   // create a more fine-grained custom role to satisfy the
   // CredentialsRequest.
   // When both Permissions and PredefinedRoles are specified
   // service account will have union of permissions from
   // both the fields
   Permissions []string `json:"permissions"`
   // SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
   // have the necessary services enabled
   // +optional
   SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
} 

we can use the following command to check permissions associated with a GCP predefined role

gcloud iam roles describe <role_name>

 

The sample output for role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer

Evaluate if any of the GCP predefined roles in the credentials request manifest of Cluster CAPI Operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

type GCPProviderSpec struct {
   metav1.TypeMeta `json:",inline"`
   // PredefinedRoles is the list of GCP pre-defined roles
   // that the CredentialsRequest requires.
   PredefinedRoles []string `json:"predefinedRoles"`
   // Permissions is the list of GCP permissions required to
   // create a more fine-grained custom role to satisfy the
   // CredentialsRequest.
   // When both Permissions and PredefinedRoles are specified
   // service account will have union of permissions from
   // both the fields
   Permissions []string `json:"permissions"`
   // SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
   // have the necessary services enabled
   // +optional
   SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
} 

we can use the following command to check permissions associated with a GCP predefined role

gcloud iam roles describe <role_name>

 

The sample output for role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer

These are phase 2 items from CCO-188

Moving items from other teams that need to be committed to for 4.13 in order for this work to complete.

Epic Goal

  • Request to build a list of the specific permissions needed to run OpenShift on GCP - components grant roles, but we need more granularity - custom roles now allow the ability to do this, compared to when the permissions capabilities were originally written for GCP

Why is this important?

  • Some of the service accounts that CCO creates, e.g. a service account with the role roles/iam.serviceAccountUser, provide elevated permissions that are not required/used by the requesting OpenShift components. This is because we use predefined roles for GCP that come with a bunch of additional permissions. The goal is to create custom roles with only the required permissions. 

Evaluate if any of the GCP predefined roles in the credentials request manifest of the Cluster Storage Operator give elevated permissions. Remove any such predefined role from the spec.predefinedRoles field and replace it with the required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

type GCPProviderSpec struct {
   metav1.TypeMeta `json:",inline"`
   // PredefinedRoles is the list of GCP pre-defined roles
   // that the CredentialsRequest requires.
   PredefinedRoles []string `json:"predefinedRoles"`
   // Permissions is the list of GCP permissions required to
   // create a more fine-grained custom role to satisfy the
   // CredentialsRequest.
   // When both Permissions and PredefinedRoles are specified
   // service account will have union of permissions from
   // both the fields
   Permissions []string `json:"permissions"`
   // SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
   // have the necessary services enabled
   // +optional
   SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
} 

We can use the following command to check the permissions associated with a GCP predefined role:

gcloud iam roles describe <role_name>

The sample output for the role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer

 

Evaluate if any of the GCP predefined roles in the credentials request manifest of the Cluster Ingress Operator give elevated permissions. Remove any such predefined role from the spec.predefinedRoles field and replace it with the required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

type GCPProviderSpec struct {
   metav1.TypeMeta `json:",inline"`
   // PredefinedRoles is the list of GCP pre-defined roles
   // that the CredentialsRequest requires.
   PredefinedRoles []string `json:"predefinedRoles"`
   // Permissions is the list of GCP permissions required to
   // create a more fine-grained custom role to satisfy the
   // CredentialsRequest.
   // When both Permissions and PredefinedRoles are specified
   // service account will have union of permissions from
   // both the fields
   Permissions []string `json:"permissions"`
   // SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
   // have the necessary services enabled
   // +optional
   SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
} 

We can use the following command to check the permissions associated with a GCP predefined role:

gcloud iam roles describe <role_name>

The sample output for the role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer

 

Epic Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Evaluate if any of the GCP predefined roles in the credentials request manifest of the Cluster Image Registry Operator give elevated permissions. Remove any such predefined role from the spec.predefinedRoles field and replace it with the required permissions in the new spec.permissions field.

The new GCP provider spec for credentials request CR is as follows:

type GCPProviderSpec struct {
   metav1.TypeMeta `json:",inline"`
   // PredefinedRoles is the list of GCP pre-defined roles
   // that the CredentialsRequest requires.
   PredefinedRoles []string `json:"predefinedRoles"`
   // Permissions is the list of GCP permissions required to
   // create a more fine-grained custom role to satisfy the
   // CredentialsRequest.
   // When both Permissions and PredefinedRoles are specified
   // service account will have union of permissions from
   // both the fields
   Permissions []string `json:"permissions"`
   // SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
   // have the necessary services enabled
   // +optional
   SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
} 

We can use the following command to check the permissions associated with a GCP predefined role:

gcloud iam roles describe <role_name>

The sample output for the role roleViewer is as follows. The permissions are listed in the "includedPermissions" field.

[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA

Feature Overview

As an Infrastructure Administrator, I want to deploy OpenShift on Nutanix distributing the control plane and compute nodes across multiple regions and zones, forming different failure domains.

As an Infrastructure Administrator, I want to configure an existing OpenShift cluster to distribute the nodes across regions and zones, forming different failure domains.

Goals

Install OpenShift on Nutanix using IPI / UPI in multiple regions and zones.

Requirements (aka. Acceptance Criteria):

  • Ensure Nutanix IPI can successfully be deployed with ODF across multiple zones (like we do with vSphere, AWS, GCP & Azure)
  • Ensure zonal configuration in Nutanix using UPI is documented and tested

vSphere Implementation

This implementation would follow the same approach that was used for vSphere. The following are the main PRs for vSphere:

https://github.com/openshift/enhancements/blob/master/enhancements/installer/vsphere-ipi-zonal.md 

 

Existing vSphere documentation

https://docs.openshift.com/container-platform/4.13/installing/installing_vsphere/installing-vsphere-installer-provisioned-customizations.html#configuring-vsphere-regions-zones_installing-vsphere-installer-provisioned-customizations

https://docs.openshift.com/container-platform/4.13/post_installation_configuration/post-install-vsphere-zones-regions-configuration.html

Epic Goal

Nutanix Zonal: Multiple regions and zones support for Nutanix IPI and Assisted Installer

Note

 

Feature Overview (aka. Goal Summary):

 

This feature will allow an x86 control plane to function with compute nodes exclusively of type Power in a HyperShift environment.

 

Goals (aka. expected user outcomes):

 

Enable an x86 control plane to operate with a Power data-plane in a HyperShift environment.

 

Requirements (aka. Acceptance Criteria):

 

  • The feature must allow an x86 control plane and a Power data-plane to be used together in a HyperShift environment.
  • The feature must provide documentation on how to set up and use the x86 control plane with a Power data-plane in a HyperShift environment.
  • The feature must be tested and verified to work reliably and securely in a production environment.

 

Customer Considerations:

 

Customers who require a mix of x86 control plane and Power data-plane for their HyperShift environment will benefit from this feature.

 

Documentation Considerations:

 

  • Documentation should include clear instructions on how to set up and use the x86 control plane with a Power data-plane in a HyperShift environment.
  • Documentation will live on docs.openshift.com

 

Interoperability Considerations:

 

  • This feature should not impact other OpenShift layered products and versions in the portfolio.

Make it possible to entirely disable the Ingress Operator by leveraging the OCPPLAN-9638 Composable OpenShift capability.

Why is this important?

  • For Managed OpenShift on AWS (ROSA), we use the AWS load balancer and don't need the Ingress operator.  Disabling the Ingress Operator will reduce our resource consumption on infra nodes for running OpenShift on AWS.
  • Customers want to be able to disable the Ingress Operator and use their own component.

Scenarios

  1. This feature must consider the different deployment footprints including self-managed and managed OpenShift, connected vs. disconnected (restricted and air-gapped), implications for auto scaling, chargeback/showback use scenarios, etc.
  2. Disabled configuration must persist throughout cluster lifecycle including upgrades

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

Links:

RFE: https://issues.redhat.com/browse/RFE-3395

Enhancement PR: https://github.com/openshift/enhancements/pull/1415

API PR: https://github.com/openshift/api/pull/1516

Ingress Operator PR: https://github.com/openshift/cluster-ingress-operator/pull/950

Background

Feature Goal: Make it possible to entirely disable the Ingress Operator by leveraging the Composable OpenShift capability.
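
For orientation, optional capabilities are normally opted out of at install time through the capabilities stanza of install-config.yaml; assuming the new capability ends up being named "Ingress", excluding it could look roughly like the sketch below (all other install-config fields elided):

capabilities:
  # Start from no optional capabilities, then explicitly re-enable the ones the
  # cluster still needs; the (assumed) Ingress capability is simply left off the list.
  baselineCapabilitySet: None
  additionalEnabledCapabilities:
  - marketplace
  - openshift-samples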

Epic Goal

Implement the ingress capability focusing on the HyperShift users.

Non-Goals

  • Fully implement the ingress capability on the standalone OpenShift.

Design

As described in the EP PR.

Why is this important?

  • For Managed OpenShift on AWS (ROSA), we use the AWS load balancer and don't need the Ingress operator. Disabling the Ingress Operator will reduce our resource consumption on infra nodes for running OpenShift on AWS.
  • Customers want to be able to disable the Ingress Operator and use their own component.

Scenarios

 # ...

Acceptance Criteria

 * Release Technical Enablement - Provide necessary release enablement details and documents.
 * Ingress Operator can be disabled on HyperShift.

  • Dependent operators and OpenShift components can tolerate the disabled ingress operator on HyperShift.

Dependencies (internal and external)

 # The install-config and ClusterVersion API have been updated with the capability feature.
 # The console operator.

Previous Work (Optional):

Open questions:

 #  

Done Checklist

 * CI - CI is running, tests are automated and merged.
 * Release Enablement <link to Feature Enablement Presentation>
 * DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
 * DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
 * DEV - Downstream build attached to advisory: <link to errata>
 * QE - Test plans in Polarion: <link or reference to Polarion>
 * QE - Automated tests merged: <link or reference to automated tests>
 * DOC - Downstream documentation merged: <link to meaningful PR>

Context
HyperShift uses the cluster-version-operator (CVO) to manage part of the ingress operator's payload, namely CRDs and RBAC. The ingress operator's deployment, though, is reconciled directly by the control plane operator. That's why HyperShift projects the ClusterVersion resource into the hosted cluster, and the enabled capabilities have to be set properly to enable/disable the ingress operator's payload and to let users as well as the other operators be aware of the state of the capabilities.

Goal
The goal of this user story is to implement a new capability in the OpenShift API repository. Use this procedure as example.

Goal
The goal of this user story is to add the new (ingress) capability to the cluster operator's payload (manifests: CRDs, RBACs, deployment, etc.).

Out of scope

  • CRDs and operands created at runtime (assets: gateway CRDs, controllers, subscription, etc.)

Acceptance criteria

  • The ingress capability is known to the openshift installer.
  • The new capability does not introduce any new regression to the e2e tests.

Links

Goal
The goal of this user story is to bump the openshift api which contains the ingress capability.

Acceptance criteria

  • The ingress capability is known by the cluster version operator.
  • The new capability does not introduce any new regression to the openshift CI test (test example).

Links

Epic Goal

  • Allow administrators to define which machineconfigs won't cause a drain and/or reboot.
  • Allow administrators to define which ImageContentSourcePolicy/ImageTagMirrorSet/ImageDigestMirrorSet won't cause a drain and/or reboot
  • Allow administrators to define what services need to be started or restarted after writing the new config file.

Why is this important?

  • There is a demonstrated need from customer cluster administrators to push configuration settings and restart system services without restarting each node in the cluster. 
  • Customers are modifying ICSP/ITMS/IDMS objects post day 1, or adding new ones

Scenarios

  1. As a cluster admin, I want to reconfigure sudo without disrupting workloads.
  2. As a cluster admin, I want to update or reconfigure sshd and reload the service without disrupting workloads.
  3. As a cluster admin, I want to remove mirroring rules from an ICSP, ITMS, or IDMS object without disrupting workloads, because the scenario in which this might lead to non-pullable images at an undefined later point in time doesn't apply to me.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

This epic is another epic under the "reduce workload disruptions" umbrella. 

This is now updated to get us most of the way to MCO-200 (Admin-Defined reboot & drain), but not necessarily with all the final features in place.

This epic aims to create a reboot/drain policy object and an MCO-management apparatus for initial functionality with MachineConfig-backed updates, with a restricted set of actions for the user. We also need a reboot/drain policy object for ImageContentSourcePolicy, ImageTagMirrorSet and ImageDigestMirrorSet to avoid drains/reboots when admins use these APIs and have other ways of ensuring image integrity.

This mostly focuses on the user interface for defining reboot/drain policies. We will also need this for the layering "live apply" cases and bifrost-backed updates, to be implemented in a future update.

The MCO's reboot and drain rules are currently hard-coded in the machine-config-daemon here.

Node drains also occur even beyond OCP 4.9 when not just adding but also removing ICSP, ITMS, IDMS objects or single mirroring rules in their configuration, according to RFE-3667.

This causes at least three problems:

  • A user does not know what the rules are unless they read the code (the rules aren't visible to the user)
  • The controller can't see the rules to "pre-compute" the effect that a MachineConfig will have on a Node before that MachineConfig is delivered (which makes it hard for a user to know what will actually happen if they apply a config)
  • The only way for a template owner to mark their config as "does not require reboot" is to edit the MCD code

Done when:

  • A CRD is defined for post config action policies covering both MCO and ICSP/ITMS/IDMS APIs (a purely hypothetical sketch of such an object follows after this list)
  • The existing daemon rules are broken out into one of these resources
  • The reboot/drain policies are visible in the cluster (e.g. "oc get rebootpolicies")
  • The drain controller handles processing and validation of the user's policies (and could put the computed post-config actions in the machineconfig's and ICSP/ITMS/IDMS status or the custom image's metadata if layering)
  • A template owner has a procedure to mark that their template config changing does/does not require a reboot
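
Purely as an illustration of the shape the post-config action policy object described above could take (the group, kind, and field names are hypothetical placeholders, not a committed API), a policy that exempts a file from drain/reboot and restarts a service instead might look like:

apiVersion: machineconfiguration.openshift.io/v1alpha1   # hypothetical group/version
kind: RebootPolicy                                        # hypothetical kind, matching "oc get rebootpolicies" above
metadata:
  name: chrony-no-reboot
spec:
  files:
  - path: /etc/chrony.conf
    # Instead of the default drain + reboot, only restart the affected service
    # after a MachineConfig writes this file.
    actions:
    - type: RestartService
      serviceName: chronyd.service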

Description of problem:

The MCO logic today allows users to avoid a reboot when changing the registries.conf file (through ICSP/IDMS/ITMS objects), but the MCO will sometimes drain the node if the change is deemed "unsafe" (deleting a mirror, for example).

This behaviour is very disruptive for some customers who wish to make all image registry changes non-disruptive. We will address this long term with admin-defined policies via the API, but we would like to have a backportable solution (as a support exception) for users to do so.
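
For context, the objects involved look like the following sketch (names and registries illustrative); today, deleting an entry from the mirrors list is treated as an "unsafe" change and triggers a drain, even though the registries.conf rewrite itself needs no reboot:

apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: example-mirrors
spec:
  imageDigestMirrors:
  - source: registry.example.com/team
    mirrors:
    # Removing a mirror from this list is currently considered an unsafe change by the MCO.
    - mirror.example.com/team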

Version-Release number of selected component (if applicable):

    4.14->4.16

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Have the MCC validate the correctness of user-provided spec, and render the final object into the status for the daemon to use

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

 

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Complete during New status.

 

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal

  • The goal of this epic is to enable monitoring on the Windows nodes created by the Windows Machine Config Operator (WMCO) in an OpenShift cluster.

Why is this important?

  • Monitoring is critical to identify issues with nodes, pods and containers running on the nodes. Monitoring enables users to make informed decisions, optimize performance, and strategically manage infrastructure costs. Implementation of this epic will ensure that we have a consistent user experience for monitoring across Linux and Windows.

Scenarios

Display console graphs for the following:

  1. Windows node metrics.
  2. Windows pod metrics.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated to test that the console graphs are present.
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

  1. None

Previous Work :

  1. Enhancement: Monitoring-windows-nodes

Open questions::

      1. Specifics on testing framework- adding tests to https://github.com/openshift/console- pending story creation.

      2. Feasibility of https://issues.redhat.com/browse/WINC-530

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User or Developer story

As a WMCO user, I want filesystem graphs to be visible for Windows nodes.

Description

WMCO uses existing console queries to display graphs for Windows nodes. Changes in the filesystem queries (https://github.com/openshift/console/pull/7201) on the console side prevent filesystem graphs from being displayed for Windows nodes.

Engineering Details

The root cause of the filesystem queries not working is that windows_exporter does not return any value for the `mountpoint` field used in the console queries.

Acceptance Criteria

Filesystem graphs are populated for Windows nodes in the OpenShift console.

Description

When inspecting a pod in the OCP console, the metrics tab shows a number of graphs. The networking graphs do not show data for Windows pods.

Engineering Details

Using windows exporter 0.24.0 I am getting `No datapoints found` for the query made by the console:
(sum(irate(container_network_receive_bytes_total{pod='win-webserver-685cd6c5cc-8298l'}[5m])) by (pod, namespace, interface)) + on(namespace,pod,interface) group_left(network_name) (pod_network_name_info)

There is data returned from the query `irate(container_network_receive_bytes_total{pod='windows-machine-config-operator-7c8bcc7b64-sjqxw'}[5m])`
Which makes me believe the error is due to pod_network_name_info not having data for the Windows pods I am looking at.

I'm confirming that by checking in the namespace the workloads are deployed to via the query: pod_network_name_info{namespace="openshift-windows-machine-config-operator"}
I only see metrics for the Linux pods in the namespace.

Looking into this it seems like these metrics are coming from https://github.com/openshift/network-metrics-daemon which runs on each Linux node, and creates a metric for applicable pods running on the node.

Acceptance Criteria

  • The pod network graphs shown in the OCP console are populated for Windows pods.

In https://issues.redhat.com/browse/OPNET-10 we did some initial investigation of a new design for handling creation of br-ex for OVNK. We successfully came up with a high-level design, but we still need to translate that into an implementable OpenShift feature. The goal for this epic would be to come up with an accepted enhancement. The implementation of the design would likely need to happen in the next release.

Notable obstacles to this work are:

  • The inability to do per-node configuration in existing OpenShift operators. In order to support things like static IPs we need a way to provide a separate configuration for each node.
  • API team's objections to having NMState configurations represented in the OpenShift API. When we discussed this with them recently they were strongly opposed to NMState as an interface, including the existing baremetal and kubernetes-nmstate cases.

This is the followup epic to https://issues.redhat.com/browse/OPNET-265 Once we have an approved enhancement (which we expect to be a significant piece of work in itself), this will track the implementation. We expect this to happen in 4.15 or later unless things go very smoothly with the enhancement epic in 4.14.

Executive Summary

Image and artifact signing is a key part of a DevSecOps model. The Red Hat-sponsored sigstore project aims to simplify signing of cloud-native artifacts and sees increasing interest and uptake in the Kubernetes community. This document proposes to incrementally invest in OpenShift support for sigstore-style signed images and be public about it. The goal is to give customers a practical and scalable way to establish content trust. It will strengthen OpenShift’s security philosophy and value-add in the light of the recent supply chain security crisis.

 

CRIO 

  1. Support customer image validation
  2. Support OpenShift release image validation

https://docs.google.com/document/d/12ttMgYdM6A7-IAPTza59-y2ryVG-UUHt-LYvLw4Xmq8/edit# 
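
For context, CRI-O's image verification is driven by the containers-image signature policy in /etc/containers/policy.json; a sigstore-style requirement for a repository looks roughly like the fragment below (registry scope and key path are illustrative):

{
  "transports": {
    "docker": {
      "registry.example.com/myorg": [
        {
          "type": "sigstoreSigned",
          "keyPath": "/etc/pki/example/cosign.pub",
          "signedIdentity": { "type": "matchRepository" }
        }
      ]
    }
  }
}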

 

 

Goal

The goals of this feature are:

  • optimize and streamline the operations of the HyperShift Operator (HO) on Azure Kubernetes Service (AKS) clusters
  • Enable auto-detection of the underlying environment (managed or self-managed) to optimize the HO accordingly.

User Story:

As a hub cluster admin, I want to be able to:

  • Detect when running in managed vs self-managed mode in Azure

so that I can prevent

  • Impossible usage of managed service components in self-managed use cases

Acceptance Criteria:

Description of criteria:

  • HyperShift controllers are aware of when they run in managed environment
  • HyperShift components have a single programmatic way to check in which mode they run
  • HyperShift deployment takes the mode in consideration when setting up its components

(optional) Out of Scope:

Implementing managed-only configs is out of scope for this story

Engineering Details:

  •  

This does not require a design proposal.
This does not require a feature gate.


Goal

We need to be able to install the HO with external DNS and create HCPs on AKS clusters

Why is this important?

  • AKS clusters will serve as the management clusters on ARO HCP.

Scenarios

  1. ...

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

The cloud-network-config-operator is being deployed on HyperShift with `runAsNonRoot` set to true. When HCP is deployed on non-OpenShift management clusters, such as AKS, this needs to be unset so the pod can run as root.

This is currently causing issues deploying this pod on HCP on AKS with the following error:

      state:
        waiting:
          message: 'container has runAsNonRoot and image will run as root (pod: "cloud-network-config-controller-59d4677589-bpkfp_clusters-brcox-hypershift-arm(62a4b447-1df7-4e4a-9716-6e10ec55d8fd)", container: hosted-cluster-kubecfg-setup)'
          reason: CreateContainerConfigError 
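
For reference, the error comes from the standard Kubernetes runAsNonRoot check; the combination being described boils down to a container spec along these lines (image reference illustrative):

      containers:
      - name: hosted-cluster-kubecfg-setup
        image: example.registry/cloud-network-config-controller:latest   # image whose USER resolves to root
        securityContext:
          # Set to true by HyperShift today; on a non-OpenShift management cluster such as
          # AKS it needs to be left unset/false, or the image must declare a non-root user,
          # for the container to be allowed to start.
          runAsNonRoot: true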

User Story:

As a user of HyperShift on Azure, I want to be able to:

  • create HCP with externalDNS setup on AKS clusters

so that I can achieve

  • the ability to successfully setup HCPs on AKS for the ARO HCP effort.

Acceptance Criteria:

Description of criteria:

  • Upstream documentation on setting up an HCP with externalDNS on AKS
  • Any code changes needed to successfully setup HCP with externalDNS on AKS has been peer reviewed and approved

Out of Scope:

  • Any private or publicPrivate topology setup.

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

CSI pods are failing to create on HCP on an AKS cluster.

% k get pods | grep -v Running
NAME                                                  READY   STATUS                       RESTARTS   AGE
csi-snapshot-controller-cfb96bff7-7tc94               0/1     CreateContainerConfigError   0          17h
csi-snapshot-webhook-57f9799848-mlh8k                 0/1     CreateContainerConfigError   0          17h 

The issue is:

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  34m                    default-scheduler  Successfully assigned clusters-brcox-hypershift-arm/csi-snapshot-controller-cfb96bff7-7tc94 to aks-nodepool1-24902778-vmss000001
  Normal   Pulling    34m                    kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:805280104b2cc8d8799b14b2da0bd1751074a39c129c8ffe5fc1b370671ecb83"
  Normal   Pulled     34m                    kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:805280104b2cc8d8799b14b2da0bd1751074a39c129c8ffe5fc1b370671ecb83" in 3.036610768s (10.652709193s including waiting)
  Warning  Failed     32m (x12 over 34m)     kubelet            Error: container has runAsNonRoot and image will run as root (pod: "csi-snapshot-controller-cfb96bff7-7tc94_clusters-brcox-hypershift-arm(45ab89f2-9c00-4afa-bece-7846505edbfc)", container: snapshot-controller) 

User Story:

As a HyperShift CLI user, I want to be able to:

  • include my pull secret when installing the HyperShift operator with an externalDNS setup

so that I can achieve

  • the ability to pull the Red Hat version of the externalDNS image.

Acceptance Criteria:

Description of criteria:

  • The ability to include the pull secret on a HyperShift operator installation is available in the HyperShift CLI

Out of Scope:

  • Any additional changes needed to install HCPs on AKS

Engineering Details:

  • N/A

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview

As of OpenShift 4.14, this functionality is Tech Preview for all platforms but OpenStack, where it is GA. This feature is to bring the functionality to GA for all remaining platforms.

Feature Description

Allow configuring control plane nodes across multiple subnets for on-premise IPI deployments. Along with separating nodes into subnets, also allow using an external load balancer instead of the built-in one (keepalived/haproxy) that the IPI workflow installs, so that the customer can configure their own load balancer with the ingress and API VIPs pointing to nodes in the separate subnets.

Goals

I want to install OpenShift with IPI on an on-premise platform (high priority for bare metal and vSphere) and I need to distribute my control plane and nodes across multiple subnets.

I want to use IPI automation but I will configure an external load balancer for the API and Ingress VIPs, instead of using the built-in keepalived/haproxy-based load balancer that comes with the on-prem platforms.

Background, and strategic fit

Customers require using multiple logical availability zones to define their architecture and topology for their datacenter. OpenShift clusters are expected to fit in this architecture for the high availability and disaster recovery plans of their datacenters.

Customers want the benefits of IPI and automated installations (and to avoid UPI), and at the same time, when they expect high traffic in their workloads, they will design their clusters with external load balancers that will hold the VIPs of the OpenShift clusters.

Load balancers can distribute incoming traffic across multiple subnets, which is something our built-in load balancers aren't able to do and which represents a big limitation for the topologies customers are designing.

While this is possible with IPI AWS, this isn't available with on-premise platforms installed with IPI (for the control plane nodes specifically), and customers see this as a gap in OpenShift for on-premise platforms.

Functionalities per Epic

 

Epic Control Plane with Multiple Subnets  Compute with Multiple Subnets Doesn't need external LB Built-in LB
NE-1069 (all-platforms)
NE-905 (all-platforms)
NE-1086 (vSphere)
NE-1087 (Bare Metal)
OSASINFRA-2999 (OSP)  
SPLAT-860 (vSphere)
NE-905 (all platforms)
OPNET-133 (vSphere/Bare Metal for AI/ZTP)
OSASINFRA-2087 (OSP)
KNIDEPLOY-4421 (Bare Metal workaround)
SPLAT-409 (vSphere)

Previous Work

Workers on separate subnets with IPI documentation

We can already deploy compute nodes on separate subnets by preventing the built-in LBs from running on the compute nodes. This is documented for bare metal only for the Remote Worker Nodes use case: https://docs.openshift.com/container-platform/4.11/installing/installing_bare_metal_ipi/ipi-install-installation-workflow.html#configure-network-components-to-run-on-the-control-plane_ipi-install-installation-workflow

This procedure works on vSphere too, albeit with no QE CI and no documentation.

External load balancer with IPI documentation

  1. Bare Metal: https://docs.openshift.com/container-platform/4.11/installing/installing_bare_metal_ipi/ipi-install-post-installation-configuration.html#nw-osp-configuring-external-load-balancer_ipi-install-post-installation-configuration
  2. vSphere: https://docs.openshift.com/container-platform/4.11/installing/installing_vsphere/installing-vsphere-installer-provisioned.html#nw-osp-configuring-external-load-balancer_installing-vsphere-installer-provisioned

Scenarios

  1. vSphere: I can define 3 or more networks in vSphere and distribute my masters and workers across them. I can configure an external load balancer for the VIPs.
  2. Bare metal: I can configure the IPI installer and the agent-based installer to place my control plane nodes and compute nodes on 3 or more subnets at installation time. I can configure an external load balancer for the VIPs.

Acceptance Criteria

  • Can place compute nodes on multiple subnets with IPI installations
  • Can place control plane nodes on multiple subnets with IPI installations
  • Can configure external load balancers for clusters deployed with IPI with control plane and compute nodes on multiple subnets
  • Can configure VIPs in an external load balancer routed to nodes on separate subnets and VLANs
  • Documentation exists for all the above cases

 

Goal:

Enable and support Multus CNI for microshift.

Background:

Customers with advanced networking requirements need to be able to attach additional networks to a pod, e.g. for high-performance requirements using SR-IOV, complex VLAN setups, etc.

Requirements:

  1. opt-in approach: customers can add multus if needed, e.g. by installing/adding the "microshift-networking-multus" RPM package to their installation.
  2. if possible, it would be good to be able to add multus to an existing installation. If that requires a restart/reboot, that is acceptable. If not possible, it has to be clearly documented. 
  3. it is acceptable that once multus has been added to an installation, it cannot be removed. If removal can be implemented easily, that would be good. If not possible, then it has to be clearly documented.  
  4. Regarding additional networks:
    1. As part of the MVP, the Bridge plugin must be fully supported  
    2. As a stretch goal, macvlan and ipvlan plugins should be supported
    3. Other plugins, esp. host device and sr-iov are out of scope for the MVP, but will be added with a later version.
    4. Multiple additional networks need to be configurable, e.g. two different bridges leading to two different networks, each consumed by different pods.
  5. IP V6 with bridge plugin. Secondary NICs passed to a container via the bridge plugin should work with IP V6, if the consuming pod does support V6. See also "out of scope"
  6. Regarding IPAM CNI support for IP address provisioning, static and DHCP must be supported.
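
As an example of what the opt-in MVP could look like from the user's side (names and addresses are illustrative), a bridge network with static IPAM would be defined through a standard NetworkAttachmentDefinition and referenced from the pod's k8s.v1.cni.cncf.io/networks annotation:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: bridge-net
  namespace: demo
spec:
  config: |-
    {
      "cniVersion": "0.4.0",
      "type": "bridge",
      "bridge": "br-demo",
      "ipam": {
        "type": "static",
        "addresses": [ { "address": "192.168.100.10/24" } ]
      }
    }

A pod in the same namespace then attaches to this network with the annotation k8s.v1.cni.cncf.io/networks: bridge-net.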

Documentation:

  1. In the existing "networking" book, we need a new chapter "4. Multiple networks". It can re-use a lot of content from the OCP doc "https://docs.openshift.com/container-platform/4.14/networking/multiple_networks/understanding-multiple-networks.html", but needs an extra chapter in the beginning, "Installing support for multiple networks".

Testing:

  1. A simple "smoke test" that multus can be added, and a 2nd nic added to a pod (e.g. using host device) is sufficient. No need to replicate all the multus tests from OpenShift, as we assume that if it works there, it works with MicroShift.

Customer Considerations:

  • This document contains the MVP requirements of a MicroShift EAP customer that need to be considered.

Out of scope:

  • Other plugins, esp. host device and sr-iov are out of scope for the MVP, but will be added with a later version.
  • IP V6 support with OVN-K. That is the scope of feature OCPSTRAT-385.

 

Epic Goal

  • Provide optional Multus CNI for MicroShift

Why is this important?

  • Customers need to add extra interfaces directly to Pods

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

It should include all CNIs (bridge, macvlan, ipvlan, etc.) - if we decide to support something, we'll just update scripting to copy those CNIs

It needs to include IPAMs: static, dynamic (DHCP), host-local (we might just not copy it to host)

RHEL9 binaries only to save space

 

Feature Overview (aka. Goal Summary)  

Unify and update hosted control planes storage operators so that they have similar code patterns and can run properly in both standalone OCP and HyperShift's control plane.

Goals (aka. expected user outcomes)

  • Simplify the operators with a unified code pattern
  • Expose metrics from control-plane components
  • Use proper RBACs in the guest cluster
  • Scale the pods according to HostedControlPlane's AvailabilityPolicy
  • Add proper node selector and pod affinity for mgmt cluster pods

Requirements (aka. Acceptance Criteria):

  • OCP regression tests work in both standalone OCP and HyperShift
  • Code in the operators looks the same
  • Metrics from control-plane components are exposed
  • Proper RBACs are used in the guest cluster
  • Pods scale according to HostedControlPlane's AvailabilityPolicy
  • Proper node selector and pod affinity is added for mgmt cluster pods

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal*

Our current design of the EBS driver operator to support HyperShift does not scale well to other drivers. The existing design will lead to more code duplication between driver operators and the possibility of errors.
 
Why is this important? (mandatory)

An improved design will allow more storage drivers and their operators to be added to hypershift without requiring significant changes in the code internals.
 
Scenarios (mandatory) 

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Our CSI driver YAML files are mostly copy-pasted from the initial CSI driver (AWS EBS?). 

As an OCP engineer, I want the YAML files to be generated, so we can keep consistency among the CSI drivers easily and make them less error-prone.

It should have no visible impact on the resulting operator behavior.

Finally switch both CI and ART to the refactored aws-ebs-csi-driver-operator.

The functionality and behavior should be the same as the existing operator, however, the code is completely new. There could be some rough edges. See https://github.com/openshift/enhancements/blob/master/enhancements/storage/csi-driver-operator-merge.md 

 

CI should catch the most obvious errors; however, we need to test features that we do not have in CI, like:

  • custom CA bundles
  • cluster-wide proxy
  • custom encryption keys used in install-config.yaml
  • government cluster
  • STS
  • SNO
  • and other
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Make the changes as per the proposed enhancement https://github.com/openshift/enhancements/pull/1335

  • To add the support of DNSNameResolver CRD in OVN-K, add the flag --enable-dns-name-resolver to the corresponding OVN-K pods.

Note: The flag should be added to OVN-K after checking if the feature-gate DNSNameResolver is enabled.

  • Add RBAC permissions for DNSNameResolver resources to ovn-kubernetes. The following permissions should be added to the 002-rbac-node.yaml, 003-rbac-controller.yaml and 004-rbac-control-plane.yaml files in the bindata/network/ovn-kubernetes/common/ directory:
- apiGroups: ["network.openshift.io"]
  resources:
  - dnsnameresolvers
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch 
  • Update the 001-crd.yaml file in bindata/network/ovn-kubernetes/common/ directory with the latest EgressFirewall CRD.

While trying to block requests going from the pods to different domain names, for example:

  • registry.access.redhat.com
  • registry.access.redhat.com.edgekey.net
  • registry-1.docker.io

Here, the EgressNetworkPolicy works for `registry.access.redhat.com` and `registry.access.redhat.com.edgekey.net`; however, for `registry-1.docker.io`, it is not denying access despite having a deny entry.

"Domain name updates are polled based on the TTL (time to live) value of the domain returned by the local non-authoritative servers. The pod should also resolve the domain from the same local nameservers when necessary, otherwise, the IP addresses for the domain perceived by the egress network policy controller and the pod will be different, and the egress network policy may not be enforced as expected. Since egress network policy controller and pod are asynchronously polling the same local nameserver, there could be a race condition where pod may get the updated IP before the egress controller. Due to this current limitation, domain name usage in EgressNetworkPolicy is only recommended for domains with infrequent IP address changes."

[1] https://docs.openshift.com/container-platform/3.11/admin_guide/managing_networking.html#admin-guide-limit-pod-access-egress

The aim of this feature is to fix this and also support wildcard entries for EgressNetworkPolicy.
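
For reference, the OVN-Kubernetes counterpart is the EgressFirewall resource; DNS-name rules for the domains above look roughly like this sketch (namespace illustrative, wildcard entries being the new part this feature would add):

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
  namespace: demo
spec:
  egress:
  - type: Deny
    to:
      dnsName: registry-1.docker.io
  - type: Deny
    to:
      dnsName: registry.access.redhat.com
  # Everything else stays allowed.
  - type: Allow
    to:
      cidrSelector: 0.0.0.0/0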

Goal

As an OpenShift installer I want to update the firmware of the hosts I use for OpenShift on day 1 and day 2.

As an OpenShift installer I want to integrate the firmware update in the ZTP workflow.

Description

Firmware updates are required for the BIOS, GPUs, NICs, and DPUs on hosts that will often be used as DUs in Edge locations (commonly installed with ZTP).

Acceptance criteria

  • Firmware can be updated (upgrade/downgrade)
  • Existing firmware version can be checked

Out of Scope

  • Day 2 host firmware upgrade

 

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Description of problem:

After running a firmware update the new version is not displayed in the status of the HostFirmwareComponents    

Version-Release number of selected component (if applicable):

    

How reproducible:

100%    

Steps to Reproduce:

    1. Execute a firmware update, after it succeeds check the Status to find the information about the new version installed.    

Actual results:

    Status only shows the initial information about the firmware components.

Expected results:

    Status should show the newer information about the firmware components.

Additional info:

    

When executing a firmware update for BMH, there is a problem updating the Status of the HostFirmwareComponents CRD, causing the BMH to repeat the update multiple times since it stays in Preparing state.
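
For reference, the flow being described is roughly the following sketch (field names as I understand the Metal3 HostFirmwareComponents API; component, versions, and URL are illustrative): the update is requested through spec.updates, and after it succeeds the status is expected to be refreshed with the newly flashed version, which is the part that currently does not happen.

apiVersion: metal3.io/v1alpha1
kind: HostFirmwareComponents
metadata:
  name: worker-0
  namespace: openshift-machine-api
spec:
  updates:
  - component: bmc
    url: http://firmware.example.com/bmc-1.2.3.bin
status:
  components:
  - component: bmc
    initialVersion: "1.1.0"
    # Expected to show the new version once the update completes; the bug is that it
    # keeps showing the initial information, so the BMH stays in Preparing and retries.
    currentVersion: "1.1.0"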

As a customer of self-managed OpenShift or an SRE managing a fleet of OpenShift clusters, I should be able to determine the progress and state of an OCP upgrade and only be alerted if the cluster is unable to progress. Support a CLI status command and a status API which can be used by a cluster admin to monitor the progress. The status command/API should also contain data to alert users about potential issues which can make updates problematic.

Feature Overview (aka. Goal Summary)  

Here are common update improvements from customer interactions on Update experience

  1. Show nodes where pod draining is taking more time.
    Customers often have to dig deeper to find the nodes for further debugging. 
    The ask has been to bubble up this on the update progress window.
  2. oc update status ?
    From the UI we can see the progress of the update. From oc cli we can see this from "oc get cvo"  
     But the ask is to show more details in a human-readable format.

    Know where the update has stopped. Consider adding at what run level it has stopped.
     
    oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    
    version   4.12.0    True        True          16s     Working towards 4.12.4: 9 of 829 done (1% complete)
    

Documentation Considerations

Update docs for UX and CLI changes

Reference : https://docs.google.com/presentation/d/1cwxN30uno_9O8RayvcIAe8Owlds-5Wjr970sDxn7hPg/edit#slide=id.g2a2b8de8edb_0_22

Epic Goal*

Add a new `oc adm upgrade status` command which is backed by an API. Please find the mock output of the command attached in this card.

Why is this important? (mandatory)

  • From the UI we can see the progress of the update. Using the oc CLI we can see some of the information using "oc get clusterversion", but the output is not readable and there is a lot of extra information to process. 
  • Customers are asking us to show more details in a human-readable format, as well as to provide an API which they can use for automation.

Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Other 

Add node status as mentioned in the sample output to "oc adm upgrade status" in OpenShift Update Concepts

With this output, users will be able to see the state of the nodes that are not part of the master machine config pool and what version of RHCOS they are currently on. I am not sure if it is possible to also show corresponding OCP version. If possible we should display that as well.

=Worker Upgrade=

Worker Pool: openshift-machine-api/machineconfigpool/worker
Assessment: Admin required
Completion: 25% (Est Time Remaining: N/A - Manual action required)
Mode: Manual | Assisted [- No Disruption][- Disruption Permitted][- Scaling Permitted]
Worker Status: 4 Total, 4 Available, 0 Progressing, 3 Outdated, 0 Draining, 0 Excluded, 0 Degraded

Worker Pool Node(s)
NAME					ASSESSMENT	PHASE		VERSION		EST		MESSAGE
ip-10-0-134-165.ec2.internal		Manual		N/A		4.12.1		?		Awaiting manual reboot
ip-10-0-135-128.ec2.internal		Complete	Updated		4.12.16		-
ip-10-0-138-184.ec2.internal		Manual		N/A		4.12.1		?		Awaiting manual reboot
ip-10-0-139-31.ec2.internal		Manual		N/A		4.12.1		?		Awaiting manual reboot

Definition of done:

  • Add code to get similar output as listed above. If a particular field/column needs more work, create separate Jira cards.
  • This output should be shown even when the control plane is already updated but the workers are not yet (as of Jan 2024, the status command will say that the cluster is not upgrading in such a case)
  • Output should take into account that there can be multiple worker pools
  • We should capture fixtures for the following two scenarios:
  • cluster that finished control plane update but is still upgrading worker nodes
  • cluster that finished control plane update but is still upgrading worker nodes and there’s a restrictive PDB that prohibits draining

CVO story: persistence for "how long has this ClusterOperator been updating?". For a first pass, we can just hold this in memory in the CVO, and possibly expose it to our managers via the ResourceReconcilliationIssues structure created in OTA-1159. But to persist between CVO restarts, we could have the CVO annotate the ClusterOperator with "here's when I first precreated this ClusterOperator during update $KEY". Update keys are probably (a hash of?) (startTime, targetImage). And the CVO would set the annotation when it pre-created the cluster operator for the release, unless an annotation already existed with the same update key. And the reconciling-mode CVO could clear off the annotations.

We are not sure yet whether we want to do this with CO annotations or OTA-1159 "result of work", we defer this decision to after we implement OTA-1159.

I ended up doing this via the OTA-1159 result-of-work route, and I just did the "expose" side.  If we want to cover persistence between CVO-container-restarts, we'd need a follow-up ticket for that.  The CVO is only likely to restart when machine-config is moving though, and giving that component a bit more time doesn't seem like a terrible thing, so we might not need a follow-up ticket at all.

 

Add node status as mentioned in the sample output to "oc adm upgrade status".

With this output, users will be able to see the state of the nodes that are part of the master machine config pool and what version of RHCOS they are currently on. I am not sure if it is possible to also show corresponding OCP version. If possible we should display that as well.

Control Plane Node(s)
NAME                                ASSESSMENT        PHASE        VERSION
ip-10-0-128-174.ec2.internal        Complete          Updated      4.12.16
ip-10-0-142-240.ec2.internal        Progressing       Rebooting    -
ip-10-0-137-108.ec2.internal        Outdated          Pending      4.12.1

Definition of done:

  • Add code to produce output similar to the sample above. If a particular field/column needs more work, create separate Jira cards.

Current state

= Control Plane =
...
Operator Status: 33 Total, 31 Available, 1 Progressing, 4 Degraded

Improvement opportunities

1. "33 Total" confuses people into thinking this number is a sum of the others (e.g. OCPBUGS-24643)
2. "Available" is useless most of the time: in happy path it equals total, in error path the "unavailable" would be more useful, we should not require the user to do a mental subtraction when we can just tell them
3. "1 Progressing" does not mean much, I think we can relay similar information by saying "1 upgrading" on the completion line (see OTA-1153)
4. "0 degraded" is not useful (and 0 unavailable would be too), we can hide it on happy path
5. somehow relay that operators can be both unavailable and degraded

Definition of Done

// Happy path, all is well
Operator status: 33 Healthy
// Two operators Available=False
Operator status: 31 Healthy, 2 Unavailable
// Two operators Available=False, one degraded
Operator status: 30 Healthy, 2 Unavailable, 1 Available but degraded
// Two operators Available=False, one degraded, one both => unavailable trumps degraded
Operator status: 29 Healthy, 3 Unavailable, 1 Available but degraded
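
A minimal sketch of the counting rule above (type and helper names are hypothetical, not the status command's actual code); it applies "unavailable trumps degraded" and omits zero counts on the happy path:

package status

import "fmt"

// operatorHealth is a hypothetical summary of one ClusterOperator's conditions.
type operatorHealth struct {
	available bool
	degraded  bool
}

// formatOperatorStatus renders the "Operator status" line described in the
// definition of done: unavailable trumps degraded, degraded-but-available
// operators are called out separately, and zero counts are omitted.
func formatOperatorStatus(ops []operatorHealth) string {
	var healthy, unavailable, degradedOnly int
	for _, op := range ops {
		switch {
		case !op.available:
			unavailable++ // also counts operators that are degraded as well
		case op.degraded:
			degradedOnly++
		default:
			healthy++
		}
	}
	out := fmt.Sprintf("Operator status: %d Healthy", healthy)
	if unavailable > 0 {
		out += fmt.Sprintf(", %d Unavailable", unavailable)
	}
	if degradedOnly > 0 {
		out += fmt.Sprintf(", %d Available but degraded", degradedOnly)
	}
	return out
}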

Open questions

1. How to handle COs that are briefly unavailable / degraded? The OTA-1087 PoC does not show insights about them if they are only briefly down / degraded, to avoid noise, so we can either be inconsistent between counts and health, or between "adm status" and "get co". => extracted to OTA-1175

This is a clone of issue OCPBUGS-33898. The following is the description of the original issue:

The mockup output does not contain the version the cluster is being upgraded to. The existing CVO condition shows it, so we should find a place for it in the output (or explicitly decide we don't want it).

Definition of Done: 

oc adm upgrade status reports that the cluster is updating from version x'.y'.z' (through any partial intermediate versions) to version x.y.z.

For example, with a history like:

  • Completed 4.15.0
  • Partial 4.15.2
  • Partial 4.15.3
  • Completed 4.15.5

we might show something that mentions 4.15.0, 4.15.2, 4.15.3, and 4.15.5. Or we could decide that those mid-update retargets are rare enough not to be worth spending code-time on, and only mention 4.15.0 and 4.15.5?

OpenShift Update Concepts proposes a --details option that should supply the insights with SOP/documentation links, but does not give an example of what the output would look like:

=Update Health= 
SINCE		LEVEL 		IMPACT 			MESSAGE
3h		Warning		API Availability	High control plane CPU during upgrade
Resolved	Info		Update Speed		Pod is slow to terminate
4h		Info		Update Speed		Pod was slow to terminate
30m		Info		None			Worker node version skew
1h		Info		None			Update worker node

Run with --details for additional description and links to online documentation.

Justin's design ideas contain a remediation struct for this which I like.

Definition of Done:

  • With --details, every insight will have two more fields: description and remediation URL (runbook, documentation, ...). Both are mandatory.
  • Output will not be a table, but an oc describe-like tree output
  • For CO insights, we stop putting the (potentially long) CO condition message into the insight message and place it in the new description field instead.
  • We can use the links to the generic alert runbooks that we have been asked to add to alerts (OCPBUGS-14246)

Implementing RFE-928 would help a number of update-team use-cases:

  • OTA-368, OSDOCS-2427, and other tickets that are mulling over rendering update-related alerts when folks are trying to decide whether to launch an update.
  • OTA-1021's update status subcommand could use these to help admins discover and respond to update-related issues in their updating clusters.

The updates team is not well positioned to maintain oc access long-term; that seems like a better fit for the monitoring team (who maintain Alertmanager) or the workloads team (who maintain the bulk of oc). But we can probably hack together a proof-of-concept which we could later hand off to those teams, and in the meantime it would unblock our work on tech-preview commands consuming the firing-alert information.

The proof-of-concept could follow the following process:

  1. Get alertmanager URL from route alertmanager-main in openshift-monitoring namespace
  2. Use the $ALERTMANAGER/api/v1/alerts endpoint to get data about alerts (see https://github.com/prometheus/alertmanager#api)
  3. The endpoint is authenticated via bearer token, same as against apiserver (possible Role for this mentioned in MON-3396 OBSDA-530)

Definition of done:

$ OC_ENABLE_CMD_INSPECT_ALERTS=true oc adm inspect-alerts
...dump of firing alerts...

and a backing Go function that other subcommands like oc adm upgrade status can consume internally.
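
A rough sketch of such a backing function, following the three proof-of-concept steps above (the route host and bearer token are assumed to have been resolved already, and the v1 response envelope is simplified):

package alerts

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// FiringAlert is a trimmed view of one alert returned by Alertmanager.
type FiringAlert struct {
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
	StartsAt    time.Time         `json:"startsAt"`
}

// GetFiringAlerts queries the Alertmanager alerts endpoint using a bearer
// token. alertmanagerHost would come from the alertmanager-main route in the
// openshift-monitoring namespace; token is the caller's bearer token.
func GetFiringAlerts(client *http.Client, alertmanagerHost, token string) ([]FiringAlert, error) {
	req, err := http.NewRequest("GET", fmt.Sprintf("https://%s/api/v1/alerts", alertmanagerHost), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("alertmanager returned %s", resp.Status)
	}

	// The v1 API wraps the alert list in a status/data envelope.
	var payload struct {
		Status string        `json:"status"`
		Data   []FiringAlert `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&payload); err != nil {
		return nil, err
	}
	return payload.Data, nil
}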

oc adm upgrade status currently renders Progressing and Failing!=False directly, instead of feeding them in through updateInsight. OTA-1154 is removing those. But Failing has useful information about the cluster-version operator's direct dependents which isn't available via ClusterOperator, MachineConfigPools, or the other resources we consume. This ticket is about adding logic to assessControlPlaneStatus to convert Failing!=False into an updateInsight, so it can be rendered via the consolidated insights-rendering pathways, and not via the one-off printout.

This is a clone of issue OCPBUGS-33896. The following is the description of the original issue:

Using the alerts-in-CLI PoC (OTA-1080), show relevant firing alerts in the OTA-1087 section. Probably do not show all firing alerts.

I propose showing

  • Alerts that started to fire during the upgrade
  • Allow list of alerts that we know are relevant during upgrades? Insight severity can match alert severity.

Impact can probably be a simple alertname -> impact type classifier. The message can be "Alert name: Alert message" (see the sketch after the example below):

=Update Health= 
SINCE	        LEVEL 		        IMPACT 			MESSAGE
3h		Warning		        API Availability	KubeDaemonSetRolloutStuck: DaemonSet openshift-ingress-canary/ingress-canary has not finished or progressed for at least 30 minutes.
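
A minimal sketch of the alertname-to-impact classifier idea (the map contents are illustrative; the real allowlist and classification are left to the follow-up card):

package status

// impactType mirrors the IMPACT column of the health section.
type impactType string

const (
	impactAPIAvailability impactType = "API Availability"
	impactUpdateSpeed     impactType = "Update Speed"
	impactNone            impactType = "None"
)

// alertImpact is a simple alertname -> impact type classifier.
var alertImpact = map[string]impactType{
	"KubeDaemonSetRolloutStuck": impactAPIAvailability,
	// further entries would come from the (future) allowlist card
}

// classify falls back to "None" for alerts we do not recognize.
func classify(alertName string) impactType {
	if impact, ok := alertImpact[alertName]; ok {
		return impact
	}
	return impactNone
}

// insightMessage follows the "Alert name: Alert message" convention.
func insightMessage(alertName, alertMessage string) string {
	return alertName + ": " + alertMessage
}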

Definition of done

  • Alerts that started firing during the upgrade are shown as an upgrade health insight in the upgrade health section
  • Alerts that started firing before the upgrade but are present on an allowlist (hardcoded for now) of alerts relevant for updates are shown as well
  • Create an allow list for alerts (structure) which will show the alerts in this section.
  • We do not plan to decide which alerts should be in the allow list as part of this card (as this is a future card)

In https://github.com/openshift/oc/pull/1554 we scaffolded the status command with the existing CVO condition message. We should stop printing this message once the standard command output relays this information well enough.

An update is in progress for 59m13s: Unable to apply 4.14.1: the cluster operator control-plane-machine-set is not available

Failing=True:
  Reason: ClusterOperatorNotAvailable
  Message: Cluster operator control-plane-machine-set is not available

The above is the CVO condition message: we should remove it once all the information it presents is surfaced in the actual new status output.

Definition of Done

None of this specially processed CVO / ClusterVersion condition content is emitted; the first section in the output is "= Control Plane =". We should only do this once all of its possible content is already surfaced via an assessment or a health insight.

As a PoC (proof of concept), warn about Available=False and Degraded=True ClusterOperators, smoothing flapping conditions (as discussed on a refinement call on Nov 29).

Follow Justin's direction from the design ideas and make this easily pluggable for more "warnings"

Example Output

=Update Health= 
SINCE	        LEVEL 		        IMPACT 			MESSAGE
3h		Warning		        API Availability	High control plane CPU during upgrade
Resolved	Info			Update Speed		Pod is slow to terminate
4h		Info			Update Speed		Pod was slow to terminate
30m		Info			None			Worker node version skew
1h		Info			None			Update worker node

This is a clone of issue OCPBUGS-33903. The following is the description of the original issue:

Description of problem:

The cluster version is not updating (Progressing=False).

  Reason: <none>
  Message: Cluster version is 4.16.0-0.nightly-2024-05-08-222442 

When the cluster is outside of an update it shows the Failing=True condition content, which is potentially confusing. I think we can just show "The cluster version is not updating".

Expose 'Result of work' as structured JSON in a ClusterVersion condition (ResourceReconciliationIssues? ReconciliationIssues). This would allow oc to explain what the CVO is having trouble with, without needing to reach around the CVO and look at ClusterOperators directly (avoiding the risk of latency/skew between what the CVO thinks it's waiting on and the current ClusterOperator content). And we could replace the current Progressing string with more information, and iterate quickly on the oc side. Although if we find ways we like more to stringify this content, we probably want to push that back up to the CVO so all clients can benefit.

We want to do this work behind a feature gate OTA-1169.

Definition of Done

  • All code added for this effort must be conditional on the UpgradeStatus feature gate (added in https://github.com/openshift/api/pull/1725 so we will need to bump o/api in CVO).
  • CVO will have a new .status.conditions item with type==ReconciliationIssues
  • When CVO encounters an "issue" (more details on issues below) while applying a resource, the condition value is True and message is a JSON with all issues encountered
  • Reason values can be invented as necessary
  • When no issue is encountered, condition is False
  • A reconciliation issue is when the CVO cannot reach the desired state for the given resource in the given sync loop - this will differ between upgrading and reconciling mode. In upgrading mode the CVO needs to report things like waiting on ClusterOperators to come out of Available=False/Degraded=True, waiting for deployments to go ready, etc. In reconciling mode we should report failures to apply.

Note that a condition with Message containing a JSON instead of a human-readable message is against apimachinery conventions and is a VERY poor API method in general. The purpose of this story is simply to experiment with what such an API could look like, and it will inform how we build it in the future. Do not worry too much about tech debt; this is exploratory code.
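
As a rough illustration only (the issue struct below is hypothetical, not a proposed schema; openshift/api config/v1 types are assumed), the experiment could marshal the collected issues into the condition message like this:

package cvo

import (
	"encoding/json"

	configv1 "github.com/openshift/api/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// reconciliationIssue is a hypothetical record of one resource the CVO could
// not bring to its desired state during the current sync loop.
type reconciliationIssue struct {
	Group     string `json:"group,omitempty"`
	Kind      string `json:"kind"`
	Namespace string `json:"namespace,omitempty"`
	Name      string `json:"name"`
	Message   string `json:"message"`
}

// reconciliationIssuesCondition builds the experimental condition: True with
// a JSON message when issues exist, False otherwise.
func reconciliationIssuesCondition(issues []reconciliationIssue) (configv1.ClusterOperatorStatusCondition, error) {
	cond := configv1.ClusterOperatorStatusCondition{
		Type:               "ReconciliationIssues",
		Status:             configv1.ConditionFalse,
		Reason:             "NoIssues",
		LastTransitionTime: metav1.Now(),
	}
	if len(issues) == 0 {
		return cond, nil
	}
	raw, err := json.Marshal(issues)
	if err != nil {
		return cond, err
	}
	cond.Status = configv1.ConditionTrue
	cond.Reason = "ReconciliationIssues"
	cond.Message = string(raw)
	return cond, nil
}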

Feature Overview (aka. Goal Summary)  

Enable support to bring your own encryption key (BYOK) for OpenShift on IBM Cloud VPC.

Goals (aka. expected user outcomes)

As a user I want to be able to provide my own encryption key when deploying OpenShift on IBM Cloud VPC so the cluster infrastructure objects, VM instances and storage objects, can use that user-managed key to encrypt the information.

Requirements (aka. Acceptance Criteria):

The Installer will provide a mechanism to specify a user-managed key that will be used to encrypt the data on the virtual machines that are part of the OpenShift cluster as well as any other persistent storage managed by the platform via Storage Classes.

Background

This feature is a required component for IBM's OpenShift replatforming effort.

Documentation Considerations

The feature will be documented as usual to guide the user while using their own key to encrypt the data on the OpenShift cluster running on IBM Cloud VPC

 

Epic Goal

  • Review and support the IBM engineering team while enabling BYOK support for OpenShift on IBM Cloud VPC

Why is this important?

  • As part of the replatform work IBM is doing for their OpenShift managed service this feature is Key for that work

Scenarios

  1. The installer will allow the user to provide their own key information to be used to encrypt the VMs storage and any storage object managed by OpenShift StorageClass objects

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a (user persona), I want to be able to:

  • Capability 1
  • Capability 2
  • Capability 3

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview

Support deploying an OpenShift cluster across multiple vSphere clusters, i.e. configuring multiple vCenter servers in one OpenShift cluster.

Goals

Multiple vCenter support in the Cloud Provider Interface (CPI) and the Cloud Storage Interface (CSI).

Use Cases

Customers want to deploy OpenShift across multiple vSphere clusters (vCenters) primarily for high availability.

 

Feature Overview

Support deploying an OpenShift cluster across multiple vSphere clusters, i.e. configuring multiple vCenter servers in one OpenShift cluster.

Goals

Multiple vCenter support in the Cloud Provider Interface (CPI) and the Cloud Storage Interface (CSI).

Use Cases

Customers want to deploy OpenShift across multiple vSphere clusters (vCenters) primarily for high availability.

 

Done Done Done Criteria

This section contains all the test cases that we need to make sure work as part of the done^3 criteria.

  • Clean install of new cluster with multi vCenter configuration
  • Clean install of new cluster with single vCenter still working as previously
  • VMs / machines can be scaled across all vCenters / Failure Domains
  • PVs should be able to be created on all vCenters

Out-of-Scope

This section contains all scenarios that are considered out of scope for this enhancement that will be done via a separate epic / feature / story.

  • Migration of single vCenter OCP to a multi vCenter configuration (stretch)

User Story

As an OpenShift administrator, I would like the Cluster Config Operator (CCO) not to block the install of a new cluster due to the vSphere Multi vCenter feature gate being enabled, so that I can begin to install my cluster across multiple vCenters.

Description

The purpose of this story is to perform the changes needed for CCO to allow the configuration of the new feature gate for vSphere Multi vCenter support. This operator takes the infrastructure config generated by the installer, updates it for the cluster, and applies it. The only change to this operator should be updating the version of openshift/api; however, this operator has a lot of legacy code being transitioned to openshift/api that is currently causing issues when updating the openshift/api version. We will update the version and address any other modifications as needed. We will need to work with the API team on what can be removed.

Required:

The CCO during installation will need to allow multiple vCenters to be configured.  Any other failure reported based on issues performing operator tasks is valid and should be addressed via a new story.

ACCEPTANCE CRITERIA

  • multi vcenter enabled: CCO does not prevent bootstrapping when more than one vCenter defined in the infrastructure custom resource.
  • multi vcenter disabled: CCO will error out if vCenter count is greater than 1.

ENGINEERING DETAILS

We will need to do the following:

  • Update version of openshift/api to be >= version where feature gate exists
  • Update all broken aspects of the operator's existing logic dealing with legacy handling of features. This may get done by the API team, but it is currently not updated. The logic is skipped thanks to the --skip-rendering flag that exists in the bootkube.sh logic.

We will need to enhance all logic that has a hard-coded vCenter limit to check whether the vSphere Multi vCenter feature gate is enabled. If it is enabled, the vCenter count may be larger than 1; otherwise it must still fail with the error message that the vCenter count may not be greater than 1.
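
A minimal sketch of the intended check (function, package, and parameter names here are illustrative, not the operator's actual code):

package render

import "fmt"

// validateVCenterCount sketches the described behavior: with the multi
// vCenter feature gate enabled more than one vCenter is allowed (up to some
// maximum), otherwise the legacy single-vCenter limit still applies.
func validateVCenterCount(vcenterCount int, multiVCenterGateEnabled bool, maxVCenters int) error {
	if vcenterCount < 1 {
		return fmt.Errorf("at least one vCenter must be defined")
	}
	if !multiVCenterGateEnabled && vcenterCount > 1 {
		return fmt.Errorf("vCenter count may not be greater than 1")
	}
	if multiVCenterGateEnabled && vcenterCount > maxVCenters {
		return fmt.Errorf("vCenter count may not be greater than %d", maxVCenters)
	}
	return nil
}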

USER STORY:
As an OpenShift user, I need to span my clusters across multiple vCenters so that I can take advantage of multi-vCenter support. This will help me achieve better utilization, availability, and redundancy across vCenter sites.

DESCRIPTION:

Presently, only a single vCenter is allowed to be defined. The maximum number of allowed vCenters will take some thought, but let's start with 3 for now.

see: https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go#L1325-L1334

ACCEPTANCE CRITERIA:

  • Payload testing passes for the following job types:
    • IPI tech-preview and non-tech-preview vSphere jobs
    • UPI tech-preview and non-tech-preview vSphere jobs

ADDITIONAL INFORMATION:

Goal

As an OpenShift on vSphere administrator, I want to specify static IP assignments to my VMs.

As an OpenShift on vSphere administrator, I want to completely avoid using a DHCP server for the VMs of my OpenShift cluster.

Why is this important?

Customers want the convenience of IPI deployments for vSphere without having to use DHCP. As in bare metal, where METAL-1 added this capability, some of the reasons are the security implications of DHCP (customers report that, for example, depending on configuration it allows any device to get into the network). At the same time, IPI deployments only require our OpenShift installation software, while with UPI customers would need automation software that, in secure environments, they would have to certify along with OpenShift.

Acceptance Criteria

  • I can specify static IPs for node VMs at install time with IPI

Previous Work

Bare metal related work:

CoreOS Afterburn:

https://github.com/coreos/afterburn/blob/main/src/providers/vmware/amd64.rs#L28

https://github.com/openshift/installer/blob/master/upi/vsphere/vm/main.tf#L34

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Goal

As an OpenShift on vSphere administrator, I want to specify static IP assignments to my VMs.

As an OpenShift on vSphere administrator, I want to completely avoid using a DHCP server for the VMs of my OpenShift cluster.

Why is this important?

Customers want the convenience of IPI deployments for vSphere without having to use DHCP. As in bare metal, where METAL-1 added this capability, some of the reasons are the security implications of DHCP (customers report that, for example, depending on configuration it allows any device to get into the network). At the same time, IPI deployments only require our OpenShift installation software, while with UPI customers would need automation software that, in secure environments, they would have to certify along with OpenShift.

Acceptance Criteria

  • I can specify static IPs for node VMs at install time with IPI

Previous Work

Bare metal related work:

CoreOS Afterburn:

https://github.com/coreos/afterburn/blob/main/src/providers/vmware/amd64.rs#L28

https://github.com/openshift/installer/blob/master/upi/vsphere/vm/main.tf#L34

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

We need to ensure that when a vSphere static IP IPI install is performed, the masters that are generated are treated as valid machines and do not get recreated by the CPMS operator.

USER STORY:

As a system admin, I would like the static IP support for vSphere to use IPAddressClaims to provide IP address during installation so that after the install, the machines are defined in a way that is intended for use with IPAM controllers.

 

DESCRIPTION:

Currently the installer for vSphere directly sets the static IPs in the machine object YAML files. We would like to enhance the installer to create an IPAddress and IPAddressClaim for each machine, as well as update the machinesets to use addressesFromPools to request the IPAddress. Also, we should create a custom CRD that is the basis for the pool defined in the addressesFromPools field.

 

ACCEPTANCE CRITERIA:

After installing static IP for vSphere IPI, the cluster should contain machines, machinesets, crd, ipaddresses and ipaddressclaims related to static IP assignment.

 

ENGINEERING DETAILS:

These changes should all be contained in the installer project.  We will need to be sure to cover static IP for zonal and non-zonal installs.  Additionally, we need to have this work for all control-plane and compute machines.

Feature Overview

Add authentication to the internal components of the Agent Installer so that the cluster install is secure.

Goals

  • Day1: Only allow agents booted from the same agent ISO to register with the assisted-service and use the agent endpoints
  • Day2: Only allow agents booted from the same node ISO to register with the assisted-service and use the agent endpoints
  • Only allow access to write endpoints to the internal services
  • Use authentication to read endpoints

 

Epic Goal

  • This epic scope was originally to encompass both authentication and authorization but we have split the expanding scope into a separate epic.
  • We want to add authorization to the internal components of Agent Installer so that the cluster install is secure. 

Why is this important?

  • The Agent Installer API server (assisted-service) has several methods for authorization, but none of the existing methods are applicable to the Agent Installer use case. 
  • During the MVP of the Agent Installer we attempted to turn on the existing authorization schemes but found we didn't have access to the correct API calls.
  • Without proper authorization it is possible for an unauthorized node to be added to the cluster during install. Currently we expect this to happen by mistake rather than maliciously.

Brainstorming Notes:

Requirements

  • Allow only agents booted from the same ISO to register with the assisted-service and use the agent endpoints
  • Agents already know the InfraEnv ID, so if read access requires authentication then that is sufficient in some existing auth schemes.
  • Prevent access to write endpoints except by the internal systemd services
  • Use some kind of authentication for read endpoints
  • Ideally use existing credentials - admin-kubeconfig client cert and/or kubeadmin-password
  • (Future) Allow UI access in interactive mode only

 

Are there any requirements specific to the auth token?

  • Ephemeral
  • Limited to one cluster: Reuse the existing admin-kubeconfig client cert

 

Actors:

  • Agent Installer: example wait-for
  • Internal systemd: configurations, create cluster infraenv, etc
  • UI: interactive user
  • User: advanced automation user (not supported yet)

 

Do we need more than one auth scheme?

Agent-admin - agent-read-write

Agent-user - agent-read

Options for Implementation:

  1. New auth scheme in assisted-service
  2. Reverse proxy in front of assisted-service API
  3. Use an existing auth scheme in assisted-service

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Previous Work (Optional):

  1. AGENT-60 Originally we wanted to just turn on local authorization for Agent Installer workflows. It was discovered this was not sufficient for our use case.

Open questions::

  1. Which API endpoints do we need for the interactive flow?
  2. What auth scheme does the Assisted UI use if any?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a user, when running agent create image, agent create pxe-files and agent create config iso commands, I want to be able to:

  • generate an ECDSA private key and save it to the asset store
  • generate the corresponding ECDSA public key and save it to the asset store
  • set the generated public/private keys into the appropriate env vars EC_PUBLIC_KEY_PEM and EC_PRIVATE_KEY_PEM
  • pass the env vars to assisted-service (see the sketch below).

so that I can achieve

  • authentication for the APIs.
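
A minimal sketch of the key generation and wiring described above, using only standard-library calls (asset-store integration is omitted and the function name is hypothetical):

package agentauth

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"encoding/pem"
	"os"
)

// generateKeyPairEnv generates an ECDSA key pair, PEM-encodes both halves,
// and exports them via the EC_PRIVATE_KEY_PEM / EC_PUBLIC_KEY_PEM environment
// variables that are passed to assisted-service.
func generateKeyPairEnv() error {
	priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}

	privDER, err := x509.MarshalECPrivateKey(priv)
	if err != nil {
		return err
	}
	privPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: privDER})

	pubDER, err := x509.MarshalPKIXPublicKey(&priv.PublicKey)
	if err != nil {
		return err
	}
	pubPEM := pem.EncodeToMemory(&pem.Block{Type: "PUBLIC KEY", Bytes: pubDER})

	if err := os.Setenv("EC_PRIVATE_KEY_PEM", string(privPEM)); err != nil {
		return err
	}
	return os.Setenv("EC_PUBLIC_KEY_PEM", string(pubPEM))
}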

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As a user, when running agent create image, agent create pxe-files and agent create config iso commands, I want to be able to:

so that I can achieve

  • authentication for the APIs.

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF).

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator (in csi-operator)
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • openstack-cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • csi-driver-manila-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator
  • ibm-powervs-block-csi-driver-operator
  • secrets-store-csi-driver-operator

 

  • cluster-storage-operator
  • cluster-csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

EOL, do not upgrade:

  • github.com/oVirt/csi-driver-operator

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Goal

Hardware RAID support on Dell, Supermicro and HPE with Metal3.

Why is this important

Setting up RAID devices is a common operation in the hardware for OpenShift nodes. While there has been work at Fujitsu on configuring RAID in Fujitsu servers with Metal3, we don't support any generic interface with Redfish to extend this support and set it up in Metal3.

Dell, Supermicro and HPE, which are the most common hardware platforms we find in our customers' environments, are the main target.

Goal

Hardware RAID support on Dell with Metal3.

Why is this important

Setting up RAID devices is a common operation in the hardware for OpenShift nodes. While there has been work at Fujitsu on configuring RAID in Fujitsu servers with Metal3, we don't support any generic interface with Redfish to extend this support and set it up in Metal3 for Dell, which is the most common hardware platform we find in our customers' environments.

Before implementing generic support, we need to understand the implications of enabling an interface in Metal3 to allow it on multiple hardware types.

Scope questions

While rendering BMO in https://issues.redhat.com/browse/METAL-829 the node cpu_arch was hardcoded to x86_64

 

We should use bmh.Spec.Architecture instead to be more future-proof.

Feature Overview

  • As a Cluster Administrator, I want to opt-out of certain operators at deployment time using any of the supported installation methods (UPI, IPI, Assisted Installer, Agent-based Installer) from UI (e.g. OCP Console, OCM, Assisted Installer), CLI (e.g. oc, rosa), and API.
  • As a Cluster Administrator, I want to opt-in to previously-disabled operators (at deployment time) from UI (e.g. OCP Console, OCM, Assisted Installer), CLI (e.g. oc, rosa), and API.
  • As a ROSA service administrator, I want to exclude/disable Cluster Monitoring when I deploy OpenShift with HyperShift — using any of the supported installation methods including the ROSA wizard in OCM and rosa cli — since I get cluster metrics from the control plane. This configuration should be persisted not only through initial deployment but also through cluster lifecycle operations like upgrades.
  • As a ROSA service administrator, I want to exclude/disable Ingress Operator when I deploy OpenShift with HyperShift — using any of the supported installation methods including the ROSA wizard in OCM and rosa cli — as I want to use my preferred load balancer (i.e. AWS load balancer). This configuration should be persisted not only through initial deployment but also through cluster lifecycle operations like upgrades.

Goals

  • Make it possible for customers and Red Hat teams producing OCP distributions/topologies/experiences to enable/disable some CVO components while still keeping their cluster supported.

Scenarios

  1. This feature must consider the different deployment footprints including self-managed and managed OpenShift, connected vs. disconnected (restricted and air-gapped), supported topologies (standard HA, compact cluster, SNO), etc.
  2. Enabled/disabled configuration must persist throughout cluster lifecycle including upgrades.
  3. If there's any risk/impact of data loss or service unavailability (for Day 2 operations), the system must provide guidance on what the risks are and let the user decide if the risk is worth undertaking.

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement                                               Notes                                                          isMvp?
CI - MUST be running successfully with test automation    This is a requirement for ALL features.                       YES
Release Technical Enablement                              Provide necessary release enablement details and documents.   YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • ...

Out of Scope

Background, and strategic fit

This is part of the overall multiple-release Composable OpenShift effort (OCPPLAN-9638), which is being delivered in multiple phases:

Phase 1 (OpenShift 4.11): OCPPLAN-7589 Provide a way with CVO to allow disabling and enabling of operators

  • CORS-1873 Installer to allow users to select OpenShift components to be included/excluded
  • OTA-555 Provide a way with CVO to allow disabling and enabling of operators
  • OLM-2415 Make the marketplace operator optional
  • SO-11 Make samples operator optional
  • METAL-162 Make cluster baremetal operator optional
  • OCPPLAN-8286 CI Job for disabled optional capabilities

Phase 2 (OpenShift 4.12): OCPPLAN-7589 Provide a way with CVO to allow disabling and enabling of operators

Phase 3 (OpenShift 4.13): OCPBU-117

  • OTA-554 Make oc aware of cluster capabilities
  • PSAP-741 Make Node Tuning Operator (including PAO controllers) optional

Phase 4 (OpenShift 4.14): OCPSTRAT-36 (formerly OCPBU-236)

  • OCPBU-352 Make Ingress Operator optional
  • CCO-186 ccoctl support for credentialing optional capabilities
  • MCO-499 MCD should manage certificates via a separate, non-MC path (formerly IR-230 Make node-ca managed by CVO)
  • CNF-5642 Make cluster autoscaler optional
  • CNF-5643 - Make machine-api operator optional
  • WRKLDS-695 - Make DeploymentConfig API + controller optional
  • CNV-16274 OpenShift Virtualization on the Red Hat Application Cloud (not applicable)

Phase 4 (OpenShift 4.14): OCPSTRAT-36 (formerly OCPBU-236)

  • CCO-186 ccoctl support for credentialing optional capabilities
  • MCO-499 MCD should manage certificates via a separate, non-MC path (formerly IR-230 Make node-ca managed by CVO)
  • CNF-5642 Make cluster autoscaler optional
  • CNF-5643 - Make machine-api operator optional
  • WRKLDS-695 - Make DeploymentConfig API + controller optional
  • CNV-16274 OpenShift Virtualization on the Red Hat Application Cloud (not applicable)
  • CNF-9115 - Leverage Composable OpenShift feature to make control-plane-machine-set optional
  • BUILD-565 - Make Build v1 API + controller optional
  • CNF-5647 Leverage Composable OpenShift feature to make image-registry optional (replaces IR-351 - Make Image Registry Operator optional)

Phase 5 (OpenShift 4.15): OCPSTRAT-421 (formerly OCPBU-519)

  • OCPVE-634 - Leverage Composable OpenShift feature to make olm optional
  • CCO-419 (OCPVE-629) - Leverage Composable OpenShift feature to make cloud-credential  optional

Phase 6 (OpenShift 4.16): OCPSTRAT-731

Phase 7 (OpenShift 4.17): OCPSTRAT-1308

  • MON-3152 (OBSDA-242) Optional built-in monitoring
  • IR-400 - Remove node-ca from CIRO*
  • CNF-9116 Leverage Composable OpenShift feature to make machine-auto-approver optional
  • CCO-493 Make Cloud Credential Operator optional for remaining providers and topologies (non-SNO topologies)

References

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

 

 

 

Feature Overview

Extend the OpenShift on IBM Cloud integration with additional features to bring the capabilities offered for this provider integration in line with the ones available on other cloud platforms.

Goals

Extend the existing features while deploying OpenShift on IBM Cloud.

Background, and strategic fit

This top-level feature is going to be used as a placeholder for the IBM team, who is working on new features for this integration, in an effort to keep their existing internal backlog in sync with the corresponding Features/Epics in Red Hat's Jira.

 

Epic Goal

  • Enable installation of disconnected clusters on IBM Cloud. This epic will track associated work.

Why is this important?

Scenarios

  1. Install a disconnected cluster on IBM Cloud.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

User Story:

A user currently is not able to create a Disconnected cluster, using IPI, on IBM Cloud. 
Currently, support for BYON and Private clusters does exist on IBM Cloud, but support to override IBM Cloud Service endpoints does not exist, which is required for Disconnected support to function (reaching IBM Cloud private endpoints).

Description:

IBM-dependent components of OCP will need to add support to use a set of endpoint override values in order to reach IBM Cloud Services in Disconnected environments.

The Image Registry components will need to ensure that all API calls to IBM Cloud Services are directed to these endpoint values, in order to communicate in environments where the Public or default IBM Cloud Service endpoint is not available.

The endpoint overrides are available via the infrastructure/cluster (.status.platformStatus.ibmcloud.serviceEndpoints) resource, which is how a majority of components consume cluster-specific configurations (Ingress, MAPI, etc.). It will be structured as such (a consumption sketch follows the example):

apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-10-04T22:02:15Z"
  generation: 1
  name: cluster
  resourceVersion: "430"
  uid: b923c3de-81fc-4a0e-9fdb-8c4c337fba08
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: IBMCloud
status:
  apiServerInternalURI: https://api-int.us-east-disconnect-21.ipi-cjschaef-dns.com:6443
  apiServerURL: https://api.us-east-disconnect-21.ipi-cjschaef-dns.com:6443
  controlPlaneTopology: HighlyAvailable
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: us-east-disconnect-21-gtbwd
  infrastructureTopology: HighlyAvailable
  platform: IBMCloud
  platformStatus:
    ibmcloud:
      dnsInstanceCRN: 'crn:v1:bluemix:public:dns-svcs:global:a/fa4fd9fa0695c007d1fdcb69a982868c:f00ac00e-75c2-4774-a5da-44b2183e31f7::'
      location: us-east
      providerType: VPC
      resourceGroupName: us-east-disconnect-21-gtbwd
      serviceEndpoints:
      - name: iam
        url: https://private.us-east.iam.cloud.ibm.com
      - name: vpc
        url: https://us-east.private.iaas.cloud.ibm.com/v1
      - name: resourcecontroller
        url: https://private.us-east.resource-controller.cloud.ibm.com
      - name: resourcemanager
        url: https://private.us-east.resource-controller.cloud.ibm.com
      - name: cis
        url: https://api.private.cis.cloud.ibm.com
      - name: dnsservices
        url: https://api.private.dns-svcs.cloud.ibm.com/v1
      - name: cis
        url: https://s3.direct.us-east.cloud-object-storage.appdomain.cloud
    type: IBMCloud
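
A component consuming these overrides could look them up roughly like this (a sketch; the types below simply mirror the serviceEndpoints entries shown above and the helper name is hypothetical):

package endpoints

// serviceEndpoint mirrors one entry of
// status.platformStatus.ibmcloud.serviceEndpoints shown above.
type serviceEndpoint struct {
	Name string
	URL  string
}

// lookupEndpoint returns the override URL for a named IBM Cloud service, or
// the provided default when no override is configured.
func lookupEndpoint(overrides []serviceEndpoint, service, defaultURL string) string {
	for _, e := range overrides {
		if e.Name == service {
			return e.URL
		}
	}
	return defaultURL
}

// For the resource shown above, lookupEndpoint(overrides, "iam", "https://iam.cloud.ibm.com")
// would return https://private.us-east.iam.cloud.ibm.com.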

The CCM currently relies on updates to the openshift-cloud-controller-manager/cloud-conf configmap in order to override its required IBM Cloud Service endpoints, such as:

data:
  config: |+
    [global]
    version = 1.1.0
    [kubernetes]
    config-file = ""
    [provider]
    accountID = ...
    clusterID = temp-disconnect-7m6rw
    cluster-default-provider = g2
    region = eu-de
    g2Credentials = /etc/vpc/ibmcloud_api_key
    g2ResourceGroupName = temp-disconnect-7m6rw
    g2VpcName = temp-disconnect-7m6rw-vpc
    g2workerServiceAccountID = ...
    g2VpcSubnetNames = temp-disconnect-7m6rw-subnet-compute-eu-de-1,temp-disconnect-7m6rw-subnet-compute-eu-de-2,temp-disconnect-7m6rw-subnet-compute-eu-de-3,temp-disconnect-7m6rw-subnet-control-plane-eu-de-1,temp-disconnect-7m6rw-subnet-control-plane-eu-de-2,temp-disconnect-7m6rw-subnet-control-plane-eu-de-3
    iamEndpointOverride = https://private.iam.cloud.ibm.com
    g2EndpointOverride = https://eu-de.private.iaas.cloud.ibm.com
    rmEndpointOverride = https://private.resource-controller.cloud.ibm.com

Acceptance Criteria:

The Installer validates and injects user-provided endpoint overrides into the cluster deployment process, and the Image Registry components use the specified endpoints and start up properly.

Note: phase 2 target is tech preview.

Feature Overview

In the initial delivery of CoreOS Layering, it is required that administrators provide their own build environment to customize RHCOS images. That could be a traditional RHEL environment or potentially an enterprising administrator with some knowledge of OCP Builds could set theirs up on-cluster.

The primary virtue of an on-cluster build path is to continue using the cluster to manage the cluster. No external dependency, batteries-included.

On-cluster, automated RHCOS Layering builds are important for multiple reasons:

  • One-click/one-command upgrades of OCP are very popular. Many customers may want to make one or just a few customizations but also want to keep that simplified upgrade experience. 
  • Customers who only need to customize RHCOS temporarily (hotfix, driver test package, etc) will find off-cluster builds to be too much friction for one driver.
  • One of OCP's virtues is that the platform and OS are developed, tested, and versioned together. Off-cluster building breaks that connection and leaves it up to the user to keep the OS up-to-date with the platform containers. We must make it easy for customers to add what they need and keep the OS image matched to the platform containers.

Goals & Requirements

  • The goal of this feature is primarily to bring the 4.14 progress (OCPSTRAT-35) to a Tech Preview or GA level of support.
  • Customers should be able to specify a Containerfile with their customizations and "forget it" as long as the automated builds succeed. If they fail, the admin should be alerted and pointed to the logs from the failed build.
    • The admin should then be able to correct the build and resume the upgrade.
  • Intersect with the Custom Boot Images such that a required custom software component can be present on every boot of every node throughout the installation process including the bootstrap node sequence (example: out-of-box storage driver needed for root disk).
  • Users can return a pool to an unmodified image easily.
  • RHEL entitlements should be wired in or at least simple to set up (once).
  • Parity with current features – including the current drain/reboot suppression list, CoreOS Extensions, and config drift monitoring.

Feature Overview

Allow attaching an ISO image that will be used for configuration data on an already provisioned system using a BareMetalHost.

Allow attaching an ISO image that will be used for data on an already provisioned system using a BMH.

Currently this can be achieved using the existing BMH.Spec.Image fields, but this attempts to change the boot order of the system and relies on the host falling back to the installed system when booting the image fails.

Scope questions:

  • Changes in Ironic?
    • Can we introduce an actual Ironic API for this (instead of vendor passthrough)?
      • Dmitry probed the community, the initial reaction was positive.
    • The functionality is probably mostly there but needs to be exposed
    • We'll likely also need to add support for this into Sushy
    • We expect the machine will already be provisioned when the ISO is attached
  • Changes in Metal3?
    • Yes, we'll need new functionality in the BMH
  • Changes in OpenShift?
    • no install-config or installer change
  • Spec/Design/Enhancements?
    • Ironic: probably not, but an RFE for sure
    • Metal3: probably yes, an API addition
    • OpenShift: no
  • Dependencies on other teams?
    • Telco field folks to help us bring hardware we can validate the approach on?
      • We can test on our hardware after a normal deployment

Feature Overview

Create a GCP cloud-specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags (or labels in GCP) on any OpenShift cloud resource that we create and manage. The behaviour should also tag existing resources that do not have the tags yet, and once the tags in the infrastructure CRD are changed, all the resources should be updated accordingly.

Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.

Due to the ongoing in-tree/out-of-tree split of the cloud and CSI providers, this should not apply to clusters with in-tree providers (!= "external").

Once confident we have all components updated, we should introduce an end-to-end test that makes sure we never create resources that are untagged.

 
Goals

  • Functionality on GCP GA
  • inclusion in the cluster backups
  • flexibility of changing tags during cluster lifetime, without recreating the whole cluster

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement                                               Notes                                                          isMvp?
CI - MUST be running successfully with test automation    This is a requirement for ALL features.                       YES
Release Technical Enablement                              Provide necessary release enablement details and documents.   YES

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

This is a continuation of the CORS-2455 / CFE-719 work, where support for GCP tags & labels was delivered as TechPreview in 4.14, and makes it GA in 4.15. It would involve removing any reference to TechPreview in code and docs and incorporating any feedback received from users.

The Installer creates the below list of GCP resources during the create-cluster phase, and these resources should have the user-defined tags applied.

Resources List

Resource          Terraform API
VM Instance       google_compute_instance
Storage Bucket    google_storage_bucket

Acceptance Criteria:

  • Code linting, validation and best practices adhered to
  • The list of GCP resources created by the installer should have the user-defined labels as well as the default OCP label.

The enhancement proposed for GCP tags support in OCP requires machine-api-provider-gcp to add the user-defined tags available in the status sub-resource of the infrastructure CR to the GCP virtual machine resources it creates (see the sketch below).

Acceptance Criteria

  • Code linting, validation and best practices adhered to
  • UTs and e2e are added/updated
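
A minimal sketch of merging user-defined labels with the default OCP cluster label before creating a VM (the type, the label key convention, and the merge rule here are illustrative, not the provider's actual code):

package gcplabels

// userLabel mirrors one user-defined label/tag entry from the Infrastructure status.
type userLabel struct {
	Key   string
	Value string
}

// instanceLabels merges user-defined labels with a default OCP cluster label;
// user labels never override the default cluster label key.
func instanceLabels(infraID string, userLabels []userLabel) map[string]string {
	defaultKey := "kubernetes-io-cluster-" + infraID // illustrative default OCP label key
	labels := map[string]string{defaultKey: "owned"}
	for _, l := range userLabels {
		if l.Key == defaultKey {
			continue
		}
		labels[l.Key] = l.Value
	}
	return labels
}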

This Feature covers the effort, in person-weeks of meetings in #wg-managed-ocp-versions, where OTA helped SD refine how their OCM work would help, and what that OCM work might look like: https://issues.redhat.com/browse/OTA-996?focusedId=25608383&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-25608383.

 

Background

Currently the ROSA/ARO versions are not managed by the OTA team.
This Feature covers the engineering effort to take over the responsibility for OCP version management in OSD, ROSA, and ARO from SRE-P to OTA.

Here is the design document for the effort: https://docs.google.com/document/d/1hgMiDYN9W60BEIzYCSiu09uV4CrD_cCCZ8As2m7Br1s/edit?skip_itp2_check=true&pli=1

Here are some objectives :

  • Managed clusters would get update recommendations from the Red Hat hosted OSUS directly, without intermediate layers like ClusterImageSet.
  • The new design should reduce the cost of maintaining versions for managed clusters including Hypershift hosted control planes.

Presentation from Jeremy Eder :

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

This epic is to transfer the responsibility of OCP version management in OSD, ROSA and ARO from SRE-P to OTA.

  • The responsibility of management of OCP versions available to all self-managed OCP customers lies with the OTA team.
    • This responsibility makes the OTA team the center of excellence around version health.
  • The responsibility of management of OCP versions available to managed customers lies with the SRE-P team.  Why do this project:
    • The SREP team took on version management responsibility out of necessity.  Code was written and maintained to serve a service-tailored "version list".  This code requires effort to maintain.
    • As we begin to sell more managed OCP, it makes sense to move this responsibility back to the OTA team as this is not an "SRE" focused activity.
    • As the CoE for version health, the OTA team has the most comprehensive overview of code health.
    • The OTA team would benefit by coming up to speed on managed OCP-specific lifecycles/policies as well as become aware of the "why" for various policies where they differ from self-managed OCP.

 

Why is this important? (mandatory)

What are the benefits to the customer or Red Hat?   Does it improve security, performance, supportability, etc?  Why is work a priority?

Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Other 

To deliver RFE-5160  without passing data through the customer-accessible ClusterVersion resource (which lives in the hosted Kubernetes API), the cluster-version operator should grow a new command-line switch that allows its caller to override ClusterVersion with a custom upstream.

Definition of done:

  • The CVO can be executed with --upstream-update-service https://example.com. When set, the CVO will ignore spec.upstream in ClusterVersion and instead use the value passed in via the command-line option.
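
A minimal sketch of the intended precedence (the flag name is taken from the definition of done; the surrounding CVO wiring is omitted and the rest is illustrative):

package main

import (
	"flag"
	"fmt"
)

func main() {
	// Flag name from the definition of done above.
	upstreamOverride := flag.String("upstream-update-service", "",
		"override ClusterVersion spec.upstream with this update service URL")
	flag.Parse()

	// specUpstream stands in for whatever ClusterVersion spec.upstream holds.
	specUpstream := "https://example.com/graph"

	effective := specUpstream
	if *upstreamOverride != "" {
		// The command-line override wins: spec.upstream is ignored entirely.
		effective = *upstreamOverride
	}
	fmt.Println("using update service:", effective)
}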

Add spec.updateService on HostedCluster (or a similar knob) to deliver RFE-5160, attaching the new parameter to OTA-1210's command-line flag.

Definition of done:

Management cluster admins can set spec.updateService on HostedCluster and have their status.version.availableUpdates and similar populated based on advice from that upstream. At no point in the implementation is the update service data pulled from anywhere accessible from the hosted cluster.

Epic Goal

  • Enable users to install a host as day2 using agent based installer

Why is this important?

  • Enable easy day2 installation without requiring additional knowledge from the user
  • Unified experience for day1 and day2 installation for the agent based installer
  • Unified experience for day1 and day2 installation for appliance workflow
  • Eliminate the requirement of installing MCE, which has high resource requirements (4 cores and 16GB RAM for a multi-node cluster, and if the infrastructure operator is included then it requires storage as well)
  • Eliminate the requirement of nodes having a BMC available to expand bare metal clusters (see docs).
  • Simplify adding compute nodes based on the UPI method or other methods implemented in the field, such as WKLD-433 or other automations that try to solve this problem

Scenarios

  1. A user installed a day1 cluster with the agent-based installer and wants to add workers or replace failed nodes; currently the alternative is to install MCE or, if connected, use SaaS.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

  1. N/A

Previous Work (Optional):

ABI uses the assisted installer + kube-api (or any other client that communicates with the service); all the building blocks related to day2 installation exist in those components.

The assisted installer can create an "installed" cluster and use it to perform day2 operations.

A doc that explains how it's done with kube-api 

Parameters that are required from the user:

  • Cluster name, Domain name - used to get the URL that will be used to pull the ignition
  • Kubeconfig to day1 cluster 
  • Openshift version
  • Network configuration

Actions required from the user

  • Post reboot the user will need to manually approve the node on the day1 cluster

Implementation suggestion:

To keep a similar flow between day1 and day2, I suggest running the service on each node that the user is trying to add; it will create the cluster definition and start the installation, and after the first reboot it will pull the ignition from the day1 cluster.

Open questions::

  1. How should ABI detect an existing day1 cluster? Yes
  2.  Using a provided kubeconfig file? Yes 
  3. If so, should it be added to the config-image API (i.e. included in the ISO) - question to the agent team
  4. Should we add a new command for day2 in agent-installer-client? E.g. question to the agent team 
  5. So this command would import the existing day1 cluster and add the day2 node. It means that every day2 node should run an assisted-service first Yes, those instances are not depending on each other

It will need to create the container, set it up with an appropriate kubeconfig, extract and appropriately direct the output (ISO+any errors or streamed status), and delete the container again when complete. Initially could be a script, but potentially could be implemented directly in the code. To be distributed within the installer image

It will need to create the container, set it up with an appropriate kubeconfig, extract and appropriately direct the output (ISO+any errors or streamed status), and delete the container again when complete. Initially could be a script, but potentially could be implemented directly in the code. To be distributed within the installer image

Deploy a command to generate a suitable ISO for adding a node to an existing cluster

Not all of the required info is provided by the user (in reality, we want to minimize as much as possible the amount of configuration the user must provide). Some of the required info needs to be extracted from the existing cluster, or from the existing kubeconfig. A dedicated asset could be useful for this operation.

The ignition asset currently assembles the ignition file with the required files and services to install a cluster. In the add-node case, this needs to be modified to support the new workflow.

A new workflow will be required to talk to assisted-service to import an existing cluster / add the node. New services could be required in the ignition assets to handle that properly.

Usually the manifest assets (ClusterImageSet / AgentPullSecret / InfraEnv / NMStateConfig / AgentClusterInstall / ClusterDeployment) depend on OptionalInstallConfig (or possibly a file in the asset dir, in the case of ZTP manifests). We'll need to change the asset code so that the required info can be retrieved from the ClusterInfo asset instead of OptionalInstallConfig. This may impact the asset framework itself.

Another approach could be to stick this info directly into OptionalInstallConfig, if possible

The two commands, one for adding the nodes (ISO generation) and the other to monitor the process, should be exposed by a new CLI tool (name to be defined) built from the installer source. This task will add the main entry point of the CLI tool and the two (empty) command entry points.

Template:

Networking Definition of Planned

Epic Template descriptions and documentation

Epic Goal

With ovn-ic we have multiple actors (zones) setting status on some CRs. We need to make sure individual zone statuses are reported and then optionally merged to a single status

Why is this important?

Without that change, zones will overwrite each other's statuses.

Planning Done Checklist

The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

  • Priority is set by engineering
  • Epic must be linked to a Parent Feature
  • Target version must be set
  • Assignee must be set
  • Enhancement Proposal is Implementable
  • No outstanding questions about major work breakdown
  • Are all Stakeholders known? Have they all been notified about this item?
  • Does this epic affect SD? Have they been notified? (View plan definition for current suggested assignee)
    1. Please use the “Discussion Needed: Service Delivery Architecture Overview” checkbox to facilitate the conversation with SD Architects. The SD architecture team monitors this checkbox which should then spur the conversation between SD and epic stakeholders. Once the conversation has occurred, uncheck the “Discussion Needed: Service Delivery Architecture Overview” checkbox and record the outcome of the discussion in the epic description here.
    2. The guidance here is that unless it is very clear that your epic doesn’t have any managed services impact, default to use the Discussion Needed checkbox to facilitate that conversation.

Additional information on each of the above items can be found here: Networking Definition of Planned

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement
    details and documents.

...

Dependencies (internal and external)

1.

...

Previous Work (Optional):

1. …

Open questions::

1. …

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Background

The MCO does not reap any old rendered machineconfigs. Each time a user or controller applies a new machineconfig, there are new rendered configs for each affected machineconfigpool. Over time, this leads to a very large number of rendered configs which are a UX annoyance but also could possibly contribute to space and performance issues with etcd. 

Goals (aka. expected user outcomes)

Administrators should have a simple way to set a maximum number of rendered configs to maintain, but there should also be a minimum set as there are many cases where support or engineering needs to be able to look back at previous configs.


This story involves implementing the main "deletion" command for rendered machineconfigs under the  oc adm prune subcommand. It will only support deleting rendered MCs that are not in use. 

It would support the following options to start:

  • Pool name, which indicates the owner of the rendered MCs to target. This argument is not required. If not set, all pools will be evaluated.
  • Count, which describes a max number of rendered MCs to delete, oldest first. This argument is not required.
  • Confirm flag, which will cause the prune to take place. If not set, the command will be run in dry run mode. This argument is not required.

Important: If the admin specifies options that select rendered MCs that are in use by an MCP, those MCs should not be deleted. In such cases, the output should indicate why the rendered MC was skipped for deletion.

Sample user workflow:

$ oc adm prune renderedmachineconfigs --pool-name=worker
# dry run: lists all unused rendered MCs for the worker pool that would be deleted

$ oc adm prune renderedmachineconfigs --count=10 --pool-name=worker
# dry run: lists the 10 oldest unused rendered MCs for the worker pool that would be deleted

$ oc adm prune renderedmachineconfigs --count=10 --pool-name=worker --confirm
# actually deletes the rendered configs selected by the options above

 

< High-Level description of the feature ie: Executive Summary >

Goals

Cluster administrators need an in-product experience to discover and install new Red Hat offerings that can add high value to developer workflows.

Requirements

Requirement (Is MVP?)

  • Discover new offerings in Home Dashboard (MVP: Y)
  • Access details outlining value of offerings (MVP: Y)
  • Access step-by-step guide to install offering (MVP: N)
  • Allow developers to easily find and use newly installed offerings (MVP: Y)
  • Support air-gapped clusters (MVP: Y)

(Optional) Use Cases

< What are we making, for who, and why/what problem are we solving?>

Out of scope

Discovering solutions that are not available for installation on cluster

Dependencies

No known dependencies

Background, and strategic fit

 

Assumptions

None

 

Customer Considerations

 

Documentation Considerations

Quick Starts 

What does success look like?

 

QE Contact

 

Impact

 

Related Architecture/Technical Documents

 

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment

Problem:

Cluster admins need to be guided to install RHDH on the cluster.

Goal:

Enable admins to discover RHDH, be guided to installing it on the cluster, and verifying its configuration.

Why is it important?

RHDH is a key multi-cluster offering for developers. This will enable customers to self-discover and install RHDH.

Acceptance criteria:

  1. Show RHDH card in Admin->Dashboard view
  2. Enable link to RHDH documentation from the card
  3. Quick start to install RHDH operator
  4. Guided flow to installation and configuration of operator from Quick Start
  5. RHDH UI link in top menu
  6. Successful log in to RHDH

Dependencies (External/Internal):

RHDH operator

Design Artifacts:

Exploration:

Note:

Description of problem:
The OpenShift Console quick start that promotes RHDH was written in generic terms and doesn't include information on how to use the CRD-based installation.

We removed this specific information because the operator wasn't ready at that time. As soon as the RHDH operator is available in the OperatorHub we should update the quick start with more detailed information, including a simple CR example and some info on how to customize the base URL or colors.
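For illustration only, a minimal sketch of such a CR might look like the following (the API group/version and kind are assumptions about the RHDH operator, not taken from this document):

apiVersion: rhdh.redhat.com/v1alpha1   # assumed API group/version
kind: Backstage                        # assumed kind exposed by the RHDH operator
metadata:
  name: developer-hub
spec: {}                               # defaults; base URL, colors, etc. would be customized via app-config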

Version-Release number of selected component (if applicable):
4.15

How reproducible:
Always

Steps to Reproduce:
Just navigate to Quick starts and select the "Install Red Hat Developer Hub (RHDH) with an Operator" quick starts

Actual results:
The RHDH Operator Quick start exists but is written in a generic way.

Expected results:
The RHDH Operator Quick start should contain some more specific information.

Additional info:
Initial PR: https://github.com/openshift/console-operator/pull/806

Description of problem:
The OpenShift Console QuickStarts promotes RHDH but also includes Janus IDP information.

The Janus IDP quick starts should be removed and all information about Janus IDP should be removed.

Version-Release number of selected component (if applicable):
4.15

How reproducible:
Always

Steps to Reproduce:
Just navigate to Quick starts and select the "Install Red Hat Developer Hub (RHDH) with an Operator" quick starts

Actual results:

  1. The RHDH Operator Quick start contains some information and links to Janus IDP.
  2. The Janus IDP Quick start exists and is similar to the RHDH one.

Expected results:

  1. The RHDH Operator Quick start must not contain information about Janus IDP.
  2. The Janus IDP Quick start should be removed

Additional info:
Initial PR: https://github.com/openshift/console-operator/pull/806

DoD

We need to ensure we have parity with OCP and support heterogeneous clusters

https://github.com/openshift/enhancements/pull/1014

Goal

Why is this important?

  • Necessary to enable workloads with different architectures in the same Hosted Clusters.
  • Cost savings brought by more cost effective ARM instances

Scenarios

  1. I have an x86 hosted cluster and I want to have at least one NodePool running ARM workloads
  2. I have an ARM hosted cluster and I want to have at least one NodePool running x86 workloads

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides

Dependencies (internal and external)

  1. The management cluster must use a multi architecture payload image.
  2. The target architecture is in the OCP payload
  3. MCE has builds for the architecture used by the worker nodes of the management cluster

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Goal

  • As a user of the HyperShift, I would like the CLI or HyperShift API to fail early when:
    • the release image is not multi-arch AND
    • the management cluster's CPU architecture is not the same as the NodePool's CPU architecture
      so that we can prevent a HostedCluster or a NodePool from being created that will have errors due to mismatches between the release image, the management cluster's CPU architecture, and the NodePool's CPU architecture.

Why is this important?

  • This improves the UX of using multi-arch HyperShift by preventing a user from utilizing the wrong release image when creating a HostedCluster or NodePool.

Scenarios

  1. Scenarios That Should Succeed
    1. Using a mgmt cluster of CPU arch 'A', create a HostedCluster with the CLI with a multi-arch release image
    2. Using a mgmt cluster of CPU arch 'A', create a HostedCluster with a yaml spec with a multi-arch release image
    3. Using a mgmt cluster of CPU arch 'A', create a NodePool with the CLI with a multi-arch release image
    4. Using a mgmt cluster of CPU arch 'A', create a NodePool with a yaml spec with a multi-arch release image
  2. Scenarios That Should Fail
    1. Using a mgmt cluster of CPU arch 'A', create a HostedCluster with the CLI with a single arch release image from CPU arch 'B'
    2. Using a mgmt cluster of CPU arch 'A', create a HostedCluster with a yaml spec with a single arch release image from CPU arch 'B'
    3. Using a mgmt cluster of CPU arch 'A', create a NodePool with the CLI with a single arch release image from CPU arch 'B'
    4. Using a mgmt cluster of CPU arch 'A', create a NodePool with a yaml spec with a single arch release image from CPU arch 'B'

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a user of the HyperShift CLI, I would like the CLI to fail early if these conditions are all true:

  • the release image is not multi-arch
  • the management cluster's CPU architecture is not the same as the NodePool's CPU architecture

so that we can prevent a HostedCluster from being created that will have errors due to mismatches between the release image, the management cluster's CPU architecture, and the NodePool's CPU architecture.

Acceptance Criteria:

  • The HyperShift CLI fails to create a cluster when the release image is not multi-arch and the management cluster's CPU architecture does not match the NodePool's CPU architecture.
  • There is documentation providing information the CLI will fail when it meets the conditions above.

(optional) Out of Scope:

This should be done for the API as well but will be covered thru HOSTEDCP-1105.

Engineering Details:

  • HyperShift CLI currently defaults to a multi-arch image in version.go.

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As a user of the HCP CLI, I want to be able to specify a NodePools arch from a flag so that I can easily create NodePools of different CPU architectures in AWS HostedClusters.

Acceptance Criteria:

  • The HCP CLI contains an `arch` flag for the create cluster command for AWS.
  • The HCP CLI contains an `arch` flag for the create nodepool command for AWS.
  • The HCP CLI contains a `multi-arch` flag for the create cluster command for AWS.
  • The only valid values for the arch flag are (see the usage sketch after this list):
    • amd64
    • arm64
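A hedged usage sketch; only the `--arch` and `--multi-arch` flags come from this story, while the other flags and values are illustrative assumptions:

$ hcp create cluster aws --name my-cluster --arch arm64 --multi-arch
# plus the usual AWS credential/infrastructure flags, omitted here

$ hcp create nodepool aws --cluster-name my-cluster --name arm-pool --arch arm64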

Out of Scope:

Other CPU arches are not being considered for the arch flag since they are unavailable in AWS.

Engineering Details:

  • N/A

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As a user of the HyperShift, I would like the API to fail early if these conditions are all true:

  • the release image is not multi-arch
  • the management cluster's CPU architecture is not the same as the NodePool's CPU architecture

so that we can prevent a HostedCluster from continuing to be created that will have errors due to mismatches between the release image, the management cluster's CPU architecture, and the NodePool's CPU architecture.

Acceptance Criteria:

  • The HyperShift API fails to create a cluster when the release image is not multi-arch and the management cluster's CPU architecture does not match the NodePool's CPU architecture.
  • There is documentation providing information the API will fail when it meets the conditions above.

Out of Scope:

This should be done for the CLI as well but will be covered thru HOSTEDCP-1104.

Engineering Details:

  • HyperShift CLI currently defaults to a multi-arch image in version.go.

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

Using a multi-arch NodePool requires the HostedCluster to be multi-arch as well. This is a recipe for letting users shoot themselves in the foot. We need to automate the required input via the CLI for multi-arch NodePools to work, e.g. a multi-arch flag on HostedCluster creation which sets the right release image.

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

Using a multi-arch NodePool requires the HostedCluster to be multi-arch as well. This is a recipe for letting users shoot themselves in the foot. We need to automate the required input via the CLI for multi-arch NodePools to work, e.g. a multi-arch flag on HostedCluster creation which sets the right release image.

Acceptance Criteria:

  • Upstream documentation
  • HyperShift CLI accomplishes design specified in the document in the Engineering Details section
  • HCP CLI accomplishes design specified in the document in the Engineering Details section

Out of Scope:

Engineering Details:

User Story:

As a user of multi-arch HyperShift, I would like a CEL validation to be added to the NodePool types to prevent the arch field from being changed from `amd64` when the platform is not supported (AWS is currently the only supported platform).

Acceptance Criteria:

Description of criteria:

  • CEL added to the arch field for NodePools which will error if a user tries to change the arch field from `amd64` on any platform other than AWS (see the sketch after this list).
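A minimal sketch of what such a rule could look like as an x-kubernetes-validations entry on the NodePool CRD schema (the `arch` and `platform.type` field names are assumed for illustration; this is not the actual HyperShift CRD):

x-kubernetes-validations:
- rule: "self.platform.type == 'AWS' || self.arch == 'amd64'"
  message: "arch may only be set to a value other than amd64 on AWS"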

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview (aka. Goal Summary)

Migrate every occurrence of iptables in OpenShift to use nftables, instead.

Goals (aka. expected user outcomes)

Implement a full migration from iptables to nftables within a series of "normal" upgrades of OpenShift with the goal of not causing any more network disruption than would normally be required for an OpenShift upgrade. (Different components may migrate from iptables to nftables in different releases; no coordination is needed between unrelated components.)
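As a rough illustration of what porting a single rule involves (the port number is a placeholder, and the nft command assumes an existing "inet filter" table with an "input" chain):

$ iptables -A INPUT -p tcp --dport 9100 -j ACCEPT
$ nft add rule inet filter input tcp dport 9100 accept
# the second command is the nftables equivalent of the iptables rule above it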

Requirements (aka. Acceptance Criteria):

  • Discover what components are using iptables (directly or indirectly, e.g. via ipfailover) and reduce the “unknown unknowns”.
  • Port components away from iptables.

Use Cases (Optional):

Questions to Answer (Optional):

  • Do we need a better “warning: you are using iptables” warning for customers? (eg, per-container rather than per-node, which always fires because OCP itself is using iptables). This could help provide improved visibility of the issue to other components that aren't sure if they need to take action and migrate to nftables, as well.

Out of Scope

  • Non-OVN primary CNI plug-in solutions

Background

Customer Considerations

  • What happens to clusters that don't migrate all iptables use to nftables?
    • In RHEL 9.x it will generate a single log message during node startup on every OpenShift node. There are Insights rules that will trigger on all OpenShift nodes.
    • In RHEL 10 iptables will just no longer work at all. Neither the command-line tools nor the kernel modules will be present.

Documentation Considerations

Interoperability Considerations

Template:

 

Networking Definition of Planned

Epic Template descriptions and documentation 

 

Epic Goal

  • OCP needs to detect when customer workloads are making use of iptables, and present this information to the customer (e.g. via alerts, metrics, insights, etc)
  • The RHEL 9 kernel logs a warning if iptables is used at any point anywhere in the system, but this is not helpful because OCP itself still uses iptables, so the warning is always logged.
  • We need to avoid false positives due to OCP's own use of iptables in pod namespaces (e.g. the rules to block access to the MCS). Porting those rules to nftables sooner rather than later is one solution.

Why is this important?

  • iptables will not exist in RHEL 10, so if customers are depending on it, they need to be warned.
  • Contrariwise, we are getting questions from customers who are not using iptables in their own workload containers, who are confused about the kernel warning. Clearer messaging should help reduce confusion here.

Planning Done Checklist

The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

  • Priority is set by engineering
  • Epic must be linked to a Parent Feature
  • Target version must be set
  • Assignee must be set
  • Enhancement Proposal is Implementable
  • No outstanding questions about major work breakdown
  • Are all Stakeholders known? Have they all been notified about this item?
  • Does this epic affect SD? Have they been notified? (View plan definition for current suggested assignee)

Additional information on each of the above items can be found here: Networking Definition of Planned

 

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement
    details and documents.

...

Dependencies (internal and external)

1.

...

 

Feature Overview (aka. Goal Summary)

Consolidated Enhancement of HyperShift/KubeVirt Provider Post GA

This feature aims to provide a comprehensive enhancement to the HyperShift/KubeVirt provider integration post its GA release.

By consolidating CSI plugin improvements, core improvements, and networking enhancements, we aim to offer a more robust, efficient, and user-friendly experience.

Goals (aka. expected user outcomes)

  • User Persona: Cluster service providers / SRE
  • Functionality:
    • Expanded CSI capabilities.
    • Improved core functionalities of the KubeVirt Provider
    • Enhanced networking capabilities.

Goal

Post GA quality of life improvements for the HyperShift/KubeVirt core

User Stories

Non-Requirements

Notes

  • Any additional details or decisions made/needed

Done Checklist

Who What Reference
DEV Upstream roadmap issue (or individual upstream PRs) <link to GitHub Issue>
DEV Upstream documentation merged <link to meaningful PR>
DEV gap doc updated <name sheet and cell>
DEV Upgrade consideration <link to upgrade-related test or design doc>
DEV CEE/PX summary presentation label epic with cee-training and add a <link to your support-facing preso>
QE Test plans in Polarion <link or reference to Polarion>
QE Automated tests merged <link or reference to automated tests>
DOC Downstream documentation merged <link to meaningful PR>

Currently there is no option to influence the placement of the VMs of a hosted cluster with the kubevirt provider. The existing NodeSelector in HostedCluster influences only the pods in the hosted control plane namespace.

The goal is to introduce a new field in the .spec.platform.kubevirt stanza in NodePool for a node selector, propagate it to the VirtualMachineSpecTemplate, and expose it in the hypershift and hcp CLIs.
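A minimal sketch of how the proposed field might look on a NodePool (the `nodeSelector` field name is the proposal above, not an existing API; the label and other values are placeholders, and remaining required fields are elided):

apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: example-kubevirt-pool
  namespace: clusters
spec:
  clusterName: example
  replicas: 2
  platform:
    type: KubeVirt
    kubevirt:
      nodeSelector:                      # proposed field
        topology.kubernetes.io/zone: zone-a
  # release, management, etc. elided for brevity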

Technology Preview of the oc mirror enclaves support (Dev Preview was OCPSTRAT-765 in OpenShift 4.15).

Feature description

oc-mirror already focuses on mirroring content to disconnected environments for installing and upgrading OCP clusters.

This specific feature addresses use cases where mirroring is needed for several enclaves (disconnected environments), that are secured behind at least one intermediate disconnected network.

In this context, enclave users are interested in:

  • being able to mirror content for several enclaves and centralizing it in a single internal registry. Some customers are interested in running security checks on the mirrored content, vetting it before allowing mirroring to downstream enclaves.
  • being able to mirror content directly from the internal centralized registry to enclaves, without having to restart the mirroring from the internet for each enclave
  • keeping the volume of data transferred from one network stage to the other to a strict minimum, avoiding transferring a blob or an image more than once from one stage to another (see the two-step sketch after this list)
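Sketched as a two-step oc-mirror flow; the exact command shape and flags are assumptions based on oc-mirror v2 and may differ from the final implementation, and the paths and registry name are placeholders:

$ oc mirror -c imageset-config.yaml --v2 file:///mnt/transfer
# step 1: mirror from the internet to a portable store in the connected environment
$ oc mirror -c imageset-config.yaml --v2 --from file:///mnt/transfer docker://enclave-registry.example.com:5000
# step 2: push the already-mirrored content from the store into an enclave registry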

 

This epic covers the work for RFE-3800 (includes RFE-3393 and RFE-3733) for mirroring operators and additional images

The full description / overview of the enclave support is best described here 

The design document can be found here 

 

Architecture Overview (diagram)

 

User Stories

All user stories are in the form :

  • Role (As a ...)
  • Goal (I want ..)
  • Reason (So that ..)

 

Overview

Consider this as part of the separate discussions and design of the upgrade path/introspection tool

Acceptance Criteria

  • All tests pass (80% coverage)
  • Documentation approved by docs team
  • All tests QE approved 
  • Well documented README/HOWTO/OpenShift documentation
  • Best effort (but better than v1)
  • Change the ImageSetConfiguration schema to accept a list of bundles
  • Get related images from the specific bundle names and packages stated in the ImageSetConfig (using the v1alpha3 version of the ISC spec)
  • Any other bundle association, or filtering by anything other than package, is not allowed

 Tasks

  • Implement V2 versioning as discussed in the EP document
  • Implement code to filter  bundles according to ImageSetConfig criteria.
  • Implement unit tests

Acceptance Criteria

  • Ensure all unit tests pass
  • Ensure code coverage is above 80%
  • Ensure golanglint-ci has no errors
  • All tests QE approved 
  • Show the results in code base README

Tasks

  • Spike to assess current MVP gaps
  • Implement unit tests for v2

Acceptance Criteria

  • All tests pass (80% coverage)
  • Documentation approved by docs team
  • All tests QE approved 
  • Well documented README/HOWTO/OpenShift documentation

Tasks

  • Implement V2 versioning as discussed in the EP document
  • Implement code to generate the IDMS, ITMS, and audit data artifacts (CFE-817)
  • keep in mind that IDMS generation might be needed for v1
  • Implement code to generate CatalogSource
  • Implement unit tests
  • Acceptance Criteria
    • Create e2e test plan for MVP with QE sign-off
    • All e2e tests pass
    • Documentation approved by docs team
    • All tests QE approved 
    • Well documented README/HOWTO/OpenShift documentation
  • Tasks
    • Implement e2e testing for MVP according to test plan
    • Document the e2e process

Acceptance Criteria

  • All unit tests pass (80% coverage)
  • Multi arch functionality works for catalogs and additional images for all mirroring scenarios
  • Documentation approved by docs team
  • All tests QE approved 
  • Well documented README/HOWTO/OpenShift documentation

Tasks

  • Implement multi arch (manifest list) functionality and code
  • Implement unit tests
  • TargetCatalog and TargetTag
  • Check index manifest and compare to the one on disk
  • Handle cases where /configs contains a single index.json vs /configs containing several folders (1 per package), with several index.json files.
  • Rebuild catalogs?
  • Redefine the default channel (OCPBUGS-7465, also requested by Verizon)

Overview

Ensure that the current v2 respects the v1 TargetCatalog and TargetTag fields (if set) for oci catalog and registry catalogs.

Also, TargetCatalog and TargetTag should not be mutually exclusive; the tag should be set via targetTag rather than embedded in targetCatalog, as the examples below show.

Invalid example:

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.14
    targetCatalog: abc/def:v5.5
    packages:
    - name: aws-load-balancer-operator

Valid example:

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.14
    targetCatalog: abc/def
    targetTag: v4.4
    packages:
    - name: aws-load-balancer-operator

Acceptance Criteria

  • All tests pass (80% coverage)
  • Documentation approved by docs team
  • All tests QE approved 
  • Well documented README/HOWTO/OpenShift documentation

Tasks

  • Implement V2 versioning as discussed in the EP document
  • Implement code to bulk (use concurrency) delete images functionality
  • Consider deleting relevant images in cache
  • Implement unit tests

Acceptance Criteria

  • All tests pass (80% coverage)
  • Documentation approved by docs team
  • All tests QE approved 
  • Well documented README/HOWTO/OpenShift documentation

Tasks

  • Implement the scenario where the operator catalog is a single file (catalog.json) instead of multiple files; see comment here
  • Investigate whether the IBM operator catalog is a single JSON file or multiple files
  • Implement unit tests

Acceptance Criteria

  • All tests pass (80% coverage)
  • Documentation approved by docs team
  • All tests QE approved 
  • Well documented README/HOWTO/OpenShift documentation
  • Best effort (but better than v1)
  • Take into account targetCatalog, targetTag
  • Catalogs mirrored by tag should be pulled when a minor version is detected (we should not rely only on existence of the folder in the cache)

Tasks

  • Implement V2 versioning as discussed in the EP document
  • Implement code to filter catalogs , channels and bundles according to ImageSetConfig criteria.
  • Implement unit tests

Feature Overview

In order to avoid an increased support overhead once the license changes at the end of the year, we should replace the instances in which metal IPI uses Terraform. 


When we used Terraform to provision the control plane, the Terraform deployment could eventually time out and report an error. The installer was monitoring the Terraform output and could pass the error on to the user, e.g.

level=debug msg=ironic_node_v1.openshift-master-host[0]: Still creating... [2h9m1s elapsed]
level=error
level=error msg=Error: could not inspect: inspect failed , last error was 'timeout reached while inspecting the node'
level=error
level=error msg=  with ironic_node_v1.openshift-master-host[2],
level=error msg=  on main.tf line 13, in resource "ironic_node_v1" "openshift-master-host":
level=error msg=  13: resource "ironic_node_v1" "openshift-master-host" {

Now that provisioning is managed by Metal³, we have nothing monitoring it for errors:

level=info msg=Waiting up to 1h0m0s (until 1:05AM UTC) for bootstrapping to complete...
level=debug msg=Bootstrap status: complete

By this stage the bootstrap API is up (and this is a requirement for BMO to do its thing). The installer is capable of monitoring the API for the appearance of the bootstrap complete ConfigMap, so it is equally capable of monitoring the BaremetalHost status. This should actually be an improvement on Terraform, as we can monitor in real time as the hosts progress through the various stages, and report on errors and retries.
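For reference, the status the installer would watch is the same one an administrator can observe directly (namespace and resource names as used by Metal³ on OpenShift):

$ oc get baremetalhosts -n openshift-machine-api -w
# watch hosts progress through inspection and provisioning, including any error states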

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision vSphere infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

  • The vSphere IPI Installer no longer contains or uses Terraform.
  • The new provider should aim to provide the same results and have parity with the existing vSphere Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Provision vsphere infrastructure without the use of Terraform

Why is this important?

  • Removing Terraform from Installer

Scenarios

  1. The new provider should aim to provide the same results as the existing vSphere terraform provider.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
level=info msg=Running process: vsphere infrastructure provider with args [-v=2 --metrics-bind-addr=0 --health-addr=127.0.0.1:37167 --webhook-port=38521 --webhook-cert-dir=/tmp/envtest-serving-certs-445481834 --leader-elect=false] and env [...] 

This may contain sensitive data (passwords, logins, etc.); it should be filtered.

Epic Goal

In order to avoid an increased support overhead once the license changes at the end of the year, we should replace the instances in which the vSphere IPI installer uses Terraform. 

Replace https://github.com/openshift/installer/tree/master/upi/vsphere with powercli.  Keep terraform in place until powercli installations are working.

  • Update UPI image with powershell and powercli
  • Backport change through all supported releases
  • Update installer repo with powercli scripts
  • Update installer repo to remove UPI terraform

 

example of updates to be made to the upi image:

~~~
FROM upi-installer-image
RUN curl https://packages.microsoft.com/config/rhel/8/prod.repo | tee /etc/yum.repos.d/microsoft.repo
RUN yum install -y powershell
RUN pwsh -Command 'Install-Module VMware.PowerCLI -Force -Scope CurrentUser'
~~~

Description of problem:

    the installer downloads the RHCOS image to the local cache multiple times when using failure domains

Version-Release number of selected component (if applicable):

4.16    

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    time="2024-02-22T14:34:42-05:00" level=debug msg="Generating Cluster..."
time="2024-02-22T14:34:42-05:00" level=warning msg="FeatureSet \"CustomNoUpgrade\" is enabled. This FeatureSet does not allow upgrades and may affect the supportability of the cluster."
time="2024-02-22T14:34:42-05:00" level=info msg="Creating infrastructure resources..."
time="2024-02-22T14:34:43-05:00" level=debug msg="Obtaining RHCOS image file from 'https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.16-9.4/builds/416.94.202402130130-0/x86_64/rhcos-416.94.202402130130-0-vmware.x86_64.ova?sha256=fb823b5c31afbb0b9babd46550cb9fe67109a46f48c04ce04775edbb6a309dd3'"
time="2024-02-22T14:34:43-05:00" level=debug msg="The file was found in cache: /home/ngirard/.cache/openshift-installer/image_cache/rhcos-416.94.202402130130-0-vmware.x86_64.ova. Reusing..."
time="2024-02-22T14:36:02-05:00" level=debug msg="Obtaining RHCOS image file from 'https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.16-9.4/builds/416.94.202402130130-0/x86_64/rhcos-416.94.202402130130-0-vmware.x86_64.ova?sha256=fb823b5c31afbb0b9babd46550cb9fe67109a46f48c04ce04775edbb6a309dd3'"
time="2024-02-22T14:36:02-05:00" level=debug msg="The file was found in cache: /home/ngirard/.cache/openshift-installer/image_cache/rhcos-416.94.202402130130-0-vmware.x86_64.ova. Reusing..."
time="2024-02-22T14:37:22-05:00" level=debug msg="Obtaining RHCOS image file from 'https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.16-9.4/builds/416.94.202402130130-0/x86_64/rhcos-416.94.202402130130-0-vmware.x86_64.ova?sha256=fb823b5c31afbb0b9babd46550cb9fe67109a46f48c04ce04775edbb6a309dd3'"
time="2024-02-22T14:37:22-05:00" level=debug msg="The file was found in cache: /home/ngirard/.cache/openshift-installer/image_cache/rhcos-416.94.202402130130-0-vmware.x86_64.ova. Reusing..."
time="2024-02-22T14:38:39-05:00" level=debug msg="Obtaining RHCOS image file from 'https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.16-9.4/builds/416.94.202402130130-0/x86_64/rhcos-416.94.202402130130-0-vmware.x86_64.ova?sha256=fb823b5c31afbb0b9babd46550cb9fe67109a46f48c04ce04775edbb6a309dd3'"
time="2024-02-22T14:38:39-05:00" level=debug msg="The file was found in cache: /home/ngirard/.cache/openshift-installer/image_cache/rhcos-416.94.202402130130-0-vmware.x86_64.ova. Reusing..."
time="2024-02-22T14:39:33-05:00" level=debug msg="Obtaining RHCOS image file from 'https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.16-9.4/builds/416.94.202402130130-0/x86_64/rhcos-416.94.202402130130-0-vmware.x86_64.ova?sha256=fb823b5c31afbb0b9babd46550cb9fe67109a46f48c04ce04775edbb6a309dd3'"
time="2024-02-22T14:39:33-05:00" level=debug msg="The file was found in cache: /home/ngirard/.cache/openshift-installer/image_cache/rhcos-416.94.202402130130-0-vmware.x86_64.ova. Reusing..."
time="2024-02-22T14:39:33-05:00" level=error msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed during pre-provisioning: unable to initialize folders and templates: failed to import ova: failed to lease wait: The name 'ngirard-dev-pr89z-rhcos-us-west-us-west-1a' already exists."

Expected results:

    should only download once

Additional info:

    

USER STORY:

The assisted installer should be able to use CAPI-based vsphere installs without requiring access to vcenter.

DESCRIPTION:

The installer makes calls to vcenter to determine the networks, which are required for CAPI based installs, but vcenter access is not guaranteed in the assisted installer.

See:

https://github.com/openshift/installer/pull/7962/commits/2bfb3d193d375286d80e36e0e7ba81bb74559a9d#diff-c8a93a8c9fb0e5dbfa50e3a8aa1fbd253dd324fb981a01a69f6bf843305117cbR267

https://github.com/openshift/installer/pull/7962/commits/2bfb3d193d375286d80e36e0e7ba81bb74559a9d#diff-42f3fb5184e7ed95dc5efdff5ca11cb2b713ac266322a99c03978106c0984f22R127

which were lovingly lifted from this slack thread.

Required:

In cases where the installer calls vCenter to obtain values to populate manifests, the installer should leave the fields empty (or use a default value) if it is unable to access vCenter. It should produce partial manifests rather than throw an error.

Nice to have:

...

ACCEPTANCE CRITERIA:

Continued compatibility with agent installer, particularly producing capi manifests when access to vcenter fails.

ENGINEERING DETAILS:

<!--

Any additional information that might be useful for engineers: related
repositories or pull requests, related email threads, GitHub issues or
other online discussions, how to set up any required accounts and/or
environments if applicable, and so on.

-->

WARNING FeatureSet "CustomNoUpgrade" is enabled. This FeatureSet does not allow upgrades and may affect the supportability of the cluster. 
INFO Creating infrastructure resources...         
DEBUG Obtaining RHCOS image file from 'https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.16-9.4/builds/416.94.202402130130-0/x86_64/rhcos-416.94.202402130130-0-vmware.x86_64.ova?sha256=fb823b5c31afbb0b9babd46550cb9fe67109a46f48c04ce04775edbb6a309dd3' 
DEBUG The file was found in cache: /home/jcallen/.cache/openshift-installer/image_cache/rhcos-416.94.202402130130-0-vmware.x86_64.ova. Reusing... 

 

<very long pause here with no indication>

 

 

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision Azure infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

  • The Azure IPI Installer no longer contains or uses Terraform.
  • The new provider should aim to provide the same results and have parity with the existing Azure Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Provision Azure infrastructure without the use of Terraform

Why is this important?

  • Removing Terraform from Installer

Scenarios

  1. The new provider should aim to provide the same results as the existing Azure Terraform provider.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a developer, I want to:

  • Create machine manifests for Azure CAPI implementation

so that I can achieve

  • Create control plane and bootstrap machines using CAPZ

Acceptance Criteria:

Description of criteria:

  • Control plane and bootstrap machines are created using CAPI

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

The `image` field in the AzureMachineSpec needs to point to an RHCOS image. For marketplace images, those images should already be available.

For non-marketplace images, we need to create an image for the users, using the VHD from the RHCOS stream.

The image could be created in the PreProvision hook: https://github.com/openshift/installer/blob/master/pkg/infrastructure/clusterapi/types.go#L26

Technically it could also be done in the InfraAvailable hook, if that is needed.

Create private zone and DNS records in the resource group specified by baseDomainResourceGroupName. The records should be cleaned up with destroy cluster.

The ControlPlaneEndpoint will be available in the Cluster spec and can be used to populate the DNS records.  

Currently we create both A and CNAME records in different scenarios: https://github.com/openshift/installer/blob/master/data/data/azure/cluster/dns/dns.tf

 

Ideally we do this in the InfraReady hook, before machine creation, so that control plane machines can pull ignition immediately.

The install config allows users to specify a `diskEncryptionSet` in machinepools. 
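For reference, the install-config shape being carried over looks roughly like this (values are placeholders; the field layout follows the documented Azure machine-pool schema):

compute:
- name: worker
  platform:
    azure:
      osDisk:
        diskEncryptionSet:
          resourceGroup: my-des-resource-group
          name: my-disk-encryption-set
          subscriptionId: 00000000-0000-0000-0000-000000000000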

 

CAPZ has existing support for disk encryption sets:

https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/38a5395e4320e116882734ffd420bf4a1f959ff2/api/v1beta1/types.go#L670

Note that CAPZ says the encryption set must belong to the same subscription, whereas our docs may not indicate that. We should point this out to the docs team.

Until we get dual load balancer support in CAPZ, we will need to create the external load balancer for public clusters.

In the ignition hook, we need to upload the bootstrap ignition data to the bootstrap storage account (created to hold the RHCOS image), then create an ignition stub containing the SAS link for the object.

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision Nutanix infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

  • The Nutanix IPI Installer no longer contains or uses Terraform.
  • The new provider should aim to provide the same results and have parity with the existing Nutanix Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Feature Overview

Console enhancements based on customer RFEs that improve customer user experience.

 

Goals

  • This Section:* Provide high-level goal statement, providing user context and expected user outcome(s) for this feature

 

Requirements

  • This Section:* A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.

 

Requirement (Is MVP?)
  • CI - MUST be running successfully with test automation. This is a requirement for ALL features. (MVP: YES)
  • Release Technical Enablement - Provide necessary release enablement details and documents. (MVP: YES)

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

RFE: https://issues.redhat.com/browse/RFE-1772

In OpenShift v3 we displayed a pod's last termination state, and we need to display the same info in v4. Here is the v3 HTML interpretation of the kubernetes-object-describe-container-state directive.

AC: Add the Last State logic to the pod's details page by rewriting the v3 implementation.

Based on this old feature request

https://issues.redhat.com/browse/RFE-1530

we do have impersonation in place for gaining access to other users' permissions via the console. But the only documentation we currently have covers how to impersonate system:admin via the CLI; see

https://docs.openshift.com/container-platform/4.14/authentication/impersonating-system-admin.html

Please provide documentation for the console feature and the required prerequisites for the users/groups accordingly.

AC:

  • Create a ConsoleQuickStart CR that helps users with impersonate access understand the impersonation workflow and functionality
  • This quick start should be available only to users with impersonate access
  • The created CR should be placed in the console-operator's repo, where the default quick starts are placed

 

More info on the impersonate access role - https://github.com/openshift/console/pull/13345/files

Implement a toast notification feature in the console UI to notify the user that their action to create/update a resource violated a warn policy even though the request was allowed.

See the "Configure OpenShift Console to display warnings from apiserver when creating/updating resources" spike on how to reproduce the warn policy response with the `oc` CLI

A.C.

Display a warning toast notification after create/update resource action for a resource

Add support for returning HTTP response headers from the `consoleFetchCommon` function in the dynamic-plugin-sdk package

 

Problem: 

The `consoleFetchCommon` function in the dynamic-plugin-sdk package lacks support for retrieving HTTP response headers.

 

Justification:

The policy warning responses are visible in the `oc` CLI but not on the console UI. The customer wants similar behavior on the UI. The policy warning responses are returned in the HTTP response headers, which the `consoleFetchCommon` function currently does not expose.

 

Proposed Solution

Add logic for extracting all response headers along with `response.json` in the `consoleFetchCommon` function, using `options` or a similar mechanism.

 

A.C. 

Add an option parameter to `consoleFetchCommon` to conditionally return the full `response` or `response.json`, so that k8s functions like `k8sCreate` can consume either, preventing a breaking change for dynamic plugins

Feature Overview (aka. Goal Summary)

A guest cluster can use an external OIDC token issuer.  This will allow machine-to-machine authentication workflows

Goals (aka. expected user outcomes)

A guest cluster can configure OIDC providers to support the current capability: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens and the future capability: https://github.com/kubernetes/kubernetes/blob/2b5d2cf910fd376a42ba9de5e4b52a53b58f9397/staging/src/k8s.io/apiserver/pkg/apis/apiserver/types.go#L164 with an API that 

  1. allows fixing mistakes
  2. alerts the owner of the configuration that there is likely a misconfiguration (self-service)
  3. makes it easy to distinguish product failure (the expressed configuration was not applied) from configuration failure (the expressed configuration was wrong)
  4. makes cluster recovery possible in cases where the external token issuer is permanently gone
  5. allows (but might not require) removal of the existing oauth server
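For context, the upstream structured authentication configuration referenced above looks roughly like this (a hedged sketch; the apiVersion depends on the Kubernetes release and all values are placeholders):

apiVersion: apiserver.config.k8s.io/v1alpha1
kind: AuthenticationConfiguration
jwt:
- issuer:
    url: https://issuer.example.com
    audiences:
    - my-client-id
  claimMappings:
    username:
      claim: sub
      prefix: ""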

 

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

A guest cluster can use an external OIDC token issuer. This will allow machine-to-machine authentication workflows.

Goals (aka. expected user outcomes)

A guest cluster can configure OIDC providers to support the current capability: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens and the future capability: https://github.com/kubernetes/kubernetes/blob/2b5d2cf910fd376a42ba9de5e4b52a53b58f9397/staging/src/k8s.io/apiserver/pkg/apis/apiserver/types.go#L164 with an API that 

  1. allows fixing mistakes
  2. alerts the owner of the configuration when a misconfiguration is likely (self-service)
  3. makes it easy to distinguish product failure (the expressed configuration was not applied) from configuration failure (the expressed configuration was wrong)
  4. makes cluster recovery possible in cases where the external token issuer is permanently gone
  5. allows (but might not require) removal of the existing oauth server

Goal

  • Provide API for configuring external OIDC to management cluster components
  • Stop creating oauth server deployment
  • Stop creating oauth-apiserver
  • Stop registering oauth-apiserver backed apiservices
  • See what breaks next

Why is this important?

  • need an API starting point for the ROSA CLI and OCM (a minimal check of the resulting API is sketched after this list)
  • need a cluster that demonstrates what breaks next for us to fix
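
A minimal check of that API starting point from the guest cluster side (a sketch; the fields mirror the example `oc get authentication.config cluster -o yaml` output shown later in this log, and the values here are illustrative):

$ oc get authentication.config cluster -o jsonpath='{.spec.type}{"\n"}'
OIDC
$ oc get authentication.config cluster -o jsonpath='{.spec.oidcProviders[0].issuer.issuerURL}{"\n"}'
https://login.microsoftonline.com/64dcxxxxxxxx/v2.0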

Scenarios

  1. ...

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of problem:

When issuerCertificateAuthority is set, kube-apiserver pod is CrashLoopBackOff.

Root-cause debugging found that the path /etc/kubernetes/certs/oidc-ca/ca.crt is incorrect; the expected path is /etc/kubernetes/certs/oidc-ca/ca-bundle.crt.

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-03-13-061822

How reproducible:

    Always

Steps to Reproduce:

1. Create fresh HCP cluster.
2. Create keycloak as OIDC server exposed as a Route which uses cluster's default ingress certificate as the serving certificate.
3. Configure clients necessarily on keycloak admin UI.
4. Configure external OIDC:
$ oc create configmap keycloak-oidc-ca --from-file=ca-bundle.crt=router-ca/ca.crt --kubeconfig $MGMT_KUBECONFIG -n clusters

$ oc patch hc $HC_NAME -n clusters --kubeconfig $MGMT_KUBECONFIG --type=merge -p="
spec:
  configuration:
    authentication:
      oidcProviders:
      - claimMappings:
          groups:
            claim: groups
            prefix: 'oidc-groups-test:'
          username:
            claim: email
            prefixPolicy: Prefix
            prefix:
              prefixString: 'oidc-user-test:'
        issuer:
          audiences:
          - $AUDIENCE_1
          - $AUDIENCE_2
          issuerCertificateAuthority:
            name: keycloak-oidc-ca
          issuerURL: $ISSUER_URL
        name: keycloak-oidc-server
        oidcClients:
        - clientID: $CONSOLE_CLIENT_ID
          clientSecret:
            name: $CONSOLE_CLIENT_SECRET_NAME
          componentName: console
          componentNamespace: openshift-console
      type: OIDC
"

5. Check that the pods are renewed; instead, the new kube-apiserver pod is CrashLoopBackOff:
$ oc get po -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG --sort-by metadata.creationTimestamp | tail -n 4
openshift-apiserver-65f8c5f545-x2vdf                  3/3     Running            0               5h8m
community-operators-catalog-57dd5886f7-jq25f          1/1     Running            0               4h1m
kube-apiserver-5d75b5b848-c9c8r                       4/5     CrashLoopBackOff   25 (3m9s ago)   107m

$ oc logs --timestamps -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG -c kube-apiserver kube-apiserver-5d75b5b848-gk2t8
...
2024-03-18T09:11:14.836540684Z I0318 09:11:14.836495       1 dynamic_cafile_content.go:119] "Loaded a new CA Bundle and Verifier" name="client-ca-bundle::/etc/kubernetes/certs/client-ca/ca.crt"
2024-03-18T09:11:14.837725839Z E0318 09:11:14.837695       1 run.go:74] "command failed" err="jwt[0].issuer.certificateAuthority: Invalid value: \"<omitted>\": data does not contain any valid RSA or ECDSA certificates"

Actual results:

5. New kube-apiserver pod is CrashLoopBackOff.

`oc explain` for issuerCertificateAuthority says the configmap data should use ca-bundle.crt, but using ca.crt as the configmap data key gives the same result.

Expected results:

6. No CrashLoopBackOff.

Additional info:
Below is my RCA for the CrashLoopBackOff kube-apiserver pod:
Check whether it is a valid RSA certificate; it is:

$ openssl x509 -noout -text -in router-ca/ca.crt | grep -i rsa
        Signature Algorithm: sha256WithRSAEncryption
            Public Key Algorithm: rsaEncryption
    Signature Algorithm: sha256WithRSAEncryption

So, the CA certificate has no issue.
The pod logs above show "/etc/kubernetes/certs/oidc-ca/ca.crt" is used. Double-checking the configmap:

$ oc get cm auth-config -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG -o jsonpath='{.data.auth\.json}' | jq | ~/auto/json2yaml.sh
---
kind: AuthenticationConfiguration
apiVersion: apiserver.config.k8s.io/v1alpha1
jwt:
- issuer:
    url: https://keycloak-keycloak.apps..../realms/master
    certificateAuthority: "/etc/kubernetes/certs/oidc-ca/ca.crt"
...

Then debug the CrashLoopBackOff pod:

The path /etc/kubernetes/certs/oidc-ca/ca.crt that is used does not exist. The correct path is /etc/kubernetes/certs/oidc-ca/ca-bundle.crt:

$ oc debug -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG -c kube-apiserver kube-apiserver-5d75b5b848-gk2t8
Starting pod/kube-apiserver-5d75b5b848-gk2t8-debug-kpmlf, command was: hyperkube kube-apiserver --openshift-config=/etc/kubernetes/config/config.json -v2 --encryption-provider-config=/etc/kubernetes/secret-encryption/config.yaml
sh-5.1$ cat /etc/kubernetes/certs/oidc-ca/ca.crt
cat: /etc/kubernetes/certs/oidc-ca/ca.crt: No such file or directory
sh-5.1$ ls /etc/kubernetes/certs/oidc-ca/
ca-bundle.crt
sh-5.1$ cat /etc/kubernetes/certs/oidc-ca/ca-bundle.crt
-----BEGIN CERTIFICATE-----
MIIDPDCCAiSgAwIBAgIIM3E0ckpP750wDQYJKoZIhvcNAQELBQAwJjESMBAGA1UE
...

User Story:

As a (user persona), I want to be able to:

  • Capability 1
  • Capability 2
  • Capability 3

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

$ oc logs --previous --timestamps -n openshift-console console-64df9b5bcb-8h8xk
2024-03-22T11:17:07.824396015Z I0322 11:17:07.824332       1 main.go:210] The following console plugins are enabled:
2024-03-22T11:17:07.824574844Z I0322 11:17:07.824558       1 main.go:212]  - monitoring-plugin
2024-03-22T11:17:07.824613918Z W0322 11:17:07.824603       1 authoptions.go:99] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
2024-03-22T11:22:07.828873678Z I0322 11:22:07.828819       1 main.go:634] Binding to [::]:8443...
2024-03-22T11:22:07.828982852Z I0322 11:22:07.828967       1 main.go:636] using TLS
2024-03-22T11:22:07.833771847Z E0322 11:22:07.833726       1 asynccache.go:62] failed a caching attempt: Get "https://keycloak-keycloak.apps.xxxx/realms/master/.well-known/openid-configuration": tls: failed to verify certificate: x509: certificate signed by unknown authority
2024-03-22T11:22:10.831644728Z I0322 11:22:10.831598       1 metrics.go:128] serverconfig.Metrics: Update ConsolePlugin metrics...
2024-03-22T11:22:10.848238183Z I0322 11:22:10.848187       1 metrics.go:138] serverconfig.Metrics: Update ConsolePlugin metrics: &map[monitoring:map[enabled:1]] (took 16.490288ms)
2024-03-22T11:22:12.829744769Z I0322 11:22:12.829697       1 metrics.go:80] usage.Metrics: Count console users...
2024-03-22T11:22:13.236378460Z I0322 11:22:13.236318       1 metrics.go:156] usage.Metrics: Update console users metrics: 0 kubeadmin, 0 cluster-admins, 0 developers, 0 unknown/errors (took 406.580502ms)

The cause is that the HCCO is not copying the issuerCertificateAuthority configmap into the openshift-config namespace of the HC.
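
A quick way to confirm the missing copy (a sketch, reusing the keycloak-oidc-ca configmap name from the steps above and assuming $HOSTED_KUBECONFIG points at the hosted cluster's kubeconfig; output illustrative):

$ oc get configmap keycloak-oidc-ca -n clusters --kubeconfig $MGMT_KUBECONFIG -o name
configmap/keycloak-oidc-ca
$ oc get configmap keycloak-oidc-ca -n openshift-config --kubeconfig $HOSTED_KUBECONFIG
Error from server (NotFound): configmaps "keycloak-oidc-ca" not found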

Description of problem:

HCP does not honor the oauthMetadata field of hc.spec.configuration.authentication, making the console crash and oc login fail.

Version-Release number of selected component (if applicable):

HyperShift management cluster: 4.16.0-0.nightly-2024-01-29-233218
HyperShift hosted cluster: 4.16.0-0.nightly-2024-01-29-233218

How reproducible:

Always

Steps to Reproduce:

1. Install HCP env. Export KUBECONFIG:
$ export KUBECONFIG=/path/to/hosted-cluster/kubeconfig

2. Create keycloak applications. Then get the route:
$ KEYCLOAK_HOST=https://$(oc get -n keycloak route keycloak --template='{{ .spec.host }}')
$ echo $KEYCLOAK_HOST
https://keycloak-keycloak.apps.hypershift-ci-18556.xxx
$ curl -sSk "$KEYCLOAK_HOST/realms/master/.well-known/openid-configuration" > oauthMetadata

$ cat oauthMetadata 
{"issuer":"https://keycloak-keycloak.apps.hypershift-ci-18556.xxx/realms/master"

$ oc create configmap oauth-meta --from-file ./oauthMetadata -n clusters --kubeconfig /path/to/management-cluster/kubeconfig
...

3. Set hc.spec.configuration.authentication:
$ CLIENT_ID=openshift-test-aud
$ oc patch hc hypershift-ci-18556 -n clusters --kubeconfig /path/to/management-cluster/kubeconfig --type=merge -p="
spec:
  configuration:
    authentication:
      oauthMetadata:
        name: oauth-meta
      oidcProviders:
      - claimMappings:
          ...
        issuer:
          audiences:
          - $CLIENT_ID
          issuerCertificateAuthority:
            name: keycloak-oidc-ca
          issuerURL: $KEYCLOAK_HOST/realms/master
        name: keycloak-oidc-test
      type: OIDC
"

Check KAS indeed already picks up the setting:
$ oc logs -c kube-apiserver kube-apiserver-5c976d59f5-zbrwh -n clusters-hypershift-ci-18556 --kubeconfig /path/to/management-cluster/kubeconfig | grep "oidc-"
...
I0130 08:07:24.266247       1 flags.go:64] FLAG: --oidc-ca-file="/etc/kubernetes/certs/oidc-ca/ca.crt"
I0130 08:07:24.266251       1 flags.go:64] FLAG: --oidc-client-id="openshift-test-aud"
...
I0130 08:07:24.266261       1 flags.go:64] FLAG: --oidc-issuer-url="https://keycloak-keycloak.apps.hypershift-ci-18556.xxx/realms/master"
...

Wait about 15 mins.

4. Check COs and check oc login. Both show the same error:
$ oc get co | grep -v 'True.*False.*False'
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console                                    4.16.0-0.nightly-2024-01-29-233218   True        True          False      4h57m   SyncLoopRefreshProgressing: Working toward version 4.16.0-0.nightly-2024-01-29-233218, 1 replicas available
$ oc get po -n openshift-console
NAME                        READY   STATUS             RESTARTS         AGE
console-547cf6bdbb-l8z9q    1/1     Running            0                4h55m
console-54f88749d7-cv7ht    0/1     CrashLoopBackOff   9 (3m18s ago)    14m
console-54f88749d7-t7x96    0/1     CrashLoopBackOff   9 (3m32s ago)    14m

$ oc logs console-547cf6bdbb-l8z9q -n openshift-console
I0130 03:23:36.788951       1 metrics.go:156] usage.Metrics: Update console users metrics: 0 kubeadmin, 0 cluster-admins, 0 developers, 0 unknown/errors (took 406.059196ms)
E0130 06:48:32.745179       1 asynccache.go:43] failed a caching attempt: request to OAuth issuer endpoint https://:0/oauth/token failed: Head "https://:0": dial tcp :0: connect: connection refused
E0130 06:53:32.757881       1 asynccache.go:43] failed a caching attempt: request to OAuth issuer endpoint https://:0/oauth/token failed: Head "https://:0": dial tcp :0: connect: connection refused
...

$ oc login --exec-plugin=oc-oidc --client-id=openshift-test-aud --extra-scopes=email,profile --callback-port=8080
error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused
error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused
Unable to connect to the server: getting credentials: exec: executable oc failed with exit code 1

5. Check root cause, the configured oauthMetadata is not picked up well:
$ curl -k https://a6e149f24f8xxxxxx.elb.ap-east-1.amazonaws.com:6443/.well-known/oauth-authorization-server
{
"issuer": "https://:0",
"authorization_endpoint": "https://:0/oauth/authorize",
"token_endpoint": "https://:0/oauth/token",
...
}

Actual results:

As shown in steps 4 and 5 above, the configured oauthMetadata is not picked up, causing the console and oc login to hit the errors.

Expected results:

The configured oauthMetadata is picked up. No errors.

Additional info:

For oc, if I manually use `oc config set-credentials oidc --exec-api-version=client.authentication.k8s.io/v1 --exec-command=oc --exec-arg=get-token --exec-arg="--issuer-url=$KEYCLOAK_HOST/realms/master" ...` instead of using `oc login --exec-plugin=oc-oidc ...`, oc authentication works well. This means my configuration is correct.
$ oc whoami  
Please visit the following URL in your browser: http://localhost:8080
oidc-user-test:xxia@redhat.com

Description of problem:
In https://issues.redhat.com/browse/OCPBUGS-28625?focusedId=24056681&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-24056681 , Seth Jennings states "It is not required to set the oauthMetadata to enable external OIDC".

Today having a chance to try without setting oauthMetadata, hit oc login fails with the error:

$ oc login --exec-plugin=oc-oidc --client-id=$CLIENT_ID --client-secret=$CLIENT_SECRET_VALUE --extra-scopes=email --callback-port=8080
error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused
error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused
Unable to connect to the server: getting credentials: exec: executable oc failed with exit code 1

Console login can succeed, though.

Note: OCM QE also encounters this when using the ocm CLI to test ROSA HCP external OIDC. Whether the fix belongs in oc, in HCP, or elsewhere (as a tester I'm not sure, TBH), it is worth fixing; otherwise oc login is affected.

Version-Release number of selected component (if applicable):

[xxia@2024-03-01 21:03:30 CST my]$ oc version --client
Client Version: 4.16.0-0.ci-2024-03-01-033249
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
[xxia@2024-03-01 21:03:50 CST my]$ oc get clusterversion
NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.ci-2024-02-29-213249   True        False         8h      Cluster version is 4.16.0-0.ci-2024-02-29-213249

How reproducible:

Always

Steps to Reproduce:

1. Launch fresh HCP cluster.

2. Login to https://entra.microsoft.com. Register application and set properly.

3. Prepare variables.
HC_NAME=hypershift-ci-267920
MGMT_KUBECONFIG=/home/xxia/my/env/xxia-hs416-2-267920-4.16/kubeconfig
HOSTED_KUBECONFIG=/home/xxia/my/env/xxia-hs416-2-267920-4.16/hypershift-ci-267920.kubeconfig
AUDIENCE=7686xxxxxx
ISSUER_URL=https://login.microsoftonline.com/64dcxxxxxxxx/v2.0
CLIENT_ID=7686xxxxxx
CLIENT_SECRET_VALUE="xxxxxxxx"
CLIENT_SECRET_NAME=console-secret

4. Configure HC without oauthMetadata.
[xxia@2024-03-01 20:29:21 CST my]$ oc create secret generic console-secret -n clusters --from-literal=clientSecret=$CLIENT_SECRET_VALUE --kubeconfig $MGMT_KUBECONFIG

[xxia@2024-03-01 20:34:05 CST my]$ oc patch hc $HC_NAME -n clusters --kubeconfig $MGMT_KUBECONFIG --type=merge -p="
spec:
  configuration: 
    authentication: 
      oauthMetadata:
        name: ''
      oidcProviders:
      - claimMappings:
          groups:
            claim: groups
            prefix: 'oidc-groups-test:'
          username:
            claim: email
            prefixPolicy: Prefix
            prefix:
              prefixString: 'oidc-user-test:'
        issuer:
          audiences:
          - $AUDIENCE
          issuerURL: $ISSUER_URL
        name: microsoft-entra-id
        oidcClients:
        - clientID: $CLIENT_ID
          clientSecret:
            name: $CLIENT_SECRET_NAME
          componentName: console
          componentNamespace: openshift-console
      type: OIDC
"

Wait pods to renew:
[xxia@2024-03-01 20:52:41 CST my]$ oc get po -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG --sort-by metadata.creationTimestamp
...
certified-operators-catalog-7ff9cffc8f-z5dlg          1/1     Running   0          5h44m
kube-apiserver-6bd9f7ccbd-kqzm7                       5/5     Running   0          17m
kube-apiserver-6bd9f7ccbd-p2fw7                       5/5     Running   0          15m
kube-apiserver-6bd9f7ccbd-fmsgl                       5/5     Running   0          13m
openshift-apiserver-7ffc9fd764-qgd4z                  3/3     Running   0          11m
openshift-apiserver-7ffc9fd764-vh6x9                  3/3     Running   0          10m
openshift-apiserver-7ffc9fd764-b7znk                  3/3     Running   0          10m
konnectivity-agent-577944765c-qxq75                   1/1     Running   0          9m42s
hosted-cluster-config-operator-695c5854c-dlzwh        1/1     Running   0          9m42s
cluster-version-operator-7c99cf68cd-22k84             1/1     Running   0          9m42s
konnectivity-agent-577944765c-kqfpq                   1/1     Running   0          9m40s
konnectivity-agent-577944765c-7t5ds                   1/1     Running   0          9m37s

5. Check console login and oc login.
$ export KUBECONFIG=$HOSTED_KUBECONFIG
$ curl -ksS $(oc whoami --show-server)/.well-known/oauth-authorization-server
{
"issuer": "https://:0",
"authorization_endpoint": "https://:0/oauth/authorize",
"token_endpoint": "https://:0/oauth/token",
...
}
Check console login: it succeeds, and the console's upper-right corner correctly shows the user name oidc-user-test:xxia@redhat.com.

Check oc login:
$ rm -rf ~/.kube/cache/oc/
$ oc login --exec-plugin=oc-oidc --client-id=$CLIENT_ID --client-secret=$CLIENT_SECRET_VALUE --extra-scopes=email --callback-port=8080
error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused
error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused
Unable to connect to the server: getting credentials: exec: executable oc failed with exit code 1

Actual results:

Console login succeeds. oc login fails.

Expected results:

oc login should also succeed.

Additional info:{}

Description of problem:

Updating oidcProviders does not take effect. See details below.

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-02-26-155043

How reproducible:

Always 

Steps to Reproduce:

1. Install fresh HCP env and configure external OIDC as steps 1 ~ 4 of https://issues.redhat.com/browse/OCPBUGS-29154 (to avoid repeated typing those steps, only referencing as is here).

2. Pods renewed:
$ oc get po -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG --sort-by metadata.creationTimestamp
...
network-node-identity-68b7b8dd48-4pvvq               3/3     Running   0          170m
oauth-openshift-57cbd9c797-6hgzx                     2/2     Running   0          170m
kube-controller-manager-66f68c8bd8-tknvc             1/1     Running   0          164m
kube-controller-manager-66f68c8bd8-wb2x9             1/1     Running   0          164m
kube-controller-manager-66f68c8bd8-kwxxj             1/1     Running   0          163m
kube-apiserver-596dcb97f-n5nqn                       5/5     Running   0          29m
kube-apiserver-596dcb97f-7cn9f                       5/5     Running   0          27m
kube-apiserver-596dcb97f-2rskz                       5/5     Running   0          25m
openshift-apiserver-c9455455c-t7prz                  3/3     Running   0          22m
openshift-apiserver-c9455455c-jrwdf                  3/3     Running   0          22m
openshift-apiserver-c9455455c-npvn5                  3/3     Running   0          21m
konnectivity-agent-7bfc7cb9db-bgrsv                  1/1     Running   0          20m
cluster-version-operator-675745c9d6-5mv8m            1/1     Running   0          20m
hosted-cluster-config-operator-559644d45b-4vpkq      1/1     Running   0          20m
konnectivity-agent-7bfc7cb9db-hjqlf                  1/1     Running   0          20m
konnectivity-agent-7bfc7cb9db-gl9b7                  1/1     Running   0          20m

3. oc login can succeed:
$ oc login --exec-plugin=oc-oidc --client-id=$CLIENT_ID --client-secret=$CLIENT_SECRET_VALUE --extra-scopes=email --callback-port=8080
Please visit the following URL in your browser: http://localhost:8080
Logged into "https://a4af9764....elb.ap-southeast-1.amazonaws.com:6443" as "oidc-user-test:xxia@redhat.com" from an external oidc issuer.

You don't have any projects. Contact your system administrator to request a project.

4. Update HC by changing claim: email to claim: sub:
$ oc edit hc $HC_NAME -n clusters --kubeconfig $MGMT_KUBECONFIG
...
          username:
            claim: sub
...

Update is picked up:
$ oc get authentication.config cluster -o yaml
...
spec:
  oauthMetadata:
    name: tested-oauth-meta
  oidcProviders:
  - claimMappings:
      groups:
        claim: groups
        prefix: 'oidc-groups-test:'
      username:
        claim: sub
        prefix:
          prefixString: 'oidc-user-test:'
        prefixPolicy: Prefix
    issuer:
      audiences:
      - 76863fb1-xxxxxx
      issuerCertificateAuthority:
        name: ""
      issuerURL: https://login.microsoftonline.com/xxxxxxxx/v2.0
    name: microsoft-entra-id
    oidcClients:
    - clientID: 76863fb1-xxxxxx
      clientSecret:
        name: console-secret
      componentName: console
      componentNamespace: openshift-console
  serviceAccountIssuer: https://xxxxxx.s3.us-east-2.amazonaws.com/hypershift-ci-267402
  type: OIDC
status:
  oidcClients:
  - componentName: console
    componentNamespace: openshift-console
    conditions:
    - lastTransitionTime: "2024-02-28T10:51:17Z"
      message: ""
      reason: OIDCConfigAvailable
      status: "False"
      type: Degraded
    - lastTransitionTime: "2024-02-28T10:51:17Z"
      message: ""
      reason: OIDCConfigAvailable
      status: "False"
      type: Progressing
    - lastTransitionTime: "2024-02-28T10:51:17Z"
      message: ""
      reason: OIDCConfigAvailable
      status: "True"
      type: Available
    currentOIDCClients:
    - clientID: 76863fb1-xxxxxx
      issuerURL: https://login.microsoftonline.com/xxxxxxxx/v2.0
      oidcProviderName: microsoft-entra-id

4. Check pods again:
$ oc get po -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG --sort-by metadata.creationTimestamp
...
kube-apiserver-596dcb97f-n5nqn                       5/5     Running   0          108m
kube-apiserver-596dcb97f-7cn9f                       5/5     Running   0          106m
kube-apiserver-596dcb97f-2rskz                       5/5     Running   0          104m
openshift-apiserver-c9455455c-t7prz                  3/3     Running   0          102m
openshift-apiserver-c9455455c-jrwdf                  3/3     Running   0          101m
openshift-apiserver-c9455455c-npvn5                  3/3     Running   0          100m
konnectivity-agent-7bfc7cb9db-bgrsv                  1/1     Running   0          100m
cluster-version-operator-675745c9d6-5mv8m            1/1     Running   0          100m
hosted-cluster-config-operator-559644d45b-4vpkq      1/1     Running   0          100m
konnectivity-agent-7bfc7cb9db-hjqlf                  1/1     Running   0          99m
konnectivity-agent-7bfc7cb9db-gl9b7                  1/1     Running   0          99m

No new pods renewed.

5. Check login again; it does not use "sub", it still uses "email":
$ rm -rf ~/.kube/cache/
$ oc login --exec-plugin=oc-oidc --client-id=$CLIENT_ID --client-secret=$CLIENT_SECRET_VALUE --extra-scopes=email --callback-port=8080
Please visit the following URL in your browser: http://localhost:8080
Logged into "https://xxxxxxx.elb.ap-southeast-1.amazonaws.com:6443" as "oidc-user-test:xxia@redhat.com" from an external oidc issuer.
  
You don't have any projects. Contact your system administrator to request a project.
$ cat ~/.kube/cache/oc/* | jq -r '.id_token' | jq -R 'split(".") | .[] | @base64d | fromjson'
...
{
...
  "email": "xxia@redhat.com",
  "groups": [
...
  ],
...
  "sub": "EEFGfgPXr0YFw_ZbMphFz6UvCwkdFS20MUjDDLdTZ_M",
...

Actual results:

Steps 4 ~ 5: after editing the HC field value from "claim: email" to "claim: sub", even though `oc get authentication cluster -o yaml` shows the edited change is propagated:
1> Pods like kube-apiserver are not renewed.
2> After cleaning up ~/.kube/cache, `oc login ...` still prints 'Logged into "https://xxxxxxx.elb.ap-southeast-1.amazonaws.com:6443" as "oidc-user-test:xxia@redhat.com" from an external oidc issuer', i.e. it still uses the old claim "email" as the user name instead of the new claim "sub".

Expected results:

Steps 4 ~ 5: Pods like kube-apiserver should be renewed after the HC edit that changes the username claim. The login output should show that the new claim is used as the user name.

Additional info:

    

 

Epic Goal

  • Add an API extension for North-South IPsec.
  • close gaps from SDN-3604 - mainly around upgrade
  • add telemetry

Why is this important?

  • without an API, customers are forced to use the MCO, which brings a set of limitations (mainly a reboot per change, and the fact that the config is shared across a pool, so per-node configuration is not possible); a sketch of the intended API shape follows this list
  • a better upgrade solution will give us the ability to support a single host-based implementation
  • telemetry will give us more info on how widely IPsec is used.
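
For orientation, a sketch of the kind of API surface this epic targets, expressed against the cluster network operator config (the field names and values are illustrative of the intended shape, not a committed API):

$ oc patch networks.operator.openshift.io cluster --type=merge \
    -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{"mode":"Full"}}}}}'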

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Must allow for the possibility of offloading the IPsec encryption to a SmartNIC.

 

  • nmstate
  • k8s-nmstate
  • easier mechanism for cert injection (??)
  • telemetry

Dependencies (internal and external)

  1.  

Related:

  • ITUP-44 - OpenShift support for North-South OVN IPSec
  • HATSTRAT-33 - Encrypt All Traffic to/from Cluster (aka IPSec as a Service)

Previous Work (Optional):

  1. SDN-717 - Support IPSEC on ovn-kubernetes
  2. SDN-3604 - Fully supported non-GA N-S IPSec implementation using machine config.

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

https://redhat-internal.slack.com/archives/GQ0CU2623/p1692107036750429?thread_ts=1689276746.185269&cid=GQ0CU2623 

This card adds support for implementing the ANP.Egress.Networks peer in OVNKubernetes (a sketch of the resulting API shape follows this list):

  1. vendoring in api from netpol api repo
  2. designing the ovnk pieces into the existing controller
  3. writing unit tests
  4. bringing in the conformance tests from upstream
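
A sketch of the resulting API shape, based on the upstream network-policy-api v1alpha1 types (policy name, priority, and CIDRs are illustrative):

$ oc apply -f - <<'EOF'
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: deny-egress-to-reserved-ranges
spec:
  priority: 20
  subject:
    namespaces: {}
  egress:
  - name: deny-to-reserved-ranges
    action: Deny
    to:
    - networks:
      - 10.0.0.0/8
      - 172.16.0.0/12
EOF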

Feature Overview (aka. Goal Summary)

oc, the openshift CLI, needs to be as close to feature parity as we can get without the built-in oauth server and its associated user and group management. This will enable scripts, documentation, blog posts, and knowledge base articles to function across all form factors and across the same form factor with different configurations.

Goals (aka. expected user outcomes)

CLI users and scripts should be able to work in a consistent way regardless of the token issuer configuration.

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

oc login needs to work without the embedded oauth server

 
Why is this important? (mandatory)

We are removing the embedded oauth-server; today we utilize a special oauthclient in order to make our login flows functional.

This allows documentation, scripts, etc to be functional and consistent with the last 10 years of our product.

This may require vendoring entire CLI plugins.  It may require new kubeconfig shapes.
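
One possible kubeconfig shape, mirroring the manual `oc config set-credentials oidc --exec-command=oc --exec-arg=get-token ...` workaround noted elsewhere in this log (a sketch; the issuer URL and client ID are placeholders):

users:
- name: oidc
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1
      command: oc
      args:
      - get-token
      - --issuer-url=https://login.microsoftonline.com/<tenant>/v2.0
      - --client-id=<client-id>
      interactiveMode: IfAvailable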

 

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Description of problem:

Separate oidc certificate authority and cluster certificate authority.

Version-Release number of selected component (if applicable):

oc 4.16 / 4.15

How reproducible:

Always

Steps to Reproduce:

1. Launch an HCP external OIDC cluster. The external OIDC uses keycloak. The keycloak server is created outside of the cluster and its serving certificate is not trusted; its CA is separate from any of the cluster's CAs.

2. Test oc login
$ curl -sSI --cacert $ISSUER_CA_FILE $ISSUER_URL/.well-known/openid-configuration | head -n 1
HTTP/1.1 200 OK

$ oc login --exec-plugin=oc-oidc --issuer-url=$ISSUER_URL --client-id=$CLI_CLIENT_ID --extra-scopes=email,profile --callback-port=8080 --certificate-authority $ISSUER_CA_FILE
The server uses a certificate signed by an unknown authority.
You can bypass the certificate check, but any data you send to the server could be intercepted by others.
Use insecure connections? (y/n): n

error: The server uses a certificate signed by unknown authority. You may need to use the --certificate-authority flag to provide the path to a certificate file for the certificate authority, or --insecure-skip-tls-verify to bypass the certificate check and use insecure connections.

Actual results:

2. oc login with --certificate-authority pointing to $ISSUER_CA_FILE fails.

The reason is that oc login communicates not only with the OIDC server but also with the test cluster's kube-apiserver, which is also self-signed. More is needed for the --certificate-authority flag, i.e. the test cluster's kube-apiserver CA and $ISSUER_CA_FILE need to be combined:
$ grep certificate-authority-data $KUBECONFIG | grep -Eo "[^ ]+$" | base64 -d > hostedcluster_kubeconfig_ca.crt

$ cat $ISSUER_CA_FILE hostedcluster_kubeconfig_ca.crt > combined-ca.crt
$ oc login --exec-plugin=oc-oidc --issuer-url=$ISSUER_URL --client-id=$CLI_CLIENT_ID --extra-scopes=email,profile --callback-port=8080 --certificate-authority combined-ca.crt
Please visit the following URL in your browser: http://localhost:8080

Expected results:

For step 2, per the https://redhat-internal.slack.com/archives/C060D1W96LB/p1711624413149659?thread_ts=1710836566.326359&cid=C060D1W96LB discussion, separate the trust, for example:

$ oc login api-server --oidc-certificate-authority=$ISSUER_CA_FILE [--certificate-authority=hostedcluster_kubeconfig_ca.crt]

The [--certificate-authority=hostedcluster_kubeconfig_ca.crt] should be optional if it is included in $KUBECONFIG's certificate-authority-data already.

Description of problem:
Introduce an --issuer-url flag in oc login.

Version-Release number of selected component (if applicable):

[xxia@2024-03-01 21:03:30 CST my]$ oc version --client
Client Version: 4.16.0-0.ci-2024-03-01-033249
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
[xxia@2024-03-01 21:03:50 CST my]$ oc get clusterversion
NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.ci-2024-02-29-213249   True        False         8h      Cluster version is 4.16.0-0.ci-2024-02-29-213249

How reproducible:

Always

Steps to Reproduce:

1. Launch fresh HCP cluster.

2. Login to https://entra.microsoft.com. Register application and set properly.

3. Prepare variables.
HC_NAME=hypershift-ci-267920
MGMT_KUBECONFIG=/home/xxia/my/env/xxia-hs416-2-267920-4.16/kubeconfig
HOSTED_KUBECONFIG=/home/xxia/my/env/xxia-hs416-2-267920-4.16/hypershift-ci-267920.kubeconfig
AUDIENCE=7686xxxxxx
ISSUER_URL=https://login.microsoftonline.com/64dcxxxxxxxx/v2.0
CLIENT_ID=7686xxxxxx
CLIENT_SECRET_VALUE="xxxxxxxx"
CLIENT_SECRET_NAME=console-secret

4. Configure HC without oauthMetadata.
[xxia@2024-03-01 20:29:21 CST my]$ oc create secret generic console-secret -n clusters --from-literal=clientSecret=$CLIENT_SECRET_VALUE --kubeconfig $MGMT_KUBECONFIG

[xxia@2024-03-01 20:34:05 CST my]$ oc patch hc $HC_NAME -n clusters --kubeconfig $MGMT_KUBECONFIG --type=merge -p="
spec:
  configuration: 
    authentication: 
      oauthMetadata:
        name: ''
      oidcProviders:
      - claimMappings:
          groups:
            claim: groups
            prefix: 'oidc-groups-test:'
          username:
            claim: email
            prefixPolicy: Prefix
            prefix:
              prefixString: 'oidc-user-test:'
        issuer:
          audiences:
          - $AUDIENCE
          issuerURL: $ISSUER_URL
        name: microsoft-entra-id
        oidcClients:
        - clientID: $CLIENT_ID
          clientSecret:
            name: $CLIENT_SECRET_NAME
          componentName: console
          componentNamespace: openshift-console
      type: OIDC
"

Wait pods to renew:
[xxia@2024-03-01 20:52:41 CST my]$ oc get po -n clusters-$HC_NAME --kubeconfig $MGMT_KUBECONFIG --sort-by metadata.creationTimestamp
...
certified-operators-catalog-7ff9cffc8f-z5dlg          1/1     Running   0          5h44m
kube-apiserver-6bd9f7ccbd-kqzm7                       5/5     Running   0          17m
kube-apiserver-6bd9f7ccbd-p2fw7                       5/5     Running   0          15m
kube-apiserver-6bd9f7ccbd-fmsgl                       5/5     Running   0          13m
openshift-apiserver-7ffc9fd764-qgd4z                  3/3     Running   0          11m
openshift-apiserver-7ffc9fd764-vh6x9                  3/3     Running   0          10m
openshift-apiserver-7ffc9fd764-b7znk                  3/3     Running   0          10m
konnectivity-agent-577944765c-qxq75                   1/1     Running   0          9m42s
hosted-cluster-config-operator-695c5854c-dlzwh        1/1     Running   0          9m42s
cluster-version-operator-7c99cf68cd-22k84             1/1     Running   0          9m42s
konnectivity-agent-577944765c-kqfpq                   1/1     Running   0          9m40s
konnectivity-agent-577944765c-7t5ds                   1/1     Running   0          9m37s

5. Check console login and oc login.
$ export KUBECONFIG=$HOSTED_KUBECONFIG
$ curl -ksS $(oc whoami --show-server)/.well-known/oauth-authorization-server
{
"issuer": "https://:0",
"authorization_endpoint": "https://:0/oauth/authorize",
"token_endpoint": "https://:0/oauth/token",
...
}
Check console login: it succeeds, and the console's upper-right corner correctly shows the user name oidc-user-test:xxia@redhat.com.

Check oc login:
$ rm -rf ~/.kube/cache/oc/
$ oc login --exec-plugin=oc-oidc --client-id=$CLIENT_ID --client-secret=$CLIENT_SECRET_VALUE --extra-scopes=email --callback-port=8080
error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused
error: oidc authenticator error: oidc discovery error: Get "https://:0/.well-known/openid-configuration": dial tcp :0: connect: connection refused
Unable to connect to the server: getting credentials: exec: executable oc failed with exit code 1

Actual results:

Console login succeeds. oc login fails.

Expected results:

oc login should also succeed.

Additional info:{}

Feature Overview (aka. Goal Summary)

When the internal oauth-server and oauth-apiserver are removed and replaced with an external OIDC issuer (like Azure AD), the console must work for human users of the external OIDC issuer.

Goals (aka. expected user outcomes)

An end user can use the openshift console without a notable difference in experience. This must eventually work on both hypershift and standalone, but hypershift is the first priority if it impacts delivery.

Requirements (aka. Acceptance Criteria):

  1. User can log in and use the console
  2. User can get a kubeconfig that functions on the CLI with matching oc
  3. Both of those work on hypershift
  4. both of those work on standalone.

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • When the oauthclient API is not present, the operator must stop creating the oauthclient

Why is this important?

  • This is preventing the operator from creating its deployment
  •  

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

The console needs to be able to authenticate against an external OIDC IdP. For that, the console-operator needs to configure the console accordingly.

AC:

  • bump console-operator API pkg
  • sync oauth client secret from OIDC configuration for auth type OIDC
  • add OIDC config to the console configmap
  • add auth server CA to the deployment annotations and volumes when auth type OIDC
  • consume OIDC configuration in the console configmap and deployment
  • fix roles for oauthclients and authentications watching

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • When installed with external OIDC, the clientID and clientSecret need to be configurable to match the external (and unmanaged) OIDC server

Why is this important?

  • Without a configurable clientID and secret, I don't think the console can identify the user.
  • There must be a mechanism to do this on both hypershift and openshift, though the API may be very similar.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)

Enable a "Break Glass Mechanism" in ROSA (Red Hat OpenShift Service on AWS) and other OpenShift cloud-services in the future (e.g., ARO and OSD) to provide customers with an alternative method of cluster access via short-lived certificate-based kubeconfig when the primary IDP (Identity Provider) is unavailable.

Goals (aka. expected user outcomes)

  • Enhance cluster reliability and operational flexibility.
  • Minimize downtime due to IDP unavailability or misconfiguration.
  • The primary personas here are OpenShift Cloud Services Admins and SREs as part of the shared responsibility.
  • This will be an addition to the existing ROSA IDP capabilities.

Requirements (aka. Acceptance Criteria)

  • Enable the generation of short-lived client certificates for emergency cluster access (a generic sketch of such a flow follows this list).
  • Ensure certificates are secure and conform to industry standards.
  • Functionality to invalidate short-lived certificates in case of an exploit.
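
For context, a generic sketch of how a short-lived, certificate-based kubeconfig can be produced with standard Kubernetes CSR machinery. This only illustrates the mechanism, not the actual ROSA/OCM implementation; all names and durations are placeholders:

# Generate a client key and CSR for a temporary break-glass identity.
$ openssl req -new -newkey rsa:4096 -nodes -keyout break-glass.key \
    -subj "/O=break-glass/CN=break-glass-admin" -out break-glass.csr
# Submit the CSR with a 24h lifetime, approve it, and retrieve the signed certificate.
$ oc apply -f - <<EOF
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: break-glass-admin
spec:
  request: $(base64 -w0 break-glass.csr)
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400
  usages: ["client auth"]
EOF
$ oc adm certificate approve break-glass-admin
$ oc get csr break-glass-admin -o jsonpath='{.status.certificate}' | base64 -d > break-glass.crt
$ oc config set-credentials break-glass-admin --client-certificate=break-glass.crt --client-key=break-glass.key
# Access is then granted via RBAC, e.g. a cluster-admin ClusterRoleBinding for the break-glass-admin user.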

Better UX

  • User Interface within OCM to facilitate the process.
  • SHOULD have audit capabilities.
  • Minimal latency when generating and using certificates (to reduce time without access to cluster).

 Use Cases (Optional)

  • A customer's IDP is down, but they successfully use the break-glass feature to gain cluster access.
  • SREs use their own break-glass feature to perform critical operations on a customer's cluster.

Questions to Answer (Optional)

  • What is the lifetime of generated certificates? 7 days life and 1 day rotation?
  • What security measures are in place for certificate generation and storage?
  • What are the audit requirements?

Out of Scope

  • Replacement of primary IDP functionality.
  • Use of break-glass mechanism for routine operations (i.e., this is emergency/contingency mechanism)

 Customer Considerations

  • The feature is not a replacement for the primary IDP.
  • Customers must understand the security implications of using short-lived certificates.

Documentation Considerations

  • How-to guides for using the break-glass mechanism.
  • FAQs addressing common concerns and troubleshooting.
  • Update existing ROSA IDP documentation to include this new feature.

Interoperability Considerations

  • Compatibility with existing ROSA, OSD (OpenShift Dedicated), and ARO (Azure Red Hat OpenShift) features.
  • Interoperability tests should include scenarios where both IDP and break-glass mechanism are engaged simultaneously for access.

Goal

  • Be able to provide the customer of managed OpenShift with a cluster-admin kubeconfig that allows them to access the cluster in the event that their identity provider (IdP or external OIDC) becomes unavailable or misconfigured.

Why is this important?

  • Managed OpenShift customers need to be able to access/repair their clusters without RH intervention (i.e. opening a ticket) in the case of an identity provider outage or misconfiguration.

Scenarios

  1. ...

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. Coordination with OCM to make the kubeconfig available to customers upon request

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a (user persona), I want to be able to:

  • Capability 1
  • Capability 2
  • Capability 3

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview (aka. Goal Summary)

As part of the deprecation progression of the openshift-sdn CNI plug-in, remove it as an install-time option for new 4.15+ release clusters.
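
For illustration, the networking stanza of a newly installed 4.15+ cluster's install-config.yaml must therefore use OVNKubernetes (a minimal sketch; other networking fields and defaults omitted):

networking:
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16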

Goals (aka. expected user outcomes)

The openshift-sdn CNI plug-in is sunsetting according to the following progression:

  1. deprecation notice delivered at 4.14 (Release Notes, What's Next presentation)
  2. removal as an install-time option at 4.15+
  3. removal as an option and EOL support at 4.17 GA

Requirements (aka. Acceptance Criteria):

  • The openshift-sdn CNI plug-in will no longer be an install-time option for newly installed 4.15+ clusters across installation options.
  • Customer clusters currently using openshift-sdn that upgrade to 4.15 or 4.16 with openshift-sdn will remain fully supported.
  • EUS customers using openshift-sdn on an earlier release (e.g. 4.12 or 4.14) will still be able to upgrade to 4.16 and receive full support of the openshift-sdn plug-in.

Questions to Answer (Optional):

  • Will clusters using openshift-sdn and upgrading from earlier versions to 4.15 and 4.16 still be supported?
    • YES
  • My customer has a hard requirement for the ability to install openshift-sdn 4.15 clusters. Is there any exceptions to support that?
    • Customers can file a Support Exception for consideration, and the reason for the requirement (expectation: rare) must be clarified.

Out of Scope

Background

All development effort is directed to the default primary CNI plug-in, ovn-kubernetes, which has feature parity with the older openshift-sdn CNI plug-in that has been feature frozen for the entire 4.x timeframe. In order to best serve our customers now and in the future, we are reducing our support footprint to the dominant plug-in, only.

Documentation Considerations

  • Product Documentation updates to reflect the install-time option change.

Epic Goal

The openshift-sdn CNI plug-in is sunsetting according to the following progression:

  1. deprecation notice delivered at 4.14 (Release Notes, What's Next presentation)
  2. removal as an install-time option at 4.15+
  3. removal as an option and EOL support at 4.17 GA

Why is this important?

All development effort is directed to the default primary CNI plug-in, ovn-kubernetes, which has feature parity with the older openshift-sdn CNI plug-in that has been feature frozen for the entire 4.x timeframe. In order to best serve our customers now and in the future, we are reducing our support footprint to the dominant plug-in, only. 

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • The openshift-sdn CNI plug-in will no longer be an install-time option for newly installed 4.15+ clusters across installation options.
  • Customer clusters currently using openshift-sdn that upgrade to 4.15 or 4.16 with openshift-sdn will remain fully supported.
  • EUS customers using openshift-sdn on an earlier release (e.g. 4.12 or 4.14) will still be able to upgrade to 4.16 and receive full support of the openshift-sdn plug-in.

Open questions::

  • Will clusters using openshift-sdn and upgrading from earlier versions to 4.15 and 4.16 still be supported?
    • YES
  • My customer has a hard requirement for the ability to install openshift-sdn 4.15 clusters. Is there any exceptions to support that?
    • Customers can file a Support Exception for consideration, and the reason for the requirement (expectation: rare) must be clarified.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

The openshift-sdn CNI plug-in is sunsetting according to the following progression:

  1. deprecation notice delivered at 4.14 (Release Notes, What's Next presentation)
  2. removal as an install-time option at 4.15+
  3. removal as an option and EOL support at 4.17 GA

Why is this important?

All development effort is directed to the default primary CNI plug-in, ovn-kubernetes, which has feature parity with the older openshift-sdn CNI plug-in that has been feature frozen for the entire 4.x timeframe. In order to best serve our customers now and in the future, we are reducing our support footprint to the dominant plug-in, only. 

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • The openshift-sdn CNI plug-in will no longer be an install-time option for newly installed 4.15+ clusters across installation options.
  • Customer clusters currently using openshift-sdn that upgrade to 4.15 or 4.16 with openshift-sdn will remain fully supported.
  • EUS customers using openshift-sdn on an earlier release (e.g. 4.12 or 4.14) will still be able to upgrade to 4.16 and receive full support of the openshift-sdn plug-in.

Open questions::

  • Will clusters using openshift-sdn and upgrading from earlier versions to 4.15 and 4.16 still be supported?
    • YES
  • My customer has a hard requirement for the ability to install openshift-sdn 4.15 clusters. Is there any exceptions to support that?
    • Customers can file a Support Exception for consideration, and the reason for the requirement (expectation: rare) must be clarified.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a (user persona), I want to be able to:

  • Have OpenShiftSDN (the openshift-sdn CNI plug-in) no longer be an option for networkType, making OVNKubernetes the only supported value for the network

so that I can achieve

  • The removal of the openshift-sdn CNI plug-in at install-time for 4.15+

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview

Users of the OpenShift Console leverage a streamlined, visual experience when discovering and installing OLM-managed operators in clusters that run on cloud providers with support for short-lived token authentication enabled. Users intuitively become aware when this is the case and are put on the happy path to configure OLM-managed operators with the necessary information to support GCP Workload Identity Federation (WIF).

 

Goals:

Customers do not need to re-learn how to enable GCP WIF authentication support for each and every OLM-managed operator that supports it. The experience is standardized and repeatable, so customers spend less time on initial configuration and more time implementing business value. The process is so easy that OpenShift is perceived as an enabler for an increased security posture.

 

Requirements:

  • based on OCPSTRAT-922, the installation and configuration experience for any OLM-managed operator using short-lived token authentication is streamlined using the OCP console in the form of a guided process that avoids misconfiguration or unexpected behavior of the operators in question
  • the OCP Console helps in detecting when the cluster itself is already using GCP WIF for core functionality
  • the OCP Console helps discover operators capable of GCP WIF authentication and their IAM permission requirements
  • the OCP Console has a filtering capability for operators capable of GCP WIF authentication in the main catalog tile view
  • the OCP Console drives the collection of the required information for GCP WIF authentication at the right stages of the installation process and stops the process when the information is not provided
  • the OCP Console implements this process with minimal differences across different cloud providers and is capable of adjusting the terminology depending on the cloud provider that the cluster is running on

 

Use Cases:

  • A cluster admin browses the OperatorHub catalog and looks at the details view of a particular operator, where they discover that the cluster is configured for GCP WIF
  • A cluster admin browsing the OperatorHub catalog content can filter for operators that support the GCP WIF flow described in OCPSTRAT-922
  • A cluster admin reviewing the details of a particular operator in the OperatorHub view can discover that this operator supports GCP WIF authentication
  • A cluster admin installing a particular operator can get information about the GCP IAM permission requirements the operator has
  • A cluster admin installing a particular operator is asked to provide GCP ServiceAccount that is required for GCP WIF prior to the actual installation step and is prevented from continuing without this information
  • A cluster admin reviewing an installed operator with support for GCP WIF can discover the related CredentialRequest object that the operator created in an intuitive way (not generically via related objects that have an ownership reference or as part of the InstallPlan)

Out of Scope

  • update handling and blocking in case of increased permission requirements in the next / new version of the operator
  • more complex scenarios with multiple IAM roles/service principals resulting in multiple CredentialRequest objects used by a single operator

 

Background

The OpenShift Console today provides little to no support for configuring OLM-managed operators for short-lived token authentication. Users are generally unaware whether their cluster runs on a cloud provider and is set up to use short-lived tokens for its core functionality, and they are not aware which operators support this by implementing the respective flows defined in OCPSTRAT-922.

Customer Considerations

Customers may or may not be aware of short-lived token authentication support. They need proper context and pointers to follow-up documentation explaining the general concept and the specific configuration flow the Console supports. It needs to be clear that the Console cannot 100% automate the overall process and that some steps need to be run outside of the cluster/Console using cloud-provider-specific tooling.

Epic Goal

  • Transparently support old and new infrastructure annotations format delivered by OLM-packaged operators

Why is this important?

  • As part of OCPSTRAT-288 we are looking to improve the metadata quality of Red Hat operators in OpenShift
  • via PORTENABLE-525 we are defining a new metadata format that supports the aforementioned initiative with more robust detection of individual infrastructure features via boolean data types

Scenarios

  1. A user can use the OCP console to browse through the OperatorHub catalog and filter for all the existing and new annotations defined in PORTENABLE-525
  2. A user reviewing an operator's detail can see the supported infrastructures transparently regardless if the operator uses the new or the existing annotations format

Acceptance Criteria

  • the new annotation format is supported in operatorhub filtering and operator details pages
  • the old annotation format keeps being supported in operatorhub filtering and operator details pages
  • the console will respect both the old and the new annotations format
  • when an operator denotes data for a particular feature in both the old and the new annotation format, the annotations in the newer format take precedence (see the sketch after this list)
  • the newer infrastructure features from PORTENABLE-525 (tls-profiles and token-auth/*) have no equivalents in the old annotation format, so evaluation does not need to fall back as described in the previous point
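
A minimal sketch of the precedence rule described above. It assumes the old list-style annotation key is operators.openshift.io/infrastructure-features (a JSON array of feature names) and that the new format uses boolean values under the features.operators.openshift.io/ prefix; this is only an illustration of the fallback logic, not the console implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

const (
	oldAnnotation = "operators.openshift.io/infrastructure-features" // assumed old-format key
	newPrefix     = "features.operators.openshift.io/"
)

// supportsFeature reports whether a CSV's annotations advertise a feature.
// The new boolean annotation wins; the old JSON-array annotation is only
// consulted when the new one is absent.
func supportsFeature(annotations map[string]string, feature string) bool {
	if v, ok := annotations[newPrefix+feature]; ok {
		b, err := strconv.ParseBool(v)
		return err == nil && b
	}
	if raw, ok := annotations[oldAnnotation]; ok {
		var features []string
		if err := json.Unmarshal([]byte(raw), &features); err == nil {
			for _, f := range features {
				if f == feature {
					return true
				}
			}
		}
	}
	return false
}

func main() {
	ann := map[string]string{
		oldAnnotation:              `["disconnected"]`,
		newPrefix + "disconnected": "false", // new format overrides the old list
	}
	fmt.Println(supportsFeature(ann, "disconnected")) // false
}
```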

Dependencies (internal and external)

  1. none

Open Questions

  1. due to the non-intrusive nature of this feature, can we ship it in a 4.14.z patch release?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

https://issues.redhat.com/browse/PORTENABLE-525 adds new annotations, including features.operators.openshift.io/token-auth-gcp, but the console does not yet support this new annotation. We need to determine what changes are necessary to support it and implement them.

 

AC: 

 

https://issues.redhat.com/browse/PORTENABLE-525 adds new annotations, including features.operators.openshift.io/tls-profiles, but the console does not yet support this new annotation. We need to determine what changes are necessary to support it and implement them.

AC: 

Address technical debt around self-managed HCP deployments, including but not limited to

  • Include CA ConfigMaps in the trusted bundle for both the CPO and Ignition Server, improving trust and security.
  • Create dual stack clusters through CLI with or without default values, ensuring flexibility and user preference in network management.
  • Utilize CLI commands to disable default sources, enhancing customizability.
  • Benefit from less intrusive remote-write failure modes.
  • ...

Goal

  • Address all the tasks we didn't finish for the GA
  • Collect and track all missing topics for self-managed and agent provider
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The CLI cannot create dual stack clusters with the default values. We need to add the proper flags to enable the HostedCluster to be a dual stack cluster using the default values.

Users are encountering an issue when attempting to "Create hostedcluster on BM+disconnected+ipv6 through MCE." This issue is related to `--enable-uwm-telemetry-remote-write` defaulting to true, which means that in the default disconnected case the remote-write endpoint configured for UWM (e.g. `url: https://infogw.api.openshift.com/metrics/v1/receive` with `minBackoff: 1s`) is not reachable.

So we should look into reporting the issue and remediating it, rather than fataling on it, for disconnected scenarios.
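
A hypothetical sketch of the desired behavior: probe the telemetry remote-write endpoint and, when it is unreachable (as in a disconnected environment), record the problem and continue instead of exiting. Names such as probeRemoteWrite are illustrative, not HyperShift Operator APIs.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"time"
)

// probeRemoteWrite checks whether the remote-write endpoint is reachable.
func probeRemoteWrite(ctx context.Context, url string) error {
	client := &http.Client{Timeout: 10 * time.Second}
	req, err := http.NewRequestWithContext(ctx, http.MethodHead, url, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	resp.Body.Close()
	return nil
}

func main() {
	url := "https://infogw.api.openshift.com/metrics/v1/receive"
	if err := probeRemoteWrite(context.Background(), url); err != nil {
		// Previously this class of failure was fatal; here we only report it
		// so the operator stays functional in disconnected clusters.
		log.Printf("UWM telemetry remote-write endpoint unreachable, continuing without it: %v", err)
		return
	}
	log.Println("remote-write endpoint reachable, enabling UWM telemetry remote write")
}
```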

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

In MCE 2.4, we currently document to disable `--enable-uwm-telemetry-remote-write` if the hosted control plane feature is used in a disconnected environment.

https://github.com/stolostron/rhacm-docs/blob/lahinson-acm-7739-disconnected-bare-[…]s/hosted_control_planes/monitor_user_workload_disconnected.adoc

Once this Jira is fixed, that documentation needs to be removed; users will not need to disable `--enable-uwm-telemetry-remote-write`. The HO is expected to fail gracefully on `--enable-uwm-telemetry-remote-write` and continue to be operational.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • The goal of this epic is to capture all of the work and effort required to update the OpenShift control plane to upstream Kubernetes v1.29

Why is this important?

  • The rebase is a required process for every OCP release so that we can leverage all the new features implemented upstream

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. Following epic captured the previous rebase work of k8s v1.28
    https://issues.redhat.com/browse/STOR-1425 

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

While monitoring the payload job failures, open a parallel openshift/origin bump.

Note: There is a high chance of job failures in the openshift/origin bump until the openshift/kubernetes PR merges, as we only update the tests and not the actual kube.

 

The benefit of opening this PR before the ocp/k8s merge is to identify and fix issues beforehand.

Prev Ref: https://github.com/openshift/origin/pull/28097 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Cluster Infrastructure owned CAPI components should be running on Kubernetes 1.28
  • target is 4.16 since CAPI is always a release behind upstream

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

To align with the 4.16 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.28. This should be done by rebasing/updating as appropriate for the repository

This epic will own all of the usual update, rebase and release chores which must be done during the OpenShift 4.16 timeframe for Custom Metrics Autoscaler, Vertical Pod Autoscaler and Cluster Resource Override Operator

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Cluster Infrastructure owned components should be running on Kubernetes 1.29
  • This includes
    • The cluster autoscaler (+operator)
    • Machine API operator
      • Machine API controllers for:
        • AWS
        • Azure
        • GCP
        • vSphere
        • OpenStack
        • IBM
        • Nutanix
    • Cloud Controller Manager Operator
      • Cloud controller managers for:
        • AWS
        • Azure
        • GCP
        • vSphere
        • OpenStack
        • IBM
        • Nutanix
    • Cluster Machine Approver
    • Cluster API Actuator Package
    • Control Plane Machine Set Operator

Why is this important?

  • ...

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ...

Open questions::

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.16 release, dependencies need to be updated to 1.29. This should be done by rebasing/updating as appropriate for the repository

Epic Goal

The goal of this epic is to upgrade all OpenShift and Kubernetes components that cloud-credential-operator uses to v1.29 which keeps it on par with rest of the OpenShift components and the underlying cluster version.

 

Epic Goal

  • The goal of this epic is to upgrade all OpenShift and Kubernetes components that MCO uses to v1.29 which will keep it on par with rest of the OpenShift components and the underlying cluster version.

Why is this important?

  • Uncover any possible issues with the openshift/kubernetes rebase before it merges.
  • MCO continues using the latest kubernetes/OpenShift libraries and the kubelet, kube-proxy components.
  • MCO e2e CI jobs pass on each of the supported platform with the updated components.

Acceptance Criteria

  • All stories in this epic must be completed.
  • Go version is upgraded for MCO components.
  • CI is running successfully with the upgraded components against the 4.16/master branch.

Dependencies (internal and external)

  1. ART team creating the go 1.29 image for upgrade to go 1.29.
  2. OpenShift/kubernetes repository downstream rebase PR merge.

Open questions::

  1. Do we need a checklist for future upgrades as an outcome of this epic? Yes, updated below.

Done Checklist

  • Step 1 - Upgrade go version to match rest of the OpenShift and Kubernetes upgraded components.
  • Step 2 - Upgrade Kubernetes client and controller-runtime dependencies (can be done in parallel with step 3)
  • Step 3 - Upgrade OpenShift client and API dependencies
  • Step 4 - Update kubelet and kube-proxy submodules in MCO repository
  • Step 5 - CI is running successfully with the upgraded components and libraries against the master branch.

This story relates to this PR https://github.com/openshift/machine-config-operator/pull/4275

A new PR has been opened to investigate the issues found in the original PR (this is the link to the new PR): https://github.com/openshift/machine-config-operator/pull/4306

The original PR exceeded the watch request limits when merged. When discovered, the CNTO team needed to revert it. (see https://redhat-internal.slack.com/archives/C01CQA76KMX/p1711711538689689)

To investigate if exceeding the watch request limit was introduced from the API bump and its associated changes, or the kubeconfig changes, an additional PR was opened just for looking at removing the hardcoded values from the kubelet template, and payload tests were run against it: https://github.com/openshift/machine-config-operator/pull/4270. The payload tests passed, and it was concluded that the watch request limit issue was introduced in the portion of the PR that included the API bump and its associated changes.

It was discovered that the CNTO team was using an outdated form of openshift deps, so they were asked to bump. https://redhat-internal.slack.com/archives/CQNBUEVM2/p1712171079685139?thread_ts=1711712855.478249&cid=CQNBUEVM2

https://github.com/openshift/cluster-node-tuning-operator/pull/990
was opened in the past to address the kube bump (this just merged), and https://github.com/openshift/cluster-node-tuning-operator/pull/1022
was opened as well (still open).

CURRENT STATUS: waiting for https://github.com/openshift/cluster-node-tuning-operator/pull/1022 to merge so we can rerun payload tests against the open revert PR.

User or Developer story

As an MCO developer, I want to pick up the openshift/kubernetes updates for the 1.29 k8s rebase so that the MCO tracks the same k8s version as the rest of the OpenShift 1.29 cluster.

Engineering Details

  • Update the go.mod, go.sum and vendor dependencies to point at the kube 1.29 libraries. This includes all direct Kubernetes-related libraries as well as openshift/api, openshift/client-go, openshift/library-go and openshift/runtime-utils

Acceptance Criteria:

  • All k8s.io related dependencies should be upgraded to 1.29.
  • openshift/api, openshift/client-go, openshift/library-go and openshift/runtime-utils should be upgraded to the latest commit from the master branch
  • All ci tests must be passing

This feature is now re-opened because we want to run z-rollback CI. This feature doesn't block the release of 4.17. This is not going to be exposed as a customer-facing feature and will not be documented within OpenShift documentation. This is strictly going to be covered as a RH Support guided solution with a KCS article providing guidance. A public-facing KCS will basically point to contacting Support for help on z-stream rollback; y-stream rollback is not supported.

NOTE:
Previously this was closed as "won't do" because we didn't have a plan to support y-stream and z-stream rollbacks in standalone OpenShift.
For Single Node OpenShift please check TELCOSTRAT-160. The "won't do" decision came after further discussion with leadership.
Through the e2e tests (https://docs.google.com/spreadsheets/d/1mr633YgQItJ0XhbiFkeSRhdLlk6m9vzk1YSKQPHgSvw/edit?gid=0#gid=0) we have identified a few bugs that need to be resolved before the General Availability (GA) release. Ideally, these should be addressed in the final month before GA, when all features are development complete. However, asking component teams to commit to fixing critical rollback bugs during this time could potentially delay the GA date.

------

 

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

Red Hat Support assisted z-stream rollback from 4.16+

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

Red Hat Support may, at their discretion, assist customers with z-stream rollback once it's determined to be the best option for restoring a cluster to the desired state whenever a z-stream update compromises cluster functionality.

Engineering will take a “no regressions, no promises” approach, ensuring there are no major regressions between z-streams, but not testing specific combinations or addressing case-specific bugs.

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

  • Public Documentation (or KCS?) that explains why we do not advise unassisted z-stream rollback and what to do when a cluster experiences loss of functionality associated with a z-stream upgrade.
  • Internal KCS article that provides a comprehensive plan for troubleshooting and resolving issues introduced after applying a z-stream update, up to and including complete z-stream rollback.
  • Should include alternatives such as limited component rollback (single operator, RHCOS, etc) and workaround options
  • Should include incident response and escalation procedures for all issues incurred during application of a z-stream update so that even if rollback is performed we’re tracking resolution of defects with highest priority
  • Foolproof command to initiate z-stream rollback with Support’s approval, aka a hidden command that ensures we don’t typo the pull spec or initiate A->B->C version changes, only A->B->A
  • Test plan and jobs to ensure that we have high confidence in ability to rollback a z-stream along happy paths
  • Need not be tested on all platforms and configurations, likely metal or vSphere and one foolproof platform like AWS
  • Test should not monitor for disruption since it’s assumed disruption is tolerable during an emergency rollback provided we achieve availability at the end of the operation
  • Engineering agrees to fix bugs which inhibit rollback completion before the current master branch release ships, aka they’ll be filed as blockers for the current master branch release. This means bugs found after 4.N branches may not be fixed until the next release without discussion.

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations (list applicable specific needs; N/A = not applicable):
  • Self-managed: all
  • Multi node, Compact (three node): all
  • Connected and Restricted Network: all
  • Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x): all
  • Release payload only: all
  • Starting with 4.16, including all future releases: all

While this feature applies to all deployments we will only run a single platform canary test on a high success rate platform, such as AWS. Any specific ecosystems which require more focused testing should bring their own testing.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

As an admin who has determined that a z-stream update has compromised cluster functionality I have clear documentation that explains that unassisted rollback is not supported and that I should consult with Red Hat Support on the best path forward.

As a support engineer I have a clear plan for responding to problems which occur during or after a z-stream upgrade, including the process for rolling back specific components, applying workarounds, or rolling the entire cluster back to the previously running z-stream version.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

Should we allow rollbacks whenever an upgrade doesn’t complete? No, not without fully understanding the root cause. If it’s simply a situation where workers are in process of updating but stalled, that should never yield a rollback without credible evidence that rollback will fix that.

Similar to our “foolproof command” to initiate a rollback to the previous z-stream, should we also craft a foolproof command to override select operators to previous z-stream versions? Part of the goal of the foolproof command is to avoid the potential of moving to an unintended version; the same risk applies at the single-operator level, and though the impact would be smaller it could still be catastrophic.

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

Non-HA clusters, Hosted Control Planes – those may be handled via separately scoped features

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

Occasionally clusters either upgrade successfully and encounter issues after the upgrade or may run into problems during the upgrade. Many customers assume that a rollback will fix their concerns but without understanding the root cause we cannot assume that’s the case. Therefore, we recommend anyone who has encountered a negative outcome associated with a z-stream upgrade contact support for guidance.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

It’s expected that customers should have adequate testing and rollout procedures to protect against most regressions, i.e. roll out a z-stream update in pre-production environments where it can be adequately tested prior to updating production environments.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

This is largely a documentation effort, i.e. we should create either a KCS article or new documentation section which describes how customers should respond to loss of functionality during or after an upgrade.
KCS Solution : https://access.redhat.com/solutions/7083335 

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Given that we test as many upgrade configurations as possible and an upgrade may still encounter problems for whatever reason, we should not strive to comprehensively test all configurations for rollback success. We will only test a limited set of platforms and configurations necessary to ensure that we believe the platform is generally able to roll back a z-stream update.

Epic Goal

  • Validate z-stream rollbacks in CI starting with 4.10 by ensuring that a rollback completes unassisted and the e2e test suite passes
  • Provide internal documentation (private KCS article) that explains when this is the best course of action versus working around a specific issue
  • Provide internal documentation (private KCS article) that explains the expected cluster degradation until the rollback is complete
  • Provide internal documentation (private KCS article) outlining the process and any post rollback validation

Why is this important?

  • Even if upgrade success is 100% there's some chance that we've introduced a change which is incompatible with a customer's needs and they desire to roll back to the previous z-stream
  • Previously we've relied on backup and restore here, however due to many problems with time travel, that's only appropriate for disaster recovery scenarios where the cluster is either completely shut down already or it's acceptable to do so while also accepting loss of any workload state change (PVs that were attached after the backup was taken, etc)
  • We believe that we can reasonably roll back to a previous z-stream

Scenarios

  1. Upgrade from 4.10.z to 4.10.z+n
  2. oc adm upgrade rollback-z-stream – initially a hidden command; it will look at the ClusterVersion history and roll back to the previous version if and only if that version is a z-stream away (see the sketch after this list)
  3. Rollback from 4.10.z+n to exactly 4.10.z, during which the cluster may experience degraded service and/or periods of service unavailability but must eventually complete with no further admin action
  4. Must pass 4.10.z e2e testsuite
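
A minimal sketch of the guard the hidden rollback command implies: the target must be exactly the previously running version and may differ from the current one only in the z-stream (same major.minor). This is an illustration of the rule, not the actual oc implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// sameYStream reports whether two versions share the same major.minor.
func sameYStream(a, b string) bool {
	pa := strings.SplitN(a, ".", 3)
	pb := strings.SplitN(b, ".", 3)
	return len(pa) == 3 && len(pb) == 3 && pa[0] == pb[0] && pa[1] == pb[1]
}

// rollbackTarget returns the allowed rollback version given the current
// version and the previous entry from the ClusterVersion history, or an
// error when the rollback would cross a y-stream boundary.
func rollbackTarget(current, previous string) (string, error) {
	if previous == "" || previous == current {
		return "", fmt.Errorf("no previous version to roll back to")
	}
	if !sameYStream(current, previous) {
		return "", fmt.Errorf("refusing rollback from %s to %s: not a z-stream rollback", current, previous)
	}
	return previous, nil
}

func main() {
	if target, err := rollbackTarget("4.10.15", "4.10.14"); err == nil {
		fmt.Println("rolling back to", target) // rolling back to 4.10.14
	}
	if _, err := rollbackTarget("4.11.0", "4.10.14"); err != nil {
		fmt.Println(err) // y-stream rollback is rejected
	}
}
```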

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Fix all bugs listed here
    project = "OpenShift Bugs" AND affectedVersion in( 4.12, 4.14, 4.15) AND labels = rollback AND status not in (Closed ) ORDER BY status DESC

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

  1. At least today we intend to only surface this process internally and work through it with customers actively engaged with support; where do we put that?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (Goal Summary):

Identify and fill gaps related to CI/CD for HyperShift-ARO integration. 

Goals (Expected User Outcomes):

  • Establish a consistent and automated testing pipeline that reduces the risk of regressions and improves code quality.
  • Enhance the user experience by ensuring that HyperShift's integration with Azure is thoroughly tested and reliable.

Requirements (Acceptance Criteria):

  • Develop and integrate an Azure-specific testing suite that can be run as part of the presubmit checks.
  • Implement a schedule for periodic conformance tests on ARO. 
  • Maintain documentation that guides contributors on how to write and run tests within the ARO ecosystem.

Use Cases

  • A developer submits a pull request to HyperShift’s codebase, which automatically triggers the Azure-specific presubmit tests.
  • Scheduled conformance tests run automatically at predetermined intervals, providing ongoing assurance of ARO's integration with Azure.

We want to make sure ARO/HCP development happens while satisfying e2e expectations

Acceptance Criteria:

There's a running, blocking test for azure in presubmits

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

  • Discussed creating an Azure HC on our root CI cluster, which would be used as our Azure mgmt cluster. We would need to:
    • Manually create the Azure HC on the root CI 
    • Capture the manifests for that and put them in contrib
    • Configure the IDP
    • Setup credentials 

This does not require a design proposal.
This does not require a feature gate.

User Story:

As a (user persona), I want to be able to:

  • Create ARO HCs off an ARO MGMT cluster

so that I can achieve

  • Better Dev flow
  • Enable pre-submit e2e testing for ARO dev work

Acceptance Criteria:

Description of criteria:

  • ARO mgmt cluster exists.
  • Documents to reproduce mgmt cluster env exist.

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Acceptance Criteria:

Description of criteria:

  • Conformance tests run periodically for Azure

This does not require a design proposal.
This does not require a feature gate.

Currently, when the resource group is created it will be deleted after 24 hours. We can change this by setting an expiry tag on the resource group once it is created.

The ability to set this expiry tag when creating the resource group as part of the cluster creation command would be nice (see the sketch below).
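
A hedged sketch of the idea: compute an expiry timestamp at cluster-creation time and attach it as a tag on the resource group so the 24-hour cleanup can honor a longer lifetime. The tag key and the flag wiring are assumptions, not the actual HyperShift CLI behavior.

```go
package main

import (
	"fmt"
	"time"
)

// resourceGroupTags builds the tag set for a new resource group, adding an
// expiry tag when a non-zero lifetime is requested.
func resourceGroupTags(base map[string]string, lifetime time.Duration) map[string]string {
	tags := map[string]string{}
	for k, v := range base {
		tags[k] = v
	}
	if lifetime > 0 {
		tags["expirationDate"] = time.Now().Add(lifetime).UTC().Format(time.RFC3339) // assumed tag key
	}
	return tags
}

func main() {
	tags := resourceGroupTags(map[string]string{"owner": "hypershift-ci"}, 72*time.Hour)
	fmt.Println(tags)
}
```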

Feature Overview

OCP 4 clusters still maintain pinned boot images. We have numerous clusters installed that have boot media pinned to first-boot images as early as 4.1. In the future these boot images may not be certified by the OEM and may fail to boot on updated datacenter or cloud hardware platforms. These "pinned" boot images should be updatable so that customers can avoid this problem and, better still, scale out nodes with boot media that matches the running cluster version.

Phase one: GCP tech preview

The bootimage references are currently saved off in the machineset by the openshift installer and are thereafter unmanaged. This machineset object is not updated on an upgrade, so any node scaled up using it will boot up with the original “install” bootimage.

The “new” boot image references are available in the configmap/coreos-bootimages in the MCO namespace. Here is the PR that implemented this; it is basically a CVO manifest that pulls from this file in the installer binary. Hence, they are updated on an upgrade. They can also be printed to the console by running openshift-install coreos print-stream-json.

Implementing this portion should be as simple as iterating through each machineset and updating the disk image by cross-referencing the configmap against the architecture, region and platform used in the machineset. This is where the installer figures out the bootimage during an install, so we could model this a bit after it.

It looks like we have Machine API objects for every platform-specific providerSpec (formerly called providerConfig) we support here. We'd still have to special-case the actual image/AMI portion of this, but we should be able to leverage some of the work done in the installer (to generate machinesets, for example for GCP) to understand how the image reference is stored for every platform. A sketch of the GCP case follows.
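
An illustrative sketch of the machineset update for GCP: decode the providerSpec, replace the boot disk image with the reference taken from the coreos-bootimages configmap, and re-encode. The "disks" and "image" field names are assumed to mirror the GCP provider spec layout; the real controller would use the typed Machine API structs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateGCPBootImage rewrites the image on every disk in a raw GCP providerSpec.
func updateGCPBootImage(providerSpec []byte, newImage string) ([]byte, error) {
	var spec map[string]interface{}
	if err := json.Unmarshal(providerSpec, &spec); err != nil {
		return nil, err
	}
	disks, ok := spec["disks"].([]interface{})
	if !ok {
		return nil, fmt.Errorf("providerSpec has no disks field")
	}
	for _, d := range disks {
		if disk, ok := d.(map[string]interface{}); ok {
			disk["image"] = newImage
		}
	}
	return json.Marshal(spec)
}

func main() {
	old := []byte(`{"machineType":"n1-standard-4","disks":[{"boot":true,"image":"projects/rhcos-cloud/global/images/rhcos-41"}]}`)
	updated, err := updateGCPBootImage(old, "projects/rhcos-cloud/global/images/rhcos-416")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(updated))
}
```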

Done when:

For MVP, the goal is to

  • add a new sub-controller within the MCC. This sub-controller can be triggered by a listener on the machinesets and by any changes to the "golden" configmap mentioned above
  • We'll support GCP to start. I'll make a follow-up card for the other platforms, but I'm open to adding more here if needed! 

Feature Overview (Goal Summary)

This feature is dedicated to enhancing data security and implementing encryption best practices across control-planes, Etcd, and nodes for HyperShift with Azure. The objective is to ensure that all sensitive data, including secrets is encrypted, thereby safeguarding against unauthorized access and ensuring compliance with data protection regulations.

 

User Story:

  • As a service provider/consumer I want to make sure secrets are encrypted with a key owned by the consumer

Acceptance Criteria:

Expose and propagate input for KMS secret encryption, similar to what we do in AWS.

https://github.com/openshift/hypershift/blob/90aa44d064f6fe476ba4a3f25973768cbdf05eb5/api/v1beta1/hostedcluster_types.go#L1765-L1790

 

See related discussion:

https://redhat-internal.slack.com/archives/CCV9YF9PD/p1696950850685729

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As a (user persona), I want to be able to:

  • Capability 1
  • Capability 2
  • Capability 3

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As a user of HCP on Azure, I would like to be able to pass a customer-managed key when creating a HC so that the disks for the VMs in the NodePool are encrypted.

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As a user of HCP on Azure, I want to be able to provide a DiskEncryptionSet ID to encrypt the OS disks for the VMs in the NodePool so that the data on the OS disks will be protected by encryption.

Acceptance Criteria:

Description of criteria:

  • Upstream documentation added on what is needed for the Azure Key Vault and how to encrypt the OS disks through both the CLI and the CR spec.
  • HyperShift CLI lets a user provide a DiskEncryptionSet ID to encrypt the OS disk.
  • Ability to encrypt the OS disks through the HyperShift CLI.
  • Ability to encrypt the OS disks through the HC CR.
  • Any applicable unit tests.

Out of Scope:

N/A

Engineering Details:

Epic Goal*

There was an epic / enhancement to create a cluster-wide TLS config that applies to all OpenShift components:

https://issues.redhat.com/browse/OCPPLAN-4379
https://github.com/openshift/enhancements/blob/master/enhancements/kube-apiserver/tls-config.md

For example, this is how KCM sets --tls-cipher-suites and --tls-min-version based on the observed config:

https://issues.redhat.com/browse/WRKLDS-252
https://github.com/openshift/cluster-kube-controller-manager-operator/pull/506/files

The cluster admin can change the config based on their risk profile, but if they don't change anything, there is a reasonable default.

We should update all CSI driver operators to use this config. Right now we have a hard-coded cipher list in library-go. See OCPBUGS-2083 and OCPBUGS-4347 for background context.
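 
A minimal sketch of the consumption side, not library-go's actual observer code: given the cipher suites and minimum TLS version observed from the cluster TLSSecurityProfile, build the --tls-cipher-suites / --tls-min-version arguments for a sidecar, falling back to defaults when nothing is observed. The default values shown are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// tlsArgs builds the TLS command-line arguments from an observed profile.
func tlsArgs(observedCiphers []string, observedMinVersion string) []string {
	// Illustrative fallbacks used when the admin has not set a TLSSecurityProfile.
	ciphers := []string{
		"TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
		"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
	}
	minVersion := "VersionTLS12"
	if len(observedCiphers) > 0 {
		ciphers = observedCiphers
	}
	if observedMinVersion != "" {
		minVersion = observedMinVersion
	}
	return []string{
		"--tls-cipher-suites=" + strings.Join(ciphers, ","),
		"--tls-min-version=" + minVersion,
	}
}

func main() {
	fmt.Println(tlsArgs(nil, ""))                                            // fallback values
	fmt.Println(tlsArgs([]string{"TLS_AES_128_GCM_SHA256"}, "VersionTLS13")) // observed profile wins
}
```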

 
Why is this important? (mandatory)

This will keep the cipher list consistent across many OpenShift components. If the default list is changed, we get that change "for free".

It will reduce support calls from customers and backport requests when the recommended defaults change.

It will provide flexibility to the customer, since they can set their own TLS profile settings without requiring code change for each component.

 
Scenarios (mandatory) 

As a cluster admin, I want to use TLSSecurityProfile to control the cipher list and minimum TLS version for all CSI driver operator sidecars, so that I can adjust the settings based on my own risk assessment.

 
Dependencies (internal and external) (mandatory)

None, the changes we depend on were already implemented.

 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation - 
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be "Release Pending" 

 Feature Overview (Goal Summary)

 This feature introduces automatic Etcd snapshot functionality for self-managed hosted control planes, expanding control and flexibility for users. Unlike managed hosted control planes, self-managed environments allow for customized configurations. This feature aims to enable users to leverage any S3-compatible storage for etcd snapshot storage, ensuring high availability and resilience for their OpenShift clusters.
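
A hedged sketch of the periodic snapshot step, assuming direct access to the hosted cluster's etcd endpoint; uploading to S3-compatible storage and snapshot rotation are left out. It uses the etcd v3 client's Snapshot call.

```go
package main

import (
	"context"
	"io"
	"log"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// snapshotEtcd streams an etcd snapshot from the given endpoint to a local file.
func snapshotEtcd(ctx context.Context, endpoint, path string) error {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{endpoint}, DialTimeout: 5 * time.Second})
	if err != nil {
		return err
	}
	defer cli.Close()

	rc, err := cli.Snapshot(ctx)
	if err != nil {
		return err
	}
	defer rc.Close()

	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, rc)
	return err
}

func main() {
	// In a real controller this would run on a schedule and the resulting file
	// would be pushed to the configured S3-compatible bucket.
	if err := snapshotEtcd(context.Background(), "https://etcd-client:2379", "/tmp/etcd-snapshot.db"); err != nil {
		log.Fatalf("snapshot failed: %v", err)
	}
}
```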

Goals (Expected User Outcomes)

  • Primary User Persona: Cluster Service Providers 
  • User Benefit: Enhanced data protection and quicker disaster recovery for Hosted Clusters through automated etcd snapshots.

Requirements (Acceptance Criteria)

  • Automatic Snapshot Creation: Etcd snapshots must be taken automatically at regular intervals.
  • S3 Storage: Support for any S3-compatible storage for snapshot storage.
  • Snapshot Rotation and Retention Policy: Snapshots are rotated/removed after a specified period to manage storage efficiently.
  • Restoration SOP: Standard Operating Procedures for Etcd restoration should be established, targeting a recovery time objective (RTO) of approximately 1 hour at most. Preferably automated as well.
  • Metrics: Track Mean Time to Recovery (MTTR) for improved reliability. Do we have metrics?

 

Goal

  • Prescriptive guide for leveraging OADP/Velero to perform hosted cluster backup and restore

Why is this important?

  • Customers need guidance on how they can perform backup and restore using OADP and what can and can't be done.

Scenarios

  1. Disaster recovery
  2. Migration

Acceptance Criteria

  • Dev - Documentation
  • CI - Test that backs up and restores a non empty hosted cluster
  • QE - covered in Polarion test plan and tests implemented
  • DOCS - Inclusion in the release documentation

Dependencies (internal and external)

  1. OADP
  2. Velero

Open questions:

  1. What is the maximum scope that can be done without plugins?
  2. What is and is not backed up?
  3. Ordering considerations

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

As an engineer I need to perform a spike on backup and restore procedures for HostedClusters using OADP, in order to know whether that could be an alternative or a resource for the new ETCD Backup API for HCP.

Goal:

As an administrator, I would like to use my own managed DNS solution instead of only specific openshift-install supported DNS services (such as AWS Route53, Google Cloud DNS, etc...) for my OpenShift deployment.

 

Problem:

While cloud-based DNS services provide convenient hostname management, there are a number of regulatory (ITAR) and operational constraints that customers face which prohibit the use of those DNS hosting services on public cloud providers.

 

Why is this important:

  • Provides customers with the flexibility to leverage their own custom managed ingress DNS solutions already in use within their organizations.
  • Required for regions like AWS GovCloud in which many customers may not be able to use the Route53 service (available only for commercial customers) for either internal or ingress DNS.
  • OpenShift managed internal DNS solution ensures cluster operation and nothing breaks during updates.

 

Dependencies (internal and external):

 

Prioritized epics + deliverables (in scope / not in scope):

  • Ability to bootstrap cluster without an OpenShift managed internal DNS service running yet
  • Scalable, cluster (internal) DNS solution that's not dependent on the operation of the control plane (in case it goes down)
  • Ability to automatically propagate DNS record updates to all nodes running the DNS service within the cluster
  • Option for connecting the cluster to the customer's ingress DNS solution already in place within their organization

 

Estimate (XS, S, M, L, XL, XXL):

 

Previous Work:

 

Open questions:

 

Link to Epic: https://docs.google.com/document/d/1OBrfC4x81PHhpPrC5SEjixzg4eBnnxCZDr-5h3yF2QI/edit?usp=sharing

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Append an Infra CR, with only the GCP PlatformStatus field (without any other fields, especially the Spec) set with the LB IPs, at the end of the bootstrap ignition. The theory is that when the Infra CR is applied from the bootstrap ignition, the infra manifest is applied first. As we progress through all the other assets in the ignition files, the Infra CR appears again but with only the LB IPs set; that way it will update the existing Infra CR already applied to the cluster.

User Story:

As a (user persona), I want to be able to:

  • Capability 1
  • Capability 2
  • Capability 3

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • At this point in the feature, we would have a working in-cluster CoreDNS pod capable of resolving API and API-Int URLs.

This Epic details the work required to augment this CoreDNS pod to also resolve the *.apps URL. In addition, it will include changes to prevent the Ingress Operator from configuring the cloud DNS after the ingress LBs have been created.

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a (user persona), I want to be able to:

  • Capability 1
  • Capability 2
  • Capability 3

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

  • https://github.com/openshift/api/pull/1685 introduced updates that allow the LB IPs to be added to GCPPlatformStatus along with the state of DNS for the cluster.
  • Update cluster-ingress-operator to add the Ingress LB IPs when DNSType is `ClusterHosted`
  • In this state, within https://github.com/openshift/api/blob/master/operatoringress/v1/types.go, set the DNSManagementPolicy to Unmanaged within the DNSRecordSpec when the DNS manifest has customer-managed DNS enabled (see the sketch after this list).
  • With the DNSManagementPolicy set to Unmanaged, the IngressController should not try to configure DNS records.
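
A small sketch of the decision described above, kept deliberately free of the real openshift/api types: when the cluster DNS is customer managed ("ClusterHosted"), the ingress operator should mark its DNS records Unmanaged so it never tries to publish them to the cloud DNS service.

```go
package main

import "fmt"

// dnsManagementPolicy maps the cluster DNS type to the record management policy.
func dnsManagementPolicy(platformDNSType string) string {
	if platformDNSType == "ClusterHosted" {
		return "Unmanaged"
	}
	return "Managed"
}

func main() {
	fmt.Println(dnsManagementPolicy("ClusterHosted"))   // Unmanaged
	fmt.Println(dnsManagementPolicy("PlatformDefault")) // Managed
}
```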

This requires/does not require a design proposal.
This requires/does not require a feature gate.

 

Feature Overview (aka. Goal Summary)

Stop generating long-lived service account tokens. Long-lived service account tokens are currently generated in order to then create an image pull secret for the internal image registry. This feature calls for using the TokenRequest API to generate a bound service account token for use in the image pull secret.
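
A minimal sketch, assuming in-cluster credentials, of minting a bound service account token with the TokenRequest API instead of relying on a long-lived secret-based token; the namespace, service account name, audience, and expiry used here are illustrative.

```go
package main

import (
	"context"
	"fmt"
	"log"

	authenticationv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	expiration := int64(3600) // one hour; a controller would renew before expiry
	tr := &authenticationv1.TokenRequest{
		Spec: authenticationv1.TokenRequestSpec{
			Audiences:         []string{"https://kubernetes.default.svc"}, // illustrative audience
			ExpirationSeconds: &expiration,
		},
	}
	resp, err := client.CoreV1().ServiceAccounts("my-namespace").CreateToken(context.Background(), "default", tr, metav1.CreateOptions{})
	if err != nil {
		log.Fatal(err)
	}
	// The returned bound token would be embedded into the image pull secret
	// instead of a long-lived service account token.
	fmt.Println(len(resp.Status.Token), "byte bound token issued")
}
```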

Goals (aka. expected user outcomes)

Use TokenRequest API to create image pull secrets. 
Performance benefits:

One fewer secret created per service account. This will result in at least three fewer secrets generated per namespace.

Security benefits:

Eliminates long-lived tokens, which are no longer recommended as they present a possible security risk.

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

The upstream test `ServiceAccounts no secret-based service account token should be auto-generated` was previously patched to allow for the internal image registry's managed image pull secret to be present in the `Secrets` field. This will no longer be the case as of 4.16.

Post merge of API-1644, we can remove the patch entirely.

Executive Summary

Provide mechanisms for the builder service account to be made optional in core OpenShift.

Goals

< Who benefits from this feature, and how? What is the difference between today’s current state and a world with this feature? >

  • Let cluster administrators disable the automatic creation of the "builder" service account when the Build capability is disabled on the cluster. This reduces potential attack vectors for clusters that do not run build or other CI/CD workloads. Example - fleets for mission-critical applications, edge deployments, security-sensitive environments.
  • Let cluster administrators enable/disable the generation of the "builder" service account at will. Applies to new installations with the "Build" capability enabled as well as upgraded clusters. This helps customers who are not able to easily provision new OpenShift clusters and block usage of the Build system through other means (ex: RBAC, 3rd party admission controllers (ex OPA, Kyverno)).

Requirements

Requirement / Notes / Is MVP:

  • Disable the service account controller related to Build/BuildConfig when the Build capability is disabled. Notes: when the API is marked as removed or disabled, stop creating the "builder" service account and its associated RBAC. Is MVP: Yes
  • Option to disable the "builder" service account. Notes: even if the Build capability is enabled, allow admins to disable "builder" service account generation; admins will need to bring their own service accounts/RBAC for builds to work. Is MVP: Yes

(Optional) Use Cases

< What are we making, for who, and why/what problem are we solving?>

  • Build as an installation capability - see WRKLDS-695
  • Disabling the Build system through RBAC or admission controllers. The "builder" service account is the only thing that RBAC and admission control cannot block without significant cluster impact.

Out of scope

<Defines what is not included in this story>

  • Disabling the Build API separately from the capabilities feature

Dependencies

< Link or at least explain any known dependencies. >

  • Build capability: WRKLDS-695
  • Separate controllers for default service accounts: API-1651

Background, and strategic fit

< What does the person writing code, testing, documenting need to know? >

  • In OCP 4.14, "Build" was introduced as an optional installation capability. This means that the BuildConfig API and subsystems are not guaranteed to be present on new clusters.
  • The "builder" service account is granted permission to push images to the OpenShift internal registry via a controller. There is risk that the service account can be used as an attack vector to overwrite images in the internal registry.
  • OpenShift has an existing API to configure the build system. See OCP documentation on the current supported options. The current OCP build test suite includes checks for these global settings. Source code.
  • Customers with larger footprints typically separate "CI/CD clusters" from "application clusters" that run production workloads. This is because CI/CD workloads (and building container images in particular) can have "noisy" consumption of resources that risk destabilizing running applications.

Assumptions

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Customer Considerations

< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

  • Must work for new installations as well as upgraded clusters.

Documentation Considerations

< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >

  • Update the "Build configurations" doc so admins can understand the new feature.
  • Potential updates to the "Understanding BuildConfig" doc to include references to the serviceAccount option in the spec, as well as a section describing the permissions granted to the "builder" service account.

What does success look like?

< Does this feature have doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact?>

QE Contact

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Impact

< If the feature is ordered with other work, state the impact of this feature on the other work>

Related Architecture/Technical Documents

  • Disabling OCM Controllers (slides). Note that the controller names may be a bit out of date once API-1651 is done.
  • Install capabilities - OCP docs

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story (Required)

As a cluster admin trying to disable the Build, DeploymentConfig, and Image Registry capabilities, I want the RBAC controllers for the builder and deployer service accounts and the default image-registry rolebindings disabled when their respective capability is disabled.

<Describes high level purpose and goal for this story. Answers the questions: Who is impacted, what is it and why do we need it? How does it improve the customer's experience?>

Background (Required)

<Describes the context or background related to this story>

In WRKLDS-695, ocm-o was enhanced to disable the Build and DeploymentConfig controllers when the respective capability was disabled. This logic should be extended to include the controllers that set up the service accounts and role bindings for these respective features.
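For context, capabilities are disabled at install time; a minimal install-config sketch is shown below (capability names follow the OCP install capabilities docs for 4.14; the exact baseline set an admin would choose is an assumption here):

capabilities:
  baselineCapabilitySet: None
  additionalEnabledCapabilities:
    # Build, DeploymentConfig and ImageRegistry are omitted, so their
    # controllers (including the service account / RBAC controllers this
    # story covers) should not be started
    - MachineAPI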

Out of scope

<Defines what is not included in this story>

Approach (Required)

<Description of the general technical path on how to achieve the goal of the story. Include details like json schema, class definitions>

    • Needs manual testing (OpenShift cluster deployed with all/some capabilities disabled). 

Dependencies

<Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>

Acceptance Criteria (Mandatory)

  • Build and DeploymentConfig systems remain functional when the respective capability is enabled.
  • Build, DeploymentConfig, and Image-Puller RoleBinding controllers are not started when the respective capability is disabled.

INVEST Checklist

Dependencies identified

Blockers noted and expected delivery timelines set

Design is implementable

Acceptance criteria agreed upon

Story estimated

  • Engineering: 5
  • QE: 2
  • Doc: 2

Legend

Unknown

Verified

Unsatisfied

Done Checklist

  • Code is completed, reviewed, documented and checked in
  • Unit and integration test automation have been delivered and running cleanly in continuous integration/staging/canary environment
  • Continuous Delivery pipeline(s) is able to proceed with new code included
  • Customer facing documentation, API docs etc. are produced/updated, reviewed and published
  • Acceptance criteria are met

Story (Required)

As an OpenShift engineer trying to use capabilities to enable/disable the Build and DeploymentConfig systems, I want to refactor the default rolebindings controller so that each respective capability runs a separate controller.

<Describes high level purpose and goal for this story. Answers the questions: Who is impacted, what is it and why do we need it? How does it improve the customer’s experience?>

Background (Required)

<Describes the context or background related to this story>

OpenShift has a controller that automatically creates role-bindings for service accounts in every namespace. Though only one controller operates, its logic contains forks that are specific to the Build and DeploymentConfig systems.

The goal is to refactor this into separate controllers so that individual ones can be disabled by the cluster-openshift-controller-manager-operator.

Out of scope

<Defines what is not included in this story>

  • Disabling the rolebindings controller via an operator.
  • Cleaning up rolebindings that are "orphaned" if the controller is disabled.

Approach (Required)

<Description of the general technical path on how to achieve the goal of the story. Include details like json schema, class definitions>

Dependencies

<Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>

  • API-1651 - this was refactoring work taken on by the apiserver/auth team.

Acceptance Criteria (Mandatory)

<Describe edge cases to consider when implementing the story and defining tests>

<Provides a required and minimum list of acceptance tests for this story. More is expected as the engineer implements this story>

  • Separate rolebinding controllers exist for the builder and deployer service account rolebindings.
  • Build and DeploymentConfig systems remain functional when the respective capability is enabled.
  • The "image puller" role binding must continue to be created/reconciled.

INVEST Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

  • Eng: 3

Legend

Unknown
Verified
Unsatisfied

Done Checklist

  • Code is completed, reviewed, documented and checked in
  • Unit and integration test automation have been delivered and running cleanly in continuous integration/staging/canary environment
  • Continuous Delivery pipeline(s) is able to proceed with new code included
  • Customer facing documentation, API docs etc. are produced/updated, reviewed and published
  • Acceptance criteria are met

Problem statement

A 5G Core cluster communicates with many other clusters and remote hosts, aka remote sites. These remote sites are added, and sometimes removed temporarily or permanently, from the telecom operator network. As of today, we need to add static routes on OCP nodes to be able to reach those remote sites, and updating static routes for each and every remote-site change is unacceptable for Telco Operators, particularly as they do not have to configure static routes anywhere else in their network: they rely on BGP to propagate routes across it. They want OCP nodes to learn those BGP routes rather than having to create a "script" that translates BGP announcements into an NMState configuration to be pushed onto OCP nodes.

 

More insights in (recording link on the title page) https://docs.google.com/presentation/d/1zW0-wmvtrU7dbApIaNWfvMbSZvLxoSMLa_gzRYn0Y3Y/edit#slide=id.gfb215b3717_0_3299

Vocabulary, definition

VRF: in this feature, a VRF is a Network VRF, not a Linux kernel VRF. A Network VRF is often implemented as a VLAN in a datacenter, but it is a purely logical entity on the routers/DCGW and is not visible on the OCP nodes. Please do not confuse this with another Feature that relies on Linux VRFs to map the Network VRFs on OCP nodes (https://issues.redhat.com/browse/TELCOSTRAT-76).


Feature description, scope

Any Kubernetes object/component learning/announcing routes, including pods, is out of scope for this feature. This feature is about OCP nodes (i.e. the Linux host) learning routes via BGP, and eventually monitoring their next hops with BFD (datacenter router, DCGW, fabric leaf, ...).

Route next hops can be on any OCP node interface (any VLAN, bond, physical NIC): next hops are not necessarily reachable from the baremetal network. This means that we will have one BGP peer per VRF.

We want to be able to learn routes with the same weight/local-preference, translating to ECMP routes on the OCP nodes, and also routes with different weight/local-preference, translating to active/backup routes on OCP nodes. In all cases, we want to be able to monitor the routes via BFD as BGP timeouts are too high for some customers.

Illustration for active/backup - note that there can be more than two routers

 

Illustration for active/active (ECMP) - note that there can be more than two routers

Routing protocol scope

This feature is limited to BGP and BFD, and must support IPv4, IPv6, and the commonly used BGP attributes, typically the ones supported by MetalLB: https://docs.openshift.com/container-platform/4.12/networking/metallb/metallb-configure-bgp-peers.html#nw-metallb-bgppeer-cr_configure-metallb-bgp-peers

The OCP nodes will learn routes but will not announce routes, except the MetalLB ones.
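For illustration, the BGP attributes referenced above map onto the MetalLB BGPPeer CR; a minimal sketch of one peer per VRF with BFD monitoring follows (names, addresses and ASNs are hypothetical):

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: dcgw-vrf-a                # hypothetical peer, one per VRF
  namespace: metallb-system
spec:
  myASN: 64520                    # hypothetical local ASN
  peerASN: 64512                  # hypothetical DCGW/leaf ASN
  peerAddress: 10.0.0.1           # peer reachable on a node interface (VLAN, bond, NIC)
  bfdProfile: fast-failover       # BFD monitoring because BGP timers are too slow
---
apiVersion: metallb.io/v1beta1
kind: BFDProfile
metadata:
  name: fast-failover
  namespace: metallb-system
spec:
  receiveInterval: 300            # milliseconds
  transmitInterval: 300
  detectMultiplier: 3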

Scale requirements

The expectation is to have 3 VRFs and two routers per VRF. We should scale beyond that, in particular for the number of VRFs, but the number of routers per VRF should be 4 at most: best and sane practice is 2, and we do not want to encourage faulty/wrong network designs. Of course, we can amend this Feature's scope in the future if best practices evolve.

Interoperability requirements

FRR is interoperable with all/most existing routers, and as Red Hat is upstream based, interoperability tests are not required but any interoperability issue will be fixed (upstream first, like for any code change at Red Hat).

Assumption

This feature is only relevant for local gateway mode (not shared gateway).

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Have FRR-K8s as another BGP backend for MetalLB
  • Have the MetalLB Operator deploy both MetalLB and the FRR-K8s daemon
  • Have users configure the FRR instance running on each node beyond the capabilities offered by MetalLB

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://issues.redhat.com/browse/CNF-8566

Open questions::

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview

Telecommunications providers look to displace Physical Network Functions (PNFs) with modern Virtual Network Functions (VNFs) at the Far Edge. Single Node OpenShift, as the CaaS layer in the vRAN vDU architecture, must achieve a higher standard with regard to OpenShift upgrade speed and efficiency in comparison to PNFs.

Telecommunications providers currently deploy Firmware-based Physical Network Functions (PNFs) in their RAN solutions. These PNFs can be upgraded quickly due to their monolithic nature and image-based download-and-reboot upgrades. Furthermore, they often have the ability to retry upgrades and to roll back to the previous image if the new image fails. These Telcos are looking to displace PNFs with virtual solutions, but will not do so unless the virtual solutions have comparable operational KPIs to the PNFs.

Goals

Service Downtime

Service (vDU) Downtime is the time when the CNF is not operational and therefore no traffic is passing through the vDU. This has a significant impact as it degrades the customer’s service (5G->4G) or causes an outright service outage. These disruptions are scheduled into Maintenance Windows (MW), but the Telecommunications Operators' primary goal is to keep service running, so getting vRAN solutions with OpenShift to near PNF-like Service Downtime is and always will be a primary requirement.

 

Upgrade Duration

Upgrading OpenShift is only one of many operations that occur during a Maintenance Window. Reducing the CaaS upgrade duration is meaningful to many teams within a Telecommunications Operator's organization, as this duration fits into a larger set of activities that put pressure on the time available for Red Hat software. OpenShift must reduce the upgrade duration significantly to compete with existing PNF solutions.

 

Failure Detection and Remediation

As mentioned above, the Service Downtime disruption duration must be as small as possible; this includes when there are failures. Hardware failures fall into a category called Break+Fix and are covered by TELCOSTRAT-165. Software failures must be detected and remediation must occur.

Detection includes monitoring the upgrade for stalls and failures and remediation would require the ability to rollback to the previously well-known-working version, prior to the failed upgrade.

 

Implicit Requirements

Upgrade To Any Release

The OpenShift product support terms are too short for Telco use cases, in particular vRAN deployments. The risk of Service Downtime drives Telecommunications Operators to a certify-deploy-and-then-don’t-touch model. One specific request from our largest Telco Edge customer is for 4 years of support.

These longer support needs drive a misalignment with the EUS->EUS upgrade path and drive the requirement that the Single Node OpenShift deployment can be upgraded from OCP X.y.z to any future [X+1].[y+1].[z+1], where [X+1] and [y+1] are decided by the Telecommunications Operator depending on timing and the desired feature set, and [z+1] is determined through Red Hat, vDU vendor, and customer maintenance and engineering validation.

 

Alignment with Related Break+Fix and Installation Requirements

Red Hat is challenged with improving multiple OpenShift Operational KPIs by our telecommunications partners and customers. Improved Break+Fix is tracked in TELCOSTRAT-165 and improved Installation is tracked in TELCOSTRAT-38.

 

Seamless Management within RHACM

Whatever methodology achieves the above requirements must ensure that the customer has a pleasant experience via RHACM and Red Hat GitOps. Red Hat’s current install and upgrade methodology is via RHACM and any new technologies used to improve Operational KPIs must retain the seamless experience from the cluster management solution. For example, after a cluster is upgraded it must look the same to a RHACM Operator.

 

Seamless Management when On-Node Troubleshooting

Whatever methodology achieves the above requirements must ensure that a technician troubleshooting a Single Node OpenShift deployment has a pleasant experience. All commands issued on the node must return output as they would before an upgrade was performed.

 

Requirements

Y-Stream

  • CaaS Upgrade must complete in <2 hrs (single MW) (20 mins for PNF)
  • Minimize service disruption (customer impact) < 30 mins [cumulative] (5 mins for PNF)
  • Support In-Place Upgrade without vDU redeployment
  • Support backup/restore and rollback to known working state in case of unexpected upgrade failures

 

Z-Stream

  • CaaS Patch Release (z-release) Upgrade Requirements:
  • CaaS Upgrade must complete <15 mins
  • Minimize service disruption (customer impact) < 5 mins
  • Support In-Place Upgrade without vDU redeployment
  • Support backup/restore and rollback to known working state in case of unexpected upgrade failures

 

Rollback

  • Restore and Rollback functionality must complete in 30 minutes or less.

 

Upgrade Path

  • Allow for upgrades to any future supported OCP release.

 

References

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Run TuneD in a container in one-shot mode and read the output kernel arguments to apply them using a MachineConfig (MC).
This would be run in the bootstrap procedure of the OpenShift Installer, just before the MachineConfigOperator (MCO) procedure here
Initial considerations: https://docs.google.com/document/d/1zUpcpFUp4D5IM4GbM4uWbzbjr57h44dS0i4zP-hek2E/edit
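As a sketch of the output side, the kernel arguments produced by the one-shot TuneD run would be rendered into a MachineConfig along these lines (the name, role and argument values are illustrative):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-master-tuned-kargs            # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  kernelArguments:                       # values taken from the one-shot TuneD output
    - hugepagesz=1G
    - hugepages=4
    - isolcpus=2-7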

 

Webhooks created during bootstrap-in-place will lead to failure in applying their admission subject resources. A permanent solution will be provided by resolving https://issues.redhat.com/browse/OCPBUGS-28550

We need a temporary workaround to unblock the development. This is easiest to do by changing the validating webhook failure policy to "Ignore"

 

Feature goal (what are we trying to solve here?)

A systemd service that runs on the golden image's first boot and configures the following:

 1. networking (the internal IP address requires special attention)

 2. Update the hostname (MGMT-15775)

 3. Execute recert (regenerate certs, cluster name and base domain, MGMT-15533)

 4. Start kubelet

 5. Apply the personalization info:

  1. Pull Secret
  2. Proxy
  3. ICSP
  4. DNS server
  5. SSH keys

DoD (Definition of Done)

  1. The API for configuring the networking, hostname, personalization manifests and executing recert is well defined
  2. The service configures the required attributes and starts a functional OCP cluster
  3. A CI job that reconfigures a golden image and forms a functional cluster

Does it need documentation support?

If the answer is "yes", please make sure to check the corresponding option.

Feature origin (who asked for this feature?)

    • Internal request

    • Catching up with OpenShift

      Reasoning (why it’s important?)

The following features depend on this functionality:

Competitor analysis reference

  • Do our competitors have this feature?
    • Yes, they have it and we can have some reference
    • No, it's unique or explicit to our product
    • No idea. Need to check

Feature usage (do we have numbers/data?)

  • We have no data - the feature doesn’t exist anywhere
  • Related data - the feature doesn’t exist but we have info about the usage of associated features that can help us
    • Please list all related data usage information
  • We have the numbers and can relate to them
    • Please list all related data usage information

Feature availability (why should/shouldn't it live inside the UI/API?)

  • Please describe the reasoning behind why it should/shouldn't live inside the UI/API
  • If it's for a specific customer we should consider using AMS
  • Does this feature exist in the UI of other installers?

In IBI and IBU flows we need a way to change the nodeip-configuration hint file without a reboot and before MCO even starts. In order for MCO to be happy, we need to remove this file from its management; to do that, we will stop using a MachineConfig and move to Ignition.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Currently the behavior is to only use the default security group if no security groups were specified for a NodePool. This makes it difficult to implement additional security groups in ROSA because there is no way to know the default security group on cluster creation. By always appending the default security group, any security groups specified on the NodePool become additional security groups.
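For reference, a hedged sketch of how security groups are specified on a NodePool today (field names follow my reading of the HyperShift NodePool AWS platform spec and should be treated as an assumption; other required NodePool fields are omitted). With the proposed change, the default security group would always be appended to this list, making these purely additional groups:

apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: example-nodepool            # hypothetical
  namespace: clusters
spec:
  clusterName: example              # hypothetical hosted cluster
  replicas: 2
  platform:
    type: AWS
    aws:
      instanceType: m5.xlarge
      securityGroups:
        - id: sg-0123456789abcdef0  # customer-specified group; default group appended automatically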

User Story

As a ROSA HyperShift customer I want to enforce that IMDSv2 is always the default, to ensure that I have the most secure setting by default.

Acceptance Criteria

  • IMDSv2 should be configured in HyperShift clusters by default

Default Done Criteria

  • All existing/affected SOPs have been updated.
  • New SOPs have been written.
  • Internal training has been developed and delivered.
  • The feature has both unit and end to end tests passing in all test
    pipelines and through upgrades.
  • If the feature requires QE involvement, QE has signed off.
  • The feature exposes metrics necessary to manage it (VALET/RED).
  • The feature has had a security review.* Contract impact assessment.
  • Service Definition is updated if needed.* Documentation is complete.
  • Product Manager signed off on staging/beta implementation.

Dates

Integration Testing:
Beta:
GA:

Current Status

GREEN | YELLOW | RED
GREEN = On track, minimal risk to target date.
YELLOW = Moderate risk to target date.
RED = High risk to target date, or blocked and need to highlight potential
risk to stakeholders.

References

Links to Gdocs, github, and any other relevant information about this epic.

Incomplete Features

When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Goal

We want to add a new CLI option named `--all-images` to the `oc adm must-gather` command.
The oc client will scan all the CSVs on the cluster looking for the `operators.openshift.io/must-gather-image=pullUrlOfTheImage` annotation, building a list of must-gather images to be used to collect logs for all the products installed on the cluster; the client will then execute all of them and aggregate the collection results.
Operator authors that want to opt in to this mechanism should explicitly annotate their CSV with the `operators.openshift.io/must-gather-image` annotation.
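A minimal sketch of the opt-in on a hypothetical operator's CSV (only the annotation matters here; the name and image URL are made up):

apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: example-operator.v1.0.0      # hypothetical operator
  annotations:
    operators.openshift.io/must-gather-image: "registry.example.com/example/must-gather:v1.0.0"

With such annotations in place, a cluster admin would run `oc adm must-gather --all-images` once and oc would gather with every discovered image, aggregating the results.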

User Stories

  • As a Cluster Admin, I want to execute `oc adm must-gather` only once, collecting all the logs for all the products deployed on my cluster, without the need to explicitly specify a long (and error-prone) list of images
  • As a Red Hat support engineer, I want to be able to suggest to customers a single, simple command to collect must-gather logs for different products in one pass, reducing the request ping-pong game and thus the case resolution time.

Non-Requirements

  • List of things not included in this epic, to alleviate any doubt raised during the grooming process.

Notes

  • Any additional details or decisions made/needed

Market Problem

As a stakeholder aiming to adopt KubeSaw as a Namespace-as-a-Service solution, I want the project to provide streamlined tooling and a clear code-base, ensuring seamless adoption and integration into my clusters.

Why it Matters

Efficient adoption of KubeSaw, especially as a Namespace-as-a-Service solution, relies on intuitive tooling and a transparent codebase. Improving these aspects will empower stakeholders to effortlessly integrate KubeSaw into their Kubernetes clusters, ensuring a smooth transition to enhanced namespace management.

Illustrative User Stories

As a Stakeholder, I want a streamlined setup of the KubeSaw project and a fully automated way of upgrading this setup along with the updates of the installation.

Expected Outcomes

  • Intuitive and user-friendly tooling for seamless configuration and management of KubeSaw instance.
  • A transparent and well-documented codebase, facilitating a quick understanding of KubeSaw internals.

Effect

The expected outcome within the market is both growth and retention. The improved tooling and codebase will attract new stakeholders (growth) and enhance the experience for existing users (retention) by providing a straightforward path to adopting KubeSaw's Namespace-as-a-Service features in their clusters.

Partner

  • Developer Sandbox
  • Konflux

Additional/tangential areas for future development

  • Integration with popular Kubernetes management platforms and tooling for enhanced interoperability.
  • Regular updates to compatibility matrices to support evolving Kubernetes technologies.
  • Collaboration with stakeholders to gather feedback and continuously improve the integration experience, including advanced namespace management features tailored to user needs.

This epic is to track all the unplanned work related to security incidents, fixing flaky e2e tests, and other urgent and unplanned efforts that may arise during the sprint.

Placeholder feature for ccx-ocp-core maintenance tasks.

This epic tracks "business as usual" requirements / enhancements / bug fixing of Insights Operator.

Description of problem:

InsightsRecommendationActive firing, description link results in "Invalid parameter: redirect_uri" on sso.redhat.com.

Insights recommendation "OpenShift cluster with more or less than 3 control plane node replicas is not supported by Red Hat" with total risk "Moderate" was detected on the cluster. More information is available at https://console.redhat.com/openshift/insights/advisor/clusters/<UID>?first=ccx_rules_ocp.external.rules.control_plane_replicas|CONTROL_PLANE_NODE_REPLICAS.


Version-Release number of selected component (if applicable):

4.15.14

How reproducible:

unknown

Steps to Reproduce:

1. Install 4.15.14 on a cluster that triggers this alert
2. Log out of Red Hat SSO
3. Click the link in the alert description

Actual results:

"Invalid parameter: redirect_uri" on sso.redhat.com

Expected results:

Link successfully navigates through SSO

Additional info:

 

This epic tracks "business as usual" requirements / enhancements / bug fixing of Insights Operator.

Description of problem:

    The operator panics in a HyperShift hosted cluster with OVN and with networking obfuscation enabled:
 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 858 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x26985e0?, 0x454d700})
	/go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0010d67e0?})
	/go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x26985e0, 0x454d700})
	/usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/openshift/insights-operator/pkg/anonymization.getNetworksFromClusterNetworksConfig(...)
	/go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:292
github.com/openshift/insights-operator/pkg/anonymization.getNetworksForAnonymizer(0xc000556700, 0xc001154ea0, {0x0, 0x0, 0x0?})
	/go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:253 +0x202
github.com/openshift/insights-operator/pkg/anonymization.(*Anonymizer).readNetworkConfigs(0xc0005be640)
	/go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:180 +0x245
github.com/openshift/insights-operator/pkg/anonymization.(*Anonymizer).AnonymizeMemoryRecord.func1()
	/go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:354 +0x25
sync.(*Once).doSlow(0xc0010d6c70?, 0x21a9006?)
	/usr/lib/golang/src/sync/once.go:74 +0xc2
sync.(*Once).Do(...)
	/usr/lib/golang/src/sync/once.go:65
github.com/openshift/insights-operator/pkg/anonymization.(*Anonymizer).AnonymizeMemoryRecord(0xc0005be640, 0xc000cf0dc0)
	/go/src/github.com/openshift/insights-operator/pkg/anonymization/anonymizer.go:353 +0x78
github.com/openshift/insights-operator/pkg/recorder.(*Recorder).Record(0xc00075c4b0, {{0x2add75b, 0xc}, {0x0, 0x0, 0x0}, {0x2f38d28, 0xc0009c99c0}})
	/go/src/github.com/openshift/insights-operator/pkg/recorder/recorder.go:87 +0x49f
github.com/openshift/insights-operator/pkg/gather.recordGatheringFunctionResult({0x2f255c0, 0xc00075c4b0}, 0xc0010d7260, {0x2adf900, 0xd})
	/go/src/github.com/openshift/insights-operator/pkg/gather/gather.go:157 +0xb9c
github.com/openshift/insights-operator/pkg/gather.collectAndRecordGatherer({0x2f50058?, 0xc001240c90?}, {0x2f30880?, 0xc000994240}, {0x2f255c0, 0xc00075c4b0}, {0x0?, 0x8dcb80?, 0xc000a673a2?})
	/go/src/github.com/openshift/insights-operator/pkg/gather/gather.go:113 +0x296
github.com/openshift/insights-operator/pkg/gather.CollectAndRecordGatherer({0x2f50058, 0xc001240c90}, {0x2f30880, 0xc000994240?}, {0x2f255c0, 0xc00075c4b0}, {0x0, 0x0, 0x0})
	/go/src/github.com/openshift/insights-operator/pkg/gather/gather.go:89 +0xe5
github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Gather.func2(0xc000a678a0, {0x2f50058, 0xc001240c90}, 0xc000796b60, 0x26f0460?)
	/go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:206 +0x1a8
github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Gather(0xc000796b60)
	/go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:222 +0x450
github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).periodicTrigger(0xc000796b60, 0xc000236a80)
	/go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:265 +0x2c5
github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Run.func1()
	/go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:161 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00007d7c0?, {0x2f282a0, 0xc0012cd800}, 0x1, 0xc000236a80)
	/go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001381fb0?, 0x3b9aca00, 0x0, 0x0?, 0x449705?)
	/go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0xabfaca?, 0x88d6e6?, 0xc00078a360?)
	/go/src/github.com/openshift/insights-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x25
created by github.com/openshift/insights-operator/pkg/controller/periodic.(*Controller).Run
	/go/src/github.com/openshift/insights-operator/pkg/controller/periodic/periodic.go:161 +0x1ea

Version-Release number of selected component (if applicable):

    

How reproducible:

    Enable networking obfuscation for the Insights Operator and wait for gathering to happen in the operator. You will see the above stacktrace. 
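For reference, networking obfuscation is typically enabled through the Insights Operator's support secret; a hedged sketch (the key name follows the obfuscation documentation as I recall it, so treat it as an assumption):

apiVersion: v1
kind: Secret
metadata:
  name: support
  namespace: openshift-config
type: Opaque
stringData:
  enableGlobalObfuscation: "true"    # turns on IP/network obfuscation in gathered data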

Steps to Reproduce:

    1. Create a HyperShift hosted cluster with OVN
    2. Enable networking obfuscation for the Insights Operator
    3. Wait for data gathering to happen in the operator
    

Actual results:

    operator panics

Expected results:

    there's no panic

Additional info:

    
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

By having the OpenShift Insights Operator gathering data from OSP CRDs, this data can be ingested into Insights and plug in into the existing tools and processes under the Insights/CCX teams.

This will allow us to create dedicated Superset dashboards or query the data using SQL via the Trino API to, for example:

  • Tell how many OSP installations we have
  • See trends in eg. the number of nodes per deployment
  • Adoption rate of a particular OSP (>18) release
  • Upgrade/update rates
  • Answer questions such as: how many OSP deployments are using Fibre Channel as their Cinder driver?
  • ... 

Besides, customers will benefit from any Insights rules that we'll be adding over time to for example anticipate issues or detect misconfigurations, suggest parameter tunings, etcetera.

Examples of how OCP Insights uses this data can be seen in the "Let's Do The Numbers" series of monthly presentations.

This epic targets only RHOSO (so OSP 18 and newer). There are no changes nor support for this planned in OSP-17.1 or older.

It is implementation of the solution 1 from the document https://docs.google.com/document/d/1r3sC_7ZU7qkxvafpEkAJKMTmtcWAwGOI6W_SZGkvP6s/edit#heading=h.kfjcs2uvui3g

We need to, based on Yatin Karel's patch, properly integrate our CRs with the insights-operator. It needs to collect data from the 'OpenstackControlPlane', 'OpenstackDataPlaneDeployment' and 'OpenstackDataPlaneNodeSet' CRs with proper anonymization of data like IP addresses, etc. It also needs to set a "good" ID to identify the OpenStack cluster, as we cannot rely on the OpenShift clusterID because we may have more than one OpenStack cluster on the same OCP cluster.

To identify the OpenStack cluster, perhaps the UUID of the OpenstackControlPlane CR can be used. If not, we will need to figure out something else.

Definition of done:

  • insights-operator knows about OpenStack-related CRs, can collect data from them, anonymize that data and send it to the Insights server,
    • it can collect data from CRs from more than one OpenStack cluster and send them separately to the Insights server

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision IBM Cloud VPC infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

  • The IBM Cloud VPC IPI Installer no longer contains or uses Terraform.
  • The new provider should aim to provide the same results and have parity with the existing IBM Cloud VPC Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Feature Overview (aka. Goal Summary)

Enhance Dynamic plugin with similar capabilities as Static page. Add new control and security related enhancements to Static page.

Goals (aka. expected user outcomes)

The Dynamic plugin should list pipelines similar to the current static page.

The Static page should allow users to override task and sidecar task parameters.

The Static page should allow users to control tasks that are setup for manual approval.

The TSSC security and compliance policies should be visible in Dynamic plugin.

Requirements (aka. Acceptance Criteria):

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

As a DevOps Engineer, I want to add manual approval points in my pipeline so that the pipeline pauses at that point and waits for a manual approval before continuing execution. Manual approvals are commonly used for approving deploying to production or modeling activities that are not automated (e.g. manual testing) in the pipeline.

Acceptance Criteria

  • User defines a manual approval point into their pipeline
  • PipelineRuns for user's pipeline pause at the point where manual approval is defined and waits for approval
  • Once user approves the PipelineRun, the PipelineRun continues execution
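A hedged sketch of what such a pipeline could look like, using Tekton's custom task reference mechanism (the approval task's apiVersion/kind and the task names are placeholders, not the shipped API):

apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: build-and-deploy                 # hypothetical pipeline
spec:
  tasks:
    - name: build
      taskRef:
        name: build-image                # hypothetical Task
    - name: wait-for-approval            # the PipelineRun pauses here until approved
      runAfter:
        - build
      taskRef:
        apiVersion: example.dev/v1alpha1 # placeholder custom task API
        kind: ApprovalTask
    - name: deploy-to-production
      runAfter:
        - wait-for-approval
      taskRef:
        name: deploy                     # hypothetical Task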

Description

As a developer, I want to remove the feature of introducing the ApprovalTasks into the Developer Console from the Console Repository as it will be shipped as a Dynamic plugin.

Acceptance Criteria

  1. Delete all the files related to the Approvals feature
  2. Ensure the CustomRunNode remains there.
  3. Fix the Node for the Embedded Pipelines

Additional Details:

Description

As a user, I want a list of all approvals needed for all my pipeline runs. From this page, I can approve or reject if I am an approver for the pipelines.

Acceptance Criteria

  1. Create a new tab named "Approvals" alongside the existing PL, PLR, and Repo tabs on the Pipelines page.
  2. The new tab should render a list table with Search and Filter options. Filtering will be based on the status of approvals.
  3. The Approvals page should list all the PLRs in that namespace that require approval.
  4. Create the Approve/Reject Modals 
  5. Find the way to fetch the current user and check whether the current user is one of the approvers.
  6. Approve/Reject the ApprovalTask with the Modal if the current user is listed as one of the approvers.

Additional Details:

Description of problem:

Update the Pipeline/PipelineRun List and Details Pages to acknowledge Custom Task 

Version-Release number of selected component (if applicable):

4.16.0    

How reproducible:

Always when Custom Task is used

Steps to Reproduce:

    1.Create a Pipeline with a CustomTask like the Approval Task
    2.Check the Tasks List in the Pipeline/Run Details Page
    3.Check the Progress Bar in the PipelineRun List page
    

Actual results:

CustomTask is not recognized: it either throws undefined or always shows the Task as pending.

Expected results:

Just like normal Tasks, Custom Tasks should be fully integrated into the mentioned pages.

Additional info:

    

Description

As a user, I need to properly distinguish between a classic Task and a CustomTask in the Pipeline Topology, once I have created a Pipeline using the YAML view.

Also, I need to see the details of the CustomTask on hovering over the node.

Acceptance Criteria

  1. Design the new CustomTask node
  2. Add the new icons to the console and use them as the Status Icons

Additional Details:

Description

As a developer, you need to look into the recent changes proposed by the UX and apply those changes.

Acceptance Criteria

  1. Update the columns of the Approval List table as per the new design
  2. Add the Approvals Page to the Admin Perspective
  3. Add the new sections in the Approval Details page as per the design
  4. Resolve any other NIT comments made on the previous PR.

Additional Details:

Problem:

With the OpenShift Pipelines operator 1.2x we added support for a dynamic console plugin to the operator. In the first version it is only responsible for the Dashboard and the Pipeline/Repository Metrics tab. We want to move more and more code to the dynamic plugin and later remove it from the console repository.

Goal:

Non-goal

  • We don't plan to change the "Import from Git" flow as part of this Epic.
  • We don't want to remove the code from the console repository yet, so that the console still supports the Pipeline feature with an older Pipelines Operator (one that doesn't have the Pipeline Plugin).

Why is it important?

  • We want to migrate to dynamic plugins so that it's easier to align changes in the console with changes in the operator.
  • In the mid-term this will also reduce the code size and build times of the console project.

Acceptance criteria:

  • The console support for Pipelines (Tekton resources) should still work without and with the new dynamic Pipelines Plugin
  • The following detail pages should be copied* to the dynamic plugin
    1. Pipeline detail page (support for v1beta1 and v1)
    2. PipelineRun detail page (support for v1beta1 and v1)
    3. ClusterTask detail page (support for v1beta1 and v1)
    4. Task detail page (support for v1beta1 and v1)
    5. TaskRun detail page (support for v1beta1 and v1)
    6. EventListener (support only v1alpha1 in the static plugin, should support v1beta1 in the dynamic plugin as well)
    7. ClusterTriggerBinding (support only v1alpha1 in the static plugin, should support v1beta1 in the dynamic plugin as well)
    8. TriggerBinding (support only v1alpha1 in the static plugin, should support v1beta1 in the dynamic plugin as well)
  • The following list pages should be copied to the dynamic plugin:
    1. Pipeline
    2. PipelineRun
    3. ClusterTask
    4. Task
    5. TaskRun
  • We don't want to copy these deprecated resources:
    1. PipelineResource
    2. Condition
  • E2e tests for all list pages should run on the new dynamic pipeline plugin

Dependencies (External/Internal):

  • Changes will happen in the new dynamic pipeline-plugin github.com/openshift-pipelines/console-plugin, but we don't see any external dependencies

Design Artifacts:

Exploration:

Note:

Description of problem:

    Multiple Output tabs are present on the PipelineRun details page if the dynamic Pipeline console-plugin is enabled.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description

As a user,

Acceptance Criteria

  1. Create a flag for list pages to hide them if they are present in the dynamic plugin

Additional Details:

Feature Overview (aka. Goal Summary)

Enable the OCP Console to send back user analytics to our existing endpoints in console.redhat.com. Please refer to doc for details of what we want to capture in the future:

Analytics Doc

Collect desired telemetry of user actions within OpenShift console to improve knowledge of user behavior.

Goals (aka. expected user outcomes)

OpenShift console should be able to send telemetry to a pre-configured Red Hat proxy that can be forwarded to 3rd party services for analysis.

Requirements (aka. Acceptance Criteria):

User analytics should respect the existing telemetry mechanism used to disable data being sent back

Need to update existing documentation with what user data we track from the OCP Console: https://docs.openshift.com/container-platform/4.14/support/remote_health_monitoring/about-remote-health-monitoring.html

Capture and send desired user analytics from OpenShift console to Red Hat proxy

Red Hat proxy to forward telemetry events to appropriate Segment workspace and Amplitude destination

Use existing setting to opt out of sending telemetry: https://docs.openshift.com/container-platform/4.14/support/remote_health_monitoring/opting-out-of-remote-health-reporting.html#opting-out-remote-health-reporting

Also, allow just disabling user analytics without affecting the rest of telemetry: add an annotation to the Console to disable just user analytics

Update docs to show this method as well.
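A hedged example of that annotation on the console operator config (the annotation key matches the check listed under the opt-out acceptance criteria later in this document; how values other than "true" are interpreted is an assumption):

apiVersion: operator.openshift.io/v1
kind: Console
metadata:
  name: cluster
  annotations:
    telemetry.console.openshift.io/DISABLED: "true"   # disables user analytics only; other telemetry is unaffected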

We will require a mechanism to store all the segment values
We need to be able to pass back orgID that we receive from the OCM subscription API call

Use Cases (Optional):

 

Questions to Answer (Optional):

 

Out of Scope

Sending telemetry from OpenShift cluster nodes

Background

Console already has support for sending analytics to segment.io in Dev Sandbox and OSD environments. We should reuse this existing capability, but default to http://console.redhat.com/connections/api for analytics and http://console.redhat.com/connections/cdn to load the JavaScript in other environments. We must continue to allow Dev Sandbox and OSD clusters a way to configure their own segment key, whether telemetry is enabled, segment API host, and other options currently set as annotations on the console operator configuration resource.

Console will need a way to determine the org-id to send with telemetry events. Likely the console operator will need to read this from the cluster pull secret.

Customer Considerations

 

Documentation Considerations

 

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Epic Goal

  • Console should receive “organization.external_id”  from the OCM Subscription API call.
  • We need to store the ORG ID locally and pass it back to Segment.io to enhance the user analytics we are tracking

 

API the Console uses:  
const apiUrl = `/api/accounts_mgmt/v1/subscriptions?page=1&search=external_cluster_id%3D%27${clusterID}%27`;

Reference: Original Console PR

Why is this important?

High Level Feature Details can be found here

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Currently, we can get the organization ID from the OCM server by querying subscription and adding the fetchOrganization=true query parameter based on the comment.

We should be passing this ID as SERVER_FLAG.telemetry.ORGANIZATION_ID to the frontend, and as organizationId to Segment.io

Fetching should be done by the console-operator due to its RBAC permissions. Once the Organization_ID is retrieved, the console operator should set it in console-config.yaml, together with other telemetry variables.
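A minimal sketch of the resulting console-config.yaml stanza, assuming the organization ID is simply added next to the other telemetry keys (the exact key names and layout are an assumption):

apiVersion: console.openshift.io/v1
kind: ConsoleConfig
# ...other console configuration...
telemetry:
  ORGANIZATION_ID: "1234567"    # fetched from the OCM subscription API; illustrative value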

 

AC: 

  • Update console-operator's main controller to check if the telemeter client is available on the cluster, which signals that it is a customer/production cluster
  • Consume the telemetry parameter ORG_ID and pass it as a parameter to Segment in the console
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Based on a discussion with Ali Mobrem and Parag Dave yesterday, we want to hide the analytics option from the cluster configuration.

We keep the option for a cluster admin to activate the opt-in or opt-out banner in the UI.

For clarification: This is an undocumented feature that we keep for customers that might request such a feature.

Meeting notes: https://docs.google.com/document/d/11gxr_7kxMqm1zSJC5pPVHZhvLJthZGLH-QvqCqAyFdc/edit

As a user, I want to opt-in or opt-out from the telemetry.

Whether the cluster prefers opt-in or opt-out should be configurable via SERVER_FLAGS.

Acceptance Criteria

  1. Prompt user to opt in to sending telemetry to Red Hat
  2. Prompt should include link to official RH doc about data collection, privacy, and processing.
  3. Provide settings in user preferences to opt out of sending telemetry
  4. Ensure that NO data is sent to the server before the user opts in or after they opt out.

Additional Details:

Problem:

The console telemetry plugin needs to send data to a new Red Hat ingress point that will then forward it to Segment for analysis. 

Goal:

Update console telemetry plugin to send data to the appropriate ingress point.

Why is it important?

Use cases:

  1. <case>

Acceptance criteria:

  1. Update the URL to the Ingress point created for console.redhat.com
  2. Ensure telemetry data is flowing to the ingress point.

Dependencies (External/Internal):

Ingress point created for console.redhat.com

Design Artifacts:

Exploration:

Note:

Description

As an administrator, I want to disable all telemetry on my cluster including UI analytics sent to segment.

We should honor the existing telemetry configuration so that we send no analytics when an admin opts out of telemetry. See the documentation here:

https://docs.openshift.com/container-platform/4.14/support/remote_health_monitoring/opting-out-of-remote-health-reporting.html#insights-operator-new-pull-secret_opting-out-remote-health-reporting

Simon Pasquier

 yes this is the official supported way to disable telemetry though we also have a hidden flag in the CMO configmap that CI clusters use to disable telemetry (it depends if you want to push analytics for CI clusters).
CMO configmap is set to
data:
  config.yaml: |-
    telemeterClient:
      enabled: false
 

the CMO code that reads the cloud.openshift.com token:
https://github.com/openshift/cluster-monitoring-operator/blob/b7e3f50875f2bb1fed912b23fb80a101d3a786c0/pkg/manifests/config.go#L358-L386

Acceptance Criteria

  • No segment events are sent when
    1. Cluster is a CI cluster, which means at least one of these 2 conditions are met:
      • Cluster pull secret does not have "cloud.openshift.com" credentials [1]
      • Cluster monitoring config has 'telemeterClient: {enabled: false}' [2]
    2. Console operator config telemetry disabled annotation == true [3]
  • Add and update unit tests and also e2e

Additional Details:

Slack discussion https://redhat-internal.slack.com/archives/C0VMT03S5/p1707753976034809

 

# [1] Check cluster pull secret for cloud.openshift.com creds
oc get secret pull-secret -n openshift-config -o json | jq -r '.data.".dockerconfigjson"' | base64 -d | jq -r '.auths."cloud.openshift.com"'

# [2] Check cluster monitoring operator config for 'telemeterClient.enabled == false'
oc get configmap cluster-monitoring-config -n openshift-monitoring -o json | jq -r '.data."config.yaml"'

# [3] Check console operator config telemetry disabled annotation 
oc get console.operator.openshift.io cluster -o json | jq -r '.metadata.annotations."telemetry.console.openshift.io/DISABLED"' 

We want to enable segment analytics by default on all (incl. self-managed) OCP clusters using a known segment API key and the console.redhat.com proxy. We'll still want to honor the segment-related annotations on the console operator config for overriding these values.

Most likely the segment key should be defaulted in the console operator, otherwise we would need a separate console flag for disabling analytics. If the operator provides the key, then the console backend can use the presence of the key to determine when to enable analytics. We can likely change the segment URL and CDN default values directly in the console code, however.

ODC-7517 tracks disabling segment analytics when cluster telemetry is disabled, which is a separate change, but required for this work.

OpenShift UI Telemetry Implementation details
 
These three keys should have new default values:

  1. SEGMENT_API_KEY
  2. SEGMENT_API_HOST
  3. SEGMENT_JS_HOST OR SEGMENT_JS_URL

See https://github.com/openshift/console/blob/master/frontend/packages/console-telemetry-plugin/src/listeners/segment.ts

Defaults:

stringData:
  SEGMENT_API_KEY: BnuS1RP39EmLQjP21ko67oDjhbl9zpNU
  SEGMENT_API_HOST: console.redhat.com/connections/api/v1
  SEGMENT_JS_HOST: console.redhat.com/connections/cdn

AC:

  • Add default values for telemetry annotations into a CM in the openshift-console-operator NS.
  • Add and update unit tests and add e2e tests

The console telemetry plugin needs to send data to a new Red Hat ingress point that will then forward it to Segment for analysis.

For that the telemetry-console-plugin must have options to configure where it loads the analytics.js and where to send the API calls (analytics events).

Feature Overview (aka. Goal Summary)

Volume Group Snapshots is a key new Kubernetes storage feature that allows multiple PVs to be grouped together and snapshotted at the same time. This enables customers to takes consistent snapshots of applications that span across multiple PVs.

This is also a key requirement for backup and DR solutions.

https://kubernetes.io/blog/2023/05/08/kubernetes-1-27-volume-group-snapshot-alpha/

https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/3476-volume-group-snapshot
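As an illustration of the user-facing API, a minimal VolumeGroupSnapshot sketch following the upstream alpha API linked above (group/version and field names may change as the feature graduates; the class and label names are hypothetical):

apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshot
metadata:
  name: app-group-snapshot                             # hypothetical
  namespace: my-app
spec:
  volumeGroupSnapshotClassName: csi-group-snapclass    # hypothetical class provided by the CSI driver
  source:
    selector:
      matchLabels:
        app: my-app                                    # all PVCs carrying this label are snapshotted together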

Goals (aka. expected user outcomes)

Productize the volume group snapshots feature as Tech Preview: provide docs and testing, as well as a feature gate to enable it, so that customers and partners can test it in advance.

Requirements (aka. Acceptance Criteria):

The feature should graduate to beta upstream to become TP in OCP. Tests and CI must pass, and a feature gate should allow customers and partners to easily enable it. We should identify all OCP-shipped CSI drivers that support this feature and configure them accordingly.

Use Cases (Optional):

 

  1. As a storage vendor I want to have early access to the VolumeGroupSnapshot feature to test and validate my driver support.
  2. As a backup vendor I want to have early access to the VolumeGroupSnapshot feature to test and validate my backup solution.
  3. As a customer I want early access to test the VolumeGroupSnapshot feature in order to take consistent snapshots of my workloads that are relying on multiple PVs.

Out of Scope

CSI drivers development/support of this feature.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Drivers must support this feature and enable it. Partners may need to change their operator and/or doc to support it.

Documentation Considerations

Document how to enable the feature, what this feature does and how to use it. Update the OCP driver's table to include this capability.

Interoperability Considerations

Can be leveraged by ODF and OCP virt, especially around backup and DR scenarios.

Epic Goal*

Create an OCP feature gate that allows customers and partners to use the VolumeGroupSnapshot feature while the feature is in alpha & beta upstream.

 
Why is this important? (mandatory)

Volume group snapshot is an important feature for ODF, OCP virt and backup partners. It requires driver support so partners need early access to the feature to confirm their driver works as expected before GA. The same applies to backup partners.

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1. As a storage vendor I want to have early access to the VolumeGroupSnapshot feature to test and validate my driver support.
  2. As a backup vendor I want to have early access to the VolumeGroupSnapshot feature to test and validate my backup solution.

 
Dependencies (internal and external) (mandatory)

This depends on the driver's support, the feature gate will enable it in the drivers that support it (OCP shipped drivers).

The feature gate should

  • Configure the snapshotter to start with the right parameter to enable VolumeGroupSnapshot 
  • Create the necessary CRDs
  • Configure the OCP shipped CSI driver

 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation - N/A
  • QE - STOR
  • PX - 
  • Others -

Acceptance Criteria (optional)

By enabling the feature gate partners should be able to use the VolumeGroupSnapshot API. Non OCP shipped drivers may need to be configured.

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

From apiserver audit logs:

customresourcedefinitions.apiextensions.k8s.io "volumegroupsnapshotclasses.groupsnapshot.storage.k8s.io" not found
customresourcedefinitions.apiextensions.k8s.io "volumegroupsnapshotcontents.groupsnapshot.storage.k8s.io" not found
customresourcedefinitions.apiextensions.k8s.io "volumegroupsnapshots.groupsnapshot.storage.k8s.io" not found
clusterrolebindings.rbac.authorization.k8s.io "csi-snapshot-webhook-clusterrolebinding" not found
clusterrolebindings.rbac.authorization.k8s.io "csi-snapshot-controller-clusterrolebinding" not found
clusterroles.rbac.authorization.k8s.io "csi-snapshot-controller-clusterrole" not found
clusterroles.rbac.authorization.k8s.io "csi-snapshot-webhook-clusterrole" not found
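
The audit entries above show lookups of objects that did not exist at the time. A quick way to check whether the group snapshot CRDs are present on a cluster (illustrative command, not part of the original report):

oc get crd volumegroupsnapshots.groupsnapshot.storage.k8s.io \
  volumegroupsnapshotcontents.groupsnapshot.storage.k8s.io \
  volumegroupsnapshotclasses.groupsnapshot.storage.k8s.io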

This Feature covers the pending tasks from OCPBU-16 to be completed in openshift-4.14.

Goal: Control plane nodes in the cluster can be scaled up or down, lost and recovered, with no more importance or special procedure than that of a data plane node.

Problem: There is a lengthy special procedure to recover from a failed control plane node (or majority of nodes) and to add new control plane nodes.

Why is this important: Increased operational simplicity and scale flexibility of the cluster's control plane deployment.

 

See slack working group: #wg-ctrl-plane-resize

Epic Goal

  • Resolve the outstanding technical debt from the ControlPlaneMachineSet project

Why is this important?

  • We need to make sure the project is tested, documented and maintained going forward

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. OCPCLOUD-1372

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

As a user, I'd like to be warned when I'm setting up a Control Plane Machine Set for my control plane machines on GCP and the provider config lacks the TargetPools needed for it to work correctly.

Background

We previously had a validating webhook on the CPMSO for GCP that would check whether the control plane machine provider config set on the CPMS had TargetPools, and error otherwise.

This unfortunately collides with GCP Private clusters, see https://issues.redhat.com/browse/OCPBUGS-6760.

As such we decided to remove the check until warnings are supported in controller-runtime.
Once those land we can re-add the check and throw a warning in those situations, to still inform the user without disrupting normal operations where that's not an issue.

See:  https://redhat-internal.slack.com/archives/C68TNFWA2/p1675105892589279.

Webhook warnings are available as of the 0.16.0 release of controller-runtime.

Steps

  • <Add steps to complete this card if appropriate>

Stakeholders

  • <Who is interested in this/where did they request this>

Definition of Done

  • <Add items that need to be completed for this card>
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

Market Problem

  • As a Managed OpenShift cluster administrator, I want to use the priority-based expander for cluster-autoscaler to select instance types based on priorities assigned by a user to scaling groups.
    The configuration is based on the values stored in a ConfigMap.
  • The flag --expander=priority needs to be set on the cluster-autoscaler.
    The user can give a list of expanders so the autoscaler makes better choices. For example, we used --expander=priority,least-waste in our cost analysis testing:
    this first tries the priority ConfigMap, and if multiple groups match it then picks the group with the least waste. An example priority ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10: 
      - .*t2\.large.*
      - .*t3\.large.*
    50: 
      - .*m4\.4xlarge.*
  • OCM will need to expose the option to do that.
  • Hypershift just sets the --expander=priority flag on the autoscaler, and OCM will be responsible for creating the cluster-autoscaler-priority-expander ConfigMap, based on customer configuration.
  • This feature should be available in CAPA.

Expected Outcomes

 

Acceptance Criteria

  • Add code in HCP to support priority expander
  • Add code in OCM to support optional configmap field for priority expander - XCMSTRAT-769 
  • Add code upstream CAPA to support optional configmap field for priority expander - HOSTEDCP-1728 

 

 

Documentation Considerations

Documentation will need to be updated to point out the new maximum for ROSA HCP clusters, and any expectations to set with customers.

Goal

  • As a Managed OpenShift cluster administrator, I want to use the Priority based expander for cluster-autoscaler to select instance types based on priorities assigned by a user to scaling groups.

The following flag needs to be set on the cluster-autoscaler: `--expander=priority`.

The configuration is based on the values stored in a ConfigMap called `cluster-autoscaler-priority-expander`, which will be created by the user/OCM.

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature description

oc-mirror v2 focuses on major enhancements that make oc-mirror faster and more robust, introduce caching, and address more complex air-gapped scenarios. oc-mirror v2 is a rewritten version with three goals:

  • Manage complex air-gapped scenarios, providing support for the enclaves feature
  • Be faster and more robust: it introduces caching and doesn’t rebuild catalogs from scratch
  • Improve code maintainability, making it more reliable and easier to add features and fixes, and include a feature plugin interface

 

Feature Overview 

Support using a proxy in oc-mirror.

Why this is important?

We have customers who want to use Docker registries as proxies / pull-through caches.

This means that customers would need a way to get the ICSP/IDMS/ITMS and image list, which seems relevant to the "generating mapping files" work for the V2 tooling. We would like to make sure this is addressed in your use cases.

From our IBM sync

"We have customers who want to use Docker Registries as proxies / pull-through cache's. This means that customers would need a way to get the ICSP/IDMS/ITMS and image list which seems relevant to the "generating mapping files" for “V2 tooling”. Would like to make sure this is addressed in your use cases."

Description of problem:

When retrieving signatures for releases, the HTTP connection doesn't use the system proxy configuration.

Version-Release number of selected component (if applicable):

WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202407100906.p0.g75da281.assembly.stream.el9-75da281", GitCommit:"75da281989a147ead237e738507bbd8cec3175e5", GitTreeState:"clean", BuildDate:"2024-07-10T09:48:28Z", GoVersion:"go1.21.11 (Red Hat 1.21.11-1.el9_4) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

    

How reproducible:

Always
Image set config
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v2alpha1
mirror:
  platform:
    graph: true
    channels:
    - name: stable-4.16
    - name: stable-4.15
    

Steps to Reproduce:

    1. Run oc-mirror with the above ImageSetConfiguration in mirror-to-mirror mode, in an environment that requires a proxy.

    

Actual results:

2024/07/15 14:02:11  [WARN]   : ⚠️  --v2 flag identified, flow redirected to the oc-mirror v2 version. This is Tech Preview, it is still under development and it is not production ready.
2024/07/15 14:02:11  [INFO]   : 👋 Hello, welcome to oc-mirror
2024/07/15 14:02:11  [INFO]   : ⚙️  setting up the environment for you...
2024/07/15 14:02:11  [INFO]   : 🔀 workflow mode: mirrorToMirror
2024/07/15 14:02:11  [INFO]   : 🕵️  going to discover the necessary images...
2024/07/15 14:02:11  [INFO]   : 🔍 collecting release images...
I0715 14:02:11.770186 2475426 core-cincinnati.go:477] Using proxy 10.18.7.5:3128 to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.16&id=3808de83-dfe4-42f6-8d5b-196ed1b5bbc6
2024/07/15 14:02:12  [INFO]   : detected minimum version as 4.16.1
2024/07/15 14:02:12  [INFO]   : detected minimum version as 4.16.1
I0715 14:02:12.321748 2475426 core-cincinnati.go:477] Using proxy 10.18.7.5:3128 to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&arch=amd64&channel=stable-4.16&channel=stable-4.16&id=3808de83-dfe4-42f6-8d5b-196ed1b5bbc6&id=3808de83-dfe4-42f6-8d5b-196ed1b5bbc6
I0715 14:02:12.485330 2475426 core-cincinnati.go:477] Using proxy 10.18.7.5:3128 to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&arch=amd64&arch=amd64&channel=stable-4.16&channel=stable-4.16&channel=stable-4.15&id=3808de83-dfe4-42f6-8d5b-196ed1b5bbc6&id=3808de83-dfe4-42f6-8d5b-196ed1b5bbc6&id=3808de83-dfe4-42f6-8d5b-196ed1b5bbc6
2024/07/15 14:02:12  [INFO]   : detected minimum version as 4.15.20
2024/07/15 14:02:12  [INFO]   : detected minimum version as 4.15.20
I0715 14:02:12.844366 2475426 core-cincinnati.go:477] Using proxy 10.18.7.5:3128 to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&arch=amd64&arch=amd64&arch=amd64&channel=stable-4.16&channel=stable-4.16&channel=stable-4.15&channel=stable-4.15&id=3808de83-dfe4-42f6-8d5b-196ed1b5bbc6&id=3808de83-dfe4-42f6-8d5b-196ed1b5bbc6&id=3808de83-dfe4-42f6-8d5b-196ed1b5bbc6&id=3808de83-dfe4-42f6-8d5b-196ed1b5bbc6
I0715 14:02:13.115004 2475426 core-cincinnati.go:477] Using proxy 10.18.7.5:3128 to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.15&id=00000000-0000-0000-0000-000000000000
I0715 14:02:13.784795 2475426 core-cincinnati.go:477] Using proxy 10.18.7.5:3128 to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&arch=amd64&channel=stable-4.15&channel=stable-4.15&id=00000000-0000-0000-0000-000000000000&id=00000000-0000-0000-0000-000000000000&version=4.15.20
I0715 14:02:13.965936 2475426 core-cincinnati.go:477] Using proxy 10.18.7.5:3128 to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&arch=amd64&arch=amd64&channel=stable-4.15&channel=stable-4.15&channel=stable-4.16&id=00000000-0000-0000-0000-000000000000&id=00000000-0000-0000-0000-000000000000&id=00000000-0000-0000-0000-000000000000&version=4.15.20
I0715 14:02:14.136625 2475426 core-cincinnati.go:477] Using proxy 10.18.7.5:3128 to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&arch=amd64&arch=amd64&arch=amd64&channel=stable-4.15&channel=stable-4.15&channel=stable-4.16&channel=stable-4.16&id=00000000-0000-0000-0000-000000000000&id=00000000-0000-0000-0000-000000000000&id=00000000-0000-0000-0000-000000000000&id=00000000-0000-0000-0000-000000000000&version=4.15.20&version=4.16.1
W0715 14:02:14.301982 2475426 core-cincinnati.go:282] No upgrade path for 4.15.20 in target channel stable-4.16
2024/07/15 14:02:14  [ERROR]  : http request Get "https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=c17d4489c1b283ee71c76dda559e66a546e16b208a57eb156ef38fb30098903a/signature-1": dial tcp: lookup mirror.openshift.com: no such host
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x3d02c56]

goroutine 1 [running]:
github.com/openshift/oc-mirror/v2/internal/pkg/release.SignatureSchema.GenerateReleaseSignatures({{0x54bb930, 0xc000c80738}, {{{0x4c6edb1, 0x15}, {0xc00067ada0, 0x1c}}, {{{...}, {...}, {...}, {...}, ...}, ...}}, ...}, ...)
	/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/internal/pkg/release/signature.go:96 +0x656
github.com/openshift/oc-mirror/v2/internal/pkg/release.(*CincinnatiSchema).GetReleaseReferenceImages(0xc000fdc000, {0x54aef28, 0x74cf1c0})
	/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/internal/pkg/release/cincinnati.go:208 +0x70a
github.com/openshift/oc-mirror/v2/internal/pkg/release.(*LocalStorageCollector).ReleaseImageCollector(0xc000184e00, {0x54aef28, 0x74cf1c0})
	/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/internal/pkg/release/local_stored_collector.go:62 +0x47f
github.com/openshift/oc-mirror/v2/internal/pkg/cli.(*ExecutorSchema).CollectAll(0xc000ace000, {0x54aef28, 0x74cf1c0})
	/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/internal/pkg/cli/executor.go:942 +0x115
github.com/openshift/oc-mirror/v2/internal/pkg/cli.(*ExecutorSchema).RunMirrorToMirror(0xc000ace000, 0xc0007a5800, {0xc000f3f038?, 0x17dcbb3?, 0x2000?})
	/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/internal/pkg/cli/executor.go:748 +0x73
github.com/openshift/oc-mirror/v2/internal/pkg/cli.(*ExecutorSchema).Run(0xc000ace000, 0xc0004f9730?, {0xc0004f9730?, 0x0?, 0x0?})
	/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/internal/pkg/cli/executor.go:443 +0x1b6
github.com/openshift/oc-mirror/v2/internal/pkg/cli.NewMirrorCmd.func1(0xc000ad0e00?, {0xc0004f9730, 0x1, 0x7})
	/go/src/github.com/openshift/oc-mirror/vendor/github.com/openshift/oc-mirror/v2/internal/pkg/cli/executor.go:203 +0x32a
github.com/spf13/cobra.(*Command).execute(0xc0007a5800, {0xc000052110, 0x7, 0x7})
	/go/src/github.com/openshift/oc-mirror/vendor/github.com/spf13/cobra/command.go:987 +0xaa3
github.com/spf13/cobra.(*Command).ExecuteC(0xc0007a5800)
	/go/src/github.com/openshift/oc-mirror/vendor/github.com/spf13/cobra/command.go:1115 +0x3ff
github.com/spf13/cobra.(*Command).Execute(0x72d7738?)
	/go/src/github.com/openshift/oc-mirror/vendor/github.com/spf13/cobra/command.go:1039 +0x13
main.main()
	/go/src/github.com/openshift/oc-mirror/cmd/oc-mirror/main.go:10 +0x18
    

Expected results:

The release signature request to mirror.openshift.com should go through the configured system proxy instead of failing with a DNS lookup error, and oc-mirror should not panic on the failed request.

Additional info:


    

Continue scale testing and performance improvements for ovn-kubernetes

Template:

Networking Definition of Planned

Epic Template descriptions and documentation

Epic Goal

Manage Openshift Virtual Machines IP addresses from within the SDN solution provided by OVN-Kubernetes.

Why is this important?

Customers want to offload IPAM from their custom solutions (e.g. custom DHCP server running on their cluster network) to SDN.

Planning Done Checklist

The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

  • Priority is set by engineering
  • Epic must be linked to a Parent Feature
  • Target version must be set
  • Assignee must be set
  • Enhancement Proposal is Implementable
  • No outstanding questions about major work breakdown
  • Are all Stakeholders known? Have they all been notified about this item?
  • Does this epic affect SD? Have they been notified? (View plan definition for current suggested assignee)
    1. Please use the “Discussion Needed: Service Delivery Architecture Overview” checkbox to facilitate the conversation with SD Architects. The SD architecture team monitors this checkbox which should then spur the conversation between SD and epic stakeholders. Once the conversation has occurred, uncheck the “Discussion Needed: Service Delivery Architecture Overview” checkbox and record the outcome of the discussion in the epic description here.
    2. The guidance here is that unless it is very clear that your epic doesn’t have any managed services impact, default to use the Discussion Needed checkbox to facilitate that conversation.

Additional information on each of the above items can be found here: Networking Definition of Planned

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement
    details and documents.

...

Dependencies (internal and external)

1.

...

Previous Work (Optional):

1. …

Open questions::

1. …

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)  

Add support to GCP N4 Machine Series to be used as Control Plane and Compute Nodes when deploying Openshift on Google Cloud

Goals (aka. expected user outcomes)

As a user, I want to deploy OpenShift on Google Cloud using N4 Machine Series for the Control Plane and Compute Node so I can take advantage of these new Machine types

Requirements (aka. Acceptance Criteria):

OpenShift can be deployed in Google Cloud using the new N4 Machine Series for the Control Plane and Compute Nodes

 

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  both
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all all
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Background

Google has made the N4 Machine Series available in their cloud offering. These machines use "hyperdisk-balanced" disks for the boot device, which are not currently supported.
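
A minimal install-config.yaml sketch of what this enablement is expected to allow; the machine type and disk type names come from this feature, while the exact field values are illustrative assumptions:

controlPlane:
  name: master
  platform:
    gcp:
      type: n4-standard-4            # N4 machine type for control plane nodes
      osDisk:
        diskType: hyperdisk-balanced # boot disk type required by the N4 series
        diskSizeGB: 128
compute:
- name: worker
  platform:
    gcp:
      type: n4-standard-4
      osDisk:
        diskType: hyperdisk-balanced
        diskSizeGB: 128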

Documentation Considerations

The documentation will be updated to add the new disk type that needs to be supported as part of this enablement. The N4 Machine Series will also be added to the tested machine types for Google Cloud when deploying OpenShift.

Epic Goal

Why is this important?

  • This is a new Machine Series Google has introduced that customers will use for their OpenShift deployments

Scenarios

  1. Deploy an OpenShift Cluster with both the Control Plane and Compute Nodes running on N4 GCP Machines

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

  1. https://issues.redhat.com/browse/CORS-3561

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

The hyperdisk-balanced and pd-balanced disk types must be allowed as valid GCP disk types.

 

    • The original was added to 4.17. The data needs to be backported to 4.16.

Feature Overview (aka. Goal Summary)  

Generally speaking, customers and partners should not be installing packages client-side, i.e. `rpm-ostree install $pkg` directly on the nodes. It's not officially supported outside of troubleshooting situations, but the documentation is not very explicit on this point and we have anecdotal data that customers and partners do in fact install packages directly on hosts.

Adding some telemetry to help understand how common this is among data-reporting clusters. Hopefully such data will help us understand how important it is to preserve this ability in the bootc-world. While it's not a pattern we want to encourage, we should be careful about dropping it without considering how to avoid breaking users' clusters in unexpected ways.

Goals (aka. expected user outcomes)

Understand what % of machines (or a proxy thereof) have locally layered packages which aren't CoreOS extensions.
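
For context, a manual way an admin (or support) can see locally layered packages on a node today; this is an illustrative check, not the telemetry being added:

oc debug node/<node-name> -- chroot /host rpm-ostree status
# Entries under "LayeredPackages:" / "LocalPackages:" in the output show
# packages added on top of the base image for the booted deployment.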

Requirements (aka. Acceptance Criteria):

This needs to be backported to 4.14 so we have a better sense of the fleet as it is.

4.12 might be useful as well, but is optional.

Questions to Answer (Optional):

Why not simply block upgrades if there are locally layered packages?

That is indeed an option. This card is only about gathering data.

Customer Considerations

Some customers are known to layer packages locally but it's worse if the issue is a third party integration. In such a case, if the add-on breaks, the customer will call the 3rd party first because that's what appears to be broken. It may be a long, undelightful trip to get to a satisfying resolution. If they are blocked on upgrade due to that 3rd party integration they may not be able to upgrade the OCP y-version. That could be a lengthy delay.

Description copied from attached feature card: https://issues.redhat.com/browse/OCPSTRAT-1521

 


Feature Overview (Goal Summary)  

We aim to continue establishing a comprehensive testing strategy for Hosted Control Planes (HCP) that aligns with Red Hat’s support requirements and ensures customer satisfaction. This involves testing across various permutations, including providers, lifecycle, upgrades, and version compatibility. The testing must span management clusters, hubs, MCE, control planes, and nodepools, while coordinating across multiple QE teams to avoid duplication and inefficiencies. We aim to sustain an evolving testing matrix to meet product demands, especially as new versions and extended OCP lifecycles are introduced.

Goals (Expected User Outcomes)

  • Provide a scalable, systematic approach for testing HCP across multiple environments and scenarios.
  • Ensure coordination between all QE teams (ACM/MCE, HCP, KubeVirt, Agent) to avoid redundancies and inefficiencies in testing.
  • Establish a robust testing framework that can handle upgrades and version compatibility while maintaining compliance with Red Hat’s lifecycle policies.
  • Offer a clear view of coverage across different permutations of control planes and node pools.

 

Requirements (Acceptance Criteria)

  • Testing matrix covers all relevant permutations of management clusters, hubs, MCE, control planes, and node pools.
  • Use of representative sampling to ensure critical combinations are tested without unnecessary resource strain.
  • Ensure testing for upgrades includes fresh install scenarios to streamline coverage.
  • Automated processes in place to trigger relevant tests for new MCE builds or HCP updates.
  • Comprehensive tracking of QE teams’ coverage to avoid duplicated efforts.
  • Test execution time is optimized to reduce delays in delivery without compromising coverage.

 

Deployment Considerations

  • Self-managed, managed, or both: self-managed.
  • Classic (standalone cluster): No.
  • Hosted control planes: Yes.
  • Multi-node, Compact (three node), or Single node (SNO), or all: N/A.
  • Connected / Restricted Network: Yes.
  • Architectures: x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x).
  • Operator compatibility: Yes, ensure operator updates don't break testing workflows.
  • Backport needed: N/A
  • UI need:N/A
  • Other: N/A.

Use Cases (Optional)

 

See: https://docs.google.com/spreadsheets/d/1j8TjMfyCfEt8OzTgvrAG3tuC6WMweBh5ElzWu6oAvUw/edit?gid=0#gid=0 

  • Same hub, multiple HCP versions: using the same management/hub cluster (e.g., 4.15) to provision up to n+4 newer cluster versions
  • MCE ft. Management cluster compatibility. 
  • MCE ft. HCP versions compatibility
  • Upgrade Scenarios: Testing a management cluster upgrade from version 4.14 to 4.15, ensuring all connected node pools and control planes operate seamlessly.
  • Fresh Install Scenarios: Testing a new deployment with different node pool versions to ensure all configurations work correctly without requiring manual interventions.

Background

The HCP architecture introduces decoupled control planes and worker nodes, significantly increasing the number of testing permutations. Ensuring these scenarios are tested is crucial to maintaining product quality and customer satisfaction, and to staying compliant as an OpenShift form factor.

 

This was attempted once before

https://github.com/openshift/release/pull/47599

Then reverted

https://github.com/openshift/release/pull/48326

ROSA HCP production currently runs the HO from main with 4.14 and 4.15 HCs; however, we do not test these combinations together in presubmit testing, which increases the chance of an escape.

Feature Overview (aka. Goal Summary)  

Review, refine and harden the CAPI-based Installer implementation introduced in 4.16

Goals (aka. expected user outcomes)

The implementation of the CAPI-based Installer that started with OpenShift 4.16 left some technical debt that needs to be reviewed and addressed to refine and harden this new installation architecture.

Requirements (aka. Acceptance Criteria):

Review existing implementation, refine as required and harden as possible to remove all the existing technical debt

 

Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to-be-supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Documentation Considerations

There should not be any user-facing documentation required for this work

OCP/Telco Definition of Done
Epic Template descriptions and documentation.


Epic Goal

  • Continue to refine and harden aspects of CAPI-based Installs launched in 4.16

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
  • Less verbose logging to stdout
  • Bundle logs on failure
  • Place post-install CAPI manifests in a hidden directory
  • Place envtest.kubeconfig in a hidden directory

Feature Overview

The OpenShift IPsec implementation will be enhanced for a growing set of enterprise use cases, and for larger scale deployments. 

Goals

The OpenShift IPsec implementation was originally built for purpose-driven use cases from telco NEPs, but has also been useful for a specific set of other customer use cases outside of that context. As customer adoption grew and it was adopted by some of the largest (by number of cluster nodes) deployments in the field, it became obvious that some redesign is necessary in order to continue to deliver enterprise-grade IPsec, for both East-West and North-South traffic, and for some of our most demanding customer deployments.

Key enhancements include observability and blocking traffic across paths if IPsec encryption is not functioning properly.

Requirements

 

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

Questions to answer…

  •  

Out of Scope

  • Configuration of external-to-cluster IPsec endpoints for N-S IPsec. 

Background, and strategic fit

The OpenShift IPsec feature is fundamental to customer deployments for ensuring that all traffic between cluster nodes (East-West) and between cluster nodes and external-to-the-cluster entities that also are configured for IPsec (North-South) is encrypted by default.  This encryption must scale to the largest of deployments. 

Assumptions

  •  

Customer Considerations

  • Customers require the option to use their own certificates or CA for IPsec. 
  • Customers require observability of configuration (e.g. is the IPsec tunnel up and passing traffic)
  • If the IPsec tunnel is not up or otherwise functioning, traffic across the intended-to-be-encrypted network path should be blocked. 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

Epic Goal

  • Review design and development PRs that require feedback from NE team.

Why is this important?

  • Customer requires certificates to be managed by cert-manager on configured/newly added routes.

Acceptance Criteria

  • All PRs are reviewed and merged.

Dependencies (internal and external)

  1. CFE team dependency for addressing review suggestions.

Done Checklist

  • DEV - All related PRs are merged.

In the current version, the router does not support loading secrets directly; it loads the private key and certificates from the route resource, which exposes these security artifacts. A sketch of the intended secret-reference usage follows the acceptance criteria below.

 

Acceptance criteria :

  1. Support the router loading secrets from a secret reference.
  2. E2E testcases (Targeted for GA)
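
A sketch of the intended usage, assuming the Tech Preview API shape exercised by the bug below (spec.tls.externalCertificate.name referencing a kubernetes.io/tls secret in the route's namespace; names are illustrative):

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: myedge
  namespace: hello-openshift
spec:
  host: myedge.apps.example.com
  to:
    kind: Service
    name: hello-openshift
  tls:
    termination: edge
    externalCertificate:
      name: mytls   # secret of type kubernetes.io/tls in the same namespace as the route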

Description of problem:

    should reduce error message details for Not Found secret when edit/patch route with spec.tls.externalCertificate

Version-Release number of selected component (if applicable):

    4.16.0-0.ci.test-2024-05-13-005506-ci-ln-05s0z32-latest

How reproducible:

    100%

Steps to Reproduce:

    1. enable TP feature "RouteExternalCertificate"
    2. create pod,svc and route
    3. oc -n hongli patch route myedge --type=merge --patch='{"spec":{"tls":{"externalCertificate":{"name": "newtls"}}}}'     

Actual results:

the error message:    
The Route "myedge" is invalid: spec.tls.externalCertificate: Not found: errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"secrets \"newtls\" not found", Reason:"NotFound", Details:(*v1.StatusDetails)(0xc0077e25a0), Code:404}}

Expected results:

    something like: `spec.tls.externalCertificate: Not found: "secrets \"newtls\" not found"`

Additional info:

    discuss in thread of https://redhat-internal.slack.com/archives/C06EK9ZH3Q8/p1715243443244879

 

Update cluster-ingress-operator to bootstrap router with required featuregates

 

The cluster-ingress-operator will propagate the relevant Tech-Preview feature gate down to the router. This feature gate will be added as a command-line argument called ROUTER_EXTERNAL_CERTIFICATE to the router and will not be user configurable.

 

Refer:

 

Acceptance criteria 

  • Introduce a new cmdline arg on the router to inject the ExternalCertificate featuregate status
  • Dev test if injected env is available in the router pod
  • Update any affected UTs
  • Use openshift featuregates
  • Update route validator to inject dependencies required for route validations from library-go changes
  • Update Invocation order of validations

Bump Kubernetes in openshift-apiserver to 1.29.2 to unblock CFE-885.

 

Background:

We need to bump openshift/library-go to the latest commit in openshift/openshift-apiserver in order to vendor the Route validation changes done in https://github.com/openshift/library-go/pull/1625, but due to the kube version mismatch between library-go and openshift-apiserver, there are some dependency issues.

library-go is at 1.29, but openshift-apiserver is still using 1.28.

References:

  • Need to inject openshift featuregates
  • Update route validator to inject dependencies required for route validations from library-go changes
  • Update Invocation order of validations

As part of this EP, there is a use case where we need to trigger a re-sync of routes based on observed secret changes. The caveat is that we are not using secret informers, but rather a new interface, the secret monitor (reasons are in the EP but don't pertain to this query). Since the router uses RouterController and not specific controllers for each resource (routes, namespaces, endpoints, etc.), it doesn't have access to the lower-level components of a controller (e.g. the workqueue), and without this I don't really see a way to integrate the router with the secret monitor. Is re-designing the RouterController the way forward here? I'm open to suggestions on other ways to integrate here.

 

Router will take feature-gate info from CFE-987

Router will integrate secret-monitor done in CFE-866

Validations required on the router (an illustrative verification sketch follows this list)

  - The secret created should be in the same namespace as that of the route.
  - The secret created is of type `kubernetes.io/tls`.
  - Verify certificate and key (PEM encode/decode)
  - Verify private key matches public certificate
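
For the last two checks, an illustrative openssl-based equivalent of what the router needs to verify (assumes an RSA key named tls.key and a certificate named tls.crt):

# PEM sanity checks
openssl x509 -in tls.crt -noout -text
openssl rsa -in tls.key -noout -check

# The two digests must match if the private key pairs with the certificate
openssl x509 -in tls.crt -noout -pubkey | openssl sha256
openssl pkey -in tls.key -pubout | openssl sha256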


This enhances the EgressQoS CRD with status information and provides an implementation to update this field with relevant information when creating/updating an EgressQoS.

Feature Overview (aka. Goal Summary)  

The ability in OpenShift to create trust and directly consume access tokens issued by external OIDC Authentication Providers using an authentication approach similar to upstream Kubernetes.

BYO Identity will help facilitate CLI only workflows and capabilities of the Authentication Provider (such as Keycloak, Dex, Azure AD) similar to upstream Kubernetes. 

Goals (aka. expected user outcomes)

Ability in OpenShift to provide a direct, pluggable Authentication workflow such that the OpenShift/K8s API server can consume access tokens issued by external OIDC identity providers. Kubernetes provides this integration as described here. Customer/Users can then configure their IDPs to support the OIDC protocols and workflows they desire such as Client credential flow.

OpenShift OAuth server is still available as default option, with the ability to tune in the external OIDC provider as a Day-2 configuration.

Requirements (aka. Acceptance Criteria):

  1. The customer should be able to tie into RBAC functionality, similar to how it is closely aligned with OpenShift OAuth 
  2.  

Use Cases (Optional):

  1. As a customer, I would like to integrate my OIDC Identity Provider directly with the OpenShift API server.
  2. As a customer in multi-cluster cloud environment, I have both K8s and non-K8s clusters using my IDP and hence I need seamless authentication directly to the OpenShift/K8sAPI using my Identity Provider 
  3.  

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal

  • This epic should contain all the follow-up stories that should land in 4.16
  • Figure out if these follow-ups will need to be backported

Why is this important?

  • Console and console-operator implementation will require additional followups - tests, refactoring, ...

Dependencies (internal and external)

  1. https://issues.redhat.com/browse/CONSOLE-3804

Previous Work (Optional):

  1. https://issues.redhat.com/browse/CONSOLE-3804

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview
This is a TechDebt and doesn't impact OpenShift Users.
As the autoscaler has become a key feature of OpenShift, there is a requirement to continue to expand its use, bringing all the features to all the cloud platforms and contributing to the community upstream. This feature is to track the initiatives associated with the Autoscaler in OpenShift.

Goals

  • Scale from zero available on all cloud providers (where available)
  • Required upstream work
  • Work needed as a result of rebase to new kubernetes version

Requirements

Requirement Notes isMvp?
vSphere autoscaling from zero   No
Upstream E2E testing   No 
Upstream adapt scale from zero replicas   No 
     

Out of Scope

n/a

Background, and strategic fit
Autoscaling is a key benefit of the Machine API and should be made available on all providers

Assumptions

Customer Considerations

Documentation Considerations

  • Target audience: cluster admins
  • Updated content: update docs to mention any change to where the features are available.

Epic Goal

  • Update the scale from zero autoscaling annotations on MachineSets to conform with the upstream keys, while also continuing to accept the openshift specific keys that we have been using.
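
For illustration, a MachineSet carrying both annotation families during the deprecation window might look like the sketch below; the upstream capacity keys and the OpenShift-specific keys reflect my reading of the two projects, the values are examples only, and the rest of the MachineSet is omitted:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: worker-us-east-1a
  namespace: openshift-machine-api
  annotations:
    # upstream cluster-autoscaler (Cluster API provider) capacity hints
    capacity.cluster-autoscaler.kubernetes.io/cpu: "4"
    capacity.cluster-autoscaler.kubernetes.io/memory: "16384Mi"
    # existing OpenShift-specific hints, still accepted while both are carried
    machine.openshift.io/vCPU: "4"
    machine.openshift.io/memoryMb: "16384"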

Why is this important?

  • This change makes our implementation of the cluster autoscaler conform to the API that is described in the upstream community. This reduces the mental overhead for someone that knows kubernetes but is new to openshift.
  • This change also reduces the maintenance burden that we carry in the form of addition patches to the cluster autoscaler. By changing our controllers to understand the upstream annotations we are able to remove extra patches on our fork of the cluster autoscaler, making future maintenance easier and closer to the upstream source.

Scenarios

  1. A user is debugging a cluster autoscaler issue by examining the related MachineSet objects, they see the scale from zero annotations and recognize them from the project documentation and from upstream discussions. The result is that the user is more easily able to find common issues and advice from the upstream community.
  2. An openshift maintainer is updating the cluster autoscaler for a new version of kubernetes, because the openshift controllers understand the upstream annotations, the maintainer does not need to carry or modify a patch to support multiple varieties of annotation. This in turn makes the task of updating the autoscaler simpler and reduces burden on the maintainer.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Scale from zero autoscaling must continue to work with both the old openshift annotations and the newer upstream annotations.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - OpenShift code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - OpenShift documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - OpenShift build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - OpenShift documentation merged: <link to meaningful PR>

Please note: the changes described by this epic will happen in OpenShift controllers, and as such there is no "upstream" relationship in the same sense as for the Kubernetes-based controllers.

User Story

As a developer, in order to deprecate the old annotations, we will need to carry both for at least one release cycle. Updating the CAO to apply the upstream annotations, and the CAS to accept both (preferring upstream), will allow me to properly deprecate the old annotations.

Background

As part of the effort to migrate to the upstream scale-from-zero annotations, we should add e2e tests which confirm the presence of the annotations. This can be an addition to our current scale-from-zero tests.

Steps

  • update current scale from zero test to include a scan for the upstream capacity annotations

Stakeholders

  • openshift eng

Definition of Done

  • tests check for annotations
  • Docs
  • n/a
  • Testing
  • this is all testing

User Story

As a developer, in order to deprecate the old annotations, we will need to carry both for at least one release cycle. Updating the CAO to apply the upstream annotations, and the CAS to accept both (preferring upstream), will allow me to properly deprecate the old annotations.

Background

During the process of making the CAO recognize the annotations, we need to enable it to modify the machineset to have the new annotation. Similarly, we want the autoscaler to recognize both sets of annotations in the short term while we switch.

Steps

  • update CAO to enable annotations
  • write unit tests

Stakeholders

  • openshift cloud team

Definition of Done

  • CAO applies upstream annotations, leaves old annotations alone
  • Docs
  • upstream annotations referenced in product docs
  • Testing
  • unit testing of behavior

Upstream K8s deprecated PodSecurityPolicy and replaced it with a new built-in admission controller that enforces the Pod Security Standards (see here for the motivations for deprecation). There is an OpenShift-specific dedicated pod admission system called Security Context Constraints. Our aim is to keep the Security Context Constraints pod admission system while also allowing users to have access to the Kubernetes Pod Security Admission.

With OpenShift 4.11, we turned on Pod Security Admission with global "privileged" enforcement. Additionally, we set the "restricted" profile for warnings and audit. This configuration made it possible for users to opt their namespaces in to Pod Security Admission with the per-namespace labels. We also introduced a new mechanism that automatically synchronizes the Pod Security Admission "warn" and "audit" labels.

With OpenShift 4.15, we intend to move the global configuration to enforce the "restricted" pod security profile globally. With this change, the label synchronization mechanism will also switch into a mode where it synchronizes the "enforce" Pod Security Admission label rather than the "audit" and "warn" labels.
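
For reference, the Pod Security Admission configuration involved is expressed as plain namespace labels; an illustrative namespace that opts in explicitly looks like this:

apiVersion: v1
kind: Namespace
metadata:
  name: example-app
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted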

Epic Goal

Get Pod Security admission to be run in "restricted" mode globally by default alongside with SCC admission.

When creating a custom SCC, it is possible to assign a priority that is higher than existing SCCs. This means that any SA with access to all SCCs might use the higher priority custom SCC, and this might mutate a workload in an unexpected/unintended way.

To protect platform workloads from such an effect (which, combined with PSa, might result in rejecting the workload once we start enforcing the "restricted" profile) we must pin the required SCC to all workloads in platform namespaces (openshift-, kube-, default).

Each workload should pin the least-privileged SCC it needs, except workloads in runlevel 0 namespaces, which should pin the "privileged" SCC (SCC admission is not enabled on these namespaces, but we should pin an SCC for tracking purposes).
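
Pinning is done with the openshift.io/required-scc annotation on the workload's pod template; a minimal sketch (the workload, namespace, and SCC choice are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator
  namespace: openshift-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-operator
  template:
    metadata:
      labels:
        app: example-operator
      annotations:
        openshift.io/required-scc: restricted-v2  # least-privileged SCC this workload needs
    spec:
      containers:
      - name: operator
        image: quay.io/example/operator:latest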

The following tables track progress.

Progress summary

# namespaces 4.19 4.18 4.17 4.16 4.15 4.14
monitored 82 82 82 82 82 82
fix needed 68 68 68 68 68 68
fixed 39 39 35 32 39 1
remaining 29 29 33 36 29 67
~ remaining non-runlevel 8 8 12 15 8 46
~ remaining runlevel (low-prio) 21 21 21 21 21 21
~ untested 5 2 2 2 82 82

Progress breakdown

# namespace 4.19 4.18 4.17 4.16 4.15 4.14
1 oc debug node pods #1763 #1816 #1818  
2 openshift-apiserver-operator #573 #581  
3 openshift-authentication #656 #675  
4 openshift-authentication-operator #656 #675  
5 openshift-catalogd #50 #58  
6 openshift-cloud-credential-operator #681 #736  
7 openshift-cloud-network-config-controller #2282 #2490 #2496    
8 openshift-cluster-csi-drivers #118 #5310 #135 #524 #131 #306 #265 #75   #170 #459 #484  
9 openshift-cluster-node-tuning-operator #968 #1117  
10 openshift-cluster-olm-operator #54 n/a n/a
11 openshift-cluster-samples-operator #535 #548  
12 openshift-cluster-storage-operator #516   #459 #196 #484 #211  
13 openshift-cluster-version       #1038 #1068  
14 openshift-config-operator #410 #420  
15 openshift-console #871 #908 #924  
16 openshift-console-operator #871 #908 #924  
17 openshift-controller-manager #336 #361  
18 openshift-controller-manager-operator #336 #361  
19 openshift-e2e-loki #56579 #56579 #56579 #56579  
20 openshift-image-registry       #1008 #1067  
21 openshift-ingress   #1032        
22 openshift-ingress-canary   #1031        
23 openshift-ingress-operator   #1031        
24 openshift-insights #1033 #1041 #1049 #915 #967  
25 openshift-kni-infra #4504 #4542 #4539 #4540  
26 openshift-kube-storage-version-migrator #107 #112  
27 openshift-kube-storage-version-migrator-operator #107 #112  
28 openshift-machine-api #1308 #1317 #1311 #407 #315 #282 #1220 #73 #50 #433 #332 #326 #1288 #81 #57 #443  
29 openshift-machine-config-operator #4636 #4219 #4384 #4393  
30 openshift-manila-csi-driver #234 #235 #236  
31 openshift-marketplace #578 #561 #570
32 openshift-metallb-system #238 #240 #241    
33 openshift-monitoring #2298 #366 #2498   #2335 #2420  
34 openshift-network-console #2545        
35 openshift-network-diagnostics #2282 #2490 #2496    
36 openshift-network-node-identity #2282 #2490 #2496    
37 openshift-nutanix-infra #4504 #4539 #4540  
38 openshift-oauth-apiserver #656 #675  
39 openshift-openstack-infra #4504   #4539 #4540  
40 openshift-operator-controller #100 #120  
41 openshift-operator-lifecycle-manager #703 #828  
42 openshift-route-controller-manager #336 #361  
43 openshift-service-ca #235 #243  
44 openshift-service-ca-operator #235 #243  
45 openshift-sriov-network-operator #995 #999 #1003  
46 openshift-user-workload-monitoring #2335 #2420  
47 openshift-vsphere-infra #4504 #4542 #4539 #4540  
48 (runlevel) kube-system            
49 (runlevel) openshift-cloud-controller-manager            
50 (runlevel) openshift-cloud-controller-manager-operator            
51 (runlevel) openshift-cluster-api            
52 (runlevel) openshift-cluster-machine-approver            
53 (runlevel) openshift-dns            
54 (runlevel) openshift-dns-operator            
55 (runlevel) openshift-etcd            
56 (runlevel) openshift-etcd-operator            
57 (runlevel) openshift-kube-apiserver            
58 (runlevel) openshift-kube-apiserver-operator            
59 (runlevel) openshift-kube-controller-manager            
60 (runlevel) openshift-kube-controller-manager-operator            
61 (runlevel) openshift-kube-proxy            
62 (runlevel) openshift-kube-scheduler            
63 (runlevel) openshift-kube-scheduler-operator            
64 (runlevel) openshift-multus            
65 (runlevel) openshift-network-operator            
66 (runlevel) openshift-ovn-kubernetes            
67 (runlevel) openshift-sdn            
68 (runlevel) openshift-storage            

Workloads running in platform namespaces (openshift-, kube-, default) must have the required-scc annotation defined in order to pin a specific SCC (see AUTH-482) for more details. This task adds a monitor test that analyzes all such workloads and tests the existence of the "openshift.io/required-scc" annotation.

OpenShift-prefixed namespaces should all define their required PSa labels. Currently, the list of namespaces that are missing some or all PSa labels is the following:

namespace in review merged
openshift    
openshift-apiserver-operator PR  
openshift-cloud-credential-operator PR  
openshift-cloud-network-config-controller PR  
openshift-cluster-samples-operator PR  
openshift-cluster-storage-operator PR  
openshift-config OCPBUGS-28621 PR
openshift-config-managed PR  
openshift-config-operator PR  
openshift-console PR  
openshift-console-operator PR  
openshift-console-user-settings PR  
openshift-controller-manager PR   
openshift-controller-manager-operator PR  
openshift-dns-operator PR  
openshift-etcd-operator PR
openshift-host-network PR  
openshift-ingress-canary PR  
openshift-ingress-operator PR  
openshift-insights PR   
openshift-kube-apiserver-operator PR   
openshift-kube-controller-manager-operator PR   
openshift-kube-scheduler-operator PR   
openshift-kube-storage-version-migrator PR   
openshift-kube-storage-version-migrator-operator PR   
openshift-network-diagnostics PR   
openshift-node    
openshift-operator-lifecycle-manager PR   
openshift-operators  PR  
openshift-route-controller-manager PR   
openshift-service-ca PR  
openshift-service-ca-operator PR  
openshift-user-workload-monitoring PR   

Problem & Overview

Currently, the existing procedure for full rotation of all cluster CAs/certs/keys is not suitable for Hypershift. Several oc helper commands added for this flow are not functional in Hypershift. Therefore, a separate, tailored procedure is required specifically for Hypershift after its General Availability (GA).

 

Background

Most of the rotation procedure can be performed on the management side, given the decoupling between the control-plane and workers in the HyperShift architecture.

That said, it is important to ensure and assess the potential impacts on customers and guests during the rotation process, especially on how they affect SLOs and disruption budgets. 

 

Why care? 

  • Additional Security: Regular rotation of cluster CAs/certs/keys is essential for maintaining a secure environment. Adapting the rotation procedure for Hypershift ensures that security measures align with its specific requirements and limitations.
  • Compliance and Governance: Maintaining compliance (e.g., FIPS). Rotating certificates produced by non-compliant modules in Hypershift clusters is essential to align with FIPS requirements and mitigate future compliance risks.

User Story:

As a hypershift QE, I want to be able to:

  • Set the certificate expiration time of a HostedCluster's certificates to a custom duration via HostedCluster annotation.
  • Set the certificate rotation time of a HostedCluster's certificates to a custom duration via HostedCluster annotation.

so that I can achieve

  • Verify that a HostedCluster continues to function after its certificates have rotated and old ones have expired.

Acceptance Criteria:

  • Annotation to set custom expiration and rotation times takes effect on certificates created in a HostedCluster namespace.

Engineering Details:

This does not require a design proposal.
This does not require a feature gate.

As an engineer I would like to customize the self-signed certificates expiration used in the HCP components using an annotation over the HostedCluster object.

 

As an engineer I would like to customize the self-signed certificates rotation used in the HCP components using an annotation over the HostedCluster object.
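A minimal sketch of how this could look on a HostedCluster; the annotation keys below are illustrative placeholders, not the final keys agreed in this story:

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
  annotations:
    # Placeholder keys for illustration only; the real keys are defined by the
    # HyperShift implementation of this story.
    hypershift.openshift.io/example-certificate-validity: "24h"
    hypershift.openshift.io/example-certificate-renewal: "16h"
# (spec omitted)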

Feature Overview

Note: This feature will be a TechPreview in 4.16 since the newly introduced API must graduate to v1.

Overarching Goal

Customers should be able to update and boot a cluster without a container registry in disconnected environments. This feature targets bare metal disconnected clusters.

Background

  • For a single node cluster effectively cut off from all other networking, update the cluster despite the lack of access to image registries, local or remote.
  • For multi-node clusters that could have a complete power outage, recover smoothly from that kind of disruption, despite the lack of access to image registries, local or remote.
  • Allow cluster node(s) to boot without any access to a registry in case all the required images are pinned

 

Describes the work needed from the MCO team to take Pinned Image Sets to GA.

Description of problem:

The MCO takes too much time to update the node count for an MCP when the label that the MCP uses to match nodes is removed from a node.

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

1. Remove `node-role.kubernetes.io/worker=` label from any worker node.
~~~
# oc label node worker-0.sharedocp4upi411ovn.lab.upshift.rdu2.redhat.com node-role.kubernetes.io/worker-
~~~
2. Check MCP worker for correct node count.
~~~
# oc get mcp  worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-6916abae250ad092875791f8297c13e1   True      False      False      3              3                   3                     0                      5d7h
~~~
3. Check after 10-15 mins
~~~
# oc get mcp  worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-6916abae250ad092875791f8297c13e1   True      False      False      2              2                   2                     0                      5d7h
~~~

Actual results:

It took 10-15 mins for MCP to detect node removal.

Expected results:

It should detect the node removal as soon as the appropriate label is removed from the node.

Additional info:

 

Similar to bug 1955300, but seen in a recent 4.11-to-4.11 update [1]:

: [bz-Machine Config Operator] clusteroperator/machine-config should not change condition/Available
Run #0: Failed expand_less 47m16s
1 unexpected clusteroperator state transitions during e2e test run

Feb 05 22:15:40.430 - 1044s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.nightly-2022-02-05-152519}]

Not something that breaks the update, but still something that sounds pretty alarming, and which had ClusterOperatorDown over its 10m threshold [2]. In this case, the alert did not fire because the metric-serving CVO moved from one instance to another as the MCO was rolling the nodes, and we haven't attempted anything like [3] yet to guard ClusterOperatorDown against that sort of label drift.

Over the past 24h, seems like a bunch of hits across several versions, all for 17+ minutes:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=24h&type=junit&context=0&search=s+E+clusteroperator/machine-config+condition/Available+status/False' | jq -r 'to_entries[].value |
to_entries[].value[].context[]'
Feb 05 22:19:52.700 - 2038s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.8.0-0.nightly-s390x-2022-02-05-125308
Feb 05 18:57:26.413 - 1470s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-0.ci-2022-02-05-143255}]
Feb 05 22:00:03.973 - 1265s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-0.ci-2022-02-05-173245}]
Feb 05 15:17:47.103 - 1154s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-0.nightly-2022-02-04-094135}]
Feb 05 07:55:59.474 - 1162s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-0.ci-2022-02-04-190512}]
Feb 05 12:15:30.132 - 1178s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-0.ci-2022-02-05-063300}]
Feb 05 19:48:07.442 - 1588s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-0.ci-2022-02-05-173245}]

Feb 05 23:30:46.180 - 5629s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.9.17
Feb 05 19:02:16.918 - 1622s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.9.19
Feb 05 22:05:50.214 - 1663s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.9.19
Feb 05 22:54:19.037 - 6791s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.9.17
Feb 05 09:47:44.404 - 1006s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.9.19
Feb 05 20:20:47.845 - 1627s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.9.19
Feb 06 03:40:24.441 - 1197s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.9.19
Feb 05 23:28:33.815 - 5264s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.9.17}]
Feb 05 06:20:32.073 - 1261s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.ci-2022-02-04-213359}]
Feb 05 09:25:36.180 - 1434s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.ci-2022-02-04-213359}]

Feb 05 12:20:24.804 - 1185s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.ci-2022-02-05-075430}]
Feb 05 21:47:40.665 - 1198s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.ci-2022-02-05-141309}]
Feb 06 04:41:02.410 - 1187s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.ci-2022-02-05-203247}]
Feb 05 09:18:04.402 - 1321s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-rc.1}]
Feb 05 12:31:23.489 - 1446s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-rc.1}]

Feb 06 01:32:14.191 - 1011s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-rc.1}]
Feb 06 04:57:35.973 - 1508s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-rc.1}]

Feb 05 09:16:49.005 - 1198s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-rc.1}]
Feb 05 22:44:04.061 - 1231s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.6.54
Feb 05 09:30:33.921 - 1209s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-0.nightly-2022-02-04-094135}]
Feb 05 19:53:51.738 - 1054s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-0.nightly-2022-02-05-132417}]
Feb 05 20:12:54.733 - 1152s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-0.nightly-2022-02-05-132417}]

Feb 06 03:12:05.404 - 1024s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-0.nightly-2022-02-05-190244}]
Feb 06 03:18:47.421 - 1052s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-0.nightly-2022-02-05-190244}]

Feb 05 12:15:03.471 - 1386s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.nightly-2022-02-04-143931}]
Feb 05 22:15:40.430 - 1044s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.nightly-2022-02-05-152519}]
Feb 05 17:21:15.357 - 1087s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.nightly-2022-02-04-143931}]
Feb 05 09:31:14.667 - 1632s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.10.0-0.okd-2022-02-05-081152}]
Feb 05 12:29:22.119 - 1060s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.8.0-0.okd-2022-02-05-101655
Feb 05 17:43:45.938 - 1380s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for 4.6.54
Feb 06 02:35:34.300 - 1085s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.ci.test-2022-02-06-011358-ci-op-xl025ywb-initial}]
Feb 06 06:15:23.991 - 1135s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.ci.test-2022-02-06-044734-ci-op-1xyd57n7-initial}]
Feb 05 09:25:22.083 - 1071s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.ci.test-2022-02-05-080202-ci-op-dl3w4ks4-initial}]

Breaking down by job name:

$ w3m -dump -cols 200 'https://search.ci.openshift.org?maxAge=24h&type=junit&context=0&search=s+E+clusteroperator/machine-config+condition/Available+status/False' | grep 'failures match' | sort
periodic-ci-openshift-multiarch-master-nightly-4.8-upgrade-from-nightly-4.7-ocp-remote-libvirt-s390x (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-aws-ovn-upgrade (all) - 70 runs, 47% failed, 6% of failures match = 3% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade (all) - 40 runs, 60% failed, 4% of failures match = 3% impact
periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-upgrade (all) - 76 runs, 42% failed, 9% of failures match = 4% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade (all) - 77 runs, 65% failed, 4% of failures match = 3% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade-rollback (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-upgrade-rollback (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-gcp-ovn-upgrade (all) - 41 runs, 61% failed, 12% of failures match = 7% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-ovirt-upgrade (all) - 4 runs, 75% failed, 33% of failures match = 25% impact
periodic-ci-openshift-release-master-ci-4.11-e2e-aws-ovn-upgrade (all) - 80 runs, 59% failed, 4% of failures match = 3% impact
periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-upgrade (all) - 82 runs, 51% failed, 7% of failures match = 4% impact
periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade (all) - 88 runs, 55% failed, 8% of failures match = 5% impact
periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade (all) - 79 runs, 54% failed, 2% of failures match = 1% impact
periodic-ci-openshift-release-master-ci-4.8-upgrade-from-from-stable-4.7-from-stable-4.6-e2e-aws-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-upgrade (all) - 45 runs, 44% failed, 25% of failures match = 11% impact
periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade (all) - 33 runs, 45% failed, 13% of failures match = 6% impact
periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-upgrade (all) - 3 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-release-master-okd-4.10-e2e-vsphere (all) - 6 runs, 100% failed, 17% of failures match = 17% impact
pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic-upgrade (all) - 8 runs, 100% failed, 13% of failures match = 13% impact
pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-upgrade (all) - 6 runs, 83% failed, 20% of failures match = 17% impact
pull-ci-openshift-machine-config-operator-master-e2e-agnostic-upgrade (all) - 31 runs, 100% failed, 3% of failures match = 3% impact
release-openshift-okd-installer-e2e-aws-upgrade (all) - 8 runs, 75% failed, 17% of failures match = 13% impact
release-openshift-origin-installer-e2e-aws-upgrade-4.6-to-4.7-to-4.8-to-4.9-ci (all) - 1 runs, 100% failed, 100% of failures match = 100% impact

Those impact percentages are just matches; this particular test-case is non-fatal.

The Available=False conditions also lack a 'reason', although they do contain a 'message', which is the same state we had back when I'd filed bug 1948088. Maybe we can pass through the Degraded reason around [4]?

Going back to the run in [1], the Degraded condition had a few minutes at RenderConfigFailed, while [4] only has a carve out for RequiredPools. And then the Degraded condition went back to False, but for reasons I don't understand we remained Available=False until 22:33, when the MCO declared its portion of the update complete:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade/1490071797725401088/artifacts/e2e-aws-upgrade/openshift-e2e-test/artifacts/e2e.log | grep 'clusteroperator/machine-config '
Feb 05 22:15:40.029 E clusteroperator/machine-config condition/Degraded status/True reason/RenderConfigFailed changed: Failed to resync 4.11.0-0.nightly-2022-02-05-152519 because: refusing to read images.json version "4.11.0-0.nightly-2022-02-05-211325", operator version "4.11.0-0.nightly-2022-02-05-152519"
Feb 05 22:15:40.029 - 147s E clusteroperator/machine-config condition/Degraded status/True reason/Failed to resync 4.11.0-0.nightly-2022-02-05-152519 because: refusing to read images.json version "4.11.0-0.nightly-2022-02-05-211325", operator version "4.11.0-0.nightly-2022-02-05-152519"
Feb 05 22:15:40.430 E clusteroperator/machine-config condition/Available status/False changed: Cluster not available for [{operator 4.11.0-0.nightly-2022-02-05-152519}]
Feb 05 22:15:40.430 - 1044s E clusteroperator/machine-config condition/Available status/False reason/Cluster not available for [{operator 4.11.0-0.nightly-2022-02-05-152519}]
Feb 05 22:18:07.150 W clusteroperator/machine-config condition/Progressing status/True changed: Working towards 4.11.0-0.nightly-2022-02-05-211325
Feb 05 22:18:07.150 - 898s W clusteroperator/machine-config condition/Progressing status/True reason/Working towards 4.11.0-0.nightly-2022-02-05-211325
Feb 05 22:18:07.178 W clusteroperator/machine-config condition/Degraded status/False changed:
Feb 05 22:18:21.505 W clusteroperator/machine-config condition/Upgradeable status/False reason/PoolUpdating changed: One or more machine config pools are updating, please see `oc get mcp` for further details
Feb 05 22:33:04.574 W clusteroperator/machine-config condition/Available status/True changed: Cluster has deployed [{operator 4.11.0-0.nightly-2022-02-05-152519}]
Feb 05 22:33:04.584 W clusteroperator/machine-config condition/Upgradeable status/True changed:
Feb 05 22:33:04.931 I clusteroperator/machine-config versions: operator 4.11.0-0.nightly-2022-02-05-152519 -> 4.11.0-0.nightly-2022-02-05-211325
Feb 05 22:33:05.531 W clusteroperator/machine-config condition/Progressing status/False changed: Cluster version is 4.11.0-0.nightly-2022-02-05-211325
[bz-Machine Config Operator] clusteroperator/machine-config should not change condition/Available
[bz-Machine Config Operator] clusteroperator/machine-config should not change condition/Degraded

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-upgrade/1490071797725401088
[2]: https://github.com/openshift/cluster-version-operator/blob/06ec265e3a3bf47b599e56aec038022edbe8b5bb/install/0000_90_cluster-version-operator_02_servicemonitor.yaml#L79-L87
[3]: https://github.com/openshift/cluster-version-operator/pull/643
[4]: https://github.com/openshift/machine-config-operator/blob/2add8f323f396a2063257fc283f8eed9038ea0cd/pkg/operator/status.go#L122-L126

Given enhancement - https://github.com/openshift/enhancements/pull/1481

Design Review Doc: https://docs.google.com/document/d/1-XuHN6_jvJMLULFwwAThfIcHqY32s32lU6m4bx7BiBE/edit

We want to allow the relevant APIs to pin images and make sure those don't get garbage collected.

Here is a summary of what will be required:

  1. CRI-O will need to be changed so that it doesn't remove pinned images, regardless of the version_file_persist setting.
  2. Add the new PinnedImageSet custom resource definition to the API.
  3. Initial proposal: #1609
  4. Add a new PinnedImageSetController to the machine-config-controller.
  5. Add the logic to pin and pull the images to the machine-config-daemon.
  6. Update the documentation of recovery procedures to explain that pinned images shouldn't be removed.

It is important that when a new CRI-O pinned-image configuration is applied via MachineConfig, the net result is a reload of the crio systemd unit rather than a node reboot.
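A sketch of what a PinnedImageSet could look like, assuming the tech-preview machineconfiguration.openshift.io/v1alpha1 shape described in the initial proposal; the image reference is a placeholder and field names should be checked against the merged API:

apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: PinnedImageSet
metadata:
  name: worker-pinned-images
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  pinnedImages:
  # Placeholder image reference; pin by digest in practice.
  - name: quay.io/example/workload@sha256:0000000000000000000000000000000000000000000000000000000000000000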

Feature Overview (aka. Goal Summary)  

As an OpenShift admin who wants to make my OCP cluster more secure and stable, I want to prevent anyone from scheduling their workloads on master nodes, so that master nodes only run OCP management-related workloads.

 

Goals (aka. expected user outcomes)

Secure OCP master nodes by preventing customer workloads from being scheduled on them.

 

 

 

 

 

Anyone applying tolerations in a pod spec can unintentionally tolerate the master taints that protect master nodes from receiving application workloads when master nodes are configured to repel them. An admission plugin needs to be configured to protect master nodes from this scenario. Besides the taint/toleration, users can also set spec.nodeName directly, which this plugin should also protect master nodes against.
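For illustration (hypothetical pod and image), a toleration with an empty key and operator Exists matches every taint, including the master/control-plane taints, which is exactly the scenario the admission plugin should guard against:

apiVersion: v1
kind: Pod
metadata:
  name: tolerates-everything
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
  tolerations:
  # An empty key with operator Exists tolerates all taints,
  # including node-role.kubernetes.io/master:NoSchedule.
  - operator: Exists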

Needed so we can provide this workflow to customers following the proposal at https://github.com/openshift/enhancements/pull/1583

 

Reference https://issues.redhat.com/browse/WRKLDS-1015

 

kube-scheduler pods are created by code residing in controllers provided by the kube-scheduler operator, so changes are required in that repo to add a toleration for the node-role.kubernetes.io/control-plane:NoExecute taint.

https://github.com/openshift/cluster-kube-scheduler-operator/blob/4be4e433eec566df60d6d89f09a13b706e93f2a3/bindata/assets/kube-scheduler/pod.yaml#L13

The operator itself does not run on the control-plane nodes, but if that change is necessary it would be here: https://github.com/openshift/cluster-kube-scheduler-operator/blob/4be4e433eec566df60d6d89f09a13b706e93f2a3/manifests/0000_25_kube-scheduler-operator_06_deployment.yaml#L12
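A minimal sketch of the kind of toleration that would need to be added to the linked pod.yaml so the static kube-scheduler pod is not evicted by the control-plane NoExecute taint:

tolerations:
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoExecute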

Feature Overview (aka. Goal Summary)

Migrate every occurrence of iptables in OpenShift to use nftables, instead.

Goals (aka. expected user outcomes)

Implement a full migration from iptables to nftables within a series of "normal" upgrades of OpenShift with the goal of not causing any more network disruption than would normally be required for an OpenShift upgrade. (Different components may migrate from iptables to nftables in different releases; no coordination is needed between unrelated components.)
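As a rough illustration of what such a port looks like for a single rule (assuming an existing "inet filter" table and "forward" chain on the nftables side):

# legacy iptables rule
iptables -A FORWARD -s 10.0.0.0/8 -j ACCEPT

# roughly equivalent nftables rule
nft add rule inet filter forward ip saddr 10.0.0.0/8 accept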

Requirements (aka. Acceptance Criteria):

  • Discover what components are using iptables (directly or indirectly, e.g. via ipfailover) and reduce the “unknown unknowns”.
  • Port components away from iptables.

Use Cases (Optional):

Questions to Answer (Optional):

  • Do we need a better “warning: you are using iptables” warning for customers? (eg, per-container rather than per-node, which always fires because OCP itself is using iptables). This could help provide improved visibility of the issue to other components that aren't sure if they need to take action and migrate to nftables, as well.

Out of Scope

  • Non-OVN primary CNI plug-in solutions

Background

Customer Considerations

  • What happens to clusters that don't migrate all iptables use to nftables?
    • In RHEL 9.x it will generate a single log message during node startup on every OpenShift node. There are Insights rules that will trigger on all OpenShift nodes.
    • In RHEL 10 iptables will just no longer work at all. Neither the command-line tools nor the kernel modules will be present.

Documentation Considerations

Interoperability Considerations

Template:

 

Networking Definition of Planned

Epic Template descriptions and documentation 

 

Epic Goal

  • Replace the random bits of iptables glue in ovn-kubernetes with exactly equivalent nftables versions

Why is this important?

  • iptables will not be supported in RHEL 10, so we need to replace all uses of it in OCP with nftables. See OCPSTRAT-873.

Planning Done Checklist

The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

  • Priority is set by engineering
  • Epic must be linked to a parent Feature
  • Target version must be set
  • Assignee must be set
  • Enhancement Proposal is implementable
  • No outstanding questions about major work breakdown
  • Are all stakeholders known? Have they all been notified about this item?
  • Does this epic affect SD? Have they been notified? (View plan definition for current suggested assignee)

Additional information on each of the above items can be found here: Networking Definition of Planned

 

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement
    details and documents.

...

Dependencies (internal and external)

1.

...

 

Feature Overview (Goal Summary)

This feature focuses on the optimization of resource allocation and image management within NodePools. This will include enabling users to specify resource groups at NodePool creation, integrating external DNS support, ensuring Cluster API (CAPI) and other images are sourced from the payload, and utilizing Image Galleries for Azure VM creation.

Goal

  • All Azure API fields should have appropriate definitions as to their use and purpose.
  • All Azure API fields should have appropriate Kubernetes CEL validation added (see the sketch below).
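For example, CEL rules can be attached to Azure API fields through x-kubernetes-validations in the CRD schema; the field and rule below are illustrative only, not the final API:

properties:
  subnetID:
    type: string
    x-kubernetes-validations:
    # Example rule: make the field immutable after creation.
    - rule: self == oldSelf
      message: subnetID is immutable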

Why is this important?

  • ...

Scenarios

  1. ...

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ...

Open questions:

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

Add additional details to the NodePool API for Azure. Replace SubnetName with SubnetID in the NodePool API.

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As a (user persona), I want to be able to:

  • Capability 1
  • Capability 2
  • Capability 3

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As a (user persona), I want to be able to:

  • Specify a resource ID of a network security group for hosted cluster creation
  • Be able to decouple my network security group that already exists from the "network resource group"

so that I can achieve

  • bring my own network security group
  • decouple my network security group from the vnet resource group

Acceptance Criteria:

Description of criteria:

  • The Azure cloud provider config is correctly plumbed with a network security group name and its resource group, which can exist in a separate resource group from the virtual network

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

  • Route tables, subnets, vnets, etc.

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As ARO/HCP provider, I want to be able to:

  • receive a customer's subnet id 

so that I can achieve

  • parse out the resource group name, vnet name, subnet name for use in CAPZ (creating VMs) and Azure CCM (setting up Azure cloud provider).

Acceptance Criteria:

Description of criteria:

  • Customer hosted cluster resources get created in a managed resource group.
  • Customer vnet is used in setting up the VMs.
  • Customer resource group remains unchanged.

Engineering Details:

  • Benjamin Vesel said the managed resource group and customer resource group would be under the same subscription id.
  • We are only supporting BYO VNET at the moment; we are not supporting the VNET being created with the other cloud resources in the managed resource group.

This requires a design proposal so OCM knows where to specify the resource group
This might require a feature gate in case we don't want it for self-managed.
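As a rough illustration of the parsing step (subscription ID and names below are placeholders), an Azure subnet resource ID can be split on "/" to recover the resource group, VNet, and subnet names:

SUBNET_ID="/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/customer-rg/providers/Microsoft.Network/virtualNetworks/customer-vnet/subnets/customer-subnet"
RESOURCE_GROUP=$(echo "$SUBNET_ID" | cut -d/ -f5)   # customer-rg
VNET_NAME=$(echo "$SUBNET_ID" | cut -d/ -f9)        # customer-vnet
SUBNET_NAME=$(echo "$SUBNET_ID" | cut -d/ -f11)     # customer-subnet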

User Story:

As ARO/HCP provider, I want to be able to:

  • specify which resource group is going to be used for the resources in the customer side

so that I can achieve

  • control over where resources get allocated instead of getting defaults.

Acceptance Criteria:

Description of criteria:

  • Customer side resources get created in the API specified resource group.

Engineering Details:

  •  

This requires a design proposal so OCM knows where to specify the resource group
This might require a feature gate in case we don't want it for self-managed.

User Story:

As ARO/HCP provider, I want to be able to:

  • receive a customer's subnet id 

so that I can achieve

  • parse out the resource group name, vnet name, subnet name for use in CAPZ (creating VMs) and Azure CCM (setting up Azure cloud provider).

Acceptance Criteria:

Description of criteria:

  • Customer hosted cluster resources get created in a managed resource group.
  • Customer vnet is used in setting up the VMs.
  • Customer resource group remains unchanged.

Engineering Details:

  • Benjamin Vesel said the managed resource group and customer resource group would be under the same subscription id.
  • We are only supporting BYO VNET at the moment; we are not supporting the VNET being created with the other cloud resources in the managed resource group.

This requires a design proposal so OCM knows where to specify the resource group
This might require a feature gate in case we don't want it for self-managed.

The SubnetID and NsgID flags are ignored if a VNet ID is specified in the HyperShift CLI; values explicitly passed to those flags should not be silently dropped.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

User Story:

As a HyperShift developer, I want to use the service principal as the identity type for CAPZ, so that the warning message about using a manual service principal in the capi-provider pod goes away.

Acceptance Criteria:

Description of criteria:

  • The identity type for CAPZ is set to service principal.
  • The warning messages about manual service principal in the capi-provider pod are gone.

Out of Scope:

  • N/A

Engineering Details:

  • Manual service principal was deprecated and will be removed in the future; see the linked documentation for further details.

This does not require a design proposal.
This does not require a feature gate.

User Story:

As a user of HyperShift, I want to be able to create Azure VMs with ephemeral disks, so that I can achieve higher IOPS.

Note: AKS also defaults to using them.

Acceptance Criteria:

Description of criteria:

  • Verify upstream documentation exists on how to set up Azure VMs with ephemeral disks
  • HyperShift documentation added on how to set up Azure VMs with ephemeral disks
  • Capability to create clusters with Azure VMs with ephemeral disks exists
  • Capability to create NodePools with Azure VMs with ephemeral disks exists

Out of Scope:

N/A

Engineering Details:

  • Upstream documentation exists in CAPZ here.

This does not require a design proposal.
This does not require a feature gate.
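A sketch of the underlying CAPZ setting involved, assuming the upstream AzureMachineTemplate osDisk.diffDiskSettings field (the exact plumbing through the HyperShift NodePool API is what this story delivers); a fragment only, not a complete manifest:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  name: example
spec:
  template:
    spec:
      osDisk:
        osType: Linux
        diskSizeGB: 128
        # Ephemeral (locally attached) OS disk.
        diffDiskSettings:
          option: Local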

Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a (user persona), I want to be able to:

  • Capability 1
  • Capability 2
  • Capability 3

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As a (user persona), I want to be able to:

  • Capability 1
  • Capability 2
  • Capability 3

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As a user of ARO HCP, I want to be able to:

  • specify which subnet my NodePool belongs to

so that I can achieve

  • place different NodePools in different subnets.

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Ability to specify subnet per NodePool

Out of Scope:

  • N/A

Engineering Details:

  • N/A

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview (aka. Goal Summary)

The feature is specifically designed to concentrate on network optimizations, particularly targeting improvements in how the network is configured and how access is managed using Cluster API (CAPI) for Azure (potentially running the control plane on AKS).

Goal

The current HCP implementation lets the Azure CCM pod create a load balancer (LB) and public IP address for guest cluster egress. The outbound SNAT uses the default port allocation.

ARO HCP needs more control over how the LB is created and set up. Ideally, CAPZ would create and manage the LB. ARO HCP would also like the ability to use an LB with user-defined routing (UDR).

Why is this important?

Using an LB for guest cluster egress is the better option cost-wise and availability-wise compared to a NAT Gateway; NAT Gateways are more expensive and are also zonal.

Scenarios

  1. ...

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a hosted cluster deployer, I want to be able to:

  • specify network security groups

so that I can achieve

  • Network flexibility to run my workloads

Acceptance Criteria:

  • Able to specify network security group when creating Azure cluster
  • Able to specify network security group when creating Azure infra

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This does not require a design proposal.
This does not require a feature gate.

User Story:

As a (user persona), I want to be able to:

  • Use a pre-existing network security group when creating an Azure cluster

so that I can achieve

  •  more network flexibility in my Hosted Control planes deployment

Acceptance Criteria:

Description of criteria:

  • Able to specify network security group when creating Azure cluster
  • Able to specify network security group when creating Azure infra

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

User Story

As a cluster admin trying to install the Shared Resource CSI driver, I want all images to come from the Red Hat container catalog so that I run fully supported and secured installations of the CSI driver.

Background

Akin to BUILD-109.

We have decided to onboard Builds for OpenShift to Konflux for v1.1 (instead of CPaaS)

Out of Scope

  • Re-productizing the webhook for the csi driver
  • Re-productizing the must-gather image for the csi driver
  • Driving CI through Konflux. We have CI infrastructure via OpenShift CI today, we don't need to disable this right now.

Approach (Required)

  • Onboard openshift/csi-driver-shared-resource to Konflux.
  • Enable SDL required tasks (ClamAV, Snyk scan, SBOM).
  • (maybe?) Update openshift/release so we run OpenShift CI checks against the respective release branch.

Dependencies

  • Konflux workspace for the Builds for OpenShift product
  • Konflux GitHub app installed in the openshift org

Acceptance Criteria (Required)

  • CSI driver image is built on Konflux
  • SDL tasks are enabled for csi driver image (ClamAV, Snyk, SBOM)
  • CI via openshift-ci continues to function

Open Questions

  • Declaring a "release branch" - do we need permission to create our own branch for "builds-1.1"? Or do we continue with the default branch ("master")?
  • Who do we reach out to when enabling the Konflux GitHub app for the openshift GitHub org?

INVEST Checklist

  • Dependencies identified
  • Blockers noted and expected delivery timelines set
  • Design is implementable
  • Acceptance criteria agreed upon
  • Story estimated

Legend

  • Unknown
  • Verified
  • Unsatisfied

Done Checklist

  • Code is completed, reviewed, documented and checked in
  • Unit and integration test automation have been delivered and running cleanly in continuous integration/staging/canary environment
  • Continuous Delivery pipeline(s) is able to proceed with new code included
  • Customer facing documentation, API docs etc. are produced/updated, reviewed and published
  • Acceptance criteria are met

Problem statement

DPDK applications require dedicated CPUs, isolated from any preemption (other processes, kernel threads, interrupts), and this can be achieved with the “static” policy of the CPU Manager: the container resources need to include an integer number of CPUs of equal value in “limits” and “requests”. For instance, to get six exclusive CPUs:

spec:
  containers:
  - name: CNF
    image: myCNF
    resources:
      limits:
        cpu: "6"
      requests:
        cpu: "6"

 

The six CPUs are dedicated to that container; however, non-trivial (i.e., real) DPDK applications do not use all of those CPUs, as there is always at least one CPU running a slow path: processing configuration, printing logs, and so on (among the DPDK coding rules: no syscalls in PMD threads, or you are in trouble). Even the DPDK PMD drivers and core libraries include pthreads which are intended to sleep; they are infrastructure pthreads processing link change interrupts, for instance.

Can we envision going with two processes, one with isolated cores and one with the slow-path ones, so we can have two containers? Unfortunately no: going to a multi-process design, where only dedicated pthreads would run in a given process, is not an option, as DPDK multi-process is being deprecated upstream and never gained adoption because it never properly worked. Fixing it and changing the DPDK architecture to systematically have two processes is absolutely not possible within a year, and would require all DPDK applications to be rewritten. Knowing that the first and current multi-process implementation is a failure, nothing guarantees that a second one would be successful.

The slow-path CPUs only consume a fraction of a real CPU and can safely be run on the “shared” CPU pool of the CPU Manager; however, container specifications do not allow requesting two kinds of CPUs, for instance:

 

spec:
  containers:
  - name: CNF
    image: myCNF
    resources:
      limits:
        cpu_dedicated: "4"
        cpu_shared: "20m"
      requests:
        cpu_dedicated: "4"
        cpu_shared: "20m"

Why do we care about allocating one extra CPU per container?

  • Allocating one extra CPU means allocating an additional physical core, as the CPUs running a DPDK application should run on a dedicated physical core in order to get maximum and deterministic performance, since caches and CPU units are shared between the two hyperthreads.
  • CNFs are built with a minimum number of CPUs per container. Today this is still between 10 and 20, sometimes more, but the intent is to decrease the number of CPUs per container and increase the number of containers, since having too-large containers to schedule wastes resources like in the VNF days (the tetris effect); smaller containers are the “cloud native” way to avoid that.

Let’s take a realistic example, based on a real RAN CNF: running 6 containers with dedicated CPUs on a worker node, where each slow path requires only 0.1 CPU but still gets a full dedicated CPU, means that we waste roughly 5 CPUs, i.e. 3 physical cores. With real-life numbers:

  • For a single datacenter composed of 100 nodes, we waste 300 physical cores
  • For a single datacenter composed of 500 nodes, we waste 1,500 physical cores
  • For single node OpenShift deployed on 1 million nodes, we waste 3 million physical cores

Intel’s public CPU price per core is around 150 US$, and that is not even taking into account the ecological aspect of the waste of (rare) materials, electricity, and cooling.

 

Goals

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

Questions to answer…

  • Would an implementation based on annotations be possible rather than an implementation requiring a container (so pod) definition change, like the CPU pooler does?

Out of Scope

Background, and strategic fit

This issue has been addressed lately by OpenStack.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

  • The feature needs documentation on how to configure OCP, create pods, and troubleshoot

Epic Goal

  • Provide a standardized upstream solution to achieve functionality similar to the mixed-cpu-node-plugin.
  • A new API for requesting both shared and exclusive CPUs in the same container.

Why is this important?

  • Upstream solutions are the Red Hat way.
  • Decreases the overhead of maintaining an internal/downstream solution.
  • Gains community traction.

Scenarios

  1. see previous work

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • KEP describing the solution, new APIs, deep dive, etc.
  • Having the solution merge in K8S/K8S-sigs relevant repo

Dependencies (internal and external)

  1. see previous work

Previous Work (Optional):

  1. https://issues.redhat.com/browse/CNF-7603

Open questions:

  1. How do we represent a pod which requests both shared and isolated CPUs? (DRA?!?)
  2. Where should the shared CPUs come from?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Test description: Create a performance profile with shared CPUs enabled. Create a guaranteed (gu) pod that has the shared CPU device enabled. Verify that the pod has the shared CPUs exported, then disable the feature by modifying the performance profile, and verify that the pod's cpuset no longer includes the shared CPUs.

https://docs.google.com/document/d/1ci1pvLPrAaI5_K-I2IJf0MgYozUrhLM9ANOZkskMc_Y/edit#heading=h.iy76fjamvyjg
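A heavily hedged sketch of the shapes involved, assuming the dev-preview field and resource names (a shared CPU set in the PerformanceProfile plus a per-container request for the shared-CPU device); consult the official documentation for the GA API:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: mixed-cpus
spec:
  cpu:
    reserved: "0-1"
    isolated: "4-15"
    # Shared CPU pool made available to containers that request it (assumed field).
    shared: "2-3"
  workloadHints:
    mixedCpus: true
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
---
apiVersion: v1
kind: Pod
metadata:
  name: mixed-cpus-pod
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    resources:
      limits:
        cpu: "4"
        memory: 1Gi
        # Assumed resource name for requesting access to the shared CPUs.
        workload.openshift.io/enable-shared-cpus: "1"
      requests:
        cpu: "4"
        memory: 1Gi
        workload.openshift.io/enable-shared-cpus: "1"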

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Making the MixedCPUs feature GA during 4.16 release time frame
  • Having full functional comprehensive test-suite exercising the feature on both cgroups versions (v1 and v2)
  • Having official OCP docs explaining how to use this feature. 

Why is this important?

  • This would unblock lots of options including mixed cpu workloads where some CPUs could be shared among containers / pods CNF-3706
  • This would also allow further research on dynamic (simulated) hyper threading CNF-3743

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://issues.redhat.com/browse/CNF-7603
  2. https://issues.redhat.com/browse/CNF-9117 - Dev preview epic

Open questions:

 N/A

 

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview

Telecommunications providers continue to deploy OpenShift at the Far Edge. The acceleration of this adoption and the nature of existing Telecommunication infrastructure and processes drive the need to improve OpenShift provisioning speed at the Far Edge site and the simplicity of preparation and deployment of Far Edge clusters, at scale.

Goals

  • Simplicity The folks preparing and installing OpenShift clusters (typically SNO) at the Far Edge range in technical expertise from technician to barista. The preparation and installation phases need to be reduced to a human-readable script that can be utilized by a variety of non-technical operators. There should be as few steps as possible in both the preparation and installation phases.
  • Minimize Deployment Time A telecommunications provider technician or brick-and-mortar employee who is installing an OpenShift cluster, at the Far Edge site, needs to be able to do it quickly. The technician has to wait for the node to become in-service (CaaS and CNF provisioned and running) before they can move on to installing another cluster at a different site. The brick-and-mortar employee has other job functions to fulfill and can't stare at the server for 2 hours. The install time at the far edge site should be in the order of minutes, ideally less than 20m.
  • Utilize Telco Facilities Telecommunication providers have existing Service Depots where they currently prepare SW/HW prior to shipping servers to Far Edge sites. They have asked RH to provide a simple method to pre-install OCP onto servers in these facilities. They want to do parallelized batch installation to a set of servers so that they can put these servers into a pool from which any server can be shipped to any site. They also would like to validate and update servers in these pre-installed server pools, as needed.
  • Validation before Shipment Telecommunications Providers incur a large cost if forced to manage software failures at the Far Edge due to the scale and physical disparate nature of the use case. They want to be able to validate the OCP and CNF software before taking the server to the Far Edge site as a last minute sanity check before shipping the platform to the Far Edge site.
  • IPSec Support at Cluster Boot Some far edge deployments occur on an insecure network, and for that reason access to the host’s BMC is not allowed; additionally, an IPSec tunnel must be established before any traffic leaves the cluster once it is at the Far Edge site. It is not possible to enable IPSec on the BMC NIC, and therefore even after OpenShift has booted, the BMC is still not accessible.

Requirements

  • Factory Depot: Install OCP with minimal steps
    • Telecommunications Providers don't want an installation experience, just pick a version and hit enter to install
    • Configuration w/ DU Profile (PTP, SR-IOV, see telco engineering for details) as well as customer-specific addons (Ignition Overrides, MachineConfig, and other operators: ODF, FEC SR-IOV, for example)
    • The installation cannot increase the in-service OCP compute budget (don't install anything other than what is needed for DU)
    • Provide ability to validate previously installed OCP nodes
    • Provide ability to update previously installed OCP nodes
    • 100 parallel installations at Service Depot
  • Far Edge: Deploy OCP with minimal steps
    • Provide site specific information via usb/file mount or simple interface
    • Minimize time spent at far edge site by technician/barista/installer
    • Register with desired RHACM Hub cluster for ongoing LCM
  • Minimal ongoing maintenance of solution
    • Some, but not all telco operators, do not want to install and maintain an OCP / ACM cluster at Service Depot
  • The current IPSec solution requires a libreswan container to run on the host so that all N/S OCP traffic is encrypted. With the current IPSec solution this feature would need to support provisioning host-based containers.

 

A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts.  If a non MVP requirement slips, it does not shift the feature.

requirement Notes isMvp?
     
     
     

 

Describe Use Cases (if needed)

Telecommunications Service Provider Technicians will be rolling out OCP w/ a vDU configuration to new Far Edge sites, at scale. They will be working from a service depot where they will pre-install/pre-image a set of Far Edge servers to be deployed at a later date. When ready for deployment, a technician will take one of these generic-OCP servers to a Far Edge site, enter the site specific information, wait for confirmation that the vDU is in-service/online, and then move on to deploy another server to a different Far Edge site.

 

Retail employees in brick-and-mortar stores will install SNO servers and it needs to be as simple as possible. The servers will likely be shipped to the retail store, cabled and powered by a retail employee and the site-specific information needs to be provided to the system in the simplest way possible, ideally without any action from the retail employee.

 

Out of Scope

Q: how challenging will it be to support multi-node clusters with this feature?

Background, and strategic fit

< What does the person writing code, testing, documenting need to know? >

Assumptions

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Customer Considerations

< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>

<Does the Feature introduce data that could be gathered and used for Insights purposes?>

Documentation Considerations

< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >

< What does success look like?>

< Does this feature have doc impact?  Possible values are: New Content, Updates to existing content,  Release Note, or No Doc Impact>

< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>

  • <What concepts do customers need to understand to be successful in [action]?>
  • <How do we expect customers will use the feature? For what purpose(s)?>
  • <What reference material might a customer want/need to complete [action]?>
  • <Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available. >
  • <What is the doc impact (New Content, Updates to existing content, or Release Note)?>

Interoperability Considerations

< Which other products and versions in our portfolio does this feature impact?>

< What interoperability test scenarios should be factored by the layered product(s)?>

Questions

Question Outcome
   

 

 

As an operations person, I would like to have tooling to write a generic seed image to disk so that it will be bootable, and to run precaching as well, at scale.

The initial idea is to base the flow on Anaconda: https://issues.redhat.com/browse/RHEL-2250

 

https://redhat-internal.slack.com/archives/C05JHD9QYTC/p1702496826548079
 
 

When performing IBI, we do not currently wipe the installation disk beforehand.

As our experience with assisted-installer shows, this can create problems

We should see how to reuse the disk wiping that we already do in assisted: https://github.com/openshift/assisted-installer/blob/f4b8cfd85dfe8194aac489bacfb93ef8501fd290/src/installer/installer.go#L780

Feature Overview

  • Telco customers delivering servers to far edge sites (D-RAN DU) need the ability to upgrade or downgrade the server's BIOS and firmware to specific versions to ensure the server is configured as it was in their validated pattern. After delivering the bare metal to the site, this should be done prior to using ZTP to provision the node.

Goals

  • Allow the operator, via GitOps, to specify the BIOS image and any firmware images to be installed prior to ZTP
  • Seamless transition from pre-provisioning to existing ZTP solution
  • Integrate BIOS/Firmware upgrades/downgrades into TALM
  • (consider) integration with backup/restore recovery feature
  • Firmware could include: NICs, accelerators, GPUS/DPUS/IPUS

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • Can we provide the ability to fail back to a well known good firmware image if the upgrade/reboot fails?

Out of Scope

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

Complete Epics

This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were completed when this image was assembled

Epic Goal

The goal of this epic is to guarantee that all pods running within the ACM (Advanced Cluster Management) cluster adhere to Kubernetes Security Context Constraints (SCC). The implementation of a comprehensive SCC compliance checking system will proactively maintain a secure and compliant environment, mitigating security risks.

Why is this important?

Ensuring SCC compliance is critical for the security and stability of a Kubernetes cluster. 

Scenarios

A customer responsible for overseeing the operations of their cluster faces the challenge of maintaining a secure and compliant Kubernetes environment. The organization relies on the ACM cluster to run a variety of critical workloads across multiple namespaces. Security and compliance are top priorities, especially considering the sensitive nature of the data and applications hosted in the cluster.

Deployments to Investigate

Only Annotation Needed:

  • [ ] operator (Hypershift)

Further Investigation Needed

  • [ ] hypershift-addon-agent (Hypershift)
  • [ ] hypershift-install-job (Hypershift)

Acceptance Criteria

  • [ ] Develop a script capable of automated checks for SCC compliance for all pods within the ACM cluster, spanning multiple namespaces.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ...

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub
    Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub
    Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Value Statement

As an ACM admin, I want to add Kubernetes Security Context Constraints (SCC) V2 options to the component's resource YAML configuration to ensure that the Pod runs with the 'readonlyrootfilesystem' and 'privileged' settings, in order to enhance the security and functionality of our application.

In the resource config YAML, we need to add the following context:

securityContext:
  privileged: false
  readOnlyRootFilesystem: true
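
As an illustration only (the affected components define this in their resource YAML, not necessarily in Go), the same settings expressed with client-go types could look like this sketch; the container name and image are placeholders:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	privileged := false
	readOnlyRootFS := true

	// Hypothetical container spec carrying the SCC V2 settings described above.
	container := corev1.Container{
		Name:  "hypershift-addon-agent",
		Image: "example.invalid/hypershift-addon-agent:latest", // placeholder image
		SecurityContext: &corev1.SecurityContext{
			Privileged:             &privileged,
			ReadOnlyRootFilesystem: &readOnlyRootFS,
		},
	}
	fmt.Printf("%+v\n", container.SecurityContext)
}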

Affected resources:

  • [x] operator
  • [x] hypershift-addon-agent
  • [x] hypershift-install-job

Definition of Done for Engineering Story Owner (Checklist)

  • [x] Ensure that the Pod continues to function correctly with the new SCC V2 settings.
  • [x] Verify that the SCC V2 options are effective in limiting the Pod's privileges and restricting write access to the root filesystem.

Development Complete

  • The code is complete.
  • Functionality is working.
  • Any required downstream Docker file changes are made.

Tests Automated

  • [ ] Unit/function tests have been automated and incorporated into the
    build.
  • [ ] 100% automated unit/function test coverage for new or changed APIs.
  • Regression test is all we need for QE.

Secure Design

  • [ ] Security has been assessed and incorporated into your threat model.

Multidisciplinary Teams Readiness

Support Readiness

  • [ ] The must-gather script has been updated.

Description
This epic tracks install server errors which require investigation into whether the RP or ARO-Installer can be more resilient or return a User Error instead of a Server Error.

This lives under shift improvement as it will reduce the number of incidents we get from customers due to `Internal Server Error` being returned.

How to Create Stories Under Epic
During the weekly SLO/SLA/SLI meeting, we will examine install failures on the cluster installation failure dashboard. We will aggregate the top occurring items which show up as Server Errors, and each story underneath will be an investigation required to figure out root cause and how we can either prevent it, be resilient to failures, or return a User Error.

Cluster installation Failures dashboard

AS AN ARO SRE
I WANT either:
SO THAT

1. Decorate the error in the ARO installer to return a more informative message as to why the SKU was not found.
2. Ensure that the OpenShift installer and the RP are validating the SKU in the same manner
3. If the validation is the same between the Installer and ARO installer, we have the option to remove the ARO installer validation step

Acceptance Criteria

Given: The RP and the ARO Installer validate the SKU in the same manner
When: The RP validates
Then: The ARO Installer does not

Given: The ARO Installer performs additional or improved validation compared to the RP
When: The ARO Installer validation fails due to a missing SKU (failed validation)
Then: Enhance the log to include the SKU that was not found, providing us with more information to troubleshoot
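
A minimal sketch, with hypothetical function and type names, of decorating the validation failure so the missing SKU appears in the error and therefore in the log:

package main

import (
	"errors"
	"fmt"
)

// errSKUNotFound is a hypothetical sentinel used by the SKU validation step.
var errSKUNotFound = errors.New("sku not found")

// validateSKU stands in for the real validation; it only illustrates wrapping
// the failure with the SKU name and region that were requested.
func validateSKU(sku, region string, available map[string]bool) error {
	if !available[sku] {
		return fmt.Errorf("validating VM SKU %q in region %q: %w", sku, region, errSKUNotFound)
	}
	return nil
}

func main() {
	available := map[string]bool{"Standard_D8s_v3": true}
	if err := validateSKU("Standard_D16s_v5", "eastus", available); err != nil {
		// The wrapped error now says exactly which SKU was not found.
		fmt.Println("install failed:", err)
		fmt.Println("is SKU-not-found:", errors.Is(err, errSKUNotFound))
	}
}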

Breadcrumbs


Epic Goal

  • To have PAO working on the Hypershift environment with feature parity with SNO (as much as possible)
  • Not to be enabled prior to 4.17 (at minimum)
  • Add sanity test/s in hypershift CI
  • Note: eventually hypershift tests will run in hypershift CI lanes

Why is this important?

  • Hypershift is a very interesting platform for clients, and PAO has a key role in node tuning, so making it work on Hypershift is a good way to ease the migration to this new platform, as clients will not lose their tuning capabilities.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • No regressions are introduced

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

On Hypershift there are two different clusters that we should communicate with during a test run.
The existing test client only communicates with a single cluster.

We should add an additional client in order to support the Hypershift case.
More details in the design doc:
https://docs.google.com/document/d/1_NFonPShbi1kcybaH1NXJO4ZojC6q7ChklCxKKz6PIs/edit#heading=h.ocz97kq0reax
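
A minimal sketch, with hypothetical names and assuming client-go, of a test client holder that carries both the management and hosted cluster clients:

package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// TestClients is a hypothetical replacement for the single-cluster test client,
// bundling one client per cluster involved in a HyperShift test run.
type TestClients struct {
	Management kubernetes.Interface // management (hosting) cluster
	Hosted     kubernetes.Interface // hosted (guest) cluster
}

func newClient(kubeconfig string) (kubernetes.Interface, error) {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		return nil, err
	}
	return kubernetes.NewForConfig(cfg)
}

func NewTestClients(mgmtKubeconfig, hostedKubeconfig string) (*TestClients, error) {
	mgmt, err := newClient(mgmtKubeconfig)
	if err != nil {
		return nil, fmt.Errorf("building management client: %w", err)
	}
	hosted, err := newClient(hostedKubeconfig)
	if err != nil {
		return nil, fmt.Errorf("building hosted client: %w", err)
	}
	return &TestClients{Management: mgmt, Hosted: hosted}, nil
}

func main() {
	// Kubeconfig paths are placeholders.
	if _, err := NewTestClients("/tmp/mgmt.kubeconfig", "/tmp/hosted.kubeconfig"); err != nil {
		fmt.Println(err)
	}
}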


Epic Goal

  • To have PAO working on the Hypershift environment with feature parity with SNO (as much as possible)
  • Not to be enabled prior to 4.16 (at minimum)
  • Add sanity test/s in hypershift CI
  • Note: eventually hypershift tests will run in hypershift CI lanes

Why is this important?

  • Hypershift is a very interesting platform for clients, and PAO has a key role in node tuning, so making it work on Hypershift is a good way to ease the migration to this new platform, as clients will not lose their tuning capabilities.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • No regressions are introduced

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Some changes are needed in the NodePool controller to enable running the Performance Profile controller as part of the Node Tuning Operator in HyperShift hosted control planes to manage node tuning of hosted nodes.

PerformanceProfile objects will be created in the management cluster, embedded into a ConfigMap, and referenced in a field of the NodePool API. The NodePool controller will then handle these objects and create a ConfigMap in the hosted cluster namespace for the Performance Profile controller to read.

More information in the enhancement proposal.
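
A minimal sketch, under the assumption that the embedding is a plain ConfigMap whose data carries the serialized PerformanceProfile (the key and label names here are hypothetical, not the enhancement's actual contract):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// embedPerformanceProfile wraps an already-serialized PerformanceProfile into a
// ConfigMap destined for the hosted cluster namespace, roughly mirroring what
// the NodePool controller is described to do above.
func embedPerformanceProfile(profileYAML, nodePool, namespace string) *corev1.ConfigMap {
	return &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "perfprofile-" + nodePool,
			Namespace: namespace,
			Labels: map[string]string{
				// Hypothetical label so the Performance Profile controller can find it.
				"hypershift.openshift.io/performanceprofile-config": nodePool,
			},
		},
		Data: map[string]string{
			"config": profileYAML, // the embedded PerformanceProfile manifest
		},
	}
}

func main() {
	cm := embedPerformanceProfile("apiVersion: performance.openshift.io/v2\nkind: PerformanceProfile\n", "nodepool-1", "clusters-hostedcluster01")
	fmt.Println(cm.Name, "->", len(cm.Data), "data key(s)")
}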

When PAO controller tries to create an event we get the following error:

E0402 09:41:10.578920       1 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"perfprofile-hostedcluster01.17c0ad00e8c2abc7", GenerateName:"", Namespace:"clusters-hostedcluster01", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"ConfigMap", Namespace:"clusters-hostedcluster01", Name:"perfprofile-hostedcluster01", UID:"e8d4883f-9944-4827-acba-21f757444a21", APIVersion:"v1", ResourceVersion:"21684945", FieldPath:""}, Reason:"Creation succeeded", Message:"[hypershift:perfprofile-hostedcluster01] Succeeded to create all components", Source:v1.EventSource{Component:"performance-profile-controller", Host:""}, FirstTimestamp:time.Date(2024, time.March, 27, 16, 47, 57, 817465799, time.Local), LastTimestamp:time.Date(2024, time.April, 2, 9, 41, 10, 577809390, time.Local), Count:14, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "perfprofile-hostedcluster01.17c0ad00e8c2abc7" is forbidden: User "system:serviceaccount:clusters-hostedcluster01:cluster-node-tuning-operator" cannot patch resource "events" in API group "" in the namespace "clusters-hostedcluster01"' (will not retry!)

This means we should add the "events" resource under the `Role` of the NTO operator.
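
As a minimal sketch, the extra rule expressed with rbac/v1 types; the exact verb list is an assumption based on the error above (the service account is denied "patch"):

package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
)

func main() {
	// Rule to append to the NTO operator's Role so it can emit Events.
	eventsRule := rbacv1.PolicyRule{
		APIGroups: []string{""}, // core API group
		Resources: []string{"events"},
		Verbs:     []string{"create", "patch"},
	}
	fmt.Printf("%+v\n", eventsRule)
}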

PerformanceProfile objects are handled in a different way on Hypershift, so modifications in the Performance Profile controller are needed to handle this.

Basically, the Performance Profile controller has to reconcile ConfigMaps which have PerformanceProfile objects embedded in them, create the different manifests as usual, and then hand them off to the hosted cluster using different methods.

More info in the enhancement proposal

Target: To have a feature equivalence in Hypershift and Standalone deployments

 

[UPDATE] This story is about setting up the scaffolding for the actual hypershift implementation.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

A separate PR is requested from the NTO team to document the rationale for setting intel_pstate by default. The idea of this story is to set the pstate mode depending on the underlying hardware's generation. Per the recommendation from I///, newer hardware from the IceLake+ generation comes with HWP enabled, therefore the recommendation is to keep intel_pstate set to active.

This PR should also modify all the e2e tests where rendered output is verified, because setting pstate to active requires modifications to those files.

Epic Goal

This epic tracks any part of our codebase / solutions we implemented by taking shortcuts.
Whenever a shortcut is taken, we should add a story here so we do not forget to improve it in a safer and more maintainable way.

Why is this important?

Maintainability and debuggability, and fighting technical debt in general, are critical to keeping velocity and ensuring overall high quality.

Scenarios

  1. N/A

Acceptance Criteria

  • depends on the specific card

Dependencies (internal and external)

  • depends on the specific card

Previous Work (Optional):

https://issues.redhat.com/browse/CNF-796
https://issues.redhat.com/browse/CNF-1479 
https://issues.redhat.com/browse/CNF-2134
https://issues.redhat.com/browse/CNF-6745
https://issues.redhat.com/browse/CNF-8036

 Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Set of API updates, best practices adoption and cleanup to the e2e suite:

  • Replacing pointer package API
  • Passing contexts from the caller functions
  • Removing unused functions

This epic is to track any stories for 4.16 hypershift kubevirt development that do not fit cleanly within a larger effort.

Here are some examples of tasks that this "catch all" epic can capture

  • dependency update maintenance tasks
  • ci and testing changes/fixes
  • investigation spikes

This is a follow-up to CNV-29003, in which we enabled the NodePoolUpgrade test, but only for the Replace strategy, where a node pool release image update triggers a machine rollout, resulting in the creation of new nodes with the updated RHCOS/kubelet version.

 

With the InPlace strategy, the machines are not recreated but are upgraded in place with only a soft reboot. We've noticed it doesn't work for kubevirt as expected: the updated node gets stuck at SchedulingDisabled and nodepool.status.version is not updated.

This task is to find the root cause, fix it, and make the InPlace nodepool test pass on the presubmit CI.

Epic Goal

  • Reduce Installer Build Times

Why is this important?

Building CI Images has recently increased in duration, sometimes hitting 2 hours, which causes multiple problems:

  • Takes longer for devs to get feedback on PRs
  • Images can take so long to build, e2e tests hit timeout limits, causing them to fail and trigger retests, which wastes money

More importantly, the build times have gotten to a point where OSBS is failing to build the installer due to timeouts, which is making it impossible for ART to deliver the product or critical fixes.

Scenarios

  1. ...

Acceptance Criteria

  • Investigate what is causing spike in master build times
  • As much as possible, decrease total time jobs need to wait before e2e tests can begin

Out of Scope

  1. The scope of this epic is solely towards improving CI experience. This should focus on quick wins. The work of decreasing installer build times will go into another epic

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Create a new repo for the providers, build them into a container image and import the image in the installer container image.

Hopefully this will save resources and decrease build times for CI jobs in Installer PRs.

This epic tracks the rebase of openshift/etcd to 3.5.14

This update includes the following changes:
https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-3.5.md#v3514-2024-05-29

Most notably this includes the experimental flag to stop serving requests on an etcd member that is undergoing defragmentation which would help address https://issues.redhat.com/browse/OCPSTRAT-319

Epic Goal

  • Gather tech debt stories for OCP 4.15

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

After Layering and Hypershift GAs in 4.12, we need to remove the code and builds that are no longer associated with mainline OpenShift.

 

This describes non-customer-facing work.

Links to the effort:

The MCO and ART bits can be done ahead of time (except for the final config knob flip). Then we would merge the openshift/os, openshift/driver-toolkit PRs and do the ART config knob flip at roughly the same time.

Epic Goal

  • Reduce the resource footprint and simplify the metal3 pod.
  • Reduce the maintenance burden on the upstream ironic team.

Why is this important?

  • Inspector is a separate service for purely historical reasons. It integrates pretty tightly with Ironic, but has to maintain its own database, which is synchronized with Ironic's via the API. This hurts performance and debuggability.
  • One fewer service will have a positive effect on the resource footprint of Metal3.

Scenarios

This is not a user-visible change.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • No ironic-inspector service is running in the Metal3 pod. In-band inspection functionality is retained.

Dependencies (internal and external)

METAL-119

Previous Work (Optional):

METAL-119 provides the upstream ironic functionality

Since we're moving toward a source-based model for the OCP ironic-image using downstream packages, we're starting to see more and more discrepancies with the OKD version, which is based on CS9 and upstream packages, causing conflicts and issues due to missing or too-old dependencies.
For this reason we'd like to split the lists of installed packages between OCP and OKD, as was done for the ironic-agent-image.

With the cachito configuration in place, we can now start converting the ironic packages and their dependencies to install from source.

In CI and local builds, if REMOTE_SOURCES and REMOTE_SOURCES_DIR are not defined, they assume the value of ".", effectively enabling the COPY . . command in the Dockerfile.
To avoid potential issues, we need to find an alternative and default REMOTE_SOURCES to something safer.

To avoid mistakes like using hashes from the wrong releases, we need to have a way to test them.
Ideally this should also be automated in a CI job.

Definition of done

In SaaS, allow users of assisted-installer UI or API, to install any published OCP version out of a supported list of x.y options.

Feature Origin

This feature probably originates from our own team. It will enhance the current workflow we follow to allow users to selectively install versions in the assisted-installer SaaS.

Until now, we had to be contacted by individual users to allow a specific version (usually it had been replaced by us with a newer version). In that case, we would add the version to the relevant configuration file.

Feature usage

It's not possible to quantify the relevant numbers here, because users might be missing certain versions in assisted and simply give up using it. In addition, it's not possible to know whether users intended to use a certain "old" version, or if it was just an arbitrary decision.

Osher De Paz can we know how many requests we had for "out-of-supported-list"?

Feature availability

It's essential to include this feature in the UI. Otherwise, users will get very confused about the feature parity between API and UI.

Osher De Paz there will always be features that exist in the API and not in the UI. We usually show in the UI features that are more common and we know that users will be interacting with them.

Why is this important?

  • We need a generic way of using a specific OCP version on the cloud and also for other platforms by the user

Scenarios

  1. We will need to add validation that the images for this version exist before installation.
  2.  

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

  1. Not sure how we will handle this requirement in a disconnected environment
  2.  

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: Test Plan
  • QE - Manual execution of the feature - done
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of the problem:

In the current user interface, when you select a multiarch OpenShift version, you must specify a CPU architecture from those available in the manifest. After making this selection and clicking "next," the user interface attempts to register a cluster with the chosen version and CPU architecture. However, the assisted-service encounters an error and fails to find the requested release image.

How reproducible:

Always.

Steps to reproduce:

1. Choose multiarch OpenShift version

2. Press "next"

Actual results:

Expected results:

Successfully registering a cluster.

Feature goal (what are we trying to solve here?)

  • When using ACM/MCE with infrastructure operator automatically import local cluster to enable adding nodes

DoD (Definition of Done)

When enabling infrastructure operator automatically import the cluster and enable users to add nodes to self cluster via Infrastructure operator

Does it need documentation support?

Yes, it's a new functionality that will need to be documented

Feature origin (who asked for this feature?)

Reasoning (why it’s important?)

  • Right now, in order to enable this flow, the user needs to install MCE, enable the infrastructure operator, and follow this guide in order to add nodes using the infrastructure operator. We would like to make this process easier for users.
  • It will automatically provide an easy start with CIM.

Competitor analysis reference

  • Do our competitors have this feature? N/A

Feature usage (do we have numbers/data?)

  • We are not collecting MCE data yet
  • We were asked several times by customer support how to run this flow

Feature availability (why should/shouldn't it live inside the UI/API?)

  • Please describe the reasoning behind why it should/shouldn't live inside the UI/API - UI will benefit from it by having the cluster prepared for the user
  • If it's for a specific customer we should consider using AMS - Some users would like to manage the cluster locally, otherwise why did they install MCE?
  • Does this feature exist in the UI of other installers? - No

Open questions

  1. How to handle static networking - the infrastructure operator is not aware of how the local network was defined
  2. What should the API look like? Should the user specify a target namespace, or should it be automatic?
  3. How do we get the local kubeconfig? Can we use the one that the pod is using?

Description of the problem:

Imported local cluster doesn't inherit proxy settings from AI.

How reproducible:

100%

Steps to reproduce:

1. Install 4.15 hub cluster with proxy enabled (IPv6)

2. Install 2.10 ACM

Actual results:

No proxy settings in `local-cluster` ACI. 

Expected results:

ACI `local-cluster` inherited proxy settings from AI pod.

Feature goal (what are we trying to solve here?)

The external platform was created to allow cloud providers to supply their own integration components (cloud controller manager, etc.) without prior integration into OpenShift release artifacts. We need to support this new platform in assisted-installer in order to provide a user-friendly way to enable such clusters, and to enable new-to-OpenShift cloud providers to quickly establish an installation process that is robust and will guide them toward success.

DoD (Definition of Done)

Please describe what conditions must be met in order to mark this feature as "done".

Does it need documentation support?

If the answer is "yes", please make sure to check the corresponding option.

Feature origin (who asked for this feature?)

  • A Customer asked for it

    • Name of the customer(s)
    • How many customers asked for it?
    • Can we have a follow-up meeting with the customer(s)?

 

  • A solution architect asked for it

    • Name of the solution architect and contact details
    • How many solution architects asked for it?
    • Can we have a follow-up meeting with the solution architect(s)?

 

  • Internal request

    • Who asked for it?

 

  • Catching up with OpenShift

Reasoning (why it’s important?)

  • Please describe why this feature is important
  • How does this feature help the product?

Competitor analysis reference

  • Do our competitors have this feature?
    • Yes, they have it and we can have some reference
    • No, it's unique or explicit to our product
    • No idea. Need to check

Feature usage (do we have numbers/data?)

  • We have no data - the feature doesn’t exist anywhere
  • Related data - the feature doesn’t exist but we have info about the usage of associated features that can help us
    • Please list all related data usage information
  • We have the numbers and can relate to them
    • Please list all related data usage information

Feature availability (why should/shouldn't it live inside the UI/API?)

  • Please describe the reasoning behind why it should/shouldn't live inside the UI/API
  • If it's for a specific customer we should consider using AMS
  • Does this feature exist in the UI of other installers?

Feature goal (what are we trying to solve here?)

This is an existing feature that was added but didn't go through any testing or documentation. It only works internally when the assisted installer uses it itself, but is completely broken when users try to use it.

See MGMT-12435 , MGMT-15999 , MGMT-16000 and MGMT-16002

Please describe what this feature is going to do.

This feature already exists, MGMT-16000 covers it and gives an example of what it looks like

DoD (Definition of Done)

Please describe what conditions must be met in order to mark this feature as "done".

  • Documentation
  • API support
  • QE

Does it need documentation support?

Yes

If the answer is "yes", please make sure to check the corresponding option.

Feature origin (who asked for this feature?)

  • A Customer asked for it

    • Name of the customer(s)
    • How many customers asked for it?
    • Can we have a follow-up meeting with the customer(s)?

 

  • A solution architect asked for it

    • Name of the solution architect and contact details
    • How many solution architects asked for it?
    • Can we have a follow-up meeting with the solution architect(s)?

 

  • Internal request

    • Who asked for it?

 

  • Catching up with OpenShift

Reasoning (why it’s important?)

  • Please describe why this feature is important
    • Users need to override installer manifests with surgical patches; we can't expect them to maintain a copy of an entire manifest that's ever changing
      • Actually, it looks like since 4.12 it's impossible to override installer manifests at all, which makes this feature even more urgent. Some things, like RFE-3981, cannot be changed post-installation; it has to be done during installation
  • How does this feature help the product?
    • Users get better control about particular parameters

Competitor analysis reference

  • Do our competitors have this feature?N/A

Feature usage (do we have numbers/data?)

  • We have no data - the feature doesn’t exist anywhere

Feature availability (why should/shouldn't it live inside the UI/API?)

  • Please describe the reasoning behind why it should/shouldn't live inside the UI/API
  • If it's for a specific customer we should consider using AMS
  • Does this feature exist in the UI of other installers?

Description of the problem:

This pattern: https://github.com/openshift/assisted-service/blob/efc20a5ea46368da70143455dc0300aebd79ce18/swagger.yaml#L6687

is too strict; it does not allow users to create patch manifests.

See related MGMT-15999 and MGMT-16000
 
How reproducible:

100% 

Steps to reproduce:

1. Try to create a patch manifest:

curl http://localhost:8080/api/assisted-install/v2/clusters/${CLUSTER_ID}/manifests \
--json '
{
  "folder": "openshift",
  "file_name": "cluster-network-02-config.yml.patch_custom_ovn_v4internalsubnet",
  "content": "---\n- op: add\npath: /spec/defaultNetwork\nvalue:\novnKubernetesConfig:\nv4InternalSubnet: 100.65.0.0/16"
}
'

Actual results:

 Error

{"code":605,"message":"file_name in body should match '^[^/]*\\.(yaml|yml|json)$'"}

Expected results:

No error
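
For illustration only, and not necessarily the actual fix, a relaxed pattern that keeps rejecting path separators while accepting patch-style suffixes could look like the following:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Current pattern from swagger.yaml: rejects anything not ending in .yaml/.yml/.json.
	current := regexp.MustCompile(`^[^/]*\.(yaml|yml|json)$`)
	// Hypothetical relaxed pattern: also accepts a ".patch<suffix>" after the extension.
	relaxed := regexp.MustCompile(`^[^/]*\.(yaml|yml|json)(\.patch[^/]*)?$`)

	name := "cluster-network-02-config.yml.patch_custom_ovn_v4internalsubnet"
	fmt.Println("current accepts:", current.MatchString(name)) // false
	fmt.Println("relaxed accepts:", relaxed.MatchString(name)) // true
}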

Feature goal (what are we trying to solve here?)

Allow using late binding via BMH, to support late binding via ZTP, part of RFE-4769

DoD (Definition of Done)

Please describe what conditions must be met in order to mark this feature as "done".

Does it need documentation support?

If the answer is "yes", please make sure to check the corresponding option.

Feature origin (who asked for this feature?)

  • A Customer asked for it

    • Name of the customer(s)
    • How many customers asked for it?
    • Can we have a follow-up meeting with the customer(s)?

 

  • A solution architect asked for it

    • Name of the solution architect and contact details
    • How many solution architects asked for it?
    • Can we have a follow-up meeting with the solution architect(s)?

 

  • Internal request

    • Who asked for it?

 

  • Catching up with OpenShift

Reasoning (why it’s important?)

  • Please describe why this feature is important
  • How does this feature help the product?

Competitor analysis reference

  • Do our competitors have this feature?
    • Yes, they have it and we can have some reference
    • No, it's unique or explicit to our product
    • No idea. Need to check

Feature usage (do we have numbers/data?)

  • We have no data - the feature doesn’t exist anywhere
  • Related data - the feature doesn’t exist but we have info about the usage of associated features that can help us
    • Please list all related data usage information
  • We have the numbers and can relate to them
    • Please list all related data usage information

Feature availability (why should/shouldn't it live inside the UI/API?)

  • Please describe the reasoning behind why it should/shouldn't live inside the UI/API
  • If it's for a specific customer we should consider using AMS
  • Does this feature exist in the UI of other installers?

In order to allow declarative association of a host with a cluster in the late binding flow, annotation support will be added to the BareMetalHost (BMH).

The annotation is
bmac.agent-install.openshift.io/cluster-reference.
The annotation value is a JSON-encoded string containing a dictionary with the keys [name, namespace] that correspond to the name and namespace of the cluster deployment.
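
A minimal sketch of constructing that annotation value in Go; the ClusterDeployment name and namespace below are placeholders:

package main

import (
	"encoding/json"
	"fmt"
)

// clusterReference mirrors the JSON dictionary expected in the annotation value.
type clusterReference struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

func main() {
	const annotationKey = "bmac.agent-install.openshift.io/cluster-reference"

	value, err := json.Marshal(clusterReference{
		Name:      "my-cluster",    // placeholder ClusterDeployment name
		Namespace: "my-cluster-ns", // placeholder ClusterDeployment namespace
	})
	if err != nil {
		panic(err)
	}

	// This is the annotation to set on the BareMetalHost.
	fmt.Printf("%s: %s\n", annotationKey, value)
}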

Feature goal (what are we trying to solve here?)

Please describe what this feature is going to do.

DoD (Definition of Done)

Please describe what conditions must be met in order to mark this feature as "done".

Does it need documentation support?

If the answer is "yes", please make sure to check the corresponding option.

Feature origin (who asked for this feature?)

  • A Customer asked for it

    • Name of the customer(s)
    • How many customers asked for it?
    • Can we have a follow-up meeting with the customer(s)?

 

  • A solution architect asked for it

    • Name of the solution architect and contact details
    • How many solution architects asked for it?
    • Can we have a follow-up meeting with the solution architect(s)?

 

  • Internal request

    • Who asked for it?

 

  • Catching up with OpenShift

Reasoning (why it's important?)

  • Please describe why this feature is important
  • How does this feature help the product?

Competitor analysis reference

  • Do our competitors have this feature?
    • Yes, they have it and we can have some reference
    • No, it's unique or explicit to our product
    • No idea. Need to check

Feature usage (do we have numbers/data?)

  • We have no data - the feature doesn't exist anywhere
  • Related data - the feature doesn't exist but we have info about the usage of associated features that can help us
    • Please list all related data usage information
  • We have the numbers and can relate to them
    • Please list all related data usage information

Feature availability (why should/shouldn't it live inside the UI/API?)

  • Please describe the reasoning behind why it should/shouldn't live inside the UI/API
  • If it's for a specific customer we should consider using AMS
  • Does this feature exist in the UI of other installers?

Manage the effort for adding jobs for release-ocm-2.10 on assisted installer

https://docs.google.com/document/d/1WXRr_-HZkVrwbXBFo4gGhHUDhSO4-VgOPHKod1RMKng

 

Merge order:

  1. Add temporary image streams for Assisted Installer migration - day before (make sure images were created)
  2. Add Assisted Installer fast forwards for ocm-2.x release <depends on #1> - need approval from test-platform team at https://coreos.slack.com/archives/CBN38N3MW 
  3. Branch-out assisted-installer components for ACM 2.(x-1) - <depends on #1, #2> - At the day of the FF
  4. Prevent merging into release-ocm-2.x - <depends on #3> - At the day of the FF
  5. Update BUNDLE_CHANNELS to ocm-2.x on master - <depends on #3> - At the day of the FF
  6. ClusterServiceVersion for release 2.(x-1) branch references "latest" tag <depends on #5> - After  #5
  7. Update external components to AI 2.x <depends on #3> - After a week, if there are no issues update external branches
  8. Remove unused jobs - after 2 weeks

 

There were several issues found at customer sites concerning connectivity checks:

  • On the none platform, there is connectivity to only a subset of the network interfaces.
  • On the none platform, we need to sync the node-ip and the kubelet-ip.
  • We need to handle virtual interfaces, such as bridges, better. This calls for discovering whether an IP is external or internal in order to decide if the interface needs to be included in the connectivity check.

Currently, when computing L3 connectivity groups, there must be symmetrical connectivity between all nodes. This imposes the restriction that nodes cannot have networks that do not participate in cluster networking.

Instead, the L3 connectivity check will change to use connected addresses. A connected address is an address that can be reached from all other hosts. So if a host has a connected address, it is regarded as a host that belongs to the majority group (the validation).
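
An illustrative sketch of the "connected address" idea (names and data layout are assumptions, not the assisted-service implementation): an address counts as connected when every other host can reach it, and a host with at least one connected address belongs to the majority group.

package main

import "fmt"

// reachability[src][addr] == true means host src can reach address addr.
type reachability map[string]map[string]bool

// connectedAddresses returns, per host, the addresses that every other host can reach.
func connectedAddresses(hostAddrs map[string][]string, reach reachability) map[string][]string {
	out := map[string][]string{}
	for host, addrs := range hostAddrs {
		for _, addr := range addrs {
			connected := true
			for other := range hostAddrs {
				if other == host {
					continue
				}
				if !reach[other][addr] {
					connected = false
					break
				}
			}
			if connected {
				out[host] = append(out[host], addr)
			}
		}
	}
	return out
}

func main() {
	hostAddrs := map[string][]string{
		"host-a": {"10.0.0.1", "192.168.5.1"}, // 192.168.5.x does not participate in cluster networking
		"host-b": {"10.0.0.2"},
		"host-c": {"10.0.0.3"},
	}
	reach := reachability{
		"host-a": {"10.0.0.2": true, "10.0.0.3": true},
		"host-b": {"10.0.0.1": true, "10.0.0.3": true},
		"host-c": {"10.0.0.1": true, "10.0.0.2": true},
	}
	for host, addrs := range connectedAddresses(hostAddrs, reach) {
		// Any host with at least one connected address passes the majority-group validation.
		fmt.Println(host, "connected addresses:", addrs)
	}
}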

Epic Goal

  • Set installation timeout to 24 hours (tokens expiration)
  • Warn user when installation part takes longer than reasonable amount of time
  • If installation is recoverable enable the user to fix the issue and complete the installation

Why is this important?

Scenarios

I created a SNO cluster through the SaaS. A minor issue prevented one of the ClusterOperators from reporting "Available = true", and after one hour, the install process hit a timeout and was marked as failed.

When I found the failed install, I was able to easily resolve the issue and get both the ClusterOperator and ClusterVersion to report "Available = true", but the SaaS no longer cared; as far as it was concerned, the installation failed and was permanently marked as such. It also would not give me the kubeadmin password, which is an important feature of the install experience and tricky to obtain otherwise.

A hard timeout can cause more harm than good, especially when applied to a system (openshift) that continuously tries to get to desired state without an absolute concept of an operation succeeding or failing; desired state just hasn't been achieved yet.

We should consider softening this timeout to be a warning that installation hasn't achieved completion as quickly as expected, without actively preventing a successful outcome.

Late binding scenario (kube-api):
A user tries to install a cluster with the late binding feature enabled (deleting the cluster will return the hosts to the InfraEnv), the installation times out and the cluster goes into an error state, and then the user connects to the cluster and fixes the issue.
AI will still think that there is an error in the cluster. If the user tries to perform day-2 operations on a cluster in the error state it will fail; the only option is to delete the cluster and create another one that is marked as installed, but that will cause the hosts to boot from the discovery ISO.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ...

Open questions::

  1. ...

1. Proposed title of this feature request

A) Getting VPAs metrics in Openshift's Prometheus/kube-state-metrics

2. Who is the customer behind the request?

Account: name: xxx

TAM customer: no

CSM customer: no

Strategic: no

3. What is the nature and description of the request?

A) Feature request so that the data in the VPA is available in Prometheus / kube-state-metrics

4. Why does the customer need this? (List the business requirements here)

A) To view the data related to VPA

7. Is there already an existing RFE upstream or in Red Hat Bugzilla?

A) No

8. Does the customer have any specific time-line dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?

A) No

9. Is the sales team involved in this request and do they have any additional input?

A) No

10. List any affected packages or components.

11. Would the customer be able to assist in testing this functionality if implemented?

A) Yes

Epic Goal

  • Scrape Profiles was introduced as Tech Preview in 4.13; the goal is now to promote it to GA
  • Scrape Profiles Enhancement Proposal should be merged
  • OpenShift developers that want to adopt the feature should have the necessary tooling and documentation on how to do so
  • OpenShift CI should validate if possible changes in profiles that might break a profile or cluster functionality

This has no link to a planning session, as this predates our Epic workflow definition.

Why is this important?

  • Enables users to minimize the resource overhead for Monitoring.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://issues.redhat.com/browse/MON-2483

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Our documentation suggests creating an alert after configuring scrape sample limits.

That PrometheusRule object has two alerts configured within it [1]

`ApproachingEnforcedSamplesLimit` 

`TargetDown` 

The `Targetdown` alert is designed to fire after the `ApproachingEnforcedSamplesLimit` because the target is dropped once the enforced sample limit is reached

The TargetDown alert is creating false positives - it fires for reasons other than pods in the namespace having reached their enforced sample limit (e.g. the metrics endpoint may be down).

User-defined monitoring should provide out-of-the-box metrics that will help with troubleshooting:

  • Update Prometheus user-workload to enable additional scrape metrics [2]
  • Rewrite the ApproachingEnforcedSamplesLimit alert expression in the OCP documentation to something like "(scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0)) > 0.9" (which reads as "alert when the number of ingested samples reaches 90% of the configured limit").
  • Document how a user would know that a target has hit the limit (e.g. the Targets page should have the information).

[1] - https://docs.openshift.com/container-platform/4.12/monitoring/configuring-the-monitoring-stack.html#creating-scrape-sample-alerts_configuring-the-monitoring-stack 

[2] - https://prometheus.io/docs/prometheus/latest/feature_flags/#extra-scrape-metrics

Epic Goal

  • Make sure that some resources (statefulsets, especially those whose Pods take time to restart, for example Prometheus) are not unintentionally recreated to avoid downtime.

Why is this important?

  • Avoid downtime.
  • Avoid cases where Kubernetes cannot correctly handle the recreation of the resource, e.g. all of the resource's pods getting stuck.
  • prometheus-operator uses foreground deletion when an immutable field of a statefulset (for example) is modified; see https://issues.redhat.com/browse/OCPBUGS-17346 where matchLabels was modified.

Scenarios

Acceptance Criteria

  • We have a test (origin?) that makes sure of this during upgrades.
  • It should be easy to temporarily disable the test in case we cannot avoid a recreation during an upgrade.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

  1. I thought about adding an origin test during upgrades, we had a discussion with Simon on https://issues.redhat.com/browse/OCPBUGS-17346?focusedId=22743037&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-22743037 (about the maintainability of such tests)

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Make sure the statefulsets are not recreated after upgrade.

 

Recreating a statefulset should be an exception.

Epic Goal

  • Replace the "oauth-proxy" currently in use by monitoring components (Prometheus, Alert Manager, Thanos) with "kube-rbac-proxy" to streamline authentication and authorization.

Why is this important?

  • kube-rbac-proxy offers unified and fine-grained (different authorization for different paths) configuration for performing authentication and authorization on behalf of Kubernetes workloads, ensuring tight security measures around service endpoints.
  • mTLS Implementation: Unlike oauth-proxy, kube-rbac-proxy is capable of implementing mutual TLS (mTLS), providing enhanced security through both client- and server-side validation.
  • Potential improvements in performance and resource consumption by skipping the authentication request (TokenReview) or authorization request (SubjectAccessReview) in Kubernetes.

Scenarios

  1. Prometheus endpoints are secured using kube-rbac-proxy without any loss of data or functionality.
  2. Alert Manager endpoints are secured using kube-rbac-proxy without any loss of data or functionality.
  3. Thanos endpoints are secured using kube-rbac-proxy without any loss of data or functionality.

 

Acceptance Criteria

  • All monitoring components interact successfully with kube-rbac-proxy.
  • CI - MUST be running successfully with tests automated.
  • No regressions in monitoring functionality post-migration.
  • Documentation is updated to reflect the changes in authentication and authorization mechanisms.

Dependencies (internal and external)

 

Previous Work (Optional):

https://github.com/rhobs/handbook/pull/59/files

https://github.com/openshift/cluster-monitoring-operator/pull/1631

https://github.com/openshift/origin/pull/27031

https://github.com/openshift/cluster-monitoring-operator/pull/1580

https://github.com/openshift/cluster-monitoring-operator/pull/1552

 

Related Tickets:

Require read-only access to Alertmanager in developer view. 

https://issues.redhat.com/browse/RFE-4125

Common user should not see alerts in UWM. 

https://issues.redhat.com/browse/OCPBUGS-17850

Related ServiceAccounts.

https://docs.google.com/spreadsheets/d/1CIgF9dN4ynu-E7FLq0uBcoZPqOxwas20/edit?usp=sharing&ouid=108370831581113620824&rtpof=true&sd=true

Interconnection diagram in monitoring stack.

https://docs.google.com/drawings/d/16TOFOZZLuawXMQkWl3T9uV2cDT6btqcaAwtp51dtS9A/edit?usp=sharing

 

Open questions:

None.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

In CMO, Alertmanager pods in the openshift-monitoring namespace have an oauth-proxy on port 9095 for web access on all paths.

We are going to replace it with kube-rbac-proxy and constrain the access to /api/v2 paths.

The current behavior is to allow access to the Alertmanager web server for any user having "get" access to "namespace" resources. We do not have to keep the same logic, but we have to make sure no regression happens. We may need to use a stubbed custom resource to authorize both "post" and "get" HTTP requests from certain users.

There is a request to allow read-only access to alerts in Developer view. kube-rbac-proxy can facilitate this functionality.

In CMO, ThanosRuler pods have an oauth-proxy on port 9091 for web access on all paths.

We are going to replace it with kube-rbac-proxy and constrain the access to /api/v1 paths.

The current behavior is to allow access to the ThanosRuler web server for any user having "get" access to "namespace" resources. We do not have to keep the same logic, but we have to make sure no regression happens. We may need to use a stub custom resource to authorize both "post" and "get" HTTP requests from certain users.

 

In CMO, Prometheus pods have an oauth-proxy on port 9091 for web access on all paths.

We are going to replace it with kube-rbac-proxy and constrain the access to /api/v1 paths.

The current behavior is to allow access to the Prometheus web server for any user having "get" access to "namespace" resources. We do not have to keep the same logic, but we have to make sure no regression happens. We may need to use a stub custom resource to authorize both "post" and "get" HTTP requests from certain users.

 

The Insights component is using this port; figure out how to keep its access after replacing the oauth-proxy: https://github.com/openshift/insights-operator/blob/master/pkg/controller/const.go

Its service accounts "gather" and "operator" should use the Prometheus endpoint. https://redhat-internal.slack.com/archives/CLABA9CHY/p1701345127689009

 

 

 

$ oc get clusterrole,role -n openshift-monitoring -o name | egrep -i -e 'monitoring-(alertmanager|rules)-(edit|view)' -e cluster-monitoring-view
clusterrole.rbac.authorization.k8s.io/cluster-monitoring-view
clusterrole.rbac.authorization.k8s.io/monitoring-rules-edit
clusterrole.rbac.authorization.k8s.io/monitoring-rules-view
clusterrole.rbac.authorization.k8s.io/openshift-cluster-monitoring-view
role.rbac.authorization.k8s.io/monitoring-alertmanager-edit

 
We have cluster roles for viewing metrics and editing alerts/silences, but not a local role for viewing all.

I have a customer requesting read-only access to alerts in the developer console.

Epic Goal

  • Graduate MetricsServer FeatureGate to GA

Why is this important?

  • For autoscaling, OpenShift needs a resource metrics implementation. Currently this depends on CMO's Prometheus stack.
  • Some users would like to opt out of running a fully fledged Prometheus stack, see https://issues.redhat.com/browse/MON-3152

Scenarios

  1. A cluster admin decides to not deploy a full Monitoring stack, autoscaling must still work.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Follow FeatureGate Guidelines
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. TechPreview completed in MON-3153

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

openshift/origin e2e tests don't explicitly exercise the metrics.k8s.io API group (they do verify the HPA functionality, though, which uses the metrics API underneath). We should add tests under test/extended/prometheus proving that the API works before/after upgrade.

It will help show that the prometheus-adapter -> metrics-server migration happens seamlessly when we turn the feature gate on by default.
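
A minimal, standalone sketch (not the origin test itself), assuming an admin kubeconfig and the k8s.io/metrics client, of exercising the metrics.k8s.io API:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
	metricsclient "k8s.io/metrics/pkg/client/clientset/versioned"
)

func main() {
	// Kubeconfig path is a placeholder.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/tmp/admin.kubeconfig")
	if err != nil {
		panic(err)
	}
	mc, err := metricsclient.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Listing NodeMetrics proves the metrics.k8s.io group is served,
	// regardless of whether prometheus-adapter or metrics-server backs it.
	nodes, err := mc.MetricsV1beta1().NodeMetricses().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		fmt.Println(n.Name, "cpu:", n.Usage.Cpu().String(), "memory:", n.Usage.Memory().String())
	}
}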

Graduate MetricsServer FeatureGate to GA

In CMO we have static YAML assets that are added to the CMO container image and applied by CMO. We read them once from the file system and after that they are cached in memory.

Currently we use loose unmarshaling, i.e. fields that are superfluous (not part of the type being unmarshaled into) are silently dropped.

We should use strict unmarshaling in order to catch config mistakes like https://issues.redhat.com//browse/OCPBUGS-24630.
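
A minimal sketch of the difference, using sigs.k8s.io/yaml (the exact package CMO uses is not confirmed here); strict unmarshaling rejects the superfluous field instead of dropping it:

package main

import (
	"fmt"

	"sigs.k8s.io/yaml"
)

type asset struct {
	Name     string `json:"name"`
	Replicas int    `json:"replicas"`
}

func main() {
	// "replica" is a typo for "replicas" and would be silently dropped by a loose unmarshal.
	data := []byte("name: prometheus-k8s\nreplica: 2\n")

	var loose asset
	_ = yaml.Unmarshal(data, &loose)
	fmt.Printf("loose:  %+v (typo silently ignored)\n", loose)

	var strict asset
	if err := yaml.UnmarshalStrict(data, &strict); err != nil {
		fmt.Println("strict:", err) // surfaces the unknown field
	}
}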

 

Additionally, we could consider adding the static assets to the CMO binary via Go's embed.

Discussed, probably not worth it.


Epic Goal

  • Remove dependency on github.com/pkg/errors

Why is this important?

NOTE (see the sketch after this list):

  • github.com/pkg/errors.Wrap knows how to handle err=nil (it returns nil); with the standard library you'd need to handle that case separately.
  • https://github.com/xdg-go/go-rewrap-errors may help with the migration, but it isn't complete (it doesn't handle err=nil cases, for example).
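
A minimal sketch of the nil-handling difference when replacing errors.Wrapf with fmt.Errorf and %w; the helper name is hypothetical:

package main

import (
	"errors"
	"fmt"
)

// wrapf mimics github.com/pkg/errors.Wrapf's nil behavior with the standard library:
// it returns nil when err is nil instead of fabricating a new error.
func wrapf(err error, format string, args ...interface{}) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf(format+": %w", append(args, err)...)
}

func main() {
	base := errors.New("connection refused")
	fmt.Println(wrapf(base, "fetching node %q", "worker-0")) // fetching node "worker-0": connection refused
	fmt.Println(wrapf(nil, "fetching node %q", "worker-0"))  // <nil>
}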

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

Why is this important?

Scenarios
1. …

Acceptance Criteria

  • (Enter a list of Acceptance Criteria unique to the Epic)

Dependencies (internal and external)
1. …

Previous Work (Optional):
1. …

Open questions::
1. …

Done Checklist

  • CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
  • Release Enablement: <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - If the Epic is adding a new stream, downstream build attached to advisory: <link to errata>
  • QE - Test plans in Test Plan tracking software (e.g. Polarion, RQM, etc.): <link or reference to the Test Plan>
  • QE - Automated tests merged: <link or reference to automated tests>
  • QE - QE to verify documentation when testing
  • DOC - Downstream documentation merged: <link to meaningful PR>
  • All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.

Currently, various service IDs are created by the create-infra command to be used by various cluster operators; among these, the storage operator's service ID is exposed in the guest cluster, which should not happen. We need to reduce the scope of all the service IDs to be specific to the infra resources created for that cluster alone.

Currently a cloud connection is used to connect the Power VS DHCP private network and the VPC network in order to create a load balancer in the VPC for the ingress controller.

Cloud connections have a limitation of only 2 per zone.

With a Transit Gateway, 5 instances can be created in a zone, and it is also a lot faster than a cloud connection since it uses PER.

https://cloud.ibm.com/docs/power-iaas?topic=power-iaas-per

https://cloud.ibm.com/docs/transit-gateway?topic=transit-gateway-getting-started&interface=ui

 

Epic Goal

Why is this important?

Scenarios
1. …

Acceptance Criteria

  • (Enter a list of Acceptance Criteria unique to the Epic)

Dependencies (internal and external)
1. …

Previous Work (Optional):
1. …

Open questions::
1. …

Done Checklist

  • CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
  • Release Enablement: <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - If the Epic is adding a new stream, downstream build attached to advisory: <link to errata>
  • QE - Test plans in Test Plan tracking software (e.g. Polarion, RQM, etc.): <link or reference to the Test Plan>
  • QE - Automated tests merged: <link or reference to automated tests>
  • QE - QE to verify documentation when testing
  • DOC - Downstream documentation merged: <link to meaningful PR>
  • All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.

Description of the problem:

On s390x with DHCP, the user should not set a static IP for IPL nodes.
This leads to duplicate ip kernel arguments being passed to the coreos installer.

E.g.: --installer-args '[\"--append-karg\",\"ip=encbdd0:dhcp\",\"--append-karg\",\"rd.neednet=1\",\"--append-karg\",\"ip=10.14.6.3::10.14.6.1:255.255.255.0:master-0.boea3e06.lnxero1.boe:encbdd0:none\",\"--append-karg\",\"nameserver=10.14.6.1\",\"--append-karg\",\"ip=[fd00::3]::[fd00::1]:64::encbdd0:none\",\"--append-karg\",\"nameserver=[fd00::1]\"

How reproducible:

Create a cluster with DHCP and IPL node(s) using a parm line containing static IP settings (IPv4 and/or IPv6).

Steps to reproduce:

1.

2.

3.

Actual results:

The ip karg is set twice for the same device (DHCP and static IP).

Expected results:

If DHCP is used, then no static IP should be set in the parm line.

In the case of an LPAR installation, it's possible that the command systemd-detect-virt --vm returns an exit code < 0, even though running it from the command line inside the booted VM returns "none" without an error code.

This aborts further processing in system_vendor.go, and Manufacturer will not be set to IBM/S390.

To fix this glitch, the code that determines the Manufacturer should be moved before the call to the systemd-detect-virt --vm command.
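A minimal sketch of that ordering, assuming the manufacturer can be read from /proc/sysinfo on s390x; the actual system_vendor.go code and field names may differ:

package main

import (
	"fmt"
	"os"
	"strings"
)

// s390Manufacturer reads /proc/sysinfo (only present on s390x) and returns the
// Manufacturer value, e.g. "IBM". Running this before systemd-detect-virt --vm
// means an unexpected non-zero exit code from that command can no longer
// prevent the Manufacturer from being set.
func s390Manufacturer() (string, bool) {
	data, err := os.ReadFile("/proc/sysinfo")
	if err != nil {
		return "", false
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.HasPrefix(line, "Manufacturer:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "Manufacturer:")), true
		}
	}
	return "", false
}

func main() {
	if m, ok := s390Manufacturer(); ok {
		fmt.Println("manufacturer:", m)
		return
	}
	// Only now fall through to systemd-detect-virt --vm and the other checks.
	fmt.Println("not s390x; continue with virtualization detection")
}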

Epic Goal

  • Improve IPI on Power VS in the 4.15 cycle
    • Changes to the installer to handle edge cases, fix bugs, and improve usability.
    • Switch to only support PER enabled zones.

Running doc to describe terminologies and concepts which are specific to Power VS - https://docs.google.com/document/d/1Kgezv21VsixDyYcbfvxZxKNwszRK6GYKBiTTpEUubqw/edit?usp=sharing

Epic Goal

  • Improve IPI on Power VS in the 4.16 cycle
    • Switch to CAPI provisioned bootstrap and control plane resources

Epic Goal

  • The goal of this epic is to continue work on CI job delivery, some bug fixes related to multi-path, and some updates to the assisted-image service.
  • With 4.16 we want to support LPAR on s390x
  • LPAR comes in two flavors: LPAR classic (iPXE boot) and LPAR DPM (ISO and iPXE boot)

On s390x there are some network configurations where MAC addresses are not static, which leads to issues when using the Assisted Installer, Agent-based Installer, and HCP (see attached net conf).

For AI and HCP it is possible to patch kernel arguments, but with the UI a separate manual step via the API is needed. This is a bad user experience.
In addition, patching the kernel arguments is not possible for ABI.

To solve this, a config override parameter needs to be added to the parm file by the user, and the IP settings will then be passed automatically to the coreos installer regardless of what the user configures (DHCP or static IP using nmstate).

New configuration for parm file in cases marked red in the network configuration matrix.

The new parameter will be ai.ip_cfg_override and can be 0 or 1. When set to 1, the network configuration will be taken from the parm file regardless of what the user configures, for example via the Assisted Installer WebUI. This issue affects the Agent-based Installer and HCP.
The parameters for IP configuration will be "ip" and "nameserver". 
Example of the parm line looks like:
rd.neednet=1 ai.ip_cfg_override=1 console=ttysclp0 coreos.live.rootfs_url=http://172.23.236.156:8080/assisted-installer/rootfs.img ip=10.14.6.3::10.14.6.1:255.255.255.0:master-0.boea3e06.lnxero1.boe:encbdd0:none nameserver=10.14.6.1 ip=[fd00::3]::[fd00::1]:64::encbdd0:none nameserver=[fd00::1] zfcp.allow_lun_scan=0 rd.znet=qeth,0.0.bdd0,0.0.bdd1,0.0.bdd2,layer2=1 rd.zfcp=0.0.8002,0x500507630400d1e3,0x4000404600000000 random.trust_cpu=on rd.luks.options=discard ignition.firstboot ignition.platform.id=metal console=tty1 console=ttyS1,115200n8

Template:

Networking Definition of Planned

Epic Template descriptions and documentation

Epic Goal

Bump OpenShift Router's HAProxy from 2.6 to 2.8.

Why is this important?

As a cluster administrator, I want OpenShift to include a recent HAProxy version, so that I have the latest available performance and security fixes.  

Planning Done Checklist

The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

  • Priority is set by engineering
  • Epic must be linked to a Parent Feature
  • Target version must be set
  • Assignee must be set
  • Enhancement Proposal is Implementable
  • No outstanding questions about major work breakdown
  • Are all Stakeholders known? Have they all been notified about this item?
  • Does this epic affect SD? Have they been notified? (View plan definition for current suggested assignee)
    1. Please use the “Discussion Needed: Service Delivery Architecture Overview” checkbox to facilitate the conversation with SD Architects. The SD architecture team monitors this checkbox which should then spur the conversation between SD and epic stakeholders. Once the conversation has occurred, uncheck the “Discussion Needed: Service Delivery Architecture Overview” checkbox and record the outcome of the discussion in the epic description here.
    2. The guidance here is that unless it is very clear that your epic doesn’t have any managed services impact, default to use the Discussion Needed checkbox to facilitate that conversation.

Additional information on each of the above items can be found here: Networking Definition of Planned

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement
    details and documents.
  • OpenShift Router is running HAProxy 2.8
  • Perf & Scale analysis is complete
  • Review of critical/major bug fixes between HAProxy 2.6 and 2.8

...

Dependencies (internal and external)

1. Perf & Scale

...

Previous Work (Optional):

1. …

Open questions::

1. …

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Template:

Networking Definition of Planned

Epic Template descriptions and documentation

Epic Goal

Bump the openshift-router container image to RHEL9 with help from the ART Team.

Why is this important?

OpenShift is transitioning to RHEL9 base images and we must move to RHEL9 by 4.16. RHEL9 introduces OpenSSL 3.0, which has shown performance degradation that needs to be investigated (see the acceptance criteria below).

Planning Done Checklist

The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

  • Priority is set by engineering
  • Epic must be linked to a Parent Feature
  • Target version must be set
  • Assignee must be set
  • Enhancement Proposal is Implementable
  • No outstanding questions about major work breakdown
  • Are all Stakeholders known? Have they all been notified about this item?
  • Does this epic affect SD? Have they been notified? (View plan definition for current suggested assignee)
    1. Please use the “Discussion Needed: Service Delivery Architecture Overview” checkbox to facilitate the conversation with SD Architects. The SD architecture team monitors this checkbox which should then spur the conversation between SD and epic stakeholders. Once the conversation has occurred, uncheck the “Discussion Needed: Service Delivery Architecture Overview” checkbox and record the outcome of the discussion in the epic description here.
    2. The guidance here is that unless it is very clear that your epic doesn’t have any managed services impact, default to use the Discussion Needed checkbox to facilitate that conversation.

Additional information on each of the above items can be found here: Networking Definition of Planned

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement
    details and documents.
  • The openshift-router image is based on RHEL9
  • Investigation into the RHEL9 performance degradation is complete

...

Dependencies (internal and external)

  1. Perf & Scale
  2. ART Team

...

Previous Work (Optional):

  1. ART Issue: https://issues.redhat.com/browse/ART-8359
  2. RHEL9 Router Smoke Test: https://github.com/openshift/router/pull/538 
  3. RHEL9 ART Slack Thread

Open questions::

1. …

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

The origin test "when FIPS is disabled the HAProxy router should serve routes when configured with a 1024-bit RSA key" is failing due to a hardcoded certificate using SHA1 as the hash algorithm.

The certificate needs to be updated to use SHA256, as SHA1 isn't supported anymore.

More details: https://github.com/openshift/router/pull/538#issuecomment-1831925290 
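For illustration, a minimal sketch in Go of regenerating such a fixture with a SHA256 signature (this is not the actual origin test fixture or its tooling; it only shows the relevant x509 settings):

package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"math/big"
	"os"
	"time"
)

func main() {
	// Deliberately 1024-bit, because the test exercises 1024-bit RSA keys.
	key, err := rsa.GenerateKey(rand.Reader, 1024)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:       big.NewInt(1),
		Subject:            pkix.Name{CommonName: "example.test"},
		NotBefore:          time.Now(),
		NotAfter:           time.Now().AddDate(10, 0, 0),
		KeyUsage:           x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:        []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		SignatureAlgorithm: x509.SHA256WithRSA, // SHA256 instead of the rejected SHA1
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE", Bytes: der})
}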

Work with the ART team to have https://github.com/openshift-eng/ocp-build-data/pull/3895 merged, verify that everything is working appropriately, and ensure we keep our openshift-router CI jobs up-to-date by using RHEL9, since we disabled automatic base image definition from ART.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

Why is this important?

  • In 1.29, Kube flipped the DisableKubeletCloudCredentialProviders to true by default, this broke our rebase tests as the kubelet could no longer pull images from GCR
  • To mitigate this, we flipped the flag back to false
  • We must revert the flip before the feature is GA'd upstream
  • The cloud provider authentication providers (eg on GCP) become dependencies for kubelet and must be configured via flags
  • As an example on GCP
    • We need to build the provider and ship it as an RPM (perhaps in the kubelet RPM? Can RPMs have dependency RPMs?)
    • The RPM should place the binary into a well known location on disk
    • We then need to create a configuration file and set the correct flags on Kubelet based on this configuration

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

Setting up the distgit (for production brew builds) depends on the ART team. This should be tackled early.

 

The PR to ocp-build-data should also be prioritised, as it blocks the PR to openshift/os. There is a separate CI Mirror used to run CI for openshift/os in order to merge, which can take a day to sync. 

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Background

To enable the use of the new Azure and GCP image credential providers, we need to enable the DisableKubeletCloudCredentialProviders feature gate in all cluster profiles.

Steps

  • Enable the feature gate
  • Verify that it doesn't affect HyperShift

Stakeholders

  • Cluster Infra
  • Maciej - wants this done before 4.16 closes out
  • HyperShift

Definition of Done

  • <Add items that need to be completed for this card>
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

User Story

As a user I want kubelet to know how to authenticate with ACR automatically so that I don't have to roll credentials every 12h

Background

This functionality is being removed in tree from the kubelet, so we now need to provide it via a credential provider plugin

Before this can be completed, we will need to create and ship an rpm within RHCOS to provide the binary kubelet will exec.

Steps

See https://github.com/openshift/machine-config-operator/pull/4103/files for an example PR

Stakeholders

  • cluster-infra team
  • workloads team

Definition of Done

  • MCO sets --image-credential-provider-config and --image-credential-provider-bin-dir for azure
  • credential provider config exists on azure master and worker nodes
  • Tests updated to reflect the above changes
  • Docs
  • Add release note notifying of the change from in tree kubelet to an external process
  • Testing
  • Set up private registry on ACR
  • Set up a new OCP cluster and check that it can pull from the registry

User Story

As an openshift maintainer I want our build tooling to produce the gcr credential provider plugin so that it can be distributed in RHCOS to be used by kubelet.

Background

We need to ship the gcr credential provider via an rpm, so it is available to kubelet when it first starts.

To ship an rpm we must create a .spec file that provides information on how the package should be built

 

A working example for AWS is provided in this PR: https://github.com/openshift/cloud-provider-aws/pull/63

Steps

  • Copy the .spec file from the AWS PR
  • Set up rpmbuild and rpmlint locally (probably running in a container). This is a good introduction to the process.
  • Get the .spec file building locally: likely by calling make <target> in the %build section (you'll need to get a tarball for the source, and move it into `SOURCES`)
  • PR .spec to the root of `cloud-provider-gcp`

Stakeholders

  • cluster-infra
  • workloads team

Definition of Done

  • .spec building the gcp gcr credential plugin locally
  • PR to `cloud-provider-gcp`
  • Docs
  • N.A
  • Testing
  • N.A

User Story

As an openshift maintainer I want our build tooling to produce the acr credential provider plugin so that it can be distributed in RHCOS to be used by kubelet.

Background

We need to ship the acr credential provider via an rpm, so it is available to kubelet when it first starts.

To ship an rpm we must create a .spec file that provides information on how the package should be built

 

A working example for AWS is provided in this PR: https://github.com/openshift/cloud-provider-aws/pull/63

Steps

  • Copy the .spec file from the AWS PR
  • Set up rpmbuild and rpmlint locally (probably running in a container). This is a good introduction to the process.
  • Get the .spec file building locally: likely by calling make <target> in the %build section (you'll need to get a tarball for the source, and move it into `SOURCES`)
  • PR .spec to the root of `cloud-provider-azure`

Stakeholders

  • cluster-infra
  • workloads team

Definition of Done

  • .spec building the azure acr credential plugin locally
  • PR to `cloud-provider-azure`
  • Docs
  • N.A
  • Testing
  • N.A

User Story

As a user I want kubelet to know how to authenticate with GCR automatically so that I don't have to roll credentials every 12h

Background

This functionality is being removed in tree from the kubelet, so we now need to provide it via a credential provider plugin

Before this can be completed, we will need to create and ship an rpm within RHCOS to provide the binary kubelet will exec.

Steps

See https://github.com/openshift/machine-config-operator/pull/4103/files for an example PR

Stakeholders

  • cluster-infra team
  • workloads team

Definition of Done

  • MCO sets --image-credential-provider-config and --image-credential-provider-bin-dir for gcp
  • credential provider config exists on gcp master and worker nodes
  • Tests updated to reflect the above changes
  • Docs
  • Add release note notifying of the change from in tree kubelet to an external process
  • Testing
  • Set up private registry on GCR
  • Set up a new OCP cluster and check that it can pull from the registry

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Review, revise, and fix all SNO related CI such that the pass rates for all jobs are above 95%.

Why is this important?

  • Stabilizing SNO CI is a key requirement to developing a stable baseline of testing for SNO before completion of the Verizon Upgrade Project.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • All SNO CI jobs must be at 95% pass rate or better.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1.  

Open questions:

  1.  

Done Checklist

  • CI - CI is running, tests are automated and merged.

Goal

  • The goal of this epic is to adjust HighOverallControlPlaneCPU alert thresholds when Workload Partitioning is enabled.

Why is this important?

  • On SNO clusters this might lead to false positives. It also makes sense to have such a mechanism because the alert currently assumes all available CPUs are for the control plane, while the user can allow fewer cores to be used for it.

Scenarios

  1. As a user I want to enable workload partitioning and have my alert values adjusted accordingly

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement
  • ...

Open questions:

  1. Do we want to bump the alert threshold for SNO clusters because they run workloads on master nodes rather than worker nodes?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Create a separate SNO alert with the templating engine to adjust alerting rules based on the workload partitioning mechanism

Here is our overall tech debt backlog: ODC-6711

See included tickets, we want to clean up in 4.16.

Problem:

Future PatternFly upgrades shouldn't break our integration tests. Based on the upgrade to PF 5 we should try to avoid classnames in our tests.

Goal:

Remove all usage of pf-* classname selectors in our test cases.

As a developer, I want to go through the e2e tests and remove PatternFly classnames being used as selectors so that future PatternFly upgrades do not break integration tests.

AC:

  • discover which e2e tests exist that call out PatternFly classnames as selectors (search for '.pf')
  • remove and replace any PatternFly classnames from `frontend/packages/topology/integration-tests`
     

As a developer, I want to go through the e2e tests and remove PatternFly classnames being used as selectors so that future PatternFly upgrades do not break integration tests.

AC:

  • discover which e2e tests exist that call out PatternFly classnames as selectors (search for '.pf')
  • remove and replace any PatternFly classnames from `frontend/packages/helm-plugin/integration-tests`
     

As a developer, I want to go through the e2e tests and remove PatternFly classnames being used as selectors so that future PatternFly upgrades do not break integration tests.

AC:

  • discover which e2e tests exist that call out PatternFly classnames as selectors (search for '.pf')
  • remove and replace any PatternFly classnames from `frontend/packages/dev-console/integration-tests`
     

As a developer, I want to go through the e2e tests and remove PatternFly classnames being used as selectors so that future PatternFly upgrades do not break integration tests.

AC:

  • discover which e2e tests exist that call out PatternFly classnames as selectors (search for '.pf')
  • remove and replace any PatternFly classnames from `frontend/packages/webterminal-plugin/integration-tests`
     

As a developer, I want to go through the e2e tests and remove PatternFly classnames being used as selectors so that future PatternFly upgrades do not break integration tests.

AC:

  • discover which e2e tests exist that call out PatternFly classnames as selectors (search for '.pf')
  • remove and replace any PatternFly classnames from `frontend/packages/knative-plugin/integration-tests`
     

As a developer, I want to go through the e2e tests and remove PatternFly classnames being used as selectors so that future PatternFly upgrades do not break integration tests.

AC:

  • discover which e2e tests exist that call out PatternFly classnames as selectors (search for '.pf')
  • remove and replace any PatternFly classnames from `frontend/packages/shipwright-plugin/integration-tests`
     

As a developer, I want to go through the e2e tests and remove PatternFly classnames being used as selectors so that future PatternFly upgrades do not break integration tests.

AC:

  • discover which e2e tests exist that call out PatternFly classnames as selectors (search for '.pf')
  • remove and replace any PatternFly classnames from `frontend/packages/pipelines-plugin/integration-tests`
     

Problem:

Increase and improve the automation health of ODC

Goal:

Why is it important?

Improve automation coverage for ODC

Use cases:

  1. <case>

Acceptance criteria:

  1. Increase automation coverage for devconsole package
  2. Improve tests in devconsole, helm, knative, pipelines, topology, and shipwright packages

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description

Fix pipeline feature files in the pipeline plugin
As a user,

Acceptance Criteria

  1. All tests should pass in pipeline feature files in the pipeline plugin

Additional Details:

Description

Automating the Recently Used Resources Section in the Search Resource page Dropdown
As a user,

Acceptance Criteria

  1. Automating the ACs for epic ODC-7480

    Additional Details:

Problem:

Increase CI coverage for different packages in ODC

Goal:

Why is it important?

Use cases:

  1. <case>

Acceptance criteria:

  1. Increase tests in pre-merge CI for different packages of ODC
  2. Improve CI health

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description

We released 4.15 with broken quick starts, see OCPBUGS-29992

We must ensure that our main features are covered by e2e tests.

Acceptance Criteria

  1. e2e that navigates to the QuickStart page from the main navigation
  2. Ensures that the quick start catalog is shown with at least X items
  3. e2e test that opens a quick start and navigates through it (pressing Next n times) to see that it is working
  4. Test closing the sidebar, etc.

Problem:

Service Binding Operator will reach its end of life, and we want customers to be informed about that change. The first step for this is showing deprecation messages if customers use SBO.

Acceptance criteria:

  1. Show deprecation warnings to customers that use SBO and service bindings; this could include these places:
    1. Add page / flow
    2. Creating a SB yaml
    3. SB list page
    4. Topology when creating a SB / bind a component
    5. Topology if we found any SB in the current namespace?
  2. Provide an alternative (TBD!)

Note:

This only affects Service Bindings in our UI and not the general Operator based bindings.

Google Doc - End of Life for Service Binding Operator

Description of problem:

Since the Service Binding Operator is deprecated and will be removed with the OpenShift Container Platform 4.16 release, users should be notified about this in the console in the below pages

1. Add page / flow
2. Creating a SB yaml
3. SB list page
4. Topology when creating a SB / bind a component
5. Topology if we found any SB in the current namespace?  

Note: 

Confirm the warning text with UX.

Additional info:

https://docs.openshift.com/container-platform/4.15/release_notes/ocp-4-15-release-notes.html#ocp-4-15-deprecation-sbo

https://docs.google.com/document/d/1_L05xy7ZSK2xCLiqrrJBPwDoahmi78Ox6mw-l-IIT9M/edit#heading=h.jcsa7gh4tupt    

Some of the work done to produce a build for arm64 and to produce custom builds in https://github.com/okd-project/okd-centos9-rebuild required Dockerfiles and similar assets from the cluster operators repositories to be forked. 

 

This story is to track the eventual backport that should be achieved soon to get rid of most of the forks in the repo by merging the "upstream".

Description

As an engineer, I want to know how OLS is being used, so that I can know what to focus on and improve it.

Acceptance Criteria

  • It should be separate from the feedback system
    • Or should we have a “thumbs up/down” metric to telemetry? Hard to make use of that value without context though.
  • It should enable cluster monitoring scraping of our namespace
    • Only valid for openshift-* NSes? Probably use openshift-lightspeed as our NS
  • It should pick+compute set of metrics
    • Response time
    • Token counts
    • Total requests/requests per minute/hour
    • Conversation length (how many follow up questions are asked)
  • It should expose+secure(tls) metrics endpoint
  • It should whitelist metrics to send to telemetry

Description

As an OLS product owner, I want metric data about OLS usage to be reported to RH's telemetry system so I can see how users are making use of OLS and potentially identify issues

How to add a metric to telemetry:
https://rhobs-handbook.netlify.app/products/openshiftmonitoring/telemetry.md/

Acceptance Criteria

  • The subset of metrics OLS reports that we care about sending back to RH, are in the telemetry whitelist so that openshift forwards them from the cluster to RH
  • The metric data is visible in RH's telemetry systems (https://telemeter-lts.datahub.redhat.com/graph)

Notes:

  • OLM has these metrics exported, such as CSV-related metrics
  • alerts are sent to telemeter automatically
  • metrics generated from recording rules can be sent, too
  • when creating PRs, we can send one metric per PR for easy review by the monitoring team

 

 

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

When the console dashboards plugin is present, the metrics tab does not respect a custom datasource.

Versions

console dashboards plugin: 0.1.0

Openshift: 4.16

Steps to reproduce

  1. A graph showing metrics from a custom datasource
  2. Click on "inspect" in a panel
  3. It shows data from a different instance from an entirely different datasource
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Background

Currently the alert details page supports adding links through the plugin extension; we also need buttons to add actions that do not redirect but trigger code, for example triggering the troubleshooting panel.

Outcomes

The monitoring plugin renders buttons and links provided by the actions plugin extension 

Description

Provide the ability to export data in a CSV format from the various Observability pages in the OpenShift console.

 

Initially this will include exporting data from any tables that we use.

Goals & Outcomes

Product Requirements:

A user will have the ability to click a button which will download the data in the current table in CSV format. The user will then be able to take this downloaded file and import it into their system.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Based on the results of the spike as detailed in this document: https://docs.google.com/document/d/1EPEYd94NYS_LbFRFT4d1Mj9-ebibNYJYu_kHzwIwxaA/edit?usp=sharing we need to implement one of the suggested solutions for dashboards.


This story will focus on fixing timeout issues for label selectors and line charts. This will be done by breaking the long queries down into queries that can be answered within the timeout period.

Suggested steps are:

  • Perform query chunking on the panels based on time ranges. This is needed for label and line charts (see the sketch after this list).
    • Rather than wait for a query to time out, we should preemptively split the query into manageable chunks (1 day).
  • For the line charts, rather than replacing all of the data on a single fetch, the data should either replace its specific time period or all of the fetches should be aggregated before replacing everything.
    • For segment replacement, this requires retrieving all data for a panel, removing the new data's time period, adding in the new data, and then bounding the data to the full time range to prevent data sprawl if the webpage is left open for a long time.
    • Full replacement works the same way the system currently works, but requires waiting for the results of all the requests before combining them and replacing the current data. This is overall a simpler approach, but could leave blank areas in a chart if one or more of the queries time out where there was already data from a previous API call.
  • For the label selectors, the labels from each of the fetches should be combined before replacing the existing data.
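The chunking itself is simple; here is a language-agnostic sketch in Go (the console dashboards code is TypeScript, and the 1-day chunk size is just the value suggested above):

package main

import (
	"fmt"
	"time"
)

// chunkRange splits [start, end) into consecutive windows of at most chunkSize,
// so each window can be fetched independently and well within the query timeout.
func chunkRange(start, end time.Time, chunkSize time.Duration) [][2]time.Time {
	var chunks [][2]time.Time
	for cur := start; cur.Before(end); {
		next := cur.Add(chunkSize)
		if next.After(end) {
			next = end
		}
		chunks = append(chunks, [2]time.Time{cur, next})
		cur = next
	}
	return chunks
}

func main() {
	end := time.Now()
	start := end.Add(-3 * 24 * time.Hour)
	for _, c := range chunkRange(start, end, 24*time.Hour) {
		fmt.Println(c[0].Format(time.RFC3339), "->", c[1].Format(time.RFC3339))
	}
}

The results of the per-chunk fetches are then merged using either the segment-replacement or full-replacement strategy described above.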

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Any ERRORs produced by TuneD will result in Degraded TuneD Profiles. Clean up upstream and NTO/PPC-shipped TuneD profiles and add ways of limiting the ERROR message count.
  • Review the policy of restarting TuneD on errors every resync period.  See: OCPBUGS-11150

Why is this important?

  •  

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://issues.redhat.com/browse/PSAP-908

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>


User Story:
As a software developer, I need to be better informed about the process of contributing to NTO, so that people do not mistakenly contribute to https://github.com/openshift/cluster-node-tuning-operator/tree/master/assets/tuned vs to
https://github.com/redhat-performance/tuned

I also want to be able to create custom NTO images just by following HACKING.md file, which needs to be updated.

Acceptance criteria:

Additional information
There will likely be changes necessary in the NTO Makefile too.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Remark that:

  • The metric should only give the EC2 instance type for now. Proposed name: hypershift_nodepools_info.
  • In particular, we should not try to resolve the total number of cores for a given nodepool.
    • Knowing how many cores there are in a given EC2 instance is not something you want to hard-code in HyperShift or even have HyperShift deal with.
    • Resolving the total number of cores for a given EC2 instance type could be done through a recording rule built on top of this new metric.
  • Consider eventually reusing the hypershift_nodepools_available_replicas metric, unless you think adding a label giving the EC2 instance type would blur its semantics.

Slack thread:
https://redhat-internal.slack.com/archives/CCX9DB894/p1696515395395939

Acceptance criteria

  • A new metric gives the EC2 instance type for a given nodepool (see the sketch below)
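A minimal sketch of such an info-style metric, assuming prometheus/client_golang; only the metric name hypershift_nodepools_info comes from this card, and the label names are illustrative:

package metrics

import "github.com/prometheus/client_golang/prometheus"

// nodePoolInfo is an info-style metric: its value is always 1 and the useful
// data lives in the labels, so recording rules can join on it later.
var nodePoolInfo = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "hypershift_nodepools_info",
	Help: "Static information about a NodePool, such as its EC2 instance type.",
}, []string{"namespace", "name", "ec2_instance_type"})

func init() {
	prometheus.MustRegister(nodePoolInfo)
}

// RecordNodePool refreshes the info series for a single NodePool.
func RecordNodePool(namespace, name, instanceType string) {
	nodePoolInfo.WithLabelValues(namespace, name, instanceType).Set(1)
}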

Implement shared selector-based functionality for port groups and address sets, to be re-used by network policy, ANP, egress firewall, multicast and possibly other features in the future.

This is intended to

  • update port groups on pod creation (network policy is the main use case, so the pod is isolated before it is started)
  • share port groups and address set when possible (for performance, it is already implemented for network policy address sets)
  • share code between features that update port groups and address sets based on selectors (mostly for pods and nodes, possibly services and other objects in the future)

We need to bump the version of K8s and run a library sync for OCP 4.13. Two stories will be created, one for each activity.

Owner: Architect:

Story (Required)

As a Sample Operator Developer, I would like to run the library sync process, so the new libraries can be pushed to OCP 4.16

Background (Required)

This is a runbook we need to execute on every release of OpenShift

Glossary

NA

Out of scope

NA

In Scope

NA

Approach(Required)

Follow instructions here: https://source.redhat.com/groups/public/appservices/wiki/cluster_samples_operator_release_activities

Dependencies

Library Repo

Edge Case

Acceptance Criteria

 Library sync PR is merged in master

INVEST Checklist

 Dependencies identified
 Blockers noted and expected delivery timelines set
 Design is implementable
 Acceptance criteria agreed upon
 Story estimated

Legend

 Unknown
 Verified
 Unsatisfied

 

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF).

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

We get too many false positive bugs like https://issues.redhat.com/browse/OCPBUGS-25333 from SAST scans, especially from the vendor directory. Add a .snyk file like https://github.com/openshift/oc/blob/master/.snyk to each repo to ignore them.

Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • csi-operator
  • aws-efs-csi-driver-operator
  • azure-file-csi-driver-operator
  • openstack-cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • csi-driver-manila-operator
  • vmware-vsphere-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator
  • ibm-powervs-block-csi-driver-operator
  • secrets-store-csi-driver-operator

 

  • cluster-storage-operator
  • cluster-csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

EOL, do not upgrade:

  • github.com/oVirt/csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • aws-ebs-csi-driver-operator (now part of csi-operator)
  • azure-disk-csi-driver-operator (now part of csi-operator)
  •  

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

This includes ibm-vpc-node-label-updater!

(Using separate cards for each driver because these updates can be more complicated)

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Epic Goal

  • Align to OKR 2024 for OCP, having a blocking job for MicroShift.
  • Ensure that issues that break MicroShift are reported against the OCP release as blocking by preventing the payload from being promoted.

Why is this important?

  • Ensures stability of MicroShift when there are OCP changes.
  • Ensures stability against MicroShift changes.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Skip default monitor tests for cloud providers, such as:

"[Jira:"Test Framework"] monitor test external-gcp-cloud-service-availability setup"
"[Jira:"Test Framework"] monitor test external-azure-cloud-service-availability setup"
"[Jira:"Test Framework"] monitor test external-aws-cloud-service-availability setup"
"[Jira:"Test Framework"] monitor test external-gcp-cloud-service-availability collection"
"[Jira:"Test Framework"] monitor test external-azure-cloud-service-availability collection"
"[Jira:"Test Framework"] monitor test external-aws-cloud-service-availability collection"

Automatically skip tests tagged with OCPFeatureGate in origin. This API is not available in MicroShift and therefore any test using it should be skipped.

Filtering by test name will avoid having to update each of the tests.

MicroShift does not support metrics yet. This monitor test is failing because the namespace (openshift-monitoring) and the possible deployments (prometheus-adapter and metrics-server) do not exist.

Skip it until a metrics feature is implemented.

The termination message policy requires access to ClusterVersion resources to determine past versions to see if it should skip the test entirely. Since this option does not exist in MicroShift, never skip it by ignoring past versions (and checking ClusterVersion altogether).

 

1. Proposed title of this feature request

sosreport(sos rpm) command to be included in the tools imagestream.

2. What is the nature and description of the request?

There is an imagestream called tools which is used for oc debug node/<node>. There are several tools to debug the node, but sos is not one of them. The sos report command is the most necessary command to get support from Red Hat, and it should be included for debugging the node.

3. Why does the customer need this? (List the business requirements here)

Telco operators build their systems in disconnected environments. Therefore, it is hard to get additional rpms or images into their environments. If the sos command were included in the OCP platform with the tools imagestream, it would be very useful for them. There is a toolbox image for sosreport, but it is not included in the openshift release manifests. Some telco operators do not allow bringing in additional packages or images other than the OpenShift platform itself.

4. List any affected packages or components.

tools imagestream in Openshift release manifests


Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

For FIPS compatibility reasons, oc compiled on a specific RHEL version does not work on other versions (e.g. oc compiled on RHEL8 does not work on RHEL9). Therefore, as customers migrate to newer RHEL versions, oc's base image should be RHEL9.

This work covers changing the base image of tools, cli, deployer, cli-artifacts.

 
Why is this important? (mandatory)

What are the benefits to the customer or Red Hat?   Does it improve security, performance, supportability, etc?  Why is work a priority?

This improves supportability, and the default oc will work on RHEL9 regardless of whether FIPS is enabled on that host.

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

ART has to perform some updates in ocp-build-data repository.

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Incomplete Epics

This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled

Epic Goal

  • Move the Shared Resource CSI driver to an upstream, vendor-neutral project

Why is this important?

Scenarios

  1. ...

Acceptance Criteria (Mandatory)

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment

Spike Story

We need to find a good upstream home for the Shared Resource CSI Driver. Candidate organizations/projects to host it:

  • Kubernetes SIG-Auth (preferred)
  • Shipwright - as new sub-project

This is upstream-oriented work aimed at convincing a community to adopt the CSI driver.

Approach

  • Review and update openshift/csi-driver-shared-resource docs for the following:
    • Ability for an experienced Kubernetes user to deploy and test on a "vanilla" k8s cluster.
    • Ability to use the Shared Resource APIs (SharedSecret, SharedConfigMap) on a "vanilla" k8s cluster
  • Ensure the security model for the driver can be clearly understood by an experienced Kubernetes user
  • Add an FAQ that highlights how this is different from the following:

Acceptance Criteria

  • Upstream docs/FAQs for Shared Resources are suitable for consumption by non-Red Hatters.
  • Code + content can be presented to Kubernetes SIG-Auth or other upstream community.

This epic is to track any stories for hypershift kubevirt development that do not fit cleanly within a larger effort.

Here are some examples of tasks that this "catch all" epic can capture

  • dependency update maintenance tasks
  • ci and testing changes/fixes
  • investigation spikes

An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.

The console frontend uses Yarn 1.22.15 from Sep 2021. We should update to the latest v1 release.

There is currently no automated way to check for consistency of PatternFly package version resolutions. We should add a script (executed as part of the build) to ensure that each PatternFly package has exactly one version resolution.

Recent regressions in the EventStream component have exposed some error-prone patterns. We should refactor the component to remove these patterns. This will harden the component and make it easier to maintain.

 

AC:

  • Refactor the EventStream component to address React anti-patterns

AC:

  • Create an integration test for the ConsoleYAMLSample CRD, which will create a YAML sample for a specific resource and test if the sample is present for that resource.
  • Test functionality of "Try it" and "Download YAML" links.
  • Also test the display and list page.
  • After the tests are done, delete the created CR.

After upgrading Cypress to v13, switching from the Chrome browser to Electron, and publishing the deprecation of the Chrome browser, we should remove Chrome from the builder image:

https://github.com/openshift/console/blob/master/Dockerfile.builder#L59

Find instances of chrome: https://github.com/search?q=org%3Aopenshift+--browser+%24%7BBRIDGE_E2E_BROWSER_NAME%3A%3Dchrome%7D&type=code

AC:

  • Remove Chrome from the console builder image
  • Coordinate with other packages that Chrome will be deprecated

The console-operator currently uses mostly listers for fetching data, but there are still controllers fetching live data using clients. In those cases we should switch to using listers.

 

AC:

  • Find places in the console-operator code which are using a client for fetching data (GET, LIST). In those cases we need to move towards using listers, in order to use cached data.
  • Remove unused clients

A hasty bug fix resulted in another bug. We should add integration tests that utilize cy.intercept in order to prevent such bugs from happening in the future.

AC:

Convert legacy ListPage to dynamic-plugin-sdk ListPage- components in Console VolumeSnapshots Storage

The legacy ListPage components are located in /frontend/packages/console-app/src/components/

  • volume-snapshot/volume-snapshot.tsx
  • volume-snapshot-content.tsx
  • volume-snapshot-class.tsx

Justification: A recent replacement of the legacy ListPage to dynamic-plugin-sdk ListPage- components in VolumeSnapshotPVC tab component led to the duplication of the RowFilter logic in snapshotStatusFilters function due to incompatible type in RowFilter. Also, converting to dynamic-plugin-sdk ListPage- components would make the code more readable and simplify debugging of VolumeSnapshot components.

A.C.
  Find and replace legacy ListPage volume-snapshot pages with dynamic-plugin-sdk ListPage components

 

 

As part of the spike to determine outdated plugins, the file-loader dev dependency is out of date and needs to be updated.

Acceptance criteria:

  • update file-loader dependency
  • check for breaking changes, fix build issues
  • get a clean run of pre merge tests 

Once PR #12983 gets merged, Console application itself will use PatternFly v5 while also providing following PatternFly v4 packages as shared modules to existing dynamic plugins:

  • @patternfly/react-core
  • @patternfly/react-table
  • @patternfly/quickstarts

Above mentioned PR will allow dynamic plugins to bring in their own PatternFly code if the relevant Console provided shared modules (v4) are not sufficient.

Let's say we have a dynamic plugin that uses PatternFly v5 - since Console only provides v4 implementations of above shared modules, the plugin would bring in the whole v5 package(s) listed above.

There are two main issues here:

1. CSS consistency

  • Console application should be responsible for loading all PatternFly CSS, including older versions (like v4) for use with existing plugins.
  • Plugins should not bring in their own PatternFly CSS in order to avoid styling related bugs.

2. Optimal code sharing

  • Treating entire PatternFly packages as shared modules yields very big JS chunks, which means more data to load over the network and evaluate in the browser.
  • Instead of per-package, PatternFly v5 code should be federated per-component.

This story should address both issues mentioned above.

 


 

Acceptance criteria:

  • webpack generated Console plugin does not bundle PatternFly CSS
  • webpack generated Console plugin treats individual PatternFly v5 components as shared modules

This epic contains all the Dynamic Plugins related stories for OCP release-4.16 and implementing Core SDK utils.

Epic Goal

  • Track all the stories under a single epic

Acceptance Criteria

#13679 - how to setup devel env. with Console and plugin servers running locally

#13521 - improve docs on shared modules (CONSOLE-3328)

#13586 - ensure that Console vs. SDK compat table is up-to-date

#13637 - how to migrate from PatternFly 4 to 5 (CONSOLE-3908)

#13637 - using correct MIME types when serving plugin assets - use "text/javascript" for all JS assets to ensure that "X-Content-Type-Options: nosniff" security header doesn't cause JS scripts to be blocked in the browser

#13637 - disable caching of plugin manifest JSON resource

Epic Goal

  • Add the ability to extend the Events Page. For each Event, the Lightspeed team would like to add an "Explain" Button. Please refer to screenshots

Why is this important?

  • This will enable Lightspeed to provide in-context information to users.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

The Lightspeed team wants to integrate with the console via events; Ben Parees is leading the effort. For that we need to update the Events page so the Lightspeed dynamic plugin can create a button and pass a callback to it, which would explain the given event.

AC: 

  • Update the Events page to consume the existing extension, which would be the console.action/resource-provider extension

 

The observability team is doing a similar integration with Alerts:

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • The goal of this epic is to capture the stories covering the adoption of PatternFly's component-groups components, which were moved into PatternFly from the console-shared package

Why is this important?

  • Console needs to re-implement the components that were moved
  • We need to move any components that might be reusable in other projects out of the console repository

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

This is a clone of issue OCPBUGS-44062. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38713. The following is the description of the original issue:

: [sig-network-edge] DNS should answer A and AAAA queries for a dual-stack service [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]

failed
job link: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-hypershift-release-4.16-periodics-mce-e2e-agent-connected-ovn-dualstack-metal3-conformance/1822988278547091456 

failed log

  [sig-network-edge] DNS should answer A and AAAA queries for a dual-stack service [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/dns/dns.go:499
    STEP: Creating a kubernetes client @ 08/12/24 15:55:02.255
    STEP: Building a namespace api object, basename dns @ 08/12/24 15:55:02.257
    STEP: Waiting for a default service account to be provisioned in namespace @ 08/12/24 15:55:02.517
    STEP: Waiting for kube-root-ca.crt to be provisioned in namespace @ 08/12/24 15:55:02.581
    STEP: Creating a kubernetes client @ 08/12/24 15:55:02.646
  Aug 12 15:55:03.941: INFO: configPath is now "/tmp/configfile2098808007"
  Aug 12 15:55:03.941: INFO: The user is now "e2e-test-dns-dualstack-9bgpm-user"
  Aug 12 15:55:03.941: INFO: Creating project "e2e-test-dns-dualstack-9bgpm"
  Aug 12 15:55:04.299: INFO: Waiting on permissions in project "e2e-test-dns-dualstack-9bgpm" ...
  Aug 12 15:55:04.632: INFO: Waiting for ServiceAccount "default" to be provisioned...
  Aug 12 15:55:04.788: INFO: Waiting for ServiceAccount "deployer" to be provisioned...
  Aug 12 15:55:04.972: INFO: Waiting for ServiceAccount "builder" to be provisioned...
  Aug 12 15:55:05.132: INFO: Waiting for RoleBinding "system:image-pullers" to be provisioned...
  Aug 12 15:55:05.213: INFO: Waiting for RoleBinding "system:image-builders" to be provisioned...
  Aug 12 15:55:05.281: INFO: Waiting for RoleBinding "system:deployers" to be provisioned...
  Aug 12 15:55:05.641: INFO: Project "e2e-test-dns-dualstack-9bgpm" has been fully provisioned.
    STEP: creating a dual-stack service on a dual-stack cluster @ 08/12/24 15:55:05.775
    STEP: Running these commands:for i in `seq 1 10`; do [ "$$(dig +short +notcp +noall +answer +search v4v6.e2e-dns-2700.svc A | sort | xargs echo)" = "172.31.255.230" ] && echo "test_endpoints@v4v6.e2e-dns-2700.svc"; [ "$$(dig +short +notcp +noall +answer +search v4v6.e2e-dns-2700.svc AAAA | sort | xargs echo)" = "fd02::7321" ] && echo "test_endpoints_v6@v4v6.e2e-dns-2700.svc"; [ "$$(dig +short +notcp +noall +answer +search ipv4.v4v6.e2e-dns-2700.svc A | sort | xargs echo)" = "3.3.3.3 4.4.4.4" ] && echo "test_endpoints@ipv4.v4v6.e2e-dns-2700.svc"; [ "$$(dig +short +notcp +noall +answer +search ipv6.v4v6.e2e-dns-2700.svc AAAA | sort | xargs echo)" = "2001:4860:4860::3333 2001:4860:4860::4444" ] && echo "test_endpoints_v6@ipv6.v4v6.e2e-dns-2700.svc";sleep 1; done
     @ 08/12/24 15:55:05.935
    STEP: creating a pod to probe DNS @ 08/12/24 15:55:05.935
    STEP: submitting the pod to kubernetes @ 08/12/24 15:55:05.935
    STEP: deleting the pod @ 08/12/24 16:00:06.034
    [FAILED] in [It] - github.com/openshift/origin/test/extended/dns/dns.go:251 @ 08/12/24 16:00:06.074
    STEP: Collecting events from namespace "e2e-test-dns-dualstack-9bgpm". @ 08/12/24 16:00:06.074
    STEP: Found 0 events. @ 08/12/24 16:00:06.207
  Aug 12 16:00:06.239: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
  Aug 12 16:00:06.239: INFO: 
  Aug 12 16:00:06.334: INFO: skipping dumping cluster info - cluster too large
  Aug 12 16:00:06.469: INFO: Deleted {user.openshift.io/v1, Resource=users  e2e-test-dns-dualstack-9bgpm-user}, err: <nil>
  Aug 12 16:00:06.506: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-dns-dualstack-9bgpm}, err: <nil>
  Aug 12 16:00:06.544: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  sha256~4QgFXAn8lyosshoHOjJeddr3MJbIL2DnCsoIvJVOGb4}, err: <nil>
    STEP: Destroying namespace "e2e-test-dns-dualstack-9bgpm" for this suite. @ 08/12/24 16:00:06.544
    STEP: dump namespace information after failure @ 08/12/24 16:00:06.58
    STEP: Collecting events from namespace "e2e-dns-2700". @ 08/12/24 16:00:06.58
    STEP: Found 2 events. @ 08/12/24 16:00:06.615
  Aug 12 16:00:06.615: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for dns-test-d93fff7e-90a3-408e-a197-fc4ff0738b30: { } FailedScheduling: 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Aug 12 16:00:06.615: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for dns-test-d93fff7e-90a3-408e-a197-fc4ff0738b30: { } FailedScheduling: skip schedule deleting pod: e2e-dns-2700/dns-test-d93fff7e-90a3-408e-a197-fc4ff0738b30
  Aug 12 16:00:06.648: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
  Aug 12 16:00:06.648: INFO: 
  Aug 12 16:00:06.743: INFO: skipping dumping cluster info - cluster too large
    STEP: Destroying namespace "e2e-dns-2700" for this suite. @ 08/12/24 16:00:06.743
  • [FAILED] [304.528 seconds]
  [sig-network-edge] DNS [It] should answer A and AAAA queries for a dual-stack service [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/dns/dns.go:499

    [FAILED] Failed: timed out waiting for the condition
    In [It] at: github.com/openshift/origin/test/extended/dns/dns.go:251 @ 08/12/24 16:00:06.074
  ------------------------------

  Summarizing 1 Failure:
    [FAIL] [sig-network-edge] DNS [It] should answer A and AAAA queries for a dual-stack service [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]
    github.com/openshift/origin/test/extended/dns/dns.go:251

  Ran 1 of 1 Specs in 304.528 seconds
  FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped
fail [github.com/openshift/origin/test/extended/dns/dns.go:251]: Failed: timed out waiting for the condition
Ginkgo exit error 1: exit with code 1

failure reason
TODO

This is a clone of issue OCPBUGS-35535. The following is the description of the original issue:

: [sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should be able to connect to a service that is idled because a GET on the route will unidle it [Skipped:Disconnected] [Suite:openshift/conformance/parallel/minimal]

failed
job link: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-hypershift-release-4.16-periodics-e2e-mce-agent-connected-ovn-ipv4-metal3-conformance/1801525321292320768 

The reason for the failure is the incorrect configuration of the proxy.

failed log

  Will run 1 of 1 specs
  ------------------------------
  [sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should be able to connect to a service that is idled because a GET on the route will unidle it [Skipped:Disconnected] [Suite:openshift/conformance/parallel/minimal]
  github.com/openshift/origin/test/extended/router/idle.go:49
    STEP: Creating a kubernetes client @ 06/14/24 10:24:21.443
  Jun 14 10:24:21.752: INFO: configPath is now "/tmp/configfile3569155902"
  Jun 14 10:24:21.752: INFO: The user is now "e2e-test-router-idling-8pjjg-user"
  Jun 14 10:24:21.752: INFO: Creating project "e2e-test-router-idling-8pjjg"
  Jun 14 10:24:21.958: INFO: Waiting on permissions in project "e2e-test-router-idling-8pjjg" ...
  Jun 14 10:24:22.039: INFO: Waiting for ServiceAccount "default" to be provisioned...
  Jun 14 10:24:22.149: INFO: Waiting for ServiceAccount "deployer" to be provisioned...
  Jun 14 10:24:22.271: INFO: Waiting for ServiceAccount "builder" to be provisioned...
  Jun 14 10:24:22.400: INFO: Waiting for RoleBinding "system:image-pullers" to be provisioned...
  Jun 14 10:24:22.419: INFO: Waiting for RoleBinding "system:image-builders" to be provisioned...
  Jun 14 10:24:22.440: INFO: Waiting for RoleBinding "system:deployers" to be provisioned...
  Jun 14 10:24:22.740: INFO: Project "e2e-test-router-idling-8pjjg" has been fully provisioned.
    STEP: creating test fixtures @ 06/14/24 10:24:22.809
    STEP: Waiting for pods to be running @ 06/14/24 10:24:23.146
  Jun 14 10:24:24.212: INFO: Waiting for 1 pods in namespace e2e-test-router-idling-8pjjg
  Jun 14 10:24:26.231: INFO: All expected pods in namespace e2e-test-router-idling-8pjjg are running
    STEP: Getting a 200 status code when accessing the route @ 06/14/24 10:24:26.231
  Jun 14 10:24:28.315: INFO: GET#1 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host

  Jun 14 10:25:05.256: INFO: GET#38 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  Jun 14 10:39:04.256: INFO: GET#877 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  Jun 14 10:39:05.256: INFO: GET#878 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  Jun 14 10:39:06.257: INFO: GET#879 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  Jun 14 10:39:07.256: INFO: GET#880 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  Jun 14 10:39:08.256: INFO: GET#881 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  Jun 14 10:39:09.256: INFO: GET#882 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  Jun 14 10:39:10.256: INFO: GET#883 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  Jun 14 10:39:11.256: INFO: GET#884 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  Jun 14 10:39:12.256: INFO: GET#885 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  Jun 14 10:39:13.257: INFO: GET#886 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  Jun 14 10:39:14.256: INFO: GET#887 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  ...
  ...
  ...
  Jun 14 10:39:19.256: INFO: GET#892 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
  Jun 14 10:39:20.256: INFO: GET#893 "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org" error=Get "http://idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org": dial tcp: lookup idle-test-e2e-test-router-idling-8pjjg.apps.2c2012a4ac53d705fb5e.ostest.test.metalkube.org on 172.30.0.10:53: no such host
    [INTERRUPTED] in [It] - github.com/openshift/origin/test/extended/router/idle.go:49 @ 06/14/24 10:39:20.461
    ------------------------------
    Interrupted by User
    First interrupt received; Ginkgo will run any cleanup and reporting nodes but will skip all remaining specs.  Interrupt again to skip cleanup.
    Here's a current progress report:
      [sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should be able to connect to a service that is idled because a GET on the route will unidle it [Skipped:Disconnected] [Suite:openshift/conformance/parallel/minimal] (Spec Runtime: 14m59.024s)
        github.com/openshift/origin/test/extended/router/idle.go:49
        In [It] (Node Runtime: 14m57.721s)
          github.com/openshift/origin/test/extended/router/idle.go:49
          At [By Step] Getting a 200 status code when accessing the route (Step Runtime: 14m54.229s)
            github.com/openshift/origin/test/extended/router/idle.go:175

          Spec Goroutine
          goroutine 307 [select]
            k8s.io/apimachinery/pkg/util/wait.waitForWithContext({0x95f5188, 0xda30720}, 0xc004cfbcf8, 0x30?)
              k8s.io/apimachinery@v0.29.0/pkg/util/wait/wait.go:205
            k8s.io/apimachinery/pkg/util/wait.poll({0x95f5188, 0xda30720}, 0x1?, 0xc0045c2a80?, 0xc0045c2a87?)
              k8s.io/apimachinery@v0.29.0/pkg/util/wait/poll.go:260
            k8s.io/apimachinery/pkg/util/wait.PollWithContext({0x95f5188?, 0xda30720?}, 0xc004cfbd90?, 0x88699b3?, 0x7?)
              k8s.io/apimachinery@v0.29.0/pkg/util/wait/poll.go:85
            k8s.io/apimachinery/pkg/util/wait.Poll(0xc004cfbd00?, 0x88699b3?, 0x1?)
              k8s.io/apimachinery@v0.29.0/pkg/util/wait/poll.go:66
          > github.com/openshift/origin/test/extended/router.waitHTTPGetStatus({0xc003d8fbc0, 0x5a}, 0xc8, 0x0?)
              github.com/openshift/origin/test/extended/router/idle.go:306
          > github.com/openshift/origin/test/extended/router.glob..func7.2.1()
              github.com/openshift/origin/test/extended/router/idle.go:178
            github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3({0x2e24138, 0xc0014f2d80})
              github.com/onsi/ginkgo/v2@v2.13.0/internal/node.go:463
            github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3()
              github.com/onsi/ginkgo/v2@v2.13.0/internal/suite.go:896
            github.com/onsi/ginkgo/v2/internal.(*Suite).runNode in goroutine 1
              github.com/onsi/ginkgo/v2@v2.13.0/internal/suite.go:883
    -----------------------------

Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Today, when we create an AKS cluster, we provide the catalog images like so:

--annotations hypershift.openshift.io/certified-operators-catalog-image=registry.redhat.io/redhat/certified-operator-index@sha256:fc68a3445d274af8d3e7d27667ad3c1e085c228b46b7537beaad3d470257be3e \
--annotations hypershift.openshift.io/community-operators-catalog-image=registry.redhat.io/redhat/community-operator-index@sha256:4a2e1962688618b5d442342f3c7a65a18a2cb014c9e66bb3484c687cfb941b90 \
--annotations hypershift.openshift.io/redhat-marketplace-catalog-image=registry.redhat.io/redhat/redhat-marketplace-index@sha256:ed22b093d930cfbc52419d679114f86bd588263f8c4b3e6dfad86f7b8baf9844 \
--annotations hypershift.openshift.io/redhat-operators-catalog-image=registry.redhat.io/redhat/redhat-operator-index@sha256:59b14156a8af87c0c969037713fc49be7294401b10668583839ff2e9b49c18d6 \

We need to fix this so that we don't need to override those images on the create command when we are in AKS.

The current reason we are annotating the catalog images when we create an AKS cluster is that the HCP controller will try to pull the images from an ImageStream if there are no overrides here - https://github.com/openshift/hypershift/blob/64149512a7a1ea21cb72d4473f46210ac1d3efe0/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L3672. In AKS, ImageStreams are not available.
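
A minimal sketch of the direction this could take, using hypothetical helper names rather than the actual HyperShift controller code: prefer an explicit annotation override, fall back to the ImageStream lookup where that API exists, and otherwise use a pinned default so AKS-hosted clusters no longer need per-create overrides:

  package main

  import "fmt"

  // resolveCatalogImage picks an OLM catalog image for the hosted control plane.
  func resolveCatalogImage(annotations map[string]string, overrideKey string, imageStreamsAvailable bool, lookup func() (string, bool), pinnedDefault string) string {
      if img := annotations[overrideKey]; img != "" {
          return img // explicit override wins, as today
      }
      if imageStreamsAvailable {
          if img, ok := lookup(); ok {
              return img // management cluster has ImageStreams (e.g. self-managed OCP)
          }
      }
      return pinnedDefault // AKS path: no ImageStream API, no override required
  }

  func main() {
      img := resolveCatalogImage(
          map[string]string{}, // no override annotations supplied at create time
          "hypershift.openshift.io/redhat-operators-catalog-image",
          false, // ImageStreams unavailable on AKS
          func() (string, bool) { return "", false },
          "registry.redhat.io/redhat/redhat-operator-index:v4.16", // hypothetical pinned default
      )
      fmt.Println(img)
  }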

Done when:

We have an enhancement drafted and socialized that

  • describes the patterns of rotation mechanisms that the MCO owns, which other components can utilize
  • lists the certs that the MCO component is responsible for

Should be reviewed by/contain provisions for

  • WMCO
  • SNO
  • HyperShift

Description of problem

Spin off of OCPBUGS-30192

The daemon process can exit due to health check failures in 4.16+, after we added apiserver server CA rotation handling. This came with the side effect that if the MCD happens to exit in the middle of an update (e.g. during the image pull portion), the files/units would have been updated but the OS upgrade would not, blocking the upgrade indefinitely when the new container comes up.

Version-Release number of selected component

4.16

How reproducible

Only in BM CI so far, unsure if other issues contribute to this.

Steps to Reproduce

Get lucky and have api-int DNS break while the machine-config daemon is deploying updated files to disk. Unclear how to reliably trigger this, or distinguish from OCPBUGS-30192 and other failure modes.

Actual results

Expected results

Additional info

Description of problem:

Older clusters updating into or running 4.15.0-rc.0 (and possibly Engineering Candidates?) can have the Kube API server operator initiate certificate rollouts, including the api-int CA. Missing pieces in the pipeline to roll out the new CA to kubelets and other consumers lead the cluster to lock up when the Kubernetes API servers transition to using the new cert/CA pair when serving incoming requests. For example, nodes may go NotReady with kubelets unable to call in their status to an api-int signed by the new CA that they don't yet trust.

Version-Release number of selected component (if applicable):

Seen in two updates from 4.14.6 to 4.15.0-rc0. Unclear if Engineering Candidates were also exposed. 4.15.0-rc.1 and later will not be exposed because they have the fix for OCPBUGS-18761. They may still have broken logic for these CA rotations in place, but until the certs are 8y or more old, they will not trigger that broken logic.

How reproducible:

We're working on it. Maybe cluster-kube-apiserver-operator#1615.

Actual results:

Nodes go NotReady with kubelet failing to communicate with api-int because of tls: failed to verify certificate: x509: certificate signed by unknown authority.

Expected results:

Happy certificate rollout.

Additional info:

Rolling the api-int CA is complicated, and we seem to be missing a number of steps. It's probably worth working out details in a GDoc or something where we have a shared space to fill out the picture.

One piece is getting the api-int certificates out to the kubelet, where the flow seems to be:

  1. Kube API-server operator updates a Secret, like loadbalancer-serving-signer in openshift-kube-apiserver-operator (code).
  2. Kube API-server aggregates a number of certificates into the kube-apiserver-server-ca ConfigMap in the openshift-config-managed namespace (code).
  3. FIXME, possibly something in the Kube controller manager's ServiceAccount stack (and the serviceaccount-ca ConfigMap in openshift-kube-controller-manager) is handling getting the data from kube-apiserver-server-ca into node-bootstrapper-token?
  4. Machine-config operator consumes FIXME and writes a node-bootstrapper-token ServiceAccount Secret.
  5. Machine-config servers mount the node-bootstrapper-token Secret to /etc/mcs/bootstrap.
  6. Machine-config servers consume ca.crt from /etc/mcs/bootstrap-token and build a kubeconfig to serve in Ignition configs here as /etc/kubernetes/kubeconfig (code)
  7. Bootimage Ignition lays down the MCS-served content into the local /etc/kubernetes/kubeconfig, but only when the node is first born.
  8. FIXME propagates /etc/kubernetes/kubeconfig to /var/lib/kubelet/kubeconfig (FIXME:code Possibly the kubelet via --bootstrap-kubeconfig).
  9. The kubelet consumes /var/lib/kubelet/kubeconfig and uses its CA trust store when connecting to api-int (code).

That handles new-node creation, but not "Kube API-server operator rolled the CA, and now we need to update existing nodes and restart their kubelets. And any pods using ServiceAccount kubeconfigs? And...?". This bug is about filling in those missing pieces in the cert-rolling pipeline (including having the Kube API server not use the new CA until it has been sufficiently rolled out to api-int clients, possibly including every ServiceAccount-consuming pod on the cluster?), and anything else that seems broken with the early cert-rolls.

Somewhat relevant here is OCPBUGS-15367 currently managing /etc/kubernetes/kubeconfig permissions in the machine-config daemon to backstop for the file existing in the MCS-served Ignition config but not being a part of the rendered MachineConfig or the ControllerConfig stack.

Background

This is intended to be a place to capture general "tech debt" items so they don't get lost. I very much doubt that this will ever get completed as a feature, but that's okay; the intent is more that stories get pulled out of here and attached to feature work "opportunistically" when it makes sense.

Goal

If you find a "tech debt" item, and it doesn't have an obvious home with something else (e.g. with MCO-1 if it's metrics and alerting) then put it here, and we can start splitting these out/marrying them up with other epics when it makes sense.

 

As an OpenShift developer, I want to know that my code is as secure as possible by running static analysis on each PR.

 

Periodically, scans are performed on all OpenShift repositories and the container images produced by those repositories. These scans usually result in numerous OCP bugs being opened into our queue (see linked bugs as an example), putting us in a more reactive state. Instead, we can perform these scans on each PR by following these instructions https://docs.ci.openshift.org/docs/how-tos/add-security-scanning/ to add this to our OpenShift CI configurations.

 

Done When:

  • A PR to the openshift/release repository is merged which enables this configuration in a non-gating capacity.
  • All MCO team members are onboarded into the Snyk scan dashboard.
  • A preliminary scan report is produced which highlights areas for improvement.

During my work on Project Labrador, I learned that there are advanced caching directives that one can add to their Containerfiles. These do things such as allowing the package manager cache to be kept out of the image build while remaining available after the build, so that subsequent builds don't have to re-download the packages. Golang has a great incremental build story as well, provided that one leaves the caches intact.

To begin with, my Red Hat-issued ThinkPad P16v takes approximately 2 minutes and 42 seconds to perform an MCO image build (assuming the builder and base images are already prefetched).

A preliminary test shows that by using advanced caching directives, incremental builds can be reduced to as little as 45 seconds. Additionally, by moving the nmstate binary installation into a separate build stage and limiting what files are copied into that stage, we can achieve a cache hit under most conditions. A cache hit there has the added advantage that the stage no longer requires VPN access to reach the appropriate RPM repository.

 

Done When:

  • The Dockerfile in the MCO repository root has been modified to take advantage of advanced caching directives.
  • The nmstate installation has been moved into a separate build stage within the MCO's Dockerfile to increase the likelihood of cache hits.
  • These speed improvements will likely not be observed within OpenShift CI, because those builds run in containers themselves and the build contexts and caches are discarded after each build.

Feature goal (what are we trying to solve here?)

During 4.15, the OCP team is working on allowing booting from iSCSI. Today that is disabled by the assisted installer. The goal is to enable it for OCP versions >= 4.15.

DoD (Definition of Done)

iSCSI boot is enabled for OCP versions >= 4.15, both in the UI and the backend.

When booting from iSCSI, we need to make sure to add the `rd.iscsi.firmware=1` kernel argument during install to enable iSCSI booting.
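
A minimal sketch, with hypothetical helper names rather than the assisted-installer's real code, of how the kernel argument could be appended only when the host boots from iSCSI:

  package main

  import "fmt"

  // installKargs appends rd.iscsi.firmware=1 so the initramfs brings up the
  // firmware-configured iSCSI session during boot.
  func installKargs(bootFromISCSI bool, kargs []string) []string {
      if bootFromISCSI {
          kargs = append(kargs, "rd.iscsi.firmware=1")
      }
      return kargs
  }

  func main() {
      fmt.Println(installKargs(true, []string{"console=ttyS0"}))
  }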

Does it need documentation support?

yes

Feature origin (who asked for this feature?)

  • A Customer asked for it

    • Oracle
    • NetApp
    • Cisco

Reasoning (why it’s important?)

  • In OCI there are bare metal instances with iSCSI support and we want to allow customers to use them

Add an annotation (with one sentence) that explains what each Kube resource is used for. Sometimes we cannot or don't want to browse the code to learn what a resource is used for, and most of the time the name is not self-explanatory.

I don't think maintaining this will be a big deal as resources' purposes don't change a lot.

There is a standard annotation for this, kubernetes.io/description: https://kubernetes.io/docs/reference/labels-annotations-taints/#description

See https://issues.redhat.com/browse/MON-1634?focusedId=22170770&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-22170770 for more context.
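
As a small illustration (not taken from any operator's actual code), this is what setting the standard annotation could look like when a component constructs a resource; the resource name and description text are made up:

  package main

  import (
      "fmt"

      corev1 "k8s.io/api/core/v1"
      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  )

  func newDescribedConfigMap() *corev1.ConfigMap {
      return &corev1.ConfigMap{
          ObjectMeta: metav1.ObjectMeta{
              Name:      "example-rulefiles", // hypothetical resource name
              Namespace: "openshift-monitoring",
              Annotations: map[string]string{
                  // One sentence explaining what the resource is used for.
                  "kubernetes.io/description": "Holds the rule files mounted into the prometheus-k8s pods.",
              },
          },
      }
  }

  func main() {
      fmt.Println(newDescribedConfigMap().Annotations["kubernetes.io/description"])
  }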

 

We don't clearly document which "3rd-party" monitoring APIs are supported. We should compile an exhaustive list of the API services that we officially support:

  • OpenShift routes to access the monitoring API endpoints from the outside.
  • Kubernetes services to access the monitoring API endpoints within the cluster.
  • For each API, the list of supported endpoints (e.g. /api/v1/query and /api/v1/query_range for Prometheus; a query sketch follows below).
  • Which clusterrole/role bindings are needed to access each API service.

See RHDEVDOCS-4830 for the context.
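
For illustration, a minimal Go sketch of calling one of the supported endpoints (/api/v1/query) through a monitoring route with a bearer token; the route host and token are placeholders, and the exact RBAC required is whatever the compiled list ends up documenting:

  package main

  import (
      "fmt"
      "io"
      "net/http"
      "net/url"
  )

  func main() {
      base := "https://thanos-querier-openshift-monitoring.apps.example.com" // placeholder route host
      token := "sha256~REDACTED"                                             // placeholder ServiceAccount token

      q := url.Values{"query": []string{"up"}}
      req, err := http.NewRequest("GET", base+"/api/v1/query?"+q.Encode(), nil)
      if err != nil {
          panic(err)
      }
      req.Header.Set("Authorization", "Bearer "+token)

      resp, err := http.DefaultClient.Do(req)
      if err != nil {
          panic(err)
      }
      defer resp.Body.Close()

      body, _ := io.ReadAll(resp.Body)
      fmt.Println(resp.Status, string(body))
  }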

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Get rid of unnecessary UPDATE requests during syncs.

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Migrate:

  • CreateOrUpdateSecret
  • CreateOrUpdateConfigMap: requires a small adjustment in library-go
  • CreateOrUpdatePodDisruptionBudget
  • CreateOrUpdateService
  • CreateOrUpdateRoleBinding
  • CreateOrUpdateRole
  • CreateOrUpdateClusterRole
  • CreateOrUpdateClusterRoleBinding

See https://docs.google.com/document/d/1fm2SZs8HroexPQnqI0Ua85Y31-lars8gCsdwSJC3ngo/edit?usp=sharing for more details.
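
The underlying pattern is sketched roughly below; this is illustrative, not the actual CMO or library-go code: fetch the live object and only issue an UPDATE when the desired fields actually differ.

  package resourcesync

  import (
      "context"

      corev1 "k8s.io/api/core/v1"
      "k8s.io/apimachinery/pkg/api/equality"
      apierrors "k8s.io/apimachinery/pkg/api/errors"
      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
      "k8s.io/client-go/kubernetes"
  )

  // applyConfigMap creates the ConfigMap if missing and otherwise updates it
  // only when data or labels changed, avoiding no-op UPDATE requests.
  func applyConfigMap(ctx context.Context, c kubernetes.Interface, desired *corev1.ConfigMap) error {
      existing, err := c.CoreV1().ConfigMaps(desired.Namespace).Get(ctx, desired.Name, metav1.GetOptions{})
      if apierrors.IsNotFound(err) {
          _, err = c.CoreV1().ConfigMaps(desired.Namespace).Create(ctx, desired, metav1.CreateOptions{})
          return err
      }
      if err != nil {
          return err
      }
      if equality.Semantic.DeepEqual(existing.Data, desired.Data) &&
          equality.Semantic.DeepEqual(existing.Labels, desired.Labels) {
          return nil // nothing to do; skip the UPDATE
      }
      updated := existing.DeepCopy()
      updated.Data = desired.Data
      updated.Labels = desired.Labels
      _, err = c.CoreV1().ConfigMaps(desired.Namespace).Update(ctx, updated, metav1.UpdateOptions{})
      return err
  }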

This epic is to track stories that are not completed in MON-3378

After we have replaced all oauth-proxy occurrences in the monitoring stack, we need to make sure that all references to oauth-proxy are removed from the cluster monitoring operator. Examples:

 

There are a few places in CMO where we need to remove code after the release-4.16 branch is cut.

To find them, look for the "TODO" comments.

Epic Goal
Through this epic, we will update our CI to use an agent-based workflow instead of the libvirt openshift-installer, allowing us to eliminate the use of Terraform in our deployments.

Why is this important?
There is an active initiative in OpenShift to remove Terraform from the OpenShift installer.

Acceptance Criteria

  • All tasks within the epic are completed.

Done Checklist

  • CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
  • All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.

As a CI job author, I would like to be able to reference a yaml/json parsing tool that works across architectures and doesn't need to be downloaded for each unique step.

Rafael pointed out that Alessandro added multi-arch containers for yq for the UPI installer:
https://github.com/openshift/release/pull/49036#discussion_r1499870554

yq should have the ability to parse json.

We should evaluate if this can be added to the libvirt-installer image as well, and then used by all of our libvirt CI steps.

Filing a ticket based on this conversation here: https://github.com/openshift/enhancements/pull/1014#discussion_r798674314

Basically the tl;dr here is that we need a way to ensure that MachineSets properly advertise the architecture that their nodes will eventually have. This is needed so the autoscaler can predict the correct pool to scale up/down. This could be accomplished through user-driven means like adding node architecture labels to MachineSets; if we have to do this automatically, we need to do some more research and figure out a way.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Stop setting `-cloud-provider` and `-cloud-config` arguments on KAS, KCM and MCO
  • Remove `CloudControllerOwner` condition from CCM and KCM ClusterOperators
  • Remove feature gating reliance in library-go IsCloudProviderExternal
  • Remove CloudProvider feature gates from openshift/api

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Background

As part of the migration to external cloud providers, the CCMO and KCMO used a CloudControllerOwner condition to show which component owned the cloud controllers.

This is no longer required and can be removed.
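
Removal itself is mostly a matter of filtering the condition out of the ClusterOperator status conditions; a rough sketch (not the actual CCMO/KCMO code) is:

  package main

  import (
      "fmt"

      configv1 "github.com/openshift/api/config/v1"
  )

  // withoutCondition returns the conditions with the named type removed.
  func withoutCondition(conds []configv1.ClusterOperatorStatusCondition, t configv1.ClusterStatusConditionType) []configv1.ClusterOperatorStatusCondition {
      out := make([]configv1.ClusterOperatorStatusCondition, 0, len(conds))
      for _, c := range conds {
          if c.Type != t {
              out = append(out, c)
          }
      }
      return out
  }

  func main() {
      conds := []configv1.ClusterOperatorStatusCondition{
          {Type: configv1.OperatorAvailable},
          {Type: configv1.ClusterStatusConditionType("CloudControllerOwner")},
      }
      fmt.Println(len(withoutCondition(conds, "CloudControllerOwner"))) // prints 1
  }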

Steps

  • Remove code from CCMO that looks for and gates on the KCMO condition
  • Ensure CCMO clears the condition
  • Ensure KCMO clears the condition

Stakeholders

  • Cluster Infra
  • Workloads team

Definition of Done

  • Clusters upgraded to 4.16 do not have a CloudControllerOwner condition set on the KCMO or CCMO ClusterOperators
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

Background

The kubelet no longer needs the cloud-config flag as it is no longer running in-tree code.

It is currently handled in the templates by this function which will need to be removed, along with any instances in the templates that call the function.

This should cause the flag to be omitted from future kubelet configuration.

Steps

  • Remove the cloud-config flag producing function from MCO
  • Handle any cleanup required (find kubelet files that reference the template function)

Stakeholders

  • Cluster Infra
  • MCO team

Definition of Done

  • Kubelet no longer has a specified value for `--cloud-config`
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

Background

Code in library-go currently uses feature gates to determine whether Azure and GCP clusters should be external or not. The gates have been promoted for at least one release and we do not see ourselves going back.

In 4.17 the code is expected to be deleted completely.

We should remove the reliance on the feature gate from this part of the code and clean up references to feature gate access at the call sites.

Steps

  • Update library-go to remove the reliance on feature gates
  • Update callers to no longer rely on feature gate accessor (KCMO, KASO, MCO, CCMO)
  • Remove feature gates from API repo

Stakeholders

  • Cluster Infra
  • MCO team
  • Workloads team
  • API server team

Definition of Done

  • Feature gates for external cloud providers are removed from the product
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

This is a response to https://issues.redhat.com/browse/OCPBUGS-12893, which is more of a feature request than a bug. The ask is that we put the ingress VIP in a fault state when there is no ingress controller present on the node, so the node won't take the VIP even if no other node takes it either.

I don't believe this is a bug because it's related to an unsupported configuration, but I still think it's worth doing because it will simplify our remote worker process. If we put the VIP in a fault state, it won't be necessary to disable keepalived on remote workers; the ingress service just needs to be placed correctly and keepalived will do the right thing.

Failing conformance tests:

  • [sig-instrumentation][Late] OpenShift alerting rules [apigroup:image.openshift.io] should have a runbook_url annotation if the alert is critical [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
  • [sig-instrumentation][Late] OpenShift alerting rules [apigroup:image.openshift.io] should have a valid severity label [Skipped:Disconnected] [Suite:openshift/conformance/parallel] 
  • [sig-instrumentation][Late] OpenShift alerting rules [apigroup:image.openshift.io] should have description and summary annotations [Skipped:Disconnected] [Suite:openshift/conformance/parallel] 

OpenShift conformance tests are flagging some alerts added by managed services as non-compliant.

 

Job run: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-osde2e-main-nightly-4.16-conformance-osd-aws/1770676944715649024

See comments for failure messages

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Description of problem:

Failed to install OCP on the below Local Zones (LZ) / Wavelength Zones (WLZ). The common point in the below regions is that all of them have only one type of zone, LZ or WLZ; e.g. in af-south-1 only LZ is available (no WLZ), and in ap-northeast-2 only WLZ is available (no LZ).



Failed regions/zones:

af-south-1 ['af-south-1-los-1a']     
failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Wavelength Zone names: no zones with type wavelength-zone in af-south-1

ap-south-1 ['ap-south-1-ccu-1a', 'ap-south-1-del-1a']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Wavelength Zone names: no zones with type wavelength-zone in ap-south-1

ap-southeast-1 ['ap-southeast-1-bkk-1a', 'ap-southeast-1-mnl-1a']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Wavelength Zone names: no zones with type wavelength-zone in ap-southeast-1

me-south-1 ['me-south-1-mct-1a']     
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Wavelength Zone names: no zones with type wavelength-zone in me-south-1

ap-southeast-2 ['ap-southeast-2-akl-1a', 'ap-southeast-2-per-1a']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Wavelength Zone names: no zones with type wavelength-zone in ap-southeast-2

eu-north-1 ['eu-north-1-cph-1a', 'eu-north-1-hel-1a']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Wavelength Zone names: no zones with type wavelength-zone in eu-north-1

ap-northeast-2 ['ap-northeast-2-wl1-cjj-wlz-1', 'ap-northeast-2-wl1-sel-wlz-1']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Local Zone names: no zones with type local-zone in ap-northeast-2

ca-central-1 ['ca-central-1-wl1-yto-wlz-1']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Local Zone names: no zones with type local-zone in ca-central-1

eu-west-2	['eu-west-2-wl1-lon-wlz-1', 'eu-west-2-wl1-man-wlz-1', 'eu-west-2-wl2-man-wlz-1']
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[1].platform.aws: Internal error: getting Local Zones: unable to retrieve Local Zone names: no zones with type local-zone in eu-west-2

    

Version-Release number of selected component (if applicable):

4.15.0-rc.3-x86_64
    

How reproducible:

    

Steps to Reproduce:

1) install OCP on above regions/zones
    

Actual results:

See description. 
    

Expected results:

Don't check LZ's availability while installing OCP in WLZ
Don't check WLZ's availability while installing OCP in LZ
    

Additional info:

    

Goal:

  • Research whether it is possible to update the node query across all e2e tests so that edge nodes are not returned in the list of worker nodes, preventing eventual failures in e2e tests when edge nodes are selected in AWS jobs.

Description:

Follow up suggestion: https://github.com/openshift/origin/pull/28486#issuecomment-1884966128 

Previous work/PR [1] removed the skips only for the e2e failures added in past releases.

 

Acceptance criteria:

  • List of changes needed to create that exception
  • Will it have any impact on the tests? How do we monitor it?

 

[1] https://github.com/openshift/origin/pull/28486

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

The installer is using a very old reference to the machine-config project for APIs.
Those APIs have been moved to openshift/api. Update the imports and unit tests.

 

This will make an import error go away and make it easier to update in the future.

Today we manually catch some regressions by eyeballing disruption graphs. 

There are two focuses for this Epic: first, updates to fix and tune the existing disruption logic; second, considering new methods for collecting and analyzing disruption.

 

For the second part considerations are:

Design some automation to detect these kinds of regressions and alert TRT.

Would this be in sippy or something new? (too openshift specific?)

Bear in mind we'll soon need it for component memory and cpu usage as well.

Alerts should eventually be targeting the SLO we discussed in Infra arch call on Feb 7: https://docs.google.com/document/d/1QOXh7Me0w-4ad-c8HaTuQPvpG5cddUyA2b1j00H-MXQ/edit?usp=sharing

Make sure we gain testing over metal and vsphere, which typically do not have the minimum 100 runs; how can we test these more broadly?

Today we fail the job if you're over the P99 for the last 3 weeks, as determined by a weekly PR to origin. The mechanism for creating that PR, reading its data, and running these tests breaks repeatedly without anyone realizing, and often does things we don't expect.

Disruption ebbs and flows constantly, especially at the 99th percentile; the test being run is not technically the same week to week.

We do still want to at least attempt to fail a job run if disruption was significant.

Examples we would not want to fail:

P99 2s, job got 4s. We don't care about a few seconds of disruption at this level. This happens all the time, and it's not a regression. Stop failing the test.

P99 60s, job got 65s. There's already huge disruption possible, a few seconds over is largely irrelevant week to week. Stop failing the test.

Layer 3 disruption monitoring is now our main focus for catching more subtle regressions; this is just a first line of defence, a best attempt at telling a PR author that their change may have caused a drastic disruption problem.

Changes proposed (see details in comments below for the data as to why):

  • Allow P99 + grace, where grace is 5s or 10% (see the sketch after this list).
  • Remove all fallbacks to historical data except based on release version. (or all of them, and shut down testing while data is accumulating for a new release?)
  • Disable all testing with empty Platform, whatever these are we should not be testing on it.
  • Fix the broken test names: disruptionlegacyapiservers monitortest.go testNames() is returning the same test name for new/reused connection testing, likely causing flakes when it should be causing failures.
  • Standardize all disruption test names to something we can easily search for; see this link.
  • Recheck data if we focus on jobs that ONLY failed on disruption, and see if a more lenient grace would achieve the effects we want there.
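
For the first bullet, a small sketch of the proposed check (illustrative only; the names do not come from the origin code):

  package main

  import (
      "fmt"
      "time"
  )

  // allowedDisruption is the historical P99 plus a grace of 5s or 10% of P99,
  // whichever is larger.
  func allowedDisruption(p99 time.Duration) time.Duration {
      grace := time.Duration(float64(p99) * 0.10)
      if grace < 5*time.Second {
          grace = 5 * time.Second
      }
      return p99 + grace
  }

  func main() {
      for _, p99 := range []time.Duration{2 * time.Second, 60 * time.Second} {
          fmt.Printf("p99=%v allowed=%v\n", p99, allowedDisruption(p99))
      }
      // With this rule the examples above stop failing: 4s observed against a 2s
      // P99 is within 2s+5s, and 65s against a 60s P99 is within 60s+6s.
  }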

Certain repositories (ovn/sdn, router, sometimes MCO) are prone to causing disruption regressions. We need to give engineers on these teams better visibility into when they're about to merge something that might break payloads.

We could request /payload on all PRs but this is expensive and a manual task that could still easily be forgotten.

In our epic TRT-787 we will likely soon have a little data on disruption in sippy, enough to know what we expect to be normal and how we're doing over the last few days.

This card proposes a similar approach to risk analysis, or actually plugging right into risk analysis. A spyglass panel should be visible with data on disruption:

  • backend name
  • phase 1 disruption observed (if abnormal)
  • phase 2 disruption observed (conformance, if applicable)
  • total disruption
  • normal disruption over last X period, or fixed limit we've determined
  • delta
  • risk level, or standard deviations etc

We could then enhance the PR commenter to drop this information right in front of the user if anything looks out of the ordinary.

The intervals charts displayed at the top of all prow job runs have become a critical tool for TRT and OpenShift engineering in general, allowing us to determine what happened when, and in relation to other events. The tooling, however, is falling short in a number of areas we'd like to improve.

Goals:

  • sharable links
  • improved filtering
  • a live service rather than chart html as an artifact that is difficult to improve on and regenerate for old jobs

Stretch goals:

  • searchable intervals across many / all jobs

Drop locator and message.

Move tempStructuredLocator and tempStructuredMessage to locator/message.

Groundwork for this was already laid by removing all use of the legacy fields.

1. from https://issues.redhat.com/browse/RFE-1576 add output compression
2. from earlier discussion about escalation add --since to limit the size of data (https://issues.redhat.com/browse/RFE-309)
3. from workloads architecture must-gather fallback to oc adm inspect clusteroperators when it can't run a pod (https://github.com/openshift/oc/pull/749)
4. ability to get kubelet logs, refer to https://bugzilla.redhat.com/show_bug.cgi?id=1925328 (no longer relevant, feel free to reopen/re-report if needed)
5. fix error output from sync, see https://bugzilla.redhat.com/show_bug.cgi?id=1917850
6. Create link between nodes and pods from their respective nodes, so it's easier to navigate. (no longer applies)
7. Gather limited (x past hours/days) of logs see https://bugzilla.redhat.com/show_bug.cgi?id=1928468.
8. Use rsync for copying data off of pod see https://github.com/openshift/must-gather/pull/263#issuecomment-967024141 (moved under https://issues.redhat.com/browse/WRKLDS-1191)

[bug] Must-gather logs (not logged into the archive) - this also applies to inspect
[bug] timeouts don’t seem to happen at 10 min
[RFE] logging the collection of inspect in the timestamp file - the timestamp file should have more detailed information, probably similar to the first bug above (moved to https://issues.redhat.com/browse/WRKLDS-1190)

Pulling from https://issues.redhat.com/browse/WRKLDS-259. Currently, `oc adm must-gather` prints logs that are not timestamped when fallback is triggered (running `oc adm inspect` directly). E.g.:

$ ./oc adm must-gather
[must-gather      ] OUT Using must-gather plug-in image: registry.ci.openshift.org/ocp/4.16-2024-04-19-040249@sha256:addc9b013a4cbbe1067d652d032048e7b5f0c867174671d51cbf55f765ff35fc
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: 1fa407e3-2ea7-4c34-8fba-8c4d5c3a0bda
ClientVersion: v4.2.0-alpha.0-2261-g17c015a
ClusterVersion: Stable at "4.16.0-0.ci-2024-04-19-040249"
ClusterOperators:
	All healthy and stable


[must-gather      ] OUT namespace/openshift-must-gather-9qrqc created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-8c5gz ...
Gathering data for ns/openshift-config...
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
Gathering data for ns/openshift-config-managed...
Gathering data for ns/openshift-authentication...
Gathering data for ns/openshift-authentication-operator...
Gathering data for ns/openshift-ingress...
Gathering data for ns/openshift-oauth-apiserver...
Gathering data for ns/openshift-machine-api...
Gathering data for ns/openshift-cloud-controller-manager-operator...
Gathering data for ns/openshift-cloud-controller-manager...
...

The missing timestamps make it impossible to measure how much time each "Gathering data" step takes to complete. The completion times can help identify which steps take up most of the collection time.
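
A minimal sketch (not the actual oc code) of the kind of change being asked for, prefixing each progress line with a timestamp so step durations can be computed from the output:

  package main

  import (
      "fmt"
      "os"
      "time"
  )

  // logf prints a timestamped progress line, e.g.
  // [2024-04-19T04:02:49Z] Gathering data for ns/openshift-config...
  func logf(format string, args ...interface{}) {
      fmt.Fprintf(os.Stdout, "[%s] "+format+"\n", append([]interface{}{time.Now().UTC().Format(time.RFC3339)}, args...)...)
  }

  func main() {
      for _, ns := range []string{"openshift-config", "openshift-ingress"} {
          logf("Gathering data for ns/%s...", ns)
      }
  }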

*Testing part*: the only testing case here is to validate that every "significant" log line has a timestamp. Significant = a "Gathering data" line. This applies in both cases: the normal one, when a must-gather-??? pod/container is running, and the fallback, when a must-gather image cannot be pulled (e.g. does not exist) and must-gather falls back to running the `oc adm inspect` code directly (which takes significantly longer to run).

*Documenting part*: no-doc

Other Complete

This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled

This is a clone of issue OCPBUGS-33954. The following is the description of the original issue:

Description of problem:

An infra machine is going into Failed status:

2024-05-18 07:26:49.815 | NAMESPACE               NAME                          PHASE     TYPE     REGION      ZONE   AGE
2024-05-18 07:26:49.822 | openshift-machine-api   ostest-wgdc2-infra-0-4sqdh    Running   master   regionOne   nova   31m
2024-05-18 07:26:49.826 | openshift-machine-api   ostest-wgdc2-infra-0-ssx8j    Failed                                31m
2024-05-18 07:26:49.831 | openshift-machine-api   ostest-wgdc2-infra-0-tfkf5    Running   master   regionOne   nova   31m
2024-05-18 07:26:49.841 | openshift-machine-api   ostest-wgdc2-master-0         Running   master   regionOne   nova   38m
2024-05-18 07:26:49.847 | openshift-machine-api   ostest-wgdc2-master-1         Running   master   regionOne   nova   38m
2024-05-18 07:26:49.852 | openshift-machine-api   ostest-wgdc2-master-2         Running   master   regionOne   nova   38m
2024-05-18 07:26:49.858 | openshift-machine-api   ostest-wgdc2-worker-0-d5cdp   Running   worker   regionOne   nova   31m
2024-05-18 07:26:49.868 | openshift-machine-api   ostest-wgdc2-worker-0-jcxml   Running   worker   regionOne   nova   31m
2024-05-18 07:26:49.873 | openshift-machine-api   ostest-wgdc2-worker-0-t29fz   Running   worker   regionOne   nova   31m 

Logs from the machine-controller show the below error:

2024-05-18T06:59:11.159013162Z I0518 06:59:11.158938       1 controller.go:156] ostest-wgdc2-infra-0-ssx8j: reconciling Machine
2024-05-18T06:59:11.159589148Z I0518 06:59:11.159529       1 recorder.go:104] events "msg"="Reconciled machine ostest-wgdc2-worker-0-jcxml" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ostest-wgdc2-worker-0-jcxml","uid":"245bac8e-c110-4bef-ac11-3d3751a93353","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"18617"} "reason"="Reconciled" "type"="Normal"
2024-05-18T06:59:12.749966746Z I0518 06:59:12.749845       1 controller.go:349] ostest-wgdc2-infra-0-ssx8j: reconciling machine triggers idempotent create
2024-05-18T07:00:00.487702632Z E0518 07:00:00.486365       1 leaderelection.go:332] error retrieving resource lock openshift-machine-api/cluster-api-provider-openstack-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-api-provider-openstack-leader": http2: client connection lost
2024-05-18T07:00:00.487702632Z W0518 07:00:00.486497       1 controller.go:351] ostest-wgdc2-infra-0-ssx8j: failed to create machine: error creating bootstrap for ostest-wgdc2-infra-0-ssx8j: Get "https://172.30.0.1:443/api/v1/namespaces/openshift-machine-api/secrets/worker-user-data": http2: client connection lost
2024-05-18T07:00:00.487702632Z I0518 07:00:00.486534       1 controller.go:391] Actuator returned invalid configuration error: error creating bootstrap for ostest-wgdc2-infra-0-ssx8j: Get "https://172.30.0.1:443/api/v1/namespaces/openshift-machine-api/secrets/worker-user-data": http2: client connection lost
2024-05-18T07:00:00.487702632Z I0518 07:00:00.486548       1 controller.go:404] ostest-wgdc2-infra-0-ssx8j: going into phase "Failed"   

The openstack VM is not even created:

2024-05-18 07:26:50.911 | +--------------------------------------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------+--------------------+--------+
2024-05-18 07:26:50.917 | | ID                                   | Name                        | Status | Networks                                                                                                            | Image              | Flavor |
2024-05-18 07:26:50.924 | +--------------------------------------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------+--------------------+--------+
2024-05-18 07:26:50.929 | | 3a1b9af6-d284-4da5-8ebe-434d3aa95131 | ostest-wgdc2-worker-0-jcxml | ACTIVE | StorageNFS=172.17.5.187; network-dualstack=192.168.192.185, fd2e:6f44:5dd8:c956:f816:3eff:fe3e:4e7c                 | ostest-wgdc2-rhcos | worker |
2024-05-18 07:26:50.935 | | 5c34b78a-d876-49fb-a307-874d3c197c44 | ostest-wgdc2-infra-0-tfkf5  | ACTIVE | network-dualstack=192.168.192.133, fd2e:6f44:5dd8:c956:f816:3eff:fee6:4410, fd2e:6f44:5dd8:c956:f816:3eff:fef2:930a | ostest-wgdc2-rhcos | master |
2024-05-18 07:26:50.941 | | d2025444-8e11-409d-8a87-3f1082814af1 | ostest-wgdc2-infra-0-4sqdh  | ACTIVE | network-dualstack=192.168.192.156, fd2e:6f44:5dd8:c956:f816:3eff:fe82:ae56, fd2e:6f44:5dd8:c956:f816:3eff:fe86:b6d1 | ostest-wgdc2-rhcos | master |
2024-05-18 07:26:50.947 | | dcbde9ac-da5a-44c8-b64f-049f10b6b50c | ostest-wgdc2-worker-0-t29fz | ACTIVE | StorageNFS=172.17.5.233; network-dualstack=192.168.192.13, fd2e:6f44:5dd8:c956:f816:3eff:fe94:a2d2                  | ostest-wgdc2-rhcos | worker |
2024-05-18 07:26:50.951 | | 8ad98adf-147c-4268-920f-9eb5c43ab611 | ostest-wgdc2-worker-0-d5cdp | ACTIVE | StorageNFS=172.17.5.217; network-dualstack=192.168.192.173, fd2e:6f44:5dd8:c956:f816:3eff:fe22:5cff                 | ostest-wgdc2-rhcos | worker |
2024-05-18 07:26:50.957 | | f01d6740-2954-485d-865f-402b88789354 | ostest-wgdc2-master-2       | ACTIVE | StorageNFS=172.17.5.177; network-dualstack=192.168.192.198, fd2e:6f44:5dd8:c956:f816:3eff:fe1f:3c64                 | ostest-wgdc2-rhcos | master |
2024-05-18 07:26:50.963 | | d215a70f-760d-41fb-8e30-9f3106dbaabe | ostest-wgdc2-master-1       | ACTIVE | StorageNFS=172.17.5.163; network-dualstack=192.168.192.152, fd2e:6f44:5dd8:c956:f816:3eff:fe4e:67b6                 | ostest-wgdc2-rhcos | master |
2024-05-18 07:26:50.968 | | 53fe495b-f617-412d-9608-47cd355bc2e5 | ostest-wgdc2-master-0       | ACTIVE | StorageNFS=172.17.5.170; network-dualstack=192.168.192.193, fd2e:6f44:5dd8:c956:f816:3eff:febd:a836                 | ostest-wgdc2-rhcos | master |
2024-05-18 07:26:50.975 | +--------------------------------------+-----------------------------+--------+---------------------------------------------------------------------------------------------------------------------+--------------------+--------+ 

Version-Release number of selected component (if applicable):

RHOS-17.1-RHEL-9-20240123.n.1
4.15.0-0.nightly-2024-05-16-091947

Additional info:

   Must-gather link provided on private comment.

This is a clone of issue OCPBUGS-33955. The following is the description of the original issue:

Description of problem:

In 4.16.0-0.nightly-2024-05-14-095225, the warning "logtostderr is removed in the k8s upstream and has no effect any more." is logged in the kube-rbac-proxy-main/kube-rbac-proxy-self/kube-rbac-proxy-thanos containers

$ oc -n openshift-monitoring logs -c kube-rbac-proxy-main openshift-state-metrics-7f78c76cc6-nfbl4
W0514 23:19:50.052015       1 deprecated.go:66] 
==== Removed Flag Warning ======================logtostderr is removed in the k8s upstream and has no effect any more.===============================================
...

$ oc -n openshift-monitoring logs -c kube-rbac-proxy-self openshift-state-metrics-7f78c76cc6-nfbl4
...
W0514 23:19:50.177692       1 deprecated.go:66] 
==== Removed Flag Warning ======================logtostderr is removed in the k8s upstream and has no effect any more.===============================================
...

$ oc -n openshift-monitoring get pod openshift-state-metrics-7f78c76cc6-nfbl4 -oyaml | grep logtostderr -C3
spec:
  containers:
  - args:
    - --logtostderr
    - --secure-listen-address=:8443
    - --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
    - --upstream=http://127.0.0.1:8081/
--
      name: kube-api-access-v9hzd
      readOnly: true
  - args:
    - --logtostderr
    - --secure-listen-address=:9443
    - --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
    - --upstream=http://127.0.0.1:8082/

$ oc -n openshift-monitoring logs -c kube-rbac-proxy-thanos prometheus-k8s-0
W0515 02:55:54.209496       1 deprecated.go:66] 
==== Removed Flag Warning ======================logtostderr is removed in the k8s upstream and has no effect any more.===============================================
...

$ oc -n openshift-monitoring get pod prometheus-k8s-0 -oyaml | grep logtostderr -C3
    - --config-file=/etc/kube-rbac-proxy/config.yaml
    - --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
    - --allow-paths=/metrics
    - --logtostderr=true
    - --tls-min-version=VersionTLS12
    env:
    - name: POD_IP
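
A quick way to list every container in the namespace that still passes the removed flag (a sketch using the same jq style as above):

$ oc -n openshift-monitoring get pods -o json | jq -r '.items[] | .metadata.name as $pod | .spec.containers[] | select(any(.args[]?; startswith("--logtostderr"))) | "\($pod)/\(.name)"'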

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-05-14-095225

How reproducible:

always

Steps to Reproduce:

1. see the description

Actual results:

logtostderr is removed in the k8s upstream and has no effect any more    

Expected results:

No such warning is logged.

Additional info:

    

Description of problem:

When OCB is enabled and a new MC is created, nodes are drained twice when the resulting osImage build is applied.

    

Version-Release number of selected component (if applicable):

4.16
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Enable OCB in the worker pool

oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: MachineOSConfig
metadata:
  name: worker
spec:
  machineConfigPool:
    name: worker
  buildInputs:
    imageBuilder:
      imageBuilderType: PodImageBuilder
    baseImagePullSecret:
      name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
    renderedImagePushSecret:
      name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
    renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
EOF



    2. Wait for the image to be built

    3. When the opt-in image has finished building and been applied, create a new MC

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-machine-config-1
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,dGVzdA==
        filesystem: root
        mode: 420
        path: /etc/test-file-1.test

    4. Wait for the image to be built
    

Actual results:

Once the image is built, it is applied to the worker nodes.

Looking at the drain operations, we can see that every worker node was drained twice instead of once:

oc -n openshift-machine-config-operator logs $(oc -n openshift-machine-config-operator get pods -l k8s-app=machine-config-controller -o jsonpath='{.items[0].metadata.name}') -c machine-config-controller | grep "initiating drain"
I0430 13:28:48.740300       1 drain_controller.go:182] node ip-10-0-70-208.us-east-2.compute.internal: initiating drain
I0430 13:30:08.330051       1 drain_controller.go:182] node ip-10-0-70-208.us-east-2.compute.internal: initiating drain
I0430 13:32:32.431789       1 drain_controller.go:182] node ip-10-0-69-154.us-east-2.compute.internal: initiating drain
I0430 13:33:50.643544       1 drain_controller.go:182] node ip-10-0-69-154.us-east-2.compute.internal: initiating drain
I0430 13:48:08.183488       1 drain_controller.go:182] node ip-10-0-70-208.us-east-2.compute.internal: initiating drain
I0430 13:49:01.379416       1 drain_controller.go:182] node ip-10-0-70-208.us-east-2.compute.internal: initiating drain
I0430 13:50:52.933337       1 drain_controller.go:182] node ip-10-0-69-154.us-east-2.compute.internal: initiating drain
I0430 13:52:12.191203       1 drain_controller.go:182] node ip-10-0-69-154.us-east-2.compute.internal: initiating drain
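
A quick way to count the drain operations per node from the same controller log (a sketch):

$ oc -n openshift-machine-config-operator logs $(oc -n openshift-machine-config-operator get pods -l k8s-app=machine-config-controller -o jsonpath='{.items[0].metadata.name}') -c machine-config-controller | grep "initiating drain" | awk '{print $6}' | sort | uniq -c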


    

Expected results:

Nodes should be drained only once when applying a new MC
    

Additional info:

    

Description of problem:

 The monitoring operator may be down or disabled, and the components it manages may be unavailable or degraded.
Upon a quick check I noticed an error:

oc get co -o json | jq -r '.items[].status | select (.conditions) '.conditions | jq -r '.[] | select( (.type == "Degraded") and (.status == "True") )'

    {
      "lastTransitionTime": "2023-12-19T10:25:24Z",
      "message": "syncing Thanos Querier trusted CA bundle ConfigMap failed: reconciling trusted CA bundle ConfigMap failed: updating ConfigMap object failed: Timeout: request did not complete within requested timeout - context deadline exceeded, syncing Thanos Querier trusted CA bundle ConfigMap failed: deleting old trusted CA bundle configmaps failed: error listing configmaps in namespace openshift-monitoring with label selector monitoring.openshift.io/name=alertmanager,monitoring.openshift.io/hash!=2ua4n9ob5qr8o: the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps), syncing Prometheus trusted CA bundle ConfigMap failed: deleting old trusted CA bundle configmaps failed: error listing configmaps in namespace openshift-monitoring with label selector monitoring.openshift.io/name=prometheus,monitoring.openshift.io/hash!=2ua4n9ob5qr8o: the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps)",
      "reason": "MultipleTasksFailed",
      "status": "True",
      "type": "Degraded"
    } 

i.e. updating ConfigMap object failed: Timeout: request did not complete within requested timeout - context deadline exceeded

I ran oc get co again and everything looked fine; it seems this timeout condition could be handled better (e.g. by retrying) to avoid alerting SRE.
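
A quicker way to watch just the monitoring operator's Degraded condition while reproducing (a sketch using the same jq approach as above):

$ oc get co monitoring -o json | jq '.status.conditions[] | select(.type == "Degraded")'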

Actual results:

operator degraded

Expected results:

operator retries operation

Description of problem:
Given this nmstate inside the agent-config

        - name: bond0.10
          type: vlan
          state: up
          vlan:
            base-iface: bond0
            id: 10
          ipv4:
            address:
              - ip: 10.10.10.116
                prefix-length: 24
            dhcp: false
            enabled: true
          ipv6:
            enabled: true
            autoconf: true
            dhcp: true
            auto-dns: false
            auto-gateway: true
            auto-routes: true

The installation fails due to the assisted-service validation

    "message": "No connectivity to the majority of hosts in the cluster"

It seems to miss the L2 connectivity for the IPv6 part (??)
Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

When using targetCatalog, the mirror fails with the following error:
error: error rebuilding catalog images from file-based catalogs: error copying image docker://registry.redhat.io/abc/redhat-operator-index:v4.13 to docker://localhost:5000/abc/redhat-operator-index:v4.13: initializing source docker://registry.redhat.io/abc/redhat-operator-index:v4.13: (Mirrors also failed: [localhost:5000/abc/redhat-operator-index:v4.13: pinging container registry localhost:5000: Get "https://localhost:5000/v2/": http: server gave HTTP response to HTTPS client]): registry.redhat.io/abc/redhat-operator-index:v4.13: reading manifest v4.13 in registry.redhat.io/abc/redhat-operator-index: unauthorized: access to the requested resource is not authorized 

 

 

Version-Release number of selected component (if applicable):

oc-mirror 4.16

How reproducible:

always

Steps to Reproduce:

1) Use the following ImageSetConfiguration to do a mirror-to-mirror workflow with v1:
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /tmp/case60597
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.13
    targetCatalog: abc/redhat-operator-index
    packages:
    - name: servicemeshoperator  
`oc-mirror --config config.yaml docker://localhost:5000 --dest-use-http`

 

Actual results: 

1) mirror failed with error:
info: Mirroring completed in 420ms (0B/s)
error: error rebuilding catalog images from file-based catalogs: error copying image docker://registry.redhat.io/abc/redhat-operator-index:v4.13 to docker://localhost:5000/abc/redhat-operator-index:v4.13: initializing source docker://registry.redhat.io/abc/redhat-operator-index:v4.13: (Mirrors also failed: [localhost:5000/abc/redhat-operator-index:v4.13: pinging container registry localhost:5000: Get "https://localhost:5000/v2/": http: server gave HTTP response to HTTPS client]): registry.redhat.io/abc/redhat-operator-index:v4.13: reading manifest v4.13 in registry.redhat.io/abc/redhat-operator-index: unauthorized: access to the requested resource is not authorized

Expected results:

1) no error.

Additional information:

Compared with oc-mirror 4.15.9, this issue cannot be reproduced there.

Please review the following PR: https://github.com/openshift/images/pull/159

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The pipeline operator has been removed from OperatorHub, so CI has been failing since then; see
https://search.ci.openshift.org/?search=Entire+pipeline+flow+from+Builder+page+%22before+all%22+hook+for+%22Background+Steps%22&maxAge=336h&context=1&type=junit&name=pull-ci-openshift-console-master-e2e-gcp-console&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

This is a clone of issue OCPBUGS-34243. The following is the description of the original issue:

Description of problem:

AWS CAPI installs, particularly when running under heavy load in CI, can sometimes fail with:

    level=info msg=Creating private Hosted Zone
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed provisioning resources after infrastructure ready: failed to create private hosted zone: error creating private hosted zone: HostedZoneAlreadyExists: A hosted zone has already been created with the specified caller reference.
level=error msg=	status code: 409, request id: f173760d-ab43-41b8-a8a0-568cf387bf5e

example job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/openshift-installer-8448-ci-4.16-e2e-aws-ovn/1793287246942572544/artifacts/e2e-aws-ovn-8/ipi-install-install/build-log.txt
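
One way to check whether a private hosted zone already exists for the cluster domain before retrying (a sketch; the domain value is a placeholder):

$ aws route53 list-hosted-zones-by-name --dns-name <cluster-name>.<base-domain> --max-items 1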

Version-Release number of selected component (if applicable):

    

How reproducible:

not reproducible - needs to be discovered in ci

Steps to Reproduce:

    1. 
    2.
    3.
    

Actual results:

    install fails due to existing hosted zone

Expected results:

    HostedZoneAlreadyExists error should not cause install to fail

Additional info:

    

 

This is a clone of issue OCPBUGS-37753. The following is the description of the original issue:

discoverOpenIDURLs and checkOIDCPasswordGrantFlow fail if endpoints are private to the data plane.

This enabled the oauth server traffic to flow through the data plane so that private endpoints (e.g. LDAP) can be reached: https://issues.redhat.com/browse/HOSTEDCP-421

This enabled a fallback to the management cluster network, so for public endpoints (e.g. GitHub) we are not blocked on having a data plane: https://issues.redhat.com/browse/OCPBUGS-8073

This issue is to enable the CPO OIDC checks to flow through the data plane and fall back to the management side, satisfying both cases above.

This would cover https://issues.redhat.com/browse/RFE-5638

Description of problem:

When upgrading clusters to any 4.13 version (from either 4.12 or 4.13), clusters with Hybrid Networking enabled appear to have a few DNS pods (not all) falling into CrashLoopBackOff status. Notably, the pods fail their Readiness probes and, when deleted, come back and work without incident. Nearly all other aspects of the upgrade continue as expected.

Version-Release number of selected component (if applicable):

4.13.x

How reproducible:

Always for systems in use

Steps to Reproduce:

1. Configure initial cluster installation with OVNKubernetes and enable Hybrid Networking on 4.12 or 4.13, e.g. 4.13.13
2. Upgrade cluster to 4.13.z, e.g. 4.13.14

Actual results:

dns-default pods in CrashLoopBackOff status, failing Readiness probes

Expected results:

dns-default pods are rolled out without incident

Additional info:

Appears strongly related to OCPBUGS-13172. CU has kept an affected cluster with the DNS pod issue ongoing for additional investigating, if needed.
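
As noted above, deleting the affected pods clears the condition; a minimal sketch of that workaround (the pod name is a placeholder):

$ oc -n openshift-dns get pods | grep CrashLoopBackOff
$ oc -n openshift-dns delete pod <dns-default-pod-name>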

Description of problem:

During the control plane upgrade e2e test, it seems that the openshift apiserver becomes unavailable during the upgrade process. The test is run on an HA control plane, and this should not happen.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Often

Steps to Reproduce:

1. Create a hosted cluster with HA control plane and wait for it to become available
2. Upgrade the hosted cluster to a newer release
3. While upgrading, monitor whether the openshift apiserver is available by either querying APIService resources or resources served by the openshift apiserver.

Actual results:

The openshift apiserver is unavailable at some point during the upgrade

Expected results:

The openshift apiserver is available throughout the upgrade

Additional info:

 

 Description of problem:

Seen in 4.14 to 4.15 update CI:

: [bz-OLM] clusteroperator/operator-lifecycle-manager-packageserver should not change condition/Available expand_less
Run #0: Failed expand_less	1h34m55s
{  1 unexpected clusteroperator state transitions during e2e test run 

Nov 22 21:48:41.624 - 56ms  E clusteroperator/operator-lifecycle-manager-packageserver condition/Available reason/ClusterServiceVersionNotSucceeded status/False ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: APIServiceInstallFailed, message: APIService install failed: forbidden: User "system:anonymous" cannot get path "/apis/packages.operators.coreos.com/v1"}

While a brief auth failure isn't fantastic, an issue that only persists for 56ms is not long enough to warrant immediate admin intervention. Teaching the operator to stay Available=True for this kind of brief hiccup, while still going Available=False for issues where at least part of the component is non-functional and the condition requires immediate administrator intervention, would make it easier for admins and SREs operating clusters to identify when intervention is required. It's also possible that this is an incoming-RBAC vs. outgoing-RBAC race of some sort, and that shifting manifest filenames around could avoid the hiccup entirely.

Version-Release number of selected component (if applicable):

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=clusteroperator/operator-lifecycle-manager-packageserver+should+not+change+condition/Available' | grep '^periodic-.*4[.]15.*failures match' | sort
periodic-ci-openshift-cluster-etcd-operator-release-4.15-periodics-e2e-aws-etcd-recovery (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-upgrade-aws-ovn-arm64 (all) - 8 runs, 38% failed, 33% of failures match = 13% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-upgrade-azure-ovn-arm64 (all) - 5 runs, 20% failed, 400% of failures match = 80% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-e2e-upgrade-azure-ovn-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-ppc64le (all) - 6 runs, 67% failed, 75% of failures match = 50% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-s390x (all) - 6 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-sdn-arm64 (all) - 5 runs, 20% failed, 300% of failures match = 60% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-arm64 (all) - 5 runs, 40% failed, 100% of failures match = 40% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-upgrade (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-aws-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade (all) - 43 runs, 51% failed, 36% of failures match = 19% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-upgrade (all) - 5 runs, 20% failed, 300% of failures match = 60% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade (all) - 80 runs, 44% failed, 17% of failures match = 8% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 80 runs, 30% failed, 63% of failures match = 19% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-uwm (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 8 runs, 25% failed, 200% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 80 runs, 43% failed, 50% of failures match = 21% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade (all) - 50 runs, 16% failed, 50% of failures match = 8% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-vsphere-ovn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-from-stable-4.13-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-single-node-serial (all) - 5 runs, 100% failed, 80% of failures match = 80% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-upgrade-rollback-oldest-supported (all) - 4 runs, 25% failed, 100% of failures match = 25% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade (all) - 50 runs, 18% failed, 178% of failures match = 32% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-sdn-bm-upgrade (all) - 6 runs, 83% failed, 20% of failures match = 17% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 6 runs, 83% failed, 60% of failures match = 50% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade-paused (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 6 runs, 17% failed, 100% of failures match = 17% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-metal-ipi-sdn-bm-upgrade (all) - 5 runs, 100% failed, 40% of failures match = 40% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 6 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-okd-4.15-e2e-aws-ovn-upgrade (all) - 19 runs, 63% failed, 33% of failures match = 21% impact
periodic-ci-openshift-release-master-okd-scos-4.15-e2e-aws-ovn-upgrade (all) - 15 runs, 47% failed, 57% of failures match = 27% impact

I'm not sure if all of those are from this system:anonymous issue, or if some of them are other mechanisms. Ideally we fix all of the Available=False noise, while, again, still going Available=False when it is worth summoning an admin immediately. Checking for different reason and message strings in recent 4.15-touching update runs:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/operator-lifecycle-manager-packageserver.*condition/Available.*status/False' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's|.*clusteroperator/\([^ ]*\) condition/Available reason/\([^ ]*\) status/False.*message: \(.*\)|\1 \2 \3|' | sort | uniq -c | sort -n
      3 operator-lifecycle-manager-packageserver ClusterServiceVersionNotSucceeded APIService install failed: Unauthorized
      3 operator-lifecycle-manager-packageserver ClusterServiceVersionNotSucceeded install timeout
      4 operator-lifecycle-manager-packageserver ClusterServiceVersionNotSucceeded install strategy failed: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1.packages.operators.coreos.com": the object has been modified; please apply your changes to the latest version and try again
      9 operator-lifecycle-manager-packageserver ClusterServiceVersionNotSucceeded apiServices not installed
     23 operator-lifecycle-manager-packageserver ClusterServiceVersionNotSucceeded install strategy failed: could not create service packageserver-service: services "packageserver-service" already exists
     82 operator-lifecycle-manager-packageserver ClusterServiceVersionNotSucceeded APIService install failed: forbidden: User "system:anonymous" cannot get path "/apis/packages.operators.coreos.com/v1"

How reproducible:

Lots of hits in the above CI search. Running one of the 100% impact flavors has a good chance at reproducing.

Steps to Reproduce:

1. Install 4.14
2. Update to 4.15
3. Keep an eye on operator-lifecycle-manager-packageserver's ClusterOperator Available.

Actual results:

Available=False blips.

Expected results:

Available=True the whole time, or any Available=False looks like a serious issue where summoning an admin would have been appropriate.

Additional info

Causes also these testcases to fail (mentioning them here for Sippy to link here on relevant component readiness failures):

  • [sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]

Please review the following PR: https://github.com/openshift/csi-livenessprobe/pull/55

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/machine-api-provider-aws/pull/94

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The pod cannot become ready due to incompatible CNI versions; see:

 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_testpod1gjvg8_z1_1cdbb285-d4f4-4fbb-8e30-16933315aa65_0(8a7067a7914fbf21f0f083a97be5ac48aa562ecc21472780d4cc2af3e5b7784e): error adding pod z1_testpod1gjvg8 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"8a7067a7914fbf21f0f083a97be5ac48aa562ecc21472780d4cc2af3e5b7784e" Netns:"/var/run/netns/fd68f325-4141-49be-8840-a48e51c5b76d" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=z1;K8S_POD_NAME=testpod1gjvg8;K8S_POD_INFRA_CONTAINER_ID=8a7067a7914fbf21f0f083a97be5ac48aa562ecc21472780d4cc2af3e5b7784e;K8S_POD_UID=1cdbb285-d4f4-4fbb-8e30-16933315aa65" Path:"" ERRORED: error configuring pod [z1/testpod1gjvg8] networking: [z1/testpod1gjvg8/1cdbb285-d4f4-4fbb-8e30-16933315aa65:static-sriovnetwork]: error adding container to network "static-sriovnetwork": failed to set up IPAM plugin type "whereabouts" from the device "ens2f0": incompatible CNI versions; config is "1.0.0", plugin supports ["0.1.0" "0.2.0" "0.3.0" "0.3.1" "0.4.0"]    

Version-Release number of selected component (if applicable):

    sriov-network-operator.v4.16.0-202405110441

How reproducible:

    always

Steps to Reproduce:

    1. Create cluster with sriov operator
    2. Create VF by snnp
    3. Create NAD by sriovnetwork as below

# cat sriovnetwork-whereabouts
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: static-sriovnetwork
  namespace: openshift-sriov-network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "range":"10.31.0.0/30"
    }
  capabilities: |
    {
      "mac": true,
      "ips": true
    }
  spoofChk: "off"
  trust: "on"
  resourceName: e810
  networkNamespace: z1

4. The generated NAD's cniVersion is 1.0.0
# oc get network-attachment-definitions.k8s.cni.cncf.io -n z1 -o yaml
apiVersion: v1
items:
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: openshift.io/e810
    creationTimestamp: "2024-05-13T02:24:39Z"
    generation: 1
    name: static-sriovnetwork
    namespace: z1
    resourceVersion: "380833"
    uid: 62d05e42-24c0-4427-bcd7-77d02fce31fb
  spec:
    config: |-
      {
          "cniVersion": "1.0.0",
          "name": "static-sriovnetwork",
          "type": "sriov",
          "vlan": 0,
          "spoofchk": "off",

5. Create test pod with above NAD network     

Actual results:

      Warning  FailedCreatePodSandBox  25s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_testpod1gjvg8_z1_1cdbb285-d4f4-4fbb-8e30-16933315aa65_0(8a7067a7914fbf21f0f083a97be5ac48aa562ecc21472780d4cc2af3e5b7784e): error adding pod z1_testpod1gjvg8 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"8a7067a7914fbf21f0f083a97be5ac48aa562ecc21472780d4cc2af3e5b7784e" Netns:"/var/run/netns/fd68f325-4141-49be-8840-a48e51c5b76d" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=z1;K8S_POD_NAME=testpod1gjvg8;K8S_POD_INFRA_CONTAINER_ID=8a7067a7914fbf21f0f083a97be5ac48aa562ecc21472780d4cc2af3e5b7784e;K8S_POD_UID=1cdbb285-d4f4-4fbb-8e30-16933315aa65" Path:"" ERRORED: error configuring pod [z1/testpod1gjvg8] networking: [z1/testpod1gjvg8/1cdbb285-d4f4-4fbb-8e30-16933315aa65:static-sriovnetwork]: error adding container to network "static-sriovnetwork": failed to set up IPAM plugin type "whereabouts" from the device "ens2f0": incompatible CNI versions; config is "1.0.0", plugin supports ["0.1.0" "0.2.0" "0.3.0" "0.3.1" "0.4.0"]

Expected results:

    

Additional info:

    This is likely caused by this change: https://github.com/openshift/sriov-network-operator/commit/ace40a0f6b8d32c34a05fc680130c1a358d90fbd
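
A quick way to confirm which cniVersion the generated NAD carries (a sketch reusing the resources from the reproduction steps above):

$ oc -n z1 get network-attachment-definitions.k8s.cni.cncf.io static-sriovnetwork -o jsonpath='{.spec.config}' | jq -r .cniVersion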

Please review the following PR: https://github.com/openshift/oauth-apiserver/pull/101

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The OpenShift Assisted Installer reports the Dell PowerEdge C6615 node's four 960GB SATA solid state disks as removable and subsequently refuses to continue the installation of OpenShift onto at least one of those disks.
This issue is one where the OpenShift agent installer reports the installed SATA SSDs as removable and refuses to use any of them as installation targets.

Linux Kernel reports:
sd 4:0:0:0 [sdb] Attached SCSI removable disk
sd 5:0:0:0 [sdc] Attached SCSI removable disk
sd 6:0:0:0 [sdd] Attached SCSI removable disk
sd 3:0:0:0 [sda] Attached SCSI removable disk
Each removable disk is clean: 894.3GiB free space, no partitions, etc.
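
One way to confirm how the kernel flags these disks on the affected node (a sketch run from the host console; device names taken from the messages above):

$ for d in sda sdb sdc sdd; do echo -n "$d removable="; cat /sys/block/$d/removable; done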

However - Insufficient
This host does not meet the minimum hardware or networking requirements and will not be included in the cluster.
Hardware: 
Failed     
  Warning alert:    
    Insufficient        
Minimum disks of required size: No eligible disks were found, please check specific disks to see why they are not eligible.

    

Version-Release number of selected component (if applicable):

   4.15.z 

How reproducible:

    100 %

Steps to Reproduce:

    1. Install with assisted Installer
    2. Generate ISO using option over console.
    3. Boot the ISO on dell HW mentioned in description
    4. Observe journal logs for disk validations
    

Actual results:

    Installation fails at disk validation

Expected results:

    Installation should complete

Additional info:

    

Description of problem:

When bootstrap logs are collected (e.g. as part of a CI run when bootstrapping fails), they no longer contain logs for most of the Ironic services. These used to run in standalone pods, but after a recent refactoring they are systemd services.
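
Until the collection scripts catch up, a possible way to pull those logs manually is to read the journal on the bootstrap host (a sketch; the exact systemd unit names are an assumption):

$ ssh core@<bootstrap-ip> 'sudo journalctl -u "ironic*" --no-pager'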

Description of problem:

When using oc-mirror with the v2 format, it saves the .oc-mirror dir to the default user directory, and the data is very large. Currently there is no flag to specify the path for these logs; they should be saved to the working directory instead.

 

Version-Release number of selected component (if applicable):

oc-mirror version 
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.15.0-202312011230.p0.ge4022d0.assembly.stream-e4022d0", GitCommit:"e4022d08586406f3a0f92bab1d3ea6cb8856b4fa", GitTreeState:"clean", BuildDate:"2023-12-01T12:48:12Z", GoVersion:"go1.20.10 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

run command : oc-mirror --from file://out docker://localhost:5000/ocptest --v2 --config config.yaml    

Actual results: 

The logs are saved to the user directory.

Expected results:

It would be better to have a flag to specify where to save the logs, or to use the working directory.

 

Description of problem:

In LGW (local gateway) mode, when a pod is selected by an EgressIP that is hosted on an interface other than the default interface, the connection to the node IP fails.

Version-Release number of selected component (if applicable):

    4.14

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-33745. The following is the description of the original issue:

Description of problem:

Following the steps described in https://github.com/openshift/installer/pull/8350 to destroy the bootstrap server manually fails with the error `FATAL error destroying bootstrap resources failed to delete bootstrap machine: machines.cluster.x-k8s.io "jimatest-5sjqx-bootstrap" not found`


# ./openshift-install version
./openshift-install 4.16.0-0.nightly-2024-05-15-001800
built from commit 494b79cf906dc192b8d1a6d98e56ce1036ea932f
release image registry.ci.openshift.org/ocp/release@sha256:d055d117027aa9afff8af91da4a265b7c595dc3ded73a2bca71c3161b28d9d5d
release architecture amd64

On AWS:
# ./openshift-install create cluster --dir ipi-aws
INFO Credentials loaded from the "default" profile in file "/root/.aws/credentials" 
WARNING failed to find default instance type: no instance type found for the zone constraint 
WARNING failed to find default instance type for worker pool: no instance type found for the zone constraint 
INFO Consuming Install Config from target directory 
WARNING failed to find default instance type: no instance type found for the zone constraint 
WARNING FeatureSet "TechPreviewNoUpgrade" is enabled. This FeatureSet does not allow upgrades and may affect the supportability of the cluster. 
INFO Creating infrastructure resources...         
INFO Creating IAM roles for control-plane and compute nodes 
INFO Started local control plane with envtest     
INFO Stored kubeconfig for envtest in: /tmp/jima/ipi-aws/auth/envtest.kubeconfig 
INFO Running process: Cluster API with args [-v=2 --diagnostics-address=0 --health-addr=127.0.0.1:44379 --webhook-port=44331 --webhook-cert-dir=/tmp/envtest-serving-certs-1391600832] 
INFO Running process: aws infrastructure provider with args [-v=4 --diagnostics-address=0 --health-addr=127.0.0.1:42725 --webhook-port=45711 --webhook-cert-dir=/tmp/envtest-serving-certs-1758849099 --feature-gates=BootstrapFormatIgnition=true,ExternalResourceGC=true] 
INFO Created manifest *v1.Namespace, namespace= name=openshift-cluster-api-guests 
INFO Created manifest *v1beta2.AWSClusterControllerIdentity, namespace= name=default 
INFO Created manifest *v1beta1.Cluster, namespace=openshift-cluster-api-guests name=jima16a-2xszh 
INFO Created manifest *v1beta2.AWSCluster, namespace=openshift-cluster-api-guests name=jima16a-2xszh 
INFO Waiting up to 15m0s (until 11:01PM EDT) for network infrastructure to become ready... 
INFO Network infrastructure is ready              
INFO Creating private Hosted Zone                 
INFO Creating Route53 records for control plane load balancer 
INFO Created manifest *v1beta2.AWSMachine, namespace=openshift-cluster-api-guests name=jima16a-2xszh-bootstrap 
INFO Created manifest *v1beta2.AWSMachine, namespace=openshift-cluster-api-guests name=jima16a-2xszh-master-0 
INFO Created manifest *v1beta2.AWSMachine, namespace=openshift-cluster-api-guests name=jima16a-2xszh-master-1 
INFO Created manifest *v1beta2.AWSMachine, namespace=openshift-cluster-api-guests name=jima16a-2xszh-master-2 
INFO Created manifest *v1beta1.Machine, namespace=openshift-cluster-api-guests name=jima16a-2xszh-bootstrap 
INFO Created manifest *v1beta1.Machine, namespace=openshift-cluster-api-guests name=jima16a-2xszh-master-0 
INFO Created manifest *v1beta1.Machine, namespace=openshift-cluster-api-guests name=jima16a-2xszh-master-1 
INFO Created manifest *v1beta1.Machine, namespace=openshift-cluster-api-guests name=jima16a-2xszh-master-2 
INFO Created manifest *v1.Secret, namespace=openshift-cluster-api-guests name=jima16a-2xszh-bootstrap 
INFO Created manifest *v1.Secret, namespace=openshift-cluster-api-guests name=jima16a-2xszh-master 
INFO Waiting up to 15m0s (until 11:07PM EDT) for machines to provision... 
INFO Control-plane machines are ready             
INFO Cluster API resources have been created. Waiting for cluster to become ready... 
INFO Waiting up to 20m0s (until 11:12PM EDT) for the Kubernetes API at https://api.jima16a.qe.devcluster.openshift.com:6443... 
INFO API v1.29.4+4a87b53 up                       
INFO Waiting up to 30m0s (until 11:25PM EDT) for bootstrapping to complete... 
^CWARNING Received interrupt signal                    
INFO Shutting down local Cluster API control plane... 
INFO Stopped controller: Cluster API              
INFO Stopped controller: aws infrastructure provider 
INFO Local Cluster API system has completed operations 

# ./openshift-install destroy bootstrap --dir ipi-aws
INFO Started local control plane with envtest     
INFO Stored kubeconfig for envtest in: /tmp/jima/ipi-aws/auth/envtest.kubeconfig 
INFO Running process: Cluster API with args [-v=2 --diagnostics-address=0 --health-addr=127.0.0.1:45869 --webhook-port=43141 --webhook-cert-dir=/tmp/envtest-serving-certs-3670728979] 
INFO Running process: aws infrastructure provider with args [-v=4 --diagnostics-address=0 --health-addr=127.0.0.1:46111 --webhook-port=35061 --webhook-cert-dir=/tmp/envtest-serving-certs-3674093147 --feature-gates=BootstrapFormatIgnition=true,ExternalResourceGC=true] 
FATAL error destroying bootstrap resources failed to delete bootstrap machine: machines.cluster.x-k8s.io "jima16a-2xszh-bootstrap" not found 
INFO Shutting down local Cluster API control plane... 
INFO Stopped controller: Cluster API              
INFO Stopped controller: aws infrastructure provider 
INFO Local Cluster API system has completed operations 

Same issue on vSphere:
# ./openshift-install create cluster --dir ipi-vsphere/
INFO Consuming Install Config from target directory 
WARNING FeatureSet "CustomNoUpgrade" is enabled. This FeatureSet does not allow upgrades and may affect the supportability of the cluster. 
INFO Creating infrastructure resources...         
INFO Started local control plane with envtest     
INFO Stored kubeconfig for envtest in: /tmp/jima/ipi-vsphere/auth/envtest.kubeconfig 
INFO Running process: Cluster API with args [-v=2 --diagnostics-address=0 --health-addr=127.0.0.1:39945 --webhook-port=36529 --webhook-cert-dir=/tmp/envtest-serving-certs-3244100953] 
INFO Running process: vsphere infrastructure provider with args [-v=2 --diagnostics-address=0 --health-addr=127.0.0.1:45417 --webhook-port=37503 --webhook-cert-dir=/tmp/envtest-serving-certs-3224060135 --leader-elect=false] 
INFO Created manifest *v1.Namespace, namespace= name=openshift-cluster-api-guests 
INFO Created manifest *v1beta1.Cluster, namespace=openshift-cluster-api-guests name=jimatest-5sjqx 
INFO Created manifest *v1beta1.VSphereCluster, namespace=openshift-cluster-api-guests name=jimatest-5sjqx 
INFO Created manifest *v1.Secret, namespace=openshift-cluster-api-guests name=vsphere-creds 
INFO Waiting up to 15m0s (until 10:47PM EDT) for network infrastructure to become ready... 
INFO Network infrastructure is ready              
INFO Created manifest *v1beta1.VSphereMachine, namespace=openshift-cluster-api-guests name=jimatest-5sjqx-bootstrap 
INFO Created manifest *v1beta1.VSphereMachine, namespace=openshift-cluster-api-guests name=jimatest-5sjqx-master-0 
INFO Created manifest *v1beta1.VSphereMachine, namespace=openshift-cluster-api-guests name=jimatest-5sjqx-master-1 
INFO Created manifest *v1beta1.VSphereMachine, namespace=openshift-cluster-api-guests name=jimatest-5sjqx-master-2 
INFO Created manifest *v1beta1.Machine, namespace=openshift-cluster-api-guests name=jimatest-5sjqx-bootstrap 
INFO Created manifest *v1beta1.Machine, namespace=openshift-cluster-api-guests name=jimatest-5sjqx-master-0 
INFO Created manifest *v1beta1.Machine, namespace=openshift-cluster-api-guests name=jimatest-5sjqx-master-1 
INFO Created manifest *v1beta1.Machine, namespace=openshift-cluster-api-guests name=jimatest-5sjqx-master-2 
INFO Created manifest *v1.Secret, namespace=openshift-cluster-api-guests name=jimatest-5sjqx-bootstrap 
INFO Created manifest *v1.Secret, namespace=openshift-cluster-api-guests name=jimatest-5sjqx-master 
INFO Waiting up to 15m0s (until 10:47PM EDT) for machines to provision... 
INFO Control-plane machines are ready             
INFO Cluster API resources have been created. Waiting for cluster to become ready... 
INFO Waiting up to 20m0s (until 10:57PM EDT) for the Kubernetes API at https://api.jimatest.qe.devcluster.openshift.com:6443... 
INFO API v1.29.4+4a87b53 up                       
INFO Waiting up to 1h0m0s (until 11:37PM EDT) for bootstrapping to complete... 
^CWARNING Received interrupt signal                    
INFO Shutting down local Cluster API control plane... 
INFO Stopped controller: Cluster API              
INFO Stopped controller: vsphere infrastructure provider 
INFO Local Cluster API system has completed operations 
 
# ./openshift-install destroy bootstrap --dir ipi-vsphere/
INFO Started local control plane with envtest     
INFO Stored kubeconfig for envtest in: /tmp/jima/ipi-vsphere/auth/envtest.kubeconfig 
INFO Running process: Cluster API with args [-v=2 --diagnostics-address=0 --health-addr=127.0.0.1:34957 --webhook-port=34511 --webhook-cert-dir=/tmp/envtest-serving-certs-94748118] 
INFO Running process: vsphere infrastructure provider with args [-v=2 --diagnostics-address=0 --health-addr=127.0.0.1:42073 --webhook-port=46721 --webhook-cert-dir=/tmp/envtest-serving-certs-4091171333 --leader-elect=false] 
FATAL error destroying bootstrap resources failed to delete bootstrap machine: machines.cluster.x-k8s.io "jimatest-5sjqx-bootstrap" not found 
INFO Shutting down local Cluster API control plane... 
INFO Stopped controller: Cluster API              
INFO Stopped controller: vsphere infrastructure provider 
INFO Local Cluster API system has completed operations 


Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-05-15-001800

How reproducible:

    Always

Steps to Reproduce:

    1. Create cluster
    2. Interrupt the installation when waiting for bootstrap completed
    3. Run command "openshift-install destroy bootstrap --dir <dir>" to destroy bootstrap manually
    

Actual results:

    Failed to destroy bootstrap through command 'openshift-install destroy bootstrap --dir <dir>'

Expected results:

    Bootstrap host is destroyed successfully

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-api-operator/pull/32

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cloud-provider-powervs/pull/62

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-ingress-operator/pull/1006

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-autoscaler-operator/pull/303

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem

Seen in a 4.17 nightly-to-nightly CI update:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade/1809154554084724736/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-machine-config-operator") | .reason' | sort | uniq -c | sort -n | tail -n3
     82 Pulled
     82 Started
   2116 ValidatingAdmissionPolicyUpdated
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade/1809154554084724736/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-machine-config-operator" and .reason == "ValidatingAdmissionPolicyUpdated").message' | sort | uniq -c
    705 Updated ValidatingAdmissionPolicy.admissionregistration.k8s.io/machine-configuration-guards because it changed
    705 Updated ValidatingAdmissionPolicy.admissionregistration.k8s.io/managed-bootimages-platform-check because it changed
    706 Updated ValidatingAdmissionPolicy.admissionregistration.k8s.io/mcn-guards because it changed

I'm not sure what those are about (which may be a bug on its own? It would be nice to know what changed), but it smells like a hot loop to me.

Version-Release number of selected component

Seen in 4.17. It's not clear yet how to audit for exposure frequency or versions, short of teaching the origin test suite to fail if it sees too many of these kinds of events. Maybe an openshift-... namespaces version of the current "events should not repeat pathologically in e2e namespaces" test-case? Which we may have, but it's not tripping?

How reproducible

Besides the initial update, also seen in this 4.17.0-0.nightly-2024-07-05-091056 serial run:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-serial/1809154615350923264/artifacts/e2e-aws-ovn-serial/gather-extra/artifacts/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-machine-config-operator" and .reason == "ValidatingAdmissionPolicyUpdated").message' | sort | uniq -c
   1006 Updated ValidatingAdmissionPolicy.admissionregistration.k8s.io/machine-configuration-guards because it changed
   1006 Updated ValidatingAdmissionPolicy.admissionregistration.k8s.io/managed-bootimages-platform-check because it changed
   1007 Updated ValidatingAdmissionPolicy.admissionregistration.k8s.io/mcn-guards because it changed

So possibly every time, in all 4.17 clusters?

Steps to Reproduce

1. Unclear. Possibly just install 4.17.
2. Run oc -n openshift-machine-config-operator get -o json events | jq -r '.items[] | select(.reason == "ValidatingAdmissionPolicyUpdated")'.

Actual results

Thousands of hits.

Expected results

Zero to few hits.

This is a clone of issue OCPBUGS-39458. The following is the description of the original issue:

This is a clone of issue OCPBUGS-37819. The following is the description of the original issue:

Description of problem:

    When we added the new bundle metadata encoding `olm.csv.metadata` in https://github.com/operator-framework/operator-registry/pull/1094 (downstreamed for 4.15+), we created situations where
- Konflux-onboarded operators, encouraged to use upstream:latest to generate FBC from templates, and
- IIB-generated catalog images, which used earlier opm versions to serve content,

could generate the new format but not be able to serve it.

One only has to `opm render` an SQLite catalog image, or expand a catalog template, to produce the new format.
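
For example, to see whether a given opm emits the new property when rendering an SQLite-based index (a sketch; the image reference is a placeholder):

$ opm render <sqlite-based-index-image> | grep -c '"olm.csv.metadata"'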

 

 

Version-Release number of selected component (if applicable):

    

How reproducible:

every time    

Steps to Reproduce:

    1. opm render an SQLite catalog image
    2.
    3.
    

Actual results:

    uses `olm.csv.metadata` in the output

Expected results:

    only using `olm.bundle.object` in the output

Additional info:

    

Please review the following PR: https://github.com/openshift/router/pull/546

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-34791. The following is the description of the original issue:

add-flow-ci.feature test is flaking sporadically for both console and console-operator repositories.

  Running:  add-flow-ci.feature                                                             (1 of 1)
[23798:0602/212526.775826:ERROR:zygote_host_impl_linux.cc(273)] Failed to adjust OOM score of renderer with pid 24169: Permission denied (13)
Couldn't determine Mocha version


  Logging in as test
  Create the different workloads from Add page
      redirect to home
      ensure perspective switcher is set to Developer
    ✓ Getting started resources on Developer perspective (16906ms)
      redirect to home
      ensure perspective switcher is set to Developer
      Select Template category CI/CD
      You are on Topology page - Graph view
    ✓ Deploy Application using Catalog Template "CI/CD": A-01-TC02 (example #1) (27858ms)
      redirect to home
      ensure perspective switcher is set to Developer
      Select Template category Databases
      You are on Topology page - Graph view
    ✓ Deploy Application using Catalog Template "Databases": A-01-TC02 (example #2) (29800ms)
      redirect to home
      ensure perspective switcher is set to Developer
      Select Template category Languages
      You are on Topology page - Graph view
    ✓ Deploy Application using Catalog Template "Languages": A-01-TC02 (example #3) (38286ms)
      redirect to home
      ensure perspective switcher is set to Developer
      Select Template category Middleware
      You are on Topology page - Graph view
    ✓ Deploy Application using Catalog Template "Middleware": A-01-TC02 (example #4) (30501ms)
      redirect to home
      ensure perspective switcher is set to Developer
      Select Template category Other
      You are on Topology page - Graph view
    ✓ Deploy Application using Catalog Template "Other": A-01-TC02 (example #5) (35567ms)
      redirect to home
      ensure perspective switcher is set to Developer
      Application Name "sample-app" is created
      Resource type "deployment" is selected
      You are on Topology page - Graph view
    ✓ Deploy secure image with Runtime icon from external registry: A-02-TC02 (example #1) (28896ms)
      redirect to home
      ensure perspective switcher is set to Developer
      Application Name "sample-app" is selected
      Resource type "deployment" is selected
      You are on Topology page - Graph view
    ✓ Deploy image with Runtime icon from internal registry: A-02-TC03 (example #1) (23555ms)
      redirect to home
      ensure perspective switcher is set to Developer
      Resource type "deployment" is selected
      You are on Topology page - Graph view
      You are on Topology page - Graph view
      You are on Topology page - Graph view
    ✓ Edit Runtime Icon while Editing Image: A-02-TC05 (47438ms)
      redirect to home
      ensure perspective switcher is set to Developer
      You are on Topology page - Graph view
    ✓ Create the Database from Add page: A-03-TC01 (19645ms)
      redirect to home
      ensure perspective switcher is set to Developer
      redirect to home
      ensure perspective switcher is set to Developer
    1) Deploy git workload with devfile from topology page: A-04-TC01
      redirect to home
      ensure perspective switcher is set to Developer
      Resource type "Deployment" is selected
      You are on Topology page - Graph view
    ✓ Create a workload from Docker file with "Deployment" as resource type: A-05-TC02 (example #1) (43434ms)
      redirect to home
      ensure perspective switcher is set to Developer
      You are on Topology page - Graph view
    ✓ Create a workload from YAML file: A-07-TC01 (31905ms)
      redirect to home
      ensure perspective switcher is set to Developer
    ✓ Upload Jar file page details: A-10-TC01 (24692ms)
      redirect to home
      ensure perspective switcher is set to Developer
      You are on Topology page - Graph view
    ✓ Create Sample Application from Add page: GS-03-TC05 (example #1) (40882ms)
      redirect to home
      ensure perspective switcher is set to Developer
      You are on Topology page - Graph view
    ✓ Create Sample Application from Add page: GS-03-TC05 (example #2) (52287ms)
      redirect to home
      ensure perspective switcher is set to Developer
    ✓ Quick Starts page when no Quick Start has started: QS-03-TC02 (23439ms)
      redirect to home
      ensure perspective switcher is set to Developer
      quick start is complete
    ✓ Quick Starts page when Quick Start has completed: QS-03-TC03 (28139ms)


  17 passing (10m)
  1 failing

  1) Create the different workloads from Add page
       Deploy git workload with devfile from topology page: A-04-TC01:
     CypressError: `cy.focus()` can only be called on a single element. Your subject contained 14 elements.

https://on.cypress.io/focus
      at Context.focus (https://console-openshift-console.apps.ci-op-lm9pvf4l-be832.origin-ci-int-aws.dev.rhcloud.com/__cypress/runner/cypress_runner.js:112944:70)
      at wrapped (https://console-openshift-console.apps.ci-op-lm9pvf4l-be832.origin-ci-int-aws.dev.rhcloud.com/__cypress/runner/cypress_runner.js:138021:19)
  From Your Spec Code:
      at Context.eval (webpack:///./support/step-definitions/addFlow/create-from-devfile.ts:10:59)
      at Context.resolveAndRunStepDefinition (webpack:////go/src/github.com/openshift/console/frontend/node_modules/cypress-cucumber-preprocessor/lib/resolveStepDefinition.js:217:0)
      at Context.eval (webpack:////go/src/github.com/openshift/console/frontend/node_modules/cypress-cucumber-preprocessor/lib/createTestFromScenario.js:26:0)



[mochawesome] Report JSON saved to /go/src/github.com/openshift/console/frontend/gui_test_screenshots/cypress_report_devconsole.json


  (Results)

  ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
  │ Tests:        18                                                                               │
  │ Passing:      17                                                                               │
  │ Failing:      1                                                                                │
  │ Pending:      0                                                                                │
  │ Skipped:      0                                                                                │
  │ Screenshots:  2                                                                                │
  │ Video:        false                                                                            │
  │ Duration:     10 minutes, 0 seconds                                                            │
  │ Spec Ran:     add-flow-ci.feature                                                              │
  └────────────────────────────────────────────────────────────────────────────────────────────────┘


  (Screenshots)

  -  /go/src/github.com/openshift/console/frontend/gui_test_screenshots/cypress/screenshots/add-flow-ci.feature/Create the different workloads from Add page -- Deploy git workload with devfile from topology page A-04-TC01 (failed).png     (1280x720)
  -  /go/src/github.com/openshift/console/frontend/gui_test_screenshots/cypress/screenshots/add-flow-ci.feature/Create the different workloads from Add page -- Deploy git workload with devfile from topology page A-04-TC01 (failed) (attempt 2).png     (1280x720)


====================================================================================================

  (Run Finished)


       Spec                                              Tests  Passing  Failing  Pending  Skipped  
  ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
  │ ✖  add-flow-ci.feature                      10:00       18       17        1        -        - │
  └────────────────────────────────────────────────────────────────────────────────────────────────┘
    ✖  1 of 1 failed (100%)                     10:00       18       17        1        -        -  

 

Affected repositories: console, console-operator

Please review the following PR: https://github.com/openshift/apiserver-network-proxy/pull/46

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    The code requires the `s3:HeadBucket` permission (https://github.com/openshift/cloud-credential-operator/blob/master/pkg/aws/utils.go#L57), but no such IAM action exists. Per the AWS docs, the permission needed for the HeadBucket API is `s3:ListBucket`: https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadBucket.html

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    always

Steps to Reproduce:

    1. Try to install cluster with minimal permissions without s3:HeadBucket
    2.
    3.
    

Actual results:

level=warning msg=Action not allowed with tested creds action=iam:DeleteUserPolicy
level=warning msg=Tested creds not able to perform all requested actions
level=warning msg=Action not allowed with tested creds action=s3:HeadBucket
level=warning msg=Tested creds not able to perform all requested actions
level=fatal msg=failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Permissions Check": validate AWS credentials: AWS credentials cannot be used to either create new creds or use as-is
Installer exit with code 1

Expected results:

    Only `s3:ListBucket` should be checked.

Additional info:
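One way to confirm which S3 actions a given credential actually allows is the IAM policy simulator (a sketch; the principal ARN is a placeholder):

$ aws iam simulate-principal-policy \
    --policy-source-arn arn:aws:iam::123456789012:user/openshift-installer \
    --action-names s3:ListBucket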

    

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-api-provider-libvirt/pull/273

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem

The egress-router implementations under https://github.com/openshift/images/tree/master/egress have unit tests alongside the implementations, within the same repository, but the repository does not have a CI job to run those unit tests. We do not have any tests for egress-router in https://github.com/openshift/origin. This means that we are effectively lacking CI test coverage for egress-router.

Version-Release number of selected component (if applicable)

All versions.

How reproducible

100%.

Steps to Reproduce

1. Open a PR in https://github.com/openshift/images and check which CI jobs are run on it.
2. Check the job definitions in https://github.com/openshift/release/blob/master/ci-operator/jobs/openshift/images/openshift-images-master-presubmits.yaml.

Actual results

There are "ci/prow/e2e-aws", "ci/prow/e2e-aws-upgrade", and "ci/prow/images" jobs defined, but no "ci/prow/unit" job.

Expected results

There should be a "ci/prow/unit" job, and this job should run the unit tests that are defined in the repository.

Additional info

The lack of a CI job came up on https://github.com/openshift/images/pull/162.

Please review the following PR: https://github.com/openshift/machine-config-operator/pull/4070

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    ART is moving the container images to be built by Golang 1.21. We should do the same to keep our build config in sync with ART.
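A quick way to check which Go builder a repository currently requests (a sketch; the exact file names vary per repository):

$ grep -rn "golang-1\." Dockerfile* .ci-operator.yaml 2>/dev/null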

Version-Release number of selected component (if applicable):

    4.16/master

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    When deploying a cluster on Power VS, you need to wait for a short period after the workspace is created to facilitate the network configuration. This period is ignored by the DHCP service.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Easily

Steps to Reproduce:

    1. Deploy a cluster on Power VS with an installer provisioned workspace
    2. Observe that the terraform logs ignore the waiting period
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

Follow-on bug for the story "Add option to enable/disable tailing to Pod log viewer". Issues:
1. The position property of the PF5 Dropdown component doesn't work. To reproduce: add the `position="right"` property to the `Dropdown` component; the position doesn't change in `"@patternfly/react-core": "5.1.0"`.
2. Clicking the `Checkbox` label wrapped with `DropdownItem` doesn't trigger the `onChange` on mobile screens.
3. The Expand button color is not blue on mobile due to replacing Button with DropdownItem.
4. The kebab toggle jumps to the top of the screen if it is already open when resizing.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

Multipart upload issues with Cloudflare R2 using the S3 API. Some S3-compatible object storage systems, like R2, require that all multipart chunks be the same size. This was mostly true before, except that the final chunk could be larger than the requested chunk size, which causes uploads to fail.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Problem shows itself on OpenShift CI clusters intermittently.

Steps to Reproduce:

This behavior has been causing 504 Gateway Timeout issues in the image registry instances in OpenShift CI clusters.
It is connected to uploading big images (i.e 35GB), but we do not currently have the exact steps that reproduce it.

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    https://github.com/distribution/distribution/issues/3873 
    https://github.com/distribution/distribution/issues/3873#issuecomment-2258926705
    https://developers.cloudflare.com/r2/api/workers/workers-api-reference/#r2multipartupload-definition (look for "uniform in size")

Description of problem:

    4.15 nightly payloads have been affected by this test multiple times:

: [sig-arch] events should not repeat pathologically for ns/openshift-kube-scheduler expand_less0s{ 1 events happened too frequently

event happened 21 times, something is wrong: namespace/openshift-kube-scheduler node/ci-op-2gywzc86-aa265-5skmk-master-1 pod/openshift-kube-scheduler-guard-ci-op-2gywzc86-aa265-5skmk-master-1 hmsg/2652c73da5 - reason/ProbeError Readiness probe error: Get "https://10.0.0.7:10259/healthz": dial tcp 10.0.0.7:10259: connect: connection refused result=reject
body:
 From: 08:41:08Z To: 08:41:09Z}

In each of the 10 jobs aggregated, 2 to 3 jobs failed with this test. Historically this test passed 100%. But with the past two days' test data, the passing rate has dropped to 97%, and the aggregator started allowing this in the latest payload: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-azure-ovn-upgrade-4.15-micro-release-openshift-release-analysis-aggregator/1732295947339173888

The first payload this started appearing is https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.nightly/release/4.15.0-0.nightly-2023-12-05-071627.

All the events happened during cluster-operator/kube-scheduler progressing.

For comparison, here is a passed job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1731936539870498816

Here is a failed one: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1731936538192777216

They both have the same set of probe error events. For the passing jobs, the frequency is lower than 20, while for the failed job, one of those events repeated more than 20 times and therefore results in the test failure. 
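To count how often that probe-error event repeated in a given run, the gather-extra events.json can be queried with jq (a sketch; the artifact URL is a placeholder):

$ curl -s <gather-extra events.json URL> \
    | jq -r '.items[] | select(.metadata.namespace == "openshift-kube-scheduler" and .reason == "ProbeError") | "\(.count) \(.message)"' \
    | sort -rn | head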

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    Repositories list page breaks with a TypeError 
cannot read properties of undefined (reading `pipelinesascode.tekton.dev/repository`)

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

https://drive.google.com/file/d/1TpH_PTyBxNX0b9SPZ2yS8b-q-tbvp6Ok/view?usp=sharing

 

Description of problem:

When using canary rollout, paused MCPs begin updating when the user triggers the cluster update.

Version-Release number of selected component (if applicable):

 

How reproducible:

Approximately 3/10 times that I have witnessed.

Steps to Reproduce:

1. Install cluster
2. Follow canary rollout strategy: https://docs.openshift.com/container-platform/4.11/updating/update-using-custom-machine-config-pools.html 
3. Start cluster update
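For reference, pausing and checking a custom pool in the canary flow looks roughly like this (a sketch; the pool name worker-canary is illustrative):

$ oc patch mcp/worker-canary --type merge -p '{"spec":{"paused":true}}'
$ oc get mcp worker-canary -o jsonpath='{.spec.paused}'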

Actual results:

Worker nodes in paused MCPs begin update

Expected results:

Worker nodes in paused MCPs will not begin update until cluster admin unpauses the MCPs

Additional info:

This has occurred with my customer in their Azure self-managed cluster and their on-prem cluster in vSphere, as well as my lab cluster in vSphere.

Description of problem:

The problem was that the namespace handler on initial sync would delete all ports (because the logical port cache from which it got LSP UUIDs wasn't populated yet) and all ACLs (they were just set to nil). Even though both ports and ACLs will be re-added by the corresponding handlers, it may cause disruption.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. create a namespace with at least 1 pod and egress firewall in it

2. pick any ovnkube-node pod and find the namespace port group UUID in nbdb by external_ids["name"]=<namespace name>, e.g. for the "test" namespace (a command sketch for this follows after these steps)

_uuid               : 6142932d-4084-4bc3-bdcb-1990fc71891b
acls                : [ab2be619-1266-41c2-bb1d-1052cb4e1e97, b90a4b4a-ceee-41ee-a801-08c37a9bf3e7, d314fa8d-7b5a-40a5-b3d4-31091d7b9eae]
external_ids        : {name=test}
name                : a18007334074686647077
ports               : [55b700e4-8176-42e7-97a6-8b32a82fefe5, cb71739c-ad6c-4436-8fd6-0643a5417c7d, d8644bf1-6bed-4db7-abf8-7aaab0625324] 

3. restart chosen ovn-k pod

4. check logs on restart that update chosen port group to have zero ports and zero acls

Update operations generated as: [{Op:update Table:Port_Group Row:map[acls:{GoSet:[]} external_ids:{GoMap:map[name:test]} ports:{GoSet:[]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {6142932d-4084-4bc3-bdcb-1990fc71891b}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUID: UUIDName:}] 
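A command sketch for step 2 (assuming an OVN setup where the ovnkube pod carries an nbdb container; pod and container names are illustrative):

$ oc -n openshift-ovn-kubernetes exec <ovnkube-node-pod> -c nbdb -- \
    ovn-nbctl find Port_Group external_ids:name=test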

Actual results:

 

Expected results:

On restart port group stays the same, no extra update with empty ports and acls is generated

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure 
  2. customer issue / SD
  3. internal RedHat testing failure

 

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

 

If it is a CI failure:

 

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

 

If it is a customer / SD issue:

 

  • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
  • Don’t presume that Engineering has access to Salesforce.
  • Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment.  The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
  • Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).  
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and  dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, labels with “sbr-untriaged”
  • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

This is a clone of issue OCPBUGS-35197. The following is the description of the original issue:

Description of problem:

The issue was found while QE was testing the minimal firewall list required by an AWS installation
(https://docs.openshift.com/container-platform/4.15/installing/install_config/configuring-firewall.html) for 4.16. We verify this by putting all the URLs listed in the doc into the whitelist of a proxy server[1] and adding the proxy to install-config.yaml, so that addresses outside of the doc are rejected by the proxy server during cluster installation. 
[1]https://steps.ci.openshift.org/chain/proxy-whitelist-aws

We're seeing the following error on the masters' console
``` 
[  344.982244] ignition[782]: GET https://api-int.ci-op-b2hcg02h-ce587.qe.devcluster.openshift.com:22623/config/master: attempt #73
[  344.985074] ignition[782]: GET error: Get "https://api-int.ci-op-b2hcg02h-ce587.qe.devcluster.openshift.com:22623/config/master": Forbidden
```

And the deny log from proxy server 
```
1717653185.468   0 10.0.85.91 TCP_DENIED/403 2252 CONNECT api-int.ci-op-b2hcg02h-ce587.qe.devcluster.openshift.com:22623 - HIER_NONE/- text/html

```
So it looks like the master is using the proxy to reach the MCS address, and the internal API domain - api-int.ci-op-b2hcg02h-ce587.qe.devcluster.openshift.com - is not in the proxy's whitelist, so the request is denied by the proxy. But such an internal API address should already be in the noProxy list, so the master shouldn't use the proxy for this internal request. 

This is proxy info collected from another cluster; api-int.<cluster_domain> is added to the noProxy list by default. 
```
[root@ip-10-0-11-89 ~]# cat /etc/profile.d/proxy.sh 
export HTTP_PROXY="http://ec2-3-16-83-95.us-east-2.compute.amazonaws.com:3128"
export HTTPS_PROXY="http://ec2-3-16-83-95.us-east-2.compute.amazonaws.com:3128"
export NO_PROXY=".cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.gpei-dis3.qe.devcluster.openshift.com,localhost,test.no-proxy.com" 
```
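After installation, the effective cluster-wide noProxy list can be checked with (a sketch):

$ oc get proxy/cluster -o jsonpath='{.status.noProxy}'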


Version-Release number of selected component (if applicable):

registry.ci.openshift.org/ocp/release:4.16.0-0.nightly-2024-06-02-202327

How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


This is a clone of issue OCPBUGS-30949. The following is the description of the original issue:

Description of problem: After changing the value of enable_topology in the openshift-config/cloud-provider-config config map, the CSI controller pods should restart to pick up the new value. This is not happening.

It seems like our understanding in https://github.com/openshift/openstack-cinder-csi-driver-operator/pull/127#issuecomment-1780967488 was wrong.
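A rough way to verify the behaviour by hand (a sketch; the resource and namespace names are the usual defaults and may differ):

$ oc -n openshift-config get configmap cloud-provider-config -o yaml | grep -i enable_topology
$ oc -n openshift-cluster-csi-drivers get pods
# flip enable_topology in the config map, then watch whether the controller pods are restarted
$ oc -n openshift-cluster-csi-drivers get pods -w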

Please review the following PR: https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/321

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

gstreamer1 package (and its plugins) include certain video/audio codecs, which create licensing concerns for our Partners, who embed our solutions (OCP) and deliver it to their end customers. 

The ose-network-tools container image (this seems applicable to all OCP releases) includes a dependency on the gstreamer1 RPM (and its plugin RPMs, like gstreamer1-plugins-bad-free). The request is to reconsider this dependency and, if possible, remove it entirely. It is a blocking issue that prevents our partners from delivering their solution in the field.

It is an indirect dependency: ose-network-tools includes wireshark, wireshark depends on qt5-multimedia, which in turn depends on gstreamer1-plugins-bad-free. 

First question: is wireshark really needed for network-tools? Wireshark is a GUI tool, so the need for it is not clear. 
Second question: would wireshark-cli be sufficient instead? The CLI version does not pull in the qt5 dependency chain.
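A way to inspect the dependency chain inside the image (a sketch; the exact package names queried are assumptions and may differ by RHEL version):

$ oc adm release info --image-for=network-tools
$ podman run --rm -it <network-tools-image> bash -c 'rpm -q --whatrequires qt5-qtmultimedia; rpm -q --whatrequires gstreamer1'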

Version-Release number of selected component (if applicable):

    Seems applicable to all active OCP releases.

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of the problem:

 Until the latest release we had a test that set the base DNS name to 11.11.11 and expected the BE to throw an exception; the test was succeeding (the BE was throwing the exception). Since the last release this is no longer the case and the DNS name is being accepted.

How reproducible:

 

Steps to reproduce:

1. Create a cluster and set the base DNS name to 11.11.11

2.

3.

Actual results:

 No exception is thrown; the base DNS name 11.11.11 is accepted.

Expected results:

According to the discussion in the thread, a DNS name must start with a letter, so in that case we expect the BE to throw an exception.

This is a clone of issue OCPBUGS-35879. The following is the description of the original issue:

Description of problem:

Customer reports that in the OpenShift Container Platform for a single namespace they are seeing a "TypeError: Cannot read properties of null (reading 'metadata')" error when navigating to the Topology view (Developer Console):

TypeError: Cannot read properties of null (reading 'metadata')
    at s (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1220454)
    at s (https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:424007)
    at t.a (https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:330465)
    at na (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:58879)
    at Hs (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:111315)
    at xl (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:98327)
    at Cl (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:98255)
    at _l (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:98118)
    at pl (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:95105)
    at https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:263:44774

Screenshot is available in the linked Support Case. The following Stack Trace is shown:

at t.a (https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:330387)
    at g
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at g
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at g
    at a (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:245070)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at g
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at g
    at t.a (https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:426770)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at g
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at a (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:242507)
    at svg
    at div
    at https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:603940
    at u (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:602181)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at e.a (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:398426)
    at div
    at https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:353461
    at https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:354168
    at s (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1405970)
    at S (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:98:86864)
    at i (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:452052)
    at withFallback(Connect(withUserSettingsCompatibility(undefined)))
    at div
    at div
    at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:62178)
    at div
    at div
    at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:545565)
    at d (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:775077)
    at div
    at d (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:458280)
    at div
    at div
    at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:719437)
    at div
    at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:9899)
    at div
    at https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:512628
    at S (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:98:86864)
    at t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:123:75018)
    at https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:511867
    at https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:150:220157
    at https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:375316
    at div
    at R (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:183146)
    at N (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:183594)
    at f (https://console.apps.example.com/static/vendors~app/code-refs/actions~delete-revision~dev-console/code-refs/actions~dev-console/code-refs/ad~01887c45-chunk-0fc9a9eb8a528a7c580c.min.js:26:22249)
    at https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:509351
    at https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:548866
    at S (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:98:86864)
    at div
    at div
    at t.b (https://console.apps.example.com/static/dev-console/code-refs/common-chunk-5e4f38c02bde64a97ae5.min.js:1:113711)
    at t.a (https://console.apps.example.com/static/dev-console/code-refs/common-chunk-5e4f38c02bde64a97ae5.min.js:1:116541)
    at u (https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:305613)
    at https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:509656
    at i (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:452052)
    at withFallback()
    at t.a (https://console.apps.example.com/static/dev-console/code-refs/topology-chunk-e4ae65442e61628a832f.min.js:1:553554)
    at t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:21:67625)
    at I (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1533554)
    at t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:21:69670)
    at Suspense
    at i (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:452052)
    at section
    at m (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:720427)
    at div
    at div
    at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1533801)
    at div
    at div
    at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:545565)
    at d (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:775077)
    at div
    at d (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:458280)
    at l (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1175827)
    at https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:458912
    at S (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:98:86864)
    at main
    at div
    at v (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:264220)
    at div
    at div
    at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:62178)
    at div
    at div
    at c (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:545565)
    at d (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:775077)
    at div
    at d (https://console.apps.example.com/static/vendor-patternfly-core-chunk-cdcfdc55890623d5fc26.min.js:1:458280)
    at Un (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:36:183620)
    at t.default (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:880042)
    at e.default (https://console.apps.example.com/static/quick-start-chunk-794085a235e14913bdf3.min.js:1:3540)
    at s (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:239711)
    at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1610459)
    at ee (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1628636)
    at _t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:36:142374)
    at ee (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1628636)
    at ee (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1628636)
    at ee (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1628636)
    at i (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:830807)
    at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1604651)
    at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1604840)
    at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1602256)
    at te (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1628767)
    at https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1631899
    at r (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:36:121910)
    at t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:21:67625)
    at t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:21:69670)
    at t (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:21:64230)
    at re (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1632210)
    at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:804787)
    at t.a (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:1079398)
    at s (https://console.apps.example.com/static/main-chunk-876b3080b765b87baa51.min.js:1:654118)
    at t.a (https://console.apps.example.com/static/vendors~main-chunk-4b6445a3b3fc17bf0831.min.js:150:195887)
    at Suspense 

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.13.38
Developer Console

How reproducible:

Only on customer side, in a single namespace on a single cluster

Steps to Reproduce:

1. On a particular cluster, enter the Developer Console
2. Navigate to "Topology"

Actual results:

Loading the page fails with the error "TypeError: Cannot read properties of null (reading 'metadata')"

Expected results:

No error is shown. The Topology view is shown

Additional info:

- Screenshot available in linked Support Case
- HAR file is available in linked Support Case

Description of problem:

Since the singular variant of APIVIP/IngressVIP has been removed as part of https://github.com/openshift/installer/pull/7574, the appliance disk image e2e job is now failing: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-appliance-master-e2e-compact-ipv4-static

The job fails since the appliance supports only 4.14, which still requires the singular variant of the VIP properties.   

Version-Release number of selected component (if applicable):

4.16    

How reproducible:

Always    

Steps to Reproduce:

1. Invoke appliance e2e job on master: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-appliance-master-e2e-compact-ipv4-static
    

Actual results:

Job fails with the following validation error:
"the Machine Network CIDR can be defined by setting either the API or Ingress virtual IPs"
Due to missing apiVIP and ingressVIP in AgentClusterInstall.

Expected results:

AgentClusterInstall should include also the singular 'apiVIP' and 'ingressVIP', and the e2e job should successfully complete

Additional info:

    

This is a clone of issue OCPBUGS-29664. The following is the description of the original issue:

Description of problem:

Created a net-attach-def with 2 IPs in the range, then created a deployment with 2 replicas using that net-attach-def. The Whereabouts daemonset is created and the reconciler cronjob is enabled to reconcile every minute.
When I power off the node on which one of the pods is deployed, gracefully (poweroff) or ungracefully (poweroff --force), a new pod is created on a healthy node and gets stuck in the ContainerCreating state.

Version-Release number of selected component (if applicable):

    4.14.11

How reproducible:

- Create the Whereabouts reconciler daemon set with the help of the documentation: https://docs.openshift.com/container-platform/4.14/networking/multiple_networks/configuring-additional-network.html#nw-multus-creating-whereabouts-reconciler-daemon-set_configuring-additional-network
- Update the reconciler_cron_expression to: "*/1 * * * *"
- Create a net-attach-def with 2 IPs in the range
- Create a deployment with 2 replicas
- Power off the node on which one of the pods is running
- A new pod is spawned on a healthy node and stays in ContainerCreating status.

Steps to Reproduce:

1. On fresh cluster with version 4.14.11
2. Create whereabout daemon set with help of documentation   
3. Update the reconciler_cron_expression to: "*/1 * * * *"
$ oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/1 * * * *"

4. Create new project
$ oc new-project nadtesting

5. Apply below nad.yaml
$ cat nad.yaml 
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-net-attach1
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "br-ex",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "datastore": "kubernetes",
        "range": "172.17.20.0/24",
        "range_start": "172.17.20.11",
        "range_end": "172.17.20.12"
      }
    }'

6. Create deployment using net-attach-def with two replica,
$ cat naddeployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment1
  labels:
    app: macvlan1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: macvlan1
  template:
    metadata:
      annotations:
           k8s.v1.cni.cncf.io/networks: macvlan-net-attach1
      labels:
        app: macvlan1
    spec:
      containers:
      - name: google
        image: gcr.io/google-samples/kubernetes-bootcamp:v1
        ports:
        - containerPort: 8080

7. Two Pod will be created
$ oc get pods -o wide
NAME                          READY   STATUS    RESTARTS   AGE   IP            NODE                                       NOMINATED NODE   READINESS GATES
deployment1-fbfdf5cbc-d6sgr   1/1     Running   0          15m   10.129.2.9    ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2   <none>           <none>
deployment1-fbfdf5cbc-njkpz   1/1     Running   0          15m   10.128.2.16   ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh   <none>           <none>

8. Power off the node using debug
$ oc debug node/ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh 
# chroot /host
# shutdown

9. Wait for some time; a new pod will be created on a healthy node and remain stuck in ContainerCreating 
$ oc get pod -o wide
NAME                          READY   STATUS              RESTARTS   AGE     IP            NODE                                       NOMINATED NODE   READINESS GATES
deployment1-fbfdf5cbc-6cb8d   0/1     ContainerCreating   0          9m53s   <none>        ci-ln-xvfy762-c1627-h7xzk-worker-0-blzlk   <none>           <none>
deployment1-fbfdf5cbc-d6sgr   1/1     Running             0          28m     10.129.2.9    ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2   <none>           <none>
deployment1-fbfdf5cbc-njkpz   1/1     Terminating         0          28m     10.128.2.16   ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh   <none>           <none>

10. Node status just for reference,
$ oc get nodes  
NAME                                       STATUS     ROLES                  AGE   VERSION
ci-ln-xvfy762-c1627-h7xzk-master-0         Ready      control-plane,master   59m   v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-master-1         Ready      control-plane,master   59m   v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-master-2         Ready      control-plane,master   58m   v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh   NotReady   worker                 43m   v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-worker-0-blzlk   Ready      worker                 43m   v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2   Ready      worker                 43m   v1.27.10+28ed2d

Actual results:

The shut-down node's pod is stuck in Terminating state and does not release its IP. The new pod is stuck in ContainerCreating status.

Expected results:

The new pod should start smoothly on the new node.    

Additional info:

- Just for information: the issue can be resolved manually by following these steps:
1. Remove the terminating pod's IP from the overlapping range reservations:
$ oc delete overlappingrangeipreservations.whereabouts.cni.cncf.io <IP>

2. Remove the same IP from ippools.whereabouts.cni.cncf.io:
$ oc edit ippools.whereabouts.cni.cncf.io <IP Pool>
Remove that stale IP from the list.
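To list the current reservations and IP pools before cleaning up (a sketch):

$ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -A
$ oc get ippools.whereabouts.cni.cncf.io -A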

Also, the whereabouts-reconciler logs on the Terminating pod's node report:
2024-02-19T10:48:00Z [debug] Added IP 172.17.20.12 for pod nadtesting/deployment1-fbfdf5cbc-njkpz
2024-02-19T10:48:00Z [debug] the IP reservation: IP: 172.17.20.12 is reserved for pod: nadtesting/deployment1-fbfdf5cbc-njkpz
2024-02-19T10:48:00Z [debug] pod reference nadtesting/deployment1-fbfdf5cbc-njkpz matches allocation; Allocation IP: 172.17.20.12; PodIPs: map[172.17.20.12:{}]
2024-02-19T10:48:00Z [debug] no IP addresses to cleanup
2024-02-19T10:48:00Z [verbose] reconciler success

i.e. it fails to recognize the need to remove the allocation.

 

Description of problem:

Configuration files applied via the API do not have any effect on the configuration of the bare metal host.
    

Version-Release number of selected component (if applicable):

OpenShift 4.12.42 with ACM 2.8
    

How reproducible:

Reproducible.
    

Steps to Reproduce:

1. Applied nmstateconfig via 'oc apply', realized after it booted the subnet prefix was incorrect.
2. Deleted bmh and nmstateconfig.
3. Applied correct config  via 'oc apply', machine boots with 1st config still.
4. Deleted bmh and nmstateconfig.
5. Created host via BMC form in GUI with correct config.  Machine boots with correct config.
6. Tested deleting bmh and nmstateconfig, and creating new machine by just applying the bmh file with zero network config, and machine boots again with networking from step 5.
    

Actual results:

Bare metal host does not get config via 'oc apply'.
    

Expected results:

'oc apply -f nmstateconfig.yaml' should work to apply networking configuration.
    

Additional info:

HP Synergy 480 Gen10 (871942-B21)
UEFI boot with redfish virtual media
Static IP with bonding.

    

As part of the PatternFly update from 5.1.0 to 5.1.1 it was required to disable some dev console e2e tests.

See https://github.com/openshift/console/pull/13380

We need to re-enable and adapt at least this tests:

In frontend/packages/dev-console/integration-tests/features/addFlow/create-from-devfile.feature and in frontend/packages/dev-console/integration-tests/features/e2e/add-flow-ci.feature

  • Scenario: Deploy git workload with devfile from topology page: A-04-TC01

In frontend/packages/helm-plugin/integration-tests/features/helm-release.feature and frontend/packages/helm-plugin/integration-tests/features/helm/actions-on-helm-release.feature:

  • Scenario: Context menu options of helm release: HR-01-TC01

Can we please also check why we have both broken tests in two features files? 🤷

The hypershift ignition endpoint needlessly supports ALPN HTTP/2. In light of CVE-2023-39325, there is no reason to support HTTP/2 if it is not being used.
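One way to check whether an endpoint still negotiates HTTP/2 over ALPN (a sketch; the host and port are placeholders):

$ openssl s_client -alpn h2 -connect <ignition-endpoint-host>:443 </dev/null 2>/dev/null | grep -i alpn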

Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/158

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of the problem:

 LVMS on a multi-node cluster requires an additional disk for the operator.

However, I was able to create a 4.15 multi-node cluster, select LVMS, and without adding the additional disk the LVM requirement passes and I am able to continue and start the installation.

How reproducible:

 

Steps to reproduce:

1. Create a 4.15 multi-node cluster

2. select lvms operator

3. do not attach additional disk

Actual results:

 It is possible to continue to the installation page and start the installation; the LVM requirement is marked as success.

Expected results:

The LVM requirement should show as failed; it should not be possible to proceed to installation before attaching the disk.

Description of problem:

Users are experiencing an issue with NodePort traffic forwarding: TCP traffic continues to be directed to pods that are in the Terminating state, so the connection cannot be established successfully. According to the customer, this issue is causing connection disruptions in their business transactions.

 

Version-Release number of selected component (if applicable):

On the OpenShift 4.12.13 with RHEL8.6 workers and OVN environment.

 

How reproducible:

The relevant code is here:
https://github.com/openshift/ovn-kubernetes/blob/dd3c7ed8c1f41873168d3df26084ecbfd3d9a36b/go-controller/pkg/util/kube.go#L360

func IsEndpointServing(endpoint discovery.Endpoint) bool {
	if endpoint.Conditions.Serving != nil {
		return *endpoint.Conditions.Serving
	} else {
		return IsEndpointReady(endpoint)
	}
}

// IsEndpointValid takes as input an endpoint from an endpoint slice and a boolean that indicates whether to include
// all terminating endpoints, as per the PublishNotReadyAddresses feature in kubernetes service spec. It always returns true
// if includeTerminating is true and falls back to IsEndpointServing otherwise.
func IsEndpointValid(endpoint discovery.Endpoint, includeTerminating bool) bool {
	return includeTerminating || IsEndpointServing(endpoint)
}

It looks like the 'IsEndpointValid' function returns endpoints with serving=true; it is not checking for ready=true endpoints.
It seems the code was recently changed in this section (the lookup changed from Ready=true to Serving=true)?

[Check the "Serving" field for endpoints]
https://github.com/openshift/ovn-kubernetes/commit/aceef010daf0697fe81dba91a39ed0fdb6563dea#diff-daf9de695e0ff81f9173caf83cb88efa138e92a9b35439bd7044aa012ff931c0

https://github.com/openshift/ovn-kubernetes/blob/release-4.12/go-controller/pkg/util/kube.go#L326-L386

            out.Port = *port.Port
            for _, endpoint := range slice.Endpoints {
                // Skip endpoint if it's not valid
                if !IsEndpointValid(endpoint, includeTerminating) {
                    klog.V(4).Infof("Slice endpoint not valid")
                    continue
                }
                for _, ip := range endpoint.Addresses {
                    klog.V(4).Infof("Adding slice %s endpoint: %v, port: %d", slice.Name, endpoint.Addresses, *port.Port)
                    ipStr := utilnet.ParseIPSloppy(ip).String()
                    switch slice.AddressType {
                    case discovery.AddressTypeIPv4:
                        v4ips.Insert(ipStr)
                    case discovery.AddressTypeIPv6:
                        v6ips.Insert(ipStr)
                    default:
                        klog.V(5).Infof("Skipping FQDN slice %s/%s", slice.Namespace, slice.Name)
                    }
                }
            }

Steps to Reproduce:

Here are the customer's sample pods for reference:
mbgateway-st-8576f6f6f8-5jc75   1/1     Running   0          104m    172.30.195.124   appn01-100.app.paas.example.com   <none>           <none>
mbgateway-st-8576f6f6f8-q8j6k   1/1     Running   0          5m51s   172.31.2.97      appn01-202.app.paas.example.com   <none>           <none>

pod yaml:
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 40
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: 9190
      timeoutSeconds: 5
    name: mbgateway-st
    ports:
    - containerPort: 9190
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 40
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: 9190
      timeoutSeconds: 5
    resources:
      limits:
        cpu: "2"
        ephemeral-storage: 10Gi
        memory: 2G
      requests:
        cpu: 50m
        ephemeral-storage: 100Mi
        memory: 1111M

After deleting pod mbgateway-st-8576f6f6f8-5jc75, check the EndpointSlice status:
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:

  • addresses:
      - 172.30.195.124
      conditions:
        ready: false
        serving: true
        terminating: true
      nodeName: appn01-100.app.paas.example.com
      targetRef:
        kind: Pod
        name: mbgateway-st-8576f6f6f8-5jc75
        namespace: lb59-10-st-unigateway
        uid: 5e8a375d-ba56-4894-8034-0009d0ab8ebe
      zone: AZ61QEBIZ_AZ61QEM02_FD3
  • addresses:
      - 172.31.2.97
      conditions:
        ready: true
        serving: true
        terminating: false
      nodeName: appn01-202.app.paas.example.com
      targetRef:
        kind: Pod
        name: mbgateway-st-8576f6f6f8-q8j6k
        namespace: lb59-10-st-unigateway
        uid: 5bd195b7-e342-4b34-b165-12988a48e445
      zone: AZ61QEBIZ_AZ61QEM02_FD1

Wait a moment and check the OVN service LB; the endpoint information has not been updated to the latest state.
9349d703-1f28-41fe-b505-282e8abf4c40    Service_lb59-10-    tcp        172.35.0.185:31693      172.30.195.124:9190,172.31.2.97:9190
dca65745-fac4-4e73-b412-2c7530cf4a91    Service_lb59-10-    tcp        172.35.0.170:31693      172.30.195.124:9190,172.31.2.97:9190
a5a65766-b0f2-4ac6-8f7c-cdebeea303e3    Service_lb59-10-    tcp        172.35.0.89:31693       172.30.195.124:9190,172.31.2.97:9190
a36517c5-ecaa-4a41-b686-37c202478b98    Service_lb59-10-    tcp        172.35.0.213:31693      172.30.195.124:9190,172.31.2.97:9190
16d997d1-27f0-41a3-8a9f-c63c8872d7b8    Service_lb59-10-    tcp        172.35.0.92:31693       172.30.195.124:9190,172.31.2.97:9190

Wait a little longer:
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:

  • addresses:
      - 172.30.195.124
      conditions:
        ready: false
        serving: true
        terminating: true
      nodeName: appn01-100.app.paas.example.com
      targetRef:
        kind: Pod
        name: mbgateway-st-8576f6f6f8-5jc75
        namespace: lb59-10-st-unigateway
        uid: 5e8a375d-ba56-4894-8034-0009d0ab8ebe
      zone: AZ61QEBIZ_AZ61QEM02_FD3
  • addresses:
      - 172.31.2.97
      conditions:
        ready: true
        serving: true
        terminating: false
      nodeName: appn01-202.app.paas.example.com
      targetRef:
        kind: Pod
        name: mbgateway-st-8576f6f6f8-q8j6k
        namespace: lb59-10-st-unigateway
        uid: 5bd195b7-e342-4b34-b165-12988a48e445
      zone: AZ61QEBIZ_AZ61QEM02_FD1
  • addresses:
      - 172.30.132.78
      conditions:
        ready: false
        serving: false
        terminating: false
      nodeName: appn01-089.app.paas.example.com
      targetRef:
        kind: Pod
        name: mbgateway-st-8576f6f6f8-8lp4s
        namespace: lb59-10-st-unigateway
        uid: 755cbd49-792b-4527-b96a-087be2178e9d
      zone: AZ61QEBIZ_AZ61QEM02_FD3

Check the OVN service LB; the deleted pod's endpoint information is still there:
fceeaf8f-e747-4290-864c-ba93fb565a8a    Service_lb59-10-    tcp        172.35.0.56:31693       172.30.132.78:9190,172.30.195.124:9190,172.31.2.97:9190
bef42efd-26db-4df3-b99d-370791988053    Service_lb59-10-    tcp        172.35.1.26:31693       172.30.132.78:9190,172.30.195.124:9190,172.31.2.97:9190
84172e2c-081c-496a-afec-25ebcb83cc60    Service_lb59-10-    tcp        172.35.0.118:31693      172.30.132.78:9190,172.30.195.124:9190,172.31.2.97:9190
34412ddd-ab5c-4b6b-95a3-6e718dd20a4f    Service_lb59-10-    tcp        172.35.1.14:31693       172.30.132.78:9190,172.30.195.124:9190,172.31.2.97:9190

 

Actual results:

Service LB endpoints are currently selected based on the endpoint's Serving condition.

Expected results:

Service LB endpoints should be selected based on the endpoint's Ready condition.

 

Additional info:

The ovn-controller decides whether an endpoint should be added to the service load balancer (serviceLB) based on condition.serving. The current issue is that when a pod is in the terminating state, condition.serving remains true; its value is derived from the pod's status.condition[type=Ready] having been true.

However, when a pod is deleted, the EndpointSlice condition.serving state remains unchanged, and the backend pool of the service LB still includes the IP of the deleted pod. Why doesn't ovn-controller use the condition.ready status to decide whether the pod's IP should be added to the service LB backend pool?

Could the shift-networking experts confirm whether or not this is an OpenShift OVN service LB bug?
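For illustration only (this is not the ovn-kubernetes implementation), a Go sketch of the selection policy the reporter is asking about: prefer ready=true endpoints and only fall back to serving (terminating) endpoints when no ready endpoint exists.

```
// Sketch only: prefer Ready endpoints for the load balancer and fall back to
// serving (terminating) endpoints only when no Ready endpoint exists.
package endpoints

import discovery "k8s.io/api/discovery/v1"

func isReady(ep discovery.Endpoint) bool {
	return ep.Conditions.Ready != nil && *ep.Conditions.Ready
}

func isServing(ep discovery.Endpoint) bool {
	return ep.Conditions.Serving != nil && *ep.Conditions.Serving
}

// SelectBackends returns the endpoints that should back a service LB.
func SelectBackends(eps []discovery.Endpoint) []discovery.Endpoint {
	var ready, serving []discovery.Endpoint
	for _, ep := range eps {
		switch {
		case isReady(ep):
			ready = append(ready, ep)
		case isServing(ep):
			serving = append(serving, ep)
		}
	}
	if len(ready) > 0 {
		return ready
	}
	// No ready endpoints: keep traffic flowing through terminating-but-serving pods.
	return serving
}
```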

This is a clone of issue OCPBUGS-36260. The following is the description of the original issue:

Description of problem:

The tooltip on the Pipeline whenExpression is not shown in the Pipeline visualization.

Prerequisites (if any, like setup, operators/versions):

Steps to Reproduce

  1. Create a Pipeline with whenExpression
  2. navigate to the Pipeline details page
  3. hover over the whenExpression diamond shape

Actual results:

The when expression tooltip is not shown on hover

Expected results:

Should show When expression tooltip on hover

Reproducibility (Always/Intermittent/Only Once):

Build Details:

Workaround:

Additional info:

Slack discussion here: https://redhat-internal.slack.com/archives/C02F1J9UJJD/p1702394712492839

  Repo: openshift/kubernetes/pkg/controller/podautoscaler
    MISSING: jkyros, joelsmith
  Repo: openshift/kubernetes-autoscaler/vertical-pod-autoscaler
    MISSING: jkyros
  Repo: openshift/vertical-pod-autoscaler-operator/
    MISSING: jkyros

The openshift/kubernetes one was the only really weird one where there might not be a precedent. Looking at the kubernetes repo, it looks like the convention is to add a DOWNSTREAM_APPROVERS file as a carry patch for downstream approvers?

 

Description of the problem:

When creating a cluster with base version 4.16 from the candidate channel,
all returned versions share the same X.Y.Z 4.16.0-ec.Z format.

The returned version should be the latest, but we return the first hit because we do not compare the versions after -ec.Z,
meaning

4.16.0-ec.1 4.16.0-ec.2 4.16.0-ec.3

won't be ordered correctly.

 

How reproducible:

Always; a different result is returned each time.

Steps to reproduce:

1. Create a cluster with Latest release from test-infra 
export OPENSHIFT_VERSION=4.16

2. Once cluster created check the picked version

(Pdb++) cluster.get_details().openshift_version
2024-05-07 06:55:26,409  root INFO       - 140479183103808 - Refreshing API key     (/home/benny/assisted-test-infra/src/service_client/assisted_service_api.py:78)->refresh_api_key
'4.16.0-ec.2'

 
 

--> When this order is used, 4.16.0-ec.5 is chosen:
github.com/openshift/assisted-service/models.ReleaseImages len: 7, cap: 20, [
  *{ CPUArchitecture: *"x86_64", CPUArchitectures: github.com/lib/pq.StringArray len: 1, cap: 1, ["x86_64"], Default: false, OpenshiftVersion: *"4.16", SupportLevel: "beta", URL: *"quay.io/openshift-release-dev/ocp-release:4.16.0-ec.5-x86_64", Version: *"4.16.0-ec.5",},
  *{ CPUArchitecture: *"x86_64", CPUArchitectures: github.com/lib/pq.StringArray len: 1, cap: 1, ["x86_64"], Default: false, OpenshiftVersion: *"4.16", SupportLevel: "beta", URL: *"quay.io/openshift-release-dev/ocp-release:4.16.0-ec.6-x86_64", Version: *"4.16.0-ec.6",},
  *{ CPUArchitecture: *"x86_64", CPUArchitectures: github.com/lib/pq.StringArray len: 1, cap: 1, ["x86_64"], Default: false, OpenshiftVersion: *"4.16", SupportLevel: "beta", URL: *"quay.io/openshift-release-dev/ocp-release:4.16.0-ec.4-x86_64", Version: *"4.16.0-ec.4",},
  *{ CPUArchitecture: *"x86_64", CPUArchitectures: github.com/lib/pq.StringArray len: 1, cap: 1, ["x86_64"], Default: false, OpenshiftVersion: *"4.16", SupportLevel: "beta", URL: *"quay.io/openshift-release-dev/ocp-release:4.16.0-ec.0-x86_64", Version: *"4.16.0-ec.0",},
  *{ CPUArchitecture: *"x86_64", CPUArchitectures: github.com/lib/pq.StringArray len: 1, cap: 1, ["x86_64"], Default: false, OpenshiftVersion: *"4.16", SupportLevel: "beta", URL: *"quay.io/openshift-release-dev/ocp-release:4.16.0-ec.1-x86_64", Version: *"4.16.0-ec.1",},
  *{ CPUArchitecture: *"x86_64", CPUArchitectures: github.com/lib/pq.StringArray len: 1, cap: 1, ["x86_64"], Default: false, OpenshiftVersion: *"4.16", SupportLevel: "beta", URL: *"quay.io/openshift-release-dev/ocp-release:4.16.0-ec.3-x86_64", Version: *"4.16.0-ec.3",},
  *{ CPUArchitecture: *"x86_64", CPUArchitectures: github.com/lib/pq.StringArray len: 1, cap: 1, ["x86_64"], Default: false, OpenshiftVersion: *"4.16", SupportLevel: "beta", URL: *"quay.io/openshift-release-dev/ocp-release:4.16.0-ec.2-x86_64", Version: *"4.16.0-ec.2",},
]
 

 

Expected results:
We always expect to get the latest version from the channel.
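A minimal sketch (not assisted-service code; github.com/blang/semver/v4 is used here only for illustration) of picking the newest release when the candidates differ only in a -ec.N pre-release suffix, since semver-aware comparison orders ec.1 < ec.2 < ... numerically:

```
package main

import (
	"fmt"

	"github.com/blang/semver/v4"
)

// latest returns the highest version, comparing pre-release identifiers such
// as "-ec.1" vs "-ec.2" numerically instead of taking the first match.
func latest(versions []string) (string, error) {
	var best semver.Version
	found := false
	for _, v := range versions {
		parsed, err := semver.Parse(v)
		if err != nil {
			return "", err
		}
		if !found || parsed.GT(best) {
			best = parsed
			found = true
		}
	}
	return best.String(), nil
}

func main() {
	v, _ := latest([]string{"4.16.0-ec.5", "4.16.0-ec.6", "4.16.0-ec.4", "4.16.0-ec.0"})
	fmt.Println(v) // 4.16.0-ec.6
}
```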

This is a clone of issue OCPBUGS-37988. The following is the description of the original issue:

Description of problem:

    In the Administrator view under Cluster Settings -> Update Status Pane, the text for the versions is black instead of white when Dark mode is selected on Firefox (128.0.3 Mac). Also happens if you choose System default theme and the system is set to Dark mode.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Always

Steps to Reproduce:

    1. Open /settings/cluster using Firefox with Dark mode selected
    2.
    3.
    

Actual results:

    The version numbers under Update status are black

Expected results:

    The version numbers under Update status are white

Additional info:

    

This is a clone of issue OCPBUGS-35528. The following is the description of the original issue:

Description of problem

Cluster-update-keys has some old Red Hat keys which are self-signed with SHA-1. The keys that we use have recently been re-signed with SHA256. We don't rely on the self-signing to establish trust in the keys (that trust is established by baking a ConfigMap manifest into release images, where it can be read by the cluster-version operator), but we do need to avoid spooking the key-loading library. Currently, Go-1.22-built CVOs in FIPS mode fail to bootstrap,
like this aws-ovn-fips run's install artifacts show:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-fips/1800906552731766784/artifacts/e2e-aws-ovn-fips/ipi-install-install/artifacts/log-bundle-20240612161314.tar | tar -tvz | grep 'cluster-version.*log'
-rw-r--r-- core/core 54653 2024-06-12 09:13 log-bundle-20240612161314/bootstrap/containers/cluster-version-operator-bd9f61984afa844dcd284f68006ffc9548377c045eff840096c74bcdcbe5cca3.log
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-fips/1800906552731766784/artifacts/e2e-aws-ovn-fips/ipi-install-install/artifacts/log-bundle-20240612161314.tar | tar -xOz log-bundle-20240612161314/bootstrap/containers/cluster-version-operator-bd9f61984afa844dcd284f68006ffc9548377c045eff840096c74bcdcbe5cca3.log | grep GPG
I0612 16:06:15.952567       1 start.go:256] Failed to initialize from payload; shutting down: the config map openshift-config-managed/release-verification has an invalid key "verifier-public-key-redhat" that must be a GPG public key: openpgp: invalid data: tag byte does not have MSB set: openpgp: invalid data: tag byte does not have MSB set
E0612 16:06:15.952600       1 start.go:309] Collected payload initialization goroutine: the config map openshift-config-managed/release-verification has an invalid key "verifier-public-key-redhat" that must be a GPG public key: openpgp: invalid data: tag byte does not have MSB set: openpgp: invalid data: tag byte does not have MSB set

That's this code attempting to call ReadArmoredKeyRing (which fails with a currently-unlogged "openpgp: invalid data: user ID self-signature invalid: openpgp: invalid signature: RSA verification failure" complaining about the SHA-1 signature), and then falling back to ReadKeyRing, which fails with the reported "openpgp: invalid data: tag byte does not have MSB set".

To avoid these failures, we should:

  • Improve the library-go function, so we get both the ReadArmoredKeyRing error and the ReadKeyRing error back on load failures.
  • Update our keys in cluster-update-keys to ones with SHA256 or other still-acceptable digest algorithm.
  • Drop verifier-public-key-redhat-release-auxiliary, which we have versioned in cluster-update-keys despite no known users ever.
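As a sketch of the first bullet above (not the actual library-go change), the loader could try the armored reader first and surface both errors when the binary fallback also fails:

```
// Sketch only: try the armored reader first and, on failure, fall back to the
// binary reader while surfacing both errors instead of only the second one.
package verify

import (
	"bytes"
	"errors"
	"fmt"

	"golang.org/x/crypto/openpgp"
)

func loadKeyRing(key []byte) (openpgp.EntityList, error) {
	keyring, armoredErr := openpgp.ReadArmoredKeyRing(bytes.NewReader(key))
	if armoredErr == nil {
		return keyring, nil
	}
	keyring, binaryErr := openpgp.ReadKeyRing(bytes.NewReader(key))
	if binaryErr == nil {
		return keyring, nil
	}
	return nil, fmt.Errorf("key is not a GPG public key: %w", errors.Join(armoredErr, binaryErr))
}
```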

Version-Release number of selected component

Only 4.17 will use Go 1.22, so that's the only release that needs patching. But the changes would be fine to backport if we wanted.

How reproducible

100%.

Steps to Reproduce

1. Build the CVO with Go 1.22
2. Launch a FIPS cluster.

Actual results

Fails to bootstrap, with the bootstrap CVO complaining, as shown in the Description of problem section.

Expected results

Successful install

Description of problem:

We upgraded our OpenShift cluster from 4.4.16 to 4.15.3 and multiple operators are now in "Failed" status, with CSV conditions such as:
- NeedsReinstall installing: deployment changed old hash=5f6b8fc6f7, new hash=5hFv6Gemy1Zri3J9ulXfjG9qOzoFL8FMsLNcLR
- InstallComponentFailed install strategy failed: rolebindings.rbac.authorization.k8s.io "openshift-gitops-operator-controller-manager-service-auth-reader" already exists

All other failures refer to a similar "auth-reader" rolebinding that already exist.
 
    

Version-Release number of selected component (if applicable):

OpenShift 4.15.3
    

How reproducible:

Happened on several installed operators but on the only cluster we upgraded (our staging cluster)
    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:

All operators should be up-to-date
    

Additional info:


This may be related to https://github.com/operator-framework/operator-lifecycle-manager/pull/3159 
    

This is a clone of issue OCPBUGS-33570. The following is the description of the original issue:

Description of problem:

Installing OCP with CAPI and setting bootType: "UEFI" results in an unsupported value error. Installing with Terraform did not hit this issue.

  platform:
    nutanix:
      bootType: "UEFI" 
# ./openshift-install create cluster --dir cluster --log-level debug
...
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to create control-plane manifest: NutanixMachine.infrastructure.cluster.x-k8s.io "sgao-nutanix-zonal-jwp6d-bootstrap" is invalid: spec.bootType: Unsupported value: "UEFI": supported values: "legacy", "uefi" 

Set bootType: "uefi" also won't work

# ./openshift-install create manifests --dir cluster
...
FATAL failed to fetch Master Machines: failed to generate asset "Master Machines": failed to create master machine objects: platform.nutanix.bootType: Invalid value: "uefi": valid bootType: "", "Legacy", "UEFI", "SecureBoot". 

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-05-08-222442

How reproducible:

    always

Steps to Reproduce:

    1.Create install config with bootType: "UEFI" and enable capi by setting:
featureSet: CustomNoUpgrade
featureGates:
- ClusterAPIInstall=true

    2.Install cluster
    

Actual results:

    Install failed

Expected results:

    Install passed

Additional info:

    

I was seeing the following error running `build.sh` with go v1.19.5 until I upgraded to v1.22.4:

```
❯ ./build.sh
pkg/auth/sessions/server_session.go:7:2: cannot find package "." in:
/Users/rhamilto/Git/console/vendor/slices
```

Description of problem:

The nodeip-configuration service does not log to the serial console, which makes it difficult to debug problems when networking is not available and there is no access to the node.

Version-Release number of selected component (if applicable):

Reported against 4.13, but present in all releases

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-33815. The following is the description of the original issue:

In our hypershift test, we see the openshift-controller-manager undoing the work of our controllers to set an imagePullSecrets entry on our ServiceAccounts.  The result is a rapid updating of ServiceAccounts as the controllers fight.

This started happening after https://github.com/openshift/openshift-controller-manager/pull/305

Description of problem: MCN lister fires in the operator pod before the CRD exists. This causes API issues and could impact upgrades.

Version-Release number of selected component (if applicable):

    

How reproducible: always

Steps to Reproduce:
    1. upgrade to 4.15 from any version
    2.
    3.
    

Actual results:

I1211 18:44:40.972098       1 operator.go:347] Starting MachineConfigOperator
I1211 18:44:40.982079       1 event.go:298] Event(v1.ObjectReference{Kind:"", Namespace:"openshift-machine-config-operator", Name:"machine-config", UID:"68bc5e8f-b7f5-4506-a870-2eecaa5afd35", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorVersionChanged' clusteroperator/machine-config-operator started a version change from [{operator 4.14.6}] to [{operator 4.15.0-0.nightly-2023-12-11-033133}]
W1211 18:44:41.255502       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:44:41.255587       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:58:04.915119       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 18:58:06.425952       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:58:06.426037       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 18:58:09.396004       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:58:09.396068       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 18:58:14.540488       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:58:14.540560       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 18:58:25.293029       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:58:25.293095       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 18:58:50.166866       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:58:50.166903       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 18:59:39.950454       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 18:59:39.950523       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 19:00:23.432005       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:00:23.432038       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 19:01:13.237298       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:01:13.237382       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 19:02:02.035555       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:02:02.035628       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 19:02:52.111260       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:02:52.111332       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 19:03:38.243461       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:03:38.243499       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
W1211 19:04:27.848493       1 reflector.go:535] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:04:27.848585       1 reflector.go:147] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:116: Failed to watch *v1alpha1.MachineConfigNode: failed to list *v1alpha1.MachineConfigNode: the server could not find the requested resource (get machineconfignodes.machineconfiguration.openshift.io)
E1211 19:05:37.064033       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:38.057685       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:39.036638       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:40.039736       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:41.039696       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:42.034840       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:43.044901       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:44.033229       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:45.034792       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
E1211 19:05:46.052866       1 sync.go:1250] Error syncing Required MachineConfigPools: "error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)"
Expected results:

    

Additional info:
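One possible mitigation, sketched here under the assumption that the operator can simply delay starting its v1alpha1 informers (this is not the actual MCO fix), is to wait for the MachineConfigNode CRD to be Established before listing it:

```
// Sketch: wait for a CRD to be Established before starting informers that
// list it, so the operator does not list a resource that does not exist yet.
package mcohelpers

import (
	"context"
	"time"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

func waitForCRD(ctx context.Context, client apiextensionsclient.Interface, name string) error {
	return wait.PollUntilContextTimeout(ctx, 5*time.Second, 10*time.Minute, true, func(ctx context.Context) (bool, error) {
		crd, err := client.ApiextensionsV1().CustomResourceDefinitions().Get(ctx, name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return false, nil // not created yet, keep waiting
		}
		if err != nil {
			return false, err
		}
		for _, cond := range crd.Status.Conditions {
			if cond.Type == apiextensionsv1.Established && cond.Status == apiextensionsv1.ConditionTrue {
				return true, nil
			}
		}
		return false, nil
	})
}

// Example: waitForCRD(ctx, client, "machineconfignodes.machineconfiguration.openshift.io")
```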


Description of problem:

Check on OperatorHub page, the long catalogsource display name will overflow the operator item tile

Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-12-19-033450
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Create a catalogsource with a long display name.
    2. Check operator items supplied by the created catalogsource on OperatorHub page
    3.
    

Actual results:

2. The catalogsource display name overflows from the item tile
    

Expected results:

2. Show the catalogsource display name in the item tile dynamically without overflow.
    

Additional info:

screenshot: https://drive.google.com/file/d/1GOHJOxoBmtZX3QWDsIvc2RT5a2inkpzM/view?usp=sharing
    

Jose added a few other rendering tests that utilize different inputs and outputs. make render-sync should be able to prepare them.

Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver/pull/53

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Due to structural changes in openshift/api our generate make target fails after an api update

make: *** No rule to make target 'vendor/github.com/openshift/api/monitoring/v1/0000_50_monitoring_01_alertingrules.crd.yaml', needed by 'jsonnet/crds/alertingrules-custom-resource-definition.json'.  Stop.

This is a clone of issue OCPBUGS-32812. The following is the description of the original issue:

Description of problem:

    When the image from a build is rolling out on the nodes, the update progress on the node is not displayed correctly.

Version-Release number of selected component (if applicable):

    

How reproducible:

Always     

Steps to Reproduce:

    1. Enable OCL functionality 
    2. Opt the pool in by MachineOSConfig 
    3. Wait for the image to build and roll out
    4. Track mcp update status by oc get mcp 
    

Actual results:

The MCP starts with 0 ready nodes. Even when 1-2 nodes have already been updated, the ready count remains 0; it only jumps to 3 when all nodes are ready.

Expected results:

The update progress should be reflected in the mcp status correctly. 

Additional info:

    

Description of problem:

The cluster-network-operator in hypershift, when templating in-cluster resources, does not use the node-local address of the client-side haproxy load balancer that runs on all nodes. This bypasses a level of health checking for the redundant apiserver backends that is performed by the local kube-apiserver-proxy pods running on every node in a hypershift environment. In environments where the backend API servers are not fronted by an additional cloud load balancer, this leads to a percentage of in-cluster component requests failing when a control plane endpoint goes down, even if other endpoints are available.

Version-Release number of selected component (if applicable):

  4.16 4.15 4.14

How reproducible:

    100%

Steps to Reproduce:

    1. Setup a hypershift cluster in a baremetal/non cloud environment where there are redundant API servers behind a DNS that point directly to the node IPs.
    2. Power down one of the control plane nodes
    3. Schedule workload into cluster that depends on kube-proxy and/or multus to setup networking configuration
    4. You will see errors like the following 
```
add): Multus: [openshiftai/moe-8b-cmisale-master-0/9c1fd369-94f5-481c-a0de-ba81a3ee3583]: error getting pod: Get "https://[p9d81ad32fcdb92dbb598-6b64a6ccc9c596bf59a86625d8fa2202-c000.us-east.satellite.appdomain.cloud]:30026/api/v1/namespaces/openshiftai/pods/moe-8b-cmisale-master-0?timeout=1m0s": dial tcp 192.168.98.203:30026: connect: timeout
```
    

Actual results:

    When a control plane node fails, intermittent timeouts occur when kube-proxy/multus resolves the DNS and a failed control plane node IP is returned.

Expected results:

    No requests fail (which will be the case if all traffic is routed through the node-local load balancer instance).

Additional info:

    Additionally, control plane components in the management cluster that live next to the apiserver add unneeded dependencies by using an external DNS entry to talk to the kube-apiserver, when they could use the local kube-apiserver address so that everything goes over cluster-local networking.

This is a clone of issue OCPBUGS-36176. The following is the description of the original issue:

Description of problem:

The PowerVS CI uses the installer image to do some necessary setup.  The openssl binary was recently removed from that image.  So we need to switch to the upi-installer image.
    

Version-Release number of selected component (if applicable):

4.17
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Look at CI runs
    

This is continuation of OCPBUGS-23342, now the vmware-vsphere-csi-driver-operator cannot connect to vCenter at all. Tested using invalid credentials.

The operator ends up with no Progressing condition during upgrade from 4.11 to 4.12, and cluster-storage-operator interprets it as Progressing=true.

Please review the following PR: https://github.com/openshift/containernetworking-plugins/pull/142

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:


In the self-managed HCP use case, if the on-premise baremetal management cluster does not have nodes labeled with the "topology.kubernetes.io/zone" key, then all HCP pods for a High Available cluster are scheduled to a single mgmt cluster node.

This is a result of the way the affinity rules are constructed.

Take the pod affinity/antiAffinity example below, which is generated for a HA HCP cluster. If the "topology.kubernetes.io/zone" label does not exist on the mgmt cluster nodes, then the pod will still get scheduled but that antiAffinity rule is effectively ignored. That seems odd due to the usage of the "requiredDuringSchedulingIgnoredDuringExecution" value, but I have tested this and the rule truly is ignored if the topologyKey is not present.

        podAffinity: 
          preferredDuringSchedulingIgnoredDuringExecution: 
          - podAffinityTerm: 
              labelSelector: 
                matchLabels: 
                  hypershift.openshift.io/hosted-control-plane: clusters-vossel1
              topologyKey: kubernetes.io/hostname
            weight: 100
        podAntiAffinity: 
          requiredDuringSchedulingIgnoredDuringExecution: 
          - labelSelector: 
              matchLabels: 
                app: kube-apiserver
                hypershift.openshift.io/control-plane-component: kube-apiserver
            topologyKey: topology.kubernetes.io/zone
If no "zones" are configured for the baremetal mgmt cluster, then the only other pod affinity rule is one that actually colocates the pods together. This results in an HA HCP having all the etcd, apiserver, etc. pods scheduled to a single node.

Version-Release number of selected component (if applicable):

4.14


How reproducible:

100%

Steps to Reproduce:

1. Create a self-managed HA HCP cluster on a mgmt cluster with nodes that lack the "topology.kubernetes.io/zone" label

Actual results:

all HCP pods are scheduled to a single node.

Expected results:

HCP pods should always be spread across multiple nodes.

Additional info:


A way to address this is to add another anti-affinity rule which prevents every component from being scheduled on the same node as its replicas, as sketched below.
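A Go sketch of such a rule (not the actual hypershift change; the "app" label key mirrors the example above) that spreads replicas of the same component across nodes via kubernetes.io/hostname, which is present even when zone labels are not:

```
package scheduling

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// hostnameAntiAffinity keeps replicas of the same component off the same node.
func hostnameAntiAffinity(component string) corev1.PodAffinityTerm {
	return corev1.PodAffinityTerm{
		LabelSelector: &metav1.LabelSelector{
			MatchLabels: map[string]string{
				"app": component,
			},
		},
		// Always present on nodes, unlike topology.kubernetes.io/zone.
		TopologyKey: "kubernetes.io/hostname",
	}
}

// withAntiAffinity appends the hostname-based rule to a pod spec.
func withAntiAffinity(spec *corev1.PodSpec, component string) {
	if spec.Affinity == nil {
		spec.Affinity = &corev1.Affinity{}
	}
	if spec.Affinity.PodAntiAffinity == nil {
		spec.Affinity.PodAntiAffinity = &corev1.PodAntiAffinity{}
	}
	spec.Affinity.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution = append(
		spec.Affinity.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution,
		hostnameAntiAffinity(component),
	)
}
```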

The current description of HighOverallControlPlaneCPU is wrong for SNO cases and can mislead users. We need to add information regarding SNO clusters to the description of the alert.

Document URL: 

[1] https://docs.openshift.com/container-platform/4.15/installing/installing_aws/installing-aws-account.html#installation-aws-permissions_installing-aws-account

Section Number and Name: 

* Required EC2 permissions for installation

Description of problem:

The permission ec2:DisassociateAddress is required for OCP 4.16+ installs, but it's missing from the official doc [1]; we would like to understand why/whether this permission is necessary.

level=info msg=Destroying the bootstrap resources...
...
level=error msg=Error: disassociating EC2 EIP (eipassoc-01e8cc3f06f2c2499): UnauthorizedOperation: You are not authorized to perform this operation. User: arn:aws:iam::301721915996:user/ci-op-0xjvtwb0-4e979-minimal-perm is not authorized to perform: ec2:DisassociateAddress on resource: arn:aws:ec2:us-east-1:301721915996:elastic-ip/eipalloc-0274201623d8569af because no identity-based policy allows the ec2:DisassociateAddress action. 


    

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-03-13-061822
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Create OCP cluster with permissions listed in the official doc.
    2.
    3.
    

Actual results:

See description. 
    

Expected results:

Cluster is created successfully.
    

Suggestions for improvement:

Add ec2:DisassociateAddress to `Required EC2 permissions for installation` in [1]

Additional info:

This impacts the permission list in ROSA Installer-Role as well.
    

This is a clone of issue OCPBUGS-35801. The following is the description of the original issue:

Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
at:
github.com/openshift/cluster-openshift-controller-manager-operator/pkg/operator/internalimageregistry/cleanup_controller.go:146 +0xd65

Description of the problem:

Impossible to create an extra partition on the main disk at installation time with OCP 4.15. It works perfectly with 4.14 and earlier.

I supply a custom MachineConfig manifest to do so; the behavior is that during installation, after the reboot, the screen is blank and the host has no networking (no route to host).

A slack thread explaining the issue with further debugging can be consulted in https://redhat-internal.slack.com/archives/C999USB0D/p1707991107757299

The bug seems to have been introduced by https://github.com/openshift/assisted-installer/pull/713, which allows for one less reboot at installation time; to do that, it implements part of the post-reboot code. This code runs BEFORE the extra partition is created, which creates the problem.

How reproducible:

Always

Steps to reproduce:

1. Create a 4.15 cluster with an extra manifest that creates an extra partition at the end of the main disk

Example of machineconfig (change device to match installation disk): 

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-extra-partition
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      disks:
        - device: /dev/vda
          partitions:
            - label: javi
              startMiB: 110000 # space left for the CoreOS partition.
              sizeMiB: 0 # Use all available space 

 

2. Proceed with the installation

Actual results:

After reboot, node never comes back up

Expected results:

Cluster installs without problem

Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/102

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Builds from a BuildConfig are failing on OCP 4.12.48. Developers are impacted since large files can't be cloned anymore within a BuildConfig.

Version-Release number of selected component (if applicable):

4.12.48

How reproducible:

Always

Steps to Reproduce:

The issue was fixed in version 4.12.45 as per https://issues.redhat.com/browse/OCPBUGS-23419, but it still persists in 4.12.48.

Actual results:

The build is failing.

Expected results:

The build should work without any issues.

Additional info:

Build fails with error:
```
Adding cluster TLS certificate authority to trust store
Cloning "https://<path>.git" ...
error: Downloading <github-repo>/projects/<path>.mp4 (70 MB)
Error downloading object: <github-repo>/projects/<path>.mp4 (a11ce74): Smudge error: Error downloading <github-repo>/projects/<path>.mp4 (a11ce745c147aa031dd96915716d792828ae6dd17c60115b675aba75342bb95a): batch request: missing protocol: "origin.git/info/lfs"
Errors logged to /tmp/build/inputs/.git/lfs/logs/20240430T112712.167008327.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: <github-repo>/projects/<path>.mp4: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
```


In 4.15, when the agent installer is run using the openshift-baremetal-installer binary with an install-config containing platform data, it attempts to contact libvirt to validate the provisioning network interfaces for the bootstrap VM. This should never happen, as the agent installer doesn't use the bootstrap VM.

It is possible that users in the process of converting from baremetal IPI to the agent installer might run into this issue, since they would already be using the openshift-baremetal-installer binary.

This is a clone of issue OCPBUGS-35430. The following is the description of the original issue:

Description of problem:

Query the CAPI provider for the timeouts needed during provisioning. This is optional to support.

The current default of 15 minutes is sufficient for normal CAPI installations. However, given how the current PowerVS CAPI provider waits for some resources to be created before creating the load balancers, it is possible that the LBs will not create before the 15 minute timeout. An issue was created to track this [1].

[1] kubernetes-sigs/cluster-api-provider-ibmcloud#1837

Description of problem:
Invalid volume size when restoring a VolumeSnapshot as a new PVC. The size unit is undefined and appears as TiB. Please check the attachment.

  1. oc get vs -n syq-test
    NAME READYTOUSE SOURCEPVC RESTORESIZE SNAPSHOTCLASS
    isilon-data1-snapshot-1 true isilon-data1 2219907496 isilon-snapclass-dev01
    isilon-datas-snapshot true isilon-datas 2219907496 isilon-snapclass-dev01
    test1-syq-snapshot true test1-syq 1Ki isilon-snapclass-dev01
    test1-syq-snapshot-11 true test1-syq 1106 isilon-snapclass-dev01

[Env]
dell isilon CSI volumes

  • Additional info:
    Issue doesn't happen with ODF

Please review the following PR: https://github.com/openshift/installer/pull/7818

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-35467. The following is the description of the original issue:

Description of problem:

openshift-install is creating user-defined tags (platform.aws.userTags) in subnets on AWS of BYO VPC (unmanaged VPC) deployment when using CAPA.

The documentation[1] for userTags state:
> A map of keys and values that the installation program adds as tags to all resources that it creates.

So when the network (VPC and subnets) is managed by the user (BYO VPC), the installer should not create additional tags on it, even when userTags are provided in install-config.yaml.

Investigating in CAPA codebase, the feature gate TagUnmanagedNetworkResources is enabled, and the subnet is propagating the userTags in the reconciliation loop[2].

[1] https://docs.openshift.com/container-platform/4.15/installing/installing_aws/installation-config-parameters-aws.html
[2] https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/pkg/cloud/services/network/subnets.go#L618

Version-Release number of selected component (if applicable):

4.16.0-ec.6-x86_64

How reproducible:

always

Steps to Reproduce:

- 1. create VPC and subnets using CloudFormation. Example template: https://github.com/openshift/installer/blob/master/upi/aws/cloudformation/01_vpc.yaml
- 2. create install-config with user-tags and subnet IDs to install the cluster:
- 3. create the cluster with feature gate for CAPI 

```
featureSet: CustomNoUpgrade
featureGates:
- ClusterAPIInstall=true
metadata:
  name: "${CLUSTER_NAME}"
platform:
  aws:
    region: us-east-1
    subnets:
    - subnet-0165c70573a45651c
    - subnet-08540527fffeae3e9
    userTags:
      x-red-hat-clustertype: installer
      x-red-hat-managed: "true"
```

 

Actual results:

installer/CAPA is setting the user-defined tags in unmanaged subnets

Expected results:

- installer/CAPA does not create userTags on unmanaged subnets 
- userTags is applied for regular/standard workflow (managed VPC) with CAPA

Additional info:

- Impacting on SD/ROSA: https://redhat-internal.slack.com/archives/CCPBZPX7U/p1717588837289489 

Description of problem:

Seen in this 4.15 to 4.16 CI run:

: [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers	0s
{  event [namespace/openshift-machine-api node/ip-10-0-62-147.us-west-2.compute.internal pod/cluster-baremetal-operator-574577fbcb-z8nd4 hmsg/bf39bb17ae - Back-off restarting failed container cluster-baremetal-operator in pod cluster-baremetal-operator-574577fbcb-z8nd4_openshift-machine-api(441969c1-b430-412c-b67f-4ae2f7797f4f)] happened 26 times
event [namespace/openshift-machine-api node/ip-10-0-62-147.us-west-2.compute.internal pod/cluster-baremetal-operator-574577fbcb-z8nd4 hmsg/bf39bb17ae - Back-off restarting failed container cluster-baremetal-operator in pod cluster-baremetal-operator-574577fbcb-z8nd4_openshift-machine-api(441969c1-b430-412c-b67f-4ae2f7797f4f)] happened 51 times}

The operator recovered, and the update completed, but it's still probably worth cleaning up whatever's happening to avoid alarming anyone.

Version-Release number of selected component (if applicable):

Seems like all recent CI runs that match this string touch 4.15, 4.16, or development branches:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=Back-off+restarting+failed+container+cluster-baremetal-operator+in+pod+cluster-baremetal-operator' | grep 'failures match'
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade-local-gateway (all) - 11 runs, 36% failed, 25% of failures match = 9% impact
periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 15 runs, 20% failed, 33% of failures match = 7% impact
pull-ci-openshift-kubernetes-master-e2e-aws-ovn-downgrade (all) - 3 runs, 67% failed, 50% of failures match = 33% impact
periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 15 runs, 27% failed, 25% of failures match = 7% impact
periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-azure-sdn-upgrade (all) - 32 runs, 91% failed, 7% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 40 runs, 25% failed, 20% of failures match = 5% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-ovn-upgrade-out-of-change (all) - 4 runs, 25% failed, 100% of failures match = 25% impact
periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade (all) - 40 runs, 8% failed, 33% of failures match = 3% impact
pull-ci-openshift-azure-file-csi-driver-operator-main-e2e-azure-ovn-upgrade (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-okd-4.15-e2e-aws-ovn-upgrade (all) - 7 runs, 43% failed, 33% of failures match = 14% impact
pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade (all) - 10 runs, 30% failed, 33% of failures match = 10% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-gcp-ovn-arm64 (all) - 6 runs, 33% failed, 50% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 11 runs, 18% failed, 50% of failures match = 9% impact

How reproducible:

Looks like ~8% impact.

Steps to Reproduce:

1.  Run ~20 exposed job types.
2. Check for ": [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers" failures with "Back-off restarting failed container cluster-baremetal-operator" messages.

Actual results:

~8% impact.

Expected results:

~0% impact.

Additional info:

Dropping into Loki for the run I'd picked:

{invoker="openshift-internal-ci/periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade/1737335551998038016"} | unpack | pod="cluster-baremetal-operator-574577fbcb-z8nd4" container="cluster-baremetal-operator" |~ "220 06:0"

includes:

E1220 06:04:18.794548       1 main.go:131] "unable to create controller" err="unable to put \"baremetal\" ClusterOperator in Available state: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"baremetal\": the object has been modified; please apply your changes to the latest version and try again" controller="Provisioning"
I1220 06:05:40.753364       1 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8080"
I1220 06:05:40.766200       1 webhook.go:104] WebhookDependenciesReady: everything ready for webhooks
I1220 06:05:40.780426       1 clusteroperator.go:217] "new CO status" reason="WaitingForProvisioningCR" processMessage="" message="Waiting for Provisioning CR on BareMetal Platform"
E1220 06:05:40.795555       1 main.go:131] "unable to create controller" err="unable to put \"baremetal\" ClusterOperator in Available state: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"baremetal\": the object has been modified; please apply your changes to the latest version and try again" controller="Provisioning"
I1220 06:08:21.730591       1 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8080"
I1220 06:08:21.747466       1 webhook.go:104] WebhookDependenciesReady: everything ready for webhooks
I1220 06:08:21.768138       1 clusteroperator.go:217] "new CO status" reason="WaitingForProvisioningCR" processMessage="" message="Waiting for Provisioning CR on BareMetal Platform"
E1220 06:08:21.781058       1 main.go:131] "unable to create controller" err="unable to put \"baremetal\" ClusterOperator in Available state: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"baremetal\": the object has been modified; please apply your changes to the latest version and try again" controller="Provisioning"

So some kind of ClusterOperator-modification race?
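If the race is simply a stale resourceVersion on the ClusterOperator status update, one conventional pattern (sketched below, not the actual cluster-baremetal-operator code) is to wrap the update in retry.RetryOnConflict so a conflict is re-fetched and retried instead of failing controller setup into container back-off:

```
package operator

import (
	"context"

	configv1 "github.com/openshift/api/config/v1"
	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/util/retry"
)

// setAvailable updates the "baremetal" ClusterOperator status, retrying on
// "the object has been modified" conflicts.
func setAvailable(ctx context.Context, client configclient.Interface, conds []configv1.ClusterOperatorStatusCondition) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Re-read the latest version on every attempt so the update is based
		// on the current resourceVersion.
		co, err := client.ConfigV1().ClusterOperators().Get(ctx, "baremetal", metav1.GetOptions{})
		if err != nil {
			return err
		}
		co.Status.Conditions = conds
		_, err = client.ConfigV1().ClusterOperators().UpdateStatus(ctx, co, metav1.UpdateOptions{})
		return err
	})
}
```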

Description of problem:

CredentialsRequest for Azure AD workload identity contains unnecessary permissions under `virtualMachines/extensions`.   Specifically write and delete.  
    

Version-Release number of selected component (if applicable):

4.14.0+
    

How reproducible:

Every time
    

Steps to Reproduce:

    1. Create a cluster without the CredentialsRequest permissions mentioned
    2. Scale machineset
    3. See no permission errors
    

Actual results:

We have unnecessary permissions, but still no errors
    

Expected results:

Still no permission errors after removal.
    

Additional info:

RHCOS doesn't leverage virtual machine extensions.  It appears as though the code path is dead.  
    

Description of problem:

    Ran into a problem with our testing this morning on a newly created ROKS cluster:
```
Error running /usr/bin/oc --namespace=e2e-test-oc-service-p4fz2 --kubeconfig=/tmp/configfile2694323048 create service nodeport mynodeport --tcp=8080:7777 --node-port=30000:
StdOut>
error: failed to create NodePort service: Service "mynodeport" is invalid: spec.ports[0].nodePort: Invalid value: 30000: provided port is already allocated
StdErr>
error: failed to create NodePort service: Service "mynodeport" is invalid: spec.ports[0].nodePort: Invalid value: 30000: provided port is already allocated
exit status 1
```

The port was already used by a different service. We would like to make a feature request for the test to pick the port number dynamically, so that if that port is taken it can choose an available one (see the sketch below).
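A sketch of the requested behaviour (not the origin test code): create the NodePort service without specifying a nodePort so the API server allocates a free one, then read the allocated value back from the returned object.

```
package e2ehelpers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// createNodePortService creates the service without a fixed nodePort and
// returns whichever port the API server allocated.
func createNodePortService(ctx context.Context, client kubernetes.Interface, ns string) (int32, error) {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "mynodeport"},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeNodePort,
			Selector: map[string]string{"app": "mynodeport"},
			Ports: []corev1.ServicePort{{
				Port:       8080,
				TargetPort: intstr.FromInt(7777),
				// NodePort left unset: the API server allocates a free one.
			}},
		},
	}
	created, err := client.CoreV1().Services(ns).Create(ctx, svc, metav1.CreateOptions{})
	if err != nil {
		return 0, err
	}
	return created.Spec.Ports[0].NodePort, nil
}
```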

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/machine-api-provider-powervs/pull/70

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The default value of the --parallelism option cannot be parsed to an int.

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-02-02-002725
    

How reproducible:

    reproduce with cmd copy-to-node
    

Steps to Reproduce:

    Cmd: "oc --namespace=e2e-test-mco-4zb88 --kubeconfig=/tmp/kubeconfig-3071675436 adm copy-to-node node/ip-10-0-17-85.ec2.internal --copy=/tmp/fetch-w637bgyv=/etc/mco-compressed-test-file",
            StdErr: "error: --parallelism must be either N or N%: strconv.ParseInt: parsing \"10%%\": invalid syntax",
    

Actual results:

default value of --parallelism cannot be parsed 

    

Expected results:

no error 
    

Additional info:

there is hack code that appends % to the default value

ref: https://github.com/openshift/oc/blob/79dc671bdaeafa74b92f14ad9f6d84e344608034/pkg/cli/admin/pernodepod/command.go#L75-L79

https://github.com/openshift/oc/blob/79dc671bdaeafa74b92f14ad9f6d84e344608034/pkg/cli/admin/pernodepod/command.go#L94

the err var percentParseErr should be used instead
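A minimal sketch of the intended parsing (not the actual oc code), showing the percent form handled with its own error variable:

```
package sketch

import (
	"fmt"
	"strconv"
	"strings"
)

// parseParallelism accepts either "N" or "N%". The reported bug is that the
// percent-suffixed default failed to parse; the percent branch must check its
// own parse error (percentParseErr) rather than the plain-integer one.
func parseParallelism(v string) (int64, bool, error) {
	if strings.HasSuffix(v, "%") {
		p, percentParseErr := strconv.ParseInt(strings.TrimSuffix(v, "%"), 10, 32)
		if percentParseErr != nil {
			return 0, false, fmt.Errorf("--parallelism must be either N or N%%: %w", percentParseErr)
		}
		return p, true, nil
	}
	p, err := strconv.ParseInt(v, 10, 32)
	if err != nil {
		return 0, false, fmt.Errorf("--parallelism must be either N or N%%: %w", err)
	}
	return p, false, nil
}
```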

This has been reported by lance5890 upstream: https://github.com/openshift/cluster-etcd-operator/issues/1237

Description of problem:

During master node removal (one out of 3), the etcd cert signer controller might still roll out a revision even though quorum will obviously be broken by that.

Important events:

08:06:26.674067       1 event.go:285] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-etcd-operator", Name:"etcd-operator", UID:"253380d4-7d65-496f-8214-ab89f7878550", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MasterNodeRemoved' Observed removal of master node node3

08:06:26.909780       1 base_controller.go:272] EtcdEndpointsController reconciliation failed: EtcdEndpointsController can't evaluate whether quorum is safe: CheckSafeToScaleCluster 3 nodes are required, but only 2 are available

08:06:27.005308       1 event.go:285] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-etcd-operator", Name:"etcd-operator", UID:"253380d4-7d65-496f-8214-ab89f7878550", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'SecretUpdated' Updated Secret/etcd-all-certs -n openshift-etcd because it changed

08:06:27.149860       1 base_controller.go:272] EtcdCertSignerController reconciliation failed: EtcdCertSignerController can't evaluate whether quorum is safe: CheckSafeToScaleCluster 3 nodes are required, but only 2 are available

Version-Release number of selected component (if applicable):

all versions where we introduced the quorum guard (> 4.12 current applicable).

How reproducible:

depends on the timing of the removal and the controller runs, but somewhat frequent.    

Steps to Reproduce:

    1. remove a master node
    2. wait for quorum loss / downtime due to revision rollout
    

Actual results:

quorum is lost and there is brief API downtime while the revision is rolled out

Expected results:

the revisioned secret should not be updated when quorum is about to be lost
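As a rough illustration only (hypothetical names, not the cluster-etcd-operator's actual code), the cert controller could gate the secret write on the same safety check that the endpoints controller already performs:

```
package sketch

import (
	"context"
	"fmt"
)

// QuorumChecker and SecretWriter are hypothetical interfaces used only to
// illustrate the desired ordering of the checks.
type QuorumChecker interface {
	CheckSafeToScaleCluster(ctx context.Context) error
}

type SecretWriter interface {
	WriteAllCerts(ctx context.Context) error
}

// syncAllCerts refuses to touch the revisioned etcd-all-certs secret when the
// scale-safety check fails, so no new revision is rolled out while quorum is
// at risk; the sync is simply retried later.
func syncAllCerts(ctx context.Context, safety QuorumChecker, secrets SecretWriter) error {
	if err := safety.CheckSafeToScaleCluster(ctx); err != nil {
		return fmt.Errorf("skipping cert rotation: %w", err)
	}
	return secrets.WriteAllCerts(ctx)
}
```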

Additional info:

    

Please review the following PR: https://github.com/openshift/operator-framework-operator-controller/pull/51

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Hosted control plane kube scheduler pods crashloop on clusters created with Kube 1.29 rebase     

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    always

Steps to Reproduce:

    1. Create a hosted cluster using 4.16 kube rebase code base
    2. Wait for the cluster to come up
    

Actual results:

    Cluster never comes up because kube scheduler pod crashloops

Expected results:

    Cluster comes up

Additional info:

    The kube scheduler configuration generated by the control plane operator is using the v1beta3 version of the configuration. That version is no longer included in Kubernetes v1.29

This is a clone of issue OCPBUGS-33973. The following is the description of the original issue:

Description of problem:

The network resource provisioning playbook for 4.15 dualstack UPI contains a task for adding an IPv6 subnet to the existing external router [1].
This task fails with:
- ansible-2.9.27-1.el8ae.noarch & ansible-collections-openstack-1.8.0-2.20220513065417.5bb8312.el8ost.noarch in OSP 16 env (RHEL 8.5) or
- openstack-ansible-core-2.14.2-4.1.el9ost.x86_64 & ansible-collections-openstack-1.9.1-17.1.20230621074746.0e9a6f2.el9ost.noarch in OSP 17 env (RHEL 9.2)

Besides that, we need a way to identify the resources belonging to a particular deployment, as they may interfere with an existing one.

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2024-01-22-160236

How reproducible:

Always

Steps to Reproduce:

1. Set the os_subnet6 in the inventory file for setting dualstack
2. Run the 4.15 network.yaml playbook

Actual results:

Playbook fails:
TASK [Add IPv6 subnet to the external router] ********************************** fatal: [localhost]: FAILED! => {"changed": false, "extra_data": {"data": null, "details": "Invalid input for external_gateway_info. Reason: Validation of dictionary's keys failed. Expected keys: {'network_id'} Provided keys: {'external_fixed_ips'}.", "response": "{\"NeutronError\": {\"type\": \"HTTPBadRequest\", \"message\": \"Invalid input for external_gateway_info. Reason: Validation of dictionary's keys failed. Expected keys: {'network_id'} Provided keys: {'external_fixed_ips'}.\", \"detail\": \"\"}}"}, "msg": "Error updating router 8352c9c0-dc39-46ed-94ed-c038f6987cad: Client Error for url: https://10.46.43.81:13696/v2.0/routers/8352c9c0-dc39-46ed-94ed-c038f6987cad, Invalid input for external_gateway_info. Reason: Validation of dictionary's keys failed. Expected keys: {'network_id'} Provided keys: {'external_fixed_ips'}."}

Expected results:

Successful playbook execution

Additional info:

The router can be created in two different tasks; the playbook [2] worked for me.

[1] https://github.com/openshift/installer/blob/1349161e2bb8606574696bf1e3bc20ae054e60f8/upi/openstack/network.yaml#L43
[2] https://file.rdu.redhat.com/juriarte/upi/network.yaml

Hello Team,

 

After the hard reboot of all nodes due to a power outage, a failed image pull by NTO prevents "ocp-tuned-one-shot.service" from starting, which results in a dependency failure for the kubelet and crio services:

------------

journalctl_--no-pager

Aug 26 17:07:46 ocp05 systemd[1]: Reached target The firstboot OS update has completed.
Aug 26 17:07:46 ocp05 resolv-prepender.sh[3577]: NM resolv-prepender: Starting download of baremetal runtime cfg image
Aug 26 17:07:46 ocp05 systemd[1]: Starting Writes IP address configuration so that kubelet and crio services select a valid node IP...
Aug 26 17:07:46 ocp05 systemd[1]: Starting TuneD service from NTO image...
Aug 26 17:07:46 ocp05 nm-dispatcher[3687]: NM resolv-prepender triggered by lo up.
Aug 26 17:07:46 ocp05 resolv-prepender.sh[3644]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf4faeb258c222ba4e04806fd3a7373d3bc1f43a66e141d4b7ece0307f597c72...
Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + [[ lo == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + '[' -z ']'
Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + echo 'Not a DHCP4 address. Ignoring.'
Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: Not a DHCP4 address. Ignoring.
Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + exit 0
Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + '[' -z '' ']'
Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + echo 'Not a DHCP6 address. Ignoring.'
Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: Not a DHCP6 address. Ignoring.
Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + exit 0
Aug 26 17:07:46 ocp05 bash[3655]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf4faeb258c222ba4e04806fd3a7373d3bc1f43a66e141d4b7ece0307f597c72...
Aug 26 17:07:46 ocp05 podman[3661]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26...
Aug 26 17:07:46 ocp05 podman[3661]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.112.227.10:53: server misbehaving
Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Main process exited, code=exited, status=125/n/a
Aug 26 17:07:46 ocp05 nm-dispatcher[3793]: NM resolv-prepender triggered by brtrunk up.
Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Failed with result 'exit-code'.
Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + [[ brtrunk == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + '[' -z ']'
Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + echo 'Not a DHCP4 address. Ignoring.'
Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: Not a DHCP4 address. Ignoring.
Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + exit 0
Aug 26 17:07:46 ocp05 systemd[1]: Failed to start TuneD service from NTO image.
Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Dependencies necessary to run kubelet.
Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Kubernetes Kubelet.
Aug 26 17:07:46 ocp05 systemd[1]: kubelet.service: Job kubelet.service/start failed with result 'dependency'.
Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Container Runtime Interface for OCI (CRI-O).
Aug 26 17:07:46 ocp05 systemd[1]: crio.service: Job crio.service/start failed with result 'dependency'.
Aug 26 17:07:46 ocp05 systemd[1]: kubelet-dependencies.target: Job kubelet-dependencies.target/start failed with result 'dependency'.
Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + '[' -z '' ']'
Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + echo 'Not a DHCP6 address. Ignoring.'
Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: Not a DHCP6 address. Ignoring.
Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + exit 0

-----------

-----------

$ oc get proxy config cluster  -oyaml
  status:
    httpProxy: http://proxy_ip:8080
    httpsProxy: http://proxy_ip:8080

$ cat /etc/mco/proxy.env
HTTP_PROXY=http://proxy_ip:8080
HTTPS_PROXY=http://proxy_ip:8080

-----------

-----------
× ocp-tuned-one-shot.service - TuneD service from NTO image
     Loaded: loaded (/etc/systemd/system/ocp-tuned-one-shot.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Mon 2024-08-26 17:07:46 UTC; 2h 30min ago
   Main PID: 3661 (code=exited, status=125)

Aug 26 17:07:46 ocp05 podman[3661]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.112.227.10:53: server misbehaving
-----------

  • Customer has a proxy configured in their environment. However, nodes cannot start after a hard reboot of all nodes, as it looks like NTO ignores the cluster-wide proxy settings. To resolve the NTO image pull issue, the customer has to include the proxy variables in /etc/systemd/system.conf manually.

This is a clone of issue OCPBUGS-23332. The following is the description of the original issue:

Description of problem:

Navigate to the Node overview and check the Utilization of CPU and memory; it shows something like "6.53 GiB available of 300 MiB total limit", which looks very confusing.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1. Navigate to Node overview
2. Check the Utilization of CPU and memory
3.

Actual results:


Expected results:


Additional info:


Description of the problem:

API tests, running from test-infra, set OPENSHIFT_VERSION=4.15.

We expect the service to return the latest stable version (x.y.z).

The returned version is 4.15.8-multi, which is not from the stable stream but from candidate, and should not be chosen.

This behaviour breaks the API tests because we expect the latest stable release to be picked when sending Major.Minor.

 

How reproducible:

 

Steps to reproduce:

1.

2.

3.

Actual results:

 

Expected results:

Please review the following PR: https://github.com/openshift/ovn-kubernetes/pull/1980

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

 

: [bz-Routing] clusteroperator/ingress should not change

Has been failing for over a month in the e2e-metal-ipi-sdn-bm-upgrade jobs 

 

I think this is because there are only two worker nodes in the BM environment and some HA services lose redundancy when one of the workers is rebooted.

In the medium term I hope to add another node to each cluster, but in the short term we should skip the test.

Description of problem:

When using the OpenShift Assisted Installer with a pull-secret password containing the `:` colon character, the installation fails.

    

Version-Release number of selected component (if applicable):

    OpenShift 4.15

    

How reproducible:

    Everytime
    

Steps to Reproduce:

    1. Attempt to install using the Agent-based installer with a pull-secret which includes a colon character.
   
   The following snippet of code appears to be hit when there is a colon within the user/password section of the pull-secret.
https://github.com/openshift/assisted-service/blob/d3dd2897d1f6fe108353c9241234a724b30262c2/internal/cluster/validations/validations.go#L132-L135
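A plausible shape for the fix (a hedged sketch, not the assisted-service code itself) is to split the decoded auth string on the first colon only:

```
package sketch

import (
	"fmt"
	"strings"
)

// splitAuth splits a decoded "user:password" pull-secret auth value on the
// first colon only, so passwords that themselves contain ':' remain intact.
func splitAuth(decoded string) (user, password string, err error) {
	parts := strings.SplitN(decoded, ":", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("auth field is not in user:password format")
	}
	return parts[0], parts[1], nil
}
```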

    

Actual results:

    Install fails

    

Expected results:

   Install succeeds

    

Additional info:


    

Description of problem:

The following test case failure is observed continuously in 4.14 to 4.15 and 4.15 to 4.16 upgrade CI runs.
[bz-Image Registry] clusteroperator/image-registry should not change condition/Available 

 

JobLink:https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-ppc64le/1746834772249808896

4.14 Image: registry.ci.openshift.org/ocp-ppc64le/release-ppc64le:4.14.0-0.nightly-ppc64le-2024-01-15-085349
4.15 Image: registry.ci.openshift.org/ocp-ppc64le/release-ppc64le:4.15.0-0.nightly-ppc64le-2024-01-15-042536

Recent periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade failure caused by

: [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers expand_less 	0s
{  event [namespace/openshift-machine-api node/ci-op-j666c60n-23cd9-nb7wr-master-1 pod/cluster-baremetal-operator-79b78c4548-n5vrt hmsg/b7cb271b13 - Back-off restarting failed container cluster-baremetal-operator in pod cluster-baremetal-operator-79b78c4548-n5vrt_openshift-machine-api(32835332-fc25-4ddf-84ce-d3aa447d3ce0)] happened 25 times}

Shows in Component Readiness as unknown component

/ ovn upgrade-minor amd64 gcp rt > Unknown> [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers

We should update testBackoffStartingFailedContainer to check for / use known namespaces, where a junit is created for each known namespace indicating pass, fail or flake.

The code already handles testBackoffStartingFailedContainerForE2ENamespaces

It looks as though we only check for known or e2e namespaces; we need to double-check that, if that is the case, we are OK with events from potentially unknown namespaces getting through.

We should also review Sippy pathological tests for results that don't contain `for ns/namespace` format and review if they need to be broken out as well.

For each test that we break out we need to map the new namespace specific test to the correct component in the test mapping repository

Please review the following PR: https://github.com/openshift/csi-node-driver-registrar/pull/59

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

If CCM is disabled on a cloud platform such as AWS, installation will continue until it fails with the ingress reporting LoadBalancerPending.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

1. Build image with pr openshift/cluster-cloud-controller-manager-operator#284,openshift/installer#7546,openshift/cluster-version-operator#979,openshift/machine-config-operator#3999 
2. Install cluster on aws with "baselineCapabilitySet: v4.14"
3.
    

Actual results:

Installation failed, ingress LoadBalancerPending.
$ oc get node              
NAME                                        STATUS   ROLES                  AGE   VERSION
ip-10-0-25-230.us-east-2.compute.internal   Ready    control-plane,master   86m   v1.28.3+20a5764
ip-10-0-3-101.us-east-2.compute.internal    Ready    worker                 78m   v1.28.3+20a5764
ip-10-0-46-198.us-east-2.compute.internal   Ready    control-plane,master   87m   v1.28.3+20a5764
ip-10-0-48-220.us-east-2.compute.internal   Ready    worker                 80m   v1.28.3+20a5764
ip-10-0-79-203.us-east-2.compute.internal   Ready    control-plane,master   86m   v1.28.3+20a5764
ip-10-0-95-83.us-east-2.compute.internal    Ready    worker                 78m   v1.28.3+20a5764
 
$ oc get co          
NAME                            VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                  4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   False       False         True       85m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.zhsun-aws1.qe.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.zhsun-aws1.qe.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
baremetal                       4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      84m
cloud-credential                4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      86m
cluster-autoscaler              4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      84m
config-operator                 4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      85m
console                         4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   False       True          False      79m     DeploymentAvailable: 0 replicas available for console deployment...
control-plane-machine-set       4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      81m
csi-snapshot-controller         4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      85m
dns                             4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      84m
etcd                            4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      83m
image-registry                  4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      78m
ingress                                                                                   False       True          True       78m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)
insights                        4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      78m
kube-apiserver                  4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      71m
kube-controller-manager         4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      82m
kube-scheduler                  4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      82m
kube-storage-version-migrator   4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      85m
machine-api                     4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      77m
machine-approver                4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      84m
machine-config                  4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      84m
marketplace                     4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      84m
monitoring                      4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      73m
network                         4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      86m
node-tuning                     4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      78m
openshift-apiserver             4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      71m
openshift-controller-manager    4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      75m
openshift-samples               4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      78m
service-ca                      4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False         False      85m
storage                         4.15.0-0.ci.test-2023-11-22-202439-ci-ln-d46q8qt-latest   True        False     

Expected results:

Tell users not to turn CCM off for cloud.

Additional info:

    

Description of problem:

The installer supports pre-rendering of the PerformanceProfile related manifests. However the MCO render is executed after the PerfProfile render and so the master and worker MachineConfigPools are created too late.

This causes the installation process to fail with:

Oct 18 18:05:25 localhost.localdomain bootkube.sh[537963]: I1018 18:05:25.968719       1 render.go:73] Rendering files into: /assets/node-tuning-bootstrap
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.008421       1 render.go:133] skipping "/assets/manifests/99_feature-gate.yaml" [1] manifest because of unhandled *v1.FeatureGate
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.013043       1 render.go:133] skipping "/assets/manifests/cluster-dns-02-config.yml" [1] manifest because of unhandled *v1.DNS
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.021978       1 render.go:133] skipping "/assets/manifests/cluster-ingress-02-config.yml" [1] manifest because of unhandled *v1.Ingress
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.023016       1 render.go:133] skipping "/assets/manifests/cluster-network-02-config.yml" [1] manifest because of unhandled *v1.Network
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.023160       1 render.go:133] skipping "/assets/manifests/cluster-proxy-01-config.yaml" [1] manifest because of unhandled *v1.Proxy
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.023445       1 render.go:133] skipping "/assets/manifests/cluster-scheduler-02-config.yml" [1] manifest because of unhandled *v1.Scheduler
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: I1018 18:05:26.024475       1 render.go:133] skipping "/assets/manifests/cvo-overrides.yaml" [1] manifest because of unhandled *v1.ClusterVersion
Oct 18 18:05:26 localhost.localdomain bootkube.sh[537963]: F1018 18:05:26.037467       1 cmd.go:53] no MCP found that matches performance profile node selector "node-role.kubernetes.io/master="

Version-Release number of selected component (if applicable):

4.14.0-rc.6

How reproducible:

Always

Steps to Reproduce:

1. Add an SNO PerformanceProfile to extra manifest in the installer. Node selector should be: "node-role.kubernetes.io/master="
2.
3.

Actual results:

no MCP found that matches performance profile node selector "node-role.kubernetes.io/master="

Expected results:

Installation completes

Additional info:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
 name: openshift-node-workload-partitioning-sno
spec:
 cpu:
   isolated: 4-X <- must match the topology of the node
   reserved: 0-3
 nodeSelector:
   node-role.kubernetes.io/master: ""

Please review the following PR: https://github.com/openshift/cluster-ingress-operator/pull/1020

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-router-551-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade/1752661032133726208

The funky job renaming done in these is breaking risk analysis. The disruption checking actually ran if you look in the logs, but we don't get far enough to generate the html template, I suspect because of the error returned by risk analysis.

Would be nice if both would work but I'd be happy just to get the disruption portion populated for now. I don't think the overall risk analysis will be easy.

Currently, when creating an Azure cluster, only the first node of the nodePool becomes ready and joins the cluster; all other Azure machines are stuck in the `Creating` state.

Description of problem:

The DaemonSet configuration causes any taints to be ignored; therefore the Operator executes on IBM Cloud Bare Metal nodes.

Version-Release number of selected component (if applicable):

IBM Cloud Infrastructure Services (formerly known as VPC Infrastructure Environment), using IBM Cloud Bare Metal profiles with either Gen2 (Intel Cascade Lake) or Gen3 (Intel Sapphire Rapids) hardware.
Special note - this refers to IBM Cloud Bare Metal, and NOT applicable to IBM Cloud Bare Metal (Classic) in the legacy Classic Infrastructure environment (AKA. Softlayer).    

How reproducible:

    Reproducible

Steps to Reproduce:

The IBM lab team found a bug that is causing errors on the bare metal worker nodes, and is requesting a patch to ibm-vpc-block-csi-driver.

The proposed solution: enforce a Namespace node selector that selects nodes whose instance-type does NOT contain the substring 'metal'. This will stop the Namespace's DaemonSet from scheduling the Operator on IBM Cloud Bare Metal nodes:

https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/manifests/01_namespace.yaml

```
kind: Namespace
apiVersion: v1
metadata:
  annotations:
    openshift.io/node-selector: 'node.openshift.io/instance-type notin (metal)'
```

Actual results:

    

Expected results:

Enforce a Namespace node selector that selects nodes whose instance-type does NOT contain the substring 'metal'. This will stop the Namespace's DaemonSet from scheduling the Operator on IBM Cloud Bare Metal nodes:

https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/manifests/01_namespace.yaml

Additional info:

03802506

Description of problem:

1) The customer tags an image with a tag name that includes # (hashtag)

uk302-img-app-j:v0.6.12-build0000#000

2) When the customer uses OADP to back up images, they get the error below

error executing custom action(groupResource=imagestream.image.openshift.io namespace=dbp-p0010001, name=uk302-image-app-j): rpc error: code= Unknown= Invalid destination name udistribution-s3-c9814a92-67a4-4251-bd0d-142dfc4d3c80://dbp-p0010001/uk302-image-app-j:v0.6.12-build0000#00: invalid reference format

3) When checking the source code below, we found that there are checks on the tag name; it seems # (hashtag) is not allowed by the regexp check

https://github.com/openshift/openshift-velero-plugin/blob/83f5067b1e04d740cd79ee0046e24283a8d7a184/velero-plugins/imagecopy/imagestream.go#L138

func copyImage(log logr.Logger, src, dest string, copyOptions *copy.Options) ([]byte, error) {
    policyContext, err := getPolicyContext()
    if err != nil {
        return []byte{}, fmt.Errorf("Error loading trust policy: %v", err)
    }
    defer policyContext.Destroy()
    srcRef, err := alltransports.ParseImageName(src)
    if err != nil {
        return []byte{}, fmt.Errorf("Invalid source name %s: %v", src, err)
    }
    destRef, err := alltransports.ParseImageName(dest)
    if err != nil {
        return []byte{}, fmt.Errorf("Invalid destination name %s: %v", dest, err)
    }

https://github.com/containers/image/blob/main/docker/reference/regexp.go#L111

const (
    // alphaNumeric defines the alpha numeric atom, typically a
    // component of names. This only allows lower case characters and digits.
    alphaNumeric = `[a-z0-9]+`

    // separator defines the separators allowed to be embedded in name
    // components. This allow one period, one or two underscore and multiple
    // dashes. Repeated dashes and underscores are intentionally treated
    // differently. In order to support valid hostnames as name components,
    // supporting repeated dash was added. Additionally double underscore is
    // now allowed as a separator to loosen the restriction for previously
    // supported names.
    separator = `(?:[._]|__|[-]*)`

    // repository name to start with a component as defined by DomainRegexp
    // and followed by an optional port.
    domainComponent = `(?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9])`

    // The string counterpart for TagRegexp.
    tag = `[\w][\w.-]{0,127}`

    // The string counterpart for DigestRegexp.
    digestPat = `[A-Za-z][A-Za-z0-9]*(?:[-_+.][A-Za-z][A-Za-z0-9]*)*[:][[:xdigit:]]{32,}`

    // The string counterpart for IdentifierRegexp.
    identifier = `([a-f0-9]{64})`

    // The string counterpart for ShortIdentifierRegexp.
    shortIdentifier = `([a-f0-9]{6,64})`
Expected results:

The customer wants to know whether this should be a bug: when doing `oc tag`, we should have some validation of the tag name to prevent the # (hashtag) or other disallowed characters from being set in the image tag, which causes unexpected issues when using OADP or other tools. A minimal sketch of such a check follows below.

Please have a check, thank you!

Regards
Jacob
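A minimal sketch of such a client-side pre-check (an assumption about where validation could live, not existing oc behaviour), using the same tag grammar quoted above:

```
package sketch

import (
	"fmt"
	"regexp"
)

// tagPattern mirrors the TagRegexp string quoted above: one word character
// followed by up to 127 word characters, dots or dashes. '#' is not allowed.
var tagPattern = regexp.MustCompile(`^[\w][\w.-]{0,127}$`)

// validateTag rejects tags such as "v0.6.12-build0000#000" before they are
// written to an imagestream and later break image copies.
func validateTag(tag string) error {
	if !tagPattern.MatchString(tag) {
		return fmt.Errorf("invalid image tag %q: must match %s", tag, tagPattern.String())
	}
	return nil
}
```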

This is a clone of issue OCPBUGS-43112. The following is the description of the original issue:

HCP fails to deploy with SR CSI driver failing to pull its image

Description of problem:

Found typos in the 4.14/4.15 branches when reviewing PR: https://github.com/openshift/cluster-monitoring-operator/pull/2073

example typo in 4.14 branch

https://github.com/openshift/cluster-monitoring-operator/blob/release-4.14/pkg/manifests/manifests_test.go

1. systemd unit pattern valiation error
valiation should be validation

2. enable systemd collector with invalid units parttern
parttern should be pattern

3. t.Fatalf("invalid secret namepace, got %s, want %s", s.Namespace, "openshift-user-workload-monitoring")
namepace should be namespace

4. spread contraints
should be spread constraints

Version-Release number of selected component (if applicable):

4.14/4.15

How reproducible:

always

I remember we've added golang-lint to the repo; it should find these errors.

Please review the following PR: https://github.com/openshift/csi-external-resizer/pull/154

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

AWS EBS, Azure Disk and Azure File operators are now built from cmd/ and pkg/; there is no code used from the legacy/ dir and we should remove it.

There are test manifests in the legacy/ directory that are still used! They need to be moved somewhere else, and Dockerfile.*.test and the CI steps must be updated!

Technically, this is a copy of STOR-1797, but we need a bug to be able to backport aws-ebs changes to 4.15 and not use legacy/ directory there too.

[sig-arch][Late] collect certificate data [Suite:openshift/conformance/parallel]

Test is currently making the Unknown component red, but this test should be aligned to the kube-apiserver component. Looks like two others in the same file should be as well.

Assisted installer agent's api_vip check doesn't accept multiple headers (src). This poses an issue when there are different ignition servers (e.g. hypershift) that expect different headers.

 

Latest use case: Hypershift's ignition server expects these headers: the NodePool name and targetconfigversionhash.
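A rough sketch of the shape of the change (hypothetical code, not the agent's actual check): the ignition reachability probe would take an arbitrary set of request headers rather than a single one:

```
package sketch

import (
	"context"
	"fmt"
	"net/http"
)

// checkIgnitionURL probes an ignition endpoint with any number of request
// headers, so servers that require several (for example a node pool name and
// a target config version) can still be validated.
func checkIgnitionURL(ctx context.Context, url string, headers map[string]string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 400 {
		return fmt.Errorf("ignition endpoint returned %s", resp.Status)
	}
	return nil
}
```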

Description of problem:

When using OpenShift 4.15 on ROSA Hosted Control Planes, after disabling the ImageRegistry, the default secrets and service accounts are still being created.

This functionality should not be occurring once the registry is removed:

https://docs.openshift.com/rosa/nodes/pods/nodes-pods-secrets.html#auto-generated-sa-token-secrets_nodes-pods-secrets   

Version-Release number of selected component (if applicable):

4.15   

How reproducible:

Always

Steps to Reproduce:

    1. Deploy ROSA 4.15 HCP Cluster
    2. Set spec.managementState = "Removed" on the cluster.config.imageregistry.operator.openshift.io. The image registry will be removed
    3. Create a new OpenShift Project
    4. Observe the builder, default and deployer ServiceAccounts and their associated Secrets are still created
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/openshift-apiserver/pull/409

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Install a private cluster where the base domain set in install-config.yaml is the same as an existing CIS domain name.
After destroying the private cluster, the DNS resource records remain.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1. Create a DNS service instance, setting its domain to "ibmcloud.qe.devcluster.openshift.com". Note: this domain name is also being used in another existing CIS domain.
2. Install a private ibmcloud cluster; the base domain set in install-config is "ibmcloud.qe.devcluster.openshift.com"
3. Destroy the cluster
4. Check the remaining DNS records

Actual results:

$ ibmcloud dns resource-records 5f8a0c4d-46c2-4daa-9157-97cb9ad9033a -i preserved-openshift-qe-private | grep ci-op-17qygd06-23ac4
api-int.ci-op-17qygd06-23ac4.ibmcloud.qe.devcluster.openshift.com 
*.apps.ci-op-17qygd06-23ac4.ibmcloud.qe.devcluster.openshift.com 
api.ci-op-17qygd06-23ac4.ibmcloud.qe.devcluster.openshift.com

Expected results:

No more dns records about the cluster

Additional info:

$ ibmcloud dns zones -i preserved-openshift-qe-private | awk '{print $2}'   
Name
private-ibmcloud.qe.devcluster.openshift.com 
private-ibmcloud-1.qe.devcluster.openshift.com 
ibmcloud.qe.devcluster.openshift.com  

$ ibmcloud cis domains
Name
ibmcloud.qe.devcluster.openshift.com

When using private-ibmcloud.qe.devcluster.openshift.com or private-ibmcloud-1.qe.devcluster.openshift.com as the domain there is no such issue; when using ibmcloud.qe.devcluster.openshift.com as the domain, the DNS records remain.

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.

2.

3.

 

Actual results:

 

Expected results:

 

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure 
  2. customer issue / SD
  3. internal RedHat testing failure

 

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

 

If it is a CI failure:

 

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

 

If it is a customer / SD issue:

 

  • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
  • Don’t presume that Engineering has access to Salesforce.
  • Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment.  The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
  • Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).  
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and  dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, labels with “sbr-untriaged”
  • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

Description of problem:


There is one pod of the metal3 operator in a constant failure state. The cluster was acting as a Hub cluster with ACM + GitOps for SNO installation. It was working well for a few days, until this point when no other sites could be deployed.

oc get pods -A | grep metal3
openshift-machine-api                              metal3-64cf86fb8b-fg5b9                                           3/4     CrashLoopBackOff   35 (108s ago)   155m
openshift-machine-api                              metal3-baremetal-operator-84875f859d-6kj9s                        1/1     Running            0               155m
openshift-machine-api                              metal3-image-customization-57f8d4fcd4-996hd                       1/1     Running            0               5h

    

Version-Release number of selected component (if applicable):

OCP version: 4.16.ec5
    

How reproducible:

Once it starts to fail, it does not recover.
    

Steps to Reproduce:

    1. Unclear. Install Hub cluster with ACM+GitOps
    2. (Perhaps: update the AgentServiceConfig)
    

Actual results:

Pod crashing and installation of spoke cluster fails
    

Expected results:

Pod running and installation of spoke cluster succeeds.
    

Additional info:

Logs of metal3-ironic-inspector:

`[kni@infra608-1 ~]$ oc logs pods/metal3-64cf86fb8b-fg5b9 -c metal3-ironic-inspector
+ CONFIG=/etc/ironic-inspector/ironic-inspector.conf
+ export IRONIC_INSPECTOR_ENABLE_DISCOVERY=false
+ IRONIC_INSPECTOR_ENABLE_DISCOVERY=false
+ export INSPECTOR_REVERSE_PROXY_SETUP=true
+ INSPECTOR_REVERSE_PROXY_SETUP=true
+ . /bin/tls-common.sh
++ export IRONIC_CERT_FILE=/certs/ironic/tls.crt
++ IRONIC_CERT_FILE=/certs/ironic/tls.crt
++ export IRONIC_KEY_FILE=/certs/ironic/tls.key
++ IRONIC_KEY_FILE=/certs/ironic/tls.key
++ export IRONIC_CACERT_FILE=/certs/ca/ironic/tls.crt
++ IRONIC_CACERT_FILE=/certs/ca/ironic/tls.crt
++ export IRONIC_INSECURE=true
++ IRONIC_INSECURE=true
++ export 'IRONIC_SSL_PROTOCOL=-ALL +TLSv1.2 +TLSv1.3'
++ IRONIC_SSL_PROTOCOL='-ALL +TLSv1.2 +TLSv1.3'
++ export 'IPXE_SSL_PROTOCOL=-ALL +TLSv1.2 +TLSv1.3'
++ IPXE_SSL_PROTOCOL='-ALL +TLSv1.2 +TLSv1.3'
++ export IRONIC_VMEDIA_SSL_PROTOCOL=ALL
++ IRONIC_VMEDIA_SSL_PROTOCOL=ALL
++ export IRONIC_INSPECTOR_CERT_FILE=/certs/ironic-inspector/tls.crt
++ IRONIC_INSPECTOR_CERT_FILE=/certs/ironic-inspector/tls.crt
++ export IRONIC_INSPECTOR_KEY_FILE=/certs/ironic-inspector/tls.key
++ IRONIC_INSPECTOR_KEY_FILE=/certs/ironic-inspector/tls.key
++ export IRONIC_INSPECTOR_CACERT_FILE=/certs/ca/ironic-inspector/tls.crt
++ IRONIC_INSPECTOR_CACERT_FILE=/certs/ca/ironic-inspector/tls.crt
++ export IRONIC_INSPECTOR_INSECURE=true
++ IRONIC_INSPECTOR_INSECURE=true
++ export IRONIC_VMEDIA_CERT_FILE=/certs/vmedia/tls.crt
++ IRONIC_VMEDIA_CERT_FILE=/certs/vmedia/tls.crt
++ export IRONIC_VMEDIA_KEY_FILE=/certs/vmedia/tls.key
++ IRONIC_VMEDIA_KEY_FILE=/certs/vmedia/tls.key
++ export IPXE_CERT_FILE=/certs/ipxe/tls.crt
++ IPXE_CERT_FILE=/certs/ipxe/tls.crt
++ export IPXE_KEY_FILE=/certs/ipxe/tls.key
++ IPXE_KEY_FILE=/certs/ipxe/tls.key
++ export RESTART_CONTAINER_CERTIFICATE_UPDATED=false
++ RESTART_CONTAINER_CERTIFICATE_UPDATED=false
++ export MARIADB_CACERT_FILE=/certs/ca/mariadb/tls.crt
++ MARIADB_CACERT_FILE=/certs/ca/mariadb/tls.crt
++ export IPXE_TLS_PORT=8084
++ IPXE_TLS_PORT=8084
++ mkdir -p /certs/ironic
++ mkdir -p /certs/ironic-inspector
++ mkdir -p /certs/ca/ironic
mkdir: cannot create directory '/certs/ca/ironic': Permission denied

    

Description of problem:

Agent-based installation is stuck on the booting screen for the arm64 SNO cluster.

The installer should validate the architecture set by the user in install-config.yaml against the payload image being used.
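A minimal sketch of the kind of validation being asked for (hypothetical helper, not the installer's actual code):

```
package sketch

import "fmt"

// validateArchitecture compares the control-plane architecture requested in
// install-config.yaml with the architecture of the release payload that will
// be embedded in the agent ISO, and fails fast on a mismatch instead of
// letting the ISO hang at boot.
func validateArchitecture(installConfigArch, payloadArch string) error {
	if installConfigArch != "" && payloadArch != "" && installConfigArch != payloadArch {
		return fmt.Errorf("install-config architecture %q does not match release payload architecture %q",
			installConfigArch, payloadArch)
	}
	return nil
}
```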

Version-Release number of selected component (if applicable):

4.13

How reproducible:

100%

Steps to Reproduce:

[Fixed original version]

1. Create agent ISO with the amd64 payload
2. Boot the created ISO on arm64 server
3. Monitor the booting screen for error

[Generalized]

1. Set the install-config.yaml controlPlane.architecture to arm64
2. Try to install with an

Actual results:

The installation is currently stuck on the initial booting screen.

Expected results:

The SNO cluster should be installed without any issues.

Additional info:

Compact cluster installation was successful, here is the prow ci link: 

https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.13-arm64-nightly-baremetal-compact-agent-ipv4-static-connected-p1-f7/1665833590451081216/artifacts/baremetal-compact-agent-ipv4-static-connected-p1-f7/baremetal-lab-agent-install/build-log.txt 

This is a clone of issue OCPBUGS-34699. The following is the description of the original issue:

Description of problem:

In the RBAC which is set up for networkTypes other than OVNKubernetes, the cluster-network-operator role allows access to a configmap named "openshift-service-ca.crt", but the configmap which is actually used is named "root-ca".

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

PR introduced a change causing failures in all SDN jobs.

We need to revert the change and then update the test to allow this state for SDN configured clusters.

Once the test is fixed we can reintroduce the PR and validate with payload-blocking test

Description of problem:

If two clusters share a single OpenStack project, cloud-provider-openstack won't distinguish type=LoadBalancer Services between them if they have the same namespace name and service name.

https://github.com/kubernetes/cloud-provider-openstack/issues/2241

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Deploy 2 clusters.
2. Create LoadBalancer Services of the same name in default namespaces of both clusters.

Actual results:

cloud-provider-openstack fights over ownership of the LB.

Expected results:

LBs are distinguished.
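One common way to distinguish them (a sketch assuming the provider's cluster-name setting is available; not necessarily the upstream fix) is to include the cluster name in the derived load balancer name:

```
package sketch

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// loadBalancerName derives the Octavia LB name from the cluster name as well
// as the Service namespace and name, so two clusters sharing one OpenStack
// project stop fighting over the same load balancer.
func loadBalancerName(clusterName string, svc *corev1.Service) string {
	return fmt.Sprintf("kube_service_%s_%s_%s", clusterName, svc.Namespace, svc.Name)
}
```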

Additional info:

 

Description of problem

The cluster-ingress-operator repository vendors controller-runtime v0.16.3, which uses Kubernetes 1.28 packages. OpenShift 4.16 is based on Kubernetes 1.29.

Version-Release number of selected component (if applicable)

4.16.

How reproducible

Always.

Steps to Reproduce

Check https://github.com/openshift/cluster-ingress-operator/blob/release-4.16/go.mod.

Actual results

The sigs.k8s.io/controller-runtime package is at v0.16.3.

Expected results

The sigs.k8s.io/controller-runtime package is at v0.17.0 or newer.

Additional info

https://github.com/openshift/cluster-ingress-operator/pull/1016 already bumped the k8s.io/* packages to v0.29.0, but ideally the controller-runtime package should be bumped too. The controller-runtime v0.17 release includes some breaking changes, such as the removal of apiutil.NewDiscoveryRESTMapper; see the release notes at https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.17.0.
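For callers that used the removed helper, the bump typically means switching to the dynamic REST mapper. A hedged sketch (assuming controller-runtime v0.17's apiutil API; signatures should be verified against the release notes):

```
package sketch

import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client/apiutil"
)

// newRESTMapper builds a REST mapper with the helper that remains available
// after the removal of apiutil.NewDiscoveryRESTMapper.
func newRESTMapper(cfg *rest.Config) (meta.RESTMapper, error) {
	httpClient, err := rest.HTTPClientFor(cfg)
	if err != nil {
		return nil, err
	}
	return apiutil.NewDynamicRESTMapper(cfg, httpClient)
}
```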

Description of problem:

After running ./openshift-install destroy cluster, the TagCategory still exists.

# ./openshift-install destroy cluster --dir cluster --log-level debug
DEBUG OpenShift Installer 4.15.0-0.nightly-2023-12-18-220750
DEBUG Built from commit 2b894776f1653ab818e368fa625019a6de82a8c7
DEBUG Power Off Virtual Machines
DEBUG Powered off                                   VirtualMachine=sgao-devqe-spn2w-master-2
DEBUG Powered off                                   VirtualMachine=sgao-devqe-spn2w-master-1
DEBUG Powered off                                   VirtualMachine=sgao-devqe-spn2w-master-0
DEBUG Powered off                                   VirtualMachine=sgao-devqe-spn2w-worker-0-kpg46
DEBUG Powered off                                   VirtualMachine=sgao-devqe-spn2w-worker-0-w5rrn
DEBUG Delete Virtual Machines
INFO Destroyed                                     VirtualMachine=sgao-devqe-spn2w-rhcos-generated-region-generated-zone
INFO Destroyed                                     VirtualMachine=sgao-devqe-spn2w-master-2
INFO Destroyed                                     VirtualMachine=sgao-devqe-spn2w-master-1
INFO Destroyed                                     VirtualMachine=sgao-devqe-spn2w-master-0
INFO Destroyed                                     VirtualMachine=sgao-devqe-spn2w-worker-0-kpg46
INFO Destroyed                                     VirtualMachine=sgao-devqe-spn2w-worker-0-w5rrn
DEBUG Delete Folder
INFO Destroyed                                     Folder=sgao-devqe-spn2w
DEBUG Delete                                        StoragePolicy=openshift-storage-policy-sgao-devqe-spn2w
INFO Destroyed                                     StoragePolicy=openshift-storage-policy-sgao-devqe-spn2w
DEBUG Delete                                        Tag=sgao-devqe-spn2w
INFO Deleted                                       Tag=sgao-devqe-spn2w
DEBUG Delete                                        TagCategory=openshift-sgao-devqe-spn2w
INFO Deleted                                       TagCategory=openshift-sgao-devqe-spn2w
DEBUG Purging asset "Metadata" from disk
DEBUG Purging asset "Master Ignition Customization Check" from disk
DEBUG Purging asset "Worker Ignition Customization Check" from disk
DEBUG Purging asset "Terraform Variables" from disk
DEBUG Purging asset "Kubeconfig Admin Client" from disk
DEBUG Purging asset "Kubeadmin Password" from disk
DEBUG Purging asset "Certificate (journal-gatewayd)" from disk
DEBUG Purging asset "Cluster" from disk
INFO Time elapsed: 29s
INFO Uninstallation complete!

# govc tags.category.ls | grep sgao
openshift-sgao-devqe-spn2w

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2023-12-18-220750

How reproducible:

    always

Steps to Reproduce:

    1. IPI install OCP on vSphere
    2. Destroy cluster installed, check TagCategory

Actual results:

    TagCategory still exist

Expected results:

    TagCategory should be deleted

Additional info:

    Also reproduced with openshift-install-linux-4.14.0-0.nightly-2023-12-20-184526 and 4.13.0-0.nightly-2023-12-21-194724, while 4.12.0-0.nightly-2023-12-21-162946 does not have this issue.

Description of problem:

    Missing Source column header in PVC > VolumeSnapshots tab

Version-Release number of selected component (if applicable):

    Cluster 4.10, 4.14, 4.16

How reproducible:

    Yes

Steps to Reproduce:

    1. Create a PVC i.e. "my-pvc"
    2. Create a Pod and bind it to the "my-pvc"
    3. Create a VolumeSnapshots and associate it with the "my-pvc"
    4. Go to the PVC detail > VolumeSnapshots tab

Actual results:

    The Source column header is not displayed

Expected results:

    
The Source column header should be displayed

Additional info:

    

Please review the following PR: https://github.com/openshift/azure-disk-csi-driver/pull/65

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When a namespace has a Resource Quota applied to it, the Workload graphs in the Observe view do not render properly.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Create a new project/namespace
2. Apply the following Resource Quota (just a sample) to it
```
kind: ResourceQuota
apiVersion: v1
metadata:
  name: staging-workshop-quota
spec:
  hard:
    limits.cpu: '3'
    limits.memory: 3Gi
    pods: '10'
```
3. From the Developer Console, access the Observe view
4. From the Dashboard list, select the `Kubernetes / Compute Resources / Namespaces (Workloads)` option

Actual results:

The Graph is not rendered (see attached screenshot)

Expected results:

The Graph should render even with no data point

Additional info:

When you have a Resource Quota applied to the namespace you can try this query to see the `NaN` value returned.

```
curl -G -s -k -H "Authorization: Bearer $(oc whoami -t)" 'https://thanos-querier-openshift-monitoring.apps.cluster-your cluster domain here/api/v1/query' --data-urlencode 'query=scalar(kube_resourcequota{cluster="", namespace="user9-staging", type="hard",resource="requests.memory"})'
```

sample response
```
{"status":"success","data":{"resultType":"scalar","result":[1682600794.396,"NaN"]}}
```

Description of the problem:

Installation of a cluster using OCP image 4.15.0-rc.0 with an HTTP proxy configuration failed with:

"Control plane was not installed

3/3 control plane nodes failed to install. Please check the installation logs for more information."
and
"error Host master-0-1: updated status from installing-in-progress to error (Host failed to install because its installation stage Waiting for control plane took longer than expected 1h0m0s)"
After looking at the master-0-1 node journalctl log, we found the error:
"Dec 21 00:54:29 master-0-1 kubelet.sh[5111]: E1221 00:54:29.290568    5111 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=etcd pod=etcd-bootstrap-member-master-0-1_openshift-etcd(97cad44a9feb70b1091eaa3fb1e565ca)\"" pod="openshift-etcd/etcd-bootstrap-member-master-0-1" podUID="97cad44a9feb70b1091eaa3fb1e565ca"
"

HTTP Proxy configuration works fine with OCP images  4.14.5 and 4.13.26
Steps to reproduce:

1. Setup HTTP Proxy server on hypervisor using quay.io/sameersbn/squid:3.5.27-2

2. Create the cluster and get to Host Discovery

3. Press Add Host. Fill out SSH public key

4. Select Show proxy settings and fill out

HTTP proxy URL  and No proxy domains 


5. Generate the ISO image, download it, and boot 3 master and 2 worker nodes.

6. Continue regular cluster installation

Actual results:

Installation failed on 69%

Expected results:

Installation passed

Please review the following PR: https://github.com/openshift/csi-driver-manila-operator/pull/227

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33913. The following is the description of the original issue:

Description of problem

CI is occasionally bumping into failures like:

: [sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial] expand_less	53m22s
{  fail [github.com/openshift/origin/test/e2e/upgrade/upgrade.go:186]: during upgrade to registry.build05.ci.openshift.org/ci-op-kj8vc4dt/release@sha256:74bc38fc3a1d5b5ac8e84566d54d827c8aa88019dbdbf3b02bef77715b93c210: the "master" pool should be updated before the CVO reports available at the new version
Ginkgo exit error 1: exit with code 1}

where the machine-config operator is rolling the control-plane MachineConfigPool after the ClusterVersion update completes:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-azure-ovn-upgrade/1791442450049404928/artifacts/e2e-azure-ovn-upgrade/gather-extra/artifacts/machineconfigpools.json | jq -r '.items[] | select(.metadata.name == "master").status | [.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message] | sort[]'
2024-05-17T12:57:04Z RenderDegraded=False : 
2024-05-17T12:58:35Z Degraded=False : 
2024-05-17T12:58:35Z NodeDegraded=False : 
2024-05-17T15:13:22Z Updated=True : All nodes are updated with MachineConfig rendered-master-4fcadad80c9941813b00ca7e3eef8e69
2024-05-17T15:13:22Z Updating=False : 
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-azure-ovn-upgrade/1791442450049404928/artifacts/e2e-azure-ovn-upgrade/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.history[0].completionTime'
2024-05-17T14:15:22Z

Because of changes to registry pull secrets:

$ dump() {
> curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-azure-ovn-upgrade/1791442450049404928/artifacts/e2e-azure-ovn-upgrade/gather-extra/artifacts/machineconfigs.json | jq -r ".items[] | select(.metadata.name == \"$1\").spec.config.storage.files[] | select(.path == \"/etc/mco/internal-registry-pull-secret.json\").contents.source" | python3 -c 'import urllib.parse, sys; print(urllib.parse.unquote(sys.stdin.read()).split(",", 1)[-1])' | jq -c '.auths | to_entries[]'
> }
$ diff -u0 <(dump rendered-master-d6a8cd53ae132250832cc8267e070af6) <(dump rendered-master-4fcadad80c9941813b00ca7e3eef8e69) | sed 's/"value":.*/.../'
--- /dev/fd/63  2024-05-17 12:28:37.882351026 -0700
+++ /dev/fd/62  2024-05-17 12:28:37.883351026 -0700
@@ -1 +1 @@
-{"key":"172.30.124.169:5000",...
+{"key":"172.30.124.169:5000",...
@@ -3,3 +3,3 @@
-{"key":"default-route-openshift-image-registry.apps.ci-op-kj8vc4dt-6c39f.ci2.azure.devcluster.openshift.com",...
-{"key":"image-registry.openshift-image-registry.svc.cluster.local:5000",...
-{"key":"image-registry.openshift-image-registry.svc:5000",...
+{"key":"default-route-openshift-image-registry.apps.ci-op-kj8vc4dt-6c39f.ci2.azure.devcluster.openshift.com",...
+{"key":"image-registry.openshift-image-registry.svc.cluster.local:5000",...
+{"key":"image-registry.openshift-image-registry.svc:5000",...

Version-Release number of selected component (if applicable)

Seen in 4.16-to-4.16 Azure update CI. Unclear what the wider scope is.

How reproducible

Sippy reports Success Rate: 94.27% post regression, so a rare race.

But using CI search to pick jobs with 10 or more runs over the past 2 days:

$ w3m -dump -cols 200 'https://search.dptools.openshift.org/?maxAge=48h&type=junit&search=master.*pool+should+be+updated+before+the+CVO+reports+available' | grep '[0-9][0-9] runs.*failures match' | sort
periodic-ci-openshift-release-master-ci-4.16-e2e-azure-ovn-upgrade (all) - 52 runs, 50% failed, 12% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.16-e2e-gcp-ovn-upgrade (all) - 80 runs, 20% failed, 25% of failures match = 5% impact
periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade (all) - 82 runs, 21% failed, 59% of failures match = 12% impact
periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade (all) - 80 runs, 53% failed, 14% of failures match = 8% impact
periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-sdn-upgrade (all) - 50 runs, 12% failed, 50% of failures match = 6% impact
pull-ci-openshift-cluster-monitoring-operator-master-e2e-aws-ovn-upgrade (all) - 14 runs, 21% failed, 33% of failures match = 7% impact
pull-ci-openshift-cluster-network-operator-master-e2e-azure-ovn-upgrade (all) - 11 runs, 36% failed, 75% of failures match = 27% impact
pull-ci-openshift-cluster-version-operator-master-e2e-agnostic-ovn-upgrade-out-of-change (all) - 11 runs, 18% failed, 100% of failures match = 18% impact
pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade (all) - 19 runs, 21% failed, 25% of failures match = 5% impact
pull-ci-openshift-machine-config-operator-master-e2e-azure-ovn-upgrade-out-of-change (all) - 21 runs, 48% failed, 50% of failures match = 24% impact
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade (all) - 16 runs, 81% failed, 15% of failures match = 13% impact
pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade (all) - 16 runs, 25% failed, 75% of failures match = 19% impact
pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade (all) - 26 runs, 35% failed, 67% of failures match = 23% impact

shows some flavors, like pull-ci-openshift-cluster-network-operator-master-e2e-azure-ovn-upgrade, up at a 27% hit rate.

Steps to Reproduce

Unclear.

Actual results

Pull secret changes after the ClusterVersion update cause an unexpected master MachineConfigPool roll.

Expected results

No MachineConfigPool roll after the ClusterVersion update completes.

Additional info

Description of problem:

 "create serverless function" functionality in the Openshift UI. When you add a (random) repository it shows a warning saying "func.yaml is not present and builder strategy is not s2i" but without any further link or information. That's not a very good UX imo.  Could we add a link to explain to the user what that entails?

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    https://redhat-internal.slack.com/archives/CJYKV1YAH/p1706639383940559

Please review the following PR: https://github.com/openshift/route-override-cni/pull/54

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/machine-api-provider-ibmcloud/pull/32

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

If OLMPlacement is set to management and the cluster comes up with disableAllDefaultSources set to true, removing it from the HostedCluster CR does not remove it in the guest cluster; disableAllDefaultSources is still set to true.
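
A minimal sketch for observing the mismatch; the HostedCluster field path (`.spec.configuration.operatorhub`) and the guest-cluster `operatorhub` resource are assumptions based on the standard OperatorHub configuration API.

```
# Hedged sketch: compare OperatorHub settings on the management and guest clusters after
# removing disableAllDefaultSources from the HostedCluster CR.
hostedcluster=$(oc get hostedclusters -n clusters -o jsonpath='{.items[0].metadata.name}')
oc get hostedcluster "$hostedcluster" -n clusters -o jsonpath='{.spec.configuration.operatorhub}{"\n"}'
# Using the guest cluster's kubeconfig:
oc get operatorhub cluster -o jsonpath='{.spec.disableAllDefaultSources}{"\n"}'
```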

Version-Release number of selected component (if applicable):

    

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    'Oh no! Something went wrong' is shown on the Image Manifest Vulnerability page after creating an IMV via the CLI

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2024-02-03-192446

How reproducible:

    Always

Steps to Reproduce:

    1. Install the 'Red Hat Quay Container Security Operator'
    2. Use the command line to create the IMV
       $ oc create -f imv.yaml 
        imagemanifestvuln.secscan.quay.redhat.com/example created
       $ cat IMV.yaml
         apiVersion: secscan.quay.redhat.com/v1alpha1
         kind: ImageManifestVuln
         metadata:
           name: example
           namespace: openshift-operators
         spec: {}
     3. Navigate to page /k8s/ns/openshift-operators/operators.coreos.com~v1alpha1~ClusterServiceVersion/container-security-operator.v3.10.3/secscan.quay.redhat.com~v1alpha1~ImageManifestVuln
     

Actual results:

    'Oh no! Something went wrong.' will be shown.
Description: Cannot read properties of undefined (reading 'replace')
Component trace:
    at T (https://console-openshift-console.apps.qe-uidaily-0205.qe.devcluster.openshift.com/static/container-security-chunk-c75b48f176a6a5981ee2.min.js:1:3465)
    at https://console-openshift-console.apps.qe-uidaily-0205.qe.devcluster.openshift.com/static/main-chunk-d635f27942d1b5fdaad6.min.js:1:631947
    at tr
    at x (https://console-openshift-console.apps.qe-uidaily-0205.qe.devcluster.openshift.com/static/main-chunk-d635f27942d1b5fdaad6.min.js:1:630876)
    at t (https://console-openshift-console.apps.qe-uidaily-0205.qe.devcluster.openshift.com/static/vendors~main-chunk-e839d85039c974dbb9bb.min.js:82:73479)
    at tbody
    at table
    at g (https://console-openshift-console.apps.qe-uidaily-0205.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-69dd6dd4312fdf07fedf.min.js:6:199268)
    at l (https://console-openshift-console.apps.qe-uidaily-0205.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-69dd6dd4312fdf07fedf.min.js:10:88631)
    at D (https://console-openshift-console.apps.qe-uidaily-0205.qe.devcluster.openshift.com/static/main-chunk-d635f27942d1b5fdaad6.min.js:1:632038)
    at div
    at div
    at t (https://console-openshift-console.apps.qe-uidaily-0205.qe.devcluster.openshift.com/static/vendors~main-chunk-e839d85039c974dbb9bb.min.js:50:39294)
    at t (https://console-openshift-console.apps.qe-uidaily-0205.qe.devcluster.openshift.com/static/vendors~main-chunk-e839d85039c974dbb9bb.min.js:49:16122)
    at o (https://console-openshift-console.apps.qe-uidaily-0205.qe.devcluster.openshift.com/static/main-chunk-d635f27942d1b5fdaad6.min.js:1:642088)
    at div
    at M (https://console-openshift-console.apps.qe-uidaily-0205.qe.devcluster.openshift.com/static/main-chunk-d635f27942d1b5fdaad6.min.js:1:631697)
    at div

Expected results:

    no error issue

Additional info:

    

Description of problem:

    When adding another IP address to br-ex, geneve traffic sent from this node may be sent with the new IP address rather than the one configured for this tunnel. This will cause traffic to be dropped by the destination with the error:

[root@ovn-control-plane openvswitch]# cat  ovs-vswitchd.log  | grep fc00:f853:ccd:e793::4
2024-04-17T16:47:02.146Z|00012|tunnel(revalidator10)|WARN|receive tunnel port not found (tcp6,tun_id=0xff0003,tun_src=0.0.0.0,tun_dst=0.0.0.0,tun_ipv6_src=fc00:f853:ccd:e793:ffff::1,tun_ipv6_dst=fc00:f853:ccd:e793::3,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=64,tun_erspan_ver=0,gtpu_flags=0,gtpu_msgtype=0,tun_flags=csum|key,in_port=5,vlan_tci=0x0000,dl_src=0a:58:2b:22:eb:86,dl_dst=0a:58:92:3f:71:e5,ipv6_src=fc00:f853:ccd:e793::4,ipv6_dst=fd00:10:244:1::7,ipv6_label=0x630b1,nw_tos=0,nw_ecn=0,nw_ttl=63,nw_frag=no,tp_src=8080,tp_dst=59130,tcp_flags=syn|ack)

This is more likely to occur on ipv6 than ipv4, due to IP address ordering on the NIC and linux rules used to determine source IP to use when sending host originated traffic.

Version-Release number of selected component (if applicable):

    All versions

How reproducible:

    Always

 

To work around this with IPv6, set preferred_lft 0 on the address, which causes it to become deprecated so that Linux chooses an alternative source address. Alternatively, set external_ids:ovn-set-local-ip="true" in Open vSwitch on each node, which forces OVN to use the configured geneve encap IP. Related OVN issue: https://issues.redhat.com/browse/FDP-570
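
A hedged sketch of the two workarounds described above; the interface name, address, and prefix length are examples and need to match the extra address actually added to br-ex.

```
# 1) Deprecate the extra IPv6 address so Linux stops preferring it as a source address
#    (address/prefix and device are examples).
ip -6 addr change fc00:f853:ccd:e793::4/64 dev br-ex preferred_lft 0

# 2) Or force OVN to use the configured encapsulation IP for tunnel traffic.
ovs-vsctl set Open_vSwitch . external_ids:ovn-set-local-ip="true"
```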

Description of the problem:
This issue was discovered while trying to verify a bug opened by Javipolo:
MGMT-16966

It seems that the extra-reboot avoidance is not working properly for a 4.15 cluster with a partition on the installation disk.

Here are 3 clusters:
1) test-infra-cluster-7a4cb4cc OCP 4.15 with partition on installation disk
2) test-infra-cluster-066749e2 OCP 4.14 with partition on installation disk
3) test-infra-cluster-f9051a36 OCP 4.15 without partition on installation disk

I see the following indications:
1) test-infra-cluster-7a4cb4cc

2/19/2024, 9:19:53 PM	Node test-infra-cluster-7a4cb4cc-worker-0 has been rebooted 1 times before completing installation
2/19/2024, 9:19:51 PM	Node test-infra-cluster-7a4cb4cc-master-2 has been rebooted 2 times before completing installation
2/19/2024, 9:19:08 PM	Node test-infra-cluster-7a4cb4cc-worker-1 has been rebooted 1 times before completing installation
2/19/2024, 8:49:59 PM	Node test-infra-cluster-7a4cb4cc-master-1 has been rebooted 2 times before completing installation
2/19/2024, 8:49:55 PM	Node test-infra-cluster-7a4cb4cc-master-0 has been rebooted 2 times before completing installation

2) test-infra-cluster-066749e2

2/19/2024, 8:32:36 PM	Node test-infra-cluster-066749e2-master-2 has been rebooted 2 times before completing installation
2/19/2024, 8:32:35 PM	Node test-infra-cluster-066749e2-worker-1 has been rebooted 2 times before completing installation
2/19/2024, 8:32:31 PM	Node test-infra-cluster-066749e2-worker-0 has been rebooted 2 times before completing installation
2/19/2024, 8:05:26 PM	Node test-infra-cluster-066749e2-master-0 has been rebooted 2 times before completing installation
2/19/2024, 8:05:25 PM	Node test-infra-cluster-066749e2-master-1 has been rebooted 2 times before completing installation

3) test-infra-cluster-f9051a36

2/18/2024, 5:13:49 PM	Node test-infra-cluster-f9051a36-worker-1 has been rebooted 1 times before completing installation
2/18/2024, 5:10:10 PM	Node test-infra-cluster-f9051a36-worker-0 has been rebooted 1 times before completing installation
2/18/2024, 5:08:46 PM	Node test-infra-cluster-f9051a36-worker-2 has been rebooted 1 times before completing installation
2/18/2024, 5:03:12 PM	Node test-infra-cluster-f9051a36-master-1 has been rebooted 1 times before completing installation
2/18/2024, 4:33:39 PM	Node test-infra-cluster-f9051a36-master-2 has been rebooted 1 times before completing installation
2/18/2024, 4:33:38 PM	Node test-infra-cluster-f9051a36-master-0 has been rebooted 1 times before completing installation

According to Ori's analysis, it seems the skip-MCO-reboot did not happen for the masters. The ignition was not accessible:

Feb 19 18:38:19 test-infra-cluster-7a4cb4cc-master-0 installer[3403]: time="2024-02-19T18:38:19Z" level=warning msg="failed getting encapsulated machine config. Continuing installation without skipping MCO reboot" error="failed after 240 attempts, last error: unexpected end of JSON input"

How reproducible:

 

Steps to reproduce:

1. Create a cluster with 4.15

2. Add a custom manifest that modifies the ignition to create a partition on the disk

3. Start the installation

Actual results:

It seems that the reboot avoidance did not work properly.

Expected results:

Description of problem:

Adding image configuration for a HyperShift hosted cluster is not working as expected.

Version-Release number of selected component (if applicable):

# oc get clusterversions.config.openshift.io
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-rc.8   True        False         6h46m   Cluster version is 4.13.0-rc.8      

How reproducible:

Always

Steps to Reproduce:

1. Get hypershift hosted cluster detail from management cluster. 

# hostedcluster=$( oc get -n clusters hostedclusters -o json | jq -r '.items[].metadata.name')  

2. Apply image setting for hypershift hosted cluster. 
#  oc patch hc/$hostedcluster -p '{"spec":{"configuration":{"image":{"registrySources":{"allowedRegistries":["quay.io","registry.redhat.io","image-registry.openshift-image-registry.svc:5000","insecure.com"],"insecureRegistries":["insecure.com"]}}}}}' --type=merge -n clusters     
hostedcluster.hypershift.openshift.io/85ea85757a5a14355124 patched 

# oc get HostedCluster $hostedcluster -n clusters -ojson | jq .spec.configuration.image
{
  "registrySources": {
    "allowedRegistries": [
      "quay.io",
      "registry.redhat.io",
      "image-registry.openshift-image-registry.svc:5000",
      "insecure.com"
    ],
    "insecureRegistries": [
      "insecure.com"
    ]
  }
}

3. Check whether the pods or the operator restart to apply the configuration changes.

# oc get pods -l app=kube-apiserver  -n clusters-${hostedcluster}
NAME                              READY   STATUS    RESTARTS   AGE
kube-apiserver-67b6d4556b-9nk8s   5/5     Running   0          49m
kube-apiserver-67b6d4556b-v4fnj   5/5     Running   0          47m
kube-apiserver-67b6d4556b-zldpr   5/5     Running   0          51m

#oc get pods -l app=kube-apiserver  -n clusters-${hostedcluster} -l app=openshift-apiserver
NAME                                   READY   STATUS    RESTARTS   AGE
openshift-apiserver-7c69d68f45-4xj8c   3/3     Running   0          136m
openshift-apiserver-7c69d68f45-dfmk9   3/3     Running   0          135m
openshift-apiserver-7c69d68f45-r7dqn   3/3     Running   0          136m  

4. Check image.config in hosted cluster.
# oc get image.config -o yaml
...
  spec:
    allowedRegistriesForImport: []
  status:
    externalRegistryHostnames:
    - default-route-openshift-image-registry.apps.hypershift-ci-32506.qe.devcluster.openshift.com
    internalRegistryHostname: image-registry.openshift-image-registry.svc:5000  

#oc get node
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-128-61.us-east-2.compute.internal    Ready    worker   6h42m   v1.26.3+b404935
ip-10-0-130-68.us-east-2.compute.internal    Ready    worker   6h42m   v1.26.3+b404935
ip-10-0-134-89.us-east-2.compute.internal    Ready    worker   6h42m   v1.26.3+b404935
ip-10-0-138-169.us-east-2.compute.internal   Ready    worker   6h42m   v1.26.3+b404935

# oc debug node/ip-10-0-128-61.us-east-2.compute.internal
Temporary namespace openshift-debug-mtfcw is created for debugging node...
Starting pod/ip-10-0-128-61us-east-2computeinternal-debug-mctvr ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.61
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# cat /etc/containers/registries.conf
unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
short-name-mode = ""

[[registry]]
  prefix = ""
  location = "registry-proxy.engineering.redhat.com"

  [[registry.mirror]]
    location = "brew.registry.redhat.io"
    pull-from-mirror = "digest-only"

[[registry]]
  prefix = ""
  location = "registry.redhat.io"

  [[registry.mirror]]
    location = "brew.registry.redhat.io"
    pull-from-mirror = "digest-only"

[[registry]]
  prefix = ""
  location = "registry.stage.redhat.io"

  [[registry.mirror]]
    location = "brew.registry.redhat.io"
    pull-from-mirror = "digest-only"

Actual results:

Config changes are not applied in the backend; neither the operator nor the pods restart.

Expected results:

Configuration should be applied, and the pods and operator should restart after config changes.

Additional info:

 

Description of problem:

    While doing the migration of the Pipeline details page, it expects customData from the Details page (https://github.com/openshift/console/blob/master/frontend/packages/pipelines-plugin/src/components/pipelines/pipeline-metrics/PipelineMetrics.tsx), but the horizontalnav component exposed to dynamic plugins does not have a customData prop: https://github.com/openshift/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#horizontalnav

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Always

Steps to Reproduce:

    1. The Pipeline details page PR needs to be up for testing this [WIP] [Story - https://issues.redhat.com/browse/ODC-7525]
    2. Install the Pipelines Operator and do not install Tekton Results
    3. Enabled Pipeline details page in dynamic plugin
    4. create a pipeline and go to Metrics tab in details page

    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    With cloud-credential-operator moving to rhel9 by default, we added rhel8 binaries. However, users currently have no way of downloading them using `oc`

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

When attempting to extract ccoctl.rhel8 

Steps to Reproduce:

    1. oc adm release extract --tools
    2.
    3.
    

Actual results:

    Only contains ccoctl tarball

Expected results:

    Should include ccoctl.rhel8 and ccoctl.rhel9 tarballs

Additional info:

    ccoctl.rhel8 and ccoctl.rhel9 binaries added in https://issues.redhat.com//browse/OCPBUGS-31290
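
Until the `--tools` tarball carries both binaries, a hedged sketch of a possible workaround is to extract the binary directly from the cloud-credential-operator image; the release pullspec, pull-secret path, and in-image file path below are assumptions, not verified here.

```
# Hedged sketch: extract ccoctl.rhel8 straight from the CCO image (paths are assumptions).
RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.16.0-x86_64   # example pullspec
CCO_IMAGE=$(oc adm release info --image-for=cloud-credential-operator "$RELEASE_IMAGE" -a ~/pull-secret.json)
oc image extract "$CCO_IMAGE" --file=/usr/bin/ccoctl.rhel8 -a ~/pull-secret.json
chmod 775 ccoctl.rhel8
```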

Description of problem:

With the introduction of Pod Security Admission, the recommended best practice is to enforce the `restricted` admission policy.

However, if the user creates the CatalogSource in the namespace running with `restricted` policy, the CatalogSource Pod fails to be created.

This is because when `.spec.grpcPodConfig.securityContextConfig` is NOT SET in the CatalogSource, OLM defaults the value to "legacy", which means the catalog Pod does NOT set the `restricted` securityContext and therefore fails to run in such a namespace.
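
A minimal sketch of the usual mitigation, assuming the catalog image and namespace shown here are placeholders: request the restricted security context explicitly when creating the CatalogSource.

```
# Hedged sketch: explicitly opt the catalog pod into the restricted security context.
cat <<'EOF' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: foobar
  namespace: openshift-marketplace    # illustrative namespace
spec:
  sourceType: grpc
  image: quay.io/example/catalog-index:latest    # illustrative image
  grpcPodConfig:
    securityContextConfig: restricted
EOF
```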

Version-Release number of selected component (if applicable):

4.15

How reproducible:

100%

Steps to Reproduce:

1. On an OCP 4.15 cluster, create a custom CatalogSource object without `.spec.grpcPodConfig.securityContextConfig` being specified

2. See if the CatalogSource Pod started successfully without errors.

Actual results:

1. The CatalogSource Pod fails to be created with an error like:
status:
  message: >-
    couldn't ensure registry server - error ensuring pod: : error creating new
    pod: foobar-: pods "foobar-6ttkb" is forbidden:
    violates PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false
    (container "registry-server" must set
    securityContext.allowPrivilegeEscalation=false), unrestricted capabilities
    (container "registry-server" must set
    securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or
    container "registry-server" must set securityContext.runAsNonRoot=true),
    seccompProfile (pod or container "registry-server" must set
    securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
  reason: RegistryServerError

Expected results:

The CatalogSource Pod started successfully by default without specifying `.spec.grpcPodConfig.securityContextConfig` as `restricted`

Additional info:

    

Description of problem:

The archive tar file size should respect the archiveSize setting when mirroring with the V2 format.

 

Version-Release number of selected component (if applicable):

oc-mirror version 
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202403070215.p0.gc4f8295.assembly.stream.el9-c4f8295", GitCommit:"c4f829512107f7d0f52a057cd429de2030b9b3b3", GitTreeState:"clean", BuildDate:"2024-03-07T03:46:24Z", GoVersion:"go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1) With the following ImageSetConfiguration:
cat config.yaml 
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
archiveSize: 8
storageConfig:
  local:
    path: /app1/ocmirror/offline
mirror:
  platform:
    channels:
    - name: stable-4.12                                             
      type: ocp
      minVersion: '4.12.46'
      maxVersion: '4.12.46'
      shortestPath: true
    graph: true
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.15
    packages:
    - name: advanced-cluster-management                                  
      channels:
      - name: release-2.9             
    - name: compliance-operator
      channels:
      - name: stable
    - name: multicluster-engine
      channels:
      - name: stable-2.4
      - name: stable-2.5
  additionalImages:
  - name: registry.redhat.io/ubi8/ubi:latest                        
  - name: registry.redhat.io/rhel8/support-tools:latest
  - name: registry.access.redhat.com/ubi8/nginx-120:latest
  - name: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0
  - name: registry.k8s.io/sig-storage/csi-resizer:v1.8.0
  - name: quay.io/openshifttest/hello-openshift@sha256:4200f438cf2e9446f6bcff9d67ceea1f69ed07a2f83363b7fb52529f7ddd8a83
  - name: quay.io/openshifttest/hello-openshift@sha256:61b8f5e1a3b5dbd9e2c35fd448dc5106337d7a299873dd3a6f0cd8d4891ecc27

2) Run `oc-mirror --config config.yaml file://out --v2`

Actual results: 

2) The archive size is still 49G, not following the setting in the ImageSetConfiguration.
ll  out/ -h
total 49G
-rw-r--r--.  1 root root  49G Mar 20 09:03 mirror_000001.tar
drwxr-xr-x. 11 root root 4.0K Mar 20 08:54 working-dir

Expected results:

Multiple tar files, each no larger than the configured 8G, should be generated.
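
As a quick sanity check once the setting is honored, the output directory should contain several numbered archives rather than a single large one (a sketch; file names follow the mirror_NNNNNN.tar pattern seen above).

```
# Sketch: list the generated archives; each should be at most roughly the configured 8 GiB.
ls -lh out/mirror_*.tar
```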

This is a clone of issue OCPBUGS-37345. The following is the description of the original issue:

Description of problem

OTA-941 landed a rollback guard in 4.14 that blocked all rollbacks. OCPBUGS-24535 drilled a hole in that guard to allow limited rollbacks to the previous release the cluster had been aiming at, as long as that previous release was part of the same 4.y z-stream. We decided to block that hole back up in OCPBUGS-35994. And now folks want the hole re-opened in this bug. We also want to bring back the oc adm upgrade rollback ... subcommand. Hopefully this new plan sticks.

Version-Release number of selected component

Folks want the guard-hole and rollback subcommand restored for 4.16 and 4.17.

How reproducible

Every time.

Steps to Reproduce

Try to perform the rollbacks that OCPBUGS-24535 allowed.

Actual results

They stop working, with reasonable ClusterVersion conditions explaining that even those rollback requests will not be accepted.

Expected results

They work, as verified in OCPBUGS-24535.

This is a clone of issue OCPBUGS-41328. The following is the description of the original issue:

Description of problem:

    Rotating the root certificates (root CA) requires multiple certificates during the rotation process to prevent downtime as the server and client certificates are updated in the control and data planes. Currently, the HostedClusterConfigOperator uses the cluster-signer-ca from the control plane to create a kubelet-serving-ca on the data plane. The cluster-signer-ca contains only a single certificate that is used for signing certificates for the kube-controller-manager.

During a rotation, the kubelet-serving-ca will be updated with the new CA, which triggers the metrics-server pod to restart and use the new CA. This leads to an error in the metrics-server where it cannot scrape metrics, as the kubelet has yet to pick up the new certificate.

E0808 16:57:09.829746       1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.240.0.29:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate signed by unknown authority" node="pres-cqogb7a10b7up68kvlvg-rkcpsms0805-default-00000130"

rkc@rmac ~> kubectl get pods -n openshift-monitoring
NAME                                                     READY   STATUS    RESTARTS   AGE
metrics-server-594cd99645-g8bj7                          0/1     Running   0          2d20h
metrics-server-594cd99645-jmjhj                          1/1     Running   0          46h 

The HostedClusterConfigOperator should likely be using the KubeletClientCABundle from the control plane for the kubelet-serving-ca in the data plane. This CA bundle will contain both the new and old CA, such that all data plane components can remain up during the rotation process.
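
A hedged way to observe this during a rotation is to count how many CA certificates the data-plane bundle carries; the configmap name, namespace, and data key below are assumed to match what the HostedClusterConfigOperator writes (as on standalone clusters) and may need adjusting.

```
# Hedged sketch: a bundle built from cluster-signer-ca typically carries a single cert,
# while a proper rotation bundle should carry both the old and the new CA.
oc get configmap kubelet-serving-ca -n openshift-config-managed \
  -o jsonpath='{.data.ca-bundle\.crt}' | grep -c 'BEGIN CERTIFICATE'
```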

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-20061. The following is the description of the original issue:

Description of problem:

Possibly reviving OCPBUGS-10771, the control-plane-machine-set ClusterOperator occasionally goes Available=False with reason=UnavailableReplicas. For example, this run includes:

: [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Available expand_less	1h34m30s
{  3 unexpected clusteroperator state transitions during e2e test run.  These did not match any known exceptions, so they cause this test-case to fail:

Oct 03 22:03:29.822 - 106s  E clusteroperator/control-plane-machine-set condition/Available reason/UnavailableReplicas status/False Missing 1 available replica(s)
Oct 03 22:08:34.162 - 98s   E clusteroperator/control-plane-machine-set condition/Available reason/UnavailableReplicas status/False Missing 1 available replica(s)
Oct 03 22:13:01.645 - 118s  E clusteroperator/control-plane-machine-set condition/Available reason/UnavailableReplicas status/False Missing 1 available replica(s)

But those are the nodes rebooting into newer RHCOS, and they do not warrant immediate admin intervention. Teaching the CPMS operator to stay Available=True for this kind of brief hiccup, while still going Available=False for issues where at least part of the component is non-functional and the condition requires immediate administrator intervention, would make it easier for admins and SREs operating clusters to identify when intervention is required.

Version-Release number of selected component (if applicable):

4.15. Possibly all supported versions of the CPMS operator have this exposure.

How reproducible:

Looks like many (all?) 4.15 update jobs have near 100% reproducibility for some kind of issue with CPMS going Available=False; see Actual results below. These are likely for reasons that do not require admin intervention, although figuring that out is tricky today; feel free to push back if you feel that some of these do warrant immediate admin intervention.

Steps to Reproduce:

w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=clusteroperator/control-plane-machine-set+should+not+change+condition/Available' | grep '^periodic-.*4[.]15.*failures match' | sort

Actual results:

periodic-ci-openshift-cluster-etcd-operator-release-4.15-periodics-e2e-aws-etcd-recovery (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 19 runs, 42% failed, 225% of failures match = 95% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-upgrade-aws-ovn-arm64 (all) - 18 runs, 61% failed, 127% of failures match = 78% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-e2e-aws-sdn-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 19 runs, 47% failed, 200% of failures match = 95% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-sdn-arm64 (all) - 9 runs, 78% failed, 114% of failures match = 89% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-upgrade (all) - 11 runs, 64% failed, 143% of failures match = 91% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade (all) - 70 runs, 41% failed, 207% of failures match = 86% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-upgrade (all) - 7 runs, 43% failed, 200% of failures match = 86% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn (all) - 6 runs, 50% failed, 33% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade (all) - 71 runs, 24% failed, 382% of failures match = 92% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 70 runs, 30% failed, 281% of failures match = 84% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 8 runs, 50% failed, 175% of failures match = 88% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 71 runs, 38% failed, 233% of failures match = 89% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade (all) - 69 runs, 49% failed, 171% of failures match = 84% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-upgrade (all) - 7 runs, 57% failed, 175% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-sdn-upgrade (all) - 6 runs, 33% failed, 250% of failures match = 83% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade (all) - 63 runs, 37% failed, 222% of failures match = 81% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-sdn-upgrade (all) - 6 runs, 33% failed, 250% of failures match = 83% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 7 runs, 43% failed, 233% of failures match = 100% impact
periodic-ci-openshift-release-master-okd-4.15-e2e-aws-ovn-upgrade (all) - 13 runs, 54% failed, 100% of failures match = 54% impact
periodic-ci-openshift-release-master-okd-scos-4.15-e2e-aws-ovn-upgrade (all) - 16 runs, 63% failed, 90% of failures match = 56% impact

Expected results:

CPMS goes Available=False if and only if immediate admin intervention is appropriate.

Tracker issue for bootimage bump in 4.16. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-29384.

Description of problem:

   PublicAndPrivate and Private clusters fail to provision due to missing IngressController RBAC in control plane operator. This RBAC was recently removed from the HyperShift operator. 

Version-Release number of selected component (if applicable):

    4.14.z

How reproducible:

    Always

Steps to Reproduce:

    1. Install hypershift operator from main
    2. Create an AWS PublicAndPrivate cluster using a 4.14.z release
    

Actual results:

    The cluster never provisions because the cpo is stuck

Expected results:

    The cluster provisions successfully

Additional info:

    

With the change from PatternFly's `PageHeader` to `Masthead`, there is no longer a max-height of 60px restricting the size of the masthead logo. As a result, logos that are larger than 60px high display at their native size and cause the masthead to get taller (see https://drive.google.com/file/d/11enMtMU1cfzXQqRfd0eTdsKFkBVPWoFc/view?usp=sharing). This went unnoticed in the change because the OpenShift and OKD logos are sized appropriately for the masthead and do not need the restriction. Further, the docs state a custom logo "is constrained to a max-width of 200px and a max-height of 68px", which is a separate bug that needs to be addressed (it should read "is constrained to a max-height of 60px").

Description of problem:

    On February 27th, the endpoints that were being queried for account details were turned off. The check is not vital, so we are fine with removing it; however, it is currently blocking all Power VS installs.
            

Version-Release number of selected component (if applicable):

    4.13.0 - 4.16.0

How reproducible:

    Easily

Steps to Reproduce:

    1. Try to deploy with Power VS
    2. Fail at the platform credentials check
    

Actual results:

    Check fails

Expected results:

    Check should succeed

Additional info:

    

 

This is a clone of issue OCPBUGS-35347. The following is the description of the original issue:

Description of problem:

OCP/RHCOS system daemon(s) like ovs-vswitchd (revalidator process) use the same vCPU (from isolated vCPU pool) that is already reserved by CPU Manager for CNF workloads, causing intermittent issues for CNF workloads performance (and also causing vCPU level overload). Note: NCP 23.11 uses CPU Manager with static policy and Topology Manager set to "single-numa-node". Also, specific isolated and reserved vCPU pools have been defined.

Version-Release number of selected component (if applicable):

4.14.22

How reproducible:

Intermittent at customer environment.

Steps to Reproduce:

1.
2.
3.

Actual results:

ovs-vswitchd is using isolated CPUs

Expected results:

ovs-vswitchd to use only  reserved CPUs

Additional info:

We want to understand if customer is hitting the bug:

  https://issues.redhat.com/browse/OCPBUGS-32407

This bug was fixed in 4.14.25. The customer cluster is on 4.14.22. The customer is also asking whether it is possible to get a private fix, since they cannot update at the moment.

All case files have been yanked at both the US and EU instances of Supportshell. If case updates or attachments are not accessible, please let me know.

Description of problem:

sha256 sum for "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.14.9/openshift-install-mac-arm64-4.14.9.tar.gz" does not match what it should be

# sha256sum openshift-install-mac-arm64-4.14.9.tar.gz
61cccc282f39456b7db730a0625d0a04cd6c1c2ac0f945c4c15724e4e522a073  openshift-install-mac-arm64-4.14.9.tar.gz

Which does not match what is posted here: https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.14.9/sha256sum.txt


It should be :
c765c90a32b8a43bc62f2ba8bd59dc8e620b972bcc2a2e217c36ce139d517e29  openshift-install-mac-arm64-4.14.9.tar.gz
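
For reference, a small sketch of checking the tarball against the published checksum file (GNU coreutils `sha256sum --ignore-missing` assumed):

```
# Download the published checksum list and verify only the files present locally.
curl -sLO https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.14.9/sha256sum.txt
sha256sum --check --ignore-missing sha256sum.txt | grep openshift-install-mac-arm64
```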

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This comes from this bug https://issues.redhat.com/browse/OCPBUGS-29940

After applying the workaround suggested in [1][2] with "oc adm must-gather --node-name", we found another issue: must-gather creates the debug pod on all master nodes and gets stuck for a while because of the loop in the gather_network_logs_basics script. Filtering out the NotReady nodes would allow us to apply the workaround.

The script gather_network_logs_basics gets the master nodes by label (node-role.kubernetes.io/master) and saves them in the CLUSTER_NODES variable. It then passes this as a parameter to the function gather_multus_logs $CLUSTER_NODES, where it loops through the list of master nodes and performs debugging for each node.

collection-scripts/gather_network_logs_basics
...
CLUSTER_NODES="${@:-$(oc get node -l node-role.kubernetes.io/master -oname)}"
/usr/bin/gather_multus_logs $CLUSTER_NODES
...
collection-scripts/gather_multus_logs
...
function gather_multus_logs {
  for NODE in "$@"; do
    nodefilename=$(echo "$NODE" | sed -e 's|node/||')
    out=$(oc debug "${NODE}" -- \
    /bin/bash -c "cat $INPUT_LOG_PATH" 2>/dev/null) && echo "$out" 1> "${OUTPUT_LOG_PATH}/multus-log-$nodefilename.log"
  done
}

This could be resolved with something similar to this:

CLUSTER_NODES="${@:-$(oc get node -l node-role.kubernetes.io/master -o json | jq -r '.items[] | select(.status.conditions[] | select(.type=="Ready" and .status=="True")).metadata.name')}"
/usr/bin/gather_multus_logs $CLUSTER_NODES

[1] - https://access.redhat.com/solutions/6962230
[2] - https://issues.redhat.com/browse/OCPBUGS-29940
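
Putting the two together, a hedged sketch of running the suggested workaround against a Ready master only (the jq filter mirrors the proposed fix above):

```
# Pick the first Ready master and pin must-gather's debug pod to it.
NODE=$(oc get node -l node-role.kubernetes.io/master -o json \
  | jq -r '[.items[] | select(.status.conditions[] | select(.type=="Ready" and .status=="True")).metadata.name][0]')
oc adm must-gather --node-name="$NODE"
```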

In looking at a component readiness test page we see some failures that take a long time to load:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-ovn-dualstack/1758641985364692992 (I noticed that this one resulted in messages asking me to restart chrome)
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-ovn-dualstack/1767279909555671040
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-ovn-dualstack/1766663255406678016
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-ovn-dualstack/1765279833048223744

We'd like to understand why it takes a long time to load these jobs and possible take some action to remediate as much of that slowness as possible.

Taking a long time to load prow jobs will make our TRT tools seem unusable and might make it difficult for managers to inspect Component Readiness failures which would slow down getting them resolved.

Some idea of what to look at:

  • see if the file size of the jobs is any bigger now than before esp. for runs with a lot of failures
  • see if the recent change that cuts the size of the intervals down is still working as expected
  • compare the file size of a passing run vs. one with a lot of failures

Description of problem:

    The ingress cluster capability has been introduced in OCP 4.16 (https://github.com/openshift/enhancements/pull/1415). It includes the cluster ingress operator and all its controllers. If the ingress capability is disabled all the routes of the cluster become unavailable (no router to back them up). The console operator heavily depends on the working (admitted/active) routes to do the health checks, configure the authentication flows, client downloads, etc. The console operator goes degraded if the routes are not served by a router. The console operator needs to be able to tolerate the absence of the ingress capability.

 

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Always

Steps to Reproduce:

    1. Create ROSA HCP cluster.
    2. Scale the default ingresscontroller to 0: oc -n openshift-ingress-operator patch ingresscontroller default --type='json' -p='[{"op": "replace", "path": "/spec/replicas", "value":0}]'
    3. Check the status of console cluster operator: oc get co console
    

Actual results:

$ oc get co console
NAME      VERSION  AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console   4.16.0   False       False         False      53s     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.49e4812b7122bc833b72.hypershift.aws-2.ci.openshift.org): Get "https://console-openshift-console.apps.49e4812b7122bc833b72.hypershift.aws-2.ci.openshift.org": EOF

Expected results:

$ oc get co console
NAME      VERSION  AVAILABLE   PROGRESSING   DEGRADED   
console   4.16.0   True        False         False

Additional info:

    The ingress capability cannot be disabled on a standalone OpenShift (when the payload is managed by ClusterVersionOperator). Only clusters managed HyperShift with HostedControlPlane are impacted.

Description of problem:

When the cluster is upgrading and the sidebar on the Cluster Settings page is scrolled to the bottom, there is blank space at the bottom.

Version-Release number of selected component (if applicable):

upgrade 4.14.0-0.nightly-2023-09-09-164123 to 4.14.0-0.nightly-2023-09-10-184037

How reproducible:

Always

Steps to Reproduce:

1. Launch a 4.14 cluster and trigger an upgrade.
2. Go to the "Cluster Settings" -> "Details" page during the upgrade and scroll the right sidebar down to the bottom.
3.

Actual results:

2. It's blank at the bottom

Expected results:

2. Should not show blank.

Additional info:

screenshot: https://drive.google.com/drive/folders/1DenrQTX7K0chbs9hG9ZbSZyY-viRRy1k?ths=true
https://drive.google.com/drive/folders/10dgToTxZf7gOfmL2Mp5gAMVQnM06XvAf?usp=sharing

Please review the following PR: https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/308

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/270

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    There is no kubernetes service associated with the kube-scheduler, so it does not require a readiness probe.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

# In the control plane:
kubectl get services | grep scheduler
kubectl get deploy kube-scheduler | grep readiness

Actual results:

    Probe exists, but no service

Expected results:

    No probe or service

Additional info:

    

Please review the following PR: https://github.com/openshift/ovn-kubernetes/pull/1979

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/99

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/136

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-35300. The following is the description of the original issue:

Description of problem:

ARO cluster fails to install with disconnected networking.
We see master node bootup hang on the machine-config-daemon-pull.service unit. Logs from the service indicate it cannot reach the public IP of the image registry. In ARO, image registries need to go via a proxy. Dnsmasq is used to inject proxy DNS answers, but machine-config-daemon-pull.service starts before ARO's dnsmasq.service.

Version-Release number of selected component (if applicable):

4.14.16

How reproducible:

Always

Steps to Reproduce:

For Fresh Install:
1. Create the required ARO vnet and subnets
2. Attach a route table to the subnets with a blackhole route 0.0.0.0/0
3. Create 4.14 ARO cluster with --apiserver-visibility=Private --ingress-visibility=Private --outbound-type=UserDefinedRouting

[OR]

Post Upgrade to 4.14:
1. Create a ARO 4.13 UDR.
2. ClusterUpgrade the cluster 4.13-> 4.14 , upgrade was successful
3. Create a new node (scale up), we run into the same issue. 

Actual results:

For Fresh Install of 4.14:
ERROR: (InternalServerError) Deployment failed.

[OR]

Post Upgrade to 4.14:
Node doesn't come into a Ready State and Machine is stuck in Provisioned status.

Expected results:

Succeeded 

Additional info:
We see in the node logs that machine-config-daemon-pull.service is unable to reach the image registry. ARO's dnsmasq was not yet started.
Previously, systemd ordering was set for ovs-configuration.service to start after (ARO's) dnsmasq.service. Perhaps that should have gone on machine-config-daemon-pull.service.
See https://issues.redhat.com/browse/OCPBUGS-25406.
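
For illustration only, a hedged sketch of the ordering idea described above (not the actual ARO fix, and in practice it would be delivered via a MachineConfig rather than edited on the node):

```
# Make machine-config-daemon-pull.service wait for ARO's dnsmasq via a systemd drop-in.
sudo mkdir -p /etc/systemd/system/machine-config-daemon-pull.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/machine-config-daemon-pull.service.d/10-after-dnsmasq.conf
[Unit]
Wants=dnsmasq.service
After=dnsmasq.service
EOF
sudo systemctl daemon-reload
```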

Please review the following PR: https://github.com/openshift/kubernetes-kube-storage-version-migrator/pull/202

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/image-customization-controller/pull/122

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Looks like we're accidentally passing the JavaScript `window.alert()` method instead of the Prometheus alert object.

Rukpak – an alpha tech preview API – has pushed a breaking change upstream. This bug tracks the need for us to disable and then reenable the cluster-olm-operator and platform-operators components which both depend on rukpak in order to push the breaking API change. This bug can be closed once those components are all updated and available on the cluster again.

DoD:

https://github.com/openshift/hypershift/blob/b06d0bdba5e267b3d0824a2e958f0e00145aedb9/control-plane-operator/main.go#L143

https://github.com/openshift/hypershift/blob/28e8a064d52628b1805d15cfe7e230add2a76a7c/control-plane-operator/controllers/hostedcontrolplane/kas/kms/azure.go#L29

Images to include in the payload:

Step 1: Finish the following checklist

  • I have onboarded the image/rpm with DPTP/CI
  • Is this an Operator destined for Operator Hub? Talk to ART before proceeding.
  • You have performed a threat model assessment
  • My image/rpm has aligned with Product Management / Docs / QE / Product Support

 

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.

2.

3.

Actual results:

Expected results:

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure
  2. customer issue / SD
  3. internal RedHat testing failure

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

If it is a CI failure:

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

If it is a customer / SD issue:

  • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
  • Don’t presume that Engineering has access to Salesforce.
  • Do presume that Engineering will access attachments through supportshell.
  • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • When showing the results from commands, include the entire command in the output.  
  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
  • Do not set the priority, that is owned by Engineering and will be set when the bug is evaluated
  • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

This is a clone of issue OCPBUGS-39467. The following is the description of the original issue:

Description of problem:

Hi team,

The customer is performing RHOCP IPI testing for H/W certification and is referencing this document: https://docs.openshift.com/container-platform/4.15/installing/installing_bare_metal_ipi/ipi-install-installation-workflow.html#bmc-addressing_ipi-install-installation-workflow

The issue occurred during the IPI installation. The Redfish ISO mount request only uses HTTP, as per the error message below. However, the official documentation mentions that both HTTP and HTTPS are supported values for TransferProtocolType.


~~~
VirtualMedia.InsertMedia BEF (/redfish/v1/Managers/Self/VirtualMedia/CD1/Actions/VirtualMedia.InsertMedia)
===============================================
{"error":"@Message.ExtendedInfo": 
"@odata.type":"#Message.v1_0_8.Message", "Message":"The value 'HTTP' for the property TransferProtocolType is not in the list of acceptable values.", "MessageArgs": ["HTTP", "TransferProtocolType"], "MessageId":"Base.1.12.PropertyValueNotInList", "RelatedProperties":["/TransferProtocolType"], "Resolution": "Choose a value from the enumeration list that the implementation can support and resubmit the request if the operation failed.", "Severity": "Warning"},"code":"Base.1.12.PropertyValueNotInList", "message":"The value 'HTTP' for the property TransferProtocolType is not in the list of acceptable values."}
~~~


Could you please confirm if we currently support HTTPS?
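As a hedged diagnostic sketch (not part of the original report), the BMC itself can be queried for what it will accept for virtual media inserts. The BMC address and VirtualMedia path below are taken from the logs in this report; the credentials are placeholders and jq is assumed to be available:

~~~
# Ask the BMC which transfer protocols it advertises for InsertMedia.
curl -sk -u '<user>:<password>' \
  'https://10.102.13.230/redfish/v1/Managers/Self/VirtualMedia/CD1' \
  | jq '.Actions'
# Many BMCs advertise the accepted values as
# "TransferProtocolType@Redfish.AllowableValues" under "#VirtualMedia.InsertMedia".
~~~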


***Business impact:

We have business visibility on a current telco project; the customer needs to pass the IPI testing for H/W certification.

The problem is that the customer's BMC currently only supports HTTPS mounting, per the AMI code-base requirement.

The customer plans ACM/ZTP-based installation on a fleet of thousands of these servers, so HTTPS support would be very welcome.



Please help check the plan for HTTPS support. Any recommendation would be appreciated!

Version-Release number of selected component (if applicable):

    

How reproducible:

Follow the document steps.  https://docs.openshift.com/container-platform/4.15/installing/installing_bare_metal_ipi/ipi-install-overview.html    

Steps to Reproduce:

The installation steps we follow are based on Overview - Deploying installer-provisioned clusters on bare metal | Installing | OpenShift Container Platform 4.14


DNS and DHCP are set up for provisioning, but no disconnected registry is used. We failed at the following command: ./openshift-baremetal-install --dir ~/clusterconfigs --log-level debug create cluster


The console log :
~~~
ERROR Error: could not inspect: inspect failed , last error was 'Failed to inspect hardware. Reason: unable to start inspection: ('All virtual media mount attempts failed. Most recent error: ', ('Inserting virtual media into %(boot_device)s failed for node %(node)s, moving to next virtual media device, if available', {'node': '861f2cf6-3638-43c3-aa51-f1a2dee43c93', 'boot_device': <VirtualMediaType.CD: 'CD'>}))'
ERROR
ERROR   with ironic_node_v1.openshift-master-host[0],
ERROR   on main.tf line 13, in resource "ironic_node_v1" "openshift-master-host":
ERROR   13: resource "ironic_node_v1" "openshift-master-host" {
ERROR
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "masters" stage: failed to create cluster: failed to apply Terraform: exit status 1
~~~


Part of the ironic.service log in bootstrap node:

Jun 28 03:33:44 localhost.localdomain ironic[7091]: 2024-06-28 03:33:44.019 1 DEBUG sushy.exceptions [None req-922eeb65-bb47-44e5-9aed-0a86779b77b6 - - - - - -] HTTP response for POST https://10.102.13.230:443/redfish/v1/Managers/Self/VirtualMedia/CD1/Actions/VirtualMedia.InsertMedia: status code: 400, error: Base.1.5.PropertyValueNotInList: The value HTTP for the property TransferProtocolType is not in the list of acceptable values., extended: [{'@odata.type': '#Message.v1_0_8.Message', 'Message': 'The value HTTP for the property TransferProtocolType is not in the list of acceptable values.', 'MessageArgs': ['HTTP', 'TransferProtocolType'], 'MessageId': 'Base.1.5.PropertyValueNotInList', 'RelatedProperties': ['#/TransferProtocolType'], 'Resolution': 'Choose a value from the enumeration list that the implementation can support and resubmit the request if the operation failed.', 'Severity': 'Warning'}] __init__ /usr/lib/python3.9/site-packages/sushy/exceptions.py:122
Jun 28 03:33:44 localhost.localdomain ironic[7091]: 2024-06-28 03:33:44.020 1 WARNING ironic.drivers.modules.redfish.boot [None req-922eeb65-bb47-44e5-9aed-0a86779b77b6 - - - - - -] ('Inserting virtual media into %(boot_device)s failed for node %(node)s, moving to next virtual media device, if available', {'node': '861f2cf6-3638-43c3-aa51-f1a2dee43c93', 'boot_device': <VirtualMediaType.CD: 'CD'>}): sushy.exceptions.BadRequestError: HTTP POST https://10.102.13.230:443/redfish/v1/Managers/Self/VirtualMedia/CD1/Actions/VirtualMedia.InsertMedia returned code 400. Base.1.5.PropertyValueNotInList: The value HTTP for the property TransferProtocolType is not in the list of acceptable values. Extended information: [{'@odata.type': '#Message.v1_0_8.Message', 'Message': 'The value HTTP for the property TransferProtocolType is not in the list of acceptable values.', 'MessageArgs': ['HTTP', 'TransferProtocolType'], 'MessageId': 'Base.1.5.PropertyValueNotInList', 'RelatedProperties': ['#/TransferProtocolType'], 'Resolution': 'Choose a value from the enumeration list that the implementation can support and resubmit the request if the operation failed.', 'Severity': 'Warning'}]
Jun 28 03:33:44 localhost.localdomain ironic[7091]: 2024-06-28 03:33:44.024 1 ERROR ironic.drivers.modules.inspector.interface [None req-922eeb65-bb47-44e5-9aed-0a86779b77b6 - - - - - -] Unable to start managed inspection for node 861f2cf6-3638-43c3-aa51-f1a2dee43c93: ('All virtual media mount attempts failed. Most recent error: ', ('Inserting virtual media into %(boot_device)s failed for node %(node)s, moving to next virtual media device, if available', {'node': '861f2cf6-3638-43c3-aa51-f1a2dee43c93', 'boot_device': <VirtualMediaType.CD: 'CD'>})): ironic.common.exception.InvalidParameterValue: ('All virtual media mount attempts failed. Most recent error: ', ('Inserting virtual media into %(boot_device)s failed for node %(node)s, moving to next virtual media device, if available', {'node': '861f2cf6-3638-43c3-aa51-f1a2dee43c93', 'boot_device': <VirtualMediaType.CD: 'CD'>}))

Actual results:

HTTPS is not supported.

Expected results:

Per the doc mentioned, HTTPS should be supported.

Additional info:

Also raised the bug ticket for document check: https://issues.redhat.com/browse/OCPBUGS-36280

Description of the problem:

BMH and Machine resources not created for ZTP day-2 control-plane nodes
 

How reproducible:

 100%

Steps to reproduce:

1. Use ZTP to add control-plane nodes to an existing baremetal spoke cluster that was installed using ZTP

Actual results:

 CSRs are not approved automatically because Machine and BMH resources are not created, due to a condition that excludes control-plane nodes. This condition appears to be outdated and no longer relevant, as it was written before adding day-2 control-plane nodes was supported.

Expected results:

Machine and BMH resources are created and, as a result, CSRs are approved automatically.
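Until that happens, one possible interim step (an assumption, not taken from the original report) is to approve the pending node CSRs by hand while the automatic approval path is broken:

~~~
# List pending CSRs for the new control-plane node, then approve them manually.
oc get csr --sort-by=.metadata.creationTimestamp
oc get csr -o name | xargs -r oc adm certificate approve
~~~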

Description of problem:

A new deployment of BM IPI using a provisioning network with IPv6 is showing:

http://XXXX:XXXX:XXXX:XXXX::X:6180/images/ironic-python-agent.kernel....
connection timed out (http://ipxe.org/4c0a6092)" error

Version-Release number of selected component (if applicable):

Openshift 4.12.32
Also seen in Openshift 4.14.0-rc.5 when adding new nodes

How reproducible:

Very frequent

Steps to Reproduce:

1. Deploy cluster using BM with provided config
2.
3.

Actual results:

Consistent failures depending on the version of OCP used to deploy

Expected results:

No error, successful deployment

Additional info:

Things checked while the bootstrap host is active and the installation information is still valid (and failing):
- tried downloading the "ironic-python-agent.kernel" file from different places (bootstrap, bastion hosts, another provisioned host) and in all cases it worked:
[core@control-1-ru2 ~]$ curl -6 -v -o ironic-python-agent.kernel http://[XXXX:XXXX:XXXX:XXXX::X]:80/images/ironic-python-agent.kernel
*   Trying XXXX:XXXX:XXXX:XXXX::X...
* TCP_NODELAY set
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                               Dload  Upload   Total   Spent    Left  Speed
0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to XXXX:XXXX:XXXX:XXXX::X (xxxx:xxxx:xxxx:xxxx::x) port 80 (#0)
> GET /images/ironic-python-agent.kernel HTTP/1.1
> Host: [xxxx:xxxx:xxxx:xxxx::x]
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 27 Oct 2023 08:28:09 GMT
< Server: Apache
< Last-Modified: Thu, 26 Oct 2023 08:42:16 GMT
< ETag: "a29d70-6089a8c91c494"
< Accept-Ranges: bytes
< Content-Length: 10657136
<
{ [14084 bytes data]
100 10.1M  100 10.1M    0     0   597M      0 --:--:-- --:--:-- --:--:--  597M
* Connection #0 to host xxxx:xxxx:xxxx:xxxx::x left intact

This verifies some of the components like the network setup and the httpd service running on ironic pods.

- Also gathered a listing of the contents of the ironic pod running in podman, especially the shared directory. The contents of /shared/html/inspector.ipxe seem correct compared to a working installation, and all files look to be in place.

- Logs from the ironic container show the errors coming from the node being deployed; the curl request is also shown here for comparison:

xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:19:55 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:19:55 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"
xxxx:xxxx:xxxx:xxxx::x - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 200 10657136 "-" "curl/7.61.1"
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"

This seems like an issue with iPXE and IPv6.

 

 

OCPBUGS-11856 added a patch to change termination log permissions manually. Since 4.15.z this is no longer necessary, as it is fixed by the lumberjack dependency bump.

This bug tracks reverting that carry patch.

A hostedcluster/hostedcontrolplane was stuck uninstalling. Inspecting the CPO logs showed:

"error": "failed to delete AWS default security group: failed to delete security group sg-04abe599e5567b025: DependencyViolation: resource sg-04abe599e5567b025 has a dependent object\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"

Unfortunately, I do not have enough access to the AWS account to inspect this security group, though I know it is the default worker security group because it's recorded in the hostedcluster .status.platform.aws.defaultWorkerSecurityGroupID

Version-Release number of selected component (if applicable):

4.14.1

How reproducible:

I haven't tried to reproduce it yet, but can do so and update this ticket when I do. My theory is:

Steps to Reproduce:

1. Create an AWS HostedCluster, wait for it to create/populate defaultWorkerSecurityGroupID
2. Attach the defaultWorkerSecurityGroupID to anything else in the AWS account unrelated to the HCP cluster
3. Attempt to delete the HostedCluster

Actual results:

CPO logs:
"error": "failed to delete AWS default security group: failed to delete security group sg-04abe599e5567b025: DependencyViolation: resource sg-04abe599e5567b025 has a dependent object\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"
HostedCluster Status Condition
  - lastTransitionTime: "2023-11-09T22:18:09Z"
    message: ""
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: CloudResourcesDestroyed

Expected results:

I would expect that the CloudResourcesDestroyed status condition on the hostedcluster would reflect this security group as holding up the deletion instead of having to parse through logs.

Additional info:

 

Description of problem:

This is an issue that IBM Cloud found and it likely affects Power VS. See https://issues.redhat.com/browse/OCPBUGS-28870

Install a private cluster where the base domain set in install-config.yaml is the same as an existing CIS domain name.
After destroying the private cluster, the DNS resource records remain.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1. Create a DNS Services instance, setting its domain to "ibmcloud.qe.devcluster.openshift.com". Note that this domain name is also used by an existing CIS domain.
2. Install a private IBM Cloud cluster with the base domain in install-config set to "ibmcloud.qe.devcluster.openshift.com".
3. Destroy the cluster.
4. Check the remaining DNS records.

Actual results:

$ ibmcloud dns resource-records 5f8a0c4d-46c2-4daa-9157-97cb9ad9033a -i preserved-openshift-qe-private | grep ci-op-17qygd06-23ac4
api-int.ci-op-17qygd06-23ac4.ibmcloud.qe.devcluster.openshift.com 
*.apps.ci-op-17qygd06-23ac4.ibmcloud.qe.devcluster.openshift.com 
api.ci-op-17qygd06-23ac4.ibmcloud.qe.devcluster.openshift.com

Expected results:

No DNS records for the cluster remain.

Additional info:

$ ibmcloud dns zones -i preserved-openshift-qe-private | awk '{print $2}'   
Name
private-ibmcloud.qe.devcluster.openshift.com 
private-ibmcloud-1.qe.devcluster.openshift.com 
ibmcloud.qe.devcluster.openshift.com  

$ ibmcloud cis domains
Name
ibmcloud.qe.devcluster.openshift.com

When private-ibmcloud.qe.devcluster.openshift.com or private-ibmcloud-1.qe.devcluster.openshift.com is used as the domain, there is no such issue; when ibmcloud.qe.devcluster.openshift.com is used as the domain, the DNS records remain.
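Until the destroy code handles this, the leftover records can be removed by hand. A hypothetical cleanup sketch follows; the resource-record-delete subcommand, its argument order, and the record IDs are assumptions to verify against `ibmcloud dns help` for the installed DNS Services plugin:

~~~
ZONE=5f8a0c4d-46c2-4daa-9157-97cb9ad9033a
# Note the ID of each leftover record for the cluster.
ibmcloud dns resource-records "$ZONE" -i preserved-openshift-qe-private | grep ci-op-17qygd06-23ac4
# Delete each leftover record by ID.
ibmcloud dns resource-record-delete "$ZONE" <record-id> -i preserved-openshift-qe-private
~~~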

Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver/pull/63

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When creating an alerting silence from the RHOCP UI without specifying the "Creator" field, the error "createdBy in body is required" is shown, even though the "Creator" field is not marked as mandatory.

Version-Release number of selected component (if applicable):

4.15.5

How reproducible:

100%

Steps to Reproduce:

    1. Login to webconsole (Admin view)
    2. Observe > Alerting
    3. Select the alert to silence
    4. Click Create Silence.
    5. In the Info section, update the "Comment" field and skip the "Creator" field. Now click the Create button.
    6. It will throw an error "createdBy in body is required". 
    

Actual results:

The UI allows attempting to create an alerting silence without specifying the "Creator" field, and the request then fails with "createdBy in body is required".

Expected results:

The user should not be able to create silences without specifying the "Creator" field; the field should be marked as mandatory.

Additional info:

The steps work well on versions prior to RHOCP 4.15 (tested on 4.14).

This is a clone of issue OCPBUGS-35971. The following is the description of the original issue:

This tracks disabling MGLRU by writing 0 to `/sys/kernel/mm/lru_gen/enabled`.

Description of problem:

Since 4.16.0 pods with memory limits tend to OOM very frequently when writing files larger than memory limit to PVC

Version-Release number of selected component (if applicable):

4.16.0-rc.4

How reproducible:

100% on certain types of storage
(AWS FSx, certain LVMS setups, see additional info)

Steps to Reproduce:

1. Create pod/pvc that writes a file larger than the container memory limit (attached example)
2.
3.

Actual results:

OOMKilled

Expected results:

Success

Additional info:

For simplicity, I will focus on BM setup that produces this with LVM storage.
This is also reproducible on AWS clusters with NFS backed NetApp ONTAP FSx.

Further reduced to exclude the OpenShift layer, LVM on a separate (non root) disk:

Prepare disk
lvcreate -T vg1/thin-pool-1 -V 10G -n oom-lv
mkfs.ext4 /dev/vg1/oom-lv 
mkdir /mnt/oom-lv
mount /dev/vg1/oom-lv /mnt/oom-lv

Run container
podman run -m 600m --mount type=bind,source=/mnt/oom-lv,target=/disk --rm -it quay.io/centos/centos:stream9 bash
[root@2ebe895371d2 /]# curl https://cloud.centos.org/centos/9-stream/x86_64/images/CentOS-Stream-GenericCloud-x86_64-9-20240527.0.x86_64.qcow2 -o /disk/temp
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 47 1157M   47  550M    0     0   111M      0  0:00:10  0:00:04  0:00:06  111MKilled
(Notice the process gets killed, I don't think podman ever whacks the whole container over this though)

The same process on the same hardware on a 4.15 node (9.2) does not produce an OOM
(vs 4.16 which is RHEL 9.4)

For completeness, I will provide some details about the setup behind the LVM pool, though I believe it should not impact the decision about whether this is an issue:
sh-5.1# pvdisplay 
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               vg1
  PV Size               446.62 GiB / not usable 4.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              114335
  Free PE               11434
  Allocated PE          102901
  PV UUID               <UUID>
Hardware:
SSD (INTEL SSDSC2KG480G8R) behind a RAID 0 of a PERC H330 Mini controller

At the very least, this seems like a change in behavior but tbh I am leaning towards an outright bug.

QE Verification Steps

It's been independently verified that setting /sys/kernel/mm/lru_gen/enabled = 0 avoids the OOM kills. So verifying that nodes get this value applied is the main testing concern at this point: new installs, upgrades, and new nodes scaled up after an upgrade.

If we want to go so far as to verify that the OOM kills don't happen, the kernel QE team has a simplified reproducer (linked below) that involves mounting an NFS volume and using podman to create a container with a memory limit that writes data to that NFS volume.

https://issues.redhat.com/browse/RHEL-43371?focusedId=24981771&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-24981771
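A minimal sketch for spot-checking the sysfs value across all nodes (the exact formatting of the value may vary by kernel; it is expected to read 0, often printed as 0x0000, once the change is applied):

~~~
for node in $(oc get nodes -o name); do
  echo "== ${node}"
  oc debug "${node}" -- chroot /host cat /sys/kernel/mm/lru_gen/enabled
done
~~~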

Please review the following PR: https://github.com/openshift/cluster-api-provider-gcp/pull/215

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Add a test case verifying that, once the openshift.io/image-tags quota is exceeded, creating new image references in the project is forbidden.

Version-Release number of selected component (if applicable):

    4.16
pr - https://github.com/openshift/origin/pull/28464
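For context, a minimal sketch (not from the original issue) of the kind of quota the new test exercises; the namespace, limit, and image reference are illustrative:

~~~
cat <<'EOF' | oc apply -n image-tags-test -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: image-tag-quota
spec:
  hard:
    openshift.io/image-tags: "1"
EOF
# Once the limit of image stream tags is reached, further tags in the project
# should be rejected by the quota.
oc -n image-tags-test tag --source=docker \
  quay.io/openshift-release-dev/ocp-release:4.16.0-x86_64 sample:latest
~~~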

Description of problem:

Similar to https://bugzilla.redhat.com/show_bug.cgi?id=1996624, when the AWS root credential (which must possess the "iam:SimulatePrincipalPolicy" permission) exists on a BM cluster, the CCO Pod crashes when running the secretannotator controller.

Steps to Reproduce:

1. Install a BM cluster
fxie-mac:cloud-credential-operator fxie$ oc get infrastructures.config.openshift.io cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2024-01-28T19:50:05Z"
  generation: 1
  name: cluster
  resourceVersion: "510"
  uid: 45bc2a29-032b-4c74-8967-83c73b0141c4
spec:
  cloudConfig:
    name: ""
  platformSpec:
    type: None
status:
  apiServerInternalURI: https://api-int.fxie-bm1.qe.devcluster.openshift.com:6443
  apiServerURL: https://api.fxie-bm1.qe.devcluster.openshift.com:6443
  controlPlaneTopology: SingleReplica
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: fxie-bm1-x74wn
  infrastructureTopology: SingleReplica
  platform: None
  platformStatus:
    type: None 

2. Create an AWS user with IAMReadOnlyAccess permissions:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:GenerateCredentialReport",
                "iam:GenerateServiceLastAccessedDetails",
                "iam:Get*",
                "iam:List*",
                "iam:SimulateCustomPolicy",
                "iam:SimulatePrincipalPolicy"
            ],
            "Resource": "*"
        }
    ]
}

3. Create AWS root credentials with a set of access keys of the user above
4. Trigger a reconcile of the secretannotator controller, e.g. via editing cloudcredential/cluster

Logs:

time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:CreateAccessKey" controller=secretannotator
time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:CreateUser" controller=secretannotator
time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:DeleteAccessKey" controller=secretannotator
time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:DeleteUser" controller=secretannotator
time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:DeleteUserPolicy" controller=secretannotator
time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:PutUserPolicy" controller=secretannotator
time="2024-01-29T04:47:27Z" level=warning msg="Action not allowed with tested creds" action="iam:TagUser" controller=secretannotator
time="2024-01-29T04:47:27Z" level=warning msg="Tested creds not able to perform all requested actions" controller=secretannotator
I0129 04:47:27.988535       1 reflector.go:289] Starting reflector *v1.Infrastructure (10h37m20.569091933s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233
I0129 04:47:27.988546       1 reflector.go:325] Listing and watching *v1.Infrastructure from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233
I0129 04:47:27.989503       1 reflector.go:351] Caches populated for *v1.Infrastructure from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1a964a0]
 
goroutine 341 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:115 +0x1e5
panic({0x3fe72a0?, 0x809b9e0?})
/usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/cloud-credential-operator/pkg/operator/utils/aws.LoadInfrastructureRegion({0x562e1c0?, 0xc002c99a70?}, {0x5639ef0, 0xc0001b6690})
/go/src/github.com/openshift/cloud-credential-operator/pkg/operator/utils/aws/utils.go:72 +0x40
github.com/openshift/cloud-credential-operator/pkg/operator/secretannotator/aws.(*ReconcileCloudCredSecret).validateCloudCredsSecret(0xc0008c2000, 0xc002586000)
/go/src/github.com/openshift/cloud-credential-operator/pkg/operator/secretannotator/aws/reconciler.go:206 +0x1a5
github.com/openshift/cloud-credential-operator/pkg/operator/secretannotator/aws.(*ReconcileCloudCredSecret).Reconcile(0xc0008c2000, {0x30?, 0xc000680c00?}, {0x4f38a3d?, 0x0?}, {0x4f33a20?, 0x416325?})
/go/src/github.com/openshift/cloud-credential-operator/pkg/operator/secretannotator/aws/reconciler.go:166 +0x605
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x561ff20?, {0x561ff20?, 0xc002ff3b00?}, {0x4f38a3d?, 0x3b180c0?}, {0x4f33a20?, 0x55eea08?})
/go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000189360, {0x561ff58, 0xc0007e5040}, {0x4589f00?, 0xc000570b40?})
/go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314 +0x365
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000189360, {0x561ff58, 0xc0007e5040})
/go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 183
/go/src/github.com/openshift/cloud-credential-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222 +0x565

Actual results:

CCO Pod crashes and restarts in a loop:
fxie-mac:cloud-credential-operator fxie$ oc get po -n openshift-cloud-credential-operator -w
NAME                                         READY   STATUS    RESTARTS        AGE
cloud-credential-operator-657bdffdff-9wzrs   2/2     Running   3 (2m35s ago)   8h

This is a clone of issue OCPBUGS-35519. The following is the description of the original issue:

Description of problem:

In an attempt to fix https://issues.redhat.com/browse/OCPBUGS-35300, we introduced an Azure-specific dependency on dnsmasq, which introduced a dependency loop. This bug aims to revert that chain.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/222

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-35316. The following is the description of the original issue:

Description of problem:

Live migration gets stuck when the ConfigMap "mtu" is absent. The ConfigMap "mtu" should have been created by the mtu-prober job at installation time since 4.11, but if the cluster was upgraded from a very early release, such as 4.4.4, the ConfigMap "mtu" may be absent.

Version-Release number of selected component (if applicable):

4.16.rc2

How reproducible:

 

Steps to Reproduce:

1. build a 4.16 cluster with OpenShiftSDN
2. remove the configmap mtu from the namespace cluster-network-operator.
3. start live migration.

Actual results:

Live migration gets stuck with error

NetworkTypeMigrationFailed
Failed to process SDN live migration (configmaps "mtu" not found)

Expected results:

Live migration finished successfully.

Additional info:

A workaround is to create the configmap mtu manually before starting live migration.
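A minimal sketch of that workaround, assuming the prober's ConfigMap is named "mtu", lives in the openshift-network-operator namespace, and stores the machine-network MTU under a "mtu" key (replace 1500 with the actual node-network MTU):

~~~
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: mtu
  namespace: openshift-network-operator
data:
  mtu: "1500"
EOF
~~~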

This is a clone of issue OCPBUGS-30986. The following is the description of the original issue:

Description of problem:

After we applied the Old tlsSecurityProfile to the Hypershift hosted cluster, the apiserver ran into a CrashLoopBackOff failure; this blocked our test.
    

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-03-13-061822   True        False         129m    Cluster version is 4.16.0-0.nightly-2024-03-13-061822

    

How reproducible:

    always
    

Steps to Reproduce:

    1. Specify KUBECONFIG with kubeconfig of the Hypershift management cluster
    2. hostedcluster=$( oc get -n clusters hostedclusters -o json | jq -r .items[].metadata.name)
    3. oc patch hostedcluster $hostedcluster -n clusters --type=merge -p '{"spec": {"configuration": {"apiServer": {"tlsSecurityProfile":{"old":{},"type":"Old"}}}}}'
hostedcluster.hypershift.openshift.io/hypershift-ci-270930 patched
    4. Checked the tlsSecurityProfile,
    $ oc get HostedCluster $hostedcluster -n clusters -ojson | jq .spec.configuration.apiServer
{
  "audit": {
    "profile": "Default"
  },
  "tlsSecurityProfile": {
    "old": {},
    "type": "Old"
  }
}
    

Actual results:

One of the kube-apiserver of Hosted cluster ran into CrashLoopBackOff, stuck in this status, unable to complete the old tlsSecurityProfile configuration.

$ oc get pods -l app=kube-apiserver  -n clusters-${hostedcluster}
NAME                              READY   STATUS             RESTARTS      AGE
kube-apiserver-5b6fc94b64-c575p   5/5     Running            0             70m
kube-apiserver-5b6fc94b64-tvwtl   5/5     Running            0             70m
kube-apiserver-84c7c8dd9d-pnvvk   4/5     CrashLoopBackOff   6 (20s ago)   7m38s
    

Expected results:

    Applying the old tlsSecurityProfile should be successful.
    

Additional info:

   This can also be reproduced on 4.14 and 4.15. The last passing runs of the test case are listed below:
  passed      API_Server       2024-02-19 13:34:25(UTC)    aws 	4.14.0-0.nightly-2024-02-18-123855   hypershift 	
  passed      API_Server	  2024-02-08 02:24:15(UTC)   aws 	4.15.0-0.nightly-2024-02-07-062935 	hypershift
  passed      API_Server	  2024-02-17 08:33:37(UTC)   aws 	4.16.0-0.nightly-2024-02-08-073857 	hypershift

From the history of the test, it seems that some code changes were introduced in February that caused the bug.
    

Description of problem:

Based on the Azure doc [1], NCv2 series Azure virtual machines (VMs) were retired on September 6, 2023. VMs can no longer be provisioned on those instance types.

So remove standardNCSv2Family from the azure doc tested_instance_types_x86_64 on 4.13+.

[1] https://learn.microsoft.com/en-us/azure/virtual-machines/ncv2-series    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Cluster installation fails on NCv2 series instance types.
    2.
    3.
    

Actual results:

 

Expected results:

    

Additional info:

    

Description of problem:

The installer - in some cases - will not report an error when the API fails to be detected. On Azure, the bootstrap node is under-provisioned for IOPS. The detection logic with check_url checks against an endpoint that returns 403 on HEAD requests.

Version-Release number of selected component (if applicable):

    

How reproducible:

All the time    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-41824. The following is the description of the original issue:

Description of problem:

    The kubeconfigs for the DNS Operator and the Ingress Operator are managed by Hypershift, but they should only be managed by the cloud service provider. This can lead to the kubeconfig/certificate being invalid in cases where the cloud service provider further manages the kubeconfig (for example, CA rotation).

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:


Documentation for using Red Hat subscriptions in builds is missing a few important steps, especially for customers that have not turned on the tech preview feature for the Shared Resource CSI driver.

These are the following:

1. Customer needs Simple Content Access import enabled in the Insights Operator: https://docs.openshift.com/container-platform/4.12/support/remote_health_monitoring/insights-operator-simple-access.html
2. Customer needs to copy the secret data from openshift-config-managed/etc-pki-entitlement to the workspace the build is running in. We should provide oc commands that a cluster admin/platform team can execute.
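A hedged sketch of such oc commands (the build namespace is a placeholder and jq is assumed to be available; namespace-specific metadata is stripped before re-creating the secret):

~~~
oc get secret etc-pki-entitlement -n openshift-config-managed -o json \
  | jq 'del(.metadata.namespace, .metadata.uid, .metadata.resourceVersion, .metadata.creationTimestamp, .metadata.ownerReferences, .metadata.managedFields)' \
  | oc apply -n <build-namespace> -f -
~~~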

For builds that are running in a network-restricted environment and access RHEL content through Satellite, the documentation must also provide instructions on how to obtain an `rhsm.conf` file for the Satellite instance and mount it into the build container.

Version-Release number of selected component (if applicable):


4.12

How reproducible:


Always

Steps to Reproduce:


Read the documentation for https://docs.openshift.com/container-platform/4.12/cicd/builds/running-entitled-builds.html#builds-create-imagestreamtag_running-entitled-builds and execute the commands as is.

Actual results:


Build probably won't run because the required secret is not created.

Expected results:


Customers should be able to run a build that requires RHEL entitlements following the exact steps as described in the doc.

Additional info:


https://docs.openshift.com/container-platform/4.12/cicd/builds/running-entitled-builds.html

This is a clone of issue OCPBUGS-35483. The following is the description of the original issue:

Description of problem:

 

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-06-13-084629

How reproducible:

100%

Steps to Reproduce:

1.apply configmap
*****
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
        - url: "http://invalid-remote-storage.example.com:9090/api/v1/write"
          queue_config:
            max_retries: 1
*****

2. check logs
% oc logs -c prometheus prometheus-k8s-0 -n openshift-monitoring
...
ts=2024-06-14T01:28:01.804Z caller=dedupe.go:112 component=remote level=warn remote_name=5ca657 url=http://invalid-remote-storage.example.com:9090/api/v1/write msg="Failed to send batch, retrying" err="Post \"http://invalid-remote-storage.example.com:9090/api/v1/write\": dial tcp: lookup invalid-remote-storage.example.com on 172.30.0.10:53: no such host"

3.query after 15mins
% oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=ALERTS{alertname="PrometheusRemoteStorageFailures"}' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   145  100    78  100    67    928    797 --:--:-- --:--:-- --:--:--  1726
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [],
    "analysis": {}
  }
}

% oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=prometheus_remote_storage_failures_total' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   124  100    78  100    46   1040    613 --:--:-- --:--:-- --:--:--  1653
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [],
    "analysis": {}
  }
}

Actual results:

The alert did not trigger.

Expected results:

The alert triggers, and the alert and metrics are visible.

Additional info:

The metrics below show as `No datapoints found.`
prometheus_remote_storage_failures_total
prometheus_remote_storage_samples_dropped_total
prometheus_remote_storage_retries_total
`prometheus_remote_storage_samples_failed_total` value is 0

Description of problem:

For certain operators, warning info is shown on the operator detail modal page and on the installation page when the cluster is an Azure WI/FI cluster. The warning info titles are not consistent between these two pages.
    

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-21-155123
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Prepare an operator that shows warning info on an Azure WI/FI cluster.
    2. Log in to the console of the Azure WI/FI cluster and check the warning info title on the operator detail modal and on the installation page.
    3.
    

Actual results:

2. On the operator detail item modal, the warning title is "Cluster in Azure Workload Identity / Federated Identity Mode", while on the installation page the warning info title is "Cluster in Workload Identity / Federated Identity Mode". The word "Azure" is missing on the second page.
    

Expected results:

2. The warning titles should be consistent.
    

Additional info:

screenshot: https://drive.google.com/drive/folders/1alFBEtO1gN4q5_mAtHCNzuLTOe5zXp0K?usp=drive_link
    

Description of problem:

    The node-network-identity deployment should be configured to assist in a controlled rollout of the microservice pods. The general goal is to have a microservice pod only report to Kubernetes as being ready when it has completed initialization and is stable enough to complete tasks.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/coredns/pull/107

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Owner: Architect:

Story (Required)

As an ODC helm backend developer, I would like to bump the version of helm to 3.13 to stay in sync with the version we will ship with OCP 4.15.

Background (Required)

Normal activity we do every time a new OCP version is released, to stay current.

Glossary

NA

Out of scope

NA

Approach(Required)

Bump the version of helm to 3.13, then run, build, and unit test to make sure everything is working as expected; a rough sketch follows below. Last time we had a conflict with the DevFile backend.
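A rough sketch of the dependency bump in the console backend repo (the exact patch release is illustrative):

~~~
go get helm.sh/helm/v3@v3.13.3
go mod tidy
go build ./... && go test ./...
~~~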

Dependencies

Might have dependencies on the DevFile team to move some dependencies forward.

Edge Case

NA

Acceptance Criteria

Console Helm dependency is moved to 3.13

INVEST Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated


Description of problem:

    Logs like the following are constantly emitted by the clustersizing controller:
{"level":"error","ts":"2024-05-13T11:30:43Z","msg":"Reconciler error","controller":"hostedclustersizing","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedCluster","HostedCluster":{"name":"dry-4","namespace":"ocm-staging-2b611n1002jcb8ikrn3to0vbds64qup9"},"namespace":"ocm-staging-2b611n1002jcb8ikrn3to0vbds64qup9","name":"dry-4","reconcileID":"7e4c2fa1-a2cb-40e7-bed3-38d2d498e59d","error":"could not get hosted cluster ocm-staging-2b611n1002jcb8ikrn3to0vbds64qup9/dry-4: HostedCluster.hypershift.openshift.io \"dry-4\" not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}

Version-Release number of selected component (if applicable):

    4.16.0

How reproducible:

  sometimes  

Steps to Reproduce:

    1. Set up a management cluster with size tagging
    2. Create a hosted cluster
    3. Delete the hosted cluster
    

Actual results:

    The hypershift operator continues logging about not being able to find the deleted hosted cluster.

Expected results:

    No additional logging happens.

Additional info:

    The clustersizing controller returns an error when it can't find a hostedcluster instead of returning nil. This causes that hostedcluster to be requeued indefinitely.

This is a clone of issue OCPBUGS-38457. The following is the description of the original issue:

This is a clone of issue OCPBUGS-29240. The following is the description of the original issue:

Manila drivers and node-registrar should be configured to use healthchecks.

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.

2.

3.

 

Actual results:

 

Expected results:

 

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure 
  2. customer issue / SD
  3. internal RedHat testing failure

 

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

 

If it is a CI failure:

 

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

 

If it is a customer / SD issue:

 

  • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
  • Don’t presume that Engineering has access to Salesforce.
  • Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment.  The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
  • Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).  
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and  dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, labels with “sbr-untriaged”
  • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

The upstream project reorganized the config directory and we need to adapt it for downstream. Until then, upstream->downstream syncing is blocked.

Please review the following PR: https://github.com/openshift/machine-api-provider-gcp/pull/75

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The apbexternalroute and egressfirewall status shows as empty on a hypershift hosted cluster.

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-17-173511 

How reproducible:

always

Steps to Reproduce:

1. setup hypershift, login hosted cluster
% oc get node
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-128-55.us-east-2.compute.internal    Ready    worker   125m   v1.28.4+7aa0a74
ip-10-0-129-197.us-east-2.compute.internal   Ready    worker   125m   v1.28.4+7aa0a74
ip-10-0-135-106.us-east-2.compute.internal   Ready    worker   125m   v1.28.4+7aa0a74
ip-10-0-140-89.us-east-2.compute.internal    Ready    worker   125m   v1.28.4+7aa0a74


2. create new project test
% oc new-project test


3. create apbexternalroute and egressfirewall on hosted cluster
apbexternalroute yaml file:
---
apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
  name: apbex-route-policy
spec:
  from:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: test
  nextHops:
    static:
    - ip: "172.18.0.8"
    - ip: "172.18.0.9"
% oc apply -f apbexroute.yaml 
adminpolicybasedexternalroute.k8s.ovn.org/apbex-route-policy created

egressfirewall yaml file:
---
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  - type: Allow
    to: 
      cidrSelector: 0.0.0.0/0
% oc apply -f egressfw.yaml 
egressfirewall.k8s.ovn.org/default created


3. oc get apbexternalroute and oc get egressfirewall

Actual results:

The status show empty:
% oc get apbexternalroute
NAME                 LAST UPDATE   STATUS
apbex-route-policy   49s                     <--- status is empty
% oc describe apbexternalroute apbex-route-policy | tail -n 8
Status:
  Last Transition Time:  2023-12-19T06:54:17Z
  Messages:
    ip-10-0-135-106.us-east-2.compute.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
    ip-10-0-129-197.us-east-2.compute.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
    ip-10-0-128-55.us-east-2.compute.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
    ip-10-0-140-89.us-east-2.compute.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
Events:  <none>

% oc get egressfirewall
NAME      EGRESSFIREWALL STATUS
default                           <--- status is empty 
% oc describe egressfirewall default | tail -n 8
    Type:             Allow
Status:
  Messages:
    ip-10-0-129-197.us-east-2.compute.internal: EgressFirewall Rules applied
    ip-10-0-128-55.us-east-2.compute.internal: EgressFirewall Rules applied
    ip-10-0-140-89.us-east-2.compute.internal: EgressFirewall Rules applied
    ip-10-0-135-106.us-east-2.compute.internal: EgressFirewall Rules applied
Events:  <none>

Expected results:

The status is shown correctly.
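A hedged way to check the field behind the empty STATUS columns directly; the .status.status path is an assumption based on the CRDs' printer columns:

~~~
oc get apbexternalroute apbex-route-policy -o jsonpath='{.status.status}{"\n"}'
oc -n test get egressfirewall default -o jsonpath='{.status.status}{"\n"}'
~~~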

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure 
  2. customer issue / SD
  3. internal RedHat testing failure

 

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

 

If it is a CI failure:

 

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

 

If it is a customer / SD issue:

 

  • Provide enough information in the bug description that Engineering doesn't need to read the entire case history.
  • Don't presume that Engineering has access to Salesforce.
  • Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment.  The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
  • Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).  
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and  dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • For OCPBUGS in which the issue has been identified, label with "sbr-triaged"
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, labels with "sbr-untriaged"
  • Note: bugs that do not meet these minimum standards will be closed with label "SDN-Jira-template"

Please review the following PR: https://github.com/openshift/images/pull/158

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

In the `DoHTTPProbe` function located at `github.com/openshift/router/pkg/router/metrics/probehttp/probehttp.go`, logging of the HTTP response object at verbosity level 4 results in a serialisation error due to non-serialisable fields within the `http.Response` object. The error logged is `<<error: json: unsupported type: func() (io.ReadCloser, error)>>`, pointing towards an inability to serialise the `Body` field, which is of type `io.ReadCloser`.

This function is designed to check if a GET request to the specified URL succeeds, logging detailed response information at higher verbosity levels for diagnostic purposes.

Steps to Reproduce:
1. Increase the logging level to 4.
2. Perform an operation that triggers the `DoHTTPProbe` function.
3. Review the logging output for the error message.

Expected Behaviour:

The logger should gracefully handle or exclude non-serialisable fields like `Body`, ensuring clean and informative logging output that aids in diagnostics without encountering serialisation errors.

Actual Behaviour:

Non-serialisable fields in the `http.Response` object lead to the error `<<error: json: unsupported type: func() (io.ReadCloser, error)>>` being logged. This diminishes the utility of logs for debugging at higher verbosity levels.

Impact:

The issue is considered of low severity since it only appears at logging level 4, which is beyond standard operational levels (level 2) used in production. Nonetheless, it could hinder effective diagnostics and clutter logs with errors when high verbosity levels are employed for troubleshooting.

Suggested Fix:

Modify the logging functionality within `DoHTTPProbe` to either filter out non-serialisable fields from the `http.Response` object or implement a custom serialisation approach that allows these fields to be logged in a more controlled and error-free manner.

Issue customer is experiencing:
Despite manually removing the alternate service (old) and saving the configuration from the UI, the alternate service did not get removed from the route, and the changes did not take effect.

From the UI, if they use the Form view, select Remove Alternate Service, and click Save, refreshing the route information still shows the route configuration with the alternate service defined.
If they use the YAML view, remove the entry from there, and save, it is removed properly.
If they use the CLI, edit the route, and remove the alternate service section, it also works properly.
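For reference, a CLI equivalent of removing the alternate service (a sketch against the route and namespace used in the test below, not taken from the original case):

~~~
oc -n test-ab patch route httpd-example --type=json \
  -p '[{"op":"remove","path":"/spec/alternateBackends"}]'
~~~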

Tests:

I have tested this scenario in my test cluster with OCP v4.13

  • I have created a route with the Alternate Backends:
    ~~~
  1. oc describe routes.route.openshift.io
    Name: httpd-example
    Namespace: test-ab
    Created: 5 minutes ago
    Labels: app=httpd-example
    template=httpd-example
    Annotations: openshift.io/generated-by=OpenShiftNewApp
    openshift.io/host.generated=true
    Requested Host: httpd-example-test-ab.apps.shrocp4upi413ovn.lab.upshift.rdu2.redhat.com
    exposed on router default (host router-default.apps.shrocp4upi413ovn.lab.upshift.rdu2.redhat.com) 5 minutes ago
    Path: <none>
    TLS Termination: <none>
    Insecure Policy: <none>
    Endpoint Port: <all endpoint ports>
    Service: httpd-example <-----------
    Weight: 50 (50%)
    Endpoints: <none>
    Service: pod-b <-----------
    Weight: 50 (50%)
    Endpoints: <none>
    ~~~
  • Then I tried deleting it from the Console.
  • After removing the Alternate Backend from the console in the Form view, I saved the config.
  • But upon checking the route details again in the CLI, I could see the same Alternate Backend even though I have removed it:
    ~~~
  1. oc describe routes.route.openshift.io
    Name: httpd-example
    Namespace: test-ab
    Created: 12 minutes ago
    Labels: app=httpd-example
    template=httpd-example
    Annotations: openshift.io/generated-by=OpenShiftNewApp
    openshift.io/host.generated=true
    Requested Host: httpd-example-test-ab.apps.shrocp4upi413ovn.lab.upshift.rdu2.redhat.com
    exposed on router default (host router-default.apps.shrocp4upi413ovn.lab.upshift.rdu2.redhat.com) 12 minutes ago
    Path: <none>
    TLS Termination: <none>
    Insecure Policy: <none>
    Endpoint Port: web
    Service: httpd-example <-----
    Weight: 100 (66%)
    Endpoints: 10.131.0.148:8080
    Service: pod-b <-----
    Weight: 50 (33%)
    Endpoints: <none>
    ~~~

Please review the following PR: https://github.com/openshift/cluster-api-provider-baremetal/pull/206

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The e2e-aws-ovn-shared-to-local-gateway-mode-migration and e2e-aws-ovn-local-to-shared-gateway-mode-migration jobs fail about 50% of the time with

+ oc patch Network.operator.openshift.io cluster --type=merge --patch '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":false}}}}}'
network.operator.openshift.io/cluster patched
+ oc wait co network --for=condition=PROGRESSING=True --timeout=60s
error: timed out waiting for the condition on clusteroperators/network 

This is a clone of issue OCPBUGS-38616. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38599. The following is the description of the original issue:

Description of problem:

If folder is undefined and the datacenter exists in a datacenter-based folder
the installer will create the entire path of folders from the root of vcenter - which is incorrect

This does not occur if folder is defined.

An upstream bug was identified when debugging this:

https://github.com/vmware/govmomi/issues/3523

Description of problem:

During a pod deletion, the whereabouts reconciler correctly detects the pod deletion but errors out claiming that the IPPool is not found. However, when checking the audit logs, we see no deletion and no re-creation, and we can even see successful "patch" and "get" requests against the same IPPool. This means the IPPool was never deleted and was properly accessible at the time of the issue, so the reconciler appears to have made a mistake while retrieving the IPPool.
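
A hedged way to confirm the IPPool is still present while the reconciler reports it missing (the CRD group is the standard whereabouts one; the openshift-multus namespace is the usual default and is an assumption here):

~~~
# List the whereabouts IPPool objects and confirm the pool the reconciler complains about still exists
oc get ippools.whereabouts.cni.cncf.io -n openshift-multus
~~~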

Version-Release number of selected component (if applicable):

4.12.22

How reproducible:

Sometimes    

Steps to Reproduce:

    1.Delete pod
    2.
    3.
    

Actual results:

Error in the whereabouts reconciler. New pods using additional networks with the whereabouts IPAM plugin cannot have IPs allocated due to the incorrect cleanup.

Expected results:

    

Additional info:

    

Description of problem:

oc-mirror with v2 creates the idms file as output, but the source looks like:
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  creationTimestamp: null
  name: idms-2024-01-08t04-19-04z
spec:
  imageDigestMirrors:
  - mirrors:
    - ec2-3-144-29-184.us-east-2.compute.amazonaws.com:5000/ocp2/openshift
    source: localhost:55000/openshift
  - mirrors:
    - ec2-3-144-29-184.us-east-2.compute.amazonaws.com:5000/ocp2/openshift-release-dev
    source: quay.io/openshift-release-dev
status: {}

The source should always be the origin registry, e.g. quay.io/openshift-release-dev

 

Version-Release number of selected component (if applicable):

   

How reproducible:

always  

Steps to Reproduce:

   1. run the commands with --v2, using the following ImageSetConfiguration:
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
mirror:
  platform:
    channels:
      - name: stable-4.14
        minVersion: 4.14.3
        maxVersion: 4.14.3
    graph: true

`oc-mirror --config config.yaml file://out --v2` 
`oc-mirror --config config.yaml --from file://out  --v2 docker://xxxx:5000/ocp2`    
2. check the idms file 

Actual results:

    2. cat idms-2024-01-08t04-19-04z.yaml 
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  creationTimestamp: null
  name: idms-2024-01-08t04-19-04z
spec:
  imageDigestMirrors:
  - mirrors:
    - xxxx.com:5000/ocp2/openshift
    source: localhost:55000/openshift
  - mirrors:
    - xxxx.com:5000/ocp2/openshift-release-dev
    source: quay.io/openshift-release-dev

Expected results:

The source should not be localhost:55000; it should be the origin registry.
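
A quick, hedged way to spot the bad entries in the generated file (file name taken from the output above):

~~~
# Every "source:" entry should reference the origin registry, never localhost:55000
grep -n 'source:' idms-2024-01-08t04-19-04z.yaml
~~~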

Additional info:

    

Description of problem:

NodeLogQuery e2e tests are failing with Kubernetes 1.28 bump. Example:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_kubernetes/1646/pull-ci-openshift-kubernetes-master-k8s-e2e-gcp-ovn/1683472309211369472

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

There is no response when the user clicks on quick start items.
    

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-02-26-013420
browser 122.0.6261.69 (Official Build) (64-bit)
    

How reproducible:

always
    

Steps to Reproduce:

    1. Go to the quick starts page by clicking "View all quick starts" on the Home -> Overview page.
    2. Click on any quickstart item to check its steps.
    3.
    

Actual results:

2. There is no response.
    

Expected results:

2. Should open quickstart sidepage for installation instructions.
    

Additional info:

The issue doesn't exist on firefox 123.0 (64-bit)
    

Please review the following PR: https://github.com/openshift/ibm-powervs-block-csi-driver-operator/pull/53

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

OCP 4.15 nightly deployment on bare-metal servers without using the provisioning network is stuck during deployment.

Job history:

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-telco5g

Deployment is stuck, similar to this:

Upstream job logs:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-telco5g/1732520780954079232/artifacts/e2e-telco5g/telco5g-cluster-setup/artifacts/cloud-init-output.log

~~~

level=debug msg=ironic_node_v1.openshift-master-host[2]: Creating...level=debug msg=ironic_node_v1.openshift-master-host[0]: Creating...level=debug msg=ironic_node_v1.openshift-master-host[1]: Creating...level=debug msg=ironic_node_v1.openshift-master-host[0]: Still creating... [10s elapsed]..level=debug msg=ironic_node_v1.openshift-master-host[0]: Still creating... [2h28m51s elapsed]level=debug msg=ironic_node_v1.openshift-master-host[1]: Still creating... [2h28m51s elapsed]
~~~

Ironic logs from bootstrap node:
~~~
Dec 07 13:10:13 localhost.localdomain start-provisioning-nic.sh[3942]: Error: failed to modify ipv4.addresses: invalid IP address: Invalid IPv4 address ''.
Dec 07 13:10:13 localhost.localdomain systemd[1]: provisioning-interface.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 07 13:10:13 localhost.localdomain systemd[1]: provisioning-interface.service: Failed with result 'exit-code'.
Dec 07 13:10:13 localhost.localdomain systemd[1]: Failed to start Provisioning interface.
Dec 07 13:10:13 localhost.localdomain systemd[1]: Dependency failed for DHCP Service for Provisioning Network.
Dec 07 13:10:13 localhost.localdomain systemd[1]: ironic-dnsmasq.service: Job ironic-dnsmasq.service/start failed with result 'dependency'
~~~

Version-Release number of selected component (if applicable):

4.15 

How reproducible:

Every time

Steps to Reproduce:

1. Deploy OCP

More information about our setup:
More information about our setup:
In our environment, we have 3 virtual master nodes, 1 virtual worker, and 1 bare-metal worker. We use the KCLI tool to create the virtual environment and to run the IPI deployment workflow. In our setup we do not use a provisioning network. (The same setup is used for other OCP versions up to 4.14 and works fine.)

We have attached our install-config.yaml (for RH employees) and logs from bootstrap node.
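
For reference, a hedged sketch of the install-config.yaml fragment that disables the provisioning network (values are illustrative and not copied from the attached config):

~~~
# The relevant baremetal platform setting when no provisioning network is used:
#   platform:
#     baremetal:
#       provisioningNetwork: "Disabled"
grep -A 5 'baremetal:' install-config.yaml
~~~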

Actual results:

Deployment is failing

Dec 07 13:10:13 localhost.localdomain start-provisioning-nic.sh[3942]: Error: failed to modify ipv4.addresses: invalid IP address: Invalid IPv4 address ''.

Expected results:

Deployment should pass

Additional info:

    

Description of problem:

    In https://github.com/openshift/installer/pull/8248, the bootstrap node metadata was overridden and the proxy information was no longer used. 

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-35416. The following is the description of the original issue:

Description of problem:

The presubmit test that expects an inactive CPMS to be regenerated resets the state at the end of the test.
In doing so, it causes the CPMS generator to re-generate back to the original state.
Part of regeneration involves deleting and recreating the CPMS.

If the regeneration is not quick enough, the next part of the test can fail, as it is expecting the CPMS to exist.

We should change this to an eventually to avoid the race between the generator and the test.

See https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-control-plane-machine-set-operator/304/pull-ci-openshift-cluster-control-plane-machine-set-operator-release-4.13-e2e-aws-operator/1801195115868327936 as an example failure

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Please review the following PR: https://github.com/openshift/csi-operator/pull/81

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

During OpenShift 4.16 cluster installation, the installer (which uses the terraform module) is unable to create tags for the security groups associated with the master/worker nodes, since the tag is in key=value format.

Error log for reference:
level=error msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed during pre-provisioning: faile                    d to create security groups: failed to tag the Control plane security group: Resource not found: [PUT
https://example.cloud:443/v2.0/security-groups/sg-id/tags/openshiftClusterID=ocpclientprod2-vwgsc]    

Version-Release number of selected component (if applicable):

4.16.0    

How reproducible:

100%    

Steps to Reproduce:

    1. Create install-config
    2. run the 4.16 installer
    3. Observe the installation logs
    

Actual results:

installation fails to tag the security group    

Expected results:

installation to be successful    

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-samples-operator/pull/527

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Changes made for faster risk cache-warming (the OCPBUGS-19512 series) introduced an unfortunate cycle:

1. Cincinnati serves vulnerable PromQL, like graph-data#4524.
2. Clusters pick up that broken PromQL, try to evaluate, and fail. Re-eval-and-fail loop continues.
3. Cincinnati PromQL fixed, like graph-data#4528.
4. Cases:

    • (a) Before the cache-warming changes, and also after this bug's fix, Clusters pick up the fixed PromQL, try to evaluate, and start succeeding. Hooray!
    • (b) Clusters with the cache-warming changes but without this bug's fix say "it's been a long time since we pulled fresh Cincinnati information, but it has not been long since my last attempt to eval this broken PromQL, so let me skip the Cincinnati pull and re-eval that old PromQL", which fails. Re-eval-and-fail loop continues.

Version-Release number of selected component (if applicable):

The regression went back via:

Updates from those releases (and later in their 4.y, until this bug lands a fix) to later releases are exposed.

How reproducible:

Likely very reproducible for exposed releases, but only when clusters are served PromQL risks that will consistently fail evaluation.

Steps to Reproduce:

1. Launch a cluster.
2. Point it at dummy Cincinnati data, as described in OTA-520. Initially declare a risk with broken PromQL in that data, like cluster_operator_conditions.
3. Wait until the cluster is reporting Recommended=Unknown for those risks (oc adm upgrade --include-not-recommended).
4. Update the risk to working PromQL, like group(cluster_operator_conditions). Alternatively, update anything about the update-service data (e.g. adding a new update target with a path from the cluster's version).
5. Wait 10 minutes for the CVO to have plenty of time to pull that new Cincinnati data.
6. oc get -o json clusterversion version | jq '.status.conditionalUpdates[].risks[].matchingRules[].promql.promql' | sort | uniq | jq -r .

Actual results:

Exposed releases will still have the broken PromQL in their output (or will lack the new update target you added, or whatever the Cincinnati data change was).

Expected results:

Fixed releases will have picked up the fixed PromQL in their output (or will have the new update target you added, or whatever the Cincinnati data change was).

Additional info:

Identification

To detect exposure in collected Insights, look for EvaluationFailed conditionalUpdates like:

$ oc get -o json clusterversion version | jq -r '.status.conditionalUpdates[].conditions[] | select(.type == "Recommended" and .status == "Unknown" and .reason == "EvaluationFailed" and (.message | contains("invalid PromQL")))'
{
  "lastTransitionTime": "2023-12-15T22:00:45Z",
  "message": "Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34\nAdding a new worker node will fail for clusters running on ARO. https://issues.redhat.com/browse/MCO-958",
  "reason": "EvaluationFailed",
  "status": "Unknown",
  "type": "Recommended"
} 

To confirm in-cluster vs. other EvaluationFailed invalid PromQL issues, you can look for Cincinnati retrieval attempts in CVO logs. Example from a healthy cluster:

$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from\|PromQL' | tail
I1221 20:36:39.783530       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:36:39.831358       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"
I1221 20:40:19.674925       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:40:19.727998       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"
I1221 20:43:59.567369       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:43:59.620315       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"
I1221 20:47:39.457582       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:47:39.509505       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"
I1221 20:51:19.348286       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:51:19.401496       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"

showing fetch lines every few minutes. And from an exposed cluster, only showing PromQL eval lines:

$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from\|PromQL' | tail
I1221 20:50:10.165101       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:11.166170       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:12.166314       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:13.166517       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:14.166847       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:15.167737       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:16.168486       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:17.169417       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:18.169576       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:19.170544       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from' | tail
...no hits...

Recovery

If bitten, the remediation is to address the invalid PromQL. For example, we fixed that AROBrokenDNSMasq expression in graph-data#4528. After that, the local cluster administrator should restart their CVO, such as with:

$ oc -n openshift-cluster-version delete -l k8s-app=cluster-version-operator pods
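
As a hedged follow-up check after the restart, the same log grep used above should start showing fresh Cincinnati fetches again within a few minutes:

~~~
oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from'
~~~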

Description of problem:

Test case:
[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster should start and expose a secured proxy and unsecured metrics [apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]

Example Z Job Link:
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-ovn-remote-libvirt-s390x/1754481543524388864

Z must-gather Link:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-ovn-remote-libvirt-s390x/1754481543524388864/artifacts/ocp-e2e-ovn-remote-libvirt-s390x/gather-libvirt/artifacts/must-gather.tar

Example P Job Link:
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-ovn-remote-libvirt-ppc64le/1754481543436308480

P must-gather Link:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-ovn-remote-libvirt-ppc64le/1754481543436308480/artifacts/ocp-e2e-ovn-remote-libvirt-ppc64le/gather-libvirt/artifacts/must-gather.tar

JSON body of error:
{  fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:383]: Unexpected error:
    <*fmt.wrapError | 0xc001d9c000>: 
    https://thanos-querier.openshift-monitoring.svc:9091: request failed: Get "https://thanos-querier.openshift-monitoring.svc:9091": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.38.188:53: no such host
    {
        msg: "https://thanos-querier.openshift-monitoring.svc:9091: request failed: Get \"https://thanos-querier.openshift-monitoring.svc:9091\": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.38.188:53: no such host",
        err: <*url.Error | 0xc0020e02d0>{
            Op: "Get",
            URL: "https://thanos-querier.openshift-monitoring.svc:9091",
            Err: <*net.OpError | 0xc000b8f770>{
                Op: "dial",
                Net: "tcp",
                Source: nil,
                Addr: nil,
                Err: <*net.DNSError | 0xc0020df700>{
                    Err: "no such host",
                    Name: "thanos-querier.openshift-monitoring.svc",
                    Server: "172.30.38.188:53",
                    IsTimeout: false,
                    IsTemporary: false,
                    IsNotFound: true,
                },
            },
        },
    }

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Observe Nightlies on P and/or Z
    

Actual results:

    Test failing

Expected results:

    Test passing

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-api-provider-libvirt/pull/274

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-39226. The following is the description of the original issue:

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible: Always

Repro Steps:

Add: "bridge=br0:enpf0,enpf2 ip=br0:dhcp" to the dracut cmdline. Make sure either enpf0 or enpf2 is the primary network of the cluster subnet.

The linux bridge can be configured to add a virtual switch between one or many ports. This can be done by a simple machine config that adds:
"bridge=br0:enpf0,enpf2 ip=br0:dhcp"
to the kernel command line options, which will be processed by dracut (a minimal MachineConfig sketch is shown below).
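
A minimal sketch of such a MachineConfig, assuming a worker role and an illustrative object name:

~~~
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-bridge-kargs
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - bridge=br0:enpf0,enpf2
    - ip=br0:dhcp
EOF
~~~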

The use case of adding such a virtual bridge for simple IEEE802.1 switching is to support PCIe devices that act as co-processors in a baremetal server. For example:
 --------                  ---------------------
 | Host |                  | PCIe Co-processor |
 | eth0 | <-------> enpf0 <----- br0 -----> enpf2 <---> network
 --------                  ---------------------
This co-processor could be a "DPU" network interface card. Thus the co-processor can be part of the same underlay network as the cluster and pods can be scheduled on the Host and the Co-processor. This allows for pods to be offloaded to the co-processor for scaling workloads.

Actual results:

ovs-configuration service fails.

Expected results:

ovs-configuration service passes with the bridge interface added to the ovs bridge.

Description of problem:

The YAML sidebar is occupying too much space on some pages   

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2024-01-03-140457    

How reproducible:

Always    

Steps to Reproduce:

1. Go to Deployment/DeploymentConfig creation page
2. Choose 'YAML view'
3. (for comparison) Go to other resources YAML page, open the sidebar    

Actual results:

We can see the sidebar is occupying too much screen compared with other resources YAML page

Expected results:

We should reduce the space sidebar occupies

Additional info:

    

Description of problem:

The SAST scans keep coming up with bogus positive results from test and vendor files. This bug is just a placeholder to allow us to backport the change to ignore those files.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Today these are in isPodLog in the JavaScript; we'd like them in their own section, preferably charted very close to the node update section.

Please review the following PR: https://github.com/openshift/cluster-api-provider-ibmcloud/pull/70

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-35054. The following is the description of the original issue:

Description of problem:

Create a VPC and subnets with the following configs [refer to the attached CF template]:
Subnets (subnets-pair-default) in CIDR 10.0.0.0/16
Subnets (subnets-pair-134) in CIDR 10.134.0.0/16
Subnets (subnets-pair-190) in CIDR 10.190.0.0/16

Create cluster into subnets-pair-134, the bootstrap process fails [see attached log-bundle logs]:

level=debug msg=I0605 09:52:49.548166 	937 loadbalancer.go:1262] "adding attributes to load balancer" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="openshift-cluster-api-guests/yunjiang29781a-86-rvqd9" namespace="openshift-cluster-api-guests" name="yunjiang29781a-86-rvqd9" reconcileID="a9310bd5-acc7-4b01-8a84-e47139fc0d1d" cluster="openshift-cluster-api-guests/yunjiang29781a-86-rvqd9" attrs=[{"Key":"load_balancing.cross_zone.enabled","Value":"true"}]
level=debug msg=I0605 09:52:49.909861 	937 awscluster_controller.go:291] "Looking up IP address for DNS" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="openshift-cluster-api-guests/yunjiang29781a-86-rvqd9" namespace="openshift-cluster-api-guests" name="yunjiang29781a-86-rvqd9" reconcileID="a9310bd5-acc7-4b01-8a84-e47139fc0d1d" cluster="openshift-cluster-api-guests/yunjiang29781a-86-rvqd9" dns="yunjiang29781a-86-rvqd9-int-19a9485653bf29a1.elb.us-east-2.amazonaws.com"
level=debug msg=I0605 09:52:53.483058 	937 reflector.go:377] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: forcing resync
level=debug msg=Fetching Bootstrap SSH Key Pair...

Checking security groups:
<infraid>-lb allows 10.0.0.0/16:6443 and 10.0.0.0/16:22623
<infraid>-apiserver-lb allows 10.0.0.0/16:6443 and 10.134.0.0/16:22623 (and 0.0.0.0/0:6443)

are these settings correct?

    

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-06-03-060250
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Create subnets using the attached CF template
    2. Create cluster into subnets which CIDR is 10.134.0.0/16
    3.
    

Actual results:

Bootstrap process fails.
    

Expected results:

Bootstrap succeeds.
    

Additional info:

No issues if creating cluster into subnets-pair-default (10.0.0.0/16)
No issues if only one CIDR in VPC, e.g. set VpcCidr to 10.134.0.0/16 in https://github.com/openshift/installer/blob/master/upi/aws/cloudformation/01_vpc.yaml

    

Since https://github.com/openshift/installer/pull/8093 merged, CI jobs for the agent appliance have been broken. It appears that the agent-register-cluster.service is no longer getting enabled.

Description of problem:

Hosted control plane clusters on OCP 4.16 are using default catalog sources (redhat-operators, certified-operators, community-operators and redhat-marketplace) pointing to the 4.14 index images, so 4.16 operators are not available, and this cannot be updated from within the guest cluster.

Version-Release number of selected component (if applicable):

4.16.0

How reproducible:

Always

Steps to Reproduce:

1. check the .spec.image of the default catalog sources in openshift-marketplace namespace.
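
A hedged sketch of the check in step 1, run against the hosted cluster (redhat-operators is used as an example; the same applies to the other default catalog sources):

~~~
oc -n openshift-marketplace get catalogsource redhat-operators -o jsonpath='{.spec.image}{"\n"}'
# The tag is expected to be :v4.16, not :v4.14
~~~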

Actual results:

the default catalogs are pointing to :v4.14

Expected results:

they should point to :v4.16 instead

Additional info:

    

Please review the following PR: https://github.com/openshift/azure-file-csi-driver/pull/45

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Looking at recent metal-ipi CI jobs

Some of the bootstrap failures seem to be caused by master nodes failing to come up

Search https://search.dptools.openshift.org/?search=Got+0+worker+nodes%2C+%5B12%5D+master+nodes%2C&maxAge=336h&context=-1&type=build-log&name=metal-ipi&excludeName=&maxMatches=1&maxBytes=20971520&groupBy=none
43 results over the last 14 days 

e.g.
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-upgrade-ovn-ipv6/1779842483996332032

level=error msg=ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 1 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).

Background:

CCO was made optional in OCP 4.15, see https://issues.redhat.com/browse/OCPEDGE-69. CloudCredential was introduced as a new capability in openshift/api. We need to bump the openshift/api dependency in oc to include the CloudCredential capability so that oc adm release extract works correctly.

Description of problem:

Some relevant CredentialsRequests are not extracted by the following command: oc adm release extract --credentials-requests --included --install-config=install-config.yaml ...
where install-config.yaml looks like the following:
...
capabilities:
  baselineCapabilitySet: None
  additionalEnabledCapabilities:
  - MachineAPI
  - CloudCredential
platform:
  aws:
...

Logs:

...
I1209 19:57:25.968783 79037 extract.go:418] Found manifest 0000_50_cloud-credential-operator_05-iam-ro-credentialsrequest.yaml
I1209 19:57:25.968902 79037 extract.go:429] Excluding Group: "cloudcredential.openshift.io" Kind: "CredentialsRequest" Namespace: "openshift-cloud-credential-operator" Name: "cloud-credential-operator-iam-ro": unrecognized capability names: CloudCredential
...

Please review the following PR: https://github.com/openshift/machine-api-operator/pull/1190

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

BuildRun logs cannot be displayed in the console, and the following error is shown:

The buildrun is created and started using the shp cli (similar behavior is observed when the build is created & started via console/yaml too):

shp build create goapp-buildah \
    --strategy-name="buildah" \
    --source-url="https://github.com/shipwright-io/sample-go" \
    --source-context-dir="docker-build" \
    --output-image="image-registry.openshift-image-registry.svc:5000/demo/go-app" 

The issue occurs on OCP 4.14.6. Investigation showed that this works correctly on OCP 4.14.5.

Description of problem:

It has been noticed that ovs-monitor-ipsec fails to import the certificate into the NSS DB with the following error.

2024-04-17T19:57:21.140989157Z 2024-04-17T19:57:21Z |  6  | reconnect | INFO | unix:/var/run/openvswitch/db.sock: connecting...
2024-04-17T19:57:21.142234972Z 2024-04-17T19:57:21Z |  9  | reconnect | INFO | unix:/var/run/openvswitch/db.sock: connected
2024-04-17T19:57:21.170709468Z 2024-04-17T19:57:21Z |  14 | ovs-monitor-ipsec | INFO | Tunnel ovn-69b991-0 appeared in OVSDB
2024-04-17T19:57:21.171379359Z 2024-04-17T19:57:21Z |  16 | ovs-monitor-ipsec | INFO | Tunnel ovn-52bc87-0 appeared in OVSDB
2024-04-17T19:57:21.171826906Z 2024-04-17T19:57:21Z |  18 | ovs-monitor-ipsec | INFO | Tunnel ovn-3e78bb-0 appeared in OVSDB
2024-04-17T19:57:21.172300675Z 2024-04-17T19:57:21Z |  20 | ovs-monitor-ipsec | INFO | Tunnel ovn-12fb32-0 appeared in OVSDB
2024-04-17T19:57:21.172726970Z 2024-04-17T19:57:21Z |  22 | ovs-monitor-ipsec | INFO | Tunnel ovn-8a4d01-0 appeared in OVSDB
2024-04-17T19:57:21.178644919Z 2024-04-17T19:57:21Z |  24 | ovs-monitor-ipsec | ERR | Import cert and key failed.
2024-04-17T19:57:21.178644919Z b"No cert in -in file '/etc/openvswitch/keys/ipsec-cert.pem' matches private key\n80FBF36CDE7F0000:error:05800074:x509 certificate routines:X509_check_private_key:key values mismatch:crypto/x509/x509_cmp.c:405:\n"
2024-04-17T19:57:21.179581526Z 2024-04-17T19:57:21Z |  25 | ovs-monitor-ipsec | ERR | traceback
2024-04-17T19:57:21.179581526Z Traceback (most recent call last):
2024-04-17T19:57:21.179581526Z   File "/usr/share/openvswitch/scripts/ovs-monitor-ipsec", line 1382, in <module>
2024-04-17T19:57:21.179581526Z     main()
2024-04-17T19:57:21.179581526Z   File "/usr/share/openvswitch/scripts/ovs-monitor-ipsec", line 1369, in main
2024-04-17T19:57:21.179581526Z     monitor.run()
2024-04-17T19:57:21.179581526Z   File "/usr/share/openvswitch/scripts/ovs-monitor-ipsec", line 1176, in run
2024-04-17T19:57:21.179581526Z     if self.ike_helper.config_global(self):
2024-04-17T19:57:21.179581526Z   File "/usr/share/openvswitch/scripts/ovs-monitor-ipsec", line 521, in config_global
2024-04-17T19:57:21.179581526Z     self._nss_import_cert_and_key(cert, key, name)
2024-04-17T19:57:21.179581526Z   File "/usr/share/openvswitch/scripts/ovs-monitor-ipsec", line 809, in _nss_import_cert_and_key
2024-04-17T19:57:21.179581526Z     os.remove(path)
2024-04-17T19:57:21.179581526Z FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ovs_certkey_ef9cf1a5-bfb2-4876-8fb3-69c6b22561a2.p12'

Version-Release number of selected component (if applicable):

 4.16.0   

How reproducible:

Hit on the CI: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/50690/rehearse-50690-pull-ci-openshift-cluster-network-operator-master-e2e-ovn-ipsec-step-registry/1780660589492703232

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

openshift-install failed with error:

time="2024-04-17T19:34:47Z" level=error msg="Cluster initialization failed because one or more operators are not functioning properly.\nThe cluster should be accessible for troubleshooting as detailed in the documentation linked below,\nhttps://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html\nThe 'wait-for install-complete' subcommand can then be used to continue the installation"
time="2024-04-17T19:34:47Z" level=error msg="failed to initialize the cluster: Multiple errors are preventing progress:\n* Cluster operator authentication is degraded\n* Cluster operators monitoring, openshift-apiserver are not available"

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/50690/rehearse-50690-pull-ci-openshift-cluster-network-operator-master-e2e-ovn-ipsec-step-registry/1780660589492703232/artifacts/e2e-ovn-ipsec-step-registry/ipi-install-install/artifacts/.openshift_install-1713382487.log

Expected results:

The cluster must come up with all COs running and IPsec enabled for east-west traffic.

Additional info:

It seems like the ovn-ipsec-host pod's ovn-keys init container writes empty content into /etc/openvswitch/keys/ipsec-cert.pem even though the corresponding CSR contains the certificate in its status (a hedged check is sketched below).
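
A hedged check for that suspicion, using only the certificate path quoted in the log above (the node name is a placeholder):

~~~
# Confirm whether the written certificate file is empty on the affected node
oc debug node/<node> -- chroot /host sh -c \
  'stat -c %s /etc/openvswitch/keys/ipsec-cert.pem && openssl x509 -noout -subject -in /etc/openvswitch/keys/ipsec-cert.pem'
~~~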

Description of the problem:

When installing a spoke cluster earlier than 4.14 with a mirror registry config, assisted does not create the required ImageContentSourcePolicy needed to pull images from a custom registry.

How reproducible:

4/4

Steps to reproduce:

1. Install 4.12 spoke cluster with ACM 2.10 using a mirror registry config

Actual results:

Spoke installation fails because master can not pull images needed to run assisted-installer-controller

Expected results:

ICSP created and installation finishes successfully

Description of problem:


    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Description of problem:

We need to do a downstream sync with the upstream multus to support exposing the MTU in the network-status annotation.

The PR was merged u/s https://github.com/k8snetworkplumbingwg/multus-cni/pull/1250

Description of problem:

Recently we bumped the hyperkube image [1] to use both RHEL 9 builder and base images.

In order to keep things consistent, we tried to do the same with the "tests" image [2], however, that was not possible because there is currently no "tools" image on RHEL 9. The "tests" image uses "tools" as the base image.

As a result, we decided to keep builder & base images for "tests" in RHEL 8, as this work was not required for the kube 1.28 bump nor the FIPS issue we were addressing.

However, for the sake of consistency, eventually it'd be good to bump the "tests" builder image to RHEL 9. This would also require us to create a "tools" image based on RHEL 9.

[1] https://github.com/openshift/kubernetes/blob/6ab54b8d9a0ea02856efd3835b6f9df5da9ce115/openshift-hack/images/hyperkube/Dockerfile.rhel#L1

[2] https://github.com/openshift/kubernetes/blob/master/openshift-hack/images/tests/Dockerfile.rhel#L1
 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

The "tests" image is built and based on a RHEL 8 image.

Expected results:

The "tests" image is built and based on a RHEL 9 image.

Additional info:

 

This is a clone of issue OCPBUGS-36339. The following is the description of the original issue:

Description of problem:

The option "Auto deploy when new image is available" becomes unchecked when editing a deployment from the web console

Version-Release number of selected component (if applicable):

4.15.17

How reproducible:

100%

Steps to Reproduce:

1. Go to Workloads --> Deployments --> Edit Deployment --> under the Images section, tick the option "Auto deploy when new Image is available" and save the deployment.
2. Now edit the deployment again and observe that the option "Auto deploy when new Image is available" is unchecked.
3. The same test works fine on a 4.14 cluster (a hedged check of the underlying annotation is sketched below).
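
A hedged way to see what the form edit actually changes: the console stores this checkbox in the image trigger annotation on the Deployment (this mapping is an assumption about the console's behaviour, not something stated in the report):

~~~
# Compare the annotation before and after saving the edit form (deployment name is a placeholder)
oc get deployment <name> -o yaml | grep -A 2 'image.openshift.io/triggers'
~~~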
    

Actual results:

Option "Auto deploy when new Image is available" is in unchecked state.

Expected results:

Option "Auto deploy when new Image is available" remains in checked state.

Additional info:

    

Please review the following PR: https://github.com/openshift/machine-api-provider-nutanix/pull/58

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33636. The following is the description of the original issue:

    Updating the secrets using the Form editor displays an unknown warning message. This is caused by an incorrect request object being sent to the server in the edit Secret form.

Slack: https://redhat-internal.slack.com/archives/C6A3NV5J9/p1715693990795919?thread_ts=1715685364.476189&cid=C6A3NV5J9

 

 

Description of problem:

    

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

Always

Steps to Reproduce:

    1. Goto Edit Secret form editor
    2. Click Save 
    The warning notification is triggered because of the incorrect request object
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-37271. The following is the description of the original issue:

Description of problem:

In debugging recent cyclictest issues on OCP 4.16 (5.14.0-427.22.1.el9_4.x86_64+rt kernel), we have discovered that the "psi=1" kernel cmdline argument, which is now added by default due to cgroupsv2 being enabled, is causing latency issues (both cyclictest and timerlat are failing to meet the latency KPIs we commit to for Telco RAN DU deployments). See RHEL-42737 for reference.

Version-Release number of selected component (if applicable):

OCP 4.16

How reproducible:

Cyclictest and timerlat consistently fail on long duration runs (e.g. 12 hours).

Steps to Reproduce:

    1. Install OCP 4.16 and configure with the Telco RAN DU reference configuration.
    2. Run a long duration cyclictest or timerlat test    

Actual results:

Maximum latencies are detected above 20us.

Expected results:

All latencies are below 20us.

Additional info:

See RHEL-42737 for test results and debugging information. This was originally suspected to be an RHEL issue, but it turns out that PSI is being enabled by OpenShift code (which adds psi=1 to the kernel cmdline).

Description of problem:

Installing a cluster with Azure workload identity against a 4.16 nightly build failed because some cluster operators are degraded.
$ oc get co | grep -v "True        False         False"
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.16.0-0.nightly-2024-02-07-200316   False       False         True       153m    OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jima416a1.qe.azure.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jima416a1.qe.azure.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
console                                    4.16.0-0.nightly-2024-02-07-200316   False       True          True       141m    DeploymentAvailable: 0 replicas available for console deployment...
ingress                                                                         False       True          True       137m    The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)

Ingress LB public IP is pending to be created
$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.199.169   <pending>     80:32007/TCP,443:30229/TCP   154m
router-internal-default   ClusterIP      172.30.112.167   <none>        80/TCP,443/TCP,1936/TCP      154m


Detected that the CCM pods are in CrashLoopBackOff with an error
$ oc get pod -n openshift-cloud-controller-manager
NAME                                              READY   STATUS             RESTARTS         AGE
azure-cloud-controller-manager-555cf5579f-hz6gl   0/1     CrashLoopBackOff   21 (2m55s ago)   160m
azure-cloud-controller-manager-555cf5579f-xv2rn   0/1     CrashLoopBackOff   21 (15s ago)     160m

error in ccm pod:
I0208 04:40:57.141145       1 azure.go:931] Azure cloudprovider using try backoff: retries=6, exponent=1.500000, duration=6, jitter=1.000000
I0208 04:40:57.141193       1 azure_auth.go:86] azure: using workload identity extension to retrieve access token
I0208 04:40:57.141290       1 azure_diskclient.go:68] Azure DisksClient using API version: 2022-07-02
I0208 04:40:57.141380       1 azure_blobclient.go:73] Azure BlobClient using API version: 2021-09-01
F0208 04:40:57.141471       1 controllermanager.go:314] Cloud provider azure could not be initialized: could not init cloud provider azure: no token file specified. Check pod configuration or set TokenFilePath in the options
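
A hedged way to pull the crash details directly (the deployment name is taken from the pod names above):

~~~
oc -n openshift-cloud-controller-manager logs deploy/azure-cloud-controller-manager --previous | tail -n 20
~~~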

Version-Release number of selected component (if applicable):

4.16 nightly build    

How reproducible:

Always    

Steps to Reproduce:

    1. Install cluster with azure workload identity
    2.
    3.
    

Actual results:

    Installation failed due to some operators are degraded

Expected results:

    Installation is successful.

Additional info:

 

Description of problem:

See https://github.com/openshift/console/pull/14030/files/0eba7f7db6c35bbf7bca5e0b8eebd578e47b15cc#r1707020700

Description of problem:

 

In OCPBUGS-237 we discussed dropping our explicit setting of memory-trim-on-compaction once it became enabled by default.

On OCP with 4.15.0-0.nightly-2023-12-19-033450 we are at

Red Hat Enterprise Linux CoreOS 415.92.202312132107-0 (Plow)
openvswitch3.1-3.1.0-59.el9fdp.x86_64

This should have memory-trim-on-compaction enabled by default

v3.0.0 - 15 Aug 2022
--------------------
     * Returning unused memory to the OS after the database compaction is now
       enabled by default.  Use 'ovsdb-server/memory-trim-on-compaction off'
       unixctl command to disable.

https://github.com/openvswitch/ovs/commit/e773140ec3f6d296e4a3877d709fb26fb51bc6ee

If it is enabled by default, we should remove the enable loop.

      # set trim-on-compaction
      if ! retry 60 "trim-on-compaction" "ovn-appctl -t ${nbdb_ctl} --timeout=5 ovsdb-server/memory-trim-on-compaction on"; then
        exit 1
      fi

https://github.com/openshift/cluster-network-operator/blob/master/bindata/network/ovn-kubernetes/common/008-script-lib.yaml#L314
 

Version-Release number of selected component (if applicable):

  4.15.0-0.nightly-2023-12-19-033450

How reproducible:

 Always

Steps to Reproduce:

1. check if memory-trim-on-compaction is enabled by default in OVS

2. check the nbdb log files for the memory-trimming message (a hedged sketch of this check follows)
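
A hedged sketch of the log check in step 2 (the pod label and container name follow the usual ovn-kubernetes layout and are assumptions):

~~~
oc -n openshift-ovn-kubernetes logs -l app=ovnkube-node -c nbdb --tail=200 | grep 'memory trimming'
~~~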

Actual results:

 

2023-12-20T18:12:47.053444489Z 2023-12-20T18:12:47.053Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 3.1.2
2023-12-20T18:12:49.001580092Z 2023-12-20T18:12:49.001Z|00003|ovsdb_server|INFO|memory trimming after compaction enabled.

 

Expected results:

 memory-trim-on-compaction should be enabled by default; we don't need to re-enable it.

Affected Platforms:

All

Description of problem:

Pod capi-ibmcloud-controller-manager stuck in ContainerCreating on IBM cloud

Version-Release number of selected component (if applicable):

    

How reproducible:

Always    

Steps to Reproduce:

    1. Build a cluster on IBM Cloud and enable TechPreviewNoUpgrade
    2.
    3.
    

Actual results:

4.16 cluster
$ oc get po                        
NAME                                                READY   STATUS              RESTARTS      AGE
capi-controller-manager-6bccdc844-jsm4s             1/1     Running             9 (24m ago)   175m
capi-ibmcloud-controller-manager-75d55bfd7d-6qfxh   0/2     ContainerCreating   0             175m
cluster-capi-operator-768c6bd965-5tjl5              1/1     Running             0             3h

  Warning  FailedMount       5m15s (x87 over 166m)  kubelet            MountVolume.SetUp failed for volume "credentials" : secret "capi-ibmcloud-manager-bootstrap-credentials" not found

$ oc get clusterversion               
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-01-21-154905   True        False         156m    Cluster version is 4.16.0-0.nightly-2024-01-21-154905

4.15 cluster
$ oc get po                           
NAME                                                READY   STATUS              RESTARTS        AGE
capi-controller-manager-6b67f7cff4-vxtpg            1/1     Running             6 (9m51s ago)   35m
capi-ibmcloud-controller-manager-54887589c6-6plt2   0/2     ContainerCreating   0               35m
cluster-capi-operator-7b7f48d898-9r6nn              1/1     Running             1 (17m ago)     39m
$ oc get clusterversion           
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-01-22-160236   True        False         11m     Cluster version is 4.15.0-0.nightly-2024-01-22-160236        

Expected results:

No pod is in ContainerCreating status

Additional info:

must-gather: https://drive.google.com/file/d/1F5xUVtW-vGizAYgeys0V5MMjp03zkSEH/view?usp=sharing 

Description of problem:

InstallPlan fails with "updated validation is too restrictive" when:

* Previous CRs and CRDs exist, and 
* Multiple CRD versions are served (ex. v1alpha1 and v1alpha2) 

Version-Release number of selected component (if applicable):

This is reproducible on the OpenShift 4.15.3 rosa cluster, and not reproducible on 4.14.15 or 4.13.

How reproducible:

Always

Steps to Reproduce:

1.Create the following catalogsource and subscription
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: devworkspace-operator-catalog
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/devfile/devworkspace-operator-index:release
  publisher: Red Hat
  displayName: DevWorkspace Operator Catalog
  updateStrategy:
    registryPoll:
      interval: 5m
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  namespace: openshift-operators
  name: devworkspace-operator
spec:
  channel: fast
  installPlanApproval: Manual
  name: devworkspace-operator
  source: devworkspace-operator-catalog
  sourceNamespace: openshift-marketplace

2. Approve the installplan

3. Create a CR instance (DevWorkspace CR):
$ curl https://raw.githubusercontent.com/devfile/devworkspace-operator/main/samples/empty.yaml | kubectl apply -f - 

4. Delete the subscription and csv
$ oc project openshift-operators
$ oc delete sub devworkspace-operator
$ oc get csv
$ oc delete csv devworkspace-operator.v0.26.0 

5. Create the subscription from step 1 again, and approve the installplan

6. View the "updated validation is too restrictive" error in the installplan's status.conditions:
---
error validating existing CRs against new CRD's schema for "devworkspaces.workspace.devfile.io": error validating workspace.devfile.io/v1alpha1, Kind=DevWorkspace "openshift-operators/empty-devworkspace": updated validation is too restrictive: [].status.workspaceId: Required value
---    

Actual results:

InstallPlan fails and the operator is not installed

 

Expected results:

InstallPlan succeeds

Additional info:

For this specific scenario, a workaround is to temporarily un-serve the v1alpha1 version before approving the installplan:


$ oc patch crd devworkspacetemplates.workspace.devfile.io --type='json' -p='[{"op": "replace", "path": "/spec/versions/0/served", "value": false}]'
$ oc patch crd devworkspaces.workspace.devfile.io --type='json' -p='[{"op": "replace", "path": "/spec/versions/0/served", "value": false}]'

Another workaround is to delete the existing CR before approving the new installplan.

This is a clone of issue OCPBUGS-25929. The following is the description of the original issue:

Description of problem:

In the Quick Start guided tour, the user needs to click the "Next" button twice to move forward to the next step. If you skip the alert (Yes/No input) and click the "Next" button, it does not respond.

Steps to Reproduce

  1. Open any Quick Start from quick start catalog page
  2. Click Start
  3. Click "yes" for the "Check your work Alert" on the first step
  4. Click Next button to go to second step
  5. Skip the "Check your work Alert" and click Next Button

Actual results:

The Next button doesn't respond to the first click

Expected results:

The Next button should navigate to the next step whether or not the user has answered the alert message.

Reproducibility (Always/Intermittent/Only Once): Always

Please review the following PR: https://github.com/openshift/etcd/pull/236

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

The number of control plane replicas defined in install-config.yaml (or agent-cluster-install.yaml) should be validated to check its set to 3, or 1 in the case of SNO. If set to another value the "create image" command should fail.

We recently had a case where the number of replicas was set to 2 and the installation failed. It would be good to catch this misconfiguration prior to the install.
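
A hedged sketch of the value being validated, using the standard install-config.yaml schema (the replica count shown is only an example):

~~~
# controlPlane.replicas must be 3, or 1 for SNO; anything else should fail "create image"
grep -A 3 '^controlPlane:' install-config.yaml
~~~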

Description of the problem:

Per the latest decision, RH is not going to support installing an OCP cluster on Nutanix with nested virtualization. Thus the "Install OpenShift Virtualization" check box on the "Operators" page should be disabled when the "Nutanix" platform is selected on the "Cluster Details" page.

Slack discussion thread

https://redhat-internal.slack.com/archives/C0211848DBN/p1706640683120159

Nutanix
https://portal.nutanix.com/page/documents/kbs/details?targetId=kA00e000000XeiHCAS

 

Tracker issue for bootimage bump in 4.16. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-33531.

Description of problem:


When using Secure Boot, TuneD reports the following error because debugfs access is restricted:

tuned.utils.commands: Writing to file '/sys/kernel/debug/sched/migration_cost_ns' error: '[Errno 1] Operation not permitted: '/sys/kernel/debug/sched/migration_cost_ns''
tuned.plugins.plugin_scheduler: Error writing value '5000000' to 'migration_cost_ns'

This issue has been reported with the following tickets:

As this is a confirmed limitation of the NTO due to the TuneD component, we should document this as a limitation in the OpenShift Docs:
https://docs.openshift.com/container-platform/4.16/nodes/nodes/nodes-node-tuning-operator.html

Expected Outcome:

  • Document that the NTO cannot leverage some TuneD features when Secure Boot is enabled.

Description of problem:

    If a cluster is installed using a proxy and the username used for connecting to the proxy contains the characters "%40" (encoding a "@" in case a domain is provided), the installation fails. The failure happens because the proxy variables implemented in the file "/etc/systemd/system.conf.d/10-default-env.conf" on the bootstrap node are ignored by systemd. This issue seems to have already been fixed in MCO (BZ 1882674, fixed in RHOCP 4.7), but it looks like it is affecting the bootstrap process in 4.13 and 4.14, causing the installation to not start at all.

Version-Release number of selected component (if applicable):

    4.14, 4.13

How reproducible:

    100% always

Steps to Reproduce:

    1. create an install-config.yaml file with "%40" in the middle of the username used for the proxy.
    2. start the cluster installation.
    3. the bootstrap will fail because the proxy variables are not used.
    

Actual results:

Installation fails because systemd fails to load the proxy variables if "%" is present in the username.

Expected results:

    Installation to succeed using a username with "%40" for the proxy. 

Additional info:

The file "/etc/systemd/system.conf.d/10-default-env.conf" for the bootstrap should be generated in a way accepted by systemd.
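As a hedged illustration, the proxy stanza in install-config.yaml would look roughly like this (placeholder values; the username contains "%40" as the URL-encoded "@" of a domain-qualified user):

proxy:
  httpProxy: http://user%40example.com:password@proxy.example.com:3128
  httpsProxy: http://user%40example.com:password@proxy.example.com:3128
  noProxy: .cluster.local,.svc

A plausible fix, by analogy with the earlier MCO fix (BZ 1882674) and assuming the same root cause, would be to escape the literal "%" as "%%" when rendering the systemd drop-in, since systemd interprets "%" as a specifier prefix.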

Description of problem:

Following https://issues.redhat.com/browse/CNV-28040
On CNV, when a virtual machine with secondary interfaces connected via the bridge CNI is live migrated, we observe disruption of the VM's inbound traffic.

The root cause is that the migration target's bridge interface advertises itself before the migration is completed.

When the migration destination pod is created, an IPv6 NS (Neighbor Solicitation)
and NA (Neighbor Advertisement) are sent automatically by the kernel.
The forwarding tables of the switches at the endpoints (e.g. the migration
destination node) get updated, and traffic is forwarded to the migration
destination before the migration is completed [1].

The solution is to have the bridge CNI create the pod interface in the "link-down" state [2] so that the IPv6 NS/NA packets are avoided; CNV, in turn, sets the pod interface to "link-up" [3].

CNV depends on a bridge CNI with the [2] bits, which is deployed by the cluster-network-operator.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2186372#c6
[2] https://github.com/containernetworking/plugins/pull/997
[3] https://github.com/kubevirt/kubevirt/pull/11069

Version-Release number of selected component (if applicable):

4.16.0

How reproducible:

100%

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

CNO deploys the bridge CNI without an option to set the bridge interface down.

Expected results:

CNO should deploy the bridge CNI with the [1] changes, from the release-4.16 branch.

[1] https://github.com/containernetworking/plugins/pull/997

Additional info:

More https://issues.redhat.com/browse/CNV-28040    

 
 
 
 

 

Description of problem:

Files like ovs-if-br-ex.nmconnection.J1K8B2 break ovs-configuration.service. Deleting the file fixes the issue.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

During the live OVN migration, the network operator shows the error message: Not applying unsafe configuration change: invalid configuration: [cannot change default network type when not doing migration]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.

Version-Release number of selected component (if applicable):

    

How reproducible:

Always

Steps to Reproduce:

1. Create 4.15 nightly SDN ROSA cluster
2. oc delete validatingwebhookconfigurations.admissionregistration.k8s.io/sre-techpreviewnoupgrade-validation
3. oc edit featuregate cluster to enable featuregates 
4. Wait for all node rebooting and back to normal
5. oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'

Actual results:

[weliang@weliang ~]$ oc delete validatingwebhookconfigurations.admissionregistration.k8s.io/sre-techpreviewnoupgrade-validation
[weliang@weliang ~]$ oc edit featuregate cluster
[weliang@weliang ~]$ oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
network.config.openshift.io/cluster patched
[weliang@weliang ~]$ oc get co network
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
network   4.15.0-0.nightly-2023-12-18-220750   True        False         True       105m    Not applying unsafe configuration change: invalid configuration: [cannot change default network type when not doing migration]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
[weliang@weliang ~]$ oc describe Network.config.openshift.io cluster
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  network.openshift.io/network-type-migration: 
API Version:  config.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2023-12-20T15:13:39Z
  Generation:          3
  Resource Version:    119899
  UID:                 6a621b88-ac4f-4918-a7f6-98dba7df222c
Spec:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  External IP:
    Policy:
  Network Type:  OVNKubernetes
  Service Network:
    172.30.0.0/16
Status:
  Cluster Network:
    Cidr:               10.128.0.0/14
    Host Prefix:        23
  Cluster Network MTU:  8951
  Network Type:         OpenShiftSDN
  Service Network:
    172.30.0.0/16
Events:  <none>
[weliang@weliang ~]$ oc describe Network.operator.openshift.io cluster
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  operator.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2023-12-20T15:15:37Z
  Generation:          275
  Resource Version:    120026
  UID:                 278bd491-ac88-4038-887f-d1defc450740
Spec:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  Default Network:
    Openshift SDN Config:
      Enable Unidling:          true
      Mode:                     NetworkPolicy
      Mtu:                      8951
      Vxlan Port:               4789
    Type:                       OVNKubernetes
  Deploy Kube Proxy:            false
  Disable Multi Network:        false
  Disable Network Diagnostics:  false
  Kube Proxy Config:
    Bind Address:      0.0.0.0
  Log Level:           Normal
  Management State:    Managed
  Observed Config:     <nil>
  Operator Log Level:  Normal
  Service Network:
    172.30.0.0/16
  Unsupported Config Overrides:  <nil>
  Use Multi Network Policy:      false
Status:
  Conditions:
    Last Transition Time:  2023-12-20T15:15:37Z
    Status:                False
    Type:                  ManagementStateDegraded
    Last Transition Time:  2023-12-20T16:58:58Z
    Message:               Not applying unsafe configuration change: invalid configuration: [cannot change default network type when not doing migration]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
    Reason:                InvalidOperatorConfig
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2023-12-20T15:15:37Z
    Status:                True
    Type:                  Upgradeable
    Last Transition Time:  2023-12-20T16:52:11Z
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2023-12-20T15:15:45Z
    Status:                True
    Type:                  Available
  Ready Replicas:          0
  Version:                 4.15.0-0.nightly-2023-12-18-220750
Events:                    <none>
[weliang@weliang ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2023-12-18-220750   True        False         84m     Error while reconciling 4.15.0-0.nightly-2023-12-18-220750: the cluster operator network is degraded
[weliang@weliang ~]$ 

    

Expected results:

Migration success

Additional info:

The same error message is seen on both ROSA and GCP clusters.

Description of problem

CI is flaky because the TestHostNetworkPort test fails:

=== NAME  TestAll/serial/TestHostNetworkPortBinding
    operator_test.go:1034: Expected conditions: map[Admitted:True Available:True DNSManaged:False DeploymentReplicasAllAvailable:True LoadBalancerManaged:False]
         Current conditions: map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:False DeploymentReplicasMinAvailable:True DeploymentRollingOut:True EvaluationConditionsDetected:False LoadBalancerManaged:False LoadBalancerProgressing:False Progressing:True Upgradeable:True]
    operator_test.go:1034: Ingress Controller openshift-ingress-operator/samehost status: {
          "availableReplicas": 0,
          "selector": "ingresscontroller.operator.openshift.io/deployment-ingresscontroller=samehost",
          "domain": "samehost.ci-op-xlwngvym-43abb.origin-ci-int-aws.dev.rhcloud.com",
          "endpointPublishingStrategy": {
            "type": "HostNetwork",
            "hostNetwork": {
              "protocol": "TCP",
              "httpPort": 9080,
              "httpsPort": 9443,
              "statsPort": 9936
            }
          },
          "conditions": [
            {
              "type": "Admitted",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "Valid"
            },
            {
              "type": "DeploymentAvailable",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "DeploymentAvailable",
              "message": "The deployment has Available status condition set to True"
            },
            {
              "type": "DeploymentReplicasMinAvailable",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "DeploymentMinimumReplicasMet",
              "message": "Minimum replicas requirement is met"
            },
            {
              "type": "DeploymentReplicasAllAvailable",
              "status": "False",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "DeploymentReplicasNotAvailable",
              "message": "0/1 of replicas are available"
            },
            {
              "type": "DeploymentRollingOut",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "DeploymentRollingOut",
              "message": "Waiting for router deployment rollout to finish: 0 of 1 updated replica(s) are available...\n"
            },
            {
              "type": "LoadBalancerManaged",
              "status": "False",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "EndpointPublishingStrategyExcludesManagedLoadBalancer",
              "message": "The configured endpoint publishing strategy does not include a managed load balancer"
            },
            {
              "type": "LoadBalancerProgressing",
              "status": "False",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "LoadBalancerNotProgressing",
              "message": "LoadBalancer is not progressing"
            },
            {
              "type": "DNSManaged",
              "status": "False",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "UnsupportedEndpointPublishingStrategy",
              "message": "The endpoint publishing strategy doesn't support DNS management."
            },
            {
              "type": "Available",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z"
            },
            {
              "type": "Progressing",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "IngressControllerProgressing",
              "message": "One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 0 of 1 updated replica(s) are available...\n)"
            },
            {
              "type": "Degraded",
              "status": "False",
              "lastTransitionTime": "2024-02-26T17:25:39Z"
            },
            {
              "type": "Upgradeable",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "Upgradeable",
              "message": "IngressController is upgradeable."
            },
            {
              "type": "EvaluationConditionsDetected",
              "status": "False",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "NoEvaluationCondition",
              "message": "No evaluation condition is detected."
            }
          ],
          "tlsProfile": {
            "ciphers": [
              "ECDHE-ECDSA-AES128-GCM-SHA256",
              "ECDHE-RSA-AES128-GCM-SHA256",
              "ECDHE-ECDSA-AES256-GCM-SHA384",
              "ECDHE-RSA-AES256-GCM-SHA384",
              "ECDHE-ECDSA-CHACHA20-POLY1305",
              "ECDHE-RSA-CHACHA20-POLY1305",
              "DHE-RSA-AES128-GCM-SHA256",
              "DHE-RSA-AES256-GCM-SHA384",
              "TLS_AES_128_GCM_SHA256",
              "TLS_AES_256_GCM_SHA384",
              "TLS_CHACHA20_POLY1305_SHA256"
            ],
            "minTLSVersion": "VersionTLS12"
          },
          "observedGeneration": 1
        }
    operator_test.go:1036: failed to observe expected conditions for the second ingresscontroller: timed out waiting for the condition
    operator_test.go:1059: deleted ingresscontroller samehost
    operator_test.go:1059: deleted ingresscontroller hostnetworkportbinding

This particular failure comes from https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1017/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1762147882179235840. Search.ci shows another failure: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/48873/rehearse-48873-pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-gatewayapi/1762576595890999296. The test has failed sporadically in the past, beyond what search.ci is able to search.

TestHostNetworkPort is marked as a serial test in TestAll and marked with t.Parallel() in the test itself. Not sure if this is what is causing a new failure seen in this test, but something is incorrect.

Version-Release number of selected component (if applicable)

The test failures have been observed recently on 4.16 as well as on 4.12 (https://github.com/openshift/cluster-ingress-operator/pull/828#issuecomment-1292888086) and 4.11 (https://github.com/openshift/cluster-ingress-operator/pull/914#issuecomment-1526808286). The logic error was introduced in 4.11 (https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a22322b25569059c61e1973f37f0a4b49e9407bc).

How reproducible

The logic error is self-evident. The test failure is very rare. The failure has been observed sporadically over the past couple years. Presently, search.ci shows two failures, with the following impact, for the past 14 days:

rehearse-48873-pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-gatewayapi (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator (all) - 16 runs, 25% failed, 25% of failures match = 6% impact

Steps to Reproduce

N/A.

Actual results

The TestHostNetworkPort test fails. The test is marked as both serial and parallel.

Expected results

Test should be marked as either serial or parallel, and it should pass consistently.

Additional info

When TestAll was introduced, TestHostNetworkPortBinding was initially marked parallel in https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a22322b25569059c61e1973f37f0a4b49e9407bc. After some discussion, it was moved to the serial list in https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a449e497e35fafeecbee9ea656e0631393182f70, but the commit to remove t.Parallel() evidently got inadvertently dropped.

Description of problem:

OCPBUGS-29424 revealed that setting the node status update frequency in kubelet (introduced with OCPBUGS-15583) causes a lot of control plane CPU usage.

The reason is that the increased frequency of kubelet node status updates triggers second-order effects in all control plane operators that usually react to node changes (API server, etcd, PDB guard pod controllers, or any other static-pod-based machinery).

Reverting the code in OCPBUGS-15583, or manually setting the report/status frequency to 0s causes the CPU to drop immediately. 

Version-Release number of selected component (if applicable):

Versions where OCPBUGS-15583 was backported. This includes 4.16, 4.15.0, 4.14.8, 4.13.33, and the next 4.12.z (likely 4.12.51).

How reproducible:

always    

Steps to Reproduce:

1. create a cluster that contains a fix for OCPBUGS-15583
2. observe the apiserver metrics (e.g. rate(apiserver_request_total[5m])); those should show abnormal values for pod/configmap GET
    alternatively, the rate of node updates is increased (rate(apiserver_request_total{resource="nodes", subresource="status", verb="PATCH"}[1m])) 

     

Actual results:

the node status updates every 10s, which causes high CPU usage on control plane operators and apiserver

Expected results:

the node status should not update that frequently, meaning the control plane CPU usage should go down again 

Additional info:

slack thread with the node team:
https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1708429189987849
    

Description of problem:

Check scripts for the on-premise keepalived static pods only check haproxy, which only directs to the kube-apiserver pods. They do not take into consideration whether the control plane node has a healthy machine-config-server.

This may be a problem because, in a failure scenario, it may be required to rebuild nodes, and the machine-config-server is required for that (so that ignition configs are served).

One example is the etcd restore procedure (https://docs.openshift.com/container-platform/4.12/backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html). In our case, the following happened (I'd suggest reading the recovery procedure before this sequence of events):
- The machine config server was healthy on the recovery control plane node but not on the other hosts.
- At this point, we can only guarantee the health of the recovery control plane node, because the non-recovery ones are to be replaced and must be removed first from the cluster (node objects deleted) so that the OVN-Kubernetes control plane can work properly.
- The keepalived check scripts were succeeding on the non-recovery control plane nodes because their haproxy pods were up and running. That is fine from the kube-apiserver point of view, but it does not take the machine config server into consideration.
- As the machine-config-server was not reachable, provisioning of the new masters required by the procedure was impossible.

In parallel to this bug, I'll be raising another bug to improve the restore procedure. Basically, asking to stop the keepalived static pods on the non-recovery control plane nodes. This would prevent the exact situation above.

However, there are other situations where machine-config-server pods may be unhealthy and we should not just be manually stopping keepalived. In such cases, keepalived should take machine-config-server into consideration.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Under some failure scenarios where the machine-config-server is not healthy on one control plane node.

Steps to Reproduce:

1. Try to provision new machine for recovery.
2.
3.

Actual results:

The machine-config-server is not serving because keepalived assigned the VIP to a node that doesn't have a working machine-config-server pod.

Expected results:

Keepalived to take machine-config-server health into consideration while doing failover.

Additional info:

Possible ideas to fix:
- Create a check script for the machine-config-server check. It may have a lower weight than the kube-apiserver ones.
- Include the machine-config-server endpoint in the haproxy of the kube-apiservers.

Please review the following PR: https://github.com/openshift/csi-external-provisioner/pull/84

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-43051. The following is the description of the original issue:

This is a clone of issue OCPBUGS-42783. The following is the description of the original issue:

Context
Some ROSA HCP users host their own container registries (e.g., self-hosted Quay servers) that are only accessible from inside of their VPCs. This is often achieved through the use of private DNS zones that resolve non-public domains like quay.mycompany.intranet to non-public IP addresses. The private registries at those addresses then present self-signed SSL certificates to the client that can be validated against the HCP's additional CA trust bundle.

Problem Description
A user of a ROSA HCP cluster with a configuration like the one described above is encountering errors when attempting to import a container image from their private registry into their HCP's internal registry via oc import-image. Originally, these errors showed up in openshift-apiserver logs as DNS resolution errors, i.e., OCPBUGS-36944. After the user upgraded their cluster to 4.14.37 (which fixes OCPBUGS-36944), openshift-apiserver was able to properly resolve the domain name but complained of HTTP 502 Bad Gateway errors. We suspect these 502 Bad Gateway errors are coming from the Konnectivity-agent while it proxies traffic between the control and data planes.

We've confirmed that the private registry is accessible from the HCP data plane (worker nodes) and that the certificate presented by the registry can be validated against the cluster's additional trust bundle. IOW, curl-ing the private registry from a worker node returns a HTTP 200 OK, but doing the same from a control plane node returns a HTTP 502. Notably, this cluster is not configured with a cluster-wide proxy, nor does the user's VPC feature a transparent proxy.

Version-Release number of selected component
OCP v4.14.37

How reproducible
Can be reliably reproduced, although the network config (see Context above) is quite specific

Steps to Reproduce

  1. Run the following command from the HCP data plane
    oc import-image imagegroup/imagename:v1.2.3 --from=quay.mycompany.intranet/imagegroup/imagename:v1.2.3 --confirm
    
  2. Observe the command output, the resulting ImageStream object, and openshift-apiserver logs

Actual Results

error: tag v1.2.3 failed: Internal error occurred: quay.mycompany.intranet/imagegroup/imagename:v1.2.3: Get "https://quay.mycompany.intranet/v2/": Bad Gateway
imagestream.image.openshift.io/imagename imported with errors

Name:            imagename
Namespace:        mynamespace
Created:        Less than a second ago
Labels:            <none>
Annotations:        openshift.io/image.dockerRepositoryCheck=2024-10-01T12:46:02Z
Image Repository:    default-route-openshift-image-registry.apps.rosa.clustername.abcd.p1.openshiftapps.com/mynamespace/imagename
Image Lookup:        local=false
Unique Images:        0
Tags:            1

v1.2.3
  tagged from quay.mycompany.intranet/imagegroup/imagename:v1.2.3

  ! error: Import failed (InternalError): Internal error occurred: quay.mycompany.intranet/imagegroup/imagename:v1.2.3: Get "https://quay.mycompany.intranet/v2/": Bad Gateway
      Less than a second ago

error: imported completed with errors

Expected Results
Desired container image is imported from private external image registry into cluster's internal image registry without error

Description of problem:

    There is no detailed failure information on signature verification when validating the signature of the target release payload fails during upgrade. It's unclear to the user which action could be taken for the failure, for example, checking whether a wrong configmap is set, the default store is unavailable, or there is an issue with a custom store.
 
# ./oc adm upgrade
Cluster version is 4.15.0-0.nightly-2023-12-08-202155
Upgradeable=False  

  Reason: FeatureGates_RestrictedFeatureGates_TechPreviewNoUpgrade
  Message: Cluster operator config-operator should not be upgraded between minor versions: FeatureGatesUpgradeable: "TechPreviewNoUpgrade" does not allow updates

ReleaseAccepted=False  
  Reason: RetrievePayload
  Message: Retrieving payload failed version="4.15.0-0.nightly-2023-12-09-012410" image="registry.ci.openshift.org/ocp/release@sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7" failure=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat

Upstream: https://amd64.ocp.releases.ci.openshift.org/graph
Channel: stable-4.15
Recommended updates:  
  VERSION                            IMAGE
  4.15.0-0.nightly-2023-12-09-012410 registry.ci.openshift.org/ocp/release@sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7
 
# ./oc -n openshift-cluster-version logs cluster-version-operator-6b7b5ff598-vxjrq|grep "verified"|tail -n4
I1211 09:28:22.755834       1 sync_worker.go:434] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat
I1211 09:28:22.755974       1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"openshift-cluster-version", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'RetrievePayloadFailed' Retrieving payload failed version="4.15.0-0.nightly-2023-12-09-012410" image="registry.ci.openshift.org/ocp/release@sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7" failure=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat
I1211 09:28:37.817102       1 sync_worker.go:434] loadUpdatedPayload syncPayload err=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat
I1211 09:28:37.817488       1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"openshift-cluster-version", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'RetrievePayloadFailed' Retrieving payload failed version="4.15.0-0.nightly-2023-12-09-012410" image="registry.ci.openshift.org/ocp/release@sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7" failure=The update cannot be verified: unable to verify sha256:0bc9978f420a152a171429086853e80f033e012e694f9a762eee777f5a7fb4f7 against keyrings: verifier-public-key-redhat

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2023-12-08-202155

How reproducible:

    always

Steps to Reproduce:

    1. trigger a fresh installation with TechPreview enabled (no spec.signatureStores property set by default)

    2. trigger an upgrade against a nightly build (no signature available in the default signature store)

    3.
    

Actual results:

    no detailed log about the signature verification failure

Expected results:

    include detailed failure information on signature verification in the CVO log

Additional info:

    https://github.com/openshift/cluster-version-operator/pull/1003
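For reference, a hedged sketch of the spec.signatureStores property mentioned in step 1 (a TechPreview field on the ClusterVersion resource; the URL is a placeholder and the exact field layout should be checked against the API reference):

apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
spec:
  signatureStores:
    - url: https://mirror.example.com/signatures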

Description of the problem:

When the function ops.GetEncapsulatedMC takes too long, the host stage 'Writing Image to Disk' may time out.
This may be caused by a timed-out connection to the API VIP when fetching the ignition.

How reproducible:

When the connection to the bootstrap API VIP times out

Steps to reproduce:

Only artificially

Actual results:

When the problem happens, the 'Writing image to disk' host stage times out.

Expected results:

If such a problem happens, the host stage shouldn't time out.

What

Add tests for the hardcoded authorizer.

Why

This feature is specific to OpenShift and not part of the upstream project. Therefore it would be good to have an actual E2E test protecting this feature from being broken by an upstream bump.

This is a clone of issue OCPBUGS-42605. The following is the description of the original issue:

Description of problem:

We are in a live migration scenario.

If a project has a networkpolicy to allow from the host network (more concretely, to allow from the ingress controllers and the ingress controllers are in the host network), traffic doesn't work during the live migration between any ingress controller node (either migrated or not migrated) and an already migrated application node.

I'll expand later in the description and internal comments, but the TL;DR is that the IPs of tun0 on not-yet-migrated source nodes and the IPs of ovn-k8s-mp0 on already-migrated source nodes are not added to the address sets related to the networkpolicy ACL on the target OVN-Kubernetes node, so that traffic is not allowed.

Version-Release number of selected component (if applicable):

4.16.13

How reproducible:

Always

Steps to Reproduce:

1. Before the migration: have a project with a networkpolicy that allows traffic from the ingress controller, with the ingress controller on the host network. Everything must work properly at this point.

2. Start the migration

3. During the migration, check connectivity from the host network of either a migrated node or a non-migrated node. Both will fail (checking from the same node doesn't fail)

Actual results:

The pod on the worker node is not reachable from the host network of the ingress controller node (unless the pod is on the same node as the ingress controller), which causes the ingress controller routes to throw 503 errors.

Expected results:

The pod on the worker node should be reachable from the ingress controller node, even when the ingress controller node has not migrated yet and the application node has.

Additional info:

This is not a duplicate of OCPBUGS-42578. This bug refers to the host-to-pod communication path while the other one doesn't.

This is a customer issue. More details to be included in private comments for privacy.

Workaround: Creating a networkpolicy that explicitly allows traffic from tun0 and ovn-k8s-mp0 interfaces. However, note that the workaround can be problematic for clusters with hundreds or thousands of projects. Another possible workaround is to temporarily delete all the networkpolicies of the projects. But again, this may be problematic (and a security risk).
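A minimal sketch of the first workaround policy, assuming the default clusterNetwork CIDR of 10.128.0.0/14 (the tun0 and ovn-k8s-mp0 addresses are taken from each node's host subnet inside that range, so the CIDR must match the actual cluster; note that it also allows other pod traffic from that range):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-node-host-interfaces
spec:
  podSelector: {}
  ingress:
    - from:
        - ipBlock:
            cidr: 10.128.0.0/14   # covers the per-node tun0 / ovn-k8s-mp0 addresses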

Description of problem:

    The node-network-identity deployment should conform to hypershift control plane expectations that all applicable containers should have a liveness probe, and a readiness probe if it is an endpoint for a service.
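A hedged illustration of what conforming probes could look like on the container spec; the container name, port, paths, and timings below are placeholders rather than the actual node-network-identity endpoints:

containers:
  - name: network-node-identity   # placeholder container name
    livenessProbe:
      httpGet:
        path: /healthz            # placeholder path
        port: 8443                # placeholder port
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /readyz             # placeholder path
        port: 8443
        scheme: HTTPS
      periodSeconds: 10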

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    No liveness or readiness probes

Expected results:

 

Additional info:

    

Please review the following PR: https://github.com/openshift/ironic-rhcos-downloader/pull/95

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Component Readiness has found a potential regression in [sig-arch][Late] operators should not create watch channels very often [apigroup:apiserver.openshift.io] [Suite:openshift/conformance/parallel].

Probability of significant regression: 98.46%

Sample (being evaluated) Release: 4.15
Start Time: 2023-12-29T00:00:00Z
End Time: 2024-01-04T23:59:59Z
Success Rate: 83.33%
Successes: 15
Failures: 3
Flakes: 0

Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 98.36%
Successes: 120
Failures: 2
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=Other&component=Unknown&confidence=95&environment=sdn%20no-upgrade%20amd64%20metal-ipi%20serial&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=sdn&network=sdn&pity=5&platform=metal-ipi&platform=metal-ipi&sampleEndTime=2024-01-04%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2023-12-29%2000%3A00%3A00&testId=openshift-tests%3A9ff4e9b171ea809e0d6faf721b2fe737&testName=%5Bsig-arch%5D%5BLate%5D%20operators%20should%20not%20create%20watch%20channels%20very%20often%20%5Bapigroup%3Aapiserver.openshift.io%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D&upgrade=no-upgrade&upgrade=no-upgrade&variant=serial&variant=serial

Description of problem:

[inner hamzy@li-3d08e84c-2e1c-11b2-a85c-e2db7bb078fc hamzy-release]$ oc get co/image-registry
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
image-registry             False       True          True       50m     Available: The deployment does not exist...
[inner hamzy@li-3d08e84c-2e1c-11b2-a85c-e2db7bb078fc hamzy-release]$ oc describe co/image-registry
...
    Message:               Progressing: Unable to apply resources: unable to sync storage configuration: cos region corresponding to a powervs region wdc not found
...

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-ppc64le-2024-01-10-083055

How reproducible:

Always

Steps to Reproduce:

    1. Deploy a PowerVS cluster in wdc06 zone

Actual results:

See above error message

Expected results:

Cluster deploys

Description of the problem:
The event showing how often a host has been rebooted is now shown only for some of the nodes.

11/22/2023, 12:15:10 AM Node test-infra-cluster-57eb6989-master-1 has been rebooted 2 times before completing installation
11/22/2023, 12:00:01 AM Host: test-infra-cluster-57eb6989-master-1, reached installation stage Rebooting
11/21/2023, 11:53:14 PM Host: test-infra-cluster-57eb6989-worker-0, reached installation stage Rebooting
11/21/2023, 11:53:13 PM Host: test-infra-cluster-57eb6989-worker-1, reached installation stage Rebooting
11/21/2023, 11:34:56 PM Host: test-infra-cluster-57eb6989-master-0, reached installation stage Rebooting
11/21/2023, 11:34:26 PM Host: test-infra-cluster-57eb6989-master-2, reached installation stage Rebooting

in this cluster 4 events are missing

11/21/2023, 3:49:34 PM Node test-infra-cluster-164a0f73-master-0 has been rebooted 2 times before completing installation
11/21/2023, 3:49:32 PM Node test-infra-cluster-164a0f73-worker-0 has been rebooted 2 times before completing installation
11/21/2023, 3:49:32 PM Node test-infra-cluster-164a0f73-worker-1 has been rebooted 2 times before completing installation
11/21/2023, 3:37:15 PM Host: test-infra-cluster-164a0f73-master-0, reached installation stage Rebooting
11/21/2023, 3:27:34 PM Host: test-infra-cluster-164a0f73-worker-0, reached installation stage Rebooting
11/21/2023, 3:27:30 PM Host: test-infra-cluster-164a0f73-worker-1, reached installation stage Rebooting
11/21/2023, 3:09:40 PM Host: test-infra-cluster-164a0f73-master-2, reached installation stage Rebooting
11/21/2023, 3:09:35 PM Host: test-infra-cluster-164a0f73-master-1, reached installation stage Rebooting

in this cluster 2 events are missing

How reproducible:

 

Steps to reproduce:

1. create cluster

2. start installation

3.

Actual results:
some of the events indicating how often a host was rebooted are missing
 

Expected results:
for each host there should be an indication event

Description of problem:

  • Not able to configure the `globalMaxSnapshotsPerBlockVolume`

-  As per the official doc this feature is configurable:
https://docs.openshift.com/container-platform/4.16/storage/container_storage_interface/persistent-storage-csi-vsphere.html#vsphere-change-max-snapshot_persistent-storage-csi-vsphere

  • Please see the below testing result.
    Before Patch:
    
    $ ./oc -n openshift-cluster-csi-drivers get cm/vsphere-csi-config -o yaml
    apiVersion: v1
    data:
      cloud.conf: |
        # Labels with topology values are added dynamically via operator
        [Global]
        cluster-id = ci-ln-1pd7szb-c1627-cbtd8
    
        [VirtualCenter "vcenter-1.ci.ibmc.devcluster.openshift.com"]
        insecure-flag           = true
        datacenters             = cidatacenter-2
        migration-datastore-url = ds:///vmfs/volumes/vsan:52eb63e99ce26f5b-b5ba4b2484169430/
    kind: ConfigMap
    metadata:
      creationTimestamp: "2024-07-12T04:54:31Z"
      name: vsphere-csi-config
      namespace: openshift-cluster-csi-drivers
      resourceVersion: "8172"
      uid: b1a4cf21-8416-4dc2-a3b5-2abe887dbe4f
    
    Patch command:
    
    $ ./oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"globalMaxSnapshotsPerBlockVolume": 10}}}}'
    Warning: unknown field "spec.driverConfig.vSphere.globalMaxSnapshotsPerBlockVolume"
    clustercsidriver.operator.openshift.io/csi.vsphere.vmware.com patched
    
    $ ./oc -n openshift-cluster-csi-drivers get cm/vsphere-csi-config -o yaml
    apiVersion: v1
    data:
      cloud.conf: |
        # Labels with topology values are added dynamically via operator
        [Global]
        cluster-id = ci-ln-1pd7szb-c1627-cbtd8
    
        [VirtualCenter "vcenter-1.ci.ibmc.devcluster.openshift.com"]
        insecure-flag           = true
        datacenters             = cidatacenter-2
        migration-datastore-url = ds:///vmfs/volumes/vsan:52eb63e99ce26f5b-b5ba4b2484169430/
    kind: ConfigMap
    metadata:
      creationTimestamp: "2024-07-12T04:54:31Z"
      name: vsphere-csi-config
      namespace: openshift-cluster-csi-drivers
      resourceVersion: "8172"
      uid: b1a4cf21-8416-4dc2-a3b5-2abe887dbe4f
    
    $ ./oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.16.2    True        False         11m     Cluster version is 4.16.2 
  • Got the same result with OCP v4.16.1

Also, can you confirm whether this is a TP or GA feature?
As per https://github.com/openshift/enhancements/blob/master/enhancements/storage/vsphere-driver-configuration.md?plain=1#L154
I see it is still in the enhancement and in TP:
~~~
Graduation Criteria

| OpenShift | Maturity     |
|-----------|--------------|
| 4.16      | Tech Preview |
| 4.17      | GA           |
~~~

https://redhat-internal.slack.com/archives/CBQHQFU0N/p1720768379160209

This is a clone of issue OCPBUGS-35905. The following is the description of the original issue:

Description of problem:

The builds installed in the hosted clusters are having issues git-cloning repositories from external URLs whose CAs are configured in the ca-bundle.crt from the trustedCA section:

 spec:
    configuration:
      apiServer:
       [...]
      proxy:
        trustedCA:
          name: user-ca-bundle <---

In traditional OCP implementations, the *-global-ca configmap is installed in the same namespace as the build and the ca-bundle.crt is injected into this configmap. In hosted clusters the configmap is being created empty: 

$ oc get cm -n <app-namespace> <build-name>-global-ca  -oyaml
apiVersion: v1
data:
  ca-bundle.crt: ""


As mentioned, the user-ca-bundle has the certificates configured:

$ oc get cm -n openshift-config user-ca-bundle -oyaml
apiVersion: v1
data:
  ca-bundle.crt: |
    -----BEGIN CERTIFICATE----- <---


 

Version-Release number of selected component (if applicable):

 

How reproducible:

Easily

Steps to Reproduce:

1. Install hosted cluster with trustedCA configmap
2. Run a build in the hosted cluster
3. Check the global-ca configmap

Actual results:

global-ca is empty

Expected results:

global-ca injects the ca-bundle.crt properly

Additional info:

 

Description of problem:

1. [sig-network][Feature:EgressFirewall] egressFirewall should have no impact outside its namespace [Suite:openshift/conformance/parallel] 
2. [sig-network][Feature:EgressFirewall] when using openshift ovn-kubernetes should ensure egressfirewall is created [Suite:openshift/conformance/parallel]

The issue arises during the execution of the above tests and appears to be related to the image in use, specifically, the image located at https://quay.io/repository/redhat-developer/nfs-server?tab=tags&tag=1.1 (quay.io/redhat-developer/nfs-server:1.1). 
This image does not include the 'ping' executable for the s390x architecture, leading to the following error in the prow job logs:
...
msg: "Error running /usr/bin/oc --namespace=e2e-test-no-egress-firewall-e2e-6mg9v --kubeconfig=/tmp/configfile3768380277 exec dummy -- ping -c 1 8.8.8.8:\nStdOut>\ntime=\"2023-10-11T19:04:52Z\" level=error msg=\"exec failed: unable to start container process: exec: \\\"ping\\\": executable file not found in $PATH\"\ncommand terminated with exit code 255\nStdErr>\ntime=\"2023-10-11T19:04:52Z\" level=error msg=\"exec failed: unable to start container process: exec: \\\"ping\\\": executable file not found in $PATH\"\ncommand terminated with exit code 255\nexit status 255\n"
...

Our suggested fix: build a new s390x image that contains the ping binary.

Version-Release number of selected component (if applicable):

 

How reproducible:

The issue is reproducible when the test container (quay.io/redhat-developer/nfs-server:1.1) is scheduled on an s390x node, leading to test failures.

Steps to Reproduce:

1. Have a multi-arch cluster (x86 + s390x day-2 worker node attached)
2. Execute the two tests
3. Try a few times until the pod is assigned to an s390x node

Actual results from prow job:

Run #0: Failed expand_less30s{  fail [github.com/openshift/origin/test/extended/networking/egress_firewall.go:70]: Unexpected error:
    <*fmt.wrapError | 0xc005924300>: 
    Error running /usr/bin/oc --namespace=e2e-test-no-egress-firewall-e2e-6r9zh --kubeconfig=/tmp/configfile3961753222 exec dummy -- ping -c 1 8.8.8.8:
    StdOut>
    time="2023-10-12T07:17:02Z" level=error msg="exec failed: unable to start container process: exec: \"ping\": executable file not found in $PATH"
    command terminated with exit code 255
    StdErr>
    time="2023-10-12T07:17:02Z" level=error msg="exec failed: unable to start container process: exec: \"ping\": executable file not found in $PATH"
    command terminated with exit code 255
    exit status 255
    
    {
        msg: "Error running /usr/bin/oc --namespace=e2e-test-no-egress-firewall-e2e-6r9zh --kubeconfig=/tmp/configfile3961753222 exec dummy -- ping -c 1 8.8.8.8:\nStdOut>\ntime=\"2023-10-12T07:17:02Z\" level=error msg=\"exec failed: unable to start container process: exec: \\\"ping\\\": executable file not found in $PATH\"\ncommand terminated with exit code 255\nStdErr>\ntime=\"2023-10-12T07:17:02Z\" level=error msg=\"exec failed: unable to start container process: exec: \\\"ping\\\": executable file not found in $PATH\"\ncommand terminated with exit code 255\nexit status 255\n",
        err: <*exec.ExitError | 0xc0059242e0>{
            ProcessState: {
                pid: 78611,
                status: 65280,
                rusage: {
                    Utime: {Sec: 0, Usec: 168910},
                    Stime: {Sec: 0, Usec: 60897},
                    Maxrss: 206428,
                    Ixrss: 0,
                    Idrss: 0,
                    Isrss: 0,
                    Minflt: 4199,
                    Majflt: 0,
                    Nswap: 0,
                    Inblock: 0,
                    Oublock: 0,
                    Msgsnd: 0,
                    Msgrcv: 0,
                    Nsignals: 0,
                    Nvcsw: 753,
                    Nivcsw: 149,
                },
            },
            Stderr: nil,
        },
    }
occurred
Ginkgo exit error 1: exit with code 1}

Expected results:

Passed

Additional info:

This issue pertains to a specific bug on the s390x architecture and additionally impacts the libvirt-s390x prow job.

Please review the following PR: https://github.com/openshift/images/pull/157

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Important: ART has recorded in their product data that bugs for
this component should be opened against Jira project "OCPBUGS" and
component "Networking / Router". This project or component does not exist. Jira
should either be updated to include this component or @release-artists should be
notified of the proper mapping in the #forum-ocp-art Slack channel.

Component name: openshift-enterprise-egress-router-container .
Jira mapping: https://github.com/openshift-eng/ocp-build-data/blob/main/product.yml

Please review the following PR: https://github.com/openshift/ovirt-csi-driver/pull/132

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

CI is flaky and causing issues for the in-cluster team.

Integration tests need a clean-up and to be made more robust.

Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/101

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33897. The following is the description of the original issue:

=Control Plane Upgrade=
...
Completion: 45% (Est Time Remaining: 35m)
                ^^^^^^^^^^^^^^^^^^^^^^^^^

Do not worry too much about the precision; we can make this more precise in the future. I am thinking of:
1. Assigning a fixed amount of time per remaining CO that does not have daemonsets
2. Assigning an amount of time proportional to the number of workers to each remaining CO that has daemonsets (network, dns)
3. Assigning a special amount of time proportional to the number of workers to the MCO

We can probably take into account the "how long are we upgrading this operator right now" exposed by CVO in OTA-1160

Description of problem:

In a 4.16.0-ec.1 cluster, scaling up a MachineSet with publicIP:true fails with:

$ oc -n openshift-machine-api get -o json machines.machine.openshift.io | jq -r '.items[] | select(.status.phase == "Failed") | .status.providerStatus.conditions[].message' | sort  | uniq -c
      1 googleapi: Error 403: Required 'compute.subnetworks.useExternalIp' permission for 'projects/openshift-gce-devel-ci-2/regions/us-central1/subnetworks/ci-ln-q4d8y8t-72292-msmgw-worker-subnet', forbidden

Version-Release number of selected component

Seen in 4.16.0-ec.1. Not noticed in 4.15.0-ec.3.  Fix likely needs a backport to 4.15 to catch up with OCPBUGS-26406.

How reproducible

Seen in the wild in a cluster after updating from 4.15.0-ec.3 to 4.16.0-ec.1. Reproduced in Cluster Bot on the first attempt, so likely very reproducible.

Steps to Reproduce

launch 4.16.0-ec.1 gcp Cluster Bot cluster (logs).

$ oc adm upgrade
Cluster version is 4.16.0-ec.1

Upstream: https://api.integration.openshift.com/api/upgrades_info/graph
Channel: candidate-4.16 (available channels: candidate-4.16)
No updates available. You may still upgrade to a specific release image with --to-image or wait for new updates to be available.
$ oc -n openshift-machine-api get machinesets
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
ci-ln-q4d8y8t-72292-msmgw-worker-a   1         1         1       1           60m
ci-ln-q4d8y8t-72292-msmgw-worker-b   1         1         1       1           60m
ci-ln-q4d8y8t-72292-msmgw-worker-c   1         1         1       1           60m
ci-ln-q4d8y8t-72292-msmgw-worker-f   0         0                             60m
$ oc -n openshift-machine-api get -o json machinesets | jq -c '.items[].spec.template.spec.providerSpec.value.networkInterfaces' | sort | uniq -c
      4 [{"network":"ci-ln-q4d8y8t-72292-msmgw-network","subnetwork":"ci-ln-q4d8y8t-72292-msmgw-worker-subnet"}]
$ oc -n openshift-machine-api edit machineset ci-ln-q4d8y8t-72292-msmgw-worker-f  # add publicIP
$ oc -n openshift-machine-api get -o json machineset ci-ln-q4d8y8t-72292-msmgw-worker-f | jq -c '.spec.template.spec.providerSpec.value.networkInterfaces'
[{"network":"ci-ln-q4d8y8t-72292-msmgw-network","publicIP":true,"subnetwork":"ci-ln-q4d8y8t-72292-msmgw-worker-subnet"}]
$ oc -n openshift-machine-api scale --replicas 1 machineset ci-ln-q4d8y8t-72292-msmgw-worker-f
$ sleep 300
$ oc -n openshift-machine-api get -o json machines.machine.openshift.io | jq -r '.items[] | select(.status.phase == "Failed") | .status.providerStatus.conditions[].message' | sort  | uniq -c

Actual results

      1 googleapi: Error 403: Required 'compute.subnetworks.useExternalIp' permission for 'projects/openshift-gce-devel-ci-2/regions/us-central1/subnetworks/ci-ln-q4d8y8t-72292-msmgw-worker-subnet', forbidden

Expected results

Successfully created machines.

Additional info

I would expect the CredentialsRequest to ask for this permission, but it doesn't seem to. The old roles/compute.admin includes it, and it probably just needs to be added explicitly. Not clear how many other permissions might also need explicit listing.

Description of problem:

When you delete a cluster, or just a BMH, before the installation starts (before the Assisted Service takes control), the metal3 operator tries to generate a PreprovisioningImage.

In previous versions, a fix was added so that during some early installation phases the creation of the PreprovisioningImage was not invoked:

https://github.com/openshift/baremetal-operator/pull/262/files#diff-a69d9029388ab766ed36b32180145f52785a9d4a153775510dbddfa928a72e1cR787

It was based on the status "StateDeleting".

Recently, a new status "StatePoweringOffBeforeDelete" was added:

https://github.com/openshift/baremetal-operator/commit/6f65d8e75ef6ed921863ebaf793cccda61de8bcb#diff-eeed3703d04e4c23a7d7af8cd0b7931b6b7990f23d826c49bdbc31c5f0a50291

but this status is not covered by the previous fix, and during this new phase the image creation should not be attempted.

The problem with trying to create the PreprovisioningImage when it should not is that it creates problems on ZTP, where the BMH and all the other objects are deleted at the same time, and the operator cannot create the image because the namespace is being deleted.

Version-Release number of selected component (if applicable):

4.14    

How reproducible:

    

Steps to Reproduce:

    1. Create a cluster
    2. Wait until the provisioning phase
    3. Delete the cluster
    4. The metal3 operator wrongly tries to create the PreprovisioningImage.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-39286. The following is the description of the original issue:

This is a clone of issue OCPBUGS-39285. The following is the description of the original issue:

Description of problem: https://github.com/openshift/installer/pull/7727 changed the order of some playbooks, and we're now expected to run the network.yaml playbook before the metadata.json file is created. This isn't a problem with newer versions of Ansible, which will happily ignore missing vars_files; however, it is a problem with older Ansible versions, which fail with:

[cloud-user@installer-host ~]$ ansible-playbook -i "/home/cloud-user/ostest/inventory.yaml" "/home/cloud-user/ostest/network.yaml"

PLAY [localhost] *****************************************************************************************************************************************************************************************************************************
ERROR! vars file metadata.json was not found                                                                                       
Could not find file on the Ansible Controller.                                                                                      
If you are using a module and expect the file to exist on the remote, see the remote_src option
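A minimal sketch of one way to tolerate the missing file on older Ansible (an assumption for illustration, not the installer's actual fix): load metadata.json through a guarded include_vars task instead of listing it under vars_files:

- hosts: localhost
  gather_facts: false
  tasks:
    - name: Check whether metadata.json exists yet
      stat:
        path: metadata.json
      register: metadata_file

    - name: Load metadata.json only when present
      include_vars:
        file: metadata.json
      when: metadata_file.stat.exists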

Description of problem:

We should be checking whether `currentVersion` and `desiredVersion` are empty.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

On an IPv6-primary dualstack cluster, creating an IPv6 egressIP following this procedure:

https://docs.openshift.com/container-platform/4.14/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.html

does not work. ovnkube-cluster-manager shows the error below:

2024-01-16T14:48:18.156140746Z I0116 14:48:18.156053       1 obj_retry.go:358] Adding new object: *v1.EgressIP egress-dualstack-ipv6
2024-01-16T14:48:18.161367817Z I0116 14:48:18.161269       1 obj_retry.go:370] Retry add failed for *v1.EgressIP egress-dualstack-ipv6, will try again later: cloud add request failed for CloudPrivateIPConfig: fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333, err: CloudPrivateIPConfig.cloud.network.openshift.io "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333" is invalid: [<nil>: Invalid value: "": "metadata.name" must validate at least one schema (anyOf), metadata.name: Invalid value: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333": metadata.name in body must be of type ipv4: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333"]
2024-01-16T14:48:18.161416023Z I0116 14:48:18.161357       1 event.go:298] Event(v1.ObjectReference{Kind:"EgressIP", Namespace:"", Name:"egress-dualstack-ipv6", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'CloudAssignmentFailed' egress IP: fd2e:6f44:5dd8:c956:f816:3eff:fef0:3333 for object EgressIP: egress-dualstack-ipv6 could not be created, err: CloudPrivateIPConfig.cloud.network.openshift.io "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333" is invalid: [<nil>: Invalid value: "": "metadata.name" must validate at least one schema (anyOf), metadata.name: Invalid value: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333": metadata.name in body must be of type ipv4: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333"]
2024-01-16T14:49:37.714410622Z I0116 14:49:37.714342       1 reflector.go:790] k8s.io/client-go/informers/factory.go:159: Watch close - *v1.Service total 8 items received
2024-01-16T14:49:48.155826915Z I0116 14:49:48.155330       1 obj_retry.go:296] Retry object setup: *v1.EgressIP egress-dualstack-ipv6
2024-01-16T14:49:48.156172766Z I0116 14:49:48.155899       1 obj_retry.go:358] Adding new object: *v1.EgressIP egress-dualstack-ipv6
2024-01-16T14:49:48.168795734Z I0116 14:49:48.168520       1 obj_retry.go:370] Retry add failed for *v1.EgressIP egress-dualstack-ipv6, will try again later: cloud add request failed for CloudPrivateIPConfig: fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333, err: CloudPrivateIPConfig.cloud.network.openshift.io "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333" is invalid: [<nil>: Invalid value: "": "metadata.name" must validate at least one schema (anyOf), metadata.name: Invalid value: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333": metadata.name in body must be of type ipv4: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333"]
2024-01-16T14:49:48.169400971Z I0116 14:49:48.168937       1 event.go:298] Event(v1.ObjectReference{Kind:"EgressIP", Namespace:"", Name:"egress-dualstack-ipv6", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'CloudAssignmentFailed' egress IP: fd2e:6f44:5dd8:c956:f816:3eff:fef0:3333 for object EgressIP: egress-dualstack-ipv6 could not be created, err: CloudPrivateIPConfig.cloud.network.openshift.io "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333" is invalid: [<nil>: Invalid value: "": "metadata.name" must validate at least one schema (anyOf), metadata.name: Invalid value: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333": metadata.name in body must be of type ipv4: "fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333"]

The same is observed with an IPv6 subnet in SLAAC mode.
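
The failure mode is visible in the error itself: the IPv6 egress IP is turned into a CloudPrivateIPConfig object name by replacing its colons with dots, and the CRD's name schema only accepts an IPv4-shaped value. A small illustrative sketch (not the controller's actual code) of that transformation and why the validation rejects it:

package main

import (
    "fmt"
    "net"
    "strings"
)

// cloudPrivateIPConfigName mirrors what the log shows: the egress IP's
// colons are replaced with dots to form a Kubernetes object name.
func cloudPrivateIPConfigName(ip string) string {
    return strings.ReplaceAll(ip, ":", ".")
}

func main() {
    name := cloudPrivateIPConfigName("fd2e:6f44:5dd8:c956:f816:3eff:fef0:3333")
    fmt.Println(name) // fd2e.6f44.5dd8.c956.f816.3eff.fef0.3333

    // A name schema that only accepts IPv4-shaped values rejects this,
    // which matches the "must be of type ipv4" error in the log.
    if ip := net.ParseIP(name); ip == nil || ip.To4() == nil {
        fmt.Println("not an IPv4-shaped name, so an ipv4-only schema rejects it")
    }
}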

Version-Release number of selected component (if applicable):

  • 4.15.0-0.nightly-2024-01-06-062415
  • RHOS-16.2-RHEL-8-20230510.n.1

How reproducible: Always.
Steps to Reproduce:

Applying below:

$ oc label node/ostest-8zrlf-worker-0-4h78l k8s.ovn.org/egress-assignable=""

$ cat egressip_ipv4.yaml && cat egressip_ipv6.yaml 
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egress-dualstack-ipv4
spec:
  egressIPs:
    - 192.168.192.111
  namespaceSelector:
    matchLabels: 
      app: egress
      
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egress-dualstack-ipv6
spec:
  egressIPs:
    - fd2e:6f44:5dd8:c956:f816:3eff:fef0:3333
  namespaceSelector:
    matchLabels: 
      app: egress

$ oc apply -f egressip_ipv4.yaml
$ oc apply -f egressip_ipv6.yaml

But it only shows info about the IPv4 egressIP. The IPv6 port is not even created in OpenStack:

oc logs -n openshift-cloud-network-config-controller cloud-network-config-controller-67cbc4bc84-786jm 
I0116 13:15:48.914323       1 controller.go:182] Assigning key: 192.168.192.111 to cloud-private-ip-config workqueue
I0116 13:15:48.928927       1 cloudprivateipconfig_controller.go:357] CloudPrivateIPConfig: "192.168.192.111" will be added to node: "ostest-8zrlf-worker-0-4h78l"
I0116 13:15:48.942260       1 cloudprivateipconfig_controller.go:381] Adding finalizer to CloudPrivateIPConfig: "192.168.192.111"
I0116 13:15:48.943718       1 controller.go:182] Assigning key: 192.168.192.111 to cloud-private-ip-config workqueue
I0116 13:15:49.758484       1 openstack.go:760] Getting port lock for portID 8854b2e9-3139-49d2-82dd-ee576b0a0cce and IP 192.168.192.111
I0116 13:15:50.547268       1 cloudprivateipconfig_controller.go:439] Added IP address to node: "ostest-8zrlf-worker-0-4h78l" for CloudPrivateIPConfig: "192.168.192.111"
I0116 13:15:50.602277       1 controller.go:160] Dropping key '192.168.192.111' from the cloud-private-ip-config workqueue
I0116 13:15:50.614413       1 controller.go:160] Dropping key '192.168.192.111' from the cloud-private-ip-config workqueue

$ openstack port list --network network-dualstack | grep -e 192.168.192.111 -e 6f44:5dd8:c956:f816:3eff:fef0:3333
| 30fe8d9a-c1c6-46c3-a873-9a02e1943cb7 | egressip-192.168.192.111      | fa:16:3e:3c:23:2a | ip_address='192.168.192.111', subnet_id='ae8a4c1f-d3e4-4ea2-bc14-ef1f6f5d0bbe'                         | DOWN   |

Actual results: ipv6 egressIP object is ignored.
Expected results: ipv6 egressIP is created and can be attached to a pod.
Additional info: must-gather linked in private comment.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-31446. The following is the description of the original issue:

Description of problem:

    ImageStreams on hosted clusters pointing to images on private registries fail TLS verification although the registry is correctly trusted.

example:
$ oc create namespace e2e-test

$ oc --namespace=e2e-test tag virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ busybox:latest

$ oc --namespace=e2e-test  set image-lookup busybox

stirabos@t14s:~$ oc get imagestream -n e2e-test 
NAME      IMAGE REPOSITORY                                                    TAGS     UPDATED
busybox   image-registry.openshift-image-registry.svc:5000/e2e-test/busybox   latest   
stirabos@t14s:~$ oc get imagestream -n e2e-test busybox -o yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    openshift.io/image.dockerRepositoryCheck: "2024-03-27T12:43:56Z"
  creationTimestamp: "2024-03-27T12:43:56Z"
  generation: 3
  name: busybox
  namespace: e2e-test
  resourceVersion: "49021"
  uid: 847281e7-e307-4057-ab57-ccb7bfc49327
spec:
  lookupPolicy:
    local: true
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ
    generation: 2
    importPolicy:
      importMode: Legacy
    name: latest
    referencePolicy:
      type: Source
status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/e2e-test/busybox
  tags:
  - conditions:
    - generation: 2
      lastTransitionTime: "2024-03-27T12:43:56Z"
      message: 'Internal error occurred: virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ:
        Get "https://virthost.ostest.test.metalkube.org:5000/v2/": tls: failed to
        verify certificate: x509: certificate signed by unknown authority'
      reason: InternalError
      status: "False"
      type: ImportSuccess
    items: null
    tag: latest

Meanwhile the image virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ can be consumed without issue when used directly as a container image in a pod on the same cluster.

user-ca-bundle config map is properly propagated from hypershift:

$ oc get configmap -n openshift-config user-ca-bundle
NAME             DATA   AGE
user-ca-bundle   1      3h32m

$ openssl x509 -text -noout -in <(oc get cm -n openshift-config user-ca-bundle -o json | jq -r '.data["ca-bundle.crt"]')
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            11:3f:15:23:97:ac:c2:d5:f6:54:06:1a:9a:22:f2:b5:bf:0c:5a:00
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, ST = NC, L = Raleigh, O = Test Company, OU = Testing, CN = test.metalkube.org
        Validity
            Not Before: Mar 27 08:28:07 2024 GMT
            Not After : Mar 27 08:28:07 2025 GMT
        Subject: C = US, ST = NC, L = Raleigh, O = Test Company, OU = Testing, CN = test.metalkube.org
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:c1:49:1f:18:d2:12:49:da:76:05:36:3e:6b:1a:
                    82:a7:22:0d:be:f5:66:dc:97:44:c7:ca:31:4d:f3:
                    7f:0a:d3:de:df:f2:b6:23:f9:09:b1:7a:3f:19:cc:
                    22:c9:70:90:30:a7:eb:49:28:b6:d1:e0:5a:14:42:
                    02:93:c4:ac:cc:da:b1:5a:8f:9c:af:60:19:1a:e3:
                    b1:34:c2:b6:2f:78:ec:9f:fe:38:75:91:0f:a6:09:
                    78:28:36:9e:ab:1c:0d:22:74:d5:52:fe:0a:fc:db:
                    5a:7c:30:9d:84:7d:f7:6a:46:fe:c5:6f:50:86:98:
                    cc:35:1f:6c:b0:e6:21:fc:a5:87:da:81:2c:7b:e4:
                    4e:20:bb:35:cc:6c:81:db:b3:95:51:cf:ff:9f:ed:
                    00:78:28:1d:cd:41:1d:03:45:26:45:d4:36:98:bd:
                    bf:5c:78:0f:c7:23:5c:44:5d:a6:ae:85:2b:99:25:
                    ae:c0:73:b1:d2:87:64:3e:15:31:8e:63:dc:be:5c:
                    ed:e3:fe:97:29:10:fb:5c:43:2f:3a:c2:e4:1a:af:
                    80:18:55:bc:40:0f:12:26:6b:f9:41:da:e2:a4:6b:
                    fd:66:ae:bc:9c:e8:2a:5a:3b:e7:2b:fc:a6:f6:e2:
                    73:9b:79:ee:0c:86:97:ab:2e:cc:47:e7:1b:e5:be:
                    0c:9f
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints: 
                CA:TRUE, pathlen:0
            X509v3 Subject Alternative Name: 
                DNS:virthost.ostest.test.metalkube.org
    Signature Algorithm: sha256WithRSAEncryption
    Signature Value:
        58:d2:da:f9:2a:c0:2d:7a:d9:9f:1f:97:e1:fd:36:a7:32:d3:
        ab:3f:15:cd:68:8e:be:7c:11:ec:5e:45:50:c4:ec:d8:d3:c5:
        22:3c:79:5a:01:63:9e:5a:bd:02:0c:87:69:c6:ff:a2:38:05:
        21:e4:96:78:40:db:52:c8:08:44:9a:96:6a:70:1e:1e:ae:74:
        e2:2d:fa:76:86:4d:06:b1:cf:d5:5c:94:40:17:5d:9f:84:2c:
        8b:65:ca:48:2b:2d:00:3b:42:b9:3c:08:1b:c5:5d:d2:9c:e9:
        bc:df:9a:7c:db:30:07:be:33:2a:bb:2d:69:72:b8:dc:f4:0e:
        62:08:49:93:d5:0f:db:35:98:18:df:e6:87:11:ce:65:5b:dc:
        6f:f7:f0:1c:b0:23:40:1e:e3:45:17:04:1a:bc:d1:57:d7:0d:
        c8:26:6d:99:fe:28:52:fe:ba:6a:a1:b8:d1:d1:50:a9:fa:03:
        bb:b7:ad:0e:82:d2:e8:34:91:fa:b4:f9:81:d1:9b:6d:0f:a3:
        8c:9d:c4:4a:1e:08:26:71:b9:1a:e8:49:96:0f:db:5c:76:db:
        ae:c7:6b:2e:ea:89:5d:7f:a3:ba:ea:7e:12:97:12:bc:1e:7f:
        49:09:d4:08:a6:4a:34:73:51:9e:a2:9a:ec:2a:f7:fc:b5:5c:
        f8:20:95:ad

This is probably a side effect of https://issues.redhat.com/browse/RFE-3093 - imagestream to trust CA added during the installation, that is also affecting imagestreams that requires a CA cert injected by hypershift during hosted-cluster creation in the disconnected use case.

Version-Release number of selected component (if applicable):

    v4.14, v4.15, v4.16

How reproducible:

    100%

Steps to Reproduce:

once connected to a disconnected hosted cluster, create an image stream pointing to an image on the internal mirror registry:
    1. $ oc --namespace=e2e-test tag virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ busybox:latest

    2. $ oc --namespace=e2e-test  set image-lookup busybox
    3. then check the image stream
    

Actual results:

    status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/e2e-test/busybox
  tags:
  - conditions:
    - generation: 2
      lastTransitionTime: "2024-03-27T12:43:56Z"
      message: 'Internal error occurred: virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ:
        Get "https://virthost.ostest.test.metalkube.org:5000/v2/": tls: failed to
        verify certificate: x509: certificate signed by unknown authority'

although the same image can be directly consumed by a pod on the same cluster

Expected results:

    status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/e2e-test/busybox
  tags:
  - conditions:
    - generation: 8
      lastTransitionTime: "2024-03-27T13:30:46Z"
      message: dockerimage.image.openshift.io "virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ"
        not found
      reason: NotFound
      status: "False"
      type: ImportSuccess

Additional info:

    This is probably a side effect of https://issues.redhat.com/browse/RFE-3093

Marking the imagestream as:
    importPolicy:
      importMode: Legacy
      insecure: true
is enough to work around this.

This is a clone of issue OCPBUGS-36424. The following is the description of the original issue:

Description of problem:

    The DeploymentConfigs deprecation info alert is shown on the Edit Deployment form. It should be shown only on DeploymentConfig pages.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Create a deployment
    2. Open Edit deployment form from the actions menu
    3.
    

Actual results:

    DeploymentConfigs deprecation info alert present on the edit deployment form

Expected results:

    DeploymentConfigs deprecation info alert should not be shown for the Deployment 

Additional info:

    

Description of problem:

A node fails to join the cluster because its CSR contains an incorrect hostname
oc describe csr csr-7hftm
Name:               csr-7hftm
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Tue, 24 Oct 2023 10:22:39 -0400
Requesting User:    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper
Signer:             kubernetes.io/kube-apiserver-client-kubelet
Status:             Pending
Subject:
         Common Name:    system:node:openshift-worker-1
         Serial Number:
         Organization:   system:nodes
Events:  <none>
oc get csr csr-7hftm -o yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  creationTimestamp: "2023-10-24T14:22:39Z"
  generateName: csr-
  name: csr-7hftm
  resourceVersion: "96957"
  uid: 84b94213-0c0c-40e4-8f90-d6612fbdab58
spec:
  groups:
  - system:serviceaccounts
  - system:serviceaccounts:openshift-machine-config-operator
  - system:authenticated
  request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlIN01JR2lBZ0VBTUVBeEZUQVRCZ05WQkFvVERITjVjM1JsYlRwdWIyUmxjekVuTUNVR0ExVUVBeE1lYzNsegpkR1Z0T201dlpHVTZiM0JsYm5Ob2FXWjBMWGR2Y210bGNpMHhNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBECkFRY0RRZ0FFMjRabE1JWGE1RXRKSGgwdWg2b3RVYTc3T091MC9qN0xuSnFqNDJKY0dkU01YeTJVb3pIRTFycmYKOTFPZ3pOSzZ5Z1R0Qm16NkFOdldEQTZ0dUszMlY2QUFNQW9HQ0NxR1NNNDlCQU1DQTBnQU1FVUNJRFhHMlFVWQoxMnVlWXhxSTV3blArRFBQaE5oaXhiemJvaTBpQzhHci9kMXRBaUVBdEFDcVVwRHFLYlFUNWVFZXlLOGJPN0dlCjhqVEI1UHN1SVpZM1pLU1R2WG89Ci0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=
  signerName: kubernetes.io/kube-apiserver-client-kubelet
  uid: c3adb2e0-6d60-4f56-a08d-6b01d3d3c065
  usages:
  - digital signature
  - client auth
  username: system:serviceaccount:openshift-machine-config-operator:node-bootstrapper
status: {}

Version-Release number of selected component (if applicable):

4.14.0-rc.6

How reproducible:

So far only on one setup

Steps to Reproduce:

1. Deploy dualstack baremetal cluster with day1 networking with static DHCP hostnames
2.
3.

Actual results:

A node fails to join the cluster

Expected results:

All nodes join the cluster

Continued work on the move to structured intervals requires us to replace all uses of the legacy format in origin so we can reclaim the "locator" (and "message") properties for the new structured interval, and stop duplicating a lot of text when we store, upload, and process these.

This is a clone of issue OCPBUGS-34040. The following is the description of the original issue:

Description of problem:

monitor-add-nodes.sh returns Error: open .addnodesparams: permission denied. 

Version-Release number of selected component (if applicable):

4.16    

How reproducible:

sometimes

Steps to Reproduce:

    1. Monitor adding a day2 node using monitor-add-nodes.sh
    2.
    3.
    

Actual results:

    Error: open .addnodesparams: permission denied. 

Expected results:

    monitor-add-nodes runs successfully

Additional info:

zhenying niu found an issue in node-joiner-monitor.sh:

[core@ocp-edge49 installer]$ ./node-joiner-monitor.sh 192.168.122.6
namespace/openshift-node-joiner-mz8anfejbn created
serviceaccount/node-joiner-monitor created
clusterrole.rbac.authorization.k8s.io/node-joiner-monitor unchanged
clusterrolebinding.rbac.authorization.k8s.io/node-joiner-monitor configured
pod/node-joiner-monitor created
Now using project "openshift-node-joiner-mz8anfejbn" on server "https://api.ostest.test.metalkube.org:6443".
pod/node-joiner-monitor condition met
time=2024-05-21T09:24:19Z level=info msg=Monitoring IPs: [192.168.122.6]
Error: open .addnodesparams: permission denied
Usage:
  node-joiner monitor-add-nodes [flags]
Flags:
  -h, --help   help for monitor-add-nodes
Global Flags:
      --dir string         assets directory (default ".")
      --kubeconfig string  Path to the kubeconfig file.
      --log-level string   log level (e.g. "debug | info | warn | error") (default "info")
time=2024-05-21T09:24:19Z level=fatal msg=open .addnodesparams: permission denied
Cleaning up
Removing temporary file /tmp/nodejoiner-mZ8aNfEjbn
[~afasano@redhat.com] found the root cause: the working directory was not set, so the current working directory /output is used, and it is not writable. An easy fix would be to just use /tmp, i.e.:

command: ["/bin/sh", "-c", "node-joiner monitor-add-nodes $ipAddresses --dir=/tmp --log-level=info; sleep 5"] 

Please review the following PR: https://github.com/openshift/kubernetes-autoscaler/pull/272

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-34054. The following is the description of the original issue:

The OCM-operator's imagePullSecretCleanupController attempts to prevent new pods from using an image pull secret that needs to be deleted, but this results in the OCM creating a new image pull secret in the meantime.

The overlap occurs when the OCM-operator detects that the registry has been removed: it simultaneously triggers the imagePullSecretCleanup controller to start deleting and updates the OCM config to stop creating new secrets, but the OCM behavior change is delayed until its pods are restarted.

In 4.16 this churn is minimized due to the OCM naming the image pull secrets consistently, but the churn can occur during an upgrade given that the OCM-operator is updated first.

This is a clone of issue OCPBUGS-42232. The following is the description of the original issue:

This is a clone of issue OCPBUGS-35036. The following is the description of the original issue:

Description of problem:

The following logs are from namespaces/openshift-apiserver/pods/apiserver-6fcd57c747-57rkr/openshift-apiserver/openshift-apiserver/logs/current.log

    2024-06-06T15:57:06.628216833Z E0606 15:57:06.628186       1 finisher.go:175] FinishRequest: post-timeout activity - time-elapsed: 139.823053ms, panicked: true, err: <nil>, panic-reason: runtime error: invalid memory address or nil pointer dereference
2024-06-06T15:57:06.628216833Z goroutine 192790 [running]:
2024-06-06T15:57:06.628216833Z k8s.io/apiserver/pkg/endpoints/handlers/finisher.finishRequest.func1.1()
2024-06-06T15:57:06.628216833Z  k8s.io/apiserver@v0.29.2/pkg/endpoints/handlers/finisher/finisher.go:105 +0xa5
2024-06-06T15:57:06.628216833Z panic({0x498ac60?, 0x74a51c0?})
2024-06-06T15:57:06.628216833Z  runtime/panic.go:914 +0x21f
2024-06-06T15:57:06.628216833Z github.com/openshift/openshift-apiserver/pkg/image/apiserver/importer.(*ImageStreamImporter).importImages(0xc0c5bf0fc0, {0x5626bb0, 0xc0a50c7dd0}, 0xc07055f4a0, 0xc0a2487600)
2024-06-06T15:57:06.628216833Z  github.com/openshift/openshift-apiserver/pkg/image/apiserver/importer/importer.go:263 +0x1cf5
2024-06-06T15:57:06.628216833Z github.com/openshift/openshift-apiserver/pkg/image/apiserver/importer.(*ImageStreamImporter).Import(0xc0c5bf0fc0, {0x5626bb0, 0xc0a50c7dd0}, 0x0?, 0x0?)
2024-06-06T15:57:06.628216833Z  github.com/openshift/openshift-apiserver/pkg/image/apiserver/importer/importer.go:110 +0x139
2024-06-06T15:57:06.628216833Z github.com/openshift/openshift-apiserver/pkg/image/apiserver/registry/imagestreamimport.(*REST).Create(0xc0033b2240, {0x5626bb0, 0xc0a50c7dd0}, {0x5600058?, 0xc07055f4a0?}, 0xc08e0b9ec0, 0x56422e8?)
2024-06-06T15:57:06.628216833Z  github.com/openshift/openshift-apiserver/pkg/image/apiserver/registry/imagestreamimport/rest.go:337 +0x1574
2024-06-06T15:57:06.628216833Z k8s.io/apiserver/pkg/endpoints/handlers.(*namedCreaterAdapter).Create(0x55f50e0?, {0x5626bb0?, 0xc0a50c7dd0?}, {0xc0b5704000?, 0x562a1a0?}, {0x5600058?, 0xc07055f4a0?}, 0x1?, 0x2331749?)
2024-06-06T15:57:06.628216833Z  k8s.io/apiserver@v0.29.2/pkg/endpoints/handlers/create.go:254 +0x3b
2024-06-06T15:57:06.628216833Z k8s.io/apiserver/pkg/endpoints/handlers.CreateResource.createHandler.func1.1()
2024-06-06T15:57:06.628216833Z  k8s.io/apiserver@v0.29.2/pkg/endpoints/handlers/create.go:184 +0xc6
2024-06-06T15:57:06.628216833Z k8s.io/apiserver/pkg/endpoints/handlers.CreateResource.createHandler.func1.2()
2024-06-06T15:57:06.628216833Z  k8s.io/apiserver@v0.29.2/pkg/endpoints/handlers/create.go:209 +0x39e
2024-06-06T15:57:06.628216833Z k8s.io/apiserver/pkg/endpoints/handlers/finisher.finishRequest.func1()
2024-06-06T15:57:06.628216833Z  k8s.io/apiserver@v0.29.2/pkg/endpoints/handlers/finisher/finisher.go:117 +0x84

Version-Release number of selected component (if applicable):

We applied into all clusters in CI and checked 3 of them and all 3 share the same errors.

oc --context build09 get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-rc.3   True        False         3d9h    Error while reconciling 4.16.0-rc.3: the cluster operator machine-config is degraded

oc --context build02 get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-rc.2   True        False         15d     Error while reconciling 4.16.0-rc.2: the cluster operator machine-config is degraded

oc --context build03 get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.16   True        False         34h     Error while reconciling 4.15.16: the cluster operator machine-config is degraded

How reproducible:

We applied this PR https://github.com/openshift/release/pull/52574/files to the clusters.

It breaks at least 3 of them.

"qci-pull-through-cache-us-east-1-ci.apps.ci.l2s4.p1.openshiftapps.com" is a registry cache server https://github.com/openshift/release/blob/master/clusters/app.ci/quayio-pull-through-cache/qci-pull-through-cache-us-east-1.yaml

Additional info:

There are lots of image imports in OpenShift CI jobs.

It feels like the registry cache server returns unexpected results to the openshift-apiserver:

2024-06-06T18:13:13.781520581Z E0606 18:13:13.781459       1 strategy.go:60] unable to parse manifest for "sha256:c5bcd0298deee99caaf3ec88de246f3af84f80225202df46527b6f2b4d0eb3c3": unexpected end of JSON input 

Our theory is that the import requests from all CI clusters crashed the cache server and it sent back unexpected data, which caused the apiserver to panic.
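
If that theory is right, this is the familiar shape where a parse error from the registry is not propagated and a later dereference of the nil result panics. A minimal illustrative sketch of the defensive pattern, using a stand-in manifest type rather than the openshift-apiserver's real types:

package main

import (
    "encoding/json"
    "fmt"
)

// manifest is a stand-in; the real importer works with Docker/OCI
// manifest structs. Only the control flow matters here.
type manifest struct {
    SchemaVersion int `json:"schemaVersion"`
}

// parseManifest returns an error instead of a nil pointer that a caller
// might later dereference when the registry hands back truncated JSON.
func parseManifest(raw []byte) (*manifest, error) {
    var m manifest
    if err := json.Unmarshal(raw, &m); err != nil {
        return nil, fmt.Errorf("unable to parse manifest: %w", err)
    }
    return &m, nil
}

func main() {
    if _, err := parseManifest([]byte(`{"schemaVersion":`)); err != nil {
        // unable to parse manifest: unexpected end of JSON input
        fmt.Println(err)
    }
}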

 

The expected behaviour is that if the image cannot be pulled from the first mirror in the ImageDigestMirrorSet, then it will be failed over to the next one.

This is a clone of issue OCPBUGS-35542. The following is the description of the original issue:

Description of problem:

After destroying cluster, there are still some files leftover in <install-dir>/.clusterapi_output
$ ls -ltra
total 1516
drwxr-xr-x. 1 fedora fedora     596 Jun 17 03:46 ..
drwxr-x---. 1 fedora fedora      88 Jun 17 06:09 .clusterapi_output
-rw-r--r--. 1 fedora fedora 1552382 Jun 17 06:09 .openshift_install.log
drwxr-xr-x. 1 fedora fedora      80 Jun 17 06:09 . 
$ ls -ltr .clusterapi_output/
total 40
-rw-r--r--. 1 fedora fedora  2335 Jun 17 05:58 envtest.kubeconfig
-rw-r--r--. 1 fedora fedora 20542 Jun 17 06:03 kube-apiserver.log
-rw-r--r--. 1 fedora fedora 10656 Jun 17 06:03 etcd.log

Then continue installing new cluster within same install dir, installer exited with error as below:
$ ./openshift-install create cluster --dir ipi-aws
INFO Credentials loaded from the "default" profile in file "/home/fedora/.aws/credentials" 
INFO Consuming Install Config from target directory 
FATAL failed to fetch Cluster: failed to load asset "Cluster": local infrastructure provisioning artifacts already exist. There may already be a running cluster 


After removing .clusterapi_output/envtest.kubeconfig, and creating cluster again, installation is continued.

Version-Release number of selected component (if applicable):

4.16 nightly build

How reproducible:

always

Steps to Reproduce:

1. Launch capi-based installation
2. Destroy cluster
3. Launch new cluster within same install dir

Actual results:

Fail to launch new cluster within the same install dir, because .clusterapi_output/envtest.kubeconfig is still there.

Expected results:

Succeed to create a new cluster within the same install dir

Additional info:

 

Description of problem:

When using the registry-overrides flag to override registries for control plane components, it seems like the current implementation propagates the override to some data plane components. 

It seems that certain components like multus, dns, and ingress get values for their containers' images from env vars set in operators on the control plane (cno/dns operator/konnectivity), and hence also get the overridden registry propagated to them. 

Version-Release number of selected component (if applicable):

    

How reproducible:

    100%

Steps to Reproduce:

    1.Input a registry override through the HyperShift Operator
    2.Check registry fields for components on data plane
    3.
    

Actual results:

Data plane components that get registry values from env vars set in dns-operator, ingress-operator, cluster-network-operator, and cluster-node-tuning-operator get overridden registries. 

Expected results:

Overridden registries should not get propagated to the data plane.

Additional info:

    

Description of problem:

When using the modal dialogs in a hook as part of the actions hook (i.e. useApplicationsActionsProvider), the console will throw an error since the console framework will pass null objects as part of the render cycle. According to Jon Jackson, the console should be safe from null objects, but it looks like the code for useDeleteModal and getGroupVersionKindForresource is not.

Version-Release number of selected component (if applicable):

    

How reproducible:

   Always 

Steps to Reproduce:

    1. Use one of the modal APIs in an actions provider hook
    2.
    3.
    

Actual results:

    Caught error in a child component: TypeError: Cannot read properties of undefined (reading 'split')
    at i (main-chunk-9fbeef79a…d3a097ed.min.js:1:1)
    at u (main-chunk-9fbeef79a…d3a097ed.min.js:1:1)
    at useApplicationActionsProvider (useApplicationActionsProvider.tsx:23:43)
    at ApplicationNavPage (ApplicationDetails.tsx:38:67)
    at na (vendors~main-chunk-8…87b.min.js:174297:1)
    at Hs (vendors~main-chunk-8…87b.min.js:174297:1)
    at Sc (vendors~main-chunk-8…87b.min.js:174297:1)
    at Cc (vendors~main-chunk-8…87b.min.js:174297:1)
    at _c (vendors~main-chunk-8…87b.min.js:174297:1)
    at pc (vendors~main-chunk-8…87b.min.js:174297:1) 

Expected results:

    Works with no error

Additional info:

    

Description of problem:

 When trying to onboard an xFusion baremetal node using Redfish virtual media (no provisioning network), it fails after the node registration with this error:

Normal InspectionError 60s metal3-baremetal-controller Failed to inspect hardware. Reason: unable to start inspection: The attribute Links/ManagedBy is missing from the resource /redfish/v1/Systems/1

Version-Release number of selected component (if applicable):

    4.14.18

How reproducible:

    Just add an xFusion baremetal node, specifying the following in the manifest:

Spec: 
  Automated Cleaning Mode: metadata 
  Bmc: 
    Address: redfish-virtualmedia://w.z.x.y/redfish/v1/Systems/1 
    Credentials Name: hu28-tovb-bmc-secret 
    Disable Certificate Verification: true 
  Boot MAC Address: <MAC>
  Boot Mode: UEFI 
  Online: false 
  Preprovisioning Network Data Name: openstack-hu28-tovb-network-config-secret

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    Inspection fails with the aforementioned error; no preprovisioning image is mounted on the host's virtual media.

Expected results:

    VirtualMedia gets mounted and inspection starts.

Additional info:

    

Description of problem:

The shutdown-delay-duration argument for the openshift-oauth-apiserver is set to 3s in hypershift, but set to 15s in core openshift. Hypershift should update the value to match.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

When scaling from zero replicas, the cluster autoscaler can panic if there are taints on the machineset with no "value" field defined.

    

Version-Release number of selected component (if applicable):

4.16/master
    

How reproducible:

always
    

Steps to Reproduce:

    1. create a machineset with a taint that has no value field and 0 replicas
    2. enable the cluster autoscaler
    3. force a workload to scale the tainted machineset
    

Actual results:

a panic like this is observed

I0325 15:36:38.314276       1 clusterapi_provider.go:68] discovered node group: MachineSet/openshift-machine-api/k8hmbsmz-c2483-9dnddr4sjc (min: 0, max: 2, replicas: 0)
panic: interface conversion: interface {} is nil, not string

goroutine 79 [running]:
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredToTaint(...)
	/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:246
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.unstructuredScalableResource.Taints({0xc000103d40?, 0xc000121360?, 0xc002386f98?, 0x2?})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go:214 +0x8a5
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi.(*nodegroup).TemplateNodeInfo(0xc002675930)
	/go/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_nodegroup.go:266 +0x2ea
k8s.io/autoscaler/cluster-autoscaler/core/utils.GetNodeInfoFromTemplate({0x276b230, 0xc002675930}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60?, 0xc0023ffe90?})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/core/utils/utils.go:41 +0x9d
k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider.(*MixedTemplateNodeInfoProvider).Process(0xc00084f848, 0xc0023f7680, {0xc001dcdb00, 0x3, 0x0?}, {0xc001bf2c00, 0x10, 0x10}, {0xc0023ffe60, 0xc0023ffe90}, ...)
	/go/src/k8s.io/autoscaler/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go:155 +0x599
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000617550, {0x4?, 0x0?, 0x3a56f60?})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:352 +0xcaa
main.run(0x0?, {0x2761b48, 0xc0004c04e0})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:529 +0x2cd
main.main.func2({0x0?, 0x0?})
	/go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:617 +0x25
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
	/go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:213 +0x105
    

Expected results:

expect the machineset to scale up
    

Additional info:
I think the e2e test that exercises this is only running on periodic jobs, and as such we missed this error in OCPBUGS-27509.

This search shows some failed results.
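
The panic in the Actual results points at a direct type assertion on the taint's "value" field in the unstructured MachineSet object. A hedged sketch of the tolerant pattern, with simplified types rather than the autoscaler's actual code:

package main

import "fmt"

// taint is a simplified stand-in for corev1.Taint.
type taint struct {
    Key    string
    Value  string
    Effect string
}

// unstructuredToTaint uses comma-ok assertions, so a taint spelled without
// a "value" field yields an empty string instead of panicking.
func unstructuredToTaint(m map[string]interface{}) taint {
    key, _ := m["key"].(string)       // "" if missing
    value, _ := m["value"].(string)   // "" if missing or nil; no panic
    effect, _ := m["effect"].(string) // "" if missing
    return taint{Key: key, Value: value, Effect: effect}
}

func main() {
    // A MachineSet taint with no value field, as in the reproducer.
    fmt.Printf("%+v\n", unstructuredToTaint(map[string]interface{}{
        "key":    "dedicated",
        "effect": "NoSchedule",
    }))
}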

Description of problem:

When running an agent-based installation with the arm64 or multi payload, after booting the ISO file assisted-service raises the following error and the installation fails to start:

Openshift version 4.16.0-0.nightly-arm64-2024-04-02-182838 for CPU architecture arm64 is not supported: no release image found for openshiftVersion: '4.16.0-0.nightly-arm64-2024-04-02-182838' and CPU architecture 'arm64'" go-id=419 pkg=Inventory request_id=5817b856-ca79-43c0-84f1-b38f733c192f 

The same error when running the installation with multi-arch build in assisted-service.log:

Openshift version 4.16.0-0.nightly-multi-2024-04-01-135550 for CPU architecture multi is not supported: no release image found for openshiftVersion: '4.16.0-0.nightly-multi-2024-04-01-135550' and CPU architecture 'multi'" go-id=306 pkg=Inventory request_id=21a47a40-1de9-4ee3-9906-a2dd90b14ec8 

The amd64 build works fine for now.

Version-Release number of selected component (if applicable):

    

How reproducible:

always

Steps to Reproduce:

1. Create agent iso file with openshift-install binary: openshift-install agent create image with arm64/multi payload
2. Booting the iso file 
3. Track the "openshift-install agent wait-for bootstrap-complete" output and assisted-service log
    

Actual results:

 The installation can't start with error

Expected results:

 The installation is working fine

Additional info:

assisted-service log: https://docs.google.com/spreadsheets/d/1Jm-eZDrVz5so4BxsWpUOlr3l_90VmJ8FVEvqUwG8ltg/edit#gid=0

Job fail url: 
multi payload: 
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-multi-nightly-baremetal-compact-agent-ipv4-dhcp-day2-amd-mixarch-f14/1774134780246364160

arm64 payload:
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-arm64-nightly-baremetal-pxe-ha-agent-ipv4-static-connected-f14/1773354788239446016

Please review the following PR: https://github.com/openshift/csi-livenessprobe/pull/59

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/operator-framework-rukpak/pull/81

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Background

Lightspeed requires to add a link to each alert item, into the kebab menu so they can open a panel

Outcomes

An extension point is added to the alert table so new elements can be added to each alert kebab menu

Description of problem:

Following signing-key deletion, there is a service CA rotation process which might temporarily disrupt cluster operators, but eventually all should regenerate. In recent 4.14 nightlies, however, this is not the case anymore. Following a deletion of the signing-key using
oc delete secret/signing-key -n openshift-service-ca
operators will progress for a while, but eventually console as well as monitoring will end up in available=false and degraded=true, which is only recoverable by manually deleting all the pods in the cluster.
console                                    4.14.0-0.nightly-2023-06-30-131338   False       False         True       159m    RouteHealthAvailable: route not yet available, https://console-openshift-console.apps.evakhoni-0412.qe.gcp.devcluster.openshift.com returns '503 Service Unavailable' 
monitoring                                 4.14.0-0.nightly-2023-06-30-131338   False       True          True       161m    reconciling Console Plugin failed: retrieving ConsolePlugin object failed: conversion webhook for console.openshift.io/v1alpha1, Kind=ConsolePlugin failed: Post "https://webhook.openshift-console-operator.svc:9443/crdconvert?timeout=30s": tls: failed to verify certificate: x509: certificate signed by unknown authority
The same deletion in previous versions (4.14-ec.2 or earlier) doesn't have this issue, and the cluster is able to recover eventually without any manual pod deletion. I believe this to be a regression.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-30-131338 and other recent 4.14 nightlies

How reproducible:

100%

Steps to Reproduce:

1.oc delete secret/signing-key -n openshift-service-ca
2. wait at least 30+ minutes
3. observe oc get co

Actual results:

console and monitoring degraded and not recovering

Expected results:

able to recover eventually as in previous versions

Additional info:

using manual deletion of all pods it is possible to recover the cluster from this state as follows:
for I in $(oc get ns -o jsonpath='{range .items[*]} {.metadata.name}{"\n"} {end}'); \
      do oc delete pods --all -n $I; \
      sleep 1; \
      done

 

must-gather:
https://drive.google.com/file/d/1Y3RrYZlz0EncG-Iqt8USFPsTd-br36Zt/view?usp=sharing

 

ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed preparing ignition data: ignition failed to provision storage: failed to create storage: failed to create bucket: googleapi: Error 409: Your previous request to create the named bucket succeeded and you already own it., conflict 

Description of problem:

Configure vm type as Standard_NP10s in install-config, which only supports Generation V1.
--------------
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    azure:
      type: Standard_NP10s
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    azure:
      type: Standard_NP10s
  replicas: 3

Continue installation, installer failed when provisioning bootstrap node.
--------------
ERROR                                              
ERROR Error: creating Linux Virtual Machine: (Name "jima1211test-rqfhm-bootstrap" / Resource Group "jima1211test-rqfhm-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The selected VM size 'Standard_NP10s' cannot boot Hypervisor Generation '2'. If this was a Create operation please check that the Hypervisor Generation of the Image matches the Hypervisor Generation of the selected VM Size. If this was an Update operation please select a Hypervisor Generation '2' VM Size. For more information, see https://aka.ms/azuregen2vm" 
ERROR                                              
ERROR   with azurerm_linux_virtual_machine.bootstrap, 
ERROR   on main.tf line 193, in resource "azurerm_linux_virtual_machine" "bootstrap": 
ERROR  193: resource "azurerm_linux_virtual_machine" "bootstrap" { 
ERROR                                              
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failure applying terraform for "bootstrap" stage: error applying Terraform configs: failed to apply Terraform: exit status 1 
ERROR                                              
ERROR Error: creating Linux Virtual Machine: (Name "jima1211test-rqfhm-bootstrap" / Resource Group "jima1211test-rqfhm-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The selected VM size 'Standard_NP10s' cannot boot Hypervisor Generation '2'. If this was a Create operation please check that the Hypervisor Generation of the Image matches the Hypervisor Generation of the selected VM Size. If this was an Update operation please select a Hypervisor Generation '2' VM Size. For more information, see https://aka.ms/azuregen2vm" 
ERROR                                              
ERROR   with azurerm_linux_virtual_machine.bootstrap, 
ERROR   on main.tf line 193, in resource "azurerm_linux_virtual_machine" "bootstrap": 
ERROR  193: resource "azurerm_linux_virtual_machine" "bootstrap" { 
ERROR                                              
ERROR                                              

It seems the issue was introduced by https://github.com/openshift/installer/pull/7642/

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-09-012410

How reproducible:

Always

Steps to Reproduce:

    1. configure vm type to Standard_NP10s on control-plane in install-config.yaml
    2. install cluster
    3.
    

Actual results:

    installer failed when provisioning bootstrap node

Expected results:

    installation get successful

Additional info:

    

Description of problem:

    The when expression using CEL is an alpha feature of Pipelines. It is not handled or supported in the console UI, so the UI breaks.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Create a Pipeline with when expression using the CEL expression
    2. Run the pipeline and navigate to the PipelineRun details page
    3.
    

Actual results:

    UI breaks

Expected results:

    UI should not break

Additional info:

    CEL expression doc https://github.com/tektoncd/pipeline/blob/main/docs/pipelines.md#use-cel-expression-in-whenexpression

Please review the following PR: https://github.com/openshift/cloud-network-config-controller/pull/135

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

In a cluster updating from 4.5.11 through many intermediate versions to 4.14.17 and on to 4.15.3 (initiated 2024-03-18T07:33:11Z), multus pods are sad about api-int X.509:

$ tar -xOz inspect.local.5020316083985214391/namespaces/openshift-kube-apiserver/core/events.yaml <hivei01ue1.inspect.local.5020316083985214391.gz | yaml2json | jq -r '[.items[] | select(.reason == "FailedCreatePodSandBox")][0].message'
(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_installer-928-ip-10-164-221-242.ec2.internal_openshift-kube-apiserver_9e87f20b-471a-447e-9679-edce26b4ef78_0(8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c): error adding pod openshift-kube-apiserver_installer-928-ip-10-164-221-242.ec2.internal to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c Netns:/var/run/netns/6e2b0b10-5006-4bf9-bd74-17333e0cdceb IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=installer-928-ip-10-164-221-242.ec2.internal;K8S_POD_INFRA_CONTAINER_ID=8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c;K8S_POD_UID=9e87f20b-471a-447e-9679-edce26b4ef78 Path: StdinData:[REDACTED]} ContainerID:"8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c" Netns:"/var/run/netns/6e2b0b10-5006-4bf9-bd74-17333e0cdceb" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=installer-928-ip-10-164-221-242.ec2.internal;K8S_POD_INFRA_CONTAINER_ID=8322d383c477c29fe0221fdca5eaf5ca5b2f57f8a7077c7dd7d2861be0f5288c;K8S_POD_UID=9e87f20b-471a-447e-9679-edce26b4ef78" Path:"" ERRORED: error configuring pod [openshift-kube-apiserver/installer-928-ip-10-164-221-242.ec2.internal] networking: Multus: [openshift-kube-apiserver/installer-928-ip-10-164-221-242.ec2.internal/9e87f20b-471a-447e-9679-edce26b4ef78]: error waiting for pod: Get "https://api-int.REDACTED:6443/api/v1/namespaces/openshift-kube-apiserver/pods/installer-928-ip-10-164-221-242.ec2.internal?timeout=1m0s": tls: failed to verify certificate: x509: certificate signed by unknown authority

Version-Release number of selected component (if applicable)

4.15.3, so we have 4.15.2's OCPBUGS-30304 but not 4.15.5's OCPBUGS-30237.

How reproducible

Seen in two clusters after updating from 4.14 to 4.15.3.

Steps to Reproduce

Unclear.

Actual results

Sad multus pods.

Expected results

Happy cluster.

Additional info

$ openssl s_client -showcerts -connect api-int.REDACTED:6443 < /dev/null
...
Certificate chain
 0 s:CN = api-int.REDACTED
   i:CN = openshift-kube-apiserver-operator_loadbalancer-serving-signer@1710747228
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Mar 25 19:35:55 2024 GMT; NotAfter: Apr 24 19:35:56 2024 GMT
...
 1 s:CN = openshift-kube-apiserver-operator_loadbalancer-serving-signer@1710747228
   i:CN = openshift-kube-apiserver-operator_loadbalancer-serving-signer@1710747228
   a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
   v:NotBefore: Mar 18 07:33:47 2024 GMT; NotAfter: Mar 16 07:33:48 2034 GMT
...

So that's created seconds after the update was initiated. We have inspect logs for some namespaces, but they don't go back quite that far, because the machine-config roll at the end of the update into 4.15.3 rolled all the pods:

$ tar -xOz inspect.local.5020316083985214391/namespaces/openshift-kube-apiserver-operator/pods/kube-apiserver-operator-6cbfdd467c-4ctq7/kube-apiserver-operator/kube-apiserver-operator/logs/current.log <hivei01ue1.inspect.local.5020316083985214391.gz | head -n2
2024-03-18T08:22:05.058253904Z I0318 08:22:05.056255       1 cmd.go:241] Using service-serving-cert provided certificates
2024-03-18T08:22:05.058253904Z I0318 08:22:05.056351       1 leaderelection.go:122] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.

We were able to recover individual nodes via:

  1. oc config new-kubelet-bootstrap-kubeconfig > bootstrap.kubeconfig  from any machine with an admin kubeconfig
  2. copy to all nodes as /etc/kubernetes/kubeconfig
  3. on each node rm /var/lib/kubelet/kubeconfig
  4. restart each node
  5. approve each kubelet CSR
  6. delete the node's multus-* pod.

Description of problem:

The ovn-ipsec-host pods are crashlooping on a 24 node cluster.  

Version-Release number of selected component (if applicable):

 4.16.0, master   

How reproducible:

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/50690/rehearse-50690-pull-ci-openshift-qe-ocp-qe-perfscale-ci-main-azure-4.15-nightly-x86-control-plane-ipsec-24nodes/1780216294851743744    

Steps to Reproduce:

Running rehearse test for the PR https://github.com/openshift/release/pull/50690

Actual results:

CI lane fails at control-plane-ipsec-24nodes-ipi-install-install step.

Seeing following errors from ipsec pod:

2024-04-16T14:18:01.158407293Z + counter=0
2024-04-16T14:18:01.158407293Z + '[' -f /etc/cni/net.d/10-ovn-kubernetes.conf ']'
2024-04-16T14:18:01.158512920Z ovnkube-node has configured node.
2024-04-16T14:18:01.158519623Z + echo 'ovnkube-node has configured node.'
2024-04-16T14:18:01.158519623Z + pgrep pluto
2024-04-16T14:18:01.166444142Z pluto is not running, enable the service and/or check system logs
2024-04-16T14:18:01.166465551Z + echo 'pluto is not running, enable the service and/or check system logs'
2024-04-16T14:18:01.166465551Z + exit 2

Expected results:

The step must pass and CI lane should succeed eventually.    

Additional info:

The mcp status for the worker pool contains the following:
status:
  certExpirys:
  - bundle: KubeAPIServerServingCAData
    expiry: "2034-04-14T12:58:49Z"
    subject: CN=admin-kubeconfig-signer,OU=openshift
  - bundle: KubeAPIServerServingCAData
    expiry: "2024-04-17T12:58:51Z"
    subject: CN=kube-csr-signer_@1713274017
  - bundle: KubeAPIServerServingCAData
    expiry: "2024-04-17T12:58:51Z"
    subject: CN=kubelet-signer,OU=openshift
  - bundle: KubeAPIServerServingCAData
    expiry: "2025-04-16T12:58:51Z"
    subject: CN=kube-apiserver-to-kubelet-signer,OU=openshift
  - bundle: KubeAPIServerServingCAData
    expiry: "2025-04-16T12:58:51Z"
    subject: CN=kube-control-plane-signer,OU=openshift
  - bundle: KubeAPIServerServingCAData
    expiry: "2034-04-14T12:58:50Z"
    subject: CN=kubelet-bootstrap-kubeconfig-signer,OU=openshift
  - bundle: KubeAPIServerServingCAData
    expiry: "2025-04-16T13:26:54Z"
    subject: CN=openshift-kube-apiserver-operator_node-system-admin-signer@1713274014
  conditions:
  - lastTransitionTime: "2024-04-16T13:28:53Z"
    message: ""
    reason: ""
    status: "False"
    type: RenderDegraded
  - lastTransitionTime: "2024-04-16T13:34:52Z"
    message: ""
    reason: ""
    status: "False"
    type: Updated
  - lastTransitionTime: "2024-04-16T13:35:08Z"
    message: ""
    reason: ""
    status: "False"
    type: NodeDegraded
  - lastTransitionTime: "2024-04-16T13:35:08Z"
    message: ""
    reason: ""
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-04-16T13:34:52Z"
    message: All nodes are updating to MachineConfig rendered-worker-226a284eb61d46506202285ee1cf4688
    reason: ""
    status: "True"
    type: Updating
  configuration:
    name: rendered-worker-95c2861c75a83c0523dcba922c3b9982
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 98-worker-generated-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 97-worker-generated-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-worker-generated-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-worker-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-worker-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 80-ipsec-worker-extensions
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-worker-ssh
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-worker
  degradedMachineCount: 0
  machineCount: 24
  observedGeneration: 140
  readyMachineCount: 8
  unavailableMachineCount: 1
  updatedMachineCount: 8

Description of problem:

Looking at the code snippet at line 198, the wg.Add(1) should be moved closer to the function it is waiting for (line 226).

Having another function in between that could exit early could leave the controller in a state where it is waiting for a deferred Done that can never run, meaning that the controller will never terminate. 
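
A small illustrative sketch of the shape being described, not the controller's actual code: keeping the Add adjacent to the goroutine that calls Done means an early error return between them can no longer strand the WaitGroup.

package main

import (
    "errors"
    "fmt"
    "sync"
)

func setup() error { return errors.New("setup failed") }

func run(wg *sync.WaitGroup) {
    defer wg.Done()
    fmt.Println("controller loop running")
}

func main() {
    var wg sync.WaitGroup

    // Problematic shape: calling wg.Add(1) here and then returning early on
    // a setup error leaves the counter at 1 with no matching Done, so
    // wg.Wait() blocks forever. Safer shape: only Add immediately before
    // starting the goroutine that owns the matching Done.
    if err := setup(); err != nil {
        fmt.Println("bailing out before any Add:", err)
    } else {
        wg.Add(1)
        go run(&wg)
    }

    wg.Wait() // returns, because Add/Done are balanced on every path
}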

Version-Release number of selected component (if applicable):

Found on the master branch while cross-referencing errors/logs for a cluster.

How reproducible:

Not reproducible.

Additional info:

Not required: resolution has already been found.

  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
    • Done

This is a clone of issue OCPBUGS-32257. The following is the description of the original issue:

Description of problem:

When using the registry-overrides flag to override registries for control plane components, it seems like the current implementation propagates the override to some data plane components. 

It seems that certain components like multus, dns, and ingress get values for their containers' images from env vars set in operators on the control plane (cno/dns operator/konnectivity), and hence also get the overridden registry propagated to them. 

Version-Release number of selected component (if applicable):

    

How reproducible:

    100%

Steps to Reproduce:

    1.Input a registry override through the HyperShift Operator
    2.Check registry fields for components on data plane
    3.
    

Actual results:

Data plane components that get registry values from env vars set in dns-operator, ingress-operator, cluster-network-operator, and cluster-node-tuning-operator get overridden registries. 

Expected results:

Overridden registries should not get propagated to the data plane.

Additional info:

    

The node-exporter pods throw the following errors if `symbolic_name` is not present or not provided by the Fibre Channel vendor.  

$ oc logs node-exporter-m6lbc -n openshift-monitoring -c node-exporter | tail -2
2023-09-27T12:13:39.403106561Z ts=2023-09-27T12:13:39.403Z caller=collector.go:169 level=error msg="collector failed" name=fibrechannel duration_seconds=0.000249813 err="error obtaining FibreChannel class info: failed to read file \"/host/sys/class/fc_host/host0/symbolic_name\": open /host/sys/class/fc_host/host0/symbolic_name: no such file or directory"

https://github.com/prometheus/node_exporter/blob/master/collector/fibrechannel_linux.go#L116C28-L116C28

The ibmvfc kernel module does not supply `symbolic_name`.

    https://github.com/torvalds/linux/blob/master/drivers/scsi/ibmvscsi/ibmvfc.c#L6308
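
A minimal sketch of the tolerant read being argued for, assuming the collector can treat the attribute as optional (illustrative only, not the actual node_exporter code):

package main

import (
    "errors"
    "fmt"
    "os"
    "strings"
)

// readOptionalAttr reads a sysfs attribute but treats a missing file as
// "not provided by this driver" rather than a collector failure, which is
// the behaviour wanted when ibmvfc omits symbolic_name.
func readOptionalAttr(path string) (value string, present bool, err error) {
    b, err := os.ReadFile(path)
    if errors.Is(err, os.ErrNotExist) {
        return "", false, nil // attribute simply absent, not an error
    }
    if err != nil {
        return "", false, err
    }
    return strings.TrimSpace(string(b)), true, nil
}

func main() {
    val, ok, err := readOptionalAttr("/sys/class/fc_host/host0/symbolic_name")
    fmt.Println(val, ok, err)
}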

  1. grep -v "zZzZ" -H /sys/class/fc_host/host*/port_state
    /sys/class/fc_host/host0/port_state:Online
    /sys/class/fc_host/host1/port_state:Online

sh-5.1# cd  /sys/class/fc_host/host0
sh-5.1# ls -ltr
total 0
-r--r--r--. 1 root root 65536 Sep 28 19:43 speed
-r--r--r--. 1 root root 65536 Sep 28 19:43 port_type
-r--r--r--. 1 root root 65536 Sep 28 19:43 port_state
-r--r--r--. 1 root root 65536 Sep 28 19:43 port_name
-r--r--r--. 1 root root 65536 Sep 28 19:43 port_id
-r--r--r--. 1 root root 65536 Sep 28 19:43 node_name
-r--r--r--. 1 root root 65536 Sep 28 19:43 fabric_name
-rw-r--r--. 1 root root 65536 Sep 28 19:43 dev_loss_tmo
-rw-r--r--. 1 root root 65536 Oct  3 09:24 uevent
-rw-r--r--. 1 root root 65536 Oct  3 09:24 tgtid_bind_type
-r--r--r--. 1 root root 65536 Oct  3 09:24 supported_classes
lrwxrwxrwx. 1 root root     0 Oct  3 09:24 subsystem -> ../../../../../../class/fc_host
drwxr-xr-x. 2 root root     0 Oct  3 09:24 power
-r--r--r--. 1 root root 65536 Oct  3 09:24 maxframe_size
--w-------. 1 root root 65536 Oct  3 09:24 issue_lip
lrwxrwxrwx. 1 root root     0 Oct  3 09:24 device -> ../../../host0

This is a clone of issue OCPBUGS-34734. The following is the description of the original issue:

Description of problem:

For the fix of OCPBUGS-29494, only the hosted cluster was fixed, and changes to the node pool were ignored. The node pool encountered the following error:

    - lastTransitionTime: "2024-05-31T09:11:40Z"
      message: 'failed to check if we manage haproxy ignition config: failed to look
        up image metadata for registry.ci.openshift.org/ocp/4.14-2024-05-29-171450@sha256:9b88c6e3f7802b06e5de7cd3300aaf768e85d785d0847a70b35857e6d1000d51:
        failed to obtain root manifest for registry.ci.openshift.org/ocp/4.14-2024-05-29-171450@sha256:9b88c6e3f7802b06e5de7cd3300aaf768e85d785d0847a70b35857e6d1000d51:
        unauthorized: authentication required'
      observedGeneration: 1
      reason: ValidationFailed
      status: "False"
      type: ValidMachineConfig

Version-Release number of selected component (if applicable):

    4.14, 4.15, 4.16, 4.17

How reproducible:

    100%

Steps to Reproduce:

    1. Try to deploy a hostedCluster in a disconnected environment without explicitly setting the hypershift.openshift.io/control-plane-operator-image annotation.
    2.
    3.

Expected results:

Without setting the hypershift.openshift.io/control-plane-operator-image annotation, the nodepool can become ready.

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/265

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33129. The following is the description of the original issue:

Description of problem:

Given that we create a new pool, and we enable OCB in this pool, and we remove the pool and the MachineOSConfig resource, and we create another new pool to enable OCB again, then the controller pod panics.
    

Version-Release number of selected component (if applicable):

pre-merge https://github.com/openshift/machine-config-operator/pull/4327
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Create a new infra MCP

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""


    2. Create a MachineOSConfig for infra pool

oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: MachineOSConfig
metadata:
  name: infra
spec:
  machineConfigPool:
    name: infra
  buildInputs:
    imageBuilder:
      imageBuilderType: PodImageBuilder
    baseImagePullSecret:
      name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
    renderedImagePushSecret:
      name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
    renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
EOF


    3. When the build is finished, remove the MachineOSConfig and the pool

oc delete machineosconfig infra
oc delete mcp infra

    4. Create a new infra1 pool
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra1
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra1]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra1: ""

    5. Create a new machineosconfig for infra1 pool

oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: MachineOSConfig
metadata:
  name: infra1
spec:
  machineConfigPool:
    name: infra1
  buildInputs:
    imageBuilder:
      imageBuilderType: PodImageBuilder
    baseImagePullSecret:
      name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
    renderedImagePushSecret:
      name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
    renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
    containerFile:
    - containerfileArch: noarch
      content: |-
        RUN echo 'test image' > /etc/test-image.file
EOF



    

Actual results:

The MCO controller pod panics (in updateMachineOSBuild):

E0430 11:21:03.779078       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 265 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3547bc0?, 0x53ebb20})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00035e000?})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x3547bc0?, 0x53ebb20?})
	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/api/machineconfiguration/v1.(*MachineConfigPool).GetNamespace(0x53f6200?)
	<autogenerated>:1 +0x9
k8s.io/client-go/tools/cache.MetaObjectToName({0x3e2a8f8, 0x0})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:131 +0x25
k8s.io/client-go/tools/cache.ObjectToName({0x3902740?, 0x0?})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:126 +0x74
k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc({0x3902740?, 0x0?})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:112 +0x3e
k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc({0x3902740?, 0x0?})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:336 +0x3b
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc0007097a0, 0x0, 0x0?)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:761 +0x33
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(...)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:772
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).updateMachineOSBuild(0xc0007097a0, {0xc001c37800?, 0xc000029678?}, {0x3904000?, 0xc0028361a0})
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:395 +0xd1
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:246
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:970 +0xea
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0005e5738?, {0x3de6020, 0xc0008fe780}, 0x1, 0xc0000ac720)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x6974616761706f72?, 0x3b9aca00, 0x0, 0x69?, 0xc0005e5788?)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000b97c20)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:966 +0x69
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x4f
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 248
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x210a6e9]



When the controller pod is restarted, it panics again, but in a different function (addMachineOSBuild):

E0430 11:26:54.753689       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 97 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3547bc0?, 0x53ebb20})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x15555555aa?})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x3547bc0?, 0x53ebb20?})
	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/api/machineconfiguration/v1.(*MachineConfigPool).GetNamespace(0x53f6200?)
	<autogenerated>:1 +0x9
k8s.io/client-go/tools/cache.MetaObjectToName({0x3e2a8f8, 0x0})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:131 +0x25
k8s.io/client-go/tools/cache.ObjectToName({0x3902740?, 0x0?})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:126 +0x74
k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc({0x3902740?, 0x0?})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:112 +0x3e
k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc({0x3902740?, 0x0?})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:336 +0x3b
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc000899560, 0x0, 0x0?)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:761 +0x33
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(...)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:772
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).addMachineOSBuild(0xc000899560, {0x3904000?, 0xc0006a8b60})
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:386 +0xc5
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:239
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:972 +0x13e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00066bf38?, {0x3de6020, 0xc0008f8b40}, 0x1, 0xc000c2ea20)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc00066bf88?)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000ba6240)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:966 +0x69
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x4f
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 43
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x210a6e9]





    

Expected results:

No panic should happen. Errors should be controlled.

    

Additional info:

    In order to recover from this panic, we need to  manually delete the MachineOSBuild resources that are related to the pool that does not exist anymore.
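A sketch of that manual cleanup (the build name is a placeholder; verify with the get before deleting):

oc get machineosbuild
oc delete machineosbuild <build-belonging-to-the-deleted-pool>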

Description of problem:

The status controller of CCO reconciles 500+ times/h on average on a resting 6-node mint-mode OCP cluster on AWS. 

Steps to Reproduce:

1. Install a 6-node mint-mode OCP cluster on AWS
2. Do nothing with it and wait for a couple of hours
3. Plot the following metric in the metrics dashboard of OCP console:
rate(controller_runtime_reconcile_total{controller="status"}[1h]) * 3600     

Actual results:

500+ reconciles/h on a resting cluster

Expected results:

12-50 reconciles/h on a resting cluster
Note: the reconcile() function always requeues after 5min so the theoretical minimum is 12 reconciles/h

Component Readiness has found a potential regression in [Jira:"Networking / router"] monitor test service-type-load-balancer-availability cleanup.

Probability of significant regression: 100.00%

Sample (being evaluated) Release: 4.16
Start Time: 2024-04-02T00:00:00Z
End Time: 2024-04-08T23:59:59Z
Success Rate: 94.67%
Successes: 213
Failures: 12
Flakes: 0

Base (historical) Release: 4.15
Start Time: 2024-02-01T00:00:00Z
End Time: 2024-02-28T23:59:59Z
Success Rate: 100.00%
Successes: 751
Failures: 0
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=Networking%20%2F%20router&confidence=95&environment=sdn%20upgrade-minor%20amd64%20azure%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=sdn&network=sdn&pity=5&platform=azure&platform=azure&sampleEndTime=2024-04-08%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-04-02%2000%3A00%3A00&testId=openshift-tests-upgrade%3A9bc4661b05ba13ed49d4c91f63899776&testName=%5BJira%3A%22Networking%20%2F%20router%22%5D%20monitor%20test%20service-type-load-balancer-availability%20cleanup&upgrade=upgrade-minor&upgrade=upgrade-minor&variant=standard&variant=standard

The failure message that we're after here is

{  failed during cleanup
Get "https://api.ci-op-tgk1b3if-9d969.ci2.azure.devcluster.openshift.com:6443/api/v1/namespaces/e2e-service-lb-test-xqptd": http2: client connection lost}

Looking at the sample runs, the failure is in the monitortest e2e junit XML files, and it appears this one always happens after upgrade, but before conformance. Unfortunately that means we may not have reliable intervals during the time this occurs. It also means there's no excuse for a lost connection to the apiserver.

Example: this junit xml from this job run

 

The problem actually dates back to March 3, see attachment for the full list of job runs affected. Almost entirely Azure, entirely 4.16 (never happened prior as far as we can see back).

It occurs in a poll loop checking if a namespace exists after being deleted. Failure rate seems to be around 5% of the time on this specific job.

This is a clone of issue OCPBUGS-35252. The following is the description of the original issue:

Clone of original bug to ensure the change is made in HyperShift

 

Description of problem:

We shouldn't enforce PSa in 4.16, neither by label sync, neither by global cluster config.

Version-Release number of selected component (if applicable):

4.16

How reproducible:

100%

Steps to Reproduce:

As a cluster admin:
1. create two new namespaces/projects: pokus, openshift-pokus
2. as a cluster-admin, attempt to create a privileged pod in both the namespaces from 1.
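A minimal sketch of step 2 (pod name and image are illustrative; repeat with -n openshift-pokus):

oc create -f - -n pokus << EOF
apiVersion: v1
kind: Pod
metadata:
  name: privileged-test
spec:
  containers:
  - name: privileged-test
    image: registry.access.redhat.com/ubi9/ubi
    command: ["sleep", "infinity"]
    securityContext:
      privileged: true
EOF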

Actual results:

pod creation is blocked by pod security admission

Expected results:

only a warning about pod violating the namespace pod security level should be emitted

Additional info:

 

Description of problem:

    Rebase CAPO upstream for OCP 4.16

Version-Release number of selected component (if applicable):

    4.16

This is a clone of issue OCPBUGS-38842. The following is the description of the original issue:

Component Readiness has found a potential regression in the following test:

[sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers for ns/openshift-image-registry

Probability of significant regression: 98.02%

Sample (being evaluated) Release: 4.17
Start Time: 2024-08-15T00:00:00Z
End Time: 2024-08-22T23:59:59Z
Success Rate: 94.74%
Successes: 180
Failures: 10
Flakes: 0

Base (historical) Release: 4.16
Start Time: 2024-05-31T00:00:00Z
End Time: 2024-06-27T23:59:59Z
Success Rate: 100.00%
Successes: 89
Failures: 0
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?Architecture=amd64&Architecture=amd64&FeatureSet=default&FeatureSet=default&Installer=ipi&Installer=ipi&Network=ovn&Network=ovn&NetworkAccess=default&Platform=aws&Platform=aws&Scheduler=default&SecurityMode=default&Suite=unknown&Suite=unknown&Topology=ha&Topology=ha&Upgrade=micro&Upgrade=micro&baseEndTime=2024-06-27%2023%3A59%3A59&baseRelease=4.16&baseStartTime=2024-05-31%2000%3A00%3A00&capability=Other&columnGroupBy=Platform%2CArchitecture%2CNetwork&component=Image%20Registry&confidence=95&dbGroupBy=Platform%2CArchitecture%2CNetwork%2CTopology%2CFeatureSet%2CUpgrade%2CSuite%2CInstaller&environment=amd64%20default%20ipi%20ovn%20aws%20unknown%20ha%20micro&ignoreDisruption=true&ignoreMissing=false&includeVariant=Architecture%3Aamd64&includeVariant=FeatureSet%3Adefault&includeVariant=Installer%3Aipi&includeVariant=Installer%3Aupi&includeVariant=Owner%3Aeng&includeVariant=Platform%3Aaws&includeVariant=Platform%3Aazure&includeVariant=Platform%3Agcp&includeVariant=Platform%3Ametal&includeVariant=Platform%3Avsphere&includeVariant=Topology%3Aha&minFail=3&pity=5&sampleEndTime=2024-08-22%2023%3A59%3A59&sampleRelease=4.17&sampleStartTime=2024-08-15%2000%3A00%3A00&testId=openshift-tests-upgrade%3A10a9e2be27aa9ae799fde61bf8c992f6&testName=%5Bsig-cluster-lifecycle%5D%20pathological%20event%20should%20not%20see%20excessive%20Back-off%20restarting%20failed%20containers%20for%20ns%2Fopenshift-image-registry

Also hitting 4.17, I've aligned this bug to 4.18 so the backport process is cleaner.

The problem appears to be a permissions error preventing the pods from starting:

2024-08-22T06:14:14.743856620Z ln: failed to create symbolic link '/etc/pki/ca-trust/extracted/pem/directory-hash/ca-certificates.crt': Permission denied

Originating from this code: https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/resource/podtemplatespec.go#L489

Both 4.17 and 4.18 nightlies bumped rhcos and in there is an upgrade like this:

container-selinux-3-2.231.0-1.rhaos4.16.el9-noarch -> container-selinux-3-2.231.0-2.rhaos4.17.el9-noarch

With slightly different versions in each stream, but both were on 3-2.231.
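One way to check whether the symlink failure corresponds to an SELinux denial on the node (a sketch; the node name is a placeholder):

oc debug node/<node-name> -- chroot /host sh -c 'grep -i "avc.*denied" /var/log/audit/audit.log | tail -n 20'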

Hits other tests too:

operator conditions image-registry
Operator upgrade image-registry
[sig-cluster-lifecycle] Cluster completes upgrade
[sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]
[sig-arch][Feature:ClusterUpgrade] Cluster should be upgradeable after finishing upgrade [Late][Suite:upgrade]

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/130

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    Support of apiVersion v1alpha1 has been removed. So, it is better to upgrade the apiVersion to v1beta1.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    Hypershift Operator is scheduing control plane on Deleting nodes

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

https://web-rca.devshift.net/incident/ITN-2024-00068
    1. HO was trying to create an HCP on a node being deleted
    2. HO couldn't find the paired node because it was already deleted

Forcing the removal of the pending node (blocked by the PDB) solves the issue.

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    "Oh no! Something went wrong" in Topology -> Observe tab

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2023-12-14-115151

How reproducible:

    Always

Steps to Reproduce:

    1. Navigate to Topology -> click one deployment and go to the Observe tab
    2.
    3.
    

Actual results:

    The page crashed
ErrorDescription: Component trace:
    at te (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-plugins-shared~main-chunk-b3bd2b20c770a4e73b50.min.js:31:9773)
    at j (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-plugins-shared~main-chunk-b3bd2b20c770a4e73b50.min.js:12:3324)
    at div
    at s (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-c9c3c11a060d045a85da.min.js:60:70124)
    at div
    at g (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-c9c3c11a060d045a85da.min.js:6:11163)
    at div
    at d (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-c9c3c11a060d045a85da.min.js:1:174472)
    at t.a (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/dev-console/code-refs/topology-chunk-769d28af48dd4b29136f.min.js:1:487478)
    at t.a (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/dev-console/code-refs/topology-chunk-769d28af48dd4b29136f.min.js:1:486390)
    at div
    at l (https://console-openshift-console.apps.qe-uidaily-1215.qe.devcluster.openshift.com/static/vendor-patternfly-5~main-chunk-c9c3c11a060d045a85da.min.js:60:106304)
    at div
Expected results:

    The page should not crash

Additional info:

    

The last 4 IPv6 jobs are failing on the same error

https://prow.ci.openshift.org/job-history/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-ovn-ipv6
master-bmh-update.log loses access to the API when trying to get/update the BMH details

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-ovn-ipv6/1785492737169035264

May 01 03:32:23 localhost.localdomain master-bmh-update.sh[4663]: Waiting for 3 masters to become provisioned
May 01 03:32:23 localhost.localdomain master-bmh-update.sh[24484]: E0501 03:32:23.531242   24484 memcache.go:265] couldn't get current server API group list: Get "https://api-int.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp [fd2e:6f44:5dd8:c956::5]:6443: connect: connection refused
May 01 03:32:23 localhost.localdomain master-bmh-update.sh[24484]: E0501 03:32:23.531808   24484 memcache.go:265] couldn't get current server API group list: Get "https://api-int.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp [fd2e:6f44:5dd8:c956::5]:6443: connect: connection refused
May 01 03:32:23 localhost.localdomain master-bmh-update.sh[24484]: E0501 03:32:23.533281   24484 memcache.go:265] couldn't get current server API group list: Get "https://api-int.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp [fd2e:6f44:5dd8:c956::5]:6443: connect: connection refused
May 01 03:32:23 localhost.localdomain master-bmh-update.sh[24484]: E0501 03:32:23.533630   24484 memcache.go:265] couldn't get current server API group list: Get "https://api-int.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp [fd2e:6f44:5dd8:c956::5]:6443: connect: connection refused
May 01 03:32:23 localhost.localdomain master-bmh-update.sh[24484]: E0501 03:32:23.535180   24484 memcache.go:265] couldn't get current server API group list: Get "https://api-int.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp [fd2e:6f44:5dd8:c956::5]:6443: connect: connection refused
May 01 03:32:23 localhost.localdomain master-bmh-update.sh[24484]: The connection to the server api-int.ostest.test.metalkube.org:6443 was refused - did you specify the right host or port?

Please review the following PR: https://github.com/openshift/kubernetes-autoscaler/pull/279

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The customer uses the Azure File CSI driver, and without this they cannot make use of the Azure Workload Identity work, which was one of the banner features of OCP 4.14. This feature is currently available in 4.16; however, it will take the customer 3-6 months to validate 4.16 and start its rollout, putting their plans to complete a large migration to Azure by the end of 2024 at risk.
Could you please backport either the 1.29.3 feature for Azure Workload Identity or rebase our Azure File CSI driver in 4.14 and 4.15 to at least 1.29.3, which includes the desired feature.
    

Version-Release number of selected component (if applicable):

azure-file-csi-driver in 4.14 and 4.15
- In 4.14, azure-file-csi-driver is version 1.28.1
- In 4.15, azure-file-csi-driver is version 1.29.2
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Install ocp 4.14 with Azure Workload Managed Identity
    2. Try to configure Managed Workload Identity with the Azure File CSI driver

https://github.com/kubernetes-sigs/azurefile-csi-driver/blob/master/docs/workload-identity-static-pv-mount.md
    

Actual results:

It is not usable
    

Expected results:

Azure Workload Identity should be manageable with the Azure File CSI driver as part of the whole feature
    

Additional info:

    

This is a clone of issue OCPBUGS-37832. The following is the description of the original issue:

CCMs attempt direct connections when the mgmt cluster on which the HCP runs is proxied and does not allow direct outbound connections.

Example from the AWS CCM

 I0731 21:46:33.948466       1 event.go:389] "Event occurred" object="openshift-ingress/router-default" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: error listing AWS instances: \"WebIdentityErr: failed to retrieve credentials\\ncaused by: RequestError: send request failed\\ncaused by: Post \\\"https://sts.us-east-1.amazonaws.com/\\\": dial tcp 72.21.206.96:443: i/o timeout\""
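One way to check whether the CCM pods carry any proxy-related environment variables at all (a sketch; the hosted control plane namespace and the exact deployment name vary per cluster):

oc -n <hcp-namespace> get deployments | grep -i cloud-controller
oc -n <hcp-namespace> get deployment <ccm-deployment> \
  -o jsonpath='{range .spec.template.spec.containers[*].env[*]}{.name}{"\n"}{end}' | grep -i proxy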

Please review the following PR: https://github.com/openshift/console/pull/13434

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/multus-cni/pull/212

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of the problem:
When trying to install a 4.15 cluster with the LVMS, CNV, and MCE operators,
on the operators page I am unable to continue with the installation,
since host discovery indicates that the hosts require more resources (as if ODF had also been selected).

How reproducible:
 

Steps to reproduce:

1. Create a multi-node 4.15 cluster

2. Make sure to have enough resources for the LVMS, CNV, and MCE operators

3. Select the CNV, LVMS, and MCE operators

Actual results:

 On the operators page it shows that CPU and RAM resources related to ODF (which is not selected) are also required, and the user is unable to start the installation

Expected results:

The user should be able to start the installation

Description of problem:

In OCP 4.14 the catalog pods in openshift-marketplace where defined as:

$ oc get pods -n openshift-marketplace redhat-operators-4bnz4 -o yaml
apiVersion: v1
kind: Pod
metadata:
...
  labels:
    olm.catalogSource: redhat-operators
    olm.pod-spec-hash: 658b699dc
  name: redhat-operators-4bnz4
  namespace: openshift-marketplace
...
spec:
  containers:
  - image: registry.redhat.io/redhat/redhat-operator-index:v4.14
    imagePullPolicy: Always



Now on OCP 4.15 they are defined as:
apiVersion: v1
kind: Pod
metadata:
...
  name: redhat-operators-44wxs
  namespace: openshift-marketplace
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: CatalogSource
    name: redhat-operators
    uid: 3b41ac7b-7ad1-4d58-a62f-4a9e667ae356
  resourceVersion: "877589"
  uid: 65ad927c-3764-4412-8d34-82fd856a4cbc
spec:
  containers:
  - args:
    - serve
    - /extracted-catalog/catalog
    - --cache-dir=/extracted-catalog/cache
    command:
    - /bin/opm
...
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7259b65d8ae04c89cf8c4211e4d9ddc054bb8aebc7f26fac6699b314dc40dbe3
    imagePullPolicy: Always
...
  initContainers:
...
  - args:
    - --catalog.from=/configs
    - --catalog.to=/extracted-catalog/catalog
    - --cache.from=/tmp/cache
    - --cache.to=/extracted-catalog/cache
    command:
    - /utilities/copy-content
    image: registry.redhat.io/redhat/redhat-operator-index:v4.15
    imagePullPolicy: IfNotPresent
...



And due to `imagePullPolicy: IfNotPresent` on the initContainer used to extract the content of the index image (referenced by tag), the catalog content is never really updated. 
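A quick way to confirm the pull policy on the extract initContainer (a sketch; assumes the catalog pods still carry the olm.catalogSource label shown above):

oc get pods -n openshift-marketplace -l olm.catalogSource=redhat-operators \
  -o jsonpath='{range .items[*].spec.initContainers[*]}{.image}{" "}{.imagePullPolicy}{"\n"}{end}'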


    

Version-Release number of selected component (if applicable):

    OCP 4.15.0    

How reproducible:

    100%    

Steps to Reproduce:

    1. wait for the next version of a released operator on OCP 4.15
    2.
    3.
    

Actual results:

    Operator catalogs are never really refreshed due to  imagePullPolicy: IfNotPresent for the index image

Expected results:

    Operator catalogs are periodically (every 10 minutes by default) refreshed

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/178

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/configmap-reload/pull/58

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-34397. The following is the description of the original issue:

Description of problem:

Upstream machine-config-operator has renamed their CRDs https://github.com/openshift/machine-config-operator/tree/master/install.  HyperShift must make similar changes now in 4.16.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

This was discovered during Contrail testing when a large number of additional manifests specific to contrail were added to the openshift/ dir. The additional manifests are here - https://github.com/Juniper/contrail-networking/tree/main/releases/23.1/ocp.

When creating the agent image the following error occurred:
failed to fetch Agent Installer ISO: failed to generate asset \"Agent Installer ISO\": failed to create overwrite reader for ignition: content length (802204) exceeds embed area size (262144)"]
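For a rough sense of how far over the limit the extra manifests are, their combined size can be compared against the 262144-byte embed area (a sketch; the ignition also contains other content, so this is only a lower bound):

du -cb openshift/* | tail -n 1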

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-openshift-apiserver-operator/pull/561

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

iam:TagInstanceProfile is not listed in official document [1], IPI install would fail if iam:TagInstanceProfile permission is missing

level=error msg=Error: creating IAM Instance Profile (ci-op-4hw2rz1v-49c30-zt9vx-worker-profile): AccessDenied: User: arn:aws:iam::301721915996:user/ci-op-4hw2rz1v-49c30-minimal-perm is not authorized to perform: iam:TagInstanceProfile on resource: arn:aws:iam::301721915996:instance-profile/ci-op-4hw2rz1v-49c30-zt9vx-worker-profile because no identity-based policy allows the iam:TagInstanceProfile action
level=error msg=    status code: 403, request id: bb0641f5-d01c-4538-b333-261a804ddb59

[1] https://docs.openshift.com/container-platform/4.14/installing/installing_aws/installing-aws-account.html#installation-aws-permissions_installing-aws-account
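A sketch of granting the missing permission to the installer user with the AWS CLI (user and policy names are placeholders; the Resource should be scoped more tightly in practice):

aws iam put-user-policy --user-name <installer-user> --policy-name allow-tag-instance-profile \
  --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"iam:TagInstanceProfile","Resource":"*"}]}'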

    

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-14-115151
    

How reproducible:

Always
    

Steps to Reproduce:

    1. install a common IPI cluster with minimal permission provided in official document
    2.
    3.
    

Actual results:

Install failed.
    

Expected results:


    

Additional info:

install does a precheck for iam:TagInstanceProfile
    

Please review the following PR: https://github.com/openshift/operator-framework-catalogd/pull/36

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/128

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

There are multiple dashboards, so this page title should be "Dashboards" rather than "Dashboard". The admin console version of this page is already titled "Dashboards".    

Version-Release number of selected component (if applicable):

4.16

Steps to Reproduce:

    1. Open "Developer" View > Observe

Actual results:

    See the tab title is "Dashboard"

Expected results:

    The tab title should be "Dashboards"

Description of problem:

When triggering a build from a webhook (HTTP POST request), it fails with 403 - FORBIDDEN if the request does not have an OpenShift authorization token.
    

Version-Release number of selected component (if applicable):

4.16
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Create a BuildConfig with a webhook trigger and configured secret
    2. Make appropriate cURL call to trigger the build via webhook
    

Actual results:

Webhook call refused with 403 Forbidden:      "message": "buildconfigs.build.openshift.io \"sample-build\" is forbidden: User \"system:anonymous\" cannot create resource \"buildconfigs/webhooks\" in API group \"build.openshift.io\" in the namespace \"e2e-test-cli-start-build-dxxkx\"",
    

Expected results:

Builds can be triggered via webhook
    

Additional info:

https://docs.openshift.com/container-platform/4.15/cicd/builds/triggering-builds-build-hooks.html#builds-webhook-triggers_triggering-builds-build-hooks
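For reference, a generic webhook trigger call typically looks like this (a sketch; the API server host, namespace, and webhook secret are placeholders):

curl -k -X POST \
  https://<api-server>:6443/apis/build.openshift.io/v1/namespaces/<namespace>/buildconfigs/sample-build/webhooks/<secret>/generic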
    

This is a clone of issue OCPBUGS-33493. The following is the description of the original issue:

The provisioning CR is now created with a paused annotation (since https://github.com/openshift/installer/pull/8346)

On baremetal IPI installs, this annotation is removed at the conclusion of bootstrapping.

On assisted/ABI installs there is nothing to remove it, so cluster-baremetal-operator never deploys anything.
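A sketch for confirming the symptom (the Provisioning CR name is the default one; on an affected cluster the second command should show no metal3 pods being deployed):

oc get provisioning provisioning-configuration -o jsonpath='{.metadata.annotations}{"\n"}'
oc get pods -n openshift-machine-api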

Description of problem:

    A change to how Power VS Workspaces are queried is not compatible with the version of terraform-provider-ibm

Version-Release number of selected component (if applicable):

    

How reproducible:

    Easily

Steps to Reproduce:

    1. Try to deploy with Power VS
    2. Fail with an error stating that [ERROR] Error retrieving service offering: ServiceDoesnotExist: Given service : "power-iaas" doesn't exist
    

Actual results:

    Fail with [ERROR] Error retrieving service offering: ServiceDoesnotExist: Given service : "power-iaas" doesn't exist

Expected results:

    Install should succeed.

Additional info:

    

 

Noticed in k8s 1.30 PR, here's the run where it happened:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_kubernetes/1953/pull-ci-openshift-kubernetes-master-e2e-aws-ovn-fips/1788800196772106240

E0510 05:58:26.315444       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 992 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x26915e0?, 0x471dff0})
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x26915e0?, 0x471dff0?})
	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/console-operator/pkg/console/controllers/healthcheck.(*HealthCheckController).CheckRouteHealth.func2()
	/go/src/github.com/openshift/console-operator/pkg/console/controllers/healthcheck/controller.go:156 +0x62
k8s.io/client-go/util/retry.OnError.func1()
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/client-go/util/retry/util.go:51 +0x30
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x2fdcde8?)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:145 +0x3e
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0}, 0x2fdcde8?)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:461 +0x5a
k8s.io/client-go/util/retry.OnError({0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0}, 0x2667a00?, 0xc001b185d0?)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/client-go/util/retry/util.go:50 +0xa5
github.com/openshift/console-operator/pkg/console/controllers/healthcheck.(*HealthCheckController).CheckRouteHealth(0xc001b097e8?, {0x2fdce90?, 0xc00057c870?}, 0x16?, 0x2faf140?)
	/go/src/github.com/openshift/console-operator/pkg/console/controllers/healthcheck/controller.go:152 +0x9a
github.com/openshift/console-operator/pkg/console/controllers/healthcheck.(*HealthCheckController).Sync(0xc000748ae0, {0x2fdce90, 0xc00057c870}, {0x7f84e80672b0?, 0x7f852f941108?})
	/go/src/github.com/openshift/console-operator/pkg/console/controllers/healthcheck/controller.go:143 +0x8eb
github.com/openshift/library-go/pkg/controller/factory.(*baseController).reconcile(0xc000b57950, {0x2fdce90, 0xc00057c870}, {0x2fd5350?, 0xc001b185a0?})
	/go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/factory/base_controller.go:201 +0x43
github.com/openshift/library-go/pkg/controller/factory.(*baseController).processNextWorkItem(0xc000b57950, {0x2fdce90, 0xc00057c870})
	/go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/factory/base_controller.go:260 +0x1b4
github.com/openshift/library-go/pkg/controller/factory.(*baseController).runWorker.func1({0x2fdce90, 0xc00057c870})
	/go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/factory/base_controller.go:192 +0x89
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:259 +0x22
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0014b7b60?, {0x2faf040, 0xc001b18570}, 0x1, 0xc0014b7b60)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00057c870?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x2fdce90, 0xc00057c870}, 0xc00139c770, 0x0?, 0x0?, 0x0?)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:259 +0x93
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(...)
	/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:170
github.com/openshift/library-go/pkg/controller/factory.(*baseController).runWorker(0x0?, {0x2fdce90?, 0xc00057c870?})
	/go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/factory/base_controller.go:183 +0x4d
github.com/openshift/library-go/pkg/controller/factory.(*baseController).Run.func2()
	/go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/factory/base_controller.go:117 +0x65
created by github.com/openshift/library-go/pkg/controller/factory.(*baseController).Run in goroutine 749
	/go/src/github.com/openshift/console-operator/vendor/github.com/openshift/library-go/pkg/controller/factory/base_controller.go:112 +0x2ba

Please review the following PR: https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/268

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

"Oh no! Something went wrong." will shown on Pending pod details page

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-04-14-063437

How reproducible:

always

Steps to Reproduce:

1. Create a dummy pod with pending status
eg:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
    
OR 

apiVersion: v1
kind: Pod
metadata:
  name: dummy-pod
spec:
  containers:
    - name: dummy-pod
      image: ubuntu
  restartPolicy: Always
  nodeSelector:
    testtype: pending


2. Navigate to Pod Details page
3.
    

Actual results:

Oh no! Something went wrong. will shown

TypeError
Description:Cannot read properties of undefined (reading 'restartCount')

Component trace:Copy to clipboardat fe (https://console-openshift-console.apps.qe-daily-416-0415.qe.azure.devcluster.openshift.com/static/main-chunk-7643d3f1edb399bb7d65.min.js:1:562500)
    at div
    at div
    at ve (https://console-openshift-console.apps.qe-daily-416-0415.qe.azure.devcluster.openshift.com/static/main-chunk-7643d3f1edb399bb7d65.min.js:1:563346)
    at div
    at ke (https://console-openshift-console.apps.qe-daily-416-0415.qe.azure.devcluster.openshift.com/static/main-chunk-7643d3f1edb399bb7d65.min.js:1:571308)
    at i (https://console-openshift-console.apps.qe-daily-416-0415.qe.azure.devcluster.openshift.com/static/main-chunk-7643d3f1edb399bb7d65.min.js:1:329180)
    at _ (https://console-openshift-console.apps.qe-daily-416-0415.qe.azure.devcluster.openshift.com/static/vendor-plugins-shared~main-chunk-4dc722526d0f0470939e.min.js:31:4920)
    at ne (https://console-openshift-console.apps.qe-daily-416-0415.qe.azure.devcluster.openshift.com/static/vendor-plugins-shared~main-chunk-4dc722526d0f0470939e.min.js:31:10364)
    at Suspense
    at div
    at k (https://console-openshift-console.apps.qe-daily-416-0415.qe.azure.devcluster.openshift.com/static/main-chunk-7643d3f1edb399bb7d65.min.js:1:118938)

Expected results:

    no issue was found

Additional info:

Enable pod security labels when creating the pod via the UI:
$ oc label namespace <ns> security.openshift.io/scc.podSecurityLabelSync=false --overwrite
$ oc label namespace <ns> pod-security.kubernetes.io/enforce=privileged --overwrite
$ oc label namespace <ns> pod-security.kubernetes.io/audit=privileged --overwrite
$ oc label namespace <ns> fix

The convention is a format like node-role.kubernetes.io/role: "", not node-role.kubernetes.io: role, however ROSA uses the latter format to indicate the infra role. This changes the node watch code to ignore it, as well as other potential variations like node-role.kubernetes.io/.

The current code panics when run against a ROSA cluster:
E0209 18:10:55.533265 78 runtime.go:79] Observed a panic: runtime.boundsError{x:24, y:23, signed:true, code:0x3} (runtime error: slice bounds out of range [24:23])
goroutine 233 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x7a71840?, 0xc0018e2f48})
k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x1000251f9fe?})
k8s.io/apimachinery@v0.27.2/pkg/util/runtime/runtime.go:49 +0x75
panic({0x7a71840, 0xc0018e2f48})
runtime/panic.go:884 +0x213
github.com/openshift/origin/pkg/monitortests/node/watchnodes.nodeRoles(0x7ecd7b3?)
github.com/openshift/origin/pkg/monitortests/node/watchnodes/node.go:187 +0x1e5
github.com/openshift/origin/pkg/monitortests/node/watchnodes.startNodeMonitoring.func1(0
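To see which node-role label formats a cluster actually uses (a sketch; requires jq):

oc get nodes -o json | jq -r '.items[].metadata.labels | keys[] | select(startswith("node-role.kubernetes.io"))' | sort -u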

Description of problem:

When a proxy.config.openshift.io is specified on a HyperShift cluster (in this case ROSA HCP), the network cluster operator is degraded:

❯ k get co network                                                                                                 
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
network   4.14.6    True        False         True       2d1h    The configuration is invalid for proxy 'cluster' (readinessEndpoint probe failed for endpoint 'https://api.openshift.com': endpoint probe failed for endpoint 'https://api.openshift.com' using proxy 'http://ip-172-17-1-38.ec2.internal:3128': Get "https://api.openshift.com": Service Unavailable). Use 'oc edit proxy.config.openshift.io cluster' to fix.

(Related PR: https://github.com/openshift/ovn-kubernetes/pull/2135)

because the CNO pod runs on the management cluster and does not have connectivity to the customer's proxy which is accessible from the HyperShift worker nodes' network.

Version-Release number of selected component (if applicable):

4.14.6

How reproducible:

100%

Steps to Reproduce:

1. Create a proxy that's only accessible from a HyperShift cluster's workers network
2. Update the cluster's proxy.config.openshift.io cluster object accordingly
3. Observe that the network ClusterOperator is degraded
    

Actual results:

I'm not sure how important it is that the CNO has connectivity to api.openshift.com; I leave that up for discussion. Maybe the CNO should ignore the proxy configuration in HyperShift for its own health checks, for example.

Expected results:

The network ClusterOperator is not degraded

Additional info:

    

Description of problem:

On a hybrid cluster with Windows nodes and coreOS nodes mixed, egressIP cannot be applied to coreOS anymore. 
QE testing profile: 53_IPI on AWS & OVN & WindowsContainer 

Version-Release number of selected component (if applicable):

4.14.3

How reproducible:

Always

Steps to Reproduce:

1.  Setup cluster with template aos-4_14/ipi-on-aws/versioned-installer-ovn-winc-ci
2.  Label on coreOS node as egress node 
% oc describe node ip-10-0-59-132.us-east-2.compute.internal
Name:               ip-10-0-59-132.us-east-2.compute.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m6i.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-2
                    failure-domain.beta.kubernetes.io/zone=us-east-2b
                    k8s.ovn.org/egress-assignable=
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-59-132.us-east-2.compute.internal
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m6i.xlarge
                    node.openshift.io/os_id=rhcos
                    topology.ebs.csi.aws.com/zone=us-east-2b
                    topology.kubernetes.io/region=us-east-2
                    topology.kubernetes.io/zone=us-east-2b
Annotations:        cloud.network.openshift.io/egress-ipconfig:
                      [{"interface":"eni-0c661bbdbb0dde54a","ifaddr":{"ipv4":"10.0.32.0/19"},"capacity":{"ipv4":14,"ipv6":15}}]
                    csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0629862832fff4ae3"}
                    k8s.ovn.org/host-cidrs: ["10.0.59.132/19"]
                    k8s.ovn.org/hybrid-overlay-distributed-router-gateway-ip: 10.129.2.13
                    k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac: 0a:58:0a:81:02:0d
                    k8s.ovn.org/l3-gateway-config:
                      {"default":{"mode":"shared","interface-id":"br-ex_ip-10-0-59-132.us-east-2.compute.internal","mac-address":"06:06:e2:7b:9c:45","ip-address...
                    k8s.ovn.org/network-ids: {"default":"0"}
                    k8s.ovn.org/node-chassis-id: fa1ac464-5744-40e9-96ca-6cdc74ffa9be
                    k8s.ovn.org/node-gateway-router-lrp-ifaddr: {"ipv4":"100.64.0.7/16"}
                    k8s.ovn.org/node-id: 7
                    k8s.ovn.org/node-mgmt-port-mac-address: a6:25:4e:55:55:36
                    k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.0.59.132/19"}
                    k8s.ovn.org/node-subnets: {"default":["10.129.2.0/23"]}
                    k8s.ovn.org/node-transit-switch-port-ifaddr: {"ipv4":"100.88.0.7/16"}
                    k8s.ovn.org/remote-zone-migrated: ip-10-0-59-132.us-east-2.compute.internal
                    k8s.ovn.org/zone-name: ip-10-0-59-132.us-east-2.compute.internal
                    machine.openshift.io/machine: openshift-machine-api/wduan-debug-1120-vtxkp-worker-us-east-2b-z6wlc
                    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-5a29871efb344f7e3a3dc51c42c21113
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-5a29871efb344f7e3a3dc51c42c21113
                    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-5a29871efb344f7e3a3dc51c42c21113
                    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-5a29871efb344f7e3a3dc51c42c21113
                    machineconfiguration.openshift.io/lastSyncedControllerConfigResourceVersion: 22806
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 20 Nov 2023 09:46:53 +0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-59-132.us-east-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Mon, 20 Nov 2023 14:01:05 +0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Mon, 20 Nov 2023 13:57:33 +0800   Mon, 20 Nov 2023 09:46:53 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Mon, 20 Nov 2023 13:57:33 +0800   Mon, 20 Nov 2023 09:46:53 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Mon, 20 Nov 2023 13:57:33 +0800   Mon, 20 Nov 2023 09:46:53 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Mon, 20 Nov 2023 13:57:33 +0800   Mon, 20 Nov 2023 09:47:34 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.59.132
  InternalDNS:  ip-10-0-59-132.us-east-2.compute.internal
  Hostname:     ip-10-0-59-132.us-east-2.compute.internal
Capacity:
  cpu:                4
  ephemeral-storage:  125238252Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16092956Ki
  pods:               250
Allocatable:
  cpu:                3500m
  ephemeral-storage:  114345831029
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             14941980Ki
  pods:               250
System Info:
  Machine ID:                             ec21151a2a80230ce1e1926b4f8a902c
  System UUID:                            ec21151a-2a80-230c-e1e1-926b4f8a902c
  Boot ID:                                cf4b2e39-05ad-4aea-8e53-be669b212c4f
  Kernel Version:                         5.14.0-284.41.1.el9_2.x86_64
  OS Image:                               Red Hat Enterprise Linux CoreOS 414.92.202311150705-0 (Plow)
  Operating System:                       linux
  Architecture:                           amd64
  Container Runtime Version:              cri-o://1.27.1-13.1.rhaos4.14.git956c5f7.el9
  Kubelet Version:                        v1.27.6+b49f9d1
  Kube-Proxy Version:                     v1.27.6+b49f9d1
ProviderID:                               aws:///us-east-2b/i-0629862832fff4ae3
Non-terminated Pods:                      (21 in total)
  Namespace                               Name                                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                               ----                                                      ------------  ----------  ---------------  -------------  ---
  openshift-cluster-csi-drivers           aws-ebs-csi-driver-node-tlw5h                             30m (0%)      0 (0%)      150Mi (1%)       0 (0%)         4h14m
  openshift-cluster-node-tuning-operator  tuned-4fvgv                                               10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         4h14m
  openshift-dns                           dns-default-z89zl                                         60m (1%)      0 (0%)      110Mi (0%)       0 (0%)         11m
  openshift-dns                           node-resolver-v9stn                                       5m (0%)       0 (0%)      21Mi (0%)        0 (0%)         4h14m
  openshift-image-registry                image-registry-67b88dc677-76hfn                           100m (2%)     0 (0%)      256Mi (1%)       0 (0%)         4h14m
  openshift-image-registry                node-ca-hw62n                                             10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         4h14m
  openshift-ingress-canary                ingress-canary-9r9f8                                      10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         4h13m
  openshift-ingress                       router-default-5957f4f4c6-tl9gs                           100m (2%)     0 (0%)      256Mi (1%)       0 (0%)         4h18m
  openshift-machine-config-operator       machine-config-daemon-h7fx4                               40m (1%)      0 (0%)      100Mi (0%)       0 (0%)         4h14m
  openshift-monitoring                    alertmanager-main-1                                       9m (0%)       0 (0%)      120Mi (0%)       0 (0%)         4h12m
  openshift-monitoring                    monitoring-plugin-68995cb674-w2wr9                        10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         4h13m
  openshift-monitoring                    node-exporter-kbq8z                                       9m (0%)       0 (0%)      47Mi (0%)        0 (0%)         4h13m
  openshift-monitoring                    prometheus-adapter-54fc7b9c87-sg4vt                       1m (0%)       0 (0%)      40Mi (0%)        0 (0%)         4h13m
  openshift-monitoring                    prometheus-k8s-1                                          75m (2%)      0 (0%)      1104Mi (7%)      0 (0%)         4h12m
  openshift-monitoring                    prometheus-operator-admission-webhook-84b7fffcdc-x8hsz    5m (0%)       0 (0%)      30Mi (0%)        0 (0%)         4h18m
  openshift-monitoring                    thanos-querier-59cbd86d58-cjkxt                           15m (0%)      0 (0%)      92Mi (0%)        0 (0%)         4h13m
  openshift-multus                        multus-7gjnt                                              10m (0%)      0 (0%)      65Mi (0%)        0 (0%)         4h14m
  openshift-multus                        multus-additional-cni-plugins-gn7x9                       10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         4h14m
  openshift-multus                        network-metrics-daemon-88tf6                              20m (0%)      0 (0%)      120Mi (0%)       0 (0%)         4h14m
  openshift-network-diagnostics           network-check-target-kpv5v                                10m (0%)      0 (0%)      15Mi (0%)        0 (0%)         4h14m
  openshift-ovn-kubernetes                ovnkube-node-74nl9                                        80m (2%)      0 (0%)      1630Mi (11%)     0 (0%)         3h51m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                619m (17%)    0 (0%)
  memory             4296Mi (29%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
Events:              <none>

 % oc get node -l k8s.ovn.org/egress-assignable=             
NAME                                        STATUS   ROLES    AGE     VERSION
ip-10-0-59-132.us-east-2.compute.internal   Ready    worker   4h14m   v1.27.6+b49f9d1
3.  Create egressIP object

Actual results:

% oc get egressip        
NAME         EGRESSIPS     ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-1   10.0.59.101        

% oc get cloudprivateipconfig
No resources found

Expected results:

The egressIP should be applied to egress node

Additional info:


Please review the following PR: https://github.com/openshift/egress-router-cni/pull/79

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-version-operator/pull/1004

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/ironic-agent-image/pull/97

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/operator-framework-rukpak/pull/68

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33453. The following is the description of the original issue:

Description of problem:

Can't access the openshift namespace images without auth after granting public access to the openshift namespace

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-05-05-102537 

How reproducible:

    always

Steps to Reproduce:

    1.   $ oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge
  $ HOST=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')
    2. $ oc adm policy add-role-to-group system:image-puller system:unauthenticated --namespace openshift
  Warning: Group 'system:unauthenticated' not found
clusterrole.rbac.authorization.k8s.io/system:image-puller added: "system:unauthenticated"

    3. Try to fetch image metadata:
    $ oc image info --insecure "${HOST}/openshift/cli:latest"

Actual results:

   $ oc image info default-route-openshift-image-registry.apps.wxj-a41659.qe.azure.devcluster.openshift.com/openshift/cli:latest  --insecure
error: unable to read image default-route-openshift-image-registry.apps.wxj-a41659.qe.azure.devcluster.openshift.com/openshift/cli:latest: unauthorized: authentication required

Expected results:

    Could get the public image info without auth

Additional info:

   This is a regression in 4.16; this feature works on 4.15 and below.

We added a carry patch to change the healthcheck behaviour in the Azure CCM: https://github.com/openshift/cloud-provider-azure/pull/72. Whilst we opened an upstream twin PR for it (https://github.com/kubernetes-sigs/cloud-provider-azure/pull/3887), that PR was closed in favour of a different approach: https://github.com/kubernetes-sigs/cloud-provider-azure/pull/4891 .

As such, in the next rebase we need to drop the commit introduced in PR 72 and instead pick up the change from PR 4891 through the rebase. While doing that we need to explicitly set the new probe behaviour, as the default is still the classic behaviour, which doesn't work with our cluster architecture setup on Azure.

For the steps on how to do this, we can follow this comment: https://github.com/openshift/cloud-provider-azure/pull/88#issuecomment-1803832076

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    manifests are duplicated with cluster-config-api image

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

The 4.16 OCP payload currently contains only oc.rhel9; oc.rhel8 cannot be found.
oc adm release extract --command='oc.rhel8'  registry.ci.openshift.org/ocp/release:4.16.0-0.nightly-2024-01-24-133352 --command-os=linux/amd64  -a tmp/config.json --to octest/
error: image did not contain usr/share/openshift/linux_amd64/oc.rhel8

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

always    

Steps to Reproduce:

    1.oc adm release extract --command='oc.rhel8'  registry.ci.openshift.org/ocp/release:4.16.0-0.nightly-2024-01-24-133352 --command-os=linux/amd64  -a tmp/config.json --to octest/
    2.
    3.
    

Actual results:

 failed to extract oc.rhel8   

Expected results:

 The OCP payload should contain oc.rhel8.

Additional info:

    

This is a clone of issue OCPBUGS-32348. The following is the description of the original issue:

After fixing https://issues.redhat.com/browse/OCPBUGS-29919 by merging https://github.com/openshift/baremetal-runtimecfg/pull/301, we have lost the ability to properly debug the Node IP selection logic used in runtimecfg.

In order to preserve the debuggability of this component, it should be possible to selectively enable verbose logs.

Description of problem:

Tested https://github.com/openshift/cluster-monitoring-operator/pull/2187 with the PR:

launch 4.15,openshift/cluster-monitoring-operator#2187 aws

The "scrape.timestamp-tolerance" setting is not found in the Prometheus CR, the prometheus pod, or the StatefulSet; the commands below return no results:

$ oc -n openshift-monitoring get prometheus k8s -oyaml | grep -i "scrape.timestamp-tolerance"
$ oc -n openshift-monitoring get pod prometheus-k8s-0 -oyaml | grep -i "scrape.timestamp-tolerance"
$ oc -n openshift-monitoring get sts  prometheus-k8s  -oyaml | grep -i "scrape.timestamp-tolerance" 

It is not in the Prometheus configuration file either:

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- cat /etc/prometheus/config_out/prometheus.env.yaml | head
global:
  evaluation_interval: 30s
  scrape_interval: 30s
  external_labels:
    prometheus: openshift-monitoring/k8s
    prometheus_replica: prometheus-k8s-0
rule_files:
- /etc/prometheus/rules/prometheus-k8s-rulefiles-0/*.yaml
scrape_configs:
- job_name: serviceMonitor/openshift-apiserver-operator/openshift-apiserver-operator/0

Please review the following PR: https://github.com/openshift/baremetal-runtimecfg/pull/300

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Snapshot support is being delivered for kubevirt-csi in 4.16, but the cli used to configure snapshot support did not expose the argument that makes using snapshots possible.


The cli arg [--infra-volumesnapshot-class-mapping] was added to the developer cli [hypershift] but never made it to the productized cli [hcp] that end users will use. 

Version-Release number of selected component (if applicable):

4.16

How reproducible:

100%

Steps to Reproduce:

1. hcp create cluster kubevirt -h | grep infra-volumesnapshot-class-mapping
2.
3.

Actual results:

no value is found

Expected results:

the infra-volumesnapshot-class-mapping cli arg should be found

Additional info:
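A hedged usage sketch for once the flag is exposed in hcp; the cluster name, sizes, and snapshot class names below are placeholders, and the <infra>/<guest> mapping value format is an assumption modeled on the existing --infra-storage-class-mapping flag:

hcp create cluster kubevirt \
  --name my-guest \
  --pull-secret pull-secret.json \
  --node-pool-replicas 2 \
  --memory 8Gi \
  --cores 2 \
  --infra-volumesnapshot-class-mapping infra-snapclass/guest-snapclass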

 

Please review the following PR: https://github.com/openshift/cluster-authentication-operator/pull/644

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:


The following binaries need to get extracted from the release payload for both rhel8 and rhel9:

oc
ccoctl
opm
openshift-install
oc-mirror

The images that contain these should produce artifacts of both kinds in some location, and probably make the artifact matching their architecture available under a normal location in the path. Example:

/usr/share/<binary>.rhel8
/usr/share/<binary>.rhel9
/usr/bin/<binary>

This ticket is about getting "oc adm release extract" to do the right thing in a backwards-compatible way: if both binaries are available, extract those; if not, fall back to the old location.
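A hedged sketch of the intended end state, reusing the --command extraction syntax already shown in the oc.rhel8 bug above (the release pullspec and pull secret path are placeholders):

RELEASE=registry.ci.openshift.org/ocp/release:4.16.0-0.nightly-2024-01-24-133352
# New payloads: both RHEL variants should be extractable explicitly
oc adm release extract --command='oc.rhel8' "$RELEASE" --command-os=linux/amd64 -a pull-secret.json --to out/
oc adm release extract --command='oc.rhel9' "$RELEASE" --command-os=linux/amd64 -a pull-secret.json --to out/
# The plain name keeps working for backwards compatibility, falling back to the
# old single-binary location on payloads that do not ship both variants
oc adm release extract --command='oc' "$RELEASE" --command-os=linux/amd64 -a pull-secret.json --to out/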


    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Description of problem:

`oc adm inspect` should not collect previous.log files that do not correspond with the --since/--since-time option

Version-Release number of selected component (if applicable):

  4.16  

How reproducible:

    always

Steps to Reproduce:

    1.  `oc adm inspect --since-time="2024-01-25T01:35:27Z"  ns/openshift-multus`     

Actual results:

    previous.log files that do not correspond with the specified time are also collected.

Expected results:

    Only the logs after --since/--since-time should be collected.

Additional info:
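A hedged way to verify the collected output against the cutoff (the destination directory layout is an assumption):

oc adm inspect --since-time="2024-01-25T01:35:27Z" ns/openshift-multus --dest-dir=inspect.local
# previous.log files whose contents predate the cutoff should not be collected:
find inspect.local -name 'previous.log' -exec ls -l {} +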

    

This is a clone of issue OCPBUGS-35099. The following is the description of the original issue:

This is a clone of issue OCPBUGS-4466. The following is the description of the original issue:

Description of problem:

deploying compact 3-nodes cluster on GCP, by setting mastersSchedulable as true and removing worker machineset YAMLs, got panic

Version-Release number of selected component (if applicable):

$ openshift-install version
openshift-install 4.13.0-0.nightly-2022-12-04-194803
built from commit cc689a21044a76020b82902056c55d2002e454bd
release image registry.ci.openshift.org/ocp/release@sha256:9e61cdf7bd13b758343a3ba762cdea301f9b687737d77ef912c6788cbd6a67ea
release architecture amd64

How reproducible:

Always

Steps to Reproduce:

1. create manifests
2. set 'spec.mastersSchedulable' as 'true', in <installation dir>/manifests/cluster-scheduler-02-config.yml
3. remove the worker machineset YAML file from <installation dir>/openshift directory
4. create cluster 

Actual results:

Got "panic: runtime error: index out of range [0] with length 0".

Expected results:

The installation should succeed, or giving clear error messages. 

Additional info:

$ openshift-install version
openshift-install 4.13.0-0.nightly-2022-12-04-194803
built from commit cc689a21044a76020b82902056c55d2002e454bd
release image registry.ci.openshift.org/ocp/release@sha256:9e61cdf7bd13b758343a3ba762cdea301f9b687737d77ef912c6788cbd6a67ea
release architecture amd64
$ 
$ openshift-install create manifests --dir test1
? SSH Public Key /home/fedora/.ssh/openshift-qe.pub
? Platform gcp
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
? Project ID OpenShift QE (openshift-qe)
? Region us-central1
? Base Domain qe.gcp.devcluster.openshift.com
? Cluster Name jiwei-1205a
? Pull Secret [? for help] ******
INFO Manifests created in: test1/manifests and test1/openshift 
$ 
$ vim test1/manifests/cluster-scheduler-02-config.yml
$ yq-3.3.0 r test1/manifests/cluster-scheduler-02-config.yml spec.mastersSchedulable
true
$ 
$ rm -f test1/openshift/99_openshift-cluster-api_worker-machineset-?.yaml
$ 
$ tree test1
test1
├── manifests
│   ├── cloud-controller-uid-config.yml
│   ├── cloud-provider-config.yaml
│   ├── cluster-config.yaml
│   ├── cluster-dns-02-config.yml
│   ├── cluster-infrastructure-02-config.yml
│   ├── cluster-ingress-02-config.yml
│   ├── cluster-network-01-crd.yml
│   ├── cluster-network-02-config.yml
│   ├── cluster-proxy-01-config.yaml
│   ├── cluster-scheduler-02-config.yml
│   ├── cvo-overrides.yaml
│   ├── kube-cloud-config.yaml
│   ├── kube-system-configmap-root-ca.yaml
│   ├── machine-config-server-tls-secret.yaml
│   └── openshift-config-secret-pull-secret.yaml
└── openshift
    ├── 99_cloud-creds-secret.yaml
    ├── 99_kubeadmin-password-secret.yaml
    ├── 99_openshift-cluster-api_master-machines-0.yaml
    ├── 99_openshift-cluster-api_master-machines-1.yaml
    ├── 99_openshift-cluster-api_master-machines-2.yaml
    ├── 99_openshift-cluster-api_master-user-data-secret.yaml
    ├── 99_openshift-cluster-api_worker-user-data-secret.yaml
    ├── 99_openshift-machineconfig_99-master-ssh.yaml
    ├── 99_openshift-machineconfig_99-worker-ssh.yaml
    ├── 99_role-cloud-creds-secret-reader.yaml
    └── openshift-install-manifests.yaml

2 directories, 26 files
$ 
$ openshift-install create cluster --dir test1
INFO Consuming Openshift Manifests from target directory
INFO Consuming Master Machines from target directory 
INFO Consuming Worker Machines from target directory 
INFO Consuming OpenShift Install (Manifests) from target directory 
INFO Consuming Common Manifests from target directory 
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json" 
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/openshift/installer/pkg/tfvars/gcp.TFVars({{{0xc000cf6a40, 0xc}, {0x0, 0x0}, {0xc0011d4a80, 0x91d}}, 0x1, 0x1, {0xc0010abda0, 0x58}, ...})
        /go/src/github.com/openshift/installer/pkg/tfvars/gcp/gcp.go:70 +0x66f
github.com/openshift/installer/pkg/asset/cluster.(*TerraformVariables).Generate(0x1daff070, 0xc000cef530?)
        /go/src/github.com/openshift/installer/pkg/asset/cluster/tfvars.go:479 +0x6bf8
github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc000c78870, {0x1a777f40, 0x1daff070}, {0x0, 0x0})
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:226 +0x5fa
github.com/openshift/installer/pkg/asset/store.(*storeImpl).Fetch(0x7ffc4c21413b?, {0x1a777f40, 0x1daff070}, {0x1dadc7e0, 0x8, 0x8})
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:76 +0x48
main.runTargetCmd.func1({0x7ffc4c21413b, 0x5})
        /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:259 +0x125
main.runTargetCmd.func2(0x1dae27a0?, {0xc000c702c0?, 0x2?, 0x2?})
        /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:289 +0xe7
github.com/spf13/cobra.(*Command).execute(0x1dae27a0, {0xc000c70280, 0x2, 0x2})
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0xc000c3a500)
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:918
main.installerMain()
        /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:61 +0x2b0
main.main()
        /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:38 +0xff
$ 

 

 

Please review the following PR: https://github.com/openshift/aws-pod-identity-webhook/pull/183

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-32632. The following is the description of the original issue:

Description of problem:

    In PR https://github.com/openshift/console/pull/13676 we worked on improving the performance of the PipelineRun list page, and https://issues.redhat.com/browse/OCPBUGS-32631 was created to further improve the performance of that page. Once this is complete, we have to improve the performance of the Pipeline list page by considering the points below (see the sketch after this list):

1. TaskRuns should not be fetched for all the PLRs.
2. Use pipelinerun.status.conditions.message to get the status of TaskRuns.
3. For any PLR, if the pipelinerun.status.conditions.message string contains data about task status, use that string instead of fetching TaskRuns.
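A hedged illustration of point 2: the Succeeded condition message already summarizes task status and can be read without listing TaskRuns (names are placeholders and the example message text is illustrative):

oc get pipelinerun <pipelinerun-name> -n <namespace> \
  -o jsonpath='{.status.conditions[?(@.type=="Succeeded")].message}'
# e.g. "Tasks Completed: 3 (Failed: 0, Cancelled 0), Skipped: 1"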

Tired of scrolling through alerts and pod states that are seldom useful to get to things that we need every day.

This is a tracker bug for issues discovered when working on https://issues.redhat.com/browse/METAL-940. No QA verification will be possible until the feature is implemented much later.

Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver-operator/pull/102

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-35080. The following is the description of the original issue:

In a cluster with an external OIDC environment, we need to replace the global refresh sync lock in the OIDC provider with a per-refresh-token one. The work should replace the sync lock that applies to all HTTP-serving spawned goroutines with a sync lock that is specific to each refresh token.

Description of problem:

 

Version-Release number of selected component (if applicable):

 

Steps to Reproduce:

 

Actual results:

    

Expected results:

    That reduces token refresh request handling time by about 30%.

Additional info:

 

Maxim Patlasov pointed this out in STOR-1453 but still somehow we missed it. I tested this on 4.15.0-0.ci-2023-11-29-021749.

It is possible to set a custom TLSSecurityProfile without minTLSversion:

$ oc edit apiserver cluster
...
spec:
  tlsSecurityProfile:
    type: Custom
    custom:
      ciphers:
      - ECDHE-ECDSA-CHACHA20-POLY1305
      - ECDHE-ECDSA-AES128-GCM-SHA256

This causes the controller to crash loop:

$ oc get pods -n openshift-cluster-csi-drivers
NAME                                             READY   STATUS             RESTARTS       AGE
aws-ebs-csi-driver-controller-589c44468b-gjrs2   6/11    CrashLoopBackOff   10 (18s ago)   37s
...

because the `${TLS_MIN_VERSION}` placeholder is never replaced:

        - --tls-min-version=${TLS_MIN_VERSION}
        - --tls-min-version=${TLS_MIN_VERSION}
        - --tls-min-version=${TLS_MIN_VERSION}
        - --tls-min-version=${TLS_MIN_VERSION}
        - --tls-min-version=${TLS_MIN_VERSION}

The observed config in the ClusterCSIDriver shows an empty string:

$ oc get clustercsidriver ebs.csi.aws.com -o json | jq .spec.observedConfig
{
  "targetcsiconfig": {
    "servingInfo": {
      "cipherSuites": [
        "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256",
        "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256"
      ],
      "minTLSVersion": ""
    }
  }
}

which means minTLSVersion is empty when we get to this line, and the string replacement is not done:

[https://github.com/openshift/library-go/blob/c7f15dcc10f5d0b89e8f4c5d50cd313ae158de20/pkg/operator/csi/csidrivercontrollerservicecontroller/helpers.go#L234]

So it seems we have a couple of options:

1) completely omit the --tls-min-version arg if minTLSVersion is empty, or
2) set --tls-min-version to the same default value we would use if TLSSecurityProfile is not present in the apiserver object
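A hedged admin-side workaround sketch for option 2, always supplying minTLSVersion alongside a Custom profile (ciphers mirror the example above); this does not replace the code fix:

oc patch apiserver cluster --type=merge -p '{
  "spec": {
    "tlsSecurityProfile": {
      "type": "Custom",
      "custom": {
        "ciphers": ["ECDHE-ECDSA-CHACHA20-POLY1305", "ECDHE-ECDSA-AES128-GCM-SHA256"],
        "minTLSVersion": "VersionTLS12"
      }
    }
  }
}'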

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

On a nodepool with autoscaling enabled, the "oc scale nodepool" command disables autoscaling but leaves an invalid configuration with autoscaling info that should have been cleared.

Version-Release number of selected component (if applicable):

  (.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc version
Client Version: 4.14.14
Kustomize Version: v5.0.1
Server Version: 4.14.14
Kubernetes Version: v1.27.10+28ed2d7
(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ 
  

How reproducible:

  happens all the time  

Steps to Reproduce:

     1. deploy a hub cluster with 3 master nodes, and 0 workers, on it, a hostedcluster with 6 nodes(I've used this job to deploy: https://auto-jenkins-csb-kniqe.apps.ocp-c1.prod.psi.redhat.com/job/CI/job/job-runner/2431/)
    
2. (.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc -n clusters patch nodepool hosted-0 --type=json -p '[{"op": "remove", "path": "/spec/replicas"},{"op":"add", "path": "/spec/autoScaling", "value": { "max": 6, "min": 6 }}]'
nodepool.hypershift.openshift.io/hosted-0 patched
(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc get nodepool -A
NAMESPACE   NAME       CLUSTER    DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
clusters    hosted-0   hosted-0                   6               True          False        4.14.14                                      



3. scale to 2 nodes in the nodepool: (.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc scale nodepool/hosted-0 --namespace clusters --kubeconfig ~/clusterconfigs/auth/hub-kubeconfig --replicas=2
nodepool.hypershift.openshift.io/hosted-0 scaled
(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc get nodepool -A
NAMESPACE   NAME       CLUSTER    DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
clusters    hosted-0   hosted-0   2               6               False         False        4.14.14         

4. and after scaledown ends :
(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc get nodepool -A
NAMESPACE   NAME       CLUSTER    DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
clusters    hosted-0   hosted-0   2               6               False         False        4.14.14                                      

Actual results:

(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc describe nodepool hosted-0 --namespace clusters --kubeconfig ~/clusterconfigs/auth/hub-kubeconfig
Name:         hosted-0
Namespace:    clusters
Labels:       <none>
Annotations:  hypershift.openshift.io/nodePoolCurrentConfig: de17bd57
              hypershift.openshift.io/nodePoolCurrentConfigVersion: 84116781
              hypershift.openshift.io/nodePoolPlatformMachineTemplate: hosted-0-52df983b
API Version:  hypershift.openshift.io/v1beta1
Kind:         NodePool
Metadata:
  Creation Timestamp:  2024-03-13T22:39:57Z
  Finalizers:
    hypershift.openshift.io/finalizer
  Generation:  4
  Owner References:
    API Version:     hypershift.openshift.io/v1beta1
    Kind:            HostedCluster
    Name:            hosted-0
    UID:             ec16c5a2-b8dc-4c54-abe8-297020df4442
  Resource Version:  818918
  UID:               671bdaf2-c8f9-4431-9493-476e9fe44d76
Spec:
  Arch:  amd64
  Auto Scaling:
    Max:         6
    Min:         6
  Cluster Name:  hosted-0
  Management:
    Auto Repair:  false
    Replace:
      Rolling Update:
        Max Surge:        1
        Max Unavailable:  0
      Strategy:           RollingUpdate
    Upgrade Type:         InPlace
  Node Drain Timeout:     30s
  Platform:
    Agent:
    Type:  Agent
  Release:
    Image:   quay.io/openshift-release-dev/ocp-release:4.14.14-x86_64
  Replicas:  2

Expected results:

No spec.autoScaling data, only spec.replicas: 2, as it was before enabling autoscaling (currently the stale block below remains):


Spec:   
    Auto Scaling:     
        Max:         6
        Min:         6 


Additional info:
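A hedged manual cleanup sketch, mirroring the patch style from step 2 above; removing the stale block by hand is a workaround, not the proposed fix:

oc -n clusters patch nodepool hosted-0 --type=json \
  -p '[{"op": "remove", "path": "/spec/autoScaling"}]'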

    

Description of problem:

[vSphere-CSI-Driver-Operator] does not update the VSphereCSIDriverOperatorCRAvailable status in a timely manner

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-04-162702

How reproducible:

Always    

Steps to Reproduce:

1. Set up a vSphere cluster with 4.15 nightly;
2. Back up the secret/vmware-vsphere-cloud-credentials to "vmware-cc.yaml";
3. Change the secret/vmware-vsphere-cloud-credentials password to an invalid value under ns/openshift-cluster-csi-drivers by oc edit;
4. Wait for the cluster storage operator to degrade and the driver controller pods to go CrashLoopBackOff, then restore the backed-up secret from "vmware-cc.yaml" with oc apply;
5. Observe that the driver controller pods return to Running and the cluster storage operator returns to healthy.
     

Actual results:

In Step 5: The driver controller pods are back to Running, but the cluster storage operator is stuck at Degraded: True for almost 1 hour.

$ oc get po
NAME                                                    READY   STATUS    RESTARTS        AGE
vmware-vsphere-csi-driver-controller-664db7d497-b98vt   13/13   Running   0               16s
vmware-vsphere-csi-driver-controller-664db7d497-rtj49   13/13   Running   0               23s
vmware-vsphere-csi-driver-node-2krg6                    3/3     Running   1 (3h4m ago)    3h5m
vmware-vsphere-csi-driver-node-2t928                    3/3     Running   2 (3h16m ago)   3h16m
vmware-vsphere-csi-driver-node-45kb8                    3/3     Running   2 (3h16m ago)   3h16m
vmware-vsphere-csi-driver-node-8vhg9                    3/3     Running   1 (3h16m ago)   3h16m
vmware-vsphere-csi-driver-node-9fh9l                    3/3     Running   1 (3h4m ago)    3h5m
vmware-vsphere-csi-driver-operator-5954476ddc-rkpqq     1/1     Running   2 (3h10m ago)   3h17m
vmware-vsphere-csi-driver-webhook-7b6b5d99f6-rxdt8      1/1     Running   0               3h16m
vmware-vsphere-csi-driver-webhook-7b6b5d99f6-skcbd      1/1     Running   0               3h16m
$ oc get co/storage -w
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
storage   4.15.0-0.nightly-2023-12-04-162702   False       False         True       8m39s   VSphereCSIDriverOperatorCRAvailable: VMwareVSphereControllerAvailable: error logging into vcenter: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
storage   4.15.0-0.nightly-2023-12-04-162702   True        False         False      0s
$  oc get co/storage
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
storage   4.15.0-0.nightly-2023-12-04-162702   True        False         False      3m41s
 

Expected results:

In Step 5: After the driver controller pods are back to Running, the cluster storage operator should recover to a healthy status immediately.

Additional info:

Comparing with previous CI results, this issue seems to have appeared after 4.15.0-0.nightly-2023-11-25-110147.
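A hedged transcript of the backup/break/restore steps above (namespace and secret name as given in the reproduction):

oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -o yaml > vmware-cc.yaml
oc -n openshift-cluster-csi-drivers edit secret vmware-vsphere-cloud-credentials   # set an invalid password
# wait for co/storage to degrade and the controller pods to CrashLoopBackOff, then restore:
oc apply -f vmware-cc.yaml
oc get co/storage -w   # watch how long Degraded=True persists after the pods recover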

This is a clone of issue OCPBUGS-33508. The following is the description of the original issue:

Description of problem:

Config custom AMI for cluster:
platform.aws.defaultMachinePlatform.amiID
Or 
installconfig.controlPlane.platform.aws.amiID
installconfig.compute.platform.aws.amiID


Master machines still use the default AMI instead of the custom one.

aws ec2 describe-instances --filters "Name=tag:kubernetes.io/cluster/yunjiang-cap6-qjc5t,Values=owned" "Name=tag:Name,Values=*worker*" --output json | jq '.Reservations[].Instances[].ImageId' | sort | uniq
"ami-0f71147cab4dbfb61"


aws ec2 describe-instances --filters "Name=tag:kubernetes.io/cluster/yunjiang-cap6-qjc5t,Values=owned" "Name=tag:Name,Values=*master*" --output json | jq '.Reservations[].Instances[].ImageId' | sort | uniq
"ami-0ae9b509738034a2c" <- default ami

    

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-05-08-222442
    

How reproducible:

    

Steps to Reproduce:

    1.See description
    2.
    3.
    

Actual results:

See description
    

Expected results:

master machines use custom AMI

    

Additional info:

    

In all releases tested, in particular, 4.16.0-0.okd-scos-2024-08-21-155613, Samples operator uses incorrect templates, resulting in following alert:

Samples operator is detecting problems with imagestream image imports. You can look at the "openshift-samples" ClusterOperator object for details. Most likely there are issues with the external image registry hosting the images that needs to be investigated. Or you can consider marking samples operator Removed if you do not care about having sample imagestreams available. The list of ImageStreams for which samples operator is retrying imports: fuse7-eap-openshift fuse7-eap-openshift-java11 fuse7-java-openshift fuse7-java11-openshift fuse7-karaf-openshift-jdk11 golang httpd java jboss-datagrid73-openshift jboss-eap-xp3-openjdk11-openshift jboss-eap-xp3-openjdk11-runtime-openshift jboss-eap-xp4-openjdk11-openshift jboss-eap-xp4-openjdk11-runtime-openshift jboss-eap74-openjdk11-openshift jboss-eap74-openjdk11-runtime-openshift jboss-eap74-openjdk8-openshift jboss-eap74-openjdk8-runtime-openshift jboss-webserver57-openjdk8-tomcat9-openshift-ubi8 jenkins jenkins-agent-base mariadb mysql nginx nodejs perl php postgresql13-for-sso75-openshift-rhel8 postgresql13-for-sso76-openshift-rhel8 python redis ruby sso75-openshift-rhel8 sso76-openshift-rhel8 fuse7-karaf-openshift jboss-webserver57-openjdk11-tomcat9-openshift-ubi8 postgresql

For example, the sample image for Mysql 8.0 is being pulled from registry.redhat.io/rhscl/mysql-80-rhel7:latest (and cannot be found using the dummy pull secret).

Works correctly on OKD FCOS builds.

Description of problem:

CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing

doesn't give much detail or suggest next-steps. Expanding it to include at least a more detailed error message would make it easier for the admin to figure out how to resolve the issue.

Version-Release number of selected component (if applicable):

It's in the dev branch, and probably dates back to whenever the canary system was added.

How reproducible:

100%

Steps to Reproduce:

1. Break ingress. FIXME: Maybe by deleting the cloud load balancer, or dropping a firewall in the way, or something.
2. See the canary pods start failing.
3. Ingress operator sets CanaryChecksRepetitiveFailures with a message.

Actual results:

Canary route checks for the default ingress controller are failing

Expected results:

Canary route checks for the default ingress controller are failing: ${ERROR_MESSAGE}. ${POSSIBLY_ALSO_MORE_TROUBLESHOOTING_IDEAS?}

Additional info:

Plumbing the error message through might be as straightforward as passing probeRouteEndpoint's err through to setCanaryFailingStatusCondition for formatting. Or maybe it's more complicated than that?

Description of problem:

    The cluster operator "machine-config" degraded due to MachineConfigPool master is not ready, which tells error like "rendered-master-${hash} not found".

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2023-12-11-033133

How reproducible:

    Always. We met the issue in 2 CI profiles, Flexy template "functionality-testing/aos-4_15/upi-on-gcp/versioned-installer-ovn-ipsec-ew-ns-ci", and PROW CI test "periodic-ci-openshift-openshift-tests-private-release-4.15-multi-ec-gcp-ipi-disc-priv-oidc-arm-mixarch-f14".

Steps to Reproduce:

The Flexy template brief steps:
1. "create install-config" and then "create manifests"
2. add a manifest file to configure the OVN-Kubernetes network for IPsec (please refer to https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blame/master/functionality-testing/aos-4_15/hosts/create_ign_files.sh#L517-530; a sketch follows after these steps)
3. (optional) "create ignition-config"
4. UPI installation steps (see OCP doc https://docs.openshift.com/container-platform/4.14/installing/installing_gcp/installing-gcp-user-infra.html#installation-gcp-user-infra-exporting-common-variables)
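A hedged sketch of the manifest from step 2; the file name and the empty ipsecConfig stanza are assumptions about how OVN-Kubernetes IPsec is typically enabled, and the authoritative content is in the linked flexy-templates script:

# run inside <installation dir> after "create manifests"
cat > manifests/cluster-network-03-config.yml <<'EOF'
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      ipsecConfig: {}
EOF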

Actual results:

    Installation failed, with the machine-config operator degraded

Expected results:

    Installation should succeed.

Additional info:

    The must-gather is at https://drive.google.com/file/d/12xbjWUknDL_DRNSS8T_Z3u4d1KrNlJgT/view?usp=drive_link

Please review the following PR: https://github.com/openshift/router/pull/550

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    When specifying a control plane operator image for dev purposes, the control plane operator pod fails to come up with an InvalidImageRef status.

Version-Release number of selected component (if applicable):

    Mgmt cluster is 4.15, HyperShift control plane operator is latest from main

How reproducible:

Always    

Steps to Reproduce:

    1. Create a hosted cluster with an annotation to override control plane operator image and point it to a non-digest image ref.
    

Actual results:

    The cluster fails to come up with the CPO pod failing with InvalidImageRef

Expected results:

The cluster comes up fine.    

Additional info:
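A hedged reproduction sketch; the annotation key and values are assumptions based on the description of overriding the control plane operator image with a non-digest reference:

oc -n clusters annotate hostedcluster my-cluster \
  hypershift.openshift.io/control-plane-operator-image=quay.io/example/control-plane-operator:dev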

    

Please review the following PR: https://github.com/openshift/thanos/pull/142

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33428. The following is the description of the original issue:

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

The monitoring system for a Single Node OpenShift (SNO) cluster is triggering an alert named "HighOverallControlPlaneCPU" related to excessive control plane CPU utilization. However, this alert is misleading as it assumes a multi-node setup with high availability (HA) considerations, which do not apply to an SNO deployment.

 
The customer is receiving MNO alerts in the SNO cluster. Below are the details:
 
The vDU with 2xRINLINE card is installed on the SNO node with OCP 4.14.14.
Used hardware: Airframe OE22 2U server CPU Intel(R) Xeon Intel(R) Xeon(R) Gold 6428N SPR-SP S3, (32 cores 64 threads) with 128GB memory.
 
After all vDU pods became running, a few minutes later the following alert was triggered:
 
  "labels":

{    "alertname": "HighOverallControlPlaneCPU",    "namespace": "openshift-kube-apiserver",    "openshift_io_alert_source": "platform",    "prometheus": "openshift-monitoring/k8s",    "severity": "warning"    }

,
   "annotations": {
   "description": "Given three control plane nodes, the overall CPU utilization may only be about 2/3 of all available capacity. 
This is because if a single control plane node fails, the remaining two must handle the load of the cluster in order to be HA. 
If the cluster is using more than 2/3 of all capacity, if one control plane node fails, the remaining two are likely to fail when they take the load. 
To fix this, increase the CPU and memory on your control plane nodes.",
   "runbook_url": https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-apiserver-operator/ExtremelyHighIndividualControlPlaneCPU.md,
   "summary": "CPU utilization across all three control plane nodes is higher than two control plane nodes can sustain; 
   a single control plane node outage may cause a cascading failure; increase available CPU."
 
The alert description is misleading since this cluster is SNO; there is no HA in this cluster.
Increasing CPU capacity in an SNO cluster is not an option.
Although the CPU usage is high, this alert text is not correct.
MNO and SNO clusters should have separate alert descriptions.
 

User Story:

As a ROSA customer, I want to enforce that my workloads follow AWS best-practices by using AWS Regionalized STS Endpoints instead of the global one.

As Red Hat, I would like to follow AWS best-practices by using AWS Regionalized STS Endpoints instead of the global one.

Per AWS docs:

https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_enable-regions.html 

AWS recommends using Regional AWS STS endpoints instead of the global endpoint to reduce latency, build in redundancy, and increase session token validity. 

https://docs.aws.amazon.com/sdkref/latest/guide/feature-sts-regionalized-endpoints.html

All new SDK major versions releasing after July 2022 will default to regional. New SDK major versions might remove this setting and use regional behavior. To reduce future impact regarding this change, we recommend you start using regional in your application when possible. 

Acceptance Criteria:

Areas where HyperShift creates STS credentials use regionalized STS endpoints, e.g. https://github.com/openshift/hypershift/blob/ae1caa00ff3a2c2bfc1129f0168efc1e786d1d12/control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go#L1225-L1228 

Engineering Details:
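A hedged illustration of the regionalized behaviour using the setting AWS documents for its SDKs/CLI; the HyperShift-side change itself would live in code such as the resources.go linked above, and the output parsing below is illustrative:

export AWS_STS_REGIONAL_ENDPOINTS=regional   # SDKs resolve sts.<region>.amazonaws.com instead of the global endpoint
export AWS_REGION=us-east-2
aws sts get-caller-identity --debug 2>&1 | grep -o 'https://sts[^ "]*' | head -n1
# expected: https://sts.us-east-2.amazonaws.com (regional) rather than https://sts.amazonaws.com (global)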

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-38690. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38622. The following is the description of the original issue:

Description of problem:

    See https://github.com/prometheus/prometheus/issues/14503 for more details
    

Version-Release number of selected component (if applicable):

    4.16
    

How reproducible:


    

Steps to Reproduce:
1. Make Prometheus scrape a target that exposes multiple samples of the same series with different explicit timestamps, for example:

# TYPE requests_per_second_requests gauge
# UNIT requests_per_second_requests requests
# HELP requests_per_second_requests test-description
requests_per_second_requests 16 1722466225604
requests_per_second_requests 14 1722466226604
requests_per_second_requests 40 1722466227604
requests_per_second_requests 15 1722466228604
# EOF

2. Not all the samples will be ingested
3. If Prometheus continues scraping that target for a moment, the PrometheusDuplicateTimestamps will fire.
Actual results:


    

Expected results: all the samples should be considered (or course if the timestamps are too old or are too in the future, Prometheus may refuses them.)


    

Additional info:

     Regression introduced in Prometheus 2.52.
    Proposed upstream fixes: https://github.com/prometheus/prometheus/pull/14683 https://github.com/prometheus/prometheus/pull/14685 
    

When using the monitoring plugin with the console dashboards plugin, if a custom datasource defined in a dashboard is not found, the default in-cluster Prometheus is used to fetch data. This gives the user the false impression that the custom dashboard is working when in reality it should fail.

 

How to reproduce:

  • In OpenShift 4.16
  • Install COO
  • Enable the console dashboards plugin as documented here
  • Create a dashboard that uses custom datasources as documented here. Do not create a datasource so the bug can be reproduced
  • Go to monitoring -> dashboards and select the dashboard created above

Expected result

The dashboard should display an error as the custom datasource was not found

This is a clone of issue OCPBUGS-35097. The following is the description of the original issue:

Description of problem:

Some regions were migrated to PER in the 4.16 cycle. We want to enable them.

Version-Release number of selected component (if applicable):

 

How reproducible:

Easily

Steps to Reproduce:

1. Try to deploy to some PER enabled regions
2. Fail because the installer does not consider them valid.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

NHC failed to watch Metal3 remediation template    

Version-Release number of selected component (if applicable):

OCP4.13 and higher

How reproducible:

    100%

Steps to Reproduce:

    1. Create Metal3RemediationTemplate
    2. Install NHCv.0.7.0
    3. Create NHC with Metal3RemediationTemplate     

Actual results:

E0131 14:07:51.603803 1 reflector.go:147] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Failed to watch infrastructure.cluster.x-k8s.io/v1beta1, Kind=Metal3RemediationTemplate: failed to list infrastructure.cluster.x-k8s.io/v1beta1, Kind=Metal3RemediationTemplate: metal3remediationtemplates.infrastructure.cluster.x-k8s.io is forbidden: User "system:serviceaccount:openshift-workload-availability:node-healthcheck-controller-manager" cannot list resource "metal3remediationtemplates" in API group "infrastructure.cluster.x-k8s.io" at the cluster scope

E0131 14:07:59.912283 1 reflector.go:147] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Failed to watch infrastructure.cluster.x-k8s.io/v1beta1, Kind=Metal3Remediation: unknown

W0131 14:08:24.831958 1 reflector.go:539] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: failed to list infrastructure.cluster.x-k8s.io/v1beta1, Kind=Metal3RemediationTemplate: metal3remediationtemplates.infrastructure.cluster.x-k8s.io is forbidden: User "system:serviceaccount:openshift-workload-availability:node-healthcheck-controller-manager" cannot list resource

Expected results:

    No errors

Additional info:
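A hedged sketch of the permissions the error messages report as missing; whether the actual fix is RBAC shipped with the operator (plus a ClusterRoleBinding to the openshift-workload-availability/node-healthcheck-controller-manager service account) is an assumption:

cat <<'EOF' | oc apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nhc-metal3-remediation-reader
rules:
- apiGroups: ["infrastructure.cluster.x-k8s.io"]
  resources: ["metal3remediations", "metal3remediationtemplates"]
  verbs: ["get", "list", "watch"]
EOF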

    

This wasn't supposed to have a junit associated but it looks like it did and is now killing payloads. It started failing because the LB was reaped, and we do not yet have confirmation the new one will be preserved.

This should be pulled out until we've got confirmation that (a) there is no junit for the backend, and (b) the LB is on the preserve whitelist.

Please review the following PR: https://github.com/openshift/image-customization-controller/pull/113

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Recycler pods are not starting on hostedcontrolplane in disconnected environments ( ImagePullBackOff on quay.io/openshift/origin-tools:latest ).

The root cause is that the recycler-pod template (stored in the recycler-config ConfigMap) on hostedclusters is always pointing to `quay.io/openshift/origin-tools:latest` .

The same configMap for the management cluster is correctly pointing to an image which is part of the release payload:
$ oc get cm -n openshift-kube-controller-manager recycler-config -o json | jq -r '.data["recycler-pod.yaml"]' | grep "image"
      image: "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e458f24c40d41c2c802f7396a61658a5effee823f274be103ac22c717c157308"

but on hosted clusters we have:
$ oc get cm -n clusters-guest414a recycler-config -o json | jq -r '.data["recycler-pod.yaml"]' | grep "image" 
    image: quay.io/openshift/origin-tools:latest

This is likely due to:
https://github.com/openshift/hypershift/blob/e1b75598a62a06534fab6385d60d0f9a808ccc52/control-plane-operator/controllers/hostedcontrolplane/kcm/config.go#L80

quay.io/openshift/origin-tools:latest is not part of any mirrored release payload and it's referenced by tag so it will not be available on disconnected environments.
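One hedged candidate for "the right image": the release payload ships a tools image that can be resolved by digest, though whether HyperShift should use exactly this component is the open question noted in the expected results below:

oc adm release info quay.io/openshift-release-dev/ocp-release:4.14.14-x86_64 --image-for=tools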

Version-Release number of selected component (if applicable):

    v4.14, v4.15, v4.16

How reproducible:

    100%

Steps to Reproduce:

    1. create an hosted cluster
    2. check the content of the recycler-config configmap in an hostedcontrolplane namespace
    3.
    

Actual results:

image field for the recycler-pod template is always pointing to `quay.io/openshift/origin-tools:latest` which is not part of the release payload

Expected results:

image field for the recycler-pod template is pointing to the right image (which one???) as extracted from the release payload

Additional info:

see: https://github.com/openshift/cluster-kube-controller-manager-operator/blob/64b4c1ba/bindata/assets/kube-controller-manager/recycler-cm.yaml#L21
to compare with cluster-kube-controller-manager-operator on OCP

Please review the following PR: https://github.com/openshift/cluster-api-provider-baremetal/pull/208

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The two tasks will always "UPDATE" the underlying "APIService" resource even when no changes are to be made.
This behavior significantly elevates the likelihood of encountering conflicts, especially during upgrades, with other controllers that are concurrently monitoring the same resources (CA controller e.g.). Moreover, this consumes resources for unnecessary work.

The tasks rely on CreateOrUpdateAPIService: https://github.com/openshift/cluster-monitoring-operator/blob/5e394dd9de305cb6927a23c31b3f9651aa806fb8/pkg/client/client.go#L1725-L1745 which always "UPDATE" the APIService resource.

Version-Release number of selected component (if applicable):

    

How reproducible:

Keep a cluster running, then take a look at the audit logs concerning the APIService v1beta1.metrics.k8s.io; you could use: oc adm must-gather -- /usr/bin/gather_audit_logs
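A hedged way to count the verbs hitting that APIService in the gathered audit logs (the must-gather directory layout is an assumption):

zgrep -h 'v1beta1.metrics.k8s.io' must-gather.*/audit_logs/kube-apiserver/*.gz \
  | grep -o '"verb":"[a-z]*"' | sort | uniq -c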

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

You would see that every "get" is followed by an "update"

You can also take a look at the code taking care of that: https://github.com/openshift/cluster-monitoring-operator/blob/5e394dd9de305cb6927a23c31b3f9651aa806fb8/pkg/client/client.go#L1725-L1745

Expected results:

"Updates" should be avoided if no changes are to be made.

Additional info:

it'd be even better if we could avoid the "get"s, but that would be another subject to discuss.

Description of problem:

[Multi-NIC] Egress traffic connection times out after removing another pod's label in the same namespace

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-08-024357


How reproducible:

Always

Steps to Reproduce:

1. Label one node as egress node
2. Create an egressIP object, egressIP was assigned to egress node secondary interface
# oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"k8s.ovn.org/v1","kind":"EgressIP","metadata":{"annotations":{},"name":"egressip-66293"},"spec":{"egressIPs":["172.22.0.190"],"namespaceSelector":{"matchLabels":{"org":"qe"}},"podSelector":{"matchLabels":{"color":"pink"}}}}
    creationTimestamp: "2023-10-08T07:28:04Z"
    generation: 2
    name: egressip-66293
    resourceVersion: "461590"
    uid: f1ca3483-63f1-4f31-99b0-e6a55161c285
  spec:
    egressIPs:
    - 172.22.0.190
    namespaceSelector:
      matchLabels:
        org: qe
    podSelector:
      matchLabels:
        color: pink
  status:
    items:
    - egressIP: 172.22.0.190
      node: worker-0
kind: List
metadata:
  resourceVersion: ""
3. Create a namespace and two pods under it.
% oc get pods -n hrw -o wide
NAME         READY   STATUS    RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
hello-pod    1/1     Running   0          6m46s   10.129.2.7    worker-1   <none>           <none>
hello-pod1   1/1     Running   0          6s      10.131.0.14   worker-0   <none>           <none>

4. Add label org=qe to namespace hrw
# oc get ns hrw --show-labels
NAME   STATUS   AGE   LABELS
hrw    Active   21m   kubernetes.io/metadata.name=hrw,org=qe,pod-security.kubernetes.io/audit-version=v1.24,pod-security.kubernetes.io/audit=restricted,pod-security.kubernetes.io/warn-version=v1.24,pod-security.kubernetes.io/warn=restricted

5. At this time, accessing the external endpoint from both pods succeeded. 
% oc rsh -n hrw hello-pod 
~ $ curl 172.22.0.1 --connect-timeout 5
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL was not found on this server.</p>
</body></html>
~ $ exit
 % oc rsh -n hrw hello-pod1 
~ $ curl 172.22.0.1 --connect-timeout 5
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL was not found on this server.</p>
</body></html>

6. Add label color=pink to both pods
 % oc label pod hello-pod color=pink -n hrw
pod/hello-pod labeled
 % oc label pod hello-pod1 color=pink -n hrw
pod/hello-pod1 labeled

7. Both pods can access external endpoint.
8. Remove label color=pink from pod hello-pod
% oc label pod hello-pod color- -n hrw     
pod/hello-pod unlabeled



Actual results:


Accessing the external endpoint from the pod which kept the label got a connection timeout
 % oc rsh -n hrw hello-pod1            
~ $ curl 172.22.0.1 --connect-timeout 5
curl: (28) Connection timeout after 5000 ms
~ $ 
~ $ 
~ $ curl 172.22.0.1 --connect-timeout 5
curl: (28) Connection timeout after 5000 ms

Note the label was removed from hello-pod, but the access attempt above is from the other pod, hello-pod1, which should still use the egressIP and be able to reach the external endpoint

Expected results:

The pod that keeps the label should still be able to access the external endpoint

Additional info:


Description of problem:

    Re-enable the e2e tests; the Red Hat OpenShift Pipelines operator is now available in the OperatorHub.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

This bug is to pull in the fixes from another PR:
https://github.com/openshift/cluster-etcd-operator/pull/1235

Namely, we were relying on a race condition in the fake library to sync the informer and client lister when generating the certificates.
The fix introduces a lister that goes directly through the client, avoiding the informer.

Version-Release number of selected component (if applicable):

4.16.0    

How reproducible:

always, when reordering the statements in the code    

Steps to Reproduce:

Reordering the code blocks as in https://github.com/openshift/cluster-etcd-operator/pull/1235/files#diff-273071b77ba329777b70cb3c4d3fb2e33bc8abf45cb3da28cbee512d591ab9ee 

will immediately expose the race condition in unit tests.     

Actual results:

    

Expected results:

    

Additional info:
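The "lister that directly goes via the client" can be pictured with a small adapter like the one below: a minimal sketch for Secrets using standard client-go, with hypothetical type names, not the actual cluster-etcd-operator code.

// Sketch: a lister-shaped helper that reads straight from the API server via
// the client, so nothing depends on informer cache sync timing.
// Hypothetical type; not the cluster-etcd-operator implementation.
package clientlister

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/kubernetes"
)

type clientSecretGetter struct {
	client    kubernetes.Interface
	namespace string
}

// Get fetches the Secret from the API server directly, so a freshly created
// object is visible immediately without waiting for an informer to sync.
func (g clientSecretGetter) Get(name string) (*corev1.Secret, error) {
	return g.client.CoreV1().Secrets(g.namespace).Get(context.TODO(), name, metav1.GetOptions{})
}

// List does the same for label-selected Secrets.
func (g clientSecretGetter) List(selector labels.Selector) ([]*corev1.Secret, error) {
	list, err := g.client.CoreV1().Secrets(g.namespace).List(context.TODO(), metav1.ListOptions{LabelSelector: selector.String()})
	if err != nil {
		return nil, err
	}
	out := make([]*corev1.Secret, 0, len(list.Items))
	for i := range list.Items {
		out = append(out, &list.Items[i])
	}
	return out, nil
}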

    

Description of problem:

    Increase MAX_NODES_LIMIT to 300 for 4.16 and 200 for 4.15 so that users don't see the alert "Loading is taking longer than expected" on the Topology page

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    Always

Steps to Reproduce:

    1. Create more than 100 nodes in a namespace
    

Additional info:

    

Please review the following PR: https://github.com/openshift/installer/pull/7817

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Enable the installer's AWS SDK install path and create a C2S cluster; the install hits the following fatal error:

level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to create bootstrap resources: failed to create bootstrap instance profile: failed to create role (yunjiang-14c2a-t4wp7-bootstrap-role): RequestCanceled: request context canceled


    

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2024-01-03-140457
4.16.0-0.nightly-2024-01-03-193825
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Enable AWS SDK install and create a C2S cluster
    2.
    3.
    

Actual results:

failed to create bootstrap instance profile: failed to create role (yunjiang-14c2a-t4wp7-bootstrap-role), bootstrap process failed
    

Expected results:

The bootstrap process finishes successfully.
    

Additional info:

No issue with the Terraform-based install path.
    

Please review the following PR: https://github.com/openshift/kubernetes-autoscaler/pull/274

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/kube-state-metrics/pull/109

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When the user tries to perform a mirror-to-mirror run with all the catalogs in oc-mirror v2, it fails because some catalogs contain images that have both a tag and a digest. oc-mirror v2 fails to mirror such operators and does not generate the IDMS and ITMS.
    

Version-Release number of selected component (if applicable):

WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202404231239.p0.ge7889a7.assembly.stream.el9-e7889a7", GitCommit:"e7889a7ec70dd66b0d6a7ba6dedc3e4b93ebf4de", GitTreeState:"clean", BuildDate:"2024-04-23T17:20:33Z", GoVersion:"go1.21.9 (Red Hat 1.21.9-1.el9_4) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

    

How reproducible:

Always
    

Steps to Reproduce:

    1. Install latest oc-mirror
    2. create imageSetconfig as below
    3. Now run the command `oc-mirror  --v2 -c /tmp/customer.yaml --workspace file:///app1/knarra/customertest docker://localhost:5000 --dest-tls-verify=false`
    

Actual results:

    Mirroring fails with the errors listed below:
     2024/04/30 18:24:02  [ERROR]  : [Worker] err: Invalid source name docker://quay.io/cilium/cilium-etcd-operator:v2.0.7@sha256:04b8327f7f992693c2cb483b999041ed8f92efc8e14f2a5f3ab95574a65ea2dc: Docker references with both a tag and digest are currently not supported
2024/04/30 18:24:02  [ERROR]  : [Worker] err: Invalid source name docker://quay.io/cilium/startup-script:62093c5c233ea914bfa26a10ba41f8780d9b737f@sha256:a1454ca1f93b69ecd2c43482c8e13dc418ae15e28a46009f5934300a20afbdba: Docker references with both a tag and digest are currently not supported
2024/04/30 18:24:02  [ERROR]  : [Worker] err: Invalid source name docker://quay.io/coreos/etcd:v3.5.4@sha256:a67fb152d4c53223e96e818420c37f11d05c2d92cf62c05ca5604066c37295e9: Docker references with both a tag and digest are currently not supported
    

Expected results:

     Images with both tag and digest should be skipped while mirroring
    

Additional info:

    https://redhat-internal.slack.com/archives/C02JW6VCYS1/p1714501713082839?thread_ts=1714458218.270709&cid=C02JW6VCYS1
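The expected behavior above ("skip images with both a tag and a digest") boils down to detecting such references before mirroring. A minimal sketch of that check, assuming the distribution reference parser; this is not the actual oc-mirror v2 code:

// Sketch: detect pull specs that carry both a tag and a digest so the
// mirroring loop can skip or warn about them instead of failing the run.
package main

import (
	"fmt"

	"github.com/distribution/reference"
)

func hasTagAndDigest(pullSpec string) (bool, error) {
	ref, err := reference.ParseNormalizedNamed(pullSpec)
	if err != nil {
		return false, err
	}
	_, tagged := ref.(reference.Tagged)
	_, digested := ref.(reference.Digested)
	return tagged && digested, nil
}

func main() {
	skip, err := hasTagAndDigest("quay.io/cilium/cilium-etcd-operator:v2.0.7@sha256:04b8327f7f992693c2cb483b999041ed8f92efc8e14f2a5f3ab95574a65ea2dc")
	if err != nil {
		panic(err)
	}
	fmt.Println("skip this image:", skip) // true -> skip instead of erroring out
}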
    

Description of problem:

    Tekton Results API endpoint failed to fetch data on airgapped cluster.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

There are several test cases in the conformance test suite that are failing due to the openshift-multus configuration.

We are running the conformance test suite as part of our OpenShift on OpenStack CI. We use it just to confirm correct functionality of the cluster. The command we are using to run the test suite is:

openshift-tests run  --provider '{\"type\":\"openstack\"}' openshift/conformance/parallel 

The name of the tests that failed are:
1. [sig-arch] Managed cluster should ensure platform components have system-* priority class associated [Suite:openshift/conformance/parallel]

Reason is:

6 pods found with invalid priority class (should be openshift-user-critical or begin with system-):
openshift-multus/whereabouts-reconciler-6q6h7 (currently "")
openshift-multus/whereabouts-reconciler-87dwn (currently "")
openshift-multus/whereabouts-reconciler-fvhwv (currently "")
openshift-multus/whereabouts-reconciler-h68h5 (currently "")
openshift-multus/whereabouts-reconciler-nlz59 (currently "")
openshift-multus/whereabouts-reconciler-xsch6 (currently "")

2. [sig-arch] Managed cluster should only include cluster daemonsets that have maxUnavailable or maxSurge update of 10 percent or maxUnavailable of 33 percent [Suite:openshift/conformance/parallel]
Reason is:

fail [github.com/openshift/origin/test/extended/operators/daemon_set.go:105]: Sep 23 16:12:15.283: Daemonsets found that do not meet platform requirements for update strategy:
  expected daemonset openshift-multus/whereabouts-reconciler to have maxUnavailable 10% or 33% (see comment) instead of 1, or maxSurge 10% instead of 0
Ginkgo exit error 1: exit with code 1

3.[sig-arch] Managed cluster should set requests but not limits [Suite:openshift/conformance/parallel]

Reason is:

fail [github.com/openshift/origin/test/extended/operators/resources.go:196]: Sep 23 16:12:17.489: Pods in platform namespaces are not following resource request/limit rules or do not have an exception granted:
  apps/v1/DaemonSet/openshift-multus/whereabouts-reconciler/container/whereabouts defines a limit on cpu of 50m which is not allowed (rule: "apps/v1/DaemonSet/openshift-multus/whereabouts-reconciler/container/whereabouts/limit[cpu]")
  apps/v1/DaemonSet/openshift-multus/whereabouts-reconciler/container/whereabouts defines a limit on memory of 100Mi which is not allowed (rule: "apps/v1/DaemonSet/openshift-multus/whereabouts-reconciler/container/whereabouts/limit[memory]")
Ginkgo exit error 1: exit with code 1

4. [sig-node][apigroup:config.openshift.io] CPU Partitioning cluster platform workloads should be annotated correctly for DaemonSets [Suite:openshift/conformance/parallel]

Reason is:

fail [github.com/openshift/origin/test/extended/cpu_partitioning/pods.go:159]: Expected
    <[]error | len:1, cap:1>: [
        <*errors.errorString | 0xc0010fa380>{
            s: "daemonset (whereabouts-reconciler) in openshift namespace (openshift-multus) must have pod templates annotated with map[target.workload.openshift.io/management:{\"effect\": \"PreferredDuringScheduling\"}]",
        },
    ]
to be empty

How reproducible: Always
Steps to Reproduce: Run conformance testsuite:
https://github.com/openshift/origin/blob/master/test/extended/README.md

Actual results: Testcases failing
Expected results: Testcases passing
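All four failures point at the shape of the whereabouts-reconciler DaemonSet. The sketch below shows a pod template that would satisfy the quoted checks (a system-* priority class, the CPU-partitioning workload annotation, a 10% maxUnavailable rolling update, and requests without limits); the values are illustrative, not the actual manifest.

// Sketch of a DaemonSet shaped to pass the conformance checks above.
// Illustrative values only; not the actual whereabouts-reconciler manifest.
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func compliantDaemonSet() *appsv1.DaemonSet {
	maxUnavailable := intstr.FromString("10%")
	labels := map[string]string{"app": "whereabouts-reconciler"}
	return &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: "whereabouts-reconciler", Namespace: "openshift-multus"},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			UpdateStrategy: appsv1.DaemonSetUpdateStrategy{
				Type:          appsv1.RollingUpdateDaemonSetStrategyType,
				RollingUpdate: &appsv1.RollingUpdateDaemonSet{MaxUnavailable: &maxUnavailable}, // 10%, per the update-strategy check
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: labels,
					Annotations: map[string]string{
						// required by the CPU partitioning check
						"target.workload.openshift.io/management": `{"effect": "PreferredDuringScheduling"}`,
					},
				},
				Spec: corev1.PodSpec{
					PriorityClassName: "system-node-critical", // satisfies the system-* priority class check
					Containers: []corev1.Container{{
						Name:  "whereabouts",
						Image: "example-image", // placeholder
						Resources: corev1.ResourceRequirements{
							// requests only, no limits, per the "requests but not limits" rule
							Requests: corev1.ResourceList{
								corev1.ResourceCPU:    resource.MustParse("10m"),
								corev1.ResourceMemory: resource.MustParse("50Mi"),
							},
						},
					}},
				},
			},
		},
	}
}

func main() { _ = compliantDaemonSet() }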

This is a clone of issue OCPBUGS-35038. The following is the description of the original issue:

Description of problem:

    In case the interface changes, we might miss updating AWS and not realize it.

Version-Release number of selected component (if applicable):

    4.16+

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    No issue currently but could potentially break in the future.

Expected results:

    

Additional info:

    

Description of problem:

    The hypershift operator ignores RegistryOverrides (from ICSP/IDMS) when inspecting the control-plane-operator image, so on a disconnected cluster the user has to explicitly set the hypershift.openshift.io/control-plane-operator-image annotation to point to the mirrored image on the internal registry.

Example:
the correct match is in the IDMS:
# oc get imagedigestmirrorset -oyaml | grep -B2 registry.ci.openshift.org/ocp/4.14-2024-02-14-135111
 ...
    - mirrors:
      - virthost.ostest.test.metalkube.org:5000/localimages/local-release-image
      source: registry.ci.openshift.org/ocp/4.14-2024-02-14-135111

Creating an hosted cluster with:
hcp create cluster kubevirt --image-content-sources /home/mgmt_iscp.yaml  --additional-trust-bundle /etc/pki/ca-trust/source/anchors/registry.2.crt --name simone3 --node-pool-replicas 2 --memory 16Gi --cores 4 --root-volume-size 64 --namespace local-cluster --release-image virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:66c6a46013cda0ad4e2291be3da432fdd03b4a47bf13067e0c7b91fb79eb4539 --pull-secret /tmp/.dockerconfigjson --generate-ssh

on the hostedCluster object we see:
status:
  conditions:
  - lastTransitionTime: "2024-02-14T22:01:30Z"
    message: 'failed to look up image metadata for registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6:
      failed to obtain root manifest for registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6:
      unauthorized: authentication required'
    observedGeneration: 3
    reason: ReconciliationError
    status: "False"
    type: ReconciliationSucceeded


and in the logs of the hypershift operator:
{"level":"info","ts":"2024-02-14T22:18:11Z","msg":"registry override coincidence not found","controller":"hostedcluster","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedCluster","HostedCluster":{"name":"simone3","namespace":"local-cluster"},"namespace":"local-cluster","name":"simone3","reconcileID":"6d6a2479-3d54-42e3-9204-8d0ab1013745","image":"4.14-2024-02-14-135111"}
{"level":"error","ts":"2024-02-14T22:18:12Z","msg":"Reconciler error","controller":"hostedcluster","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedCluster","HostedCluster":{"name":"simone3","namespace":"local-cluster"},"namespace":"local-cluster","name":"simone3","reconcileID":"6d6a2479-3d54-42e3-9204-8d0ab1013745","error":"failed to look up image metadata for registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6: failed to obtain root manifest for registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6: unauthorized: authentication required","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}


so the hypershift-operator is not using the RegistryOverrides mechanism to inspect the image from the internal registry (virthost.ostest.test.metalkube.org:5000/localimages/local-release-image in this example).

Explicitly setting annotation:
hypershift.openshift.io/control-plane-operator-image: virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6
on the hosted-cluster directly pointing to the mirrored control-plane-operator image is required to proceed on disconnected environments.

Version-Release number of selected component (if applicable):

    4.14, 4.15, 4.16

How reproducible:

    100%

Steps to Reproduce:

    1. try to deploy a hostedCluster in a disconnected environment without explicitly setting the hypershift.openshift.io/control-plane-operator-image annotation.
    2.
    3.
    

Actual results:

    A reconciliation error reported on the hostedCluster object:
status:
  conditions:
  - lastTransitionTime: "2024-02-14T22:01:30Z"
    message: 'failed to look up image metadata for registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6:
      failed to obtain root manifest for registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6:
      unauthorized: authentication required'
    observedGeneration: 3
    reason: ReconciliationError
    status: "False"
    type: ReconciliationSucceeded

The hostedCluster does not come up.

Expected results:

    The hypershift operator uses the RegistryOverrides mechanism also for the control-plane-operator image.
    Explicitly setting hypershift.openshift.io/control-plane-operator-image annotation is not required.

Additional info:

    - Maybe related to OCPBUGS-29110
    - Explicitly setting hypershift.openshift.io/control-plane-operator-image annotation pointing to the mirrored image on the internal registry is a valid workaround.
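What the expected behavior amounts to is remapping the control-plane-operator pull spec through the same source-to-mirror overrides before inspecting it. A minimal, hypothetical sketch of such a remap (plain prefix rewrite, not the hypershift operator's actual implementation):

// Sketch: rewrite a pull spec using IDMS/ICSP-style source->mirror overrides
// so image inspection goes to the internal registry on disconnected clusters.
// Hypothetical helper; not the hypershift operator's code.
package main

import (
	"fmt"
	"strings"
)

func applyRegistryOverrides(pullSpec string, overrides map[string]string) string {
	for source, mirror := range overrides {
		if strings.HasPrefix(pullSpec, source) {
			// keep the digest/tag part intact, only swap the repository prefix
			return mirror + strings.TrimPrefix(pullSpec, source)
		}
	}
	return pullSpec
}

func main() {
	overrides := map[string]string{
		"registry.ci.openshift.org/ocp/4.14-2024-02-14-135111": "virthost.ostest.test.metalkube.org:5000/localimages/local-release-image",
	}
	img := "registry.ci.openshift.org/ocp/4.14-2024-02-14-135111@sha256:84c74cc05250d0e51fe115274cc67ffcf0a4ac86c831b7fea97e484e646072a6"
	fmt.Println(applyRegistryOverrides(img, overrides)) // resolves against the mirror instead of registry.ci.openshift.org
}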

This is a clone of issue OCPBUGS-38832. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38722. The following is the description of the original issue:

Description of problem:

    We should add validation in the Installer when public-only subnets is enabled to make sure that (see the sketch after this list):

	1. A warning is printed if OPENSHIFT_INSTALL_AWS_PUBLIC_ONLY is set.
	2. Since this flag is only applicable to public clusters, we could consider exiting earlier if publish: Internal.
	3. Since this flag is only applicable to BYO-VPC configurations, we could consider exiting earlier if no subnets are provided in the install-config.
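A minimal sketch of those three validations, using simplified stand-in types rather than the installer's actual install-config structs:

// Sketch of the requested OPENSHIFT_INSTALL_AWS_PUBLIC_ONLY validations.
// Types and field names are simplified stand-ins, not the installer's structs.
package main

import (
	"fmt"
	"os"
)

type installConfig struct {
	Publish string   // "External" or "Internal"
	Subnets []string // BYO-VPC subnets from install-config
}

func validatePublicOnly(cfg installConfig) error {
	if os.Getenv("OPENSHIFT_INSTALL_AWS_PUBLIC_ONLY") == "" {
		return nil // feature not enabled, nothing to check
	}
	// 1. warn that the internal-only flag is in effect
	fmt.Fprintln(os.Stderr, "WARNING: OPENSHIFT_INSTALL_AWS_PUBLIC_ONLY is set; only public subnets will be used")
	// 2. only meaningful for public clusters
	if cfg.Publish == "Internal" {
		return fmt.Errorf("public-only subnets requires publish: External")
	}
	// 3. only meaningful for BYO-VPC installs
	if len(cfg.Subnets) == 0 {
		return fmt.Errorf("public-only subnets requires existing subnets in the install-config")
	}
	return nil
}

func main() {
	if err := validatePublicOnly(installConfig{Publish: "External"}); err != nil {
		fmt.Println("invalid configuration:", err)
	}
}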

Version-Release number of selected component (if applicable):

    all versions that support public-only subnets

How reproducible:

    always

Steps to Reproduce:

    1. Set OPENSHIFT_INSTALL_AWS_PUBLIC_ONLY
    2. Do a cluster install without specifying a VPC.
    3.
    

Actual results:

    No warning about the invalid configuration.

Expected results:

    

Additional info:

    This is an internal-only feature, so these validations shouldn't affect the normal path used by customers.

Description of problem:

The PodStartupStorageOperationsFailing alert is not raised when there are no (zero) successful mount/attach operations on a node 

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-10-25-185510

How reproducible:

Always

Steps to Reproduce:

1. Install a cluster on any platform.
2. Create an sc, pvc, and dep.
3. Check that the dep pod stays in ContainerCreating state and check for the alert 

Actual results:

The alert is not raised when there are 0 successful mount/attach operations

Expected results:

The alert should be raised when there are no successful mount/attach operations

Additional info:

Discussion: https://redhat-internal.slack.com/archives/GK0DA0JR5/p1697793500890839 

When we use the same alerting expression from 4.12, we can observe the alert in the OCP web console. 

Description of problem:

Install some operators. After some time the ResolutionFailed condition shows up:

 

$ kubectl get subscription.operators -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,ResolutionFailed:.status.conditions[?(@.type=="ResolutionFailed")].status,MSG:.status.conditions[?(@.type=="ResolutionFailed")].message'
NAMESPACE                   NAME                                                                         ResolutionFailed   MSG
infra-sso                   rhbk-operator                                                                True               [failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.67.215:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused"]
metallb-system              metallb-operator-sub                                                         True               [failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.67.215:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused"]
multicluster-engine         multicluster-engine                                                          True               [failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.67.215:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused"]
open-cluster-management     acm-operator-subscription                                                    True               [failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.67.215:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused"]
openshift-cnv               kubevirt-hyperconverged                                                      True               [failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused", failed to populate resolver cache from source certified-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.202.255:50051: connect: connection refused"]
openshift-gitops-operator   openshift-gitops-operator                                                    True               [failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused", failed to populate resolver cache from source certified-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.202.255:50051: connect: connection refused"]
openshift-local-storage     local-storage-operator                                                       True               [failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused", failed to populate resolver cache from source certified-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.202.255:50051: connect: connection refused"]
openshift-nmstate           kubernetes-nmstate-operator                                                  <none>             <none>
openshift-operators         devworkspace-operator-fast-redhat-operators-openshift-marketplace            <none>             <none>
openshift-operators         external-secrets-operator                                                    <none>             <none>
openshift-operators         web-terminal                                                                 <none>             <none>
openshift-storage           lvms                                                                         <none>             <none>
openshift-storage           mcg-operator-stable-4.14-redhat-operators-openshift-marketplace              <none>             <none>
openshift-storage           ocs-operator-stable-4.14-redhat-operators-openshift-marketplace              <none>             <none>
openshift-storage           odf-csi-addons-operator-stable-4.14-redhat-operators-openshift-marketplace   <none>             <none>
openshift-storage           odf-operator                                                                 <none>             <none> 

 

In the package server logs you can see that at one point the catalog source is not available; after a while the catalog source is available again, but the error doesn't disappear from the subscription.

Package server logs: 

time="2023-12-05T14:27:09Z" level=warning msg="error getting bundle stream" action="refresh cache" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.30.37.69:50051: connect: connection refused\"" source="{redhat-operators openshift-marketplace}"
time="2023-12-05T14:27:09Z" level=info msg="updating PackageManifest based on CatalogSource changes: {community-operators openshift-marketplace}" action="sync catalogsource" address="community-operators.openshift-marketplace.svc:50051" name=community-operators namespace=openshift-marketplace
time="2023-12-05T14:28:26Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-marketplace openshift-marketplace}" action="sync catalogsource" address="redhat-marketplace.openshift-marketplace.svc:50051" name=redhat-marketplace namespace=openshift-marketplace
time="2023-12-05T14:30:23Z" level=info msg="updating PackageManifest based on CatalogSource changes: {certified-operators openshift-marketplace}" action="sync catalogsource" address="certified-operators.openshift-marketplace.svc:50051" name=certified-operators namespace=openshift-marketplace
time="2023-12-05T14:35:56Z" level=info msg="updating PackageManifest based on CatalogSource changes: {certified-operators openshift-marketplace}" action="sync catalogsource" address="certified-operators.openshift-marketplace.svc:50051" name=certified-operators namespace=openshift-marketplace
time="2023-12-05T14:37:28Z" level=info msg="updating PackageManifest based on CatalogSource changes: {community-operators openshift-marketplace}" action="sync catalogsource" address="community-operators.openshift-marketplace.svc:50051" name=community-operators namespace=openshift-marketplace
time="2023-12-05T14:37:28Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-operators openshift-marketplace}" action="sync catalogsource" address="redhat-operators.openshift-marketplace.svc:50051" name=redhat-operators namespace=openshift-marketplace
time="2023-12-05T14:39:40Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-marketplace openshift-marketplace}" action="sync catalogsource" address="redhat-marketplace.openshift-marketplace.svc:50051" name=redhat-marketplace namespace=openshift-marketplace
time="2023-12-05T14:46:07Z" level=info msg="updating PackageManifest based on CatalogSource changes: {certified-operators openshift-marketplace}" action="sync catalogsource" address="certified-operators.openshift-marketplace.svc:50051" name=certified-operators namespace=openshift-marketplace
time="2023-12-05T14:47:37Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-operators openshift-marketplace}" action="sync catalogsource" address="redhat-operators.openshift-marketplace.svc:50051" name=redhat-operators namespace=openshift-marketplace
time="2023-12-05T14:48:21Z" level=info msg="updating PackageManifest based on CatalogSource changes: {community-operators openshift-marketplace}" action="sync catalogsource" address="community-operators.openshift-marketplace.svc:50051" name=community-operators namespace=openshift-marketplace
time="2023-12-05T14:49:53Z" level=info msg="updating  

 

Version-Release number of selected component (if applicable):

4.14.3    

How reproducible:

 

Steps to Reproduce:

    1. Install an operator, for example metallb
    2. Wait until the catalog pod is unavailable for some time.
    3. ResolutionFailed doesn't disappear anymore     

Actual results:

The ResolutionFailed condition no longer disappears from the subscription.

Expected results:

The ResolutionFailed condition disappears from the subscription once the catalog source is available again.

Additional info:

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Component Readiness has found a potential regression in [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers.

Probability of significant regression: 100.00%

Sample (being evaluated) Release: 4.15
Start Time: 2024-02-08T00:00:00Z
End Time: 2024-02-14T23:59:59Z
Success Rate: 91.30%
Successes: 63
Failures: 6
Flakes: 0

Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 100.00%
Successes: 735
Failures: 0
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=Other&component=Unknown&confidence=95&environment=ovn%20upgrade-micro%20amd64%20azure%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=azure&platform=azure&sampleEndTime=2024-02-14%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2024-02-08%2000%3A00%3A00&testId=openshift-tests-upgrade%3A37f1600d4f8d75c47fc5f575025068d2&testName=%5Bsig-cluster-lifecycle%5D%20pathological%20event%20should%20not%20see%20excessive%20Back-off%20restarting%20failed%20containers&upgrade=upgrade-micro&upgrade=upgrade-micro&variant=standard&variant=standard

Note: When you look at the link above you will notice some of the failures mention the bare metal operator.  That's being investigated as part of https://issues.redhat.com/browse/OCPBUGS-27760.  There have been 3 cases in the last week where the console was in a fail loop.  Here's an example:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1757637415561859072

 

We need help understanding why this is happening and what needs to be done to avoid it.

This is a clone of issue OCPBUGS-34387. The following is the description of the original issue:

Description of problem:

Using "accessTokenInactivityTimeoutSeconds: 900" for "OAuthClient" config. 

One inactive or idle tab causes session expiry for all other tabs. 

Following are the tests performed: 
Test 1 - a single window with a single tab no activity would time out after 15 minutes. 
 
Test 2 - a single window two tabs. No activity in the first tab, but was active in the second tab. Timeout occurred for both tabs after 15 minutes.

Test 3 - a single window with a single tab and activity, does not time out after 15 minutes.

Hence single idle tab causes the user logout from rest of the tabs.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Always

Steps to Reproduce:

    1. Set the OAuthClient.accessTokenInactivityTimeoutSeconds to 300 (or any value)
    2. Login using to OCP web console and open multiple tabs.
    3. Keep one tab idle and work on the other open tabs.
    4. After 5 minutes the session expires for all tabs.
    

Actual results:

    One inactive or idle tab causes session expiry for all other tabs. 

Expected results:

    Session should not be expired if any tab is not idle. 

Additional info:

    

This is a clone of issue OCPBUGS-37786. The following is the description of the original issue:

Description of problem:

In the use case when worker nodes require a proxy for outside access and the control plane is external (and only accessible via the internet), ovnkube-node pods never become available because the ovnkube-controller container cannot reach the Kube APIServer.

Version-Release number of selected component (if applicable):

How reproducible: Always

Steps to Reproduce:

1. Create an AWS hosted cluster with Public access that requires a proxy to access the internet.

2. Wait for nodes to become active

Actual results:

Nodes join cluster, but never become active

Expected results:

Nodes join cluster and become active

Please review the following PR: https://github.com/openshift/vsphere-problem-detector/pull/144

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The node selector for the console deployment requires deploying it on the master nodes, while the replica count is determined by the infrastructureTopology, which primarily tracks the workers' setup.

When an OpenShift cluster is installed with a single master node and multiple workers, this leads the console deployment to request 2 replicas, because infrastructureTopology is set to HighlyAvailable, while controlPlaneTopology is set to SingleReplica as expected.

Version-Release number of selected component (if applicable):

4.16

How reproducible:

Always    

Steps to Reproduce:

    1. Install an openshift cluster with 1 master and 2 workers

Actual results:

The installation fails because the replica count for the console deployment is set to 2.

  apiVersion: config.openshift.io/v1
  kind: Infrastructure
  metadata:
    creationTimestamp: "2024-01-18T08:34:47Z"
    generation: 1
    name: cluster
    resourceVersion: "517"
    uid: d89e60b4-2d9c-4867-a2f8-6e80207dc6b8
  spec:
    cloudConfig:
      key: config
      name: cloud-provider-config
    platformSpec:
      aws: {}
      type: AWS
  status:
    apiServerInternalURI: https://api-int.adstefa-a12.qe.devcluster.openshift.com:6443
    apiServerURL: https://api.adstefa-a12.qe.devcluster.openshift.com:6443
    controlPlaneTopology: SingleReplica
    cpuPartitioning: None
    etcdDiscoveryDomain: ""
    infrastructureName: adstefa-a12-6wlvm
    infrastructureTopology: HighlyAvailable
    platform: AWS
    platformStatus:
      aws:
        region: us-east-2
      type: AWS


apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
   .... 
  creationTimestamp: "2024-01-18T08:54:23Z"
  generation: 3
  labels:
    app: console
    component: ui
  name: console
  namespace: openshift-console
spec:
  progressDeadlineSeconds: 600
  replicas: 2


Expected results:

The replica count is set to 1, tracking the controlPlaneTopology value instead of the infrastructureTopology.
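A minimal sketch of the replica selection the report argues for, keyed off controlPlaneTopology; this is an illustration against the openshift/api types, not the console operator's actual code:

// Sketch: derive the console replica count from controlPlaneTopology rather
// than infrastructureTopology. Illustration only.
package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
)

func consoleReplicas(infra *configv1.Infrastructure) int32 {
	if infra.Status.ControlPlaneTopology == configv1.SingleReplicaTopologyMode {
		return 1
	}
	return 2
}

func main() {
	infra := &configv1.Infrastructure{
		Status: configv1.InfrastructureStatus{
			ControlPlaneTopology:   configv1.SingleReplicaTopologyMode,
			InfrastructureTopology: configv1.HighlyAvailableTopologyMode,
		},
	}
	fmt.Println(consoleReplicas(infra)) // 1, even though infrastructureTopology is HighlyAvailable
}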

Additional info:

    

Please review the following PR: https://github.com/openshift/operator-framework-catalogd/pull/45

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The useUserSettings hook is required in dynamic plugins, so it would be nice to have it available in the dynamic plugin SDK instead of duplicating the code in each dynamic plugin.

 

https://github.com/openshift/console/blob/master/frontend/packages/console-shared/src/hooks/useUserSettings.ts

 

Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/100

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Not able to reproduce it manually, but it frequently happens when running automated scripts.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-05-195247

How reproducible:


Steps to Reproduce:

1. Label the worker-0 node as an egress node and create an egressIP object; the egressIP was assigned to worker-0 successfully on the secondary NIC

2. Block port 9107 on the worker-0 node and label worker-1 as an egress node

3.

Actual results:

The egressIP was not moved to the second node
 % oc get egressip
NAME             EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-66330   172.22.0.196


40m         Warning   EgressIPConflict          egressip/egressip-66330       Egress IP egressip-66330 with IP 172.22.0.196 is conflicting with a host (worker-0) IP address and will not be assigned
sh-4.4# ip a show enp1s0
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:1c:cf:40:5d:25 brd ff:ff:ff:ff:ff:ff
    inet 172.22.0.109/24 brd 172.22.0.255 scope global dynamic noprefixroute enp1s0
       valid_lft 76sec preferred_lft 76sec
    inet6 fe80::21c:cfff:fe40:5d25/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

Expected results:

The egressIP should move to the second egress node

Additional info:

Workaround: deleting and recreating the egressIP object works
% oc get egressip
NAME             EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-66330   172.22.0.196                   
% oc delete egressip --all
egressip.k8s.ovn.org "egressip-66330" deleted
 % oc create -f ../data/egressip/config1.yaml 
egressip.k8s.ovn.org/egressip-3 created
% oc get egressip
NAME         EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-3   172.22.0.196   worker-1        172.22.0.196

AWS single-node jobs are failing starting with https://amd64.ocp.releases.ci.openshift.org/releasestream/4.16.0-0.nightly/release/4.16.0-0.nightly-2024-03-27-123853

periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-single-node

periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-single-node-serial

 

A bunch of operators are degraded. I did notice this, but it is still being investigated:

 

    - lastTransitionTime: '2024-03-27T15:56:02Z'
      message: 'OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve
        route from cache: route.route.openshift.io "oauth-openshift" not found        OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.201.206:443/healthz":
        dial tcp 172.30.201.206:443: connect: connection refused        OAuthServerServiceEndpointsEndpointAccessibleControllerAvailable: endpoints
        "oauth-openshift" not found        ReadyIngressNodesAvailable: Authentication requires functional ingress which
        requires at least one schedulable and ready node. Got 0 worker nodes, 1 master
        nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).        WellKnownAvailable: The well-known endpoint is not yet available: failed to
        get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap:
        configmap "oauth-openshift" not found (check authentication operator, it is
        supposed to create this)' 

Description of problem:

On SNO with the DU profile (RT kernel), the tuned profile is always degraded because the net.core.busy_read, net.core.busy_poll and kernel.numa_balancing sysctls do not exist in the RT kernel

Version-Release number of selected component (if applicable):

4.14.1

How reproducible:

100%

Steps to Reproduce:

1. Deploy SNO with DU profile(RT kernel)
2. Check tuned profile

Actual results:

oc -n openshift-cluster-node-tuning-operator get profile -o yaml
apiVersion: v1
items:
- apiVersion: tuned.openshift.io/v1
  kind: Profile
  metadata:
    creationTimestamp: "2023-11-09T18:26:34Z"
    generation: 2
    name: sno.kni-qe-1.lab.eng.rdu2.redhat.com
    namespace: openshift-cluster-node-tuning-operator
    ownerReferences:
    - apiVersion: tuned.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Tuned
      name: default
      uid: 4e7c05a2-537e-4212-9009-e2724938dec9
    resourceVersion: "287891"
    uid: 5f4d5819-8f84-4b3b-9340-3d38c41501ff
  spec:
    config:
      debug: false
      tunedConfig: {}
      tunedProfile: performance-patch
  status:
    conditions:
    - lastTransitionTime: "2023-11-09T18:26:39Z"
      message: TuneD profile applied.
      reason: AsExpected
      status: "True"
      type: Applied
    - lastTransitionTime: "2023-11-09T18:26:39Z"
      message: 'TuneD daemon issued one or more error message(s) during profile application.
        TuneD stderr: net.core.rps_default_mask'
      reason: TunedError
      status: "True"
      type: Degraded
    tunedProfile: performance-patch
kind: List
metadata:
  resourceVersion: ""

Expected results:

Not degraded

Additional info:

Looking at the tuned log, the following errors show up, which are probably what causes the profile to go into the degraded state:

2023-11-09 18:30:49,287 ERROR    tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'net.core.busy_read', the parameter does not exist
2023-11-09 18:30:49,287 ERROR    tuned.plugins.plugin_sysctl: sysctl option net.core.busy_read will not be set, failed to read the original value.
2023-11-09 18:30:49,287 ERROR    tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'net.core.busy_poll', the parameter does not exist
2023-11-09 18:30:49,287 ERROR    tuned.plugins.plugin_sysctl: sysctl option net.core.busy_poll will not be set, failed to read the original value.
2023-11-09 18:30:49,287 ERROR    tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'kernel.numa_balancing', the parameter does not exist
2023-11-09 18:30:49,287 ERROR    tuned.plugins.plugin_sysctl: sysctl option kernel.numa_balancing will not be set, failed to read the original value.

These sysctl parameters do not seem to be available with the RT kernel.

This is a clone of issue OCPBUGS-34907. The following is the description of the original issue:

Description of problem:

    The TechPreviewNoUpgrade feature set can be disabled on a 4.16 cluster after enabling it. But according to the official doc, `Enabling this feature set cannot be undone and prevents minor version updates`, so it should not be possible to disable it.

# ./oc get featuregate cluster -ojson|jq .spec
{  "featureSet": "TechPreviewNoUpgrade"}

# ./oc patch featuregate cluster --type=json -p '[{"op":"remove", "path":"/spec/featureSet"}]'
featuregate.config.openshift.io/cluster patched

# ./oc get featuregate cluster -ojson|jq .spec
{}

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-06-03-060250

How reproducible:

    always

Steps to Reproduce:

    1. enable the TechPreviewNoUpgrade fs on a 4.16 cluster
    2. then remove it 
    3.
    

Actual results:

    The TechPreviewNoUpgrade feature set was disabled

Expected results:

    Enabling this feature set cannot be undone

Additional info:

https://github.com/openshift/api/blob/master/config/v1/types_feature.go#L43-L44
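For clarity, the kind of transition check this implies can be sketched as below; this is a hypothetical validation helper, not the actual rule in the openshift/api code linked above:

// Sketch: reject feature-set changes that would undo TechPreviewNoUpgrade.
// Hypothetical helper; the real enforcement belongs in openshift/api validation.
package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
)

func validateFeatureSetChange(oldFS, newFS configv1.FeatureSet) error {
	if oldFS == configv1.TechPreviewNoUpgrade && newFS != configv1.TechPreviewNoUpgrade {
		return fmt.Errorf("feature set %q cannot be undone", configv1.TechPreviewNoUpgrade)
	}
	return nil
}

func main() {
	// Removing spec.featureSet is equivalent to setting it back to the default ("").
	if err := validateFeatureSetChange(configv1.TechPreviewNoUpgrade, configv1.Default); err != nil {
		fmt.Println(err) // expected: the change is rejected
	}
}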

Description of problem:

We want to update the trigger from auto to manual or vice versa. We can do it with the CLI: 'oc set triggers deployment/<name> --manual'. That normally changes the deployment annotation metadata.annotations.image.openshift.io/triggers to "paused": true, or to "paused": false when set to auto. But when we enable or disable the auto trigger by editing the deployment from the web console, it writes the annotation as "pause": false or "pause": true, without the 'd'.

Version-Release number of selected component (if applicable):

    

How reproducible:

Create a simple httpd application. Follow [1] to set the trigger using the CLI. Steps to set the trigger from the console:

Web console -> Deployment -> Edit deployment -> Form view -> Images section -> Enable "Deploy image from an image stream tag" -> Enable "Auto deploy when new Image is available" and save the changes -> check the annotations

[1] https://docs.openshift.com/container-platform/4.12/openshift_images/triggering-updates-on-imagestream-changes.html

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

 

Expected results:

    

Additional info:

code: https://github.com/openshift/console/blob/master/frontend/packages/dev-console/src/utils/resource-label-utils.ts#L78

The OKD build image job in ironic-agent-image is failing with the error message:

Complete!
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    14  100    14    0     0     73      0 --:--:-- --:--:-- --:--:--    73
  File "<stdin>", line 1
    404: Not Found
    ^
SyntaxError: illegal target for annotation
INFO[2024-02-29T08:06:27Z] Ran for 4m3s                                 
ERRO[2024-02-29T08:06:27Z] Some steps failed:                           
ERRO[2024-02-29T08:06:27Z] 
  * could not run steps: step ironic-agent failed: error occurred handling build ironic-agent-amd64: the build ironic-agent-amd64 failed after 1m57s with reason DockerBuildFailed: Dockerfile build strategy has failed. 
INFO[2024-02-29T08:06:27Z] Reporting job state 'failed' with reason 'executing_graph:step_failed:building_project_image' 

Description of problem:

The [Jira:"Network / ovn-kubernetes"] monitor test pod-network-avalibility setup test is a frequent offender in the OpenStack CSI jobs. We're seeing it fail on 4.14 up to 4.16.

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.14-e2e-openstack-csi-cinder

Example of failed job.
Example of successful job.

It seems like the 1 min timeout is too short and does not give enough time for the pods backing the service to come up.

https://github.com/openshift/origin/blob/1e6c579/pkg/monitortests/network/disruptionpodnetwork/monitortest.go#L191

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.

2.

3.

Actual results:

Expected results:

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure
  2. customer issue / SD
  3. internal RedHat testing failure

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

If it is a CI failure:

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

If it is a customer / SD issue:

  • Provide enough information in the bug description that Engineering doesn't need to read the entire case history.
  • Don't presume that Engineering has access to Salesforce.
  • Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment. The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
  • Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • For OCPBUGS in which the issue has been identified, label with "sbr-triaged"
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, labels with "sbr-untriaged"
  • Note: bugs that do not meet these minimum standards will be closed with label "SDN-Jira-template"

Description of problem:

   The following tests are failing:

- [sig-storage] In-tree Volumes [Driver: vsphere] [Testpattern: Inline-volume (ext4)] volumes should allow exec of files on the volume [Suite:openshift/conformance/parallel] [Suite:k8s] 
- [sig-storage] In-tree Volumes [Driver: vsphere] [Testpattern: Pre-provisioned PV (block volmode)] volumes should store data [Suite:openshift/conformance/parallel] [Suite:k8s] 
- [sig-storage] In-tree Volumes [Driver: vsphere] [Testpattern: Pre-provisioned PV (ext4)] volumes should store data [Suite:openshift/conformance/parallel] [Suite:k8s]

Job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-upi-zones/1808605014239744000

Version-Release number of selected component (if applicable):

    4.16 nightly

How reproducible:

    consistently: https://prow.ci.openshift.org/job-history/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-upi-zones

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

The story is to track i18n upload/download routine tasks which are perform every sprint. 

 

A.C.

  - Upload strings to Memosource at the start of the sprint and reach out to localization team

  - Download translated strings from Memsource when it is ready

  -  Review the translated strings and open a pull request

  -  Open a followup story for next sprint

Description of problem:

After updating the cluster to 4.12.42 (from 4.12.15), the customer noticed issues with scheduled pods starting on the node.

The initial thought was a multus issue; we then realised that the script /usr/local/bin/configure-ovs.sh had been modified, and reverting the modification fixed the issue.

Modification:

>     if nmcli connection show "$vlan_parent" &> /dev/null; then
>       # if the VLAN connection is configured with a connection UUID as parent, we need to find the underlying device
>       # and create the bridge against it, as the parent connection can be replaced by another bridge.
>       vlan_parent=$(nmcli --get-values GENERAL.DEVICES conn show ${vlan_parent})
>     fi

Reference:

Version-Release number of selected component (if applicable):

4.12.42

How reproducible:

Should be reproducible by setting up inactive nmcli connections with the same names as the active ones

Steps to Reproduce:

Not tested, but this should be something like:
1. Create inactive nmcli connections with the same names as the active ones
2. Run the script

Actual results:

Script failing

Expected results:

The script should manage the connection using the UUID instead of the Name.
Or maybe it's an underlying issue with how nmcli manages the relationship between objects.

Additional info:

The issue may be related to the way that nmcli works, as it should use the UUID to match the `vlan.parent` as it does with the `connection.master`.

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Hypershift's ignition server expects certain headers. Assisted needs to gather all of this data and pass it when fetching the ignition from hypershift's ignition server.

Work items:

  1. Modify DB host to contain ignition information
  2. Modify agent kube api and agent controller to get additional information needed (targetconfighash and nodepool name) and store it in the db host
  3. Use the headers when fetching the ignition (see the sketch after this list)
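A rough sketch of work item 3 is below. The header names used here are assumptions for illustration only; the actual names expected by hypershift's ignition server have to be taken from its API.

// Sketch of fetching ignition with extra request headers (work item 3).
// "NodePool" and "TargetConfigVersionHash" are ASSUMED header names used only
// to illustrate the flow; confirm the real names against hypershift.
package ignitionfetch

import (
	"fmt"
	"io"
	"net/http"
)

func fetchIgnition(url, token, nodePool, targetConfigHash string) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("NodePool", nodePool)                        // assumed header name
	req.Header.Set("TargetConfigVersionHash", targetConfigHash) // assumed header name
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ignition server returned %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}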

Please review the following PR: https://github.com/openshift/coredns/pull/111

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Code to decide whether to update the Pull Secret replicates most of the functionality of the ApplySecret() func in library-go, which it then calls anyway.

This is hard to read, and misleading for anybody wanting to add similar functionality.
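
For comparison, a minimal sketch of what calling the library-go helper directly could look like, assuming the current `resourceapply.ApplySecret` signature; the package and function names here are illustrative, since the helper already performs the create-vs-update decision the duplicated code re-implements.

~~~go
package pullsecret

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"

	"github.com/openshift/library-go/pkg/operator/events"
	"github.com/openshift/library-go/pkg/operator/resource/resourceapply"
)

// syncPullSecret is a hedged sketch: ApplySecret already compares the existing
// Secret with the required one and only writes when something changed, so the
// caller does not need its own "should we update?" logic.
func syncPullSecret(ctx context.Context, client kubernetes.Interface, recorder events.Recorder, required *corev1.Secret) (bool, error) {
	_, changed, err := resourceapply.ApplySecret(ctx, client.CoreV1(), recorder, required)
	return changed, err
}
~~~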

Description of problem:

    On the PipelineRun list page, while fetching TaskRuns for a particular PipelineRun, show a loading indicator if the TaskRuns have not been fetched yet

Version-Release number of selected component (if applicable):

4.15.z    

How reproducible:

    Sometimes

Steps to Reproduce:

    1.Create a failed pipelinerun
    2.Check Task Status field
    3.
    

Actual results:

    Sometimes TaskRun Status value is -

Expected results:

    Should show status bars

Additional info:

    

Description of the problem:

non-lowercase hostname in DHCP breaks assisted installation

How reproducible:

100%

Steps to reproduce:

  1. https://issues.redhat.com/browse/AITRIAGE-10248
  2. User did ask for a valid requested_hostname

Actual results:

bootkube fails

Expected results:

bootkube should succeed

 

slack thread
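
A minimal sketch of the normalization implied by the expected result, assuming the fix is simply to lowercase the requested hostname before bootkube consumes it; the package and function names are hypothetical.

~~~go
package hostutil

import "strings"

// NormalizeRequestedHostname is a hypothetical sketch: Kubernetes node names
// must be lowercase RFC 1123 subdomains, so a hostname handed out by DHCP with
// uppercase characters is trimmed and lowercased before it is used as the
// requested_hostname / node name.
func NormalizeRequestedHostname(hostname string) string {
	return strings.ToLower(strings.TrimSpace(hostname))
}
~~~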

Please review the following PR: https://github.com/openshift/csi-external-attacher/pull/66

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-42714. The following is the description of the original issue:

This is a clone of issue OCPBUGS-36261. The following is the description of the original issue:

Description of problem:

In hostedcluster installations, when the following OAuthServer service is configured without any hostname parameter, the oauth route is created in the management cluster with the standard hostname, which follows the pattern of the ingresscontroller wildcard domain (oauth-<hosted-cluster-namespace>.<wildcard-default-ingress-controller-domain>):

~~~
$ oc get hostedcluster -n <namespace> <hosted-cluster-name> -oyaml
  - service: OAuthServer
    servicePublishingStrategy:
      type: Route
~~~  

On the other hand, if any custom hostname parameter is configured, the oauth route is created in the management cluster with the following labels: 

~~~
$ oc get hostedcluster -n <namespace> <hosted-cluster-name> -oyaml
  - service: OAuthServer
    servicePublishingStrategy:
      route:
        hostname: oauth.<custom-domain>
      type: Route

$ oc get routes -n hcp-ns --show-labels
NAME    HOST/PORT             LABELS
oauth oauth.<custom-domain>  hypershift.openshift.io/hosted-control-plane=hcp-ns <---
~~~

The configured label means the ingresscontroller does not admit the route, as the following configuration is added by the hypershift operator to the default ingresscontroller resource: 

~~~
$ oc get ingresscontroller -n openshift-ingress-default default -oyaml
    routeSelector:
      matchExpressions:
      - key: hypershift.openshift.io/hosted-control-plane <---
        operator: DoesNotExist <---
~~~

This configuration should be allowed, as there are use cases where the route should have a customized hostname. Currently the HCP platform does not allow this configuration and the oauth route does not work.

Version-Release number of selected component (if applicable):

   4.15

How reproducible:

    Easily

Steps to Reproduce:

    1. Install HCP cluster 
    2. Configure OAuthServer with type Route 
    3. Add a custom hostname different than default wildcard ingress URL from management cluster
    

Actual results:

    Oauth route is not admitted

Expected results:

    Oauth route should be admitted by Ingresscontroller

Additional info:

    

This is a clone of issue OCPBUGS-33926. The following is the description of the original issue:

Description of problem:

During the creation of a 4.16 cluster using the nightly build (--channel-group nightly --version 4.16.0-0.nightly-2024-05-19-235324) with the following command: 

rosa create cluster --cluster-name $CLUSTER_NAME --sts --mode auto --machine-cidr 10.0.0.0/16 --compute-machine-type m6a.xlarge --region $REGION --oidc-config-id $OIDC_ID --channel-group nightly --version 4.16.0-0.nightly-2024-05-19-235324 --ec2-metadata-http-tokens optional --replicas 2 --service-cidr 172.30.0.0/16 --pod-cidr 10.128.0.0/14 --host-prefix 23 -y

How reproducible:

1. Run the command provided above to create a cluster.
2. Observe the error during the IAM role creation step.

Actual results:

time="2024-05-20T03:21:03Z" level=error msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed during pre-provisioning: failed to create IAM roles: failed to create inline policy for role master: AccessDenied: User: arn:aws:sts::890193308254:assumed-role/ManagedOpenShift-Installer-Role/1716175231092827911 is not authorized to perform: iam:PutRolePolicy on resource: role ManagedOpenShift-ControlPlane-Role because no identity-based policy allows the iam:PutRolePolicy action\n\tstatus code: 403, request id: 27f0f631-abdd-47e9-ba02-a2e71a7487dc"
time="2024-05-20T03:21:04Z" level=error msg="error after waiting for command completion" error="exit status 4" installID=wx9l766h
time="2024-05-20T03:21:04Z" level=error msg="error provisioning cluster" error="exit status 4" installID=wx9l766h
time="2024-05-20T03:21:04Z" level=error msg="error running openshift-install, running deprovision to clean up" error="exit status 4" installID=wx9l766h
time="2024-05-20T03:21:04Z" level=debug msg="OpenShift Installer v4.16.0

Expected results:

The cluster should be created successfully without IAM permission errors.

Additional info:

- The IAM role ManagedOpenShift-Installer-Role does not have the necessary permissions to perform iam:PutRolePolicy on the ManagedOpenShift-ControlPlane-Role.

- This issue was observed with the nightly build 4.16.0-0.nightly-2024-05-19-235324.

 

More context: https://redhat-internal.slack.com/archives/C070BJ1NS1E/p1716182046041269

Please review the following PR: https://github.com/openshift/thanos/pull/134

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-39393. The following is the description of the original issue:

This is a clone of issue OCPBUGS-39111. The following is the description of the original issue:

Gather the nodenetworkconfigurationpolicy.nmstate.io/v1 and nodenetworkstate.nmstate.io/v1beta1 cluster-scoped resources in the Insights data. These CRs are introduced by the NMState operator.
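
A rough sketch of how a gatherer could list these cluster-scoped CRs with a dynamic client; the GVRs follow the API versions named above, while the package, function name, and return shape are assumptions.

~~~go
package gather

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// NMState resources to gather; both are cluster scoped, so no namespace is needed.
var (
	nncpGVR = schema.GroupVersionResource{Group: "nmstate.io", Version: "v1", Resource: "nodenetworkconfigurationpolicies"}
	nnsGVR  = schema.GroupVersionResource{Group: "nmstate.io", Version: "v1beta1", Resource: "nodenetworkstates"}
)

// gatherNMStateResources is a hedged sketch of the gatherer described above:
// it lists both CR types and returns the raw objects for the Insights archive.
func gatherNMStateResources(ctx context.Context, client dynamic.Interface) ([]unstructured.Unstructured, error) {
	var records []unstructured.Unstructured
	for _, gvr := range []schema.GroupVersionResource{nncpGVR, nnsGVR} {
		list, err := client.Resource(gvr).List(ctx, metav1.ListOptions{})
		if err != nil {
			return nil, err
		}
		records = append(records, list.Items...)
	}
	return records, nil
}
~~~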

Description of problem:

   As a logged-in user, I'm unable to log out from a cluster with an external OIDC provider.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Login into cluster with external OIDC setup
    2.
    3.
    

Actual results:

    Unable to logout

Expected results:

    Logout successfully 

Additional info:

    

Please review the following PR: https://github.com/openshift/api/pull/1700

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

W0109 17:47:02.340203       1 builder.go:109] graceful termination failed, controllers failed with error: failed to get infrastructure name: infrastructureName not set in infrastructure 'cluster'

This is a clone of issue OCPBUGS-42164. The following is the description of the original issue:

This is a clone of issue OCPBUGS-42143. The following is the description of the original issue:

Description of problem:

    Another panic occurred in https://issues.redhat.com/browse/OCPBUGS-34877?focusedId=25580631&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-25580631 which should be fixed

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description

Seen in 4.15-related update CI:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/console.*condition/Available.*status/False' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's|.*clusteroperator/\([^ ]*\) condition/Available reason/\([^ ]*\) status/False[^:]*: \(.*\)|\1 \2 \3|' | sed 's|[.]apps[.][^ /]*|.apps...|g' | sort | uniq -c | sort -n
      1 console RouteHealth_FailedGet failed to GET route (https://console-openshift-console.apps... Get "https://console-openshift-console.apps... dial tcp 52.158.160.194:443: connect: connection refused
      1 console RouteHealth_StatusError route not yet available, https://console-openshift-console.apps... returns '503 Service Unavailable'
      2 console RouteHealth_FailedGet failed to GET route (https://console-openshift-console.apps... Get "https://console-openshift-console.apps... dial tcp: lookup console-openshift-console.apps... on 172.30.0.10:53: no such host
      2 console RouteHealth_FailedGet failed to GET route (https://console-openshift-console.apps... Get "https://console-openshift-console.apps... EOF
      8 console RouteHealth_RouteNotAdmitted console route is not admitted
     16 console RouteHealth_FailedGet failed to GET route (https://console-openshift-console.apps... Get "https://console-openshift-console.apps... context deadline exceeded (Client.Timeout exceeded while awaiting headers)

For example this 4.14 to 4.15 run had:

: [bz-Management Console] clusteroperator/console should not change condition/Available 
Run #0: Failed 	1h25m23s
{  1 unexpected clusteroperator state transitions during e2e test run 

Nov 28 03:42:41.207 - 1s    E clusteroperator/console condition/Available reason/RouteHealth_FailedGet status/False RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.ci-op-d2qsp1gp-2a31d.aws-2.ci.openshift.org): Get "https://console-openshift-console.apps.ci-op-d2qsp1gp-2a31d.aws-2.ci.openshift.org": context deadline exceeded (Client.Timeout exceeded while awaiting headers)}

While a timeout for the console Route isn't fantastic, an issue that only persists for 1s is not long enough to warrant immediate admin intervention. Teaching the console operator to stay Available=True for this kind of brief hiccup, while still going Available=False for issues where at least part of the component is non-functional and immediate administrator intervention is required, would make it easier for admins and SREs operating clusters to identify when intervention is actually needed.
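
A hedged sketch of what that tolerance could look like inside the operator's route-health check; the grace period, package name, and condition wiring are assumptions, not the console operator's actual code.

~~~go
package health

import (
	"time"

	operatorv1 "github.com/openshift/api/operator/v1"
)

// routeUnavailableGrace is an assumed tolerance window; the real value would
// be chosen by the console team.
const routeUnavailableGrace = 90 * time.Second

// availableCondition is a hedged sketch of the behaviour described above: a
// route-health failure only flips Available to False once it has persisted
// beyond the grace window, so one-second blips no longer look like outages
// that need immediate admin intervention.
func availableCondition(failingSince, now time.Time, checkErr error) operatorv1.OperatorCondition {
	if checkErr == nil {
		return operatorv1.OperatorCondition{Type: "Available", Status: operatorv1.ConditionTrue, Reason: "AsExpected"}
	}
	if now.Sub(failingSince) < routeUnavailableGrace {
		// Still within the grace window: keep Available=True and surface the
		// problem via Degraded/Progressing instead.
		return operatorv1.OperatorCondition{Type: "Available", Status: operatorv1.ConditionTrue, Reason: "RouteHealthDegraded", Message: checkErr.Error()}
	}
	return operatorv1.OperatorCondition{Type: "Available", Status: operatorv1.ConditionFalse, Reason: "RouteHealth_FailedGet", Message: checkErr.Error()}
}
~~~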

Version-Release number of selected component

At least 4.15. Possibly other versions; I haven't checked.

How reproducible

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/console.*condition/Available.*status/False' | grep 'periodic.*failures match' | sort
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 12 runs, 17% failed, 50% of failures match = 8% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-ppc64le (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-s390x (all) - 4 runs, 100% failed, 25% of failures match = 25% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 12 runs, 17% failed, 100% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-arm64 (all) - 7 runs, 29% failed, 50% of failures match = 14% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 12 runs, 25% failed, 33% of failures match = 8% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 80 runs, 23% failed, 28% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 80 runs, 28% failed, 23% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade (all) - 63 runs, 38% failed, 8% of failures match = 3% impact
periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-azure-sdn-upgrade (all) - 60 runs, 73% failed, 11% of failures match = 8% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade (all) - 70 runs, 7% failed, 20% of failures match = 1% impact

Seems like it's primarily minor-version updates that trip this, and in jobs with high run counts, the impact percentage is single-digits.

Steps to reproduce

There may be a way to reliably trigger these hiccups, but as a reproducer floor, running days of CI and checking whether the impact percentages decrease would be a good way to test fixes post-merge.

Actual results

Lots of console ClusterOperator going Available=False blips in 4.15 update CI.

Expected results

Console goes Available=False if and only if immediate admin intervention is appropriate.

Please review the following PR: https://github.com/openshift/cluster-monitoring-operator/pull/2190

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

documentationBaseURL still points to 4.14

Version-Release number of selected component (if applicable):

4.16 

How reproducible:

Always

Steps to Reproduce:

1.Check documentationBaseURL on 4.16 cluster: 
# oc get configmap console-config -n openshift-console -o yaml | grep documentationBaseURL
      documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/

2.
3.

Actual results:

1.documentationBaseURL is still pointing to 4.14

Expected results:

1.documentationBaseURL should point to 4.16

Additional info:

 

Description of problem:

We shouldn't enforce PSa in 4.16, neither by label sync nor by global cluster config.

Version-Release number of selected component (if applicable):

4.16

How reproducible:

100%

Steps to Reproduce:

As a cluster admin:
1. create two new namespaces/projects: pokus, openshift-pokus
2. as a cluster-admin, attempt to create a privileged pod in both the namespaces from 1.

Actual results:

pod creation is blocked by pod security admission

Expected results:

only a warning about pod violating the namespace pod security level should be emitted

Additional info:

 

As a maintainer of the HyperShift repo, I would like to remove unused functions from the code base to reduce the code footprint of the repo.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-33566. The following is the description of the original issue:

Description of problem:

When the cloud-credential operator is used in manual mode and awsSTSIAMRoleARN is not present in the secret, the operator pods throw aggressive errors every second. 

One of the customer's concerns is the number of errors from the operator pods

Two errors per second
============================
time="2024-05-10T00:43:45Z" level=error msg="error syncing credentials: an empty awsSTSIAMRoleARN was found so no Secret was created" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator secret=openshift-cluster-csi-drivers/ebs-cloud-credentials

time="2024-05-10T00:43:46Z" level=error msg="errored with condition: CredentialsProvisionFailure" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator secret=openshift-cluster-csi-drivers/ebs-cloud-credentials

Version-Release number of selected component (if applicable):

    4.15.3

How reproducible:

    Always present in managed rosa clusters 

Steps to Reproduce:

    1.create a rosa cluster 
    2.check the errors of cloud credentials operator pods 
    3.
    

Actual results:

    The CCO logs continually throw errors

Expected results:

    The CCO logs should not be continually throwing these errors.

Additional info:

    The focus of this bug is only to remove the error lines from the logs. The underlying issue, of continually attempting to reconcile the CRs will be handled by other bugs.
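
A minimal sketch of the log-level change implied above, assuming a logrus-style logger (which the operator's log format suggests); the function and variable names are hypothetical.

~~~go
package credreq

import (
	log "github.com/sirupsen/logrus"
)

// handleManualSTSCredentialsRequest is a hypothetical sketch: in manual STS
// mode an empty awsSTSIAMRoleARN is an expected situation for some
// CredentialsRequests, so it is logged at debug level and the sync returns
// success instead of emitting error lines every second.
func handleManualSTSCredentialsRequest(credReq, roleARN string) error {
	if roleARN == "" {
		log.WithField("cr", credReq).Debug("empty awsSTSIAMRoleARN found; no Secret will be created")
		return nil
	}
	// ... create or update the Secret for the role ARN as before ...
	return nil
}
~~~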

Please review the following PR: https://github.com/openshift/operator-framework-rukpak/pull/67

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cloud-provider-gcp/pull/52

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-kube-storage-version-migrator-operator/pull/102

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-36263. The following is the description of the original issue:

The new test: [sig-node] kubelet metrics endpoints should always be reachable

It is picking up some upgrade job runs where we see the metrics endpoint go down for about 30 seconds during the generic node update phase and recover before we reboot the node. Because there was no overlap with a reboot, the check as initially written treats this as a reason to flake the test.

Example: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade/1806142925785010176
Interval chart showing the problem: https://sippy.dptools.openshift.org/sippy-ng/job_runs/1806142925785010176/periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade/intervals?filterText=master-1&intervalFile=e2e-timelines_spyglass_20240627-024633.json&overrideDisplayFlag=0&selectedSources=E2EFailed&selectedSources=MetricsEndpointDown&selectedSources=NodeState

The master outage at 3:30:59 is causing a flake when I'd rather it didn't, because it doesn't extend into the reboot.

I'd like to tighten this up to include any overlap with update.

Will be backported to 4.16 to tighten the signal there as well.
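
A hedged sketch of the tightened overlap rule, with simplified types; the real monitor-test code uses its own interval structures, so everything here is illustrative.

~~~go
package monitortest

import "time"

// interval is a simplified stand-in for the monitor's event intervals.
type interval struct {
	from, to time.Time
}

// overlaps reports whether two intervals share any time at all.
func (a interval) overlaps(b interval) bool {
	return a.from.Before(b.to) && b.from.Before(a.to)
}

// outageTolerated is a hedged sketch of the tightened rule described above: a
// metrics-endpoint outage is tolerated only when it overlaps a node reboot or
// the broader node-update window, rather than being excused for any outage
// that merely avoids the reboot.
func outageTolerated(outage interval, reboots, updates []interval) bool {
	windows := make([]interval, 0, len(reboots)+len(updates))
	windows = append(windows, reboots...)
	windows = append(windows, updates...)
	for _, w := range windows {
		if outage.overlaps(w) {
			return true
		}
	}
	return false
}
~~~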

Please review the following PR: https://github.com/openshift/azure-disk-csi-driver/pull/70

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-39134. The following is the description of the original issue:

This is a clone of issue OCPBUGS-31738. The following is the description of the original issue:

Description of problem:

The [Jira:"Network / ovn-kubernetes"] monitor test pod-network-avalibility setup test frequently fails on OpenStack platform, which in turn also causes the [sig-network] can collect pod-to-service poller pod logs and [sig-network] can collect host-to-service poller pod logs tests to fail.

These failures happen frequently in vh-mecha, for example in all CSI jobs, such as 4.16-e2e-openstack-csi-cinder.

   

Recently lextudio dropped pyasn1, so we want to be explicit and show that we install pysnmp-lextudio but the normal pyasn1.

This is a clone of issue OCPBUGS-34714. The following is the description of the original issue:

Description of problem:

While creating an install configuration for PowerVS IPI, the default region is not set, leading to the survey getting stuck if nothing is entered at the command line.

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Always

Steps to Reproduce:

    1. openshift-install create install-config
    

Actual results:

No default region is selected and hence pressing enter on the first option without going up or down results in the error "Sorry, your reply was invalid: invalid region """

Expected results:

    dal gets selected as the default

Additional info:
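
A hedged sketch of the expected fix, assuming the installer's interactive survey uses the AlecAivazis/survey library for its prompts; the function name, option-list handling, and the `dal` default are illustrative only.

~~~go
package powervs

import (
	"github.com/AlecAivazis/survey/v2"
)

// askRegion is a hypothetical sketch: giving the prompt a Default means that
// pressing Enter without moving the cursor yields a valid region ("dal")
// instead of the `invalid region ""` error.
func askRegion(regions []string) (string, error) {
	var region string
	prompt := &survey.Select{
		Message: "Region",
		Help:    "The Power VS region to be used for installation.",
		Options: regions,
		Default: "dal", // assumed default region
	}
	err := survey.AskOne(prompt, &region)
	return region, err
}
~~~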

    

Description of problem:

    Remove react-helmet from the list of shared modules in console/frontend/packages/console-dynamic-plugin-sdk/src/shared-modules.ts (noted as deprecated from last release)

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

console-operator is updating the OIDC status without checking the feature gate    

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Set up an OCP cluster without an external OIDC provider, using the default OAuth.

Steps to Reproduce:

    1. 
    2.
    3.
    

Actual results:

the OIDC-related conditions are being surfaced in the console-operator's config conditions.

Expected results:

    the OIDC related conditions should not be surfaced in the console-operator's config conditions.

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-api-provider-baremetal/pull/207

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When an OpenShift Container Platform cluster is installed on Bare Metal with RHACM, the "metal3-plugin" for the OpenShift Console is installed automatically.

The "Nodes view (`<console>/k8s/cluster/core~v1~Node`) uses the `BareMetalNodesTable` which has very limited columns. However in the meantime OCP improved their Nodes table and added more features (like metrics) and we havent done any work in metal3. Customers are missing information like metrics or Pods, which are present in the standard Node view.

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.12.33

How reproducible:

Always

Steps to Reproduce:

    1. Install a cluster using RHACM on Bare Metal
    2. Ensure the "metal3-plugin" is enabled
    3. Navigate to the "Nodes" view in the OpenShift Container Platform Console (`<console>/k8s/cluster/core~v1~Node`)

Actual results:

Only limited columns (Name, Status, Role, Machine, Management Address) are visible. Columns like Memory, CPU, Pods, Filesystem, and Instance Type are missing.

Expected results:

All the columns from the standard view are visible, plus the "Management Address" column

Additional info:

* Issue was discussed here: https://redhat-internal.slack.com/archives/C027TN14SGJ/p1702552957981989
* Screenshot of non-metal3 cluster: https://redhat-internal.slack.com/archives/C027TN14SGJ/p1702552980878029?thread_ts=1702552957.981989&cid=C027TN14SGJ
* Screenshot of metal3 cluster: https://redhat-internal.slack.com/archives/C027TN14SGJ/p1702552995363389?thread_ts=1702552957.981989&cid=C027TN14SGJ

Description of problem:

Inspection is failing on hosts where special characters are found in the serial number of block devices:

Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]: 2024-07-03 09:16:11.325 1 DEBUG ironic_python_agent.inspector [-] collected data: {'inventory'....'error': "The following errors were encountered:\n* collector logs failed: 'utf-8' codec can't decode byte 0xff in position 12: invalid start byte"} call_inspector /usr/lib/python3.9/site-packages/ironic_python_agent/inspector.py:128

Serial found:
"serial": "2HC015KJ0000\udcff\udcff\udcff\udcff\udcff\udcff\udcff\udcff"

Interesting stacktrace error:
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: UnicodeEncodeError: 'utf-8' codec can't encode characters in position 1260-1267: surrogates not allowed

Full stack trace:
~~~
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: 2024-07-03 09:16:11.628 1 DEBUG oslo_concurrency.processutils [-] CMD "lsblk -bia --json -oKNAME,MODEL,SIZE,ROTA,TYPE,UUID,PARTUUID,SERIAL" returned: 0 in 0.006s e
xecute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:422
Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]: --- Logging error ---
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: --- Logging error ---
Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]: Traceback (most recent call last):
Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]:   File "/usr/lib64/python3.9/logging/__init__.py", line 1086, in emit
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: Traceback (most recent call last):
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib64/python3.9/logging/__init__.py", line 1086, in emit
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     stream.write(msg + self.terminator)
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: UnicodeEncodeError: 'utf-8' codec can't encode characters in position 1260-1267: surrogates not allowed
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: Call stack:
Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]:     stream.write(msg + self.terminator)
Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]: UnicodeEncodeError: 'utf-8' codec can't encode characters in position 1260-1267: surrogates not allowed
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/bin/ironic-python-agent", line 10, in <module>
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     sys.exit(run())
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib/python3.9/site-packages/ironic_python_agent/cmd/agent.py", line 50, in run
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     agent.IronicPythonAgent(CONF.api_url,
Jul 03 09:16:11 master3.xxxxxx.yyy ironic-agent[2272]: Call stack:
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib/python3.9/site-packages/ironic_python_agent/agent.py", line 485, in run
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     self.process_lookup_data(content)
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib/python3.9/site-packages/ironic_python_agent/agent.py", line 400, in process_lookup_data
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     hardware.cache_node(self.node)
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 3179, in cache_node
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     dispatch_to_managers('wait_for_disks')
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 3124, in dispatch_to_managers
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     return getattr(manager, method)(*args, **kwargs)
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 997, in wait_for_disks
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     self.get_os_install_device()
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 1518, in get_os_install_device
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     block_devices = self.list_block_devices_check_skip_list(
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 1495, in list_block_devices_check_skip_list
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     block_devices = self.list_block_devices(
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 1460, in list_block_devices
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     block_devices = list_all_block_devices()
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 526, in list_all_block_devices
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     report = il_utils.execute('lsblk', '-bia', '--json',
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib/python3.9/site-packages/ironic_lib/utils.py", line 111, in execute
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     _log(result[0], result[1])
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:   File "/usr/lib/python3.9/site-packages/ironic_lib/utils.py", line 99, in _log
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]:     LOG.debug('Command stdout is: "%s"', stdout)
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: Message: 'Command stdout is: "%s"'
Jul 03 09:16:11 master3.xxxxxx.yyy podman[2234]: Arguments: ('{\n   "blockdevices": [\n      {\n         "kname": "loop0",\n         "model": null,\n         "size": 67467313152,\n         "rota": false,\n         "type": "loop",\n         "uuid": "28f5ff52-7f5b-4e5a-bcf2-59813e5aef5a",\n         "partuuid": null,\n         "serial": null\n      },{\n         "kname": "loop1",\n         "model": null,\n         "size": 1027846144,\n         "rota": false,\n         "type": "loop",\n         "uuid": null,\n         "partuuid": null,\n         "serial": null\n      },{\n         "kname": "sda",\n         "model": "LITEON IT ECE-12",\n         "size": 120034123776,\n         "rota": false,\n         "type": "disk",\n         "uuid": null,\n         "partuuid": null,\n         "serial": "XXXXXXXXXXXXXXXXXX"\n      },{\n         "kname": "sdb",\n         "model": "LITEON IT ECE-12",\n         "size": 120034123776,\n         "rota": false,\n         "type": "disk",\n         "uuid": null,\n         "partuuid": null,\n         "serial": "XXXXXXXXXXXXXXXXXXXX"\n      },{\n         "kname": "sdc",\n         "model": "External",\n         "size": 0,\n         "rota": true,\n         "type": "disk",\n         "uuid": null,\n         "partuuid": null,\n         "serial": "2HC015KJ0000\udcff\udcff\udcff\udcff\udcff\udcff\udcff\udcff"\n      }\n   ]\n}\n',)
~~~

Version-Release number of selected component (if applicable):

OCP 4.14.28

How reproducible:

Always

Steps to Reproduce:

    1. Add a BMH with a bad utf-8 characters in serial
    2.
    3.
    

Actual results:

Inspection fail

Expected results:

Inspection works

Additional info:

    

 

This is a clone of issue OCPBUGS-41685. The following is the description of the original issue:

This is a clone of issue OCPBUGS-37584. The following is the description of the original issue:

Description of problem:

Topology screen crashes and reports "Oh no! something went wrong" when a pod in completed state is selected.

Version-Release number of selected component (if applicable):

RHOCP 4.15.18    

How reproducible:

100%

Steps to Reproduce:

1. Switch to developer mode
2. Select Topology
3. Select a project that has completed cron jobs like openshift-image-registry
4. Click the green CronJob Object
5. Observe Crash

Actual results:

The Topology screen crashes with error "Oh no! Something went wrong."

Expected results:

After clicking the completed pod / workload, the screen should display the information related to it.

Additional info:

    

Description of problem:

For many 4.y releases, since before 4.11 and through all the minor versions that are still supported, CRI-O has wiped images when it comes up after a node reboot and notices it has a new (minor?) version. This causes redundant pulls, as seen in this 4.11-to-4.12 update run:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-azure-sdn-upgrade/1732741139229839360/artifacts/e2e-azure-sdn-upgrade/gather-extra/artifacts/nodes/ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4/journal | zgrep 'Starting update from rendered-\|crio-wipe\|Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2'
Dec 07 13:05:42.474144 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Succeeded.
Dec 07 13:05:42.481470 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Consumed 191ms CPU time
Dec 07 13:59:51.000686 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 crio[1498]: time="2023-12-07 13:59:51.000591203Z" level=info msg="Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2" id=a62bc972-67d7-401a-9640-884430bd16f1 name=/runtime.v1.ImageService/PullImage
Dec 07 14:00:55.745095 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 root[101294]: machine-config-daemon[99469]: Starting update from rendered-worker-ca36a33a83d49b43ed000fd422e09838 to rendered-worker-c0b3b4eadfe6cdfb595b97fa293a9204: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
Dec 07 14:05:33.274241 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Succeeded.
Dec 07 14:05:33.289605 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Consumed 216ms CPU time
Dec 07 14:14:50.277011 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 crio[1573]: time="2023-12-07 14:14:50.276961087Z" level=info msg="Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2" id=1a092fbd-7ffa-475a-b0b7-0ab115dbe173 name=/runtime.v1.ImageService/PullImage

The redundant pulls cost network and disk traffic, and avoiding them should make those update-initiated reboots quicker and cheaper. The lack of update-initiated wipes is not expected to cost much, because the Kubelet's old-image garbage collection should be along to clear out any no-longer-used images if disk space gets tight.

Version-Release number of selected component (if applicable):

At least 4.11. Possibly older 4.y; I haven't checked.

How reproducible:

Every time.

Steps to Reproduce:

1. Install a cluster.
2. Update to a release image with a different CRI-O (minor?) version.
3. Check logs on the nodes.

Actual results:

crio-wipe entries in the logs, with reports of target-release images being pulled before and after those wipes, as I quoted in the Description.

Expected results:

Target-release images pulled before the reboot, and found in the local cache if that image is needed again post-reboot.

Description of problem:

It seems something might be wrong with the logic for the new defaultChannel property. After initially syncing an operator to a tarball, subsequent runs complain the catalog is invalid, as if defaultChannel was never set.

Version-Release number of selected component (if applicable):

I tried oc-mirror v4.14.16 and v4.15.2

How reproducible:

100%

Steps to Reproduce:

1. Write this yaml config to an isc.yaml file in an empty dir. (It is worth noting that right now the default channel for this operator is of course something else – currently `latest`.)

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: ./operator-images
mirror:
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.14
      packages:
        - name: openshift-pipelines-operator-rh
          defaultChannel: pipelines-1.11
          channels:
            - name: pipelines-1.11
              minVersion: 1.11.3
              maxVersion: 1.11.3

2. Using oc-mirror v4.14.16 or v4.15.2, run:

oc-mirror -c ./isc.yaml file://operator-images

3. Without the defaultChannel property and a recent version of oc-mirror, that would have failed. Assuming it succeeds, run the same command a second time (with or without the --dry-run option) and note that it now fails. It seems nothing can be done. oc-mirror says the catalog is invalid.
 

Actual results:

$ oc-mirror -c ./isc.yaml file://operator-images
Creating directory: operator-images/oc-mirror-workspace/src/publish
Creating directory: operator-images/oc-mirror-workspace/src/v2
Creating directory: operator-images/oc-mirror-workspace/src/charts
Creating directory: operator-images/oc-mirror-workspace/src/release-signatures
No metadata detected, creating new workspace
wrote mirroring manifests to operator-images/oc-mirror-workspace/operators.1711523827/manifests-redhat-operator-index
To upload local images to a registry, run:
        oc adm catalog mirror file://redhat/redhat-operator-index:v4.14 REGISTRY/REPOSITORY
<dir>
  openshift-pipelines/pipelines-chains-controller-rhel8
    blobs:
      registry.redhat.io/openshift-pipelines/pipelines-chains-controller-rhel8 sha256:b06cce9e748bd5e1687a8d2fb11e5e01dd8b901eeeaa1bece327305ccbd62907 11.51KiB
      registry.redhat.io/openshift-pipelines/pipelines-chains-controller-rhel8 sha256:e5897b8264878f1f63f6eceed870b939ff39993b05240ce8292f489e68c9bd19 11.52KiB
...
  stats: shared=12 unique=274 size=24.71GiB ratio=0.98
info: Mirroring completed in 9m45.86s (45.28MB/s)
Creating archive operator-images/mirror_seq1_000000.tar


$ oc-mirror -c ./isc.yaml file://operator-images
Found: operator-images/oc-mirror-workspace/src/publish
Found: operator-images/oc-mirror-workspace/src/v2
Found: operator-images/oc-mirror-workspace/src/charts
Found: operator-images/oc-mirror-workspace/src/release-signatures
The current default channel was not valid, so an attempt was made to automatically assign a new default channel, which has failed.
The failure occurred because none of the remaining channels contain an "olm.channel" priority property, so it was not possible to establish a channel to use as the default channel.

This can be resolved by one of the following changes:
1) assign an "olm.channel" property on the appropriate channels to establish a channel priority
2) modify the default channel manually in the catalog
3) by changing the ImageSetConfiguration to filter channels or packages in such a way that it will include a package version that exists in the current default channel

The rendered catalog is invalid.

Run "oc-mirror list operators --catalog CATALOG-NAME --package PACKAGE-NAME" for more information.

error: error generating diff: the current default channel "latest" for package "openshift-pipelines-operator-rh" could not be determined... ensure that your ImageSetConfiguration filtering criteria results in a package version that exists in the current default channel or use the 'defaultChannel' field

Expected results:

It should NOT throw that error and instead should either update (if you've added more to the imagesetconfig) or gracefully print the "No new images" message.

 

Description of problem:

Recently, the passing rate for test "static pods should start after being created" has dropped significantly for some platforms: 

https://sippy.dptools.openshift.org/sippy-ng/tests/4.15/analysis?test=%5Bsig-node%5D%20static%20pods%20should%20start%20after%20being%20created&filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22%5Bsig-node%5D%20static%20pods%20should%20start%20after%20being%20created%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D

Take a look at this example: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-techpreview/1712803313642115072

The test failed with the following message:
{  static pod lifecycle failure - static pod: "kube-controller-manager" in namespace: "openshift-kube-controller-manager" for revision: 6 on node: "ci-op-2z99zzqd-7f99c-rfp4q-master-0" didn't show up, waited: 3m0s}

Seemingly revision 6 was never reached. But if we look at the log from kube-controller-manager-operator, it jumps from revision 5 to revision 7: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-techpreview/1712803313642115072/artifacts/e2e-azure-sdn-techpreview/gather-extra/artifacts/pods/openshift-kube-controller-manager-operator_kube-controller-manager-operator-7cd978d745-bcvkm_kube-controller-manager-operator.log

The log also indicates that there is a possibility of race:

W1013 12:59:17.775274       1 staticpod.go:38] revision 7 is unexpectedly already the latest available revision. This is a possible race!

This might be a static pod controller issue, but I am starting with the kube-controller-manager component for this case. Feel free to reassign. 

Here is a slack thread related to this:
https://redhat-internal.slack.com/archives/C01CQA76KMX/p1697472297510279

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of the problem:
OCI external platform should be shown as Tech Preview when OCP 4.14 is selected.
 

https://redhat-internal.slack.com/archives/C04RBMZCBGW/p1711029226861489

How reproducible:

 

Steps to reproduce:

1.

2.

3.

Actual results:

 

Expected results:

This is a clone of issue OCPBUGS-41552. The following is the description of the original issue:

This is a clone of issue OCPBUGS-39420. The following is the description of the original issue:

Description of problem:

ROSA HCP allows customers to select hostedcluster and nodepool OCP z-stream versions, respecting version skew requirements. E.g.:

  • A 4.15.28 hostedcluster with
  • A 4.15.28 nodepool
  • A 4.15.25 nodepool

Version-Release number of selected component (if applicable):

Reproducible on 4.14-4.16.z, this bug report demonstrates it for a 4.15.28 hostedcluster with a 4.15.25 nodepool

How reproducible:

100%    

Steps to Reproduce:

    1. Create a ROSA HCP cluster, which comes with a 2-replica nodepool with the same z-stream version (4.15.28)
    2. Create an additional nodepool at a different version (4.15.25)
    

Actual results:

Observe that while nodepool objects report the different version (4.15.25), the resulting kernel version of the node is that of the hostedcluster (4.15.28)

❯ k get nodepool -n ocm-staging-2didt6btjtl55vo3k9hckju8eeiffli8                                                                                    
NAME                     CLUSTER       DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
mshen-hyper-np-4-15-25   mshen-hyper   1               1               False         True         4.15.25   False             False            
mshen-hyper-workers      mshen-hyper   2               2               False         True         4.15.28   False             False  


❯ k get no -owide                                            
NAME                                         STATUS   ROLES    AGE   VERSION            INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                  CONTAINER-RUNTIME
ip-10-0-129-139.us-west-2.compute.internal   Ready    worker   24m   v1.28.12+396c881   10.0.129.139   <none>        Red Hat Enterprise Linux CoreOS 415.92.202408100433-0 (Plow)   5.14.0-284.79.1.el9_2.aarch64   cri-o://1.28.9-5.rhaos4.15.git674ed4c.el9
ip-10-0-129-165.us-west-2.compute.internal   Ready    worker   98s   v1.28.12+396c881   10.0.129.165   <none>        Red Hat Enterprise Linux CoreOS 415.92.202408100433-0 (Plow)   5.14.0-284.79.1.el9_2.aarch64   cri-o://1.28.9-5.rhaos4.15.git674ed4c.el9
ip-10-0-132-50.us-west-2.compute.internal    Ready    worker   30m   v1.28.12+396c881   10.0.132.50    <none>        Red Hat Enterprise Linux CoreOS 415.92.202408100433-0 (Plow)   5.14.0-284.79.1.el9_2.aarch64   cri-o://1.28.9-5.rhaos4.15.git674ed4c.el9

Expected results:

    

Additional info:

 

Description of problem:

After applying a NetworkPolicy to the namespace and performing live migration, pods cannot become ready after the route table MTU is updated and the node reboots.

 

cat <<EOF | oc create -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: z3
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-all-ingress
  namespace: z3
spec:
  ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            team: qe
        podSelector:
          matchLabels:
            name: test
  policyTypes:
    - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
  namespace: z3
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          policy-group.network.openshift.io/ingress: ""
  podSelector: {}
  policyTypes:
  - Ingress
EOF

 

Events:
  Type     Reason                  Age                     From     Message
  ----     ------                  ----                    ----     -------
  Warning  FailedCreatePodSandBox  6m25s (x4579 over 19h)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_hello-hdx8h_z3_8e3a0595-fabd-4953-a460-5c014290122d_0(383f4845fa3cc790f58c5d1a755fa46cc69c220a3669c65422a0423293c9863a): error adding pod z3_hello-hdx8h to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"383f4845fa3cc790f58c5d1a755fa46cc69c220a3669c65422a0423293c9863a" Netns:"/var/run/netns/cbba5e98-ae28-4199-a573-ef1c24013442" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=z3;K8S_POD_NAME=hello-hdx8h;K8S_POD_INFRA_CONTAINER_ID=383f4845fa3cc790f58c5d1a755fa46cc69c220a3669c65422a0423293c9863a;K8S_POD_UID=8e3a0595-fabd-4953-a460-5c014290122d" Path:"" ERRORED: error configuring pod [z3/hello-hdx8h] networking: [z3/hello-hdx8h/8e3a0595-fabd-4953-a460-5c014290122d:openshift-sdn]: error adding container to network "openshift-sdn": failed to add route to 10.128.0.2/14 via SDN: invalid argument
': StdinData: {"binDir":"/var/lib/cni/bin","clusterNetwork":"/host/run/multus/cni/net.d/80-openshift-network.conf","cniVersion":"0.3.1","daemonSocketDir":"/run/multus/socket","globalNamespaces":"default,openshift-multus,openshift-sriov-network-operator","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","namespaceIsolation":true,"type":"multus-shim"}
  Normal  AddedInterface  2m4s  multus  Add eth0 [10.128.0.209/23] from openshift-sdn

 

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.  setup 4.16 cluster 

2.  Create namespace and pods and then apply Networkpolicy

3.  do live migration

Actual results:

After the route table MTU is updated and the node reboots, the pods on that worker cannot become ready, with the error shown in the description.

 

Expected results:

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure
  2. customer issue / SD
  3. internal RedHat testing failure

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

If it is a CI failure:

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

If it is a customer / SD issue:

  • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
  • Don’t presume that Engineering has access to Salesforce.
  • Do presume that Engineering will access attachments through supportshell.
  • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • When showing the results from commands, include the entire command in the output.  
  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
  • Do not set the priority, that is owned by Engineering and will be set when the bug is evaluated
  • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”
  • For guidance on using this template please see
    OCPBUGS Template Training for Networking  components

Description of problem:

E2E test failing   

1. Clone oc-mirror repository
git clone https://github.com/openshift/oc-mirror.git && cd oc-mirror

2. Find the oc-mirror image in the release: https://mirror.openshift.com/pub/openshift-v4/multi/clients/ocp/4.14.0-rc.2/ppc64le/release.txt
oc-mirror quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fff150b00081ed565169de24cfc82481c5017de73986552d15d129530b62e531

3. Pull container
podman pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fff150b00081ed565169de24cfc82481c5017de73986552d15d129530b62e531

4. Extract binary
mkdir bin
container_id=$(podman create quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fff150b00081ed565169de24cfc82481c5017de73986552d15d129530b62e531)
podman cp ${container_id}:usr/bin/oc-mirror bin/oc-mirror

5. Confirm file
[root@rdr-ani-014-bastion-0 oc-mirror]# file bin/oc-mirror
bin/oc-mirror: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), dynamically linked, interpreter /lib64/ld64.so.2, for GNU/Linux 3.10.0, Go BuildID=HuBgap--bII0r0Nw0GxI/SOZCyTWk4pH5ciuQtUO8/ib6uaSW-eAJl24Zzk-G2/O4yxlKreHK_BaH9F4RU6, BuildID[sha1]=c018e70301e18c23f2c119ba451a32aff980d618, with debug_info, not stripped, too many notes (256)
   

6. Build go-toolset and run e2e test
[root@rdr-ani-014-bastion-0 oc-mirror]# podman build -f Dockerfile -t local/go-toolset:latest
Successfully tagged localhost/local/go-toolset:latest
bf24f160059d7ae2ef99a77e6680cdac30e3ba942911b88c7e60dca88fd768f7

[root@rdr-ani-014-bastion-0 oc-mirror]# podman run -it -v $(pwd):/build:z --entrypoint /bin/bash local/go-toolset:latest ./test/e2e/e2e-simple.sh bin/oc-mirror | tee oc-mirror-e2e.log  /build/test/e2e/operator-test.28124 /build
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 49.0M  100 49.0M    0     0  60.2M      0 --:--:-- --:--:-- --:--:--  106M
go: downloading github.com/google/go-containerregistry v0.16.1
go: downloading github.com/docker/cli v24.0.0+incompatible
go: downloading github.com/opencontainers/image-spec v1.1.0-rc3
go: downloading github.com/spf13/cobra v1.7.0
go: downloading github.com/mitchellh/go-homedir v1.1.0
go: downloading golang.org/x/sync v0.2.0
go: downloading github.com/opencontainers/go-digest v1.0.0
go: downloading github.com/docker/distribution v2.8.2+incompatible
go: downloading github.com/google/go-cmp v0.5.9
go: downloading github.com/containerd/stargz-snapshotter/estargz v0.14.3
go: downloading github.com/spf13/pflag v1.0.5
go: downloading github.com/klauspost/compress v1.16.5
go: downloading github.com/vbatts/tar-split v0.11.3
go: downloading github.com/pkg/errors v0.9.1
go: downloading github.com/docker/docker v24.0.0+incompatible
go: downloading golang.org/x/sys v0.8.0
go: downloading github.com/docker/docker-credential-helpers v0.7.0
go: downloading github.com/sirupsen/logrus v1.9.1
bin/registry
/build
INFO: Running 22 test cases
INFO: Running full_catalog
.
.
.
sha256:17de509b5c9e370d501951850ba07f6cbefa529f598f3011766767d1181726b3 localhost.localdomain:5001/skhoury/oc-mirror-dev:4138bec2
info: Mirroring completed in 40ms (119.4kB/s)
worker 0 stopping
worker 1 stopping
worker 5 stopping
worker 3 stopping
worker 2 stopping
worker 3 stopping
worker 2 stopping
worker 4 stopping
work queue exiting
No images specified for pruning
Unpack release signatures
worker 1 stopping
work queue exiting
worker 0 stopping
Wrote release signatures to oc-mirror-workspace/results-1695964813
rebuilding catalog images
Rendering catalog image "localhost.localdomain:5001/skhoury/oc-mirror-dev:test-catalog-latest" with file-based catalog
error: error rebuilding catalog images from file-based catalogs: error regenerating the cache for localhost.localdomain:5001/skhoury/oc-mirror-dev:test-catalog-latest: fork/exec oc-mirror-workspace/images.1753960055/catalogs/localhost.localdomain:5000/skhoury/oc-mirror-dev/test-catalog-latest/bin/opm: exec format error

Version-Release number of selected component (if applicable):

4.14.0-rc.2

How reproducible:

Always

Steps to Reproduce:

 Same as the steps detailed in the description above.

Actual results:

The e2e test terminates partway through execution (the rebuilt catalog's opm binary fails with an exec format error).

Expected results:

E2E testing should pass with no errors

Additional info:

E2E logs are provided here:
oc-mirror-e2e.log - https://github.ibm.com/redstack-power/project-mgmt/issues/3284#issuecomment-63722862
re-oc-mirror-e2e.log - https://github.ibm.com/redstack-power/project-mgmt/issues/3284#issuecomment-63806863  

Please review the following PR: https://github.com/openshift/cluster-policy-controller/pull/144

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.


Many jobs are failing because route53 is throttling us during cluster creation.
We need to make external-dns make fewer calls.

The theoretical minimum is:
list zones - 1 call
list zone records - (# of records / 100) calls
create 3 records per HC - 1-3 calls depending on how they are batched
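
A minimal sketch of the batching idea, using aws-sdk-go; the record names, TTL, and helper function are illustrative placeholders rather than the actual external-dns code:

package dnsbatch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/route53"
)

// upsertHostedClusterRecords writes all records for one HostedCluster in a
// single ChangeResourceRecordSets call instead of one call per record.
func upsertHostedClusterRecords(zoneID string, names []string, target string) error {
	svc := route53.New(session.Must(session.NewSession()))

	changes := make([]*route53.Change, 0, len(names))
	for _, name := range names {
		changes = append(changes, &route53.Change{
			Action: aws.String(route53.ChangeActionUpsert),
			ResourceRecordSet: &route53.ResourceRecordSet{
				Name: aws.String(name),
				Type: aws.String(route53.RRTypeCname),
				TTL:  aws.Int64(300),
				ResourceRecords: []*route53.ResourceRecord{
					{Value: aws.String(target)},
				},
			},
		})
	}

	// One API call for all records instead of one call per record.
	_, err := svc.ChangeResourceRecordSets(&route53.ChangeResourceRecordSetsInput{
		HostedZoneId: aws.String(zoneID),
		ChangeBatch:  &route53.ChangeBatch{Changes: changes},
	})
	return err
}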

Please review the following PR: https://github.com/openshift/machine-api-provider-ibmcloud/pull/31

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/633

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Running without the --node-upgrade-type param fails with "spec.management.upgradeType: Unsupported value: \"\": supported values: \"Replace\", \"InPlace\"",

although --help documents the flag as having a default value of 'InPlace'.
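
A minimal sketch of honoring the documented default, assuming the CLI should fall back to InPlace when the flag is left empty; the options struct and field name are stand-ins for the real hcp code:

package main

import "fmt"

// CreateNodePoolOptions is a stand-in for the hcp CLI's real options struct.
type CreateNodePoolOptions struct {
	NodeUpgradeType string
}

// defaultUpgradeType applies the default documented in --help so the NodePool
// spec never carries an empty upgradeType.
func defaultUpgradeType(opts *CreateNodePoolOptions) {
	if opts.NodeUpgradeType == "" {
		opts.NodeUpgradeType = "InPlace"
	}
}

func main() {
	opts := &CreateNodePoolOptions{}
	defaultUpgradeType(opts)
	fmt.Println(opts.NodeUpgradeType) // prints "InPlace"
}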

Version-Release number of selected component (if applicable):

 [kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp -v
hcp version openshift/hypershift: af9c0b3ce9c612ec738762a8df893c7598cbf157. Latest supported OCP: 4.15.0

How reproducible:

  happens all the time   

Steps to Reproduce:

    1.on an hosted cluster setup run :
[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp create nodepool agent --cluster-name hosted-0 --name nodepool-of-extra1 --node-count 2 --node-upgrade-type Replace --help
Creates basic functional NodePool resources for Agent platform

Usage:
  hcp create nodepool agent [flags]

Flags:
  -h, --help   help for agent

Global Flags:
      --cluster-name string             The name of the HostedCluster nodes in this pool will join. (default "example")
      --name string                     The name of the NodePool.
      --namespace string                The namespace in which to create the NodePool. (default "clusters")
      --node-count int32                The number of nodes to create in the NodePool. (default 2)
      --node-upgrade-type UpgradeType   The NodePool upgrade strategy for how nodes should behave when upgraded. Supported options: Replace, InPlace (default )
      --release-image string            The release image for nodes; if this is empty, defaults to the same release image as the HostedCluster.
      --render                          Render output as YAML to stdout instead of applying.
     

2.try to run with default value of --node-upgrade-type:
[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp create nodepool agent --cluster-name hosted-0 --name nodepool-of-extra1 --node-count 2


Actual results:

[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp create nodepool agent --cluster-name hosted-0 --name nodepool-of-extra1 --node-count 2
2024-02-06T19:57:03+02:00       ERROR   Failed to create nodepool       {"error": "NodePool.hypershift.openshift.io \"nodepool-of-extra1\" is invalid: spec.management.upgradeType: Unsupported value: \"\": supported values: \"Replace\", \"InPlace\""}
github.com/openshift/hypershift/cmd/nodepool/core.(*CreateNodePoolOptions).CreateRunFunc.func1
        /home/kni/hypershift_working/hypershift/cmd/nodepool/core/create.go:39
github.com/spf13/cobra.(*Command).execute
        /home/kni/hypershift_working/hypershift/vendor/github.com/spf13/cobra/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
        /home/kni/hypershift_working/hypershift/vendor/github.com/spf13/cobra/command.go:1115
github.com/spf13/cobra.(*Command).Execute
        /home/kni/hypershift_working/hypershift/vendor/github.com/spf13/cobra/command.go:1039
github.com/spf13/cobra.(*Command).ExecuteContext
        /home/kni/hypershift_working/hypershift/vendor/github.com/spf13/cobra/command.go:1032
main.main
        /home/kni/hypershift_working/hypershift/product-cli/main.go:60
runtime.main
        /home/kni/hypershift_working/go/src/runtime/proc.go:250
Error: NodePool.hypershift.openshift.io "nodepool-of-extra1" is invalid: spec.management.upgradeType: Unsupported value: "": supported values: "Replace", "InPlace"
NodePool.hypershift.openshift.io "nodepool-of-extra1" is invalid: spec.management.upgradeType: Unsupported value: "": supported values: "Replace", "InPlace"
    

Expected results:

   should pass, just as it does when you add the param explicitly:
[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp create nodepool agent --cluster-name hosted-0 --name nodepool-of-extra1 --node-count 2 --node-upgrade-type InPlace
NodePool nodepool-of-extra1 created
[kni@ocp-edge119 ~]$ 

Additional info:

A related issue is that the --help output differs depending on whether other parameters are supplied:

[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp create nodepool agent --cluster-name hosted-0 --name nodepool-of-extra1 --node-count 2 --node-upgrade-type Replace --help > long.help.out
[kni@ocp-edge119 ~]$ ~/hypershift_working/hypershift/bin/hcp create nodepool agent --help > short.help.out
[kni@ocp-edge119 ~]$ diff long.help.out short.help.out 
14c14
<       --node-upgrade-type UpgradeType   The NodePool upgrade strategy for how nodes should behave when upgraded. Supported options: Replace, InPlace (default )
---
>       --node-upgrade-type UpgradeType   The NodePool upgrade strategy for how nodes should behave when upgraded. Supported options: Replace, InPlace
[kni@ocp-edge119 ~]$ 


 

Please review the following PR: https://github.com/openshift/kube-rbac-proxy/pull/91

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

When running a build (e.g. oc start-build) the build fails with reason CannotRetrieveServiceAccount and message Unable to look up the service account secrets for this build.

Description of problem:

Pre-test greenboot checks fail during the scenario run due to OVN-K pods reporting a "failed" status.

Version-Release number of selected component (if applicable):

I believe this is only affecting `periodic-ci-openshift-microshift-main-ocp-metal-nightly` jobs.

How reproducible:

Unsure. It has occurred twice in consecutive daily-periodic jobs.

Steps to Reproduce:

n/a

Actual results:

- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-microshift-main-ocp-metal-nightly/1753221880392716288
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-microshift-main-ocp-metal-nightly/1753403067350388736

Expected results:

OVN-K Pods should deploy into a healthy state

Additional info:

 

 

Description of problem:

If a ROSA HCP customer uses the default worker security group that the CPO creates for some other purpose (e.g. they create their own VPC Endpoint or EC2 instance using this security group) and then starts an uninstallation, the uninstallation will hang indefinitely because the CPO is unable to delete the security group.

https://github.com/openshift/hypershift/blob/9e6255e5e44c8464da0850f8c19dc085bdbaf8cb/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L317-L331    

Version-Release number of selected component (if applicable):

4.14.8    

How reproducible:

100%    

Steps to Reproduce:

    1. Create a ROSA HCP cluster
    2. Attach the default worker security group to some other object unrelated to the cluster, like an EC2 instance or VPC Endpoint
    3. Uninstall the ROSA HCP cluster

Actual results:

The uninstall hangs without much feedback to the customer    

Expected results:

Either that the uninstall gives up and moves on eventually, or that clear feedback is provided to the customer, so that they know that the uninstall is held up because of an inability to delete a specific security group id. If this feedback mechanism is already in place, but not wired through to OCM, this may not be an OCPBUGS and could just be an OCM bug instead!    
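
One possible shape for that feedback, sketched with controller-runtime conventions; deleteWorkerSecurityGroup and isDependencyViolation are hypothetical helpers, not the CPO's actual functions:

package cleanup

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// deleteWorkerSecurityGroup and isDependencyViolation are hypothetical
// stand-ins for the CPO's EC2 calls and error classification.
func deleteWorkerSecurityGroup(ctx context.Context, sgID string) error { return nil }
func isDependencyViolation(err error) bool                             { return false }

// reconcileSecurityGroupDeletion surfaces a status condition when the default
// worker security group cannot be deleted because something outside the
// cluster still references it, instead of retrying silently forever.
func reconcileSecurityGroupDeletion(ctx context.Context, conditions *[]metav1.Condition, sgID string) (ctrl.Result, error) {
	if err := deleteWorkerSecurityGroup(ctx, sgID); err != nil {
		if isDependencyViolation(err) {
			meta.SetStatusCondition(conditions, metav1.Condition{
				Type:    "CloudResourcesDestroyed",
				Status:  metav1.ConditionFalse,
				Reason:  "SecurityGroupInUse",
				Message: fmt.Sprintf("security group %s still has external dependencies; detach them to let the uninstall finish", sgID),
			})
			return ctrl.Result{RequeueAfter: time.Minute}, nil
		}
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}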

Additional info:

    

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn/1763616334094012416/artifacts/e2e-aws-ovn/run-e2e/artifacts/TestUpgradeControlPlane/namespaces/e2e-clusters-5tmhk-example-j4msj/core/pods/logs/cloud-network-config-controller-9689f46c8-w8h64-controller-previous.log

fatal error: concurrent map read and map write

goroutine 31 [running]:
k8s.io/apimachinery/pkg/runtime.(*Scheme).New(0xc0002e81c0, {{0x0, 0x0}, {0xc000bd17c6, 0x2}, {0xc000bd17c0, 0x6}})
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/runtime/scheme.go:296 +0x67

goroutine 82 [runnable]:
k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypeWithName(0xc0002e81c0, {{0x2ede1b9, 0x1a}, {0x2eb1bb8, 0x2}, {0x2ebb205, 0xa}}, {0x3388ab8?, 0xc00062e540})
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/runtime/scheme.go:176 +0x2b2
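
For reference, the general pattern that avoids this class of race is to finish all Scheme registration on a single goroutine before any controller reads from it; a minimal sketch (not the cloud-network-config-controller's actual fix):

package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
)

// newScheme performs every registration up front, on one goroutine, before
// anything that calls scheme.New or scheme.ObjectKinds is started. A
// runtime.Scheme is not safe for concurrent registration and lookup.
func newScheme() *runtime.Scheme {
	scheme := runtime.NewScheme()
	utilruntime.Must(corev1.AddToScheme(scheme))
	return scheme
}

func main() {
	scheme := newScheme()
	// Only now is it safe to hand the scheme to informers/controllers that
	// will read from it concurrently.
	_ = scheme
}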

Description of problem:

Gateway API was added as a DevPreviewNoUpgrade feature before recent changes to the FeatureGate framework, and has not progressed to TechPreviewNoUpgrade.    When the FeatureGate framework changed, Gateway API was mistakenly listed as a TechPreviewNoUpgrade feature in https://github.com/openshift/api/blob/master/features/features.go#L71

For 4.16 we are adding TechPreview testing to the cluster-ingress-operator for other features, and we do not want to test Gateway API as a tech preview feature.  In fact, it has its own separate test.

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    N/A

Steps to Reproduce:

    N/A

Actual results:

    Gateway API is tested as a Tech Preview feature

Expected results:

    Gateway API should only be tested as a Dev Preview feature.

Additional info:

    

This is a clone of issue OCPBUGS-33631. The following is the description of the original issue:

Description of problem:

Currently we show the debug container action for pods that are failing. We should also show the action for pods in the 'Succeeded' phase.

Version-Release number of selected component (if applicable):

    

How reproducible:

Always

Steps to Reproduce:

    1. Log in into a cluster
    2. Create an example Job resource
    3. Check the job's pod and wait till it is in 'Succeeded' phase

Actual results:

Debug container action is not available, on the pod's Logs page

Expected results:

Debug container action is available, on the pod's Logs page

Additional info:

Since users are looking for this feature for pods in any phase, we are treating this issue as a bug.
Related stories:
RFE - https://issues.redhat.com/browse/RFE-1935
STORY - https://issues.redhat.com/browse/CONSOLE-4057

Code that needs to be removed - https://github.com/openshift/console/blob/ae115a9e8c72f930a67ee0c545d36f883cd6be34/frontend/public/components/utils/resource-log.tsx#L149-L151

This is a clone of issue OCPBUGS-35994. The following is the description of the original issue:

Description of problem

To reduce QE load, we've decided to block up the hole drilled in OCPBUGS-24535. We might not want a pure revert, if some of the changes (e.g. more helpful error messages) are worth keeping.

We also want to drop the oc adm upgrade rollback subcommand which was the client-side tooling associated with the OCPBUGS-24535 hole.

Version-Release number of selected component

Both 4.16 and 4.17 currently have the rollback subcommand and associated CVO-side hole.

How reproducible

Every time.

Steps to Reproduce

Try to perform the rollbacks that OCPBUGS-24535 allowed.

Actual results

They work, as verified in OCPBUGS-24535.

Expected results

They stop working, with reasonable ClusterVersion conditions explaining that even those rollback requests will not be accepted.

Please review the following PR: https://github.com/openshift/console-operator/pull/823

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

  When the user clicks 'Cancel' on any Secret creation page, the console does not return to the Secrets list page

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2024-01-06-062415

How reproducible:

    Always

Steps to Reproduce:

    1. Go to Create Key/value secret|Image pull secret|Source secret|Webhook secret|FromYaml page
       eg:/k8s/ns/default/secrets/~new/generic
    2. Click Cancel button
    3.
    

Actual results:

    The page does not go back to Secrets list page
    eg: /k8s/ns/default/core~v1~Secret

Expected results:

    The page should go back to the Secrets list page

Additional info:

    

Please review the following PR: https://github.com/openshift/azure-file-csi-driver/pull/60

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Seen in CI:

I0409 09:52:54.280834       1 builder.go:299] openshift-cluster-etcd-operator version v0.0.0-alpha.0-1430-g3d5483e-3d5483e1
...
E0409 10:08:08.921203       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 1581 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x28cd3c0?, 0x4b191e0})
	k8s.io/apimachinery@v0.29.0/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0xc0016eccd0, 0x1, 0x27036c0?})
	k8s.io/apimachinery@v0.29.0/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x28cd3c0?, 0x4b191e0?})
	runtime/panic.go:914 +0x21f
github.com/openshift/cluster-etcd-operator/pkg/operator/etcdcertsigner.addCertSecretToMap(0x0?, 0x0)
	github.com/openshift/cluster-etcd-operator/pkg/operator/etcdcertsigner/etcdcertsignercontroller.go:341 +0x27
github.com/openshift/cluster-etcd-operator/pkg/operator/etcdcertsigner.(*EtcdCertSignerController).syncAllMasterCertificates(0xc000521ea0, {0x32731e8, 0xc0006fd1d0}, {0x3280cb0, 0xc000194ee0})
	github.com/openshift/cluster-etcd-operator/pkg/operator/etcdcertsigner/etcdcertsignercontroller.go:252 +0xa65
...

It looks like syncAllMasterCertificates needs to be skipping the addCertSecretToMap calls for certs where EnsureTargetCertKeyPair returned an error.
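
A minimal sketch of that guard, with simplified stand-in signatures for the cluster-etcd-operator code:

package certsync

import corev1 "k8s.io/api/core/v1"

// syncOne skips the map update when cert generation failed, so addCertSecretToMap
// never receives a nil secret. ensure stands in for EnsureTargetCertKeyPair.
func syncOne(certs map[string][]byte, ensure func() (*corev1.Secret, error)) error {
	secret, err := ensure()
	if err != nil {
		// Report the error and let the controller retry instead of panicking
		// on a nil secret.
		return err
	}
	addCertSecretToMap(certs, secret)
	return nil
}

func addCertSecretToMap(certs map[string][]byte, secret *corev1.Secret) {
	for k, v := range secret.Data {
		certs[k] = v
	}
}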

Description of problem:

    Trying to execute https://github.com/openshift-metal3/dev-scripts to deploy an OCP 4.16 or 4.17 cluster (with the same configuration OCP 4.14 and 4.15 are instead working) with:
 MIRROR_IMAGES=true
 INSTALLER_PROXY=true

the bootstrap process fails with:

 level=debug msg=    baremetalhost resource not yet available, will retry
level=debug msg=    baremetalhost resource not yet available, will retry
level=info msg=  baremetalhost: ostest-master-0: uninitialized
level=info msg=  baremetalhost: ostest-master-0: registering
level=info msg=  baremetalhost: ostest-master-1: uninitialized
level=info msg=  baremetalhost: ostest-master-1: registering
level=info msg=  baremetalhost: ostest-master-2: uninitialized
level=info msg=  baremetalhost: ostest-master-2: registering
level=info msg=  baremetalhost: ostest-master-1: inspecting
level=info msg=  baremetalhost: ostest-master-2: inspecting
level=info msg=  baremetalhost: ostest-master-0: inspecting
E0514 12:16:51.985417   89709 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?allowWatchBookmarks=true&resourceVersion=5466&timeoutSeconds=547&watch=true": Service Unavailable
W0514 12:16:52.979254   89709 reflector.go:539] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?resourceVersion=5466": Service Unavailable
E0514 12:16:52.979293   89709 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?resourceVersion=5466": Service Unavailable
E0514 12:37:01.927140   89709 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?allowWatchBookmarks=true&resourceVersion=7800&timeoutSeconds=383&watch=true": Service Unavailable
W0514 12:37:03.173425   89709 reflector.go:539] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?resourceVersion=7800": Service Unavailable
E0514 12:37:03.173473   89709 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?resourceVersion=7800": Service Unavailable
level=debug msg=Fetching Bootstrap SSH Key Pair...
level=debug msg=Loading Bootstrap SSH Key Pair...

It looks like https://api.ostest.test.metalkube.org:6443 was reachable up to a certain point, but then started failing; presumably the client is either not using the proxy when it should, or using it when it shouldn't.

The 3 master nodes are reported as:
[root@ipi-ci-op-0qigcrln-b54ee-1790684582253694976 home]# oc get baremetalhosts -A
NAMESPACE               NAME              STATE        CONSUMER                ONLINE   ERROR              AGE
openshift-machine-api   ostest-master-0   inspecting   ostest-bbhxb-master-0   true     inspection error   24m
openshift-machine-api   ostest-master-1   inspecting   ostest-bbhxb-master-1   true     inspection error   24m
openshift-machine-api   ostest-master-2   inspecting   ostest-bbhxb-master-2   true     inspection error   24m

With something like:

 status:
  errorCount: 5
  errorMessage: 'Failed to inspect hardware. Reason: unable to start inspection: Validation
    of image href http://0.0.0.0:8084/34427934-f1a6-48d6-9666-66872eec9ba2 failed,
    reason: Got HTTP code 503 instead of 200 in response to HEAD request.'
  errorType: inspection error

on their status

Version-Release number of selected component (if applicable):

    4.16, 4.17

How reproducible:

    100%

Steps to Reproduce:

    1. Try to create an OCP 4.16 cluster with dev-scrips with IP_STACK=v4, MIRROR_IMAGES=true and INSTALLER_PROXY=true
    2.
    3.
    

Actual results:

    level=info msg=  baremetalhost: ostest-master-0: inspecting
E0514 12:16:51.985417   89709 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *unstructured.Unstructured: Get "https://api.ostest.test.metalkube.org:6443/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts?allowWatchBookmarks=true&resourceVersion=5466&timeoutSeconds=547&watch=true": Service Unavailable

Expected results:

    Successful deployment

Additional info:

I'm using IP_STACK=v4, MIRROR_IMAGES=true and INSTALLER_PROXY=true
with the same configuration (MIRROR_IMAGES=true and INSTALLER_PROXY=true) OCP 4.14 and OCP 4.15 are working.

When removing INSTALLER_PROXY=true, OCP 4.16 is also working.

I'm going to attach bootstrap gather logs

Please review the following PR: https://github.com/openshift/machine-api-operator/pull/1187

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

External consumers of MachineSets(), such as hive, need to be able to customize the client that queries the OpenStack cloud for trunk support.

OSASINFRA-3420, eliminating what looked like tech debt, removed that enablement, which had been added via a revert of a previous similar removal.

Reinstate the customizability, and include a docstring explanation to hopefully prevent it being removed again.
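
A sketch of one way to expose such a hook; the names below are illustrative, not the actual machine-api provider symbols:

package clientopts

import "github.com/gophercloud/gophercloud"

// TrunkClientBuilder builds the client used to ask Neutron whether trunk
// ports are supported.
type TrunkClientBuilder func(opts gophercloud.AuthOptions) (*gophercloud.ServiceClient, error)

// NewTrunkClient is a package-level hook that external consumers such as hive
// can override with their own client construction. Keeping it exported and
// documented is the point: it must not be removed again as "unused" tech debt.
var NewTrunkClient TrunkClientBuilder = defaultTrunkClient

func defaultTrunkClient(opts gophercloud.AuthOptions) (*gophercloud.ServiceClient, error) {
	// The in-cluster default construction would live here.
	return nil, nil
}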

Description of problem:

When the user clicks on the perspective switcher after a hard refresh, a flicker appears

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-25-100326

How reproducible:

Always, after the user refreshes the console

Steps to Reproduce:

1. Log in to the OCP console
2. Refresh the whole console, then click the perspective switcher
3.
    

Actual results:

there is flicker when clicking on perspective switcher

Expected results:

no flickers

Additional info:

screen recording https://drive.google.com/file/d/1_2tPZ0DXNTapFP9sSz27vKbnwxxdWZSV/view?usp=drive_link 

Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/103

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    When upgrading cluster from 4.13.23 to 4.14.3, machine-config CO gets stuck due to a content mismatch error on all nodes.

Node node-xxx-xxx is reporting: "unexpected on-disk state
      validating against rendered-master-734521b50f69a1602a3a657419ed4971: content
      mismatch for file \"/etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt\""

Version-Release number of selected component (if applicable):

    

How reproducible:

    always

Steps to Reproduce:

    1. perform a upgrade from 4.13.x to 4.14.x
    2. 
    3.
    

Actual results:

    machine-config stalls during upgrade

Expected results:

    the "content mismatch" shouldn't happen anymore according to the MCO engineering team

Additional info:

    

Description of problem:

    Icons which were formerly blue are no longer blue.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

 

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

ccoctl consumes CredentialsRequests extracted from OpenShift releases and manages secrets associated with those requests for the cluster. Over time, ccoctl has grown a number of CredentialRequest filters, including deletion annotations in CCO-175 and tech-preview annotations in cco#444.

But with OTA-559, 4.14 and later oc adm release extract ... learned about an --included parameter, which allows oc to perform that "will the cluster need this credential?" filtering, and there is no longer a need for ccoctl to perform that filtering, or for ccoctl callers to have to think through "do I need to enable tech-preview CRs for this cluster or not?".

Version-Release number of selected component (if applicable):

4.14 and later.

How reproducible:

100%.

Steps to Reproduce:

$ cat <<EOF >install-config.yaml 
> apiVersion: v1
> platform:
>   gcp:
>     dummy: data
> featureSet: TechPreviewNoUpgrade
> EOF
$ oc adm release extract --included --credentials-requests --install-config install-config.yaml --to credentials-requests quay.io/openshift-release-dev/ocp-release:4.14.0-rc.2-x86_64
$ ccoctl gcp create-all --dry-run --name=test --region=test --project=test --credentials-requests-dir=credentials-requests

Actual results:

ccoctl doesn't dry-run create the TechPreviewNoUpgrade openshift-cluster-api-gcp CredentialsRequest unless you pass it --enable-tech-preview.

Expected results:

ccoctl does dry-run create the TechPreviewNoUpgrade openshift-cluster-api-gcp CredentialsRequest unless you pass it --enable-tech-preview=false.

Additional info:

Longer-term, we likely want to go through some phases of deprecating and maybe eventually removing --enable-tech-preview and the ccoctl-side filtering. But for now, I think we want to pivot to defaulting to true, so that anyone with existing flows that do not include the new --included extraction has an easy way to keep their workflow going (they can set --enable-tech-preview=false). And I think we should backport that to 4.14's ccoctl to simplify OSDOCS-4158's docs#62148. But we're close enough to 4.14's expected GA, that it's worth some consensus-building and alternative consideration, before trying to rush changes back to 4.14 branches.
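
A minimal sketch of what flipping the default could look like, assuming a cobra/pflag BoolVar similar to ccoctl's existing flag wiring; the option struct and help text are illustrative:

package main

import "github.com/spf13/cobra"

type createAllOptions struct {
	EnableTechPreview bool
}

func newCreateAllCmd() *cobra.Command {
	o := &createAllOptions{}
	cmd := &cobra.Command{Use: "create-all"}
	// Default flipped to true: callers that already filter CredentialsRequests
	// with `oc adm release extract --included` need no extra flag, while legacy
	// callers can opt back into ccoctl-side filtering with
	// --enable-tech-preview=false.
	cmd.Flags().BoolVar(&o.EnableTechPreview, "enable-tech-preview", true,
		"process TechPreviewNoUpgrade CredentialsRequests (set to false to restore the old filtering)")
	return cmd
}

func main() {
	_ = newCreateAllCmd().Execute()
}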

Description of problem:

Deploying a compact 3-node cluster on GCP, by setting mastersSchedulable to true and removing the worker machineset YAMLs, results in a panic.

Version-Release number of selected component (if applicable):

$ openshift-install version
openshift-install 4.13.0-0.nightly-2022-12-04-194803
built from commit cc689a21044a76020b82902056c55d2002e454bd
release image registry.ci.openshift.org/ocp/release@sha256:9e61cdf7bd13b758343a3ba762cdea301f9b687737d77ef912c6788cbd6a67ea
release architecture amd64

How reproducible:

Always

Steps to Reproduce:

1. create manifests
2. set 'spec.mastersSchedulable' as 'true', in <installation dir>/manifests/cluster-scheduler-02-config.yml
3. remove the worker machineset YAML file from <installation dir>/openshift directory
4. create cluster 

Actual results:

Got "panic: runtime error: index out of range [0] with length 0".

Expected results:

The installation should succeed, or give clear error messages.

Additional info:

$ openshift-install version
openshift-install 4.13.0-0.nightly-2022-12-04-194803
built from commit cc689a21044a76020b82902056c55d2002e454bd
release image registry.ci.openshift.org/ocp/release@sha256:9e61cdf7bd13b758343a3ba762cdea301f9b687737d77ef912c6788cbd6a67ea
release architecture amd64
$ 
$ openshift-install create manifests --dir test1
? SSH Public Key /home/fedora/.ssh/openshift-qe.pub
? Platform gcp
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
? Project ID OpenShift QE (openshift-qe)
? Region us-central1
? Base Domain qe.gcp.devcluster.openshift.com
? Cluster Name jiwei-1205a
? Pull Secret [? for help] ******
INFO Manifests created in: test1/manifests and test1/openshift 
$ 
$ vim test1/manifests/cluster-scheduler-02-config.yml
$ yq-3.3.0 r test1/manifests/cluster-scheduler-02-config.yml spec.mastersSchedulable
true
$ 
$ rm -f test1/openshift/99_openshift-cluster-api_worker-machineset-?.yaml
$ 
$ tree test1
test1
├── manifests
│   ├── cloud-controller-uid-config.yml
│   ├── cloud-provider-config.yaml
│   ├── cluster-config.yaml
│   ├── cluster-dns-02-config.yml
│   ├── cluster-infrastructure-02-config.yml
│   ├── cluster-ingress-02-config.yml
│   ├── cluster-network-01-crd.yml
│   ├── cluster-network-02-config.yml
│   ├── cluster-proxy-01-config.yaml
│   ├── cluster-scheduler-02-config.yml
│   ├── cvo-overrides.yaml
│   ├── kube-cloud-config.yaml
│   ├── kube-system-configmap-root-ca.yaml
│   ├── machine-config-server-tls-secret.yaml
│   └── openshift-config-secret-pull-secret.yaml
└── openshift
    ├── 99_cloud-creds-secret.yaml
    ├── 99_kubeadmin-password-secret.yaml
    ├── 99_openshift-cluster-api_master-machines-0.yaml
    ├── 99_openshift-cluster-api_master-machines-1.yaml
    ├── 99_openshift-cluster-api_master-machines-2.yaml
    ├── 99_openshift-cluster-api_master-user-data-secret.yaml
    ├── 99_openshift-cluster-api_worker-user-data-secret.yaml
    ├── 99_openshift-machineconfig_99-master-ssh.yaml
    ├── 99_openshift-machineconfig_99-worker-ssh.yaml
    ├── 99_role-cloud-creds-secret-reader.yaml
    └── openshift-install-manifests.yaml

2 directories, 26 files
$ 
$ openshift-install create cluster --dir test1
INFO Consuming Openshift Manifests from target directory
INFO Consuming Master Machines from target directory 
INFO Consuming Worker Machines from target directory 
INFO Consuming OpenShift Install (Manifests) from target directory 
INFO Consuming Common Manifests from target directory 
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json" 
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/openshift/installer/pkg/tfvars/gcp.TFVars({{{0xc000cf6a40, 0xc}, {0x0, 0x0}, {0xc0011d4a80, 0x91d}}, 0x1, 0x1, {0xc0010abda0, 0x58}, ...})
        /go/src/github.com/openshift/installer/pkg/tfvars/gcp/gcp.go:70 +0x66f
github.com/openshift/installer/pkg/asset/cluster.(*TerraformVariables).Generate(0x1daff070, 0xc000cef530?)
        /go/src/github.com/openshift/installer/pkg/asset/cluster/tfvars.go:479 +0x6bf8
github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc000c78870, {0x1a777f40, 0x1daff070}, {0x0, 0x0})
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:226 +0x5fa
github.com/openshift/installer/pkg/asset/store.(*storeImpl).Fetch(0x7ffc4c21413b?, {0x1a777f40, 0x1daff070}, {0x1dadc7e0, 0x8, 0x8})
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:76 +0x48
main.runTargetCmd.func1({0x7ffc4c21413b, 0x5})
        /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:259 +0x125
main.runTargetCmd.func2(0x1dae27a0?, {0xc000c702c0?, 0x2?, 0x2?})
        /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:289 +0xe7
github.com/spf13/cobra.(*Command).execute(0x1dae27a0, {0xc000c70280, 0x2, 0x2})
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0xc000c3a500)
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:918
main.installerMain()
        /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:61 +0x2b0
main.main()
        /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:38 +0xff
$ 

 

 

Description of problem:

    When migrating an OpenShift cluster to Azure AD Workload Identity, the process does not have sufficient permissions to apply the Azure Pod Identity webhook configuration.

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

   Always

Steps to Reproduce:

    1. According to the steps provided in the documentation: https://github.com/openshift/cloud-credential-operator/blob/master/docs/azure_workload_identity.md#steps-to-in-place-migrate-an-openshift-cluster-to-azure-ad-workload-identity     
    2. At step 10, applying the Azure Pod Identity webhook configuration fails.

Actual results:

For step10:
[hmx@fedora CCO]$ oc replace -f ./CCO-456/output_dir/manifests/azure-ad-pod-identity-webhook-config.yaml
 Error from server (NotFound): error when replacing "./CCO-456/output_dir/manifests/azure-ad-pod-identity-webhook-config.yaml": secrets "azure-credentials" not found
   
[hmx@fedora CCO]$ oc get po -n openshift-cloud-credential-operator
NAME                                         READY   STATUS    RESTARTS   AGE
cloud-credential-operator-594bf555b4-6srcq   2/2     Running   0          3h32m

[hmx@fedora CCO]$ oc logs cloud-credential-operator-594bf555b4-6srcq -n openshift-cloud-credential-operator
Defaulted container "kube-rbac-proxy" out of: kube-rbac-proxy, cloud-credential-operator
Flag --logtostderr has been deprecated, will be removed in a future release, see https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components
I0410 06:41:25.490507       1 kube-rbac-proxy.go:285] Valid token audiences: 
I0410 06:41:25.490752       1 kube-rbac-proxy.go:399] Reading certificate files
I0410 06:41:25.491607       1 kube-rbac-proxy.go:447] Starting TCP socket on 0.0.0.0:8443
I0410 06:41:25.492241       1 kube-rbac-proxy.go:454] Listening securely on 0.0.0.0:8443
E0410 06:41:52.996659       1 webhook.go:154] Failed to make webhook authenticator request: Unauthorized
E0410 06:41:52.997568       1 auth.go:47] Unable to authenticate the request due to an error: Unauthorized
E0410 06:42:15.871706       1 webhook.go:154] Failed to make webhook authenticator request: Unauthorized
E0410 06:42:15.871754       1 auth.go:47] Unable to authenticate the request due to an error: Unauthorized

Expected results:

    Apply the azure pod identity webhook configuration successfully.

Additional info:

    

This is a clone of issue OCPBUGS-33834. The following is the description of the original issue:

4.16.0-0.nightly-2024-05-16-165920 aws-sdn-upgrade failures in 1791152612112863232

Undiagnosed panic detected in pod

{  pods/openshift-controller-manager_controller-manager-8d46bf695-cvdc6_controller-manager.log.gz:E0516 17:36:26.515398       1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3ca66c0), concrete:(*abi.Type)(0x3e9f720), asserted:(*abi.Type)(0x41dd660), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret)

This is a clone of issue OCPBUGS-31878. The following is the description of the original issue:

The multus-admission-controller does not retain its container resource requests/limits if manually set. The cluster-network-operator overwrites any modifications on the next reconciliation. This resource preservation support has already been added to all other components in https://github.com/openshift/hypershift/pull/1082 and https://github.com/openshift/hypershift/pull/3120. Similar changes should be made for the multus-admission-controller so all hosted control plane components demonstrate the same resource preservation behavior.
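
A minimal sketch of the preservation pattern described above, applied to a Deployment; findContainer is a hypothetical helper, and the real change would live in the reconcile path for the multus-admission-controller:

package preserve

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// preserveContainerResources copies any requests/limits already set on the
// live Deployment onto the desired Deployment, so the operator does not wipe
// out manually tuned values on the next reconciliation.
func preserveContainerResources(current, desired *appsv1.Deployment) {
	for i := range desired.Spec.Template.Spec.Containers {
		c := &desired.Spec.Template.Spec.Containers[i]
		if existing := findContainer(current, c.Name); existing != nil &&
			(len(existing.Resources.Requests) > 0 || len(existing.Resources.Limits) > 0) {
			c.Resources = existing.Resources
		}
	}
}

func findContainer(d *appsv1.Deployment, name string) *corev1.Container {
	if d == nil {
		return nil
	}
	for i := range d.Spec.Template.Spec.Containers {
		if d.Spec.Template.Spec.Containers[i].Name == name {
			return &d.Spec.Template.Spec.Containers[i]
		}
	}
	return nil
}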

Description of problem:

    When trying to deploy with an Internal publish strategy, DNS will fail because proxy VM cannot launch.

Version-Release number of selected component (if applicable):

    

How reproducible:

Always    

Steps to Reproduce:

    1. Set publishStrategy: Internal
    2. Fail
    3.
    

Actual results:

    terraform fails

Expected results:

    private cluster launches

Additional info:

    

Description of problem:

Following signing-key deletion, there is a service CA rotation process which might temporarily disrupt platform components, but eventually all should use the updated certificates. 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-30-131338 and other recent 4.14 nightlies

How reproducible:

100%

Steps to Reproduce:

1.oc delete secret/signing-key -n openshift-service-ca
2. reload the management console
3. 

Actual results:

The Observe tab disappears from the menu bar and the monitoring-plugin shows as unavailable.

Expected results:

No disruption

Additional info:

using manual deletion of the monitoring-plugin pods it is possible to recover the situation

 

Description of problem:

The ConsolePluginComponents CMO task depends on the availability of a conversion service that is part of the console-operator Pod. That Pod is not replicated, so when it restarts due to a cluster upgrade or for any other reason, the conversion webhook becomes unavailable and all the ConsolePlugin API queries from that CMO task fail.

Version-Release number of selected component (if applicable):

    

How reproducible:

Create a 4.14 cluster, make the console-operator unmanaged and bring it down, watch the ConsolePluginComponents tasks fail instantly after they're run.

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

The ConsolePluginComponents tasks fail instantly after they're run.

Expected results:

The tasks should be more resilient and retry.

The long term solution is for that ConsolePlugin conversion service to be duplicated.

 

Additional info:

For OCP >= 4.15, CMO's v1 ConsolePlugin queries no longer rely on the conversion webhook because of https://github.com/openshift/api/pull/1477.

But the retries keep the task future-proof, and we'll be able to backport the fix.
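
A minimal sketch of such a retry, using client-go's retry helper; the client wiring and the set of retriable errors are assumptions, not CMO's actual code:

package task

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/util/retry"

	consolev1 "github.com/openshift/api/console/v1"
	consoleclient "github.com/openshift/client-go/console/clientset/versioned"
)

// listConsolePlugins retries transient failures (e.g. the conversion webhook
// being briefly unreachable while the console-operator pod restarts) instead
// of failing the whole task on the first error.
func listConsolePlugins(ctx context.Context, c consoleclient.Interface) (*consolev1.ConsolePluginList, error) {
	var plugins *consolev1.ConsolePluginList
	err := retry.OnError(retry.DefaultBackoff, func(err error) bool {
		return apierrors.IsInternalError(err) || apierrors.IsServiceUnavailable(err) || apierrors.IsTimeout(err)
	}, func() error {
		var listErr error
		plugins, listErr = c.ConsoleV1().ConsolePlugins().List(ctx, metav1.ListOptions{})
		return listErr
	})
	return plugins, err
}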

This is a clone of issue OCPBUGS-35469. The following is the description of the original issue:

Description of problem:

As described in https://issues.redhat.com/browse/OCPQE-22479.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Description of problem:

    Standalone OCP encrypts various resources at rest in etcd:
https://docs.openshift.com/container-platform/4.14/security/encrypting-etcd.html
HyperShift control planes are only encrypting secrets. We should have parity with standalone.

Version-Release number of selected component (if applicable):

    4.14

How reproducible:

    Always

Steps to Reproduce:

    1. Create HyperShift standalone control plane
    2. Check that configmaps, routes, oauth access tokens or oauth authorize tokens are encrypted
    

Actual results:

    Those resources are not encrypted

Expected results:

    Those resources are encrypted

Additional info:

Resources to be encrypted are configured here:
https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/kas/kms/aws.go#L121-L126    
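
A minimal sketch of the shape of that change, assuming the list is a slice of schema.GroupResource as in the linked code; the exact wiring is illustrative:

package kms

import "k8s.io/apimachinery/pkg/runtime/schema"

// encryptedResources lists the resource types to encrypt at rest, matching
// what standalone OCP encrypts (secrets, config maps, routes, and OAuth
// access/authorize tokens) rather than secrets alone.
var encryptedResources = []schema.GroupResource{
	{Group: "", Resource: "secrets"},
	{Group: "", Resource: "configmaps"},
	{Group: "route.openshift.io", Resource: "routes"},
	{Group: "oauth.openshift.io", Resource: "oauthaccesstokens"},
	{Group: "oauth.openshift.io", Resource: "oauthauthorizetokens"},
}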

Description of problem:

In https://github.com/openshift/cluster-monitoring-operator/blob/release-4.16/Makefile#L266, the Makefile prints

@echo Installing tools from hack/tools.go

but the file actually lives at https://github.com/openshift/cluster-monitoring-operator/blob/release-4.16/hack/tools/tools.go, so it should be

@echo Installing tools from hack/tools/tools.go

Version-Release number of selected component (if applicable):

4.16

How reproducible:

always

Steps to Reproduce:

1. see the description
    

Actual results:

hack/tools.go path is wrong in Makefile

Expected results:

should be hack/tools/tools.go

This is a clone of issue OCPBUGS-34365. The following is the description of the original issue:

  • Follow "Get started with Quarkus using s2i" until step 3 "view the associated code"
  • when opening CodeReady Workspaces by clicking on the link in the topology view, the readme of "validation-quickstart" is opened although we imported the "getting-started" subfolder

Description of problem:

security-groups.yaml playbook runs the IPv6 security group rules creation tasks regardless of the os_subnet6 value.
The when clause does not consider the os_subnet6 [1] value, so the tasks are always executed.

It works with:

  - name: 'Create security groups for IPv6'
    block:
    - name: 'Create master-sg IPv6 rule "OpenShift API"'
    [...]
    when: os_subnet6 is defined

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-11-033133

How reproducible:

Always

Steps to Reproduce:

1. Don't set the os_subnet6 in the inventory file [2] (so it's not dual-stack)
2. Deploy 4.15 UPI by running the UPI playbooks

Actual results:

IPv6 security group rules are created

Expected results:

IPv6 security group rules shouldn't be created

Additional info:
[1] https://github.com/openshift/installer/blob/46fd66272538c350327880e1ed261b70401b406e/upi/openstack/security-groups.yaml#L375
[2] https://github.com/openshift/installer/blob/46fd66272538c350327880e1ed261b70401b406e/upi/openstack/inventory.yaml#L77

Tracker issue for bootimage bump in 4.16. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-29441.

Description of problem:

After changing some of the JS dependencies in the Console frontend, the check-patternfly-modules.sh script may report false positives because yarn install modifies the yarn.lock file in a way the script does not expect.

We should fix the check-patternfly-modules.sh script to parse yarn why command output instead.

Description of problem

4.16 etcd v1 FlowSchema is unrecognized by the outgoing 4.15 Kube API server, causing noise during 4.15 to 4.16 updates.

Version-Release number of selected component (if applicable)

4.16.

How reproducible

Every time.

Steps to Reproduce

1. Install a 4.15 cluster.
2. Update to 4.16, while streaming CVO logs.
3. Search streamed CVO logs for Could not update flowschema "openshift-etcd-operator" ...the server does not recognize this resource, check extension API servers

Actual results

A bunch of hits, until the Kube API server finishes updating to 4.16.

Expected results

No distracting noise.

Additional info

Evgeni noticed this in OTA-1246, and Petr root-caused it.

OCPBUGS-22969 moved a bunch of FlowSchema to v1, and for most of them, it won't be a problem. But etcd and Kube API server update in parallel in run-level 20, so the CVO pushing out the etcd manifest can race Kube API server being new enough to understand it. One possible solution would be sticking with v1beta3 for 4.16 and moving to v1 in 4.17 (like our customers will have to do). Another solution would be moving the 4.16 manifest out to run-level 31 (e.g. by renaming 0000_20_etcd-operator_10_flowschema.yaml to 0000_21_etcd-operator_10_flowschema.yaml) after branch-fork, since the need would be unique to the 4.16 branch.

This is a clone of issue OCPBUGS-34475. The following is the description of the original issue:

Description of problem:

When running a conformance suite against a hypershift cluster (for example, CNI conformance) the MonitorTests step fails because of missing files from the disruption monitor.
    

Version-Release number of selected component (if applicable):

4.15.13
    

How reproducible:

Consistent
    

Steps to Reproduce:

    1. Create a hypershift cluster
    2. Attempt to run an ose-tests suite. For example, the CNI conformance suite documented here: https://access.redhat.com/documentation/en-us/red_hat_software_certification/2024/html/red_hat_software_certification_workflow_guide/con_cni-certification_openshift-sw-cert-workflow-working-with-cloud-native-network-function#running-the-cni-tests_openshift-sw-cert-workflow-working-with-container-network-interface
    3. Note errors in logs
    

Actual results:

found errors fetching in-cluster data: [failed to list files in disruption event folder on node ip-10-0-130-177.us-west-2.compute.internal: the server could not find the requested resource failed to list files in disruption event folder on node ip-10-0-152-10.us-west-2.compute.internal: the server could not find the requested resource]
Failed to write events from in-cluster monitors, err: open /tmp/artifacts/junit/AdditionalEvents__in_cluster_disruption.json: no such file or directory
    

Expected results:

No errors 
    

Additional info:

The first error can be avoided by creating the directory it's looking for on all nodes:
for node in $(oc get nodes -oname); do oc debug -n default $node -- chroot /host mkdir -p /var/log/disruption-data/monitor-events; done
However, I'm not sure if this directory not being created is due to the disruption monitor working properly on hypershift, or if this should be skipped on hypershift entirely.

The second error is related to the ARTIFACT_DIR env var not being set locally, and can be avoided by creating a directory, setting that directory as the ARTIFACT_DIR, and then creating an empty "junit" dir inside of it.
It looks like ARTIFACT_DIR defaults to a temporary directory if it's not set in the env, but the "junit" directory doesn't exist inside of it, so file creation in that non-existent directory fails.
    

This is a clone of issue OCPBUGS-42256. The following is the description of the original issue:

This is a clone of issue OCPBUGS-41631. The following is the description of the original issue:

Description of problem:

Panic seen in the CI job below when running the following command:

$ w3m -dump -cols 200 'https://search.dptools.openshift.org/?name=^periodic&type=junit&search=machine-config-controller.*Observed+a+panic' | grep 'failures match'
periodic-ci-openshift-insights-operator-stage-insights-operator-e2e-tests-periodic (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-insights-operator-release-4.17-insights-operator-e2e-tests-periodic (all) - 2 runs, 100% failed, 50% of failures match = 50% impact

Panic observed:

E0910 09:00:04.283647       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 268 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x36c8b40, 0x5660c90})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000ce8540?})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x36c8b40?, 0x5660c90?})
	/usr/lib/golang/src/runtime/panic.go:770 +0x132
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).updateNode(0xc000d6e360, {0x3abd580?, 0xc00224a608}, {0x3abd580?, 0xc001bd2308})
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:585 +0x1f3
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:246
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:976 +0xea
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001933f70, {0x3faaba0, 0xc000759710}, 0x1, 0xc00097bda0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000750f70, 0x3b9aca00, 0x0, 0x1, 0xc00097bda0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000dc2630)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:972 +0x69
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x52
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 261
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x33204b3] 
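
A minimal sketch of a defensive guard in an update handler; this is illustrative only, since the actual nil dereference at node_controller.go:585 may involve a different field:

package node

import corev1 "k8s.io/api/core/v1"

type Controller struct{}

// updateNode guards against tombstones and partially-initialized cache
// entries before doing any work, so the handler cannot dereference nil.
func (ctrl *Controller) updateNode(oldObj, curObj interface{}) {
	oldNode, okOld := oldObj.(*corev1.Node)
	curNode, okCur := curObj.(*corev1.Node)
	if !okOld || !okCur || oldNode == nil || curNode == nil {
		return
	}
	// ... existing reconciliation comparing oldNode and curNode would follow.
	_ = oldNode
	_ = curNode
}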

 

Version-Release number of selected component (if applicable):

    

How reproducible:

Seen in this CI run -https://prow.ci.openshift.org/job-history/test-platform-results/logs/periodic-ci-openshift-insights-operator-stage-insights-operator-e2e-tests-periodic

Steps to Reproduce:

$ w3m -dump -cols 200 'https://search.dptools.openshift.org/?name=^periodic&type=junit&search=machine-config-controller.*Observed+a+panic' | grep 'failures match'

Actual results:

    

Expected results:

 No panic to observe

Additional info:

    

In the "request-serving" deployment model for HCPs, the request-serving nodes are being memory starved by the k8s API server. This has an observed impact on limiting the number of nodes a guest HCP cluster can provision, especially during upgrade events.

This is a spike card to investigate setting the API Priority and Fairness [1] configuration, and exactly what configuration would be necessary to set.

[1] https://kubernetes.io/docs/concepts/cluster-administration/flow-control/

Please review the following PR: https://github.com/openshift/cluster-storage-operator/pull/432

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-34524. The following is the description of the original issue:

Description of problem:
After enabling the feature gate, the egressfirewall DNS name resolver functionality does not work.

Version-Release number of selected component (if applicable):
4.16

How reproducible:
Always

Steps to Reproduce:

1. Setup 4.16 ovn cluster
2. Following doc to enable feature gate https://docs.openshift.com/container-platform/4.15/nodes/clusters/nodes-cluster-enabling-features.html#nodes-cluster-enabling-features-cli_nodes-cluster-enabling

3. Configure egressfirewall with dnsName

Actual results:

no dnsnameresolver under openshift-ovn-kubernetes

Expected results:

The feature is enabled and should have dnsnameresolver

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure
  2. customer issue / SD
  3. internal RedHat testing failure

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

If it is a CI failure:

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

If it is a customer / SD issue:

  • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
  • Don’t presume that Engineering has access to Salesforce.
  • Do presume that Engineering will access attachments through supportshell.
  • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • When showing the results from commands, include the entire command in the output.  
  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
  • Do not set the priority, that is owned by Engineering and will be set when the bug is evaluated
  • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

Please review the following PR: https://github.com/openshift/ibm-powervs-block-csi-driver/pull/68

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Agent CI jobs (compact and HA) are currently experiencing failures because the control-plane-machine-set operator is degraded, despite the SNO cluster operating normally.

Version-Release number of selected component (if applicable):

4.16

How reproducible:

100%

Actual results:

level=info msg=Cluster operator control-plane-machine-set Available is False with UnavailableReplicas: Missing 3 available replica(s)
level=error msg=Cluster operator control-plane-machine-set Degraded is True with UnmanagedNodes: Found 3 unmanaged node(s)
level=info msg=Cluster operator csi-snapshot-controller EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator etcd EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator ingress EvaluationConditionsDetected is False with AsExpected:
level=info msg=Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer
level=info msg=Cluster operator insights Disabled is False with AsExpected:
level=info msg=Cluster operator insights SCAAvailable is False with Forbidden: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 403: {"code":"ACCT-MGMT-11","href":"/api/accounts_mgmt/v1/errors/11","id":"11","kind":"Error","operation_id":"dc5b9421-248f-4ac4-9135-ac5bf6bcd2ce","reason":"Account with ID 2DUeKzzTD9ngfsQ6YgkzdJn1jA4 denied access to perform create on Certificate with HTTP call POST /api/accounts_mgmt/v1/certificates"}
level=info msg=Cluster operator kube-apiserver EvaluationConditionsDetected is False with AsExpected: All is well
level=info msg=Cluster operator kube-controller-manager EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator kube-scheduler EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator network ManagementStateDegraded is False with :
level=info msg=Cluster operator openshift-controller-manager EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator storage EvaluationConditionsDetected is Unknown with NoData:
level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
level=error msg=				The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
level=error msg=				https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR: Installation failed. Aborting execution.

Expected results:

Install should be successful.

Additional info:

HA must gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-vsphere-agent-ha-f14/1771068123387006976/artifacts/vsphere-agent-ha-f14/gather-must-gather/artifacts/must-gather.tar

Compact must gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/pr-logs/pull/openshift_release/50544/rehearse-50544-periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-vsphere-agent-compact-fips-f14/1775524930515898368/artifacts/vsphere-agent-compact-fips-f14/gather-must-gather/artifacts/must-gather.tar

While implementing MON-3669, we realized that none of the recording rules running on the telemeter server side are tested. Given how complex these rules can be, it's important for us to be confident that future changes won't bring regressions.

Even though not perfect, it's possible to unit test Prometheus rules with the promtool binary (example in CMO: https://github.com/openshift/cluster-monitoring-operator/blob/2ca7067a4d1fc86b31f7a4816c85da6abc0c8abf/Makefile#L218-L221).
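For illustration, a minimal sketch of such a promtool unit test (the rule file name, recording rule name, and input series are assumptions, not the actual telemeter rules):

~~~
# run with: promtool test rules example-rules-test.yaml
rule_files:
  - telemeter-rules.yaml                    # hypothetical file holding the recording rules under test
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'cluster_version{type="current", version="4.16.1"}'
        values: '1x10'
    promql_expr_test:
      - expr: example:cluster_version:count # hypothetical recording rule name
        eval_time: 5m
        exp_samples:
          - labels: 'example:cluster_version:count{version="4.16.1"}'
            value: 1
~~~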

DoD

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/128

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The IngressController and DNSRecord CRDs were moved to dedicated packages
following the introduction of a new method for generating CRDs in the OpenShift API repository (openshift/api#1803: https://github.com/openshift/api/pull/1803).

Version-Release number of selected component (if applicable):

    

How reproducible:

Always

Steps to Reproduce:

1. go mod edit -replace=github.com/openshift/api=github.com/openshift/api@ff84c2c732279b16baccf08c7dfc9ff8719c4807
2. go mod tidy
3. go mod vendor
4. make update
    

Actual results:

$ make update
hack/update-generated-crd.sh
--- vendor/github.com/openshift/api/operator/v1/0000_50_ingress-operator_00-ingresscontroller.crd.yaml    1970-01-01 01:00:00.000000000 +0100
+++ manifests/00-custom-resource-definition.yaml    2024-04-17 18:05:05.009605155 +0200
[LONG DIFF]
cp: cannot stat 'vendor/github.com/openshift/api/operator/v1/0000_50_ingress-operator_00-ingresscontroller.crd.yaml': No such file or directory
make: *** [Makefile:39: crd] Error 1

Expected results:

$ make update
hack/update-generated-crd.sh 
hack/update-profile-manifests.sh

Additional info:

 

Description of problem:

Clicking on any node's status popover causes the popover dialog to always move to the first row.

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2024-01-14-100410

How reproducible:

Always    

Steps to Reproduce:

1. Go to the Nodes list page and mark any worker as unschedulable
2. Click on the 'Ready/Scheduling disabled' text; a popover dialog is opened
3.
    

Actual results:

2. The popover dialog always jumps to the first row; it appears to show the wrong status/info for the clicked node, and it is difficult for the user to perform node actions

Expected results:

2. popover dialog should be shown exactly next to the correct node     

Additional info:

    

Please review the following PR: https://github.com/openshift/csi-driver-shared-resource-operator/pull/101

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When clicking the "OpenShift Lightspeed" link on the cluster overview page, the OperatorHub modal does not automatically open.  That is the result of Lightspeed having been added to a different catalog than is being referenced in the link.  `/operatorhub/all-namespaces?keyword=lightspeed&details-item=lightspeed-operator-lightspeed-operator-catalog-openshift-marketplace` should be `/operatorhub/all-namespaces?keyword=lightspeed&details-item=lightspeed-operator-redhat-operators-openshift-marketplace`.  This was fixed in the master branch with https://github.com/openshift/console/pull/14030/files#diff-6abb294e295409309d88e80b2bf2added8a5ebc7cbc12893baddb0b3212c7d85R38-R39, but this change needs to be backported to 4.16.z.

 

Description of problem:

Backport of https://issues.redhat.com/browse/CONSOLE-4108

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/csi-external-provisioner/pull/79

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Platform:

IPI on Baremetal

What happened?

In cases where no hostname is provided, hosts are automatically assigned the name "localhost" or "localhost.localdomain".

[kni@provisionhost-0-0 ~]$ oc get nodes
NAME STATUS ROLES AGE VERSION
localhost.localdomain Ready master 31m v1.22.1+6859754
master-0-1 Ready master 39m v1.22.1+6859754
master-0-2 Ready master 39m v1.22.1+6859754
worker-0-0 Ready worker 12m v1.22.1+6859754
worker-0-1 Ready worker 12m v1.22.1+6859754

What did you expect to happen?

Having all hosts come up as localhost is the worst possible user experience, because they'll fail to form a cluster but you won't know why.

However, we know the BMH name in the image-customization-controller, it would be possible to configure the ignition to set a default hostname if we don't have one from DHCP/DNS.

If not, we should at least fail the installation with a specific error message to this situation.

----------
30/01/22 - adding how to reproduce
----------

How to Reproduce:

1) Prepare an installation with day-1 static IP.

Add to install-config under one of the nodes:
networkConfig:
  routes:
    config:
    - destination: 0.0.0.0/0
      next-hop-address: 192.168.123.1
      next-hop-interface: enp0s4
  dns-resolver:
    config:
      server:
      - 192.168.123.1
  interfaces:
  - name: enp0s4
    type: ethernet
    state: up
    ipv4:
      address:
      - ip: 192.168.123.110
        prefix-length: 24
      enabled: true

2) Ensure a DNS PTR record for the address IS NOT configured.

3) Create manifests and the cluster from install-config.yaml.

The installation should either:
1) fail as early as possible and provide some feedback that no hostname was provided, or
2) derive the hostname from the BMH or the ignition files.

Description of problem:

Looks like we are facing a bug when trying to spin up a hosted control plane cluster while using proxy settings to connect to the internet. For example, on our worker node, the static pod kube-apiserver-proxy.yaml doesn't contain the noProxy settings, which seems to cause the failure to deploy the hosted cluster.

~~~
[root@ocpugbo2cogswo03 manifests]# cat kube-apiserver-proxy.yaml_
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    k8s-app: kube-apiserver-proxy
  name: kube-apiserver-proxy
  namespace: kube-system
spec:
  containers:
  - command:
    - control-plane-operator
    - kubernetes-default-proxy
    - --listen-addr=<IP-Addr>:6443
    - --proxy-addr=<Proxy-Addr>:<Proxy-port>
    - --apiserver-addr=<API-IP-Addr>:6443
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7ca95b9a71e41157c70378896758618b993ad90e6d80a23c46170da5c11f441f
    name: kubernetes-default-proxy
    resources:
      requests:
        cpu: 13m
        memory: 16Mi
    securityContext:
      runAsUser: 1001
  hostNetwork: true
  priorityClassName: system-node-critical
status: {}
~~~

Can you please check this issue.

Steps to Reproduce:

    1. Install a cluster with ACM and HCP
    2. Try to create a hosted cluster using proxy configuration
    3. kube-apiserver-proxy is using proxy to reach API.

Actual results:

    The kube-apiserver-proxy is using proxy to reach API. Worker nodes are unable to reach a Hosted Control Plane's API when a cluster-wide http proxy is configured.

Expected results:

    kube-apiserver-proxy should not use proxy to reach API

Additional info:

    

Please review the following PR: https://github.com/openshift/image-customization-controller/pull/114

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    capi-based installer failing with missing openshift-cluster-api namespace

Version-Release number of selected component (if applicable):

    

How reproducible:

Always in CustomNoUpgrade

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    Install failure

Expected results:

    The namespace is created and the install succeeds, or the install does not error on the missing namespace

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-olm-operator/pull/39

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/baremetal-runtimecfg/pull/291

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-35511. The following is the description of the original issue:

Description of problem:

If an infra-id (which is uniquely generated by the installer) is reused the installer will fail with:

level=info msg=Creating private Hosted Zone
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed provisioning resources after infrastructure ready: failed to create private hosted zone: error creating private hosted zone: HostedZoneAlreadyExists: A hosted zone has already been created with the specified caller reference.


Users should not be reusing installer state in this manner, but we do it purposefully in our ipi-install-install step to mitigate infrastructure provisioning flakes:

https://steps.ci.openshift.org/reference/ipi-install-install#line720

We can fix this by ensuring the caller ref is unique on each invocation.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-34975. The following is the description of the original issue:

Description of problem:

    See https://issues.redhat.com//browse/CORS-3523 and https://issues.redhat.com//browse/CORS-3524 for the overall issue.

Creating this bug for backporting purposes.


Version-Release number of selected component (if applicable):

    all

How reproducible:

    always in the terraform path

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    spot instances only supported for worker nodes.

Expected results:

    spot instances used for all nodes.

Additional info:

    

Description of problem:

    pki operator runs even when annotation to turn off PKI is on the hosted control plane

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This bug just focuses on denoising WellKnown_NotReady. More generic Available=False denoising is tracked in https://issues.redhat.com/browse/OCPBUGS-20056.

Description of problem:

Reviving bugzilla#2010539, the authentication ClusterOperator occasionally blips Available=False with reason=WellKnown_NotReady. For example, this run includes:

: [bz-apiserver-auth] clusteroperator/authentication should not change condition/Available expand_less	47m21s
{  1 unexpected clusteroperator state transitions during e2e test run.  These did not match any known exceptions, so they cause this test-case to fail:

Oct 03 19:11:20.502 - 245ms E clusteroperator/authentication condition/Available reason/WellKnown_NotReady status/False WellKnownAvailable: The well-known endpoint is not yet available: failed to GET kube-apiserver oauth endpoint https://10.0.0.3:6443/.well-known/oauth-authorization-server: dial tcp 10.0.0.3:6443: i/o timeout

While a dial timeout for the Kube API server isn't fantastic, an issue that only persists for 245ms is not long enough to warrant immediate admin intervention. Teaching the authentication operator to stay Available=True for this kind of brief hiccup, while still going Available=False for issues where at least part of the component is non-functional and the condition requires immediate administrator intervention, would make it easier for admins and SREs operating clusters to identify when intervention is required.

Version-Release number of selected component (if applicable):

4.8, 4.10, and 4.15. Likely all supported versions of the authentication operator have this exposure.

How reproducible:

Looks like 10 to 50% of 4.15 runs have some kind of issue with authentication going Available=False; see Actual results below. These are likely for reasons that do not require admin intervention, although figuring that out is tricky today; feel free to push back if you feel that some of these do warrant immediate admin intervention.

Steps to Reproduce:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=clusteroperator/authentication+should+not+change+condition/Available' | grep '^periodic-.*4[.]15.*failures match' | sort

Actual results:

periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 18 runs, 44% failed, 13% of failures match = 6% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-aws-sdn-arm64 (all) - 9 runs, 67% failed, 17% of failures match = 11% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-azure-ovn-arm64 (all) - 9 runs, 33% failed, 33% of failures match = 11% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-azure-ovn-heterogeneous (all) - 18 runs, 56% failed, 30% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-ovn-serial-aws-arm64 (all) - 9 runs, 33% failed, 33% of failures match = 11% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-serial-ovn-ppc64le-powervs (all) - 6 runs, 100% failed, 33% of failures match = 33% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-upgrade-aws-ovn-arm64 (all) - 18 runs, 67% failed, 25% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-e2e-aws-sdn-arm64 (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-aws-ovn-heterogeneous-upgrade (all) - 18 runs, 50% failed, 33% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade (all) - 70 runs, 41% failed, 86% of failures match = 36% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade (all) - 80 runs, 21% failed, 76% of failures match = 16% impact
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-sdn-techpreview-serial (all) - 7 runs, 29% failed, 100% of failures match = 29% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 80 runs, 28% failed, 36% of failures match = 10% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 80 runs, 39% failed, 123% of failures match = 48% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade (all) - 71 runs, 49% failed, 80% of failures match = 39% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-upgrade (all) - 7 runs, 57% failed, 75% of failures match = 43% impact
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-sdn-upgrade (all) - 7 runs, 29% failed, 50% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-single-node-serial (all) - 7 runs, 100% failed, 57% of failures match = 57% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade (all) - 70 runs, 34% failed, 4% of failures match = 1% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-azure-sdn (all) - 7 runs, 29% failed, 50% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-sdn-serial (all) - 7 runs, 43% failed, 67% of failures match = 29% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-sdn-upgrade (all) - 7 runs, 29% failed, 50% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-serial-ovn-ipv6 (all) - 7 runs, 29% failed, 50% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 7 runs, 71% failed, 20% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 7 runs, 43% failed, 33% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-metal-ipi-upgrade-ovn-ipv6 (all) - 7 runs, 100% failed, 57% of failures match = 57% impact
periodic-ci-openshift-release-master-okd-4.15-e2e-aws-ovn-upgrade (all) - 12 runs, 58% failed, 14% of failures match = 8% impact

Digging into reason and message frequency in 4.15-releated update CI:

$ curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/authentication.*condition/Available.*status/False' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's|.*clusteroperator/\([^ ]*\) condition/Available reason/\([^ ]*\) status/False[^:]*: \(.*\)|\1 \2 \3|' | sed 's/[0-9]*[.][0-9]*[.][0-9]*[.][0-9]*/x.x.x.x/g;s|[.]apps[.][^/]*|.apps.../|g' | sort | uniq -c | sort -n
      1 authentication APIServices_Error apiservices.apiregistration.k8s.io/v1.oauth.openshift.io: not available: failing or missing response from https://x.x.x.x:8443/apis/oauth.openshift.io/v1: Get "https://x.x.x.x:8443/apis/oauth.openshift.io/v1": dial tcp x.x.x.x:8443: i/o timeout
      1 authentication APIServices_Error apiservices.apiregistration.k8s.io/v1.oauth.openshift.io: not available: failing or missing response from https://x.x.x.x:8443/apis/oauth.openshift.io/v1: Get "https://x.x.x.x:8443/apis/oauth.openshift.io/v1": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
      1 authentication APIServices_Error "oauth.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
      1 authentication APIServices_Error rpc error: code = Unavailable desc = the connection is draining
      1 authentication OAuthServerRouteEndpointAccessibleController_EndpointUnavailable Get "https://oauth-openshift.apps...//healthz": dial tcp: lookup oauth-openshift.apps.../
      1 authentication OAuthServerRouteEndpointAccessibleController_EndpointUnavailable Get "https://oauth-openshift.apps...//healthz": dial tcp x.x.x.x:443: connect: connection refused
      1 authentication OAuthServerServiceEndpointAccessibleController_EndpointUnavailable Get "https://[fd02::410f]:443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
      1 Nov 28 09:09:40.407 - 1s    E clusteroperator/authentication condition/Available reason/APIServerDeployment_PreconditionNotFulfilled status/False
      2 authentication APIServerDeployment_NoPod no .openshift-oauth-apiserver pods available on any node.
      2 authentication APIServices_Error apiservices.apiregistration.k8s.io/v1.user.openshift.io: not available: failing or missing response from https://x.x.x.x:8443/apis/user.openshift.io/v1: Get "https://x.x.x.x:8443/apis/user.openshift.io/v1": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
      2 authentication APIServices_Error rpc error: code = Unknown desc = malformed header: missing HTTP content-type
      4 authentication APIServices_Error apiservices.apiregistration.k8s.io/v1.oauth.openshift.io: not available: endpoints for service/api in "openshift-oauth-apiserver" have no addresses with port name "https"
      4 authentication APIServices_Error apiservices.apiregistration.k8s.io/v1.user.openshift.io: not available: failing or missing response from https://x.x.x.x:8443/apis/user.openshift.io/v1: Get "https://x.x.x.x:8443/apis/user.openshift.io/v1": dial tcp x.x.x.x:8443: i/o timeout
      6 authentication OAuthServerDeployment_NoDeployment deployment/openshift-authentication: could not be retrieved
      7 authentication APIServices_Error apiservices.apiregistration.k8s.io/v1.oauth.openshift.io: not available: endpoints for service/api in "openshift-oauth-apiserver" have no addresses with port name "https"\nAPIServicesAvailable: apiservices.apiregistration.k8s.io/v1.user.openshift.io: not available: endpoints for service/api in "openshift-oauth-apiserver" have no addresses with port name "https"
      7 authentication OAuthServerServiceEndpointAccessibleController_EndpointUnavailable Get "https://x.x.x.x:443/healthz": dial tcp x.x.x.x:443: i/o timeout (Client.Timeout exceeded while awaiting headers)
      8 authentication APIServerDeployment_NoPod no apiserver.openshift-oauth-apiserver pods available on any node.
      9 authentication APIServerDeployment_NoDeployment deployment/openshift-oauth-apiserver: could not be retrieved
      9 authentication OAuthServerRouteEndpointAccessibleController_EndpointUnavailable Get "https://oauth-openshift.apps...//healthz": EOF
     11 authentication WellKnown_NotReady The well-known endpoint is not yet available: failed to GET kube-apiserver oauth endpoint https://x.x.x.x:6443/.well-known/oauth-authorization-server: dial tcp x.x.x.x:6443: i/o timeout
     23 authentication APIServices_Error "oauth.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request\nAPIServicesAvailable: "user.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
     26 authentication APIServices_Error "user.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
     29 authentication APIServices_Error apiservices.apiregistration.k8s.io/v1.user.openshift.io: not available: endpoints for service/api in "openshift-oauth-apiserver" have no addresses with port name "https"
     29 authentication OAuthServerRouteEndpointAccessibleController_EndpointUnavailable Get "https://oauth-openshift.apps...//healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
     30 authentication OAuthServerServiceEndpointAccessibleController_EndpointUnavailable Get "https://x.x.x.x:443/healthz": dial tcp x.x.x.x:443: connect: connection refused
     34 authentication OAuthServerServiceEndpointAccessibleController_EndpointUnavailable Get "https://x.x.x.x:443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

And simplifying by looking only at reason:

 curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/authentication.*condition/Available.*status/False' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's|.*clusteroperator/\([^ ]*\) condition/Available reason/\([^ ]*\) status/False.*|\1 \2|' | sort | uniq -c | sort -n
      1 authentication APIServerDeployment_PreconditionNotFulfilled
      6 authentication OAuthServerDeployment_NoDeployment
      8 authentication APIServerDeployment_NoDeployment
     10 authentication APIServerDeployment_NoPod
     11 authentication WellKnown_NotReady
     36 authentication OAuthServerRouteEndpointAccessibleController_EndpointUnavailable
     43 authentication APIServices_PreconditionNotReady
     66 authentication OAuthServerServiceEndpointAccessibleController_EndpointUnavailable
     95 authentication APIServices_Error

 

Expected results:

Authentication goes Available=False on WellKnown_NotReady if and only if immediate admin intervention is appropriate.

The SDN live migration cannot work properly in a cluster with specific configurations. CNO shall refuse to proceed with the live migration in such a case. We need to add the pre-migration validation to CNO.

The live migration shall be blocked for clusters with the following configuration:

  • OpenShiftSDN multitenant mode
  • Egress Router
  • cluster network or service network ranges that conflict with the OVN-K internal subnets

In upstream we started using ValidatingAdmissionPolicy API to enforce package uniqueness for ClusterExtension.

These APIs are still not enabled by default in OCP 4.16 (K8s 1.29), but they should be enabled with TechPreviewNoUpgrade. For some reason our E2E CI job fails despite the fact that we are running it with tech preview.
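For reference, a minimal sketch of enabling the tech preview feature set cluster-wide (this is the standard FeatureGate config object the CI job is expected to apply; note the change cannot be reverted):

~~~
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade
~~~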

Undiagnosed panic detected in pod

{  pods/openshift-console-operator_console-operator-598b8fdbb-v2tnv_console-operator.log.gz:E0510 11:53:50.992299       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
pods/openshift-console-operator_console-operator-598b8fdbb-v2tnv_console-operator.log.gz:E0510 11:53:51.008625       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
pods/openshift-console-operator_console-operator-598b8fdbb-v2tnv_console-operator.log.gz:E0510 11:53:51.023108       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
pods/openshift-console-operator_console-operator-598b8fdbb-v2tnv_console-operator.log.gz:E0510 11:53:51.033921       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
pods/openshift-console-operator_console-operator-598b8fdbb-v2tnv_console-operator.log.gz:E0510 11:53:51.045080       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)}

Sippy shows this appears to be hitting a few places though it's early and hard to see. Serial looks affected.

Job runs with failure; note that, due to the nature of where this panic is detected, job runs can pass despite this crash. The failure will show in Prow, but the job can be considered successful if nothing else failed.

search CI chart view seems to indicate it kicked off roughly 24 hours ago.

Possible revert candidate? https://github.com/openshift/console-operator/pull/895

This is a clone of issue OCPBUGS-34820. The following is the description of the original issue:

Description of problem:

    Removing imageContentSources from HostedCluster does not update IDMS for the cluster.

Version-Release number of selected component (if applicable):

    Tested with 4.15.14

How reproducible:

    100%

Steps to Reproduce:

    1. Add imageContentSources to the HostedCluster (a minimal example of the field is sketched below)
    2. Verify it is applied to the IDMS
    3. Remove imageContentSources from the HostedCluster
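A minimal sketch of the field involved on the HostedCluster (registry and mirror names are illustrative):

~~~
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
spec:
  imageContentSources:
  - source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
    mirrors:
    - mirror.example.com/ocp-v4.0-art-dev   # removing this entry should also remove it from the rendered IDMS
~~~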
    

Actual results:

    IDMS is not updated to remove imageDigestMirrors contents

Expected results:

    IDMS is updated to remove imageDigestMirrors contents

Additional info:

    Workaround: set imageContentSources=[]

Please review the following PR: https://github.com/openshift/csi-external-resizer/pull/152

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-38677. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38657. The following is the description of the original issue:

https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/2781
https://kubernetes.slack.com/archives/CKFGK3SSD/p1704729665056699
https://github.com/okd-project/okd/discussions/1993#discussioncomment-10385535

Description of problem:

INFO Waiting up to 15m0s (until 2:23PM UTC) for machines [vsphere-ipi-b8gwp-bootstrap vsphere-ipi-b8gwp-master-0 vsphere-ipi-b8gwp-master-1 vsphere-ipi-b8gwp-master-2] to provision...
E0819 14:17:33.676051    2162 session.go:265] "Failed to keep alive govmomi client, Clearing the session now" err="Post \"https://vctest.ars.de/sdk\": context canceled" server="vctest.ars.de" datacenter="" username="administrator@vsphere.local"
E0819 14:17:33.708233    2162 session.go:295] "Failed to keep alive REST client" err="Post \"https://vctest.ars.de/rest/com/vmware/cis/session?~action=get\": context canceled" server="vctest.ars.de" datacenter="" username="administrator@vsphere.local"
I0819 14:17:33.708279    2162 session.go:298] "REST client session expired, clearing session" server="vctest.ars.de" datacenter="" username="administrator@vsphere.local"

Description of problem:

The test implementation in https://github.com/openshift/origin/commit/5487414d8f5652c301a00617ee18e5ca8f339cb4#L56 assumes there is just one kubelet service, or at least that it is always the first one in the MCP. That just changed in https://github.com/openshift/machine-config-operator/pull/4124, and the test is failing.

Version-Release number of selected component (if applicable):

master branch of 4.16    

How reproducible:

always during test  

Steps to Reproduce:

    1. Test with https://github.com/openshift/machine-config-operator/pull/4124 applied

Actual results:

Test detects a wrong service and fails    

Expected results:

Test finds the proper kubelet.service and passes

Additional info:

    

When clicking on the output image link on a Shipwright BuildRun details page, the link leads to the imagestream details page but shows 404 error.

The image link is:

https://console-openshift-console.apps...openshiftapps.com/k8s/ns/buildah-example/imagestreams/sample-kotlin-spring%3A1.0-shipwright

The BuildRun spec

apiVersion: shipwright.io/v1beta1
kind: BuildRun
metadata: 
  generateName: sample-spring-kotlin-build-
  name: sample-spring-kotlin-build-xh2dq
  namespace: buildah-example
  labels: 
    build.shipwright.io/generation: '2'
    build.shipwright.io/name: sample-spring-kotlin-build
spec: 
  build: 
    name: sample-spring-kotlin-build
status: 
  buildSpec: 
    output: 
      image: 'image-registry.openshift-image-registry.svc:5000/buildah-example/sample-kotlin-spring:1.0-shipwright'
    paramValues: 
      - name: run-image
        value: 'paketocommunity/run-ubi-base:latest'
      - name: cnb-builder-image
        value: 'paketobuildpacks/builder-jammy-tiny:0.0.176'
      - name: app-image
        value: 'image-registry.openshift-image-registry.svc:5000/buildah-example/sample-kotlin-spring:1.0-shipwright'
    source: 
      git: 
        url: 'https://github.com/piomin/sample-spring-kotlin-microservice.git'
      type: Git
    strategy: 
      kind: ClusterBuildStrategy
      name: buildpacks
  completionTime: '2024-02-12T12:15:03Z'
  conditions: 
    - lastTransitionTime: '2024-02-12T12:15:03Z'
      message: All Steps have completed executing
      reason: Succeeded
      status: 'True'
      type: Succeeded
  output: 
    digest: 'sha256:dc3d44bd4d43445099ab92bbfafc43d37e19cfaf1cac48ae91dca2f4ec37534e'
  source: 
    git: 
      branchName: master
      commitAuthor: Piotr Mińkowski
      commitSha: aeb03d60a104161d6fd080267bf25c89c7067f61
  startTime: '2024-02-12T12:13:21Z'
  taskRunName: sample-spring-kotlin-build-xh2dq-j47ql

Description of problem:

   In-cluster clients should be able to talk directly to the node-local apiserver IP address and, as a best practice, should all be configured to use it. This load balancer provides the added benefit in cloud environments of health-checking the path from the machine to the load balancer fronting the kube-apiserver. It becomes more crucial in baremetal/on-prem environments where there may not be a load balancer and instead just 3 unique endpoints directly to redundant kube-apiservers. In that case, if using just DNS, intermittent traffic failures will be experienced if a control plane instance goes down. Using the node-local load balancer, there will be no traffic disruption.

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    100%

Steps to Reproduce:

    1. Schedule a pod on any hypershift cluster node
    2. In the pod run curl -v -k https://172.20.0.1:6443
    3. The verbose output will show that the kube-apiserver cert does not have the node-local client load balancer IP address in its IPs section and therefore will not allow valid HTTPS requests on that address (an equivalent certificate check is sketched below)
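An equivalent certificate check, as a minimal sketch (not a product command; requires OpenSSL 1.1.1+ for the -ext flag):

~~~
# list the Subject Alternative Names presented by the apiserver on the
# node-local address, to confirm whether 172.20.0.1 is included
openssl s_client -connect 172.20.0.1:6443 </dev/null 2>/dev/null \
  | openssl x509 -noout -ext subjectAltName
~~~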
    

Actual results:

    Secure HTTPS requests cannot be made to the kube-apiserver

Expected results:

     Secure HTTPS requests can be made to the kube-apiserver (no need to run -k when specifying proper CA bundle)

Additional info:

    

Description of problem:

    Sometimes deleting the bootstrap SSH rule during bootstrap destroy can time out after 5 minutes, failing the installation.

Version-Release number of selected component (if applicable):

    4.16+ with capi/aws

How reproducible:

    Intermittent

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

                level=info msg=Waiting up to 5m0s (until 2:31AM UTC) for bootstrap SSH rule to be destroyed...
                level=fatal msg=error destroying bootstrap resources failed during the destroy bootstrap hook: failed to remove bootstrap SSH rule: bootstrap ssh rule was not removed within 5m0s: timed out waiting for the condition

Expected results:

    The rule is deleted successfully and in a timely manner.

Additional info:

    This is probably happening because we are changing the AWSCluster object, thus causing CAPI/CAPA to trigger a big reconciliation of the resources. We should try to delete the rule via the AWS SDK.

This is a clone of issue OCPBUGS-33331. The following is the description of the original issue:

Description of problem:

nmstate-configuration.service failed due to wrong variable name $hostname_file
https://github.com/openshift/machine-config-operator/blob/5a6e8b81f13de2dbf606a497140ac6e9c2a00e6f/templates/common/baremetal/files/nmstate-configuration.yaml#L26
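The failing pattern, shown as an illustrative sketch rather than the exact MCO script: the variable is defined as host_file but dereferenced as hostname_file, which aborts the script under its unbound-variable check; dereferencing the variable that was actually defined avoids the failure.

~~~
#!/bin/bash
set -euo pipefail
hostname=$(hostname -s)
host_file="${hostname}.yml"
cluster_file="cluster.yml"
config_file=""
# the shipped script referenced ${hostname_file} here, which is undefined and fails under set -u
if [ -s "/etc/nmstate/openshift/${host_file}" ]; then
  config_file="${host_file}"
elif [ -s "/etc/nmstate/openshift/${cluster_file}" ]; then
  config_file="${cluster_file}"
fi
~~~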

Version-Release number of selected component (if applicable):

4.16.0    

How reproducible:

always

Steps to Reproduce:

    1. install cluster via dev-script, with node-specific network configuration
    

Actual results:

nmstate-configuration failed:

sh-5.1# journalctl -u nmstate-configuration
May 07 02:19:54 worker-0 systemd[1]: Starting Applies per-node NMState network configuration...
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: + systemctl -q is-enabled mtu-migration
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: + echo 'Cleaning up left over mtu migration configuration'
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: Cleaning up left over mtu migration configuration
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: + rm -rf /etc/cno/mtu-migration
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: + '[' -e /etc/nmstate/openshift/applied ']'
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: + src_path=/etc/nmstate/openshift
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: + dst_path=/etc/nmstate
May 07 02:19:54 worker-0 systemd[1]: nmstate-configuration.service: Main process exited, code=exited, status=1/FAILURE
May 07 02:19:54 worker-0 nmstate-configuration.sh[1565]: ++ hostname -s
May 07 02:19:54 worker-0 systemd[1]: nmstate-configuration.service: Failed with result 'exit-code'.
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: + hostname=worker-0
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: + host_file=worker-0.yml
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: + cluster_file=cluster.yml
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: + config_file=
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: + '[' -s /etc/nmstate/openshift/worker-0.yml ']'
May 07 02:19:54 worker-0 nmstate-configuration.sh[1553]: /usr/local/bin/nmstate-configuration.sh: line 22: hostname_file: unbound variable
May 07 02:19:54 worker-0 systemd[1]: Failed to start Applies per-node NMState network configuration.

Expected results:

cluster can be setup successfully with node-specific network configuration via new mechanism

Additional info:

    

This is a clone of issue OCPBUGS-36816. The following is the description of the original issue:

Description of problem:

Dynamic plugins using PatternFly 4 could be referring to PF4 variables that do not exist in OpenShift 4.15+. Currently this is causing contrast issues for ACM in dark mode for donut charts.    

Version-Release number of selected component (if applicable):

4.15    

How reproducible:

Always    

Steps to Reproduce:

    1. Install ACM on OpenShift 4.15
    2. Switch to dark mode
    3. Observe Home > Overview page
    

Actual results:

 Some categories in the donut charts cannot be seen due to low contrast   

Expected results:

 Colors should match those seen in OpenShift 4.14 and earlier   

Additional info:

Also posted about this on Slack: https://redhat-internal.slack.com/archives/C011BL0FEKZ/p1720467671332249

Variables like --pf-chart-color-gold-300 are no longer provided, although the PF5 equivalent, --pf-v5-chart-color-gold-300, is available. The stylesheet @patternfly/patternfly/patternfly-charts.scss is present, but not the V4 version. Hopefully it is possible to also include these styles since the names now include a version. 

Please review the following PR: https://github.com/openshift/telemeter/pull/497

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-etcd-operator/pull/1175

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/kubernetes-kube-storage-version-migrator/pull/203

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    The recent introduction of a validation within the hypershift operator's webhook conflicts with the UI's ability to create HCP clusters. Previously the pull secret was not required to be posted before an HC or NP, but with a recent change the pull secret is required because it is used to validate the release image payload.

This issue is isolated to 4.15

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    100% attempt to post a HC before the pull secret is posted and the HC will be rejected. 

The expected outcome is that it should be possible to post the pull secret for an HC after the HC is posted, and the controller should be eventually consistent with this change.

Description of problem:

    Failed to upgrade from 4.15 to 4.16 with vSphere UPI due to
03-19 05:58:11.372  network                                    4.16.0-0.nightly-2024-03-13-061822   True        False         True       9h      Error while updating infrastructures.config.openshift.io/cluster: failed to apply / update (config.openshift.io/v1, Kind=Infrastructure) /cluster: Infrastructure.config.openshift.io "cluster" is invalid: [spec.platformSpec.vsphere.apiServerInternalIPs: Invalid value: "null": spec.platformSpec.vsphere.apiServerInternalIPs in body must be of type array: "null", spec.platformSpec.vsphere.ingressIPs: Invalid value: "null": spec.platformSpec.vsphere.ingressIPs in body must be of type array: "null", spec.platformSpec.vsphere.machineNetworks: Invalid value: "null": spec.platformSpec.vsphere.machineNetworks in body must be of type array: "null", <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]
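For reference, a minimal sketch of the shape the validation expects on the Infrastructure object (values illustrative): the vSphere platform fields must serialize as arrays, possibly empty, rather than null.

~~~
spec:
  platformSpec:
    type: VSphere
    vsphere:
      apiServerInternalIPs: []
      ingressIPs: []
      machineNetworks: []
~~~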

Version-Release number of selected component (if applicable):

  upgrade chain 4.11.58-x86_64 - > 4.12.53-x86_64,4.13.37-x86_64,4.14.17-x86_64,4.15.3-x86_64,4.16.0-0.nightly-2024-03-13-061822


How reproducible:

    always

Steps to Reproduce:

    1. Upgrade cluster from 4.15 -> 4.16
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

The layout as described in https://docs.openshift.com/container-platform/4.14/networking/metallb/metallb-configure-return-traffic.html does not work if the service has external traffic policy = local    

Version-Release number of selected component (if applicable):

    

How reproducible:

Always    

Steps to Reproduce:

    1. Follow https://docs.openshift.com/container-platform/4.14/networking/metallb/metallb-configure-return-traffic.html with etp = local
    2.
    3.
    

Actual results:

The service's reply does not make it to the client    

Expected results:

    

Additional info:

This is because the routes leverage the ClusterIP CIDR, whereas with etp=local the traffic leverages the special masquerade IP.

This is a clone of issue OCPBUGS-42974. The following is the description of the original issue:

This is a clone of issue OCPBUGS-42873. The following is the description of the original issue:

Description of problem:

The openshift-apiserver sends traffic through the konnectivity proxy, including traffic intended for the local audit-webhook service. The audit-webhook service should be included in the NO_PROXY env var of the openshift-apiserver container.
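As an illustrative sketch of the intended end state (the exact environment layout in the rendered deployment may differ), the proxy environment on the container would exclude the local service, for example:

~~~
- name: NO_PROXY
  value: audit-webhook,localhost,127.0.0.1
~~~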

    

Version-Release number of selected component (if applicable):

4.14.z, 4.15.z, 4.16.z

How reproducible:

Always

    

Steps to Reproduce:

    1. Create a rosa hosted cluster
    2. Obeserve logs of the konnectivity-proxy sidecar of openshift-apiserver
    3.
    

Actual results:

     Logs include requests to the audit-webhook local service

    

Expected results:

      Logs do not include requests to audit-webhook 
    

Additional info:


    

Description of problem:

ARO supplies a platform kubeletconfig to enable certain features, currently we use this to enable node sizing or enable autoSizingReserved. Customers want the ability to customize podPidsLimit and we have directed them to configure a second kubeletconfig.

When these kubeletconfigs are rendered into machineconfigs, the order of their application is nondeterministic: the MCs are suffixed by an increasing serial number based on the order the kubeletconfigs were created. This makes it impossible for the customer to ensure their PIDs limit is applied while still allowing ARO to maintain our platform defaults.

We need a way of supplying platform defaults while still allowing the customer to make supported modifications in a way that does not risk being reverted during upgrades or other maintenance.

This issue has manifested in two different ways: 

During an upgrade from 4.11.31 to 4.12.40, a cluster had the order of kubeletconfig rendered machine configs reverse. We think that in older versions, the initial kubeletconfig did not get an mc-name-suffix annotation applied, but rendered to "99-worker-generated-kubelet" (no suffix). The customer-provided kubeletconfig rendered to the suffix "-1". During the upgrade, MCO saw this as a new kubeletconfig and assigned it the suffix "-2", effectively reversing their order. See the RCS document https://docs.google.com/document/d/19LuhieQhCGgKclerkeO1UOIdprOx367eCSuinIPaqXA

ARO wants to make updates to the platform defaults. We are changing from a kubeletconfig "aro-limits" to a kubeletconfig "dynamic-node". We want to be able to do this while still keeping it as the platform default, and if the customer has created their own kubeletconfig, the customer's should still take precedence. What we see is that the creation of a new kubeletconfig, regardless of source, overrides all other kubeletconfigs, causing the customer to lose their customization.

Version-Release number of selected component (if applicable):

4.12.40+

ARO's older kubeletconfig "aro-limits":

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  labels:
    aro.openshift.io/limits: ""
  name: aro-limits
spec:
  kubeletConfig:
    evictionHard:
      imagefs.available: 15%
      memory.available: 500Mi
      nodefs.available: 10%
      nodefs.inodesFree: 5%
    systemReserved:
      memory: 2000Mi
  machineConfigPoolSelector:
    matchLabels:
      aro.openshift.io/limits: ""

ARO's newer kubeletconfig, "dynamic-node"

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: dynamic-node
spec:
  autoSizingReserved: true
  machineConfigPoolSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/mco-built-in
      operator: Exists

 

Customer's desired kubeletconfig:

 

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  labels:
    arogcd.arogproj.io/instance: cluster-config
  name: default-pod-pids-limit
spec:
  kubeletConfig:
    podPidsLimit: 2000000
  machineConfigPoolSelector:
    matchExpressions:
    - key: pools.operator.machineconfiguration.io/worker
      operator: Exists

 

Description of problem:

If the installer using cluster api exits before bootstrap destroy, it may leak processes which continue to run in the background of the host system. These processes may continue to reconcile cloud resources, so the cluster resources would be created and recreated even when you are trying to delete them. 

This occurs because the installer runs kube-apiserver, etcd, and the capi provider binaries as subprocesses. If the installer exits without shutting down those subprocesses, due to an error or user interrupt, the processes will continue to run in the background.

The processes can be identified with the ps command. pgrep and pkill are also useful.
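For example, a minimal sketch of hunting down leaked installer subprocesses (the patterns are illustrative; match against what ps actually shows on the host):

~~~
# list candidate leaked installer subprocesses
ps -ef | grep -E 'kube-apiserver|etcd|cluster-api' | grep -v grep
# or, once the command line is known:
pgrep -af 'cluster-api'   # list matching PIDs with their command lines
pkill -f 'cluster-api'    # terminate them (sends SIGTERM by default)
~~~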

Brief discussion here of this occurring in PowerVS: https://redhat-internal.slack.com/archives/C05QFJN2BQW/p1712688922574429

Version-Release number of selected component (if applicable):

    

How reproducible:

    Often

Steps to Reproduce:

    1. Run capi-based install (on any platform), by specifying fields below in the install config [0]
    2. Wait until CAPI controllers begin to run. This will be easy to identify because the terminal will fill with controller logs. Particularly you should see [1]
    3. Once the controllers are running interrupt with CTRL + C

[0] Install config for capi install
featureGates:
- ClusterAPIInstall=true
featureSet: CustomNoUpgrade    

[1] INFO Started local control plane with envtest     
INFO Stored kubeconfig for envtest in: /c/auth/envtest.kubeconfig 
INFO Running process: Cluster API with args [-v=2 --metrics-bind-addr=0 --

 

Actual results:

    controllers will leak and continue to run. They can be viewed with ps or pgrep

You may also see INFO Shutting down local Cluster API control plane... 
That means the Shutdown started but did not complete.

Expected results:

    The installer should shut down gracefully and not leak processes, for example:

^CWARNING Received interrupt signal                    
INFO Shutting down local Cluster API control plane... 
INFO Stopped controller: Cluster API              
INFO Stopped controller: aws infrastructure provider 
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to create infrastructure manifest: Post "https://127.0.0.1:41441/apis/infrastructure.cluster.x-k8s.io/v1beta2/awsclustercontrolleridentities": unexpected EOF 
INFO Local Cluster API system has completed operations 

Additional info:

    

Description of problem:

   Enable KMS v2 in the ibmcloud KMS provider

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-34987. The following is the description of the original issue:

Description of problem:

When creating a hosted cluster with --role-arn and --sts-creds, the command fails.

Version-Release number of selected component (if applicable):

4.16 
4.17

How reproducible:

100%    

Steps to Reproduce:

    1.  hypershift-no-cgo create iam cli-role    
    2.  aws sts get-session-token --output json
    3.  hcp create cluster aws --role-arn xxx --sts-creds xxx
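
A minimal sketch of the failing invocation with placeholder values (the account ID, role name, and cluster name are assumptions; other required flags such as pull secret and region are omitted):

# Mint short-lived STS credentials for the CLI to use.
$ aws sts get-session-token --output json > sts-creds.json

# Create the hosted cluster using the pre-created role and the STS credentials file.
$ hcp create cluster aws \
    --name example-hc \
    --role-arn arn:aws:iam::<account-id>:role/<cli-role> \
    --sts-creds sts-creds.json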
    

Actual results:

2024-06-06T04:34:39Z	ERROR	Failed to create cluster	{"error": "failed to create iam: AccessDenied: User: arn:aws:sts::301721915996:assumed-role/6cd90f28a6449141869b/cli-create-iam is not authorized to perform: iam:TagOpenIDConnectProvider on resource: arn:aws:iam::301721915996:oidc-provider/hypershift-ci-oidc.s3.us-east-1.amazonaws.com/6cd90f28a6449141869b because no identity-based policy allows the iam:TagOpenIDConnectProvider action\n\tstatus code: 403, request id: 20e16ec4-b9a1-4fa4-aa34-1344145d41fd"}
github.com/openshift/hypershift/product-cli/cmd/cluster/aws.NewCreateCommand.func1
	/remote-source/app/product-cli/cmd/cluster/aws/create.go:60
github.com/spf13/cobra.(*Command).execute
	/remote-source/app/vendor/github.com/spf13/cobra/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
	/remote-source/app/vendor/github.com/spf13/cobra/command.go:1115
github.com/spf13/cobra.(*Command).Execute
	/remote-source/app/vendor/github.com/spf13/cobra/command.go:1039
github.com/spf13/cobra.(*Command).ExecuteContext
	/remote-source/app/vendor/github.com/spf13/cobra/command.go:1032
main.main
	/remote-source/app/product-cli/main.go:60
runtime.main
	/usr/lib/golang/src/runtime/proc.go:271
Error: failed to create iam: AccessDenied: User: arn:aws:sts::301721915996:assumed-role/6cd90f28a6449141869b/cli-create-iam is not authorized to perform: iam:TagOpenIDConnectProvider on resource: arn:aws:iam::301721915996:oidc-provider/hypershift-ci-oidc.s3.us-east-1.amazonaws.com/6cd90f28a6449141869b because no identity-based policy allows the iam:TagOpenIDConnectProvider action
	status code: 403, request id: 20e16ec4-b9a1-4fa4-aa34-1344145d41fd
failed to create iam: AccessDenied: User: arn:aws:sts::301721915996:assumed-role/6cd90f28a6449141869b/cli-create-iam is not authorized to perform: iam:TagOpenIDConnectProvider on resource: arn:aws:iam::301721915996:oidc-provider/hypershift-ci-oidc.s3.us-east-1.amazonaws.com/6cd90f28a6449141869b because no identity-based policy allows the iam:TagOpenIDConnectProvider action
	status code: 403, request id: 20e16ec4-b9a1-4fa4-aa34-1344145d41fd
{"component":"entrypoint","error":"wrapped process failed: exit status 1","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:84","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.internalRun","level":"error","msg":"Error executing test process","severity":"error","time":"2024-06-06T04:34:39Z"}
error: failed to execute wrapped command: exit status 1

Expected results:

    The hosted cluster should be created successfully.

Additional info: 
Full Logs: https://docs.google.com/document/d/1AnvAHXPfPYtP6KRcAKOebAx1wXjhWMOn3TW604XK09o/edit 
The same command succeeds when run a second time.

Please review the following PR: https://github.com/openshift/azure-workload-identity/pull/11

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Currently, the openshift-enterprise-tests image depends on the openstack repository for the x86_64 and ppc64le architectures. The package python-cinder gets installed to allow the openstack end-to-end tests (https://github.com/openshift/release/blob/60fed3474509bff9c5585a736554739e8ec4f017/ci-operator/step-registry/openstack/test/e2e/openstack-test-e2e-chain.yaml#L5) to run (https://github.com/openshift/openstack-test/).

The python-cinder package is not made available for rhel9 on ppc64le. To move the tests image to rhel9, OCP probably should follow upstream's decision to not support ppc64le.

Please review the following PR: https://github.com/openshift/openshift-state-metrics/pull/112

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33005. The following is the description of the original issue:

Description of problem:

    Pod stuck in creating state when running performance benchmark

The exact error when describing the pod -
Events:
  Type     Reason                  Age                    From     Message
  ----     ------                  ----                   ----     -------
  Warning  FailedCreatePodSandBox  45s (x114 over 3h47m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_client-1-5c978b7665-n4tds_cluster-density-v2-35_f57d8281-5a79-4c91-9b83-bb3e4b553597_0(5a8d6897ca792d91f1c52054f5f8c596530fbf72d3abb07b19a20fd9c95cc564): error adding pod cluster-density-v2-35_client-1-5c978b7665-n4tds to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&\{ContainerID:5a8d6897ca792d91f1c52054f5f8c596530fbf72d3abb07b19a20fd9c95cc564 Netns:/var/run/netns/e06c9af7-c13d-426f-9a00-73c54441a20b IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=cluster-density-v2-35;K8S_POD_NAME=client-1-5c978b7665-n4tds;K8S_POD_INFRA_CONTAINER_ID=5a8d6897ca792d91f1c52054f5f8c596530fbf72d3abb07b19a20fd9c95cc564;K8S_POD_UID=f57d8281-5a79-4c91-9b83-bb3e4b553597 Path: StdinData:[123 34 98 105 110 68 105 114 34 58 34 47 118 97 114 47 108 105 98 47 99 110 105 47 98 105 110 34 44 34 99 104 114 111 111 116 68 105 114 34 58 34 47 104 111 115 116 114 111 111 116 34 44 34 99 108 117 115 116 101 114 78 101 116 119 111 114 107 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 47 49 48 45 111 118 110 45 107 117 98 101 114 110 101 116 101 115 46 99 111 110 102 34 44 34 99 110 105 67 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 97 101 109 111 110 83 111 99 107 101 116 68 105 114 34 58 34 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 103 108 111 98 97 108 78 97 109 101 115 112 97 99 101 115 34 58 34 100 101 102 97 117 108 116 44 111 112 101 110 115 104 105 102 116 45 109 117 108 116 117 115 44 111 112 101 110 115 104 105 102 116 45 115 114 105 111 118 45 110 101 116 119 111 114 107 45 111 112 101 114 97 116 111 114 34 44 34 108 111 103 76 101 118 101 108 34 58 34 118 101 114 98 111 115 101 34 44 34 108 111 103 84 111 83 116 100 101 114 114 34 58 116 114 117 101 44 34 109 117 108 116 117 115 65 117 116 111 99 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 34 44 34 109 117 108 116 117 115 67 111 110 102 105 103 70 105 108 101 34 58 34 97 117 116 111 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 116 119 111 114 107 34 44 34 110 97 109 101 115 112 97 99 101 73 115 111 108 97 116 105 111 110 34 58 116 114 117 101 44 34 112 101 114 78 111 100 101 67 101 114 116 105 102 105 99 97 116 101 34 58 123 34 98 111 111 116 115 116 114 97 112 75 117 98 101 99 111 110 102 105 103 34 58 34 47 118 97 114 47 108 105 98 47 107 117 98 101 108 101 116 47 107 117 98 101 99 111 110 102 105 103 34 44 34 99 101 114 116 68 105 114 34 58 34 47 101 116 99 47 99 110 105 47 109 117 108 116 117 115 47 99 101 114 116 115 34 44 34 99 101 114 116 68 117 114 97 116 105 111 110 34 58 34 50 52 104 34 44 34 101 110 97 98 108 101 100 34 58 116 114 117 101 125 44 34 115 111 99 107 101 116 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 116 121 112 101 34 58 34 109 117 108 116 117 115 45 115 104 105 109 34 125]} ContainerID:"5a8d6897ca792d91f1c52054f5f8c596530fbf72d3abb07b19a20fd9c95cc564" 
Netns:"/var/run/netns/e06c9af7-c13d-426f-9a00-73c54441a20b" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=cluster-density-v2-35;K8S_POD_NAME=client-1-5c978b7665-n4tds;K8S_POD_INFRA_CONTAINER_ID=5a8d6897ca792d91f1c52054f5f8c596530fbf72d3abb07b19a20fd9c95cc564;K8S_POD_UID=f57d8281-5a79-4c91-9b83-bb3e4b553597" Path:"" ERRORED: error configuring pod [cluster-density-v2-35/client-1-5c978b7665-n4tds] networking: [cluster-density-v2-35/client-1-5c978b7665-n4tds/f57d8281-5a79-4c91-9b83-bb3e4b553597:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[cluster-density-v2-35/client-1-5c978b7665-n4tds 5a8d6897ca792d91f1c52054f5f8c596530fbf72d3abb07b19a20fd9c95cc564 network default NAD default] [cluster-density-v2-35/client-1-5c978b7665-n4tds 5a8d6897ca792d91f1c52054f5f8c596530fbf72d3abb07b19a20fd9c95cc564 network default NAD default] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:83:03:f6 [10.131.3.246/23]
'
': StdinData: \{"binDir":"/var/lib/cni/bin","clusterNetwork":"/host/run/multus/cni/net.d/10-ovn-kubernetes.conf","cniVersion":"0.3.1","daemonSocketDir":"/run/multus/socket","globalNamespaces":"default,openshift-multus,openshift-sriov-network-operator","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","namespaceIsolation":true,"type":"multus-shim"}

Version-Release number of selected component (if applicable):

    4.16.0-ec.5

How reproducible:

    50-60%

It seems to be related to the number of times I have run our test on a single cluster. Many of our performance tests are on ephemeral clusters - so we build the cluster, run the test, tear down. Currently I have a long lived cluster (1 week old), and I have been running many performance tests against this cluster -- serially. After each test, the previous resources are cleaned up.
Steps to Reproduce:

    1. Use the following cmdline as an example.
    2. ./bin/amd64/kube-burner-ocp cluster-density-v2 --iterations 90
    3. Repeat until the issue arises (usually after 3-4 attempts).

Actual results:

    client-1-5c978b7665-n4tds    0/1     ContainerCreating   0          4h14m

Expected results:

    For the benchmark not to get stuck waiting for this pod.

Additional info:

    Looking at the ovnkube-controller pod logs, grepping for the pod which was stuck

oc logs -n openshift-ovn-kubernetes ovnkube-node-qpkws -c ovnkube-controller | grep client-1-5c978b7665-n4tds

W0425 13:12:09.302395    6996 base_network_controller_policy.go:545] Failed to get get LSP for pod cluster-density-v2-35/client-1-5c978b7665-n4tds NAD default for networkPolicy allow-from-openshift-ingress, err: logical port cluster-density-v2-35/client-1-5c978b7665-n4tds for pod cluster-density-v2-35_client-1-5c978b7665-n4tds not found in cache
I0425 13:12:09.302412    6996 obj_retry.go:370] Retry add failed for *factory.localPodSelector cluster-density-v2-35/client-1-5c978b7665-n4tds, will try again later: unable to get port info for pod cluster-density-v2-35/client-1-5c978b7665-n4tds NAD default
W0425 13:12:09.908446    6996 helper_linux.go:481] [cluster-density-v2-35/client-1-5c978b7665-n4tds 7f80514901cbc57517d263f1a5aa143d2c82f470132c01f8ba813c18f3160ee4] pod uid f57d8281-5a79-4c91-9b83-bb3e4b553597: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:83:03:f6 [10.131.3.246/23]
I0425 13:12:09.963651    6996 cni.go:279] [cluster-density-v2-35/client-1-5c978b7665-n4tds 7f80514901cbc57517d263f1a5aa143d2c82f470132c01f8ba813c18f3160ee4 network default NAD default] ADD finished CNI request [cluster-density-v2-35/client-1-5c978b7665-n4tds 7f80514901cbc57517d263f1a5aa143d2c82f470132c01f8ba813c18f3160ee4 network default NAD default], result "", err failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:83:03:f6 [10.131.3.246/23]
I0425 13:12:09.988397    6996 cni.go:258] [cluster-density-v2-35/client-1-5c978b7665-n4tds 7f80514901cbc57517d263f1a5aa143d2c82f470132c01f8ba813c18f3160ee4 network default NAD default] DEL starting CNI request [cluster-density-v2-35/client-1-5c978b7665-n4tds 7f80514901cbc57517d263f1a5aa143d2c82f470132c01f8ba813c18f3160ee4 network default NAD default]
W0425 13:12:09.996899    6996 helper_linux.go:697] Failed to delete pod "cluster-density-v2-35/client-1-5c978b7665-n4tds" interface 7f80514901cbc57: failed to lookup link 7f80514901cbc57: Link not found
I0425 13:12:10.009234    6996 cni.go:279] [cluster-density-v2-35/client-1-5c978b7665-n4tds 7f80514901cbc57517d263f1a5aa143d2c82f470132c01f8ba813c18f3160ee4 network default NAD default] DEL finished CNI request [cluster-density-v2-35/client-1-5c978b7665-n4tds 7f80514901cbc57517d263f1a5aa143d2c82f470132c01f8ba813c18f3160ee4 network default NAD default], result "\{\"dns\":{}}", err <nil>
I0425 13:12:10.059917    6996 cni.go:258] [cluster-density-v2-35/client-1-5c978b7665-n4tds 7f80514901cbc57517d263f1a5aa143d2c82f470132c01f8ba813c18f3160ee4 network default NAD default] DEL starting CNI request [cluster-density-v2-35/client-1-5c978b7665-n4tds 7f80514901cbc57517d263f1a5aa143d2c82f470132c01f8ba813c18f3160ee4 network default NAD default]



Description of problem:

    A customer is deploying SNO with lvms-operator being installed during cluster installation using assisted-service. One of the deployment failed with catalog-operator pod crashlooping.

NAME                                      READY   STATUS             RESTARTS   AGE
catalog-operator-db9dff494-pqb68          0/1     CrashLoopBackOff   56         4h

The pod logs show a panic.

$ oc logs catalog-operator-db9dff494-pqb68 -n openshift-operator-lifecycle-manager2024-05-16T13:24:46.709156999Z time="2024-05-16T13:24:46Z" level=info msg="log level info"2024-05-16T13:24:46.709232085Z time="2024-05-16T13:24:46Z" level=info msg="TLS keys set, using https for metrics"2024-05-16T13:24:46.709736948Z W0516 13:24:46.709618       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.2024-05-16T13:24:46.709855179Z time="2024-05-16T13:24:46Z" level=info msg="Using in-cluster kube client config"2024-05-16T13:24:46.710165923Z time="2024-05-16T13:24:46Z" level=info msg="Using in-cluster kube client config"2024-05-16T13:24:46.710274657Z W0516 13:24:46.710268       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.2024-05-16T13:24:46.711960302Z W0516 13:24:46.711831       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.2024-05-16T13:24:46.720943025Z time="2024-05-16T13:24:46Z" level=info msg="connection established. cluster-version: v1.27.12+7bee54d"2024-05-16T13:24:46.720943025Z time="2024-05-16T13:24:46Z" level=info msg="operator ready"2024-05-16T13:24:46.720943025Z time="2024-05-16T13:24:46Z" level=info msg="starting informers..."2024-05-16T13:24:46.720943025Z time="2024-05-16T13:24:46Z" level=info msg="informers started"2024-05-16T13:24:46.720943025Z time="2024-05-16T13:24:46Z" level=info msg="waiting for caches to sync..."2024-05-16T13:24:46.921220918Z time="2024-05-16T13:24:46Z" level=info msg="starting workers..."2024-05-16T13:24:46.921869716Z time="2024-05-16T13:24:46Z" level=info msg="connection established. cluster-version: v1.27.12+7bee54d"2024-05-16T13:24:46.921869716Z time="2024-05-16T13:24:46Z" level=info msg="operator ready"2024-05-16T13:24:46.921869716Z time="2024-05-16T13:24:46Z" level=info msg="starting informers..."2024-05-16T13:24:46.921869716Z time="2024-05-16T13:24:46Z" level=info msg="informers started"2024-05-16T13:24:46.921869716Z time="2024-05-16T13:24:46Z" level=info msg="waiting for caches to sync..."2024-05-16T13:24:46.922300604Z time="2024-05-16T13:24:46Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=2024-05-16T13:24:47.022696884Z time="2024-05-16T13:24:47Z" level=info msg="starting workers..."2024-05-16T13:24:59.544398366Z panic: runtime error: invalid memory address or nil pointer dereference2024-05-16T13:24:59.544398366Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1d761e6]2024-05-16T13:24:59.544398366Z 2024-05-16T13:24:59.544398366Z goroutine 469 [running]:2024-05-16T13:24:59.544398366Z github.com/operator-framework/operator-lifecycle-manager/pkg/controller/bundle.sortUnpackJobs.func1(0xc002bdca20?, 0x0?)2024-05-16T13:24:59.544398366Z     /build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/bundle/bundle_unpacker.go:844 +0xc62024-05-16T13:24:59.544398366Z sort.insertionSort_func({0xc002b7cfb0?, 0xc0029fffe0?}, 0x0, 0x3)2024-05-16T13:24:59.544398366Z     /usr/lib/golang/src/sort/zsortfunc.go:12 +0xb12024-05-16T13:24:59.544398366Z sort.pdqsort_func({0xc002b7cfb0?, 0xc0029fffe0?}, 0x7f07987eab38?, 0x18?, 0xc001e80000?)2024-05-16T13:24:59.544398366Z     /usr/lib/golang/src/sort/zsortfunc.go:73 +0x2dd
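
A hedged way to gather more context before the pod restarts again; the panic points at OLM's bundle unpack job sorting, so the unpack jobs are worth a look (namespace and pod names below are assumptions/placeholders):

# Capture the panic trace from the previous container instance.
$ oc -n openshift-operator-lifecycle-manager logs <catalog-operator-pod> --previous > catalog-operator-panic.log

# List the bundle unpack jobs that sortUnpackJobs would be iterating over.
$ oc -n openshift-marketplace get jobs
$ oc -n openshift-marketplace get pods | grep -i unpack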

Version-Release number of selected component (if applicable):

    4.14.22

How reproducible:

    Only sometimes

Steps to Reproduce:

    1. SNO cluster deployment using assisted service
    2. Provide lvms-operator sub, operatorgroup and namespace yamls during installation
    3. The pod crashed once the node booted after ignition
    

Actual results:

Pod crashed with panic

Expected results:

The pod should be running

Additional info:

    

This is a clone of issue OCPBUGS-34911. The following is the description of the original issue:

We need to add more people to the owners file of multus repo.

Description of problem:
Multicast packets are 100% dropped

Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-06-02-202327
How reproducible:
Always

Steps to Reproduce:

1. Create a test namespace and enable multicast

oc  describe ns test   
Name:         test
Labels:       kubernetes.io/metadata.name=test
              pod-security.kubernetes.io/audit=restricted
              pod-security.kubernetes.io/audit-version=v1.24
              pod-security.kubernetes.io/enforce=restricted
              pod-security.kubernetes.io/enforce-version=v1.24
              pod-security.kubernetes.io/warn=restricted
              pod-security.kubernetes.io/warn-version=v1.24
Annotations:  k8s.ovn.org/multicast-enabled: true
              openshift.io/sa.scc.mcs: s0:c28,c27
              openshift.io/sa.scc.supplemental-groups: 1000810000/10000
              openshift.io/sa.scc.uid-range: 1000810000/10000
Status:       Active

No resource quota.

No LimitRange resource.

2. Created multicast pods

% oc get pods -n test -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP            NODE                                        NOMINATED NODE   READINESS GATES
mcast-rc-67897   1/1     Running   0          10s   10.129.2.42   ip-10-0-86-58.us-east-2.compute.internal    <none>           <none>
mcast-rc-ftsq8   1/1     Running   0          10s   10.128.2.61   ip-10-0-33-247.us-east-2.compute.internal   <none>           <none>
mcast-rc-q48db   1/1     Running   0          10s   10.131.0.27   ip-10-0-1-176.us-east-2.compute.internal    <none>           <none>

3. Test multicast traffic with omping from two pods

% oc rsh -n test mcast-rc-67897  
~ $ 
~ $ omping -c10 10.129.2.42 10.128.2.61 
10.128.2.61 : waiting for response msg
10.128.2.61 : joined (S,G) = (*, 232.43.211.234), pinging
10.128.2.61 :   unicast, seq=1, size=69 bytes, dist=2, time=0.506ms
10.128.2.61 :   unicast, seq=2, size=69 bytes, dist=2, time=0.595ms
10.128.2.61 :   unicast, seq=3, size=69 bytes, dist=2, time=0.555ms
10.128.2.61 :   unicast, seq=4, size=69 bytes, dist=2, time=0.572ms
10.128.2.61 :   unicast, seq=5, size=69 bytes, dist=2, time=0.614ms
10.128.2.61 :   unicast, seq=6, size=69 bytes, dist=2, time=0.653ms
10.128.2.61 :   unicast, seq=7, size=69 bytes, dist=2, time=0.611ms
10.128.2.61 :   unicast, seq=8, size=69 bytes, dist=2, time=0.594ms
10.128.2.61 :   unicast, seq=9, size=69 bytes, dist=2, time=0.603ms
10.128.2.61 :   unicast, seq=10, size=69 bytes, dist=2, time=0.687ms
10.128.2.61 : given amount of query messages was sent

10.128.2.61 :   unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 0.506/0.599/0.687/0.050
10.128.2.61 : multicast, xmt/rcv/%loss = 10/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000

% oc rsh -n test mcast-rc-ftsq8
~ $ omping -c10 10.128.2.61  10.129.2.42
10.129.2.42 : waiting for response msg
10.129.2.42 : waiting for response msg
10.129.2.42 : waiting for response msg
10.129.2.42 : waiting for response msg
10.129.2.42 : joined (S,G) = (*, 232.43.211.234), pinging
10.129.2.42 :   unicast, seq=1, size=69 bytes, dist=2, time=0.463ms
10.129.2.42 :   unicast, seq=2, size=69 bytes, dist=2, time=0.578ms
10.129.2.42 :   unicast, seq=3, size=69 bytes, dist=2, time=0.632ms
10.129.2.42 :   unicast, seq=4, size=69 bytes, dist=2, time=0.652ms
10.129.2.42 :   unicast, seq=5, size=69 bytes, dist=2, time=0.635ms
10.129.2.42 :   unicast, seq=6, size=69 bytes, dist=2, time=0.626ms
10.129.2.42 :   unicast, seq=7, size=69 bytes, dist=2, time=0.597ms
10.129.2.42 :   unicast, seq=8, size=69 bytes, dist=2, time=0.618ms
10.129.2.42 :   unicast, seq=9, size=69 bytes, dist=2, time=0.964ms
10.129.2.42 :   unicast, seq=10, size=69 bytes, dist=2, time=0.619ms
10.129.2.42 : given amount of query messages was sent

10.129.2.42 :   unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 0.463/0.638/0.964/0.126
10.129.2.42 : multicast, xmt/rcv/%loss = 10/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000

Actual results:
Multicast packet loss is 100%
10.129.2.42 : multicast, xmt/rcv/%loss = 10/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000

Expected results:
There should not be 100% packet loss.

Additional info:
No such issue in 4.15: tested on the same profile (ipi-on-aws/versioned-installer-ci) with 4.15.0-0.nightly-2024-05-31-131420, performing the same operation with the above steps.

The output for both multicast pods:

10.131.0.27 :   unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 1.176/1.239/1.269/0.027
10.131.0.27 : multicast, xmt/rcv/%loss = 10/9/9% (seq>=2 0%), min/avg/max/std-dev = 1.227/1.304/1.755/0.170
and 
10.129.2.16 :   unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 1.101/1.264/1.321/0.065
10.129.2.16 : multicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 1.230/1.351/1.890/0.191

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure
  2. customer issue / SD
  3. internal RedHat testing failure

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

If it is a CI failure:

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

If it is a customer / SD issue:

  • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
  • Don’t presume that Engineering has access to Salesforce.
  • Do presume that Engineering will access attachments through supportshell.
  • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • When showing the results from commands, include the entire command in the output.  
  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
  • Do not set the priority, that is owned by Engineering and will be set when the bug is evaluated
  • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

Please review the following PR: https://github.com/openshift/network-tools/pull/125

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-41255. The following is the description of the original issue:

This is a clone of issue OCPBUGS-39438. The following is the description of the original issue:

Description of problem: If a customer applies ethtool configuration to the interface used in br-ex, that configuration will be dropped when br-ex is created. We need to read and apply the configuration from the interface to the phys0 connection profile, as described in https://issues.redhat.com/browse/RHEL-56741?focusedId=25465040&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-25465040

Version-Release number of selected component (if applicable): 4.16

How reproducible: Always

Steps to Reproduce:

1. Deploy a cluster with an NMState config that sets the ethtool.feature.esp-tx-csum-hw-offload field to "off"

2.

3.

Actual results: The ethtool setting is only applied to the interface profile which is disabled after configure-ovs runs

Expected results: The ethtool setting is present on the configure-ovs-created profile

Additional info:

Affected Platforms: VSphere. Probably baremetal too and possibly others.
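
A hedged way to confirm whether the setting made it onto the configure-ovs-created profile is to compare the NetworkManager profile and the live interface features on the node (the profile name ovs-if-phys0 follows the usual configure-ovs convention and is an assumption; node and interface names are placeholders):

# Check the ethtool settings stored on the configure-ovs profile.
$ oc debug node/<node> -- chroot /host nmcli connection show ovs-if-phys0 | grep -i ethtool

# Check what is actually applied on the interface.
$ oc debug node/<node> -- chroot /host ethtool --show-features <interface> | grep esp-tx-csum-hw-offload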

Description of problem:

While testing PR https://github.com/openshift/machine-config-operator/pull/4083, there is a machineset that does not have any machine linked to it.

$ oc get machineset/rioliu-1220c-bz2gp-worker-f -n openshift-machine-api
NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
rioliu-1220c-bz2gp-worker-f   0         0                             3h47m

Many errors found in MCD log like below

I1220 09:15:59.743704       1 machine_set_boot_image_controller.go:211] Error syncing machineset openshift-machine-api/rioliu-1220c-bz2gp-worker-f: failed to fetch architecture type of machineset rioliu-1220c-bz2gp-worker-f, err: could not find any machines linked to machineset, error: %!w(<nil>)

The machineset patch is skipped in the reconcile loop due to the above error, so the boot image info cannot be patched even though the machineset has no machines provisioned.
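
To confirm that the machineset really has no linked machines, the machines can be listed by the back-reference label (the label key below follows the machine-api convention and is an assumption):

# An empty result confirms the boot image controller has no machine to derive the architecture from.
$ oc -n openshift-machine-api get machines \
    -l machine.openshift.io/cluster-api-machineset=rioliu-1220c-bz2gp-worker-f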

Version-Release number of selected component (if applicable):

 

How reproducible:

Consistently

Steps to Reproduce:

https://github.com/openshift/machine-config-operator/pull/4083#issuecomment-1864226629

Actual results:

The machineset is skipped in the reconcile loop due to the above error, so the boot image info cannot be patched.

Expected results:

The machineset should be updated even when no linked machine is found, because it may simply have been scaled down to 0 replicas.

Additional info:

    

Description of problem:

After the fix for OCPBUGSM-44759, we put timeouts on payload retrieval operations (verification and download); previously they were uncapped and under certain network circumstances could take hours to terminate. Testing the fix uncovered a problem where, after CVO passes the path with the timeouts, CVO starts logging errors for the core manifest reconciliation loop:

I0208 11:22:57.107819       1 sync_worker.go:993] Running sync for role "openshift-marketplace/marketplace-operator" (648 of 834)
I0208 11:22:57.107887       1 task_graph.go:474] Canceled worker 1 while waiting for work
I0208 11:22:57.107900       1 sync_worker.go:1013] Done syncing for configmap "openshift-apiserver-operator/trusted-ca-bundle" (444 of 834)
I0208 11:22:57.107911       1 task_graph.go:474] Canceled worker 0 while waiting for work                                                                                                                                                                                                                              
I0208 11:22:57.107918       1 task_graph.go:523] Workers finished
I0208 11:22:57.107925       1 task_graph.go:546] Result of work: [update context deadline exceeded at 8 of 834 Could not update role "openshift-marketplace/marketplace-operator" (648 of 834)]
I0208 11:22:57.107938       1 sync_worker.go:1169] Summarizing 1 errors
I0208 11:22:57.107947       1 sync_worker.go:1173] Update error 648 of 834: UpdatePayloadFailed Could not update role "openshift-marketplace/marketplace-operator" (648 of 834) (context.deadlineExceededError: context deadline exceeded)
E0208 11:22:57.107966       1 sync_worker.go:654] unable to synchronize image (waiting 3m39.457405047s): Could not update role "openshift-marketplace/marketplace-operator" (648 of 834)

This is caused by locks. The SyncWorker.Update method acquires its lock for its whole duration. The payloadRetriever.RetrievePayload method is called inside SyncWorker.Update, on the following call chain:

SyncWorker.Update ->
  SyncWorker.loadUpdatedPayload ->
    SyncWorker.syncPayload ->
      payloadRetriever.RetrievePayload

RetrievePayload can take 2 or 4 minutes before it times out, so CVO holds the lock for this whole wait.

The manifest reconciliation loop is implemented in the apply method. The whole apply method is bounded by a timeout context set to 2*minimum reconcile interval so it will be set to a value between 4 and 8 minutes. While in the reconciling mode, the manifest graph is split into multiple "tasks" where smaller sequences of these tasks are applied in parallel. Individual tasks in these series are iterated over and each iteration uses a consistentReporter to report status via its Update method, which also acquires the lock on the following call sequence:

SyncWorker.apply ->
  { for _, task := range tasks ... ->
    consistentReporter.Update ->
      statusWrapper.Report ->
        SyncWorker.updateApplyStatus ->

This leads to the following sequence:

1. apply is called with a timeout between 4 and 8 minutes
2. in parallel, SyncWorker.Update starts and acquires the lock
3. tasks under apply wait on the reporter to acquire lock
4. after 2 or 4 minutes, RetrievePayload under SyncWorker.Update times out and terminates; SyncWorker.Update terminates and releases the lock
5. tasks under apply report results after briefly acquiring the lock, start to do their thing
6. in parallel, SyncWorker.Update starts again and acquires the lock
7. further iterations over tasks under apply wait on the reporter to acquire lock
8. context passed to apply times out
9. Canceled worker 0 while waiting for work... errors

Version-Release number of selected component (if applicable):

4.13.0-0.ci.test-2023-02-06-062603 with https://github.com/openshift/cluster-version-operator/pull/896

How reproducible:

always in certain cluster configuration

Steps to Reproduce:

1. in a disconnected cluster, upgrade to an unreachable payload image with --force
2. observe the CVO log
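
A minimal sketch of step 1, using a placeholder image reference that the disconnected cluster cannot reach; `oc adm upgrade --clear` is the abort referenced in the workaround below:

# Force an update to a payload image the cluster cannot pull or verify.
$ oc adm upgrade --allow-explicit-upgrade --force \
    --to-image=registry.example.com/unreachable/release@sha256:<digest>

# Watch the CVO log for the "Canceled worker ... while waiting for work" errors.
$ oc -n openshift-cluster-version logs deploy/cluster-version-operator -f

# Aborting the stuck retrieval restores normal manifest reconciliation.
$ oc adm upgrade --clear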

Actual results:

CVO starts to fail reconciling manifests

Expected results:

no failures, cluster continues to try retrieving the image but no interference with manifest reconciliation

Additional info:

This problem was discovered by Evgeni Vakhonin while testing fix for OCPBUGSM-44759: https://bugzilla.redhat.com/show_bug.cgi?id=2090680#c22

https://github.com/openshift/cluster-version-operator/pull/896 uncovers this issue but still gets CVO into a better shape - previously the RetrievePayload could be running for a much longer time (hours), preventing the CVO from working at all.

When the cluster gets into this buggy state, the solution is to abort the upgrade that fails to verify or download.

Description of problem:

/var/log/etcd/etcd-health-probe.log exists on control plane nodes, but we only touch it in code:
https://github.com/openshift/cluster-etcd-operator/blob/master/bindata/etcd/pod.yaml#L26

etcd's /var/log/etcd/etcd-health-probe.log may be mistaken for an audit log, because there are audit logs in the same directory tree for the apiserver and auth:
/var/log/kube-apiserver/audit-2024-03-21T04-27-49.470.log
/var/log/oauth-server/audit.log

etcd-health-probe.log will cause some confusion for users.

    How reproducible: always


    Steps to Reproduce:
    1. login control plane node
    2. check /var/log/etcd/etcd-health-probe.log
    3. the file size is always zero
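
For example, a hedged check from a workstation (node name is a placeholder):

# The file exists but stays empty, since the operator only ever touches it.
$ oc debug node/<control-plane-node> -- chroot /host ls -l /var/log/etcd/etcd-health-probe.log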

    Actual results:

    
    Expected results: remove this file in the code / don't touch this file

    

Additional info:


    

Please review the following PR: https://github.com/openshift/cluster-machine-approver/pull/218

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The bootstrap process failed due to a coredns.yaml manifest generation issue:

Feb 04 05:14:34 yunjiang-p2-2r2b2-bootstrap bootkube.sh[11219]: I0204 05:14:34.966343       1 bootstrap.go:188] manifests/on-prem/coredns.yaml
Feb 04 05:14:34 yunjiang-p2-2r2b2-bootstrap bootkube.sh[11219]: F0204 05:14:34.966513       1 bootstrap.go:188] error rendering bootstrap manifests: failed to execute template: template: manifests/on-prem/coredns.yaml:34:32: executing "manifests/on-prem/coredns.yaml" at <onPremPlatformAPIServerInternalIPs .ControllerConfig>: error calling onPremPlatformAPIServerInternalIPs: invalid platform for API Server Internal IP
Feb 04 05:14:35 yunjiang-p2-2r2b2-bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=255/EXCEPTION
Feb 04 05:14:35 yunjiang-p2-2r2b2-bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.

    

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2024-02-03-192446
4.16.0-0.nightly-2024-02-03-221256
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Enable custom DNS on GCP: platform.gcp.userProvisionedDNS: Enabled and featureSet: TechPreviewNoUpgrade
    2.
    3.
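
A sketch of the install-config fields for step 1, written as a shell heredoc that produces a fragment to merge into install-config.yaml (field placement is an assumption based on the description above):

# Fragment with the custom-DNS and feature-set settings; merge into install-config.yaml.
$ cat > userdns-fragment.yaml <<'EOF'
featureSet: TechPreviewNoUpgrade
platform:
  gcp:
    userProvisionedDNS: Enabled
EOF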
    

Actual results:

coredns.yaml cannot be generated, and bootstrap fails.
    

Expected results:

Bootstrap process succeeds.
    

Additional info:

    

Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver-operator/pull/96

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-41549. The following is the description of the original issue:

This is a clone of issue OCPBUGS-35358. The following is the description of the original issue:

I'm working with the GitOps operator (1.7), and when there is a large number of CRs (38,000 Application objects in this case) the related install plan gets stuck with the following error:

 

- lastTransitionTime: "2024-06-11T14:28:40Z"
    lastUpdateTime: "2024-06-11T14:29:42Z"
    message: 'error validating existing CRs against new CRD''s schema for "applications.argoproj.io":
      error listing resources in GroupVersionResource schema.GroupVersionResource{Group:"argoproj.io",
      Version:"v1alpha1", Resource:"applications"}: the server was unable to return
      a response in the time allotted, but may still be processing the request' 

Even after waiting for a long time, the operator is unable to move forward, neither removing nor reinstalling its components.

 

In a lab, the issue was not present until we started to add load to the cluster (applications.argoproj.io); once we hit 26,000 applications we were no longer able to upgrade or reinstall the operator.

 

Description of problem:

    The environment file /etc/kubernetes/node.env is overwritten after a node restart.

There is a typo in https://github.com/openshift/machine-config-operator/blob/master/templates/common/aws/files/usr-local-bin-aws-kubelet-nodename.yaml: the variable should be changed to NODEENV wherever NODENV is found.

Version-Release number of selected component (if applicable):

    

How reproducible:

  Easy

Steps to Reproduce:

    1. Change contents of /etc/kubernetes/node.env
    2. Restart node
    3. Notice changes are lost
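
A hedged way to confirm the symptom (node name is a placeholder):

# Record the file, reboot the node, then compare; with the NODENV/NODEENV typo
# in the kubelet-nodename script the local edits are regenerated away.
$ oc debug node/<node> -- chroot /host cat /etc/kubernetes/node.env > node.env.before
# ... reboot the node ...
$ oc debug node/<node> -- chroot /host cat /etc/kubernetes/node.env > node.env.after
$ diff node.env.before node.env.after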
    

Actual results:

  

Expected results:

     /etc/kubernetes/node.env should not be changed after restart of a node

Additional info:

    

This is a clone of issue OCPBUGS-34784. The following is the description of the original issue:

INSIGHTOCP-1557 is a rule to check for any custom Prometheus instances that may impact the management of corresponding resources.

Resource to gather:  Prometheus and Alertmanager in all namespaces

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager

Backport:  OCP 4.12.z; 4.13.z; 4.14.z; 4.15.z

Additional info:
1) Get the Prometheus and Alertmanager in all namespaces

$ oc get prometheus -A 
NAMESPACE              NAME                             VERSION   DESIRED   READY   RECONCILED   AVAILABLE   AGE
openshift-monitoring   k8s                              2.39.1    2         1       True         Degraded    712d
test                   custom-prometheus                          1         0       True         False       25d
$ oc get alertmanager -A 
NAMESPACE              NAME                             VERSION   DESIRED   READY   RECONCILED   AVAILABLE   AGE
openshift-monitoring   main                             2.39.1    2         1       True         Degraded    712d
test                   custom-alertmanager                        1         0       True         False       25d
 

 

This is a clone of issue OCPBUGS-1735. The following is the description of the original issue:

Description of problem:

When setting up a cluster on vSphere, sometimes a machine is powered off while in the "Provisioning" phase; this triggers a new machine creation attempt, which reports the error "failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists"

Version-Release number of selected component (if applicable):

 4.12.0-0.ci.test-2022-09-26-235306-ci-ln-vh4qjyk-latest

How reproducible:

Sometimes, met two times

Steps to Reproduce:

1. Setup a vsphere cluster
2.
3.

Actual results:

Cluster installation failed, machine stuck in Provisioning status. 
$ oc get machine                      
NAME                             PHASE          TYPE   REGION   ZONE   AGE
jima-ipi-27-d97wp-master-0       Running                               4h
jima-ipi-27-d97wp-master-1       Running                               4h
jima-ipi-27-d97wp-master-2       Running                               4h
jima-ipi-27-d97wp-worker-7qn9b   Provisioning                          3h56m
jima-ipi-27-d97wp-worker-dsqd2   Running                               3h56m

$ oc edit machine jima-ipi-27-d97wp-worker-7qn9b
status:
  conditions:
  - lastTransitionTime: "2022-09-27T01:27:29Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2022-09-27T01:27:29Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2022-09-27T01:27:29Z"
    status: "True"
    type: Terminable
  lastUpdated: "2022-09-27T01:27:29Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastTransitionTime: "2022-09-27T01:36:09Z"
      message: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists.
      reason: MachineCreationSucceeded
      status: "False"
      type: MachineCreation
    taskRef: task-11363480

$ govc vm.info /SDDC-Datacenter/vm/jima-ipi-27-d97wp/jima-ipi-27-d97wp-worker-7qn9b
Name:           jima-ipi-27-d97wp-worker-7qn9b
  Path:         /SDDC-Datacenter/vm/jima-ipi-27-d97wp/jima-ipi-27-d97wp-worker-7qn9b
  UUID:         422cb686-6585-f05a-af13-b2acac3da294
  Guest name:   Red Hat Enterprise Linux 8 (64-bit)
  Memory:       16384MB
  CPU:          8 vCPU(s)
  Power state:  poweredOff
  Boot time:    <nil>
  IP address:   
  Host:         10.3.32.8

I0927 01:44:42.568599       1 session.go:91] No existing vCenter session found, creating new session
I0927 01:44:42.633672       1 session.go:141] Find template by instance uuid: 9535891b-902e-410c-b9bb-e6a57aa6b25a
I0927 01:44:42.641691       1 reconciler.go:270] jima-ipi-27-d97wp-worker-7qn9b: already exists, but was not powered on after clone, requeue
I0927 01:44:42.641726       1 controller.go:380] jima-ipi-27-d97wp-worker-7qn9b: reconciling machine triggers idempotent create
I0927 01:44:42.641732       1 actuator.go:66] jima-ipi-27-d97wp-worker-7qn9b: actuator creating machine
I0927 01:44:42.659651       1 reconciler.go:935] task: task-11363480, state: error, description-id: VirtualMachine.clone
I0927 01:44:42.659684       1 reconciler.go:951] jima-ipi-27-d97wp-worker-7qn9b: Updating provider status
E0927 01:44:42.659696       1 actuator.go:57] jima-ipi-27-d97wp-worker-7qn9b error: jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists.
I0927 01:44:42.659762       1 machine_scope.go:101] jima-ipi-27-d97wp-worker-7qn9b: patching machine
I0927 01:44:42.660100       1 recorder.go:103] events "msg"="jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists." "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"jima-ipi-27-d97wp-worker-7qn9b","uid":"9535891b-902e-410c-b9bb-e6a57aa6b25a","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"17614"} "reason"="FailedCreate" "type"="Warning"
W0927 01:44:42.688562       1 controller.go:382] jima-ipi-27-d97wp-worker-7qn9b: failed to create machine: jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists.
E0927 01:44:42.688651       1 controller.go:326]  "msg"="Reconciler error" "error"="jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: The name 'jima-ipi-27-d97wp-worker-7qn9b' already exists." "controller"="machine-controller" "name"="jima-ipi-27-d97wp-worker-7qn9b" "namespace"="openshift-machine-api" "object"={"name":"jima-ipi-27-d97wp-worker-7qn9b","namespace":"openshift-machine-api"} "reconcileID"="d765f02c-bd54-4e6c-88a4-c578f16c7149"
...
I0927 03:18:45.118110       1 actuator.go:66] jima-ipi-27-d97wp-worker-7qn9b: actuator creating machine
E0927 03:18:45.131676       1 actuator.go:57] jima-ipi-27-d97wp-worker-7qn9b error: jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created
I0927 03:18:45.131725       1 machine_scope.go:101] jima-ipi-27-d97wp-worker-7qn9b: patching machine
I0927 03:18:45.131873       1 recorder.go:103] events "msg"="jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"jima-ipi-27-d97wp-worker-7qn9b","uid":"9535891b-902e-410c-b9bb-e6a57aa6b25a","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"17614"} "reason"="FailedCreate" "type"="Warning"
W0927 03:18:45.150393       1 controller.go:382] jima-ipi-27-d97wp-worker-7qn9b: failed to create machine: jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created
E0927 03:18:45.150492       1 controller.go:326]  "msg"="Reconciler error" "error"="jima-ipi-27-d97wp-worker-7qn9b: reconciler failed to Create machine: ServerFaultCode: The object 'vim.Task:task-11363480' has already been deleted or has not been completely created" "controller"="machine-controller" "name"="jima-ipi-27-d97wp-worker-7qn9b" "namespace"="openshift-machine-api" "object"={"name":"jima-ipi-27-d97wp-worker-7qn9b","namespace":"openshift-machine-api"} "reconcileID"="5d92bc1d-2f0d-4a0b-bb20-7f2c7a2cb5af"
I0927 03:18:45.150543       1 controller.go:187] jima-ipi-27-d97wp-worker-dsqd2: reconciling Machine


Expected results:

Machine is created successfully.

Additional info:

machine-controller log: http://file.rdu.redhat.com/~zhsun/machine-controller.log

Description of problem:

Based on component readiness data comparing success rates for those two particular tests, we are regressing ~7-10% between the current 4.15 master and 4.14.z (i.e., we made the product ~10% worse).

 

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-upi-serial/1720630313664647168

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-serial/1719915053026643968

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-upi-serial/1721475601161785344

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-serial/1724202075631390720

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-upi-serial/1721927613917696000

These jobs and their failures are all caused by increased etcd leader elections disrupting seemingly unrelated test cases across the VSphere AMD64 platform.

Since this particular platform's business significance is high, I'm setting this as "Critical" severity.

Please get in touch with me or Dean West if more teams need to be pulled into investigation and mitigation.

 

Version-Release number of selected component (if applicable):

4.15 / master

How reproducible:

Component Readiness Board

Actual results:

The etcd leader elections are elevated. Some jobs indicate it is due to disk i/o throughput OR network overload. 

Expected results:

1. We NEED to understand what is causing this problem.
2. If we can mitigate this, we should.
3. If we cannot mitigate this, we need to document this or work with VSphere infrastructure provider to fix this problem.
4. We optionally need a way to measure how often this happens in our fleet so we can evaluate how bad it is.

Additional info:

 

Description of problem:

    The ca-west-1 region is missing from https://github.com/openshift/installer/blob/master/pkg/quota/aws/limits.go#L15

Version-Release number of selected component (if applicable):

    4.15+

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    Quota checking is skipped as if it was not supported

Expected results:

    

Additional info:

    

Description of problem:

  • We're seeing the following problem on two baremetal nodes where `routingViaHost=true` is enabled (with ipForwarding properly set to Global):
  • They have set NodeIP Hint to force OVN to bind to a secondary interface at `bond1.2039`
  • We're seeing that specific pods that are hostNetworked can't reach the default kubernetes service IP address, and are failing to initialize as a result (CLBO).

~~~
F0120 03:20:42.221327 879146 driver.go:131] failed to get node "wb02.pdns-edtnabtf-arch01.nfvdev.tlabs.ca" information: Get "https://192.168.0.1:443/api/v1/nodes/wb02.pdns-edtnabtf-arch01.nfvdev.tlabs.ca": dial tcp 192.168.0.1:443: i/o timeout
~~~

Other pods on the affected node with the above config can hit the target service; however, pods that are hostNetworked appear to be failing:

$ oc get pod csi-rbdplugin-kpz7n -o yaml | grep hostNetwork
hostNetwork: true

 

Version-Release number of selected component (if applicable):

4.14

  • bare-metal  

How reproducible:

  • new cluster, every time

Steps to Reproduce:

We have redeployed the cluster and have routingViaHost and ipForwarding both enabled.

We also pushed out a NODEIP_HINT configuration to all the nodes to make sure the SDN is overlaid on the correct interface.

The default gateway has been moved to bond1.2039 on the 2 x baremetal worker nodes.

wb01

wb02

Observe that hostNetworked pods go into CrashLoopBackOff.
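
Reachability from the host network namespace (which is what a hostNetwork pod uses) can be checked directly on the node; a hedged sketch with the node name and service IP taken from the report:

# From the affected worker's host network namespace, try the default kubernetes service IP.
$ oc debug node/wb02.pdns-edtnabtf-arch01.nfvdev.tlabs.ca -- chroot /host \
    curl -k -m 5 https://192.168.0.1:443/version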
 

Actual results:

  • hostnetworked pods cannot call the default kube service address  

Expected results:

  • hostnetworked pods should be able to do so.
     

Additional info:

See the first comment for data samples + must-gathers + sosreports

Description of problem:

compact agent e2e jobs are consistently failing the e2e test (when they manage to install):

 [sig-node] Managed cluster should verify that nodes have no unexpected reboots [Late] [Suite:openshift/conformance/parallel]

Examining CI search I noticed that this failure is also occurring on many other jobs:

https://search.dptools.openshift.org/?search=Managed+cluster+should+verify+that+nodes+have+no+unexpected+reboots&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Version-Release number of selected component (if applicable):

In CI search, we can see failures in 4.15-4.17

How reproducible:

See CI search results

Steps to Reproduce:

1.
2.
3.

Actual results:

fail e2e

Expected results:

Pass

Additional info:

 

Description of problem:

    Agent CI jobs started to fail on a pretty regular basis, especially the compact ones.
Jobs time out due to either the console or the authentication operator remaining in a degraded state.
From the log analysis, they are not able to get a route. Both apiserver and etcd component logs report connection refused messages, possibly indicating an underlying network problem.

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Pretty frequently

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem: Missing `useEffect` hook dependency warning error in console UI dev env

    Version-Release number of selected component (if applicable): 4.15.0-0.ci.test-2024-02-28-133709-ci-ln-svyfg32-latest

    

How reproducible:


To reproduce:

Run `yarn lint --fix`

Description of problem:

In this PR, we started using watcher channels to wait for the job finished event from the periodic and on-demand data gathering jobs from IO.
However, as stated in this comment, part of maintaining a watcher is to re-establish it at the last received resource version whenever this channel closes.

This issue is currently causing flakiness in our test suite: the on-demand data gathering job is created, and when the job is about to finish, the watcher channel closes, which causes the DataGather instance associated with the job to never have its insightsReport updated. Therefore the tests fail.

 

Version-Release number of selected component (if applicable):

    

How reproducible:

Sometimes. Very hard to reproduce, as it might have to do with the API resyncing the watcher's cache.

Steps to Reproduce:

    1.Create a data gathering job
    2.You may see a log saying "watcher channel was closed unexpectedly"
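
A quick way to spot the symptom (namespace and deployment names are assumed to be the usual Insights Operator ones):

# Look for the closed-watcher message and check whether the DataGather status ever got its report.
$ oc -n openshift-insights logs deploy/insights-operator | grep -i "watcher channel was closed"
$ oc get datagather -o yaml | grep -i insightsReport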

    

Actual results:

The DataGather instance will not be updated with the insightsReport    

Expected results:

When the job finishes, the archive is uploaded to ingress and the report is downloaded from the external data pipeline. This report should appear in the DataGather instance.

Additional info:

It's possible but flaky to reproduce with on-demand data gathering jobs but I've seen it happen with periodic ones as well.

On clusters with a large number of services with externalIPs or services of type LoadBalancer, the ovnkube-node initialization can take up to 50 minutes.

The problem is that after a node reboot done by MCO, the unschedulable taint is removed from the node, so the API schedules pods onto that node which then get stuck in ContainerCreating, while other nodes continue to go down for reboot, making the workloads unavailable (if no PDB exists to protect the workload).
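
A hedged way to observe the window (node name is a placeholder): while the MCO rolls a pool, watch whether a freshly rebooted node is already schedulable while ovnkube-node is still initializing, and whether pods pile up in ContainerCreating:

# Check node schedulability and taints right after the MCO-driven reboot.
$ oc get node <node> -o jsonpath='{.spec.unschedulable}{"\n"}{.spec.taints}{"\n"}'

# Check whether ovnkube-node on that node is actually ready before pods land on it.
$ oc -n openshift-ovn-kubernetes get pods -o wide --field-selector spec.nodeName=<node>

# Pods scheduled too early will sit in ContainerCreating.
$ oc get pods -A -o wide | grep ContainerCreating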

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-36301. The following is the description of the original issue:

Description of problem

Seen in a 4.16.1 CI run:

: [bz-Etcd] clusteroperator/etcd should not change condition/Available expand_less	1h28m39s
{  2 unexpected clusteroperator state transitions during e2e test run.  These did not match any known exceptions, so they cause this test-case to fail:

Jun 27 14:17:18.966 E clusteroperator/etcd condition/Available reason/EtcdMembers_NoQuorum status/False EtcdMembersAvailable: 1 of 3 members are available, ip-10-0-71-113.us-west-1.compute.internal is unhealthy, ip-10-0-58-93.us-west-1.compute.internal is unhealthy
Jun 27 14:17:18.966 - 75s   E clusteroperator/etcd condition/Available reason/EtcdMembers_NoQuorum status/False EtcdMembersAvailable: 1 of 3 members are available, ip-10-0-71-113.us-west-1.compute.internal is unhealthy, ip-10-0-58-93.us-west-1.compute.internal is unhealthy

But further digging turned up no sign that quorum had had any difficulties. It seems like the difficulty was the GetMemberHealth structure, which currently allows timelines like:

  • T0, start probing all known members in GetMemberHealth
  • Tsmall, MemberA Healthy:true Took:41.614949ms Error:<nil>
  • Talmost-30s, MemberB Healthy:false Took:29.869420582s Error:health check failed: context deadline exceeded
  • T30s, DefaultClientTimeout runs out.
  • T30s, MemberC Healthy:false Took:27.199µs Error:health check failed: context deadline exceeded
  • TB, next probe round rolls around, start probing all known members in GetMemberHealth.
  • TBsmall, MemberA Healthy:true Took:...ms Error:<nil>
  • TB+30s, MemberB Healthy:false Took:29....s Error:health check failed: context deadline exceeded
  • TB+30s, DefaultClientTimeout runs out.
  • TB+30s, MemberC Healthy:false Took:...µs Error:health check failed: context deadline exceeded

That can leave 30+s gaps of nominal Healthy:false for MemberC when in fact MemberC was completely fine.
I suspect that the "was really short" Took:27.199µs got a "took too long" context deadline exceeded because GetMemberHealth has a 30s timeout per member, while many (all?) of its callers have a 30s DefaultClientTimeout, which means by the time we get to MemberC, we've already spent our Context and we're starved of time to actually check MemberC. It may be more reliable to refactor and probe all known members in parallel, and to keep probing in the event of failures while you wait for the slowest member-probe to get back to you, because I suspect a re-probe of MemberC (or even a single probe that was granted reasonable time to complete) while we waited on MemberB would have succeeded and told us MemberC was actually fine.
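
A minimal sketch of that parallel-probe idea; the memberHealth struct, probeFunc type, and the per-member timeout are hypothetical stand-ins rather than the etcd operator's actual API. Each member gets its own deadline derived from the caller's context, so one slow or dead member can no longer consume the whole budget before the others are checked:

package health

import (
	"context"
	"sync"
	"time"
)

type memberHealth struct {
	Name    string
	Healthy bool
	Took    time.Duration
	Err     error
}

// probeFunc is a placeholder for a single-member health check (e.g. an etcd
// client call against that member's endpoint).
type probeFunc func(ctx context.Context, member string) error

func checkMembers(ctx context.Context, members []string, probe probeFunc, perMemberTimeout time.Duration) []memberHealth {
	results := make([]memberHealth, len(members))
	var wg sync.WaitGroup

	for i, m := range members {
		wg.Add(1)
		go func(i int, m string) {
			defer wg.Done()
			// Each probe gets its own timeout derived from the parent
			// context, instead of members sharing one budget sequentially.
			probeCtx, cancel := context.WithTimeout(ctx, perMemberTimeout)
			defer cancel()

			start := time.Now()
			err := probe(probeCtx, m)
			results[i] = memberHealth{
				Name:    m,
				Healthy: err == nil,
				Took:    time.Since(start),
				Err:     err,
			}
		}(i, m)
	}

	wg.Wait()
	return results
}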

Exposure is manageable, because this is self-healing, and quorum is actually ok. But it is still worth fixing, because it spooks admins (and the origin CI test suite) if you tell them you're Available=False, and we want to save that for situations where the component is actually having trouble like quorum loss, and not burn signal-to-noise by claiming EtcdMembers_NoQuorum when it's really BriefIssuesScrapingMemberAHealthAndWeWillTryAgainSoon.

Version-Release number of selected component

Seen in 4.16.1, but the code is old, so likely a longstanding issue.

How reproducible

Luckily for customers, but unluckily for QE, network (or other) hiccups when connecting to members seem rare, so we don't often trip the condition that exposes this issue.

Steps to Reproduce

1. Figure out which order etcd is probing members in.
2. Stop the first or second member, in a way that makes its health probes time out ~30s.
3. Monitor the etcd ClusterOperator Available condition.

Actual results

Available goes False claiming EtcdMembers_NoQuorum, as the operator starves itself of the time it needs to actually probe the third member.

Expected results

Available stays True, as the etcd operator takes the full 30s to check on all members and sees that two of them are completely happy.

This is a clone of issue OCPBUGS-22293. The following is the description of the original issue:

Description of problem:

Upgrading from 4.13.5 to 4.13.17 fails at network operator upgrade

Version-Release number of selected component (if applicable):

 

How reproducible:

Not sure since we only had one cluster on 4.13.5.

Steps to Reproduce:

1. Have a cluster on version 4.13.5 with OVN-Kubernetes
2. Set desired update image to quay.io/openshift-release-dev/ocp-release@sha256:c1f2fa2170c02869484a4e049132128e216a363634d38abf292eef181e93b692
3. Wait until it reaches network operator

Actual results:

Error message: Error while updating operator configuration: could not apply (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-master: failed to apply / update (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-master: DaemonSet.apps "ovnkube-master" is invalid: [spec.template.spec.containers[1].lifecycle.preStop: Required value: must specify a handler type, spec.template.spec.containers[3].lifecycle.preStop: Required value: must specify a handler type]

Expected results:

Network operator upgrades successfully

Additional info:

Since I'm not able to attach files please gather all required debug data from https://access.redhat.com/support/cases/#/case/03645170

 

Please review the following PR: https://github.com/openshift/kube-rbac-proxy/pull/88

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

There's a typo in the openssl commands within the ovn-ipsec-containerized/ovn-ipsec-host daemonsets. The correct parameter is "-checkend", not "-checkedn".

Version-Release number of selected component (if applicable):

# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.10   True        False         7s      Cluster version is 4.14.10

How reproducible:

Steps to Reproduce:

1. Enable IPsec encryption

# oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec": {"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{ }}}}}'

Actual results:

Examining the initContainer (ovn-keys) logs

# oc logs ovn-ipsec-containerized-7bcd2 -c ovn-keys
...
+ openssl x509 -noout -dates -checkedn 15770000 -in /etc/openvswitch/keys/ipsec-cert.pem
x509: Use -help for summary.
# oc get ds
NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
ovn-ipsec-containerized   1         1         0       1            0           beta.kubernetes.io/os=linux   159m
ovn-ipsec-host            1         1         1       1            1           beta.kubernetes.io/os=linux   159m
ovnkube-node              1         1         1       1            1           beta.kubernetes.io/os=linux   3h44m
# oc get ds ovn-ipsec-containerized -o yaml | grep edn
if ! openssl x509 -noout -dates -checkedn 15770000 -in $cert_pem; then     

# oc get ds ovn-ipsec-host -o yaml | grep edn
if ! openssl x509 -noout -dates -checkedn 15770000 -in $cert_pem; then

Description of problem:

   Invalid CN is not bubbled up in the CR 

Version-Release number of selected component (if applicable):

    4.15.0-rc7

How reproducible:

    always

Steps to Reproduce:

# generate a key and a CSR with an invalid CN
openssl genrsa -out myuser4.key 2048
openssl req -new -key myuser4.key -out myuser4.csr -subj "/CN=baduser/O=system:masters"
# get cert in the CSR
# apply the CSR
# Status remains Approved, but the certificate is not issued
% oc get csr | grep 29ecg6n5bkugrh6io4his24ser3bt16n-5-customer-break-glass-csr
29ecg6n5bkugrh6io4his24ser3bt16n-5-customer-break-glass-csr   4m29s   hypershift.openshift.io/ocm-integration-29ecg6n5bkugrh6io4his24ser3bt16n-ad-int1.customer-break-glass   system:admin                                                                60m                 Approved
# No status in the CSR status:
  conditions:
  - lastTransitionTime: "2024-02-16T14:06:41Z"
    lastUpdateTime: "2024-02-16T14:06:41Z"
    message: The requisite approval resource exists.
    reason: ApprovalPresent
    status: "True"
    type: Approved
# pki controller shows the error
 oc logs control-plane-pki-operator-bf6d75d5f-h95rf -n ocm-integration-29ecg6n5bkugrh6io4his24ser3bt16n-ad-int1 | grep "29ecg6n5bkugrh6io4his24ser3bt16n-5-customer-break-glass-csr"
I0216 14:06:41.842414       1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"ocm-integration-29ecg6n5bkugrh6io4his24ser3bt16n-ad-int1", Name:"control-plane-pki-operator", UID:"b63dbaa9-18f7-4ee6-8473-8a38bdb6f2df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CertificateSigningRequestApproved' "29ecg6n5bkugrh6io4his24ser3bt16n-5-customer-break-glass-csr" in is approved
I0216 14:06:41.848623       1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"ocm-integration-29ecg6n5bkugrh6io4his24ser3bt16n-ad-int1", Name:"control-plane-pki-operator", UID:"b63dbaa9-18f7-4ee6-8473-8a38bdb6f2df", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'CertificateSigningRequestInvalid' "29ecg6n5bkugrh6io4his24ser3bt16n-5-customer-break-glass-csr" is invalid: invalid certificate request: subject CommonName must begin with "system:customer-break-glass:"     
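
For contrast, a minimal Go sketch (standard library only) that produces a CSR whose CommonName carries the prefix the controller requires, per the CertificateSigningRequestInvalid event above; the username suffix, organization, and key size are illustrative assumptions:

package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"os"
)

func main() {
	// Generate a throwaway RSA key for the request.
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}

	tmpl := &x509.CertificateRequest{
		Subject: pkix.Name{
			// Must begin with "system:customer-break-glass:" to pass the
			// controller's CommonName validation shown in the log above.
			CommonName:   "system:customer-break-glass:myuser4",
			Organization: []string{"system:masters"},
		},
		SignatureAlgorithm: x509.SHA256WithRSA,
	}

	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		panic(err)
	}

	// Print the PEM-encoded CSR, ready to embed in a CertificateSigningRequest.
	if err := pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der}); err != nil {
		panic(err)
	}
}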

Actual results:

    

Expected results:

    The status in the CR should show the failure and the error.

Additional info:

    

Description of problem:

    OAuth-Proxy breaks when it uses a Service Account as an OAuth client, as documented in https://docs.openshift.com/container-platform/4.15/authentication/using-service-accounts-as-oauth-client.html

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    100%

Steps to Reproduce:

    1. install an OCP cluster without the ImageRegistry capability
    2. deploy an oauth-proxy that uses an SA as its OAuth2 client
    3. try to login to the oauth-proxy using valid credentials
    

Actual results:

    The login fails, the oauth-server logs:

2024-02-05T13:30:56.059910994Z E0205 13:30:56.059873       1 osinserver.go:91] internal error: system:serviceaccount:my-namespace:my-sa has no tokens

Expected results:

    The login succeeds

Additional info:

    

Please review the following PR: https://github.com/openshift/kubevirt-csi-driver/pull/37

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/images/pull/154

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-34295. The following is the description of the original issue:

Description of problem:

Gathering bootstrap log bundles has been failing in CI with:

     level=error msg=Attempted to gather debug logs after installation failure: must provide bootstrap host address 

Example job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-installer-8427-ci-4.17-e2e-aws-ovn/1792972823245885440

Version-Release number of selected component (if applicable):

    

How reproducible:

    Not reproducible; this is a race condition when serializing the machine manifests to disk.

Steps to Reproduce:

    Can't reproduce; need to verify in CI.
    

Actual results:

can't pull bootstrap log bundle    

Expected results:

    grabs bootstrap log bundle

Additional info:

    

 

Description of problem:

The cloud provider feature of NTO doesn't work as expected

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Create a cloud-provider profile such as:
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: provider-aws
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=GCE Cloud provider-specific profile
      # Your tuning for GCE Cloud provider goes here.
      [sysctl]
      vm.admin_reserve_kbytes=16386
    name: provider-aws
    2.
    3.
    

Actual results:

    The value of vm.admin_reserve_kbytes is still the default value.

Expected results:

    the value of vm.admin_reserve_kbytes should change to 16386

Additional info:

    

Description of problem:

On a 4.14.5 fast-channel cluster in ARO, after the upgrade, when the customer tried to add a new node, the MachineConfig was not applied and the node never joined the pool. This happens for every node and can only be remediated by SRE, not the customer.
    

Version-Release number of selected component (if applicable):

4.14.5 -candidate
    

How reproducible:

Every time a node is added to the cluster at this version.
    

Steps to Reproduce:

    1. Install an ARO cluster
    2. Upgrade it to 4.14 along fast channel
    3. Add a node
    

Actual results:

 message: >-
        could not Create/Update MachineConfig: Operation cannot be fulfilled on
        machineconfigs.machineconfiguration.openshift.io
        "99-worker-generated-kubelet": the object has been modified; please
        apply your changes to the latest version and try again
      status: 'False'
      type: Failure
    - lastTransitionTime: '2023-11-29T17:44:37Z'

~~~
    

Expected results:

Node is created and configured correctly. 
    

Additional info:

 MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "kube-apiserver" in namespace: "openshift-kube-apiserver" for revision: 15 on node: "aro-cluster-REDACTED-master-0" didn't show up, waited: 4m45s
    

This is a clone of issue OCPBUGS-34005. The following is the description of the original issue:

Description of problem:

Intermittent error during the installation process when enabling Cluster API (CAPI) in the install-config for OCP 4.16 tech preview IPI installation on top of OSP. The error occurs during the post-machine creation hook, specifically related to Floating IP association.

Version-Release number of selected component (if applicable):

OCP: 4.16.0-0.nightly-2024-05-16-092402 TP enabled
on top of
OSP: RHOS-17.1-RHEL-9-20240123.n.1

How reproducible:

The issue occurs intermittently: sometimes the installation succeeds, and other times it fails.

Steps to Reproduce:

    1. Install OSP
    2. Initiate OCP installation with TP and CAPI enabled
    3. Observe the installation logs of the failed installation.

Actual results:

    The installation fails intermittently with the following error message:
...
2024-05-17 23:37:51.590 | level=debug msg=E0517 23:37:29.833599  266622 controller.go:329] "Reconciler error" err="failed to create cluster accessor: error creating http client and mapper for remote cluster \"openshift-cluster-api-guests/ostest-4qrz2\": error creating client for remote cluster \"openshift-cluster-api-guests/ostest-4qrz2\": error getting rest mapping: failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://api.ostest.shiftstack.com:6443/api/v1?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="openshift-cluster-api-guests/ostest-4qrz2-master-0" namespace="openshift-cluster-api-guests" name="ostest-4qrz2-master-0" reconcileID="985ba50c-2a1d-41f6-b494-f5af7dca2e7b"
2024-05-17 23:37:51.597 | level=debug msg=E0517 23:37:39.838706  266622 controller.go:329] "Reconciler error" err="failed to create cluster accessor: error creating http client and mapper for remote cluster \"openshift-cluster-api-guests/ostest-4qrz2\": error creating client for remote cluster \"openshift-cluster-api-guests/ostest-4qrz2\": error getting rest mapping: failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://api.ostest.shiftstack.com:6443/api/v1?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="openshift-cluster-api-guests/ostest-4qrz2-master-0" namespace="openshift-cluster-api-guests" name="ostest-4qrz2-master-0" reconcileID="dfe5f138-ac8e-4790-948f-72d6c8631f21"
2024-05-17 23:37:51.603 | level=debug msg=Machine ostest-4qrz2-master-0 is ready. Phase: Provisioned
2024-05-17 23:37:51.610 | level=debug msg=Machine ostest-4qrz2-master-1 is ready. Phase: Provisioned
2024-05-17 23:37:51.615 | level=debug msg=Machine ostest-4qrz2-master-2 is ready. Phase: Provisioned
2024-05-17 23:37:51.619 | level=info msg=Control-plane machines are ready
2024-05-17 23:37:51.623 | level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed during post-machine creation hook: Resource not found: [POST https://10.46.44.159:13696/v2.0/floatingips], error message: {"NeutronError": {"type": "ExternalGatewayForFloatingIPNotFound", "message": "External network 654792e9-dead-485a-beec-f3c428ef71da is not reachable from subnet d9829374-f0de-4a41-a1c0-a2acdd4841da.  Therefore, cannot associate Port 01c518a9-5d5f-42d8-a090-6e3151e8af3f with a Floating IP.", "detail": ""}}
2024-05-17 23:37:51.629 | level=info msg=Shutting down local Cluster API control plane...
2024-05-17 23:37:51.637 | level=info msg=Stopped controller: Cluster API
2024-05-17 23:37:51.643 | level=warning msg=process cluster-api-provider-openstack exited with error: signal: killed
2024-05-17 23:37:51.653 | level=info msg=Stopped controller: openstack infrastructure provider
2024-05-17 23:37:51.659 | level=info msg=Local Cluster API system has completed operations

Expected results:

The installation should complete successfully

Additional info: CAPI is enabled by adding the following to the install-config: 

featureSet: 'CustomNoUpgrade'
featureGates: ['ClusterAPIInstall=true']

Description of problem:

Running `yarn` or `yarn install` on the latest master branch of Console fails on macOS

$ cd /path/to/console/frontend
$ yarn install

https://github.com/openshift/console/pull/13706#issuecomment-2051682156

$ ./scripts/check-patternfly-modules.sh && yarn prepare-husky && yarn generate
Checking \e[0;33myarn.lock\e[0m file for PatternFly module resolutions
grep: invalid option -- P
usage: grep [-abcdDEFGHhIiJLlMmnOopqRSsUVvwXxZz] [-A num] [-B num] [-C[num]]
	[-e pattern] [-f file] [--binary-files=value] [--color=when]
	[--context[=num]] [--directories=action] [--label] [--line-buffered]
	[--null] [pattern] [file ...]

This is a clone of issue OCPBUGS-35300. The following is the description of the original issue:

Description of problem:

ARO cluster fails to install with disconnected networking.
We see master node boot-up hang on the machine-config-daemon-pull.service unit. Logs from the service indicate it cannot reach the public IP of the image registry. In ARO, image registries need to go via a proxy. Dnsmasq is used to inject proxy DNS answers, but machine-config-daemon-pull.service starts before ARO's dnsmasq.service starts.

Version-Release number of selected component (if applicable):

4.14.16

How reproducible:

Always

Steps to Reproduce:

For Fresh Install:
1. Create the required ARO vnet and subnets
2. Attach a route table to the subnets with a blackhole route 0.0.0.0/0
3. Create 4.14 ARO cluster with --apiserver-visibility=Private --ingress-visibility=Private --outbound-type=UserDefinedRouting

[OR]

Post Upgrade to 4.14:
1. Create an ARO 4.13 UDR cluster.
2. Upgrade the cluster from 4.13 to 4.14; the upgrade was successful.
3. Create a new node (scale up); we run into the same issue.

Actual results:

For Fresh Install of 4.14:
ERROR: (InternalServerError) Deployment failed.

[OR]

Post Upgrade to 4.14:
Node doesn't come into a Ready State and Machine is stuck in Provisioned status.

Expected results:

Succeeded 

Additional info:
We see in the node logs that machine-config-daemon-pull.service is unable to reach the image registry. ARO's dnsmasq was not yet started.
Previously, systemd ordering was set for ovs-configuration.service to start after (ARO's) dnsmasq.service. Perhaps that should have gone on machine-config-daemon-pull.service.
See https://issues.redhat.com/browse/OCPBUGS-25406.

This is a clone of issue OCPBUGS-35215. The following is the description of the original issue:

Description of problem:

  • We're seeing [0] in two customers' environments; one of the two confirmed this issue is reproduced both in the context of a freshly installed 4.14.26 cluster and in an upgraded cluster.
  • Looking at [1] and the changes since 4.13 in the vsphere-problem-detector, I see we introduced some additional vSphere permissions checks in the checkDataStoreWithURL() [2][3] function: it was initially suspected that it was due to [4], but this was backported to 4.14.26, where the customer confirms the issue persists.

[0]

$ omc -n openshift-cluster-storage-operator logs vsphere-problem-detector-operator-78cbc7fdbb-2g9mx | grep -i -e datastore.go -e E0508
2024-05-08T07:44:05.842165300Z I0508 07:44:05.839356       1 datastore.go:329] checking datastore ds:///vmfs/volumes/vsan:526390016b19d2b5-21ae3fd76fa61150/ for permissions
2024-05-08T07:44:05.842165300Z I0508 07:44:05.839504       1 datastore.go:125] CheckStorageClasses: thin-csi: storage policy openshift-storage-policy-tc01-rpdd7: unable to find datastore with URL ds:///vmfs/volumes/vsan:526390016b19d2b5-21ae3fd76fa61150/
2024-05-08T07:44:05.842165300Z I0508 07:44:05.839522       1 datastore.go:142] CheckStorageClasses checked 7 storage classes, 1 problems found
2024-05-08T07:44:05.848251057Z E0508 07:44:05.848212       1 operator.go:204] failed to run checks: StorageClass thin-csi: storage policy openshift-storage-policy-tc01-rpdd7: unable to find datastore with URL ds:///vmfs/volumes/vsan:526390016b19d2b5-21ae3fd76fa61150/
[...]

[1] https://github.com/openshift/vsphere-problem-detector/compare/release-4.13...release-4.14
[2] https://github.com/openshift/vsphere-problem-detector/blame/release-4.14/pkg/check/datastore.go#L328-L344
[3] https://github.com/openshift/vsphere-problem-detector/pull/119
[4] https://issues.redhat.com/browse/OCPBUGS-28879

Description of problem:


The signing test assumes a RHEL 8 base image with the corresponding selection of repositories. It should automatically do the right thing.

    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

When baselineCapabilitySet is set to None, we still see an SA named `deployer-controller` in the cluster.

Steps to Reproduce:

=================

1. Install 4.15 cluster with baselineCapabilitySet to None

2. Run command `oc get sa -A | grep deployer`

 

Actual Results:

================

[knarra@knarra openshift-tests-private]$ oc get sa -A | grep deployer
openshift-infra deployer-controller 0 63m

Expected Results:

==================

No SA related to deployer should be returned

When rolling back from 4.16 to 4.15, roll back the changes made to the cluster state to allow the 4.15 version of the managed image pull secret generation to take over again.

This is a clone of issue OCPBUGS-30955. The following is the description of the original issue:

Description of problem:

Apply an NNCP to configure DNS, then edit the NNCP to update the nameserver; /etc/resolv.conf is not updated.

Version-Release number of selected component (if applicable):

OCP version: 4.16.0-0.nightly-2024-03-13-061822
knmstate operator version: kubernetes-nmstate-operator.4.16.0-202403111814

How reproducible:

always

Steps to Reproduce:

1. Install the kubernetes-nmstate operator
2. Apply the below NNCP to configure DNS on one of the nodes
---
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: dns-staticip-4
spec:
  nodeSelector:
    kubernetes.io/hostname: qiowang-031510-k4cjs-worker-0-rw4nt
  desiredState:
    dns-resolver:
      config:
        search:
        - example.org
        server:
        - 192.168.221.146
        - 8.8.9.9
    interfaces:
    - name: dummy44
      type: dummy
      state: up
      ipv4:
        address:
        - ip: 192.0.2.251
          prefix-length: 24
        dhcp: false
        enabled: true
        auto-dns: false
% oc apply -f dns-staticip-noroute.yaml 
nodenetworkconfigurationpolicy.nmstate.io/dns-staticip-4 created
% oc get nncp
NAME             STATUS      REASON
dns-staticip-4   Available   SuccessfullyConfigured
% oc get nnce
NAME                                                 STATUS      STATUS AGE   REASON
qiowang-031510-k4cjs-worker-0-rw4nt.dns-staticip-4   Available   5s           SuccessfullyConfigured


3. check dns on the node, dns configured correctly
sh-5.1# cat /etc/resolv.conf 
# Generated by KNI resolv prepender NM dispatcher script
search qiowang-031510.qe.devcluster.openshift.com example.org
nameserver 192.168.221.146
nameserver 192.168.221.146
nameserver 8.8.9.9
# nameserver 192.168.221.1
sh-5.1# 
sh-5.1# cat /var/run/NetworkManager/resolv.conf 
# Generated by NetworkManager
search example.org
nameserver 192.168.221.146
nameserver 8.8.9.9
nameserver 192.168.221.1
sh-5.1# 
sh-5.1# nmcli | grep 'DNS configuration' -A 10
DNS configuration:
	servers: 192.168.221.146 8.8.9.9
	domains: example.org
	interface: dummy44
... ...


4. edit nncp, update nameserver, save the modification
---
spec:
  desiredState:
    dns-resolver:
      config:
        search:
        - example.org
        server:
        - 192.168.221.146
        - 8.8.8.8       <---- update from 8.8.9.9 to 8.8.8.8
    interfaces:
    - ipv4:
        address:
        - ip: 192.0.2.251
          prefix-length: 24
        auto-dns: false
        dhcp: false
        enabled: true
      name: dummy44
      state: up
      type: dummy
  nodeSelector:
    kubernetes.io/hostname: qiowang-031510-k4cjs-worker-0-rw4nt
% oc edit nncp dns-staticip-4
nodenetworkconfigurationpolicy.nmstate.io/dns-staticip-4 edited
% oc get nncp
NAME             STATUS      REASON
dns-staticip-4   Available   SuccessfullyConfigured
% oc get nnce
NAME                                                 STATUS      STATUS AGE   REASON
qiowang-031510-k4cjs-worker-0-rw4nt.dns-staticip-4   Available   8s           SuccessfullyConfigured


5. check dns on the node again

Actual results:

The DNS nameserver in /etc/resolv.conf is not updated after the NNCP is updated; /var/run/NetworkManager/resolv.conf is updated correctly:

sh-5.1# cat /etc/resolv.conf 
# Generated by KNI resolv prepender NM dispatcher script
search qiowang-031510.qe.devcluster.openshift.com example.org
nameserver 192.168.221.146
nameserver 192.168.221.146
nameserver 8.8.9.9        <---- it is not updated
# nameserver 192.168.221.1
sh-5.1# 
sh-5.1# cat /var/run/NetworkManager/resolv.conf 
# Generated by NetworkManager
search example.org
nameserver 192.168.221.146
nameserver 8.8.8.8        <---- updated correctly
nameserver 192.168.221.1
sh-5.1# 
sh-5.1# nmcli | grep 'DNS configuration' -A 10
DNS configuration:
	servers: 192.168.221.146 8.8.8.8
	domains: example.org
	interface: dummy44
... ...

Expected results:

The DNS nameserver in /etc/resolv.conf should be updated accordingly.

Additional info:

 

The customer's cloud-credential operator generates millions of the below messages per day in the GCP cluster.

They want to reduce or stop these logs, as they are consuming disk space. Also, their cloud-credential operator runs in manual mode.

time="2024-06-21T08:37:42Z" level=warning msg="read-only creds not found, using root creds client" actuator=gcp cr=openshift-cloud-credential-operator/openshift-gcp-ccm secret=openshift-cloud-credential-operator/cloud-credential-operator-gcp-ro-creds
time="2024-06-21T08:37:42Z" level=error msg="error creating GCP client" error="Secret \"gcp-credentials\" not found"
time="2024-06-21T08:37:42Z" level=error msg="error determining whether a credentials update is needed" actuator=gcp cr=openshift-cloud-credential-operator/openshift-gcp-ccm error="unable to check whether credentialsRequest needs update"
time="2024-06-21T08:37:42Z" level=error msg="error syncing credentials: error determining whether a credentials update is needed" controller=credreq cr=openshift-cloud-credential-operator/openshift-gcp-ccm secret=openshift-cloud-controller-manager/gcp-ccm-cloud-credentials
time="2024-06-21T08:37:42Z" level=error msg="errored with condition: CredentialsProvisionFailure" controller=credreq cr=openshift-cloud-credential-operator/openshift-gcp-ccm secret=openshift-cloud-controller-manager/gcp-ccm-cloud-credentials
time="2024-06-21T08:37:42Z" level=info msg="reconciling clusteroperator status"
time="2024-06-21T08:37:42Z" level=info msg="operator detects timed access token enabled cluster (STS, Workload Identity, etc.)" controller=credreq cr=openshift-cloud-credential-operator/openshift-gcp-pd-csi-driver-operator
time="2024-06-21T08:37:42Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-gcp-pd-csi-driver-operator
time="2024-06-21T08:37:42Z" level=warning msg="read-only creds not found, using root creds client" actuator=gcp cr=openshift-cloud-credential-operator/openshift-gcp-pd-csi-driver-operator secret=openshift-cloud-credential-operator/cloud-credential-operator-gcp-ro-creds

Description of problem:

The default channel of 4.15, 4.16 clusters is stable-4.14.
    

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-01-03-193825
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Install a 4.16 cluster
    2. Check default channel
# oc adm upgrade 
warning: Cannot display available updates:
  Reason: VersionNotFound
  Message: Unable to retrieve available updates: currently reconciling cluster version 4.16.0-0.nightly-2024-01-03-193825 not found in the "stable-4.14" channel

Cluster version is 4.16.0-0.nightly-2024-01-03-193825

Upgradeable=False

  Reason: MissingUpgradeableAnnotation
  Message: Cluster operator cloud-credential should not be upgraded between minor versions: Upgradeable annotation cloudcredential.openshift.io/upgradeable-to on cloudcredential.operator.openshift.io/cluster object needs updating before upgrade. See Manually Creating IAM documentation for instructions on preparing a cluster for upgrade.

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.14

    3.
    

Actual results:

Default channel is stable-4.14 in a 4.16 cluster
    

Expected results:

Default channel should be stable-4.16 in a 4.16 cluster
    

Additional info:

4.15 cluster has the issue as well.
    

Observed during testing of candidate-4.15 image as of 2024-02-08.

This is an incomplete report as I haven't verified the reproducer yet or attempted to get a must-gather. I have observed this multiple times now, so I am confident it's a thing. I can't be confident that the procedure described here reliably reproduces it, or that all the described steps are required.

I have been using MCO to apply machine config to masters. This involves a rolling reboot of all masters.

During a rolling reboot I applied an update to CPMS. I observed the following sequence of events:

  • master-1 was NotReady as it was rebooting
  • I modified CPMS
  • CPMS immediately started provisioning a new master-0
  • CPMS immediately started deleting master-1
  • CPMS started provisioning a new master-1

At this point there were only 2 nodes in the cluster:

  • old master-0
  • old master-2

and machines provisioning:

  • new master-0
  • new master-1

Description of problem:

When we create a new HostedCluster with HyperShift, the OLM pods on the management cluster cannot be created correctly.
Regardless of using multi-arch or amd64 images, the OLM pods complain:
exec /bin/opm: exec format error
All other pods are running correctly. The nodes on the management cluster are amd64.

    

Version-Release number of selected component (if applicable):

4.15.z
    

How reproducible:

Trigger rehearsal of this example PR: https://github.com/openshift/release/pull/51141

    

Steps to Reproduce:

    1. Trigger the rehearsal on the PR above: /pj-rehearse periodic-ci-opendatahub-io-ai-edge-main-test-ai-edge-periodic
    2. Locate the cluster name in the log of the Pod test-ai-edge-periodic-hypershift-hostedcluster-create-hostedcluster
    3. Log in to https://console-openshift-console.apps.hosted-mgmt.ci.devcluster.openshift.com/
    4. Enter the namespace for the ephemeral cluster created by the rehearsal
    5. Check Pod, looking for marketplace related pods, like certified-operators-catalog-58f7bd7467-4l2s2
    

Actual results:

The Pods are Running
    

Expected results:

The Pods are either CrashLoop or ErrPullImage
    

Additional info:


    

 Egress IP doesn't work in a multihomed VRF setup; packets cannot be delivered to the next hop for routing.

 

Topology description 

 

SNO with following configuration:

Interface 1 - Machine network

Interface 2 - VRF with IP and Default Network. 

Interface 3 - Interface in Main Routing table with static route

 

Configuration: 

 

 

---
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
 name: vrf-1082-with-ip-iface-left-transport
 annotations:
   description: Create VLAN, IP Interface and VRF on Transport node LEFT
spec:
 nodeSelector:
   transport/node: "left"
 desiredState:
   interfaces:
     - ipv4:
         address:
           - ip: 10.10.82.2
             prefix-length: 24
         enabled: true
       name: enp5s0f0.1082
       state: up
       type: vlan
       vlan:
         base-iface: enp5s0f0
         id: 1082
     - name: vrf1082
       state: up
       type: vrf
       vrf:
         port:
           - enp5s0f0.1082
         route-table-id: 1082
   route-rules:
     config:
       - ip-to: 172.30.0.0/16
         priority: 998
         route-table: 254
       - ip-to: 10.128.0.0/14
         priority: 998
         route-table: 254
       - ip-to: 169.254.169.0/29
         priority: 998
         route-table: 254
   routes:
     config:
       - destination: 0.0.0.0/0
         metric: 150
         next-hop-address: 10.10.82.1
         next-hop-interface: enp5s0f0.1082
         table-id: 1082

 

 

The above creates an IP interface on the node

 

 

### List of VRFs
 
[core@pool2-controller1 ~]$ ip l show vrf1082
6613: vrf1082: <NOARP,MASTER,UP,LOWER_UP> mtu 65575 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 72:75:e4:f8:b4:7b brd ff:ff:ff:ff:ff:ff


[core@pool2-controller1 ~]$ ip vrf list
Name              Table
-----------------------
vrf1082           1082







### Default routing table
[core@pool2-controller1 ~]$ ip r
default via 10.1.196.254 dev br-ex proto static metric 48
10.1.196.0/24 dev br-ex proto kernel scope link src 10.1.196.21 metric 48
10.128.0.0/14 via 10.131.0.1 dev ovn-k8s-mp0
10.131.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.131.0.2
169.254.169.0/29 dev br-ex proto kernel scope link src 169.254.169.2
169.254.169.1 dev br-ex src 10.1.196.21
169.254.169.3 via 10.131.0.1 dev ovn-k8s-mp0
172.30.0.0/16 via 169.254.169.4 dev br-ex src 169.254.169.2 mtu 1400
 
### VRF Routing table 1082
 
[core@pool2-controller1 ~]$ ip r show table 1082
default via 10.10.82.1 dev enp5s0f0.1082 proto static metric 150
10.10.82.0/24 dev enp5s0f0.1082 proto kernel scope link src 10.10.82.2 metric 400
local 10.10.82.2 dev enp5s0f0.1082 proto kernel scope host src 10.10.82.2
local 10.10.82.110 dev enp5s0f0.1082 proto kernel scope host src 10.10.82.110
broadcast 10.10.82.255 dev enp5s0f0.1082 proto kernel scope link src 10.10.82.2

 

 

Deploy Application 

 

 

---
# Create Namespace
apiVersion: v1
kind: Namespace
metadata:
 name: egressip-test
 labels:
   egress: vrf1082
---
# Create EgressIP for the namespace
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
 name: egressip-vrf-1082
spec:
 egressIPs:
 - 10.10.82.110
 namespaceSelector:
   matchLabels:
     egress: vrf1082
--- 
#Deploy APP
apiVersion: apps/v1
kind: Deployment
metadata:
 name: server
 namespace: egressip-test
spec:
 selector:
   matchLabels:
     app: server
 template:
   metadata:
     labels:
       app: server
   spec:
     containers:
       - name: server
         image: quay.io/mancubus77/podman-banner
         ports:
           - name: http
             containerPort: 8080
         volumeMounts:
           - name: npm-empty-dir
             mountPath: /.npm
     volumes:
       - name: npm-empty-dir
         emptyDir: {}

 

OCP Behaviour

 

With the configuration above, OVN-K behaves as it is supposed to:

 

# IPTables egress created
 
[core@pool2-controller1 ~]$ sudo  iptables -nvL OVN-KUBE-EGRESS-IP-MULTI-NIC -t nat
Chain OVN-KUBE-EGRESS-IP-MULTI-NIC (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 SNAT       0    --  *      enp5s0f0.1082  10.130.0.45          0.0.0.0/0            to:10.10.82.110
    0     0 SNAT       0    --  *      enp5s0f0.1082  10.131.0.15          0.0.0.0/0            to:10.10.82.110



# IP Rule created 
[core@pool2-controller1 ~]$ ip rule | grep 6000
6000: from 10.131.0.15 lookup 7614
6000: from 10.130.0.45 lookup 7614 

Expected behavior

  • Packets forward to marked Egress Nodes
  • All egress packets from pods in the namespace have configured SRC IP 
  • Packets forward according to VRF Routing table

 

Actual Behavior  

  • ✅ Packets forward to marked Egress Nodes
  • ✅ All egress packets from pods in the namespace have configured SRC IP 
  • ❌ Packets forward according to VRF Routing table

 

## Command from pod
~ $ curl 1.1.1.1



#### ---=== PACKET DUMP ===---
 
# Packet leaving Pod's OVN Port
10:48:23.615730 0343c50016330fb P   IP 10.131.0.15.57974 > 1.1.1.1.80: Flags [S], seq 2114359519, win 32640, options [mss 1360,sackOK,TS val 2792922673 ecr 0,nop,wscale 7], length 0
 
# Packet leaving OVN-Domain
10:48:23.615858 ovn-k8s-mp0 In  IP 10.131.0.15.57974 > 1.1.1.1.80: Flags [S], seq 2114359519, win 32640, options [mss 1360,sackOK,TS val 2792922673 ecr 0,nop,wscale 7], length 0



# Node tries to resolve Destination IP via ARP (on vlan Interface)
10:48:23.615903 enp5s0f0.1082 Out ARP, Request who-has 1.1.1.1 tell 10.10.82.2, length 28 

Root cause 

According to the OVN-K source code, when an EgressIP node is added, the controller searches for the routes associated with a given interface based on its ifindex. As the VRF has a different routing table ID from main (default, 254), the OVN-K controller doesn't know about any routes associated with the interface and creates the following rule per pod:

 

 

# IP RULE for 2 pods in the namespace
[core@pool2-controller1 ~]$ ip rule
6000: from 10.131.0.15 lookup 7614
6000: from 10.130.0.45 lookup 7614
 
# Routing table 7614
[core@pool2-controller1 ~]$ ip route show table 7614
default dev enp5s0f0.1082

 

 

The entry above says that all traffic on this interface is directly attached (point-to-point), therefore the Linux routing engine sends an ARP request in an attempt to find the MAC address of the destination (1.1.1.1 in this example).

 

Hack

To make it work, the VRF's default route (or an associated static route) needs to be added to the OVN-K-created routing table (7614 in this example).

 

# Add proper route 
[core@pool2-controller1 ~]$ sudo ip route add default via 10.10.82.1 dev enp5s0f0.1082 table 7614
 
# Delete default route 
[core@pool2-controller1 ~]$ sudo ip route del default dev enp5s0f0.1082 table 7614
 
# Ensure route installed 
[core@pool2-controller1 ~]$ ip route show table 7614
default via 10.10.82.1 dev enp5s0f0.1082
default dev enp5s0f0.1082 metric 10
 

New behaviour

 

# Packet leaving Pod's OVN Port
11:01:25.915965 0343c50016330fb P   IP 10.131.0.15.35796 > 1.1.1.1.80: Flags [S], seq 1447686540, win 32640, options [mss 1360,sackOK,TS val 2793704974 ecr 0,nop,wscale 7], length 0
 
# Packet leaving OVN-Domain
11:01:25.917868 ovn-k8s-mp0 In  IP 10.131.0.15.35796 > 1.1.1.1.80: Flags [S], seq 1447686540, win 32640, options [mss 1360,sackOK,TS val 2793704974 ecr 0,nop,wscale 7], length 0
 
# Packet addresses to Default GW toward to 10.10.82.1 Router
11:04:11.136937 enp5s0f0.1082 Out ifindex 6614 b4:96:91:25:93:20 > b4:96:91:1d:7f:f0, ethertype IPv4 (0x0800), length 74: 10.10.82.110.kitim > 1.1.1.1.http: Flags [S], seq 2398495404, win 32640, options [mss 1360,sackOK,TS val 2794031032 ecr 0,nop,wscale 7], length 0



# Validate MAC 
[core@pool2-controller1 ~]$ arp -an | grep f0
? (10.10.82.1) at b4:96:91:1d:7f:f0 [ether] on enp5s0f0.1082
 

 Document with more details: https://docs.google.com/document/d/1ZLIqWjs85_zBZ9J92L63zwbds66gMAnLhShtlPFH9Ro/edit

 

==== This Jira covers only haproxy component ====

Description of problem:

Pods running in the namespace openshift-vsphere-infra are overly verbose, printing as INFO messages that should be DEBUG.

This excess of verbosity has an impact on CRI-O, on the node, and also on the Logging system.

For instance, with 71 nodes, the number of log entries coming from this namespace in one month was 450,000,000, meaning 1 TB of logs written to disk on the nodes by CRI-O, read by the Red Hat log collector, and stored in the Log Store.

In addition to the impact on performance, it has a financial impact due to the storage needed.

Examples of logs that are better suited to DEBUG than INFO:
```
/// For keepalived pods, 4 messages are printed per node every 10 seconds; in this example the number of nodes is 71, which means 284 log entries every 10 seconds, i.e. about 1,704 log entries per minute per keepalived pod
$ oc logs keepalived-master.example-0 -c  keepalived-monitor |grep master.example-0|grep 2024-02-15T08:20:21 |wc -l

$ oc logs keepalived-master-example-0 -c  keepalived-monitor |grep worker-example-0|grep 2024-02-15T08:20:21 
2024-02-15T08:20:21.671390814Z time="2024-02-15T08:20:21Z" level=info msg="Searching for Node IP of worker-example-0. Using 'x.x.x.x/24' as machine network. Filtering out VIPs '[x.x.x.x x.x.x.x]'."
2024-02-15T08:20:21.671390814Z time="2024-02-15T08:20:21Z" level=info msg="For node worker-example-0 selected peer address x.x.x.x using NodeInternalIP"
2024-02-15T08:20:21.733399279Z time="2024-02-15T08:20:21Z" level=info msg="Searching for Node IP of worker-example-0. Using 'x.x.x.x' as machine network. Filtering out VIPs '[x.x.x.x x.x.x.x]'."
2024-02-15T08:20:21.733421398Z time="2024-02-15T08:20:21Z" level=info msg="For node worker-example-0 selected peer address x.x.x.x using NodeInternalIP"

/// For haproxy, 2 log lines are observed every 6 seconds for each master, which means 6 messages every 6 seconds, i.e. 60 messages/minute per pod
$ oc logs haproxy-master-0-example -c haproxy-monitor
...
2024-02-15T08:20:00.517159455Z time="2024-02-15T08:20:00Z" level=info msg="Searching for Node IP of master-example-0. Using 'x.x.x.x/24' as machine network. Filtering out VIPs '[x.x.x.x]'."
2024-02-15T08:20:00.517159455Z time="2024-02-15T08:20:00Z" level=info msg="For node master-example-0 selected peer address x.x.x.x using NodeInternalIP"

Version-Release number of selected component (if applicable):

OpenShift 4.14
VSphere IPI installation

How reproducible:

Always

Steps to Reproduce:

    1. Install OpenShift 4.14 Vsphere IPI environment
    2. Review the logs of the haproxy pods and keealived pods running in the namespace `openshift-vsphere-infra`
    

Actual results:

The haproxy-* and keepalived-* pods are overly verbose; messages printed as INFO should be DEBUG.

Some of the messages are available in the Description of the problem in the present bug.

Expected results:

Only relevant messages are printed as INFO, helping to reduce the verbosity of the pods running in the namespace `openshift-vsphere-infra`

Additional info:

    

This is a clone of issue OCPBUGS-32476. The following is the description of the original issue:

Description of problem:
After installing the Pipelines Operator on a local cluster (OpenShift Local), the Pipelines features were shown in the Console.

But when selecting the Build option "Pipelines" a warning was shown:

The pipeline template for Dockerfiles is not available at this time.

Anyway, it was possible to push the Create button and create a Deployment. But because no build process was created, it couldn't start successfully.

About 20 minutes after the Pipelines operator said it was successfully installed, the Pipeline templates in the openshift-pipelines namespace appeared, and I could create a valid Deployment.

Version-Release number of selected component (if applicable):

  1. OpenShift cluster 4.14.7
  2. Pipelines operator 1.14.3

How reproducible:
Sometimes, maybe depending on the internet connection speed.

Steps to Reproduce:

  1. Install OpenShift Local
  2. Install Pipelines Opeartor
  3. Import from Git and select Pipeline as option

Actual results:

  1. Error message was shown: The pipeline template for Dockerfiles is not available at this time.
  2. The user can create the Deployment anyway.

Expected results:

  1. The error message is fine.
  2. But as long as the error message is shown I would expect that the user can not click on Create.

Additional info:

Description of problem:

On the customer feedback modal, there are 3 links for the user to give feedback to Red Hat; the third link lacks a title.
    

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-21-155123
    

How reproducible:

Always
    

Steps to Reproduce:

    1.Login admin console. Click on "?"->"Share Feedback", check the links on the modal
    2.
    3.
    

Actual results:

1. The third link lacks a link title (the link for "Learn about opportunities to ……").
    

Expected results:

1. There is a link title "Inform the direction of Red Hat" in 4.14; it should also exist for 4.15.
    

Additional info:

screenshot for 4.14 page: https://drive.google.com/file/d/19AnPlE0h9WwvIjxV0gLuf5x27jLN7TLS/view?usp=drive_link
screenshot for 4.15 page: https://drive.google.com/file/d/19MRjzNGRWfYnK-zcoMozh7Z7eaDDG2L-/view?usp=drive_link
    

This is a clone of issue OCPBUGS-33750. The following is the description of the original issue:

Description of problem:

Sometimes a DNS name configured in EgressFirewall was not resolved
    

Version-Release number of selected component (if applicable):

Using the build from openshift/cluster-network-operator#2131

How reproducible:

    

Steps to Reproduce:

  

    % for i in {1..7};do oc create ns test$i;oc create -f  data/egressfirewall/eg_policy_wildcard.yaml -n test$i; oc create -f data/list-for-pod.json -n test$i;sleep 1;done
    namespace/test1 created
    egressfirewall.k8s.ovn.org/default created
    replicationcontroller/test-rc created
    service/test-service created
    namespace/test2 created
    egressfirewall.k8s.ovn.org/default created
    replicationcontroller/test-rc created
    service/test-service created
    namespace/test3 created
    egressfirewall.k8s.ovn.org/default created
    replicationcontroller/test-rc created
    service/test-service created
    namespace/test4 created
    egressfirewall.k8s.ovn.org/default created
    replicationcontroller/test-rc created
    service/test-service created
    namespace/test5 created
    egressfirewall.k8s.ovn.org/default created
    replicationcontroller/test-rc created
    service/test-service created
    namespace/test6 created
    egressfirewall.k8s.ovn.org/default created
    replicationcontroller/test-rc created
    service/test-service created
    namespace/test7 created
    egressfirewall.k8s.ovn.org/default created
    replicationcontroller/test-rc created
    service/test-service created
     
    % cat data/egressfirewall/eg_policy_wildcard.yaml
    kind: EgressFirewall
    apiVersion: k8s.ovn.org/v1
    metadata:
      name: default
    spec:
      egress:
      - type: Allow
        to:
          dnsName: "*.google.com" 
      - type: Deny 
        to:
          cidrSelector: 0.0.0.0/0
     
     
    Then I created namespace test8, created an egressfirewall and updated the DNS name; it worked well. Then I deleted test8.
     
    After that I created namespace test11 with the below steps, and the issue happened again.
     % oc create ns test11
    namespace/test11 created
    % oc create -f data/list-for-pod.json -n test11
    replicationcontroller/test-rc created
    service/test-service created
    % oc create -f data/egressfirewall/eg_policy_dnsname1.yaml -n test11
    egressfirewall.k8s.ovn.org/default created
    % oc get egressfirewall -n test11
    NAME      EGRESSFIREWALL STATUS
    default   EgressFirewall Rules applied
     % oc get egressfirewall -n test11 -o yaml
    apiVersion: v1
    items:
    - apiVersion: k8s.ovn.org/v1
      kind: EgressFirewall
      metadata:
        creationTimestamp: "2024-05-16T05:32:07Z"
        generation: 1
        name: default
        namespace: test11
        resourceVersion: "101288"
        uid: 18e60759-48bf-4337-ac06-2e3252f1223a
      spec:
        egress:
        - to:
            dnsName: registry-1.docker.io
          type: Allow
        - ports:
          - port: 80
            protocol: TCP
          to:
            dnsName: www.facebook.com
          type: Allow
        - to:
            cidrSelector: 0.0.0.0/0
          type: Deny
      status:
        messages:
        - 'hrw-0516i-d884f-worker-a-m7769: EgressFirewall Rules applied'
        - 'hrw-0516i-d884f-master-0.us-central1-b.c.openshift-qe.internal: EgressFirewall
          Rules applied'
        - 'hrw-0516i-d884f-worker-b-q4fsm: EgressFirewall Rules applied'
        - 'hrw-0516i-d884f-master-1.us-central1-c.c.openshift-qe.internal: EgressFirewall
          Rules applied'
        - 'hrw-0516i-d884f-master-2.us-central1-f.c.openshift-qe.internal: EgressFirewall
          Rules applied'
        - 'hrw-0516i-d884f-worker-c-4kvgr: EgressFirewall Rules applied'
        status: EgressFirewall Rules applied
    kind: List
    metadata:
      resourceVersion: ""
     % oc get pods -n test11                  
    NAME            READY   STATUS    RESTARTS   AGE
    test-rc-ffg4g   1/1     Running   0          61s
    test-rc-lw4r8   1/1     Running   0          61s
     % oc rsh -n test11 test-rc-ffg4g
    ~ $ curl registry-1.docker.io -I
     
    ^C
    ~ $ curl www.facebook.com
    ^C
    ~ $ 
    ~ $ curl www.facebook.com --connect-timeout 5
    curl: (28) Failed to connect to www.facebook.com port 80 after 2706 ms: Operation timed out
    ~ $ curl registry-1.docker.io --connect-timeout 5
    curl: (28) Failed to connect to registry-1.docker.io port 80 after 4430 ms: Operation timed out
    ~ $ ^C
    ~ $ exit
    command terminated with exit code 130
    % oc get dnsnameresolver     -n openshift-ovn-kubernetes         
    NAME             AGE
    dns-67b687cfb5   7m47s
    dns-696b6747d9   2m12s
    dns-b6c74f6f4    2m12s
     
     % oc get dnsnameresolver  dns-696b6747d9  -n openshift-ovn-kubernetes  -o yaml
    apiVersion: network.openshift.io/v1alpha1
    kind: DNSNameResolver
    metadata:
      creationTimestamp: "2024-05-16T05:32:07Z"
      generation: 1
      name: dns-696b6747d9
      namespace: openshift-ovn-kubernetes
      resourceVersion: "101283"
      uid: a8546ad8-b16d-4d81-a943-46bdd0d82aa5
    spec:
      name: www.facebook.com.
 % oc get dnsnameresolver  dns-696b6747d9  -n openshift-ovn-kubernetes  -o yaml
    apiVersion: network.openshift.io/v1alpha1
    kind: DNSNameResolver
    metadata:
      creationTimestamp: "2024-05-16T05:32:07Z"
      generation: 1
      name: dns-696b6747d9
      namespace: openshift-ovn-kubernetes
      resourceVersion: "101283"
      uid: a8546ad8-b16d-4d81-a943-46bdd0d82aa5
    spec:
      name: www.facebook.com.
     
     % oc get dnsnameresolver  dns-696b6747d9  -n openshift-ovn-kubernetes  -o yaml
    apiVersion: network.openshift.io/v1alpha1
    kind: DNSNameResolver
    metadata:
      creationTimestamp: "2024-05-16T05:32:07Z"
      generation: 1
      name: dns-696b6747d9
      namespace: openshift-ovn-kubernetes
      resourceVersion: "101283"
      uid: a8546ad8-b16d-4d81-a943-46bdd0d82aa5
    spec:
      name: www.facebook.com.


    

Actual results:

DNS names like www.facebook.com configured in the EgressFirewall didn't get resolved to an IP
    

Expected results:

EgressFirewall works as expected.
 
   

Additional info:


    

Description of problem:

Bare Metal UPI cluster

Nodes lose communication with other nodes, and this affects the pod communication on these nodes as well. This issue can be fixed with an OVN rebuild of the db on the nodes that are hitting the issue, but eventually the nodes will degrade again and lose communication again. Note that despite an OVN rebuild fixing the issue temporarily, Host Networking is set to True, so it's using the kernel routing table.

**Update: observed on vSphere with the routingViaHost: false, ipForwarding: global configuration as well.

Version-Release number of selected component (if applicable):

 4.14.7, 4.14.30

How reproducible:

Can't reproduce locally but reproducible and repeatedly occurring in customer environment 

Steps to Reproduce:

Identify a host node whose pods can't be reached from other hosts in default namespaces (tested via openshift-dns). Observe that curls to that peer pod consistently time out. Tcpdumps to the target pod show that packets are arriving and are acknowledged, but never route back to the client pod successfully (SYN/ACK seen at the pod network layer, not at geneve, so dropped before hitting the geneve tunnel).

Actual results:

Nodes will repeatedly degrade and lose communication despite fixing the issue with an OVN db rebuild (the db rebuild only provides hours/days of respite, not a permanent resolution).

Expected results:

Nodes should not be losing communication and even if they did it should not happen repeatedly     

Additional info:

What's been tried so far
========================

- Multiple OVN rebuilds on different nodes (works but node will eventually hit issue again)

- Flushing the conntrack (Doesn't work)

- Restarting nodes (doesn't work)

Data gathered
=============

- Tcpdump from all interfaces for dns-pods going to port 7777 (to segregate traffic)

- ovnkube-trace

- SOSreports of two nodes having communication issues before an OVN rebuild

- SOSreports of two nodes having communication issues after an OVN rebuild 

- OVS trace dumps of br-int and br-ex 


====

More data in nested comments below. 

Please review the following PR: https://github.com/openshift/cluster-api-operator/pull/37

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-36462. The following is the description of the original issue:

Description of problem

Similar to OCPBUGS-20061, but for a different situation:

$ w3m -dump -cols 200 'https://search.dptools.openshift.org/?maxAge=48h&name=pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-etcd-scaling&type=junit&search=clusteroperator/control-plane-machine-set+should+not+change+condition/Available' | grep 'failures match' | sort
pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-etcd-scaling (all) - 15 runs, 60% failed, 33% of failures match = 20% impact

In that test, since ETCD-329, the test suite deletes a control-plane Machine and waits for the ControlPlaneMachineSet controller to scale in a replacement. But in runs like this, the outgoing Node goes Ready=Unknown for not-yet-diagnosed reasons, and that somehow misses cpmso#294's inertia (maybe the running guard should be dropped?), and the ClusterOperator goes Available=False complaining about Missing 1 available replica(s).

It's not clear from the message which replica it's worried about (that would be helpful information to include in the message), but I suspect it's the Machine/Node that's in the deletion process. But regardless of the message, this does not seem like a situation worth a cluster-admin-midnight-page Available=False alarm.

Version-Release number of selected component

Seen in dev-branch CI. I haven't gone back to check older 4.y.

How reproducible

CI Search shows 20% impact, see my earlier query in this message.

Steps to Reproduce

Run a bunch of pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-etcd-scaling and check CI Search results.

Actual results

20% impact

Expected results

No hits.

This is a clone of issue OCPBUGS-42277. The following is the description of the original issue:

This is a clone of issue OCPBUGS-42231. The following is the description of the original issue:

Description of problem:

    OCP Conformance MonitorTests can fail depending on the order in which the CSI driver pods and ClusterRole are applied. The ServiceAccount, ClusterRole, and ClusterRoleBinding should likely be applied before the deployment/pods.

Version-Release number of selected component (if applicable):

    4.18.0

How reproducible:

    60%

Steps to Reproduce:

    1. Create IPI cluster on IBM Cloud
    2. Run OCP Conformance w/ MonitorTests
    

Actual results:

    : [sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel]

{  fail [github.com/openshift/origin/test/extended/authorization/scc.go:76]: 1 pods failed before test on SCC errors
Error creating: pods "ibm-vpc-block-csi-node-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[5]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[6]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[7]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[9]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, provider restricted-v2: .containers[0].runAsUser: Invalid value: 0: must be in the ranges: [1000180000, 1000189999], provider restricted-v2: .containers[1].runAsUser: Invalid value: 0: must be in the ranges: [1000180000, 1000189999], provider restricted-v2: .containers[1].privileged: Invalid value: true: Privileged containers are not allowed, provider restricted-v2: .containers[2].runAsUser: Invalid value: 0: must be in the ranges: [1000180000, 1000189999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for DaemonSet.apps/v1/ibm-vpc-block-csi-node -n openshift-cluster-csi-drivers happened 7 times

Ginkgo exit error 1: exit with code 1}

Expected results:

    No pod creation failures from pods being matched to the wrong SCC because the ClusterRole/ClusterRoleBinding, etc. had not yet been applied.

Additional info:

Sorry, I did not see an IBM Cloud Storage component listed in the targeted Component field for this bug, so I selected the generic Storage component. Please forward as necessary/possible.


Items to consider:

ClusterRole:  https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/rbac/privileged_role.yaml

ClusterRoleBinding:  https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/rbac/node_privileged_binding.yaml

The ibm-vpc-block-csi-node-* pods eventually reach Running using the privileged SCC. I do not know whether it is possible to stage the resources so that the RBAC objects are created first within the CSI Driver Operator
https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/9288e5078f2fe3ce2e69a4be3d94622c164c3dbd/pkg/operator/starter.go#L98-L99
prior to the CSI driver daemonset (`node.yaml`); perhaps the order within that list matters.
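
For context, a hedged sketch of the kind of ClusterRole that must exist before the node DaemonSet pods are admitted (the actual assets are in the links above; the name here is a placeholder):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ibm-vpc-block-csi-node-privileged   # placeholder name
rules:
- apiGroups:
  - security.openshift.io
  resources:
  - securitycontextconstraints
  resourceNames:
  - privileged
  verbs:
  - use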

Example of failure in CI:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_installer/8235/pull-ci-openshift-installer-master-e2e-ibmcloud-ovn/1836521032031145984

 

This is a clone of issue OCPBUGS-34901. The following is the description of the original issue:

Description of problem:

My CSV recently added a v1beta2 API version in addition to the existing v1beta1 version. When I create a v1beta2 CR and view it in the console, I see v1beta1 API fields and not the expected v1beta2 fields.

 
Version-Release number of selected component (if applicable):

4.15.14 (could affect other versions)

How reproducible:

Install 3.0.0 development version of Cryostat Operator

Steps to Reproduce:

    1. operator-sdk run bundle quay.io/ebaron/cryostat-operator-bundle:ocpbugs-34901
    2. cat << 'EOF' | oc create -f -
    apiVersion: operator.cryostat.io/v1beta2
    kind: Cryostat
    metadata:
      name: cryostat-sample
    spec:
      enableCertManager: false
    EOF
    3. Navigate to https://<openshift console>/k8s/ns/openshift-operators/clusterserviceversions/cryostat-operator.v3.0.0-dev/operator.cryostat.io~v1beta2~Cryostat/cryostat-sample
    4. Observe v1beta1 properties are rendered including "Minimal Deployment"
    5. Attempt to toggle "Minimal Deployment", observe that this fails.

Actual results:

v1beta1 properties are rendered in the details page instead of v1beta2 properties

Expected results:

v1beta2 properties are rendered in the details page

Additional info:

    

This is a clone of issue OCPBUGS-36904. The following is the description of the original issue:

Description of problem:

Subnets created by the installer are tagged with kubernetes.io/cluster/<infra_id> set to 'shared' instead of 'owned'.

Version-Release number of selected component (if applicable):

4.16.z

How reproducible:

Any time a 4.16 cluster is installed

Steps to Reproduce:

    1. Install a fresh 4.16 cluster without providing an existing VPC.

Actual results:

Subnets are tagged with kubernetes.io/cluster/<infra_id>: shared    

Expected results:

Subnets created by the installer are tagged with kubernetes.io/cluster/<infra_id>: owned

Additional info:

Slack discussion here - https://redhat-internal.slack.com/archives/C68TNFWA2/p1720728359424529

This is a clone of issue OCPBUGS-39225. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38474. The following is the description of the original issue:

Description of problem:

    AdditionalTrustedCA is not wired correctly, so the configmap is not found by its operator. This feature is meant to be exposed by XCMSTRAT-590, but at the moment it appears to be broken.

Version-Release number of selected component (if applicable):

    4.16.5

How reproducible:

    Always

Steps to Reproduce:

1. Create a configmap containing a registry and PEM cert, like https://github.com/openshift/openshift-docs/blob/ef75d891786604e78dcc3bcb98ac6f1b3a75dad1/modules/images-configuration-cas.adoc#L17  
2. Refer to it in .spec.configuration.image.additionalTrustedCA.name     
3. image-registry-config-operator is not able to find the cm and the CO is degraded
    

Actual results:

   CO is degraded

Expected results:

    certs are used.

Additional info:

I think we may be missing a copy of the configmap from the cluster namespace to the target namespace. The copy should also be deleted when the original configmap is deleted.
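
For reference, a hedged sketch of the ConfigMap shape that .spec.configuration.image.additionalTrustedCA points at (the registry hostname and certificate content are placeholders; the name matches the commands below):

apiVersion: v1
kind: ConfigMap
metadata:
  name: registry-additional-ca-q9f6x5i4
  namespace: <hosted-cluster-namespace>   # namespace where the HostedCluster lives
data:
  registry.example.com: |
    -----BEGIN CERTIFICATE-----
    ...PEM certificate for the registry...
    -----END CERTIFICATE-----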

 

 % oc get hc -n ocm-adecorte-2d525fsstsvtbv1h8qss14pkv171qhdd -o jsonpath="{.items[0].spec.configuration.image.additionalTrustedCA}" | jq
{
  "name": "registry-additional-ca-q9f6x5i4"
}

 

 

% oc get cm -n ocm-adecorte-2d525fsstsvtbv1h8qss14pkv171qhdd registry-additional-ca-q9f6x5i4
NAME                              DATA   AGE
registry-additional-ca-q9f6x5i4   1      16m

 

 

logs of cluster-image-registry operator

 

E0814 13:22:32.586416       1 imageregistrycertificates.go:141] ImageRegistryCertificatesController: unable to sync: failed to update object *v1.ConfigMap, Namespace=openshift-image-registry, Name=image-registry-certificates: image-registry-certificates: configmap "registry-additional-ca-q9f6x5i4" not found, requeuing

 

 

CO is degraded

 

% oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console                                    4.16.5    True        False         False      3h58m
csi-snapshot-controller                    4.16.5    True        False         False      4h11m
dns                                        4.16.5    True        False         False      3h58m
image-registry                             4.16.5    True        False         True       3h58m   ImageRegistryCertificatesControllerDegraded: failed to update object *v1.ConfigMap, Namespace=openshift-image-registry, Name=image-registry-certificates: image-registry-certificates: configmap "registry-additional-ca-q9f6x5i4" not found
ingress                                    4.16.5    True        False         False      3h59m
insights                                   4.16.5    True        False         False      4h
kube-apiserver                             4.16.5    True        False         False      4h11m
kube-controller-manager                    4.16.5    True        False         False      4h11m
kube-scheduler                             4.16.5    True        False         False      4h11m
kube-storage-version-migrator              4.16.5    True        False         False      166m
monitoring                                 4.16.5    True        False         False      3h55m

 

 

In TRT-1476, we created a VM that served as an endpoint where we can test connectivity in gcp.
We want one for Azure.

In TRT-1477, we created some code in origin to send HTTP GETs to that endpoint as a test to ensure connectivity remains working. Do this also for Azure.

Please review the following PR: https://github.com/openshift/cluster-update-keys/pull/53

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33397. The following is the description of the original issue:

Description of problem:

A node was cordoned manually. After several days, the machine-config-controller uncordoned the same node after rendering a new machine-config.

Version-Release number of selected component (if applicable):

    4.13

Actual results:

The MCO rolled out the update and the node was uncordoned by the MCO.

Expected results:

 The MCO should treat an unschedulable node as not ready for performing the update. It may also halt the update on other nodes in the pool, based on what maxUnavailable is set to for that pool.

Additional info:

    

Description of the problem:

In a deployment with bonding and vlan, during the booting of the provision image the system loses connectivity (even ping) as soon as two new network interfaces appear on the node, created by the ironic-python-agent; those interfaces are the slave interfaces with the vlan added.
Version-Release number of selected component (if applicable):

4.12.48
How reproducible:
Always
Steps to Reproduce:
1. Deploy a cluster with bonding + vlan
2.
3.
Actual results:
After investigation by the OpenStack team, it looks like having the option "enable_vlan_interfaces = all" enabled in "/etc/ironic-python-agent.conf" is what triggers the creation of the vlan interfaces. These new interfaces are what cuts the communication.
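
For reference, a hedged sketch of the configuration fragment mentioned above (the section placement is an assumption):

# /etc/ironic-python-agent.conf
[DEFAULT]
enable_vlan_interfaces = all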

Expected results:

No extra vlan interfaces are created, communication is not lost, and the installation succeeds.

Additional info:

How customer crafted the test:

As soon as the node starts pinging, the customer connects with ssh and sets a password for the core user.
Once communication is lost (~1 min after pinging started), they connect through the KVM interface with the core password.
If the ironic-python-agent is disabled and the created vlan interfaces are manually removed, communication is restored. Installation works if LLDP is turned off at the switch.

This issue was supposed to be fixed in these versions, according to the original JIRA linked here.

The team lead from that JIRA suggested the issue has to be fixed by re-vendoring ICC in the assisted-service, hence the creation of this JIRA.

Description of problem:

The installation of OpenShift Container Platform 4.13.4 fails fairly frequently compared to previous versions when installing with a proxy configured.

The error reported by the MachineConfigPool is as shown below.

  - lastTransitionTime: "2023-07-04T10:36:44Z"
    message: 'Node master0.example.com is reporting: "machineconfig.machineconfiguration.openshift.io
      \"rendered-master-1e13d7d4ca10669d3d5a6a2bd532873a\" not found", Node master1.example.com
      is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-master-1e13d7d4ca10669d3d5a6a2bd532873a\"
      not found", Node master2.example.com is reporting:
      "machineconfig.machineconfiguration.openshift.io \"rendered-master-1e13d7d4ca10669d3d5a6a2bd532873a\"
      not found"'

According to https://docs.google.com/document/d/1fgP6Kv1D-75e1Ot0Kg-W2qPyxWDp2_CALltlBLuseec/edit#heading=h.ny6l9ud82fxx this seems to be a known condition, but it's not clear how to prevent it from happening and therefore ensure installations work as expected.

The major differences found between /etc/mcs-machine-config-content.json on the OpenShift Container Platform 4 control-plane node and the rendered-master-${hash} MachineConfig are within the following files.

 - /etc/mco/proxy.env
 - /etc/kubernetes/kubelet-ca.crt
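
For orientation, a hedged illustration of what /etc/mco/proxy.env typically contains (the proxy URLs and NO_PROXY entries here are placeholders, not values from this cluster):

HTTP_PROXY=http://proxy.example.com:3128
HTTPS_PROXY=http://proxy.example.com:3128
NO_PROXY=.cluster.local,.svc,10.128.0.0/14,127.0.0.1,localhost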

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.13.4

How reproducible:

Random

Steps to Reproduce:

1. Install OpenShift Container Platform 4.13.4 on AWS with platform:none, proxy defined and both machineCIDR and machineNetwork.cidr set.

Actual results:

The installation is stuck and will eventually fail, as the master MachineConfigPool fails to roll out the required MachineConfig:

  - lastTransitionTime: "2023-07-04T10:36:44Z"
    message: 'Node master0.example.com is reporting: "machineconfig.machineconfiguration.openshift.io
      \"rendered-master-1e13d7d4ca10669d3d5a6a2bd532873a\" not found", Node master1.example.com
      is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-master-1e13d7d4ca10669d3d5a6a2bd532873a\"
      not found", Node master2.example.com is reporting:
      "machineconfig.machineconfiguration.openshift.io \"rendered-master-1e13d7d4ca10669d3d5a6a2bd532873a\"
      not found"'

Expected results:

The installation works, or else meaningful error messaging is provided.

Additional info:

https://docs.google.com/document/d/1fgP6Kv1D-75e1Ot0Kg-W2qPyxWDp2_CALltlBLuseec/edit#heading=h.ny6l9ud82fxx was checked and Red Hat Engineering was then consulted, as it was not clear how to proceed.

Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/774

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/k8s-prometheus-adapter/pull/100

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-api-provider-openstack/pull/300

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-35041. The following is the description of the original issue:

Description of problem:

For STS, an AWS creds file is injected with credential_process for the installer to use. That usually points to a command that loads a Secret containing the creds necessary to assume the role.

For CAPI, the installer runs in an ephemeral envtest cluster. So when it runs that credential_process (via the black box of passing the creds file to the AWS SDK), the command ends up requesting that Secret from the envtest kube API server, where it doesn't exist.

The Installer should avoid overriding KUBECONFIG whenever possible.
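
For illustration, a hedged sketch of the injected AWS creds file shape (the profile name and the helper command are placeholders, not the installer's actual wiring):

[default]
credential_process = /usr/local/bin/service-provider-token --namespace <ns> --secret <name>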

Version-Release number of selected component (if applicable):

    4.16+

How reproducible:

    always

Steps to Reproduce:

    1. Deploy cluster with STS credentials
    2.
    3.
    

Actual results:

    Install fails with:

time="2024-06-02T23:50:17Z" level=debug msg="failed to get the service provider secret: secrets \"shawnnightly-aws-service-provider-secret\" not foundfailed to get the service provider secret: oc get events -n uhc-staging-2blaesc1478urglmcfk3r79a17n82lm3E0602 23:50:17.324137     151 awscluster_controller.go:327] \"failed to reconcile network\" err=<"
time="2024-06-02T23:50:17Z" level=debug msg="\tfailed to create new managed VPC: failed to create vpc: ProcessProviderExecutionError: error in credential_process"
time="2024-06-02T23:50:17Z" level=debug msg="\tcaused by: exit status 1"
time="2024-06-02T23:50:17Z" level=debug msg=" > controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/shawnnightly-c8zdl\" namespace=\"openshift-cluster-api-guests\" name=\"shawnnightly-c8zdl\" reconcileID=\"e7524343-f598-4b71-a788-ad6975e92be7\" cluster=\"openshift-cluster-api-guests/shawnnightly-c8zdl\""
time="2024-06-02T23:50:17Z" level=debug msg="I0602 23:50:17.324204     151 recorder.go:104] \"Failed to create new managed VPC: ProcessProviderExecutionError: error in credential_process\\ncaused by: exit status 1\" logger=\"events\" type=\"Warning\" object={\"kind\":\"AWSCluster\",\"namespace\":\"openshift-cluster-api-guests\",\"name\":\"shawnnightly-c8zdl\",\"uid\":\"f20bd7ae-a8d2-4b16-91c2-c9525256bb46\",\"apiVersion\":\"infrastructure.cluster.x-k8s.io/v1beta2\",\"resourceVersion\":\"311\"} reason=\"FailedCreateVPC\""

Expected results:

    No failures

Additional info:

    

Since HyperShift / Hosted Control Planes have adopted include.release.openshift.io/ibm-cloud-managed to tailor the resources of clusters running in the ROKS IBM environment, adding include.release.openshift.io/hypershift will allow HyperShift to express different profile choices than ROKS.
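
For illustration, a hedged sketch of how both profile annotations appear on a CVO-managed manifest (the enclosing resource is a placeholder):

metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/hypershift: "true"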

Description of problem:

The control-plane-machine-set operator pod is stuck in a CrashLoopBackOff state with "panic: runtime error: invalid memory address or nil pointer dereference" while extracting the failureDomain from the controlplanemachineset. Below is the error trace for reference.
~~~
2024-04-04T09:32:23.594257072Z I0404 09:32:23.594176       1 controller.go:146]  "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="c282f3e3-9f9d-40df-a24e-417ba2ea4106"
2024-04-04T09:32:23.594257072Z I0404 09:32:23.594221       1 controller.go:125]  "msg"="Reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="7f03c05f-2717-49e0-95f8-3e8b2ce2fc55"
2024-04-04T09:32:23.594274974Z I0404 09:32:23.594257       1 controller.go:146]  "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="7f03c05f-2717-49e0-95f8-3e8b2ce2fc55"
2024-04-04T09:32:23.597509741Z I0404 09:32:23.597426       1 watch_filters.go:179] reconcile triggered by infrastructure change
2024-04-04T09:32:23.606311553Z I0404 09:32:23.606243       1 controller.go:220]  "msg"="Starting workers" "controller"="controlplanemachineset" "worker count"=1
2024-04-04T09:32:23.606360950Z I0404 09:32:23.606340       1 controller.go:169]  "msg"="Reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400"
2024-04-04T09:32:23.609322467Z I0404 09:32:23.609217       1 panic.go:884]  "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400"
2024-04-04T09:32:23.609322467Z I0404 09:32:23.609271       1 controller.go:115]  "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "controller"="controlplanemachineset" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400"
2024-04-04T09:32:23.612540681Z panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2024-04-04T09:32:23.612540681Z     panic: runtime error: invalid memory address or nil pointer dereference
2024-04-04T09:32:23.612540681Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1a5911c]
2024-04-04T09:32:23.612540681Z 
2024-04-04T09:32:23.612540681Z goroutine 255 [running]:
2024-04-04T09:32:23.612540681Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
2024-04-04T09:32:23.612571624Z     /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa
2024-04-04T09:32:23.612571624Z panic({0x1c8ac60, 0x31c6ea0})
2024-04-04T09:32:23.612571624Z     /usr/lib/golang/src/runtime/panic.go:884 +0x213
2024-04-04T09:32:23.612571624Z github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig.VSphereProviderConfig.ExtractFailureDomain(...)
2024-04-04T09:32:23.612571624Z     /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig/vsphere.go:120
2024-04-04T09:32:23.612571624Z github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig.providerConfig.ExtractFailureDomain({{0x1f2a71a, 0x7}, {{{{...}, {...}}, {{...}, {...}, {...}, {...}, {...}, {...}, ...}, ...}}, ...})
2024-04-04T09:32:23.612588145Z     /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig/providerconfig.go:212 +0x23c
~~~
    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

The control-plane-machine-set operator is stuck in a CrashLoopBackOff state during the cluster upgrade.
    

Expected results:

control-plane-machine-set operator should be upgraded without any errors.
    

Additional info:

This is happening during the cluster upgrade of a vSphere IPI cluster from OCP version 4.14.z to 4.15.6 and may impact other z-stream releases.
From the official docs[1], providing the failure domain for the vSphere platform is a Tech Preview feature.
[1] https://docs.openshift.com/container-platform/4.15/machine_management/control_plane_machine_management/cpmso-configuration.html#cpmso-yaml-failure-domain-vsphere_cpmso-configuration
    

Description of problem:


$ oc get co machine-config
NAME             VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.16.0-0.ci-2024-03-01-110656   False       False         True       2m56s   Cluster not available for [{operator 4.16.0-0.ci-2024-03-01-110656}]: MachineConfigNode.machineconfiguration.openshift.io "ip-10-0-24-212.us-east-2.compute.internal" is invalid: [metadata.ownerReferences.apiVersion: Invalid value: "": version must not be empty, metadata.ownerReferences.kind: Invalid value: "": kind must not be empty]


MCO operator is failing with this error:


218", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MachineConfigNodeFailed' Cluster not available for [{operator 4.16.0-0.ci-2024-03-01-110656}]: MachineConfigNode.machineconfiguration.openshift.io "ip-10-0-24-212.us-east-2.compute.internal" is invalid: [metadata.ownerReferences.apiVersion: Invalid value: "": version must not be empty, metadata.ownerReferences.kind: Invalid value: "": kind must not be empty]
I0301 17:19:12.823035       1 event.go:364] Event(v1.ObjectReference{Kind:"", Namespace:"openshift-machine-config-operator", Name:"machine-config", UID:"c1bad7e7-26ff-47fb-8a2d-a0c03c04d218", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'OperatorDegraded: MachineConfigNodeFailed' Failed to resync 4.16.0-0.ci-2024-03-01-110656 because: MachineConfigNode.machineconfiguration.openshift.io "ip-10-0-49-207.us-east-2.compute.internal" is invalid: [metadata.ownerReferences.apiVersion: Invalid value: "": version must not be empty, metadata.ownerReferences.kind: Invalid value: "": kind must not be empty]
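
For reference, a hedged illustration of the ownerReference fields the validation expects to be populated on a MachineConfigNode (the values are placeholders):

metadata:
  ownerReferences:
  - apiVersion: v1           # must not be empty
    kind: Node               # must not be empty
    name: ip-10-0-24-212.us-east-2.compute.internal
    uid: <node-uid>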


    

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.ci-2024-03-01-110656   True        False         17m     Error while reconciling 4.16.0-0.ci-2024-03-01-110656: the cluster operator machine-config is not available

    

How reproducible:

Always
    

Steps to Reproduce:

    1. Enable techpreview
 oc patch featuregate cluster --type=merge -p '{"spec":{"featureSet": "TechPreviewNoUpgrade"}}'


    

Actual results:


machine-config CO is degraded

    

Expected results:


machine-config CO should not be degraded, no error should happen in MCO operator pod

    

Additional info:


    

Description of problem:

For certain operations the CEO will check the etcd member health by creating a client directly and waiting for its status report.

When any member is not reachable for a longer period, we found the CEO constantly getting stuck / deadlocked and unable to move certain controllers forward.

In OCPBUGS-12475 we introduced a health-check that would dump stack and automatically restart with the operator deployment health probe.

In a more recent upgrade run we found the culprit [1] to be a missing context during etcd client initialization, which makes it hang indefinitely:


W0229 02:55:46.820529       1 aliveness_checker.go:33] Controller [EtcdEndpointsController] didn't sync for a long time, declaring unhealthy and dumping stack

goroutine 1426 [select]:
github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth({0x3272768?, 0xc002090310}, {0xc0000a6880, 0x3, 0xc001c98360?})
	github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:64 +0x330
github.com/openshift/cluster-etcd-operator/pkg/etcdcli.(*etcdClientGetter).MemberHealth(0xc000c24540, {0x3272688, 0x4c20080})
	github.com/openshift/cluster-etcd-operator/pkg/etcdcli/etcdcli.go:412 +0x18c
github.com/openshift/cluster-etcd-operator/pkg/operator/ceohelpers.CheckSafeToScaleCluster({0x324ccd0?, 0xc000b6d5f0?}, {0x3284250?, 0xc0008dda10?}, {0x324e6c0, 0xc000ed4fb0}, {0x3250560, 0xc000ed4fd0}, {0x32908d0, 0xc000c24540})
	github.com/openshift/cluster-etcd-operator/pkg/operator/ceohelpers/bootstrap.go:149 +0x28e
github.com/openshift/cluster-etcd-operator/pkg/operator/ceohelpers.(*QuorumCheck).IsSafeToUpdateRevision(0x2893020?)
	github.com/openshift/cluster-etcd-operator/pkg/operator/ceohelpers/qourum_check.go:37 +0x46
github.com/openshift/cluster-etcd-operator/pkg/operator/etcdendpointscontroller.(*EtcdEndpointsController).syncConfigMap(0xc0002e28c0, {0x32726f8, 0xc0008e60a0}, {0x32801b0, 0xc001198540})
	github.com/openshift/cluster-etcd-operator/pkg/operator/etcdendpointscontroller/etcdendpointscontroller.go:146 +0x5d8
github.com/openshift/cluster-etcd-operator/pkg/operator/etcdendpointscontroller.(*EtcdEndpointsController).sync(0xc0002e28c0, {0x32726f8, 0xc0008e60a0}, {0x325d240, 0xc003569e90})
	github.com/openshift/cluster-etcd-operator/pkg/operator/etcdendpointscontroller/etcdendpointscontroller.go:66 +0x71
github.com/openshift/cluster-etcd-operator/pkg/operator/health.(*CheckingSyncWrapper).Sync(0xc000f21bc0, {0x32726f8?, 0xc0008e60a0?}, {0x325d240?, 0xc003569e90?})
	github.com/openshift/cluster-etcd-operator/pkg/operator/health/checking_sync_wrapper.go:22 +0x43
github.com/openshift/library-go/pkg/controller/factory.(*baseController).reconcile(0xc00113cd80, {0x32726f8, 0xc0008e60a0}, {0x325d240?, 0xc003569e90?})
	github.com/openshift/library-go@v0.0.0-20240124134907-4dfbf6bc7b11/pkg/controller/factory/base_controller.go:201 +0x43



goroutine 11640 [select]:
google.golang.org/grpc.(*ClientConn).WaitForStateChange(0xc003707000, {0x3272768, 0xc002091260}, 0x3)
	google.golang.org/grpc@v1.58.3/clientconn.go:724 +0xb1
google.golang.org/grpc.DialContext({0x3272768, 0xc002091260}, {0xc003753740, 0x3c}, {0xc00355a880, 0x7, 0xc0023aa360?})
	google.golang.org/grpc@v1.58.3/clientconn.go:295 +0x128e
go.etcd.io/etcd/client/v3.(*Client).dial(0xc000895180, {0x32754a0?, 0xc001785670?}, {0xc0017856b0?, 0x28f6a80?, 0x28?})
	go.etcd.io/etcd/client/v3@v3.5.10/client.go:303 +0x407
go.etcd.io/etcd/client/v3.(*Client).dialWithBalancer(0xc000895180, {0x0, 0x0, 0x0})
	go.etcd.io/etcd/client/v3@v3.5.10/client.go:281 +0x1a9
go.etcd.io/etcd/client/v3.newClient(0xc002484e70?)
	go.etcd.io/etcd/client/v3@v3.5.10/client.go:414 +0x91c
go.etcd.io/etcd/client/v3.New(...)
	go.etcd.io/etcd/client/v3@v3.5.10/client.go:81
github.com/openshift/cluster-etcd-operator/pkg/etcdcli.newEtcdClientWithClientOpts({0xc0017853d0, 0x1, 0x1}, 0x0, {0x0, 0x0, 0x0?})
	github.com/openshift/cluster-etcd-operator/pkg/etcdcli/etcdcli.go:127 +0x77d
github.com/openshift/cluster-etcd-operator/pkg/etcdcli.checkSingleMemberHealth({0x32726f8, 0xc00318ac30}, 0xc002090460)
	github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:103 +0xc5
github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth.func1()
	github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:58 +0x6c
created by github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth in goroutine 1426
	github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:54 +0x2a5

  

[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-upgrade-from-stable-4.15-e2e-metal-ipi-upgrade-ovn-ipv6/1762965898773139456/

Version-Release number of selected component (if applicable):

any currently supported OCP version    

How reproducible:

Always    

Steps to Reproduce:

    1. create a healthy cluster
    2. make sure one etcd member never responds, but the node is still there (ie kubelet shutdown, blocking the etcd ports on a firewall)
    3. wait for the CEO to restart pod on failing health probe and dump its stack (similar to the one above)
    

Actual results:

CEO controllers are getting deadlocked, but the operator will restart eventually after some time due to health probes failing    

Expected results:

CEO should mark the member as unhealthy and continue its service without getting deadlocked and should not restart its pod by failing the health probe

Additional info:

clientv3.New doesn't take any timeout context, but tries to establish a connection forever

https://github.com/openshift/cluster-etcd-operator/blob/master/pkg/etcdcli/etcdcli.go#L127-L130

There's a way to pass the "default context" via the client config, which is slightly misleading.
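
A minimal sketch of bounding the client initialization, assuming the fix is simply to provide a dial timeout and a cancellable default context when building the client (the endpoints, timeout value, and surrounding wiring are placeholders, not the operator's actual code):

package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// newBoundedEtcdClient creates an etcd client whose initial dial cannot hang
// forever: DialTimeout bounds the connection attempt and ctx can cancel it.
func newBoundedEtcdClient(ctx context.Context, endpoints []string) (*clientv3.Client, error) {
	cfg := clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 15 * time.Second,
		Context:     ctx, // the "default context" mentioned above; cancelling it aborts the dial
	}
	return clientv3.New(cfg)
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	cli, err := newBoundedEtcdClient(ctx, []string{"https://10.0.0.1:2379"})
	if err != nil {
		log.Fatalf("etcd client: %v", err)
	}
	defer cli.Close()
}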

Description of problem:

OLM still checks the deleted CatalogSource in openshift-marketplace.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

not always

Steps to Reproduce:

https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-gcp-ipi-sdn-p1-f7/1632127504539979776/artifacts/gcp-ipi-sdn-p1-f7/openshift-extended-test/build-log.txt

 

In daily CI, we have hit this issue several times.
for example:
https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-gcp-ipi-sdn-p1-f7/1632127504539979776/artifacts/gcp-ipi-sdn-p1-f7/openshift-extended-test/build-log.txt

prometheus-dependency1-cs has been deleted, but many Subscriptions fail to install due to ErrorPreventedResolution.

"message": "failed to populate resolver cache from source prometheus-dependency1-cs/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup prometheus-dependency1-cs.openshift-marketplace.svc on 172.30.0.10:53: no such host\"",
                "reason": "ErrorPreventedResolution",
                "status": "True",
                "type": "ResolutionFailed"

2023-03-04T22:35:00.761837299Z time="2023-03-04T22:35:00Z" level=info msg="removed client for deleted catalogsource" source="{prometheus-dependency1-cs openshift-marketplace}"

2023-03-04T22:39:38.039489890Z E0304 22:39:38.039410       1 queueinformer_operator.go:298] sync "e2e-test-olm-a-fa98jfef-sxnxr" failed: failed to populate resolver cache from source prometheus-dependency1-cs/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup prometheus-dependency1-cs.openshift-marketplace.svc on 172.30.0.10:53: no such host"

Actual results:

The deleted CatalogSource impacts Subscription installation.

Expected results:

The deleted CatalogSource should not impact Subscription installation.

Additional info:

 

Please review the following PR: https://github.com/openshift/service-ca-operator/pull/227

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

A long-lived cluster updating into 4.16.0-ec.1 was bitten by the Engineering Candidate's month-or-more-old api-int CA rotation (details on early rotation in API-1687). After manually updating /var/lib/kubelet/kubeconfig to include the new CA (which OCPBUGS-25821 is working on automating), multus pods still complained about untrusted api-int:

$ oc -n openshift-multus logs multus-pz7zp | grep api-int | tail -n5
E0119 19:33:52.983918    3194 reflector.go:148] k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://api-int.build02.gcp.ci.openshift.org:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dbuild0-gstfj-m-2.c.openshift-ci-build-farm.internal&resourceVersion=4723865081": tls: failed to verify certificate: x509: certificate signed by unknown authority
2024-01-19T19:33:55Z [error] Multus: [openshift-machine-api/cluster-autoscaler-default-f8dd547c7-dg9t5/f79ff01a-71c2-4f02-b48b-8c23c9e875ce]: error waiting for pod: Get "https://api-int.build02.gcp.ci.openshift.org:6443/api/v1/namespaces/openshift-machine-api/pods/cluster-autoscaler-default-f8dd547c7-dg9t5?timeout=1m0s": tls: failed to verify certificate: x509: certificate signed by unknown authority
2024-01-19T19:33:55Z [verbose] ADD finished CNI request ContainerID:"b554f8edca8ea7672119c1aa71a69e0368fefeb5f8ae2c2659f822b7fa8d3f62" Netns:"/var/run/netns/36923fe0-e28d-422f-8213-233086527baa" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-machine-api;K8S_POD_NAME=cluster-autoscaler-default-f8dd547c7-dg9t5;K8S_POD_INFRA_CONTAINER_ID=b554f8edca8ea7672119c1aa71a69e0368fefeb5f8ae2c2659f822b7fa8d3f62;K8S_POD_UID=f79ff01a-71c2-4f02-b48b-8c23c9e875ce" Path:"", result: "", err: error configuring pod [openshift-machine-api/cluster-autoscaler-default-f8dd547c7-dg9t5] networking: Multus: [openshift-machine-api/cluster-autoscaler-default-f8dd547c7-dg9t5/f79ff01a-71c2-4f02-b48b-8c23c9e875ce]: error waiting for pod: Get "https://api-int.build02.gcp.ci.openshift.org:6443/api/v1/namespaces/openshift-machine-api/pods/cluster-autoscaler-default-f8dd547c7-dg9t5?timeout=1m0s": tls: failed to verify certificate: x509: certificate signed by unknown authority
2024-01-19T19:34:00Z [error] Multus: [openshift-kube-storage-version-migrator/migrator-558d4d48b9-ggjpj/769153af-350b-492b-9589-ede2574aea85]: error waiting for pod: Get "https://api-int.build02.gcp.ci.openshift.org:6443/api/v1/namespaces/openshift-kube-storage-version-migrator/pods/migrator-558d4d48b9-ggjpj?timeout=1m0s": tls: failed to verify certificate: x509: certificate signed by unknown authority
2024-01-19T19:34:00Z [verbose] ADD finished CNI request ContainerID:"cfd0b8ca596411f1e26ae058fc9f015d6edeac407668420c023ff459860423eb" Netns:"/var/run/netns/bc7fbf17-c049-4241-a7dc-7e27acd3c8af" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-storage-version-migrator;K8S_POD_NAME=migrator-558d4d48b9-ggjpj;K8S_POD_INFRA_CONTAINER_ID=cfd0b8ca596411f1e26ae058fc9f015d6edeac407668420c023ff459860423eb;K8S_POD_UID=769153af-350b-492b-9589-ede2574aea85" Path:"", result: "", err: error configuring pod [openshift-kube-storage-version-migrator/migrator-558d4d48b9-ggjpj] networking: Multus: [openshift-kube-storage-version-migrator/migrator-558d4d48b9-ggjpj/769153af-350b-492b-9589-ede2574aea85]: error waiting for pod: Get "https://api-int.build02.gcp.ci.openshift.org:6443/api/v1/namespaces/openshift-kube-storage-version-migrator/pods/migrator-558d4d48b9-ggjpj?timeout=1m0s": tls: failed to verify certificate: x509: certificate signed by unknown authority

The multus pod needed a delete/replace, and after that it recovered:

$ oc --as system:admin -n openshift-multus delete pod multus-pz7zp
pod "multus-pz7zp" deleted
$ oc -n openshift-multus get -o wide pods | grep 'NAME\|build0-gstfj-m-2.c.openshift-ci-build-farm.internal'
NAME                                           READY   STATUS              RESTARTS      AGE     IP               NODE                                                              NOMINATED NODE   READINESS GATES
multus-additional-cni-plugins-wrdtt            1/1     Running             1             28h     10.0.0.3         build0-gstfj-m-2.c.openshift-ci-build-farm.internal               <none>           <none>
multus-admission-controller-74d794678b-9s7kl   2/2     Running             0             27h     10.129.0.36      build0-gstfj-m-2.c.openshift-ci-build-farm.internal               <none>           <none>
multus-hxmkz                                   1/1     Running             0             11s     10.0.0.3         build0-gstfj-m-2.c.openshift-ci-build-farm.internal               <none>           <none>
network-metrics-daemon-dczvs                   2/2     Running             2             28h     10.129.0.4       build0-gstfj-m-2.c.openshift-ci-build-farm.internal               <none>           <none>
$ oc -n openshift-multus logs multus-hxmkz | grep -c api-int
0

That need for multus-pod deletion should be automated, to reduce the number of things that need manual touches when the api-int CA rolls.

Version-Release number of selected component

Seen in 4.16.0-ec.1.

How reproducible:

Several multus pods on this cluster were bitten. But others were not, including some on clusters with old kubeconfigs that did not contain the new CA. I'm not clear on what the trigger is; perhaps some clients escape immediate trouble by having existing api-int connections to servers from back when the servers used the old CA? But deleting the multus pod on a cluster whose /var/lib/kubelet/kubeconfig has not yet been updated will likely reproduce the breakage, at least until OCPBUGS-25821 is fixed.

Steps to Reproduce:

Not entirely clear, but something like:

  1. Install 4.16.0-ec.1.
  2. Wait a month or more for the Kube API server operator to decide to roll the CA signing api-int.
  3. Delete a multus pod, so the replacement comes up broken on api-int trust.
  4. Manually update /var/lib/kubelet/kubeconfig.

Actual results:

Multus still fails to trust api-int until the broken pod is deleted or the container otherwise restarts to notice the updated kubeconfig.

Expected results:

Multus pod automatically pulls in the updated kubeconfig.

Additional info:

One possible implementation would be a liveness probe failing on api-int trust issues, triggering the kubelet to roll the multus container, and the replacement multus container to come up and load the fresh kubeconfig.
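
A minimal sketch of that liveness-probe idea, assuming a probe that exercises api-int with the CA bundle the container currently trusts (the command, paths, and thresholds are all assumptions, not the actual multus DaemonSet wiring):

livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # Fail the probe (restarting the container) when api-int can no longer be verified.
    - curl --silent --fail --cacert /run/multus/kubeconfig-ca.crt "https://api-int.<cluster-domain>:6443/version" > /dev/null
  periodSeconds: 60
  failureThreshold: 3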

Description of problem:

1. With regard to the changes done in PR https://github.com/openshift/console/pull/13676, TaskRuns are fetched for Failed and Cancelled PipelineRuns.

In order to further improve the performance of the PipelineRun list page, use pipelinerun.status.conditions.message for Failed TaskRuns as well. Along with that, for any PipelineRun, if the pipelinerun.status.conditions.message string contains data about task status, use that string instead of fetching TaskRuns.

Example string: 'Tasks Completed: 2 (Failed: 1, Cancelled 0), Skipped: 1'

2. For Failed PipelineRuns, to show the log snippet, make the API call on click of the Failed status column in the list page.

Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/178

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When running the 4.15 installer full-function test, the three instance families below were detected and verified; they need to be appended to the installer doc[1]:
- standardHBv4Family
- standardMSMediumMemoryv3Family
- standardMDSMediumMemoryv3Family

[1] https://github.com/openshift/installer/blob/master/docs/user/azure/tested_instance_types_x86_64.md

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

Topology links between VMs and non-VMs (such as a Pod or Deployment) don't show.

Version-Release number of selected component (if applicable):

4.12.14

How reproducible:

Every time, via the UI or annotation.

Steps to Reproduce:

1. Create VM
2. Create Pod/Deployment
3. Add the annotation or link via the UI

Actual results:

Only the annotation is updated.

Expected results:

The topology shows the linkage.

Additional info:

 app.openshift.io/connects-to: >-
      [{"apiVersion":"kubevirt.io/v1","kind":"VirtualMachine","name":"es-master00"},{"apiVersion":"kubevirt.io/v1","kind":"VirtualMachine","name":"es-master01"},{"apiVersion":"kubevirt.io/v1","kind":"VirtualMachine","name":"es-master02"}]

We need to update packages_ironic.yml to be closer to the current opendev master upper constraints.
After the new packages are created, we'll have to tag them and update the ironic-image configuration.

Description of problem:

   Due to RHEL9 incorporating OpenSSL 3.0, HaProxy will refuse to start if provided with a cert that uses a SHA1-based signature algorithm. RHEL9 is being introduced in 4.16. This means customers updating from 4.15 to 4.16 with a SHA1 cert will find their router in a failure state.


My Notes from experimenting with various ways of using a cert in ingress:
- Routes with SHA1 spec.tls.certificate WILL prevent HaProxy from reloading/starting
- It is NOT limited to FIPs, I broke a non-FIPs cluster with this
- Routes with SHA1 spec.tls.caCertificate will NOT prevent HaProxy starting, but route is rejected, due to extended route validation failure:
    - lastTransitionTime: "2024-01-04T20:18:01Z"
      message: 'spec.tls.certificate: Invalid value: "redacted certificate data":
        error verifying certificate: x509: certificate signed by unknown authority
        (possibly because of "x509: cannot verify signature: insecure algorithm SHA1-RSA
        (temporarily override with GODEBUG=x509sha1=1)" while trying to verify candidate
        authority certificate "www.exampleca.com")'

- Routes with SHA1 spec.tls.destinationCACertificate will NOT prevent HaProxy from starting. It actually seems to work as expected
- IngressController with SHA1 spec.defaultCertificate WILL prevent HaProxy from starting.
- IngressController with SHA1 spec.clientTLS.clientCA will NOT prevent HaProxy from starting.
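
A hedged way to check whether a route carries a SHA1-signed certificate before updating (the route name and namespace are placeholders):

oc get route <name> -n <namespace> -o jsonpath='{.spec.tls.certificate}' \
  | openssl x509 -noout -text | grep 'Signature Algorithm'
# "sha1WithRSAEncryption" indicates a certificate that HaProxy on RHEL9 will reject.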

Version-Release number of selected component (if applicable):

4.16

How reproducible:

100%    

Steps to Reproduce:

    1. Create an IngressController with spec.defaultCertificate, or a Route with spec.tls.certificate, set to a SHA1 cert
    2. Roll out the router   

Actual results:

    Router fails to start

Expected results:

    Router should start

Additional info:

    We've previously documented via story in RHEL9 epic: https://issues.redhat.com/browse/NE-1449

The initial fix for this issue was merged as https://github.com/openshift/router/pull/555. This issue is currently causing some problems, notably causing the openshift/cluster-ingress-operator repository's TestRouteAdmissionPolicy E2E test to fail intermittently, which causes the e2e-azure, e2e-gcp-operator, and e2e-aws-operator CI jobs to fail intermittently.

Note: In the solution, we only intend to reject **routes** with SHA1 cert on spec.tls.certificate. Ingress Controller with SHA1 cert on spec.defaultCertificate will NOT be rejected.

Description of problem:

[AWS-EBS-CSI-Driver] allocatable volumes count incorrect in csinode for AWS arm instance types "c7gd.2xlarge , m7gd.xlarge"

Version-Release number of selected component (if applicable):

    4.15.3

How reproducible:

    Always

Steps to Reproduce:

    1. Create an Openshift cluster on AWS with intance types "c7gd.2xlarge , m7gd.xlarge"
    2. Check the csinode allocatable volumes count 
    3. Create a statefulset with 1 PVC mounted and with replicas equal to the max allocatable volumes count, using nodeAffinity:
    apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: statefulset-vol-limit
spec:
  serviceName: "my-svc"
  replicas: $VOL_COUNT_LIMIT
  selector:
    matchLabels:
      app: my-svc
  template:
    metadata:
      labels:
        app: my-svc
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - $NODE_NAME
      containers:
      - name: openshifttest
        image: quay.io/openshifttest/hello-openshift@sha256:56c354e7885051b6bb4263f9faa58b2c292d44790599b7dde0e49e7c466cf339
        volumeMounts:
        - name: data
          mountPath: /mnt/storage
      tolerations:
        - key: "node-role.kubernetes.io/master"
          effect: "NoSchedule"
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: gp3-csi
      resources:
        requests:
          storage: 1Gi
    4. All statefulset replicas should become ready.

Actual results:

In step 4, the statefulset's 26th replica (pod) is stuck at ContainerCreating because the volume could not be attached to the node (the csinode allocatable volumes count is incorrect).
$ oc get no/ip-10-0-22-114.ec2.internal -oyaml|grep 'instance'
    beta.kubernetes.io/instance-type: m7gd.xlarge
    node.kubernetes.io/instance-type: m7gd.xlarge
 $ oc get csinode/ip-10-0-22-114.ec2.internal -oyaml
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  annotations:
    storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd,kubernetes.io/vsphere-volume
  creationTimestamp: "2024-03-20T02:16:34Z"
  name: ip-10-0-22-114.ec2.internal
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: ip-10-0-22-114.ec2.internal
    uid: acb9a153-bb9b-4c4a-90c1-f3e095173ce2
  resourceVersion: "19281"
  uid: 12507a73-898d-441a-a844-41c7de290b5b
spec:
  drivers:
  - allocatable:
      count: 26
    name: ebs.csi.aws.com
    nodeID: i-00ec014c5676a99d2
    topologyKeys:
    - topology.ebs.csi.aws.com/zone
$ export VOL_COUNT_LIMIT="26"
$ export NODE_NAME="ip-10-0-22-114.ec2.internal"
$ envsubst < sts-vol-limit.yaml| oc apply -f -
statefulset.apps/statefulset-vol-limit created
$ oc get sts
NAME                    READY   AGE
statefulset-vol-limit   25/26   169m

$ oc describe po/statefulset-vol-limit-25
Name:             statefulset-vol-limit-25
Namespace:        default
Priority:         0
Service Account:  default
Node:             ip-10-0-22-114.ec2.internal/10.0.22.114
Start Time:       Wed, 20 Mar 2024 18:56:08 +0800
Labels:           app=my-svc
                  apps.kubernetes.io/pod-index=25
                  controller-revision-hash=statefulset-vol-limit-7db55989f7
                  statefulset.kubernetes.io/pod-name=statefulset-vol-limit-25
Annotations:      k8s.ovn.org/pod-networks:
                    {"default":{"ip_addresses":["10.128.2.53/23"],"mac_address":"0a:58:0a:80:02:35","gateway_ips":["10.128.2.1"],"routes":[{"dest":"10.128.0.0...
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/statefulset-vol-limit
Containers:
  openshifttest:
    Container ID:
    Image:          quay.io/openshifttest/hello-openshift@sha256:56c354e7885051b6bb4263f9faa58b2c292d44790599b7dde0e49e7c466cf339
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/storage from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zkwqx (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-statefulset-vol-limit-25
    ReadOnly:   false
  kube-api-access-zkwqx:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason              Age                  From                     Message
  ----     ------              ----                 ----                     -------
  Normal   Scheduled           167m                 default-scheduler        Successfully assigned default/statefulset-vol-limit-25 to ip-10-0-22-114.ec2.internal
  Warning  FailedAttachVolume  166m (x2 over 166m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b43ec1d0-4fa3-4e87-a80b-6ad912160273" : rpc error: code = Internal desc = Could not attach volume "vol-0a7cb8c5859cf3f96" to node "i-00ec014c5676a99d2": context deadline exceeded
  Warning  FailedAttachVolume  30s (x87 over 166m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b43ec1d0-4fa3-4e87-a80b-6ad912160273" : rpc error: code = Internal desc = Could not attach volume "vol-0a7cb8c5859cf3f96" to node "i-00ec014c5676a99d2": attachment of disk "vol-0a7cb8c5859cf3f96" failed, expected device to be attached but was attaching

Expected results:

    In step 4, all statefulset replicas should become ready.

Additional info:

    For the AWS arm instance types "c7gd.2xlarge, m7gd.xlarge", the allocatable count should be "25", not "26".

This is a clone of issue OCPBUGS-37054. The following is the description of the original issue:

Description of problem:

The 'Getting started resources' card on the Cluster overview includes a link to 'View all steps in documentation', but this link is not valid for ROSA and OSD so it should be hidden.

Description of problem:

In the 4.14 z-stream rollback job, I'm seeing test-case "[sig-network] pods should successfully create sandboxes by adding pod to network " fail. 

The job link is here https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-upgrade-rollback-oldest-supported/1719037590788640768

The error is:

56 failures to create the sandbox

ns/openshift-monitoring pod/prometheus-k8s-1 node/ip-10-0-48-75.us-east-2.compute.internal - 3314.57 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-1_openshift-monitoring_95d1a457-3e1b-4ae3-8b57-8023eec5937d_0(5b36bc12b2964e85bcdbe60b275d6a12ea68cb18b81f16622a6cb686270c4eb3): error adding pod openshift-monitoring_prometheus-k8s-1 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": EOF
ns/openshift-monitoring pod/prometheus-k8s-1 node/ip-10-0-48-75.us-east-2.compute.internal - 3321.57 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-1_openshift-monitoring_95d1a457-3e1b-4ae3-8b57-8023eec5937d_0(3cc0afc5bec362566e4c3bdaf822209377102c2e39aaa8ef5d99b0f4ba795aaf): error adding pod openshift-monitoring_prometheus-k8s-1 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): failed to send CNI request: Post "http://dummy/cni": dial unix /run/multus/socket/multus.sock: connect: connection refused


Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-30-170011

How reproducible:

Flaky

Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:

The rollback test works by installing 4.14.0, upgrading to the latest 4.14 nightly, and then, at some random point, rolling back to 4.14.0.

Description of problem:

oc-mirror creates an invalid itms-oc-mirror.yaml file when working with OCI images; creating the ITMS from that file fails with the following error:

oc create -f itms-oc-mirror.yaml 
The ImageTagMirrorSet "itms-operator-0" is invalid: spec.imageTagMirrors[0].source: Invalid value: "//app1/noo": spec.imageTagMirrors[0].source in body should match '^\*(?:\.(?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]))+$|^((?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9])(?:(?:\.(?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]))+)?(?::[0-9]+)?)(?:(?:/[a-z0-9]+(?:(?:(?:[._]|__|[-]*)[a-z0-9]+)+)?)+)?$'

 

Version-Release number of selected component (if applicable):

oc-mirror version 
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202403251146.p0.g03ce0ca.assembly.stream.el9-03ce0ca", GitCommit:"03ce0ca797e73b6762fd3e24100ce043199519e9", GitTreeState:"clean", BuildDate:"2024-03-25T16:34:33Z", GoVersion:"go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1)  Copy the operator as OCI format to localhost:
`skopeo copy --all docker://registry.redhat.io/redhat/redhat-operator-index:v4.15 oci:///app1/noo/redhat-operator-index --remove-signatures`

2)  Use the following ImageSetConfiguration for the mirror: cat config-multi-op.yaml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  operators:
    - catalog: oci:///app1/noo/redhat-operator-index
      packages:
        - name: odf-operator
`oc-mirror --config config-multi-op.yaml file://outmulitop   --v2`


3) Do the diskToMirror step:
`oc-mirror --config config-multi-op.yaml --from file://outmulitop  --v2 docker://ec2-3-139-239-15.us-east-2.compute.amazonaws.com:5000/multi`

4) Create cluster resource with file: itms-oc-mirror.yaml
   `oc create -f itms-oc-mirror.yaml`

Actual results: 

4) failed to create ImageTagMirrorSet
oc create -f itms-oc-mirror.yaml 
The ImageTagMirrorSet "itms-operator-0" is invalid: spec.imageTagMirrors[0].source: Invalid value: "//app1/noo": spec.imageTagMirrors[0].source in body should match '^\*(?:\.(?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]))+$|^((?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9])(?:(?:\.(?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]))+)?(?::[0-9]+)?)(?:(?:/[a-z0-9]+(?:(?:(?:[._]|__|[-]*)[a-z0-9]+)+)?)+)?$'

cat itms-oc-mirror.yaml 
---
apiVersion: config.openshift.io/v1
kind: ImageTagMirrorSet
metadata:
  creationTimestamp: null
  name: itms-operator-0
spec:
  imageTagMirrors:
  - mirrors:
    - ec2-3-139-239-15.us-east-2.compute.amazonaws.com:5000/multi
    source: //app1/noo
status: {}

Expected results:

4) Creating the cluster resource succeeds.
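For reference, the validation regex in the error requires the source to begin with a registry host (optionally with a port) rather than a bare path. A minimal sketch of values that would and would not pass, using a hypothetical registry host for illustration:

```
# Fails validation: no registry host component
source: //app1/noo

# Would pass validation (hypothetical host, shown only for illustration)
source: registry.example.com/app1/noo
```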

This is a clone of issue OCPBUGS-43674. The following is the description of the original issue:

Description of problem:

The assisted service is throwing an error message stating that the Cloud Controller Manager (CCM) is not enabled, even though the CCM value is correctly set in the install-config file.

Version-Release number of selected component (if applicable):

4.18.0-0.nightly-2024-10-19-045205

How reproducible:

Always

Steps to Reproduce:

    1. Prepare install-config and agent-config for external OCI platform.
      example of install-config configuration
.......
.......
platform:
  external:
    platformName: oci
    cloudControllerManager: External
.......
.......
    2. Create agent ISO for external OCI platform     
    3. Boot up nodes using created agent ISO     

Actual results:

Oct 21 16:40:47 agent-sno.private.agenttest.oraclevcn.com service[2829]: time="2024-10-21T16:40:47Z" level=info msg="Register cluster: agenttest with id 2666753a-0485-420b-b968-e8732da6898c and params {\"api_vips\":[],\"base_dns_domain\":\"abitest.oci-rhelcert.edge-sro.rhecoeng.com\",\"cluster_networks\":[{\"cidr\":\"10.128.0.0/14\",\"host_prefix\":23}],\"cpu_architecture\":\"x86_64\",\"high_availability_mode\":\"None\",\"ingress_vips\":[],\"machine_networks\":[{\"cidr\":\"10.0.0.0/20\"}],\"name\":\"agenttest\",\"network_type\":\"OVNKubernetes\",\"olm_operators\":null,\"openshift_version\":\"4.18.0-0.nightly-2024-10-19-045205\",\"platform\":{\"external\":{\"cloud_controller_manager\":\"\",\"platform_name\":\"oci\"},\"type\":\"external\"},\"pull_secret\":\"***\",\"schedulable_masters\":false,\"service_networks\":[{\"cidr\":\"172.30.0.0/16\"}],\"ssh_public_key\":\"ssh-rsa XXXXXXXXXXXX\",\"user_managed_networking\":true,\"vip_dhcp_allocation\":false}" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).RegisterClusterInternal" file="/src/internal/bminventory/inventory.go:515" cluster_id=2666753a-0485-420b-b968-e8732da6898c go-id=2110 pkg=Inventory request_id=82e83b31-1c1b-4dea-b435-f7316a1965e

Expected results:

The cluster installation should be successful. 

The failure is fairly rare globally but some platforms seem to see it more often. Last night we happened to see it twice in 10 azure runs and aggregation failed on it. It appears to be a longstanding issue however.

The following test catches the problem

[sig-arch] events should not repeat pathologically for ns/openshift-authentication-operator

And the error will show something similar to:

{  1 events happened too frequently

event happened 70 times, something is wrong: namespace/openshift-authentication-operator deployment/authentication-operator hmsg/16eeb8c913 - reason/OpenShiftAPICheckFailed "oauth.openshift.io.v1" failed with an attempt failed with statusCode = 503, err = the server is currently unable to handle the request From: 15:46:39Z To: 15:46:40Z result=reject }

This is quite severe for just 1 second. The intervals database shows occurrences of over 100.

Sippy's test page provides insight into what platforms see the problem more, and can be used to find job runs where this happens, but the runs from yesterday were:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1729512594592501760

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1729512598153465856

This is a clone of issue OCPBUGS-43329. The following is the description of the original issue:

This is a clone of issue OCPBUGS-36236. The following is the description of the original issue:

Description of problem:

    The installer for IBM Cloud currently only checks the first group of subnets (50) when searching for Subnet details by name. It should provide pagination support to search all subnets.

Version-Release number of selected component (if applicable):

    4.17

How reproducible:

    100%, dependent on order of subnets returned by IBM Cloud API's however

Steps to Reproduce:

    1. Create 50+ IBM Cloud VPC Subnets
    2. Use Bring Your Own Network (BYON) configuration (with Subnet names for CP and/or Compute) in install-config.yaml
    3. Attempt to create manifests (openshift-install create manifests)
    

Actual results:

    ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: [platform.ibmcloud.controlPlaneSubnets: Not found: "eu-de-subnet-paginate-1-cp-eu-de-1", platform.ibmcloud.controlPlaneSubnets: Not found: "eu-de-subnet-paginate-1-cp-eu-de-2", platform.ibmcloud.controlPlaneSubnets: Not found: "eu-de-subnet-paginate-1-cp-eu-de-3", platform.ibmcloud.controlPlaneSubnets: Invalid value: []string{"eu-de-subnet-paginate-1-cp-eu-de-1", "eu-de-subnet-paginate-1-cp-eu-de-2", "eu-de-subnet-paginate-1-cp-eu-de-3"}: number of zones (0) covered by controlPlaneSubnets does not match number of provided or default zones (3) for control plane in eu-de, platform.ibmcloud.computeSubnets: Not found: "eu-de-subnet-paginate-1-compute-eu-de-1", platform.ibmcloud.computeSubnets: Not found: "eu-de-subnet-paginate-1-compute-eu-de-2", platform.ibmcloud.computeSubnets: Not found: "eu-de-subnet-paginate-1-compute-eu-de-3", platform.ibmcloud.computeSubnets: Invalid value: []string{"eu-de-subnet-paginate-1-compute-eu-de-1", "eu-de-subnet-paginate-1-compute-eu-de-2", "eu-de-subnet-paginate-1-compute-eu-de-3"}: number of zones (0) covered by computeSubnets does not match number of provided or default zones (3) for compute[0] in eu-de]

Expected results:

    Successful manifests and cluster creation

Additional info:

    IBM Cloud is working on a fix

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-29687. The following is the description of the original issue:

Description of problem:

Security baselines such as CIS do not recommend using secrets as environment variables, but using files.

5.4.1 Prefer using secrets as files over secrets as environmen... | Tenable®
https://www.tenable.com/audits/items/CIS_Kubernetes_v1.6.1_Level_2_Master.audit:98de3da69271994afb6211cf86ae4c6b
Secrets in Kubernetes must not be stored as environment variables.
https://www.stigviewer.com/stig/kubernetes/2021-04-14/finding/V-242415

However, metal3 and metal3-image-customization Pods are using environment variables.

$ oc get pod -A -o jsonpath='{range .items[?(@..secretKeyRef)]} {.kind} {.metadata.name} {"\n"}{end}' | grep metal3
 Pod metal3-66b59bbb76-8xzl7 
 Pod metal3-image-customization-965f5c8fc-h8zrk 
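For illustration, the pattern recommended by the cited baselines is to mount the Secret as a file rather than expose it via a secretKeyRef environment variable. A minimal sketch with hypothetical names (not the actual metal3 manifests):

```
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: app
    image: quay.io/example/app:latest
    # Recommended: read the credential from a mounted file
    volumeMounts:
    - name: credentials
      mountPath: /etc/credentials
      readOnly: true
    # Discouraged by CIS: exposing the secret as an environment variable
    # env:
    # - name: PASSWORD
    #   valueFrom:
    #     secretKeyRef:
    #       name: example-secret
    #       key: password
  volumes:
  - name: credentials
    secret:
      secretName: example-secret
```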
    

Version-Release number of selected component (if applicable):

4.14, 4.13, 4.12    

How reproducible:

100%

Steps to Reproduce:

    1. Install a new cluster using baremetal IPI
    2. Run a compliance scan using compliance operator[1], or just look at the manifest of metal3 or metal3-image-customization pod
    
    [1] https://docs.openshift.com/container-platform/4.14/security/compliance_operator/co-overview.html   

Actual results:

Not compliant to CIS or other security baselines   

Expected results:

Compliant to CIS or other security baselines    

Additional info:

    

Currently the konnectivity agent has the following update strategy:

```
updateStrategy:
  rollingUpdate:
    maxUnavailable: 1
    maxSurge: 0
```

We (IBM) suggest updating it to the following:
```
updateStrategy:
  rollingUpdate:
    maxUnavailable: 10%
  type: RollingUpdate
```

In a big cluster, it would speed up the konnectivity-agent update. As the agents are independent, this would not hurt the service.

Description of problem:

The install-config.yaml file lets a user set a server group policy for Control plane nodes, and one for Compute nodes, choosing from affinity, soft-affinity, anti-affinity, soft-anti-affinity. Installer will then create the server group if it doesn't exist.

The server group policy defined in install-config for Compute nodes is ignored. The worker server group always has the same policy as the Control plane's.
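For reference, a minimal install-config fragment that exercises this behavior (values illustrative; field placement follows the OpenStack machine-pool platform section):

```
controlPlane:
  name: master
  platform:
    openstack:
      serverGroupPolicy: soft-anti-affinity
compute:
- name: worker
  platform:
    openstack:
      serverGroupPolicy: soft-affinity   # ignored; workers end up with the control plane's policy
```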
    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1. openshift-install create install-config
    2. set Compute's serverGroupPolicy to soft-affinity in install-config.yaml
    3. openshift-install create cluster
    4. watch the server groups
    

Actual results:

both master and worker server groups have the default soft-anti-affinity policy
    

Expected results:

the worker server group should have soft-affinity as its policy
    

Additional info:


    

Description of problem

When a MachineAutoscaler references a currently-zero-Machine MachineSet that includes spec.template.spec.taints, the autoscaler fails to deserialize that MachineSet, which causes it to fail to autoscale that MachineSet. The autoscaler's deserialization logic should be improved to avoid failing on the presence of taints.

Version-Release number of selected component

Reproduced on 4.14.10 and 4.16.0-ec.1. Expected to be every release going back to at least 4.12, based on code inspection.

How reproducible

Always.

Steps to Reproduce

With a launch 4.14.10 gcp Cluster Bot cluster (logs):

$ oc adm upgrade
Cluster version is 4.14.10

Upstream: https://api.integration.openshift.com/api/upgrades_info/graph
Channel: candidate-4.14 (available channels: candidate-4.14, candidate-4.15)
No updates available. You may still upgrade to a specific release image with --to-image or wait for new updates to be available.
$ oc -n openshift-machine-api get machinesets.machine.openshift.io
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
ci-ln-s48f02k-72292-5z2hn-worker-a   1         1         1       1           29m
ci-ln-s48f02k-72292-5z2hn-worker-b   1         1         1       1           29m
ci-ln-s48f02k-72292-5z2hn-worker-c   1         1         1       1           29m
ci-ln-s48f02k-72292-5z2hn-worker-f   0         0                             29m

Pick that set with 0 nodes. They don't come with taints by default:

$ oc -n openshift-machine-api get -o json machineset.machine.openshift.io ci-ln-s48f02k-72292-5z2hn-worker-f | jq '.spec.template.spec.taints'
null

So patch one in:

$ oc -n openshift-machine-api patch machineset.machine.openshift.io ci-ln-s48f02k-72292-5z2hn-worker-f --type json -p '[{"op": "add", "path": "/spec/template/spec/taints", "value": [{"effect":"NoSchedule","key":"node-role.kubernetes.io/ci","value":"ci"}
]}]'
machineset.machine.openshift.io/ci-ln-s48f02k-72292-5z2hn-worker-f patched
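For readability, the patched field corresponds to this MachineSet fragment (the same taint the patch above adds, shown declaratively):

```
spec:
  template:
    spec:
      taints:
      - effect: NoSchedule
        key: node-role.kubernetes.io/ci
        value: ci
```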

And set up autoscaling:

$ cat cluster-autoscaler.yaml
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  maxNodeProvisionTime: 30m
  scaleDown:
    enabled: true
$ oc apply -f cluster-autoscaler.yaml 
clusterautoscaler.autoscaling.openshift.io/default created

I'm not all that familiar with autoscaling. Maybe the ClusterAutoscaler doesn't matter, and you need a MachineAutoscaler aimed at the chosen MachineSet?

$ cat machine-autoscaler.yaml 
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: test
  namespace: openshift-machine-api
spec:
  maxReplicas: 2
  minReplicas: 1
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: ci-ln-s48f02k-72292-5z2hn-worker-f
$ oc apply -f machine-autoscaler.yaml 
machineautoscaler.autoscaling.openshift.io/test created

Checking the autoscaler's logs:

$ oc -n openshift-machine-api logs -l k8s-app=cluster-autoscaler --tail -1 | grep taint
W0122 19:18:47.246369       1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci]
W0122 19:18:58.474000       1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci]
W0122 19:19:09.703748       1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci]
W0122 19:19:20.929617       1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci]
...

And the MachineSet is failing to scale:

$ oc -n openshift-machine-api get machinesets.machine.openshift.io ci-ln-s48f02k-72292-5z2hn-worker-f
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
ci-ln-s48f02k-72292-5z2hn-worker-f   0         0                             50m

While if I remove the taint:

$ oc -n openshift-machine-api patch machineset.machine.openshift.io ci-ln-s48f02k-72292-5z2hn-worker-f --type json -p '[{"op": "remove", "path": "/spec/template/spec/taints"}]'
machineset.machine.openshift.io/ci-ln-s48f02k-72292-5z2hn-worker-f patched

The autoscaler... well, it's not scaling up new Machines like I'd expected, but at least it seems to have calmed down about the taint deserialization issue:

$ oc -n openshift-machine-api get machines.machine.openshift.io
NAME                                       PHASE     TYPE                REGION        ZONE            AGE
ci-ln-s48f02k-72292-5z2hn-master-0         Running   e2-custom-6-16384   us-central1   us-central1-a   53m
ci-ln-s48f02k-72292-5z2hn-master-1         Running   e2-custom-6-16384   us-central1   us-central1-b   53m
ci-ln-s48f02k-72292-5z2hn-master-2         Running   e2-custom-6-16384   us-central1   us-central1-c   53m
ci-ln-s48f02k-72292-5z2hn-worker-a-fwskf   Running   e2-standard-4       us-central1   us-central1-a   45m
ci-ln-s48f02k-72292-5z2hn-worker-b-qkwlt   Running   e2-standard-4       us-central1   us-central1-b   45m
ci-ln-s48f02k-72292-5z2hn-worker-c-rlw4m   Running   e2-standard-4       us-central1   us-central1-c   45m
$ oc -n openshift-machine-api get machinesets.machine.openshift.io ci-ln-s48f02k-72292-5z2hn-worker-f
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
ci-ln-s48f02k-72292-5z2hn-worker-f   0         0                             53m
$ oc -n openshift-machine-api logs -l k8s-app=cluster-autoscaler --tail 50
I0122 19:23:17.284762       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:23:17.687036       1 legacy.go:296] No candidates for scale down
W0122 19:23:27.924167       1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci]
I0122 19:23:28.510701       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:23:28.909507       1 legacy.go:296] No candidates for scale down
W0122 19:23:39.148266       1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci]
I0122 19:23:39.737359       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:23:40.135580       1 legacy.go:296] No candidates for scale down
W0122 19:23:50.376616       1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci]
I0122 19:23:50.963064       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:23:51.364313       1 legacy.go:296] No candidates for scale down
W0122 19:24:01.601764       1 clusterapi_unstructured.go:217] Unable to convert data to taint: %vmap[effect:NoSchedule key:node-role.kubernetes.io/ci value:ci]
I0122 19:24:02.191330       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:24:02.589766       1 legacy.go:296] No candidates for scale down
I0122 19:24:13.415183       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:24:13.815851       1 legacy.go:296] No candidates for scale down
I0122 19:24:24.641190       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:24:25.040894       1 legacy.go:296] No candidates for scale down
I0122 19:24:35.867194       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:24:36.266400       1 legacy.go:296] No candidates for scale down
I0122 19:24:47.097656       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:24:47.498099       1 legacy.go:296] No candidates for scale down
I0122 19:24:58.326025       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:24:58.726034       1 legacy.go:296] No candidates for scale down
I0122 19:25:04.927980       1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0122 19:25:04.938213       1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 10.036399ms
I0122 19:25:09.552086       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:25:09.952094       1 legacy.go:296] No candidates for scale down
I0122 19:25:20.778317       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:25:21.178062       1 legacy.go:296] No candidates for scale down
I0122 19:25:32.005246       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:25:32.404966       1 legacy.go:296] No candidates for scale down
I0122 19:25:43.233637       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:25:43.633889       1 legacy.go:296] No candidates for scale down
I0122 19:25:54.462009       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:25:54.861513       1 legacy.go:296] No candidates for scale down
I0122 19:26:05.688410       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:26:06.088972       1 legacy.go:296] No candidates for scale down
I0122 19:26:16.915156       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:26:17.315987       1 legacy.go:296] No candidates for scale down
I0122 19:26:28.143877       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:26:28.543998       1 legacy.go:296] No candidates for scale down
I0122 19:26:39.369085       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:26:39.770386       1 legacy.go:296] No candidates for scale down
I0122 19:26:50.596923       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:26:50.997262       1 legacy.go:296] No candidates for scale down
I0122 19:27:01.823577       1 static_autoscaler.go:552] No unschedulable pods
I0122 19:27:02.223290       1 legacy.go:296] No candidates for scale down
I0122 19:27:04.938943       1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0122 19:27:04.947353       1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 8.319938ms

Actual results

Scale-from-zero MachineAutoscaler fails on taint-deserialization when the referenced MachineSet contains spec.template.spec.taints.

Expected results

Scale-from-zero MachineAutoscaler works, even when the referenced MachineSet contains spec.template.spec.taints.

Please review the following PR: https://github.com/openshift/aws-ebs-csi-driver/pull/247

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    Observe -> Alerting, Metrics, and Targets pages do not load as expected; a blank page is shown

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2023-12-07-041003

How reproducible:

    Always

Steps to Reproduce:

    1. Navigate to the Observe -> Alerting, Metrics, and Targets pages directly
    2.
    3.
    

Actual results:

    Blank page, no data be loaded

Expected results:

    Work as normal

Additional info:

 Failed to load resource: the server responded with a status of 404 (Not Found)
/api/accounts_mgmt/v1/subscriptions?page=1&search=external_cluster_id%3D%2715ace915-53d3-4455-b7e3-b7a5a4796b5c%27:1

Failed to load resource: the server responded with a status of 403 (Forbidden)
main-chunk-bb9ed989a7f7c65da39a.min.js:1 API call to get support level has failed r: Access denied due to cluster policy.
    at https://console-openshift-console.apps.ci-ln-9fl1l5t-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-bb9ed989a7f7c65da39a.min.js:1:95279
(anonymous) @ main-chunk-bb9ed989a7f7c65da39a.min.js:1
/api/kubernetes/apis/operators.coreos.com/v1alpha1/namespaces/#ALL_NS#/clusterserviceversions?:1
        
        
       Failed to load resource: the server responded with a status of 404 (Not Found)
vendor-patternfly-5~main-chunk-95cb256d9fa7738d2c46.min.js:1 Modal: When using hasNoBodyWrapper or setting a custom header, ensure you assign an accessible name to the the modal container with aria-label or aria-labelledby.

Description of problem:

The oauthclients degraded condition never gets removed, meaning that once it is set due to an issue on a cluster, it won't be unset.

Version-Release number of selected component (if applicable):

    

How reproducible:

Sporadically, when the AuthStatusHandlerFailedApply condition is set on the console operator status conditions.

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

time="2024-05-10T10:06:43-04:00" level=error msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed provisioning resources after infrastructure ready: failed to create route53 records: failed to create records for api: InvalidChangeBatch: [\"\" is not a valid hosted zone id. is not a valid encrypted      

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-05-05-102537

How reproducible:


Steps to Reproduce:

1. Install a C2S or an SC2S cluster via Cluster API

    

Actual results:

See description

    

Expected results:

    

Additional info:

Cluster could be created successfully on C2S/SC2S
    

Description of problem:

CI is permafailing all the way down to 4.12 due to breaking changes being side-loaded into old versions via a :latest tag on a fixture image.

Longer version - we faced a few different issues:
- we made a change to opm where it started to validate package names differently. This broke some of our tests because they had invalid package names.
- opm switched to a different cache backend, which led to the operatorhubio image being updated with the new cache backend, but that same image broke CI for older versions whose opm did not support the new backend (see the sketch below).
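The usual remediation for this class of breakage is to reference fixture images by digest (or by a per-branch tag) instead of a floating :latest tag, so older branches keep pulling the content they were tested against. A sketch with placeholder image references:

```
# Floating tag: content can change underneath older release branches
image: quay.io/example/test-catalog:latest

# Pinned digest: immutable reference, safe for older branches
image: quay.io/example/test-catalog@sha256:<digest>
```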

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-33671. The following is the description of the original issue:

Description of problem:

If one attempts to create more than one MachineOSConfig at the same time that requires a canonicalized secret, only one will build. The rest will not build.

Version-Release number of selected component (if applicable):

4.16    

How reproducible:

Always

Steps to Reproduce:

    1. Create multiple MachineConfigPools. Wait for the MachineConfigPool to get a rendered config.
    2. Create multiple MachineOSConfigs at the same time for each of the newly-created MachineConfigPools that uses a legacy Docker pull secret. A legacy Docker pull secret is one which does not have each of its secrets under a top-level auths key. One can use the builder-dockercfg secret in the MCO namespace for this purpose.
    3. Wait for the machine-os-builder pod to start.

    

Actual results:

Only one of the MachineOSBuilds begins building. The remaining MachineOSBuilds do not build, nor do they get a status assigned to them. The root cause is that when they all attempt to use the same legacy Docker pull secret, one of them creates the canonicalized version of it, and concurrent requests then fail because the canonicalized secret already exists.
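For illustration, the two pull-secret formats the description refers to look roughly like this (registry name and auth value are placeholders):

```
# Legacy .dockercfg format: registry entries at the top level
{
  "registry.example.com": {
    "auth": "BASE64AUTH"
  }
}

# Canonicalized .dockerconfigjson format: entries nested under "auths"
{
  "auths": {
    "registry.example.com": {
      "auth": "BASE64AUTH"
    }
  }
}
```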

Expected results:

Each MachineOSBuild should occur whenever it is created. It should also have some kind of status assigned to it as well.

Additional info:

    

To accommodate upgrades of 4.12 to 4.13 on a fips cluster, a rhel8 binary needed to be included in the rhel9 based 4.13 ovn-kubernetes container image. See https://issues.redhat.com/browse/OCPBUGS-15962 for details.

This workaround is not needed on 4.14+ clusters, as minor upgrades from 4.12 will always land on 4.13.

A fix in the ovn-kubernetes repo needs to be accompanied by a config change in ocp-build-data, please coordinate with ART.

Please review the following PR: https://github.com/openshift/cluster-api/pull/191

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-42081. The following is the description of the original issue:

This is a clone of issue OCPBUGS-34647. The following is the description of the original issue:

Description of problem:

When we enable OCB functionality and create a MC that configures an enforcing=0 kernel argument, the MCP is degraded, reporting this message:

              {
                  "lastTransitionTime": "2024-05-30T09:37:06Z",
                  "message": "Node ip-10-0-29-166.us-east-2.compute.internal is reporting: \"unexpected on-disk state validating against quay.io/mcoqe/layering@sha256:654149c7e25a1ada80acb8eedc3ecf9966a8d29e9738b39fcbedad44ddd15ed5: missing expected kernel arguments: [enforcing=0]\"",
                  "reason": "1 nodes are reporting degraded status on sync",
                  "status": "True",
                  "type": "NodeDegraded"
              },


    

Version-Release number of selected component (if applicable):

IPI on AWS

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-05-30-021120   True        False         97m     Error while reconciling 4.16.0-0.nightly-2024-05-30-021120: the cluster operator olm is not available

    

How reproducible:

Always
    

Steps to Reproduce:

    1. Enable techpreview
$ oc patch featuregate cluster --type=merge -p '{"spec":{"featureSet": "TechPreviewNoUpgrade"}}'

    2. Configure a MSOC resource to enable OCB functionality in the worker pool

When we hit this problem we were using the mcoqe quay repository.
We used a copy of the pull secret for baseImagePullSecret and renderedImagePushSecret, and no currentImagePullSecret was configured.

apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: MachineOSConfig
metadata:
  name: worker
spec:
  machineConfigPool:
    name: worker
#  buildOutputs:
#    currentImagePullSecret:
#      name: ""
  buildInputs:
    imageBuilder:
      imageBuilderType: PodImageBuilder
    baseImagePullSecret:
      name: pull-copy 
    renderedImagePushSecret:
      name: pull-copy 
    renderedImagePushspec: "quay.io/mcoqe/layering:latest"

    3. Create a MC that sets the enforcing=0 kernel argument

{
    "kind": "List",
    "apiVersion": "v1",
    "metadata": {},
    "items": [
        {
            "apiVersion": "machineconfiguration.openshift.io/v1",
            "kind": "MachineConfig",
            "metadata": {
                "labels": {
                    "machineconfiguration.openshift.io/role": "worker"
                },
                "name": "change-worker-kernel-selinux-gvr393x2"
            },
            "spec": {
                "config": {
                    "ignition": {
                        "version": "3.2.0"
                    }
                },
                "kernelArguments": [
                    "enforcing=0"
                ]
            }
        }
    ]
}

    

Actual results:

The worker MCP is degraded reporting this message:

oc get mcp worker -oyaml
....

              {
                  "lastTransitionTime": "2024-05-30T09:37:06Z",
                  "message": "Node ip-10-0-29-166.us-east-2.compute.internal is reporting: \"unexpected on-disk state validating against quay.io/mcoqe/layering@sha256:654149c7e25a1ada80acb8eedc3ecf9966a8d29e9738b39fcbedad44ddd15ed5: missing expected kernel arguments: [enforcing=0]\"",
                  "reason": "1 nodes are reporting degraded status on sync",
                  "status": "True",
                  "type": "NodeDegraded"
              },

    

Expected results:

The MC should be applied without problems and SELinux should be running with enforcing=0.
    

Additional info:


    

Description of problem:

The konnectivity-agent on the data plane needs to resolve its proxy-server-url to connect to the control plane's konnectivity server. These agents are using the default dnsPolicy, which is ClusterFirst.

This creates a dependency on CoreDNS. If CoreDNS is misconfigured or down, the agents won't be able to connect to the server, and all konnectivity-related traffic goes down (blocking updates, webhooks, logs, etc).

The correction would be to use dnsPolicy: Default in the konnectivity-agent daemonset on the data plane, so it uses the name resolution configuration from the node.

This makes sure that the konnectivity-agent's proxy-server-url can be resolved even if CoreDNS is down or misconfigured.

The konnectivity-agent control plane deployment should not change, as it still needs to use CoreDNS: in that case a ClusterIP Service is configured as the proxy-server-url.
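A minimal sketch of the proposed change on the data-plane DaemonSet (only the relevant field is shown):

```
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: konnectivity-agent
  namespace: kube-system
spec:
  template:
    spec:
      # Resolve the proxy-server-url via the node's resolv.conf instead of CoreDNS
      dnsPolicy: Default
```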

Version-Release number of selected component (if applicable):

4.14, 4.15

    

How reproducible:

Break coreDNS configuration

Steps to Reproduce:

1. Put an invalid forwarder to the dns.operator/default to fail upstream DNS resolving
2. Rollout restart the konnectivity-agent daemonset in kube-system

Actual results:

kubectl log is failing

Expected results:

kubectl log is working

Additional info:

 

This is a clone of issue OCPBUGS-30860. The following is the description of the original issue:

Description of problem:

Installation failed on 4.16 nightly build when waiting for install-complete. API is unavailable.

level=info msg=Waiting up to 20m0s (until 5:00AM UTC) for the Kubernetes API at https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443...
level=info msg=API v1.29.2+a0beecc up
level=info msg=Waiting up to 30m0s (until 5:11AM UTC) for bootstrapping to complete...
api available
waiting for bootstrap to complete
level=info msg=Waiting up to 20m0s (until 5:01AM UTC) for the Kubernetes API at https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443...
level=info msg=API v1.29.2+a0beecc up
level=info msg=Waiting up to 30m0s (until 5:11AM UTC) for bootstrapping to complete...
level=info msg=It is now safe to remove the bootstrap resources
level=info msg=Time elapsed: 15m54s
Copying kubeconfig to shared dir as kubeconfig-minimal
level=info msg=Destroying the bootstrap resources... 
level=info msg=Waiting up to 40m0s (until 5:39AM UTC) for the cluster at https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443 to initialize...
W0313 04:59:34.272442     229 reflector.go:539] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *v1.ClusterVersion: Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 172.212.184.131:6443: i/o timeout
I0313 04:59:34.272658     229 trace.go:236] Trace[533197684]: "Reflector ListAndWatch" name:k8s.io/client-go/tools/watch/informerwatcher.go:146 (13-Mar-2024 04:59:04.271) (total time: 30000ms):
Trace[533197684]: ---"Objects listed" error:Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 172.212.184.131:6443: i/o timeout 30000ms (04:59:34.272)
...
E0313 05:38:18.669780     229 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 172.212.184.131:6443: i/o timeout
level=error msg=Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp 172.212.184.131:6443: i/o timeout
level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
level=error msg=failed to initialize the cluster: timed out waiting for the condition 

On the master node, it seems that kube-apiserver is not running:
[root@ci-op-4sgxj8jx-8482f-hppxj-master-0 ~]# crictl ps | grep apiserver
e4b6cc9622b01       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         7 minutes ago        Running             kube-apiserver-cert-syncer                    22                  3ff4af6614409       kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
1249824fe5788       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago          Running             kube-apiserver-insecure-readyz                0                   3ff4af6614409       kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
ca774b07284f0       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago          Running             kube-apiserver-cert-regeneration-controller   0                   3ff4af6614409       kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
2931b9a2bbabd       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago          Running             openshift-apiserver-check-endpoints           0                   4136bf2183de1       apiserver-7df5bb879-xx74p
0c9534aec3b6b       8c9042f97c89d8c8519d6e6235bef5a5346f08e6d7d9864ef0f228b318b4c3de                                                         4 hours ago          Running             openshift-apiserver                           0                   4136bf2183de1       apiserver-7df5bb879-xx74p
db21a2dd1df33       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago          Running             guard                                         0                   199e1f4e665b9       kube-apiserver-guard-ci-op-4sgxj8jx-8482f-hppxj-master-0
429110f9ea5a3       6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5                                                         4 hours ago          Running             apiserver-watcher                             0                   7664f480df29d       apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-0

[root@ci-op-4sgxj8jx-8482f-hppxj-master-1 ~]# crictl ps | grep apiserver
c64187e7adcc6       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago         Running             openshift-apiserver-check-endpoints           0                   1a4a5b247c28a       apiserver-7df5bb879-f6v5x
ff98c52402288       8c9042f97c89d8c8519d6e6235bef5a5346f08e6d7d9864ef0f228b318b4c3de                                                         4 hours ago         Running             openshift-apiserver                           0                   1a4a5b247c28a       apiserver-7df5bb879-f6v5x
2f8a97f959409       faa1b95089d101cdc907d7affe310bbff5a9aa8f92c725dc6466afc37e731927                                                         4 hours ago         Running             oauth-apiserver                               0                   ffa2c316a0cca       apiserver-97fbc599c-2ftl7
72897e30e0df0       6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5                                                         4 hours ago         Running             apiserver-watcher                             0                   3b6c3849ce91f       apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-1

[root@ci-op-4sgxj8jx-8482f-hppxj-master-2 ~]# crictl ps | grep apiserver
04c426f07573d       faa1b95089d101cdc907d7affe310bbff5a9aa8f92c725dc6466afc37e731927                                                         4 hours ago         Running             oauth-apiserver                      0                   2172a64fb1a38       apiserver-654dcb4cc6-tq8fj
4dcca5c0e9b99       6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5                                                         4 hours ago         Running             apiserver-watcher                    0                   1cd99ec327199       apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-2


And found below error in kubelet log,
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: E0313 06:10:15.004656   23961 kuberuntime_manager.go:1262] container &Container{Name:kube-apiserver,Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:789f242b8bc721b697e265c6f9d025f45e56e990bfd32e331c633fe0b9f076bc,Command:[/bin/bash -ec],Args:[LOCK=/var/log/kube-apiserver/.lock
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: # We should be able to acquire the lock immediatelly. If not, it means the init container has not released it yet and kubelet or CRI-O started container prematurely.
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: exec {LOCK_FD}>${LOCK} && flock --verbose -w 30 "${LOCK_FD}" || {
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]:   echo "Failed to acquire lock for kube-apiserver. Please check setup container for details. This is likely kubelet or CRI-O bug."
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]:   exit 1
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: }
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: if [ -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt ]; then
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]:   echo "Copying system trust bundle ..."
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]:   cp -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: fi
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: exec watch-termination --termination-touch-file=/var/log/kube-apiserver/.terminating --termination-log-file=/var/log/kube-apiserver/termination.log --graceful-termination-duration=135s --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig -- hyperkube kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --advertise-address=${HOST_IP}  -v=2 --permit-address-sharing
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: ],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:,HostPort:6443,ContainerPort:6443,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:POD_NAME,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.name,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:POD_NAMESPACE,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:STATIC_POD_VERSION,Value:4,ValueFrom:nil,},EnvVar{Name:HOST_IP,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.hostIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:GOGC,Value:100,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{cpu: {{265 -3} {<nil>} 265m DecimalSI},memory: {{1073741824 0} {<nil>} 1Gi BinarySI},},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:resource-dir,ReadOnly:false,MountPath:/etc/kubernetes/static-pod-resources,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:cert-dir,ReadOnly:false,MountPath:/etc/kubernetes/static-pod-certs,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:audit-dir,ReadOnly:false,MountPath:/var/log/kube-apiserver,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:livez,Port:{0 6443 },Host:,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:10,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,TerminationGracePeriodSeconds:nil,},ReadinessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:readyz,Port:{0 6443 },Host:,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:10,PeriodSeconds:5,SuccessThreshold:1,FailureThreshold:1,TerminationGracePeriodSeconds:nil,},Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,},Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:FallbackToLogsOnError,VolumeDevices:[]VolumeDevice{},StartupProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:healthz,Port:{0 6443 },Host:,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:10,PeriodSeconds:5,SuccessThreshold:1,FailureThreshold:30,TerminationGracePeriodSeconds:nil,},ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0_openshift-kube-apiserver(196e0956694ff43707b03f4585f3b6cd): CreateContainerConfigError: host IP unknown; known addresses: []

Version-Release number of selected component (if applicable):

    4.16 latest nightly build

How reproducible:

    frequently

Steps to Reproduce:

    1. Install cluster on 4.16 nightly build
    2.
    3.
    

Actual results:

    Installation failed.

Expected results:

    Installation is successful.

Additional info:

Searching CI jobs found many jobs failing with the same error; most are on the Azure platform.
https://search.dptools.openshift.org/?search=failed+to+initialize+the+cluster%3A+timed+out+waiting+for+the+condition&maxAge=48h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

[sig-api-machinery] ValidatingAdmissionPolicy [Privileged:ClusterAdmin] [FeatureGate:ValidatingAdmissionPolicy] [Beta] should type check a CRD [Suite:openshift/conformance/parallel] [Suite:k8s]

This test appears to fail a little too often. It seems to only run on techpreview clusters (presumably the Beta tag in the name), but I was worried it's an indication that something isn't ready to graduate from techpreview, so I figured this is worth a bug.

Even so, a 93% pass rate is a little too low; I would like someone to investigate and get this test's pass rate up. When it fails, it's typically the only thing killing the job run. Output is always:

{  fail [k8s.io/kubernetes@v1.29.0/test/e2e/apimachinery/validatingadmissionpolicy.go:349]: Expected
    <[]v1beta1.ExpressionWarning | len:0, cap:0>: nil
to have length 2
Ginkgo exit error 1: exit with code 1}

View this link for sample job runs; I would focus on those with 2 failures, indicating this was the only failing test in the job.

Backport to 4.16 of AUTH-482 specifically for the openshift-network-node-identity.

Namespaces with workloads that need pinning:

  • network-check-source
  • network-check-target
  • network-node-identity

See the 4.17 PR for more info on what needs pinning.
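For context, management-workload pinning is typically expressed with annotations like the following; this is only a sketch, and the exact manifests are in the 4.17 PR:

```
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-network-node-identity
  annotations:
    workload.openshift.io/allowed: management
---
# On the workload's pod template:
metadata:
  annotations:
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
```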

This is a clone of issue OCPBUGS-43746. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38132. The following is the description of the original issue:

The CPO reconciliation aborts when the OIDC/LDAP IDP validation check fails, and this results in a failure to reconcile any components that are reconciled after that point in the code.

This failure should not be fatal to the CPO reconcile and should likely be reported as a condition on the HC.
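One possible shape for surfacing the failure non-fatally as a HostedCluster status condition (condition type and reason are hypothetical, not an agreed API):

```
status:
  conditions:
  - type: ValidIDPConfiguration        # hypothetical condition type
    status: "False"
    reason: IDPValidationFailed        # hypothetical reason
    message: "OIDC/LDAP identity provider validation failed: <error from the check>"
```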

xref

Customer incident
https://issues.redhat.com/browse/OCPBUGS-38071

RFE for bypassing the check
https://issues.redhat.com/browse/RFE-5638

PR to proxy the IDP check through the data plane network
https://github.com/openshift/hypershift/pull/4273

 

Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver/pull/54

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

E1213 07:18:34.291004 1 run.go:74] "command failed" err="error while building transformers: KMSv1 is deprecated and will only receive security updates going forward. Use KMSv2 instead. Set --feature-gates=KMSv1=true to use the deprecated KMSv1 feature."
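For context, KMSv2 is selected through the API server's EncryptionConfiguration; a minimal sketch, with the provider name and socket path as placeholders:

```
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - kms:
      apiVersion: v2            # KMSv2; KMSv1 is deprecated
      name: example-kms
      endpoint: unix:///var/run/kms-plugin/socket.sock
      timeout: 3s
  - identity: {}
```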
    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Description of problem:

 After setting an invalid release image on a HostedCluster, it is not possible to fix it by editing the HostedCluster and setting a valid release image.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Always

Steps to Reproduce:

    1. Create a HostedCluster with an invalid release image
    2. Edit HostedCluster and specify a valid release image
    3.
    

Actual results:

    HostedCluster does not start using the new valid release image

Expected results:

    HostedCluster starts using the valid release image.

Additional info:

    

Description of problem:

While researching OCPBUGS-30860, I came across read-only caches on Azure master nodes.

Version-Release number of selected component (if applicable):

4.10 - present    

How reproducible:

Always    

Steps to Reproduce:

    1. Spin up a cluster
    2. In the Azure cloud dashboard, observe that master nodes only use the Read cache.
    3.
    

Actual results:

    

Expected results:

Master nodes should be using the ReadWrite cache.    
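For reference, the disk caching mode surfaces on the Machine API Azure provider spec; a sketch of the relevant fragment (field placement assumed from the Azure provider spec, values illustrative):

```
providerSpec:
  value:
    osDisk:
      diskSizeGB: 1024
      cachingType: ReadWrite   # masters currently end up with a read-only cache
      managedDisk:
        storageAccountType: Premium_LRS
```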

Additional info:

    

Description of problem:

4.17 introduces new auto node sizing values. To preserve backwards compatibility we need to backport a version file.

Related: https://issues.redhat.com//browse/OCPNODE-2226    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.20 True False 43h Cluster version is 4.11.20

$ oc get clusterrolebinding system:openshift:kube-controller-manager:gce-cloud-provider -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2023-01-11T13:16:47Z"
  name: system:openshift:kube-controller-manager:gce-cloud-provider
  resourceVersion: "6079"
  uid: 82a81635-4535-4a51-ab83-d2a1a5b9a473
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:kube-controller-manager:gce-cloud-provider
subjects:
- kind: ServiceAccount
  name: cloud-provider
  namespace: kube-system

$ oc get sa cloud-provider -n kube-system
Error from server (NotFound): serviceaccounts "cloud-provider" not found

The serviceAccount cloud-provider does not exist, neither in kube-system nor in any other namespace.

It's therefore not clear what this ClusterRoleBinding does, what use case it fulfills, and why it references a non-existent serviceAccount.

From a security point of view, it's recommended to remove references to non-existent serviceAccounts from ClusterRoleBindings, as a potential attacker could abuse the current state by creating the necessary serviceAccount and gaining undesired permissions.

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4 (all version from what we have found)

How reproducible:

Always

Steps to Reproduce:

1. Install OpenShift Container Platform 4
2. Run oc get clusterrolebinding system:openshift:kube-controller-manager:gce-cloud-provider -o yaml

Actual results:

$ oc get clusterrolebinding system:openshift:kube-controller-manager:gce-cloud-provider -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2023-01-11T13:16:47Z"
  name: system:openshift:kube-controller-manager:gce-cloud-provider
  resourceVersion: "6079"
  uid: 82a81635-4535-4a51-ab83-d2a1a5b9a473
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:kube-controller-manager:gce-cloud-provider
subjects:
- kind: ServiceAccount
  name: cloud-provider
  namespace: kube-system

$ oc get sa cloud-provider -n kube-system
Error from server (NotFound): serviceaccounts "cloud-provider" not found

Expected results:

The serviceAccount called cloud-provider to exist or otherwise the ClusterRoleBinding to be removed.

Additional info:

Finding related to a Security review done on the OpenShift Container Platform 4 - Platform

To support external OIDC on hypershift, but not on self-managed, we need different schemas for the authentication CRD on a default-hypershift versus a default-self-managed.  This requires us to change rendering so that it honors the clusterprofile.

 

Then we have to update the installer to match, then update hypershift, then update the manifests.

Please review the following PR: https://github.com/openshift/machine-config-operator/pull/4152

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

After fixing https://issues.redhat.com/browse/OCPBUGS-29919 by merging https://github.com/openshift/baremetal-runtimecfg/pull/301, we have lost the ability to properly debug the Node IP selection logic in runtimecfg.

In order to preserve debugability of this component, it should be possible to selectively enable verbose logs.

Please review the following PR: https://github.com/openshift/csi-external-provisioner/pull/90

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/826

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/multus-cni/pull/202

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver/pull/56

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Even though fakefish is not a supported Redfish interface, it is very useful to have it working for "special" scenarios, like NC-SI, until proper support is implemented.

On OCP 4.14 and later, converged flow is enabled by default, and in this configuration Ironic sends a soft power_off command to the Ironic agent running on the ramdisk. Since this power operation does not go through the Redfish interface, it is not processed by fakefish, preventing fakefish from working on some NC-SI configurations, where a full power-off would mean the BMC loses power.

Ironic already supports using out-of-band power off for the agent [1], so having an option to use it would be very helpful.

[1]- https://opendev.org/openstack/ironic/commit/824ad1676bd8032fb4a4eb8ffc7625a376a64371

Version-Release number of selected component (if applicable):

Seen with OCP 4.14.26 and 4.14.33, expected to happen on later versions    

How reproducible:

Always

Steps to Reproduce:

    1. Deploy SNO node using ACM and fakefish as redfish interface
    2. Check metal3-ironic pod logs    

Actual results:

We can see a soft power_off command sent to the ironic agent running on the ramdisk:

2024-08-07 15:00:45.545 1 DEBUG ironic.drivers.modules.agent_client [None req-74c0c3ed-011f-4718-bdce-53f2ba412e85 - - - - - -] Executing agent command standby.power_off for node df006e90-02ee-4847-b532-be4838e844e6 with params {'wait': 'false', 'agent_token': '***'} _command /usr/lib/python3.9/site-packages/ironic/drivers/modules/agent_client.py:197
2024-08-07 15:00:45.551 1 DEBUG ironic.drivers.modules.agent_client [None req-74c0c3ed-011f-4718-bdce-53f2ba412e85 - - - - - -] Agent command standby.power_off for node df006e90-02ee-4847-b532-be4838e844e6 returned result None, error None, HTTP status code 200 _command /usr/lib/python3.9/site-packages/ironic/drivers/modules/agent_client.py:234

Expected results:

There is an option to prevent this soft power_off command, so all power actions happen via redfish. This would allow fakefish to capture them and behave as needed.

Additional info:

    

This is a clone of issue OCPBUGS-38846. The following is the description of the original issue:

This is a clone of issue OCPBUGS-37850. The following is the description of the original issue:

Description of problem:

Occasional machine-config daemon panics in techpreview jobs. For example, this run has:

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1076/pull-ci-openshift-cluster-version-operator-master-e2e-aws-ovn-techpreview/1819082707058036736

And the referenced logs include a full stack trace, the crux of which appears to be:

E0801 19:23:55.012345    2908 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 127 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2424b80, 0x4166150})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0004d5340?})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x2424b80?, 0x4166150?})
	/usr/lib/golang/src/runtime/panic.go:770 +0x132
github.com/openshift/machine-config-operator/pkg/helpers.ListPools(0xc0007c5208, {0x0, 0x0})
	/go/src/github.com/openshift/machine-config-operator/pkg/helpers/helpers.go:142 +0x17d
github.com/openshift/machine-config-operator/pkg/helpers.GetPoolsForNode({0x0, 0x0}, 0xc0007c5208)
	/go/src/github.com/openshift/machine-config-operator/pkg/helpers/helpers.go:66 +0x65
github.com/openshift/machine-config-operator/pkg/daemon.(*PinnedImageSetManager).handleNodeEvent(0xc000a98480, {0x27e9e60?, 0xc0007c5208})
	/go/src/github.com/openshift/machine-config-operator/pkg/daemon/pinned_image_set.go:955 +0x92
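
The trace suggests the MachineConfigPool lister passed to GetPoolsForNode is nil (the {0x0, 0x0} argument), most likely because the node event arrives before the lister is initialized. A self-contained sketch of the failure mode and a defensive guard; the types and names below are stand-ins, not the actual MCO code:

package main

import "fmt"

// node and pool are stand-ins for the real MCO types; the point is the
// nil-lister guard, not the exact signatures.
type node struct{ name string }
type pool struct{ name string }

type poolLister interface {
	List() ([]pool, error)
}

// getPoolsForNode mirrors the failing call: dereferencing a nil lister
// here is exactly the panic seen in the stack trace above.
func getPoolsForNode(lister poolLister, n node) ([]pool, error) {
	if lister == nil {
		return nil, fmt.Errorf("pool lister not initialized yet, skipping event for node %s", n.name)
	}
	return lister.List()
}

func main() {
	// Simulate an event arriving before informer caches have synced.
	pools, err := getPoolsForNode(nil, node{name: "worker-0"})
	fmt.Println(pools, err)
}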

Version-Release number of selected component (if applicable):

$ w3m -dump -cols 200 'https://search.dptools.openshift.org/?name=^periodic&type=junit&search=machine-config-daemon.*Observed+a+panic' | grep 'failures match'
periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-techpreview (all) - 37 runs, 62% failed, 13% of failures match = 8% impact
periodic-ci-openshift-release-master-ci-4.16-e2e-azure-ovn-techpreview-serial (all) - 6 runs, 83% failed, 20% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.18-e2e-azure-ovn-techpreview (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.17-ocp-e2e-aws-ovn-arm64-techpreview-serial (all) - 10 runs, 40% failed, 25% of failures match = 10% impact
periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-techpreview-serial (all) - 7 runs, 29% failed, 50% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.17-e2e-vsphere-ovn-techpreview-serial (all) - 7 runs, 100% failed, 14% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.18-e2e-vsphere-ovn-techpreview-serial (all) - 5 runs, 100% failed, 20% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.17-ocp-e2e-aws-ovn-arm64-techpreview (all) - 10 runs, 40% failed, 25% of failures match = 10% impact
periodic-ci-openshift-release-master-ci-4.18-e2e-gcp-ovn-techpreview (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.16-e2e-aws-ovn-techpreview-serial (all) - 6 runs, 17% failed, 200% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview (all) - 6 runs, 17% failed, 100% of failures match = 17% impact
periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-single-node-techpreview-serial (all) - 7 runs, 100% failed, 14% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-single-node-techpreview (all) - 7 runs, 57% failed, 50% of failures match = 29% impact
periodic-ci-openshift-release-master-ci-4.16-e2e-aws-ovn-techpreview (all) - 6 runs, 17% failed, 100% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-techpreview (all) - 18 runs, 17% failed, 33% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.16-e2e-gcp-ovn-techpreview (all) - 6 runs, 17% failed, 100% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-aws-ovn-arm64-techpreview-serial (all) - 11 runs, 18% failed, 50% of failures match = 9% impact
periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-techpreview-serial (all) - 7 runs, 57% failed, 25% of failures match = 14% impact

How reproducible:

Looks like ~15% impact in the CI runs that CI Search turns up.

Steps to Reproduce:

Run lots of CI. Look for MCD panics.

Actual results:

CI Search results above.

Expected results:

No hits.

Description of problem:

In a local setup, this error appears when creating a Deployment with scaling set in the Git form page: 
`Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal string into Go struct field DeploymentSpec.spec.replicas of type int32`
    

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-01-05-154400
    

How reproducible:

Everytime
    

Steps to Reproduce:

    1. In the local setup go to the git form page
    2. Enter a git repo and select deployment as the resource type
    3. In scaling enter the value as '5' and click on Create button
    

Actual results:

Got this error:
"Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal string into Go struct field DeploymentSpec.spec.replicas of type int32"
    

Expected results:

Deployment should be created

    

Additional info:

Happening with Deployment-config creation as well
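
A minimal reproduction of the underlying type mismatch (illustrative only; the fix is for the form to submit the scaling value as a number rather than a string):

package main

import (
	"encoding/json"
	"fmt"
)

// Minimal stand-in for DeploymentSpec.spec.replicas.
type deploymentSpec struct {
	Replicas int32 `json:"replicas"`
}

func main() {
	var spec deploymentSpec

	// What the Git form currently submits: a quoted value.
	err := json.Unmarshal([]byte(`{"replicas": "5"}`), &spec)
	fmt.Println(err) // json: cannot unmarshal string into Go struct field ... of type int32

	// What the API server expects: a plain number.
	err = json.Unmarshal([]byte(`{"replicas": 5}`), &spec)
	fmt.Println(spec.Replicas, err) // 5 <nil>
}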
    

Description of problem:

    When running an Azure install, the installer noticeably hangs for a long time when running create manifests or create cluster. It will sit unresponsive for almost 2 minutes at:

DEBUG OpenShift Installer unreleased-master-9741-gbc9836aa9bd3a4f10d229bb6f87981dddf2adc92 
DEBUG Built from commit bc9836aa9bd3a4f10d229bb6f87981dddf2adc92 
DEBUG Fetching Metadata...                         
DEBUG Loading Metadata...                          
DEBUG   Loading Cluster ID...                      
DEBUG     Loading Install Config...                
DEBUG       Loading SSH Key...                     
DEBUG       Loading Base Domain...                 
DEBUG         Loading Platform...                  
DEBUG       Loading Cluster Name...                
DEBUG         Loading Base Domain...               
DEBUG         Loading Platform...                  
DEBUG       Loading Pull Secret...                 
DEBUG       Loading Platform...                    
INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json" 

This could also be related to failures we see in CI such as this:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_installer/8123/pull-ci-openshift-installer-master-e2e-azure-ovn/1773611162923962368

 level=info msg=Consuming Worker Machines from target directory
level=info msg=Credentials loaded from file "/var/run/secrets/ci.openshift.io/cluster-profile/osServicePrincipal.json"
level=fatal msg=failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": error connecting to Azure client: failed to list SKUs: compute.ResourceSkusClient#List: Failure responding to request: StatusCode=200 -- Original Error: Error occurred reading http.Response#Body - Error = 'read tcp 10.128.117.2:43870->4.150.240.10:443: read: connection reset by peer' 

If the call takes too long and the context times out and is canceled, we could potentially see this error.
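
A sketch of bounding the SKU listing with an explicit timeout so a hung connection surfaces as a clear error instead of the installer sitting unresponsive for minutes; the client and call below are stand-ins, not the actual installer code:

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// listSKUs is a stand-in for the Azure ResourceSkusClient List call.
func listSKUs(ctx context.Context) ([]string, error) {
	select {
	case <-time.After(5 * time.Second): // pretend the API is slow
		return []string{"Standard_D4s_v3"}, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

func main() {
	// Bound the call so slowness fails fast with a useful message.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	skus, err := listSKUs(ctx)
	if errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("listing Azure SKUs timed out; check connectivity to the Azure endpoint")
		return
	}
	fmt.Println(skus, err)
}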

Version-Release number of selected component (if applicable):

    

How reproducible:

    Always

Steps to Reproduce:

    1. Run azure install
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

https://github.com/openshift/installer/pull/8134
has a partial fix    

Description of problem:

4.16.0-0.nightly-2024-04-07-182401, Prometheus Operator 0.73.0, too many warnings for "'bearerTokenFile' is deprecated, use 'authorization' instead.", see below

$ oc -n openshift-monitoring logs -c prometheus-operator deploy/prometheus-operator 
level=info ts=2024-04-08T07:06:17.191301889Z caller=main.go:186 msg="Starting Prometheus Operator" version="(version=0.73.0, branch=rhaos-4.16-rhel-9, revision=3541f90)"
level=info ts=2024-04-08T07:06:17.195797026Z caller=main.go:187 build_context="(go=go1.21.7 (Red Hat 1.21.7-1.el9) X:loopvar,strictfipsruntime, platform=linux/amd64, user=root, date=20240405-12:29:19, tags=strictfipsruntime)"
level=info ts=2024-04-08T07:06:17.195888428Z caller=main.go:198 msg="namespaces filtering configuration " config="{allow_list=\"\",deny_list=\"\",prometheus_allow_list=\"openshift-monitoring\",alertmanager_allow_list=\"openshift-monitoring\",alertmanagerconfig_allow_list=\"\",thanosruler_allow_list=\"openshift-monitoring\"}"
level=info ts=2024-04-08T07:06:17.212735844Z caller=main.go:227 msg="connection established" cluster-version=v1.29.3+e994e5d
level=warn ts=2024-04-08T07:06:17.228748881Z caller=main.go:75 msg="resource \"scrapeconfigs\" (group: \"monitoring.coreos.com/v1alpha1\") not installed in the cluster"
level=info ts=2024-04-08T07:06:17.25637504Z caller=operator.go:335 component=prometheus-controller msg="Kubernetes API capabilities" endpointslices=true
level=warn ts=2024-04-08T07:06:17.258012256Z caller=main.go:75 msg="resource \"prometheusagents\" (group: \"monitoring.coreos.com/v1alpha1\") not installed in the cluster"
level=info ts=2024-04-08T07:06:17.360652572Z caller=server.go:298 msg="starting insecure server" address=127.0.0.1:8080
level=info ts=2024-04-08T07:06:17.602723953Z caller=operator.go:283 component=thanos-controller msg="successfully synced all caches"
level=info ts=2024-04-08T07:06:17.686834878Z caller=operator.go:313 component=alertmanager-controller msg="successfully synced all caches"
level=info ts=2024-04-08T07:06:17.687014402Z caller=operator.go:572 component=alertmanager-controller key=openshift-monitoring/main msg="sync alertmanager"
level=info ts=2024-04-08T07:06:17.696906656Z caller=operator.go:392 component=prometheus-controller msg="successfully synced all caches"
level=info ts=2024-04-08T07:06:17.698997412Z caller=operator.go:766 component=prometheus-controller key=openshift-monitoring/k8s msg="sync prometheus"
level=info ts=2024-04-08T07:06:17.904295505Z caller=operator.go:572 component=alertmanager-controller key=openshift-monitoring/main msg="sync alertmanager"
level=warn ts=2024-04-08T07:06:18.111274725Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-apiserver-operator/openshift-apiserver-operator
level=warn ts=2024-04-08T07:06:18.111387227Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-apiserver/openshift-apiserver
level=warn ts=2024-04-08T07:06:18.111430218Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-apiserver/openshift-apiserver-operator-check-endpoints
level=warn ts=2024-04-08T07:06:18.11149249Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-authentication-operator/authentication-operator
level=warn ts=2024-04-08T07:06:18.111554601Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-authentication/oauth-openshift
level=warn ts=2024-04-08T07:06:18.111637633Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cloud-credential-operator/cloud-credential-operator
level=warn ts=2024-04-08T07:06:18.111697614Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/aws-ebs-csi-driver-controller-monitor
level=warn ts=2024-04-08T07:06:18.111733495Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/aws-ebs-csi-driver-controller-monitor
level=warn ts=2024-04-08T07:06:18.111784766Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/aws-ebs-csi-driver-controller-monitor
level=warn ts=2024-04-08T07:06:18.111819506Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/aws-ebs-csi-driver-controller-monitor
level=warn ts=2024-04-08T07:06:18.111895078Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/aws-ebs-csi-driver-controller-monitor
level=warn ts=2024-04-08T07:06:18.111944309Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/shared-resource-csi-driver-node-monitor
level=warn ts=2024-04-08T07:06:18.11197813Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/shared-resource-csi-driver-node-monitor
level=warn ts=2024-04-08T07:06:18.112071132Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/shared-resource-csi-driver-node-monitor
level=warn ts=2024-04-08T07:06:18.112151634Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-machine-approver/cluster-machine-approver
level=warn ts=2024-04-08T07:06:18.112226245Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-version/cluster-version-operator
level=warn ts=2024-04-08T07:06:18.112256916Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-config-operator/config-operator
level=warn ts=2024-04-08T07:06:18.112284327Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-console-operator/console-operator
level=warn ts=2024-04-08T07:06:18.112310487Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-console/console
level=warn ts=2024-04-08T07:06:18.112339628Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-controller-manager-operator/openshift-controller-manager-operator
level=warn ts=2024-04-08T07:06:18.112370889Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-controller-manager/openshift-controller-manager
level=warn ts=2024-04-08T07:06:18.112397339Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-dns-operator/dns-operator
level=warn ts=2024-04-08T07:06:18.11243773Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-dns/dns-default
level=warn ts=2024-04-08T07:06:18.112484231Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-etcd-operator/etcd-operator
level=warn ts=2024-04-08T07:06:18.112532742Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-image-registry/image-registry
level=warn ts=2024-04-08T07:06:18.112575493Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-ingress-operator/ingress-operator
level=warn ts=2024-04-08T07:06:18.112648155Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-ingress/router-default
level=warn ts=2024-04-08T07:06:18.112684775Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-insights/insights-operator
level=warn ts=2024-04-08T07:06:18.112738886Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-apiserver-operator/kube-apiserver-operator
level=warn ts=2024-04-08T07:06:18.112771917Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-apiserver/kube-apiserver
level=warn ts=2024-04-08T07:06:18.112834288Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-controller-manager-operator/kube-controller-manager-operator
level=warn ts=2024-04-08T07:06:18.11288797Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-controller-manager/kube-controller-manager
level=warn ts=2024-04-08T07:06:18.112923101Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-scheduler-operator/kube-scheduler-operator
level=warn ts=2024-04-08T07:06:18.112974211Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-scheduler/kube-scheduler
level=warn ts=2024-04-08T07:06:18.113004992Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-scheduler/kube-scheduler
level=warn ts=2024-04-08T07:06:18.113031193Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-api/cluster-autoscaler-operator
level=warn ts=2024-04-08T07:06:18.113082674Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-api/machine-api-controllers
level=warn ts=2024-04-08T07:06:18.113111174Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-api/machine-api-controllers
level=warn ts=2024-04-08T07:06:18.113137205Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-api/machine-api-controllers
level=warn ts=2024-04-08T07:06:18.113180076Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-api/machine-api-operator
level=warn ts=2024-04-08T07:06:18.113207577Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-config-operator/machine-config-controller
level=warn ts=2024-04-08T07:06:18.113243277Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-config-operator/machine-config-daemon
level=warn ts=2024-04-08T07:06:18.113268968Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-config-operator/machine-config-operator
level=warn ts=2024-04-08T07:06:18.113303009Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-marketplace/marketplace-operator
level=warn ts=2024-04-08T07:06:18.113566255Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-monitoring/promtail-monitor
level=warn ts=2024-04-08T07:06:18.113659677Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-multus/monitor-multus-admission-controller
level=warn ts=2024-04-08T07:06:18.113690037Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-multus/monitor-network
level=warn ts=2024-04-08T07:06:18.113716478Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-network-diagnostics/network-check-source
level=warn ts=2024-04-08T07:06:18.113760539Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-network-operator/network-operator
level=warn ts=2024-04-08T07:06:18.113789389Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-oauth-apiserver/openshift-oauth-apiserver
level=warn ts=2024-04-08T07:06:18.11382366Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-operator-lifecycle-manager/catalog-operator
level=warn ts=2024-04-08T07:06:18.113849491Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-operator-lifecycle-manager/olm-operator
level=warn ts=2024-04-08T07:06:18.113882881Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-operator-lifecycle-manager/package-server-manager-metrics
level=warn ts=2024-04-08T07:06:18.113910142Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-ovn-kubernetes/monitor-ovn-control-plane-metrics
level=warn ts=2024-04-08T07:06:18.113939212Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-ovn-kubernetes/monitor-ovn-node
level=warn ts=2024-04-08T07:06:18.113965423Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-ovn-kubernetes/monitor-ovn-node
level=warn ts=2024-04-08T07:06:18.114005374Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-route-controller-manager/openshift-route-controller-manager
level=warn ts=2024-04-08T07:06:18.114032265Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-service-ca-operator/service-ca-operator
level=warn ts=2024-04-08T07:06:18.114075275Z caller=promcfg.go:1806 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1
level=info ts=2024-04-08T07:06:18.372521592Z caller=operator.go:766 component=prometheus-controller key=openshift-monitoring/k8s msg="sync prometheus"
level=warn ts=2024-04-08T07:06:19.52908448Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-apiserver-operator/openshift-apiserver-operator
level=warn ts=2024-04-08T07:06:19.529206143Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-apiserver/openshift-apiserver
level=warn ts=2024-04-08T07:06:19.529264914Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-apiserver/openshift-apiserver-operator-check-endpoints
level=warn ts=2024-04-08T07:06:19.529314545Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-authentication-operator/authentication-operator
level=warn ts=2024-04-08T07:06:19.529363736Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-authentication/oauth-openshift
level=warn ts=2024-04-08T07:06:19.529496399Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cloud-credential-operator/cloud-credential-operator
level=warn ts=2024-04-08T07:06:19.52954309Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/aws-ebs-csi-driver-controller-monitor
level=warn ts=2024-04-08T07:06:19.529610031Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/aws-ebs-csi-driver-controller-monitor
level=warn ts=2024-04-08T07:06:19.529675583Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/aws-ebs-csi-driver-controller-monitor
level=warn ts=2024-04-08T07:06:19.529722024Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/aws-ebs-csi-driver-controller-monitor
level=warn ts=2024-04-08T07:06:19.529773425Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/aws-ebs-csi-driver-controller-monitor
level=warn ts=2024-04-08T07:06:19.529840396Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/shared-resource-csi-driver-node-monitor
level=warn ts=2024-04-08T07:06:19.529940188Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/shared-resource-csi-driver-node-monitor
level=warn ts=2024-04-08T07:06:19.530042201Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-csi-drivers/shared-resource-csi-driver-node-monitor
level=warn ts=2024-04-08T07:06:19.530145063Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-machine-approver/cluster-machine-approver
level=warn ts=2024-04-08T07:06:19.530242295Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-cluster-version/cluster-version-operator
level=warn ts=2024-04-08T07:06:19.530318036Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-config-operator/config-operator
level=warn ts=2024-04-08T07:06:19.530379448Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-console-operator/console-operator
level=warn ts=2024-04-08T07:06:19.530423309Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-console/console
level=warn ts=2024-04-08T07:06:19.53046613Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-controller-manager-operator/openshift-controller-manager-operator
level=warn ts=2024-04-08T07:06:19.530515121Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-controller-manager/openshift-controller-manager
level=warn ts=2024-04-08T07:06:19.530600663Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-dns-operator/dns-operator
level=warn ts=2024-04-08T07:06:19.530658014Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-dns/dns-default
level=warn ts=2024-04-08T07:06:19.530718695Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-etcd-operator/etcd-operator
level=warn ts=2024-04-08T07:06:19.530768006Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-image-registry/image-registry
level=warn ts=2024-04-08T07:06:19.530829528Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-ingress-operator/ingress-operator
level=warn ts=2024-04-08T07:06:19.530882449Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-ingress/router-default
level=warn ts=2024-04-08T07:06:19.53093667Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-insights/insights-operator
level=warn ts=2024-04-08T07:06:19.530991941Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-apiserver-operator/kube-apiserver-operator
level=warn ts=2024-04-08T07:06:19.531039122Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-apiserver/kube-apiserver
level=warn ts=2024-04-08T07:06:19.531094903Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-controller-manager-operator/kube-controller-manager-operator
level=warn ts=2024-04-08T07:06:19.531137024Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-controller-manager/kube-controller-manager
level=warn ts=2024-04-08T07:06:19.531180345Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-scheduler-operator/kube-scheduler-operator
level=warn ts=2024-04-08T07:06:19.531224986Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-scheduler/kube-scheduler
level=warn ts=2024-04-08T07:06:19.531270967Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-kube-scheduler/kube-scheduler
level=warn ts=2024-04-08T07:06:19.531334098Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-api/cluster-autoscaler-operator
level=warn ts=2024-04-08T07:06:19.53138266Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-api/machine-api-controllers
level=warn ts=2024-04-08T07:06:19.5314245Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-api/machine-api-controllers
level=warn ts=2024-04-08T07:06:19.531463661Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-api/machine-api-controllers
level=warn ts=2024-04-08T07:06:19.531513562Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-api/machine-api-operator
level=warn ts=2024-04-08T07:06:19.531555783Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-config-operator/machine-config-controller
level=warn ts=2024-04-08T07:06:19.531626765Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-config-operator/machine-config-daemon
level=warn ts=2024-04-08T07:06:19.531689586Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-machine-config-operator/machine-config-operator
level=warn ts=2024-04-08T07:06:19.531733467Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-marketplace/marketplace-operator
level=warn ts=2024-04-08T07:06:19.532134636Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-monitoring/promtail-monitor
level=warn ts=2024-04-08T07:06:19.532233158Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-multus/monitor-multus-admission-controller
level=warn ts=2024-04-08T07:06:19.532507644Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-multus/monitor-network
level=warn ts=2024-04-08T07:06:19.532567965Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-network-diagnostics/network-check-source
level=warn ts=2024-04-08T07:06:19.532635257Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-network-operator/network-operator
level=warn ts=2024-04-08T07:06:19.532683058Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-oauth-apiserver/openshift-oauth-apiserver
level=warn ts=2024-04-08T07:06:19.532728279Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-operator-lifecycle-manager/catalog-operator
level=warn ts=2024-04-08T07:06:19.53277187Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-operator-lifecycle-manager/olm-operator
level=warn ts=2024-04-08T07:06:19.532821821Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-operator-lifecycle-manager/package-server-manager-metrics
level=warn ts=2024-04-08T07:06:19.532863662Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-ovn-kubernetes/monitor-ovn-control-plane-metrics
level=warn ts=2024-04-08T07:06:19.532904153Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-ovn-kubernetes/monitor-ovn-node
level=warn ts=2024-04-08T07:06:19.532944204Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-ovn-kubernetes/monitor-ovn-node
level=warn ts=2024-04-08T07:06:19.532990574Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-route-controller-manager/openshift-route-controller-manager
level=warn ts=2024-04-08T07:06:19.533037166Z caller=promcfg.go:1374 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1 service_monitor=openshift-service-ca-operator/service-ca-operator
level=warn ts=2024-04-08T07:06:19.533089337Z caller=promcfg.go:1806 component=prometheus-controller msg="'bearerTokenFile' is deprecated, use 'authorization' instead." version=2.50.1

Example ServiceMonitor with bearerTokenFile that causes a warning in prometheus-operator:

$ oc -n openshift-apiserver-operator get servicemonitor openshift-apiserver-operator -oyaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
...
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    metricRelabelings:
    - action: drop
      regex: etcd_(debugging|disk|request|server).*
      sourceLabels:
 ...
$ oc explain servicemonitor.spec.endpoints.bearerTokenFile
GROUP:      monitoring.coreos.com
KIND:       ServiceMonitor
VERSION:    v1

FIELD: bearerTokenFile <string>

DESCRIPTION:
    File to read bearer token for scraping the target. 
     Deprecated: use `authorization` instead.

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-04-07-182401   True        False         52m     Cluster version is 4.16.0-0.nightly-2024-04-07-182401

How reproducible:

with Prometheus Operator 0.73.0

Steps to Reproduce:

1. check prometheus-operator logs
    

Actual results:

too many warnings for "'bearerTokenFile' is deprecated, use 'authorization' instead."

Expected results:

no warnings

Related to https://issues.redhat.com/browse/OCPBUGS-23000

The cluster-autoscaler by default evicts all such pods, including those coming from DaemonSets.
In the case of the EFS CSI driver, whose volumes are mounted as NFS volumes, this causes stale NFS handles, and application workloads are not terminated gracefully.

Version-Release number of selected component (if applicable):

4.11

How reproducible:

- While scaling down a node from the cluster-autoscaler-operator, the DaemonSet pods are being evicted.

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

CSI pods should not be evicted by the cluster autoscaler (at least not prior to workload termination), as this might produce data corruption.

Additional info:

It is possible to disable CSI pod eviction by adding the following annotation to the CSI driver pod:
cluster-autoscaler.kubernetes.io/enable-ds-eviction: "false"

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-38963. The following is the description of the original issue:

This is a clone of issue OCPBUGS-33308. The following is the description of the original issue:

Description of problem:

When creating an OCP cluster on AWS and selecting "publish: Internal," 
the ingress operator may create external LB mappings to external 
subnets.

This can occur if public subnets were specified in the install-config during installation.

https://docs.openshift.com/container-platform/4.15/installing/installing_aws/installing-aws-private.html#private-clusters-about-aws_installing-aws-private 

A configuration validation should be added to the installer.    
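
A sketch of what such an install-config validation could look like; the types and field names below are illustrative, not the actual installer code:

package main

import "fmt"

// validateInternalPublish rejects the combination of publish: Internal
// with explicitly supplied public subnets, which is what currently lets
// the ingress operator create externally reachable load balancer mappings.
func validateInternalPublish(publish string, subnetIsPublic map[string]bool) error {
	if publish != "Internal" {
		return nil
	}
	for id, public := range subnetIsPublic {
		if public {
			return fmt.Errorf("publish: Internal is set but subnet %s is public; remove public subnets from the install-config or use publish: External", id)
		}
	}
	return nil
}

func main() {
	err := validateInternalPublish("Internal", map[string]bool{
		"subnet-private-1": false,
		"subnet-public-1":  true,
	})
	fmt.Println(err)
}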

Version-Release number of selected component (if applicable):

    4.14+ probably older versions as well.

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    Slack thread: https://redhat-internal.slack.com/archives/C68TNFWA2/p1714986876688959

Please review the following PR: https://github.com/openshift/kube-state-metrics/pull/108

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

All the apiservers:

  • kube-apiserver
  • openshift-apiserver
  • oauth-apiserver

Expose both `apiserver_request_slo_duration_seconds` and `apiserver_request_sli_duration_seconds`. The SLI metric was introduced in Kubernetes 1.26 as a replacement of `apiserver_request_slo_duration_seconds` which was deprecated in Kubernetes 1.27. This change is only a renaming so both metrics expose the same data. To avoid storing duplicated data in Prometheus, we need to drop the `apiserver_request_slo_duration_seconds` in favor of `apiserver_request_sli_duration_seconds`.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/must-gather/pull/409

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/prometheus/pull/187

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/network-tools/pull/105

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

We need to make the controllerAvailabilityPolicy field immutable in the HostedCluster spec section to ensure the customer cannot switch between SingleReplica and HighAvailability.
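
One common way to make a spec field immutable is a CEL validation rule on the CRD; a sketch of what that could look like on the HostedCluster API (marker placement and type names are illustrative, not necessarily how HyperShift implements it):

package v1beta1

// AvailabilityPolicy is a stand-in for the HyperShift type.
type AvailabilityPolicy string

// HostedClusterSpec excerpt (sketch).
type HostedClusterSpec struct {
	// ControllerAvailabilityPolicy cannot be changed after creation:
	// the CEL rule below makes the API server reject any update where
	// the new value differs from the stored one.
	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="controllerAvailabilityPolicy is immutable"
	// +optional
	ControllerAvailabilityPolicy AvailabilityPolicy `json:"controllerAvailabilityPolicy,omitempty"`
}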

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

There is a new zone in PowerVS called dal12.  We need to add this zone to the list of supported zones in the installer.
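
A sketch of the kind of change needed, assuming the installer keeps a region-to-zone table for Power VS; names and structure below are illustrative, not the exact installer source:

package main

import "fmt"

// regionZones is an illustrative stand-in for the installer's Power VS
// region/zone table; supporting dal12 is just another entry.
var regionZones = map[string][]string{
	"dal": {"dal10", "dal12"}, // dal12 added as a newly supported zone
}

func zoneSupported(zone string) bool {
	for _, zones := range regionZones {
		for _, z := range zones {
			if z == zone {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(zoneSupported("dal12")) // true once the entry is added
}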
    

Version-Release number of selected component (if applicable):


    

How reproducible:

Always
    

Steps to Reproduce:

    1. Deploy OpenShift cluster to the zone
    2.
    3.
    

Actual results:

Fails
    

Expected results:

Works
    

Additional info:


    

Please review the following PR: https://github.com/openshift/builder/pull/385

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Snapshots taken to gather deprecation information from bundles are taken from the Subscription namespace instead of the CatalogSource namespace. That means that if the Subscription is in a different namespace than the CatalogSource, no bundles will be present in the snapshot. 
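
A minimal sketch of the intended behavior (names are illustrative, not the actual OLM resolver code): the snapshot used for deprecation lookups should be keyed on the CatalogSource's namespace, not the Subscription's.

package main

import "fmt"

// snapshotNamespace returns the namespace a bundle snapshot should be
// taken from. Using the Subscription's namespace here is the bug: when
// the CatalogSource lives elsewhere, the snapshot comes back empty and
// no deprecation conditions can be computed.
func snapshotNamespace(subscriptionNS, catalogSourceNS string) string {
	return catalogSourceNS
}

func main() {
	fmt.Println(snapshotNamespace("my-app", "openshift-marketplace")) // openshift-marketplace
}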

How reproducible:

100% 

Steps to Reproduce:

1.Create CatalogSource with olm.deprecation entries
2.Create Subscription targeting a package with deprecations in a different namespace.

Actual results:

No Deprecation Conditions will be present.

Expected results:

Deprecation Conditions should be present.

This is a clone of issue OCPBUGS-35368. The following is the description of the original issue:

Description of problem:

TestAllowedSourceRangesStatus test is flaking with the error:

allowed_source_ranges_test.go:197: expected the annotation to be reflected in status.allowedSourceRanges: timed out waiting for the condition

I also notice it sometimes coincides with a TestScopeChange error. It may be related to LoadBalancer-type update operations; for example, https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/978/pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-operator/1800249453098045440

Version-Release number of selected component (if applicable):

4.17

How reproducible:

~25-50%

Steps to Reproduce:

1. Run cluster-ingress-operator TestAllowedSourceRangesStatus E2E tests
2.
3.

Actual results:

Test is flaking

Expected results:

Test shouldn't flake

Additional info:

Example Flake

Search.CI Link

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

If the dhcp-daemon pods are evicted, they lose all knowledge of existing DHCP leases, and any pods using DHCP IPAM will fail to renew their lease, even after the pod is re-created.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Use a NAD with ipam: dhcp.
    2. Delete the dhcp-daemon pod on the same node as your workload.
    3. Observe the lease expire on the DHCP server / get reissued to a different pod, causing a network outage from duplicate addresses.
    

Actual results:

The dhcp-daemon pod is evicted before the workloads.

Expected results:

The dhcp-daemon pod does not get evicted before workloads, because of system-node-critical.

Additional info:

All other Multus components use system-node-critical:
  priority: 2000001000
  priorityClassName: system-node-critical

Description of problem:

In Azure Stack, the Azure-Disk CSI Driver node pods are in CrashLoopBackOff:

openshift-cluster-csi-drivers                      azure-disk-csi-driver-node-57rxv                                      1/3     CrashLoopBackOff   33 (3m55s ago)   59m     10.0.1.5       ci-op-q8b6n4iv-904ed-kp5mv-worker-mtcazs-m62cj   <none>           <none>
openshift-cluster-csi-drivers                      azure-disk-csi-driver-node-8wvqm                                      1/3     CrashLoopBackOff   35 (29s ago)     67m     10.0.0.6       ci-op-q8b6n4iv-904ed-kp5mv-master-1              <none>           <none>
openshift-cluster-csi-drivers                      azure-disk-csi-driver-node-97ww5                                      1/3     CrashLoopBackOff   33 (12s ago)     67m     10.0.0.7       ci-op-q8b6n4iv-904ed-kp5mv-master-2              <none>           <none>
openshift-cluster-csi-drivers                      azure-disk-csi-driver-node-9hzw9                                      1/3     CrashLoopBackOff   35 (108s ago)    59m     10.0.1.4       ci-op-q8b6n4iv-904ed-kp5mv-worker-mtcazs-gjqmw   <none>           <none>
openshift-cluster-csi-drivers                      azure-disk-csi-driver-node-glgzr                                      1/3     CrashLoopBackOff   34 (69s ago)     67m     10.0.0.8       ci-op-q8b6n4iv-904ed-kp5mv-master-0              <none>           <none>
openshift-cluster-csi-drivers                      azure-disk-csi-driver-node-hktfb                                      2/3     CrashLoopBackOff   48 (63s ago)     60m     10.0.1.6       ci-op-q8b6n4iv-904ed-kp5mv-worker-mtcazs-kdbpf   <none>           <none>
The CSI-Driver container log:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xc8 pc=0x18ff5db]
goroutine 228 [running]:
sigs.k8s.io/cloud-provider-azure/pkg/provider.(*Cloud).GetZone(0xc00021ec00, {0xc0002d57d0?, 0xc00005e3e0?})
 /go/src/github.com/openshift/azure-disk-csi-driver/vendor/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_zones.go:182 +0x2db
sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk.(*Driver).NodeGetInfo(0xc000144000, {0x21ebbf0, 0xc0002d5470}, 0x273606a?)
 /go/src/github.com/openshift/azure-disk-csi-driver/pkg/azuredisk/nodeserver.go:336 +0x13b
github.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetInfo_Handler.func1({0x21ebbf0, 0xc0002d5470}, {0x1d71a60?, 0xc0003b0320})
 /go/src/github.com/openshift/azure-disk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:7160 +0x72
sigs.k8s.io/azuredisk-csi-driver/pkg/csi-common.logGRPC({0x21ebbf0, 0xc0002d5470}, {0x1d71a60?, 0xc0003b0320?}, 0xc0003b0340, 0xc00050ae10)
 /go/src/github.com/openshift/azure-disk-csi-driver/pkg/csi-common/utils.go:80 +0x409
github.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetInfo_Handler({0x1ec2f40?, 0xc000144000}, {0x21ebbf0, 0xc0002d5470}, 0xc000054680, 0x20167a0)
 /go/src/github.com/openshift/azure-disk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:7162 +0x135
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000530000, {0x21ebbf0, 0xc0002d53b0}, {0x21f5f40, 0xc00057b1e0}, 0xc00011cb40, 0xc00052c810, 0x30fa1c8, 0x0)
 /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:1343 +0xe03
google.golang.org/grpc.(*Server).handleStream(0xc000530000, {0x21f5f40, 0xc00057b1e0}, 0xc00011cb40)
 /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:1737 +0xc4c
google.golang.org/grpc.(*Server).serveStreams.func1.1()
 /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:986 +0x86
created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 260
 /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:997 +0x145  

 

The registrar container log:
E0321 23:08:02.679727       1 main.go:103] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Unavailable desc = error reading from server: EOF, restarting registration container. 

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-03-21-152650    

How reproducible:

    Seen in a CI profile; a manual install also failed earlier.

Steps to Reproduce:

    See Description     

Actual results:

    Azure-Disk CSI Driver node pod CrashLoopBackOff

Expected results:

    Azure-Disk CSI Driver node pod should be running

Additional info:

    See gather-extra and must-gather: 
https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-azure-stack-ipi-proxy-fips-f2/1770921405509013504/artifacts/azure-stack-ipi-proxy-fips-f2/

Description of problem:

The installer doesn't precheck whether the node architecture and VM type are consistent on AWS and GCP; the check works on Azure.

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-multi-2023-12-06-195439 

How reproducible:

   Always 

Steps to Reproduce:

    1.Config compute architecture field to arm64 but vm type choose amd64 instance type in install-config     
    2.Create cluster 
    3.Check installation     

Actual results:

Azure prechecks whether the architecture is consistent with the instance type when creating manifests, for example:
12-07 11:18:24.452 [INFO] Generating manifests files.....12-07 11:18:24.452 level=info msg=Credentials loaded from file "/home/jenkins/ws/workspace/ocp-common/Flexy-install/flexy/workdir/azurecreds20231207-285-jd7gpj"
12-07 11:18:56.474 level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: controlPlane.platform.azure.type: Invalid value: "Standard_D4ps_v5": instance type architecture 'Arm64' does not match install config architecture amd64

But AWS and GCP don't have such a precheck, so the installation fails later, after many resources have already been created. The case is more likely to happen in multiarch clusters.
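
For illustration only, an install-config compute stanza with this mismatch might look like the sketch below (the instance type is just an example of an amd64 type paired with an arm64 compute architecture):

compute:
- name: worker
  architecture: arm64
  replicas: 3
  platform:
    aws:
      type: m6i.xlarge   # amd64 instance type, mismatched with the arm64 architecture above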

Expected results:

The installer should do a precheck for architecture vs. VM type, especially for platforms that support heterogeneous clusters (AWS, GCP, Azure).

Additional info:

    

Please review the following PR: https://github.com/openshift/agent-installer-utils/pull/32

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    Navigation:
    Workloads -> Deployments -> (select any Deployment from list) -> Details -> Volumes -> Remove volume

    Issue:
    Message "Are you sure you want to remove volume audit-policies from Deployment: apiserver?" is in English.

    Observation:
    Translation is present in branch release-4.15 file...
    frontend/public/locales/ja/public.json

Version-Release number of selected component (if applicable):

    4.15.0-rc.3

How reproducible:

    Always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    Content is in English

Expected results:

    Content should be in selected language

Additional info:

    Reference screenshot attached.

Description of problem:

- One node (the rendezvous node) failed to join the cluster and there are some pending CSRs.

- omc get csr 
NAME                                                            AGE   SIGNERNAME                                    REQUESTOR                                                                   REQUESTEDDURATION   CONDITION
csr-44qjs                                                       21m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
csr-9n9hc                                                       5m    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
csr-9xw24                                                       1h    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
csr-brm6f                                                       1h    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
csr-dz75g                                                       36m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
csr-l8c7v                                                       1h    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
csr-mv7w5                                                       52m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
csr-v6pgd                                                       1h    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
In order to complete the installation, the customer needs to approve those CSRs manually (see the sketch below).
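
A sketch of the usual manual workaround (not a fix for the underlying bug) using standard oc commands:

$ oc get csr
$ oc get csr | grep Pending | awk '{print $1}' | xargs oc adm certificate approve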

Steps to Reproduce:

   agent-based installation. 
    

Actual results:

    CSRs are in Pending state.

Expected results:

    CSRs should be approved automatically.

Additional info:

Logs : https://drive.google.com/drive/folders/1UCgC6oMx28k-_WXy8w1iN_t9h9rtmnfo?usp=sharing

This is a clone of issue OCPBUGS-34649. The following is the description of the original issue:

Description of problem:

    According to https://github.com/openshift/enhancements/pull/1502 all managed TLS artifacts (secrets, configmaps and files on disk) should have clear ownership and other necessary metadata

`metal3-ironic-tls` is created by cluster-baremetal-operator but doesn't have an ownership annotation (a sketch of one is shown below).
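
A minimal sketch of what such an annotation could look like on the secret; the annotation key, value, and namespace shown here are assumptions for illustration, not the confirmed convention from the enhancement:

apiVersion: v1
kind: Secret
metadata:
  name: metal3-ironic-tls
  namespace: openshift-machine-api   # assumed namespace for illustration
  annotations:
    openshift.io/owning-component: "Bare Metal Hardware Provisioning"   # assumed key/value
type: kubernetes.io/tls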

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-39419. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38794. The following is the description of the original issue:

Description of problem:

HCP cluster is being updated but the nodepool is stuck updating:
~~~
NAME                   CLUSTER   DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
nodepool-dev-cluster   dev       2               2               False         False        4.15.22   True              True
~~~

Version-Release number of selected component (if applicable):

Hosting OCP cluster 4.15
HCP 4.15.23

How reproducible:

N/A

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

Nodepool stuck in upgrade

Expected results:

Upgrade success

Additional info:

I have found this error repeating continually in the ignition-server pods:
~~~
{"level":"error","ts":"2024-08-20T09:02:19Z","msg":"Reconciler error","controller":"secret","controllerGroup":"","controllerKind":"Secret","Secret":{"name":"token-nodepool-dev-cluster-3146da34","namespace":"dev-dev"},"namespace":"dev-dev","name":"token-nodepool-dev-cluster-3146da34","reconcileID":"ec1f0a7f-1657-4245-99ef-c984977ff0f8","error":"error getting ignition payload: failed to download binaries: failed to extract image file: failed to extract image file: file not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}

{"level":"info","ts":"2024-08-20T09:02:20Z","logger":"get-payload","msg":"discovered machine-config-operator image","image":"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f3b55cc8f88b9e6564fe6ad0bc431cd7270c0586a06d9b4a19ff2b518c461ede"}
{"level":"info","ts":"2024-08-20T09:02:20Z","logger":"get-payload","msg":"created working directory","dir":"/payloads/get-payload4089452863"}

{"level":"info","ts":"2024-08-20T09:02:28Z","logger":"get-payload","msg":"extracted image-references","time":"8s"}

{"level":"info","ts":"2024-08-20T09:02:38Z","logger":"get-payload","msg":"extracted templates","time":"10s"}
{"level":"info","ts":"2024-08-20T09:02:38Z","logger":"image-cache","msg":"retrieved cached file","imageRef":"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f3b55cc8f88b9e6564fe6ad0bc431cd7270c0586a06d9b4a19ff2b518c461ede","file":"usr/lib/os-release"}
{"level":"info","ts":"2024-08-20T09:02:38Z","logger":"get-payload","msg":"read os-release","mcoRHELMajorVersion":"8","cpoRHELMajorVersion":"9"}
{"level":"info","ts":"2024-08-20T09:02:38Z","logger":"get-payload","msg":"copying file","src":"usr/bin/machine-config-operator.rhel9","dest":"/payloads/get-payload4089452863/bin/machine-config-operator"}
~~~

This is a clone of issue OCPBUGS-34638. The following is the description of the original issue:

Description of problem:

    For a cluster having one worker machine of the A3 instance type, "destroy cluster" keeps reporting the failure below until the instance is stopped via "gcloud".

WARNING failed to stop instance jiwei-0530b-q9t8w-worker-c-ck6s8 in zone us-central1-c: googleapi: Error 400: VM has a Local SSD attached but an undefined value for `discard-local-ssd`. If using gcloud, please add `--discard-local-ssd=false` or `--discard-local-ssd=true` to your command., badRequest

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-multi-2024-05-29-143245

How reproducible:

    Always

Steps to Reproduce:

    1. "create install-config" and then "create manifests"
    2. edit a worker machineset YAML, to specify "machineType: a3-highgpu-8g" along with "onHostMaintenance: Terminate"
    3. "create cluster", and make sure it succeeds
    4. "destroy cluster"     

Actual results:

    Uninstalling the cluster keeps reporting the stop-instance error.

Expected results:

    "destroy cluster" should proceed without any warning/error, and delete everything finally.

Additional info:

FYI the .openshift-install.log is available at https://drive.google.com/file/d/15xIwzi0swDk84wqg32tC_4KfUahCalrL/view?usp=drive_link

FYI, stopping the A3 instance via "gcloud" with "--discard-local-ssd=false" does succeed.

$ gcloud  compute instances list --format="table(creationTimestamp.date('%Y-%m-%d %H:%M:%S'):sort=1,zone,status,name,machineType,tags.items)" --filter="name~jiwei" 2>/dev/null
CREATION_TIMESTAMP   ZONE           STATUS      NAME                              MACHINE_TYPE   ITEMS
2024-05-29 20:55:52  us-central1-a  TERMINATED  jiwei-0530b-q9t8w-master-0        n2-standard-4  ['jiwei-0530b-q9t8w-master']
2024-05-29 20:55:52  us-central1-b  TERMINATED  jiwei-0530b-q9t8w-master-1        n2-standard-4  ['jiwei-0530b-q9t8w-master']
2024-05-29 20:55:52  us-central1-c  TERMINATED  jiwei-0530b-q9t8w-master-2        n2-standard-4  ['jiwei-0530b-q9t8w-master']
2024-05-29 21:10:08  us-central1-a  TERMINATED  jiwei-0530b-q9t8w-worker-a-rkxkk  n2-standard-4  ['jiwei-0530b-q9t8w-worker']
2024-05-29 21:10:19  us-central1-b  TERMINATED  jiwei-0530b-q9t8w-worker-b-qg6jv  n2-standard-4  ['jiwei-0530b-q9t8w-worker']
2024-05-29 21:10:31  us-central1-c  RUNNING     jiwei-0530b-q9t8w-worker-c-ck6s8  a3-highgpu-8g  ['jiwei-0530b-q9t8w-worker']
$ gcloud compute instances stop jiwei-0530b-q9t8w-worker-c-ck6s8 --zone us-central1-c
ERROR: (gcloud.compute.instances.stop) HTTPError 400: VM has a Local SSD attached but an undefined value for `discard-local-ssd`. If using gcloud, please add `--discard-local-ssd=false` or `--discard-local-ssd=true` to your command.
$ gcloud compute instances stop jiwei-0530b-q9t8w-worker-c-ck6s8 --zone us-central1-c --discard-local-ssd=false
Stopping instance(s) jiwei-0530b-q9t8w-worker-c-ck6s8...done.                                                                                    
Updated [https://compute.googleapis.com/compute/v1/projects/openshift-qe/zones/us-central1-c/instances/jiwei-0530b-q9t8w-worker-c-ck6s8].
$ gcloud  compute instances list --format="table(creationTimestamp.date('%Y-%m-%d %H:%M:%S'):sort=1,zone,status,name,machineType,tags.items)" --filter="name~jiwei" 2>/dev/null
CREATION_TIMESTAMP   ZONE           STATUS      NAME                              MACHINE_TYPE   ITEMS
2024-05-29 20:55:52  us-central1-a  TERMINATED  jiwei-0530b-q9t8w-master-0        n2-standard-4  ['jiwei-0530b-q9t8w-master']
2024-05-29 20:55:52  us-central1-b  TERMINATED  jiwei-0530b-q9t8w-master-1        n2-standard-4  ['jiwei-0530b-q9t8w-master']
2024-05-29 20:55:52  us-central1-c  TERMINATED  jiwei-0530b-q9t8w-master-2        n2-standard-4  ['jiwei-0530b-q9t8w-master']
2024-05-29 21:10:08  us-central1-a  TERMINATED  jiwei-0530b-q9t8w-worker-a-rkxkk  n2-standard-4  ['jiwei-0530b-q9t8w-worker']
2024-05-29 21:10:19  us-central1-b  TERMINATED  jiwei-0530b-q9t8w-worker-b-qg6jv  n2-standard-4  ['jiwei-0530b-q9t8w-worker']
2024-05-29 21:10:31  us-central1-c  TERMINATED  jiwei-0530b-q9t8w-worker-c-ck6s8  a3-highgpu-8g  ['jiwei-0530b-q9t8w-worker']
$ gcloud compute instances delete -q jiwei-0530b-q9t8w-worker-c-ck6s8 --zone us-central1-c
Deleted [https://www.googleapis.com/compute/v1/projects/openshift-qe/zones/us-central1-c/instances/jiwei-0530b-q9t8w-worker-c-ck6s8].
$ 

Please review the following PR: https://github.com/openshift/route-override-cni/pull/51

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/aws-ebs-csi-driver/pull/246

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/oc/pull/1627

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

We should add a Dockerfile that is optimized for running the tests locally. (The current Dockerfile assumes it is running with the CI setup.)

Description of problem:

When the user configures the install-config.yaml additionalTrustBundle field (for example, in a disconnected installation using a local registry),
the user-ca-bundle configmap gets populated with more content than strictly required
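
A quick diagnostic sketch, assuming the configmap lives in the usual openshift-config namespace, to count how many certificates ended up in it:

$ oc get configmap user-ca-bundle -n openshift-config -o jsonpath='{.data.ca-bundle\.crt}' | grep -c 'BEGIN CERTIFICATE'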

Version-Release number of selected component (if applicable):

    

How reproducible:

Always

Steps to Reproduce:

    1. Setup a local registry and mirror the content of an ocp release
    2. Configure the install-config.yaml for a mirrored installation. In particular, configure the additionalTrustBundle field with the registry cert
    3. Create the agent ISO, boot the nodes and wait for the installation to complete
    

Actual results:

    The user-ca-bundle cm does not contain only the registry cert

Expected results:

user-ca-bundle configmap with just the content of the install-config additionalTrustBundle field

Additional info:

     

Description of problem:

- Investigate why `name` and `namespace` properties are passed as arguments in the `k8sCreate` call for the Create YAML editor function

- Remove the `name` and `namespace` arguments from the `k8sCreate` call for the Create YAML editor function if it does not require a big change.

Problem:
If consoleFetchCommon takes an additional option (argument) and returns a response based on that option, as proposed in the "[Add support for returning response.header in consoleFetchCommon function|https://issues.redhat.com/browse/CONSOLE-3949]" story, the wrong and unused arguments in k8sCreate would cause consoleFetchCommon to return the entire response instead of the response body, which would break the Create Resource YAML functionality.

Code: https://github.com/openshift/console/blob/master/frontend/public/components/edit-yaml.jsx#L334
    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Please review the following PR: https://github.com/openshift/kubernetes-metrics-server/pull/24

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When images have been skipped and no images have been mirrored, I see that IDMS and ITMS files are still generated.
2024/05/15 15:38:25  [WARN]   : ⚠️  --v2 flag identified, flow redirected to the oc-mirror v2 version. This is Tech Preview, it is still under development and it is not production ready.
2024/05/15 15:38:25  [INFO]   : 👋 Hello, welcome to oc-mirror
2024/05/15 15:38:25  [INFO]   : ⚙️  setting up the environment for you...
2024/05/15 15:38:25  [INFO]   : 🔀 workflow mode: mirrorToMirror 
2024/05/15 15:38:25  [INFO]   : 🕵️  going to discover the necessary images...
2024/05/15 15:38:25  [INFO]   : 🔍 collecting release images...
2024/05/15 15:38:25  [INFO]   : 🔍 collecting operator images...
2024/05/15 15:38:25  [INFO]   : 🔍 collecting additional images...
2024/05/15 15:38:25  [WARN]   : [AdditionalImagesCollector] mirroring skipped : source image quay.io/cilium/cilium-etcd-operator:v2.0.7@sha256:04b8327f7f992693c2cb483b999041ed8f92efc8e14f2a5f3ab95574a65ea2dc has both tag and digest
2024/05/15 15:38:25  [WARN]   : [AdditionalImagesCollector] mirroring skipped : source image quay.io/coreos/etcd:v3.5.4@sha256:a67fb152d4c53223e96e818420c37f11d05c2d92cf62c05ca5604066c37295e9 has both tag and digest
2024/05/15 15:38:25  [INFO]   : 🚀 Start copying the images...
2024/05/15 15:38:25  [INFO]   : === Results ===
2024/05/15 15:38:25  [INFO]   : All release images mirrored successfully 0 / 0 ✅
2024/05/15 15:38:25  [INFO]   : All operator images mirrored successfully 0 / 0 ✅
2024/05/15 15:38:25  [INFO]   : All additional images mirrored successfully 0 / 0 ✅
2024/05/15 15:38:25  [INFO]   : 📄 Generating IDMS and ITMS files...
2024/05/15 15:38:25  [INFO]   : /app1/knarra/customertest1/working-dir/cluster-resources/idms-oc-mirror.yaml file created
2024/05/15 15:38:25  [INFO]   : 📄 Generating CatalogSource file...
2024/05/15 15:38:25  [INFO]   : mirror time     : 715.644µs
2024/05/15 15:38:25  [INFO]   : 👋 Goodbye, thank you for using oc-mirror
[fedora@preserve-fedora36 knarra]$ ls -l /app1/knarra/customertest1/working-dir/cluster-resources/idms-oc-mirror.yaml
-rw-r--r--. 1 fedora fedora 0 May 15 15:38 /app1/knarra/customertest1/working-dir/cluster-resources/idms-oc-mirror.yaml
[fedora@preserve-fedora36 knarra]$ cat /app1/knarra/customertest1/working-dir/cluster-resources/idms-oc-mirror.yaml

    

Version-Release number of selected component (if applicable):

     4.16 oc-mirror
    

How reproducible:

     Always
    

Steps to Reproduce:

    1. Use the following imageSetConfig.yaml and run command `./oc-mirror --v2 -c /tmp/bug331961.yaml --workspace file:///app1/knarra/customertest1 docker://localhost:5000/bug331961 --dest-tls-verify=false`


   
cat /tmp/imageSetConfig.yaml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v2alpha1
mirror:
   additionalImages:
   - name: quay.io/cilium/cilium-etcd-operator:v2.0.7@sha256:04b8327f7f992693c2cb483b999041ed8f92efc8e14f2a5f3ab95574a65ea2dc
   - name: quay.io/coreos/etcd:v3.5.4@sha256:a67fb152d4c53223e96e818420c37f11d05c2d92cf62c05ca5604066c37295e9

Actual results:

    Nothing is mirrored and the listed images are skipped because they have both a tag and a digest, but empty IDMS and ITMS files are still generated
    

Expected results:

     If nothing is mirrored, idms and itms files should not be generated.
    

Additional info:

    https://issues.redhat.com/browse/OCPBUGS-33196
    

This is a clone of issue OCPBUGS-35236. The following is the description of the original issue:

Description of problem:

Attempts to update a cluster to a release payload with a signature published by Red Hat fail because CVO cannot verify the signature, signalled by the ReleaseAccepted=False condition:

Retrieving payload failed version="4.16.0-rc.4" image="quay.io/openshift-release-dev/ocp-release@sha256:5e76f8c2cdc81fa40abb809ee5e2d56cb84f409aab773aa9b9c7e8ed8811bf74" failure=The update cannot be verified: unable to verify sha256:5e76f8c2cdc81fa40abb809ee5e2d56cb84f409aab773aa9b9c7e8ed8811bf74 against keyrings: verifier-public-key-redhat

CVO shows evidence of not being able to find the proper signature in its stores:

$ grep verifier-public-key-redhat cvo.log | head
I0610 07:38:16.208595       1 event.go:364] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"openshift-cluster-version", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'RetrievePayloadFailed' Retrieving payload failed version="4.16.0-rc.4" image="quay.io/openshift-release-dev/ocp-release@sha256:5e76f8c2cdc81fa40abb809ee5e2d56cb84f409aab773aa9b9c7e8ed8811bf74" failure=The update cannot be verified: unable to verify sha256:5e76f8c2cdc81fa40abb809ee5e2d56cb84f409aab773aa9b9c7e8ed8811bf74 against keyrings: verifier-public-key-redhat // [2024-06-10T07:38:16Z: prefix sha256-5e76f8c2cdc81fa40abb809ee5e2d56cb84f409aab773aa9b9c7e8ed8811bf74 in config map signatures-managed: no more signatures to check, 2024-06-10T07:38:16Z: ClusterVersion spec.signatureStores is an empty array. Unset signatureStores entirely if you want to enable the default signature stores]
...

Version-Release number of selected component (if applicable):

4.16.0-rc.3
4.16.0-rc.4
4.17.0-ec.0

How reproducible:

Seems always. All CI build farm clusters showed this behavior when trying to update from 4.16.0-rc.3

Steps to Reproduce:

1. Launch update to a version with a signature published by RH

Actual results:

ReleaseAccepted=False and update is stuck

Expected results:

ReleaseAccepted=True and update proceeds

Additional info:

Suspected culprit is https://github.com/openshift/cluster-version-operator/pull/1030/ so the fix may be a revert or an attempt to fix-forward, but revert seems safer at this point.

Evidence:

  • #1030 was added in 4.16.0-rc.3
  • #1030 code is supposed to work with an updated ClusterVersion CRD field .spec.signatureStores which right now is in TechPreview, so it is not enabled by default
  • CVO log hints that it is trying to process the field but fails [1], and that somehow the signature store may be considered an explicitly, intentionally empty array instead of the default set of signature locations
  • easily reproducible on any release that contains #1030
  • a testing cluster with a #1030 revert did not reproduce the bug and successfully started updating to rc.4

[1]

...ClusterVersion spec.signatureStores is an empty array. Unset signatureStores entirely if you want to enable the default signature store...
W0610 07:58:59.095970       1 warnings.go:70] unknown field "spec.signatureStores"
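
On an affected cluster, a hedged way to check whether spec.signatureStores is actually set (the bug suggests CVO treats an unset field as an empty array) and, only if it is set, to remove it:

$ oc get clusterversion version -o jsonpath='{.spec.signatureStores}'
$ oc patch clusterversion version --type=json -p '[{"op":"remove","path":"/spec/signatureStores"}]'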

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

The example namespaced page is not working.

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-04-22-023835    

How reproducible:

Always    

Steps to Reproduce:

1. Deploy console-demo-plugin manifests, enable the plugin
$ oc apply -f https://raw.githubusercontent.com/openshift/console/master/dynamic-demo-plugin/oc-manifest.yaml 
$ oc patch console.operator cluster --type='json' -p='[{"op": "add", "path": "/spec/plugins/-", "value":"console-demo-plugin"}]'

2. Change to Demo perspective, click on `Example Namespaced Page` menu 

Actual results:

2. An error page is returned:
Cannot destructure property 'ns' of '(intermediate value)(intermediate value)(intermediate value)' as it is undefined.
    

Expected results:

2. A page with a namespace dropdown menu should be rendered

Additional info:

    

This is a clone of issue OCPBUGS-39209. The following is the description of the original issue:

Description of problem:
Attempting to migrate from OpenShiftSDN to OVNKubernetes, but experiencing the error below once the Limited Live Migration is started.

+ exec /usr/bin/hybrid-overlay-node --node ip-10-241-1-192.us-east-2.compute.internal --config-file=/run/ovnkube-config/ovnkube.conf --bootstrap-kubeconfig=/var/lib/kubelet/kubeconfig --cert-dir=/etc/ovn/ovnkube-node-certs --cert-duration=24h
I0829 14:06:20.313928   82345 config.go:2192] Parsed config file /run/ovnkube-config/ovnkube.conf
I0829 14:06:20.314202   82345 config.go:2193] Parsed config: {Default:{MTU:8901 RoutableMTU:0 ConntrackZone:64000 HostMasqConntrackZone:0 OVNMasqConntrackZone:0 HostNodePortConntrackZone:0 ReassemblyConntrackZone:0 EncapType:geneve EncapIP: EncapPort:6081 InactivityProbe:100000 OpenFlowProbe:180 OfctrlWaitBeforeClear:0 MonitorAll:true OVSDBTxnTimeout:1m40s LFlowCacheEnable:true LFlowCacheLimit:0 LFlowCacheLimitKb:1048576 RawClusterSubnets:100.64.0.0/15/23 ClusterSubnets:[] EnableUDPAggregation:true Zone:global} Logging:{File: CNIFile: LibovsdbFile:/var/log/ovnkube/libovsdb.log Level:4 LogFileMaxSize:100 LogFileMaxBackups:5 LogFileMaxAge:0 ACLLoggingRateLimit:20} Monitoring:{RawNetFlowTargets: RawSFlowTargets: RawIPFIXTargets: NetFlowTargets:[] SFlowTargets:[] IPFIXTargets:[]} IPFIX:{Sampling:400 CacheActiveTimeout:60 CacheMaxFlows:0} CNI:{ConfDir:/etc/cni/net.d Plugin:ovn-k8s-cni-overlay} OVNKubernetesFeature:{EnableAdminNetworkPolicy:true EnableEgressIP:true EgressIPReachabiltyTotalTimeout:1 EnableEgressFirewall:true EnableEgressQoS:true EnableEgressService:true EgressIPNodeHealthCheckPort:9107 EnableMultiNetwork:true EnableMultiNetworkPolicy:false EnableStatelessNetPol:false EnableInterconnect:false EnableMultiExternalGateway:true EnablePersistentIPs:false EnableDNSNameResolver:false EnableServiceTemplateSupport:false} Kubernetes:{BootstrapKubeconfig: CertDir: CertDuration:10m0s Kubeconfig: CACert: CAData:[] APIServer:https://api-int.nonamenetwork.sandbox1730.opentlc.com:6443 Token: TokenFile: CompatServiceCIDR: RawServiceCIDRs:198.18.0.0/16 ServiceCIDRs:[] OVNConfigNamespace:openshift-ovn-kubernetes OVNEmptyLbEvents:false PodIP: RawNoHostSubnetNodes:migration.network.openshift.io/plugin= NoHostSubnetNodes:<nil> HostNetworkNamespace:openshift-host-network PlatformType:AWS HealthzBindAddress:0.0.0.0:10256 CompatMetricsBindAddress: CompatOVNMetricsBindAddress: CompatMetricsEnablePprof:false DNSServiceNamespace:openshift-dns DNSServiceName:dns-default} Metrics:{BindAddress: OVNMetricsBindAddress: ExportOVSMetrics:false EnablePprof:false NodeServerPrivKey: NodeServerCert: EnableConfigDuration:false EnableScaleMetrics:false} OvnNorth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} OvnSouth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} Gateway:{Mode:shared Interface: EgressGWInterface: NextHop: VLANID:0 NodeportEnable:true DisableSNATMultipleGWs:false V4JoinSubnet:100.64.0.0/16 V6JoinSubnet:fd98::/64 V4MasqueradeSubnet:169.254.169.0/29 V6MasqueradeSubnet:fd69::/125 MasqueradeIPs:{V4OVNMasqueradeIP:169.254.169.1 V6OVNMasqueradeIP:fd69::1 V4HostMasqueradeIP:169.254.169.2 V6HostMasqueradeIP:fd69::2 V4HostETPLocalMasqueradeIP:169.254.169.3 V6HostETPLocalMasqueradeIP:fd69::3 V4DummyNextHopMasqueradeIP:169.254.169.4 V6DummyNextHopMasqueradeIP:fd69::4 V4OVNServiceHairpinMasqueradeIP:169.254.169.5 V6OVNServiceHairpinMasqueradeIP:fd69::5} DisablePacketMTUCheck:false RouterSubnet: SingleNode:false DisableForwarding:false AllowNoUplink:false} MasterHA:{ElectionLeaseDuration:137 ElectionRenewDeadline:107 ElectionRetryPeriod:26} ClusterMgrHA:{ElectionLeaseDuration:137 ElectionRenewDeadline:107 ElectionRetryPeriod:26} HybridOverlay:{Enabled:true RawClusterSubnets: ClusterSubnets:[] VXLANPort:4789} OvnKubeNode:{Mode:full DPResourceDeviceIdsMap:map[] MgmtPortNetdev: MgmtPortDPResourceName:} ClusterManager:{V4TransitSwitchSubnet:100.88.0.0/16 V6TransitSwitchSubnet:fd97::/64}}
F0829 14:06:20.315468   82345 hybrid-overlay-node.go:54] illegal network configuration: built-in join subnet "100.64.0.0/16" overlaps cluster subnet "100.64.0.0/15"

The OpenShift Container Platform 4 cluster has been installed with the configuration below, so the clusterNetwork conflicts with the default OVN-Kubernetes join subnet.

$ oc get cm -n kube-system cluster-config-v1 -o yaml
apiVersion: v1
data:
  install-config: |
    additionalTrustBundlePolicy: Proxyonly
    apiVersion: v1
    baseDomain: sandbox1730.opentlc.com
    compute:
    - architecture: amd64
      hyperthreading: Enabled
      name: worker
      platform: {}
      replicas: 3
    controlPlane:
      architecture: amd64
      hyperthreading: Enabled
      name: master
      platform: {}
      replicas: 3
    metadata:
      creationTimestamp: null
      name: nonamenetwork
    networking:
      clusterNetwork:
      - cidr: 100.64.0.0/15
        hostPrefix: 23
      machineNetwork:
      - cidr: 10.241.0.0/16
      networkType: OpenShiftSDN
      serviceNetwork:
      - 198.18.0.0/16
    platform:
      aws:
        region: us-east-2
    publish: External
    pullSecret: ""

So, following the procedure, the steps below were executed, but the problem is still being reported.

oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalJoinSubnet": "100.68.0.0/16"}}}}}'

Checking whether the change was applied, one can see it is configured:

$ oc get network.operator cluster -o yaml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2024-08-29T10:05:36Z"
  generation: 376
  name: cluster
  resourceVersion: "135345"
  uid: 37f08c71-98fa-430c-b30f-58f82142788c
spec:
  clusterNetwork:
  - cidr: 100.64.0.0/15
    hostPrefix: 23
  defaultNetwork:
    openshiftSDNConfig:
      enableUnidling: true
      mode: NetworkPolicy
      mtu: 8951
      vxlanPort: 4789
    ovnKubernetesConfig:
      egressIPConfig: {}
      gatewayConfig:
        ipv4: {}
        ipv6: {}
        routingViaHost: false
      genevePort: 6081
      ipsecConfig:
        mode: Disabled
      ipv4:
        internalJoinSubnet: 100.68.0.0/16
      mtu: 8901
      policyAuditConfig:
        destination: "null"
        maxFileSize: 50
        maxLogFiles: 5
        rateLimit: 20
        syslogFacility: local0
    type: OpenShiftSDN
  deployKubeProxy: false
  disableMultiNetwork: false
  disableNetworkDiagnostics: false
  kubeProxyConfig:
    bindAddress: 0.0.0.0
  logLevel: Normal
  managementState: Managed
  migration:
    mode: Live
    networkType: OVNKubernetes
  observedConfig: null
  operatorLogLevel: Normal
  serviceNetwork:
  - 198.18.0.0/16
  unsupportedConfigOverrides: null
  useMultiNetworkPolicy: false

Following the above, the Limited Live Migration is triggered, which then suddenly stops because of the error shown.

oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'

Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.16.9

How reproducible:
Always

Steps to Reproduce:
1. Install OpenShift Container Platform 4 with OpenShiftSDN, the configuration shown above and then update to OpenShift Container Platform 4.16
2. Change internalJoinSubnet to prevent a conflict with the Join Subnet of OVNKubernetes (oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalJoinSubnet": "100.68.0.0/16"}}}}}')
3. Initiate the Limited Live Migration running oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
4. Check the logs of ovnkube-node using oc logs ovnkube-node-XXXXX -c ovnkube-controller

Actual results:

+ exec /usr/bin/hybrid-overlay-node --node ip-10-241-1-192.us-east-2.compute.internal --config-file=/run/ovnkube-config/ovnkube.conf --bootstrap-kubeconfig=/var/lib/kubelet/kubeconfig --cert-dir=/etc/ovn/ovnkube-node-certs --cert-duration=24h
I0829 14:06:20.313928   82345 config.go:2192] Parsed config file /run/ovnkube-config/ovnkube.conf
I0829 14:06:20.314202   82345 config.go:2193] Parsed config: {Default:{MTU:8901 RoutableMTU:0 ConntrackZone:64000 HostMasqConntrackZone:0 OVNMasqConntrackZone:0 HostNodePortConntrackZone:0 ReassemblyConntrackZone:0 EncapType:geneve EncapIP: EncapPort:6081 InactivityProbe:100000 OpenFlowProbe:180 OfctrlWaitBeforeClear:0 MonitorAll:true OVSDBTxnTimeout:1m40s LFlowCacheEnable:true LFlowCacheLimit:0 LFlowCacheLimitKb:1048576 RawClusterSubnets:100.64.0.0/15/23 ClusterSubnets:[] EnableUDPAggregation:true Zone:global} Logging:{File: CNIFile: LibovsdbFile:/var/log/ovnkube/libovsdb.log Level:4 LogFileMaxSize:100 LogFileMaxBackups:5 LogFileMaxAge:0 ACLLoggingRateLimit:20} Monitoring:{RawNetFlowTargets: RawSFlowTargets: RawIPFIXTargets: NetFlowTargets:[] SFlowTargets:[] IPFIXTargets:[]} IPFIX:{Sampling:400 CacheActiveTimeout:60 CacheMaxFlows:0} CNI:{ConfDir:/etc/cni/net.d Plugin:ovn-k8s-cni-overlay} OVNKubernetesFeature:{EnableAdminNetworkPolicy:true EnableEgressIP:true EgressIPReachabiltyTotalTimeout:1 EnableEgressFirewall:true EnableEgressQoS:true EnableEgressService:true EgressIPNodeHealthCheckPort:9107 EnableMultiNetwork:true EnableMultiNetworkPolicy:false EnableStatelessNetPol:false EnableInterconnect:false EnableMultiExternalGateway:true EnablePersistentIPs:false EnableDNSNameResolver:false EnableServiceTemplateSupport:false} Kubernetes:{BootstrapKubeconfig: CertDir: CertDuration:10m0s Kubeconfig: CACert: CAData:[] APIServer:https://api-int.nonamenetwork.sandbox1730.opentlc.com:6443 Token: TokenFile: CompatServiceCIDR: RawServiceCIDRs:198.18.0.0/16 ServiceCIDRs:[] OVNConfigNamespace:openshift-ovn-kubernetes OVNEmptyLbEvents:false PodIP: RawNoHostSubnetNodes:migration.network.openshift.io/plugin= NoHostSubnetNodes:<nil> HostNetworkNamespace:openshift-host-network PlatformType:AWS HealthzBindAddress:0.0.0.0:10256 CompatMetricsBindAddress: CompatOVNMetricsBindAddress: CompatMetricsEnablePprof:false DNSServiceNamespace:openshift-dns DNSServiceName:dns-default} Metrics:{BindAddress: OVNMetricsBindAddress: ExportOVSMetrics:false EnablePprof:false NodeServerPrivKey: NodeServerCert: EnableConfigDuration:false EnableScaleMetrics:false} OvnNorth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} OvnSouth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} Gateway:{Mode:shared Interface: EgressGWInterface: NextHop: VLANID:0 NodeportEnable:true DisableSNATMultipleGWs:false V4JoinSubnet:100.64.0.0/16 V6JoinSubnet:fd98::/64 V4MasqueradeSubnet:169.254.169.0/29 V6MasqueradeSubnet:fd69::/125 MasqueradeIPs:{V4OVNMasqueradeIP:169.254.169.1 V6OVNMasqueradeIP:fd69::1 V4HostMasqueradeIP:169.254.169.2 V6HostMasqueradeIP:fd69::2 V4HostETPLocalMasqueradeIP:169.254.169.3 V6HostETPLocalMasqueradeIP:fd69::3 V4DummyNextHopMasqueradeIP:169.254.169.4 V6DummyNextHopMasqueradeIP:fd69::4 V4OVNServiceHairpinMasqueradeIP:169.254.169.5 V6OVNServiceHairpinMasqueradeIP:fd69::5} DisablePacketMTUCheck:false RouterSubnet: SingleNode:false DisableForwarding:false AllowNoUplink:false} MasterHA:{ElectionLeaseDuration:137 ElectionRenewDeadline:107 ElectionRetryPeriod:26} ClusterMgrHA:{ElectionLeaseDuration:137 ElectionRenewDeadline:107 ElectionRetryPeriod:26} HybridOverlay:{Enabled:true RawClusterSubnets: ClusterSubnets:[] VXLANPort:4789} OvnKubeNode:{Mode:full DPResourceDeviceIdsMap:map[] MgmtPortNetdev: MgmtPortDPResourceName:} ClusterManager:{V4TransitSwitchSubnet:100.88.0.0/16 V6TransitSwitchSubnet:fd97::/64}}
F0829 14:06:20.315468   82345 hybrid-overlay-node.go:54] illegal network configuration: built-in join subnet "100.64.0.0/16" overlaps cluster subnet "100.64.0.0/15"

Expected results:
The OVNKubernetes Limited Live Migration should recognize the change applied for internalJoinSubnet and not report any CIDR/subnet overlap during the migration.

Additional info:
N/A

Affected Platforms:
OpenShift Container Platform 4.16 on AWS

This is a clone of issue OCPBUGS-42066. The following is the description of the original issue:

Description of problem:

We need to backport https://github.com/openshift/cluster-monitoring-operator/pull/2271 to 4.16 because the CMO e2e tests fail almost all the time in release-4.16.    

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    This is a non functional change.

Description of problem:

  "[sig-apps][Feature:DeploymentConfig] deploymentconfigs when tagging images should successfully tag the deployed image [apigroup:apps.openshift.io][apigroup:authorization.openshift.io][apigroup:image.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" has the following warning: warnings.go:70] apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2024-03-07-234116

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-30948. The following is the description of the original issue:

Description of problem:

When doing offline SDN migration, setting the parameter "spec.migration.features.egressIP" to "false" to disable automatic migration of egressIP configuration doesn't work.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Launch a cluster with OpenShiftSDN. Configure an egressip to a node.
    2. Start offline SDN migration.
    3. In step-3, execute 
       oc patch Network.operator.openshift.io cluster --type='merge' \
  --patch '{
    "spec": {
      "migration": {
        "networkType": "OVNKubernetes",
        "features": {
          "egressIP": false
        }
      }
    }
  }'

Actual results:

An egressip.k8s.ovn.org CR is created automatically.

Expected results:

No egressip CR shall be created for OVN-K

Additional info:

    

Description of problem:

    1. The TaskRuns list page loads constantly for all projects
    2. The archive icon is not displayed for some tasks in the TaskRun list page
    3. On changing the namespace to All Projects, PipelineRuns and TaskRuns do not load properly

Version-Release number of selected component (if applicable):

    4.15.z

How reproducible:

    Always

Steps to Reproduce:

    1. Create some TaskRuns
    2. Go to the TaskRun list page
    3. Select All Projects in the project dropdown
    

Actual results:

The screen keeps on loading

Expected results:

     Should load TaskRuns from all projects

Additional info:

    

Description of problem:

When installing an IPI cluster with a 4.15 nightly build on Azure MAG, Azure Stack Hub, or with Azure workload identity, the image-registry CO is degraded with different errors.

On MAG:
$ oc get co image-registry
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
image-registry   4.15.0-0.nightly-2024-02-16-235514   True        False         True       5h44m   AzurePathFixControllerDegraded: Migration failed: panic: Get "https://imageregistryjima41xvvww.blob.core.windows.net/jima415a-hfxfh-image-registry-vbibdmawmsvqckhvmmiwisebryohfbtm?comp=list&prefix=docker&restype=container": dial tcp: lookup imageregistryjima41xvvww.blob.core.windows.net on 172.30.0.10:53: no such host...

$ oc get pod -n openshift-image-registry
NAME                                               READY   STATUS    RESTARTS        AGE
azure-path-fix-ssn5w                               0/1     Error     0               5h47m
cluster-image-registry-operator-86cdf775c7-7brn6   1/1     Running   1 (5h50m ago)   5h58m
image-registry-5c6796b86d-46lvx                    1/1     Running   0               5h47m
image-registry-5c6796b86d-9st5d                    1/1     Running   0               5h47m
node-ca-48lsh                                      1/1     Running   0               5h44m
node-ca-5rrsl                                      1/1     Running   0               5h47m
node-ca-8sc92                                      1/1     Running   0               5h47m
node-ca-h6trz                                      1/1     Running   0               5h47m
node-ca-hm7s2                                      1/1     Running   0               5h47m
node-ca-z7tv8                                      1/1     Running   0               5h44m

$ oc logs azure-path-fix-ssn5w -n openshift-image-registry
panic: Get "https://imageregistryjima41xvvww.blob.core.windows.net/jima415a-hfxfh-image-registry-vbibdmawmsvqckhvmmiwisebryohfbtm?comp=list&prefix=docker&restype=container": dial tcp: lookup imageregistryjima41xvvww.blob.core.windows.net on 172.30.0.10:53: no such hostgoroutine 1 [running]:
main.main()
    /go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:49 +0x125

The blob storage endpoint doesn't seem correct; it should be:
$ az storage account show -n imageregistryjima41xvvww -g jima415a-hfxfh-rg --query primaryEndpoints
{
  "blob": "https://imageregistryjima41xvvww.blob.core.usgovcloudapi.net/",
  "dfs": "https://imageregistryjima41xvvww.dfs.core.usgovcloudapi.net/",
  "file": "https://imageregistryjima41xvvww.file.core.usgovcloudapi.net/",
  "internetEndpoints": null,
  "microsoftEndpoints": null,
  "queue": "https://imageregistryjima41xvvww.queue.core.usgovcloudapi.net/",
  "table": "https://imageregistryjima41xvvww.table.core.usgovcloudapi.net/",
  "web": "https://imageregistryjima41xvvww.z2.web.core.usgovcloudapi.net/"
}

On Azure Stack Hub:
$ oc get co image-registry
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
image-registry   4.15.0-0.nightly-2024-02-16-235514   True        False         True       3h32m   AzurePathFixControllerDegraded: Migration failed: panic: open : no such file or directory...

$ oc get pod -n openshift-image-registry
NAME                                               READY   STATUS    RESTARTS        AGE
azure-path-fix-8jdg7                               0/1     Error     0               3h35m
cluster-image-registry-operator-86cdf775c7-jwnd4   1/1     Running   1 (3h38m ago)   3h54m
image-registry-658669fbb4-llv8z                    1/1     Running   0               3h35m
image-registry-658669fbb4-lmfr6                    1/1     Running   0               3h35m
node-ca-2jkjx                                      1/1     Running   0               3h35m
node-ca-dcg2v                                      1/1     Running   0               3h35m
node-ca-q6xmn                                      1/1     Running   0               3h35m
node-ca-r46r2                                      1/1     Running   0               3h35m
node-ca-s8jkb                                      1/1     Running   0               3h35m
node-ca-ww6ql                                      1/1     Running   0               3h35m

$ oc logs azure-path-fix-8jdg7 -n openshift-image-registry
panic: open : no such file or directory

goroutine 1 [running]:
main.main()
    /go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:36 +0x145

On cluster with Azure workload identity:
Some operators' PROGRESSING condition is True:
image-registry                             4.15.0-0.nightly-2024-02-16-235514   True        True          False      43m     Progressing: The deployment has not completed...

The azure-path-fix pod is in CreateContainerConfigError status, and the following error is reported:

"state": {
    "waiting": {
        "message": "couldn't find key REGISTRY_STORAGE_AZURE_ACCOUNTKEY in Secret openshift-image-registry/image-registry-private-configuration",
        "reason": "CreateContainerConfigError"
    }
}                
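
A diagnostic sketch (assuming jq is available) to list which keys actually exist in the secret the pod expects REGISTRY_STORAGE_AZURE_ACCOUNTKEY from:

$ oc get secret image-registry-private-configuration -n openshift-image-registry -o json | jq '.data | keys'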

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2024-02-16-235514    

How reproducible:

    Always

Steps to Reproduce:

    1. Install an IPI cluster on MAG or Azure Stack Hub, or configure Azure workload identity
    2.
    3.
    

Actual results:

    Installation failed and image-registry operator is degraded

Expected results:

    Installation is successful.

Additional info:

    The issue seems to be related to https://github.com/openshift/image-registry/pull/393

This is a clone of issue OCPBUGS-32405. The following is the description of the original issue:

Description of problem:

    When creating a serverless function in create serverless form, BuildConfig is not created

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    Always

Steps to Reproduce:

    1. Install the Serverless operator
    2. Add https://github.com/openshift-dev-console/kn-func-node-cloudevents in the create serverless form
    3. Create the function and check the BuildConfig page
    

Actual results:

    BuildConfig is not created

Expected results:

    Should create BuildConfig

Additional info:

    

We want to use the latest version of CAPO in MAPO. We need to revendor CAPO to version 0.9 before the 4.16 FF.

There are several API changes that might require Matt's help.

Please review the following PR: https://github.com/openshift/cluster-config-operator/pull/390

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

  The image registry CO never finishes progressing on Azure Hosted Control Planes

Version-Release number of selected component (if applicable):

    

How reproducible:

    Every time

Steps to Reproduce:

    1. Create an Azure HCP
    2. Create a kubeconfig for the guest cluster
    3. Check the image-registry CO
    

Actual results:

    image-registry co's message is Progressing: The registry is ready...

Expected results:

    image-registry finishes progressing

Additional info:

    I let it go for about 34m

% oc get co | grep -i image
image-registry                             4.16.0-0.nightly-multi-2024-02-26-105325   True        True          False      34m     Progressing: The registry is ready...

% oc get co/image-registry -oyaml
...
  - lastTransitionTime: "2024-02-28T19:10:30Z"
    message: |-
      Progressing: The registry is ready
      NodeCADaemonProgressing: The daemon set node-ca is deployed
      AzurePathFixProgressing: The job does not exist
    reason: AzurePathFixNotFound::Ready
    status: "True"
    type: Progressing

 

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/machine-api-provider-powervs/pull/67

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

 [sig-builds][Feature:Builds][Slow] update failure status Build status OutOfMemoryKilled should contain OutOfMemoryKilled failure reason and message [apigroup:build.openshift.io] test is failing on 4.15 (e.g. https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_oc/1726/pull-ci-openshift-oc-release-4.15-e2e-aws-ovn-builds/1780913191149113344)

Steps to Reproduce:

    1. 
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

https://github.com/containerd/containerd/issues/8180 is likely the reason for the failure: in 4.15 the expected status is OOMKilled, but the test fails after getting an Error status with the correct exit code (137).

Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver-operator/pull/117

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Today, egress firewall rules with 'nodeSelector' only use the node IP in the OVN ACL rule. But a node may have secondary IPs other than the node IP. We should create the ACL with all possible IPs of the selected node (see the sketch below).
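
For reference, a minimal EgressFirewall using a nodeSelector destination could look like the sketch below (namespace and labels are placeholders):

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
  namespace: example-ns
spec:
  egress:
  - type: Allow
    to:
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0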

Version-Release number of selected component (if applicable):

 

How reproducible:

Create an egress firewall rule with 'nodeSelector'

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Manifests will be removed from the CCO image, so we have to start using the CCA (cluster-config-api) image for bootstrap.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

  KAS bootstrap container fails

Expected results:

    KAS bootstrap container succeeds

Additional info:

    

Please review the following PR: https://github.com/openshift/oauth-apiserver/pull/94

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

The kubelet is running with `unconfined_service_t`. It should run as `kubelet_exec_t`. This is causing all our plugins to fail because of SELinux denials.

sh-5.1# ps -AZ | grep kubelet
system_u:system_r:unconfined_service_t:s0 8719 ? 00:24:50 kubelet

This issue was previously observed and resolved in 4.14.10. 

Version-Release number of selected component (if applicable):

OCP 4.15

How reproducible:

Run `ps -AZ | grep kubelet` to see the kubelet running with the wrong label.
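A quick diagnostic sketch on an affected node (standard SELinux tooling; the binary path /usr/bin/kubelet is an assumption):

sh-5.1# ps -AZ | grep kubelet            # process context; should not be unconfined_service_t
sh-5.1# ls -Z /usr/bin/kubelet           # current file label on the kubelet binary
sh-5.1# matchpathcon /usr/bin/kubelet    # label the loaded policy expects for that path
sh-5.1# restorecon -v /usr/bin/kubelet   # reapply the expected label (kubelet restart/reboot required to take effect)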

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    Kubelet is running as unconfined_service_t

Expected results:

    Kubelet should run as kubelet_exec_t

Additional info:

    

This is a clone of issue OCPBUGS-33789. The following is the description of the original issue:

Description of problem:

Version-Release number of selected component (if applicable):

There is an intermittent issue with the UploadImage() implementation in github.com/nutanix-cloud-native/prism-go-client@v0.3.4, on which the OCP installer depends. When testing the OCP installer with ClusterAPIInstall=true, I frequently hit an error from UploadImage() when uploading the bootstrap image to Prism Central (PC) from the local image file.

The error logs:
INFO creating the bootstrap image demo-ocp-cluster-g1-lrmwb-bootstrap-ign.iso (uuid: 75694edf-f9c4-4d9a-9a44-731a4d103cc8), taskUUID: c8eafd49-54e2-4fb9-a3df-c456863d71fd.
INFO created the bootstrap image demo-ocp-cluster-g1-lrmwb-bootstrap-ign.iso (uuid: 75694edf-f9c4-4d9a-9a44-731a4d103cc8).
INFO preparing to upload the bootstrap image demo-ocp-cluster-g1-lrmwb-bootstrap-ign.iso (uuid: 75694edf-f9c4-4d9a-9a44-731a4d103cc8) data from file /Users/yanhuali/Library/Caches/openshift-installer/image_cache/demo-ocp-cluster-g1-lrmwb-bootstrap-ign.iso
ERROR failed to upload the bootstrap image data "demo-ocp-cluster-g1-lrmwb-bootstrap-ign.iso" from filepath /Users/yanhuali/Library/Caches/openshift-installer/image_cache/demo-ocp-cluster-g1-lrmwb-bootstrap-ign.iso: status: 400 Bad Request, error-response: {
ERROR   "api_version": "3.1",
ERROR   "code": 400,
ERROR   "message_list": [
ERROR   {
ERROR     "message": "Given input is invalid. Image 75694edf-f9c4-4d9a-9a44-731a4d103cc8 is already complete",
ERROR     "reason": "INVALID_ARGUMENT"
ERROR   }
ERROR   ],
ERROR   "state": "ERROR"
ERROR }
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed preparing ignition data: failed to upload the bootstrap image data "demo-ocp-cluster-g1-lrmwb-bootstrap-ign.iso" from filepath /Users/yanhuali/Library/Caches/openshift-installer/image_cache/demo-ocp-cluster-g1-lrmwb-bootstrap-ign.iso: status: 400 Bad Request, error-response: {
ERROR   "api_version": "3.1",
ERROR   "code": 400,
ERROR   "message_list": [
ERROR   {
ERROR     "message": "Given input is invalid. Image 75694edf-f9c4-4d9a-9a44-731a4d103cc8 is already complete",
ERROR     "reason": "INVALID_ARGUMENT"
ERROR   }
ERROR   ],
ERROR   "state": "ERROR"
ERROR }

The OCP installer code calling the prism-go-client function UploadImage() is here: https://github.com/openshift/installer/blob/master/pkg/infrastructure/nutanix/clusterapi/clusterapi.go#L172-L207

How reproducible:

Use OCP IPI 4.16 to provision a Nutanix OCP cluster with the install-config ClusterAPIInstall=true. This is an intermittent issue, so you need to repeat the test several times to reproduce.
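A sketch of the setup used for reproduction (the featureSet/featureGates stanza is an assumption about how ClusterAPIInstall is enabled in the install-config; adjust to the actual configuration used):

$ grep -A2 featureSet install-config.yaml
featureSet: CustomNoUpgrade
featureGates:
- ClusterAPIInstall=true
$ openshift-install create cluster --dir <assets-dir> --log-level=debug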

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

The installer intermittently fails to upload the bootstrap image data to PC from the local image file.

Expected results:

The installer successfully creates the Nutanix OCP cluster with the install-config ClusterAPIInstall=true.

Additional info:

    

Description of problem:

Invalid IDMS files are generated when the imageSetConfig file does not filter operators by channel, when mirroring from disk to mirror (disk2mirror).

 

Version-Release number of selected component (if applicable):

oc-mirror version 
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202403251146.p0.g03ce0ca.assembly.stream.el9-03ce0ca", GitCommit:"03ce0ca797e73b6762fd3e24100ce043199519e9", GitTreeState:"clean", BuildDate:"2024-03-25T16:34:33Z", GoVersion:"go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1)  Use following imagesetconfig :
cat config.yaml 
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.15
      packages:
        - name: devworkspace-operator
          minVersion: "0.23.0"
        - name: quay-operator
          maxVersion: "3.10.2"
        - name: cluster-logging
          minVersion: 5.8.3
          maxVersion: 5.8.5

2)  Do mirror2Disk and disk2Mirror : 
`oc-mirror --config config.yaml file://outnochannel --v2`
`oc-mirror --config config.yaml --from file://outnochannel --v2 docker://ec2-3-17-164-23.us-east-2.compute.amazonaws.com:5000/default`
   
3) Create the catalogsource, idms, itms resources

Actual results: 

4) Failed to create the ImageDigestMirrorSet (idms) resource:
oc create -f outnochannel/working-dir/cluster-resources/idms-oc-mirror.yaml
The ImageDigestMirrorSet "idms-operator-0" is invalid: spec.imageDigestMirrors[2].source: Invalid value: "registry.redhat.io/": spec.imageDigestMirrors[2].source in body should match '^\*(?:\.(?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]))+$|^((?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9])(?:(?:\.(?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]))+)?(?::[0-9]+)?)(?:(?:/[a-z0-9]+(?:(?:(?:[._]|__|[-]*)[a-z0-9]+)+)?)+)?$'


cat outnochannel/working-dir/cluster-resources/idms-oc-mirror.yaml
---
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  creationTimestamp: null
  name: idms-operator-0
spec:
  imageDigestMirrors:
  - mirrors:
    - ec2-3-17-164-23.us-east-2.compute.amazonaws.com:5000/default/devworkspace
    source: registry.redhat.io/devworkspace
  - mirrors:
    - ec2-3-17-164-23.us-east-2.compute.amazonaws.com:5000/default/openshift4
    source: registry.redhat.io/openshift4
  - mirrors:
    - ec2-3-17-164-23.us-east-2.compute.amazonaws.com:5000/default
    source: registry.redhat.io/
  - mirrors:
    - ec2-3-17-164-23.us-east-2.compute.amazonaws.com:5000/default/quay
    source: registry.redhat.io/quay
  - mirrors:
    - ec2-3-17-164-23.us-east-2.compute.amazonaws.com:5000/default/rhel8
    source: registry.redhat.io/rhel8
  - mirrors:
    - ec2-3-17-164-23.us-east-2.compute.amazonaws.com:5000/default/openshift-logging
    source: registry.redhat.io/openshift-logging
status: {} 

Expected results:

4) The cluster resources should be created successfully.
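The generated resources can be validated before applying them, e.g. with a server-side dry run (a sketch using the file path from the steps above):

$ oc create -f outnochannel/working-dir/cluster-resources/idms-oc-mirror.yaml --dry-run=server
# The same spec.imageDigestMirrors[2].source validation error for "registry.redhat.io/" is reported
# without actually creating the ImageDigestMirrorSet.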

Please review the following PR: https://github.com/openshift/oauth-proxy/pull/270

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-37052. The following is the description of the original issue:

Description of problem:

This is a followup of https://issues.redhat.com/browse/OCPBUGS-34996, in which comments led us to better understand the issue customers are facing.

LDAP IDP traffic from the oauth pod seems to be going through the configured HTTP(S) proxy, while it should not, since LDAP is a different protocol. As a result, customers add the LDAP endpoint to their noProxy config to circumvent the issue.
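The workaround customers currently apply is to exclude the LDAP host from the cluster-wide proxy. A sketch (ldap.example.com is a placeholder; note that noProxy is a comma-separated list and any existing entries must be preserved):

$ oc get proxy/cluster -o jsonpath='{.spec.noProxy}{"\n"}'
$ oc patch proxy/cluster --type=merge -p '{"spec":{"noProxy":"<existing-noProxy>,ldap.example.com"}}'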

Version-Release number of selected component (if applicable):

4.15.11     

How reproducible:

    

Steps to Reproduce:

 (From the customer)   
    1. Configure LDAP IDP
    2. Configure Proxy
    3. LDAP IDP communication from the control plane oauth pod goes through proxy instead of going to the ldap endpoint directly
    

Actual results:

    LDAP IDP communication from the control plane oauth pod goes through proxy 

Expected results:

    LDAP IDP communication from the control plane oauth pod should go to the ldap endpoint directly using the ldap protocol, it should not go through the proxy settings

Additional info:

For more information, see linked tickets.    

This is a clone of issue OCPBUGS-33925. The following is the description of the original issue:

Description of problem:

When installing a 4.16 cluster whose API public DNS record already exists, the installer reports Terraform Variables initialization errors, which is not expected since Terraform support should be removed from the installer.


05-19 17:36:32.935  level=fatal msg=failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": baseDomain: Invalid value: "qe.devcluster.openshift.com": the zone already has record sets for the domain of the cluster: [api.gpei-0519a.qe.devcluster.openshift.com. (A)]
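The pre-existing record that trips the Platform Provisioning Check can be confirmed directly in the zone, for example with the AWS CLI (a sketch; the hosted zone ID is a placeholder):

$ aws route53 list-resource-record-sets --hosted-zone-id <zone-id-of-qe.devcluster.openshift.com> \
    --query "ResourceRecordSets[?Name=='api.gpei-0519a.qe.devcluster.openshift.com.']"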

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-05-18-212906, which has the CAPI-based install as the default

How reproducible:

    

Steps to Reproduce:

    1. Create a 4.16 cluster with the cluster name: gpei-0519a
    2. After the cluster installation finishes, try to create a second cluster with the same cluster name
    

Actual results:

    05-19 17:36:26.390  level=debug msg=OpenShift Installer 4.16.0-0.nightly-2024-05-18-212906
05-19 17:36:26.390  level=debug msg=Built from commit 3eed76e1400cac88af6638bb097ada1607137f3f
05-19 17:36:26.390  level=debug msg=Fetching Metadata...
05-19 17:36:26.390  level=debug msg=Loading Metadata...
05-19 17:36:26.390  level=debug msg=  Loading Cluster ID...
05-19 17:36:26.390  level=debug msg=    Loading Install Config...
05-19 17:36:26.390  level=debug msg=      Loading SSH Key...
05-19 17:36:26.390  level=debug msg=      Loading Base Domain...
05-19 17:36:26.390  level=debug msg=        Loading Platform...
05-19 17:36:26.390  level=debug msg=      Loading Cluster Name...
05-19 17:36:26.390  level=debug msg=        Loading Base Domain...
05-19 17:36:26.390  level=debug msg=        Loading Platform...
05-19 17:36:26.390  level=debug msg=      Loading Pull Secret...
05-19 17:36:26.390  level=debug msg=      Loading Platform...
05-19 17:36:26.390  level=debug msg=    Using Install Config loaded from state file
05-19 17:36:26.391  level=debug msg=  Using Cluster ID loaded from state file
05-19 17:36:26.391  level=debug msg=  Loading Install Config...
05-19 17:36:26.391  level=debug msg=  Loading Bootstrap Ignition Config...
05-19 17:36:26.391  level=debug msg=    Loading Ironic bootstrap credentials...
05-19 17:36:26.391  level=debug msg=    Using Ironic bootstrap credentials loaded from state file
05-19 17:36:26.391  level=debug msg=    Loading CVO Ignore...
05-19 17:36:26.391  level=debug msg=      Loading Common Manifests...
05-19 17:36:26.391  level=debug msg=        Loading Cluster ID...
05-19 17:36:26.391  level=debug msg=        Loading Install Config...
05-19 17:36:26.391  level=debug msg=        Loading Ingress Config...
05-19 17:36:26.391  level=debug msg=          Loading Install Config...
05-19 17:36:26.391  level=debug msg=        Using Ingress Config loaded from state file
05-19 17:36:26.391  level=debug msg=        Loading DNS Config...
05-19 17:36:26.391  level=debug msg=          Loading Install Config...
05-19 17:36:26.392  level=debug msg=          Loading Cluster ID...
05-19 17:36:26.392  level=debug msg=          Loading Platform Credentials Check...
05-19 17:36:26.392  level=debug msg=            Loading Install Config...
05-19 17:36:26.392  level=debug msg=          Using Platform Credentials Check loaded from state file
05-19 17:36:26.392  level=debug msg=        Using DNS Config loaded from state file
05-19 17:36:26.392  level=debug msg=        Loading Infrastructure Config...
05-19 17:36:26.392  level=debug msg=          Loading Cluster ID...
05-19 17:36:26.392  level=debug msg=          Loading Install Config...
05-19 17:36:26.392  level=debug msg=          Loading Cloud Provider Config...
05-19 17:36:26.392  level=debug msg=            Loading Install Config...
05-19 17:36:26.392  level=debug msg=            Loading Cluster ID...
05-19 17:36:26.392  level=debug msg=            Loading Platform Credentials Check...
05-19 17:36:26.392  level=debug msg=          Using Cloud Provider Config loaded from state file
05-19 17:36:26.393  level=debug msg=          Loading Additional Trust Bundle Config...
05-19 17:36:26.393  level=debug msg=            Loading Install Config...
05-19 17:36:26.393  level=debug msg=          Using Additional Trust Bundle Config loaded from state file
05-19 17:36:26.393  level=debug msg=        Using Infrastructure Config loaded from state file
05-19 17:36:26.393  level=debug msg=        Loading Network Config...
05-19 17:36:26.393  level=debug msg=          Loading Install Config...
05-19 17:36:26.393  level=debug msg=        Using Network Config loaded from state file
05-19 17:36:26.393  level=debug msg=        Loading Proxy Config...
05-19 17:36:26.393  level=debug msg=          Loading Install Config...
05-19 17:36:26.393  level=debug msg=          Loading Network Config...
05-19 17:36:26.393  level=debug msg=        Using Proxy Config loaded from state file
05-19 17:36:26.393  level=debug msg=        Loading Scheduler Config...
05-19 17:36:26.394  level=debug msg=          Loading Install Config...
05-19 17:36:26.394  level=debug msg=        Using Scheduler Config loaded from state file
05-19 17:36:26.394  level=debug msg=        Loading Image Content Source Policy...
05-19 17:36:26.394  level=debug msg=          Loading Install Config...
05-19 17:36:26.394  level=debug msg=        Using Image Content Source Policy loaded from state file
05-19 17:36:26.394  level=debug msg=        Loading Cluster CSI Driver Config...
05-19 17:36:26.394  level=debug msg=          Loading Install Config...
05-19 17:36:26.394  level=debug msg=          Loading Cluster ID...
05-19 17:36:26.394  level=debug msg=        Using Cluster CSI Driver Config loaded from state file
05-19 17:36:26.394  level=debug msg=        Loading Image Digest Mirror Set...
05-19 17:36:26.394  level=debug msg=          Loading Install Config...
05-19 17:36:26.394  level=debug msg=        Using Image Digest Mirror Set loaded from state file
05-19 17:36:26.394  level=debug msg=        Loading Machine Config Server Root CA...
05-19 17:36:26.395  level=debug msg=        Using Machine Config Server Root CA loaded from state file
05-19 17:36:26.395  level=debug msg=        Loading Certificate (mcs)...
05-19 17:36:26.395  level=debug msg=          Loading Machine Config Server Root CA...
05-19 17:36:26.395  level=debug msg=          Loading Install Config...
05-19 17:36:26.395  level=debug msg=        Using Certificate (mcs) loaded from state file
05-19 17:36:26.395  level=debug msg=        Loading CVOOverrides...
05-19 17:36:26.395  level=debug msg=        Using CVOOverrides loaded from state file
05-19 17:36:26.395  level=debug msg=        Loading KubeCloudConfig...
05-19 17:36:26.395  level=debug msg=        Using KubeCloudConfig loaded from state file
05-19 17:36:26.395  level=debug msg=        Loading KubeSystemConfigmapRootCA...
05-19 17:36:26.395  level=debug msg=        Using KubeSystemConfigmapRootCA loaded from state file
05-19 17:36:26.395  level=debug msg=        Loading MachineConfigServerTLSSecret...
05-19 17:36:26.396  level=debug msg=        Using MachineConfigServerTLSSecret loaded from state file
05-19 17:36:26.396  level=debug msg=        Loading OpenshiftConfigSecretPullSecret...
05-19 17:36:26.396  level=debug msg=        Using OpenshiftConfigSecretPullSecret loaded from state file
05-19 17:36:26.396  level=debug msg=      Using Common Manifests loaded from state file
05-19 17:36:26.396  level=debug msg=      Loading Openshift Manifests...
05-19 17:36:26.396  level=debug msg=        Loading Install Config...
05-19 17:36:26.396  level=debug msg=        Loading Cluster ID...
05-19 17:36:26.396  level=debug msg=        Loading Kubeadmin Password...
05-19 17:36:26.396  level=debug msg=        Using Kubeadmin Password loaded from state file
05-19 17:36:26.396  level=debug msg=        Loading OpenShift Install (Manifests)...
05-19 17:36:26.396  level=debug msg=        Using OpenShift Install (Manifests) loaded from state file
05-19 17:36:26.397  level=debug msg=        Loading Feature Gate Config...
05-19 17:36:26.397  level=debug msg=          Loading Install Config...
05-19 17:36:26.397  level=debug msg=        Using Feature Gate Config loaded from state file
05-19 17:36:26.397  level=debug msg=        Loading CloudCredsSecret...
05-19 17:36:26.397  level=debug msg=        Using CloudCredsSecret loaded from state file
05-19 17:36:26.397  level=debug msg=        Loading KubeadminPasswordSecret...
05-19 17:36:26.397  level=debug msg=        Using KubeadminPasswordSecret loaded from state file
05-19 17:36:26.397  level=debug msg=        Loading RoleCloudCredsSecretReader...
05-19 17:36:26.397  level=debug msg=        Using RoleCloudCredsSecretReader loaded from state file
05-19 17:36:26.397  level=debug msg=        Loading Baremetal Config CR...
05-19 17:36:26.397  level=debug msg=        Using Baremetal Config CR loaded from state file
05-19 17:36:26.397  level=debug msg=        Loading Image...
05-19 17:36:26.397  level=debug msg=          Loading Install Config...
05-19 17:36:26.398  level=debug msg=        Using Image loaded from state file
05-19 17:36:26.398  level=debug msg=        Loading AzureCloudProviderSecret...
05-19 17:36:26.398  level=debug msg=        Using AzureCloudProviderSecret loaded from state file
05-19 17:36:26.398  level=debug msg=      Using Openshift Manifests loaded from state file
05-19 17:36:26.398  level=debug msg=    Using CVO Ignore loaded from state file
05-19 17:36:26.398  level=debug msg=    Loading Install Config...
05-19 17:36:26.398  level=debug msg=    Loading Kubeconfig Admin Internal Client...
05-19 17:36:26.398  level=debug msg=      Loading Certificate (admin-kubeconfig-client)...
05-19 17:36:26.398  level=debug msg=        Loading Certificate (admin-kubeconfig-signer)...
05-19 17:36:26.398  level=debug msg=        Using Certificate (admin-kubeconfig-signer) loaded from state file
05-19 17:36:26.398  level=debug msg=      Using Certificate (admin-kubeconfig-client) loaded from state file
05-19 17:36:26.399  level=debug msg=      Loading Certificate (kube-apiserver-complete-server-ca-bundle)...
05-19 17:36:26.399  level=debug msg=        Loading Certificate (kube-apiserver-localhost-ca-bundle)...
05-19 17:36:26.399  level=debug msg=          Loading Certificate (kube-apiserver-localhost-signer)...
05-19 17:36:26.399  level=debug msg=          Using Certificate (kube-apiserver-localhost-signer) loaded from state file
05-19 17:36:26.399  level=debug msg=        Using Certificate (kube-apiserver-localhost-ca-bundle) loaded from state file
05-19 17:36:26.399  level=debug msg=        Loading Certificate (kube-apiserver-service-network-ca-bundle)...
05-19 17:36:26.399  level=debug msg=          Loading Certificate (kube-apiserver-service-network-signer)...
05-19 17:36:26.399  level=debug msg=          Using Certificate (kube-apiserver-service-network-signer) loaded from state file
05-19 17:36:26.399  level=debug msg=        Using Certificate (kube-apiserver-service-network-ca-bundle) loaded from state file
05-19 17:36:26.400  level=debug msg=        Loading Certificate (kube-apiserver-lb-ca-bundle)...
05-19 17:36:26.400  level=debug msg=          Loading Certificate (kube-apiserver-lb-signer)...
05-19 17:36:26.400  level=debug msg=          Using Certificate (kube-apiserver-lb-signer) loaded from state file
05-19 17:36:26.400  level=debug msg=        Using Certificate (kube-apiserver-lb-ca-bundle) loaded from state file
05-19 17:36:26.400  level=debug msg=      Using Certificate (kube-apiserver-complete-server-ca-bundle) loaded from state file
05-19 17:36:26.400  level=debug msg=      Loading Install Config...
05-19 17:36:26.400  level=debug msg=    Using Kubeconfig Admin Internal Client loaded from state file
05-19 17:36:26.400  level=debug msg=    Loading Kubeconfig Kubelet...
05-19 17:36:26.400  level=debug msg=      Loading Certificate (kube-apiserver-complete-server-ca-bundle)...
05-19 17:36:26.400  level=debug msg=      Loading Certificate (kubelet-client)...
05-19 17:36:26.401  level=debug msg=        Loading Certificate (kubelet-bootstrap-kubeconfig-signer)...
05-19 17:36:26.401  level=debug msg=        Using Certificate (kubelet-bootstrap-kubeconfig-signer) loaded from state file
05-19 17:36:26.401  level=debug msg=      Using Certificate (kubelet-client) loaded from state file
05-19 17:36:26.401  level=debug msg=      Loading Install Config...
05-19 17:36:26.401  level=debug msg=    Using Kubeconfig Kubelet loaded from state file
05-19 17:36:26.401  level=debug msg=    Loading Kubeconfig Admin Client (Loopback)...
05-19 17:36:26.401  level=debug msg=      Loading Certificate (admin-kubeconfig-client)...
05-19 17:36:26.401  level=debug msg=      Loading Certificate (kube-apiserver-localhost-ca-bundle)...
05-19 17:36:26.401  level=debug msg=      Loading Install Config...
05-19 17:36:26.401  level=debug msg=    Using Kubeconfig Admin Client (Loopback) loaded from state file
05-19 17:36:26.401  level=debug msg=    Loading Master Ignition Customization Check...
05-19 17:36:26.402  level=debug msg=      Loading Install Config...
05-19 17:36:26.402  level=debug msg=      Loading Machine Config Server Root CA...
05-19 17:36:26.402  level=debug msg=      Loading Master Ignition Config...
05-19 17:36:26.402  level=debug msg=        Loading Install Config...
05-19 17:36:26.402  level=debug msg=        Loading Machine Config Server Root CA...
05-19 17:36:26.402  level=debug msg=      Loading Master Ignition Config from both state file and target directory
05-19 17:36:26.402  level=debug msg=      On-disk Master Ignition Config matches asset in state file
05-19 17:36:26.402  level=debug msg=      Using Master Ignition Config loaded from state file
05-19 17:36:26.402  level=debug msg=    Using Master Ignition Customization Check loaded from state file
05-19 17:36:26.402  level=debug msg=    Loading Worker Ignition Customization Check...
05-19 17:36:26.402  level=debug msg=      Loading Install Config...
05-19 17:36:26.402  level=debug msg=      Loading Machine Config Server Root CA...
05-19 17:36:26.403  level=debug msg=      Loading Worker Ignition Config...
05-19 17:36:26.403  level=debug msg=        Loading Install Config...
05-19 17:36:26.403  level=debug msg=        Loading Machine Config Server Root CA...
05-19 17:36:26.403  level=debug msg=      Loading Worker Ignition Config from both state file and target directory
05-19 17:36:26.403  level=debug msg=      On-disk Worker Ignition Config matches asset in state file
05-19 17:36:26.403  level=debug msg=      Using Worker Ignition Config loaded from state file
05-19 17:36:26.403  level=debug msg=    Using Worker Ignition Customization Check loaded from state file
05-19 17:36:26.403  level=debug msg=    Loading Master Machines...
05-19 17:36:26.403  level=debug msg=      Loading Cluster ID...
05-19 17:36:26.403  level=debug msg=      Loading Platform Credentials Check...
05-19 17:36:26.403  level=debug msg=      Loading Install Config...
05-19 17:36:26.403  level=debug msg=      Loading Image...
05-19 17:36:26.404  level=debug msg=      Loading Master Ignition Config...
05-19 17:36:26.404  level=debug msg=    Using Master Machines loaded from state file
05-19 17:36:26.404  level=debug msg=    Loading Worker Machines...
05-19 17:36:26.404  level=debug msg=      Loading Cluster ID...
05-19 17:36:26.404  level=debug msg=      Loading Platform Credentials Check...
05-19 17:36:26.404  level=debug msg=      Loading Install Config...
05-19 17:36:26.404  level=debug msg=      Loading Image...
05-19 17:36:26.404  level=debug msg=      Loading Release...
05-19 17:36:26.404  level=debug msg=        Loading Install Config...
05-19 17:36:26.404  level=debug msg=      Using Release loaded from state file
05-19 17:36:26.404  level=debug msg=      Loading Worker Ignition Config...
05-19 17:36:26.404  level=debug msg=    Using Worker Machines loaded from state file
05-19 17:36:26.404  level=debug msg=    Loading Common Manifests...
05-19 17:36:26.404  level=debug msg=    Loading Openshift Manifests...
05-19 17:36:26.404  level=debug msg=    Loading Proxy Config...
05-19 17:36:26.405  level=debug msg=    Loading Certificate (admin-kubeconfig-ca-bundle)...
05-19 17:36:26.405  level=debug msg=      Loading Certificate (admin-kubeconfig-signer)...
05-19 17:36:26.405  level=debug msg=    Using Certificate (admin-kubeconfig-ca-bundle) loaded from state file
05-19 17:36:26.405  level=debug msg=    Loading Certificate (aggregator)...
05-19 17:36:26.405  level=debug msg=    Using Certificate (aggregator) loaded from state file
05-19 17:36:26.405  level=debug msg=    Loading Certificate (aggregator-ca-bundle)...
05-19 17:36:26.405  level=debug msg=      Loading Certificate (aggregator-signer)...
05-19 17:36:26.405  level=debug msg=      Using Certificate (aggregator-signer) loaded from state file
05-19 17:36:26.405  level=debug msg=    Using Certificate (aggregator-ca-bundle) loaded from state file
05-19 17:36:26.405  level=debug msg=    Loading Certificate (system:kube-apiserver-proxy)...
05-19 17:36:26.405  level=debug msg=      Loading Certificate (aggregator-signer)...
05-19 17:36:26.406  level=debug msg=    Using Certificate (system:kube-apiserver-proxy) loaded from state file
05-19 17:36:26.406  level=debug msg=    Loading Certificate (aggregator-signer)...
05-19 17:36:26.406  level=debug msg=    Loading Certificate (system:kube-apiserver-proxy)...
05-19 17:36:26.406  level=debug msg=      Loading Certificate (aggregator)...
05-19 17:36:26.406  level=debug msg=    Using Certificate (system:kube-apiserver-proxy) loaded from state file
05-19 17:36:26.406  level=debug msg=    Loading Bootstrap SSH Key Pair...
05-19 17:36:26.406  level=debug msg=    Using Bootstrap SSH Key Pair loaded from state file
05-19 17:36:26.406  level=debug msg=    Loading User-provided Service Account Signing key...
05-19 17:36:26.406  level=debug msg=    Using User-provided Service Account Signing key loaded from state file
05-19 17:36:26.406  level=debug msg=    Loading Cloud Provider CA Bundle...
05-19 17:36:26.406  level=debug msg=      Loading Install Config...
05-19 17:36:26.407  level=debug msg=    Using Cloud Provider CA Bundle loaded from state file
05-19 17:36:26.407  level=debug msg=    Loading Certificate (journal-gatewayd)...
05-19 17:36:26.407  level=debug msg=      Loading Machine Config Server Root CA...
05-19 17:36:26.407  level=debug msg=    Using Certificate (journal-gatewayd) loaded from state file
05-19 17:36:26.407  level=debug msg=    Loading Certificate (kube-apiserver-lb-ca-bundle)...
05-19 17:36:26.407  level=debug msg=    Loading Certificate (kube-apiserver-external-lb-server)...
05-19 17:36:26.407  level=debug msg=      Loading Certificate (kube-apiserver-lb-signer)...
05-19 17:36:26.407  level=debug msg=      Loading Install Config...
05-19 17:36:26.407  level=debug msg=    Using Certificate (kube-apiserver-external-lb-server) loaded from state file
05-19 17:36:26.407  level=debug msg=    Loading Certificate (kube-apiserver-internal-lb-server)...
05-19 17:36:26.407  level=debug msg=      Loading Certificate (kube-apiserver-lb-signer)...
05-19 17:36:26.408  level=debug msg=      Loading Install Config...
05-19 17:36:26.408  level=debug msg=    Using Certificate (kube-apiserver-internal-lb-server) loaded from state file
05-19 17:36:26.408  level=debug msg=    Loading Certificate (kube-apiserver-lb-signer)...
05-19 17:36:26.408  level=debug msg=    Loading Certificate (kube-apiserver-localhost-ca-bundle)...
05-19 17:36:26.408  level=debug msg=    Loading Certificate (kube-apiserver-localhost-server)...
05-19 17:36:26.408  level=debug msg=      Loading Certificate (kube-apiserver-localhost-signer)...
05-19 17:36:26.408  level=debug msg=    Using Certificate (kube-apiserver-localhost-server) loaded from state file
05-19 17:36:26.408  level=debug msg=    Loading Certificate (kube-apiserver-localhost-signer)...
05-19 17:36:26.408  level=debug msg=    Loading Certificate (kube-apiserver-service-network-ca-bundle)...
05-19 17:36:26.408  level=debug msg=    Loading Certificate (kube-apiserver-service-network-server)...
05-19 17:36:26.409  level=debug msg=      Loading Certificate (kube-apiserver-service-network-signer)...
05-19 17:36:26.409  level=debug msg=      Loading Install Config...
05-19 17:36:26.409  level=debug msg=    Using Certificate (kube-apiserver-service-network-server) loaded from state file
05-19 17:36:26.409  level=debug msg=    Loading Certificate (kube-apiserver-service-network-signer)...
05-19 17:36:26.409  level=debug msg=    Loading Certificate (kube-apiserver-complete-server-ca-bundle)...
05-19 17:36:26.409  level=debug msg=    Loading Certificate (kube-apiserver-complete-client-ca-bundle)...
05-19 17:36:26.409  level=debug msg=      Loading Certificate (admin-kubeconfig-ca-bundle)...
05-19 17:36:26.409  level=debug msg=      Loading Certificate (kubelet-client-ca-bundle)...
05-19 17:36:26.409  level=debug msg=        Loading Certificate (kubelet-signer)...
05-19 17:36:26.409  level=debug msg=        Using Certificate (kubelet-signer) loaded from state file
05-19 17:36:26.410  level=debug msg=      Using Certificate (kubelet-client-ca-bundle) loaded from state file
05-19 17:36:26.410  level=debug msg=      Loading Certificate (kube-control-plane-ca-bundle)...
05-19 17:36:26.410  level=debug msg=        Loading Certificate (kube-control-plane-signer)...
05-19 17:36:26.410  level=debug msg=        Using Certificate (kube-control-plane-signer) loaded from state file
05-19 17:36:26.410  level=debug msg=        Loading Certificate (kube-apiserver-lb-signer)...
05-19 17:36:26.410  level=debug msg=        Loading Certificate (kube-apiserver-localhost-signer)...
05-19 17:36:26.410  level=debug msg=        Loading Certificate (kube-apiserver-service-network-signer)...
05-19 17:36:26.410  level=debug msg=      Using Certificate (kube-control-plane-ca-bundle) loaded from state file
05-19 17:36:26.410  level=debug msg=      Loading Certificate (kube-apiserver-to-kubelet-ca-bundle)...
05-19 17:36:26.411  level=debug msg=        Loading Certificate (kube-apiserver-to-kubelet-signer)...
05-19 17:36:26.411  level=debug msg=        Using Certificate (kube-apiserver-to-kubelet-signer) loaded from state file
05-19 17:36:26.411  level=debug msg=      Using Certificate (kube-apiserver-to-kubelet-ca-bundle) loaded from state file
05-19 17:36:26.411  level=debug msg=      Loading Certificate (kubelet-bootstrap-kubeconfig-ca-bundle)...
05-19 17:36:26.411  level=debug msg=        Loading Certificate (kubelet-bootstrap-kubeconfig-signer)...
05-19 17:36:26.411  level=debug msg=      Using Certificate (kubelet-bootstrap-kubeconfig-ca-bundle) loaded from state file
05-19 17:36:26.411  level=debug msg=    Using Certificate (kube-apiserver-complete-client-ca-bundle) loaded from state file
05-19 17:36:26.411  level=debug msg=    Loading Certificate (kube-apiserver-to-kubelet-ca-bundle)...
05-19 17:36:26.411  level=debug msg=    Loading Certificate (kube-apiserver-to-kubelet-client)...
05-19 17:36:26.412  level=debug msg=      Loading Certificate (kube-apiserver-to-kubelet-signer)...
05-19 17:36:26.412  level=debug msg=    Using Certificate (kube-apiserver-to-kubelet-client) loaded from state file
05-19 17:36:26.412  level=debug msg=    Loading Certificate (kube-apiserver-to-kubelet-signer)...
05-19 17:36:26.412  level=debug msg=    Loading Certificate (kube-control-plane-ca-bundle)...
05-19 17:36:26.412  level=debug msg=    Loading Certificate (kube-control-plane-kube-controller-manager-client)...
05-19 17:36:26.412  level=debug msg=      Loading Certificate (kube-control-plane-signer)...
05-19 17:36:26.412  level=debug msg=    Using Certificate (kube-control-plane-kube-controller-manager-client) loaded from state file
05-19 17:36:26.412  level=debug msg=    Loading Certificate (kube-control-plane-kube-scheduler-client)...
05-19 17:36:26.412  level=debug msg=      Loading Certificate (kube-control-plane-signer)...
05-19 17:36:26.412  level=debug msg=    Using Certificate (kube-control-plane-kube-scheduler-client) loaded from state file
05-19 17:36:26.413  level=debug msg=    Loading Certificate (kube-control-plane-signer)...
05-19 17:36:26.413  level=debug msg=    Loading Certificate (kubelet-bootstrap-kubeconfig-ca-bundle)...
05-19 17:36:26.413  level=debug msg=    Loading Certificate (kubelet-client-ca-bundle)...
05-19 17:36:26.413  level=debug msg=    Loading Certificate (kubelet-client)...
05-19 17:36:26.413  level=debug msg=    Loading Certificate (kubelet-signer)...
05-19 17:36:26.413  level=debug msg=    Loading Certificate (kubelet-serving-ca-bundle)...
05-19 17:36:26.413  level=debug msg=      Loading Certificate (kubelet-signer)...
05-19 17:36:26.413  level=debug msg=    Using Certificate (kubelet-serving-ca-bundle) loaded from state file
05-19 17:36:26.413  level=debug msg=    Loading Certificate (mcs)...
05-19 17:36:26.413  level=debug msg=    Loading Machine Config Server Root CA...
05-19 17:36:26.413  level=debug msg=    Loading Key Pair (service-account.pub)...
05-19 17:36:26.414  level=debug msg=    Using Key Pair (service-account.pub) loaded from state file
05-19 17:36:26.414  level=debug msg=    Loading Release Image Pull Spec...
05-19 17:36:26.414  level=debug msg=    Using Release Image Pull Spec loaded from state file
05-19 17:36:26.414  level=debug msg=    Loading Image...
05-19 17:36:26.414  level=debug msg=  Loading Bootstrap Ignition Config from both state file and target directory
05-19 17:36:26.414  level=debug msg=  On-disk Bootstrap Ignition Config matches asset in state file
05-19 17:36:26.414  level=debug msg=  Using Bootstrap Ignition Config loaded from state file
05-19 17:36:26.414  level=debug msg=Using Metadata loaded from state file
05-19 17:36:26.414  level=debug msg=Reusing previously-fetched Metadata
05-19 17:36:26.415  level=info msg=Consuming Worker Ignition Config from target directory
05-19 17:36:26.415  level=debug msg=Purging asset "Worker Ignition Config" from disk
05-19 17:36:26.415  level=info msg=Consuming Master Ignition Config from target directory
05-19 17:36:26.415  level=debug msg=Purging asset "Master Ignition Config" from disk
05-19 17:36:26.415  level=info msg=Consuming Bootstrap Ignition Config from target directory
05-19 17:36:26.415  level=debug msg=Purging asset "Bootstrap Ignition Config" from disk
05-19 17:36:26.415  level=debug msg=Fetching Master Ignition Customization Check...
05-19 17:36:26.415  level=debug msg=Reusing previously-fetched Master Ignition Customization Check
05-19 17:36:26.415  level=debug msg=Fetching Worker Ignition Customization Check...
05-19 17:36:26.415  level=debug msg=Reusing previously-fetched Worker Ignition Customization Check
05-19 17:36:26.415  level=debug msg=Fetching Terraform Variables...
05-19 17:36:26.415  level=debug msg=Loading Terraform Variables...
05-19 17:36:26.416  level=debug msg=  Loading Cluster ID...
05-19 17:36:26.416  level=debug msg=  Loading Install Config...
05-19 17:36:26.416  level=debug msg=  Loading Image...
05-19 17:36:26.416  level=debug msg=  Loading Release...
05-19 17:36:26.416  level=debug msg=  Loading BootstrapImage...
05-19 17:36:26.416  level=debug msg=    Loading Install Config...
05-19 17:36:26.416  level=debug msg=    Loading Image...
05-19 17:36:26.416  level=debug msg=  Loading Bootstrap Ignition Config...
05-19 17:36:26.416  level=debug msg=  Loading Master Ignition Config...
05-19 17:36:26.416  level=debug msg=  Loading Master Machines...
05-19 17:36:26.416  level=debug msg=  Loading Worker Machines...
05-19 17:36:26.416  level=debug msg=  Loading Ironic bootstrap credentials...
05-19 17:36:26.416  level=debug msg=  Loading Platform Provisioning Check...
05-19 17:36:26.416  level=debug msg=    Loading Install Config...
05-19 17:36:26.416  level=debug msg=  Loading Common Manifests...
05-19 17:36:26.417  level=debug msg=  Fetching Cluster ID...
05-19 17:36:26.417  level=debug msg=  Reusing previously-fetched Cluster ID
05-19 17:36:26.417  level=debug msg=  Fetching Install Config...
05-19 17:36:26.417  level=debug msg=  Reusing previously-fetched Install Config
05-19 17:36:26.417  level=debug msg=  Fetching Image...
05-19 17:36:26.417  level=debug msg=  Reusing previously-fetched Image
05-19 17:36:26.417  level=debug msg=  Fetching Release...
05-19 17:36:26.417  level=debug msg=  Reusing previously-fetched Release
05-19 17:36:26.417  level=debug msg=  Fetching BootstrapImage...
05-19 17:36:26.417  level=debug msg=    Fetching Install Config...
05-19 17:36:26.417  level=debug msg=    Reusing previously-fetched Install Config
05-19 17:36:26.417  level=debug msg=    Fetching Image...
05-19 17:36:26.417  level=debug msg=    Reusing previously-fetched Image
05-19 17:36:26.417  level=debug msg=  Generating BootstrapImage...
05-19 17:36:26.417  level=debug msg=  Fetching Bootstrap Ignition Config...
05-19 17:36:26.418  level=debug msg=  Reusing previously-fetched Bootstrap Ignition Config
05-19 17:36:26.418  level=debug msg=  Fetching Master Ignition Config...
05-19 17:36:26.418  level=debug msg=  Reusing previously-fetched Master Ignition Config
05-19 17:36:26.418  level=debug msg=  Fetching Master Machines...
05-19 17:36:26.418  level=debug msg=  Reusing previously-fetched Master Machines
05-19 17:36:26.418  level=debug msg=  Fetching Worker Machines...
05-19 17:36:26.418  level=debug msg=  Reusing previously-fetched Worker Machines
05-19 17:36:26.418  level=debug msg=  Fetching Ironic bootstrap credentials...
05-19 17:36:26.418  level=debug msg=  Reusing previously-fetched Ironic bootstrap credentials
05-19 17:36:26.418  level=debug msg=  Fetching Platform Provisioning Check...
05-19 17:36:26.418  level=debug msg=    Fetching Install Config...
05-19 17:36:26.418  level=debug msg=    Reusing previously-fetched Install Config
05-19 17:36:26.418  level=debug msg=  Generating Platform Provisioning Check...
05-19 17:36:26.419  level=info msg=Credentials loaded from the "flexy-installer" profile in file "/home/installer1/workspace/ocp-common/Flexy-install@2/flexy/workdir/awscreds20240519-580673-bzyw8l"
05-19 17:36:32.935  level=fatal msg=failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": baseDomain: Invalid value: "qe.devcluster.openshift.com": the zone already has record sets for the domain of the cluster: [api.gpei-0519a.qe.devcluster.openshift.com. (A)]

Expected results:

Remove all Terraform (TF) checks on the AWS/vSphere/Nutanix platforms.

Additional info:

    

Description of problem: The "[sig-arch] events should not repeat pathologically for ns/openshift-dns" test is permafailing in the periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.15-e2e-openstack-ovn-serial job.

{ 1 events happened too frequently

event happened 114 times, something is wrong: namespace/openshift-dns hmsg/d0c68b9435 service/dns-default - reason/TopologyAwareHintsDisabled Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4 From: 17:11:03Z To: 17:11:04Z result=reject }
 

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.15-e2e-openstack-ovn-serial

Example job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.15-e2e-openstack-ovn-serial/1732612509958934528
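The hint disablement is keyed off node allocatable CPU and the zone label, so a first check on the affected cluster is to inspect those (a sketch; the node name is a placeholder):

$ oc get nodes -L topology.kubernetes.io/zone
$ oc get node <node-name> -o jsonpath='{.status.allocatable.cpu}{"\n"}'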

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/csi-operator/pull/114

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Core CAPI CRDs are not deployed on unsupported platforms even when explicitly needed by other operators.

An example of this is on vSphere clusters. CAPI is not yet supported on vSphere clusters, but the CAPI IPAM CRDs are needed by operators other than the usual consumers, the cluster-capi-operator and the CAPI controllers.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Launch a techpreview cluster for an unsupported platform (e.g. vsphere/azure). Check that the Core CAPI CRDs are not present.

Steps to Reproduce:

    $ oc get crds | grep cluster.x-k8s.io

Actual results:

    Core CAPI CRDs are not present (only the metal ones)

Expected results:

    Core CAPI CRDs should be present

Additional info:

    

Description of problem:

Three conformance tests are failing in 4.16 when running the conformance/parallel test suite in the OpenShift on OpenStack downstream (D/S) CI:

  • [sig-arch][Late][Jira:"kube-apiserver"] all registered tls artifacts must have no metadata violation regressions [Suite:openshift/conformance/parallel]
  • [sig-arch][Late][Jira:"kube-apiserver"] collect certificate data [Suite:openshift/conformance/parallel]
  • [sig-arch][Late][Jira:"kube-apiserver"] all tls artifacts must be registered [Suite:openshift/conformance/parallel]
time="2024-04-05T16:44:56Z" level=info msg="Decoding provider" clusterState="<nil>" discover=false dryRun=false func=DecodeProvider providerType="{\"type\":\"skeleton\",\"ProjectID\":\"\",\"Region\":\"\",\"Zone\":\"nova\",\"NumNodes\":3,\"MultiMaster\":true,\"MultiZone\":false,\"Zones\":[\"nova\"],\"ConfigFile\":\"\",\"Disconnected\":false,\"SingleReplicaTopology\":false,\"NetworkPlugin\":\"OVNKubernetes\",\"HasIPv4\":true,\"HasIPv6\":true,\"IPFamily\":\"ipv4\",\"HasSCTP\":false,\"IsProxied\":false,\"IsIBMROKS\":false,\"HasNoOptionalCapabilities\":false}"
time="2024-04-05T16:44:56Z" level=warning msg="config was nil" func=DecodeProvider
  Running Suite: OpenShift e2e suite - /home/stack
  ================================================
  Random Seed: 1712335495 - will randomize all specs

  Will run 1 of 1 specs
  ------------------------------
  [sig-arch][Late][Jira:"kube-apiserver"] all registered tls artifacts must have no metadata violation regressions [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/operators/certs.go:202
  Apr  5 16:44:57.900: INFO: microshift-version configmap not found
    [FAILED] in [BeforeAll] - github.com/openshift/origin/test/extended/operators/certs.go:120 @ 04/05/24 16:45:01.242
  • [FAILED] [3.378 seconds]
  [sig-arch][Late][Jira:"kube-apiserver"] [BeforeAll] all registered tls artifacts must have no metadata violation regressions [Suite:openshift/conformance/parallel]
    [BeforeAll] github.com/openshift/origin/test/extended/operators/certs.go:94
    [It] github.com/openshift/origin/test/extended/operators/certs.go:202

    [FAILED] Unexpected error:
        <*errors.errorString | 0xc00062c060>: 
        unable to determine openshift-tests image: exit status 1: error: unable to read image registry.ci.openshift.org/ocp/release@sha256:8142e7b7720bd37879ec5919cb6bce0d436f119516694bcf0788372faf45a0e0: unauthorized: authentication required
        
        {
            s: "unable to determine openshift-tests image: exit status 1: error: unable to read image registry.ci.openshift.org/ocp/release@sha256:8142e7b7720bd37879ec5919cb6bce0d436f119516694bcf0788372faf45a0e0: unauthorized: authentication required\n",
        }
    occurred
    In [BeforeAll] at: github.com/openshift/origin/test/extended/operators/certs.go:120 @ 04/05/24 16:45:01.242
  ------------------------------

  Summarizing 1 Failure:
    [FAIL] [sig-arch][Late][Jira:"kube-apiserver"] [BeforeAll] all registered tls artifacts must have no metadata violation regressions [Suite:openshift/conformance/parallel]
    github.com/openshift/origin/test/extended/operators/certs.go:120

  Ran 1 of 1 Specs in 3.378 seconds
  FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped
fail [github.com/openshift/origin/test/extended/operators/certs.go:120]: Unexpected error:
    <*errors.errorString | 0xc00062c060>: 
    unable to determine openshift-tests image: exit status 1: error: unable to read image registry.ci.openshift.org/ocp/release@sha256:8142e7b7720bd37879ec5919cb6bce0d436f119516694bcf0788372faf45a0e0: unauthorized: authentication required
    
    {
        s: "unable to determine openshift-tests image: exit status 1: error: unable to read image registry.ci.openshift.org/ocp/release@sha256:8142e7b7720bd37879ec5919cb6bce0d436f119516694bcf0788372faf45a0e0: unauthorized: authentication required\n",
    }
occurred
Ginkgo exit error 1: exit with code 1

Version-Release number of selected component (if applicable):

release-4.16 and master branches in origin.
    

How reproducible:

Always
    

Description of problem:

    As the live migration process may take hours on a large cluster, the workload in the cluster may trigger cluster extension by adding new nodes. We need to support adding new nodes while an SDN live migration is in progress. We need to backport this to 4.15.
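The cluster-extension path in question is ordinary MachineSet scaling; a sketch of what must keep working while the migration runs (names are placeholders):

$ oc scale machineset <machineset-name> -n openshift-machine-api --replicas=<n>
$ oc get nodes -w   # new nodes should join and receive the correct network configuration mid-migration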

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

Deployment cannot be scaled up/down when an HPA is associated with it.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1. Create a test deployment
$ oc new-app httpd
2. Create a HPA for the deployment
$ oc autoscale deployment/httpd --min 1 --max 10 --cpu-percent 10 
3. Scale down the deployment via script or manually to 0 replicas.
$ oc scale deployment/httpd --replicas=0
4. The HPA shows the status below, indicating that it cannot scale until the deployment is scaled up.
~~~
     - type: ScalingActive
      status: 'False'
      lastTransitionTime: '2023-10-24T10:00:01Z'
      reason: ScalingDisabled
      message: scaling is disabled since the replica count of the target is zero
~~~  
5. Since scale up/down is disabled, users will not be able to scale up the deployment from the GUI. The only option is to do it from the CLI.
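The CLI workaround referred to in step 5 looks roughly like this:

$ oc scale deployment/httpd --replicas=1
$ oc get hpa httpd   # once the replica count is non-zero, the HPA leaves the ScalingDisabled state and resumes scaling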

Actual results:

The scale up/down arrows are disabled and users are unable to scale the deployment back up from the console.

Expected results:

The scale up/down arrows should be enabled, or another option should be provided that helps scale up the deployment.

Additional info:

 

Description of problem:

    The non-functional filter component should not exist in the resources section on the Search page in phone view.

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2023-12-15-211129

How reproducible:

    Always

Steps to Reproduce:

    1. Change to phone view for the browser (Browser -> F12 - Toggle device toolbar)
       eg: iPhone 14 Pro Max
    2. Navigate to Home -> Search page, select one resource
       eg: APIRequestCounts
    3. Check the component in the resources panel

Actual results:

   There is a non-functional filter icon under the 'Create APIRequestCount' button.

Expected results:

    Remove the filter component in phone view,
 OR make sure the filter works in phone view if customers need it.

Additional info:

    https://drive.google.com/file/d/1Fwb8EGznWkA1z3cVVzGcJJjMFkjkuUhK/view?usp=drive_link

This is a clone of issue OCPBUGS-34538. The following is the description of the original issue:

Description of problem:

Since we aim to remove PF4 and ReactRouter5 in 4.18, we need to deprecate these shared modules in 4.16 to give plugin creators time to update their plugins.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

The 4.13 CPO fails to reconcile

{"level":"error","ts":"2024-04-03T18:45:28Z","msg":"Reconciler error","controller":"hostedcontrolplane","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedControlPlane","hostedControlPlane":{"name":"sjenning-guest","namespace":"clusters-sjenning-guest"},"namespace":"clusters-sjenning-guest","name":"sjenning-guest","reconcileID":"35a91dd1-0066-4c81-a6a4-14770ffff61d","error":"failed to update control plane: failed to reconcile router: failed to reconcile router role: roles.rbac.authorization.k8s.io \"router\" is forbidden: user \"system:serviceaccount:clusters-sjenning-guest:control-plane-operator\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:clusters-sjenning-guest\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"security.openshift.io\"], Resources:[\"securitycontextconstraints\"], ResourceNames:[\"hostnetwork\"], Verbs:[\"use\"]}","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}

Caused by https://github.com/openshift/hypershift/pull/3789
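For reference, the permission the CPO is blocked on is the one named in the error: use of the hostnetwork SCC. A possible manual unblock (an assumption, not a verified workaround; the real fix is the linked change) is to grant it to the CPO service account from the management cluster as cluster-admin:

$ oc adm policy add-scc-to-user hostnetwork -z control-plane-operator -n clusters-sjenning-guest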

When installation fails, status_info reports an incorrect status.

Most likely it is one of these two scenarios:

  • we are not notifying the status change right before the failure
  • we are changing the status after sending the event. This was OK before, as the scraper would likely run after the event, but now the order is important.

Description of problem:

Update OWNERS file in route-controller-manager repository.

    

Version-Release number of selected component (if applicable):

4.15
    

How reproducible:

n/a
    

Steps to Reproduce:

n/a
    

Actual results:

n/a
    

Expected results:

n/a
    

Additional info:


    

Description of the problem:
Before we create a cluster, the UI requests the list of versions with latest_only=false and with latest_only=true.
It looks like the CPU architectures are not the same in both responses.

Example:

Latest:
"4.11.59": {
        "cpu_architectures": [
            "x86_64"
        ],
        "display_name": "4.11.59",
        "support_level": "end-of-life"
    },
    "4.12.53": {
        "cpu_architectures": [
            "x86_64"
        ],
        "display_name": "4.12.53",
        "support_level": "maintenance"
    },

from all:

"4.11.59": {
        "cpu_architectures": [
            "x86_64",
            "arm64"
        ],
        "display_name": "4.11.59",
        "support_level": "end-of-life"
    },
 "4.12.53": {
        "cpu_architectures": [
            "x86_64",
            "arm64"
        ],
        "display_name": "4.12.53",
        "support_level": "maintenance"
    },
 

 

How reproducible:

Always

Steps to reproduce:

Before creating a cluster, open the browser debug tools and inspect the OpenShift versions returned, once with latest_only=true and once with latest_only=false.

Expecting to get the same CPU architectures in both responses.

Actual results:

 

Expected results:

This is a clone of issue OCPBUGS-35530. The following is the description of the original issue:

Description of problem:

Bootstrap destroy failed in CI with:

level=fatal msg=error destroying bootstrap resources failed during the destroy bootstrap hook: failed to remove bootstrap SSH rule: failed to update AWSCluster during bootstrap destroy: Operation cannot be fulfilled on awsclusters.infrastructure.cluster.x-k8s.io "ci-op-nk1s6685-77004-4gb4d": the object has been modified; please apply your changes to the latest version and try again

Version-Release number of selected component (if applicable):

 

How reproducible:

Unclear. CI search returns no results. Observed it as a single failure (aws-ovn job, linked below) in the testing of https://amd64.ocp.releases.ci.openshift.org/releasestream/4.17.0-0.nightly/release/4.17.0-0.nightly-2024-06-15-004118 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn/1801780204167761920

 

Two possible solutions:

  1. Add retries for this kind of failure, or for updates in general
  2. Switch to an SDK-based destroy rather than a CAPI-based one

Description of problem:

Starting OpenShift 4.8 (https://docs.openshift.com/container-platform/4.8/release_notes/ocp-4-8-release-notes.html#ocp-4-8-notable-technical-changes), all pods are getting bound SA tokens.

Currently, instead of expiring the token, we use `service-account-extend-token-expiration`, which extends a bound token's validity to 1 year and warns when a token that would otherwise have expired is used.

We want to disable this behavior in a future OpenShift release, which would break the OpenShift web console.

Version-Release number of selected component (if applicable):

4.8 - 4.14

How reproducible:

100%

Steps to Reproduce:

1. install a fresh cluster
2. wait ~1hr since console pods were deployed for the token rotation to occur
3. log in to the console and click around
4. check the kube-apiserver audit logs events for the "authentication.k8s.io/stale-token" annotation
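A sketch of step 4, assuming cluster-admin access to the kube-apiserver audit logs:

$ oc adm node-logs --role=master --path=kube-apiserver/audit.log \
    | grep 'authentication.k8s.io/stale-token'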

Actual results:

many occurrences (I doubt I'll be able to upload a text file, so I'll show a few audit events in the first comment).

Expected results:

The web-console re-reads the SA token regularly so that it never uses an expired token

Additional info:

In a theoretical case where a console pod lasts for a year, it's going to break and won't be able to authenticate to the kube-apiserver.

We are planning on disallowing the use of stale tokens in a future release and we need to make sure that the core platform is not broken so that the metrics we collect from the clusters in the wild are not polluted.

I took a look at Component Readiness today and noticed that "[sig-cluster-lifecycle] cluster upgrade should complete in a reasonable time" is permafailing. I modified the sample start time to see that it appears to have started around February 19th.

Is this expected with 4.16 or do we have a problem?

https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=Cluster%20Version%20Operator&confidence=95&environment=ovn%20upgrade-minor%20amd64%20metal-ipi%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=metal-ipi&platform=metal-ipi&sampleEndTime=2024-03-04%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-01-16%2000%3A00%3A00&testId=Cluster%20upgrade%3A0bf7638bc532109d8a7a3c395e2867da&testName=%5Bsig-cluster-lifecycle%5D%20cluster%20upgrade%20should%20complete%20in%20a%20reasonable%20time&upgrade=upgrade-minor&upgrade=upgrade-minor&variant=standard&variant=standard

 

Component Readiness has found a potential regression in [sig-cluster-lifecycle] cluster upgrade should complete in a reasonable time.

Probability of significant regression: 100.00%

Sample (being evaluated) Release: 4.16
Start Time: 2024-02-27T00:00:00Z
End Time: 2024-03-04T23:59:59Z
Success Rate: 0.00%
Successes: 0
Failures: 4
Flakes: 0

Base (historical) Release: 4.15
Start Time: 2024-02-01T00:00:00Z
End Time: 2024-02-28T23:59:59Z
Success Rate: 100.00%
Successes: 47
Failures: 0
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=Cluster%20Version%20Operator&confidence=95&environment=ovn%20upgrade-minor%20amd64%20metal-ipi%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=metal-ipi&platform=metal-ipi&sampleEndTime=2024-03-04%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-02-27%2000%3A00%3A00&testId=Cluster%20upgrade%3A0bf7638bc532109d8a7a3c395e2867da&testName=%5Bsig-cluster-lifecycle%5D%20cluster%20upgrade%20should%20complete%20in%20a%20reasonable%20time&upgrade=upgrade-minor&upgrade=upgrade-minor&variant=standard&variant=standard

Please review the following PR: https://github.com/openshift/operator-framework-operator-controller/pull/86

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    ResourceYAMLEditor doesn't have a readOnly prop that would hide the Save button in the YAML editor and prevent the user from editing the resource.

https://github.com/openshift/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#resourceyamleditor

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

tlsSecurityProfile definitions do not align with documentation.

When using `oc explain` the field descriptions note that certain values are unsupported, but the same values are supported in the OpenShift Documentation. 

This needs to be clarified and the spacing should be fixed in the descriptions as they are hard to understand.

Version-Release number of selected component (if applicable):

    4.14.1

How reproducible:

⇒ oc explain ingresscontroller.spec.tlsSecurityProfile.modern   

Steps to Reproduce:

    1. Check the `oc explain` output

Actual results:

    ⇒ oc explain ingresscontroller.spec.tlsSecurityProfile.modern
    KIND:     IngressController
    VERSION:  operator.openshift.io/v1

    DESCRIPTION:
         modern is a TLS security profile based on:
         https://wiki.mozilla.org/Security/Server_Side_TLS#Modern_compatibility
         and looks like this (yaml):
         ciphers: - TLS_AES_128_GCM_SHA256 - TLS_AES_256_GCM_SHA384 - TLS_CHACHA20_POLY1305_SHA256
         minTLSVersion: TLSv1.3
         NOTE: Currently unsupported.

Expected results:

    An output that aligns with the documentation regarding supported/unsupported TLS versions. Additionally, fixing the output format would be useful, as it is very hard to read in its current form.

Here in the 4.14 Documentation, it states:
```
The HAProxy Ingress Controller image supports TLS 1.3 and the Modern profile.
```

Additional info:

The `apiserver` CR should also be checked for the same thing.    
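A hedged way to compare both resources on a live cluster:

$ oc explain ingresscontroller.spec.tlsSecurityProfile.modern
$ oc explain apiserver.spec.tlsSecurityProfile.modern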

Description of problem:

    During automation test execution for the dev-console package, it is observed that Cypress abruptly fails the ongoing test due to an "uncaught:exception: ResizeObserver loop limit exceeded" error, even though there is no visible failure in the UI.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Always

Steps to Reproduce:

    1. 
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info: Screenshot

This is a clone of issue OCPBUGS-34619. The following is the description of the original issue:

Description of problem:

`make test` is failing in openshift/coredns repo due to TestImportOrdering failure.

This is due to the recent addition of the github.com/openshift/coredns-ocp-dnsnameresolver external plugin and the fact that CoreDNS doesn't generate zplugin.go formatted correctly so TestImportOrdering fails after generation.

Version-Release number of selected component (if applicable):

4.16-4.17    

How reproducible:

    100%

Steps to Reproduce:

    1. make test   

Actual results:

    TestImportOrdering failure

Expected results:

    TestImportOrdering should not fail

Additional info:

I created an upstream issue and PR: https://github.com/coredns/coredns/pull/6692 which recently merged.

We will just need to carry-patch this in 4.17 and 4.16.

The CoreDNS 1.11.3 rebase https://github.com/openshift/coredns/pull/118 is blocked on this.
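A rough sketch of what the carry patch has to guarantee, assuming the usual upstream generation flow (the `go generate` entry point and the zplugin.go path are assumptions based on upstream CoreDNS conventions):

$ go generate coredns.go                     # regenerate zplugin.go with the external plugin list
$ gofmt -w core/plugin/zplugin.go            # keep generated imports formatted so TestImportOrdering passes
$ go test ./... -run TestImportOrdering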

Description of problem:

- Pods that reside in a namespace utilizing EgressIP are experiencing intermittent TCP IO timeouts when attempting to communicate with external services.

  • Connection response while connecting external service from one of the pods:
    ❯ oc exec gitlab-runner-aj-02-56998875b-n6xxb -- bash -c 'while true; do timeout 3 bash -c "</dev/tcp/10.135.108.56/443" && echo "Connection success" || echo "Connection timeout"; sleep 0.5; done'
    Connection success
    Connection timeout
    Connection timeout
    Connection timeout
    Connection timeout
    Connection timeout
    Connection success
    Connection timeout
    Connection success 
  • The customer followed this solution https://access.redhat.com/solutions/7005481 and noticed an IP address in logical_router_policy nexthops that is not associated with any node.
    # Get pod node and podIP variable for the problematic pod 
    ❯ oc get pod gitlab-runner-aj-02-56998875b-n6xxb -ojson 2>/dev/null | jq -r '"\(.metadata.name) \(.spec.nodeName) \(.status.podIP)"' | read -r pod node podip
    
    # Find the ovn-kubernetes pod running on the same node as  gitlab-runner-aj-02-56998875b-n6xxb
    ❯ oc get pods -n openshift-ovn-kubernetes -lapp=ovnkube-node -ojson | jq --arg node "$node" -r '.items[] | select(.spec.nodeName == $node)| .metadata.name' | read -r ovn_pod
    
    # Collect each possible logical switch port address into variable LSP_ADDRESSES
    ❯ LSP_ADDRESSES=$(oc -n openshift-ovn-kubernetes exec ${ovn_pod} -it -c northd -- bash -c 'ovn-nbctl lsp-list transit_switch | while read guid name; do printf "%s " "${name}"; ovn-nbctl lsp-get-addresses "${guid}"; done')
    
    # List the logical router policy for the problematic pod
    ❯ oc -n openshift-ovn-kubernetes exec ${ovn_pod} -c northd -- ovn-nbctl find logical_router_policy match="\"ip4.src == ${podip}\""
    _uuid               : c55bec59-6f9a-4f01-a0b1-67157039edb8
    action              : reroute
    external_ids        : {name=gitlab-runner-caasandpaas-egress}
    match               : "ip4.src == 172.40.114.40"
    nexthop             : []
    nexthops            : ["100.88.0.22", "100.88.0.57"]
    options             : {}
    priority            : 100
    
    # Check whether each nexthop entry exists in the LSP addresses table
    ❯ echo $LSP_ADDRESSES | grep 100.88.0.22
    (tstor-c1nmedi01-9x2g9-worker-cloud-paks-m9t6b) 0a:58:64:58:00:16 100.88.0.22/16
    ❯ echo $LSP_ADDRESSES | grep 100.88.0.57 
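(The empty grep output above shows that 100.88.0.57 does not correspond to any logical switch port, i.e. it is not associated with any node.) A hedged cross-check against the node annotations that advertise the transit switch IPs; the annotation name is an assumption based on OVN-Kubernetes interconnect conventions:

❯ oc get nodes -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations["k8s.ovn.org/node-transit-switch-port-ifaddr"] // "")"' | grep 100.88.0.57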

     

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.

2.

3.

Actual results:

  • Pods configured to use EgressIP face intermittent connection timeout while connecting to external services.

Expected results:

  • The connection timeout should not happen.

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure
  2. customer issue / SD
  3. internal RedHat testing failure

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

If it is a CI failure:

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

If it is a customer / SD issue:

  • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
  • Don’t presume that Engineering has access to Salesforce.
  • Do presume that Engineering will access attachments through supportshell.
  • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • When showing the results from commands, include the entire command in the output.  
  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
  • Do not set the priority, that is owned by Engineering and will be set when the bug is evaluated
  • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”
  • For guidance on using this template please see
    OCPBUGS Template Training for Networking  components

Description of problem:

When installing a new vSphere cluster with static IPs under TechPreviewNoUpgrade (which also enables control plane machine sets), the installer applies an incorrect config to the CPMS, resulting in the masters being recreated.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

always

Steps to Reproduce:

1. create install-config.yaml with static IPs following documentation
2. run `openshift-install create cluster`
3. as install progresses, watch the machines definitions
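For step 3, a hedged way to watch the machine and CPMS definitions while the install progresses (<install-dir> is a placeholder):

$ export KUBECONFIG=<install-dir>/auth/kubeconfig
$ oc -n openshift-machine-api get machines -w
$ oc -n openshift-machine-api get controlplanemachineset cluster -o yaml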
    

Actual results:

new master machines are created

Expected results:

All machines remain the same as what was created by the installer.

Additional info:

    

This is a clone of issue OCPBUGS-36378. The following is the description of the original issue:

Description of problem:

When creating a cluster with a service principal certificate, as in known issue OCPBUGS-36360, the installer exited with an error.

# ./openshift-install create cluster --dir ipi6 
INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json" 
WARNING Using client certs to authenticate. Please be warned cluster does not support certs and only the installer does. 
INFO Consuming Install Config from target directory 
WARNING FeatureSet "CustomNoUpgrade" is enabled. This FeatureSet does not allow upgrades and may affect the supportability of the cluster. 
INFO Creating infrastructure resources...         
INFO Started local control plane with envtest     
INFO Stored kubeconfig for envtest in: /tmp/jima/ipi6/.clusterapi_output/envtest.kubeconfig 
WARNING Using client certs to authenticate. Please be warned cluster does not support certs and only the installer does. 
INFO Running process: Cluster API with args [-v=2 --diagnostics-address=0 --health-addr=127.0.0.1:36847 --webhook-port=38905 --webhook-cert-dir=/tmp/envtest-serving-certs-941163289 --kubeconfig=/tmp/jima/ipi6/.clusterapi_output/envtest.kubeconfig] 
INFO Running process: azure infrastructure provider with args [-v=2 --health-addr=127.0.0.1:44743 --webhook-port=35373 --webhook-cert-dir=/tmp/envtest-serving-certs-3807817663 --feature-gates=MachinePool=false --kubeconfig=/tmp/jima/ipi6/.clusterapi_output/envtest.kubeconfig] 
INFO Running process: azureaso infrastructure provider with args [-v=0 -metrics-addr=0 -health-addr=127.0.0.1:45179 -webhook-port=37401 -webhook-cert-dir=/tmp/envtest-serving-certs-1364466879 -crd-pattern= -crd-management=none] 
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to run cluster api system: failed to run controller "azureaso infrastructure provider": failed to start controller "azureaso infrastructure provider": timeout waiting for process cluster-api-provider-azureaso to start successfully (it may have failed to start, or stopped unexpectedly before becoming ready) 
INFO Shutting down local Cluster API control plane... 
INFO Local Cluster API system has completed operations 

From the output, the local Cluster API system appears to have shut down. But when checking processes, only the parent installer process exited; the CAPI-related processes are still running.

When local control plane is running:
# ps -ef|grep cluster | grep -v grep
root       13355    6900 39 08:07 pts/1    00:00:13 ./openshift-install create cluster --dir ipi6
root       13365   13355  2 08:08 pts/1    00:00:00 ipi6/cluster-api/etcd --advertise-client-urls=http://127.0.0.1:41341 --data-dir=ipi6/.clusterapi_output/etcd --listen-client-urls=http://127.0.0.1:41341 --listen-peer-urls=http://127.0.0.1:34081 --unsafe-no-fsync=true
root       13373   13355 55 08:08 pts/1    00:00:10 ipi6/cluster-api/kube-apiserver --allow-privileged=true --authorization-mode=RBAC --bind-address=127.0.0.1 --cert-dir=/tmp/k8s_test_framework_50606349 --client-ca-file=/tmp/k8s_test_framework_50606349/client-cert-auth-ca.crt --disable-admission-plugins=ServiceAccount --etcd-servers=http://127.0.0.1:41341 --secure-port=38483 --service-account-issuer=https://127.0.0.1:38483/ --service-account-key-file=/tmp/k8s_test_framework_50606349/sa-signer.crt --service-account-signing-key-file=/tmp/k8s_test_framework_50606349/sa-signer.key --service-cluster-ip-range=10.0.0.0/24
root       13385   13355  0 08:08 pts/1    00:00:00 ipi6/cluster-api/cluster-api -v=2 --diagnostics-address=0 --health-addr=127.0.0.1:36847 --webhook-port=38905 --webhook-cert-dir=/tmp/envtest-serving-certs-941163289 --kubeconfig=/tmp/jima/ipi6/.clusterapi_output/envtest.kubeconfig
root       13394   13355  6 08:08 pts/1    00:00:00 ipi6/cluster-api/cluster-api-provider-azure -v=2 --health-addr=127.0.0.1:44743 --webhook-port=35373 --webhook-cert-dir=/tmp/envtest-serving-certs-3807817663 --feature-gates=MachinePool=false --kubeconfig=/tmp/jima/ipi6/.clusterapi_output/envtest.kubeconfig

After installer exited:
# ps -ef|grep cluster | grep -v grep
root       13365       1  1 08:08 pts/1    00:00:01 ipi6/cluster-api/etcd --advertise-client-urls=http://127.0.0.1:41341 --data-dir=ipi6/.clusterapi_output/etcd --listen-client-urls=http://127.0.0.1:41341 --listen-peer-urls=http://127.0.0.1:34081 --unsafe-no-fsync=true
root       13373       1 45 08:08 pts/1    00:00:35 ipi6/cluster-api/kube-apiserver --allow-privileged=true --authorization-mode=RBAC --bind-address=127.0.0.1 --cert-dir=/tmp/k8s_test_framework_50606349 --client-ca-file=/tmp/k8s_test_framework_50606349/client-cert-auth-ca.crt --disable-admission-plugins=ServiceAccount --etcd-servers=http://127.0.0.1:41341 --secure-port=38483 --service-account-issuer=https://127.0.0.1:38483/ --service-account-key-file=/tmp/k8s_test_framework_50606349/sa-signer.crt --service-account-signing-key-file=/tmp/k8s_test_framework_50606349/sa-signer.key --service-cluster-ip-range=10.0.0.0/24
root       13385       1  0 08:08 pts/1    00:00:00 ipi6/cluster-api/cluster-api -v=2 --diagnostics-address=0 --health-addr=127.0.0.1:36847 --webhook-port=38905 --webhook-cert-dir=/tmp/envtest-serving-certs-941163289 --kubeconfig=/tmp/jima/ipi6/.clusterapi_output/envtest.kubeconfig
root       13394       1  0 08:08 pts/1    00:00:00 ipi6/cluster-api/cluster-api-provider-azure -v=2 --health-addr=127.0.0.1:44743 --webhook-port=35373 --webhook-cert-dir=/tmp/envtest-serving-certs-3807817663 --feature-gates=MachinePool=false --kubeconfig=/tmp/jima/ipi6/.clusterapi_output/envtest.kubeconfig


In another scenario, the capi-based installer was run on a small disk; the installer got stuck and didn't exit until interrupted with <Ctrl> + C. Afterwards, all CAPI-related processes were still running; only the installer process had been killed.

[root@jima09id-vm-1 jima]# ./openshift-install create cluster --dir ipi4
INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json" 
INFO Consuming Install Config from target directory 
WARNING FeatureSet "CustomNoUpgrade" is enabled. This FeatureSet does not allow upgrades and may affect the supportability of the cluster. 
INFO Creating infrastructure resources...         
INFO Started local control plane with envtest     
INFO Stored kubeconfig for envtest in: /tmp/jima/ipi4/.clusterapi_output/envtest.kubeconfig 
INFO Running process: Cluster API with args [-v=2 --diagnostics-address=0 --health-addr=127.0.0.1:42017 --webhook-port=41085 --webhook-cert-dir=/tmp/envtest-serving-certs-1774658110 --kubeconfig=/tmp/jima/ipi4/.clusterapi_output/envtest.kubeconfig] 
INFO Running process: azure infrastructure provider with args [-v=2 --health-addr=127.0.0.1:38387 --webhook-port=37783 --webhook-cert-dir=/tmp/envtest-serving-certs-1319713198 --feature-gates=MachinePool=false --kubeconfig=/tmp/jima/ipi4/.clusterapi_output/envtest.kubeconfig] 
FATAL failed to extract "ipi4/cluster-api/cluster-api-provider-azureaso": write ipi4/cluster-api/cluster-api-provider-azureaso: no space left on device 
^CWARNING Received interrupt signal                    
^C[root@jima09id-vm-1 jima]#
[root@jima09id-vm-1 jima]# ps -ef|grep cluster | grep -v grep
root       12752       1  0 07:38 pts/1    00:00:00 ipi4/cluster-api/etcd --advertise-client-urls=http://127.0.0.1:38889 --data-dir=ipi4/.clusterapi_output/etcd --listen-client-urls=http://127.0.0.1:38889 --listen-peer-urls=http://127.0.0.1:38859 --unsafe-no-fsync=true
root       12760       1  4 07:38 pts/1    00:00:09 ipi4/cluster-api/kube-apiserver --allow-privileged=true --authorization-mode=RBAC --bind-address=127.0.0.1 --cert-dir=/tmp/k8s_test_framework_3790461974 --client-ca-file=/tmp/k8s_test_framework_3790461974/client-cert-auth-ca.crt --disable-admission-plugins=ServiceAccount --etcd-servers=http://127.0.0.1:38889 --secure-port=44429 --service-account-issuer=https://127.0.0.1:44429/ --service-account-key-file=/tmp/k8s_test_framework_3790461974/sa-signer.crt --service-account-signing-key-file=/tmp/k8s_test_framework_3790461974/sa-signer.key --service-cluster-ip-range=10.0.0.0/24
root       12769       1  0 07:38 pts/1    00:00:00 ipi4/cluster-api/cluster-api -v=2 --diagnostics-address=0 --health-addr=127.0.0.1:42017 --webhook-port=41085 --webhook-cert-dir=/tmp/envtest-serving-certs-1774658110 --kubeconfig=/tmp/jima/ipi4/.clusterapi_output/envtest.kubeconfig
root       12781       1  0 07:38 pts/1    00:00:00 ipi4/cluster-api/cluster-api-provider-azure -v=2 --health-addr=127.0.0.1:38387 --webhook-port=37783 --webhook-cert-dir=/tmp/envtest-serving-certs-1319713198 --feature-gates=MachinePool=false --kubeconfig=/tmp/jima/ipi4/.clusterapi_output/envtest.kubeconfig
root       12851    6900  1 07:41 pts/1    00:00:00 ./openshift-install destroy cluster --dir ipi4
 

Version-Release number of selected component (if applicable):

   4.17 nightly build 

How reproducible:

    Always

Steps to Reproduce:

    1. Run capi-based installer
    2. Installer failed to start some capi process and exited 
    3.
    

Actual results:

    Installer process exited, but capi related processes are still running

Expected results:

    Both the installer and all CAPI-related processes exit.

Additional info:
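A hedged manual cleanup for the leftover processes (a sketch that assumes no other cluster-api binaries are running on the install host):

$ pgrep -af 'cluster-api/'   # list the leftover etcd, kube-apiserver and provider processes
$ pkill -f 'cluster-api/'    # terminate them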

 

 

Owing to the older path being referenced in the Prow workflow, we saw consistent failures for the /test versions job.

RHOCP installation on RHOSP fails with an error

~~~

$ ansible-playbook -i inventory.yaml security-groups.yaml
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Incompatible openstacksdk library found: Version MUST be >=1.0 and <=None, but 0.36.5 is smaller than minimum version 1.0."}

~~~

Packages Installed :

ansible-2.9.27-1.el8ae.noarch Fri Oct 13 06:56:05 2023
python3-netaddr-0.7.19-8.el8.noarch Fri Oct 13 06:55:44 2023
python3-openstackclient-4.0.2-2.20230404115110.54bf2c0.el8ost.noarch Tue Nov 21 01:38:32 2023
python3-openstacksdk-0.36.5-2.20220111021051.feda828.el8ost.noarch Fri Oct 13 06:55:52 2023

Document followed :
https://docs.openshift.com/container-platform/4.13/installing/installing_openstack/installing-openstack-user.html#installation-osp-downloading-modules_installing-openstack-user
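A hedged sketch of one way to satisfy the version requirement without touching the system RPMs, by installing the modules into a virtualenv (the exact package set is an assumption; pin versions according to the linked documentation):

$ python3 -m venv ~/ansible-venv && source ~/ansible-venv/bin/activate
$ pip install 'openstacksdk>=1.0' python-openstackclient ansible netaddr
$ ansible-playbook -i inventory.yaml security-groups.yaml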

This is a clone of issue OCPBUGS-31073. The following is the description of the original issue:

Description of problem

I had a version of MTC installed on my cluster when it was running a prior version. I had deleted it some time ago, long before upgrading to 4.15. I upgraded it to 4.15 and needed to reinstall to take a look at something, but found the operator would not install.

I originally tried with 4.15.0, but on failure upgraded to 4.15.3 to see if it would resolve the issue, but it did not.

Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.15.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.15.3
Kubernetes Version: v1.28.7+6e2789b

How reproducible:

Always as far as I can tell. I have at least two clusters where I was able to reproduce it.

Steps to Reproduce:

    1. Install Migration Toolkit for Containers on OpenShift 4.14
    2. Uninstall it
    3. Upgrade to 4.15
    4. Try to install it again

Actual results:

The operator never installs. The UI just shows "Upgrade status: Unknown Failure".

Observe the catalog operator logs and note errors like:
E0319 21:35:57.350591       1 queueinformer_operator.go:319] sync {"update" "openshift-migration"} failed: bundle unpacking failed with an error: [roles.rbac.authorization.k8s.io "c1572438804f004fb90b6768c203caad96c47331f7ecc4f68c3cf6b43b0acfd" already exists, roles.rbac.authorization.k8s.io "724788f6766aa5ba19b24ef4619b6a8e8e856b8b5fb96e1380f0d3f5b9dcb7a" already exists]

If you delete the roles, you'll get the same for rolebindings, then the same for jobs.batch, and then configmaps.

Expected results:

Operator just installs

Additional info:

If you clean up all these resources the operator will install successfully.    
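A hedged sketch of that cleanup, reusing the object name from the error above (the namespaces searched are assumptions; use whichever namespace the catalog operator errors reference):

$ for ns in openshift-marketplace openshift-migration; do oc -n "$ns" get roles,rolebindings,jobs,configmaps 2>/dev/null | grep c1572438804f004fb90b6768c203caad96c47331f7ecc4f68c3cf6b43b0acfd; done
$ oc -n <namespace> delete role,rolebinding,job,configmap <leftover-name>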

Description of problem:

When creating an IAM role with a "path" (https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html#identifiers-friendly-names), its "principal", when applied to trust policies or VPC Endpoint Service allowed principals, confusingly does not include the path. That is, for the following rolesRef on a hostedcluster:

spec:
  platform:
    aws:
      rolesRef:
        controlPlaneOperatorARN: arn:aws:iam::765374464689:role/adecorte/ad-int-path1-y4y2-kube-system-control-plane-operator
        imageRegistryARN: arn:aws:iam::765374464689:role/adecorte/ad-int-path1-y4y2-openshift-image-registry-installer-cloud-crede
        ingressARN: arn:aws:iam::765374464689:role/adecorte/ad-int-path1-y4y2-openshift-ingress-operator-cloud-credentials
        kubeCloudControllerARN: arn:aws:iam::765374464689:role/adecorte/ad-int-path1-y4y2-kube-system-kube-controller-manager
        networkARN: arn:aws:iam::765374464689:role/adecorte/ad-int-path1-y4y2-openshift-cloud-network-config-controller-clou
        nodePoolManagementARN: arn:aws:iam::765374464689:role/adecorte/ad-int-path1-y4y2-kube-system-capa-controller-manager
        storageARN: arn:aws:iam::765374464689:role/adecorte/ad-int-path1-y4y2-openshift-cluster-csi-drivers-ebs-cloud-creden 

The actual valid principal that should be added to the VPC Endpoint Service's allowed principals is: 

arn:aws:iam::765374464689:role/ad-int-path1-y4y2-kube-system-control-plane-operator 

instead of

arn:aws:iam::765374464689:role/adecorte/ad-int-path1-y4y2-kube-system-control-plane-operator 

However, for all other cases, the full ARN including the path should be used, e.g. https://github.com/openshift/hypershift/blob/082e880d0a492a357663d620fa58314a4a477730/hypershift-operator/controllers/hostedcluster/internal/platform/aws/aws.go#L237-L273

Version-Release number of selected component (if applicable):

4.14.1

How reproducible:

100%

Steps to Reproduce:

ROSA HCP-specific steps:
1. rosa create account-roles --path /anything/ -m auto -y
2. rosa create cluster --hosted-cp
3. ...etc
4a. Observe on the hosted cluster AWS Account that the VPC Endpoint cannot be created with the error: 'failed to create vpc endpoint: InvalidServiceName'
4b. Observe on the management cluster that CPO is failing to update the VPC Endpoint Service's allowed principals with the error: Client.InvalidPrincipal
5. If the contents of .spec.platform.aws.rolesRef.controlPlaneOperatorARN are manually applied to the additional allowed principals with the path component removed, then the problems are largely fixed on the hosted cluster side. VPC Endpoint is created, worker nodes can spin up, etc.

Actual results:

The VPC Endpoint Service is attempting and failing to get this applied to its additional allowed principals:

arn:aws:iam::${ACCOUNT_ID}:role/path/name

Expected results:

The VPC Endpoint Service gets this applied to its additional allowed principals:

arn:aws:iam::${ACCOUNT_ID}:role/name

Additional info:
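A hedged illustration of the transformation needed when building the VPC Endpoint Service allowed principal from a path-ful role ARN (the sed expression is only a sketch):

$ echo "arn:aws:iam::765374464689:role/adecorte/ad-int-path1-y4y2-kube-system-control-plane-operator" | sed -E 's#(:role)/.*/#\1/#'
arn:aws:iam::765374464689:role/ad-int-path1-y4y2-kube-system-control-plane-operator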

 

As part of our investigation into GCP disruption we want an endpoint separate from the cluster under test but inside GCP to monitor for connectivity.

One approach is to use a GCP Cloud Function with a HTTP Trigger

Another alternative is to stand up our own server and collect logging

 

We need to consider the cost of implementation, the cost of maintenance, and how well the implementation lines up with our overall test scenario (we want to use this as a control to compare against reaching a pod within a cluster under test)

 

We may want to also consider standing up similar endpoints in AWS and Azure in the future.

 

A separate story will cover monitoring the endpoint from within Origin

  • We want to capture the audit id and log when we receive an incoming request
  • audit id could include the build id or another field could be used to correlate back to the job instance
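A hedged sketch of the Cloud Function approach (function name, region, runtime and source layout are placeholders; cost and log retention still need to be evaluated):

$ gcloud functions deploy disruption-probe --runtime=go121 --trigger-http --allow-unauthenticated --region=us-east1 --entry-point=Probe --source=./probe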

Description of problem:

From a test run in [1] we can't be sure whether the call to etcd was really deadlocked or just waiting for a result.

Currently the CheckingSyncWrapper only defines "alive" as a sync func that has not returned an error. This can be wrong in scenarios where a member is down and perpetually not reachable. 
Instead, we wanted to detect deadlock situations where the sync loop is just stuck for a prolonged period of time.

[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-upgrade-from-stable-4.15-e2e-metal-ipi-upgrade-ovn-ipv6/1762965898773139456/

Version-Release number of selected component (if applicable):

>4.14

How reproducible:

Always

Steps to Reproduce:

    1. create a healthy cluster
    2. make sure one etcd member never responds, but the node is still there (ie kubelet shutdown, blocking the etcd ports on a firewall)
    3. wait for the CEO to restart pod on failing health probe and dump its stack
    

Actual results:

CEO controllers are returning errors, but might not deadlock, which currently results in a restart

Expected results:

CEO should mark the member as unhealthy and continue its service without getting deadlocked and should not restart its pod by failing the health probe

Additional info:

highly related to OCPBUGS-30169

This is a clone of issue OCPBUGS-42108. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38012. The following is the description of the original issue:

Description of problem:

Customers are unable to scale up OCP nodes when the initial setup was done with OCP 4.8/4.9 and the cluster was then upgraded to 4.15.22/4.15.23.

At first the customer observed that node scale-up failed and /etc/resolv.conf was empty on the new nodes.
As a workaround, the customer copied the content from a correct resolv.conf, after which setup of the new node continued.

They then inspected the rendered MachineConfig assembled from 00-worker and suspected that something was wrong with the on-prem-resolv-prepender.service definition.
As a workaround, the customer manually changed this service definition, which allowed them to scale up new nodes.

Version-Release number of selected component (if applicable):

4.15 , 4.16

How reproducible:

100%

Steps to Reproduce:

1. Install OCP vSphere IPI cluster version 4.8 or 4.9
2. Check "on-prem-resolv-prepender.service" service definition
3. Upgrade it to 4.15.22 or 4.15.23
4. Check if the node scaling is working 
5. Check "on-prem-resolv-prepender.service" service definition     

Actual results:

Unable to scale up nodes with the default service definition. After manually changing the service definition, scaling works.

Expected results:

Node scaling should work without any manual changes to the service definition.

Additional info:

on-prem-resolv-prepender.service content on clusters built with 4.8 / 4.9 and then upgraded to 4.15.22 / 4.15.23:
~~~
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
# Per https://issues.redhat.com/browse/OCPBUGS-27162 there is a problem if this is started before crio-wipe
After=crio-wipe.service
[Service]
Type=oneshot
Restart=on-failure
RestartSec=10
StartLimitIntervalSec=0
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
~~~

After manually correcting the service definition as below, scaling works on 4.15.22 / 4.15.23 :
~~~
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
# Per https://issues.redhat.com/browse/OCPBUGS-27162 there is a problem if this is started before crio-wipe
After=crio-wipe.service
StartLimitIntervalSec=0                -----------> this
[Service]
Type=oneshot
#Restart=on-failure                    -----------> this
RestartSec=10
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
~~~

Below is the on-prem-resolv-prepender.service on a freshly installed 4.15.23 where scaling works fine:
~~~
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
# Per https://issues.redhat.com/browse/OCPBUGS-27162 there is a problem if this is started before crio-wipe
After=crio-wipe.service
StartLimitIntervalSec=0
[Service]
Type=oneshot
Restart=on-failure
RestartSec=10
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
~~~

This was observed in the rendered MachineConfig, which is assembled from 00-worker.
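A hedged way to inspect the unit as it appears in the rendered MachineConfig (machine config names vary per cluster; adjust the grep context as needed):

$ oc get mc 00-worker -o yaml | grep -B20 'name: on-prem-resolv-prepender.service'
$ oc get mc $(oc get mcp worker -o jsonpath='{.spec.configuration.name}') -o yaml | grep -B20 'name: on-prem-resolv-prepender.service'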

Please review the following PR: https://github.com/openshift/cloud-provider-vsphere/pull/59

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33661. The following is the description of the original issue:

Description of problem:

`preserveBootstrapIgnition` was named after the implementation details in terraform for how to make deleting S3 objects optional. The motivation behind the change was that some customers run installs in subscriptions where policies do not allow deleting s3 objects. They didn't want the install to fail because of that.

With the move from terraform to capi/capa, this is now implemented differently: capa always tries to delete the s3 objects but will ignore any permission errors if `preserveBootstrapIgnition` is set.

We should rename this option so it's clear that the objects will be deleted if there are enough permissions. My suggestion is to name it something similar to what's used in CAPA: `allowBestEffortDeleteIgnition`.

Ideally we should deprecate `preserveBootstrapIgnition` in 4.16 and remove it in 4.17.
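A hedged sketch of what an install-config.yaml could look like after such a rename (the new field name below is only the suggestion from this issue, not a shipped API; the region is a placeholder):

platform:
  aws:
    region: us-east-1
    # today: preserveBootstrapIgnition: true
    allowBestEffortDeleteIgnition: true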

Version-Release number of selected component (if applicable):

    4.14+ but I don't think we want to change this for terraform-based installs

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    https://github.com/openshift/installer/pull/7288

Please review the following PR: https://github.com/openshift/cluster-autoscaler-operator/pull/305

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-35533. The following is the description of the original issue:

Description of problem:

Failed to deploy the cluster with the following error:
time="2024-06-13T14:01:11Z" level=debug msg="Creating the security group rules"time="2024-06-13T14:01:19Z" level=error msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed during pre-provisioning: failed to create security groups: failed to create the security group rule on group \"cb9a607c-9799-4186-bc22-26f141ce91aa\" for IPv4 tcp on ports 1936-1936: Bad request with: [POST https://10.46.44.159:13696/v2.0/security-group-rules], error message: {\"NeutronError\": {\"type\": \"SecurityGroupRuleParameterConflict\", \"message\": \"Conflicting value ethertype IPv4 for CIDR fd2e:6f44:5dd8:c956::/64\", \"detail\": \"\"}}"time="2024-06-13T14:01:20Z" level=debug msg="OpenShift Installer 4.17.0-0.nightly-2024-06-13-083330"time="2024-06-13T14:01:20Z" level=debug msg="Built from commit 6bc75dfebaca79ecf302263af7d32d50c31f371a"time="2024-06-13T14:01:20Z" level=debug msg="Loading Install Config..."time="2024-06-13T14:01:20Z" level=debug msg="  Loading SSH Key..."time="2024-06-13T14:01:20Z" level=debug msg="  Loading Base Domain..."time="2024-06-13T14:01:20Z" level=debug msg="    Loading Platform..."time="2024-06-13T14:01:20Z" level=debug msg="  Loading Cluster Name..."time="2024-06-13T14:01:20Z" level=debug msg="    Loading Base Domain..."time="2024-06-13T14:01:20Z" level=debug msg="    Loading Platform..."time="2024-06-13T14:01:20Z" level=debug msg="  Loading Pull Secret..."time="2024-06-13T14:01:20Z" level=debug msg="  Loading Platform..."time="2024-06-13T14:01:20Z" level=debug msg="Using Install Config loaded from state file"time="2024-06-13T14:01:20Z" level=debug msg="Loading Agent Config..."time="2024-06-13T14:01:20Z" level=info msg="Waiting up to 40m0s (until 2:41PM UTC) for the cluster at https://api.ostest.shiftstack.com:6443 to initialize..."

Version-Release number of selected component (if applicable):

4.17.0-0.nightly-2024-06-13-083330

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Customer is asking for this flag so they can keep using v1 code even when v2 will be the default.

This change is killing payloads and thus the org is blocked at a fairly critical time.

Sample failure: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-gcp-ovn-upgrade/1783654894146686976

 

Hitting the test: [OLM][invariant] alert/KubePodNotReady should not be at or above info in ns/openshift-marketplace

 

Possibly more.

 

Suspected to be related to https://github.com/openshift/origin/pull/28741, which merged Apr 25 20:11 UTC; the sippy DB shows the failures starting at 21:19, and before that nothing for months.

Looks related to:

Failed to pull image "registry.redhat.io/redhat/redhat-marketplace-index:v4.16": copying system image from manifest list: reading signatures: parsing signature https://registry.redhat.io/containers/sigstore/redhat/redhat-marketplace-index@sha256=7ff75c6598abd1a2abe9fa3db8a805fa552798361272b983ea07c9e9ef22d686/signature-2: unrecognized signature format, starting with binary 0x3c

We suspect there is a problem with the images and the failure may be legitimate.
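For reference, 0x3c is the ASCII code for '<', which suggests the endpoint is returning an HTML/XML document rather than a signature. A hedged way to check what it actually serves:

$ curl -s 'https://registry.redhat.io/containers/sigstore/redhat/redhat-marketplace-index@sha256=7ff75c6598abd1a2abe9fa3db8a805fa552798361272b983ea07c9e9ef22d686/signature-2' | head -c 200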

Use the sippy test details to view the pass rate for the test which exposes this day by day.

Please review the following PR: https://github.com/openshift/node_exporter/pull/140

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Component Readiness has found a potential regression in [sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel].

Probability of significant regression: 99.79%

Sample (being evaluated) Release: 4.16
Start Time: 2024-05-08T00:00:00Z
End Time: 2024-05-14T23:59:59Z
Success Rate: 83.33%
Successes: 25
Failures: 5
Flakes: 0

Base (historical) Release: 4.15
Start Time: 2024-02-01T00:00:00Z
End Time: 2024-02-28T23:59:59Z
Success Rate: 100.00%
Successes: 67
Failures: 0
Flakes: 1

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=Monitoring&confidence=95&environment=ovn%20no-upgrade%20amd64%20vsphere%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=vsphere&platform=vsphere&sampleEndTime=2024-05-14%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-05-08%2000%3A00%3A00&testId=openshift-tests%3Ac1f54790201ec8f4241eca902f854b79&testName=%5Bsig-instrumentation%5D%20Prometheus%20%5Bapigroup%3Aimage.openshift.io%5D%20when%20installed%20on%20the%20cluster%20shouldn%27t%20report%20any%20alerts%20in%20firing%20state%20apart%20from%20Watchdog%20and%20AlertmanagerReceiversNotConfigured%20%5BEarly%5D%5Bapigroup%3Aconfig.openshift.io%5D%20%5BSkipped%3ADisconnected%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D&upgrade=no-upgrade&upgrade=no-upgrade&variant=standard&variant=standard

 

Notes:

We need to fix the following error:

I0514 12:31:46.014919       1 vsphere_check.go:272] CheckAccountPermissions failed: specified folder not found: folder '/IBMCdatacenter/vm/ci-op-1qvr0jdj-10b01' not found 

This seems to be caused by the new PowerCLI script: it is creating the folder based on the infra ID instead of the cluster name. We'll change this to match the expected name and verify that the error is resolved.
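A hedged way to confirm which folder actually exists in the vCenter inventory (the govc invocation is a sketch and assumes GOVC_URL/GOVC_USERNAME/GOVC_PASSWORD are set):

$ govc folder.info /IBMCdatacenter/vm/ci-op-1qvr0jdj-10b01
$ govc find /IBMCdatacenter/vm -type f -name 'ci-op-*'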

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-33592. The following is the description of the original issue:

Description of problem:

While investigating a problem with OpenShift Container Platform 4 - Node scaling, I found the below messages reported in my OpenShift Container Platform 4 - Cluster.

E0513 11:15:09.331353       1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
E0513 11:15:09.331365       1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.331529       1 orchestrator.go:546] Pod project-100/curl-67f84bd857-h92wb can't be scheduled on MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0513 11:15:09.331684       1 orchestrator.go:157] No pod can fit to MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
E0513 11:15:09.332076       1 orchestrator.go:507] Failed to get autoscaling options for node group MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c: Not implemented
I0513 11:15:09.332100       1 orchestrator.go:185] Best option to resize: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.332110       1 orchestrator.go:189] Estimated 1 nodes needed in MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.332135       1 orchestrator.go:295] Final scale-up plan: [{MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c 0->1 (max: 12)}]

The same events are reported in must-gather reviewed from customers. Given that we have https://github.com/kubernetes/autoscaler/issues/6037 and https://github.com/kubernetes/autoscaler/issues/6676 that appear to be solved via https://github.com/kubernetes/autoscaler/pull/6677 and https://github.com/kubernetes/autoscaler/pull/6038 I'm wondering whether we should pull in those changes as they seem to eventually impact automated scaling of OpenShift Container Platform 4 - Node(s).

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.15

How reproducible:

Always

Steps to Reproduce:

1. Setup OpenShift Container Platform 4 with ClusterAutoscaler configured
2. Trigger scaling activity and verify the cluster-autoscaler-default logs
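A hedged way to capture the relevant output for step 2 (the deployment name matches the default ClusterAutoscaler):

$ oc -n openshift-machine-api logs deployment/cluster-autoscaler-default --since=1h | grep -E 'orchestrator|Final scale-up plan'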

Actual results:

Logs like the below are being reported.

E0513 11:15:09.331353       1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
E0513 11:15:09.331365       1 orchestrator.go:450] Couldn't get autoscaling options for ng: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.331529       1 orchestrator.go:546] Pod project-100/curl-67f84bd857-h92wb can't be scheduled on MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0513 11:15:09.331684       1 orchestrator.go:157] No pod can fit to MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
E0513 11:15:09.332076       1 orchestrator.go:507] Failed to get autoscaling options for node group MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c: Not implemented
I0513 11:15:09.332100       1 orchestrator.go:185] Best option to resize: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.332110       1 orchestrator.go:189] Estimated 1 nodes needed in MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.332135       1 orchestrator.go:295] Final scale-up plan: [{MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c 0->1 (max: 12)}]

Expected results:

Scale-up of OpenShift Container Platform 4 Nodes happens without errors being reported.

I0513 11:15:09.331529       1 orchestrator.go:546] Pod project-100/curl-67f84bd857-h92wb can't be scheduled on MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c, predicate checking error: node(s) didn't match Pod's node affinity/selector; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity/selector; debugInfo=
I0513 11:15:09.331684       1 orchestrator.go:157] No pod can fit to MachineSet/openshift-machine-api/test-12345-batch-amd64-us-east-2c
I0513 11:15:09.332100       1 orchestrator.go:185] Best option to resize: MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.332110       1 orchestrator.go:189] Estimated 1 nodes needed in MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c
I0513 11:15:09.332135       1 orchestrator.go:295] Final scale-up plan: [{MachineSet/openshift-machine-api/test-12345-batch-arm64-us-east-2c 0->1 (max: 12)}]

Additional info:

Please review https://github.com/kubernetes/autoscaler/issues/6037 and https://github.com/kubernetes/autoscaler/issues/6676 as they seem to document the problem and also have a solution linked/merged

Description of problem:

New spot VMs fail to be created by machinesets defining providerSpec.value.spotVMOptions in Azure regions without Availability Zones.

Azure-controller logs the error: Azure Spot Virtual Machine is not supported in Availability Set.

A new availabilitySet is created for each machineset in non-zonal regions, but this only works with normal nodes. Spot VMs and availabilitySets are incompatible as per Microsoft docs for this error: You need to choose to either use an Azure Spot Virtual Machine or use a VM in an availability set, you can't choose both.
From: https://learn.microsoft.com/en-us/azure/virtual-machines/error-codes-spot
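For reference, a minimal hedged fragment of the machineset change the docs describe (only the relevant providerSpec portion; an empty spotVMOptions requests a spot VM at the default max price):

providerSpec:
  value:
    spotVMOptions: {}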

Version-Release number of selected component (if applicable):

    n/a

How reproducible:

    Always

Steps to Reproduce:

1. Follow the instructions to create a machineset to provision spot VMs: 
  https://docs.openshift.com/container-platform/4.12/machine_management/creating_machinesets/creating-machineset-azure.html#machineset-creating-non-guaranteed-instance_creating-machineset-azure

2. New machines will be in Failed state:
$ oc get machines -A
NAMESPACE               NAME                                            PHASE     TYPE              REGION       ZONE   AGE
openshift-machine-api   mabad-test-l5x58-worker-southindia-spot-c4qr5   Failed                                          7m17s
openshift-machine-api   mabad-test-l5x58-worker-southindia-spot-dtzsn   Failed                                          7m17s
openshift-machine-api   mabad-test-l5x58-worker-southindia-spot-tzrhw   Failed                                          7m28s


3. Events in the failed machines show errors creating spot VMs with availabilitySets:
Events:
  Type     Reason             Age                 From                           Message
  ----     ------             ----                ----                           -------
  Warning  FailedCreate       28s                 azure-controller               InvalidConfiguration: failed to reconcile machine "mabad-test-l5x58-worker-southindia-spot-dx78z": failed to create vm mabad-test-l5x58-worker-southindia-spot-dx78z: failure sending request for machine mabad-test-l5x58-worker-southindia-spot-dx78z: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Azure Spot Virtual Machine is not supported in Availability Set. For more information, see http://aka.ms/AzureSpot/errormessages."    

Actual results:

     Machines stay in Failed state and nodes are not created

Expected results:

     Machines get created and new spot VM nodes added to the cluster.

Additional info:

    This problem was identified from a customer alert in an ARO cluster. ICM for ref (requires b- MSFT account): https://portal.microsofticm.com/imp/v3/incidents/incident/455463992/summary

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

When extracting `oc` and `openshift-install` from a release payload, the warning below is shown, which might be confusing for the user. To make this clearer, please update the warning to include the image names in the kubectl version mismatch message, in addition to the version list.

Version-Release number of selected component (if applicable):

always

How reproducible:

Always

Steps to Reproduce:

    1. Run command to extract oc & openshift-install using `oc adm extract`
    2. Run oc adm release info --commits <payload>
    3.
    

Actual results:

    $ oc adm release info --commits registry.ci.openshift.org/ocp/release:4.16.0-0.ci-2024-03-05-032119
warning: multiple versions reported for the kubectl: 1.29.1,1.28.2,1.29.0

    

Expected results:

     Show the image names that need a Kubernetes bump along with the kubectl version.
    

Additional info:

   Thread here:  https://redhat-internal.slack.com/archives/GK58XC2G2/p1709565188855519
    

This is a clone of issue OCPBUGS-39573. The following is the description of the original issue:

Description of problem:

Enabling the topology tests in CI
    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

This is a clone of issue OCPBUGS-34037. The following is the description of the original issue:

Open Github Security Advisory for: containers/image

https://github.com/advisories/GHSA-6wvf-f2vw-3425

The ARO SRE team became aware of this advisory against our installer fork. The upstream installer is also pinning a vulnerable version of containers/image.

The advisory recommends updating to version 5.30.1.
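A hedged sketch of the corresponding dependency bump (module path per the containers/image v5 module referenced by the advisory):

$ go get github.com/containers/image/v5@v5.30.1
$ go mod tidy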

Description of problem:

New hypershift scheduler is not replacing '.' with ',' in subnet label values, resulting in invalid subnet annotations for load balancer services. 
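For illustration, a hedged example of the expected label-value-to-annotation conversion (the subnet IDs are hypothetical):

$ echo "subnet-0a1b2c3d.subnet-4e5f6a7b" | tr '.' ','
subnet-0a1b2c3d,subnet-4e5f6a7b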

Version-Release number of selected component (if applicable):

 4.16.0

How reproducible:

 Always

Steps to Reproduce:

    1. Using new hypershift request serving node scheduler, create a HostedCluster.
    2. Use nodes that are labeled with subnets separated by periods instead of commas.
    

Actual results:

    HostedCluster fails to roll out because router services are not deployed.

Expected results:

    HostedCluster provisions successfully.

Additional info:

    

Description of problem

Build02, a years-old cluster currently running 4.15.0-ec.2 with TechPreviewNoUpgrade, has been Available=False for days:

$ oc get -o json clusteroperator monitoring | jq '.status.conditions[] | select(.type == "Available")'
{
  "lastTransitionTime": "2024-01-14T04:09:52Z",
  "message": "UpdatingMetricsServer: reconciling MetricsServer Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/metrics-server: context deadline exceeded",
  "reason": "UpdatingMetricsServerFailed",
  "status": "False",
  "type": "Available"
}

Both pods had been having CA trust issues. We deleted one pod, and its replacement is happy:

$ oc -n openshift-monitoring get -l app.kubernetes.io/component=metrics-server pods
NAME                             READY   STATUS    RESTARTS   AGE
metrics-server-9cc8bfd56-dd5tx   1/1     Running   0          136m
metrics-server-9cc8bfd56-k2lpv   0/1     Running   0          36d

The young, happy pod has occasional node-removed noise, which is expected in this cluster with high levels of compute-node autoscaling:

$ oc -n openshift-monitoring logs --tail 3 metrics-server-9cc8bfd56-dd5tx
E0117 17:16:13.492646       1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.0.32.33:10250/metrics/resource\": dial tcp 10.0.32.33:10250: connect: connection refused" node="build0-gstfj-ci-builds-worker-b-srjk5"
E0117 17:16:28.611052       1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.0.32.33:10250/metrics/resource\": dial tcp 10.0.32.33:10250: connect: connection refused" node="build0-gstfj-ci-builds-worker-b-srjk5"
E0117 17:16:56.898453       1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.0.32.33:10250/metrics/resource\": context deadline exceeded" node="build0-gstfj-ci-builds-worker-b-srjk5"

While the old, sad pod is complaining about unknown authorities:

$ oc -n openshift-monitoring logs --tail 3 metrics-server-9cc8bfd56-k2lpv
E0117 17:19:09.612161       1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.0.0.3:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate signed by unknown authority" node="build0-gstfj-m-2.c.openshift-ci-build-farm.internal"
E0117 17:19:09.620872       1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.0.32.90:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate signed by unknown authority" node="build0-gstfj-ci-prowjobs-worker-b-cg7qd"
I0117 17:19:14.538837       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"

More details in the Additional details section, but the timeline seems to have been something like:

  1. 2023-12-11, metrics-server-* pods come up, and are running happily, scraping kubelets with a CA trust store descended from openshift-config-managed's kubelet-serving-ca ConfigMap.
  2. 2024-01-02, a new openshift-kube-controller-manager-operator_csr-signer-signer@1704206554 is created.
  3. 2024-01-04, kubelets rotate their serving CA. Not entirely clear how this works yet outside of bootstrapping, but at least for bootstrapping it uses a CertificateSigningRequest, approved by cluster-machine-approver, and signed by the kubernetes.io/kubelet-serving signing component in the kube-controller-manager-* pods in the openshift-kube-controller-manager namespace.
  4. 2024-01-04, the csr-signer Secret in openshift-kube-controller-manager has the new openshift-kube-controller-manager-operator_csr-signer-signer@1704206554 issuing a certificate for kube-csr-signer_@1704338196.
  5. The kubelet-serving-ca ConfigMap gets updated to include a CA for the new kube-csr-signer_@1704338196, signed by the new openshift-kube-controller-manager-operator_csr-signer-signer@1704206554.
  6. Local /etc/tls/kubelet-serving-ca-bundle/ca-bundle.crt updated in metrics-server-* containers.
  7. But metrics-server-* pods fail to notice the file change and reload /etc/tls/kubelet-serving-ca-bundle/ca-bundle.crt, so the existing pods do not trust the new kubelet server certs.
  8. Mysterious time delay. Perhaps the monitoring operator does not notice sad metrics-server-* pods outside of things that trigger DeploymentRollout?
  9. 2024-01-14, monitoring ClusterOperator goes Available=False on UpdatingMetricsServerFailed.
  10. 2024-01-17, deleting one metrics-server-* pod triggers replacement-pod creation, and the replacement pod comes up fine.

So addressing the metrics-server /etc/tls/kubelet-serving-ca-bundle/ca-bundle.crt change detection should resolve this use-case. And triggering a container or pod restart would be an aggressive-but-sufficient mechanism, although loading the new data without rolling the process would be less invasive.
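
A minimal sketch of the change-detection idea, assuming a simple polling loop: the bundle path comes from this bug, while the interval, function names, and reload hook are placeholders rather than the metrics-server's actual implementation.

package main

import (
	"crypto/sha256"
	"crypto/x509"
	"log"
	"os"
	"time"
)

// watchCABundle polls the mounted CA bundle and invokes reload whenever the
// file content changes, e.g. after the rotated kubelet-serving-ca is projected
// into the pod. Polling keeps the sketch simple; an fsnotify-style watcher
// would also work if it follows the kubelet's atomic symlink swaps.
func watchCABundle(path string, interval time.Duration, reload func(*x509.CertPool)) {
	var last [sha256.Size]byte
	for {
		data, err := os.ReadFile(path)
		if err != nil {
			log.Printf("reading %s: %v", path, err)
		} else if sum := sha256.Sum256(data); sum != last {
			pool := x509.NewCertPool()
			if pool.AppendCertsFromPEM(data) {
				reload(pool)
				last = sum
				log.Printf("reloaded kubelet serving CA bundle from %s", path)
			} else {
				log.Printf("no valid certificates found in %s", path)
			}
		}
		time.Sleep(interval)
	}
}

func main() {
	go watchCABundle("/etc/tls/kubelet-serving-ca-bundle/ca-bundle.crt", time.Minute,
		func(pool *x509.CertPool) {
			// Swap the rebuilt pool into the scraper's TLS config here.
		})
	select {}
}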

Version-Release number of selected component (if applicable)

4.15.0-ec.3, which has fast CA rotation, see discussion in API-1687.

How reproducible

Unclear.

Steps to Reproduce

Unclear.

Actual results

metrics-server pods having trouble with CA trust when attempting to scrape nodes.

Expected results

metrics-server pods successfully trusting kubelets when scraping nodes.

Additional details

The monitoring operator sets up the metrics server with --kubelet-certificate-authority=/etc/tls/kubelet-serving-ca-bundle/ca-bundle.crt, which is the "Path to the CA to use to validate the Kubelet's serving certificates" and is mounted from the kubelet-serving-ca-bundle ConfigMap. But that mount point only contains openshift-kube-controller-manager-operator_csr-signer-signer@... CAs:

$ oc --as system:admin -n openshift-monitoring debug pod/metrics-server-9cc8bfd56-k2lpv -- cat /etc/tls/kubelet-serving-ca-bundle/ca-bundle.crt | while openssl x509 -noout -text; do :; done | grep '^Certificate:\|Issuer\|Subject:\|Not '
Starting pod/metrics-server-9cc8bfd56-k2lpv-debug-gtctn ...

Removing debug pod ...
Certificate:
        Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554
            Not Before: Dec  3 14:42:33 2023 GMT
            Not After : Feb  1 14:42:34 2024 GMT
        Subject: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554
Certificate:
        Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554
            Not Before: Dec 20 03:16:35 2023 GMT
            Not After : Jan 19 03:16:36 2024 GMT
        Subject: CN = kube-csr-signer_@1703042196
Certificate:
        Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1704206554
            Not Before: Jan  4 03:16:35 2024 GMT
            Not After : Feb  3 03:16:36 2024 GMT
        Subject: CN = kube-csr-signer_@1704338196
Certificate:
        Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1704206554
            Not Before: Jan  2 14:42:34 2024 GMT
            Not After : Mar  2 14:42:35 2024 GMT
        Subject: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1704206554
unable to load certificate
137730753918272:error:0909006C:PEM routines:get_name:no start line:../crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE

While actual kubelets seem to be using certs signed by kube-csr-signer_@1704338196 (which is one of the Subjects in /etc/tls/kubelet-serving-ca-bundle/ca-bundle.crt):

$ oc get -o wide -l node-role.kubernetes.io/master= nodes
NAME                                                  STATUS   ROLES    AGE      VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
build0-gstfj-m-0.c.openshift-ci-build-farm.internal   Ready    master   3y240d   v1.28.3+20a5764   10.0.0.4      <none>        Red Hat Enterprise Linux CoreOS 415.92.202311271112-0 (Plow)   5.14.0-284.41.1.el9_2.x86_64   cri-o://1.28.2-2.rhaos4.15.gite7be4e1.el9
build0-gstfj-m-1.c.openshift-ci-build-farm.internal   Ready    master   3y240d   v1.28.3+20a5764   10.0.0.5      <none>        Red Hat Enterprise Linux CoreOS 415.92.202311271112-0 (Plow)   5.14.0-284.41.1.el9_2.x86_64   cri-o://1.28.2-2.rhaos4.15.gite7be4e1.el9
build0-gstfj-m-2.c.openshift-ci-build-farm.internal   Ready    master   3y240d   v1.28.3+20a5764   10.0.0.3      <none>        Red Hat Enterprise Linux CoreOS 415.92.202311271112-0 (Plow)   5.14.0-284.41.1.el9_2.x86_64   cri-o://1.28.2-2.rhaos4.15.gite7be4e1.el9
$ oc --as system:admin -n openshift-monitoring debug pod/metrics-server-9cc8bfd56-k2lpv -- openssl s_client -connect 10.0.0.3:10250 -showcerts </dev/null
Starting pod/metrics-server-9cc8bfd56-k2lpv-debug-ksl2k ...
Can't use SSL_get_servername
depth=0 O = system:nodes, CN = system:node:build0-gstfj-m-2.c.openshift-ci-build-farm.internal
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 O = system:nodes, CN = system:node:build0-gstfj-m-2.c.openshift-ci-build-farm.internal
verify error:num=21:unable to verify the first certificate
verify return:1
depth=0 O = system:nodes, CN = system:node:build0-gstfj-m-2.c.openshift-ci-build-farm.internal
verify return:1
CONNECTED(00000003)
---
Certificate chain
 0 s:O = system:nodes, CN = system:node:build0-gstfj-m-2.c.openshift-ci-build-farm.internal
   i:CN = kube-csr-signer_@1704338196
-----BEGIN CERTIFICATE-----
MIIC5DCCAcygAwIBAgIQAbKVl+GS6s2H20EHAWl4WzANBgkqhkiG9w0BAQsFADAm
MSQwIgYDVQQDDBtrdWJlLWNzci1zaWduZXJfQDE3MDQzMzgxOTYwHhcNMjQwMTE3
MDMxNDMwWhcNMjQwMjAzMDMxNjM2WjBhMRUwEwYDVQQKEwxzeXN0ZW06bm9kZXMx
SDBGBgNVBAMTP3N5c3RlbTpub2RlOmJ1aWxkMC1nc3Rmai1tLTIuYy5vcGVuc2hp
ZnQtY2ktYnVpbGQtZmFybS5pbnRlcm5hbDBZMBMGByqGSM49AgEGCCqGSM49AwEH
A0IABFqT+UgohFAxJrGYQUeYsEhNB+ufFo14xYDedKBCeNzMhaC+5/I4UN1e1u2X
PH7J4ncmH+M/LXI7v+YfEIG7cH+jgZ0wgZowDgYDVR0PAQH/BAQDAgeAMBMGA1Ud
JQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwHwYDVR0jBBgwFoAU394ABuS2
9i0qss9AKk/mQ9lhJ88wRAYDVR0RBD0wO4IzYnVpbGQwLWdzdGZqLW0tMi5jLm9w
ZW5zaGlmdC1jaS1idWlsZC1mYXJtLmludGVybmFshwQKAAADMA0GCSqGSIb3DQEB
CwUAA4IBAQCiKelqlgK0OHFqDPdIR+RRdjXoCfFDa0JGCG0z60LYJV6Of5EPv0F/
vGZdM/TyGnPT80lnLCh2JGUvneWlzQEZ7LEOgXX8OrAobijiFqDZFlvVwvkwWNON
rfucLQWDFLHUf/yY0EfB0ZlM8Sz4XE8PYB6BXYvgmUIXS1qkV9eGWa6RPLsOnkkb
q/dTLE/tg8cz24IooDC8lmMt/wCBPgsq9AnORgNdZUdjCdh9DpDWCw0E4csSxlx2
H1qlH5TpTGKS8Ox9JAfdAU05p/mEhY9PEPSMfdvBZep1xazrZyQIN9ckR2+11Syw
JlbEJmapdSjIzuuKBakqHkDgoq4XN0KM
-----END CERTIFICATE-----
---
Server certificate
subject=O = system:nodes, CN = system:node:build0-gstfj-m-2.c.openshift-ci-build-farm.internal

issuer=CN = kube-csr-signer_@1704338196

---
Acceptable client certificate CA names
OU = openshift, CN = admin-kubeconfig-signer
CN = openshift-kube-controller-manager-operator_csr-signer-signer@1699022534
CN = kube-csr-signer_@1700450189
CN = kube-csr-signer_@1701746196
CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554
CN = openshift-kube-apiserver-operator_kube-apiserver-to-kubelet-signer@1691004449
CN = openshift-kube-apiserver-operator_kube-control-plane-signer@1702234292
CN = openshift-kube-apiserver-operator_kube-control-plane-signer@1699642292
OU = openshift, CN = kubelet-bootstrap-kubeconfig-signer
CN = openshift-kube-apiserver-operator_node-system-admin-signer@1678905372
Requested Signature Algorithms: RSA-PSS+SHA256:ECDSA+SHA256:Ed25519:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA384:ECDSA+SHA512:RSA+SHA1:ECDSA+SHA1
Shared Requested Signature Algorithms: RSA-PSS+SHA256:ECDSA+SHA256:Ed25519:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA384:ECDSA+SHA512
Peer signing digest: SHA256
Peer signature type: ECDSA
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 1902 bytes and written 383 bytes
Verification error: unable to verify the first certificate
---
New, TLSv1.3, Cipher is TLS_AES_128_GCM_SHA256
Server public key is 256 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 21 (unable to verify the first certificate)
---
DONE

Removing debug pod ...
$ openssl x509 -noout -text <<EOF 2>/dev/null
> -----BEGIN CERTIFICATE-----
MIIC5DCCAcygAwIBAgIQAbKVl+GS6s2H20EHAWl4WzANBgkqhkiG9w0BAQsFADAm
MSQwIgYDVQQDDBtrdWJlLWNzci1zaWduZXJfQDE3MDQzMzgxOTYwHhcNMjQwMTE3
MDMxNDMwWhcNMjQwMjAzMDMxNjM2WjBhMRUwEwYDVQQKEwxzeXN0ZW06bm9kZXMx
SDBGBgNVBAMTP3N5c3RlbTpub2RlOmJ1aWxkMC1nc3Rmai1tLTIuYy5vcGVuc2hp
ZnQtY2ktYnVpbGQtZmFybS5pbnRlcm5hbDBZMBMGByqGSM49AgEGCCqGSM49AwEH
A0IABFqT+UgohFAxJrGYQUeYsEhNB+ufFo14xYDedKBCeNzMhaC+5/I4UN1e1u2X
PH7J4ncmH+M/LXI7v+YfEIG7cH+jgZ0wgZowDgYDVR0PAQH/BAQDAgeAMBMGA1Ud
JQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwHwYDVR0jBBgwFoAU394ABuS2
9i0qss9AKk/mQ9lhJ88wRAYDVR0RBD0wO4IzYnVpbGQwLWdzdGZqLW0tMi5jLm9w
ZW5zaGlmdC1jaS1idWlsZC1mYXJtLmludGVybmFshwQKAAADMA0GCSqGSIb3DQEB
CwUAA4IBAQCiKelqlgK0OHFqDPdIR+RRdjXoCfFDa0JGCG0z60LYJV6Of5EPv0F/
vGZdM/TyGnPT80lnLCh2JGUvneWlzQEZ7LEOgXX8OrAobijiFqDZFlvVwvkwWNON
rfucLQWDFLHUf/yY0EfB0ZlM8Sz4XE8PYB6BXYvgmUIXS1qkV9eGWa6RPLsOnkkb
q/dTLE/tg8cz24IooDC8lmMt/wCBPgsq9AnORgNdZUdjCdh9DpDWCw0E4csSxlx2
H1qlH5TpTGKS8Ox9JAfdAU05p/mEhY9PEPSMfdvBZep1xazrZyQIN9ckR2+11Syw
JlbEJmapdSjIzuuKBakqHkDgoq4XN0KM
-----END CERTIFICATE-----
> EOF
...
        Issuer: CN = kube-csr-signer_@1704338196
        Validity
            Not Before: Jan 17 03:14:30 2024 GMT
            Not After : Feb  3 03:16:36 2024 GMT
        Subject: O = system:nodes, CN = system:node:build0-gstfj-m-2.c.openshift-ci-build-farm.internal
...

The monitoring operator populates the openshift-monitoring kubelet-serving-ca-bundle ConfigMap using data from the openshift-config-managed kubelet-serving-ca ConfigMap, and that propagation is working, but does not contain the kube-csr-signer_ CA:

$ oc -n openshift-config-managed get -o json configmap kubelet-serving-ca | jq -r '.data["ca-bundle.crt"]' | while openssl x509 -noout -text; do :; done | grep '^Certificate:\|Issuer\|Subject:\|Not '
Certificate:
        Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554
            Not Before: Dec  3 14:42:33 2023 GMT
            Not After : Feb  1 14:42:34 2024 GMT
        Subject: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554
Certificate:
        Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1701614554
            Not Before: Dec 20 03:16:35 2023 GMT
            Not After : Jan 19 03:16:36 2024 GMT
        Subject: CN = kube-csr-signer_@1703042196
Certificate:
        Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1704206554
            Not Before: Jan  4 03:16:35 2024 GMT
            Not After : Feb  3 03:16:36 2024 GMT
        Subject: CN = kube-csr-signer_@1704338196
Certificate:
        Issuer: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1704206554
            Not Before: Jan  2 14:42:34 2024 GMT
            Not After : Mar  2 14:42:35 2024 GMT
        Subject: CN = openshift-kube-controller-manager-operator_csr-signer-signer@1704206554
unable to load certificate
140531510617408:error:0909006C:PEM routines:get_name:no start line:../crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE
$ oc -n openshift-config-managed get -o json configmap kubelet-serving-ca | jq -r '.data["ca-bundle.crt"]' | sha1sum 
a32ab44dff8030c548087d70fea599b0d3fab8af  -
$ oc -n openshift-monitoring get -o json configmap kubelet-serving-ca-bundle | jq -r '.data["ca-bundle.crt"]' | sha1sum 
a32ab44dff8030c548087d70fea599b0d3fab8af  -

Flipping over to the kubelet side, nothing in the machine-config operator's template is jumping out at me as a key/cert pair for serving on 10250. The kubelet seems to set up server certs via serverTLSBootstrap: true. But we don't seem to set the beta RotateKubeletServerCertificate, so I'm not clear on how these are supposed to rotate on the kubelet side. But there are CSRs from kubelets requesting serving certs:

$ oc get certificatesigningrequests | grep 'NAME\|kubelet-serving'
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   REQUESTEDDURATION   CONDITION
csr-8stgd   51m     kubernetes.io/kubelet-serving                 system:node:build0-gstfj-ci-builds-worker-b-xkdw2                           <none>              Approved,Issued
csr-blbjx   9m1s    kubernetes.io/kubelet-serving                 system:node:build0-gstfj-ci-longtests-worker-b-5w9dz                        <none>              Approved,Issued
csr-ghxh5   64m     kubernetes.io/kubelet-serving                 system:node:build0-gstfj-ci-builds-worker-b-sdwdn                           <none>              Approved,Issued
csr-hng85   33m     kubernetes.io/kubelet-serving                 system:node:build0-gstfj-ci-longtests-worker-d-7d7h2                        <none>              Approved,Issued
csr-hvqxz   24m     kubernetes.io/kubelet-serving                 system:node:build0-gstfj-ci-builds-worker-b-fp6wb                           <none>              Approved,Issued
csr-vc52m   50m     kubernetes.io/kubelet-serving                 system:node:build0-gstfj-ci-builds-worker-b-xlmt6                           <none>              Approved,Issued
csr-vflcm   40m     kubernetes.io/kubelet-serving                 system:node:build0-gstfj-ci-builds-worker-b-djpgq                           <none>              Approved,Issued
csr-xfr7d   51m     kubernetes.io/kubelet-serving                 system:node:build0-gstfj-ci-builds-worker-b-8v4vk                           <none>              Approved,Issued
csr-zhzbs   51m     kubernetes.io/kubelet-serving                 system:node:build0-gstfj-ci-builds-worker-b-rqr68                           <none>              Approved,Issued
$ oc get -o json certificatesigningrequests csr-blbjx
{
    "apiVersion": "certificates.k8s.io/v1",
    "kind": "CertificateSigningRequest",
    "metadata": {
        "creationTimestamp": "2024-01-17T19:20:43Z",
        "generateName": "csr-",
        "name": "csr-blbjx",
        "resourceVersion": "4719586144",
        "uid": "5f12d236-3472-485f-8037-3896f51a809c"
    },
    "spec": {
        "groups": [
            "system:nodes",
            "system:authenticated"
        ],
        "request": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQlh6Q0NBUVFDQVFBd1ZqRVZNQk1HQTFVRUNoTU1jM2x6ZEdWdE9tNXZaR1Z6TVQwd093WURWUVFERXpSegplWE4wWlcwNmJtOWtaVHBpZFdsc1pEQXRaM04wWm1vdFkya3RiRzl1WjNSbGMzUnpMWGR2Y210bGNpMWlMVFYzCk9XUjZNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUV5Y0dhSDMvZ3F4ZHNZWkdmQXovTEpoZVgKd1o0Z1VRbjB6TlZUenJncHpvd1VPOGR6NTN4UUZTOTRibm40NldlZFg3Q2xidUpVSUpUN2pCblV1WEdnZktCTQpNRW9HQ1NxR1NJYjNEUUVKRGpFOU1Ec3dPUVlEVlIwUkJESXdNSUlvWW5WcGJHUXdMV2R6ZEdacUxXTnBMV3h2CmJtZDBaWE4wY3kxM2IzSnJaWEl0WWkwMWR6bGtlb2NFQ2dBZ0F6QUtCZ2dxaGtqT1BRUURBZ05KQURCR0FpRUEKMHlRVzZQOGtkeWw5ZEEzM3ppQTJjYXVJdlhidTVhczNXcUZLYWN2bi9NSUNJUURycEQyVEtScHJOU1I5dExKTQpjZ0ZpajN1dVNieVJBcEJ5NEE1QldEZm02UT09Ci0tLS0tRU5EIENFUlRJRklDQVRFIFJFUVVFU1QtLS0tLQo=",
        "signerName": "kubernetes.io/kubelet-serving",
        "usages": [
            "digital signature",
            "server auth"
        ],
        "username": "system:node:build0-gstfj-ci-longtests-worker-b-5w9dz"
    },
    "status": {
        "certificate": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN6ekNDQWJlZ0F3SUJBZ0lSQUlGZ1NUd0ovVUJLaE1hWlE4V01KcEl3RFFZSktvWklodmNOQVFFTEJRQXcKSmpFa01DSUdBMVVFQXd3YmEzVmlaUzFqYzNJdGMybG5ibVZ5WDBBeE56QTBNek00TVRrMk1CNFhEVEkwTURFeApOekU1TVRVME0xb1hEVEkwTURJd016QXpNVFl6Tmxvd1ZqRVZNQk1HQTFVRUNoTU1jM2x6ZEdWdE9tNXZaR1Z6Ck1UMHdPd1lEVlFRREV6UnplWE4wWlcwNmJtOWtaVHBpZFdsc1pEQXRaM04wWm1vdFkya3RiRzl1WjNSbGMzUnoKTFhkdmNtdGxjaTFpTFRWM09XUjZNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUV5Y0dhSDMvZwpxeGRzWVpHZkF6L0xKaGVYd1o0Z1VRbjB6TlZUenJncHpvd1VPOGR6NTN4UUZTOTRibm40NldlZFg3Q2xidUpVCklKVDdqQm5VdVhHZ2ZLT0JrakNCanpBT0JnTlZIUThCQWY4RUJBTUNCNEF3RXdZRFZSMGxCQXd3Q2dZSUt3WUIKQlFVSEF3RXdEQVlEVlIwVEFRSC9CQUl3QURBZkJnTlZIU01FR0RBV2dCVGYzZ0FHNUxiMkxTcXl6MEFxVCtaRAoyV0VuenpBNUJnTlZIUkVFTWpBd2dpaGlkV2xzWkRBdFozTjBabW90WTJrdGJHOXVaM1JsYzNSekxYZHZjbXRsCmNpMWlMVFYzT1dSNmh3UUtBQ0FETUEwR0NTcUdTSWIzRFFFQkN3VUFBNElCQVFBRE5ad0pMdkp4WWNta2RHV08KUm5ocC9rc3V6akJHQnVHbC9VTmF0RjZScml3eW9mdmpVNW5Kb0RFbGlLeHlDQ2wyL1d5VXl5a2hMSElBK1drOQoxZjRWajIrYmZFd0IwaGpuTndxQThudFFabS90TDhwalZ5ZzFXM0VwR2FvRjNsZzRybDA1cXBwcjVuM2l4WURJClFFY2ZuNmhQUnlKN056dlFCS0RwQ09lbU8yTFllcGhqbWZGY2h5VGRZVGU0aE9IOW9TWTNMdDdwQURIM2kzYzYKK3hpMDhhV09LZmhvT3IybTVBSFBVN0FkTjhpVUV0M0dsYzI0SGRTLzlLT05tT2E5RDBSSk9DMC8zWk5sKzcvNAoyZDlZbnYwaTZNaWI3OGxhNk5scFB0L2hmOWo5TlNnMDN4OFZYRVFtV21zN29xY1FWTHMxRHMvWVJ4VERqZFphCnEwMnIKLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=",
        "conditions": [
            {
                "lastTransitionTime": "2024-01-17T19:20:43Z",
                "lastUpdateTime": "2024-01-17T19:20:43Z",
                "message": "This CSR was approved by the Node CSR Approver (cluster-machine-approver)",
                "reason": "NodeCSRApprove",
                "status": "True",
                "type": "Approved"
            }
        ]
    }
}
$ oc get -o json certificatesigningrequests csr-blbjx | jq -r '.status.certificate | @base64d' | openssl x509 -noout -text | grep '^Certificate:\|Issuer\|Subject:\|Not '
Certificate:
        Issuer: CN = kube-csr-signer_@1704338196
            Not Before: Jan 17 19:15:43 2024 GMT
            Not After : Feb  3 03:16:36 2024 GMT
        Subject: O = system:nodes, CN = system:node:build0-gstfj-ci-longtests-worker-b-5w9dz

So that's approved by cluster-machine-approver, but the signerName kubernetes.io/kubelet-serving is an upstream Kubernetes signer documented here, and it is implemented by kube-controller-manager.
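
The manual oc/jq/openssl verification above can also be scripted; the following is a minimal client-go sketch (a standalone helper, not part of any shipped component, and the kubeconfig path is a placeholder) that lists kubelet-serving CSRs and prints each issued certificate's issuer, which can then be compared against the subjects in the mounted ca-bundle.crt:

package main

import (
	"context"
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"log"

	certificatesv1 "k8s.io/api/certificates/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	csrs, err := client.CertificatesV1().CertificateSigningRequests().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, csr := range csrs.Items {
		if csr.Spec.SignerName != certificatesv1.KubeletServingSignerName || len(csr.Status.Certificate) == 0 {
			continue
		}
		block, _ := pem.Decode(csr.Status.Certificate)
		if block == nil {
			continue
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			continue
		}
		// Compare the issuer against the subjects present in the pod's
		// /etc/tls/kubelet-serving-ca-bundle/ca-bundle.crt.
		fmt.Printf("%s: subject=%s issuer=%s notAfter=%s\n",
			csr.Name, cert.Subject.CommonName, cert.Issuer.CommonName, cert.NotAfter)
	}
}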

This is a clone of issue OCPBUGS-36406. The following is the description of the original issue:

Description of problem

Seen in a 4.15.19 cluster, the PrometheusOperatorRejectedResources alert was firing, but did not link a runbook, despite the runbook existing since MON-2358.

Version-Release number of selected component

Seen in 4.15.19, but likely applies to all versions where the PrometheusOperatorRejectedResources alert exists.

How reproducible

Every time.

Steps to Reproduce:

Check the cluster console at /monitoring/alertrules?rowFilter-alerting-rule-source=platform&name=PrometheusOperatorRejectedResources, and click through to the alert definition.

Actual results

No mention of runbooks.

Expected results

A Runbook section linking the runbook.

Additional info

I haven't dug into the upstream/downstream sync process, but the runbook information likely needs to at least show up here, although that may or may not be the root location for injecting our canonical runbook into the upstream-sourced alert.

This is a clone of issue OCPBUGS-34986. The following is the description of the original issue:

Description of problem:

A non-existent oauth.config.openshift.io resource is listed on the Global Configuration page

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-06-05-082646    

How reproducible:

Always    

Steps to Reproduce:

1. visit global configuration page /settings/cluster/globalconfig
2. check listed items on the page
3.
    

Actual results:

2. There are two OAuth.config.openshift.io entries; one links to /k8s/cluster/config.openshift.io~v1~OAuth/oauth-config, which returns 404: Not Found

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-06-05-082646   True        False         171m    Cluster version is 4.16.0-0.nightly-2024-06-05-082646

$ oc get oauth.config.openshift.io
NAME      AGE
cluster   3h26m
 

Expected results:

From the CLI output we can see there is only one oauth.config.openshift.io resource, but the console shows an additional 'oauth-config' entry.

Only one oauth.config.openshift.io resource should be listed   

Additional info:

    

Description of problem:

Failed to create RHCOS image when creating Azure infrastructure

Steps to Reproduce & actual results:

fxie-mac:hypershift fxie$ hypershift create infra azure --name $CLUSTER_NAME --azure-creds $HOME/.azure/osServicePrincipal.json --base-domain $BASE_DOMAIN --infra-id $INFRA_ID --location eastus --output-file $OUTPUT_INFRA_FILE
2024-03-20T14:26:23+08:00	INFO	Using credentials from file	{"path": "/Users/fxie/.azure/osServicePrincipal.json"}
2024-03-20T14:26:30+08:00	INFO	Successfully created resource group	{"name": "fxie-hcp-1-fxie-hcp-1-13639"}
2024-03-20T14:26:32+08:00	INFO	Successfully created managed identity	{"name": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourcegroups/fxie-hcp-1-fxie-hcp-1-13639/providers/Microsoft.ManagedIdentity/userAssignedIdentities/fxie-hcp-1-fxie-hcp-1-13639"}
2024-03-20T14:26:32+08:00	INFO	Assigning role to managed identity, this may take some time
2024-03-20T14:26:51+08:00	INFO	Successfully assigned contributor role to managed identity	{"name": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourcegroups/fxie-hcp-1-fxie-hcp-1-13639/providers/Microsoft.ManagedIdentity/userAssignedIdentities/fxie-hcp-1-fxie-hcp-1-13639"}
2024-03-20T14:26:55+08:00	INFO	Successfully created network security group	{"name": "fxie-hcp-1-fxie-hcp-1-13639-nsg"}
2024-03-20T14:27:01+08:00	INFO	Successfully created vnet	{"name": "fxie-hcp-1-fxie-hcp-1-13639"}
2024-03-20T14:27:35+08:00	INFO	Successfully created private DNS zone	{"name": "fxie-hcp-1-azurecluster.qe.azure.devcluster.openshift.com"}
2024-03-20T14:28:09+08:00	INFO	Successfully created private DNS zone link
2024-03-20T14:28:12+08:00	INFO	Successfully created public IP address for guest cluster egress load balancer
2024-03-20T14:28:15+08:00	INFO	Successfully created guest cluster egress load balancer
2024-03-20T14:28:37+08:00	INFO	Successfully created storage account	{"name": "clusterzw22c"}
2024-03-20T14:28:38+08:00	INFO	Successfully created blob container	{"name": "vhd"}
2024-03-20T14:28:38+08:00	ERROR	Failed to create infrastructure	{"error": "failed to create RHCOS image: the image source url must be from an azure blob storage, otherwise upload will fail with an `One of the request inputs is out of range` error"}
github.com/openshift/hypershift/cmd/infra/azure.NewCreateCommand.func2
	/Users/fxie/Projects/hypershift/cmd/infra/azure/create.go:114
github.com/spf13/cobra.(*Command).execute
	/Users/fxie/Projects/hypershift/vendor/github.com/spf13/cobra/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
	/Users/fxie/Projects/hypershift/vendor/github.com/spf13/cobra/command.go:1115
github.com/spf13/cobra.(*Command).Execute
	/Users/fxie/Projects/hypershift/vendor/github.com/spf13/cobra/command.go:1039
github.com/spf13/cobra.(*Command).ExecuteContext
	/Users/fxie/Projects/hypershift/vendor/github.com/spf13/cobra/command.go:1032
main.main
	/Users/fxie/Projects/hypershift/main.go:78
runtime.main
	/usr/local/go/src/runtime/proc.go:267
Error: failed to create RHCOS image: the image source url must be from an azure blob storage, otherwise upload will fail with an `One of the request inputs is out of range` error
failed to create RHCOS image: the image source url must be from an azure blob storage, otherwise upload will fail with an `One of the request inputs is out of range` error     

Please review the following PR: https://github.com/openshift/cloud-provider-ibm/pull/61

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/129

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/aws-pod-identity-webhook/pull/180

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Found in QE's CI (with the vsphere-agent profile): the storage CO is not available and the vsphere-problem-detector-operator pod is in CrashLoopBackOff with a panic.
(Find the must-gather here: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-vsphere-agent-disconnected-ha-f14/1734850632575094784/artifacts/vsphere-agent-disconnected-ha-f14/gather-must-gather/)


The storage CO reports "unable to find VM by UUID":
  - lastTransitionTime: "2023-12-13T09:15:27Z"
    message: "VSphereCSIDriverOperatorCRAvailable: VMwareVSphereControllerAvailable:
      unable to find VM ci-op-782gwsbd-b3d4e-master-2 by UUID \nVSphereProblemDetectorDeploymentControllerAvailable:
      Waiting for Deployment"
    reason: VSphereCSIDriverOperatorCR_VMwareVSphereController_vcenter_api_error::VSphereProblemDetectorDeploymentController_Deploying
    status: "False"
    type: Available
(But I did not see "unable to find VM by UUID" in the vsphere-problem-detector-operator log in the must-gather.)


The vsphere-problem-detector-operator log:
2023-12-13T10:10:56.620216117Z I1213 10:10:56.620159       1 vsphere_check.go:149] Connected to vcenter.devqe.ibmc.devcluster.openshift.com as ci_user_01@devqe.ibmc.devcluster.openshift.com
2023-12-13T10:10:56.625161719Z I1213 10:10:56.625108       1 vsphere_check.go:271] CountVolumeTypes passed
2023-12-13T10:10:56.625291631Z I1213 10:10:56.625258       1 zones.go:124] Checking tags for multi-zone support.
2023-12-13T10:10:56.625449771Z I1213 10:10:56.625433       1 zones.go:202] No FailureDomains configured.  Skipping check.
2023-12-13T10:10:56.625497726Z I1213 10:10:56.625487       1 vsphere_check.go:271] CheckZoneTags passed
2023-12-13T10:10:56.625531795Z I1213 10:10:56.625522       1 info.go:44] vCenter version is 8.0.2, apiVersion is 8.0.2.0 and build is 22617221
2023-12-13T10:10:56.625562833Z I1213 10:10:56.625555       1 vsphere_check.go:271] ClusterInfo passed
2023-12-13T10:10:56.625603236Z I1213 10:10:56.625594       1 datastore.go:312] checking datastore /DEVQEdatacenter/datastore/vsanDatastore for permissions
2023-12-13T10:10:56.669205822Z panic: runtime error: invalid memory address or nil pointer dereference
2023-12-13T10:10:56.669338411Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x23096cb]
2023-12-13T10:10:56.669565413Z 
2023-12-13T10:10:56.669591144Z goroutine 550 [running]:
2023-12-13T10:10:56.669838383Z github.com/openshift/vsphere-problem-detector/pkg/operator.getVM(0xc0005da6c0, 0xc0002d3b80)
2023-12-13T10:10:56.669991749Z     github.com/openshift/vsphere-problem-detector/pkg/operator/vsphere_check.go:319 +0x3eb
2023-12-13T10:10:56.670212441Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*vSphereChecker).enqueueSingleNodeChecks.func1()
2023-12-13T10:10:56.670289644Z     github.com/openshift/vsphere-problem-detector/pkg/operator/vsphere_check.go:238 +0x55
2023-12-13T10:10:56.670490453Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*CheckThreadPool).worker.func1(0xc000c88760?, 0x0?)
2023-12-13T10:10:56.670702592Z     github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:40 +0x55
2023-12-13T10:10:56.671142070Z github.com/openshift/vsphere-problem-detector/pkg/operator.(*CheckThreadPool).worker(0xc000c78660, 0xc000c887a0?)
2023-12-13T10:10:56.671331852Z     github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:41 +0xe7
2023-12-13T10:10:56.671529761Z github.com/openshift/vsphere-problem-detector/pkg/operator.NewCheckThreadPool.func1()
2023-12-13T10:10:56.671589925Z     github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:28 +0x25
2023-12-13T10:10:56.671776328Z created by github.com/openshift/vsphere-problem-detector/pkg/operator.NewCheckThreadPool
2023-12-13T10:10:56.671847478Z     github.com/openshift/vsphere-problem-detector/pkg/operator/pool.go:27 +0x73




Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-11-033133

How reproducible:

 

Steps to Reproduce:

    1. See description
    2.
    3.
    

Actual results:

   vsphere-problem-detector panics

Expected results:

   vsphere-problem-detector should not panic

Additional info:

   This is probably a privileges issue, but the pod should not panic.
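
To illustrate the expected behavior, here is a hedged sketch of the kind of nil guard the VM lookup needs; the types and names are hypothetical stand-ins and do not match the actual vsphere-problem-detector source.

package main

import (
	"errors"
	"fmt"
)

// Stub types standing in for the real vSphere client objects.
type vmInfo struct{ Name string }

type vmFinder interface {
	FindByUUID(uuid string) (*vmInfo, error)
}

// getVM shows the guard the panic above calls for: a lookup that comes back
// with (nil, nil) must surface an error instead of letting callers
// dereference a nil pointer.
func getVM(finder vmFinder, uuid string) (*vmInfo, error) {
	vm, err := finder.FindByUUID(uuid)
	if err != nil {
		return nil, fmt.Errorf("looking up VM by UUID %s: %w", uuid, err)
	}
	if vm == nil {
		return nil, errors.New("vCenter returned no VM for UUID " + uuid + " (possibly missing privileges)")
	}
	return vm, nil
}

type emptyFinder struct{}

func (emptyFinder) FindByUUID(string) (*vmInfo, error) { return nil, nil }

func main() {
	if _, err := getVM(emptyFinder{}, "42-0000"); err != nil {
		fmt.Println("handled gracefully:", err)
	}
}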

 

Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/658

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-34706. The following is the description of the original issue:

Description of problem:

Regression of OCPBUGS-12739

level=warning msg="Couldn't unmarshall OVN annotations: ''. Skipping." err="unexpected end of JSON input"
    

Upstream OVN changed the node annotation from "k8s.ovn.org/host-addresses" to "k8s.ovn.org/host-cidrs" in OpenShift 4.14

https://github.com/ovn-org/ovn-kubernetes/pull/3915

We might need to fix baremetal-runtimecfg

diff --git a/pkg/config/node.go b/pkg/config/node.go
index 491dd4f..078ad77 100644
--- a/pkg/config/node.go
+++ b/pkg/config/node.go
@@ -367,10 +367,10 @@ func getNodeIpForRequestedIpStack(node v1.Node, filterIps []string, machineNetwo
                log.Debugf("For node %s can't find address using NodeInternalIP. Fallback to OVN annotation.", node.Name)
 
                var ovnHostAddresses []string
-               if err := json.Unmarshal([]byte(node.Annotations["k8s.ovn.org/host-addresses"]), &ovnHostAddresses); err != nil {
+               if err := json.Unmarshal([]byte(node.Annotations["k8s.ovn.org/host-cidrs"]), &ovnHostAddresses); err != nil {
                        log.WithFields(logrus.Fields{
                                "err": err,
-                       }).Warnf("Couldn't unmarshall OVN annotations: '%s'. Skipping.", node.Annotations["k8s.ovn.org/host-addresses"])
+                       }).Warnf("Couldn't unmarshall OVN annotations: '%s'. Skipping.", node.Annotations["k8s.ovn.org/host-cidrs"])
                }
 

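A minimal sketch of a backwards-compatible variant of the change above, reading the new k8s.ovn.org/host-cidrs annotation first and falling back to the old key; this is an illustration only, not the actual baremetal-runtimecfg patch.

package main

import (
	"encoding/json"
	"fmt"
)

// ovnHostAddresses prefers the k8s.ovn.org/host-cidrs annotation used since
// OpenShift 4.14 and falls back to the older k8s.ovn.org/host-addresses key.
func ovnHostAddresses(annotations map[string]string) ([]string, error) {
	for _, key := range []string{"k8s.ovn.org/host-cidrs", "k8s.ovn.org/host-addresses"} {
		raw, ok := annotations[key]
		if !ok || raw == "" {
			continue
		}
		var addrs []string
		if err := json.Unmarshal([]byte(raw), &addrs); err != nil {
			return nil, fmt.Errorf("couldn't unmarshal %s annotation %q: %w", key, raw, err)
		}
		return addrs, nil
	}
	return nil, fmt.Errorf("no OVN host address annotation found")
}

func main() {
	addrs, err := ovnHostAddresses(map[string]string{
		"k8s.ovn.org/host-cidrs": `["fd65:a1a8:60ad:271c::cc/64","192.168.111.20/24"]`,
	})
	fmt.Println(addrs, err)
}
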
Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-05-30-130713
 

How reproducible:

Frequent

Steps to Reproduce:

    1. Deploy vsphere IPv4 cluster
    2. Convert to Dualstack IPv4/IPv6
    3. Add machine network and IPv6 apiServerInternalIPs and ingressIPs
    4. Check keepalived.conf
for f in $(oc get pods -n openshift-vsphere-infra -l app=vsphere-infra-vrrp --no-headers -o custom-columns=N:.metadata.name  ) ; do oc -n openshift-vsphere-infra exec -c keepalived $f -- cat /etc/keepalived/keepalived.conf | tee $f-keepalived.conf ; done
    

Actual results:

IPv6 VIP is not in keepalived.conf

Expected results:
Something like:

vrrp_instance rbrattai_INGRESS_1 {
    state BACKUP
    interface br-ex
    virtual_router_id 129
    priority 20
    advert_int 1

    unicast_src_ip fd65:a1a8:60ad:271c::cc
    unicast_peer {
        fd65:a1a8:60ad:271c:9af:16a9:cb4f:d75c
        fd65:a1a8:60ad:271c:86ec:8104:1bc2:ab12
        fd65:a1a8:60ad:271c:5f93:c9cf:95f:9a6d
        fd65:a1a8:60ad:271c:bb4:de9e:6d58:89e7
        fd65:a1a8:60ad:271c:3072:2921:890:9263
    }
...
    virtual_ipaddress {
        fd65:a1a8:60ad:271c::1117/128
    }
...
}
    

Description of problem:


TuneD unnecessarily restarts twice when the content of the current TuneD profile changes and a new TuneD profile is selected at the same time.

    

Version-Release number of selected component (if applicable):


All NTO versions are affected.

    

How reproducible:


Depends on the order of k8s object updates (races), but nearly 100% reproducible.

    

Steps to Reproduce:

    1. Install SNO 
    2. Label your SNO node with label "profile"
    3. Create the following CR:

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Custom OpenShift profile 1
      include=openshift-node
      [sysctl]
      kernel.pty.max=4096
    name: openshift-profile-1
  - data: |
      [main]
      summary=Custom OpenShift profile 2
      include=openshift-node
      [sysctl]
      kernel.pty.max=8192
    name: openshift-profile-2
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile-1

    4. Apply the following CR:

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Custom OpenShift profile 1
      include=openshift-node
      [sysctl]
      kernel.pty.max=8192
    name: openshift-profile-1
  - data: |
      [main]
      summary=Custom OpenShift profile 2
      include=openshift-node
      [sysctl]
      kernel.pty.max=8192
    name: openshift-profile-2
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile-2

    

Actual results:


You'll see two restarts/applications of the openshift-profile-1

$ cat tuned-operand.log |grep "profile-1' applied"
2024-04-19 06:10:54,685 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-profile-1' applied
2024-04-19 06:13:23,627 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-profile-1' applied


    

Expected results:


Only 1 application of openshift-profile-1:

$ cat tuned-operand.log |grep "profile-1' applied"
2024-04-19 07:20:31,600 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-profile-1' applied


    

Additional info:


    

This is a clone of issue OCPBUGS-37821. The following is the description of the original issue:

Openshift Dedicated is in the process of developing an offering of GCP clusters that uses only short-lived credentials from the end user. For these clusters to be deployed, the pod running the Openshift Installer needs to function with GCP credentials that fit the short-lived credential formats. This worked in prior Installer versions, such as 4.14, but was not an explicit requirement.
 

Description of problem:

There is a problem with the logic change in https://github.com/openshift/machine-config-operator/pull/4196 that is causing Kubelet to fail to start after a reboot on OpenShiftSDN deployments. This is currently breaking all of the v4 metal jobs.

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Always

Steps to Reproduce:

    1. Deploy baremetal cluster with OpenShiftSDN
    2.
    3.
    

Actual results:

    Nodes fail to join cluster

Expected results:

    Successful cluster deployment

Additional info:

    

Description of problem: In an environment with the following zones, topology was disabled while it should be enabled by default

$ openstack availability zone list --compute
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| AZ-0      | available   |
| AZ-1      | available   |
| AZ-2      | available   |
+-----------+-------------+

$ openstack availability zone list --volume
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| nova      | available   |
| AZ-0      | available   |
| AZ-1      | available   |
| AZ-2      | available   |
+-----------+-------------+
    

We have a check that verifies the number of zones is identical for compute and volumes. This check should be removed. However, we still want to ensure that for every compute zone there is a matching volume zone.
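
A minimal sketch of the replacement check described above; the function name and plain string-slice inputs are assumptions, not the installer's actual validation code.

package main

import "fmt"

// validateZones replaces an exact-count comparison with the requirement that
// every compute availability zone has a matching volume availability zone.
// Extra volume-only zones (like "nova" above) are allowed.
func validateZones(computeZones, volumeZones []string) error {
	volumes := make(map[string]bool, len(volumeZones))
	for _, z := range volumeZones {
		volumes[z] = true
	}
	for _, z := range computeZones {
		if !volumes[z] {
			return fmt.Errorf("compute availability zone %q has no matching volume availability zone", z)
		}
	}
	return nil
}

func main() {
	err := validateZones(
		[]string{"AZ-0", "AZ-1", "AZ-2"},
		[]string{"nova", "AZ-0", "AZ-1", "AZ-2"},
	)
	fmt.Println(err) // <nil>: the environment above should validate
}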

Description of problem:

    If you allow the installer to provision a Power VS Workspace instead of bringing your own, it can sometimes fail when creating a network. This is because the Power Edge Router can take up to a minute to configure.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Infrequent, but will probably hit it within 50-100 runs

Steps to Reproduce:

    1. Install on Power VS with IPI with serviceInstanceGUID not set in the install-config.yaml
    2. Occasionally you'll observe a failure due to the workspace not being ready for networks
    

Actual results:

    Failure

Expected results:

    Success

Additional info:

    Not consistently reproducible

Began permafailing somewhere in https://amd64.ocp.releases.ci.openshift.org/releasestream/4.16.0-0.ci/release/4.16.0-0.ci-2024-03-14-214308

Example: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade/1768395075936587776

{ fail [github.com/openshift/origin/test/e2e/upgrade/upgrade.go:199]: cluster is still being upgraded: registry.build02.ci.openshift.org/ci-op-588yb1d9/release@sha256:98f570afbb8492d9b393eecc929266e987ba75088af72b234b81d2702d63f75e Ginkgo exit error 1: exit with code 1}

{Cluster did not complete upgrade: timed out waiting for the condition: Could not update customresourcedefinition "infrastructures.config.openshift.io" (48 of 887): the object is invalid, possibly due to local cluster configuration }

We suspect the latter message implicates https://github.com/openshift/api/pull/1802 and a revert is open now.

Slack thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1710501463301079

Please review the following PR: https://github.com/openshift/machine-api-provider-gcp/pull/73

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-39220. The following is the description of the original issue:

This is a clone of issue OCPBUGS-37491. The following is the description of the original issue:

Description of problem:

co/ingress always stays good even though the operator pod logs errors:

2024-07-24T06:42:09.580Z    ERROR    operator.canary_controller    wait/backoff.go:226    error performing canary route check    {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hongli-aws.qe.devcluster.openshift.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
    

Version-Release number of selected component (if applicable):

    4.17.0-0.nightly-2024-07-20-191204

How reproducible:

    100%

Steps to Reproduce:

    1. install AWS cluster
    2. update ingresscontroller/default and adding   "endpointPublishingStrategy.loadBalancer.allowedSourceRanges", eg

spec:
  endpointPublishingStrategy:
    loadBalancer:
      allowedSourceRanges:
      - 1.1.1.2/32

    3. The above setting drops most traffic to the LB, so some operators become degraded
    

Actual results:

    co/authentication and console degraded but co/ingress is still good

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.17.0-0.nightly-2024-07-20-191204   False       False         True       22m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.hongli-aws.qe.devcluster.openshift.com/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
 
console                                    4.17.0-0.nightly-2024-07-20-191204   False       False         True       22m     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hongli-aws.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.hongli-aws.qe.devcluster.openshift.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
 
ingress                                    4.17.0-0.nightly-2024-07-20-191204   True        False         False      3h58m   


check the ingress operator log and see:

2024-07-24T06:59:09.588Z    ERROR    operator.canary_controller    wait/backoff.go:226    error performing canary route check    {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hongli-aws.qe.devcluster.openshift.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}

Expected results:

    co/ingress status should reflect the real condition in a timely manner

Additional info:

    Even though the co/ingress status can be updated in some scenarios, it is always less sensitive than authentication and console. We end up relying on authentication/console to know whether routes are healthy, which makes the ingress canary route pointless.

 

Align with the version-less-ness of `rhel-coreos` and `fedora-coreos` and shorten the overall tag.

Both tags are currently aliases; `centos-stream-coreos-9` will be removed in the future.

This story tracks the routine i18n upload/download tasks which are performed every sprint.

A.C.

  • Upload strings to Memsource at the start of the sprint and reach out to the localization team
  • Download translated strings from Memsource when it is ready
  • Review the translated strings and open a pull request
  • Open a followup story for next sprint

Please review the following PR: https://github.com/openshift/bond-cni/pull/60

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Whenever I click on a card in the operator hub or developer hub, the console window refreshes.
    

Version-Release number of selected component (if applicable):

4.16
    

How reproducible:

Every time
    

Steps to Reproduce:

    1. Go to operator hub or developer hub
    2.  Select any card
    

Actual results:

Window refreshes
    

Expected results:

The window should not refresh and show the side panel for the card
    

Additional info:

    

Functionality around this is still inconsistent.

The correct format for a patch is `something.patch_something_else`

Valid patch filename examples would be

`something.patch`
`something.patch_something_else`
`something.patch.patch_something`

Invalid patch filename examples would be

`something.patch.something`
`something.patch.something.else`

Code and validation needs to be consistent in how this is respected.
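
A minimal sketch of a validator consistent with the examples above; the regex encodes one reading of the rule and the function name is hypothetical, not the project's actual validation code.

package main

import (
	"fmt"
	"regexp"
)

// patchNameRE accepts names where everything after the first ".patch" is
// either an underscore suffix or further ".patch..." segments, matching the
// valid and invalid examples listed above.
var patchNameRE = regexp.MustCompile(`^[^.]+(\.[^.]+)*\.patch(_[^.]*)?(\.patch(_[^.]*)?)*$`)

func isValidPatchName(name string) bool {
	return patchNameRE.MatchString(name)
}

func main() {
	for _, name := range []string{
		"something.patch",                 // valid
		"something.patch_something_else",  // valid
		"something.patch.patch_something", // valid
		"something.patch.something",       // invalid
		"something.patch.something.else",  // invalid
	} {
		fmt.Printf("%-35s %v\n", name, isValidPatchName(name))
	}
}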
 

Please review the following PR: https://github.com/openshift/machine-api-provider-azure/pull/90

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-41500. The following is the description of the original issue:

This is a clone of issue OCPBUGS-39081. The following is the description of the original issue:

If the network to the bootstrap VM is slow, the extract-machine-os.service can time out (after 180s). If this happens, it will be restarted but services that depend on it (like ironic) will never be started even once it succeeds. systemd added support for Restart:on-failure for Type:oneshot services, but they still don't behave the same way as other types of services.

This can be simulated in dev-scripts by doing:

sudo tc qdisc add dev ostestbm root netem rate 33Mbit

Please review the following PR: https://github.com/openshift/cloud-provider-kubevirt/pull/30

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33090. The following is the description of the original issue:

Description of problem:

When application grouping is unchecked in the display filters under the expand section, the topology display is distorted and the application name is also missing.
    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Have some deployments
    2. In topology unselect the application grouping in the display filter 
    3.
    

Actual results:

Topology shows distorted UI and Application name is missing.
    

Expected results:

The UI should render correctly and the application name should be present.
    

Additional info:

Screenshot:     

https://drive.google.com/file/d/1z80qLrr5v-K8ZFDa3P-n7SoDMaFtuxI7/view?usp=sharing

This is a clone of issue OCPBUGS-39414. The following is the description of the original issue:

This is a clone of issue OCPBUGS-30811. The following is the description of the original issue:

Description of problem:

On CI, all the software for the OpenStack- and Ansible-related pieces is taken from pip and ansible-galaxy instead of the OS repositories.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/openstack-cinder-csi-driver-operator/pull/149

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33662. The following is the description of the original issue:

Description of problem:

    We should not require the s3:DeleteObject permission for installs when the `preserveBootstrapIgnition` option is set in the install-config.

Version-Release number of selected component (if applicable):

    4.14+

How reproducible:

    always

Steps to Reproduce:

    1. Use an account without the permission
    2. Set `preserveBootstrapIgnition: true` in the install-config.yaml
    3. Try to deploy a cluster
    

Actual results:

INFO Credentials loaded from the "denys3" profile in file "/home/cloud-user/.aws/credentials"
INFO Consuming Install Config from target directory
WARNING Action not allowed with tested creds          action=s3:DeleteBucket
WARNING Action not allowed with tested creds          action=s3:DeleteObject
WARNING Action not allowed with tested creds          action=s3:DeleteObject
WARNING Tested creds not able to perform all requested actions
FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Permissions Check": validate AWS credentials: current credentials insufficient for performing cluster installation

Expected results:

    No permission errors.

Additional info:

    

Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver-operator/pull/106

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-storage-operator/pull/435

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Two quality-of-life improvements for e2e charts:

  • wrap tooltips in e2e chart to make sure that tooltips don't overflow X when a long label is used.
  • copy segment label contents to clipboard when its clicked

Tracker issue for bootimage bump in 4.16. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-30600.

Description of problem:

Multiline EC private and public keys are causing systemd services to fail during cluster installations. While this issue does not result in a complete cluster failure, it generates warnings in the `journalctl` logs, which could confuse users when diagnosing installation issues. The root cause is systemd's inability to properly parse multiline keys, leading to service crashes and unnecessary log noise. This should be addressed to improve the clarity of logs and prevent misleading warnings during the cluster setup process.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Always

Steps to Reproduce:

    1. Create a 4.16 cluster using ABI
    2. Observe the journalctl logs
    3. You will see warnings/errors mentioning that the EC keys (EC_PUBLIC_KEY_PEM) are not parsed correctly
    

Actual results:

Parsing errors related to EC keys    

Expected results:

    1. No parsing errors related to EC keys. Example: agent-register-infraenv.service: Ignoring invalid environment assignment
     2. No multiline public key in /usr/local/share/assisted-service/assisted-service.env.  Public key should be base64 encoded
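
A minimal sketch of the second expectation, assuming the key is base64-encoded before being written to the env file so that systemd sees a single-line value; the input path is a placeholder and this is not the assisted-service implementation.

package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

func main() {
	// Read the multiline PEM public key and emit a single-line env assignment;
	// the consumer would base64-decode it back to PEM before use.
	pemBytes, err := os.ReadFile("/etc/assisted/ec-public-key.pem") // placeholder path
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("EC_PUBLIC_KEY_PEM=%s\n", base64.StdEncoding.EncodeToString(pemBytes))
}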

Additional info:

    

Description of problem:

After installing an OpenShift IPI vSphere cluster, the coredns-monitor containers in the "openshift-vsphere-infra" namespace continuously report the message: "Failed to read ip from file /run/nodeip-configuration/ipv4" error="open /run/nodeip-configuration/ipv4: no such file or directory". The file "/run/nodeip-configuration/ipv4" present on the nodes is not actually mounted in the coredns pods. This does not appear to have any impact on the functionality of the cluster, but a "failed" message in the container logs can trigger alarms or prompt research into a non-existent problem.

Version-Release number of selected component (if applicable):

Any 4.12, 4.13, 4.14

How reproducible:

Always

Steps to Reproduce:

1. Install OpenShift IPI vSphere cluster
2. Wait for the installation to complete
3. Read the logs of any coredns-monitor container in the "openshift-vsphere-infra" namespace

Actual results:

coredns-monitor continuously reports the failed message, misleading a cluster administrator into searching for a real issue.

Expected results:

coredns-monitor should not report this failed message if there is nothing to fix.
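
As an illustration of that expectation, here is a minimal sketch of reading the node IP file while treating its absence as a normal condition; names and log levels are assumptions, not the actual coredns-monitor code.

package main

import (
	"errors"
	"log"
	"os"
	"strings"
)

// readNodeIP returns the node IP recorded by nodeip-configuration, treating a
// missing file as a normal condition rather than an error, since the file is
// not mounted into every pod.
func readNodeIP(path string) (string, bool) {
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		log.Printf("info: %s not present, falling back to interface addresses", path)
		return "", false
	}
	if err != nil {
		log.Printf("error: reading %s: %v", path, err)
		return "", false
	}
	return strings.TrimSpace(string(data)), true
}

func main() {
	if ip, ok := readNodeIP("/run/nodeip-configuration/ipv4"); ok {
		log.Printf("using node IP %s", ip)
	}
}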

Additional info:

The same issue happens in Baremetal IPI clusters.

This is a clone of issue OCPBUGS-32696. The following is the description of the original issue:

Description of problem:

In the OpenShift web console, Dashboards tab, data is not loaded for the "Prometheus/Overview" dashboard

Version-Release number of selected component (if applicable):

4.16.0-ec.5

How reproducible:

OCP 4.16.0-ec.5 cluster deployed on Power using UPI installer    

Steps to Reproduce:

1. Deploy 4.16.0-ec.5 cluster using UPI installer
2. Login to web console  
3. Select "Dashboards" panel under "Observe" tab
4. Select "Prometheus/Overview" from the "Dashboard" drop down
    

Actual results:

Data/graphs are not getting loaded. "No datapoints found." message is being displayed in all panels    

Expected results:

Data/Graphs should be displayed

Additional info:

Screenshots and must-gather.log are available at 
https://drive.google.com/drive/folders/1XnotzYBC_UDN97j_LNVygwrc77Tmmbtx?usp=drive_link     


Status of Prometheus pods:

[root@ha-416-sajam-bastion-0 ~]# oc get pods -n openshift-monitoring | grep prometheus
prometheus-adapter-dc7f96748-mczvq                       1/1     Running   0          3h18m
prometheus-adapter-dc7f96748-vl4n8                       1/1     Running   0          3h18m
prometheus-k8s-0                                         6/6     Running   0          7d2h
prometheus-k8s-1                                         6/6     Running   0          7d2h
prometheus-operator-677d4c87bd-8prnx                     2/2     Running   0          7d2h
prometheus-operator-admission-webhook-54549595bb-gp9bw   1/1     Running   0          7d3h
prometheus-operator-admission-webhook-54549595bb-lsb2p   1/1     Running   0          7d3h
[root@ha-416-sajam-bastion-0 ~]#

Logs of Prometheus pods are available at https://drive.google.com/drive/folders/13DhLsQYneYpouuSsxYJ4VFhVrdJfQx8P?usp=drive_link  

This is a clone of issue OCPBUGS-34959. The following is the description of the original issue:

Description of problem:

The tech preview jobs can sometimes fail: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial/1787262709813743616

It seems early on the pinnedimageset controller can panic: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial/1787262709813743616/artifacts/e2e-vsphere-ovn-techpreview-serial/gather-extra/artifacts/pods/openshift-machine-config-operator_machine-config-controller-66559c9856-58g4w_machine-config-controller_previous.log

Although it is fine on future syncs: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial/1787262709813743616/artifacts/e2e-vsphere-ovn-techpreview-serial/gather-extra/artifacts/pods/openshift-machine-config-operator_machine-config-controller-66559c9856-58g4w_machine-config-controller.log

Version-Release number of selected component (if applicable):

4.16.0 techpreview only    

How reproducible:

Unsure

Steps to Reproduce:

See CI

Actual results:

 

Expected results:

Don't panic

Additional info:

    

Description of problem:

OKD/FCOS uses FCOS for its bootimage which lacks several tools and services such as oc and crio that the rendezvous host of the Agent-based Installer needs to set up a bootstrap control plane.

Version-Release number of selected component (if applicable):

4.13.0
4.14.0
4.15.0

Description of the problem:
Environment: ACM running on a SNO. ACM is dual stack, but spoke cluster is IPv6 only

Attempting to deploy the spoke cluster via ZTP fails with proxy errors appearing in the ironic-python-agent logs:

Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 CRITICAL ironic-python-agent [-] Unhandled error: requests.exceptions.ProxyError: HTTPSConnectionPool(host='10.240.92.11', port=5050): Max retries exceeded with url: /v1/continue (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent Traceback (most recent call last):
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 696, in urlopen
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent self._prepare_proxy(conn)
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 964, in _prepare_proxy
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent conn.connect()
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 366, in connect
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent self._tunnel()
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent File "/usr/lib64/python3.9/http/client.py", line 930, in _tunnel
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent raise OSError(f"Tunnel connection failed:")
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent OSError: Tunnel connection failed: 403 Forbidden
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred:
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent Traceback (most recent call last):
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent resp = conn.urlopen(
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent retries = retries.increment(
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/util/retry.py", line 574, in increment
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent raise MaxRetryError(_pool, url, error or ResponseError(cause))
Mar 26 19:11:53 sancrvdu3.cran-openshift.bete.ericy.com podman[2825]: 2024-03-26 19:11:53.361 1 ERROR ironic-python-agent urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.240.92.11', port=5050): Max retries exceeded with url: /v1/continue (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))

How reproducible:
Always
 

Steps to reproduce:

1. Deploy ACM on a dual-stack SNO

2. Configure ACM for ZTP/GitOps

3. Use ZTP to deploy a spoke IPv6 only SNO

Actual results:
Proxy errors when the ironic agent attempts to communicate with ACM. The ironic-python-agent.conf incorrectly specifies the IPv4 endpoint:

 

$ cat /etc/ironic-python-agent.conf
[DEFAULT]
api_url = https://192.168.92.11:6385
inspection_callback_url = https://192.168.92.11:5050/v1/continue
insecure = True
enable_vlan_interfaces = all
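
For comparison, on an IPv6-only spoke the rendered configuration would be expected to carry bracketed IPv6 endpoints instead; a hypothetical example (the address below is made up):

[DEFAULT]
api_url = https://[fd02::b]:6385
inspection_callback_url = https://[fd02::b]:5050/v1/continue
insecure = True
enable_vlan_interfaces = all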

 

 
Expected results:
Spoke cluster is successfully deployed via ZTP.

Please review the following PR: https://github.com/openshift/prometheus-alertmanager/pull/87

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-capi-operator/pull/150

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a regression due to the fix for https://issues.redhat.com/browse/OCPBUGS-23069.

When using dual-stack networks with a network type other than OVN or SDN, a validation failure results. For example, when using this networking config:

networking:
  clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 25
    - cidr: fd01::/48
      hostPrefix: 64
  networkType: Calico

The following error will be returned:

{
  "id": "network-prefix-valid",
  "status": "failure",
  "message": "Unexpected status ValidationError"
},

When the clusterNetwork prefixes are removed the following error will result:

{
  "id": "network-prefix-valid",
  "status": "failure",
  "message": "Invalid Cluster Network prefix: Host prefix, now 0, must be a positive integer."
},

This is a clone of issue OCPBUGS-35039. The following is the description of the original issue:

Description of problem:

If there was no DHCP Network Name, the destroy code would skip deleting the DHCP resource. Now we add a check to see if the DHCP backing VM is in ERROR state and, if so, delete it.
    

Please review the following PR: https://github.com/openshift/must-gather/pull/395

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/prometheus/pull/195

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-35400. The following is the description of the original issue:

Description of problem:

Not specifying "kmsKeyServiceAccount" for controlPlane leads to failures creating the bootstrap and control-plane machines

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-multi-2024-06-12-211551

How reproducible:

Always

Steps to Reproduce:

1. "create install-config" and then insert disk encryption settings, but not set "kmsKeyServiceAccount" for controlPlane (see [2])
2. "create cluster" (see [3])

Actual results:

"create cluster" failed with below error: 

ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to create control-plane manifest: GCPMachine.infrastructure.cluster.x-k8s.io "jiwei-0613d-capi-84z69-bootstrap" is invalid: spec.rootDiskEncryptionKey.kmsKeyServiceAccount: Invalid value: "": spec.rootDiskEncryptionKey.kmsKeyServiceAccount in body should match '[-_[A-Za-z0-9]+@[-_[A-Za-z0-9]+.iam.gserviceaccount.com

Expected results:

Installation should succeed.

Additional info:

FYI the QE test case: 

OCP-61160 - [IPI-on-GCP] install cluster with different custom managed keys for control-plane and compute nodes https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-61160

Description of problem:


    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Description of problem:

    Routers are restarting due to memory issues

Version-Release number of selected component (if applicable):

    OCP 4.12.45

How reproducible:

    not easy

Routers restart due to memory issues:
~~~
3h40m       Warning   ProbeError   pod/router-default-56c9f67f66-j8xwn                        Readiness probe error: Get "http://localhost:1936/healthz/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)...
3h40m       Warning   Unhealthy    pod/router-default-56c9f67f66-j8xwn                        Readiness probe failed: Get "http://localhost:1936/healthz/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
3h40m       Warning   ProbeError   pod/router-default-56c9f67f66-j8xwn                        Liveness probe error: Get "http://localhost:1936/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)...
3h40m       Warning   Unhealthy    pod/router-default-56c9f67f66-j8xwn                        Liveness probe failed: Get "http://localhost:1936/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
3h40m       Normal    Killing      pod/router-default-56c9f67f66-j8xwn                        Container router failed liveness probe, will be restarted
3h40m       Warning   ProbeError   pod/router-default-56c9f67f66-j8xwn                        Readiness probe error: HTTP probe failed with statuscode: 500...
3h40m       Warning   Unhealthy    pod/router-default-56c9f67f66-j8xwn                        Readiness probe failed: HTTP probe failed with statuscode: 500
~~~

The node only hosts the router replica, and Prometheus confirms that the routers consume all of the memory in a short period of time (~20G within an hour).

At some point the number of haproxy processes increases and ends up consuming all memory resources, leading to a service disruption in a production environment.

As the console is one of the services with the highest activity according to the router stats, the customer has so far been deleting the console pod, which brings the process count down from 45 to 12.

The customer would like guidance on how to identify the process that is consuming the memory; haproxy monitoring is enabled but no dashboard is available.

Router stats from when the router has 8g/6g/3g of memory available have been requested.

Additional info:

 The customer claims this happens only in OCP 4.12.45; another active cluster is still on 4.10.39 and does not show the problem. The upgrade is blocked because of this.

Requested action:
* hard-stop-after might be an option, but the customer expects information about the side effects of this configuration (see the sketch after this list).
* How can console connections be reset from haproxy?
* Is there any documentation about haproxy Prometheus queries?
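
For reference, hard-stop-after is typically applied through an annotation on the default IngressController; a hedged sketch (the 30m value is only an example and should be validated for the environment):

$ oc -n openshift-ingress-operator annotate ingresscontrollers/default \
    ingress.operator.openshift.io/hard-stop-after=30m --overwrite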

Description of problem:

Fix spelling "Rememeber" to "Remember"

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/operator-framework/operator-marketplace/pull/554

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver-operator/pull/98

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Caught by the test: Undiagnosed panic detected in pod

Sample job run:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-azure-ovn-upgrade/1783981854974545920

Error message

{  pods/openshift-controller-manager_controller-manager-6b66bf5587-6ghjk_controller-manager.log.gz:E0426 23:06:02.367266       1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c6a2a0), concrete:(*abi.Type)(0x3e612c0), asserted:(*abi.Type)(0x419cdc0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret)
pods/openshift-controller-manager_controller-manager-6b66bf5587-6ghjk_controller-manager.log.gz:E0426 23:06:03.368403       1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c6a2a0), concrete:(*abi.Type)(0x3e612c0), asserted:(*abi.Type)(0x419cdc0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret)
pods/openshift-controller-manager_controller-manager-6b66bf5587-6ghjk_controller-manager.log.gz:E0426 23:06:04.370157       1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x3c6a2a0), concrete:(*abi.Type)(0x3e612c0), asserted:(*abi.Type)(0x419cdc0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Secret)}

Sippy indicates it's happening a small percentage of the time since around Apr 25th.

Took out the last payload so labeling trt-incident for now.

See the linked OCPBUG for the actual component.
 

Description of problem:

Installing a private cluster using Azure workload identity failed because no worker machines were provisioned.

install-config:
----------------------
platform:
  azure:
    region: eastus
    networkResourceGroupName: jima971b-12015319-rg
    virtualNetwork: jima971b-vnet
    controlPlaneSubnet: jima971b-master-subnet
    computeSubnet: jima971b-worker-subnet
    resourceGroupName: jima971b-rg
publish: Internal
credentialsMode: Manual

A detailed check of the cluster found that the machine-api/ingress/image-registry operators reported permission issues and had no access to the customer vnet.

$ oc get machine -n openshift-machine-api
NAME                                  PHASE     TYPE              REGION   ZONE   AGE
jima971b-qqjb7-master-0               Running   Standard_D8s_v3   eastus   2      5h14m
jima971b-qqjb7-master-1               Running   Standard_D8s_v3   eastus   3      5h14m
jima971b-qqjb7-master-2               Running   Standard_D8s_v3   eastus   1      5h15m
jima971b-qqjb7-worker-eastus1-mtc47   Failed                                      4h52m
jima971b-qqjb7-worker-eastus2-ph8bk   Failed                                      4h52m
jima971b-qqjb7-worker-eastus3-hpmvj   Failed                                      4h52m

Errors on worker machine:
--------------------
  errorMessage: 'failed to reconcile machine "jima971b-qqjb7-worker-eastus1-mtc47":
    network.SubnetsClient#Get: Failure responding to request: StatusCode=403 -- Original
    Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed"
    Message="The client ''705eb743-7c91-4a16-a7cf-97164edc0341'' with object id ''705eb743-7c91-4a16-a7cf-97164edc0341''
    does not have authorization to perform action ''Microsoft.Network/virtualNetworks/subnets/read''
    over scope ''/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima971b-12015319-rg/providers/Microsoft.Network/virtualNetworks/jima971b-vnet/subnets/jima971b-worker-subnet''
    or the scope is invalid. If access was recently granted, please refresh your credentials."'
  errorReason: InvalidConfiguration

After manually creating a custom role with the missing permissions for machine-api/ingress/cloud-controller-manager/image-registry and assigning it to the machine-api/ingress/cloud-controller-manager/image-registry user-assigned identities, scoped to the customer vnet, the cluster recovered and became running (see the sketch below the permission lists).

Permissions for machine-api/cloud-controller-manager/ingress on customer vnet:
"Microsoft.Network/virtualNetworks/subnets/read",
"Microsoft.Network/virtualNetworks/subnets/join/action"

Permissions for image-registry on customer vnet:
"Microsoft.Network/virtualNetworks/subnets/read",
"Microsoft.Network/virtualNetworks/subnets/join/action"
"Microsoft.Network/virtualNetworks/join/action"

Version-Release number of selected component (if applicable):

    4.15 nightly build

How reproducible:

    always on recent 4.15 payload

Steps to Reproduce:

    1. prepare install-config with private cluster configuration + credentialsMode: Manual
    2. using ccoctl tool to create workload identity
    3. install cluster
    

Actual results:

    Installation failed due to permission issues

Expected results:

    ccoctl also needs to assign the custom role to the machine-api/ccm/image-registry user-assigned identities, scoped to the customer vnet, when one is configured in the install-config

Additional info:

Issue is only detected on 4.15, it works on 4.14. 

This is a clone of issue OCPBUGS-39109. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38011. The following is the description of the original issue:

Description of problem:

  1. The form with Name and Role in 'Dev Console -> Project -> Project Access tab' (as it was until OCP 4.11) appears to have been changed to a form with Subject, Name, and Role through OCPBUGS-7800. Here, when the Subject is ServiceAccount, the Save button is not available unless a Project is selected.

This seems to make setting the Project/namespace a requirement. However, in the CLI, RoleBinding objects can be created without a namespace with no issues.

$ oc describe rolebinding.rbac.authorization.k8s.io/monitor
Name: monitor
Labels: <none>
Annotations: <none>
Role:
Kind: ClusterRole
Name: view
Subjects:
Kind Name Namespace
---- ---- ---------
ServiceAccount monitor
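
A sketch of the equivalent RoleBinding manifest that the CLI accepts, with the namespace intentionally omitted from the ServiceAccount subject:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: monitor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: ServiceAccount
  name: monitor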

This is inconsistent with the dev console, causing confusion for developers and administrators and making things cumbersome for administrators.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Login to the web console for Developer.
    2. Select Project on the left.
    3. Select 'Project Access' tab.
    4. Add access -> Select Service Account from the dropdown
   

Actual results:

   Save button is not active when no project is selected      

Expected results:

    The Save button should be enabled even when no Project is selected, so the RoleBinding can be created just as it is handled in the CLI.

Additional info:

    

Description of problem:

When external TCP traffic is IP-fragmented with no DF flag set and is targeted at a pod's external IP, the fragmented packets are answered with RST and are not delivered to the pod's application socket.
   
Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.14.8
Kustomize Version: v5.0.1
Server Version: 4.14.7
Kubernetes Version: v1.27.8+4fab27b
     
How reproducible:

I built a reproducer for this issue on a KVM-hosted OCP cluster.
I can simulate the same traffic as can be seen in the customer's network.
So we do have a solid reproducer for the issue.
Details are in the JIRA updates.
     
Steps to Reproduce:
I wrote a simple C-based tcp_server/tcp_client application for testing.
The client simply sends a file towards the server from a networking namespace with
disabled pmtu. The server app runs in a pod and simply waits for connections then reads the data from the socket and stores the received file into /tmp .
Along the way from the client namespace there is a veth pair with MTU 1000, while the path MTU is 1500.
This is enough to get IP packets fragmented on the way from the client to the server.
Details of the setup and testing steps are in the JIRA comments.  
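
Illustrative commands for the pieces of that setup (interface and namespace names are hypothetical; the full topology is in the JIRA comments):

# client namespace with path-MTU discovery disabled, so the client does not set DF
$ ip netns add tcp-client
$ ip netns exec tcp-client sysctl -w net.ipv4.ip_no_pmtu_disc=1
# a veth hop on the path lowered to MTU 1000 so that 1500-byte packets get fragmented in transit
$ ip link set dev veth-mid mtu 1000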

Actual results:

$ oc get network.operator -o yaml | grep routingViaHost
          routingViaHost: false
All fragmented packets are answered with a TCP RST and are not delivered to the
application socket in the pod.

Expected results:

Fragmented packets are delivered to the application socket running in a pod with
$ oc get network.operator -o yaml | grep routingViaHost
          routingViaHost: false
     

Additional info:

There is a workaround that prevents the issue.
$ oc get network.operator -o yaml | grep routingViaHost
          routingViaHost: true
Makes the fragmented traffic arrive at the application socket in the pod.
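
The workaround corresponds to switching OVN-Kubernetes to local gateway mode; a hedged sketch of the patch:

$ oc patch network.operator cluster --type=merge \
    -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":true}}}}}'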

I can assist with the reproducer and testing on the test env.
Regards Michal Tesar

This is a clone of issue OCPBUGS-30889. The following is the description of the original issue:

Description of problem:
Trying to delete an application deployed using Serverless, as a user with limited permissions, causes the "Delete application" form to complain:

pipelines.tekton.dev is forbidden: User "uitesting" cannot list resource "pipelines" in API group "tekton.dev" in the namespace "test-cluster-local"

This prevents the deletion. Worth adding that the cluster doesn't have Pipelines installed.
See the screenshot: https://drive.google.com/file/d/1bsQ_NFO_grj_fE-UInUJXum39bPsHJh1

Version-Release number of selected component (if applicable):

4.15.0
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Create a limited user
    2. Deploy some application, not necessarily a Serverless one
    3. Try to delete the "application" using the Dev Console
    

Actual results:

An irrelevant error is shown, preventing the deletion: pipelines.tekton.dev is forbidden: User "uitesting" cannot list resource "pipelines" in API group "tekton.dev" in the namespace "test-cluster-local"
    

Expected results:

The app should be removed, with everything that's labelled with it.
    

Please review the following PR: https://github.com/openshift/cloud-provider-gcp/pull/48

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

 The HyperShift CLI requires a security group ID when creating a nodepool; otherwise it fails to create the nodepool.

jiezhao-mac:hypershift jiezhao$ ./bin/hypershift create nodepool aws --name=test --cluster-name=jie-test --node-count=3
2024-02-20T11:29:19-05:00 ERROR Failed to create nodepool {"error": "security group ID was not specified and cannot be determined from default nodepool"}
github.com/openshift/hypershift/cmd/nodepool/core.(*CreateNodePoolOptions).CreateRunFunc.func1
	/Users/jiezhao/hypershift-test/hypershift/cmd/nodepool/core/create.go:39
github.com/spf13/cobra.(*Command).execute
	/Users/jiezhao/hypershift-test/hypershift/vendor/github.com/spf13/cobra/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
	/Users/jiezhao/hypershift-test/hypershift/vendor/github.com/spf13/cobra/command.go:1115
github.com/spf13/cobra.(*Command).Execute
	/Users/jiezhao/hypershift-test/hypershift/vendor/github.com/spf13/cobra/command.go:1039
github.com/spf13/cobra.(*Command).ExecuteContext
	/Users/jiezhao/hypershift-test/hypershift/vendor/github.com/spf13/cobra/command.go:1032
main.main
	/Users/jiezhao/hypershift-test/hypershift/main.go:78
runtime.main
	/usr/local/Cellar/go/1.20.4/libexec/src/runtime/proc.go:250
Error: security group ID was not specified and cannot be determined from default nodepool
security group ID was not specified and cannot be determined from default nodepool
jiezhao-mac:hypershift jiezhao$

Version-Release number of selected component (if applicable):
    

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    nodepool creation should succeed without security group specified in hypershift cli

Additional info:

    

Please review the following PR: https://github.com/openshift/csi-driver-shared-resource-operator/pull/94

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33410. The following is the description of the original issue:

Description of problem:

For IPI on vSphere: enable capi in the installer and install the cluster. After destroying the cluster, the destroy log displays "All folders deleted", but the cluster folder still exists in the vSphere
Client.

example:

05-08 20:24:38.765 level=debug msg=Delete Folder
05-08 20:24:40.649 level=debug msg=All folders deleted
05-08 20:24:40.649 level=debug msg=Delete StoragePolicy=openshift-storage-policy-wwei-0429g-fdwqc
05-08 20:24:41.576 level=info msg=Destroyed StoragePolicy=openshift-storage-policy-wwei-0429g-fdwqc
05-08 20:24:41.576 level=debug msg=Delete Tag=wwei-0429g-fdwqc
05-08 20:24:43.463 level=info msg=Deleted Tag=wwei-0429g-fdwqc
05-08 20:24:43.463 level=debug msg=Delete TagCategory=openshift-wwei-0429g-fdwqc
05-08 20:24:44.825 level=info msg=Deleted TagCategory=openshift-wwei-0429g-fdwqc

govc ls /DEVQEdatacenter/vm | grep wwei-0429g-fdwqc
/DEVQEdatacenter/vm/wwei-0429g-fdwqc

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-05-07-025557

How reproducible:

destroy a cluster installed with capi

Steps to Reproduce:

    1.install cluster with capi
    2.destroy cluster and check cluster folder in vSphere client
    

Actual results:

    cluster folder still exists.

Expected results:

    the cluster folder should not exist in the vSphere client after a successful destroy.

Additional info:

    

This is a clone of issue OCPBUGS-33963. The following is the description of the original issue:

Description of problem:

kube-apiserver was stuck updating versions when upgrading from 4.1 to 4.16 with an AWS IPI installation
    

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-05-01-111315
    

How reproducible:

    always
    

Steps to Reproduce:

    1. IPI Install an AWS 4.1 cluster, upgrade it to 4.16
    2. Upgrade was stuck in 4.15 to 4.16, waiting on etcd, kube-apiserver updating
    
    

Actual results:

   1. Upgrade was stuck in 4.15 to 4.16, waiting on etcd, kube-apiserver updating
   $ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-05-16-091947   True        True          39m     Working towards 4.16.0-0.nightly-2024-05-16-092402: 111 of 894 done (12% complete)

    

Expected results:

Upgrade should be successful.
    

Additional info:

Must-gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.1-aws-ipi-f30/1791391925467615232/artifacts/aws-ipi-f30/gather-must-gather/artifacts/must-gather.tar

Checked the must-gather logs, 
$ omg get clusterversion -oyaml
...
conditions:
  - lastTransitionTime: '2024-05-17T09:35:29Z'
    message: Done applying 4.15.0-0.nightly-2024-05-16-091947
    status: 'True'
    type: Available
  - lastTransitionTime: '2024-05-18T06:31:41Z'
    message: 'Multiple errors are preventing progress:

      * Cluster operator kube-apiserver is updating versions

      * Could not update flowschema "openshift-etcd-operator" (82 of 894): the server
      does not recognize this resource, check extension API servers'
    reason: MultipleErrors
    status: 'True'
    type: Failing

$ omg get co | grep -v '.*True.*False.*False'
NAME                                      VERSION                             AVAILABLE  PROGRESSING  DEGRADED  SINCE
kube-apiserver                            4.15.0-0.nightly-2024-05-16-091947  True       True         False     10m

$ omg get pod -n openshift-kube-apiserver
NAME                                               READY  STATUS     RESTARTS  AGE
installer-40-ip-10-0-136-146.ec2.internal          0/1    Succeeded  0         2h29m
installer-41-ip-10-0-143-206.ec2.internal          0/1    Succeeded  0         2h25m
installer-43-ip-10-0-154-116.ec2.internal          0/1    Succeeded  0         2h22m
installer-44-ip-10-0-154-116.ec2.internal          0/1    Succeeded  0         1h35m
kube-apiserver-guard-ip-10-0-136-146.ec2.internal  1/1    Running    0         2h24m
kube-apiserver-guard-ip-10-0-143-206.ec2.internal  1/1    Running    0         2h24m
kube-apiserver-guard-ip-10-0-154-116.ec2.internal  0/1    Running    0         2h24m
kube-apiserver-ip-10-0-136-146.ec2.internal        5/5    Running    0         2h27m
kube-apiserver-ip-10-0-143-206.ec2.internal        5/5    Running    0         2h24m
kube-apiserver-ip-10-0-154-116.ec2.internal        4/5    Running    17        1h34m
revision-pruner-39-ip-10-0-136-146.ec2.internal    0/1    Succeeded  0         2h44m
revision-pruner-39-ip-10-0-143-206.ec2.internal    0/1    Succeeded  0         2h50m
revision-pruner-39-ip-10-0-154-116.ec2.internal    0/1    Succeeded  0         2h52m
revision-pruner-40-ip-10-0-136-146.ec2.internal    0/1    Succeeded  0         2h29m
revision-pruner-40-ip-10-0-143-206.ec2.internal    0/1    Succeeded  0         2h29m
revision-pruner-40-ip-10-0-154-116.ec2.internal    0/1    Succeeded  0         2h29m
revision-pruner-41-ip-10-0-136-146.ec2.internal    0/1    Succeeded  0         2h26m
revision-pruner-41-ip-10-0-143-206.ec2.internal    0/1    Succeeded  0         2h26m
revision-pruner-41-ip-10-0-154-116.ec2.internal    0/1    Succeeded  0         2h26m
revision-pruner-42-ip-10-0-136-146.ec2.internal    0/1    Succeeded  0         2h24m
revision-pruner-42-ip-10-0-143-206.ec2.internal    0/1    Succeeded  0         2h23m
revision-pruner-42-ip-10-0-154-116.ec2.internal    0/1    Succeeded  0         2h23m
revision-pruner-43-ip-10-0-136-146.ec2.internal    0/1    Succeeded  0         2h23m
revision-pruner-43-ip-10-0-143-206.ec2.internal    0/1    Succeeded  0         2h23m
revision-pruner-43-ip-10-0-154-116.ec2.internal    0/1    Succeeded  0         2h23m
revision-pruner-44-ip-10-0-136-146.ec2.internal    0/1    Succeeded  0         1h35m
revision-pruner-44-ip-10-0-143-206.ec2.internal    0/1    Succeeded  0         1h35m
revision-pruner-44-ip-10-0-154-116.ec2.internal    0/1    Succeeded  0         1h35m

Checked the kube-apiserver-ip-10-0-154-116.ec2.internal logs; something seems wrong with the informers:
$ grep 'informers not started yet' current.log  | wc -l
360

$ grep 'informers not started yet' current.log 
2024-05-18T06:34:51.888804183Z [-]informer-sync failed: 4 informers not started yet: [*v1.PriorityLevelConfiguration *v1.Secret *v1.FlowSchema *v1.ConfigMap]
2024-05-18T06:34:51.889350484Z [-]informer-sync failed: 4 informers not started yet: [*v1.PriorityLevelConfiguration *v1.FlowSchema *v1.Secret *v1.ConfigMap]
2024-05-18T06:34:52.004808401Z [-]informer-sync failed: 2 informers not started yet: [*v1.FlowSchema *v1.PriorityLevelConfiguration]
2024-05-18T06:34:52.095516498Z [-]informer-sync failed: 2 informers not started yet: [*v1.PriorityLevelConfiguration *v1.FlowSchema]
...


    

Tracker issue for bootimage bump in 4.16. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-34693.

OCP version: 4.15.0

We have monitoring alerts configured against a cluster in our longevity setup.
After receiving alerts for metal3 - we examined the graph for the pod.

The graph indicates a continuous steady growth of memory consumption.

Description of problem:

Creation of a second hostedcluster in the same namespace fails with the error "failed to set secret''s owner reference" in the status of the second hostedcluster's yaml.

~~~
  conditions:
  - lastTransitionTime: "2024-04-02T06:57:18Z"
    message: 'failed to reconcile the CLI secrets: failed to set secret''s owner reference'
    observedGeneration: 1
    reason: ReconciliationError
    status: "False"
    type: ReconciliationSucceeded
~~~

Note that the hosted control plane namespace is still different for both clusters.

The customer is just following the doc https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.9/html/clusters/cluster_mce_overview#creating-a-hosted-cluster-bm for both clusters, and only the hostedcluster CR is created in the same namespace.

Version-Release number of selected component (if applicable):

    4.14

How reproducible:

    

Steps to Reproduce:

    1. Create a hostedcluster as per the doc https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.9/html/clusters/cluster_mce_overview#creating-a-hosted-cluster-bm
    2. Create another hostedcluster in the same namespace where the first hostedcluster was created.
    3. Second hostedcluster fails to proceed with the said error.
    

Actual results:

The hostedcluster creation fails

Expected results:

The hostedcluster creation should succeed

Additional info:

    

This is a clone of issue OCPBUGS-33758. The following is the description of the original issue:

Description of problem:

We have runbook for OVNKubernetesNorthdInactive: https://github.com/openshift/runbooks/blob/master/alerts/cluster-network-operator/OVNKubernetesNorthdInactive.md

But the runbook url is not added for alert OVNKubernetesNorthdInactive:
4.12: https://github.com/openshift/cluster-network-operator/blob/c1a891129c310d01b8d6940f1eefd26058c0f5b6/bindata/network/ovn-kubernetes/managed/alert-rules-control-plane.yaml#L350
4.13: https://github.com/openshift/cluster-network-operator/blob/257435702312e418be694f4b98b8fe89557030c6/bindata/network/ovn-kubernetes/managed/alert-rules-control-plane.yaml#L350
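
The expected fix is to add a runbook_url annotation to that alert rule in those files, along these lines (sketch only):

- alert: OVNKubernetesNorthdInactive
  annotations:
    runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-network-operator/OVNKubernetesNorthdInactive.md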

Version-Release number of selected component (if applicable):

4.12.z, 4.13.z

How reproducible:

always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

The following "error" shows up when running a gcp destroy:

"Invalid instance ci-op-nlm7chi8-8411c-4tl9r-master-0 in target pool af84a3203fc714c64a8043fdc814386f, target pool will not be destroyed"

It is a bit misleading as this alerts when the resource is simply not part of the cluster.    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

openshift-install is unable to generate an aarch64 iso:
              
FATAL failed to write asset (Agent Installer ISO) to disk: missing boot.catalog file 

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    100 %

Steps to Reproduce:

    1. Create an install_config.yaml with controlplane.architecture and compute.architecture = arm64 
    2. openshift-install agent create image  --log-level debug     
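
An illustrative install-config fragment for step 1 (all other fields omitted):

controlPlane:
  architecture: arm64
  name: master
compute:
- architecture: arm64
  name: worker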

Actual results:

DEBUG Generating Agent Installer ISO... 
INFO Consuming Install Config from target directory 
DEBUG Purging asset "Install Config" from disk 
INFO Consuming Agent Config from target directory 
DEBUG Purging asset "Agent Config" from disk 
DEBUG initDisk(): start DEBUG initDisk(): regular file 
FATAL failed to write asset (Agent Installer ISO) to disk: missing boot.catalog file 

Expected results:

    agent.aarch64.iso is created

Additional info:

Seems to be related to this PR:
https://github.com/openshift/installer/pull/7896

boot.catalog is also referenced in the assisted-image-service here:
https://github.com/openshift/installer/blob/master/vendor/github.com/openshift/assisted-image-service/pkg/isoeditor/isoutil.go#L155

Display a warning message from kube-apiserver when creating a resource (integration tests)

A.C.
Implement integration tests for displaying a warning toast notification after a create/update resource action

Additional:
Use the Cypress feature for stubbing the resource response and the `cy.intercept` method to invoke the resource creation endpoint

Please review the following PR: https://github.com/openshift/cloud-provider-ibm/pull/62

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

e2e tests are unable to create a prometheus client when legacy service account API tokens are not auto-generated.

Cloned for TRT incident tracking:

During jobs that upgrade to 4.16 from 4.15, the testing of unauthenticated build webhook invocation fails (I suspect due to the existing rolebindings from 4.15 surviving the upgrade).

[sig-builds][Feature:Builds][webhook] TestWebhook [apigroup:build.openshift.io][apigroup:image.openshift.io] [Suite:openshift/conformance/parallel] 
.
.
.
    STEP: testing unauthenticated forbidden webhooks @ 05/07/24 20:03:20.024
    STEP: executing the webhook to get the build object @ 05/07/24 20:03:20.024
    [FAILED] in [It] - github.com/openshift/origin/test/extended/builds/webhook.go:36 @ 05/07/24 20:03:20.148

 

Please review the following PR: https://github.com/openshift/egress-router-cni/pull/80

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-36479. The following is the description of the original issue:

Description of problem:

    As part of https://issues.redhat.com/browse/CFE-811, we added a featuregate "RouteExternalCertificate" to release the feature as TP, and all the code implementations were behind this gate.

However, it seems https://github.com/openshift/api/pull/1731 inadvertently duplicated "ExternalRouteCertificate" as "RouteExternalCertificate".

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    100%

Steps to Reproduce:

   $ oc get featuregates.config.openshift.io cluster -oyaml 
<......>
spec:
  featureSet: TechPreviewNoUpgrade
status:
  featureGates:
    enabled:
    - name: ExternalRouteCertificate
    - name: RouteExternalCertificate
<......>     

Actual results:

    Both RouteExternalCertificate and ExternalRouteCertificate were added in the API

Expected results:

We should have only one featuregate "RouteExternalCertificate" and the same should be displayed in https://docs.openshift.com/container-platform/4.16/nodes/clusters/nodes-cluster-enabling-features.html

Additional info:

 Git commits

https://github.com/openshift/api/commit/11f491c2c64c3f47cea6c12cc58611301bac10b3

https://github.com/openshift/api/commit/ff31f9c1a0e4553cb63c3e530e46a3e8d2e30930

Slack thread: https://redhat-internal.slack.com/archives/C06EK9ZH3Q8/p1719867937186219

Please review the following PR: https://github.com/openshift/operator-framework-catalogd/pull/36

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Upon debugging, nodes are stuck in NotReady state and CNI is not initialised on them.

Seeing the following error log in cluster network operator 

failed parsing certificate data from ConfigMap "openshift-service-ca.crt": failed to parse certificate PEM

CNO operator logs: https://docs.google.com/document/d/1hor1r9ue4gnetkXm9mh8AKa7vm8zNBPhUQqWCbbnnUc/edit?usp=sharing

This is happening on a management cluster that is configured to use legacy service CA's:

$ oc get kubecontrollermanager/cluster -o yaml --as system:admin
apiVersion: operator.openshift.io/v1
kind: KubeControllerManager
metadata:
  name: cluster
spec:
  logLevel: Normal
  managementState: Managed
  operatorLogLevel: Normal
  unsupportedConfigOverrides: null
  useMoreSecureServiceCA: false 

In newer clusters, useMoreSecureServiceCA is set to true.

Please review the following PR: https://github.com/openshift/prom-label-proxy/pull/364

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    Flake's gonna flake

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Run e2e test
    2. ...
    3. Profit
    

Actual results:

Red

Expected results:

Green

Additional info:

 

Description of problem:

Conditional risks have looser naming restrictions:

	// +kubebuilder:validation:Required
	// +kubebuilder:validation:MinLength=1
	// +required
	Name string `json:"name"`

...than condition Reason field:

	// +required
	// +kubebuilder:validation:Required
	// +kubebuilder:validation:MaxLength=1024
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:Pattern=`^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$`
	Reason string `json:"reason" protobuf:"bytes,5,opt,name=reason"`

CVO can use a risk name as a condition reason, so when the name of an applying risk is invalid, CVO will start trying to update the ClusterVersion resource with an invalid value.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

always

Steps to Reproduce:

Make the cluster consume update graph data containing a conditional edge with a risk with a name that does not follow the Condition.Reason restriction, e.g. uses a - character. The risk needs to apply on the cluster. For example:

{
  "nodes": [
    {"version": "CLUSTER-BOT-VERSION", "payload": "CLUSTER-BOT-PAYLOAD"},
    {"version": "4.12.22", "payload": "quay.io/openshift-release-dev/ocp-release@sha256:1111111111111111111111111111111111111111111111111111111111111111"}
  ],
  "conditionalEdges": [
    {
      "edges": [{"from": "CLUSTER-BOT-VERSION", "to": "4.12.22"}],
      "risks": [
        {
          "url": "https://github.com/openshift/api/blob/8891815aa476232109dccf6c11b8611d209445d9/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1515C4-L1520",
          "name": "OCPBUGS-9050",
          "message": "THere is no validation that risk name is a valid Condition.Reason so let's just use a - character in it.",
          "matchingRules": [{"type": "PromQL", "promql": { "promql": "group by (type) (cluster_infrastructure_provider)"}}]
        }
      ]
    }
  ]
}

Then, observe the ClusterVersion status field after the cluster has a chance to evaluate the risk:

$ oc get clusterversion version -o jsonpath='{.status.conditions}' | jq '.[] | select(.type=="Progressing" or .type=="Available" or .type=="Failing")'

Actual results:

{
  "lastTransitionTime": "2023-09-01T13:21:49Z",
  "status": "False",
  "type": "Available"
}
{
  "lastTransitionTime": "2023-09-01T13:21:49Z",
  "message": "ClusterVersion.config.openshift.io \"version\" is invalid: status.conditionalUpdates[0].conditions[0].reason: Invalid value: \"OCPBUGS-9050\": status.conditionalUpdates[0].conditions[0].reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$'",
  "status": "True",
  "type": "Failing"
}
{
  "lastTransitionTime": "2023-09-01T13:14:34Z",
  "message": "Error ensuring the cluster version is up to date: ClusterVersion.config.openshift.io \"version\" is invalid: status.conditionalUpdates[0].conditions[0].reason: Invalid value: \"OCPBUGS-9050\": status.conditionalUpdates[0].conditions[0].reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$'",
  "status": "False",
  "type": "Progressing"
}

Expected results:

No errors, CVO continues to work and either uses some sanitized version of the name as the reason, or maybe uses something generic, like RiskApplies.

CVO does not get stuck after consuming data from external source

Additional info:

1. We should run CI on PRs to o/cincinnati-graph-data so we never create invalid data
2. We should sanitize the field in CVO code so that CVO never attempts to submit an invalid ClusterVersion.status.conditionalUpdates.condition.reason
3. We should further restrict the conditional update risk name in the CRD so it is guaranteed compatible with Condition.Reason
4. We should sanitize the field in CVO code after it is read from OSUS so that CVO never attempts to submit an invalid (after we do 3) ClusterVersion.conditionalUpdates.name

Please review the following PR: https://github.com/openshift/whereabouts-cni/pull/212

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

On ipv6-primary dualstack, it is observed that the test:

"[sig-installer][Suite:openshift/openstack][lb][Serial] The Openstack platform should re-use an existing UDP Amphora LoadBalancer when new svc is created on Openshift with the proper annotation"

fails, because CCM is considering it as "internal":

I0216 10:13:07.053922       1 loadbalancer.go:2113] "EnsureLoadBalancer" cluster="kubernetes" service="e2e-test-openstack-sprfn/udp-lb-shared2-svc"
E0216 10:13:07.124915       1 controller.go:298] error processing service e2e-test-openstack-sprfn/udp-lb-shared2-svc (retrying with exponential backoff): failed to ensure load balancer: internal Service cannot share a load balancer
I0216 10:13:07.125445       1 event.go:307] "Event occurred" object="e2e-test-openstack-sprfn/udp-lb-shared2-svc" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: internal Service cannot share a load balancer"

However, neither LB has the annotation below:

"service.beta.kubernetes.io/openstack-internal-load-balancer": "true" 

Versions:
4.15.0-0.nightly-2024-02-14-052317
RHOS-16.2-RHEL-8-20230510.n.1

This is a clone of issue OCPBUGS-39313. The following is the description of the original issue:

This is a clone of issue OCPBUGS-29497. The following is the description of the original issue:

While updating an HC with a controllerAvailabilityPolicy of SingleReplica, the HCP doesn't fully roll out, with 3 pods stuck in Pending:

multus-admission-controller-5b5c95684b-v5qgd          0/2     Pending   0               4m36s
network-node-identity-7b54d84df4-dxx27                0/3     Pending   0               4m12s
ovnkube-control-plane-647ffb5f4d-hk6fg                0/3     Pending   0               4m21s

This is because these deployments all have requiredDuringSchedulingIgnoredDuringExecution zone anti-affinity and maxUnavailable: 25% (i.e. 1), as sketched below.

Thus the old pod blocks scheduling of the new pod.
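
A sketch of the combination that produces the deadlock (the label selector is hypothetical; fields abridged):

spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 25%
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: network-node-identity
            topologyKey: topology.kubernetes.io/zone

The replacement pod is created while the old one is still running, and the hard zone anti-affinity keeps it from scheduling into the same zone as the old pod, which matches the Pending pods shown above.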

Description of problem:

While attempting to provision 300 clusters every hour of mixed cluster sizes (SNO, compact, and standard), it appears that the metal3 baremetal operator has hit a failure to provision any clusters. Out of the 1850 attempted clusters, only 282 successfully provisioned (mostly SNO size).

There are many errors in the baremetal operator log, some of which are actual stack traces, but it is unclear whether this is the actual reason why the clusters began to fail to install, with 100% failing to install on the 3rd wave and beyond.


Version-Release number of selected component (if applicable):

Hub OCP - 4.14.0-rc.2
Deployed Cluster OCP - 4.14.0-rc.2
ACM - 2.9.0-DOWNSTREAM-2023-09-27-22-12-46

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

Some of the errors found in the logs:
{"level":"error","ts":"2023-09-28T22:39:56Z","msg":"Reconciler error","controller":"baremetalhost","controllerGroup":"metal3.io","controllerKind":"BareMetalHost","BareMetalHost":{"name":"vm01343","namespace":"compact-00046"},"namespace":"compact-00046","name":"vm01343","reconcileID":"4bbfa52f-12a6-4983-b86b-01086491de9f","error":"action \"provisioning\" failed: failed to provision: failed to change provisioning state to \"active\": Internal Server Error","errorVerbose":"Internal Server Error\nfailed to change provisioning state to \"active\"\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).tryChangeNodeProvisionState\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:740\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).changeNodeProvisionState\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:750\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).Provision\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:1604\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).actionProvisioning\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:1179\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).handleProvisioning\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:527\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:202\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:225\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598\nfailed to 
provision\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).actionProvisioning\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:1188\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).handleProvisioning\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:527\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:202\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:225\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598\naction \"provisioning\" 
failed\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:229\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226"}


{"level":"info","ts":"2023-09-29T16:11:24Z","logger":"provisioner.ironic","msg":"error caught while checking endpoint","host":"standard-00241~vm03618","endpoint":"https://metal3-state.openshift-machine-api.svc.cluster.local:6388/v1/","error":"Bad Gateway"}



Please review the following PR: https://github.com/openshift/installer/pull/7819

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33694. The following is the description of the original issue:

Description of problem:

kubelet does not start after reboot due to dependency issue

Version-Release number of selected component (if applicable):

 OCP 4.14.23
  

How reproducible:

    Every time at customer end

Steps to Reproduce:

    1. Upgrade Openshift cluster (OVN based) with kdump enabled to OCP 4.14.23
    2. Check kubelet and crio status 
    
    

Actual results:

    The kubelet and crio services are in a dead state and do not start automatically after reboot; manual intervention is needed.

$ cat sos_commands/crio/systemctl_status_crio 
○ crio.service - Container Runtime Interface for OCI (CRI-O)
     Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; preset: disabled)
    Drop-In: /etc/systemd/system/crio.service.d
             └─01-kubens.conf, 05-mco-ordering.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
     Active: inactive (dead)
       Docs: https://github.com/cri-o/cri-o
$ cat sos_commands/openshift/systemctl_status_kubelet
○ kubelet.service - Kubernetes Kubelet
     Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─01-kubens.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf
     Active: inactive (dead)

Expected results:

    kubelet and crio should start automatically.

Additional info:

I suspect the recent patch that waits until kdump starts has introduced an ordering cycle.

https://github.com/openshift/machine-config-operator/pull/4213/files

May 09 19:12:05 network01 systemd[1]: network-online.target: Found dependency on kdump.service/start
May 09 19:13:48 network01 systemd[1]: ovs-configuration.service: Found ordering cycle on kdump.service/start
May 09 19:13:48 network01 systemd[1]: ovs-configuration.service: Job kdump.service/start deleted to break ordering cycle starting with ovs-configuration.service/start
May 12 21:20:57 network01 systemd[1]: node-valid-hostname.service: Found dependency on kdump.service/start
May 12 21:21:00 network01 kdumpctl[1389]: kdump: kexec: loaded kdump kernel
May 12 21:21:00 network01 kdumpctl[1389]: kdump: Starting kdump: [OK]
May 12 21:25:28 network01 systemd[1]: kdump.service: Found ordering cycle on network-online.target/start
May 12 21:25:28 network01 systemd[1]: kdump.service: Found dependency on node-valid-hostname.service/start
May 12 21:25:28 network01 systemd[1]: kdump.service: Found dependency on ovs-configuration.service/start
May 12 21:25:28 network01 systemd[1]: kdump.service: Found dependency on kdump.service/start
May 12 21:25:28 network01 systemd[1]: kdump.service: Job network-online.target/start deleted to break ordering cycle starting with kdump.service/start
May 12 21:25:31 network01 kdumpctl[1284]: kdump: kexec: loaded kdump kernel
May 12 21:25:31 network01 kdumpctl[1284]: kdump: Starting kdump: [OK]

To break a cycle, systemd deletes a job that is part of the cycle, so the corresponding service is never started.
  Disabling kdump and rebooting the node works around the issue; kubelet and crio then start automatically.

# systemctl disable kdump

# systemctl reboot

Make sure systemctl list-jobs does not show any pending jobs; once it completes, we can check the status of kubelet.

# systemctl list-jobs

# systemctl status kubelet
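For diagnosis on an affected node, the ordering edges involved in the reported cycle can be inspected directly; a minimal sketch, with unit names taken from the journal excerpt above (output will vary by release):

# systemctl show -p After,Before kdump.service ovs-configuration.service node-valid-hostname.service

# journalctl -b | grep -i "ordering cycle"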

Description of problem:

Specify "metadataService.authentication: Required" config for cluster:

platform.aws.defaultMachinePlatform.metadataService.authentication: Required
Or 
compute.platform.aws.metadataService.authentication: Required
controlPlane.platform.aws.metadataService.authentication: Required
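
Expressed in install-config.yaml form, the same settings look roughly like the sketch below (pool names and layout are illustrative; the authoritative field paths are the ones listed above):

platform:
  aws:
    defaultMachinePlatform:
      metadataService:
        authentication: Required
controlPlane:
  name: master
  platform:
    aws:
      metadataService:
        authentication: Required
compute:
- name: worker
  platform:
    aws:
      metadataService:
        authentication: Required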


Creating cluster got following error:

INFO Created manifest *v1.Namespace, namespace= name=openshift-cluster-api-guests
...
INFO Creating Route53 records for control plane load balancer
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to create control-plane manifest: AWSMachine.infrastructure.cluster.x-k8s.io "yunjiang-cap4-fp88c-bootstrap" is invalid: spec.instanceMetadataOptions.httpTokens: Unsupported value: "Required": supported values: "optional", "required"
INFO Shutting down local Cluster API control plane...
...

"Required" is a valid value:
openshift-install explain installconfig.platform.aws.defaultMachinePlatform.metadataService.authentication
KIND: 	InstallConfig
VERSION:  v1

RESOURCE: <string>
  Valid Values: "Required","Optional"
...

    

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-05-08-222442
    

How reproducible:

    

Steps to Reproduce:

    1.See description
    2.
    3.
    

Actual results:

See description
    

Expected results:

No issues with above configurations

    

Additional info:

    

This is a clone of issue OCPBUGS-37837. The following is the description of the original issue:

In our vertical scaling test, after we delete a machine, we rely on the `status.readyReplicas` field of the ControlPlaneMachineSet (CPMS) to indicate that it has successfully created a new machine that lets us scale up before we scale down.
https://github.com/openshift/origin/blob/3deedee4ae147a03afdc3d4ba86bc175bc6fc5a8/test/extended/etcd/vertical_scaling.go#L76-L87

As we've seen in the past as well, that status field isn't a reliable indicator of the scale up of machines, as status.readyReplicas might stay at 3 as the soon-to-be-removed node that is pending deletion can go  Ready=Unknown in runs such as the following: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1286/pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-etcd-scaling/1808186565449486336

The test then ends up timing out while waiting for status.readyReplicas=4, even though the scale-up and scale-down may already have happened.
This shows up across scaling tests on all platforms as:

fail [github.com/openshift/origin/test/extended/etcd/vertical_scaling.go:81]: Unexpected error:
    <*errors.withStack | 0xc002182a50>: 
    scale-up: timed out waiting for CPMS to show 4 ready replicas: timed out waiting for the condition
    {
        error: <*errors.withMessage | 0xc00304c3a0>{
            cause: <wait.errInterrupted>{
                cause: <*errors.errorString | 0xc0003ca800>{
                    s: "timed out waiting for the condition",
                },
            },
            msg: "scale-up: timed out waiting for CPMS to show 4 ready replicas",
        }, 

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.17-e2e-azure-ovn-etcd-scaling/1811686448848441344

https://sippy.dptools.openshift.org/sippy-ng/jobs/4.17?filters=%257B%2522items%2522%253A%255B%257B%2522columnField%2522%253A%2522name%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522etcd-scaling%2522%257D%255D%252C%2522linkOperator%2522%253A%2522and%2522%257D&sort=asc&sortField=net_improvement

In hindsight, all we care about is whether the deleted machine's member is replaced by another machine's member, and we can ignore the flapping of node and machine statuses while we wait for the scale-up and then scale-down of members to happen. So we can relax or replace that check on status.readyReplicas with just looking at the membership change.

PS: We can also update the outdated Godoc comments for the test to mention that it relies on CPMSO to create a machine for us https://github.com/openshift/origin/blob/3deedee4ae147a03afdc3d4ba86bc175bc6fc5a8/test/extended/etcd/vertical_scaling.go#L34-L38

Description of problem:

    HyperShift control plane pods that support auditing (i.e. Kubernetes API server, OpenShift API server, and OpenShift oauth API server) maintain audit log files that may consume many GB of container ephemeral storage in a short period of time.

We need to reduce the size of the logs in these containers by modifying audit-log-maxbackup and audit-log-maxsize. This should not change the functionality of the audit logs, since all we do is output them to stdout in the container logs.
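
As an illustration of the kind of change described above, the relevant kube-apiserver arguments would look like the following sketch (the concrete values are assumptions, not the shipped defaults):

--audit-log-maxbackup=1    # keep at most one rotated audit log file on disk
--audit-log-maxsize=10     # rotate the audit log once it reaches 10 MB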

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Backport to 4.16 of AUTH-482 specifically for the openshift-manila-csi-driver.

Namespaces with workloads that need pinning:

  • openshift-manila-csi-driver

See 4.17 PR for more info on what needs pinning.
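
Assuming the pinning referred to here is the standard workload-partitioning namespace annotation (an assumption; the 4.17 PR is the authoritative reference), the namespace change would look roughly like:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-manila-csi-driver
  annotations:
    workload.openshift.io/allowed: management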

Please review the following PR: https://github.com/openshift/node_exporter/pull/141

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-node-driver-registrar/pull/65

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-34683. The following is the description of the original issue:

Description of problem:

When building https://github.com/kubevirt-ui/kubevirt-plugin from its release-4.16 branch, the following warnings are issued during the webpack build:

WARNING in shared module react

No required version specified and unable to automatically determine one. Unable to find required version for "react" in description file (/home/vszocs/work/kubevirt-plugin/node_modules/react/package.json). It need to be in dependencies, devDependencies or peerDependencies.

These warnings should not appear during the plugin build.

Root cause seems to be webpack module federation code which attempts to auto-detect actual build version of shared modules, but this code seems to be unreliable and warnings such as the one above are anything but helpful.

How reproducible: always on kubevirt-plugin branch release-4.16

Steps to Reproduce:

1. git clone https://github.com/kubevirt-ui/kubevirt-plugin
2. cd kubevirt-plugin
3. yarn && yarn dev

Backport to 4.16 of AUTH-482 specifically for the openshift-*-infra.

Namespaces with workloads that need pinning:

  • openshift-kni-infra
  • openshift-openstack-infra
  • openshift-vsphere-infra

See 4.17 PR for more info on what needs pinning.

Our test that watches for alerts we've never seen before has picked something up on https://amd64.ocp.releases.ci.openshift.org/releasestream/4.16.0-0.ci/release/4.16.0-0.ci-2024-02-02-031544

[sig-trt][invariant] No new alerts should be firing 0s

{ Found alerts firing which are new or less than two weeks old, which should not be firing: PrometheusOperatorRejectedResources has no test data, this alert appears new and should not be firing}

It hit about 3-4 out of 10 on both azure and aws agg jobs. Could be a regression, could be something really rare.

Description of problem:

For years, the TechPreviewNoUpgrade alert has used:

cluster_feature_set{name!="", namespace="openshift-kube-apiserver-operator"} == 0

But recently testing 4.12.19, I saw the alert pending with name="LatencySensitive". Alerting on that not-useful-since-4.6 feature set is fine (OCPBUGS-14497), but TechPreviewNoUpgrade isn't a great name when the actual feature set is LatencySensitive. And the summary and description don't apply to LatencySensitive either.

Version-Release number of selected component (if applicable):

The buggy expr / alertname pair shipped in 4.3.0.

How reproducible:

All the time.

Steps to Reproduce:

1. Install a cluster like 4.12.19.
2. Set the LatencySensitive feature set:

$ oc patch featuregate cluster --type=json --patch='[{"op":"add","path":"/spec/featureSet","value":"LatencySensitive"}]'

3. Check alerts with /monitoring/alerts?rowFilter-alert-source=platform&resource-list-text=TechPreviewNoUpgrade in the web console.

Actual results:

TechPreviewNoUpgrade is pending or firing.

Expected results:

Something appropriate to LatencySensitive, like a generic alert that covers all non-default feature sets, is pending or firing.
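
A minimal sketch of a more generic rule, reusing the existing expression and only generalizing the alert name and wording (the name, duration, and annotation text here are illustrative, not the shipped fix):

- alert: ClusterNonDefaultFeatureSet
  expr: cluster_feature_set{name!="", namespace="openshift-kube-apiserver-operator"} == 0
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: A non-default feature set is enabled on the cluster.
    description: Feature set "{{ $labels.name }}" is enabled on this cluster; see the FeatureGate documentation for its upgrade implications.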

Please review the following PR: https://github.com/openshift/azure-workload-identity/pull/8

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

ovnkube-node doesn't issue a CSR to get new certificates when the node is suspended for 30 days

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Setup a libvirt cluster on machine
    2. Disable chronyd on all nodes and host machine
    3. Suspend nodes
    4. Change time on host 30 days forward
    5. Resume nodes
    6. Wait for API server to come up
    7. Wait for all operators to become ready
    

Actual results:

ovnkube-node would attempt to use expired certs:  2024-01-21T01:24:41.576365431+00:00 stderr F I0121 01:24:41.573615    8852 master.go:740] Adding or Updating Node "test-infra-cluster-4832ebf8-master-0"
2024-04-20T01:25:08.519622252+00:00 stderr F I0420 01:25:08.516550    8852 services_controller.go:567] Deleting service openshift-operator-lifecycle-manager/packageserver-service
2024-04-20T01:25:08.900228370+00:00 stderr F I0420 01:25:08.898580    8852 services_controller.go:567] Deleting service openshift-operator-lifecycle-manager/packageserver-service
2024-04-20T01:25:17.137956433+00:00 stderr F I0420 01:25:17.137891    8852 obj_retry.go:296] Retry object setup: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp
2024-04-20T01:25:17.137956433+00:00 stderr F I0420 01:25:17.137933    8852 obj_retry.go:358] Adding new object: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp
2024-04-20T01:25:17.137997952+00:00 stderr F I0420 01:25:17.137979    8852 obj_retry.go:370] Retry add failed for *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp, will try again later: failed to obtain IPs to add remote pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: suppressed error logged: pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: no pod IPs found 
2024-04-20T01:25:19.099635059+00:00 stderr F I0420 01:25:19.099057    8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-master-1
2024-04-20T01:25:19.099635059+00:00 stderr F I0420 01:25:19.099080    8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-master-1: 35.077µs
2024-04-20T01:25:22.245550966+00:00 stderr F W0420 01:25:22.242774    8852 base_network_controller_namespace.go:458] Unable to remove remote zone pod's openshift-controller-manager/controller-manager-5485d88c84-xztxq IP address from the namespace address-set, err: pod openshift-controller-manager/controller-manager-5485d88c84-xztxq: no pod IPs found 
2024-04-20T01:25:22.262446336+00:00 stderr F W0420 01:25:22.261351    8852 base_network_controller_namespace.go:458] Unable to remove remote zone pod's openshift-route-controller-manager/route-controller-manager-6b5868f887-n6jj9 IP address from the namespace address-set, err: pod openshift-route-controller-manager/route-controller-manager-6b5868f887-n6jj9: no pod IPs found 
2024-04-20T01:25:27.154790226+00:00 stderr F I0420 01:25:27.154744    8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-worker-0
2024-04-20T01:25:27.154790226+00:00 stderr F I0420 01:25:27.154770    8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-worker-0: 31.72µs
2024-04-20T01:25:27.172301639+00:00 stderr F I0420 01:25:27.168666    8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-master-2
2024-04-20T01:25:27.172301639+00:00 stderr F I0420 01:25:27.168692    8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-master-2: 34.346µs
2024-04-20T01:25:27.196078022+00:00 stderr F I0420 01:25:27.194311    8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-master-0
2024-04-20T01:25:27.196078022+00:00 stderr F I0420 01:25:27.194339    8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-master-0: 40.027µs
2024-04-20T01:25:27.196078022+00:00 stderr F I0420 01:25:27.194582    8852 master.go:740] Adding or Updating Node "test-infra-cluster-4832ebf8-master-0"
2024-04-20T01:25:27.215435944+00:00 stderr F I0420 01:25:27.215387    8852 master.go:740] Adding or Updating Node "test-infra-cluster-4832ebf8-master-0"
2024-04-20T01:25:35.789830706+00:00 stderr F I0420 01:25:35.789782    8852 egressservice_zone_node.go:110] Processing sync for Egress Service node test-infra-cluster-4832ebf8-worker-1
2024-04-20T01:25:35.790044794+00:00 stderr F I0420 01:25:35.790025    8852 egressservice_zone_node.go:113] Finished syncing Egress Service node test-infra-cluster-4832ebf8-worker-1: 250.227µs
2024-04-20T01:25:37.596875642+00:00 stderr F I0420 01:25:37.596834    8852 iptables.go:358] "Running" command="iptables-save" arguments=["-t","nat"]
2024-04-20T01:25:47.138312366+00:00 stderr F I0420 01:25:47.138266    8852 obj_retry.go:296] Retry object setup: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp
2024-04-20T01:25:47.138382299+00:00 stderr F I0420 01:25:47.138370    8852 obj_retry.go:358] Adding new object: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp
2024-04-20T01:25:47.138453866+00:00 stderr F I0420 01:25:47.138440    8852 obj_retry.go:370] Retry add failed for *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp, will try again later: failed to obtain IPs to add remote pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: suppressed error logged: pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: no pod IPs found 
2024-04-20T01:26:17.138583468+00:00 stderr F I0420 01:26:17.138544    8852 obj_retry.go:296] Retry object setup: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp
2024-04-20T01:26:17.138640587+00:00 stderr F I0420 01:26:17.138629    8852 obj_retry.go:358] Adding new object: *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp
2024-04-20T01:26:17.138708817+00:00 stderr F I0420 01:26:17.138696    8852 obj_retry.go:370] Retry add failed for *v1.Pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp, will try again later: failed to obtain IPs to add remote pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: suppressed error logged: pod openshift-operator-lifecycle-manager/collect-profiles-28559595-wfdsp: no pod IPs found 
2024-04-20T01:26:39.474787436+00:00 stderr F I0420 01:26:39.474744    8852 reflector.go:790] k8s.io/client-go/informers/factory.go:159: Watch close - *v1.EndpointSlice total 130 items received
2024-04-20T01:26:39.475670148+00:00 stderr F E0420 01:26:39.475653    8852 reflector.go:147] k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.EndpointSlice: the server has asked for the client to provide credentials (get endpointslices.discovery.k8s.io)
2024-04-20T01:26:40.786339334+00:00 stderr F I0420 01:26:40.786255    8852 reflector.go:325] Listing and watching *v1.EndpointSlice from k8s.io/client-go/informers/factory.go:159
2024-04-20T01:26:40.806238387+00:00 stderr F W0420 01:26:40.804542    8852 reflector.go:535] k8s.io/client-go/informers/factory.go:159: failed to list *v1.EndpointSlice: Unauthorized
2024-04-20T01:26:40.806238387+00:00 stderr F E0420 01:26:40.804571    8852 reflector.go:147] k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Unauthorized
 

Expected results:

ovnkube-node detects that cert is expired, requests new certs via CSR flow and reloads them

Additional info:

CI periodic to check this flow: https://prow.ci.openshift.org/job-history/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ovn-sno-cert-rotation-suspend-30d
artifacts contain sosreport

Applies to SNO and HA clusters; works as expected when nodes are properly shut down instead of suspended

This is a clone of issue OCPBUGS-33733. The following is the description of the original issue:

Description of problem:

    When creating a Serverless Function via the Web Console from a Git repository, the validation claims that the builder strategy is not s2i. However, if the build strategy is not set in func.yaml, s2i should be assumed implicitly and there should be no error.

There should be an error only if the strategy is explicitly set to something other than s2i in func.yaml.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Try to create Serverless function from git repository where func.yaml does not explicitly specify builder.
    2. The Serverless Function cannot be created because of the validation.
 
    

Actual results:

    The Function cannot be created.

Expected results:

    The function can be created.

Additional info:

    

 

This is a clone of issue OCPBUGS-34877. The following is the description of the original issue:

Description of problem:

oc adm prune deployments does not work and gives the error below when using the --replica-sets option.

    [root@weyb1525 ~]# oc adm prune deployments --orphans --keep-complete=1 --keep-failed=0 --keep-younger-than=1440m --replica-sets --v=6
I0603 09:55:39.588085 1540280 loader.go:373] Config loaded from file:  /root/openshift-install/paas-03.build.net.intra.laposte.fr/auth/kubeconfig
I0603 09:55:39.890672 1540280 round_trippers.go:553] GET https://api-int.paas-03.build.net.intra.laposte.fr:6443/apis/apps.openshift.io/v1/deploymentconfigs 200 OK in 301 milliseconds
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
I0603 09:55:40.529367 1540280 round_trippers.go:553] GET https://api-int.paas-03.build.net.intra.laposte.fr:6443/apis/apps/v1/deployments 200 OK in 65 milliseconds
I0603 09:55:41.369413 1540280 round_trippers.go:553] GET https://api-int.paas-03.build.net.intra.laposte.fr:6443/api/v1/replicationcontrollers 200 OK in 706 milliseconds
I0603 09:55:43.083804 1540280 round_trippers.go:553] GET https://api-int.paas-03.build.net.intra.laposte.fr:6443/apis/apps/v1/replicasets 200 OK in 118 milliseconds
I0603 09:55:43.320700 1540280 prune.go:58] Creating deployment pruner with keepYoungerThan=24h0m0s, orphans=true, replicaSets=true, keepComplete=1, keepFailed=0
Dry run enabled - no modifications will be made. Add --confirm to remove deployments
panic: interface conversion: interface {} is *v1.Deployment, not *v1.DeploymentConfig

goroutine 1 [running]:
github.com/openshift/oc/pkg/cli/admin/prune/deployments.(*dataSet).GetDeployment(0xc007fa9bc0, {0x5052780?, 0xc00a0b67b0?})
        /go/src/github.com/openshift/oc/pkg/cli/admin/prune/deployments/data.go:171 +0x3d6
github.com/openshift/oc/pkg/cli/admin/prune/deployments.(*orphanReplicaResolver).Resolve(0xc006ec87f8)
        /go/src/github.com/openshift/oc/pkg/cli/admin/prune/deployments/resolvers.go:78 +0x1a6
github.com/openshift/oc/pkg/cli/admin/prune/deployments.(*mergeResolver).Resolve(0x55?)
        /go/src/github.com/openshift/oc/pkg/cli/admin/prune/deployments/resolvers.go:28 +0xcf
github.com/openshift/oc/pkg/cli/admin/prune/deployments.(*pruner).Prune(0x5007c40?, {0x50033e0, 0xc0083c19e0})
        /go/src/github.com/openshift/oc/pkg/cli/admin/prune/deployments/prune.go:96 +0x2f
github.com/openshift/oc/pkg/cli/admin/prune/deployments.PruneDeploymentsOptions.Run({0x0, 0x1, 0x1, 0x4e94914f0000, 0x1, 0x0, {0x0, 0x0}, {0x5002d00, 0xc000ba78c0}, ...})
        /go/src/github.com/openshift/oc/pkg/cli/admin/prune/deployments/deployments.go:206 +0xa03
github.com/openshift/oc/pkg/cli/admin/prune/deployments.NewCmdPruneDeployments.func1(0xc0005f4900?, {0xc0006db020?, 0x0?, 0x6?})
        /go/src/github.com/openshift/oc/pkg/cli/admin/prune/deployments/deployments.go:78 +0x118
github.com/spf13/cobra.(*Command).execute(0xc0005f4900, {0xc0006dafc0, 0x6, 0x6})
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:944 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc000e5b800)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:1068 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:992
k8s.io/component-base/cli.run(0xc000e5b800)
        /go/src/github.com/openshift/oc/vendor/k8s.io/component-base/cli/run.go:146 +0x317
k8s.io/component-base/cli.RunNoErrOutput(...)
        /go/src/github.com/openshift/oc/vendor/k8s.io/component-base/cli/run.go:84
main.main()
        /go/src/github.com/openshift/oc/cmd/oc/oc.go:77 +0x365 

Version-Release number of selected component (if applicable):

    

How reproducible:

   Run  oc adm prune deployments command with --replica-sets option
 #  oc adm prune deployments --keep-younger-than=168h --orphans --keep-complete=5 --keep-failed=1 --replica-sets=true

Actual results:

    It fails with the error below: panic: interface conversion: interface {} is *v1.Deployment, not *v1.DeploymentConfig

Expected results:

    It should not fail and should work as expected.

Additional info:

    Slack thread https://redhat-internal.slack.com/archives/CKJR6200N/p1717519017531979
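
Based on the stack trace above, the panic originates in the replica-set orphan resolver, so a possible interim workaround (an assumption, not a confirmed fix) is to run the prune without the --replica-sets flag:

# oc adm prune deployments --keep-younger-than=168h --orphans --keep-complete=5 --keep-failed=1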

Description of problem:

`ensureSigningCertKeyPair` and `ensureTargetCertKeyPair` always update the secret type. If the secret requires a metadata update, its previous content will not be retained.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Install a 4.6 cluster (or make sure installer-generated secrets have `type: SecretTypeTLS` instead of `type: kubernetes.io/tls`)
    2. Run secret sync
    3. Check secret contents
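
A minimal sketch of how the secret types can be inspected for step 3 (the namespace is illustrative; use whichever namespace holds the operator-managed secrets):

$ oc get secrets -n openshift-kube-apiserver-operator -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.type}{"\n"}{end}'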
    

Actual results:

    Secret was regenerated with new content

Expected results:

Existing content should be preserved, content is not modified

Additional info:

    This causes api-int CA update for clusters born in 4.6 or earlier.

Description of problem:

    When trying to run the console in local development with auth, the run-bridge.sh script fails.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Always

Steps to Reproduce:

    1. Follow step for local development of console with auth - https://github.com/openshift/console/tree/master?tab=readme-ov-file#openshift-with-authentication
    2.
    3.
    

Actual results:

The run-bridge.sh script fails with:

    
$ ./examples/run-bridge.sh
++ oc whoami --show-server
++ oc -n openshift-config-managed get configmap monitoring-shared-config -o 'jsonpath={.data.alertmanagerPublicURL}'
++ oc -n openshift-config-managed get configmap monitoring-shared-config -o 'jsonpath={.data.thanosPublicURL}'
+ ./bin/bridge --base-address=http://localhost:9000 --ca-file=examples/ca.crt --k8s-auth=openshift --k8s-mode=off-cluster --k8s-mode-off-cluster-endpoint=https://api.lprabhu-030420240903.devcluster.openshift.com:6443 --k8s-mode-off-cluster-skip-verify-tls=true --listen=http://127.0.0.1:9000 --public-dir=./frontend/public/dist --user-auth=openshift --user-auth-oidc-client-id=console-oauth-client --user-auth-oidc-client-secret-file=examples/console-client-secret --user-auth-oidc-ca-file=examples/ca.crt --k8s-mode-off-cluster-alertmanager=https://alertmanager-main-openshift-monitoring.apps.lprabhu-030420240903.devcluster.openshift.com --k8s-mode-off-cluster-thanos=https://thanos-querier-openshift-monitoring.apps.lprabhu-030420240903.devcluster.openshift.com
W0403 14:25:07.936281   49352 authoptions.go:99] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
F0403 14:25:07.936827   49352 main.go:539] Failed to create k8s HTTP client: failed to read token file "/var/run/secrets/kubernetes.io/serviceaccount/token": open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

Expected results:

    Bridge runs fine

Additional info:

    

Description of problem:

    Creating an ovn-k8s-cni-overlay NetworkAttachmentDefinition generates incorrect YAML

Version-Release number of selected component (if applicable):

   4.16.9

How reproducible:

    100%

Steps to Reproduce:

    1. Console
    2. Networking
    3. NAD
    4. Create
    5. Network type = OVN localnet
    

Actual results:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: openshift.io/     <--------- wrong
  creationTimestamp: "2024-09-13T04:58:15Z"
  generation: 1
  name: network-aware-ermine
  namespace: default
  resourceVersion: "43545754"
  uid: 543537e3-6981-4d43-a2cb-4f77b9b70824
spec:
  config: '{"name":"asdasdsa","type":"ovn-k8s-cni-overlay","cniVersion":"0.4.0","topology":"localnet","vlanID":3000,"mtu":1500,"netAttachDefName":"default/network-aware-ermine"}'

Expected results:

Without that annotation

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  creationTimestamp: "2024-09-13T04:58:15Z"
  generation: 1
  name: network-aware-ermine
  namespace: default
  resourceVersion: "43545754"
  uid: 543537e3-6981-4d43-a2cb-4f77b9b70824
spec:
  config: '{"name":"asdasdsa","type":"ovn-k8s-cni-overlay","cniVersion":"0.4.0","topology":"localnet","vlanID":3000,"mtu":1500,"netAttachDefName":"default/network-aware-ermine"}'

Additional info:

    This makes pods and virtual machines using the NAD fail to start with "`Invalid value: openshift.io/: name part must be non-empty`"

Description of problem:

The Dev console BuildConfig creation gets a [the server does not allow this method on the requested resource] error when metadata.namespace is not set

How reproducible:

The test case is shown below.

Steps to Reproduce:

Use the YAML below to create a BuildConfig in the GUI page of
openshift console -> Developer -> Builds -> Create BuildConfig -> yaml view


  
~~~
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: mywebsite
  labels:
    name: mywebsite
spec:
  triggers:
  - type: ImageChange
    imageChange: {}
  - type: ConfigChange
  source:
    type: Git
    git:
      uri: https://github.com/monodot/container-up
    contextDir: httpd-hello-world
  strategy:
    type: Docker
    dockerStrategy:
      dockerfilePath: Dockerfile
      from:
        kind: ImageStreamTag
        name: httpd:latest 
        namespace: testbuild
  output:
    to:
      kind: ImageStreamTag
      name: mywebsite:latest 
~~~

Actual results:

Get  [the server does not allow this method on the requested resource] error 

Expected results:

We found that not setting metadata.namespace does not trigger this error in CLI mode, and the customer reports that the 4.11 GUI console did not trigger it either. Does that mean the code changed in 4.13?
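
As a workaround sketch (an assumption based on this report, not a confirmed fix), explicitly setting metadata.namespace in the YAML view avoids relying on the console to infer it:

~~~
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: mywebsite
  namespace: testbuild   # set to the project the BuildConfig is created in
  labels:
    name: mywebsite
~~~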

Description of problem:

Creating a Serverless Deployment with the "Scaling" "Min Pods"/"Max Pods" options set uses the deprecated Knative annotations "autoscaling.knative.dev/minScale" / "maxScale";

the correct current ones are "autoscaling.knative.dev/min-scale" / "max-scale"

The same problem with "autoscaling.knative.dev/targetUtilizationPercentage" , which should be "autoscaling.knative.dev/target-utilization-percentage"

Prerequisites (if any, like setup, operators/versions):

Serverless operator

Steps to Reproduce

  1. Install serverless operator
  2. Create KnativeServing in knative-serving namespace
  3. create a test "foobar" namespace
  4. Go to <console>/deploy-image/ns/foobar
  5. Use gcr.io/knative-samples/helloworld-go  as "Image name from external registry" (or any webserver image listening on :8080)
  6. Choose "Serverless Deployment" for the "resource type"
  7. Click on "Scaling" in "Click on the names to access advanced options for ..."
  8. Set "2" for "Min Pods" and "3" for "Max Pods"
  9. Create

Actual results:

The created ksvc resource has

 spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: "3"
        autoscaling.knative.dev/minScale: "2"
        autoscaling.knative.dev/targetUtilizationPercentage: "70"

Expected results:

The created ksvc should have

spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/max-scale: "3"
        autoscaling.knative.dev/min-scale: "2"
        autoscaling.knative.dev/target-utilization-percentage: "70"

Reproducibility (Always/Intermittent/Only Once): Always

Build Details:

4.14.8

Workaround:

None required at the moment; the current Serverless release still supports the deprecated "minScale"/"maxScale" annotations.

Additional info:

https://docs.openshift.com/serverless/1.31/knative-serving/autoscaling/serverless-autoscaling-developer-scale-bounds.html

https://issues.redhat.com/browse/SRVKS-910

 

Description of problem:

When new nodes are scaled up, the PinnedImageSets status in the MCP takes a long while to be updated.

Nevertheless, after a long time (sometimes about 30mins) they are correctly updated.

    

Version-Release number of selected component (if applicable):

pre-merge: https://github.com/openshift/machine-config-operator/pull/4303
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Create a PinnedImageSet for the worker pool with only one image, so that it is fast (see the sketch after these steps)
    2. Wait until the pinned image is pinned in all worker nodes
    3. Scale up 10 new worker nodes, or thereabouts
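
A minimal sketch of the PinnedImageSet referred to in step 1, assuming the tech-preview machineconfiguration.openshift.io/v1alpha1 API; the name, label, and image reference are placeholders, and how the set is associated with the worker pool depends on the MCO version:

apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: PinnedImageSet
metadata:
  name: worker-pinned-images
  labels:
    machineconfiguration.openshift.io/role: worker   # placeholder association with the worker pool
spec:
  pinnedImages:
  - name: quay.io/example/single-image@sha256:1111111111111111111111111111111111111111111111111111111111111111   # placeholder image reference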
    

Actual results:

When all new nodes are created, the image is correctly pinned in all of them, but the status in the MCP is not fully synced until a long time later. Sometimes even 25 or 30 minutes.

oc get mcp worker -o yaml
....
 poolSynchronizersStatus:
  - availableMachineCount: 12
    machineCount: 12
    poolSynchronizerType: PinnedImageSets
    readyMachineCount: 12
    unavailableMachineCount: 0
    updatedMachineCount: 10
  readyMachineCount: 12

    

Expected results:

The status in the MCP should be updated earlier once all nodes have finished pinning the image.
    

Additional info:

    

Description of the problem:

Per the latest decision, RH is not going to support installing an OCP cluster on Nutanix with nested virtualization. Thus, the "Install OpenShift Virtualization" checkbox on the "Operators" page should be disabled when the "Nutanix" platform is selected on the "Cluster Details" page.

Slack discussion thread

https://redhat-internal.slack.com/archives/C0211848DBN/p1706640683120159

Nutanix
https://portal.nutanix.com/page/documents/kbs/details?targetId=kA00e000000XeiHCAS

 

    BMH is the custom resource that is used to add a new host. Agent on the
    other hand is created automatically when a host registers. Since there
    is a need to control agent labels the following agent label support was
    added:
    
    In order to add an entry that controls agent label, a new BMH annotation
    needs to be added.
    The annotation key is prefixed with the string
    'bmac.agent-install.openshift.io.agent-label.'.  The remainder of the
    annotation is considered the label key.
    The value of the annotation is a JSON dictionary with 2 possible keys.
    The key 'operation' can contain one of the values ["add","delete"] which
    mean that the label can either added , or deleted.
    The dictionary key 'value' contains the label value.
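
A minimal sketch of a BMH using this annotation (only the relevant metadata is shown; host name, label key, and label value are illustrative):

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0
  annotations:
    # prefix + label key; the JSON value carries the operation and the label value
    bmac.agent-install.openshift.io.agent-label.environment: '{"operation":"add","value":"production"}'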

Please review the following PR: https://github.com/openshift/csi-external-attacher/pull/66

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The degradation of the storage operator occurred because it couldn't locate the node by UUID. I noticed that the providerID was present for node 0, but it was blank for other nodes. A successful installation can be achieved on day 2 by executing step 4 after step 7 from this document: https://access.redhat.com/solutions/6677901. Additionally, if we provide credentials from the install-config, it's necessary to add a taint to the node using the uninitialized taint (oc adm taint node "$NODE" node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule) after the bootstrap has completed.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

100%

Steps to Reproduce:

    1. Create an agent ISO image
    2. Boot the created ISO on vSphere VM    

Actual results:

Installation is failing due to storage operator unable to find the node by UUID.

Expected results:

Storage operator should be installed without any issue.

Additional info:

Slack discussion: https://redhat-internal.slack.com/archives/C02SPBZ4GPR/p1702893456002729

This is a clone of issue OCPBUGS-39110. The following is the description of the original issue:

This is a clone of issue OCPBUGS-29528. The following is the description of the original issue:

Description of problem:

Camel K provides a list of Kamelets that are able to act as an event source or sink for a Knative eventing message broker.

Usually the list of Kamelets installed with the Camel K operator is displayed in the Developer Catalog list of available event sources with the provider "Apache Software Foundation" or "Red Hat Integration".

When a user adds a custom Kamelet custom resource to the user namespace the list of default Kamelets coming from the Camel K operator is gone. The Developer Catalog event source list then only displays the custom Kamelet but not the default ones.

Version-Release number of selected component (if applicable):

    

How reproducible:

Apply a custom Kamelet custom resource to the user namespace and open the list of available event sources in Dev Console Developer Catalog.

Steps to Reproduce:

    1. install global Camel K operator in operator namespace (e.g. openshift-operators)
    2. list all available event sources in "default" user namespace and see all Kamelets listed as event sources/sinks
    3. add a custom Kamelet custom resource to the default namespace
    4. see the list of available event sources only listing the custom Kamelet and the default Kamelets are gone from that list
    

Actual results:

Default Kamelets that act as event source/sink are only displayed in the Developer Catalog when there is no custom Kamelet added to a namespace.    

Expected results:

Default Kamelets coming with the Camel K operator (installed in the operator namespace) should always be part of the Developer Catalog list of available event sources/sinks. When the user adds more custom Kamelets these should be listed, too.   

Additional info:

Reproduced with Camel K operator 2.2 and OCP 4.14.8

screenshots: https://drive.google.com/drive/folders/1mTpr1IrASMT76mWjnOGuexFr9-mP0y3i?usp=drive_link

 

Red Hat OpenShift Container Platform subscriptions are often measured against underlying cores. However, the metrics for cores are unreliable with some known edge cases. Namely, when virtualization is used, depending on a variety of factors, the hypervisor doesn't report the underlying cores and instead reports a core per "cpu", where "cpu" is a schedulable executor (possibly backed by a single hyperthreaded executor). To address this, we assume a ratio of 2 vCPUs to 1 core and divide the "cores" value by 2 to normalize when we detect that hyperthreading information was not reported, when we're on the x86-64 CPU architecture, and when the cluster is not a bare-metal cluster.

At this time, x86-64 virtualized clusters are the ones affected.

This is a clone of issue OCPBUGS-36681. The following is the description of the original issue:

Description of problem:

Azure HC fails to create AzureMachineTemplate if a MachineIdentityID is not provided. 

E0705 19:09:23.783858       1 controller.go:329] "Reconciler error" err="failed to parse ProviderID : invalid resource ID: id cannot be empty" controller="azuremachinetemplate" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AzureMachineTemplate" AzureMachineTemplate="clusters-hostedcp-1671-hc/hostedcp-1671-hc-f412695a" namespace="clusters-hostedcp-1671-hc" name="hostedcp-1671-hc-f412695a" reconcileID="74581db2-0ac0-4a30-abfc-38f07b8247cc"

https://github.com/openshift/hypershift/blob/84f594bd2d44e03aaac2d962b0d548d75505fed7/hypershift-operator/controllers/nodepool/azure.go#L52 does not check first to see if a MachineIdentityID was provided before adding the UserAssignedIdentity field.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Every time

Steps to Reproduce:

    1. Create an Azure HC without a MachineIdentityID
    

Actual results:

    Azure HC fails to create AzureMachineTemplate properly, nodes aren't created, and HC is in a failed state.

Expected results:

     Azure HC creates AzureMachineTemplate properly, nodes are created, and HC is in a completed state.

Additional info:

    

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Currently the openshift-baremetal-install binary is dynamically linked to libvirt-client, meaning that it is only possible to run it on a RHEL system with libvirt installed.

A new version of the libvirt bindings, v1.8010.0, allows the library to be loaded only on demand, so that users who do not execute any libvirt code can run the rest of the installer without needing to install libvirt. (See this comment from Dan Berrangé.) In practice, the "rest of the installer" is everything except the baremetal destroy cluster command (which destroys the bootstrap storage pool - though only if the bootstrap itself has already been successfully destroyed - and has probably never been used by anybody ever). The Terraform providers all run in a separate binary.

There is also a pure-go libvirt library that can be used even within a statically-linked binary on any platform, even when interacting with libvirt. The libvirt terraform provider that does almost all of our interaction with libvirt already uses this library.

This is a clone of issue OCPBUGS-34416. The following is the description of the original issue:

Description of problem:

    Specifying N2D machine types for compute and controlPlane machines, with "confidentialCompute: Enabled", "create cluster" got the error "Confidential Instance Config is only supported for compatible cpu platforms" [1], while the real cause is the missing setting "onHostMaintenance: Terminate". That being said, the 4.16 error is misleading; suggest being consistent with the 4.15 [2] / 4.14 [3] error messages.

FYI Confidential VM is supported on N2D machine types (see https://cloud.google.com/confidential-computing/confidential-vm/docs/supported-configurations#machine-type-cpu-zone).
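
For reference, a sketch of settings that combine "confidentialCompute: Enabled" with the "onHostMaintenance: Terminate" setting mentioned above (machine type and layout are illustrative):

platform:
  gcp:
    defaultMachinePlatform:
      type: n2d-standard-4
      confidentialCompute: Enabled
      onHostMaintenance: Terminate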

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-05-21-221942    

How reproducible:

Always    

Steps to Reproduce:

    1. Please refer to [1]    

Actual results:

    The error message is like "Confidential Instance Config is only supported for compatible cpu platforms", which is misleading.

Expected results:

    The 4.15 [2] / 4.14 [3] error messages look better.

Additional info:

    FYI it is about QE test case OCP-60212 scenario b.

Description of the problem:

Some requests of this type:

  {"x_request_id":"05e66411-7612-46bb-86c2-69bf7096b6da","protocol":"HTTP/1.1","authority":"zoscaru4s08w1mz.api.openshift.com","user_agent":"Go-http-client/2.0","method":"GET","response_flags":"UC","x_forwarded_for":"163.244.72.2,10.128.10.16,23.21.192.204","bytes_rx":0,"duration":13,"bytes_tx":95,"response_code":503,"timestamp":"2024-01-31T15:52:41.418Z","upstream_duration":null,"path":"/api/assisted-install/v2/infra-envs/84596f7d-0138-4f57-ada4-be72aea031a5/hosts/62148a92-8588-b591-5f7e-046bf1136b3b/instructions?timestamp=1706716359"}

are causing 503s because the application crashes. Fortunately it's only a goroutine that crashes, so the main loop is still going and other requests seem unaffected.

How reproducible:

Not sure what conditions we need to meet but plenty of requests of this type can be found from prod logs

 

Steps to reproduce:

1.

2.

3.

Actual results:

2024/01/31 16:41:31 http: panic serving 127.0.0.1:39486: runtime error: invalid memory address or nil pointer dereference
goroutine 575931 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1854 +0xbf
panic({0x4369120, 0x6bee680})
	/usr/local/go/src/runtime/panic.go:890 +0x263
github.com/openshift/assisted-service/internal/host/hostcommands.(*tangConnectivityCheckCmd).getTangServersFromHostIgnition(0x34?, 0x484da60?)
	/assisted-service/internal/host/hostcommands/tang_connectivity_check_cmd.go:41 +0x3e
github.com/openshift/assisted-service/internal/host/hostcommands.(*tangConnectivityCheckCmd).GetSteps(0xc000a5c440, {0xc00057c1c8?, 0x48adb49?}, 0xc000f5d180)
	/assisted-service/internal/host/hostcommands/tang_connectivity_check_cmd.go:104 +0x112
github.com/openshift/assisted-service/internal/host/hostcommands.(*InstructionManager).GetNextSteps(0xc000ad0780, {0x50cf558, 0xc00444b4a0}, 0xc000f5d180)
	/assisted-service/internal/host/hostcommands/instruction_manager.go:178 +0xa2f
github.com/openshift/assisted-service/internal/host.(*Manager).GetNextSteps(0xc0003e4990?, {0x50cf558?, 0xc00444b4a0?}, 0x0?)
	/assisted-service/internal/host/host.go:548 +0x48
github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).V2GetNextSteps(0xc000da4800, {0x50cf558, 0xc00444b4a0}, {0xc00203b500, 0xc002b9f460, {0xc001daa690, 0x24}, {0xc001daa6f0, 0x24}, 0xc0015c8158})
	/assisted-service/internal/bminventory/inventory.go:5357 +0x1b8
github.com/openshift/assisted-service/restapi.HandlerAPI.func54({0xc00203b500, 0xc002b9f460, {0xc001daa690, 0x24}, {0xc001daa6f0, 0x24}, 0xc0015c8158}, {0x408d980?, 0xc000fb67e0?})
	/assisted-service/restapi/configure_assisted_install.go:654 +0xf4
github.com/openshift/assisted-service/restapi/operations/installer.V2GetNextStepsHandlerFunc.Handle(0xc00203b300?, {0xc00203b500, 0xc002b9f460, {0xc001daa690, 0x24}, {0xc001daa6f0, 0x24}, 0xc0015c8158}, {0x408d980, 0xc000fb67e0})
	/assisted-service/restapi/operations/installer/v2_get_next_steps.go:19 +0x7a
github.com/openshift/assisted-service/restapi/operations/installer.(*V2GetNextSteps).ServeHTTP(0xc00169e468, {0x50bfe00, 0xc001399d20}, 0xc00203b500)
	/assisted-service/restapi/operations/installer/v2_get_next_steps.go:66 +0x2dd
github.com/go-openapi/runtime/middleware.NewOperationExecutor.func1({0x50bfe00, 0xc001399d20}, 0xc00203b500)
	/assisted-service/vendor/github.com/go-openapi/runtime/middleware/operation.go:28 +0x59
net/http.HandlerFunc.ServeHTTP(0x0?, {0x50bfe00?, 0xc001399d20?}, 0x17334b7?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/openshift/assisted-service/internal/metrics.Handler.func1.1()
	/assisted-service/internal/metrics/reporter.go:37 +0x31
github.com/slok/go-http-metrics/middleware.Middleware.Measure({{{0x50bfdd0, 0xc0014ce0e0}, {0x48922ea, 0x12}, 0x0, 0x0, 0x0}}, {0x0, 0x0}, {0x50d23c0, ...}, ...)
	/assisted-service/vendor/github.com/slok/go-http-metrics/middleware/middleware.go:117 +0x30e
github.com/openshift/assisted-service/internal/metrics.Handler.func1({0x50cdea0?, 0xc0005ec770}, 0xc00203b500)
	/assisted-service/internal/metrics/reporter.go:36 +0x35f
net/http.HandlerFunc.ServeHTTP(0x50cf558?, {0x50cdea0?, 0xc0005ec770?}, 0xc002b9ee80?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/openshift/assisted-service/pkg/context.ContextHandler.func1.1({0x50cdea0, 0xc0005ec770}, 0xc00203b400)
	/assisted-service/pkg/context/param.go:95 +0xc8
net/http.HandlerFunc.ServeHTTP(0xc001735430?, {0x50cdea0?, 0xc0005ec770?}, 0xc0026625f8?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/go-openapi/runtime/middleware.NewRouter.func1({0x50cdea0, 0xc0005ec770}, 0xc00203b200)
	/assisted-service/vendor/github.com/go-openapi/runtime/middleware/router.go:77 +0x257
net/http.HandlerFunc.ServeHTTP(0x7fc8dc2c3820?, {0x50cdea0?, 0xc0005ec770?}, 0xc00009f000?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/go-openapi/runtime/middleware.Redoc.func1({0x50cdea0, 0xc0005ec770}, 0x418b480?)
	/assisted-service/vendor/github.com/go-openapi/runtime/middleware/redoc.go:72 +0x242
net/http.HandlerFunc.ServeHTTP(0x1?, {0x50cdea0?, 0xc0005ec770?}, 0x0?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/go-openapi/runtime/middleware.Spec.func1({0x50cdea0, 0xc0005ec770}, 0x486ac6f?)
	/assisted-service/vendor/github.com/go-openapi/runtime/middleware/spec.go:46 +0x18c
net/http.HandlerFunc.ServeHTTP(0xc001136380?, {0x50cdea0?, 0xc0005ec770?}, 0xc00203b200?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/rs/cors.(*Cors).Handler.func1({0x50cdea0, 0xc0005ec770}, 0xc00203b200)
	/assisted-service/vendor/github.com/rs/cors/cors.go:281 +0x1c4
net/http.HandlerFunc.ServeHTTP(0x0?, {0x50cdea0?, 0xc0005ec770?}, 0x4?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1({0x50cd990, 0xc0012e8540}, 0xc0017358f0?)
	/assisted-service/vendor/github.com/NYTimes/gziphandler/gzip.go:336 +0x24e
net/http.HandlerFunc.ServeHTTP(0x100c0017359e8?, {0x50cd990?, 0xc0012e8540?}, 0x10?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/openshift/assisted-service/pkg/app.WithMetricsResponderMiddleware.func1({0x50cd990?, 0xc0012e8540?}, 0x16f0a19?)
	/assisted-service/pkg/app/middleware.go:32 +0xb0
net/http.HandlerFunc.ServeHTTP(0xc000a26900?, {0x50cd990?, 0xc0012e8540?}, 0xc00444a7e0?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/openshift/assisted-service/pkg/app.WithHealthMiddleware.func1({0x50cd990?, 0xc0012e8540?}, 0x5094901?)
	/assisted-service/pkg/app/middleware.go:55 +0x162
net/http.HandlerFunc.ServeHTTP(0x50cf4b0?, {0x50cd990?, 0xc0012e8540?}, 0x5094980?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
github.com/openshift/assisted-service/pkg/requestid.handler.ServeHTTP({{0x50aa9c0?, 0xc0008a4eb0?}}, {0x50cd990, 0xc0012e8540}, 0xc00203b100)
	/assisted-service/pkg/requestid/requestid.go:69 +0x1ad
github.com/openshift/assisted-service/internal/spec.WithSpecMiddleware.func1({0x50cd990?, 0xc0012e8540?}, 0xc00203b100?)
	/assisted-service/internal/spec/spec.go:38 +0x9b
net/http.HandlerFunc.ServeHTTP(0xc00124ec35?, {0x50cd990?, 0xc0012e8540?}, 0x170a0ce?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
net/http.serverHandler.ServeHTTP({0xc002cb8f30?}, {0x50cd990, 0xc0012e8540}, 0xc00203b100)
	/usr/local/go/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc00189d320, {0x50cf558, 0xc0011fc240})
	/usr/local/go/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3089 +0x5ed 

Expected results:

Description of problem:

Some events have time-related information set to null (firstTimestamp, lastTimestamp, eventTime)

Version-Release number of selected component (if applicable):

cluster-logging.v5.8.0

How reproducible:

100% 

Steps to Reproduce:

    1. Stop one of the masters
    2. Start the master
    3. Wait until the environment stabilizes
    4. oc get events -A | grep unknown

Actual results:

oc get events -A | grep unknow
default                                      <unknown>   Normal    TerminationStart                             namespace/kube-system                                                            Received signal to terminate, becoming unready, but keeping serving
default                                      <unknown>   Normal    TerminationPreShutdownHooksFinished          namespace/kube-system                                                            All pre-shutdown hooks have been finished
default                                      <unknown>   Normal    TerminationMinimalShutdownDurationFinished   namespace/kube-system                                                            The minimal shutdown duration of 0s finished
....

Expected results:

    All time related information is set correctly

Additional info:

   This causes issues with external monitoring systems. Events with no timestamp will never show, or will push other events out of the view, depending on the sorting order of the timestamp. The operator of the environment then has trouble seeing what is happening there.

Please review the following PR: https://github.com/openshift/baremetal-operator/pull/328

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Currently, only the MachineConfiguration named 'cluster' is effective, but we can create MachineConfigurations with other names, for example:

apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: ndp-file-action-none
  namespace: openshift-machine-config-operator
spec:
  nodeDisruptionPolicy:
    files:
      - path: /etc/test
        actions:
          - type: None

But only the 'cluster' object takes effect, which will confuse users.

Version-Release number of selected component (if applicable):

    

How reproducible:

always

Steps to Reproduce:

Create a MachineConfiguration with any name except 'cluster'     

Actual results:

The new MachineConfiguration has no effect.

Expected results:

If the functionality only works for the 'cluster' object, CR creation with any other name should be rejected, for example via admission-time validation as sketched below.
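One hedged way to express that rejection, assuming kubebuilder-style CEL validation markers on the CRD root (illustrative only; not the actual openshift/api definition, and the Spec/Status types are stubbed here):

package example

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Illustrative singleton restriction: the API server rejects any
// MachineConfiguration whose name is not "cluster".
//
// +kubebuilder:validation:XValidation:rule="self.metadata.name == 'cluster'",message="MachineConfiguration is a singleton; .metadata.name must be 'cluster'"
type MachineConfiguration struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   MachineConfigurationSpec   `json:"spec,omitempty"`
    Status MachineConfigurationStatus `json:"status,omitempty"`
}

// Stubbed for the sketch; the real types carry the nodeDisruptionPolicy fields.
type MachineConfigurationSpec struct{}
type MachineConfigurationStatus struct{}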

Additional info:

    

This is a clone of issue OCPBUGS-35906. The following is the description of the original issue:

Description of problem:

FeatureGate accepts an unknown value

Version-Release number of selected component (if applicable):

4.16 and 4.17

How reproducible:

Always

Steps to Reproduce:

 oc patch featuregate cluster --type=json -p '[{"op": "replace", "path": "/spec/featureSet", "value": "unknownghfh"}]'
featuregate.config.openshift.io/cluster patched
 oc get  featuregate cluster -o yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
  creationTimestamp: "2024-06-21T07:20:25Z"
  generation: 2
  name: cluster
  resourceVersion: "56172"
  uid: c900a975-78ea-4076-8e56-e5517e14b55e
spec:
  featureSet: unknownghfh 

Actual results:

featuregate.config.openshift.io/cluster patched
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
  creationTimestamp: "2024-06-21T07:20:25Z"
  generation: 2
  name: cluster
  resourceVersion: "56172"
  uid: c900a975-78ea-4076-8e56-e5517e14b55e
spec:
  featureSet: unknownghfh 

 

Expected results:

The FeatureGate should not accept an invalid value and should return an error instead:

oc patch featuregate cluster --type=json -p '[{"op": "replace", "path": "/spec/featureSet", "value": "unknownghfh"}]'
The FeatureGate "cluster" is invalid: spec.featureSet: Unsupported value: "unknownghfh": supported values: "", "CustomNoUpgrade", "LatencySensitive", "TechPreviewNoUpgrade"
 
 
 
Additional info:

https://github.com/openshift/kubernetes/commit/facd3b18622d268a4780de1ad94f7da763351425
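For illustration, the kind of enum validation that produces the expected error above can be sketched with kubebuilder markers as follows (an assumption about the approach; the actual openshift/api definition and the linked fix may differ):

package example

// Illustrative enum-style validation for the featureSet field; unknown values
// are rejected at admission time with an "Unsupported value" error.
//
// +kubebuilder:validation:Enum="";CustomNoUpgrade;LatencySensitive;TechPreviewNoUpgrade
type FeatureSet string

const (
    Default              FeatureSet = ""
    CustomNoUpgrade      FeatureSet = "CustomNoUpgrade"
    LatencySensitive     FeatureSet = "LatencySensitive"
    TechPreviewNoUpgrade FeatureSet = "TechPreviewNoUpgrade"
)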

 

Description of problem:

 No IPsec on the cluster after deleting the NS (north-south) MachineConfigs while ipsecConfig mode is `Full`, on a cluster upgraded from a 4.14 to a 4.15 build.

Version-Release number of selected component (if applicable):

 bot build on https://github.com/openshift/cluster-network-operator/pull/2191

How reproducible:

    Always

Steps to Reproduce:

Steps:
1. Start with a 4.14 cluster that has EW+NS IPsec, then upgrade to the bot build above to check the ipsecConfig modes
2. Change the ipsecConfig mode to Full
3. Delete the NS MCs
4. New MCs spawned as `80-ipsec-master-extensions` and `80-ipsec-worker-extensions`
5. The cluster settled with no IPsec at all (no ovn-ipsec-host DaemonSet)
6. The mode is still Full

Actual results:

Mode Full effectively behaved like the Disabled state after the steps above.

Expected results:

Only NS IPsec should have been removed; EW IPsec should have persisted.

Additional info:

    

Please review the following PR: https://github.com/openshift/containernetworking-plugins/pull/150

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The default catalog source pod never gets updates; users have to manually recreate it to pick up a new index. Here is the must-gather log for debugging: https://drive.google.com/file/d/16_tFq5QuJyc_n8xkDFyK83TdTkrsVFQe/view?usp=drive_link 

I went through the code and found that the `updateStrategy` check depends on the `ImageID`; see

https://github.com/openshift/operator-framework-olm/blob/master/staging/operator-lifecycle-manager/pkg/controller/registry/reconciler/grpc.go#L527-L534 

// imageID returns the ImageID of the primary catalog source container or an empty string if the image ID isn't available yet.
// Note: the pod must be running and the container in a ready status to return a valid ImageID.
func imageID(pod *corev1.Pod) string {
 if len(pod.Status.ContainerStatuses) < 1 {
 logrus.WithField("CatalogSource", pod.GetName()).Warn("pod status unknown")
 return ""
 }
 return pod.Status.ContainerStatuses[0].ImageID
}

But for those default catalog source pods, `pod.Status.ContainerStatuses[0].ImageID` will never change, since it is the `opm` image, not the index image.

jiazha-mac:~ jiazha$ oc get pods redhat-operators-mpvzm -o=jsonpath={.status.containerStatuses} |jq
[
  {
    "containerID": "cri-o://115bd207312c7c8c36b63bfd251c085a701c58df2a48a1232711e15d7595675d",
    "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e",
    "imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:965fe452763fd402ca8d8b4a3fdb13587673c8037f215c0ffcd76b6c4c24635e",
    "lastState": {},
    "name": "registry-server",
    "ready": true,
    "restartCount": 1,
    "started": true,
    "state": {
      "running": {
        "startedAt": "2024-03-26T04:21:41Z"
      }
    }
  }
] 

The imageID() func should return the index image ID for those default catalog sources.

jiazha-mac:~ jiazha$ oc get pods redhat-operators-mpvzm -o=jsonpath={.status.initContainerStatuses[1]} |jq
{
  "containerID": "cri-o://4cd6e1f45e23aadc27b8152126eb2761a37da61c4845017a06bb6f2203659f5c",
  "image": "registry.redhat.io/redhat/redhat-operator-index:v4.15",
  "imageID": "registry.redhat.io/redhat/redhat-operator-index@sha256:19010760d38e1a898867262698e22674d99687139ab47173e2b4665e588635e1",
  "lastState": {},
  "name": "extract-content",
  "ready": true,
  "restartCount": 1,
  "started": false,
  "state": {
    "terminated": {
      "containerID": "cri-o://4cd6e1f45e23aadc27b8152126eb2761a37da61c4845017a06bb6f2203659f5c",
      "exitCode": 0,
      "finishedAt": "2024-03-26T04:21:39Z",
      "reason": "Completed",
      "startedAt": "2024-03-26T04:21:27Z"
    }
  }
} 
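A minimal sketch of how imageID could prefer the index image resolved by the "extract-content" init container, falling back to the primary container otherwise (illustrative; not the actual OLM reconciler change):

package example

import (
    corev1 "k8s.io/api/core/v1"
)

// imageID prefers the image resolved by the "extract-content" init container
// (the index image) and only falls back to the primary container (the opm
// server image) when that is unavailable.
func imageID(pod *corev1.Pod) string {
    for _, s := range pod.Status.InitContainerStatuses {
        if s.Name == "extract-content" && s.ImageID != "" {
            return s.ImageID
        }
    }
    if len(pod.Status.ContainerStatuses) < 1 {
        return ""
    }
    return pod.Status.ContainerStatuses[0].ImageID
}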

Version-Release number of selected component (if applicable):

    4.15.2

How reproducible:

    always

Steps to Reproduce:

    1. Install an OCP 4.16.0
    2. Wait for the redhat-operators catalog source to update
    3.
    

Actual results:

The redhat-operator catalog source never gets updates.

Expected results:

These default catalog sources should get updates according to their `updateStrategy`.

    jiazha-mac:~ jiazha$ oc get catalogsource redhat-operators -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  annotations:
    operatorframework.io/managed-by: marketplace-operator
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
  creationTimestamp: "2024-03-20T15:48:59Z"
  generation: 1
  name: redhat-operators
  namespace: openshift-marketplace
  resourceVersion: "12217605"
  uid: cc0fc420-c9d8-4c7d-997e-f0893b4c497f
spec:
  displayName: Red Hat Operators
  grpcPodConfig:
    extractContent:
      cacheDir: /tmp/cache
      catalogDir: /configs
    memoryTarget: 30Mi
    nodeSelector:
      kubernetes.io/os: linux
      node-role.kubernetes.io/master: ""
    priorityClassName: system-cluster-critical
    securityContextConfig: restricted
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 120
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 120
  icon:
    base64data: ""
    mediatype: ""
  image: registry.redhat.io/redhat/redhat-operator-index:v4.15
  priority: -100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m
status:
  connectionState:
    address: redhat-operators.openshift-marketplace.svc:50051
    lastConnect: "2024-03-27T06:35:36Z"
    lastObservedState: READY
  latestImageRegistryPoll: "2024-03-27T10:23:16Z"
  registryService:
    createdAt: "2024-03-20T15:56:03Z"
    port: "50051"
    protocol: grpc
    serviceName: redhat-operators
    serviceNamespace: openshift-marketplace

Additional info:

I also checked currentPodsWithCorrectImageAndSpec, but the hash never changes because the pod.spec is always the same.

time="2024-03-26T03:22:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace
time="2024-03-26T03:27:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=xW0cW
time="2024-03-26T03:27:01Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=xW0cW
time="2024-03-26T03:27:02Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=vq5VA
time="2024-03-26T03:27:03Z" level=info msg="of 1 pods matching label selector, 1 have the correct images and matching hash" catalogsource.name=redhat-operators catalogsource.namespace=openshift-marketplace correctHash=true correctImages=true current-pod.name=redhat-operators-mpvzm current-pod.namespace=openshift-marketplace id=vq5VA

Description of problem:

    The installer requires `s3:HeadBucket` even though no such permission exists. The correct permission for the `HeadBucket` API action is `s3:ListBucket`.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadBucket.html

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    always

Steps to Reproduce:

    1. Install a cluster using a role with limited permissions
    2.
    3.
    

Actual results:

    level=warning msg=Action not allowed with tested creds action=iam:DeleteUserPolicy
level=warning msg=Tested creds not able to perform all requested actions
level=warning msg=Action not allowed with tested creds action=s3:HeadBucket
level=warning msg=Tested creds not able to perform all requested actions
level=fatal msg=failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Permissions Check": validate AWS credentials: AWS credentials cannot be used to either create new creds or use as-is
Installer exit with code 1

Expected results:

    Installer should check only for s3:ListBucket
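For illustration, the kind of S3 action list the permission check could validate; the variable name and the surrounding actions are examples, not the installer's actual code. The important point is that the HeadBucket API call is authorized by s3:ListBucket:

package example

// Illustrative required-permission list for bucket management.
var s3BucketPermissions = []string{
    "s3:CreateBucket",
    "s3:DeleteBucket",
    "s3:GetBucketTagging",
    "s3:ListBucket", // authorizes the HeadBucket call; "s3:HeadBucket" is not a valid IAM action
    "s3:PutBucketTagging",
}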

Additional info:

    

Description of problem:

This is only applicable to systems that install a performance profile

There seems to be a race condition where not all systemd-spawned processes are being moved to /sys/fs/cgroup/cpuset/system.slice.

This is supposed to be done by the one-shot cpuset-configure.service.
Here is a list of processes seen on one lab system that are still in the root cgroup:

/usr/bin/dbus-broker-launch --scope system --audit
dbus-broker --log 4 --controller 9 --machine-id 071fd738af0146859d2c04b7fea6d276 --max-bytes 536870912 --max-fds 4096 --max-matches 131072 --audit
/usr/sbin/NetworkManager --no-daemon
/usr/sbin/dnsmasq -k
/sbin/agetty -o -p -- \u --noclear - linux
sshd: core@pts/0


Version-Release number of selected component (if applicable):

    4.14, 4.15

How reproducible:

    

Steps to Reproduce:

    1. Reboot a SNO with a performance profile applied 
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-34828. The following is the description of the original issue:

Multiple PRs are failing with this error...

Deploy git workload with devfile from topology page: A-04-TC01: Create the different workloads from Add page (18s)

CypressError: `cy.focus()` can only be called on a single element. Your subject contained 14 elements.

https://on.cypress.io/focus

 

https://search.dptools.openshift.org/?search=Deploy+git+workload+with+devfile+from+topology+page&maxAge=336h&context=1&type=junit&name=pull-ci-openshift-console-master-e2e-gcp-console&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

  1. When an ESXi host is in maintenance mode, the installer is unable to query the host's version, causing validation to fail.

time="2024-01-04T05:30:45-05:00" level=fatal msg="failed to fetch Terraform Variables: failed to fetch dependency of \"Terraform Variables\": failed to generate asset \"Platform Provisioning Check\": platform.vsphere: Internal error: vCenter is failing to retrieve config product version information for the ESXi host: "

https://github.com/openshift/installer/blob/0d56a06e02343e6603128e3f58c6c9bbc2edea3d/pkg/asset/installconfig/vsphere/validation.go#L247-L319
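A minimal sketch of skipping hosts in maintenance mode before the version query, assuming govmomi managed-object types (illustrative; not the installer's actual validation flow):

package example

import (
    "github.com/vmware/govmomi/vim25/mo"
)

// versionCheckableHosts filters out ESXi hosts that are in maintenance mode,
// since their config/product version may be unavailable and should not fail
// the whole validation.
func versionCheckableHosts(hosts []mo.HostSystem) []mo.HostSystem {
    var out []mo.HostSystem
    for _, h := range hosts {
        if h.Runtime.InMaintenanceMode {
            continue // version info may be unavailable for this host
        }
        out = append(out, h)
    }
    return out
}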

This story tracks the i18n upload/download routine tasks which are performed every sprint. 

 

A.C.

  - Upload strings to Memsource at the start of the sprint and reach out to the localization team

  - Download translated strings from Memsource when they are ready

  -  Review the translated strings and open a pull request

  -  Open a follow-up story for the next sprint

Please review the following PR: https://github.com/openshift/cluster-network-operator/pull/2308

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

In a recently installed cluster running 4.13.29, after configuring the cluster-wide proxy, the "vsphere-problem-detector" is not picking up the proxy configuration.
As the pod cannot reach vSphere, it fails to run its checks:
2024-02-01T09:28:00.150332407Z E0201 09:28:00.150292       1 operator.go:199] failed to run checks: failed to connect to vsphere.local: Post "https://vsphere.local/sdk": dial tcp 172.16.1.3:443: i/o timeout  

The pod doesn't get the cluster proxy settings as expected:
   - name: HTTPS_PROXY
     value: http://proxy.local:3128
   - name: HTTP_PROXY
     value: http://proxy.local:3128

Other storage-related pods get the expected configuration shown above.

This causes the vsphere-problem-detector to fail connections to vSphere, hence failing the health checks.
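For illustration, a minimal sketch of turning the cluster-wide proxy status into container environment variables, the way other storage operands receive them (assuming openshift/api and corev1 types; not the actual vsphere-problem-detector deployment code):

package example

import (
    configv1 "github.com/openshift/api/config/v1"
    corev1 "k8s.io/api/core/v1"
)

// proxyEnv maps the cluster Proxy status into the env vars a container needs
// to route its outbound connections through the configured proxy.
func proxyEnv(proxy *configv1.Proxy) []corev1.EnvVar {
    return []corev1.EnvVar{
        {Name: "HTTP_PROXY", Value: proxy.Status.HTTPProxy},
        {Name: "HTTPS_PROXY", Value: proxy.Status.HTTPSProxy},
        {Name: "NO_PROXY", Value: proxy.Status.NoProxy},
    }
}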

 

Version-Release number of selected component (if applicable):

  4.13.29 

How reproducible:

   Always

Steps to Reproduce:

    1.Configure cluster-wide proxy in the environment. 
    2. Wait for the change
    3. Check the pod configuration
    

Actual results:

    vSphere health checks failing

Expected results:

    vSphere health checks working through the cluster proxy

Additional info:

    

Description of problem:

We need to backport https://github.com/cri-o/cri-o/pull/7744 into 1.28 of crio. CI is failing on upgrades due to a feature not in 1.28.

    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Please review the following PR: https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/270

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-operator/pull/115

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-ingress-operator/pull/1033

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1597

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Seeing this in HyperShift e2e. I think it is racing with the Infrastructure status being populated, with PlatformStatus still nil.

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-hypershift-release-4.16-periodics-e2e-aws-ovn/1785458059246571520/artifacts/e2e-aws-ovn/run-e2e/artifacts/TestAutoscaling_Teardown/namespaces/e2e-clusters-rjhhw-example-g6tsn/core/pods/logs/cluster-image-registry-operator-5597f9f4d4-dfvc6-cluster-image-registry-operator-previous.log

I0501 00:13:11.951062       1 azurepathfixcontroller.go:324] Started AzurePathFixController
I0501 00:13:11.951056       1 base_controller.go:73] Caches are synced for LoggingSyncer 
I0501 00:13:11.951072       1 imageregistrycertificates.go:214] Started ImageRegistryCertificatesController
I0501 00:13:11.951077       1 base_controller.go:110] Starting #1 worker of LoggingSyncer controller ...
E0501 00:13:11.951369       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 534 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2d6bd00?, 0x57a60e0})
	/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x3bcb370?})
	/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x2d6bd00?, 0x57a60e0?})
	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).sync(0xc000003d40)
	/go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:171 +0x97
github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).processNextWorkItem(0xc000003d40)
	/go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:154 +0x292
github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).runWorker(...)
	/go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:133
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001186820?, {0x3bd1320, 0xc000cace40}, 0x1, 0xc000ca2540)
	/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0011bac00?, 0x3b9aca00, 0x0, 0xd0?, 0x447f9c?)
	/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0xc001385f68?, 0xc001385f78?)
	/go/src/github.com/openshift/cluster-image-registry-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e
created by github.com/openshift/cluster-image-registry-operator/pkg/operator.(*AzurePathFixController).Run in goroutine 248
	/go/src/github.com/openshift/cluster-image-registry-operator/pkg/operator/azurepathfixcontroller.go:322 +0x1a6
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x2966e97]

https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/operator/azurepathfixcontroller.go#L171
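A minimal sketch of the kind of nil guard that avoids the panic when the Infrastructure status has not been populated yet (illustrative; not the actual cluster-image-registry-operator fix):

package example

import (
    "fmt"

    configv1 "github.com/openshift/api/config/v1"
)

// azurePlatformStatus returns the Azure platform status only once the
// Infrastructure status is populated, otherwise signals the caller to retry.
func azurePlatformStatus(infra *configv1.Infrastructure) (*configv1.AzurePlatformStatus, error) {
    if infra == nil || infra.Status.PlatformStatus == nil || infra.Status.PlatformStatus.Azure == nil {
        return nil, fmt.Errorf("infrastructure platform status not populated yet, requeueing")
    }
    return infra.Status.PlatformStatus.Azure, nil
}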

Please review the following PR: https://github.com/openshift/csi-external-resizer/pull/155

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Unable to view the alerts and metrics pages; a blank page is shown.

Version-Release number of selected component (if applicable):

4.15.0-nightly

How reproducible:

Always

Steps to Reproduce:

Click on any alert under "Notification Panel" to view more, and you will be redirected to the alert page.
    

Actual results:

The user is unable to view any alerts or metrics.

Expected results:

The user should be able to view all and individual alerts and metrics.    

Additional info:

N.A

Description of problem:

In the tested HCP external OIDC environment, when issuerCertificateAuthority is set, console pods are stuck in ContainerCreating status. The reason is that the CA configmap is not propagated to the openshift-console namespace by the console operator.

Version-Release number of selected component (if applicable):

Latest 4.16 and 4.15 nightly payloads

How reproducible:

Always

Steps to Reproduce:

1. Configure HCP external OIDC env with issuerCertificateAuthority set.
2. Check oc get pods -A

Actual results:

2. Before OCPBUGS-31319 is fixed, console pods are in CrashLoopBackOff status. After OCPBUGS-31319 is fixed, or after manually copying the CA configmap to the openshift-config namespace as a workaround, console pods are stuck in ContainerCreating status until the CA configmap is also manually copied to the openshift-console namespace. Console login is affected.

Expected results:

2. The console operator should be responsible for copying the CA to the openshift-console namespace, and console login should succeed. A minimal sketch of what that copy amounts to is shown below.
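A minimal sketch of that copy with plain client-go (illustrative; the console operator uses its own resource-sync machinery, and the function name is hypothetical):

package example

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// copyCAConfigMap copies the issuer CA ConfigMap from openshift-config into
// openshift-console so the console pods can mount it.
func copyCAConfigMap(ctx context.Context, c kubernetes.Interface, name string) error {
    src, err := c.CoreV1().ConfigMaps("openshift-config").Get(ctx, name, metav1.GetOptions{})
    if err != nil {
        return err
    }
    dst := &corev1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: "openshift-console"},
        Data:       src.Data,
    }
    _, err = c.CoreV1().ConfigMaps("openshift-console").Create(ctx, dst, metav1.CreateOptions{})
    if apierrors.IsAlreadyExists(err) {
        return nil
    }
    return err
}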

Additional info:

In https://redhat-internal.slack.com/archives/C060D1W96LB/p1711548626625499 , Seth on the HyperShift dev side requested a separate console bug to unblock the PR merge and backport for OCPBUGS-31319, so this bug was created.

Please review the following PR: https://github.com/openshift/cluster-api-provider-aws/pull/488

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

In PowerVS, when I try to deploy a 4.16 cluster, I see the following:

Description of problem:

[inner hamzy@li-3d08e84c-2e1c-11b2-a85c-e2db7bb078fc hamzy-release]$ oc get pods -n openshift-cloud-controller-manager
NAME                                                READY   STATUS             RESTARTS      AGE
powervs-cloud-controller-manager-6b6fbcc9db-9rhtj   0/1     CrashLoopBackOff   4 (10s ago)   2m47s
powervs-cloud-controller-manager-6b6fbcc9db-wnvck   0/1     CrashLoopBackOff   3 (49s ago)   2m46s
[inner hamzy@li-3d08e84c-2e1c-11b2-a85c-e2db7bb078fc hamzy-release]$ oc logs pod/powervs-cloud-controller-manager-6b6fbcc9db-9rhtj -n openshift-cloud-controller-manager
Error from server: no preferred addresses found; known addresses: []
[inner hamzy@li-3d08e84c-2e1c-11b2-a85c-e2db7bb078fc hamzy-release]$ oc logs pod/powervs-cloud-controller-manager-6b6fbcc9db-wnvck -n openshift-cloud-controller-manager
Error from server: no preferred addresses found; known addresses: []

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-ppc64le-2024-01-07-111144

How reproducible:

Always

Steps to Reproduce:

    1. Deploy OpenShift cluster

On the master-0 node, I see:

[core@rdr-hamzy-test-wdc06-fs5m2-master-0 ~]$ sudo crictl ps -a
CONTAINER           IMAGE                                                                                                                    CREATED             STATE               NAME                               ATTEMPT             POD ID              POD
a048556553827       ec3035a371e09312254a277d5eb9affba2930adbd4018f7557899a2f3d76bc88                                                         18 seconds ago      Exited              kube-rbac-proxy                    7                   0381a589d57cd       cluster-cloud-controller-manager-operator-94dd5b468-kxqw5
a326f7ec83ddb       60f5c9455518c79a9797cfbeab0b3530dae1bf77554eccc382ff12d99053efd1                                                         11 minutes ago      Running             config-sync-controllers            0                   0381a589d57cd       cluster-cloud-controller-manager-operator-94dd5b468-kxqw5
ddaa6999b5b86       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:60eff87ed56ee4761fd55caa4712e6bea47dccaa11c59ba53a6d5697eacc7d32   11 minutes ago      Running             cluster-cloud-controller-manager   0                   0381a589d57cd       cluster-cloud-controller-manager-operator-94dd5b468-kxqw5

The failing pod has this as its log:

[core@rdr-hamzy-test-wdc06-fs5m2-master-0 ~]$ sudo crictl logs a048556553827
Flag --logtostderr has been deprecated, will be removed in a future release, see https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components
I0108 18:09:12.320332       1 flags.go:64] FLAG: --add-dir-header="false"
I0108 18:09:12.320401       1 flags.go:64] FLAG: --allow-paths="[]"
I0108 18:09:12.320413       1 flags.go:64] FLAG: --alsologtostderr="false"
I0108 18:09:12.320420       1 flags.go:64] FLAG: --auth-header-fields-enabled="false"
I0108 18:09:12.320427       1 flags.go:64] FLAG: --auth-header-groups-field-name="x-remote-groups"
I0108 18:09:12.320435       1 flags.go:64] FLAG: --auth-header-groups-field-separator="|"
I0108 18:09:12.320441       1 flags.go:64] FLAG: --auth-header-user-field-name="x-remote-user"
I0108 18:09:12.320447       1 flags.go:64] FLAG: --auth-token-audiences="[]"
I0108 18:09:12.320454       1 flags.go:64] FLAG: --client-ca-file=""
I0108 18:09:12.320460       1 flags.go:64] FLAG: --config-file="/etc/kube-rbac-proxy/config-file.yaml"
I0108 18:09:12.320467       1 flags.go:64] FLAG: --help="false"
I0108 18:09:12.320473       1 flags.go:64] FLAG: --http2-disable="false"
I0108 18:09:12.320479       1 flags.go:64] FLAG: --http2-max-concurrent-streams="100"
I0108 18:09:12.320486       1 flags.go:64] FLAG: --http2-max-size="262144"
I0108 18:09:12.320492       1 flags.go:64] FLAG: --ignore-paths="[]"
I0108 18:09:12.320500       1 flags.go:64] FLAG: --insecure-listen-address=""
I0108 18:09:12.320506       1 flags.go:64] FLAG: --kubeconfig=""
I0108 18:09:12.320512       1 flags.go:64] FLAG: --log-backtrace-at=":0"
I0108 18:09:12.320520       1 flags.go:64] FLAG: --log-dir=""
I0108 18:09:12.320526       1 flags.go:64] FLAG: --log-file=""
I0108 18:09:12.320531       1 flags.go:64] FLAG: --log-file-max-size="1800"
I0108 18:09:12.320537       1 flags.go:64] FLAG: --log-flush-frequency="5s"
I0108 18:09:12.320543       1 flags.go:64] FLAG: --logtostderr="true"
I0108 18:09:12.320550       1 flags.go:64] FLAG: --oidc-ca-file=""
I0108 18:09:12.320556       1 flags.go:64] FLAG: --oidc-clientID=""
I0108 18:09:12.320564       1 flags.go:64] FLAG: --oidc-groups-claim="groups"
I0108 18:09:12.320570       1 flags.go:64] FLAG: --oidc-groups-prefix=""
I0108 18:09:12.320576       1 flags.go:64] FLAG: --oidc-issuer=""
I0108 18:09:12.320581       1 flags.go:64] FLAG: --oidc-sign-alg="[RS256]"
I0108 18:09:12.320590       1 flags.go:64] FLAG: --oidc-username-claim="email"
I0108 18:09:12.320595       1 flags.go:64] FLAG: --one-output="false"
I0108 18:09:12.320601       1 flags.go:64] FLAG: --proxy-endpoints-port="0"
I0108 18:09:12.320608       1 flags.go:64] FLAG: --secure-listen-address="0.0.0.0:9258"
I0108 18:09:12.320614       1 flags.go:64] FLAG: --skip-headers="false"
I0108 18:09:12.320620       1 flags.go:64] FLAG: --skip-log-headers="false"
I0108 18:09:12.320626       1 flags.go:64] FLAG: --stderrthreshold="2"
I0108 18:09:12.320631       1 flags.go:64] FLAG: --tls-cert-file="/etc/tls/private/tls.crt"
I0108 18:09:12.320637       1 flags.go:64] FLAG: --tls-cipher-suites="[TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305]"
I0108 18:09:12.320654       1 flags.go:64] FLAG: --tls-min-version="VersionTLS12"
I0108 18:09:12.320661       1 flags.go:64] FLAG: --tls-private-key-file="/etc/tls/private/tls.key"
I0108 18:09:12.320667       1 flags.go:64] FLAG: --tls-reload-interval="1m0s"
I0108 18:09:12.320674       1 flags.go:64] FLAG: --upstream="http://127.0.0.1:9257/"
I0108 18:09:12.320681       1 flags.go:64] FLAG: --upstream-ca-file=""
I0108 18:09:12.320686       1 flags.go:64] FLAG: --upstream-client-cert-file=""
I0108 18:09:12.320692       1 flags.go:64] FLAG: --upstream-client-key-file=""
I0108 18:09:12.320697       1 flags.go:64] FLAG: --upstream-force-h2c="false"
I0108 18:09:12.320703       1 flags.go:64] FLAG: --v="3"
I0108 18:09:12.320709       1 flags.go:64] FLAG: --version="false"
I0108 18:09:12.320719       1 flags.go:64] FLAG: --vmodule=""
I0108 18:09:12.320735       1 kube-rbac-proxy.go:578] Reading config file: /etc/kube-rbac-proxy/config-file.yaml
I0108 18:09:12.321427       1 kube-rbac-proxy.go:285] Valid token audiences: 
I0108 18:09:12.321473       1 kube-rbac-proxy.go:399] Reading certificate files
E0108 18:09:12.321519       1 run.go:74] "command failed" err="failed to initialize certificate reloader: error loading certificates: error loading certificate: open /etc/tls/private/tls.crt: no such file or directory"

When I describe the pod, I see:

[inner hamzy@li-3d08e84c-2e1c-11b2-a85c-e2db7bb078fc hamzy-release]$ oc describe pod/powervs-cloud-controller-manager-6b6fbcc9db-9rhtj -n openshift-cloud-controller-manager
Name:                 powervs-cloud-controller-manager-6b6fbcc9db-9rhtj
Namespace:            openshift-cloud-controller-manager
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      cloud-controller-manager
Node:                 rdr-hamzy-test-wdc06-fs5m2-master-2/
Start Time:           Mon, 08 Jan 2024 11:57:45 -0600
Labels:               infrastructure.openshift.io/cloud-controller-manager=PowerVS
                      k8s-app=powervs-cloud-controller-manager
                      pod-template-hash=6b6fbcc9db
Annotations:          operator.openshift.io/config-hash: 09205e81b4dc20086c29ddbdd3fccc29a675be94b2779756a0e748dd9ba91e40
Status:               Running
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/powervs-cloud-controller-manager-6b6fbcc9db
Containers:
  cloud-controller-manager:
    Container ID:  cri-o://4365a326d05ecaac8e4114efabb4a46e01a308459ad30438d742b4829c24a717
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3dd2cf78ddeed971d38731d27ce293501547b960cefc3aadaa220186eded8a09
    Image ID:      65401afa73528f9a425a9d7f5dee8a9de8d9d3d82c8fd84cd653b16409093836
    Port:          10258/TCP
    Host Port:     10258/TCP
    Command:
      /bin/bash
      -c
      #!/bin/bash
      set -o allexport
      if [[ -f /etc/kubernetes/apiserver-url.env ]]; then
        source /etc/kubernetes/apiserver-url.env
      fi
      exec /bin/ibm-cloud-controller-manager \
      --bind-address=$(POD_IP_ADDRESS) \
      --use-service-account-credentials=true \
      --configure-cloud-routes=false \
      --cloud-provider=ibm \
      --cloud-config=/etc/ibm/cloud.conf \
      --profiling=false \
      --leader-elect=true \
      --leader-elect-lease-duration=137s \
      --leader-elect-renew-deadline=107s \
      --leader-elect-retry-period=26s \
      --leader-elect-resource-namespace=openshift-cloud-controller-manager \
      --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_AES_128_GCM_SHA256,TLS_CHACHA20_POLY1305_SHA256,TLS_AES_256_GCM_SHA384 \
      --v=2
      
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 08 Jan 2024 12:35:12 -0600
      Finished:     Mon, 08 Jan 2024 12:35:12 -0600
    Ready:          False
    Restart Count:  12
    Requests:
      cpu:     75m
      memory:  60Mi
    Liveness:  http-get https://:10258/healthz delay=300s timeout=160s period=10s #success=1 #failure=3
    Environment:
      POD_IP_ADDRESS:               (v1:status.podIP)
      VPCCTL_CLOUD_CONFIG:         /etc/ibm/cloud.conf
      ENABLE_VPC_PUBLIC_ENDPOINT:  true
    Mounts:
      /etc/ibm from cloud-conf (rw)
      /etc/kubernetes from host-etc-kube (ro)
      /etc/pki/ca-trust/extracted/pem from trusted-ca (ro)
      /etc/vpc from ibm-cloud-credentials (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-z5xdm (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  trusted-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ccm-trusted-ca
    Optional:  false
  host-etc-kube:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes
    HostPathType:  Directory
  cloud-conf:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cloud-conf
    Optional:  false
  ibm-cloud-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ibm-cloud-credentials
    Optional:    false
  kube-api-access-z5xdm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              node-role.kubernetes.io/master=
Tolerations:                 node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.cloudprovider.kubernetes.io/uninitialized:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                             node.kubernetes.io/not-ready:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  38m                    default-scheduler  Successfully assigned openshift-cloud-controller-manager/powervs-cloud-controller-manager-6b6fbcc9db-9rhtj to rdr-hamzy-test-wdc06-fs5m2-master-2
  Normal   Pulling    38m                    kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3dd2cf78ddeed971d38731d27ce293501547b960cefc3aadaa220186eded8a09"
  Normal   Pulled     37m                    kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3dd2cf78ddeed971d38731d27ce293501547b960cefc3aadaa220186eded8a09" in 36.694s (36.694s including waiting)
  Normal   Started    36m (x4 over 37m)      kubelet            Started container cloud-controller-manager
  Normal   Created    35m (x5 over 37m)      kubelet            Created container cloud-controller-manager
  Normal   Pulled     35m (x4 over 37m)      kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3dd2cf78ddeed971d38731d27ce293501547b960cefc3aadaa220186eded8a09" already present on machine
  Warning  BackOff    2m57s (x166 over 37m)  kubelet            Back-off restarting failed container cloud-controller-manager in pod powervs-cloud-controller-manager-6b6fbcc9db-9rhtj_openshift-cloud-controller-manager(bf58b824-b1a2-4d2e-8735-22723642a24a)

This is a clone of issue OCPBUGS-35440. The following is the description of the original issue:

Description of problem:

Because of a bug in upstream CAPA, the Load Balancer ingress rules are continuously revoked and then authorized, causing unnecessary AWS API calls and cluster provision delays.

Version-Release number of selected component (if applicable):

4.16+

How reproducible:

always

Steps to Reproduce:

1.
2.
3.

Actual results:

A constant loop of revoke-authorize of ingress rules.

Expected results:

Rules should be revoked only when needed (for example, when the installer removes the allow-all ssh rule). In the other cases, rules should be authorized only once.
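For illustration, reconciling rules by set difference so a rule is only authorized or revoked when it actually changed, instead of revoke-then-authorize on every pass (a sketch with rules as opaque strings; the real CAPA types are richer):

package example

// diffRules returns the rules that need to be authorized (desired but not
// present) and revoked (present but no longer desired); unchanged rules are
// left alone, avoiding the constant revoke-authorize loop.
func diffRules(desired, current []string) (toAuthorize, toRevoke []string) {
    want := make(map[string]bool, len(desired))
    for _, r := range desired {
        want[r] = true
    }
    have := make(map[string]bool, len(current))
    for _, r := range current {
        have[r] = true
    }
    for r := range want {
        if !have[r] {
            toAuthorize = append(toAuthorize, r)
        }
    }
    for r := range have {
        if !want[r] {
            toRevoke = append(toRevoke, r)
        }
    }
    return toAuthorize, toRevoke
}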

Additional info:

Upstream issue created: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/5023
PR submitted upstream: https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5024

Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/160

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

$ oc adm upgrade
info: An upgrade is in progress. Working towards 4.15.0-rc.4: 701 of 873 done (80% complete), waiting on operator-lifecycle-manager

Upstream: https://api.openshift.com/api/upgrades_info/v1/graph
Channel: candidate-4.15 (available channels: candidate-4.15, candidate-4.16)
No updates available. You may still upgrade to a specific release image with --to-image or wait for new updates to be available.


$ oc get pods -n openshift-operator-lifecycle-manager 
NAME                                      READY   STATUS             RESTARTS        AGE
catalog-operator-db86b7466-gdp4g          1/1     Running            0               9h
collect-profiles-28443465-9zzbk           0/1     Completed          0               34m
collect-profiles-28443480-kkgtk           0/1     Completed          0               19m
collect-profiles-28443495-shvs7           0/1     Completed          0               4m10s
olm-operator-56cb759d88-q2gr7             0/1     CrashLoopBackOff   8 (3m27s ago)   20m
package-server-manager-7cf46947f6-sgnlk   2/2     Running            0               9h
packageserver-7b795b79f-thxfw             1/1     Running            1               14d
packageserver-7b795b79f-w49jj             1/1     Running            0               4d17h

Version-Release number of selected component (if applicable):

 

How reproducible:

Unknown

Steps to Reproduce:

Upgrade from 4.15.0-rc.2 to 4.15.0-rc.4    

Actual results:

The upgrade is unable to proceed

Expected results:

The upgrade can proceed

Additional info:

    

Description of problem:

When using the oc CLI to query information about release images, it is not possible to use the --certificate-authority option to specify an alternative CA bundle for verifying connections to the target registry.

Version-Release number of selected component (if applicable): 4.14.5

How reproducible: 100%

Steps to Reproduce:

    1. oc adm release info --registry-config ./auth.json --certificate-authority ./tls-ca-bundle.pem quay.io/openshift-release-dev/ocp-release:4.14.9-x86_64

Actual results:

error: unable to read image quay.io/openshift-release-dev/ocp-release:4.14.9-x86_64: Get "https://quay.io/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority

Expected results:

Something beginning with:

Name:           4.14.9
Digest:         sha256:f5eaf0248779a0478cfd83f055d56dc7d755937800a68ad55f6047c503977c44
Created:        2024-01-12T06:48:42Z
OS/Arch:        linux/amd64
Manifests:      680
Metadata files: 1

Pull From: quay.io/openshift-release-dev/ocp-release@sha256:f5eaf0248779a0478cfd83f055d56dc7d755937800a68ad55f6047c503977c44

Release Metadata:

Additional info:

To fully verify that this was an issue, I went through the following steps, which show that the oc command is not using the CA bundle in the provided file and that the command would have worked if oc had used the provided bundle.

// show the command works with the system CA bundle

# oc adm release info --registry-config ./auth.json quay.io/openshift-release-dev/ocp-release:4.14.9-x86_64 | head
Name:           4.14.9
Digest:         sha256:f5eaf0248779a0478cfd83f055d56dc7d755937800a68ad55f6047c503977c44
Created:        2024-01-12T06:48:42Z
OS/Arch:        linux/amd64
Manifests:      680
Metadata files: 1

Pull From: quay.io/openshift-release-dev/ocp-release@sha256:f5eaf0248779a0478cfd83f055d56dc7d755937800a68ad55f6047c503977c44

Release Metadata:

// move the system CA bundle to the local directory

# mv /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem .

// show the same command now fails without that bundle file

# oc adm release info --registry-config ./auth.json quay.io/openshift-release-dev/ocp-release:4.14.9-x86_64 | head
error: unable to read image quay.io/openshift-release-dev/ocp-release:4.14.9-x86_64: Get "https://quay.io/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority

// show using that same bundle file with --certificate-authority doesn't work

# oc adm release info --registry-config ./auth.json --certificate-authority ./tls-ca-bundle.pem quay.io/openshift-release-dev/ocp-release:4.14.9-x86_64 | head
error: unable to read image quay.io/openshift-release-dev/ocp-release:4.14.9-x86_64: Get "https://quay.io/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority


Additionally, this also seems to be a problem for at least the following commands:
oc image info
oc adm release extract
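For context, honoring --certificate-authority essentially amounts to building an HTTP transport whose TLS configuration trusts the supplied CA bundle, sketched below with only the Go standard library (this is not the oc implementation):

package example

import (
    "crypto/tls"
    "crypto/x509"
    "fmt"
    "net/http"
    "os"
)

// transportWithCA returns an HTTP transport that trusts the certificates in
// the given PEM bundle for registry connections.
func transportWithCA(caFile string) (*http.Transport, error) {
    pemData, err := os.ReadFile(caFile)
    if err != nil {
        return nil, err
    }
    pool := x509.NewCertPool()
    if !pool.AppendCertsFromPEM(pemData) {
        return nil, fmt.Errorf("no certificates found in %s", caFile)
    }
    return &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}}, nil
}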

Please review the following PR: https://github.com/openshift/telemeter/pull/522

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    If I use custom CVO capabilities via the install config, I can create a capability set that disables the Ingress capability.
However, once the cluster boots up, the Ingress capability will always be enabled.
This creates a dissonance between the desired install config and what happens.
It would be better to fail the install at install-config validation to prevent that dissonance.
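A minimal sketch of what failing at install-config validation could look like, assuming the openshift/api capability constants (illustrative; not the installer's actual validation code):

package example

import (
    "fmt"

    configv1 "github.com/openshift/api/config/v1"
)

// validateIngressEnabled fails fast when the computed capability set does not
// include the Ingress capability, instead of silently re-enabling it later.
func validateIngressEnabled(enabled []configv1.ClusterVersionCapability) error {
    for _, c := range enabled {
        if c == configv1.ClusterVersionCapabilityIngress {
            return nil
        }
    }
    return fmt.Errorf("the Ingress capability cannot be disabled: it is always enabled at runtime")
}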

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of the problem:

While installing a cluster, the installation hit timeout warnings and got stuck in the "installing" state.

There are no new entries in the events log and the installation is still in the same state after ~48 hours.
It looks stuck forever.
test-infra-cluster-cfb47d07_608f175e-aa23-493d-8a5c-d5bcaf15468f(1).tar

Screencast from 2024-03-15 21-05-34.webm

How reproducible:

 

Steps to reproduce:

1.

2.

3.

Actual results:

 

Expected results:

Please review the following PR: https://github.com/openshift/oauth-server/pull/140

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-32710. The following is the description of the original issue:

Description of problem:

    When virtualHostedStyle is enabled with regionEndpoint set in config.imageregistry/cluster, the image registry fails to run. Errors thrown:

time="2024-04-22T14:14:31.057192227Z" level=error msg="s3aws: RequestError: send request failed\ncaused by: Get \"https://s3-fips.us-west-1.amazonaws.com/ci-ln-67zbmzk-76ef8-4n6wb-image-registry-us-west-1-xjyfbabyboc?list-type=2&max-keys=1&prefix=\": dial tcp: lookup s3-fips.us-west-1.amazonaws.com on 172.30.0.10:53: no such host" go.version="go1.20.12 X:strictfipsruntime" 

Version-Release number of selected component (if applicable):

    4.14.18

How reproducible:

    always

Steps to Reproduce:

    1.
$ oc get config.imageregistry/cluster -ojsonpath="{.status.storage}"|jq 
{
  "managementState": "Managed",
  "s3": {
    "bucket": "ci-ln-67zbmzk-76ef8-4n6wb-image-registry-us-west-1-xjyfbabyboc",
    "encrypt": true,
    "region": "us-west-1",
    "regionEndpoint": "https://s3-fips.us-west-1.amazonaws.com",
    "trustedCA": {
      "name": ""
    },
    "virtualHostedStyle": true
  }
}     
    2. Check registry pod
$ oc get co image-registry
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
image-registry   4.15.5    True        True          True       79m     Degraded: Registry deployment has timed out progressing: ReplicaSet "image-registry-b6c58998d" has timed out progressing
    
    

Actual results:

$ oc get pods image-registry-b6c58998d-m8pnb -oyaml| yq '.spec.containers[0].env'
- name: REGISTRY_STORAGE_S3_REGIONENDPOINT
  value: https://s3-fips.us-west-1.amazonaws.com
[...]
- name: REGISTRY_STORAGE_S3_VIRTUALHOSTEDSTYLE
  value: "true"
[...]

$ oc logs image-registry-b6c58998d-m8pnb
[...]
time="2024-04-22T14:14:31.057192227Z" level=error msg="s3aws: RequestError: send request failed\ncaused by: Get \"https://s3-fips.us-west-1.amazonaws.com/ci-ln-67zbmzk-76ef8-4n6wb-image-registry-us-west-1-xjyfbabyboc?list-type=2&max-keys=1&prefix=\": dial tcp: lookup s3-fips.us-west-1.amazonaws.com on 172.30.0.10:53: no such host" go.version="go1.20.12 X:strictfipsruntime"     

Expected results:

    virtual hosted-style should work
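For context, a sketch of what virtual-hosted-style addressing means at the AWS SDK level (aws-sdk-go v1): with path style disabled, the bucket becomes part of the hostname rather than the URL path. The registry's s3 storage driver derives this from REGISTRY_STORAGE_S3_VIRTUALHOSTEDSTYLE, so this is only illustrative:

package example

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
)

// newS3Session builds a session against a custom endpoint with
// virtual-hosted-style requests (bucket in the hostname).
func newS3Session(region, endpoint string) (*session.Session, error) {
    return session.NewSession(&aws.Config{
        Region:           aws.String(region),
        Endpoint:         aws.String(endpoint),
        S3ForcePathStyle: aws.Bool(false), // false selects virtual-hosted-style requests
    })
}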

Additional info:

    

Description of the problem:

Per the latest decision, RH is not going to support installing an OCP cluster on vSphere with nested virtualization. Thus the checkbox "Install OpenShift Virtualization" on the "Operators" page should be disabled when the "vSphere" platform is selected on the "Cluster Details" page.

Slack discussion thread

https://redhat-internal.slack.com/archives/C0211848DBN/p1706640683120159

Nutanix
https://portal.nutanix.com/page/documents/kbs/details?targetId=kA00e000000XeiHCAS

 

Description of problem:

Converting an IPv6-primary dual-stack cluster to IPv6 single-stack causes control plane failures. OVN masters are periodically in CrashLoopBackOff (CLBO) state.

OVN masters logs: http://shell.lab.bos.redhat.com/~anusaxen/convert/

Must-gather (MG) is not working as the cluster lands in bad shape. Happy to share the cluster if needed for debugging.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-04-21-084440

How reproducible:

Always

Steps to Reproduce:

1. Bring up a cluster with IPv6-primary dual stack

2. Edit network.config.openshift.io from dual stack to single stack as follows:


spec:
    clusterNetwork:
    - cidr: fd01::/48
      hostPrefix: 64
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    externalIP:
      policy: {}
    networkType: OVNKubernetes
    serviceNetwork:
    - fd02::/112
    - 172.30.0.0/16
  status:
    clusterNetwork:
    - cidr: fd01::/48
      hostPrefix: 64
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    clusterNetworkMTU: 1400
    networkType: OVNKubernetes
    serviceNetwork:
    - fd02::/112
    - 172.30.0.0/16


TO



 apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Network
  metadata:
    creationTimestamp: "2023-04-27T14:11:37Z"
    generation: 3
    name: cluster
    resourceVersion: "81045"
    uid: 28f15675-e739-4262-9acc-4c2c0df4b38d
  spec:
    clusterNetwork:
    - cidr: fd01::/48
      hostPrefix: 64
    externalIP:
      policy: {}
    networkType: OVNKubernetes
    serviceNetwork:
    - fd02::/112
  status:
    clusterNetwork:
    - cidr: fd01::/48
      hostPrefix: 64
    clusterNetworkMTU: 1400
    networkType: OVNKubernetes
    serviceNetwork:
    - fd02::/112
kind: List
metadata:
  resourceVersion: ""

3. Wait for control plane components to roll out successfully

Actual results:

The cluster fails with network, etcd, kube-apiserver, and ingress failures.

Expected results:

The cluster should convert to IPv6 single-stack without any issues.

Additional info:

Must-gathers are not working due to various control plane issues restricting them.


Ecosystem QE is preparing to create a release-4.16 branch within our test repos. Many packages are currently using v0.29 modules, which will not be compatible with v0.28. It would be ideal if we could update the k8s modules to v0.29 so we do not need to re-implement the assisted APIs.

 

Please review the following PR: https://github.com/openshift/csi-operator/pull/126

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

OCP 4.14.5
multicluster-engine.v2.4.1
advanced-cluster-management.v2.9.0

Attempt to create a spoke cluster:

apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
  creationTimestamp: "2023-12-08T16:59:25Z"
  finalizers:
  - agentclusterinstall.agent-install.openshift.io/ai-deprovision
  generation: 1
  name: infraenv-spoke
  namespace: infraenv-spoke
  ownerReferences:
  - apiVersion: hive.openshift.io/v1
    kind: ClusterDeployment
    name: infraenv-spoke
    uid: 34f1fe43-2af2-4880-b4ca-fb9ab8df13df
  resourceVersion: "3468594"
  uid: 79a42bdf-db1f-4500-b689-8b3813bd27a6
spec:
  clusterDeploymentRef:
    name: infraenv-spoke
  imageSetRef:
    name: 4.14-test
  networking:
    clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    serviceNetwork:
    - 172.30.0.0/16
    userManagedNetworking: true
  provisionRequirements:
    controlPlaneAgents: 3
    workerAgents: 2
status:
  conditions:
  - lastProbeTime: "2023-12-08T16:59:30Z"
    lastTransitionTime: "2023-12-08T16:59:30Z"
    message: SyncOK
    reason: SyncOK
    status: "True"
    type: SpecSynced
  - lastProbeTime: "2023-12-08T16:59:30Z"
    lastTransitionTime: "2023-12-08T16:59:30Z"
    message: The cluster is not ready to begin the installation
    reason: ClusterNotReady
    status: "False"
    type: RequirementsMet
  - lastProbeTime: "2023-12-08T16:59:30Z"
    lastTransitionTime: "2023-12-08T16:59:30Z"
    message: 'The cluster''s validations are failing: '
    reason: ValidationsFailing
    status: "False"
    type: Validated
  - lastProbeTime: "2023-12-08T16:59:30Z"
    lastTransitionTime: "2023-12-08T16:59:30Z"
    message: The installation has not yet started
    reason: InstallationNotStarted
    status: "False"
    type: Completed
  - lastProbeTime: "2023-12-08T16:59:30Z"
    lastTransitionTime: "2023-12-08T16:59:30Z"
    message: The installation has not failed
    reason: InstallationNotFailed
    status: "False"
    type: Failed
  - lastProbeTime: "2023-12-08T16:59:30Z"
    lastTransitionTime: "2023-12-08T16:59:30Z"
    message: The installation is waiting to start or in progress
    reason: InstallationNotStopped
    status: "False"
    type: Stopped
  debugInfo:
    eventsURL: https://assisted-service-rhacm.apps.sno-0.qe.lab.redhat.com/api/assisted-install/v2/events?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbHVzdGVyX2lkIjoiZjU4MjVmMTctNTg0OS00OTljLWE1NDctNjJmMDc4ZDU3MDJiIn0.qpSeZuqLwZ3cr3qn6AZo665o1ANp45YVE6IWUv7Gdn1RmapG4HZaxsUUY4iswkRMiqIfka_pLHFnBeVzXSTbrg&cluster_id=f5825f17-5849-499c-a547-62f078d5702b
    logsURL: ""
    state: insufficient
    stateInfo: Cluster is not ready for install
  platformType: None
  progress:
    totalPercentage: 0
  userManagedNetworking: true
apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  creationTimestamp: "2023-12-08T16:59:26Z"
  finalizers:
  - infraenv.agent-install.openshift.io/ai-deprovision
  generation: 1
  name: infraenv-spoke
  namespace: infraenv-spoke
  resourceVersion: "3468794"
  uid: 6254bbb3-5531-4665-bb78-f073b439b023
spec:
  clusterRef:
    name: infraenv-spoke
    namespace: infraenv-spoke
  cpuArchitecture: s390x
  ipxeScriptType: ""
  nmStateConfigLabelSelector: {}
  pullSecretRef:
    name: infraenv-spoke-pull-secret
status:
  agentLabelSelector:
    matchLabels:
      infraenvs.agent-install.openshift.io: infraenv-spoke
  bootArtifacts:
    initrd: ""
    ipxeScript: ""
    kernel: ""
    rootfs: ""
  conditions:
  - lastTransitionTime: "2023-12-08T16:59:51Z"
    message: 'Failed to create image: cannot use Minimal ISO because it''s not compatible
      with the s390x architecture on version 4.14.6 of OpenShift'
    reason: ImageCreationError
    status: "False"
    type: ImageCreated
  debugInfo:
    eventsURL: ""



 oc get clusterimagesets.hive.openshift.io 4.14-test -o yaml
apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  creationTimestamp: "2023-12-08T18:11:29Z"
  generation: 1
  name: 4.14-test
  resourceVersion: "3514589"
  uid: 32e2ba8d-6bb7-4e4b-b3a5-63fa8224d144
spec:
  releaseImage: registry.ci.openshift.org/ocp-s390x/release-s390x@sha256:f024a617c059bf2cbf4a669c2a19ab4129e78a007c6863b64dd73a413c0bdf46


oc get agentserviceconfigs.agent-install.openshift.io agent -o yaml
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  creationTimestamp: "2023-12-08T18:10:42Z"
  finalizers:
  - agentserviceconfig.agent-install.openshift.io/ai-deprovision
  generation: 1
  name: agent
  resourceVersion: "3514534"
  uid: ef204896-25f1-4ff3-ae60-c80c2f45cd30
spec:
  databaseStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  filesystemStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
  imageStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  mirrorRegistryRef:
    name: mirror-registry-ca
  osImages:
  - cpuArchitecture: x86_64
    openshiftVersion: "4.14"
    rootFSUrl: ""
    url: https://mirror.openshift.com/pub/openshift-v4/amd64/dependencies/rhcos/4.14/latest/rhcos-live.x86_64.iso
    version: "4.14"
  - cpuArchitecture: arm64
    openshiftVersion: "4.14"
    rootFSUrl: ""
    url: https://mirror.openshift.com/pub/openshift-v4/aarch64/dependencies/rhcos/4.14/latest/rhcos-live.aarch64.iso
    version: "4.14"
  - cpuArchitecture: ppc64le
    openshiftVersion: "4.14"
    rootFSUrl: ""
    url: https://mirror.openshift.com/pub/openshift-v4/ppc64le/dependencies/rhcos/4.14/latest/rhcos-live.ppc64le.iso
    version: "4.14"
  - cpuArchitecture: s390x
    openshiftVersion: "4.14"
    rootFSUrl: ""
    url: https://mirror.openshift.com/pub/openshift-v4/s390x/dependencies/rhcos/4.14/latest/rhcos-live.s390x.iso
    version: "4.14"
status:
  conditions:
  - lastTransitionTime: "2023-12-08T18:10:42Z"
    message: AgentServiceConfig reconcile completed without error.
    reason: ReconcileSucceeded
    status: "True"
    type: ReconcileCompleted
  - lastTransitionTime: "2023-12-08T18:11:23Z"
    message: All the deployments managed by Infrastructure-operator are healthy.
    reason: DeploymentSucceeded
    status: "True"
    type: DeploymentsHealthy



Description of problem:

The mirrorToDisk command fails when the imagesetconfig includes multiple catalogs (v2docker2 + oci)

 

Version-Release number of selected component (if applicable):

oc-mirror version 
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202403251146.p0.g03ce0ca.assembly.stream.el9-03ce0ca", GitCommit:"03ce0ca797e73b6762fd3e24100ce043199519e9", GitTreeState:"clean", BuildDate:"2024-03-25T16:34:33Z", GoVersion:"go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1) Copy the multi-arch OCI redhat-operator-index with skopeo:
`skopeo copy --all docker://registry.redhat.io/redhat/redhat-operator-index:v4.15 oci:///app1/noo/redhat-operator-index --remove-signatures`
  
2)  Set the imagesetconfig with multiple catalogs :
cat config-multi-cs.yaml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  operators:
    - catalog: oci:///app1/noo/redhat-operator-index
      packages:
        - name: aws-load-balancer-operator
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.15
      packages:
       - name: elasticsearch-operator
    - catalog: registry.redhat.io/redhat/redhat-marketplace-index:v4.15
      packages:
      - name: datadog-operator-certified-rhmp
    - catalog: registry.redhat.io/redhat/certified-operator-index:v4.15
      packages:
      - name: portworx-certified
    - catalog: registry.redhat.io/redhat/community-operator-index:v4.15
      packages:
      - name: seldon-operator

3) Run the mirrorToDisk
`oc-mirror --config config-multi-cs.yaml file://multics --v2` 

Actual results: 

3)  mirror command failed: 
oc-mirror --config config-multi-cs.yaml file://multics --v2
--v2 flag identified, flow redirected to the oc-mirror v2 version. PLEASE DO NOT USE that. V2 is still under development and it is not ready to be used. 
2024/04/02 06:41:10  [INFO]   : mode mirrorToDisk 
2024/04/02 06:41:10  [INFO]   : local storage registry will log to /app1/0401/multics/working-dir/logs/registry.log
2024/04/02 06:41:10  [INFO]   : starting local storage on localhost:55000
2024/04/02 06:41:10  [INFO]   : copying  cincinnati response to multics/working-dir/release-filters
2024/04/02 06:41:10  [INFO]   : total release images to copy 0 
2024/04/02 06:41:10  [INFO]   : copying operator image oci:///app1/noo/redhat-operator-index
2024/04/02 06:41:15  [INFO]   : manifest 8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c
2024/04/02 06:41:15  [INFO]   : label /configs
2024/04/02 06:41:29  [INFO]   : copying operator image registry.redhat.io/redhat/redhat-operator-index:v4.15
2024/04/02 06:41:39  [INFO]   : manifest c866c3b4dac531016e4798f3232bb40e07d7dabd7d628d575de53d1821e51a50
2024/04/02 06:41:39  [INFO]   : label /configs
2024/04/02 06:41:53  [INFO]   : copying operator image registry.redhat.io/redhat/redhat-marketplace-index:v4.15
2024/04/02 06:42:01  [INFO]   : manifest 895c1659c4337aaa963263f002fefa938087e40011cd2c1331ef4780d62fd1a7
2024/04/02 06:42:01  [INFO]   : label /configs
2024/04/02 06:42:07  [INFO]   : copying operator image registry.redhat.io/redhat/certified-operator-index:v4.15
2024/04/02 06:42:13  [INFO]   : manifest 5c32d95f0c6d873454f2fcd6a9750dbadf638b8db8ada3d0b1c282d80b0dbcb3
2024/04/02 06:42:13  [INFO]   : label /configs
2024/04/02 06:42:21  [INFO]   : copying operator image registry.redhat.io/redhat/community-operator-index:v4.15
2024/04/02 06:42:27  [INFO]   : manifest befb55a98886578684023b155c5889a845defb35d0d09c52b8738b851ee4eec2
2024/04/02 06:42:27  [INFO]   : label /configs
2024/04/02 06:42:36  [INFO]   : related images length 10 
2024/04/02 06:42:36  [INFO]   : images to copy (before duplicates) 26 
error closing log file registry.log: close multics/working-dir/logs/registry.log: file already closed
2024/04/02 06:42:36  [ERROR]  : unable to parse image  correctly

Expected results:

4) no error

The provisioning CR is now created with a paused annotation (since https://github.com/openshift/installer/pull/8346)

On baremetal IPI installs, this annotation is removed at the conclusion of bootstrapping.

On assisted/ABI installs there is nothing to remove it, so cluster-baremetal-operator never deploys anything.
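A minimal mitigation sketch (the annotation key below is a placeholder; the real key is defined in the installer PR above): removing the paused annotation from the Provisioning CR should let cluster-baremetal-operator start reconciling on assisted/ABI installs.

# <paused-annotation-key> is hypothetical; use the key added by openshift/installer#8346
oc annotate provisioning provisioning-configuration <paused-annotation-key>-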

Description of problem:
After upgrading from 4.15.0-rc.8 to 4.16.0-0.nightly-2024-02-23-013505 (gcp-ipi-disc-priv-oidc-f14), openshift-cloud-network-config-controller went into CrashLoopBackOff with "Error building cloud provider client, err: error: cannot initialize google client"; a must-gather is available. The other job (gcp-ipi-oidc-rt-fips-f14) failed with the same error.
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-gcp-ipi-disc-priv-oidc-f14/1761337726575054848

https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-gcp-ipi-oidc-rt-fips-f14/1760520933212164096

must-gather:
https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-gcp-ipi-disc-priv-oidc-f14/1761337726575054848/artifacts/gcp-ipi-disc-priv-oidc-f14/gather-must-gather/artifacts/


    

Version-Release number of selected component (if applicable):

 4.16.0-0.nightly-2024-02-23-013505 
    

How reproducible:


    

Steps to Reproduce:

Upgrade from 4.15.0-rc.8 to 4.16.0-0.nightly-2024-02-23-013505 (gcp-ipi-disc-priv-oidc-f14 or gcp-ipi-oidc-rt-fips-f14) and observe openshift-cloud-network-config-controller going into CrashLoopBackOff with the same "cannot initialize google client" error.
    

Actual results:


containerStatuses:
  - containerID: cri-o://b7dc826c4004583a4195f953bb7c858f3645b3ba864db65c69282fc8b7a9a9e8
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02a0ea00865bda78b3b04056dc9e4f596dae74996ecc1fcdee7fbe8d603e33f1
    imageID: 9dfa10971dce332900b111bbe6a28df76e1d6e0c5b9c132c3abfff80ea0afa9c
    lastState:
      terminated:
        containerID: cri-o://b7dc826c4004583a4195f953bb7c858f3645b3ba864db65c69282fc8b7a9a9e8
        exitCode: 255
        finishedAt: "2024-02-24T15:20:08Z"
        message: |
          r,UID:,APIVersion:apps/v1,ResourceVersion:,FieldPath:,},Reason:FeatureGatesInitialized,Message:FeatureGates updated to featuregates.Features{Enabled:[]v1.FeatureGateName{\"AlibabaPlatform\", \"AzureWorkloadIdentity\", \"BuildCSIVolumes\", \"CloudDualStackNodeIPs\", \"ExternalCloudProvider\", \"ExternalCloudProviderAzure\", \"ExternalCloudProviderExternal\", \"ExternalCloudProviderGCP\", \"KMSv1\", \"NetworkLiveMigration\", \"OpenShiftPodSecurityAdmission\", \"PrivateHostedZoneAWS\", \"VSphereControlPlaneMachineSet\"}, Disabled:[]v1.FeatureGateName{\"AdminNetworkPolicy\", \"AutomatedEtcdBackup\", \"CSIDriverSharedResource\", \"ClusterAPIInstall\", \"DNSNameResolver\", \"DisableKubeletCloudCredentialProviders\", \"DynamicResourceAllocation\", \"EventedPLEG\", \"GCPClusterHostedDNS\", \"GCPLabelsTags\", \"GatewayAPI\", \"InsightsConfigAPI\", \"InstallAlternateInfrastructureAWS\", \"MachineAPIOperatorDisableMachineHealthCheckController\", \"MachineAPIProviderOpenStack\", \"MachineConfigNodes\", \"ManagedBootImages\", \"MaxUnavailableStatefulSet\", \"MetricsServer\", \"MixedCPUsAllocation\", \"NodeSwap\", \"OnClusterBuild\", \"PinnedImages\", \"RouteExternalCertificate\", \"SignatureStores\", \"SigstoreImageVerification\", \"TranslateStreamCloseWebsocketRequests\", \"UpgradeStatus\", \"VSphereStaticIPs\", \"ValidatingAdmissionPolicy\", \"VolumeGroupSnapshot\"}},Source:EventSource{Component:cloud-network-config-controller-86bc6cf968-54kkg,Host:,},FirstTimestamp:2024-02-24 15:20:07.457325229 +0000 UTC m=+0.107570685,LastTimestamp:2024-02-24 15:20:07.457325229 +0000 UTC m=+0.107570685,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:cloud-network-config-controller-86bc6cf968-54kkg,ReportingInstance:,}"
          F0224 15:20:08.633010       1 main.go:138] Error building cloud provider client, err: error: cannot initialize google client, err: Get "http://169.254.169.254/computeMetadata/v1/universe/universe_domain": dial tcp 169.254.169.254:80: connect: connection refused
        reason: Error
        startedAt: "2024-02-24T15:20:07Z"
    name: controller
    ready: false
    restartCount: 12
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=controller pod=cloud-network-config-controller-86bc6cf968-54kkg_openshift-cloud-network-config-controller(95a0c264-ad8b-4fb0-9218-5b2b84fb8194)
        reason: CrashLoopBackOff

    

Expected results:

   CNCC won't crash after upgrade

    

Additional info:


    

Description of problem:

    If TaskRuns with the same name exist in two different namespaces, the TaskRuns list page for All namespaces shows only one record because the entries collide on name

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    Always

Steps to Reproduce:

    1. Create TaskRun using https://gist.github.com/karthikjeeyar/eb1bbdf9157431f5c875eb55ce47580c in 2 different namespace
    2. Go to TaskRun list page
    3. Select All Projects
    

Actual results:

    Only one entry is shown 

Expected results:

    Both entries should be visible

Additional info:

    

This is a clone of issue OCPBUGS-34181. The following is the description of the original issue:

In the agent installer, assisted-service must always use the openshift-baremetal-installer binary (which is dynamically linked) to ensure that if the target cluster is in FIPS mode the installer will be able to run. (This was implemented in MGMT-15150.)

A recent change for OCPBUGS-33227 has switched to using the statically-linked openshift-installer for 4.16 and later. This breaks FIPS on the agent-based installer.

It appears that CI tests for the agent installer (the compact-ipv4 job runs with FIPS enabled) did not detect this, because we are unable to correctly determine the "version" of OpenShift being installed when it is in fact a CI payload.

This is a clone of issue OCPBUGS-36932. The following is the description of the original issue:

Description of problem:

The customer defines a proxy in their HostedCluster resource definition. The variables are propagated to some pods but not to the oauth one:

 oc describe pod kube-apiserver-5f5dbf78dc-8gfgs | grep PROX
      HTTP_PROXY:   http://ocpproxy.corp.example.com:8080
      HTTPS_PROXY:  http://ocpproxy.corp.example.com:8080
      NO_PROXY:     .....
oc describe pod oauth-openshift-6d7b7c79f8-2cf99| grep PROX
      HTTP_PROXY:   socks5://127.0.0.1:8090
      HTTPS_PROXY:  socks5://127.0.0.1:8090
      ALL_PROXY:    socks5://127.0.0.1:8090
      NO_PROXY:     kube-apiserver

 

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster

...

spec:
  autoscaling: {}
  clusterID: 9c8db607-b291-4a72-acc7-435ec23a72ea
  configuration:

   .....
    proxy:
      httpProxy: http://ocpproxy.corp.example.com:8080
      httpsProxy: http://ocpproxy.corp.example.com:8080

 

Version-Release number of selected component (if applicable): 4.14
 

Please review the following PR: https://github.com/openshift/cluster-api-provider-ovirt/pull/176

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

In https://amd64.ocp.releases.ci.openshift.org/releasestream/4.16.0-0.ci/release/4.16.0-0.ci-2024-02-06-031624, I see several PRs involving moving crio metrics. This payload is being rejected on TargetDown alerts on AWS minor upgrades.
 
Example job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-[…]e-from-stable-4.15-e2e-aws-ovn-upgrade/1754707749708500992
 
[sig-node][invariant] alert/TargetDown should not be at or above info in ns/kube-system expand_less 0s { TargetDown was at or above info for at least 1m58s on platformidentification.JobType

{Release:"4.16", FromRelease:"4.15", Platform:"aws", Architecture:"amd64", Network:"ovn", Topology:"ha"}

(maxAllowed=1s): pending for 15m0s, firing for 1m58s: Feb 06 04:48:42.698 - 118s W namespace/kube-system alert/TargetDown alertstate/firing severity/warning ALERTS{alertname="TargetDown", alertstate="firing", job="crio", namespace="kube-system", prometheus="openshift-monitoring/k8s", service="kubelet", severity="warning"}}
 

Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver-operator/pull/101

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The perf & scale team was running scale tests to find out the maximum number of supported egress IPs and came across this issue. With 55339 EgressIP objects (each EgressIP object with one egress IP address) in a 118-worker-node baremetal cluster, the multus-admission-controller pod is stuck in CrashLoopBackOff state.

"oc describe pod" command output is copied here http://storage.scalelab.redhat.com/anilvenkata/multus-admission/multus-admission-controller-84b896c8-kmvdk.describe 

"oc describe pod" shows that the names of all 55339 egress IPs are passed to the container's exec command:
#cat multus-admission-controller-84b896c8-kmvdk.describe  | grep ignore-namespaces | tr ',' '\n' | grep -c egressip
55339

and the exec command fails because the argument list is too long.
# oc logs  -n openshift-multus multus-admission-controller-84b896c8-kmvdk
Defaulted container "multus-admission-controller" out of: multus-admission-controller, kube-rbac-proxy
exec /bin/bash: argument list too long

# oc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
network   4.14.16   True        True          False      35d     Deployment "/openshift-multus/multus-admission-controller" update is rolling out (1 out of 3 updated)

# oc describe pod -n openshift-multus multus-admission-controller-84b896c8-kmvdk > multus-admission-controller-84b896c8-kmvdk.describe
 
# oc get pods -n openshift-multus  | grep multus-admission-controller
multus-admission-controller-6c58c66ff9-5x9hn   2/2     Running            0                35d
multus-admission-controller-6c58c66ff9-zv9pd   2/2     Running            0                35d
multus-admission-controller-84b896c8-kmvdk     1/2     CrashLoopBackOff   26 (2m56s ago)   110m

As this environment has 55338 namespaces (each namespace with 1 pod and 1 EgressIP object), it will be hard to capture a must-gather.

Version-Release number of selected component (if applicable):

    4.14.16

How reproducible:

    always

Steps to Reproduce:

    1. Use kube-burner to create 55339 EgressIP objects, each object with one egress IP address.
    2. Observe the multus-admission-controller pod stuck in CrashLoopBackOff.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

The sdn image inherits from the cli image to get the oc binary. Change this to install the openshift-clients rpm instead.
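A minimal sketch of the intended change (the image reference and exact Dockerfile layout are illustrative, not the downstream build recipe):

# Before (sketch): oc copied from the cli image
#   FROM <cli-image> AS cli
#   COPY --from=cli /usr/bin/oc /usr/bin/oc
# After (sketch): install the client rpm instead
RUN yum install -y openshift-clients && yum clean all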
 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.

2.

3.

 

Actual results:

 

Expected results:

 

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure 
  2. customer issue / SD
  3. internal RedHat testing failure

 

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

 

If it is a CI failure:

 

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

 

If it is a customer / SD issue:

 

  • Provide enough information in the bug description that Engineering doesn't need to read the entire case history.
  • Don't presume that Engineering has access to Salesforce.
  • Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment.  The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
  • Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).  
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and  dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • For OCPBUGS in which the issue has been identified, label with "sbr-triaged"
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, labels with "sbr-untriaged"
  • Note: bugs that do not meet these minimum standards will be closed with label "SDN-Jira-template"

Add the possibility to have other search filters in the resource list toolbar

Why is this important?

  • This will let plugins define additional search filters on top of Name and Label in a resource list page

Scenarios

  1. In the nmstate-console-plugin I would like to define other filters like IP search in the NMState list. This search will let the user filter the nmstate resources that are inside a subnet or by specific ips.

For now, using the props and some hacks, we were able to change the Name search into an IP search, but we would like to have both.

https://issues.redhat.com/browse/CNV-36247

Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver/pull/60

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When running oc-mirror against a YAML config that includes the community-operator-index, the process terminates prematurely

Version-Release number of selected component (if applicable):

$ oc-mirror version
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202404221110.p0.g0e2235f.assembly.stream.el9-0e2235f", GitCommit:"0e2235f4a51ce0a2d51cfc87227b1c76bc7220ea", GitTreeState:"clean", BuildDate:"2024-04-22T16:05:56Z", GoVersion:"go1.21.9 (Red Hat 1.21.9-1.el9_4) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

$ cat imageset-config.yaml 
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
archiveSize: 4
mirror:
  platform:
    channels:
    - name: stable-4.15
      type: ocp
    graph: true
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.15
    full: false
  - catalog: registry.redhat.io/redhat/certified-operator-index:v4.15
    full: false
  - catalog: registry.redhat.io/redhat/community-operator-index:v4.15
    full: false
  additionalImages:
  - name: registry.redhat.io/ubi8/ubi:latest
  helm: {}


$ oc-mirror --v2 -c imageset-config.yaml  --loglevel debug --workspace file:////data/oc-mirror/workdir/ docker://registry.local.momolab.io:8443


Last 10 lines:

2024/04/29 06:01:40  [DEBUG]  : source docker://public.ecr.aws/aws-controllers-k8s/apigatewayv2-controller:1.0.7
2024/04/29 06:01:40  [DEBUG]  : destination docker://registry.local.momolab.io:8443/aws-controllers-k8s/apigatewayv2-controller:1.0.7
2024/04/29 06:01:40  [DEBUG]  : source docker://quay.io/openshift-community-operators/ack-apigatewayv2-controller@sha256:c6844909fa2fdf8aabf1c6762a2871d85fb3491e4c349990f46e4cd1e7ecc099
2024/04/29 06:01:40  [DEBUG]  : destination docker://registry.local.momolab.io:8443/openshift-community-operators/ack-apigatewayv2-controller:c6844909fa2fdf8aabf1c6762a2871d85fb3491e4c349990f46e4cd1e7ecc099
2024/04/29 06:01:40  [DEBUG]  : source docker://quay.io/openshift-community-operators/openshift-nfd-operator@sha256:880517267f12e0ca4dd9621aa196c901eb1f754e5ec990a1459d0869a8c17451
2024/04/29 06:01:40  [DEBUG]  : destination docker://registry.local.momolab.io:8443/openshift-community-operators/openshift-nfd-operator:880517267f12e0ca4dd9621aa196c901eb1f754e5ec990a1459d0869a8c17451
2024/04/29 06:01:40  [DEBUG]  : source docker://quay.io/openshift/origin-cluster-nfd-operator:4.10
2024/04/29 06:01:40  [DEBUG]  : destination docker://registry.local.momolab.io:8443/openshift/origin-cluster-nfd-operator:4.10
2024/04/29 06:01:40  [ERROR]  : [OperatorImageCollector] unable to parse image registry.redhat.io/openshift4/ose-kube-rbac-proxy correctly
2024/04/29 06:01:40  [INFO]   : 👋 Goodbye, thank you for using oc-mirror
error closing log file registry.log: close /data/oc-mirror/workdir/working-dir/logs/registry.log: file already closed
2024/04/29 06:01:40  [ERROR]  : unable to parse image registry.redhat.io/openshift4/ose-kube-rbac-proxy correctly 

 

Steps to Reproduce:

    1. Run oc-mirror command as above with debug enabled
    2. Wait a few minutes
    3. oc-mirror fails
    

Actual results:

    oc-mirror fails when openshift-community-operator is included

Expected results:

    oc-mirror to complete

Additional info:

I have the debug logs, which I can attach.    

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-35752. The following is the description of the original issue:

Description of problem:

When installing a fresh 4.16-rc.5 on AWS, the following logs are shown:

time="2024-06-18T16:47:23+02:00" level=debug msg="I0618 16:47:23.596147    4921 logger.go:75] \"enabling EKS controllers and webhooks\" logger=\"setup\""
time="2024-06-18T16:47:23+02:00" level=debug msg="I0618 16:47:23.596154    4921 logger.go:81] \"EKS IAM role creation\" logger=\"setup\" enabled=false"
time="2024-06-18T16:47:23+02:00" level=debug msg="I0618 16:47:23.596159    4921 logger.go:81] \"EKS IAM additional roles\" logger=\"setup\" enabled=false"
time="2024-06-18T16:47:23+02:00" level=debug msg="I0618 16:47:23.596164    4921 logger.go:81] \"enabling EKS control plane controller\" logger=\"setup\""
time="2024-06-18T16:47:23+02:00" level=debug msg="I0618 16:47:23.596184    4921 logger.go:81] \"enabling EKS bootstrap controller\" logger=\"setup\""
time="2024-06-18T16:47:23+02:00" level=debug msg="I0618 16:47:23.596198    4921 logger.go:81] \"enabling EKS managed cluster controller\" logger=\"setup\""
time="2024-06-18T16:47:23+02:00" level=debug msg="I0618 16:47:23.596215    4921 logger.go:81] \"enabling EKS managed machine pool controller\" logger=\"setup\""

That is somehow strange and may have side effects. It seems the EKS CAPA is enabled by default (see additional info)

Version-Release number of selected component (if applicable):

4.16-rc.5

How reproducible:

Always

Steps to Reproduce:

1. Install an cluster (even an SNO works) on AWS using IPI

Actual results:

EKS feature enabled

Expected results:

EKS feature not enabled

Additional info:

https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/feature/feature.go#L99

This is a clone of issue OCPBUGS-39402. The following is the description of the original issue:

There is a typo here: https://github.com/openshift/installer/blob/release-4.18/upi/openstack/security-groups.yaml#L370

It should be os_subnet6_range.

That task is only run if os_master_schedulable is defined and greater than 0 in the inventory.yaml.

Description of problem:
Listing DeploymentConfigs triggers a warning notification that is not required for the Display Warning Policy feature. This warning response is set in the cluster by default. See the warning response below:

299 - "apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+"

Version-Release number of selected component (if applicable):
4.16

How reproducible:

    Steps to Reproduce:
    1. Click on Deployment Config sub nav
    2. The Admission Webhook notification is displayed
    3.

Additional info:
I think this is fine since the CLI behaves like that too. Will discuss this behavior in the next stand-up.

    Actual results:


    Expected results:


    Additional info:

    

Please review the following PR: https://github.com/openshift/csi-external-provisioner/pull/80

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/aws-ebs-csi-driver/pull/250

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33869. The following is the description of the original issue:

Download and merge French and Spanish languages translations in the OCP Console. 

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-43350. The following is the description of the original issue:

This is a clone of issue OCPBUGS-42732. The following is the description of the original issue:

Description of problem:

    The operator cannot successfully remove resources when the management state is set to Removed after networkAccess has been set to Internal.
    It looks like the authorization error changes from bloberror.AuthorizationPermissionMismatch to bloberror.AuthorizationFailure after the storage account becomes private (networkAccess: Internal).
    This is either caused by weird behavior in the azure sdk, or in the azure api itself.
    The easiest way to solve it is to also handle bloberror.AuthorizationFailure here: https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/storage/azure/azure.go?plain=1#L1145
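    A minimal sketch of that handling, assuming the azblob SDK's bloberror helpers (illustrative, not the actual operator code):

package azure

import "github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/bloberror"

// isAuthorizationError treats both codes the same way during removal, since Azure
// starts returning AuthorizationFailure once the storage account becomes private.
func isAuthorizationError(err error) bool {
	return bloberror.HasCode(err,
		bloberror.AuthorizationPermissionMismatch,
		bloberror.AuthorizationFailure)
}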

    The error condition is the following:

status:
  conditions:
  - lastTransitionTime: "2024-09-27T09:04:20Z"
    message: "Unable to delete storage container: DELETE https://imageregistrywxj927q6bpj.blob.core.windows.net/wxj-927d-jv8fc-image-registry-rwccleepmieiyukdxbhasjyvklsshhee\n--------------------------------------------------------------------------------\nRESPONSE
      403: 403 This request is not authorized to perform this operation.\nERROR CODE:
      AuthorizationFailure\n--------------------------------------------------------------------------------\n\uFEFF<?xml
      version=\"1.0\" encoding=\"utf-8\"?><Error><Code>AuthorizationFailure</Code><Message>This
      request is not authorized to perform this operation.\nRequestId:ababfe86-301e-0005-73bd-10d7af000000\nTime:2024-09-27T09:10:46.1231255Z</Message></Error>\n--------------------------------------------------------------------------------\n"
    reason: AzureError
    status: Unknown
    type: StorageExists
  - lastTransitionTime: "2024-09-27T09:02:26Z"
    message: The registry is removed
    reason: Removed
    status: "True"
    type: Available 

Version-Release number of selected component (if applicable):

    4.18, 4.17, 4.16 (needs confirmation), 4.15 (needs confirmation)

How reproducible:

    Always

Steps to Reproduce:

    1. Get an Azure cluster
    2. In the operator config, set networkAccess to Internal
    3. Wait until the operator reconciles the change (watch networkAccess in status with `oc get configs.imageregistry/cluster -oyaml |yq '.status.storage'`)
    4. In the operator config, set management state to removed: `oc patch configs.imageregistry/cluster -p '{"spec":{"managementState":"Removed"}}' --type=merge`
    5. Watch the cluster operator conditions for the error

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-38569. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38551. The following is the description of the original issue:

Description of problem:

    If multiple NICs are configured in install-config, the installer will provision nodes properly but will fail during bootstrap due to API validation. 4.17 and later will support multiple NICs; releases before 4.17 will not and will fail.

Aug 15 18:30:57 2.252.83.01.in-addr.arpa cluster-bootstrap[4889]: [#1672] failed to create some manifests:
Aug 15 18:30:57 2.252.83.01.in-addr.arpa cluster-bootstrap[4889]: "cluster-infrastructure-02-config.yml": failed to create infrastructures.v1.config.openshift.io/cluster -n : Infrastructure.config.openshift.io "cluster" is invalid: [spec.platformSpec.vsphere.failureDomains[0].topology.networks: Too many: 2: must have at most 1 items, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]
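For illustration, the kind of install-config fragment that triggers this on releases before 4.17 (network names are placeholders):

platform:
  vsphere:
    failureDomains:
    - topology:
        networks:
        - segment-a
        - segment-b   # second NIC; rejected by the Infrastructure API validation on < 4.17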

 

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/337

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Openshift Installer supports HTTP Proxy configuration in a restricted environment. However, it seems the bootstrap node doesn't use the given proxy when it grabs ignition assets.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-04-27-113605

How reproducible:

Always

Steps to Reproduce:

1. try IPI installation in a restricted/disconnected network with "publish: Internal", and without using Google Private Access 

Actual results:

The installation failed, because bootstrap node failed to fetch its ignition config.

Expected results:

The installation should succeed.

Additional info:

We'd ever fixed similar issue on AWS (and Alibabacloud) by https://bugzilla.redhat.com/show_bug.cgi?id=2090836.

Please review the following PR: https://github.com/openshift/cloud-network-config-controller/pull/129

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-livenessprobe/pull/58

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/machine-os-images/pull/34

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

I noticed this today when looking at component readiness.  A ~5% decrease in instability may seem minor, but these can certainly add up.  This test passed 713 times in a row on 4.14.  You can see today's failure here.

 

Details below:

-------

Component Readiness has found a potential regression in [sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers.

Probability of significant regression: 99.96%

Sample (being evaluated) Release: 4.15
Start Time: 2024-01-17T00:00:00Z
End Time: 2024-01-23T23:59:59Z
Success Rate: 94.83%
Successes: 55
Failures: 3
Flakes: 0

Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 100.00%
Successes: 713
Failures: 0
Flakes: 4

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=Other&component=Unknown&confidence=95&environment=ovn%20upgrade-minor%20amd64%20gcp%20rt&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=gcp&platform=gcp&sampleEndTime=2024-01-23%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2024-01-17%2000%3A00%3A00&testId=openshift-tests-upgrade%3A37f1600d4f8d75c47fc5f575025068d2&testName=%5Bsig-cluster-lifecycle%5D%20pathological%20event%20should%20not%20see%20excessive%20Back-off%20restarting%20failed%20containers&upgrade=upgrade-minor&upgrade=upgrade-minor&variant=rt&variant=rt

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

The hypershift operator introduced Azure customer-managed-key etcd encryption in https://github.com/openshift/hypershift/pull/3183. The implementation will not work in any Azure cloud other than the public cloud, because the keyvault URL is hardcoded (https://github.com/openshift/hypershift/blob/cd4d4c69a64d8983da04d7bb26ea39a72109e135/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L4871) to vault.azure.net, which is the keyvault domain suffix for the public cloud only. The cloud-specific keyvault domain suffixes are here: https://learn.microsoft.com/en-us/azure/key-vault/general/about-keys-secrets-certificates#dns-suffixes-for-object-identifiers
    

Version-Release number of selected component (if applicable):

    Since https://github.com/openshift/hypershift/pull/3183 was merged

How reproducible:

    Every time

Steps to Reproduce:

    1. 
    2.
    3.
    

Actual results:

    The keyvault domain is hardcoded to work specifically for the public cloud, but will not work for Azure Gov cloud when using etcd encryption with customer-managed keys

Expected results:

    The keyvault domain to fetch from will use the correct cloud's domain suffix as outlined here: https://learn.microsoft.com/en-us/azure/key-vault/general/about-keys-secrets-certificates#dns-suffixes-for-object-identifiers

Additional info:

    

Description of problem:

Trying to install the AWS EFS driver 4.15 on OCP 4.16, the driver pods get stuck with the error below:
$ oc get pods
NAME                                             READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-controller-5f85b66c6-5gw8n    11/11   Running   0          80m
aws-ebs-csi-driver-controller-5f85b66c6-r5lzm    11/11   Running   0          80m
aws-ebs-csi-driver-node-4mcjp                    3/3     Running   0          76m
aws-ebs-csi-driver-node-82hmk                    3/3     Running   0          76m
aws-ebs-csi-driver-node-p7g8j                    3/3     Running   0          80m
aws-ebs-csi-driver-node-q9bnd                    3/3     Running   0          75m
aws-ebs-csi-driver-node-vddmg                    3/3     Running   0          80m
aws-ebs-csi-driver-node-x8cwl                    3/3     Running   0          80m
aws-ebs-csi-driver-operator-5c77fbb9fd-dc94m     1/1     Running   0          80m
aws-efs-csi-driver-controller-6c4c6f8c8c-725f4   4/4     Running   0          11m
aws-efs-csi-driver-controller-6c4c6f8c8c-nvtl7   4/4     Running   0          12m
aws-efs-csi-driver-node-2frs7                    0/3     Pending   0          6m29s
aws-efs-csi-driver-node-5cpb8                    0/3     Pending   0          6m26s
aws-efs-csi-driver-node-bchg5                    0/3     Pending   0          6m28s
aws-efs-csi-driver-node-brndb                    0/3     Pending   0          6m27s
aws-efs-csi-driver-node-qcc4m                    0/3     Pending   0          6m27s
aws-efs-csi-driver-node-wpk5d                    0/3     Pending   0          6m27s
aws-efs-csi-driver-operator-6b54c78484-gvxrt     1/1     Running   0          13m

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  6m58s                  default-scheduler  0/6 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/6 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 5 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling  3m42s (x2 over 4m24s)  default-scheduler  0/6 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/6 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 5 node(s) didn't match Pod's node affinity/selector.

 

 

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    all the time

Steps to Reproduce:

    1. Install AWS EFS CSI driver 4.15 in 4.16 OCP
    2.
    3.
    

Actual results:

    EFS CSI drive node pods are stuck in pending state

Expected results:

    All pod should be running.

Additional info:

    More info on the initial debug here: https://redhat-internal.slack.com/archives/CBQHQFU0N/p1715757611210639

Description of problem:

The bubble box is rendered with a wrong layout

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-11-16-110328

How reproducible:

Always

Steps to Reproduce:

1. Make sure there is no pod in the project you are using
2. Navigate to Networking -> NetworkPolicies -> Create NetworkPolicy page, click the 'affected pods' link in the Pod selector section
3. Check the layout of the bubble component

Actual results:

The layout is incorrect (shared file: https://drive.google.com/file/d/1I8e2ZkiFO2Gu4nSt9kJ6JmRG3LdvkE-u/view?usp=drive_link)

Expected results:

The layout should be correct

Additional info:

 

This is a clone of issue OCPBUGS-34079. The following is the description of the original issue:

Description of problem:

If a cluster admin creates a new MachineOSConfig that references a legacy pull secret, the canonicalized version of this secret that gets created is not updated whenever the original pull secret changes.

 

How reproducible:

Always

 

Steps to Reproduce:

  1. Create a new legacy-style Docker pull secret in the MCO namespace. Specifically, one which follows the pattern of {"hostname.com": {"username": ""...} (see the example secret after these steps).

  2. Create a MachineOSConfig that references this legacy pull secret. The MachineOSConfig will get updated with a different secret name with the suffix -canonical.
  3. Change the original legacy-style Docker pull secret that was created to a different secret.
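For reference, the difference between the two secret payloads looks roughly like this (the hostname and credentials are placeholders):

# legacy-style payload (kubernetes.io/dockercfg)
{"hostname.com": {"username": "user", "password": "pass", "email": "user@example.com"}}

# canonicalized payload (kubernetes.io/dockerconfigjson)
{"auths": {"hostname.com": {"username": "user", "password": "pass", "email": "user@example.com"}}}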

Actual results:

The canonicalized version of the pull secret is never updated with the contents of the legacy-style pull secret.

 

Expected results:

Ideally, the canonicalized version of the pull secret should be updated since BuildController created it.

 

Additional info:

This occurs because when the legacy pull secret is initially detected, BuildController canonicalizes it and then updates the MachineOSConfig with the name of the canonicalized secret. The next time this secret is referenced, the original secret does not get read.

Description of problem:

A user noticed on cluster deletion that the IPI-generated service instance was not cleaned up. Add more debugging statements to find out why.
    

Version-Release number of selected component (if applicable):


    

How reproducible:

Always
    

Steps to Reproduce:

    1. Create cluster
    2. Delete cluster
    

Actual results:


    

Expected results:


    

Additional info:


    

Description of problem:

RHEL8 workers fail to go ready, ovn-controller node component is crashlooping with

2024-03-29T20:41:34.082252221Z + sourcedir=/usr/libexec/cni/
2024-03-29T20:41:34.082269221Z + case "${rhelmajor}" in
2024-03-29T20:41:34.082269221Z + sourcedir=/usr/libexec/cni/rhel8
2024-03-29T20:41:34.082276361Z + cp -f /usr/libexec/cni/rhel8/ovn-k8s-cni-overlay /cni-bin-dir/
2024-03-29T20:41:34.083575440Z cp: cannot stat '/usr/libexec/cni/rhel8/ovn-k8s-cni-overlay': No such file or directory

Version-Release number of selected component (if applicable):

    4.16.0

How reproducible:

    100% since https://github.com/openshift/ovn-kubernetes/pull/2083 merged

Steps to Reproduce:

    1. run periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-workers-rhel8
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

revert "force cert rotation every couple days for development" in 4.16

Below are the steps to verify this bug:

# oc adm release info --commits registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-06-25-081133|grep -i cluster-kube-apiserver-operator
  cluster-kube-apiserver-operator                https://github.com/openshift/cluster-kube-apiserver-operator                7764681777edfa3126981a0a1d390a6060a840a3

# git log --date local --pretty="%h %an %cd - %s" 776468 |grep -i "#1307"
08973b820 openshift-ci[bot] Thu Jun 23 22:40:08 2022 - Merge pull request #1307 from tkashem/revert-cert-rotation

# oc get clusterversions.config.openshift.io 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-25-081133   True        False         64m     Cluster version is 4.11.0-0.nightly-2022-06-25-081133

$ cat scripts/check_secret_expiry.sh
FILE="$1"
if [ ! -f "$1" ]; then
  echo "must provide \$1" && exit 0
fi
export IFS=$'\n'
for i in `cat "$FILE"`
do
  if `echo "$i" | grep "^#" > /dev/null`; then
    continue
  fi
  NS=`echo $i | cut -d ' ' -f 1`
  SECRET=`echo $i | cut -d ' ' -f 2`
  rm -f tls.crt; oc extract secret/$SECRET -n $NS --confirm > /dev/null
  echo "Check cert dates of $SECRET in project $NS:"
  openssl x509 -noout --dates -in tls.crt; echo
done

$ cat certs.txt
openshift-kube-controller-manager-operator csr-signer-signer
openshift-kube-controller-manager-operator csr-signer
openshift-kube-controller-manager kube-controller-manager-client-cert-key
openshift-kube-apiserver-operator aggregator-client-signer
openshift-kube-apiserver aggregator-client
openshift-kube-apiserver external-loadbalancer-serving-certkey
openshift-kube-apiserver internal-loadbalancer-serving-certkey
openshift-kube-apiserver service-network-serving-certkey
openshift-config-managed kube-controller-manager-client-cert-key
openshift-config-managed kube-scheduler-client-cert-key
openshift-kube-scheduler kube-scheduler-client-cert-key

Checking the certs, they have one-day expiry times, as expected.
# ./check_secret_expiry.sh certs.txt
Check cert dates of csr-signer-signer in project openshift-kube-controller-manager-operator:
notBefore=Jun 27 04:41:38 2022 GMT
notAfter=Jun 28 04:41:38 2022 GMT

Check cert dates of csr-signer in project openshift-kube-controller-manager-operator:
notBefore=Jun 27 04:52:21 2022 GMT
notAfter=Jun 28 04:41:38 2022 GMT

Check cert dates of kube-controller-manager-client-cert-key in project openshift-kube-controller-manager:
notBefore=Jun 27 04:52:26 2022 GMT
notAfter=Jul 27 04:52:27 2022 GMT

Check cert dates of aggregator-client-signer in project openshift-kube-apiserver-operator:
notBefore=Jun 27 04:41:37 2022 GMT
notAfter=Jun 28 04:41:37 2022 GMT

Check cert dates of aggregator-client in project openshift-kube-apiserver:
notBefore=Jun 27 04:52:26 2022 GMT
notAfter=Jun 28 04:41:37 2022 GMT

Check cert dates of external-loadbalancer-serving-certkey in project openshift-kube-apiserver:
notBefore=Jun 27 04:52:26 2022 GMT
notAfter=Jul 27 04:52:27 2022 GMT

Check cert dates of internal-loadbalancer-serving-certkey in project openshift-kube-apiserver:
notBefore=Jun 27 04:52:49 2022 GMT
notAfter=Jul 27 04:52:50 2022 GMT

Check cert dates of service-network-serving-certkey in project openshift-kube-apiserver:
notBefore=Jun 27 04:52:28 2022 GMT
notAfter=Jul 27 04:52:29 2022 GMT

Check cert dates of kube-controller-manager-client-cert-key in project openshift-config-managed:
notBefore=Jun 27 04:52:26 2022 GMT
notAfter=Jul 27 04:52:27 2022 GMT

Check cert dates of kube-scheduler-client-cert-key in project openshift-config-managed:
notBefore=Jun 27 04:52:47 2022 GMT
notAfter=Jul 27 04:52:48 2022 GMT

Check cert dates of kube-scheduler-client-cert-key in project openshift-kube-scheduler:
notBefore=Jun 27 04:52:47 2022 GMT
notAfter=Jul 27 04:52:48 2022 GMT
# 

# cat check_secret_expiry_within.sh
#!/usr/bin/env bash
# usage: ./check_secret_expiry_within.sh 1day # or 15min, 2days, 2day, 2month, 1year
WITHIN=${1:-24hours}
echo "Checking validity within $WITHIN ..."
oc get secret --insecure-skip-tls-verify -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | . != null and fromdateiso8601<='$( date --date="+$WITHIN" +%s )') | "\(.metadata.annotations."auth.openshift.io/certificate-not-before")  \(.metadata.annotations."auth.openshift.io/certificate-not-after")  \(.metadata.namespace)\t\(.metadata.name)"'

# ./check_secret_expiry_within.sh 1day
Checking validity within 1day ...
2022-06-27T04:41:37Z  2022-06-28T04:41:37Z  openshift-kube-apiserver-operator	aggregator-client-signer
2022-06-27T04:52:26Z  2022-06-28T04:41:37Z  openshift-kube-apiserver	aggregator-client
2022-06-27T04:52:21Z  2022-06-28T04:41:38Z  openshift-kube-controller-manager-operator	csr-signer
2022-06-27T04:41:38Z  2022-06-28T04:41:38Z  openshift-kube-controller-manager-operator	csr-signer-signer

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

 

Description of problem:

    See https://issues.redhat.com/browse/OCPBUGS-26053


Version-Release number of selected component (if applicable):

    

How reproducible:

    Always

Steps to Reproduce:

    1. Create an ETP=Local LB Service on LGW for some v6 workload (assign IP to lb with MetalLB or manually)
    2. Set static routes to a node hosting a pod on the client
    3. Attempt reaching the IPv6 Service fails
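A minimal example of such a Service (name, selector, and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: v6-workload
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ipFamilyPolicy: SingleStack
  ipFamilies:
  - IPv6
  selector:
    app: v6-workload
  ports:
  - port: 80
    targetPort: 8080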
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-34360. The following is the description of the original issue:

Description of problem:

    After upgrading to OpenShift 4.14, the must-gather took much longer than before.

Version-Release number of selected component (if applicable):

    4.14

How reproducible:

    Always

Steps to Reproduce:

    1. Run oc adm must-gather
    2. Wait for it to complete
    3.
    

Actual results:

    For a cluster with around 50 nodes, the must-gather took about 30 minutes.

Expected results:

   For a cluster with around 50 nodes, the must-gather can finish in about 10 minutes.

Additional info:

    It seems the gather_ppc collection script is related here.

https://github.com/openshift/must-gather/blob/release-4.14/collection-scripts/gather_ppc

Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/779

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When running an agent-based installation with the arm64 or multi payload, after booting the ISO file, assisted-service raises the following error and the installation fails to start:

Openshift version 4.16.0-0.nightly-arm64-2024-04-02-182838 for CPU architecture arm64 is not supported: no release image found for openshiftVersion: '4.16.0-0.nightly-arm64-2024-04-02-182838' and CPU architecture 'arm64'" go-id=419 pkg=Inventory request_id=5817b856-ca79-43c0-84f1-b38f733c192f 

The same error appears in assisted-service.log when running the installation with the multi-arch build:

Openshift version 4.16.0-0.nightly-multi-2024-04-01-135550 for CPU architecture multi is not supported: no release image found for openshiftVersion: '4.16.0-0.nightly-multi-2024-04-01-135550' and CPU architecture 'multi'" go-id=306 pkg=Inventory request_id=21a47a40-1de9-4ee3-9906-a2dd90b14ec8 

The amd64 build works fine for now.

Version-Release number of selected component (if applicable):

    

How reproducible:

always

Steps to Reproduce:

1. Create the agent ISO file with the openshift-install binary (openshift-install agent create image) using the arm64/multi payload
2. Boot the ISO file
3. Track the "openshift-install agent wait-for bootstrap-complete" output and the assisted-service log
    

Actual results:

 The installation cannot start; it fails with the error above

Expected results:

 The installation is working fine

Additional info:

assisted-service log: https://docs.google.com/spreadsheets/d/1Jm-eZDrVz5so4BxsWpUOlr3l_90VmJ8FVEvqUwG8ltg/edit#gid=0

Failing job URLs:
multi payload: 
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-multi-nightly-baremetal-compact-agent-ipv4-dhcp-day2-amd-mixarch-f14/1774134780246364160

arm64 payload:
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-arm64-nightly-baremetal-pxe-ha-agent-ipv4-static-connected-f14/1773354788239446016

Tracker issue for bootimage bump in 4.16. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-32041.

Description of problem:

    On the service details page, under the Revision and Route tabs, the user sees a "No resource found" message although a Revision and a Route have been created for that service

Version-Release number of selected component (if applicable):

    4.15.z

How reproducible:

    Always

Steps to Reproduce:

    1. Install the serverless operator
    2. Create a serving instance
    3. Create a knative service / function
    4. Go to the details page
    

Actual results:

    User is not able to see Revision and Route created for the service

Expected results:

     User should be able to see Revision and Route created for the service

Additional info:

    

Please review the following PR: https://github.com/openshift/prom-label-proxy/pull/359

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/630

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

HardwareDetails is a pointer and we fail to check whether it is nil. The installer panics when attempting to collect gather logs from the masters.
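A minimal sketch of the missing guard, using simplified stand-in types rather than the actual installer/metal3 structs: the pointer has to be checked before it is dereferenced, because hardware inspection may not have populated it yet.

package sketch

// Simplified stand-in types; the real installer code works with the metal3
// BareMetalHost API, where Status.HardwareDetails is a pointer.
type nic struct{ IP string }

type hardwareDetails struct{ NICs []nic }

type hostStatus struct{ HardwareDetails *hardwareDetails }

type baremetalHost struct{ Status hostStatus }

// hostIPs guards the pointer before dereferencing it instead of panicking
// when hardware inspection has not populated HardwareDetails yet.
func hostIPs(h *baremetalHost) []string {
	if h == nil || h.Status.HardwareDetails == nil {
		return nil
	}
	ips := make([]string, 0, len(h.Status.HardwareDetails.NICs))
	for _, n := range h.Status.HardwareDetails.NICs {
		ips = append(ips, n.IP)
	}
	return ips
}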

This is a clone of issue OCPBUGS-33792. The following is the description of the original issue:

Description of problem:

The ingress operator's E2E tests are perma-failing with a prometheus service account issue:
=== CONT  TestAll/parallel/TestRouteMetricsControllerRouteAndNamespaceSelector
    route_metrics_test.go:86: prometheus service account not found
=== CONT  TestAll/parallel/TestRouteMetricsControllerOnlyNamespaceSelector
    route_metrics_test.go:86: prometheus service account not found
=== CONT  TestAll/parallel/TestRouteMetricsControllerOnlyRouteSelector
    route_metrics_test.go:86: prometheus service account not found

We need to bump openshift/library-go to pick up https://github.com/openshift/library-go/pull/1697, which updates the NewPrometheusClient function to switch from the legacy service account token API to the TokenRequest API.
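For reference, a minimal sketch of requesting a short-lived token via the TokenRequest API; this is not the library-go code itself, and the expiration value is illustrative.

package sketch

import (
	"context"

	authenticationv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// requestToken asks the API server for a short-lived token for the given
// service account via the TokenRequest subresource, instead of reading a
// long-lived token secret.
func requestToken(ctx context.Context, client kubernetes.Interface, namespace, serviceAccount string) (string, error) {
	expiration := int64(3600) // one hour; illustrative value
	tr, err := client.CoreV1().ServiceAccounts(namespace).CreateToken(ctx, serviceAccount,
		&authenticationv1.TokenRequest{
			Spec: authenticationv1.TokenRequestSpec{ExpirationSeconds: &expiration},
		}, metav1.CreateOptions{})
	if err != nil {
		return "", err
	}
	return tr.Status.Token, nil
}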

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    100%

Steps to Reproduce:

    1. Run e2e-[aws|gcp|azure]-operator E2E tests on cluster-ingress-operator

Actual results:

     route_metrics_test.go:86: prometheus service account not found 

Expected results:

    No failure

Additional info:

    

Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver/pull/60

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-34869. The following is the description of the original issue:

Description of problem:

During IPI CAPI cluster creation, it is possible that the load balancer is busy at the time of the call, so AddIPToLoadBalancerPool should be wrapped in a PollUntilContextCancel loop (see the sketch below).
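A minimal sketch of that retry loop using wait.PollUntilContextCancel; addIPToLoadBalancerPool here is a stand-in for the real call and the interval is illustrative.

package sketch

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// retryAddIPToPool keeps retrying the (stand-in) addIPToLoadBalancerPool call
// until it succeeds or the surrounding context is cancelled.
func retryAddIPToPool(ctx context.Context, addIPToLoadBalancerPool func(context.Context) error) error {
	return wait.PollUntilContextCancel(ctx, 15*time.Second, true, func(ctx context.Context) (bool, error) {
		if err := addIPToLoadBalancerPool(ctx); err != nil {
			// the load balancer may still be busy; keep polling instead of failing outright
			return false, nil
		}
		return true, nil
	})
}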
    

Please review the following PR: https://github.com/openshift/kubernetes-autoscaler/pull/271

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    Sample job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-qe-ocp-qe-perfscale-ci-main-azure-4.15-nightly-x86-data-path-9nodes/1760228008968327168

Version-Release number of selected component (if applicable):

    

How reproducible:

    Anytime there is an error from the move-blobs command

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    A panic is shown, followed by the error message

Expected results:

    Only the error message is shown, without a panic

Additional info:

    

After some investigation, the issues we have seen with util-linux missing from some images are due to the CentOS Stream base image not installing subscription-manager.

 

[root@92063ff10998 /]# yum install subscription-manager
CentOS Stream 9 - BaseOS                                                                                                                                                                                      3.3 MB/s | 8.9 MB     00:02    
CentOS Stream 9 - AppStream                                                                                                                                                                                   2.1 MB/s |  17 MB     00:08    
CentOS Stream 9 - Extras packages                                                                                                                                                                              14 kB/s |  17 kB     00:01    
Dependencies resolved.
==============================================================================================================================================================================================================================================
 Package                                                                        Architecture                                    Version                                                  Repository                                      Size
==============================================================================================================================================================================================================================================
Installing:
 subscription-manager                                                           aarch64                                         1.29.40-1.el9                                            baseos                                         911 k
Installing dependencies:
 acl                                                                            aarch64                                         2.3.1-4.el9                                              baseos                                          71 k
 checkpolicy                                                                    aarch64                                         3.6-1.el9                                                baseos                                         348 k
 cracklib                                                                       aarch64                                         2.9.6-27.el9                                             baseos                                          95 k
 cracklib-dicts                                                                 aarch64                                         2.9.6-27.el9                                             baseos                                         3.6 M
 dbus                                                                           aarch64                                         1:1.12.20-8.el9                                          baseos                                         3.7 k
 dbus-broker                                                                    aarch64                                         28-7.el9                                                 baseos                                         166 k
 dbus-common                                                                    noarch                                          1:1.12.20-8.el9                                          baseos                                          15 k
 dbus-libs                                                                      aarch64                                         1:1.12.20-8.el9                                          baseos                                         150 k
 diffutils                                                                      aarch64                                         3.7-12.el9                                               baseos                                         392 k
 dmidecode                                                                      aarch64                                         1:3.3-7.el9                                              baseos                                          70 k
 gobject-introspection                                                          aarch64                                         1.68.0-11.el9                                            baseos                                         248 k
 iproute                                                                        aarch64                                         6.2.0-5.el9                                              baseos                                         818 k
 kmod-libs                                                                      aarch64                                         28-9.el9                                                 baseos                                          62 k
 libbpf                                                                         aarch64                                         2:1.3.0-2.el9                                            baseos                                         172 k
 libdb                                                                          aarch64                                         5.3.28-53.el9                                            baseos                                         712 k
 libdnf-plugin-subscription-manager                                             aarch64                                         1.29.40-1.el9                                            baseos                                          63 k
 libeconf                                                                       aarch64                                         0.4.1-4.el9                                              baseos                                          26 k
 libfdisk                                                                       aarch64                                         2.37.4-18.el9                                            baseos                                         150 k
 libmnl                                                                         aarch64                                         1.0.4-16.el9                                             baseos                                          28 k
 libpwquality                                                                   aarch64                                         1.4.4-8.el9                                              baseos                                         119 k
 libseccomp                                                                     aarch64                                         2.5.2-2.el9                                              baseos                                          72 k
 libselinux-utils                                                               aarch64                                         3.6-1.el9                                                baseos                                         190 k
 libuser                                                                        aarch64                                         0.63-13.el9                                              baseos                                         405 k
 libutempter                                                                    aarch64                                         1.2.1-6.el9                                              baseos                                          27 k
 openssl                                                                        aarch64                                         1:3.2.1-1.el9                                            baseos                                         1.3 M
 pam                                                                            aarch64                                         1.5.1-19.el9                                             baseos                                         627 k
 passwd                                                                         aarch64                                         0.80-12.el9                                              baseos                                         121 k
 policycoreutils                                                                aarch64                                         3.6-2.1.el9                                              baseos                                         242 k
 policycoreutils-python-utils                                                   noarch                                          3.6-2.1.el9                                              baseos                                          77 k
 psmisc                                                                         aarch64                                         23.4-3.el9                                               baseos                                         243 k
 python3-audit                                                                  aarch64                                         3.1.2-2.el9                                              baseos                                          83 k
 python3-chardet                                                                noarch                                          4.0.0-5.el9                                              baseos                                         239 k
 python3-cloud-what                                                             aarch64                                         1.29.40-1.el9                                            baseos                                          77 k
 python3-dateutil                                                               noarch                                          1:2.8.1-7.el9                                            baseos                                         288 k
 python3-dbus                                                                   aarch64                                         1.2.18-2.el9                                             baseos                                         144 k
 python3-decorator                                                              noarch                                          4.4.2-6.el9                                              baseos                                          28 k
 python3-distro                                                                 noarch                                          1.5.0-7.el9                                              baseos                                          37 k
 python3-dnf-plugins-core                                                       noarch                                          4.3.0-15.el9                                             baseos                                         264 k
 python3-gobject-base                                                           aarch64                                         3.40.1-6.el9                                             baseos                                         184 k
 python3-gobject-base-noarch                                                    noarch                                          3.40.1-6.el9                                             baseos                                         161 k
 python3-idna                                                                   noarch                                          2.10-7.el9.1                                             baseos                                         102 k
 python3-iniparse                                                               noarch                                          0.4-45.el9                                               baseos                                          47 k
 python3-inotify                                                                noarch                                          0.9.6-25.el9                                             baseos                                          53 k
 python3-librepo                                                                aarch64                                         1.14.5-2.el9                                             baseos                                          48 k
 python3-libselinux                                                             aarch64                                         3.6-1.el9                                                baseos                                         183 k
 python3-libsemanage                                                            aarch64                                         3.6-1.el9                                                baseos                                          79 k
 python3-policycoreutils                                                        noarch                                          3.6-2.1.el9                                              baseos                                         2.1 M
 python3-pysocks                                                                noarch                                          1.7.1-12.el9                                             baseos                                          35 k
 python3-requests                                                               noarch                                          2.25.1-8.el9                                             baseos                                         125 k
 python3-setools                                                                aarch64                                         4.4.4-1.el9                                              baseos                                         595 k
 python3-setuptools                                                             noarch                                          53.0.0-12.el9                                            baseos                                         944 k
 python3-six                                                                    noarch                                          1.15.0-9.el9                                             baseos                                          37 k
 python3-subscription-manager-rhsm                                              aarch64                                         1.29.40-1.el9                                            baseos                                         162 k
 python3-systemd                                                                aarch64                                         234-18.el9                                               baseos                                          89 k
 python3-urllib3                                                                noarch                                          1.26.5-5.el9                                             baseos                                         215 k
 subscription-manager-rhsm-certificates                                         noarch                                          20220623-1.el9                                           baseos                                          21 k
 systemd                                                                        aarch64                                         252-33.el9                                               baseos                                         4.0 M
 systemd-libs                                                                   aarch64                                         252-33.el9                                               baseos                                         641 k
 systemd-pam                                                                    aarch64                                         252-33.el9                                               baseos                                         271 k
 systemd-rpm-macros                                                             noarch                                          252-33.el9                                               baseos                                          69 k
 usermode                                                                       aarch64                                         1.114-4.el9                                              baseos                                         189 k
 util-linux                                                                     aarch64                                         2.37.4-18.el9                                            baseos                                         2.3 M
 util-linux-core                                                                aarch64                                         2.37.4-18.el9                                            baseos                                         463 k
 virt-what                                                                      aarch64                                         1.25-5.el9                                               baseos                                          33 k
 which                                                                          aarch64                                         2.21-29.el9                                              baseos                                          41 k

Transaction Summary
==============================================================================================================================================================================================================================================
Install  66 Packages

Total download size: 26 M
Installed size: 92 M
Is this ok [y/N]: 
 

subscription-manager does bring in quite a few things. We can probably get away with installing just:

systemd util-linux iproute dbus

We may still hit some edge cases where something works in OCP but doesn't in OKD due to a missing package; we have hit at least 6 or 7 containers using tools from util-linux so far.

Description of problem:

A user may provide a DNS domain hosted outside GCP; once custom DNS is enabled, the installer should skip the DNS zone validation:

level=fatal msg="failed to fetch Terraform Variables: failed to generate asset \"Terraform Variables\": failed to get GCP public zone: no matching public DNS Zone found"
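As an illustration of the expected behaviour (hypothetical helper and parameter names, not the installer's actual code), the public zone lookup should simply be skipped when user-provisioned DNS is enabled.

package sketch

// validateBaseDomain illustrates the expected behaviour: when user-provisioned
// DNS is enabled, the Google Cloud DNS public zone lookup is skipped entirely,
// because the base domain may legitimately be managed outside GCP.
func validateBaseDomain(userProvisionedDNSEnabled bool, lookupPublicZone func() error) error {
	if userProvisionedDNSEnabled {
		return nil
	}
	return lookupPublicZone()
}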

    

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2024-02-03-192446
4.16.0-0.nightly-2024-02-03-221256

    

How reproducible:


Always
    

Steps to Reproduce:

1. Enable custom DNS on GCP: platform.gcp.userProvisionedDNS:Enabled and featureSet:TechPreviewNoUpgrade
2. Configure a baseDomain that does not exist on GCP.

    

Actual results:

See description.
    

Expected results:

Installer should skip the validation, as the custom domain may not exist on GCP
    

Additional info:

    

Description of problem:

    Workload hints test cases get stuck when the existing profile is similar to the changes proposed in some of the test cases.
 

Version-Release number of selected component (if applicable):

4.16
    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    When we run opm on RHEL 8, we get the following error:
./opm: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./opm)
./opm: /lib64/libc.so.6: version `GLIBC_2.33' not found (required by ./opm)
./opm: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./opm)

Note: this happened with 4.15.0-ec.3.
I tried 4.14, and it works.
I also tried compiling it from the latest code, and that works as well.

Version-Release number of selected component (if applicable):

    4.15.0-ec.3

How reproducible:

    always

Steps to Reproduce:

[root@preserve-olm-env2 slavecontainer]# curl -s -k -L https://mirror2.openshift.com/pub/openshift-v4/x86_64/clients/ocp-dev-preview/candidate/opm-linux-4.15.0-ec.3.tar.gz -o opm.tar.gz && tar -xzvf opm.tar.gz
opm
[root@preserve-olm-env2 slavecontainer]# ./opm version
./opm: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by ./opm)
./opm: /lib64/libc.so.6: version `GLIBC_2.33' not found (required by ./opm)
./opm: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./opm)
[root@preserve-olm-env2 slavecontainer]# curl -s -l -L https://mirror2.openshift.com/pub/openshift-v4/x86_64/clients/ocp/latest-4.14/opm-linux-4.14.5.tar.gz -o opm.tar.gz && tar -xzvf opm.tar.gz
opm
[root@preserve-olm-env2 slavecontainer]# opm version
Version: version.Version{OpmVersion:"639fc1203", GitCommit:"639fc12035292dec74a16b306226946c8da404a2", BuildDate:"2023-11-21T08:03:15Z", GoOs:"linux", GoArch:"amd64"}
[root@preserve-olm-env2 kuiwang]# cd operator-framework-olm/
[root@preserve-olm-env2 operator-framework-olm]# git branch
  gs
* master
  release-4.10
  release-4.11
  release-4.12
  release-4.13
  release-4.8
  release-4.9
[root@preserve-olm-env2 operator-framework-olm]# git pull origin master
remote: Enumerating objects: 1650, done.
remote: Counting objects: 100% (1650/1650), done.
remote: Compressing objects: 100% (831/831), done.
remote: Total 1650 (delta 727), reused 1617 (delta 711), pack-reused 0
Receiving objects: 100% (1650/1650), 2.03 MiB | 12.81 MiB/s, done.
Resolving deltas: 100% (727/727), completed with 468 local objects.
From github.com:openshift/operator-framework-olm
 * branch master -> FETCH_HEAD
   639fc1203..85c579f9b master -> origin/master
Updating 639fc1203..85c579f9b
Fast-forward
 go.mod | 120 +-
 go.sum | 240 ++--
 manifests/0000_50_olm_00-pprof-secret.yaml
...
 create mode 100644 vendor/google.golang.org/protobuf/types/dynamicpb/types.go
[root@preserve-olm-env2 operator-framework-olm]# rm -fr bin/opm
[root@preserve-olm-env2 operator-framework-olm]# make build/opm
make bin/opm
make[1]: Entering directory '/data/kuiwang/operator-framework-olm'
go build -ldflags "-X 'github.com/operator-framework/operator-registry/cmd/opm/version.gitCommit=85c579f9be61aaea11e90b6c870452c72107300a' -X 'github.com/operator-framework/operator-registry/cmd/opm/version.opmVersion=85c579f9b' -X 'github.com/operator-framework/operator-registry/cmd/opm/version.buildDate=2023-12-11T06:12:50Z'" -mod=vendor -tags "json1" -o bin/opm github.com/operator-framework/operator-registry/cmd/opm
make[1]: Leaving directory '/data/kuiwang/operator-framework-olm'
[root@preserve-olm-env2 operator-framework-olm]# which opm
/data/kuiwang/operator-framework-olm/bin/opm
[root@preserve-olm-env2 operator-framework-olm]# opm version
Version: version.Version{OpmVersion:"85c579f9b", GitCommit:"85c579f9be61aaea11e90b6c870452c72107300a", BuildDate:"2023-12-11T06:12:50Z", GoOs:"linux", GoArch:"amd64"}

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    Kube apiserver pod keeps crashing when tested against v1.29 rebase

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    always

Steps to Reproduce:

    1. Run hypershift e2e against the v1.29 rebase
    2.
    3.
    

Actual results:

    Fails https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/openshift-kubernetes-1810-periodics-e2e-aws-ovn/1732734494688940032

Expected results:

    Succeeds

Additional info:

    Kube apiserver pod is crashlooping with:
E1208 21:17:06.619997 1 run.go:74] "command failed" err="group version flowcontrol.apiserver.k8s.io/v1alpha1 that has not been registered"

Description of problem:

Cluster install fails on IBMCloud, nodes tainted with node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule

Version-Release number of selected component (if applicable):

from 4.16.0-0.nightly-2023-12-22-210021

last PASS version: 4.16.0-0.nightly-2023-12-20-061023

How reproducible:

Always 

Steps to Reproduce:

    1. Install a cluster on IBMCloud, we use auto flexy template: aos-4_16/ipi-on-ibmcloud/versioned-installer

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          92m     Unable to apply 4.16.0-0.nightly-2023-12-25-200355: an unknown error has occurred: MultipleErrors
liuhuali@Lius-MacBook-Pro huali-test % oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                                                                                                               
baremetal                                                                                                                    
cloud-controller-manager                   4.16.0-0.nightly-2023-12-25-200355   True        False         False      89m     
cloud-credential                                                                                                             
cluster-autoscaler                                                                                                           
config-operator                                                                                                              
console                                                                                                                      
control-plane-machine-set                                                                                                    
csi-snapshot-controller                                                                                                      
dns                                                                                                                          
etcd                                                                                                                         
image-registry                                                                                                               
ingress                                                                                                                      
insights                                                                                                                     
kube-apiserver                                                                                                               
kube-controller-manager                                                                                                      
kube-scheduler                                                                                                               
kube-storage-version-migrator                                                                                                
machine-api                                                                                                                  
machine-approver                                                                                                             
machine-config                                                                                                               
marketplace                                                                                                                  
monitoring                                                                                                                   
network                                                                                                                      
node-tuning                                                                                                                  
openshift-apiserver                                                                                                          
openshift-controller-manager                                                                                                 
openshift-samples                                                                                                            
operator-lifecycle-manager                                                                                                   
operator-lifecycle-manager-catalog                                                                                           
operator-lifecycle-manager-packageserver                                                                                     
service-ca                                                                                                                   
storage                                                                                                                       
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                        STATUS     ROLES                  AGE   VERSION
huliu-ibma-qbg48-master-0   NotReady   control-plane,master   89m   v1.29.0+b0d609f
huliu-ibma-qbg48-master-1   NotReady   control-plane,master   89m   v1.29.0+b0d609f
huliu-ibma-qbg48-master-2   NotReady   control-plane,master   89m   v1.29.0+b0d609f
liuhuali@Lius-MacBook-Pro huali-test % oc describe node huliu-ibma-qbg48-master-0
Name:               huliu-ibma-qbg48-master-0
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=huliu-ibma-qbg48-master-0
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node-role.kubernetes.io/master=
                    node.openshift.io/os_id=rhcos
Annotations:        volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 27 Dec 2023 18:02:21 +0800
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  huliu-ibma-qbg48-master-0
  AcquireTime:     <unset>
  RenewTime:       Wed, 27 Dec 2023 19:32:24 +0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 27 Dec 2023 19:32:21 +0800   Wed, 27 Dec 2023 18:02:21 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 27 Dec 2023 19:32:21 +0800   Wed, 27 Dec 2023 18:02:21 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 27 Dec 2023 19:32:21 +0800   Wed, 27 Dec 2023 18:02:21 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Wed, 27 Dec 2023 19:32:21 +0800   Wed, 27 Dec 2023 18:02:21 +0800   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
Addresses:
Capacity:
  cpu:                4
  ephemeral-storage:  104266732Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16391716Ki
  pods:               250
Allocatable:
  cpu:                3500m
  ephemeral-storage:  95018478229
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             15240740Ki
  pods:               250
System Info:
  Machine ID:                 0ae21a012be844f18c5871f6eaefb85b
  System UUID:                0ae21a01-2be8-44f1-8c58-71f6eaefb85b
  Boot ID:                    fbe619e2-8ff5-4cdb-b6a4-cd6830ccc568
  Kernel Version:             5.14.0-284.45.1.el9_2.x86_64
  OS Image:                   Red Hat Enterprise Linux CoreOS 416.92.202312250319-0 (Plow)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.28.2-9.rhaos4.15.git6d902a3.el9
  Kubelet Version:            v1.29.0+b0d609f
  Kube-Proxy Version:         v1.29.0+b0d609f
Non-terminated Pods:          (0 in total)
  Namespace                   Name    CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----    ------------  ----------  ---------------  -------------  ---
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests  Limits
  --------           --------  ------
  cpu                0 (0%)    0 (0%)
  memory             0 (0%)    0 (0%)
  ephemeral-storage  0 (0%)    0 (0%)
  hugepages-1Gi      0 (0%)    0 (0%)
  hugepages-2Mi      0 (0%)    0 (0%)
Events:
  Type    Reason                   Age                From             Message
  ----    ------                   ----               ----             -------
  Normal  NodeHasNoDiskPressure    90m (x7 over 90m)  kubelet          Node huliu-ibma-qbg48-master-0 status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     90m (x7 over 90m)  kubelet          Node huliu-ibma-qbg48-master-0 status is now: NodeHasSufficientPID
  Normal  NodeHasSufficientMemory  90m (x7 over 90m)  kubelet          Node huliu-ibma-qbg48-master-0 status is now: NodeHasSufficientMemory
  Normal  RegisteredNode           90m                node-controller  Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
  Normal  RegisteredNode           73m                node-controller  Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
  Normal  RegisteredNode           53m                node-controller  Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
  Normal  RegisteredNode           32m                node-controller  Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller
  Normal  RegisteredNode           12m                node-controller  Node huliu-ibma-qbg48-master-0 event: Registered Node huliu-ibma-qbg48-master-0 in Controller 
liuhuali@Lius-MacBook-Pro huali-test % oc get pod -n openshift-cloud-controller-manager
NAME                                            READY   STATUS             RESTARTS         AGE
ibm-cloud-controller-manager-787645668b-djqnr   0/1     CrashLoopBackOff   22 (2m29s ago)   90m
ibm-cloud-controller-manager-787645668b-pgkh2   0/1     Error              15 (5m8s ago)    52m
liuhuali@Lius-MacBook-Pro huali-test % oc describe pod ibm-cloud-controller-manager-787645668b-pgkh2 -n openshift-cloud-controller-manager
Name:                 ibm-cloud-controller-manager-787645668b-pgkh2
Namespace:            openshift-cloud-controller-manager
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 huliu-ibma-qbg48-master-2/
Start Time:           Wed, 27 Dec 2023 18:41:23 +0800
Labels:               infrastructure.openshift.io/cloud-controller-manager=IBMCloud
                      k8s-app=ibm-cloud-controller-manager
                      pod-template-hash=787645668b
Annotations:          operator.openshift.io/config-hash: 82a75c6ff86a490b0dac9c8c9b91f1987da0e646a42d72c33c54cbde3c29395b
Status:               Running
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/ibm-cloud-controller-manager-787645668b
Containers:
  cloud-controller-manager:
    Container ID:  cri-o://c56e246f64c770146c30b7a894f6a4d974159551dbb9d1ea31c238e516a0f854
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:76aedf175591ff1675c891e5c80d02ee7425a6b3a98c34427765f402ca050218
    Image ID:      e494d0d4b28e31170a4a2792bb90701c7f1e81c78c03e3686c5f0e601801937e
    Port:          10258/TCP
    Host Port:     10258/TCP
    Command:
      /bin/bash
      -c
      #!/bin/bash
      set -o allexport
      if [[ -f /etc/kubernetes/apiserver-url.env ]]; then
        source /etc/kubernetes/apiserver-url.env
      fi
      exec /bin/ibm-cloud-controller-manager \
      --bind-address=$(POD_IP_ADDRESS) \
      --use-service-account-credentials=true \
      --configure-cloud-routes=false \
      --cloud-provider=ibm \
      --cloud-config=/etc/ibm/cloud.conf \
      --profiling=false \
      --leader-elect=true \
      --leader-elect-lease-duration=137s \
      --leader-elect-renew-deadline=107s \
      --leader-elect-retry-period=26s \
      --leader-elect-resource-namespace=openshift-cloud-controller-manager \
      --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_AES_128_GCM_SHA256,TLS_CHACHA20_POLY1305_SHA256,TLS_AES_256_GCM_SHA384 \
      --v=2
      
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 27 Dec 2023 19:33:23 +0800
      Finished:     Wed, 27 Dec 2023 19:33:23 +0800
    Ready:          False
    Restart Count:  15
    Requests:
      cpu:     75m
      memory:  60Mi
    Liveness:  http-get https://:10258/healthz delay=300s timeout=160s period=10s #success=1 #failure=3
    Environment:
      POD_IP_ADDRESS:           (v1:status.podIP)
      VPCCTL_CLOUD_CONFIG:     /etc/ibm/cloud.conf
      VPCCTL_PUBLIC_ENDPOINT:  false
    Mounts:
      /etc/ibm from cloud-conf (rw)
      /etc/kubernetes from host-etc-kube (ro)
      /etc/pki/ca-trust/extracted/pem from trusted-ca (ro)
      /etc/vpc from ibm-cloud-credentials (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cbd4b (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  trusted-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ccm-trusted-ca
    Optional:  false
  host-etc-kube:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes
    HostPathType:  Directory
  cloud-conf:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cloud-conf
    Optional:  false
  ibm-cloud-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ibm-cloud-credentials
    Optional:    false
  kube-api-access-cbd4b:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              node-role.kubernetes.io/master=
Tolerations:                 node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.cloudprovider.kubernetes.io/uninitialized:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                             node.kubernetes.io/not-ready:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  52m                    default-scheduler  Successfully assigned openshift-cloud-controller-manager/ibm-cloud-controller-manager-787645668b-pgkh2 to huliu-ibma-qbg48-master-2
  Normal   Pulling    52m                    kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:76aedf175591ff1675c891e5c80d02ee7425a6b3a98c34427765f402ca050218"
  Normal   Pulled     52m                    kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:76aedf175591ff1675c891e5c80d02ee7425a6b3a98c34427765f402ca050218" in 3.431s (3.431s including waiting)
  Normal   Created    50m (x5 over 52m)      kubelet            Created container cloud-controller-manager
  Normal   Started    50m (x5 over 52m)      kubelet            Started container cloud-controller-manager
  Normal   Pulled     50m (x4 over 52m)      kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:76aedf175591ff1675c891e5c80d02ee7425a6b3a98c34427765f402ca050218" already present on machine
  Warning  BackOff    2m19s (x240 over 52m)  kubelet            Back-off restarting failed container cloud-controller-manager in pod ibm-cloud-controller-manager-787645668b-pgkh2_openshift-cloud-controller-manager(d7f93ecf-cd14-450e-a986-028559a775b3)
liuhuali@Lius-MacBook-Pro huali-test % 

Actual results:

    cluster install failed on IBMCloud

Expected results:

    cluster install succeeds on IBMCloud

Additional info:

    

Description of problem:

    Panic thrown by origin-tests

Version-Release number of selected component (if applicable):

    

How reproducible:

    always

Steps to Reproduce:

    1. Create aws or rosa 4.15 cluster
    2. run origin tests
    3.
    

Actual results:

    time="2024-03-07T17:03:50Z" level=info msg="resulting interval message" message="{RegisteredNode  Node ip-10-0-8-83.ec2.internal event: Registered Node ip-10-0-8-83.ec2.internal in Controller map[reason:RegisteredNode roles:worker]}"
  E0307 17:03:50.319617      71 runtime.go:79] Observed a panic: runtime.boundsError{x:24, y:23, signed:true, code:0x3} (runtime error: slice bounds out of range [24:23])
  goroutine 310 [running]:
  k8s.io/apimachinery/pkg/util/runtime.logPanic({0x84c6f20?, 0xc006fdc588})
  	k8s.io/apimachinery@v0.29.0/pkg/util/runtime/runtime.go:75 +0x99
  k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc008c38120?})
  	k8s.io/apimachinery@v0.29.0/pkg/util/runtime/runtime.go:49 +0x75
  panic({0x84c6f20, 0xc006fdc588})
  	runtime/panic.go:884 +0x213
  github.com/openshift/origin/pkg/monitortests/testframework/watchevents.nodeRoles(0x0?)
  	github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:251 +0x1e5
  github.com/openshift/origin/pkg/monitortests/testframework/watchevents.recordAddOrUpdateEvent({0x96bcc00, 0xc0076e3310}, {0x7f2a0e47a1b8, 0xc007732330}, {0x281d36d?, 0x0?}, {0x9710b50, 0xc000c5e000}, {0x9777af, 0xedd7be6b7, ...}, ...)
  	github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:116 +0x41b
  github.com/openshift/origin/pkg/monitortests/testframework/watchevents.startEventMonitoring.func2({0x8928f00?, 0xc00b528c80})
  	github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:65 +0x185
  k8s.io/client-go/tools/cache.(*FakeCustomStore).Add(0x8928f00?, {0x8928f00?, 0xc00b528c80?})
  	k8s.io/client-go@v0.29.0/tools/cache/fake_custom_store.go:35 +0x31
  k8s.io/client-go/tools/cache.watchHandler({0x0?, 0x0?, 0xe16d020?}, {0x9694a10, 0xc006b00180}, {0x96d2780, 0xc0078afe00}, {0x96f9e28?, 0x8928f00}, 0x0, ...)
  	k8s.io/client-go@v0.29.0/tools/cache/reflector.go:756 +0x603
  k8s.io/client-go/tools/cache.(*Reflector).watch(0xc0005dcc40, {0x0?, 0x0?}, 0xc005cdeea0, 0xc005bf8c40?)
  	k8s.io/client-go@v0.29.0/tools/cache/reflector.go:437 +0x53b
  k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc0005dcc40, 0xc005cdeea0)
  	k8s.io/client-go@v0.29.0/tools/cache/reflector.go:357 +0x453
  k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
  	k8s.io/client-go@v0.29.0/tools/cache/reflector.go:291 +0x26
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x10?)
  	k8s.io/apimachinery@v0.29.0/pkg/util/wait/backoff.go:226 +0x3e
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc007974ec0?, {0x9683f80, 0xc0078afe50}, 0x1, 0xc005cdeea0)
  	k8s.io/apimachinery@v0.29.0/pkg/util/wait/backoff.go:227 +0xb6
  k8s.io/client-go/tools/cache.(*Reflector).Run(0xc0005dcc40, 0xc005cdeea0)
  	k8s.io/client-go@v0.29.0/tools/cache/reflector.go:290 +0x17d
  created by github.com/openshift/origin/pkg/monitortests/testframework/watchevents.startEventMonitoring
  	github.com/openshift/origin/pkg/monitortests/testframework/watchevents/event.go:83 +0x6a5
panic: runtime error: slice bounds out of range [24:23] [recovered]
	panic: runtime error: slice bounds out of range [24:23]

Expected results:

    execution of tests

Additional info:

    

Description of problem:

    When deploying to Power VS with endpoint overrides set in the provider status, the operator will ignore the overrides.

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Easily

Steps to Reproduce:

    1. Set overrides in platform status
    2. Deploy cluster-image-registry-operator
    3. Observe that the specified endpoints are ignored
    

Actual results:

    Specified endpoints are ignored

Expected results:

    Specified endpoints are used

Additional info:

    

User Story:

As a (user persona), I want to be able to:

  • use CAPI/CAPA as a terraform-free cluster provisioning path
  •  

so that I can achieve

  • Outcome 1
  • Outcome 2
  • Outcome 3

Acceptance Criteria:

Description of criteria:

  • SDK provisioning code is removed
  • `InstallAlternateInfrastructureAWS` feature gate triggers a capi install.

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Description of problem:

CNV upgrades from v4.14.1 to v4.15.0 (unreleased) are not starting due to an out-of-sync operatorCondition.

We see:

$ oc get csv
NAME                                       DISPLAY                    VERSION               REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.14.1   OpenShift Virtualization   4.14.1                kubevirt-hyperconverged-operator.v4.14.0   Replacing
kubevirt-hyperconverged-operator.v4.15.0   OpenShift Virtualization   4.15.0                kubevirt-hyperconverged-operator.v4.14.1   Pending

And on the v4.15.0 CSV:

$ oc get csv kubevirt-hyperconverged-operator.v4.15.0 -o yaml
....
status:
  cleanup: {}
  conditions:
  - lastTransitionTime: "2023-12-19T01:50:48Z"
    lastUpdateTime: "2023-12-19T01:50:48Z"
    message: requirements not yet checked
    phase: Pending
    reason: RequirementsUnknown
  - lastTransitionTime: "2023-12-19T01:50:48Z"
    lastUpdateTime: "2023-12-19T01:50:48Z"
    message: 'operator is not upgradeable: the operatorcondition status "Upgradeable"="True"
      is outdated'
    phase: Pending
    reason: OperatorConditionNotUpgradeable
  lastTransitionTime: "2023-12-19T01:50:48Z"
  lastUpdateTime: "2023-12-19T01:50:48Z"
  message: 'operator is not upgradeable: the operatorcondition status "Upgradeable"="True"
    is outdated'
  phase: Pending
  reason: OperatorConditionNotUpgradeable

and if we check the pending operator condition (v4.14.1) we see:

$ oc get operatorcondition kubevirt-hyperconverged-operator.v4.14.1 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2023-12-16T17:10:17Z"
  generation: 18
  labels:
    operators.coreos.com/kubevirt-hyperconverged.openshift-cnv: ""
  name: kubevirt-hyperconverged-operator.v4.14.1
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: kubevirt-hyperconverged-operator.v4.14.1
    uid: 7db79d4b-e69e-4af8-9335-6269cf004440
  resourceVersion: "4116127"
  uid: 347306c9-865a-42b8-b2c9-69192b0e350a
spec:
  conditions:
  - lastTransitionTime: "2023-12-18T18:47:23Z"
    message: ""
    reason: Upgradeable
    status: "True"
    type: Upgradeable
  deployments:
  - hco-operator
  - hco-webhook
  - hyperconverged-cluster-cli-download
  - cluster-network-addons-operator
  - virt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  serviceAccounts:
  - hyperconverged-cluster-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
status:
  conditions:
  - lastTransitionTime: "2023-12-18T09:41:06Z"
    message: ""
    observedGeneration: 11
    reason: Upgradeable
    status: "True"
    type: Upgradeable

where metadata.generation (18) is not in sync with status.conditions[*].observedGeneration (11).

Even manually editing spec.conditions[*].lastTransitionTime causes a change in metadata.generation (as expected), but this does not trigger any reconciliation in OLM, so status.conditions[*].observedGeneration remains at 11.

$ oc get operatorcondition kubevirt-hyperconverged-operator.v4.14.1 -o yaml
apiVersion: operators.coreos.com/v2
kind: OperatorCondition
metadata:
  creationTimestamp: "2023-12-16T17:10:17Z"
  generation: 19
  labels:
    operators.coreos.com/kubevirt-hyperconverged.openshift-cnv: ""
  name: kubevirt-hyperconverged-operator.v4.14.1
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: ClusterServiceVersion
    name: kubevirt-hyperconverged-operator.v4.14.1
    uid: 7db79d4b-e69e-4af8-9335-6269cf004440
  resourceVersion: "4147472"
  uid: 347306c9-865a-42b8-b2c9-69192b0e350a
spec:
  conditions:
  - lastTransitionTime: "2023-12-18T18:47:25Z"
    message: ""
    reason: Upgradeable
    status: "True"
    type: Upgradeable
  deployments:
  - hco-operator
  - hco-webhook
  - hyperconverged-cluster-cli-download
  - cluster-network-addons-operator
  - virt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  serviceAccounts:
  - hyperconverged-cluster-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
  - cluster-network-addons-operator
  - kubevirt-operator
  - ssp-operator
  - cdi-operator
  - hostpath-provisioner-operator
  - mtq-operator
status:
  conditions:
  - lastTransitionTime: "2023-12-18T09:41:06Z"
    message: ""
    observedGeneration: 11
    reason: Upgradeable
    status: "True"
    type: Upgradeable

since its observedGeneration is out of sync, this check:
https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/olm/operatorconditions.go#L44C1-L48

fails and the upgrade never starts.
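
A quick way to confirm this state on an affected cluster (a hedged sketch reusing the object names from this report) is to compare the current generation with the one OLM last observed:

$ oc get operatorcondition kubevirt-hyperconverged-operator.v4.14.1 -n openshift-cnv \
    -o jsonpath='{.metadata.generation}{" "}{.status.conditions[?(@.type=="Upgradeable")].observedGeneration}{"\n"}'

If the two numbers differ, OLM has not reconciled the latest spec.conditions and the CSV stays Pending with reason OperatorConditionNotUpgradeable.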

I suspect (I'm only guessing) that it could be a regression introduced with the memory optimization for https://issues.redhat.com/browse/OCPBUGS-17157 .

Version-Release number of selected component (if applicable):

    OCP 4.15.0-ec.3

How reproducible:

- Not reproducible (with the same CNV bundles) on OCP v4.14.z.
- Pretty high (but not 100%) on OCP 4.15.0-ec.3

 

 

Steps to Reproduce:

    1. Try triggering a CNV v4.14.1 -> v4.15.0 on OCP 4.15.0-ec.3
    2.
    3.
    

Actual results:

    OLM is not reacting to changes to spec.conditions on the pending OperatorCondition, so metadata.generation stays out of sync with status.conditions[*].observedGeneration and the CSV is reported as:

    message: 'operator is not upgradeable: the operatorcondition status "Upgradeable"="True"
      is outdated'
    phase: Pending
    reason: OperatorConditionNotUpgradeable

Expected results:

    OLM correctly reconciles the OperatorCondition and the upgrade starts

Additional info:

    Not reproducible with exactly the same bundle (origin and target) on OCP v4.14.z

Description of problem:

When the replica for a nodepool is set to 0, the message for the nodepool is "NotFound". This message should not be displayed if the desired replica is 0.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Create a nodepool and set the replica to 0

Steps to Reproduce:

    1. Create a hosted cluster
    2. Set the replica for the nodepool to 0
    3.
    

Actual results:

NodePool message is "NotFound"    

Expected results:

NodePool message to be empty    

Additional info:

    

This is a clone of issue OCPBUGS-34020. The following is the description of the original issue:

Description of problem:

Mirroring sometimes fails for various reasons, and when it does, the current code does not generate the IDMS and ITMS files. Even if the user retries the mirror two or three times, the operators do not get mirrored and no resources are created to make use of the operators that were already mirrored. This bug is to generate IDMS and ITMS files even if mirroring fails.
    

Version-Release number of selected component (if applicable):

     4.16
    

How reproducible:

     Always
    

Steps to Reproduce:

    1. Install latest oc-mirror
    2. Use the ImageSetConfig.yaml below
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
archiveSize: 4
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/certified-operator-index:v4.15
    full: false # only mirror the latest versions
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.15
    full: false # only mirror the latest versions
    3. Mirror using the command `oc-mirror -c config.yaml docker://localhost:5000/m2m --dest-skip-verify=false --workspace=file://test`
    

Actual results:

     Mirroring fails and does not generate any idms or itms files
    

Expected results:

     IDMS and ITMS files should be generated for the mirrored operators, even if mirroring fails
    

Additional info:


    

Unknown machine config nodes can be listed whose names do not correspond to any node in the current cluster; in my cluster there are 6 nodes, but I can see 10 machine config nodes.

// current node
$ oc get node
NAME                                        STATUS   ROLES                  AGE     VERSION
ip-10-0-12-209.us-east-2.compute.internal   Ready    worker                 3h48m   v1.28.3+59b90bd
ip-10-0-23-177.us-east-2.compute.internal   Ready    control-plane,master   3h54m   v1.28.3+59b90bd
ip-10-0-32-216.us-east-2.compute.internal   Ready    control-plane,master   3h54m   v1.28.3+59b90bd
ip-10-0-42-207.us-east-2.compute.internal   Ready    worker                 53m     v1.28.3+59b90bd
ip-10-0-71-71.us-east-2.compute.internal    Ready    worker                 3h46m   v1.28.3+59b90bd
ip-10-0-81-190.us-east-2.compute.internal   Ready    control-plane,master   3h54m   v1.28.3+59b90bd

// current mcn
$ oc get machineconfignode
NAME                                        UPDATED   UPDATEPREPARED   UPDATEEXECUTED   UPDATEPOSTACTIONCOMPLETE   UPDATECOMPLETE   RESUMED
ip-10-0-12-209.us-east-2.compute.internal   True      False            False            False                      False            False
ip-10-0-23-177.us-east-2.compute.internal   True      False            False            False                      False            False
ip-10-0-32-216.us-east-2.compute.internal   True      False            False            False                      False            False
ip-10-0-42-207.us-east-2.compute.internal   True      False            False            False                      False            False
ip-10-0-53-5.us-east-2.compute.internal     True      False            False            False                      False            False
ip-10-0-56-84.us-east-2.compute.internal    True      False            False            False                      False            False
ip-10-0-58-210.us-east-2.compute.internal   True      False            False            False                      False            False
ip-10-0-58-99.us-east-2.compute.internal    False     True             True             Unknown                    False            False
ip-10-0-71-71.us-east-2.compute.internal    True      False            False            False                      False            False
ip-10-0-81-190.us-east-2.compute.internal   True      False            False            False                      False            False

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-04-162702

How reproducible:

Consistently

Steps to Reproduce:

1. Set up a cluster with 4.15.0-0.nightly-2023-12-04-162702 on AWS
2. Enable featureSet: TechPreviewNoUpgrade
3. Apply a file-based MachineConfig a few times
4. Check the node list
5. Check the machine config node list
     
    

Actual results:

Some unknown machine config nodes are found.

Expected results:

The number of machine config nodes should match the number of cluster nodes.

Additional info:

must-gather: https://drive.google.com/file/d/1-VTismwXXZ9sYMHi8hDL7vhwzjuMn92n/view?usp=drive_link    
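
A hedged helper to spot the stale entries (assumes bash and a logged-in oc client); it prints MachineConfigNode names that have no matching Node:

# names only in the second list are MachineConfigNode objects without a backing node
comm -13 <(oc get nodes -o name | cut -d/ -f2 | sort) \
         <(oc get machineconfignodes -o name | cut -d/ -f2 | sort)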

Description of problem:

A CAPI machine cannot be deleted by the installer during cluster destroy. Checking the GCP console shows that this machine lacks the label (kubernetes-io-cluster-clusterid: owned); if this label is added manually in the GCP console, the machine is deleted by the installer during cluster destroy.

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2022-10-05-053337

How reproducible:

Always

Steps to Reproduce:

1.Follow the steps here https://bugzilla.redhat.com/show_bug.cgi?id=2107999#c9 to create a capi machine

liuhuali@Lius-MacBook-Pro huali-test % oc get machine.cluster.x-k8s.io      -n openshift-cluster-api
NAME             CLUSTER            NODENAME   PROVIDERID                                                         PHASE         AGE   VERSION
capi-ms-mtchm    huliu-gcpx-c55vm              gce://openshift-qe/us-central1-a/capi-gcp-machine-template-gcw9t   Provisioned   51m   

2.Destroy the cluster
The cluster was destroyed successfully, but checking the GCP console shows that the CAPI machine is still there.

labels of capi machine

labels of mapi machine

Actual results:

capi machine cannot be deleted by installer during cluster destroy

Expected results:

capi machine should be deleted by installer during cluster destroy

Additional info:

Also checked on AWS: the case worked well there, and the CAPI machines carry the tag (kubernetes.io/cluster/clusterid: owned).
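
A hedged workaround sketch (not the fix) using the machine and zone from this report: add the missing ownership label with gcloud, then re-run destroy. CLUSTER_ID here stands for the infrastructure ID the destroy code filters on and is an assumption.

CLUSTER_ID=huliu-gcpx-c55vm   # assumption: replace with the real infra ID
gcloud compute instances update capi-gcp-machine-template-gcw9t \
    --project openshift-qe --zone us-central1-a \
    --update-labels "kubernetes-io-cluster-${CLUSTER_ID}=owned"

After that, re-running the installer's destroy should pick the instance up, matching what was observed when the label was added manually in the console.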

Because the installer generates some of the keys that will remain present in the cluster (e.g. the signing key for the admin kubeconfig), it should also run in an environment where FIPS is enabled.

Because it is very easy to fail to notice that the keys were generated in a non-FIPS-certified environment, we should enforce this by checking that fips_enabled is true if the target cluster is to have FIPS enabled.

Colin Walters has a patch for this.
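
A minimal sketch of such a guard, assuming a Linux build host and that the installer can read the kernel's FIPS flag (this is not the actual patch):

# abort early if the target cluster wants FIPS but the build host is not in FIPS mode
if [ "$(cat /proc/sys/crypto/fips_enabled 2>/dev/null)" != "1" ]; then
    echo "install-config sets fips: true, but this host is not running in FIPS mode" >&2
    exit 1
fi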

Description of problem:

    We see failures in this test:

[Jira:"Networking / router"] monitor test service-type-load-balancer-availability setup expand_less 15m1s{ failed during setup error waiting for load balancer: timed out waiting for service "service-test" to have a load balancer: timed out waiting for the condition}

See this https://search.ci.openshift.org/?search=error+waiting+for+load+balancer&maxAge=168h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job to find recent ones.

example job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade/1754402739040817152

this has failed payloads like:

https://amd64.ocp.releases.ci.openshift.org/releasestream/4.16.0-0.ci/release/4.16.0-0.ci-2024-02-01-211543
https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.ci/release/4.15.0-0.ci-2024-02-02-061913
https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.ci/release/4.15.0-0.ci-2024-02-02-001913

Version-Release number of selected component (if applicable):

    4.15 and 4.16

How reproducible:

    intermittent as shown in the search.ci query above

Steps to Reproduce:

    1. run the e2e tests on 4.15 and 4.16
    2.
    3.
    

Actual results:

    timeouts on getting load balancer

Expected results:

    no timeout and successful load balancer

Additional info:

    https://issues.redhat.com/browse/TRT-1486 has more info 
thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1707142256956139

Description of problem:

    When deploying to a Power VS workspace created after February 14th 2024, it will not be found by the installer.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Easily.

Steps to Reproduce:

    1. Create a Power VS Workspace
    2. Specify it in the install config
    3. Attempt to deploy
    4. Fail with "...is not a valid guid" error.
    

Actual results:

    Failure to deploy to service instance

Expected results:

    Should deploy to service instance

Additional info:

    

Description of problem:

    When building the agent ISO with the debug log level enabled, a number of FAT32 error messages are logged. They do not hamper the ISO creation, but they make the log output very noisy and harder to read (and moreover they are not necessary).

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Create an agent ISO (the configuration does not matter) with the debug log level

Steps to Reproduce:

    1.$ openshift-install agent create image --log-level=debug

Actual results:

    The output contains several traces like the following:
...
level=debug msg=trying fat32
level=debug msg=fat32 failed: error reading MS-DOS Boot Sector: could not read FAT32 BIOS Parameter Block from boot sector: could not read embedded DOS 3.31 BPB: error reading embedded DOS 2.0 BPB: invalid sector size 37008 provided in DOS 2.0 BPB. Must be 512
level=debug msg=trying iso9660 with physical block size 0
...

Expected results:

 The above traces are not shown

Additional info:

    

Description of problem:

    The audit-logs container for the kas, oapi, and oauth apiservers does not terminate within the `TerminationGracePeriodSeconds` timer, because the container does not exit when a `SIGTERM` signal is issued. When testing without the audit-logs container, the oapi and oauth apiservers terminate gracefully within a 90-110 second range. The kas still does not terminate with that container gone, and I have a hunch that the konnectivity container also ignores `SIGTERM` (I've waited 10 minutes and it still does not stop). So this issue is to change the audit-logs logic to terminate gracefully and to increase TerminationGracePeriodSeconds from the default of 30s to 120s.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

Installation failed on 4.16 nightly build when waiting for install-complete. API is unavailable.

level=info msg=Waiting up to 20m0s (until 5:00AM UTC) for the Kubernetes API at https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443...
level=info msg=API v1.29.2+a0beecc up
level=info msg=Waiting up to 30m0s (until 5:11AM UTC) for bootstrapping to complete...
api available
waiting for bootstrap to complete
level=info msg=Waiting up to 20m0s (until 5:01AM UTC) for the Kubernetes API at https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443...
level=info msg=API v1.29.2+a0beecc up
level=info msg=Waiting up to 30m0s (until 5:11AM UTC) for bootstrapping to complete...
level=info msg=It is now safe to remove the bootstrap resources
level=info msg=Time elapsed: 15m54s
Copying kubeconfig to shared dir as kubeconfig-minimal
level=info msg=Destroying the bootstrap resources... 
level=info msg=Waiting up to 40m0s (until 5:39AM UTC) for the cluster at https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443 to initialize...
W0313 04:59:34.272442     229 reflector.go:539] k8s.io/client-go/tools/watch/informerwatcher.go:146: failed to list *v1.ClusterVersion: Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 172.212.184.131:6443: i/o timeout
I0313 04:59:34.272658     229 trace.go:236] Trace[533197684]: "Reflector ListAndWatch" name:k8s.io/client-go/tools/watch/informerwatcher.go:146 (13-Mar-2024 04:59:04.271) (total time: 30000ms):
Trace[533197684]: ---"Objects listed" error:Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 172.212.184.131:6443: i/o timeout 30000ms (04:59:34.272)
...
E0313 05:38:18.669780     229 reflector.go:147] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 172.212.184.131:6443: i/o timeout
level=error msg=Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get "https://api.ci-op-4sgxj8jx-8482f.qe.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp 172.212.184.131:6443: i/o timeout
level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
level=error msg=failed to initialize the cluster: timed out waiting for the condition 

On the master node, it seems that kube-apiserver is not running:
[root@ci-op-4sgxj8jx-8482f-hppxj-master-0 ~]# crictl ps | grep apiserver
e4b6cc9622b01       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         7 minutes ago        Running             kube-apiserver-cert-syncer                    22                  3ff4af6614409       kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
1249824fe5788       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago          Running             kube-apiserver-insecure-readyz                0                   3ff4af6614409       kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
ca774b07284f0       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago          Running             kube-apiserver-cert-regeneration-controller   0                   3ff4af6614409       kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0
2931b9a2bbabd       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago          Running             openshift-apiserver-check-endpoints           0                   4136bf2183de1       apiserver-7df5bb879-xx74p
0c9534aec3b6b       8c9042f97c89d8c8519d6e6235bef5a5346f08e6d7d9864ef0f228b318b4c3de                                                         4 hours ago          Running             openshift-apiserver                           0                   4136bf2183de1       apiserver-7df5bb879-xx74p
db21a2dd1df33       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago          Running             guard                                         0                   199e1f4e665b9       kube-apiserver-guard-ci-op-4sgxj8jx-8482f-hppxj-master-0
429110f9ea5a3       6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5                                                         4 hours ago          Running             apiserver-watcher                             0                   7664f480df29d       apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-0

[root@ci-op-4sgxj8jx-8482f-hppxj-master-1 ~]# crictl ps | grep apiserver
c64187e7adcc6       ec5ccd782eb003136d9cc1df51a2b20f8a2a489d72ffb894b92f50e363c7cb90                                                         4 hours ago         Running             openshift-apiserver-check-endpoints           0                   1a4a5b247c28a       apiserver-7df5bb879-f6v5x
ff98c52402288       8c9042f97c89d8c8519d6e6235bef5a5346f08e6d7d9864ef0f228b318b4c3de                                                         4 hours ago         Running             openshift-apiserver                           0                   1a4a5b247c28a       apiserver-7df5bb879-f6v5x
2f8a97f959409       faa1b95089d101cdc907d7affe310bbff5a9aa8f92c725dc6466afc37e731927                                                         4 hours ago         Running             oauth-apiserver                               0                   ffa2c316a0cca       apiserver-97fbc599c-2ftl7
72897e30e0df0       6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5                                                         4 hours ago         Running             apiserver-watcher                             0                   3b6c3849ce91f       apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-1

[root@ci-op-4sgxj8jx-8482f-hppxj-master-2 ~]# crictl ps | grep apiserver
04c426f07573d       faa1b95089d101cdc907d7affe310bbff5a9aa8f92c725dc6466afc37e731927                                                         4 hours ago         Running             oauth-apiserver                      0                   2172a64fb1a38       apiserver-654dcb4cc6-tq8fj
4dcca5c0e9b99       6a03f3f082f3719e79087d569b3cd1e718fb670d1261fbec9504662f1005b1a5                                                         4 hours ago         Running             apiserver-watcher                    0                   1cd99ec327199       apiserver-watcher-ci-op-4sgxj8jx-8482f-hppxj-master-2


And the following error was found in the kubelet log:
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: E0313 06:10:15.004656   23961 kuberuntime_manager.go:1262] container &Container{Name:kube-apiserver,Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:789f242b8bc721b697e265c6f9d025f45e56e990bfd32e331c633fe0b9f076bc,Command:[/bin/bash -ec],Args:[LOCK=/var/log/kube-apiserver/.lock
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: # We should be able to acquire the lock immediatelly. If not, it means the init container has not released it yet and kubelet or CRI-O started container prematurely.
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: exec {LOCK_FD}>${LOCK} && flock --verbose -w 30 "${LOCK_FD}" || {
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]:   echo "Failed to acquire lock for kube-apiserver. Please check setup container for details. This is likely kubelet or CRI-O bug."
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]:   exit 1
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: }
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: if [ -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt ]; then
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]:   echo "Copying system trust bundle ..."
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]:   cp -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: fi
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: exec watch-termination --termination-touch-file=/var/log/kube-apiserver/.terminating --termination-log-file=/var/log/kube-apiserver/termination.log --graceful-termination-duration=135s --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig -- hyperkube kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --advertise-address=${HOST_IP}  -v=2 --permit-address-sharing
Mar 13 06:10:15 ci-op-4sgxj8jx-8482f-hppxj-master-0 kubenswrapper[23961]: ],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:,HostPort:6443,ContainerPort:6443,Protocol:TCP,HostIP:,},},Env:[]EnvVar{EnvVar{Name:POD_NAME,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.name,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:POD_NAMESPACE,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:STATIC_POD_VERSION,Value:4,ValueFrom:nil,},EnvVar{Name:HOST_IP,Value:,ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.hostIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,},},EnvVar{Name:GOGC,Value:100,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{cpu: {{265 -3} {<nil>} 265m DecimalSI},memory: {{1073741824 0} {<nil>} 1Gi BinarySI},},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:resource-dir,ReadOnly:false,MountPath:/etc/kubernetes/static-pod-resources,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:cert-dir,ReadOnly:false,MountPath:/etc/kubernetes/static-pod-certs,SubPath:,MountPropagation:nil,SubPathExpr:,},VolumeMount{Name:audit-dir,ReadOnly:false,MountPath:/var/log/kube-apiserver,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:livez,Port:{0 6443 },Host:,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:10,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,TerminationGracePeriodSeconds:nil,},ReadinessProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:readyz,Port:{0 6443 },Host:,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:10,PeriodSeconds:5,SuccessThreshold:1,FailureThreshold:1,TerminationGracePeriodSeconds:nil,},Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,},Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:FallbackToLogsOnError,VolumeDevices:[]VolumeDevice{},StartupProbe:&Probe{ProbeHandler:ProbeHandler{Exec:nil,HTTPGet:&HTTPGetAction{Path:healthz,Port:{0 6443 },Host:,Scheme:HTTPS,HTTPHeaders:[]HTTPHeader{},},TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:0,TimeoutSeconds:10,PeriodSeconds:5,SuccessThreshold:1,FailureThreshold:30,TerminationGracePeriodSeconds:nil,},ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod kube-apiserver-ci-op-4sgxj8jx-8482f-hppxj-master-0_openshift-kube-apiserver(196e0956694ff43707b03f4585f3b6cd): CreateContainerConfigError: host IP unknown; known addresses: []

Version-Release number of selected component (if applicable):

    4.16 latest nightly build

How reproducible:

    frequently

Steps to Reproduce:

    1. Install cluster on 4.16 nightly build
    2.
    3.
    

Actual results:

    Installation failed.

Expected results:

    Installation is successful.

Additional info:

Searching CI jobs found many jobs failing with the same error; most are on the Azure platform.
https://search.dptools.openshift.org/?search=failed+to+initialize+the+cluster%3A+timed+out+waiting+for+the+condition&maxAge=48h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Description of problem:

When using an old-version "oc" client to extract a command newly introduced into the release payload, the error says the command does not support the operating system "linux", which is confusing.


[root@gpei-test-rhel9 0423]# ./oc version
Client Version: 4.15.10
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Kubernetes Version: v1.27.12+7bee54d

[root@gpei-test-rhel9 0423]# ./oc adm release extract --registry-config ~/.docker/config --command=openshift-install-fips --to ./ registry.ci.openshift.org/ocp/release:4.16.0-0.ci-2024-04-23-065741
error: command "openshift-install-fips" does not support the operating system "linux"

And for the oc client extracted from the same payload, it works well.
[root@gpei-test-rhel9 fips]# ./oc version
Client Version: 4.16.0-0.ci-2024-04-23-065741
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Kubernetes Version: v1.27.12+7bee54d

[root@gpei-test-rhel9 fips]# ./oc adm release extract --registry-config ~/.docker/config --command=openshift-install-fips --to ./ registry.ci.openshift.org/ocp/release:4.16.0-0.ci-2024-04-23-065741

[root@gpei-test-rhel9 fips]# ls
oc  openshift-install-fips

It would be better to get an error like 'command "openshift-install-fips" is not supported by the current oc client', rather than a message saying it does not support the operating system "linux".

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

Test case failure- OpenShift alerting rules [apigroup:image.openshift.io] should have description and summary annotations
The obtained response seems to have unmarshalling errors. 

Failed to fetch alerting rules: unable to parse response 

invalid character 's' after object key

Expected output- The response should be proper and the unmarshalling should have worked

Openshift Version- 4.13 & 4.14

Cloud Provider/Platform- PowerVS

Prow Job Link/Must gather path- https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.13-ocp-e2e-ovn-ppc64le-powervs/1700992665824268288/artifacts/ocp-e2e-ovn-ppc64le-powervs/  

release-4.16 of openshift/cloud-provider-openstack is missing some commits that were backported in the upstream project into the release-1.29 branch.
We should import them into our downstream fork.

Please review the following PR: https://github.com/openshift/csi-node-driver-registrar/pull/57

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Component Readiness has found a potential regression in [sig-arch][Early] CRDs for openshift.io should have subresource.status [Suite:openshift/conformance/parallel].

Probability of significant regression: 98.48%

Sample (being evaluated) Release: 4.16
Start Time: 2024-03-21T00:00:00Z
End Time: 2024-03-27T23:59:59Z
Success Rate: 89.29%
Successes: 25
Failures: 3
Flakes: 0

Base (historical) Release: 4.15
Start Time: 2024-02-01T00:00:00Z
End Time: 2024-02-28T23:59:59Z
Success Rate: 99.28%
Successes: 138
Failures: 1
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=Unknown&confidence=95&environment=ovn%20no-upgrade%20amd64%20metal-ipi%20serial&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=metal-ipi&platform=metal-ipi&sampleEndTime=2024-03-27%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-03-21%2000%3A00%3A00&testId=openshift-tests%3Ab3e170673c14c432c14836e9f41e7285&testName=%5Bsig-arch%5D%5BEarly%5D%20CRDs%20for%20openshift.io%20should%20have%20subresource.status%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D&upgrade=no-upgrade&upgrade=no-upgrade&variant=serial&variant=serial

In examining these test failures we found what is actually a pretty random grouping of failing tests, and as a group these are likely a significant part of why Component Readiness is reporting so much red on metal right now.

In March the metal team modified some configuration such that a portion of metal jobs can now land in a couple new environments, one of them ibmcloud.

The test linked above helped us find the pattern: opening the spyglass chart in Prow shows a clear signature that we then found in many other failed metal jobs:

  • pod-logs section full of etcd logging problems where reads and writes took too long
  • a vertical line of disruption across multiple backends
  • an abnormal vertical line of etcd leader elections jumping around
  • a vertical line of failed e2e tests

All of these line up within the same vertical space indicating the problem was at the same time, and the pod-logs section is as full as ever.

Derek Higgins has pulled ibmcloud out of rotation until they can attempt some SSD for etcd.

This bug is for introduction of a test that will make this symptom of etcd being very unhealthy visible as a test failure, both to communicate to engineers who look at the runs and help them understand this critical failure, and to help us locate runs affected because no single existing test can really do this today.

Azure and GCP jobs normally log these etcd warnings 3-5k times in a CI run; these ibmcloud runs were showing 30-70k. A limit of 10k was chosen based on examining the data in BigQuery: only 50 jobs have exceeded that this month, all of them metal and agent jobs.
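
As a rough sketch of how such runs can be triaged locally (assuming the gathered pod logs are on disk and that the slow-request warnings contain the phrase "took too long"):

# sum the per-file match counts across the pod-logs artifact directory
grep -rc 'took too long' pod-logs/ | awk -F: '{sum += $2} END {print sum}'

A total far above the ~10k limit described above is the signature the new test is meant to flag.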

Please review the following PR: https://github.com/openshift/monitoring-plugin/pull/103

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Nothing happens when the user clicks the 'Configure' button next to the AlertmanagerReceiversNotConfigured alert.

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-03-11-041450    

How reproducible:

 Always   

Steps to Reproduce:

1. navigate to Home -> Overview, locate the AlertmanagerReceiversNotConfigured alert in 'Status' card
2. click the 'Configure' button next to AlertmanagerReceiversNotConfigured alert
    

Actual results:

nothing happens    

Expected results:

The user should be taken to the Alertmanager configuration page /monitoring/alertmanagerconfig

Additional info:

    

This is a clone of issue OCPBUGS-35731. The following is the description of the original issue:

Description of problem:

A ServiceAccount is not deleted due to a race condition in the controller manager. When deleting the SA, this is logged in the controller manager:

2024-06-17T15:57:47.793991942Z I0617 15:57:47.793942       1 image_pull_secret_controller.go:233] "Internal registry pull secret auth data does not contain the correct number of entries" ns="test-qtreoisu" name="sink-eguqqiwm-dockercfg-vh8mw" expected=3 actual=0
2024-06-17T15:57:47.794120755Z I0617 15:57:47.794080       1 image_pull_secret_controller.go:163] "Refreshing image pull secret" ns="test-qtreoisu" name="sink-eguqqiwm-dockercfg-vh8mw" serviceaccount="sink-eguqqiwm"

As a result, the Secret is updated and the ServiceAccount owning the Secret is updated by the controller via server-side apply operation as can be seen in the managedFields:

{
   "apiVersion":"v1",
   "imagePullSecrets":[
      {
         "name":"default-dockercfg-vdck9"
      },
      {
         "name":"kn-test-image-pull-secret"
      },
      {
         "name":"sink-eguqqiwm-dockercfg-vh8mw"
      }
   ],
   "kind":"ServiceAccount",
   "metadata":{
      "annotations":{
         "openshift.io/internal-registry-pull-secret-ref":"sink-eguqqiwm-dockercfg-vh8mw"
      },
      "creationTimestamp":"2024-06-17T15:57:47Z",
      "managedFields":[
         {
            "apiVersion":"v1",
            "fieldsType":"FieldsV1",
            "fieldsV1":{
               "f:imagePullSecrets":{
                  
               },
               "f:metadata":{
                  "f:annotations":{
                     "f:openshift.io/internal-registry-pull-secret-ref":{
                        
                     }
                  }
               },
               "f:secrets":{
                  "k:{\"name\":\"sink-eguqqiwm-dockercfg-vh8mw\"}":{
                     
                  }
               }
            },
            "manager":"openshift.io/image-registry-pull-secrets_service-account-controller",
            "operation":"Apply",
            "time":"2024-06-17T15:57:47Z"
         }
      ],
      "name":"sink-eguqqiwm",
      "namespace":"test-qtreoisu",
      "resourceVersion":"104739",
      "uid":"eaae8d0e-8714-4c2e-9d20-c0c1a221eecc"
   },
   "secrets":[
      {
         "name":"sink-eguqqiwm-dockercfg-vh8mw"
      }
   ]
}"Events":{
   "metadata":{
      
   },
   "items":null
} 

The ServiceAccount then hangs there and is NOT deleted.

We have seen this only on OCP 4.16 (not on older versions), but already several times, for example in this CI run, which also has must-gather logs that can be investigated.

Another run is here

The controller code is new in 4.16 and it seems to be a regression.
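
To see the race in the field-ownership data, a hedged check (reusing the object names from this report) is to dump who applied what on the ServiceAccount:

oc get sa sink-eguqqiwm -n test-qtreoisu \
    -o jsonpath='{range .metadata.managedFields[*]}{.manager}{"\t"}{.operation}{"\t"}{.time}{"\n"}{end}'

An Apply entry from openshift.io/image-registry-pull-secrets_service-account-controller timestamped around the delete matches the refresh seen in the controller-manager log above.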

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-06-14-130320

How reproducible:

It happens sometimes in our CI runs where we want to delete a ServiceAccount but it's hanging there. The test doesn't try to delete it again. It tries only once.

Steps to Reproduce:

The following reproducer works for me. Some service accounts keep hanging there after running the script:

#!/usr/bin/env bash

kubectl create namespace test

for i in `seq 100`; do
	(
		kubectl create sa "my-sa-${i}" -n test
		kubectl wait --for=jsonpath="{.metadata.annotations.openshift\\.io/internal-registry-pull-secret-ref}" sa/my-sa-${i} -n test
		kubectl delete sa/my-sa-${i} -n test
		kubectl wait --for=delete sa/my-sa-${i} -n test --timeout=60s
	)&
done

wait

Actual results:

ServiceAccount not deleted

Expected results:

ServiceAccount deleted

Additional info:

 

Please review the following PR: https://github.com/openshift/bond-cni/pull/62

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33715. The following is the description of the original issue:

Description of problem:

console-operator is fetching the organization ID from OCM on every sync call, which is too often. We need to reduce the fetch period.

Version-Release number of selected component (if applicable):

4.16

How reproducible:

Always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-api-provider-alibaba/pull/49

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

We deprecated the default field manager in CNO, but it is still used by default calls like Patch(). We need to update all calls to use explicit fieldManager, and add a test to verify deprecated managers are not used.

Since 4.14 https://github.com/openshift/cluster-network-operator/commit/db57a477b10f517bc4ae501d95cc7b8398a8755c#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6R31 (more specifically, since the sigs.k8s.io/controller-runtime bump) we have been exposed to this bug https://github.com/kubernetes-sigs/controller-runtime/pull/2435/commits/a6b9c0b672c77a79fff4d5bc03221af1e1fe21fa which made the default fieldManager "Go-http-client" instead of "cluster-network-operator".
It means that the "cluster-network-operator" deprecation doesn't really work, since the manager is named differently. The manager name, when unset, comes from https://github.com/kubernetes/kubernetes/blob/b85c9bbf1ac911a2a2aed2d5c1f5eaf5956cc199/staging/src/k8s.io/client-go/rest/config.go#L498 and is then cropped at https://github.com/openshift/cluster-network-operator/blob/5f18e4231f291bf5a01812974b0b4dff19c77f2c/vendor/k8s.io/apiserver/pkg/endpoints/handlers/create.go#L253-L260.
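
A hedged way to confirm which manager names actually own fields on the operator's Network object (useful for the test that asserts deprecated managers are gone):

# list the unique field managers recorded on the Network CR
oc get network.operator.openshift.io cluster \
    -o jsonpath='{range .metadata.managedFields[*]}{.manager}{"\n"}{end}' | sort -u

Seeing "Go-http-client" in that list indicates a Patch() call that fell back to the default manager instead of an explicit fieldManager.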

Identified changes needed (may be more):

  • (status *StatusManager) setAnnotation
  • a mysterious message seen only on newly created clusters (not seen on CI runs, only observed once):
    `Depreciated field manager cluster-network-operator for object "operator.openshift.io/v1, Kind=Network" cluster`

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Check CNO logs for deprecated field manager logs
    2. oc logs -l name=network-operator --tail=-1 -n openshift-network-operator|grep "Depreciated field manager"     
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

Unable to run disk-to-mirror in an enclave environment.
    

Version:

WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202406131906.p0.g7c0889f.assembly.stream.el9-7c0889f", GitCommit:"7c0889f4bd343ccaaba5f33b7b861db29b1e5e49", GitTreeState:"clean", BuildDate:"2024-06-13T22:07:44Z", GoVersion:"go1.21.9 (Red Hat 1.21.9-1.el9_4) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

    

How reproducible:

Always
    

Steps to Reproduce:

    1. Mirror to disk on internet facing machine
    2. scp the tar to a disconnected machine, different folder
    3. Disk to mirror
    

Actual results:

[ec2-user@ip-10-0-1-197 ~]$ oc-mirror --v2 -c isc_resources.yaml --from file:///home/ec2-user/entreprise-content/ docker://localhost:5000

2024/07/15 08:42:40  [WARN]   : ⚠️  --v2 flag identified, flow redirected to the oc-mirror v2 version. This is Tech Preview, it is still under development and it is not production ready.
2024/07/15 08:42:40  [INFO]   : 👋 Hello, welcome to oc-mirror
2024/07/15 08:42:40  [INFO]   : ⚙️  setting up the environment for you...
2024/07/15 08:42:40  [INFO]   : 🔀 workflow mode: diskToMirror 
I0715 08:42:40.646736   40155 client.go:44] Usage of the UPDATE_URL_OVERRIDE environment variable is unsupported
2024/07/15 08:49:34  [INFO]   : 🕵️  going to discover the necessary images...
2024/07/15 08:49:34  [INFO]   : 🔍 collecting release images...
2024/07/15 08:49:34  [ERROR]  : [ReleaseImageCollector] open /home/skhoury/demo/working-dir/hold-release/ocp-release/4.15.17-x86_64/release-manifests/image-references: no such file or directory
2024/07/15 08:49:34  [INFO]   : 👋 Goodbye, thank you for using oc-mirror
error closing log file registry.log: close /home/ec2-user/entreprise-content/working-dir/logs/registry.log: file already closed
2024/07/15 08:49:34  [ERROR]  : [ReleaseImageCollector] open /home/skhoury/demo/working-dir/hold-release/ocp-release/4.15.17-x86_64/release-manifests/image-references: no such file or directory 

    

Expected results:

Success
The folder ReleaseImageCollector should have used is /home/ec2-user/entreprise-content/working-dir/hold-release/ocp-release/4.15.17-x86_64/release-manifests/image-references
    

Additional info:


    

This looks very much like a 'downstream a thing' process, but only making a modification to an existing one.

Currently, the operator-framework-olm monorepo generates a self-hosting catalog from operator-registry.Dockerfile.  This image also contains cross-compiled opm binaries for windows and mac, and joins the payload as ose-operator-registry.

To separate concerns, this introduces a new operator-framework-cli image which will be based on scratch, not self-hosting in any way, and just a container to convey repeatably produced o-f CLIs.  Right now, this will focus on opm for olm v0 only, but others can be added in future.

 

Description of problem:

Disable the create button on the Network Attachment Definitions page for regular users.
A regular user cannot create a NAD but can list NADs; the button should be disabled just like the "Create" button on the MigrationPolicies page, and a tooltip should be added to the button.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


A customer pentest shows that the Server header is returned by the admin console when browsing

https://console-openshift-console$domain/locales/resource.json?lng=en&ns=plugin__odf-console

This could leak version information that a potential attacker could use to look up known CVEs.

Response header:

Server: nginx/1.20.1
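
A hedged way to verify the exposure from outside the cluster (the console route host below is a placeholder, not taken from this report):

curl -skI "https://console-openshift-console.apps.example.com/locales/resource.json?lng=en&ns=plugin__odf-console" \
    | grep -i '^server:'

Any response such as "Server: nginx/1.20.1" still leaks the serving software and version; the header should be stripped or reduced to a generic value.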

The management interface for idrac-redfish in the redfish BMO module is wrongly set to "ipxe", causing the error
"Could not find the following interface in the 'ironic.hardware.interfaces.management' entrypoint: ipxe."

We need to set it to "idracRedfish".

This is a clone of issue OCPBUGS-38375. The following is the description of the original issue:

This is a clone of issue OCPBUGS-37782. The following is the description of the original issue:

Description of problem:

    ci/prow/security is failing on google.golang.org/grpc/metadata

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

always    

Steps to Reproduce:

    1. run the ci/prow/security job on a 4.15 PR
    2.
    3.
    

Actual results:

    Medium severity vulnerability found in google.golang.org/grpc/metadata

Expected results:

    

Additional info:

 

When the metal3-plugin is enabled, it adds Disks and NICs tabs to the Node details page.

 

We would like to remove these tabs because:

  • The same tabs are added to the Bare Metal Hosts page, so removing them here removes the duplication.

Description of problem:

Currently in oc-mirror v2 the user has no way of determining whether an error that occurs during a mirror is an actual mirroring failure or a flake. We need to provide a way for the user to determine such errors easily, which will make the product/user experience better.
    

Version-Release number of selected component (if applicable):

    [knarra@knarra-thinkpadx1carbon7th openshift-tests-private]$ oc-mirror version
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202404221110.p0.g0e2235f.assembly.stream.el9-0e2235f", GitCommit:"0e2235f4a51ce0a2d51cfc87227b1c76bc7220ea", GitTreeState:"clean", BuildDate:"2024-04-22T16:05:56Z", GoVersion:"go1.21.9 (Red Hat 1.21.9-1.el9_4) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

    

How reproducible:

     Always
    

Steps to Reproduce:

    1.  Install latest oc-mirror v2
    2.  Run mirror2disk via command `oc-mirror -c <config.yaml> file://out --v2`
    3.  Now run disk2mirror via command `oc-mirror -c <config.yaml> --from file://out docker://<localhost:5000>/mirror`
    

Actual results:

    Sometimes the mirror fails with an error:
 2024/04/24 13:15:38  [ERROR]  : [Worker] err: copying image 3/3 from manifest list: reading blob sha256:418a8fe842682e4eadab6f16a6ac8d30550665a2510090aa9a29c607d5063e67: fetching blob: unauthorized: Access to the requested resource is not authorized
    

Expected results:

     There should be a way for the user to determine whether the error that occurred was an actual failure or a transient race; the above error is not intuitive.
    

Additional info:

    Discussed here https://redhat-internal.slack.com/archives/C050P27C71S/p1714025295014549
    

Please review the following PR: https://github.com/openshift/must-gather/pull/406

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/azure-workload-identity/pull/10

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to create bootstrap: unsupported configuration: No emulator found for arch 'x86_64'
               

Description of problem:

Should print out an error if a single-arch image is specified with a non-matching arch via --filter-by-os.

Version-Release number of selected component (if applicable):

 oc version Client Version: 4.16.0-202403121314.p0.gc92b507.assembly.stream-c92b507

How reproducible:

    Always 

Steps to Reproduce:

1) Use `--filter-by-os linux/amd64` for an image that only has arch arm64:
`oc image info quay.io/openshift-release-dev/ocp-release:4.16.0-ec.4-aarch64 --filter-by-os linux/amd64`

2) Use invalid `--filter-by-os linux/invalid` for the image 
`oc image info  quay.io/openshift-release-dev/ocp-release:4.16.0-ec.4-aarch64  --filter-by-os linux/invalid`     

Actual results:

   1) Succeeds with no error or warning:
oc image info  quay.io/openshift-release-dev/ocp-release:4.16.0-ec.4-aarch64  --filter-by-os linux/amd64
Name:        quay.io/openshift-release-dev/ocp-release:4.16.0-ec.4-aarch64
Digest:      sha256:0c13de057d9f75c40999778bb924f654be1d0def980acbe8a00096e6bf18cc2a
Media Type:  application/vnd.docker.distribution.manifest.v2+json
Created:     16d ago
Image Size:  155.5MB in 5 layers
Layers:      75.95MB sha256:f90c4920e095dc91c490dd9ed7920d18e0327ddedcf5e10d2887e80ccae94fd7
             42.16MB sha256:a974fa00e888c491ab67f8d63456937bbaffbebb530db5ee2f9f5193fc5bb910
             10.2MB  sha256:c391a61f467f437cf6a0ba00c394aa4dbc107ecf56edd91a018de97ca4cd16bc
             26.07MB sha256:0e78634759d2f9c988dbf5ee73a7ed9a5d3b4ec28dcad5dd9086544826bbde05
             1.115MB sha256:277f2a9ba38386db697a1cbde875c1ec79988a632d006c6d697d0a79911d9955
OS:          linux
Arch:        arm64
Entrypoint:  /usr/bin/cluster-version-operator
Environment: PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
             container=oci
             GODEBUG=x509ignoreCN=0,madvdontneed=1
             __doozer=merge
             BUILD_RELEASE=202403070215.p0.g6a76ba9.assembly.stream.el9
             BUILD_VERSION=v4.16.0
             OS_GIT_MAJOR=4
             OS_GIT_MINOR=16
             OS_GIT_PATCH=0
             OS_GIT_TREE_STATE=clean
             OS_GIT_VERSION=4.16.0-202403070215.p0.g6a76ba9.assembly.stream.el9-6a76ba9
             SOURCE_GIT_TREE_STATE=clean
             __doozer_group=openshift-4.16
             __doozer_key=cluster-version-operator
             __doozer_version=v4.16.0
             OS_GIT_COMMIT=6a76ba9
             SOURCE_DATE_EPOCH=1709342193
             SOURCE_GIT_COMMIT=6a76ba95ed441893e1bdf6616c47701c0464b7f4
             SOURCE_GIT_TAG=v1.0.0-1176-g6a76ba95
             SOURCE_GIT_URL=https://github.com/openshift/cluster-version-operator
Labels:      io.openshift.release=4.16.0-ec.4
             io.openshift.release.base-image-digest=sha256:fa1b36be29e72ca5c180ce8cc599a1f0871fa5aacd3153ed4cefc84038cd439a 

2) Succeeds with no error or warning:
oc image info  quay.io/openshift-release-dev/ocp-release:4.16.0-ec.4-aarch64  --filter-by-os linux/invalid
Name:        quay.io/openshift-release-dev/ocp-release:4.16.0-ec.4-aarch64
Digest:      sha256:0c13de057d9f75c40999778bb924f654be1d0def980acbe8a00096e6bf18cc2a
Media Type:  application/vnd.docker.distribution.manifest.v2+json
Created:     16d ago
Image Size:  155.5MB in 5 layers
Layers:      75.95MB sha256:f90c4920e095dc91c490dd9ed7920d18e0327ddedcf5e10d2887e80ccae94fd7
             42.16MB sha256:a974fa00e888c491ab67f8d63456937bbaffbebb530db5ee2f9f5193fc5bb910
             10.2MB  sha256:c391a61f467f437cf6a0ba00c394aa4dbc107ecf56edd91a018de97ca4cd16bc
             26.07MB sha256:0e78634759d2f9c988dbf5ee73a7ed9a5d3b4ec28dcad5dd9086544826bbde05
             1.115MB sha256:277f2a9ba38386db697a1cbde875c1ec79988a632d006c6d697d0a79911d9955
OS:          linux
Arch:        arm64
Entrypoint:  /usr/bin/cluster-version-operator
Environment: PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
             container=oci
             GODEBUG=x509ignoreCN=0,madvdontneed=1
             __doozer=merge
             BUILD_RELEASE=202403070215.p0.g6a76ba9.assembly.stream.el9
             BUILD_VERSION=v4.16.0
             OS_GIT_MAJOR=4
             OS_GIT_MINOR=16
             OS_GIT_PATCH=0
             OS_GIT_TREE_STATE=clean
             OS_GIT_VERSION=4.16.0-202403070215.p0.g6a76ba9.assembly.stream.el9-6a76ba9
             SOURCE_GIT_TREE_STATE=clean
             __doozer_group=openshift-4.16
             __doozer_key=cluster-version-operator
             __doozer_version=v4.16.0
             OS_GIT_COMMIT=6a76ba9
             SOURCE_DATE_EPOCH=1709342193
             SOURCE_GIT_COMMIT=6a76ba95ed441893e1bdf6616c47701c0464b7f4
             SOURCE_GIT_TAG=v1.0.0-1176-g6a76ba95
             SOURCE_GIT_URL=https://github.com/openshift/cluster-version-operator
Labels:      io.openshift.release=4.16.0-ec.4
             io.openshift.release.base-image-digest=sha256:fa1b36be29e72ca5c180ce8cc599a1f0871fa5aacd3153ed4cefc84038cd439a

[root@localhost Doc]# echo $?
0

Expected results:

1) If the image is not a manifest list, we should print an error, since there is nothing to filter, or at least warn that this is not a manifest-list image;

2) We should print an error for an invalid arch.

Additional info:

    

ZTP manifests generated by the `openshift-install agent create cluster-manifests` command do not contain the correct Group/Version/Kind type metadata.

This means they cannot be applied to an OpenShift cluster for use with ZTP as intended.
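
A hedged sanity check (assuming the default cluster-manifests output directory) that lists any generated manifest missing its type metadata:

# files printed by either command lack the field and cannot be applied directly
grep -L '^apiVersion:' cluster-manifests/*.yaml
grep -L '^kind:' cluster-manifests/*.yaml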

Reproducer:
1. On a GCP cluster, create an ingress controller with internal load balancer scope, like this:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: foo
  namespace: openshift-ingress-operator
spec:
  domain: foo.<cluster-domain>
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      dnsManagementPolicy: Managed
      scope: Internal

2. Wait for load balancer service to complete rollout

$ oc -n openshift-ingress get service router-foo
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
router-foo LoadBalancer 172.30.101.233 10.0.128.5 80:32019/TCP,443:32729/TCP 81s

3. Edit ingress controller to set spec.endpointPublishingStrategy.loadBalancer.scope to External

The load balancer service (router-foo in this case) should get an external IP address, but currently it keeps the 10.x.x.x address that was already assigned.
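A hedged one-liner for step 3, using the same field path as the manifest above (verify against your cluster before use):

oc -n openshift-ingress-operator patch ingresscontroller foo --type=merge \
  -p '{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"scope":"External"}}}}'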

Please review the following PR: https://github.com/openshift/cluster-capi-operator/pull/153

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

In CNO, the cni-sysctl-allowlist daemonset needs to run with hostNetwork: true so that upgrades involving both the multus daemonset and the cni-sysctl-allowlist daemonset complete successfully.
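A minimal sketch of the intended change (the daemonset name and namespace are assumptions, and CNO will reconcile a live patch away, so the real change belongs in the CNO-rendered manifest):

oc -n openshift-multus patch daemonset cni-sysctl-allowlist-ds --type=merge \
  -p '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'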

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Clone of https://issues.redhat.com/browse/OCPBUGS-32141 for 4.16
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Description of problem:
VIP's are on a different network than the machine network on a 4.14 cluster

failing cluster: 4.14

Infrastructure
--------------
Platform: VSphere
Install Type: IPI
apiServerInternalIP: 10.8.0.83
apiServerInternalIPs: 10.8.0.83
ingressIP: 10.8.0.84
ingressIPs: 10.8.0.84
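For reference, a hedged way to read these values from a live cluster (field path per the Infrastructure API; the plural *IPs fields only exist on newer versions):

oc get infrastructure cluster -o jsonpath='{.status.platformStatus.vsphere}{"\n"}'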

All internal IP addresses of all nodes match the Machine Network.

Machine Network: 10.8.42.0/23

Node name IP Address Matches CIDR
..............................................................................................................
sv1-prd-ocp-int-bn8ln-master-0 10.8.42.24 YES
sv1-prd-ocp-int-bn8ln-master-1 10.8.42.35 YES
sv1-prd-ocp-int-bn8ln-master-2 10.8.42.36 YES
sv1-prd-ocp-int-bn8ln-worker-0-5rbwr 10.8.42.32 YES
sv1-prd-ocp-int-bn8ln-worker-0-h7fq7 10.8.42.49 YES

logs from one of the haproxy pods

oc logs -n openshift-vsphere-infra haproxy-sv1-prd-ocp-int-bn8ln-master-0 haproxy-monitor
.....
2024-04-02T18:48:57.534824711Z time="2024-04-02T18:48:57Z" level=info msg="An error occurred while trying to read master nodes details from api-vip:kube-apiserver: failed find a interface for the ip 10.8.0.83"
2024-04-02T18:48:57.534849744Z time="2024-04-02T18:48:57Z" level=info msg="Trying to read master nodes details from localhost:kube-apiserver"
2024-04-02T18:48:57.544507441Z time="2024-04-02T18:48:57Z" level=error msg="Could not retrieve subnet for IP 10.8.0.83" err="failed find a interface for the ip 10.8.0.83"
2024-04-02T18:48:57.544507441Z time="2024-04-02T18:48:57Z" level=error msg="Failed to retrieve API members information" kubeconfigPath=/var/lib/kubelet/kubeconfig
2024-04-02T18:48:57.544507441Z time="2024-04-02T18:48:57Z" level=info msg="GetLBConfig failed, sleep half of interval and retry" kubeconfigPath=/var/lib/kubelet/kubeconfig
2024-04-02T18:49:00.572652095Z time="2024-04-02T18:49:00Z" level=error msg="Could not retrieve subnet for IP 10.8.0.83" err="failed find a interface for the ip 10.8.0.83"

There is a KCS article that addresses this:
https://access.redhat.com/solutions/7037425

However, this same configuration works in production on 4.12

working cluster:
Infrastructure
--------------
Platform: VSphere
Install Type: IPI
apiServerInternalIP: 10.8.0.73
apiServerInternalIPs: 10.8.0.73
ingressIP: 10.8.0.72
ingressIPs: 10.8.0.72

All internal IP addresses of all nodes match the Machine Network.

Machine Network: 10.8.38.0/23

Node name IP Address Matches CIDR
..............................................................................................................
sb1-prd-ocp-int-qls2m-cp4d-4875s 10.8.38.29 YES
sb1-prd-ocp-int-qls2m-cp4d-phczw 10.8.38.19 YES
sb1-prd-ocp-int-qls2m-cp4d-ql5sj 10.8.38.43 YES
sb1-prd-ocp-int-qls2m-cp4d-svzl7 10.8.38.27 YES
sb1-prd-ocp-int-qls2m-cp4d-x286s 10.8.38.18 YES
sb1-prd-ocp-int-qls2m-cp4d-xk48m 10.8.38.40 YES
sb1-prd-ocp-int-qls2m-master-0 10.8.38.25 YES
sb1-prd-ocp-int-qls2m-master-1 10.8.38.24 YES
sb1-prd-ocp-int-qls2m-master-2 10.8.38.30 YES
sb1-prd-ocp-int-qls2m-worker-njzdx 10.8.38.15 YES
sb1-prd-ocp-int-qls2m-worker-rhqn5 10.8.38.39 YES

logs from one of the haproxy pods

2023-08-18T21:12:19.730010034Z time="2023-08-18T21:12:19Z" level=info msg="API is not reachable through HAProxy"
2023-08-18T21:12:19.755357706Z time="2023-08-18T21:12:19Z" level=info msg="Config change detected" configChangeCtr=1 curConfig="{6443 9445 29445 [

{sb1-prd-ocp-int-qls2m-master-1 10.8.38.24 6443} {sb1-prd-ocp-int-qls2m-master-0 10.8.38.25 6443} {sb1-prd-ocp-int-qls2m-master-2 10.8.38.30 6443}] }"
2023-08-18T21:12:19.782529185Z time="2023-08-18T21:12:19Z" level=info msg="Removing existing nat PREROUTING rule" spec="--dst 10.8.0.73 -p tcp --dport 6443 -j REDIRECT --to-ports 9445 -m comment --comment OCP_API_LB_REDIRECT"
2023-08-18T21:12:19.794532220Z time="2023-08-18T21:12:19Z" level=info msg="Removing existing nat OUTPUT rule" spec="--dst 10.8.0.73 -p tcp --dport 6443 -j REDIRECT --to-ports 9445 -m comment --comment OCP_API_LB_REDIRECT -o lo"
2023-08-18T21:12:25.816406455Z time="2023-08-18T21:12:25Z" level=info msg="Config change detected" configChangeCtr=2 curConfig="{6443 9445 29445 [{sb1-prd-ocp-int-qls2m-master-1 10.8.38.24 6443}

{sb1-prd-ocp-int-qls2m-master-0 10.8.38.25 6443} {sb1-prd-ocp-int-qls2m-master-2 10.8.38.30 6443}] }"
2023-08-18T21:12:25.919248671Z time="2023-08-18T21:12:25Z" level=info msg="Removing existing nat PREROUTING rule" spec="--dst 10.8.0.73 -p tcp --dport 6443 -j REDIRECT --to-ports 9445 -m comment --comment OCP_API_LB_REDIRECT"
2023-08-18T21:12:25.965663811Z time="2023-08-18T21:12:25Z" level=info msg="Removing existing nat OUTPUT rule" spec="--dst 10.8.0.73 -p tcp --dport 6443 -j REDIRECT --to-ports 9445 -m comment --comment OCP_API_LB_REDIRECT -o lo"
2023-08-18T21:12:32.005310398Z time="2023-08-18T21:12:32Z" level=info msg="Config change detected" configChangeCtr=3 curConfig="{6443 9445 29445 [{sb1-prd-ocp-int-qls2m-master-1 10.8.38.24 6443} {sb1-prd-ocp-int-qls2m-master-0 10.8.38.25 6443}

{sb1-prd-ocp-int-qls2m-master-2 10.8.38.30 6443}

] }"

The traffic is being redirected (on the working 4.12 cluster):

Found this in the sos report under sos_commands/firewall_tables/:

nft_-a_list_ruleset

table ip nat { # handle 2
  chain PREROUTING { # handle 1
    type nat hook prerouting priority dstnat; policy accept;
    meta l4proto tcp ip daddr 10.8.0.73 tcp dport 6443 counter packets 0 bytes 0 redirect to :9445 # handle 66
    counter packets 82025408 bytes 5088067290 jump OVN-KUBE-ETP # handle 30
    counter packets 82025421 bytes 5088068062 jump OVN-KUBE-EXTERNALIP # handle 28
    counter packets 82025439 bytes 5088069114 jump OVN-KUBE-NODEPORT # handle 26
  }

  chain INPUT { # handle 2
    type nat hook input priority 100; policy accept;
  }

  chain POSTROUTING { # handle 3
    type nat hook postrouting priority srcnat; policy accept;
    counter packets 245475292 bytes 16221809463 jump OVN-KUBE-EGRESS-SVC # handle 25
    oifname "ovn-k8s-mp0" counter packets 58115015 bytes 4184247096 jump OVN-KUBE-SNAT-MGMTPORT # handle 16
    counter packets 187360548 bytes 12037581317 jump KUBE-POSTROUTING # handle 10
  }

  chain OUTPUT { # handle 4
    type nat hook output priority -100; policy accept;
    oifname "lo" meta l4proto tcp ip daddr 10.8.0.73 tcp dport 6443 counter packets 0 bytes 0 redirect to :9445 # handle 67
    counter packets 245122162 bytes 16200621351 jump OVN-KUBE-EXTERNALIP # handle 29
    counter packets 245122163 bytes 16200621411 jump OVN-KUBE-NODEPORT # handle 27
    counter packets 245122166 bytes 16200621591 jump OVN-KUBE-ITP # handle 24
  }

... many more lines ...

These rules were not added by the customer.

None of the redirect statements are present in the same file for 4.14 (the failing cluster)

OCP version (if applicable): 4.14

    

How reproducible: 100%

    Steps to Reproduce:
This is the install script that our ansible job uses to install 4.12

If you need it clarified let me know; all the items in {{ }} are just variables for file paths

cp -r {{  item.0.cluster_name }}/install-config.yaml {{ openshift_base }}{{  item.0.cluster_name }}/
./openshift-install create manifests --dir {{ openshift_base }}{{  item.0.cluster_name }}/
cp -r machineconfigs/* {{ openshift_base }}{{  item.0.cluster_name }}/openshift/
cp -r {{  item.0.cluster_name }}/customizations/* {{ openshift_base }}{{  item.0.cluster_name }}/openshift/
./openshift-install create ignition-configs --dir {{ openshift_base }}{{  item.0.cluster_name }}/
./openshift-install create cluster --dir {{ openshift_base }}{{  item.0.cluster_name }} --log-level=debug

We are installing IPI on VMware

API and Ingress VIPs are configured on our external load balancer appliance. (Citrix ADCs if that matters)


    

Actual results:


The haproxy pods crashloop and do not work.
In 4.14, following the same install workflow, neither the API VIP nor the Ingress VIP binds to the masters or workers, and we see haproxy crashlooping
    

Expected results:


For 4.12:
Following a completed 4.12 installation, if we look in VMware at our master and worker nodes, we will see that all of them have an IP address from the machine network assigned to them, and one node from the masters and one from the workers will have the VIP bound to them as well.
 

    

Additional info:


    

Description of problem:

In a dualstack cluster, we use IPv4 URLs for callbacks to Ironic and Inspector. If the host only has IPv6 networking, the provisioning will fail. This issue affects both the normal IPI and ZTP with the converged flow.

A similar issue has been fixed as part of METAL-163 where we use the BMC's address family to determine which URL to send to it. This bug is somewhat simpler: we can provide IPA with several URLs and let it decide which one it can use. This way, only small changes to IPA itself, ICC and CBO are required.

The fix will only affect virtual media deployments without provisioning network or with virtualMediaViaExternalNetwork:true. We don't have a good dualstack story around provisioning networks anyway.

Upstream IPA request: https://bugs.launchpad.net/ironic/+bug/2045548

This is a clone of issue OCPBUGS-41908. The following is the description of the original issue:

This is a clone of issue OCPBUGS-39246. The following is the description of the original issue:

Description of problem:

    Alerts with non-standard severity labels are sent to Telemeter.

Version-Release number of selected component (if applicable):

    All supported versions

How reproducible:

    Always

Steps to Reproduce:

    1. Create an always firing alerting rule with severity=foo (see the sketch after these steps).
    2. Make sure that telemetry is enabled for the cluster.
    3.
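
A minimal sketch of step 1 (rule and alert names are illustrative; whether the platform Prometheus picks the rule up depends on the cluster's rule selectors):

oc apply -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: always-firing-nonstandard-severity
  namespace: openshift-monitoring
spec:
  groups:
  - name: example
    rules:
    - alert: AlwaysFiringExample
      expr: vector(1)
      labels:
        severity: foo
EOF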
    

Actual results:

    The alert can be seen on the telemeter server side.

Expected results:

    The alert is dropped by the telemeter allow-list.

Additional info:

Red Hat operators should use standard severities: https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide
Looking at the current data, it looks like ~2% of the alerts reported to Telemeter have an invalid severity.

Description of problem:

console-config sets telemeterClientDisabled: true even though the telemeter client is NOT disabled

Version-Release number of selected component (if applicable):

a cluster launched by image built with cluster-bot: build 4.16-ci,openshift/console#13677,openshift/console-operator#877    

How reproducible:

Always    

Steps to Reproduce:

1. Check if telemeter client is enabled
$ oc -n openshift-monitoring get pod | grep telemeter-client
telemeter-client-7cc8bf56db-7wcs5                       3/3     Running   0          83m 
$ oc get cm cluster-monitoring-config -n openshift-monitoring
Error from server (NotFound): configmaps "cluster-monitoring-config" not found

2. Check console-config settings
$ oc get cm console-config -n openshift-console -o yaml
apiVersion: v1
data:
  console-config.yaml: |
    apiVersion: console.openshift.io/v1
    auth:
      authType: openshift
      clientID: console
      clientSecretFile: /var/oauth-config/clientSecret
      oauthEndpointCAFile: /var/oauth-serving-cert/ca-bundle.crt
    clusterInfo:
      consoleBaseAddress: https://xxxxx
      controlPlaneTopology: HighlyAvailable
      masterPublicURL: https://xxxxx:6443
      nodeArchitectures:
      - amd64
      nodeOperatingSystems:
      - linux
      releaseVersion: 4.16.0-0.test-2024-03-18-024238-ci-ln-0q7bq2t-latest
    customization:
      branding: ocp
      documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.16/
    kind: ConsoleConfig
    monitoringInfo:
      alertmanagerTenancyHost: alertmanager-main.openshift-monitoring.svc:9092
      alertmanagerUserWorkloadHost: alertmanager-main.openshift-monitoring.svc:9094
    plugins:
      monitoring-plugin: https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/
    providers: {}
    servingInfo:
      bindAddress: https://[::]:8443
      certFile: /var/serving-cert/tls.crt
      keyFile: /var/serving-cert/tls.key
    session: {}
    telemetry:
      telemeterClientDisabled: "true"
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-19T01:20:23Z"
  labels:
    app: console
  name: console-config
  namespace: openshift-console
  resourceVersion: "27723"
  uid: 2f9282c3-1c4a-4400-9908-4e70025afc33    

 

Actual results:

in cm/console-config, telemeterClientDisabled is set to 'true'

Expected results:

The telemeterClientDisabled property should reflect the real status of the telemeter client.

The telemeter client is not disabled because:
1. the telemeter client pod is running
2. the user didn't disable the telemeter client manually, because the 'cluster-monitoring-config' configmap doesn't exist

Additional info:

    

This is a clone of issue OCPBUGS-42200. The following is the description of the original issue:

This is a clone of issue OCPBUGS-41920. The following is the description of the original issue:

Description of problem:

When we move one node from one custom MCP to another custom MCP, the MCPs report the wrong number of nodes.

For example, we reach this situation (worker-perf MCP is not reporting the right number of nodes)

$ oc get mcp,nodes
NAME                                                                     CONFIG                                                         UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master               rendered-master-c8d23b071e1ccf6cf85c7f1b31c0def6               True      False      False      3              3                   3                     0                      142m
machineconfigpool.machineconfiguration.openshift.io/worker               rendered-worker-36ee1fdc485685ac9c324769889c3348               True      False      False      1              1                   1                     0                      142m
machineconfigpool.machineconfiguration.openshift.io/worker-perf          rendered-worker-perf-6b5fbffac62c3d437e307e849c44b556          True      False      False      2              2                   2                     0                      24m
machineconfigpool.machineconfiguration.openshift.io/worker-perf-canary   rendered-worker-perf-canary-6b5fbffac62c3d437e307e849c44b556   True      False      False      1              1                   1                     0                      7m52s

NAME                                             STATUS   ROLES                       AGE    VERSION
node/ip-10-0-13-228.us-east-2.compute.internal   Ready    worker,worker-perf-canary   138m   v1.30.4
node/ip-10-0-2-250.us-east-2.compute.internal    Ready    control-plane,master        145m   v1.30.4
node/ip-10-0-34-223.us-east-2.compute.internal   Ready    control-plane,master        144m   v1.30.4
node/ip-10-0-35-61.us-east-2.compute.internal    Ready    worker,worker-perf          136m   v1.30.4
node/ip-10-0-79-232.us-east-2.compute.internal   Ready    control-plane,master        144m   v1.30.4
node/ip-10-0-86-124.us-east-2.compute.internal   Ready    worker                      139m   v1.30.4



After 20 minutes or half an hour the MCPs start reporting the right number of nodes

    

Version-Release number of selected component (if applicable):
IPI on AWS version:

$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.17.0-0.nightly-2024-09-13-040101 True False 124m Cluster version is 4.17.0-0.nightly-2024-09-13-040101

    

How reproducible:
Always

    

Steps to Reproduce:

    1. Create a MCP
    
     oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-perf
spec:
  machineConfigSelector:
    matchExpressions:
      - {
         key: machineconfiguration.openshift.io/role,
         operator: In,
         values: [worker,worker-perf]
        }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-perf: ""
EOF

    
    2. Add 2 nodes to the MCP
    
   $ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/worker-perf=
   $ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[1].metadata.name}") node-role.kubernetes.io/worker-perf=

    3. Create another MCP
    oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-perf-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {
         key: machineconfiguration.openshift.io/role,
         operator: In,
         values: [worker,worker-perf,worker-perf-canary]
        }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-perf-canary: ""
EOF

    4. Move one node from the MCP created in step 1 to the MCP created in step 3
    $ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/worker-perf-canary=
    $ oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/worker-perf-
    
    
    

Actual results:

The worker-perf pool is not reporting the right number of nodes. It continues reporting 2 nodes even though one of them was moved to the worker-perf-canary MCP.
$ oc get mcp,nodes
NAME                                                                     CONFIG                                                         UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master               rendered-master-c8d23b071e1ccf6cf85c7f1b31c0def6               True      False      False      3              3                   3                     0                      142m
machineconfigpool.machineconfiguration.openshift.io/worker               rendered-worker-36ee1fdc485685ac9c324769889c3348               True      False      False      1              1                   1                     0                      142m
machineconfigpool.machineconfiguration.openshift.io/worker-perf          rendered-worker-perf-6b5fbffac62c3d437e307e849c44b556          True      False      False      2              2                   2                     0                      24m
machineconfigpool.machineconfiguration.openshift.io/worker-perf-canary   rendered-worker-perf-canary-6b5fbffac62c3d437e307e849c44b556   True      False      False      1              1                   1                     0                      7m52s

NAME                                             STATUS   ROLES                       AGE    VERSION
node/ip-10-0-13-228.us-east-2.compute.internal   Ready    worker,worker-perf-canary   138m   v1.30.4
node/ip-10-0-2-250.us-east-2.compute.internal    Ready    control-plane,master        145m   v1.30.4
node/ip-10-0-34-223.us-east-2.compute.internal   Ready    control-plane,master        144m   v1.30.4
node/ip-10-0-35-61.us-east-2.compute.internal    Ready    worker,worker-perf          136m   v1.30.4
node/ip-10-0-79-232.us-east-2.compute.internal   Ready    control-plane,master        144m   v1.30.4
node/ip-10-0-86-124.us-east-2.compute.internal   Ready    worker                      139m   v1.30.4


    

Expected results:

MCPs should always report the right number of nodes
    

Additional info:

It is very similar to this other issue 
https://bugzilla.redhat.com/show_bug.cgi?id=2090436
That was discussed in this slack conversation
https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1653479831004619
    

This is a clone of issue OCPBUGS-38941. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38925. The following is the description of the original issue:

Description of problem:

    Periodic jobs are failing due to a change in CoreOS.

Version-Release number of selected component (if applicable):

    4.15,4.16,4.17,4.18

How reproducible:

    100%

Steps to Reproduce:

    1. Check any periodic conformance jobs
    2.
    3.
    

Actual results:

    Periodic conformance jobs fail during hostedcluster creation

Expected results:

    The periodic conformance test succeeds

Additional info:

    

This is a clone of issue OCPBUGS-34354. The following is the description of the original issue:

Description of problem:

Update the PowerVS CAPI provider to v0.8.0
    

Description of problem:

The bootstrap process fails. When attempting to gather logs, log gathering also fails: the SSH connection is refused.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Always, when the bootstrap process fails

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-44047. The following is the description of the original issue:

This is a clone of issue OCPBUGS-42660. The following is the description of the original issue:

There were remaining issues from the original issue. A new bug has been opened to address this. This is a clone of issue OCPBUGS-32947. The following is the description of the original issue:

Description of problem:

    [vSphere] network.devices, template and workspace will be cleared when deleting the controlplanemachineset; updating these fields will not trigger an update

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-04-23-032717

How reproducible:

    Always

Steps to Reproduce:

    1.Install a vSphere 4.16 cluster, we use automated template: ipi-on-vsphere/versioned-installer
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-04-23-032717   True        False         24m     Cluster version is 4.16.0-0.nightly-2024-04-23-032717     

    2.Check the controlplanemachineset, you can see network.devices, template and workspace have value.
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset     
NAME      DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   STATE    AGE
cluster   3         3         3       3                       Active   51m
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset cluster -oyaml
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  creationTimestamp: "2024-04-25T02:52:11Z"
  finalizers:
  - controlplanemachineset.machine.openshift.io
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
  name: cluster
  namespace: openshift-machine-api
  resourceVersion: "18273"
  uid: f340d9b4-cf57-4122-b4d4-0f45f20e4d79
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  state: Active
  strategy:
    type: RollingUpdate
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        platform: VSphere
        vsphere:
        - name: generated-failure-domain
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        lifecycleHooks: {}
        metadata: {}
        providerSpec:
          value:
            apiVersion: machine.openshift.io/v1beta1
            credentialsSecret:
              name: vsphere-cloud-credentials
            diskGiB: 120
            kind: VSphereMachineProviderSpec
            memoryMiB: 16384
            metadata:
              creationTimestamp: null
            network:
              devices:
              - networkName: devqe-segment-221
            numCPUs: 4
            numCoresPerSocket: 4
            snapshot: ""
            template: huliu-vs425c-f5tfl-rhcos-generated-region-generated-zone
            userDataSecret:
              name: master-user-data
            workspace:
              datacenter: DEVQEdatacenter
              datastore: /DEVQEdatacenter/datastore/vsanDatastore
              folder: /DEVQEdatacenter/vm/huliu-vs425c-f5tfl
              resourcePool: /DEVQEdatacenter/host/DEVQEcluster/Resources
              server: vcenter.devqe.ibmc.devcluster.openshift.com
status:
  conditions:
  - lastTransitionTime: "2024-04-25T02:59:37Z"
    message: ""
    observedGeneration: 1
    reason: AsExpected
    status: "False"
    type: Error
  - lastTransitionTime: "2024-04-25T03:03:45Z"
    message: ""
    observedGeneration: 1
    reason: AllReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-04-25T03:03:45Z"
    message: ""
    observedGeneration: 1
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-04-25T03:01:04Z"
    message: ""
    observedGeneration: 1
    reason: AllReplicasUpdated
    status: "False"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3     

    3.Delete the controlplanemachineset, it will recreate a new one, but those three fields that had values ​​before are now cleared.

liuhuali@Lius-MacBook-Pro huali-test % oc delete controlplanemachineset cluster
controlplanemachineset.machine.openshift.io "cluster" deleted
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset
NAME      DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   STATE      AGE
cluster   3         3         3       3                       Inactive   6s
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset cluster -oyaml
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  creationTimestamp: "2024-04-25T03:45:51Z"
  finalizers:
  - controlplanemachineset.machine.openshift.io
  generation: 1
  name: cluster
  namespace: openshift-machine-api
  resourceVersion: "46172"
  uid: 45d966c9-ec95-42e1-b8b0-c4945ea58566
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  state: Inactive
  strategy:
    type: RollingUpdate
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        platform: VSphere
        vsphere:
        - name: generated-failure-domain
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        lifecycleHooks: {}
        metadata: {}
        providerSpec:
          value:
            apiVersion: machine.openshift.io/v1beta1
            credentialsSecret:
              name: vsphere-cloud-credentials
            diskGiB: 120
            kind: VSphereMachineProviderSpec
            memoryMiB: 16384
            metadata:
              creationTimestamp: null
            network:
              devices: null
            numCPUs: 4
            numCoresPerSocket: 4
            snapshot: ""
            template: ""
            userDataSecret:
              name: master-user-data
            workspace: {}
status:
  conditions:
  - lastTransitionTime: "2024-04-25T03:45:51Z"
    message: ""
    observedGeneration: 1
    reason: AsExpected
    status: "False"
    type: Error
  - lastTransitionTime: "2024-04-25T03:45:51Z"
    message: ""
    observedGeneration: 1
    reason: AllReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-04-25T03:45:51Z"
    message: ""
    observedGeneration: 1
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-04-25T03:45:51Z"
    message: ""
    observedGeneration: 1
    reason: AllReplicasUpdated
    status: "False"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3     

    4. I activate the controlplanemachineset and it does not trigger an update. I add these field values back and it does not trigger an update. I edit these fields to add a second network device and it still does not trigger an update.


            network:
              devices:
              - networkName: devqe-segment-221
              - networkName: devqe-segment-222


By the way, I can create worker machines with a different network device or with two network devices.
huliu-vs425c-f5tfl-worker-0a-ldbkh    Running                          81m
huliu-vs425c-f5tfl-worker-0aa-r8q4d   Running                          70m

Actual results:

    network.devices, template and workspace will be cleared when deleting the controlplanemachineset; updating these fields will not trigger an update

Expected results:

    The field values should not be changed when deleting the controlplanemachineset.
    Updating these fields should trigger an update; or, if these fields are not meant to be modified, then modifying them in the controlplanemachineset should have no effect, as the current inconsistency is confusing.

Additional info:

    Must gather:  https://drive.google.com/file/d/1mHR31m8gaNohVMSFqYovkkY__t8-E30s/view?usp=sharing 

 Sometimes user manifests could be the source of problems, and right now they're not included in the logs archive downloaded for an Assisted cluster from the UI.

Currently we can only see the feature usage "Custom manifest" in metadata.json, but that only tells us the user has custom manifests, not which manifests they are or what their content is.

The CNO-managed component (network-node-identity) needs to conform to the HyperShift control plane expectation that all secrets are mounted without global read permission: change the mode from 420 (0644) to 416 (0640).
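A minimal sketch of what the change amounts to in the pod spec (volume and secret names are illustrative, not taken from this issue):

      volumes:
      - name: serving-cert
        secret:
          secretName: network-node-identity-cert
          defaultMode: 416   # 0640 in octal; previously 420, i.e. 0644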

revert: https://github.com/openshift/origin/pull/28603

output looks like (the 403 response body is a squid error page, abbreviated here for readability):

INFO[2024-02-16T00:25:04Z] time="2024-02-16T00:20:41Z" level=error msg="disruption sample failed: error running request: 403 Forbidden: ERROR: The requested URL could not be retrieved. The following error was encountered while trying to retrieve the URL: http://35.212.33.188/health -- Access Denied. Access control configuration prevents your request from being allowed at this time. Please contact your service provider if you feel this is incorrect. Your cache administrator is root. Generated Fri, 16 Feb 2024 00:20:41 GMT by ofcir-3329d9226457452fb2040e269776e3a5 (squid/5.2)" auditID=facdfd31-51e5-4812-a356-6e4b0e30cd38 backend=gcp-network-liveness-reused-connections this-instance="{Disruption map[backend-disruption-name:gcp-network-liveness-reused-connections connection:reused disruption:openshift-tests]}" type=reused
time="2024-02-16T00:20:41Z" level=error msg="disruption sample failed: error running request: 403 Forbidden: ERROR: The requested URL could not be retrieved ..." (second occurrence of the same squid error page, truncated)

Description of problem:
gatewayConfig.ipForwarding allows any value, including invalid ones, but it should enforce "", "Restricted" or "Global"
 
You can currently even do really funky stuff with that:

oc edit network.operator/cluster
(...)
 15 spec:                                                                                                                   
 16   clusterNetwork:                                                                                                       
 17   - cidr: 10.128.0.0/14                                                                                                 
 18     hostPrefix: 23                                                                                                      
 19   - cidr: fd01::/48                                                                                                     
 20     hostPrefix: 64                                                                                                      
 21   defaultNetwork:                                                                                                       
 22     ovnKubernetesConfig:                                                                                                
 23       egressIPConfig: {}                                                                                                
 24       gatewayConfig:                                                                                                    
 25         ipForwarding: $(echo 'Im injected'; lscpu)
$ oc get pods -n openshift-ovn-kubernetes ovnkube-node-24628 -o yaml | grep sysctl -C5
      fi

      # If IP Forwarding mode is global set it in the host here.
      ip_forwarding_flag=
      if [ "$(echo 'Im injected'; lscpu)" == "Global" ]; then
        sysctl -w net.ipv4.ip_forward=1
        sysctl -w net.ipv6.conf.all.forwarding=1
      else
        ip_forwarding_flag="--disable-forwarding"
      fi

      NETWORK_NODE_IDENTITY_ENABLE=
$ oc logs -n openshift-ovn-kubernetes ovnkube-node-24628 -c ovnkube-controller | grep inje -A5
++ echo 'Im injected'
++ lscpu
+ '[' 'Im injected
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 57 bits virtual
Byte Order:                         Little Endian
CPU(s):                             112

I wouldn't consider this a security issue, because I have to be the admin to do that, and as the admin I can also simply modify the pod; but it's not very elegant to allow this sort of code injection, even by the admin
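
For contrast, a hedged example of setting one of the valid values through the same API (field path as shown in the manifest above):

oc patch network.operator cluster --type=merge \
  -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"ipForwarding":"Global"}}}}}'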

Description of problem:

[csi-snapshot-controller-operator] does not create suitable role and roleBinding for csi-snapshot-webhook    

Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.14.0-rc.0
Kustomize Version: v5.0.1
Server Version: 4.14.0-0.nightly-2024-03-28-004801
Kubernetes Version: v1.27.11+749fe1d    

How reproducible:

Always    

Steps to Reproduce:

    1. Create an OpenShift cluster on AWS;
    2. Check the csi-snapshot-webhook logs (expected: no errors).

Actual results:

In step 2:
$ oc logs csi-snapshot-webhook-76bf9bd758-cxr7g
I0328 08:02:58.016020       1 certwatcher.go:129] Updated current TLS certificate
W0328 08:02:58.029464       1 reflector.go:424] github.com/kubernetes-csi/external-snapshotter/client/v6/informers/externalversions/factory.go:117: failed to list *v1.VolumeSnapshotClass: volumesnapshotclasses.snapshot.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-storage-operator:default" cannot list resource "volumesnapshotclasses" in API group "snapshot.storage.k8s.io" at the cluster scope
E0328 08:02:58.029512       1 reflector.go:140] github.com/kubernetes-csi/external-snapshotter/client/v6/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotClass: failed to list *v1.VolumeSnapshotClass: volumesnapshotclasses.snapshot.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-storage-operator:default" cannot list resource "volumesnapshotclasses" in API group "snapshot.storage.k8s.io" at the cluster scope
W0328 08:02:58.888397       1 reflector.go:424] github.com/kubernetes-csi/external-snapshotter/client/v6/informers/externalversions/factory.go:117: failed to list *v1.VolumeSnapshotClass: volumesnapshotclasses.snapshot.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-storage-operator:default" cannot list resource "volumesnapshotclasses" in API group "snapshot.storage.k8s.io" at the cluster scope

Expected results:

In step2 the csi-snapshot-webhook logs should have no cannot list resource errors

Additional info:

The issue exists on 4.15 and 4.16 as well; in addition, since 4.15 the webhook needs additional "VolumeGroupSnapshotClass" list permissions

$ oc logs csi-snapshot-webhook-794b7b54d7-b8vl9
...
E0328 12:12:06.509158       1 reflector.go:147] github.com/kubernetes-csi/external-snapshotter/client/v6/informers/externalversions/factory.go:133: Failed to watch *v1alpha1.VolumeGroupSnapshotClass: failed to list *v1alpha1.VolumeGroupSnapshotClass: volumegroupsnapshotclasses.groupsnapshot.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-storage-operator:default" cannot list resource "volumegroupsnapshotclasses" in API group "groupsnapshot.storage.k8s.io" at the cluster scope
W0328 12:12:50.836582       1 reflector.go:535] github.com/kubernetes-csi/external-snapshotter/client/v6/informers/externalversions/factory.go:133: failed to list *v1alpha1.VolumeGroupSnapshotClass: volumegroupsnapshotclasses.groupsnapshot.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-cluster-storage-operator:default" cannot list resource "volumegroupsnapshotclasses" in API group "groupsnapshot.storage.k8s.io" at the cluster scope
...
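
A minimal sketch of the kind of RBAC the webhook's service account is missing (names are illustrative; the real fix belongs in the csi-snapshot-controller-operator manifests, not a manual apply):

oc apply -f - << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: csi-snapshot-webhook-snapshotclass-reader
rules:
- apiGroups: ["snapshot.storage.k8s.io"]
  resources: ["volumesnapshotclasses"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["groupsnapshot.storage.k8s.io"]
  resources: ["volumegroupsnapshotclasses"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: csi-snapshot-webhook-snapshotclass-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: csi-snapshot-webhook-snapshotclass-reader
subjects:
- kind: ServiceAccount
  name: default
  namespace: openshift-cluster-storage-operator
EOF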

Component Readiness has found a potential regression in [sig-arch] events should not repeat pathologically for ns/openshift-monitoring.

Probability of significant regression: 100.00%

Sample (being evaluated) Release: 4.15
Start Time: 2024-01-04T00:00:00Z
End Time: 2024-01-10T23:59:59Z
Success Rate: 42.31%
Successes: 11
Failures: 15
Flakes: 0

Base (historical) Release: 4.14
Start Time: 2023-10-04T00:00:00Z
End Time: 2023-10-31T23:59:59Z
Success Rate: 100.00%
Successes: 151
Failures: 0
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=Other&component=Monitoring&confidence=95&environment=sdn%20no-upgrade%20amd64%20gcp%20serial&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=sdn&network=sdn&pity=5&platform=gcp&platform=gcp&sampleEndTime=2024-01-10%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2024-01-04%2000%3A00%3A00&testId=openshift-tests%3A567152bb097fa9ce13dd2fb6885e094a&testName=%5Bsig-arch%5D%20events%20should%20not%20repeat%20pathologically%20for%20ns%2Fopenshift-monitoring&upgrade=no-upgrade&upgrade=no-upgrade&variant=serial&variant=serial

This is a clone of issue OCPBUGS-34870. The following is the description of the original issue:

Description of problem:

In OCPBUGS-30951, we modified a check used in the Cinder CSI Driver Operator to relax the requirements for enabling topology support. Unfortunately in doing this we introduced a bug: we now attempt to access the volume AZ for each compute AZ, which isn't valid if there are more compute AZs than volume AZs. This needs to be addressed.

Version-Release number of selected component (if applicable):

This affects 4.14 through to master (unreleased 4.17).

How reproducible:

Always.

Steps to Reproduce:

1. Deploy OCP-on-OSP on a cluster with fewer storage AZs than compute AZs

Actual results:

Operator fails due to out-of-range error.

Expected results:

Operator should not fail.

Additional info:

None.

This is a clone of issue OCPBUGS-37718. The following is the description of the original issue:

Description of problem:

 The cluster-api-provider-openstack branch used for e2e testing in cluster-capi-operator is not pinned. As a result, the Go versions used in the two projects go out of sync, causing the test to fail to start.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/cloud-credential-operator/pull/689

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    RHTAP builds are failing because, with the addition of the request serving node scheduler [1], we use the max() [2] builtin, which requires Go 1.21; however, the Containerfile [3] used for building the HO binary in RHTAP is using Go 1.20

[1] https://issues.redhat.com/browse/HOSTEDCP-1478 
[2] https://github.com/openshift/hypershift/pull/3776/files#diff-a7f22add63b0067c0a7c9813255519d1432821f431f6eea0c3373d0646d1a855R489
[3] https://github.com/openshift/hypershift/blob/main/Containerfile.operator#L1

 

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

100%    

Steps to Reproduce:

    1. Run RHTAP on the main branch
    2.
    3.
    

Actual results:

    Fail

Expected results:

    Pass

Additional info:

    

This is a clone of issue OCPBUGS-35069. The following is the description of the original issue:

Description of problem:

Reviewing https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=operator-conditions&component=Cloud%20Compute%20%2F%20Other%20Provider&confidence=95&environment=ovn%20no-upgrade%20amd64%20azure%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=azure&platform=azure&sampleEndTime=2024-06-05%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2024-05-30%2000%3A00%3A00&testId=Operator%20results%3A6d9ee55972f66121016367d07d52f0a9&testName=operator%20conditions%20control-plane-machine-set&upgrade=no-upgrade&upgrade=no-upgrade&variant=standard&variant=standard, it appears that the Azure tests are failing frequently with "Told to stop trying". Check failed before until passed.

Reviewing this, it appears that the rollout happened as expected, but the until function got a non-retryable error and exited, while the check saw that the Deletion timestamp was set and the Machine went into Running, which caused it to fail.

We should investigate why the until failed in this case as it should have seen the same machines and therefore should have seen a Running machine and passed.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/sdn/pull/596

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    Before: Warning  FailedCreatePodSandBox  8s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "82187d55b1379aad1e6c02b3394df7a8a0c84cc90902af413c1e0d9d56ddafb0": plugin type="multus" name="multus-cni-network" failed (add): [default/netshoot-deployment-59898b5dd9-hhvfn/89e6349b-9797-4e03-8828-ebafe224dfaf:whereaboutsexample]: error adding container to network "whereaboutsexample": error at storage engine: Could not allocate IP in range: ip: 2000::1 / - 2000::ffff:ffff:ffff:fffe / range: net.IPNet{IP:net.IP{0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Mask:net.IPMask{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}

After:  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               6s    default-scheduler  Successfully assigned default/netshoot-deployment-59898b5dd9-kk2zm to whereabouts-worker
  Normal   AddedInterface          6s    multus             Add eth0 [10.244.2.2/24] from kindnet
  Warning  FailedCreatePodSandBox  6s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "23dd45e714db09380150b5df74be37801bf3caf73a5262329427a5029ef44db1": plugin type="multus" name="multus-cni-network" failed (add): [default/netshoot-deployment-59898b5dd9-kk2zm/142de5eb-9f8a-4818-8c5c-6c7c85fe575e:whereaboutsexample]: error adding container to network "whereaboutsexample": error at storage engine: Could not allocate IP in range: ip: 2000::1 / - 2000::ffff:ffff:ffff:fffe / range: 2000::/64 / excludeRanges: [2000::/32]

Fixed upstream in #366 https://github.com/k8snetworkplumbingwg/whereabouts/pull/366

Please review the following PR: https://github.com/openshift/gcp-pd-csi-driver-operator/pull/99

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-35315. The following is the description of the original issue:

Description of problem:

If infrastructure or machine provisioning is slow, the installer may wait several minutes before declaring provisioning successful due to the exponential backoff.

For instance, if DNS resolution from load balancers is slow to propagate and we 

Version-Release number of selected component (if applicable):

 

How reproducible:

Sometimes, it depends on provisioning being slow. 

Steps to Reproduce:

1. Provision a cluster in an environment that has slow dns resolution (unclear how to set this up)
2.
3.

Actual results:

The installer will only check for infrastructure or machine readiness at intervals of several minutes after a certain threshold (say 10 minutes).

Expected results:

Installer should just check regularly, e.g. every 15 seconds.

Additional info:

It may not be possible to definitively test this. We may want to just check CI logs for an improvement in provisioning time and check for a lack of regressions.

Description of problem:

In a UPI cluster there are no MachineSet or Machine resources; when the user visits the Machines or MachineSets list page, we see the simple text 'Not found'

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2024-01-16-113018    

How reproducible:

Always    

Steps to Reproduce:

1. setup UPI cluster
2. goes to MachineSets and Machines list page, check the empty state message 

Actual results:

2. We just show the plain 'Not found' text

Expected results:

2. For other resources we show richer text, e.g. 'No <resourcekind> found', so we should also show 'No Machines found' and 'No MachineSets found' on these pages

Additional info:

    

This is a clone of issue OCPBUGS-33561. The following is the description of the original issue:

Description of problem:

    The valid values for installconfig.platform.vsphere.diskType are thin, thick, and eagerZeroedThick. But no matter whether diskType is set to thick or eagerZeroedThick, the actual disk type observed is thin.

govc vm.info --json /DEVQEdatacenter/vm/wwei-511d-gtbqd/wwei-511d-gtbqd-master-1 | jq -r .VirtualMachines[].Layout.Disk[].DiskFile[]
[vsanDatastore] e7323f66-86ef-9947-a2b9-507c6f3b795c/wwei-511d-gtbqd-master-1.vmdk

[fedora@preserve-wwei ~]$ govc datastore.disk.info -ds /DEVQEdatacenter/datastore/vsanDatastore e7323f66-86ef-9947-a2b9-507c6f3b795c/wwei-511d-gtbqd-master-1.vmdk | grep Type
  Type:    thin

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-05-07-025557

How reproducible:

    Always, when setting installconfig.platform.vsphere.diskType to thick or eagerZeroedThick and continuing the installation.

Steps to Reproduce:

    1.Setting installconfig.platform.vsphere.diskType to thick or eagerZeroedThick     
    2.continue installation          

Actual results:

    The actual disk type is thin when the install-config sets diskType: thick/eagerZeroedThick

Expected results:

    The check result for disk info should match the setting in install-config

Additional info:

    

Description of problem:

When a customer certificate and an SRE certificate are configured and approved, revoking the customer certificate causes access to the cluster using a kubeconfig with the SRE certificate to be denied

Version-Release number of selected component (if applicable):

    

How reproducible:

    always

Steps to Reproduce:

    1. Create a cluster
    2. Configure a customer cert and an SRE cert; they are approved
    3. Revoke the customer cert; access to the cluster using a kubeconfig with the SRE cert gets denied
    

Actual results:

   After revoking the customer cert, access to the cluster using a kubeconfig with the SRE cert is denied

Expected results:

   After revoking the customer cert, access to the cluster using a kubeconfig with the SRE cert still succeeds

Additional info:

    

As an OCP user, I want storage operators to restart quickly and the newly started operator to start leading immediately, without a ~3 minute wait.

This means that the old operator should release its leadership after it receives SIGTERM and before it exits. Right now, storage operators fail to release the leadership in ~50% of cases.

Steps to reproduce:

  1. Delete an operator Pod (`oc delete pod xyz`).
  2. Wait for a replacement Pod to be created.
  3. Check logs of the replacement Pod. It should contain "successfully acquired lease XYZ" relatively quickly after the Pod start (+/- 1 second?)
  4. Go to 1. and retry a few times (a scripted version of these steps is sketched below).
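
A hedged, scripted version of the steps above (namespace and label selector are assumptions for one storage operator; adjust for the operator under test):

NS=openshift-cluster-csi-drivers            # assumption: namespace of the operator under test
SELECTOR=app=aws-ebs-csi-driver-operator    # assumption: label selector of the operator pods
POD=$(oc -n "$NS" get pods -l "$SELECTOR" -o jsonpath='{.items[0].metadata.name}')
oc -n "$NS" delete pod "$POD"
sleep 10    # give the replacement pod time to be created
oc -n "$NS" wait pod -l "$SELECTOR" --for=condition=Ready --timeout=180s
NEWPOD=$(oc -n "$NS" get pods -l "$SELECTOR" -o jsonpath='{.items[0].metadata.name}')
# If the old operator released its lease on SIGTERM, this message should appear
# within roughly a second of the new pod starting:
oc -n "$NS" logs "$NEWPOD" | grep "successfully acquired lease"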

 

This is hack'n'hustle "work", not tied to any Epic; I'm using it just to get proper QE and to track which operators are being updated (see linked GitHub PRs).

Currently, the plugin template gives you instructions for running the console using a container image, which is a lightweight way to do development and avoids the need to build the console source code from scratch. The image we reference uses a production version of React, however. This means that you aren't able to use the React browser plugin to debug your application.

We should look at alternatives that allow you to use React Developer Tools. Perhaps we can publish a different image that uses a development build. Or at least we need to better document building console locally instead of using an image to allow development builds.

Please review the following PR: https://github.com/openshift/cluster-api-provider-azure/pull/293

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/image-registry/pull/390

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    https://github.com/openshift/installer/pull/7778 introduced a bug where an error is always returned while retrieving a marketplace image.

Version-Release number of selected component (if applicable):

    

How reproducible:

    always

Steps to Reproduce:

    1. Configure marketplace image in the install-config
    2. openshift-install create manifests
    3.
    

Actual results:

    $ ./openshift-install create manifests --dir ipi1 --log-level debug
DEBUG OpenShift Installer 4.16.0-0.test-2023-12-12-020559-ci-ln-xkqmlqk-latest 
DEBUG Built from commit 456ae720a83e39dffd9918c5a71388ad873b6a38 
DEBUG Fetching Master Machines...                  
DEBUG Loading Master Machines...                   
DEBUG   Loading Cluster ID...                      
DEBUG     Loading Install Config...                
DEBUG       Loading SSH Key...                     
DEBUG       Loading Base Domain...                 
DEBUG         Loading Platform...                  
DEBUG       Loading Cluster Name...                
DEBUG         Loading Base Domain...               
DEBUG         Loading Platform...                  
DEBUG       Loading Pull Secret...                 
DEBUG       Loading Platform...                    
INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json" 
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: [controlPlane.platform.azure.osImage: Invalid value: azure.OSImage{Plan:"", Publisher:"redhat", Offer:"rh-ocp-worker", SKU:"rh-ocp-worker", Version:"413.92.2023101700"}: could not get marketplace image: %!w(<nil>), compute[0].platform.azure.osImage: Invalid value: azure.OSImage{Plan:"", Publisher:"redhat", Offer:"rh-ocp-worker", SKU:"rh-ocp-worker", Version:"413.92.2023101700"}: could not get marketplace image: %!w(<nil>)]  

Expected results:

    Success

Additional info:

    When errors.Wrap(err, ...) was replaced by fmt.Errorf(...), there is a slight difference in behavior: errors.Wrap returns nil if err is nil, but fmt.Errorf always returns an error.

Description of problem:

    Since OCP 4.15 we see an issue where an OLM-deployed operator is unable to operate in its watched namespaces (when multiple are configured). It works fine with a single watched namespace (subscription). Also, the same test passes if we deploy the operator from files instead of using OLM.
Based on the operator log, it seems to be a permission issue. The same test works fine on OCP 4.14 and older.

Version-Release number of selected component (if applicable):

Server Version: 4.15.0-ec.3
Kubernetes Version: v1.28.3+20a5764

How reproducible:

Always    

Steps to Reproduce:

    0. oc login OCP4.15
    1. git clone https://gitlab.cee.redhat.com/amq-broker/claire
    2. make -f Makefile.downstream build ARTEMIS_VERSION=7.11.4 RELEASE_TYPE=released
    3. make -f Makefile.downstream operator_test OLM_IIB=registry-proxy.engineering.redhat.com/rh-osbs/iib:636350 OLM_CHANNEL=7.11.x  TESTS=ClusteredOperatorSmokeTests TEST_LOG_LEVEL=debug DISABLE_RANDOM_NAMESPACES=true

Actual results:

    Can't deploy artemis broker custom resource in given namespace (permission issue - see details below) 

Expected results:

    Successfully deployed broker on watched namespaces

Additional info:

Log from AMQ Broker operator - seems like some permission issues since 4.15

    E0103 10:04:54.425202       1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1beta1.ActiveMQArtemis: failed to list *v1beta1.ActiveMQArtemis: activemqartemises.broker.amq.io is forbidden: User "system:serviceaccount:cluster-tests:amq-broker-controller-manager" cannot list resource "activemqartemises" in API group "broker.amq.io" in the namespace "cluster-testsa"
E0103 10:04:54.425207       1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1beta1.ActiveMQArtemisSecurity: failed to list *v1beta1.ActiveMQArtemisSecurity: activemqartemissecurities.broker.amq.io is forbidden: User "system:serviceaccount:cluster-tests:amq-broker-controller-manager" cannot list resource "activemqartemissecurities" in API group "broker.amq.io" in the namespace "cluster-testsa"
E0103 10:04:54.425221       1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:cluster-tests:amq-broker-controller-manager" cannot list resource "pods" in API group "" in the namespace "cluster-testsa"
W0103 10:04:54.425296       1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1beta1.ActiveMQArtemisScaledown: activemqartemisscaledowns.broker.amq.io is forbidden: User "system:serviceaccount:cluster-tests:amq-broker-controller-manager" cannot list resource "activemqartemisscaledowns" in API group "broker.amq.io" in the namespace "cluster-testsa"

Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver-operator/pull/109

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

With `oc` version 4.15 on OCP 4.15, the following command fails:

$ ~/openshift-client-linux-4.15.6/oc version
Client Version: 4.15.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Kubernetes Version: v1.28.7+f1b5f6c

$ ~/openshift-client-linux-4.15.6/oc create job manual-skrenger-from-oc-415 --from=cronjob/pi
error: failed to create job: jobs.batch "manual-skrenger-from-oc-415" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>

With older versions of `oc`, this command executes as expected:

$ ~/openshift-client-linux-4.14.19/oc version
Client Version: 4.14.19
Kustomize Version: v5.0.1
Kubernetes Version: v1.28.7+f1b5f6c
$ ~/openshift-client-linux-4.14.19/oc create job manual-skrenger-with-oc-414 --from=cronjob/pi
job.batch/manual-skrenger-with-oc-414 created

Version-Release number of selected component (if applicable):

$ ~/openshift-client-linux-4.15.6/oc version
Client Version: 4.15.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Kubernetes Version: v1.28.7+f1b5f6c

How reproducible:

Always

Steps to Reproduce:

1. Set up a cluster using OCP 4.15 and set up IDP
2. Ensure a 4.15 version of `oc` client is used by executing "oc version"
3. Log in with a regular user, NOT cluster-admin (this is important)
4. Create a new project using "oc new-project example"
5. Create a Cronjob using the instructions in the documentation: https://docs.openshift.com/container-platform/4.15/nodes/jobs/nodes-nodes-jobs.html#nodes-nodes-jobs-creating-cron_nodes-nodes-jobs
6. Execute the following command to manually create a job from this cronjob: "oc create job manual-example --from=cronjob/pi"

Actual results:

Creating the job fails with:

$ ~/openshift-client-linux-4.15.6/oc create job manual-skrenger-from-oc-415 --from=cronjob/pi
error: failed to create job: jobs.batch "manual-skrenger-from-oc-415" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>

This is likely due to the missing permission on "cronjobs/finalizers". We would expect the "admin" role to have these permissions (see comments below).

Expected results:

Job is created as expected

Additional info:

  • `oc` version 4.14 and OCP 4.14 did not yet show this behaviour; it seems only the new client tries to set these fields. OCP 4.14 and OCP 4.13 are also missing the necessary permissions in the "admin" role.
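As a hedged sketch (not the actual oc source), the newer client appears to attach an owner reference like the one below to the Job it creates; because blockOwnerDeletion is set, the OwnerReferencesPermissionEnforcement admission plugin requires update permission on the owner's finalizers subresource, i.e. cronjobs/finalizers, which the admin role lacked:

package example

import (
	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

// jobFromCronJob sketches how a Job created with "--from=cronjob/pi" can carry
// an owner reference back to the CronJob. BlockOwnerDeletion=true is what
// triggers the admission check against the cronjobs/finalizers subresource.
func jobFromCronJob(cj *batchv1.CronJob, name string) *batchv1.Job {
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: cj.Namespace,
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion:         "batch/v1",
				Kind:               "CronJob",
				Name:               cj.Name,
				UID:                cj.UID,
				BlockOwnerDeletion: ptr.To(true),
			}},
		},
		Spec: cj.Spec.JobTemplate.Spec,
	}
}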

Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/631

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-43564. The following is the description of the original issue:

This is a clone of issue OCPBUGS-43508. The following is the description of the original issue:

Description of problem:

    These two tests have been flaking more often lately. The TestLeaderElection flake is partially (but not solely) connected to OCPBUGS-41903.

   TestOperandProxyConfiguration seems to fail in the teardown while waiting for other cluster operators to become available.

   Although these flakes aren't customer facing, they considerably slow development cycles (due to retests) and also consume more resources than they should (every retest runs on a new cluster), so we want to backport the fixes.

Version-Release number of selected component (if applicable):

    4.18, 4.17, 4.16, 4.15, 4.14

How reproducible:

    Sometimes

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

hypershift#1614 gave us the router Deployment (descended from the private-router Deployment), but it lacks PDB coverage. For example:

$ git --no-pager log -1 --oneline origin/main
f3f421bc7 (origin/release-4.16, origin/release-4.15, origin/main, origin/HEAD) Merge pull request #3183 from muraee/azure-kms
$ git --no-pager grep 'func [^(]*\(Deployment\|PodDisruptionBudget\)' f3f421bc7 -- control-plane-operator/controllers/hostedcontrolplane/{ingress,kas}
f3f421bc7:control-plane-operator/controllers/hostedcontrolplane/ingress/router.go:func ReconcileRouterDeployment(deployment *appsv1.Deployment, ownerRef config.OwnerRef, deploymentConfig config.DeploymentConfig, image string, config *corev1.ConfigMap) error {
f3f421bc7:control-plane-operator/controllers/hostedcontrolplane/kas/deployment.go:func ReconcileKubeAPIServerDeployment(deployment *appsv1.Deployment,
f3f421bc7:control-plane-operator/controllers/hostedcontrolplane/kas/pdb.go:func ReconcilePodDisruptionBudget(pdb *policyv1.PodDisruptionBudget, p *KubeAPIServerParams) error {

Both the ingress and kas packages have Reconcile*Deployment methods. Only kas has a ReconcilePodDisruptionBudget method.

This bug is asking for the router to get a covering PDB too, because being able to evict all router-* pods at once (for the cluster flavors that have replicas > 1 on that Deployment) can make the incoming traffic unreachable. And some of that Route traffic looks like stuff that folks would want to be reliably reachable:

$ git --no-pager grep 'func Reconcile[^(]*Route(' f3f421bc7 -- control-plane-operator/controllers/hostedcontrolplane/{ingress,kas}
f3f421bc7:control-plane-operator/controllers/hostedcontrolplane/kas/service.go:func ReconcileExternalPublicRoute(route *routev1.Route, owner *metav1.OwnerReference, hostname string) error {
f3f421bc7:control-plane-operator/controllers/hostedcontrolplane/kas/service.go:func ReconcileExternalPrivateRoute(route *routev1.Route, owner *metav1.OwnerReference, hostname string) error {
f3f421bc7:control-plane-operator/controllers/hostedcontrolplane/kas/service.go:func ReconcileInternalRoute(route *routev1.Route, owner *metav1.OwnerReference) error {
f3f421bc7:control-plane-operator/controllers/hostedcontrolplane/kas/service.go:func ReconcileKonnectivityExternalRoute(route *routev1.Route, ownerRef config.OwnerRef, hostname string, defaultIngressDomain string) error {
f3f421bc7:control-plane-operator/controllers/hostedcontrolplane/kas/service.go:func ReconcileKonnectivityInternalRoute(route *routev1.Route, ownerRef config.OwnerRef) error {
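For illustration, a minimal sketch of what a covering PDB for the router could look like, mirroring the kas ReconcilePodDisruptionBudget pattern; the function name and selector are assumptions based on the Deployment above, not the actual HyperShift implementation:

package ingress

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// ReconcileRouterPodDisruptionBudget keeps at least one router pod available,
// so evicting both replicas back to back (as in the test plan below) would have
// the second eviction rejected instead of briefly taking all ingress down.
func ReconcileRouterPodDisruptionBudget(pdb *policyv1.PodDisruptionBudget) error {
	maxUnavailable := intstr.FromInt(1)
	pdb.Spec = policyv1.PodDisruptionBudgetSpec{
		Selector: &metav1.LabelSelector{
			MatchLabels: map[string]string{"app": "private-router"},
		},
		MaxUnavailable: &maxUnavailable,
	}
	return nil
}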

Test plan:

1. Install a hosted cluster.
2. Log into the management cluster, and find the namespace of the hosted cluster $NAMESPACE.
3. Evict both router pods (using a raw create, because there isn't more convenient syntax yet):

oc -n "${NAMESPACE}" get -l app=private-router -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' pods | while read NAME
do
  oc create -f - <<EOF --raw "/api/v1/namespaces/${NAMESPACE}/pods/${NAME}/eviction"
{"apiVersion": "policy/v1", "kind": "Eviction", "metadata": {"name": "${NAME}"}}
EOF
done

If that clears out both router pods one right after the other, ingress will probably hiccup. With the PDB in place, I'd expect the second eviction to fail.

Please review the following PR: https://github.com/openshift/installer/pull/7816

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-operator/pull/78

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This shows "tls: bad certificate" errors from the kube-apiserver operator, for example https://reportportal-openshift.apps.ocp-c1.prod.psi.redhat.com/ui/#prow/launches/all/470214; checking its must-gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-aws-ipi-imdsv2-fips-f14/1726036030588456960/artifacts/aws-ipi-imdsv2-fips-f14/gather-must-gather/artifacts/ 

MacBook-Pro:~ jianzhang$ omg logs prometheus-operator-admission-webhook-6bbdbc47df-jd5mb | grep "TLS handshake"
2023-11-27 10:11:50.687 | WARNING  | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader
2023-11-19T00:57:08.318983249Z ts=2023-11-19T00:57:08.318923708Z caller=stdlib.go:105 caller=server.go:3215 msg="http: TLS handshake error from 10.129.0.35:48334: remote error: tls: bad certificate"
2023-11-19T00:57:10.336569986Z ts=2023-11-19T00:57:10.336505695Z caller=stdlib.go:105 caller=server.go:3215 msg="http: TLS handshake error from 10.129.0.35:48342: remote error: tls: bad certificate"
...
MacBook-Pro:~ jianzhang$ omg get pods -A -o wide | grep "10.129.0.35"
2023-11-27 10:12:16.382 | WARNING  | omg.utils.load_yaml:<module>:10 - yaml.CSafeLoader failed to load, using SafeLoader
openshift-kube-apiserver-operator                 kube-apiserver-operator-f78c754f9-rbhw9                          1/1    Running    2         5h27m  10.129.0.35   ip-10-0-107-238.ec2.internal 

For more information, see the Slack thread: https://redhat-internal.slack.com/archives/CC3CZCQHM/p1700473278471309

Running the command coreos-installer iso kargs show no longer works with the 4.13 Agent ISO. Instead we get this error:

$ coreos-installer iso kargs show agent.x86_64.iso
Writing manifest to image destination
Storing signatures
Error: No karg embed areas found; old or corrupted CoreOS ISO image.

This is almost certainly due to the way we repack the ISO as part of embedding the agent-tui binary in it.

It worked fine in 4.12. I have tested both ISOs with every version of coreos-installer from 0.14 to 0.17.

Description of problem:

Deleting the node with the Ingress VIP using oc delete node causes a keepalived split-brain

Version-Release number of selected component (if applicable):

4.12, 4.14    

How reproducible:

100%

Steps to Reproduce:

1. In an OpenShift cluster installed via vSphere IPI, check the node with the Ingress VIP.
2. Delete the node.
3. Check the discrepancy between machines objects and nodes. There will be more machines than nodes.
4. SSH to the deleted node, and check the VIP is still mounted and keepalived pods are running.
5. Check the VIP is also mounted in another worker.
6. SSH to the node and check the VIP is still present.     

Actual results:

The deleted node still has the VIP present and the ingress fails sometimes 

Expected results:

The deleted node should not have the VIP present and the ingress should not fail.

Additional info:

 

Description of problem:

make verify uses the latest version of setup-envtest, regardless of what go version the repo is currently on

How reproducible:

100%

Steps to Reproduce:

Running `make verify` without a local copy of setup-envtest should cause the issue

Actual results:

go: sigs.k8s.io/controller-runtime/tools/setup-envtest@latest: sigs.k8s.io/controller-runtime/tools/setup-envtest@v0.0.0-20240323114127-e08b286e313e requires go >= 1.22.0 (running go 1.21.7; GOTOOLCHAIN=local)
Go compliance shim [5685] [rhel-8-golang-1.21][openshift-golang-builder]: Exited with: 1    

Expected results:

make verify should be able to run without build errors

Additional info:

    

We merged this ART PR which bumps base images. And then bumper [reverted the changes here|https://github.com/openshift/operator-framework-operator-controller/pull/88/files].

I still see the ART bump commit in main, but there is an "Add OpenShift specific files" commit on top of it with older images. Now we actually have two "Add OpenShift specific files" commits in main:

And every UPSTREAM: <carry>-prefixed commit seems to be duplicated on top of synced changes.

Expected result:

  • The bumper should not override/revert UPSTREAM: <carry>-prefixed commits contributed directly to the downstream repos. The order of UPSTREAM: <carry>-prefixed commits should be respected.

This is a clone of issue OCPBUGS-34389. The following is the description of the original issue:

Description of problem:

    With publish: internal, the bootstrap SSH rules are still open to the public internet (0.0.0.0/0) instead of being restricted to the machine CIDR.

Version-Release number of selected component (if applicable):

    

How reproducible:

    all private clusters

Steps to Reproduce:

    1. set publish: internal in installconfig
    2. inspect ssh rule
    3.
    

Actual results:

    ssh is open to public internet

Expected results:

    should be restricted to machine network

Additional info:
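    A minimal sketch of the intended behavior; the function and parameter names are illustrative, not the installer's actual API:

package example

// sshIngressCIDRs returns the source ranges for the bootstrap SSH rule. With an
// internal publish strategy, the rule should be limited to the machine network
// rather than 0.0.0.0/0.
func sshIngressCIDRs(publishInternal bool, machineCIDRs []string) []string {
	if publishInternal {
		return machineCIDRs
	}
	return []string{"0.0.0.0/0"}
}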

    

Description of problem:

Pulling an image from GCP Artifact Registry fails

Version-Release number of selected component (if applicable):

    4.17

How reproducible:

    100%

Steps to Reproduce:

1. Create repo for gcp artifact registry: zhsun-repo1
 
2. Login to registry
gcloud auth login
gcloud auth configure-docker us-central1-docker.pkg.dev 
    
3. Push image to registry
$ docker pull openshift/hello-openshift
$ docker tag openshift/hello-openshift:latest us-central1-docker.pkg.dev/openshift-qe/zhsun-repo1/hello-gcr:latest
$ docker push us-central1-docker.pkg.dev/openshift-qe/zhsun-repo1/hello-gcr:latest
4. Create pod
$ oc new-project hello-gcr
$ oc new-app --name hello-gcr --allow-missing-images \  
  --image us-central1-docker.pkg.dev/openshift-qe/zhsun-repo1/hello-gcr:latest
5. Check pod status

Actual results:

Pull image failed.
must-gather: https://drive.google.com/file/d/1o9cyJB53vQtHNmL5EV_hIx9I_LzMTB0K/view?usp=sharing
kubelet log: https://drive.google.com/file/d/1tL7HGc4fEOjH5_v6howBpx2NuhjGKsTp/view?usp=sharing
$ oc get po               
NAME                          READY   STATUS             RESTARTS   AGE
hello-gcr-658f7f9869-76ssg    0/1     ImagePullBackOff   0          3h24m

$ oc describe po hello-gcr-658f7f9869-76ssg 
  Warning  Failed          14s (x2 over 15s)  kubelet  Error: ImagePullBackOff
  Normal   Pulling         2s (x2 over 16s)   kubelet  Pulling image "us-central1-docker.pkg.dev/openshift-qe/zhsun-repo1/hello-gcr:latest"
  Warning  Failed          1s (x2 over 16s)   kubelet  Failed to pull image "us-central1-docker.pkg.dev/openshift-qe/zhsun-repo1/hello-gcr:latest": rpc error: code = Unknown desc = Requesting bearer token: invalid status code from registry 403 (Forbidden)

Expected results:

Pulling an image from Artifact Registry succeeds

Additional info:

gcr.io works as expected. 
us-central1-docker.pkg.dev/openshift-qe/zhsun-repo1/hello-gcr:latest doesn't work.
$ oc get po -n hello-gcr            
NAME                          READY   STATUS             RESTARTS   AGE
hello-gcr-658f7f9869-76ssg    0/1     ImagePullBackOff   0          156m
hello-gcr2-6d98c475ff-vjkt5   1/1     Running            0          163m
$ oc get po -n hello-gcr -o yaml | grep image                                                                                                                                       
    - image: us-central1-docker.pkg.dev/openshift-qe/zhsun-repo1/hello-gcr:latest
    - image: gcr.io/openshift-qe/hello-gcr:latest

https://amd64.ocp.releases.ci.openshift.org/releasestream/4.16.0-0.ci/release/4.16.0-0.ci-2024-03-28-192827

 

 [sig-arch][Late] operators should not create watch channels very often [apigroup:apiserver.openshift.io] [Suite:openshift/conformance/parallel] 

{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:360]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-node-tuning-operator\" produces more watch requests than expected: watchrequestcount=209, upperbound=184, ratio=1.14",
    ]
to be empty
Ginkgo exit error 1: exit with code 1} 

This is a clone of issue OCPBUGS-35798. The following is the description of the original issue:

Description of problem:

In PowerVS, when I try and deploy a 4.17 cluster, I see the following ProbeError event:
Liveness probe error: Get "https://192.168.169.11:10258/healthz": dial tcp 192.168.169.11:10258: connect: connection refused

Version-Release number of selected component (if applicable):

release-ppc64le:4.17.0-0.nightly-ppc64le-2024-06-14-211304

How reproducible:

Always

Steps to Reproduce:

1. Create a cluster

Description of the problem:

A non-Nutanix node was successfully added to a Nutanix day-1 cluster

How reproducible:

100%

Steps to reproduce:

1. Deploy Nutanix day1 cluster

2. Try to add non-Nutanix day-2 node to Nutanix cluster

Actual results:

Day-2 node installation started and host installed

Expected results:

Day-2 node doesn't pass pre-installation checks

Description of problem:

There are built-in cluster roles to provide access to the default OpenShift SCCs. The "hostmount-anyuid" SCC does not have a functioning built-in cluster role, as it appears to have a typo in the name.
    

Version-Release number of selected component (if applicable):


    

How reproducible:

Consistent
    

Steps to Reproduce:

    1. Attempt to use "system:openshift:scc:hostmount" cluster role
    2. 
    3.
    

Actual results:

No access is provided because the SCC name in the built-in cluster role is misspelled
    

Expected results:

Access provided to use the SCC
    

Additional info:


    

This is a clone of issue OCPBUGS-35908. The following is the description of the original issue:

  ConsoleYAMLSample CRD
      redirect to home
      ensure perspective switcher is set to Administrator
    1) creates, displays, tests and deletes a new ConsoleYAMLSample instance


  0 passing (2m)
  1 failing

  1) ConsoleYAMLSample CRD
       creates, displays, tests and deletes a new ConsoleYAMLSample instance:
     AssertionError: Timed out retrying after 30000ms: Expected to find element: `[data-test-action="View instances"]:not([disabled])`, but never found it.
      at Context.eval (webpack:///./support/selectors.ts:47:5)

 

console flakes

console-operator

When upgrading an HC from 4.13 to 4.14, after admin-acking the API deprecation check, the upgrade is still blocked by the ClusterVersionUpgradeable condition on the HC being Unknown. This is because the CVO in the guest cluster does not have an Upgradeable condition anymore.
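A hedged sketch of the lookup involved, using the openshift/api config types; the function is illustrative, not the actual HyperShift code:

package example

import (
	configv1 "github.com/openshift/api/config/v1"
)

// findUpgradeable returns the guest CVO's Upgradeable condition, or nil when the
// CVO no longer publishes one (as in 4.14 guest clusters). A fix would treat the
// missing condition as "no known upgrade blockers" rather than leaving the
// HostedCluster's ClusterVersionUpgradeable condition Unknown.
func findUpgradeable(cv *configv1.ClusterVersion) *configv1.ClusterOperatorStatusCondition {
	for i := range cv.Status.Conditions {
		if cv.Status.Conditions[i].Type == configv1.OperatorUpgradeable {
			return &cv.Status.Conditions[i]
		}
	}
	return nil
}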

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-34590. The following is the description of the original issue:

Description of problem:

Storage degraded by VSphereProblemDetectorStarterStaticControllerDegraded while upgrading to 4.16.0-0.nightly

Version-Release number of selected component (if applicable):

    

How reproducible:

 Once   

Steps to Reproduce:

    1.Run prow ci job: 
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-vsphere-ipi-disk-encryption-tang-fips-f28/1790991142867701760 

     2. Storage degraded by VSphereProblemDetectorStarterStaticControllerDegraded while upgrading to 4.16.0-0.nightly from 4.15.13:
 Last Transition Time:  2024-05-16T09:35:05Z
    Message:               VSphereProblemDetectorStarterStaticControllerDegraded: "vsphere_problem_detector/04_clusterrole.yaml" (string): client rate limiter Wait returned an error: context canceled
VSphereProblemDetectorStarterStaticControllerDegraded: "vsphere_problem_detector/05_clusterrolebinding.yaml" (string): client rate limiter Wait returned an error: context canceled
VSphereProblemDetectorStarterStaticControllerDegraded: "vsphere_problem_detector/10_service.yaml" (string): client rate limiter Wait returned an error: context canceled
VSphereProblemDetectorStarterStaticControllerDegraded:
    Reason:                VSphereProblemDetectorStarterStaticController_SyncError
    Status:                True
    Type:                  Degraded

     3.must-gather is available: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-vsphere-ipi-disk-encryption-tang-fips-f28/1790991142867701760/artifacts/vsphere-ipi-disk-encryption-tang-fips-f28/gather-must-gather/     

Actual results:

Storage degraded by VSphereProblemDetectorStarterStaticControllerDegraded while upgrading to 4.16.0-0.nightly from 4.15.13

Expected results:

 Upgrade should be successful

Additional info:

    

Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/177

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-36678. The following is the description of the original issue:

Description of problem:

Quick starts are no longer working in the kubevirt-plugin while in dev mode. The cause appears to be the use of different instances of QuickStartContext.

Version-Release number of selected component (if applicable):

4.16    

How reproducible:

100%

Steps to Reproduce:

1. Run kubevirt-plugin in dev mode using the instructions in the plugin's README 
2. Navigate to Virtualization > Overview
3. Click on a quick start in the QuickStarts section of the Getting Started card
    

Actual results:

The following error is thrown: 
setActiveQuickStart is not a function
TypeError: setActiveQuickStart is not a function
    at onClick (http://localhost:9000/api/plugins/kubevirt-plugin/exposed-ClusterOverviewPage-chunk.js:8938:21)
    at onClick (http://localhost:9000/api/plugins/kubevirt-plugin/node_modules_patternfly_react-core_dist_esm_components_SimpleList_index_js-_c4a10-chunk.js:142:25)
    at Object.qe (http://localhost:9000/static/vendors~main-chunk-371c85e9324f56231546.min.js:117:16073)
    at Je (http://localhost:9000/static/vendors~main-chunk-371c85e9324f56231546.min.js:117:16227)
    at http://localhost:9000/static/vendors~main-chunk-371c85e9324f56231546.min.js:117:34214
    at Sn (http://localhost:9000/static/vendors~main-chunk-371c85e9324f56231546.min.js:117:34308)
    at An (http://localhost:9000/static/vendors~main-chunk-371c85e9324f56231546.min.js:117:34722)
    at http://localhost:9000/static/vendors~main-chunk-371c85e9324f56231546.min.js:117:40370
    at Re (http://localhost:9000/static/vendors~main-chunk-371c85e9324f56231546.min.js:117:116041)
    at http://localhost:9000/static/vendors~main-chunk-371c85e9324f56231546.min.js:117:36181

Expected results:

The quick start opens

Additional info:

    

Description of the problem:

Prepare a cluster for installation and add an applied configuration via the API.

When we try to install the cluster it goes back to ready as expected, but if we fix the configuration and try to install again it ALWAYS fails on the first attempt (not related to timing).

It only works on the second install attempt, without changing the configuration.

 

How reproducible:

Always

Steps to reproduce:

1. Prepare cluster for installation

2. Create an invalid applied configuration to verify that preparing-for-installation returns to the ready state as expected.

 
invalid_override_config = {
    "capabilities": {
        "additionalEnabledCapabilities": ["123454baremetal"],
    }
}
3. Start installation; it goes back to ready as expected.

4. Fix the applied configuration and try to install again -> fails.

Actual results:

On the first attempt after the config change, installation fails.

It works only on the second try.

Expected results:

 

This is a clone of issue OCPBUGS-41498. The following is the description of the original issue:

Description of problem:

The e2e test "upgrade CRD with deprecated version" in the test/e2e/installplan_e2e_test.go suite is flaking

Version-Release number of selected component (if applicable):

    

How reproducible:

Hard to reproduce, could be related to other tests running at the same time, or any number of things. 

Steps to Reproduce:

It might be worthwhile trying to re-run the test multiple times against a ClusterBot or OpenShift Local cluster

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/csi-driver-nfs/pull/136

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-34656. The following is the description of the original issue:

knative-ci.feature test is failing with:

  Logging in as kubeadmin
      Installing operator: "Red Hat OpenShift Serverless"
      Operator Red Hat OpenShift Serverless was not yet installed.
      Performing Serverless post installation steps
      User has selected namespace knative-serving
  1) "before all" hook for "Create knative workload using Container image with extrenal registry on Add page: KN-05-TC05 (example #1)"

  0 passing (3m)
  1 failing

  1) Perform actions on knative service and revision
       "before all" hook for "Create knative workload using Container image with extrenal registry on Add page: KN-05-TC05 (example #1)":
     AssertionError: Timed out retrying after 40000ms: Expected to find element: `[title="knativeservings.operator.knative.dev"]`, but never found it.

Because this error occurred during a `before all` hook we are skipping all of the remaining tests.

Although you have test retries enabled, we do not retry tests when `before all` or `after all` hooks fail
      at createKnativeServing (webpack:////go/src/github.com/openshift/console/frontend/packages/dev-console/integration-tests/support/pages/functions/knativeSubscriptions.ts:15:5)
      at performPostInstallationSteps (webpack:////go/src/github.com/openshift/console/frontend/packages/dev-console/integration-tests/support/pages/functions/installOperatorOnCluster.ts:176:26)
      at verifyAndInstallOperator (webpack:////go/src/github.com/openshift/console/frontend/packages/dev-console/integration-tests/support/pages/functions/installOperatorOnCluster.ts:221:2)
      at verifyAndInstallKnativeOperator (webpack:////go/src/github.com/openshift/console/frontend/packages/dev-console/integration-tests/support/pages/functions/installOperatorOnCluster.ts:231:27)
      at Context.eval (webpack:///./support/commands/hooks.ts:7:33)



[mochawesome] Report JSON saved to /go/src/github.com/openshift/console/frontend/gui_test_screenshots/cypress_report_knative.json


  (Results)

  ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
  │ Tests:        16                                                                               │
  │ Passing:      0                                                                                │
  │ Failing:      1                                                                                │
  │ Pending:      0                                                                                │
  │ Skipped:      15                                                                               │
  │ Screenshots:  1                                                                                │
  │ Video:        true                                                                             │
  │ Duration:     3 minutes, 8 seconds                                                             │
  │ Spec Ran:     knative-ci.feature                                                               │
  └────────────────────────────────────────────────────────────────────────────────────────────────┘


  (Screenshots)

  -  /go/src/github.com/openshift/console/frontend/gui_test_screenshots/cypress/scree     (1280x720)
     nshots/knative-ci.feature/Create knative workload using Container image with ext               
     renal registry on Add page KN-05-TC05 (example #1) -- before all hook (failed).p               
     ng                                                                                             

 

Search link

This is a clone of issue OCPBUGS-35397. The following is the description of the original issue:

Description of problem:

The runbook was added in https://issues.redhat.com/browse/MON-3862
The alert is more likely to fire in >=4.16

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


We have ClusterOperatorDown and ClusterOperatorDegraded in this space for ClusterOperator conditions. We should wire that up for ClusterVersion as well.

Description of problem:

After upgrading from 4.13.x to 4.14.10, the workload images that the customer stored inside the internal registry are lost, resulting in the application pods failing with "Back-off pulling image" errors.

Even when manually pulling with podman, it fails then with "manifest unknown" because the image cannot be found in the registry anymore.


- This behavior was found and reproduced 100% on ARO clusters, where the internal registry is by default backed by the Storage Account created by the ARO RP service principal, using the Containers blob service.

- I do not know whether the same behavior occurs on non-managed Azure clusters or any other architecture.

Version-Release number of selected component (if applicable):

4.14.10

How reproducible:

100% with an ARO cluster (Managed cluster)

Steps to Reproduce:  Attached.

The workaround found so far is to rebuild the apps or re-import the images, but those tasks are lengthy and costly, especially on a production cluster.

Description of problem:

oc-mirror gets the wrong index.json and fails when the ImageSetConfig contains an OCI FBC catalog

 

Version-Release number of selected component (if applicable):

oc-mirror version 
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202403070215.p0.gc4f8295.assembly.stream.el9-c4f8295", GitCommit:"c4f829512107f7d0f52a057cd429de2030b9b3b3", GitTreeState:"clean", BuildDate:"2024-03-07T03:46:24Z", GoVersion:"go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1)  Copy the operator as OCI format to localhost:
`skopeo copy docker://registry.redhat.io/redhat/redhat-operator-index:v4.12 oci:///app1/noo/redhat-operator-index  --remove-signatures`

2)  Use following imagesetconfigure for mirror:
cat config-oci.yaml
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
storageConfig:
  registry:
    imageURL: registryhost:5000/metadata:latest
mirror:
  additionalImages:
   - name: quay.io/openshifttest/bench-army-knife@sha256:078db36d45ce0ece589e58e8de97ac1188695ac155bc668345558a8dd77059f6
  platform:
    channels:
    - name: stable-4.12
      type: ocp
    graph: true
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.12
      packages:
       - name: elasticsearch-operator
    - catalog: oci:///app1/noo/redhat-operator-index
      packages:
        - name: cluster-kube-descheduler-operator
        - name: odf-operator

`oc-mirror --config config-oci.yaml file://outoci --v2`


Actual results: 

2) In the configuration we use oci:///app1/noo/redhat-operator-index, so it should not check index.json under outoci/working-dir/operator-images/redhat-operator-index/index.json

 oc-mirror --config config-oci.yaml file://outoci --v2
--v2 flag identified, flow redirected to the oc-mirror v2 version. PLEASE DO NOT USE that. V2 is still under development and it is not ready to be used. 
2024/03/25 06:23:06  [INFO]   : mode mirrorToDisk
2024/03/25 06:23:06  [INFO]   : local storage registry will log to /app1/0321/outoci/working-dir/logs/registry.log
2024/03/25 06:23:06  [INFO]   : starting local storage on localhost:55000
2024/03/25 06:23:06  [INFO]   : detected minimum version as 4.12.53
2024/03/25 06:23:06  [INFO]   : detected minimum version as 4.12.53
2024/03/25 06:23:07  [INFO]   : Found update 4.12.53
2024/03/25 06:23:07  [INFO]   : signature b584f5458fb946115b0cf0f1793dc9224c5e6a4567e74018f0590805a03eb523
2024/03/25 06:23:07  [WARN]   : signature for b584f5458fb946115b0cf0f1793dc9224c5e6a4567e74018f0590805a03eb523 not in cache
2024/03/25 06:23:07  [INFO]   : content {"critical": {"image": {"docker-manifest-digest": "sha256:b584f5458fb946115b0cf0f1793dc9224c5e6a4567e74018f0590805a03eb523"}, "type": "atomic container signature", "identity": {"docker-reference": "quay.io/openshift-release-dev/ocp-release:4.12.53-x86_64"}}, "optional": {"creator": "Red Hat OpenShift Signing Authority 0.0.1"}}
2024/03/25 06:23:07  [INFO]   : image found : quay.io/openshift-release-dev/ocp-release:4.12.53-x86_64
2024/03/25 06:23:07  [INFO]   : public Key : 567E347AD0044ADE55BA8A5F199E2F91FD431D51
2024/03/25 06:23:07  [INFO]   : copying  quay.io/openshift-release-dev/ocp-release:4.12.53-x86_64
2024/03/25 06:23:12  [INFO]   : copying  cincinnati response to outoci/working-dir/release-filters
2024/03/25 06:23:12  [INFO]   : creating graph data image
2024/03/25 06:23:15  [INFO]   : graph image created and pushed to cache.
2024/03/25 06:23:15  [INFO]   : total release images to copy 185
2024/03/25 06:23:15  [INFO]   : copying operator image registry.redhat.io/redhat/redhat-operator-index:v4.12
2024/03/25 06:23:18  [INFO]   : manifest 7b9891532a76194c1b18698518abad9be4aca7f1152ac73f450aa8bfadef538f
2024/03/25 06:23:18  [INFO]   : label /configs
2024/03/25 06:23:36  [INFO]   : copying operator image oci:///app1/noo/redhat-operator-index
error closing log file registry.log: close outoci/working-dir/logs/registry.log: file already closed
2024/03/25 06:23:36  [ERROR]  : open outoci/working-dir/operator-images/redhat-operator-index/index.json: no such file or directory

Expected results:

2) oc-mirror finds the correct path for index.json and does not fail

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.

2.

3.

 

Actual results:

 

Expected results:

 

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure 
  2. customer issue / SD
  3. internal RedHat testing failure

 

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

 

If it is a CI failure:

 

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

 

If it is a customer / SD issue:

 

  • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
  • Don’t presume that Engineering has access to Salesforce.
  • Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment.  The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
  • Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).  
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and  dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, labels with “sbr-untriaged”
  • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

Please review the following PR: https://github.com/openshift/openshift-apiserver/pull/415

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

The CVO managed manifest, that CMO ships lack capability annotations as defined in https://github.com/openshift/enhancements/blob/master/enhancements/installer/component-selection.md#manifest-annotations.

The dashboards should be tied to the console capability so that when CMO deploys on a cluster without the Console capability, CVO doesn't deploy the dashboards configmap.

Description of problem:

The agent-based installer does not support the TechPreviewNoUpgrade featureSet, and by extension it does not support any of the features gated by it. Because of this, there is no warning about any of these features being specified - we expect the TechPreviewNoUpgrade feature gate to error out when any of them are used.

However, we don't warn about TechPreviewNoUpgrade itself being ignored, so if the user does specify it then they can use some of these non-supported features without being warned that their configuration is ignored.

We should fail with an error when TechPreviewNoUpgrade is specified, until such time as AGENT-554 is implemented.

Description of problem:

HyperShift-managed components use the default RevisionHistoryLimit of 10. This significantly impacts etcd load and scalability on the management cluster.
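A minimal sketch of the kind of change this implies, assuming a controller that builds appsv1 Deployments; the value 2 is illustrative, not necessarily the one chosen:

package example

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/utils/ptr"
)

// trimRevisionHistory caps how many old ReplicaSets a managed Deployment keeps.
// The Kubernetes default of 10 means every Deployment per HostedCluster can
// leave up to 10 stale ReplicaSets stored in etcd on the management cluster.
func trimRevisionHistory(d *appsv1.Deployment) {
	d.Spec.RevisionHistoryLimit = ptr.To(int32(2))
}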

Version-Release number of selected component (if applicable):

4.9, 4.10, 4.11, 4.12, 4.13, 4.14, 4.15, 4.16    

How reproducible:

100% (may vary depending on resource availability on the management cluster)

Steps to Reproduce:

    1. Create 375+ HostedCluster
    2. Observe etcd performance on management cluster
    3.
    

Actual results:

etcd hitting storage space limits    

Expected results:

Able to manage HyperShift control planes at scale (375+ HostedClusters)    

Additional info:

    

Description of problem:

On the OCP console, if we edit a parameter related to VMware, add the same value back again, and click on save, the nodes are rebooted.

Version-Release number of selected component (if applicable):

    4.14

How reproducible:

    

Steps to Reproduce:

    1. On any 4.14+ cluster go to ocp console page
    2. Click on the vmware plugin
    3. Edit any parameter and add the same value again.
    4. Click on save
    

Actual results:

    The nodes reboot to pick up the change

Expected results:

 Nodes should not reboot if the same values are entered

Additional info:

    

Description of problem:

When mirroring content with oc-mirror v2, some required images for OpenShift installation are missing from the registry    

Version-Release number of selected component (if applicable):

OpenShift installer version: v4.15.17 

[admin@registry ~]$ oc-mirror version
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202406131906.p0.g7c0889f.assembly.stream.el9-7c0889f", GitCommit:"7c0889f4bd343ccaaba5f33b7b861db29b1e5e49", GitTreeState:"clean", BuildDate:"2024-06-13T22:07:44Z", GoVersion:"go1.21.9 (Red Hat 1.21.9-1.el9_4) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

Use oc-mirror v2 to mirror content.

$ cat imageset-config-ocmirrorv2-v4.15.yaml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v2alpha1
mirror:
  platform:
    channels:
    - name: stable-4.15
      minVersion: 4.15.17
      type: ocp
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.15
    full: false
    packages:
      - name: ansible-automation-platform-operator
      - name: cluster-logging
      - name: datagrid
      - name: devworkspace-operator
      - name: multicluster-engine
      - name: multicluster-global-hub-operator-rh
      - name: odf-operator
      - name: quay-operator
      - name: rhbk-operator
      - name: skupper-operator
      - name: servicemeshoperator
      - name: submariner
      - name: lvms-operator
      - name: odf-lvm-operator
  - catalog: registry.redhat.io/redhat/certified-operator-index:v4.15
    full: false
    packages:
      - name: crunchy-postgres-operator
      - name: nginx-ingress-operator
  - catalog: registry.redhat.io/redhat/community-operator-index:v4.15
    full: false
    packages:
      - name: argocd-operator
      - name: cockroachdb
      - name: infinispan
      - name: keycloak-operator
      - name: mariadb-operator
      - name: nfs-provisioner-operator
      - name: postgresql
      - name: skupper-operator
  additionalImages:
  - name: registry.redhat.io/ubi8/ubi:latest
  - name: registry.access.redhat.com/ubi8/nodejs-18
  - name: registry.redhat.io/openshift4/ose-prometheus:v4.14.0
  - name: registry.redhat.io/service-interconnect/skupper-router-rhel9:2.4.3
  - name: registry.redhat.io/service-interconnect/skupper-config-sync-rhel9:1.4.4
  - name: registry.redhat.io/service-interconnect/skupper-service-controller-rhel9:1.4.4
  - name: registry.redhat.io/service-interconnect/skupper-flow-collector-rhel9:1.4.4
  helm: {}


Run oc-mirror using the command:

oc-mirror --v2 \
-c imageset-config-ocmirrorv2-v4.15.yaml  \
--workspace file:////data/oc-mirror/workdir/ \
docker://registry.local.momolab.io:8443/mirror 

Steps to Reproduce:

    1. Install Red Hat Quay mirror registry
    2. Mirror using oc-mirror v2 command and steps above
    3. Install OpenShift
    

Actual results:

    Installation fails

Expected results:

    Installation succeeds

Additional info:

 ## Check logs on coreos:
[core@sno1 ~]$ journalctl -b -f -u release-image.service -u bootkube.service
Jul 02 03:46:22 sno1.local.momolab.io bootkube.sh[13486]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f36e139f75b179ffe40f5a234a0cef3f0a051cc38cbde4b262fb2d96606acc06: (Mirrors also failed: [registry.local.momolab.io:8443/mirror/openshift/release@sha256:f36e139f75b179ffe40f5a234a0cef3f0a051cc38cbde4b262fb2d96606acc06: reading manifest sha256:f36e139f75b179ffe40f5a234a0cef3f0a051cc38cbde4b262fb2d96606acc06 in registry.local.momolab.io:8443/mirror/openshift/release: name unknown: repository not found]): quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f36e139f75b179ffe40f5a234a0cef3f0a051cc38cbde4b262fb2d96606acc06: reading manifest sha256:f36e139f75b179ffe40f5a234a0cef3f0a051cc38cbde4b262fb2d96606acc06 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized

## Check if that image was pulled:

[admin@registry ~]$ cat /data/oc-mirror/workdir/working-dir/dry-run/mapping.txt | grep -i f36e139f75b179ffe40f5a234a0cef3f0a051cc38cbde4b262fb2d96606acc06
docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f36e139f75b179ffe40f5a234a0cef3f0a051cc38cbde4b262fb2d96606acc06=docker://registry.local.momolab.io:8443/mirror/openshift-release-dev/ocp-v4.0-art-dev@sha256:f36e139f75b179ffe40f5a234a0cef3f0a051cc38cbde4b262fb2d96606acc06

## Problem is, it doesn't exist on the registry (also via UI):

[admin@registry ~]$ podman pull registry.local.momolab.io:8443/mirror/openshift-release-dev/ocp-v4.0-art-dev@sha256:f36e139f75b179ffe40f5a234a0cef3f0a051cc38cbde4b262fb2d96606acc06
Trying to pull registry.local.momolab.io:8443/mirror/openshift-release-dev/ocp-v4.0-art-dev@sha256:f36e139f75b179ffe40f5a234a0cef3f0a051cc38cbde4b262fb2d96606acc06...
Error: initializing source docker://registry.local.momolab.io:8443/mirror/openshift-release-dev/ocp-v4.0-art-dev@sha256:f36e139f75b179ffe40f5a234a0cef3f0a051cc38cbde4b262fb2d96606acc06: reading manifest sha256:f36e139f75b179ffe40f5a234a0cef3f0a051cc38cbde4b262fb2d96606acc06 in registry.local.momolab.io:8443/mirror/openshift-release-dev/ocp-v4.0-art-dev: manifest unknown

Description of problem:

When I create a pod with empty security context as a user that has access to all SCCs, the SCC annotation shows "privileged"

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1. create a bare pod with an empty security context
2. look at the "openshift.io/scc" annotation 

Actual results:

privileged

Expected results:

anyuid

Additional info:

kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
    restartPolicy: Never
    containers:
    - name: fedora
      image: fedora:latest
      command:
      - sleep
      args:
      - "infinity"

 

Description of problem:

Checked with 4.15.0-0.nightly-2023-12-11-033133, there are no PodMetrics/NodeMetrics resources on the server

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2023-12-11-033133   True        False         122m    Cluster version is 4.15.0-0.nightly-2023-12-11-033133

$ oc api-resources | grep -i metrics
nodes                                                                                                                        metrics.k8s.io/v1beta1                        false        NodeMetrics
pods                                                                                                                         metrics.k8s.io/v1beta1                        true         PodMetrics

$ oc explain PodMetrics
the server doesn't have a resource type "PodMetrics"
$ oc explain NodeMetrics
the server doesn't have a resource type "NodeMetrics"

$ oc get NodeMetrics
error: the server doesn't have a resource type "NodeMetrics"
$ oc get PodMetrics -A
error: the server doesn't have a resource type "PodMetrics"

no issue with 4.14.0-0.nightly-2023-12-11-135902

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-12-11-135902   True        False         88m     Cluster version is 4.14.0-0.nightly-2023-12-11-135902

$ oc api-resources | grep -i metrics
nodes                                                                                                                        metrics.k8s.io/v1beta1                        false        NodeMetrics
pods                                                                                                                         metrics.k8s.io/v1beta1                        true         PodMetrics

$ oc explain PodMetrics
GROUP:      metrics.k8s.io
KIND:       PodMetrics
VERSION:    v1beta1
DESCRIPTION:
    PodMetrics sets resource usage metrics of a pod.
...

$ oc explain NodeMetrics
GROUP:      metrics.k8s.io
KIND:       NodeMetrics
VERSION:    v1beta1
DESCRIPTION:
    NodeMetrics sets resource usage metrics of a node.
...

$ oc get PodMetrics -A
NAMESPACE                                          NAME                                                                       CPU    MEMORY      WINDOW
openshift-apiserver                                apiserver-65f777466-4m8nj                                                  9m     297512Ki    5m0s
openshift-apiserver                                apiserver-65f777466-g7n72                                                  10m    313308Ki    5m0s
openshift-apiserver                                apiserver-65f777466-xzd8l                                                  12m    293008Ki    5m0s
openshift-apiserver-operator                       openshift-apiserver-operator-54945b8bbd-bxkcj                              3m     119264Ki    5m0s
...

$ oc get NodeMetrics
NAME                                        CPU     MEMORY      WINDOW
ip-10-0-20-163.us-east-2.compute.internal   765m    8349848Ki   5m0s
ip-10-0-22-189.us-east-2.compute.internal   388m    5363132Ki   5m0s
ip-10-0-41-231.us-east-2.compute.internal   1274m   7243548Ki   5m0s
... 

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-11-033133

How reproducible:

always

Steps to Reproduce:

1. see the description

Actual results:

4.15 server does not have PodMetrics/NodeMetrics

Expected results:

The server should have the PodMetrics/NodeMetrics resources

Description of problem:

Customer is trying to create an RHOCP cluster with the domain name 123mydomain.com

In the Red Hat Hybrid Cloud Console the customer is getting the error below:
~~~
Failed to update the cluster
DNS format mismatch: 123mydomain.com domain name is not valid
~~~

*** As per the regex check described in KCS - https://access.redhat.com/solutions/5517531
a domain name starting with a numeric character is valid, e.g. 123mydomain.com

Below is the regex check for domain name validity:
 [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')]

*** From the validations the assisted installer performs, as per: https://github.com/openshift/assisted-service/blob/master/pkg/validations/validations.go 

The below regexps are applied:
baseDomainRegex          = `^[a-z]([\-]*[a-z\d]+)+$`
dnsNameRegex             = `^([a-z]([\-]*[a-z\d]+)*\.)+[a-z\d]+([\-]*[a-z\d]+)+$`
wildCardDomainRegex      = `^(validateNoWildcardDNS\.).+\.?$`
hostnameRegex            = `^[a-z0-9][a-z0-9\-\.]{0,61}[a-z0-9]$`
installerArgsValuesRegex = `^[A-Za-z0-9@!#$%*()_+-=//.,";':{}\[\]]+$`
  
This means the domain name must start with a letter [a-z].
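A quick Go check against the regexps quoted above confirms this (results shown in comments):

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// dnsNameRegex copied from the assisted-service validations listed above.
	dnsNameRegex := regexp.MustCompile(`^([a-z]([\-]*[a-z\d]+)*\.)+[a-z\d]+([\-]*[a-z\d]+)+$`)

	fmt.Println(dnsNameRegex.MatchString("mydomain.com"))    // true
	fmt.Println(dnsNameRegex.MatchString("123mydomain.com")) // false: first label must start with [a-z]
}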

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

1. Open RedHat Hybrid Cloud Console
2. Go to Clusters
3. Create Cluster
4. Go to Datacenter 
5. Under Assisted Installer -> Create Cluster
6. Enter Cluster Name mytestcluster and enter Domain Name 123mydomain.com
7. Click on Next

Actual results:

A domain name with a numeric character first and then letters, e.g. 123mydomain.com, shows as invalid in the Red Hat Hybrid Cloud Console Assisted Installer, throwing the error:
Failed to create new cluster
DNS format mismatch: 123mydomain.com domain name is not valid

Expected results:

A domain name with a numeric character first and then letters, e.g. 123mydomain.com, should be accepted as valid in the Red Hat Hybrid Cloud Console Assisted Installer.

This is a clone of issue OCPBUGS-36619. The following is the description of the original issue:

Description of problem:

The labels added by PAC have been deprecated and the same data is now added as PLR annotations. Use the annotations to get the values on the repository list page, the repository PLRs list page, and the PLR details page.

Description of problem:

When running the 4.15 installer full-function test, one additional arm64 instance family (below) was detected and verified; it needs to be appended to the installer doc[1]:
- standardBpsv2Family

[1] https://github.com/openshift/installer/blob/master/docs/user/azure/tested_instance_types_aarch64.md

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Since approximately 12 April, all FIPS CI is broken, with the authentication operator failing to come up.

Sippy

The oauth-openshift containers are failing with the message:

Copying system trust bundle
FIPS mode is enabled, but the required OpenSSL backend is unavailable

This is due to https://github.com/openshift/oauth-server/commit/8a6f3a11a4b25e3e22152252720490b9f355ce53 changing the base image to RHEL 9 while leaving the builder image as RHEL 8. When the binary starts, it cannot find the RHEL 8 OpenSSL it was linked against.
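
One way to confirm this kind of mismatch (image references are placeholders, not from the report) is to compare the RHEL release of the builder and base images:

$ podman run --rm <builder-image> cat /etc/redhat-release
Red Hat Enterprise Linux release 8.x (Ootpa)
$ podman run --rm <base-image> cat /etc/redhat-release
Red Hat Enterprise Linux release 9.x (Plow)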

This is a clone of OCPBUGS-35335.


Description of problem:

The user.openshift.io and oauth.openshift.io APIs are unavailable in an external OIDC cluster, which causes all common pulls/pushes of blobs from/to the image registry to fail.

Version-Release number of selected component (if applicable):

4.15.15

How reproducible:

always

Steps to Reproduce:

1.Create a ROSA HCP cluster which configured external oidc users
2.Push data to image registry under a project
oc new-project wxj1
oc new-build httpd~https://github.com/openshift/httpd-ex.git 
3.

Actual results:

$ oc logs -f build/httpd-ex-1
Cloning "https://github.com/openshift/httpd-ex.git" ...	Commit:	1edee8f58c0889616304cf34659f074fda33678c (Update httpd.json)	Author:	Petr Hracek <phracek@redhat.com>	Date:	Wed Jun 5 13:00:09 2024 +0200time="2024-06-12T09:55:13Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled"I0612 09:55:13.306937       1 defaults.go:112] Defaulting to storage driver "overlay" with options [mountopt=metacopy=on].Caching blobs under "/var/cache/blobs".Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/httpd@sha256:765aa645587f34e310e49db7cdc97e82d34122adb0b604eea891e0f98050aa77...Warning: Pull failed, retrying in 5s ...Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/httpd@sha256:765aa645587f34e310e49db7cdc97e82d34122adb0b604eea891e0f98050aa77...Warning: Pull failed, retrying in 5s ...Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/httpd@sha256:765aa645587f34e310e49db7cdc97e82d34122adb0b604eea891e0f98050aa77...Warning: Pull failed, retrying in 5s ...error: build error: After retrying 2 times, Pull image still failed due to error: unauthorized: unable to validate token: NotFound


oc logs -f deploy/image-registry -n openshift-image-registry

time="2024-06-12T09:55:13.36003996Z" level=error msg="invalid token: the server could not find the requested resource (get users.user.openshift.io ~)" go.version="go1.20.12 X:strictfipsruntime" http.request.host="image-registry.openshift-image-registry.svc:5000" http.request.id=0c380b81-99d4-4118-8de3-407706e8767c http.request.method=GET http.request.remoteaddr="10.130.0.35:50550" http.request.uri="/openshift/token?account=serviceaccount&scope=repository%3Aopenshift%2Fhttpd%3Apull" http.request.useragent="containers/5.28.0 (github.com/containers/image)"

Expected results:

Pull/push of blobs from/to the image registry should work on an external OIDC cluster.

Additional info:

 

Description of problem:

The new-in-4.15 ClusterVersion spec.signatureStores should implement the ca property.

Version-Release number of selected component (if applicable):

4.15 and later.

How reproducible:

Every time, for TechPreviewNoUpgrade clusters where signatureStores exists.

Steps to Reproduce:

1. Install a TechPreviewNoUpgrade cluster.
2. Set up a signature store in the cluster behind the self-signed ingress/router CA:

FIXME

3. Patch ClusterVersion to ask the CVO to use that store.

FIXME

4. Ask the cluster to update to a release whose signature is in the custom store:

FIXME

Actual results:

FIXME

Expected results:

The update is accepted and begins rolling out, as shown by oc adm upgrade. Whether the update successfully completes or not is not relevant.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-35874. The following is the description of the original issue:

Description of problem:

The ovnkube-sbdb route removal is missing a management cluster capabilities check and thus fails on a Kubernetes based management cluster.

Version-Release number of selected component (if applicable):

4.15.z, 4.16.0, 4.17.0

How reproducible:

Always

Steps to Reproduce:

Deploy an OpenShift version 4.16.0-rc.6 cluster control plane using HyperShift on a Kubernetes based management cluster. 

Actual results:

Cluster control plane deployment fails because the cluster-network-operator pod is stuck in Init state due to the following error:

{"level":"error","ts":"2024-06-19T20:51:37Z","msg":"Reconciler error","controller":"hostedcontrolplane","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedControlPlane","HostedControlPlane":{"name":"cppjslm10715curja3qg","namespace":"master-cppjslm10715curja3qg"},"namespace":"master-cppjslm10715curja3qg","name":"cppjslm10715curja3qg","reconcileID":"037842e8-82ea-4f6e-bf28-deb63abc9f22","error":"failed to update control plane: failed to reconcile cluster network operator: failed to clean up ovnkube-sbdb route: error getting *v1.Route: no matches for kind \"Route\" in version \"route.openshift.io/v1\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}

Expected results:

Cluster control plane deployment succeeds.

Additional info:

https://ibm-argonauts.slack.com/archives/C01C8502FMM/p1718832205747529

Impact assessment of OCPBUGS-24009

Which 4.y.z to 4.y'.z' updates increase vulnerability?

Any upgrade up to 4.15.{current-z}

Which types of clusters?

Any non-Microshift cluster with an operator installed via OLM before upgrade to 4.15. After upgrading to 4.15, re-installing a previously uninstalled operator may also cause this issue. 

What is the impact? Is it serious enough to warrant removing update recommendations?

OLM Operators can't be upgraded and may incorrectly report failed status.

How involved is remediation?

Delete the resources associated with the OLM installation related to the failure message in the olm-operator.

A failure message similar to this may appear on the CSV:

InstallComponentFailed install strategy failed: rolebindings.rbac.authorization.k8s.io "openshift-gitops-operator-controller-manager-service-auth-reader" already exists

The following resource types have been observed to encounter this issue and should be safe to delete:

  • ClusterRoleBinding suffixed with "-system:auth-delegator"
  • Service
  • RoleBinding suffixed with "-auth-reader"

Under no circumstances should a user delete a CustomResourceDefinition (CRD) if the same error occurs and names such a resource, as data loss may occur. Note that we have not seen this type of resource named in the error from any of our users so far.

Labeling the problematic resources with olm.managed: "true" and then restarting the olm-operator pod in the openshift-operator-lifecycle-manager namespace may also resolve the issue if the resource appears risky to delete.
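
For example (resource name and namespace are placeholders; the pod label selector is assumed), the label-and-restart remediation looks like:

$ oc -n <namespace> label rolebinding <name> olm.managed=true --overwrite
$ oc -n openshift-operator-lifecycle-manager delete pod -l app=olm-operator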

Is this a regression?

Yes, functionality which worked in 4.14 may break after upgrading to 4.15. Not a regression; this is a new issue related to performance improvements added to OLM in 4.15.

https://issues.redhat.com/browse/OCPBUGS-24009

https://issues.redhat.com/browse/OCPBUGS-31080

https://issues.redhat.com/browse/OCPBUGS-28845

Description of problem:

    After the migration work completes, the “pod-identity-webhook” deployment is not present in the "openshift-cloud-credential-operator" namespace.

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Always

Steps to Reproduce:

    1.Prepare an Azure OpenShift cluster.
    2.Migration to Azure AD workload Identity using procedure https://github.com/openshift/cloud-credential-operator/blob/master/docs/azure_workload_identity.md#steps-to-in-place-migrate-an-openshift-cluster-to-azure-ad-workload-identity.
    3.
    

Actual results:

    Azure pod identity webhook is not being created.
[hmx@fedora CCO]$  oc get po -n openshift-cloud-credential-operator 
NAME                                        READY   STATUS    RESTARTS   AGE
cloud-credential-operator-78b94ffb4-587rh   2/2     Running   0          3h7m

Expected results:

    

Additional info:

Tested migration to Azure AD Workload Identity on the following Azure cluster types:
  1. Default public Azure cluster.
  2. Single-node cluster.
  3. Azure private cluster.
  4. Disconnected Azure cluster.
This issue exists in all of the above cluster types.

Description of problem:

Currently the console frontend and backend use the OpenShift-centric UserKind type. For the console to work without the OAuth server, i.e. with an external OIDC provider, it needs to use the Kubernetes UserInfo type, which is retrieved by querying the SelfSubjectReview API.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

Console is not working with external OIDC provider

Expected results:

Console will be working with external OIDC provider  

Additional info:

This is mainly an API change.
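
As a sketch of the target behaviour (assuming `oc auth whoami` is available with this client/server version; output abbreviated and illustrative), the Kubernetes UserInfo can be obtained through the SelfSubjectReview API:

$ oc auth whoami -o yaml
apiVersion: authentication.k8s.io/v1
kind: SelfSubjectReview
status:
  userInfo:
    username: example-user
    groups:
    - system:authenticated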

This is a clone of issue OCPBUGS-33717. The following is the description of the original issue:

Description of problem:

Starting with OCP 4.14, we have decided to start using OCP's own "bridge" CNI build instead of our "cnv-bridge" rebuild. To make sure that current users of "cnv-bridge" don't have to change their configuration, we kept "cnv-bridge" as a symlink to "bridge". While the old name still functions, we should make an effort to move users to "bridge". To do that, we can start by changing UI so it generates NADs of the type "bridge" instead of "cnv-bridge".

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Use the NetworkAttachmentDefinition dialog to create a network of type bridge
2. Read the generated yaml

Actual results:

It has "type": "cnv-bridge"

Expected results:

It should have "type": "bridge"

Additional info:

The same should be done to any instance of "cnv-tuning" by changing it to "tuning".
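
As a sketch (object names and bridge interface are hypothetical), the NetworkAttachmentDefinition generated by the dialog should carry "type": "bridge":

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: example-bridge-net
  namespace: default
spec:
  config: '{"cniVersion": "0.3.1", "name": "example-bridge-net", "type": "bridge", "bridge": "br1"}'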

This is a clone of issue OCPBUGS-23758. The following is the description of the original issue:

When switching from ipForwarding: Global to Restricted, sysctl settings are not adjusted

Switch from:

# oc edit network.operator/cluster
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  annotations:
    networkoperator.openshift.io/ovn-cluster-initiator: 10.19.1.66
  creationTimestamp: "2023-11-22T12:14:46Z"
  generation: 207
  name: cluster
  resourceVersion: "235152"
  uid: 225d404d-4e26-41bf-8e77-4fc44948f239
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  defaultNetwork:
    ovnKubernetesConfig:
      egressIPConfig: {}
      gatewayConfig:
        ipForwarding: Global
(...)

To:

# oc edit network.operator/cluster
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  annotations:
    networkoperator.openshift.io/ovn-cluster-initiator: 10.19.1.66
  creationTimestamp: "2023-11-22T12:14:46Z"
  generation: 207
  name: cluster
  resourceVersion: "235152"
  uid: 225d404d-4e26-41bf-8e77-4fc44948f239
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  defaultNetwork:
    ovnKubernetesConfig:
      egressIPConfig: {}
      gatewayConfig:
        ipForwarding: Restricted

You'll see that the pods are updated:

# oc get pods -o yaml -n openshift-ovn-kubernetes ovnkube-node-fnl9z | grep sysctl -C10
      fi

      admin_network_policy_enabled_flag=
      if [[ "false" == "true" ]]; then
        admin_network_policy_enabled_flag="--enable-admin-network-policy"
      fi

      # If IP Forwarding mode is global set it in the host here.
      ip_forwarding_flag=
      if [ "Restricted" == "Global" ]; then
        sysctl -w net.ipv4.ip_forward=1
        sysctl -w net.ipv6.conf.all.forwarding=1
      else
        ip_forwarding_flag="--disable-forwarding"
      fi

      NETWORK_NODE_IDENTITY_ENABLE=
      if [[ "true" == "true" ]]; then
        NETWORK_NODE_IDENTITY_ENABLE="
          --bootstrap-kubeconfig=/var/lib/kubelet/kubeconfig
          --cert-dir=/etc/ovn/ovnkube-node-certs
          --cert-duration=24h

And that ovnkube correctly takes the settings:

# ps aux | grep disable-for
root       74963  0.3  0.0 8085828 153464 ?      Ssl  Nov22   3:38 /usr/bin/ovnkube --init-ovnkube-controller master1.site1.r450.org --init-node master1.site1.r450.org --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 --inactivity-probe=180000 --gateway-mode shared --gateway-interface br-ex --metrics-bind-address 127.0.0.1:29103 --ovn-metrics-bind-address 127.0.0.1:29105 --metrics-enable-pprof --metrics-enable-config-duration --export-ovs-metrics --disable-snat-multiple-gws --enable-multi-network --enable-multicast --zone master1.site1.r450.org --enable-interconnect --acl-logging-rate-limit 20 --enable-multi-external-gateway=true --disable-forwarding --bootstrap-kubeconfig=/var/lib/kubelet/kubeconfig --cert-dir=/etc/ovn/ovnkube-node-certs --cert-duration=24h
root     2096007  0.0  0.0   3880  2144 pts/0    S+   10:07   0:00 grep --color=auto disable-for

But sysctls are never restricted:

[root@master1 ~]# sysctl -a | grep forward
net.ipv4.conf.0eca9d9e7fd3231.bc_forwarding = 0
net.ipv4.conf.0eca9d9e7fd3231.forwarding = 1
net.ipv4.conf.0eca9d9e7fd3231.mc_forwarding = 0
net.ipv4.conf.21a32cf76c3bcdf.bc_forwarding = 0
net.ipv4.conf.21a32cf76c3bcdf.forwarding = 1
net.ipv4.conf.21a32cf76c3bcdf.mc_forwarding = 0
net.ipv4.conf.22f9bca61beeaba.bc_forwarding = 0
net.ipv4.conf.22f9bca61beeaba.forwarding = 1
net.ipv4.conf.22f9bca61beeaba.mc_forwarding = 0
net.ipv4.conf.2ee438a7201c1f7.bc_forwarding = 0
net.ipv4.conf.2ee438a7201c1f7.forwarding = 1
net.ipv4.conf.2ee438a7201c1f7.mc_forwarding = 0
net.ipv4.conf.3560ce219f7b591.bc_forwarding = 0
net.ipv4.conf.3560ce219f7b591.forwarding = 1
net.ipv4.conf.3560ce219f7b591.mc_forwarding = 0
net.ipv4.conf.507c81eb9944c2e.bc_forwarding = 0
net.ipv4.conf.507c81eb9944c2e.forwarding = 1
net.ipv4.conf.507c81eb9944c2e.mc_forwarding = 0
net.ipv4.conf.6278633ca74482f.bc_forwarding = 0
net.ipv4.conf.6278633ca74482f.forwarding = 1
net.ipv4.conf.6278633ca74482f.mc_forwarding = 0
net.ipv4.conf.68b572ce18f3b82.bc_forwarding = 0
net.ipv4.conf.68b572ce18f3b82.forwarding = 1
net.ipv4.conf.68b572ce18f3b82.mc_forwarding = 0
net.ipv4.conf.7291c80dd47a6f3.bc_forwarding = 0
net.ipv4.conf.7291c80dd47a6f3.forwarding = 1
net.ipv4.conf.7291c80dd47a6f3.mc_forwarding = 0
net.ipv4.conf.76abdac44c6aee7.bc_forwarding = 0
net.ipv4.conf.76abdac44c6aee7.forwarding = 1
net.ipv4.conf.76abdac44c6aee7.mc_forwarding = 0
net.ipv4.conf.7f9abb486611f68.bc_forwarding = 0
net.ipv4.conf.7f9abb486611f68.forwarding = 1
net.ipv4.conf.7f9abb486611f68.mc_forwarding = 0
net.ipv4.conf.8cd86bfb8ea635f.bc_forwarding = 0
net.ipv4.conf.8cd86bfb8ea635f.forwarding = 1
net.ipv4.conf.8cd86bfb8ea635f.mc_forwarding = 0
net.ipv4.conf.8e87bd3f6ddc9f8.bc_forwarding = 0
net.ipv4.conf.8e87bd3f6ddc9f8.forwarding = 1
net.ipv4.conf.8e87bd3f6ddc9f8.mc_forwarding = 0
net.ipv4.conf.91079c8f5c1630f.bc_forwarding = 0
net.ipv4.conf.91079c8f5c1630f.forwarding = 1
net.ipv4.conf.91079c8f5c1630f.mc_forwarding = 0
net.ipv4.conf.92e754a12836f63.bc_forwarding = 0
net.ipv4.conf.92e754a12836f63.forwarding = 1
net.ipv4.conf.92e754a12836f63.mc_forwarding = 0
net.ipv4.conf.a5c01549a6070ab.bc_forwarding = 0
net.ipv4.conf.a5c01549a6070ab.forwarding = 1
net.ipv4.conf.a5c01549a6070ab.mc_forwarding = 0
net.ipv4.conf.a621d1234f0f25a.bc_forwarding = 0
net.ipv4.conf.a621d1234f0f25a.forwarding = 1
net.ipv4.conf.a621d1234f0f25a.mc_forwarding = 0
net.ipv4.conf.all.bc_forwarding = 0
net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.all.mc_forwarding = 0
net.ipv4.conf.br-ex.bc_forwarding = 0
net.ipv4.conf.br-ex.forwarding = 1
net.ipv4.conf.br-ex.mc_forwarding = 0
net.ipv4.conf.br-int.bc_forwarding = 0
net.ipv4.conf.br-int.forwarding = 1
net.ipv4.conf.br-int.mc_forwarding = 0
net.ipv4.conf.c3f3da187245cf6.bc_forwarding = 0
net.ipv4.conf.c3f3da187245cf6.forwarding = 1
net.ipv4.conf.c3f3da187245cf6.mc_forwarding = 0
net.ipv4.conf.c7e518fff8ff973.bc_forwarding = 0
net.ipv4.conf.c7e518fff8ff973.forwarding = 1
net.ipv4.conf.c7e518fff8ff973.mc_forwarding = 0
net.ipv4.conf.d17c6fb6d3dd021.bc_forwarding = 0
net.ipv4.conf.d17c6fb6d3dd021.forwarding = 1
net.ipv4.conf.d17c6fb6d3dd021.mc_forwarding = 0
net.ipv4.conf.default.bc_forwarding = 0
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.eno8303.bc_forwarding = 0
net.ipv4.conf.eno8303.forwarding = 1
net.ipv4.conf.eno8303.mc_forwarding = 0
net.ipv4.conf.eno8403.bc_forwarding = 0
net.ipv4.conf.eno8403.forwarding = 1
net.ipv4.conf.eno8403.mc_forwarding = 0
net.ipv4.conf.ens1f0.bc_forwarding = 0
net.ipv4.conf.ens1f0.forwarding = 1
net.ipv4.conf.ens1f0.mc_forwarding = 0
net.ipv4.conf.ens1f0/3516.bc_forwarding = 0
net.ipv4.conf.ens1f0/3516.forwarding = 1
net.ipv4.conf.ens1f0/3516.mc_forwarding = 0
net.ipv4.conf.ens1f0/3517.bc_forwarding = 0
net.ipv4.conf.ens1f0/3517.forwarding = 1
net.ipv4.conf.ens1f0/3517.mc_forwarding = 0
net.ipv4.conf.ens1f0/3518.bc_forwarding = 0
net.ipv4.conf.ens1f0/3518.forwarding = 1
net.ipv4.conf.ens1f0/3518.mc_forwarding = 0
net.ipv4.conf.ens1f1.bc_forwarding = 0
net.ipv4.conf.ens1f1.forwarding = 1
net.ipv4.conf.ens1f1.mc_forwarding = 0
net.ipv4.conf.ens3f0.bc_forwarding = 0
net.ipv4.conf.ens3f0.forwarding = 1
net.ipv4.conf.ens3f0.mc_forwarding = 0
net.ipv4.conf.ens3f1.bc_forwarding = 0
net.ipv4.conf.ens3f1.forwarding = 1
net.ipv4.conf.ens3f1.mc_forwarding = 0
net.ipv4.conf.fcb6e9468a65d70.bc_forwarding = 0
net.ipv4.conf.fcb6e9468a65d70.forwarding = 1
net.ipv4.conf.fcb6e9468a65d70.mc_forwarding = 0
net.ipv4.conf.fcd96084b7f5a9a.bc_forwarding = 0
net.ipv4.conf.fcd96084b7f5a9a.forwarding = 1
net.ipv4.conf.fcd96084b7f5a9a.mc_forwarding = 0
net.ipv4.conf.genev_sys_6081.bc_forwarding = 0
net.ipv4.conf.genev_sys_6081.forwarding = 1
net.ipv4.conf.genev_sys_6081.mc_forwarding = 0
net.ipv4.conf.lo.bc_forwarding = 0
net.ipv4.conf.lo.forwarding = 1
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.conf.ovn-k8s-mp0.bc_forwarding = 0
net.ipv4.conf.ovn-k8s-mp0.forwarding = 1
net.ipv4.conf.ovn-k8s-mp0.mc_forwarding = 0
net.ipv4.conf.ovs-system.bc_forwarding = 0
net.ipv4.conf.ovs-system.forwarding = 1
net.ipv4.conf.ovs-system.mc_forwarding = 0
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_update_priority = 1
net.ipv4.ip_forward_use_pmtu = 0
net.ipv6.conf.0eca9d9e7fd3231.forwarding = 1
net.ipv6.conf.0eca9d9e7fd3231.mc_forwarding = 0
net.ipv6.conf.21a32cf76c3bcdf.forwarding = 1
net.ipv6.conf.21a32cf76c3bcdf.mc_forwarding = 0
net.ipv6.conf.22f9bca61beeaba.forwarding = 1
net.ipv6.conf.22f9bca61beeaba.mc_forwarding = 0
net.ipv6.conf.2ee438a7201c1f7.forwarding = 1
net.ipv6.conf.2ee438a7201c1f7.mc_forwarding = 0
net.ipv6.conf.3560ce219f7b591.forwarding = 1
net.ipv6.conf.3560ce219f7b591.mc_forwarding = 0
net.ipv6.conf.507c81eb9944c2e.forwarding = 1
net.ipv6.conf.507c81eb9944c2e.mc_forwarding = 0
net.ipv6.conf.6278633ca74482f.forwarding = 1
net.ipv6.conf.6278633ca74482f.mc_forwarding = 0
net.ipv6.conf.68b572ce18f3b82.forwarding = 1
net.ipv6.conf.68b572ce18f3b82.mc_forwarding = 0
net.ipv6.conf.7291c80dd47a6f3.forwarding = 1
net.ipv6.conf.7291c80dd47a6f3.mc_forwarding = 0
net.ipv6.conf.76abdac44c6aee7.forwarding = 1
net.ipv6.conf.76abdac44c6aee7.mc_forwarding = 0
net.ipv6.conf.7f9abb486611f68.forwarding = 1
net.ipv6.conf.7f9abb486611f68.mc_forwarding = 0
net.ipv6.conf.8cd86bfb8ea635f.forwarding = 1
net.ipv6.conf.8cd86bfb8ea635f.mc_forwarding = 0
net.ipv6.conf.8e87bd3f6ddc9f8.forwarding = 1
net.ipv6.conf.8e87bd3f6ddc9f8.mc_forwarding = 0
net.ipv6.conf.91079c8f5c1630f.forwarding = 1
net.ipv6.conf.91079c8f5c1630f.mc_forwarding = 0
net.ipv6.conf.92e754a12836f63.forwarding = 1
net.ipv6.conf.92e754a12836f63.mc_forwarding = 0
net.ipv6.conf.a5c01549a6070ab.forwarding = 1
net.ipv6.conf.a5c01549a6070ab.mc_forwarding = 0
net.ipv6.conf.a621d1234f0f25a.forwarding = 1
net.ipv6.conf.a621d1234f0f25a.mc_forwarding = 0
net.ipv6.conf.all.forwarding = 1
net.ipv6.conf.all.mc_forwarding = 0
net.ipv6.conf.br-ex.forwarding = 1
net.ipv6.conf.br-ex.mc_forwarding = 0
net.ipv6.conf.br-int.forwarding = 1
net.ipv6.conf.br-int.mc_forwarding = 0
net.ipv6.conf.c3f3da187245cf6.forwarding = 1
net.ipv6.conf.c3f3da187245cf6.mc_forwarding = 0
net.ipv6.conf.c7e518fff8ff973.forwarding = 1
net.ipv6.conf.c7e518fff8ff973.mc_forwarding = 0
net.ipv6.conf.d17c6fb6d3dd021.forwarding = 1
net.ipv6.conf.d17c6fb6d3dd021.mc_forwarding = 0
net.ipv6.conf.default.forwarding = 1
net.ipv6.conf.default.mc_forwarding = 0
net.ipv6.conf.eno8303.forwarding = 1
net.ipv6.conf.eno8303.mc_forwarding = 0
net.ipv6.conf.eno8403.forwarding = 1
net.ipv6.conf.eno8403.mc_forwarding = 0
net.ipv6.conf.ens1f0.forwarding = 1
net.ipv6.conf.ens1f0.mc_forwarding = 0
net.ipv6.conf.ens1f0/3516.forwarding = 0
net.ipv6.conf.ens1f0/3516.mc_forwarding = 0
net.ipv6.conf.ens1f0/3517.forwarding = 0
net.ipv6.conf.ens1f0/3517.mc_forwarding = 0
net.ipv6.conf.ens1f0/3518.forwarding = 0
net.ipv6.conf.ens1f0/3518.mc_forwarding = 0
net.ipv6.conf.ens1f1.forwarding = 1
net.ipv6.conf.ens1f1.mc_forwarding = 0
net.ipv6.conf.ens3f0.forwarding = 1
net.ipv6.conf.ens3f0.mc_forwarding = 0
net.ipv6.conf.ens3f1.forwarding = 1
net.ipv6.conf.ens3f1.mc_forwarding = 0
net.ipv6.conf.fcb6e9468a65d70.forwarding = 1
net.ipv6.conf.fcb6e9468a65d70.mc_forwarding = 0
net.ipv6.conf.fcd96084b7f5a9a.forwarding = 1
net.ipv6.conf.fcd96084b7f5a9a.mc_forwarding = 0
net.ipv6.conf.genev_sys_6081.forwarding = 1
net.ipv6.conf.genev_sys_6081.mc_forwarding = 0
net.ipv6.conf.lo.forwarding = 1
net.ipv6.conf.lo.mc_forwarding = 0
net.ipv6.conf.ovn-k8s-mp0.forwarding = 1
net.ipv6.conf.ovn-k8s-mp0.mc_forwarding = 0
net.ipv6.conf.ovs-system.forwarding = 1
net.ipv6.conf.ovs-system.mc_forwarding = 0

It's logical that this is happening, because nowhere in the code is there a mechanism to tune the global sysctl back to 0 when the mode is switched from `Global` to `Restricted`. There's also no mechanism to sequentially reboot the nodes so that they'd reboot back to their defaults (= sysctl ip forward off).
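
A sketch (loop pattern only; not part of the original report) for checking the global forwarding sysctls on every node, which should read 0 once Restricted mode is actually honored:

$ for f in $(oc get nodes -o jsonpath='{.items[*].metadata.name}') ; do oc debug node/"${f}" -- chroot /host sysctl net.ipv4.ip_forward net.ipv6.conf.all.forwarding ; done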

Description of problem:

    Package that we use for Power VS has recently been revealed to be unmaintained. We should remove it in favor of maintained solutions.

Version-Release number of selected component (if applicable):

    4.13.0 onward

How reproducible:

It's always used    

Steps to Reproduce:

    1. Deploy with IPI on Power VS
    2. Use bluemix-go
    3.
    

Actual results:

    bluemix-go is used

Expected results:

bluemix-go should be avoided    

Additional info:

    

This is a clone of issue OCPBUGS-35504. The following is the description of the original issue:

Description of problem:

The BYO Public IPv4 feature[1] for AWS, added in the Terraform-based implementation[2], was merged upstream in CAPI/CAPA[3] after the branch cut. The installer PR supporting CAPA provisioning of BYO IPv4 was also merged[4] in the active branch (4.17).

The feature is exercised by CI tests[5][6]; the step[7] runs by default on CI jobs to consume from the existing CI IPv4 pool when the Terraform-based implementation is used.

[1] https://issues.redhat.com/browse/OCPSTRAT-1154
[2] https://issues.redhat.com/browse/SPLAT-1432 https://github.com/openshift/installer/pull/7983
[3] https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/4905
[4] https://github.com/openshift/installer/pull/7983
[5] https://github.com/openshift/release/pull/48467
[6] https://github.com/openshift/release/pull/50653
[7] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_installer/8592/pull-ci-openshift-installer-master-e2e-aws-ovn/1801525554881499136/artifacts/e2e-aws-ovn/ipi-conf-aws-byo-ipv4-pool-public/build-log.txt

Version-Release number of selected component (if applicable):

4.16

How reproducible:

always

Steps to Reproduce:

1. Create a cluster with platform.aws.publicIpv4Pool set in install-config.yaml (see the snippet after these steps).
2. Create the cluster with CAPA on 4.16.
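
A minimal install-config.yaml snippet for step 1 (region and pool ID are placeholders):

platform:
  aws:
    region: us-east-1
    publicIpv4Pool: ipv4pool-ec2-0123456789abcdef0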

Actual results:

the field will be ignored

Expected results:

The installer provisions resources claiming public IPv4 IPs from the custom pools provided by AWS.

Additional info:

 

 

Description of problem:

Automate E2E tests of Dynamic OVS Pinning. This bug is created for merging 

https://github.com/openshift/cluster-node-tuning-operator/pull/746

Version-Release number of selected component (if applicable):

4.15.0

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-33651. The following is the description of the original issue:

Description of problem:

The oc command cannot be used on a RHEL 8-based bastion.

Version-Release number of selected component (if applicable):

    4.16.0-rc.1

How reproducible:

    Very

Steps to Reproduce:

    1. Have a bastion for z/VM installation at Red Hat Enterprise Linux release 8.9 (Ootpa) 
    2. Download and install the 4.16.0-rc.1 client on the bastion
    3.Attempt to use the oc command
    

Actual results:

    oc get nodes
oc: /lib64/libc.so.6: version `GLIBC_2.33' not found (required by oc)
oc: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by oc)
oc: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by oc)

Expected results:

    oc command returns without error

Additional info:

    This was introduced in 4.16.0-rc.1 - 4.16.0-rc.0 works fine
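
A quick way to confirm the mismatch (commands illustrative; requires binutils on the bastion) is to compare the glibc versions the binary needs with what RHEL 8 provides:

$ objdump -T "$(command -v oc)" | grep -o 'GLIBC_[0-9.]*' | sort -uV | tail -3
GLIBC_2.32
GLIBC_2.33
GLIBC_2.34
$ rpm -q glibc
glibc-2.28-<release>.el8.x86_64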

This is a clone of issue OCPBUGS-41941. The following is the description of the original issue:

This is a clone of issue OCPBUGS-41936. The following is the description of the original issue:

Description of problem:

IBM Cloud CCM was reconfigured to use loopback as the bind address in 4.16. However, the liveness probe was not configured to use loopback too, so the CCM constantly fails the liveness probe and restarts continuously.    

Version-Release number of selected component (if applicable):

    4.17

How reproducible:

    100%

Steps to Reproduce:

    1. Create a IPI cluster on IBM Cloud
    2. Watch the IBM Cloud CCM pod and restarts, increase every 5 mins (liveness probe timeout)
    

Actual results:

    # oc --kubeconfig cluster-deploys/eu-de-4.17-rc2-3/auth/kubeconfig get po -n openshift-cloud-controller-manager
NAME                                            READY   STATUS             RESTARTS          AGE
ibm-cloud-controller-manager-58f7747d75-j82z8   0/1     CrashLoopBackOff   262 (39s ago)     23h
ibm-cloud-controller-manager-58f7747d75-l7mpk   0/1     CrashLoopBackOff   261 (2m30s ago)   23h



  Normal   Killing     34m (x2 over 40m)    kubelet            Container cloud-controller-manager failed liveness probe, will be restarted
  Normal   Pulled      34m (x2 over 40m)    kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5ac9fb24a0e051aba6b16a1f9b4b3f9d2dd98f33554844953dd4d1e504fb301e" already present on machine
  Normal   Created     34m (x3 over 45m)    kubelet            Created container cloud-controller-manager
  Normal   Started     34m (x3 over 45m)    kubelet            Started container cloud-controller-manager
  Warning  Unhealthy   29m (x8 over 40m)    kubelet            Liveness probe failed: Get "https://10.242.129.4:10258/healthz": dial tcp 10.242.129.4:10258: connect: connection refused
  Warning  ProbeError  3m4s (x22 over 40m)  kubelet            Liveness probe error: Get "https://10.242.129.4:10258/healthz": dial tcp 10.242.129.4:10258: connect: connection refused
body:

Expected results:

    CCM runs continuously, as it does on 4.15

# oc --kubeconfig cluster-deploys/eu-de-4.15.10-1/auth/kubeconfig get po -n openshift-cloud-controller-manager
NAME                                            READY   STATUS    RESTARTS   AGE
ibm-cloud-controller-manager-66d4779cb8-gv8d4   1/1     Running   0          63m
ibm-cloud-controller-manager-66d4779cb8-pxdrs   1/1     Running   0          63m

Additional info:

    IBM Cloud has a PR open to fix the liveness probe.
https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/360
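
For illustration only (values inferred from the probe errors above; the Deployment is operator-managed, so the actual fix is the linked PR), the probe needs to target loopback to match the new bind address:

livenessProbe:
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 10258
    scheme: HTTPS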

Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/197

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

We would like to include the CEL IP and CIDR validations in 4.16. They have been merged upstream and can be backported into OpenShift to improve our validation downstream.

Upstream PR: https://github.com/kubernetes/kubernetes/pull/121912
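
As a sketch of what this enables (the CRD field and rule are hypothetical), CRD authors can then use the new CEL IP/CIDR library functions in validation rules:

x-kubernetes-validations:
- rule: "isIP(self) && ip(self).family() == 4"
  message: "must be a valid IPv4 address"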

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/sdn/pull/599

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

A regression was identified creating LoadBalancer services in ARO in new 4.14 clusters (handled for new installations in OCPBUGS-24191)

The same regression has been also confirmed in ARO clusters upgraded to 4.14

Version-Release number of selected component (if applicable):

4.14.z

How reproducible:

On any ARO cluster upgraded to 4.14.z    

Steps to Reproduce:

    1. Install an ARO cluster
    2. Upgrade to 4.14 from fast channel
    3. oc create svc loadbalancer test-lb -n default --tcp 80:8080

Actual results:

# External-IP stuck in Pending
$ oc get svc test-lb -n default
NAME      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
test-lb   LoadBalancer   172.30.104.200   <pending>     80:30062/TCP   15m


# Errors in cloud-controller-manager being unable to map VM to nodes
$ oc logs -l infrastructure.openshift.io/cloud-controller-manager=Azure  -n openshift-cloud-controller-manager
I1215 19:34:51.843715       1 azure_loadbalancer.go:1533] reconcileLoadBalancer for service(default/test-lb) - wantLb(true): started
I1215 19:34:51.844474       1 event.go:307] "Event occurred" object="default/test-lb" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I1215 19:34:52.253569       1 azure_loadbalancer_repo.go:73] LoadBalancerClient.List(aro-r5iks3dh) success
I1215 19:34:52.253632       1 azure_loadbalancer.go:1557] reconcileLoadBalancer for service(default/test-lb): lb(aro-r5iks3dh/mabad-test-74km6) wantLb(true) resolved load balancer name
I1215 19:34:52.528579       1 azure_vmssflex_cache.go:162] Could not find node () in the existing cache. Forcely freshing the cache to check again...
E1215 19:34:52.714678       1 azure_vmssflex.go:379] fs.GetNodeNameByIPConfigurationID(/subscriptions/fe16a035-e540-4ab7-80d9-373fa9a3d6ae/resourceGroups/aro-r5iks3dh/providers/Microsoft.Network/networkInterfaces/mabad-test-74km6-master0-nic/ipConfigurations/pipConfig) failed. Error: failed to map VM Name to NodeName: VM Name mabad-test-74km6-master-0
E1215 19:34:52.714888       1 azure_loadbalancer.go:126] reconcileLoadBalancer(default/test-lb) failed: failed to map VM Name to NodeName: VM Name mabad-test-74km6-master-0
I1215 19:34:52.714956       1 azure_metrics.go:115] "Observed Request Latency" latency_seconds=0.871261893 request="services_ensure_loadbalancer" resource_group="aro-r5iks3dh" subscription_id="fe16a035-e540-4ab7-80d9-373fa9a3d6ae" source="default/test-lb" result_code="failed_ensure_loadbalancer"
E1215 19:34:52.715005       1 controller.go:291] error processing service default/test-lb (will retry): failed to ensure load balancer: failed to map VM Name to NodeName: VM Name mabad-test-74km6-master-0

Expected results:

# The LoadBalancer gets an External-IP assigned
$ oc get svc test-lb -n default 
NAME         TYPE           CLUSTER-IP       EXTERNAL-IP                            PORT(S)        AGE 
test-lb      LoadBalancer   172.30.193.159   20.242.180.199                         80:31475/TCP   14s

Additional info:

In cloud-provider-config cm in openshift-config namespace, vmType=""

When vmType gets changed to "standard" explicitly, the provisioning of the LoadBalancer completes and an ExternalIP gets assigned without errors.
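
A sketch of checking and applying the workaround (the data key and JSON field are assumed from the usual Azure cloud-provider-config layout):

$ oc -n openshift-config get cm cloud-provider-config -o jsonpath='{.data.config}' | grep -i vmType
$ oc -n openshift-config edit cm cloud-provider-config    # set "vmType": "standard"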

This is a clone of issue OCPBUGS-42362. The following is the description of the original issue:

This is a clone of issue OCPBUGS-42106. The following is the description of the original issue:

Description of problem:

Test Platform has detected a large increase in the amount of time spent waiting for pull secrets to be initialized.
Monitoring the audit log, we can see nearly continuous updates to the SA pull secrets in the cluster (~2 per minute for every SA pull secret in the cluster).

Controller manager is filled with entries like: 
- "Internal registry pull secret auth data does not contain the correct number of entries" ns="ci-op-tpd3xnbx" name="deployer-dockercfg-p9j54" expected=5 actual=4"
- "Observed image registry urls" urls=["172.30.228.83:5000","image-registry.openshift-image-registry.svc.cluster.local:5000","image-registry.openshift-image-registry.svc:5000","registry.build01.ci.openshift.org","registry.build01.ci.openshift.org"

In this "Observed image registry urls" log line, notice the duplicate entries for "registry.build01.ci.openshift.org" . We are not sure what is causing this but it leads to duplicate entry, but when actualized in a pull secret map, the double entry is reduced to one. So the controller-manager finds the cardinality mismatch on the next check.

The duplication is evident in OpenShiftControllerManager/cluster:
      dockerPullSecret:
        internalRegistryHostname: image-registry.openshift-image-registry.svc:5000
        registryURLs:
        - registry.build01.ci.openshift.org
        - registry.build01.ci.openshift.org


But there is only one hostname in config.imageregistry.operator.openshift.io/cluster:
  routes:
  - hostname: registry.build01.ci.openshift.org
    name: public-routes
    secretName: public-route-tls
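
A sketch for spotting the duplication directly (the field path is assumed from the observed config shown above):

$ oc get openshiftcontrollermanager cluster -o json | jq '.spec.observedConfig.dockerPullSecret.registryURLs'
[
  "registry.build01.ci.openshift.org",
  "registry.build01.ci.openshift.org"
]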

Version-Release number of selected component (if applicable):

4.17.0-rc.3

How reproducible:

Constant on build01 but not on other build farms

Steps to Reproduce:

    1. Something ends up creating duplicate entries in the observed configuration of the openshift-controller-manager.
    2.
    3.
    

Actual results:

- Approximately 400K secret patches an hour on build01 vs ~40K on other build farms. Initialization times have increased by two orders of magnitude in new ci-operator namespaces.
- The openshift-controller-manager is hot looping and experiencing client throttling.

Expected results:

1. Initialization of pull secrets in a namespace should take < 1 seconds. On build01, it can take over 1.5 minutes.
2. openshift-controller-manager should not possess duplicate entries.
3. If duplicate entries are a configuration error, openshift-controller-manager should de-dupe the entries.
4. There should be alerting when the openshift-controller-manager experiences client-side throttling / pathological behavior.

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-api-provider-aws/pull/491

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

A table in a dashboard relies on the order of the metric labels to merge results

How to Reproduce:

Create a dashboard with a table including this query:

label_replace(sort_desc(sum(sum_over_time(ALERTS{alertstate="firing"}[24h])) by ( alertstate, alertname)), "aaa", "$1", "alertstate", "(.+)") 

A single row will be displayed as the query is simulating that the first label `aaa` has a single value.

Expected result:

The table should not rely on a single metric label to merge results but consider all the labels so the expected rows are displayed.

 

I was identifying what remains with:

cat e2e-events_20231204-183144.json| jq '.items[] | select(has("tempSource") | not)'

I think I've cleared all the difficult ones, hopefully these are just simple stragglers.

The agent-based installer and assisted-installer create a Deployment named assisted-installer-controller in the assisted-installer namespace. This deployment is responsible for running the assisted-installer-controller to finalise the installation, mainly by updating the status of the Nodes in the assisted-service API. It's also required to be able to install platform:vsphere without credentials in 4.13 and above.

We want the logs for this pod to be included in the must-gather file, so that we can easily debug any installation issues caused by this process. Currently it is not.

Please review the following PR: https://github.com/openshift/cluster-kube-scheduler-operator/pull/515

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:
Issue - profiles are degraded [1] even after being applied, due to the error [2] below:

[1]

$oc get profile -A
NAMESPACE                                NAME                                          TUNED                APPLIED   DEGRADED   AGE
openshift-cluster-node-tuning-operator   master0    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master1    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master2    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   worker0    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker1    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker10   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker11   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker12   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker13   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker14   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker15   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker2    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker3    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker4  rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker5    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker6    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker7    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker8   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker9   rdpmc-patch-worker   True      True       5d

[2]

  lastTransitionTime: "2023-12-05T22:43:12Z"
    message: TuneD daemon issued one or more sysctl override message(s) during profile
      application. Use reapply_sysctl=true or remove conflicting sysctl net.core.rps_default_mask
    reason: TunedSysctlOverride
    status: "True"

If we look at the nodes using the rdpmc-patch-master tuned profile:

NAMESPACE                                NAME                                          TUNED                APPLIED   DEGRADED   AGE
openshift-cluster-node-tuning-operator   master0    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master1    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master2    rdpmc-patch-master   True      True       5d

We configure the following in the rdpmc-patch-master tuned:

$ oc get tuned rdpmc-patch-master -n openshift-cluster-node-tuning-operator -oyaml |less
spec:
  profile:
  - data: |
      [main]
      include=performance-patch-master
      [sysfs]
      /sys/devices/cpu/rdpmc = 2
    name: rdpmc-patch-master
  recommend:

Below is performance-patch-master, which is included in the above tuned:

spec:
  profile:
  - data: |
      [main]
      summary=Custom tuned profile to adjust performance
      include=openshift-node-performance-master-profile
      [bootloader]
      cmdline_removeKernelArgs=-nohz_full=${isolated_cores}

Below (the setting that appears in the error) is in openshift-node-performance-master-profile, which is included in the above tuned:

net.core.rps_default_mask=${not_isolated_cpumask}

RHEL BUg has been raised for the same https://issues.redhat.com/browse/RHEL-18972

    Version-Release number of selected component (if applicable):
4.14
    

Description of problem:

    The current api version used by the registry operator does not include the recently added "ChunkSizeMiB" feature gate. We need to bump the openshift/api to latest so that this feature gate becomes available for use.
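
A sketch of the dependency bump (the exact ref or tag to pin is a maintainer decision):

$ go get github.com/openshift/api@latest
$ go mod tidy && go mod vendor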

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of the problem:

In .../openshift-versions?only_latest=true, the multi-arch release images are not included in the response.

How reproducible:

Always

Steps to reproduce:

1. Run master assisted-service

2. curl ".../openshift-versions?only_latest=true"

 

Actual results:

{
  "4.10.67": {
    "cpu_architectures": [
      "x86_64"
    ],
    "display_name": "4.10.67",
    "support_level": "production"
  },
  "4.11.58": {
    "cpu_architectures": [
      "x86_64"
    ],
    "display_name": "4.11.58",
    "support_level": "production"
  },
  "4.12.53": {
    "cpu_architectures": [
      "x86_64"
    ],
    "display_name": "4.12.53",
    "support_level": "production"
  },
  "4.13.38": {
    "cpu_architectures": [
      "x86_64"
    ],
    "display_name": "4.13.38",
    "support_level": "production"
  },
  "4.14.18": {
    "cpu_architectures": [
      "x86_64"
    ],
    "display_name": "4.14.18",
    "support_level": "production"
  },
  "4.15.3": {
    "cpu_architectures": [
      "x86_64"
    ],
    "default": true,
    "display_name": "4.15.3",
    "support_level": "production"
  },
  "4.16.0-ec.4": {
    "cpu_architectures": [
      "x86_64"
    ],
    "display_name": "4.16.0-ec.4",
    "support_level": "beta"
  },
  "4.9.59": {
    "cpu_architectures": [
      "x86_64"
    ],
    "display_name": "4.9.59",
    "support_level": "production"
  }
}

Expected results:

{
  "4.10.67": {
    "cpu_architectures": [
      "x86_64",
      "arm64"
    ],
    "display_name": "4.10.67",
    "support_level": "production"
  },
  "4.11.0-multi": {
    "cpu_architectures": [
      "x86_64",
      "arm64",
      "ppc64le",
      "s390x"
    ],
    "display_name": "4.11.0-multi",
    "support_level": "production"
  },
  "4.11.58": {
    "cpu_architectures": [
      "x86_64",
      "arm64"
    ],
    "display_name": "4.11.58",
    "support_level": "production"
  },
  "4.12.53": {
    "cpu_architectures": [
      "x86_64",
      "arm64"
    ],
    "display_name": "4.12.53",
    "support_level": "production"
  },
  "4.12.53-multi": {
    "cpu_architectures": [
      "x86_64",
      "arm64",
      "ppc64le",
      "s390x"
    ],
    "display_name": "4.12.53-multi",
    "support_level": "production"
  },
  "4.13.38": {
    "cpu_architectures": [
      "x86_64",
      "arm64"
    ],
    "display_name": "4.13.38",
    "support_level": "production"
  },
  "4.13.38-multi": {
    "cpu_architectures": [
      "x86_64",
      "arm64",
      "ppc64le",
      "s390x"
    ],
    "display_name": "4.13.38-multi",
    "support_level": "production"
  },
  "4.14.18": {
    "cpu_architectures": [
      "x86_64",
      "arm64"
    ],
    "display_name": "4.14.18",
    "support_level": "production"
  },
  "4.14.18-multi": {
    "cpu_architectures": [
      "x86_64",
      "arm64",
      "ppc64le",
      "s390x"
    ],
    "display_name": "4.14.18-multi",
    "support_level": "production"
  },
  "4.15.3": {
    "cpu_architectures": [
      "x86_64",
      "arm64"
    ],
    "default": true,
    "display_name": "4.15.3",
    "support_level": "production"
  },
  "4.15.3-multi": {
    "cpu_architectures": [
      "x86_64",
      "arm64",
      "ppc64le",
      "s390x"
    ],
    "display_name": "4.15.3-multi",
    "support_level": "production"
  },
  "4.16.0-ec.4": {
    "cpu_architectures": [
      "x86_64",
      "arm64"
    ],
    "display_name": "4.16.0-ec.4",
    "support_level": "beta"
  },
  "4.16.0-ec.4-multi": {
    "cpu_architectures": [
      "x86_64",
      "arm64",
      "ppc64le",
      "s390x"
    ],
    "display_name": "4.16.0-ec.4-multi",
    "support_level": "beta"
  },
  "4.9.59": {
    "cpu_architectures": [
      "x86_64"
    ],
    "display_name": "4.9.59",
    "support_level": "production"
  }
}

Seeing CI jobs with 

> level=error msg=ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 3 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).

 

search shows 65 hits in the last 7 days

https://search.dptools.openshift.org/?search=Got+0+worker+nodes%2C+3+master+nodes&maxAge=168h&context=-1&type=build-log&name=metal-ipi&excludeName=&maxMatches=1&maxBytes=20971520&groupBy=none

This is a clone of issue OCPBUGS-35293. The following is the description of the original issue:

Description of problem:


4.16 installs fail for ROSA STS installations

time="2024-06-11T14:05:48Z" level=debug msg="\t[failed to apply security groups to load balancer \"jamesh-sts-52g29-int\": AccessDenied: User: arn:aws:sts::476950216884:assumed-role/ManagedOpenShift-Installer-Role/1718114695748673685 is not authorized to perform: elasticloadbalancing:SetSecurityGroups on resource: arn:aws:elasticloadbalancing:us-east-1:476950216884:loadbalancer/net/jamesh-sts-52g29-int/bf7ef748daa739ce because no identity-based policy allows the elasticloadbalancing:SetSecurityGroups action"

Version-Release number of selected component (if applicable):


4.16+

How reproducible:


Every time

Steps to Reproduce:

1. Create an installer policy with the permissions listed in the installer [here|https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/aws/permissions.go]
2. Run a install in AWS IPI

Actual results:


The installer fails to install a cluster in AWS

The installer log should show AccessDenied messages for the IAM action elasticloadbalancing:SetSecurityGroups 

The installer should show the error message "failed to apply security groups to load balancer"

Expected results:


Install completes successfully

Additional info:


Managed OpenShift (ROSA) installs STS clusters with [this|https://github.com/openshift/managed-cluster-config/blob/master/resources/sts/4.16/sts_installer_permission_policy.json] permission policy for the installer which should be what is required from the installer [policy|https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/aws/permissions.go] plus permissions needed for OCM to do pre install validation.
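
A sketch of the missing IAM statement (shown with a wildcard resource for brevity; it can be scoped more tightly) that the installer/ROSA policy needs:

{
  "Effect": "Allow",
  "Action": ["elasticloadbalancing:SetSecurityGroups"],
  "Resource": "*"
}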

Description of problem:
While upgrading a loaded 250-node ROSA cluster from 4.13.13 to 4.14.rc2, the cluster failed to upgrade and was stuck while the network operator was trying
to upgrade.
Around 20 multus pods were in CrashLoopBackOff state with the following log:

oc logs multus-4px8t
2023-10-10T00:54:34+00:00 [cnibincopy] Successfully copied files in /usr/src/multus-cni/rhel9/bin/ to /host/opt/cni/bin/upgrade_6dcb644a-4164-42a5-8f1e-4ae2c04dc315
2023-10-10T00:54:34+00:00 [cnibincopy] Successfully moved files in /host/opt/cni/bin/upgrade_6dcb644a-4164-42a5-8f1e-4ae2c04dc315 to /host/opt/cni/bin/
2023-10-10T00:54:34Z [verbose] multus-daemon started
2023-10-10T00:54:34Z [verbose] Readiness Indicator file check
2023-10-10T00:55:19Z [error] have you checked that your default network is ready? still waiting for readinessindicatorfile @ /host/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition

Please review the following PR: https://github.com/openshift/azure-disk-csi-driver/pull/65

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Operation cannot be fulfilled on networks.operator.openshift.io during OVN live migration
    

Version-Release number of selected component (if applicable):

    

How reproducible:

Not always

Steps to Reproduce:

1. Enable the egressfirewall, externalIP, multicast, multus, network-policy, and service-idle features.
2. Start migrating the cluster from SDN to OVN.
     

Actual results:

[weliang@weliang ~]$ oc delete validatingwebhookconfigurations.admissionregistration.k8s.io/sre-techpreviewnoupgrade-validation
validatingwebhookconfiguration.admissionregistration.k8s.io "sre-techpreviewnoupgrade-validation" deleted
[weliang@weliang ~]$ oc edit featuregate cluster
featuregate.config.openshift.io/cluster edited
[weliang@weliang ~]$ oc get node
NAME                          STATUS   ROLES                  AGE   VERSION
ip-10-0-20-154.ec2.internal   Ready    control-plane,master   86m   v1.28.5+9605db4
ip-10-0-45-93.ec2.internal    Ready    worker                 80m   v1.28.5+9605db4
ip-10-0-49-245.ec2.internal   Ready    worker                 74m   v1.28.5+9605db4
ip-10-0-57-37.ec2.internal    Ready    infra,worker           60m   v1.28.5+9605db4
ip-10-0-60-0.ec2.internal     Ready    infra,worker           60m   v1.28.5+9605db4
ip-10-0-62-121.ec2.internal   Ready    control-plane,master   86m   v1.28.5+9605db4
ip-10-0-62-56.ec2.internal    Ready    control-plane,master   86m   v1.28.5+9605db4
[weliang@weliang ~]$ for f in $(oc get nodes -o jsonpath='{.items[*].metadata.name}') ; do oc debug node/"${f}" --  chroot /host cat /etc/kubernetes/kubelet.conf | grep NetworkLiveMigration ; done
Starting pod/ip-10-0-20-154ec2internal-debug-9wvd8 ...
To use host binaries, run `chroot /host`Removing debug pod ...
    "NetworkLiveMigration": true,
Starting pod/ip-10-0-45-93ec2internal-debug-rwvls ...
To use host binaries, run `chroot /host`
    "NetworkLiveMigration": true,Removing debug pod ...
Starting pod/ip-10-0-49-245ec2internal-debug-rp9dt ...
To use host binaries, run `chroot /host`Removing debug pod ...
    "NetworkLiveMigration": true,
Starting pod/ip-10-0-57-37ec2internal-debug-q5thk ...
To use host binaries, run `chroot /host`Removing debug pod ...
    "NetworkLiveMigration": true,
Starting pod/ip-10-0-60-0ec2internal-debug-zp78h ...
To use host binaries, run `chroot /host`Removing debug pod ...
    "NetworkLiveMigration": true,
Starting pod/ip-10-0-62-121ec2internal-debug-42k2g ...
To use host binaries, run `chroot /host`Removing debug pod ...
    "NetworkLiveMigration": true,
Starting pod/ip-10-0-62-56ec2internal-debug-s99ls ...
To use host binaries, run `chroot /host`Removing debug pod ...
    "NetworkLiveMigration": true,
[weliang@weliang ~]$ oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/live-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
network.config.openshift.io/cluster patched
[weliang@weliang ~]$ 
[weliang@weliang ~]$ oc get co network
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
network   4.15.0-0.nightly-2024-01-06-062415   True        False         True       4h1m    Internal error while updating operator configuration: could not apply (/, Kind=) /cluster, err: failed to apply / update (operator.openshift.io/v1, Kind=Network) /cluster: Operation cannot be fulfilled on networks.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
[weliang@weliang ~]$ oc get node
NAME                          STATUS   ROLES                  AGE     VERSION
ip-10-0-2-52.ec2.internal     Ready    worker                 3h54m   v1.28.5+9605db4
ip-10-0-26-16.ec2.internal    Ready    control-plane,master   4h2m    v1.28.5+9605db4
ip-10-0-32-116.ec2.internal   Ready    worker                 3h54m   v1.28.5+9605db4
ip-10-0-32-67.ec2.internal    Ready    infra,worker           3h38m   v1.28.5+9605db4
ip-10-0-35-11.ec2.internal    Ready    infra,worker           3h39m   v1.28.5+9605db4
ip-10-0-39-125.ec2.internal   Ready    control-plane,master   4h2m    v1.28.5+9605db4
ip-10-0-6-117.ec2.internal    Ready    control-plane,master   4h2m    v1.28.5+9605db4
[weliang@weliang ~]$ oc get Network.operator.openshift.io/cluster -o json
{
    "apiVersion": "operator.openshift.io/v1",
    "kind": "Network",
    "metadata": {
        "creationTimestamp": "2024-01-08T13:28:07Z",
        "generation": 417,
        "name": "cluster",
        "resourceVersion": "236888",
        "uid": "37fb36f0-c13c-476d-aea1-6ebc1c87abe8"
    },
    "spec": {
        "clusterNetwork": [
            {
                "cidr": "10.128.0.0/14",
                "hostPrefix": 23
            }
        ],
        "defaultNetwork": {
            "openshiftSDNConfig": {
                "enableUnidling": true,
                "mode": "NetworkPolicy",
                "mtu": 8951,
                "vxlanPort": 4789
            },
            "ovnKubernetesConfig": {
                "egressIPConfig": {},
                "gatewayConfig": {
                    "ipv4": {},
                    "ipv6": {},
                    "routingViaHost": false
                },
                "genevePort": 6081,
                "mtu": 8901,
                "policyAuditConfig": {
                    "destination": "null",
                    "maxFileSize": 50,
                    "maxLogFiles": 5,
                    "rateLimit": 20,
                    "syslogFacility": "local0"
                }
            },
            "type": "OVNKubernetes"
        },
        "deployKubeProxy": false,
        "disableMultiNetwork": false,
        "disableNetworkDiagnostics": false,
        "kubeProxyConfig": {
            "bindAddress": "0.0.0.0"
        },
        "logLevel": "Normal",
        "managementState": "Managed",
        "migration": {
            "mode": "Live",
            "networkType": "OVNKubernetes"
        },
        "observedConfig": null,
        "operatorLogLevel": "Normal",
        "serviceNetwork": [
            "172.30.0.0/16"
        ],
        "unsupportedConfigOverrides": null,
        "useMultiNetworkPolicy": false
    },
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "2024-01-08T13:28:07Z",
                "status": "False",
                "type": "ManagementStateDegraded"
            },
            {
                "lastTransitionTime": "2024-01-08T17:29:52Z",
                "status": "False",
                "type": "Degraded"
            },
            {
                "lastTransitionTime": "2024-01-08T13:28:07Z",
                "status": "True",
                "type": "Upgradeable"
            },
            {
                "lastTransitionTime": "2024-01-08T17:26:38Z",
                "status": "False",
                "type": "Progressing"
            },
            {
                "lastTransitionTime": "2024-01-08T13:28:20Z",
                "status": "True",
                "type": "Available"
            }
        ],
        "readyReplicas": 0,
        "version": "4.15.0-0.nightly-2024-01-06-062415"
    }
}
[weliang@weliang ~]$ 

Expected results:

OVN live migration pass

Additional info:

must-gather: https://people.redhat.com/~weliang/must-gather1.tar.gz

This is a clone of issue OCPBUGS-43389. The following is the description of the original issue:

This is a clone of issue OCPBUGS-43157. The following is the description of the original issue:

Description of problem:

    When running the `make fmt` target in the repository, the command can fail due to a mismatch between the Go toolchain version and the version required by the goimports dependency.

 

Version-Release number of selected component (if applicable):

    4.16.z

How reproducible:

    always

Steps to Reproduce:

    1.checkout release-4.16 branch
    2.run `make fmt`
    

Actual results:

INFO[2024-10-01T14:41:15Z] make fmt make[1]: Entering directory '/go/src/github.com/openshift/cluster-cloud-controller-manager-operator' hack/goimports.sh go: downloading golang.org/x/tools v0.25.0 go: golang.org/x/tools/cmd/goimports@latest: golang.org/x/tools@v0.25.0 requires go >= 1.22.0 (running go 1.21.11; GOTOOLCHAIN=local) 

Expected results:

    successful completion of `make fmt`

Additional info:

    Our goimports.sh script references `goimports@latest`, which means that this problem will most likely affect older branches as well. We will need to set a specific version of the goimports package for those branches.

Given that the CCCMO already includes golangci-lint and uses it for a test, we should include goimports through golangci-lint, which solves this problem without needing to pin special versions of goimports.
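A minimal sketch of what that could look like, assuming a golangci-lint v1-style config file (this is illustrative, not the repo's actual configuration):

```yaml
# Illustrative .golangci.yaml fragment: let golangci-lint carry goimports so
# goimports.sh no longer needs `go run golang.org/x/tools/cmd/goimports@latest`
# with a potentially mismatched toolchain.
linters:
  enable:
    - goimports
linters-settings:
  goimports:
    # Group local imports separately; this prefix is the repo module path.
    local-prefixes: github.com/openshift/cluster-cloud-controller-manager-operator
```

`make fmt` could then invoke `golangci-lint run --fix` instead of downloading goimports separately.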

Description of problem:

When Hypershift runs a large number of hosted clusters (> 370), the management cluster ETCD fills up and Hypershift begins to fail. One way to reduce the ETCD size while improving its performance is to reduce the number of stored objects, such as config-maps.

Currently, if a hosted cluster's NodePool needs to reference multiple MachineConfig objects, each of those MachineConfigs has to be in its own config-map (referenced in the NodePool spec.config). To reduce the number of config-maps, Hypershift needs the ability to extract multiple MachineConfig objects from a single config-map.

Currently, if multiple MachineConfig objects are placed into a config-map, only the first one is recognized by the NodePool controller; all others are ignored.

A NodePool controller code fix is required to support multiple MachineConfig objects in ignition-config config-maps.

Version-Release number of selected component (if applicable):

    

How reproducible:

Always    

Steps to Reproduce:

    1. Create Hypershift-hosted cluster
    2. Patch the cluster's default NodePool's spec.config to reference a single config-map that has multiple MachineConfig yamls inside it.
    3. Obtain ignition data from the ignition server.

Actual results:

    The ignition has data from the first MachineConfig object inside the config-map, but all other MachineConfig objects are not there.

Expected results:

    The ignition should include all MachineConfig objects inside the config-map.

Additional info:

Example of NodePool:
```
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
...
spec:
  arch: amd64
  clusterName: cg319sf10ghnddkvo8j0
  config:
  - name: ignition-config-98-ibm-machineconfig-cg319sf10ghnddkvo8j0    
...
```

The ignition-config-98-ibm-machineconfig-cg319sf10ghnddkvo8j0 config-map has multiple MachineConfig yamls inside it separated by "---":
```
apiVersion: v1
data:
  config: |+
    ---
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 97-ibm-machineconfig-base
    spec:
      config:
        ignition:
          version: 2.2.0
        storage:
          files:
          - contents:
...
    ---
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 98-ibm-machineconfig-satellite
    spec:
      config:
        ignition:
          version: 2.2.0
        storage:
...
```

Currently only the first MachineConfig, "97-ibm-machineconfig-base", is processed; the other one, "98-ibm-machineconfig-satellite", is skipped.
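A minimal sketch of how every document in such a config-map value could be decoded instead of only the first one, using the standard apimachinery multi-document YAML decoder (illustrative only, not the actual NodePool controller code):

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	mcfgv1 "github.com/openshift/api/machineconfiguration/v1"
	"k8s.io/apimachinery/pkg/util/yaml"
)

// decodeMachineConfigs returns every MachineConfig found in a multi-document
// YAML string (documents separated by "---"), not just the first one.
func decodeMachineConfigs(data string) ([]mcfgv1.MachineConfig, error) {
	decoder := yaml.NewYAMLOrJSONDecoder(bytes.NewBufferString(data), 4096)
	var configs []mcfgv1.MachineConfig
	for {
		var mc mcfgv1.MachineConfig
		if err := decoder.Decode(&mc); err != nil {
			if err == io.EOF {
				break
			}
			return nil, err
		}
		// Leading/trailing "---" separators produce empty documents; skip them.
		if mc.Kind == "" {
			continue
		}
		configs = append(configs, mc)
	}
	return configs, nil
}

func main() {
	// In the real controller this would be the "config" key of the
	// ignition-config config-map referenced by the NodePool.
	data := `---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 97-ibm-machineconfig-base
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 98-ibm-machineconfig-satellite
`
	configs, err := decodeMachineConfigs(data)
	if err != nil {
		panic(err)
	}
	for _, mc := range configs {
		fmt.Println(mc.Name) // prints both names, not just the first
	}
}
```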

 

This is a clone of issue OCPBUGS-41580. The following is the description of the original issue:

This is a clone of issue OCPBUGS-39133. The following is the description of the original issue:

Description of problem:

Debugging https://issues.redhat.com/browse/OCPBUGS-36808 (the Metrics API failing some of the disruption checks) and taking https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-monitoring-operator-2439-ci-4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade/1824454734052855808 as a reproducer of the issue, I think the Kube-aggregator is behind the problem.

According to the disruption checks which forward some relevant errors from the apiserver in the logs, looking at one of the new-connections check failures (from https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/openshift-cluster-monitoring-operator-2439-ci-4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade/1824454734052855808/artifacts/e2e-aws-ovn-upgrade-2/openshift-e2e-test/artifacts/junit/backend-disruption_20240816-155051.json)

> 	"Aug 16 *16:43:17.672* - 2s    E backend-disruption-name/metrics-api-new-connections connection/new disruption/openshift-tests reason/DisruptionBegan request-audit-id/c62b7d32-856f-49de-86f5-1daed55326b2 backend-disruption-name/metrics-api-new-connections connection/new disruption/openshift-tests stopped responding to GET requests over new connections: error running request: 503 Service Unavailable: error trying to reach service: dial tcp 10.128.2.31:10250: connect: connection refused"

The "error trying to reach service" part comes from: https://github.com/kubernetes/kubernetes/blob/b3c725627b15bb69fca01b70848f3427aca4c3ef/staging/src/k8s.io/apimachinery/pkg/util/proxy/transport.go#L105, the apiserver failing to reach the metrics-server Pod, the problem is that the IP "10.128.2.31" corresponds to a Pod that was deleted some milliseconds before (as part of a node update/draining), as we can see in:

> 2024-08-16T16:19:43.087Z|00195|binding|INFO|openshift-monitoring_metrics-server-7b9d8c5ddb-dtsmr: Claiming 0a:58:0a:80:02:1f 10.128.2.31
...
I0816 *16:43:17.650083*    2240 kubelet.go:2453] "SyncLoop DELETE" source="api" pods=["openshift-monitoring/metrics-server-7b9d8c5ddb-dtsmr"]
...

The apiserver was using a stale IP to reach a Pod that no longer existed, even though a new Pod that had replaced the deleted one some minutes before (the Metrics API backend runs on 2 Pods) was already available.
According to OVN, a fresher IP 10.131.0.12 of that Pod was already in the endpoints at that time:

> I0816 16:40:24.711048    4651 lb_config.go:1018] Cluster endpoints for openshift-monitoring/metrics-server are: map[TCP/https:{10250 [10.128.2.31 10.131.0.12] []}]

*I think, when "10.128.2.31" failed, the apiserver should have fallen back to "10.131.0.12", maybe it waits for some time/retries before doing so, or maybe it wasn't even aware of "10.131.0.12"*

AFAIU, we have "--enable-aggregator-routing" set by default https://github.com/openshift/cluster-kube-apiserver-operator/blob/37df1b1f80d3be6036b9e31975ac42fcb21b6447/bindata/assets/config/defaultconfig.yaml#L101-L103 on the apiservers, so instead of forwarding to the metrics-server's service, apiserver directly reaches the Pods.

For that it keeps track of the relevant services and endpoints https://github.com/kubernetes/kubernetes/blob/ad8a5f5994c0949b5da4240006d938e533834987/staging/src/k8s.io/kube-aggregator/pkg/apiserver/resolvers.go#L40

Bad decisions may be made if the services and/or endpoints caches are stale.

Looking at the metrics-server (the Metrics API backend) endpoints changes in the apiserver audit logs:

> $ grep -hr Event . | grep "endpoints/metrics-server" | jq -c 'select( .verb | match("watch|update"))' | jq -r '[.requestReceivedTimestamp,.user.username,.verb] | @tsv' | sort
2024-08-16T15:39:57.575468Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T15:40:02.005051Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T15:40:35.085330Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T15:40:35.128519Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T16:19:41.148148Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T16:19:47.797420Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T16:20:23.051594Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T16:20:23.100761Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T16:20:23.938927Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T16:21:01.699722Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T16:39:00.328312Z	system:serviceaccount:kube-system:endpoint-controller	update ==> At around 16:39:XX the first Pod was rolled out
2024-08-16T16:39:07.260823Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T16:39:41.124449Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T16:43:23.701015Z	system:serviceaccount:kube-system:endpoint-controller	update ==> At around 16:43:23, the new Pod that replaced the second one was created
2024-08-16T16:43:28.639793Z	system:serviceaccount:kube-system:endpoint-controller	update
2024-08-16T16:43:47.108903Z	system:serviceaccount:kube-system:endpoint-controller	update

We can see that just before the new-connections checks succeeded again at around "2024-08-16T16:43:23", an UPDATE was received/processed, which may have helped the apiserver sync its endpoints cache and/or choose a healthy Pod.

Also, no update was triggered when the second Pod was deleted at "16:43:17", which may explain the stale 10.128.2.31 endpoints entry on the apiserver side.

To summarize, I can see two problems here (maybe one is the consequence of the other):

    A Pod was deleted and an Endpoints entry pointing to it wasn't updated. Apparently the Endpoints controller had/has some sync issues: https://github.com/kubernetes/kubernetes/issues/125638
    The apiserver resolver had an endpoints cache with one stale and one fresh entry, but it kept trying to reach the stale entry 4-5 times in a row, OR
    The endpoints were updated ("At around 16:39:XX the first Pod was rolled out", see above), but the apiserver resolver cache missed that and ended up with 2 stale entries, and had to wait until "At around 16:43:23, the new Pod that replaced the second one was created" (see above) to sync and replace them with 2 fresh entries.


    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. See "Description of problem"
    2.
    3.
    

Actual results:

    

Expected results:

The kube-aggregator should detect stale APIService endpoints.
    

Additional info:

The kube-aggregator proxies requests to a stale Endpoints entry/Pod, which makes Metrics API requests falsely fail.
    

Description of problem:

    https://issues.redhat.com/browse/MGMT-15691 introduced code restructuring related to the external platform and OCI via PR https://github.com/openshift/assisted-service/pull/5787. Assisted-service needs to be re-vendored in the installer in the 4.16 and 4.17 releases to make sure the assisted-service dependencies are consistent.

The master branch (4.18) does not need this re-vendoring, as it was recently re-vendored via https://github.com/openshift/installer/pull/9058

Version-Release number of selected component (if applicable):

    4.17, 4.16

How reproducible:

   Always 

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-33615. The following is the description of the original issue:

Description of problem:

    The coresPerSocket value set in install-config does not match the actual result. When setting controlPlane.platform.vsphere.cpus to 16 and controlPlane.platform.vsphere.coresPerSocket to 8, the actual result I checked was: "NumCPU": 16, "NumCoresPerSocket": 16. NumCoresPerSocket should match the setting in install-config instead of NumCPU.

Check the setting in VSphereMachine-openshift-cluster-api-guests-wwei1215a-42n48-master-0.yaml, the numcorespersocket is 0:
    numcpus: 16    
    numcorespersocket: 0

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-05-08-222442

How reproducible:

    See description

Steps to Reproduce:

    1.setting coresPerSocket for control plane in install-config. cpu needs to be a multiple of corespersocket.
    2.install the cluster
    

Actual results:

    The NumCoresPerSocket is equal to NumCPU. In the file VSphereMachine-openshift-cluster-api-guests-xxxx-xxxx-master-0.yaml, numcorespersocket is 0, and in the VM setting: "NumCoresPerSocket": 8.

Expected results:

    The NumCoresPerSocket should match the setting in install-config.

Additional info:

installconfig setting:
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    vsphere:
      cpus: 16
      coresPerSocket: 8
check result:     
"Hardware": {          "NumCPU": 16,          "NumCoresPerSocket": 16,
the check result for compute node is expected.
installconfig setting:
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    vsphere:
      cpus: 8
      coresPerSocket: 4
check result:
"Hardware": {          "NumCPU": 8,          "NumCoresPerSocket": 4,

This is a clone of issue OCPBUGS-42060. The following is the description of the original issue:

This is a clone of issue OCPBUGS-41228. The following is the description of the original issue:

Description of problem:

The console crashes when the user selects SSH as the Authentication type for the Git server under "Add secret" in the Start Pipeline form.

Version-Release number of selected component (if applicable):

    

How reproducible:

Every time. Only in the Developer perspective and only if the Pipelines dynamic plugin is enabled.
    

Steps to Reproduce:

    1. Create a pipeline through add flow and open start pipeline page 
    2. Under show credentials select add secret
    3. In the secret form, set `Access to` to Git server and `Authentication type` to SSH key
    

Actual results:

Console crashes
    

Expected results:

UI should work as expected
    

Additional info:

Attaching console log screenshot
    

https://drive.google.com/file/d/1bGndbq_WLQ-4XxG5ylU7VuZWZU15ywTI/view?usp=sharing

Steps to Reproduce:

1. Install a cluster using Azure Workload Identity
2. Check the value of the cco_credentials_mode metric

Actual results:

mode = manual    

Expected results:

mode = manualpodidentity

Additional info:

The cco_credentials_mode metric reports manualpodidentity mode for an AWS STS cluster. 

This is a clone of issue OCPBUGS-33539. The following is the description of the original issue:

Description of problem:

    The VirtualizedTable component in the console dynamic plugin does not have a default sorting column. We need a default sorting column for list pages.
https://github.com/openshift/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#virtualizedtable

Description of problem:

The network resource provisioning playbook for 4.15 dualstack UPI contains a task for adding an IPv6 subnet to the existing external router [1].
This task fails with:
- ansible-2.9.27-1.el8ae.noarch & ansible-collections-openstack-1.8.0-2.20220513065417.5bb8312.el8ost.noarch in OSP 16 env (RHEL 8.5) or
- openstack-ansible-core-2.14.2-4.1.el9ost.x86_64 & ansible-collections-openstack-1.9.1-17.1.20230621074746.0e9a6f2.el9ost.noarch in OSP 17 env (RHEL 9.2)

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2024-01-22-160236

How reproducible:

Always

Steps to Reproduce:

1. Set the os_subnet6 in the inventory file for setting dualstack
2. Run the 4.15 network.yaml playbook

Actual results:

Playbook fails:
TASK [Add IPv6 subnet to the external router] ********************************** fatal: [localhost]: FAILED! => {"changed": false, "extra_data": {"data": null, "details": "Invalid input for external_gateway_info. Reason: Validation of dictionary's keys failed. Expected keys: {'network_id'} Provided keys: {'external_fixed_ips'}.", "response": "{\"NeutronError\": {\"type\": \"HTTPBadRequest\", \"message\": \"Invalid input for external_gateway_info. Reason: Validation of dictionary's keys failed. Expected keys: {'network_id'} Provided keys: {'external_fixed_ips'}.\", \"detail\": \"\"}}"}, "msg": "Error updating router 8352c9c0-dc39-46ed-94ed-c038f6987cad: Client Error for url: https://10.46.43.81:13696/v2.0/routers/8352c9c0-dc39-46ed-94ed-c038f6987cad, Invalid input for external_gateway_info. Reason: Validation of dictionary's keys failed. Expected keys: {'network_id'} Provided keys: {'external_fixed_ips'}."}

Expected results:

Successful playbook execution

Additional info:

The router can be created in two different tasks; the playbook [2] worked for me (see the illustrative task after the references below).

[1] https://github.com/openshift/installer/blob/1349161e2bb8606574696bf1e3bc20ae054e60f8/upi/openstack/network.yaml#L43
[2] https://file.rdu.redhat.com/juriarte/upi/network.yaml
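For reference, a sketch of that approach, assuming the external network and both external subnets already exist; the variable names are placeholders, and the exact external_fixed_ips suboption spelling may vary with the openstack collection version in use:

```yaml
# Illustrative only: create the router with both gateway IPs up front,
# so no later update of external_gateway_info is needed.
- name: 'Create the external router with IPv4 and IPv6 gateway ports'
  openstack.cloud.router:
    name: "{{ os_router }}"
    network: "{{ os_external_network }}"
    external_fixed_ips:
      - subnet: "{{ os_external_subnet }}"
      - subnet: "{{ os_external_subnet6 }}"
    interfaces:
      - "{{ os_subnet }}"
      - "{{ os_subnet6 }}"
```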

Please review the following PR: https://github.com/openshift/cluster-olm-operator/pull/47

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

It would make debugging easier if we included the namespace in the message for these alerts: https://github.com/openshift/cluster-ingress-operator/blob/master/manifests/0000_90_ingress-operator_03_prometheusrules.yaml#L69

Version-Release number of selected component (if applicable):

4.12.x

How reproducible:

Always

Steps to Reproduce:

1. 
2.
3.

Actual results:

No namespace in the alert message

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-dns-operator/pull/397

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When I used the generated file to create the CatalogSource, the creation failed with the following error:
[root@preserve-fedora36 cluster-resources]# oc create -f cs-redhat-operator-index-v4-15.yaml 
The CatalogSource "cs-redhat-operator-index-v4-15" is invalid: 
* spec.icon.base64data: Required value
* spec.icon.mediatype: Required value
[root@preserve-fedora36 cluster-resources]# cat cs-redhat-operator-index-v4-15.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: null
  name: cs-redhat-operator-index-v4-15
  namespace: openshift-marketplace
spec:
  icon: {}
  image: ec2-3-144-93-237.us-east-2.compute.amazonaws.com:5000/redhat/redhat-operator-index:v4.15
  sourceType: grpc
status: {}

 

 

Version-Release number of selected component (if applicable):

oc-mirror version 
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202403070215.p0.gc4f8295.assembly.stream.el9-c4f8295", GitCommit:"c4f829512107f7d0f52a057cd429de2030b9b3b3", GitTreeState:"clean", BuildDate:"2024-03-07T03:46:24Z", GoVersion:"go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1. Use the following ImageSetConfiguration to mirror to local storage:
cat config.yaml 
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
#archiveSize: 8
storageConfig:
  local:
    path: /app1/ocmirror/offline
mirror:
  platform:
    channels:
    - name: stable-4.12                                             
      type: ocp
      minVersion: '4.12.46'
      maxVersion: '4.12.46'
      shortestPath: true
    graph: true
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.15
    packages:
    - name: advanced-cluster-management                                  
      channels:
      - name: release-2.9             
    - name: compliance-operator
      channels:
      - name: stable
    - name: multicluster-engine
      channels:
      - name: stable-2.4
      - name: stable-2.5
  additionalImages:
  - name: registry.redhat.io/ubi8/ubi:latest                        
  - name: registry.redhat.io/rhel8/support-tools:latest
  - name: registry.access.redhat.com/ubi8/nginx-120:latest
  - name: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0
  - name: registry.k8s.io/sig-storage/csi-resizer:v1.8.0
`oc-mirror --config config.yaml  file://operatortest --v2`
2. Mirror to the registry:
`oc-mirror  --config config.yaml --from file://operatortest   docker://ec2-3-144-93-237.us-east-2.compute.amazonaws.com:5000  --v2`

3. Create the CatalogSource with the generated file:
cat cs-redhat-operator-index-v4-15.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: null
  name: cs-redhat-operator-index-v4-15
  namespace: openshift-marketplace
spec:
  icon: {}
  image: ec2-3-144-93-237.us-east-2.compute.amazonaws.com:5000/redhat/redhat-operator-index:v4.15
  sourceType: grpc
status: {}

oc create -f cs-redhat-operator-index-v4-15.yaml 
The CatalogSource "cs-redhat-operator-index-v4-15" is invalid: 
* spec.icon.base64data: Required value
* spec.icon.mediatype: Required value

Actual results: 

Failed to create the CatalogSource from the generated file.

Expected results:

No error.
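As a workaround until the generated file is fixed, dropping the empty icon stanza makes the manifest pass validation; a trimmed version of the same CatalogSource (same registry and index image as above) would look roughly like this:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: cs-redhat-operator-index-v4-15
  namespace: openshift-marketplace
spec:
  # icon removed: an empty icon object triggers the required base64data/mediatype validation
  image: ec2-3-144-93-237.us-east-2.compute.amazonaws.com:5000/redhat/redhat-operator-index:v4.15
  sourceType: grpc
```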

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/266

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Recently a user was attempting to change the Virtual Machine Folder for a cluster installed on vSphere. The user used the "vSphere Connection Configuration" panel to complete this process. Upon updating the path and clicking "Save Configuration", cluster-wide issues emerged, including nodes not coming back online after a reboot.

OpenShift nodes eventually crashed with an error resulting from an incorrectly parsed folder path, caused by the missing string literal quote (" ") characters.

While this was exhibited on OCP 4.13, other versions may be affected.

This is a clone of issue OCPBUGS-31250. The following is the description of the original issue:

Description of problem:

1. For Linux nodes, the container runtime is CRI-O and port 9537 has a crio process listening on it, while Windows nodes do not have the CRI-O container runtime.
2. Prometheus is trying to connect to the /metrics endpoint on the Windows nodes on port 9537, which does not have any process listening on it.
3. TargetDown is alerting on the crio job since it cannot reach the endpoint http://windows-node-ip:9537/metrics.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Install 4.13 cluster with windows operator
    2. In the Prometheus UI, go to > Status > Targets to know which targets are down.   
    

Actual results:

    It fires the TargetDown alert.

Expected results:

    It should not give any such alert.

Additional info:

    

Description of problem:

When the console with a custom route is disabled before a cluster upgrade and re-enabled after the upgrade, the console cannot be accessed successfully.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-02-15-111607

How reproducible:

Always

Steps to Reproduce:

1. Launch a cluster with available update.
2. Create custom route for console in ingress configuration:
# oc edit ingresses.config.openshift.io cluster
spec:
  componentRoutes:
  - hostname: console-openshift-custom.apps.qe-413-0216.qe.devcluster.openshift.com
    name: console
    namespace: openshift-console
  - hostname: openshift-downloads-custom.apps.qe-413-0216.qe.devcluster.openshift.com
    name: downloads
    namespace: openshift-console
  domain: apps.qe-413-0216.qe.devcluster.openshift.com
3. After custom route is created, access console with custom route.
4. Remove console by setting managementState as Removed in console operator:
# oc edit consoles.operator.openshift.io cluster
spec:
  logLevel: Normal
  managementState: Removed
  operatorLogLevel: Normal
5. Upgrade cluster to a target version.
6. Enable console by setting managementState as Managed in console operator:
# oc edit consoles.operator.openshift.io cluster
spec:
  logLevel: Normal
  managementState: Managed
  operatorLogLevel: Normal
7. After console resources are created, access console url.

Actual results:

3. Console could be accessed through custom route.
4. Console resources are removed. And all cluster operators are in normal status
# oc get all -n openshift-console
No resources found in openshift-console namespace.

5. Upgrade succeeds, all cluster operators are in normal status
6. Console resources are created:

  1. oc get all -n openshift-console
    NAME READY STATUS RESTARTS AGE
    pod/console-69d88985b-bvh46 1/1 Running 0 3m41s
    pod/console-69d88985b-fwhjf 1/1 Running 0 3m41s
    pod/downloads-6b6b555d8d-kn822 1/1 Running 0 3m49s
    pod/downloads-6b6b555d8d-wp6zc 1/1 Running 0 3m49s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/console ClusterIP 172.30.226.112 <none> 443/TCP 3m50s
service/console-redirect ClusterIP 172.30.147.151 <none> 8444/TCP 3m50s
service/downloads ClusterIP 172.30.251.248 <none> 80/TCP 3m50s

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/console 2/2 2 2 3m47s
deployment.apps/downloads 2/2 2 2 3m50s

NAME DESIRED CURRENT READY AGE
replicaset.apps/console-69d88985b 2 2 2 3m42s
replicaset.apps/console-6dbdd487d 0 0 0 3m47s
replicaset.apps/downloads-6b6b555d8d 2 2 2 3m50s

NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/console console-openshift-console.apps.qe-413-0216.qe.devcluster.openshift.com console-redirect custom-route-redirect edge/Redirect None
route.route.openshift.io/console-custom console-openshift-custom.apps.qe-413-0216.qe.devcluster.openshift.com console https reencrypt/Redirect None
route.route.openshift.io/downloads downloads-openshift-console.apps.qe-413-0216.qe.devcluster.openshift.com downloads http edge/Redirect None
route.route.openshift.io/downloads-custom openshift-downloads-custom.apps.qe-413-0216.qe.devcluster.openshift.com downloads http edge/Redirect None

7. Could not open console url successfully. There is error info for console operator:

  1. oc get co | grep console
    console 4.13.0-0.nightly-2023-02-15-202607 False False False 42s RouteHealthAvailable: route not yet available, https://console-openshift-custom.apps.qe-413-0216.qe.devcluster.openshift.com returns '503 Service Unavailable'
  2. oc get clusterversions.config.openshift.io
    NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
    version 4.13.0-0.nightly-2023-02-15-202607 True False 4h48m Error while reconciling 4.13.0-0.nightly-2023-02-15-202607: the cluster operator console is not available

Expected results:

7. Should be able to access console successfully.

Additional info:


Description of problem:

    A 4.15 control plane can't create a 4.14 node pool due to an issue with the payload

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Create an Hosted Cluster in 4.15
    2. Create a Node Pool in 4.14
    3. Node pool stuck in provisioning
    

Actual results:

    No node pool is created

Expected results:

    The node pool is created, as we support N-2 versions there

Additional info:

Possibly linked to OCPBUGS-26757    

This is a clone of issue OCPBUGS-33756. The following is the description of the original issue:

Description of problem:

The "Auth Token GCP" filter in OperatorHub is displayed all the time, but in stead it should be rendered only for GPC cluster that have Manual creadential mode. When an GCP WIF capable operator is installed and the cluster is in GCP WIF mode, the Console should require the user to enter the necessary information about the GCP project, account, service account etc, which is in turn to be injected the operator's deployment via subscription.config (exactly how Azure and AWS STS got implemented in Console)

Version-Release number of selected component (if applicable):

4.15

How reproducible:

    

Steps to Reproduce:

    1. On a non-GCP cluster, navigate to OperatorHub
    2. check available filters
    3.
    

Actual results:

    "Auth Token GCP" filter is available in OperatorHub

Expected results:

    "Auth Token GCP" filter should not be available in OperatorHub for a non-GCP cluster. 
    When selecting an operator that supports "Auth Token GCP", as indicated by the annotation features.operators.openshift.io/token-auth-gcp: "true", the console needs to, in line with how it works for AWS/Azure auth capable operators, force the user to input the information required to authenticate against GCP via WIF, in the form of env vars that are set up using subscription.config on the operator. The exact names need to come out of https://issues.redhat.com/browse/CCO-574

Additional info:

Azure PR - https://github.com/openshift/console/pull/13082
AWS PR - https://github.com/openshift/console/pull/12778

UI Screen Design can be taken from the existing implementation of the Console support short-lived token setup flow for AWS and Azure described here: https://docs.google.com/document/d/1iFNpyycby_rOY1wUew-yl3uPWlE00krTgr9XHDZOTNo/edit
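For context, a hedged sketch of the injection mechanism being requested, reusing OLM's existing Subscription spec.config.env support; the operator name and environment variable names below are placeholders only, since the real names are to come out of CCO-574:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-gcp-wif-operator   # placeholder operator
  namespace: openshift-operators
spec:
  channel: stable
  name: example-gcp-wif-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  config:
    env:
      # Placeholder variable names; the final names come out of CCO-574.
      - name: PROJECT_ID
        value: "my-gcp-project"
      - name: POOL_ID
        value: "my-workload-identity-pool"
      - name: SERVICE_ACCOUNT_EMAIL
        value: "operator-sa@my-gcp-project.iam.gserviceaccount.com"
```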

Description of problem:

    When destroying an HCP KubeVirt cluster using the CLI with the --destroy-cloud-resources flag, PVCs are not cleaned up within the guest cluster because the CLI does not properly honor the --destroy-cloud-resources option for KubeVirt.

Version-Release number of selected component (if applicable):

4.16    

How reproducible:

    100%

Steps to Reproduce:

    1. destroy an hcp kubevirt cluster using cli and --destroy-cloud-resources
    2.
    3.
    

Actual results:

The hosted cluster does not get the hypershift.openshift.io/cleanup-cloud-resources: "true" annotation added, which is what ensures the hosted cluster config controller cleans up PVCs.

Expected results:

The hypershift.openshift.io/cleanup-cloud-resources: "true" annotation should get added to the hosted cluster during teardown when the --destroy-cloud-resources CLI option is used.

Additional info:
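Until the CLI passes the option through, a possible manual workaround (a sketch; the namespace and HostedCluster name below are placeholders) is to add the annotation yourself before running the destroy:

```
oc annotate hostedcluster example-cluster -n clusters \
  hypershift.openshift.io/cleanup-cloud-resources="true" --overwrite
```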

    

Please review the following PR: https://github.com/openshift/ironic-rhcos-downloader/pull/96

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-41896. The following is the description of the original issue:

This is a clone of issue OCPBUGS-41776. The following is the description of the original issue:

Description of problem:

the section is: https://docs.openshift.com/container-platform/4.16/installing/installing_aws/ipi/installing-aws-vpc.html#installation-aws-arm-tested-machine-types_installing-aws-vpc  

All tested ARM instances for 4.14+:
c6g.*
c7g.*
m6g.*
m7g.*
r8g.*

We need to ensure that every applicable 4.14+ version of the documentation has its "Tested instance types for AWS on 64-bit ARM infrastructures" section updated accordingly.

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/179

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Client-side throttling is observed when running the metrics controller.

Steps to Reproduce:

1. Install an AWS cluster in mint mode
2. Enable debug log by editing cloudcredential/cluster
3. Wait for the metrics loop to run for a few times
4. Check CCO logs

Actual results:

// 7s consumed by metrics loop which is caused by client-side throttling 
time="2024-01-20T19:43:56Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
I0120 19:43:56.251278       1 request.go:629] Waited for 176.161298ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-cloud-network-config-controller/secrets/cloud-credentials
I0120 19:43:56.451311       1 request.go:629] Waited for 197.182213ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-cloud-network-config-controller/secrets/cloud-credentials
I0120 19:43:56.651313       1 request.go:629] Waited for 197.171082ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-cloud-network-config-controller/secrets/cloud-credentials
I0120 19:43:56.850631       1 request.go:629] Waited for 195.251487ms due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-cloud-network-config-controller/secrets/cloud-credentials
...
time="2024-01-20T19:44:03Z" level=info msg="reconcile complete" controller=metrics elapsed=7.231061324s

Expected results:

No client-side throttling when running the metrics controller. 
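Not the actual CCO fix, but for illustration, one common way to avoid this class of client-side throttling is to raise the rest.Config rate limits used when building the controller's clients; a minimal sketch, assuming controller-runtime is available for loading the config:

```go
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Load the in-cluster / kubeconfig configuration as usual.
	cfg := ctrl.GetConfigOrDie()

	// client-go defaults (QPS=5, Burst=10) are easy to exhaust when a metrics
	// loop reads many secrets back to back; raising them removes the
	// "Waited for ... due to client-side throttling" pauses seen in the logs.
	cfg.QPS = 50
	cfg.Burst = 100

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Printf("client ready: %T\n", client)
}
```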

Description of problem:

    The HyperShift operator is applying control-plane-pki-operator RBAC resources regardless of whether PKI reconciliation is disabled for the HostedCluster.

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    100%

Steps to Reproduce:

    1. Create 4.15 HostedCluster with PKI reconciliation disabled
    2. Unused RBAC resources for the control-plane-pki-operator are created
    

Actual results:

    Unused RBAC resources for the control-plane-pki-operator are created

Expected results:

RBAC resources for the control-plane-pki-operator should not be created if the deployment for the control-plane-pki-operator itself is not created.

Additional info:

    

This is a clone of issue OCPBUGS-37541. The following is the description of the original issue:

Description of problem:
Apps exposed via NodePort do not return responses to client requests if the client's ephemeral port is 22623 or 22624.
When testing with curl command specifying the local port as shown below, a response is returned if the ephemeral port is 22622 or 22626, but it times out if the ephemeral port is 22623 or 22624.

[root@bastion ~]# for i in {22622..22626}; do echo localport:${i}; curl -m 10 -I 10.0.0.20:32325 --local-port ${i}; done
localport:22622
HTTP/1.1 200 OK
Server: nginx/1.22.1
Date: Thu, 25 Jul 2024 07:44:22 GMT
Content-Type: text/html
Content-Length: 37451
Last-Modified: Wed, 24 Jul 2024 12:20:19 GMT
Connection: keep-alive
ETag: "66a0f183-924b"
Accept-Ranges: bytes
localport:22623
curl: (28) Connection timed out after 10001 milliseconds
localport:22624
curl: (28) Connection timed out after 10000 milliseconds
localport:22625
HTTP/1.1 200 OK
Server: nginx/1.22.1
Date: Thu, 25 Jul 2024 07:44:42 GMT
Content-Type: text/html
Content-Length: 37451
Last-Modified: Wed, 24 Jul 2024 12:20:19 GMT
Connection: keep-alive
ETag: "66a0f183-924b"
Accept-Ranges: bytes
localport:22626
HTTP/1.1 200 OK
Server: nginx/1.22.1
Date: Thu, 25 Jul 2024 07:44:42 GMT
Content-Type: text/html
Content-Length: 37451
Last-Modified: Wed, 24 Jul 2024 12:20:19 GMT
Connection: keep-alive
ETag: "66a0f183-924b"
Accept-Ranges: bytes

This issue has been occurring since upgrading to version 4.16. Confirmed that it does not occur in versions 4.14 and 4.12.

Version-Release number of selected component (if applicable):
OCP 4.16

How reproducible:
100%

Steps to Reproduce:
1. Prepare a 4.16 cluster.
2. Launch any web app pod (nginx, httpd, etc.).
3. Expose the application externally using NodePort.
4. Access the URL using curl --local-port option to specify 22623 or 22624.

Actual results:
No response is returned from the exposed application when the ephemeral port is 22623 or 22624.

Expected results:
A response is returned regardless of the ephemeral port.

Additional info:
This issue started occurring from version 4.16, so it is possible that this is due to changes in RHEL 9.4, particularly those related to nftables.
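For illustration, a quick way to check whether host firewall rules are involved, assuming debug access to a node (the grep pattern is just the two affected ports):

```
oc debug node/<node-name> -- chroot /host nft list ruleset | grep -E '22623|22624'
```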

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of the problem:

Currently, assisted-service performs validation of the user pull secret during actions like registering / updating a cluster / infraenv, in which it checks that the pull secret contains the tokens for all release images. Therefore, we currently can't add nightly release images to the stage environment, as it blocks users without the right token from installing clusters with different release images.

How reproducible:

Always

Steps to reproduce:

1. Add nightly image to stage. (not recommended)

Actual results:

https://redhat-internal.slack.com/archives/C02RD175109/p1705916732425399?thread_ts=1705915614.824009&cid=C02RD175109

Expected results:

Register cluster successfully.

Description of problem:

  Invalid memory address or nil pointer dereference in Cloud Network Config Controller  

Version-Release number of selected component (if applicable):

    4.12

How reproducible:

    sometimes

Steps to Reproduce:

    1. Happens by itself sometimes
    2.
    3.
    

Actual results:

    Panic and pod restarts

Expected results:

    Panics due to Invalid memory address or nil pointer dereference  should not occur

Additional info:

    E0118 07:54:18.703891 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 93 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x203c8c0?, 0x3a27b20})
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003bd090?})
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x203c8c0, 0x3a27b20})
/usr/lib/golang/src/runtime/panic.go:884 +0x212
github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*Azure).AssignPrivateIP(0xc0001ce700, {0xc000696540, 0x10, 0x10}, 0xc000818ec0)
/go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/azure.go:146 +0xcf0
github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig.(*CloudPrivateIPConfigController).SyncHandler(0xc000986000, {0xc000896a90, 0xe})
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig/cloudprivateipconfig_controller.go:327 +0x1013
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc000720d80, {0x1e640c0?, 0xc0003bd090?})
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x11c
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc000720d80)
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(0xc000504ea0?)
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x27b3220, 0xc000894480}, 0x1, 0xc0000aa540)
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25
created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x3aa
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1a40b30]
goroutine 93 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003bd090?})
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x203c8c0, 0x3a27b20})
/usr/lib/golang/src/runtime/panic.go:884 +0x212
github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*Azure).AssignPrivateIP(0xc0001ce700, {0xc000696540, 0x10, 0x10}, 0xc000818ec0)
/go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/azure.go:146 +0xcf0
github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig.(*CloudPrivateIPConfigController).SyncHandler(0xc000986000, {0xc000896a90, 0xe})
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/cloudprivateipconfig/cloudprivateipconfig_controller.go:327 +0x1013
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc000720d80, {0x1e640c0?, 0xc0003bd090?})
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x11c
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc000720d80)
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(0xc000504ea0?)
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x27b3220, 0xc000894480}, 0x1, 0xc0000aa540)
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25
created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x3aa

This is a clone of issue OCPBUGS-43917. The following is the description of the original issue:

This is a clone of issue OCPBUGS-42987. The following is the description of the original issue:

It has been observed that the esp_offload kernel module might be loaded by libreswan even if bond ESP offloads have been correctly turned off.

This might be because the ipsec service and configure-ovs run at the same time, so it is possible that the ipsec service starts when bond offloads are not yet turned off, tricking libreswan into thinking they should be used.

The potential fix would be to run the ipsec service after configure-ovs.

We had this CI job failing because clusteroperator/storage kept flip-flopping between progressing=True and progressing=False

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_api/1684/pull-ci-openshift-api-master-e2e-aws-serial-techpreview/1729464659330732032

: [sig-arch] events should not repeat pathologically for ns/openshift-cluster-storage-operator expand_less    0s
{  1 events happened too frequently
event happened 21 times, something is wrong: namespace/openshift-cluster-storage-operator deployment/cluster-storage-operator hmsg/cfc7e5cdbe - reason/OperatorStatusChanged Status for clusteroperator/storage changed: Progressing changed from True to False ("AWSEBSCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well") From: 14:13:20Z To: 14:13:21Z result=reject }

This exposed OCPBUGS-24027 which is now fixed.

However, there are still an excessive number of progressing events from this job.

$ grep 'clusteroperator/storage changed: Progressing' events.txt > progressing.txt
$ wc -l progressing.txt 
28 progressing.txt

A small subset of those actually change between True and False

$ grep 'clusteroperator/storage changed: Progressing' events.txt | grep True
openshift-cluster-storage-operator                 143m        Normal    OperatorStatusChanged                        deployment/cluster-storage-operator                                            Status for clusteroperator/storage changed: Progressing changed from Unknown to False ("All is well"),Available changed from Unknown to True ("DefaultStorageClassControllerAvailable: StorageClass provided by supplied CSI Driver instead of the cluster-storage-operator")
openshift-cluster-storage-operator                 143m        Normal    OperatorStatusChanged                        deployment/cluster-storage-operator                                            Status for clusteroperator/storage changed: Progressing changed from False to True ("AWSEBSProgressing: Waiting for Deployment to act on changes")
openshift-cluster-storage-operator                 143m        Normal    OperatorStatusChanged                        deployment/cluster-storage-operator                                            Status for clusteroperator/storage changed: Progressing message changed from "AWSEBSProgressing: Waiting for Deployment to deploy pods" to "AWSEBSCSIDriverOperatorCRProgressing: Waiting for AWSEBS operator to report status\nAWSEBSProgressing: Waiting for Deployment to deploy pods",Available changed from True to False ("AWSEBSCSIDriverOperatorCRAvailable: Waiting for AWSEBS operator to report status"),Upgradeable changed from Unknown to True ("All is well")
openshift-cluster-storage-operator                 136m        Normal    OperatorStatusChanged                        deployment/cluster-storage-operator                                            Status for clusteroperator/storage changed: Progressing changed from True to False ("AWSEBSCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
openshift-cluster-storage-operator                 45m         Normal    OperatorStatusChanged                        deployment/cluster-storage-operator                                            Status for clusteroperator/storage changed: Progressing changed from False to True ("AWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
openshift-cluster-storage-operator                 2m11s       Normal    OperatorStatusChanged                        deployment/cluster-storage-operator                                            Status for clusteroperator/storage changed: Progressing changed from True to False ("AWSEBSCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
openshift-cluster-storage-operator                 8m6s        Normal    OperatorStatusChanged                        deployment/cluster-storage-operator                                            Status for clusteroperator/storage changed: Progressing changed from False to True ("SHARESCSIDriverOperatorCRProgressing: SharedResourcesDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
openshift-cluster-storage-operator                 2m12s       Normal    OperatorStatusChanged                        deployment/cluster-storage-operator                                            Status for clusteroperator/storage changed: Progressing changed from False to True ("SHARESProgressing: Waiting for Deployment to deploy pods")

But then we end up with events like this for example, where CSO has just appended the status message with more noise between competing controllers:

openshift-cluster-storage-operator                 142m        Normal    OperatorStatusChanged                        deployment/cluster-storage-operator                                            Status for clusteroperator/storage changed: Progressing message changed from "AWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverControllerServiceControllerProgressing: Waiting for Deployment to act on changes\nAWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods\nSHARESCSIDriverOperatorCRProgressing: SharedResourceCSIDriverWebhookControllerProgressing: Waiting for Deployment to deploy pods" to "AWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverControllerServiceControllerProgressing: Waiting for Deployment to deploy pods\nAWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods\nSHARESCSIDriverOperatorCRProgressing: SharedResourceCSIDriverWebhookControllerProgressing: Waiting for Deployment to deploy pods",Available message changed from "AWSEBSCSIDriverOperatorCRAvailable: AWSEBSDriverControllerServiceControllerAvailable: Waiting for Deployment\nAWSEBSCSIDriverOperatorCRAvailable: AWSEBSDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service\nSHARESCSIDriverOperatorCRAvailable: SharedResourceCSIDriverWebhookControllerAvailable: Waiting for Deployment\nSHARESCSIDriverOperatorCRAvailable: SharedResourcesDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service" to "AWSEBSCSIDriverOperatorCRAvailable: AWSEBSDriverControllerServiceControllerAvailable: Waiting for Deployment\nSHARESCSIDriverOperatorCRAvailable: SharedResourceCSIDriverWebhookControllerAvailable: Waiting for Deployment\nSHARESCSIDriverOperatorCRAvailable: SharedResourcesDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service" 

There are multiple controllers for multiple operators updating the progressing condition, which generates an excessive number of events. This would be (at least) annoying on a live cluster, but it also leaves CSO susceptible to `events should not repeat pathologically` test flakes in CI.

This is a clone of issue OCPBUGS-35020. The following is the description of the original issue:

Description of problem:

    When investigating https://issues.redhat.com/browse/OCPBUGS-34819 we encountered an issue with the LB creation but also noticed that masters are using an S3 stub ignition even though they don't have to. Although that can be harmless, we are adding an extra, useless hop that we don't need.

Version-Release number of selected component (if applicable):

    4.16+

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    Change the AWSMachineTemplate ignition.storageType to UnencryptedUserData
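For illustration, a hedged sketch of the change suggested above on the control-plane AWSMachineTemplate (resource names and unrelated fields are placeholders, not taken from the installer):

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: mycluster-master            # placeholder
spec:
  template:
    spec:
      instanceType: m6i.xlarge      # placeholder
      ignition:
        storageType: UnencryptedUserData   # avoid the S3 stub ignition hop for masters
```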

Please review the following PR: https://github.com/openshift/csi-driver-manila-operator/pull/213

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/ironic-rhcos-downloader/pull/94

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

On https://prow.ci.openshift.org/view/gs/test-platform-results/logs/aggregated-gcp-ovn-upgrade-4.15-micro-release-openshift-release-analysis-aggregator/1756168710529224704 we failed three netpol tests on just one result, a failure, with no successes. However the other 9 jobs seemed to run fine.

[sig-network] Netpol NetworkPolicy between server and client should allow ingress access from namespace on one named port [Feature:NetworkPolicy] [Skipped:Network/OVNKubernetes] [Skipped:Network/OpenShiftSDN/Multitenant] [Skipped:Network/OpenShiftSDN] [Suite:openshift/conformance/parallel] [Suite:k8s]

[sig-network] Netpol NetworkPolicy between server and client should allow egress access on one named port [Feature:NetworkPolicy] [Skipped:Network/OVNKubernetes] [Skipped:Network/OpenShiftSDN/Multitenant] [Skipped:Network/OpenShiftSDN] [Suite:openshift/conformance/parallel] [Suite:k8s]

[sig-network] Netpol NetworkPolicy between server and client should allow ingress access on one named port [Feature:NetworkPolicy] [Skipped:Network/OVNKubernetes] [Skipped:Network/OpenShiftSDN/Multitenant] [Skipped:Network/OpenShiftSDN] [Suite:openshift/conformance/parallel] [Suite:k8s]

Something seems off with these tests: they're barely running and don't seem to consistently report results. 4.15 has just 3 runs in the last two weeks, while 4.16 has just 1, which passed.

Whatever's going on, it's capable of taking out a payload, though it's not happening 100% of the time. https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.ci/release/4.15.0-0.ci-2024-02-10-040857

Please review the following PR: https://github.com/openshift/kubernetes-autoscaler/pull/277

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Align with the version-less-ness of `rhel-coreos` and `fedora-coreos` and shorten the overall tag.

Both tags are currently aliases; `centos-stream-coreos-9` will be removed in the future.

Description of problem:

After enabling user-defined monitoring on an HyperShift hosted cluster, PrometheusOperatorRejectedResources starts firing.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Start a HyperShift-hosted cluster with cluster-bot
2. Enable user-defined monitoring
3.

Actual results:

PrometheusOperatorRejectedResources alert starts firing

Expected results:

No alert firing

Additional info:

Need to reach out to the HyperShift folks as the fix should probably be in their code base.

In TRT-1476, we created a VM that served as an endpoint where we can test connectivity in gcp.
We want one for AWS.

In TRT-1477, we created some code in origin to send HTTP GETs to that endpoint as a test to ensure connectivity remains working. Do this also for AWS.

TRT members already have an AWS account so we don't need to request one.

Description of problem:

    As of 4.16, we do not expect the Kube Controller Manager Operator to set any cloud-provider-related flags, e.g. `--cloud-provider`, `--cloud-config` or `--external-cloud-volume-plugin`.
In 4.17 setting these flags will prevent the KCM from starting.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-olm-operator/pull/42

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33890. The following is the description of the original issue:

Description of problem: Terraform should no longer be available for vSphere.

Description of problem:

Azure-File volume mount fails; it happens on an ARM cluster with the multi payload

$ oc describe pod
  Warning  FailedMount       6m28s (x2 over 95m)  kubelet            MountVolume.MountDevice failed for volume "pvc-102ad3bf-3480-410b-a4db-73c64daeb3e2" : rpc error: code = InvalidArgument desc = GetAccountInfo(wduan-0319b-bkp2k-rg#clusterjzrlh#pvc-102ad3bf-3480-410b-a4db-73c64daeb3e2###wduan) failed with error: Retriable: true, RetryAfter: 0s, HTTPStatusCode: -1, RawError: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/wduan-0319b-bkp2k-rg/providers/Microsoft.Storage/storageAccounts/clusterjzrlh/listKeys?api-version=2021-02-01: StatusCode=0 -- Original Error: adal: Failed to execute the refresh request. Error = 'Post "https://login.microsoftonline.com/6047c7e9-b2ad-488d-a54e-dc3f6be6a7ee/oauth2/token": dial tcp 20.190.190.193:443: i/o timeout'

 

The node log reports:
W0319 09:41:30.745936 1 azurefile.go:806] GetStorageAccountFromSecret(azure-storage-account-clusterjzrlh-secret, wduan) failed with error: could not get secret(azure-storage-account-clusterjzrlh-secret): secrets "azure-storage-account-clusterjzrlh-secret" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:azure-file-csi-driver-node-sa" cannot get resource "secrets" in API group "" in the namespace "wduan"

 

 
 

Checked the role and it looks good, at least the same as previous: 
$ oc get clusterrole azure-file-privileged-role -o yaml
...
rules:
- apiGroups:
  - security.openshift.io
  resourceNames:
  - privileged
  resources:
  - securitycontextconstraints
  verbs:
  - use

 

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-multi-2024-03-13-031451

How reproducible:

2/2

Steps to Reproduce:

    1. Checked in CI, azure-file cases failed due to this
    2. Create one cluster with the same config and payload, create azure-file pvc and pod
    3.
    

Actual results:

Pod could not be running    

Expected results:

Pod should be running 

Additional info:

    

Description of problem:

    The image quay.io/centos7/httpd-24-centos7 used in TestMTLSWithCRLs and TestCRLUpdate is no longer being rebuilt, and has had its 'latest' tag removed. Containers using this image fail to start, and cause the tests to fail.

Version-Release number of selected component (if applicable):

    

How reproducible:

    100%

Steps to Reproduce:

    Run 'TEST="(TestMTLSWithCRLs|TestCRLUpdate)" make test-e2e' from the cluster-ingress-operator repo

Actual results:

    Both tests and all their subtests fail

Expected results:

    Tests pass

Additional info:

    

Problem Description:

Installed the Red Hat Quay Container Security Operator on the 4.13.25 cluster.

Below are my test results :

```

[sasakshi@sasakshi ~]$ oc version
Client Version: 4.12.7
Kustomize Version: v4.5.7
Server Version: 4.13.25
Kubernetes Version: v1.26.9+aa37255

[sasakshi@sasakshi ~]$ oc get csv -A | grep -i "quay" | tail -1
openshift container-security-operator.v3.10.2 Red Hat Quay Container Security Operator 3.10.2 container-security-operator.v3.10.1 Succeeded

[sasakshi@sasakshi ~]$ oc get subs -A
NAMESPACE NAME PACKAGE SOURCE CHANNEL
openshift-operators container-security-operator container-security-operator redhat-operators stable-3.10

[sasakshi@sasakshi ~]$ oc get imagemanifestvuln -A | wc -l
82
[sasakshi@sasakshi ~]$ oc get vuln --all-namespaces | wc -l
82

Console -> Administration -> Image Vulnerabilities : 82

Home -> Overview -> Status -> Image Vulnerabilities : 66
```

Observations from My testing :

  • `oc get vuln --all-namespaces` reports the same count as `oc get imagemanifestvuln -A`
  • Difference in the count is reported in the following
    ```
    Console -> Administration -> Image Vulnerabilities : 82
    Home -> Overview -> Status -> Image Vulnerabilities : 66
    ```
    Why is there a difference between the counts reported by these two views?

Kindly refer to the attached screenshots for reference.

Documentation link referred:

https://docs.openshift.com/container-platform/4.14/security/pod-vulnerability-scan.html#security-pod-scan-query-cli_pod-vulnerability-scan

Description of problem:

   A few of the CI tests are continuously failing for the Jenkins Pipelines Build Strategy. As this strategy has been deprecated since 4.10, we should skip these tests to unblock the PRs.

Version-Release number of selected component (if applicable):

4.16.0
    

How reproducible:

    Almost every time

Actual results:

    Tests failing

Expected results:

    All the tests should pass.

Additional info:

Observed in: https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/343#issuecomment-2057851847

Description of problem:

CNO doesn't set the operConfig status while deploying the IPsec machine configs and IPsec daemonset; it must set the Progressing condition to true. When one of the components fails to deploy, it must report the Degraded condition set to true.

For more details, see the discussion here:
https://github.com/openshift/release/pull/50740#issuecomment-2076698580
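For illustration only, a hedged sketch of the kind of conditions the report asks CNO to surface on the operator config while the IPsec components roll out; the field layout follows the standard operator condition shape, while the reasons and messages are made up:

```yaml
# Illustrative only; not CNO's actual output.
status:
  conditions:
  - type: Progressing
    status: "True"
    reason: IPsecDeploying                  # hypothetical reason
    message: Waiting for IPsec machine configs and the ipsec daemonset to roll out
  - type: Degraded
    status: "True"                          # only when one of the components fails to deploy
    reason: IPsecDeployFailed               # hypothetical reason
    message: ipsec daemonset failed to deploy
```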

 

Description of problem:

When a StatusItem lacks a timestamp, a <div> element is still output to the DOM, resulting in a spacing issue above the message and a vertical misalignment with the icon on the left. This is because timestamp is an optional prop, but the <div> surrounding the optional prop output is not optional.

Description of problem:

    When the installer gathers a log bundle after failure (either automatically or with gather bootstrap), the installer fails to return serial console logs if an SSH connection to the bootstrap node is refused. 

Even if the serial console logs were collected, the installer exits with an error if the SSH connection is refused:

time="2024-03-09T20:59:26Z" level=info msg="Pulling VM console logs"
time="2024-03-09T20:59:26Z" level=debug msg="Search for matching instances by tag in us-west-1 matching aws.Filter{\"kubernetes.io/cluster/ci-op-4ygffz3q-be93e-jnn92\":\"owned\"}"
time="2024-03-09T20:59:26Z" level=debug msg="Search for matching instances by tag in us-west-1 matching aws.Filter{\"openshiftClusterID\":\"2f9d8822-46fd-4fcd-9462-90c766c3d158\"}"
time="2024-03-09T20:59:27Z" level=debug msg="Attemping to download console logs for ci-op-4ygffz3q-be93e-jnn92-bootstrap" Instance=i-0413f793ffabe9339
time="2024-03-09T20:59:27Z" level=debug msg="Download complete" Instance=i-0413f793ffabe9339
time="2024-03-09T20:59:27Z" level=debug msg="Attemping to download console logs for ci-op-4ygffz3q-be93e-jnn92-master-0" Instance=i-0ab5f920818366bb8
time="2024-03-09T20:59:27Z" level=debug msg="Download complete" Instance=i-0ab5f920818366bb8
time="2024-03-09T20:59:27Z" level=debug msg="Attemping to download console logs for ci-op-4ygffz3q-be93e-jnn92-master-2" Instance=i-0b93963476818535d
time="2024-03-09T20:59:27Z" level=debug msg="Download complete" Instance=i-0b93963476818535d
time="2024-03-09T20:59:28Z" level=debug msg="Attemping to download console logs for ci-op-4ygffz3q-be93e-jnn92-master-1" Instance=i-0797728e092bfbeef
time="2024-03-09T20:59:28Z" level=debug msg="Download complete" Instance=i-0797728e092bfbeef
time="2024-03-09T20:59:28Z" level=info msg="Pulling debug logs from the bootstrap machine"
time="2024-03-09T20:59:28Z" level=debug msg="Added /tmp/bootstrap-ssh3643557583 to installer's internal agent"
time="2024-03-09T20:59:28Z" level=debug msg="Added /tmp/.ssh/ssh-privatekey to installer's internal agent"
time="2024-03-09T21:01:39Z" level=error msg="Attempted to gather debug logs after installation failure: failed to connect to the bootstrap machine: dial tcp 13.57.212.80:22: connect: connection timed out"

from: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_api/1788/pull-ci-openshift-api-master-e2e-aws-ovn/1766560949898055680

We can see the console logs were downloaded; they should be saved in the log bundle.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Failed install where SSH to bootstrap node fails. https://github.com/openshift/installer/pull/8137 provides a potential reproducer
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

Error handling needs to be reworked here: https://github.com/openshift/installer/blob/master/cmd/openshift-install/gather.go#L160-L190    

 

Description of problem:

    After upgrading the dynamic console plugin to PF5, the modal is not rendered correctly. The header and footer of the modal are not displayed.

Version-Release number of selected component (if applicable):

    "@openshift-console/dynamic-plugin-sdk": "^1.1.0",
    "@patternfly/patternfly": "^5.2.0",
    "@patternfly/react-charts": "^7.2.0",
    "@patternfly/react-core": "^5.2.0",
    "@patternfly/react-icons": "^5.2.0",
    "@patternfly/react-table": "^5.2.0",
    "@patternfly/react-topology": "^5.2.0",

Steps to Reproduce:

    1. Include Modal component in a PF5 dynamic console plugin
    2. Render the modal component
    

Actual results:

The header and the footer of the modal are not displayed    

Expected results:

The modal is rendered correctly    

Additional info:

    This issue is related to the "next" PF Modal component (currently in beta) introduced in PF version 5.2.0. As a temporary workaround, downgrading the PF library to version 5.1.x fixes the issue.
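A sketch of that workaround in the plugin's package.json, pinning the PatternFly packages back to the 5.1 line; the exact versions below are illustrative, not a tested set:

```
"@patternfly/patternfly": "~5.1.0",
"@patternfly/react-core": "~5.1.0",
"@patternfly/react-icons": "~5.1.0",
"@patternfly/react-table": "~5.1.0",
```

After adjusting the versions, reinstall dependencies and rebuild the plugin.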

This is a clone of issue OCPBUGS-34413. The following is the description of the original issue:

Description of problem:

Cluster-ingress-operator logs an update when one didn't happen.

% grep -e 'successfully updated Infra CR with Ingress Load Balancer IPs' -m 1 -- ingress-operator.log       
2024-05-17T14:46:01.434Z	INFO	operator.ingress_controller	ingress/controller.go:326	successfully updated Infra CR with Ingress Load Balancer IPs

% grep -e 'successfully updated Infra CR with Ingress Load Balancer IPs' -c -- ingress-operator.log 
142

https://github.com/openshift/cluster-ingress-operator/pull/1016 has a logic error, which causes the operator to log this message even when it didn't do an update:

https://github.com/openshift/cluster-ingress-operator/blob/009644a6b197b67f074cc34a07868ef01db31510/pkg/operator/controller/ingress/controller.go#L1135-L1145

// If the lbService exists for the "default" IngressController, then update Infra CR's PlatformStatus with the Ingress LB IPs.
if haveLB && ci.Name == manifests.DefaultIngressControllerName {
	if updated, err := computeUpdatedInfraFromService(lbService, infraConfig); err != nil {
		errs = append(errs, fmt.Errorf("failed to update Infrastructure PlatformStatus: %w", err))
	} else if updated {
		if err := r.client.Status().Update(context.TODO(), infraConfig); err != nil {
			errs = append(errs, fmt.Errorf("failed to update Infrastructure CR after updating Ingress LB IPs: %w", err))
		}
	}
	log.Info("successfully updated Infra CR with Ingress Load Balancer IPs")
}

Version-Release number of selected component (if applicable):

    4.17

How reproducible:

    100%

Steps to Reproduce:

    1. Create a LB service for the default Ingress Operator
    2. Watch ingress operator logs for the search strings mentioned above
    

Actual results:

    Lots of these log entries will be seen even though no further updates are made to the default ingress operator:

2024-05-17T14:46:01.434Z INFO operator.ingress_controller ingress/controller.go:326 successfully updated Infra CR with Ingress Load Balancer IPs

Expected results:

    Only see this log entry when an update to the Infra CR is made. Perhaps just once, the first time you add a LB service to the default ingress operator.

Additional info:

     https://github.com/openshift/cluster-ingress-operator/pull/1016 was backported to 4.15, so it would be nice to fix it and backport the fix to 4.15. It is rather noisy, and it's trivial to fix.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Currently, importing the HyperShift API packages in other projects also brings in the dependencies and conflicts of the rest of the non-API packages. This is a request to create a Go submodule containing only the API packages.

Once we cut beta and the API is stable we should move it into its own repo
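A minimal sketch of what such a submodule could look like; the module path and pinned versions are assumptions, not the final layout:

```
// api/go.mod (hypothetical)
module github.com/openshift/hypershift/api

go 1.21

require (
	k8s.io/apimachinery v0.29.0
	sigs.k8s.io/controller-runtime v0.17.0 // only if the API types depend on it
)
```

Consumers could then `go get github.com/openshift/hypershift/api` without pulling in the rest of the operator's dependency graph.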

Description of problem:

    The kube-apiserver has a container called audit-logs that keeps audit records stored in the logs of the container (it just prints to stdout). We would like the ability to disable this container whenever the None policy is used on the cluster. As of today, this consumes about 1 GB of storage for each apiserver pod on the system. As you scale up, that 1 GB per master adds up.

https://github.com/openshift/hypershift/issues/3764

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

NetworkAttachmentDefinition always gets created in the default namespace if I use the form method from the console.
It also doesn't honor the selected name and creates the NAD object with a different name (the selected name plus a random suffix).

Version-Release number of selected component (if applicable):

OCP 4.15.5

How reproducible:

From the console, under the Networking section, select NetworkAttachmentDefinitions and create a NAD using the Form method and not the YAML one.

Actual results:

The NAD gets created in the wrong namespace (always ends up in the default namespace) and with the wrong name.

Expected results:

The NAD resource gets created in the currently selected namespace with the chosen name

Description of problem:

    The ValidatingAdmissionPolicy admission plugin is set in OpenShift 4.14+ kube-apiserver config, but is missing from the HyperShift config. It should be set.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    4.15: https://github.com/openshift/hypershift/blob/release-4.15/control-plane-operator/controllers/hostedcontrolplane/kas/config.go#L293-L341

    4.14: https://github.com/openshift/hypershift/blob/release-4.14/control-plane-operator/controllers/hostedcontrolplane/kas/config.go#L283-L331

Expected results:

    Expect to see ValidatingAdmissionPolicy

Additional info:

    

The customer is pointing out that the memory max value for scale-up is based on GB, while the min memory for scale-down is based on GiB.

So, setting both values in GB makes scale down fail:

Skipping ocgc4preplatgt-98fwh-worker-c-sk2sz - minimal limit exceeded for [memory] 

While setting both values in GiB makes the scale up fail.

 

  • API reference says that GBs must be used for memory min/max limits:

https://docs.openshift.com/container-platform/4.12/rest_api/autoscale_apis/clusterautoscaler-autoscaling-openshift-io-v1.html#spec-resourcelimits

  • While OCP Cluster Autoscaler documentation points to GiBs:

https://docs.openshift.com/container-platform/4.12/machine_management/applying-autoscaling.html#cluster-autoscaler-cr_applying-autoscaling
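For reference, the fields in question on the ClusterAutoscaler resource; the disagreement above is whether these integers are interpreted as GB or GiB (values are illustrative):

```yaml
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  resourceLimits:
    memory:
      min: 8     # documented as GiB in the autoscaling docs, GB in the API reference
      max: 64
```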

Description of problem:

There is an 'Unhealthy Conditions' table on the MachineHealthCheck details page; currently the first column is 'Status', but users care more about Type than its Status

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2024-01-16-113018    

How reproducible:

Always    

Steps to Reproduce:

1. go to any MHC details page and check 'Unhealthy Conditions' table
2.
3.
    

Actual results:

in the table, 'Type' is the last column    

Expected results:

we should put 'Type' as the first column since this is the most important factor users care about

For comparison, we can check the 'Conditions' table on the ClusterOperators details page; the order is Type -> Status -> other info, which is very user friendly

Additional info:

    

Description of problem:

I've noticed that 'agent-cluster-install.yaml' and 'journal.export' from the agent gather process contain passwords. It's important not to expose password information in any of these generated files.

Version-Release number of selected component (if applicable):

4.15   

How reproducible:

Always    

Steps to Reproduce:

    1. Generate an agent ISO by utilising agent-config and install-config, including platform credentials
    2. Boot the ISO that was created
    3. Run the agent-gather command on the node 0 machine to generate files.

Actual results:

The 'agent-cluster-install.yaml' and 'journal.export' files contain the password information.

Expected results:

Passwords should be redacted.

Additional info:

    

When using an autoscaling MachinePool with OpenStack, setting minReplicas=0 results in a nil pointer panic.

See HIVE-2415 for context.
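A hedged sketch of the triggering configuration, with placeholder names (field layout per the Hive MachinePool API as I understand it):

```yaml
apiVersion: hive.openshift.io/v1
kind: MachinePool
metadata:
  name: mycluster-worker            # placeholder
spec:
  clusterDeploymentRef:
    name: mycluster                 # placeholder
  name: worker
  autoscaling:
    minReplicas: 0                  # the value reported to trigger the nil pointer panic
    maxReplicas: 3
  platform:
    openstack:
      flavor: m1.large              # placeholder
```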

As a user, I want to be able to impersonate Groups from the Groups list page kebab menu or the Group details page Actions menu dropdown so that I can more easily impersonate a Group without having to find the corresponding RoleBinding.

AC:

  • `Impersonate Group` action appears in the Groups list page kebab menu or the Group details page Actions menu dropdown

Please review the following PR: https://github.com/openshift/cluster-api-provider-vsphere/pull/29

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    Sometimes the prometheus-operator's informer will be stuck because it receives objects that can't be converted to *v1.PartialObjectMetadata.

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    Not always

Steps to Reproduce:

    1. Unknown
    2.
    3.
    

Actual results:

    prometheus-operator logs show errors like

2024-02-09T08:29:35.478550608Z level=warn ts=2024-02-09T08:29:35.478491797Z caller=klog.go:108 component=k8s_client_runtime func=Warningf msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:110: failed to list *v1.PartialObjectMetadata: Get \"https://172.30.0.1:443/api/v1/secrets?resourceVersion=29022\": dial tcp 172.30.0.1:443: connect: connection refused"
2024-02-09T08:29:35.478592909Z level=error ts=2024-02-09T08:29:35.478541608Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="github.com/coreos/prometheus-operator/pkg/informers/informers.go:110: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: Get \"https://172.30.0.1:443/api/v1/secrets?resourceVersion=29022\": dial tcp 172.30.0.1:443: connect: connection refused"

Expected results:

    No error

Additional info:

    The bug has been introduced in v0.70.0 by https://github.com/prometheus-operator/prometheus-operator/pull/5993 so it only affects 4.16 and 4.15.

Description of problem:

    While mirroring with the following command[1], it is observed that the command fails with error[2] as shown below:
~~~
[1] oc mirror --config=imageSet-config.yaml docker://<registry_url>:<Port>/<repository>
~~~

~~~
[2] error: error rebuilding catalog images from file-based catalogs: error regenerating the cache for <registry_url>:<Port>/<repository>/community-operator-index:v4.15: exit status 1
~~~

Version-Release number of selected component (if applicable):

    

How reproducible:

    100%

Steps to Reproduce:

    1. Download `oc mirror` v:4.15.0 binary
    2. Create ImageSet-config.yaml
    3. Use the following command:
~~~
oc mirror --config=imageSet-config.yaml docker://<registry_url>:<Port>/<repository>
~~~
    4. Observe the mentioned error
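Step 2 above does not show the file; a hypothetical minimal imageSet-config.yaml consistent with the error (which involves the mirrored community operator index) might look like this, with placeholders kept in the same style as the command above:

```yaml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  registry:
    imageURL: <registry_url>:<Port>/<repository>/oc-mirror-metadata   # placeholder
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/community-operator-index:v4.15
    packages:
    - name: <operator-name>                                           # placeholder
```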

Actual results:

    Command failed to complete with the mentioned error.

Expected results:

   The ICSP and mapping.txt files should be created. 

Additional info:

    

Description of problem:

The TaskRun status is not displayed near the TaskRun name on the TaskRun details page

 

All temporal resources like PipelineRuns, Builds, Shipwright BuildRuns, etc show the status of the resource (succeeded, failed, etc) near the name on the resource details page. 

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    Only customers have a break-glass certificate signer.

Version-Release number of selected component (if applicable):

    4.16.0

How reproducible:

    Always

Steps to Reproduce:

    1.create CSR with any other signer chosen
    2.does not work
    3.
    

Actual results:

    does not work

Expected results:

    should work

Additional info:

    

 

Description of problem:

The Clear button in the Upload JAR form is not working; users need to close the form in order to remove the previously selected JAR file.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Always

Steps to Reproduce:

    1. Open Upload Jar File form from Add Page
    2. Upload a JAR file
    3. Remove the JAR file by using the Clear button
    

Actual results:

    The selected JAR file is not removed even after using the "Clear" button

Expected results:

    The "Clear" button should remove the selected file from the form.

Additional info:

    

This is a clone of issue OCPBUGS-33762. The following is the description of the original issue:

Description of problem:

The newly available TP upgrade status command has a formatting issue when expanding update health using the --details flag: a plural "s:<resource>" is displayed. According to the developers, the plural "s" was supposed to be appended to group.kind, but only the plural itself is displayed instead:

Resources:
  s: version

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-05-08-222442

How reproducible:

100%

Steps to Reproduce:

oc adm upgrade status --details=all
while there is any health issue with the cluster

Actual results:

  Resources:
    s: ip-10-0-76-83.us-east-2.compute.internal
  Description: Node is unavailable
  Resources:
    s: version
  Description: Cluster operator control-plane-machine-set is not available
  Resources:
    s: ip-10-0-58-8.us-east-2.compute.internal
  Description: failed to set annotations on node: unable to update node "&Node{ObjectMeta:{      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},Spec:NodeSpec{PodCIDR:,DoNotUseExternalID:,ProviderID:,Unschedulable:false,Taints:[]Taint{},ConfigSource:nil,PodCIDRs:[],},Status:NodeStatus{Capacity:ResourceList{},Allocatable:ResourceList{},Phase:,Conditions:[]NodeCondition{},Addresses:[]NodeAddress{},DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:0,},},NodeInfo:NodeSystemInfo{MachineID:,SystemUUID:,BootID:,KernelVersion:,OSImage:,ContainerRuntimeVersion:,KubeletVersion:,KubeProxyVersion:,OperatingSystem:,Architecture:,},Images:[]ContainerImage{},VolumesInUse:[],VolumesAttached:[]AttachedVolume{},Config:nil,},}": Patch "https://api-int.evakhoni-1514.qe.devcluster.openshift.com:6443/api/v1/nodes/ip-10-0-58-8.us-east-2.compute.internal": read tcp 10.0.58.8:48328->10.0.27.41:6443: read: connection reset by peer

Expected results:

It should mention the correct <group.kind>s:<resource>.

Additional info:
OTA-1246
slack thread

Description of problem:

Trying to define multiple receivers in a single user-defined AlertmanagerConfig

Version-Release number of selected component (if applicable):

 

How reproducible:

always   

Steps to Reproduce:

#### Monitoring for user-defined projects is enabled
```
oc -n openshift-monitoring get configmap cluster-monitoring-config -o yaml | head -4
```
```
apiVersion: v1
data:
  config.yaml: |
    enableUserWorkload: true
```

#### separate Alertmanager instance for user-defined alert routing is Enabled and Configured
```
oc -n openshift-user-workload-monitoring get configmap user-workload-monitoring-config -o yaml | head -6
```
```
apiVersion: v1
data:
  config.yaml: |
    alertmanager:
      enabled: true
      enableAlertmanagerConfig: true
```
create testing namespace
```
oc new-project libor-alertmanager-testing
```

## TESTING - MULTIPLE RECEIVERS IN ALERTMANAGERCONFIG

Single AlertmanagerConfig
`alertmanager_config_webhook_and_email_rootDefault.yaml`
```
apiVersion: monitoring.coreos.com/v1beta1
kind: AlertmanagerConfig
metadata:
  name: libor-alertmanager-testing-email-webhook
  namespace: libor-alertmanager-testing
spec:
  receivers:
  - name: 'libor-alertmanager-testing-webhook'
    webhookConfigs:
      - url: 'http://prometheus-msteams.internal-monitoring.svc:2000/occ-alerts'
  - name: 'libor-alertmanager-testing-email'
    emailConfigs:
      - to: USER@USER.CO
        requireTLS: false
        sendResolved: true
  - name: Default
  route:
    groupBy:
    - namespace
    receiver: Default
    groupInterval: 60s
    groupWait: 60s
    repeatInterval: 12h
    routes:
    - matchers:
      - name: severity
        value: critical
        matchType: '='
        continue: true
      receiver: 'libor-alertmanager-testing-webhook'
    - matchers:
      - name: severity
        value: critical
        matchType: '='
      receiver: 'libor-alertmanager-testing-email'
```
Once saved, the continue statement is removed from the object.
The configuration applied to alertmanager contains `continue: false` statements:
```
oc exec -n openshift-user-workload-monitoring alertmanager-user-workload-0 -- amtool config show --alertmanager.url http://localhost:9093
```
```
route:
  receiver: Default
  group_by:
  - namespace
  continue: false
  routes:
  - receiver: libor-alertmanager-testing/libor-alertmanager-testing-email-webhook/Default
    group_by:
    - namespace
    matchers:
    - namespace="libor-alertmanager-testing"
    continue: true
    routes:
    - receiver: libor-alertmanager-testing/libor-alertmanager-testing-email-webhook/libor-alertmanager-testing-webhook
      matchers:
      - severity="critical"
      continue: false  <----
    - receiver: libor-alertmanager-testing/libor-alertmanager-testing-email-webhook/libor-alertmanager-testing-email
      matchers:
      - severity="critical"
      continue: false <-----
```
If I update the statements to read `continue: true` and test here: https://prometheus.io/webtools/alerting/routing-tree-editor/ then I get the desired results.

The workaround is to use 2 separate AlertmanagerConfig files; with those, the continue statement is added correctly. 
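A sketch of that workaround, splitting the two critical-severity receivers into two AlertmanagerConfig objects so each carries its own route (abridged from the single-object example above; object names are my own):

```yaml
apiVersion: monitoring.coreos.com/v1beta1
kind: AlertmanagerConfig
metadata:
  name: libor-alertmanager-testing-webhook
  namespace: libor-alertmanager-testing
spec:
  receivers:
  - name: 'libor-alertmanager-testing-webhook'
    webhookConfigs:
    - url: 'http://prometheus-msteams.internal-monitoring.svc:2000/occ-alerts'
  route:
    groupBy:
    - namespace
    receiver: 'libor-alertmanager-testing-webhook'
    routes:
    - matchers:
      - name: severity
        value: critical
        matchType: '='
      receiver: 'libor-alertmanager-testing-webhook'
---
apiVersion: monitoring.coreos.com/v1beta1
kind: AlertmanagerConfig
metadata:
  name: libor-alertmanager-testing-email
  namespace: libor-alertmanager-testing
spec:
  receivers:
  - name: 'libor-alertmanager-testing-email'
    emailConfigs:
    - to: USER@USER.CO
      requireTLS: false
      sendResolved: true
  route:
    groupBy:
    - namespace
    receiver: 'libor-alertmanager-testing-email'
    routes:
    - matchers:
      - name: severity
        value: critical
        matchType: '='
      receiver: 'libor-alertmanager-testing-email'
```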

Actual results:

Once saved, the continue statement is removed from the object. 

Expected results:

The `continue: true` statement is retained and applied to alertmanager 

Additional info:

    

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-api-provider-alibaba/pull/46

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The etcd team has introduced an e2e test that exercises a full etcd backup and restore cycle in OCP [1].

We run those tests as part of our PR builds and since 4.15 [2] (also 4.16 [3]), we have failed runs with the catalogd-controller-manager crash looping:

1 events happened too frequently
event [namespace/openshift-catalogd node/ip-10-0-25-29.us-west-2.compute.internal pod/catalogd-controller-manager-768bb57cdb-nwbhr hmsg/47b381d71b - Back-off restarting failed container manager in pod catalogd-controller-manager-768bb57cdb-nwbhr_openshift-catalogd(aa38d084-ecb7-4588-bd75-f95adb4f5636)] happened 44 times}


I assume something in that controller doesn't really deal gracefully with the restoration process of etcd, or the apiserver being down for some time.


[1] https://github.com/openshift/origin/blob/master/test/extended/dr/recovery.go#L97

[2] https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1205/pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-etcd-recovery/1757443629380538368

[3] https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1191/pull-ci-openshift-cluster-etcd-operator-release-4.15-e2e-aws-etcd-recovery/1752293248543494144

Version-Release number of selected component (if applicable):

> 4.15

How reproducible:

always by running the test

Steps to Reproduce:

Run the test:

[sig-etcd][Feature:DisasterRecovery][Suite:openshift/etcd/recovery][Timeout:2h] [Feature:EtcdRecovery][Disruptive] Recover with snapshot with two unhealthy nodes and lost quorum [Serial]     

and observe the event invariant failing on it crash looping

Actual results:

catalogd-controller-manager crash loops and causes our CI jobs to fail

Expected results:

our e2e job is green again and catalogd-controller-manager doesn't crash loop       

Additional info:

 

Description of problem:

Mirror failed due to "manifest unknown" errors on certain images for the v2 format

 

Version-Release number of selected component (if applicable):

oc-mirror version 
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202403251146.p0.g03ce0ca.assembly.stream.el9-03ce0ca", GitCommit:"03ce0ca797e73b6762fd3e24100ce043199519e9", GitTreeState:"clean", BuildDate:"2024-03-25T16:34:33Z", GoVersion:"go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1)  Test full==true with following imagesetconfig:
cat config-full.yaml 
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
mirror:
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.16
      full: true

`oc-mirror --config config-full.yaml  file://out-full --v2`   

Actual results: 

The mirror command always failed, hitting errors:

2024/04/08 02:50:52  [ERROR]  : [Worker] errArray initializing source docker://registry.redhat.io/3scale-mas/zync-rhel8@sha256:8a108677b0b4100a3d58d924b2c7a47425292492df3dc6a2ebff33c58ca4e9e8: reading manifest sha256:8a108677b0b4100a3d58d924b2c7a47425292492df3dc6a2ebff33c58ca4e9e8 in registry.redhat.io/3scale-mas/zync-rhel8: manifest unknown

2024/04/08 09:12:55  [ERROR]  : [Worker] errArray initializing source docker://registry.redhat.io/integration/camel-k-rhel8-operator@sha256:4796985f3efcd37b057dea0a35b526c02759f8ea63327921cdd2e504c575d3c0: reading manifest sha256:4796985f3efcd37b057dea0a35b526c02759f8ea63327921cdd2e504c575d3c0 in registry.redhat.io/integration/camel-k-rhel8-operator: manifest unknown

2024/04/08 09:12:55  [ERROR]  : [Worker] errArray initializing source docker://registry.redhat.io/integration/camel-k-rhel8-operator@sha256:4796985f3efcd37b057dea0a35b526c02759f8ea63327921cdd2e504c575d3c0: reading manifest sha256:4796985f3efcd37b057dea0a35b526c02759f8ea63327921cdd2e504c575d3c0 in registry.redhat.io/integration/camel-k-rhel8-operator: manifest unknown

Expected results:

No error

The ci/prow/test.local pipeline is currently broken because the build04 cluster assigned to it in the build farm is a bit slow, making github.com/thanos-io/thanos/pkg/store go over the default 900s timeout.

panic: test timed out after 15m0s
running tests:
	TestTSDBStoreSeries (4m3s)
	TestTSDBStoreSeries/1SeriesWith10000000Samples (2m58s)

Extending it makes the test pass

 ok github.com/thanos-io/thanos/pkg/store 984.344s

 

We'll be addressing this alongside a follow-up issue to handle it with an env var in upstream Thanos.

Please review the following PR: https://github.com/openshift/router/pull/547

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

MachineAutoscaler resources with a minimum replica size of zero will display a hyphen ("-") instead of zero on the list and detail pages.

Version-Release number of selected component (if applicable):

    4.14.7

How reproducible:

    always

Steps to Reproduce:

    1.create a MachineAutoscaler with "min: 0" field
    2.save record
    3.navigate to MachineAutoscalers page under the Compute tab
    

Actual results:

    the min replicas indicates "-"

Expected results:

    min replicas indicates "0"

Additional info:

    attaching a screenshot

After NM introduced dns-change event, we are creating an infinite loop of on-prem-resolv-prepender.service runs. This is because our prepender script ALWAYS runs `nmcli general reload dns-rc`, no matter if the changes are needed for real or not.

Because of this, we have the following

1) NM change DNS
2) dispatcher script append a server to /etc/resolv.conf
3) dispatcher invoked again as new dns-change event.
4) dispatcher check again and creates new /etc/resolv.conf, the same as old
5) NM change DNS, dns-change event is invoked
6) goto 3

As a fix, the prepender script should check whether the newly generated file differs from the existing /etc/resolv.conf and only apply the change if needed.
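A minimal sketch of that check; the helper that builds the candidate file is hypothetical, while the nmcli call is the one named above:

```
# Sketch: only reload NetworkManager's DNS config when the generated file actually differs.
new_resolv_conf="$(mktemp)"
build_resolv_conf > "${new_resolv_conf}"        # hypothetical helper producing the prepended content

if ! cmp -s "${new_resolv_conf}" /etc/resolv.conf; then
    cp "${new_resolv_conf}" /etc/resolv.conf
    nmcli general reload dns-rc                 # only runs when something changed, breaking the event loop
fi
rm -f "${new_resolv_conf}"
```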

Enable French and Spanish in the OCP Console

A.C.

  • Add French and Spanish to User Preferences dropdown
  • Download French and Spanish translations from Memsource portal when it is ready
  • Review and open a pull request

This is a clone of issue OCPBUGS-34618. The following is the description of the original issue:

Description of problem:

When performing a UPI installation, the installer fails with:
time="2024-05-29T14:38:59-04:00" level=fatal msg="failed to fetch Cluster API Machine Manifests: failed to generate asset \"Cluster API Machine Manifests\": unable to generate CAPI machines for vSphere unable to get network inventory path: unable to find network ci-vlan-896 in resource pool /cidatacenter/host/cicluster/Resources/ci-op-yrhjini6-9ef4a"

If I pre-create the resource pool(s), the installation proceeds.

Version-Release number of selected component (if applicable):

    4.16 nightly

How reproducible:

    consistently

Steps to Reproduce:

    1. Follow documentation to perform a UPI installation
    2. Installation will fail during manifest creation
    3.
    

Actual results:

    Installation fails

Expected results:

    Installation should proceed

Additional info:

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/51894/rehearse-51894-periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-upi-zones/1795883271666536448    

Please review the following PR: https://github.com/openshift/azure-file-csi-driver/pull/47

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

4.17.0-0.nightly-2024-05-16-195932 and 4.16.0-0.nightly-2024-05-17-031643 both have resource quota issues like

 failed to create iam: LimitExceeded: Cannot exceed quota for OpenIdConnectProvidersPerAccount: 100
	status code: 409, request id: f69bf82c-9617-408a-b281-92c1ef0ec974 
 failed to create infra: failed to create VPC: VpcLimitExceeded: The maximum number of VPCs has been reached.
	status code: 400, request id: f90dcc5b-7e66-4a14-aa22-cec9f602fa8e 

Seth has indicated he is working to clean things up in https://redhat-internal.slack.com/archives/C01CQA76KMX/p1715913603117349?thread_ts=1715557887.529169&cid=C01CQA76KMX

Please review the following PR: https://github.com/openshift/origin/pull/28452

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

[sig-network][endpoints] admission [apigroup:config.openshift.io] [It] blocks manual creation of EndpointSlices pointing to the cluster or service network

[Suite:openshift/conformance/parallel]
github.com/openshift/origin/test/extended/networking/endpoint_admission.go:81

[FAILED] error getting endpoint controller service account 

 

Description of problem:

When the IPI installer creates a service instance for the user, PowerVS will now report the type as composite_instance rather than service_instance. Fix up cluster deletion to account for this change.
    

Version-Release number of selected component (if applicable):


    

How reproducible:

Always
    

Steps to Reproduce:

    1. Create cluster
    2. Destroy cluster
    3.
    

Actual results:

The newly created service instance is not deleted.
    

Expected results:


    

Additional info:


    

Description of problem:

After build02 is upgraded to 4.16.0-ec.4 from 4.16.0-ec.3, the CSRs are not auto-approved. As a result, provisioned machines cannot become nodes of the cluster.

Version-Release number of selected component (if applicable):

oc --context build02 get clusterversion version
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-ec.4   True        False         4h28m

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

Michael McCune feels the group "system:serviceaccounts" was missing in the CSR.
https://redhat-internal.slack.com/archives/CBZHF4DHC/p1710875084740869?thread_ts=1710861842.471739&cid=CBZHF4DHC

An inspection of the namespace openshift-cluster-machine-approver:
https://redhat-internal.slack.com/archives/CBZHF4DHC/p1710863462860809?thread_ts=1710861842.471739&cid=CBZHF4DHC

 

A workaround to approve the CSRs manually on b02:

https://github.com/openshift/release/pull/50016

 

Component Readiness has found a potential regression in [Unknown][invariant] alert/KubePodNotReady should not be at or above info in all the other namespaces.

Probability of significant regression: 100.00%

Sample (being evaluated) Release: 4.16
Start Time: 2024-04-18T00:00:00Z
End Time: 2024-04-24T23:59:59Z
Success Rate: 93.14%
Successes: 95
Failures: 7
Flakes: 0

Base (historical) Release: 4.15
Start Time: 2024-02-01T00:00:00Z
End Time: 2024-02-28T23:59:59Z
Success Rate: 100.00%
Successes: 482
Failures: 0
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=Unknown&confidence=95&environment=ovn%20upgrade-micro%20amd64%20azure%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&pity=5&platform=azure&sampleEndTime=2024-04-24%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-04-18%2000%3A00%3A00&testId=openshift-tests-upgrade%3A57b9d37e7f1d80cb25d3ba4386abc630&testName=%5BUnknown%5D%5Binvariant%5D%20alert%2FKubePodNotReady%20should%20not%20be%20at%20or%20above%20info%20in%20all%20the%20other%20namespaces&upgrade=upgrade-micro&variant=standard

Description of problem:

When executing oc mirror using an OCI path, you can end up in an error state when the destination is a file://<path> destination (i.e. mirror to disk).

Version-Release number of selected component (if applicable):

4.14.2

How reproducible:

always

Steps to Reproduce:


At IBM we use the ibm-pak tool to generate an OCI catalog, but this bug is reproducible using a simple skopeo copy. Once you've copied the image locally you can move it around using file system copy commands to test this in different ways.

1. Make a directory structure like this to simulate how ibm-pak creates its own catalogs. The problem seems to be related to the path you use, so this represents the failure case:

mkdir -p /root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list

2. make a location where the local storage will live:

mkdir -p /root/.ibm-pak/oc-mirror-storage

3. Next, copy the image locally using skopeo:

skopeo copy docker://icr.io/cpopen/ibm-zcon-zosconnect-catalog@sha256:8d28189637b53feb648baa6d7e3dd71935656a41fd8673292163dd750ef91eec oci:///root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list --all --format v2s2

4. You can copy the OCI catalog content to a location where things will work properly so you can see a working example:

cp -r /root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list /root/ibm-zcon-zosconnect-catalog

5. You'll need an ISC... I've included both the oci references in the example (the commented out one works, but the oci:///root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list reference fails).

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  operators:
  - catalog: oci:///root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list
  #- catalog: oci:///root/ibm-zcon-zosconnect-catalog
    packages:
    - name: ibm-zcon-zosconnect
      channels:
      - name: v1.0
    full: true
    targetTag: 27ba8e
    targetCatalog: ibm-catalog
storageConfig:
  local:
    path: /root/.ibm-pak/oc-mirror-storage

6. run oc mirror (remember the ISC has oci refs for good and bad scenarios). You may want to change your working directory to different locations between running the good/bad examples.

oc mirror --config /root/.ibm-pak/data/publish/latest/image-set-config.yaml file://zcon --dest-skip-tls --max-per-registry=6




Actual results:


Logging to .oc-mirror.log
Found: zcon/oc-mirror-workspace/src/publish
Found: zcon/oc-mirror-workspace/src/v2
Found: zcon/oc-mirror-workspace/src/charts
Found: zcon/oc-mirror-workspace/src/release-signatures
error: ".ibm-pak/data/publish/latest/catalog-oci/manifest-list/kubebuilder/kube-rbac-proxy@sha256:db06cc4c084dd0253134f156dddaaf53ef1c3fb3cc809e5d81711baa4029ea4c" is not a valid image reference: invalid reference format


Expected results:


Simple example where things were working with the oci:///root/ibm-zcon-zosconnect-catalog reference (this was executed in the same workspace so no new images were detected).

Logging to .oc-mirror.log
Found: zcon/oc-mirror-workspace/src/publish
Found: zcon/oc-mirror-workspace/src/v2
Found: zcon/oc-mirror-workspace/src/charts
Found: zcon/oc-mirror-workspace/src/release-signatures
3 related images processed in 668.063974ms
Writing image mapping to zcon/oc-mirror-workspace/operators.1700092336/manifests-ibm-zcon-zosconnect-catalog/mapping.txt
No new images detected, process stopping

Additional info:


I debugged the error that happened and captured one of the instances where the ParseReference call fails. This is only for reference to help narrow down the issue.

github.com/openshift/oc/pkg/cli/image/imagesource.ParseReference (/root/go/src/openshift/oc-mirror/vendor/github.com/openshift/oc/pkg/cli/image/imagesource/reference.go:111)
github.com/openshift/oc-mirror/pkg/image.ParseReference (/root/go/src/openshift/oc-mirror/pkg/image/image.go:79)
github.com/openshift/oc-mirror/pkg/cli/mirror.(*MirrorOptions).addRelatedImageToMapping (/root/go/src/openshift/oc-mirror/pkg/cli/mirror/fbc_operators.go:194)
github.com/openshift/oc-mirror/pkg/cli/mirror.(*OperatorOptions).plan.func3 (/root/go/src/openshift/oc-mirror/pkg/cli/mirror/operator.go:575)
golang.org/x/sync/errgroup.(*Group).Go.func1 (/root/go/src/openshift/oc-mirror/vendor/golang.org/x/sync/errgroup/errgroup.go:75)
runtime.goexit (/usr/local/go/src/runtime/asm_amd64.s:1594)

Also, I wanted to point out that because we use a period in the path (i.e. .ibm-pak) I wonder if that's causing the issue? This is just a guess and something to consider. *FOLLOWUP* ... I just removed the period from ".ibm-pak" and that seemed to make the error go away.

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/127

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-30950. The following is the description of the original issue:

Description of problem: ovnkube-node and multus DaemonSets have hostPath volumes which prevent clean unmount of CSI Volumes because of a missing "mountPropagation: HostToContainer" parameter in their volumeMounts

Version-Release number of selected component (if applicable):  OpenShift 4.14

How reproducible:  Always

Steps to Reproduce:

1. on a node mount a file system underneath /var/lib/kubelet/ simulating the mount of a  CSI driver PersistentVolume

2. restart the ovnkube-node pod running on that node

3. Unmount the filesystem from step 1. The mount will then be removed from the host's list of mounted devices; however, a copy of the mount is still active in the mount namespace of the ovnkube-node pod.
This is blocking some CSI drivers relying on multipath from properly deleting a block device, since mounts are still registered on the block device.
 

Actual results:
CSI Volume Mount uncleanly unmounted (a copy of the mount remains active in the ovnkube-node pod's mount namespace).
 

Expected results:
CSI Volume Mount cleanly unmounted.
 

Additional info:

The mountPropagation parameter is already implemented in the volumeMount for the host rootFS:

            - name: host-slash
              readOnly: true
              mountPath: /host
              mountPropagation: HostToContainer

 However the same parameter is missing for the volumeMount of /var/lib/kubelet

It is possible to work around the issue with a kubectl patch command like this:

$ kubectl patch daemonset ovnkube-node --type='json' -p='[
  {
    "op": "replace",
    "path": "/spec/template/spec/containers/7/volumeMounts/1",
    "value": {
      "name": "host-kubelet",
      "mountPath": "/var/lib/kubelet",
      "mountPropagation": "HostToContainer",
      "readOnly": true
   }
 }
]'

 

Affected Platforms: Platform Agnostic UPI

This is a clone of issue OCPBUGS-42812. The following is the description of the original issue:

This is a clone of issue OCPBUGS-42514. The following is the description of the original issue:

Description of problem:

When configuring the OpenShift image registry to use a custom Azure storage account in a different resource group, following the official documentation [1], the image-registry CO degrades and the upgrade from version 4.14.x to 4.15.x fails. The image registry operator reports misconfiguration errors related to Azure storage credentials, preventing the upgrade and causing instability in the control plane.

[1] Configuring registry storage in Azure user infrastructure

Version-Release number of selected component (if applicable):

   4.14.33, 4.15.33

How reproducible:

  1. Set up ARO:
    • Deploy an ARO or OpenShift cluster on Azure, version 4.14.x.
  2. Configure Image Registry:
    • Follow the official documentation [1] to configure the image registry to use a custom Azure storage account located in a different resource group.
    • Ensure that the image-registry-private-configuration-user secret is created in the openshift-image-registry namespace.
    • Do not modify the installer-cloud-credentials secret.
  3. Check the image registry CO status
  4. Initiate Upgrade:
    • Attempt to upgrade the cluster to OpenShift version 4.15.x.

Steps to Reproduce:

  1. If we have the image-registry-private-configuration-user secret in place and installer-cloud-credentials not modified

We got the error 

    NodeCADaemonProgressing: The daemon set node-ca is deployed Progressing: Unable to apply resources: unable to sync storage configuration: client misconfigured, missing 'TenantID', 'ClientID', 'ClientSecret', 'FederatedTokenFile', 'Creds', 'SubscriptionID' option(s) 

The operator will also generate a new secret image-registry-private-configuration with the same content as image-registry-private-configuration-user

$ oc get secret  image-registry-private-configuration -o yaml
apiVersion: v1
data:
  REGISTRY_STORAGE_AZURE_ACCOUNTKEY: xxxxxxxxxxxxxxxxx
kind: Secret
metadata:
  annotations:
    imageregistry.operator.openshift.io/checksum: sha256:524fab8dd71302f1a9ade9b152b3f9576edb2b670752e1bae1cb49b4de992eee
  creationTimestamp: "2024-09-26T19:52:17Z"
  name: image-registry-private-configuration
  namespace: openshift-image-registry
  resourceVersion: "126426"
  uid: e2064353-2511-4666-bd43-29dd020573fe
type: Opaque 

 

2. Then we delete the secret image-registry-private-configuration-user

Now the secret image-registry-private-configuration still exists with the same content, but the image-registry CO reports a new error:

 

NodeCADaemonProgressing: The daemon set node-ca is deployed Progressing: Unable to apply resources: unable to sync storage configuration: failed to get keys for the storage account arojudesa: storage.AccountsClient#ListKeys: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Storage/storageAccounts/arojudesa' under resource group 'aro-ufjvmbl1' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix" 

3. Apply the workaround by manually changing the azure_resourcegroup key in the installer-cloud-credentials secret to the custom storage account's resource group (a patch sketch follows the secret below)

$ oc get secret installer-cloud-credentials -o yaml
apiVersion: v1
data:
  azure_client_id: xxxxxxxxxxxxxxxxx
  azure_client_secret: xxxxxxxxxxxxxxxxx
  azure_region: xxxxxxxxxxxxxxxxx
  azure_resource_prefix: xxxxxxxxxxxxxxxxx
  azure_resourcegroup: xxxxxxxxxxxxxxxxx <<<<<-----THIS
  azure_subscription_id: xxxxxxxxxxxxxxxxx
  azure_tenant_id: xxxxxxxxxxxxxxxxx
kind: Secret
metadata:
  annotations:
    cloudcredential.openshift.io/credentials-request: openshift-cloud-credential-operator/openshift-image-registry-azure
  creationTimestamp: "2024-09-26T16:49:57Z"
  labels:
    cloudcredential.openshift.io/credentials-request: "true"
  name: installer-cloud-credentials
  namespace: openshift-image-registry
  resourceVersion: "133921"
  uid: d1268e2c-1825-49f0-aa44-d0e1cbcda383
type: Opaque 
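
A minimal sketch of that manual change (the resource group name is a placeholder; the value must be base64-encoded because it is written into .data):

# GNU base64 shown; on macOS drop the -w0 flag.
RG_B64=$(echo -n '<custom-storage-account-resource-group>' | base64 -w0)
oc -n openshift-image-registry patch secret installer-cloud-credentials \
  --type merge -p "{\"data\":{\"azure_resourcegroup\":\"$RG_B64\"}}"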

 

The image-registry CO then reports healthy, and this allows the upgrade to continue

 

Actual results:

    The image registry still appears to use service principal authentication for the Azure storage account

Expected results:

    We expect REGISTRY_STORAGE_AZURE_ACCOUNTKEY to be the only thing the image registry operator needs for storage account authentication when the customer provides it.
  • The image registry continues to function using the custom Azure storage account in the different resource group.

Additional info:

  • Reproducibility: The issue is consistently reproducible by following the official documentation to configure the image registry with a custom storage account in a different resource group and then attempting an upgrade.
  • Related Issues:
    • Similar problems have been reported in previous incidents, suggesting a systemic issue with the image registry operator's handling of Azure storage credentials.
  • Critical Customer Impact: Customers are required to perform manual interventions after every upgrade for each cluster, which is not sustainable and leads to operational overhead.

 

Slack : https://redhat-internal.slack.com/archives/CCV9YF9PD/p1727379313014789

With 4.15, resource watch completes whenever it is interrupted. But for 4.16 jobs, it does not complete until the 1h grace period kicks in and the job is terminated by ci-operator. This means:

  • All 4.16 jobs take 1h longer than normal
  • If an upgrade pushes the overall time close to the limit, sub-jobs from the aggregator will fail at the 4h mark.

 

This was discovered when investigating this slack thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1705578623724259

 

This update is compatible with recent kube-openapi changes so that we can drop our replace k8s.io/kube-openapi => k8s.io/kube-openapi v0.0.0-20230928195430-ce36a0c3bb67 introduced in 53b387f4f54c8426526478afd0fd3e2b4e7aec66.

Description of problem

The cluster-dns-operator repository vendors k8s.io/* v0.28.3 and controller-runtime v0.16.3. OpenShift 4.16 is based on Kubernetes 1.29.

Version-Release number of selected component (if applicable)

4.16.

How reproducible

Always.

Steps to Reproduce

Check https://github.com/openshift/cluster-dns-operator/blob/release-4.16/go.mod.

Actual results

The k8s.io/* packages are at v0.28.3, and the sigs.k8s.io/controller-runtime package is at v0.16.3.

Expected results

The k8s.io/* packages are at v0.29.0 or newer, and the sigs.k8s.io/controller-runtime package is at v0.17.0 or newer.

Additional info

The controller-runtime v0.17 release includes some breaking changes, such as the removal of apiutil.NewDiscoveryRESTMapper; see the release notes at https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.17.0. The k8s.io/api v0.29 release drops flowcontrol/v1alpha1, which means we also need to bump openshift/api in order to get https://github.com/openshift/api/pull/1647. https://github.com/openshift/cluster-dns-operator/pull/394 will include a openshift/api bump that includes the removal of flowcontrol/v1alpha1 from openshift/api, so better to merge #394 first, and then bump k8s.io/api and controller-runtime after that.
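
A sketch of the bump itself, assuming the modules and versions named above and that it is run inside the cluster-dns-operator repository after the openshift/api bump has merged:

go get k8s.io/api@v0.29.0 k8s.io/apimachinery@v0.29.0 k8s.io/client-go@v0.29.0
go get sigs.k8s.io/controller-runtime@v0.17.0
go mod tidy && go mod vendor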

Description of problem:

    For high scalability, we need an option to disable unused machine management control plane components.

Version-Release number of selected component (if applicable):

    

How reproducible:

    100%

Steps to Reproduce:

    1. Create HostedCluster/HostedControlPlane
    2. 
    3.
    

Actual results:

    Machine management components (cluster-api, machine-approver, auto-scaler, etc) are deployed

Expected results:

    Should have an option to disable them, as in some use cases they provide no utility.

Additional info:

    

Please review the following PR: https://github.com/openshift/azure-workload-identity/pull/16

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

[Azuredisk-csi-driver] allocatable volumes count incorrect in csinode for Standard_B4as_v2 instance types

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-02-132842

How reproducible:

Always

Steps to Reproduce:

1. Install Azure OpenShift cluster use the Standard_B4as_v2 instance type
2. Check the csinode object allocatable volumes count
3. Create a pod with the max allocatable volumes count pvcs(provision by azuredisk-csi-driver)

Actual results:

In step 2 the allocatable volumes count is 16.
$ oc get csinode pewang-0908s-r6lwd-worker-southcentralus3-tvwwr -ojsonpath='{.spec.drivers[?(@.name=="disk.csi.azure.com")].allocatable.count}'
16

In step 3 the pod is stuck at ContainerCreating, caused by a volume attach failure:
09-07 22:38:28.758        "message": "The maximum number of data disks allowed to be attached to a VM of this size is 8.",\r

Expected results:

In step 2 the allocatable volumes count should be 8.
In step 3 the pod should be Running and all volumes should be readable and writable.

Additional info:

$ az vm list-skus -l eastus --query "[?name=='Standard_B4as_v2']"| jq -r '.[0].capabilities[] | select(.name =="MaxDataDiskCount")'
{
  "name": "MaxDataDiskCount",
  "value": "8"
}

Currently in 4.14 we use the v1.28.1 driver. I checked the upstream issues and PRs; the issue is fixed in v1.28.2:
https://github.com/kubernetes-sigs/azuredisk-csi-driver/releases/tag/v1.28.2

This is a clone of issue OCPBUGS-35494. The following is the description of the original issue:

Description of problem:

ROSA Cluster creation goes into error status sometimes with version 4.16.0-0.nightly-2024-06-14-072943

Version-Release number of selected component (if applicable):

 

How reproducible:

60%

Steps to Reproduce:

1. Prepare VPC
2. Create a rosa sts cluster cluster with subnets
3. Wait for cluster ready

Actual results:

Cluster goes into error status

Expected results:

Cluster gets ready

Additional info:

The failure happens when triggered by CI jobs. Here are the jobs:

https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-aws-rosa-sts-localzone-f7/180153139425024819 

https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-aws-rosa-sts-private-proxy-f7/1801531362717470720

 

This is a clone of issue OCPBUGS-34533. The following is the description of the original issue:

Description of problem:

   A customer reported that they are not able to edit the "Until" option from the Developer perspective.

Version-Release number of selected component (if applicable):

    OCP v4.15.11     

Screenshot
https://redhat-internal.slack.com/archives/C04BSV48DJS/p1716889816419439

Description of problem

The API documentation for the status.componentRoutes.currentHostnames field in the ingress config API has developer notes from the Go definition.

Version-Release number of selected component (if applicable)

OpenShift 4.11 and all subsequent versions of OpenShift so far.

How reproducible

100%.

Steps to Reproduce

1. Read the documentation for the API field: oc explain ingresses.status.componentRoutes.currentHostnames --api-version=config.openshift.io/v1

Actual results

The ingresses.config.openshift.io CRD has developer notes in the description of the status.componentRoutes.currentHostnames field:

% oc explain ingresses.status.componentRoutes.currentHostnames --api-version=config.openshift.io/v1
KIND:     Ingress
VERSION:  config.openshift.io/v1

FIELD:    currentHostnames <[]string>

DESCRIPTION:
     currentHostnames is the list of current names used by the route. Typically,
     this list should consist of a single hostname, but if multiple hostnames
     are supported by the route the operator may write multiple entries to this
     list.

     Hostname is an alias for hostname string validation. The left operand of
     the | is the original kubebuilder hostname validation format, which is
     incorrect because it allows upper case letters, disallows hyphen or number
     in the TLD, and allows labels to start/end in non-alphanumeric characters.
     See https://bugzilla.redhat.com/show_bug.cgi?id=2039256.
     ^([a-zA-Z0-9\p{S}\p{L}]((-?[a-zA-Z0-9\p{S}\p{L}]{0,62})?)|([a-zA-Z0-9\p{S}\p{L}](([a-zA-Z0-9-\p{S}\p{L}]{0,61}[a-zA-Z0-9\p{S}\p{L}])?)(\.)){1,}([a-zA-Z\p{L}]){2,63})$
     The right operand of the | is a new pattern that mimics the current API
     route admission validation on hostname, except that it allows hostnames
     longer than the maximum length:
     ^(([a-z0-9][-a-z0-9]{0,61}[a-z0-9]|[a-z0-9]{1,63})[\.]){0,}([a-z0-9][-a-z0-9]{0,61}[a-z0-9]|[a-z0-9]{1,63})$
     Both operand patterns are made available so that modifications on ingress
     spec can still happen after an invalid hostname was saved via validation by
     the incorrect left operand of the | operator.

Expected results

The second paragraph should be omitted from the CRD:

% oc explain ingresses.status.componentRoutes.currentHostnames --api-version=config.openshift.io/v1
KIND:     Ingress
VERSION:  config.openshift.io/v1

FIELD:    currentHostnames <[]string>

DESCRIPTION:
     currentHostnames is the list of current names used by the route. Typically,
     this list should consist of a single hostname, but if multiple hostnames
     are supported by the route the operator may write multiple entries to this
     list.

Additional info

The API field was introduced in OpenShift 4.8: https://github.com/openshift/api/pull/852/commits/c53c57f3d465f28b27ee4fad48763f049228486e

The developer note was added in OpenShift 4.11: https://github.com/openshift/api/pull/1120/commits/1fec415423985530a8925a5fd8c87e1741d8c2fb

Description of problem:

'kubeadmin' user is unable to log out when logged in with the 'kube:admin' IDP; clicking 'Log out' does nothing

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-04-06-020637    

How reproducible:

Always    

Steps to Reproduce:

1. Login to console with 'kube:admin' IDP, type username 'kubeadmin' and its password
2. Try to Log out from console
    

Actual results:

2. unable to log out successfully    

Expected results:

2. any user should be able to log out successfully    

Additional info:

    

A runbook for the TargetDown alert will be useful for OpenShift users.
This runbook should explain:
1. how to identify which targets are down (a query sketch follows this list)
2. how to investigate the reason why the target goes offline
3. resolution of common causes bringing down the target
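
As a starting point for item 1, one way to list the currently down targets is to run the PromQL query up == 0 against the in-cluster query endpoint; a minimal sketch, assuming the default thanos-querier route in openshift-monitoring and that jq is available locally:

TOKEN=$(oc whoami -t)
HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
# List scrape targets that are currently down and print their identifying labels.
curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query" \
  --data-urlencode 'query=up == 0' \
  | jq '.data.result[].metric | {job, namespace, instance}'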

Related cases:

This is a clone of issue OCPBUGS-43788. The following is the description of the original issue:

This is a clone of issue OCPBUGS-18007. The following is the description of the original issue:

Description of problem:

When the TelemeterClientFailures alert fires, there's no runbook link explaining the meaning of the alert and what to do about it.

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:

1. Check the TelemeterClientFailures alerting rule's annotations
2.
3.

Actual results:

No runbook_url annotation.

Expected results:

runbook_url annotation is present.

Additional info:

This is a consequence of a telemeter server outage that triggered questions from customers about the alert:
https://issues.redhat.com/browse/OHSS-25947
https://issues.redhat.com/browse/OCPBUGS-17966
Also in relation to https://issues.redhat.com/browse/OCPBUGS-17797

Description of problem:

Openshift Console shows "Info alert:Non-printable file detected. File contains non-printable characters. Preview is not available." while editing an XML file-type configmap.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Create configmap from file:
# oc create cm test-cm --from-file=server.xml=server.xml
configmap/test-cm created

2. If we try to edit the configmap in the OCP console we see the following error:

Info alert:Non-printable file detected.
File contains non-printable characters. Preview is not available.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-baremetal-operator/pull/393

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Port 22 is added to the worker node security group in TF install [1]:

resource "aws_security_group_rule" "worker_ingress_ssh" {
  type              = "ingress"
  security_group_id = aws_security_group.worker.id
  description       = local.description

  protocol    = "tcp"
  cidr_blocks = var.cidr_blocks
  from_port   = 22
  to_port     = 22
}

But it's missing in SDK install [2]


[1] https://github.com/openshift/installer/blob/master/data/data/aws/cluster/vpc/sg-worker.tf#L39-L48
[2] https://github.com/openshift/installer/pull/7676/files#diff-c89a0152f7d51be6e3830081d1c166d9333628982773c154d8fc9a071c8ff765R272
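
Until the SDK install path adds the rule, a manual workaround sketch that mirrors the Terraform rule above (the security group ID and CIDR are placeholders):

aws ec2 authorize-security-group-ingress \
  --group-id <worker-security-group-id> \
  --protocol tcp --port 22 \
  --cidr <machine-network-cidr>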


    

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-03-31-180021
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Create a cluster using SDK installation method
    2.
    3.
    

Actual results:

See description.
    

Expected results:

Port 22 is added to worker node's security group.
    

Additional info:

    

Description of problem:

 CPMS was supported in 4.15 on the vSphere platform when TechPreviewNoUpgrade is enabled, but after building the cluster with no failure domains (or a single failure domain) set in install-config, there were three duplicated failure domains.

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2023-12-11-033133

How reproducible:

    install a cluster with TP enabled and don't set a failure domain (or set a single failure domain) in install-config.

Steps to Reproduce:

    1. Do not configure a failure domain in install-config (or set a single failure domain).
    2. install a cluster with TP enabled
    3. check CPMS with command:   
       oc get controlplanemachineset -oyaml    

Actual results:

duplicated failure domains.
    failureDomains:
      platform: VSphere
      vsphere:
      - name: generated-failure-domain
      - name: generated-failure-domain
      - name: generated-failure-domain
    metadata:
      labels:

Expected results:

 The failure domain should not be duplicated when a single failure domain is set in install-config.
 The failure domain should not exist when no failure domain is set in install-config.

Additional info:

    

Description of problem:

If GloballyDisableIrqLoadBalancing is disabled in the performance profile, then IRQs should be balanced across all CPUs minus the CPUs that are explicitly removed by CRI-O via the pod annotation irq-load-balancing.crio.io: "disable"

The issue is that the scheduler plugin in tuned attempts to affine all IRQs to the non-isolated cores (isolated here means non-reserved, not truly isolated cores). This is directly at odds with the user intent, so tuned ends up fighting with crio/irqbalance, each trying to do something different.

Scenarios
- If a pod gets launched with the annotation after tuned has started, at runtime or after a reboot - ok 
- On a reboot if tuned recovers after the guaranteed pod has been launched - broken
- If tuned restarts at runtime for any reason - broken
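
For reference, a minimal sketch of a guaranteed pod carrying the annotation from the description above (the runtime class name and image are placeholders; the annotation only takes effect with a performance-profile-generated runtime class and guaranteed QoS, i.e. requests equal to limits):

oc apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: irq-affinity-test
  annotations:
    irq-load-balancing.crio.io: "disable"
spec:
  runtimeClassName: performance-<profile-name>
  containers:
  - name: app
    image: <some-image>
    command: ["sleep", "infinity"]
    resources:
      requests:
        cpu: "2"
        memory: 512Mi
      limits:
        cpu: "2"
        memory: 512Mi
EOF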

Version-Release number of selected component (if applicable):

   4.14 and likely earlier

How reproducible:

    See description

Steps to Reproduce:

    1.See description 
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

 

Description of problem: OCP doesn't resume from "hibernation" (shutdown/restart of cloud instances).

NB: This is not related to certs.

Version-Release number of selected component (if applicable): 4.16 nightlies, at least 4.16.0-0.nightly-2024-05-14-095225 through 4.16.0-0.nightly-2024-05-21-043355

How reproducible: 100%

Steps to Reproduce:

1. Install 4.16 nightly on AWS. (Other platforms may be affected, don't know.)
2. Shut down all instances. (I've done this via hive hibernation; Vadim Rutkovsky has done it via cloud console.)
3. Start instances. (Ditto.)

Actual results: OCP doesn't start. Per Vadim:
"kubelet says host IP unknown; known addresses: [] so etcd can't start."

Expected results: OCP starts normally.

Additional info: We originally thought this was related to OCPBUGS-30860, but reproduced with nightlies containing the updated AMIs.

In https://issues.redhat.com/browse/OCPBUGS-24195 Lukasz is working on a solution to a problem both the auth and apiserver operators have where a large number of identical kube events can be emitted. The kube apiserver was granted an exception here, but the linked bug was never fixed.

These OpenShiftAPICheckFailed events are reportedly originating during bootstrap, and if bootstrap takes too long many can be emitted, which can trip a test that watches for this sort of thing.

Ideally the problem should be fixed and it sounds like Lukasz is on the path to one which we hope could be used for the apiserver operator as well. (start a controller monitoring the aggregated API only after the bootstrap is complete)

 Fix here would hopefully be to leverage what comes out of OCPBUGS-24195, apply it for the apiserver operator, and then remove the exception linked above in origin.

This is a clone of issue OCPBUGS-23922. The following is the description of the original issue:

Description of problem:

In https://issues.redhat.com//browse/STOR-1453: TLSSecurityProfile feature, storage clustercsidriver.spec.observedConfig gets the value from APIServer.spec.tlsSecurityProfile to set cipherSuites and minTLSVersion in all corresponding CSI drivers. However, it doesn't work well in a hypershift cluster: when a different value is set only in hostedclusters.spec.configuration.apiServer.tlsSecurityProfile in the management cluster, the APIServer.spec in the hosted cluster is not synced and the CSI driver doesn't get the updated value either.

Version-Release number of selected component (if applicable):

Pre-merge test with openshift/csi-operator#69,openshift/csi-operator#71

How reproducible:

Always

Steps to Reproduce:

1. Have a hypershift cluster, the clustercsidriver get the default value like "minTLSVersion": "VersionTLS12"
$ oc get clustercsidriver ebs.csi.aws.com -ojson | jq .spec.observedConfig.targetcsiconfig.servingInfo
{
  "cipherSuites": [
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256",
    "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"
  ],
  "minTLSVersion": "VersionTLS12"
}
 
2. set the tlsSecurityProfile in hostedclusters.spec.configuration.apiServer in mgmtcluster, like the "minTLSVersion": "VersionTLS11":
 $ oc -n clusters get hostedclusters hypershift-ci-14206 -o json | jq .spec.configuration
{
  "apiServer": {
    "audit": {
      "profile": "Default"
    },
    "tlsSecurityProfile": {
      "custom": {
        "ciphers": [
          "ECDHE-ECDSA-CHACHA20-POLY1305",
          "ECDHE-RSA-CHACHA20-POLY1305",
          "ECDHE-RSA-AES128-GCM-SHA256",
          "ECDHE-ECDSA-AES128-GCM-SHA256"
        ],
        "minTLSVersion": "VersionTLS11"
      },
      "type": "Custom"
    }
  }
}     

3. This doesn't pass to apiserver in hosted cluster
oc get apiserver cluster -ojson | jq .spec
{
  "audit": {
    "profile": "Default"
  }
}     

4. CSI Driver still use the default value which is different from mgmtcluster.hostedclusters.spec.configuration.apiServer
$ oc get clustercsidriver ebs.csi.aws.com -ojson | jq .spec.observedConfig.targetcsiconfig.servingInfo
{
  "cipherSuites": [
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256",
    "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"
  ],
  "minTLSVersion": "VersionTLS12"
}

Actual results:

The tlsSecurityProfile doesn't get synced 

Expected results:

The tlsSecurityProfile should get synced 

Additional info:

    

User Story

As a developer I want to remove the NoUpgrade annotation from the CAPI IPAM CRDs so that I can promote them to General Availability

Background

The SPLAT team is planning to have the CAPI IPAM CRDs promoted to GA because they need them in a component they are promoting to GA.

Steps

  • remove the NoUpgrade annotation from the CAPI IPAM CRDs

Stakeholders

  • SPLAT
  • Cluster Infra Team

Definition of Done

  • manifests-generator PR merged
  • cluster-api repo PR merged

Description of problem:

    To operate HyperShift at high scale, we need an option to disable dedicated request serving isolation, if not used.

Version-Release number of selected component (if applicable):

    4.16, 4.15, 4.14, 4.13

How reproducible:

    100%

Steps to Reproduce:

    1. Install hypershift operator for versions 4.16, 4.15, 4.14, or 4.13
    2. Observe start-up logs
    3. Dedicated request serving isolation controllers are started
    

Actual results:

    Dedicated request serving isolation controllers are started

Expected results:

    Dedicated request serving isolation controllers to not start, if unneeded

Additional info:

    

This is a clone of issue OCPBUGS-25758. The following is the description of the original issue:

Description of problem:

router pod is in CrashLoopBackOff after y-stream upgrade from 4.13->4.14

Version-Release number of selected component (if applicable):

    

How reproducible:

always    

Steps to Reproduce:

    1. create a cluster with 4.13
    2. upgrade HC to 4.14
    3.
    

Actual results:

    router pod in CrashLoopBackoff

Expected results:

    router pod is running after upgrade HC from 4.13->4.14

Additional info:

images:
======
HO image: 4.15
upgrade HC from 4.13.0-0.nightly-2023-12-19-114348 to 4.14.0-0.nightly-2023-12-19-120138

router pod log:
==============
jiezhao-mac:hypershift jiezhao$ oc get pods router-9cfd8b89-plvtc -n clusters-jie-test
NAME          READY  STATUS       RESTARTS    AGE
router-9cfd8b89-plvtc  0/1   CrashLoopBackOff  11 (45s ago)  32m
jiezhao-mac:hypershift jiezhao$

Events:
 Type   Reason              Age          From        Message
 ----   ------              ----          ----        -------
 Normal  Scheduled            27m          default-scheduler Successfully assigned clusters-jie-test/router-9cfd8b89-plvtc to ip-10-0-42-36.us-east-2.compute.internal
 Normal  AddedInterface          27m          multus       Add eth0 [10.129.2.82/23] from ovn-kubernetes
 Normal  Pulling             27m          kubelet      Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d2acba15f69ea3648b3c789111db34ff06d9230a4371c5949ebe3c6218e6ea3"
 Normal  Pulled              27m          kubelet      Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d2acba15f69ea3648b3c789111db34ff06d9230a4371c5949ebe3c6218e6ea3" in 14.309s (14.309s including waiting)
 Normal  Created             26m (x3 over 27m)   kubelet      Created container private-router
 Normal  Started             26m (x3 over 27m)   kubelet      Started container private-router
 Warning BackOff             26m (x5 over 27m)   kubelet      Back-off restarting failed container private-router in pod router-9cfd8b89-plvtc_clusters-jie-test(e6cf40ad-32cd-438c-8298-62d565cf6c6a)
 Normal  Pulled              26m (x3 over 27m)   kubelet      Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d2acba15f69ea3648b3c789111db34ff06d9230a4371c5949ebe3c6218e6ea3" already present on machine
 Warning FailedToRetrieveImagePullSecret 2m38s (x131 over 27m) kubelet      Unable to retrieve some image pull secrets (router-dockercfg-q768b); attempting to pull the image may not succeed.
jiezhao-mac:hypershift jiezhao$

jiezhao-mac:hypershift jiezhao$ oc logs router-9cfd8b89-plvtc -n clusters-jie-test
[NOTICE]  (1) : haproxy version is 2.6.13-234aa6d
[NOTICE]  (1) : path to executable is /usr/sbin/haproxy
[ALERT]  (1) : config : [/usr/local/etc/haproxy/haproxy.cfg:52] : 'server ovnkube_sbdb/ovnkube_sbdb' : could not resolve address 'None'.
[ALERT]  (1) : config : Failed to initialize server(s) addr.
jiezhao-mac:hypershift jiezhao$

notes:
=====
not sure if it has the same root cause as https://issues.redhat.com/browse/OCPBUGS-24627

Please review the following PR: https://github.com/openshift/ibm-powervs-block-csi-driver-operator/pull/66

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

To bump some dependencies for CVE fixes, we added `replace` directives in the go.mod file. These dependencies have since moved way past the pinned version.
We should drop the replaces before we run into problems from having deps pinned to versions that are too old. For example, I've seen PRs with the following diff:

# golang.org/x/net v0.23.0 => golang.org/x/net v0.5.0

which is not really what we want.    
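
A sketch of dropping such a pin, assuming the golang.org/x/net module from the example above (repeat per pinned module, then re-resolve):

go mod edit -dropreplace golang.org/x/net
go get golang.org/x/net@latest
go mod tidy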

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    Some dependencies are not upgraded because they are pinned.

Expected results:

    

Additional info:

    

When we apply a machine config with additional SSH key info, this action only needs to uncordon the node. While the uncordon is happening, the condition Cordoned = True is reported, which confuses the user. Maybe we can refine this design to show the status of cordon/uncordon separately.

 

lastTransitionTime: '2023-11-28T16:53:58Z'
message: 'Action during previous iteration: (Un)Cordoned node. The node is reporting Unschedulable = false'
reason: UpdateCompleteCordoned
status: 'False'
type: Cordoned

Description of problem:

Cluster install failed on ibm cloud and machine-api-controllers stucks in CrashLoopBackOff 

Version-Release number of selected component (if applicable):

from 4.16.0-0.nightly-2024-02-02-224339

How reproducible:

Always

Steps to Reproduce:

    1. Install cluster on IBMCloud
    2.
    3.
    

Actual results:

Cluster install failed
$ oc get node                       
NAME                     STATUS   ROLES                  AGE     VERSION
maxu-16-gp2vp-master-0   Ready    control-plane,master   7h11m   v1.29.1+2f773e8
maxu-16-gp2vp-master-1   Ready    control-plane,master   7h11m   v1.29.1+2f773e8
maxu-16-gp2vp-master-2   Ready    control-plane,master   7h11m   v1.29.1+2f773e8

$ oc get machine -n openshift-machine-api           
NAME                           PHASE   TYPE   REGION   ZONE   AGE
maxu-16-gp2vp-master-0                                        7h15m
maxu-16-gp2vp-master-1                                        7h15m
maxu-16-gp2vp-master-2                                        7h15m
maxu-16-gp2vp-worker-1-xfvqq                                  7h5m
maxu-16-gp2vp-worker-2-5hn7c                                  7h5m
maxu-16-gp2vp-worker-3-z74z2                                  7h5m

openshift-machine-api                              machine-api-controllers-6cb7fcdcdb-k6sv2                     6/7     CrashLoopBackOff   92 (31s ago)     7h1m

$ oc logs -n openshift-machine-api  -c  machine-controller  machine-api-controllers-6cb7fcdcdb-k6sv2                          
I0204 10:53:34.336338       1 main.go:120] Watching machine-api objects only in namespace "openshift-machine-api" for reconciliation.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x285fe72]

goroutine 25 [running]:
k8s.io/klog/v2/textlogger.(*tlogger).Enabled(0x0?, 0x0?)
	/go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/k8s.io/klog/v2/textlogger/textlogger.go:81 +0x12
sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).Enabled(0xc000438100, 0x0?)
	/go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:114 +0x92
github.com/go-logr/logr.Logger.Info({{0x3232210?, 0xc000438100?}, 0x0?}, {0x2ec78f3, 0x17}, {0x0, 0x0, 0x0})
	/go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/github.com/go-logr/logr/logr.go:276 +0x72
sigs.k8s.io/controller-runtime/pkg/metrics/server.(*defaultServer).Start(0xc0003bd2c0, {0x322e350?, 0xc00058a140})
	/go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/sigs.k8s.io/controller-runtime/pkg/metrics/server/server.go:185 +0x75
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1(0xc0002c4540)
	/go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/sigs.k8s.io/controller-runtime/pkg/manager/runnable_group.go:223 +0xc8
created by sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile in goroutine 24
	/go/src/github.com/openshift/machine-api-provider-ibmcloud/vendor/sigs.k8s.io/controller-runtime/pkg/manager/runnable_group.go:207 +0x19d

Expected results:

 Cluster install succeeds

Additional info:

may relate to this pr https://github.com/openshift/machine-api-provider-ibmcloud/pull/34

Our ocp Dockerfile currently uses the build target. This however now also builds the react frontend and requires npm in turn. We build the frontend during mirroring already.

Switch out Dockerfile to use the common-build target. This should enable the bump to 0.27, tracked through this issue as well.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-38412. The following is the description of the original issue:

This is a clone of issue OCPBUGS-32773. The following is the description of the original issue:

Description of problem:

In the OpenShift WebConsole, when using the Instantiate Template screen, the values entered into the form are automatically cleared.

This issue occurs for users with developer roles who do not have administrator privileges, but does not occur for users with the cluster-admin cluster role. 


Additionally, using the developer tools of the web browser, I observed the following console logs when the values were cleared:


https://console-openshift-console.apps.mmatsuta-blue.apac.aws.cee.support/api/prometheus/api/v1/rules 403 (Forbidden)
https://console-openshift-console.apps.mmatsuta-blue.apac.aws.cee.support/api/alertmanager/api/v2/silences 403 (Forbidden)


It appears that a script attempting to fetch information periodically from PrometheusRule and Alertmanager's silences encounters a 403 error due to insufficient permissions, which causes the script to halt and the values in the form to be reset and cleared.


This bug prevents users from successfully creating instances from templates in the WebConsole.

Version-Release number of selected component (if applicable):

4.15 4.14 

How reproducible:

YES

Steps to Reproduce:

1. Log in with a non-administrator account.
2. Select a template from the developer catalog and click on Instantiate Template.
3. Enter values into the initially empty form.
4. Wait for several seconds, and the entered values will disappear.

Actual results:

Entered values disappear

Expected results:

Entered values remain

Additional info:

I could not find the appropriate component to report this issue. I reluctantly chose Dev Console, but please adjust it to the correct component.

Description of problem:

CAPI manifests have the TechPreviewNoUpgrade annotation but are missing the CustomNoUpgrade annotation    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

Results of -hypershift-aws-e2e-external CI jobs do not contain an obvious reason why a test failed. For example, this TestCreateCluster is listed as failed, but all failures in TestCreateCluster look like errors dumping the cluster after failure.

It should show that "storage operator did not become Available=true". Or even tell that "pod cluster-storage-operator-6f6d69bf89-fx2d2 in the hosted control plane XYZ is in CrashloopBackoff"

The PR under test had a simple typo leading to crashloop and it should be more obvious what went wrong.

Version-Release number of selected component (if applicable):

4.15.0-0.ci.test-2023-10-03-040803

 

Description of the problem:

In PSI, BE master ~ 2.30 - Massive amount of  the following message "Cluster was updated with api-vip <IP ADDRESS>, ingress-vip <IP ADDRESS>" in cluster events.

This message repeats itself every minute x 5 (I guess related to the number of hosts?)

The installation was started, but aborted due to network connection issues.

I've tried to reproduce in staging, but couldn't.

How reproducible:

Still checking

Steps to reproduce:

1. 

2.

3.

Actual results:

 

Expected results:
This message should not be shown more than once

Description of problem:

    A recent [PR](https://github.com/openshift/hypershift/commit/c030ab66d897815e16d15c987456deab8d0d6da0) updated the kube-apiserver service port to `6443`. That change causes a small outage when upgrading from a 4.13 cluster in IBMCloud. We need to keep the service port as 2040 for IBM Cloud Provider to avoid the outage.

Version-Release number of selected component (if applicable):

    

How reproducible:

 

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

We've moved to using BigQuery for this; stop pushing and free up some Loki cycles.

This is done with a command in origin: https://github.com/openshift/origin/blob/24e011ba3adf2767b88619351895bb878de3d62a/pkg/cmd/openshift-tests/dev/dev.go#L211

So all this code and probably some libraries could be removed.

But first, remove the invocation of this command in the release repo. (upload-intervals)

Description of problem:

The pod of catalogsource without registryPoll wasn't recreated during the node failure

    jiazha-mac:~ jiazha$ oc get pods 
NAME                                    READY   STATUS        RESTARTS       AGE
certified-operators-rcs64               1/1     Running       0              123m
community-operators-8mxh6               1/1     Running       0              123m
marketplace-operator-769fbb9898-czsfn   1/1     Running       4 (117m ago)   136m
qe-app-registry-5jxlx                   1/1     Running       0              106m
redhat-marketplace-4bgv9                1/1     Running       0              123m
redhat-operators-ww5tb                  1/1     Running       0              123m
test-2xvt8                              1/1     Terminating   0              12m

jiazha-mac:~ jiazha$ oc get pods test-2xvt8 -o wide 
NAME         READY   STATUS    RESTARTS   AGE    IP            NODE                                          NOMINATED NODE   READINESS GATES
test-2xvt8   1/1     Running   0          7m6s   10.129.2.26   qe-daily-417-0708-cv2p6-worker-westus-gcrrc   <none>           <none>

jiazha-mac:~ jiazha$ oc get node qe-daily-417-0708-cv2p6-worker-westus-gcrrc
NAME                                          STATUS     ROLES    AGE    VERSION
qe-daily-417-0708-cv2p6-worker-westus-gcrrc   NotReady   worker   116m   v1.30.2+421e90e

Version-Release number of selected component (if applicable):

     Cluster version is 4.17.0-0.nightly-2024-07-07-131215

How reproducible:

    always

Steps to Reproduce:

    1. create a catalogsource without the registryPoll configuration.

jiazha-mac:~ jiazha$ cat cs-32183.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: test
  namespace: openshift-marketplace
spec:
  displayName: Test Operators
  image: registry.redhat.io/redhat/redhat-operator-index:v4.16
  publisher: OpenShift QE
  sourceType: grpc

jiazha-mac:~ jiazha$ oc create -f cs-32183.yaml 
catalogsource.operators.coreos.com/test created

jiazha-mac:~ jiazha$ oc get pods test-2xvt8 -o wide 
NAME         READY   STATUS    RESTARTS   AGE     IP            NODE                                          NOMINATED NODE   READINESS GATES
test-2xvt8   1/1     Running   0          3m18s   10.129.2.26   qe-daily-417-0708-cv2p6-worker-westus-gcrrc   <none>           <none>


     2. Stop the node 
jiazha-mac:~ jiazha$ oc debug node/qe-daily-417-0708-cv2p6-worker-westus-gcrrc 
Temporary namespace openshift-debug-q4d5k is created for debugging node...
Starting pod/qe-daily-417-0708-cv2p6-worker-westus-gcrrc-debug-v665f ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.5
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host
sh-5.1# systemctl stop kubelet; sleep 600; systemctl start kubelet


Removing debug pod ...
Temporary namespace openshift-debug-q4d5k was removed.

jiazha-mac:~ jiazha$ oc get node qe-daily-417-0708-cv2p6-worker-westus-gcrrc
NAME                                          STATUS     ROLES    AGE    VERSION
qe-daily-417-0708-cv2p6-worker-westus-gcrrc   NotReady   worker   115m   v1.30.2+421e90e


    3. check whether this catalogsource's pod is recreated.

    

Actual results:

No new pod was generated. 

    jiazha-mac:~ jiazha$ oc get pods 
NAME                                    READY   STATUS        RESTARTS       AGE
certified-operators-rcs64               1/1     Running       0              123m
community-operators-8mxh6               1/1     Running       0              123m
marketplace-operator-769fbb9898-czsfn   1/1     Running       4 (117m ago)   136m
qe-app-registry-5jxlx                   1/1     Running       0              106m
redhat-marketplace-4bgv9                1/1     Running       0              123m
redhat-operators-ww5tb                  1/1     Running       0              123m
test-2xvt8                              1/1     Terminating   0              12m

once the node recovers, a new pod is generated.


jiazha-mac:~ jiazha$ oc get node qe-daily-417-0708-cv2p6-worker-westus-gcrrc
NAME                                          STATUS   ROLES    AGE    VERSION
qe-daily-417-0708-cv2p6-worker-westus-gcrrc   Ready    worker   127m   v1.30.2+421e90e

jiazha-mac:~ jiazha$ oc get pods 
NAME                                    READY   STATUS    RESTARTS       AGE
certified-operators-rcs64               1/1     Running   0              127m
community-operators-8mxh6               1/1     Running   0              127m
marketplace-operator-769fbb9898-czsfn   1/1     Running   4 (121m ago)   140m
qe-app-registry-5jxlx                   1/1     Running   0              109m
redhat-marketplace-4bgv9                1/1     Running   0              127m
redhat-operators-ww5tb                  1/1     Running   0              127m
test-wqxvg                              1/1     Running   0              27s 

Expected results:

During the node failure, a new catalog source pod should be generated.

    

Additional info:

Hi Team,

After investigating the source code of operator-lifecycle-manager some more, we figured out the reason.

  • The commit [1] tries to fix this issue by adding a "force delete dead pod" step to the ensurePod() function.
  • ensurePod() is called by EnsureRegistryServer() [2].
  • However, syncRegistryServer() returns immediately without calling EnsureRegistryServer() if there is no registryPoll in the catalog [3].
  • There is no registryPoll defined in the catalogsources that were generated when we built the catalog image following Doc [4]:
    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: redhat-operator-index
      namespace: openshift-marketplace
    spec:
      image: quay-server.bastion.tokyo.com:5000/redhat/redhat-operator-index-logging:logging-vstable-5.8-v5.8.5
      sourceType: grpc
    
  • So the catalog pod created by the catalogsource cannot be recovered.

And we verified that the catalog pod can be recreated on another node if we add the registryPoll configuration to the catalogsource as follows (the lines marked with <==).

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operator-index
  namespace: openshift-marketplace
spec:
  image: quay-server.bastion.tokyo.com:5000/redhat/redhat-operator-index-logging:logging-vstable-5.8-v5.8.5
  sourceType: grpc
  updateStrategy:   <==
    registryPoll:   <==
      interval: 10m <==
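
An equivalent way to apply the same workaround to an existing catalogsource, sketched with the object name and namespace from the example above:

oc -n openshift-marketplace patch catalogsource redhat-operator-index \
  --type merge -p '{"spec":{"updateStrategy":{"registryPoll":{"interval":"10m"}}}}'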

The registryPoll is NOT mandatory for a catalogsource.
So the commit [1], which tries to fix the issue in EnsureRegistryServer(), does not fix it properly.

[1] https://github.com/operator-framework/operator-lifecycle-manager/pull/3201/files
[2] https://github.com/joelanford/operator-lifecycle-manager/blob/82f499723e52e85f28653af0610b6e7feff096cf/pkg/controller/registry/reconciler/grpc.go#L290
[3] https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/catalog/operator.go#L1009
[4] https://docs.openshift.com/container-platform/4.16/operators/admin/olm-managing-custom-catalogs.html

These tests look to have been mistakenly deleted in 4.14 during the big monitortest refactor. We could use them right now to identify gcp jobs with spatter disruption.

[sig-network] there should be nearly zero single second disruptions for _
[sig-network] there should be reasonably few single second disruptions for _

Find out what happened and get them restored. Code is there but it looks like there are assumptions about extracting the backend name that may have been broken somewhere.

Please review the following PR: https://github.com/openshift/csi-external-snapshotter/pull/135

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-34631. The following is the description of the original issue:

Description of problem:

  ManagedFields in the YAML editor are not collapsed by default, which is incorrect. Since OCP 4.7, the field should be collapsed by default.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

1. Navigate to Workloads > Pods and click on any pod to display the details page.
2. Click on the YAML tab and scroll down to see the managedFields
3.
    

Actual results:

The field is not collapsed

Expected results:

The field should be collapsed by default  

Additional info:

    

This is a clone of issue OCPBUGS-35547. The following is the description of the original issue:

Description of problem:

When creating an IPI cluster, the following unexpected traceback occasionally appears in the terminal. It doesn't cause any failure and the install eventually succeeds.

# ./openshift-install create cluster --dir cluster --log-level debug
...
INFO Importing OVA sgao-nest-ktqck-rhcos-generated-region-generated-zone into failure domain generated-failure-domain.
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
	>  goroutine 131 [running]:
	>  runtime/debug.Stack()
	>  	/usr/lib/golang/src/runtime/debug/stack.go:24 +0x5e
	>  sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
	>  	/go/src/github.com/openshift/installer/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:60 +0xcd
	>  sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).Error(0xc000e37200, {0x26fd23c0, 0xc0016b4270}, {0x77d22d3, 0x3d}, {0x0, 0x0, 0x0})
	>  	/go/src/github.com/openshift/installer/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:139 +0x5d
	>  github.com/go-logr/logr.Logger.Error({{0x270398d8?, 0xc000e37200?}, 0x0?}, {0x26fd23c0, 0xc0016b4270}, {0x77d22d3, 0x3d}, {0x0, 0x0, 0x0})
	>  	/go/src/github.com/openshift/installer/vendor/github.com/go-logr/logr/logr.go:301 +0xda
	>  sigs.k8s.io/cluster-api-provider-vsphere/pkg/session.newClient.func1({0x26fd6c40?, 0xc0021f0160?})
	>  	/go/src/github.com/openshift/installer/vendor/sigs.k8s.io/cluster-api-provider-vsphere/pkg/session/session.go:265 +0xda
	>  sigs.k8s.io/cluster-api-provider-vsphere/pkg/session.newClient.KeepAliveHandler.func2()
	>  	/go/src/github.com/openshift/installer/vendor/github.com/vmware/govmomi/session/keep_alive.go:36 +0x22
	>  github.com/vmware/govmomi/session/keepalive.(*handler).Start.func1()
	>  	/go/src/github.com/openshift/installer/vendor/github.com/vmware/govmomi/session/keepalive/handler.go:124 +0x98
	>  created by github.com/vmware/govmomi/session/keepalive.(*handler).Start in goroutine 1
	>  	/go/src/github.com/openshift/installer/vendor/github.com/vmware/govmomi/session/keepalive/handler.go:116 +0x116

 

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-06-13-213831

How reproducible:

sometimes

Steps to Reproduce:

1. Create IPI cluster on vSphere multiple times
2. Check the output in the terminal

Actual results:

unexpected log traceback appears in terminal

Expected results:

unexpected log traceback should not appear in terminal

Additional info:

 

Please review the following PR: https://github.com/openshift/k8s-prometheus-adapter/pull/98

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Work to setup the endpoint will be handled in another card.

For this one we want to setup a new disruption backend similar to the cluster-network-liveness-probe. We'll poll, submit request ids and possibly job identifiers.

This approach gets us free disruption intervals in BigQuery, per-job charting, and graphing capabilities from the disruption dashboard.

Description of problem

The Route API documentation states that the default value for the spec.tls.insecureEdgeTerminationPolicy field is "Allow". However, the observable default behavior is that of "None".

Version-Release number of selected component (if applicable)

OpenShift 3.11 and earlier and OpenShift 4.1 through 4.16.

How reproducible

100%.

Steps to Reproduce

1. Check the documentation: oc explain routes.spec.tls.insecureEdgeTerminationPolicy
2. Create an example application and edge-terminated route without specifying insecureEdgeTerminationPolicy, and try to connect to the route using HTTP:

oc adm new-project hello-openshift
oc -n hello-openshift create -f https://raw.githubusercontent.com/openshift/origin/56867df5e362aab0d2d8fa8c225e6761c7469781/examples/hello-openshift/hello-pod.json
oc -n hello-openshift expose pod hello-openshift
oc -n hello-openshift create route edge --service=hello-openshift
curl -k https://hello-openshift-hello-openshift.apps.<cluster domain>
curl -I http://hello-openshift-hello-openshift.apps.<cluster domain>

Actual results

The documentation states that "Allow" is the default:

% oc explain routes.spec.tls.insecureEdgeTerminationPolicy                        
KIND:     Route
VERSION:  route.openshift.io/v1

FIELD:    insecureEdgeTerminationPolicy <string>

DESCRIPTION:
     insecureEdgeTerminationPolicy indicates the desired behavior for insecure
     connections to a route. While each router may make its own decisions on
     which ports to expose, this is normally port 80.

     * Allow - traffic is sent to the server on the insecure port
     (edge/reencrypt terminations only) (default). * None - no traffic is
     allowed on the insecure port. * Redirect - clients are redirected to the
     secure port.

However, in practice, the default seems to be "None":

% oc adm new-project hello-openshift
Created project hello-openshift
% oc -n hello-openshift create -f https://raw.githubusercontent.com/openshift/origin/56867df5e362aab0d2d8fa8c225e6761c7469781/examples/hello-openshift/hello-pod.json
pod/hello-openshift created
% oc -n hello-openshift expose pod hello-openshift
service/hello-openshift exposed
% oc -n hello-openshift create route edge --service=hello-openshift
route.route.openshift.io/hello-openshift created
% oc -n hello-openshift get routes/hello-openshift -o yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    openshift.io/host.generated: "true"
  creationTimestamp: "2024-04-02T22:59:32Z"
  labels:
    name: hello-openshift
  name: hello-openshift
  namespace: hello-openshift
  resourceVersion: "27147"
  uid: 50029f66-a089-4ec0-be04-91f176883e2b
spec:
  host: hello-openshift-hello-openshift.apps.8fbd3fa1605eb7f8632a.hypershift.aws-2.ci.openshift.org
  tls:
    termination: edge
  to:
    kind: Service
    name: hello-openshift
    weight: 100
  wildcardPolicy: None
status:
  ingress:
  - conditions:
    - lastTransitionTime: "2024-04-02T22:59:32Z"
      status: "True"
      type: Admitted
    host: hello-openshift-hello-openshift.apps.8fbd3fa1605eb7f8632a.hypershift.aws-2.ci.openshift.org
    routerCanonicalHostname: router-default.apps.8fbd3fa1605eb7f8632a.hypershift.aws-2.ci.openshift.org
    routerName: default
    wildcardPolicy: None
  - conditions:
    - lastTransitionTime: "2024-04-02T22:59:32Z"
      status: "True"
      type: Admitted
    host: hello-openshift-hello-openshift.apps.8fbd3fa1605eb7f8632a.hypershift.aws-2.ci.openshift.org
    routerCanonicalHostname: router-custom.custom.8fbd3fa1605eb7f8632a.hypershift.aws-2.ci.openshift.org
    routerName: custom
    wildcardPolicy: None
% curl -k https://hello-openshift-hello-openshift.apps.8fbd3fa1605eb7f8632a.hypershift.aws-2.ci.openshift.org
Hello OpenShift!
% curl -I http://hello-openshift-hello-openshift.apps.8fbd3fa1605eb7f8632a.hypershift.aws-2.ci.openshift.org 
HTTP/1.0 503 Service Unavailable
pragma: no-cache
cache-control: private, max-age=0, no-cache, no-store
content-type: text/html

Expected results

Given the API documentation, I would maybe expect to see insecureEdgeTerminationPolicy: Allow in the route definition, and I would definitely expect the curl http:// command to succeed.

Alternatively, I would expect the API documentation to state that the default for insecureEdgeTerminationPolicy is "None", based on the observed behavior.
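
For comparison, explicitly setting the policy does open the insecure port, which suggests only the documented default is wrong; a quick sketch reusing the route from the reproduction above:

oc -n hello-openshift patch route hello-openshift --type merge \
  -p '{"spec":{"tls":{"insecureEdgeTerminationPolicy":"Allow"}}}'
# The plain-HTTP request should now succeed instead of returning 503.
curl -I http://hello-openshift-hello-openshift.apps.<cluster domain>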

Additional info

The current "(default)" text was added in https://github.com/openshift/origin/pull/10983/commits/dc1aecd4bcdae7525536180bab2a0a0083aaa0f4.

This is a clone of issue OCPBUGS-38692. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38114. The following is the description of the original issue:

Description of problem:

Starting from version 4.16, the installer no longer supports creating a cluster in AWS with the OPENSHIFT_INSTALL_AWS_PUBLIC_ONLY=true flag enabled.

Version-Release number of selected component (if applicable):

    

How reproducible:

The installation procedure fails consistently when using a predefined VPC

Steps to Reproduce:

    1. Follow the procedure at https://docs.openshift.com/container-platform/4.16/installing/installing_aws/ipi/installing-aws-vpc.html#installation-aws-config-yaml_installing-aws-vpc to prepare an install-config.yaml in order to install a cluster with a custom VPC
    2. Run `openshift-install create cluster ...'
    3. The procedure fails: `failed to create load balancer`
    

Actual results:

The installation procedure fails.

Expected results:

An OCP cluster to be provisioned in AWS, with public subnets only.    

Additional info:

    

Description of problem:


The customer has a custom apiserver certificate.

This error can be found while trying to uninstall any operator via the console:

openshift-console/pods/console-56494b7977-d7r76/console/console/logs/current.log:

2023-10-24T14:13:21.797447921+07:00 E1024 07:13:21.797400 1 operands_handler.go:67] Failed to get new client for listing operands: Get "https://api.<cluster>.<domain>:6443/api?timeout=32s": x509: certificate signed by unknown authority

When trying the same request from the console pod, we see no issue.

We see the root CA that signs the apiserver certificate, and this CA is trusted in the pod.

It seems the code that provokes this issue is:

https://github.com/openshift/console/blob/master/pkg/server/operands_handler.go#L62-L70

Please review the following PR: https://github.com/openshift/ovn-kubernetes/pull/1978

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Reviewing 4.15 install failures (install should succeed: overall), there are a number of variants impacted by recent install failures.

search.ci: Cluster operator console is not available

Jobs like periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-sdn-serial show failures that appear to start with 4.15.0-0.nightly-2023-12-07-225558; these runs have installation failures due to the console-operator:

ConsoleOperator reconciliation failed: Operation cannot be fulfilled on consoles.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again

 

 

4.15.0-0.nightly-2023-12-07-225558 contains console-operator/pull/814; noting it in case it is related.

 

 

Version-Release number of selected component (if applicable):

 4.15   

How reproducible:

    

Steps to Reproduce:

    1. Review link to install failures above
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-sdn
periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade
periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade

Description of problem:

Port security has been overridden even though it is set to false in the worker MachineSet configuration.

Version-Release number of selected component (if applicable):

    OCP=4.14.14
    RHOSP=17.1

How reproducible:

    NFV Perf lab 
    ShiftonStack Deployment mode = IPI

Steps to Reproduce:

    1. Network configuration resources for the worker node
$ oc get machinesets.machine.openshift.io -n openshift-machine-api | grep worker
5kqfbl3y0rhocpnfv-wj2jj-worker-0   1         1         1       1           5d23h
$ oc describe machinesets.machine.openshift.io -n openshift-machine-api 5kqfbl3y0rhocpnfv-wj2jj-worker-0
Name:         5kqfbl3y0rhocpnfv-wj2jj-worker-0
Namespace:    openshift-machine-api
Labels:       machine.openshift.io/cluster-api-cluster=5kqfbl3y0rhocpnfv-wj2jj
              machine.openshift.io/cluster-api-machine-role=worker
              machine.openshift.io/cluster-api-machine-type=worker
Annotations:  machine.openshift.io/memoryMb: 47104
              machine.openshift.io/vCPU: 26
API Version:  machine.openshift.io/v1beta1
Kind:         MachineSet
Metadata:
  Creation Timestamp:  2024-03-07T05:24:07Z
  Generation:          3
  Resource Version:    226098
  UID:                 8cb06872-9b62-4c2c-b66b-bf91a03efa2d
Spec:
  Replicas:  1
  Selector:
    Match Labels:
      machine.openshift.io/cluster-api-cluster:     5kqfbl3y0rhocpnfv-wj2jj
      machine.openshift.io/cluster-api-machineset:  5kqfbl3y0rhocpnfv-wj2jj-worker-0
  Template:
    Metadata:
      Labels:
        machine.openshift.io/cluster-api-cluster:       5kqfbl3y0rhocpnfv-wj2jj
        machine.openshift.io/cluster-api-machine-role:  worker
        machine.openshift.io/cluster-api-machine-type:  worker
        machine.openshift.io/cluster-api-machineset:    5kqfbl3y0rhocpnfv-wj2jj-worker-0
    Spec:
      Lifecycle Hooks:
      Metadata:
      Provider Spec:
        Value:
          API Version:        machine.openshift.io/v1alpha1
          Availability Zone:  worker
          Cloud Name:         openstack
          Clouds Secret:
            Name:        openstack-cloud-credentials
            Namespace:   openshift-machine-api
          Config Drive:  true
          Flavor:        sos-worker
          Image:         5kqfbl3y0rhocpnfv-wj2jj-rhcos
          Kind:          OpenstackProviderSpec
          Metadata:
          Networks:
            Filter:
            Subnets:
              Filter:
                Id:  7fb7d2d6-325d-49e1-b3f8-b4dbb1197e34
          Ports:
            Fixed I Ps:
              Subnet ID:    1a892dcf-bf93-46ef-bf37-bda6cf923471
            Name Suffix:    provider3p1
            Network ID:     50a557b5-34c2-4c47-b539-963688f7167c
            Port Security:  false
            Tags:
              sriov
            Trunk:      false
            Vnic Type:  direct
            Fixed I Ps:
              Subnet ID:    76430b9e-302f-428d-916a-77482d9cfb19
            Name Suffix:    provider4p1
            Network ID:     e2106b16-8f83-4e2e-bdbd-20e2c12ec279
            Port Security:  false
            Tags:
              sriov
            Trunk:      false
            Vnic Type:  direct
            Fixed I Ps:
              Subnet ID:    1a892dcf-bf93-46ef-bf37-bda6cf923471
            Name Suffix:    provider3p2
            Network ID:     50a557b5-34c2-4c47-b539-963688f7167c
            Port Security:  false
            Tags:
              sriov
            Trunk:      false
            Vnic Type:  direct
            Fixed I Ps:
              Subnet ID:    76430b9e-302f-428d-916a-77482d9cfb19
            Name Suffix:    provider4p2
            Network ID:     e2106b16-8f83-4e2e-bdbd-20e2c12ec279
            Port Security:  false
            Tags:
              sriov
            Trunk:      false
            Vnic Type:  direct
            Fixed I Ps:
              Subnet ID:    1a892dcf-bf93-46ef-bf37-bda6cf923471
            Name Suffix:    provider3p3
            Network ID:     50a557b5-34c2-4c47-b539-963688f7167c
            Port Security:  false
            Tags:
              sriov
            Trunk:      false
            Vnic Type:  direct
            Fixed I Ps:
              Subnet ID:    76430b9e-302f-428d-916a-77482d9cfb19
            Name Suffix:    provider4p3
            Network ID:     e2106b16-8f83-4e2e-bdbd-20e2c12ec279
            Port Security:  false
            Tags:
              sriov
            Trunk:      false
            Vnic Type:  direct
            Fixed I Ps:
              Subnet ID:    1a892dcf-bf93-46ef-bf37-bda6cf923471
            Name Suffix:    provider3p4
            Network ID:     50a557b5-34c2-4c47-b539-963688f7167c
            Port Security:  false
            Tags:
              sriov
            Trunk:      false
            Vnic Type:  direct
            Fixed I Ps:
              Subnet ID:    76430b9e-302f-428d-916a-77482d9cfb19
            Name Suffix:    provider4p4
            Network ID:     e2106b16-8f83-4e2e-bdbd-20e2c12ec279
            Port Security:  false
            Tags:
              sriov
            Trunk:         false
            Vnic Type:     direct
          Primary Subnet:  7fb7d2d6-325d-49e1-b3f8-b4dbb1197e34
          Security Groups:
            Filter:
            Name:             5kqfbl3y0rhocpnfv-wj2jj-worker
          Server Group Name:  5kqfbl3y0rhocpnfv-wj2jj-worker-worker
          Server Metadata:
            Name:                  5kqfbl3y0rhocpnfv-wj2jj-worker
            Openshift Cluster ID:  5kqfbl3y0rhocpnfv-wj2jj
          Tags:
            openshiftClusterID=5kqfbl3y0rhocpnfv-wj2jj
          Trunk:  true
          User Data Secret:
            Name:  worker-user-data
Status:
  Available Replicas:      1
  Fully Labeled Replicas:  1
  Observed Generation:     3
  Ready Replicas:          1
  Replicas:                1
Events:                    <none>
$ oc get nodes
NAME                                     STATUS   ROLES                  AGE     VERSION
5kqfbl3y0rhocpnfv-wj2jj-master-0         Ready    control-plane,master   5d23h   v1.27.10+28ed2d7
5kqfbl3y0rhocpnfv-wj2jj-master-1         Ready    control-plane,master   5d23h   v1.27.10+28ed2d7
5kqfbl3y0rhocpnfv-wj2jj-master-2         Ready    control-plane,master   5d23h   v1.27.10+28ed2d7
5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr   Ready    worker                 5d22h   v1.27.10+28ed2d7
$ oc describe nodes 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr
Name:               5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=sos-worker
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=regionOne
                    failure-domain.beta.kubernetes.io/zone=worker
                    feature.node.kubernetes.io/network-sriov.capable=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=sos-worker
                    node.openshift.io/os_id=rhcos
                    topology.cinder.csi.openstack.org/zone=worker
                    topology.kubernetes.io/region=regionOne
                    topology.kubernetes.io/zone=worker
Annotations:        alpha.kubernetes.io/provided-node-ip: 192.168.0.91
                    csi.volume.kubernetes.io/nodeid: {"cinder.csi.openstack.org":"aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879"}
                    machine.openshift.io/machine: openshift-machine-api/5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr
                    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-8c613531f97974a9561f8b0ada0c2cd0
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-8c613531f97974a9561f8b0ada0c2cd0
                    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-8c613531f97974a9561f8b0ada0c2cd0
                    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-8c613531f97974a9561f8b0ada0c2cd0
                    machineconfiguration.openshift.io/lastSyncedControllerConfigResourceVersion: 505735
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done
                    sriovnetwork.openshift.io/state: Idle
                    tuned.openshift.io/bootcmdline:
                      skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=10-25 tuned.non_isolcpus=000003ff systemd.cpu_affinity=0,1,2,3...
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 07 Mar 2024 06:09:31 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr
  AcquireTime:     <unset>
  RenewTime:       Wed, 13 Mar 2024 04:55:28 +0000
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 13 Mar 2024 04:55:33 +0000   Thu, 07 Mar 2024 15:18:00 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 13 Mar 2024 04:55:33 +0000   Thu, 07 Mar 2024 15:18:00 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 13 Mar 2024 04:55:33 +0000   Thu, 07 Mar 2024 15:18:00 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 13 Mar 2024 04:55:33 +0000   Thu, 07 Mar 2024 15:18:05 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.0.91
  Hostname:    5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr
Capacity:
  cpu:                          26
  ephemeral-storage:            104266732Ki
  hugepages-1Gi:                20Gi
  hugepages-2Mi:                0
  memory:                       47264764Ki
  openshift.io/intl_provider3:  4
  openshift.io/intl_provider4:  4
  pods:                         250
Allocatable:
  cpu:                          16
  ephemeral-storage:            95018478229
  hugepages-1Gi:                20Gi
  hugepages-2Mi:                0
  memory:                       25166844Ki
  openshift.io/intl_provider3:  4
  openshift.io/intl_provider4:  4
  pods:                         250
System Info:
  Machine ID:                             aa5cfdcbeb4646d88ac25bb6f0c0d879
  System UUID:                            aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879
  Boot ID:                                77573755-0d27-4717-80fe-4579692d9c2c
  Kernel Version:                         5.14.0-284.54.1.el9_2.x86_64
  OS Image:                               Red Hat Enterprise Linux CoreOS 414.92.202402201520-0 (Plow)
  Operating System:                       linux
  Architecture:                           amd64
  Container Runtime Version:              cri-o://1.27.3-6.rhaos4.14.git7eb2281.el9
  Kubelet Version:                        v1.27.10+28ed2d7
  Kube-Proxy Version:                     v1.27.10+28ed2d7
ProviderID:                               openstack:///aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879
Non-terminated Pods:                      (19 in total)
  Namespace                               Name                                                 CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                               ----                                                 ------------  ----------  ---------------  -------------  ---
  crucible-rickshaw                       testpmd-host-device-e810-sriov                       10 (62%)      10 (62%)    10000Mi (40%)    10000Mi (40%)  3d13h
  openshift-cluster-csi-drivers           openstack-cinder-csi-driver-node-hnv49               30m (0%)      0 (0%)      150Mi (0%)       0 (0%)         5d22h
  openshift-cluster-node-tuning-operator  tuned-fcjfp                                          10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         5d22h
  openshift-dns                           dns-default-v7s59                                    60m (0%)      0 (0%)      110Mi (0%)       0 (0%)         5d22h
  openshift-dns                           node-resolver-gkz8b                                  5m (0%)       0 (0%)      21Mi (0%)        0 (0%)         5d22h
  openshift-image-registry                node-ca-p5dn5                                        10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         5d22h
  openshift-ingress-canary                ingress-canary-fk59t                                 10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         5d22h
  openshift-machine-config-operator       machine-config-daemon-9qw8z                          40m (0%)      0 (0%)      100Mi (0%)       0 (0%)         5d22h
  openshift-monitoring                    node-exporter-czcmj                                  9m (0%)       0 (0%)      47Mi (0%)        0 (0%)         5d22h
  openshift-monitoring                    prometheus-adapter-7696787779-vj5wk                  1m (0%)       0 (0%)      40Mi (0%)        0 (0%)         5d4h
  openshift-multus                        multus-additional-cni-plugins-l7rpv                  10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         5d22h
  openshift-multus                        multus-nxr6k                                         10m (0%)      0 (0%)      65Mi (0%)        0 (0%)         5d22h
  openshift-multus                        network-metrics-daemon-tb7sq                         20m (0%)      0 (0%)      120Mi (0%)       0 (0%)         5d22h
  openshift-network-diagnostics           network-check-target-pqtp9                           10m (0%)      0 (0%)      15Mi (0%)        0 (0%)         5d22h
  openshift-openstack-infra               coredns-5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr       200m (1%)     0 (0%)      400Mi (1%)       0 (0%)         5d22h
  openshift-openstack-infra               keepalived-5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr    200m (1%)     0 (0%)      400Mi (1%)       0 (0%)         5d22h
  openshift-sdn                           sdn-9mdnb                                            110m (0%)     0 (0%)      220Mi (0%)       0 (0%)         5d22h
  openshift-sriov-network-operator        sriov-device-plugin-tr68w                            10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         5d13h
  openshift-sriov-network-operator        sriov-network-config-daemon-dtf95                    100m (0%)     0 (0%)      100Mi (0%)       0 (0%)         5d22h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                     Requests       Limits
  --------                     --------       ------
  cpu                          10845m (67%)   10 (62%)
  memory                       11928Mi (48%)  10000Mi (40%)
  ephemeral-storage            0 (0%)         0 (0%)
  hugepages-1Gi                8Gi (40%)      8Gi (40%)
  hugepages-2Mi                0 (0%)         0 (0%)
  openshift.io/intl_provider3  4              4
  openshift.io/intl_provider4  4              4
Events:                        <none>

    2. OpenStack Network resource for Worker node
$ openstack server list --all --fit-width
+--------------------------------------+----------------------------------------+--------+----------------------------------------------------------------------------------------------------------------------+-------------------------------+------------+
| ID                                   | Name                                   | Status | Networks                                                                                                             | Image                         | Flavor     |
+--------------------------------------+----------------------------------------+--------+----------------------------------------------------------------------------------------------------------------------+-------------------------------+------------+
| aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr | ACTIVE | management=192.168.0.91; provider-3=192.168.177.197, 192.168.177.59, 192.168.177.66, 192.168.177.83;                 | 5kqfbl3y0rhocpnfv-wj2jj-rhcos | sos-worker |
|                                      |                                        |        | provider-4=192.168.178.108, 192.168.178.121, 192.168.178.144, 192.168.178.18                                         |                               |            |
| 1a24baf3-acde-49a0-ab8e-4f4afcc9d3cc | 5kqfbl3y0rhocpnfv-wj2jj-master-2       | ACTIVE | management=192.168.0.62                                                                                              | 5kqfbl3y0rhocpnfv-wj2jj-rhcos | sos-master |
| 3e545ab5-6e28-4189-8d94-9272dfa1cd05 | 5kqfbl3y0rhocpnfv-wj2jj-master-1       | ACTIVE | management=192.168.0.78                                                                                              | 5kqfbl3y0rhocpnfv-wj2jj-rhcos | sos-master |
| 97e5c382-0fb0-4a70-b58e-0469d3869a4e | 5kqfbl3y0rhocpnfv-wj2jj-master-0       | ACTIVE | management=192.168.0.93                                                                                              | 5kqfbl3y0rhocpnfv-wj2jj-rhcos | sos-master |
+--------------------------------------+----------------------------------------+--------+----------------------------------------------------------------------------------------------------------------------+-------------------------------+------------+
$ openstack port list --server aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879
+--------------------------------------+----------------------------------------------------+-------------------+--------------------------------------------------------------------------------+--------+
| ID                                   | Name                                               | MAC Address       | Fixed IP Addresses                                                             | Status |
+--------------------------------------+----------------------------------------------------+-------------------+--------------------------------------------------------------------------------+--------+
| 0a562c29-4ddc-41c4-82e8-13934d3ee273 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-0           | fa:16:3e:16:9a:c3 | ip_address='192.168.0.91', subnet_id='7fb7d2d6-325d-49e1-b3f8-b4dbb1197e34'    | ACTIVE |
| 0c1814db-cd4f-4f6a-a0c6-4f8e569b6767 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider4p4 | fa:16:3e:15:88:d7 | ip_address='192.168.178.108', subnet_id='76430b9e-302f-428d-916a-77482d9cfb19' | ACTIVE |
| 1778cb62-5fbf-42be-8847-53a7b092bdf5 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider3p2 | fa:16:3e:2a:64:e4 | ip_address='192.168.177.197', subnet_id='1a892dcf-bf93-46ef-bf37-bda6cf923471' | ACTIVE |
| 557f205b-2674-4f6e-91a2-643fe1702be2 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider3p1 | fa:16:3e:56:a3:48 | ip_address='192.168.177.83', subnet_id='1a892dcf-bf93-46ef-bf37-bda6cf923471'  | ACTIVE |
| 721b5f15-2dc9-4509-a4ba-09f364ae8771 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider3p3 | fa:16:3e:dd:c3:28 | ip_address='192.168.177.59', subnet_id='1a892dcf-bf93-46ef-bf37-bda6cf923471'  | ACTIVE |
| 9da4b1be-27d7-4428-a194-9eb4b02f6ac5 | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider4p3 | fa:16:3e:fb:06:1b | ip_address='192.168.178.144', subnet_id='76430b9e-302f-428d-916a-77482d9cfb19' | ACTIVE |
| a72fcbd2-83d3-4fa9-be3d-e9fbde27d4bf | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider3p4 | fa:16:3e:a9:28:0e | ip_address='192.168.177.66', subnet_id='1a892dcf-bf93-46ef-bf37-bda6cf923471'  | ACTIVE |
| ba5cd10f-c6bc-4bed-b978-3b8a3560ad5c | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider4p1 | fa:16:3e:33:e4:c4 | ip_address='192.168.178.18', subnet_id='76430b9e-302f-428d-916a-77482d9cfb19'  | ACTIVE |
| bf2ce123-76fc-4e5c-9e4f-0473febbdeac | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider4p2 | fa:16:3e:ce:91:10 | ip_address='192.168.178.121', subnet_id='76430b9e-302f-428d-916a-77482d9cfb19' | ACTIVE |
+--------------------------------------+----------------------------------------------------+-------------------+--------------------------------------------------------------------------------+--------+
$ openstack port show --fit-width 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider4p4
+-------------------------+---------------------------------------------------------------------------------------------------------------------------+
| Field                   | Value                                                                                                                     |
+-------------------------+---------------------------------------------------------------------------------------------------------------------------+
| admin_state_up          | UP                                                                                                                        |
| allowed_address_pairs   |                                                                                                                           |
| binding_host_id         | nfv-intel-11.perflab.com                                                                                                  |
| binding_profile         | pci_slot='0000:b1:11.2', pci_vendor_info='8086:1889', physical_network='provider4'                                        |
| binding_vif_details     | connectivity='l2', port_filter='False', vlan='178'                                                                        |
| binding_vif_type        | hw_veb                                                                                                                    |
| binding_vnic_type       | direct                                                                                                                    |
| created_at              | 2024-03-07T06:03:43Z                                                                                                      |
| data_plane_status       | None                                                                                                                      |
| description             | Created by cluster-api-provider-openstack cluster openshift-machine-api-5kqfbl3y0rhocpnfv-wj2jj                           |
| device_id               | aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879                                                                                      |
| device_owner            | compute:worker                                                                                                            |
| device_profile          | None                                                                                                                      |
| dns_assignment          | fqdn='host-192-168-178-108.openstacklocal.', hostname='host-192-168-178-108', ip_address='192.168.178.108'                |
| dns_domain              |                                                                                                                           |
| dns_name                |                                                                                                                           |
| extra_dhcp_opts         |                                                                                                                           |
| fixed_ips               | ip_address='192.168.178.108', subnet_id='76430b9e-302f-428d-916a-77482d9cfb19'                                            |
| id                      | 0c1814db-cd4f-4f6a-a0c6-4f8e569b6767                                                                                      |
| ip_allocation           | None                                                                                                                      |
| mac_address             | fa:16:3e:15:88:d7                                                                                                         |
| name                    | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider4p4                                                                        |
| network_id              | e2106b16-8f83-4e2e-bdbd-20e2c12ec279                                                                                      |
| numa_affinity_policy    | None                                                                                                                      |
| port_security_enabled   | True                                                                                                                      |
| project_id              | 927450d0f06647a99d86214acd822679                                                                                          |
| propagate_uplink_status | None                                                                                                                      |
| qos_network_policy_id   | None                                                                                                                      |
| qos_policy_id           | None                                                                                                                      |
| resource_request        | None                                                                                                                      |
| revision_number         | 6                                                                                                                         |
| security_group_ids      | f0df9265-c7fd-4f47-875f-d346e5cb5074                                                                                      |
| status                  | ACTIVE                                                                                                                    |
| tags                    | cluster-api-provider-openstack, openshift-machine-api-5kqfbl3y0rhocpnfv-wj2jj, openshiftClusterID=5kqfbl3y0rhocpnfv-wj2jj |
| trunk_details           | None                                                                                                                      |
| updated_at              | 2024-03-07T06:04:10Z                                                                                                      |
+-------------------------+---------------------------------------------------------------------------------------------------------------------------+
$ openstack port show --fit-width 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider3p2
+-------------------------+---------------------------------------------------------------------------------------------------------------------------+
| Field                   | Value                                                                                                                     |
+-------------------------+---------------------------------------------------------------------------------------------------------------------------+
| admin_state_up          | UP                                                                                                                        |
| allowed_address_pairs   |                                                                                                                           |
| binding_host_id         | nfv-intel-11.perflab.com                                                                                                  |
| binding_profile         | pci_slot='0000:b1:01.1', pci_vendor_info='8086:1889', physical_network='provider3'                                        |
| binding_vif_details     | connectivity='l2', port_filter='False', vlan='177'                                                                        |
| binding_vif_type        | hw_veb                                                                                                                    |
| binding_vnic_type       | direct                                                                                                                    |
| created_at              | 2024-03-07T06:03:41Z                                                                                                      |
| data_plane_status       | None                                                                                                                      |
| description             | Created by cluster-api-provider-openstack cluster openshift-machine-api-5kqfbl3y0rhocpnfv-wj2jj                           |
| device_id               | aa5cfdcb-eb46-46d8-8ac2-5bb6f0c0d879                                                                                      |
| device_owner            | compute:worker                                                                                                            |
| device_profile          | None                                                                                                                      |
| dns_assignment          | fqdn='host-192-168-177-197.openstacklocal.', hostname='host-192-168-177-197', ip_address='192.168.177.197'                |
| dns_domain              |                                                                                                                           |
| dns_name                |                                                                                                                           |
| extra_dhcp_opts         |                                                                                                                           |
| fixed_ips               | ip_address='192.168.177.197', subnet_id='1a892dcf-bf93-46ef-bf37-bda6cf923471'                                            |
| id                      | 1778cb62-5fbf-42be-8847-53a7b092bdf5                                                                                      |
| ip_allocation           | None                                                                                                                      |
| mac_address             | fa:16:3e:2a:64:e4                                                                                                         |
| name                    | 5kqfbl3y0rhocpnfv-wj2jj-worker-0-n6crr-provider3p2                                                                        |
| network_id              | 50a557b5-34c2-4c47-b539-963688f7167c                                                                                      |
| numa_affinity_policy    | None                                                                                                                      |
| port_security_enabled   | True                                                                                                                      |
| project_id              | 927450d0f06647a99d86214acd822679                                                                                          |
| propagate_uplink_status | None                                                                                                                      |
| qos_network_policy_id   | None                                                                                                                      |
| qos_policy_id           | None                                                                                                                      |
| resource_request        | None                                                                                                                      |
| revision_number         | 9                                                                                                                         |
| security_group_ids      | f0df9265-c7fd-4f47-875f-d346e5cb5074                                                                                      |
| status                  | ACTIVE                                                                                                                    |
| tags                    | cluster-api-provider-openstack, openshift-machine-api-5kqfbl3y0rhocpnfv-wj2jj, openshiftClusterID=5kqfbl3y0rhocpnfv-wj2jj |
| trunk_details           | None                                                                                                                      |
| updated_at              | 2024-03-07T06:10:42Z                                                                                                      |
+-------------------------+---------------------------------------------------------------------------------------------------------------------------+
$ openstack network list
+--------------------------------------+-------------+--------------------------------------+
| ID                                   | Name        | Subnets                              |
+--------------------------------------+-------------+--------------------------------------+
| 50a557b5-34c2-4c47-b539-963688f7167c | provider-3  | 1a892dcf-bf93-46ef-bf37-bda6cf923471 |
| e2106b16-8f83-4e2e-bdbd-20e2c12ec279 | provider-4  | 76430b9e-302f-428d-916a-77482d9cfb19 |
| 5fdddf1c-3a71-4752-94bd-bdb5b9674500 | management  | 7fb7d2d6-325d-49e1-b3f8-b4dbb1197e34 |
+--------------------------------------+-------------+--------------------------------------+
$ openstack network show provider-3
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | UP                                   |
| availability_zone_hints   |                                      |
| availability_zones        |                                      |
| created_at                | 2024-03-01T16:45:48Z                 |
| description               |                                      |
| dns_domain                |                                      |
| id                        | 50a557b5-34c2-4c47-b539-963688f7167c |
| ipv4_address_scope        | None                                 |
| ipv6_address_scope        | None                                 |
| is_default                | None                                 |
| is_vlan_transparent       | None                                 |
| mtu                       | 9216                                 |
| name                      | provider-3                           |
| port_security_enabled     | True                                 |
| project_id                | ad4b9a972ac64bd9916ad7ee80288353     |
| provider:network_type     | vlan                                 |
| provider:physical_network | provider3                            |
| provider:segmentation_id  | 177                                  |
| qos_policy_id             | None                                 |
| revision_number           | 2                                    |
| router:external           | Internal                             |
| segments                  | None                                 |
| shared                    | True                                 |
| status                    | ACTIVE                               |
| subnets                   | 1a892dcf-bf93-46ef-bf37-bda6cf923471 |
| tags                      |                                      |
| updated_at                | 2024-03-01T16:45:52Z                 |
+---------------------------+--------------------------------------+
$ openstack network show provider-4
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | UP                                   |
| availability_zone_hints   |                                      |
| availability_zones        |                                      |
| created_at                | 2024-03-01T16:45:57Z                 |
| description               |                                      |
| dns_domain                |                                      |
| id                        | e2106b16-8f83-4e2e-bdbd-20e2c12ec279 |
| ipv4_address_scope        | None                                 |
| ipv6_address_scope        | None                                 |
| is_default                | None                                 |
| is_vlan_transparent       | None                                 |
| mtu                       | 9216                                 |
| name                      | provider-4                           |
| port_security_enabled     | True                                 |
| project_id                | ad4b9a972ac64bd9916ad7ee80288353     |
| provider:network_type     | vlan                                 |
| provider:physical_network | provider4                            |
| provider:segmentation_id  | 178                                  |
| qos_policy_id             | None                                 |
| revision_number           | 2                                    |
| router:external           | Internal                             |
| segments                  | None                                 |
| shared                    | True                                 |
| status                    | ACTIVE                               |
| subnets                   | 76430b9e-302f-428d-916a-77482d9cfb19 |
| tags                      |                                      |
| updated_at                | 2024-03-01T16:46:01Z                 |
+---------------------------+--------------------------------------+
     3.
    

Actual results:

    

Expected results:

    

Additional info:
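As a hedged illustration, the intent captured by the describe output above corresponds roughly to a ports entry like the following in the MachineSet providerSpec (the camelCase field names are assumed from the rendered output; IDs are copied from it):

ports:
- networkID: 50a557b5-34c2-4c47-b539-963688f7167c
  nameSuffix: provider3p1
  fixedIPs:
  - subnetID: 1a892dcf-bf93-46ef-bf37-bda6cf923471
  tags:
  - sriov
  vnicType: direct
  trunk: false
  # Requested as false in the MachineSet, yet the created Neutron port still
  # reports port_security_enabled: True (see `openstack port show` above).
  portSecurity: false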

    

This is a clone of issue OCPBUGS-36833. The following is the description of the original issue:

Description of problem:

In 4.16, OCP starts to place an annotation on service accounts when it creates a dockercfg secret. Some operators/reconciliation loops will then (incorrectly) try to set the annotations on the SA back to exactly what they wanted. OCP will annotate again and create a new secret. The operator sets it back without the annotation. Rinse, repeat.

Eventually etcd will get completely overloaded with secrets, will start to OOM, and the entire cluster will come down.

 

There is a belief that at least otel, tempo, acm, odf/ocs, strimzi, elasticsearch, and possibly other operators reconciled the annotations on the SA by setting them back exactly how they wanted them set.

 

These seem to be related (but not complete):

https://issues.redhat.com/browse/LOG-5776

https://issues.redhat.com/browse/ENTMQST-6129

https://issues.redhat.com/browse/TRACING-4435

https://issues.redhat.com/browse/ACM-10987
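
As a hedged illustration of the flapping described above (the exact annotation key OCP uses is not quoted in this report, so the key below is a placeholder):

# ServiceAccount as mutated by OCP after it creates a dockercfg secret
apiVersion: v1
kind: ServiceAccount
metadata:
  name: example-operand
  annotations:
    # placeholder key; OCP records a reference to the generated dockercfg secret here
    openshift.io/example-dockercfg-ref: example-operand-dockercfg-abcde
# An operator that re-applies its "desired" ServiceAccount without this annotation
# strips it; OCP re-annotates and creates another secret, and the loop repeats,
# filling etcd with secrets.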

Description of the problem:

Setting up OCP with ODF (compact mode) using AI (stage). I have 3 hosts, each with an install disk (120GB) and a data disk (500GB, disk type: Multipath). Though we have a non-bootable disk (500GB), the host status is "Insufficient". I could not proceed as the "Next" button is disabled.

Steps to reproduce:

1. Create a new cluster

2. Select "Install OpenShift Data Foundation" in Operators page

3. Take 3 hosts with 1 installation disk and 1 non-installation disk  on each.

4. Add hosts by booting them with the downloaded ISO

Actual results:

Status of hosts is "Insufficient" and "Next" button is disabled

data.json

 

Expected results:

Status of hosts should be "Ready" and "Next" button should be enabled to proceed with installation

Description of problem:

When creating an ImageDigestMirrorSet with a conflicting mirrorSourcePolicy, no error was reported.

Version-Release number of selected component (if applicable):

% oc get clusterversion 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-01-14-100410   True        False         27m     Cluster version is 4.15.0-0.nightly-2024-01-14-100410

How reproducible:

always

Steps to Reproduce:

1. create an ImageContentSourcePolicy 

ImageContentSourcePolicy.yaml:
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: ubi8repo
spec:
  repositoryDigestMirrors:
  - mirrors:
    - example.io/example/ubi-minimal
    - example.com/example/ubi-minimal
    source: registry.access.redhat.com/ubi6/ubi-minimal
  - mirrors:
    - mirror.example.net
    source: registry.example.com/example

2. After the MCP finishes updating, check that /etc/containers/registries.conf is updated as expected

3. Create an ImageDigestMirrorSet with a conflicting mirrorSourcePolicy for the same source "registry.example.com/example"

ImageDigestMirrorSet-conflict.yaml: 
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: digest-mirror
spec:
  imageDigestMirrors:
  - mirrors:
    - example.io/example/ubi-minimal
    - example.com/example/ubi-minimal
    source: registry.access.redhat.com/ubi8/ubi-minimal
    mirrorSourcePolicy: AllowContactingSource
  - mirrors:
    - mirror.example.net
    source: registry.example.com/example
    mirrorSourcePolicy: NeverContactSource
   

Actual results:

3. The resource is created successfully, but the MCP does not get updated and no relevant MachineConfig is generated.

The machine-config-controller log showed:
I0116 02:34:03.897335       1 container_runtime_config_controller.go:417] Error syncing image config openshift-config: could not Create/Update MachineConfig: could not update registries config with new changes: conflicting mirrorSourcePolicy is set for the same source "registry.example.com/example" in imagedigestmirrorsets and/or imagetagmirrorsets

Expected results:

3. It should report that a conflicting mirrorSourcePolicy exists for the same source "registry.example.com/example" in the ICSP.

Additional info:

    

Description of problem:

    If a cluster is installed using a proxy and the username used for connecting to the proxy contains the characters "%40" (encoding an "@" when a domain is provided), the installation fails. The failure occurs because the proxy variables written to the file "/etc/systemd/system.conf.d/10-default-env.conf" on the bootstrap node are ignored by systemd. This issue was already fixed in MCO (BZ 1882674 - fixed in RHOCP 4.7), but it is affecting the bootstrap process in 4.13 and 4.14, causing the installation to not start at all.

Version-Release number of selected component (if applicable):

    4.14, 4.13

How reproducible:

    100% always

Steps to Reproduce:

    1. create a install-config.yaml file with "%40" in the middle of the username used for proxy.
    2. start cluster installation.
    3. bootstrap will fail for not using proxy variables.
    

Actual results:

Installation fails because systemd fails to load the proxy variables if "%" is present in the username.

Expected results:

    Installation to succeed using a username with "%40" for the proxy. 

Additional info:

File "/etc/systemd/system.conf.d/10-default-env.conf" for the bootstrap should be generated in a way accepted by systemd.    

Tracker issue for bootimage bump in 4.16. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-36324.

Description of problem:

Fast DataPath released a new major version of Open vSwitch, which is 3.3.
This version is going to be a new LTS and contains performance improvements
and features required for future releases of OVN. Since OCP 4.16 is planned
to have a longer support time frame, it should use this version of OVS.
Moving to newer versions of OVS will also gradually allow FDP to drop support
for older streams not used by any layered products.

Most notable relevant improvements over OVS 3.1 are:
- Improved performance of database operations, most notably the initial read
  of the database file and the database schema conversion on updates.

The plan is to also update the main ovs-vswitchd at the OS level in a separate
issue; this will provide support for flushing CT entries by marks and labels
needed for future versions of OVN. It is also better to keep the versions on the
host and inside the container in sync.

The change was discussed with FDP and OVS-QE.

Please review the following PR: https://github.com/openshift/machine-api-provider-azure/pull/88

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/release/pull/48835

The scatter chart has been very slow to load for the last week; while there are a few hits here and there over the last y days, this got a lot more common yesterday around noon and has continued ever since.

Suspicious PR: https://github.com/openshift/origin/pull/28587

Please review the following PR: https://github.com/openshift/cluster-api/pull/190

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The PipelineRun list view contains a Task status column, which shows the overall task status of the PipelineRun. In order to render this column, we fetch all the TaskRuns of that PipelineRun. Every PipelineRun row has to have all the related TaskRun information, which is causing a performance issue in the PipelineRun list view.

The customer is facing UI slowness and rendering problems for a large number of PipelineRuns, with and without results enabled. In both cases, significant slowness is observed, which is hampering their daily operations.

How reproducible:

Always

Steps to Reproduce:

1. Create few pipelineruns
2. Navigate to pipelineruns list view

Actual results:

All the TaskRuns are being fetched, and the PipelineRun list view renders this column asynchronously with a loading indicator.

 

Expected results:

TaskRuns should not be fetched at all; rather, the UI needs to parse the PipelineRun status message string (see below) to render this column.

Additional info:

Pipelinerun status message gets updated on every task completion.

pipelinerun.status.conditions:

  • lastTransitionTime: '2023-11-15T07:51:42Z'
    message: 'Tasks Completed: 3 (Failed: 0, Cancelled 0), Skipped: 0'
    reason: Succeeded
    status: 'True'
    type: Succeeded

We can parse the above information to derive the following object and use it for rendering the column; this will improve the performance of this page hugely.

{
 completed: 3, // 3 (total count) - 0 (failed count) - 0 (cancelled count),
 failed: 0,
 cancelled: 0,
 skipped: 0,
 pending: 0 
}

 

 

Slack thread for more details - thread

Description of problem:

    When navigating to the Pipelines list page from the Search menu in the Developer perspective, the Pipelines list page crashes

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    Always

Steps to Reproduce:

    1. Install the Pipelines Operator
    2. Go to the Developer perspective
    3. Go to the search menu and select Pipeline
    

Actual results:

    The page crashes

Expected results:

    The page should not crash and should show the Pipelines list page

Additional info:

    

This story tracks the routine i18n upload/download tasks which are performed every sprint. 

 

A.C.

  - Upload strings to Memsource at the start of the sprint and reach out to the localization team

  - Download translated strings from Memsource when it is ready

  -  Review the translated strings and open a pull request

  -  Open a followup story for next sprint

This is a clone of issue OCPBUGS-34540. The following is the description of the original issue:

ControlPlaneReleaseProvider is modifying the cached release image directly, which means the userReleaseProvider is still picking up and using the registry overrides for data-plane components.

Description of problem:

Private HC provision failed on AWS. 

How reproducible:

Always. 

Steps to Reproduce:

Create a private HC on AWS following the steps in https://hypershift-docs.netlify.app/how-to/aws/deploy-aws-private-clusters/:

RELEASE_IMAGE=registry.ci.openshift.org/ocp/release:4.17.0-0.nightly-2024-06-20-005211
HO_IMAGE=quay.io/hypershift/hypershift-operator:latest
BUCKET_NAME=fxie-hcp-bucket
REGION=us-east-2
AWS_CREDS="$HOME/.aws/credentials"
CLUSTER_NAME=fxie-hcp-1
BASE_DOMAIN=qe.devcluster.openshift.com
EXT_DNS_DOMAIN=hypershift-ext.qe.devcluster.openshift.com
PULL_SECRET="/Users/fxie/Projects/hypershift/.dockerconfigjson"

hypershift install --oidc-storage-provider-s3-bucket-name $BUCKET_NAME --oidc-storage-provider-s3-credentials $AWS_CREDS --oidc-storage-provider-s3-region $REGION --private-platform AWS --aws-private-creds $AWS_CREDS --aws-private-region=$REGION --wait-until-available --hypershift-image $HO_IMAGE

hypershift create cluster aws --pull-secret=$PULL_SECRET --aws-creds=$AWS_CREDS --name=$CLUSTER_NAME --base-domain=$BASE_DOMAIN --node-pool-replicas=2 --region=$REGION --endpoint-access=Private --release-image=$RELEASE_IMAGE --generate-ssh

Additional info:

From the MC:
$ for k in $(oc get secret -n clusters-fxie-hcp-1 | grep -i kubeconfig | awk '{print $1}'); do echo $k; oc extract secret/$k -n clusters-fxie-hcp-1 --to - 2>/dev/null | grep -i 'server:'; done
admin-kubeconfig
    server: https://a621f63c3c65f4e459f2044b9521b5e9-082a734ef867f25a.elb.us-east-2.amazonaws.com:6443
aws-pod-identity-webhook-kubeconfig
    server: https://kube-apiserver:6443
bootstrap-kubeconfig
    server: https://api.fxie-hcp-1.hypershift.local:443
cloud-credential-operator-kubeconfig
    server: https://kube-apiserver:6443
dns-operator-kubeconfig
    server: https://kube-apiserver:6443
fxie-hcp-1-2bsct-kubeconfig
    server: https://kube-apiserver:6443
ingress-operator-kubeconfig
    server: https://kube-apiserver:6443
kube-controller-manager-kubeconfig
    server: https://kube-apiserver:6443
kube-scheduler-kubeconfig
    server: https://kube-apiserver:6443
localhost-kubeconfig
    server: https://localhost:6443
service-network-admin-kubeconfig
    server: https://kube-apiserver:6443

 

The bootstrap-kubeconfig uses an incorrect KAS port (it should be 6443, since the KAS is exposed through an LB), causing the kubelet on each HC node to use the same incorrect port. As a result, AWS VMs are provisioned but cannot join the HC as nodes.

From a bastion:
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.
[ec2-user@ip-10-0-5-182 ~]$ nc -zv api.fxie-hcp-1.hypershift.local 6443
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.143.91:6443.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
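
Based on that probe, a hedged sketch of what the cluster entry in the bootstrap-kubeconfig would need to look like (only the port differs from the extracted secret above):

clusters:
- cluster:
    server: https://api.fxie-hcp-1.hypershift.local:6443
  name: cluster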

 

In addition, the CNO passes the wrong KAS port to network components on the HC.

 

The same applies to the HAProxy configuration on the VMs:

frontend local_apiserver
  bind 172.20.0.1:6443
  log global
  mode tcp
  option tcplog
  default_backend remote_apiserver

backend remote_apiserver
  mode tcp
  log global
  option httpchk GET /version
  option log-health-checks
  default-server inter 10s fall 3 rise 3
  server controlplane api.fxie-hcp-1.hypershift.local:443 
spec:
  configuration:
    featureGate:
      featureSet: TechPreviewNoUpgrade
$ oc get pod
NAME                                      READY   STATUS             RESTARTS      AGE
capi-provider-bd4858c47-sf5d5             0/2     Init:0/1           0             9m33s
cluster-api-85f69c8484-5n9ql              1/1     Running            0             9m33s
control-plane-operator-78c9478584-xnjmd   2/2     Running            0             9m33s
etcd-0                                    3/3     Running            0             9m10s
kube-apiserver-55bb575754-g4694           4/5     CrashLoopBackOff   6 (81s ago)   8m30s

$ oc logs kube-apiserver-55bb575754-g4694 -c kube-apiserver --tail=5
E0105 16:49:54.411837       1 controller.go:145] while syncing ConfigMap "kube-system/kube-apiserver-legacy-service-account-token-tracking", err: namespaces "kube-system" not found
I0105 16:49:54.415074       1 trace.go:236] Trace[236726897]: "Create" accept:application/vnd.kubernetes.protobuf, */*,audit-id:71496035-d1fe-4ee1-bc12-3b24022ea39c,client:::1,api-group:scheduling.k8s.io,api-version:v1,name:,subresource:,namespace:,protocol:HTTP/2.0,resource:priorityclasses,scope:resource,url:/apis/scheduling.k8s.io/v1/priorityclasses,user-agent:kube-apiserver/v1.29.0 (linux/amd64) kubernetes/9368fcd,verb:POST (05-Jan-2024 16:49:44.413) (total time: 10001ms):
Trace[236726897]: ---"Write to database call failed" len:174,err:priorityclasses.scheduling.k8s.io "system-node-critical" is forbidden: not yet ready to handle request 10001ms (16:49:54.415)
Trace[236726897]: [10.001615835s] [10.001615835s] END
F0105 16:49:54.415382       1 hooks.go:203] PostStartHook "scheduling/bootstrap-system-priority-classes" failed: unable to add default system priority classes: priorityclasses.scheduling.k8s.io "system-node-critical" is forbidden: not yet ready to handle request

Component Readiness has found a potential regression in [bz-networking][invariant] alert/OVNKubernetesResourceRetryFailure should not be at or above info.

Probability of significant regression: 96.30%

Sample (being evaluated) Release: 4.16
Start Time: 2024-04-29T00:00:00Z
End Time: 2024-05-06T23:59:59Z
Success Rate: 72.73%
Successes: 32
Failures: 12
Flakes: 0

Base (historical) Release: 4.15
Start Time: 2024-02-01T00:00:00Z
End Time: 2024-05-06T23:59:59Z
Success Rate: 85.20%
Successes: 236
Failures: 41
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2024-05-06%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Alerts&component=Networking%20%2F%20cluster-network-operator&confidence=95&environment=ovn%20no-upgrade%20amd64%20vsphere%20serial&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&pity=5&platform=vsphere&sampleEndTime=2024-05-06%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-04-29%2000%3A00%3A00&testId=openshift-tests%3Ab3997eeabb330f3000872f22d6ddb618&testName=%5Bbz-networking%5D%5Binvariant%5D%20alert%2FOVNKubernetesResourceRetryFailure%20should%20not%20be%20at%20or%20above%20info&upgrade=no-upgrade&variant=serial

Description of problem:

The job [sig-node] [Conformance] Prevent openshift node labeling on update by the node TestOpenshiftNodeLabeling [Suite:openshift/conformance/parallel/minimal] uses the `oc debug` command [1]. Occasionally we find that the command fails to run, which ends up failing the test.

 

Failing job example - https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-monitoring-operator/2220/pull-ci-openshift-cluster-monitoring-operator-master-e2e-aws-ovn-single-node/1742564553356480512

 

CI search results - https://search.ci.openshift.org/?search=TestOpenshiftNodeLabeling&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

 

[1] https://github.com/openshift/origin/pull/28296/files#diff-001ed6507a42a0a1689e86c05512e6186c3483488ec96bf5f4354a8f7fa79261R39

 

 

Version-Release number of selected component (if applicable):

 

How reproducible:

Observed in CI jobs mentioned above. 

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

   The oc debug command occasionally exits with an error.

Expected results:

    The oc debug command should not occasionally exit with an error.

Additional info:

    

Description of problem:

DRA plugins can be installed, but do not really work because the required scheduler plugin DynamicResources isn't enabled.

Version-Release number of selected component (if applicable):

4.14.3

How reproducible:

Always

Steps to Reproduce:

1. Install an OpenShift cluster and enable the TechPreviewNoUpgrade feature set either during installation or post-install. The feature set includes the DynamicResourceAllocation feature gate.
2. Install a DRA plugin by any vendor, e.g. by NVIDIA (requires at least one GPU worker with NVIDIA GPU drivers installed on the node, and a few tweaks to allow the plugin to run on OpenShift).
3. Create a resource claim.
4. Create a pod that consumes the resource claim.
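
For illustration, a minimal sketch of steps 3-4, assuming the v1alpha2 DRA API available at this Kubernetes level and a hypothetical resource class name published by the vendor plugin:

~~~
# Hypothetical resource class name; the real name comes from the installed DRA plugin.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: gpu-claim
spec:
  resourceClassName: gpu.example.com
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimName: gpu-claim
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
    resources:
      claims:
      - name: gpu
~~~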

Actual results:

The pod remains in ContainerCreating state, the claim in WaitingForFirstConsumer state forever, without any meaningful event or error message.

Expected results:

A resource is allocated according to the resource claim, and assigned to the pod.

Additional info:

The problem is caused by the DynamicResources scheduler plugin not being automatically enabled when the feature flag is turned on. This lets DRA plugins run without issues (the right APIs are available), but they do nothing.
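
For reference, a minimal sketch of what needs to be active in the scheduler for DRA to work, expressed as an upstream KubeSchedulerConfiguration; in OpenShift the scheduler configuration is owned by the kube-scheduler operator, so this is illustrative only, not a supported override:

~~~
# Illustrative upstream scheduler configuration; the DynamicResourceAllocation
# feature gate must also be enabled on the kube-scheduler binary itself.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  plugins:
    multiPoint:
      enabled:
      - name: DynamicResources
~~~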

This is a clone of issue OCPBUGS-41233. The following is the description of the original issue:

This is a clone of issue OCPBUGS-39531. The following is the description of the original issue:

-> While upgrading the cluster from 4.13.38 -> 4.14.18, it is stuck on CCO; clusterversion is complaining:

"Working towards 4.14.18: 690 of 860 done (80% complete), waiting on cloud-credential".

While checking further, we see that the CCO deployment has yet to roll out.

-> ClusterOperator status.versions[name=operator] isn't a narrow "CCO Deployment is updated", it's "the CCO asserts the whole CC component is updated", which requires (among other things) a functional CCO Deployment. Seems like you don't have a functional CCO Deployment, because logs have it stuck talking about asking for a leader lease. You don't have Kube API audit logs to say if it's stuck generating the Lease request, or waiting for a response from the Kube API server.

Please review the following PR: https://github.com/openshift/cloud-credential-operator/pull/642

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

https://github.com/openshift/origin/pull/28522 removes two tests related to http2 testing with the default certificate that are now known to fail with HAProxy 2.8. We are reworking the tests as part of NE-1444 (HAProxy 2.8 bump).

This bug is a reminder that come OCP 4.16 GA we need to have reworked the tests so that they now pass with HAProxy 2.8 or, if not fixed, revert https://github.com/openshift/origin/pull/28522 which is why I'm marking this bug as a blocker. We do not want to ship 4.16 without reinstating the two tests.

The goal of removing the two tests in https://github.com/openshift/origin/pull/28522 is to allow us to make additional progress in https://github.com/openshift/router/pull/551 (which is our HAProxy 2.8 bump). With all tests passing in router#551 we can continue our assessment of HAProxy 2.8 by a) running the payload tests and b) creating a HAproxy 2.8 image that QE can use with their reliability test suite.

This is a clone of issue OCPBUGS-33486. The following is the description of the original issue:

Description of problem:

Build tests in OCP 4.14 reference Ruby images that are now EOL. The related code in our sample ruby build was deleted.
    

Version-Release number of selected component (if applicable):

4.14
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Run the build suite for OCP 4.14 against a 4.14 cluster
    

Actual results:

Test [sig-builds][Feature:Builds][Slow] builds with a context directory s2i context directory build should s2i build an application using a context directory [apigroup:build.openshift.io] fails

2024-05-08T11:11:57.558298778Z I0508 11:11:57.558273       1 builder.go:400] Powered by buildah v1.31.0
  2024-05-08T11:11:57.581578795Z I0508 11:11:57.581509       1 builder.go:473] effective capabilities: [audit_control=true audit_read=true audit_write=true block_suspend=true bpf=true checkpoint_restore=true chown=true dac_override=true dac_read_search=true fowner=true fsetid=true ipc_lock=true ipc_owner=true kill=true lease=true linux_immutable=true mac_admin=true mac_override=true mknod=true net_admin=true net_bind_service=true net_broadcast=true net_raw=true perfmon=true setfcap=true setgid=true setpcap=true setuid=true sys_admin=true sys_boot=true sys_chroot=true sys_module=true sys_nice=true sys_pacct=true sys_ptrace=true sys_rawio=true sys_resource=true sys_time=true sys_tty_config=true syslog=true wake_alarm=true]
  2024-05-08T11:11:57.583755245Z I0508 11:11:57.583715       1 builder.go:401] redacted build: {"kind":"Build","apiVersion":"build.openshift.io/v1","metadata":{"name":"s2icontext-1","namespace":"e2e-test-contextdir-wpphk","uid":"c2db2893-06e5-4274-96ae-d8cd635a1f8d","resourceVersion":"51882","generation":1,"creationTimestamp":"2024-05-08T11:11:55Z","labels":{"buildconfig":"s2icontext","openshift.io/build-config.name":"s2icontext","openshift.io/build.start-policy":"Serial"},"annotations":{"openshift.io/build-config.name":"s2icontext","openshift.io/build.number":"1"},"ownerReferences":[{"apiVersion":"build.openshift.io/v1","kind":"BuildConfig","name":"s2icontext","uid":"b7dbb52b-ae66-4465-babc-728ae3ceed9a","controller":true}],"managedFields":[{"manager":"openshift-apiserver","operation":"Update","apiVersion":"build.openshift.io/v1","time":"2024-05-08T11:11:55Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:openshift.io/build-config.name":{},"f:openshift.io/build.number":{}},"f:labels":{".":{},"f:buildconfig":{},"f:openshift.io/build-config.name":{},"f:openshift.io/build.start-policy":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"b7dbb52b-ae66-4465-babc-728ae3ceed9a\"}":{}}},"f:spec":{"f:output":{"f:to":{}},"f:serviceAccount":{},"f:source":{"f:contextDir":{},"f:git":{".":{},"f:uri":{}},"f:type":{}},"f:strategy":{"f:sourceStrategy":{".":{},"f:env":{},"f:from":{},"f:pullSecret":{}},"f:type":{}},"f:triggeredBy":{}},"f:status":{"f:conditions":{".":{},"k:{\"type\":\"New\"}":{".":{},"f:lastTransitionTime":{},"f:lastUpdateTime":{},"f:status":{},"f:type":{}}},"f:config":{},"f:phase":{}}}}]},"spec":{"serviceAccount":"builder","source":{"type":"Git","git":{"uri":"https://github.com/sclorg/s2i-ruby-container"},"contextDir":"2.7/test/puma-test-app"},"strategy":{"type":"Source","sourceStrategy":{"from":{"kind":"DockerImage","name":"image-registry.openshift-image-registry.svc:5000/openshift/ruby:2.7-ubi8"},"pullSecret":{"name":"builder-dockercfg-v9xk2"},"env":[{"name":"BUILD_LOGLEVEL","value":"5"}]}},"output":{"to":{"kind":"DockerImage","name":"image-registry.openshift-image-registry.svc:5000/e2e-test-contextdir-wpphk/test:latest"},"pushSecret":{"name":"builder-dockercfg-v9xk2"}},"resources":{},"postCommit":{},"nodeSelector":null,"triggeredBy":[{"message":"Manually triggered"}]},"status":{"phase":"New","outputDockerImageReference":"image-registry.openshift-image-registry.svc:5000/e2e-test-contextdir-wpphk/test:latest","config":{"kind":"BuildConfig","namespace":"e2e-test-contextdir-wpphk","name":"s2icontext"},"output":{},"conditions":[{"type":"New","status":"True","lastUpdateTime":"2024-05-08T11:11:55Z","lastTransitionTime":"2024-05-08T11:11:55Z"}]}}
  2024-05-08T11:11:57.584949442Z Cloning "https://github.com/sclorg/s2i-ruby-container" ...
  2024-05-08T11:11:57.585044449Z I0508 11:11:57.585030       1 source.go:237] git ls-remote --heads https://github.com/sclorg/s2i-ruby-container
  2024-05-08T11:11:57.585081852Z I0508 11:11:57.585072       1 repository.go:450] Executing git ls-remote --heads https://github.com/sclorg/s2i-ruby-container
  2024-05-08T11:11:57.840621917Z I0508 11:11:57.840572       1 source.go:237] 663daf43b2abb5662504638d017c7175a6cff59d	refs/heads/3.2-experimental
  2024-05-08T11:11:57.840621917Z 88b4e684576b3fe0e06c82bd43265e41a8129c5d	refs/heads/add_test_latest_imagestreams
  2024-05-08T11:11:57.840621917Z 12a863ab4b050a1365d6d59970dddc6743e8bc8c	refs/heads/master
  2024-05-08T11:11:57.840730405Z I0508 11:11:57.840714       1 source.go:69] Cloning source from https://github.com/sclorg/s2i-ruby-container
  2024-05-08T11:11:57.840793509Z I0508 11:11:57.840781       1 repository.go:450] Executing git clone --recursive --depth=1 https://github.com/sclorg/s2i-ruby-container /tmp/build/inputs
  2024-05-08T11:11:59.073229755Z I0508 11:11:59.073183       1 repository.go:450] Executing git rev-parse --abbrev-ref HEAD
  2024-05-08T11:11:59.080132731Z I0508 11:11:59.080079       1 repository.go:450] Executing git rev-parse --verify HEAD
  2024-05-08T11:11:59.083626287Z I0508 11:11:59.083586       1 repository.go:450] Executing git --no-pager show -s --format=%an HEAD
  2024-05-08T11:11:59.115407368Z I0508 11:11:59.115361       1 repository.go:450] Executing git --no-pager show -s --format=%ae HEAD
  2024-05-08T11:11:59.195276873Z I0508 11:11:59.195231       1 repository.go:450] Executing git --no-pager show -s --format=%cn HEAD
  2024-05-08T11:11:59.198916080Z I0508 11:11:59.198879       1 repository.go:450] Executing git --no-pager show -s --format=%ce HEAD
  2024-05-08T11:11:59.204712375Z I0508 11:11:59.204663       1 repository.go:450] Executing git --no-pager show -s --format=%ad HEAD
  2024-05-08T11:11:59.211098793Z I0508 11:11:59.211051       1 repository.go:450] Executing git --no-pager show -s --format=%<(80,trunc)%s HEAD
  2024-05-08T11:11:59.216192627Z I0508 11:11:59.216149       1 repository.go:450] Executing git config --get remote.origin.url
  2024-05-08T11:11:59.218615714Z 	Commit:	12a863ab4b050a1365d6d59970dddc6743e8bc8c (Bump common from `1f774c8` to `a957816` (#537))
  2024-05-08T11:11:59.218661988Z 	Author:	dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  2024-05-08T11:11:59.218683019Z 	Date:	Tue Apr 9 15:24:11 2024 +0200
  2024-05-08T11:11:59.218722882Z I0508 11:11:59.218711       1 repository.go:450] Executing git rev-parse --abbrev-ref HEAD
  2024-05-08T11:11:59.234411732Z I0508 11:11:59.234366       1 repository.go:450] Executing git rev-parse --verify HEAD
  2024-05-08T11:11:59.237729596Z I0508 11:11:59.237698       1 repository.go:450] Executing git --no-pager show -s --format=%an HEAD
  2024-05-08T11:11:59.255304604Z I0508 11:11:59.255269       1 repository.go:450] Executing git --no-pager show -s --format=%ae HEAD
  2024-05-08T11:11:59.261113560Z I0508 11:11:59.261074       1 repository.go:450] Executing git --no-pager show -s --format=%cn HEAD
  2024-05-08T11:11:59.270006232Z I0508 11:11:59.269961       1 repository.go:450] Executing git --no-pager show -s --format=%ce HEAD
  2024-05-08T11:11:59.278485984Z I0508 11:11:59.278443       1 repository.go:450] Executing git --no-pager show -s --format=%ad HEAD
  2024-05-08T11:11:59.281940527Z I0508 11:11:59.281906       1 repository.go:450] Executing git --no-pager show -s --format=%<(80,trunc)%s HEAD
  2024-05-08T11:11:59.299465312Z I0508 11:11:59.299423       1 repository.go:450] Executing git config --get remote.origin.url
  2024-05-08T11:11:59.374652834Z error: provided context directory does not exist: 2.7/test/puma-test-app
    

Expected results:

Tests succeed
    

Additional info:

Ruby 2.7 is EOL and not searchable in the Red Hat container catalog.

Failing test: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-openshift-controller-manager-operator/344/pull-ci-openshift-cluster-openshift-controller-manager-operator-release-4.14-openshift-e2e-aws-builds-techpreview/1788152058105303040
    

This is a clone of issue OCPBUGS-36185. The following is the description of the original issue:

Description of problem:

    The MAPI for IBM Cloud currently only checks the first group of subnets (50) when searching for Subnet details by name. It should provide pagination support to search all subnets.

Version-Release number of selected component (if applicable):

    4.17

How reproducible:

    100%, though dependent on the order of subnets returned by the IBM Cloud APIs

Steps to Reproduce:

    1. Create 50+ IBM Cloud VPC Subnets
    2. Create a new IPI cluster (with or without BYON)
    3. MAPI will attempt to find Subnet details by name, likely failing as it only checks the first group (50)...depending on order returned by IBM Cloud API
    

Actual results:

    MAPI fails to find Subnet ID, thus cannot create/manage cluster nodes.

Expected results:

    Successful IPI deployment.

Additional info:

    IBM Cloud is working on a patch to MAPI to handle the ListSubnets API call and pagination results.

This is a clone of issue OCPBUGS-41371. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38349. The following is the description of the original issue:

Description of problem:

When configuring an OpenID idp that can only be accessed via the data plane, if the hostname of the provider can only be resolved by the data plane, reconciliation of the idp fails.

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    always

Steps to Reproduce:

    1. Configure an OpenID idp on a HostedCluster with a URL that points to a service in the dataplane (like https://keycloak.keycloak.svc)
    

Actual results:

    The oauth server fails to be reconciled

Expected results:

    The oauth server reconciles and functions properly

Additional info:

    Follow up to OCPBUGS-37753
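
For context, a minimal sketch of the kind of idp configuration involved, assuming a Keycloak instance reachable only from the data plane; names and the secret reference below are placeholders:

~~~
# Fragment of a HostedCluster spec; the client secret lives in a Secret next to
# the HostedCluster, and the issuer hostname resolves only inside the guest cluster.
spec:
  configuration:
    oauth:
      identityProviders:
      - name: keycloak
        mappingMethod: claim
        type: OpenID
        openID:
          issuer: https://keycloak.keycloak.svc
          clientID: openshift
          clientSecret:
            name: keycloak-client-secret
          claims:
            preferredUsername:
            - preferred_username
            email:
            - email
~~~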

This is a clone of issue OCPBUGS-28974. The following is the description of the original issue:

Description of problem:

Machine stuck in Provisioned when the cluster is upgraded from 4.1 to 4.15    

Version-Release number of selected component (if applicable):

Upgrade from 4.1 to 4.15
4.1.41-x86_64, 4.2.36-x86_64, 4.3.40-x86_64, 4.4.33-x86_64, 4.5.41-x86_64, 4.6.62-x86_64, 4.7.60-x86_64, 4.8.57-x86_64, 4.9.59-x86_64, 4.10.67-x86_64, 4.11 nightly, 4.12 nightly, 4.13 nightly, 4.14 nightly, 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest    

How reproducible:

Seemingly always; the issue was found in our Prow CI, and I also reproduced it.

Steps to Reproduce:

1. Create an AWS IPI 4.1 cluster, then upgrade it release by release to 4.14
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2024-01-19-110702   True        True          26m     Working towards 4.12.0-0.nightly-2024-02-04-062856: 654 of 830 done (78% complete), waiting on authentication, openshift-apiserver, openshift-controller-manager
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2024-02-04-062856   True        False         5m12s   Cluster version is 4.12.0-0.nightly-2024-02-04-062856
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2024-02-04-062856   True        True          61m     Working towards 4.13.0-0.nightly-2024-02-04-042638: 713 of 841 done (84% complete), waiting up to 40 minutes on machine-config
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2024-02-04-042638   True        False         10m     Cluster version is 4.13.0-0.nightly-2024-02-04-042638
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2024-02-04-042638   True        True          17m     Working towards 4.14.0-0.nightly-2024-02-02-173828: 233 of 860 done (27% complete), waiting on control-plane-machine-set, machine-api
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2024-02-02-173828   True        False         18m     Cluster version is 4.14.0-0.nightly-2024-02-02-173828     

2. When it has been upgraded to 4.14, verify that machines scale successfully
liuhuali@Lius-MacBook-Pro huali-test %  oc create -f ms1.yaml 
machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa created
liuhuali@Lius-MacBook-Pro huali-test % oc get machineset
NAME                                            DESIRED   CURRENT   READY   AVAILABLE   AGE
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a    1         1         1       1           14h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa   0         0                             3s
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f    2         2         2       2           14h
liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa --replicas=1
machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                                  PHASE     TYPE         REGION      ZONE         AGE
ci-op-trzci0vq-8a8c4-dq95h-master-0                   Running   m6a.xlarge   us-east-1   us-east-1f   15h
ci-op-trzci0vq-8a8c4-dq95h-master-1                   Running   m6a.xlarge   us-east-1   us-east-1a   15h
ci-op-trzci0vq-8a8c4-dq95h-master-2                   Running   m6a.xlarge   us-east-1   us-east-1f   15h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt    Running   m6a.xlarge   us-east-1   us-east-1a   15h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa-mt9kh   Running   m6a.xlarge   us-east-1   us-east-1a   15m
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k    Running   m6a.xlarge   us-east-1   us-east-1f   15h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb    Running   m6a.xlarge   us-east-1   us-east-1f   15h
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-128-51.ec2.internal    Ready    master   15h     v1.27.10+28ed2d7
ip-10-0-143-198.ec2.internal   Ready    worker   14h     v1.27.10+28ed2d7
ip-10-0-143-64.ec2.internal    Ready    worker   14h     v1.27.10+28ed2d7
ip-10-0-143-80.ec2.internal    Ready    master   15h     v1.27.10+28ed2d7
ip-10-0-144-123.ec2.internal   Ready    master   15h     v1.27.10+28ed2d7
ip-10-0-147-94.ec2.internal    Ready    worker   14h     v1.27.10+28ed2d7
ip-10-0-158-61.ec2.internal    Ready    worker   3m40s   v1.27.10+28ed2d7
liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa --replicas=0
machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get node                                                                   
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-128-51.ec2.internal    Ready    master   15h   v1.27.10+28ed2d7
ip-10-0-143-198.ec2.internal   Ready    worker   15h   v1.27.10+28ed2d7
ip-10-0-143-64.ec2.internal    Ready    worker   15h   v1.27.10+28ed2d7
ip-10-0-143-80.ec2.internal    Ready    master   15h   v1.27.10+28ed2d7
ip-10-0-144-123.ec2.internal   Ready    master   15h   v1.27.10+28ed2d7
ip-10-0-147-94.ec2.internal    Ready    worker   15h   v1.27.10+28ed2d7
liuhuali@Lius-MacBook-Pro huali-test % oc get machine                                                                
NAME                                                 PHASE     TYPE         REGION      ZONE         AGE
ci-op-trzci0vq-8a8c4-dq95h-master-0                  Running   m6a.xlarge   us-east-1   us-east-1f   15h
ci-op-trzci0vq-8a8c4-dq95h-master-1                  Running   m6a.xlarge   us-east-1   us-east-1a   15h
ci-op-trzci0vq-8a8c4-dq95h-master-2                  Running   m6a.xlarge   us-east-1   us-east-1f   15h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt   Running   m6a.xlarge   us-east-1   us-east-1a   15h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k   Running   m6a.xlarge   us-east-1   us-east-1f   15h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb   Running   m6a.xlarge   us-east-1   us-east-1f   15h
liuhuali@Lius-MacBook-Pro huali-test % oc delete machineset ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa 
machineset.machine.openshift.io "ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa" deleted
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2024-02-02-173828   True        False         43m     Cluster version is 4.14.0-0.nightly-2024-02-02-173828     

3.Upgrade to 4.15
Since the upgrade to the 4.15 nightly was stuck on operator-lifecycle-manager-packageserver, which is a known bug (https://issues.redhat.com/browse/OCPBUGS-28744), I built an image with the fix PR (job build openshift/operator-framework-olm#679 succeeded) and upgraded to that image; the upgrade succeeded.

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2024-02-02-173828   True        True          7s      Working towards 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest: 10 of 875 done (1% complete)
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         23m     Cluster version is 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest
liuhuali@Lius-MacBook-Pro huali-test % oc get co
NAME                                       VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      9h      
baremetal                                  4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      11h     
cloud-controller-manager                   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      8h      
cloud-credential                           4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h     
cluster-autoscaler                         4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h     
config-operator                            4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      13h     
console                                    4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      3h19m   
control-plane-machine-set                  4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      5h      
csi-snapshot-controller                    4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      7h10m   
dns                                        4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      9h      
etcd                                       4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      14h     
image-registry                             4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      33m     
ingress                                    4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      9h      
insights                                   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h     
kube-apiserver                             4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      14h     
kube-controller-manager                    4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      14h     
kube-scheduler                             4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      14h     
kube-storage-version-migrator              4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      34m     
machine-api                                4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h     
machine-approver                           4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      13h     
machine-config                             4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      10h     
marketplace                                4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      10h     
monitoring                                 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      9h      
network                                    4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h     
node-tuning                                4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      56m     
openshift-apiserver                        4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      9h      
openshift-controller-manager               4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      4h56m   
openshift-samples                          4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      58m     
operator-lifecycle-manager                 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h     
operator-lifecycle-manager-catalog         4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h     
operator-lifecycle-manager-packageserver   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      57m     
service-ca                                 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h     
storage                                    4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      9h      
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                                 PHASE     TYPE         REGION      ZONE         AGE
ci-op-trzci0vq-8a8c4-dq95h-master-0                  Running   m6a.xlarge   us-east-1   us-east-1f   16h
ci-op-trzci0vq-8a8c4-dq95h-master-1                  Running   m6a.xlarge   us-east-1   us-east-1a   16h
ci-op-trzci0vq-8a8c4-dq95h-master-2                  Running   m6a.xlarge   us-east-1   us-east-1f   16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt   Running   m6a.xlarge   us-east-1   us-east-1a   16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k   Running   m6a.xlarge   us-east-1   us-east-1f   16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb   Running   m6a.xlarge   us-east-1   us-east-1f   16h 

4. Scale a machineset again; the new machine is stuck in Provisioned and no CSR is pending

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml 
machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machineset
NAME                                            DESIRED   CURRENT   READY   AVAILABLE   AGE
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a    1         1         1       1           16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1   0         0                             6s
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f    2         2         2       2           16h
liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1 --replicas=1
machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1 scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                                  PHASE          TYPE         REGION      ZONE         AGE
ci-op-trzci0vq-8a8c4-dq95h-master-0                   Running        m6a.xlarge   us-east-1   us-east-1f   16h
ci-op-trzci0vq-8a8c4-dq95h-master-1                   Running        m6a.xlarge   us-east-1   us-east-1a   16h
ci-op-trzci0vq-8a8c4-dq95h-master-2                   Running        m6a.xlarge   us-east-1   us-east-1f   16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt    Running        m6a.xlarge   us-east-1   us-east-1a   16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1-5g877   Provisioning   m6a.xlarge   us-east-1   us-east-1a   4s
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k    Running        m6a.xlarge   us-east-1   us-east-1f   16h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb    Running        m6a.xlarge   us-east-1   us-east-1f   16h
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                                  PHASE         TYPE         REGION      ZONE         AGE
ci-op-trzci0vq-8a8c4-dq95h-master-0                   Running       m6a.xlarge   us-east-1   us-east-1f   18h
ci-op-trzci0vq-8a8c4-dq95h-master-1                   Running       m6a.xlarge   us-east-1   us-east-1a   18h
ci-op-trzci0vq-8a8c4-dq95h-master-2                   Running       m6a.xlarge   us-east-1   us-east-1f   18h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt    Running       m6a.xlarge   us-east-1   us-east-1a   18h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1-5g877   Provisioned   m6a.xlarge   us-east-1   us-east-1a   97m
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k    Running       m6a.xlarge   us-east-1   us-east-1f   18h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb    Running       m6a.xlarge   us-east-1   us-east-1f   18h
ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f1-4ln47   Provisioned   m6a.xlarge   us-east-1   us-east-1f   50m
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-128-51.ec2.internal    Ready    master   18h   v1.28.6+a373c1b
ip-10-0-143-198.ec2.internal   Ready    worker   18h   v1.28.6+a373c1b
ip-10-0-143-64.ec2.internal    Ready    worker   18h   v1.28.6+a373c1b
ip-10-0-143-80.ec2.internal    Ready    master   18h   v1.28.6+a373c1b
ip-10-0-144-123.ec2.internal   Ready    master   18h   v1.28.6+a373c1b
ip-10-0-147-94.ec2.internal    Ready    worker   18h   v1.28.6+a373c1b
liuhuali@Lius-MacBook-Pro huali-test % oc get csr
NAME        AGE   SIGNERNAME                                    REQUESTOR                                  REQUESTEDDURATION   CONDITION
csr-596n7   21m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-147-94.ec2.internal    <none>              Approved,Issued
csr-7nr9m   42m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-147-94.ec2.internal    <none>              Approved,Issued
csr-bc9n7   16m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-128-51.ec2.internal    <none>              Approved,Issued
csr-dmk27   18m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-128-51.ec2.internal    <none>              Approved,Issued
csr-ggkgd   64m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-143-198.ec2.internal   <none>              Approved,Issued
csr-rs9cz   70m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-143-80.ec2.internal    <none>              Approved,Issued
liuhuali@Lius-MacBook-Pro huali-test %     

Actual results:

 Machine stuck in Provisioned   

Expected results:

  Machine should reach the Running phase

Additional info:

Must gather: https://drive.google.com/file/d/1TrZ_mb-cHKmrNMsuFl9qTdYo_eNPuF_l/view?usp=sharing 
I can see the provisioned machine on AWS console: https://drive.google.com/file/d/1-OcsmvfzU4JBeGh5cil8P2Hoe5DQsmqF/view?usp=sharing
System log of ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1-5g877: https://drive.google.com/file/d/1spVT_o0S4eqeQxE5ivttbAazCCuSzj1e/view?usp=sharing 
Some log on the instance: https://drive.google.com/file/d/1zjxPxm61h4L6WVHYv-w7nRsSz5Fku26w/view?usp=sharing 
    

Description of problem:

The following test fails consistently in 4.14 PowerVS runs:

  1. {}[Jira:"NetworkEdge"] monitor test service-type-load-balancer-availability setup

JobLink

Issue 1 analysis:

Error Description :

{  failed during setup error waiting for replicaset: failed waiting for pods to be running: timeout waiting for 2 pods to be ready}

Some Observations:

While creating a TCP service service-test with type=LoadBalancer to start the SimultaneousPodIPController, it fails to get load balancer details from the cloud, which produces the error before data collection for the e2e test starts and leads to the failure of the test case "[Jira:"NetworkEdge"] monitor test service-type-load-balancer-availability setup".
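
For reference, a Service of the kind the monitor test creates would look roughly like the following; the name comes from the error above, while the selector and ports are placeholders:

~~~
apiVersion: v1
kind: Service
metadata:
  name: service-test
spec:
  type: LoadBalancer
  selector:
    app: service-test
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
~~~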

Please review the following PR: https://github.com/openshift/csi-operator/pull/87

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-api-provider-libvirt/pull/274

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

==== This Jira covers only baremetal-runtimecfg component with respect to node IP detection ====

Description of problem:

Pods running in the namespace openshift-vsphere-infra are very verbose, printing as INFO messages that should be DEBUG.

This excess of verbosity has an impact on CRI-O, on the node, and also on the Logging system.

For instance, with 71 nodes, the number of log entries coming from this namespace in 1 month was 450,000,000, meaning 1 TB of logs written to disk on the node by CRI-O, read by the Red Hat log collector, and stored in the Log Store.

In addition to the performance impact, it has a financial impact because of the storage needed.

Examples of log messages that are better suited to DEBUG than INFO:
```
/// For keepalived, 4 messages are printed per node every 10 seconds; with 71 nodes this means 284 log entries every 10 seconds (about 28 per second), i.e. 1704 log entries per minute per keepalived pod
$ oc logs keepalived-master.example-0 -c  keepalived-monitor |grep master.example-0|grep 2024-02-15T08:20:21 |wc -l

$ oc logs keepalived-master-example-0 -c  keepalived-monitor |grep worker-example-0|grep 2024-02-15T08:20:21 
2024-02-15T08:20:21.671390814Z time="2024-02-15T08:20:21Z" level=info msg="Searching for Node IP of worker-example-0. Using 'x.x.x.x/24' as machine network. Filtering out VIPs '[x.x.x.x x.x.x.x]'."
2024-02-15T08:20:21.671390814Z time="2024-02-15T08:20:21Z" level=info msg="For node worker-example-0 selected peer address x.x.x.x using NodeInternalIP"
2024-02-15T08:20:21.733399279Z time="2024-02-15T08:20:21Z" level=info msg="Searching for Node IP of worker-example-0. Using 'x.x.x.x' as machine network. Filtering out VIPs '[x.x.x.x x.x.x.x]'."
2024-02-15T08:20:21.733421398Z time="2024-02-15T08:20:21Z" level=info msg="For node worker-example-0 selected peer address x.x.x.x using NodeInternalIP"

/// For haproxy, 2 log lines are observed every 6 seconds for each master, which means 6 messages in the same second and 60 messages/minute per pod
$ oc logs haproxy-master-0-example -c haproxy-monitor
...
2024-02-15T08:20:00.517159455Z time="2024-02-15T08:20:00Z" level=info msg="Searching for Node IP of master-example-0. Using 'x.x.x.x/24' as machine network. Filtering out VIPs '[x.x.x.x]'."
2024-02-15T08:20:00.517159455Z time="2024-02-15T08:20:00Z" level=info msg="For node master-example-0 selected peer address x.x.x.x using NodeInternalIP"
```

Version-Release number of selected component (if applicable):

OpenShift 4.14
VSphere IPI installation

How reproducible:

Always

Steps to Reproduce:

    1. Install OpenShift 4.14 Vsphere IPI environment
    2. Review the logs of the haproxy pods and keepalived pods running in the namespace `openshift-vsphere-infra`
    

Actual results:

The haproxy-* and keepalived-* pods are very verbose, printing as INFO messages that should be DEBUG.

Some of the messages are available in the Description of the problem in the present bug.

Expected results:

Only relevant messages are printed as INFO, reducing the verbosity of the pods running in the namespace `openshift-vsphere-infra`.

Additional info:

    

 [sig-cluster-lifecycle][Feature:Machines] Managed cluster should [sig-scheduling][Early] control plane machine set operator should not have any events [Suite:openshift/conformance/parallel]

Looks like this test is permafailing on 4.16 and 4.15 AWS UPI jobs - does this need to be skipped on UPI?

 

{  fail [github.com/openshift/origin/test/extended/machines/machines.go:191]: Unexpected error:
    <*errors.StatusError | 0xc0031b8f00>: 
    controlplanemachinesets.machine.openshift.io "cluster" not found
    {
        ErrStatus: 
            code: 404
            details:
              group: machine.openshift.io
              kind: controlplanemachinesets
              name: cluster
            message: controlplanemachinesets.machine.openshift.io "cluster" not found
            metadata: {}
            reason: NotFound
            status: Failure,
    }
occurred
Ginkgo exit error 1: exit with code 1} 

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-upi/1758372765364129792

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-upi/1758308563689672704

This is a clone of issue OCPBUGS-34953. The following is the description of the original issue:

Description of problem: When the bootstrap times out, the installer tries to download the logs from the bootstrap VM and gives an analysis of what happened. On OpenStack platform, we're currently failing to download the bootstrap logs (tracked in OCPBUGS-34950), which causes the analysis to always return an erroneous message:

time="2024-06-05T08:34:45-04:00" level=error msg="Bootstrap failed to complete: timed out waiting for the condition"
time="2024-06-05T08:34:45-04:00" level=error msg="Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane."
time="2024-06-05T08:34:45-04:00" level=error msg="The bootstrap machine did not execute the release-image.service systemd unit"

The claim that the bootstrap machine did not execute the release-image.service systemd unit is wrong, as I can confirm by SSHing to the bootstrap node:

systemctl status release-image.service
● release-image.service - Download the OpenShift Release Image
     Loaded: loaded (/etc/systemd/system/release-image.service; static)
     Active: active (exited) since Wed 2024-06-05 11:57:33 UTC; 1h 16min ago
    Process: 2159 ExecStart=/usr/local/bin/release-image-download.sh (code=exited, status=0/SUCCESS)
   Main PID: 2159 (code=exited, status=0/SUCCESS)
        CPU: 47.364s

Jun 05 11:57:05 mandre-tnvc8bootstrap systemd[1]: Starting Download the OpenShift Release Image...
Jun 05 11:57:06 mandre-tnvc8bootstrap podman[2184]: 2024-06-05 11:57:06.895418265 +0000 UTC m=+0.811028632 system refresh
Jun 05 11:57:06 mandre-tnvc8bootstrap release-image-download.sh[2159]: Pulling quay.io/openshift-release-dev/ocp-release@sha256:31cdf34b1957996d5c79c48466abab2fcfb9d9843>
Jun 05 11:57:32 mandre-tnvc8bootstrap release-image-download.sh[2269]: 079f5c86b015ddaf9c41349ba292d7a5487be91dd48e48852d10e64dd0ec125d
Jun 05 11:57:32 mandre-tnvc8bootstrap podman[2269]: 2024-06-05 11:57:32.82473216 +0000 UTC m=+25.848290388 image pull 079f5c86b015ddaf9c41349ba292d7a5487be91dd48e48852d1>
Jun 05 11:57:33 mandre-tnvc8bootstrap systemd[1]: Finished Download the OpenShift Release Image.

The installer was just unable to retrieve the bootstrap logs. Earlier, buried in the installer logs, we can see:

time="2024-06-05T08:34:42-04:00" level=info msg="Failed to gather bootstrap logs: failed to connect to the bootstrap machine: dial tcp 10.196.2.10:22: connect: connection
 timed out"

This is what should be reported by the analyzer.

This is a clone of issue OCPBUGS-37667. The following is the description of the original issue:

Description of problem:

After successfully mirroring the ibm-ftm-operator to the internal registry via the latest oc-mirror command and applying the newly generated IBM CatalogSource YAML file, the created catalog pod in the openshift-marketplace namespace enters CrashLoopBackOff.

The customer is trying to mirror operators and list the catalog; the command itself has no issues, but the catalog pod is crashing with the following error:
~~~
time="2024-07-10T13:43:07Z" level=info msg="starting pprof endpoint" address="localhost:6060"
time="2024-07-10T13:43:08Z" level=fatal msg="cache requires rebuild: cache reports digest as \"e891bfd5a4cb5702\", but computed digest is \"1922475dc0ee190c\""
~~~

    

Version-Release number of selected component (if applicable):

oc-mirror 4.16
OCP 4.14.z

    

How reproducible:



    

Steps to Reproduce:

    1. Create catalog image with the following imagesetconfiguration:
~~~
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
archiveSize: 4
storageConfig:
  registry:
    imageURL: <internal-registry>:Port/oc-mirror-metadata/12july24
    skipTLS: false
mirror:
  platform:
    architectures:
      - "amd64"
    channels:
    - name: stable-4.14
      minVersion: 4.14.11
      maxVersion: 4.14.30
      type: ocp
      shortestPath: true
    graph: true
  operators:
  - catalog: icr.io/cpopen/ibm-operator-catalog:v1.22
    packages:
    - name: ibm-ftm-operator
      channels:
      - name: v4.4
~~~
    2.  Run the following command:
~~~
./oc-mirror --config=./imageset-config.yaml docker://Internal-registry:Port --rebuild-catalogs
~~~
    3. Create catalogsourcepod under openshift-marketplace namespace:
~~~
 cat oc-mirror-workspace/results-1721222945/catalogSource-cs-ibm-operator-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: cs-ibm-operator-catalog
  namespace: openshift-marketplace
spec:
  image: Internal-registry:Port/cpopen/ibm-operator-catalog:v1.22
  sourceType: grpc
~~~
    

Actual results:


catalog pod is crashing with the following error:
~~~
time="2024-07-10T13:43:07Z" level=info msg="starting pprof endpoint" address="localhost:6060"
time="2024-07-10T13:43:08Z" level=fatal msg="cache requires rebuild: cache reports digest as \"e891bfd5a4cb5702\", but computed digest is \"1922475dc0ee190c\""
~~~

    

Expected results:

The pod should run without any issue. 

    

Additional info:

1. The issue is reproducible with the OCP 4.14.14 and OCP 4.14.29
2. Customer is already using oc-mirror 4.16:
~~~
./oc-mirror version
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202407030803.p0.g394b1f8.assembly.stream.el9-394b1f8", GitCommit:"394b1f814f794f4f01f473212c9a7695726020bf", GitTreeState:"clean", BuildDate:"2024-07-03T10:18:49Z", GoVersion:"go1.21.11 (Red Hat 1.21.11-1.module+el8.10.0+21986+2112108a) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}
~~~
3. The customer tried the workaround described in KB [1]: https://access.redhat.com/solutions/7006771 but it did not help.
4. The customer also tried setting OPM_BINARY, but it didn't work. They downloaded OPM for the respective arch from https://github.com/operator-framework/operator-registry/releases, renamed the downloaded binary to opm, and set the variable below before executing oc-mirror:
OPM_BINARY=/path/to/opm

    

Description of the problem:

BE master ~2.30 - in feature support api - VIP_AUTO_ALLOC is dev_preview for 4.15 - should be unavailable

How reproducible:

100%

Steps to reproduce:

1. GET https://<SERVICE_ADDRESS>/api/assisted-install/v2/support-levels/features?openshift_version=4.15&cpu_architecture=x86_64

2. BE responds with the support levels

3.

Actual results:

VIP_AUTO_ALLOC is dev_preview for 4.15

Expected results:
VIP_AUTO_ALLOC should be unavailable

Description of problem:

IHAC who is facing an issue while deploying a Nutanix IPI cluster 4.16.x with DHCP.

ENV DETAILS: Nutanix versions: AOS: 6.5.4, NCC: 4.6.6.3, PC: pc.2023.4.0.2, LCM: 3.0.0.1

During the installation process, after the bootstrap nodes and control planes are created, the IP addresses on the nodes shown in the Nutanix dashboard conflict, even when infinite DHCP leases are set. The installation works successfully only when using the Nutanix IPAM. The 4.14 and 4.15 releases also install successfully. The IPs of master0 and master2 are conflicting; please check the attachment.

Sos-reports of master0 and master1: https://drive.google.com/drive/folders/140ATq1zbRfqd1Vbew-L_7N4-C5ijMao3?usp=sharing

The issue was reported via the Slack thread: https://redhat-internal.slack.com/archives/C02A3BM5DGS/p1721837567181699

Version-Release number of selected component (if applicable):

    

How reproducible:

Use the OCP 4.16.z installer to create an OCP cluster with Nutanix using DHCP network. The installation will fail. Always reproducible.    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    The installation will fail. 

Expected results:

    The installation succeeds to create a Nutanix OCP cluster with the DHCP network.

Additional info:

    

In order to use hostPath volumes, containers in kubernetes must be started with the privileged flag set. This is because this flag toggles an SELinux boolean that cannot be toggled by enabling any particular capability. (Empirical testing shows the same restriction does not apply to emptyDir volumes.)

Since the baremetal components rely on hostPath volumes for a number of purposes, this prevents many of them from running unprivileged.

However, there are a number of containers that do not use any hostPath volumes and need only an added capability, if anything. These should be specified explicitly instead of just setting privileged mode to enable everything.
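
As an illustration, a container that needs only a specific capability and uses no hostPath volume can be granted that capability explicitly rather than running privileged; the capability, image, and volume names below are placeholders:

~~~
containers:
- name: example
  image: registry.example.com/example:latest
  securityContext:
    privileged: false
    capabilities:
      add:
      - NET_ADMIN
  volumeMounts:
  - name: scratch
    mountPath: /var/tmp/work   # emptyDir, so no privileged mode is required
volumes:
- name: scratch
  emptyDir: {}
~~~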

This is a clone of issue OCPBUGS-33060. The following is the description of the original issue:

Description of problem:

HCP has audit log configuration for Kube API server, OpenShift API server, OAuth API server (like OCP), but does not have audit for oauth-openshift (OAuth server). Discussed with Standa in https://redhat-internal.slack.com/archives/CS05TR7BK/p1714124297376299 , oauth-openshift needs audit too in HCP.

Version-Release number of selected component (if applicable):

4.11 ~ 4.16

How reproducible:

Always

Steps to Reproduce:

1. Launch HCP env.
2. Check audit log configuration:
$ oc get deployment -n clusters-hypershift-ci-279389 kube-apiserver openshift-apiserver openshift-oauth-apiserver oauth-openshift -o yaml | grep -e '^    name:' -e 'audit\.log'

Actual results:

2. It outputs oauth-openshift (OAuth server) has no audit:
    name: kube-apiserver
          - /var/log/kube-apiserver/audit.log
    name: openshift-apiserver
          - /var/log/openshift-apiserver/audit.log
    name: openshift-oauth-apiserver
          - --audit-log-path=/var/log/openshift-oauth-apiserver/audit.log
          - /var/log/openshift-oauth-apiserver/audit.log
    name: oauth-openshift

Expected results:

2. oauth-openshift (OAuth server) needs to have audit too.

Additional info:

OCP has audit for OAuth server since 4.11 AUTH-6 https://docs.openshift.com/container-platform/4.11/security/audit-log-view.html saying "You can view the logs for the OpenShift API server, Kubernetes API server, OpenShift OAuth API server, and OpenShift OAuth server".

Description of problem:

 

Rule ocp4-cis-file-permissions-cni-conf returned false negative result
From the CIS benchmark v1.4.0, the following command is used to check the multus config on nodes:

 

$ for i in $(oc get pods -n openshift-multus -l app=multus -oname); do oc exec -n openshift-multus $i -- /bin/bash -c "stat -c \"%a %n\" /host/etc/cni/net.d/*.conf"; done
600 /host/etc/cni/net.d/00-multus.conf
600 /host/etc/cni/net.d/00-multus.conf
600 /host/etc/cni/net.d/00-multus.conf
600 /host/etc/cni/net.d/00-multus.conf
600 /host/etc/cni/net.d/00-multus.conf
600 /host/etc/cni/net.d/00-multus.conf

Per the rule instructions, it is checking  /etc/cni/net.d/ on the node.
However, the multus config on nodes is in path  /etc/kubernetes/cni/net.d/, not /etc/cni/net.d/:

 

$ oc debug node/hongli-az-8pzqq-master-0 -- chroot /host ls -ltr /etc/cni/net.d/
Starting pod/hongli-az-8pzqq-master-0-debug ...
To use host binaries, run `chroot /host`
total 8
-rw-r--r--. 1 root root 129 Nov  7 02:18 200-loopback.conflist
-rw-r--r--. 1 root root 469 Nov  7 02:18 100-crio-bridge.conflist
Removing debug pod ...
$ oc debug node/hongli-az-8pzqq-master-0 -- chroot /host ls -ltr /etc/kubernetes/cni/net.d/
Starting pod/hongli-az-8pzqq-master-0-debug ...
To use host binaries, run `chroot /host`
total 4
drwxr-xr-x. 2 root root  60 Nov  7 02:23 whereabouts.d
-rw-------. 1 root root 352 Nov  7 02:23 00-multus.conf
Removing debug pod ...

 

 

$  for node in `oc get node --no-headers|awk '{print $1}'`; do oc debug node/$node -- chroot /host ls -l /etc/kubernetes/cni/net.d/; done
Starting pod/hongli-az-8pzqq-master-0-debug ...
To use host binaries, run `chroot /host`
total 4
-rw-------. 1 root root 352 Nov  7 02:23 00-multus.conf
drwxr-xr-x. 2 root root  60 Nov  7 02:23 whereabouts.d
Removing debug pod ...
Starting pod/hongli-az-8pzqq-master-1-debug ...
To use host binaries, run `chroot /host`
total 4
-rw-------. 1 root root 352 Nov  7 02:23 00-multus.conf
drwxr-xr-x. 2 root root  60 Nov  7 02:23 whereabouts.d
Removing debug pod ...
Starting pod/hongli-az-8pzqq-master-2-debug ...
To use host binaries, run `chroot /host`
total 4
-rw-------. 1 root root 352 Nov  7 02:23 00-multus.conf
drwxr-xr-x. 2 root root  60 Nov  7 02:23 whereabouts.d
Removing debug pod ...
Starting pod/hongli-az-8pzqq-worker-westus-2mx6t-debug ...
To use host binaries, run `chroot /host`
total 4
-rw-------. 1 root root 352 Nov  7 02:38 00-multus.conf
drwxr-xr-x. 2 root root  60 Nov  7 02:38 whereabouts.d
Removing debug pod ...
Starting pod/hongli-az-8pzqq-worker-westus-9qhf5-debug ...
To use host binaries, run `chroot /host`
total 4
-rw-------. 1 root root 352 Nov  7 02:38 00-multus.conf
drwxr-xr-x. 2 root root  60 Nov  7 02:38 whereabouts.d
Removing debug pod ...
Starting pod/hongli-az-8pzqq-worker-westus-bcdpd-debug ...
To use host binaries, run `chroot /host`
total 4
-rw-------. 1 root root 352 Nov  7 02:38 00-multus.conf
drwxr-xr-x. 2 root root  60 Nov  7 02:38 whereabouts.d
Removing debug pod ...

 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-11-05-194730

How reproducible:

Always

Steps to Reproduce:

1. $ for i in $(oc get pods -n openshift-multus -l app=multus -oname); do oc exec -n openshift-multus $i -- /bin/bash -c "stat -c \"%a %n\" /host/etc/cni/net.d/*.conf"; done
$for node in `oc get node --no-headers|awk '{print $1}'`; do oc debug node/$node -- chroot /host ls -l /etc/kubernetes/cni/net.d/; done

Actual results:

The rule checks the wrong path and returns FAIL

Expected results:

The rule should check the right path and return PASS

Additional info:

This is applicable to both SDN and OVN.
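
For completeness, the same permission check pointed at the path where the multus config actually lives on the nodes, sketched from the debug output above:

~~~
for node in $(oc get node --no-headers | awk '{print $1}'); do
  oc debug node/$node -- chroot /host sh -c 'stat -c "%a %n" /etc/kubernetes/cni/net.d/*.conf'
done
~~~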

Please review the following PR: https://github.com/openshift/prometheus-alertmanager/pull/85

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/csi-driver-shared-resource/pull/159

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The installer doesn’t precheck whether the node architecture and VM type are consistent for AWS and GCP; the check works on Azure.

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-multi-2023-12-06-195439 

How reproducible:

   Always 

Steps to Reproduce:

    1. Set the compute architecture field to arm64 but choose an amd64 instance type in install-config (an illustrative fragment is shown under Additional info below)
    2. Create the cluster
    3. Check the installation

Actual results:

Azure prechecks whether the architecture is consistent with the instance type when creating manifests, for example:
12-07 11:18:24.452 [INFO] Generating manifests files.....12-07 11:18:24.452 level=info msg=Credentials loaded from file "/home/jenkins/ws/workspace/ocp-common/Flexy-install/flexy/workdir/azurecreds20231207-285-jd7gpj"
12-07 11:18:56.474 level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: controlPlane.platform.azure.type: Invalid value: "Standard_D4ps_v5": instance type architecture 'Arm64' does not match install config architecture amd64

But AWS and GCP have no such precheck; the install fails later during installation, after many resources have already been created. This case is more likely to happen with multi-arch clusters.

Expected results:

The installer should precheck that the architecture and the VM/instance type are consistent, especially on platforms with multi-arch (heterogeneous) support (AWS, GCP, Azure).

Additional info:

    

Description of problem:

At 17:26:09, the cluster is happily upgrading nodes:

An update is in progress for 57m58s: Working towards 4.14.1: 734 of 859 done (85% complete), waiting on machine-config

At 17:26:54, the upgrade starts to reboot master nodes and COs get noisy (this one specifically is OCPBUGS-20061)

An update is in progress for 58m50s: Unable to apply 4.14.1: the cluster operator control-plane-machine-set is not available

~Two minutes later, at 17:29:07, CVO starts to shout about waiting on operators for over 40 minutes, despite not indicating anything was wrong earlier:

An update is in progress for 1h1m2s: Unable to apply 4.14.1: wait has exceeded 40 minutes for these operators: etcd, kube-apiserver

This is only because these operators briefly go degraded during the master reboot (which they shouldn't, but that is a different story). CVO computes its 40 minutes against the time when it first started to upgrade the given operator, so it:

1. Upgrades etcd / KAS very early in the upgrade, noting the time when it started to do that
2. These two COs upgrade successfully and the upgrade proceeds
3. Eventually the cluster starts rebooting masters and etcd/KAS go degraded
4. CVO compares the current time against the noted time, discovers it is more than 40 minutes, and starts warning about it.

Version-Release number of selected component (if applicable):

all

How reproducible:

Not entirely deterministic:

1. the upgrade must go for 40m+ between upgrading etcd and upgrading nodes
2. the upgrade must reboot a master that is not running CVO (otherwise there will be a new CVO instance without the saved times, they are only saved in memory)

Steps to Reproduce:

1. Watch oc adm upgrade during the upgrade

Actual results:

Spurious "waiting for over 40m" message pops out of the blue

Expected results:

CVO simply says "waiting up to 40m on" and this eventually goes away as the node comes back up and etcd leaves the degraded state.

Storage operators typically run only one replica, and they can flip-flop between Progressing:True and Progressing:False, as well as Available:True and Available:False, during upgrade.

Usually this settles down, but this is causing our CI to report poor signal on vSphere platform - https://prow.ci.openshift.org/job-history/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-vsphere-ovn-upgrade

Please review the following PR: https://github.com/openshift/cloud-provider-alibaba-cloud/pull/40

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/azure-file-csi-driver/pull/59

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-34274. The following is the description of the original issue:

Description of problem:

AWS VPCs support a primary CIDR range and multiple secondary CIDR ranges: https://aws.amazon.com/about-aws/whats-new/2017/08/amazon-virtual-private-cloud-vpc-now-allows-customers-to-expand-their-existing-vpcs/ 

Let's pretend a VPC exists with:

  • Primary CIDR range: 10.0.0.0/24 (subnet-a)
  • Secondary CIDR range: 10.1.0.0/24 (subnet-b)

and a hostedcontrolplane object like:

  networking:
...
    machineNetwork:
    - cidr: 10.1.0.0/24
...
  olmCatalogPlacement: management
  platform:
    aws:
      cloudProviderConfig:
        subnet:
          id: subnet-b
        vpc: vpc-069a93c6654464f03

Even though all EC2 instances will be spun up in subnet-b (10.1.0.0/24), CPO will detect the CIDR range of the VPC as 10.0.0.0/24 (https://github.com/openshift/hypershift/blob/0d10c822912ed1af924e58ccb8577d2bb1fd68be/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L4755-L4765) and create security group rules only allowing inbound traffic from 10.0.0.0/24. This specifically prevents these EC2 instances from communicating with the VPC Endpoint created by the awsendpointservice CR and reaching the hosted control plane pods.

Version-Release number of selected component (if applicable):

    Reproduced on a 4.14.20 ROSA HCP cluster, but the version should not matter

How reproducible:

100%    

Steps to Reproduce:

    1. Create a VPC with at least one secondary CIDR block
    2. Install a ROSA HCP cluster providing the secondary CIDR block as the machine CIDR range and selecting the appropriate subnets within the secondary CIDR range   

Actual results:

* Observe that the default security group contains inbound security group rules allowing traffic from the VPC's primary CIDR block (not a CIDR range containing the cluster's worker nodes)

* As a result, the EC2 instances (worker nodes) fail to reach the ignition-server

Expected results:

The EC2 instances are able to reach the ignition-server and HCP pods

Additional info:

This bug seems like it could be fixed by using the machine CIDR range for the security group instead of the VPC CIDR range. Alternatively, we could duplicate rules for every secondary CIDR block, but the default AWS quota is 60 inbound security group rules/security group, so it's another failure condition to keep in mind if we go that route.

 

aws ec2 describe-vpcs output for a VPC with secondary CIDR blocks:    

❯ aws ec2 describe-vpcs --region us-east-2 --vpc-id vpc-069a93c6654464f03
{
    "Vpcs": [
        {
            "CidrBlock": "10.0.0.0/24",
            "DhcpOptionsId": "dopt-0d1f92b25d3efea4f",
            "State": "available",
            "VpcId": "vpc-069a93c6654464f03",
            "OwnerId": "429297027867",
            "InstanceTenancy": "default",
            "CidrBlockAssociationSet": [
                {
                    "AssociationId": "vpc-cidr-assoc-0abbc75ac8154b645",
                    "CidrBlock": "10.0.0.0/24",
                    "CidrBlockState": {
                        "State": "associated"
                    }
                },
                {
                    "AssociationId": "vpc-cidr-assoc-098fbccc85aa24acf",
                    "CidrBlock": "10.1.0.0/24",
                    "CidrBlockState": {
                        "State": "associated"
                    }
                }
            ],
            "IsDefault": false,
            "Tags": [
                {
                    "Key": "Name",
                    "Value": "test"
                }
            ]
        }
    ]
}
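
A quick way to list every CIDR block associated with the VPC, not just the top-level CidrBlock field, is to query the CidrBlockAssociationSet directly; any fix or diagnosis needs to take all of these ranges into account. For the VPC above this returns both 10.0.0.0/24 and 10.1.0.0/24:

❯ aws ec2 describe-vpcs --region us-east-2 --vpc-ids vpc-069a93c6654464f03 \
    --query 'Vpcs[0].CidrBlockAssociationSet[].CidrBlock' --output text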

Description of problem:

When trying to delete a ClusterResourceQuota resource using the foreground cascading deletion strategy, it gets stuck in the deleting state and the removal never completes.

When the background cascading deletion strategy is used instead, the resource is removed immediately.

Now, given that OpenShift GitOps uses the foreground cascading deletion strategy by default, this exposes some challenges when managing ClusterResourceQuota resources with OpenShift GitOps.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-23-223425 but also previous version of OpenShift Container Platform 4 are affected

How reproducible:

Always

Steps to Reproduce:

1. Install OpenShift Container Platform 4
2. Create the ClusterResourceQuota as shown below

$ bat -p /tmp/crq.yaml
apiVersion: quota.openshift.io/v1
kind: ClusterResourceQuota
metadata:
  creationTimestamp: null
  name: blue
spec:
  quota:
    hard:
      pods: "10"
      secrets: "20"
  selector:
    annotations: null
    labels:
      matchLabels:
        color: nocolor

3. Delete the ClusterResourceQuota using "oc delete --cascade=foreground clusterresourcequota blue"

Actual results:

$ oc delete --cascade=foreground clusterresourcequota blue
clusterresourcequota.quota.openshift.io "blue" deleted

It is stuck and won't finish; the resource looks as shown below.

$ oc get clusterresourcequota blue -o yaml
apiVersion: quota.openshift.io/v1
kind: ClusterResourceQuota
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"quota.openshift.io/v1","kind":"ClusterResourceQuota","metadata":{"annotations":{},"creationTimestamp":null,"name":"blue"},"spec":{"quota":{"hard":{"pods":"10","secrets":"20"}},"selector":{"annotations":null,"labels":{"matchLabels":{"color":"nocolor"}}}}}
  creationTimestamp: "2023-10-24T07:37:48Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2023-10-24T07:59:47Z"
  finalizers:
  - foregroundDeletion
  generation: 2
  name: blue
  resourceVersion: "60554"
  uid: c18dd92c-afeb-47f4-a944-8b55be4037d7
spec:
  quota:
    hard:
      pods: "10"
      secrets: "20"
  selector:
    annotations: null
    labels:
      matchLabels:
        color: nocolor

Expected results:

The ClusterResourceQuota should be deleted using the foreground cascading deletion strategy without getting stuck, as there does not appear to be any remaining object with an ownerReference that would block removal.

Additional info:
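
A possible manual workaround for an object already stuck in this state (this only unsticks the object, it does not address the underlying bug) is to drop the foregroundDeletion finalizer:

$ oc patch clusterresourcequota blue --type=json \
    -p '[{"op":"remove","path":"/metadata/finalizers"}]'

This removes the finalizers list entirely; in the YAML above foregroundDeletion is the only finalizer present, so the object is then garbage collected.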


Description of problem:

  Snyk is failing on some deps  

Version-Release number of selected component (if applicable):

  At least master/4.17 and 4.16

How reproducible:

    100% 

Steps to Reproduce:

Open a PR against the master or release-4.16 branch; Snyk will fail. Recent history shows that the test is just being overridden. We should stop overriding the test and either fix the deps or justify excluding them from Snyk.
    

Actual results:

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cloud-credential-operator/679/pull-ci-openshift-cloud-credential-operator-master-security/1793098328855023616

 

This is a clone of issue OCPBUGS-34713. The following is the description of the original issue:

Description of problem:

    [AWS] securityGroups and subnet are not consistent between the machine YAML and the AWS console.
    The security group huliu-aws531d-vlzbw-master-sg is not attached to the masters on the AWS console, but it appears in the master machines' YAML.
    The security group huliu-aws531d-vlzbw-worker-sg is not attached to the workers on the AWS console, but it appears in the worker machines' YAML.
    No subnet named huliu-aws531d-vlzbw-private-us-east-2a shows on the AWS console for the masters and workers, but it appears in the master and worker machines' YAML.

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-05-30-130713
This happens in the latest 4.16(CAPI) AWS cluster

How reproducible:

    Always

Steps to Reproduce:

    1. Install a AWS 4.16 cluster
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-05-30-130713   True        False         46m     Cluster version is 4.16.0-0.nightly-2024-05-30-130713
liuhuali@Lius-MacBook-Pro huali-test % oc  get machine
NAME                                          PHASE     TYPE         REGION      ZONE         AGE
huliu-aws531d-vlzbw-master-0                  Running   m6i.xlarge   us-east-2   us-east-2a   65m
huliu-aws531d-vlzbw-master-1                  Running   m6i.xlarge   us-east-2   us-east-2b   65m
huliu-aws531d-vlzbw-master-2                  Running   m6i.xlarge   us-east-2   us-east-2c   65m
huliu-aws531d-vlzbw-worker-us-east-2a-swwmk   Running   m6i.xlarge   us-east-2   us-east-2a   62m
huliu-aws531d-vlzbw-worker-us-east-2b-f2gw9   Running   m6i.xlarge   us-east-2   us-east-2b   62m
huliu-aws531d-vlzbw-worker-us-east-2c-x6gbz   Running   m6i.xlarge   us-east-2   us-east-2c   62m

    2. Check the machines' YAML: there are 4 securityGroups and 2 subnet values for master machines, and 3 securityGroups and 2 subnet values for worker machines.
But on the AWS console there are only 3 security groups and 1 subnet for masters, and 2 security groups and 1 subnet for workers (see the CLI check below the machine YAML excerpts).

liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-aws531d-vlzbw-master-0  -oyaml
…
      securityGroups:
      - filters:
        - name: tag:Name
          values:
          - huliu-aws531d-vlzbw-master-sg
      - filters:
        - name: tag:Name
          values:
          - huliu-aws531d-vlzbw-node
      - filters:
        - name: tag:Name
          values:
          - huliu-aws531d-vlzbw-lb
      - filters:
        - name: tag:Name
          values:
          - huliu-aws531d-vlzbw-controlplane
      subnet:
        filters:
        - name: tag:Name
          values:
          - huliu-aws531d-vlzbw-private-us-east-2a
          - huliu-aws531d-vlzbw-subnet-private-us-east-2a
…
https://drive.google.com/file/d/1YyPQjSCXOm-1gbD3cwktDQQJter6Lnk4/view?usp=sharing
https://drive.google.com/file/d/1MhRIm8qIZWXdL9-cDZiyu0TOTFLKCAB6/view?usp=sharing
https://drive.google.com/file/d/1Qo32mgBerWp5z6BAVNqBxbuH5_4sRuBv/view?usp=sharing
https://drive.google.com/file/d/1seqwluMsPEFmwFL6pTROHYyJ_qPc0cCd/view?usp=sharing


liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-aws531d-vlzbw-worker-us-east-2a-swwmk  -oyaml
…
      securityGroups:
      - filters:
        - name: tag:Name
          values:
          - huliu-aws531d-vlzbw-worker-sg
      - filters:
        - name: tag:Name
          values:
          - huliu-aws531d-vlzbw-node
      - filters:
        - name: tag:Name
          values:
          - huliu-aws531d-vlzbw-lb
      subnet:
        filters:
        - name: tag:Name
          values:
          - huliu-aws531d-vlzbw-private-us-east-2a
          - huliu-aws531d-vlzbw-subnet-private-us-east-2a
…


https://drive.google.com/file/d/1FM7dxfSK0CGnm81dQbpWuVz1ciw9hgpq/view?usp=sharing
https://drive.google.com/file/d/1QClWivHeGGhxK7FdBUJnGu-vHylqeg5I/view?usp=sharing
https://drive.google.com/file/d/12jgyFfyP8fTzQu5wRoEa6RrXbYt_Gxm1/view?usp=sharing 
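
For comparison with what AWS actually attached, the security groups on one of the instances can be listed directly from the CLI (a rough check, assuming the instances are tagged with their machine names; the subnet can be queried the same way via SubnetId):

$ aws ec2 describe-instances --region us-east-2 \
    --filters "Name=tag:Name,Values=huliu-aws531d-vlzbw-master-0" \
    --query 'Reservations[].Instances[].SecurityGroups[].GroupName' --output text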
    

Actual results:

    securityGroups and subnet are not consistent between the machine YAML and the AWS console.

Expected results:

    securityGroups and subnet should be consistent between the machine YAML and the AWS console.

Additional info:

    

Description of problem:

The kube-apiserver operator is trying to delete a PrometheusRule that does not exist, leading to a huge amount of unwanted audit log entries.

With the change introduced as part of BUG-2004585, the kube-apiserver SLO rules were split into 2 groups, kube-apiserver-slos-basic and kube-apiserver-slos-extended, yet kube-apiserver-operator keeps trying to delete /apis/monitoring.coreos.com/v1/namespaces/openshift-kube-apiserver/prometheusrules/kube-apiserver-slos, which no longer exists in the cluster.
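
A quick way to confirm which PrometheusRule objects actually exist in that namespace (and that the old kube-apiserver-slos rule is gone) is:

$ oc get prometheusrules -n openshift-kube-apiserver
$ oc get prometheusrule kube-apiserver-slos -n openshift-kube-apiserver

On affected clusters the second command is expected to fail with a NotFound error while the operator still issues DELETE requests for that name.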

Version-Release number of selected component (if applicable):

4.12
4.13
4.14

How reproducible:

    Its easy to reproduce

Steps to Reproduce:

    1. install a cluster with 4.12
    2. enable cluster logging 
    3. forward the audit log to internal or external logstore using below config

apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  pipelines: 
  - name: all-to-default
    inputRefs:
    - infrastructure
    - application
    - audit
    outputRefs:
    - default     

    4. Check the audit logs in Kibana; it will show entries like the image below

Actual results:

    kube-apiserver-operator is trying to delete a PrometheusRule that does not exist in the cluster

Expected results:

If the rule does not exist in the cluster, the operator should not attempt to delete it.

Additional info:

    

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/aggregated-azure-sdn-upgrade-4.15-minor-release-openshift-release-analysis-aggregator/1757905312053989376

Aggregator claims these tests only ran 4 times out of what looks like 10 jobs that ran to normal completion:

[sig-network-edge] Application behind service load balancer with PDB remains available using new connections
[sig-network-edge] Application behind service load balancer with PDB remains available using reused connections

However looking at one of the jobs not in the list of passes, we can see these tests ran:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade/1757905303602466816

Why is the aggregator missing this result somehow?

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

Original issue reported here: https://issues.redhat.com/browse/ACM-6189 reported by QE and customer.

Using ACM/Hive, customers can deploy OpenShift on vSphere. In the upcoming release of ACM 2.9, we support customers on OCP 4.12 - 4.15. The ACM UI updates the install config as users add configuration details.

This has worked for several releases over the last few years. However in OCP 4.13+ the format has changed and there is now additional validation to check if the datastore is a full path.

As per https://issues.redhat.com/browse/SPLAT-1093, removal of the legacy fields should not happen until later, so any legacy configurations such as relative paths should still work.

 

Version-Release number of selected component (if applicable):

ACM 2.9.0-DOWNSTREAM-2023-10-24-01-06-09
OpenShift 4.14.0-rc.7
OpenShift 4.13.18
OpenShift 4.12.39

How reproducible:

Always

Steps to Reproduce:

1. Deploy OCP 4.12 on vSphere using the legacy field and a relative path without folder (e.g. platform.vsphere.defaultDatastore: WORKLOAD-DS; see the fragment after these steps for field placement)
2. Installer passes.
3. Deploy OCP 4.12 on vSphere using the legacy field and a relative path WITH folder (e.g. platform.vsphere.defaultDatastore: WORKLOAD-DS-Folder/WORKLOAD-DS)
4. Installer fails.
5. Deploy OCP 4.12 on vSphere using the legacy field and the FULL path (e.g. platform.vsphere.defaultDatastore: /Workload Datacenter/datastore/WORKLOAD-DS-Folder/WORKLOAD-DS)
6. Installer fails.

7. Deploy OCP 4.13 on vSphere using the legacy field and a relative path without folder (e.g. platform.vsphere.defaultDatastore: WORKLOAD-DS)
8. Installer fails.
9. Deploy OCP 4.13 on vSphere using the legacy field and a relative path WITH folder (e.g. platform.vsphere.defaultDatastore: WORKLOAD-DS-Folder/WORKLOAD-DS)
10. Installer passes.
11. Deploy OCP 4.13 on vSphere using the legacy field and the FULL path (e.g. platform.vsphere.defaultDatastore: /Workload Datacenter/datastore/WORKLOAD-DS-Folder/WORKLOAD-DS)
12. Installer fails.
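
For reference, the legacy field in question sits directly under platform.vsphere in install-config.yaml; a minimal fragment using the relative-path-with-folder form from the steps above:

platform:
  vsphere:
    defaultDatastore: WORKLOAD-DS-Folder/WORKLOAD-DS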

Actual results:

Default Datastore Value                                          OCP 4.12   OCP 4.13   OCP 4.14
/Workload Datacenter/datastore/WORKLOAD-DS-Folder/WORKLOAD-DS    No         Yes        Yes
WORKLOAD-DS-Folder/WORKLOAD-DS                                   No         Yes        Yes
WORKLOAD-DS                                                      Yes        No         No

For OCP 4.12.z managed cluster deployments, the name-only path is the only one that works as expected.
For OCP 4.13.z+ managed cluster deployments, only the full path and the relative path with folder work as expected.

Expected results:

OCP 4.13.z+ should accept a relative path without the folder, like OCP 4.12.z does.

Additional info:

 

 

Description of problem:

Searching CI turns up runs like this which log:

level=warning msg=The bootstrap machine is unable to resolve API and/or API-Int Server URLs

despite the gathered log-bundle saying resolution was fine:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade/1721875076346810368/artifacts/e2e-gcp-ovn-upgrade/ipi-install-install-stableinitial/artifacts/log-bundle-20231107134325.tar | tar xOz log-bundle-20231107134325/bootstrap/services/bootkube.json | jq -r '.[] | .timestamp + "  " + (.stage // "-") + " " + .phase + " " + (.result // "-")' bootstrap/services/bootkube.json | sort | grep resolve
2023-11-03T10:26:53Z  resolve-api-int-url stage end success
2023-11-03T10:26:53Z  resolve-api-int-url stage start -
2023-11-03T10:26:53Z  resolve-api-url stage end success
2023-11-03T10:26:53Z  resolve-api-url stage start -
2023-11-03T10:47:30Z  resolve-api-int-url stage end success
2023-11-03T10:47:30Z  resolve-api-int-url stage start -
2023-11-03T10:47:30Z  resolve-api-url stage end success
2023-11-03T10:47:30Z  resolve-api-url stage start -

Version-Release number of selected component (if applicable):

Definitely 4.15, from that CI run. Likely all releases since the check landed in installer#5816.

How reproducible:

Untested, but from inspecting the code, I'd expect fairly reproducible.

Steps to Reproduce:

1. Feed the installer an impossible manifest (e.g. using a kind that does not exist).
2. Try to install.
3. See the installer gather bootstrap logs and analyze them.

Actual results:

level=warning msg=The bootstrap machine is unable to resolve API and/or API-Int Server URLs

Expected results:

Installer does not complain about API resolution unless API resolution is broken.

Additional info:

The check logic seems to be looking at overall success of the bootkube service. It should be updated to only check the success of the resolve-api-url and resolve-api-int-url steps in that service. Ideally with separate analysis steps, so you don't have to say "and/or" in the logged warning.
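
A rough sketch of the narrower check against the same log bundle, looking only at the two resolve stages (the jq filter is illustrative, not the installer's actual code):

$ tar xOzf log-bundle-20231107134325.tar \
    log-bundle-20231107134325/bootstrap/services/bootkube.json \
  | jq -r '.[] | select(.stage == "resolve-api-url" or .stage == "resolve-api-int-url")
                | select(.phase == "end") | .stage + " " + (.result // "-")'

If both stages report success here, the analysis should not warn about API resolution.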

Please review the following PR: https://github.com/openshift/csi-external-attacher/pull/69

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Mirror an OCI catalog with v2; after creating the catalog source, the pod is not present. Describing the CatalogSource shows the following error:

oc describe catalogsource cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c
Name:         cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c
Namespace:    openshift-marketplace
Labels:       <none>
Annotations:  <none>
API Version:  operators.coreos.com/v1alpha1
Kind:         CatalogSource
Metadata:
  Creation Timestamp:  2024-03-29T02:49:47Z
  Generation:          1
  Resource Version:    53264
  UID:                 69a39693-b29b-4fa4-a6da-de31dc3d521c
Spec:
  Image:        ec2-3-139-239-15.us-east-2.compute.amazonaws.com:5000/multi/redhat-operator-index:8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c
  Source Type:  grpc
Status:
  Message:  couldn't ensure registry server - error ensuring pod: : error creating new pod: cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c-: Pod "cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7da785sd" is invalid: metadata.labels: Invalid value: "cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c": must be no more than 63 characters
  Reason:   RegistryServerError
Events:     <none>
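
For reference, the generated CatalogSource name alone is already well past the 63-character label limit before OLM appends its pod-name suffix (a quick check, not part of the original report):

$ echo -n cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c | wc -c
89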

 

Version-Release number of selected component (if applicable):

oc-mirror version 
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202403251146.p0.g03ce0ca.assembly.stream.el9-03ce0ca", GitCommit:"03ce0ca797e73b6762fd3e24100ce043199519e9", GitTreeState:"clean", BuildDate:"2024-03-25T16:34:33Z", GoVersion:"go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1)  Copy the operator as OCI format to localhost:
`skopeo copy --all docker://registry.redhat.io/redhat/redhat-operator-index:v4.15 oci:///app1/noo/redhat-operator-index --remove-signatures`

2)  Use following imagesetconfigure for mirror: cat config-multi-op.yaml 
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  operators:
    - catalog: oci:///app1/noo/redhat-operator-index
      packages:
        - name: odf-operator
`oc-mirror --config config-multi-op.yaml file://outmulitop   --v2`


3) Do diskTomirror :
`oc-mirror --config config-multi-op.yaml --from file://outmulitop  --v2 docker://ec2-3-139-239-15.us-east-2.compute.amazonaws.com:5000/multi`

4) Create cluster resource with file: itms-oc-mirror.yaml
   `oc create -f cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c.yaml`

Actual results: 

4) The pod for the CatalogSource is not present.

oc describe catalogsource cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c
Name:         cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c
Namespace:    openshift-marketplace
Labels:       <none>
Annotations:  <none>
API Version:  operators.coreos.com/v1alpha1
Kind:         CatalogSource
Metadata:
  Creation Timestamp:  2024-03-29T02:49:47Z
  Generation:          1
  Resource Version:    53264
  UID:                 69a39693-b29b-4fa4-a6da-de31dc3d521c
Spec:
  Image:        ec2-3-139-239-15.us-east-2.compute.amazonaws.com:5000/multi/redhat-operator-index:8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c
  Source Type:  grpc
Status:
  Message:  couldn't ensure registry server - error ensuring pod: : error creating new pod: cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c-: Pod "cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7da785sd" is invalid: metadata.labels: Invalid value: "cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c": must be no more than 63 characters
  Reason:   RegistryServerError
Events:     <none>

cat cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c.yaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: null
  name: cs-redhat-operator-index-8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c
  namespace: openshift-marketplace
spec:
  image: ec2-3-139-239-15.us-east-2.compute.amazonaws.com:5000/multi/redhat-operator-index:8bfb449c24d03d6ddbd05d3de9fe7a7dae4a2ecdb8f84487f28d24d6ca2d175c
  sourceType: grpc
status: {} 

Expected results:

4) The catalog source pod should be running.

Please review the following PR: https://github.com/openshift/csi-livenessprobe/pull/56

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-34261. The following is the description of the original issue:

Description of problem:

The CurrentImagePullSecret field on the MachineOSConfig is not being consumed by the rollout process. This is evident when the designated image registry is private and the only way to pull an image is to present a secret.

 

How reproducible:

Always

 

Steps to Reproduce:

  1. Configure a private image registry to be the designated image registry of choice for on-cluster layering.
  2. Get the pull secret for the registry and apply it to the cluster in the MCO namespace.
  3. Configure on-cluster layering to use this secret for pushing the image.
  4. Wait for the build to complete.
  5. Wait for the rollout to occur.

 

Actual results:

The node and MachineConfigPool will degrade because rpm-ostree is unable to pull the newly-built image because it does not have access to the credentials even though the MachineOSConfig has a field for them.

 

Expected results:

Rolling out the newly-built OS image should succeed.

 

Additional info:

It looks like we'll need to make the getImageRegistrySecrets() function aware of all MachineOSConfigs and pull the secrets from there. Where this could be problematic is where there are two image registries with different secrets, because the secrets are merged based on the image registry hostname. Instead, what we may want to do is have the MCD write only the contents of the referenced secret to the node's filesystem before calling rpm-ostree to consume it. This could potentially also reduce or eliminate the overall complexity introduced by getImageRegistrySecrets() while simultaneously resolving the concerns found under https://issues.redhat.com//browse/OCPBUGS-33803.

It is worth mentioning that even though we use a private image registry to test the rollout process in OpenShift CI, the reason it works is that it uses an ImageStream with which the machine-os-puller service account and its image pull secret are associated. This secret is surfaced to all of the cluster nodes by the getImageRegistrySecrets() process. So in effect, it may appear to be working when it does not work as intended. A way to test this would be to create an ImageStream in a separate namespace along with a separate pull secret and then attempt to use that ImageStream and pull secret within a MachineOSConfig.

Finally, to add another wrinkle to this problem: if a cluster admin wants to use a different final image pull secret for each MachineConfigPool, merging those gets more difficult. Assuming the image registries have the same hostname, the last secret merged wins, and that is the secret that gets used, which may be the incorrect one.

Description of problem: In Advanced Cluster Security we rely on OCP cluster creation in our CI and recently observed an increase in cluster creation failures. While we've been advised to retry the failures (and we do so now, see ROX-25416), I'm afraid our use case is not so unique and others are affected as well.

We suggest upgrading terraform and the provider in openshift-installer for 4.12+ to the latest version possible before the license change. The underlying issue is probably already fixed upstream and released in v5.37.0.

Version-Release number of selected component (if applicable): TBD

How reproducible: TBD

Steps to Reproduce: TBD

Actual results: TBD

Expected results: TBD

Additional info:

The most common error we see in our JIRA issues is shown below; similar issues can be found for the AWS provider too, e.g. OCPBUGS-4213.

level=error msg=Error: Provider produced inconsistent result after apply .... resource was present, but now absent

Summary of errors from:

      3 failed to create cluster: failed to apply Terraform: error(GCPComputeBackendTimeout) from Infrastructure Provider: GCP is experiencing backend service interuptions, the compute instance failed to create in reasonable time."
      3 Provider produced inconsistent result after apply\n\nWhen applying changes to\nmodule.master.google_service_account.master-node-sa[0], provider\n\"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new value: Root\nresource was present, but now absent.\n\n
      6 Error waiting to create Network: Error waiting for Creating Network: timeout while waiting for state to become 'DONE' (last state: 'RUNNING', timeout: 4m0s)\n\n  with module.network.google_compute_network.cluster_network[0],\n  on network/network.tf line 1, in resource \"google_compute_network\" \"cluster_network\":\n   1: resource \"google_compute_network\" \"cluster_network\" {\n\n"
      9 error applying Terraform configs: failed to apply Terraform: error(GCPComputeBackendTimeout) from Infrastructure Provider: GCP is experiencing backend service interuptions, the compute instance failed to create in reasonable time."
     14 Provider produced inconsistent result after apply\n\nWhen applying changes to module.master.google_service_account.master-node-sa,\nprovider \"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new\nvalue: Root resource was present, but now absent.
     16 Provider produced inconsistent result after apply\n\nWhen applying changes to google_service_account_key.bootstrap, provider\n\"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new value: Root\nresource was present, but now absent.
     18 Provider produced inconsistent result after apply\n\nWhen applying changes to module.iam.google_service_account.worker-node-sa,\nprovider \"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new\nvalue: Root resource was present, but now absent.
     34 Error creating service account key: googleapi: Error 404: Service account projects/acs-san-stackroxci/serviceAccounts/XXX@acs-san-stackroxci.iam.gserviceaccount.com does not exist., notFound\n\n  with google_service_account_key.bootstrap,\n  on main.tf line 38, in resource \"google_service_account_key\" \"bootstrap\":\n  38: resource \"google_service_account_key\" \"bootstrap\" {\n\n"
     45 error applying Terraform configs: failed to apply Terraform: exit status 1\n\nError: Provider produced inconsistent result after apply\n\nWhen applying changes to\nmodule.master.google_service_account.master-node-sa[0], provider\n\"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new value: Root\nresource was present, but now absent.
     59 error applying Terraform configs: failed to apply Terraform: exit status 1\n\nError: Provider produced inconsistent result after apply\n\nWhen applying changes to module.iam.google_service_account.worker-node-sa,\nprovider \"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new\nvalue: Root resource was present, but now absent.
    100 Provider produced inconsistent result after apply\n\nWhen applying changes to google_service_account.bootstrap-node-sa, provider\n\"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new value: Root\nresource was present, but now absent.
    103 Provider produced inconsistent result after apply\n\nWhen applying changes to module.iam.google_service_account.worker-node-sa[0],\nprovider \"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new\nvalue: Root resource was present, but now absent.
    116 Provider produced inconsistent result after apply\n\nWhen applying changes to\nmodule.master.google_service_account.master-node-sa[0], provider\n\"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new value: Root\nresource was present, but now absent.

The openshift-installer contains a bundled terraform and google provider.

This is a clone of issue OCPBUGS-42120. The following is the description of the original issue:

Description of problem:

    After upgrading OCP and LSO to version 4.14, elasticsearch pods in the openshift-logging deployment are unable to schedule to their respective nodes and remain Pending, even though the LSO managed PVs are bound to the PVCs. A test pod using a newly created test PV managed by the LSO is able to schedule correctly however.

Version-Release number of selected component (if applicable):

    4.14

How reproducible:

    Consistently

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    Pods consuming previously existing LSO managed PVs are unable to schedule and remain in a Pending state after upgrading OCP and LSO to 4.14.

Expected results:

    That pods would be able to consume LSO managed PVs and schedule correctly to nodes.

Additional info:

    

Description of problem:

Adding automountServiceAccountToken: false to the pod spec removes the SA token from the ovnkube-control-plane pod, which causes it to crash with the following error:

F1212 12:18:13.705048 1 ovnkube.go:136] unable to create kubernetes rest config, err: TLS-secured apiservers require token/cert and CA certificate.

This error is misleading, as the pod doesn't use KAS and doesn't need the SA token.
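
A minimal sketch of the spec change used in the reproduction steps below; the field is the standard Kubernetes automountServiceAccountToken setting, placed in the deployment's pod template:

spec:
  template:
    spec:
      automountServiceAccountToken: false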

 

 

Version-Release number of selected component (if applicable):

4.15    

How reproducible:

100%    

Steps to Reproduce:

    1. Add automountServiceAccountToken: false to pod spec in ovnkube-control-plane deployment
    2. Check new pod for error
    3.
    

Actual results:

pod crashes with error:     unable to create kubernetes rest config, err: TLS-secured apiservers require token/cert and CA certificate.

Expected results:

 pod runs without issues

Additional info:

    

This is a clone of issue OCPBUGS-33136. The following is the description of the original issue:

Description of problem:

    The Compute nodes table does not display correct filesystem data

Version-Release number of selected component (if applicable):

    4.16.0-0.ci-2024-04-29-054754

How reproducible:

    Always

Steps to Reproduce:

    1. In an Openshift cluster 4.16.0-0.ci-2024-04-29-054754
    2. Go to the Compute / Nodes menu
    3. Check the Filesystem column
    

Actual results:

    There is no storage data displayed

Expected results:

    The query is executed correctly and the storage data is displayed correctly

Additional info:

    The query has an error because it is not concatenating things correctly: https://github.com/openshift/console/blob/master/frontend/packages/console-app/src/components/nodes/NodesPage.tsx#L413
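
For context, a well-formed node filesystem usage query has this general shape (standard node-exporter metric names; not necessarily the exact expression the console builds):

node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_avail_bytes{mountpoint="/"}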


 

This is a clone of issue OCPBUGS-37334. The following is the description of the original issue:

Description of problem:

    ci/prow/security is failing: k8s.io/client-go/transport

Version-Release number of selected component (if applicable):

4.16    

How reproducible:

    always

Steps to Reproduce:

    1. trigger ci/prow/security on a pull request
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    All images have been removed from quay.io/centos7, and the oc new-app unit tests rely heavily on these images and have started failing. See https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_oc/1716/pull-ci-openshift-oc-master-unit/1773203483667730432

Version-Release number of selected component (if applicable):

    probably all

How reproducible:

    Open a PR and see that pre-submit unit test fails

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

[sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers for ns/openshift-kni-infra

The test now passes near 0% of the time on metal-ipi.

Started around Feb 10th.

event [namespace/openshift-kni-infra node/master-1.ostest.test.metalkube.org pod/haproxy-master-1.ostest.test.metalkube.org hmsg/64785a22cf - Back-off restarting failed container haproxy in pod haproxy-master-1.ostest.test.metalkube.org_openshift-kni-infra(336080d8c1b455c151170524132c026d)] happened 295 times

Possible relation to haproxy 2.8 merge?

Logs indicate the error is:

/bin/bash: line 47: socat: command not found

Please review the following PR: https://github.com/openshift/cluster-csi-snapshot-controller-operator/pull/183

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The single-line execute markdown syntax is not working.

Steps to Reproduce

  1.  Install Web Terminal Operator
  2.  Create a ConsoleQuickStart resource using "copy-execute-demo.yaml" file
  3.  Open "Sample Quick Start" from quick start catalog page

Actual results:

The inline code is not rendered properly, specifically for the single-line execute syntax.

Expected results:

The inline code should show a code block with a small execute icon to run the commands in web terminal

 

Description of problem:

When there is a new update available for the cluster, clicking "Select a version" on the Cluster Settings page does nothing.
    

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-12-19-033450
    

How reproducible:

Always
    

Steps to Reproduce:

    1.Prepare a cluster with available update.
    2.Go to Cluster Settings page, choose a version by clicking on "Select a version" button.
    3.
    

Actual results:

2. There is no response when clicking the button; the user cannot select a version from the page.
    

Expected results:

2. A modal should show up for the user to select a version after clicking the "Select a version" button
    

Additional info:

screenshot: https://drive.google.com/file/d/1Kpyu0kUKFEQczc5NVEcQFbf_uly_S60Y/view?usp=sharing
    

This is a clone of issue OCPBUGS-30841. The following is the description of the original issue:

Description of problem:

    PAC provides a log link in Git to view the PipelineRun (PLR) log. This is broken on 4.15 after this change: https://github.com/openshift/console/pull/13470. That PR changed the log URL after the React Router package upgrade.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

PR https://github.com/openshift/monitoring-plugin/pull/83 was intended to just modify the images built for local testing, but accidentally changed the default Dockerfile, leading to a mismatch between the nginx config and the Dockerfile used in CI. This causes the monitoring-plugin to fail to load in CI builds.

It has been shown that running the conformance test suite on hypershift hosted clusters with the kubevirt provider is far more stable than on their metal counterparts. In order to get the conformance suite on Azure passing, we need to skip a single test that sends ping (ICMP) to the Internet, as Azure blocks ICMP.

This is a clone of issue OCPBUGS-34493. The following is the description of the original issue:

Description of problem:

Failed to deploy baremetal cluster as cluster nodes are not introspected
    

Version-Release number of selected component (if applicable):

4.15.15
    

How reproducible:

periodically
    

Steps to Reproduce:

    1. Deploy baremetal dualstack cluster with disabled provisioning network
    2.
    3.
    

Actual results:

Cluster fails to deploy as ironic.service fails to start on the bootstrap node:

[root@api ~]# systemctl status ironic.service
○ ironic.service - Ironic baremetal deployment service
     Loaded: loaded (/etc/containers/systemd/ironic.container; generated)
     Active: inactive (dead)

May 27 08:01:05 api.kni-qe-4.lab.eng.rdu2.redhat.com systemd[1]: Dependency failed for Ironic baremetal deployment service.
May 27 08:01:05 api.kni-qe-4.lab.eng.rdu2.redhat.com systemd[1]: ironic.service: Job ironic.service/start failed with result 'dependency'.

    

Expected results:

ironic.service is started, nodes are introspected and cluster is deployed
    

Additional info:


    

Description of problem:

The checkbox and its label should be displayed on a single row,
e.g. for 'Deny all ingress traffic' & 'Deny all egress traffic' on the Create NetworkPolicy page,
and for 'Secure Route' on the Create Route page.

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2023-11-14-082209

How reproducible:

Always

Steps to Reproduce:

1. Go to Networking -> NetworkPolicies page, click the 'Create NetworkPolicy' button
2. Check the Policy type section, check if the checkbox of 'Deny all ingress traffic' & "Deny all egress traffic" is displayed in a single row
3. Check the same things in 'Create route' page,

Actual results:

not in a single row

Expected results:

in a single row 

Additional info:

https://drive.google.com/file/d/1xgEe-CuuRYrY9tBFmIa-7o5Rcn7iCr1e/view?usp=drive_link

This is a clone of issue OCPBUGS-34708. The following is the description of the original issue:

Description of problem:

failed job: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1023/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-gatewayapi/1796261717831847936                 

seeing below error:
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: error unpacking terraform: could not unpack the directory for the aws provider: open mirror/openshift/local/aws: file does not exist                                                    

Version-Release number of selected component (if applicable):

4.16/4.17    

How reproducible:

100%

Steps to Reproduce:

    1. create AWS cluster with "CustomNoUpgrade" featureSet is configured

install-config.yaml
----------------------
featureSet: CustomNoUpgrade
featureGates: [GatewayAPIEnabled=true]

    2.

    

Actual results:

level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: error unpacking terraform: could not unpack the directory for the aws provider: open mirror/openshift/local/aws: file does not exist

Expected results:

install should be successful    

Additional info:

workaround is to add ClusterAPIInstallAWS=true to feature_gates as well, .e.g
featureSet: CustomNoUpgrade
featureGates: [GatewayAPIEnabled=true,ClusterAPIInstallAWS=true]    

discussion thread: https://redhat-internal.slack.com/archives/C68TNFWA2/p1716887301410459 

Description of problem:

Go to a PVC's "VolumeSnapshots" tab; it shows the error "Oh no! Something went wrong."
    

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2024-01-03-140457
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Create a PVC in a project. Go to the PVC's "VolumeSnapshots" tab.
    2.
    3.
    

Actual results:

1. The error "Oh no! Something went wrong." shows up on the page.
    

Expected results:

1. Should show the VolumeSnapshots related to the PVC without error.
    

Additional info:

screenshot: https://drive.google.com/file/d/1l0i0DCFh_q9mvFHxnftVJL0AM1LaKFOO/view?usp=sharing 
    

Please review the following PR: https://github.com/openshift/cluster-etcd-operator/pull/1187

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/102

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Multi-arch compute clusters have an issue where the cluster version's image ref is single arch, so this change resolves the image ref without spinning up a pod.

Description of problem:

Version shown for `oc-mirror --v2 version` should be similar to `oc-mirror version`

 

 

Version-Release number of selected component (if applicable):

oc-mirror version 
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202403070215.p0.gc4f8295.assembly.stream.el9-c4f8295", GitCommit:"c4f829512107f7d0f52a057cd429de2030b9b3b3", GitTreeState:"clean", BuildDate:"2024-03-07T03:46:24Z", GoVersion:"go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1) `oc-mirror --v2 -v`

Actual results: 

oc-mirror --v2 -v
--v2 flag identified, flow redirected to the oc-mirror v2 version. PLEASE DO NOT USE that. V2 is still under development and it is not ready to be used.
oc-mirror version v2.0.0-dev-01

Expected results:

oc-mirror version --output=yaml
clientVersion:
  buildDate: "2024-03-07T03:46:24Z"
  compiler: gc
  gitCommit: c4f829512107f7d0f52a057cd429de2030b9b3b3
  gitTreeState: clean
  gitVersion: 4.16.0-202403070215.p0.gc4f8295.assembly.stream.el9-c4f8295
  goVersion: go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime
  major: ""
  minor: ""
  platform: linux/amd64

Description of problem:

Cluster with user provisioned image registry storage accounts fails to upgrade to 4.14.20 due to image-registry-operator being degraded.

message: "Progressing: The registry is ready\nNodeCADaemonProgressing: The daemon set node-ca is deployed\nAzurePathFixProgressing: Migration failed: panic: AZURE_CLIENT_ID is required for authentication\nAzurePathFixProgressing: \nAzurePathFixProgressing: goroutine 1 [running]:\nAzurePathFixProgressing: main.main()\nAzurePathFixProgressing: \t/go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:25 +0x15c\nAzurePathFixProgressing: "

cmd/move-blobs was introduced due to https://issues.redhat.com/browse/OCPBUGS-29003.  

 

Version-Release number of selected component (if applicable):

4.14.15+

How reproducible:

I have not reproduced this myself, but I imagine you would hit it every time when upgrading from 4.13 to 4.14.15+ with an Azure UPI image registry.

 

Steps to Reproduce:

    1. Starting on version 4.13, configure the registry for Azure user-provisioned infrastructure - https://docs.openshift.com/container-platform/4.14/registry/configuring_registry_storage/configuring-registry-storage-azure-user-infrastructure.html

    2.  Upgrade to 4.14.15+
    3.
    

Actual results:

    Upgrade does not complete successfully.
$ oc get co
....
image-registry                             4.14.20        True        False         True       617d     AzurePathFixControllerDegraded: Migration failed: panic: AZURE_CLIENT_ID is required for authentication...

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.38   True        True          7h41m   Unable to apply 4.14.20: wait has exceeded 40 minutes for these operators: image-registry

 

Expected results:

Upgrade to complete successfully

Additional info:

    

This is a clone of issue OCPBUGS-32186. The following is the description of the original issue:

Description of problem:

    The self-managed hypershift cli (hcp) reports an inaccurate OCP supported version.

For example, if I have a hypershift-operator deployed which supports OCP v4.14 and I build the hcp cli from the latest source code, when I execute "hcp -v", the cli tool reports the following. 


$ hcp -v
hcp version openshift/hypershift: 02bf7af8789f73c7b5fc8cc0424951ca63441649. Latest supported OCP: 4.16.0

This makes it appear that the hcp cli is capable of deploying OCP v4.16.0, when the backend is actually limited to v4.14.0.

The cli needs to indicate what the server is capable of deploying. Otherwise it appears that v4.16.0 would be deployable in this scenario, but the backend would not allow that. 


Version-Release number of selected component (if applicable):

    4.14

How reproducible:

    100%

Steps to Reproduce:

    1. download an HCP client that does not match the hypershift-operator backend
    2. execute 'hcp -v'
    3. the reported "Latest supported OCP" is not representative of the version the hypershift-operator actually supports     

Actual results:

   

Expected results:

     hcp cli reports a latest OCP version that is representative of what the deployed hypershift operator is capable of deploying. 

Additional info:

    

Please review the following PR: https://github.com/openshift/openstack-cinder-csi-driver-operator/pull/145

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Errors are hit when generating the pruning plan for the delete phase using --generate for mirror-to-mirror.

 

Version-Release number of selected component (if applicable):

oc-mirror  version WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202404191609.p0.g9ac063b.assembly.stream.el9-9ac063b", GitCommit:"9ac063b0b88466183a50287af277c5ed40a8e238", GitTreeState:"clean", BuildDate:"2024-04-19T22:03:51Z", GoVersion:"go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1) Use following isc to do mirror2mirror for v2:
cat config.yaml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  platform:
    channels:
      - name: stable-4.15
  additionalImages: 
    - name: registry.redhat.io/ubi8/ubi:latest
    - name: registry.redhat.io/ubi8/ubi-minimal:latest
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.14
      packages:
      - name: 3scale-operator
    - catalog: oci:///app1/ibm-catalog
      targetTag: "v14"
      targetCatalog: "zhouy/catalog"
    - catalog: oci:///app1/noo/redhat-operator-index
      packages:
        - name: cluster-kube-descheduler-operator
        - name: advanced-cluster-management

`oc-mirror --config config.yaml --v2 docker://xxx.com:5000/m2m  --workspace file:///app1/0416/clid20/`

2) generate pruning plan for delete phase using --generate
cat config-delete.yaml 
apiVersion: mirror.openshift.io/v1alpha2
kind: DeleteImageSetConfiguration
delete:
  platform:
    channels:
      - name: stable-4.15
  additionalImages: 
    - name: registry.redhat.io/ubi8/ubi:latest
    - name: registry.redhat.io/ubi8/ubi-minimal:latest
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.14
      packages:
      - name: 3scale-operator
    - catalog: oci:///app1/ibm-catalog
      targetTag: "v14"
      targetCatalog: "zhouy/catalog"
    - catalog: oci:///app1/noo/redhat-operator-index
      packages:
        - name: cluster-kube-descheduler-operator
        - name: advanced-cluster-management


`oc-mirror delete --config config-delete.yaml --workspace file://clid20 --v2 --generate  docker://xxx.com:5000/m2m`

Actual results: 

2) The generate command reports many errors: 

2024/04/22 10:02:29  [ERROR]  : reading manifest d8e94620237da97e1b65dac4fb616d21d13e2fea08c9385145a02ad3fbd59d88 in localhost:55000/3scale-amp2/3scale-rhel7-operator-metadata: manifest unknown image : map[]
2024/04/22 10:02:29  [ERROR]  : [delete-images] reading manifest d8e94620237da97e1b65dac4fb616d21d13e2fea08c9385145a02ad3fbd59d88 in localhost:55000/3scale-amp2/3scale-rhel7-operator-metadata: manifest unknown
2024/04/22 10:02:29  [ERROR]  : reading manifest 4f72f049436af1a940833c61b075d84ad5910c7bb2df2a8de99618995c067dfe in localhost:55000/3scale-amp2/3scale-rhel7-operator: manifest unknown image : map[]
2024/04/22 10:02:29  [ERROR]  : [delete-images] reading manifest 4f72f049436af1a940833c61b075d84ad5910c7bb2df2a8de99618995c067dfe in localhost:55000/3scale-amp2/3scale-rhel7-operator: manifest unknown
2024/04/22 10:02:29  [ERROR]  : reading manifest 4f72f049436af1a940833c61b075d84ad5910c7bb2df2a8de99618995c067dfe in localhost:55000/3scale-amp2/3scale-rhel7-operator: manifest unknown image : map[]
2024/04/22 10:02:29  [ERROR]  : [delete-images] reading manifest 4f72f049436af1a940833c61b075d84ad5910c7bb2df2a8de99618995c067dfe in localhost:55000/3scale-amp2/3scale-rhel7-operator: manifest unknown
2024/04/22 10:02:29  [ERROR]  : reading manifest 68b310ed3cfd65db893ba015ef1d5442365201c0ced006c1915e90edb99933ea in localhost:55000/3scale-amp2/apicast-gateway-rhel8: manifest unknown image : map[]
2024/04/22 10:02:29  [ERROR]  : [delete-images] reading manifest 68b310ed3cfd65db893ba015ef1d5442365201c0ced006c1915e90edb99933ea in localhost:55000/3scale-amp2/apicast-gateway-rhel8: manifest unknown
2024/04/22 10:02:29  [ERROR]  : reading manifest 2b8525c55cfbd5b5d66b50868ebd8fe6f468b10715653d047cdae25fa28e5983 in localhost:55000/3scale-amp2/backend-rhel8: manifest unknown image : map[]
2024/04/22 10:02:29  [ERROR]  : [delete-images] reading manifest 2b8525c55cfbd5b5d66b50868ebd8fe6f468b10715653d047cdae25fa28e5983 in localhost:55000/3scale-amp2/backend-rhel8: manifest unknown
2024/04/22 10:02:29  [ERROR]  : reading manifest 97e6355fcfadf7fea1f30cc8bbf833c8a518655cef4d63051df29fbcde2c1f00 in localhost:55000/3scale-amp2/memcached-rhel7: manifest unknown image : map[]

Expected results:

2) no error

Description of problem:

   Creating a second cluster with the same cluster name and base domain as the first cluster fails, as expected, because of the DNS record-set conflicts. But deleting the second cluster leaves the first cluster inaccessible, which is unexpected.

Version-Release number of selected component (if applicable):

    4.15.0-0.nightly-2024-01-14-100410

How reproducible:

    Always

Steps to Reproduce:

1. create the first cluster and make sure it succeeds
2. try to create the second cluster, with the same cluster name, base domain, and region, and make sure it failed
3. destroy the second cluster which failed due to "Platform Provisioning Check"
4. check if the first cluster is still healthy     

Actual results:

    The first cluster turns unhealthy because its DNS record-sets are deleted by step 3.

Expected results:

    The DNS record-sets of the first cluster stay untouched during step 3, and the first cluster stays healthy afterwards.

Additional info:

(1) the first cluster is by Flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/257549/, and it's healthy initially

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-01-14-100410   True        False         54m     Cluster version is 4.15.0-0.nightly-2024-01-14-100410
$ oc get nodes
NAME                                                       STATUS   ROLES                  AGE   VERSION
jiwei-0115y-lgns8-master-0.c.openshift-qe.internal         Ready    control-plane,master   73m   v1.28.5+c84a6b8
jiwei-0115y-lgns8-master-1.c.openshift-qe.internal         Ready    control-plane,master   73m   v1.28.5+c84a6b8
jiwei-0115y-lgns8-master-2.c.openshift-qe.internal         Ready    control-plane,master   74m   v1.28.5+c84a6b8
jiwei-0115y-lgns8-worker-a-gqq96.c.openshift-qe.internal   Ready    worker                 62m   v1.28.5+c84a6b8
jiwei-0115y-lgns8-worker-b-2h9xd.c.openshift-qe.internal   Ready    worker                 63m   v1.28.5+c84a6b8
$ 

(2) try to create the second cluster and expect failing due to dns record already exists

$ openshift-install version
openshift-install 4.15.0-0.nightly-2024-01-14-100410
built from commit b6f320ab7eeb491b2ef333a16643c140239de0e5
release image registry.ci.openshift.org/ocp/release@sha256:385d84c803c776b44ce77b80f132c1b6ed10bd590f868c97e3e63993b811cc2d
release architecture amd64
$ mkdir test1
$ cp install-config.yaml test1
$ yq-3.3.0 r test1/install-config.yaml baseDomain
qe.gcp.devcluster.openshift.com
$ yq-3.3.0 r test1/install-config.yaml metadata
creationTimestamp: null
name: jiwei-0115y
$ yq-3.3.0 r test1/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
$ openshift-install create cluster --dir test1
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json" 
INFO Consuming Install Config from target directory 
FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": metadata.name: Invalid value: "jiwei-0115y": record(s) ["api.jiwei-0115y.qe.gcp.devcluster.openshift.com."] already exists in DNS Zone (openshift-qe/qe) and might be in use by another cluster, please remove it to continue 
$ 

(3) delete the second cluster

$ openshift-install destroy cluster --dir test1
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json" 
INFO Deleted 2 recordset(s) in zone qe            
INFO Deleted 3 recordset(s) in zone jiwei-0115y-lgns8-private-zone 
WARNING Skipping deletion of DNS Zone jiwei-0115y-lgns8-private-zone, not created by installer 
INFO Time elapsed: 37s                            
INFO Uninstallation complete!                     
$ 

(4) check the first cluster status and the dns record-sets

$ oc get clusterversion
Unable to connect to the server: dial tcp: lookup api.jiwei-0115y.qe.gcp.devcluster.openshift.com on 10.11.5.160:53: no such host
$
$ gcloud dns managed-zones describe jiwei-0115y-lgns8-private-zone
cloudLoggingConfig:
  kind: dns#managedZoneCloudLoggingConfig
creationTime: '2024-01-15T07:22:55.199Z'
description: Created By OpenShift Installer
dnsName: jiwei-0115y.qe.gcp.devcluster.openshift.com.
id: '9193862213315831261'
kind: dns#managedZone
labels:
  kubernetes-io-cluster-jiwei-0115y-lgns8: owned
name: jiwei-0115y-lgns8-private-zone
nameServers:
- ns-gcp-private.googledomains.com.
privateVisibilityConfig:
  kind: dns#managedZonePrivateVisibilityConfig
  networks:
  - kind: dns#managedZonePrivateVisibilityConfigNetwork
    networkUrl: https://www.googleapis.com/compute/v1/projects/openshift-qe/global/networks/jiwei-0115y-lgns8-network
visibility: private
$ gcloud dns record-sets list --zone jiwei-0115y-lgns8-private-zone
NAME                                          TYPE  TTL    DATA
jiwei-0115y.qe.gcp.devcluster.openshift.com.  NS    21600  ns-gcp-private.googledomains.com.
jiwei-0115y.qe.gcp.devcluster.openshift.com.  SOA   21600  ns-gcp-private.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
$ gcloud dns record-sets list --zone qe --filter='name~jiwei-0115y'
Listed 0 items.
$ 

Description of problem:

When creating a HostedCluster with the 'NodePort' service publishing strategy, the VMs (guest nodes) try to contact HCP services such as ignition and oauth. If these services are colocated on the same infra node as the VM, they can't be reached via NodePort because the 'virt-launcher' NetworkPolicy blocks the traffic.
Access to the oauth and ignition-server-proxy pods needs to be explicitly allowed so they can be reached from the virtual machines on the same node (see the sketch below).
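
For illustration only, a minimal sketch of the kind of NetworkPolicy addition this describes. The namespace (clusters-example) and the pod labels used below are assumptions and must be checked against the actual hosted control plane; a matching policy would also be needed for the oauth pods. This is not the fix that shipped.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-vms-to-ignition-proxy
  namespace: clusters-example            # hypothetical HCP namespace
spec:
  podSelector:
    matchLabels:
      app: ignition-server-proxy         # hypothetical label; verify the real pod labels
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          kubevirt.io: virt-launcher     # label carried by KubeVirt virt-launcher (VM) pods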

Version-Release number of selected component (if applicable):

4.16.0

How reproducible:

Always, if conditions are met

Steps to Reproduce:

    1. As described above
    2.
    3.
    

Actual results:

VMs are not joining the cluster as nodes if the ignition server is running on the same infra node as the VM.

Expected results:

All VMs are joining the cluster as nodes, and the HostedCluster is eventually Completed and Available

Additional info:

    

This is a clone of issue OCPBUGS-14963. The following is the description of the original issue:

Description of problem:

When using IPI for IBM Cloud to create a Private BYON cluster, the installer attempts to fetch the VPC resource to verify if it is already a PermittedNetwork for the DNS Services Zone.
However, IBM Cloud currently lists a new VPC region, eu-es, which has not yet GA'd. Although it appears among the available VPC regions to search for resources, requests to eu-es fail. Any attempt to use a VPC region that sorts alphabetically after eu-es (the regions appear to be returned in that order) fails because of the requests made to eu-es; this includes eu-gb, us-east, and us-south, and it causes a golang panic.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1. Create IBM Cloud BYON resources in us-east or us-south
2. Attempt to create a Private BYON based cluster in us-east or us-south

Actual results:

DEBUG   Fetching Common Manifests...               
DEBUG   Reusing previously-fetched Common Manifests 
DEBUG Generating Terraform Variables...            
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x2bdb706]

goroutine 1 [running]:
github.com/openshift/installer/pkg/asset/installconfig/ibmcloud.(*Metadata).IsVPCPermittedNetwork(0xc000e89b80, {0x1a8b9918, 0xc00007c088}, {0xc0009d8678, 0x8})
	/go/src/github.com/openshift/installer/pkg/asset/installconfig/ibmcloud/metadata.go:175 +0x186
github.com/openshift/installer/pkg/asset/cluster.(*TerraformVariables).Generate(0x1dc55040, 0x5?)
	/go/src/github.com/openshift/installer/pkg/asset/cluster/tfvars.go:606 +0x3a5a
github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc000ca0d80, {0x1a8ab280, 0x1dc55040}, {0x0, 0x0})
	/go/src/github.com/openshift/installer/pkg/asset/store/store.go:227 +0x5fa
github.com/openshift/installer/pkg/asset/store.(*storeImpl).Fetch(0x7ffd948754cc?, {0x1a8ab280, 0x1dc55040}, {0x1dc32840, 0x8, 0x8})
	/go/src/github.com/openshift/installer/pkg/asset/store/store.go:77 +0x48
main.runTargetCmd.func1({0x7ffd948754cc, 0xb})
	/go/src/github.com/openshift/installer/cmd/openshift-install/create.go:261 +0x125
main.runTargetCmd.func2(0x1dc38800?, {0xc000ca0a80?, 0x3?, 0x3?})
	/go/src/github.com/openshift/installer/cmd/openshift-install/create.go:291 +0xe7
github.com/spf13/cobra.(*Command).execute(0x1dc38800, {0xc000ca0a20, 0x3, 0x3})
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0xc000bc8000)
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:918
main.installerMain()
	/go/src/github.com/openshift/installer/cmd/openshift-install/main.go:61 +0x2b0
main.main()
	/go/src/github.com/openshift/installer/cmd/openshift-install/main.go:38 +0xff

Expected results:

Successful Private cluster creation using BYON on IBM Cloud

Additional info:

IBM Cloud development has identified the issue and is working on a fix to all affected supported releases (4.12, 4.13, 4.14+)

Description of problem:

    The following test fails: "[sig-apps][Feature:DeploymentConfig] deploymentconfigs when tagging images should successfully tag the deployed image [apigroup:apps.openshift.io][apigroup:authorization.openshift.io][apigroup:image.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]". One pod is stuck in Pending because it requests 3Gi of memory that the node does not have (there are two pods requesting 3Gi each).

Version-Release number of selected component (if applicable):

    4.14.17

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

After a manual crash of an OCP node, the OSPD VM running on that node is stuck in the Terminating state.

Version-Release number of selected component (if applicable):

OCP 4.12.15 
osp-director-operator.v1.3.0
kubevirt-hyperconverged-operator.v4.12.5

How reproducible:

Log in to an OCP 4.12.15 node running a VM.
Manually crash the master node.
After the reboot, the VM stays in the Terminating state.

Steps to Reproduce:

    1. ssh core@masterX 
    2. sudo su
    3. echo c > /proc/sysrq-trigger     

Actual results:

After the reboot, the VM stays in the Terminating state.


$ omc get node|sed -e 's/modl4osp03ctl/model/g' | sed -e 's/telecom.tcnz.net/aaa.bbb.ccc/g'
NAME                               STATUS   ROLES                         AGE   VERSION
model01.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
model02.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
model03.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08


$ omc get pod -n openstack 
NAME                                                        READY   STATUS         RESTARTS   AGE
openstack-provision-server-7b79fcc4bd-x8kkz                 2/2     Running        0          8h
openstackclient                                             1/1     Running        0          7h
osp-director-operator-controller-manager-5896b5766b-sc7vm   2/2     Running        0          8h
osp-director-operator-index-qxxvw                           1/1     Running        0          8h
virt-launcher-controller-0-9xpj7                            1/1     Running        0          20d
virt-launcher-controller-1-5hj9x                            1/1     Running        0          20d
virt-launcher-controller-2-vhd69                            0/1     NodeAffinity   0          43d

$ omc describe  pod virt-launcher-controller-2-vhd69 |grep Status:
Status:                    Terminating (lasts 37h)

$ xsos sosreport-xxxx/|grep time
...
  Boot time: Wed Nov 22 01:44:11 AM UTC 2023
  Uptime:    8:27,  0 users
  

Expected results:

The VM restarts automatically OR does not stay in the Terminating state.

Additional info:

The issue has been seen two times.

The first time, a kernel crash occurred and the associated VM on the node ended up in the Terminating state.

The second time, we tried to reproduce the issue by manually crashing the kernel and got the same result:
the VM running on the OCP node stays in the Terminating state.

Description of problem:


Disable:Broken for [sig-builds][Feature:Builds][Slow] can use private repositories as build input build using an HTTP token should be able to clone source code via an HTTP token [apigroup:build.openshift.io]


    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Description of problem:

Incorrect help info for loglevel when using --v2 flag 

 

 

Version-Release number of selected component (if applicable):

oc-mirror version 
WARNING: This version information is deprecated and will be replaced with the output from --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.16.0-202403070215.p0.gc4f8295.assembly.stream.el9-c4f8295", GitCommit:"c4f829512107f7d0f52a057cd429de2030b9b3b3", GitTreeState:"clean", BuildDate:"2024-03-07T03:46:24Z", GoVersion:"go1.21.7 (Red Hat 1.21.7-1.el9) X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1) Check the `oc-mirror --v2 -h` help info
2) Run the command `oc-mirror --config config.yaml --from file://out docker://xxxxxx.com:5000/test  --v2 --max-nested-paths 2 --loglevel trace`

Actual results: 

oc-mirror --config config.yaml --from file://out docker://ec2-18-188-118-33.us-east-2.compute.amazonaws.com:5000/test  --v2 --max-nested-paths 2 --loglevel trace
--v2 flag identified, flow redirected to the oc-mirror v2 version. PLEASE DO NOT USE that. V2 is still under development and it is not ready to be used. 
2024/03/20 07:18:30  [INFO]   : mode diskToMirror 
2024/03/20 07:18:30  [TRACE]  : creating signatures directory out/working-dir/signatures 
2024/03/20 07:18:30  [TRACE]  : creating release images directory out/working-dir/release-images 
2024/03/20 07:18:30  [TRACE]  : creating release cache directory out/working-dir/hold-release 
2024/03/20 07:18:30  [TRACE]  : creating operator cache directory out/working-dir/hold-operator 
2024/03/20 07:18:30  [ERROR]  :  error parsing local storage configuration : invalid loglevel trace Must be one of [error, warn, info, debug]

Expected results:

Show correct help information

 

Additional info:

The same applies to `--strict-archive archiveSize`; the information is not clear about how to use it.

Description of problem: a new feature in Ironic wipes the non-OS disks during BMH installation. It only works for disks with a 512-byte block size.

Customer says the following:

This is an unlisted new feature (or enhancement) in OCP 4.14. This non-OS disk wiping during BMH installation is not available in 4.12.

    Version-Release number of selected component (if applicable):



The following command is generated by Ironic:
"dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct"
It fails with:

ironic-agent[4054]: Command: dd bs=512 if=/dev/zero of=/dev/sdb count=33 oflag=direct

ironic-agent[4054]: Exit code: 1

podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root [-] Unexpected error dispatching erase_devices_metadata to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7f050797f2e0>: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/sdb": Unexpected error while running command.

    

How reproducible:
Reproducible on any BMH whose disk has a block size larger than 512 bytes.

Steps to Reproduce:

    1. This problem occurs on servers whose disks have a block size greater than 512 bytes. For example, a SAMSUNG drive, p/n KR-05RJND-SSK00-389-02DF-A02, has a block size of 4096.
    2. Add a BMH that has a non-OS disk with a block size greater than 512 bytes.
      2a. The introspection of the BMH will be fine.
      2b. When the BMH is added (provisioning phase), the OCP installation tries to wipe the non-OS drive with the command "dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct". This can be monitored on the BMH via "journalctl -f -b". For a disk with a 4096-byte block size, the dd command is rejected; the BMH is then rebooted and step 2b is attempted again.
    3. Manual testing on a BMH server with disks that have a block size greater than 512 bytes (tested with disks that have bs=4096):
      the command "dd bs=512 if=/dev/zero of=/dev/sdb count=33" fails, while
      the alternate command, which determines the disk's block size for dd, works:
      "dd bs=$(blockdev --getss /dev/sdb) if=/dev/zero of=/dev/sdb count=33 oflag=direct"

    

Actual results:


    

Expected results:
Disks are wiped successfully regardless of their block size.

Additional logs:

In OCP 4.14.16, there is a new feature in Ironic that clears the non-OS disks during BMH installation. The following commands are generated by Ironic:
"dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct"
"dd bs=512 if=/dev/zero of=/dev/sdb count=33 oflag=direct"

The reason for the failure is the alignment restriction; the logical block size differs depending on the disk type.

Maybe instead of
"dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct"

it could be replaced by:
"dd bs=$(blockdev --getss /dev/sda) if=/dev/zero of=/dev/sda count=33 oflag=direct"

That would work for various types of disks.
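
A minimal shell sketch of that suggestion, looping over the devices named in the logs (an illustration only, not the actual ironic-python-agent change):

# Query each disk's logical block size instead of assuming 512 bytes.
for dev in /dev/sda /dev/sdb; do
  bs=$(blockdev --getss "$dev")
  dd bs="$bs" if=/dev/zero of="$dev" count=33 oflag=direct
done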

~~~~ THE IRONIC ERROR in OCP 4.14.16 ~~~~
-agent[4054]: 2024-03-27 04:00:48.240 1 ERROR root [-] Unexpected error dispatching erase_devices_metadata to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7f050797f2e0>: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/sdb": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com ironic-agent[4054]: Command: dd bs=512 if=/dev/zero of=/dev/sdb count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com ironic-agent[4054]: Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root [-] Unexpected error dispatching erase_devices_metadata to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7f050797f2e0>: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/sdb": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Command: dd bs=512 if=/dev/zero of=/dev/sdb count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stderr: "dd: error writing '/dev/sdb': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.6049e-05 s, 0.0 kB/s\n"; "/dev/sda": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Command: dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stderr: "dd: error writing '/dev/sda': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.9605e-05 s, 0.0 kB/s\n": ironic_python_agent.errors.BlockDeviceEraseError: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/sdb": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Command: dd bs=512 if=/dev/zero of=/dev/sdb count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stderr: "dd: error writing '/dev/sdb': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.6049e-05 s, 0.0 kB/s\n"; "/dev/sda": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Command: dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stderr: "dd: error writing '/dev/sda': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.9605e-05 s, 0.0 kB/s\n"
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root Traceback (most recent call last):
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 3124, in dispatch_to_managers
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root return getattr(manager, method)(*args, **kwargs)
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 1702, in erase_devices_metadata
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root raise errors.BlockDeviceEraseError(excpt_msg)
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root ironic_python_agent.errors.BlockDeviceEraseError: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/sdb": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root Command: dd bs=512 if=/dev/zero of=/dev/sdb count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root Stderr: "dd: error writing '/dev/sdb': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.6049e-05 s, 0.0 kB/s\n"; "/dev/sda": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root Command: dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root Stderr: "dd: error writing '/dev/sda': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.9605e-05 s, 0.0 kB/s\n"
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.240 1 ERROR root
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com ironic-agent[4054]: Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root [-] Error performing clean step erase_devices_metadata: ironic_python_agent.errors.BlockDeviceEraseError: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/sdb": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Command: dd bs=512 if=/dev/zero of=/dev/sdb count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stderr: "dd: error writing '/dev/sdb': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.6049e-05 s, 0.0 kB/s\n"; "/dev/sda": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Command: dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stderr: "dd: error writing '/dev/sda': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.9605e-05 s, 0.0 kB/s\n"
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Traceback (most recent call last):
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root File "/usr/lib/python3.9/site-packages/ironic_python_agent/extensions/clean.py", line 77, in execute_clean_step
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root result = hardware.dispatch_to_managers(step['step'], node, ports,
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 3124, in dispatch_to_managers
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root return getattr(manager, method)(*args, **kwargs)
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 1702, in erase_devices_metadata
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root raise errors.BlockDeviceEraseError(excpt_msg)
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root ironic_python_agent.errors.BlockDeviceEraseError: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/sdb": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Command: dd bs=512 if=/dev/zero of=/dev/sdb count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Stderr: "dd: error writing '/dev/sdb': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.6049e-05 s, 0.0 kB/s\n"; "/dev/sda": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Command: dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Stderr: "dd: error writing '/dev/sda': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.9605e-05 s, 0.0 kB/s\n"
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com ironic-agent[4054]: Stderr: "dd: error writing '/dev/sdb': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.6049e-05 s, 0.0 kB/s\n"; "/dev/sda": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com ironic-agent[4054]: Command: dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root [-] Command failed: execute_clean_step, error: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/sdb": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Command: dd bs=512 if=/dev/zero of=/dev/sdb count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stderr: "dd: error writing '/dev/sdb': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.6049e-05 s, 0.0 kB/s\n"; "/dev/sda": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Command: dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stderr: "dd: error writing '/dev/sda': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.9605e-05 s, 0.0 kB/s\n": ironic_python_agent.errors.BlockDeviceEraseError: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/sdb": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Command: dd bs=512 if=/dev/zero of=/dev/sdb count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stderr: "dd: error writing '/dev/sdb': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.6049e-05 s, 0.0 kB/s\n"; "/dev/sda": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Command: dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: Stderr: "dd: error writing '/dev/sda': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.9605e-05 s, 0.0 kB/s\n"
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Traceback (most recent call last):
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root File "/usr/lib/python3.9/site-packages/ironic_python_agent/extensions/base.py", line 174, in run
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root result = self.execute_method(**self.command_params)
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root File "/usr/lib/python3.9/site-packages/ironic_python_agent/extensions/clean.py", line 77, in execute_clean_step
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root result = hardware.dispatch_to_managers(step['step'], node, ports,
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 3124, in dispatch_to_managers
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root return getattr(manager, method)(*args, **kwargs)
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root File "/usr/lib/python3.9/site-packages/ironic_python_agent/hardware.py", line 1702, in erase_devices_metadata
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root raise errors.BlockDeviceEraseError(excpt_msg)
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root ironic_python_agent.errors.BlockDeviceEraseError: Error erasing block device: Failed to erase the metadata on the device(s): "/dev/sdb": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Command: dd bs=512 if=/dev/zero of=/dev/sdb count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Stderr: "dd: error writing '/dev/sdb': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.6049e-05 s, 0.0 kB/s\n"; "/dev/sda": Unexpected error while running command.
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Command: dd bs=512 if=/dev/zero of=/dev/sda count=33 oflag=direct
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Stdout: ''
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root Stderr: "dd: error writing '/dev/sda': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes copied, 3.9605e-05 s, 0.0 kB/s\n"
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com podman[4014]: 2024-03-27 04:00:48.242 1 ERROR root
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com ironic-agent[4054]: Exit code: 1
Mar 27 04:00:48 r660hd-0.compactocp802.mavdallab.com ironic-agent[4054]: Stdout: ''

This is a clone of issue OCPBUGS-34359. The following is the description of the original issue:

The issue was observed during testing of the k8s 1.30 rebase in which the webhook client started using http2 for loopback IPs: kubernetes/kubernetes#122558.
It looks like the issue is caused by how a http2 client handles this invalid address, I verified this change by setting up a cluster with openshift/kubernetes#1953 and this pr.

This is a clone of issue OCPBUGS-36390. The following is the description of the original issue:

Description of problem:

    The Installer still requires permissions to create and delete IAM roles even when the user brings existing roles.

Version-Release number of selected component (if applicable):

    4.16+

How reproducible:

    always

Steps to Reproduce:

    1. Specify an existing IAM role in the install-config (see the sketch after this list)
    2.
    3.
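
For illustration, a hedged excerpt of what step 1 refers to; the field name (iamRole) and the role names below are assumptions and should be checked against the installer documentation for the release in use:

controlPlane:
  platform:
    aws:
      iamRole: existing-master-role   # hypothetical pre-created role
compute:
- name: worker
  platform:
    aws:
      iamRole: existing-worker-role   # hypothetical pre-created role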
    

Actual results:

    The following permissions are required even though they are not used:
        "iam:CreateRole",
        "iam:DeleteRole",
        "iam:DeleteRolePolicy",
        "iam:PutRolePolicy",
        "iam:TagInstanceProfile"

Expected results:

    Only actually needed permissions are required.

Additional info:

    I think this is tech debt from when roles were not tagged. The fix will kind of revert https://github.com/openshift/installer/pull/5286

Description of problem:

    When deploying an HCP KubeVirt cluster using the hcp CLI's --node-selector argument, that node selector is not applied to the "kubevirt-cloud-controller-manager" pods within the HCP namespace.

This makes it impossible to pin all of the HCP pods to specific nodes (see the example below).
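
For example (a sketch; the cluster name, replica count, and selector value are placeholders):

hcp create cluster kubevirt \
  --name example-cluster \
  --node-pool-replicas 2 \
  --node-selector "kubernetes.io/hostname=worker-1"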

Version-Release number of selected component (if applicable):

    4.14

How reproducible:

    100%

Steps to Reproduce:

    1. deploy an hcp kubevirt cluster with the --node-selector cli option
    2.
    3.
    

Actual results:

    the node selector is not applied to cloud provider kubevirt pod

Expected results:

    the node selector should be applied to cloud provider kubevirt pod. 

Additional info:

    

Description of problem:

  The azure csi driver operator cannot run in a HyperShift control plane because it has this selector: node-role.kubernetes.io/master: ""  
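
For illustration, the selector can be confirmed on the hosted control plane side with something like the following (the deployment name and namespace are assumptions; adjust to the actual HCP namespace):

oc -n clusters-example get deployment azure-disk-csi-driver-operator \
  -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}'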

Version-Release number of selected component (if applicable):

    4.16 ci latest

How reproducible:

always    

Steps to Reproduce:

    1. Install hypershift
    2. Create azure hosted cluster
    

Actual results:

    azure-disk-csi-driver-operator pod remains in Pending state

Expected results:

    all control plane pods run

Additional info:

    

Description of problem:

It was found when testing OCP-71263 and regression OCP-35770 for 4.15.
For GCP in Mint mode, the root credential can be removed after cluster installation.
But after removing the root credential, CCO became degraded.

Version-Release number of selected component (if applicable):

4.15.0-0.nightly-2024-01-25-051548

4.15.0-rc.3

How reproducible:

    
Always    

Steps to Reproduce:

    1.Install a GCP cluster with Mint mode

    2.After install, remove the root credential
jianpingshu@jshu-mac ~ % oc delete secret -n kube-system gcp-credentials
secret "gcp-credentials" deleted     

    3. Wait some time (about 30 minutes to 1 hour); CCO becomes degraded.
    
jianpingshu@jshu-mac ~ % oc get co cloud-credential
NAME               VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
cloud-credential   4.15.0-rc.3   True        True          True       6h45m   6 of 7 credentials requests are failing to sync.

jianpingshu@jshu-mac ~ % oc -n openshift-cloud-credential-operator get -o json credentialsrequests | jq -r '.items[] | select(tostring | contains("InfrastructureMismatch") | not) | .metadata.name as $n | .status.conditions // [{type: "NoConditions"}] | .[] | .type + "=" + .status + " " + $n + " " + .reason + ": " + .message' | sort
CredentialsProvisionFailure=False openshift-cloud-network-config-controller-gcp CredentialsProvisionSuccess: successfully granted credentials request
CredentialsProvisionFailure=True cloud-credential-operator-gcp-ro-creds CredentialsProvisionFailure: failed to grant creds: unable to fetch root cloud cred secret: Secret "gcp-credentials" not found
CredentialsProvisionFailure=True openshift-gcp-ccm CredentialsProvisionFailure: failed to grant creds: unable to fetch root cloud cred secret: Secret "gcp-credentials" not found
CredentialsProvisionFailure=True openshift-gcp-pd-csi-driver-operator CredentialsProvisionFailure: failed to grant creds: unable to fetch root cloud cred secret: Secret "gcp-credentials" not found
CredentialsProvisionFailure=True openshift-image-registry-gcs CredentialsProvisionFailure: failed to grant creds: unable to fetch root cloud cred secret: Secret "gcp-credentials" not found
CredentialsProvisionFailure=True openshift-ingress-gcp CredentialsProvisionFailure: failed to grant creds: unable to fetch root cloud cred secret: Secret "gcp-credentials" not found
CredentialsProvisionFailure=True openshift-machine-api-gcp CredentialsProvisionFailure: failed to grant creds: unable to fetch root cloud cred secret: Secret "gcp-credentials" not found

openshift-cloud-network-config-controller-gcp has no failure because it doesn't have a customized role in 4.15.0-rc.3

Actual results:

 CCO becomes degraded.

Expected results:

 CCO is not degraded; only the "Upgradeable" condition is updated to reflect the missing root credential (see the check sketched below).
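
For illustration, the expected condition could be checked with something like:

oc get co cloud-credential \
  -o jsonpath='{.status.conditions[?(@.type=="Upgradeable")]}{"\n"}'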

Additional info:

Tested the same case on 4.14.10, no issue 

 

Description of problem:

We are in a live migration scenario.

If a project has a networkpolicy to allow from the host network (more concretely, to allow from the ingress controllers and the ingress controllers are in the host network), traffic doesn't work during the live migration between any ingress controller node (either migrated or not migrated) and an already migrated application node.

I'll expand later in the description and internal comments, but the TL;DR is that the IPs of tun0 on not-yet-migrated source nodes and the IPs of ovn-k8s-mp0 on already-migrated source nodes are not added to the address sets backing the networkpolicy ACL on the target OVN-Kubernetes node, so the traffic is not allowed.

Version-Release number of selected component (if applicable):

4.16.13

How reproducible:

Always

Steps to Reproduce:

1. Before the migration: have a project with a networkpolicy that allows from the ingress controller and the ingress controller in the host network. Everything must work properly at this point.

2. Start the migration

3. During the migration, check connectivity from the host network of either a migrated node or a non-migrated node. Both will fail (checking from the same node doesn't fail)

Actual results:

The pod on the worker node is not reachable from the host network of the ingress controller node (unless the pod is on the same node as the ingress controller), which causes the ingress controller routes to return 503 errors.

Expected results:

The pod on the worker node should be reachable from the ingress controller node, even when the ingress controller node has not been migrated yet and the application node has.

Additional info:

This is not a duplicate of OCPBUGS-42578. This bug refers to the host-to-pod communication path while the other one doesn't.

This is a customer issue. More details to be included in private comments for privacy.

Workaround: Creating a networkpolicy that explicitly allows traffic from tun0 and ovn-k8s-mp0 interfaces. However, note that the workaround can be problematic for clusters with hundreds or thousands of projects. Another possible workaround is to temporarily delete all the networkpolicies of the projects. But again, this may be problematic (and a security risk).
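
For illustration only, a hedged sketch of the per-project workaround policy described above. The CIDR shown is just an example covering the cluster network (where the tun0 and ovn-k8s-mp0 addresses live); a real policy might instead list the individual /32 addresses of those interfaces:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-tun0-and-ovn-k8s-mp0
  namespace: example-project          # hypothetical application namespace
spec:
  podSelector: {}                     # all pods in the project
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 10.128.0.0/14           # example clusterNetwork CIDR; verify the real one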

Please review the following PR: https://github.com/openshift/cloud-provider-alibaba-cloud/pull/44

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem: After migrating a cluster from SDN to OVN, we see intermittent failures while accessing a service.

Wed Jul 31 05:28:11 UTC 2024
Wed Jul 31 01:28:11 EDT 2024
Hello OpenShift!


Wed Jul 31 05:28:42 UTC 2024
Wed Jul 31 01:28:42 EDT 2024
curl: (7) Failed to connect to 34.92.142.227 port 27018 after 75006 ms: Couldn't connect to server


Wed Jul 31 05:30:27 UTC 2024
Wed Jul 31 01:30:27 EDT 2024
Hello OpenShift!



Wed Jul 31 05:31:59 UTC 2024
Wed Jul 31 01:31:59 EDT 2024
Hello OpenShift!


Wed Jul 31 05:33:31 UTC 2024
Wed Jul 31 01:33:31 EDT 2024
Hello OpenShift!


Wed Jul 31 05:34:01 UTC 2024
Wed Jul 31 01:34:01 EDT 2024
curl: (52) Empty reply from server


Wed Jul 31 05:38:51 UTC 2024
Wed Jul 31 01:38:51 EDT 2024
Hello OpenShift!

Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.15.14
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.15.0-0.nightly-2024-07-29-053620
Kubernetes Version: v1.28.11+add48d0

How reproducible: 

Steps to Reproduce:

1. Create a 4.14 SDN OSD on GCP cluster

2. Upgrade to 4.15 

3. Scale cluster to 24 nodes

4. Add cluster-density-v2 workload

5. Run migration and let if finish

6. Start seeing errors

 

Actual results: Intermittent failures accessing service.

Expected results: Live migration should not cause disruption to service.

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure
  2. customer issue / SD
  3. internal RedHat testing failure

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

If it is a CI failure:

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

If it is a customer / SD issue:

  • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
  • Don’t presume that Engineering has access to Salesforce.
  • Do presume that Engineering will access attachments through supportshell.
  • Describe what each relevant attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • When showing the results from commands, include the entire command in the output.  
  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, label with “sbr-untriaged”
  • Do not set the priority, that is owned by Engineering and will be set when the bug is evaluated
  • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”
  • For guidance on using this template please see
    OCPBUGS Template Training for Networking  components

Description of problem:

OTA team wants to rename the `supported but not recommended` update edges to `known issues`

Version-Release number of selected component (if applicable):

    openshift-4.16

Expected results:

`supported but not recommended` edges are renamed to `known issues`

Additional info:

https://github.com/openshift/console/blob/962ab7b443f87e45486741444a32053a807d3219/frontend/public/components/modals/cluster-update-modal.tsx#L191

https://github.com/openshift/console/blob/962ab7b443f87e45486741444a32053a807d3219/frontend/public/components/modals/cluster-update-modal.tsx#L216

https://github.com/openshift/console/blob/962ab7b443f87e45486741444a32053a807d3219/frontend/public/components/modals/cluster-update-modal.tsx#L219

https://github.com/openshift/console/blob/962ab7b443f87e45486741444a32053a807d3219/frontend/public/components/modals/cluster-update-modal.tsx#L234

https://github.com/openshift/console/blob/962ab7b443f87e45486741444a32053a807d3219[…]rontend/public/components/cluster-settings/cluster-settings.tsx    

Please review the following PR: https://github.com/openshift/cloud-credential-operator/pull/639

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/ovirt-csi-driver-operator/pull/129

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/images/pull/155

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem: see the Worker CloudFormation template in "Installing a cluster on AWS using CloudFormation templates" (https://docs.openshift.com/container-platform/4.13/installing/installing_aws/installing-aws-user-infra.html#installation-cloudformation-worker_installing-aws-user-infra).

    In the OpenShift documentation, under the manual AWS CloudFormation templates, the CloudFormation template for worker nodes describes the Subnet and WorkerSecurityGroupId parameters as referring to the master nodes. Based on the variable names, the descriptions should refer to worker nodes instead (see the sketch below).
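
For illustration, a hedged sketch of what the corrected parameter descriptions could look like (the wording is a suggestion, not the published template text):

Parameters:
  Subnet:
    Description: The subnet to launch the worker nodes into.
    Type: AWS::EC2::Subnet::Id
  WorkerSecurityGroupId:
    Description: The security group ID to associate with the worker nodes.
    Type: AWS::EC2::SecurityGroup::Id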

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

CAPI E2Es failing to start in some CAPI provider's release branches.
Failing with the following error:

`go: errors parsing go.mod:94/tmp/tmp.ssf1LXKrim/go.mod:5: unknown directive: toolchain`

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-api/199/pull-ci-openshift-cluster-api-master-e2e-aws-capi-techpreview/1765512397532958720#1:build-log.txt%3A91-95

Version-Release number of selected component (if applicable):

4.15    

How reproducible:

Always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

  This is because the script launching the e2e is launching it from the `main` branch of the cluster-capi-operator (which has some backward-incompatible go toolchain changes), rather than the correctly matching release branch.

Description of problem

I had a version of MTC installed on my cluster when it was running a prior version. I had deleted it some time ago, long before upgrading to 4.15. I upgraded the cluster to 4.15 and needed to reinstall MTC to take a look at something, but found the operator would not install.

I originally tried with 4.15.0, but on failure upgraded to 4.15.3 to see if it would resolve the issue; it did not.

Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.15.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.15.3
Kubernetes Version: v1.28.7+6e2789b

How reproducible:

Always as far as I can tell. I have at least two clusters where I was able to reproduce it.

Steps to Reproduce:

    1. Install Migration Toolkit for Containers on OpenShift 4.14
    2. Uninstall it
    3. Upgrade to 4.15
    4. Try to install it again

Actual results:

The operator never installs. The UI just shows "Upgrade status: Unknown Failure".

Observe the catalog operator logs and note errors like:
E0319 21:35:57.350591       1 queueinformer_operator.go:319] sync {"update" "openshift-migration"} failed: bundle unpacking failed with an error: [roles.rbac.authorization.k8s.io "c1572438804f004fb90b6768c203caad96c47331f7ecc4f68c3cf6b43b0acfd" already exists, roles.rbac.authorization.k8s.io "724788f6766aa5ba19b24ef4619b6a8e8e856b8b5fb96e1380f0d3f5b9dcb7a" already exists]

If you delete the roles, you'll get the same for rolebindings, then the same for jobs.batch, and then configmaps.

Expected results:

Operator just installs

Additional info:

If you clean up all these resources the operator will install successfully.    
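
For illustration, a hedged cleanup sketch based on the errors above; the openshift-marketplace namespace is an assumption, and the hash-named resources must be taken from the catalog operator logs of the affected cluster:

NS=openshift-marketplace   # assumption: the namespace where OLM unpacks bundles
for name in \
  c1572438804f004fb90b6768c203caad96c47331f7ecc4f68c3cf6b43b0acfd \
  724788f6766aa5ba19b24ef4619b6a8e8e856b8b5fb96e1380f0d3f5b9dcb7a; do
  for kind in role rolebinding job.batch configmap; do
    oc -n "$NS" delete "$kind" "$name" --ignore-not-found
  done
done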

Description of problem:

    [vSphere] network.devices, template, and workspace are cleared when deleting the ControlPlaneMachineSet; updating these fields does not trigger an update

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-04-23-032717

How reproducible:

    Always

Steps to Reproduce:

    1.Install a vSphere 4.16 cluster, we use automated template: ipi-on-vsphere/versioned-installer
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-04-23-032717   True        False         24m     Cluster version is 4.16.0-0.nightly-2024-04-23-032717     

    2.Check the controlplanemachineset, you can see network.devices, template and workspace have value.
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset     
NAME      DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   STATE    AGE
cluster   3         3         3       3                       Active   51m
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset cluster -oyaml
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  creationTimestamp: "2024-04-25T02:52:11Z"
  finalizers:
  - controlplanemachineset.machine.openshift.io
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
  name: cluster
  namespace: openshift-machine-api
  resourceVersion: "18273"
  uid: f340d9b4-cf57-4122-b4d4-0f45f20e4d79
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  state: Active
  strategy:
    type: RollingUpdate
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        platform: VSphere
        vsphere:
        - name: generated-failure-domain
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        lifecycleHooks: {}
        metadata: {}
        providerSpec:
          value:
            apiVersion: machine.openshift.io/v1beta1
            credentialsSecret:
              name: vsphere-cloud-credentials
            diskGiB: 120
            kind: VSphereMachineProviderSpec
            memoryMiB: 16384
            metadata:
              creationTimestamp: null
            network:
              devices:
              - networkName: devqe-segment-221
            numCPUs: 4
            numCoresPerSocket: 4
            snapshot: ""
            template: huliu-vs425c-f5tfl-rhcos-generated-region-generated-zone
            userDataSecret:
              name: master-user-data
            workspace:
              datacenter: DEVQEdatacenter
              datastore: /DEVQEdatacenter/datastore/vsanDatastore
              folder: /DEVQEdatacenter/vm/huliu-vs425c-f5tfl
              resourcePool: /DEVQEdatacenter/host/DEVQEcluster/Resources
              server: vcenter.devqe.ibmc.devcluster.openshift.com
status:
  conditions:
  - lastTransitionTime: "2024-04-25T02:59:37Z"
    message: ""
    observedGeneration: 1
    reason: AsExpected
    status: "False"
    type: Error
  - lastTransitionTime: "2024-04-25T03:03:45Z"
    message: ""
    observedGeneration: 1
    reason: AllReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-04-25T03:03:45Z"
    message: ""
    observedGeneration: 1
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-04-25T03:01:04Z"
    message: ""
    observedGeneration: 1
    reason: AllReplicasUpdated
    status: "False"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3     

    3. Delete the controlplanemachineset; it will recreate a new one, but the three fields that had values before are now cleared.

liuhuali@Lius-MacBook-Pro huali-test % oc delete controlplanemachineset cluster
controlplanemachineset.machine.openshift.io "cluster" deleted
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset
NAME      DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   STATE      AGE
cluster   3         3         3       3                       Inactive   6s
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset cluster -oyaml
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  creationTimestamp: "2024-04-25T03:45:51Z"
  finalizers:
  - controlplanemachineset.machine.openshift.io
  generation: 1
  name: cluster
  namespace: openshift-machine-api
  resourceVersion: "46172"
  uid: 45d966c9-ec95-42e1-b8b0-c4945ea58566
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  state: Inactive
  strategy:
    type: RollingUpdate
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        platform: VSphere
        vsphere:
        - name: generated-failure-domain
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: huliu-vs425c-f5tfl
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        lifecycleHooks: {}
        metadata: {}
        providerSpec:
          value:
            apiVersion: machine.openshift.io/v1beta1
            credentialsSecret:
              name: vsphere-cloud-credentials
            diskGiB: 120
            kind: VSphereMachineProviderSpec
            memoryMiB: 16384
            metadata:
              creationTimestamp: null
            network:
              devices: null
            numCPUs: 4
            numCoresPerSocket: 4
            snapshot: ""
            template: ""
            userDataSecret:
              name: master-user-data
            workspace: {}
status:
  conditions:
  - lastTransitionTime: "2024-04-25T03:45:51Z"
    message: ""
    observedGeneration: 1
    reason: AsExpected
    status: "False"
    type: Error
  - lastTransitionTime: "2024-04-25T03:45:51Z"
    message: ""
    observedGeneration: 1
    reason: AllReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-04-25T03:45:51Z"
    message: ""
    observedGeneration: 1
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-04-25T03:45:51Z"
    message: ""
    observedGeneration: 1
    reason: AllReplicasUpdated
    status: "False"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3     

    4. I activate the controlplanemachineset and it does not trigger an update; I then add these field values back and it does not trigger an update; I then edit these fields to add a second network device and it still does not trigger an update.


            network:
              devices:
              - networkName: devqe-segment-221
              - networkName: devqe-segment-222


By the way, I can create worker machines with another network device or with two network devices.
huliu-vs425c-f5tfl-worker-0a-ldbkh    Running                          81m
huliu-vs425c-f5tfl-worker-0aa-r8q4d   Running                          70m

Actual results:

    network.devices, template and workspace are cleared when deleting the controlplanemachineset, and updating these fields does not trigger an update

Expected results:

    The field values should not be changed when deleting the controlplanemachineset.
    Updating these fields should trigger an update; or, if these fields are not meant to be modified, then modifying them in the controlplanemachineset should not take effect, as the current inconsistency is confusing.

Additional info:

    Must gather:  https://drive.google.com/file/d/1mHR31m8gaNohVMSFqYovkkY__t8-E30s/view?usp=sharing 

During jobs that upgrade to 4.16 from 4.15, the testing of unauthenticated build webhook invocation fails (I suspect due to the existing rolebindings from 4.15 surviving the upgrade).

[sig-builds][Feature:Builds][webhook] TestWebhook [apigroup:build.openshift.io][apigroup:image.openshift.io] [Suite:openshift/conformance/parallel] 
.
.
.
    STEP: testing unauthenticated forbidden webhooks @ 05/07/24 20:03:20.024
    STEP: executing the webhook to get the build object @ 05/07/24 20:03:20.024
    [FAILED] in [It] - github.com/openshift/origin/test/extended/builds/webhook.go:36 @ 05/07/24 20:03:20.148

 

This is a clone of issue OCPBUGS-18711. The following is the description of the original issue:

Description of problem:

secrets-store-csi-driver with AWS provider does not work in HyperShift hosted cluster, pod can't mount the volume successfully.

Version-Release number of selected component (if applicable):

secrets-store-csi-driver-operator.v4.14.0-202308281544 in 4.14.0-0.nightly-2023-09-06-235710 HyperShift hosted cluster.

How reproducible:

Always

Steps to Reproduce:

1. Follow test case OCP-66032 "Setup" part to install secrets-store-csi-driver-operator.v4.14.0-202308281544 , secrets-store-csi-driver and AWS provider successfully:

$ oc get po -n openshift-cluster-csi-drivers
NAME                                                READY   STATUS    RESTARTS   AGE
aws-ebs-csi-driver-node-7xxgr                       3/3     Running   0          5h18m
aws-ebs-csi-driver-node-fmzwf                       3/3     Running   0          5h18m
aws-ebs-csi-driver-node-rgrxd                       3/3     Running   0          5h18m
aws-ebs-csi-driver-node-tpcxq                       3/3     Running   0          5h18m
csi-secrets-store-provider-aws-2fm6q                1/1     Running   0          5m14s
csi-secrets-store-provider-aws-9xtw7                1/1     Running   0          5m15s
csi-secrets-store-provider-aws-q5lvb                1/1     Running   0          5m15s
csi-secrets-store-provider-aws-q6m65                1/1     Running   0          5m15s
secrets-store-csi-driver-node-4wdc8                 3/3     Running   0          6m22s
secrets-store-csi-driver-node-n7gkj                 3/3     Running   0          6m23s
secrets-store-csi-driver-node-xqr52                 3/3     Running   0          6m22s
secrets-store-csi-driver-node-xr24v                 3/3     Running   0          6m22s
secrets-store-csi-driver-operator-9cb55b76f-7cbvz   1/1     Running   0          7m16s

2. Follow test case OCP-66032 steps to create AWS secret, set up AWS IRSA successfully.

3. Follow the test case OCP-66032 steps to create the SecretProviderClass and a deployment using that SecretProviderClass successfully. Then check the pod; the pod is stuck in ContainerCreating:

$ oc get po
NAME                               READY   STATUS              RESTARTS   AGE
hello-openshift-84c76c5b89-p5k4f   0/1     ContainerCreating   0          10m

$ oc describe po hello-openshift-84c76c5b89-p5k4f
...
Events:
  Type     Reason       Age   From               Message
  ----     ------       ----  ----               -------
  Normal   Scheduled    11m   default-scheduler  Successfully assigned xxia-proj/hello-openshift-84c76c5b89-p5k4f to ip-10-0-136-205.us-east-2.compute.internal
  Warning  FailedMount  11m   kubelet            MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Incorrect token audience
           status code: 400, request id: 92d1ff5b-36be-4cc5-9b55-b12279edd78e
  Warning  FailedMount  11m  kubelet  MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Incorrect token audience
           status code: 400, request id: 50907328-70a6-44e0-9f05-80a31acef0b4
  Warning  FailedMount  11m  kubelet  MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Incorrect token audience
           status code: 400, request id: 617dc3bc-a5e3-47b0-b37c-825f8dd84920
  Warning  FailedMount  11m  kubelet  MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Incorrect token audience
           status code: 400, request id: 8ab5fc2c-00ca-45e2-9a82-7b1765a5df1a
  Warning  FailedMount  11m  kubelet  MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Incorrect token audience
           status code: 400, request id: b76019ca-dc04-4e3e-a305-6db902b0a863
  Warning  FailedMount  11m  kubelet  MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Incorrect token audience
           status code: 400, request id: b395e3b2-52a2-4fc2-80c6-9a9722e26375
  Warning  FailedMount  11m  kubelet  MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Incorrect token audience
           status code: 400, request id: ec325057-9c0a-4327-80c9-a9b6233a64dd
  Warning  FailedMount  10m  kubelet  MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Incorrect token audience
           status code: 400, request id: 405492b2-ed52-429b-b253-6a7c098c26cb
  Warning  FailedMount  82s (x5 over 9m35s)  kubelet  Unable to attach or mount volumes: unmounted volumes=[secrets-store-inline], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
  Warning  FailedMount  74s (x5 over 9m25s)  kubelet  (combined from similar events): MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod xxia-proj/hello-openshift-84c76c5b89-p5k4f, err: rpc error: code = Unknown desc = us-east-2: Failed fetching secret xxiaSecret: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Incorrect token audience
  status code: 400, request id: c38bbed1-012d-4250-b674-24ab40607920

Actual results:

Hit above stuck issue.

Expected results:

Pod should be Running.

Additional info:

Compared with another operator (cert-manager-operator) which also uses AWS IRSA (OCP-62500), that case works well, so secrets-store-csi-driver-operator has a bug.

Description of problem:

Internal registry Pods will panic while deploying OCP on `ca-west-1` AWS Region

Version-Release number of selected component (if applicable):

4.14.2    

How reproducible:

Every time    

Steps to Reproduce:

    1. Deploy OCP on `ca-west-1` AWS Region

Actual results:

$ oc logs image-registry-85b69cd9fc-b78sb -n openshift-image-registry
time="2024-02-08T11:43:09.287006584Z" level=info msg="start registry" distribution_version=v3.0.0+unknown go.version="go1.20.10 X:strictfipsruntime" openshift_version=4.14.0-202311021650.p0.g5e7788a.assembly.stream-5e7788a
time="2024-02-08T11:43:09.287365337Z" level=info msg="caching project quota objects with TTL 1m0s" go.version="go1.20.10 X:strictfipsruntime"
panic: invalid region provided: ca-west-1
goroutine 1 [running]:
github.com/distribution/distribution/v3/registry/handlers.NewApp({0x2873f40?, 0xc00005c088?}, 0xc000581800)
    /go/src/github.com/openshift/image-registry/vendor/github.com/distribution/distribution/v3/registry/handlers/app.go:130 +0x2bf1
github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware.NewApp({0x2873f40, 0xc00005c088}, 0x0?, {0x2876820?, 0xc000676cf0})
    /go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware/app.go:96 +0xb9
github.com/openshift/image-registry/pkg/dockerregistry/server.NewApp({0x2873f40?, 0xc00005c088}, {0x285ffd0?, 0xc000916070}, 0xc000581800, 0xc00095c000, {0x0?, 0x0})
    /go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/app.go:138 +0x485
github.com/openshift/image-registry/pkg/cmd/dockerregistry.NewServer({0x2873f40, 0xc00005c088}, 0xc000581800, 0xc00095c000)
    /go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:212 +0x38a
github.com/openshift/image-registry/pkg/cmd/dockerregistry.Execute({0x2858b60, 0xc000916000})
    /go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:166 +0x86b
main.main()
    /go/src/github.com/openshift/image-registry/cmd/dockerregistry/main.go:93 +0x496
    

Expected results:

The internal registry is deployed with no issues    

Additional info:

This is a new AWS region we are adding support for. The support will be backported to 4.14.z

Description of problem:

OVN br-int OVS flows do not get updated on other nodes when a node's bond MAC address changes to the other slave interface after a reboot. This causes network traffic destined for the SDN of one node to be dropped when it reaches the node that changed MAC addresses on its bond interface.

Version-Release number of selected component (if applicable): 4.12+

How reproducible: 100% of the time after rebooting if the MAC changes; the MAC does not always change.

Steps to Reproduce:

1. Capture bond0 mac before reboot
2. Reboot host
3. Confirm mac change
4.  oc run --rm -it test-pod-sdn --image=registry.redhat.io/openshift4/network-tools-rhel8  --overrides='{"spec": {"tolerations": [{"operator": "Exists"}],"nodeSelector":{"kubernetes.io/hostname":"nodeb-not-rebooted"}}}' /bin/bash
5. Ping rebooted node

Actual results:

The ping reaches the rebooted node but is dropped because the MAC address belongs to the other slave interface and not the one the bond is using.

Expected results:

OVS flows should update on all nodes after a reboot if the MAC changes.

Additional info:

  If we restart NetworkManager a couple of times, this triggers the OVS flows to get updated; we are not sure why.

Possible workarounds 
 -  https://access.redhat.com/solutions/6972925
 - Statically set the MAC of bond0 to one of the slave interfaces (a declarative sketch of this follows below).
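A sketch of that second workaround expressed declaratively, assuming the kubernetes-nmstate operator is available; the policy name, node selector, and MAC address are placeholders:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond0-static-mac
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-0
  desiredState:
    interfaces:
    - name: bond0
      type: bond
      state: up
      # Pin the bond to the MAC of the chosen slave interface (placeholder value).
      mac-address: "52:54:00:AA:BB:CC"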

Yesterday a major DPCR and thus Loki outage took the system down entirely. One test would fail as a result:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade/1762573177050894336

[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]

    [
      {
        "metric": {
          "__name__": "ALERTS",
          "alertname": "KubeDaemonSetRolloutStuck",
          "alertstate": "firing",
          "container": "kube-rbac-proxy-main",
          "daemonset": "loki-promtail",
          "endpoint": "https-main",
          "job": "kube-state-metrics",
          "namespace": "openshift-e2e-loki",
          "prometheus": "openshift-monitoring/k8s",
          "service": "kube-state-metrics",
          "severity": "warning"
        },
        "value": [
          1709071917.851,
          "1"
        ]
      }
    ]

The query this test uses should be adapted to omit everything in openshift-e2e-loki.

Ideally, backports would be good here, but we could also just fix it going forward if backporting is too cumbersome.

This is a clone of bug OCPBUGS-35211, so that the fix can be backported to 4.16.
------
Description of problem:

The ACM perf/scale hub OCP cluster has 3 bare-metal nodes, each with 480 GB for the installation disk. The metal3 pod uses too much disk space for logs, which puts the node under disk pressure and starts evicting pods, which in turn makes ACM stop provisioning clusters.
Below is the log size of the metal3 pod:
# du -h -d 1 /sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83
4.0K	/sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83/machine-os-images
276M	/sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83/metal3-httpd
181M	/sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83/metal3-ironic
384G	/sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83/metal3-ramdisk-logs
77M	/sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83/metal3-ironic-inspector
385G	/sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83

# ls -l -h /sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83/metal3-ramdisk-logs
total 384G
-rw-------. 1 root root 203G Jun 10 12:44 0.log
-rw-r--r--. 1 root root 6.5G Jun 10 09:05 0.log.20240610-084807.gz
-rw-r--r--. 1 root root 8.1G Jun 10 09:27 0.log.20240610-090606.gz
-rw-------. 1 root root 167G Jun 10 09:27 0.log.20240610-092755

The logs are too large to attach. Please contact me if you need access to the cluster to check.

 

 

 

Version-Release number of selected component (if applicable):

The version with the issue is 4.16.0-rc4; 4.16.0-rc3 does not have the issue.

How reproducible:

 

Steps to Reproduce:

1.Install latest ACM 2.11.0 build on OCP 4.16.0-rc4 and deploy 3500 SNOs on baremetal hosts
2.
3.

Actual results:

ACM stops deploying the remaining SNOs after 1913 SNOs are deployed because ACM pods are being evicted.

Expected results:

3500 SNOs are deployed.

Additional info:

 

This is a clone of issue OCPBUGS-35309. The following is the description of the original issue:

Description of problem:


Installation of 4.16 fails with an AWS AccessDenied error when trying to attach a bootstrap S3 bucket policy.

Version-Release number of selected component (if applicable):


4.16+

How reproducible:


Every time

Steps to Reproduce:

1. Create an installer policy with the permissions listed in the installer here: https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/aws/permissions.go
2. Run an install with AWS IPI

Actual results:


The install fails while attempting to attach a policy to the bootstrap S3 bucket:

time="2024-06-11T14:58:15Z" level=debug msg="I0611 14:58:15.485718     132 s3.go:256] \"Created bucket\" controller=\"awscluster\" controllerGroup=\"infrastru
cture.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/jamesh-sts-8tl72\" namespace=\"openshift-cluster-api-guests\"
 name=\"jamesh-sts-8tl72\" reconcileID=\"c390f027-a2ee-4d37-9e5d-b6a11882c46b\" cluster=\"openshift-cluster-api-guests/jamesh-sts-8tl72\" bucket_name=\"opensh
ift-bootstrap-data-jamesh-sts-8tl72\""
time="2024-06-11T14:58:15Z" level=debug msg="E0611 14:58:15.643613     132 controller.go:329] \"Reconciler error\" err=<"
time="2024-06-11T14:58:15Z" level=debug msg="\tfailed to reconcile S3 Bucket for AWSCluster openshift-cluster-api-guests/jamesh-sts-8tl72: ensuring bucket pol
icy: creating S3 bucket policy: AccessDenied: Access Denied"

Expected results:

Install completes successfully

Additional info:


The installer did not attach an S3 bootstrap bucket policy in the past, as far as I can tell (see https://github.com/openshift/installer/blob/release-4.15/data/data/aws/cluster/main.tf#L133-L148); this new permission is required because of new functionality.

CAPA is attaching a policy that denies non-SSL traffic to the bucket; this shouldn't affect installs. Adding the IAM permission that allows the bucket policy to be attached results in a successful install (a sketch of the extra statement follows below).
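A minimal sketch of that extra installer-policy statement, shown in YAML for consistency with the rest of this page (in practice it is added to the JSON policy document); s3:PutBucketPolicy is the standard AWS action for attaching bucket policies, and the resource ARN pattern is an assumption based on the bucket name in the logs:

- Effect: Allow
  Action:
  - s3:PutBucketPolicy
  Resource:
  - arn:aws:s3:::openshift-bootstrap-data-*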

S3 bootstrap bucket policy:


            "Statement": [
                {
                    "Sid": "ForceSSLOnlyAccess",
                    "Principal": {
                        "AWS": [
                            "*"
                        ]
                    },
                    "Effect": "Deny",
                    "Action": [
                        "s3:*"
                    ],
                    "Resource": [
                        "arn:aws:s3:::openshift-bootstrap-data-jamesh-sts-2r5f7/*"
                    ],
                    "Condition": {
                        "Bool": {
                            "aws:SecureTransport": false
                        }
                    }
                }
            ]
        },

Allow eviction of unhealthy (not ready) pods even if there are no disruptions allowed on a PodDisruptionBudget. This helps drain/maintain a node and recover without manual intervention when multiple nodes or pods are misbehaving (a minimal PodDisruptionBudget sketch follows below).

to prevent possible issues similar to https://issues.redhat.com//browse/OCPBUGS-23796
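For illustration, a minimal PodDisruptionBudget sketch assuming the cluster supports the unhealthyPodEvictionPolicy field; the name, namespace, selector, and minAvailable value are placeholders:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
  namespace: example-ns
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: example
  # Allow eviction of pods that are Running but not Ready even when the budget
  # would otherwise block the eviction.
  unhealthyPodEvictionPolicy: AlwaysAllow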

Because of the pin in the packages list, the ART pipeline is rebuilding packages all the time. Unfortunately, we need to remove the strong pins and move back to relaxed ones.

Once that's done, we need to merge https://github.com/openshift-eng/ocp-build-data/pull/4097.

Description of problem:

When expanding a PVC of unit-less size (e.g., '2147483648'), the Expand PersistentVolumeClaim modal populates the spinner with a unit-less value (e.g., 2147483648) instead of a meaningful value.

Version-Release number of selected component (if applicable):

CNV - 4.14.3

How reproducible:

always

Steps to Reproduce:

1.Create a PVC using the following YAML.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:   
  name: task-pv-claim
spec: 
  storageClassName: gp3-csi
  accessModes:     
    - ReadWriteOnce
  resources: 
    requests:       
      storage: "2147483648" 
apiVersion: v1
kind: Pod
metadata:   
  name: task-pv-pod
spec:   
  securityContext:     
    runAsNonRoot: true
    seccompProfile:       
      type: RuntimeDefault
  volumes:     
    - name: task-pv-storage
      persistentVolumeClaim:         
        claimName: task-pv-claim
  containers:     
    - name: task-pv-container
      image: nginx
      ports:         
        - containerPort: 80
          name: "http-server"
      volumeMounts:         
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage

2. From the newly created PVC details page, Click Actions > Expand PVC.
3. Note the value in the spinner input.

See https://drive.google.com/file/d/1toastX8rCBtUzx5M-83c9Xxe5iPA8fNQ/view for a demo
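For comparison, a minimal sketch of the same claim expressed with an IEC unit (2147483648 bytes is 2Gi), which keeps a unit-less value from reaching the modal in the first place:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: gp3-csi
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      # Equivalent to "2147483648" bytes, expressed with a unit.
      storage: 2Gi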

Please review the following PR: https://github.com/openshift/builder/pull/375

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of the problem:

It looks like the ODF minimum disk size validation is set to 75 GB per node, while it used to be ~25 GB.
The validation should only apply when ODF is enabled.

ODFMinDiskSizeGB int64 `envconfig:"ODF_MIN_DISK_SIZE_GB" default:"25"`

 

 

insufficient   ODF requirements: Insufficient resources to deploy ODF in compact mode. ODF requires a minimum of 3 hosts. Each host must have at least 1 additional disk of 75 GB minimum and an installation disk.
 

How reproducible:

always

Steps to reproduce:

1.

2.

3.

Actual results:

 

Expected results:

Creating an OS image entry without a CPU architecture field is currently allowed.

    - openshiftVersion: "4.12"
      version: "rhcos-412.86.202308081039-0"
      url: "http://registry.ocp-edge-cluster-assisted-0.qe2.e2e.bos.redhat.com:8080/images/openshift-v4/amd64/dependencies/rhcos/4.12/4.12.30/rhcos-live.x86_64.iso" 

This results in invalid InfraEnvs being allowed and assisted-image-service returning an empty ISO file (a corrected entry is sketched below).
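For reference, a corrected entry might look like the sketch below; the cpuArchitecture field name follows the AgentServiceConfig osImages schema, and the other values are copied from the entry above:

- openshiftVersion: "4.12"
  version: "rhcos-412.86.202308081039-0"
  url: "http://registry.ocp-edge-cluster-assisted-0.qe2.e2e.bos.redhat.com:8080/images/openshift-v4/amd64/dependencies/rhcos/4.12/4.12.30/rhcos-live.x86_64.iso"
  cpuArchitecture: "x86_64"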

Assisted-image-service log (4.12- is missing the architecture):

{"file":"/remote-source/app/pkg/imagestore/imagestore.go:299","func":"github.com/openshift/assisted-image-service/pkg/imagestore.(*rhcosStore).Populate","level":"info","msg":"Finished creating minimal iso for 4.12- (rhcos-412.86.202308081039-0)","time":"2024-04-11T17:04:16Z"} 

InfraEnv conditions:

[
  {
    "lastTransitionTime": "2024-04-11T17:04:47Z",
    "message": "Image has been created",
    "reason": "ImageCreated",
    "status": "True",
    "type": "ImageCreated"
  }
] 

InfraEnv ISODownloadURL:

  isoDownloadURL: https://assisted-image-service-multicluster-engine.apps.ocp-edge-cluster-assisted-0.qe2.e2e.bos.redhat.com/byapikey/eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI1ZjZhZmZjYy0zMzMwLTQ0NTYtODkxOC1lOThmYTE5ZTU2NGQifQ.T3h-_q6yMr1JvNkWXMspNk_9MFsHOX-CGBlBIlfpgjje9k-Y6RsI_6cWdZgJTPT0nMXRJiEUuvBJZJGPNdK-MQ/4.12/x86_64/minimal.iso 

Actually curling for this URL:

$ curl -kI "https://assisted-image-service-multicluster-engine.apps.ocp-edge-cluster-assisted-0.qe2.e2e.bos.redhat.com/byapikey/eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI1ZjZhZmZjYy0zMzMwLTQ0NTYtODkxOC1lOThmYTE5ZTU2NGQifQ.T3h-_q6yMr1JvNkWXMspNk_9MFsHOX-CGBlBIlfpgjje9k-Y6RsI_6cWdZgJTPT0nMXRJiEUuvBJZJGPNdK-MQ/4.12/x86_64/minimal.iso"
HTTP/1.1 404 Not Found
content-type: text/plain; charset=utf-8
x-content-type-options: nosniff
date: Thu, 11 Apr 2024 17:09:35 GMT
content-length: 19
set-cookie: 1a4b5ac1ad25c005c048fb541ba389b4=02300906d3489ab71b6417aaeed52390; path=/; HttpOnly; Secure; SameSite=None 

 

 

This is a clone of issue OCPBUGS-38398. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38174. The following is the description of the original issue:

Description of problem:

The Prometheus operator fails to reconcile when proxy settings like no_proxy are set in the Alertmanager configuration secret.

Version-Release number of selected component (if applicable):

4.15.z and later    

How reproducible:

    Always when AlertmanagerConfig is enabled

Steps to Reproduce:

    1. Enable UWM with AlertmanagerConfig
    enableUserWorkload: true
    alertmanagerMain:
      enableUserAlertmanagerConfig: true
    2. Edit the "alertmanager.yaml" key in the alertmanager-main secret (see attached configuration file)
    3. Wait for a couple of minutes.
    

Actual results:

Monitoring ClusterOperator goes Degraded=True.
    

Expected results:

No error
    

Additional info:

The Prometheus operator logs show that it doesn't understand the proxy_from_environment field.
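For illustration, a minimal sketch of the kind of fragment that triggers this, assuming a recent Alertmanager configuration schema; the receiver name and URL are placeholders and only the http_config block is the point of interest:

receivers:
- name: example-webhook
  webhook_configs:
  - url: https://webhook.example.internal/alerts
    http_config:
      # Field that the bundled Prometheus operator fails to parse.
      proxy_from_environment: true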
    

This is a clone of issue OCPBUGS-38228. The following is the description of the original issue:

Description of problem:

On the Overview page's getting started resources card, there is an "OpenShift LightSpeed" link when this operator is available on the cluster. The text should be updated to "OpenShift Lightspeed" to be consistent with the operator name.
    

Version-Release number of selected component (if applicable):

4.17.0-0.nightly-2024-08-08-013133
4.16.0-0.nightly-2024-08-08-111530
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Check overview page's getting started resources card,  
    2.
    3.
    

Actual results:

1. There is "OpenShift LightSpeed" link  in "Explore new features and capabilities"
    

Expected results:

1. The text should be "OpenShift Lightspped" to keep consistent with operator name.
    

Additional info:


    

This is a clone of issue OCPBUGS-36140. The following is the description of the original issue:

Description of problem:

GCP private cluster with CCO Passthrough mode failed to install because the cloud-credential operator is degraded:

status:
  conditions:
  - lastTransitionTime: "2024-06-24T06:04:39Z"
    message: 1 of 7 credentials requests are failing to sync.
    reason: CredentialsFailing
    status: "True"
    type: Degraded

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2024-06-21-203120    

How reproducible:

Always    

Steps to Reproduce:

    1.Create GCP private cluster with CCO Passthrough mode, flexy template is private-templates/functionality-testing/aos-4_13/ipi-on-gcp/versioned-installer-xpn-private     
    2.Wait for cluster installation
    

Actual results:

jianpingshu@jshu-mac ~ % oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         23m     Error while reconciling 4.13.0-0.nightly-2024-06-21-203120: the cluster operator cloud-credential is degraded

status:
  conditions:
  - lastTransitionTime: "2024-06-24T06:04:39Z"
    message: 1 of 7 credentials requests are failing to sync.
    reason: CredentialsFailing
    status: "True"
    type: Degraded

jianpingshu@jshu-mac ~ % oc -n openshift-cloud-credential-operator get -o json credentialsrequests | jq -r '.items[] | select(tostring | contains("InfrastructureMismatch") | not) | .metadata.name as $n | .status.conditions // [{type: "NoConditions"}] | .[] | .type + "=" + .status + " " + $n + " " + .reason + ": " + .message' | sort
CredentialsProvisionFailure=True cloud-credential-operator-gcp-ro-creds CredentialsProvisionFailure: failed to grant creds: error while validating permissions: error testing permissions: googleapi: Error 400: Permission commerceoffercatalog.agreements.list is not valid for this resource., badRequest
NoConditions= openshift-cloud-network-config-controller-gcp :
NoConditions= openshift-gcp-ccm :
NoConditions= openshift-gcp-pd-csi-driver-operator :
NoConditions= openshift-image-registry-gcs :
NoConditions= openshift-ingress-gcp :
NoConditions= openshift-machine-api-gcp :    

Expected results:

Cluster installs successfully without the operator becoming degraded.

Additional info:

Some problem PROW CI tests: 
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.14-multi-nightly-gcp-ipi-user-labels-tags-filestore-csi-tp-arm-f14/1805064266043101184
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-4.14-upgrade-from-stable-4.13-gcp-ipi-xpn-fips-f28/1804676149503070208    

 

When platform specific passwords are included in the install-config.yaml they are stored in the generated agent-cluster-install.yaml, which is included in the output of the agent-gather command. These passwords should be redacted.

Description of problem:

When cloning a PVC of 60 GiB, the system autofills the size of the new PVC to 8192 PeB. This size cannot be changed in the UI before starting the clone.

Version-Release number of selected component (if applicable):

CNV - 4.14.3

How reproducible:

always

Steps to Reproduce:

1. Create a VM with a PVC of 60 GiB
2. Power off the VM
3. As a cluster admin, clone the 60 GiB PVC (Storage -> PersistentVolumeClaims -> kebab menu next to the PVC)

Actual results:

The system tries to clone the 60 GiB PVC as an 8192 PeB PVC.

Expected results:

A new PVC of 60 GiB is created.

Additional info:

This seems like the closed BZ 2177979. I will upload a screenshot of the UI.
Here is the YAML for the original PVC.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.bind.immediate.requested: "true"
    cdi.kubevirt.io/storage.contentType: kubevirt
    cdi.kubevirt.io/storage.pod.phase: Succeeded
    cdi.kubevirt.io/storage.populator.progress: 100.0%
    cdi.kubevirt.io/storage.preallocation.requested: "false"
    cdi.kubevirt.io/storage.usePopulator: "true"
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
  creationTimestamp: "2023-12-05T17:34:19Z"
  finalizers:
  - kubernetes.io/pvc-protection
  - provisioner.storage.kubernetes.io/cloning-protection
  labels:
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.14.0
    kubevirt.io/created-by: 60f46f91-2db3-4118-aaba-b1697b29c496
  name: win2k19-base
  namespace: base-images
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: DataVolume
    name: win2k19-base
    uid: 8980e7b7-ce0b-47b4-a7e4-f4c79e984ebe
  resourceVersion: "697047"
  uid: fccb0aa9-8541-4b51-b49e-ddceaa22b68c
spec:
  accessModes:
  - ReadWriteMany
  dataSource:
    apiGroup: cdi.kubevirt.io
    kind: VolumeImportSource
    name: volume-import-source-8980e7b7-ce0b-47b4-a7e4-f4c79e984ebe
  dataSourceRef:
    apiGroup: cdi.kubevirt.io
    kind: VolumeImportSource
    name: volume-import-source-8980e7b7-ce0b-47b4-a7e4-f4c79e984ebe
  resources:
    requests:
      storage: "64424509440"
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block
  volumeName: pvc-dbfc9fe9-5677-469d-9402-c2f3a22dab3f
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 60Gi
  phase: Bound



Here is the YAML for the clone PVC.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
  creationTimestamp: "2023-12-06T14:24:07Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: win2k19-base-clone
  namespace: base-images
  resourceVersion: "1551054"
  uid: f72665c3-6408-4129-82a2-e663d8ecc0cc
spec:
  accessModes:
  - ReadWriteMany
  dataSource:
    apiGroup: ""
    kind: PersistentVolumeClaim
    name: win2k19-base
  dataSourceRef:
    apiGroup: ""
    kind: PersistentVolumeClaim
    name: win2k19-base
  resources:
    requests:
      storage: "9223372036854775807"
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block
status:
  phase: Pending


We frequently receive inquiries regarding the versions of monitoring components (such as Prometheus, Alertmanager, etc.) that are used in a given OCP version.
Currently, obtaining this information requires several manual steps on our part, e.g.:

  • Identify the relevant GitHub repository.
  • Check out the appropriate branch.
  • Locate the file that contains the version.

What if we automate this?

How about a view that displays the versions of all components for all recent OCP versions?

The image registry operator and ingress operator use the `/metrics` endpoint for liveness/readiness probes, which in the case of the former results in a payload of ~100 kB. At scale this can be non-performant, and it is also not best practice. The teams that own these operators should instead introduce health endpoints if these probes are needed (an illustrative probe sketch follows below).
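For illustration, a minimal sketch of probes pointed at a dedicated health endpoint; the /healthz path, port name, and timings are placeholders, not the operators' actual configuration:

readinessProbe:
  httpGet:
    path: /healthz
    port: metrics
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: metrics
  periodSeconds: 10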

This is a clone of issue OCPBUGS-39239. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38918. The following is the description of the original issue:

Description of problem:

   When installing OpenShift 4.16 on vSphere using the IPI method with a template, it fails with the error below:
2024-08-07T09:55:51.4052628Z             "level=debug msg=  Fetching Image...",
2024-08-07T09:55:51.4054373Z             "level=debug msg=  Reusing previously-fetched Image",
2024-08-07T09:55:51.4056002Z             "level=debug msg=  Fetching Common Manifests...",
2024-08-07T09:55:51.4057737Z             "level=debug msg=  Reusing previously-fetched Common Manifests",
2024-08-07T09:55:51.4059368Z             "level=debug msg=Generating Cluster...",
2024-08-07T09:55:51.4060988Z             "level=info msg=Creating infrastructure resources...",
2024-08-07T09:55:51.4063254Z             "level=debug msg=Obtaining RHCOS image file from 'https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.16-9.4/builds/416.94.202406251923-0/x86_64/rhcos-416.94.202406251923-0-vmware.x86_64.ova?sha256=893a41653b66170c7d7e9b343ad6e188ccd5f33b377f0bd0f9693288ec6b1b73'",
2024-08-07T09:55:51.4065349Z             "level=debug msg=image download content length: 12169",
2024-08-07T09:55:51.4066994Z             "level=debug msg=image download content length: 12169",
2024-08-07T09:55:51.4068612Z             "level=debug msg=image download content length: 12169",
2024-08-07T09:55:51.4070676Z             "level=error msg=failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed during pre-provisioning: failed to use cached vsphere image: bad status: 403"

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    All the time in user environment

Steps to Reproduce:

    1. Try a disconnected IPI installation on vSphere using a template.
    2.
    3.
    

Actual results:

    No cluster installation

Expected results:

    Cluster installed with indicated template

Additional info:

    - 4.14 works as expected in customer environment
    - 4.15 works as expected in customer environment

Description of problem:

 The `aws-ebs-csi-driver-node-` pods appear to be failing to deploy far too often in CI recently.

Version-Release number of selected component (if applicable):

    4.14

How reproducible:

  in a statistically significant pattern 

Steps to Reproduce:

    1. run OCP test suite many times for it to matter
    

Actual results:

    fail [github.com/openshift/origin/test/extended/authorization/scc.go:76]: 1 pods failed before test on SCC errors
Error creating: pods "aws-ebs-csi-driver-node-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[5]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, provider restricted-v2: .containers[0].privileged: Invalid value: true: Privileged containers are not allowed, provider restricted-v2: .containers[0].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[0].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider restricted-v2: .containers[1].privileged: Invalid value: true: Privileged containers are not allowed, provider restricted-v2: .containers[1].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[1].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider restricted-v2: .containers[2].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[2].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for DaemonSet.apps/v1/aws-ebs-csi-driver-node -n openshift-cluster-csi-drivers happened 4 times

Expected results:

Test pass 

Additional info:

Link to the regression dashboard - https://sippy.dptools.openshift.org/sippy-ng/component_readiness/capability?baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=SCC&component=oauth-apiserver&confidence=95&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&pity=5&sampleEndTime=2023-12-11%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2023-12-05%2000%3A00%3A00

[sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel]

This is a clone of issue OCPBUGS-34782. The following is the description of the original issue:

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

The following test started to fail frequently in the periodic tests:

External Storage [Driver: pd.csi.storage.gke.io] [Testpattern: Dynamic PV (block volmode)] provisioning should provision storage with pvc data source in parallel

Version-Release number of selected component (if applicable):

    4.15

How reproducible:

    Sometimes, but way too often in the CI

Steps to Reproduce:

    1. Run the periodic-ci-openshift-release-master-nightly-X.X-e2e-gcp-ovn-csi test
    

Actual results:

    Provisioning of some volumes fails with

time="2024-01-05T02:30:07Z" level=info msg="resulting interval message" message="{ProvisioningFailed  failed to provision volume with StorageClass \"e2e-provisioning-9385-e2e-scw2z8q\": rpc error: code = Internal desc = CreateVolume failed to create single zonal disk pvc-35b558d6-60f0-40b1-9cb7-c6bdfa9f28e7: failed to insert zonal disk: unknown Insert disk operation error: rpc error: code = Internal desc = operation operation-1704421794626-60e299f9dba08-89033abf-3046917a failed (RESOURCE_OPERATION_RATE_EXCEEDED): Operation rate exceeded for resource 'projects/XXXXXXXXXXXXXXXXXXXXXXXX/zones/us-central1-a/disks/pvc-501347a5-7d6f-4a32-b0e0-cf7a896f316d'. Too frequent operations from the source resource. map[reason:ProvisioningFailed]}"

Expected results:

    Test passes

Additional info:

    Looks like we're hitting the API quota limits with the test

Failed test run example:

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-ovn-csi/1743082616304701440

Link to Sippy:

https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2023-10-31%2023%3A59%3A59&baseRelease=4.14&baseStartTime=2023-10-04%2000%3A00%3A00&capability=Dynamic%20PV%20%28block%20volmode%29&component=Storage%20%2F%20Kubernetes%20External%20Components&confidence=95&environment=ovn%20no-upgrade%20amd64%20gcp%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=gcp&platform=gcp&sampleEndTime=2024-01-08%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2024-01-02%2000%3A00%3A00&testId=openshift-tests%3A7845229f6a2c8faee6573878f566d2f3&testName=External%20Storage%20%5BDriver%3A%20pd.csi.storage.gke.io%5D%20%5BTestpattern%3A%20Dynamic%20PV%20%28block%20volmode%29%5D%20provisioning%20should%20provision%20storage%20with%20pvc%20data%20source%20in%20parallel%20%5BSlow%5D&upgrade=no-upgrade&upgrade=no-upgrade&variant=standard&variant=standard

Please review the following PR: https://github.com/openshift/ovn-kubernetes/pull/2027

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Observed in 

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-serial-ovn-ipv6/1786198211774386176

 

There was a delay in creating master-0.

Control plane services started on master-2; at this point (as master-0 wasn't yet in a provisioned state) we had two sets of provisioning services provisioning master-0 and presumably stomping on each other.

master-0 never came up.

 

Please review the following PR: https://github.com/openshift/machine-api-provider-openstack/pull/100

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The MCO's gcp-e2e-op-single-node job https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-single-node has been failing consistently since early Jan.

It always fails on TestKernelArguments but that happens to be the first time where it gets the node to reboot, after which the node never comes up, so we don't get must-gather and (for some reason) don't get any console gathers either.

This is only 4.16 and only single node. Doing the same test on HA gcp clusters yield no issues. The test itself doesn't seem to matter as the next test would fail the same way if it was skipped.

This can be reproduced so far only via a 4.16 clusterbot cluster.

Version-Release number of selected component (if applicable):

4.16

How reproducible:

100%

Steps to Reproduce:

    1. install SNO 4.16 cluster
    2. run MCO's TestKernelArguments
    3.
    

Actual results:

Node never comes back up

Expected results:

Test passes

Additional info:

    

The story is to track the routine i18n upload/download tasks which are performed every sprint.

 

A.C.

  - Upload strings to Memsource at the start of the sprint and reach out to the localization team

  - Download translated strings from Memsource when it is ready

  -  Review the translated strings and open a pull request

  -  Open a followup story for next sprint

This is a clone of issue OCPBUGS-32550. The following is the description of the original issue:

Description of problem:

In the Safari browser, when creating an app with either the Pipeline or Build option, the topology shows the status in the left-hand corner of the topology (more details can be seen in the screenshot or video).
    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1. Create an app 
    2. Go to topology 
    3.
    

Actual results:

The UI is distorted; build labels are not in the appropriate position.
    

Expected results:

UI should show labels properly
    

Additional info:

Safari 17.4.1
    

This is a clone of issue OCPBUGS-38439. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38436. The following is the description of the original issue:

Description of problem:

    e980 is a valid system type for the Madrid region but it is not listed as such in the installer.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Easily

Steps to Reproduce:

    1. Try to deploy to mad02 with SysType set to e980
    2. Fail
    3.
    

Actual results:

    Installer exits

Expected results:

    Installer should continue as it's a valid system type.

Additional info:

    

This is a clone of issue OCPBUGS-30218. The following is the description of the original issue:

Description of problem:

Pseudolocalization is not working in console.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Go to any console's page and add '?pseudolocalization=true' suffix to the URL
    2.     3.
    

Actual results:

    The page stays set with the same language

Expected results:

    The page should be displayed in the pseudolocalized language

Additional info:

    Looks like this is the issue https://github.com/MattBoatman/i18next-pseudo/issues/4

Description of problem:

whereabouts reconciler is responsible for reclaiming dangling IPs, and freeing them to be available to allocate to new pods.
This is crucial for scenarios where the number of addresses is limited and dangling IPs prevent Whereabouts from successfully allocating new IPs to new pods.

The reconciliation schedule is currently hard-coded to run once a day, without a user-friendly way to configure it.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Create a Whereabouts reconciler daemon set; there is no way to configure the reconciler schedule.

Steps to Reproduce:

    1. Create a Whereabouts reconciler daemonset
       instructions: https://docs.openshift.com/container-platform/4.14/networking/multiple_networks/configuring-additional-network.html#nw-multus-creating-whereabouts-reconciler-daemon-set_configuring-additional-network

     2. Run `oc get pods -n openshift-multus | grep whereabouts-reconciler`

     3. Run `oc logs whereabouts-reconciler-xxxxx`      

Actual results:

    You can't configure the cron-schedule of the reconciler.

Expected results:

    Be able to modify the reconciler cron schedule.

Additional info:

    The fix for this bug is in two places: whereabouts, and cluster-network-operator.
    From this reason, in order to verify correctly we need to use both fixed components.
    Please read below for more details about how to apply the new configurations.

How to Verify:

    Create a whereabouts-config ConfigMap with a custom value, and check in the
    whereabouts-reconciler pods' logs that it is updated, and triggering the clean up.

Steps to Verify:

    1. Create a Whereabouts reconciler daemonset
    2. Wait for the whereabouts-reconciler pods to be running. (takes time for the daemonset to get created).
    3. See in logs: "[error] could not read file: <nil>, using expression from flatfile: 30 4 * * *"
       This means it uses the hard-coded default value (because no ConfigMap exists yet).
    4. Run: oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/2 * * * *" (the equivalent ConfigMap manifest is shown after these steps)
    5. Check in the logs for: "successfully updated CRON configuration" 
    6. Check that in the next 2 minutes the reconciler runs: "[verbose] starting reconciler run"
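For reference, the ConfigMap created in step 4 can also be applied declaratively. This is simply the manifest equivalent of the oc create configmap command above; the 2-minute cron expression is just the test value used here.

apiVersion: v1
kind: ConfigMap
metadata:
  name: whereabouts-config
  namespace: openshift-multus
data:
  reconciler_cron_expression: "*/2 * * * *"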

 

 

Description of problem:

When installing a cluster, if the CPMS is created with a template that is not a full path, the ControlPlaneMachineSet operator rejects any modification or deletion of the CPMS CR.

Version-Release number of selected component (if applicable):

4.16

How reproducible:

always

Steps to Reproduce:

1. Install cluster with generated FD
2. Once the cluster is installed, attempt to delete the CPMS
    

Actual results:

Deletion of CPMS is rejected due to invalid template definition

Expected results:

Deletion of CPMS completes without error.

Additional info:

The job "pull-ci-openshift-cluster-control-plane-machine-set-operator-main-e2e-vsphere-operator" is currently failing with:

~~~
control plane machine set should be able to be updated
Expected success, but got an error:
    <*errors.StatusError | 0xc000233c20>: 
    admission webhook "controlplanemachineset.machine.openshift.io" denied the request: spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.template: Invalid value: "ci-op-7xjyyytp-91aad-zrdm2-rhcos-generated-region-generated-zone": template must be provided as the full path
    {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {
                SelfLink: "",
                ResourceVersion: "",
                Continue: "",
                RemainingItemCount: nil,
            },
            Status: "Failure",
            Message: "admission webhook \"controlplanemachineset.machine.openshift.io\" denied the request: spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.template: Invalid value: \"ci-op-7xjyyytp-91aad-zrdm2-rhcos-generated-region-generated-zone\": template must be provided as the full path",
            Reason: "Forbidden",
            Details: nil,
            Code: 403,
        },
    } failed [FAILED] Timed out after 60.000s.
~~~
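For illustration, a sketch of the providerSpec fragment the webhook is complaining about, assuming the usual vSphere inventory layout /<datacenter>/vm/<template>; the datacenter name below is hypothetical.

spec:
  template:
    machines_v1beta1_machine_openshift_io:
      spec:
        providerSpec:
          value:
            # rejected: template given as a bare name
            # template: ci-op-7xjyyytp-91aad-zrdm2-rhcos-generated-region-generated-zone
            # accepted: template given as the full inventory path (datacenter name is illustrative)
            template: /dc-1/vm/ci-op-7xjyyytp-91aad-zrdm2-rhcos-generated-region-generated-zone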

As an OpenShift developer, I want to remove the image openshift-proxy-pull-test-container from the build, so that we are not affected by possible bugs during the image build.

we requested the ART team to add this image in the ticket https://issues.redhat.com/browse/ART-2961

 

Please review the following PR: https://github.com/openshift/kubevirt-csi-driver/pull/26

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/264

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-network-operator/pull/2156

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Usernames can contain all kinds of characters that are not allowed in resource names. Hash the name instead and use the hex representation of the result to get a usable identifier.

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Always

Steps to Reproduce:

    1. log in to the web console configured with a login to a 3rd party OIDC provider
    2. go to the User Preferences page / check the logs in the javascript console
    

Actual results:

The User Preferences page shows empty values instead of defaults.
The javascript console reports things like
```
consoleFetch failed for url /api/kubernetes/api/v1/namespaces/openshift-console-user-settings/configmaps/user-settings-kubeadmin r: configmaps "user-settings-kubeadmin" not found
```

Expected results:

   I am able to persist my user preferences. 

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-baremetal-operator/pull/394

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    https://github.com/openshift/api/pull/1829 needs to be backported to 4.15 and 4.14. The API team asked (https://redhat-internal.slack.com/archives/CE4L0F143/p1715024118699869) to have a test before they can review and approve a backport. This bug's goal is to implement an e2e test which would use the connect timeout tuning option.

Version-Release number of selected component (if applicable):

4.17

How reproducible:

Always

Steps to Reproduce:

N/A

Actual results:

 

Expected results:

 

Additional info:

The e2e test could have been a part of the initial implementation PR (https://github.com/openshift/cluster-ingress-operator/pull/1035).

Request for sending data via telemetry

The goal is to collect metrics about OpenShift Lightspeed because we want to understand how users are making use of the product (configuration options they enable) as well as the experience they are having when using it (e.g. response times).

selected_model_info

Represents the llm provider+model the customer is currently using

Labels

  • <provider> one of the supported provider names (e.g. openai, watsonx, azureopenai)
  • <model> one of the supported llm model names (e.g. gpt-3.5-turbo, granite-13b-chat-v2)

The cardinality of the metric is around 6 currently and may grow somewhat as we add supported providers+models in the future (not all provider + model combinations are valid, so it is not the cardinality of models*providers).

model_enabled

Represents all the provider/model combinations the customer has configured in ols (but are not necessarily currently using)

Labels

  • <model>, one of the supported llm model names (e.g. gpt-3.5-turbo, granite-13b-chat-v2)
  • <provider>, one of our 3 supported providers (openai, watsonx, azureopenai)

The cardinality of the metric is around 4 currently since not all provider/model combinations are valid. May grow somewhat as we add supported models in the future.

rest_api_calls_total

number of api calls with path + response code

Labels

  • <status_code>, the http response code returned (e.g. 200, 401, 403, 500)
  • <path>, one of our request paths, e.g. /v1/query, /v1/feedback

cardinality is around 12 (paths times number of likely response codes)
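To make the label shapes concrete, hypothetical sample series could look like the following; the providers, models, paths, and values are illustrative only.

selected_model_info{provider="openai",model="gpt-3.5-turbo"} 1
model_enabled{provider="watsonx",model="granite-13b-chat-v2"} 1
rest_api_calls_total{path="/v1/query",status_code="200"} 42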

Description of problem:

While deploying a cluster with OVNKubernetes or applying a cloud-provider-config change, all OCP nodes got a failing unit on them:

$  oc debug -q node/ostest-h9vbm-master-0 -- chroot  /host sudo systemctl list-units --failed
  UNIT                       LOAD   ACTIVE SUB    DESCRIPTION
● afterburn-hostname.service loaded failed failed Afterburn Hostname
LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
1 loaded units listed.

$ oc debug -q node/ostest-h9vbm-master-0 -- chroot  /host sudo systemctl status afterburn-hostname
× afterburn-hostname.service - Afterburn Hostname
     Loaded: loaded (/etc/systemd/system/afterburn-hostname.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Tue 2023-04-18 11:48:35 UTC; 2h 26min ago
   Main PID: 1309 (code=exited, status=123)
        CPU: 148ms
Apr 18 11:48:35 ostest-h9vbm-master-0 openstack-afterburn-hostname[1314]:     1: maximum number of retries (10) reached
Apr 18 11:48:35 ostest-h9vbm-master-0 openstack-afterburn-hostname[1314]:     2: failed to fetch
Apr 18 11:48:35 ostest-h9vbm-master-0 openstack-afterburn-hostname[1314]:     3: error sending request for url (http://169.254.169.254/latest/meta-data/hostname): error trying to connect: tcp connect error: Network is unreachable (os error 101)
Apr 18 11:48:35 ostest-h9vbm-master-0 openstack-afterburn-hostname[1314]:     4: error trying to connect: tcp connect error: Network is unreachable (os error 101)
Apr 18 11:48:35 ostest-h9vbm-master-0 openstack-afterburn-hostname[1314]:     5: tcp connect error: Network is unreachable (os error 101)
Apr 18 11:48:35 ostest-h9vbm-master-0 openstack-afterburn-hostname[1314]:     6: Network is unreachable (os error 101)
Apr 18 11:48:35 ostest-h9vbm-master-0 hostnamectl[2494]: Too few arguments.
Apr 18 11:48:35 ostest-h9vbm-master-0 systemd[1]: afterburn-hostname.service: Main process exited, code=exited, status=123/n/a
Apr 18 11:48:35 ostest-h9vbm-master-0 systemd[1]: afterburn-hostname.service: Failed with result 'exit-code'.
Apr 18 11:48:35 ostest-h9vbm-master-0 systemd[1]: Failed to start Afterburn Hostname.


$ oc debug -q node/ostest-h9vbm-worker-0-fkxdr -- chroot  /host sudo systemctl list-units --failed
  UNIT                       LOAD   ACTIVE SUB    DESCRIPTION
● afterburn-hostname.service loaded failed failed Afterburn Hostname
LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
1 loaded units listed.

Once the installation or the config change is done, restarting the service resolves the issue:

$ oc debug -q node/ostest-h9vbm-worker-0-fkxdr -- chroot  /host sudo systemctl restart afterburn-hostname

$ oc debug -q node/ostest-h9vbm-worker-0-fkxdr -- chroot  /host sudo systemctl status afterburn-hostname
○ afterburn-hostname.service - Afterburn Hostname
     Loaded: loaded (/etc/systemd/system/afterburn-hostname.service; enabled; preset: disabled)
     Active: inactive (dead) since Tue 2023-04-18 14:14:40 UTC; 9s ago
    Process: 171875 ExecStart=/usr/local/bin/openstack-afterburn-hostname (code=exited, status=0/SUCCESS)
   Main PID: 171875 (code=exited, status=0/SUCCESS)
        CPU: 119ms
Apr 18 14:14:32 ostest-h9vbm-worker-0-fkxdr systemd[1]: Starting Afterburn Hostname...
Apr 18 14:14:39 ostest-h9vbm-worker-0-fkxdr openstack-afterburn-hostname[171876]: Apr 18 14:14:39.521 WARN failed to locate config-drive, using the metadata service API instead
Apr 18 14:14:39 ostest-h9vbm-worker-0-fkxdr openstack-afterburn-hostname[171876]: Apr 18 14:14:39.583 INFO Fetching http://169.254.169.254/latest/meta-data/hostname: Attempt #1
Apr 18 14:14:40 ostest-h9vbm-worker-0-fkxdr openstack-afterburn-hostname[171876]: Apr 18 14:14:40.237 INFO Fetch successful
Apr 18 14:14:40 ostest-h9vbm-worker-0-fkxdr openstack-afterburn-hostname[171876]: Apr 18 14:14:40.237 INFO wrote hostname ostest-h9vbm-worker-0-fkxdr to /dev/stdout
Apr 18 14:14:40 ostest-h9vbm-worker-0-fkxdr systemd[1]: afterburn-hostname.service: Deactivated successfully.
Apr 18 14:14:40 ostest-h9vbm-worker-0-fkxdr systemd[1]: Finished Afterburn Hostname.
error: non-zero exit code from debug container
[stack@undercloud-0 ~]$ oc debug -q node/ostest-h9vbm-master-0 -- chroot  /host sudo systemctl status afterburn-hostname
× afterburn-hostname.service - Afterburn Hostname
     Loaded: loaded (/etc/systemd/system/afterburn-hostname.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Tue 2023-04-18 11:48:35 UTC; 2h 26min ago
   Main PID: 1309 (code=exited, status=123)
        CPU: 148ms

Version-Release number of selected component (if applicable):

Observed on 4.13.0-0.nightly-2023-04-13-171034 and 4.12.13

How reproducible:

Always

Additional info:

Allowing more retries, or spreading them over a longer period, can help resolve this. It seems that with OVN-Kubernetes the network takes some time to become ready, so with the current configuration the retries are exhausted before the network is up.

Must-gather link provided on private comment.

This is a clone of issue OCPBUGS-35727. The following is the description of the original issue:

Business requirement:

We already have a recommendation that checks the default ingress controller certificate after it has expired. From the referenced KCS, it seems that many customers (hundreds) hit this issue, so Oscar Arribas Arribas suggests adding a recommendation that alerts customers before certificate expiration.

Gathering method:

1. Gather all the ingresscontroller objects (we already gather the default ingresscontroller) with the command:
oc get ingresscontrollers -n openshift-ingress-operator
2. Gather the operator auto-generated certificate's validity dates with the commands:

$ oc get ingresscontrollers -n openshift-ingress-operator -o yaml | grep -A1 defaultCertificate
#### empty output here when certificate created by the operator
$ oc get secret router-ca -n openshift-ingress-operator -o yaml | grep crt | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
notBefore=Dec 28 00:00:00 2022 GMT
notAfter=Jan 22 23:59:59 2024 GMT
$ oc get secret router-certs-default -n openshift-ingress -o yaml | grep crt | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
notBefore=Dec 28 00:00:00 2022 GMT
notAfter=Jan 22 23:59:59 2024 GMT

3. Gather the custom certificates' validity dates with the commands below (a scripted variant covering all custom certificates is sketched after them):

$ oc get ingresscontrollers -n openshift-ingress-operator -o yaml | grep -A1 defaultCertificate
    defaultCertificate:
      name: [custom-cert-secret-1]
#### for each [custom-cert-secret] above
$ oc get secret [custom-cert-secret-1] -n openshift-ingress -o yaml | grep crt | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
notBefore=Dec 28 00:00:00 2022 GMT
notAfter=Jan 22 23:59:59 2024 GMT
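A small sketch that iterates over every ingresscontroller's custom defaultCertificate instead of substituting [custom-cert-secret-1] by hand; it assumes the referenced secrets store the certificate under the tls.crt key.

$ for s in $(oc get ingresscontrollers -n openshift-ingress-operator \
      -o jsonpath='{.items[*].spec.defaultCertificate.name}'); do
    echo "== ${s} =="
    oc get secret "${s}" -n openshift-ingress -o jsonpath='{.data.tls\.crt}' \
      | base64 -d | openssl x509 -noout -dates
  done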
 

Other Information:

An RFE to create a cluster alert is under review: https://issues.redhat.com/browse/RFE-4269

This is a clone of issue OCPBUGS-36897. The following is the description of the original issue:

Description of problem:

    4.16 NodePool CEL validation breaks existing/older NodePools

Version-Release number of selected component (if applicable):

   4.16.0

How reproducible:

   100%

Steps to Reproduce:

    1. Deploy 4.16 NodePool CRDs
    2. Create NodePool resource without spec.replicas + spec.autoScaling
    3. 
    

Actual results:

    The NodePool "22276350-mynodepool" is invalid: spec: Invalid value: "object": One of replicas or autoScaling should be set but not both

Expected results:

    NodePool to apply successfully

Additional info:

    Breaking change: https://github.com/openshift/hypershift/pull/3786
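For illustration, a minimal NodePool sketch that trips the new rule. The cluster name, namespace, and release image are hypothetical; the point is only that neither spec.replicas nor spec.autoScaling is set, which older CRDs accepted but the 4.16 CEL rule rejects (setting exactly one of the two fields makes it pass).

apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: 22276350-mynodepool
  namespace: clusters
spec:
  clusterName: 22276350
  management:
    upgradeType: Replace
  platform:
    type: AWS
  release:
    image: quay.io/openshift-release-dev/ocp-release:4.16.0-x86_64
  # neither replicas nor autoScaling is set here;
  # replicas: 2   # uncommenting exactly one of the two satisfies the CEL rule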

Please review the following PR: https://github.com/openshift/cluster-machine-approver/pull/223

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    VolumeSnapshots data is not displayed in the PVC > VolumeSnapshots tab

Version-Release number of selected component (if applicable):

    4.16.0-0.ci-2024-01-05-050911

How reproducible:

    

Steps to Reproduce:

    1. Create a PVC i.e. "my-pvc"
    2. Create a Pod and bind it to the "my-pvc"
    3. Create a VolumeSnapshots and associate it with the "my-pvc"
    4. Go to the PVC detail > VolumeSnapshots tab

Actual results:

  VolumeSnapshots data is not displayed in the PVC > VolumeSnapshots tab

Expected results:

 VolumeSnapshots data should be displayed in the PVC > VolumeSnapshots tab

Additional info:

    

 

We have detected several bugs in Console dynamic plugin SDK v1 as part of Kubevirt plugin PR #1804

These bugs affect dynamic plugins which target Console 4.15+

1. Build errors related to Hot Module Replacement chunk files

ERROR in [entry] [initial] kubevirt-plugin.494371abc020603eb01f.hot-update.js
Missing call to loadPluginEntry

2. Build warnings issued by dynamic-module-import-loader

LOG from @openshift-console/dynamic-plugin-sdk-webpack/lib/webpack/loaders/dynamic-module-import-loader ../node_modules/ts-loader/index.js??ruleSet[1].rules[0].use[0]!./utils/hooks/useKubevirtWatchResource.ts
<w> Detected parse errors in /home/vszocs/work/kubevirt-plugin/src/utils/hooks/useKubevirtWatchResource.ts

3. Build warnings related to PatternFly shared modules

WARNING in shared module @patternfly/react-core
No required version specified and unable to automatically determine one. Unable to find required version for "@patternfly/react-core" in description file (/home/vszocs/work/kubevirt-plugin/node_modules/@openshift-console/dynamic-plugin-sdk/package.json). It need to be in dependencies, devDependencies or peerDependencies.

How to reproduce

1. git clone Kubevirt plugin repo
2. switch to commit containing changes from PR #1804
3. yarn install && yarn dev to update dependencies and start local dev server

Description of problem:

    When no release image is provided on a HostedCluster, the backend hypershift operator picks the latest OCP release image within the operator's support window.

Today this fails due to how the operator selects this default image. For example, the hypershift operator 4.14 does not support 4.15, but 4.15.0-rc.3 is picked as the default release image today.

This is a result of not anticipating that a release candidate could be reported as the latest release. The filter used to pick the latest release needs to consider patch-level releases before the next y-stream release.

Version-Release number of selected component (if applicable):

4.14    

How reproducible:

100%    

Steps to Reproduce:

    1. Create a self-managed HCP cluster and do not specify a release image
    

Actual results:

    The HCP will be rejected because the default release image picked does not fall within the support window.

Expected results:

    The HCP should be created with the latest release image in the support window.

Additional info:

    

Please review the following PR: https://github.com/openshift/multus-admission-controller/pull/78

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-33806. The following is the description of the original issue:

I0516 19:40:24.080597       1 controller.go:156] mbooth-psi-ph2q7-worker-0-9z9nn: reconciling Machine
I0516 19:40:24.113866       1 controller.go:200] mbooth-psi-ph2q7-worker-0-9z9nn: reconciling machine triggers delete
I0516 19:40:32.487925       1 controller.go:115]  "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "controller"="machine-controller" "name"="mbooth-psi-ph2q7-worker-0-9z9nn" "namespace"="openshift-machine-api" "object"={"name":"mbooth-psi-ph2q7-worker-0-9z9nn","namespace":"openshift-machine-api"} "reconcileID"="f477312c-dd62-49b2-ad08-28f48c506c9a"
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x242a275]

goroutine 317 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x29cfb00?, 0x40f1d50?})
        /usr/lib/golang/src/runtime/panic.go:914 +0x21f
sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/services/compute.(*Service).constructPorts(0x3056b80?, 0xc00074d3d0, 0xc0004fe100)
        /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/services/compute/instance.go:188 +0xb5
sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/services/compute.(*Service).DeleteInstance(0xc00074d388, 0xc000c61300?, {0x3038ae8, 0xc0008b7440}, 0xc00097e2a0, 0xc0004fe100)
        /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/services/compute/instance.go:678 +0x42d
github.com/openshift/machine-api-provider-openstack/pkg/machine.(*OpenstackClient).Delete(0xc0001f2380, {0x304f708?, 0xc000c6df80?}, 0xc0008b7440)
        /go/src/sigs.k8s.io/cluster-api-provider-openstack/pkg/machine/actuator.go:341 +0x305
github.com/openshift/machine-api-operator/pkg/controller/machine.(*ReconcileMachine).Reconcile(0xc00045de50, {0x304f708, 0xc000c6df80}, {{{0xc00066c7f8?, 0x0?}, {0xc000dce980?, 0xc00074dd48?}}})
        /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/github.com/openshift/machine-api-operator/pkg/controller/machine/controller.go:216 +0x1cfe
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x3052e08?, {0x304f708?, 0xc000c6df80?}, {{{0xc00066c7f8?, 0xb?}, {0xc000dce980?, 0x0?}}})
        /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0004eb900, {0x304f740, 0xc00045c500}, {0x2ac0340?, 0xc0001480c0?})
        /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0004eb900, {0x304f740, 0xc00045c500})
        /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 269
        /go/src/sigs.k8s.io/cluster-api-provider-openstack/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x565
> kc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-ec.6   True        False         7d3h    Cluster version is 4.16.0-ec.6
> kc -n openshift-machine-api get machines.m mbooth-psi-ph2q7-worker-0-9z9nn -o yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.openshift.io/instance-state: ERROR
    openstack-resourceId: dc08c2a2-cbda-4892-a06b-320d02ec0c6c
  creationTimestamp: "2024-05-16T16:53:16Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-05-16T19:23:44Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: mbooth-psi-ph2q7-worker-0-
  generation: 3
  labels:
    machine.openshift.io/cluster-api-cluster: mbooth-psi-ph2q7
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: mbooth-psi-ph2q7-worker-0
    machine.openshift.io/instance-type: ci.m1.xlarge
    machine.openshift.io/region: regionOne
    machine.openshift.io/zone: ""
  name: mbooth-psi-ph2q7-worker-0-9z9nn
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: machine.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: mbooth-psi-ph2q7-worker-0
    uid: f715dba2-b0b2-4399-9ab6-19daf6407bd7
  resourceVersion: "8391649"
  uid: 6d1ad181-5633-43eb-9b19-7c73c86045c3
spec:
  lifecycleHooks: {}
  metadata: {}
  providerID: openstack:///dc08c2a2-cbda-4892-a06b-320d02ec0c6c
  providerSpec:
    value:
      apiVersion: machine.openshift.io/v1alpha1
      cloudName: openstack
      cloudsSecret:
        name: openstack-cloud-credentials
        namespace: openshift-machine-api
      flavor: ci.m1.xlarge
      image: ""
      kind: OpenstackProviderSpec
      metadata:
        creationTimestamp: null
      networks:
      - filter: {}
        subnets:
        - filter:
            tags: openshiftClusterID=mbooth-psi-ph2q7
      rootVolume:
        diskSize: 50
        sourceUUID: rhcos-4.16
        volumeType: tripleo
      securityGroups:
      - filter: {}
        name: mbooth-psi-ph2q7-worker
      serverGroupName: mbooth-psi-ph2q7-worker
      serverMetadata:
        Name: mbooth-psi-ph2q7-worker
        openshiftClusterID: mbooth-psi-ph2q7
      tags:
      - openshiftClusterID=mbooth-psi-ph2q7
      trunk: true
      userDataSecret:
        name: worker-user-data
status:
  addresses:
  - address: mbooth-psi-ph2q7-worker-0-9z9nn
    type: Hostname
  - address: mbooth-psi-ph2q7-worker-0-9z9nn
    type: InternalDNS
  conditions:
  - lastTransitionTime: "2024-05-16T16:56:05Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2024-05-16T19:24:26Z"
    message: Node drain skipped
    status: "True"
    type: Drained
  - lastTransitionTime: "2024-05-16T17:14:59Z"
    status: "True"
    type: InstanceExists
  - lastTransitionTime: "2024-05-16T16:56:05Z"
    status: "True"
    type: Terminable
  lastUpdated: "2024-05-16T19:23:52Z"
  phase: Deleting

Previously, in OCPBUGS-32105, we fixed a bug where a race between the assisted-installer and the assisted-installer-controller to mark a Node as Joined would result in 30+ minutes of (unlogged) retries by the former if the latter won. This was indistinguishable from the installation process hanging, and it would eventually time out.

This bug has been fixed, but we were unable to reproduce the circumstances that caused it.

However, a reproduction by the customer reveals another problem: we now correctly retry checking the control plane nodes for readiness if we encounter a conflict with another write from assisted-installer-controller. However, we never reload fresh data from assisted-service - data that would show the host has already been updated and thus prevent us from trying to update it again. Therefore, we continue to get a conflict on every retry. (This is at least now logged, so we can see what is happening.)

This also suggests a potential way to reproduce the problem: whenever one control plane node has booted to the point that the assisted-installer-controller is running before the second control plane node has booted to the point that the Node is marked as ready in the k8s API, there is a possibility of a race. There is in fact no need for the write from assisted-installer-controller to come in the narrow window between when assisted-installer reads vs. writes to the assisted-service API, because assisted-installer is always using a stale read.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    The installer now errors out when attempting to use networkType: OpenShiftSDN, but the message still says "deprecated".

Version-Release number of selected component (if applicable):

4.15+    

How reproducible:

100%

Steps to Reproduce:

    1. Attempt to install 4.15+ with networkType: OpenShiftSDN
Observe error in logs: time="2024-03-01T14:37:25Z" level=error msg="failed to fetch Master Machines: failed to load asset \"Install Config\": failed to create install config: invalid \"install-config.yaml\" file: networking.networkType: Invalid value: \"OpenShiftSDN\": networkType OpenShiftSDN is deprecated, please use OVNKubernetes"    

Actual results:

Observe error in logs:

time="2024-03-01T14:37:25Z" level=error msg="failed to fetch Master Machines: failed to load asset \"Install Config\": failed to create install config: invalid \"install-config.yaml\" file: networking.networkType: Invalid value: \"OpenShiftSDN\": networkType OpenShiftSDN is deprecated, please use OVNKubernetes"    

Expected results:

A message more like:

Observe error in logs: time="2024-03-01T14:37:25Z" level=error msg="failed to fetch Master Machines: failed to load asset \"Install Config\": failed to create install config: invalid \"install-config.yaml\" file: networking.networkType: Invalid value: \"OpenShiftSDN\": networkType OpenShiftSDN is not supported, please use OVNKubernetes"

Additional info:
See thread
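For reference, a minimal install-config networking stanza using the supported default looks like this; any install-config that still sets OpenShiftSDN here now fails with the error quoted above.

networking:
  networkType: OVNKubernetes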

This is a clone of issue OCPBUGS-42784. The following is the description of the original issue:

This is a clone of issue OCPBUGS-42745. The following is the description of the original issue:

flowschemas.v1beta3.flowcontrol.apiserver.k8s.io is used in manifests/09_flowschema.yaml

Description of problem:

If there are several ICSP objects in a single file, only the first object in the file is taken into account when converting ICSP to IDMS.

Version-Release number of selected component (if applicable):

    > 4.13

How reproducible:

    It is reproducible; follow the steps below.

Steps to Reproduce:

    1. Create a manifest file with multiple ICSP objects (the ICSP manifest file generated by oc-mirror contains multiple ICSP objects; a minimal two-object example is sketched under Additional info below)
    2. Convert the ICSP to IDMS using below command

       oc adm migrate icsp <file1> --dest-dir=<dest dir>

     
    3. Look at the generated IDMS manifest file in the destination directory; it contains only the first ICSP object from the file, and the rest of the ICSP objects are ignored.
    

Actual results:

Only the first ICSP object is converted to IDMS; the rest are ignored.

Expected results:

 If there are multiple ICSP entries in a single file, converting to IDMS should convert all of the ICSP objects to IDMS.

Additional info:
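A minimal sketch of an input file containing two ICSP objects (the names and registries are illustrative); with the current behavior, only mirror-0 would appear in the IDMS output of oc adm migrate icsp.

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: mirror-0
spec:
  repositoryDigestMirrors:
  - mirrors:
    - mirror.example.com/ocp/openshift4
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
---
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: mirror-1
spec:
  repositoryDigestMirrors:
  - mirrors:
    - mirror.example.com/ubi
    source: registry.access.redhat.com/ubi8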

    

Description of problem:

    The HCP CSR flow allows any CN in the incoming CSR.

Version-Release number of selected component (if applicable):

    4.16.0

How reproducible:

    Using the CSR flow, any name you add to the CN in the CSR will be your username against the Kubernetes API server - check your username using the SelfSubjectRequest API (kubectl auth whoami)

Steps to Reproduce:

    1. Create a CSR with CN=whatever
    2. Get the CSR signed and create a kubeconfig
    3. Using the kubeconfig, kubectl auth whoami shows the whatever CN
    

Actual results:

    any CN in CSR is the username against the cluster

Expected results:

    we should only allow CNs with some known prefix (system:customer-break-glass:...)

Additional info:
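As a sketch of what this looks like in practice, CSRs can be generated as below; the file names and the user name alice are illustrative, and the expectation is that only the second, prefixed form should be accepted after the fix.

# currently accepted (any CN becomes the username):
$ openssl req -new -newkey rsa:2048 -nodes -keyout any.key -out any.csr -subj "/CN=whatever"
# expected to be the only allowed shape (known prefix):
$ openssl req -new -newkey rsa:2048 -nodes -keyout alice.key -out alice.csr -subj "/CN=system:customer-break-glass:alice"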

    

 

Description of problem:

When creating a hypershift cluster in a disconnected environment, the worker node cannot pass the assisted service validation due to an ignition error.

    

Version-Release number of selected component (if applicable):

4.14.z    

How reproducible:

  100 %  

Steps to Reproduce:

    1. The steps to install an HCP cluster are described in the documentation: https://hypershift-docs.netlify.app/labs/dual/mce/agentserviceconfig/#assisted-service-customization
    2.
    3.
    

Actual results:

Node addition fails    

Expected results:

Node should get added to the cluster    

Additional info:

    

This is a clone of issue OCPBUGS-37534. The following is the description of the original issue:

Description of problem:

Prow jobs upgrading from 4.9 to 4.16 are failing when they upgrade from 4.12 to 4.13.

Nodes become NotReady when MCO tries to apply the new 4.13 configuration to the MCPs.

The failing job is: periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.9-azure-ipi-f28

We have reproduced the issue and we found an ordering cycle error in the journal log

Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 systemd-journald.service[838]: Runtime Journal (/run/log/journal/960b04f10e4f44d98453ce5faae27e84) is 8.0M, max 641.9M, 633.9M free.
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found ordering cycle on network-online.target/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on node-valid-hostname.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on ovs-configuration.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on firstboot-osupdate.target/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on machine-config-daemon-firstboot.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on machine-config-daemon-pull.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Job network-online.target/start deleted to break ordering cycle starting with machine-config-daemon-pull.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: Queued start job for default target Graphical Interface.
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: systemd-journald.service: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: (This warning is only shown for the first unit using IP firewalling.)
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: systemd-journald.service: Deactivated successfully.

    

Version-Release number of selected component (if applicable):

    Using IPI on Azure, these are the versions involved in the current issue upgrading from 4.9 to 4.13:
    
      version: 4.13.0-0.nightly-2024-07-23-154444
      version: 4.12.0-0.nightly-2024-07-23-230744
      version: 4.11.59
      version: 4.10.67
      version: 4.9.59

    

How reproducible:

    Always
    

Steps to Reproduce:

    1. Upgrade an IPI on Azure cluster from 4.9 to 4.13. Theoretically, upgrading from 4.12 to 4.13 should be enough, but we reproduced it following the whole path.

    

Actual results:


    Nodes become not ready
$ oc get nodes
NAME                                                 STATUS                        ROLES    AGE     VERSION
ci-op-g94jvswm-cc71e-998q8-master-0                  Ready                         master   6h14m   v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-master-1                  Ready                         master   6h13m   v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-master-2                  NotReady,SchedulingDisabled   master   6h13m   v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-worker-centralus1-c7ngb   NotReady,SchedulingDisabled   worker   6h2m    v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-worker-centralus2-2ppf6   Ready                         worker   6h4m    v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-worker-centralus3-nqshj   Ready                         worker   6h6m    v1.25.16+306a47e

And in the NotReady nodes we can see the ordering cycle error mentioned in the description of this ticket.
    

    

Expected results:

No ordering cycle error should happen and the upgrade should be executed without problems.
    

Additional info:


    

Please review the following PR: https://github.com/openshift/network-tools/pull/108

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-35188. The following is the description of the original issue:

Description of problem:

Now that capi/aws is the default in 4.16+, the old terraform aws configs won't be maintained since there is no way to use them. Users interested in the configs can still access them in the 4.15 branch where they are still maintained as the installer still uses terraform.

Version-Release number of selected component (if applicable):

4.16+

How reproducible:

always

Steps to Reproduce:

1.
2.
3.

Actual results:

terraform aws configs are left in the repo.

Expected results:

Configs are removed.

Additional info:

 

Please review the following PR: https://github.com/openshift/baremetal-operator/pull/327

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-35450. The following is the description of the original issue:

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-12699. The following is the description of the original issue:

Description of problem:

Proxy settings in buildDefaults are preserved in the image

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

A customer's developers need proxy access during builds.
For this they have configured buildDefaults on their cluster as described here: https://docs.openshift.com/container-platform/4.10/cicd/builds/build-configuration.html.
The problem is that buildDefaults.defaultProxy sets the proxy environment variables in uppercase only.
Several Red Hat S2I images use tools that depend on curl, and curl only supports lower-case proxy environment variables, so the defaultProxy settings are not taken into account. To work around this "behavior defect", the customer has configured:
- buildDefaults.env.http_proxy
- buildDefaults.env.https_proxy
- buildDefaults.env.no_proxy
The side effect is that the lowercase environment variables are preserved in the container image. So at runtime the proxy settings are still active, and the customer constantly has to help developers unset them again (for example when using non-FQDN names). This is causing frustration for the customer and their developers.
1. Why can't buildDefaults.defaultProxy set both lower- and uppercase proxy variables?
2. Why are the buildDefaults.env variables preserved in the container image while buildDefaults.defaultProxy is correctly unset/removed from it? As the name implies, "buildDefaults" should only apply during the build, and the settings should be removed before the image is pushed to the registry.
We also shared the following KCS with them:
https://access.redhat.com/solutions/1575513.
The customer was not satisfied with that and responded with the following:
The article does not provide a solution to the problem. It describes the same issue and gives a dirty workaround that developers would have to apply to each individual BuildConfig. This is not wanted.
The fact that we set these envs using buildDefaults is the same workaround, but the core problem remains: the envs are preserved in the container image when using this workaround.
This needs to be addressed by engineering so it is fixed properly.

Actual results:

 

Expected results:

 

Additional info:
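For reference, a minimal sketch of the cluster-wide build configuration described above (proxy URLs are illustrative), showing both defaultProxy and the lowercase env workaround whose values end up preserved in the resulting image.

apiVersion: config.openshift.io/v1
kind: Build
metadata:
  name: cluster
spec:
  buildDefaults:
    defaultProxy:
      httpProxy: http://proxy.example.com:3128
      httpsProxy: http://proxy.example.com:3128
      noProxy: .cluster.local,.svc
    env:
    # lowercase variables added as a workaround; these end up baked into the image
    - name: http_proxy
      value: http://proxy.example.com:3128
    - name: https_proxy
      value: http://proxy.example.com:3128
    - name: no_proxy
      value: .cluster.local,.svc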

 

Description of problem:

CCO reports credsremoved mode in metrics when the cluster is actually in the default mode. 
See https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/47349/rehearse-47349-pull-ci-openshift-cloud-credential-operator-release-4.16-e2e-aws-qe/1744240905512030208 (OCP-31768). 

Version-Release number of selected component (if applicable):

4.16

How reproducible:

Always. 

Steps to Reproduce:

1. Create an AWS cluster with CCO in the default mode (ends up in mint)
2. Get the value of the cco_credentials_mode metric
    

Actual results:

credsremoved    

Expected results:

mint    

Root cause:

The controller-runtime client used in metrics calculator (https://github.com/openshift/cloud-credential-operator/blob/77a68ad01e75162bfa04097b22f80d305c192439/pkg/operator/metrics/metrics.go#L77) is unable to GET the root credentials Secret (https://github.com/openshift/cloud-credential-operator/blob/77a68ad01e75162bfa04097b22f80d305c192439/pkg/operator/metrics/metrics.go#L184) since it is backed by a cache which only contains target Secrets requested by other operators (https://github.com/openshift/cloud-credential-operator/blob/77a68ad01e75162bfa04097b22f80d305c192439/pkg/cmd/operator/cmd.go#L164-L168).

Please review the following PR: https://github.com/openshift/csi-node-driver-registrar/pull/62

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    The olm-operator pod has initialization errors in the logs in a HyperShift deployment. It appears that the --writePackageServerStatusName="" passed in as an argument is being interpreted as \"\" instead of an empty string.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

$ kubectl -n master-coh67vr100a3so6e7erg logs olm-operator-75474cfd48-w2fp5

Actual results:

Several errors that look like this

time="2024-04-19T12:41:32Z" level=error msg="initialization error - failed to ensure name=\"\" - ClusterOperator.config.openshift.io \"\\\"\\\"\" is invalid: metadata.name: Invalid value: \"\\\"\\\"\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" monitor=clusteroperator

Expected results:

    No errors

Additional info:

    

This is a clone of issue OCPBUGS-36495. The following is the description of the original issue:

Description of problem:

With the release of 4.16, the Prometheus adapter[0] is deprecated and there is a new alert[1], ClusterMonitoringOperatorDeprecatedConfig. There need to be better details on how these alerts can be handled, which will reduce support cases.

[0] https://docs.openshift.com/container-platform/4.16/release_notes/ocp-4-16-release-notes.html#ocp-4-16-prometheus-adapter-removed
[1] https://docs.openshift.com/container-platform/4.16/release_notes/ocp-4-16-release-notes.html#ocp-4-16-monitoring-changes-to-alerting-rules

Version-Release number of selected component (if applicable):

4.16

How reproducible:

NA  

Steps to Reproduce:

NA

Actual results:

As per the current configuration, the alert does not provide much clarifying information.

Expected results:

  more information should be provided on how to fix the alert.

Additional info:

As per the discussion, a runbook will be added that will help in better understanding the alert.

Description of problem:

 When we remove the mirror registry's additionalTrustBundle CA (user-ca-bundle) that was passed via install-config.yaml for an agent installer installation,
MCO does not remove the certificate from the nodes.
$ oc version
Client Version: 4.15.23
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.15.23
Kubernetes Version: v1.28.11+add48d0
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.23   True        False         3h2m    Cluster version is 4.15.23

How reproducible:

    Always

Steps to Reproduce:

    1. Create a cluster with the additionalTrustBundle CA in the install-config
    2. Locate the mirror registry CA certificate stored in the node's /etc/pki/ directory
     ~~~
     cd /etc/pki/ca-trust/source/anchors
[root@master1 anchors]# ls -la
total 216
drwxr-xr-x. 2 root root     49 Sep 18 05:23 .
drwxr-xr-x. 4 root root     80 Sep 18 05:20 ..
-rw-------. 1 root root 220593 Sep 18 05:23 openshift-config-user-ca-bundle.crt
    ~~~

    3. Back up and delete the ConfigMap (user-ca-bundle)
     ~~~
   $ oc delete configmap/user-ca-bundle -n openshift-config
configmap "user-ca-bundle" deleted
     ~~~

    4. Observe whether any changes happen at the MCO/MCP level as a result.
    5. Switch to the node and check the same /etc/pki/../ path to see whether the CA is still present
    

Actual results:

The certificate is still present under "/etc/pki/ca-trust/source/anchors" on the nodes. No new MC was generated.

# cd /etc/pki/ca-trust/source/anchors
[root@master1 anchors]# ls -la
total 216
drwxr-xr-x. 2 root root     49 Sep 18 05:23 .
drwxr-xr-x. 4 root root     80 Sep 18 05:20 ..
-rw-------. 1 root root 220593 Sep 18 05:23 openshift-config-user-ca-bundle.crt

[root@master1 anchors]# cat openshift-config-user-ca-bundle.crt | grep "MIID2TCCAsGgAwIBAgIUb1e2U0GXeW5qmTlgzE8SSDvht2YwDQYJKoZIhvcNAQEL"

MIID2TCCAsGgAwIBAgIUb1e2U0GXeW5qmTlgzE8SSDvht2YwDQYJKoZIhvcNAQEL
MIID2TCCAsGgAwIBAgIUb1e2U0GXeW5qmTlgzE8SSDvht2YwDQYJKoZIhvcNAQEL

Expected results:

    A new MC should be created once user-ca-bundle has been removed, and that MC should be rolled out to the nodes. The certificate should be removed from the nodes.

Additional info:

    

Please review the following PR: https://github.com/openshift/sdn/pull/600

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    The button text for VolumeSnapshotContents is incorrect

Version-Release number of selected component (if applicable):

    4.16.0-0.nightly-2024-04-02-182836

How reproducible:

    always

Steps to Reproduce:

    1. Navigate to Storage -> VolumeSnapshotContents page
       /k8s/cluster/snapshot.storage.k8s.io~v1~VolumeSnapshotContent
    2. check the create button text
    3.
    

Actual results:

the text in the button shows 'Create VolumeSnapshot'

Expected results:

the text in the button should be 'Create VolumeSnapshotContents'    

Additional info:

    

Description of problem:
Based on the discussion in https://issues.redhat.com/browse/OCPBUGS-24044
and the discussion in this slack [https://redhat-internal.slack.com/archives/CBWMXQJKD/p1700510945375019|thread] we need to update our CI and some of the work done for mutable scope in NE-621.

Specifically, we need to

  • modify TestScopeChange and TestUnmanagedDNSToManagedDNSInternalIngressController to delete the service on all platforms, as toggling scope is no longer recommended.
  • modify any special behavior added for platformsWithMutableScope.

Version-Release number of selected component (if applicable):

4.15
    

How reproducible:

100%
    

Steps to Reproduce:

  1. Run CI TestUnmanagedDNSToManagedDNSInternalIngressController
  2. Observe failure in unmanaged-migrated-internal  
    

Actual results:

CI tests fail.
    

Expected results:

CI tests shouldn't fail.
    

Additional info:

This is a change from past behavior, as reported in https://issues.redhat.com/browse/OCPBUGS-24044. Further discussion revealed that the new behavior is currently expected but could be restored in the future. Notes to SRE and release notes are needed for this change to behavior.
    

Description of problem:

In ROSA/OCP 4.14.z, attaching the AmazonEC2ContainerRegistryReadOnly policy to the worker nodes (in ROSA's case, it was attached to the ManagedOpenShift-Worker-Role, which is assigned by the installer to all the worker nodes) has no effect on ECR image pulls; the user gets an authentication error. Attaching the policy should ideally remove the need to provide an image-pull-secret, yet the error is resolved only if the user also provides an image-pull-secret.
This is proven to work correctly in 4.12.z, so something seems to have changed in recent OCP versions.

Version-Release number of selected component (if applicable):

4.14.2 (ROSA)

How reproducible:

The issue is reproducible using the below steps.

Steps to Reproduce:

    1. Create a deployment in ROSA or OCP on AWS, pointing at a private ECR repository
    2. The image pulling will fail with Error: ErrImagePull & authentication required errors
    3.

Actual results:

The image pull fails with "Error: ErrImagePull" & "authentication required" errors. However, the image pull is successful only if the user provides an image-pull-secret to the deployment.

Expected results:

The image should be pulled successfully by virtue of the ECR-read-only policy attached to the worker node role; without needing an image-pull-secret. 

Additional info:


In other words:

in OCP 4.13 (and below) if a user adds the ECR:* permissions to the worker instance profile, then the user can specify ECR images and authentication of the worker node to ECR is done using the instance profile. In 4.14 this no longer works.

It is not sufficient, as an alternative, to provide a pull secret in a deployment, because AWS rotates ECR tokens every 12 hours. That is not a viable solution for customers that, until OCP 4.13, did not have to rotate pull secrets constantly.

The experience in 4.14 should be the same as in 4.13 with ECR.

 

The current AWS policy that's used is this one: `arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly`

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetRepositoryPolicy",
                "ecr:DescribeRepositories",
                "ecr:ListImages",
                "ecr:DescribeImages",
                "ecr:BatchGetImage",
                "ecr:GetLifecyclePolicy",
                "ecr:GetLifecyclePolicyPreview",
                "ecr:ListTagsForResource",
                "ecr:DescribeImageScanFindings"
            ],
            "Resource": "*"
        }
    ]
} 

 

 

Description of problem:

Altinfra build jobs are failing

Version-Release number of selected component (if applicable):

4.16

How reproducible:

Always

Steps to Reproduce:

1. Build the installer from master and use the latest nightly 4.16 release image
2. Run the CAPI-enabled installer with FeatureSet CustomNoUpgrade and featureGates: ["ClusterAPIInstall=true"]

    

Actual results:

Cluster fails to complete bootstrap

Expected results:

Cluster is able to install completely

Additional info:

This bug is to track investigation into why altinfra e2e jobs were failing for:
https://prow.ci.openshift.org/job-history/gs/test-platform-results/pr-logs/directory/pull-ci-openshift-installer-master-altinfra-e2e-vsphere-capi-ovn
Upon looking into it, the etcd operator was not being created. We saw the following:

CVO:

402 17:18:59.959209       1 task.go:124] error running apply for etcd "cluster" (108 of 937): failed to get resource type: no matches for kind "Etcd" in version "operator.openshift.io/v1"
E0402 17:19:03.862993       1 task.go:124] error running apply for etcd "cluster" (108 of 937): failed to get resource type: no matches for kind "Etcd" in version "operator.openshift.io/v1"
E0402 17:19:09.157126       1 task.go:124] error running apply for etcd "cluster" (108 of 937): failed to get resource type: no matches for kind "Etcd" in version "operator.openshift.io/v1"
I0402 17:19:20.234944       1 task_graph.go:550] Result of work: [Could not update etcd "cluster" (108 of 937): the server does not recognize this resource, check extension API servers Cluster operator kube-apiserver is not available Cluster operator machine-api is not available Cluster operator authentication is not available Cluster operator image-registry is not available Cluster operator ingress is not available Cluster operator monitoring is not available Cluster operator openshift-apiserver is not available Could not update rolebinding "openshift/cluster-samples-operator-openshift-edit" (536 of 937): resource may have been deleted Could not update oauthclient "console" (597 of 937): the server does not recognize this resource, check extension API servers Could not update imagestream "openshift/driver-toolkit" (659 of 937): resource may have been deleted Could not update role "openshift/copied-csv-viewer" (727 of 937): resource may have been deleted Could not update role "openshift-console-operator/prometheus-k8s" (855 of 937): resource may have been deleted Could not update role "openshift-console/prometheus-k8s" (859 of 937): resource may have been deleted]
I0402 17:19:20.235037       1 sync_worker.go:1166] Update error 108 of 937: UpdatePayloadResourceTypeMissing Could not update etcd "cluster" (108 of 937): the server does not recognize this resource, check extension API servers (*errors.withStack: failed to get resource type: no matches for kind "Etcd" in version "operator.openshift.io/v1")
* Could not update etcd "cluster" (108 of 937): the server does not recognize this resource, check extension API servers 

Description of problem:

Apply an EgressQoS on OCP; the status of the EgressQoS stays empty. Checking the ovnkube pod logs shows errors like the below:

 

I0429 09:39:19.013461    4771 egressqos.go:460] Processing sync for EgressQoS abc/default
I0429 09:39:19.022635    4771 egressqos.go:463] Finished syncing EgressQoS default on namespace abc : 9.174361ms
E0429 09:39:19.028426    4771 egressqos.go:368] failed to update EgressQoS object abc/default with status: Apply failed with 1 conflict: conflict with "ip-10-0-62-24.us-east-2.compute.internal" with subresource "status": .status.conditions
I0429 09:39:19.031526    4771 egressqos.go:460] Processing sync for EgressQoS default/default
I0429 09:39:19.039827    4771 egressqos.go:463] Finished syncing EgressQoS default on namespace default : 8.322774ms
E0429 09:39:19.044060    4771 egressqos.go:368] failed to update EgressQoS object default/default with status: Apply failed with 1 conflict: conflict with "ip-10-0-70-102.us-east-2.compute.internal" with subresource "status": .status.conditions
I0429 09:39:19.052877    4771 egressqos.go:460] Processing sync for EgressQoS abc/default
I0429 09:39:19.055945    4771 egressqos.go:463] Finished syncing EgressQoS default on namespace abc : 3.182828ms
E0429 09:39:19.060563    4771 egressqos.go:368] failed to update EgressQoS object abc/default with status: Apply failed with 1 conflict: conflict with "ip-10-0-62-24.us-east-2.compute.internal" with subresource "status": .status.conditions
I0429 09:39:19.072238    4771 egressqos.go:460] Processing sync for EgressQoS default/default 

 

 

Version-Release number of selected component (if applicable):

4.16

How reproducible:

always

Steps to Reproduce:

1. create egressqos in ns abc

% cat egress_qos.yaml 
kind: EgressQoS
apiVersion: k8s.ovn.org/v1
metadata:
  name: default
  namespace: abc
spec:
  egress:
  - dscp: 46
    dstCIDR: 3.16.78.227/32
  - dscp: 30
    dstCIDR: 0.0.0.0/0 

2. check egressqos 

% oc get egressqos default -o yaml
apiVersion: k8s.ovn.org/v1
kind: EgressQoS
metadata:
  creationTimestamp: "2024-04-29T09:24:55Z"
  generation: 1
  name: default
  namespace: abc
  resourceVersion: "376134"
  uid: f9dfe380-81ee-4edd-845d-49ba2c856e81
spec:
  egress:
  - dscp: 46
    dstCIDR: 3.16.78.227/32
  - dscp: 30
    dstCIDR: 0.0.0.0/0
status: {} 

3. check crd egressqos

% oc get crd egressqoses.k8s.ovn.org -o yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.8.0
  creationTimestamp: "2024-04-29T05:23:12Z"
  generation: 1
  name: egressqoses.k8s.ovn.org
  ownerReferences:
  - apiVersion: operator.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Network
    name: cluster
    uid: 3bfac7ab-ca29-477f-a97f-27592b7e176d
  resourceVersion: "3642"
  uid: 25dabf13-611f-4c29-bf22-4a0b56e4b7f7
spec:
  conversion:
    strategy: None
  group: k8s.ovn.org
  names:
    kind: EgressQoS
    listKind: EgressQoSList
    plural: egressqoses
    singular: egressqos
  scope: Namespaced
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        description: EgressQoS is a CRD that allows the user to define a DSCP value
          for pods egress traffic on its namespace to specified CIDRs. Traffic from
          these pods will be checked against each EgressQoSRule in the namespace's
          EgressQoS, and if there is a match the traffic is marked with the relevant
          DSCP value.
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            properties:
              name:
                pattern: ^default$
                type: string
            type: object
          spec:
            description: EgressQoSSpec defines the desired state of EgressQoS
            properties:
              egress:
                description: a collection of Egress QoS rule objects
                items:
                  properties:
                    dscp:
                      description: DSCP marking value for matching pods' traffic.
                      maximum: 63
                      minimum: 0
                      type: integer
                    dstCIDR:
                      description: DstCIDR specifies the destination's CIDR. Only
                        traffic heading to this CIDR will be marked with the DSCP
                        value. This field is optional, and in case it is not set the
                        rule is applied to all egress traffic regardless of the destination.
                      format: cidr
                      type: string
                    podSelector:
                      description: PodSelector applies the QoS rule only to the pods
                        in the namespace whose label matches this definition. This
                        field is optional, and in case it is not set results in the
                        rule being applied to all pods in the namespace.
                      properties:
                        matchExpressions:
                          description: matchExpressions is a list of label selector
                            requirements. The requirements are ANDed.
                          items:
                            description: A label selector requirement is a selector
                              that contains values, a key, and an operator that relates
                              the key and values.
                            properties:
                              key:
                                description: key is the label key that the selector
                                  applies to.
                                type: string
                              operator:
                                description: operator represents a key's relationship
                                  to a set of values. Valid operators are In, NotIn,
                                  Exists and DoesNotExist.
                                type: string
                              values:
                                description: values is an array of string values.
                                  If the operator is In or NotIn, the values array
                                  must be non-empty. If the operator is Exists or
                                  DoesNotExist, the values array must be empty. This
                                  array is replaced during a strategic merge patch.
                                items:
                                  type: string
                                type: array
                            required:
                            - key
                            - operator
                            type: object
                          type: array
                        matchLabels:
                          additionalProperties:
                            type: string
                          description: matchLabels is a map of {key,value} pairs.
                            A single {key,value} in the matchLabels map is equivalent
                            to an element of matchExpressions, whose key field is
                            "key", the operator is "In", and the values array contains
                            only "value". The requirements are ANDed.
                          type: object
                      type: object
                  required:
                  - dscp
                  type: object
                type: array
            required:
            - egress
            type: object
          status:
            description: EgressQoSStatus defines the observed state of EgressQoS
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
status:
  acceptedNames:
    kind: EgressQoS
    listKind: EgressQoSList
    plural: egressqoses
    singular: egressqos
  conditions:
  - lastTransitionTime: "2024-04-29T05:23:12Z"
    message: no conflicts found
    reason: NoConflicts
    status: "True"
    type: NamesAccepted
  - lastTransitionTime: "2024-04-29T05:23:12Z"
    message: the initial names have been accepted
    reason: InitialNamesAccepted
    status: "True"
    type: Established
  storedVersions:
  - v1 

 

Actual results:

egressqos status is not updated correctly

Expected results:

egressqos status should be updated once applied.

Additional info:

 % oc version
Client Version: 4.16.0-0.nightly-2024-04-26-145258
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.16.0-0.nightly-2024-04-26-145258
Kubernetes Version: v1.29.4+d1ec84a

UDP packets are subject to SNAT in a self-managed OCP 4.13.13 cluster on Azure (OVN-K as CNI) using a LoadBalancer Service with `externalTrafficPolicy: Local`. UDP packets correctly arrive at the Node hosting the Pod, but the source IP seen by the Pod is the OVN GW Router of the Node.
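For reference, a minimal LoadBalancer Service of the kind described above could look like the sketch below (name, namespace, selector and port are placeholders, not values from the customer environment):

apiVersion: v1
kind: Service
metadata:
  name: udp-echo          # placeholder
  namespace: udp-test     # placeholder
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # client source IP should be preserved for local endpoints
  selector:
    app: udp-echo
  ports:
  - protocol: UDP
    port: 5000
    targetPort: 5000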

I've reproduced the customer scenario with the following steps:

This issue is critical because it is blocking customer business.

This is a clone of issue OCPBUGS-41341. The following is the description of the original issue:

This is a clone of issue OCPBUGS-39126. The following is the description of the original issue:

Description of problem:


Difficult to detect in which component I should report this bug. The description is the following.

Today we can install RH operators either in one specific namespace or in all namespaces, which installs the operator in the "openshift-operators" namespace.

If this operator creates a ServiceMonitor that should be scraped by platform Prometheus, it will have token authentication and security configured in its definition.

But if the operator is installed in the "openshift-operators" namespace, it is user workload monitoring that will try to scrape it, since this namespace does not have the corresponding label to be scraped by platform monitoring, and we don't want it to have that label because community operators can also be installed in this namespace.

The result is that user workload monitoring will scrape this namespace and the ServiceMonitors will be skipped, since they are configured with security against platform monitoring and UWM cannot handle this.

A possible workaround is to do:

oc label namespace openshift-operators openshift.io/user-monitoring=false

which loses functionality, since some RH operators will not be monitored if installed in openshift-operators.
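For illustration, the labels involved can be inspected and set with commands along these lines (the openshift.io/cluster-monitoring label name is assumed from standard OpenShift monitoring conventions, not taken from this report):

# show which monitoring-related labels the namespace currently carries
oc get namespace openshift-operators --show-labels

# the workaround above: opt the namespace out of user workload monitoring
oc label namespace openshift-operators openshift.io/user-monitoring=false

# opting the namespace into platform monitoring instead is NOT recommended here,
# because community operators can also live in openshift-operators:
# oc label namespace openshift-operators openshift.io/cluster-monitoring=true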



    

Version-Release number of selected component (if applicable):

 4.16

    

This is a clone of issue OCPBUGS-33803. The following is the description of the original issue:

The MCO currently lays down a file at /etc/mco/internal-registry-pull-secret.json, which is extracted from the machine-os-puller SA into ControllerConfig. It is then templated down to a MachineConfig. For some reason, this SA is now being refreshed every hour or so, causing a new MachineConfig to be generated every hour. This also causes CI issues as the machineconfigpools will randomly update to a new config in the middle of a test.

More context: https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1715888365021729
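A rough way to observe the churn described above (illustrative commands, not part of the original report; <node-name> is a placeholder):

# rendered MachineConfigs should not keep appearing roughly every hour
oc get machineconfigs --sort-by=.metadata.creationTimestamp

# inspect the templated file on a node
oc debug node/<node-name> -- chroot /host cat /etc/mco/internal-registry-pull-secret.json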

Description of problem:

Release controller > 4.14.2 > HyperShift conformance run > gathered assets:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance/1722648207965556736/artifacts/e2e-aws-ovn-conformance/dump/artifacts/namespaces/clusters-e8d2a8003773eacb6a8b/core/pods/logs/kube-apiserver-5f47c7b667-42h2f-audit-logs.log | grep -v '/var/log/kube-apiserver/audit.log. has' | jq -r 'select(.user.username == "system:admin" and .verb == "create" and .requestURI == "/apis/operator.openshift.io/v1/storages") | .userAgent' | sort | uniq -c
     65 hosted-cluster-config-operator-manager
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance/1722648207965556736/artifacts/e2e-aws-ovn-conformance/dump/artifacts/namespaces/clusters-e8d2a8003773eacb6a8b/core/pods/logs/kube-apiserver-5f47c7b667-42h2f-audit-logs.log | grep -v '/var/log/kube-apiserver/audit.log. has' | jq -r 'select(.user.username == "system:admin" and .verb == "create" and .requestURI == "/apis/operator.openshift.io/v1/storages") | .requestReceivedTimestamp + " " + (.responseStatus | (.code | tostring) + " " + .reason)' | head -n5
2023-11-09T17:17:15.130454Z 409 AlreadyExists
2023-11-09T17:17:15.163256Z 409 AlreadyExists
2023-11-09T17:17:15.198908Z 409 AlreadyExists
2023-11-09T17:17:15.230532Z 409 AlreadyExists
2023-11-09T17:17:22.899579Z 409 AlreadyExists

That's banging away pretty hard with creation attempts that keep getting 409ed, presumably because an earlier creation attempt succeeded. If the controller needs very quick latency in re-creation, perhaps an informing watch? If the controller can handle some re-creation latency, perhaps a quieter poll?

Version-Release number of selected component (if applicable):

4.14.2. I haven't checked other releases.

How reproducible:

Likely 100%. I saw similar behavior in an unrelated dump, and confirmed the busy 409s in the first CI run I checked.

Steps to Reproduce:

1. Dump a hosted cluster.
2. Inspect its audit logs for hosted-cluster-config-operator-manager create activity.

Actual results:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance/1722648207965556736/artifacts/e2e-aws-ovn-conformance/dump/artifacts/namespaces/clusters-e8d2a8003773eacb6a8b/core/pods/logs/kube-apiserver-5f47c7b667-42h2f-audit-logs.log | grep -v '/var/log/kube-apiserver/audit.log. has' | jq -r 'select(.userAgent == "hosted-cluster-config-operator-manager" and .verb == "create") | .verb + " " + (.responseStatus.code | tostring)' | sort | uniq -c
    130 create 409

Expected results:

Zero or rare 409 creation request from this user-agent.

Additional info:

The user agent seems to be defined here, so likely the fix will involve changes to that manager.

Please review the following PR: https://github.com/openshift/kubernetes-metrics-server/pull/21

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    Since the golang.org/x/oauth2 package has been upgraded, GCP installs have been failing with 

level=info msg=Credentials loaded from environment variable "GOOGLE_CLOUD_KEYFILE_JSON", file "/var/run/secrets/ci.openshift.io/cluster-profile/gce.json"
level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: [platform.gcp.project: Internal error: failed to create cloud resource service: Get "http://169.254.169.254/computeMetadata/v1/universe/universe_domain": dial tcp 169.254.169.254:80: connect: connection refused, : Internal error: failed to create compute service: Get "http://169.254.169.254/computeMetadata/v1/universe/universe_domain": dial tcp 169.254.169.254:80: connect: connection refused]

Version-Release number of selected component (if applicable):

    4.16/master

How reproducible:

    always

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    The bump has been introduced by https://github.com/openshift/installer/pull/8020

This is a clone of issue OCPBUGS-33882. The following is the description of the original issue:

As of OpenShift 4.16, CRD management is more complex. This is an artifact of improvements made to feature gates and feature sets. David Eads and I agreed that, to avoid confusion, we should aim to stop having CRDs installed via operator repos, and, if their types live in o/api, install them from there instead.

We started this by moving the ControlPlaneMachineSet back to o/api, which is part of the MachineAPI  capability.

Unbeknown to us at the time, the way the installer currently works is that all resources that are rendered get applied by the cluster-bootstrap tool, roughly here, and not by CVO.

Cluster-bootstrap is not capability aware, so it installed the CPMS CRD, which in turn broke the check in the CSR approver that stops it from crashing on MachineAPI-less clusters.

Options for moving forward include:

  • Reverting the move (complex)
  • Making the API render somehow understand capabilities and remove any CRD from a disabled cap
  • Make the cluster-bootstrap tool filter for caps

I'm not sure presently whether the 2nd or 3rd option is better, nor am I sure how the caps would come to be known by the "renderers"; the installer could provide them as args in bootkube.sh.template?
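For context, whichever component ends up filtering would need the enabled capability set, which is readable from the ClusterVersion status, e.g.:

oc get clusterversion version -o jsonpath='{.status.capabilities.enabledCapabilities}'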


Original bug below, description of what's happening above


Description of problem:

After running tests on an SNO with the Telco DU profile for a couple of hours, kubernetes.io/kubelet-serving CSRs in Pending state start showing up and accumulating over time.

Version-Release number of selected component (if applicable):

4.16.0-rc.1    

How reproducible:

once so far    

Steps to Reproduce:

    1. Deploy SNO with DU profile with disabled capabilities:

    installConfigOverrides:  "{\"capabilities\":{\"baselineCapabilitySet\": \"None\", \"additionalEnabledCapabilities\": [ \"NodeTuning\", \"ImageRegistry\", \"OperatorLifecycleManager\" ] }}"

2. Leave the node running tests overnight for a couple of hours

3. Check for Pending CSRs

Actual results:

oc get csr -A | grep Pending | wc -l 
27    

Expected results:

No pending CSRs    

Also oc logs will return a tls internal error:

oc -n openshift-cluster-machine-approver --insecure-skip-tls-verify-backend=true logs machine-approver-866c94c694-7dwks 
Defaulted container "kube-rbac-proxy" out of: kube-rbac-proxy, machine-approver-controller
Error from server: Get "https://[2620:52:0:8e6::d0]:10250/containerLogs/openshift-cluster-machine-approver/machine-approver-866c94c694-7dwks/kube-rbac-proxy": remote error: tls: internal error

Additional info:

Checking the machine-approver-controller container logs on the node, we can see the reconciliation is failing because it cannot find the Machine API, which is disabled via the capabilities.

I0514 13:25:09.266546       1 controller.go:120] Reconciling CSR: csr-dw9c8
E0514 13:25:09.275585       1 controller.go:138] csr-dw9c8: Failed to list machines in API group machine.openshift.io/v1beta1: no matches for kind "Machine" in version "machine.openshift.io/v1beta1"
E0514 13:25:09.275665       1 controller.go:329] "Reconciler error" err="Failed to list machines: no matches for kind \"Machine\" in version \"machine.openshift.io/v1beta1\"" controller="certificatesigningrequest" controllerGroup="certificates.k8s.io" controllerKind="CertificateSigningRequest" CertificateSigningRequest="csr-dw9c8" namespace="" name="csr-dw9c8" reconcileID="6f963337-c6f1-46e7-80c4-90494d21653c"
I0514 13:25:43.792140       1 controller.go:120] Reconciling CSR: csr-jvrvt
E0514 13:25:43.798079       1 controller.go:138] csr-jvrvt: Failed to list machines in API group machine.openshift.io/v1beta1: no matches for kind "Machine" in version "machine.openshift.io/v1beta1"
E0514 13:25:43.798128       1 controller.go:329] "Reconciler error" err="Failed to list machines: no matches for kind \"Machine\" in version \"machine.openshift.io/v1beta1\"" controller="certificatesigningrequest" controllerGroup="certificates.k8s.io" controllerKind="CertificateSigningRequest" CertificateSigningRequest="csr-jvrvt" namespace="" name="csr-jvrvt" reconcileID="decbc5d9-fa10-45d1-92f1-1c999df956ff" 

This is a clone of issue OCPBUGS-39453. The following is the description of the original issue:

This is a clone of issue OCPBUGS-35321. The following is the description of the original issue:

Description of problem:

Customer has updated their cluster to 4.15.11 from 4.11.x. After updating the cluster to OpenShift 4.15.11, the value for vCenter Cluster in the vSphere connection configuration is missing. It should be observable from the GUI.

-> The vCenter cluster name is not displayed in the GUI.

-> We have also checked the cloud-config: everything is in place, but a parameter is missing from the vSphere connection configuration in the OpenShift console.


Please find the attached screenshot for more reference here.

Version-Release number of selected component (if applicable):

 

How reproducible:

Customer has reproduced the issue; we are yet to do so.

Steps to Reproduce:

[x] -- Customer has updated their cluster from 4.11.x to 4.15.11; after the upgrade the cluster looks fine and healthy, but a parameter is missing from the vSphere connection configuration in the OpenShift console, as shown in the attached screenshot.
 

Expected results:

 

Additional info:

 
oc-mirror - maxVersion of the imageset config is ignored for operators

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Create the 2 imagesets that we are using:

_imageset-config-test1-1.yaml:_
~~~
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /local/oc-mirror/test1/metadata
mirror:
  platform:
    architectures:
      - amd64
    graph: true
    channels:
      - name: stable-4.12
        type: ocp
        minVersion: 4.12.1
        maxVersion: 4.12.1
        shortestPath: true
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.12
      packages:
        - name: cincinnati-operator
          channels:
            - name: v1
              minVersion: 5.0.1
              maxVersion: 5.0.1
~~~

_imageset-config-test1-2.yaml:_
~~~
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /local/oc-mirror/test1/metadata
mirror:
  platform:
    architectures:
      - amd64
    graph: true
    channels:
      - name: stable-4.12
        type: ocp
        minVersion: 4.12.1
        maxVersion: 4.12.1
        shortestPath: true
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.12
      packages:
        - name: cincinnati-operator
          channels:
            - name: v1
              minVersion: 5.0.1
              maxVersion: 5.0.1
        - name: local-storage-operator
          channels:
            - name: stable
              minVersion: 4.12.0-202305262042
              maxVersion: 4.12.0-202305262042
        - name: odf-operator
          channels:
            - name: stable-4.12
              minVersion: 4.12.4-rhodf
              maxVersion: 4.12.4-rhodf
        - name: rhsso-operator
          channels:
            - name: stable
              minVersion: 7.6.4-opr-002
              maxVersion: 7.6.4-opr-002
    - catalog: registry.redhat.io/redhat/redhat-marketplace-index:v4.12
      packages:
        - name: k10-kasten-operator-rhmp
          channels:
            - name: stable
              minVersion: 6.0.6
              maxVersion: 6.0.6
  additionalImages:
    - name: registry.redhat.io/rhel8/postgresql-13:1-125
~~~

2. Generate a first .tar file from the first imageset-config file (imageset-config-test1-1.yaml)
oc mirror --config=imageset-config-test1-1.yaml file:///local/oc-mirror/test1

3. Use the first .tar file to populate our registry
oc mirror --from=/root/oc-mirror/test1/mirror_seq1_000000.tar docker://registry-url/oc-mirror1

4.Generate a second .tar file from the second imageset-config file (imageset-config-test1-2.yaml)
oc mirror --config=imageset-config-test1-2.yaml file:///local/oc-mirror/test1

5. Populate the private registry named `oc-mirror1` with the second .tar file:

oc mirror --from=/root/oc-mirror/test1/mirror_seq2_000000.tar docker://registry-url/oc-mirror1

6. Check the catalog index for **odf** and **rhsso** operators

[root@test ~]# oc-mirror list operators --package odf-operator --catalog=registry-url/oc-mirror1/redhat/redhat-operator-index:v4.12 --channel stable-4.12
    VERSIONS
    4.12.7-rhodf
    4.12.8-rhodf
    4.12.4-rhodf
    4.12.5-rhodf
    4.12.6-rhodf

[root@test ~]# oc-mirror list operators --package rhsso-operator --catalog=registry-url/oc-mirror1/redhat/redhat-operator-index:v4.12 --channel stable
    VERSIONS
    7.6.4-opr-002
    7.6.4-opr-003
    7.6.5-opr-001
    7.6.5-opr-002

Actual results:

Check the catalog index for **odf** and **rhsso** operators. oc-mirror is not respecting the minVersion & maxVersion

[root@test ~]# oc-mirror list operators --package odf-operator --catalog=registry-url/oc-mirror1/redhat/redhat-operator-index:v4.12 --channel stable-4.12
    VERSIONS
    4.12.7-rhodf
    4.12.8-rhodf
    4.12.4-rhodf
    4.12.5-rhodf
    4.12.6-rhodf

[root@test ~]# oc-mirror list operators --package rhsso-operator --catalog=registry-url/oc-mirror1/redhat/redhat-operator-index:v4.12 --channel stable
    VERSIONS
    7.6.4-opr-002
    7.6.4-opr-003
    7.6.5-opr-001
    7.6.5-opr-002

Expected results:

oc-mirror should respect the minVersion & maxVersion

[root@test ~]# oc-mirror list operators --package odf-operator --catalog=registry-url/oc-mirror2/redhat/redhat-operator-index:v4.12 --channel stable-4.12
    VERSIONS
    4.12.4-rhodf

[root@test ~]# oc-mirror list operators --package rhsso-operator --catalog=registry-url/oc-mirror2/redhat/redhat-operator-index:v4.12 --channel stable
    VERSIONS
    7.6.4-opr-002

Additional info:

 

Description of problem:

Cluster install fails on ASH, nodes tainted with node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-01-24-133352    

How reproducible:

Always    

Steps to Reproduce:

1. Built a cluster on ASH  
$ oc get node            
NAME                             STATUS     ROLES                  AGE    VERSION
ropatil-261ash1-x9kcj-master-0   NotReady   control-plane,master   7h     v1.29.1+0e0d15b
ropatil-261ash1-x9kcj-master-1   NotReady   control-plane,master   7h1m   v1.29.1+0e0d15b
ropatil-261ash1-x9kcj-master-2   NotReady   control-plane,master   7h1m   v1.29.1+0e0d15b 

$ oc get node -o yaml | grep uninitialized    
      key: node.cloudprovider.kubernetes.io/uninitialized
      key: node.cloudprovider.kubernetes.io/uninitialized
      key: node.cloudprovider.kubernetes.io/uninitialized  

$ oc get po -n openshift-cloud-controller-manager     
NAME                                              READY   STATUS             RESTARTS         AGE
azure-cloud-controller-manager-7b75cbbd64-qzhmm   0/1     CrashLoopBackOff   43 (20s ago)     4h54m
azure-cloud-controller-manager-7b75cbbd64-w5cl8   1/1     Running            70 (2m52s ago)   7h33m
azure-cloud-node-manager-9r8gb                    0/1     CrashLoopBackOff   93 (79s ago)     7h33m
azure-cloud-node-manager-jn8lv                    0/1     CrashLoopBackOff   93 (82s ago)     7h33m
azure-cloud-node-manager-n4vt4                    0/1     CrashLoopBackOff   93 (102s ago)    7h33m  

$ oc -n openshift-cloud-controller-manager logs -f azure-cloud-controller-manager-7b75cbbd64-w5cl8 -c cloud-controller-manager
Error from server: no preferred addresses found; known addresses: []

Actual results:

Cluster install failed on ASH 

Expected results:

Cluster install succeed on ASH    

Additional info:

log-bundle: https://drive.google.com/file/d/1QQwyQ1MxuunZx6AXqOTt6KwYwUk2GW7R/view?usp=sharing 

Description of problem:

AWS HyperShift clusters' nodes cannot join cluster with custom domain name in DHCP Option Set

Version-Release number of selected component (if applicable):

Any

How reproducible:

100%

Steps to Reproduce:

1. Create a VPC for a HyperShift/ROSA HCP cluster in AWS
2. Replace the VPC's DHCP Option Set with one that has a custom domain name (example.com, or really any domain of your choice); a sketch of the AWS CLI calls follows these steps
3. Attempt to install a HyperShift/ROSA HCP cluster with a nodepool
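As referenced in step 2, a hypothetical DHCP option set swap could be done with AWS CLI calls like these (the domain name and all IDs are placeholders):

aws ec2 create-dhcp-options \
  --dhcp-configurations "Key=domain-name,Values=example.com" "Key=domain-name-servers,Values=AmazonProvidedDNS"
aws ec2 associate-dhcp-options --dhcp-options-id dopt-0123456789abcdef0 --vpc-id vpc-0123456789abcdef0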

Actual results:

All EC2 instances will fail to become nodes. They will generate CSR's based on the default domain name - ec2.internal for us-east-1 or ${region}.compute.internal for other regions (e.g. us-east-2.compute.internal)

Expected results:

Either that they become nodes or that we document that custom domain names in DHCP Option Sets are not allowed with HyperShift at this time. There is currently no pressing need for this feature, though customers do use this in ROSA Classic/OCP successfully.

Additional info:

This is a known gap currently in cluster-api-provider-aws (CAPA) https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/1691

Description of problem:

https://github.com/openshift/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#tabledata includes a reference to `pf-c-table__action`, but v1+ of console-dynamic-plugin-sdk requires PatternFly 5, so the reference should be updated to `pf-v5-c-table__action`.

This is a clone of issue OCPBUGS-34819. The following is the description of the original issue:

Some AWS installs are failing to bootstrap due to an issue where CAPA may fail to create load balancer resources, but still declare that infrastructure is ready (see upstream issue for more details).

In these cases, load balancers are failing to be created due to either rate limiting:

 

time="2024-05-25T21:43:07Z" level=debug msg="E0525 21:43:07.975223     356 awscluster_controller.go:280] \"failed to reconcile load balancer\" err=<"
time="2024-05-25T21:43:07Z" level=debug msg="\t[failed to modify target group attribute: Throttling: Rate exceeded" 

or in some cases another error:

time="2024-06-01T06:43:58Z" level=debug msg="E0601 06:43:58.902534     356 awscluster_controller.go:280] \"failed to reconcile load balancer\" err=<"
time="2024-06-01T06:43:58Z" level=debug msg="\t[failed to apply security groups to load balancer \"ci-op-jnqi01di-5feef-92njc-int\": ValidationError: A load balancer ARN must be specified"
time="2024-06-01T06:43:58Z" level=debug msg="\t\tstatus code: 400, request id: 77446593-03d2-40e9-93c0-101590d150c6, failed to create target group for load balancer: DuplicateTargetGroupName: A target group with the same name 'apiserver-target-1717224237' exists, but with different settings" 

We have an upstream PR in progress to retry the reconcile logic for load balancers.

 

Original component readiness report below.

=====

Component Readiness has found a potential regression in install should succeed: cluster bootstrap.

There is no significant evidence of regression

Sample (being evaluated) Release: 4.16
Start Time: 2024-05-28T00:00:00Z
End Time: 2024-06-03T23:59:59Z
Success Rate: 96.60%
Successes: 227
Failures: 8
Flakes: 0

Base (historical) Release: 4.15
Start Time: 2024-02-01T00:00:00Z
End Time: 2024-02-28T23:59:59Z
Success Rate: 99.87%
Successes: 767
Failures: 1
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=Installer%20%2F%20openshift-installer&confidence=95&environment=ovn%20no-upgrade%20amd64%20aws%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&pity=5&platform=aws&sampleEndTime=2024-06-03%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-05-28%2000%3A00%3A00&testId=cluster%20install%3A6ce515c7c732a322333427bf4f5508a5&testName=install%20should%20succeed%3A%20cluster%20bootstrap&upgrade=no-upgrade&variant=standard

Description of problem:

https://issues.redhat.com/browse/OCPBUGS-22710?focusedId=23594559&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-23594559

 
It seems the issue is still here; tested on 4.15.0-0.nightly-2023-12-04-223539, there is a status message for each zone, but there is no summarized status, so moving this back to Assigned.
apbexternalroute yaml file is:

apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
  name: default-route-policy
spec:
  from:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: test
  nextHops:
    static:
    - ip: "172.18.0.8"
    - ip: "172.18.0.9" 

and Status section as below:

% oc get apbexternalroute
NAME                   LAST UPDATE   STATUS
default-route-policy   12s <--- still empty
% oc describe apbexternalroute default-route-policy | tail -n 10
Status:
Last Transition Time: 2023-12-06T02:12:11Z
Messages:
qiowang-120620-gtt85-master-2.c.openshift-qe.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
qiowang-120620-gtt85-master-0.c.openshift-qe.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
qiowang-120620-gtt85-worker-a-55fzx.c.openshift-qe.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
qiowang-120620-gtt85-master-1.c.openshift-qe.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
qiowang-120620-gtt85-worker-b-m98ms.c.openshift-qe.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
qiowang-120620-gtt85-worker-c-vtl8q.c.openshift-qe.internal: configured external gateway IPs: 172.18.0.8,172.18.0.9
Events: <none> 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.

2.

3.

 

Actual results:

 

Expected results:

 

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure 
  2. customer issue / SD
  3. internal RedHat testing failure

 

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

 

If it is a CI failure:

 

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

 

If it is a customer / SD issue:

 

  • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
  • Don’t presume that Engineering has access to Salesforce.
  • Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment.  The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
  • Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).  
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and  dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, labels with “sbr-untriaged”
  • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

Description of problem:

  • Hosted cluster credentialsMode is not manual and cannot create secrets.
  • Currently the Control Plane credentialsMode is the same as the Management Cluster's, but for this feature it should be manual mode on the Hosted Cluster no matter what the credentialsMode of the Management Cluster is.

Version-Release number of selected component (if applicable):

    4.16

How reproducible:

    Always

Steps to Reproduce:

   
 1. Create a CredentialsRequest including the spec.providerSpec.stsIAMRoleARN string (see the sample below).

 2. Observe that the Cloud Credential Operator does not populate a Secret based on the CredentialsRequest.

$ oc get secret -A | grep test-mihuang
#Secret not found.  

$ oc get CredentialsRequest -n openshift-cloud-credential-operator
NAME                                                  AGE
...
test-mihuang                                               44s
    3.
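For illustration, a minimal CredentialsRequest of the shape referenced in step 1 might look roughly like this (the secret name, role ARN and statement entries are placeholders, not values from the original report):

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: test-mihuang
  namespace: openshift-cloud-credential-operator
spec:
  secretRef:
    name: test-mihuang-creds        # placeholder
    namespace: default              # placeholder
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    stsIAMRoleARN: arn:aws:iam::123456789012:role/example-role   # placeholder
    statementEntries:
    - effect: Allow
      action:
      - s3:GetObject
      resource: "*"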
    

Actual results:

    Secret is not created successfully.

Expected results:

    Successfully created the secret on the hosted cluster.

Additional info:

    

This is a clone of issue OCPBUGS-43921. The following is the description of the original issue:

This is a clone of issue OCPBUGS-43898. The following is the description of the original issue:

Description of problem:

OCP 4.17 requires permissions to tag network interfaces (ENIs) on instance creation in support of the Egress IP feature.

ROSA HCP uses managed IAM policies, which are reviewed and gated by AWS. The current policy AWS has applied does not allow us to tag ENIs out of band, only ones that have `red-hat-managed: true`, which are going to be tagged during instance creation.

However, in order to support backwards compatibility for existing clusters, we need to roll out a CAPA patch that allows us to call `RunInstances` with or without the ability to tag ENIs.

Once we backport this to the Z streams, upgrade clusters and rollout the updated policy with AWS, we can then go back and revert the backport.

For more information see https://issues.redhat.com/browse/SDE-4496

Version-Release number of selected component (if applicable):

4.17

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Run OLM on 4.15 cluster
    2.
    3.
    

Actual results:

    OLM pod will panic

Expected results:

    Should run just fine

Additional info:

    This issue is due to a failure to initialize a new map when it is nil.

Please review the following PR: https://github.com/openshift/image-customization-controller/pull/116

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Node Overview Pane not displaying    

Version-Release number of selected component (if applicable):

    

How reproducible:

    In the openshift console, under Compute > Node > Node Details >
the Overview tab does not display

Steps to Reproduce:

 In the openshift console, under Compute > Node > Node Details >
the Overview tab does not display     

Actual results:

    Overview tab does not display   

Expected results:

    Overview tab should display   

Additional info:



Description of problem:


    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Description of problem:

    ‘Oh no! Something went wrong’ will be shown when a user goes to MultiClusterEngine details -> YAML tab

Version-Release number of selected component (if applicable):

    4.14.0-0.nightly-2023-07-20-215234

How reproducible:

    Always

Steps to Reproduce:

1. Install 'multicluster engine for Kubernetes' operator in the cluster
2. Use the default value to create a new MultiClusterEngine
3. Navigate to the MultiClusterEngine details -> Yaml Tab 

   

Actual results: 
‘Oh no! Something went wrong.’ error will be shown with the details below:
TypeError
Description: Cannot read properties of null (reading 'editor')

Expected results:

    no error 

Additional info:

    This bug fix is in conjunction with https://issues.redhat.com/browse/OCPBUGS-22778

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    The storage team added CSI and ephemeral volumes in 4.12 and 4.13, but the affected SCCs are not being reconciled, resulting in these capabilities being unavailable to the expected end users.

Version-Release number of selected component (if applicable):

    4.13+

How reproducible:

    100%

Steps to Reproduce:

    1. Check any of the "anyuid", "hostaccess", "hostmount-anyuid", "hostnetwork", "nonroot", or "restricted" SCCs on a cluster upgraded from 4.11 (see the command below)
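One way to run the check from step 1, using anyuid as an example (the same applies to the other listed SCCs):

oc get scc anyuid -o jsonpath='{.volumes}'
# after reconciliation the list should include "csi" and "ephemeral"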
    

Actual results:

    no "csi" and "ephemeral" in .volumes

Expected results:

    "csi" and "ephemeral" in .volumes

Additional info:

    

This is a clone of issue OCPBUGS-29777. The following is the description of the original issue:

Description of problem:

RWOP accessMode is tech preview feature starting from OCP 4.14 and GA in 4.16. But on OCP console UI, there is not option available for creating a PVC with RWOP accessMode

Version-Release number of selected component (if applicable):

    

How reproducible:

    Always

Steps to Reproduce:

    1. Login to OCP console in Administrator mode (4.14/4.15/4.16)
    2. Go to 'Storage -> PersistentVolumeClaim -> Click on Create PersistentVolumeClaim' 
    3. Check under 'Access Mode*', RWOP option is not present
    

Actual results:

    RWOP accessMode option is not present

Expected results:

    RWOP accessMode option is present

Additional info:

Storage feature: https://issues.redhat.com/browse/STOR-1171
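Until the console exposes the option, an RWOP PVC can still be created from the CLI; a minimal sketch (name, size and storage class are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwop-example
spec:
  accessModes:
  - ReadWriteOncePod          # the RWOP access mode missing from the console dropdown
  resources:
    requests:
      storage: 1Gi
  storageClassName: gp3-csi   # placeholder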

Description of problem:

    The apiserver-url.env file is a dependency of all CCM components. These mostly run on the masters, however, on Azure, they also run on workers.

A recent change in kube (https://github.com/kubernetes/kubernetes/pull/121028) fixed a previous bug, which now means that workers no longer bootstrap, since Kubelet no longer sets an IP address.

To resolve this issue, we need the CNM to be able to talk to KAS outside of the CNI, this works already on masters, but the url env file is missing on workers so they get stuck.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-api-provider-alibaba/pull/48

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/whereabouts-cni/pull/223

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    The design doc for ImageDigestMirrorSet states:
"ImageContentSourcePolicy CRD will be marked as deprecated and will be supported during all of 4.x. Update and coexistence of ImageDigestMirrorSet/ ImageTagMirrorSet and ImageContentSourcePolicy is supported. We encourage users to move to IDMS while supporting both in the cluster, but will not remove ICSP in OCP 4.x.".
see: https://github.com/openshift/machine-config-operator/blob/master/docs/ImageMirrorSetDesign.md#goals

see also:
https://github.com/openshift/enhancements/blob/master/enhancements/api-review/add-new-CRD-ImageDigestMirrorSet-and-ImageTagMirrorSet-to-config.openshift.io.md#update-the-implementation-for-migration-path
for the rationale behind it.

but the hypershift-operator is reading ImageContentSourcePolicy only if no ImageDigestMirrorSet exists on the cluster, see:
https://github.com/openshift/hypershift/blob/main/support/globalconfig/imagecontentsource.go#L101-L102

Version-Release number of selected component (if applicable):

    4.14, 4.15, 4.16

How reproducible:

    100%

Steps to Reproduce:

    1. Set both an ImageContentSourcePolicy and ImageDigestMirrorSet with different content on the management cluster
    2.
    3.    

Actual results:

the hypershift-operator consumes only the ImageDigestMirrorSet content ignoring the ImageContentSourcePolicy one.    

Expected results:

since both ImageDigestMirrorSet and ImageContentSourcePolicy (although deprecated) are still supported on the management cluster, the hypershift-operator should align.    

Additional info:

currently oc-mirror (v1) is only generating imageContentSourcePolicy.yaml without any imageDigestMirrorSet.yaml equivalent breaking the hypershift disconnected scenario on clusters where an IDMS is already there for other reasons.    
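For reference, an ImageDigestMirrorSet equivalent to a simple ICSP entry looks roughly like this (source and mirror registries are placeholders):

apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: example-idms
spec:
  imageDigestMirrors:
  - source: registry.redhat.io/openshift4
    mirrors:
    - mirror.example.com/openshift4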

In three clusters, I am receiving the alert:

"Multiple default storage classes are marked as default. The storage class is chosen for the PVC is depended on version of the cluster.

Starting with OpenShift 4.13, a persistent volume claim (PVC) requesting the default storage class gets the most recently created default storage class if multiple default storage classes exist."

But the alert clearly shows only one default SC:

"Red Hat recommends to set only one storage class as the default one.

Current default storage classes:

ocs-external-storagecluster-ceph-rbd"

This is confirmed with 'oc get sc'

NAME                                             PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ocs-external-storagecluster-ceph-rbd (default)   openshift-storage.rbd.csi.ceph.com      Delete          Immediate           true                   351d
ocs-external-storagecluster-ceph-rbd-windows     openshift-storage.rbd.csi.ceph.com      Delete          Immediate           true                   11d
ocs-external-storagecluster-cephfs               openshift-storage.cephfs.csi.ceph.com   Delete          Immediate           true                   351d
openshift-storage.noobaa.io                      openshift-storage.noobaa.io/obc         Delete          Immediate           false                  351d
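The alert is driven by the default-class annotation; a quick way to see which storage classes actually carry it (using the standard storageclass.kubernetes.io/is-default-class annotation):

oc get sc -o custom-columns='NAME:.metadata.name,DEFAULT:.metadata.annotations.storageclass\.kubernetes\.io/is-default-class'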

Description of problem:

    ResourceYAMLEditor has no create option. This means that it can be used only for editing objects.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Always

Steps to Reproduce:

    1. Use ResourceYAMLEditor in a different page from the details page
    2.
    3.
    

Actual results:
Be able to create the object
See the samples in the sidebar
See 'Create' button instead of 'Save'

Expected results:

    only "save' button and no samples.
Additional info:
    

Description of problem:

The go docs in the install-config's platform.aws.lbType is misleading as well as on the ingress object (oc explain ingresses.config.openshift.io.spec.loadBalancer.platform.aws.type).

Both say:
"When this field is specified, the default ingresscontroller will be created using the specified load-balancer type."

That is true, but what is missing is that ALL ingresscontrollers will be created using the specified load-balancer type by default (not just the default ingresscontroller).

This missing information can be confusing to users.

Version-Release number of selected component (if applicable):

    4.12+

How reproducible:

    100%

Steps to Reproduce:

openshift-install explain installconfig.platform.aws.lbType
- or -
oc explain ingresses.config.openshift.io.spec.loadBalancer.platform.aws.type   

Actual results:

./openshift-install explain installconfig.platform.aws.lbType
KIND:     InstallConfig
VERSION:  v1
RESOURCE: <string>

LBType is an optional field to specify a load balancer type. When this field is specified, the default ingresscontroller will be created using the specified load-balancer type. 

...
[same with ingress.spec.loadBalancer.platform.aws.type]

Expected results:

My suggestion:

./openshift-install explain installconfig.platform.aws.lbType
KIND:     InstallConfig
VERSION:  v1
RESOURCE: <string>

LBType is an optional field to specify a load balancer type. When this field is specified, all ingresscontrollers (including the default ingresscontroller) will be created using the specified load-balancer type by default.

...
[same with ingress.spec.loadBalancer.platform.aws.type]

Additional info:

    Since the change should be the same thing for both the installconfig and ingress object, this bug would handle both.

This is just a minor typo, but since it's in an Info message that will appear on every installation, it should be fixed.

time="2024-05-08T17:30:57-04:00" level=info msg="Waiting up to 15m0s (until 5:45PM EDT) for network infrastructure to become ready..."
time="2024-05-08T17:33:09-04:00" level=info msg="Netork infrastructure is ready" <==

Please review the following PR: https://github.com/openshift/apiserver-network-proxy/pull/47

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of the problem:

The ClusterImageSetRef specified in the SiteConfig CR on the HubCluster mismatches the actual image pulled from quay when trying to install 4.12.x managed SNO clusters. This behavior has not been detected when installing SNO 4.14.x.

How reproducible:

Install 4.12.19 from ACM using clusterImageSetNameRef: "img4.12.19-x86-64-appsub"

Actual results:

Image pulled is actually img4.12.19-multi-x86-64-appsub

 

# oc adm release info 4.12.19 | more
Name:           4.12.19
Digest:         sha256:4db028028aae12cb82b784ae54aaca6888f224b46565406f1a89831a67f42030
Created:        2023-05-24T06:58:32Z
OS/Arch:        linux/amd64
Manifests:      647
Metadata files: 1
 
Pull From: quay.io/openshift-release-dev/ocp-release@sha256:4db028028aae12cb82b784ae54aaca6888f224b46565406f1a89831a67f42030
  Metadata:
     release.openshift.io/architecture: multi
     url: https://access.redhat.com/errata/RHSA-2023:3287
 

 

Expected results:

Image pulled is img4.12.19-x86-64-appsub

# oc adm release info 4.12.19 | more
Name:           4.12.19
Digest:     sha256:41fd42cc8b9f86fc86cc8763dcf27e976299ff632a336d393b8e643bd8a5f967
 
Pull From: quay.io/openshift-release-dev/ocp-release@sha256:41fd42cc8b9f86fc86cc8763dcf27e976299ff632a336d393b8e643bd8a5f967
  Metadata:     
url: https://access.redhat.com/errata/RHSA-2023:3287  

 

 

Description of problem:

    When deploying with a service ID, the installer is unable to query resource groups.

Version-Release number of selected component (if applicable):

    4.13-4.16

How reproducible:

    Easily

Steps to Reproduce:

    1. Create a service ID with seemingly enough permissions to do an IPI install
    2. Deploy to power vs with IPI
    3. Fail
    

Actual results:

    Fail to deploy a cluster with service ID

Expected results:

    cluster create should succeed

Additional info:

    

This is a clone of issue OCPBUGS-37102. The following is the description of the original issue:

Description of problem:

Enabling KMS for IBM Cloud will result in the kube-apiserver failing with the following configuration error:

17:45:45 E0711 17:43:00.264407       1 run.go:74] "command failed" err="error while parsing file: resources[0].providers[0]: Invalid value: config.ProviderConfiguration{AESGCM:(*config.AESConfiguration)(nil), AESCBC:(*config.AESConfiguration)(nil), Secretbox:(*config.SecretboxConfiguration)(nil), Identity:(*config.IdentityConfiguration)(0x89b4c60), KMS:(*config.KMSConfiguration)(0xc000ff1900)}: more than one provider specified in a single element, should split into different list elements"
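The error indicates the generated encryption configuration put two providers into a single list element; for reference, a valid shape keeps each provider as its own element, roughly like the sketch below (plugin name, socket path and timeout are placeholders, not the actual generated config):

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - kms:
      apiVersion: v2
      name: ibm-kp                                       # placeholder plugin name
      endpoint: unix:///var/run/kmsplugin/socket.sock    # placeholder socket
      timeout: 3s
  - identity: {}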

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-33383. The following is the description of the original issue:

Description of problem:

    Admission webhook warning on creation of Route - violates policy 299 - unknown field "metadata.defaultAnnotations"
Admission webhook warning on creation of buildConfig - violates policy 299 - unknown field "spec.source.git.type"

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Navigate to Import from git form and create a deployment
    2. See the `Admission webhook warning` toast notification
    

Actual results:

    Admission webhook warning - violates policy 299 - unknown field "metadata.defaultAnnotations" showing up on creation of Route, and Admission webhook warning on creation of buildConfig - violates policy 299 - unknown field "spec.source.git.type"

Expected results:

    No Admission webhook warning should show

Additional info:

    

Description of problem:

Copying BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2250911 on OCP side (as fix is needed on console).

[UI] In the openshift-storage-client namespace, an RBD PVC with 'RWX' access mode and volume mode 'Filesystem' can be created from the client. However, this is an invalid combination for RBD PVC creation: in the ODF Operator UI on other platforms, the volume mode selector is not shown when the CephRBD storage class and RWX access mode are selected. In the client operator view it is shown, and the resulting PVC gets stuck in Pending state.

Version-Release number of selected component (if applicable):

    

How reproducible:

 

Steps to Reproduce:

1. Deploy Provider Client setup.
2. From the UI create a PVC: select storage class ceph-rbd and the RWX access mode, then check the volume mode field. In the case of this bug both 'Filesystem' and 'Block' volume modes are visible in the UI; select volume mode 'Filesystem' and create the PVC.

Actual results:

The PVC is created and gets stuck in Pending status.
The PVC event shows an error like:
 Generated from openshift-storage-client.rbd.csi.ceph.com_csi-rbdplugin-provisioner-6d9dcb9fc7-vjj22_2bd4ede5-9418-4c8e-80ae-169b5cb4fa80 12 times in the last 13 minutes
failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = multi node access modes are only supported on rbd `block` type volumes

Expected results:

The volume mode field should not be visible on the page when a PVC with the RWX access mode and the RBD storage class is selected.

Additional info:

Screenshots are attached to the BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2250911

https://bugzilla.redhat.com/show_bug.cgi?id=2250911#c3
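For reference, a minimal PVC sketch that reproduces the invalid combination from the CLI (the namespace and storage class names are taken from the description and error above; the PVC name is illustrative):

oc apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-rwx-filesystem-test
  namespace: openshift-storage-client
spec:
  accessModes:
  - ReadWriteMany
  volumeMode: Filesystem
  storageClassName: ocs-storagecluster-ceph-rbd
  resources:
    requests:
      storage: 1Gi
EOF

Such a PVC is expected to stay Pending with the same "multi node access modes are only supported on rbd `block` type volumes" event.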

Description of problem:

    Due to recent changes in tuned (https://github.com/openshift/cluster-node-tuning-operator/pull/1045/), the profiles directory was moved from /etc/tuned/<openshift-node-performance-performance-profile-name> to /var/lib/ocp-tuned/profiles/<openshift-node-performance-performance-profile-name>

Version-Release number of selected component (if applicable):

    4.16.0

How reproducible:

    everytime

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    If the authentication.config/cluster Type=="" but the OAuth/User APIs are already missing, the console-operator won't update the authentication.config/cluster status with its own client as it's crashing on being unable to retrieve OAuthClients.

Version-Release number of selected component (if applicable):

    4.15.0

How reproducible:

    100%

Steps to Reproduce:

    1. scale oauth-apiserver to 0
    2. set featuregates to TechPreviewNoUpgrade
    3. watch the authentication.config/cluster .status.oidcClients (a command sketch follows below)
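The steps above as concrete commands (a sketch; the deployment and object names assume the stock openshift-oauth-apiserver and config.openshift.io resources):

$ oc scale deployment/apiserver -n openshift-oauth-apiserver --replicas=0
# Note: the TechPreviewNoUpgrade feature set cannot be reverted once applied.
$ oc patch featuregate/cluster --type merge -p '{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'
$ oc get authentication.config/cluster -o jsonpath='{.status.oidcClients}'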

Actual results:

    The client for the console does not appear.

Expected results:

    The client for the console should appear.    

Additional info:

    

Description of problem

Debug into one of the worker nodes on the hosted cluster:

oc debug node/ip-10-1-0-97.ca-central-1.compute.internal

nslookup kubernetes.default.svc.cluster.local
Server:         10.1.0.2
Address:        10.1.0.2#53

** server can't find kubernetes.default.svc.cluster.local: NXDOMAIN

curl -k https://172.30.0.1:443/readyz
curl: (7) Failed to connect to 172.30.0.1 port 443: Connection refused

sh-5.1# curl -k https://172.20.0.1:443/readyz
ok

Version-Release number of selected component (if applicable):

4.15.20

Steps to Reproduce:

Unknown

Actual results:

Pods on a hosted cluster's workers unable to connect to their internal kube apiserver via the service IP.

Expected results:

Pods on a hosted cluster's workers have connectivity to their kube apiserver via the service IP.

Additional info:

Checked the "Konnectivity server" logs on Dynatrace and found the error below occurs repeatedly

E0724 01:02:00.223151       1 server.go:895] "DIAL_RSP contains failure" err="dial tcp 172.30.176.80:8443: i/o timeout" dialID=8375732890105363305 agentID="1eab211f-6ea1-46ea-bc78-14d75d6ba325"

E0724 01:02:00.223482       1 tunnel.go:150] "Received failure on connection" err="read tcp 10.128.17.15:8090->10.128.82.107:52462: use of closed network connection" 
  • It looks like the konnectivity server is trying to establish a connection to 172.30.176.80:8443 but is timing out
  • The second error indicates that an existing network connection was closed unexpectedly

Relevant OHSS Ticket: https://issues.redhat.com/browse/OHSS-36053

Slack thread discussion

We shut down the bootstrap node before the control plane hosts are provisioned:

Apr 24 17:30:05 localhost.localdomain master-bmh-update.sh[10498]: openshift-machine-api   openshift-4                                            true             8m24s
Apr 24 17:30:25 localhost.localdomain master-bmh-update.sh[4461]: Waiting for 2 masters to become provisioned
Apr 24 17:30:25 localhost.localdomain master-bmh-update.sh[10602]: NAMESPACE               NAME          STATE          CONSUMER                  ONLINE   ERROR   AGE
Apr 24 17:30:25 localhost.localdomain master-bmh-update.sh[10602]: openshift-machine-api   openshift-0   provisioning   cluster4-59zbh-master-0   true             8m46s
Apr 24 17:30:25 localhost.localdomain master-bmh-update.sh[10602]: openshift-machine-api   openshift-1   provisioning   cluster4-59zbh-master-1   true             8m45s
Apr 24 17:30:25 localhost.localdomain master-bmh-update.sh[10602]: openshift-machine-api   openshift-2   provisioning   cluster4-59zbh-master-2   true             8m45s
Apr 24 17:30:25 localhost.localdomain master-bmh-update.sh[10602]: openshift-machine-api   openshift-3                                            true             8m44s
Apr 24 17:30:25 localhost.localdomain master-bmh-update.sh[10602]: openshift-machine-api   openshift-4                                            true             8m44s
Apr 24 17:30:45 localhost.localdomain master-bmh-update.sh[4461]: Stopping provisioning services...
Apr 24 17:30:45 localhost.localdomain master-bmh-update.sh[10708]: deactivating
Apr 24 17:30:45 localhost.localdomain master-bmh-update.sh[4461]: Unpause all baremetal hosts
Apr 24 17:30:45 localhost.localdomain master-bmh-update.sh[10724]: baremetalhost.metal3.io/openshift-0 annotated
Apr 24 17:30:45 localhost.localdomain master-bmh-update.sh[10724]: baremetalhost.metal3.io/openshift-1 annotated
Apr 24 17:30:45 localhost.localdomain master-bmh-update.sh[10724]: baremetalhost.metal3.io/openshift-2 annotated
Apr 24 17:30:45 localhost.localdomain master-bmh-update.sh[10724]: baremetalhost.metal3.io/openshift-3 annotated
Apr 24 17:30:45 localhost.localdomain master-bmh-update.sh[10724]: baremetalhost.metal3.io/openshift-4 annotated
Apr 24 17:30:45 localhost.localdomain systemd[1]: Finished Update master BareMetalHosts with introspection data. 

Description of problem:

PipelineRun logs page navigation is broken when navigating through the tasks on the PipelineRun Logs tab.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Navigate to PipelineRuns details page and select the Logs tab.
    2. Navigate through the tasks of the PipelineRun
    

Actual results:

- The Details tab becomes active on selection of any task
- The Logs page becomes empty on selection of the Logs tab again
- The last task is not selected for completed PipelineRuns

Expected results:

- The Logs tab should stay active while the user is on the Logs tab
- The last task should be selected for completed PipelineRuns

Additional info:

  It is a regression after a change in the tab-selection logic of the HorizontalNav component.
 

https://github.com/openshift/console/pull/13216/files#diff-267d61f330ad6cd9b0f2d743d9ff27929fbe7001780d73e1ec88599d3778eb96R177-R190

Video- https://drive.google.com/file/d/15fx9GWO2dRh4uaibRmZ4VTk4HFxQ7NId/view?usp=sharing

 

 

Description of problem:

When adding parameters to a pipeline there is an error when trying to save.

It seems a resources[] section is added; this doesn't happen when the pipeline is created from YAML with the oc client.

Discussed with Vikram Raj

Version-Release number of selected component (if applicable):

    4.14.12

How reproducible:

    Always

Steps to Reproduce:

    1.Create a pipeline
    2.Add a parameter
    3.Save the pipeline
    

Actual results:

    Error shown

Expected results:

    Save successful

Additional info:
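
A parameterized Pipeline created directly from YAML with the oc client saves without the error, since no resources[] section is added. A minimal sketch (assuming the tekton.dev/v1 API served by the OpenShift Pipelines operator; all names are illustrative):

oc apply -f - <<'EOF'
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: example-pipeline
spec:
  params:
  - name: message
    type: string
    default: hello
  tasks:
  - name: echo
    taskSpec:
      steps:
      - name: print
        image: registry.access.redhat.com/ubi9/ubi-minimal
        script: echo hello
EOF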

    

This is a clone of issue OCPBUGS-34689. The following is the description of the original issue:

Description of problem:

   The customer is running OpenShift on AHV, and their Tenable security scan reported the following vulnerability on the Nutanix Cloud Controller Manager deployment:
https://www.tenable.com/plugins/nessus/42873 on port 10258 - SSL Medium Strength Cipher Suites Supported (SWEET32)
The Nutanix Cloud Controller Manager deployment runs two pods and exposes port 10258 to the outside world.
sh-4.4# netstat -ltnp|grep -w '10258'
tcp6       0      0 :::10258                :::*                    LISTEN      10176/nutanix-cloud
sh-4.4# ps aux|grep 10176
root       10176  0.0  0.2 1297832 59764 ?       Ssl  Feb15   4:40 /bin/nutanix-cloud-controller-manager --v=3 --cloud-provider=nutanix --cloud-config=/etc/cloud/nutanix_config.json --controllers=* --configure-cloud-routes=false --cluster-name=trulabs-8qmx4 --use-service-account-credentials=true --leader-elect=true --leader-elect-lease-duration=137s --leader-elect-renew-deadline=107s --leader-elect-retry-period=26s --leader-elect-resource-namespace=openshift-cloud-controller-manager
root     1403663  0.0  0.0   9216  1100 pts/0    S+   14:17   0:00 grep 10176


[centos@provisioner-trulabs-0-230518-065321 ~]$ oc get pods -A -o wide | grep nutanix
openshift-cloud-controller-manager                 nutanix-cloud-controller-manager-5c4cdbb9c-jnv7c            1/1     Running     0                4d18h   172.17.0.249   trulabs-8qmx4-master-1       <none>           <none>
openshift-cloud-controller-manager                 nutanix-cloud-controller-manager-5c4cdbb9c-vtrz5            1/1     Running     0                4d18h   172.17.0.121   trulabs-8qmx4-master-0       <none>           <none>


[centos@provisioner-trulabs-0-230518-065321 ~]$ oc describe pod -n openshift-cloud-controller-manager                 nutanix-cloud-controller-manager-5c4cdbb9c-jnv7c
Name:                 nutanix-cloud-controller-manager-5c4cdbb9c-jnv7c
Namespace:            openshift-cloud-controller-manager
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      cloud-controller-manager
Node:                 trulabs-8qmx4-master-1/172.17.0.249
Start Time:           Thu, 15 Feb 2024 19:24:52 +0000
Labels:               infrastructure.openshift.io/cloud-controller-manager=Nutanix
                      k8s-app=nutanix-cloud-controller-manager
                      pod-template-hash=5c4cdbb9c
Annotations:          operator.openshift.io/config-hash: b3e08acdcd983115fe7a2b94df296362b20c35db781c8eec572fbe24c3a7c6aa
Status:               Running
IP:                   172.17.0.249
IPs:
  IP:           172.17.0.249
Controlled By:  ReplicaSet/nutanix-cloud-controller-manager-5c4cdbb9c
Containers:
  cloud-controller-manager:
    Container ID:  cri-o://f5c0f39e1907093c9359aa2ac364c5bcd591918b06103f7955b30d350c730a8a
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f3e7b600d94d1ba0be1edb328ae2e32393acba819742ac3be5e6979a3dcbf4c
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f3e7b600d94d1ba0be1edb328ae2e32393acba819742ac3be5e6979a3dcbf4c
    Port:          10258/TCP
    Host Port:     10258/TCP
    Command:
      /bin/bash
      -c
      #!/bin/bash
      set -o allexport
      if [[ -f /etc/kubernetes/apiserver-url.env ]]; then
        source /etc/kubernetes/apiserver-url.env
      fi
      exec /bin/nutanix-cloud-controller-manager \
        --v=3 \
        --cloud-provider=nutanix \
        --cloud-config=/etc/cloud/nutanix_config.json \
        --controllers=* \
        --configure-cloud-routes=false \
        --cluster-name=$(OCP_INFRASTRUCTURE_NAME) \
        --use-service-account-credentials=true \
        --leader-elect=true \
        --leader-elect-lease-duration=137s \
        --leader-elect-renew-deadline=107s \
        --leader-elect-retry-period=26s \
        --leader-elect-resource-namespace=openshift-cloud-controller-manager

    State:          Running
      Started:      Thu, 15 Feb 2024 19:24:56 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     200m
      memory:  128Mi
    Environment:
      OCP_INFRASTRUCTURE_NAME:   trulabs-8qmx4
      NUTANIX_SECRET_NAMESPACE:  openshift-cloud-controller-manager
      NUTANIX_SECRET_NAME:       nutanix-credentials
      POD_NAMESPACE:             openshift-cloud-controller-manager (v1:metadata.namespace)
    Mounts:
      /etc/cloud from nutanix-config (ro)
      /etc/kubernetes from host-etc-kube (ro)
      /etc/pki/ca-trust/extracted/pem from trusted-ca (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4ht28 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  nutanix-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cloud-conf
    Optional:  false
  trusted-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ccm-trusted-ca
    Optional:  false
  host-etc-kube:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes
    HostPathType:  Directory
  kube-api-access-4ht28:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              node-role.kubernetes.io/master=
Tolerations:                 node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.cloudprovider.kubernetes.io/uninitialized:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                             node.kubernetes.io/not-ready:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:                      <none>


Medium Strength Ciphers (> 64-bit and < 112-bit key, or 3DES)

    Name                          Code             KEX           Auth     Encryption             MAC
    ----------------------        ----------       ---           ----     ---------------------  ---
    ECDHE-RSA-DES-CBC3-SHA        0xC0, 0x12       ECDH          RSA      3DES-CBC(168)          SHA1
    DES-CBC3-SHA                  0x00, 0x0A       RSA           RSA      3DES-CBC(168)          SHA1

The fields above are :

  {Tenable ciphername}
  {Cipher ID code}
  Kex={key exchange}
  Auth={authentication}
  Encrypt={symmetric encryption method}
  MAC={message authentication code}
  {export flag}


[centos@provisioner-trulabs-0-230518-065321 ~]$ curl -v telnet://172.17.0.2:10258
* About to connect() to 172.17.0.2 port 10258 (#0)
*   Trying 172.17.0.2...
* Connected to 172.17.0.2 (172.17.0.2) port 10258 (#0)

Version-Release number of selected component (if applicable):

    

How reproducible:

The Nutanix CCM pod running in the OCP cluster does not set the "--tls-cipher-suites" option.

Steps to Reproduce:

Create an OCP Nutanix cluster.

Actual results:

Running the CLI below returns nothing.
$ oc describe pod -n openshift-cloud-controller-manager nutanix-cloud-controller-manager-... | grep "\--tls-cipher-suites"

Expected results:

   Expect the Nutanix CCM deployment to set the proper "--tls-cipher-suites" option.

Additional info:
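
One way to double-check whether the weak ciphers flagged by the scan are still accepted, and what the expected argument typically looks like on Kubernetes components (a sketch; <node-ip> is a placeholder for a control-plane node address):

$ openssl s_client -connect <node-ip>:10258 -tls1_2 -cipher 'ECDHE-RSA-DES-CBC3-SHA' </dev/null
# A failed handshake means the 3DES cipher is rejected.
# Illustrative form of the expected deployment argument (cipher list is an example only):
#   --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384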

    

Description of problem:

Unable to run oc commands in FIPS enable OCP cluster on PowerVS

Version-Release number of selected component (if applicable):

4.15.0-ec2

How reproducible:

Deploy OCP cluster with FIPS enabled

Steps to Reproduce:

1. Enable the var in var.tfvars - fips_compliant      = true
2. Deploy the cluster
3. run oc commands

Actual results:

[root@rdr-swap-fips-syd05-bastion-0 ~]# oc version
FIPS mode is enabled, but the required OpenSSL library is not available

[root@rdr-swap-fips-syd05-bastion-0 ~]# oc debug node/syd05-master-0.rdr-swap-fips.ibm.com
FIPS mode is enabled, but the required OpenSSL library is not available

[root@rdr-swap-fips-syd05-bastion-0 ~]# fips-mode-setup --check
FIPS mode is enabled.

Expected results:

# oc debug node/syd05-master-0.rdr-swap-fips1.ibm.com
Temporary namespace openshift-debug-dns7d is created for debugging node...
Starting pod/syd05-master-0rdr-swap-fips1ibmcom-debug-hs4dr ...
To use host binaries, run `chroot /host`
Pod IP: 193.168.200.9

Additional info:

Not able to collect must gather logs due to the issue

links - https://access.redhat.com/solutions/7034387

Starting with this 4.16-ci payload, we see these failures (examples shown below).  It happens on aws, azure, and gcp:

4.16.0-0.ci-2024-03-16-025152  Rejected 38 hours ago  03-16T02:51:52Z   https://amd64.ocp.releases.ci.openshift.org/releasestream/4.16.0-0.ci/release/4.16.0-0.ci-2024-03-16-025152

  aggregated-aws-ovn-upgrade-4.16-minor Failed

    https://prow.ci.openshift.org/view/gs/test-platform-results/logs/aggregated-aws-ovn-upgrade-4.16-minor-release-openshift-release-analysis-aggregator/1768832918135771136

    Failed: suite=[openshift-tests], [sig-auth] all workloads in ns/openshift-must-gather-smq72 must set the 'openshift.io/required-scc' annotation

  aggregated-azure-sdn-upgrade-4.16-minor Failed

    https://prow.ci.openshift.org/view/gs/test-platform-results/logs/aggregated-azure-sdn-upgrade-4.16-minor-release-openshift-release-analysis-aggregator/1768832928868995072

    Failed: suite=[openshift-tests], [sig-auth] all workloads in ns/openshift-must-gather-494qg must set the 'openshift.io/required-scc' annotation

 

This looks like the culprit: https://github.com/openshift/origin/pull/28589 ; revert = https://github.com/openshift/origin/pull/28659.

Description of problem:

During the destroy cluster operation, unexpected results from the IBM Cloud API calls for disks can result in panics when response data (or the responses themselves) are missing, causing unexpected failures during destroy.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Unknown, dependent on IBM Cloud API responses

Steps to Reproduce:

1. Successfully create IPI cluster on IBM Cloud
2. Attempt to cleanup (destroy) the cluster

Actual results:

Golang panic attempting to parse an HTTP response that is missing or lacking data.


level=info msg=Deleted instance "ci-op-97fkzvv2-e6ed7-5n5zg-master-0"
E0918 18:03:44.787843      33 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 228 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x6a3d760?, 0x274b5790})
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xfffffffe?})
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x6a3d760, 0x274b5790})
	/usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion.func1()
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:84 +0x12a
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).Retry(0xc000791ce0, 0xc000573700)
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:99 +0x73
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion(0xc000791ce0, {{0xc00160c060, 0x29}, {0xc00160c090, 0x28}, {0xc0016141f4, 0x9}, {0x82b9f0d, 0x4}, {0xc00160c060, ...}})
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:78 +0x14f
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyDisks(0xc000791ce0)
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:118 +0x485
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction.func1()
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:201 +0x3f
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x7f7801e503c8, 0x18})
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:109 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x227a2f78?, 0xc00013c000?}, 0xc000a9b690?)
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:154 +0x57
k8s.io/apimachinery/pkg/util/wait.poll({0x227a2f78, 0xc00013c000}, 0xd0?, 0x146fea5?, 0x7f7801e503c8?)
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:245 +0x38
k8s.io/apimachinery/pkg/util/wait.PollImmediateInfiniteWithContext({0x227a2f78, 0xc00013c000}, 0x4136e7?, 0x28?)
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:229 +0x49
k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0x100000000000000?, 0x806f00?)
	/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:214 +0x46
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction(0xc000791ce0, {{0x82bb9a3?, 0xc000a9b7d0?}, 0xc000111de0?}, 0x840366?, 0xc00054e900?)
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:198 +0x108
created by github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyCluster
	/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:172 +0xa87
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference

Expected results:

Destroy IBM Cloud Disks during cluster destroy, or provide a useful error message to follow up on.

Additional info:

The ability to reproduce is relatively low, as it requires the IBM Cloud APIs to return specific data (or a lack thereof), and it is currently unknown why the HTTP response and/or data is missing.

IBM Cloud already has a PR to attempt to mitigate this issue, as was done with other destroy resource calls. Potential follow-ups may cover additional resources as necessary.
https://github.com/openshift/installer/pull/7515

This is a clone of issue OCPBUGS-35037. The following is the description of the original issue:

Description of problem:

    Contrary to terraform, we do not delete the S3 bucket used for ignition during bootstrapping.

Version-Release number of selected component (if applicable):

    4.16+

How reproducible:

    always

Steps to Reproduce:

    1. Deploy cluster
    2. Check that openshift-bootstrap-data-$infraID bucket exists and is empty.
    3.
    

Actual results:

    Empty bucket left.

Expected results:

    Bucket is deleted.

Additional info:
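
Until the installer cleans it up, the leftover bucket can be removed manually (a sketch assuming the AWS CLI; the bucket name follows the pattern described above):

$ INFRA_ID=$(oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}')
$ aws s3 ls | grep "openshift-bootstrap-data-${INFRA_ID}"
$ aws s3 rb "s3://openshift-bootstrap-data-${INFRA_ID}"   # succeeds only because the bucket is already empty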

    

This is a clone of issue OCPBUGS-42386. The following is the description of the original issue:

Description of problem:

Usually, providing a cluster with an unaccepted update, such as an unsigned payload without force, results in ReleaseAccepted=False and Progressing=False. However, after scaling the CVO deployment down and up again, Progressing=True is observed, causing oc adm upgrade as well as oc adm upgrade status to display incorrect information, and the ClusterVersion object to display empty capabilities and a history item with version "".

Version-Release number of selected component (if applicable):

4.16.0-rc.4 but observed as well as early as 4.10.67

How reproducible:

100%

Steps to Reproduce:

1. target the cluster at unsigned build without using force
❯ oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a

2. scale cvo down and up again
 ❯ oc scale --replicas 0 -n openshift-cluster-version deployments/cluster-version-operator
deployment.apps/cluster-version-operator scaled

❯ oc scale --replicas 1 -n openshift-cluster-version deployments/cluster-version-operator
deployment.apps/cluster-version-operator scaled
 

Actual results:

oc adm upgrade displays "info: An upgrade is in progress. Working towards..."

There is also a warning: "Architecture has not been configured".

❯ oc adm upgrade
info: An upgrade is in progress. Working towards registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a

ReleaseAccepted=False  

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a" failure=The update cannot be verified: unable to verify sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a against keyrings: verifier-public-key-redhat

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.16
warning: Cannot display available updates:
  Reason: NoArchitecture
  Message: Architecture has not been configured.

The ClusterVersion object has Progressing=True and "capabilities: {}", as well as a partial history item with version "":

 ❯ oc get clusterversion version -oyaml                                                                                                                       
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  creationTimestamp: "2024-06-10T11:36:51Z"
  generation: 3
  name: version
  resourceVersion: "70199"
  uid: 9c80848b-9f3a-4f0d-8472-a2ccce1c4023
spec:
  channel: stable-4.16
  clusterID: e74054ac-e0fe-4cf7-a457-4887ba96cff9
  desiredUpdate:
    architecture: ""
    force: false
    image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
    version: ""
status:
  availableUpdates: null
  capabilities: {}
  conditions:
  - lastTransitionTime: "2024-06-10T11:37:17Z"
    message: Architecture has not been configured.
    reason: NoArchitecture
    status: "False"
    type: RetrievedUpdates
  - lastTransitionTime: "2024-06-10T11:37:17Z"
    message: Capabilities match configured spec
    reason: AsExpected
    status: "False"
    type: ImplicitlyEnabledCapabilities
  - lastTransitionTime: "2024-06-10T14:06:42Z"
    message: 'Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a"
      failure=The update cannot be verified: unable to verify sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
      against keyrings: verifier-public-key-redhat'
    reason: RetrievePayload
    status: "False"
    type: ReleaseAccepted
  - lastTransitionTime: "2024-06-10T12:06:31Z"
    message: Done applying 4.16.0-rc.4
    status: "True"
    type: Available
  - lastTransitionTime: "2024-06-10T12:06:31Z"
    status: "False"
    type: Failing
  - lastTransitionTime: "2024-06-10T14:07:30Z"
    message: Working towards registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
    status: "True"
    type: Progressing
  desired:
    image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
    version: ""
  history:
  - completionTime: null
    image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
    startedTime: "2024-06-10T14:07:30Z"
    state: Partial
    verified: false
    version: ""
  - completionTime: "2024-06-10T12:06:31Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6
    startedTime: "2024-06-10T11:37:17Z"
    state: Completed
    verified: false
    version: 4.16.0-rc.4
  observedGeneration: 3
  versionHash: AjnKTa_3kbg=

In oc adm upgrade status, Progressing shows an empty target version with Completion 0%:

= Control Plane =
Assessment:      Progressing
Target Version:   (from 4.16.0-rc.4)
Completion:      0%
Duration:        2m26.971091165s
Operator Status: 33 Healthy

 

Expected results:

The ClusterVersion stays the same as before the scale toggle:

apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  creationTimestamp: "2024-06-10T11:36:51Z"
  generation: 3
  name: version
  resourceVersion: "69881"
  uid: 9c80848b-9f3a-4f0d-8472-a2ccce1c4023
spec:
  channel: stable-4.16
  clusterID: e74054ac-e0fe-4cf7-a457-4887ba96cff9
  desiredUpdate:
    architecture: ""
    force: false
    image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
    version: ""
status:
  availableUpdates: null
  capabilities:
    enabledCapabilities:
    - Build
    - CSISnapshot
    - CloudControllerManager
    - CloudCredential
    - Console
    - DeploymentConfig
    - ImageRegistry
    - Ingress
    - Insights
    - MachineAPI
    - NodeTuning
    - OperatorLifecycleManager
    - Storage
    - baremetal
    - marketplace
    - openshift-samples
    knownCapabilities:
    - Build
    - CSISnapshot
    - CloudControllerManager
    - CloudCredential
    - Console
    - DeploymentConfig
    - ImageRegistry
    - Ingress
    - Insights
    - MachineAPI
    - NodeTuning
    - OperatorLifecycleManager
    - Storage
    - baremetal
    - marketplace
    - openshift-samples
  conditions:
  - lastTransitionTime: "2024-06-10T11:37:17Z"
    message: 'Unable to retrieve available updates: currently reconciling cluster
      version 4.16.0-rc.4 not found in the "stable-4.16" channel'
    reason: VersionNotFound
    status: "False"
    type: RetrievedUpdates
  - lastTransitionTime: "2024-06-10T11:37:17Z"
    message: Capabilities match configured spec
    reason: AsExpected
    status: "False"
    type: ImplicitlyEnabledCapabilities
  - lastTransitionTime: "2024-06-10T14:06:42Z"
    message: 'Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a"
      failure=The update cannot be verified: unable to verify sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
      against keyrings: verifier-public-key-redhat'
    reason: RetrievePayload
    status: "False"
    type: ReleaseAccepted
  - lastTransitionTime: "2024-06-10T12:06:31Z"
    message: Done applying 4.16.0-rc.4
    status: "True"
    type: Available
  - lastTransitionTime: "2024-06-10T12:06:31Z"
    status: "False"
    type: Failing
  - lastTransitionTime: "2024-06-10T12:06:31Z"
    message: Cluster version is 4.16.0-rc.4
    status: "False"
    type: Progressing
  desired:
    image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6
    url: https://access.redhat.com/errata/RHEA-2024:0041
    version: 4.16.0-rc.4
  history:
  - completionTime: "2024-06-10T12:06:31Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6
    startedTime: "2024-06-10T11:37:17Z"
    state: Completed
    verified: false
    version: 4.16.0-rc.4
  observedGeneration: 2
  versionHash: AjnKTa_3kbg=

No "upgrade is in progress" message for a release that is not accepted:

 ❯ oc adm upgrade
Cluster version is 4.16.0-rc.4

ReleaseAccepted=False

  Reason: RetrievePayload
  Message: Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a" failure=The update cannot be verified: unable to verify sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a against keyrings: verifier-public-key-redhat

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.16
warning: Cannot display available updates:
  Reason: VersionNotFound
  Message: Unable to retrieve available updates: currently reconciling cluster version 4.16.0-rc.4 not found in the "stable-4.16" channel

 

Additional info:

It is possible to kick the cluster out of this state by applying --clear, which causes the cluster to briefly progress into its original version, followed by 3 items appearing in the history:

❯ oc adm upgrade --clear
Cleared the update field, still at registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a

❯ oc adm upgrade
info: An upgrade is in progress. Working towards 4.16.0-rc.4: 116 of 894 done (12% complete)

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.16
warning: Cannot display available updates:
  Reason: VersionNotFound
  Message: Unable to retrieve available updates: currently reconciling cluster version 4.16.0-rc.4 not found in the "stable-4.16" channel

 

❯ oc get clusterversion version -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  creationTimestamp: "2024-06-10T11:36:51Z"
  generation: 4
  name: version
  resourceVersion: "72594"
  uid: 9c80848b-9f3a-4f0d-8472-a2ccce1c4023
spec:
  channel: stable-4.16
  clusterID: e74054ac-e0fe-4cf7-a457-4887ba96cff9
status:
  availableUpdates: null
  capabilities:
    enabledCapabilities:
    - Build
    - CSISnapshot
    - CloudControllerManager
    - CloudCredential
    - Console
    - DeploymentConfig
    - ImageRegistry
    - Ingress
    - Insights
    - MachineAPI
    - NodeTuning
    - OperatorLifecycleManager
    - Storage
    - baremetal
    - marketplace
    - openshift-samples
    knownCapabilities:
    - Build
    - CSISnapshot
    - CloudControllerManager
    - CloudCredential
    - Console
    - DeploymentConfig
    - ImageRegistry
    - Ingress
    - Insights
    - MachineAPI
    - NodeTuning
    - OperatorLifecycleManager
    - Storage
    - baremetal
    - marketplace
    - openshift-samples
  conditions:
  - lastTransitionTime: "2024-06-10T11:37:17Z"
    message: 'Unable to retrieve available updates: currently reconciling cluster
      version 4.16.0-rc.4 not found in the "stable-4.16" channel'
    reason: VersionNotFound
    status: "False"
    type: RetrievedUpdates
  - lastTransitionTime: "2024-06-10T11:37:17Z"
    message: Capabilities match configured spec
    reason: AsExpected
    status: "False"
    type: ImplicitlyEnabledCapabilities
  - lastTransitionTime: "2024-06-10T14:13:07Z"
    message: Payload loaded version="4.16.0-rc.4" image="quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6"
      architecture="amd64"
    reason: PayloadLoaded
    status: "True"
    type: ReleaseAccepted
  - lastTransitionTime: "2024-06-10T12:06:31Z"
    message: Done applying 4.16.0-rc.4
    status: "True"
    type: Available
  - lastTransitionTime: "2024-06-10T12:06:31Z"
    status: "False"
    type: Failing
  - lastTransitionTime: "2024-06-10T14:14:00Z"
    message: Cluster version is 4.16.0-rc.4
    status: "False"
    type: Progressing
  desired:
    image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6
    url: https://access.redhat.com/errata/RHEA-2024:0041
    version: 4.16.0-rc.4
  history:
  - completionTime: "2024-06-10T14:14:00Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6
    startedTime: "2024-06-10T14:13:07Z"
    state: Completed
    verified: false
    version: 4.16.0-rc.4
  - completionTime: "2024-06-10T14:13:07Z"
    image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
    startedTime: "2024-06-10T14:07:30Z"
    state: Partial
    verified: false
    version: ""
  - completionTime: "2024-06-10T12:06:31Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6
    startedTime: "2024-06-10T11:37:17Z"
    state: Completed
    verified: false
    version: 4.16.0-rc.4
  observedGeneration: 4
  versionHash: AjnKTa_3kbg=
 

Also, trying to apply a rollback in this state results in an invalid SemVer error:

 ❯ OC_ENABLE_CMD_UPGRADE_ROLLBACK=true oc adm upgrade rollback                                                             
error: previous version "" invalid SemVer: Version string empty

This is a clone of issue OCPBUGS-33644. The following is the description of the original issue:

Description of problem:

After running tests on an SNO with the Telco DU profile for a couple of hours, kubernetes.io/kubelet-serving CSRs in Pending state start showing up and accumulating over time.

Version-Release number of selected component (if applicable):

4.16.0-rc.1    

How reproducible:

once so far    

Steps to Reproduce:

    1. Deploy SNO with DU profile with disabled capabilities:

    installConfigOverrides:  "{\"capabilities\":{\"baselineCapabilitySet\": \"None\", \"additionalEnabledCapabilities\": [ \"NodeTuning\", \"ImageRegistry\", \"OperatorLifecycleManager\" ] }}"

2. Leave the node running tests overnight for a couple of hours

3. Check for Pending CSRs

Actual results:

oc get csr -A | grep Pending | wc -l 
27    

Expected results:

No pending CSRs    

Also, oc logs will return a TLS internal error:

oc -n openshift-cluster-machine-approver --insecure-skip-tls-verify-backend=true logs machine-approver-866c94c694-7dwks 
Defaulted container "kube-rbac-proxy" out of: kube-rbac-proxy, machine-approver-controller
Error from server: Get "https://[2620:52:0:8e6::d0]:10250/containerLogs/openshift-cluster-machine-approver/machine-approver-866c94c694-7dwks/kube-rbac-proxy": remote error: tls: internal error

Additional info:

Checking the machine-approver-controller container logs on the node, we can see the reconciliation is failing because it cannot find the Machine API, which is disabled via the capabilities.

I0514 13:25:09.266546       1 controller.go:120] Reconciling CSR: csr-dw9c8
E0514 13:25:09.275585       1 controller.go:138] csr-dw9c8: Failed to list machines in API group machine.openshift.io/v1beta1: no matches for kind "Machine" in version "machine.openshift.io/v1beta1"
E0514 13:25:09.275665       1 controller.go:329] "Reconciler error" err="Failed to list machines: no matches for kind \"Machine\" in version \"machine.openshift.io/v1beta1\"" controller="certificatesigningrequest" controllerGroup="certificates.k8s.io" controllerKind="CertificateSigningRequest" CertificateSigningRequest="csr-dw9c8" namespace="" name="csr-dw9c8" reconcileID="6f963337-c6f1-46e7-80c4-90494d21653c"
I0514 13:25:43.792140       1 controller.go:120] Reconciling CSR: csr-jvrvt
E0514 13:25:43.798079       1 controller.go:138] csr-jvrvt: Failed to list machines in API group machine.openshift.io/v1beta1: no matches for kind "Machine" in version "machine.openshift.io/v1beta1"
E0514 13:25:43.798128       1 controller.go:329] "Reconciler error" err="Failed to list machines: no matches for kind \"Machine\" in version \"machine.openshift.io/v1beta1\"" controller="certificatesigningrequest" controllerGroup="certificates.k8s.io" controllerKind="CertificateSigningRequest" CertificateSigningRequest="csr-jvrvt" namespace="" name="csr-jvrvt" reconcileID="decbc5d9-fa10-45d1-92f1-1c999df956ff" 
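
As a workaround until the approver handles clusters without the Machine API, the pending kubelet-serving CSRs can be approved manually (a sketch; approve only CSRs you expect from your own nodes):

$ oc get csr | awk '/Pending/ {print $1}' | xargs --no-run-if-empty oc adm certificate approve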

This is a clone of issue OCPBUGS-34077. The following is the description of the original issue:

In OCP 4.16.0, the default role bindings for image puller, image pusher, and deployer are created, even if the respective capabilities are disabled on the cluster.

This is a clone of issue OCPBUGS-37713. The following is the description of the original issue:

Under heavy load(?) crictl can fail and return errors which iptables-alerter does not handle correctly, and as a result, it may accidentally end up checking for iptables rules in hostNetwork pods, and then logging events about it.

In a CI run of etcd-operator-e2e I've found the following panic in the operator logs:

E0125 11:04:58.158222       1 health.go:135] health check for member (ip-10-0-85-12.us-west-2.compute.internal) failed: err(context deadline exceeded)
panic: send on closed channel

goroutine 15608 [running]:
github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth.func1()
	github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:58 +0xd2
created by github.com/openshift/cluster-etcd-operator/pkg/etcdcli.getMemberHealth
	github.com/openshift/cluster-etcd-operator/pkg/etcdcli/health.go:54 +0x2a5

which unfortunately is an incomplete log file. The operator recovered itself by restarting; we should fix the panic nonetheless.

Job run for reference:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1186/pull-ci-openshift-cluster-etcd-operator-master-e2e-operator/1750466468031500288

Description of problem:

 - Observed that after the upgrade to 4.13.30 (from 4.13.24), on all nodes/projects (replicated on two clusters that underwent the same upgrade), traffic routed from host-networked pods (router-default) calling backends intermittently times out/fails to reach its destination.

  • This manifests as the router pods marking backends as DOWN and dropping traffic, but the behavior can be replicated with curl outside of the HAProxy pods by entering a debug shell on a host node (or SSH) and curling the pod IP directly. A significant percentage of packets time out to the target backend on intermittent subsequent calls.
  • We narrowed the behavior down to the moment we applied the NetworkPolicy for `allow-from-ingress` as outlined below - immediately the namespace began to drop packets on a curl loop running from an infra node directly against the pod IP (some 2-3% of all calls timed out).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
  namespace: testing
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          policy-group.network.openshift.io/ingress: ""
  podSelector: {}
  policyTypes:
  - Ingress

Version-Release number of selected component (if applicable):

 

How reproducible:

  • Every time, in all namespaces with this network policy on this cluster version (replicated on two clusters that underwent the same upgrade).
     

Steps to Reproduce:

1. Upgrade cluster to 4.13.30

2. Apply test pod running basic HTTP instance at random port

3. Apply the allow-from-ingress NetworkPolicy and begin a curl loop against the target pod directly from the ingress node (or another worker node) at the host chroot level (node IP).

4. Observe that curls time out intermittently. The replicator curl loop is below (note the inclusion of the --connect-timeout flag, which lets the loop continue more rapidly without waiting for the full 2m connect timeout on a typical SYN failure).

$ while true; do curl --connect-timeout 5 --noproxy '*' -k -w "dnslookup: %{time_namelookup} | connect: %{time_connect} | appconnect: %{time_appconnect} | pretransfer: %{time_pretransfer} | starttransfer: %{time_starttransfer} | total: %{time_total} | size: %{size_download} | response: %{response_code}\n" -o /dev/null -s https://<POD>:<PORT>; done

Actual results:

 - Traffic to all backends is dropped/degraded: this intermittent failure marks valid/healthy pods as unavailable due to the connection failures to the backends.

Expected results:

 - Traffic should not be impeded, especially when the NetworkPolicy that allows said traffic has been applied.

Additional info:

  • This behavior began immediately after the completed upgrade from 4.13.24 to 4.13.30 and has been replicated on two separate clusters.
  • The customer has been forced to reinstall a cluster at the downgraded version to ensure stability/deliverables for their user base, and this is a critical-impact outage scenario for them.

– additional required template details in first comment below.

RCA UPDATE:
So the problem is that the host-network namespace is not labeled by the ingress controller, and if router pods are host-networked, a network policy with the `policy-group.network.openshift.io/ingress: ""` selector won't allow incoming connections. To reproduce, run the ingress controller with `EndpointPublishingStrategy=HostNetwork` (https://docs.openshift.com/container-platform/4.14/networking/nw-ingress-controller-endpoint-publishing-strategies.html) and then check the host-network namespace labels with

oc get ns openshift-host-network --show-labels
# expected this
kubernetes.io/metadata.name=openshift-host-network,network.openshift.io/policy-group=ingress,policy-group.network.openshift.io/host-network=,policy-group.network.openshift.io/ingress=

# but before the fix you will see 
kubernetes.io/metadata.name=openshift-host-network,policy-group.network.openshift.io/host-network=

Another way to verify this is the same problem (disruptive, only recommended for test environments) is to make CNO unmanaged

oc scale deployment cluster-version-operator -n openshift-cluster-version --replicas=0
oc scale deployment network-operator -n openshift-network-operator --replicas=0

and then label openshift-host-network namespace manually based on expected labels ^ and see if the problem disappears
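
A sketch of that manual labeling step, using the expected labels listed above:

$ oc label namespace openshift-host-network \
    network.openshift.io/policy-group=ingress \
    policy-group.network.openshift.io/ingress=""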

Potentially affected versions (may need to reproduce to confirm)

4.16.0, 4.15.0, 4.14.0 since https://issues.redhat.com//browse/OCPBUGS-8070

4.13.30 https://issues.redhat.com/browse/OCPBUGS-22293

4.12.48 https://issues.redhat.com/browse/OCPBUGS-24039

Mitigation/support KCS:
https://access.redhat.com/solutions/7055050

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Try to deploy in mad02 or mad04 with powervs
    2. Cannot import boot image
    3. fail
    

Actual results:

Fail    

Expected results:

Cluster comes up    

Additional info:

    

If the user specifies baselineCapabilitySet: None in the install-config and does not specifically enable the baremetal capability, yet still uses platform: baremetal, then the install will reliably fail.

This failure takes the form of a timeout with the bootkube logs (not easily accessible to the user) full of errors like:

bootkube.sh[46065]: "99_baremetal-provisioning-config.yaml": unable to get REST mapping for "99_baremetal-provisioning-config.yaml": no matches for kind "Provisioning" in version "metal3.io/v1alpha1"
bootkube.sh[46065]: "99_openshift-cluster-api_hosts-0.yaml": unable to get REST mapping for "99_openshift-cluster-api_hosts-0.yaml": no matches for kind "BareMetalHost" in version "metal3.io/v1alpha1"

Since the installer can tell when processing the install-config if the baremetal capability is missing, we should detect this and error out immediately to save the user an hour of their life and us a support case.

Although this was found on an agent install, I believe the same will apply to a baremetal IPI install.
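
For reference, a quick way to check whether the install-config explicitly enables the capability before generating manifests (the grep is only an illustration; the field names come from the install-config capabilities schema):

$ grep -A3 'capabilities:' install-config.yaml
capabilities:
  baselineCapabilitySet: None
  additionalEnabledCapabilities:
  - baremetal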

Description of problem:

 Facing an error while creating manifests:

./openshift-install create manifests --dir openshift-config
FATAL failed to fetch Master Machines: failed to generate asset "Master Machines": failed to create master machine objects: failed to create provider: unexpected end of JSON input

Using below document :

https://docs.openshift.com/container-platform/4.14/installing/installing_gcp/installing-gcp-vpc.html#installation-gcp-config-yaml_installing-gcp-vpc

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-34200. The following is the description of the original issue:

Description of problem:

 The value box in the ConfigMap Form view is no longer resizable. It is resizable as expected in OCP version 4.14.

Version-Release number of selected component (if applicable):

    

How reproducible:

 

Steps to Reproduce:

OCP Console -> Administrator -> Workloads -> ConfigMaps -> Create ConfigMap -> Form view -> value     
    

Actual results:

    The value box is not resizable anymore in 4.15 OpenShift clusters.

Expected results:

    The value box should be resizable.

Additional info:

    

Please review the following PR: https://github.com/openshift/machine-api-provider-powervs/pull/67

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

A high number of kube-proxy rules is observed on an OCP cluster installed with version 4.15.19 and the OpenShiftSDN network plugin. The quantity of redundant rules increases continuously, and it seems mostly related to rules from the openshift-ingress namespace. In the following example, a cluster node has 157k redundant rules related to the KUBE-MARK-MASQ rule:

$ less iptables-nat_rules.txt | grep openshift-ingress/router-nodeport-<svc-name> | wc -l
157761  <-----

 0     0 KUBE-MARK-MASQ  all  --  !tun0  *       0.0.0.0/0            0.0.0.0/0            /* masquerade traffic for openshift-ingress/router-nodeport-<svc-name>:http external destinations */

NodePort services seem to be more affected by the issue.

After a node reboot, the quantity of rules drops; however, some hours later the issue reoccurs.

Version-Release number of selected component (if applicable): OCP 4.15.19

How reproducible: Not easily

Actual results: Affected nodes are firing alerts NodeProxyApplySlow and ClusterProxyApplySlow 

Expected results: Cluster shouldn't create this high quantity of redundant rules

Additional info:


Affected Platforms: RHOCP 

Description of problem:

  • While upgrading from 4.12.55 to 4.13.42, the network operator is in a degraded state because the ovnkube-master pods end up in CrashLoopBackOff.

    The ovnkube-master container appears to hit a context deadline timeout and does not start. This happens for all 3 ovnkube-master pods.
     
    ovnkube-master-b5dwz   5/6     CrashLoopBackOff   15 (4m49s ago)   75m
    ovnkube-master-dm6g5   5/6     CrashLoopBackOff   15 (3m50s ago)   72m
    ovnkube-master-lzltc   5/6     CrashLoopBackOff   16 (31s ago)     76m

          
    Relevant logs:

    1 ovnkube.go:369] failed to start network controller manager: failed to start default network controller: failed to sync address sets on controller init: failed to transact address set sync ops: error in transact with ops [{Op:insert Table:Address_Set Row:map[addresses:{GoSet:[172.21.4.58 172.30.113.119 172.30.113.93 172.30.140.204 172.30.184.23 172.30.20.1 172.30.244.26 172.30.250.254 172.30.29.56 172.30.39.131 172.30.54.87 172.30.54.93 172.30.70.9]} external_ids:{GoMap:map[direction:ingress gress-index:0 ip-family:v4 ...]} log:false match:ip4.src == {$a10011776377603330168, $a10015887742824209439, $a10026019104056290237, $a10029515256826812638, $a5952808452902781817, $a10084011578527782670, $a10086197949337628055, $a10093706521660045086, $a10096260576467608457, $a13012332091214445736, $a10111277808835218114, $a10114713358929465663, $a101155018460287381, $a16191032114896727480, $a14025182946114952022, $a10127722282178953052, $a4829957937622968220, $a10131833063630260035, $a3533891684095375041, $a7785003721317615588, $a10594480726457361847, $a10147006001458235329, $a12372228123457253136, $a10016996505620670018, $a10155660392008449200, $a10155926828030234078, $a15442683337083171453, $a9765064908646909484, $a7550609288882429832, $a11548830526886645428, $a10204075722023637394, $a10211228835433076965, $a5867828639604451547, $a10222049254704513272, $a13856077787103972722, $a11903549070727627659,.... (this is a very long list of ACL)

This is a clone of issue OCPBUGS-32304. The following is the description of the original issue:

Description of problem:

One pod of the metal3 operator is in a constant failure state. The cluster was acting as a hub cluster with ACM + GitOps for SNO installation. It was working well for a few days, until this point, after which no other sites could be deployed.

oc get pods -A | grep metal3
openshift-machine-api                              metal3-64cf86fb8b-fg5b9                                           3/4     CrashLoopBackOff   35 (108s ago)   155m
openshift-machine-api                              metal3-baremetal-operator-84875f859d-6kj9s                        1/1     Running            0               155m
openshift-machine-api                              metal3-image-customization-57f8d4fcd4-996hd                       1/1     Running            0               5h

    

Version-Release number of selected component (if applicable):

OCP version: 4.16.ec5
    

How reproducible:

Once it starts to fail, it does not recover.
    

Steps to Reproduce:

    1. Unclear. Install Hub cluster with ACM+GitOps
    2. (Perhaps: update the AgentServiceConfig)
    

Actual results:

The pod crashes and installation of the spoke cluster fails.
    

Expected results:

The pod runs and installation of the spoke cluster succeeds.
    

Additional info:

Logs of metal3-ironic-inspector:

`[kni@infra608-1 ~]$ oc logs pods/metal3-64cf86fb8b-fg5b9 -c metal3-ironic-inspector
+ CONFIG=/etc/ironic-inspector/ironic-inspector.conf
+ export IRONIC_INSPECTOR_ENABLE_DISCOVERY=false
+ IRONIC_INSPECTOR_ENABLE_DISCOVERY=false
+ export INSPECTOR_REVERSE_PROXY_SETUP=true
+ INSPECTOR_REVERSE_PROXY_SETUP=true
+ . /bin/tls-common.sh
++ export IRONIC_CERT_FILE=/certs/ironic/tls.crt
++ IRONIC_CERT_FILE=/certs/ironic/tls.crt
++ export IRONIC_KEY_FILE=/certs/ironic/tls.key
++ IRONIC_KEY_FILE=/certs/ironic/tls.key
++ export IRONIC_CACERT_FILE=/certs/ca/ironic/tls.crt
++ IRONIC_CACERT_FILE=/certs/ca/ironic/tls.crt
++ export IRONIC_INSECURE=true
++ IRONIC_INSECURE=true
++ export 'IRONIC_SSL_PROTOCOL=-ALL +TLSv1.2 +TLSv1.3'
++ IRONIC_SSL_PROTOCOL='-ALL +TLSv1.2 +TLSv1.3'
++ export 'IPXE_SSL_PROTOCOL=-ALL +TLSv1.2 +TLSv1.3'
++ IPXE_SSL_PROTOCOL='-ALL +TLSv1.2 +TLSv1.3'
++ export IRONIC_VMEDIA_SSL_PROTOCOL=ALL
++ IRONIC_VMEDIA_SSL_PROTOCOL=ALL
++ export IRONIC_INSPECTOR_CERT_FILE=/certs/ironic-inspector/tls.crt
++ IRONIC_INSPECTOR_CERT_FILE=/certs/ironic-inspector/tls.crt
++ export IRONIC_INSPECTOR_KEY_FILE=/certs/ironic-inspector/tls.key
++ IRONIC_INSPECTOR_KEY_FILE=/certs/ironic-inspector/tls.key
++ export IRONIC_INSPECTOR_CACERT_FILE=/certs/ca/ironic-inspector/tls.crt
++ IRONIC_INSPECTOR_CACERT_FILE=/certs/ca/ironic-inspector/tls.crt
++ export IRONIC_INSPECTOR_INSECURE=true
++ IRONIC_INSPECTOR_INSECURE=true
++ export IRONIC_VMEDIA_CERT_FILE=/certs/vmedia/tls.crt
++ IRONIC_VMEDIA_CERT_FILE=/certs/vmedia/tls.crt
++ export IRONIC_VMEDIA_KEY_FILE=/certs/vmedia/tls.key
++ IRONIC_VMEDIA_KEY_FILE=/certs/vmedia/tls.key
++ export IPXE_CERT_FILE=/certs/ipxe/tls.crt
++ IPXE_CERT_FILE=/certs/ipxe/tls.crt
++ export IPXE_KEY_FILE=/certs/ipxe/tls.key
++ IPXE_KEY_FILE=/certs/ipxe/tls.key
++ export RESTART_CONTAINER_CERTIFICATE_UPDATED=false
++ RESTART_CONTAINER_CERTIFICATE_UPDATED=false
++ export MARIADB_CACERT_FILE=/certs/ca/mariadb/tls.crt
++ MARIADB_CACERT_FILE=/certs/ca/mariadb/tls.crt
++ export IPXE_TLS_PORT=8084
++ IPXE_TLS_PORT=8084
++ mkdir -p /certs/ironic
++ mkdir -p /certs/ironic-inspector
++ mkdir -p /certs/ca/ironic
mkdir: cannot create directory '/certs/ca/ironic': Permission denied

    

 If the user relies on mirror registries, and the ClusterImageSet is set to a tagged image (e.g. quay.io/openshift-release-dev/ocp-release:4.15.0-multi), as opposed to a by-digest image (e.g. quay.io/openshift-release-dev/ocp-release@sha256:b86422e972b9c838dfdb8b481a67ae08308437d6489ea6aaf150242b1d30fa1c), then `oc` will fail to pull with:

--icsp-file only applies to images referenced by digest and will be ignored for tags

Instead we should probably block this at the reconcile stage, or give the user clearer CR errors so they don't have to dig through the assisted-service logs to figure out what went wrong.

The oc error is actually much more confusing: oc ignores the ICSP, tries to pull from quay, and runs into issues because mirror registries are typically used in disconnected environments where quay is unreachable or has a different certificate. As a result there are a lot of red herrings the user will chase until they realize they should have used a digest.
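
In the meantime, a practical workaround on the user side is to resolve the tag to its digest and put the by-digest pullspec in the clusterimageset, e.g. (a sketch; the tag shown is illustrative and the digest output is elided):

$ oc adm release info quay.io/openshift-release-dev/ocp-release:4.15.0-x86_64 | grep 'Pull From'
Pull From: quay.io/openshift-release-dev/ocp-release@sha256:<digest>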

Description of problem:

When using a custom CNI plugin in a hostedcluster, multus requires some CSRs to be approved. The component approving these CSRs is the network-node-identity. This component only gets the proper RBAC rules configured when networkType is set to Calico.

In the current implementation, there is a condition that will apply the required RBAC only if the networkType is set to Calico [1].

When using other CNI plugins, like Cilium, you're supposed to set networkType to Other. With current implementation, you won't get the required RBAC in place and as such, the required CSRs won't be approved automatically.


[1] https://github.com/openshift/hypershift/blob/release-4.14/control-plane-operator/controllers/hostedcontrolplane/cno/clusternetworkoperator.go#L139   
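
For reference, the field in question is spec.networking.networkType on the HostedCluster; a minimal fragment looks like the following (other required fields omitted, names illustrative):

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
spec:
  networking:
    networkType: Other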

Version-Release number of selected component (if applicable):

Latest    

How reproducible:

Always

Steps to Reproduce:

    1. Set hostedcluster.spec.networking.networkType to Other
    2. Wait for the HC to start deploying and for the Nodes to join the cluster
    3. The nodes will remain in NotReady. Multus pods will complain about certificates not being ready.
    4. If you list CSRs you will find pending CSRs.
    

Actual results:

RBAC not properly configured when networkType set to Other

Expected results:

RBAC properly configured when networkType set to Other

Additional info:

Slack discussion:

https://redhat-internal.slack.com/archives/C01C8502FMM/p1704824277049609

Please review the following PR: https://github.com/openshift/hypershift/pull/3303

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Fallout of https://issues.redhat.com/browse/OCPBUGS-35371

We simply do not have enough visibility into why these kubelet endpoints are going down, outside of a reboot, while kubelet itself stays up.

A big step would be charting them with the intervals. Add a new monitor test to query prometheus at the end of the run looking for when these targets were down.

Prom query:

max by (node, metrics_path) (up{job="kubelet"}) == 0

Then perhaps a test to flake if we see this happen outside of a node reboot. This seems to happen on every gcp-ovn (non-upgrade) job I look at. It does NOT seem to happen on AWS.
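
As a rough sketch of what such a monitor test would ask Prometheus at the end of the run, the same query can be issued against the in-cluster query API (assuming access via the thanos-querier route and a token with the cluster-monitoring-view role):

$ HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
$ curl -sk -H "Authorization: Bearer $(oc whoami -t)" "https://$HOST/api/v1/query" \
    --data-urlencode 'query=max by (node, metrics_path) (up{job="kubelet"}) == 0'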

Description of the problem:

When trying to create cluster with s390x architecture, an error occurs that stops cluster creation.  The error is "cannot use Skip MCO reboot because it's not compatible with the s390x architecture on version 4.15.0-ec.3 of OpenShift"

How reproducible:

Always

Steps to reproduce:

Create cluster with architecture s390x

Actual results:

Create failed

Expected results:

Create should succeed

Description of problem:

Various jobs are failing in e2e-gcp-operator due to the LoadBalancer-type Service not going "ready", which most likely means it is not getting an IP address.

Tests so far affected are:
- TestUnmanagedDNSToManagedDNSInternalIngressController
- TestScopeChange
- TestInternalLoadBalancerGlobalAccessGCP
- TestInternalLoadBalancer
- TestAllowedSourceRanges

For example, in TestInternalLoadBalancer, the Load Balancer never comes back ready:

operator_test.go:1454: Expected conditions: map[Admitted:True Available:True DNSManaged:True DNSReady:True LoadBalancerManaged:True LoadBalancerReady:True]
         Current conditions: map[Admitted:True Available:False DNSManaged:True DNSReady:False Degraded:True DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True DeploymentRollingOut:False EvaluationConditionsDetected:False LoadBalancerManaged:True LoadBalancerProgressing:False LoadBalancerReady:False Progressing:False Upgradeable:True]

Where DNSReady:False and LoadBalancerReady:False.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

10% of the time

Steps to Reproduce:

1. Run e2e-gcp-operator many times until you see one of these failures

Actual results:

Test Failure

Expected results:

Not failure

Additional info:

Search.CI Links:
TestScopeChange
TestInternalLoadBalancerGlobalAccessGCP & TestInternalLoadBalancer 

This does not seem related to https://issues.redhat.com/browse/OCPBUGS-6013. The DNS E2E tests actually pass this same condition check.

Please review the following PR: https://github.com/openshift/route-controller-manager/pull/36

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

The command does not honor Windows path separators.

Related to https://issues.redhat.com//browse/OCPBUGS-28864 (access restricted and not publicly visible). This report serves as a target issue for the fix and its backport to older OCP versions. Please see more details in https://issues.redhat.com//browse/OCPBUGS-28864.

This is a clone of issue OCPBUGS-39029. The following is the description of the original issue:

This is a clone of issue OCPBUGS-38289. The following is the description of the original issue:

Description of problem:

The cluster-wide proxy URL is automatically injected into the remote-write config of the Prometheus k8s CR in the openshift-monitoring project, which is expected, but the noProxy URLs are not. As a result, if the remote-write endpoint is in a noProxy region, metrics are not transferred.

Version-Release number of selected component (if applicable):

RHOCP 4.16.4

How reproducible:

100%

Steps to Reproduce:

1. Configure proxy custom resource in RHOCP 4.16.4 cluster
2. Create cluster-monitoring-config configmap in openshift-monitoring project
3. Inject remote-write config (without specifically configuring proxy for remote-write)
4. After saving the modification in the cluster-monitoring-config configmap, check the remoteWrite config in the Prometheus k8s CR. It now contains the proxyUrl but NOT the noProxy URL (referenced from the cluster proxy). Example snippet:
==============
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
[...]
  name: k8s
  namespace: openshift-monitoring
spec:
[...]
  remoteWrite:
  - proxyUrl: http://proxy.abc.com:8080     <<<<<====== Injected Automatically but there is no noProxy URL.
    url: http://test-remotewrite.test.svc.cluster.local:9090
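
For reference, step 3 above typically amounts to a remote-write entry like the following in the cluster-monitoring-config configmap (a minimal sketch; the endpoint matches the example above and no proxy settings are specified by the user):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      - url: "http://test-remotewrite.test.svc.cluster.local:9090"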
    

Actual results:

The proxy URL from proxy CR is getting injected in Prometheus k8s CR automatically when configuring remoteWrite but it doesn't have noProxy inherited from cluster proxy resource.

Expected results:

The noProxy URL should get injected in Prometheus k8s CR as well.

Additional info:

 

Please review the following PR: https://github.com/openshift/whereabouts-cni/pull/242

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cluster-dns-operator/pull/401

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

CNO assumes that only the master and worker machine config pools are present on the cluster. While running CI with 24 nodes, it was found that there are two more pools, infra and workload, so these pools must also be taken into consideration while rolling out the IPsec machine config.

# omg get mcp
NAME      CONFIG                                              UPDATED  UPDATING  DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT  DEGRADEDMACHINECOUNT  AGE
infra     rendered-infra-52f7615d8c841e7570b7ab6cbafecac8     True     False     False     3             3                  3                    0                     38m
master    rendered-master-fbb5d8e1337d1244d30291ffe3336e45    True     False     False     3             3                  3                    0                     1h10m
worker    rendered-worker-52f7615d8c841e7570b7ab6cbafecac8    False    True      False     24            12                 12                   0                     1h10m
workload  rendered-workload-52f7615d8c841e7570b7ab6cbafecac8  True     False     False     0             0                  0                    0                     38m

CI run: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/50740/rehearse-50740-pull-ci-openshift-qe-ocp-qe-perfscale-ci-main-azure-4.16-nightly-x86-control-plane-ipsec-24nodes/1782308642033242112

 

 

Several oc examples are incorrect. These are used in the CLI reference docs, but would also appear in the oc CLI help.

The commands that don't work have been removed manually from the CLI reference docs via this update: https://github.com/openshift/openshift-docs/compare/9907074162999c982a8a97c45665c98913d848c9..441f3419ef460d9863a45e4c2d6914b1c019e1d1

List of commands:

  • oc adm copy-to-node --copy=new-bootstrap-kubeconfig=/etc/kubernetes/kubeconfig
  • oc adm copy-to-node --copy=new-bootstrap-kubeconfig=/etc/kubernetes/kubeconfig -l node-role.kubernetes.io/master
  • oc adm certificates regenerate-leaf -A --all
  • oc adm ocp-certificates regenerate-leaf -n openshift-config-managed kube-controller-manager-client-cert
  • oc adm certificates regenerate-machine-config-server-serving-cert --update-ignition=false
  • oc adm certificates update-ignition-ca-bundle-for-machine-config-server
  • oc adm certificates regenerate-signers -A --all
  • oc adm certificates regenerate-signers -n openshift-kube-apiserver-operator loadbalancer-serving-signer
  • oc adm ocp-certificates remove-old-trust configmaps -A --all
  • oc adm ocp-certificates remove-old-trust -n openshift-config-managed configmaps/kube-apiserver-aggregator-client-ca

For more information, see the feedback on these PRs:

Description of problem:

The Helm Plugin's index view parses a given chart entry into multiple tiles if the names of the individual entry items vary.

This is inconsistent with the Helm CLI experience, which treats all items in an index entry (i.e. all versions of a given chart) as part of the same chart.

Version-Release number of selected component (if applicable):

All

How reproducible:

100%

Steps to Reproduce:

    1. Open the Developer Console, Helm Plugin
    2. Select a namespace and Click to create a helm release
    3. Search for the developer-hub chart in the catalog (this is an example demonstrating the problem)
    

Actual results:

There are two tiles for Developer Hub, but only one index entry in the corresponding index (https://charts.openshift.io)

Expected results:

A single tile should exist for this single index entry.

Additional info:

The cause of this is an expected indexing inconsistency, but the experience should align with the Helm CLI's behavior, and should still represent a single catalog tile per index entry.
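
For illustration, a Helm repository index groups all versions of a chart under one entry key; the situation described above is one where the items under a single key carry different name values, e.g. (an illustrative index.yaml fragment, not the actual charts.openshift.io content):

apiVersion: v1
entries:
  developer-hub:
  - apiVersion: v2
    name: developer-hub
    version: 1.2.1
    urls:
    - https://example.com/charts/developer-hub-1.2.1.tgz
  - apiVersion: v2
    name: redhat-developer-hub
    version: 1.2.0
    urls:
    - https://example.com/charts/redhat-developer-hub-1.2.0.tgz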

This is a clone of issue OCPBUGS-38560. The following is the description of the original issue:

This is a clone of issue OCPBUGS-37945. The following is the description of the original issue:

Description of problem:

    openshift-install create cluster leads to error:
ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed during pre-provisioning: unable to initialize folders and templates: failed to import ova: failed to lease wait: Invalid configuration for device '0'. 

Vsphere standard port group

Version-Release number of selected component (if applicable):

    

How reproducible:

    always

Steps to Reproduce:

    1. openshift-install create cluster
    2. Choose vSphere
    3. Fill in the blanks
    4. Have a standard port group
    

Actual results:

    error

Expected results:

    cluster creation

Additional info:

    

Description of problem:

HO uses the ICSP/IDMS from mgmt cluster to extract the OCP release metadata to be used in the HostedCluster.

But they are extracted only once in main.go:
https://github.com/jparrill/hypershift/blob/9bf1403ae09c0f262ebfe006267e3b442cc70149/hypershift-operator/main.go#L287-L293
before starting the HC and NP controllers, but they are never refreshed when ICSP/IDMS changes on the management cluster, nor when a new HostedCluster is created.
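
For context, the mirror configuration involved is of this shape (an illustrative IDMS; registry names are placeholders):

apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: example-idms
spec:
  imageDigestMirrors:
  - mirrors:
    - mirror.registry.example.com/openshift-release-dev/ocp-release
    source: quay.io/openshift-release-dev/ocp-release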

    

Version-Release number of selected component (if applicable):

    4.14 4.15 4.16

How reproducible:

100%


    

Steps to Reproduce:

    1. ensure that HO is already running
    2. create an ICSP or an IDMS on the management cluster
    3. try to create a hosted cluster
    

Actual results:

the imageRegistryOverrides setting for the new hosted cluster ignores the ICSP/IDMS created while HO was already running.
Killing the HO operator pod and waiting for it to restart will produce a different result.

    

Expected results:

HO consistently consumes ICSP/IDMS info at runtime without needing to be restarted

    

Additional info:

    It affects disconnected deployments

Description of problem:

    Priority Class override for ignition-server deployment was accidentally ripped out when a new reconcileProxyDeployment() func was introduced. 

Version-Release number of selected component (if applicable):

    

How reproducible:

    100%

Steps to Reproduce:

    1.Create a cluster with priority class override opted in
    2.Override priority class in HC
    3.Check ignition server deployment priority class     

Actual results:

doesn't override priority class    

Expected results:

overridden priority class

Additional info:

    

Description of problem:

The e2e-gcp-op-layering CI job seems to be continuously and consistently failing during the teardown process. In particular, it appears to be the TestOnClusterBuildRollsOutImage test that is failing whenever it attempts to tear down the node. See: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/4060/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-layering/1744805949165539328 for an example of a failing job.

Version-Release number of selected component (if applicable):

    

How reproducible:

Always

Steps to Reproduce:

Open a PR to the GitHub MCO repository.

Actual results:

The teardown portion of the TestOnClusterBuildsRollout test fails thusly:

  utils.go:1097: Deleting machine ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f / node ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f
    utils.go:1098: 
            Error Trace:    /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098
                                        /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
                                        /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149
                                        /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79
                                        /usr/lib/golang/src/testing/testing.go:1150
                                        /usr/lib/golang/src/testing/testing.go:1328
                                        /usr/lib/golang/src/testing/testing.go:1570
            Error:          Received unexpected error:
                            exit status 1
            Test:           TestOnClusterBuildRollsOutImage
    utils.go:1097: Deleting machine ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f / node ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f
    utils.go:1098: 
            Error Trace:    /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098
                                        /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
                                        /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149
                                        /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79
                                        /usr/lib/golang/src/testing/testing.go:1150
                                        /usr/lib/golang/src/testing/testing.go:1328
                                        /usr/lib/golang/src/testing/testing.go:1312
                                        /usr/lib/golang/src/runtime/panic.go:522
                                        /usr/lib/golang/src/testing/testing.go:980
                                        /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098
                                        /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
                                        /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149
                                        /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79
                                        /usr/lib/golang/src/testing/testing.go:1150
                                        /usr/lib/golang/src/testing/testing.go:1328
                                        /usr/lib/golang/src/testing/testing.go:1570
            Error:          Received unexpected error:
                            exit status 1
            Test:           TestOnClusterBuildRollsOutImage

Expected results:

This part of the test should pass.

Additional info:

The way the test teardown process currently works is that it shells out to the oc command to delete the underlying Machine and Node. We delete the underlying machine and node so that the cloud provider will provision us a new one due to issues with opting out of on-cluster builds that have yet to be resolved.

At the time this test was written, it was implemented in this way to avoid having to vendor the Machine client and API into the MCO codebase, which has since happened. I suspect the issue is that oc is failing in some way, since we get an exit status 1 from where it is invoked. Now that the Machine client and API are vendored into the MCO codebase, it makes more sense to use those directly instead of shelling out to oc, since we would get more verbose error messages.

Description of problem:

When we pin an image while using ImageDigestMirrorSets, we get this failure:


E0422 14:22:29.588035    2366 daemon.go:1380] Fatal error from auxiliary tools: failed to get auth config for image example.io/digest-example/mybusy@sha256:0415f56ccc05526f2af5a7ae8654baec97d4a614f24736e8eef41a
4591f08019: no auth found for image: "example.io/digest-example/mybusy@sha256:0415f56ccc05526f2af5a7ae8654baec97d4a614f24736e8eef41a4591f08019"

    

Version-Release number of selected component (if applicable):

pre-merge
    

How reproducible:

Always
    

Steps to Reproduce:

    1. Create an ImageDigestMirrorSet to configure mirrors for an image

apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: digest-mirror
spec:
  imageDigestMirrors:
  - mirrors:
    - quay.io/openshifttest/busybox
    source: example.io/digest-example/mybusy
    mirrorSourcePolicy: NeverContactSource  # do not redirect to the source registry if the pull from the mirror fails


    2. Create a pinnedimageset using the values in the ImageDigestMirrorSet

apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: PinnedImageSet
metadata:
  creationTimestamp: "2024-04-22T12:49:51Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: my-worker-pinned-images
  resourceVersion: "78482"
  uid: f06c94c4-067f-4404-b3c2-11d5aff4e0cb
spec:
  pinnedImages:
  - name: example.io/digest-example/mybusy@sha256:0415f56ccc05526f2af5a7ae8654baec97d4a614f24736e8eef41a4591f08019

   
    

Actual results:

In the MCD logs we can see the following failure

$ oc -n openshift-machine-config-operator logs machine-config-daemon-dmtxx
...
I0422 14:26:14.117242    2438 pinned_image_set.go:274] Reconciling pinned image set: my-worker-pinned-images-2: generation: 1
E0422 14:26:14.125965    2438 pinned_image_set.go:981] failed to get auth config for image example.io/digest-example/mybusy@sha256:0415f56ccc05526f2af5a7ae8654baec97d4a614f24736e8eef41a4591f08019: no auth found
 for image: "example.io/digest-example/mybusy@sha256:0415f56ccc05526f2af5a7ae8654baec97d4a614f24736e8eef41a4591f08019"
W0422 14:26:14.125990    2438 pinned_image_set.go:983]  failed: worker max retries: 15
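
A useful diagnostic step (not a fix) is to compare the registries that actually have auth entries in the cluster pull secret against the image reference being pinned, e.g.:

$ oc get secret pull-secret -n openshift-config -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq -r '.auths | keys[]'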

    

Expected results:

Since we can pull the image when we debug the node, we should be able to pin it:

sh-5.1# crictl pull example.io/digest-example/mybusy@sha256:0415f56ccc05526f2af5a7ae8654baec97d4a614f24736e8eef41a4591f08019
Image is up to date for example.io/digest-example/mybusy@sha256:0415f56ccc05526f2af5a7ae8654baec97d4a614f24736e8eef41a4591f08019




    

Additional info:

We get a similar error when we try to pin release images while using releases built by clusterbot. For example, we can try to pin the rhel-coreos image:

$ oc adm release info --image-for rhel-coreos
registry.build03.ci.openshift.org/ci-ln-4cx6v6b/stable@sha256:85a096a567ca287ba9c0fe36642e49c34eb4dd541914f8823750e4b186fce569

If we try to pin it, we get a similar error:

E0422 13:03:14.979410    2951 daemon.go:1380] Fatal error from auxiliary tools: failed to get auth config for image registry.build03.ci.openshift.org/ci-ln-4cx6v6b/stable@sha256:85a096a567ca287ba9c0fe36642e49c34eb4dd541914f8823750e4b186fce569: no auth found for image: "registry.build03.ci.openshift.org/ci-ln-4cx6v6b/stable@sha256:85a096a567ca287ba9c0fe36642e49c34eb4dd541914f8823750e4b186fce569"


But the image can actually be pulled:

sh-5.1# crictl  pull registry.build03.ci.openshift.org/ci-ln-4cx6v6b/stable@sha256:85a096a567ca287ba9c0fe36642e49c34eb4dd541914f8823750e4b186fce569
Image is up to date for registry.build03.ci.openshift.org/ci-ln-4cx6v6b/stable@sha256:85a096a567ca287ba9c0fe36642e49c34eb4dd541914f8823750e4b186fce569
    

This is a clone of issue OCPBUGS-38119. The following is the description of the original issue:

Description of problem:

It would be nice to have each of the e2e test specs shown in the test grid report (https://testgrid.k8s.io/redhat-openshift-olm#periodic-ci-openshift-operator-framework-olm-master-periodics-e2e-gcp-olm&show-stale-tests=). I noticed that the test grid for 4.14 is exhibiting the right behaviour: 
https://testgrid.k8s.io/redhat-openshift-olm#periodic-ci-openshift-operator-framework-olm-release-4.14-periodics-e2e-gcp-olm&show-stale-tests=

So, we should make the junit e2e report look like what it looks like in the 4.14 branch.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Always

Steps to Reproduce:

    1. Open browser of your choice
    2. Go to the link in the description section
    3. Direct eyeballs to screen
    

Actual results:

    No e2e specs in the test grid table

Expected results:

    e2e specs in the test grid table

Additional info:

    


Please review the following PR: https://github.com/openshift/cloud-provider-aws/pull/59

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cloud-provider-vsphere/pull/60

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Upgrading OCP from 4.14.7 to a 4.15.0 nightly build failed on the Provider cluster, which is part of a provider-client setup.
Platform: IBM Cloud Bare Metal cluster.

Steps done:

Step 1.

$ oc patch clusterversions/version -p '{"spec":{"channel":"stable-4.15"}}' --type=merge
clusterversion.config.openshift.io/version patched

Step 2:
$ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2024-01-18-050837 --allow-explicit-upgrade --force
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requesting update to release image registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2024-01-18-050837

The cluster was not upgraded successfully.

 
$ oc get clusteroperator | grep -v "4.15.0-0.nightly-2024-01-18-050837   True        False         False"
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.15.0-0.nightly-2024-01-18-050837   True        False         True       111s    APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()...
console                                    4.15.0-0.nightly-2024-01-18-050837   False       False         False      111s    RouteHealthAvailable: console route is not admitted
dns                                        4.15.0-0.nightly-2024-01-18-050837   True        True          False      12d     DNS "default" reports Progressing=True: "Have 4 available DNS pods, want 5.\nHave 5 available node-resolver pods, want 6."
etcd                                       4.15.0-0.nightly-2024-01-18-050837   True        False         True       12d     EtcdEndpointsDegraded: EtcdEndpointsController can't evaluate whether quorum is safe: etcd cluster has quorum of 2 and 2 healthy members which is not fault tolerant: [{Member:ID:14147288297306253147 name:"baremetal2-06.qe.rh-ocs.com" peerURLs:"https://52.116.161.167:2380" clientURLs:"https://52.116.161.167:2379"  Healthy:false Took: Error:create client failure: failed to make etcd client for endpoints [https://52.116.161.167:2379]: context deadline exceeded} {Member:ID:15369339084089827159 name:"baremetal2-03.qe.rh-ocs.com" peerURLs:"https://52.116.161.164:2380" clientURLs:"https://52.116.161.164:2379"  Healthy:true Took:9.617293ms Error:<nil>} {Member:ID:17481226479420161008 name:"baremetal2-04.qe.rh-ocs.com" peerURLs:"https://52.116.161.165:2380" clientURLs:"https://52.116.161.165:2379"  Healthy:true Took:9.090133ms Error:<nil>}]...
image-registry                             4.15.0-0.nightly-2024-01-18-050837   True        True          False      12d     Progressing: All registry resources are removed...
machine-config                             4.14.7                               True        True          True       7d22h   Unable to apply 4.15.0-0.nightly-2024-01-18-050837: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, MachineConfigPool master has not progressed to latest configuration: controller version mismatch for rendered-master-9b7e02d956d965d0906def1426cb03b5 expected eaab8f3562b864ef0cc7758a6b19cc48c6d09ed8 has 7649b9274cde2fb50a61a579e3891c8ead2d79c5: 0 (ready 0) out of 3 nodes are updating to latest configuration rendered-master-34b4781f1a0fe7119765487c383afbb3, retrying]]
monitoring                                 4.15.0-0.nightly-2024-01-18-050837   False       True          True       7m54s   UpdatingUserWorkloadPrometheus: client rate limiter Wait returned an error: context deadline exceeded, UpdatingUserWorkloadThanosRuler: waiting for ThanosRuler object changes failed: waiting for Thanos Ruler openshift-user-workload-monitoring/user-workload: context deadline exceeded
network                                    4.15.0-0.nightly-2024-01-18-050837   True        True          False      12d     DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 2 nodes)...
node-tuning                                4.15.0-0.nightly-2024-01-18-050837   True        True          False      98m     Working towards "4.15.0-0.nightly-2024-01-18-050837"


$ oc get machineconfigpool
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-9b7e02d956d965d0906def1426cb03b5   False     True       True       3              0                   0                     1                      12d
worker   rendered-worker-4f54b43e9f934f0659761929f55201a1   False     True       True       3              1                   1                     1                      12d


$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.7    True        True          120m    Unable to apply 4.15.0-0.nightly-2024-01-18-050837: an unknown error has occurred: MultipleErrors


$ oc get nodes
NAME                          STATUS                     ROLES                         AGE   VERSION
baremetal2-01.qe.rh-ocs.com   Ready                      worker                        12d   v1.27.8+4fab27b
baremetal2-02.qe.rh-ocs.com   Ready                      worker                        12d   v1.27.8+4fab27b
baremetal2-03.qe.rh-ocs.com   Ready                      control-plane,master,worker   12d   v1.27.8+4fab27b
baremetal2-04.qe.rh-ocs.com   Ready                      control-plane,master,worker   12d   v1.27.8+4fab27b
baremetal2-05.qe.rh-ocs.com   Ready                      worker                        12d   v1.28.5+c84a6b8
baremetal2-06.qe.rh-ocs.com   Ready,SchedulingDisabled   control-plane,master,worker   12d   v1.27.8+4fab27b
----------------------------------------------------

During the efforts to bring the cluster back to a good state, these steps were done:
The node baremetal2-06.qe.rh-ocs.com was uncordoned.

Tried to upgrade to using the command

$ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2024-01-22-051500 --allow-explicit-upgrade --force --allow-upgrade-with-warnings=true
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
warning: --allow-upgrade-with-warnings is bypassing: the cluster is already upgrading:  Reason: ClusterOperatorsDegraded
  Message: Unable to apply 4.15.0-0.nightly-2024-01-18-050837: wait has exceeded 40 minutes for these operators: etcd, kube-apiserver
Requesting update to release image registry.ci.openshift.org/ocp/release:4.15.0-0.nightly-2024-01-22-051500


Upgrade to 4.15.0-0.nightly-2024-01-22-051500 also was not successful.
Node baremetal2-01.qe.rh-ocs.com was drained manually to see if that works.

Some clusteroperators stayed on the previous version. Some moved to Degraded state. 

$ oc get machineconfigpool
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-9b7e02d956d965d0906def1426cb03b5   False     True       False      3              1                   1                     0                      13d
worker   rendered-worker-4f54b43e9f934f0659761929f55201a1   False     True       True       3              1                   1                     1                      13d
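
When a pool reports a degraded machine like this, a common next step is to inspect the pool conditions and the per-node MCD annotations (a diagnostic sketch; the node name is a placeholder):

$ oc describe mcp worker
$ oc get node <degraded-node> -o yaml | grep 'machineconfiguration.openshift.io/reason'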


$ oc get pdb -n openshift-storage
NAME                                              MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
rook-ceph-mds-ocs-storagecluster-cephfilesystem   1               N/A               1                     11d
rook-ceph-mon-pdb                                 N/A             1                 1                     11d
rook-ceph-osd                                     N/A             1                 1                     3h17m


$ oc rsh rook-ceph-tools-57fd4d4d68-p2psh ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                             STATUS  REWEIGHT  PRI-AFF
-1         5.23672  root default                                                   
-5         1.74557      host baremetal2-01-qe-rh-ocs-com                           
 1    ssd  0.87279          osd.1                             up   1.00000  1.00000
 4    ssd  0.87279          osd.4                             up   1.00000  1.00000
-7         1.74557      host baremetal2-02-qe-rh-ocs-com                           
 3    ssd  0.87279          osd.3                             up   1.00000  1.00000
 5    ssd  0.87279          osd.5                             up   1.00000  1.00000
-3         1.74557      host baremetal2-05-qe-rh-ocs-com                           
 0    ssd  0.87279          osd.0                             up   1.00000  1.00000
 2    ssd  0.87279          osd.2                             up   1.00000  1.00000


OCP must-gather logs - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/hcp414-aaa/hcp414-aaa_20240112T084548/logs/must-gather-ibm-bm2-provider/must-gather.local.1079362865726528648/

 

Version-Release number of selected component (if applicable):

Initial version:
OCP 4.14.7
ODF 4.14.4-5.fusion-hci
OpenShift Virtualization: kubevirt-hyperconverged-operator.4.16.0-380
Local Storage: local-storage-operator.v4.14.0-202312132033
OpenShift Data Foundation Client : ocs-client-operator.v4.14.4-5.fusion-hci

How reproducible:

Reporting the first occurrence of the issue.

Steps to Reproduce:

    1. On a provider-client HCI setup, upgrade the provider cluster to a nightly build of OCP
    

Actual results:

    OCP upgrade not successful. Some operators become degraded. The worker machineconfigpool has 1 degraded machine.

Expected results:

OCP upgrade from 4.14.7 to the nightly build should succeed.

Additional info:

    There are 3 hosted clients present

Description of problem:

Set up a cluster on vSphere with a user-managed external load balancer, with install-config.yaml:

    apiVIPs:
      - 10.38.153.2
    ingressVIPs:
      - 10.38.153.3
    loadBalancer:
      type: UserManaged
networking:
  machineNetwork:
    - cidr: "10.38.153.0/25"
featureSet: TechPreviewNoUpgrade

After the cluster is started, keepalived is still running on the worker nodes.

omc get pod -n openshift-vsphere-infra
NAME                                                   READY   STATUS    RESTARTS   AGE
coredns-ci-op-2kch7ldp-72b07-7l4vs-master-0            2/2     Running   0          1h
coredns-ci-op-2kch7ldp-72b07-7l4vs-master-1            2/2     Running   0          59m
coredns-ci-op-2kch7ldp-72b07-7l4vs-master-2            2/2     Running   0          59m
coredns-ci-op-2kch7ldp-72b07-7l4vs-worker-0-tqc74      2/2     Running   0          39m
coredns-ci-op-2kch7ldp-72b07-7l4vs-worker-1-s654k      2/2     Running   0          37m
keepalived-ci-op-2kch7ldp-72b07-7l4vs-worker-0-tqc74   2/2     Running   0          39m
keepalived-ci-op-2kch7ldp-72b07-7l4vs-worker-1-s654k   2/2     Running   0          37m

 
 

Version-Release number of selected component (if applicable):

4.15

How reproducible:

always

Steps to Reproduce:

    1. setup vsphere on multi-subnet network with ELB, job
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/openshift_release/46458/rehearse-46458-periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-vsphere-ipi-zones-multisubnets-external-lb-usermanaged-f28/1732560150155235328     2.
    3.
    

Actual results:

    

Expected results:

    keepalived should not be running on worker nodes.

Additional info:

    must-gather logs:  https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/pr-logs/pull/openshift_release/46458/rehearse-46458-periodic-ci-openshift-openshift-tests-private-release-4.15-amd64-nightly-vsphere-ipi-zones-multisubnets-external-lb-usermanaged-f28/1732560150155235328/artifacts/vsphere-ipi-zones-multisubnets-external-lb-usermanaged-f28/gather-must-gather/artifacts/must-gather.tar

Description of problem:

In the OCP upgrades from 4.13 to 4.14, the canary route configuration is changed as below: 

 

Canary route configuration in OCP 4.13:
$ oc get route -n openshift-ingress-canary canary -oyaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    ingress.openshift.io/canary: canary_controller
  name: canary
  namespace: openshift-ingress-canary
spec:
  host: canary-openshift-ingress-canary.apps.<cluster-domain>.com <---- canary route configured with .spec.host

Canary route configuration in OCP 4.14:
$ oc get route -n openshift-ingress-canary canary -oyaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    ingress.openshift.io/canary: canary_controller
  name: canary
  namespace: openshift-ingress-canary
spec:
  port:
    targetPort: 8080
  subdomain: canary-openshift-ingress-canary <---- canary route configured with .spec.subdomain

 

After the upgrade, the following messages are printed in the ingress-operator pod: 

2024-04-24T13:16:34.637Z        ERROR   operator.init   controller/controller.go:265    Reconciler error        {"controller": "canary_controller", "object": {"name":"default","namespace":"openshift-ingress-operator"}, "namespace": "openshift-ingress-operator", "name": "default", "reconcileID": "46290893-d755-4735-bb01-e8b707be4053", "error": "failed to ensure canary route: failed to update canary route openshift-ingress-canary/canary: Route.route.openshift.io \"canary\" is invalid: spec.subdomain: Invalid value: \"canary-openshift-ingress-canary\": field is immutable"}
 

The issue is resolved when the canary route is deleted. 
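
The manual workaround therefore amounts to deleting the route and letting the operator recreate it with the new .spec.subdomain form, as the audit trail below shows:

$ oc -n openshift-ingress-canary delete route canary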

See below the audit logs from the process: 

# The route can't be updated with error 422: 

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"4e8bfb36-21cc-422b-9391-ef8ff42970ca","stage":"ResponseComplete","requestURI":"/apis/route.openshift.io/v1/namespaces/openshift-ingress-canary/routes/canary","verb":"update","user":{"username":"system:serviceaccount:openshift-ingress-operator:ingress-operator","groups":["system:serviceaccounts","system:serviceaccounts:openshift-ingress-operator","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["ingress-operator-746cd8598-hq2st"],"authentication.kubernetes.io/pod-uid":["f3ebccdf-f3b3-420d-8ea5-e33d98945403"]}},"sourceIPs":["10.128.0.93","10.128.0.2"],"userAgent":"Go-http-client/2.0","objectRef":{"resource":"routes","namespace":"openshift-ingress-canary","name":"canary","uid":"3e179946-d4e3-45ad-9380-c305baefd14e","apiGroup":"route.openshift.io","apiVersion":"v1","resourceVersion":"297888"},"responseStatus":{"metadata":{},"status":"Failure","message":"Route.route.openshift.io \"canary\" is invalid: spec.subdomain: Invalid value: \"canary-openshift-ingress-canary\": field is immutable","reason":"Invalid","details":{"name":"canary","group":"route.openshift.io","kind":"Route","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: \"canary-openshift-ingress-canary\": field is immutable","field":"spec.subdomain"}]},"code":422},"requestReceivedTimestamp":"2024-04-24T13:16:34.630249Z","stageTimestamp":"2024-04-24T13:16:34.636869Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"openshift-ingress-operator\" of ClusterRole \"openshift-ingress-operator\" to ServiceAccount \"ingress-operator/openshift-ingress-operator\""}}

# Route is deleted manually

"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"70821b58-dabc-4593-ba6d-5e81e5d27d21","stage":"ResponseComplete","requestURI":"/aps/route.openshift.io/v1/namespaces/openshift-ingress-canary/routes/canary","verb":"delete","user":{"username":"system:admin","groups":["system:masters","syste:authenticated"]},"sourceIPs":["10.0.91.78","10.128.0.2"],"userAgent":"oc/4.13.0 (linux/amd64) kubernetes/7780c37","objectRef":{"resource":"routes","namespace:"openshift-ingress-canary","name":"canary","apiGroup":"route.openshift.io","apiVersion":"v1"},"responseStatus":{"metadata":{},"status":"Success","details":{"ame":"canary","group":"route.openshift.io","kind":"routes","uid":"3e179946-d4e3-45ad-9380-c305baefd14e"},"code":200},"requestReceivedTimestamp":"2024-04-24T1324:39.558620Z","stageTimestamp":"2024-04-24T13:24:39.561267Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":""}}

# Route is created again

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"92e6132a-aa1d-482d-a1dc-9ce021ae4c37","stage":"ResponseComplete","requestURI":"/aps/route.openshift.io/v1/namespaces/openshift-ingress-canary/routes","verb":"create","user":{"username":"system:serviceaccount:openshift-ingress-operator:ingres-operator","groups":["system:serviceaccounts","system:serviceaccounts:openshift-ingress-operator","system:authenticated"],"extra":{"authentication.kubernetesio/pod-name":["ingress-operator-746cd8598-hq2st"],"authentication.kubernetes.io/pod-uid":["f3ebccdf-f3b3-420d-8ea5-e33d98945403"]}},"sourceIPs":["10.128.0.93""10.128.0.2"],"userAgent":"Go-http-client/2.0","objectRef":{"resource":"routes","namespace":"openshift-ingress-canary","name":"canary","apiGroup":"route.opensift.io","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":201},"requestReceivedTimestamp":"2024-04-24T13:24:39.577255Z","stageTimestamp":"2024-04-24T1:24:39.584371Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"openshift-ingress-perator\" of ClusterRole \"openshift-ingress-operator\" to ServiceAccount \"ingress-operator/openshift-ingress-operator\""}}

 

Version-Release number of selected component (if applicable):

    Ocp upgrade between 4.13 and 4.14

How reproducible:

    Upgrade the cluster from OCP 4.13 to 4.14 and check the ingress operator pod logs

Steps to Reproduce:

    1. Install cluster in OCP 4.13
    2. Upgrade to OCP 4.14
    3. Check the ingress operator logs
    

Actual results:

    Reported errors above

Expected results:

    The ingress canary route should be updated without issues

Additional info:

    

User Story:

As a developer, I want to be able to:

  • Use the latest library version (v3) of `google.golang.org/api/cloudresourcemanager`, which has been extended with APIs required for resource manager tags.

so that I can achieve

  • Tags-related implementation can be kept in line with the existing framework made available for using GCP clients.

Acceptance Criteria:

Description of criteria:

  • UTs are updated, if any changes are required with library update.

This is a clone of issue OCPBUGS-36296. The following is the description of the original issue:

Description of problem

Currently the manifests directory has:

0000_30_cluster-api_00_credentials-request.yaml
0000_30_cluster-api_00_namespace.yaml
...

CredentialsRequests go into the openshift-cloud-credential-operator namespace, so they can come before or after the openshift-cluster-api namespace. But because they ask for Secrets in the openshift-cluster-api namespace, there would be less race and drama if the CredentialsRequest manifests were given a name that sorted them after the namespace. Like 0000_30_cluster-api_01_credentials-request.yaml.

Version-Release number of selected component

I haven't gone digging in history, it may have been like this since forever.

How reproducible

Every time.

Steps to Reproduce

With a release image pullspec like registry.ci.openshift.org/ocp/release:4.17.0-0.nightly-2024-06-27-184535:

$ oc adm release extract --to manifests registry.ci.openshift.org/ocp/release:4.17.0-0.nightly-2024-06-27-184535
$ ls manifests/0000_30_cluster-api_* | grep 'namespace\|credentials-request'

Actual results

$ ls manifests/0000_30_cluster-api_* | grep 'namespace\|credentials-request'
manifests/0000_30_cluster-api_00_credentials-request.yaml
manifests/0000_30_cluster-api_00_namespace.yaml

Expected results

$ ls manifests/0000_30_cluster-api_* | grep 'namespace\|credentials-request'
manifests/0000_30_cluster-api_00_namespace.yaml
manifests/0000_30_cluster-api_01_credentials-request.yaml

This is a clone of issue OCPBUGS-34712. The following is the description of the original issue:

Description of problem:

The doc installing_ibm_cloud_public/installing-ibm-cloud-customizations.html does not have the list of tested instance types.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

   Always

Steps to Reproduce:

    1. https://docs.openshift.com/container-platform/4.15/installing/installing_ibm_cloud_public/installing-ibm-cloud-customizations.html

    The tested VM types are not listed there.

Actual results:

  The tested instance types are not listed.

Expected results:

   List the tested instance types, as in https://docs.openshift.com/container-platform/4.15/installing/installing_azure/installing-azure-customizations.html#installation-azure-tested-machine-types_installing-azure-customizations

Additional info:

    

Seeing failures for SDN periodics running [sig-network][Feature:tuning] sysctl allowlist update should start a pod with custom sysctl only when the sysctl is added to whitelist [Suite:openshift/conformance/parallel] beginning with 4.16.0-0.nightly-2024-01-05-205447

sippy: sysctl allowlist update should start a pod with custom sysctl only when the sysctl is added to whitelist

  Jan  5 23:14:22.066: INFO: At 2024-01-05 23:14:09 +0000 UTC - event for testpod: {kubelet ip-10-0-54-42.us-west-2.compute.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_testpod_e2e-test-tuning-bzspr_2a9ce6e0-726d-47a6-ac64-71d430926574_0(968a55c5afd81e077b1d15a4129084d5f15002ac3ae6aa9fe32648e841940fe2): error adding pod e2e-test-tuning-bzspr_testpod to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): timed out waiting for the condition

That payload contains OCPBUGS-26222: Adds a wait on unix socket readiness. Not sure that is the cause, but will investigate.

This is a clone of issue OCPBUGS-33645. The following is the description of the original issue:

Description of problem:

After enabling a separate alertmanager instance for user-defined alert routing, the alertmanager-user-workload pods are initialized but the configmap alertmanager-trusted-ca-bundle is not injected into the pods.
[-] https://docs.openshift.com/container-platform/4.15/observability/monitoring/enabling-alert-routing-for-user-defined-projects.html#enabling-a-separate-alertmanager-instance-for-user-defined-alert-routing_enabling-alert-routing-for-user-defined-projects
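
Per the linked docs, the enablement described in the steps below is typically along these lines (a minimal sketch of the two configmaps):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    alertmanager:
      enabled: true
      enableAlertmanagerConfig: true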

Version-Release number of selected component (if applicable):

RHOCP 4.13, 4.14 and 4.15

How reproducible:

100%

Steps to Reproduce:

1. Enable user-workload monitoring using[a]
2. Enable separate alertmanager instance for user-defined alert routing using [b]
3. Check if alertmanager-trusted-ca-bundle configmap is injected in alertmanager-user-workload pods which are running in openshift-user-workload-monitoring project.
$ oc describe pod alertmanager-user-workload-0 -n openshift-user-workload-monitoring | grep alertmanager-trusted-ca-bundle

[a] https://docs.openshift.com/container-platform/4.15/observability/monitoring/enabling-monitoring-for-user-defined-projects.html#enabling-monitoring-for-user-defined-projects_enabling-monitoring-for-user-defined-projects

[b] https://docs.openshift.com/container-platform/4.15/observability/monitoring/enabling-alert-routing-for-user-defined-projects.html#enabling-a-separate-alertmanager-instance-for-user-defined-alert-routing_enabling-alert-routing-for-user-defined-projects

Actual results:

alertmanager-user-workload pods are NOT injected with alertmanager-trusted-ca-bundle configmap.

Expected results:

alertmanager-user-workload pods should be injected with alertmanager-trusted-ca-bundle configmap.

Additional info:

Similar configmap is injected fine in alertmanager-main pods which are running in openshift-monitoring project.

Description of problem:

It was noticed that the openshift-hyperkube RPM, which is primarily (perhaps exclusively) used to install the kubelet in RHCOS or other environments, includes the kube-apiserver, kube-controller-manager, and kube-scheduler binaries. Those binaries are all built and used via container images, which as far as I can tell don't make use of the RPM.

Version-Release number of selected component (if applicable):

4.12 - 4.16

How reproducible:

100%

Steps to Reproduce:

    1. rpm -ql openshift-hyperkube on any node
    2.
    3.
    

Actual results:

# rpm -ql openshift-hyperkube
/usr/bin/hyperkube
/usr/bin/kube-apiserver
/usr/bin/kube-controller-manager
/usr/bin/kube-scheduler
/usr/bin/kubelet
/usr/bin/kubensenter

# ls -lah /usr/bin/kube-apiserver /usr/bin/kube-controller-manager /usr/bin/kube-scheduler /usr/bin/hyperkube /usr/bin/kubensenter /usr/bin/kubelet
-rwxr-xr-x. 2 root root  945 Jan  1  1970 /usr/bin/hyperkube
-rwxr-xr-x. 2 root root 129M Jan  1  1970 /usr/bin/kube-apiserver
-rwxr-xr-x. 2 root root 114M Jan  1  1970 /usr/bin/kube-controller-manager
-rwxr-xr-x. 2 root root  54M Jan  1  1970 /usr/bin/kube-scheduler
-rwxr-xr-x. 2 root root 105M Jan  1  1970 /usr/bin/kubelet
-rwxr-xr-x. 2 root root 3.5K Jan  1  1970 /usr/bin/kubensenter

Expected results:

Just the kubelet and deps on the host OS, that's all that's necessary

Additional info:

My proposed change would be for people that cared about making this slim to install `openshift-hyperkube-kubelet` instead.

Not sure which component this bug should be associated with.

I am not even sure if importing respects ImageTagMirrorSet.

We could not figure it out in the Slack conversation.

https://redhat-internal.slack.com/archives/C013VBYBJQH/p1709583648013199

 

Description of problem:

The expected behaviour of ImageTagMirrorSet, redirecting pulls from the proxy to quay.io, did not happen.

Version-Release number of selected component (if applicable):

oc --context build02 get clusterversion version
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-ec.3   True        False         7d4h    

Steps to Reproduce:

oc --context build02 get ImageTagMirrorSet quay-proxy -o yaml
apiVersion: config.openshift.io/v1
kind: ImageTagMirrorSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"config.openshift.io/v1","kind":"ImageTagMirrorSet","metadata":{"annotations":{},"name":"quay-proxy"},"spec":{"imageTagMirrors":[{"mirrors":["quay.io/openshift/ci"],"source":"quay-proxy.ci.openshift.org/openshift/ci"}]}}
  creationTimestamp: "2024-03-05T03:49:59Z"
  generation: 1
  name: quay-proxy
  resourceVersion: "4895378740"
  uid: 69fb479e-85bd-4a16-a38f-29b08f2636c3
spec:
  imageTagMirrors:
  - mirrors:
    - quay.io/openshift/ci
    source: quay-proxy.ci.openshift.org/openshift/ci


oc --context build02 tag --source docker quay-proxy.ci.openshift.org/openshift/ci:ci_ci-operator_latest hongkliu-test/proxy-test-2:011 --as system:admin
Tag proxy-test-2:011 set to quay-proxy.ci.openshift.org/openshift/ci:ci_ci-operator_latest.

oc --context build02 get is proxy-test-2 -o yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    openshift.io/image.dockerRepositoryCheck: "2024-03-05T20:03:02Z"
  creationTimestamp: "2024-03-05T20:03:02Z"
  generation: 2
  name: proxy-test-2
  namespace: hongkliu-test
  resourceVersion: "4898915153"
  uid: f60b3142-1f5f-42ae-a936-a9595e794c05
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: quay-proxy.ci.openshift.org/openshift/ci:ci_ci-operator_latest
    generation: 2
    importPolicy:
      importMode: Legacy
    name: "011"
    referencePolicy:
      type: Source
status:
  dockerImageRepository: image-registry.openshift-image-registry.svc:5000/hongkliu-test/proxy-test-2
  publicDockerImageRepository: registry.build02.ci.openshift.org/hongkliu-test/proxy-test-2
  tags:
  - conditions:
    - generation: 2
      lastTransitionTime: "2024-03-05T20:03:02Z"
      message: 'Internal error occurred: quay-proxy.ci.openshift.org/openshift/ci:ci_ci-operator_latest:
        Get "https://quay-proxy.ci.openshift.org/v2/": EOF'
      reason: InternalError
      status: "False"
      type: ImportSuccess
    items: null
    tag: "011"

Actual results:

The status of the stream shows that it still tries to connect to quay-proxy.

Expected results:

The request goes to quay.io directly.

Additional info:

The proxy has been shut down completely just to simplify the case. If it were on, the access logs would show the proxy receiving the requests for the image.
oc scale deployment qci-appci -n ci --replicas 0
deployment.apps/qci-appci scaled

I also checked the pull secret in the namespace and it has correct pull credentials to both proxy and quay.io.
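
For what it's worth, the ITMS is rendered by the MCO into the nodes' container runtime configuration, which ordinary image pulls honour; whether the image import path in the apiserver consults the same mirror configuration is exactly the open question here. The node-side rendering can be checked with, e.g. (node name is a placeholder):

$ oc debug node/<node> -- chroot /host cat /etc/containers/registries.conf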

This is a clone of issue OCPBUGS-29510. The following is the description of the original issue:

Description of problem:

    When a cluster is configured for direct OIDC configuration (authentication.config/cluster .spec.type=OIDC), console pods will be in crashloop until an OIDC client is configured for the console.

Version-Release number of selected component (if applicable):

    4.15.0

How reproducible:

100% in Hypershift; 100% in TechPreviewNoUpgrade featureset on standalone OpenShift   

Steps to Reproduce:

    1. Update authentication.config/cluster so that Type=OIDC
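    For illustration, step 1 can be done with a merge patch along these lines (sketch only; a real configuration would also need the OIDC provider details, which are omitted here):

    oc patch authentication.config/cluster --type=merge -p '{"spec":{"type":"OIDC"}}'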
    

Actual results:

    The console operator tries to create a new console rollout, but the pods crashloop. This is because the operator sets the console pods to "disabled". This would normally mean a privilege escalation; fortunately, the configuration prevents a successful deploy.

Expected results:

    Console pods are healthy and show a page stating that no authentication is currently configured.

Additional info:

    

Please review the following PR: https://github.com/openshift/openstack-cinder-csi-driver-operator/pull/160

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

In https://github.com/openshift/release/pull/47618 there are quite a few warnings from snyk in the presubmit rehearsal jobs that have not been reported in the bugs filed against storage

We need to go through each one and either fix it (in the case of legit bugs) or ignore it (false positives / test cases) to avoid having a presubmit job that always fails.

Version-Release number of selected component (if applicable):

4.16

How reproducible:

always

Steps to Reproduce:

run 'security' presubmit rehearsal jobs in https://github.com/openshift/release/pull/47618    

Actual results:

snyk issues reported

Expected results:

clean test runs

Additional info:

    

The goal is to collect metrics about the AdminNetworkPolicy and BaselineAdminNetworkPolicy CRDs, because it is essential to understand how users are using this feature, and whether they are using it at all. This is required for the 4.16 feature https://issues.redhat.com/browse/SDN-4157, and we hope to get approval and the PRs merged before the 4.16 code freeze (April 26th 2024).

admin_network_policy_total

admin_network_policy_total represents the total number of admin network policies in the cluster

Labels: None

See https://github.com/ovn-org/ovn-kubernetes/pull/4239 for more information

Cardinality of the metric is at most 1.

baseline_admin_network_policy_total

baseline_admin_network_policy_total represents the total number of baseline admin network policies in the cluster (0 or 1)

Labels: None

See https://github.com/ovn-org/ovn-kubernetes/pull/4239 for more information

Cardinality of the metric is at most 1.

We don't need the above two anymore because we have https://redhat-internal.slack.com/archives/C0VMT03S5/p1712567951869459?thread_ts=1712346681.157809&cid=C0VMT03S5 

Instead of that we are adding two other metrics for rule count: (https://issues.redhat.com/browse/MON-3828 )

admin_network_policy_db_objects_total

admin_network_policy_db_objects_total represents the total number of OVN NBDB objects (table_name) owned by AdminNetworkPolicy controller in the cluster

Labels:

  • table_name, possible values are "ACL" and "AddressSet" (In future "Port_Group")

See https://github.com/ovn-org/ovn-kubernetes/pull/4254  for more information

Cardinality of the metric is at most 3.

baseline_admin_network_policy_db_objects_total

baseline_admin_network_policy_db_objects_total represents the total number of OVN NBDB objects (table_name) owned by BaselineAdminNetworkPolicy controller in the cluster

Labels:

  • table_name, possible values are "ACL" and "AddressSet" (In future "Port_Group")

See https://github.com/ovn-org/ovn-kubernetes/pull/4254 for more information

Cardinality of the metric is at most 3.
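
For reference, once these metrics land, the object counts can be checked with queries such as the following (metric and label names taken from above; label values as listed):

admin_network_policy_db_objects_total{table_name="ACL"}
baseline_admin_network_policy_db_objects_total{table_name="AddressSet"}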

After performing an Agent Based Installation on bare metal, the master node which was initially the rendezvous host does not join the cluster.

Checking the podman containers on this node, we see that the 'assisted-installer' container exited with code 143 after the second master was detected as ready:

2024-04-01T15:21:14.677437000Z time="2024-04-01T15:21:14Z" level=info msg="Found 1 ready master nodes"
2024-04-01T15:21:19.684831000Z time="2024-04-01T15:21:19Z" level=info msg="Found a new ready master node <second-master> with id <master-id>" 

podman pods status:

$ podman ps -a
CONTAINER ID  IMAGE                                                                                                                   COMMAND               CREATED         STATUS                     PORTS       NAMES
20b338ab8906  localhost/podman-pause:4.4.1-1707368644                                                                                                       16 hours ago    Up 16 hours                            d2b97e733b33-infra
0876c611f655  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:27c5328e1d9a0d7db874c6e52efae631ab3c29a3d4da50c50b2e783dcb784128  /bin/bash start_d...  16 hours ago    Up 16 hours                            assisted-db
a9a116bed3a7  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:27c5328e1d9a0d7db874c6e52efae631ab3c29a3d4da50c50b2e783dcb784128  /assisted-service     16 hours ago    Up 16 hours                            service
0afbe44c2cf2  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:27c5328e1d9a0d7db874c6e52efae631ab3c29a3d4da50c50b2e783dcb784128  /usr/local/bin/ag...  16 hours ago    Exited (0) 16 hours ago                apply-host-config
45da1bdf2440  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b3daca74ad515845d5f8dcf384f0e51d58751a2785414edc3f20969a6fc0403  next_step_runner ...  16 hours ago    Up 16 hours                            next-step-runner
8d1306b0ea3a  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:79e97d8cbd27e2c7402f7e016de97ca2b1f4be27bd52a981a27e7a2132be1ef4  --role bootstrap ...  16 hours ago    Exited (143) 15 hours ago              assisted-installer
8b0cc08890b4  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f44844c4024dfa35688eac52e5e3d1540311771c4a24fef1ba4a6dccecc0e55  start --node-name...  16 hours ago    Exited (0) 16 hours ago                hungry_varahamihira
4916c14b9f7e  registry.redhat.io/rhel9/support-tools:latest                                                                           /usr/bin/bash         34 seconds ago  Up 34 seconds                          toolbox-core

 

crio pods status:

CONTAINER           IMAGE                                                                                                                    CREATED             STATE               NAME                 ATTEMPT             POD ID              POD
03b89032db0bc       98fc664e8c2aa859c10ec8ea740b083c7c85925d75506bcb85c6c9c640945c36                                                         13 seconds ago      Exited              etcd                 182                 5d42cdad70890       etcd-bootstrap-member-<failed-master-name>.local
01008c6e32e5a       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b38d75b297fa52d1ba29af0715cec2430cd5fda1a608ed0841a09c55c292fb3   16 hours ago        Running             coredns              0                   5f8736b856a0c       coredns-<failed-master-name>
5e00e89ebef34       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2e119d0d9f8470dd634a62329d2670602c5f169d0d9bbe5ad25cee07e716c94b   16 hours ago        Exited              render-config        0                   5f8736b856a0c       coredns-<failed-master-name>
f5098d5d27a39       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2e119d0d9f8470dd634a62329d2670602c5f169d0d9bbe5ad25cee07e716c94b   16 hours ago        Running             keepalived-monitor   0                   4fb91cefa8a9e       keepalived-<failed-master-name>
a1e9d4c8cf477       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d24879d39e10fcf00a7c28ab23de1d6cf0c433a1234ff34880f12642b75d4512   16 hours ago        Running             keepalived           0                   4fb91cefa8a9e       keepalived-<failed-master-name>
de21bc99f0d3f       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8c74c57f91f0f7ed26bb62f58c7b84c55750e51947fd6cc5711fa18f30b9f68c   16 hours ago        Running             etcdctl              0                   5d42cdad70890       etcd-bootstrap-member-<failed-master-name>

This is a clone of issue OCPBUGS-32950. The following is the description of the original issue:

Description of problem:
Affects only developers with a local build.

Version-Release number of selected component (if applicable):
4.15

How reproducible:
Always

Steps to Reproduce:
Build and run the console locally.

Actual results:
The user toggle menu isn't shown, so developers cannot access the user preferences, such as the language or theme.

Expected results:
The user toggle menu should be shown.

Description of the problem:

Staging UI 2.31.1, BE 2.31.0 - clicking on create new cluster shows a loading spinner, but nothing loads and no error can be found.

Edit:
BE v2/openshift-versions response is empty

How reproducible:

100%

Steps to reproduce:

1. Click on create new cluster

2.

3.

Actual results:

 

Expected results:

Description of problem:

The customer requires multiple domain names in their AWS VPC's DHCP option set, to allow on-prem DNS (Infoblox) lookups to work.

The problem is that the kubelet service is unable to parse the node name properly.

~~~
hyperkube[2562]: Error: failed to run Kubelet: failed to create kubelet: could not initialize volume plugins for KubeletVolumePluginMgr: parse "http://example.compute.internal example.com:9001": invalid character " " in host name
~~~

/etc/systemd/system/kubelet.service.d/20-aws-node-name.conf
[Service]
Environment="KUBELET_NODE_NAME=ip-x-x-x-x.example.example test.example"
                                                         ^
                                                        space
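
For illustration: the multiple domains configured in the VPC DHCP option set are also visible in the node's resolver search path, which shows where the extra domain (and the space) comes from. Node name is a placeholder and the values are taken from the example above:

oc debug node/<node> -- chroot /host grep ^search /etc/resolv.conf
search example.example test.example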


The customer is aware of this KCS article. If the customer follows what the KCS article says, it will break their DNS functionality.

Kubelet fails to start on nodes during OCP 4.x IPI installation on AWS - Red Hat Customer Portal
https://access.redhat.com/solutions/6978959


Version-Release number of selected component (if applicable):

 

How reproducible:

always

Steps to Reproduce:

1. Create/adding a node with multiple domain names
2. Add base domain to the DHCP option in the VPC setting
3.

Actual results:

kubelet is failing to start

Expected results:

It should be possible to add a worker node that has multiple domain names.

Additional info:

 

Description of problem:

Version-Release number of selected component (if applicable):

 4.14

How reproducible:

1. oc patch svc <svc> --type merge --patch '{"spec":{"sessionAffinity":"ClientIP"}}'

2. curl <svc>:<port>

3. oc scale --replicas=3 deploy/<deploy>

4. oc scale --replicas=0 deploy/<deploy>

5. oc scale --replicas=3 deploy/<deploy>

Actual results:

 Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 54850
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 46668
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 46682
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 60144
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 60150
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 60160
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 51720

Expected results:

 Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 46914
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 46928
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 46944
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 40510
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 40520

Additional info:

See the hostname in the server log output for each command.

$ oc patch svc <svc> --type merge --patch '{"spec":{"sessionAffinity":"ClientIP"}}'

Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 46914
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 46928
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 46944
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 40510
Hostname: tcp-668655b888-9mmfc, SourceIP: 10.128.1.205, SourcePort: 40520

$ oc scale --replicas=1 deploy/<deploy>

Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 47082
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 47088
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 54832
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 54848

$ oc scale --replicas=3 deploy/<deploy>

Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 54850
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 46668
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 46682
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 60144
Hostname: tcp-668655b888-dcwlr, SourceIP: 10.128.1.205, SourcePort: 60150
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 60160
Hostname: tcp-668655b888-xxcn2, SourceIP: 10.128.1.205, SourcePort: 51720
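
For context, with sessionAffinity: ClientIP a given client should keep hitting the same endpoint for the configured timeout (10800 seconds by default). The configured value can be checked with:

oc get svc <svc> -o jsonpath='{.spec.sessionAffinityConfig.clientIP.timeoutSeconds}'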

Description of problem:

    All e2e-ibmcloud-ovn testing is failing due to repeated events of liveness or readiness probes failing during MonitorTests.

Version-Release number of selected component (if applicable):

    4.16.0-0.ci.test-2024-02-20-184205-ci-op-lghcpt9x-latest

How reproducible:

    Appears to be 100%

Steps to Reproduce:

    1. Setup IPI cluster on IBM Cloud
    2. Run OCP Conformance w/ MonitorTests (CI does this on IBM Cloud related PR's)
    

Actual results:

    Failed OCP Conformance tests, due to MonitorTests failure:

: [sig-arch] events should not repeat pathologically for ns/openshift-cloud-controller-manager expand_less0s{  2 events happened too frequently

event happened 43 times, something is wrong: namespace/openshift-cloud-controller-manager node/ci-op-lghcpt9x-52953-tk4vl-master-2 pod/ibm-cloud-controller-manager-6c5f8594c5-bpnm8 hmsg/d91441a732 - reason/ProbeError Liveness probe error: Get "https://10.241.129.4:10258/healthz": dial tcp 10.241.129.4:10258: connect: connection refused result=reject 
body: 
 From: 20:25:44Z To: 20:25:45Z
event happened 43 times, something is wrong: namespace/openshift-cloud-controller-manager node/ci-op-lghcpt9x-52953-tk4vl-master-1 pod/ibm-cloud-controller-manager-6c5f8594c5-wn4fq hmsg/fda26f2bbf - reason/ProbeError Liveness probe error: Get "https://10.241.64.6:10258/healthz": dial tcp 10.241.64.6:10258: connect: connection refused result=reject 
body: 
 From: 20:25:54Z To: 20:25:55Z}


: [sig-arch] events should not repeat pathologically for ns/openshift-oauth-apiserver expand_less0s{  1 events happened too frequently

event happened 25 times, something is wrong: namespace/openshift-oauth-apiserver node/ci-op-lghcpt9x-52953-tk4vl-master-1 pod/apiserver-c5ff4776b-kqg7c hmsg/c9e932e38d - reason/ProbeError Readiness probe error: HTTP probe failed with statuscode: 500 result=reject 
body: [+]ping ok
[+]log ok
[+]etcd ok
[-]etcd-readiness failed: reason withheld
[+]informer-sync ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/openshift.io-StartUserInformer ok
[+]poststarthook/openshift.io-StartOAuthInformer ok
[+]poststarthook/openshift.io-StartTokenTimeoutUpdater ok
[+]shutdown ok
readyz check failed

 From: 20:25:04Z To: 20:25:05Z}

Expected results:

    Passing OCP Conformance (w/ MonitorTests) test

Additional info:

    The frequent (perhaps only) failures appear to occur via:

[sig-arch] events should not repeat pathologically for ns/openshift-cloud-controller-manager

[sig-arch] events should not repeat pathologically for ns/openshift-oauth-apiserver

    I am unsure of the cause of the liveness/readiness probe failures as of yet, and unsure whether the underlying infrastructure is the cause (and if so, which resource).

Please review the following PR: https://github.com/openshift/ironic-image/pull/438

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-36711. The following is the description of the original issue:

Description of problem:

    With the changes in https://github.com/openshift/machine-config-operator/pull/4425, RHEL worker nodes fail as follows:

[root@ptalgulk-0807c-fq97t-w-a-l-rhel-1 cloud-user]# systemctl --failed
  UNIT                  LOAD   ACTIVE SUB    DESCRIPTION                
● disable-mglru.service loaded failed failed Disables MGLRU on Openshfit

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
[root@ptalgulk-0807c-fq97t-w-a-l-rhel-1 cloud-user]# journalctl -u disable-mglru.service
-- Logs begin at Mon 2024-07-08 06:23:03 UTC, end at Mon 2024-07-08 08:31:35 UTC. --
Jul 08 06:23:14 localhost.localdomain systemd[1]: Starting Disables MGLRU on Openshfit...
Jul 08 06:23:14 localhost.localdomain bash[710]: /usr/bin/bash: /sys/kernel/mm/lru_gen/enabled: No such file or directory
Jul 08 06:23:14 localhost.localdomain systemd[1]: disable-mglru.service: Main process exited, code=exited, status=1/FAILURE
Jul 08 06:23:14 localhost.localdomain systemd[1]: disable-mglru.service: Failed with result 'exit-code'.
Jul 08 06:23:14 localhost.localdomain systemd[1]: Failed to start Disables MGLRU on Openshfit.
Jul 08 06:23:14 localhost.localdomain systemd[1]: disable-mglru.service: Consumed 4ms CPU time

We should only disable mglru if it exists.
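
A minimal sketch of such a guard, assuming the unit keeps its current shape: a standard systemd condition makes the unit a silent no-op on kernels without the MGLRU sysfs knob:

[Unit]
ConditionPathExists=/sys/kernel/mm/lru_gen/enabled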

Version-Release number of selected component (if applicable):

    4.16, 4.17

How reproducible:

    Attempt to bring up rhel worker node

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

    Logs for PipelineRuns fetched from the Tekton Results API are not loading

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Navigate to the Log tab of a PipelineRun fetched from Tekton Results
    2.
    3.
    

Actual results:

    Logs window is empty with a loading indicator

Expected results:

Logs should be shown

Additional info:

    

Please review the following PR: https://github.com/openshift/cluster-kube-storage-version-migrator-operator/pull/101

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

ibm-vpc-block-csi-driver deployment is missing sidecar metrics and the kube-rbac-proxy sidecar

Version-Release number of selected component (if applicable):

4.10+

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:
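
A quick way to confirm which sidecars are present (the deployment name and namespace below are assumptions based on the description above):

oc -n openshift-cluster-csi-drivers get deploy ibm-vpc-block-csi-driver -o jsonpath='{.spec.template.spec.containers[*].name}'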

    

Description of problem:

Due to HTTP/2 Connection Coalescing (https://daniel.haxx.se/blog/2016/08/18/http2-connection-coalescing/), routes which use the same certificate can present unexplained 503 errors when attempting to access an HTTP/2 enabled ingress controller.

It appears that HAProxy supports the ability to force HTTP 1.1 on a route-by-route basis, but our Ingress Controller does not expose that option.

This is especially problematic for component routes because generally speaking, customers use a wildcard or SAN to deploy custom component routes (console, OAuth, downloads), but with HTTP/2, this does not work properly.

To address this issue, we're proposing the creation of an annotation, haproxy.router.openshift.io/http2-disable, which would allow disabling HTTP/2 on a route-by-route basis, or smarter logic built into our Ingress operator to handle this situation.
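
If the proposed annotation were implemented, usage would presumably look like the following (illustrative only; this annotation does not exist today):

oc annotate route console -n openshift-console haproxy.router.openshift.io/http2-disable=true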

Version-Release number of selected component (if applicable):

 OpenShift 4.14

How reproducible:

Serve routes to applications in OpenShift.
Observe the routes through an HTTP/2-enabled client.
Notice that HTTP/2 client connections are broken (a 503 is returned on the second connection when the same certificates are used across a mix of re-encrypt and passthrough routes).

Steps to Reproduce:

(see above notes)

Actual results:

503 error    

Expected results:

no error    

Additional info:

    

This is a clone of issue OCPBUGS-36768. The following is the description of the original issue:

Description of problem:


https://github.com/prometheus/prometheus/pull/14446 is a fix for https://github.com/prometheus/prometheus/issues/14087 (see there for details)

This was introduced in Prom 2.51.0 https://github.com/openshift/cluster-monitoring-operator/blob/master/Documentation/deps-versions.md

    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Description of the problem:

On startup, if release images syncing fails, the service will fail instead of checking whether it can proceed with stale data.

How reproducible:

Always.

Steps to reproduce:

  1. Mock release images service, make it succeed once and then fail.
  2. Restart the service.

Actual results:

assisted-service will fail.

Expected results:

assisted-service will continue with stale data

Description of problem:

Job link: powervs » zstreams » zstream-ocp4x-powervs-london06-p9-current-upgrade #282 Console [Jenkins] (ibm.com)

Must-gather link

long snippet from e2e log

external internet 09/01/23 07:26:09.624
Sep  1 07:26:09.624: INFO: Running 'oc --namespace=e2e-test-egress-firewall-e2e-2vvzx --kubeconfig=/tmp/configfile1049873803 exec egressfirewall -- curl -q -s -I -m1 http://www.google.com:80'
STEP: creating an egressfirewall object 09/01/23 07:26:09.903
STEP: calling oc create -f /tmp/fixture-testdata-dir978363556/test/extended/testdata/egress-firewall/ovnk-egressfirewall-test.yaml 09/01/23 07:26:09.903
Sep  1 07:26:09.904: INFO: Running 'oc --namespace=e2e-test-egress-firewall-e2e-2vvzx --kubeconfig=/root/.kube/config create -f /tmp/fixture-testdata-dir978363556/test/extended/testdata/egress-firewall/ovnk-egressfirewall-test.yaml'
egressfirewall.k8s.ovn.org/default createdSTEP: sending traffic to control plane nodes should work 09/01/23 07:26:22.122
Sep  1 07:26:22.130: INFO: Running 'oc --namespace=e2e-test-egress-firewall-e2e-2vvzx --kubeconfig=/tmp/configfile1049873803 exec egressfirewall -- curl -q -s -I -m1 -k https://193.168.200.248:6443'
Sep  1 07:26:23.358: INFO: Error running /usr/local/bin/oc --namespace=e2e-test-egress-firewall-e2e-2vvzx --kubeconfig=/tmp/configfile1049873803 exec egressfirewall -- curl -q -s -I -m1 -k https://193.168.200.248:6443:
StdOut>
command terminated with exit code 28
StdErr>
command terminated with exit code 28[AfterEach] [sig-network][Feature:EgressFirewall]
  github.com/openshift/origin/test/extended/util/client.go:180
STEP: Collecting events from namespace "e2e-test-egress-firewall-e2e-2vvzx". 09/01/23 07:26:23.358
STEP: Found 4 events. 09/01/23 07:26:23.361
Sep  1 07:26:23.361: INFO: At 2023-09-01 07:26:08 -0400 EDT - event for egressfirewall: {multus } AddedInterface: Add eth0 [10.131.0.89/23] from ovn-kubernetes
Sep  1 07:26:23.361: INFO: At 2023-09-01 07:26:08 -0400 EDT - event for egressfirewall: {kubelet lon06-worker-0.rdr-qe-ocp-upi-7250.redhat.com} Pulled: Container image "quay.io/openshift/community-e2e-images:e2e-quay-io-redhat-developer-nfs-server-1-1-dlXGfzrk5aNo8EjC" already present on machine
Sep  1 07:26:23.361: INFO: At 2023-09-01 07:26:08 -0400 EDT - event for egressfirewall: {kubelet lon06-worker-0.rdr-qe-ocp-upi-7250.redhat.com} Created: Created container egressfirewall-container
Sep  1 07:26:23.361: INFO: At 2023-09-01 07:26:08 -0400 EDT - event for egressfirewall: {kubelet lon06-worker-0.rdr-qe-ocp-upi-7250.redhat.com} Started: Started container egressfirewall-container
Sep  1 07:26:23.363: INFO: POD             NODE                                           PHASE    GRACE  CONDITIONS
Sep  1 07:26:23.363: INFO: egressfirewall  lon06-worker-0.rdr-qe-ocp-upi-7250.redhat.com  Running         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2023-09-01 07:26:07 -0400 EDT  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2023-09-01 07:26:09 -0400 EDT  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2023-09-01 07:26:09 -0400 EDT  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2023-09-01 07:26:07 -0400 EDT  }]
Sep  1 07:26:23.363: INFO: 
Sep  1 07:26:23.367: INFO: skipping dumping cluster info - cluster too large
Sep  1 07:26:23.383: INFO: Deleted {user.openshift.io/v1, Resource=users  e2e-test-egress-firewall-e2e-2vvzx-user}, err: <nil>
Sep  1 07:26:23.398: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-egress-firewall-e2e-2vvzx}, err: <nil>
Sep  1 07:26:23.414: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  sha256~X_2HPGEj3O9hpd-3XKTckrp9bO23s_7zlJ3Tkn7ncBE}, err: <nil>
[AfterEach] [sig-network][Feature:EgressFirewall]
  github.com/openshift/origin/test/extended/util/client.go:180
STEP: Collecting events from namespace "e2e-test-no-egress-firewall-e2e-84f48". 09/01/23 07:26:23.414
STEP: Found 0 events. 09/01/23 07:26:23.416
Sep  1 07:26:23.417: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
Sep  1 07:26:23.417: INFO: 
Sep  1 07:26:23.421: INFO: skipping dumping cluster info - cluster too large
Sep  1 07:26:23.446: INFO: Deleted {user.openshift.io/v1, Resource=users  e2e-test-no-egress-firewall-e2e-84f48-user}, err: <nil>
Sep  1 07:26:23.451: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-no-egress-firewall-e2e-84f48}, err: <nil>
Sep  1 07:26:23.457: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  sha256~2Lk8-jWfwpdyo59E9YF7kQFKH2LBUSvnbJdKj7rOzn4}, err: <nil>
[DeferCleanup (Each)] [sig-network][Feature:EgressFirewall]
  dump namespaces | framework.go:196
STEP: dump namespace information after failure 09/01/23 07:26:23.457
[DeferCleanup (Each)] [sig-network][Feature:EgressFirewall]
  tear down framework | framework.go:193
STEP: Destroying namespace "e2e-test-no-egress-firewall-e2e-84f48" for this suite. 09/01/23 07:26:23.457
[DeferCleanup (Each)] [sig-network][Feature:EgressFirewall]
  dump namespaces | framework.go:196
STEP: dump namespace information after failure 09/01/23 07:26:23.462
[DeferCleanup (Each)] [sig-network][Feature:EgressFirewall]
  tear down framework | framework.go:193
STEP: Destroying namespace "e2e-test-egress-firewall-e2e-2vvzx" for this suite. 09/01/23 07:26:23.463
fail [github.com/openshift/origin/test/extended/networking/egress_firewall.go:155]: Unexpected error:
    <*fmt.wrapError | 0xc001dd50a0>: {
        msg: "Error running /usr/local/bin/oc --namespace=e2e-test-egress-firewall-e2e-2vvzx --kubeconfig=/tmp/configfile1049873803 exec egressfirewall -- curl -q -s -I -m1 -k https://193.168.200.248:6443:\nStdOut>\ncommand terminated with exit code 28\nStdErr>\ncommand terminated with exit code 28\nexit status 28\n",
        err: <*exec.ExitError | 0xc001dd5080>{
            ProcessState: {
                pid: 140483,
                status: 7168,
                rusage: {
                    Utime: {Sec: 0, Usec: 149480},
                    Stime: {Sec: 0, Usec: 19930},
                    Maxrss: 222592,
                    Ixrss: 0,
                    Idrss: 0,
                    Isrss: 0,
                    Minflt: 1536,
                    Majflt: 0,
                    Nswap: 0,
                    Inblock: 0,
                    Oublock: 0,
                    Msgsnd: 0,
                    Msgrcv: 0,
                    Nsignals: 0,
                    Nvcsw: 596,
                    Nivcsw: 173,
                },
            },
            Stderr: nil,
        },
    }
    Error running /usr/local/bin/oc --namespace=e2e-test-egress-firewall-e2e-2vvzx --kubeconfig=/tmp/configfile1049873803 exec egressfirewall -- curl -q -s -I -m1 -k https://193.168.200.248:6443:
    StdOut>
    command terminated with exit code 28
    StdErr>
    command terminated with exit code 28
    exit status 28
    
occurred
Ginkgo exit error 1: exit with code 1failed: (18.7s) 2023-09-01T11:26:23 "[sig-network][Feature:EgressFirewall] when using openshift ovn-kubernetes should ensure egressfirewall is created  [Suite:openshift/conformance/parallel]"

Version-Release number of selected component (if applicable):

4.13.11

How reproducible:

This e2e failure is not consistently reproducible.

Steps to Reproduce:

1. Start a Z-stream job via Jenkins
2. Monitor e2e

Actual results:

e2e is failing

Expected results:

e2e should pass

Additional info:

 

This is a clone of issue OCPBUGS-33695. The following is the description of the original issue:

Description of problem:

Found in the QE CI case failure https://issues.redhat.com/browse/OCPQE-22045: 4.16 HCP oauth-openshift panics when anonymously curl'ed (this is not seen in OCP 4.16 or HCP 4.15).

Version-Release number of selected component (if applicable):

HCP 4.16 4.16.0-0.nightly-2024-05-14-165654    

How reproducible:

Always

Steps to Reproduce:

1.
$ export KUBECONFIG=HCP.kubeconfig
$ oc get --raw=/.well-known/oauth-authorization-server | jq -r .issuer
https://oauth-clusters-hypershift-ci-283235.apps.xxxx.com:443

2. Panics when anonymously curl'ed:
$ curl -k "https://oauth-clusters-hypershift-ci-283235.apps.xxxx.com:443/oauth/authorize?response_type=token&client_id=openshift-challenging-client"
This request caused apiserver to panic. Look in the logs for details.

3. Check logs.
$ oc --kubeconfig=/home/xxia/my/env/hypershift-management/mjoseph-hyp-283235-416/kubeconfig -n clusters-hypershift-ci-283235 get pod | grep oauth-openshift
oauth-openshift-55c6967667-9bxz9                     2/2     Running   0          6h23m
oauth-openshift-55c6967667-l55fh                     2/2     Running   0          6h22m
oauth-openshift-55c6967667-ntc6l                     2/2     Running   0          6h23m

$ for i in oauth-openshift-55c6967667-9bxz9 oauth-openshift-55c6967667-l55fh oauth-openshift-55c6967667-ntc6l; do oc logs --timestamps --kubeconfig=/home/xxia/my/env/hypershift-management/mjoseph-hyp-283235-416/kubeconfig -n clusters-hypershift-ci-283235 $i > logs/hypershift-management/mjoseph-hyp-283235-416/$i.log; done

$ grep -il panic *.log
oauth-openshift-55c6967667-ntc6l.log

$ cat oauth-openshift-55c6967667-ntc6l.log
2024-05-15T03:43:59.769424528Z I0515 03:43:59.769303       1 secure_serving.go:57] Forcing use of http/1.1 only
2024-05-15T03:43:59.772754182Z I0515 03:43:59.772725       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
2024-05-15T03:43:59.772803132Z I0515 03:43:59.772782       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
2024-05-15T03:43:59.772841518Z I0515 03:43:59.772834       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
2024-05-15T03:43:59.772870498Z I0515 03:43:59.772787       1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
2024-05-15T03:43:59.772982605Z I0515 03:43:59.772736       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
2024-05-15T03:43:59.773009678Z I0515 03:43:59.773002       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
2024-05-15T03:43:59.773214896Z I0515 03:43:59.773194       1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/etc/kubernetes/certs/serving-cert/tls.crt::/etc/kubernetes/certs/serving-cert/tls.key"
2024-05-15T03:43:59.773939655Z I0515 03:43:59.773923       1 secure_serving.go:213] Serving securely on [::]:6443
2024-05-15T03:43:59.773965659Z I0515 03:43:59.773952       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
2024-05-15T03:43:59.873008524Z I0515 03:43:59.872970       1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
2024-05-15T03:43:59.873078108Z I0515 03:43:59.873021       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
2024-05-15T03:43:59.873120163Z I0515 03:43:59.873032       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
2024-05-15T09:25:25.782066400Z E0515 09:25:25.782026       1 runtime.go:77] Observed a panic: runtime error: invalid memory address or nil pointer dereference
2024-05-15T09:25:25.782066400Z goroutine 8662 [running]:
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1()
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/server/filters/timeout.go:110 +0x9c
2024-05-15T09:25:25.782066400Z panic({0x2115f60?, 0x3c45ec0?})
2024-05-15T09:25:25.782066400Z     runtime/panic.go:914 +0x21f
2024-05-15T09:25:25.782066400Z github.com/openshift/oauth-server/pkg/oauth/handlers.(*unionAuthenticationHandler).AuthenticationNeeded(0xc0008a90e0, {0x7f2a74268bd8?, 0xc000607760?}, {0x293c340?, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     github.com/openshift/oauth-server/pkg/oauth/handlers/default_auth_handler.go:122 +0xce1
2024-05-15T09:25:25.782066400Z github.com/openshift/oauth-server/pkg/oauth/handlers.(*authorizeAuthenticator).HandleAuthorize(0xc0008a9110, 0xc0007b06c0, 0x7?, {0x293c340, 0xc0007d1ef0})
2024-05-15T09:25:25.782066400Z     github.com/openshift/oauth-server/pkg/oauth/handlers/authenticator.go:54 +0x21d
2024-05-15T09:25:25.782066400Z github.com/openshift/oauth-server/pkg/osinserver.AuthorizeHandlers.HandleAuthorize({0xc0008a91a0?, 0x3, 0x772d66?}, 0x22ef8e0?, 0xc0007b2420?, {0x293c340, 0xc0007d1ef0})
2024-05-15T09:25:25.782066400Z     github.com/openshift/oauth-server/pkg/osinserver/interfaces.go:29 +0x95
2024-05-15T09:25:25.782066400Z github.com/openshift/oauth-server/pkg/osinserver.(*osinServer).handleAuthorize(0xc0004a54c0, {0x293c340, 0xc0007d1ef0}, 0xd?)
2024-05-15T09:25:25.782066400Z     github.com/openshift/oauth-server/pkg/osinserver/osinserver.go:77 +0x25e
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0x0?, {0x293c340?, 0xc0007d1ef0?}, 0x410acc?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z net/http.(*ServeMux).ServeHTTP(0x2390e60?, {0x293c340, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2514 +0x142
2024-05-15T09:25:25.782066400Z github.com/openshift/oauth-server/pkg/oauthserver.(*OAuthServerConfig).buildHandlerChainForOAuth.WithRestoreOAuthHeaders.func1({0x293c340, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     github.com/openshift/oauth-server/pkg/server/headers/oauthbasic.go:57 +0x1ca
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0x235fda0?, {0x293c340?, 0xc0007d1ef0?}, 0x291ef40?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.TrackCompleted.trackCompleted.func21({0x293c340?, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filterlatency/filterlatency.go:110 +0x177
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0x29490d0?, {0x293c340?, 0xc0007d1ef0?}, 0x4?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/endpoints/filters.withAuthorization.func1({0x293c340, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filters/authorization.go:78 +0x639
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0xc1893dc16e2d2585?, {0x293c340?, 0xc0007d1ef0?}, 0xc0007fabb8?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x293c340, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filterlatency/filterlatency.go:84 +0x192
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0x3c5b920?, {0x293c340?, 0xc0007d1ef0?}, 0x3?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server/filters.WithMaxInFlightLimit.func1({0x293c340, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/server/filters/maxinflight.go:196 +0x262
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0x235fda0?, {0x293c340?, 0xc0007d1ef0?}, 0x291ef40?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.TrackCompleted.trackCompleted.func23({0x293c340?, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filterlatency/filterlatency.go:110 +0x177
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0x7f2a74226390?, {0x293c340?, 0xc0007d1ef0?}, 0xc0007953c8?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithImpersonation.func4({0x293c340, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filters/impersonation.go:50 +0x1c3
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0xcd1160?, {0x293c340?, 0xc0007d1ef0?}, 0x0?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x293c340, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filterlatency/filterlatency.go:84 +0x192
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0x235fda0?, {0x293c340?, 0xc0007d1ef0?}, 0x291ef40?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.TrackCompleted.trackCompleted.func24({0x293c340?, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filterlatency/filterlatency.go:110 +0x177
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0xcd1160?, {0x293c340?, 0xc0007d1ef0?}, 0x0?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x293c340, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filterlatency/filterlatency.go:84 +0x192
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0x235fda0?, {0x293c340?, 0xc0007d1ef0?}, 0x291ef40?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.TrackCompleted.trackCompleted.func26({0x293c340?, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filterlatency/filterlatency.go:110 +0x177
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0x29490d0?, {0x293c340?, 0xc0007d1ef0?}, 0x291a100?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/endpoints/filters.withAuthentication.func1({0x293c340, 0xc0007d1ef0}, 0xc0007a3700)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filters/authentication.go:120 +0x7e5
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0x29490d0?, {0x293c340?, 0xc0007d1ef0?}, 0x291ef40?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x293c340, 0xc0007d1ef0}, 0xc0007a3500)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filterlatency/filterlatency.go:94 +0x37a
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0xc0003e0900?, {0x293c340?, 0xc0007d1ef0?}, 0xc00061af20?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1()
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/server/filters/timeout.go:115 +0x62
2024-05-15T09:25:25.782066400Z created by k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP in goroutine 8660
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/server/filters/timeout.go:101 +0x1b2
2024-05-15T09:25:25.782066400Z 
2024-05-15T09:25:25.782066400Z goroutine 8660 [running]:
2024-05-15T09:25:25.782066400Z k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1fb1a00?, 0xc000810260})
2024-05-15T09:25:25.782066400Z     k8s.io/apimachinery@v0.29.2/pkg/util/runtime/runtime.go:75 +0x85
2024-05-15T09:25:25.782066400Z k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0xc0005aa840, 0x1, 0x1c08865?})
2024-05-15T09:25:25.782066400Z     k8s.io/apimachinery@v0.29.2/pkg/util/runtime/runtime.go:49 +0x6b
2024-05-15T09:25:25.782066400Z panic({0x1fb1a00?, 0xc000810260?})
2024-05-15T09:25:25.782066400Z     runtime/panic.go:914 +0x21f
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc000528cc0, {0x2944dd0, 0xc000476460}, 0xdf8475800?)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/server/filters/timeout.go:121 +0x35c
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithRequestDeadline.withRequestDeadline.func27({0x2944dd0, 0xc000476460}, 0xc0007a3300)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filters/request_deadline.go:100 +0x237
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0x29490d0?, {0x2944dd0?, 0xc000476460?}, 0x2459ac0?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithWaitGroup.withWaitGroup.func28({0x2944dd0, 0xc000476460}, 0xc0004764b0?)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/server/filters/waitgroup.go:86 +0x18c
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0xc0007a3200?, {0x2944dd0?, 0xc000476460?}, 0xc0004764b0?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithWarningRecorder.func13({0x2944dd0?, 0xc000476460}, 0xc000476410?)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filters/warning.go:35 +0xc6
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0x2390e60?, {0x2944dd0?, 0xc000476460?}, 0xd?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithCacheControl.func14({0x2944dd0, 0xc000476460}, 0x0?)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filters/cachecontrol.go:31 +0xa7
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0xc0002a0fa0?, {0x2944dd0?, 0xc000476460?}, 0xc0005aad90?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithHTTPLogging.WithLogging.withLogging.func34({0x2944dd0, 0xc000476460}, 0x1?)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/server/httplog/httplog.go:111 +0x95
2024-05-15T09:25:25.782066400Z net/http.HandlerFunc.ServeHTTP(0xc0007b0360?, {0x2944dd0?, 0xc000476460?}, 0x0?)
2024-05-15T09:25:25.782066400Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782066400Z k8s.io/apiserver/pkg/endpoints/filters.WithTracing.func1({0x2944dd0?, 0xc000476460?}, 0xc0007a3200?)
2024-05-15T09:25:25.782066400Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filters/traces.go:42 +0x222
2024-05-15T09:25:25.782129547Z net/http.HandlerFunc.ServeHTTP(0x29490d0?, {0x2944dd0?, 0xc000476460?}, 0x291ef40?)
2024-05-15T09:25:25.782129547Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782129547Z go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP(0xc000289b80, {0x293c340?, 0xc0007d1bf0}, 0xc0007a3100, {0x2923a40, 0xc000528d68})
2024-05-15T09:25:25.782129547Z     go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.44.0/handler.go:217 +0x1202
2024-05-15T09:25:25.782129547Z go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1({0x293c340?, 0xc0007d1bf0?}, 0xc0001fec40?)
2024-05-15T09:25:25.782129547Z     go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.44.0/handler.go:81 +0x35
2024-05-15T09:25:25.782129547Z net/http.HandlerFunc.ServeHTTP(0x2948fb0?, {0x293c340?, 0xc0007d1bf0?}, 0x100?)
2024-05-15T09:25:25.782129547Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782129547Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithLatencyTrackers.func16({0x29377e0?, 0xc0001fec40}, 0xc000289e40?)
2024-05-15T09:25:25.782129547Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filters/webhook_duration.go:57 +0x14a
2024-05-15T09:25:25.782129547Z net/http.HandlerFunc.ServeHTTP(0xc0007a2f00?, {0x29377e0?, 0xc0001fec40?}, 0x7f2abb853108?)
2024-05-15T09:25:25.782129547Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782129547Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithRequestInfo.func17({0x29377e0, 0xc0001fec40}, 0x3d02360?)
2024-05-15T09:25:25.782129547Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filters/requestinfo.go:39 +0x118
2024-05-15T09:25:25.782129547Z net/http.HandlerFunc.ServeHTTP(0xc0007a2e00?, {0x29377e0?, 0xc0001fec40?}, 0x12a1dc02246f?)
2024-05-15T09:25:25.782129547Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782129547Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithRequestReceivedTimestamp.withRequestReceivedTimestampWithClock.func31({0x29377e0, 0xc0001fec40}, 0xc000508b58?)
2024-05-15T09:25:25.782129547Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filters/request_received_time.go:38 +0xaf
2024-05-15T09:25:25.782129547Z net/http.HandlerFunc.ServeHTTP(0x3?, {0x29377e0?, 0xc0001fec40?}, 0xc0005ab818?)
2024-05-15T09:25:25.782129547Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782129547Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithMuxAndDiscoveryComplete.func18({0x29377e0?, 0xc0001fec40?}, 0xc0007a2e00?)
2024-05-15T09:25:25.782129547Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filters/mux_discovery_complete.go:52 +0xd5
2024-05-15T09:25:25.782129547Z net/http.HandlerFunc.ServeHTTP(0xc000081800?, {0x29377e0?, 0xc0001fec40?}, 0xc0005ab888?)
2024-05-15T09:25:25.782129547Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782129547Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithPanicRecovery.withPanicRecovery.func32({0x29377e0?, 0xc0001fec40?}, 0xc0007d18f0?)
2024-05-15T09:25:25.782129547Z     k8s.io/apiserver@v0.29.2/pkg/server/filters/wrap.go:74 +0xa6
2024-05-15T09:25:25.782129547Z net/http.HandlerFunc.ServeHTTP(0x29490d0?, {0x29377e0?, 0xc0001fec40?}, 0xc00065eea0?)
2024-05-15T09:25:25.782129547Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782129547Z k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithAuditInit.withAuditInit.func33({0x29377e0, 0xc0001fec40}, 0xc00040c580?)
2024-05-15T09:25:25.782129547Z     k8s.io/apiserver@v0.29.2/pkg/endpoints/filters/audit_init.go:63 +0x12c
2024-05-15T09:25:25.782129547Z net/http.HandlerFunc.ServeHTTP(0x2390e60?, {0x29377e0?, 0xc0001fec40?}, 0xd?)
2024-05-15T09:25:25.782129547Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782129547Z github.com/openshift/oauth-server/pkg/oauthserver.(*OAuthServerConfig).buildHandlerChainForOAuth.WithPreserveOAuthHeaders.func2({0x29377e0, 0xc0001fec40}, 0xc0007a2d00)
2024-05-15T09:25:25.782129547Z     github.com/openshift/oauth-server/pkg/server/headers/oauthbasic.go:42 +0x16e
2024-05-15T09:25:25.782129547Z net/http.HandlerFunc.ServeHTTP(0xc0005aba80?, {0x29377e0?, 0xc0001fec40?}, 0x24c95d5?)
2024-05-15T09:25:25.782129547Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782129547Z github.com/openshift/oauth-server/pkg/oauthserver.(*OAuthServerConfig).buildHandlerChainForOAuth.WithStandardHeaders.func3({0x29377e0, 0xc0001fec40}, 0xc0005abb18?)
2024-05-15T09:25:25.782129547Z     github.com/openshift/oauth-server/pkg/server/headers/headers.go:30 +0xde
2024-05-15T09:25:25.782129547Z net/http.HandlerFunc.ServeHTTP(0xc0005abb68?, {0x29377e0?, 0xc0001fec40?}, 0xc00040c580?)
2024-05-15T09:25:25.782129547Z     net/http/server.go:2136 +0x29
2024-05-15T09:25:25.782129547Z k8s.io/apiserver/pkg/server.(*APIServerHandler).ServeHTTP(0x3d33480?, {0x29377e0?, 0xc0001fec40?}, 0xc0005abb50?)
2024-05-15T09:25:25.782129547Z     k8s.io/apiserver@v0.29.2/pkg/server/handler.go:189 +0x25
2024-05-15T09:25:25.782129547Z net/http.serverHandler.ServeHTTP({0xc0007d1830?}, {0x29377e0?, 0xc0001fec40?}, 0x6?)
2024-05-15T09:25:25.782129547Z     net/http/server.go:2938 +0x8e
2024-05-15T09:25:25.782129547Z net/http.(*conn).serve(0xc0007b02d0, {0x29490d0, 0xc000585e90})
2024-05-15T09:25:25.782129547Z     net/http/server.go:2009 +0x5f4
2024-05-15T09:25:25.782129547Z created by net/http.(*Server).Serve in goroutine 249
2024-05-15T09:25:25.782129547Z     net/http/server.go:3086 +0x5cb
2024-05-15T09:25:25.782129547Z http: superfluous response.WriteHeader call from k8s.io/apiserver/pkg/server.DefaultBuildHandlerChain.WithPanicRecovery.func19 (wrap.go:57)
2024-05-15T09:25:25.782129547Z E0515 09:25:25.782066       1 wrap.go:58] "apiserver panic'd" method="GET" URI="/oauth/authorize?response_type=token&client_id=openshift-challenging-client" auditID="ac4795ff-5935-4ff5-bc9e-d84018f29469"      

Actual results:

Panics when anonymously curl'ed

Expected results:

No panic

This is a clone of issue OCPBUGS-35852. The following is the description of the original issue:

Description of problem:

When the ENV OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP is set to keep the bootstrap resources and a CAPI-based installation is launched, the installer exits with an error while collecting the applied cluster API manifests, since the local cluster API control plane has already been stopped.


06-20 15:26:51.216  level=debug msg=Machine jima417aws-gjrzd-bootstrap is ready. Phase: Provisioned
06-20 15:26:51.216  level=debug msg=Checking that machine jima417aws-gjrzd-master-0 has provisioned...
06-20 15:26:51.217  level=debug msg=Machine jima417aws-gjrzd-master-0 has status: Provisioned
06-20 15:26:51.217  level=debug msg=Checking that IP addresses are populated in the status of machine jima417aws-gjrzd-master-0...
06-20 15:26:51.217  level=debug msg=Checked IP InternalDNS: ip-10-0-50-47.us-east-2.compute.internal
06-20 15:26:51.217  level=debug msg=Found internal IP address: 10.0.50.47
06-20 15:26:51.217  level=debug msg=Machine jima417aws-gjrzd-master-0 is ready. Phase: Provisioned
06-20 15:26:51.217  level=debug msg=Checking that machine jima417aws-gjrzd-master-1 has provisioned...
06-20 15:26:51.217  level=debug msg=Machine jima417aws-gjrzd-master-1 has status: Provisioned
06-20 15:26:51.217  level=debug msg=Checking that IP addresses are populated in the status of machine jima417aws-gjrzd-master-1...
06-20 15:26:51.218  level=debug msg=Checked IP InternalDNS: ip-10-0-75-199.us-east-2.compute.internal
06-20 15:26:51.218  level=debug msg=Found internal IP address: 10.0.75.199
06-20 15:26:51.218  level=debug msg=Machine jima417aws-gjrzd-master-1 is ready. Phase: Provisioned
06-20 15:26:51.218  level=debug msg=Checking that machine jima417aws-gjrzd-master-2 has provisioned...
06-20 15:26:51.218  level=debug msg=Machine jima417aws-gjrzd-master-2 has status: Provisioned
06-20 15:26:51.218  level=debug msg=Checking that IP addresses are populated in the status of machine jima417aws-gjrzd-master-2...
06-20 15:26:51.218  level=debug msg=Checked IP InternalDNS: ip-10-0-60-118.us-east-2.compute.internal
06-20 15:26:51.218  level=debug msg=Found internal IP address: 10.0.60.118
06-20 15:26:51.218  level=debug msg=Machine jima417aws-gjrzd-master-2 is ready. Phase: Provisioned
06-20 15:26:51.218  level=info msg=Control-plane machines are ready
06-20 15:26:51.218  level=info msg=Cluster API resources have been created. Waiting for cluster to become ready...
06-20 15:26:51.219  level=warning msg=OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP is set, shutting down local control plane.
06-20 15:26:51.219  level=info msg=Shutting down local Cluster API control plane...
06-20 15:26:51.473  level=info msg=Stopped controller: Cluster API
06-20 15:26:51.473  level=info msg=Stopped controller: aws infrastructure provider
06-20 15:26:52.830  level=info msg=Local Cluster API system has completed operations
06-20 15:26:52.830  level=debug msg=Collecting applied cluster api manifests...
06-20 15:26:52.831  level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: [failed to get manifest openshift-cluster-api-guests: Get "https://127.0.0.1:46555/api/v1/namespaces/openshift-cluster-api-guests": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest default: Get "https://127.0.0.1:46555/apis/infrastructure.cluster.x-k8s.io/v1beta2/awsclustercontrolleridentities/default": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest jima417aws-gjrzd: Get "https://127.0.0.1:46555/apis/cluster.x-k8s.io/v1beta1/namespaces/openshift-cluster-api-guests/clusters/jima417aws-gjrzd": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest jima417aws-gjrzd: Get "https://127.0.0.1:46555/apis/infrastructure.cluster.x-k8s.io/v1beta2/namespaces/openshift-cluster-api-guests/awsclusters/jima417aws-gjrzd": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest jima417aws-gjrzd-bootstrap: Get "https://127.0.0.1:46555/apis/infrastructure.cluster.x-k8s.io/v1beta2/namespaces/openshift-cluster-api-guests/awsmachines/jima417aws-gjrzd-bootstrap": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest jima417aws-gjrzd-master-0: Get "https://127.0.0.1:46555/apis/infrastructure.cluster.x-k8s.io/v1beta2/namespaces/openshift-cluster-api-guests/awsmachines/jima417aws-gjrzd-master-0": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest jima417aws-gjrzd-master-1: Get "https://127.0.0.1:46555/apis/infrastructure.cluster.x-k8s.io/v1beta2/namespaces/openshift-cluster-api-guests/awsmachines/jima417aws-gjrzd-master-1": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest jima417aws-gjrzd-master-2: Get "https://127.0.0.1:46555/apis/infrastructure.cluster.x-k8s.io/v1beta2/namespaces/openshift-cluster-api-guests/awsmachines/jima417aws-gjrzd-master-2": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest jima417aws-gjrzd-bootstrap: Get "https://127.0.0.1:46555/apis/cluster.x-k8s.io/v1beta1/namespaces/openshift-cluster-api-guests/machines/jima417aws-gjrzd-bootstrap": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest jima417aws-gjrzd-master-0: Get "https://127.0.0.1:46555/apis/cluster.x-k8s.io/v1beta1/namespaces/openshift-cluster-api-guests/machines/jima417aws-gjrzd-master-0": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest jima417aws-gjrzd-master-1: Get "https://127.0.0.1:46555/apis/cluster.x-k8s.io/v1beta1/namespaces/openshift-cluster-api-guests/machines/jima417aws-gjrzd-master-1": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest jima417aws-gjrzd-master-2: Get "https://127.0.0.1:46555/apis/cluster.x-k8s.io/v1beta1/namespaces/openshift-cluster-api-guests/machines/jima417aws-gjrzd-master-2": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest jima417aws-gjrzd-bootstrap: Get "https://127.0.0.1:46555/api/v1/namespaces/openshift-cluster-api-guests/secrets/jima417aws-gjrzd-bootstrap": dial tcp 127.0.0.1:46555: connect: connection refused, failed to get manifest jima417aws-gjrzd-master: Get "https://127.0.0.1:46555/api/v1/namespaces/openshift-cluster-api-guests/secrets/jima417aws-gjrzd-master": dial tcp 127.0.0.1:46555: connect: connection refused]

Version-Release number of selected component (if applicable):

4.16/4.17 nightly build

How reproducible:

always

Steps to Reproduce:

1. Set the OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP environment variable
2. Trigger the CAPI-based installation

Actual results:

The installer exited while collecting the CAPI manifests.

Expected results:

Installation should be successful.

Additional info:

 

 

Description of problem:

In https://github.com/openshift/release/pull/47648 ecr-credentials-provider is built in CI and later included in RHCOS.

To make it work on OKD, it needs to be included in the payload so that the OKD machine-os can extract the RPM and install it on the host.
    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Ref: OCPBUGS-25662

Please review the following PR: https://github.com/openshift/ironic-static-ip-manager/pull/41

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When only one server CSR is pending approval, we still show two records: one saying a client CSR requires approval (even though it is already several hours old) and another saying a server CSR requires approval.

Version-Release number of selected component (if applicable):

pre-merge testing of https://github.com/openshift/console/pull/13493     

How reproducible:

Always    

Steps to Reproduce:

1. select one node which is joining the cluster, approve client CSR and do not approve server CSR, wait for some time

=> we can see only one node is pending on server CSR approval
$ oc get csr | grep Pending | grep system:node
csr-54sn4   142m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-49-55.us-east-2.compute.internal                        <none>              Pending
csr-7nhb9   65m    kubernetes.io/kubelet-serving                 system:node:ip-10-0-49-55.us-east-2.compute.internal                        <none>              Pending
csr-9g22f   4m4s   kubernetes.io/kubelet-serving                 system:node:ip-10-0-49-55.us-east-2.compute.internal                        <none>              Pending
csr-bgrdq   35m    kubernetes.io/kubelet-serving                 system:node:ip-10-0-49-55.us-east-2.compute.internal                        <none>              Pending
csr-chqnf   50m    kubernetes.io/kubelet-serving                 system:node:ip-10-0-49-55.us-east-2.compute.internal                        <none>              Pending
csr-f4sbl   127m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-49-55.us-east-2.compute.internal                        <none>              Pending
csr-msnml   157m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-49-55.us-east-2.compute.internal                        <none>              Pending
csr-p9qrp   19m    kubernetes.io/kubelet-serving                 system:node:ip-10-0-49-55.us-east-2.compute.internal                        <none>              Pending
csr-qp2pw   112m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-49-55.us-east-2.compute.internal                        <none>              Pending
csr-qrlnv   96m    kubernetes.io/kubelet-serving                 system:node:ip-10-0-49-55.us-east-2.compute.internal                        <none>              Pending
csr-tk7j4   81m    kubernetes.io/kubelet-serving                 system:node:ip-10-0-49-55.us-east-2.compute.internal                        <none>              Pending

Actual results:

1. on nodes list page, we can see two rows shown for node ip-10-0-49-55.us-east-2.compute.internal

Expected results:

Since the pending client CSR has been there for several hours and the node is now actually waiting for server CSR approval, we should show only one record/row to indicate to the user that the node requires server CSR approval.

The pending client CSR associated with ip-10-0-49-55.us-east-2.compute.internal is already 3 hours old
$ oc get csr csr-4d628
NAME        AGE   SIGNERNAME                                    REQUESTOR                                                                   REQUESTEDDURATION   CONDITION
csr-4d628   3h    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending

Additional info:

    

Description of problem:

    In accounts with a large number of resources, the destroy code fails to list all resources. This has revealed some changes that need to be made to the destroy code to handle these situations.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Difficult - but we have an account where we can reproduce it consistently

Steps to Reproduce:

    1. Try to destroy a cluster in an account with a large amount of resources.
    2. Fail.
    3.
    

Actual results:

Fail to destroy    

Expected results:

Destroy succeeds

Additional info:
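A common cause of incomplete listings at this scale is issuing a single list call instead of paginating until the API stops returning a continuation token. The Go sketch below is illustrative only and assumes a hypothetical listPage call; the actual destroy code and the affected cloud APIs are not shown here.

package destroyexample

import "fmt"

// Illustrative sketch only: listPage is a hypothetical API call returning one
// page of resource identifiers plus a continuation token ("" when done).
func listPage(token string) (ids []string, next string, err error) {
	// ... call the cloud API, passing the continuation token ...
	return nil, "", nil
}

// listAllResources keeps requesting pages until no continuation token is
// returned, so accounts with very large numbers of resources are fully
// enumerated before the destroy logic acts on the results.
func listAllResources() ([]string, error) {
	var all []string
	token := ""
	for {
		ids, next, err := listPage(token)
		if err != nil {
			return nil, fmt.Errorf("listing resources: %w", err)
		}
		all = append(all, ids...)
		if next == "" {
			return all, nil
		}
		token = next
	}
}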

    


Cluster operator status showing `Unavailable`: 

ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: APIServiceResourceIssue, message: found the CA cert is not active

The script below is used for checking the validity of the certificates and recreating them:

 

# Check Cluster Existing Certificates :
        echo -e "NAMESPACE\tNAME\tEXPIRY" && oc get secrets -A -o go-template='{{range .items}}{{if eq .type "kubernetes.io/tls"}}{{.metadata.namespace}}{{" "}}{{.metadata.name}}{{" "}}{{index .data "tls.crt"}}{{"\n"}}{{end}}{{end}}' | while read namespace name cert; do echo -en "$namespace\t$name\t"; echo $cert | base64 -d | openssl x509 -noout -enddate; done | column -t
# Manually Update Cluster Certificates : 
        az aro update -n xxxx -g xxxx  --refresh-credentials --debug
# Check again Cluster Existing Certificates :

        echo -e "NAMESPACE\tNAME\tEXPIRY" && oc get secrets -A -o go-template='{{range .items}}{{if eq .type "kubernetes.io/tls"}}{{.metadata.namespace}}{{" "}}{{.metadata.name}}{{" "}}{{index .data "tls.crt"}}{{"\n"}}{{end}}{{end}}' | while read namespace name cert; do echo -en "$namespace\t$name\t"; echo $cert | base64 -d | openssl x509 -noout -enddate; done | column -t
#Renew Secret/Certificate for OLM :
        # Check Secret Expiration :
                oc get secret packageserver-service-cert -o json -n openshift-operator-lifecycle-manager | jq -r '.data | .["tls.crt"]' | base64 -d | openssl x509 -noout -dates
        # Backup the current secret :
                oc get secret packageserver-service-cert -o json -n openshift-operator-lifecycle-manager > packageserver-service-cert.yaml
        # Delete the Secret :
                oc delete secret packageserver-service-cert -n openshift-operator-lifecycle-manager
        # Check Secret Expiration again :
                oc get secret packageserver-service-cert -o json -n openshift-operator-lifecycle-manager | jq -r '.data | .["tls.crt"]' | base64 -d | openssl x509 -noout -dates
# Get Cluster Operator :
  oc get co
  oc get co operator-lifecycle-manager
  oc get co operator-lifecycle-manager-catalog
  oc get co operator-lifecycle-manager-packageserver
# Go to the kube-system namespace and take the backup of extension-apiserver-authentication configmap:
  oc project kube-system 
  oc get cm extension-apiserver-authentication -oyaml >> extcm_backup.yaml
# Delete the extension-apiserver-authentication configmap :
  oc delete cm extension-apiserver-authentication -n kube-system
  oc get cm -n kube-system |grep extension-apiserver-authentication
  oc get apiservice v1.packages.operators.coreos.com -o jsonpath='{.spec.caBundle}' | base64 -d | openssl x509 -noout -text
 

We have checked the certificate details as below :

$ oc get apiservice v1.packages.operators.coreos.com -o jsonpath='{.spec.caBundle}' | base64 -d | openssl x509 -text
E1213 10:24:41.606151 3802053 memcache.go:255] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request
E1213 10:24:41.639144 3802053 memcache.go:106] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request
E1213 10:24:41.651532 3802053 memcache.go:106] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request
E1213 10:24:41.660851 3802053 memcache.go:106] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 5319897470906267024 (0x49d4129052ddf590)
        Signature Algorithm: ecdsa-with-SHA256
        Issuer: O = "Red Hat, Inc."
        Validity
            Not Before: Nov 29 18:41:35 2021 GMT
            Not After : Nov 29 18:41:35 2023 GMT
        Subject: O = "Red Hat, Inc."
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:ea:c0:af:d3:af:e6:0e:61:82:c8:f4:fe:ec:22:
                    8d:c5:c1:08:6f:91:92:8b:09:05:e9:72:ca:d4:68:
                    fb:aa:e1:ec:e2:e8:ca:32:4c:1f:e7:fc:3a:eb:61:
                    0b:df:9c:b4:13:62:f4:67:6c:d2:8f:97:a0:a8:a8:
                    69:08:22:4d:62
                ASN1 OID: prime256v1
                NIST CURVE: P-256
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Certificate Sign
            X509v3 Extended Key Usage:
                TLS Web Client Authentication, TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:TRUE
            X509v3 Subject Key Identifier:
                53:A4:1D:22:F8:0F:8E:C5:74:8C:C6:F4:90:F0:2D:29:B0:65:89:19
    Signature Algorithm: ecdsa-with-SHA256
         30:45:02:21:00:f5:32:98:3d:34:b6:fd:65:47:3b:31:0d:88:
         fc:fe:35:cd:4f:51:75:a0:89:16:1a:9e:56:d5:f7:49:e6:3a:
         a3:02:20:43:fa:81:78:56:f4:1f:9b:3a:5b:7f:28:7e:a8:5b:
         b7:7a:3e:0a:99:67:88:0e:66:e4:c9:d5:9d:2f:79:80:3e
-----BEGIN CERTIFICATE-----
MIIBhzCCAS2gAwIBAgIISdQSkFLd9ZAwCgYIKoZIzj0EAwIwGDEWMBQGA1UEChMN
UmVkIEhhdCwgSW5jLjAeFw0yMTExMjkxODQxMzVaFw0yMzExMjkxODQxMzVaMBgx
FjAUBgNVBAoTDVJlZCBIYXQsIEluYy4wWTATBgcqhkjOPQIBBggqhkjOPQMBBwNC
AATqwK/Tr+YOYYLI9P7sIo3FwQhvkZKLCQXpcsrUaPuq4ezi6MoyTB/n/DrrYQvf
nLQTYvRnbNKPl6CoqGkIIk1io2EwXzAOBgNVHQ8BAf8EBAMCAoQwHQYDVR0lBBYw
FAYIKwYBBQUHAwIGCCsGAQUFBwMBMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYE
FFOkHSL4D47FdIzG9JDwLSmwZYkZMAoGCCqGSM49BAMCA0gAMEUCIQD1Mpg9NLb9
ZUc7MQ2I/P41zU9RdaCJFhqeVtX3SeY6owIgQ/qBeFb0H5s6W38ofqhbt3o+Cpln
iA5m5MnVnS95gD4=

 

Description of problem:

When creating an application based on devfile "Import from Git" in Developer console using only GitLab repo, the following error block to create it.
It only happens when using GitLab, not GitHub, and the CLI operation based on "oc new-app" works well. In other words, the issue affects only the Developer console.

  Could not fetch kubernetes resource "/deploy.yaml" for component "kubernetes-deploy" from Git repository https://{gitlaburl}.

Version-Release number of selected component (if applicable):

4.15.z

How reproducible:

Always

Steps to Reproduce:

You can always reproduce according to the following procedures.
a. Switch "Developer" mode at your web console.
b. Move "+Add", then click "Import from Git" in "Git Repository" section at the page.
c. Input "https://<GITLAB HOSTNAME>/XXXX/devfile-sample-go-basic.git" to the "Git Repo URL" text box.
d. Select "GitLab" at "Git type" drop box.
e. You can see the below error messages.

Actual results:

The "/deploy.yaml" file path is evaluated as invalid, with a 400 response status, during the process as shown below.
Looking at the URL, "/%2Fdeploy.yaml" shows that the leading slash was duplicated.

  Request URL:
    https://<GITLAB HOSTNAME>/api/v4/projects/yyyy/repository/files/%2Fdeploy.yaml/raw?ref=main
  Response:
    {"error":"file_path should be a valid file path"}

Expected results:

 The request URL for the "deploy.yaml" file should have the duplicated leading slash removed so that it provides the correct file path.

 Request URL:
   https://<GITLAB HOSTNAME>/api/v4/projects/yyyy/repository/files/deploy.yaml/raw?ref=main
 Response:
   "deploy.yaml" contents.

Additional info:

I submitted a pull request to fix this here: https://github.com/openshift/console/pull/13812 
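The actual fix is in the console PR above (TypeScript); purely to illustrate the idea, here is a hedged Go sketch in which a hypothetical helper strips the leading slash before percent-encoding the file path for GitLab's repository files API (the host, project ID, and helper name are assumptions for the example).

package gitlabexample

import (
	"fmt"
	"net/url"
	"strings"
)

// buildRawFileURL trims any leading slash from the devfile-referenced path
// before encoding it, so "/deploy.yaml" becomes ".../repository/files/deploy.yaml/raw"
// instead of ".../repository/files/%2Fdeploy.yaml/raw".
func buildRawFileURL(host, projectID, filePath, ref string) string {
	cleaned := strings.TrimPrefix(filePath, "/")
	return fmt.Sprintf("https://%s/api/v4/projects/%s/repository/files/%s/raw?ref=%s",
		host, projectID, url.PathEscape(cleaned), url.QueryEscape(ref))
}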

Please review the following PR: https://github.com/openshift/images/pull/156

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

While testing oc adm upgrade status against b02, I noticed some COs do not have any annotations, while I expected them to have the include/exclude.release.openshift.io/* ones (to recognize COs that come from the payload).

$ b02 get clusteroperator etcd -o jsonpath={.metadata.annotations}
$ ota-stage get clusteroperator etcd -o jsonpath={.metadata.annotations}
{"exclude.release.openshift.io/internal-openshift-hosted":"true","include.release.openshift.io/self-managed-high-availability":"true","include.release.openshift.io/single-node-developer":"true"}

CVO only precreates CO resources and does not reconcile them once they exist. Build02 does not have COs with reconciled metadata because it was born as 4.2, which (AFAIK) is before OCP started to use the exclude/include annotations.

Version-Release number of selected component (if applicable):

4.16 (development branch)

How reproducible:

deterministic

Steps to Reproduce:

1. delete an annotation on a ClusterOperator resource

Actual results:

The annotation won't be recreated.

Expected results:

The annotation should be recreated

Other Incomplete

This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were not completed when this image was assembled

Description of problem:

OCP cluster upgrade is stuck with image registry pod in degraded state.


The image registry co shows the below error message.

- lastTransitionTime: "2024-09-13T03:15:05Z"
    message: "Progressing: All registry resources are removed\nNodeCADaemonProgressing:
      The daemon set node-ca is deployed\nAzurePathFixProgressing: Migration failed:
      I0912 18:18:02.117077       1 main.go:233] Azure Stack Hub environment variables
      not present in current environment, skipping setup...\nAzurePathFixProgressing:
      panic: Get \"https://xxxxximageregistry.blob.core.windows.net/xxxxcontainer?comp=list&prefix=docker&restype=container\":
      dial tcp: lookup xxxximageregistry.blob.core.windows.net on 192.168.xx.xx.
      no such host\nAzurePathFixProgressing: \nAzurePathFixProgressing: goroutine
      1 [running]:\nAzurePathFixProgressing: main.main()\nAzurePathFixProgressing:
      \t/go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:53
      +0x12a\nAzurePathFixProgressing: "
    reason: AzurePathFixFailed::Removed
    status: "False"
    type: Progressing    

Version-Release number of selected component (if applicable):

4.14.33    

How reproducible:

    

Steps to Reproduce:

    1. configure azure storage in configs.imageregistry.operator.openshift.io/cluster     
    2. then mark the managementState as Removed 
    3. check the operator status 
    

Actual results:

CO image-registry remains in degraded state

Expected results:

Operator should not be in degraded state    

Additional info:

    

Description of problem:

2024-05-07 17:21:59 level=debug msg=baremetal: getting master addresses
2024-05-07 17:21:59 level=warning msg=Failed to extract host addresses: open ocp/ostest/.masters.json: no such file or directory

Description of problem:

    When trying to delete a machine after its instance is in ERROR state, the machine is stuck in deleting.

Version-Release number of selected component (if applicable):

OSP RHOS-17.1-RHEL-9-20240516.n.1
OCP 4.17.0-0.ci-2024-06-01-234742

How reproducible:

    

Steps to Reproduce:

    1. openstack server stop <node instance>
    2. openstack server set --state error <node instance>
    3. oc delete <machine>
    4. oc get machines -A
    5. Verify the machine is stuck in deleting
    

Actual results:

    

Expected results:

    

Additional info:

    

 When collecting onprem events, we want to be able to distinguish among the various onprem deployments:

  • SaaS on console.redhat.com
  • An operator packaged with ACM
  • An operator deployed with MCE
  • Deployed via agent-based install (ABI)
  • Deployed via Podman *(unsupported)
  • Deployed as a stand-alone operator on Openshift *(unsupported)

We should also make sure this info is forwarded when collecting events.

We should also define a human-friendly version for each

 

Slack thread about the supported deployment types

https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1706209886659329

User Story:

As a HyperShift Engineer, I want to be able to:

  • measure the requests being sent by the HCCO to the management plane

so that I can achieve

  • effective quota limits on our management KAS throughput

As a HyperShift Engineer, I want to be able to:

  • identify the specific requests being sent by every component that talks to the management KAS by their GVR and verb

so that I can achieve

  • simple and effective root-causing and debugging of KAS throughput regressions
  • identification of areas to simplify and make more efficient

 

As a HyperShift Engineer, I want to be able to:

  • measure the API load on the management KAS by component and request type (GVR, verb, etc) over axes of time, release version, hyperscaler, etc

so that I can achieve

  • an understanding of trends over time, between environments, etc

Acceptance Criteria:

Description of criteria:

  • HCCO exposes metrics; management Prometheus ingests them
  • downscaled per-test, per-component API throughput metrics are exposed
  • said metrics are visualized in a UI for ease of consumption
  • said metrics can be validated by a server that can answer questions like "for this test, in this environment, on this release, is $amount of requests within reason or a regression?"

This does not require a design proposal.
This does not require a feature gate.
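As an illustration of the instrumentation these stories describe (not the HCCO's actual metric; the metric name and labels are assumptions), a client_golang counter broken down by component, GVR, and verb could be registered like this:

package metricsexample

import "github.com/prometheus/client_golang/prometheus"

// kasRequests is a hypothetical counter that would let management-KAS load be
// sliced by component, GroupVersionResource, and verb in Prometheus.
var kasRequests = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "hypershift_management_kas_requests_total",
		Help: "Requests sent to the management kube-apiserver, by component, GVR, and verb.",
	},
	[]string{"component", "group", "version", "resource", "verb"},
)

func init() {
	prometheus.MustRegister(kasRequests)
}

// Usage example: kasRequests.WithLabelValues("hcco", "apps", "v1", "deployments", "get").Inc()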

Description of problem:

In OCP 4.17, kube-apiserver no longer gets a valid cloud config. Therefore the PersistentVolumeLabel admission plugin rejects in-tree GCE PD PVs that do not have correct topology with `persistentvolumes \"gce-\" is forbidden: error querying GCE PD volume e2e-4d8656c6-d1d4-4245-9527-33e5ed18dd31: disk is not found`

 

In 4.16, kube-apiserver will not get a valid cloud config after it updates library-go with this PR.

 

How reproducible:

always    

Steps to Reproduce:

    1. Run e2e test "Multi-AZ Cluster Volumes should schedule pods in the same zones as statically provisioned PVs"
    

 

The `oc adm release` commands that use git currently do full clones of every repo in the releases specified. This causes the command to take a long time and use a lot of disk space (approximately 31GB to generate a 4.14.0->4.15.0 changelog). This can be optimized to significantly reduce the disk space and time required to run these commands.
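One possible optimization, shown only as a sketch and not necessarily the approach oc adm release will take, is a git partial clone: a blobless, no-checkout clone fetches just the commit history a changelog needs, at a fraction of the disk and network cost.

package changelogexample

import "os/exec"

// shallowHistoryClone fetches only commit metadata: --filter=blob:none skips
// file contents (a partial clone) and --no-checkout avoids populating a
// working tree, which is enough to walk commit messages between two tags.
func shallowHistoryClone(repoURL, dir string) error {
	return exec.Command("git", "clone", "--filter=blob:none", "--no-checkout", repoURL, dir).Run()
}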

While debugging a problem, I noticed some containers lack FallbackToLogsOnError.  This is important for debugging via the API.  Found via https://github.com/openshift/origin/pull/28547
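For reference, FallbackToLogsOnError is the standard corev1 termination message policy; a container spec opts in as in the sketch below (the container name and image are placeholders).

package podspecexample

import corev1 "k8s.io/api/core/v1"

// exampleContainer returns a container whose termination message falls back to
// the tail of its log when it exits without writing a termination message
// file, which is what makes the failure visible via the API.
func exampleContainer() corev1.Container {
	return corev1.Container{
		Name:                     "example",
		Image:                    "registry.example.com/example:latest",
		TerminationMessagePolicy: corev1.TerminationMessageFallbackToLogsOnError,
	}
}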

Description of problem:

Looking at the telemetry data for Nutanix I noticed that the “host_type” for clusters installed with platform nutanix shows as “virt-unknown”. Do you know what needs to happen in the code to tell telemetry about host type being Nutanix? The problem is that we can’t track those installations with platform none, just IPI.

Refer to the slack thread https://redhat-internal.slack.com/archives/C0211848DBN/p1687864857228739.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

Create an OCP Nutanix cluster

Actual results:

The telemetry data for Nutanix shows the “host_type” for the nutanix cluster as “virt-unknown”.

Expected results:

The telemetry data for Nutanix shows the “host_type” for the nutanix cluster as "nutanix".

Additional info:

 

Even if the --namespace arg is specified to hypershift install render, the openshift-config-managed-trusted-ca-bundle configmap's namespace is always set to "hypershift".

Description of problem: Not all net.* per-interface sysctls declared safe by TELCOSTRAT-10 / CNF-3642 are implemented in the default cni-sysctl-allowlist.

E.g.
net.ipv6.conf.IFNAME.disable_ipv6,
net.ipv6.conf.IFNAME.disable_policy,
net.ipv4.conf.IFNAME.rp_filter,
net.ipv4.conf.IFNAME.forwarding,

and possibly others.

Version-Release number of selected component (if applicable): 4.14

How reproducible: Always

Steps to Reproduce:
1. Compare the list of per-interface sysctls declared safe in TELCOSTRAT-10 / CNF-3642 / CNF-4093 Google Doc and Jira comments to the default cni-sysctl-allowlist in the code

Actual results: List of per-interface sysctls declared safe in the default cni-sysctl-allowlist in the code does not match the list in TELCOSTRAT-10 / CNF-3642 / CNF-4093

Expected results: List of per-interface sysctls declared safe in the default cni-sysctl-allowlist in the code should match the list in TELCOSTRAT-10 / CNF-3642 / CNF-4093

Additional info: None

David mentioned this issue here: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1702312628947029

 

duplicated_event_patterns: I think it creates a blackout range (events are OK during time X) and then checks the time range itself, but it doesn't appear to exclude the change in counts.
 

For the pathological-event test calculation, the count of the last event within the allowed range should be subtracted from the count of the first event outside the allowed time range.

 

David has a demonstration of the count here: https://github.com/openshift/origin/pull/28456, but to fix it you have to invert testDuplicatedEvents to iterate through the event registry, not the events.
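A minimal sketch of the intended arithmetic, using hypothetical type and field names rather than origin's actual code: only the growth in count beyond the allowed window should be weighed by the pathological-event check.

package eventsexample

// eventOccurrence is a hypothetical stand-in for an observed event with a
// running repetition count.
type eventOccurrence struct {
	Count int
}

// excessRepetitions subtracts the count of the last occurrence inside the
// allowed (blackout) window from the count of the first occurrence outside it,
// so repetitions that were explicitly allowed do not trip the check.
func excessRepetitions(lastInsideWindow, firstOutsideWindow eventOccurrence) int {
	return firstOutsideWindow.Count - lastInsideWindow.Count
}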

 

 

Description of problem:

With the implementation of bug https://issues.redhat.com/browse/MGMT-14527, we see that the vSphere plugin is degraded and shows a pop-up to fill in the details of the vCenter configuration, which is then stored in cloud-provider-config. There are requirements on how those details should be entered in the pop-up, but there are no details about the format in which the customer should fill them in.

Requirement from the bug

1. The UI should display the format in which the data is to be entered.
2. A warning that if the configuration is saved, a new MachineConfig will be rolled out, which will lead to node reboots.

Version-Release number of selected component (if applicable):

 4.13+

How reproducible:

    Steps to reproduce unavailable

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

vSphere plugin is degraded

Expected results:

Plugin should not be degraded

Additional info:

Ideally this would be seen in a situation where clusters are upgraded from a lower version to a higher version.

Component Readiness has found a potential regression in [sig-apps] Daemon set [Serial] should surge pods onto nodes when spec was updated and update strategy is RollingUpdate [Suite:openshift/conformance/serial] [Suite:k8s].

Probability of significant regression: 98.27%

Sample (being evaluated) Release: 4.16
Start Time: 2024-04-18T00:00:00Z
End Time: 2024-04-24T23:59:59Z
Success Rate: 90.00%
Successes: 27
Failures: 3
Flakes: 0

Base (historical) Release: 4.15
Start Time: 2024-02-01T00:00:00Z
End Time: 2024-02-28T23:59:59Z
Success Rate: 100.00%
Successes: 83
Failures: 0
Flakes: 0

View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=openshift-controller-manager%20%2F%20apps&confidence=95&environment=sdn%20no-upgrade%20amd64%20aws%20serial&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=sdn&pity=5&platform=aws&sampleEndTime=2024-04-24%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-04-18%2000%3A00%3A00&testId=openshift-tests%3Aee8012d039dcb8b357bb3ddb513b54dd&testName=%5Bsig-apps%5D%20Daemon%20set%20%5BSerial%5D%20should%20surge%20pods%20onto%20nodes%20when%20spec%20was%20updated%20and%20update%20strategy%20is%20RollingUpdate%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D%20%5BSuite%3Ak8s%5D&upgrade=no-upgrade&variant=serial

 

Looking at this job as an example, test failed with this error message:

 

 invariants were violated during daemonset update: An old pod with UID f44d840a-4430-4666-addd-cc3fae7a1e8a has been running alongside a newer version for longer than 1m0s { s: "invariants were violated during daemonset update:\nAn old pod with UID f44d840a-4430-4666-addd-cc3fae7a1e8a has been running alongside a newer version for longer than 1m0s", }

 

The last query from log blob:https://prow.ci.openshift.org/bbb02cd3-e004-48be-aef8-8ce6ef7acc47 shows the conflicting pods at 18:46:47.812:

 
Apr 23 18:46:47.812: INFO: Node Version Name UID Deleted Ready
Apr 23 18:46:47.812: INFO: ip-10-0-22-60.us-west-1.compute.internal 1 daemon-set-42bhb f44d840a-4430-4666-addd-cc3fae7a1e8a false true
Apr 23 18:46:47.812: INFO: ip-10-0-22-60.us-west-1.compute.internal 2 daemon-set-c5p92 02f9a3ba-cc0c-4954-bc29-65b4b954b93b false true
Apr 23 18:46:47.812: INFO: ip-10-0-28-200.us-west-1.compute.internal 1 daemon-set-7p7hs 4ba591c7-623a-4397-b99d-3b2616b5a787 false true
Apr 23 18:46:47.812: INFO: ip-10-0-95-178.us-west-1.compute.internal 2 daemon-set-chhhl 53d07493-c0f4-46f1-9365-f67be6ac993b false true

 

Yet if you look at journal from ip-10-0-22-60.us-west-1.compute.internal, it starts deleting pod daemon-set-42bhb at 18:46:48.140434

 

Apr 23 18:46:48.140434 ip-10-0-22-60 kubenswrapper[1432]: I0423 18:46:48.140408    1432 kubelet.go:2445] "SyncLoop DELETE" source="api" pods=["e2e-daemonsets-9316/daemon-set-42bhb"]

 

Should the test be waiting longer, or is there a legit problem with the delay?

 

As the team members have changed, this is a placeholder issue to track OWNERS file changes for different projects, as required by the CI.

https://issues.redhat.com/browse/RHEL-1671 introduces a "dns-changed" event that resolv-prepender should act on. Now, instead of a bunch of "-change", "up", and other events, we have one that clearly indicates that the DNS configuration has changed.

By embedding this into our logic, we will greatly reduce the number of times our scripts are called.

It is important to check when exactly this is going to be shipped so that we synchronize our change with upstream NM.

The following components do not preserve their container resource requests/limits on reconciliation when modified by an external source:

  • catalog-operator
  • olm-operator
  • packageserver
  • cluster-network-operator

 

The original change to add this resource preservation support doesn't appear to have accomplished the desired behavior for these specific components.

Description of problem:

1. The vSphere connection configuration modal stays unresponsive for a long time after the user updates the 'Virtual Machine Folder' value.
2. The user is not able to update the configuration again while the changes are being applied.

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-02-22-021321

How reproducible:

Always    

Steps to Reproduce:

1. setup cluster landed in vSphere
2. try to update 'Virtual Machine Folder' value in vSphere connection configuration modal
3. click 'Save'
    

Actual results:

3. The vSphere connection configuration modal stays in Saving status for a very long time; the user cannot Close or Cancel the changes.

When we check in the backend, the changes are already in place in cm/cloud-provider-config.

The user is also not able to update the configuration again while the changes are being applied.

Expected results:

3. The user should be able to continue updating the values or close the modal. Since the modal is only exposed to the user to update the value, the user does not need to wait until everything is finished; the changes have already taken place in the backend.

Additional info:

 

This is a clone of issue OCPBUGS-38497. The following is the description of the original issue:

This is a clone of issue OCPBUGS-37736. The following is the description of the original issue:

Modify the import to strip or change the bootOptions.efiSecureBootEnabled

https://redhat-internal.slack.com/archives/CLKF3H5RS/p1722368792144319

archive := &importx.ArchiveFlag{Archive: &importx.TapeArchive{Path: cachedImage}}

ovfDescriptor, err := archive.ReadOvf("*.ovf")
if err != nil {
	// Open the corrupt OVA file
	f, ferr := os.Open(cachedImage)
	if ferr != nil {
		err = fmt.Errorf("%s, %w", err.Error(), ferr)
	}
	defer f.Close()

	// Get a sha256 of the corrupt OVA file
	// and the size of the file
	h := sha256.New()
	written, cerr := io.Copy(h, f)
	if cerr != nil {
		err = fmt.Errorf("%s, %w", err.Error(), cerr)
	}
	return fmt.Errorf("ova %s has a sha256 of %x and a size of %d bytes, failed to read the ovf descriptor %w", cachedImage, h.Sum(nil), written, err)
}

ovfEnvelope, err := archive.ReadEnvelope(ovfDescriptor)
if err != nil {
	return fmt.Errorf("failed to parse ovf: %w", err)
}

Description of problem:

Bootstrap process failed because API_URL and API_INT_URL are not resolvable:

Feb 06 06:41:49 yunjiang-dn16d-657jf-bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Feb 06 06:41:49 yunjiang-dn16d-657jf-bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.
Feb 06 06:41:49 yunjiang-dn16d-657jf-bootstrap systemd[1]: bootkube.service: Consumed 1min 457ms CPU time.
Feb 06 06:41:54 yunjiang-dn16d-657jf-bootstrap systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 1.
Feb 06 06:41:54 yunjiang-dn16d-657jf-bootstrap systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Feb 06 06:41:54 yunjiang-dn16d-657jf-bootstrap systemd[1]: bootkube.service: Consumed 1min 457ms CPU time.
Feb 06 06:41:54 yunjiang-dn16d-657jf-bootstrap systemd[1]: Started Bootstrap a Kubernetes cluster.
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Check if API and API-Int URLs are resolvable during bootstrap
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Checking if api.yunjiang-dn16d.qe.gcp.devcluster.openshift.com of type API_URL is resolvable
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Starting stage resolve-api-url
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Unable to resolve API_URL api.yunjiang-dn16d.qe.gcp.devcluster.openshift.com
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Checking if api-int.yunjiang-dn16d.qe.gcp.devcluster.openshift.com of type API_INT_URL is resolvable
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Starting stage resolve-api-int-url
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Unable to resolve API_INT_URL api-int.yunjiang-dn16d.qe.gcp.devcluster.openshift.com
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[8905]: https://localhost:2379 is healthy: successfully committed proposal: took = 7.880477ms
Feb 06 06:41:58 yunjiang-dn16d-657jf-bootstrap bootkube.sh[7781]: Starting cluster-bootstrap...
Feb 06 06:41:59 yunjiang-dn16d-657jf-bootstrap bootkube.sh[8989]: Starting temporary bootstrap control plane...
Feb 06 06:41:59 yunjiang-dn16d-657jf-bootstrap bootkube.sh[8989]: Waiting up to 20m0s for the Kubernetes API
Feb 06 06:42:00 yunjiang-dn16d-657jf-bootstrap bootkube.sh[8989]: API is up

install logs:
...
time="2024-02-06T06:54:28Z" level=debug msg="Unable to connect to the server: dial tcp: lookup api-int.yunjiang-dn16d.qe.gcp.devcluster.openshift.com on 169.254.169.254:53: no such host"
time="2024-02-06T06:54:28Z" level=debug msg="Log bundle written to /var/home/core/log-bundle-20240206065419.tar.gz"
time="2024-02-06T06:54:29Z" level=error msg="Bootstrap failed to complete: timed out waiting for the condition"
time="2024-02-06T06:54:29Z" level=error msg="Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane."
...


    

Version-Release number of selected component (if applicable):

4.16.0-0.nightly-2024-02-05-184957,openshift/machine-config-operator#4165

    

How reproducible:


Always.
    

Steps to Reproduce:

    1. Enable custom DNS on gcp: platform.gcp.userProvisionedDNS:Enabled and featureSet:TechPreviewNoUpgrade
    2. Create cluster
    3.
    

Actual results:

Failed to complete bootstrap process.
    

Expected results:

See description.

    

Additional info:

I believe 4.15 is affected as well once https://github.com/openshift/machine-config-operator/pull/4165 backport to 4.15, currently, it failed at an early phase, see https://issues.redhat.com/browse/OCPBUGS-28969

primary_ipv4_address is deprecated in favor of primary_ip[*].address. Replace it with the new attribute.

Description of problem:

Hypershift management clusters are using a network-load-balancer to route to their own openshift-ingress router pods for cluster ingress.
These NLBs are provisioned by the https://github.com/openshift/cloud-provider-aws.

The cloud-provider-aws uses the cluster-tag on the subnets to select the correct subnets for the NLB and the SecurityGroup adjustments.

On management clusters *all* subnets are tagged with the MC's cluster-id.
This can lead to the cloud-provider-aws possibly selecting the incorrect subnet, because conflicts between multiple subnets in an AZ are broken using lexicographical comparisons: https://github.com/openshift/cloud-provider-aws/blob/master/pkg/providers/v1/aws.go#L3626

This can lead to a situation where a SecurityGroup will only allow ingress from a subnet that is not actually part of the NLB; in this case the TargetGroup will not be able to correctly perform a HealthCheck in that AZ.

In certain cases this can lead to all targets reporting unhealthy as the nodes hosting the ingress pods have the incorrect SecurityGroup rules.

In that case routing to nodes that are part of the target group can select nodes that should not be chosen as they are not ready yet/anymore leading to problems when attempting to access management cluster services (e.g. the console).
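Purely as an illustration of the tie-break described above (this is not the cloud-provider-aws code itself), when more than one tagged subnet exists in the same AZ a lexicographic comparison decides the winner, so an HCP subnet whose ID sorts lower can displace the MC's own subnet:

package subnetexample

import "sort"

// pickSubnetPerAZ sketches how a lexicographic tie-break between same-AZ
// subnets can select an unintended subnet: whichever ID sorts first wins.
func pickSubnetPerAZ(subnetIDsByAZ map[string][]string) map[string]string {
	chosen := map[string]string{}
	for az, ids := range subnetIDsByAZ {
		sort.Strings(ids)
		chosen[az] = ids[0] // the lexicographically smallest ID wins the conflict
	}
	return chosen
}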

Version-Release number of selected component (if applicable):

4.14.z & 4.15.z

How reproducible:

Most MCs that are using NLBs will have some of the SecurityGroups misconfigured.

Steps to Reproduce:

1. Have the cloud-provider-aws update the NLB while there are subnets in an AZ with lexicographically smaller names than the MCs default subnets - this can lead to the other subnets being chosen instead.
2. Check the securitygroups to see if the source CIDRs are incorrect.

Actual results:

SecurityGroups can have incorrect source CIDRs used for the MCs own NLB.

Expected results:

The MC should only tag its own subnets with the cluster-id of the MC, so that subnet selection by the cloud-provider-aws is not affected by the HCP subnets in the same availability zones.

Additional info:

Related OHSS ticket from SREP: https://issues.redhat.com/browse/OSD-20289

For ingress controllers that are exposed via LBs, there are considerations for external and internal publishing scope. Requesting support for providing the ability to specify the LB scope on the HostedCluster at initial create time.