
4.18.0-ec.2

Jump to: Complete Features | Incomplete Features | Complete Epics | Incomplete Epics | Other Complete | Other Incomplete |

Changes from 4.17.4

Note: this page shows the Feature-Based Change Log for a release

Complete Features

These features were completed when this image was assembled

Feature Overview (aka. Goal Summary)

Today we expose two main APIs for HyperShift, namely `HostedCluster` and `NodePool`. We also have metrics to gauge adoption by reporting the # of hosted clusters and nodepools.

But we are still missing other metrics needed to make correct inferences from what we see in the data.

Goals (aka. expected user outcomes)

  • Provide Metrics to highlight # of Nodes per NodePool or # of Nodes per cluster
  • Make sure the error between what appears in CMO via `install_type` and what we report as # Hosted Clusters is minimal.

Use Cases (Optional):

  • Understand product adoption
  • Gauge Health of deployments
  • ...

 

Overview

Today we have hypershift_hostedcluster_nodepools as a metric exposed to provide information on the # of nodepools used per cluster. 

 

Additional NodePools metrics such as hypershift_nodepools_size and hypershift_nodepools_available_replicas are available but not ingested in Telemetry.

In addition to knowing how many nodepools exist per hosted cluster, we would like to expose the size of each nodepool.

 

This will help inform our decision making and provide some insights on how the product is being adopted/used.

Goals

The main goal of this epic is to show the following NodePools metrics on Telemeter, ideally as recording rules: 

  • hypershift_nodepools_size
  • hypershift_nodepools_available_replicas
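
As a rough illustration of the intent (metric names come from this epic; the label set and any final recording-rule names are decided in the linked PRs, so treat them as assumptions), the raw NodePool metrics can already be aggregated on a management cluster before Telemetry ingestion, e.g. via the Thanos querier route:

$ TOKEN=$(oc whoami -t)
$ HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
$ # Sum declared and available replicas per NodePool; label names are illustrative.
$ curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query" \
    --data-urlencode 'query=sum by (namespace, name) (hypershift_nodepools_size)'
$ curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query" \
    --data-urlencode 'query=sum by (namespace, name) (hypershift_nodepools_available_replicas)'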

Requirements

The implementation involves updates to the following GitHub repositories. Similar PRs for reference:
https://github.com/openshift/hypershift/pull/1544
https://github.com/openshift/cluster-monitoring-operator/pull/1710

Feature Overview (aka. Goal Summary)  

Graduate the new PV access mode ReadWriteOncePod to GA.

Such PV/PVC can be used only in a single pod on a single node compared to the traditional ReadWriteOnce access mode, where such a PV/PVC can be used on a single node by many pods.

Goals (aka. expected user outcomes)

The customers can start using the new ReadWriteOncePod access mode.

This new mode allows customers to provision and attach a PV with the guarantee that it cannot be used by another pod on the same node.
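
For example, a claim requesting the new access mode looks like this (a minimal sketch; names and sizes are hypothetical):

$ oc apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-pvc         # hypothetical name
spec:
  accessModes:
  - ReadWriteOncePod              # only one pod, on one node, may use the volume
  resources:
    requests:
      storage: 1Gi
EOF

A second pod referencing the same claim is left unschedulable, which is exactly the guarantee this mode adds over ReadWriteOnce.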

 

Requirements (aka. Acceptance Criteria):

This new mode should support the same operations as regular ReadWriteOnce PVs, therefore it should pass the regression tests. We should also ensure that such a PV can't be accessed by another pod local to the node.

 

Use Cases (Optional):

As a user I want to attach a PV to a pod and ensure that it can't be accessed by another local pod.

Background

We are getting this feature from upstream as GA. We need to test it and fully support it.

Customer Considerations

 

Check that there are no limitations / regressions.

Documentation Considerations

Remove tech preview warning. No additional change.

 

Interoperability Considerations

N/A

Epic Goal

Support the upstream "ReadWriteOncePod access mode" feature in OCP as GA, i.e. test it and have docs for it.

This is a continuation of STOR-1171 (Beta/Tech Preview in 4.14); now we just need to mark it as GA and remove all Tech Preview notes from the docs.

Why is this important?

  • We get this upstream feature through Kubernetes rebase. We should ensure it works well in OCP and we have docs for it.

Upstream links

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. External: the feature is currently scheduled for GA in Kubernetes 1.29, i.e. OCP 4.16, but it may change before Kubernetes 1.29 GA.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)

 

Currently the maximum number of snapshots per volume in vSphere CSI is set to 3 and cannot be configured. Customers find this default limit too low and are asking us to make this setting configurable.

The maximum number of snapshots is 32 per volume.

Goals (aka. expected user outcomes)

Customers can override the default (three) value and set it to a custom value.

Make sure we document (or link to) the VMware recommendations in terms of performance.

https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-E0B41C69-7EEB-450F-A73D-5FD2FF39E891.html#GUID-7BA0CDAE-E031-470E-A685-60C82DAE36D2__GUID-D9A97A90-2777-46EA-94EB-F04A27FBB76D

 

https://kb.vmware.com/s/article/1025279

Requirements (aka. Acceptance Criteria):

The setting can be easily configured by the OCP admin, and the configuration is automatically applied. Test that the setting is indeed applied and that the maximum number of snapshots per volume is indeed changed.

No change in the default

Use Cases (Optional):

As an OCP admin I would like to change the maximum number of snapshots per volume.
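
A sketch of the day-2 change, assuming the ClusterCSIDriver extension proposed in STOR-1803 (the field name below reflects that proposal and should be treated as an assumption until the API merges):

$ oc patch clustercsidriver csi.vsphere.vmware.com --type=merge -p '
spec:
  driverConfig:
    driverType: vSphere
    vSphere:
      globalMaxSnapshotsPerBlockVolume: 10   # default 3, hard limit 32
'

The operator then reconciles the driver deployment with the new snapshot limit.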

Out of Scope

Anything outside of 

https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-E0B41C69-7EEB-450F-A73D-5FD2FF39E891.html#GUID-7BA0CDAE-E031-470E-A685-60C82DAE36D2__GUID-D9A97A90-2777-46EA-94EB-F04A27FBB76D

Background

The default value can't be overridden; reconciliation prevents it.

Customer Considerations

Make sure the customers understand the impact of increasing the number of snapshots per volume.

https://kb.vmware.com/s/article/1025279

Documentation Considerations

Document how to change the value as well as a link to the best practice. Mention that there is a 32 hard limit. Document other limitations if any.

Interoperability Considerations

N/A

Epic Goal*

The goal of this epic is to allow admins to configure the maximum number of snapshots per volume in vSphere CSI and find a way to add such an extension to the OCP API.

Possible future candidates:

  • configure EFS volume size monitoring (via driver cmdline arg.) - STOR-1422
  • configure OpenStack topology - RFE-11

 
Why is this important? (mandatory)

Currently the maximum number of snapshots per volume in vSphere CSI is set to 3 and cannot be configured. Customers find this default limit too low and are asking us to make this setting configurable.

The maximum number of snapshots is 32 per volume.

https://kb.vmware.com/s/article/1025279

https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-E0B41C69-7EEB-450F-A73D-5FD2FF39E891.html#GUID-7BA0CDAE-E031-470E-A685-60C82DAE36D2__GUID-D9A97A90-2777-46EA-94EB-F04A27FBB76D

 

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1. As an admin I would like to configure the maximum number of snapshots per volume.
  2. As a user I would like to create more than 3 snapshots per volume

 
Dependencies (internal and external) (mandatory)

1) Write OpenShift enhancement (STOR-1759)

2) Extend ClusterCSIDriver API (TechPreview) (STOR-1803)

3) Update vSphere operator to use the new snapshot options (STOR-1804)

4) Promote feature from Tech Preview to Accessible-by-default (STOR-1839)

  • prerequisite: add e2e test and demonstrate stability in CI (STOR-1838)

 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation - STOR
  • QE - STOR
  • PX - Enablement
  • Others -

Acceptance Criteria (optional)

Configure the maximum number of snapshots to a higher value. Check that the config has been updated and verify that the maximum number of snapshots per volume maps to the new setting value.

Drawbacks or Risk (optional)

Setting this option to a high value can introduce performance issues. This needs to be documented.

https://kb.vmware.com/s/article/1025279

 

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview (aka. Goal Summary)

Support the SMB CSI driver through an OLM operator as tech preview. The SMB CSI driver allows OCP to consume SMB/CIFS storage with a dynamic CSI driver. This enables customers to leverage their existing storage infrastructure with either a Samba or a Microsoft environment.

https://github.com/kubernetes-csi/csi-driver-smb

Goals (aka. expected user outcomes)

Customers can start testing connecting OCP to their backends exposing CIFS. This allows consuming net-new volumes or existing data produced outside OCP.

Requirements (aka. Acceptance Criteria):

The driver already exists and is under the storage SIG umbrella. We need to make sure the driver meets OCP quality requirements and, if so, develop an operator to deploy and maintain it.

Review and clearly define all driver limitations and corner cases.

Use Cases (Optional):

  • As an OCP admin, I want OCP to consume storage exposed via SMB/CIFS to capitalise on my existing infrastructure.
  • As a user, I want to consume external data stored on a SMB/CIFS backend.

Questions to Answer (Optional):

Review the different authentication methods.

Out of Scope

Windows containers support.

Only the storage class login/password authentication method is in scope; a sketch follows below. Other methods can be reviewed and considered for GA.
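
A minimal sketch of the storage class login/password method, following the upstream csi-driver-smb conventions (server, share, secret, and namespace names are illustrative):

$ oc create secret generic smbcreds -n openshift-cluster-csi-drivers \
    --from-literal=username='smb-user' --from-literal=password='smb-pass'
$ oc apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: smb
provisioner: smb.csi.k8s.io
parameters:
  source: //smb-server.example.com/share
  csi.storage.k8s.io/provisioner-secret-name: smbcreds
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-cluster-csi-drivers
  csi.storage.k8s.io/node-stage-secret-name: smbcreds
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-cluster-csi-drivers
mountOptions:
  - dir_mode=0777
  - file_mode=0777
EOF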

Background

Customers are expecting to consume storage, and possibly existing data, via SMB/CIFS. As of today, vendors' driver support for CIFS is quite limited, whereas this protocol is widely used on premises, especially with MS/AD customers.

Customer Considerations

Need to understand what customers expect in terms of authentication.

How to extend this feature to windows containers.

Documentation Considerations

Document the operator and driver installation, usage capabilities and limitations.

Interoperability Considerations

Future: How to manage interoperability with windows containers (not for TP)

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

The Azure File CSI driver currently lacks cloning and snapshot restore features. The goal of this feature is to support the cloning feature as Technology Preview. This will help support snapshot restore in a future release.

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

As a user I want to easily clone an Azure File volume by creating a new PVC with spec.DataSource referencing the origin volume, as sketched below.
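
A minimal sketch, assuming an installed Azure File CSI storage class named azurefile-csi and an existing source claim (names and sizes are hypothetical):

$ oc apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc                 # hypothetical name
spec:
  storageClassName: azurefile-csi
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 100Gi               # must be at least the size of the source PVC
  dataSource:
    kind: PersistentVolumeClaim
    name: source-pvc               # existing Azure File PVC to clone
EOF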

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

This feature only applies to OCP running on Azure / ARO and File CSI.

The usual CSI cloning CI must pass.

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations: List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both: both
Classic (standalone cluster): yes
Hosted control planes: yes
Multi node, Compact (three node), or Single node (SNO), or all: all, although SNO is rare on Azure
Connected / Restricted Network: both
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x): x86
Operator compatibility: Azure File CSI operator
Backport needed (list applicable versions): No
UI need (e.g. OpenShift Console, dynamic plugin, OCM): No
Other (please specify): ship downstream images built from the forked azcopy

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

Restoring snapshots is out of scope for now.

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Update the CSI capability matrix and any language that mentions that Azure File CSI does not support cloning.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Not impact but benefit Azure / ARO customers.

Epic Goal*

Azure File added support for cloning volumes, which relies on the azcopy command upstream. We need to fork azcopy so we can build and ship downstream images from the forked azcopy. The AWS driver does the same with efs-utils.

Upstream repo: https://github.com/Azure/azure-storage-azcopy

NOTE: using snapshots as a source is currently not supported: https://github.com/kubernetes-sigs/azurefile-csi-driver/blob/7591a06f5f209e4ef780259c1631608b333f2c20/pkg/azurefile/controllerserver.go#L732 

 

Why is this important? (mandatory)

This is required for adding Azure File cloning feature support.

 

Scenarios (mandatory) 

1. As a user I want to easily clone an Azure File volume by creating a new PVC with spec.DataSource referencing the origin volume.

 
Dependencies (internal and external) (mandatory)

1) Write OpenShift enhancement (STOR-1757)

2) Fork upstream repo (STOR-1716)

3) Add ART definition for OCP Component (STOR-1755)

  • prerequisite: Onboard image with DPTP/CI (STOR-1752)
  • prerequisite: Perform a threat model assessment (STOR-1753)
  • prerequisite: Establish common understanding with Product Management / Docs / QE / Product Support (STOR-1753)
  • requirement: ProdSec Review (STOR-1756)

4) Use the new image as base image for Azure File driver (STOR-1794)

5) Ensure e2e cloning tests are in CI (STOR-1818)

 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - yes
  • Documentation - yes
  • QE - yes
  • PX - ???
  • Others - ART

 

Acceptance Criteria (optional)

The downstream Azure File driver image must include azcopy, and the cloning feature must be tested.
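
A quick way to verify the first criterion on a running cluster (the deployment, namespace, and container names are assumptions based on the current operator layout):

$ oc -n openshift-cluster-csi-drivers exec deploy/azure-file-csi-driver-controller \
    -c csi-driver -- azcopy --version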

 

Drawbacks or Risk (optional)

No risks detected so far.

 

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be "Release Pending" 

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

  • As an Alibaba Cloud customer, I want to create an OpenShift cluster with the Assisted Installer using the agnostic platform (platform=none) for connected deployments.

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

In order to remove IPI/UPI support for Alibaba Cloud in OpenShift (currently Tech Preview, see also OCPSTRAT-1042), we need to provide an alternate method for Alibaba Cloud customers to spin up an OpenShift cluster. To that end, we want customers to use Assisted Installer with platform=none (and later platform=external) to bring up their OpenShift clusters.

  • Stretch goal to do this with platform=external.
  • Note: We can TP with platform=none or platform=external, but for GA it must be with platform=external.
  • Document how to use this installation method

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

  • Hybrid Cloud Console updated to reflect Alibaba Cloud installation with Assisted Installer (Tech Preview).
  • Documentation that tells customer how to use this install method
  • CI for this install method is optional for OCP 4.16 (and will be addressed in the future)

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations: List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both: Self-managed
Classic (standalone cluster): Classic
Hosted control planes: N/A
Multi node, Compact (three node), or Single node (SNO), or all: Multi-node
Connected / Restricted Network: Connected for OCP 4.16 (Future: restricted)
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x): x86_x64
Operator compatibility: This should be the same for any operator on platform=none
Backport needed (list applicable versions): OpenShift 4.16 onwards
UI need (e.g. OpenShift Console, dynamic plugin, OCM): Hybrid Cloud Console changes needed
Other (please specify):

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

  • Restricted network deployments, i.e. As an Alibaba Cloud customer, I want to create an OpenShift cluster with the Agent-based Installer using the agnostic platform (platform=none) for restricted network deployments.

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

For OpenShift 4.16, we want to remove IPI support (currently Tech Preview) for Alibaba Cloud (OCPSTRAT-1042). Instead, we want to offer Assisted Installer support (Tech Preview) with the agnostic platform for Alibaba Cloud in OpenShift 4.16 (OCPSTRAT-1149).

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Previous UPI-based installation doc: Alibaba Cloud Red Hat OpenShift Container Platform 4.6 Deployment Guide

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

As an Alibaba Cloud customer, I want to create an OpenShift cluster with the Assisted Installer using the agnostic platform (platform=none) for connected deployments.

Epic Goal

  • Start with the original Alibaba Cloud Red Hat OpenShift Container Platform 4.6 Deployment Guide and adjust it to use the Assisted Installer with platform=none.
  • Document the steps for a successful installation using that method and feed the docs team with that information.
  • Narrow down the scope to the minimum viable to achieve Tech Preview in 4.16. We'll handle platform=external and better tools and automation in future releases.
  • Engage with the Assisted Installer team and the Solutions Architect / PTAM of Alibaba for support.
  • Provide frequent updates on work progress (at least weekly).
  • Assist QE in testing.

Why is this important?

  • In order to remove IPI/UPI support for Alibaba Cloud in OpenShift, we need to provide an alternate method for Alibaba Cloud customers to spin up an OpenShift cluster. To that end, we want customers to use Assisted Installer with platform=none (and in future releases platform=external) to bring up their OpenShift clusters.

Acceptance Criteria

  • Reproducible, stable, and documented installation steps using the Assisted Installer with platform=none provided to the docs team and QE.

Out of scope

  1. CI

Previous Work (Optional):

  1. https://www.alibabacloud.com/blog/alibaba-cloud-red-hat-openshift-container-platform-4-6-deployment-guide_597599
  2. https://github.com/kwoodson/terraform-openshift-alibaba for reference, it may help
  3. Alibaba IPI for reference, it may help
  4. Using the Assisted Installer to install a cluster on Oracle Cloud Infrastructure for reference


USER STORY:


As a [type of user], I want [an action] so that [a benefit/a value].

DESCRIPTION:


Required:

...

Nice to have:

...

ACCEPTANCE CRITERIA:


ENGINEERING DETAILS:


Feature Overview (aka. Goal Summary)  

Enable Hosted Control Planes guest clusters to support up to 500 worker nodes. This enables customers to have clusters with a large number of worker nodes.

Goals (aka. expected user outcomes)

Max cluster size of 250+ worker nodes (mainly about the control plane). See XCMSTRAT-371 for additional information.
Service components should not be overwhelmed by additional customer workloads; they should use larger cloud instances when the worker node count exceeds a threshold and smaller cloud instances when it is below it.

Requirements (aka. Acceptance Criteria):

 

Deployment considerations: List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both: Managed
Classic (standalone cluster): N/A
Hosted control planes: Yes
Multi node, Compact (three node), or Single node (SNO), or all: N/A
Connected / Restricted Network: Connected
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x): x86_64, ARM
Operator compatibility: N/A
Backport needed (list applicable versions): N/A
UI need (e.g. OpenShift Console, dynamic plugin, OCM): N/A
Other (please specify):

Questions to Answer (Optional):

Check OCM and CAPI requirements for exposing a larger worker node count.

 

Documentation:

  • Design document detailing the autoscaling mechanism and configuration options
  • User documentation explaining how to configure and use the autoscaling feature.

Acceptance Criteria

  • Configure max-node size from CAPI
  • Management cluster nodes automatically scale up and down based on the hosted cluster's size.
  • Scaling occurs without manual intervention.
  • A set of "warm" nodes is maintained for immediate hosted cluster creation.
  • Resizing nodes should not cause significant downtime for the control plane.
  • Scaling operations should be efficient and have minimal impact on cluster performance.

 

Goal

  • Dynamically scale the serving components of control planes

Why is this important?

  • To be able to have clusters with a large number of worker nodes

Scenarios

  1. When a hosted cluster's number of worker nodes increases past a threshold X, the serving components are moved to larger cloud instances
  2. When a hosted cluster's number of worker nodes falls below the threshold, the serving components are moved to smaller cloud instances.

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a service provider, I want to be able to:

  • Configure priority and fairness settings per HostedCluster size and force these settings to be applied on the resulting hosted cluster.

so that I can achieve

  • Prevent users of the hosted cluster from bringing down the HostedCluster kube-apiserver with their workloads.

Acceptance Criteria:

Description of criteria:

  • HostedCluster priority and fairness settings should be configurable per cluster size in the ClusterSizingConfiguration CR
  • Any changes in priority and fairness inside the HostedCluster should be prevented and overridden by whatever is configured on the provider side.
  • With the proper settings, heavy use of the API from user workloads should not result in the KAS pod getting OOMKilled due to lack of resources.

This does not require a design proposal.
This does not require a feature gate.
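
For reference, this is the kind of upstream flow-control object such size-based settings would tune on the hosted cluster's kube-apiserver (a generic Kubernetes API Priority and Fairness example, not the HyperShift or ClusterSizingConfiguration API itself; the name and numbers are illustrative):

$ oc apply -f - <<EOF
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: workload-low-small-cluster   # illustrative name
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 20     # fewer shares for a small HostedCluster size
    limitResponse:
      type: Queue
      queuing:
        queues: 64
        handSize: 6
        queueLengthLimit: 50
EOF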

Feature Overview (aka. Goal Summary)  

CRI-O wipe is an existing feature in OpenShift. When a node reboots, CRI-O wipe clears the node of all images so that the node boots clean. When the node comes back up, it needs access to the image registry to pull all images again, which takes time. For telco and edge situations, the node might not have access to an image registry, and it takes time to come up.

The goal of this feature is to adjust CRI-O wipe to wipe only images that have been corrupted because of the sudden reboot, not all images.

Feature Overview

Phase 2 of the enclave support for oc-mirror with the following goals

  • Incorporate feedback from the field from 4.16 TP
  • Performance improvements

Goals

  • Update the batch processing that uses `containers/image` to do the copying, so that the number of blobs (layers) to download can be set
  • Introduce a worker for concurrency that can also update the number of images to download to improve overall performance (these values can be tweaked via CLI flags). 
  • Collaborate with the UX team to improve the console output while pulling or pushing images. 

Feature Overview

Adding nodes to on-prem clusters in OpenShift is in general a complex task. We have numerous methods, and the field keeps adding automation around these methods with a variety of solutions, sometimes unsupported (see "Why is this important" below). Making cluster expansion easier will let users add nodes often and fast, leading to a much improved UX.

This feature adds nodes to any on-prem clusters, regardless of their installation method (UPI, IPI, Assisted, Agent), by booting an ISO image that will add the node to the cluster specified by the user, regardless of how the cluster was installed.

Goals and requirements

  • Users can install a host on day 2 using a bootable image to an OpenShift cluster.
  • At least the baremetal, vSphere, none and Nutanix platforms are supported
  • Clusters installed with any installation method can be expanded with the image
  • Clusters don't need to run any special agent to allow the new nodes to join.

How this workflow could look

1. Create image:

$ export KUBECONFIG=kubeconfig-of-target-cluster
$ oc adm node-image -o agent.iso --network-data=worker-n.nmstate --role=worker

2. Boot image

3. Check progress

$ oc adm add-node 

Consolidate options

An important goal of this feature is to unify and eliminate some of the existing options to add nodes, aiming to provide a much simpler experience (see "Why is this important" below). We have official and field-documented ways to do this that could be removed once this feature is in place, simplifying the experience, our docs and the maintenance of said official paths:

  • UPI: Adding RHCOS worker nodes to a user-provisioned infrastructure cluster
    • This feature will replace the need to use this method for the majority of UPI clusters. The current UPI method consists of many manual steps. The new method would replace it with a couple of commands and apply to probably more than 90% of UPI clusters.
  • Field-documented methods and asks
  • IPI:
    • There are instances where adding a node to a bare metal IPI-deployed cluster can't be done via its BMC. This new feature, while not replacing the day-2 IPI workflow, solves the problem for this use case.
  • MCE: Scaling hosts to an infrastructure environment
    • This method is the most time-consuming and in many cases overkill, but currently, along with the UPI method, it is one of the two options we can give to users.
    • We shouldn't need to ask users to install and configure the MCE operator and its infrastructure for single clusters, as it becomes a project even larger than UPI's method; we should save this for when there's more than one cluster to manage.

With this proposed workflow we eliminate the need to use the UPI method in the vast majority of cases. We also eliminate the field-documented methods that keep popping up trying to solve this in multiple formats, and the need to recommend using MCE to all on-prem users, and finally we add a simpler option for IPI-deployed clusters.

In addition, all the built-in validations in the assisted service would be run, improving the installation success rate and overall UX.

This work would have an initial impact on bare metal, vSphere, Nutanix and platform-agnostic clusters, regardless of how they were installed.

Why is this important

This feature is essential for several reasons. Firstly, it enables easy day2 installation without burdening the user with additional technical knowledge. This simplifies the process of scaling the cluster resources with new nodes, which today is overly complex and presents multiple options (https://docs.openshift.com/container-platform/4.13/post_installation_configuration/cluster-tasks.html#adding-worker-nodes_post-install-cluster-tasks).

Secondly, it establishes a unified experience for expanding clusters, regardless of their installation method. This streamlines the deployment process and enhances user convenience.

Another advantage is the elimination of the requirement to install the Multicluster Engine and Infrastructure Operator, which, besides demanding additional system resources, are overkill for use cases where the user simply wants to add nodes to their existing cluster but isn't managing multiple clusters yet. This results in a more efficient and lightweight cluster scaling experience.

Additionally, in the case of IPI-deployed bare metal clusters, this feature eradicates the need for nodes to have a Baseboard Management Controller (BMC) available, simplifying the expansion of bare metal clusters.

Lastly, this problem is often brought up in the field, where different custom solutions have been put in place by Red Hatters working with customers trying to solve the problem with custom automation, adding to inconsistent processes for scaling clusters.

Oracle Cloud Infrastructure

This feature will solve the problem of cluster expansion for OCI. OCI doesn't have MAPI, and CAPI isn't in the mid-term plans. Mitsubishi shared their feedback, making solving the lack of cluster expansion a requirement for Red Hat and Oracle.

Existing work

We already have the basic technologies to do this with the assisted-service and the agent-based installer, which already do this work for new clusters, and from which we expect to leverage the foundations for this feature.

Day 2 node addition with agent image.

Yet Another Day 2 Node Addition Commands Proposal

Enable day2 add node using agent-install: AGENT-682

 

Epic Goal

  • Cleanup/carryover work from AGENT-682 for the GA release

Why is this important?

  • Address all the required elements for the GA, such as FIPS compliance. This will allow a smoother integration of the node-joiner into the oc tool, as planned in OCPSTRAT-784

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

  1. None

Previous Work (Optional):

  1. https://issues.redhat.com/browse/AGENT-682

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Add an integration test to verify that the add-nodes command correctly generates the ISO.

Review the proper usage and download of the envtest-related binaries (api-server and etcd).
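
A sketch of the usual way those binaries are fetched for integration tests, using controller-runtime's setup-envtest helper (the Kubernetes version and test name shown are illustrative):

$ go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest
$ # Download the envtest binaries (api-server, etcd) and export their location for the tests.
$ export KUBEBUILDER_ASSETS="$(setup-envtest use 1.29.x -p path)"
$ go test ./... -run TestAddNodesISO    # hypothetical test name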

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer - specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision GCP infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

  • The GCP IPI Installer no longer contains or uses Terraform.
  • The new provider should aim to provide the same results and have parity with the existing GCP Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.


Epic Goal

  • Provision GCP infrastructure without the use of Terraform

Why is this important?

  • Removing Terraform from Installer

Scenarios

  1. The new provider should aim to provide the same results as the existing GCP Terraform provider

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of problem:

After a successful IPI or UPI cluster installation using minimum permissions, destroying the cluster keeps reporting the error "failed to list target tcp proxies: googleapi: Error 403: Required 'compute.regionTargetTcpProxies.list' permission" unexpectedly.

Version-Release number of selected component (if applicable):

    4.17.0-0.nightly-2024-09-01-175607

How reproducible:

    Always

Steps to Reproduce:

    1. try IPI or UPI installation using minimum permissions, and make sure it succeeds
    2. destroy the cluster using the same GCP credentials    

Actual results:

It keeps reporting the errors below until timeout.

08-27 14:51:40.508  level=debug msg=Target TCP Proxies: failed to list target tcp proxies: googleapi: Error 403: Required 'compute.regionTargetTcpProxies.list' permission for 'projects/openshift-qe', forbidden
...output omitted...
08-27 15:08:18.801  level=debug msg=Target TCP Proxies: failed to list target tcp proxies: googleapi: Error 403: Required 'compute.regionTargetTcpProxies.list' permission for 'projects/openshift-qe', forbidden

Expected results:

It should not try to list regional target TCP proxies, because the CAPI installation only creates a global target TCP proxy. And the service account given to the installer already has the required compute.targetTcpProxies permissions (see [1] and [2]).

Additional info:

FYI the latest IPI Prow CI test was about 19 days ago, with no such issue; see https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-gcp-ipi-mini-perm-custom-type-f28/1823483536926052352

Required GCP permissions for installer-provisioned infrastructure https://docs.openshift.com/container-platform/4.16/installing/installing_gcp/installing-gcp-account.html#minimum-required-permissions-ipi-gcp_installing-gcp-account

Required GCP permissions for user-provisioned infrastructure https://docs.openshift.com/container-platform/4.16/installing/installing_gcp/installing-gcp-user-infra.html#minimum-required-permissions-upi-gcp_installing-gcp-user-infra

Description of problem:

Shared VPC installation using a service account that has all required permissions failed because the ingress cluster operator degraded, reporting the error "error getting load balancer's firewall: googleapi: Error 403: Required 'compute.firewalls.get' permission for 'projects/openshift-qe-shared-vpc/global/firewalls/k8s-fw-a5b1f420669b3474d959cff80e8452dc'".

Version-Release number of selected component (if applicable):

    4.17.0-0.nightly-multi-2024-08-07-221959

How reproducible:

    Always

Steps to Reproduce:

1. "create install-config", then insert the interested settings (see [1])
2. "create cluster" (see [2])

Actual results:

Installation failed because the ingress cluster operator degraded (see [2] and [3]).

$ oc get co ingress
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress             False       True          True       113m    The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error getting load balancer's firewall: googleapi: Error 403: Required 'compute.firewalls.get' permission for 'projects/openshift-qe-shared-vpc/global/firewalls/k8s-fw-a5b1f420669b3474d959cff80e8452dc', forbidden...
$ 

In fact, the mentioned k8s firewall rule doesn't exist in the host project (see [4]), and the given service account does have enough permissions (see [6]).

Expected results:

    Installation succeeds, and all cluster operators are healthy. 

Additional info:

    

Description of problem:

Installing into a Shared VPC gets stuck waiting for the network infrastructure to become ready.

Version-Release number of selected component (if applicable):

4.17.0-0.nightly-2024-06-10-225505

How reproducible:

Always

Steps to Reproduce:

1. "create install-config" and then insert Shared VPC settings (see [1])
2. activate the service account which has the minimum permissions in the host project (see [2])
3. "create cluster"

FYI The GCP project "openshift-qe" is the service project, and the GCP project "openshift-qe-shared-vpc" is the host project. 

Actual results:

1. The installation gets stuck waiting for the network infrastructure to become ready, until Ctrl+C is pressed.
2. Two firewall rules are created in the service project unexpectedly (see [3]).

Expected results:

The installation should succeed, and no firewall rules should be created in either the service project or the host project.

Additional info:

 

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

Allow customers to enable EFS CSI usage metrics.

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

OCP already supports exposing CSI usage metrics; however, the EFS metrics are not enabled by default. The goal of this feature is to allow customers to optionally turn on EFS CSI usage metrics in order to see them in the OCP console.

The EFS metrics are not enabled by default for a good reason, as they can potentially impact performance. They are disabled in OCP because the CSI driver would walk through the whole volume, and that can be very slow on large volumes. For this reason, the default will remain the same (no metrics); customers would need to explicitly opt in.

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

Clear procedure on how to enable it as a day 2 operation. The default remains no metrics. Once enabled, the metrics should be available for visualisation.

 

We should also have a way to disable metrics.
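
A sketch of what the day-2 opt-in could look like, assuming it is exposed through the ClusterCSIDriver object for the EFS driver (the field names below are illustrative of that approach, not a confirmed API):

$ oc patch clustercsidriver efs.csi.aws.com --type=merge -p '
spec:
  driverConfig:
    driverType: AWS
    aws:
      efsVolumeMetrics:
        state: RecursiveWalk     # opt in; metrics are gathered by walking the volume
'

Disabling would be the reverse operation, e.g. setting the state back to Disabled.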

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations: List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both: both
Classic (standalone cluster): yes
Hosted control planes: yes
Multi node, Compact (three node), or Single node (SNO), or all: AWS only
Connected / Restricted Network: both
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x): all AWS/EFS supported
Operator compatibility: EFS CSI operator
Backport needed (list applicable versions): No
UI need (e.g. OpenShift Console, dynamic plugin, OCM): Should appear in OCP UI automatically
Other (please specify): OCP on AWS only

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

As an OCP user I want to be able to visualise the EFS CSI metrics.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

Additional metrics

Enabling metrics by default.

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

Customer request as per 

https://issues.redhat.com/browse/RFE-3290

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

We need to be extra clear on the potential performance impact

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Document how to enable CSI metrics + warning about the potential performance impact.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

It can benefit any cluster on AWS using EFS CSI including ROSA

Epic Goal*

The goal of this epic is to provide a way for admins to turn on EFS CSI usage metrics. Since this could impact performance (the CSI driver would walk through the whole volume), this option will not be enabled by default; admins will need to explicitly opt in.

 
Why is this important? (mandatory)

Turning on EFS metrics allows users to monitor how much EFS space is being used by OCP.

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1. As an admin I would like to turn on EFS CSI metrics 
  2. As an admin I would like to visualise how much EFS space is used by OCP.

 
Dependencies (internal and external) (mandatory)

None

Contributing Teams(and contacts) (mandatory) 

  • Development - STOR
  • Documentation - STOR
  • QE - STOR
  • PX - Yes, knowledge transfer
  • Others -

Acceptance Criteria (optional)

Enable CSI metrics via the operator - ensure the driver is started with the proper cmdline options. Verify that the metrics are sent and exposed to the users.

Drawbacks or Risk (optional)

Metrics are calculated by walking through the whole volume, which can impact performance. For this reason, enabling CSI metrics will need an explicit opt-in from the admin. This risk needs to be explicitly documented.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

AWS CAPI implementation supports "Tenancy" configuration option: https://pkg.go.dev/sigs.k8s.io/cluster-api-provider-aws@v1.5.0/api/v1beta1#AWSMachineSpec

This option corresponds to functionality OCP currently exposes through MAPI:

This option is currently in use by existing ROSA customers, and will need to be exposed in HyperShift NodePools

User Story:

As a (user persona), I want to be able to:

  • Set Tenancy options through the NodePool API.

so that I can achieve

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This does not require a design proposal.
This requires a feature gate.

User Story:

As a (user persona), I want to be able to:

  • Set Tenancy options through the NodePool API.

so that I can achieve

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This does not require a design proposal.
This requires a feature gate.

Wrap the NodePool tenancy API field in a struct so that placement options are grouped and new ones can easily be added to the API in the future (see the sketch below).
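
A sketch of what setting tenancy through the NodePool API could look like once grouped under a placement struct (the field path is illustrative, pending the API review described above):

$ oc -n clusters patch nodepool example-nodepool --type=merge -p '
spec:
  platform:
    aws:
      placement:
        tenancy: dedicated    # e.g. default, dedicated, or host
'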

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

Introduce snapshots support for Azure File as Tech Preview

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

After introducing cloning support in 4.17, the goal of this epic is to add the last remaining piece: snapshot support as Tech Preview.

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

Should pass all the regular CSI snapshot tests. All failing or known issues should be documented in the RN. Since this feature is TP, we can still introduce it with known issues.

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations: List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both: both
Classic (standalone cluster): yes
Hosted control planes: yes
Multi node, Compact (three node), or Single node (SNO), or all: all with Azure
Connected / Restricted Network: all
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x): all
Operator compatibility: Azure File CSI
Backport needed (list applicable versions):
UI need (e.g. OpenShift Console, dynamic plugin, OCM): Already covered
Other (please specify):

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

As an OCP on Azure user I want to perform snapshots of my PVC and be able to restore them as a new PVC.
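
A minimal sketch of that flow, assuming an Azure File VolumeSnapshotClass is installed (class, claim, and snapshot names are hypothetical):

$ oc apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: azurefile-snap
spec:
  volumeSnapshotClassName: csi-azurefile-vsc   # illustrative class name
  source:
    persistentVolumeClaimName: my-azurefile-pvc
EOF
$ oc apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-from-snap
spec:
  storageClassName: azurefile-csi
  accessModes:
  - ReadWriteMany
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: azurefile-snap
  resources:
    requests:
      storage: 100Gi                           # at least the size of the snapshotted volume
EOF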

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

Are there any known issues? If so, they should be documented.

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

N/A

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

We have support for CSI snapshots with other cloud providers; we need to align capabilities in Azure with their File CSI. Upstream support has lagged.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

User experience should be the same as other CSI drivers.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Add snapshot support in the CSI driver table, if there is any specific information to add, include it in the Azure File CSI driver doc. Any known issue should be documented in the RN.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Can be leveraged by ARO or OSD on Azure.

Epic Goal*

Add support for snapshots in Azure File.

 

Why is this important? (mandatory)

We should track upstream issues and ensure enablement in OpenShift. Snapshots are a standard CSI feature; the only reason we did not support them until now was the lack of upstream support for snapshot restoration.

The snapshot restore feature was added recently in upstream driver 1.30.3, which we rebased to in 4.17 - https://github.com/kubernetes-sigs/azurefile-csi-driver/pull/1904

Furthermore, we already included the azcopy CLI, which is a dependency of cloning (and snapshots). Enabling snapshots in 4.17 is therefore just a matter of adding a sidecar, a VolumeSnapshotClass, and RBAC in csi-operator, which is cheap compared to the gain.

However, we've observed a few issues with cloning that might need further fixes before it can graduate to GA, so we intend to release the cloning feature as Tech Preview in 4.17. Since snapshots are implemented with azcopy too, we expect similar issues and suggest releasing the snapshot feature as Tech Preview first in 4.17 as well.

 
Scenarios (mandatory) 

Users should be able to create a snapshot and restore PVC from snapshots.

 
Dependencies (internal and external) (mandatory)

azcopy - already added in scope of cloning epic

upstream driver support for snapshot restore - already added via 4.17 rebase

 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview

Enable sharing ConfigMap and Secret across namespaces

Requirements

Requirement Notes isMvp?
Secrets and ConfigMaps can get shared across namespaces   YES

Questions to answer…

NA

Out of Scope

NA

Background, and strategic fit

Consumption of RHEL entitlements has been a challenge on OCP 4 since it moved to a cluster-based entitlement model compared to the node-based (RHEL subscription manager) entitlement model. In order to provide a sufficiently similar experience to OCP 3, the entitlement certificates that are made available on the cluster (OCPBU-93) should be shared across namespaces in order to prevent the need for cluster admins to copy these entitlements into each namespace, which leads to additional operational challenges for updating and refreshing them.

Documentation Considerations

Questions to be addressed:
 * What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
 * Does this feature have doc impact?
 * New Content, Updates to existing content, Release Note, or No Doc Impact
 * If unsure and no Technical Writer is available, please contact Content Strategy.
 * What concepts do customers need to understand to be successful in [action]?
 * How do we expect customers will use the feature? For what purpose(s)?
 * What reference material might a customer want/need to complete [action]?
 * Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
 * What is the doc impact (New Content, Updates to existing content, or Release Note)?

Epic Goal*

Remove the Shared Resource CSI Driver as a tech preview feature.
 
Why is this important? (mandatory)

Shared Resources was originally introduced as a tech preview feature in OpenShift Container Platform. After extensive review, we have decided to GA this component through the Builds for OpenShift layered product.

Expected GA will be alongside OpenShift 4.16. Therefore, it is safe to remove it in OpenShift 4.17.

 
Scenarios (mandatory)

  1. Accessing RHEL content in builds/workloads
  2. Sharing other information across namespaces in the cluster (ex: OpenShift pull secret); see the sketch below
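
As a rough sketch of the second scenario, assuming the v1alpha1 SharedSecret API and the csi.sharedresource.openshift.io driver exposed by the Shared Resources feature (resource names and the consuming namespace are hypothetical, and the consuming service account would typically also need RBAC permission to use the SharedSecret):

apiVersion: sharedresource.openshift.io/v1alpha1
kind: SharedSecret
metadata:
  name: shared-pull-secret            # hypothetical name
spec:
  secretRef:
    name: pull-secret
    namespace: openshift-config
---
# A consuming pod mounts the shared Secret through the CSI driver
apiVersion: v1
kind: Pod
metadata:
  name: consumer                      # hypothetical
  namespace: my-project               # hypothetical
spec:
  containers:
  - name: main
    image: registry.access.redhat.com/ubi9/ubi
    command: ["sleep", "3600"]
    volumeMounts:
    - name: pull-secret-volume
      mountPath: /etc/shared-pull-secret
      readOnly: true
  volumes:
  - name: pull-secret-volume
    csi:
      driver: csi.sharedresource.openshift.io
      readOnly: true
      volumeAttributes:
        sharedSecret: shared-pull-secret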

 
Dependencies (internal and external) (mandatory)

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - OpenShift Storage, OpenShift Builds (#forum-openshift-builds)
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

  • Shared Resource CSI driver cannot be installed using OCP feature gates/tech preview feature set.

Drawbacks or Risk (optional)

  • Using Shared Resources requires installation of a layered product, not part of OCP core.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview:

Ensure CSI Stack for Azure is running on management clusters with hosted control planes, allowing customers to associate a cluster as "Infrastructure only" and move the following parts of the stack:

  • Azure Disk CSI driver
  • Azure File CSI driver
  • Azure File CSI driver operator

Value Statement:

This feature enables customers to run their Azure infrastructure more efficiently and cost-effectively by using hosted control planes and supporting infrastructure without incurring additional charges from Red Hat. Additionally, customers care most about their workloads, not the management stack used to operate their clusters; this feature gets us closer to that goal.

Goals:

  1. Ability for customers to associate a cluster as "Infrastructure only" and pack control planes on role=infra nodes.
  2. Ability to run cluster-storage-operator (CSO) + Azure Disk CSI driver operator + Azure Disk CSI driver control-plane Pods in the management cluster.
  3. Ability to run the driver DaemonSet in the hosted cluster.

Requirements:

  1. The feature must ensure that the CSI Stack for Azure is installed and running on management clusters with hosted control planes.
  2. The feature must allow customers to associate a cluster as "Infrastructure only" and pack control planes on role=infra nodes.
  3. The feature must enable the Azure Disk CSI driver, Azure File CSI driver, and Azure File CSI driver operator to run on the appropriate clusters.
  4. The feature must enable the cluster-storage-operator (CSO) + Azure Disk CSI driver operator + Azure Disk CSI driver control-plane Pods to run in the management cluster.
  5. The feature must enable the driver DaemonSet to run in the hosted cluster.
  6. The feature must ensure security, reliability, performance, maintainability, scalability, and usability.

Use Cases:

  1. A customer wants to run their Azure infrastructure using hosted control planes and supporting infrastructure without incurring additional charges from Red Hat. They use this feature to associate a cluster as "Infrastructure only" and pack control planes on role=infra nodes.
  2. A customer wants to use Azure storage without having to see/manage its stack, especially on a managed service. This would mean that we need to run the cluster-storage-operator (CSO) + Azure Disk CSI driver operator + Azure Disk CSI driver control-plane Pods in the management cluster and the driver DaemonSet in the hosted cluster. 

Questions to Answer:

  1. What Azure-specific considerations need to be made when designing and delivering this feature?
  2. How can we ensure the security, reliability, performance, maintainability, scalability, and usability of this feature?

Out of Scope:

Non-CSI Stack for Azure-related functionalities are out of scope for this feature.

Workload identity authentication is not covered by this feature - see STOR-1748

Background

This feature is designed to enable customers to run their Azure infrastructure more efficiently and cost-effectively by using HyperShift control planes and supporting infrastructure without incurring additional charges from Red Hat.

Documentation Considerations:

Documentation for this feature should provide clear instructions on how to enable the CSI Stack for Azure on management clusters with hosted control planes and associate a cluster as "Infrastructure only." It should also include instructions on how to move the Azure Disk CSI driver, Azure File CSI driver, and Azure File CSI driver operator to the appropriate clusters.

Interoperability Considerations:

This feature impacts the CSI Stack for Azure and any layered products that interact with it. Interoperability test scenarios should be factored by the layered products.

 

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

Run the Azure Disk CSI driver operator + Azure Disk CSI driver control-plane Pods in the management cluster and the driver DaemonSet in the hosted cluster, allowing customers to associate a cluster as "Infrastructure only".

 

 
Why is this important? (mandatory)

This allows customers to run their Azure infrastructure more efficiently and cost-effectively by using hosted control planes and supporting infrastructure without incurring additional charges from Red Hat. Additionally, customers care most about their workloads, not the management stack used to operate their clusters; this feature gets us closer to that goal.

 
Scenarios (mandatory) 

When leveraging Hosted control planes, the Azure Disk CSI driver operator + Azure Disk CSI driver control-plane Pods should run in the management cluster. The driver DaemonSet should run on the managed cluster. This deployment model should provide the same feature set as the regular OCP deployment.

 
Dependencies (internal and external) (mandatory)

Hosted control plane on Azure.

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation -
  • QE - 
  • PX - 
  • Others -

 

Done - Checklist (mandatory)

As part of this epic, engineers working on Azure HyperShift should be able to build and use Azure Disk storage on HyperShift guests via developer preview custom build images.

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

Run the Azure File CSI driver operator + Azure File CSI driver control-plane Pods in the management cluster and the driver DaemonSet in the hosted cluster, allowing customers to associate a cluster as "Infrastructure only".

 

 
Why is this important? (mandatory)

This allows customers to run their Azure infrastructure more efficiently and cost-effectively by using hosted control planes and supporting infrastructure without incurring additional charges from Red Hat. Additionally, customers care most about their workloads, not the management stack used to operate their clusters; this feature gets us closer to that goal.

 
Scenarios (mandatory) 

When leveraging Hosted control planes, the Azure File CSI driver operator + Azure File CSI driver control-plane Pods should run in the management cluster. The driver DaemonSet should run on the managed cluster. This deployment model should provide the same feature set as the regular OCP deployment.

 
Dependencies (internal and external) (mandatory)

Hosted control plane on Azure.

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - STOR
  • Documentation -
  • QE - 
  • PX - 
  • Others -

 

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Goal

The goals of this feature are:

  • Optimize and streamline the operations of the HyperShift Operator (HO) on Azure Kubernetes Service (AKS) clusters
  • Enable auto-detection of the underlying environment (managed or self-managed) to optimize the HO accordingly.

Placeholder epic to capture all Azure tickets.

TODO: review.

User Story:

As an end user of a hypershift cluster, I want to be able to:

  • Not see internal host information when inspecting a serving certificate of the Kubernetes API server

so that I can achieve

  • No knowledge of internal names for the Kubernetes cluster.

From slack thread: https://redhat-external.slack.com/archives/C075PHEFZKQ/p1722615219974739 

We need 4 different certs:

  • common sans
  • internal san
  • fqdn
  • svc ip

Feature Overview (aka. Goal Summary)  

Unify and update hosted control planes storage operators so that they have similar code patterns and can run properly in both standalone OCP and HyperShift's control plane.

Goals (aka. expected user outcomes)

  • Simplify the operators with a unified code pattern
  • Expose metrics from control-plane components
  • Use proper RBACs in the guest cluster
  • Scale the pods according to HostedControlPlane's AvailabilityPolicy
  • Add proper node selector and pod affinity for mgmt cluster pods

Requirements (aka. Acceptance Criteria):

  • OCP regression tests work in both standalone OCP and HyperShift
  • Code in the operators looks the same
  • Metrics from control-plane components are exposed
  • Proper RBACs are used in the guest cluster
  • Pods scale according to HostedControlPlane's AvailabilityPolicy
  • Proper node selector and pod affinity is added for mgmt cluster pods

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context that is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal*

Our current design of the EBS driver operator to support HyperShift does not scale well to other drivers. The existing design will lead to more code duplication between driver operators and a greater possibility of errors.
 
Why is this important? (mandatory)

An improved design will allow more storage drivers and their operators to be added to HyperShift without requiring significant changes to the code internals.
 
Scenarios (mandatory) 

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Finally switch both CI and ART to the refactored aws-ebs-csi-driver-operator.

The functionality and behavior should be the same as with the existing operator; however, the code is completely new. There could be some rough edges. See https://github.com/openshift/enhancements/blob/master/enhancements/storage/csi-driver-operator-merge.md

 

CI should catch the most obvious errors; however, we need to test features that we do not cover in CI, such as:

  • custom CA bundles
  • cluster-wide proxy
  • custom encryption keys used in install-config.yaml
  • government cluster
  • STS
  • SNO
  • and others

Our CSI driver YAML files are mostly copy-pasted from the initial CSI driver (AWS EBS?).

As an OCP engineer, I want the YAML files to be generated, so that we can easily keep consistency among the CSI drivers and make them less error-prone.

It should have no visible impact on the resulting operator behavior.

Feature Overview

Support deploying an OpenShift cluster across multiple vSphere clusters, i.e. configuring multiple vCenter servers in one OpenShift cluster.

Goals

Multiple vCenter support in the Cloud Provider Interface (CPI) and the Cloud Storage Interface (CSI).

Use Cases

Customers want to deploy OpenShift across multiple vSphere clusters (vCenters) primarily for high availability.
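
As a rough sketch of what this could look like in install-config.yaml, assuming the existing platform.vsphere vcenters/failureDomains schema simply accepts more than one vCenter entry; all server names, credentials, and topology paths below are hypothetical:

platform:
  vsphere:
    vcenters:
    - server: vcenter-a.example.com
      user: admin@vsphere.local
      password: REDACTED
      datacenters:
      - dc-a
    - server: vcenter-b.example.com
      user: admin@vsphere.local
      password: REDACTED
      datacenters:
      - dc-b
    failureDomains:
    - name: fd-a
      region: region-a
      zone: zone-a
      server: vcenter-a.example.com
      topology:
        datacenter: dc-a
        computeCluster: /dc-a/host/cluster-a
        datastore: /dc-a/datastore/datastore-a
        networks:
        - vm-network-a
    - name: fd-b
      region: region-b
      zone: zone-b
      server: vcenter-b.example.com
      topology:
        datacenter: dc-b
        computeCluster: /dc-b/host/cluster-b
        datastore: /dc-b/datastore/datastore-b
        networks:
        - vm-network-b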

 

Feature Overview

Support deploying an OpenShift cluster across multiple vSphere clusters, i.e. configuring multiple vCenter servers in one OpenShift cluster.

Goals

Multiple vCenter support in the Cloud Provider Interface (CPI) and the Cloud Storage Interface (CSI).

Use Cases

Customers want to deploy OpenShift across multiple vSphere clusters (vCenters) primarily for high availability.

 

Done Done Done Criteria

This section contains all the test cases that we need to make sure work as part of the done^3 criteria.

  • Clean install of new cluster with multi vCenter configuration
  • Clean install of new cluster with single vCenter still working as previously
  • VMs / machines can be scaled across all vCenters / Failure Domains
  • PVs should be able to be created on all vCenters

Out-of-Scope

This section contains all scenarios that are considered out of scope for this enhancement that will be done via a separate epic / feature / story.

  • Migration of single vCenter OCP to a multi vCenter (stretch)

Feature Overview

Add authentication to the internal components of the Agent Installer so that the cluster install is secure.

Goals

  • Day1: Only allow agents booted from the same agent ISO to register with the assisted-service and use the agent endpoints
  • Day2: Only allow agents booted from the same node ISO to register with the assisted-service and use the agent endpoints
  • Restrict access to write endpoints to the internal services only
  • Use authentication for read endpoints

 

Epic Goal

  • This epic's scope was originally to encompass both authentication and authorization, but we have split the expanding scope into a separate epic.
  • We want to add authorization to the internal components of the Agent Installer so that the cluster install is secure.

Why is this important?

  • The Agent Installer API server (assisted-service) has several methods for authorization, but none of the existing methods are applicable to the Agent Installer use case.
  • During the MVP of Agent Installer we attempted to turn on the existing authorization schemes but found we didn't have access to the correct API calls.
  • Without proper authorization, it is possible for an unauthorized node to be added to the cluster during install. Currently, we expect this to happen by mistake rather than maliciously.

Brainstorming Notes:

Requirements

  • Allow only agents booted from the same ISO to register with the assisted-service and use the agent endpoints
  • Agents already know the InfraEnv ID, so if read access requires authentication then that is sufficient in some existing auth schemes.
  • Prevent access to write endpoints except by the internal systemd services
  • Use some kind of authentication for read endpoints
  • Ideally use existing credentials - admin-kubeconfig client cert and/or kubeadmin-password
  • (Future) Allow UI access in interactive mode only

 

Are there any requirements specific to the auth token?

  • Ephemeral
  • Limited to one cluster: Reuse the existing admin-kubeconfig client cert

 

Actors:

  • Agent Installer: example wait-for
  • Internal systemd: configurations, create cluster infraenv, etc
  • UI: interactive user
  • User: advanced automation user (not supported yet)

 

Do we need more than one auth scheme?

Agent-admin - agent-read-write

Agent-user - agent-read

Options for Implementation:

  1. New auth scheme in assisted-service
  2. Reverse proxy in front of assisted-service API
  3. Use an existing auth scheme in assisted-service

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Previous Work (Optional):

  1. AGENT-60 Originally we wanted to just turn on local authorization for Agent Installer workflows. It was discovered this was not sufficient for our use case.

Open questions::

  1. Which API endpoints do we need for the interactive flow?
  2. What auth scheme does the Assisted UI use if any?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a user, when creating node ISOs, I want to be able to:

  • See the ISO's expiration time logged when the ISO is generated using "oc adm node-image create"

so that I can achieve

  • Enhanced awareness of the ISO expiration date
  • Prevention of unexpected expiration issues
  • Improved overall user experience during node creation

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF).

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all OCP and Kubernetes libraries in storage operators to the appropriate version for this OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator (in csi-operator)
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • openstack-cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • csi-driver-manila-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator
  • ibm-powervs-block-csi-driver-operator
  • secrets-store-csi-driver-operator

 

  • cluster-storage-operator
  • cluster-csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

EOL, do not upgrade:

  • github.com/oVirt/csi-driver-operator

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Feature Overview

Create a GCP cloud-specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags (or labels in GCP) on any OpenShift cloud resource that we create and manage. The behaviour should also tag existing resources that do not have the tags yet, and once the tags in the infrastructure CRD are changed, all the resources should be updated accordingly.

Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.

Due to the ongoing in-tree/out-of-tree split of the cloud and CSI providers, this should not apply to clusters with in-tree providers (!= "external").

Once we are confident that all components are updated, we should introduce an end-to-end test that makes sure we never create untagged resources.

 
Goals

  • Functionality on GCP GA
  • inclusion in the cluster backups
  • flexibility of changing tags during cluster lifetime, without recreating the whole cluster

Requirements

  • This section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

This is a continuation of the CORS-2455 / CFE-719 work, where support for GCP tags & labels was delivered as TechPreview in 4.14, with the goal of making it GA in 4.15. It involves removing any reference to TechPreview in code and docs and incorporating any feedback received from users.

The TechPreview featureSet check added in the machine-api-provider-gcp operator for userLabels and userTags should be removed.

The new featureGate added in openshift/api should also be removed.

Acceptance Criteria

  • Should be able to define userLabels and userTags without setting a featureSet.

The TechPreview featureSet check added in the installer for userLabels and userTags should be removed, and the TechPreview reference in the install-config GCP schema should be removed.

Acceptance Criteria

  • Should be able to define userLabels and userTags without setting the TechPreviewNoUpgrade featureSet (see the sketch below).
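
For illustration, a minimal install-config sketch of these fields once no featureSet is required; the project, labels, tags, and parentID values below are hypothetical:

platform:
  gcp:
    projectID: my-project              # hypothetical project
    region: us-central1
    userLabels:
    - key: environment
      value: production
    - key: cost-center
      value: engineering
    userTags:
    - parentID: "1234567890"           # hypothetical organization or project ID
      key: team
      value: storage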

Feature Overview (aka. Goal Summary)

As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.

To avoid an increased support overhead once the license changes at the end of the year, we want to provision Azure infrastructure without the use of Terraform.

Requirements (aka. Acceptance Criteria):

  • The Azure IPI Installer no longer contains or uses Terraform.
  • The new provider should aim to provide the same results and have parity with the existing Azure Terraform provider. Specifically, we should aim for feature parity against the install config and the cluster it creates to minimize impact on existing customers' UX.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.


Epic Goal

  • Provision Azure infrastructure without the use of Terraform

Why is this important?

  • Removing Terraform from Installer

Scenarios

  1. The new provider should aim to provide the same results as the existing Azure Terraform provider.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of problem:

    CAPZ creates an empty route table during installs

Version-Release number of selected component (if applicable):

4.17    

How reproducible:

    Very

Steps to Reproduce:

    1.Install IPI cluster using CAPZ
    2.
    3.
    

Actual results:

    Empty route table created and attached to worker subnet

Expected results:

    No route table created

Additional info:

    

Description of problem:

Failed to create a second cluster in a shared vnet. The error below is thrown while creating the network infrastructure for the 2nd cluster; the installer timed out and exited.
==============
07-23 14:09:27.315  level=info msg=Waiting up to 15m0s (until 6:24AM UTC) for network infrastructure to become ready...
...
07-23 14:16:14.900  level=debug msg=	failed to reconcile cluster services: failed to reconcile AzureCluster service loadbalancers: failed to create or update resource jima0723b-1-x6vpp-rg/jima0723b-1-x6vpp-internal (service: loadbalancers): PUT https://management.azure.com/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-1-x6vpp-rg/providers/Microsoft.Network/loadBalancers/jima0723b-1-x6vpp-internal
07-23 14:16:14.900  level=debug msg=	--------------------------------------------------------------------------------
07-23 14:16:14.901  level=debug msg=	RESPONSE 400: 400 Bad Request
07-23 14:16:14.901  level=debug msg=	ERROR CODE: PrivateIPAddressIsAllocated
07-23 14:16:14.901  level=debug msg=	--------------------------------------------------------------------------------
07-23 14:16:14.901  level=debug msg=	{
07-23 14:16:14.901  level=debug msg=	  "error": {
07-23 14:16:14.901  level=debug msg=	    "code": "PrivateIPAddressIsAllocated",
07-23 14:16:14.901  level=debug msg=	    "message": "IP configuration /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-1-x6vpp-rg/providers/Microsoft.Network/loadBalancers/jima0723b-1-x6vpp-internal/frontendIPConfigurations/jima0723b-1-x6vpp-internal-frontEnd is using the private IP address 10.0.0.100 which is already allocated to resource /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/frontendIPConfigurations/jima0723b-49hnw-internal-frontEnd.",
07-23 14:16:14.902  level=debug msg=	    "details": []
07-23 14:16:14.902  level=debug msg=	  }
07-23 14:16:14.902  level=debug msg=	}
07-23 14:16:14.902  level=debug msg=	--------------------------------------------------------------------------------

Install-config for 1st cluster:
=========
metadata:
  name: jima0723b
platform:
  azure:
    region: eastus
    baseDomainResourceGroupName: os4-common
    networkResourceGroupName: jima0723b-rg
    virtualNetwork: jima0723b-vnet
    controlPlaneSubnet: jima0723b-master-subnet
    computeSubnet: jima0723b-worker-subnet
publish: External

Install-config for 2nd cluster:
========
metadata:
  name: jima0723b-1
platform:
  azure:
    region: eastus
    baseDomainResourceGroupName: os4-common
    networkResourceGroupName: jima0723b-rg
    virtualNetwork: jima0723b-vnet
    controlPlaneSubnet: jima0723b-master-subnet
    computeSubnet: jima0723b-worker-subnet
publish: External

shared master subnet/worker subnet:
$ az network vnet subnet list -g jima0723b-rg --vnet-name jima0723b-vnet -otable
AddressPrefix    Name                     PrivateEndpointNetworkPolicies    PrivateLinkServiceNetworkPolicies    ProvisioningState    ResourceGroup
---------------  -----------------------  --------------------------------  -----------------------------------  -------------------  ---------------
10.0.0.0/24      jima0723b-master-subnet  Disabled                          Enabled                              Succeeded            jima0723b-rg
10.0.1.0/24      jima0723b-worker-subnet  Disabled                          Enabled                              Succeeded            jima0723b-rg

internal lb frontedIPConfiguration on 1st cluster:
$ az network lb show -n jima0723b-49hnw-internal -g jima0723b-49hnw-rg --query 'frontendIPConfigurations'
[
  {
    "etag": "W/\"7a7531ca-fb02-48d0-b9a6-d3fb49e1a416\"",
    "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/frontendIPConfigurations/jima0723b-49hnw-internal-frontEnd",
    "inboundNatRules": [
      {
        "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/inboundNatRules/jima0723b-49hnw-master-0",
        "resourceGroup": "jima0723b-49hnw-rg"
      },
      {
        "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/inboundNatRules/jima0723b-49hnw-master-1",
        "resourceGroup": "jima0723b-49hnw-rg"
      },
      {
        "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/inboundNatRules/jima0723b-49hnw-master-2",
        "resourceGroup": "jima0723b-49hnw-rg"
      }
    ],
    "loadBalancingRules": [
      {
        "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/loadBalancingRules/LBRuleHTTPS",
        "resourceGroup": "jima0723b-49hnw-rg"
      },
      {
        "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/loadBalancingRules/sint-v4",
        "resourceGroup": "jima0723b-49hnw-rg"
      }
    ],
    "name": "jima0723b-49hnw-internal-frontEnd",
    "privateIPAddress": "10.0.0.100",
    "privateIPAddressVersion": "IPv4",
    "privateIPAllocationMethod": "Static",
    "provisioningState": "Succeeded",
    "resourceGroup": "jima0723b-49hnw-rg",
    "subnet": {
      "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-rg/providers/Microsoft.Network/virtualNetworks/jima0723b-vnet/subnets/jima0723b-master-subnet",
      "resourceGroup": "jima0723b-rg"
    },
    "type": "Microsoft.Network/loadBalancers/frontendIPConfigurations"
  }
]

From the above output, privateIPAllocationMethod is Static and the privateIPAddress is always allocated as 10.0.0.100; this might cause the 2nd cluster installation failure.

Checking the same on a cluster created using Terraform, privateIPAllocationMethod is Dynamic.
===============
$ az network lb show -n wxjaz723-pm99k-internal -g wxjaz723-pm99k-rg --query 'frontendIPConfigurations'
[
  {
    "etag": "W/\"e6bec037-843a-47ba-a725-3f322564be58\"",
    "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/wxjaz723-pm99k-rg/providers/Microsoft.Network/loadBalancers/wxjaz723-pm99k-internal/frontendIPConfigurations/internal-lb-ip-v4",
    "loadBalancingRules": [
      {
        "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/wxjaz723-pm99k-rg/providers/Microsoft.Network/loadBalancers/wxjaz723-pm99k-internal/loadBalancingRules/api-internal-v4",
        "resourceGroup": "wxjaz723-pm99k-rg"
      },
      {
        "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/wxjaz723-pm99k-rg/providers/Microsoft.Network/loadBalancers/wxjaz723-pm99k-internal/loadBalancingRules/sint-v4",
        "resourceGroup": "wxjaz723-pm99k-rg"
      }
    ],
    "name": "internal-lb-ip-v4",
    "privateIPAddress": "10.0.0.4",
    "privateIPAddressVersion": "IPv4",
    "privateIPAllocationMethod": "Dynamic",
    "provisioningState": "Succeeded",
    "resourceGroup": "wxjaz723-pm99k-rg",
    "subnet": {
      "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/wxjaz723-rg/providers/Microsoft.Network/virtualNetworks/wxjaz723-vnet/subnets/wxjaz723-master-subnet",
      "resourceGroup": "wxjaz723-rg"
    },
    "type": "Microsoft.Network/loadBalancers/frontendIPConfigurations"
  },
...
]

Version-Release number of selected component (if applicable):

  4.17 nightly build

How reproducible:

  Always

Steps to Reproduce:

    1. Create shared vnet / master subnet / worker subnet
    2. Create 1st cluster in shared vnet
    3. Create 2nd cluster in shared vnet
    

Actual results:

    2nd cluster installation failed

Expected results:

    Both clusters are installed successfully.

Additional info:

    

 

Description of problem:

Install an Azure fully private IPI cluster using CAPI with a payload built from cluster bot including openshift/installer#8727 and openshift/installer#8732.

install-config:
=================
platform:
  azure:
    region: eastus
    outboundType: UserDefinedRouting
    networkResourceGroupName: jima24b-rg
    virtualNetwork: jima24b-vnet
    controlPlaneSubnet: jima24b-master-subnet
    computeSubnet: jima24b-worker-subnet
publish: Internal
featureSet: TechPreviewNoUpgrade

Checking the storage account created by the installer, its property allowBlobPublicAccess is set to True.
$ az storage account list -g jima24b-fwkq8-rg --query "[].[name,allowBlobPublicAccess]" -o tsv
jima24bfwkq8sa    True

This is not consistent with the Terraform code: https://github.com/openshift/installer/blob/master/data/data/azure/vnet/main.tf#L74

At a minimum, the storage account should have no public access for a fully private cluster.

Version-Release number of selected component (if applicable):

    4.17 nightly build

How reproducible:

    Always

Steps to Reproduce:

    1. Create fully private cluster
    2. Check storage account created by installer
    3.
    

Actual results:

    The storage account has public access on a fully private cluster.

Expected results:

     The storage account should have no public access on a fully private cluster.

Additional info:

    

Description of problem:

In the install-config file, there is no zone/instance type setting under controlPlane or defaultMachinePlatform
==========================
featureSet: CustomNoUpgrade
featureGates:
- ClusterAPIInstallAzure=true
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3

When creating the cluster, master instances should be created across multiple zones, since the default instance type 'Standard_D8s_v3' has availability zones. Actually, master instances are not created in any zone.
$ az vm list -g jima24a-f7hwg-rg -otable
Name                                        ResourceGroup     Location        Zones
------------------------------------------  ----------------  --------------  -------
jima24a-f7hwg-master-0                      jima24a-f7hwg-rg  southcentralus
jima24a-f7hwg-master-1                      jima24a-f7hwg-rg  southcentralus
jima24a-f7hwg-master-2                      jima24a-f7hwg-rg  southcentralus
jima24a-f7hwg-worker-southcentralus1-wxncv  jima24a-f7hwg-rg  southcentralus  1
jima24a-f7hwg-worker-southcentralus2-68nxv  jima24a-f7hwg-rg  southcentralus  2
jima24a-f7hwg-worker-southcentralus3-4vts4  jima24a-f7hwg-rg  southcentralus  3

Version-Release number of selected component (if applicable):

4.17.0-0.nightly-2024-06-23-145410

How reproducible:

Always

Steps to Reproduce:

1. CAPI-based install on azure platform with default configuration
2. 
3.

Actual results:

master instances are created but not in any zone.

Expected results:

Master instances should be created per zone based on the selected instance type, keeping the same behavior as the Terraform-based install.

Additional info:

When setting zones under controlPlane in install-config, master instances can be created per zone.
install-config:
===========================
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    azure:
      zones: ["1","3"]

$ az vm list -g jima24b-p76w4-rg -otable
Name                                        ResourceGroup     Location        Zones
------------------------------------------  ----------------  --------------  -------
jima24b-p76w4-master-0                      jima24b-p76w4-rg  southcentralus  1
jima24b-p76w4-master-1                      jima24b-p76w4-rg  southcentralus  3
jima24b-p76w4-master-2                      jima24b-p76w4-rg  southcentralus  1
jima24b-p76w4-worker-southcentralus1-bbcx8  jima24b-p76w4-rg  southcentralus  1
jima24b-p76w4-worker-southcentralus2-nmgfd  jima24b-p76w4-rg  southcentralus  2
jima24b-p76w4-worker-southcentralus3-x2p7g  jima24b-p76w4-rg  southcentralus  3

 

Description of problem:

Launch a CAPI-based installation on Azure Government Cloud; the installer timed out while waiting for the network infrastructure to become ready.

06-26 09:08:41.153  level=info msg=Waiting up to 15m0s (until 9:23PM EDT) for network infrastructure to become ready...
...
06-26 09:09:33.455  level=debug msg=E0625 21:09:31.992170   22172 azurecluster_controller.go:231] "failed to reconcile AzureCluster" err=<
06-26 09:09:33.455  level=debug msg=	failed to reconcile AzureCluster service group: reconcile error that cannot be recovered occurred: resource is not Ready: The subscription '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7' could not be found.: PUT https://management.azure.com/subscriptions/8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7/resourceGroups/jima26mag-9bqkl-rg
06-26 09:09:33.456  level=debug msg=	--------------------------------------------------------------------------------
06-26 09:09:33.456  level=debug msg=	RESPONSE 404: 404 Not Found
06-26 09:09:33.456  level=debug msg=	ERROR CODE: SubscriptionNotFound
06-26 09:09:33.456  level=debug msg=	--------------------------------------------------------------------------------
06-26 09:09:33.456  level=debug msg=	{
06-26 09:09:33.456  level=debug msg=	  "error": {
06-26 09:09:33.456  level=debug msg=	    "code": "SubscriptionNotFound",
06-26 09:09:33.456  level=debug msg=	    "message": "The subscription '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7' could not be found."
06-26 09:09:33.456  level=debug msg=	  }
06-26 09:09:33.456  level=debug msg=	}
06-26 09:09:33.456  level=debug msg=	--------------------------------------------------------------------------------
06-26 09:09:33.456  level=debug msg=	. Object will not be requeued
06-26 09:09:33.456  level=debug msg= > logger="controllers.AzureClusterReconciler.reconcileNormal" controller="azurecluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AzureCluster" AzureCluster="openshift-cluster-api-guests/jima26mag-9bqkl" namespace="openshift-cluster-api-guests" reconcileID="f2ff1040-dfdd-4702-ad4a-96f6367f8774" x-ms-correlation-request-id="d22976f0-e670-4627-b6f3-e308e7f79def" name="jima26mag-9bqkl"
06-26 09:09:33.457  level=debug msg=I0625 21:09:31.992215   22172 recorder.go:104] "failed to reconcile AzureCluster: failed to reconcile AzureCluster service group: reconcile error that cannot be recovered occurred: resource is not Ready: The subscription '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7' could not be found.: PUT https://management.azure.com/subscriptions/8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7/resourceGroups/jima26mag-9bqkl-rg\n--------------------------------------------------------------------------------\nRESPONSE 404: 404 Not Found\nERROR CODE: SubscriptionNotFound\n--------------------------------------------------------------------------------\n{\n  \"error\": {\n    \"code\": \"SubscriptionNotFound\",\n    \"message\": \"The subscription '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7' could not be found.\"\n  }\n}\n--------------------------------------------------------------------------------\n. Object will not be requeued" logger="events" type="Warning" object={"kind":"AzureCluster","namespace":"openshift-cluster-api-guests","name":"jima26mag-9bqkl","uid":"20bc01ee-5fbe-4657-9d0b-7013bd55bf96","apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","resourceVersion":"1115"} reason="ReconcileError"
06-26 09:17:40.081  level=debug msg=I0625 21:17:36.066522   22172 helpers.go:516] "returning early from secret reconcile, no update needed" logger="controllers.reconcileAzureSecret" controller="ASOSecret" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AzureCluster" AzureCluster="openshift-cluster-api-guests/jima26mag-9bqkl" namespace="openshift-cluster-api-guests" name="jima26mag-9bqkl" reconcileID="2df7c4ba-0450-42d2-901e-683de399f8d2" x-ms-correlation-request-id="b2bfcbbe-8044-472f-ad00-5c0786ebbe84"
06-26 09:23:46.611  level=debug msg=Collecting applied cluster api manifests...
06-26 09:23:46.611  level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: infrastructure is not ready: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
06-26 09:23:46.611  level=info msg=Shutting down local Cluster API control plane...
06-26 09:23:46.612  level=info msg=Stopped controller: Cluster API
06-26 09:23:46.612  level=warning msg=process cluster-api-provider-azure exited with error: signal: killed
06-26 09:23:46.612  level=info msg=Stopped controller: azure infrastructure provider
06-26 09:23:46.612  level=warning msg=process cluster-api-provider-azureaso exited with error: signal: killed
06-26 09:23:46.612  level=info msg=Stopped controller: azureaso infrastructure provider
06-26 09:23:46.612  level=info msg=Local Cluster API system has completed operations
06-26 09:23:46.612  [ERROR] Installation failed with error code '4'. Aborting execution.

From the above log, the Azure Resource Management API endpoint is not correct: the endpoint "management.azure.com" is for the Azure Public cloud, while the expected one for Azure Government is "management.usgovcloudapi.net".

Version-Release number of selected component (if applicable):

    4.17.0-0.nightly-2024-06-23-145410

How reproducible:

    Always

Steps to Reproduce:

    1. Install cluster on Azure Government Cloud, capi-based installation 
    2.
    3.
    

Actual results:

    Installation failed because the wrong Azure Resource Management API endpoint was used.

Expected results:

    Installation succeeded.

Additional info:

    

Epic Goal*

There was an epic / enhancement to create a cluster-wide TLS config that applies to all OpenShift components:

https://issues.redhat.com/browse/OCPPLAN-4379
https://github.com/openshift/enhancements/blob/master/enhancements/kube-apiserver/tls-config.md

For example, this is how KCM sets --tls-cipher-suites and --tls-min-version based on the observed config:

https://issues.redhat.com/browse/WRKLDS-252
https://github.com/openshift/cluster-kube-controller-manager-operator/pull/506/files

The cluster admin can change the config based on their risk profile, but if they don't change anything, there is a reasonable default.

We should update all CSI driver operators to use this config. Right now we have a hard-coded cipher list in library-go. See OCPBUGS-2083 and OCPBUGS-4347 for background context.
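For reference, the cluster-wide profile that operators observe lives on the cluster-scoped APIServer resource; a minimal sketch of a Custom profile (the cipher list shown is illustrative):

```yaml
apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  tlsSecurityProfile:
    type: Custom
    custom:
      # Ciphers and minimum TLS version that components such as KCM and
      # the CSI sidecars would pick up via their observed config.
      ciphers:
        - ECDHE-ECDSA-CHACHA20-POLY1305
        - ECDHE-RSA-CHACHA20-POLY1305
        - ECDHE-RSA-AES128-GCM-SHA256
        - ECDHE-ECDSA-AES128-GCM-SHA256
      minTLSVersion: VersionTLS12
```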

 
Why is this important? (mandatory)

This will keep the cipher list consistent across many OpenShift components. If the default list is changed, we get that change "for free".

It will reduce support calls from customers and backport requests when the recommended defaults change.

It will provide flexibility to the customer, since they can set their own TLS profile settings without requiring code change for each component.

 
Scenarios (mandatory) 

As a cluster admin, I want to use TLSSecurityProfile to control the cipher list and minimum TLS version for all CSI driver operator sidecars, so that I can adjust the settings based on my own risk assessment.

 
Dependencies (internal and external) (mandatory)

None, the changes we depend on were already implemented.

 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation - 
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be "Release Pending" 

Template:

Networking Definition of Planned

Epic Template descriptions and documentation

Epic Goal

openshift-sdn is no longer part of OCP in 4.17, so remove references to it in the networking APIs.

Consider whether we can remove the entire network.openshift.io API, which will now be no-ops.

In places where both sdn and ovn-k are supported, remove references to sdn.

In some places (notably the migration API), we will probably leave an API in place that currently has no purpose.

Why is this important?

Planning Done Checklist

The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

  • Priority is set by engineering
  • Epic must be Linked to a Parent Feature
  • Target version must be set
  • Assignee must be set
  • Enhancement Proposal is Implementable
  • No outstanding questions about major work breakdown
  • Are all Stakeholders known? Have they all been notified about this item?
  • Does this epic affect SD? Have they been notified? (View plan definition for current suggested assignee)
    1. Please use the “Discussion Needed: Service Delivery Architecture Overview” checkbox to facilitate the conversation with SD Architects. The SD architecture team monitors this checkbox which should then spur the conversation between SD and epic stakeholders. Once the conversation has occurred, uncheck the “Discussion Needed: Service Delivery Architecture Overview” checkbox and record the outcome of the discussion in the epic description here.
    2. The guidance here is that unless it is very clear that your epic doesn’t have any managed services impact, default to use the Discussion Needed checkbox to facilitate that conversation.

Additional information on each of the above items can be found here: Networking Definition of Planned

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement
    details and documents.

...

Dependencies (internal and external)

1.

...

Previous Work (Optional):

1. …

Open questions::

1. …

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Goal:

As an administrator, I would like to use my own managed DNS solution instead of only specific openshift-install supported DNS services (such as AWS Route53, Google Cloud DNS, etc...) for my OpenShift deployment.

 

Problem:

While cloud-based DNS services provide convenient hostname management, there's a number of regulatory (ITAR) and operational constraints customers face prohibiting the use of those DNS hosting services on public cloud providers.

 

Why is this important:

  • Provides customers with the flexibility to leverage their own custom managed ingress DNS solutions already in use within their organizations.
  • Required for regions like AWS GovCloud, in which many customers may not be able to use the Route53 service (available only to commercial customers) for both internal and ingress DNS.
  • OpenShift managed internal DNS solution ensures cluster operation and nothing breaks during updates.

 

Dependencies (internal and external):

 

Prioritized epics + deliverables (in scope / not in scope):

  • Ability to bootstrap cluster without an OpenShift managed internal DNS service running yet
  • Scalable, cluster (internal) DNS solution that's not dependent on the operation of the control plane (in case it goes down)
  • Ability to automatically propagate DNS record updates to all nodes running the DNS service within the cluster
  • Option for connecting the cluster to the customer's ingress DNS solution already in place within their organization

 

Estimate (XS, S, M, L, XL, XXL):

 

Previous Work:

 

Open questions:

 

Link to Epic: https://docs.google.com/document/d/1OBrfC4x81PHhpPrC5SEjixzg4eBnnxCZDr-5h3yF2QI/edit?usp=sharing

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Update Installer changes to accompany the move to Installation with CAPG

Why is this important?

  • https://issues.redhat.com/browse/CORS-2460 was completed when installation on GCP was done using terraform. Now, with the removal of terraform-based installation and the move to CAPI-based installation, some of the previously completed tasks need to be revisited and re-implemented.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
  1. Grab the Network field of GCPClusterStatus
  2. Within the Network struct, grab the APIServerForwardingRule and APIInternalForwardingRule fields
  3. Each of these fields is of type ForwardingRule, which in turn contains the IP address of the LB
  4. Verify the accuracy of this IP address by calling this method even when custom DNS is not configured. Compare the IP address extracted by this method with the DNS configuration.
  5. Use existing methods to add the above IP addresses to the Infra CR within the bootstrap Ignition (see the sketch below).
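To illustrate step 5, a hedged sketch of where the extracted load-balancer IPs could end up in the Infrastructure CR (assuming the cloudLoadBalancerConfig API for cluster-hosted DNS; the addresses and exact field layout should be treated as assumptions):

```yaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
status:
  platformStatus:
    type: GCP
    gcp:
      cloudLoadBalancerConfig:
        dnsType: ClusterHosted
        clusterHosted:
          apiLoadBalancerIPs:
            - 34.10.20.30   # from APIServerForwardingRule.IPAddress (illustrative)
          apiIntLoadBalancerIPs:
            - 10.0.0.5      # from APIInternalForwardingRule.IPAddress (illustrative)
```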

Incomplete Features

When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release

Epic Goal

  • Rename `local-cluster` in RHACM.

Why is this important?

  • Customers have found it confusing to see the `local-cluster` as a hardcoded object in their ACM clusters list. 
    • They have not complained about the fact that it is there, but rather just the name of it.
  • In particular, as the architecture of RHACM evolves to include a global Hub of Hubs, the management of sub-hubs ("leaf hubs") will get problematic if we start to see numerous managed sub-hubs all with the same name `local-cluster` being imported to the global hub.

Scenarios

  1. Customer installs RHACM
  2. Customer sees local-cluster in the all clusters list
  3. Customer can rename local-cluster as needed

Alternate scenario

  1. Customer installs RHACM
  2. customer sees the management hub in the all clusters list with a unique cluster ID, not a user-configurable name
  3. Customer cannot rename local-cluster as needed; instead they could use a label to indicate some colloquial nickname 

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. Too many to accurately list at this point, but we need to consider every component, every part of RHACM.

Previous Work (Optional):

Open questions:

  1. Should the local-cluster object be a standardized unique cluster ID? or should it be user configurable?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR> 

Slack Channel

#acm-1290-rename-local-cluster

Feature goal (what are we trying to solve here?)

Remove hard-coded local-cluster from the import local cluster feature and verify that we don't use it in the infrastructure operator  

DoD (Definition of Done)

Testing the import local cluster and checking the behavior after the upgrade.

Does it need documentation support?

Yes.

Feature origin (who asked for this feature?)

  • A Customer asked for it: No
  • A solution architect asked for it: No
  • Internal request

Reasoning (why it's important?)

  • behavior change in ACM

Competitor analysis reference

  • Do our competitors have this feature?
    • No

Feature usage (do we have numbers/data?)

  • Not relevant

Feature availability (why should/shouldn't it live inside the UI/API?)

  • Not relevant to UI

Presently the name of the local-cluster is hardwired to "local-cluster" in the local cluster import tool.
It is possible to redefine the name of the "local-cluster" in ACM, so the correct local-cluster name needs to be picked up and used by the ManagedCluster.

Suggested approach

1: Obtain the correct "local-cluster" name from the ManagedCluster CR that has been labelled as "local-cluster"
2: Use this name to import the local cluster, and annotate the created AgentServiceConfig, ClusterDeployment and InfraEnv as a "local cluster"
3: Handle any updates to ManagedCluster to keep the name in sync.
4: During deletion of local cluster CRs, this annotation may be used to identify CRs to be deleted.

This leaves an edge case: there will be an AgentServiceConfig, ClusterDeployment and InfraEnv "left behind" for any users who have renamed their ManagedCluster and then performed an upgrade to this new version. Those users will need to remove these CRs manually. (I will discuss further with ACM to determine a suitable course of action here.)

This makes the following assumptions, which should also be checked with the ACM team.

1: ACM users may rename their "local-cluster" in ACM (meaning that we should pick this change up)
2: ACM will use the label "local-cluster" in the ManagedCluster CR to signify a local cluster (see the sketch after this list)
3: There will only be one "local-cluster" in ACM (note that it's possible to add a label arbitrarily so this may not be properly enforceable.)
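For reference, the sketch below shows roughly what the labelled ManagedCluster from assumption 2 could look like after a rename (the cluster name is illustrative):

```yaml
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  # Renamed from the default "local-cluster"; the import tool would pick
  # this name up instead of the hard-coded one.
  name: my-hub
  labels:
    local-cluster: "true"
spec:
  hubAcceptsClient: true
```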

Requirement description:

As a VM Admin, I want to improve overall density. In our traditional VM environments, we find that we are memory bound much more than CPU bound. Even with properly sized VMs, we see a lot of memory just sitting around allocated to the VM, but not actually used. Moreover, we always see people requesting VMs that are sized way too big for their workloads. It is better customer service to allow it to some degree and then recover the memory at the hypervisor level.

MVP:

  • Move SWAP to beta (OCP TP)
  • Dashboard for monitoring
  • Make sure the scheduler sees the real memory available, rather than that allocated to the VMs.

Documents:

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Prometheus query for UI:
sum by (instance)(((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) + (node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes)) / node_memory_MemTotal_bytes) *100

In human words: this approximates how much over-commitment of memory is taking place. A value of 100 means RAM+SWAP usage equals 100% of system RAM capacity; 105% means RAM+SWAP usage is 105% of system RAM capacity.

Threshold: Yellow 95%, Red 105%
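If the thresholds above were expressed as alerting rules rather than dashboard colors, a minimal PrometheusRule sketch could look like the following (rule and alert names are hypothetical; the expression and thresholds are taken from above):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-overcommit        # hypothetical name
  namespace: openshift-monitoring
spec:
  groups:
    - name: swap-overcommit
      rules:
        - alert: MemoryOvercommitWarning   # Yellow threshold: 95%
          expr: |
            sum by (instance) (((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
              + (node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes))
              / node_memory_MemTotal_bytes) * 100 > 95
          labels:
            severity: warning
        - alert: MemoryOvercommitCritical  # Red threshold: 105%
          expr: |
            sum by (instance) (((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
              + (node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes))
              / node_memory_MemTotal_bytes) * 100 > 105
          labels:
            severity: critical
```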
Based on: https://docs.google.com/document/d/1AbR1LACNMRU2QMqFpe-Se2mCEFLMqW_M9OPKh2v3yYw,

https://docs.google.com/document/d/1E1joajwxQChQiDVTsr9Qk_iIhpQkSI-VQP-o_BMx8Aw

 

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Description

“In order to have the same UX/UI in the dev and admin perspectives, we as the Observability UI Team need to reuse the dashboards coming from the monitoring plugin”

Goals & Outcomes

Product Requirements:

  • The dev console dashboards are loaded from the monitoring plugin

Background

The admin console's alert details page is provided by https://github.com/openshift/monitoring-plugin, but the dev console's equivalent page is still provided by code in the console codebase.

The dev console page displays fewer dashboards than the admin version of the page, so that difference will need to be supported by monitoring-plugin.

Outcomes

  • The dev console page for dashboards is loaded from monitoring-plugin and the code for the page is removed from the console codebase.
  • The dev console version of the page has the project selector dropdown, but the admin console page doesn't, so monitoring-plugin will need to be changed to support that difference.
  • We need to check when fetching dashboards that the dev and admin dashboards are fetched from the right endpoint

Proposed title of this feature request

Fleet / Multicluster Alert Management User Interface

What is the nature and description of the request?

Large enterprises are drowning in cluster alerts.

side note: Just within my demo RHACM Hub environment, across 12 managed clusters (OCP, SNO, ARO, ROSA, self-managed HCP, xKS), I have 62 alerts being reported! And I have no idea what to do about them!

Customers need the ability to interact with alerts in a meaningful way, to leverage a user interface that can filter, display, multi-select, sort, etc. To multi-select and take actions, for example:

  • alert filter state is warning
  • clusters filter is label environment=development
  • multi-select this result set
  • take action to Silence the alerts!

Why does the customer need this? (List the business requirements)

Platform engineering (sys admin, SRE, etc.) must maintain the health of the cluster and ensure that the business applications are running stably. There might indeed be another tool and another team which focuses on the application health itself, but the platform team certainly wants to ensure that the platform is running optimally and that all critical alerts are responded to.

As of today, the customer must perform alert management via the CLI. This is tedious, ad-hoc, and error prone (see blog link).

The requirements are:

  • filtering fleet alerts
  • multiselect for actions like silence
  • as a bonus, configuring alert forwarding will be amazing to have.

List any affected packages or components.

OCP console Observe dynamic plugin

ACM Multicluster observability (MCO operator)

Description

"In order to provide ACM with the same monitoring capabilities OCP has, we as the Observability UI Team need to allow the monitoring plugin to be installed and work in ACM environments."

Goals & Outcomes

Product Requirements:

  • Be able to install the monitoring plugin without CMO, use COO
  • Allow the monitoring plugin to use a different backend endpoint to fetch alerts; ACM has its own Alertmanager
  • Add a column to the alerts list to display the cluster that originated the alert
  • Include only the alerting parts which include the alerts list, alert detail and silences

UX Requirements:

  • Align UX text and patterns between ACM concepts (hub cluster, spoke cluster, core operators) and the current monitoring plugin

Open Questions

  • Do the current monitoring plugin and the ACM monitoring plugin need to coexist in a cluster?
  • Do we need to connect to a different Prometheus/Thanos, or is it just a different Alertmanager?

Background

In order to enable/disable features for monitoring in different OpenShift flavors, the monitoring plugin should support feature flags

Outcomes

  • The monitoring plugin with the Go backend can be deployed with CMO and the image is built correctly from the ART team

Placeholder feature for ccx-ocp-core maintenance tasks.

This epic tracks "business as usual" requirements / enhancements / bug fixing of the Insights Operator.

 Description of problem:

The Insights Operator should replace %s in https://console.redhat.com/api/gathering/v2/%s/gathering_rules error messages, like the failed-to-bootstrap log below:

$ jq -r .content osd-ccs-gcp-ad-install.log | sed 's/\\n/\n/g' | grep 'Cluster operator insights'
time="2024-09-05T08:12:51Z" level=info msg="Cluster operator insights ClusterTransferAvailable is False with Unauthorized: failed to pull cluster transfer: OCM API https://api.openshift.com/api/accounts_mgmt/v1/cluster_transfers/?search=cluster_uuid+is+%REDACTED%27+and+status+is+%27accepted%27 returned HTTP 401: REDACTED"
time="2024-09-05T08:12:51Z" level=info msg="Cluster operator insights Disabled is False with AsExpected: "
time="2024-09-05T08:12:51Z" level=info msg="Cluster operator insights RemoteConfigurationAvailable is False with HttpStatus401: received HTTP 401 Unauthorized from https://console.redhat.com/api/gathering/v2/%s/gathering_rules"
time="2024-09-05T08:12:51Z" level=info msg="Cluster operator insights RemoteConfigurationValid is Unknown with NoValidationYet: "
time="2024-09-05T08:12:51Z" level=info msg="Cluster operator insights SCAAvailable is False with Unauthorized: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 401: REDACTED
level=info msg=Cluster operator insights ClusterTransferAvailable is False with Unauthorized: failed to pull cluster transfer: OCM API https://api.openshift.com/api/accounts_mgmt/v1/cluster_transfers/?search=cluster_uuid+is+%27REDACTED%27+and+status+is+%27accepted%27 returned HTTP 401: REDACTED
level=info msg=Cluster operator insights Disabled is False with AsExpected: 
level=info msg=Cluster operator insights RemoteConfigurationAvailable is False with HttpStatus401: received HTTP 401 Unauthorized from https://console.redhat.com/api/gathering/v2/%s/gathering_rules
level=info msg=Cluster operator insights RemoteConfigurationValid is Unknown with NoValidationYet: 
level=info msg=Cluster operator insights SCAAvailable is False with Unauthorized: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 401: REDACTED
level=info msg=Cluster operator insights UploadDegraded is True with NotAuthorized: Reporting was not allowed: your Red Hat account is not enabled for remote support or your token has expired: {\"errors\":[{\"meta\":{\"response_by\":\"gateway\"},\"detail\":\"UHC services authentication failed\",\"status\":401}]}

Version-Release number of selected component

Seen in 4.17 RCs. Also in this comment.

How reproducible

Unknown

Steps to Reproduce:

Unknown.

Actual results:

ClusterOperator conditions talking about https://console.redhat.com/api/gathering/v2/%s/gathering_rules

Expected results

URIs we expose in customer-oriented messaging should not have %s placeholders.

Additional detail

Seems like the template is coming in as conditionalGathererEndpoint here. Seems like insights-operator#964 introduced the %s, but I'm not finding the logic that's supposed to populate that placeholder.

Description of problem:

When the Insights Operator is disabled (as described in the docs here or here), the RemoteConfigurationAvailable and RemoteConfigurationValid clusteroperator conditions report the previous state (from before disabling the gathering), which might be Available=True and Valid=True.

 
Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Disable the data gathering in the Insights Operator following the docs links above
    2. Watch the clusteroperator conditions with "oc get co insights -o json | jq .status.conditions"
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Feature Overview

As a cluster-admin, I want to run updates in discrete steps, updating the control plane and worker nodes independently.
I also want to back up and restore in case of a problematic upgrade.

 

Background:

This Feature is a continuation of https://issues.redhat.com/browse/OCPSTRAT-180.
Customers are asking for improvements to the upgrade experience (both over-the-air and disconnected). This feature tracks the epics required to get that work done. Below is the list of completed tasks.

  1. OTA-700 Reduce False Positives (such as Degraded) 
  2. OTA-922 - Better able to show the progress made in each discrete step 
  3. [Covered by status command] Better visibility into any errors during the upgrades and documentation of what the errors mean and how to recover.

Goals

  1. Have an option to do upgrades in more discrete steps under admin control. Specifically, these steps are: 
    • Control plane upgrade
    • Worker nodes upgrade
    • Workload enabling upgrade (i.e. Router, other components) or infra nodes
  2. A user experience around an end-to-end back-up and restore after a failed upgrade
  3. MCO-530 - Support in Telemetry for the discrete steps of upgrades 

References

Epic Goal

  • Eliminate the gap between measured availability and Available=true

Why is this important?

  • Today it's not uncommon, even for CI jobs, to have multiple operators which blip through either Degraded=True or Available=False conditions
  • We should assume that if our CI jobs do this then when operating in customer environments with higher levels of chaos things will be even worse
  • We have had multiple customers express that they've pursued rolling back upgrades because the cluster is telling them that portions of the cluster are Degraded or Unavailable when they're actually not
  • Since our product is self-hosted, we can reasonably expect that the instability that we experience on our platform workloads (kube-apiserver, console, authentication, service availability), will also impact customer workloads that run exactly the same way: we're just better at detecting it.

Scenarios

  1. In all of the following, assume standard 3 master 0 worker or 3 master 2+ worker topologies
  2. Add/update CI jobs which ensure 100% Degraded=False and Available=True for the duration of upgrade
  3. Add/update CI jobs which measure availability of all components which are not explicitly defined as non-HA (ex: metal's DHCP server is a singleton)
  4. Address all identified issues

Acceptance Criteria

  • openshift/enhancements CONVENTIONS outlines these requirements
  • CI - Release blocking jobs include these new/updated tests
  • Release Technical Enablement - N/A if we do this we should need no docs
  • No outstanding identified issues

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. Clayton, David, and Trevor identified many issues early in 4.8 development but were unable to ensure all teams addressed them. That list is in this query; teams will be asked to address everything on this list as a 4.9 blocker+ bug, and we will re-evaluate status closer to 4.9 code freeze to see which may be deferred to 4.10
    https://bugzilla.redhat.com/buglist.cgi?columnlist=product%2Ccomponent%2Cassigned_to%2Cbug_severity%2Ctarget_release%2Cbug_status%2Cresolution%2Cshort_desc%2Cchangeddate&f1=longdesc&f2=cf_environment&j_top=OR&list_id=12012976&o1=casesubstring&o2=casesubstring&query_based_on=ClusterOperator%20conditions&query_format=advanced&v1=should%20not%20change%20condition%2F&v2=should%20not%20change%20condition%2F

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • DEV - Tests in place
  • DEV - No outstanding failing tests
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

BU Priority Overview

Enable installation and lifecycle support of OpenShift 4 on Oracle Cloud Infrastructure (OCI) Bare metal

Goals

  • Validating OpenShift on OCI baremetal to make it officially supported. 
  • Enable installation of OpenShift 4 on OCI bare metal using Assisted Installer.
  • Provide published installation instructions for how to install OpenShift on OCI baremetal
  • OpenShift 4 on OCI baremetal can be updated, resulting in a cluster and applications that are in a healthy state when the update is completed.
  • Telemetry reports back on clusters using OpenShift 4 on OCI baremetal for connected OpenShift clusters (e.g. platform=external or none + some other indicator to know it's running on OCI baremetal).

Use scenarios

  • As a customer, I want to run OpenShift Virtualization on OpenShift running on OCI baremetal.
  • As a customer, I want to run Oracle BRM on OpenShift running on OCI baremetal.

Why is this important

  • Customers who want to move from on-premises to Oracle cloud baremetal
  • OpenShift Virtualization is currently only supported on baremetal

Requirements

 

Requirement / Notes:

  • OCI Bare Metal Shapes must be certified with RHEL. They must also work with RHCOS (see iSCSI boot notes), as OCI BM standard shapes require RHCOS iSCSI to boot (OCPSTRAT-1246). Certified shapes: https://catalog.redhat.com/cloud/detail/249287
  • Successfully passing the OpenShift Provider conformance testing – this should be fairly similar to the results from the OCI VM test results. Oracle will do these tests.
  • Updating Oracle Terraform files
  • Making the Assisted Installer modifications needed to address the CCM changes and surface the necessary configurations. Support Oracle Cloud in Assisted-Installer CI: MGMT-14039

 

RFEs:

  • RFE-3635 - Supporting Openshift on Oracle Cloud Infrastructure(OCI) & Oracle Private Cloud Appliance (PCA)

OCI Bare Metal Shapes to be supported

Any bare metal Shape to be supported with OCP has to be certified with RHEL.

From the certified Shapes, those that have local disks will be supported. This is due to the current lack of support in RHCOS for the iSCSI boot feature. OCPSTRAT-749 is tracking adding this support and removing this restriction in the future.

As of Aug 2023 this excludes at least all the Standard shapes, BM.GPU2.2 and BM.GPU3.8, from the published list at: https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm#baremetalshapes 

Assumptions

  • Pre-requisite: RHEL certification which includes RHEL and OCI baremetal shapes (instance types) has successfully completed.

 

 

 

 
 

Feature goal (what are we trying to solve here?)

Please describe what this feature is going to do.

DoD (Definition of Done)

Please describe what conditions must be met in order to mark this feature as "done".

Does it need documentation support?

If the answer is "yes", please make sure to check the corresponding option.

Feature origin (who asked for this feature?)

  • A Customer asked for it

    • Name of the customer(s)
    • How many customers asked for it?
    • Can we have a follow-up meeting with the customer(s)?

 

  • A solution architect asked for it

    • Name of the solution architect and contact details
    • How many solution architects asked for it?
    • Can we have a follow-up meeting with the solution architect(s)?

 

  • Internal request

    • Who asked for it?

 

  • Catching up with OpenShift

Reasoning (why it’s important?)

  • Please describe why this feature is important
  • How does this feature help the product?

Competitor analysis reference

  • Do our competitors have this feature?
    • Yes, they have it and we can have some reference
    • No, it's unique or explicit to our product
    • No idea. Need to check

Feature usage (do we have numbers/data?)

  • We have no data - the feature doesn’t exist anywhere
  • Related data - the feature doesn’t exist but we have info about the usage of associated features that can help us
    • Please list all related data usage information
  • We have the numbers and can relate to them
    • Please list all related data usage information

Feature availability (why should/shouldn't it live inside the UI/API?)

  • Please describe the reasoning behind why it should/shouldn't live inside the UI/API
  • If it's for a specific customer we should consider using AMS
  • Does this feature exist in the UI of other installers?

To make iSCSI work, a secondary VNIC must be configured during discovery and when the machine reboots into CoreOS. The configuration is almost the same for discovery and CoreOS.

Currently, we have one script owned by Red Hat for discovery, and a custom manifest owned by Oracle for CoreOS configuration.

I think this configuration should be owned by Oracle because the network configuration depends on the OCI API. Also, we need this script to be the same in order to ensure that the configuration applied on discovery will be the same when the machine reboots into CoreOS. Finally, if a customer has a specific need, they won't be able to tailor the configuration to their needs easily, as they would have to use the REST API of the assisted service.

My suggestion is to ask Oracle to drop the configuration script in their metadata service using Oracle's terraform template. On the Red Hat side, we would have to pull this script onto the node and execute it via a systemd unit. The same would be done from the custom manifest provided by Oracle.

Feature goal (what are we trying to solve here?)

During 4.15, the OCP team is working on allowing booting from iSCSI. Today that's disabled by the assisted installer. The goal is to enable it for OCP versions >= 4.15 when using the OCI external platform.

DoD (Definition of Done)

iSCSI boot is enabled for OCP versions >= 4.15 both in the UI and the backend.

When booting from iSCSI, we need to make sure to add the `rd.iscsi.firmware=1 ip=ibft` kernel arguments during install to enable iSCSI booting.
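For illustration only, these kernel arguments could also be expressed declaratively on an installed cluster with a MachineConfig like the sketch below (the assisted installer applies them at install time through its own API; the object name and role here are hypothetical):

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-iscsi-boot        # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    # Arguments named in the DoD above for booting from iSCSI
    - rd.iscsi.firmware=1
    - ip=ibft
```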

Does it need documentation support?

yes

Feature origin (who asked for this feature?)

  • A Customer asked for it

    • Oracle

Reasoning (why it’s important?)

  • In OCI there are bare metal instances with iSCSI support and we want to allow customers to use it

The secondary VNIC must be configured manually in OCI; a script must be injected into the discovery ISO to configure it.

  PR https://github.com/openshift/assisted-service/pull/6257 must be adapted to be used with the external platform.

Since we ensure that the iSCSI network is not the default route, the PR above will automatically select the subnet used by the default route.

Feature Overview (aka. Goal Summary)  

Support network isolation and multiple primary networks (with the possibility of overlapping IP subnets) without having to use Kubernetes Network Policies.

Goals (aka. expected user outcomes)

  • Provide a configurable way to indicate that a pod should be connected to a unique network of a specific type via its primary interface.
  • Allow networks to have overlapping IP address space.
  • The primary network defined today will remain in place as the default network that pods attach to when no unique network is specified.
  • Support cluster ingress/egress traffic for unique networks, including secondary networks.
  • Support for ingress/egress features where possible, such as:
    • EgressQoS
    • EgressService
    • EgressIP
    • Load Balancer Services

Requirements (aka. Acceptance Criteria):

  • Support for 10,000 namespaces
  •  

Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, ensure you provide the OCPSTRAT (for the configuration to be supported in the future) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Design Document

Use Cases (Optional):

  • As an OpenStack or vSphere/vCenter user who is migrating to OpenShift Kubernetes, I want to guarantee my OpenStack/vSphere tenant network isolation remains intact as I move into Kubernetes namespaces.
  • As an OpenShift Kubernetes user, I do not want to have to rely on Kubernetes Network Policy and prefer to have native network isolation per tenant using a layer 2 domain.
  • As an OpenShift Network Administrator with multiple identical application deployments across my cluster, I require a consistent IP-addressing subnet per deployment type. Multiple applications in different namespaces must always be accessible using the same, predictable IP address.

Questions to Answer (Optional):

  •  

Out of Scope

  • Multiple External Gateway (MEG) Support - support will remain for default primary network.
  • Pod Ingress support - support will remain for default primary network.
  • Cluster IP Service reachability across networks. Services and endpoints will be available only within the unique network.
  • Allowing different service CIDRs to be used in different networks.
  • Localnet will not be supported initially for primary networks.
  • Allowing multiple primary networks per namespace.
  • Allow connection of multiple networks via explicit router configuration. This may be handled in a future enhancement.
  • Hybrid overlay support on unique networks.

Background

OVN-Kubernetes today allows multiple different types of networks per secondary network: layer 2, layer 3, or localnet. Pods can be connected to different networks without discretion. For the primary network, OVN-Kubernetes only supports all pods connecting to the same layer 3 virtual topology.

As users migrate from OpenStack to Kubernetes, there is a need to provide network parity for those users. In OpenStack, each tenant (analog to a Kubernetes namespace) by default has a layer 2 network, which is isolated from any other tenant. Connectivity to other networks must be specified explicitly as network configuration via a Neutron router. In Kubernetes the paradigm is the opposite; by default all pods can reach other pods, and security is provided by implementing Network Policy.

Network Policy has its issues:

  • it can be cumbersome to configure and manage for a large cluster
  • it can be limiting as it only matches TCP, UDP, and SCTP traffic
  • large amounts of network policy can cause performance issues in CNIs

With all these factors considered, there is a clear need to address network security in a native fashion, by using networks per user to isolate traffic instead of using Kubernetes Network Policy.

Therefore, the scope of this effort is to bring the same flexibility of the secondary network to the primary network and allow pods to connect to different types of networks that are independent of networks that other pods may connect to.
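For a rough sense of the intended user experience, a namespace-scoped primary layer 2 network might be declared along these lines (a sketch assuming the OVN-Kubernetes UserDefinedNetwork API; names and the subnet are illustrative):

```yaml
apiVersion: k8s.ovn.org/v1
kind: UserDefinedNetwork
metadata:
  name: tenant-a-net        # hypothetical name
  namespace: tenant-a
spec:
  topology: Layer2
  layer2:
    role: Primary           # becomes the pods' primary network in this namespace
    subnets:
      - 10.100.0.0/16       # may overlap with subnets used by other tenants
```

Pods created in that namespace would then attach to this network as their primary interface instead of the cluster default network, giving OpenStack-style per-tenant isolation without Network Policy.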

Customer Considerations

  •  

Documentation Considerations

  •  

Interoperability Considerations

Test scenarios:

  • E2E upstream and downstream jobs covering supported features across multiple networks.
  • E2E tests ensuring network isolation between OVN networked and host networked pods, services, etc.
  • E2E tests covering network subnet overlap and reachability to external networks.
  • Scale testing to determine limits and impact of multiple unique networks.

Feature Overview (aka. Goal Summary)  

crun has been GA as a non-default option since OCP 4.14. We want to make it the default in 4.18 while still supporting runc as a non-default option.

Benefits of crun are covered here: https://github.com/containers/crun

 

FAQ.:  https://docs.google.com/document/d/1N7tik4HXTKsXS-tMhvnmagvw6TE44iNccQGfbL_-eXw/edit

Note: making crun the default does not mean we will remove support for runc, nor do we have any plans in the foreseeable future to do so.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview

Move to using the upstream Cluster API (CAPI) in place of the current implementation of the Machine API for standalone OpenShift

Prerequisite work: goals completed in OCPSTRAT-1122.
Complete the design of the Cluster API (CAPI) architecture and build the core operator logic needed for Phase 1, incorporating the assets from different repositories to simplify asset management.

Phase 1 & 2 covers implementing base functionality for CAPI.

Background, and strategic fit

  • Initially CAPI did not meet the requirements for cluster/machine management that OCP had; the project has moved on, and CAPI is a better fit now and also has better community involvement.
  • CAPI has much better community interaction than MAPI.
  • Other projects are considering using CAPI and it would be cleaner to have one solution
  • Long term it will allow us to add new features more easily in one place vs. doing this in multiple places.

Acceptance Criteria

There must be no negative effect on customers/users of the MAPI; this API must continue to be accessible to them, though how it is implemented "under the covers", and whether that implementation leverages CAPI, is open

Epic Goal

  • As we prepare to move over to using Cluster API (CAPI) we need to make sure that we have the providers in place to work with this. This Epic is to track the tech preview of the provider for Azure

Why is this important?

  • What are the benefits to the customer, or to us, that make this worth
    doing? Fulfills a critical need for a customer? Improves
    supportability/debuggability? Improves efficiency/performance? This
    section is used to help justify the priority of this item vs other things
    we can do.

Drawbacks

  • Reasons we should consider NOT doing this such as: limited audience for
    the feature, feature will be superseded by other work that is planned,
    resulting feature will introduce substantial administrative complexity or
    user confusion, etc.

Scenarios

  • Detailed user scenarios that describe who will interact with this
    feature, what they will do with it, and why they want/need to do that thing.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement
    details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub
    Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub
    Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

As an OpenShift engineer I want the CAPI Providers repositories to use the new generator tool so that they can independently generate CAPI Provider transport ConfigMaps

Background

Once the new CAPI manifests generator tool is ready, we want to make use of that directly from the CAPI Providers repositories so we can avoid storing the generated configuration centrally and independently apply that based on the running platform.

Steps

  • Install the new CAPI manifest generator as a Go `tool` in all the CAPI provider repositories
  • Setup a make target under the `/openshift/Makefile` to invoke the generator. Make it output the manifests under `/openshift/manifests`
  • Make sure `/openshift/manifests` is mapped to `/manifests` in the openshift/Dockerfile, so that the files are later picked up by CVO
  • Make sure the manifest generation works by triggering a manual generation
  • Check in the newly generated transport ConfigMap + Credential Requests (to let them be applied by CVO)

Stakeholders

  • <Who is interested in this/where did they request this>

Definition of Done

  • CAPI manifest generator tool is installed 
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

Goal

This goals of this features are:

  • As part of a Microsoft guideline/requirement for implementing ARO HCP, we need to design a shared-ingress to kube-apiserver because MSFT has internal restrictions on IPv4 usage.  

Background

Given Microsoft's constraints on IPv4 usage, there is a pressing need to optimize IP allocation and management within Azure-hosted environments.

 

Interoperability Considerations

  • Impact: Which versions will be impacted by the changes?
  • Test Scenarios: Must test across various network and deployment scenarios to ensure compatibility and scale (perf/scale)

There are currently multiple ingress strategies we support for hosted cluster service endpoints (kas, nodePort, router...).
In a context of uncertainty about which use cases would be more critical to support, we initially exposed this in a flexible API that allows choosing potentially any combination of ingress strategies and endpoints.
ARO has internal restrictions on IPv4 usage. Because of this, to simplify the above and to be more cost effective in terms of infra, we want to have a common shared ingress solution for the whole fleet of hosted clusters.

As a management cluster owner I want to make sure the shared ingress is resilient to cluster failures

User Story:

Currently the SharedIngress controller waits for a HostedCluster to exist before creating the Service/LoadBalancer of the shared-ingress.

The controller should create the Service/LoadBalancer even 

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview 

Introduce the CAPI provider for bare metal in OpenShift as an alternative and long term replacement of MAPI for managing nodes and clusters.

Goals

Technology Preview release, introducing the current implementation of the Cluster API Provider Metal3 (CAPM3) into OpenShift.

https://github.com/metal3-io/cluster-api-provider-metal3 

Goal

Our goal is to be able to deploy baremetal clusters using Cluster API in OpenShift.

Upstream

Metal3, our upstream community, already provides a CAPI provider, and our aim is to bring it downstream. 

Other

We will collaborate with the Cluster Infrastructure team on points of integration as needed.

Scope questions

  • Changes in Ironic?
    • No
  • Changes in Metal3?
    • Bringing in downstream-only MAPI changes to upstream CAPI
  • Changes in OpenShift?
    • Create and set up the CAPI repo
    • Bring in any useful changes from MAPI
  • Spec/Design/Enhancements?
    • Not for this
    • But any follow-up work (replacing BM MAPI with CAPI or doing Hypershift) likely will
  • Dependencies on other teams?
    • Maybe ART?

Feature Overview

Firmware (BIOS) updates and attribute configuration from OpenShift are key in O-RAN clusters. While we can do it on day 1, customers need to set firmware attributes on hosts that have already been deployed and are part of a cluster.

This feature adds the capability of updating firmware attributes and updating the firmware image for hosts in deployed clusters.

As part of demoing our integration with hardware vendors, we need to show the ability to reconfigure already provisioned hosts: modify their BIOS settings and, in the future, do firmware upgrades. The initial demo will be concentrated on BIOS settings. The demo is expected to be based on 4.15 and to use unmerged patches since 4.15 is closed for feature development. The path to productization will be determined as an outcome of the demo.

The assumed end result is an ability to run firmware upgrades and update BIOS settings for hosts that are already provisioned without fully deprovisioning them. The hosts will still be rebooted, so some external orchestrator (a human or ZTP) will need to drain the nodes first.

Feature Overview (aka. Goal Summary)  

  • With this next-gen OLM GA release (graduated from ‘Tech Preview’), customers can: 
    • discover collections of k8s extension/operator contents released in the FBC format with richer visibility into their release channels, versions, update graphs, and the deprecation information (if any) to make informed decisions about installation and/or update them.
    • install a k8s extension/operator declaratively and potentially automate with GitOps to ensure predictable and reliable deployments.
    • update a k8s extension/operator to a desired target version or keep it updated within a specific version range for security fixes without breaking changes.
    • remove a k8s extension/operator declaratively and entirely including cleaning up its CRDs and other relevant on-cluster resources (with a way to opt out of this coming up in a later release).
  • To address the security needs of 30% of our customers who run clusters in disconnected environments, the GA release will include cluster extension lifecycle management functionality for offline environments.
  • [Tech Preview] (Cluster)Extension lifecycle management can handle runtime signature validation for container images to support OpenShift’s integration with the rising Sigstore project for secure validation of cloud-native artifacts.

Goals (aka. expected user outcomes)

1. Pre-installation:

  • Customers can access a collection of k8s extension contents from a set of default catalogs leveraging the existing catalog images shipped with OpenShift (in the FBC format) with the new Catalog API from the OLM v1 GA release.
  • With the new GAed Catalog API, customers get richer package content visibility in their release channels, versions, update graphs, and the deprecation information (if any) to help make informed decisions about installation and/or update.
  • With the new GAed Catalog API, customers can render the catalog content in their clusters with fewer resources in terms of CPU and memory usage and faster performance.
  • Customers can filter the available packages based on the package name and see the relevant information from the metadata shipped within the package. 

2. Installation:

  • Customers using a ServiceAccount with sufficient permissions can install a k8s extension/operator with a desired target version or the latest version within a specific version range (from the associated channel) to get the latest security fixes (see the sketch after this list).
  • Customers can easily automate the installation flow declaratively with GitOps to ensure predictable and reliable deployments.
  • Customers get protection from having two conflicting k8s extensions/operators owning the same API objects, i.e., no conflicting ownership, ensuring cluster stability.
  • Customers can access the metadata of the installed k8s extension/operator to see essential information such as its provided APIs, example YAMLs of its provided APIs, descriptions, infrastructure features, valid subscriptions, etc.
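To make the install flow concrete, a minimal declarative sketch using OLM v1 (assuming the ClusterExtension API; the names, package, and version range are illustrative, and the exact field layout may differ between releases):

```yaml
apiVersion: olm.operatorframework.io/v1
kind: ClusterExtension
metadata:
  name: example-extension            # hypothetical name
spec:
  namespace: example-extension-ns    # namespace the extension is installed into
  serviceAccount:
    name: example-installer          # ServiceAccount with sufficient permissions
  source:
    sourceType: Catalog
    catalog:
      packageName: example-operator  # package discovered via the Catalog API
      channels:
        - stable
      version: ">=1.2.0 <2.0.0"      # keep updated within a range for security fixes
```

Because this is a single declarative object, it can be checked into a GitOps repository to drive predictable, repeatable installs and updates.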

3. Update:

  • Customers can see what updates are available for their k8s extension/operators in the form of immediate target versions and the associated update channels.
  • Customers can trigger the update of a k8s extension/operator with a desired target version or the latest version within a specific version range (from the associated channel) to get the latest security fixes.
  • Customers get protection from workload or k8s extension/operator breakage due to CustomResourceDefinition (CRD) being upgraded to a backward incompatible version during an update.
  • During OpenShift cluster update, customers get informed when installed k8s extensions/operators do not support the next OpenShift version (when annotated by the package author/provider). Customers must update those k8s extensions/operators to a newer/compatible version before OLM unblocks the OpenShift cluster update.

4. Uninstallation/Deletion:

  • Customers can cleanly remove an installed k8s extension/operator including deleting CustomResourceDefinitions (CRDs), custom resource objects (CRs) of the CRDs, and other relevant resources to revert the cluster to its original state before the installation declaratively.

5. Disconnected Environments for High-Security Workloads:

  • Approximately 30% of our customers prioritize high security by running their clusters in internet-disconnected environments, especially for mission-critical production workloads. To benefit these users, our supported GA release needs to include cluster extension lifecycle management functionality that functions within these disconnected environments.

6. [Tech Preview] Signature Validation for Secure Workflows:

  • The Red Hat-sponsored Sigstore project is gaining traction in the Kubernetes community, aiming to simplify the signing of cloud-native artifacts. OpenShift leverages Sigstore tooling to enable scalable and flexible signature validation, including support for disconnected environments. This functionality will be available as a Tech Preview in 4.17 and is targeted for General Availability (GA) Tech Preview Phase 2 in the upcoming 4.18 release. To fully support this integration as a Tech Preview release, the (cluster)extension lifecycle management needs to (be prepared to) handle runtime validation of Sigstore signatures for container images.

Requirements (aka. Acceptance Criteria):

All the expected user outcomes and the acceptance criteria in the engineering epics are covered.

Background

OLM: Gateway to the OpenShift Ecosystem

Operator Lifecycle Manager (OLM) has been a game-changer for OpenShift Container Platform (OCP) 4.  Since its launch in 2019, OLM has fostered a rich ecosystem, expanding from a curated set of 25 operators to over 100 officially supported Red Hat operators and hundreds more from certified ISVs and the community.

OLM empowers users to manage diverse technologies with ease, including ACM, ACS, Quay, GitOps, Pipelines, Service Mesh, Serverless, and Virtualization.  It has also facilitated the introduction of groundbreaking operators for entirely new workloads, like Nvidia GPU, PTP, Windows Machine Config, SR-IOV networking, and more.  Today, a staggering 91% of our connected customers leverage OLM's capabilities.

OLM v0: A Stepping Stone

While OLM v0 has been instrumental, it has limitations.  The API design, not fully GitOps-friendly or entirely declarative, presents a steeper learning curve due to its complexity.  Furthermore, OLM v0 was designed with the assumption of namespace-scoped CRDs (Custom Resource Definitions), allowing for independent operator installations and parallel versions within a single cluster.  However, this functionality never materialized in core Kubernetes, and OLM v0's attempt to simulate it has introduced limitations and bugs.

The Operator Framework Team: Building the Future

The Operator Framework team is the cornerstone of the OpenShift ecosystem.  They build and manage OLM, the Operator SDK, operator catalog formats, and tooling (opm, file-based catalogs).  Their work directly impacts how operators are developed, packaged, delivered, and managed by users and SRE teams on OpenShift clusters.

A Streamlined Future with OLM v1

The Operator Framework team has undergone significant restructuring to focus on the next generation of OLM – OLM v1.  This transition includes moving the Operator SDK to a feature-complete state with ongoing maintenance for compatibility with the latest Kubernetes and controller-runtime libraries.  This strategic shift allows the team to dedicate resources to completely revamping OLM's API and management concepts for catalog content delivery.  

Leveraging learnings and customer feedback since OCP 4's inception, OLM v1 is designed to be a major overhaul, and it will be shipped as a Generally Available (GA) feature in OpenShift 4.17.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.


Documentation Considerations

1. Pre-installation:

  • [GA release] Docs provide instructions on how to add Red Hat-provided Operator catalogs with the pull secret for catalogs hosted on a secure registry.
  • [GA release] Docs provide instructions on how to discover the Operator packages from a catalog.
  • [GA release] Docs provide instructions on how to query and inspect the metadata of Operator bundles and find feasible ones to be installed with the OLM v1.

2. Installation:

  • [GA release] Docs provide instructions on how to use a ServiceAccount with sufficient permissions to install a k8s extension/operator with a desired target version or the latest version within a specific version range to get the latest security fixes.
  • [GA release] Docs provide instructions on how to automate the installation flow declaratively with GitOps to ensure predictable and reliable deployments.
  • [GA release] Docs mention the OLM v1’s protection from having two conflicting k8s extensions/operators owning the same API objects, i.e., no conflicting ownership, ensuring cluster stability.
  • [GA release] Docs provide instructions on how to access the metadata of the installed k8s extension/operator to see essential information such as its provided APIs, example YAMLs of its provided APIs, descriptions, infrastructure features, valid subscriptions, etc.
  • [GA release] Docs explain how to create RBACs from a CRD to grant cluster users access to the installed k8s extension/operator's provided APIs.

3. Update:

  • [GA release] Docs provide instructions on how to see what updates are available for their k8s extension/operators in the form of immediate target versions and the associated update channels.
  • [GA release] Docs provide instructions on how to trigger the update of a k8s extension/operator with a desired target version or the latest version within a specific version range to get the latest security fixes.
  • [GA release] Docs mention OLM v1’s protection from workload or k8s extension/operator breakage due to CustomResourceDefinition (CRD) being upgraded to a backward incompatible version during an update.
  • [GA release] Docs mention OLM v1 will block the OpenShift cluster update if installed k8s extensions/operators do not support the next OpenShift version (when annotated by the package author/provider).  Provide instructions on how to find and update to a newer/compatible version before OLM unblocks the OpenShift cluster update.

4. Uninstallation/Deletion:

  • [GA release] Docs provide instructions on how to cleanly remove an installed k8s extension/operator including deleting CustomResourceDefinitions (CRDs), custom resource objects (CRs) of the CRDs, and other relevant resources.
  • [GA release] Docs provide instructions to verify the cluster has been reverted to its original state after uninstalling a k8s extension/operator

Relevant upstream CNCF OLM v1 requirements, engineering brief, and epics:

1. Pre-installation:

2. Installation:

3. Update:

4. Uninstallation/Deletion:

Relevant documents:

 

NOTE: All features will be Tech Preview in the first release and then graduate to GA in the next release or when they are ready for GA.

Epic Goal

  • OLM V1 supports disconnected Environments for High-Security Workloads

Why is this important?

  • Approximately 30% of our customers prioritize high security by running their clusters in internet-disconnected environments, especially for mission-critical production workloads. To benefit these users, our supported GA release needs to include cluster extension lifecycle management functionality that functions within these disconnected environments.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

https://docs.google.com/document/d/18m-OG0PN8-jjjgGT33WNujzmj_1B2Tqoqd-bVKX4CkE/edit?usp=sharing 

  • Many operators write the MaxOCPVersion field in their bundle metadata. OLM v1 needs to support the same MaxOCPVersion workflow, where OLM blocks a cluster upgrade when that version is set (see the illustration after this list).
  • Outside the scope of this epic, but in a future iteration, we should also respect MinKubeVersion (and potentially support MaxKubeVersion?)
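
For reference, a sketch of how this is declared today under the OLM v0 convention, via the olm.properties annotation on the operator's CSV. The assumption here is that OLM v1 would honor the same property; names in angle brackets are placeholders:

# Inspect the olm.properties annotation on an installed operator's CSV:
oc get csv <csv-name> -n <namespace> -o jsonpath='{.metadata.annotations.olm\.properties}'
# Example output for an operator that caps the supported OpenShift version:
# [{"type": "olm.maxOpenShiftVersion", "value": "4.17"}]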

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
  1. cluster-olm-operator watches clusterextensions
  2. cluster-olm-operator queries downstream-only helm chart metadata in the release secrets of each installed operator
  3. cluster-olm-operator sets Upgradeable=False with the appropriate reason and message when MaxOCPVersion matches the current cluster version (see the verification sketch below)
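
A minimal verification sketch for item 3; the ClusterOperator name "olm" is an assumption about how cluster-olm-operator reports its status:

# Check whether OLM v1 is currently blocking cluster upgrades, and why:
oc get clusteroperator olm -o jsonpath='{.status.conditions[?(@.type=="Upgradeable")].status}{"\n"}{.status.conditions[?(@.type=="Upgradeable")].message}{"\n"}'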

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Cluster Infrastructure owned components should be running on Kubernetes 1.29
  • This includes
    • The cluster autoscaler (+operator)
    • Machine API operator
      • Machine API controllers for:
        • AWS
        • Azure
        • GCP
        • vSphere
        • OpenStack
        • IBM
        • Nutanix
    • Cloud Controller Manager Operator
      • Cloud controller managers for:
        • AWS
        • Azure
        • GCP
        • vSphere
        • OpenStack
        • IBM
        • Nutanix
    • Cluster Machine Approver
    • Cluster API Actuator Package
    • Control Plane Machine Set Operator

Why is this important?

  • ...

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ...

Open questions::

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

To align with the 4.18 release, dependencies in each of the affected repositories need to be updated to Kubernetes 1.31. This should be done by rebasing or updating as appropriate for the repository.


Epic Goal

  • The goal of this epic is to upgrade all OpenShift and Kubernetes components that MCO uses to v1.29 which will keep it on par with rest of the OpenShift components and the underlying cluster version.

Why is this important?

  • Uncover any possible issues with the openshift/kubernetes rebase before it merges.
  • MCO continues using the latest kubernetes/OpenShift libraries and the kubelet, kube-proxy components.
  • MCO e2e CI jobs pass on each of the supported platform with the updated components.

Acceptance Criteria

  • All stories in this epic must be completed.
  • Go version is upgraded for MCO components.
  • CI is running successfully with the upgraded components against the 4.18/master branch.

Dependencies (internal and external)

  1. ART team creating the updated Go builder image needed for this upgrade.
  2. OpenShift/kubernetes repository downstream rebase PR merge.

Open questions::

  1. Do we need a checklist for future upgrades as an outcome of this epic? -> Yes, updated below.

Done Checklist

  • Step 1 - Upgrade go version to match rest of the OpenShift and Kubernetes upgraded components.
  • Step 2 - Upgrade Kubernetes client and controller-runtime dependencies (can be done in parallel with step 3)
  • Step 3 - Upgrade OpenShift client and API dependencies
  • Step 4 - Update kubelet and kube-proxy submodules in MCO repository
  • Step 5 - CI is running successfully with the upgraded components and libraries against the master branch.

Please review the following PR: https://github.com/openshift/machine-config-operator/pull/4561

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Epic Goal*

Drive the technical part of the Kubernetes 1.31 upgrade, including rebasing the openshift/kubernetes repository and coordinating across the OpenShift organization to get e2e tests green for the OCP release.

 
Why is this important? (mandatory)

OpenShift 4.18 cannot be released without Kubernetes 1.31

 
Scenarios (mandatory) 

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

PRs:

Retro: Kube 1.31 Rebase Retrospective Timeline (OCP 4.18)

Retro recording: https://drive.google.com/file/d/1htU-AglTJjd-VgFfwE3z_dH5tKXT1Tes/view?usp=drive_web

Description of problem:

Given 2 images with different names but the same layers, "oc image mirror" will only mirror one of them. For example:

$ cat images.txt
quay.io/openshift/community-e2e-images:e2e-33-registry-k8s-io-e2e-test-images-resource-consumer-1-13-LT0C2W4wMzShSeGS quay.io/bertinatto/test-images:e2e-33-registry-k8s-io-e2e-test-images-resource-consumer-1-13-LT0C2W4wMzShSeGS
quay.io/openshift/community-e2e-images:e2e-31-registry-k8s-io-e2e-test-images-resource-consumer-1-13-LT0C2W4wMzShSeGS quay.io/bertinatto/test-images:e2e-31-registry-k8s-io-e2e-test-images-resource-consumer-1-13-LT0C2W4wMzShSeGS

$ oc image mirror -f images.txt
quay.io/
  bertinatto/test-images
    manifests:
      sha256:298dcd808e27fbf96614e4c6f06730f22964dce41dcdc7bf21096c42411ba773 -> e2e-33-registry-k8s-io-e2e-test-images-resource-consumer-1-13-LT0C2W4wMzShSeGS
  stats: shared=0 unique=0 size=0B

phase 0:
  quay.io bertinatto/test-images blobs=0 mounts=0 manifests=1 shared=0

info: Planning completed in 2.6s
sha256:298dcd808e27fbf96614e4c6f06730f22964dce41dcdc7bf21096c42411ba773 quay.io/bertinatto/test-images:e2e-33-registry-k8s-io-e2e-test-images-resource-consumer-1-13-LT0C2W4wMzShSeGS
info: Mirroring completed in 240ms (0B/s)    

Version-Release number of selected component (if applicable):

4.18    

How reproducible:

Always    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

Only one of the images was mirrored.

Expected results:

Both images should be mirrored.     

Additional info:

    

TechPreview clusters are unable to bootstrap because kube-apiserver fails to start with the following error:

E0827 20:29:22.653501 1 run.go:72] "command failed" err="group version resource.k8s.io/v1alpha2 that has not been registered"

This happens because, in Kubernetes 1.31, the group version resource.k8s.io/v1alpha2 was removed and replaced with resource.k8s.io/v1alpha3. This is part of the DynamicResourceAllocation feature, which is currently TechPreview.

After discussing this with the team, we decided that the best approach is to modify the cluster-kube-apiserver-operator to start the kube-apiserver with the correct group version based on the Kubernetes version being used.
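
As a rough verification sketch: the served group version can be checked directly on a running TechPreview cluster, and the operator-side fix amounts to rendering the matching --runtime-config value for the embedded Kubernetes level:

# Verify which resource.k8s.io version the kube-apiserver is serving:
oc api-versions | grep '^resource.k8s.io'
# Expected on Kubernetes 1.30-based payloads: resource.k8s.io/v1alpha2
# Expected on Kubernetes 1.31-based payloads: resource.k8s.io/v1alpha3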

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Goal:
Update team owned repositories to Kubernetes v1.31

?? is the 1.31 freeze
?? is the 1.31 GA

Problem: <please update links for 1.31>
The following repository must be rebased onto the latest version of Kubernetes:

  1.  oc: https://github.com/openshift/oc/pull/1877

The following repositories should be rebased onto the latest version of Kubernetes:

  1. cluster-kube-controller-manager operator: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/816
  2. cluster-policy-controller: https://github.com/openshift/cluster-policy-controller/pull/156 
  3. cluster-kube-scheduler operator: https://github.com/openshift/cluster-kube-scheduler-operator/pull/547
  4. secondary-scheduler-operator: https://github.com/openshift/secondary-scheduler-operator/pull/225
  5. cluster-capacity: https://github.com/openshift/cluster-capacity/pull/97
  6.  run-once-duration-override-operator: https://github.com/openshift/run-once-duration-override-operator/pull/68
  7.  run-once-duration-override: https://github.com/openshift/run-once-duration-override/pull/36
  8.  cluster-openshift-controller-manager-operator: https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/368 
  9.  openshift-controller-manager: https://github.com/openshift/openshift-controller-manager/pull/345 
  10.  cli-manager-operator: https://github.com/openshift/cli-manager-operator/pull/358
  11.  cli-manager: https://github.com/openshift/cli-manager/pull/144
  12. cluster-kube-descheduler-operator: https://github.com/openshift/cluster-kube-descheduler-operator/pull/384
  13. descheduler:

Entirely remove dependencies on k/k repository inside oc.

Why is this important:

  • Customers demand we provide the latest stable version of Kubernetes. 
  • The rebase and upstream participation represents a significant portion of the Workloads team's activity.

 
 
 
 

 

Template:

Networking Definition of Planned

Epic Template descriptions and documentation

Epic Goal

Why is this important?

Planning Done Checklist

The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

  • Priority is set by engineering
  • Epic must be linked to a Parent Feature
  • Target version must be set
  • Assignee must be set
  • Enhancement Proposal is Implementable
  • No outstanding questions about major work breakdown
  • Are all Stakeholders known? Have they all been notified about this item?
  • Does this epic affect SD? Have they been notified? (View plan definition for current suggested assignee)
    1. Please use the “Discussion Needed: Service Delivery Architecture Overview” checkbox to facilitate the conversation with SD Architects. The SD architecture team monitors this checkbox, which should then spur the conversation between SD and epic stakeholders. Once the conversation has occurred, uncheck the “Discussion Needed: Service Delivery Architecture Overview” checkbox and record the outcome of the discussion in the epic description here.
    2. The guidance here is that unless it is very clear that your epic doesn’t have any managed services impact, default to using the Discussion Needed checkbox to facilitate that conversation.

Additional information on each of the above items can be found here: Networking Definition of Planned

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement
    details and documents.

...

Dependencies (internal and external)

1.

...

Previous Work (Optional):

1. …

Open questions::

1. …

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

As a customer of self-managed OpenShift, or an SRE managing a fleet of OpenShift clusters, I should be able to determine the progress and state of an OCP upgrade and only be alerted if the cluster is unable to progress. Support a CLI status command and a status API which can be used by cluster admins to monitor the progress. The status command/API should also contain data to alert users about potential issues which can make updates problematic.

Feature Overview (aka. Goal Summary)  

Here are common update improvements from customer interactions on Update experience

  1. Show nodes where pod draining is taking more time.
    Customers often have to dig deeper to find the affected nodes for further debugging.
    The ask has been to bubble this up in the update progress window.
  2. oc update status?
    From the UI we can see the progress of the update. From the oc CLI we can see this with "oc get clusterversion",
     but the ask is to show more details in a human-readable format.

    Know where the update has stopped. Consider adding at what run level it has stopped.
    oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    
    version   4.12.0    True        True          16s     Working towards 4.12.4: 9 of 829 done (1% complete)
    

Documentation Considerations

Update docs for UX and CLI changes

Reference : https://docs.google.com/presentation/d/1cwxN30uno_9O8RayvcIAe8Owlds-5Wjr970sDxn7hPg/edit#slide=id.g2a2b8de8edb_0_22

Epic Goal*

Add a new `oc adm upgrade status` command which is backed by an API.  Please find the mock output of the command attached in this card.
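
A minimal usage sketch; the environment-variable gate below reflects how the command has been exposed while experimental and is an assumption about the final shape:

# Enable the experimental command (gate is an assumption) and print a human-readable update status:
OC_ENABLE_CMD_UPGRADE_STATUS=true oc adm upgrade status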

Why is this important? (mandatory)

  • From the UI we can see the progress of the update. Using the oc CLI we can see some of the information with "oc get clusterversion", but the output is not easy to read and there is a lot of extra information to process.
  • Customers are asking us to show more details in a human-readable format, as well as to provide an API which they can use for automation.

Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Other 

As an OTA engineer,
I would like to make sure the node in a single-node cluster is handled correctly in the upgrade-status command.

Context:
According to the discussion with the MCO team,
the node is in the master MCP but not the worker MCP.
This card is to make sure that the node is displayed that way too. My feeling is that the current code probably does the job already. In that case, we should add test coverage for the case to avoid regression in the future.

AC:

We utilize MCO annotations to determine whether a node is degraded or unavailable, and we solely source the Reason annotation to put into the insight. Many common cases are not covered by this, especially the unavailable ones: nodes can be cordoned, have a condition like DiskPressure, be in the process of termination, etc. It is not yet clear whether our code or something like the MCO should provide this, but it is captured as a card for now.
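
A rough sketch of the additional signals that could be sourced, using a hypothetical node name worker-0:

# Cordoned (unschedulable) state and the DiskPressure condition for a node:
oc get node worker-0 -o jsonpath='{.spec.unschedulable}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}'
# A node that is in the process of termination carries a deletion timestamp:
oc get node worker-0 -o jsonpath='{.metadata.deletionTimestamp}{"\n"}'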

Description

During an upgrade, once the control plane is successfully updated, status items related to that part of the upgrade cease to be relevant, and therefore we can either hide them entirely or show a simplified version of them. The relevant sections are Control plane and Control plane nodes.

Current state:

An update is in progress for 28m42s: Working towards 4.14.1: 700 of 859 done (81% complete), waiting on network

= Control Plane =
...
Completion:      91%

Improvement opportunities

1. Inconsistent info: the CVO message says "700 of 859 done (81% complete)" but the control plane section says "Completion: 91%".
2. Unclear measure of completion: the CVO message counts manifests applied, while the control plane section counts upgraded cluster operators (COs). Neither message states what it counts. Manifest count is an internal implementation detail which users likely do not understand. COs are less so, but we should be clearer about what the completion percentage means.
3. We could take advantage of this line to communicate progress in more detail.

Definition of Done

We'll only remove CVO message once the rest of the output functionally covers it, so the inconsistency stays until OTA-1154. Otherwise:

= Control Plane =
...
Completion:      91% (30 operators upgraded, 1 upgrading, 2 waiting)

Upgraded operators are COs that have updated their version, regardless of their conditions.
Upgrading operators are COs that have not updated their version and are Progressing=True.
Waiting operators are COs that have not updated their version and are Progressing=False.
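
A rough illustration of the data this classification is derived from (not the command's actual implementation):

# Per-ClusterOperator version and Progressing state, which together yield the
# upgraded / upgrading / waiting buckets described above:
oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.versions[?(@.name=="operator")].version}{"\t"}{.status.conditions[?(@.type=="Progressing")].status}{"\n"}{end}'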

Feature Overview

In the initial delivery of CoreOS Layering, it is required that administrators provide their own build environment to customize RHCOS images. That could be a traditional RHEL environment or potentially an enterprising administrator with some knowledge of OCP Builds could set theirs up on-cluster.

The primary virtue of an on-cluster build path is to continue using the cluster to manage the cluster. No external dependency, batteries-included.

On-cluster, automated RHCOS Layering builds are important for multiple reasons:

  • One-click/one-command upgrades of OCP are very popular. Many customers may want to make one or just a few customizations but also want to keep that simplified upgrade experience. 
  • Customers who only need to customize RHCOS temporarily (hotfix, driver test package, etc) will find off-cluster builds to be too much friction for one driver.
  • One of OCP's virtues is that the platform and OS are developed, tested, and versioned together. Off-cluster building breaks that connection and leaves it up to the user to keep the OS up-to-date with the platform containers. We must make it easy for customers to add what they need and keep the OS image matched to the platform containers.

Goals & Requirements

  • The goal of this feature is primarily to bring the 4.14 progress (OCPSTRAT-35) to a Tech Preview or GA level of support.
  • Customers should be able to specify a Containerfile with their customizations and "forget it" as long as the automated builds succeed. If they fail, the admin should be alerted and pointed to the logs from the failed build.
    • The admin should then be able to correct the build and resume the upgrade.
  • Intersect with the Custom Boot Images such that a required custom software component can be present on every boot of every node throughout the installation process including the bootstrap node sequence (example: out-of-box storage driver needed for root disk).
  • Users can return a pool to an unmodified image easily.
  • RHEL entitlements should be wired in or at least simple to set up (once).
  • Parity with current features – including the current drain/reboot suppression list, CoreOS Extensions, and config drift monitoring.

This work describes the tech preview state of On Cluster Builds. Major interfaces should be agreed upon at the end of this state.

 

As a cluster admin of user provided infrastructure,
when I apply the machine config that opts a pool into On Cluster Layering,
I want to also be able to remove that config and have the pool revert back to its non-layered state with the previously applied config.
 
As a cluster admin using on cluster layering,
when an image build has failed,
I want it to retry 3 times automatically without my intervention and show me where to find the log of the failure.

 
As a cluster admin,
when I enable On Cluster Layering,
I want to know that the builder image I am building with is stable and will not change unless I change it
so that I keep the same API promises as we do elsewhere in the platform.

 

To test:

As a cluster admin using on cluster layering,
when I try to upgrade my cluster and the Cluster Version Operator is not available,
I want the upgrade operation to be blocked.

As a cluster admin,
when I use a disconnected environment,
I want to still be able to use On Cluster Layering.

As a cluster admin using On Cluster layering,
When there has been config drift of any sort that degrades a node and I have resolved the issue,
I want it to resync without forcing a reboot.

As a cluster admin using on cluster layering,
when a pool is using on cluster layering and references an internal registry
I want that registry available on the host network so that the pool can successfully scale up 
(MCO-770, MCO-578, MCO-574 )

As a cluster admin using on cluster layering,
when a pool is using on cluster layering and I want to scale up nodes,
the nodes should have the same config as the other nodes in the pool.

Maybe:

Entitlements: MCO-1097, MCO-1099

Not Likely:

As a cluster admin using on cluster layering,
when I try to upgrade my cluster,
I want the upgrade operation to succeed at the same rate as non-OCL upgrades do.

Currently, it is not possible for cluster admins to revert a pool that is opted into on-cluster builds and layered MachineConfig updates back to its previous state. See https://issues.redhat.com/browse/OCPBUGS-16201 for details around what happens.

It is worth mentioning that this is mostly an issue for UPI (user-provisioned infrastructure) / bare metal users of OpenShift. For IPI cases in AWS / GCP / Azure / etc., one can simply delete the node and the machine, which will cause the Machine API to provision a fresh node to replace it, e.g.:

 

#!/bin/bash
# Delete a node and its backing Machine so that the Machine API provisions a replacement.
# Usage: pass the node name, with or without the "node/" prefix.

node_name="$1"
node_name="${node_name/node\//}"

# Look up the Machine backing this node from the node's machine annotation,
# which has the form "openshift-machine-api/<machine-name>".
machine_id="$(oc get "node/$node_name" -o jsonpath='{.metadata.annotations.machine\.openshift\.io/machine}')"
machine_id="${machine_id/openshift-machine-api\//}"

# Delete the Machine first, then the Node; --wait=false returns immediately.
oc delete --wait=false "machine/$machine_id" -n openshift-machine-api
oc delete --wait=false "node/$node_name"

 

Done When

  • The MCD can revert a node from on-cluster builds / layered MachineConfigs back to the legacy behavior.
  • Or we've determined that the above is either infeasible or undesirable.

As an OpenShift cluster admin, I would like to try out on-cluster layering (OCL) to better understand how it works, how to set it up, and how to use it. To that end, a quick-start guide for what I need to do to get started as well as a troubleshooting guide would be indispensable.

 

Done When:

Feature Overview (aka. Goal Summary)  

ETCD backup API was delivered behind a feature gate in 4.14. This feature is to complete the work for allowing any OCP customer to benefit from the automatic etcd backup capability.

Goals (aka. expected user outcomes)

The ability for OCP users to benefit from the features

Requirements (aka. Acceptance Criteria):

Complete work to auto-provision internal PVCs when using the local PVC backup option (right now, the user needs to create a PVC before enabling the service).

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations (list applicable specific needs; N/A = not applicable):
Self-managed, managed, or both: both
Classic (standalone cluster): yes
Hosted control planes: no
Multi node, Compact (three node), or Single node (SNO), or all: all
Connected / Restricted Network: both
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x): all
Operator compatibility: N/A
Backport needed (list applicable versions): N/A
UI need (e.g. OpenShift Console, dynamic plugin, OCM): N/A
Other (please specify): N/A

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

Epic Goal*

Provide automated backups of etcd saved locally on the cluster on Day 1 with no additional config from the user.

 
Why is this important? (mandatory)

The current etcd automated backups feature requires some configuration on the user's part to save backups to a user specified PersistentVolume.
See: https://github.com/openshift/api/blob/ba11c1587003dc84cb014fd8db3fa597a3faaa63/config/v1alpha1/types_backup.go#L46

Before the feature can be shipped as GA, we would require the capability to save backups automatically by default without any configuration. This would help all customers have an improved disaster recovery experience by always having a somewhat recent backup. 

 
Scenarios (mandatory) 

  • After a cluster is installed the etcd-operator should take etcd backups and save them to local storage.
  • The backups must be pruned according to a "reasonable" default retention policy so it doesn't exhaust local storage.
  • A warning alert must be generated upon failure to take backups.

Implementation details:
One issue we need to figure out during the design of this feature is how the current API might change as it is inherently tied to the configuration of the PVC name.
See:
https://github.com/openshift/api/blob/ba11c1587003dc84cb014fd8db3fa597a3faaa63/config/v1alpha1/types_backup.go#L99
and 
https://github.com/openshift/api/blob/ba11c1587003dc84cb014fd8db3fa597a3faaa63/operator/v1alpha1/types_etcdbackup.go#L44

Additionally we would need to figure out how the etcd-operator knows about the available space on local storage of the host so it can prune and spread backups accordingly.
 

Dependencies (internal and external) (mandatory)

Depends on changes to the etcd-operator and the tech preview APIs 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - etcd team
  • Documentation - etcd docs team
  • QE - Sandeep Kundu
  • PX - 
  • Others -

Acceptance Criteria (optional)

Upon installing a tech-preview cluster, backups must be saved locally and their status and path must be visible to the user, e.g. on the operator.openshift.io/v1 Etcd cluster object.
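
For example, a first-pass check could simply inspect that object (the exact status fields are still to be designed):

# Inspect backup-related status surfaced on the cluster-scoped Etcd operator object:
oc get etcd cluster -o yaml | grep -iA3 backup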

An e2e test to verify that the backups are being saved locally with some default retention policy.

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview (aka. Goal Summary)

The objective is to create a comprehensive backup and restore mechanism for HCP OpenShift Virtualization Provider. This feature ensures both the HCP state and the worker node state are backed up and can be restored efficiently, addressing the unique requirements of KubeVirt environments.

Goals (aka. Expected User Outcomes)

  • Users will be able to backup and restore the KubeVirt HCP cluster, including both HCP state and worker node state.
  • Ensures continuity and reliability of operations after a restore, minimizing downtime and data loss.
  • Supports seamless re-connection of HCP to worker nodes post-restore.

Requirements (aka. Acceptance Criteria)

  • Backup of KubeVirt CSI infra PVCs
  • Backup of KubeVirt VMs + VM state + (possibly even network attachment definitions)
  • Backup of Cloud Provider KubeVirt Infra Load Balancer services (having IP addresses change here on the service could be problematic)
  • Backup of Any custom network policies associated with VM pods
  • Backup of VMs and state placed on External Infra

Use Cases (Optional)

  1. Disaster Recovery: In case of a disaster, the system can restore the HCP and worker nodes to the previous state, ensuring minimal disruption.
  2. Cluster Migration: Allows migration of hosted clusters across different management clusters.
  3. System Upgrades: Facilitates safe upgrades by providing a reliable restore point.

Out of Scope

  • Real-time synchronization of backup data.
  • Non-disruptive Backup and restore (ideal but not required)

Documentation Considerations

Interoperability Considerations

  • Impact on other projects like ACM/MCE vol-sync.
  • Test scenarios to validate interoperability with existing backup solutions.

The HCP team has delivered OADP backup and restore steps for the Agent and AWS provider here. We need to add the steps necessary to make these steps work for HCP KubeVirt clusters.

Requirements

  • Deliver backup/restore steps that reach feature parity with the documented agent and aws platforms
  • Ensure that kubevirt-csi and cloud-provider-kubevirt LBs can be backed up and restored successfully
  • Ensure this works with external infra

 

Non Requirements

  • VMs do not need to be backed up to reach feature parity because the current aws/agent steps require the cluster to scale down to zero before backing up.

Document this process in the upstream HyperShift documentation.

  • Backup while HCP is live with active worker nodes (don't back up workers, but backup should not disrupt workers)
  • Greenfield restore (meaning previous HCP is removed), HCP nodes are re-created during the restore
  • Restore is limited to same mgmt cluster the HCP originated on

Feature Overview

  • oc-mirror by default leverages OCI 1.1 referrers or its fallback (tag-based discovery) to discover related image signatures for any image that it mirrors
  • this feature is enabled by default and can be disabled globally
  • Optionally, oc-mirror can be configured to include other referring artifacts, e.g. SBOMs or in-toto attestations referenced by their OCI artifact media type

Goals

  • As part of OCPSTRAT-918 and OCPSTRAT-1245 we are introducing broad coverage in the OpenShift platform for signatures produced with the Sigstore tooling, which allows for scalable and flexible validation of the signatures, including in offline environments
  • In order to enable offline verification, oc-mirror needs to detect whether any image that is in scope for its mirroring operation has one or more related Sigstore signatures referring to it, by using the OCI 1.1 referrers API or its fallback (cosign's tag naming convention for signatures), and mirror those artifacts as well (see the illustration below)
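
For illustration, cosign's tag-based fallback stores a signature in the same repository under a tag derived from the signed image's digest, so mirroring the signature amounts to copying one extra tag (registries and the digest below are placeholders):

# For an image registry.example.com/foo/bar@sha256:<digest>, the cosign-style signature
# is stored under the tag sha256-<digest>.sig in the same repository; mirroring it could
# look like:
oc image mirror \
  registry.example.com/foo/bar:sha256-<digest>.sig \
  mirror.internal.example.com/foo/bar:sha256-<digest>.sig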

Requirements

  • Sigstore-style signatures should be mirrored by default, but an opt-out has to be available
  • The public key from Red Hat and the public Rekor key from Red Hat used to sign product images need to be available offline
  • Sigstore-style attachments should optionally be able to be discovered and mirrored as well, as an opt-in; the user should be able to supply a list of OCI media types they are interested in (e.g. text/spdx or application/vnd.cyclonedx for SBOMs)
Requirement / Notes / isMvp?
  • Requirement: CI - MUST be running successfully with test automation. Notes: This is a requirement for ALL features. isMvp?: YES
  • Requirement: Release Technical Enablement. Notes: Provide necessary release enablement details and documents. isMvp?: YES

Background, and strategic fit

OpenShift is planning to ship all payload and layered product images signed consistently via cosign with OpenShift 4.17. oc-mirror should be able to leverage this to provide a seamless signature verification experience in an offline environment by automatically making all required signature artifacts available in the offline registry.

 

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

<your text here>

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

<your text here>

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

<enter general Feature acceptance here>

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

Overview

This task is really to ensure oc-mirror v2 has backward compatibility to what v1 was doing regarding signatures

Goal

Ensure the correct configmaps are generated and stored in a folder so that the user can deploy the related artifact/s to the cluster as in v1

Feature description

oc-mirror v2 focuses on major enhancements that include making oc-mirror faster and more robust, introducing caching, and addressing more complex air-gapped scenarios. oc-mirror v2 is a rewritten version with three goals:

  • Manage complex air-gapped scenarios, providing support for the enclaves feature
  • Be faster and more robust: it introduces caching and doesn’t rebuild catalogs from scratch
  • Improve code maintainability, making it more reliable and easier to add features and fixes, and including a feature plugin interface

 

The 4.17 version of the delete functionality needs some improvements regarding:

  • CLID-196: should be able to delete operators previously mirrored using mirror to mirror
  • CLID-224: should not delete blobs that are shared with images that were not deleted.

Check if it is possible to delete operators using the delete command when the previous command was mirror to mirror. It probably won't work, because in mirror-to-mirror mode the cache is not updated.

It is necessary to find a solution for this scenario.

Feature Overview (aka. Goal Summary)  

Customers who deploy a large number of OpenShift on OpenStack clusters want to minimise the resource requirements of their cluster control planes.

Customers deploying RHOSO (OpenShift services for OpenStack, i.e. OpenStack control plane on bare metal OpenShift) already have a bare metal management cluster capable of serving Hosted Control Planes.

We should enable self-hosted (i.e. on-prem) Hosted Control Planes to serve Hosted Control Planes to OpenShift on OpenStack clusters, with a specific focus of serving Hosted Control Planes from the RHOSO management cluster.

Goals (aka. expected user outcomes)

As an enterprise IT department and OpenStack customer, I want to provide self-managed OpenShift clusters to my internal customers with minimum cost to the business.

As an internal customer of said enterprise, I want to be able to provision an OpenShift cluster for myself using the business's existing OpenStack infrastructure.

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

TBD
 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

HyperShift should be able to deploy the minimum useful OpenShift cluster on OpenStack. This is the minimum requirement to be able to test it. It is not sufficient for GA.

This is a container Epic for tasks which we know need to be done for Tech Preview but which we don't intend to do now. It needs to be groomed before it is useful for planning.

When the management cluster runs on AWS, make sure we update the DNS record for *.apps so that ingress can work out of the box.

Matthew Booth is worried about the feature we added to pre-create a FIP and assign it to the Service object for router-default. This is indeed racy and could be problematic if another controller also took over that field: it would create infinite loops and the result wouldn't be great for customers.

The idea is to remove that feature now and eventually add it back later when it's safer (e.g. as a feature added to the Ingress operator). It's worth noting that core Kubernetes has deprecated the loadBalancerIP field in the Service object and now works with annotations instead. Maybe we need to investigate that path.

We should not have to explicitly configure the location of the clouds.yaml file, since there is a list of well-known places where these can be found. We should also be able to configure the cloud used from the chosen clouds.yaml.
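
For reference, the conventional lookup behavior we would rely on; a sketch of the openstacksdk conventions, not HyperShift-specific code:

# Well-known clouds.yaml locations, searched in order:
#   ./clouds.yaml
#   ~/.config/openstack/clouds.yaml
#   /etc/openstack/clouds.yaml
# The cloud entry within the file is then selected by name, for example:
export OS_CLOUD=openstack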

Right now, our pods are SingleReplica because, to have multiple replicas, we need more than one zone for nodes, which translates into availability zones (AZs) in OpenStack. We need to figure that out.

We deprecated "DeploymentConfig" in favor of "Deployment" in OCP 4.14.

Now in 4.18 we want to make "Deployment" the default out of the box; that means customers will get Deployment when they install OCP 4.18.

DeploymentConfig will still be available in 4.18 as a non-default option for users who still want to use it.

FYI: "DeploymentConfig" is a tier 1 API in OpenShift and cannot be removed from the 4.x product.

Please Review this FAQ : https://docs.google.com/document/d/1OnIrGReZKpc5kzdTgqJvZYWYha4orrGMVjfP1fUpljY/edit#heading=h.oranye5nwtsy 

Epic Goal*

WRKLDS-695 was implemented to make the DC API controlled through a capability in 4.14. In order to prepare customers for migration to Deployments, the capability was enabled by default. After 3 releases we need to reconsider whether disabling the capability by default is feasible.

More about capabilities in https://github.com/openshift/enhancements/blob/master/enhancements/installer/component-selection.md#capability-sets.
 
Why is this important? (mandatory)

Disabling a capability by default makes an OCP installation lighter. Fewer components running by default also reduce the security risk/vulnerability surface.

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  Users can still enable the capability in vanilla clusters. Existing clusters will keep the DC capability enabled during a cluster upgrade (see the verification sketch below).
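
A minimal verification sketch for these scenarios; the capability name DeploymentConfig and the status field are taken from the ClusterVersion API:

# List the capabilities currently enabled on the cluster; DeploymentConfig should appear
# only where it was explicitly requested or carried over from a pre-upgrade cluster:
oc get clusterversion version -o jsonpath='{.status.capabilities.enabledCapabilities}{"\n"}'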

 
Dependencies (internal and external) (mandatory)

None

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - Workloads team
  • Documentation - Docs team
  • QE - Workloads QE team
  • PX - 
  • Others -

Acceptance Criteria (optional)

  • The DC capability is disabled by default in vanilla OCP installations
  • The DC capability can be enabled in a vanilla OCP installation
  • The DC capability is enabled after an upgrade in OCP clusters that have the capability already enabled before the upgrade
  • The DC capability is disabled after an upgrade in OCP clusters that had the capability disabled before the upgrade

Drawbacks or Risk (optional)

None. The DC capability can be enabled if needed.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be "Release Pending" 

Before the DC capability can be disabled by default, all the relevant e2e tests relying on DCs need to be migrated to Deployments to maintain the same testing coverage.

Feature Overview (aka. Goal Summary)  

When using OpenShift in a mixed, multi-architecture environment some key details or checks or not always available. With this feature we will take a first pass at improving the UI/UX for customers as adoption of this configuration continues at pace.

Goals (aka. expected user outcomes)

The UI/UX experience should be improved when used in a mixed-architecture OCP cluster

Requirements (aka. Acceptance Criteria):

  • Check that only the relevant CSI drivers are deployed to the relevant architectures
  • Improve filtering/auto-detection of architectures in OperatorHub
  • Console improvements, especially node views (e.g., surfacing each node's architecture; see the sketch below)
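
As a small example of the kind of per-node signal the console could surface, the architecture already lives on a standard node label:

# Show each node's CPU architecture alongside its status:
oc get nodes -L kubernetes.io/arch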

<enter general Feature acceptance here>

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both Y
Classic (standalone cluster) Y
Hosted control planes Y
Multi node, Compact (three node), or Single node (SNO), or all Y
Connected / Restricted Network Y
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) All architectures
Operator compatibility n/a
Backport needed (list applicable versions) n/a
UI need (e.g. OpenShift Console, dynamic plugin, OCM) OpenShift Console
Other (please specify)  

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

Epic Goal

  • Console improvements, especially node views

Why is this important?

  •  

Scenarios
1. …

Acceptance Criteria

  • (Enter a list of Acceptance Criteria unique to the Epic)

Dependencies (internal and external)
1. …

Previous Work (Optional):
1. …

Open questions::
1. …

Done Checklist

  • CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
  • Release Enablement: <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - If the Epic is adding a new stream, downstream build attached to advisory: <link to errata>
  • QE - Test plans in Test Plan tracking software (e.g. Polarion, RQM, etc.): <link or reference to the Test Plan>
  • QE - Automated tests merged: <link or reference to automated tests>
  • QE - QE to verify documentation when testing
  • DOC - Downstream documentation merged: <link to meaningful PR>
  • All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.

Feature Overview (aka. Goal Summary)  

As a product manager or business owner of OpenShift Lightspeed, I want to track who is using which feature of OLS and why. I also want to track the product adoption rate so that I can make decisions about the product (add/remove features, add new investment).

Requirements (aka. Acceptance Criteria):

Notes:

Enable monitoring of OLS by default when a user installs the OLS operator ---> check the box by default

Users will have the ability to disable the monitoring ----> by unchecking the box

 

Refer to this Slack conversation: https://redhat-internal.slack.com/archives/C068JAU4Y0P/p1723564267962489

 

Story

As an OLS developer, I want users to see the "Operator recommended cluster monitoring" box checked, so that the metrics are collected by default.

Acceptance Criteria

  • Make the operator install UI show the "Operator recommended cluster monitoring" box checked by default

Feature Overview (aka. Goal Summary)  

Add support for the GCP N4 Machine Series to be used for Control Plane and Compute Nodes when deploying OpenShift on Google Cloud

Goals (aka. expected user outcomes)

As a user, I want to deploy OpenShift on Google Cloud using the N4 Machine Series for the Control Plane and Compute Nodes so I can take advantage of these new Machine types

Requirements (aka. Acceptance Criteria):

OpenShift can be deployed in Google Cloud using the new N4 Machine Series for the Control Plane and Compute Nodes

 

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  both
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all all
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Background

Google has made the N4 Machine Series available on their cloud offering. These Machine Series use a "hyperdisk-balanced" disk for the boot device, which is not currently supported

Documentation Considerations

The documentation will be updated to add the new disk type that needs to be supported as part of this enablement. The N4 Machine Series will also be added to the tested Machine types for Google Cloud when deploying OpenShift
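As an illustrative sketch of the expected install-config.yaml usage (the field names follow the existing GCP machine pool schema; hyperdisk-balanced is the new disk type this feature adds, and the machine type shown is just an example size):

    # install-config.yaml (excerpt) - illustrative sketch only
    controlPlane:
      platform:
        gcp:
          type: n4-standard-4              # N4 machine type (example size)
          osDisk:
            diskType: hyperdisk-balanced   # boot disk type required by the N4 series
    compute:
    - name: worker
      platform:
        gcp:
          type: n4-standard-4
          osDisk:
            diskType: hyperdisk-balanced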

Epic Goal

Why is this important?

  • This is a new Machine Series Google has introduced that customers will use for their OpenShift deployments

Scenarios

  1. Deploy an OpenShift Cluster with both the Control Plane and Compute Nodes running on N4 GCP Machines

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

  1. https://issues.redhat.com/browse/CORS-3561

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

Merge the CSI driver operator into csi-operator repo and re-use asset generator and CSI operator code there.

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

Maintaining a separate CSI driver operator repo is hard, especially when dealing with CVEs and library bumps. In addition, we could share even more code by moving all CSI driver operators into a single repo. Having a common repo shared across drivers will ease the maintenance burden.

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

As cluster admin, I upgrade my cluster to a version with this epic implemented and I do not see any change, the CSI driver works the same as before. (Some pods, their containers or services may get renamed during the upgrade process).

As OCP developer, I have 1 less repo to worry about when fixing a CVE / bumping library-go or Kubernetes libraries.

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both yes
Classic (standalone cluster) yes
Hosted control planes all
Multi node, Compact (three node), or Single node (SNO), or all all
Connected / Restricted Network all
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) all
Operator compatibility  
Backport needed (list applicable versions) no
UI need (e.g. OpenShift Console, dynamic plugin, OCM) no
Other (please specify)  

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

As cluster admin, I upgrade my cluster to a version with this epic implemented and I do not see any change, the CSI driver works the same as before. (Some pods, their containers or services may get renamed during the upgrade process).

As OCP developer, I have 1 less repo to worry about when fixing a CVE / bumping library-go or Kubernetes libraries.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

N/A; this covers all the CSI operators Red Hat manages as part of OCP

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

This effort started with the CSI operators that we included for HCP; we want to align all CSI operators to use the same approach in order to limit maintenance efforts.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

Not customer facing, this should not introduce any regression.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

No doc needed

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

N/A, it's purely tech debt / internal

Epic Goal*

Merge the CSI driver operator into csi-operator repo and re-use asset generator and CSI operator code there.

 
Why is this important? (mandatory)

Maintaining a separate CSI driver operator repo is hard, especially when dealing with CVEs and library bumps. In addition, we could share even more code by moving all CSI driver operators into a single repo.

 
Scenarios (mandatory) 

As cluster admin, I upgrade my cluster to a version with this epic implemented and I do not see any change, the CSI driver works the same as before. (Some pods, their containers or services may get renamed during the upgrade process).

As OCP developer, I have 1 less repo to worry about when fixing a CVE / bumping library-go or Kubernetes libraries.

 
Dependencies (internal and external) (mandatory)

None, this can be done just by the storage team and independently of other operators / features.

Contributing Teams (and contacts) (mandatory)

  • Development - 
  • QE - 

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be "Release Pending" 

Following the steps described in the enhancement, we should do the following:

  • Move existing code from openstack-cinder-csi-driver-operator into a legacy directory in the csi-operator repository,
  • Add a Dockerfile for building the operator image from the new location,
  • Update openshift/release to build the image from new location,
  • Change ocp-build-data repository to ship image from new location, and
  • Coordinate merges in ocp-build-data and release repository

Once this is done, we can work towards rewriting the operator to take advantage of the new generator tooling used for existing migrated operators.

Epic Goal*

Merge the CSI driver operator into csi-operator repo and re-use asset generator and CSI operator code there.

 
Why is this important? (mandatory)

Maintaining a separate CSI driver operator repo is hard, especially when dealing with CVEs and library bumps. In addition, we could share even more code by moving all CSI driver operators into a single repo.

 
Scenarios (mandatory) 

As cluster admin, I upgrade my cluster to a version with this epic implemented and I do not see any change, the CSI driver works the same as before. (Some pods, their containers or services may get renamed during the upgrade process).

As OCP developer, I have 1 less repo to worry about when fixing a CVE / bumping library-go or Kubernetes libraries.

 
Dependencies (internal and external) (mandatory)

None, this can be done just by the storage team and independently of other operators / features.

Contributing Teams (and contacts) (mandatory)

  • Development - 
  • QE - 

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be "Release Pending" 

Epic Goal*

Merge the CSI driver operator into csi-operator repo and re-use asset generator and CSI operator code there.

 
Why is this important? (mandatory)

Maintaining a separate CSI driver operator repo is hard, especially when dealing with CVEs and library bumps. In addition, we could share even more code by moving all CSI driver operators into a single repo.

 
Scenarios (mandatory) 

As cluster admin, I upgrade my cluster to a version with this epic implemented and I do not see any change, the CSI driver works the same as before. (Some pods, their containers or services may get renamed during the upgrade process).

As OCP developer, I have 1 less repo to worry about when fixing a CVE / bumping library-go or Kubernetes libraries.

Note: we do not plan to do any changes for HyperShift. The EFS CSI driver will still fully run in the guest cluster, including its control plane.

Dependencies (internal and external) (mandatory)

None, this can be done just by the storage team and independently of other operators / features.

Contributing Teams (and contacts) (mandatory)

  • Development - 
  • QE - 

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be "Release Pending" 

Phase 3 Deliverable:

TBD

Epic Goal

  • To be refined based on initial feedback on GA

Why is this important?

  •  

Scenarios

  1. As a cluster admin, I want to reconfigure sudo without disrupting workloads.
  2. As a cluster admin, I want to update or reconfigure sshd and reload the service without disrupting workloads.
  3. As a cluster admin, I want to remove mirroring rules from an ICSP, ITMS, IDMS object without disrupting workloads because the scenario in which this might lead to non-pullable images at an undefined later point in time doesn't apply to me.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview

Support mapping OpenShift zones to vSphere host groups, in addition to vSphere clusters.

When defining zones for vSphere, administrators can map regions to vSphere datacenters and zones to vSphere clusters.

There are use cases where vSphere clusters have only one cluster construct with all their ESXi hosts, but the administrators want to divide the ESXi hosts into host groups. A common example is vSphere stretched clusters, where there is only one logical vSphere cluster but the ESXi nodes are distributed across two physical sites and grouped by site in vSphere host groups.

In order for OpenShift to be able to distribute its nodes on vSphere matching the physical grouping of hosts, OpenShift zones have to be able to map to vSphere host groups too.
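Purely as a hypothetical sketch of what such a mapping could look like (the hostGroup field name below is illustrative only and not a committed API; the surrounding fields follow today's vSphere failure domain schema):

    # install-config.yaml (excerpt) - hypothetical sketch, not a final API
    platform:
      vsphere:
        failureDomains:
        - name: zone-a
          region: region-1                            # maps to a vSphere datacenter
          zone: zone-a                                # today maps to a vSphere cluster
          topology:
            datacenter: dc1
            computeCluster: /dc1/host/stretched-cluster
            hostGroup: site-a-hosts                   # hypothetical: map the zone to a host group
            datastore: /dc1/datastore/ds1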

Requirements

  • Users can define OpenShift zones mapping them to host groups at installation time (day 1)
  • Users can use host groups as OpenShift zones post-installation (day 2)

Epic Goal

Support mapping OpenShift zones to vSphere host groups, in addition to vSphere clusters.

When defining zones for vSphere, administrators can map regions to vSphere datacenters and zones to vSphere clusters.

There are use cases where vSphere clusters have only one cluster construct with all their ESXi hosts, but the administrators want to divide the ESXi hosts into host groups. A common example is vSphere stretched clusters, where there is only one logical vSphere cluster but the ESXi nodes are distributed across two physical sites and grouped by site in vSphere host groups.

In order for OpenShift to be able to distribute its nodes on vSphere matching the physical grouping of hosts, OpenShift zones have to be able to map to vSphere host groups too.

Requirements

  • Users can define OpenShift zones mapping them to host groups at installation time (day 1)
  • Users can use host groups as OpenShift zones post-installation (day 2)

Feature Overview

Support in the IPI installer for OpenShift on vSphere to create the OpenShift node VMs with multiple NICs and subnets.

This is necessary when users want to have dedicated network links in the node VMs, for example for storage or database traffic, in addition to the service network link that we create now

Requirements

Users can specify multiple NICs, on different subnets, for the VMs that will be created for the OpenShift cluster nodes.

Epic Goal

Support in the IPI installer for OpenShift on vSphere to create the OpenShift node VMs with multiple NICs and subnets.

This is necessary when users want to have dedicated network links in the node VMs, for example for storage or database traffic, in addition to the service network link that we create now

Requirements

Users can specify multiple NICs, on different subnets, for the VMs that will be created for the OpenShift cluster nodes.

Description:

The machine config operator needs to be bumped to pick up the API change:

I0819 17:50:00.396986       1 machineconfig.go:87] ControllerConfig not found, creating new one
E0819 17:50:00.400599       1 machineconfig.go:90] Failed to create ControllerConfig: ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: [spec.infra.spec.platformSpec.vsphere.failureDomains[0].topology.networks: Too many: 2: must have at most 1 items, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]

 

Acceptance Criteria:

The machine API is failing to render compute nodes when multiple NICs are configured:

Unable to apply 4.17.0-0.ci.test-2024-08-15-193100-ci-ln-igm0nhk-latest: ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: [spec.infra.spec.platformSpec.vsphere.failureDomains[0].topology.networks: Too many: 2: must have at most 1 items, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]

Description:

Bump machine-api to pick up changes in openshift/api#2002.

Acceptance Criteria:

  • openshift/api#2002 is merged
  • openshift/library-go#1777 is merged
  • this PR is merged

issue created by splat-bot

USER STORY:

As an OpenShift provisioner, I want to provision a cluster in which nodes have multiple network adapters so that I can implement the desired network topology.

DESCRIPTION:

Customers have a need to provision nodes with multiple adapters on day 0. CAPV supports the ability to specify multiple adapters in its clone spec. The installer should be augmented to support additional NICs.
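As an illustrative sketch of how such an install-config.yaml could look (the topology.networks field already exists in the vSphere failure domain schema but is currently limited to a single entry; the network names below are examples):

    # install-config.yaml (excerpt) - illustrative sketch only
    platform:
      vsphere:
        failureDomains:
        - name: fd-1
          topology:
            datacenter: dc1
            computeCluster: /dc1/host/cluster1
            datastore: /dc1/datastore/ds1
            networks:
            - service-network      # primary NIC / machine network
            - storage-network      # additional NIC on a dedicated subnet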

Required:

  •  

Nice to have:

...

ACCEPTANCE CRITERIA:

  • install-config.yaml is updated to allow multiple NICs
  • CI job testing an install with 2 network adapters
  • Validation of multiple network adapters

ENGINEERING DETAILS:

Description:

The infrastructure spec validation needs to be updated to change the network count restriction to 10 (https://configmax.esp.vmware.com/guest?vmwareproduct=vSphere&release=vSphere%208.0&categories=1-0).

 

When multiple NICs are enabled (does the installer allow this?), bootstrapping fails with:

Aug 15 18:30:57 2.252.83.01.in-addr.arpa cluster-bootstrap[4889]: [#1673] failed to create some manifests:
Aug 15 18:30:57 2.252.83.01.in-addr.arpa cluster-bootstrap[4889]: "cluster-infrastructure-02-config.yml": failed to create infrastructures.v1.config.openshift.io/cluster -n : Infrastructure.config.openshift.io "cluster" is invalid: [spec.platformSpec.vsphere.failureDomains[0].topology.networks: Too many: 2: must have at most 1 items, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]

 

Acceptance Criteria:

  • API changes are tested in a payload along with MAPI and the installer

 

issue created by splat-bot

 

Feature Overview

Improve the cluster expansion with the agent workflow added in OpenShift 4.16 (TP) and OpenShift 4.17 (GA) with:

  • Caching the RHCOS image for faster node addition (i.e. no extraction of the image every time)
  • Add a single node with just one command, with no need to write config files describing the node
  • Support creating PXE artifacts 

Goals

Improve the user experience and functionality of the commands to add nodes to clusters using the image creation functionality.

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

A set of capabilities need to be added to the Hypershift Operator that will enable AWS Shared-VPC deployment for ROSA w/ HCP.

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

Build capabilities into HyperShift Operator to enable AWS Shared-VPC deployment for ROSA w/ HCP.

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

Antoni Segura Puimedon: Please help with providing what HyperShift will need on the OCPSTRAT side.

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both (perhaps) both
Classic (standalone cluster)  
Hosted control planes yes
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) x86_64 and Arm
Operator compatibility  
Backport needed (list applicable versions) 4.14+
UI need (e.g. OpenShift Console, dynamic plugin, OCM) no (this is an advanced feature not being exposed via web-UI elements)
Other (please specify) ROSA w/ HCP

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>


Currently the same SG is used for both workers and VPC endpoint. Create a separate SG for the VPC endpoint and only open the ports necessary on each.

"Shared VPCs" are a unique AWS infrastructure design: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-sharing.html

See prior work/explanations/etc here: https://issues.redhat.com/browse/SDE-1239

 

Summary is that in a Shared VPC environment, a VPC is created in Account A and shared to Account B. The owner of Account B wants to create a ROSA cluster, however Account B does not have permissions to create a private hosted zone in the Shared VPC. So they have to ask Account A to create the private hosted zone and link it to the Shared VPC. OpenShift then needs to be able to accept the ID of that private hosted zone for usage instead of creating the private hosted zone itself.

QE should have some environments or testing scripts available to test the Shared VPC scenario

 

The AWS endpoint controller in the CPO currently uses the control plane operator role to create the private link endpoint for the hosted cluster as well as the corresponding DNS records in the hypershift.local hosted zone. If a role is created to allow it to create that VPC endpoint in the VPC owner's account, the controller would have to explicitly assume that role so it can create the VPC endpoint, and potentially a separate role for populating DNS records in the hypershift.local zone.

The users would need to create a custom policy to enable this

Add the necessary API fields to support a Shared VPC infrastructure, and enable development/testing of Shared VPC support by adding the Shared VPC capability to the hypershift CLI.

Feature Overview

Console enhancements based on customer RFEs that improve customer user experience.

 

Goals

  • This Section: Provide high-level goal statement, providing user context and expected user outcome(s) for this feature

 

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

 

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

As a cluster admin I want to set a cluster-wide setting for hiding the "Getting started resources" banner from the Overview, for all the console users.

 

AC: 

  • Add a new field to the console-operator's config, in its 'spec.customization' section, which would configure the console. The new field should be named 'GettingStartedBanner' and should be an enum with states "Show" and "Hide".
  • By default the state should be "Show"
  • Pass the state variable to the console-config CM
  • Add e2e and integration test

 

RFE: https://issues.redhat.com/browse/RFE-4475
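A minimal sketch of how the proposed setting might look on the console operator config, assuming the field lands under spec.customization as described in the acceptance criteria (the field name and enum values come from this story and are not final):

    # Console operator config (operator.openshift.io/v1) - sketch of the proposed field
    apiVersion: operator.openshift.io/v1
    kind: Console
    metadata:
      name: cluster
    spec:
      customization:
        gettingStartedBanner: Hide     # proposed enum: Show | Hide (Show by default)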

As a cluster admin I want to set a cluster-wide setting for hiding the "Getting started resources" banner from the Overview, for all the console users.

 

AC: 

  • Console will read the value of 'GettingStartedBannerState' on start and set it as a SERVER_FLAG. Based on the value it will render the "Getting started resources" banner
  • Add integration test

 

RFE: https://issues.redhat.com/browse/RFE-4475

Problem: ODC UX improvements based on customer RFEs that improve user experience.

Goal:

Why is it important?

Use cases:

  1. <case>

Acceptance criteria:

  1. Add dark/light mode support for the YAML editor, matching the console theme

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description

As a user who is visually impaired, or a user who is out in the sun, when I switch the theme in the console to Light mode, then try to edit text files (e.g., the YAML configuration for a pod) using the web console, I want the editor to be in light theme.

Acceptance Criteria

  1. The CodeEditor component should change its base theme from vs-dark to vs-light when the console theme is changed from dark to light.
  2. Similarly, when the console theme is changed from light to dark, the base theme for the monaco editor should change from vs-light to vs-dark.

Additional Details:

Feature Overview (aka. Goal Summary)  

This feature only covers the downstream MAPI work to Enable Capacity Blocks. 

Capacity Blocks support is needed in managed OpenShift (ROSA with Hosted Control Planes) via CAPI. Once the HCP feature and OCM feature are completed, a Service Consumer can use upstream CAPI to set capacity reservations in a ROSA+HCP cluster.

https://docs.aws.amazon.com/en_us/AWSEC2/latest/UserGuide/capacity-blocks-using.html#capacity-blocks-purchase 

Feature Overview (Goal Summary)  

We aim to continue establishing a comprehensive testing strategy for Hosted Control Planes (HCP) that aligns with Red Hat’s support requirements and ensures customer satisfaction. This involves testing across various permutations, including providers, lifecycle, upgrades, and version compatibility. The testing must span management clusters, hubs, MCE, control planes, and nodepools, while coordinating across multiple QE teams to avoid duplication and inefficiencies. We aim to sustain an evolving testing matrix to meet product demands, especially as new versions and extended OCP lifecycles are introduced.

Goals (Expected User Outcomes)

  • Provide a scalable, systematic approach for testing HCP across multiple environments and scenarios.
  • Ensure coordination between all QE teams (ACM/MCE, HCP, KubeVirt, Agent) to avoid redundancies and inefficiencies in testing.
  • Establish a robust testing framework that can handle upgrades and version compatibility while maintaining compliance with Red Hat’s lifecycle policies.
  • Offer a clear view of coverage across different permutations of control planes and node pools.

 

Requirements (Acceptance Criteria)

  • Testing matrix covers all relevant permutations of management clusters, hubs, MCE, control planes, and node pools.
  • Use of representative sampling to ensure critical combinations are tested without unnecessary resource strain.
  • Ensure testing for upgrades includes fresh install scenarios to streamline coverage.
  • Automated processes in place to trigger relevant tests for new MCE builds or HCP updates.
  • Comprehensive tracking of QE teams’ coverage to avoid duplicated efforts.
  • Test execution time is optimized to reduce delays in delivery without compromising coverage.

 

Deployment Considerations

  • Self-managed, managed, or both: self-managed.
  • Classic (standalone cluster): No.
  • Hosted control planes: Yes.
  • Multi-node, Compact (three node), or Single node (SNO), or all: N/A.
  • Connected / Restricted Network: Yes.
  • Architectures: x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x).
  • Operator compatibility: Yes, ensure operator updates don't break testing workflows.
  • Backport needed: N/A
  • UI need:N/A
  • Other: N/A.

Use Cases (Optional)

 

See: https://docs.google.com/spreadsheets/d/1j8TjMfyCfEt8OzTgvrAG3tuC6WMweBh5ElzWu6oAvUw/edit?gid=0#gid=0 

  • Same hub, multiple HCP versions: using the same management/hub cluster (e.g., 4.15) to provision up to n+4 newer cluster versions
  • MCE ft. Management cluster compatibility. 
  • MCE ft. HCP versions compatibility
  • Upgrade Scenarios: Testing a management cluster upgrade from version 4.14 to 4.15, ensuring all connected node pools and control planes operate seamlessly.
  • Fresh Install Scenarios: Testing a new deployment with different node pool versions to ensure all configurations work correctly without requiring manual interventions.

Background

The HCP architecture introduces decoupled control planes and worker nodes, significantly increasing the number of testing permutations. Ensuring these scenarios are tested is crucial to maintaining product quality and customer satisfaction, and to staying compliant as an OpenShift form factor.

 

Feature Overview

OCP 4 clusters still maintain pinned boot images. We have numerous clusters installed that have boot media pinned to first boot images as early as 4.1. In the future these boot images may not be certified by the OEM and may fail to boot on updated datacenter or cloud hardware platforms. These "pinned" boot images should be updateable so that customers can avoid this problem and better still scale out nodes with boot media that matches the running cluster version.

Phase 1 provided tech preview for GCP.

In phase 2, GCP support goes to GA and AWS goes to TP.

In phase 3, AWS support goes to GA and vSphere goes to TP.
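As a hedged sketch of how clusters opt in to managed boot images with the existing GCP tech-preview mechanism (assuming the same MachineConfiguration API carries over as AWS and vSphere support is added; field names reflect the current GCP implementation and may evolve):

    # MachineConfiguration (operator.openshift.io/v1) - illustrative opt-in sketch
    apiVersion: operator.openshift.io/v1
    kind: MachineConfiguration
    metadata:
      name: cluster
    spec:
      managedBootImages:
        machineManagers:
        - resource: machinesets
          apiGroup: machine.openshift.io
          selection:
            mode: All        # keep boot images for all MachineSets in sync with the cluster version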

Requirements

Feature Overview (aka. Goal Summary)  

Review, refine and harden the CAPI-based Installer implementation introduced in 4.16

Goals (aka. expected user outcomes)

From the implementation of the CAPI-based Installer, which started with OpenShift 4.16, there is some technical debt that needs to be reviewed and addressed to refine and harden this new installation architecture.

Requirements (aka. Acceptance Criteria):

Review existing implementation, refine as required and harden as possible to remove all the existing technical debt

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Documentation Considerations

There should not be any user-facing documentation required for this work

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Continue to refine and harden aspects of CAPI-based Installs launched in 4.16

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

Once a cloud provider uses CAPI by default, the feature gate it used becomes tech debt.

Acceptance Criteria:

Description of criteria:

  • openshift/api PR removing the feature gate
  • remove feature gate conditionals from the installer
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Epic Goal

  • This epic includes tasks the team would like to tackle to improve our process, QOL, CI. It may include tasks like updating the RHEL base image and vendored assisted-service.

Why is this important?

 

We need a place to add tasks that are not feature oriented.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ...

Open questions::

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

The agent installer does not require the infra-env id to be present in the claim to perform the authentication.

 

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

 The agent installer does not require the infra-env id to be present in the claim to perform the authentication.

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview (aka. Goal Summary)

This feature aims to comprehensively refactor and standardize various components across HCP, ensuring consistency, maintainability, and reliability. The overarching goal is to increase customer satisfaction by increasing speed to market, and to save engineering budget by reducing incidents/bugs. This will be achieved by reducing technical debt, improving code quality, and simplifying the developer experience across multiple areas, including CLI consistency, NodePool upgrade mechanisms, networking flows, and more. By addressing these areas holistically, the project aims to create a more sustainable and scalable codebase that is easier to maintain and extend.

Goals (aka. Expected User Outcomes)

  • Unified Codebase: Achieve a consistent and unified codebase across different HCP components, reducing redundancy and making the code easier to understand and maintain.
  • Enhanced Developer Experience: Streamline the developer workflow by reducing boilerplate code, standardizing interfaces, and improving documentation, leading to faster and safer development cycles.
  • Improved Maintainability: Refactor large, complex components into smaller, modular, and more manageable pieces, making the codebase more maintainable and easier to evolve over time.
  • Increased Reliability: Enhance the reliability of the platform by increasing test coverage, enforcing immutability where necessary, and ensuring that all components adhere to best practices for code quality.
  • Simplified Networking and Upgrade Mechanisms: Standardize and simplify the handling of networking flows and NodePool upgrade triggers, providing a clear, consistent, and maintainable approach to these critical operations.

Requirements (aka. Acceptance Criteria)

  • Standardized CLI Implementation: Ensure that the CLI is consistent across all supported platforms, with increased unit test coverage and refactored dependencies.
  • Unified NodePool Upgrade Logic: Implement a common abstraction for NodePool upgrade triggers, consolidating scattered inputs and ensuring a clear, consistent upgrade process.
  • Refactored Controllers: Break down large, monolithic controllers into modular, reusable components, improving maintainability and readability.
  • Improved Networking Documentation and Flows: Update networking documentation to reflect the current state, and refactor network proxies for simplicity and reusability.
  • Centralized Logic for Token and Userdata Generation: Abstract the logic for token and userdata generation into a single, reusable library, improving code clarity and reducing duplication.
  • Enforced Immutability for Critical API Fields: Ensure that immutable fields within key APIs are enforced through proper validation mechanisms, maintaining API coherence and predictability.
  • Documented and Clarified Service Publish Strategies: Provide clear documentation on supported service publish strategies, and lock down the API to prevent unsupported configurations.

Use Cases (Optional)

  • Developer Onboarding: New developers can quickly understand and contribute to the HCP project due to the reduced complexity and improved documentation.
  • Consistent Operations: Operators and administrators experience a more predictable and consistent platform, with reduced bugs and operational overhead due to the standardized and refactored components.

Out of Scope

  • Introduction of new features or functionalities unrelated to the refactor and standardization efforts.
  • Major changes to user-facing commands or APIs beyond what is necessary for standardization.

Background

Over time, the HyperShift project has grown organically, leading to areas of redundancy, inconsistency, and technical debt. This comprehensive refactor and standardization effort is a response to these challenges, aiming to improve the project's overall health and sustainability. By addressing multiple components in a coordinated way, the goal is to set a solid foundation for future growth and development.

Customer Considerations

  • Minimal Disruption: Ensure that existing users experience minimal disruption during the refactor, with clear communication about any changes that might impact their workflows.
  • Enhanced Stability: Customers should benefit from a more stable and reliable platform as a result of the increased test coverage and standardization efforts.

Documentation Considerations

Ensure all relevant project documentation is updated to reflect the refactored components, new abstractions, and standardized workflows.

This overarching feature is designed to unify and streamline the HCP project, delivering a more consistent, maintainable, and reliable platform for developers, operators, and users.

Goal

As a dev I want the base code to be easier to read, maintain and test

Why is this important?

If devs don't have a healthy dev environment, the project will suffer and the business won't make $$

Scenarios

  1. ...

Acceptance Criteria

  • 80% unit tested code
  • No file > 1000 lines of code

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Goal

Refactor and modularize controllers and other components to improve maintainability, scalability, and ease of use.

User Story:

As a (user persona), I want to be able to:

  • As an external dev I want to be able to add new components to the CPO easily
  • As a core dev I want to feel safe when adding new components to the CPO
  • As a core dev I want to add new components to the CPO without copy/pasting big chunks of code

Acceptance Criteria:

Context:
If you have ever had to add or modify a component in the control plane operator, the need for this becomes very obvious. It should be possible to add component manifests only through a gated interface.
Right now adding a new component requires copy/pasting hundreds of lines of boilerplate and there is plenty of room for side effects. A dev needs to manually remember to set the right config, like AutomountServiceAccountToken false, topology opinions...

We should refactor support/config and all the consumers in the CPO to enforce component creation through audited and common signatures/interfaces.
Adding a new component should only be possible through these higher abstractions.

More Details

  • If you have ever had to add or modify a component in the control plane operator, the need for this becomes very obvious. It should be possible to add component manifests only through a gated interface.
  • Right now adding a new component requires copy/pasting hundreds of lines of boilerplate and there is plenty of room for side effects

Abstract away in a single place all the logic related to token and userdata secrets consuming the output of https://issues.redhat.com/browse/HOSTEDCP-1678
This should result in a single abstraction, i.e. "Token", that exposes a thin library, e.g. Reconcile(), and hides all details of the token/userdata secrets lifecycle

Following up on abstracting pieces into cohesive units, CAPI is the next logical choice since there is a lot of reconciliation business logic for it in the NodePool controller.
Goals:
All CAPI-related logic is driven by a single abstraction/struct.
Almost full unit test coverage.
A deeper refactor of the concrete implementation logic is left out of scope for gradual, test-driven follow-ups.

As a dev I want to easily add and understand which inputs result in triggering a NodePool upgrade.

There are many scattered things that trigger a NodePool rolling upgrade on change.
For code sustainability it would be good to have a common abstraction that discovers all of them based on an input and returns the authoritative hash for any targeted config version in time.
Related https://github.com/openshift/hypershift/pull/4057
https://github.com/openshift/hypershift/pull/3969#discussion_r1587198191

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

<your text here>

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

<your text here>

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

<enter general Feature acceptance here>

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

Epic Goal

  • The goal of this epic is to prepare the console codebase as well as the dynamic plugins SDK. In order to do that we need to identify areas in the console that need to be updated and issues which need to be fixed.

Why is this important?

  • The console as well as its dynamic plugins will need to support PF6 once it's available in a stable version

Acceptance Criteria

  • Identify all the areas of code that need to be updated or fixed
  • Create stories which will address those updates and fixes

Open questions:

  1. Should we be removing PF4 as part of 4.16?

NOTE:

Nicole Thoen has already started drafting a document on technical debt impeding PF6 migrations, which contains a list of identified tech-debt items, deprecated components, etc.

Locations

frontend/packages/console-app/src/components/

NavHeader.tsx [merged]

PDBForm.tsx (This should be a <Select>) [merged]

 

Acceptance Criteria:

  • Change the DropdownDeprecated component in NavHeader.tsx in favour of the PF Select component.
  • Change the DropdownDeprecated component in OAuthConfigDetails.tsx in favour of the PF Dropdown component.
  • Change the DropdownDeprecated component in PDBForm.tsx in favour of the PF Select component.
  • Create a wrapper for these replacements, if necessary.
  • Update integration tests, if necessary.
  • Add an integration test to verify if the wrapper is accessible via keyboard.

 

Usages of DropdownDeprecated are replaced with the latest components; see the sketch after the links below.

https://www.patternfly.org/components/menus/menu

https://www.patternfly.org/components/menus/dropdown

https://www.patternfly.org/components/menus/select
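
For orientation only, here is a minimal sketch of the pattern these replacements follow, assuming the composable Select and MenuToggle API from @patternfly/react-core v5+ documented by the links above. The component name, option values, and state handling are hypothetical stand-ins, not the actual PDBForm code.

```tsx
import * as React from 'react';
import { MenuToggle, MenuToggleElement, Select, SelectList, SelectOption } from '@patternfly/react-core';

// Hypothetical replacement for a DropdownDeprecated-based field such as PDBForm's
// availability requirement selector; values are illustrative only.
const AvailabilityRequirementSelect: React.FC = () => {
  const [isOpen, setIsOpen] = React.useState(false);
  const [selected, setSelected] = React.useState('maxUnavailable');

  return (
    <Select
      isOpen={isOpen}
      selected={selected}
      onOpenChange={setIsOpen}
      onSelect={(_event, value) => {
        setSelected(value as string);
        setIsOpen(false);
      }}
      toggle={(toggleRef: React.Ref<MenuToggleElement>) => (
        <MenuToggle ref={toggleRef} onClick={() => setIsOpen(!isOpen)} isExpanded={isOpen}>
          {selected}
        </MenuToggle>
      )}
    >
      <SelectList>
        <SelectOption value="maxUnavailable">maxUnavailable</SelectOption>
        <SelectOption value="minAvailable">minAvailable</SelectOption>
      </SelectList>
    </Select>
  );
};
```

If a shared wrapper is created (per the acceptance criteria above), it would mostly factor out this open-state and toggle plumbing so the three call sites stay small.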

 

 

 

Locations

frontend/packages/console-shared/src/components/

GettingStartedGrid.tsx (has KebabToggleDeprecated)

 

Note

DropdownDeprecated is replaced with the latest components

https://www.patternfly.org/components/menus/menu

https://www.patternfly.org/components/menus/menu-toggle#plain-toggle-with-icon

https://www.patternfly.org/components/menus/dropdown

https://www.patternfly.org/components/menus/select

 

AC: Go through the mentioned files and swap the usage of DropdownDeprecated and KebabToggleDeprecated with PF components, based on their semantics (either Dropdown or Select components); a kebab-toggle sketch follows below.
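
Below is a hedged sketch of what the KebabToggleDeprecated swap could look like, assuming the composable Dropdown with a plain icon MenuToggle (the pattern from the menu-toggle link above). The component name and the single "Hide from view" action are placeholders, not the actual GettingStartedGrid code.

```tsx
import * as React from 'react';
import { Dropdown, DropdownItem, DropdownList, MenuToggle, MenuToggleElement } from '@patternfly/react-core';
import { EllipsisVIcon } from '@patternfly/react-icons';

// Hypothetical kebab menu for a getting-started card; onHide is an assumed callback.
const GettingStartedCardKebab: React.FC<{ onHide: () => void }> = ({ onHide }) => {
  const [isOpen, setIsOpen] = React.useState(false);

  return (
    <Dropdown
      isOpen={isOpen}
      onOpenChange={setIsOpen}
      onSelect={() => setIsOpen(false)}
      toggle={(toggleRef: React.Ref<MenuToggleElement>) => (
        <MenuToggle
          ref={toggleRef}
          variant="plain"
          aria-label="Card actions"
          onClick={() => setIsOpen(!isOpen)}
          isExpanded={isOpen}
        >
          <EllipsisVIcon />
        </MenuToggle>
      )}
    >
      <DropdownList>
        <DropdownItem onClick={onHide}>Hide from view</DropdownItem>
      </DropdownList>
    </Dropdown>
  );
};
```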

AC:

  • Replace ApplicationLauncher, ApplicationLauncherGroup, ApplicationLauncherItem, and ApplicationLauncherSeparator with the Dropdown and Menu components, as sketched after the demo link below.
  • Update integration tests

 

PatternFly demo using Dropdown and Menu components

https://www.patternfly.org/components/menus/application-launcher/
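
As a rough guide rather than the console's actual implementation, an application-launcher-style menu can be assembled from the composable Dropdown pieces, with DropdownGroup standing in for ApplicationLauncherGroup and Divider for ApplicationLauncherSeparator. The group labels and items below are invented for illustration.

```tsx
import * as React from 'react';
import {
  Divider,
  Dropdown,
  DropdownGroup,
  DropdownItem,
  DropdownList,
  MenuToggle,
  MenuToggleElement,
} from '@patternfly/react-core';
import { ThIcon } from '@patternfly/react-icons';

// Hypothetical launcher; group labels and items are placeholders.
const LauncherMenu: React.FC = () => {
  const [isOpen, setIsOpen] = React.useState(false);

  return (
    <Dropdown
      isOpen={isOpen}
      onOpenChange={setIsOpen}
      onSelect={() => setIsOpen(false)}
      toggle={(toggleRef: React.Ref<MenuToggleElement>) => (
        <MenuToggle
          ref={toggleRef}
          variant="plain"
          aria-label="Application launcher"
          onClick={() => setIsOpen(!isOpen)}
          isExpanded={isOpen}
        >
          <ThIcon />
        </MenuToggle>
      )}
    >
      {/* DropdownGroup replaces ApplicationLauncherGroup */}
      <DropdownGroup label="Red Hat applications">
        <DropdownList>
          <DropdownItem key="hcc">Hybrid Cloud Console</DropdownItem>
        </DropdownList>
      </DropdownGroup>
      {/* Divider replaces ApplicationLauncherSeparator */}
      <Divider />
      <DropdownGroup label="Documentation">
        <DropdownList>
          <DropdownItem key="docs">OpenShift documentation</DropdownItem>
        </DropdownList>
      </DropdownGroup>
    </Dropdown>
  );
};
```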

 

 

 

NodeLogs.tsx (two) [merged]

PerspectiveDropdown.tsx (??? Cannot locate this dropdown in the UI. Reached out to Christoph but didn't hear back.)

UserPreferenceDropdownField.tsx [merged]

ClusterConfigurationDropdownField.tsx (??? Cannot locate this dropdown in the UI); dead code

LanguageDropdown.tsx 

PerspectiveConfiguration.tsx (options have descriptions) [merged]

 

Acceptance Criteria

Usages of SelectDeprecated are replaced with the latest Select component

https://www.patternfly.org/components/menus/menu

https://www.patternfly.org/components/menus/select

 

AC: Go through the mentioned files and swap the usage of SelectDeprecated with the PF Select component.

 

 

resource-dropdown.tsx (checkbox, options have tooltips, grouped options, hasInlineFilter, which is not supported in the V6 Select; convert to Typeahead, see the typeahead sketch after this list)

resource-log.tsx

filter-toolbar.tsx (grouped, checkbox select)

monitoring/dashboards/index.tsx (checkbox select, hasInlineFilter, which is not supported in the V6 Select; convert to Typeahead); covered by https://issues.redhat.com/browse/ODC-7655

silence-form.tsx (Currently using DropdownDeprecated, should be using a Select)

timespan-dropdown.ts (Currently using DropdownDeprecated, should be using a Select) covered by https://issues.redhat.com/browse/ODC-7655

poll-interval-dropdown.tsx (Currently using DropdownDeprecated, should be using a Select) covered by https://issues.redhat.com/browse/ODC-7655

 

Note

Usages of SelectDeprecated are replaced with the latest Select component

https://www.patternfly.org/components/menus/menu

https://www.patternfly.org/components/menus/select

 

AC: Go through the mentioned files and swap the usage of the deprecated components with PF components, based on their semantics (either Dropdown or Select components).
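
Because hasInlineFilter has no equivalent in the V6 Select, the conversion target for the inline-filter cases above is the typeahead variant of the new Select. The sketch below is a minimal illustration under that assumption; the resource kinds, filter logic, and component name are hypothetical, and the grouping, tooltips, and checkbox behaviour the real resource-dropdown needs are omitted.

```tsx
import * as React from 'react';
import {
  MenuToggle,
  MenuToggleElement,
  Select,
  SelectList,
  SelectOption,
  TextInputGroup,
  TextInputGroupMain,
} from '@patternfly/react-core';

// Hypothetical resource kinds; the real dropdown is fed from discovery data.
const kinds = ['Deployment', 'DeploymentConfig', 'Pod', 'StatefulSet'];

const ResourceTypeaheadSelect: React.FC = () => {
  const [isOpen, setIsOpen] = React.useState(false);
  const [filterText, setFilterText] = React.useState('');
  const [selected, setSelected] = React.useState<string | undefined>();

  // Inline filtering replaces the removed hasInlineFilter behaviour.
  const filtered = kinds.filter((k) => k.toLowerCase().includes(filterText.toLowerCase()));

  return (
    <Select
      isOpen={isOpen}
      selected={selected}
      onOpenChange={setIsOpen}
      onSelect={(_event, value) => {
        setSelected(value as string);
        setFilterText(value as string);
        setIsOpen(false);
      }}
      toggle={(toggleRef: React.Ref<MenuToggleElement>) => (
        <MenuToggle
          ref={toggleRef}
          variant="typeahead"
          isFullWidth
          isExpanded={isOpen}
          onClick={() => setIsOpen(!isOpen)}
        >
          <TextInputGroup isPlain>
            <TextInputGroupMain
              value={filterText}
              placeholder="Select a resource"
              autoComplete="off"
              onClick={() => setIsOpen(true)}
              onChange={(_event, value) => setFilterText(value)}
            />
          </TextInputGroup>
        </MenuToggle>
      )}
    >
      <SelectList>
        {filtered.map((kind) => (
          <SelectOption key={kind} value={kind}>
            {kind}
          </SelectOption>
        ))}
        {/* Mirrors the old noResultsFoundText behaviour. */}
        {filtered.length === 0 && <SelectOption isDisabled>No results found</SelectOption>}
      </SelectList>
    </Select>
  );
};
```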

 

multiselectdropdown.tsx (multiple typeahead with placeholder and noResultsFoundText); only used in packages/local-storage-operator; moved to https://issues.redhat.com/browse/CONSOLE-4227

UtilizationDurationDropdown.tsx (checkbox select, plain toggle, with placeholder text; see the checkbox-select sketch after this list)

SelectInputField.tsx  (uses most Select props) moved to https://issues.redhat.com/browse/ODC-7655

QueryBrowser.tsx  (Currently using DropdownDeprecated, should be using a Select)

 

Note

SelectDeprecated and SelectOptionDeprecated are replaced with the latest Select component

https://www.patternfly.org/components/menus/menu

https://www.patternfly.org/components/menus/select

 

AC: Go through the mentioned files and swap the usage of SelectDeprecated with the PF Select component.
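
For the checkbox-select cases such as UtilizationDurationDropdown, a hedged sketch using the new Select with hasCheckbox options, assuming the @patternfly/react-core v5+ composable API; the duration values, badge count, and component name are illustrative only.

```tsx
import * as React from 'react';
import {
  Badge,
  MenuToggle,
  MenuToggleElement,
  Select,
  SelectList,
  SelectOption,
} from '@patternfly/react-core';

// Hypothetical duration options; the real dropdown derives these from shared constants.
const durations = ['1 hour', '6 hours', '24 hours'];

const DurationCheckboxSelect: React.FC = () => {
  const [isOpen, setIsOpen] = React.useState(false);
  const [selected, setSelected] = React.useState<string[]>([]);

  const onSelect = (_event: React.MouseEvent<Element, MouseEvent> | undefined, value: string | number | undefined) => {
    const duration = value as string;
    setSelected((prev) =>
      prev.includes(duration) ? prev.filter((d) => d !== duration) : [...prev, duration],
    );
  };

  return (
    <Select
      role="menu"
      isOpen={isOpen}
      selected={selected}
      onSelect={onSelect}
      onOpenChange={setIsOpen}
      toggle={(toggleRef: React.Ref<MenuToggleElement>) => (
        <MenuToggle ref={toggleRef} variant="plain" onClick={() => setIsOpen(!isOpen)} isExpanded={isOpen}>
          Filter by duration {selected.length > 0 && <Badge isRead>{selected.length}</Badge>}
        </MenuToggle>
      )}
    >
      <SelectList>
        {durations.map((duration) => (
          <SelectOption key={duration} value={duration} hasCheckbox isSelected={selected.includes(duration)}>
            {duration}
          </SelectOption>
        ))}
      </SelectList>
    </Select>
  );
};
```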

Epic Goal

  • The goal of this epic is to prepare the console codebase as well as the dynamic plugins SDK. In order to do that, we need to identify the areas in console that need to be updated and the issues that need to be fixed.

Why is this important?

  • The console as well as its dynamic plugins will need to support PF6 once it's available in a stable version.

Acceptance Criteria

  • Identify all the areas of code that need to be updated or fixed
  • Create stories which will address those updates and fixes

Open questions:

  1. Should we be removing PF4 as part of 4.16?

NOTE:

Nicole Thoen has already started drafting a document on technical debt impeding PF6 migrations, which contains a list of identified tech-debt items, deprecated components, etc.

monitoring/dashboards/index.tsx (checkbox select, hasInlineFilter, which is not supported in the V6 Select; convert to Typeahead)

timespan-dropdown.ts (Currently using DropdownDeprecated, should be using a Select) 

poll-interval-dropdown.tsx (Currently using DropdownDeprecated, should be using a Select) 

SelectInputField.tsx  (uses most Select props)

 

`FilterSelect`, `VariableDropdown`, `TimespanDropdown`, and `IntervalDropdown` are the components that need to be updated; frontend/packages/dev-console/src/components/monitoring/MonitoringPage.tsx is the only valid usage of `MonitoringDashboardsPage`, as web/src/components/alerting.tsx is orphaned.

 

Note

Usages of SelectDeprecated are replaced with the latest Select component

https://www.patternfly.org/components/menus/menu

https://www.patternfly.org/components/menus/select

 

AC: Go through the mentioned files and swap the usage of the deprecated components with PF components, based on their semantics (either Dropdown or Select components).

 

Locations

frontend/packages/pipelines-plugin/src/components/

PipelineQuickSearchVersionDropdown.tsx (Currently using DropdownDeprecated, should be using a Select)

PipelineMetricsTimeRangeDropdown.tsx (Currently using DropdownDeprecated, should be using a Select)

 

Note

Usages of DropdownDeprecated are replaced with the latest Select component

https://www.patternfly.org/components/menus/menu

https://www.patternfly.org/components/menus/select

 

AC: Go through the mentioned files and swap the usage of DropdownDeprecated with the PF Select component.

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

<your text here>

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

<your text here>

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

<enter general Feature acceptance here>

 

Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, ensure you provide the OCPSTRAT (for the future to-be-supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both  
Classic (standalone cluster)  
Hosted control planes  
Multi node, Compact (three node), or Single node (SNO), or all  
Connected / Restricted Network  
Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
Operator compatibility  
Backport needed (list applicable versions)  
UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
Other (please specify)  

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context that is needed to frame the feature.  Initial completion during Refinement status.

<your text here>

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.