Note: this page shows the Feature-Based Change Log for a release
These features were completed when this image was assembled
Today we expose two main APIs for HyperShift, namely `HostedCluster` and `NodePool`. We also have metrics to gauge adoption by reporting the number of hosted clusters and NodePools.
But we are still missing other metrics needed to draw correct inferences from what we see in the data.
Today we expose the `hypershift_hostedcluster_nodepools` metric to report the number of NodePools used per hosted cluster.
Additional NodePool metrics such as `hypershift_nodepools_size` and `hypershift_nodepools_available_replicas` are available but not ingested in Telemetry.
In addition to knowing how many NodePools exist per hosted cluster, we would like to expose the NodePool size.
This will help inform our decision making and provide some insights on how the product is being adopted/used.
The main goal of this epic is to show the following NodePools metrics on Telemeter, ideally as recording rules:
The implementation involves updates to the following GitHub repositories:
similar PRs:
https://github.com/openshift/hypershift/pull/1544
https://github.com/openshift/cluster-monitoring-operator/pull/1710
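A rough sketch of what such recording rules could look like; the rule names, label sets, and namespace are illustrative assumptions, not the final Telemetry configuration:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hypershift-nodepool-telemetry-rules   # illustrative name
  namespace: openshift-monitoring
spec:
  groups:
  - name: hypershift.nodepools.rules
    rules:
    # Pre-aggregate NodePool size so only a small set of series is sent to Telemeter.
    - record: cluster:hypershift_nodepools_size:sum
      expr: sum by (namespace, name) (hypershift_nodepools_size)
    - record: cluster:hypershift_nodepools_available_replicas:sum
      expr: sum by (namespace, name) (hypershift_nodepools_available_replicas)
```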
Graduate the new PV access mode ReadWriteOncePod to GA.
Such a PV/PVC can be used only by a single pod on a single node, compared to the traditional ReadWriteOnce access mode, where a PV/PVC can be used on a single node by many pods.
Customers can start using the new ReadWriteOncePod access mode.
This new mode allows customers to provision and attach a PV with the guarantee that it cannot be attached to another local pod.
This new mode should support the same operations as regular ReadWriteOnce PVs, therefore it should pass the regression tests. We should also ensure that such a PV can't be accessed by another local-to-node pod.
As a user I want to attach a PV to a pod and ensure that it can't be accessed by another local pod.
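A minimal sketch of such a claim, assuming the chosen storage class supports the new access mode (names and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-pvc    # illustrative name
spec:
  accessModes:
  - ReadWriteOncePod         # only one pod on one node may use the volume
  resources:
    requests:
      storage: 1Gi
```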
We are getting this feature from upstream as GA. We need to test it and fully support it.
Check that there are no limitations or regressions.
Remove tech preview warning. No additional change.
N/A
Support the upstream feature "New RWO access mode" in OCP as GA, i.e. test it and have docs for it.
This is a continuation of STOR-1171 (Beta/Tech Preview in 4.14); now we just need to mark it as GA and remove all Tech Preview notes from docs.
Currently the maximum number of snapshots per volume in vSphere CSI is set to 3 and cannot be configured. Customers find this default limit too low and are asking us to make this setting configurable.
The maximum number of snapshots is 32 per volume.
Customers can override the default (three) value and set it to a custom value.
Make sure we document (or link) the VMware recommendations in terms of performance.
https://kb.vmware.com/s/article/1025279
The setting can be easily configurable by the OCP admin and the configuration is automatically updated. Test that the setting is indeed applied and the maximum number of snapshots per volume is indeed changed.
No change in the default
As an OCP admin, I would like to change the maximum number of snapshots per volume.
Anything outside of
The default value can't be overwritten, reconciliation prevents it.
Make sure the customers understand the impact of increasing the number of snapshots per volume.
https://kb.vmware.com/s/article/1025279
Document how to change the value as well as a link to the best practice. Mention that there is a 32 hard limit. Document other limitations if any.
N/A
Epic Goal*
The goal of this epic is to allow admins to configure the maximum number of snapshots per volume in vSphere CSI and find a way to add such an extension to the OCP API.
Possible future candidates:
Why is this important? (mandatory)
Currently the maximum number of snapshots per volume in vSphere CSI is set to 3 and cannot be configured. Customers find this default limit too low and are asking us to make this setting configurable.
The maximum number of snapshots is 32 per volume.
https://kb.vmware.com/s/article/1025279
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
1) Write OpenShift enhancement (STOR-1759)
2) Extend ClusterCSIDriver API (TechPreview) (STOR-1803)
3) Update vSphere operator to use the new snapshot options (STOR-1804)
4) Promote feature from Tech Preview to Accessible-by-default (STOR-1839)
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Configure the maximum number of snapshots to a higher value. Check that the config has been updated and verify that the maximum number of snapshots per volume maps to the new setting value.
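A hedged sketch of what the opt-in could look like on the ClusterCSIDriver object; the exact field name is an assumption pending the enhancement (STOR-1759):

```yaml
apiVersion: operator.openshift.io/v1
kind: ClusterCSIDriver
metadata:
  name: csi.vsphere.vmware.com
spec:
  driverConfig:
    driverType: vSphere
    vSphere:
      # Assumed field name; raises the default of 3 and must stay <= the vSphere hard limit of 32.
      globalMaxSnapshotsPerBlockVolume: 10
```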
Drawbacks or Risk (optional)
Setting this configuration to a high value can introduce performance issues. This needs to be documented.
https://kb.vmware.com/s/article/1025279
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Support the SMB CSI driver through an OLM operator as Tech Preview. The SMB CSI driver allows OCP to consume SMB/CIFS storage with a dynamic CSI driver. This enables customers to leverage their existing storage infrastructure with either a Samba or Microsoft environment.
https://github.com/kubernetes-csi/csi-driver-smb
Customers can start testing connecting OCP to their backends exposing CIFS. This allows them to consume net-new volumes or existing data produced outside OCP.
The driver already exists and is under the storage SIG umbrella. We need to make sure the driver meets OCP quality requirements and, if so, develop an operator to deploy and maintain it.
Review and clearly define all driver limitations and corner cases.
Review the different authentication methods.
Windows containers support.
Only the storage class login/password authentication method is in scope. Other methods can be reviewed and considered for GA.
Customers expect to consume storage, and possibly existing data, via SMB/CIFS. As of today, vendor driver support for CIFS is really limited, whereas this protocol is widely used on premises, especially with MS/AD customers.
Need to understand what customers expect in terms of authentication.
How to extend this feature to windows containers.
Document the operator and driver installation, usage capabilities and limitations.
Future: How to manage interoperability with windows containers (not for TP)
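A minimal sketch of the storage-class login/password authentication flow described above, based on upstream csi-driver-smb conventions; the share path, secret name, and namespace are illustrative:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: smbcreds                       # illustrative name
  namespace: openshift-cluster-csi-drivers
stringData:
  username: "smb-user"
  password: "smb-password"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: smb
provisioner: smb.csi.k8s.io
parameters:
  source: //smb-server.example.com/share
  csi.storage.k8s.io/provisioner-secret-name: smbcreds
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-cluster-csi-drivers
  csi.storage.k8s.io/node-stage-secret-name: smbcreds
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-cluster-csi-drivers
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

Keeping the credentials in a Secret referenced by the StorageClass keeps them out of the class definition itself and maps directly onto the driver's node-stage step.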
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
The Azure File CSI driver currently lacks cloning and snapshot restore features. The goal of this feature is to support the cloning feature as Technology Preview. This will help support snapshot restore in a future release.
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
As a user, I want to easily clone an Azure File volume by creating a new PVC with `spec.dataSource` referencing the origin volume.
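A minimal sketch of such a clone request; claim names and sizes are illustrative, and the target size must be at least the source size:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc           # illustrative name
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: azurefile-csi
  resources:
    requests:
      storage: 100Gi
  dataSource:
    kind: PersistentVolumeClaim
    name: origin-pvc         # the existing claim being cloned
```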
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
This feature only applies to OCP running on Azure / ARO and File CSI.
The usual CSI cloning CI must pass.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | both |
Classic (standalone cluster) | yes |
Hosted control planes | yes |
Multi node, Compact (three node), or Single node (SNO), or all | all although SNO is rare on Azure |
Connected / Restricted Network | both |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | x86 |
Operator compatibility | Azure File CSI operator |
Backport needed (list applicable versions) | No |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | No |
Other (please specify) | ship downstream images built from the forked azcopy |
High-level list of items that are out of scope. Initial completion during Refinement status.
Restoring snapshots is out of scope for now.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Update the CSI capability matrix and any language that mentions that Azure File CSI does not support cloning.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
No impact, but it benefits Azure/ARO customers.
Epic Goal*
Azure File added support for cloning volumes, which relies on the azcopy command upstream. We need to fork azcopy so we can build and ship downstream images from the forked azcopy. The AWS driver does the same with efs-utils.
Upstream repo: https://github.com/Azure/azure-storage-azcopy
NOTE: using snapshots as a source is currently not supported: https://github.com/kubernetes-sigs/azurefile-csi-driver/blob/7591a06f5f209e4ef780259c1631608b333f2c20/pkg/azurefile/controllerserver.go#L732
Why is this important? (mandatory)
This is required for adding Azure File cloning feature support.
Scenarios (mandatory)
1. As a user, I want to easily clone an Azure File volume by creating a new PVC with `spec.dataSource` referencing the origin volume.
Dependencies (internal and external) (mandatory)
1) Write OpenShift enhancement (STOR-1757)
2) Fork upstream repo (STOR-1716)
3) Add ART definition for OCP Component (STOR-1755)
4) Use the new image as base image for Azure File driver (STOR-1794)
5) Ensure e2e cloning tests are in CI (STOR-1818)
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Downstream Azure File driver image must include azcopy and cloning feature must be tested.
Drawbacks or Risk (optional)
No risks detected so far.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Once azure-file cloning is supported, we should add a clone test to our pre-submit/periodic CI.
The "pvcDataSource: true" capability should be added.
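A hedged sketch of where that capability would live in the external storage test driver manifest; the surrounding fields are shown only for context and are assumptions:

```yaml
# Illustrative fragment of the e2e external storage test driver manifest.
StorageClass:
  FromExistingClassName: azurefile-csi
DriverInfo:
  Name: file.csi.azure.com
  Capabilities:
    persistence: true
    pvcDataSource: true   # enables the upstream PVC-as-DataSource (cloning) tests
```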
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
In order to remove IPI/UPI support for Alibaba Cloud in OpenShift (currently Tech Preview, see also OCPSTRAT-1042), we need to provide an alternate method for Alibaba Cloud customers to spin up an OpenShift cluster. To that end, we want customers to use Assisted Installer with platform=none (and later platform=external) to bring up their OpenShift clusters.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | Self-managed |
Classic (standalone cluster) | Classic |
Hosted control planes | N/A |
Multi node, Compact (three node), or Single node (SNO), or all | Multi-node |
Connected / Restricted Network | Connected for OCP 4.16 (Future: restricted) |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | x86_x64 |
Operator compatibility | This should be the same for any operator on platform=none |
Backport needed (list applicable versions) | OpenShift 4.16 onwards |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | Hybrid Cloud Console changes needed |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
For OpenShift 4.16, we want to remove IPI support (currently Tech Preview) for Alibaba Cloud (OCPSTRAT-1042). Instead, we want to make it Assisted Installer (Tech Preview) with the agnostic platform for Alibaba Cloud in OpenShift 4.16 (OCPSTRAT-1149).
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Previous UPI-based installation doc: Alibaba Cloud Red Hat OpenShift Container Platform 4.6 Deployment Guide
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
As an Alibaba Cloud customer, I want to create an OpenShift cluster with the Assisted Installer using the agnostic platform (platform=none) for connected deployments.
<!--
Please make sure to fill all story details here with enough information so
that it can be properly sized and is immediately actionable. Our Definition
of Ready for user stories is detailed in the link below:
https://docs.google.com/document/d/1Ps9hWl6ymuLOAhX_-usLmZIP4pQ8PWO15tMksh0Lb_A/
As much as possible, make sure this story represents a small chunk of work
that could be delivered within a sprint. If not, consider the possibility
of splitting it or turning it into an epic with smaller related stories.
Before submitting it, please make sure to remove all comments like this one.
-->
USER STORY:
<!--
One sentence describing this story from an end-user perspective.
-->
As a [type of user], I want [an action] so that [a benefit/a value].
DESCRIPTION:
<!--
Provide as many details as possible, so that any team member can pick it up
and start to work on it immediately without having to reach out to you.
-->
Required:
...
Nice to have:
...
ACCEPTANCE CRITERIA:
<!--
Describe the goals that need to be achieved so that this story can be
considered complete. Note this will also help QE to write their acceptance
tests.
-->
ENGINEERING DETAILS:
<!--
Any additional information that might be useful for engineers: related
repositories or pull requests, related email threads, GitHub issues or
other online discussions, how to set up any required accounts and/or
environments if applicable, and so on.
-->
Enable Hosted Control Planes guest clusters to support up to 500 worker nodes. This enables customers to have clusters with a large number of worker nodes.
Max cluster size 250+ worker nodes (mainly about control plane). See XCMSTRAT-371 for additional information.
Service components should not be overwhelmed by additional customer workloads; they should use larger cloud instances when the worker node count exceeds the threshold and smaller cloud instances when it is below it.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | Managed |
Classic (standalone cluster) | N/A |
Hosted control planes | Yes |
Multi node, Compact (three node), or Single node (SNO), or all | N/A |
Connected / Restricted Network | Connected |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | x86_64 ARM |
Operator compatibility | N/A |
Backport needed (list applicable versions) | N/A |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | N/A |
Other (please specify) |
Check OCM and CAPI requirements to expose a larger worker node count.
As a service provider, I want to be able to:
so that I can achieve
Description of criteria:
This does not require a design proposal.
This does not require a feature gate.
CRI-O wipe is an existing feature in OpenShift. When a node reboots, CRI-O wipe clears the node of all images so that the node boots clean. When the node comes back up, it needs access to the image registry to pull all images again, which takes time. In telco and edge situations the node might not have access to the image registry, and it takes time to come up.
The goal of this feature is to adjust CRI-O wipe to wipe only images that have been corrupted because of the sudden reboot, not all images.
Phase 2 of the enclave support for oc-mirror with the following goals
For 4.17 timeframe
Adding nodes to on-prem clusters in OpenShift is in general a complex task. We have numerous methods, and the field keeps adding automation around these methods with a variety of solutions, sometimes unsupported (see "Why is this important" below). Making cluster expansion easier will let users add nodes often and fast, leading to a much improved UX.
This feature adds nodes to any on-prem clusters, regardless of their installation method (UPI, IPI, Assisted, Agent), by booting an ISO image that will add the node to the cluster specified by the user, regardless of how the cluster was installed.
1. Create image (a sketch of the referenced nmstate network-data file appears after these steps):
$ export KUBECONFIG=kubeconfig-of-target-cluster
$ oc adm node-image -o agent.iso --network-data=worker-n.nmstate --role=worker
2. Boot image
3. Check progress
$ oc adm add-node
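A hedged sketch of what the worker-n.nmstate network-data file from step 1 might contain; the interface name and addresses are illustrative:

```yaml
interfaces:
- name: eth0
  type: ethernet
  state: up
  ipv4:
    enabled: true
    dhcp: false
    address:
    - ip: 192.0.2.10
      prefix-length: 24
dns-resolver:
  config:
    server:
    - 192.0.2.1
routes:
  config:
  - destination: 0.0.0.0/0
    next-hop-address: 192.0.2.1
    next-hop-interface: eth0
```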
An important goal of this feature is to unify and eliminate some of the existing options to add nodes, aiming to provide a much simpler experience (see "Why is this important" below). We have official and field-documented ways to do this that could be removed once this feature is in place, simplifying the experience, our docs, and the maintenance of said official paths:
With this proposed workflow we eliminate the need to use the UPI method in the vast majority of cases. We also eliminate the field-documented methods that keep popping up trying to solve this in multiple formats, and the need to recommend using MCE to all on-prem users, and finally we add a simpler option for IPI-deployed clusters.
In addition, all the built-in validations in the assisted service would be run, improving the installation success rate and overall UX.
This work would have an initial impact on bare metal, vSphere, Nutanix and platform-agnostic clusters, regardless of how they were installed.
This feature is essential for several reasons. Firstly, it enables easy day2 installation without burdening the user with additional technical knowledge. This simplifies the process of scaling the cluster resources with new nodes, which today is overly complex and presents multiple options (https://docs.openshift.com/container-platform/4.13/post_installation_configuration/cluster-tasks.html#adding-worker-nodes_post-install-cluster-tasks).
Secondly, it establishes a unified experience for expanding clusters, regardless of their installation method. This streamlines the deployment process and enhances user convenience.
Another advantage is the elimination of the requirement to install the Multicluster Engine and Infrastructure Operator, which, besides demanding additional system resources, are overkill for use cases where the user simply wants to add nodes to their existing cluster but isn't managing multiple clusters yet. This results in a more efficient and lightweight cluster scaling experience.
Additionally, in the case of IPI-deployed bare metal clusters, this feature eradicates the need for nodes to have a Baseboard Management Controller (BMC) available, simplifying the expansion of bare metal clusters.
Lastly, this problem is often brought up in the field, where different custom solutions have been put in place by Red Hatters working with customers trying to solve the problem with custom automation, adding to inconsistent processes to scale clusters.
This feature will also solve the problem of cluster expansion for OCI. OCI doesn't have MAPI, and CAPI isn't in the mid-term plans. Mitsubishi shared feedback that made solving the lack of cluster expansion a requirement for Red Hat and Oracle.
We already have the basic technologies to do this with the assisted-service and the agent-based installer, which already do this work for new clusters, and from which we expect to leverage the foundations for this feature.
Day 2 node addition with agent image.
Yet Another Day 2 Node Addition Commands Proposal
Enable day2 add node using agent-install: AGENT-682
Add an integration test to verify that the add-nodes command correctly generates the ISO.
Review the proper usage and download of the envtest-related binaries (api-server and etcd).
As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer - specifically for IPI deployments which currently use Terraform for setting up the infrastructure.
To avoid an increased support overhead once the license changes at the end of the year, we want to provision GCP infrastructure without the use of Terraform.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
Description of problem:
After a successful IPI or UPI installation using minimum permissions, destroying the cluster unexpectedly keeps reporting the error "failed to list target tcp proxies: googleapi: Error 403: Required 'compute.regionTargetTcpProxies.list' permission".
Version-Release number of selected component (if applicable):
4.17.0-0.nightly-2024-09-01-175607
How reproducible:
Always
Steps to Reproduce:
1. Try IPI or UPI installation using minimum permissions, and make sure it succeeds.
2. Destroy the cluster using the same GCP credentials.
Actual results:
It keeps reporting the errors below until timeout.
08-27 14:51:40.508 level=debug msg=Target TCP Proxies: failed to list target tcp proxies: googleapi: Error 403: Required 'compute.regionTargetTcpProxies.list' permission for 'projects/openshift-qe', forbidden
...output omitted...
08-27 15:08:18.801 level=debug msg=Target TCP Proxies: failed to list target tcp proxies: googleapi: Error 403: Required 'compute.regionTargetTcpProxies.list' permission for 'projects/openshift-qe', forbidden
Expected results:
It should not try to list regional target TCP proxies, because the CAPI installation only creates a global target TCP proxy, and the service account given to the installer already has the required compute.targetTcpProxies permissions (see [1] and [2]).
Additional info:
FYI the latest IPI PROW CI test was about 19 days ago, with no such issue, see https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-gcp-ipi-mini-perm-custom-type-f28/1823483536926052352
Required GCP permissions for installer-provisioned infrastructure: https://docs.openshift.com/container-platform/4.16/installing/installing_gcp/installing-gcp-account.html#minimum-required-permissions-ipi-gcp_installing-gcp-account
Required GCP permissions for user-provisioned infrastructure: https://docs.openshift.com/container-platform/4.16/installing/installing_gcp/installing-gcp-user-infra.html#minimum-required-permissions-upi-gcp_installing-gcp-user-infra
Description of problem:
Shared VPC installation using a service account having all required permissions failed because the ingress cluster operator was degraded, reporting the error "error getting load balancer's firewall: googleapi: Error 403: Required 'compute.firewalls.get' permission for 'projects/openshift-qe-shared-vpc/global/firewalls/k8s-fw-a5b1f420669b3474d959cff80e8452dc'".
Version-Release number of selected component (if applicable):
4.17.0-0.nightly-multi-2024-08-07-221959
How reproducible:
Always
Steps to Reproduce:
1. "create install-config", then insert the interested settings (see [1])
2. "create cluster" (see [2])
Actual results:
Installation failed, because the ingress cluster operator is degraded (see [2] and [3]).
$ oc get co ingress
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress             False       True          True       113m    The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error getting load balancer's firewall: googleapi: Error 403: Required 'compute.firewalls.get' permission for 'projects/openshift-qe-shared-vpc/global/firewalls/k8s-fw-a5b1f420669b3474d959cff80e8452dc', forbidden...
$
In fact the mentioned k8s firewall-rule doesn't exist in the host project (see [4]), and the given service account does have enough permissions (see [6]).
Expected results:
Installation succeeds, and all cluster operators are healthy.
Additional info:
Description of problem:
Installing into a Shared VPC gets stuck waiting for the network infrastructure to become ready.
Version-Release number of selected component (if applicable):
4.17.0-0.nightly-2024-06-10-225505
How reproducible:
Always
Steps to Reproduce:
1. "create install-config" and then insert Shared VPC settings (see [1])
2. activate the service account which has the minimum permissions in the host project (see [2])
3. "create cluster"
FYI the GCP project "openshift-qe" is the service project, and the GCP project "openshift-qe-shared-vpc" is the host project.
Actual results:
1. Getting stuck in waiting for network infrastructure to become ready, until Ctrl+C is pressed.
2. 2 firewall-rules are created in the service project unexpectedly (see [3]).
Expected results:
The installation should succeed, and no firewall-rules should be created in either the service project or the host project.
Additional info:
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
Allow customers to enable EFS CSI usage metrics.
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
OCP already supports exposing CSI usage metrics; however, the EFS metrics are not enabled by default. The goal of this feature is to allow customers to optionally turn on EFS CSI usage metrics in order to see them in the OCP console.
The EFS metrics are not enabled by default for a good reason: they can potentially impact performance. They are disabled in OCP because the CSI driver would walk through the whole volume, and that can be very slow on large volumes. For this reason, the default will remain the same (no metrics); customers need to explicitly opt in.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Clear procedure on how to enable it as a day 2 operation. The default remains no metrics. Once enabled, the metrics should be available for visualisation.
We should also have a way to disable metrics.
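A hedged sketch of what the day-2 opt-in could look like on the EFS ClusterCSIDriver object; the field names are assumptions, not the final API:

```yaml
apiVersion: operator.openshift.io/v1
kind: ClusterCSIDriver
metadata:
  name: efs.csi.aws.com
spec:
  driverConfig:
    driverType: AWS
    aws:
      efsVolumeMetrics:
        # Assumed field names; the default stays disabled, RecursiveWalk turns usage metrics on.
        state: RecursiveWalk
```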
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | both |
Classic (standalone cluster) | yes |
Hosted control planes | yes |
Multi node, Compact (three node), or Single node (SNO), or all | AWS only |
Connected / Restricted Network | both |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | all AWS/EFS supported |
Operator compatibility | EFS CSI operator |
Backport needed (list applicable versions) | No |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | Should appear in OCP UI automatically |
Other (please specify) | OCP on AWS only |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
As an OCP user, I want to be able to visualise the EFS CSI metrics.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
Additional metrics
Enabling metrics by default.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Customer request as per
https://issues.redhat.com/browse/RFE-3290
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
We need to be extra clear on the potential performance impact
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Document how to enable CSI metrics + warning about the potential performance impact.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
It can benefit any cluster on AWS using EFS CSI including ROSA
Epic Goal*
The goal of this epic is to provide a way for admins to turn on EFS CSI usage metrics. Since this could lead to performance issues (the CSI driver would walk through the whole volume), this option will not be enabled by default; admins will need to explicitly opt in.
Why is this important? (mandatory)
Turning on EFS metrics allows users to monitor how much EFS space is being used by OCP.
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
None
Contributing Teams(and contacts) (mandatory)
Acceptance Criteria (optional)
Enable CSI metrics via the operator - ensure the driver is started with the proper cmdline options. Verify that the metrics are sent and exposed to the users.
Drawbacks or Risk (optional)
Metrics are calculated by walking through the whole volume, which can impact performance. For this reason, enabling CSI metrics will need an explicit opt-in from the admin. This risk needs to be explicitly documented.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
AWS CAPI implementation supports "Tenancy" configuration option: https://pkg.go.dev/sigs.k8s.io/cluster-api-provider-aws@v1.5.0/api/v1beta1#AWSMachineSpec
This option corresponds to functionality OCP currently exposes through MAPI:
This option is currently in use by existing ROSA customers, and will need to be exposed in HyperShift NodePools
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This does not require a design proposal.
This requires a feature gate.
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This does not require a design proposal.
This requires a feature gate.
Wrap the NodePool tenancy API field in a struct, to group placement options and make it easy to add new ones to the API in the future.
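A hedged sketch of how the wrapped field could surface on a NodePool; the placement wrapper and value set are assumptions based on the CAPA AWSMachineSpec linked above:

```yaml
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: example-nodepool     # illustrative name
spec:
  platform:
    type: AWS
    aws:
      placement:
        tenancy: dedicated   # assumed values mirror CAPA: default | dedicated | host
```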
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
Introduce snapshot support for Azure File as Tech Preview.
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
After introducing cloning support in 4.17, the goal of this epic is to add the last remaining piece: snapshot support as Tech Preview.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Should pass all the regular CSI snapshot tests. All failing or known issues should be documented in the RN. Since this feature is TP, we can still introduce it with known issues.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | both |
Classic (standalone cluster) | yes |
Hosted control planes | yes |
Multi node, Compact (three node), or Single node (SNO), or all | all with Azure |
Connected / Restricted Network | all |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | all |
Operator compatibility | Azure File CSI |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | Already covered |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
As an OCP on Azure user I want to perform snapshots of my PVC and be able to restore them as a new PVC.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
Are there any known issues? If so, they should be documented.
High-level list of items that are out of scope. Initial completion during Refinement status.
N/A
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
We have support for CSI snapshots on other cloud providers; we need to align capabilities in Azure with their File CSI. Upstream support has lagged.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
User experience should be the same as other CSI drivers.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Add snapshot support in the CSI driver table, if there is any specific information to add, include it in the Azure File CSI driver doc. Any known issue should be documented in the RN.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Can be leveraged by ARO or OSD on Azure.
Epic Goal*
Add support for snapshots in Azure File.
Why is this important? (mandatory)
We should track upstream issues and ensure enablement in OpenShift. Snapshots are a standard feature of CSI and the reason we did not support it until now was lacking upstream support for snapshot restoration.
Snapshot restore feature was added recently in upstream driver 1.30.3 which we rebased to in 4.17 - https://github.com/kubernetes-sigs/azurefile-csi-driver/pull/1904
Furthermore, we already included the azcopy CLI, which is a dependency of cloning (and snapshots). Enabling snapshots in 4.17 is therefore just a matter of adding a sidecar, a VolumeSnapshotClass, and RBAC in csi-operator, which is cheap compared to the gain.
However, we've observed a few issues with cloning that might need further fixes before it can graduate to GA, and we intend to release the cloning feature as Tech Preview in 4.17. Since snapshots are implemented with azcopy too, we expect similar issues and suggest releasing the snapshot feature as Tech Preview first in 4.17 as well.
Scenarios (mandatory)
Users should be able to create a snapshot and restore PVC from snapshots.
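A minimal sketch of that flow using the standard snapshot APIs; the class and claim names are illustrative:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: azurefile-snapshot
spec:
  volumeSnapshotClassName: csi-azurefile-vsc   # illustrative class name
  source:
    persistentVolumeClaimName: origin-pvc
---
# Restore the snapshot into a new claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: azurefile-csi
  resources:
    requests:
      storage: 100Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: azurefile-snapshot
```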
Dependencies (internal and external) (mandatory)
azcopy - already added in scope of cloning epic
upstream driver support for snapshot restore - already added via 4.17 rebase
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Enable sharing ConfigMap and Secret across namespaces
Requirement | Notes | isMvp? |
---|---|---|
Secrets and ConfigMaps can get shared across namespaces | | YES |
NA
NA
Consumption of RHEL entitlements has been a challenge on OCP 4 since it moved to a cluster-based entitlement model, compared to the node-based (RHEL subscription manager) entitlement model. In order to provide a sufficiently similar experience to OCP 3, the entitlement certificates that are made available on the cluster (OCPBU-93) should be shared across namespaces, in order to prevent the need for the cluster admin to copy these entitlements into each namespace, which leads to additional operational challenges for updating and refreshing them.
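A hedged sketch of how the shared entitlement could be consumed through the Shared Resource CSI driver; resource names and namespaces are illustrative:

```yaml
apiVersion: sharedresource.openshift.io/v1alpha1
kind: SharedSecret
metadata:
  name: etc-pki-entitlement          # illustrative name
spec:
  secretRef:
    name: etc-pki-entitlement
    namespace: openshift-config-managed
---
# A pod in another namespace mounts the shared secret via the CSI driver.
apiVersion: v1
kind: Pod
metadata:
  name: entitled-workload
  namespace: my-namespace
spec:
  containers:
  - name: main
    image: registry.access.redhat.com/ubi9/ubi
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: entitlements
      mountPath: /etc/pki/entitlement
      readOnly: true
  volumes:
  - name: entitlements
    csi:
      driver: csi.sharedresource.openshift.io
      readOnly: true
      volumeAttributes:
        sharedSecret: etc-pki-entitlement
```

Consumers typically also need RBAC granting the use verb on the SharedSecret; that detail is omitted here for brevity.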
Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?
Epic Goal*
Remove the Shared Resource CSI Driver as a tech preview feature.
Why is this important? (mandatory)
Shared Resources was originally introduced as a tech preview feature in OpenShift Container Platform. After extensive review, we have decided to GA this component through the Builds for OpenShift layered product.
Expected GA will be alongside OpenShift 4.16. Therefore, it is safe to remove it in OpenShift 4.17.
Scenarios (mandatory)
Dependencies (internal and external) (mandatory)
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Drawbacks or Risk (optional)
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Ensure CSI Stack for Azure is running on management clusters with hosted control planes, allowing customers to associate a cluster as "Infrastructure only" and move the following parts of the stack:
This feature enables customers to run their Azure infrastructure more efficiently and cost-effectively by using hosted control planes and supporting infrastructure without incurring additional charges from Red Hat. Additionally, customers should care most about workloads not the management stack to operate their clusters, this feature gets us closer to this goal.
Non-CSI Stack for Azure-related functionalities are out of scope for this feature.
Workload identity authentication is not covered by this feature - see STOR-1748
This feature is designed to enable customers to run their Azure infrastructure more efficiently and cost-effectively by using HyperShift control planes and supporting infrastructure without incurring additional charges from Red Hat.
Documentation for this feature should provide clear instructions on how to enable the CSI Stack for Azure on management clusters with hosted control planes and associate a cluster as "Infrastructure only." It should also include instructions on how to move the Azure Disk CSI driver, Azure File CSI driver, and Azure File CSI driver operator to the appropriate clusters.
This feature impacts the CSI Stack for Azure and any layered products that interact with it. Interoperability test scenarios should be factored by the layered products.
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
Run Azure Disk CSI driver operator + Azure Disk CSI driver control-plane Pods in the management cluster, run the driver DaemonSet in the hosted cluster allowing customers to associate a cluster as "Infrastructure only".
Why is this important? (mandatory)
This allows customers to run their Azure infrastructure more efficiently and cost-effectively by using hosted control planes and supporting infrastructure without incurring additional charges from Red Hat. Additionally, customers should care most about workloads not the management stack to operate their clusters, this feature gets us closer to this goal.
Scenarios (mandatory)
When leveraging Hosted control planes, the Azure Disk CSI driver operator + Azure Disk CSI driver control-plane Pods should run in the management cluster. The driver DaemonSet should run on the managed cluster. This deployment model should provide the same feature set as the regular OCP deployment.
Dependencies (internal and external) (mandatory)
Hosted control plane on Azure.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Done - Checklist (mandatory)
As part of this epic, engineers working on Azure HyperShift should be able to build and use Azure Disk storage on HyperShift guests via developer preview custom build images.
For this story, we are going to enable deployment of the Azure Disk driver and operator by default in the HyperShift environment.
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
Run Azure File CSI driver operator + Azure File CSI driver control-plane Pods in the management cluster, run the driver DaemonSet in the hosted cluster allowing customers to associate a cluster as "Infrastructure only".
Why is this important? (mandatory)
This allows customers to run their Azure infrastructure more efficiently and cost-effectively by using hosted control planes and supporting infrastructure without incurring additional charges from Red Hat. Additionally, customers should care most about workloads not the management stack to operate their clusters, this feature gets us closer to this goal.
Scenarios (mandatory)
When leveraging Hosted control planes, the Azure File CSI driver operator + Azure File CSI driver control-plane Pods should run in the management cluster. The driver DaemonSet should run on the managed cluster. This deployment model should provide the same feature set as the regular OCP deployment.
Dependencies (internal and external) (mandatory)
Hosted control plane on Azure.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
We need to modify csi-operator so that it can run as the Azure File operator on both HyperShift and standalone clusters.
As part of this story, we will simply move building and CI of the existing code to the combined csi-operator.
Placeholder epic to capture all Azure tickets.
TODO: review.
As an end user of a hypershift cluster, I want to be able to:
so that I can achieve
From slack thread: https://redhat-external.slack.com/archives/C075PHEFZKQ/p1722615219974739
We need 4 different certs:
Unify and update hosted control planes storage operators so that they have similar code patterns and can run properly in both standalone OCP and HyperShift's control plane.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Epic Goal*
Our current design of the EBS driver operator to support HyperShift does not scale well to other drivers. The existing design will lead to more code duplication between driver operators and a greater possibility of errors.
Why is this important? (mandatory)
An improved design will allow more storage drivers and their operators to be added to hypershift without requiring significant changes in the code internals.
Scenarios (mandatory)
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Finally switch both CI and ART to the refactored aws-ebs-csi-driver-operator.
The functionality and behavior should be the same as the existing operator, however, the code is completely new. There could be some rough edges. See https://github.com/openshift/enhancements/blob/master/enhancements/storage/csi-driver-operator-merge.md
CI should catch the most obvious errors; however, we need to test features that we do not cover in CI, such as:
Our CSI driver YAML files are mostly copy-paste from the initial CSI driver (AWS EBS?).
As OCP engineer, I want the YAML files to be generated, so we can keep consistency among the CSI drivers easily and make them less error-prone.
It should have no visible impact on the resulting operator behavior.
Support deploying an OpenShift cluster across multiple vSphere clusters, i.e. configuring multiple vCenter servers in one OpenShift cluster.
Multiple vCenter support in the Cloud Provider Interface (CPI) and the Cloud Storage Interface (CSI).
Customers want to deploy OpenShift across multiple vSphere clusters (vCenters) primarily for high availability.
Feature Overview
Support deploying an OpenShift cluster across multiple vSphere clusters, i.e. configuring multiple vCenter servers in one OpenShift cluster.
Multiple vCenter support in the Cloud Provider Interface (CPI) and the Cloud Storage Interface (CSI).
Customers want to deploy OpenShift across multiple vSphere clusters (vCenters) primarily for high availability.
This section contains all the test cases that we need to make sure work as part of the done^3 criteria.
This section contains all scenarios that are considered out of scope for this enhancement that will be done via a separate epic / feature / story.
For this task, we need to create a new periodic job that will test the multi-vCenter feature.
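For context, a hedged sketch of how multiple vCenters could be declared in install-config (field names follow the existing vSphere platform schema; the multi-entry form is illustrative and may differ from the final implementation):

platform:
  vsphere:
    vcenters:
    - server: vcenter-1.example.com
      user: administrator@vsphere.local
      password: <redacted>
      datacenters:
      - dc-1
    - server: vcenter-2.example.com
      user: administrator@vsphere.local
      password: <redacted>
      datacenters:
      - dc-2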
Add authentication to the internal components of the Agent Installer so that the cluster install is secure.
Requirements
Are there any requirements specific to the auth token?
Actors:
Do we need more than one auth scheme?
Agent-admin - agent-read-write
Agent-user - agent-read
Options for Implementation:
As a user, when creating node ISOs, I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF).
Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.
This includes (but is not limited to):
Operators:
EOL, do not upgrade:
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Create a GCP cloud-specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags (labels in GCP) on any OpenShift cloud resource that we create and manage. The behaviour should also tag existing resources that do not have the tags yet, and once the tags in the infrastructure CRD are changed, all the resources should be updated accordingly.
Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.
Due to the ongoing in-tree/out-of-tree split on the cloud and CSI providers, this should not apply to clusters with in-tree providers (!= "external").
Once confident we have all components updated, we should introduce an end-to-end test that makes sure we never create resources that are untagged.
Goals
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
List any affected packages or components.
The TechPreview featureSet check added in the machine-api-provider-gcp operator for userLabels and userTags should be removed.
The new featureGate added in openshift/api should also be removed.
Acceptance Criteria
TechPreview featureSet check added in installer for userLabels and userTags should be removed and the TechPreview reference made in the install-config GCP schema should be removed.
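For reference, a minimal sketch of the userLabels/userTags stanza in install-config that this GA work unlocks (values are illustrative):

platform:
  gcp:
    projectID: my-project
    region: us-central1
    userLabels:
    - key: team
      value: storage
    userTags:
    - parentID: "1234567890"
      key: environment
      value: test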
Acceptance Criteria
As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.
To avoid an increased support overhead once the license changes at the end of the year, we want to provision Azure infrastructure without the use of Terraform.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Description of problem:
CAPZ creates an empty route table during installs
Version-Release number of selected component (if applicable):
4.17
How reproducible:
Very
Steps to Reproduce:
1. Install IPI cluster using CAPZ
Actual results:
Empty route table created and attached to worker subnet
Expected results:
No route table created
Additional info:
Description of problem:
Failed to create second cluster in shared vnet, below error is thrown out during creating network infrastructure when creating 2nd cluster, installer timed out and exited. ============== 07-23 14:09:27.315 level=info msg=Waiting up to 15m0s (until 6:24AM UTC) for network infrastructure to become ready... ... 07-23 14:16:14.900 level=debug msg= failed to reconcile cluster services: failed to reconcile AzureCluster service loadbalancers: failed to create or update resource jima0723b-1-x6vpp-rg/jima0723b-1-x6vpp-internal (service: loadbalancers): PUT https://management.azure.com/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-1-x6vpp-rg/providers/Microsoft.Network/loadBalancers/jima0723b-1-x6vpp-internal 07-23 14:16:14.900 level=debug msg= -------------------------------------------------------------------------------- 07-23 14:16:14.901 level=debug msg= RESPONSE 400: 400 Bad Request 07-23 14:16:14.901 level=debug msg= ERROR CODE: PrivateIPAddressIsAllocated 07-23 14:16:14.901 level=debug msg= -------------------------------------------------------------------------------- 07-23 14:16:14.901 level=debug msg= { 07-23 14:16:14.901 level=debug msg= "error": { 07-23 14:16:14.901 level=debug msg= "code": "PrivateIPAddressIsAllocated", 07-23 14:16:14.901 level=debug msg= "message": "IP configuration /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-1-x6vpp-rg/providers/Microsoft.Network/loadBalancers/jima0723b-1-x6vpp-internal/frontendIPConfigurations/jima0723b-1-x6vpp-internal-frontEnd is using the private IP address 10.0.0.100 which is already allocated to resource /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/frontendIPConfigurations/jima0723b-49hnw-internal-frontEnd.", 07-23 14:16:14.902 level=debug msg= "details": [] 07-23 14:16:14.902 level=debug msg= } 07-23 14:16:14.902 level=debug msg= } 07-23 14:16:14.902 level=debug msg= -------------------------------------------------------------------------------- Install-config for 1st cluster: ========= metadata: name: jima0723b platform: azure: region: eastus baseDomainResourceGroupName: os4-common networkResourceGroupName: jima0723b-rg virtualNetwork: jima0723b-vnet controlPlaneSubnet: jima0723b-master-subnet computeSubnet: jima0723b-worker-subnet publish: External Install-config for 2nd cluster: ======== metadata: name: jima0723b-1 platform: azure: region: eastus baseDomainResourceGroupName: os4-common networkResourceGroupName: jima0723b-rg virtualNetwork: jima0723b-vnet controlPlaneSubnet: jima0723b-master-subnet computeSubnet: jima0723b-worker-subnet publish: External shared master subnet/worker subnet: $ az network vnet subnet list -g jima0723b-rg --vnet-name jima0723b-vnet -otable AddressPrefix Name PrivateEndpointNetworkPolicies PrivateLinkServiceNetworkPolicies ProvisioningState ResourceGroup --------------- ----------------------- -------------------------------- ----------------------------------- ------------------- --------------- 10.0.0.0/24 jima0723b-master-subnet Disabled Enabled Succeeded jima0723b-rg 10.0.1.0/24 jima0723b-worker-subnet Disabled Enabled Succeeded jima0723b-rg internal lb frontedIPConfiguration on 1st cluster: $ az network lb show -n jima0723b-49hnw-internal -g jima0723b-49hnw-rg --query 'frontendIPConfigurations' [ { "etag": "W/\"7a7531ca-fb02-48d0-b9a6-d3fb49e1a416\"", "id": 
"/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/frontendIPConfigurations/jima0723b-49hnw-internal-frontEnd", "inboundNatRules": [ { "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/inboundNatRules/jima0723b-49hnw-master-0", "resourceGroup": "jima0723b-49hnw-rg" }, { "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/inboundNatRules/jima0723b-49hnw-master-1", "resourceGroup": "jima0723b-49hnw-rg" }, { "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/inboundNatRules/jima0723b-49hnw-master-2", "resourceGroup": "jima0723b-49hnw-rg" } ], "loadBalancingRules": [ { "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/loadBalancingRules/LBRuleHTTPS", "resourceGroup": "jima0723b-49hnw-rg" }, { "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-49hnw-rg/providers/Microsoft.Network/loadBalancers/jima0723b-49hnw-internal/loadBalancingRules/sint-v4", "resourceGroup": "jima0723b-49hnw-rg" } ], "name": "jima0723b-49hnw-internal-frontEnd", "privateIPAddress": "10.0.0.100", "privateIPAddressVersion": "IPv4", "privateIPAllocationMethod": "Static", "provisioningState": "Succeeded", "resourceGroup": "jima0723b-49hnw-rg", "subnet": { "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima0723b-rg/providers/Microsoft.Network/virtualNetworks/jima0723b-vnet/subnets/jima0723b-master-subnet", "resourceGroup": "jima0723b-rg" }, "type": "Microsoft.Network/loadBalancers/frontendIPConfigurations" } ] From above output, privateIPAllocationMethod is static and always allocate privateIPAddress to 10.0.0.100, this might cause the 2nd cluster installation failure. Checked the same on cluster created by using terraform, privateIPAllocationMethod is dynamic. 
=============== $ az network lb show -n wxjaz723-pm99k-internal -g wxjaz723-pm99k-rg --query 'frontendIPConfigurations' [ { "etag": "W/\"e6bec037-843a-47ba-a725-3f322564be58\"", "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/wxjaz723-pm99k-rg/providers/Microsoft.Network/loadBalancers/wxjaz723-pm99k-internal/frontendIPConfigurations/internal-lb-ip-v4", "loadBalancingRules": [ { "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/wxjaz723-pm99k-rg/providers/Microsoft.Network/loadBalancers/wxjaz723-pm99k-internal/loadBalancingRules/api-internal-v4", "resourceGroup": "wxjaz723-pm99k-rg" }, { "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/wxjaz723-pm99k-rg/providers/Microsoft.Network/loadBalancers/wxjaz723-pm99k-internal/loadBalancingRules/sint-v4", "resourceGroup": "wxjaz723-pm99k-rg" } ], "name": "internal-lb-ip-v4", "privateIPAddress": "10.0.0.4", "privateIPAddressVersion": "IPv4", "privateIPAllocationMethod": "Dynamic", "provisioningState": "Succeeded", "resourceGroup": "wxjaz723-pm99k-rg", "subnet": { "id": "/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/wxjaz723-rg/providers/Microsoft.Network/virtualNetworks/wxjaz723-vnet/subnets/wxjaz723-master-subnet", "resourceGroup": "wxjaz723-rg" }, "type": "Microsoft.Network/loadBalancers/frontendIPConfigurations" }, ... ]
Version-Release number of selected component (if applicable):
4.17 nightly build
How reproducible:
Always
Steps to Reproduce:
1. Create shared vnet / master subnet / worker subnet
2. Create 1st cluster in shared vnet
3. Create 2nd cluster in shared vnet
Actual results:
2nd cluster installation failed
Expected results:
Both clusters are installed successfully.
Additional info:
Description of problem:
Install Azure fully private IPI cluster by using CAPI with payload built from cluster bot including openshift/installer#8727, openshift/installer#8732.

install-config:
=================
platform:
  azure:
    region: eastus
    outboundType: UserDefinedRouting
    networkResourceGroupName: jima24b-rg
    virtualNetwork: jima24b-vnet
    controlPlaneSubnet: jima24b-master-subnet
    computeSubnet: jima24b-worker-subnet
publish: Internal
featureSet: TechPreviewNoUpgrade

Checked the storage account created by the installer; its property allowBlobPublicAccess is set to True.

$ az storage account list -g jima24b-fwkq8-rg --query "[].[name,allowBlobPublicAccess]" -o tsv
jima24bfwkq8sa True

This is not consistent with the terraform code, https://github.com/openshift/installer/blob/master/data/data/azure/vnet/main.tf#L74
At least, the storage account should have no public access for a fully private cluster.
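For reference, one hedged way to check or flip that property manually with the Azure CLI (shown only for illustration, not as the fix itself):

$ az storage account update -g jima24b-fwkq8-rg -n jima24bfwkq8sa --allow-blob-public-access false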
Version-Release number of selected component (if applicable):
4.17 nightly build
How reproducible:
Always
Steps to Reproduce:
1. Create fully private cluster
2. Check storage account created by installer
Actual results:
storage account have public access on fully private cluster.
Expected results:
storage account should have no public access on fully private cluster.
Additional info:
Description of problem:
In the install-config file, there is no zone/instance type setting under controlPlane or defaultMachinePlatform:
==========================
featureSet: CustomNoUpgrade
featureGates:
- ClusterAPIInstallAzure=true
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3

After creating the cluster, master instances should be created across multiple zones, since the default instance type 'Standard_D8s_v3' has availability zones. Actually, master instances are not created in any zone.

$ az vm list -g jima24a-f7hwg-rg -otable
Name                                        ResourceGroup     Location        Zones
------------------------------------------  ----------------  --------------  -------
jima24a-f7hwg-master-0                      jima24a-f7hwg-rg  southcentralus
jima24a-f7hwg-master-1                      jima24a-f7hwg-rg  southcentralus
jima24a-f7hwg-master-2                      jima24a-f7hwg-rg  southcentralus
jima24a-f7hwg-worker-southcentralus1-wxncv  jima24a-f7hwg-rg  southcentralus  1
jima24a-f7hwg-worker-southcentralus2-68nxv  jima24a-f7hwg-rg  southcentralus  2
jima24a-f7hwg-worker-southcentralus3-4vts4  jima24a-f7hwg-rg  southcentralus  3
Version-Release number of selected component (if applicable):
4.17.0-0.nightly-2024-06-23-145410
How reproducible:
Always
Steps to Reproduce:
1. CAPI-based install on azure platform with default configuration
Actual results:
master instances are created but not in any zone.
Expected results:
master instances should be created per zone based on selected instance type, keep the same behavior as terraform based install.
Additional info:
When setting zones under controlPlane in install-config, master instances can be created per zone.
install-config:
===========================
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    azure:
      zones: ["1","3"]

$ az vm list -g jima24b-p76w4-rg -otable
Name                                        ResourceGroup     Location        Zones
------------------------------------------  ----------------  --------------  -------
jima24b-p76w4-master-0                      jima24b-p76w4-rg  southcentralus  1
jima24b-p76w4-master-1                      jima24b-p76w4-rg  southcentralus  3
jima24b-p76w4-master-2                      jima24b-p76w4-rg  southcentralus  1
jima24b-p76w4-worker-southcentralus1-bbcx8  jima24b-p76w4-rg  southcentralus  1
jima24b-p76w4-worker-southcentralus2-nmgfd  jima24b-p76w4-rg  southcentralus  2
jima24b-p76w4-worker-southcentralus3-x2p7g  jima24b-p76w4-rg  southcentralus  3
Description of problem:
Launch CAPI based installation on Azure Government Cloud, installer was timeout when waiting for network infrastructure to become ready. 06-26 09:08:41.153 level=info msg=Waiting up to 15m0s (until 9:23PM EDT) for network infrastructure to become ready... ... 06-26 09:09:33.455 level=debug msg=E0625 21:09:31.992170 22172 azurecluster_controller.go:231] "failed to reconcile AzureCluster" err=< 06-26 09:09:33.455 level=debug msg= failed to reconcile AzureCluster service group: reconcile error that cannot be recovered occurred: resource is not Ready: The subscription '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7' could not be found.: PUT https://management.azure.com/subscriptions/8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7/resourceGroups/jima26mag-9bqkl-rg 06-26 09:09:33.456 level=debug msg= -------------------------------------------------------------------------------- 06-26 09:09:33.456 level=debug msg= RESPONSE 404: 404 Not Found 06-26 09:09:33.456 level=debug msg= ERROR CODE: SubscriptionNotFound 06-26 09:09:33.456 level=debug msg= -------------------------------------------------------------------------------- 06-26 09:09:33.456 level=debug msg= { 06-26 09:09:33.456 level=debug msg= "error": { 06-26 09:09:33.456 level=debug msg= "code": "SubscriptionNotFound", 06-26 09:09:33.456 level=debug msg= "message": "The subscription '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7' could not be found." 06-26 09:09:33.456 level=debug msg= } 06-26 09:09:33.456 level=debug msg= } 06-26 09:09:33.456 level=debug msg= -------------------------------------------------------------------------------- 06-26 09:09:33.456 level=debug msg= . Object will not be requeued 06-26 09:09:33.456 level=debug msg= > logger="controllers.AzureClusterReconciler.reconcileNormal" controller="azurecluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AzureCluster" AzureCluster="openshift-cluster-api-guests/jima26mag-9bqkl" namespace="openshift-cluster-api-guests" reconcileID="f2ff1040-dfdd-4702-ad4a-96f6367f8774" x-ms-correlation-request-id="d22976f0-e670-4627-b6f3-e308e7f79def" name="jima26mag-9bqkl" 06-26 09:09:33.457 level=debug msg=I0625 21:09:31.992215 22172 recorder.go:104] "failed to reconcile AzureCluster: failed to reconcile AzureCluster service group: reconcile error that cannot be recovered occurred: resource is not Ready: The subscription '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7' could not be found.: PUT https://management.azure.com/subscriptions/8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7/resourceGroups/jima26mag-9bqkl-rg\n--------------------------------------------------------------------------------\nRESPONSE 404: 404 Not Found\nERROR CODE: SubscriptionNotFound\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"SubscriptionNotFound\",\n \"message\": \"The subscription '8fe0c1b4-8b05-4ef7-8129-7cf5680f27e7' could not be found.\"\n }\n}\n--------------------------------------------------------------------------------\n. 
Object will not be requeued" logger="events" type="Warning" object={"kind":"AzureCluster","namespace":"openshift-cluster-api-guests","name":"jima26mag-9bqkl","uid":"20bc01ee-5fbe-4657-9d0b-7013bd55bf96","apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","resourceVersion":"1115"} reason="ReconcileError" 06-26 09:17:40.081 level=debug msg=I0625 21:17:36.066522 22172 helpers.go:516] "returning early from secret reconcile, no update needed" logger="controllers.reconcileAzureSecret" controller="ASOSecret" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AzureCluster" AzureCluster="openshift-cluster-api-guests/jima26mag-9bqkl" namespace="openshift-cluster-api-guests" name="jima26mag-9bqkl" reconcileID="2df7c4ba-0450-42d2-901e-683de399f8d2" x-ms-correlation-request-id="b2bfcbbe-8044-472f-ad00-5c0786ebbe84" 06-26 09:23:46.611 level=debug msg=Collecting applied cluster api manifests... 06-26 09:23:46.611 level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: infrastructure is not ready: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline 06-26 09:23:46.611 level=info msg=Shutting down local Cluster API control plane... 06-26 09:23:46.612 level=info msg=Stopped controller: Cluster API 06-26 09:23:46.612 level=warning msg=process cluster-api-provider-azure exited with error: signal: killed 06-26 09:23:46.612 level=info msg=Stopped controller: azure infrastructure provider 06-26 09:23:46.612 level=warning msg=process cluster-api-provider-azureaso exited with error: signal: killed 06-26 09:23:46.612 level=info msg=Stopped controller: azureaso infrastructure provider 06-26 09:23:46.612 level=info msg=Local Cluster API system has completed operations 06-26 09:23:46.612 [[1;31mERROR[0;39m] Installation failed with error code '4'. Aborting execution. From above log, Azure Resource Management API endpoint is not correct, endpoint "management.azure.com" is for Azure Public cloud, the expected one for Azure Government should be "management.usgovcloudapi.net".
Version-Release number of selected component (if applicable):
4.17.0-0.nightly-2024-06-23-145410
How reproducible:
Always
Steps to Reproduce:
1. Install cluster on Azure Government Cloud, CAPI-based installation
Actual results:
Installation failed because the wrong Azure Resource Management API endpoint was used.
Expected results:
Installation succeeded.
Additional info:
Epic Goal*
There was an epic / enhancement to create a cluster-wide TLS config that applies to all OpenShift components:
https://issues.redhat.com/browse/OCPPLAN-4379
https://github.com/openshift/enhancements/blob/master/enhancements/kube-apiserver/tls-config.md
For example, this is how KCM sets --tls-cipher-suites and --tls-min-version based on the observed config:
https://issues.redhat.com/browse/WRKLDS-252
https://github.com/openshift/cluster-kube-controller-manager-operator/pull/506/files
The cluster admin can change the config based on their risk profile, but if they don't change anything, there is a reasonable default.
We should update all CSI driver operators to use this config. Right now we have a hard-coded cipher list in library-go. See OCPBUGS-2083 and OCPBUGS-4347 for background context.
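For illustration, the cluster-wide profile these operators should observe is the one set on the APIServer config resource; a minimal custom profile looks roughly like this (cipher names are just examples):

apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  tlsSecurityProfile:
    type: Custom
    custom:
      ciphers:
      - ECDHE-ECDSA-AES128-GCM-SHA256
      - ECDHE-RSA-AES128-GCM-SHA256
      minTLSVersion: VersionTLS12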
Why is this important? (mandatory)
This will keep the cipher list consistent across many OpenShift components. If the default list is changed, we get that change "for free".
It will reduce support calls from customers and backport requests when the recommended defaults change.
It will provide flexibility to the customer, since they can set their own TLS profile settings without requiring code change for each component.
Scenarios (mandatory)
As a cluster admin, I want to use TLSSecurityProfile to control the cipher list and minimum TLS version for all CSI driver operator sidecars, so that I can adjust the settings based on my own risk assessment.
Dependencies (internal and external) (mandatory)
None, the changes we depend on were already implemented.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Networking Definition of Planned
Epic Template descriptions and documentation
openshift-sdn is no longer part of OCP in 4.17, so remove references to it in the networking APIs.
Consider whether we can remove the entire network.openshift.io API, which will now be no-ops.
In places where both sdn and ovn-k are supported, remove references to sdn.
In some places (notably the migration API), we will probably leave an API in place that currently has no purpose.
Additional information on each of the above items can be found here: Networking Definition of Planned
...
1.
...
1. …
1. …
Goal:
As an administrator, I would like to use my own managed DNS solution instead of only specific openshift-install supported DNS services (such as AWS Route53, Google Cloud DNS, etc...) for my OpenShift deployment.
Problem:
While cloud-based DNS services provide convenient hostname management, there are a number of regulatory (ITAR) and operational constraints that prohibit customers from using those DNS hosting services on public cloud providers.
Why is this important:
Dependencies (internal and external):
Prioritized epics + deliverables (in scope / not in scope):
Estimate (XS, S, M, L, XL, XXL):
Previous Work:
Open questions:
Link to Epic: https://docs.google.com/document/d/1OBrfC4x81PHhpPrC5SEjixzg4eBnnxCZDr-5h3yF2QI/edit?usp=sharing
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release
Alternate scenario
#acm-1290-rename-local-cluster
Remove hard-coded local-cluster from the import local cluster feature and verify that we don't use it in the infrastructure operator
Testing the import local cluster and checking the behavior after the upgrade.
Yes.
No
No
Presently the name of the local-cluster is hardwired to "local-cluster" in the local cluster import tool.
It is possible to redefine the name of the "local-cluster" in ACM, in which case the correct local-cluster name needs to be picked up and used by the ManagedCluster import.
Suggested approach
1: Obtain the correct "local-cluster" name from the ManagedCluster CR that has been labelled as "local-cluster"
2: Use this name to import the local cluster, annotate the created AgentServiceConfig, ClusterDeployment and InfraEnv as a "local cluster"
3: Handle any updates to ManagedCluster to keep the name in sync.
4: During deletion of local cluster CRs, this annotation may be used to identify CRs to be deleted.
This will leave an edge case: there will be an AgentServiceConfig, ClusterDeployment and InfraEnv "left behind" for any users who have renamed their ManagedCluster and then performed an upgrade to this new version. Those users will need to manually remove these CRs. (I will discuss further with ACM to determine a suitable course of action here.)
This makes the following assumptions, which should also be checked with the ACM team.
1: ACM users may rename their "local-cluster" in ACM (meaning that we should pick this change up)
2: ACM will use the label "local-cluster" in the ManagedCluster CR to signify a local cluster
3: There will only be one "local-cluster" in ACM (note that it's possible to add a label arbitrarily so this may not be properly enforceable.)
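As a rough illustration of assumptions 1 and 2 above, the import code could look up the local cluster name by label instead of hardcoding it (label key assumed to be the one ACM applies to its local ManagedCluster today):

$ oc get managedcluster -l local-cluster=true -o jsonpath='{.items[0].metadata.name}'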
Requirement description:
As a VM Admin, I want to improve overall density. In our traditional VM environments, we find that we are memory bound much more than CPU bound. Even with properly sized VMs, we see a lot of memory just sitting around allocated to the VM, but not actually used. Moreover, we always see people requesting VMs that are sized way too big for their workloads. It is better customer service to allow it to some degree and then recover the memory at the hypervisor level.
MVP:
Documents:
Prometheus query for UI:
sum by (instance)(((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) + (node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes)) / node_memory_MemTotal_bytes) *100
In human words: this approximates how much overcommitment of memory is taking place. A value of 100 means RAM+SWAP usage is 100% of system RAM capacity; 105 means RAM+SWAP usage is 105% of system RAM capacity.
Threshold: Yellow 95%, Red 105%
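If these thresholds also end up expressed as alerts rather than only a UI gauge, a sketch of a PrometheusRule wiring the same query to the yellow threshold could look like this (rule name, namespace and severity are placeholders):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-overcommit
  namespace: openshift-monitoring
spec:
  groups:
  - name: memory-overcommit
    rules:
    - alert: MemoryOvercommitWarning
      expr: sum by (instance)(((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) + (node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes)) / node_memory_MemTotal_bytes) * 100 > 95
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: RAM+SWAP usage is above 95% of system RAM capacity on {{ $labels.instance }}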
Based on: https://docs.google.com/document/d/1AbR1LACNMRU2QMqFpe-Se2mCEFLMqW_M9OPKh2v3yYw,
https://docs.google.com/document/d/1E1joajwxQChQiDVTsr9Qk_iIhpQkSI-VQP-o_BMx8Aw
The admin console's alert details page is provided by https://github.com/openshift/monitoring-plugin, but the dev console's equivalent page is still provided by code in the console codebase.
The dev console page displays fewer dashboards than the admin version of the page, so that difference will need to be supported by monitoring-plugin.
Proposed title of this feature request
Fleet / Multicluster Alert Management User Interface
What is the nature and description of the request?
Large enterprises are drowning in cluster alerts.
side note: Just within my demo RHACM Hub environment, across 12 managed clusters (OCP, SNO, ARO, ROSA, self-managed HCP, xKS), I have 62 alerts being reported! And I have no idea what to do about them!
Customers need the ability to interact with alerts in a meaningful way, to leverage a user interface that can filter, display, multi-select, sort, etc. To multi-select and take actions, for example:
Why does the customer need this? (List the business requirements)
Platform engineering (sys admins, SREs, etc.) must maintain the health of the cluster and ensure that the business applications are running stably. There might indeed be another tool and another team which focuses on application health itself, but the platform team certainly needs to ensure that the platform is running optimally and that all critical alerts are responded to.
As of today, the customer must perform alert management via the CLI. This is tedious, ad hoc, and error-prone (see blog link).
The requirements are:
List any affected packages or components.
OCP console Observe dynamic plugin
ACM Multicluster observability (MCO operator)
"In order to provide ACM with the same monitoring capabilities OCP has, we as the Observability UI Team need to allow the monitoring plugin to be installed and work in ACM environments."
Product Requirements:
UX Requirements:
Placeholder feature for ccx-ocp-core maintenance tasks.
This is epic tracks "business as usual" requirements / enhancements / bug fixing of Insights Operator.
The Insights Operator should replace the %s in https://console.redhat.com/api/gathering/v2/%s/gathering_rules error messages, like the failed-to-bootstrap log below:
$ jq -r .content osd-ccs-gcp-ad-install.log | sed 's/\\n/\n/g' | grep 'Cluster operator insights' time="2024-09-05T08:12:51Z" level=info msg="Cluster operator insights ClusterTransferAvailable is False with Unauthorized: failed to pull cluster transfer: OCM API https://api.openshift.com/api/accounts_mgmt/v1/cluster_transfers/?search=cluster_uuid+is+%REDACTED%27+and+status+is+%27accepted%27 returned HTTP 401: REDACTED" time="2024-09-05T08:12:51Z" level=info msg="Cluster operator insights Disabled is False with AsExpected: " time="2024-09-05T08:12:51Z" level=info msg="Cluster operator insights RemoteConfigurationAvailable is False with HttpStatus401: received HTTP 401 Unauthorized from https://console.redhat.com/api/gathering/v2/%s/gathering_rules" time="2024-09-05T08:12:51Z" level=info msg="Cluster operator insights RemoteConfigurationValid is Unknown with NoValidationYet: " time="2024-09-05T08:12:51Z" level=info msg="Cluster operator insights SCAAvailable is False with Unauthorized: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 401: REDACTED level=info msg=Cluster operator insights ClusterTransferAvailable is False with Unauthorized: failed to pull cluster transfer: OCM API https://api.openshift.com/api/accounts_mgmt/v1/cluster_transfers/?search=cluster_uuid+is+%27REDACTED%27+and+status+is+%27accepted%27 returned HTTP 401: REDACTED level=info msg=Cluster operator insights Disabled is False with AsExpected: level=info msg=Cluster operator insights RemoteConfigurationAvailable is False with HttpStatus401: received HTTP 401 Unauthorized from https://console.redhat.com/api/gathering/v2/%s/gathering_rules level=info msg=Cluster operator insights RemoteConfigurationValid is Unknown with NoValidationYet: level=info msg=Cluster operator insights SCAAvailable is False with Unauthorized: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 401: REDACTED level=info msg=Cluster operator insights UploadDegraded is True with NotAuthorized: Reporting was not allowed: your Red Hat account is not enabled for remote support or your token has expired: {\"errors\":[{\"meta\":{\"response_by\":\"gateway\"},\"detail\":\"UHC services authentication failed\",\"status\":401}]}
Seen in 4.17 RCs. Also in this comment.
Unknown
Unknown.
ClusterOperator conditions talking about https://console.redhat.com/api/gathering/v2/%s/gathering_rules
URIs we expose in customer-oriented messaging should not have %s placeholders.
Seems like the template is coming in as conditionalGathererEndpoint here. Seems like insights-operator#964 introduced the %s, but I'm not finding the logic that's supposed to populate that placeholder.
Description of problem:
When the Insights Operator is disabled (as described in the docs here or here), the RemoteConfigurationAvailable and RemoteConfigurationValid clusteroperator conditions report the previous state (from before disabling the gathering), which might be Available=True and Valid=True.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Disable the data gathering in the Insights Operator following the docs links above
2. Watch the clusteroperator conditions with "oc get co insights -o json | jq .status.conditions"
Actual results:
Expected results:
Additional info:
As a cluster-admin, I want to run an update in discrete steps, updating control plane and worker nodes independently.
I also want to back up and restore in case of a problematic upgrade.
Background:
This Feature is a continuation of https://issues.redhat.com/browse/OCPSTRAT-180.
Customers are asking for improvements to the upgrade experience (both over-the-air and disconnected). This is a feature tracking the epics required to get that work done. Below is the list of done tasks.
Enable installation and lifecycle support of OpenShift 4 on Oracle Cloud Infrastructure (OCI) Bare metal
Use scenarios
Why is this important
Requirement | Notes |
---|---|
OCI Bare Metal Shapes must be certified with RHEL | It must also work with RHCOS (see iSCSI boot notes), as OCI BM standard shapes require RHCOS iSCSI to boot (Certified shapes: https://catalog.redhat.com/cloud/detail/249287) |
Successfully passing the OpenShift Provider conformance testing – this should be fairly similar to the results from the OCI VM test results. | Oracle will do these tests. |
Updating Oracle Terraform files | |
Making the Assisted Installer modifications needed to address the CCM changes and surface the necessary configurations. | Support Oracle Cloud in Assisted-Installer CI: |
RFEs:
Any bare metal Shape to be supported with OCP has to be certified with RHEL.
From the certified Shapes, those that have local disks will be supported. This is due to the current lack of support in RHCOS for the iSCSI boot feature. OCPSTRAT-749 is tracking adding this support and removing this restriction in the future.
As of Aug 2023 this excludes at least all the Standard shapes, BM.GPU2.2 and BM.GPU3.8, from the published list at: https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm#baremetalshapes
Please describe what this feature is going to do.
Please describe what conditions must be met in order to mark this feature as "done".
If the answer is "yes", please make sure to check the corresponding option.
To make iSCSI work, a secondary VNIC must be configured during discovery, and when the machine reboots on core OS. The configuration is almost the same for discovery and Core OS.
Currently, we have one script owned by Red Hat for discovery, and a custom manifest owned by Oracle for CoreOS configuration.
I think this configuration should be owned by Oracle because the network configuration depends on the OCI API. Also, we need this script to be the same in order to ensure that the configuration applied on discovery will be the same when the machine reboots on CoreOS. Finally, if a customer has a specific need, they won't be able to tailor the configuration to their needs easily, as they would have to use the REST API of the assisted service.
My suggestion is to ask Oracle to drop the configuration script in their metadata service using Oracle's terraform template. On Red Hat side, we would have to pull this script on the node, and execute it thanks to a systemd unit. The same would be done from the custom manifest provided by Oracle.
During 4.15, the OCP team is working on allowing booting from iSCSI. Today that's disabled by the assisted installer. The goal is to enable that for OCP versions >= 4.15 when using the OCI external platform.
iSCSI boot is enabled for OCP versions >= 4.15 both in the UI and the backend.
When booting from iscsi, we need to make sure to add the `rd.iscsi.firmware=1 ip=ibft` kargs during install to enable iSCSI booting.
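For context, these are the same arguments one would pass when installing CoreOS manually with coreos-installer; shown only to illustrate the kargs themselves, not the exact mechanism the assisted installer uses to inject them:

$ coreos-installer install /dev/sda --append-karg rd.iscsi.firmware=1 --append-karg ip=ibft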
yes
The secondary VNIC must be configured manually in OCI; a script must be injected into the discovery ISO to configure it.
PR https://github.com/openshift/assisted-service/pull/6257 must be adapted to be used along external platform.
Since we ensure that the iSCSI network is not the default route, the PR above will automatically select the subnet used by the default route.
Support network isolation and multiple primary networks (with the possibility of overlapping IP subnets) without having to use Kubernetes Network Policies.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
OVN-Kubernetes today allows multiple different types of networks per secondary network: layer 2, layer 3, or localnet. Pods can be connected to different networks without discretion. For the primary network, OVN-Kubernetes only supports all pods connecting to the same layer 3 virtual topology.
As users migrate from OpenStack to Kubernetes, there is a need to provide network parity for those users. In OpenStack, each tenant (analogous to a Kubernetes namespace) by default has a layer 2 network, which is isolated from any other tenant. Connectivity to other networks must be specified explicitly as network configuration via a Neutron router. In Kubernetes, the paradigm is the opposite; by default all pods can reach other pods, and security is provided by implementing Network Policy.
Network Policy has its issues:
With all these factors considered, there is a clear need to address network security in a native fashion, by using networks per user to isolate traffic instead of using Kubernetes Network Policy.
Therefore, the scope of this effort is to bring the same flexibility of the secondary network to the primary network and allow pods to connect to different types of networks that are independent of networks that other pods may connect to.
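For a sense of what this could look like to a tenant, a hedged sketch of a primary layer 2 user-defined network follows (API group, kind and fields per the in-progress UserDefinedNetwork CRD; treat as illustrative, not final):

apiVersion: k8s.ovn.org/v1
kind: UserDefinedNetwork
metadata:
  name: tenant-primary
  namespace: tenant-a
spec:
  topology: Layer2
  layer2:
    role: Primary
    subnets:
    - 10.100.0.0/16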
Test scenarios:
crun has been GA as a non-default runtime since OCP 4.14. We want to make it the default in 4.18 while still supporting runc as a non-default option.
Benefits of Crun is covered here https://github.com/containers/crun
FAQ.: https://docs.google.com/document/d/1N7tik4HXTKsXS-tMhvnmagvw6TE44iNccQGfbL_-eXw/edit
***Note -> making crun the default does not mean we will remove support for runc, nor do we have any plans to do so in the foreseeable future.
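For reference, today's opt-in mechanism looks roughly like the following ContainerRuntimeConfig selecting crun for a pool; making crun the default removes the need for this, while the same knob would keep runc selectable:

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: enable-crun-worker
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    defaultRuntime: crun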
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Check with ACS team; see if there are external repercussions.
Move to using the upstream Cluster API (CAPI) in place of the current implementation of the Machine API for standalone Openshift
prerequisite work Goals completed in OCPSTRAT-1122
Complete the design of the Cluster API (CAPI) architecture and build the core operator logic needed for Phase 1, incorporating the assets from different repositories to simplify asset management.
Phase 1 & 2 covers implementing base functionality for CAPI.
There must be no negative effect on customers/users of the MAPI; this API must continue to be accessible to them, though how it is implemented "under the covers", and whether that implementation leverages CAPI, is open.
As an OpenShift engineer I want the CAPI Providers repositories to use the new generator tool so that they can independently generate CAPI Provider transport ConfigMaps
Once the new CAPI manifests generator tool is ready, we want to make use of that directly from the CAPI Providers repositories so we can avoid storing the generated configuration centrally and independently apply that based on the running platform.
This goals of this features are:
Given Microsoft's constraints on IPv4 usage, there is a pressing need to optimize IP allocation and management within Azure-hosted environments.
Interoperability Considerations
There are currently multiple ingress strategies we support for hosted cluster service endpoints (kas, nodePort, router...).
In a context of uncertainty about which use cases would be most critical to support, we initially exposed this in a flexible API that enables potentially choosing any combination of ingress strategies and endpoints.
ARO has internal restrictions on IPv4 usage. Because of this, to simplify the above and to be more cost-effective in terms of infra, we want a common shared ingress solution for the whole fleet of hosted clusters.
As a management cluster owner I want to make sure the shared ingress is resilient to cluster failures
Currently the SharedIngress controller waits for a HostedCluster to exist before creating the Service/LoadBalancer of the shared-ingress.
The controller should create the Service/LoadBalancer even before any HostedCluster exists.
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Our goal is to be able to deploy baremetal clusters using Cluster API in Openshift.
Metal3, our upstream community, already provides a CAPI provider, and our aim is to bring it downstream.
We will collaborate with the Cluster Infrastructure team on points of integration as needed.
Scope questions
Firmware (BIOS) updates and attributes configuration from OpenShift are key in O-RAN clusters. While we can do it on day 1, customers need to set firmware attributes on hosts that have already been deployed and are part of a cluster.
This feature adds the capability of updating firmware attributes and updating the firmware image for hosts in deployed clusters.
As part of demoing our integration with hardware vendors, we need to show the ability to reconfigure already provisioned hosts: modify their BIOS settings and, in the future, do firmware upgrades. The initial demo will be concentrated on BIOS settings. The demo is expected to be based on 4.15 and to use unmerged patches since 4.15 is closed for feature development. The path to productization will be determined as an outcome of the demo.
The assumed end result is an ability to run firmware upgrades and update BIOS settings for hosts that are already provisioned without fully deprovisioning them. The hosts will still be rebooted, so some external orchestrator (a human or ZTP) will need to drain the nodes first.
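As a rough sketch of the day-2 flow being demoed, BIOS attributes would be changed by editing the HostFirmwareSettings resource the bare metal operator maintains per host (attribute names are vendor-specific; the values here are purely illustrative):

apiVersion: metal3.io/v1alpha1
kind: HostFirmwareSettings
metadata:
  name: worker-0
  namespace: openshift-machine-api
spec:
  settings:
    ProcVirtualization: Enabled
    LogicalProc: Disabled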
1. Pre-installation:
2. Installation:
3. Update:
4. Uninstallation/Deletion:
5. Disconnected Environments for High-Security Workloads:
6. [Tech Preview] Signature Validation for Secure Workflows:
All the expected user outcomes and the acceptance criteria in the engineering epics are covered.
OLM: Gateway to the OpenShift Ecosystem
Operator Lifecycle Manager (OLM) has been a game-changer for OpenShift Container Platform (OCP) 4. Since its launch in 2019, OLM has fostered a rich ecosystem, expanding from a curated set of 25 operators to over 100 officially supported Red Hat operators and hundreds more from certified ISVs and the community.
OLM empowers users to manage diverse technologies with ease, including ACM, ACS, Quay, GitOps, Pipelines, Service Mesh, Serverless, and Virtualization. It has also facilitated the introduction of groundbreaking operators for entirely new workloads, like Nvidia GPU, PTP, Windows Machine Config, SR-IOV networking, and more. Today, a staggering 91% of our connected customers leverage OLM's capabilities.
OLM v0: A Stepping Stone
While OLM v0 has been instrumental, it has limitations. The API design, not fully GitOps-friendly or entirely declarative, presents a steeper learning curve due to its complexity. Furthermore, OLM v0 was designed with the assumption of namespace-scoped CRDs (Custom Resource Definitions), allowing for independent operator installations and parallel versions within a single cluster. However, this functionality never materialized in core Kubernetes, and OLM v0's attempt to simulate it has introduced limitations and bugs.
The Operator Framework Team: Building the Future
The Operator Framework team is the cornerstone of the OpenShift ecosystem. They build and manage OLM, the Operator SDK, operator catalog formats, and tooling (opm, file-based catalogs). Their work directly impacts how operators are developed, packaged, delivered, and managed by users and SRE teams on OpenShift clusters.
A Streamlined Future with OLM v1
The Operator Framework team has undergone significant restructuring to focus on the next generation of OLM – OLM v1. This transition includes moving the Operator SDK to a feature-complete state with ongoing maintenance for compatibility with the latest Kubernetes and controller-runtime libraries. This strategic shift allows the team to dedicate resources to completely revamping OLM's API and management concepts for catalog content delivery.
Leveraging learnings and customer feedback since OCP 4's inception, OLM v1 is designed to be a major overhaul, and it will be shipped as a Generally Available (GA) feature in OpenShift 4.17.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
1. Pre-installation:
2. Installation:
3. Update:
4. Uninstallation/Deletion:
1. Pre-installation:
2. Installation:
3. Update:
4. Uninstallation/Deletion:
Downstream change to add kustomize overlay for hostPath volume mount of /etc/containers
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
https://docs.google.com/document/d/18m-OG0PN8-jjjgGT33WNujzmj_1B2Tqoqd-bVKX4CkE/edit?usp=sharing
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.18 release, dependencies need to be updated to 1.31. This should be done by rebasing/updating as appropriate for the repository
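As a rough illustration of what such a dependency bump typically involves in each repository (module versions are illustrative, not prescriptive):

$ go get k8s.io/api@v0.31.0 k8s.io/apimachinery@v0.31.0 k8s.io/client-go@v0.31.0
$ go mod tidy && go mod vendor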
Please review the following PR: https://github.com/openshift/machine-config-operator/pull/4561
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Epic Goal*
Drive the technical part of the Kubernetes 1.31 upgrade, including rebasing the openshift/kubernetes repository and coordination across the OpenShift organization to get e2e tests green for the OCP release.
Why is this important? (mandatory)
OpenShift 4.18 cannot be released without Kubernetes 1.31
Scenarios (mandatory)
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
PRs:
Retro: Kube 1.31 Rebase Retrospective Timeline (OCP 4.18)
Retro recording: https://drive.google.com/file/d/1htU-AglTJjd-VgFfwE3z_dH5tKXT1Tes/view?usp=drive_web
Following the recent changes in CRD schema validation (introduced in https://github.com/kubernetes-sigs/controller-tools/pull/944), our tooling has identified several CRD violations in our APIs:
Description of problem:
Given 2 images with different names, but same layers, "oc image mirror" will only mirror 1 of them. For example:

$ cat images.txt
quay.io/openshift/community-e2e-images:e2e-33-registry-k8s-io-e2e-test-images-resource-consumer-1-13-LT0C2W4wMzShSeGS quay.io/bertinatto/test-images:e2e-33-registry-k8s-io-e2e-test-images-resource-consumer-1-13-LT0C2W4wMzShSeGS
quay.io/openshift/community-e2e-images:e2e-31-registry-k8s-io-e2e-test-images-resource-consumer-1-13-LT0C2W4wMzShSeGS quay.io/bertinatto/test-images:e2e-31-registry-k8s-io-e2e-test-images-resource-consumer-1-13-LT0C2W4wMzShSeGS

$ oc image mirror -f images.txt
quay.io/
  bertinatto/test-images
    manifests:
      sha256:298dcd808e27fbf96614e4c6f06730f22964dce41dcdc7bf21096c42411ba773 -> e2e-33-registry-k8s-io-e2e-test-images-resource-consumer-1-13-LT0C2W4wMzShSeGS
  stats: shared=0 unique=0 size=0B
phase 0:
  quay.io bertinatto/test-images blobs=0 mounts=0 manifests=1 shared=0
info: Planning completed in 2.6s
sha256:298dcd808e27fbf96614e4c6f06730f22964dce41dcdc7bf21096c42411ba773 quay.io/bertinatto/test-images:e2e-33-registry-k8s-io-e2e-test-images-resource-consumer-1-13-LT0C2W4wMzShSeGS
info: Mirroring completed in 240ms (0B/s)
Version-Release number of selected component (if applicable):
4.18
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Only one of the images were mirrored.
Expected results:
Both images should be mirrored.
Additional info:
TechPreview clusters are unable to bootstrap because kube-apiserver fails to start with the following error:
E0827 20:29:22.653501 1 run.go:72] "command failed" err="group version resource.k8s.io/v1alpha2 that has not been registered"
This happens because, in Kubernetes 1.31, the group version resource.k8s.io/v1alpha2 was removed and replaced with resource.k8s.io/v1alpha3. This is part of the DynamicResourceAllocation feature, which is currently TechPreview.
After discussing this with the team, we decided that the best approach is to modify the cluster-kube-apiserver-operator to start the kube-apiserver with the correct group version based on the Kubernetes version being used.
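Concretely, the idea is to render a different resource.k8s.io group version into the kube-apiserver runtime-config depending on the Kubernetes level, roughly along these lines (shown as the raw flag for illustration only; the operator sets this through its observed config):

--runtime-config=resource.k8s.io/v1alpha2=true   (Kubernetes <= 1.30, DRA TechPreview)
--runtime-config=resource.k8s.io/v1alpha3=true   (Kubernetes 1.31)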
Goal:
Update team owned repositories to Kubernetes v1.31
?? is the 1.31 freeze
?? is the 1.31 GA
Problem:<please update links for 1.31>
The following repository must be rebased onto the latest version of Kubernetes:
The following repositories should be rebased onto the latest version of Kubernetes:
Entirely remove dependencies on k/k repository inside oc.
Why is this important:
Networking Definition of Planned
Epic Template descriptions and documentation
Additional information on each of the above items can be found here: Networking Definition of Planned
...
1.
...
1. …
1. …
As a customer of self-managed OpenShift, or an SRE managing a fleet of OpenShift clusters, I should be able to determine the progress and state of an OCP upgrade and only be alerted if the cluster is unable to progress. Support a CLI status command and status API which can be used by cluster admins to monitor the progress. The status command/API should also contain data to alert users about potential issues which can make the updates problematic.
Here are common update improvements from customer interactions on Update experience
oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0    True        True          16s     Working towards 4.12.4: 9 of 829 done (1% complete)
Update docs for UX and CLI changes
Reference : https://docs.google.com/presentation/d/1cwxN30uno_9O8RayvcIAe8Owlds-5Wjr970sDxn7hPg/edit#slide=id.g2a2b8de8edb_0_22
Epic Goal*
Add a new `oc adm upgrade status` command backed by an API. Please find the mock output of the command attached to this card.
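As a rough usage sketch (not the final UX), the command can currently be exercised alongside the existing ClusterVersion view; the environment-variable gate shown below reflects the experimental state of the command and is an assumption that may change before GA:

# While experimental, the status command is enabled via an env var gate.
OC_ENABLE_CMD_UPGRADE_STATUS=true oc adm upgrade status

# Existing, ungated view of roughly the same information today:
oc get clusterversion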
Why is this important? (mandatory)
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
As an OTA engineer,
I would like to make sure the node in a single-node cluster is handled correctly in the upgrade-status command.
Context:
According to the discussion with the MCO team,
the node is in MCP/master but not worker.
This card is to make sure that the node is displayed that way too. My feeling is that the current code probably does the job already; in that case, we should add test coverage for this case to avoid regressions in the future.
AC:
We utilize MCO annotations to determine whether a node is degraded or unavailable, and we solely source the Reason annotation to put into the insight. Many common cases are not covered by this, especially the unavailable ones: nodes can be cordoned, have a condition like DiskPressure, be in the process of termination, etc. It is not clear whether our code or something like the MCO should provide this, but it is captured as a card for now.
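To make the gap concrete, here is a minimal sketch (standard node fields and MCO annotations; the node name is a placeholder) of the extra signals that the Reason annotation alone does not surface:

node=worker-0   # hypothetical node name

# Cordoned?
oc get node "$node" -o jsonpath='{.spec.unschedulable}{"\n"}'

# Pressure/readiness conditions such as DiskPressure, MemoryPressure, Ready.
oc get node "$node" -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'

# MCO state annotation we currently derive the insight from.
oc get node "$node" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/state}{"\n"}'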
During an upgrade, once control plane is successfully updated, status items related to that part of the upgrade cease to be relevant, and therefore we can either hide them entirely, or we can show a simplified version of them. The relevant sections are Control plane and Control plane nodes.
An update is in progress for 28m42s: Working towards 4.14.1: 700 of 859 done (81% complete), waiting on network

= Control Plane =
...
Completion: 91%
1. Inconsistent info: CVO message says "700 of 859 done (81% complete)" but control plane section says "Completion: 91%"
2. Unclear measure of completion: the CVO message counts manifests applied while the control plane section says "Completion: 91%", which counts upgraded COs. Neither message states what it counts. The manifest count is an internal implementation detail which users likely do not understand. COs are less so, but we should be clearer about what the completion means.
3. We could take advantage of this line and communicate progress in more detail
We'll only remove the CVO message once the rest of the output functionally covers it, so the inconsistency stays until OTA-1154. Otherwise:
= Control Plane =
...
Completion: 91% (30 operators upgraded, 1 upgrading, 2 waiting)
Upgraded operators are COs that have updated their version, regardless of their conditions
Upgrading operators are COs that haven't updated their version and are Progressing=True
Waiting operators are COs that haven't updated their version and are Progressing=False (see the sketch below)
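The following sketch spells out that classification rule; it assumes jq is available and reuses the 4.14.1 target version from the example above, and it is illustrative rather than the actual implementation:

target=4.14.1
oc get clusteroperators -o json | jq -r --arg v "$target" '
  .items[]
  | {name: .metadata.name,
     updated: ([.status.versions[]? | select(.name == "operator" and .version == $v)] | length > 0),
     progressing: ([.status.conditions[]? | select(.type == "Progressing" and .status == "True")] | length > 0)}
  | if .updated then "\(.name): upgraded"
    elif .progressing then "\(.name): upgrading"
    else "\(.name): waiting" end'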
In the initial delivery of CoreOS Layering, it is required that administrators provide their own build environment to customize RHCOS images. That could be a traditional RHEL environment, or an enterprising administrator with some knowledge of OCP Builds could potentially set one up on-cluster.
The primary virtue of an on-cluster build path is to continue using the cluster to manage the cluster. No external dependency, batteries-included.
On-cluster, automated RHCOS Layering builds are important for multiple reasons:
This work describes the tech preview state of On Cluster Builds. Major interfaces should be agreed upon at the end of this state.
As a cluster admin of user provided infrastructure,
when I apply the machine config that opts a pool into On Cluster Layering,
I want to also be able to remove that config and have the pool revert back to its non-layered state with the previously applied config.
As a cluster admin using on cluster layering,
when an image build has failed,
I want it to retry 3 times automatically without my intervention and show me where to find the log of the failure.
As a cluster admin,
when I enable On Cluster Layering,
I want to know that the builder image I am building with is stable and will not change unless I change it
so that I keep the same API promises as we do elsewhere in the platform.
To test:
As a cluster admin using on cluster layering,
when I try to upgrade my cluster and the Cluster Version Operator is not available,
I want the upgrade operation to be blocked.
As a cluster admin,
when I use a disconnected environment,
I want to still be able to use On Cluster Layering.
As a cluster admin using On Cluster layering,
When there has been config drift of any sort that degrades a node and I have resolved the issue,
I want it to resync without forcing a reboot.
As a cluster admin using on cluster layering,
when a pool is using on cluster layering and references an internal registry
I want that registry available on the host network so that the pool can successfully scale up
(MCO-770, MCO-578, MCO-574 )
As a cluster admin using on cluster layering,
when a pool is using on cluster layering and I want to scale up nodes,
the nodes should have the same config as the other nodes in the pool.
Maybe:
Entitlements: MCO-1097, MCO-1099
Not Likely:
As a cluster admin using on cluster layering,
when I try to upgrade my cluster,
I want the upgrade operation to succeed at the same rate as non-OCL upgrades do.
Currently, it is not possible for cluster admins to revert from a pool that is opted into on-cluster builds and layered MachineConfig updates. See https://issues.redhat.com/browse/OCPBUGS-16201 for details around what happens.
It is worth mentioning that this is mostly an issue for UPI (user provided infrastructure) / bare metal users of OpenShift. For IPI cases in AWS / GCP / Azure / et al., one can simply delete the node and the machine, which will cause the Machine API to provision a fresh node to replace it, e.g.:
#!/bin/bash
node_name="$1"
node_name="${node_name/node\//}"
machine_id="$(oc get "node/$node_name" -o jsonpath='{.metadata.annotations.machine\.openshift\.io/machine}')"
machine_id="${machine_id/openshift-machine-api\//}"
oc delete --wait=false "machine/$machine_id" -n openshift-machine-api
oc delete --wait=false "node/$node_name"
Done When
As an OpenShift cluster admin, I would like to try out on-cluster layering (OCL) to better understand how it works, how to set it up, and how to use it. To that end, a quick-start guide for what I need to do to get started as well as a troubleshooting guide would be indispensable.
Done When:
The etcd backup API was delivered behind a feature gate in 4.14. This feature is to complete the work of allowing any OCP customer to benefit from the automatic etcd backup capability.
The ability for OCP users to benefit from the features
Complete work to auto-provision internal PVCs when using the local PVC backup option (right now, the user needs to create the PVC before enabling the service).
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | both |
Classic (standalone cluster) | yes |
Hosted control planes | no |
Multi node, Compact (three node), or Single node (SNO), or all | all |
Connected / Restricted Network | both |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | all |
Operator compatibility | N/A |
Backport needed (list applicable versions) | N/A |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | N/A |
Other (please specify) | N/A |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Epic Goal*
Provide automated backups of etcd saved locally on the cluster on Day 1 with no additional config from the user.
Why is this important? (mandatory)
The current etcd automated backups feature requires some configuration on the user's part to save backups to a user specified PersistentVolume.
See: https://github.com/openshift/api/blob/ba11c1587003dc84cb014fd8db3fa597a3faaa63/config/v1alpha1/types_backup.go#L46
Before the feature can be shipped as GA, we would require the capability to save backups automatically by default without any configuration. This would help all customers have an improved disaster recovery experience by always having a somewhat recent backup.
Scenarios (mandatory)
Implementation details:
One issue we need to figure out during the design of this feature is how the current API might change as it is inherently tied to the configuration of the PVC name.
See:
https://github.com/openshift/api/blob/ba11c1587003dc84cb014fd8db3fa597a3faaa63/config/v1alpha1/types_backup.go#L99
and
https://github.com/openshift/api/blob/ba11c1587003dc84cb014fd8db3fa597a3faaa63/operator/v1alpha1/types_etcdbackup.go#L44
Additionally, we would need to figure out how the etcd-operator knows about the available space on the host's local storage so it can prune and spread backups accordingly.
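For reference, a sketch of the TechPreview configuration this work would relax is shown below; the field names follow the linked types_backup.go, but treat the exact schema (and the PVC name) as assumptions:

# Today the user must pre-create a PVC and reference it by name; the goal above
# is for a sensible local default to work with no configuration at all.
cat <<'EOF' | oc apply -f -
apiVersion: config.openshift.io/v1alpha1
kind: Backup
metadata:
  name: default
spec:
  etcd:
    schedule: "0 */6 * * *"
    timeZone: UTC
    retentionPolicy:
      retentionType: RetentionNumber
      retentionNumber:
        maxNumberOfBackups: 3
    pvcName: etcd-backup-pvc   # PVC the user currently has to create up front
EOF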
Dependencies (internal and external) (mandatory)
Depends on changes to the etcd-operator and the tech preview APIs
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Upon installing a tech-preview cluster backups must be saved locally and their status and path must be visible to the user e.g on the operator.openshift.io/v1 Etcd cluster object.
An e2e test to verify that the backups are being saved locally with some default retention policy.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
As a developer, I want to add an e2e test for the etcd-backup-server sidecar container
As a developer, I want to add etcd backup pruning logic within the etcd-backup-server sidecar container
The objective is to create a comprehensive backup and restore mechanism for HCP OpenShift Virtualization Provider. This feature ensures both the HCP state and the worker node state are backed up and can be restored efficiently, addressing the unique requirements of KubeVirt environments.
The HCP team has delivered OADP backup and restore steps for the Agent and AWS providers here. We need to add the steps necessary to make this process work for HCP KubeVirt clusters.
Document this process in the upstream HyperShift documentation.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
OpenShift is planning to ship all payload and layered product images signed consistently via cosign with OpenShift 4.17. oc-mirror should be able to leverage this to provide a seamless signature verification experience in an offline environment by automatically making all required signature artifacts available in the offline registry.
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
<your text here>
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
<your text here>
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
<enter general Feature acceptance here>
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Overview
This task is really to ensure oc-mirror v2 has backward compatibility with what v1 was doing regarding signatures.
Goal
Ensure the correct configmaps are generated and stored in a folder so that the user can deploy the related artifact(s) to the cluster as in v1.
Feature description
oc-mirror v2 focuses on major enhancements that include making oc-mirror faster and more robust, introducing caching, and addressing more complex air-gapped scenarios. oc-mirror v2 is a rewritten version with three goals:
Check if it is possible to delete operators using the delete command when the previous command was mirror to mirror. Probably it won't work because in mirror to mirror the cache is not updated.
It is necessary to find a solution for this scenario.
Customers who deploy a large number of OpenShift on OpenStack clusters want to minimise the resource requirements of their cluster control planes.
Customers deploying RHOSO (OpenStack services on OpenShift, i.e. an OpenStack control plane running on bare metal OpenShift) already have a bare metal management cluster capable of serving Hosted Control Planes.
We should enable self-hosted (i.e. on-prem) Hosted Control Planes to serve Hosted Control Planes to OpenShift on OpenStack clusters, with a specific focus on serving Hosted Control Planes from the RHOSO management cluster.
As an enterprise IT department and OpenStack customer, I want to provide self-managed OpenShift clusters to my internal customers with minimum cost to the business.
As an internal customer of said enterprise, I want to be able to provision an OpenShift cluster for myself using the business's existing OpenStack infrastructure.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
TBD
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
HyperShift should be able to deploy the minimum useful OpenShift cluster on OpenStack. This is the minimum requirement to be able to test it. It is not sufficient for GA.
This is a container Epic for tasks which we know need to be done for Tech Preview but which we don't intend to do now. It needs to be groomed before it is useful for planning.
When the management cluster runs on AWS, make sure we update the DNS record for *.apps, so ingress can work out of the box.
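As a rough operational sketch of what "out of the box" implies, assuming Route 53 hosts the cluster's apps domain (the zone ID, domain, and load balancer hostname below are placeholders):

# Upsert a wildcard record for *.apps pointing at the hosted cluster's router LB.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "*.apps.my-hosted-cluster.example.com",
        "Type": "CNAME",
        "TTL": 60,
        "ResourceRecords": [{"Value": "a1b2c3-router.elb.amazonaws.com"}]
      }
    }]
  }'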
Matthew Booth is worried about the feature we added to pre-create a FIP and assign it to the Service object for router-default. This is indeed racy and could be problematic if another controller were to take over that field as well; it would create infinite loops and the result wouldn't be great for customers.
The idea is to remove that feature now and eventually add it back later when it's safer (e.g. as a feature added to the Ingress operator?). It's worth noting that core Kubernetes has deprecated the loadBalancerIP field in the Service object, and it now works with annotations. Maybe we need to investigate that path.
We should not have to explicitly configure the location of the clouds.yaml file, since there is a list of well-known places where these can be found. We should also be able to configure the cloud used from the chosen clouds.yaml.
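For context, the well-known lookup behaviour we would rely on follows standard openstacksdk conventions (listed here as an assumption of what our tooling should honour):

# Typical clouds.yaml search order for openstacksdk-based tooling:
#   1. ./clouds.yaml                       (current working directory)
#   2. ~/.config/openstack/clouds.yaml
#   3. /etc/openstack/clouds.yaml
# The cloud entry is then selected by name rather than by file path, e.g.:
export OS_CLOUD=openstack      # "openstack" is a placeholder cloud name
openstack token issue          # any client call now uses the selected cloud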
Right now, our pods are SingleReplica because to have multiple replicas we need more than one zone for nodes, which translates into AZs in OpenStack. We need to figure that out.
We don't need to create another service for Ingress, so we can save a FIP.
We deprecated "DeploymentConfig" in-favor of "Deployment" in OCP 4.14
Now in 4.18 we want to make "Deployment " as default out of box that means customer will get Deployment when they install OCP 4.18 .
Deployment Config will still be available in 4.18 as non default for user who still want to use it .
FYI "DeploymentConfig" is tier 1 API in Openshift and cannot be removed from 4.x product
Please Review this FAQ : https://docs.google.com/document/d/1OnIrGReZKpc5kzdTgqJvZYWYha4orrGMVjfP1fUpljY/edit#heading=h.oranye5nwtsy
Epic Goal*
WRKLDS-695 was implemented to make DeploymentConfigs enabled through a capability in 4.14. In order to prepare customers for the migration to Deployments, the capability was enabled by default. After 3 releases we need to reconsider whether disabling the capability by default is feasible.
More about capabilities in https://github.com/openshift/enhancements/blob/master/enhancements/installer/component-selection.md#capability-sets.
Why is this important? (mandatory)
Disabling a capability by default makes an OCP installation lighter. Fewer components running by default reduces the security risk/vulnerability surface.
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
None
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Drawbacks or Risk (optional)
None. The DC capability can be enabled if needed.
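For illustration, re-enabling it at install time would look roughly like the install-config excerpt below (the capability name matches the existing cluster capability; the rest of the snippet is a minimal assumption):

# Excerpt to merge into install-config.yaml -- opt back into DeploymentConfigs
# once the capability is no longer part of the default set.
cat <<'EOF'
capabilities:
  baselineCapabilitySet: vCurrent
  additionalEnabledCapabilities:
  - DeploymentConfig
EOF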
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Before DCs can be disabled by default, all the relevant e2e tests relying on DCs need to be migrated to Deployments to maintain the same test coverage.
When using OpenShift in a mixed, multi-architecture environment, some key details or checks are not always available. With this feature we will take a first pass at improving the UI/UX for customers as adoption of this configuration continues at pace.
The UI/UX should be improved when the console is used in a mixed-architecture OCP cluster.
<enter general Feature acceptance here>
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | Y |
Classic (standalone cluster) | Y |
Hosted control planes | Y |
Multi node, Compact (three node), or Single node (SNO), or all | Y |
Connected / Restricted Network | Y |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | All architectures |
Operator compatibility | n/a |
Backport needed (list applicable versions) | n/a |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | OpenShift Console |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Epic Goal
Why is this important?
Scenarios
1. …
Acceptance Criteria
Dependencies (internal and external)
1. …
Previous Work (Optional):
1. …
Open questions::
1. …
Done Checklist
As a product manager or business owner of OpenShift Lightspeed, I want to track who is using which feature of OLS and why. I also want to track the product adoption rate so that I can make decisions about the product (add/remove features, add new investment).
Enable monitoring of OLS by default when a user installs the OLS operator ---> check the box by default
Users will have the ability to disable the monitoring ----> by unchecking the box
Refer to this Slack conversation: https://redhat-internal.slack.com/archives/C068JAU4Y0P/p1723564267962489
Add support for the GCP N4 Machine Series to be used as Control Plane and Compute Nodes when deploying OpenShift on Google Cloud.
As a user, I want to deploy OpenShift on Google Cloud using N4 Machine Series for the Control Plane and Compute Node so I can take advantage of these new Machine types
OpenShift can be deployed in Google Cloud using the new N4 Machine Series for the Control Plane and Compute Nodes
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | both |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | all |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Google has made the N4 Machine Series available in their cloud offering. These Machine Series use "hyperdisk-balanced" disks for the boot device, which are not currently supported.
The documentation will be updated to add the new disk type that needs to be supported as part of this enablement. The N4 Machine Series will also be added to the tested Machine types for Google Cloud when deploying OpenShift.
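As a hedged sketch of the intended install-config input (machine type and disk size are placeholder values; hyperdisk-balanced support is exactly what this feature adds):

# Excerpt to merge into install-config.yaml for a GCP cluster on N4 machines.
cat <<'EOF'
controlPlane:
  platform:
    gcp:
      type: n4-standard-4
      osDisk:
        diskType: hyperdisk-balanced
        diskSizeGB: 128
compute:
- name: worker
  platform:
    gcp:
      type: n4-standard-4
      osDisk:
        diskType: hyperdisk-balanced
        diskSizeGB: 128
EOF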
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
Merge the CSI driver operator into csi-operator repo and re-use asset generator and CSI operator code there.
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
Maintaining a separate CSI driver operator repo is hard, especially when dealing with CVEs and library bumps. In addition, we could share even more code by moving all CSI driver operators into a single repo. Having a common repo across drivers will ease the maintenance burden.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
As cluster admin, I upgrade my cluster to a version with this epic implemented and I do not see any change, the CSI driver works the same as before. (Some pods, their containers or services may get renamed during the upgrade process).
As OCP developer, I have 1 less repo to worry about when fixing a CVE / bumping library-go or Kubernetes libraries.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | yes |
Classic (standalone cluster) | yes |
Hosted control planes | all |
Multi node, Compact (three node), or Single node (SNO), or all | all |
Connected / Restricted Network | all |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | all |
Operator compatibility | |
Backport needed (list applicable versions) | no |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | no |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
As cluster admin, I upgrade my cluster to a version with this epic implemented and I do not see any change, the CSI driver works the same as before. (Some pods, their containers or services may get renamed during the upgrade process).
As OCP developer, I have 1 less repo to worry about when fixing a CVE / bumping library-go or Kubernetes libraries.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
N/A includes all the CSI operators Red Hat manages as part of OCP
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
This effort started with CSI operators that we included for HCP, we want to align all CSI operator to use the same approach in order to limit maintenance efforts.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Not customer facing, this should not introduce any regression.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
No doc needed
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
N/A, it's purely tech debt / internal
Epic Goal*
Merge the CSI driver operator into csi-operator repo and re-use asset generator and CSI operator code there.
Why is this important? (mandatory)
Maintaining separate CSI driver operator repo is hard, especially when dealing with CVEs and library bumps. In addition, we could share even more code when moving all CSI driver operators into a single repo.
Scenarios (mandatory)
As cluster admin, I upgrade my cluster to a version with this epic implemented and I do not see any change, the CSI driver works the same as before. (Some pods, their containers or services may get renamed during the upgrade process).
As OCP developer, I have 1 less repo to worry about when fixing a CVE / bumping library-go or Kubernetes libraries.
Dependencies (internal and external) (mandatory)
None, this can be done just by the storage team and independently on other operators / features.
Contributing Teams(and contacts) (mandatory)
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Following the step described in the enhancement, we should do the following:
Once this is done, we can work towards rewriting the operator to take advantage of the new generator tooling used for existing migrated operators.
Epic Goal*
Merge the CSI driver operator into csi-operator repo and re-use asset generator and CSI operator code there.
Why is this important? (mandatory)
Maintaining separate CSI driver operator repo is hard, especially when dealing with CVEs and library bumps. In addition, we could share even more code when moving all CSI driver operators into a single repo.
Scenarios (mandatory)
As cluster admin, I upgrade my cluster to a version with this epic implemented and I do not see any change, the CSI driver works the same as before. (Some pods, their containers or services may get renamed during the upgrade process).
As OCP developer, I have 1 less repo to worry about when fixing a CVE / bumping library-go or Kubernetes libraries.
Dependencies (internal and external) (mandatory)
None, this can be done just by the storage team and independently on other operators / features.
Contributing Teams(and contacts) (mandatory)
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Epic Goal*
Merge the CSI driver operator into csi-operator repo and re-use asset generator and CSI operator code there.
Why is this important? (mandatory)
Maintaining separate CSI driver operator repo is hard, especially when dealing with CVEs and library bumps. In addition, we could share even more code when moving all CSI driver operators into a single repo.
Scenarios (mandatory)
As cluster admin, I upgrade my cluster to a version with this epic implemented and I do not see any change, the CSI driver works the same as before. (Some pods, their containers or services may get renamed during the upgrade process).
As OCP developer, I have 1 less repo to worry about when fixing a CVE / bumping library-go or Kubernetes libraries.
Note: we do not plan to do any changes for HyperShift. The EFS CSI driver will still fully run in the guest cluster, including its control plane.
Dependencies (internal and external) (mandatory)
None, this can be done just by the storage team and independently on other operators / features.
Contributing Teams(and contacts) (mandatory)
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
TBD
Support mapping OpenShift zones to vSphere host groups, in addition to vSphere clusters.
When defining zones for vSphere administrators can map regions to vSphere datacenters and zones to vSphere clusters.
There are use cases where vSphere clusters have only one cluster construct with all their ESXi hosts but the administrators want to divide the ESXi hosts into host groups. A common example is vSphere stretched clusters, where there is only one logical vSphere cluster but the ESXi nodes are distributed across two physical sites, and grouped by site in vSphere host groups.
In order for OpenShift to be able to distribute its nodes on vSphere matching the physical grouping of hosts, OpenShift zones have to be able to map to vSphere host groups too.
Support mapping OpenShift zones to vSphere host groups, in addition to vSphere clusters.
When defining zones for vSphere administrators can map regions to vSphere datacenters and zones to vSphere clusters.
There are use cases where vSphere clusters have only one cluster construct with all their ESXi hosts but the administrators want to divide the ESXi hosts into host groups. A common example is vSphere stretched clusters, where there is only one logical vSphere cluster but the ESXi nodes are distributed across two physical sites, and grouped by site in vSphere host groups.
In order for OpenShift to be able to distribute its nodes on vSphere matching the physical grouping of hosts, OpenShift zones have to be able to map to vSphere host groups too.
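For orientation, the sketch below shows today's cluster-based mapping only (names and paths are placeholders); the host-group fields themselves are deliberately not shown, since defining them is the subject of this feature:

# Excerpt of today's install-config.yaml zoning: region -> datacenter,
# zone -> whole vSphere cluster.
cat <<'EOF'
platform:
  vsphere:
    failureDomains:
    - name: zone-a
      region: region-1
      zone: zone-a
      server: vcenter.example.com
      topology:
        datacenter: dc1
        computeCluster: /dc1/host/cluster1
        datastore: /dc1/datastore/ds1
        networks:
        - VM Network
EOF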
Support in the IPI installer for OpenShift on vSphere to create the OpenShift node VMs with multiple NICs and subnets.
This is necessary when users want to have dedicated network links in the node VMs, for storage or database traffic for example, in addition to the service network link that we create now.
Requirements
Users can specify multiple NICs for the OpenShift VMs that will be created for the OpenShift cluster nodes with different subnets.
Support in the IPI installer for OpenShift on vSphere to create the OpenShift node VMs with multiple NICs and subnets.
This is necessary when users want to have dedicated network links in the node VMs, for storage or database traffic for example, in addition to the service network link that we create now.
Requirements
Users can specify multiple NICs for the OpenShift VMs that will be created for the OpenShift cluster nodes with different subnets.
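A hedged sketch of the desired install-config input follows (network names are placeholders); today the second entry under topology.networks is rejected by the "at most 1 items" validation referenced in the stories below:

# Excerpt of the intended install-config.yaml once multiple NICs are supported.
cat <<'EOF'
platform:
  vsphere:
    failureDomains:
    - name: zone-a
      region: region-1
      zone: zone-a
      server: vcenter.example.com
      topology:
        datacenter: dc1
        computeCluster: /dc1/host/cluster1
        networks:
        - VM Network          # service network link that exists today
        - storage-segment-01  # additional dedicated NIC this feature enables
EOF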
Description:
The machine config operator needs to be bumped to pick up the API change:
I0819 17:50:00.396986 1 machineconfig.go:87] ControllerConfig not found, creating new one
E0819 17:50:00.400599 1 machineconfig.go:90] Failed to create ControllerConfig: ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: [spec.infra.spec.platformSpec.vsphere.failureDomains[0].topology.networks: Too many: 2: must have at most 1 items, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]
Acceptance Criteria:
The machine API is failing to render compute nodes when multiple NICs are configured:
Unable to apply 4.17.0-0.ci.test-2024-08-15-193100-ci-ln-igm0nhk-latest: ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: [spec.infra.spec.platformSpec.vsphere.failureDomains[0].topology.networks: Too many: 2: must have at most 1 items, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]
Description:
Bump machine-api to pick up changes in openshift/api#2002.
Acceptance Criteria:
issue created by splat-bot
USER STORY:
As an OpenShift provisioner, I want to provision a cluster in which nodes have multiple network adapters so that I can implement the desired network topology.
DESCRIPTION:
Customers need to provision nodes with multiple adapters at day 0. CAPV supports specifying multiple adapters in its clone spec. The installer should be augmented to support additional NICs.
Required:
Nice to have:
...
ACCEPTANCE CRITERIA:
ENGINEERING DETAILS:
Description:
The infrastructure spec validation needs to be updated to change the network count restriction to 10 (https://configmax.esp.vmware.com/guest?vmwareproduct=vSphere&release=vSphere%208.0&categories=1-0).
When multiple NICs are enabled (the installer allows this?), bootstrapping fails with:
Aug 15 18:30:57 2.252.83.01.in-addr.arpa cluster-bootstrap[4889]: [#1673] failed to create some manifests:
Aug 15 18:30:57 2.252.83.01.in-addr.arpa cluster-bootstrap[4889]: "cluster-infrastructure-02-config.yml": failed to create infrastructures.v1.config.openshift.io/cluster -n : Infrastructure.config.openshift.io "cluster" is invalid: [spec.platformSpec.vsphere.failureDomains[0].topology.networks: Too many: 2: must have at most 1 items, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]
Acceptance Criteria:
issue created by splat-bot
Improve the cluster expansion with the agent workflow added in OpenShift 4.16 (TP) and OpenShift 4.17 (GA) with:
Improve the user experience and functionality of the commands to add nodes to clusters using the image creation functionality.
Run integration tests for presubmit jobs in the installer repo
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
A set of capabilities needs to be added to the HyperShift Operator that will enable AWS Shared-VPC deployment for ROSA w/ HCP.
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
Build capabilities into HyperShift Operator to enable AWS Shared-VPC deployment for ROSA w/ HCP.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Antoni Segura Puimedon Please help with providing what Hypershift will need on the OCPSTRAT side.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | (perhaps) both |
Classic (standalone cluster) | |
Hosted control planes | yes |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | x86_64 and Arm |
Operator compatibility | |
Backport needed (list applicable versions) | 4.14+ |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | no (this is an advanced feature not being exposed via web-UI elements) |
Other (please specify) | ROSA w/ HCP |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Currently the same SG is used for both workers and VPC endpoint. Create a separate SG for the VPC endpoint and only open the ports necessary on each.
"Shared VPCs" are a unique AWS infrastructure design: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-sharing.html
See prior work/explanations/etc here: https://issues.redhat.com/browse/SDE-1239
Summary is that in a Shared VPC environment, a VPC is created in Account A and shared to Account B. The owner of Account B wants to create a ROSA cluster, however Account B does not have permissions to create a private hosted zone in the Shared VPC. So they have to ask Account A to create the private hosted zone and link it to the Shared VPC. OpenShift then needs to be able to accept the ID of that private hosted zone for usage instead of creating the private hosted zone itself.
QE should have some environments or testing scripts available to test the Shared VPC scenario
The AWS endpoint controller in the CPO currently uses the control plane operator role to create the private link endpoint for the hosted cluster as well as the corresponding dns records in the hypershift.local hosted zone. If a role is created to allow it to create that vpc endpoint in the vpc owner's account, the controller would have to explicitly assume the role so it can create the vpc endpoint, and potentially a separate role for populating dns records in the hypershift.local zone.
The users would need to create a custom policy to enable this
Add the necessary API fields to support a Shared VPC infrastructure, and enable development/testing of Shared VPC support by adding the Shared VPC capability to the hypershift CLI.
The e2e tests that were introduced in U/S OVN-K repo should be ported and added to D/S.
Console enhancements based on customer RFEs that improve customer user experience.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
As a cluster admin I want to set a cluster wide setting for hiding the "Getting started resources" banner from Overview, for all the console users.
AC:
As a cluster admin I want to set a cluster wide setting for hiding the "Getting started resources" banner from Overview, for all the console users.
AC:
As a user who is visually impaired, or a user who is out in the sun, when I switch the theme in the console to Light mode, then try to edit text files (e.g., the YAML configuration for a pod) using the web console, I want the editor to be in light theme.
This feature only covers the downstream MAPI work to enable Capacity Blocks.
Capacity Blocks are needed in managed OpenShift (ROSA with Hosted Control Planes) via CAPI. Once the HCP feature and OCM feature are completed, a Service Consumer can use upstream CAPI to set capacity reservations in a ROSA+HCP cluster.
Epic to track work done in https://github.com/openshift/machine-api-provider-aws/pull/110
We aim to continue establishing a comprehensive testing strategy for Hosted Control Planes (HCP) that aligns with Red Hat’s support requirements and ensures customer satisfaction. This involves testing across various permutations, including providers, lifecycle, upgrades, and version compatibility. The testing must span management clusters, hubs, MCE, control planes, and nodepools, while coordinating across multiple QE teams to avoid duplication and inefficiencies. We aim to sustain an evolving testing matrix to meet product demands, especially as new versions and extended OCP lifecycles are introduced.
See: https://docs.google.com/spreadsheets/d/1j8TjMfyCfEt8OzTgvrAG3tuC6WMweBh5ElzWu6oAvUw/edit?gid=0#gid=0
The HCP architecture introduces decoupled control planes and worker nodes, significantly increasing the number of testing permutations. Ensuring these scenarios are tested is crucial to maintaining product quality and customer satisfaction, and to staying compliant as an OpenShift form factor.
This was attempted once before
https://github.com/openshift/release/pull/47599
Then reverted
https://github.com/openshift/release/pull/48326
ROSA HCP prod currently runs with the HO from main but 4.14 and 4.15 HCs; however, we do not test these together in presubmit testing, which increases the chance of an escape.
OCP 4 clusters still maintain pinned boot images. We have numerous clusters installed that have boot media pinned to first-boot images as early as 4.1. In the future these boot images may not be certified by the OEM and may fail to boot on updated datacenter or cloud hardware platforms. These "pinned" boot images should be updateable so that customers can avoid this problem and, better still, scale out nodes with boot media that matches the running cluster version.
Phase 1 provided tech preview support for GCP.
In phase 2, GCP support goes to GA and AWS goes to TP.
In phase 3, AWS support goes to GA and vSphere goes to TP.
This epic will encompass work involved to GA the boot image update feature for the AWS platform.
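For orientation, below is a sketch of the opt-in that exists for the GCP tech preview today, which this epic would extend to AWS and take to GA; the MachineConfiguration schema shown is an assumption based on the tech-preview API:

# Enable managed boot image updates for all MachineSets (tech-preview shape).
cat <<'EOF' | oc apply -f -
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
spec:
  managedBootImages:
    machineManagers:
    - resource: machinesets
      apiGroup: machine.openshift.io
      selection:
        mode: All
EOF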
Per GA requirements, we are required to add tests to openshift/origin. This story will encompass that work.
Review, refine and harden the CAPI-based Installer implementation introduced in 4.16
The implementation of the CAPI-based Installer, which started with OpenShift 4.16, left some technical debt that needs to be reviewed and addressed to refine and harden this new installation architecture.
Review existing implementation, refine as required and harden as possible to remove all the existing technical debt
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
There should not be any user-facing documentation required for this work
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Once a cloud provider uses CAPI by default, the feature gate it used becomes tech debt.
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
We need a place to add tasks that are not feature oriented.
The agent installer does not require the infra-env id to be present in the claim to perform the authentication.
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
The agent installer does not require the infra-env id to be present in the claim to perform the authentication.
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
This feature aims to comprehensively refactor and standardize various components across HCP, ensuring consistency, maintainability, and reliability. The overarching goal is to increase customer satisfaction by increasing speed to market, and to save engineering budget by reducing incidents/bugs. This will be achieved by reducing technical debt, improving code quality, and simplifying the developer experience across multiple areas, including CLI consistency, NodePool upgrade mechanisms, networking flows, and more. By addressing these areas holistically, the project aims to create a more sustainable and scalable codebase that is easier to maintain and extend.
Over time, the HyperShift project has grown organically, leading to areas of redundancy, inconsistency, and technical debt. This comprehensive refactor and standardization effort is a response to these challenges, aiming to improve the project's overall health and sustainability. By addressing multiple components in a coordinated way, the goal is to set a solid foundation for future growth and development.
Ensure all relevant project documentation is updated to reflect the refactored components, new abstractions, and standardized workflows.
This overarching feature is designed to unify and streamline the HCP project, delivering a more consistent, maintainable, and reliable platform for developers, operators, and users.
As a dev I want the base code to be easier to read, maintain and test
If devs don't have a healthy dev environment, the project will suffer and the business won't make $$.
Goal
Refactor and modularize controllers and other components to improve maintainability, scalability, and ease of use.
As a (user persona), I want to be able to:
Context:
If you have ever had to add or modify a component in the control plane operator, the need for this becomes very obvious. It should only be possible to add component manifests through a gated interface.
Right now, adding a new component requires copy/pasting hundreds of lines of boilerplate, and there is plenty of room for side effects. A dev needs to manually remember to set the right config, like AutomountServiceAccountToken: false, topology opinions...
We should refactor support/config and all the consumers in the CPO to enforce component creation through audited, common signatures/interfaces.
Adding a new component is only possible through these higher abstractions.
Abstract away in a single place all the logic related to token and userdata secrets, consuming the output of https://issues.redhat.com/browse/HOSTEDCP-1678
This should result in a single abstraction, i.e. "Token", that exposes a thin library, e.g. Reconcile(), and hides all details of the token/userdata secrets lifecycle.
Following up on abstracting pieces into cohesive units, CAPI is the next logical choice since there is a lot of reconciliation business logic for it in the NodePool controller.
Goals:
All CAPI-related logic is driven by a single abstraction/struct.
Almost full unit test coverage.
A deeper refactor of the concrete implementation logic is left out of scope for gradual, test-driven follow-ups.
As a dev I want to easily add and understand which inputs result in triggering a NodePool upgrade.
There are many scattered inputs that trigger a NodePool rolling upgrade on change.
For code sustainability it would be good to have a common abstraction that discovers all of them based on an input and returns the authoritative hash for any targeted config version in time (see the illustrative sketch after the links below).
Related https://github.com/openshift/hypershift/pull/4057
https://github.com/openshift/hypershift/pull/3969#discussion_r1587198191
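To make the intent concrete, here is a minimal, illustrative sketch (TypeScript is used purely for illustration; HyperShift itself is written in Go, and the interface and input names below are hypothetical, not the project's actual API): every input that can roll a NodePool registers behind one interface, and a single function folds all of them into the authoritative hash.

```ts
import { createHash } from 'crypto';

// Hypothetical shape: every input that can trigger a NodePool rolling upgrade
// registers itself here (e.g. release image, machine config, ignition config).
interface UpgradeInput {
  name: string;                 // stable identifier of the input
  resolve(): Promise<string>;   // deterministic serialization of the current value
}

// Compute a single authoritative hash over all registered inputs.
// Any change in any input changes the hash and therefore triggers a rolling upgrade.
async function targetConfigHash(inputs: UpgradeInput[]): Promise<string> {
  const hash = createHash('sha256');
  // Sort by name so the result does not depend on registration order.
  for (const input of [...inputs].sort((a, b) => a.name.localeCompare(b.name))) {
    hash.update(input.name);
    hash.update(await input.resolve());
  }
  return hash.digest('hex');
}
```

Sorting inputs by name before hashing keeps the result deterministic and independent of registration order, so adding a new upgrade trigger becomes a matter of registering one more input rather than touching scattered controller code.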
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
<your text here>
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
<your text here>
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
<enter general Feature acceptance here>
Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to-be-supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Nicole Thoen has already started crafting a document on technical debt impeding PF6 migrations, which contains a list of identified tech-debt items, deprecated components, etc.
Locations
frontend/packages/console-app/src/components/
NavHeader.tsx [merged]
PDBForm.tsx (This should be a <Select>) [merged]
Acceptance Criteria:
DropdownDeprecated usages are replaced with the latest components (see the sketch following the links below)
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/dropdown
https://www.patternfly.org/components/menus/select
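A minimal sketch of the target pattern, assuming the composable Dropdown/MenuToggle API from @patternfly/react-core documented at the links above (the component name and menu items below are placeholders, not the actual NavHeader.tsx/PDBForm.tsx contents):

```tsx
import * as React from 'react';
import {
  Dropdown,
  DropdownItem,
  DropdownList,
  MenuToggle,
  MenuToggleElement,
} from '@patternfly/react-core';

// Placeholder actions; the real items come from the files listed above.
const ActionsDropdown: React.FC = () => {
  const [isOpen, setIsOpen] = React.useState(false);

  return (
    <Dropdown
      isOpen={isOpen}
      onSelect={() => setIsOpen(false)}
      onOpenChange={(open: boolean) => setIsOpen(open)}
      toggle={(toggleRef: React.Ref<MenuToggleElement>) => (
        <MenuToggle ref={toggleRef} isExpanded={isOpen} onClick={() => setIsOpen(!isOpen)}>
          Actions
        </MenuToggle>
      )}
    >
      <DropdownList>
        <DropdownItem key="edit">Edit</DropdownItem>
        <DropdownItem key="delete">Delete</DropdownItem>
      </DropdownList>
    </Dropdown>
  );
};

export default ActionsDropdown;
```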
Locations
frontend/packages/console-shared/src/components/
GettingStartedGrid.tsx (has KebabToggleDeprecated)
Note
DropdownDeprecated usages are replaced with the latest components
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/menu-toggle#plain-toggle-with-icon
https://www.patternfly.org/components/menus/dropdown
https://www.patternfly.org/components/menus/select
AC: Go through the mentioned files and swap the usage of DropdownDeprecated and KebabToggleDeprecated with PF components, based on their semantics (either Dropdown or Select components).
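For KebabToggleDeprecated specifically, the plain toggle with icon from the MenuToggle link above is the replacement; a rough sketch, with placeholder menu items rather than the actual GettingStartedGrid.tsx contents:

```tsx
import * as React from 'react';
import {
  Dropdown,
  DropdownItem,
  DropdownList,
  MenuToggle,
  MenuToggleElement,
} from '@patternfly/react-core';
import { EllipsisVIcon } from '@patternfly/react-icons';

// Placeholder kebab menu replacing KebabToggleDeprecated.
const KebabMenu: React.FC = () => {
  const [isOpen, setIsOpen] = React.useState(false);

  return (
    <Dropdown
      isOpen={isOpen}
      onSelect={() => setIsOpen(false)}
      onOpenChange={setIsOpen}
      toggle={(toggleRef: React.Ref<MenuToggleElement>) => (
        <MenuToggle
          ref={toggleRef}
          aria-label="Card actions"
          variant="plain"
          isExpanded={isOpen}
          onClick={() => setIsOpen(!isOpen)}
        >
          <EllipsisVIcon />
        </MenuToggle>
      )}
    >
      <DropdownList>
        <DropdownItem key="hide">Hide this section</DropdownItem>
      </DropdownList>
    </Dropdown>
  );
};

export default KebabMenu;
```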
AC:
PatternFly demo using Dropdown and Menu components
https://www.patternfly.org/components/menus/application-launcher/
NodeLogs.tsx (two) [merged]
PerspectiveDropdown.tsx (??? Cannot locate this dropdown in the UI. Reached out to Christoph but didn't hear back.)
UserPreferenceDropdownField.tsx [merged]
ClusterConfigurationDropdownField.tsx (??? Cannot locate this dropdown in the UI; dead code)
PerspectiveConfiguration.tsx (options have descriptions) [merged]
Acceptance Criteria
SelectDeprecated usages are replaced with the latest Select component
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/select
AC: Go through the mentioned files and swap the usage of SelectDeprecated with PF Select components.
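A minimal sketch of the replacement, assuming the composable Select/MenuToggle API linked above (the option labels are placeholders, not the actual options in the files listed above):

```tsx
import * as React from 'react';
import {
  MenuToggle,
  MenuToggleElement,
  Select,
  SelectList,
  SelectOption,
} from '@patternfly/react-core';

// Placeholder single-select replacing SelectDeprecated.
const SimpleSelect: React.FC = () => {
  const [isOpen, setIsOpen] = React.useState(false);
  const [selected, setSelected] = React.useState<string>('Option 1');

  return (
    <Select
      isOpen={isOpen}
      selected={selected}
      onSelect={(_event, value) => {
        setSelected(value as string);
        setIsOpen(false);
      }}
      onOpenChange={setIsOpen}
      toggle={(toggleRef: React.Ref<MenuToggleElement>) => (
        <MenuToggle ref={toggleRef} isExpanded={isOpen} onClick={() => setIsOpen(!isOpen)}>
          {selected}
        </MenuToggle>
      )}
    >
      <SelectList>
        <SelectOption value="Option 1">Option 1</SelectOption>
        <SelectOption value="Option 2">Option 2</SelectOption>
      </SelectList>
    </Select>
  );
};

export default SimpleSelect;
```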
resource-dropdown.tsx (checkbox, options have tooltips, grouped options, hasInlineFilter which is not supported in V6 Select, convert to Typeahead)
filter-toolbar.tsx (grouped, checkbox select)
monitoring/dashboards/index.tsx (checkbox select, hasInlineFilter which is not supported in V6 Select, convert to Typeahead) covered by https://issues.redhat.com/browse/ODC-7655
silence-form.tsx (Currently using DropdownDeprecated, should be using a Select)
timespan-dropdown.ts (Currently using DropdownDeprecated, should be using a Select) covered by https://issues.redhat.com/browse/ODC-7655
poll-interval-dropdown.tsx (Currently using DropdownDeprecated, should be using a Select) covered by https://issues.redhat.com/browse/ODC-7655
Note
SelectDeprecated usages are replaced with the latest Select component
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/select
AC: Go through the mentioned files and swap the usage of Deprecated components with PF components, based on their semantics (either Dropdown or Select components).
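For the checkbox variants (resource-dropdown.tsx, filter-toolbar.tsx and similar), the composable Select supports hasCheckbox on SelectOption; a rough sketch with hypothetical status values (the grouped options, tooltips, and inline-filter cases listed above still need per-file handling):

```tsx
import * as React from 'react';
import {
  Badge,
  MenuToggle,
  MenuToggleElement,
  Select,
  SelectList,
  SelectOption,
} from '@patternfly/react-core';

// Placeholder filter values; the real ones come from the files listed above.
const statuses = ['Running', 'Pending', 'Succeeded', 'Failed'];

const CheckboxFilterSelect: React.FC = () => {
  const [isOpen, setIsOpen] = React.useState(false);
  const [selections, setSelections] = React.useState<string[]>([]);

  const onSelect = (_event: React.MouseEvent | undefined, value: string | number | undefined) => {
    const v = String(value);
    // Toggle the clicked value; the menu stays open for multi-selection.
    setSelections((prev) => (prev.includes(v) ? prev.filter((s) => s !== v) : [...prev, v]));
  };

  return (
    <Select
      role="menu"
      isOpen={isOpen}
      selected={selections}
      onSelect={onSelect}
      onOpenChange={setIsOpen}
      toggle={(toggleRef: React.Ref<MenuToggleElement>) => (
        <MenuToggle
          ref={toggleRef}
          isExpanded={isOpen}
          onClick={() => setIsOpen(!isOpen)}
          badge={selections.length ? <Badge isRead>{selections.length}</Badge> : undefined}
        >
          Filter by status
        </MenuToggle>
      )}
    >
      <SelectList>
        {statuses.map((status) => (
          <SelectOption key={status} value={status} hasCheckbox isSelected={selections.includes(status)}>
            {status}
          </SelectOption>
        ))}
      </SelectList>
    </Select>
  );
};

export default CheckboxFilterSelect;
```

The badge on the toggle is optional; it simply surfaces how many filters are active, which mirrors the behavior of the existing toolbar filters.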
operator-channel-version-select.tsx (Two)
Acceptance Criteria
SelectDeprecated usages are replaced with the latest Select component
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/select
AC: Go through the mentioned files and swap the usage of SelectDeprecated with PF Select components.
multiselectdropdown.tsx (multiple typeahead with placeholder and noResultsFoundText); only used in packages/local-storage-operator; moved to https://issues.redhat.com/browse/CONSOLE-4227
UtilizationDurationDropdown.tsx (checkbox select, plain toggle, with placeholder text)
SelectInputField.tsx (uses most Select props) moved to https://issues.redhat.com/browse/ODC-7655
QueryBrowser.tsx (Currently using DropdownDeprecated, should be using a Select)
Note
SelectDeprecated and SelectOptionDeprecated usages are replaced with the latest Select component
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/select
AC: Go through the mentioned files and swap the usage of SelectDeprecated with PF Select components.
Replace DropdownDeprecated
Replace SelectDeprecated
Acceptance Criteria
Note:
DropdownDeprecated and KebabToggleDeprecated usages are replaced with the latest components
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/menu-toggle#plain-toggle-with-icon
https://www.patternfly.org/components/menus/dropdown
https://www.patternfly.org/components/menus/select
Nicole Thoen has already started crafting a document on technical debt impeding PF6 migrations, which contains a list of identified tech-debt items, deprecated components, etc.
KindFilterDropdown.tsx (checkbox select with custom content - not options)
FilterDropdown.tsx (checkbox, grouped, switch component in select menu)
NameLabelFilterDropdown.tsx (Should be a Select component; Currently using DropdownDeprecated)
Acceptance Criteria
SelectDeprecated usages are replaced with the latest Select component
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/select
SecureRouteFields.tsx (Two)
Acceptance Criteria
SelectDeprecated usages are replaced with the latest Select component
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/select
Locations
frontend/packages/topology/MoveConnectionModal.tsx
Note:
DropdownDeprecated usages are replaced with the latest components
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/dropdown
https://www.patternfly.org/components/menus/select
AC: Go through the mentioned files and swap the usage of SelectDeprecated with PF Select or Dropdown components.
monitoring/dashboards/index.tsx (checkbox select, hasInlineFilter which is not supported in V6 Select, convert to Typeahead)
timespan-dropdown.ts (Currently using DropdownDeprecated, should be using a Select)
poll-interval-dropdown.tsx (Currently using DropdownDeprecated, should be using a Select)
SelectInputField.tsx (uses most Select props)
`FilterSelect`, `VariableDropdown`, `TimespanDropdown`, and `IntervalDropdown` are the components that need to be updated; frontend/packages/dev-console/src/components/monitoring/MonitoringPage.tsx is the only valid usage of `MonitoringDashboardsPage`, as web/src/components/alerting.tsx is orphaned.
Note
SelectDeprecated usages are replaced with the latest Select component
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/select
AC: Go through the mentioned files and swap the usage of Deprecated components with PF components, based on their semantics (either Dropdown or Select components).
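Because hasInlineFilter is not supported by the new Select, the inline filter moves into a typeahead toggle: the MenuToggle renders a TextInputGroup and the consumer filters the option list itself. A rough sketch with placeholder options (not the actual dashboard variables):

```tsx
import * as React from 'react';
import {
  MenuToggle,
  MenuToggleElement,
  Select,
  SelectList,
  SelectOption,
  TextInputGroup,
  TextInputGroupMain,
} from '@patternfly/react-core';

// Placeholder options; the real values come from the dashboards listed above.
const options = ['cluster', 'namespace', 'node', 'pod'];

const TypeaheadSelect: React.FC = () => {
  const [isOpen, setIsOpen] = React.useState(false);
  const [selected, setSelected] = React.useState<string>('');
  const [inputValue, setInputValue] = React.useState('');

  // Filter the option list on every keystroke instead of relying on hasInlineFilter.
  const filtered = options.filter((o) => o.toLowerCase().includes(inputValue.toLowerCase()));

  return (
    <Select
      isOpen={isOpen}
      selected={selected}
      onSelect={(_event, value) => {
        setSelected(value as string);
        setInputValue(value as string);
        setIsOpen(false);
      }}
      onOpenChange={setIsOpen}
      toggle={(toggleRef: React.Ref<MenuToggleElement>) => (
        <MenuToggle
          ref={toggleRef}
          variant="typeahead"
          isFullWidth
          isExpanded={isOpen}
          onClick={() => setIsOpen(!isOpen)}
        >
          <TextInputGroup isPlain>
            <TextInputGroupMain
              value={inputValue}
              onClick={() => setIsOpen(true)}
              onChange={(_event, value) => {
                setInputValue(value);
                setIsOpen(true);
              }}
              placeholder="Filter options"
              autoComplete="off"
            />
          </TextInputGroup>
        </MenuToggle>
      )}
    >
      <SelectList>
        {filtered.length ? (
          filtered.map((o) => (
            <SelectOption key={o} value={o}>
              {o}
            </SelectOption>
          ))
        ) : (
          <SelectOption isDisabled>No results found</SelectOption>
        )}
      </SelectList>
    </Select>
  );
};

export default TypeaheadSelect;
```

Keeping the filtering in component state (rather than in the Select itself) is what makes this pattern portable across the checkbox and grouped variants noted above.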
Locations
frontend/packages/pipelines-plugin/src/components/
PipelineQuickSearchVersionDropdown.tsx (Currently using DropdownDeprecated, should be using a Select)
PipelineMetricsTimeRangeDropdown.tsx (Currently using DropdownDeprecated, should be using a Select)
Note
DropdownDeprecated usages are replaced with the latest Select component
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/select
AC: Go through the mentioned files and swap the usage of SelectDeprecated with PF Select components.
TelemetryConfiguration.tsx (options have descriptions)
TelemetryUserPreferenceDropdown.tsx (options have descriptions)
Acceptance Criteria
SelectDeprecated usages are replaced with the latest Select component (see the sketch following the links below)
https://www.patternfly.org/components/menus/menu
https://www.patternfly.org/components/menus/select
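For the telemetry dropdowns where options carry descriptions, the new SelectOption exposes a description prop directly; a small sketch mirroring the Select pattern shown earlier, with placeholder values rather than the actual TelemetryConfiguration.tsx options:

```tsx
import * as React from 'react';
import { MenuToggle, MenuToggleElement, Select, SelectList, SelectOption } from '@patternfly/react-core';

// Placeholder opt-in select demonstrating SelectOption's description prop.
const TelemetryOptInSelect: React.FC = () => {
  const [isOpen, setIsOpen] = React.useState(false);
  const [selected, setSelected] = React.useState('Accept');

  return (
    <Select
      isOpen={isOpen}
      selected={selected}
      onSelect={(_event, value) => {
        setSelected(value as string);
        setIsOpen(false);
      }}
      onOpenChange={setIsOpen}
      toggle={(toggleRef: React.Ref<MenuToggleElement>) => (
        <MenuToggle ref={toggleRef} isExpanded={isOpen} onClick={() => setIsOpen(!isOpen)}>
          {selected}
        </MenuToggle>
      )}
    >
      <SelectList>
        {/* description replaces the per-option description markup used with SelectDeprecated */}
        <SelectOption value="Accept" description="Send anonymized usage data">Accept</SelectOption>
        <SelectOption value="Deny" description="Do not send any usage data">Deny</SelectOption>
      </SelectList>
    </Select>
  );
};

export default TelemetryOptInSelect;
```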
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
<your text here>
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
<your text here>
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
<enter general Feature acceptance here>
Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to-be-supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.