Note: this page shows the Feature-Based Change Log for a release
These features were completed when this image was assembled
Dependencies (internal and external)
The tool should be able to upload an OpenID Connect (OIDC) configuration to an S3 bucket and create an AWS IAM Identity Provider that trusts identities from the OIDC provider. It should take the infra name as input so that the user can identify all the resources created in AWS. Make sure that resources created in AWS are tagged appropriately.
Sample command with existing key pair:
tool-name create identity-provider <infra-name> --public-key ./path/to/public/key
Ensure the Identity Provider includes audience config for both the in-cluster components ('openshift') and the pod-identity-webhook ('sts.amazonaws.com').
ccoctl should be able to delete AWS resources it created
ccoctl delete <infra-name>
https://github.com/openshift/enhancements/pull/555
https://github.com/openshift/api/pull/827
The console operator will need to support single-node clusters.
We have a console deployment and a downloads deployment. Each will need to be updated so that there is only a single replica when high availability mode is disabled in the Infrastructure config. We should also remove the anti-affinity rule in the console deployment that tries to spread console pods across nodes.
The downloads deployment is currently a static manifest. Going forward, it likely needs to be created by the console operator instead.
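A minimal sketch of the intended single-replica change, assuming the console deployment keeps its current name and namespace; the image and labels are placeholders, not the operator's actual output:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: console
  namespace: openshift-console
spec:
  replicas: 1                # single replica when high availability mode is disabled
  selector:
    matchLabels:
      app: console
  template:
    metadata:
      labels:
        app: console
    spec:
      # the podAntiAffinity rule that spreads console pods across nodes is omitted
      # in the single-replica case
      containers:
      - name: console
        image: quay.io/example/console:latest   # placeholder image
```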
Acceptance Criteria:
Bump github.com/openshift/api to pick up changes from openshift/api#827
As an OpenShift administrator
I want the registry operator to use the topology mode from the Infrastructure config (HighAvailable = 2 replicas, SingleReplica = 1 replica)
so that the operator is not spending resources on high availability when it is not needed (see the example sketch after the platform table below).
See also:
https://github.com/openshift/enhancements/blob/master/enhancements/cluster-high-availability-mode-api.md
https://github.com/openshift/api/pull/827/files
Platform | SingleReplica | HighAvailable |
---|---|---|
AWS | 1 replica | 2 replicas |
Azure | 1 replica | 2 replicas |
GCP | 1 replica | 2 replicas |
OpenStack (swift) | 1 replica | 2 replicas |
OpenStack (cinder) | 1 replica | 1 replica (PVC) |
oVirt | 1 replica | 1 replica (PVC) |
bare metal | Removed | Removed |
vSphere | Removed | Removed |
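For reference, a minimal sketch of the Infrastructure status the operator would read; the field and value names (e.g. HighlyAvailable) come from the linked API proposal and should be treated as assumptions rather than the final schema:
```
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
status:
  controlPlaneTopology: HighlyAvailable   # assumed value name from openshift/api#827
  infrastructureTopology: SingleReplica   # registry operator would run 1 replica in this mode
```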
Research if we can dynamically reserve memory and CPU for nodes.
Feature Overview
This will be phase 1 of Internationalization of the OpenShift Console.
Phase 1 will include the following:
Phase 1 will not include:
Initial List of Languages to Support
---------- 4.7* ----------
*This will be based on the ability to get all the strings externalized, there is a good chance this gets pushed to 4.8.
---------- Post 4.7 ----------
POC
Goals
Internationalization has become table stakes. OpenShift Console needs to support different languages in each of the major markets. This is key functionality that will help unlock sales in different regions.
Requirements
Requirement | Notes | isMvp? |
---|---|---|
Language Selector | | YES |
Localized Date + Time | | YES |
Externalization and translation of all client side strings | | YES |
Translation for Chinese and Japanese | | YES |
Process, infra, and testing capabilities put into place | | YES |
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Out of Scope
Assumptions
Customer Considerations
We are rolling this feature out in phases; based on customer feedback, there may be no phase 2.
Documentation Considerations
I believe documentation already supports a large language set.
We need to automate how we send and receive updated translations using Memsource for the Red Hat Globalization team. The Ansible Tower team already has automation in place that we might be able to reuse.
Acceptance Criteria:
We have too many namespaces if we're loading them upfront. We should consolidate some of the files.
Consolidate namespaces N-R to reduce change size
Consolidate namespaces K-M to reduce change size
Consolidate namespaces E-I to reduce change size
Consolidate namespaces S-Z to reduce change size
Just do namespaces from A-D to reduce the number of files being changed at once
Openshift Sandboxed Containers provide the ability to add an additional layer of isolation through virtualization for many workloads. The main way to enable the use of Kata Containers on an Openshift cluster is by first installing the Operator (for more information about operator enablement check [1]).
Once the feature is enabled on the cluster, it is just a matter of a one-line YAML modification at the pod/deployment level to run the workload using Kata Containers. That might sound easy for some, but others who don't want to deal with YAML might want more abstraction around how to use Kata Containers for their workloads.
This feature covers all the efforts required to integrate and present Kata in the Openshift UI (console) to cater to all user personas.
To enable users to adopt Kata as a runtime, it is important to make it easy to use. Adding hook points in the UI with ease of use as a goal in mind is one way to bring in more users.
The main goal of this feature is to make sure that:
Questions to be addressed:
References
[1] https://issues.redhat.com/browse/KATA-429?jql=project %3D KATA AND issuetype %3D Feature
The grand goal is to improve the usability of Kata from Openshift UI. This EPIC aims to cover only a subset that would help:
To use a different runtime e.g., Kata, the "runtimeClassName" will be set to the desired low-level runtime. Also please see [1]:
"RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run. If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an empty definition that uses the default runtime handler. More info: https://git.k8s.io/enhancements/keps/sig-node/runtime-class.md This is a beta feature as of Kubernetes v1.14.."
```
apiVersion: v1
kind: Pod
metadata:
  name: nginx-runc
spec:
  runtimeClassName: runC
```
The value of the runtime class cannot be changed on the pod level, but it can be changed on the deployment level
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sandboxed-nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sandboxed-nginx
  template:
    metadata:
      labels:
        app: sandboxed-nginx
    spec:
      runtimeClassName: kata   # ---> This can be changed
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          protocol: TCP
```
[1] https://docs.openshift.com/container-platform/4.6/rest_api/workloads_apis/pod-core-v1.html
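For context, a RuntimeClass object that a `runtimeClassName` refers to could look like the sketch below; the handler name `kata` is an assumption about how the sandboxed containers operator registers the runtime with CRI-O:
```
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata   # assumed CRI handler name configured by the operator
```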
We should show the runtime class on workloads pages and add a badge to the heading in the case a workload uses Kata. A workload uses Kata if its pod template has `runtimeClassName` set to `kata`.
Acceptance Criteria:
Andrew Ronaldson indicated that adding a "kata" badge in the heading would be too much noise around other heading badges (ContainerCreating, Failed, etc).
Assumption
Doc: https://docs.google.com/document/d/1sXCaRt3PE0iFmq7ei0Yb1svqzY9bygR5IprjgioRkjc/edit
CNCC was moved to the management cluster and it should use proxy settings defined for the management cluster.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
OVN IC will be the model used in Hypershift.
OVN IC changes are in progress. A number of metrics have either been moved to a new Prometheus subsystem or simply had their subsystem changed. See this metrics doc for details: https://github.com/martinkennelly/ovn-kubernetes-1/blob/c47ed896d6eef1e78844cc258deafd20502c348b/docs/metrics.md
RHEL CoreOS should be updated to RHEL 9.2 sources to take advantage of newer features, hardware support, and performance improvements.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Questions to be addressed:
PROBLEM
We would like to improve our signal for RHEL9 readiness by increasing internal engineering engagement and external partner engagement on our community OpenShift offering, OKD.
PROPOSAL
Adding OKD to run on SCOS (CentOS Stream CoreOS) brings the community offering closer to what a partner or an internal engineering team might expect on OCP.
ACCEPTANCE CRITERIA
Image has been switched/included:
DEPENDENCIES
The SCOS build payload.
RELATED RESOURCES
OKD+SCOS proposal: https://docs.google.com/presentation/d/1_Xa9Z4tSqB7U2No7WA0KXb3lDIngNaQpS504ZLrCmg8/edit#slide=id.p
OKD+SCOS work draft: https://docs.google.com/document/d/1cuWOXhATexNLWGKLjaOcVF4V95JJjP1E3UmQ2kDVzsA/edit
Acceptance Criteria
A stable OKD on SCOS is built and available to the community every sprint.
This comes up when installing ipi-on-aws on arm64 with the custom payload build at quay.io/aleskandrox/okd-release:4.12.0-0.okd-centos9-full-rebuild-arm64, which is using SCOS as the machine-os-content image
```
[root@ip-10-0-135-176 core]# crictl logs c483c92e118d8
2022-08-11T12:19:39+00:00 [cnibincopy] FATAL ERROR: Unsupported OS ID=scos
```
The probable fix has to land on https://github.com/openshift/cluster-network-operator/blob/master/bindata/network/multus/multus.yaml#L41-L53
As described in the kubernetes "ephemeral volumes" documentation, this feature tracks GA and improvements in OCP.
OCPPLAN-9193 implemented local ephemeral capacity management as well as CSI generic ephemeral volumes. This feature tracks the remaining work to GA CSI ephemeral in-line volumes, especially the admission plugin to make the feature secure and prevent any insecure driver from using it. Ephemeral in-line volumes are required by some CSI drivers as a key feature to operate (e.g. the Secrets Store CSI driver); ODF is also planning to GA ephemeral in-line volumes with the Ceph CSI driver.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
Goal:
The goal is to provide inline volume support (also known as ephemeral volumes) via a CSI driver/operator. This epic also tracks the development of the new admission plugin required to make inline volumes safe.
Problem:
Why is this important:
Dependencies (internal and external):
Prioritized epics + deliverables (in scope / not in scope):
Estimate (XS, S, M, L, XL, XXL):
Previous Work:
Customers:
Open questions:
Notes:
Create a validating admission plugin that allows pods to be created if:
Enhancement: https://github.com/openshift/enhancements/blob/master/enhancements/storage/csi-inline-vol-security.md
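A sketch of how a driver could be labeled under the admission plugin described above; the label key and values are taken from the linked enhancement and should be treated as assumptions until the work merges:
```
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: secrets-store.csi.k8s.io
  labels:
    # assumed label from the csi-inline-vol-security enhancement; pods in namespaces
    # allowing the "restricted" profile could then use inline volumes from this driver
    security.openshift.io/csi-ephemeral-volume-profile: restricted
spec:
  volumeLifecycleModes:
  - Ephemeral
```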
Create a warning-severity alert to notify the admin that packet loss is occurring due to failed OVS vswitchd lookups. This may occur if vswitchd is CPU constrained and there are also numerous lookups.
Use the metric ovs_vswitchd_netlink_overflow, which shows netlink messages dropped by the vswitchd daemon due to buffer overflow in userspace.
For the kernel equivalent, use the metric ovs_vswitchd_dp_flows_lookup_lost. Both metrics usually have the same value but may differ if vswitchd restarts.
Both these metrics should be aggregated into a single alert if the value has increased recently.
DoD: QE test case, code merged to CNO, metrics document updated ( https://docs.google.com/document/d/1lItYV0tTt5-ivX77izb1KuzN9S8-7YgO9ndlhATaVUg/edit )
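A rough sketch of what the aggregated alert could look like, assuming the two metrics keep the names above; the rule name, threshold, and durations are placeholders, not the final values:
```
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ovs-flow-lookup-loss          # hypothetical name
  namespace: openshift-ovn-kubernetes
spec:
  groups:
  - name: ovs.rules
    rules:
    - alert: OVSPacketLookupDrops     # hypothetical alert name
      expr: |
        increase(ovs_vswitchd_netlink_overflow[5m])
          + increase(ovs_vswitchd_dp_flows_lookup_lost[5m]) > 0
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: OVS vswitchd is dropping flow lookups, which may cause packet loss.
```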
The ovn-k manifests in CNO are not up to date; we want to sync them with the manifests in the microshift repo.
Extend the Workload Partitioning feature to support multi-node clusters.
Customers running RAN workloads on C-RAN Hubs (i.e. multi-node clusters) that want to maximize the cores available to the workloads (DU) should be able to utilize Workload Partitioning to isolate control plane processes to reserved cores.
Requirements
A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
requirement | Notes | isMvp? |
< How will the user interact with this feature? >
< Which users will use this and when will they use it? >
< Is this feature used as part of current user interface? >
< What does the person writing code, testing, documenting need to know? >
< Are there assumptions being made regarding prerequisites and dependencies?>
< Are there assumptions about hardware, software or people resources?>
< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>
< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>
<Does the Feature introduce data that could be gathered and used for Insights purposes?>
< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >
< What does success look like?>
< Does this feature have doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact>
< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>
< Which other products and versions in our portfolio does this feature impact?>
< What interoperability test scenarios should be factored by the layered product(s)?>
Question | Outcome |
Update admission controller to remove check for SNO
Add Node Admission controller to stop nodes from joining that do not have CPU Partitioning turned on.
When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release
The goal of this effort is to leverage OVN Kubernetes SDN to satisfy networking requirements of both traditional and modern virtualization. This Feature describes the envisioned outcome and tracks its implementation.
In its current state, OpenShift Virtualization provides a flexible toolset allowing customers to connect VMs to the physical network. It also has limited secondary overlay network capabilities and Pod network support.
It suffers from several gaps: the topology of the default pod network is not suitable for a typical VM workload, so we are missing out on many of the advanced capabilities of OpenShift networking, and we also don't have a good solution for public cloud. Another problem is that while we provide plenty of tools to build a network solution, we are not very good at guiding cluster administrators in configuring their network, making them rely on their account team.
Provide:
... while maintaining networking expectations of a typical VM workload:
Additionally, make our networking configuration more accessible to newcomers by providing a finite list of user stories mapped to recommended solutions.
4.17 user defined networks TP:
4.17 user defined networks DP:
4.17 other work:
4.18 user-defined networks for public cloud GA:
4.18 user-defined networks, other, GA:
4.18 localnet enhancements:
You can find more info about this effort in https://docs.google.com/document/d/1jNr0E0YMIHsHu-aJ4uB2YjNY00L9TpzZJNWf3LxRsKY/edit
Provide IPAM to customers connecting VMs to OVN Kubernetes secondary networks.
Who | What | Reference |
---|---|---|
DEV | Upstream roadmap issue | <link to GitHub Issue> |
DEV | Upstream code and tests merged | <link to meaningful PR> |
DEV | Upstream documentation merged | <link to meaningful PR> |
DEV | gap doc updated | <name sheet and cell> |
DEV | Upgrade consideration | <link to upgrade-related test or design doc> |
DEV | CEE/PX summary presentation | label epic with cee-training and add a <link to your support-facing preso> |
QE | Test plans in Polarion | https://polarion.engineering.redhat.com/polarion/#/project/CNV/workitem?id=CNV-10864 |
QE | Automated tests merged | <link or reference to automated tests> |
DOC | Downstream documentation merged | <link to meaningful PR> |
Add a knob to CNO to control the installation of the IPAMClaim CRD.
Requires a new OpenShift feature gate allowing the feature to be installed only in Dev/Tech Preview.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
Stable IPs for migration:
IP negotiated using DHCP:
As a stakeholder aiming to adopt KubeSaw as a Namespace-as-a-Service solution, I want the project to provide streamlined tooling and a clear code-base, ensuring seamless adoption and integration into my clusters.
Efficient adoption of KubeSaw, especially as a Namespace-as-a-Service solution, relies on intuitive tooling and a transparent codebase. Improving these aspects will empower stakeholders to effortlessly integrate KubeSaw into their Kubernetes clusters, ensuring a smooth transition to enhanced namespace management.
As a Stakeholder, I want a streamlined setup of the KubeSaw project and a fully automated way of upgrading this setup along with the updates of the installation.
The expected outcome within the market is both growth and retention. The improved tooling and codebase will attract new stakeholders (growth) and enhance the experience for existing users (retention) by providing a straightforward path to adopting KubeSaw's Namespace-as-a-Service features in their clusters.
This epic is to track all the unplanned work related to security incidents, fixing flaky e2e tests, and other urgent and unplanned efforts that may arise during the sprint.
We drive OpenShift cross-market customer success and new customer adoption with constant improvements and feature additions to the existing capabilities of our OpenShift Core Networking (SDN and Network Edge). This feature captures that natural progression of the product.
There are definitely grey areas, but in general:
Questions to be addressed:
CoreDNS v1.7 renamed some metrics that we use in our alerting rules. Make sure the alerting rules in https://github.com/openshift/cluster-dns-operator/blob/master/manifests/0000_90_dns-operator_03_prometheusrules.yaml are using the correct metrics names (and still work as intended).
We need to verify that no new CoreDNS dual stack features require any configuration changes or feature flags.
(All dual stack changes should just work once we rebase to coredns v1.8.1).
See https://github.com/coredns/coredns/pull/4339 .
We also need to verify that cluster DNS works for both v4 and v6 for a dual stack cluster IP service. (ie request via A and AAAA, make sure you get the desired response, and not just one or the other). A brief CI test on our dual stack metal CI might make the most sense here (KNI Might have a job like this already, need to investigate our options to add dual stack coverage to openshift/coredns).
This story is for actually updating the version of CoreDNS in github.com/openshift/coredns. Our fork will need to be rebased onto https://github.com/coredns/coredns/releases/tag/v1.8.1, which may involve some git fu. Refer to previous CoreDNS Rebase PR's for any pointers there.
Create a PR in openshift/cluster-ingress-operator to implement the PROXY protocol API.
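A sketch of how an IngressController might request PROXY protocol under the proposed API; the exact field layout is an assumption pending the enhancement:
```
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: HostNetwork
    hostNetwork:
      protocol: PROXY   # assumed field enabling the PROXY protocol for this strategy
```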
Create a PR in openshift/cluster-ingress-operator to specify the random balancing algorithm if the feature gate is enabled, and to specify the leastconn balancing algorithm (the current default) otherwise.
The multiple destinations provided as part of the allowedDestinations field are not working on OCP4 as they used to: https://github.com/openshift/images/blob/master/egress/router/egress-router.sh#L70-L109
We need to parse this from the NAD and modify the iptables here to support them:
https://github.com/openshift/egress-router-cni/blob/master/pkg/macvlan/macvlan.go#L272-L349
Testing:
1) Created NAD:
```
[dsal@bkr-hv02 surya_multiple_destinations]$ cat nad_multiple_destination.yaml
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: egress-router
spec:
  config: '{
    "cniVersion": "0.4.0",
    "type": "egress-router",
    "name": "egress-router",
    "ip": {
      "addresses": [ "10.200.16.10/24" ],
      "destinations": [
        "80 tcp 10.100.3.200",
        "8080 tcp 203.0.113.26 80",
        "8443 tcp 203.0.113.26 443"
      ],
      "gateway": "10.200.16.1"
    }
  }'
```
2) Created pod:
```
[dsal@bkr-hv02 surya_multiple_destinations]$ cat egress-router-pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: egress-router-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: egress-router
spec:
  containers:
  - name: openshift-egress-router-pod
    command: ["/bin/bash", "-c", "sleep 999999999"]
    image: centos/tools
    securityContext:
      privileged: true
```
3) Checked IPtables:
```
[root@worker-1 core]# iptables-save -t nat
# Generated by iptables-save v1.8.4 on Mon Feb 1 12:08:05 2021
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A POSTROUTING -o net1 -j SNAT --to-source 10.200.16.10
COMMIT
# Completed on Mon Feb 1 12:08:05 2021
```
As we can see, only the SNAT rule is added. The DNAT doesn't get picked up because of the syntax difference.
Plugin teams need a mechanism to extend the OCP console that is decoupled enough so they can deliver at the cadence of their projects and not be forced into the OCP Console release timelines.
The OCP Console Dynamic Plugin Framework will enable all our plugin teams to do the following:
Requirement | Notes | isMvp? |
---|---|---|
UI to enable and disable plugins | YES | |
Dynamic Plugin Framework in place | YES | |
Testing Infra up and running | YES | |
Docs and read me for creating and testing Plugins | YES | |
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Documentation Considerations
Questions to be addressed:
Related to CONSOLE-2380
We need a way for cluster admins to disable a console plugin when uninstalling an operator if it's enabled in the console operator config. Otherwise, the config will reference a plugin that no longer exists. This won't prevent console from loading, but it's something that we can clean up during uninstall.
The UI will always remove the console plugin when an operator is uninstalled. There will not be an option to keep the plugin enabled. We should have a sentence in the dialog letting the user know that the plugin will be disabled when the operator is uninstalled (but only if the CSV has the plugin annotation).
If the user doesn't have authority to patch the operator config, we should warn them that the operator config can't be updated to remove the plugin.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
The work on this story is dependent on following changes:
The console already supports custom routes on the operator config. The newly proposed CustomDomains API introduces a unified way to set up custom domains for stock-installed routes, covering both the names and the serving cert/keys that customers want to customise. From the console perspective those are:
The setup should be done on the Ingress config, where two new fields are introduced:
The console-operator will only be consuming the API and checking for any changes. If a custom domain is set for either the `console` or `downloads` route in the `openshift-console` namespace, the console-operator will read the setup and set a custom route accordingly. When a custom route is set up for any of the console's routes, the default route won't be deleted; instead it will be updated so that it redirects to the custom one. This is done for two reasons:
The console-operator will still need to support the CustomDomain API that is available on its config.
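A sketch of the Ingress config setup the console-operator would consume, using the componentRoutes shape from the proposal; the field names and hostname are assumptions until the API lands:
```
apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
spec:
  componentRoutes:                       # assumed new field on the Ingress config
  - name: console
    namespace: openshift-console
    hostname: console.apps.example.com   # placeholder custom domain
    servingCertKeyPairSecret:
      name: console-custom-tls           # secret holding the custom serving cert/key
```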
Acceptance criteria:
Questions:
Bump the openshift/api godep to pick up the new CustomDomain API for the Ingress config.
Implement console-operator changes to consume new CustomDomains API, based on the story details.
Story:
As a user viewing the pod logs tab with a selected container, I want the ability to view past logs if they are available for the container.
Acceptance Criteria:
Design doc: https://docs.google.com/document/d/1PB8_D5LTWhFPFp3Ovf85jJTc-zAxwgFR-sAOcjQCSBQ/edit#
When moving to OCP 4 we didn't port the metrics charts for Deployments, Deployment Configs, StatefulSets, DaemonSets, ReplicaSets, and ReplicationControllers. These should be the same charts that we show on the Pods page: Memory, CPU, Filesystem, Network In and Out.
This was only done for pods.
We need to decide if we want use a multiline chart or some other representation.
This would let us import YAML with multiple resources and add YAML templates that create related resources like image streams and build configs together.
See CONSOLE-580
Acceptance criteria:
As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for OpenStack deployments which currently use Terraform for setting up the infrastructure.
To avoid an increased support overhead once the license changes at the end of the year, we want to provision OpenStack infrastructure without the use of Terraform.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Essentially: bring the upstream-master branch of shiftstack/cluster-api-provider-openstack under the github.com/openshift organisation.
As a result of Hashicorp's license change to BSL, Red Hat OpenShift needs to remove the use of Hashicorp's Terraform from the installer – specifically for IPI deployments which currently use Terraform for setting up the infrastructure.
To avoid an increased support overhead once the license changes at the end of the year, we want to provision OpenShift on the existing supported providers' infrastructure without the use of Terraform.
This feature will be used to track all the CAPI preparation work that is common for all the supported providers
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
I want hack/build.sh to embed the kube-apiserver and etcd dependencies in openshift-install without making external network calls so that ART/OSBS can build the installer with CAPI dependencies.
Description of criteria:
This requires/does not require a design proposal.
This requires/does not require a feature gate.
The 100.88.0.0/14 IPv4 subnet is currently reserved for the transit switch in OVN-Kubernetes for east west traffic in the OVN Interconnect architecture. We need to make this value configurable so that users can avoid conflicts with their local infrastructure. We need to support this config both prior to installation and post installation (day 2).
This epic will include stories for the upstream ovn-org work, getting that work downstream, an api change, and a cno change to consume the new api
After the upstream pr merges it needs to get into openshift ovn-k via a downstream merge
The scope of this card is to track the work around getting the required pieces into CNO that will let users customize the transit switch subnet on both day 0 (install) and day 2 (post-install).
This card will complement https://issues.redhat.com/browse/SDN-4156
You can create the cluster-bot cluster with Ben's PR and do CNO changes locally and test them out.
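A sketch of the day-2 configuration CNO could accept; the field name `internalTransitSwitchSubnet` mirrors the proposed API and is an assumption until the api/CNO changes merge:
```
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      ipv4:
        internalTransitSwitchSubnet: 100.70.0.0/16   # assumed field; replaces the default reserved transit switch subnet
```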
Networking Definition of Planned
Epic Template descriptions and documentation
Support EgressIP feature with ExternalTrafficPolicy=Local and External2Pod direct routing in OVNKubernetes.
We see a lot of customers using Multi-Egress Gateway with EgressIP.
Currently, connections which reach a pod via the OVN routing gateway are sent back via the EgressIP if it is associated with the specific namespace.
Multiple bugs have been reported by customers:
https://issues.redhat.com/browse/OCPBUGS-16792
https://issues.redhat.com/browse/OCPBUGS-7454
https://issues.redhat.com/browse/OCPBUGS-18400
This also resulted in filing RFEs, as it was too complicated to be fixed via a bug.
https://issues.redhat.com/browse/RFE-4614
https://issues.redhat.com/browse/RFE-3944
This is observed by multiple customers using MetalLB and F5 load balancers. We haven't really tested this combination.
From the initial discussion, it looks like the fix is needed in OVN. Request the team to expedite this fix, given that a bunch of customers are hitting it.
Additional information on each of the above items can be found here: Networking Definition of Planned
...
...
1. …
1. …
In 4.15, before conducting the live migration, CNO will check if a cluster is managed by the SD team. We need to remove this check to support unmanaged clusters.
Epic Goal*
Provide a long term solution to SELinux context labeling in OCP.
Why is this important? (mandatory)
As of today, when SELinux is enabled, the PV's files are relabeled when attaching the PV to the pod; this can cause timeouts when the PV contains a lot of files, as well as overload the storage backend.
https://access.redhat.com/solutions/6221251 provides a few workarounds until the proper fix is implemented. Unfortunately these workarounds are not perfect and we need a long-term, seamless, optimised solution.
This feature tracks the long-term solution where the PV filesystem will be mounted with the right SELinux context, thus avoiding relabeling every file.
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
As we are relying on mount context there should not be any relabeling (chcon) because all files / folders will inherit the context from the mount context
More on design & scenarios in the KEP and related epic STOR-1173
Dependencies (internal and external) (mandatory)
None for the core feature
However the driver will have to set SELinuxMountSupported to true in the CSIDriverSpec to enable this feature.
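For illustration, a driver opting in might look like the sketch below; note that the upstream CSIDriverSpec field is named `seLinuxMount`, so treat the mapping to the "SELinuxMountSupported" name used above as an assumption:
```
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: ebs.csi.aws.com   # example driver
spec:
  seLinuxMount: true      # upstream field; signals that volumes can be mounted with -o context
```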
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Support the upstream feature "SELinux relabeling using mount options (CSIDriver API change)" in OCP as Beta, i.e. test it and have docs for it (unless it's Alpha upstream).
Summary: If a Pod has a defined SELinux context (e.g. it uses the "restricted" SCC) and it uses a ReadWriteOncePod PVC and the CSI driver responsible for the volume supports this feature, kubelet + the CSI driver will mount the volume directly with the correct SELinux labels. Therefore CRI-O does not need to recursively relabel the volume and pod startup can be significantly faster. We will need thorough documentation for this.
This upstream epic actually will be implemented by us!
Test that the metrics described in the KEP provide useful data. I.e. check that volume_manager_selinux_volume_context_mismatch_warnings_total increases when there are two Pods that have two different SELinux contexts and use the same volume and different subpath of it.
For more details: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1710302543729539
Epic Goal*
Drive the technical part of the Kubernetes 1.29 upgrade, including rebasing the openshift/kubernetes repository and coordination across the OpenShift organization to get e2e tests green for the OCP release.
Why is this important? (mandatory)
OpenShift 4.17 cannot be released without Kubernetes 1.30
Scenarios (mandatory)
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
PRs:
Networking Definition of Planned
Epic Template descriptions and documentation
Additional information on each of the above items can be found here: Networking Definition of Planned
...
1.
...
1. …
1. …
Support network isolation and multiple primary networks (with the possibility of overlapping IP subnets) without having to use Kubernetes Network Policies.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
OVN-Kubernetes today allows multiple different types of networks per secondary network: layer 2, layer 3, or localnet. Pods can be connected to different networks without discretion. For the primary network, OVN-Kubernetes only supports all pods connecting to the same layer 3 virtual topology.
As users migrate from OpenStack to Kubernetes, there is a need to provide network parity for those users. In OpenStack, each tenant (analog to a Kubernetes namespace) by default has a layer 2 network, which is isolated from any other tenant. Connectivity to other networks must be specified explicitly as network configuration via a Neutron router. In Kubernetes the paradigm is the opposite; by default all pods can reach other pods, and security is provided by implementing Network Policy.
Network Policy has its issues:
With all these factors considered, there is a clear need to address network security in a native fashion, by using networks per user to isolate traffic instead of using Kubernetes Network Policy.
Therefore, the scope of this effort is to bring the same flexibility of the secondary network to the primary network and allow pods to connect to different types of networks that are independent of networks that other pods may connect to.
Test scenarios:
In order for the network API related CRDs to be installed and usable out of the box, the new CRD manifests should be replicated to the CNO repository in a way that it will install them along with other OVN-K CRDs.
Example https://github.com/openshift/cluster-network-operator/pull/1765
See https://github.com/ovn-org/ovn-kubernetes/pull/4276#discussion_r1628111584 for more details
The goal of this task is to simply add a feature gate, both upstream to OVNK and downstream in ocp/api, to then leverage via CNO once the entire feature merges. This is going to be a huge EPIC, so with the breakdown, this card is intentionally ONLY tracking the glue work to have the feature gate piece done in both places.
This card DOES NOT HAVE TO USE THE FEATURE GATE. It is meant to allow other cards to use this.
Epic Goal*
Drive the technical part of the Kubernetes 1.31 upgrade, including rebasing the openshift/kubernetes repository and coordination across the OpenShift organization to get e2e tests green for the OCP release.
Why is this important? (mandatory)
OpenShift 4.18 cannot be released without Kubernetes 1.31
Scenarios (mandatory)
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
PRs:
Networking Definition of Planned
Epic Template descriptions and documentation
Track the stories that cannot be completed before live migration GA.
These tasks shall not block the live migration GA, but we still need to get them done.
Additional information on each of the above items can be found here: Networking Definition of Planned
...
1.
...
1. ...
1. ...
The SDN live migration cannot work properly in a cluster with specific configurations. CNO shall refuse to proceed with the live migration in such a case. We need to add the pre-migration validation to CNO.
The live migration shall be blocked for clusters with the following configuration
As the live migration process may take hours for a large cluster, the workload in the cluster may trigger cluster extension by adding new nodes. We need to support adding new nodes while an SDN live migration is in progress.
We need to backport this to 4.15.
The SD team manages many clusters. Metrics can help them monitor the status of many clusters at a time. Something similar has been done for cluster upgrades; we may want to follow the same recipe.
Elaborate more dashboards (monitoring dashboards, accessible from menu Observe > Dashboards ; admin perspective) related to networking.
Start with just a couple of areas:
More info/discussion in this work doc: https://docs.google.com/document/d/1ByNIJiOzd6w5csFYpC27NdOydnBg8Tx45uL4-7v-aCM/edit
Elaborate more dashboards (monitoring dashboards, accessible from menu Observe > Dashboards ; admin perspective) related to networking.
Start with just a couple of areas:
More info/discussion in this work doc: https://docs.google.com/document/d/1ByNIJiOzd6w5csFYpC27NdOydnBg8Tx45uL4-7v-aCM/edit
Martin Kennelly is our contact point from the SDN team
Create a dashboard from the CNO
Current metrics documentation:
Include metrics for:
Customers have requested the ability to apply tolerations to the HCP control plane pods. This provides the flexibility to have the HCP pods scheduled to nodes with taints applied to them that are not currently tolerated by default.
API
Add new field to HostedCluster. hc.Spec.Tolerations
Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
Implementation
In support/config/deployment.go, add hc.spec.tolerations from hc when generating the default config. This will cause the toleration to naturally get spread to the deployments and statefulsets.
CLI
Add a new CLI argument called --tolerations to the hcp CLI tool during cluster creation. This argument should be able to be set multiple times. The syntax of the field should follow the convention set by the kubectl client tool when setting a taint on a node.
For example, the kubectl client tool can be used to set the following taint on a node.
kubectl taint nodes node1 key1=value1:NoSchedule
And then the hcp cli tool should be able to add a toleration for this taint during creation with the following cli arg.
hcp cluster create kubevirt --toleration "key1=value1:NoSchedule" …
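Put together, a HostedCluster using the proposed field might look like the following sketch; the names and values are illustrative only:
```
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
spec:
  tolerations:                # proposed field; spread to the HCP deployments and statefulsets
  - key: key1
    operator: Equal
    value: value1
    effect: NoSchedule
```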
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
<enter general Feature acceptance here>
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Customers have requested the ability to apply tolerations to the HCP control plane pods. This provides the flexibility to have the HCP pods scheduled to nodes with taints applied to them that are not currently tolerated by default.
API
Add new field to HostedCluster. hc.Spec.Tolerations
Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
Implementation
In support/config/deployment.go, add hc.spec.tolerations from hc when generating the default config. This will cause the toleration to naturally get spread to the deployments and statefulsets.
CLI
Add a new CLI argument called --tolerations to the hcp CLI tool during cluster creation. This argument should be able to be set multiple times. The syntax of the field should follow the convention set by the kubectl client tool when setting a taint on a node.
For example, the kubectl client tool can be used to set the following taint on a node.
kubectl taint nodes node1 key1=value1:NoSchedule
And then the hcp cli tool should be able to add a toleration for this taint during creation with the following cli arg.
hcp cluster create kubevirt --toleration "key1=value1:NoSchedule" …
The cluster-network-operator needs to be HCP tolerations aware, otherwise controllers (like multus and ovn) won't be deployed by the CNO with the correct tolerations.
The code that looks at the HostedControlPlane within the CNO can be found in pkg/hypershift/hypershift.go. https://github.com/openshift/cluster-network-operator/blob/33070b57aac78118eea34060adef7f2fb7b7b4bf/pkg/hypershift/hypershift.go#L134
Continue scale testing and performance improvements for ovn-kubernetes
Networking Definition of Planned
Epic Template descriptions and documentation
Manage Openshift Virtual Machines IP addresses from within the SDN solution provided by OVN-Kubernetes.
Customers want to offload IPAM from their custom solutions (e.g. custom DHCP server running on their cluster network) to SDN.
Additional information on each of the above items can be found here: Networking Definition of Planned
...
1.
...
1. …
1. …
Investigate options to identify OVN scale problems that usually cause high CPU usage for ovn-controller and vswitchd.
https://docs.google.com/document/d/15PLDLKB9tGnbGYMhdHjlOsvlvXzT9TV7VfKmTJCwMRk/edit
Using a source port group instead of an address set will decrease the number of OVS flows per node.
Needs to be backported to 4.14
Enable CPU manager on s390x.
Why is this important?
CPU manager is an important component to manage performance of OpenShift and utilize the respective platforms.
Enable CPU manager on s390x.
CPU manager works on s390x.
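Enabling the CPU manager follows the usual OpenShift pattern and is not s390x-specific; a minimal sketch, assuming the worker MachineConfigPool has been labeled for the custom KubeletConfig:
```
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled   # assumed label added to the worker MachineConfigPool
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
```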
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | Y |
Classic (standalone cluster) | Y |
Hosted control planes | Y |
Multi node, Compact (three node), or Single node (SNO), or all | Y |
Connected / Restricted Network | Y |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | IBM Z |
Operator compatibility | n/a |
Backport needed (list applicable versions) | n/a |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | n/a |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
<your text here>
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
OVN Kubernetes Developer's Preview for BGP as a routing protocol, providing User Defined Network (Segmentation) pod and VM addressability via common data center networking and removing the need to negotiate NAT at the cluster's edge.
OVN-Kubernetes currently has no native routing protocol integration, and relies on a Geneve overlay for east/west traffic, as well as third party operators to handle external network integration into the cluster. The purpose of this Developer's Preview enhancement is to introduce BGP as a supported routing protocol with OVN-Kubernetes. The extent of this support will allow OVN-Kubernetes to integrate into different BGP user environments, enabling it to dynamically expose cluster scoped network entities into a provider’s network, as well as program BGP learned routes from the provider’s network into OVN. In a follow-on release, this enhancement will provide support for EVPN, which is a common data center networking fabric that relies on BGP.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) |
Importing Routes from the Provider Network
Today in OpenShift there is no API for a user to be able to configure routes into OVN. In order for a user to change how egress traffic is routed out of the cluster, the user leverages local gateway mode, which forces egress traffic to hop through the Linux host's networking stack, where a user can configure routes inside of the host via NMState. This manual configuration would need to be performed and maintained across nodes and VRFs within each node.
Additionally, if a user chooses to not manage routes within the host and use local gateway mode, then by default traffic is always sent to the default gateway. The only other way to affect egress routing is by using the Multiple External Gateways (MEG) feature. With this feature the user may choose to have multiple different egress gateways per namespace to send traffic to.
As an alternative, configuring BGP peers and which route-targets to import would eliminate the need to manually configure routes in the host, and would allow dynamic routing updates based on changes in the provider’s network.
Exporting Routes into the Provider Network
There exists a need for provider networks to learn routes directly to services and pods today in Kubernetes. MetalLB is already one solution whereby load balancer IPs are advertised by BGP to provider networks, and this feature development does not intend to duplicate or replace the function of MetalLB. MetalLB should be able to interoperate with OVN-Kubernetes, and be responsible for advertising services to a provider's network.
However, there is an alternative need to advertise pod IPs on the provider network. One use case is integration with 3rd party load balancers, where they terminate a load balancer and then send packets directly to OCP nodes with the destination IP address being the pod IP itself. Today these load balancers rely on custom operators to detect which node a pod is scheduled to and then add routes into its load balancer to send the packet to the right node.
By integrating BGP and advertising the pod subnets/addresses directly on the provider network, load balancers and other entities on the network would be able to reach the pod IPs directly.
Extending OVN-Kubernetes VRFs into the Provider Network
This is the most powerful motivation for bringing support of EVPN into OVN-Kubernetes. A previous development effort enabled the ability to create a network per namespace (VRF) in OVN-Kubernetes, allowing users to create multiple isolated networks for namespaces of pods. However, the VRFs terminate at node egress, and routes are leaked from the default VRF so that traffic is able to route out of the OCP node. With EVPN, we can now extend the VRFs into the provider network using a VPN. This unlocks the ability to have L3VPNs that extend across the provider networks.
Utilizing the EVPN Fabric as the Overlay for OVN-Kubernetes
In addition to extending VRFs to the outside world for ingress and egress, we can also leverage EVPN to handle extending VRFs into the fabric for east/west traffic. This is useful in EVPN DC deployments where EVPN is already being used in the TOR network, and there is no need to use a Geneve overlay. In this use case, both layer 2 (MAC-VRFs) and layer 3 (IP-VRFs) can be advertised directly to the EVPN fabric. One advantage of doing this is that with Layer 2 networks, broadcast, unknown-unicast and multicast (BUM) traffic is suppressed across the EVPN fabric. Therefore the flooding domain in L2 networks for this type of traffic is limited to the node.
Multi-homing, Link Redundancy, Fast Convergence
Extending the EVPN fabric to OCP nodes brings other added benefits that are not present in OCP natively today. In this design there are at least 2 physical NICs and links leaving the OCP node to the EVPN leaves. This provides link redundancy, and when coupled with BFD and mass withdrawal, it can also provide fast failover. Additionally, the links can be used by the EVPN fabric to utilize ECMP routing.
OVN Kubernetes support for BGP as a routing protocol.
Additional information on each of the above items can be found here: Networking Definition of Planned
...
1.
...
1. …
1. …
When the OCP API flag to enable BGP support in the cluster is set, CNO should deploy FRR-K8S. Depends on SDN-5086.
enhancement ref: https://github.com/openshift/enhancements/pull/1636
In the current version, the router does not support loading secrets directly and uses the route resource to load the private key and certificates, exposing the security artifacts.
Acceptance criteria:
Description of problem:
should reduce error message details for Not Found secret when edit/patch route with spec.tls.externalCertificate
Version-Release number of selected component (if applicable):
4.16.0-0.ci.test-2024-05-13-005506-ci-ln-05s0z32-latest
How reproducible:
100%
Steps to Reproduce:
1. Enable the TP feature "RouteExternalCertificate"
2. Create pod, svc and route
3. oc -n hongli patch route myedge --type=merge --patch='{"spec":{"tls":{"externalCertificate":{"name": "newtls"}}}}'
Actual results:
the error message: The Route "myedge" is invalid: spec.tls.externalCertificate: Not found: errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"secrets \"newtls\" not found", Reason:"NotFound", Details:(*v1.StatusDetails)(0xc0077e25a0), Code:404}}
Expected results:
something like: `spec.tls.externalCertificate: Not found: "secrets \"newtls\" not found"`
Additional info:
discuss in thread of https://redhat-internal.slack.com/archives/C06EK9ZH3Q8/p1715243443244879
Goal:
Support enabling dual-stack VIPs on existing clusters that were created as dual-stack at a time when it was not possible to have both v4 and v6 VIPs at the same time.
Why is this important?
This is a followup to SDN-2213 ("Support dual ipv4 and ipv6 ingress and api VIPs").
We expect that customers with existing dual stack clusters will want to make use of the new dual stack VIPs fixes/enablement, but it's unclear how this will work because we've never supported modifying on-prem networking configuration after initial deployment. Once we have dual stack VIPs enabled, we will need to investigate how to alter the configuration to add VIPs to an existing cluster.
We will need to make changes to the VIP fields in the Infrastructure and/or ControllerConfig objects. Infrastructure would be the first option since that would make all of the fields consistent, but that relies on the ability to change that object and have the changes persist and be propagated to the ControllerConfig. If that's not possible, we may need to make changes just in ControllerConfig.
For epics https://issues.redhat.com/browse/OPNET-14 and https://issues.redhat.com/browse/OPNET-80 we need a mechanism to change configuration values related to our static pods. Today that is not possible because all of the values are put in the status field of the Infrastructure object.
We had previously discussed this as part of https://issues.redhat.com/browse/OPNET-21 because there was speculation that people would want to move from internal LB to external, which would require mutating a value in Infrastructure. In fact, there was a proposal to put that value in the spec directly and skip the status field entirely, but that was discarded because a migration would be needed in that case and we need separate fields to indicate what was requested and what the current state actually is.
There was some followup discussion about that with Joel Speed from the API team (which unfortunately I have not been able to find a record of yet) where it was concluded that if/when we want to modify Infrastructure values we would add them to the Infrastructure spec and when a value was changed it would trigger a reconfiguration of the affected services, after which the status would be updated.
This means we will need new logic in MCO to look at the spec field (currently there are only fields in the status, so spec is ignored completely) and determine the correct behavior when they do not match. This will mean the values in ControllerConfig will not always match those in Infrastructure.Status. That's about as far as the design has gone so far, but we should keep the three use cases we know of (internal/external LB, VIP addition, and DNS record overrides) in mind as we design the underlying functionality to allow mutation of Infrastructure status values.
Depending on how the design works out, we may only track the design phase in this epic and do the implementation as part of one of the other epics. If there is common logic that is needed by all and can be implemented independently we could do that under this epic though.
For clusters that are installed as fresh 4.15, o/installer will populate Infrastructure.Spec and Infrastructure.Status based on the install-config. However, for clusters that are upgraded, this code in o/installer will never run.
In order to have a consistent state at upgrade, we will make CNO propagate Status back to Spec when the cluster is upgraded to OCP 4.15.
As we already did this when introducing multiple VIPs (the API change that created a plural field next to the singular one), all the necessary code scaffolding is already in place.
Infrastructure.Spec will be modified by the end user. CNO needs to validate those changes and, if valid, propagate them to Infrastructure.Status.
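For illustration, a minimal sketch of the intended flow, assuming the plural VIP fields on the bare metal platform spec/status; the exact field names follow openshift/api, and the addresses below are hypothetical:

apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
spec:
  platformSpec:
    type: BareMetal
    baremetal:
      apiServerInternalIPs:        # edited by the end user to add the second-family VIP
        - 192.0.2.10
        - 2001:db8::10
      ingressIPs:
        - 192.0.2.11
        - 2001:db8::11
status:
  platformStatus:
    type: BareMetal
    baremetal:
      apiServerInternalIPs:        # populated by CNO after validating the spec change
        - 192.0.2.10
        - 2001:db8::10
      ingressIPs:
        - 192.0.2.11
        - 2001:db8::11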
This epic tracks sdn-side work for https://issues.redhat.com/browse/NP-654.
That mostly includes code reviews for now.
Add a knob in CNO to allow users to modify the changes made in ovn-k
Create custom roles for GCP with minimal set of required permissions.
Enable customers to better scope credential permissions and create custom roles on GCP that only include the minimum subset of what is needed for OpenShift.
Some of the service accounts that CCO creates, e.g. a service account with the role roles/iam.serviceAccountUser, have elevated permissions that are not required/used by the requesting OpenShift components. This is because we use predefined roles for GCP that come with a bunch of additional permissions. The goal is to create custom roles with only the required permissions.
TBD
Evaluate if any of the GCP predefined roles in the credentials request manifest of Cluster Network Operator give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.
The new GCP provider spec for credentials request CR is as follows:
type GCPProviderSpec struct {
    metav1.TypeMeta `json:",inline"`
    // PredefinedRoles is the list of GCP pre-defined roles
    // that the CredentialsRequest requires.
    PredefinedRoles []string `json:"predefinedRoles"`
    // Permissions is the list of GCP permissions required to
    // create a more fine-grained custom role to satisfy the
    // CredentialsRequest.
    // When both Permissions and PredefinedRoles are specified
    // service account will have union of permissions from
    // both the fields
    Permissions []string `json:"permissions"`
    // SkipServiceCheck can be set to true to skip the check whether the requested roles or permissions
    // have the necessary services enabled
    // +optional
    SkipServiceCheck bool `json:"skipServiceCheck,omitempty"`
}
We can use the following command to check the permissions associated with a GCP predefined role:
gcloud iam roles describe <role_name>
The sample output for the role roles/iam.roleViewer is as follows. The permissions are listed in the "includedPermissions" field.
[akhilrane@localhost cloud-credential-operator]$ gcloud iam roles describe roles/iam.roleViewer
description: Read access to all custom roles in the project.
etag: AA==
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
name: roles/iam.roleViewer
stage: GA
title: Role Viewer
Update GCP Credentials Request manifest of the Cluster Network Operator to use new API field for requesting permissions.
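A minimal sketch of what such a CredentialsRequest might look like once it uses the new field; the names and the permission list below are illustrative, not the final set requested by the Cluster Network Operator:

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: openshift-cloud-network-config-controller-gcp
  namespace: openshift-cloud-credential-operator
spec:
  serviceAccountNames:
    - cloud-network-config-controller
  secretRef:
    name: cloud-credentials
    namespace: openshift-cloud-network-config-controller
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: GCPProviderSpec
    # fine-grained permissions instead of broad predefined roles
    permissions:
      - compute.instances.get
      - compute.instances.updateNetworkInterface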
This enhances the EgressQoS CRD with status information and provides an implementation to update this field with relevant information while creating/updating EgressQoS.
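A rough sketch of an EgressQoS object with the kind of status information envisioned here; the exact status schema is what this card defines, so the status field below is illustrative only:

apiVersion: k8s.ovn.org/v1
kind: EgressQoS
metadata:
  name: default
  namespace: example-ns
spec:
  egress:
    - dscp: 46
      dstCIDR: 203.0.113.0/24
status:
  # illustrative status reporting whether the QoS rules were applied
  status: EgressQoS Rules applied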
Review the OVN Interconnect proposal, figure out the work that needs to be done in ovn-kubernetes to be able to move to this new OVN architecture.
OVN IC will be the model used in Hypershift.
Work with the https://issues.redhat.com/browse/SDN-3654 card to get data from the scale team as needed and continue to improve the numbers.
It seems like we are having some issues on dual-stack after all; reopening this issue based on https://github.com/ovn-org/ovn-kubernetes/pull/3714#issuecomment-1611575298
One idea that @kyrtapz had was to remove all nil fields in the transact and configure logs, since they don't provide any useful info anyway, and removing them can cut down the size of a single transact line.
The OCP Console needs to detect if the ACM Operator has been installed; if detected, a new multi-cluster perspective option shows up in the perspective chooser.
As a user I need the ability to switch to the ACM UI from the OCP Console and vice versa without requiring the user to log in multiple times.
This option also needs to be hidden if the user doesn't have the correct RBAC.
The console should detect the presence of the ACM operator and add an Advanced Cluster Management item to the perspective switcher. We will need to work with the ACM team to understand how to detect the operator and how to discover the ACM URL.
Additionally, we will need to provide a query parameter or URL fragment to indicate which perspective to use. This will allow ACM to link back to a specific perspective since it will share the same perspective switcher in its UI. ACM will need to be able to discover the console URL.
This story does not include handling SSO, which will be tracked in a separate story.
We need to determine what RBAC checks to make before showing the ACM link.
Acceptance Criteria
1. Console shows a link to ACM in its perspective switcher
2. Console provides a way for ACM to link back to a specific perspective
3. The ACM option only appears when the ACM operator is installed
4. ACM should open in the same browser tab to give the appearance of it being one application
5. Only users with appropriate RBAC should see the link (access review TBD)
During the migration, a node will start as an SDN node (a hybrid overlay node from the OVN-K perspective), then become an OVN-K node. So OVN-K needs to support such dynamic role switching.
We need to enhance cluster network operator to automate the whole SDN live-migration.
We need to be able to install the HO with external DNS and create HCPs on AKS clusters
The cloud-network-config-operator is being deployed on HyperShift with `runAsNonRoot` set to true. When HCP is deployed on non-OpenShift management clusters, such as AKS, this needs to be unset so the pod can run as root.
This is currently causing issues deploying this pod on HCP on AKS with the following error:
state:
waiting:
message: 'container has runAsNonRoot and image will run as root (pod: "cloud-network-config-controller-59d4677589-bpkfp_clusters-brcox-hypershift-arm(62a4b447-1df7-4e4a-9716-6e10ec55d8fd)", container: hosted-cluster-kubecfg-setup)'
reason: CreateContainerConfigError
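For reference, the relevant field is the pod/container securityContext; a minimal illustrative snippet of the setting that has to be dropped (or set to false) for this deployment when running on non-OpenShift management clusters:

# illustrative fragment of the pod template
securityContext:
  runAsNonRoot: true   # must not be set when the container image runs as root (e.g. on AKS-hosted HCPs)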
We drive OpenShift cross-market customer success and new customer adoption with constant improvements and feature additions to the existing capabilities of our OpenShift Core Networking (SDN, Network Edge, Network Observability). This feature captures the natural progression of the product for development that does not align neatly to an existing Jira Feature.
There are grey areas, but in general:
Questions to be addressed:
Upstream K8s deprecated PodSecurityPolicy and replaced it with a new built-in admission controller that enforces the Pod Security Standards (see here for the motivations for deprecation). There is an OpenShift-specific dedicated pod admission system called Security Context Constraints. Our aim is to keep the Security Context Constraints pod admission system while also allowing users to have access to the Kubernetes Pod Security Admission.
With OpenShift 4.11, we turned on the Pod Security Admission with global "privileged" enforcement. Additionally, we set the "restricted" profile for warnings and audit. This configuration made it possible for users to opt their namespaces in to Pod Security Admission with the per-namespace labels. We also introduced a new mechanism that automatically synchronizes the Pod Security Admission "warn" and "audit" labels.
With OpenShift 4.15, we intend to move the global configuration to enforce the "restricted" pod security profile globally. With this change, the label synchronization mechanism will also switch into a mode where it synchronizes the "enforce" Pod Security Admission label rather than the "audit" and "warn".
Epic Goal
Get Pod Security admission to be run in "restricted" mode globally by default alongside SCC admission.
Modify the PodSecurityViolation alert to show namespace information. To prevent cardinality explosion on the namespace label, limit the values of the label to platform namespaces ("openshift", "default", "kube-").
This will also need a carry patch in o/k
When creating a custom SCC, it is possible to assign a priority that is higher than existing SCCs. This means that any SA with access to all SCCs might use the higher priority custom SCC, and this might mutate a workload in an unexpected/unintended way.
To protect platform workloads from such an effect (which, combined with PSa, might result in rejecting the workload once we start enforcing the "restricted" profile) we must pin the required SCC to all workloads in platform namespaces (openshift-, kube-, default).
Each workload should pin the SCC with the least-privilege, except workloads in runlevel 0 namespaces that should pin the "privileged" SCC (SCC admission is not enabled on these namespaces, but we should pin an SCC for tracking purposes).
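Pinning is typically done with the `openshift.io/required-scc` pod annotation; a minimal sketch of what this could look like on a platform workload (names and namespace are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator
  namespace: openshift-example        # illustrative platform namespace
spec:
  selector:
    matchLabels:
      app: example-operator
  template:
    metadata:
      labels:
        app: example-operator
      annotations:
        # SCC admission applies exactly this SCC and rejects the pod if it cannot
        openshift.io/required-scc: restricted-v2
    spec:
      containers:
        - name: operator
          image: example-image:latest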
The following tables track progress.
# namespaces | 4.18 | 4.17 | 4.16 | 4.15 |
---|---|---|---|---|
monitored | 82 | 82 | 82 | 82 |
fix needed | 69 | 69 | 69 | 69 |
fixed | 34 | 30 | 30 | 39 |
remaining | 35 | 39 | 39 | 30 |
~ remaining non-runlevel | 15 | 19 | 19 | 10 |
~ remaining runlevel (low-prio) | 20 | 20 | 20 | 20 |
~ untested | 2 | 2 | 2 | 82 |
# | namespace | 4.18 | 4.17 | 4.16 | 4.15 |
---|---|---|---|---|---|
1 | oc debug node pods | #1763 | #1816 | #1818 | |
2 | openshift-apiserver-operator | #573 | #581 | ||
3 | openshift-authentication | #656 | #675 | ||
4 | openshift-authentication-operator | #656 | #675 | ||
5 | openshift-catalogd | #50 | #58 | ||
6 | openshift-cloud-credential-operator | #681 | #736 | ||
7 | openshift-cloud-network-config-controller | #2282 | #2490 | #2496 | |
8 | openshift-cluster-csi-drivers | #170 #459 | #484 | ||
9 | openshift-cluster-node-tuning-operator | #968 | #1117 | ||
10 | openshift-cluster-olm-operator | #54 | n/a | ||
11 | openshift-cluster-samples-operator | #535 | #548 | ||
12 | openshift-cluster-storage-operator | #459 #196 | #484 #211 | ||
13 | openshift-cluster-version | #1038 | #1068 | ||
14 | openshift-config-operator | #410 | #420 | ||
15 | openshift-console | #871 | #908 | #924 | |
16 | openshift-console-operator | #871 | #908 | #924 | |
17 | openshift-controller-manager | #336 | #361 | ||
18 | openshift-controller-manager-operator | #336 | #361 | ||
19 | openshift-e2e-loki | #56579 | #56579 | #56579 | #56579 |
20 | openshift-image-registry | #1008 | #1067 | ||
21 | openshift-infra | ||||
22 | openshift-ingress | #1031 | |||
23 | openshift-ingress-canary | #1031 | |||
24 | openshift-ingress-operator | #1031 | |||
25 | openshift-insights | #915 | #967 | ||
26 | openshift-kni-infra | #4504 | #4542 | #4539 | #4540 |
27 | openshift-kube-storage-version-migrator | #107 | #112 | ||
28 | openshift-kube-storage-version-migrator-operator | #107 | #112 | ||
29 | openshift-machine-api | #407 | #315 #282 #1220 #73 #50 #433 | #332 #326 #1288 #81 #57 #443 | |
30 | openshift-machine-config-operator | #4219 | #4384 | #4393 | |
31 | openshift-manila-csi-driver | #234 | #235 | #236 | |
32 | openshift-marketplace | #561 | #570 | ||
33 | openshift-metallb-system | #238 | #240 | #241 | |
34 | openshift-monitoring | #2335 | #2420 | ||
35 | openshift-network-console | ||||
36 | openshift-network-diagnostics | #2282 | #2490 | #2496 | |
37 | openshift-network-node-identity | #2282 | #2490 | #2496 | |
38 | openshift-nutanix-infra | #4504 | #4504 | #4539 | #4540 |
39 | openshift-oauth-apiserver | #656 | #675 | ||
40 | openshift-openstack-infra | #4504 | #4504 | #4539 | #4540 |
41 | openshift-operator-controller | #100 | #120 | ||
42 | openshift-operator-lifecycle-manager | #703 | #828 | ||
43 | openshift-route-controller-manager | #336 | #361 | ||
44 | openshift-service-ca | #235 | #243 | ||
45 | openshift-service-ca-operator | #235 | #243 | ||
46 | openshift-sriov-network-operator | #754 #995 | #999 | #1003 | |
47 | openshift-storage | ||||
48 | openshift-user-workload-monitoring | #2335 | #2420 | ||
49 | openshift-vsphere-infra | #4504 | #4542 | #4539 | #4540 |
50 | (runlevel) kube-system | ||||
51 | (runlevel) openshift-cloud-controller-manager | ||||
52 | (runlevel) openshift-cloud-controller-manager-operator | ||||
53 | (runlevel) openshift-cluster-api | ||||
54 | (runlevel) openshift-cluster-machine-approver | ||||
55 | (runlevel) openshift-dns | ||||
56 | (runlevel) openshift-dns-operator | ||||
57 | (runlevel) openshift-etcd | ||||
58 | (runlevel) openshift-etcd-operator | ||||
59 | (runlevel) openshift-kube-apiserver | ||||
60 | (runlevel) openshift-kube-apiserver-operator | ||||
61 | (runlevel) openshift-kube-controller-manager | ||||
62 | (runlevel) openshift-kube-controller-manager-operator | ||||
63 | (runlevel) openshift-kube-proxy | ||||
64 | (runlevel) openshift-kube-scheduler | ||||
65 | (runlevel) openshift-kube-scheduler-operator | ||||
66 | (runlevel) openshift-multus | ||||
67 | (runlevel) openshift-network-operator | ||||
68 | (runlevel) openshift-ovn-kubernetes | ||||
69 | (runlevel) openshift-sdn |
More details at ARO managed identity scope and impact.
This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operators owners/teams will be expected to review merge requests and complete appropriate QE effort for an openshift release.
While trying to block requests going from the pods to different domain names, for example:
Here, the egressnetworkpolicy is working for `registry.access.redhat.com` and `registry.access.redhat.com.edgekey.net`; however, for `registry-1.docker.io`, it is not denying access despite the deny entry.
"Domain name updates are polled based on the TTL (time to live) value of the domain returned by the local non-authoritative servers. The pod should also resolve the domain from the same local nameservers when necessary, otherwise, the IP addresses for the domain perceived by the egress network policy controller and the pod will be different, and the egress network policy may not be enforced as expected. Since egress network policy controller and pod are asynchronously polling the same local nameserver, there could be a race condition where pod may get the updated IP before the egress controller. Due to this current limitation, domain name usage in EgressNetworkPolicy is only recommended for domains with infrequent IP address changes."
The aim of this feature is to fix this and also support wildcard entries for EgressNetworkPolicy.
Make the changes as per the proposed enhancement https://github.com/openshift/enhancements/pull/1335
Note: The flag should be added to OVN-K after checking if the feature-gate DNSNameResolver is enabled.
- apiGroups: ["network.openshift.io"]
resources:
- dnsnameresolvers
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
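For context, a rough sketch of a DNSNameResolver object as proposed in the enhancement; the API is Tech Preview and the namespace, names, and status shape below are illustrative:

apiVersion: network.openshift.io/v1alpha1
kind: DNSNameResolver
metadata:
  name: wildcard-example
  namespace: openshift-ovn-kubernetes   # illustrative namespace
spec:
  # wildcard DNS names are the main addition targeted by this feature
  name: "*.example.com."
status:
  resolvedNames:
    - dnsName: sub.example.com.
      resolvedAddresses:
        - ip: 203.0.113.10
          ttlSeconds: 300
          lastLookupTime: "2024-01-01T00:00:00Z"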
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
Why is this important? (mandatory)
What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc? Why is work a priority?
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
With ovn-ic we have multiple actors (zones) setting status on some CRs. We need to make sure individual zone statuses are reported and then optionally merged into a single status; without that change, zones will overwrite each other's statuses.
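A hypothetical sketch of the idea (the actual CRs and field names are defined by the implementation): each zone writes its own entry, and a controller merges them into the user-facing status:

status:
  # per-zone entries, written independently by each zone's ovnkube controller (illustrative)
  messages:
    - "zone-a: EgressFirewall Rules applied"
    - "zone-b: EgressFirewall Rules applied"
  # merged, user-facing status derived from the per-zone entries
  status: EgressFirewall Rules applied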
This card is about:
Migrate every occurrence of iptables in OpenShift to use nftables, instead.
Implement a full migration from iptables to nftables within a series of "normal" upgrades of OpenShift with the goal of not causing any more network disruption than would normally be required for an OpenShift upgrade. (Different components may migrate from iptables to nftables in different releases; no coordination is needed between unrelated components.)
Consume the newly introduced API and apply the scheduling configuration (taints and node selectors) to network-check-source and network-check-target.
A guest cluster can use an external OIDC token issuer. This will allow machine-to-machine authentication workflows
A guest cluster can configure OIDC providers to support the current capability: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens and the future capability: https://github.com/kubernetes/kubernetes/blob/2b5d2cf910fd376a42ba9de5e4b52a53b58f9397/staging/src/k8s.io/apiserver/pkg/apis/apiserver/types.go#L164 with an API that
Tracks work for https://github.com/kubernetes-sigs/network-policy-api/pull/209 AND then to consume that in OVNKubernetes
This card adds support for implementing ANP.Egress.Networks Peer in OVNKubernetes:
This card tracks removing feature gate and making it GA in OCP.
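For illustration, a minimal AdminNetworkPolicy using the CIDR-based `networks` egress peer referenced above; the CIDR, priority, and names are hypothetical, and the field shape follows the upstream network-policy-api alpha:

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: deny-external-range
spec:
  priority: 10
  subject:
    namespaces: {}          # applies to workloads in all namespaces
  egress:
    - name: deny-to-cidr
      action: Deny
      to:
        - networks:
            - 203.0.113.0/24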
Migrate every occurrence of iptables in OpenShift to use nftables, instead.
Implement a full migration from iptables to nftables within a series of "normal" upgrades of OpenShift with the goal of not causing any more network disruption than would normally be required for an OpenShift upgrade. (Different components may migrate from iptables to nftables in different releases; no coordination is needed between unrelated components.)
When the internal oauth-server and oauth-apiserver are removed and replaced with an external OIDC issuer (like azure AD), the console must work for human users of the external OIDC issuer.
An end user can use the openshift console without a notable difference in experience. This must eventually work on both hypershift and standalone, but hypershift is the first priority if it impacts delivery
Follow the rebase doc[1] and update the spreadsheet[2] that tracks the required commits to be cherry-picked. Rebase the o/k repo with the "merge=ours" strategy as mentioned in the rebase doc.
Save the last commit id in the spreadsheet for future references.
Update the rebase doc if required.
[1] https://github.com/openshift/kubernetes/blob/master/REBASE.openshift.md
[2] https://docs.google.com/spreadsheets/d/10KYptJkDB1z8_RYCQVBYDjdTlRfyoXILMa0Fg8tnNlY/edit#gid=1957024452
Prev. Ref:
https://github.com/openshift/kubernetes/pull/1646
While monitoring the payload job failures, open a parallel openshift/origin bump.
Note: There is a high chance of job failures in the openshift/origin bump until the openshift/kubernetes PR merges, as we only update the tests and not the actual kube.
The benefit of opening this PR before the ocp/k8s merge is to identify and fix issues beforehand.
Goal Summary
This feature aims to make sure that the HyperShift operator and the control plane it deploys use Managed Service Identities (MSI) and have access to scoped credentials (potentially also via access to AKS's image gallery). Additionally, operators deployed in the customer's account (system components) would be scoped with Azure workload identities.
Support Managed Service Identity (MSI) authentication in Azure.
Controllers that require cloud access and run on the control plane side in ARO hosted clusters will need to use MSI to acquire tokens to interact with the hosted cluster's cloud resources.
The cluster network operator runs the following pods that require cloud credentials:
The following components use the token-minter but do not require cloud access:
These pods will need to use MSI when running in hosted control plane mode.
openshift-sdn is no longer part of OCP in 4.17, so remove references to it in the networking APIs.
Consider whether we can remove the entire network.openshift.io API, which will now be no-ops.
In places where both sdn and ovn-k are supported, remove references to sdn.
In some places (notably the migration API), we will probably leave an API in place that currently has no purpose.
openshift-sdn is no longer part of OCP in 4.17, so CNO must stop referring to its image
Stop generating long-lived service account tokens. Long-lived service account tokens are currently generated in order to then create an image pull secret for the internal image registry. This feature calls for using the TokenRequest API to generate a bound service account token for use in the image pull secret.
Use TokenRequest API to create image pull secrets.
Performance benefits:
One fewer secret created per service account. This will result in at least three fewer secrets generated per namespace.
Security benefits:
Long-lived tokens are no longer recommended, as they present a possible security risk; this feature stops generating them.
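For reference, a minimal sketch of the kind of bound token request involved; the TokenRequest object is posted to the serviceaccounts/&lt;name&gt;/token subresource (which is also what `oc create token` does), and the audience and expiry values below are illustrative:

apiVersion: authentication.k8s.io/v1
kind: TokenRequest
spec:
  audiences:
    - https://kubernetes.default.svc     # illustrative audience
  expirationSeconds: 3600                # bound, short-lived token instead of a long-lived secret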
Requirements (aka. Acceptance Criteria):
The upstream test `ServiceAccounts no secret-based service account token should be auto-generated` was previously patched to allow for the internal image registry's managed image pull secret to be present in the `Secrets` field. This will no longer be the case as of 4.16.
Post merge of API-1644, we can remove the patch entirely.
DPDK applications require dedicated CPUs, isolated from any preemption (other processes, kernel threads, interrupts), and this can be achieved with the “static” policy of the CPU manager: the container resources need to include an integer number of CPUs of equal value in “limits” and “requests”. For instance, to get six exclusive CPUs:
spec:
containers:
- name: CNF
image: myCNF
resources:
limits:
cpu: "6"
requests:
cpu: "6"
The six CPUs are dedicated to that container; however, real, non-trivial DPDK applications do not use all of those CPUs, as there is always at least one CPU running a slow path, processing configuration, printing logs (among DPDK coding rules: no syscall in PMD threads, or you are in trouble). Even the DPDK PMD drivers and core libraries include pthreads that are intended to sleep; they are infrastructure pthreads processing link change interrupts, for instance.
Can we envision going with two processes, one with isolated cores and one with the slow-path ones, so we can have two containers? Unfortunately not: moving to a multi-process design, where only dedicated pthreads would run in one process, is not an option, as DPDK multi-process is being deprecated upstream and has never picked up because it never properly worked. Fixing it and changing the DPDK architecture to systematically have two processes is absolutely not possible within a year, and would require all DPDK applications to be rewritten. Knowing that the first and current multi-process implementation is a failure, nothing guarantees that a second one would be successful.
The slow-path CPUs only consume a fraction of a real CPU and can safely be run on the “shared” CPU pool of the CPU Manager; however, container specifications do not allow requesting two kinds of CPUs, for instance:
spec:
containers:
- name: CNF
image: myCNF
resources:
limits:
cpu_dedicated: "4"
cpu_shared: "20m"
requests:
cpu_dedicated: "4"
cpu_shared: "20m"
Why do we care about allocating one extra CPU per container?
Let’s take a realistic example, based on a real RAN CNF: running 6 containers with dedicated CPUs on a worker node, with a slow path requiring 0.1 CPU each, means that we waste roughly 5 CPUs, i.e. 3 physical cores. With real-life numbers:
Intel public CPU price per core is around 150 US$, not even taking into account the ecological aspect of the waste of (rare) materials and the electricity and cooling…
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This issue has been addressed lately by OpenStack.
N/A
We need to extend the node admission plugin to support the shared cpus.
The admission should provide the following functionalities:
1. In case a user specifies more than a single `openshift.io/enabled-shared-cpus` resource, it rejects the pod request with an error explaining to the user how to fix their pod spec.
2. It adds an annotation `cpu-shared.crio.io` that will be used to tell the runtime that shared cpus were requested.
For every container requested for shared cpus, it adds an annotation with the following scheme:
`cpu-shared.crio.io/<container name>`
Example of how it's done for core pinning: https://github.com/openshift/kubernetes/commit/04ff5090bae1cb181a2464696adde8709cdd0a93
We need to add support to Kubelet to advertise the shared-cpu as `openshift.io/enabled-shared-cpus` through extended resources
This should be off by default and only activated when a configuration file is being supplied.
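Putting the pieces above together, a minimal sketch, based on the resource and annotation names given above, of a container requesting one shared CPU and the annotation the admission plugin would add; the annotation value and pod names are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: cnf-pod
  annotations:
    # added by the admission plugin, one per container requesting shared CPUs (illustrative)
    cpu-shared.crio.io/cnf: ""
spec:
  containers:
    - name: cnf
      image: myCNF
      resources:
        limits:
          cpu: "6"                                  # dedicated CPUs (static CPU manager policy)
          openshift.io/enabled-shared-cpus: "1"     # at most one shared-CPU resource per container
        requests:
          cpu: "6"
          openshift.io/enabled-shared-cpus: "1"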
Any telco deployment seeks the best performance, determinism, and low TCO. Kubernetes was designed for cloud usage, where pods run on vCPUs. In a telco deployment, a vCPU can either be:
Another parameter that greatly impacts performance: NUMA.
OCP as of today (4.7) partitions server CPUs into multiple shared pools and a dedicated pool, without hyperthreading and NUMA awareness.
Detailed status on OCP4.12 and OCP4.14, and the key missing item for OCP4.14+ is how we do spread/pack NIC interrupts in order to get a maximum of parallelism: https://docs.google.com/presentation/d/1Aet59myjjSIesubSKZyD5Ty6pVrd0SftbVbUFbbAK8w/edit#slide=id.g290f9655170_0_903
Permit an efficient CPU usage on OCP servers: share as much as possible when possible, and dedicate only what needs to be really dedicated, at the hyperthread granularity, taking into account NUMA locality.
Multiple NUMA per socket CPU systems like AMD Rome with NPS>1 (Node Per Socket) are out of this feature scope. More details on NPS in this gdoc.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Must work on Real Time and non real time RHCOS | | YES |
N/A
Today’s implementation splits the available CPUs into sets dedicated either to systemd or to OCP pods (OCP infrastructure pods and applications). This avoids noisy-neighbor syndromes across the pools, but not within a pool, and leads to overconsumption of CPUs as each and every pool has its own margin.
N/A
N/A
N/A
Example with kube-proxy (not OVN-K) of existing iptables rules (nat table) with two services, 10.10.10.1 and 9.9.9.1:
Chain KUBE-SERVICES (2 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-SVC-MZWLROVU74UDI3HJ tcp -- ** 0.0.0.0/0 10.103.205.40 /* default/nodeportamsterdam cluster IP */ tcp dpt:80
0 0 KUBE-SVC-RQWI4M5IL64FQFRX tcp -- ** 0.0.0.0/0 10.97.176.133 /* default/laposte cluster IP */ tcp dpt:443
0 0 KUBE-FW-RQWI4M5IL64FQFRX tcp -- ** 0.0.0.0/0 9.9.9.1 /* default/laposte loadbalancer IP */ tcp dpt:443
0 0 KUBE-SVC-DGNTJ5HIKQEDKIWG tcp -- ** 0.0.0.0/0 10.111.241.130 /* default/anyboss cluster IP */ tcp dpt:443
0 0 KUBE-FW-DGNTJ5HIKQEDKIWG tcp -- ** 0.0.0.0/0 10.10.10.1 /* default/anyboss loadbalancer IP */ tcp dpt:443
0 0 KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- ** 0.0.0.0/0 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
0 0 KUBE-SVC-TCOU7JCQXEZGVUNU udp -- ** 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
0 0 KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- ** 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
0 0 KUBE-SVC-JD5MR3NA4I4DYORP tcp -- ** 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
2734 164K KUBE-NODEPORTS all -- ** 0.0.0.0/0 0.0.0.0/0 /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
Requirements
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
metalLB BGP | to be validated with BFD as well | YES |
metalLB L2 | not part of the MVP, but should be implemented as well | NO |
OVN-K shared gateway | local Gateway is enough for MVP, but ultimately any mode will have to be supported | NO |
Should work with load balancers other than metalLB | This should not be tested, but the implementation should not be metalLB-specific on the OVN-K side, so it can be reused with F5 SPK for instance | NO
PodToPodUseServiceIPForEgress | To be implemented if this is a low hanging fruit or when we get explicit demand. But the implementation of the MVP should permit its later implementation. | NO |
Questions to be addressed:
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were completed when this image was assembled
An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.
Node is currently 10.x. Let's increase that to at least 14.x.
It will require some changes on the ART side as well as OSBS builds.
This is required to bump node to avoid https://github.com/webpack/webpack/issues/4629. We need to evaluate whether this has a domino effect on our webpack dependencies.
See https://github.com/openshift/console/pull/7306#issuecomment-755509361
Console operator should swap from using monis.app to openshift/operator-boilerplate-legacy. This will allow switching to klog/v2, which the shared libs (api,client-go,library-go) have already done.
The console needs to know the network type capabilities in order to show/hide some Network Policy form fields.
As a result of https://issues.redhat.com/browse/NETOBSERV-27, this logic is implemented as a features document inside the console code. The console fetches the network type from the network operator and checks the supported features towards this document.
However, this limits the feature to admin users, as other logged-in users do not have permissions to fetch the network type.
This task aims to modify the current Cluster Network Operator to expose the network capabilities as an `sdn-public` Config Map, writeable only by the SDN, readable by any `system:authenticated` user.
Enhancement Proposal PR: https://github.com/openshift/enhancements/pull/875
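A hypothetical sketch of the shape discussed (the ConfigMap name, namespace, and keys are placeholders, not the final API): a ConfigMap published by CNO plus RBAC that lets any authenticated user read it:

apiVersion: v1
kind: ConfigMap
metadata:
  name: sdn-public                       # hypothetical name
  namespace: openshift-config-managed    # hypothetical namespace
data:
  networkType: OVNKubernetes
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sdn-public-reader
  namespace: openshift-config-managed
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["sdn-public"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sdn-public-reader
  namespace: openshift-config-managed
subjects:
  - kind: Group
    name: system:authenticated
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: sdn-public-reader
  apiGroup: rbac.authorization.k8s.io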
Enable protectKernelDefaults by default.
Description of problem:
Enable default sysctls for kubelet.
This epic is mainly focused to track the dev console QE automation activities for 4.8 release
1. Identify the scenarios for automation
2. Segregate the test cases into smoke, Regression and user stories
3. Designing the gherkin scripts with below priority
- Update the Smoke test suite
- Update the Regression test suite
4. Create the automation scripts using cypress
5. Implement CI
This improves the quality of the product
This is not related to any UI features. It is mainly focused on UI automation
This story is mainly related to pushing the pipelines code from the dev console to the pipelines plugin folder for extensibility purposes.
Verify the pipelines regression test suite
As an operator QE, I should be able to execute them in my operator folder.
1. All pipelines scripts should be able to execute in the pipelines plugin folder
2. Pipelines operator installation needs to be done by the script
CI implementation for pipelines, knative, devconsole
update package.json file
CI for pipelines:
Any update related to pipelines should execute pipelines smoke tests
on nightly builds, pipelines regression should be executed [TBD]
CI for devconsole:
Any update related to devconsole should execute devconsole smoke tests
on nightly builds, devconsole regression should be executed [TBD]
CI for knative:
Any update related to knative should execute knative smoke tests
on nightly builds, knative regression should be executed [TBD]
Setup the CI for all plugins smoke test scripts
References for CI implementation
Fixing the feature file lint issues that occur when executing `yarn run test-cypress-devconsole-headless`, and moving the topology features to the topology folder.
This story is mainly related to pushing the pipelines code from the dev console to the gitops plugin folder for extensibility purposes.
As an operator QE, I should be able to execute them in my operator folder.
1. All pipelines scripts should be able to execute in the gitops plugin folder
2. gitops operator installation needs to be done by the script
Would like to include integration-tests for topology folder
Consolidate the cypress-cucumber and cypress frameworks related to the plugins/index.js files.
Currently the PR looks too large; to reduce its size, we are creating these sub-tasks.
Updating the ReadMe documentation for knative plugin folder
Update all automation scripts and verify their execution on a remote cluster.
As a user,
Execute them on the Chrome browser against a 4.8 release cluster.
Design the cypress scripts for the epic ODC-3991
Refer the Gherkin scripts https://issues.redhat.com/browse/ODC-5430
As a user,
All possible automation test scenarios related to EPIC ODC-3991 should be automated.
Pipelines operator needs to be installed
Adding the OWNERS file to service mesh helps us add automatic reviewers on Gherkin script updates.
This helps to automatically notify the web terminal team members of test scenario changes.
Fixing all gherkin linter errors
Create GitHub templates with criteria to meet the Gherkin script and automation script standards.
As the .gherkin-lintrc is mainly used by the QE team, it does not need to be in the frontend folder, so I am moving it to the dev-console/integration-tests folder.
Adding all necessary tags and modifying below rules due to recently observed scenarios
This epic tracks network tooling improvements for 4.12
A new framework and process should be developed to make sharing network tools with devs, support, and customers convenient. We are going to add some tools for OVN troubleshooting before OVN-K becomes the default, some tools that came from customer cases, and some more to help analyze and debug collected logs, based on the stable must-gather/sosreport format we now have thanks to the 4.11 epic.
Our estimation for this Epic is 1 engineer * 2 Sprints
WHY:
This epic is important to help improve the time it takes our customers and our team to understand an issue within the cluster.
A focus of this epic is to develop tools to quickly allow debugging of a problematic cluster. This is crucial for the engineering team to help us scale. We want to provide a tool to our customers to help lower the cognitive burden to get at a root cause of an issue.
Alert if any of the ovn-controllers is disconnected from the southbound database for a period of time, using the metric ovn_controller_southbound_database_connected.
The metric updates every 2 minutes so please be mindful of this when creating the alert.
If the controller is disconnected for 10 minutes, fire an alert.
DoD: Merged to CNO and tested by QE
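A minimal sketch of such an alerting rule, using the metric and the 10-minute threshold stated above; the rule/alert names, namespace, and severity are illustrative:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ovn-controller-sbdb-alerts       # illustrative name
  namespace: openshift-ovn-kubernetes
spec:
  groups:
    - name: cluster-network-operator-ovn.rules
      rules:
        - alert: OVNControllerSouthboundDatabaseDisconnected
          expr: ovn_controller_southbound_database_connected == 0
          for: 10m        # the metric only updates every ~2 minutes, so allow a generous window
          labels:
            severity: warning
          annotations:
            summary: ovn-controller has been disconnected from the southbound database for 10 minutes.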
Add SOCKS proxy support to cluster-network-operator so EgressIP can use gRPC to reach worker nodes.
With the introduction of gRPC as the means for determining the state of a given egress node, HyperShift should
be able to leverage the SOCKS proxy and know the state of each egress node.
References relevant to this work:
1281-network-proxy
[+https://coreos.slack.com/archives/C01C8502FMM/p1658427627751939+]
[+https://github.com/openshift/hypershift/pull/1131/commits/28546dc587dc028dc8bded715847346ff99d65ea+]
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled
Please read: migrating-protractor-tests-to-cypress
Protractor test to migrate: `frontend/integration-tests/tests/storage.scenario.ts`
Loops through 6 storage kinds:
15) Add storage is applicable for all workloads
16) replicationcontrollers
✔ create a replicationcontrollers resource
✔ add storage to replicationcontrollers
17) daemonsets
✔ create a daemonsets resource
✔ add storage to daemonsets
18) deployments
✔ create a deployments resource
✔ add storage to deployments
19) replicasets
✔ create a replicasets resource
✔ add storage to replicasets
20) statefulsets
✔ create a statefulsets resource
✔ add storage to statefulsets
21) deploymentconfigs
✔ create a deploymentconfigs resource
✔ add storage to deploymentconfigs
Acceptance Criteria
Please read: migrating-protractor-tests-to-cypress
Protractor test to migrate: `frontend/integration-tests/tests/filter.scenario.ts`
4) Filtering
✔ filters Pod from object detail
✔ filters invalid Pod from object detail
✔ filters from Pods list
⚠ CONSOLE-1503 - searches for object by label
✔ searches for pod by label and filtering by name
✔ searches for object by label using by other kind of workload
Acceptance Criteria