Feature CCX-137: Insights Advisor: Insights Advisor for OpenShift as standalone app

Feature Overview
Insights Advisor for OpenShift is integrated within OpenShift Cluster Manager (OCM). This limits the addition of new features and the sharing of a codebase between RHEL Advisor and the OCM Insights Advisor tab. Insights Advisor for OpenShift lacks certain features of the RHEL UI, and its codebase is not a 1:1 clone.
As an Insights customer, I will have the same or a very similar user experience with Insights for OpenShift and Insights for RHEL. The workflows will share the main concepts, the UI elements will be the same, and features introduced to Advisor will automatically be considered for all supported platforms.
As an OpenShift user, I will still see integrations of Insights Advisor within OpenShift Cluster Manager that show aggregated information for the customer account and a single-cluster view of Advisor data. These integrations will point to the new Insights Advisor for OpenShift app, which will be tightly integrated into OpenShift Cluster Manager.

Note: The application will reuse the codebase but will run as a separate app for OpenShift. There is no intent to merge RHEL and OpenShift workflows into a single app.

Goals

Q2CY21: Explore the possibility of unifying the codebase between RHEL Advisor and the OCM Insights Advisor tab. Identify architecture misalignments and create UI mockups to merge the two existing UIs.
Q3CY21: Integrate OpenShift into the Advisor codebase, stand up the Insights Advisor for OpenShift application, and change the integration in OpenShift Cluster Manager to point at the new app.
Q4CY21: Deliver the missing screens of Insights Advisor for OpenShift (Systems and Recommendations views).

Requirements

UX overview of UI elements in both UIs - Marie Doruskova
Architecture overview/misalignments for both UIs - Jan Zeleny [~fjansen]

Benefits

Feature parity between RHEL and OpenShift
Adopting new features developed by RHEL Advisor team quicker
Smaller maintenance cost

Questions to answer...

Possible deviations between OpenShift and RHEL
Remediation workflow different between OpenShift and RHEL

Out of Scope

A single app that combines RHEL hosts and OpenShift clusters. The goal is still to differentiate between platforms and offer a view of a single platform only.
Direct/Supervised remediations and integration of remediations with Advanced Cluster Manager (as a Service)

Background, and strategic fit

Insights Advisor for OpenShift follows the goal of introducing multiple applications that add value for OpenShift customers under the Insights brand. The current UI and the integration of Advisor into OpenShift Cluster Manager don't follow the pattern that other Insights for OpenShift applications can or will follow.

Documentation Considerations

OCM documentation is impacted. Existing workflows described in OCM documentation will persist, but the placement of the application within OCM will be different.

Epic CCXDEV-6500: OCP Advisor (frontend, CY22Q1)

TBA

Task CCXDEV-7039: Redirect OCP WebConsole users to Advisor through the links in the widget

The main dashboard of the OCP WebConsole has an Insights Advisor widget, which has been redirecting users to OCM. Because the Insights Advisor tab in OCM is being decommissioned, the links should point to Advisor instead.

4.10 code freeze = 28 January (marking the task as urgent)

https://github.com/openshift/console/pull/10875

Feature RHDP-291: Maintain existing portfolio priorities

Feature Overview

This Feature is a general "catch all" for the time being. A number of existing priorities from Q1 should align with the priorities below; any that do not should be assigned to this Feature as needed.

Goals

In order to get a better overall portfolio view, we'll leverage this Feature to gather work that doesn't fall into other existing priorities on this board. As this list grows, the portfolio priority grooming team will look to split out or handle appropriately.

Epic ODC-6351: Dynamic Plugins - Round 3

Problem:

Console provides supporting UI for operators, which is dynamically enabled when the operator is installed by using feature flags keyed on the presence of CRDs. However, operators have their own release cadence, separate from OpenShift's, which makes aligning the UI with the API difficult. As new features are released for the operator, the UI falls out of sync with the APIs, and customers must wait until the following OpenShift release to get any new UI.

Goal:

Create an extensibility mechanism which allows Red Hat operators to build and package their own UI that extends the console.
Make console extensible in areas required to support the needs of contributing plugins.

Why is it important?

Allows an operator to maintain their own UI and release at their own cadence.
Alleviates the pressure on console to deliver UI features for multiple operators within a release.

Use cases:

Serverless / Pipelines / Helm to contribute resource details pages, import flows, topology visuals etc...

Acceptance criteria:

Red Hat operators can build their own UI, which is deployed alongside the operator and extends the dev-console.
1. The objective is to reach a point where this is possible; however, the code will not be moved to a separate repository, nor deployed by an operator.
New extensions for console allow operators to extend the areas of console needed to provide the proper user experience.
Operators can override the static built-in support and supply their own UI.

Dependencies (External/Internal):

Design Artifacts:

Console extensions:
https://docs.google.com/document/d/1HW5_cl6cOX5P14PQN-1_8c60o9dMY6HbFDRftH6aTno/edit

Dynamic Plugins:
https://docs.google.com/document/d/19BAFo_8BtMZVvKsU-bE61bZpSydeYONkCMWntMU9NgE/edit

Enhancement proposal:
https://github.com/openshift/enhancements/pull/441

Exploration:

Note:

plugin framework covered by another epic
out of scope:
- moving plugins to separate git repository

Story ODC-6219: Override static plugin contribution with dynamic plugin contribution

Description

As a developer, I want to be able to contribute a dynamic plugin extension that overrides the same extension contributed by a static plugin.

Acceptance Criteria

A dynamic plugin contribution should replace the static plugin contribution of the same name.

Additional Details:

https://github.com/openshift/console/pull/9744
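
The override rule in the acceptance criteria can be illustrated with a minimal Python sketch. It is not the console's implementation (the console is not written in Python); the Extension class, its fields, and the extension names are hypothetical and only model "same name, dynamic wins".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extension:
    name: str       # hypothetical identifier shared by competing contributions
    source: str     # "static" or "dynamic"
    component: str  # placeholder for whatever the extension contributes


def resolve_extensions(static, dynamic):
    """Return one extension per name, preferring dynamic contributions."""
    resolved = {ext.name: ext for ext in static}
    # Dynamic contributions are applied last, so they replace
    # static contributions that share the same name.
    for ext in dynamic:
        resolved[ext.name] = ext
    return list(resolved.values())


static_exts = [
    Extension("helm-details-page", "static", "StaticHelmDetails"),
    Extension("topology-view", "static", "StaticTopology"),
]
dynamic_exts = [
    Extension("helm-details-page", "dynamic", "DynamicHelmDetails"),
]

for ext in resolve_extensions(static_exts, dynamic_exts):
    print(ext.name, ext.source)
```

In this sketch, static contributions without a dynamic counterpart are left untouched, which matches the goal that existing static plugins keep working.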

Feature OBSDA-3: Allow non-admin users to configure individual notification settings

Problem Alignment

The Problem

Today, all individual configuration (routing configuration, for example) is set in a single configuration file that only admins can access. If an environment has multiple tenants and each tenant uses a different system to notify its teams of an issue, someone must file a request with an admin to add the required settings.

That can be bothersome for individual teams, since requests like that usually disappear in the backlog of an administrator. At the same time, administrators might get tons of requests that they have to look at and prioritize, which takes them away from more crucial work.

We would like to introduce a more self-service approach in which individual teams can create their own configuration for their needs without the administrator's involvement.

Last but not least, since Monitoring is deployed as a core service of OpenShift, the SRE team has to apply multiple restrictions to all OSD and ROSA clusters. One restriction limits customers' use of the central Alertmanager, which is owned and managed by the SRE team: because of security concerns, the SRE team can't give users access to the centrally managed secret to add their own routing information.

High-Level Approach

Provide a new API (based on the Operator CRD approach) as part of the Prometheus Operator that allows creating a subset of the Alertmanager configuration without touching the central Alertmanager configuration file.

Please note that we do not plan to support additional individual webhooks with this work. Customers will need to deploy their own version of the third party webhooks.

Goal & Success

Allow users to deploy individual configurations that allow setting up Alertmanager for their needs without an administrator.

Solution Alignment

Key Capabilities

As an OpenShift administrator, I want to control who can CRUD individual configuration so that I can make sure that no unknown third party can touch the central Alertmanager instance shipped within OpenShift Monitoring.
As a team owner, I want to deploy a routing configuration to push notifications for alerts to my system of choice.

Key Flows

Team A wants to send all their important notifications to a specific Slack channel.

Administrator gives permission to Team A to allow creating a new configuration CR in their individual namespace.
Team A creates a new configuration CR.
Team A configures what alerts should go into their Slack channel.

Open Questions & Key Decisions (optional)

Do we want to improve anything inside the developer console to allow configuration?
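
The Team A flow above could look roughly like the following sketch, which builds a namespaced routing configuration as a Python dictionary. It assumes the new API resembles the Prometheus Operator AlertmanagerConfig resource (v1alpha1); the source does not name the resource, and the namespace, resource name, channel, secret name, and severity matcher are hypothetical.

```python
import json


def slack_routing_config(namespace, name, channel, secret_name, secret_key="url"):
    """Build a namespaced routing configuration for a single Slack channel."""
    return {
        "apiVersion": "monitoring.coreos.com/v1alpha1",
        "kind": "AlertmanagerConfig",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Route matching alerts to the receiver defined below.
            "route": {
                "receiver": "team-slack",
                "matchers": [{"name": "severity", "value": "critical"}],
            },
            "receivers": [
                {
                    "name": "team-slack",
                    "slackConfigs": [
                        {
                            "channel": channel,
                            # The webhook URL is read from a Secret in the
                            # team's namespace, not from the central secret.
                            "apiURL": {"name": secret_name, "key": secret_key},
                        }
                    ],
                }
            ],
        },
    }


config = slack_routing_config(
    namespace="team-a",
    name="team-a-routing",
    channel="#team-a-alerts",
    secret_name="slack-webhook",
)
print(json.dumps(config, indent=2))
```

The point of the sketch is that everything Team A needs lives in its own namespace, so the administrator only has to grant permission to create this kind of resource there.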

Epic MON-880: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Task MON-2089: Add external label of origin to platform alerts

As described in https://github.com/openshift/enhancements/blob/ba3dc219eecc7799f8216e1d0234fd846522e88f/enhancements/monitoring/multi-tenant-alerting.md#distinction-between-platform-and-user-alerts, cluster admins want to distinguish platform alerts from user alerts. For this purpose, CMO should provision an external label (openshift_io_alert_source="platform") on prometheus-k8s instances.

https://github.com/openshift/cluster-monitoring-operator/pull/1508
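
As a rough illustration of how such a label lets admins tell the two kinds of alerts apart, the sketch below merges an external label into an alert's label set and then filters on it. Only the label name and value come from the task description; the function names and sample alerts are hypothetical, and the sketch assumes that labels already present on an alert take precedence over external labels.

```python
PLATFORM_EXTERNAL_LABELS = {"openshift_io_alert_source": "platform"}


def apply_external_labels(alert_labels, external_labels):
    """Return the alert labels with external labels added.

    Assumption: a label already set on the alert is kept as-is.
    """
    merged = dict(external_labels)
    merged.update(alert_labels)
    return merged


def is_platform_alert(labels):
    """True if the alert carries the platform origin label."""
    return labels.get("openshift_io_alert_source") == "platform"


# An alert leaving a platform Prometheus instance gets the external label.
platform_alert = apply_external_labels(
    {"alertname": "KubeNodeNotReady", "severity": "warning"},
    PLATFORM_EXTERNAL_LABELS,
)
# A user-workload alert is sent without it.
user_alert = {"alertname": "MyAppDown", "severity": "critical"}

for alert in (platform_alert, user_alert):
    kind = "platform" if is_platform_alert(alert) else "user"
    print(alert["alertname"], kind)
```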

Feature OCPPLAN-5652: The details of this Jira Card are restricted (Only Red Hat employees and contractors)

Epic NETOBSERV-26: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story NETOBSERV-15: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/9953

Feature OCPPLAN-7878: NetEdge - Maintainability and Debugability & Tech Backlog

tldr: three basic claims, the rest is explanation and one example

We cannot improve long term maintainability solely by fixing bugs.
Teams should be asked to produce designs for improving maintainability/debugability.
Specific maintenance items (or investigations of maintenance items) should be placed into planning as peers to PM requests and explicitly prioritized against them.

While bugs are an important metric, fixing bugs is different than investing in maintainability and debugability. Investing in fixing bugs will help alleviate immediate problems, but doesn't improve the ability to address future problems. You (may) get a code base with fewer bugs, but when you add a new feature, it will still be hard to debug problems and interactions. This pushes a code base towards stagnation where it gets harder and harder to add features.

One alternative is to ask teams to produce ideas for how they would improve future maintainability and debugability instead of focusing on immediate bugs. This would produce designs that make problem determination, bug resolution, and future feature additions faster over time.

I have a concrete example of one such outcome of focusing on bugs vs quality. We have resolved many bugs about communication failures with ingress by finding problems with point-to-point network communication. We have fixed the individual bugs, but have not improved the code for future debugging. In so doing, we chase many hard-to-diagnose problems across the stack. The alternative is to create a point-to-point network connectivity capability. This would immediately improve bug resolution and stability (detection) for kuryr, ovs, legacy sdn, network-edge, kube-apiserver, openshift-apiserver, authentication, and console. Bug fixing does not produce the same impact.

We need more investment in our future selves. Saying, "teams should reserve this" doesn't seem to be universally effective. Perhaps an approach that directly asks for designs and impacts and then follows up by placing the items directly in planning and prioritizing against PM feature requests would give teams the confidence to invest in these areas and give broad exposure to systemic problems.

Relevant links:

Documentation:
- Edge Diagnostics Scratchpad, our team's internal diagnostic guide.
- Troubleshooting OCP networking issues - The complete guide, the SDN team's diagnostic guide.
- Linux Performance, Brendan Gregg's guide to analyzing Linux performance issues.
- RFC: A proper feedback loop on Alerts.
- OpenShift Router Reload Technical Overview on Access.
- Performance Scaling HAProxy with OpenShift on Access.
- How to collect worker metrics to troubleshoot CPU load, memory pressure and interrupt issues and networking on worker nodes in OCP 4 on Access.
- OpenShift Performance and Scale Knowledge Base on Mojo, results from OpenShift scalability testing.
- Scalability and performance, OCP 4.5 documentation about the router's currently known scalability limits.
- Scaling OpenShift Container Platform HAProxy Router, OCP 3.11 documentation about the manual performance configuration that was possible in OCP 3.
- Timing web requests with cURL and Chrome from the Cloudflare blog.
- tcpdump advanced filters, some useful tcpdump commands.
- OpenShift SDN - Networking, OCP 3.11 documentation on the SDN (useful background reading).
- Ingress Operator and Controller Status Conditions, design document for improved status condition reporting.
- Observability tips for HAProxy, a slide deck by Willy Tarreau.
- Interesting Traces - Out of Order versus Retransmissions, analysis using tshark.
- The PCP Book: A Complete Documentation of Performance Co-Pilot, by Yogesh Babar.
- Debugging kernel networking bug, brief guide to using SystemTap on RHCOS.
- Troubleshooting throughput issues from the OCP 4.5 documentation.
- Troubleshooting OpenShift Clusters and Workloads.
- Red Hat Enterprise Linux Network Performance Tuning Guide (PDF).
- openshift/enhancements#289 stability: point to point network check, a diagnostic built into the kube-apiserver operator.

Diagnostic tools:
- dropwatch to watch for packet drops.
- ethtool to check NIC configuration.
- iovisor/bcc: BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more to trace and diagnose various issues in the networking stack.
- r-curler to gather timing information about HTTP/HTTPS connections.
- route-monitor, to monitor routes for reachability.
- hping(3), a programmable packet generator.
- OpenTracing / Jaeger in OpenShift.
- node-problem-detector, a possible integration point for new diagnostics.
- Using SystemTap by Brendan Gregg.
- DTrace SystemTap cheatsheet (PDF).

Visualization and more sophisticated diagnostic tools:
- eldadru/ksniff, kubectl plugin for tcpdump & Wireshark.
- ironcladlou/ditm, Dan's "Dan in the Middle" tool.
- Skydive, network diagnostic and visualization tool.
- ali, a "load testing tool capable of performing real-time analysis" with visualization.

Testing tools:
- stress-ng, a general stress-loading tool (CPU, filesystem, network, ...).
- mb, the networking benchmarking tool written and used by Jiri Mencak from our Perf+Scale team.

Case studies:
- BZ1763206 is an example of diagnosing DNS latency/timeouts.
- BZ1829779 Investigation details the diagnosis of route latency.
- BZ1845545 is an example of diagnosing misconfigured DNS for an external LB.
- Debugging network stalls on Kubernetes, from the GitHub Blog, about diagnosing Kubernetes performance issues related to ksoftirqd.

Epic NE-367: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-dns-operator/pull/307

Feature OCPPLAN-8029: Console: Dynamic Plugin Framework

Feature Overview

Plugin teams need a mechanism to extend the OCP console that is decoupled enough that they can deliver at the cadence of their projects and are not forced into the OCP Console release timelines.

The OCP Console Dynamic Plugin Framework will enable all our plugin teams to do the following:

Extend the Console
Deliver UI code with their Operator
Work in their own git Repo
Deliver at their own cadence

Goals

- Operators can deliver console plugins separate from the console image and update plugins when the operator updates.
- The dynamic plugin API is similar to the static plugin API to ease migration.
- Plugins can use shared console components such as list and details page components.
- Shared components from core will be part of a well-defined plugin API.
- Plugins can use PatternFly 4 components.
- Cluster admins control what plugins are enabled.
- Misbehaving plugins should not break console.
- Existing static plugins are not affected and will continue to work as expected.

Out of Scope

- Initially we don't plan to make this a public API. The target use is for Red Hat operators. We might reevaluate later when dynamic plugins are more mature.
- We can't avoid breaking changes in console dependencies such as PatternFly even if we don't break the console plugin API itself. We'll need a way for plugins to declare compatibility.
- Plugins won't be sandboxed. They will have full JavaScript access to the DOM and network. Plugins won't be enabled by default, however. A cluster admin will need to enable the plugin.
- This proposal does not cover allowing plugins to contribute backend console endpoints.

Requirements

Requirement	Notes	isMvp?
UI to enable and disable plugins		YES
Dynamic Plugin Framework in place		YES
Testing Infra up and running		YES
Docs and README for creating and testing plugins		YES
CI - MUST be running successfully with test automation	This is a requirement for ALL features.	YES
Release Technical Enablement	Provide necessary release enablement details and documents.	YES

Epic CONSOLE-2907: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story CONSOLE-2381: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/9679

Story CONSOLE-2946: Expose all of core PatternFly for dynamic plugin use

Currently, webpack tree-shakes PatternFly and includes only the components used by console in its vendor bundle. We need to expose all of the core PatternFly components for use in dynamic plugins, which means we have to disable tree shaking for PatternFly. We should expose PatternFly as a separate bundle. This will allow browsers to cache more efficiently, so they need to load the PatternFly bundle again only when we upgrade PatternFly.

Open Questions

What parts of PatternFly do we consider core?

Acceptance Criteria

All PatternFly core components are exposed to dynamic plugins
PatternFly is exposed as a separate bundle that is not part of the main vendor bundle

cc Christian Vogt Vojtech Szocs Joseph Caiani James Talton

https://github.com/openshift/console/pull/9882

Feature OCPPLAN-8030: Console: Customer Happiness (RFEs) for 4.8-4.12

Requirements

Requirement	Notes	isMvp?
CI - MUST be running successfully with test automation	This is a requirement for ALL features.	YES
Release Technical Enablement	Provide necessary release enablement details and documents.	YES

Epic CONSOLE-2893: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story CONSOLE-2967: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/9956

Story CONSOLE-922: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/10137

Story CONSOLE-2360: Run Pod in Debug mode

As a user, I want the ability to run a pod in debug mode.

This should be the equivalent of running: oc debug pod

Acceptance Criteria for MVP

Build on the crash-loop back-off popover from https://github.com/openshift/console/pull/7302 to include a description of what crash-loop back-off is, a link to view logs, a link to view events, and a link to debug (container-name) in the terminal. If more than one container is crash-looping, list them individually.
Create a debug container page that includes breadcrumbs as well as the terminal to debug. Add an informational alert at the top to make it clear that this is a temporary pod and that closing this page will delete it.
Add "debug in terminal" as an action to the logs toolbar. Enable the action only when the crash-loop back-off status occurs for the selected container. Add a tooltip to explain why the action is disabled.

Assets
Designs (WIP): https://docs.google.com/document/d/1b2n9Ox4xDNJ6AkVsQkXc5HyG8DXJIzU8tF6IsJCiowo/edit#

https://github.com/openshift/console/pull/9578
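
The enablement rule in the acceptance criteria (offer the debug action only for containers in crash-loop back-off) can be sketched as follows. This is an illustrative Python model rather than the console's code; it assumes the pod is available as a parsed Kubernetes Pod object whose container statuses report a waiting state with reason CrashLoopBackOff, and the sample pod and function names are hypothetical.

```python
def crash_looping_containers(pod):
    """Return the names of containers currently waiting in CrashLoopBackOff."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return [
        status["name"]
        for status in statuses
        if status.get("state", {}).get("waiting", {}).get("reason")
        == "CrashLoopBackOff"
    ]


def debug_action_enabled(pod, container_name):
    """The debug action is offered only for a crash-looping container."""
    return container_name in crash_looping_containers(pod)


# Hypothetical pod with one healthy and one crash-looping container.
sample_pod = {
    "metadata": {"name": "example"},
    "status": {
        "containerStatuses": [
            {"name": "web", "state": {"running": {}}},
            {
                "name": "worker",
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            },
        ]
    },
}

print(crash_looping_containers(sample_pod))
print(debug_action_enabled(sample_pod, "web"))
```

Listing the crash-looping containers individually, as the popover requires, falls out of the same helper.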

Feature OCPSTRAT-469: Install and upgrade OpenShift with GCP Workload Identity

Feature Overview

Connect OpenShift workloads to Google services with Google Workload Identity

Enable customers to access Google services from workloads on OpenShift clusters using Google Workload Identity (aka WIF)
https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity

Goals

Customers want to be able to manage and operate OpenShift on Google Cloud Platform with workload identity, much like they do with AWS + STS or Azure + workload identity.
Customers want to be able to manage and operate operators and customer workloads on top of OCP on GCP with workload identity.

Requirements

Add support to CCO for the Installation and Upgrade using both UPI and IPI methods with GCP workload identity.
Support installs and upgrades for connected and disconnected/restricted environments.
Support the use of Operators with GCP workload identity with minimal friction.
Support for HyperShift and non-HyperShift clusters.

Requirement	Notes	isMvp?
CI - MUST be running successfully with test automation	This is a requirement for ALL features.	YES
Release Technical Enablement	Provide necessary release enablement details and documents.	YES

Epic CCO-114: Support GCP workload identity

Epic Goal

Complete the implementation for GCP workload identity, including support and documentation.

Why is this important?

Many customers want to follow best security practices for handling credentials.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.

Story CCO-123: Update openshift operators to consume new 'external_account' type credentials

We need to ensure the following in the OpenShift operators:

1) Make sure the operator uses v0.0.0-20210218202405-ba52d332ba99 or a later version of the golang.org/x/oauth2 module.

2) Mount the OIDC token in the operator pod; this needs to go in the deployment. We have done this for cluster-image-registry-operator.

3) For workload identity to work, the GCP credentials that the operator pod uses should be of the external_account type (not service_account). Credentials of the external_account type contain the path to the OIDC token and the URL of the service account to impersonate, along with other details. Credentials of this type can be generated from the GCP console or programmatically (supported by ccoctl). The operator pod can then consume them from a kube secret. Make the appropriate code changes to the operators so that they can consume these new credentials.
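
To make item 3 more concrete, here is a hedged Python sketch that checks whether a credentials file is of the external_account type and extracts the token path it points to. The field names follow the Google external account credential configuration format; the token path, placeholders in angle brackets, and the function name are hypothetical, and the operators themselves would do the equivalent through the Go oauth2 module rather than with this code.

```python
import json

# Hypothetical example of external_account credentials; placeholder values
# in angle brackets would come from the real cluster and GCP project.
SAMPLE_CREDENTIALS = json.dumps({
    "type": "external_account",
    "audience": "//iam.googleapis.com/projects/<project-number>/locations/global/workloadIdentityPools/<pool>/providers/<provider>",
    "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
    "token_url": "https://sts.googleapis.com/v1/token",
    "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/<service-account-email>:generateAccessToken",
    "credential_source": {
        # Path where the OIDC token is mounted in the operator pod (assumed).
        "file": "/var/run/secrets/openshift/serviceaccount/token",
        "format": {"type": "text"},
    },
})


def oidc_token_path(raw_json):
    """Return the OIDC token path, or raise ValueError for unsuitable credentials."""
    creds = json.loads(raw_json)
    if creds.get("type") != "external_account":
        raise ValueError(
            f"expected external_account credentials, got {creds.get('type')!r}"
        )
    token_path = creds.get("credential_source", {}).get("file")
    if not token_path:
        raise ValueError("credential_source.file is missing")
    return token_path


print(oidc_token_path(SAMPLE_CREDENTIALS))
```

A check like this makes a misconfigured secret (for example, one still holding service_account credentials) fail early with a clear message.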

The following repos need one or more of the above changes:

Sub-task CCO-135: Update image registry to consume new 'external_account' type credentials

repo link: https://github.com/openshift/image-registry

https://github.com/openshift/image-registry/pull/283

Feature OCPSTRAT-475: Enable sharing ConfigMaps and Secrets across namespaces [Tech Preview]

View the Description

Feature Overview

Enable sharing ConfigMap and Secret across namespaces

Requirements

Requirement	Notes	isMvp?
Secrets and ConfigMaps can get shared across namespaces		YES

Questions to answer…

Out of Scope

Background, and strategic fit

Consumption of RHEL entitlements has been a challenge on OCP 4 since it moved from the node-based (RHEL subscription manager) entitlement model to a cluster-based one. To provide an experience sufficiently similar to OCP 3, the entitlement certificates that are made available on the cluster (~~OCPBU-93~~) should be shared across namespaces. Otherwise cluster admins need to copy these entitlements into each namespace, which creates additional operational challenges when updating and refreshing them.

Documentation Considerations

Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?

Epic BUILD-293: Tech Preview Shared Resource CSI Driver

View the Description

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

Deliver the Projected Resources CSI driver via the OpenShift Payload

Why is this important?

Projected resource shares will be a core feature of OpenShift. The share and CSI driver have multiple use cases that are important to users and cluster administrators.
The use of projected resources will be critical to distributing Simple Content Access (SCA) certificates to workloads, such as Deployments, DaemonSets, and OpenShift Builds.

Scenarios

As a developer using OpenShift
I want to mount a Simple Content Access certificate into my build
So that I can access RHEL content within a Docker strategy build.

As an application developer or administrator
I want to share credentials across namespaces
So that I don't need to copy credentials to every workspace

Acceptance Criteria

OCP conformance suite must ensure that the projected resource CSI driver is installed on every OpenShift deployment.
OCP build suite tests that projected resource CSI driver volumes can be added to builds (only if builds support inline CSI volumes).
Release Technical Enablement - Docs and demos on how to create a Projected Resource share and add it as a volume to workloads. A special use case for adding RHEL entitlements to builds should be included.

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story BUILD-345: Expose CSI driver metrics to Telemetry

View the Description View the linked PRs

User Story

As an OpenShift engineer
I want to know which clusters are using the Shared Resource CSI Driver
So that I can be proactive in supporting customers who are using this tech preview feature

Acceptance Criteria

Key metrics for the shared resource CSI driver are exported to Telemeter via the cluster monitoring operator.

Docs Impact

None - metrics exported to telemetry are not formally documented.

QE Impact

QE can verify that the query/recording rule for cluster monitoring operator returns data if the cluster has the Shared Resource CSI driver installed and utilizes a SharedSecret or SharedConfigMap in a pod/workload.

PX Impact

Insights rules can potentially be created off of these exported metrics. This would allow CEE to identify which clusters are using SharedSecrets or SharedConfigMaps, especially if we are exporting mount failure metrics.

Notes

To implement, a prometheus query/recording rule needs to be added to the cluster monitoring operator. Once approved by the monitoring team, the metric data will be available on DataHub once 4.10 clusters are installed with the updated version of the monitoring operator.

https://github.com/openshift/cluster-monitoring-operator/pull/1477

Story BUILD-284: Integrate Shared Resources Operator with Cluster Storage Operator

View the Description View the linked PRs

User Story

As a cluster admin
I want the cluster storage operator to install the shared resources CSI driver
So that I can test the shared resources CSI driver on my cluster

Acceptance Criteria

Cluster storage operator uses image references to resolve the csi-driver-shared-resource-operator and all images needed to deploy the csi driver.
Shared resources CSI driver is installed when the cluster enables the CSIDriverSharedResources feature gate, OR
Shared resource CSI driver is installed when the cluster enables the TechPreviewNoUpgrade feature set
CI ensures that if the TechPreviewNoUpgrade feature set is enabled on the cluster, the shared resource CSI driver is deployed and functions correctly.
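
The install condition in the criteria above can be summarized as a small predicate. This is only a sketch of the stated rule, written in Python for brevity; the function name and its arguments are hypothetical and do not mirror the cluster storage operator's actual code.

```python
def should_install_shared_resource_driver(feature_set: str, enabled_gates: set) -> bool:
    """Return True when the shared resource CSI driver should be installed.

    The driver is installed when either the TechPreviewNoUpgrade feature set
    is enabled, or the CSIDriverSharedResources feature gate is enabled.
    """
    return (
        feature_set == "TechPreviewNoUpgrade"
        or "CSIDriverSharedResources" in enabled_gates
    )
```

For example, a cluster with the default feature set and no extra gates would not get the driver, while enabling the tech preview feature set would be enough to install it.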

Docs Impact

Docs will need to identify how to install the shared resources CSI driver (by enabling the tech preview feature set)

Notes

Tasks:

Add the Share APIs (SharedSecret, SharedConfigMap) to openshift/api
Generate clients in openshift/client-go for Share APIs
Update the CSI driver name used in the enum for the ClusterCSIDriver custom resource.
Generate custom resource definitions and include them in the deployment YAMLs for the shared resource operator
Add YAML deployment manifests for the shared resource operator to the cluster storage operator (include necessary RBAC)
Ensure cluster storage operator has permission to create custom resource definitions
Enhance the cluster storage operator to install the shared resource CSI driver only when the cluster enables the CSIDriverSharedResources feature gate

Note that to be able to test all of this on any cloud provider, we need ~~STOR-616~~ to be implemented. We can work around this by making the CSI driver installable on AWS or GCP for testing purposes.

The cluster storage operator has cluster-admin permissions. However, no other CSI driver managed by the operator includes a CRD for its API.

See https://issues.redhat.com/browse/BUILD-159?focusedCommentId=16360509&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16360509

https://github.com/openshift/cluster-storage-operator/pull/198

Feature OCPSTRAT-526: Cloud Controller Managers: Final Testing and GA tasks - Phase 1

View the Description

Feature Overview (aka. Goal Summary)

Upstream Kubernetes is following other SIGs by moving its in-tree cloud providers to an out-of-tree plugin format, Cloud Controller Manager, at some point in a future Kubernetes release. OpenShift needs to be ready to act on this change.

Goals (aka. expected user outcomes)

Bring together all the cloud controller managers (AWS, GCP, Azure), complete testing and prepare for final GA

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Epic OCPCLOUD-1224: Prepare CCCMO for General Availability

View the Description

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

Prepare the Cluster Cloud Controller Manager Operator (CCCMO) component, introduced in 4.9, for GA

Why is this important?

We must ensure that the component is stable before we can declare the product GA

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story OCPCLOUD-1189: CCCMO: isolate provider specific logic within operator

View the Description View the linked PRs

Initial work was started here: https://github.com/lobziik/cluster-cloud-controller-manager-operator/pull/1/files

We need to isolate provider-specific code in the respective packages and introduce an interface to leverage it (rendering of regular and bootstrap manifests should live there for now).

DoD:

Introduce templating logic to replace existing substitution mixture

Isolate templating logic so that this is transparent to the core of the CCCMO
Improve testing of the substitution
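
One way to picture the isolation described above is a small provider interface whose implementations only supply template values, while the core renders manifests without knowing which provider it is talking to. The sketch below is written in Python for brevity (the operator itself is a Go project), and every class name, image reference, and manifest field in it is hypothetical.

```python
from abc import ABC, abstractmethod
from string import Template

# Hypothetical manifest template shared by all providers.
MANIFEST = "name: ${cloud}-cloud-controller-manager\nimage: ${image}\n"


class CloudProvider(ABC):
    """Hypothetical interface that the operator core would depend on."""

    @abstractmethod
    def template_values(self) -> dict:
        """Return the provider-specific values used by manifest templates."""

    def render(self, manifest_template: str) -> str:
        # Template.substitute raises KeyError when a placeholder has no value,
        # so a missing substitution fails loudly instead of leaking "${...}".
        return Template(manifest_template).substitute(self.template_values())


class AWSProvider(CloudProvider):
    def template_values(self) -> dict:
        return {"cloud": "aws", "image": "example.test/aws-ccm:latest"}


class IncompleteProvider(CloudProvider):
    def template_values(self) -> dict:
        return {"cloud": "incomplete"}  # "image" is deliberately missing


def render_for(provider: CloudProvider) -> str:
    """Core logic: render the manifest without provider-specific branches."""
    return provider.render(MANIFEST)
```

Because the core only calls render_for, substitution can be tested per provider in isolation, which is the kind of testing improvement the DoD asks for.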

https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/110

Epic CONSOLE-2065: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story CONSOLE-2280: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/console/pull/10551

Epic CONSOLE-2966: 4.10 Console Dependencies & Tech Debt

View the Description

An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.

Task CONSOLE-2964: Dynamic Plugin needs to be externally consumable - Update ts-node

View the Description View the linked PRs

As an adopter of the @openshift-console/dynamic-plugin-sdk I want to easily integrate into my development pipeline so that I can extend the OCP console.

Trying to pull in the dynamic-plugin-sdk into ACM is proving to be problematic. We would have to move to older dependencies. Integrating with webpack and typescript requires a very specific setup.

The dynamic-plugin-sdk has only really been used internally by OCP and is strongly tied to the setup and dependencies of OCP. For the dynamic-plugin-sdk to be externally consumable by adopters, it should be as easy to use as other webpack plugins such as HtmlWebpackPlugin or CompressionPlugin.

Acceptance Criteria

Uses up to date dependencies - not tied to specific versions OCP console uses
Includes its own dependencies - does not require adopters to include those dependencies
The dynamic demo plugin should be updated to use newer dependencies and use the plugin without a bunch of tweaks to tsconfig paths.

Currently

requires old dependencies
- ts-node 5.0.1 → 10.2.1

https://github.com/openshift/console/pull/10014

Story CONSOLE-2985: Replace all instances of old variables controlling global grid widths and breakpoints with Patternfly variables for more consistency of spacing between elements and behaviors

View the Description View the linked PRs

The console has many instances of old variables, $grid-float-breakpoint and $grid-gutter-width, controlling margins/padding and responsive breakpoints throughout the Admin and Dev Console. These do not provide spacing and behaviors consistent with Patternfly components, which use their own variables: $pf-global-gutter-md, $pf-global-gutter, and $pf-global-breakpoint-{size}. By replacing these, the intent is to bring the console closer to a pure Patternfly structure and behavior, requiring fewer overrides and customizations.

https://github.com/openshift/console/pull/10332

Story CONSOLE-2979: Upgrade Cypress to 8.5.0

View the Description View the linked PRs

Update console from Cypress 6.0.0 to 8.5.0. Changes that impact us:

cypress run is headless by default
cy.intercept URL matching is more strict
Uncaught exception and unhandled promise rejection checks are more strict

https://docs.cypress.io/guides/references/migration-guide#Migrating-to-Cypress-8-0

https://github.com/openshift/console/pull/10164

Story CONSOLE-2972: Upgrade webpack 4.x dependencies

View the Description View the linked PRs

Update webpack to the latest 4.x and update webpack loaders. This will help prepare us to move to webpack 5.

https://webpack.js.org/migrate/5/

https://github.com/openshift/console/pull/10080

Epic IR-208: Continuous Improvement of Maintainability 4.10

View the Description

Epic Goal

Improve CI testing of the image registry components.

Why is this important?

The image registry, image API and the image pruner had a lot of tests removed during the transition to 4.0. This may make the platform less stable and/or slow down the team.

Scenarios

Acceptance Criteria

CI - tests should be more stable and have broader coverage

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.

Story IR-104: Use library-go in image-registry

View the Description View the linked PRs

In the image-registry, we have the packages origin-common and kubernetes-common. The problem is that this code doesn't get updates. We can replace them with the better-supported library-go.

https://github.com/openshift/image-registry/pull/295

Epic IR-210: Update k8s to 1.23

View the Description

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

Why is this important?

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story IR-211: Bump k8s to 1.23 in image-registry repo

View the Description View the linked PRs

As an OpenShift engineer
I want image-registry to use the latest k8s libraries
so that image-registry can benefit from new upstream features.

Acceptance criteria

image-registry uses k8s.io/api v1.23.z
image-registry uses latest openshift/api, openshift/library-go, openshift/client-go

https://github.com/openshift/image-registry/pull/302

Epic JKNS-132: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story JKNS-257: Use sidecar pattern for Jenkins pod templates

View the Description View the linked PRs

User Story

As a developer using Jenkins to build my application
I want to use the base Jenkins agent image as a sidecar in my PodTemplate
So that I can use any s2i builder image in my Jenkins pipelines

Acceptance Criteria

Provide new Kubernetes Plugin Pod Templates which use the sidecar pattern for NodeJS and Maven.
Add documentation on how to use the new pod template in a Jenkinsfile (need to specify the container where the build occurs).
Add documentation on how developers can provide an inline pod template within a Jenkinsfile. Documentation should have the following formats:
- New YAML declarative format
- Deprecated Groovy format
Existing pipelines that use the default Kubernetes Plugin Pod Templates do not break.
End to end testing (for client or sync plugin) verifies that the new pod templates work.

QE Impact

QE will need to verify that the new pod templates can successfully execute a JenkinsPipeline build.

Docs Impact

Documentation needs to be updated to explain how to use the new template.

PX Impact

Unclear if we need new CEE/PX materials beyond doc updates.

Notes

We currently have built-in pod templates for NodeJS and Maven, which use specialized agent images with NodeJS/Maven image.
Blog post here outlines the process: https://developers.redhat.com/blog/2020/06/04/an-easier-way-to-create-custom-jenkins-containers/

The Groovy style of declaring in-line pod templates is deprecated in favor of a YAML-style format.

Existing documentation for the Jenkins pod templates: https://docs.openshift.com/container-platform/4.9/openshift_images/using_images/images-other-jenkins.html#images-other-jenkins-config-kubernetes_images-other-jenkins

https://github.com/openshift/jenkins/pull/1355

Epic MON-1988: Enable audit and query logging for all prometheus read paths

View the Description

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

Epic Goal

As the CFE team, we would like to enable query logging for all Prometheus read paths.
As part of this, we would like to enable audit and query logging for Prometheus Adapter (aggregated server audit log), Prometheus (query log) and ThanosQuerier (query log).

Why is this important?

This would help all parties (customers, app-sres, CCX, monitoring team, ...) to debug an overloaded Prometheus instance.

Scenarios

When a customer faces high CPU consumption in any of the Prometheus instances, they can enable audit logging in Prometheus Adapter to see which component is calling the metrics API.
When a customer faces high CPU consumption in any of the Prometheus instances, they can enable query logging in all Prometheus instances (PM & UWM) and ThanosQuerier to see which query is frequently executed.
https://bugzilla.redhat.com/show_bug.cgi?id=1982302

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
Prometheus Adapter audit logs must be enabled by default
Prometheus Adapter audit logs must be preserved after each CI run

Open questions:

Should we enable ThanosRuler query logs?

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Task MON-1786: Allow OpenShift users to configure audit logs for prometheus-adapter

View the Description View the linked PRs

After investigating a complex Bugzilla involving many applications making queries to prometheus-adapter, we noticed that we were lacking insight into the requests made to prometheus-adapter. To get such information for an aggregated API, the best option would be to have audit logs for prometheus-adapter. This wasn't configurable before, but with https://github.com/kubernetes-sigs/custom-metrics-apiserver/pull/92, upstream users should now be able to configure it.

Since this would greatly help in investigating prometheus-adapter Bugzillas in the future, it would be great if we allowed OpenShift users to configure the audit logs so that they could provide them to us.

Note for the assignee: at the time this ticket was created, the upstream PR hadn't been merged in custom-metrics-apiserver and thus wasn't synced in prometheus-adapter. So we will have to wait a bit before starting to look into this ticket.

DoD:

Allow OpenShift users to configure audit logs for prometheus-adapter
Integrate with must-gather
Document how to configure audit logs in the official OpenShift documentation
Upstream jsonnet patch that enables this feature through a configuration

https://github.com/openshift/must-gather/pull/266

Epic NETOBSERV-16: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story NETOBSERV-31: Expose CNI type features as a config-map

View the Description View the linked PRs

The console needs to know the network type capabilities to show/hide some Network Policy form fields.

As a result of https://issues.redhat.com/browse/NETOBSERV-27, this logic is implemented as a features document inside the console code. The console fetches the network type from the network operator and checks the supported features against this document.

However, this limits the feature to admin users, as other logged-in users do not have permissions to fetch the network type.

This task aims to modify the current Cluster Network Operator to expose the network capabilities as an `sdn-public` Config Map, writeable only by the SDN, readable by any `system:authenticated` user.
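
To make the access model concrete, the sketch below builds a Role and RoleBinding that would let any `system:authenticated` user read a single named Config Map and nothing else. It uses standard Kubernetes RBAC fields, but it is only an illustration: the namespace argument and the "-reader" naming are assumptions, and the manifests the Cluster Network Operator actually ships may differ. Write access is deliberately not granted here, since only the SDN is meant to write.

```python
def public_configmap_rbac(name: str, namespace: str) -> list:
    """Build a Role/RoleBinding pair granting read-only access to one ConfigMap."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": f"{name}-reader", "namespace": namespace},
        "rules": [{
            "apiGroups": [""],
            "resources": ["configmaps"],
            # resourceNames restricts the rule to this single ConfigMap.
            "resourceNames": [name],
            "verbs": ["get"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{name}-reader", "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": f"{name}-reader",
        },
        # Every logged-in user is a member of the system:authenticated group.
        "subjects": [{
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Group",
            "name": "system:authenticated",
        }],
    }
    return [role, binding]
```

Calling public_configmap_rbac("sdn-public", "example-namespace") returns two dictionaries that could be serialized to YAML or JSON and applied to a cluster; "example-namespace" is a placeholder.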

Enhancement Proposal PR: https://github.com/openshift/enhancements/pull/875

https://github.com/openshift/cluster-network-operator/pull/1204

Epic OCPCLOUD-1256: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story OCPCLOUD-1252: Set values for validation webhook for GuestAccelerators field in GCPProviderSpec

View the Description View the linked PRs

We want to configure 'default' and 'allowed' values in the validation webhook for the Guest Accelerators field in GCPProviderSpec. We also need to revendor it to include the newly added Guest Accelerators field.

This can be done after https://github.com/openshift/cluster-api-provider-gcp/pull/172 is merged.

DoD:

Make sure that validations return errors on issues with GPU configuration
Ensure the unit tests for the webhooks are updated

https://github.com/openshift/machine-api-operator/pull/927

Epic OCPRHV-594: [Refactor] Migrate OCP on RHV subprojects to go-ovirt-client, go-ovirt-client-log and k8sOVirtCredentialsMonitor

View the Description

Description:

OpenShift on RHV is composed of the following subprojects that the team maintains:

Each of those projects currently uses the generated oVirt API project go-ovirt.

This leads to a number of issues:

Duplicated code between the subprojects: Since go-ovirt is a thin layer around the API, a lot of the code which interacts with oVirt is duplicated between the projects, which leads to all the classic duplication problems such as maintenance burden, lack of clear conventions, and so on.
Bad error handling and unclear errors:
1. Since go-ovirt is a thin layer, there is a lot of error handling and checking which needs to be done. Because it often looks like a certain error can be ignored, the error is never checked, which could lead to unexpected situations.
2. Since the errors which are returned from the oVirt Engine are sometimes unclear, when we return those errors to the users or log them it is hard to understand what the actual issue is.
Lack of retries: Sometimes an operation can take some time due to some condition that needs to be met, or an operation can fail due to infrastructure issues. The go-ovirt library doesn't contain any retry logic, which means each client needs to implement its own retry logic; this is not done at the moment and would cause more duplicated code.
Poor logging: The current go-ovirt library doesn't log anything, and all the logs come from the subprojects. This leads to:
1. Inconsistent logging between the projects.
2. Lack of logs.
Almost no test coverage:
1. It's very hard to mock and write tests with go-ovirt since there are so many calls; it will be much easier to mock and write tests with go-ovirt-client.
2. go-ovirt only has rudimentary tests.
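
The missing retry logic mentioned in the list above is the kind of helper that would otherwise be duplicated in every subproject. The sketch below shows a generic retry loop with exponential backoff. It is written in Python for brevity and is not the go-ovirt-client implementation; the TransientError type, the function name, and the default values are all hypothetical.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, attempts=5, initial_delay=0.5, backoff=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are exhausted.

    Only TransientError is retried; any other exception propagates at once.
    The delay between attempts grows by the backoff factor each time.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)
            delay *= backoff
```

Keeping the sleep function injectable lets tests run without real waiting, which also touches on the testability concern raised in the same list.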

Then came go-ovirt-client, go-ovirt-client-log, go-ovirt-client-log-klog and k8sOVirtCredentialsMonitor to the rescue!

go-ovirt-client is a wrapper around go-ovirt which contains all the error handling, retry logic, logs and tests needed to provide a decent user experience and an easy-to-use API to the oVirt engine.

go-ovirt-client-log is a library to unify the logging logic between the projects; it is used by go-ovirt-client and should be used by all the sub-projects.

go-ovirt-client-log-klog is a companion library to go-ovirt-client-log enabling logging via the Kubernetes "klog" facility.

k8sOVirtCredentialsMonitor is a utility for monitoring the oVirt credentials secret, which will automatically update the oVirt credentials if they are changed.

We aim to move all projects which are using go-ovirt to use go-ovirt-client, go-ovirt-client-log and k8sOVirtCredentialsMonitor instead.

Benefits for the eng:

Possible to write unit tests.
Easier to maintain since less code duplication - reduce the amount of code.
Test coverage exists on the ovirt-client as well.
No (or fewer) bugs regarding operations that need retry or polling logic.
Solves a number of existing bugs

Benefits for the customers:

Clearer error messages and logs.
Fewer bugs.

Acceptance criteria:

No sub-project uses go-ovirt directly - at least 90% of the calls to go-ovirt should be migrated to go-ovirt-client.
All sub-projects should use the corresponding go-ovirt-client-log for logging.
The csi-driver and cluster provider both use k8sOVirtCredentialsMonitor.
CI tests are green for all components.

How to test:

QE regression - make sure all flows are still working.
Green CI on all jobs.
Keep an eye out for log messages that might confuse customers.

Task OCPRHV-596: Migrate ovirt-csi-driver to go-ovirt-client

View the Description View the linked PRs

Description:

Identify all the communication between ovirt-csi-driver and the go-ovirt.
Port all the logic to go-ovirt-client.
Port all calls on ovirt-csi-driver to go-ovirt-client.

Acceptance:

ovirt-csi-driver uses go-ovirt-client for 95% of all oVirt-related logic.

https://github.com/openshift/ovirt-csi-driver/pull/88

Epic ODC-6266: Improve DevExp for front end developers

View the Description

T-shirt size: M

Goal:

Provide an easy and successful experience for front end developers to build and deploy their applications

Why is it important?

Currently, the front end dev experience is not positive. It's much easier for them to use other platforms. Improving the front end dev experience will enable us to gain more market share.

Use cases:

Need to be able to override the npm command when using Node Builder Image
Need to expose target port
Need access to the URL to access my application

Although we provide the ability for 2 & 3 today, the current journey does not match with the mental model of the front end developer

Acceptance criteria:

When importing an app, I should be able to easily provide the npm build and run commands
When opting in to create a route, the target port should be exposed without having to open any Advanced Options
After importing my app, if a route is exposed, I should be able to access/copy that URL

Dependencies (External/Internal):

Design Artifacts:

Desired UX experience

enable user to provide the *Build Command* when Node Builder image is being used
enable user to provide the *Run Command* when Node Builder image is being used

expose the Target Port under the *Create a route to the Application* rather than inside Show advanced Routing options

NEED TO FINALIZE HOW TO PROVIDE THE ROUTE TO EASILY COPY – Inline Notification maybe? As well as side panel?

Note:

Story ODC-6443: Add an option to add additional labels for just the Route and move the target port before the route checkbox

View the Description View the linked PRs

Description

As a user, I want to have the option to add additional labels to a Route, as I could do in OCP3. See ~~RFE-622~~

The additional labels should only be added to the route, not the service or other components. The advanced option "Labels" should not be touched; those labels are still added to all components.

As a small addition, we should also always show the "Target port", since it also defines the Service port. To make this clearer, the "Target port" should be shown before the "Create a route to the Application" checkbox.

Acceptance Criteria

The following changes should be applied to the Import flow (from Git, from Container, ...) and to the Edit page as well:

Move the option "Target port" before the checkbox "Create a route to the Application" and do not hide the "Target port" when the checkbox is disabled
Add a new "Additional route labels" option, with a label input field to the "Advanced Routing options"
Save (Import) and update (Edit) the labels to the Route resource. When editing a Deployment with a Route the route labels should not show the shared labels.
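
The labelling rule in these criteria can be sketched as two small helpers: one that decides which labels a resource gets on save, and one that decides which labels the edit form shows for the route. This is an illustration in Python for brevity; the console itself is a TypeScript code base, and both function names are hypothetical.

```python
def labels_for(kind: str, shared_labels: dict, route_labels: dict) -> dict:
    """Return the labels to save on a resource of the given kind.

    Shared labels (the advanced "Labels" option) go on every component;
    the additional route labels are added to the Route only.
    """
    labels = dict(shared_labels)
    if kind == "Route":
        labels.update(route_labels)
    return labels


def editable_route_labels(labels_on_route: dict, shared_labels: dict) -> dict:
    """Return the labels to show as "Additional route labels" when editing.

    Labels that are part of the shared set are filtered out, so the route
    field does not repeat labels that belong to all components.
    """
    return {k: v for k, v in labels_on_route.items() if k not in shared_labels}
```

With shared labels {"app": "demo"} and route labels {"tier": "edge"}, the Service would get only the shared label, the Route would get both, and the edit form would show only "tier" in the route label field.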

Additional Details:

https://github.com/openshift/console/pull/10663

Epic ODC-6322: Automation Test plan for 4.10 Release

View the Description

Problem:

This epic is mainly focused on the 4.10 Release QE activities

Goal:

1. Identify the scenarios for automation
2. Segregate the test scenarios into smoke, regression and other user stories
a. Update the https://docs.jboss.org/display/ODC/Automation+Status+Report
3. Align with layered operator teams for updating scripts
4. Work closely with dev team for epic automation
5. Create the automation scripts using Cypress
6. Implement CI for nightly builds
7. Execute scripts on sprint basis

Why is it important?

To track the QE progress in one place, on the 4.10 Release Confluence page

Use cases:

<case>

Acceptance criteria:

<criteria>

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Task ODC-6453: Enhance the after all hook to handle deletion of more than one namespace created in a feature file

View the linked PRs

https://github.com/openshift/console/pull/10859

Task ODC-6455: Add page tests should use latest UI labels like "Import from Git" instead of mapping "From Devfile" strings

View the Description View the linked PRs

There are different code spots which map the old action items "From Git", "From Dockerfile" and "From Devfile" to the new action "Import from Git".

We should avoid mapping different strings to the new version and instead update our tests so that the feature and page object files match the latest frontend code.

Code areas I found are marked with

      // TODO (ODC-6455): Tests should use latest UI labels like "Import from Git" instead of mapping strings

https://github.com/openshift/console/pull/10864

Epic CONSOLE-2848: Port all Protractor tests to Cypress

View the Description

Epic Goal

Port all remaining Protractor tests to Cypress

Why is this important?

Protractor is very hard to debug when tests fail/flake
Once all protractor tests are ported we can remove all Protractor dependencies, scripts, and configuration files.
Cypress has better debugging, plug-ins, and reporting tools

Acceptance Criteria

CI - MUST be running successfully with tests automated

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

Story CONSOLE-2867: Cypress: port protractor OAuth tests

View the Description

Please read: migrating-protractor-tests-to-cypress

Protractor test to migrate: `frontend/integration-tests/tests/oauth.scenario.ts`
Large but straightforward

47) OAuth

   48) BasicAuth IDP
      ✔ creates a Basic Authentication IDP
      ✔ shows the BasicAuth IDP on the OAuth settings page

   49) GitHub IDP
      ✔ creates a GitHub IDP
      ✔ shows the GitHub IDP on the OAuth settings page

   50) GitLab IDP
      ✔ creates a GitLab IDP
      ✔ shows the GitLab IDP on the OAuth settings page

   51) Google IDP
      ✔ creates a Google IDP
      ✔ shows the Google IDP on the OAuth settings page

   52) Keystone IDP
      ✔ creates a Keystone IDP
      ✔ shows the Keystone IDP on the OAuth settings page

   53) LDAP IDP
      ✔ creates a LDAP IDP
      ✔ shows the LDAP IDP on the OAuth settings page

   54) OpenID IDP
      ✔ creates a OpenID IDP
      ✔ shows the OpenID IDP on the OAuth settings page

Acceptance Criteria

Protractor test ported to cypress
Remove any unused legacy `data-test-id`s
Protractor test deleted, and no longer referenced in `frontend/integration-tests/protractor.conf.ts`

Sub-task CONSOLE-2870: - delete -

View the linked PRs

https://github.com/openshift/console/pull/10226

Epic OCPCLOUD-737: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Story OCPCLOUD-1263: Integrate openshift/API machine definitions into components

View the Description

Background

As a follow-up to ~~OCPCLOUD-693~~, once all of the API definitions are present in openshift/api, we need to migrate the existing code bases to use the new API locations.

This will include:

Machine API Operator
Cluster Machine Approver
Cluster API Provider AWS|Azure|GCP|IBM|Alibaba|OpenStack|Kubevirt
Cluster API actuator pkg
Installer
WMCO
MCO
Hive
Grep OpenShift for other references to our old APIs

Steps

Replace the Machine API imports with the new openshift/API MAPI locations
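
A mechanical rewrite of this kind could be scripted along the following lines. This is a rough sketch in Python, not the tooling the teams used: the helper name is hypothetical, and the single old and new import path in `IMPORT_MAP` is an assumed example that would need to be checked against each repository.

```python
import re
from typing import Dict

# Assumed example mapping; verify the real old and new paths per repository.
IMPORT_MAP: Dict[str, str] = {
    "github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1":
        "github.com/openshift/api/machine/v1beta1",
}


def rewrite_imports(go_source: str, mapping: Dict[str, str]) -> str:
    """Replace every quoted string that exactly equals a key of `mapping`.

    Only whole-string matches are swapped, so longer paths that merely
    start with an old path are left untouched.
    """
    def swap(match):
        path = match.group(1)
        return '"%s"' % mapping.get(path, path)

    return re.sub(r'"([^"\n]+)"', swap, go_source)


before = 'import machinev1 "github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1"\n'
print(rewrite_imports(before, IMPORT_MAP))
```

Because the match is on the whole quoted string, a compile after the rewrite remains the real check that nothing was missed.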

Stakeholders

Cluster Infra
Owners of the repos listed above

Definition of Done

The openshift/API definitions are used across components in the MAPI ecosystem

Docs

Generated docs for API types should now come from openshift/API

Testing

Regular regression testing should be sufficient. This is a copy-paste change for the most part, and we expect the code won't compile if we break it.

Sub-task OCPCLOUD-1267: Migrate cluster-api-provider-gcp to new API definitions

View the linked PRs

https://github.com/openshift/machine-api-provider-gcp/pull/3

Epic ODC-6381: 4.9 Epics Automation stories tech debt

View the Description

Problem:

Complete all the 4.9 epic feature automation user stories and merge them to the master branch.

Goal:

4.9 epics automation completion

Why is it important?

Tech debt should be completed

Use cases:

<case>

Acceptance criteria:

Create the PRs for the 4.9 epic user stories automation
Review them
Merge them to the 4.10 master branch and 4.9 master branch

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Story ODC-6364: Epic Automation for ODC-5149 "Pipeline as Code"

View the Description View the linked PRs

Description

As a user, I want to store my delivery pipelines in a Git repository as the source of truth and execute the pipeline on OpenShift on Git events, so that I can version and trace changes to the delivery pipelines in Git.

Use Cases

Developer can see the list of Git repositories that are added to the namespace for pipeline-as-code execution
Developer can navigate from the Console to the Git repository on the Git provider
For each Git repository, developer can see the details of the last pipeline execution and the commit id that triggered it, with the possibility of navigating to the Git commit in the Git provider
Developer can see the list of pipelinerun executions related to a Git repository in a chronological order and the commit id that triggered each

Acceptance Criteria

As a user, looking at the Pipelines page in the Developer Console, I should be able to see a list of (a) Git repositories that are added to the namespace for PAC execution AND (b) all pipelines in the namespace
As a user, I should be able to navigate to a details page of the git repo.
1. This details page should provide access to (a) details of the git repo and (b) a list of pipeline runs.
2. This PLR tab should show more information than the typical PLR List view, including SHA (commit id), commit message, branch & trigger type
As a user, when looking at a Pipeline Run Details page, if it is associated with a git repo (PAC), the page should:
1. Indicate that it's from a specific git repo rather than a PL resource
2. Include the SHA (commit id), commit message, branch & trigger type

https://github.com/openshift/console/pull/10521

Story CONSOLE-2892: Allow dynamic plugins to proxy to services on the cluster

View the Description View the linked PRs

Goal

We have several use cases where dynamic plugins need to proxy to another service on the cluster. One example is the Helm plugin. We would like to move the backend code for Helm to a separate service on the cluster, and the Helm plugin could proxy to that service for its requests. This is required to make Helm a dynamic plugin. Similarly if we want to have ACM contribute any views through dynamic plugins, we will need a way for ACM to proxy to its services (e.g., for Search).

It's possible for plugins to make requests to services exposed through routes today, but that has several problems:

It requires that the service be exposed outside the cluster, which is not always desired.
It requires the service support CORS headers for the console.
There is no way to specify a CA file for the route if it's not trusted by the browser.
Plugins will not have access to the user's access token on the client, which means that there is no simple way to handle auth.

Plugins need a way to declare in-cluster services that they need to connect to. The console backend will need to set up proxies to those services on console load. This also requires that the console operator be updated to pass the configuration to the console backend.

This work will apply only to single clusters.

Open Questions

What happens when a multitenant isolated network policy is configured on the cluster?

https://docs.openshift.com/container-platform/4.7/networking/network_policy/multitenant-network-policy.html

How do we (and can we?) support this for multi-cluster where console is running on a different hub cluster?
Do we need to auth for all requests?

Acceptance Criteria

Plugins can declare a service to proxy to in the ConsolePlugin resource
Plugins can specify a CA cert for the service
Console falls back to the service signing CA if none is specified
Plugins have a way of specifying whether the user's authentication token is included in requests through the service proxy
Dynamic plugin enhancement is updated with the implementation details
Support for server-side events (SSE) for ACM
Add support, or a flag, if auth is needed for each request.
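
To make the acceptance criteria concrete, here is a minimal sketch of what the backend might do with one declared proxy. It is written in Python for illustration only. The `ServiceProxy` fields, the helper names and the `helm-backend` example are all hypothetical and are not the ConsolePlugin schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ServiceProxy:
    """Hypothetical shape of one proxy declared by a plugin."""
    alias: str                            # name the plugin uses to address the proxy
    name: str                             # in-cluster Service name
    namespace: str
    port: int
    ca_certificate: Optional[str] = None  # PEM bundle, if the plugin supplies one
    authorize: bool = False               # forward the user's token to the service?


def upstream_url(proxy: ServiceProxy) -> str:
    """Cluster-internal address, so the service needs no external route."""
    return f"https://{proxy.name}.{proxy.namespace}.svc:{proxy.port}"


def trusted_ca(proxy: ServiceProxy, service_signing_ca: str) -> str:
    """Use the plugin's CA when given, else fall back to the service signing CA."""
    return proxy.ca_certificate or service_signing_ca


def upstream_headers(proxy: ServiceProxy, user_token: str) -> Dict[str, str]:
    """Attach the bearer token on the backend side only when the plugin opts in."""
    return {"Authorization": f"Bearer {user_token}"} if proxy.authorize else {}


helm = ServiceProxy(alias="helm", name="helm-backend", namespace="demo", port=8443, authorize=True)
print(upstream_url(helm))
```

Keeping the token handling in the backend is what would let a plugin reach an authenticated service without the token being exposed to plugin code in the browser.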

cc Ali Mobrem [~christianmvogt]

Story OCPCLOUD-1278: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-autoscaler-operator/pull/226

Task IR-224: Bump openshift/api package

View the Description View the linked PRs

Acceptance criteria:

All tests (including e2e) pass
No regressions are introduced
openshift/api points to a recent commit on the master branch

https://github.com/openshift/cluster-image-registry-operator/pull/728

Task MON-975: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-monitoring-operator/pull/1338

Task IR-227: Remove legacy code for platformStatus

View the Description View the linked PRs

Before platformStatus, the operator used to get information about AWS and GCP from the install-config config map. This code can be removed.

https://github.com/openshift/cluster-image-registry-operator/pull/739

Task MON-1872: Use upstream kube-thanos in cluster-monitoring-operator jsonnet

View the Description View the linked PRs

As per [1], the jsonnet code for managing thanos-ruler resources should reuse the upstream kube-thanos project.

[1] https://github.com/openshift/cluster-monitoring-operator/blob/399c84dbca596b611b0c30a0d2df63a5d2b0b8cc/jsonnet/components/thanos-ruler.libsonnet#L1

https://github.com/openshift/cluster-monitoring-operator/pull/1478

Task MON-1873: Tag all resources created by CMO e2e tests

View the Description View the linked PRs

The CMO e2e tests create a bunch of resources. These should be cleaned up on a successful run. However:

Some test failures leave the created resources behind, which have to be cleaned up before a re-run.
There have been developer reports that even successful runs don't tidy up everything.

In a CI context this is rarely a problem, however running the tests locally can be made quite awkward, especially repeated runs on the same cluster.

We should tag all resources created by the e2e tests with a label (app.kubernetes.io/created-by: cmo-e2e-test).
This will allow easy cleanup by deleting all resources with that label and will allow for checking proper clean-up.

DoD:
All e2e resources get properly tagged.
It is straightforward to ensure that future code changes don't skip adding this tag.
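
The tagging idea can be illustrated with a small sketch. It is written in Python against plain dictionaries rather than the real Go test framework, and the helper names are hypothetical. Only the label key and value come from the description above.

```python
import copy
from typing import Any, Dict, Iterable, List

E2E_LABEL_KEY = "app.kubernetes.io/created-by"
E2E_LABEL_VALUE = "cmo-e2e-test"

Manifest = Dict[str, Any]


def tag(manifest: Manifest) -> Manifest:
    """Return a copy of `manifest` that carries the e2e label."""
    tagged = copy.deepcopy(manifest)
    labels = tagged.setdefault("metadata", {}).setdefault("labels", {})
    labels[E2E_LABEL_KEY] = E2E_LABEL_VALUE
    return tagged


def leftovers(resources: Iterable[Manifest]) -> List[str]:
    """Names of resources that still carry the e2e label after a run."""
    return [
        r["metadata"]["name"]
        for r in resources
        if r.get("metadata", {}).get("labels", {}).get(E2E_LABEL_KEY) == E2E_LABEL_VALUE
    ]


config_map = {"kind": "ConfigMap", "metadata": {"name": "scratch"}}
print(leftovers([tag(config_map), config_map]))
```

Routing every create call through one helper like `tag` is one way to meet the second DoD item, since a resource created any other way would show up as untagged in review.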

https://github.com/openshift/cluster-monitoring-operator/pull/1397

Task MON-1964: Make Telemeter receive endpoint request limit configurable

View the Description View the linked PRs

Currently, Telemeter is not equipped with a configurable request limit for the receive endpoint (for full context see: https://github.com/openshift/cluster-monitoring-operator/pull/1416). It is using the default limit defined in the code base, however it seems this limit might not be suitable for our usage.

As part of this ticket, we should:

1) Understand what the appropriate request size limit is for our use cases

2) Make the limit configurable in Telemeter via a flag

3) Deploy the changes, initially to the staging environment, to enable our team to test it.
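
Step 2 could look roughly like the sketch below. It is written in Python purely to show the shape of the change. The flag name `--receive-limit-bytes`, the default value and the helper names are hypothetical and do not reflect Telemeter's actual options.

```python
import argparse
from typing import List, Optional

DEFAULT_LIMIT_BYTES = 15 * 1024  # placeholder default, not Telemeter's real value


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; the limit falls back to the built-in default."""
    parser = argparse.ArgumentParser(description="receive endpoint sketch")
    parser.add_argument(
        "--receive-limit-bytes",  # hypothetical flag name
        type=int,
        default=DEFAULT_LIMIT_BYTES,
        help="maximum accepted request body size in bytes",
    )
    return parser.parse_args(argv)


def status_for(body: bytes, limit_bytes: int) -> int:
    """Return 413 (Payload Too Large) when the body exceeds the limit, else 200."""
    return 413 if len(body) > limit_bytes else 200


args = parse_args(["--receive-limit-bytes", "4"])
print(status_for(b"abcd", args.receive_limit_bytes), status_for(b"abcde", args.receive_limit_bytes))
```

Keeping the existing default when the flag is absent means a staging deployment can try a new limit without changing behavior anywhere else.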

Story CONSOLE-2768: console-operator should use bindata instead of inlining manifests

View the Description View the linked PRs

console-operator codebase contains a lot of inline manifests. Instead we should put those manifests into a `/bindata` folder, from which they will be read and then updated per purpose.

https://github.com/openshift/console-operator/pull/550

Task MON-1659: set relatedObjects in ClusterOperator manifests

View the Description View the linked PRs

As mentioned in [1], the cluster monitoring operator doesn't define the relatedObjects field in the ClusterOperator manifest which is initially deployed by CVO [2].
If the CMO pod fails to start, the must-gather might miss information from the monitoring namespace. Note that once CMO runs, it will update the initial ClusterOperator object with the proper information [3].

[1] http://mailman-int.corp.redhat.com/archives/aos-devel/2021-May/msg00139.html
[2] https://github.com/openshift/cluster-monitoring-operator/blob/master/manifests/0000_50_cluster-monitoring-operator_06-clusteroperator.yaml
[3] https://github.com/openshift/cluster-monitoring-operator/blob/a6bc9824035ceb8dbfe7c53cf0c138bfb2ec5643/pkg/client/status_reporter.go#L49-L63

https://github.com/openshift/cluster-monitoring-operator/pull/1483

Story MON-1679: use static authorizer feature of kube-rbac-proxy

View the Description View the linked PRs

The static authorizer feature has landed in upstream kube-rbac-proxy. Let's use it by configuring a static authorizer for all requests that hit a /metrics endpoint.

DoD:

Downstream kube-rbac-proxy is synced.
All CMO operands are configured with static authorization.
Bugzillas created for all non-monitoring components using kube-rbac-proxy for metrics authn/authz.
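
Conceptually, a static authorizer answers known requests locally instead of asking the API server each time. The sketch below models that idea in Python. It is not kube-rbac-proxy's implementation or configuration format: the class and function names are hypothetical, and the service account in the example is an assumed scraper identity.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Request:
    user: str
    verb: str
    path: str


@dataclass(frozen=True)
class StaticRule:
    user: str
    verb: str
    path: str

    def matches(self, req: Request) -> bool:
        return (self.user, self.verb, self.path) == (req.user, req.verb, req.path)


def authorize(req: Request, rules: Sequence[StaticRule], fallback: Callable[[Request], bool]) -> bool:
    """Allow locally on a static match; otherwise defer to `fallback`,
    which stands in for a round trip to the API server."""
    if any(rule.matches(req) for rule in rules):
        return True
    return fallback(req)


SCRAPER = "system:serviceaccount:openshift-monitoring:prometheus-k8s"  # assumed example user
rules = [StaticRule(user=SCRAPER, verb="get", path="/metrics")]
remote_calls = []


def api_server_check(req: Request) -> bool:
    remote_calls.append(req)  # record that the slow path was taken
    return False


print(authorize(Request(SCRAPER, "get", "/metrics"), rules, api_server_check), len(remote_calls))
```

In this model a scrape of /metrics by the known user never reaches the fallback, which is the benefit the story is after.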

https://github.com/openshift/cluster-monitoring-operator/pull/1318

Task MON-1218: The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

View the Description View the linked PRs

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

https://github.com/openshift/cluster-monitoring-operator/pull/1379

Story MON-1949: Improve prometheus-adapter consistency

View the Description View the linked PRs

The current integration of prometheus-adapter in OpenShift uses the platform Prometheus as a backend to get metrics. The problem with this design is that we are getting metrics from 2 different Prometheus instances which don't have replicated data, so two queries sent at the same time to prometheus-adapter might yield different results since the underlying promQL queries executed by prometheus-adapter might be on different Prometheus servers. The consequence is that we end up having inconsistent data across multiple autoscaling requests.

This can be easily tested by running:

$ while true ; do date; oc adm top pod -n openshift-monitoring  prometheus-k8s-0 ; echo; sleep 1 ;done 

Mon Jul 26 03:55:07 EDT 2021
NAME               CPU(cores)   MEMORY(bytes)   
prometheus-k8s-0   208m         4879Mi          

Mon Jul 26 03:55:08 EDT 2021                               
NAME               CPU(cores)   MEMORY(bytes)   
prometheus-k8s-0   246m         4877Mi          

Mon Jul 26 03:55:09 EDT 2021                               
NAME               CPU(cores)   MEMORY(bytes)   
prometheus-k8s-0   208m         4879Mi          

Mon Jul 26 03:55:10 EDT 2021
NAME               CPU(cores)   MEMORY(bytes)   
prometheus-k8s-0   246m         4877Mi

This isn't a bug in itself since it was designed that way, but we could do better by using thanos-querier as a backend instead of the platform Prometheus, because it will deduplicate the metrics from both instances and serve one consistent result based on the data it gets from both Prometheus servers.

DoD:

Use thanos-querier as a backend for prometheus-adapter
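
The difference between the two backends can be shown with a toy model. This Python sketch is a deliberate simplification and not how thanos-querier deduplicates internally. The replica names and timestamps are invented, and the two CPU values are taken from the alternating output shown above.

```python
from itertools import cycle
from typing import Dict, Iterator

# Latest CPU sample (millicores) for the same pod as seen by each replica.
REPLICAS: Dict[str, Dict[str, int]] = {
    "replica-a": {"timestamp": 100, "value": 208},
    "replica-b": {"timestamp": 101, "value": 246},
}


def query_one_replica(order: Iterator[str]) -> int:
    """Load-balanced backend: each query lands on whichever replica is next."""
    return REPLICAS[next(order)]["value"]


def query_deduplicated() -> int:
    """Querier-style backend: read every replica and keep the freshest sample."""
    return max(REPLICAS.values(), key=lambda sample: sample["timestamp"])["value"]


order = cycle(REPLICAS)
print([query_one_replica(order) for _ in range(4)])  # flips between replicas
print([query_deduplicated() for _ in range(4)])      # same answer every time
```

The first list alternates because consecutive queries hit different replicas, while the second stays constant because every query sees both replicas before answering.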

https://github.com/openshift/cluster-monitoring-operator/pull/1417

Story CONSOLE-2975: Migrate from Node Sass to Dart Sass

View the Description View the linked PRs

Node Sass is deprecated. See https://github.com/sass/node-sass

https://github.com/openshift/console/pull/10149

Task MON-1656: Add a Makefile rule in CMO for verifications and checks

View the Description View the linked PRs

Add a Makefile rule in CMO to execute all the different rules that are used for verification and validation. Currently, some of them might not be in the right place, for example `check-assets`, which is part of `generate` despite not being responsible for any generation. https://github.com/openshift/cluster-monitoring-operator/pull/1151/files#r629371735

DoD:

Add a new rule in CMO to handle verification
Add a CI job for this rule

Task MON-1890: update openshift/kube-state-metrics to 2.2.0

View the Description View the linked PRs

New release https://github.com/kubernetes/kube-state-metrics/releases

https://github.com/openshift/kube-state-metrics/pull/61

Story SPLAT-246: [vsphere] Ensure clear user agent strings set for components calling to vSphere API

View the Description View the linked PRs

*USER STORY:*

As a customer or OpenShift engineer, I want to see the user agent for anything calling from OpenShift -> vSphere to eliminate troubleshooting guesswork.

*DESCRIPTION:*

A question in #forum-vmware was raised where we identified that the user-agent may not be configured for all OpenShift components calling to vSphere API.

https://coreos.slack.com/archives/CH06KMDRV/p1627368902058800

*Required:*

Audit of OpenShift components calling to vSphere API to make sure user agent strings are set appropriately.

*Nice to have:*

How can this be prevented in the future? How can we minimize maintenance costs added by new PRs/bugs reported from this spike?

*ACCEPTANCE CRITERIA:*

New PRs or bug reports for each affected component.
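
The audit itself amounts to listing which callers send no identifiable user agent. A minimal sketch in Python follows. The `<component>/<version>` format, the helper names and the component names are all hypothetical.

```python
from typing import Dict, List, Optional


def user_agent(component: str, version: str) -> str:
    """Build a user agent in a hypothetical '<component>/<version>' format."""
    return f"{component}/{version}"


def missing_user_agent(configured: Dict[str, Optional[str]]) -> List[str]:
    """Components whose vSphere client has no user agent, or only a blank one."""
    return sorted(name for name, ua in configured.items() if not (ua or "").strip())


audit = {
    "component-a": user_agent("component-a", "4.10"),
    "component-b": None,
    "component-c": "   ",
}
print(missing_user_agent(audit))
```

Each name returned by `missing_user_agent` would correspond to one PR or bug report under the acceptance criteria.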

4.10.27

Changes from 4.9.59

Complete Features

Incomplete Features
