Openshift
=What does it mean?=
  annotation            comment, definition
  acme                  Automated Certificate Management Environment
  annotations           Key=value pairs that provide metadata for an object.
  ceph                  Delivers object, block, and file storage in one unified system.
  ceph-osd              Object storage daemon for the Ceph distributed file system. It is responsible for storing objects on a local file system and providing access to them over the network.
  clbo                  CrashLoopBackOff
  clo                   Cluster Logging Operator
  cmo                   Cluster Monitoring Operator
  cncf                  Cloud Native Computing Foundation
  cni                   Container Network Interface (OVNKubernetes, OpenShiftSDN)
  cns                   Cloud Native Storage
  cnv                   Container-native Virtualization, add-on to OpenShift Container Platform that allows virtual machine workloads to run and be managed alongside container workloads.
  co                    Cluster Operator
  cpi                   Cloud Provider Interface
  cr                    Custom Resource. (Typically something added by enabling a feature or operator. You get the list from "oc api-resources".)
  crd                   Custom Resource Definition. The name of a CRD object must be a valid DNS subdomain name.
  cri                   Container Runtime Interface
  cri-o                 Lightweight container runtime for Kubernetes.
  csi                   Container Storage Interface
  csm                   Container Storage Modules
  csv                   Cluster Service Version
  cvo                   Cluster Version Operator
  cvss                  Common Vulnerability Scoring System
  daemonset             Ensures that all (or some) nodes run a copy of a pod.
  deployment            You describe a desired state in a Deployment. A Deployment object describes how to create or modify pods that hold a containerized application by defining the desired state of a particular component. Deployments create and manage how ReplicaSets are deployed.
  eo                    ElasticSearch Operator
  ephemeral             Short lived, temporary
  eus                   Extended Update Support
  fluentd               Data collector designed to handle logging by unifying and processing data from various sources.
  fluent bit            Lightweight and high-performance data collector. Mainly logs, but it can handle metrics too.
  fsgroup               Group which Kubernetes will change the permissions of all files in volumes to when volumes are mounted by a pod.
  geneve                Generic Network Virtualization Encapsulation. OVN-Kubernetes uses Geneve.
  grpc                  gRPC Remote Procedure Call, framework that brings performance benefits and modern features to client-server applications. Like RPC.
  icsp                  ImageContentSourcePolicy. Blocking a payload registry.
  idp                   identity provider
  idps                  identity providers
  implicit              indirect, hinted
  ingressclass          Used when multiple ingress controllers manage network traffic routing within a cluster.
  ipc namespace         Each IPC namespace has its own set of System V IPC identifiers and its own POSIX message queue filesystem.
  ipi                   Installer-Provisioned Infrastructure
  kcs                   Knowledge Centered Support, Red Hat's way of offering solutions and articles for known questions or problems.
  kubelet               Kubelet is the primary "node agent" that runs on each node. Takes a set of PodSpecs (primarily through the apiserver) and ensures the containers described are running and healthy.
  kvdb                  key-value store (portworx)
  machineset            Manages a group of machines with similar characteristics and keeps the desired number of machines running.
  manifest              A manifest is a YAML or JSON file that describes the desired state of a Kubernetes object.
  mco                   machine-config-operator
  mcp                   machine config pools
  metricbeat            Lightweight shipper for metrics.
  noobaa                Data service for cloud environments, providing an S3 object-store interface with flexible tiering, mirroring, and spread placement policies, over any storage resource that allows GET/PUT including S3, GCS...
  nsfs                  Virtual filesystem making Linux-kernel namespaces available.
  oadp                  OpenShift API for Data Protection
  oci                   Open Container Initiative
  ocm                   OpenShift Cluster Manager
  ocp                   OpenShift Container Platform
  ocs                   OpenShift Container Storage
  odf                   OpenShift Data Foundation
  oidc                  OpenID Connect, an identity layer on top of the OAuth 2.0 protocol.
  olm                   Operator Lifecycle Manager
  osm                   Open Service Mesh. Lightweight, extensible, cloud native service mesh.
  ovnk                  Open Virtual Network Kubernetes
  pdb                   Pod Disruption Budget
  pv                    Persistent Volume. Persistent storage; the low level representation of a storage volume.
  pvc                   Persistent Volume Claim; the binding between a Pod and a Persistent Volume.
  prometheus            Prometheus is a time-series database (TSDB). Handles the collection, storage, and querying of time-series data. Alerting.
  provisioner           A StorageClass object contains a provisioner that decides which volume plugin is used to provision PersistentVolumes.
  quay.io               Builds, analyzes, and distributes your container images. Owned by IBM.
  ReadWriteMany         Storage read/write for many.
  reconciling           Restore friendly relations between.
  registrar             The node-driver-registrar is a sidecar container that registers the CSI driver with kubelet using the kubelet plugin registration mechanism.
  replicaset            Maintains a stable set of replica Pods running at any given time.
  rhacm                 Red Hat Advanced Cluster Management for Kubernetes
  rhcos                 Red Hat Enterprise Linux CoreOS
  rhcp                  Red Hat Ceph Storage
  rhcs                  Red Hat Cluster Suite
  rhocp                 Red Hat OpenShift Container Platform
  rhol                  Red Hat OpenShift Logging
  rook                  Operator. File, block, and object storage for your cloud native environment, based on battle tested Ceph storage.
  rosa                  Red Hat OpenShift Service on AWS
  s2i                   source-to-image
  sa                    Service Account
  sc                    security context
  scc                   security context constraints
  seccomp               Secure computing mode; profiles can be associated with a container to restrict available system calls.
  SelfLink              URL representing the given object.
  service               Logical abstraction for a deployed group of pods in a cluster (which all perform the same function).
  skopeo                Command line utility used to interact with local and remote container images and container image registries.
  StatefulSet           Workload object to manage stateful applications. Deployment and scaling of Pods, with ordering and uniqueness of Pods.
  Storage Class         Allows for dynamic provisioning of Persistent Volumes.
  svc                   service
  taint                 Taints ensure that pods are not scheduled onto unsuitable nodes. You can apply one or more taints to a node.
  tekton                Container-native way to manage CI/CD. It's also the basis for OpenShift Pipelines.
  thanos                Long-term storage for your Prometheus metrics on OpenShift.
  toleration            You can apply tolerations to pods. Tolerations allow the scheduler to schedule pods with matching taints. (See the sketch after this list.)
  ubi                   Universal Base Images. OCI-compliant container base operating system images with complementary runtime languages and packages that are freely redistributable.
  upi                   User-Provisioned Infrastructure
  uts                   Unix Timesharing System namespace. Controls the hostname and the NIS domain.
  uWSGI                 Project aiming to develop a full stack for building hosting services.
  vxlan                 Virtual extensible LAN. The OpenShift SDN uses Open vSwitch tunnels, OpenFlow rules, and iptables.
  wwn                   World Wide Names. Fibre Channel.
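A minimal sketch of how taint and toleration work together (node name, key and value are made up, the image is just an example):
  oc adm taint nodes worker-1 dedicated=infra:NoSchedule
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: toleration-example
spec:
  containers:
    - name: sleep
      image: registry.access.redhat.com/ubi8/ubi
      command: ["sleep", "infinity"]
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "infra"
      effect: "NoSchedule"
EOF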


=read=
https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/
=Projects that I have read about but forgotten=
  OpenEBS              Storage solution. Possible backends: local, NFS, ZFS, NVMe. cStor serves iSCSI block storage using the underlying disks or cloud volumes in a cloud-native way.
=files of value=
  metadata.json        File created during install. Used by openshift-install destroy cluster.
=oc api-resources=
Available resources to ask about.
  oc api-resources
Get everything
oc api-resources -o name --no-headers | while read i ; do echo '***' $i ; oc get $i -A -o yaml 2>&1 ; done > /tmp/oc_api-resources.$(oc whoami --show-server | awk -F ':|/' '{print $4}').$(date +%F_%H-%M-%S)


=login=
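A minimal sketch of the usual login flow (API URL and token are placeholders):
  oc login --token=sha256~XXXX --server=https://api.cluster.example.com:6443
  oc login -u kubeadmin https://api.cluster.example.com:6443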
=select project=
  oc project $project
kubectl config set-context --current --namespace=kube-public
=create project/namespace=
oc create namespace redis
=list pods=
  oc get pods
Get pods that are not running.
  oc get pods --field-selector status.phase!=Running --all-namespaces
oc get pods -A --no-headers | grep -v Completed | while read LINE ; do PODS=$(awk '{print $3}' <<< "${LINE}") ; if [ "${PODS%%/*}" != "${PODS##*/}" ] ; then echo "${LINE}" ; fi ; done
Get pods matching two states
  oc get pods --field-selector=status.phase!=Running,spec.restartPolicy=Always
Get worker nodes that are not infra nodes (label selector example)
  oc get nodes --no-headers --selector='node-role.kubernetes.io/worker,!node-role.kubernetes.io/infra'
Get pods running on specific node
oc get pods -A -o wide --field-selector spec.nodeName=<node>
Get pods with label name=portworx-proxy
oc get pods -A -l name=portworx-proxy
Get pods with several labels
oc get pod -l 'app in (rook-ceph-mon,rook-ceph-operator,rook-ceph-osd,rook-ceph-rgw,rook-ceph-mgr,rook-ceph-mds,rook-ceph-crashcollector)'
Get pods with extra column port.
kubectl get pods --output=custom-columns=NAME:.metadata.name,NAMESPACE:.metadata.namespace,IP:.status.podIPs[*].ip,POD_PORT:.spec.containers[*].ports[*].containerPort
Get pods with column restarts
oc get pods -o custom-columns='NAMESPACE:.metadata.namespace,POD:.metadata.name,RESTART:.status.containerStatuses[*].restartCount' -A | sort -k3 -n | tail -10


=Service=
oc get svc
A Kubernetes Service is an abstraction that defines a logical set of Pods and a policy by which to access them. Services enable loose coupling between dependent Pods.
=Endpoint=
An Endpoint is an object that represents the IP addresses and ports of the Pods that back a Service. When a Service is created, Kubernetes automatically creates an associated Endpoints object.
=EndpointSlices=
EndpointSlices offer a scalable, efficient, and feature-rich alternative to traditional Endpoints, with support for topology information.
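A minimal sketch of a Service selecting pods by label (name, label and ports are made up); the matching Endpoints and EndpointSlice objects are created automatically:
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
EOF
  oc get endpoints my-app
  oc get endpointslices -l kubernetes.io/service-name=my-app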


=get shell on node=
It is possible to debug more than nodes. (deployment, build, or job)
  oc debug node/infra-2.ocpdev.lkl.ltkalmar.se
Get working env
  chroot /host
Connect to node in eks.
kubectl debug node/<node> -it --image=halfface/rockylinux-toolbox:v3


=get debug information from oc=
  oc debug --loglevel=10 node/$node
=debug pod run as root disable health checks=
oc debug deployment/my-deployment-name --as-root


=get nodes=
# Get nodes without headers. name, cpu:s, disk size, mem, ip address.
  oc get nodes --no-headers --selector="node-role.kubernetes.io/worker" -o=custom-columns='NAME:.metadata.name,CPU:.status.capacity.cpu,DISK:.status.capacity.ephemeral-storage,MEM:.status.capacity.memory,IP:.status.addresses[?(@.type=="InternalIP")].address'
# Get node name and ip address.
oc get nodes --no-headers --selector="node-role.kubernetes.io/worker" -o=custom-columns='NAME:.metadata.name,IP:.status.addresses[?(@.type=="InternalIP")].address'
=ip address of pod=
Outside pod.
  oc get pod --template '{{.status.podIP}}' openshift-gitops-application-controller-0
Inside pod.
echo $POD_IP
=get nodes that are overcommitted=
oc get nodes -o jsonpath='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{end}' | sed 's/:/=node;/g' | sed 's/;/\n/g' | grep -vE 'MemoryPressure=False|DiskPressure=False|PIDPressure=False|Ready=True'
Does any node stick out.
oc get nodes --no-headers -o=custom-columns=NAME:.metadata.name,CONDITIONS:.status.conditions


=connect to pod=
  router
  logs
=list all containers running in a cluster=
<pre>
kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" | tr -s '[[:space:]]' '\n' | sort | uniq -c
</pre>
=connect to container in pod=
  oc rsh -c router pod/router-default-6b76b87c6-5m7h6


=get logs from all containers excluding namespace ^openshift from last 24 hours with timestamp=
  oc get pods --no-headers --field-selector status.phase=Running -A -o custom-columns=NAMESPACE:.metadata.namespace,POD:.metadata.name | grep -v ^openshift | while read NAMESPACE POD ; do for CONTAINER in $(oc get pod $POD -n $NAMESPACE -o json | jq -r '.spec.containers[].name') ; do echo oc logs -n ${NAMESPACE} ${POD} -c ${CONTAINER} ; oc logs -n ${NAMESPACE} $POD -c $CONTAINER --since=24h --timestamps=true 2>&1 | grep "Error: getaddrinfo EAI_AGAIN " ; done ; done
=tail logs for pods matching label=
  oc logs -n openshift-storage -l app=csi-cephfsplugin -c driver-registrar -f --max-log-requests 8 --tail=1
  oc logs -n openshift-cluster-storage-operator -l name=vsphere-problem-detector-operator --tail=-1
  oc logs -f --tail=0 router-default-6c666984fd-ct8zf logs
  oc logs -f --namespace openshift-gitops deployment/openshift-gitops-server
=search logs for all pods for string save to file=
  SEARCH="cosprod-m22s6-worker-m52c8" ; oc get namespaces --no-headers | awk '{print $1}' | while read NAMESPACE ; do oc project $NAMESPACE >/dev/null ; for POD in $(oc get pods -o jsonpath='{.items[*].metadata.name}') ; do for CONTAINER in $(oc get pod/$POD -o json | jq -r '.spec.containers[].name') ; do echo '***' namespace $NAMESPACE pod $POD, container $CONTAINER ; oc logs $POD $CONTAINER | grep "${SEARCH}" | tail -10 ; done; done ; done | tee /tmp/search_all_containers_"${SEARCH}".$(date '+%Y-%m-%d_%H-%M-%S').log
=Search for log entries locally on node=
  ls -la $(ls -la $(grep -l EAI_AGAIN /var/log/containers/*) | awk '{print $NF}')
  grep -rl EAI_AGAIN /var/log/pods/


=execute command in pod=
  oc exec pod/router-default-545ffb97db-4h9rx -- $command
kubectl exec --stdin --tty shell-demo -- /bin/bash
=execute command on all nodes=
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'echo $HOSTNAME && chronyc sources'
=execute command in all containers=
oc get pods --no-headers -o 'custom-columns=:.metadata.namespace,:.metadata.name' -A | while read NAMESPACE POD ; do
  for CONTAINER in $(oc get -n $NAMESPACE pod/$POD -o json | jq -r '.spec.containers[].name') ; do
    echo '***' $NAMESPACE $POD $CONTAINER
    echo $(oc exec -c $CONTAINER -n $NAMESPACE $POD -- curl -m1 -skv https://inter.net 2>&1 | tr -d '\n')
  done
done | tee /tmp/$(oc whoami --show-server | awk -F ':|/' '{print $4}').$(date +%F_%H-%M-%S)
=where am i=
POD_NAME=rook-ceph-operator-6c86f788d5-f8mqf
POD_NAMESPACE=openshift-storage
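These variables are not set in every pod by default; one common way to populate them is the Downward API. A minimal sketch of the env part of a container spec (names are placeholders):
    env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      - name: POD_NAMESPACE
        valueFrom:
          fieldRef:
            fieldPath: metadata.namespace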


=describe pods=
  oc get all -l '<label_name>=<label_value>'
  oc get pods -n openshift-storage -o name -l app=rook-ceph-operator


=get config from pod in yaml format=
=grant permission to project=
  oc adm policy add-role-to-user view developer -n mysecrets
=grant permission to group=
oc adm policy add-cluster-role-to-group cluster-admin admin
==grant a user cluster-admin permissions through group==
# create a new group.
oc adm groups new cluster-admin
# Bind cluster-admin Role to the Group
oc adm policy add-cluster-role-to-group cluster-admin cluster-admin
# Add user to group
oc adm groups add-users cluster-admin T1.anbj15
=grant unrestricted access to service account=
oc adm policy add-scc-to-user privileged system:serviceaccount:isilon:isilon-node
=which pods use scc?=
oc get project -o=custom-columns='NAME:.metadata.name' --no-headers | grep -v openshift | while read NAMESPACE ; do echo '*' $NAMESPACE ; oc get pods -o=custom-columns='NAME:.metadata.name,SCC:.metadata.annotations.openshift\.io\/scc' --no-headers -n $NAMESPACE | grep restricted-v2 ; done
oc get pods --all-namespaces -o=jsonpath='{range .items[*]}{@.metadata.name}{"\t"}{@.metadata.namespace}{"\t"}{@.metadata.annotations.openshift\.io/scc}{"\n"}' | column_tab | less


=crictl=
==List running containers==
  crictl ps
  crictl ps --all | grep -i coredns
==List all pods==
  crictl pods
==Execute a command in a running container==
  crictl exec -it 1f73f2d81bf98 /bin/sh
==crictl logs==
  crictl logs <container_id>
=nsenter=
  Run a program in the namespaces of another process.
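A sketch of entering a container's namespaces from the node (run as root on the node; the container ID comes from crictl ps):
  PID=$(crictl inspect <container_id> | jq .info.pid)
  nsenter -t $PID -n ip -br addr
  nsenter -t $PID -m -u -i -n -p -- sh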
=which version=
Get version of various objects
  oc version
Only get cluster version
  oc get clusterversion
oc get clusterversion -o json|jq -r '.items[0].spec| .channel, .desiredUpdate.version'


=copy files from pod=
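The usual form of oc cp (namespace, pod, container and paths are placeholders):
  oc cp <namespace>/<pod>:/path/in/container /tmp/local-copy -c <container>
  oc cp /tmp/local-file <namespace>/<pod>:/path/in/container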
=toolbox=
  ssh $node
  toolbox
=rm toolbox=
toolbox rm --force <container>
=oc get route -A=
get routing.
  Weight:        100 (100%)
  Endpoints:      10.160.7.166:8000, 10.160.7.167:8000, 10.160.7.168:8000 + 35 more...
=oc get pods (selecting specific pods)=
Only name without headers
  oc get pods -o custom-columns=POD:.metadata.name --no-headers -A
Describe Failing pods.
  oc get pods -A --field-selector=status.phase=Failed --no-headers | while read NAME_SPACE POD REST_OF_LINE ; do echo '*' $POD ${NAME_SPACE} ; oc describe pod $POD -n "${NAME_SPACE}" ; done | less -ISRM

=get pod label:s=
  oc get pods --show-labels
  oc get pods --no-headers --all-namespaces | grep -i running | head -2 | while read namespace pod blabla ; do echo '***' oc label pod/$pod --list=true -n $namespace ; oc label pod/$pod --list=true -n $namespace ; done

=get subscriptions=
  oc get subscriptions -A
=delete subscription=
oc delete subscription openshift-gitops-operator -n openshift-operators
=get available channels for subscription=
oc get PackageManifest $OPERATOR -o json | jq -r '.status.channels[] | .name,.currentCSV'
=update channel=
oc patch subscriptions -n $NAMESPACE $OPERATOR --type merge -p '{"spec": {"channel": "stable-4.12"}}'
=delete clusterserviceversion=
oc delete clusterserviceversion openshift-gitops-operator.v1.7.4
=whoami=
  oc whoami
  oc config current-context
  oc whoami --show-console=true --show-context=true
Which is the console url?
oc whoami --show-console
Which is the api url?
oc whoami --show-server
=get instance url=
  oc get routes -n openshift-console console


=create an htpasswd user=
kubernetes create htpasswd user
oc create user imageregistry
oc create identity htpasswd:imageregistry
oc create useridentitymapping htpasswd:imageregistry imageregistry
Create user/password to feed kubernetes with.
htpasswd -c -B -b htpasswd imageregistry P@ssW0rd
oc create secret generic htpass-secret --from-file=htpasswd=htpasswd -n openshift-config
Get htpasswd users.
oc get secret htpass-secret -ojsonpath={.data.htpasswd} -n openshift-config | base64 --decode
Enable htpasswd login.
oc edit oauth cluster
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret
look at oauth config.
oc get oauth cluster -o yaml
Create service account.
https://docs.openshift.com/container-platform/4.13/authentication/understanding-and-creating-service-accounts.html
=get list of users=
  oc config view -o jsonpath='{.users[*].name}'
 
=list contexts=
  oc config get-contexts


=use-context=
  oc config use-context openshift-marketplace/api-abjorklund-01-rbcloud-net:6443/kube:admin


=oc explain pv=
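For example, to show the documentation for a specific field (any resource path works):
  oc explain pv.spec
  oc explain pv.spec.accessModes
  oc explain pods.spec.containers.resources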
=drain node=
  oc adm drain <node> --force --delete-emptydir-data --ignore-daemonsets
  oc adm drain <node> --force --delete-emptydir-data --ignore-daemonsets
  oc adm drain <node> --ignore-daemonsets --force --grace-period=30 --delete-local-data
  oc adm drain <node> --ignore-daemonsets --force --grace-period=30 --delete-local-data
oc adm drain <node> --force --delete-emptydir-data --grace-period=1 --ignore-daemonsets
Mark node as online.
  oc adm uncordon node1
Extend memory on node.
# Add memory to master nodes.
NODE=costest-ph9l4-master-1
oc adm cordon $NODE
oc adm drain $NODE --force --delete-emptydir-data --grace-period=1 --ignore-daemonsets
timeout 10 oc debug node/$NODE -- chroot /host sh -c 'echo $HOSTNAME && sudo shutdown -P now'
govc vm.power -off /RGK/vm/costest-ph9l4/$NODE
govc vm.info /RGK/vm/costest-ph9l4/$NODE
govc vm.change -vm /RGK/vm/costest-ph9l4/$NODE -m 20480
govc vm.power -on /RGK/vm/costest-ph9l4/$NODE
oc adm uncordon $NODE
oc adm top nodes -l node-role.kubernetes.io/master


=Get pv:s=
  oc get pv
Sorted by size.
  oc get pv --sort-by=.spec.capacity.storage -A
Get more info about a pv.
  oc describe pv $PV
=Access modes for pv:s. AccessMode=
RWO  - ReadWriteOnce    the volume can be mounted as read-write by a single node
ROX  - ReadOnlyMany      the volume can be mounted read-only by many nodes
RWX  - ReadWriteMany    the volume can be mounted as read-write by many nodes
RWOP - ReadWriteOncePod  the volume can be mounted as read-write by a single Pod.
=get pvc:s=
  oc get pvc --all-namespaces | less
Sort by requested size.
oc get pvc --sort-by=.spec.resources.requests.storage -A
=create pvc=
# oc create pvc
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: abjorklund-pvc1
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
EOF
=use pvc. Create pod using pvc=
# Create test pod.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: abjorklund-test-pvc-claim1-pod
spec:
  volumes:
    - name: abjorklund-test-pvc
      persistentVolumeClaim:
        claimName: abjorklund-test-pvc
  containers:
    - name: abjorklund-test-pvc
      image: halfface/rockylinux-toolbox:v3
      volumeMounts:
        - mountPath: "/mnt/abjorklund-test-pvc"
          name: abjorklund-test-pvc
      command: ["sleep"]
      args: ["infinity"]
EOF
=extend/increase pvc=
PVC=postgres-instance1-x5b8-pgdata ;NAMESPACE=rk-cos-prod ; oc patch pvc ${PVC} --type=merge -p '{"spec":{"resources":{"requests":{"storage": "2Gi"}}}}' -n ${NAMESPACE}
=which pods are using pvc=
  oc get pods --all-namespaces -o=json | jq -c '.items[] | {name: .metadata.name, namespace: .metadata.namespace, claimName:.spec.volumes[]? | select( has ("persistentVolumeClaim") ).persistentVolumeClaim.claimName }'


=kubectl=
Select context
  kubectl config use-context default/api-blabla-halfface-se:6443/kube:admin
=permissions=
==list groups==
  oc get groups -o wide
==list clusterroles==
  oc get clusterrole --all-namespaces
==list clusterrolebindings==
  oc get crb
  oc get clusterrolebindings


=scale=
  oc scale --replicas=2 rc/postgresql-1
  oc scale -n abjorklund deployment stress-hm-6x32 --replicas=0
  oc scale --replicas=3 machineset <machineset> -n openshift-machine-api

=top(disable wikimedia top)=
  oc adm top pods --use-protocol-buffers --all-namespaces
  oc adm top pods --use-protocol-buffers --all-namespaces --sort-by=cpu | head -20 | cut -c -200
  oc adm top nodes --sort-by=cpu
  oc adm top nodes --sort-by=memory
 
=get memory usage of all running pods in MB=
  oc get pods -o custom-columns=POD:.metadata.name --no-headers --field-selector status.phase=Running| while read POD ; do echo $POD $(( $(oc exec -it $POD -- cat /sys/fs/cgroup/memory/memory.usage_in_bytes </dev/null 2>/dev/null) / 1024 / 1024 )) MB ; done
oc get pods -A -o wide --no-headers --field-selector spec.nodeName=ocp-04-9lxgz-worker-wlw9p  --field-selector status.phase=Running | while read NAMESPACE POD NULL ; do oc project $NAMESPACE >/dev/null 2>&1 ; oc adm top pod $POD --containers --no-headers ; done | sort -k 4 -n| less
Get memory usage per pod on specific node.
NODE=ocp-01-4dfqx-worker-4n6mk ; oc get pods -A -o wide --no-headers --field-selector "spec.nodeName=${NODE},status.phase=Running" | while read NAMESPACE POD NULL ; do oc project $NAMESPACE >/dev/null 2>&1 ; oc adm top pod $POD --containers --no-headers ; done | sed 's/  */\t/g' | sort -k 4 -n | column -t -s $'\t'
=get memory usage of all nodes in % of total available ram=
oc get nodes -o name | xargs -I % oc debug % -- chroot /host sh -c 'BUFFER=($(free | grep Mem:)) ; echo $HOSTNAME $(( $(( ${BUFFER[1]} - ${BUFFER[6]} )) / $(( ${BUFFER[1]} / 100 )) ))' 2>/dev/null


=oc get crd=
=operators=
An operator automates the setup and management of instances of an application or service.
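Operators are typically installed through OLM by creating a Subscription; a minimal sketch (operator name and channel are only examples, check oc get packagemanifests for real values):
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-gitops-operator
  namespace: openshift-operators
spec:
  channel: latest
  name: openshift-gitops-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF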
=list installed operators=
  oc get ClusterServiceVersions -A
  oc get csv -A
  oc get operators -o json | jq -r '.items[].status.components.refs[]?|select(.kind=="ClusterServiceVersion")|.name'
Search all namespaces; the namespace column is left out so duplicates collapse.
  oc get csv -A -o=custom-columns='NAME:.metadata.name,VERSION:.spec.version,DISPLAY:.spec.displayName' --no-headers | sort | uniq
 
=list available operators=
oc get packagemanifests
=delete operator=
Delete via the GUI. Use the steps below if traces are left or you are unable to install again.
https://access.redhat.com/solutions/6762071 Remove potentially blocking references.
https://access.redhat.com/solutions/7026146 Remove label so operator is not recreated.
oc get operator prometheus.prometheus -o yaml -n openshift-operators | grep -i CustomResourceDefinition -A1    //It will list the CRDs currently being referenced by the operator
oc edit crd thanosrulers.monitoring.coreos.com
-----------output truncated------------
  labels:
    operators.coreos.com/prometheus.prometheus: ""                            //Remove this line and then save and exit
# Remove possibly broken jobs.
oc get jobs.batch -n openshift-marketplace | grep -i 0/1
# If job was not broken then remove all references to that operator. Remove jobs and configmaps.
oc get job -n openshift-marketplace -o json | jq -r '.items[] | select(.spec.template.spec.containers[].env[].value|contains ("elasticsearch-operator")) | .metadata.name' | while read i ; do echo oc delete job $i -n openshift-marketplace ; echo oc delete configmap $i -n openshift-marketplace ; done
 
=Select channel=
oc patch clusterversion version --type merge -p '{"spec": {"channel": "candidate-4.12"}}' # candidate-... channel offers unsupported early access to releases as soon as they are built.
oc patch clusterversion version --type merge -p '{"spec": {"channel": "fast-4.12"}}'      # As soon as version as a general availability (GA) release. Fully supported. Used in production environments.
oc patch clusterversion version --type merge -p '{"spec": {"channel": "stable-4.12"}}'    # Delay from fast. Looking at quality from fast. If found good then moved to stable
oc patch clusterversion version --type merge -p '{"spec": {"channel": "eus-4.12"}}'      # Extended Update Support
 
=find if image exists=
oc adm release info quay.io/openshift-release-dev/ocp-release:4.15.38-x86_64
=Upgrade to version that you found on github okd=
oc adm upgrade --to-image=


=oc adm upgrade=
  Upgrade okd images.
=Launch a new instance of a pod for gathering debug information. Compress and deliver in support case=
  cd /tmp && oc adm must-gather && tar czf /tmp/must-gather.$(oc whoami --show-server | awk -F ':|/' '{print $4}').$(date +%F_%H-%M-%S).tar.gz must-gather.local.*
  tar cvaf /tmp/must-gather.tar.gz must-gather.local.*
=Must gather for odf (oc get csv -n openshift-storage gives you the version to use)=
  cd /tmp && oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:4.10
  tar czf /tmp/must-gather.$(oc whoami --show-server | awk -F ':|/' '{print $4}').$(date +%F_%H-%M-%S).tar.gz must-gather.local.*


=oc adm certificate approve <csr_name>=
Approve csr certificate
==Approve all csr==
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
oc get csr -o name | xargs oc adm certificate approve
=certmanager=
==cert-manager design==
<pre>
( Ingress )  optional
     |
     v
 Certificate --> CertificateRequest --> Order --> Challenge
                                        (ACME only)
</pre>
==look at cert-manager cr==
oc api-resources | grep cert | awk '{print $1}' | while read i ; do echo '*' $i ; oc get $i -A ; done
==list certificates==
oc get certificate -A
==list ClusterIssuer==
oc get ClusterIssuer -A
==list orders by date==
oc get orders -n openshift-config --sort-by=.metadata.creationTimestamp
==install cmctl==
<pre>
curl -fsSL https://github.com/cert-manager/cert-manager/releases/latest/download/cmctl-linux-amd64.tar.gz | (cd /usr/local/bin/ ; sudo tar zxf - cmctl)
</pre>
==completion==
. <(cmctl completion bash)
==renew cert==
cmctl renew -n openshift-config cert-api
==status of cert==
cmctl status certificate -n openshift-ingress le-wildcard-apps-certificate
=oc adm release info=
  # Show information about the cluster's current release
  oc adm release info
  # Show release info about a release
  oc adm release info 4.10.47 --pullspecs
=release notes=
Find changes between OCP versions / release notes.
https://access.redhat.com/labs/ocpupgradegraph/update_path
Select source and destination version.
At the bottom there is a graphical display.
Click each bubble and read the RHBA; point releases are at the end.
https://docs.openshift.com/container-platform/4.12/release_notes/ocp-4-12-release-notes.html
=oc adm node-logs=
Look at logs from crio from master nodes.
oc adm node-logs --role master -u crio
Get logs from one node from unit crio
oc adm node-logs abjorklund-01-5tsbc-worker-0-kcr54 -u crio
Look at specific log
oc adm node-logs --role master --path=openshift-apiserver/audit.log
List logs
oc adm node-logs --role=master --path=/
List logs from specific node.
oc adm node-logs nord-ic-bc84t-master-0 --path=/oauth-server/
Logs since older reboots
oc adm node-logs --role=master --path=/
Search recursive where log file exist.
oc_debug_run_command_all_nodes 'find /var/log 2>&1 | grep <name_pod>'


=openshift upgrade path=
  https://access.redhat.com/labs/ocpupgradegraph/update_path?channel=stable-4.9&arch=x86_64&is_show_hot_fix=false&current_ocp_version=4.9.15&target_ocp_version=4.10.11
=helm=
List all helm charts in all namespaces
  helm list -aA
=Upgrade openshift/okd=
https://docs.okd.io/latest/updating/preparing_for_updates/updating-cluster-prepare.html
Run the command below and check whether any API that is being removed still has a request count.
  oc get apirequestcounts
 
=upgrade openshift=
# look for existing alerts.
# look for troublesome pods.
oc get pods -A  | grep -Ev ' Running | Completed '
# Set channel
oc patch clusterversion version --type merge -p '{"spec": {"channel": "stable-4.10"}}'
oc adm upgrade --to=4.10.47
oc get clusterversion -o json|jq ".items[0].spec"
# View openshift version history.
oc get clusterversion -o json | jq -r '.items[0].status.history[] |  [.version, .startedTime, .completionTime] | join(" ")'
# View progress of update.
watch -n1 oc whoami --show-console \; oc adm upgrade
watch -cn1 "oc get clusteroperators | grep --color=always -E \"$(oc get clusterversions.config.openshift.io version -o json | jq -r .status.desired.version)|\""
# Upgrade all operators
oc get installplan -A | grep Manual | grep false
oc patch installplan $INSTALLPLAN -n $NAMESPACE --type merge --patch '{"spec":{"approved":true}}'
 
=upgrade okd=
Get upgrade path.
Look here to find latest version https://github.com/okd-project/okd/releases
(cd /usr/local/bin/ ; sudo curl -s -O https://gist.githubusercontent.com/Goose29/ca7debd6aec7d1a4959faa2d1b661d93/raw/4584d89c49d4af197480539bdd873f6d9ca2dd83/upgrade-path.py ; sudo chmod 755 upgrade-path.py ) && (curl -sH 'Accept:application/json' 'https://amd64.origin.releases.ci.openshift.org/graph?channel=stable-4' | upgrade-path.py 4.13.0-0.okd-2023-07-23-051208 4.14.0-0.okd-2024-01-26-175629 )
To view the status of the update process, run the command below. It is harmless and gives information about the ongoing process and blockers.
oc adm upgrade
watch -cn1 "oc whoami --show-console ; echo ; oc get clusteroperators | grep --color=always -E \"$(oc get clusterversions.config.openshift.io version -o json|jq -r '.spec.desiredUpdate.version')|\""
To get a slightly different view: the VERSION column shows each operator's version, and when the update is done all cluster operators will have the same version number.
oc get clusteroperators
Make a report of cluster status before upgrading, to rule out issues that you have not caused. See "status of kubernetes" below (https://halfface.se/wiki/index.php/Openshift#status_of_kubernetes).
Look for APIs in use that are flagged for removal.
oc get apirequestcounts
Upgrade okd until there are no more updates or you have reached the wanted version.
oc adm upgrade --to-latest=true --allow-explicit-upgrade
If complaining about cert.
oc patch --type='merge' --patch='{"spec":{"desiredUpdate":{"force":true}}}' clusterversion version
If the client wants a specific version, pinpoint that.
oc adm upgrade --to=<version from oc adm upgrade> --allow-explicit-upgrade
oc adm upgrade gives: Upgradeable=False Reason: AdminAckRequired. Follow the instructions from the link. The command will be something like below.
oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-<version>-kube-<version>-api-removals-in-<version>":"true"}}' --type=merge
==status of kubernetes==
Get pods that are less than perfect.
oc get pods -A --no-headers | grep -v Completed | while read LINE ; do PODS=$(awk '{print $3}' <<< "${LINE}") ; if [ "${PODS%%/*}" != "${PODS##*/}" ] ; then echo "${LINE}" ; fi ; done
Get critical alerts.
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")|select(.labels.severity=="critical")'
Get warning alerts.
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")|select(.labels.severity=="warning")'
 
=upgrade odf=
# View existing config.
oc get subscriptions -n openshift-storage odf-operator -o yaml
# Patch subscription
oc patch subscriptions -n openshift-storage odf-operator --type merge -p '{"spec": {"channel": "stable-4.10"}}'
# Get install plans
oc get installplan -n  openshift-storage -o wide
# Approve install plan.
oc get installplans.operators.coreos.com -A --no-headers | awk '$5 ~ /false/' | awk '$4 ~ /Manual/' | while read NAMESPACE INSTALLPLAN END ; do echo '*' $NAMESPACE $INSTALLPLAN ; oc patch installplan $INSTALLPLAN -n $NAMESPACE --type merge --patch '{"spec":{"approved":true}}' ; done
 
=odf troubleshooting=
# ceph problem.  Run commands from rook-ceph-operator
oc rsh -n openshift-storage $(oc get pods -n openshift-storage -o name -l app=rook-ceph-operator)
export CEPH_ARGS='-c /var/lib/rook/openshift-storage/openshift-storage.config'
ceph -s
ceph osd pool ls
ceph osd pool autoscale-status
ceph config dump
# disable autoscaling
ceph osd pool ls | while read i ; do echo '*' $i ; ceph osd pool set $i pg_autoscale_mode off ; done
# Look to see how much data is being used for pg:s.
# Number of PGLog Entries, size of PGLog data in megabytes, and Average size of each PGLog item
  for i in 0 1 2 ; do echo '*' $i ; osdid=$i ; ceph tell osd.$osdid dump_mempools | jq -r '.mempool.by_pool.osd_pglog | [ .items, .bytes /1024/1024, .bytes / .items ] | @csv' ;done
ceph df
 
=cronjobs=
  oc get cj
  oc get cronjobs -o wide -A
Run cronjob manually
  oc create job -n ldap-sync --from=cronjob/ldap-sync ldap-sync-manual-$(date '+%Y-%m-%d-%H-%M-%S')
Disable cronjob
.spec.suspend: true
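For example, suspend can be set with the same kind of patch as the enable command below (the cronjob name is just an example):
  oc patch cronjobs.batch write-to-nfs --type merge -p '{"spec": {"suspend": true}}'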
Enable cronjob
oc patch cronjobs.batch write-to-nfs --type merge -p '{"spec": {"suspend": false}}'


=delete po (stop, kill)=
stop pod
  oc delete po --all --force
  oc get po -A | grep -v ^NAME | awk '$4 !~ /Running/' | sort -k4 | while read NAMESPACE POD READY STATUS END ; do echo '****' $POD $STATUS ; echo oc delete po $POD -n $NAMESPACE --force --grace-period=0 ; done
  oc get pods -A --field-selector=status.phase!=Running --no-headers | while read NAME_SPACE POD REST_OF_LINE ; do echo oc delete pod $POD -n "${NAME_SPACE}" --force --grace-period=0 ; done
(oc get pods --field-selector="status.phase=Pending" --no-headers -A ; oc get pods --field-selector="status.phase=Failed" --no-headers -A) | while read NAME_SPACE POD REST_OF_LINE ; do echo oc delete pod $POD -n "${NAME_SPACE}" --force --grace-period=0 ; done
# Delete pods and generate report on what has been removed.
LOG=/tmp/oc_delete_pod_$(oc config current-context | awk -F '/|:' '{print $2}').$(date '+%Y-%m-%d_%H-%M-%S').log ; (oc get pods --field-selector="status.phase=Pending" --no-headers -A ; oc get pods --field-selector="status.phase=Failed" --no-headers -A) | while read NAME_SPACE POD REST_OF_LINE ; do oc delete pod $POD -n "${NAME_SPACE}" --force --grace-period=0 ; done | tee $LOG ; awk -F\" '{print $2}' $LOG | sed 's/-[a-z0-9]*$//g'| sed 's/-[a-z0-9]*$//g' | sort | uniq -c | sort -n | tail -20


=use other namespace=
=storageclasses(sc)=
  oc get storageclasses
=get storageclasses defined as default=
oc get sc -o json | jq -r '.items[]|select(."metadata".annotations."storageclass.kubernetes.io/is-default-class"=="true")|.metadata.name'
=set default storageclass=
# Set all sc to default false.
oc get sc -o json | jq -r '.items[]|select(."metadata".annotations."storageclass.kubernetes.io/is-default-class"=="true")|.metadata.name' | while read i ; do echo '*' $i ; oc patch storageclass $i -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'; done
# Set default storageclass.
oc patch storageclass ocs-storagecluster-cephfs -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
=get service accounts=
  oc get serviceaccounts -A
=what is a user allowed to do?=
  oc auth can-i --as=fjuza --list
  oc get groups -o wide
oc auth can-i --as-group=<group> --list


=alerts=
==How is alertmanager configured==
oc get secret -n openshift-monitoring alertmanager-main -o json | jq -r '.data."alertmanager.yaml"|@base64d'
==Save alertmanager config==
<pre>
oc get secret alertmanager-main -n openshift-monitoring --template='{{index .data "alertmanager.yaml" | base64decode}}' > /tmp/oc_get_secret_alertmanager-main.alertmanager.yaml.$(oc whoami --show-console=true | awk -F / '{print $3}').$(date '+%Y-%m-%d_%H-%M-%S')
oc extract secret/alertmanager-main --confirm -n openshift-monitoring
</pre>
==Restore alertmanager config==
oc set data secret alertmanager-main -n openshift-monitoring --from-file=alertmanager.yaml=<file_alertmanager.yaml>
==alertmanager==
View Alertmanager configured alerts.
  oc get prometheusrules -A -o yaml | grep alert: | sort
View configuration of an alert.
  oc get prometheusrules -A -o json | jq '.items[].spec.groups[].rules[]| select(.alert=="AlertmanagerReceiversNotConfigured")'
View alerts.
  oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq . | less -ISRM
View specific alert.
oc rsh -n openshift-monitoring -c prometheus prometheus-k8s-0 -- curl 'http://localhost:9090/api/v1/query?query=absent%28up%7Bjob%3D"fluentd"%7D+%3D%3D+1%29' | jq .
View alerts in state firing
  oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")' | less -ISRM
View alerts in state firing with severity warning
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")|select(.labels.severity=="warning")' | less -ISRM
View historical alerts.
  oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=2022-08-08T00:00:00.781Z&end=2022-08-09T00:00:00.781Z&step=1m"
Same query with relative dates.
  oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(date '+%Y-%m-%d' --date '-2 days')T00:00:00.781Z&end=$(date '+%Y-%m-%dT%H:%M:%S').781Z&step=1m" | jq . | less -ISRM
Get warning alerts since the last week.
echo '***' $(oc whoami --show-console) ; oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S.000Z' --date '-6 days')&end=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S').000Z&step=1m" | jq -r '.data.result[].metric | {alertname, severity, alertstate}| select(.severity=="warning")|select(.alertstate=="firing") | .alertname'
Get more info about fired alerts.
echo '***' $(oc whoami --show-console) ; oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S.000Z' --date '-6 days')&end=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S').000Z&step=1m" | jq -r '.data.result[].metric | {alertname, severity, alertstate, pod, namespace}| select(.severity=="warning")|select(.alertstate=="firing")'
Get alert during the last 6 days. Give times when alert has fired.
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S.000Z' --date '-6 days')&end=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S').000Z&step=1m" | jq -r . | python3 -c "import sys, re, datetime; print(re.sub(r'\b\d{10}\b', lambda x: datetime.datetime.utcfromtimestamp(int(x.group())).isoformat() + 'Z', sys.stdin.read()))" | less -ISRM
 
=disable alertmanager alert=
oc -n openshift-monitoring exec -ti alertmanager-main-0 -c alertmanager -- amtool silence add --alertmanager.url http://localhost:9093  alertname=AlertmanagerReceiversNotConfigured --end="2053-11-07T00:00:00-00:00" --comment "silence alertmanager"
 
=Silence alertmanager not configured alert=
oc set data secret alertmanager-main -n openshift-monitoring --from-file=alertmanager.yaml=<(cat <<'EOF'
"global":
  "resolve_timeout": "5m"
"inhibit_rules":
  - "equal":
      - "namespace"
      - "alertname"
    "source_match":
      "severity": "critical"
    "target_match_re":
      "severity": "warning|info"
  - "equal":
      - "namespace"
      - "alertname"
    "source_match":
      "severity": "warning"
    "target_match_re":
      "severity": "info"
"receivers":
  - "name": "Default"
  - "name": "Watchdog"
  - "name": "Critical"
  - "name": "testrec" # Dummy receiver with webhook config
    "webhook_configs":
      - "url": "http://xxxxdumyxxx.com"
"route":
  "group_by":
    - "namespace"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "Default"
  "repeat_interval": "12h"
  "routes":
    - "match":
        "alertname": "dummyalert" # Dummy alert being routed to dummy receiver
      "receiver": "testrec"
EOF
)
 
=prometheus=
Url to web interface.
https://prometheus-k8s-openshift-monitoring.apps.<url>
echo https://prometheus-k8s-openshift-monitoring.$(oc whoami --show-console | awk -F 'console-openshift-console.' '{print $2}')
echo https://$(oc get route -n openshift-monitoring prometheus-k8s -o jsonpath="{.spec.host}")
Get disk usage from odf
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query?query=odf_system_raw_capacity_used_bytes" | jq -r .
Get disk usage from odf over time.(metrics)
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=odf_system_raw_capacity_used_bytes&start=$(date '+%Y-%m-%d' --date '-20 days')T00:00:00.781Z&end=$(date '+%Y-%m-%dT%H:%M:%S').781Z&step=1h" | jq . | less -ISRM
Search tips
https://prometheus.io/docs/prometheus/latest/querying/basics/
Disk usage per project. Taken from RH ticket.
oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- curl -s -g 'http://localhost:9090/api/v1/query?' --data-urlencode 'query=(sort_desc(topk(25,(sum(kubelet_volume_stats_used_bytes * on (namespace,persistentvolumeclaim) group_left(storageclass, provisioner) (kube_persistentvolumeclaim_info * on (storageclass)  group_left(provisioner) kube_storageclass_info {provisioner=~"(.*cephfs.csi.ceph.com)"})) by (namespace)))))'
 
=openshift-user-workload-monitoring=
  "annotations": {
    "description": "Prometheus operator in openshift-user-workload-monitoring namespace rejected 2 prometheus/ServiceMonitor resources.",
    "summary": "Resources rejected by Prometheus operator"
  },...
# Look at what is causing.
oc logs -l app.kubernetes.io/name=prometheus-operator -n openshift-user-workload-monitoring
# After tweaking with monitoring settings kill pod and view log.
oc delete pod -l app.kubernetes.io/name=prometheus-operator -n openshift-user-workload-monitoring
oc logs -l app.kubernetes.io/name=prometheus-operator -n openshift-user-workload-monitoring | less
# Stop monitoring.
oc label namespace openshift-local-storage openshift.io/cluster-monitoring-
oc label namespace openshift-local-storage openshift.io/user-monitoring=false
# Allow monitoring.
oc label namespace openshift-operators openshift.io/cluster-monitoring=true
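User workload monitoring itself is enabled through the cluster-monitoring-config ConfigMap; a minimal sketch of the documented toggle:
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF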
 
=Talk to api with Bearer.=
  HOST=$(oc -n openshift-monitoring get route alertmanager-main -ojsonpath={.spec.host})
  TOKEN=$(oc whoami -t)
  curl -skH "Authorization: Bearer $TOKEN" "https://$HOST/api/v2/alerts" | jq .
=token=
token=`oc sa get-token prometheus-k8s -n openshift-monitoring` ## --- In OCP client 4.10 or lower ---
OR
token=`oc create token prometheus-k8s -n openshift-monitoring` ## --- In OCP client 4.11 or higher ---
curl using token
curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main-openshift-monitoring.apps.domain/api/v1/alerts' |  jq '.data[].labels'
=ServiceMonitor=
When using the Prometheus Operator, custom resources like ServiceMonitor and PodMonitor describe how Prometheus should scrape metrics from services or pods (which port, path, and interval).
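A minimal sketch of a ServiceMonitor (name, namespace, labels and port name are made up):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-namespace
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s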


=bash completion=
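The usual approach, following the same pattern as the cmctl completion above:
  . <(oc completion bash)
  . <(kubectl completion bash)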
=machineconfig=
         rtcsync
         logdir /var/log/chrony' | butane | oc apply -f -
==get machineconfig value==
oc get mc 00-master -o json | jq -r '.spec.config.storage.files[]|select(.path=="/var/lib/kubelet/config.json")|.contents.source' | perl -pe 's/%([0-9a-f]{2})/sprintf("%s", pack("H2",$1))/eig' | sed 's/^data:,//g' | jq .
==List machineconfigs by creation time==
oc get mc --sort-by=.metadata.creationTimestamp
=get users=
  oc get users
=give me kubeadmin encrypted password=
oc get secret kubeadmin -n kube-system -o json  -o=jsonpath='{.data.kubeadmin}' | base64 -d
=Give kubeadmin a new password=
==generate password hash==
htpasswd -bnBC 10 "" '<password>' | tr -d ':\n' | base64 -w0
==patch password hash==
oc patch secret/kubeadmin -n kube-system -p '{"data": {"kubeadmin": "<base64_encoded_password_hash>"}}'
=work with oc without login=
From the install directory:
  export KUBECONFIG=auth/kubeconfig
On a node:
  export KUBECONFIG=/var/lib/kubelet/kubeconfig
On the bootstrap node:
  export KUBECONFIG=/etc/kubernetes/kubeconfig
 
=Add the following if cert is not trusted (ssl/tls)=
  - cluster:
     insecure-skip-tls-verify: true
     server: https://127.0.0.1:443
    name: my-cluster
=run oc when on node=
oc get pod -n openshift-monitoring --kubeconfig=/var/lib/kubelet/kubeconfig
=etcdctl=
  oc rsh -c etcdctl -n openshift-etcd $(oc get pod -l app=etcd -oname -n openshift-etcd | awk -F"/" 'NR==1{ print $2 }')
  [root@ocp-03-lm8km-master-1 /]# etcdctl --write-out=table endpoint status
  +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
  | https://172.19.14.41:2379 | 51cecd971b657ee5 |  3.5.0 |  105 MB |      true |      false |        6 |    2632074 |            2632074 |        |
  +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
 
=create troubleshooting/debug/test pod=
  oc run abjorklund-redhat-ubi8 --image=redhat/ubi8 -i --tty -- sh
  oc run abjorklund-curlimage-curl --image=curlimages/curl -i --tty -- sh
  oc run -it busybox --image=busybox --restart=Never -- ash
oc run ${USER}-rocky-rocky --image=rockylinux/rockylinux -i --tty -- bash # dnf -y install procps-ng iproute
oc run ${USER}-rocky-rocky --image=rockylinux/rockylinux --restart=Never --command sleep infinity
==install packages to get running==
yum install -y lsof procps-ng bind-utils


=proxy settings=
=oc proxy=
Run a proxy to the Kubernetes API server
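A sketch of using it together with the raw api examples further down (8001 is the default port, shown explicitly here):
 oc proxy --port=8001 &
 curl -s http://localhost:8001/api/v1/namespaces | jq -r '.items[].metadata.name'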
=port forward to pod=
oc port-forward <my-pod-name> <local-port>:<remote-port>
alertmanager
kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093  # http://localhost:9093/
grafana access.
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80        # http://localhost:3000 admin prom-operator
prometheus access.
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090    # http://localhost:9090


=Install additional ca certificate=
   labels:
     machineconfiguration.openshift.io/role: worker
   name: 50-company-ca-cert
  spec:
   config:
         mode: 0644
         overwrite: true
         path: /etc/pki/ca-trust/source/anchors/company-ca.crt
 
=get raw api data=
  oc get --raw "/api/v1/nodes/[node]/proxy/stats/summary"
  curl -s http://localhost:8001/api/v1/nodes/crc-lgph7-master-0/proxy/metrics/resource
=explain=
Get documentation for a resource. Get available attributes for a resource.
  oc explain deployment
=events=
Get events.
  oc get events -A --sort-by=.metadata.creationTimestamp
=yq=
Select specific values.
oc get mcp worker -o yaml | yq '.spec.configuration.source.[].name'
Delete specific values.
oc get catalogsources -n openshift-marketplace -o yaml | yq 'del(.items.[].status)'
=jsonpath=
Get names of MachineConfigs one value per line.
  oc get mc -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' --no-headers
=endpoints=
Check which pod IPs are registered behind a service.
oc get endpoints -n default
=ImageStreamTag=
ImageStreamTag represents an Image that is retrieved by tag name from an ImageStream.
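Quick look at image stream tags (names and namespace are examples):
 oc get istag -n myproject
 oc describe istag myapp:latest -n myproject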
=imagestream=
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: myapp
==Tagging Images: When you tag an image, it is added to the ImageStream with a specified tag.==
oc tag myregistry/myapp:latest myapp:latest
==Using ImageStreams in Deployment Configurations: Deployment configurations can reference ImageStreams instead of direct image URLs.==
apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - name: myapp
          image: image-registry.openshift-image-registry.svc:5000/myproject/myapp:latest
=BuildConfig=
  Build configurations define a build process for new container images.
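A small sketch of creating and running one (the git repo and name are just examples):
 oc new-build nodejs~https://github.com/sclorg/nodejs-ex.git --name=myapp
 oc start-build myapp --follow
 oc get builds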
=download okd openshift-install=
# Show latest.
curl -skL https://github.com/okd-project/okd/releases | elinks --dump | sed 's/^ *//g' | grep " Latest"
# Download and install in /usr/local/bin. Keep old versions.
export OKD_VERSION=4.15.0-0.okd-2024-03-10-010116 ; (cd /temp/ ; oc adm release extract --tools quay.io/openshift/okd:${OKD_VERSION} ; cd /usr/local/bin/ ; sudo tar xf /temp/openshift-install-linux-${OKD_VERSION}.tar.gz openshift-install ; sudo mv openshift-install openshift-install.${OKD_VERSION})
=setup openshift cluster=
Download binary
Create config file
  install-config.yaml
Then fire off install
  openshift-install create cluster
Another example
ln -s install-config.yaml.2023-03-23 install-config.yaml
./openshift-install-4.12.0-0.okd-2023-04-16-041331 create cluster
=Edit install config after setup=
Save config
<pre>
oc get cm cluster-config-v1 -n kube-system --template='{{index .data "install-config" }}' > /tmp/cm_cluster-config-v1_-n_kube-system.$(oc whoami --show-console=true | awk -F / '{print $3}').$(date '+%Y-%m-%d_%H-%M-%S')
</pre>
Edit downloaded file and apply edited file.
oc set data cm cluster-config-v1 -n kube-system --from-file=install-config=/tmp/cm_cluster-config-v1_-n_kube-system.<suitable_name>
=look at install settings=
oc get -n kube-system cm/cluster-config-v1 -o yaml
=argocd login=
argocd login openshift-gitops-server-openshift-gitops.apps.costest.ltkronoberg.se --username kubeadmin --password asdfasfasdfas --sso --insecure
argocd login $(oc get routes -n openshift-gitops openshift-gitops-server -o json | jq -r .spec.host) --username $USER --password $COMPANY_PASSWORD --sso --insecure
=git sync heal=
argocd app list | grep -v NAME | awk '{print $1}' | while read i ; do echo '*' $i ; argocd app set $i --self-heal ; done


=argocd=
curl -sSL -o argocd-linux-amd64 https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
sudo install -m 555 argocd-linux-amd64 /usr/local/bin/argocd
rm argocd-linux-amd64
=metrics=
==Get available values==
Thanos monitoring points
curl -sk -H "Authorization: Bearer $(oc whoami -t)" https://$(oc get routes -n openshift-monitoring thanos-querier -o jsonpath='{.status.ingress[0].host}')/api/v1/metadata | jq .
node-exporter
oc --request-timeout=3 -n openshift-monitoring exec -c node-exporter $(oc get pod -n openshift-monitoring -l app.kubernetes.io/name=node-exporter -o=custom-columns='NAME:.metadata.name' --no-headers | head -1) -- curl -s 'http://localhost:9100/metrics' | grep -vE "^#|^$"
 
==Cpu usage per node.==
  100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[30m])) * 100)
instance:node_cpu_utilisation:rate1m{job="node-exporter",  cluster=""} != 0
instance:node_cpu_utilisation:rate1m{job="node-exporter"} != 0
==iowait==
avg by (instance) (irate(node_cpu_seconds_total{mode="iowait"}[30m]))
 
==namespace==
cpu usage per namespace.
sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=""}) by (namespace)
 
==load==
Load 1 graph
instance:node_load1_per_cpu:ratio{job="node-exporter", cluster=""} != 0
==usage for pvc==
kubelet_volume_stats_used_bytes
kubelet_volume_stats_available_bytes
kubelet_volume_stats_used_bytes{persistentvolumeclaim="prometheus-prometheus-k8s-1"}
 
==Memory usage==
Memory usage of node.
instance:node_memory_utilisation:ratio
node_memory_MemAvailable_bytes
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))
==OOMKilled==
sum by (namespace, pod) (kube_pod_container_status_restarts_total) * on(namespace, pod) group_left(reason) kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=sum%20by%20(namespace,%20pod)%20(kube_pod_container_status_restarts_total)%20*%20on(namespace,%20pod)%20group_left(reason)%20kube_pod_container_status_last_terminated_reason%7Breason%3D%22OOMKilled%22%7D&start=$(date '+%Y-%m-%d' --date '-20 days')T00:00:00.781Z&end=$(date '+%Y-%m-%dT%H:%M:%S').781Z&step=1h" | jq .
==uptime==
oc exec -n openshift-monitoring -c prometheus prometheus-k8s-0 -- curl -s 'http://localhost:9090/api/v1/query?query=time%28%29%20-%20node_boot_time_seconds%7Bjob%3D%22node-exporter%22%7D%0A' | jq -r '.data.result[]|.metric.instance +"\t"+ (.value[1] | tonumber | floor | tostring)' | column_tab
 
=install oc and kubectl=
  curl -fsSL https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/latest/openshift-client-linux.tar.gz | (cd /usr/local/bin/ ; sudo tar zxf - oc kubectl )
 
=time and timezone in first pod(date)=
  oc get pods --no-headers -o 'custom-columns=:.metadata.namespace,:.metadata.name' -A | grep -v cert-manager | head -1 | while read NAMESPACE POD ; do oc rsh -n $NAMESPACE $POD  bash -c 'date "+%Y-%m-%d %H:%M:%S %Z"' 2>/dev/null ; done
 
=oc get installplan=
InstallPlan defines the installation of a set of operators.
  oc get installplan install-bk8hw -n openshift-operators -o yaml
Approve all manual updates.
oc get installplans.operators.coreos.com -A --no-headers | awk '$5 ~ /false/' | awk '$4 ~ /Manual/' | while read NAMESPACE INSTALLPLAN END ; do echo '*' $NAMESPACE $INSTALLPLAN ; oc patch installplan $INSTALLPLAN -n $NAMESPACE --type merge --patch '{"spec":{"approved":true}}' ; done
Get selected info from all installplans
oc get installplans.operators.coreos.com -A --no-headers -o=custom-columns='DATE:.metadata.creationTimestamp,NAME:.metadata.name,PHASE:.status.phase,CSV:.spec.clusterServiceVersionNames,NAMESPACE:.metadata.namespace'  --sort-by=.metadata.creationTimestamp
=oc extract=
Extract secrets or config maps to disk
  # Extract only the key "nginx.conf" from config map "nginx" to the /tmp directory
  oc extract configmap/nginx --to=/tmp --keys=nginx.conf
=ostree=
==Remotes==
Add a remote
ostree remote add <REMOTE> <URL>
Remove a remote
ostree remote delete <REMOTE>
List configured remotes
ostree remote list
List remote contents
ostree remote refs <REMOTE>
==Basic Commands==
Update to latest
rpm-ostree upgrade
Get system status
rpm-ostree status
Find available updates
rpm-ostree upgrade --check
Switch to a different OS
rpm-ostree rebase <REMOTE>:<BRANCH>
==pull secret==
oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' | jq .
 
==Layered Packages==
Uninstall a layered package
rpm-ostree uninstall <PACKAGE>
Install a layered package
rpm-ostree install <PACKAGE>
==Debugging and Rollback==
Remove the previous deployment
rpm-ostree cleanup --rollback
Download older commits
ostree pull --commit-metadata-only --depth=<n> <REMOTE> <BRANCH>
Make the previous deployment the default boot entry
rpm-ostree rollback
List downloaded commits
ostree log <REMOTE>:<BRANCH>
=dependencies,owner=
Search in output from
  oc describe ...
Search for this.
  Controlled By:  ReplicaSet/rook-ceph-osd-0-6dcdc7fb48
=metadata.ownerReferences=
Lists the object(s) that own this object; used by garbage collection.
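Show who owns a pod (a sketch, pod and namespace are placeholders):
 oc get pod <pod> -n <namespace> -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}{"\n"}'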
=nodeAffinity=
Pin pod to node with label (kubectl label nodes <your-node-name> disktype=ssd)
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
=Add user to group=
oc adm groups add-users openshift-admins rb_janitor
=api-int=
api-int.<fqdn>
for i in api-int:6443 api:6443 test.apps:443 ; do ping -c1 -W1 ${i%%:*} 2>&1 | xargs ; curl -skI https://${i%%:*}:${i##*:} 2>&1 | xargs ; done | cut -c -150
for i in api-int:6443 api:6443 test.apps:443 ; do ping -c1 -W1 ${i%%:*} 2>&1 | xargs ; set -x ; curl -skv https://${i%%:*}:${i##*:} -o /dev/null 2>&1 | grep "Server certificate:" -A5 ; set +x ; done | cut -c -150
=test talk to api-int=
CACERT=/tmp/%var%lib%kubelet%kubeconfig%certificate-authority-data ; grep certificate-authority-data: /var/lib/kubelet/kubeconfig | awk '{print $2}' | base64 -d > /$CACERT ; curl -s --key /var/lib/kubelet/pki/kubelet-client-current.pem --cert /var/lib/kubelet/pki/kubelet-client-current.pem --cacert $CACERT -XGET "$(grep server /etc/kubernetes/kubeconfig | awk '{print $2}')/api/v1/namespaces/default/pods?limit=500"
=api urls=
kubernetes generic:                    reference to the Kubernetes API server.
kubernetes.default:                    reference to the Kubernetes API server within the "default" namespace.
kubernetes.default.svc:                refers to the Kubernetes service within the "default" namespace.
kubernetes.default.svc.cluster.local:  This is the fully-qualified domain name (FQDN) for the Kubernetes service within the "default" namespace.
openshift:                            Similar to "kubernetes," this is a generic reference to the OpenShift API server.
openshift.default:                    reference to the OpenShift API server within the "default" namespace.
openshift.default.svc:                refers to the OpenShift service within the "default" namespace.
openshift.default.svc.cluster.local:  fully-qualified domain name (FQDN) for the OpenShift service within the "default" namespace.
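A quick way to check that these names resolve from inside the cluster (assumes the busybox image is pullable):
 oc run dns-test -it --rm --restart=Never --image=busybox -- nslookup kubernetes.default.svc.cluster.local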
=okd setup fix=
# On bootstrap node. Could work on all clusters. First a test to see if it works already.
DOMAIN=$(grep " baseDomain: " /etc/mcc/bootstrap/cluster-dns-02-config.yml | awk '{print $2}')
for i in api-int api ; do ping -c1 -W1 $i.${DOMAIN} 2>&1 | xargs; done | cut -c -150
echo "10.1.0.5 api-int.${DOMAIN} api.${DOMAIN}" >> /etc/hosts
=oc annotate=
Update the annotations on one or more resources.
oc annotate pods foo description='my frontend'
=setuid setgid=
  securityContext:
    runAsUser: 10004000
    runAsGroup: 10004000
=patch examples=
Look at oc get ... -o json and follow the structure level by level to build the patch.
oc patch redis redis-standalone --type merge  --patch '{"spec": {"securityContext": {"runAsGroup": 1000400000}}}'
Enable/disable clusterlogging # Unmanaged/Managed
oc patch clusterlogging -n openshift-logging instance --type merge -p '{"spec": {"managementState": "Unmanaged"}}'
Enable/disable elasticsearch
oc patch elasticsearch -n openshift-logging elasticsearch --type merge -p '{"spec": {"managementState": "Unmanaged"}}' # Unmanaged/Managed
==finalizers==
Remove finalizers from pod.
oc patch pod <pod> -n <namespace> -p '{"metadata":{"finalizers":null}}'
Add finalizer
oc patch pod <pod> -n <namespace> -p '{"metadata":{"finalizers":["kubernetes.io/pvc-protection"]}}'
Replace finalizers value with this.
oc patch pod <pod> -n <namespace> --type merge -p '{"metadata":{"finalizers":["kubernetes.io/pvc-protection","kubernetes"]}}'
==edit text/cert entry==
#!/bin/bash
SSL_URL=halfface.se
SSL_PORT=443
DATE_FILE=$(date +%F_%H-%M-%S)
openssl s_client -connect ${SSL_URL}:${SSL_PORT} -servername ${SSL_URL} -verify 5 -showcerts -certform pem </dev/null 2>/dev/null | sed -n '/^----/,/^----/p' > chain.${SSL_URL}.${SSL_PORT}.${DATE_FILE}.pem
ln chain.${SSL_URL}.${SSL_PORT}.${DATE_FILE}.pem ${SSL_URL}
oc create cm argocd-tls-certs-cm -n argocd --from-file ${SSL_URL} --dry-run=client -o yaml >> /tmp/chain.${SSL_URL}.${SSL_PORT}.${DATE_FILE}.pem.patch
oc patch configmap argocd-tls-certs-cm -n argocd --patch-file /tmp/chain.${SSL_URL}.${SSL_PORT}.${DATE_FILE}.pem.patch
=limits=
When you need to increase your cpu and memory resources.
A cpu limit is written either as a plain number (0.5 for half a cpu) or in millicores (500m for half a cpu).
spec:
  containers:
...
    resources:
      limits:
        cpu: "2"
        memory: 5Gi
      requests:
        cpu: "2"
        memory: 5Gi
=quotas on cpu memory pvc... per project=
oc get ResourceQuota
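A minimal ResourceQuota sketch (namespace and numbers are examples):
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: myproject
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    persistentvolumeclaims: "10"
EOF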
=tolerations|node selectors|...=
oc describe pod
Node-Selectors:              node-role.kubernetes.io/app=
Tolerations:                node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                              node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 5s
                              node.ocs.openshift.io/storage=true:NoSchedule
=enable monitoring=
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata: 
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 2d
EOF
=retention elasticsearch=
Edit the ClusterLogging CR to add or modify the retentionPolicy parameter:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
...
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy:
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3
...
=retention prometheus=
Prometheus retention. https://docs.openshift.com/container-platform/4.10/monitoring/configuring-the-monitoring-stack.html#modifying-retention-time-for-prometheus-metrics-data_configuring-the-monitoring-stack
oc edit configmap cluster-monitoring-config -n openshift-monitoring
# Enable prometheus.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 2d
EOF
=retention prometheus default=
oc get Prometheus k8s -n openshift-monitoring -o json | jq -r .spec.retention
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/status/runtimeinfo" | jq -r '.data.storageRetention'
=EFK(elk)=
# ElasticSearch: log store and search engine.
# Fluentd: log collector / processing pipeline.
# Kibana: visualization.
https://kibana-openshift-logging.apps.<url>
=grafana=
# grafana
https://grafana-openshift-monitoring.apps.<url>
=pull secret=
oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' | jq .
Just the keys.
oc get secret/pull-secret -n openshift-config -o json | jq -r '.data.".dockerconfigjson"' | base64 -d | jq .
Name of each key and email.
oc get secret/pull-secret -n openshift-config -o json | jq -r '.data.".dockerconfigjson"' | base64 -d | jq -r '.auths | with_entries(.value = .value.email)' | sed 's/{//g;s/}//g;s/"//g' | grep -v '^$' | sed 's/ *//g' | sort
Download pull secret.
<pre>
oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' > /tmp/pull_secret.$(oc whoami --show-console=true | awk -F / '{print $3}').$(date '+%Y-%m-%d_%H-%M-%S')
</pre>
Set pull secret.
oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=/tmp/pull_secret_<file_name>
==has pull secret been updated==
<pre>
echo '#' pull-secret ; oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' | jq -r '.auths[].email'
echo '#' apiserver ; oc exec deployment/apiserver -n openshift-apiserver -c openshift-apiserver -- cat /var/lib/kubelet/config.json | jq
echo '#' nodes ; oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'cat /var/lib/kubelet/config.json | jq'
</pre>
==Does pull secret work==
jq . /tmp/pull_secret.2024-01-10_12-00-01.registry.redhat.io
{
  "auths": {
    "registry.redhat.io": {
      "auth": "YmxhYmxh"
    }
  }
}
podman pull --authfile /tmp/pull_secret.2024-01-10_12-00-01.registry.redhat.io registry.redhat.io/ubi8/ubi:latest
==Which pull secret does machineconfig contain==
oc get mc 00-master -o json | jq -r '.spec.config.storage.files[]|select(.path=="/var/lib/kubelet/config.json")|.contents.source' | perl -pe 's/%([0-9a-f]{2})/sprintf("%s", pack("H2",$1))/eig' | sed 's/^data:,//g' | jq .
==Is pull secret correct in machineconfigpool. Rendered config==
oc get mc rendered-master-3626460c7752fc1605e94c19b7a9aba7 -o json | jq -r '.spec.config.storage.files[]|select(.path=="/var/lib/kubelet/config.json")|.contents.source' | sed 's/^data:,//g' | perl -pe 's/%([0-9a-f]{2})/sprintf("%s", pack("H2",$1))/eig'| jq .
=change number of nodes=
oc get machineset -n openshift-machine-api
oc edit machineset -n openshift-machine-api <MachineSet>
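Scaling directly also works (machineset name is a placeholder):
 oc scale machineset <MachineSet> -n openshift-machine-api --replicas=2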
=Elasticsearch status=
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | head -1) -- es_util --query=_cat/health?v
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | head -1) -- es_util --query=_cluster/health?pretty
=talk to elasticsearch=
oc rsh elasticsearch-cdm-q8apadpa-1-65f99d99b4-8b9wg
curl -s --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca https://localhost:9200
Oneliner
oc exec -n openshift-logging -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers -n openshift-logging | head -1) -- curl -s --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca https://localhost:9200
=which version of elasticsearch operator is installed=
oc get csv -n  openshift-operators-redhat -l operators.coreos.com/elasticsearch-operator.openshift-operators-redhat="" -o=custom-columns='VERSION:.spec.version' --no-headers
==list nodes==
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query="_cat/nodes?v"
==Who is master node==
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query="_cat/master?v"
==Is cluster recovering==
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query="_cat/recovery?active_only=true"
==Look at all indices==
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=_cat/indices?v
=look at shards=
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=_cat/shards?v
=Create audit index=
oc exec -n openshift-logging -c elasticsearch $(oc get pods -n openshift-logging -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | head -1) -- es_util --query=audit-000001 -XPUT
==Remove all red indices.==
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=_cat/indices?v | grep ^red | awk '{print $3}'  | while read i ; do echo '*' $i ; oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=${i} -X DELETE ; done
=vsphere creds=
oc get -n kube-system cm/cluster-config-v1 -o yaml
=does vsphere account have expected permissions=
oc logs -n openshift-cluster-storage-operator -l name=vsphere-problem-detector-operator --timestamps --tail=100 | less
=Enable openshift/okd logging=
==Enable redhat-operators==
oc patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": false}]'
Or edit
oc edit operatorhubs
Spec:
  Disable All Default Sources:  true
  Sources:
    Disabled:  false
    Name:      community-operators
    Disabled:  false
    Name:      redhat-operators
==Create namespace openshift-operators-redhat==
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-operators-redhat
  annotations:
    openshift.io/node-selector: ""
  labels:
    openshift.io/cluster-monitoring: "true"
EOF
==Create namespace openshift-logging==
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    openshift.io/node-selector: ""
  labels:
    openshift.io/cluster-monitoring: "true"
EOF
==Create operatorgroup==
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-operators-redhat
  namespace: openshift-operators-redhat
spec: {}
EOF
==Subscribe to OpenShift Elasticsearch Operator==
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: "elasticsearch-operator"
  namespace: "openshift-operators-redhat"
spec:
  channel: "stable"
  installPlanApproval: "Automatic"
  source: "redhat-operators"
  sourceNamespace: "openshift-marketplace"
  name: "elasticsearch-operator"
EOF
==Install the openshift logging operator.==
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  targetNamespaces:
  - openshift-logging
EOF
==Create a subscription object yaml file.==
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  channel: "stable"
  name: cluster-logging
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
==Create OpenShift Logging instance.==
cat <<EOF | oc apply -f -
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed" 
  logStore:
    type: "elasticsearch" 
    retentionPolicy:
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3
      storage:
        storageClassName: "standard-csi"
        size: 200G
      resources:
        limits:
          memory: "16Gi"
        requests:
          memory: "16Gi"
      proxy:
        resources:
          limits:
            memory: 256Mi
          requests:
            memory: 256Mi
      redundancyPolicy: "SingleRedundancy"
  visualization:
    type: "kibana" 
    kibana:
      replicas: 1
  collection:
    logs:
      type: "fluentd" 
      fluentd: {}
EOF
=telemetry=
Restart telemetry.
oc delete pod -n openshift-monitoring -l app.kubernetes.io/component=telemetry-metrics-collector
=Update vsphere/openstack creds=
oc edit cm cloud-provider-config -n openshift-config
default-datastore = "cl07-2-fc-loc-001"
=Get datastore=
oc get cm cloud-provider-config -n openshift-config -o json | jq -r .data.config | sed -nr "/^\[Workspace\]/ { :l /^default-datastore[ ]*=/ { s/[^=]*=[ ]*//; p; q;}; n; b l;}"
=Manage labels.=
Add a label to a node or pod:
oc label node node001.krenger.ch mylabel=myvalue
oc label pod mypod-34-g0f7k mylabel=myvalue
Remove a label (in the example “mylabel”) from a node or pod:
oc label node node001.krenger.ch mylabel-
oc label pod mypod-34-g0f7k mylabel-
Permanently label a node
oc edit machineset ocp-qz7hf-worker-us-west-1b -n openshift-machine-api
=rollout=
Restart pods in a deployment
oc rollout restart deployment -n openshift-storage csi-rbdplugin-provisioner
=api.<URL>=
openssl_x509_multi_line <(oc get secrets external-loadbalancer-serving-certkey -n openshift-kube-apiserver -o json | jq -r '.data."tls.crt"|@base64d')
=ssl certificates replace=
How to replace api.<url> and star.apps.<url> certs.
# api. Create full chain cert. Public - intermediate - root ca.
api.<url>.crt
api.<url>.key
# create secret
oc delete secret api-cert -n openshift-config
oc create secret tls api-cert --cert=api.<url>.crt --key=api.<url>.key -n openshift-config
# patch apiserver
oc patch apiserver cluster --type=merge -p '{"spec":{"servingCerts": {"namedCertificates": [{"names": ["api.<url>"], "servingCertificate": {"name": "api-cert"}}]}}}'
...
# star.apps. Create full chain cert. Public - intermediate - root ca.
star.apps.<url>.crt
star.apps.<url>.key
# create secret
oc delete secret custom-certs-default -n openshift-ingress
oc create secret tls custom-certs-default --cert=star.apps.<url>.crt --key=star.apps.<url>.key -n openshift-ingress
# patch ingress controller
oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'
==edit serving certs==
look at api cert
oc get secret -n openshift-config $(oc get apiservers cluster -o json | jq -r '.spec.servingCerts.namedCertificates[].servingCertificate.name') -o json | jq -r '.data."tls.crt"' | base64 -d
Patch secret api cert
oc patch secret -n openshift-config $(oc get apiservers cluster -o json | jq -r '.spec.servingCerts.namedCertificates[].servingCertificate.name') -p '{"data":{"tls.crt": "<new-base64-encoded-certificate>"}}'
Look at ingress cert. wildcard.apps.<url>
oc get secret -n openshift-ingress $(oc get -n openshift-ingress-operator ingresscontrollers default -o json | jq -r .spec.defaultCertificate.name) -o json | jq -r '.data."tls.crt"' | base64 -d
Patch secret ingress wildcard.apps.<url>
oc patch secret -n openshift-ingress $(oc get -n openshift-ingress-operator ingresscontrollers default -o json | jq -r .spec.defaultCertificate.name) -p '{"data":{"tls.crt": "<new-base64-encoded-certificate>"}}'
=After you update the above certificates, the following config map is updated to reflect that=
openssl_x509_multi_line <(oc get cm kube-root-ca.crt -o json | jq -r '.data."ca.crt"')
=get cluster-id=
oc get clusterversion/version -o jsonpath="{.spec.clusterID}"
=api=
Processes running the API server. They scale horizontally, and they all serve requests.
openshift-kube-apiserver
kube-apiserver
=kube-proxy=
kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept.
kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster.
kube-proxy uses the operating system packet filtering layer if there is one and it's available. Otherwise, kube-proxy forwards the traffic itself.
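A way to peek at the packet filtering rules on a node (a sketch; assumes an iptables based service proxy, on OVN-Kubernetes the output can be mostly empty):
 oc debug node/<node> -- chroot /host sh -c 'iptables-save | grep KUBE-SERVICES | head -5'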
=Resource Allocation=
OS and Kubernetes overhead. You can see the reserved OS & Kubernetes overhead by comparing the Allocatable (what the Kubernetes Scheduler can allocate to Pods) and the Capacity.
Capacity:
->cpu:                4
  ephemeral-storage:  125293548Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
->memory:            16409360Ki
  pods:              250
Allocatable:
->cpu:                3500m
  ephemeral-storage:  114396791822
  hugepages-1Gi:      0
  hugepages-2Mi:      0
->memory:            15258384Ki
  pods:              250
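Compare Capacity and Allocatable for all nodes in one go:
 oc get nodes -o custom-columns='NAME:.metadata.name,CPU_CAP:.status.capacity.cpu,CPU_ALLOC:.status.allocatable.cpu,MEM_CAP:.status.capacity.memory,MEM_ALLOC:.status.allocatable.memory'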
==requests/limits==
User pod allocation is calculated by looking at the “Requests” resource columns in the oc describe node output.
The relevant columns here are the Requests, not the Limits.
Requests impact how the pod is scheduled and what resources are allocated to it,
whereas limits are used to enable pods to burst beyond their allocation.
==look at current Allocated resources==
oc get nodes --no-headers --selector="node-role.kubernetes.io/worker" -o=custom-columns='NAME:.metadata.name' | while read NODE ; do oc describe node $NODE | grep "Allocated resources:" -A10 | grep -E ' cpu | memory ' | while read RESOURCE ; do echo $NODE $RESOURCE ; done ; done
==empty space==
Allocatable - Allocated resources = empty
Allocatable:
  cpu:                3500m
  ephemeral-storage:  114396791822
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:            15258384Ki
  pods:              250
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource          Requests      Limits
  --------          --------      ------
  cpu                834m (23%)    0 (0%)
  memory            2474Mi (16%)  736Mi (4%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
=status of namespace=
Show an overview of the current project
oc status
=age of cluster=
Looking at age of machines.
oc get nodes -o json | jq -r '.items[].metadata.creationTimestamp' | sort -n | sed 's/T/ /g;s/Z//g'
=oc adm inspect=
oc adm inspect namespace/isilon
tar cf /tmp/inspect.isilon.$(date_file ) inspect.local.*
=Operator Lifecycle Manager (OLM)=
oc logs -l app=olm-operator -n openshift-operator-lifecycle-manager --tail=-1
=Reinstall operator that is no longer available with current openshift version=
# Force install odf which is not possible to install because openshift has moved more than 1 version.
# Save subscription
for i in operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= operators.coreos.com/odf-operator.openshift-storage= ; do
oc get subscription -o yaml -l $i > oc_get_subscription_${i//\//_}.yaml ; done
...
# Save operators
for i in operators.coreos.com/odf-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= ; do
oc get csv -l $i -o yaml > oc_get_csv_-l_${i//\//_}.yaml ; done
...
# Confirm backup files contain usable yaml. Have we forgotten any operators or CSVs? Remove resources clearly not related to odf.
...
# delete the existing ODF related subscriptions and the ClusterServiceVersions related:
for i in operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= operators.coreos.com/odf-operator.openshift-storage= ; do
oc delete subscription -l $i; done
for i in operators.coreos.com/odf-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= ; do
oc delete csv -l $i  ; done
...
# Make sure you wait for the CSVs to be deleted before creating a subscription again.
...
# create only the Subscription again:
# (optional: edit the subscription before recreate, changing the channel version to the goal version)
...
# Recreate subscription
oc create -f 'oc_get_subscription_operators.coreos.com_odf-operator.openshift-storage=.yaml'
# wait watching the events:
oc get events -w
=increase disk on node=
Update worker machineset.
oc patch machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') --type merge -p '{"spec": {"template": {"spec": {"providerSpec": {"value": {"rootVolume": {"diskSize" : 50}}}}}}}'
==View results from above==
oc get machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') -o yaml | tee /tmp/$(oc get DNS cluster -o=jsonpath='{.spec.baseDomain}').$(date +%F_%H-%M-%S).yaml
==Update on node only==
VOLUME=abjorklund-01-h4sxm-worker-0-rkk87-root
os volume set --size 40 $VOLUME --os-volume-api-version 3.42
dnf install cloud-utils-growpart xfsprogs
ssh core@worker
growpart /dev/sda 4
xfs_growfs /
=increase ram on worker nodes=
oc patch machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') --type merge -p '{"spec": {"template": {"spec": {"providerSpec": {"value": {"memoryMiB" : 24576}}}}}}'
=Change flavor of worker node=
oc patch machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') --type merge -p '{"spec": {"template": {"spec": {"providerSpec": {"value": {"flavor" : "hm.4x16"}}}}}}'
=set number of worker nodes=
oc patch machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') --type merge -p '{"spec": {"replicas" : 2}}'
=clusteroperator=
ClusterOperator is the Custom Resource object which holds the current state of an operator. Cluster operators are responsible for core, system-wide functions like DNS and so on.
oc get clusteroperators
oc get co
oc get clusteroperators -o custom-columns=NAME:.metadata.name,ANNOTATIONS:.metadata.annotations
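Only show operators that are not healthy (column positions assume the default oc get co output: AVAILABLE PROGRESSING DEGRADED):
 oc get co --no-headers | awk '$3 != "True" || $4 != "False" || $5 != "False"'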
=ignition=
Retrieve rendered ignition data.
curl https://api-int.$(grep ^search /etc/resolv.conf | awk '{print $NF}'):22623/config/master
curl -v https://api-int.$(grep ^search /etc/resolv.conf | awk '{print $2}'):22623/config/worker
=redhat ubi container image variants=
ubi ("Standard"): OpenSSL, microdnf, and utilities like gzip and vi
ubi-minimal ("Minimal"): Minimized binaries and minimal yum stack.
ubi-init ("Multi-service"): Less than standard but more than minimal, plus systemd.
ubi-micro ("Micro"): Most minimal image without even a package manager.
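Run one of them as a throwaway pod (image path is the public Red Hat registry):
 oc run ubi-minimal-test -it --rm --restart=Never --image=registry.access.redhat.com/ubi8/ubi-minimal -- sh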
=create a job/pod/script=
==Create config map of script==
Notice that I have to escape $, since the script is passed via a here document where $ would otherwise be expanded.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: dns-lookup.sh
data:
  dns-lookup.sh: |
    #!/bin/bash
    # Verify if dns resolution works and how fast.
    while true ; do
      for DNS in \$(awk '/^nameserver / {print \$2}' /etc/resolv.conf) 10.2.0.10 ; do
        echo \$(date '+%F %H:%M:%S %Z') \$DNS \$(host -v -t A ibm.se 2>&1 | tail -3 )
      done
      sleep 5
    done
EOF
==create job==
cat <<EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: dns-lookup
spec:
  template:
    spec:
      containers:
        - name: dns-lookup
#          image: rockylinux/rockylinux:9
          image: halfface/rockylinux-toolbox:v2
          command: ["/script/dns-lookup.sh"]
          volumeMounts:
            - name: script
              mountPath: "/script"
#          securityContext:
#            runAsUser: 0
#            privileged: true
      volumes:
        - name: script
          configMap:
            name: dns-lookup.sh
            defaultMode: 0755
      restartPolicy: Never
      activeDeadlineSeconds: 1209600
EOF
=deployment with command=
==Configmap with script. $ is escaped since feed via here document.(bash)==
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: stress.sh
  namespace: abjorklund
data:
  stress.sh: |
    #!/bin/bash
    # stress pod.
    while true ; do
      echo \$(date '+%F %H:%M:%S %Z') \$( stress -m 1 --vm-bytes 3000M --vm-keep -t 300s )
      sleep 5
    done
EOF
==Deployment==
cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
#  name: stress-gp-6x12
  name: stress-hm-6x32
  namespace: abjorklund
  labels:
    app: stress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress
  template:
    metadata:
      labels:
        app: stress
    spec:
      containers:
      - name: stress
        image: halfface/rockylinux-toolbox:v3
        volumeMounts:
        - mountPath: /mnt/bin/
          name: stress
        command: ["/mnt/bin/stress.sh"]
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 500m
            memory: 1Gi
      volumes:
        - name: stress
          configMap:
            name: stress.sh
            defaultMode: 0755
      nodeSelector:
#        node.kubernetes.io/instance-type: gp.6x12
        node.kubernetes.io/instance-type: hm.6x32
EOF
=terminal fix=
No line wraps
tput rmam
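Turn wrapping back on:
 tput smam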
=list operatorhub/catalogsources=
oc get catalogsources -n openshift-marketplace
oc get catalogsources -n openshift-marketplace -o custom-columns=NAME:.metadata.name,DISPLAY:.spec.displayName,STATE:.status.connectionState.lastObservedState,TYPE:.spec.sourceType,PUBLISHER:.spec.publisher,IMAGE:.spec.image
=remove catalogsources=
oc get catalogsources.operators.coreos.com -n openshift-marketplace -l company=cambio --no-headers -o custom-columns=:.metadata.name | while read i ; do echo oc get catalogsources $i -n openshift-marketplace -o yaml \>oc_get_catalogsources.$(oc_api_url).$i.$(date_file).yaml ; echo oc delete catalogsource -n openshift-marketplace $i ; done
=which changes will occur=
. /etc/node-sizing-enabled.env ; NODE_SIZES_ENV=/tmp/node-sizing.env /usr/local/sbin/dynamic-system-reserved-calc.sh true ${SYSTEM_RESERVED_MEMORY} ${SYSTEM_RESERVED_CPU} ${SYSTEM_RESERVED_ES} ; sdiff /etc/node-sizing.env /tmp/node-sizing.env
=SYSTEM_RESERVED=
cat <<EOF | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: dynamic-node
spec:
  autoSizingReserved: true
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
EOF
Which changes will occur.
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'hostname ; . /etc/node-sizing-enabled.env ; NODE_SIZES_ENV=/tmp/node-sizing.env /usr/local/sbin/dynamic-system-reserved-calc.sh true ${SYSTEM_RESERVED_MEMORY} ${SYSTEM_RESERVED_CPU} ${SYSTEM_RESERVED_ES} ; sdiff /etc/node-sizing.env /tmp/node-sizing.env' 2>/dev/null
=CNI=
oc get networks cluster -o 'custom-columns=NETWORKTYPE:.spec.networkType'
Cni from install
echo -e "$(oc --request-timeout=5 get -n kube-system cm/cluster-config-v1 -o json | jq -r '."data"."install-config"')" | python -c 'import sys, yaml, json; json.dump(yaml.safe_load(sys.stdin), sys.stdout, indent=4)' | jq -r .networking.networkType
=autoscale.=
https://docs.openshift.com/container-platform/4.12/machine_management/applying-autoscaling.html
==ClusterAutoscaler==
cat <<EOF | oc apply -f -
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  podPriorityThreshold: -10
  resourceLimits:
    maxNodesTotal: 24
    cores:
      min: 8
      max: 128
    memory:
      min: 4
      max: 256
  logVerbosity: 4
  scaleDown:
    enabled: true
    delayAfterAdd: 10m
    delayAfterDelete: 5m
    delayAfterFailure: 30s
    unneededTime: 5m
    utilizationThreshold: "0.4"
EOF
==MachineAutoscaler==
cat <<EOF | oc apply -f -
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: abjorklund-01-h4sxm-worker-0
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 12
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: abjorklund-01-h4sxm-worker-0
EOF
=autoscaler does not scale down=
oc logs -l cluster-autoscaler=default -n openshift-machine-api --tail=-1 --timestamps=true
=change dns server for domain=
oc edit dns.operator/default
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  name: default
spec:
  servers:
  - name: halffce-server
    zones:
    - halfface.se
    forwardPlugin:
      policy: Random
      upstreams:
      - 10.111.222.2
# View config.
oc get configmap/dns-default -n openshift-dns -o yaml
=coredns=
# Look at recent events.
oc get events -A --sort-by=.metadata.creationTimestamp
# Change debug level.
oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Debug"}}' --type=merge
Sets
log . {
class denial error
}
oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Trace"}}' --type=merge
Sets
log . {
class all
}
oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Normal"}}' --type=merge
Sets
log . {
class error
}
==Get log files for analysis==
oc get pods -l dns.operator.openshift.io/daemonset-dns=default  -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName --no-headers -n openshift-dns | while read i j ; do oc logs $i --tail=-1 -c dns --timestamps=true -n openshift-dns > /tmp/oc_logs_$j.$i.$(oc get DNS cluster -o=jsonpath='{.spec.baseDomain}').$(date +%F_%H-%M-%S) ; done
=get instance dns name=
oc get DNS cluster -o=jsonpath='{.spec.baseDomain}'
=Read values provided by coredns /metrics=
oc exec -it -n openshift-dns $(oc get pods -l dns.operator.openshift.io/daemonset-dns=default --no-headers -n openshift-dns| head -1) -- curl -s http://localhost:9153/metrics
=coredns default logformat=
# Default format
{remote}:{port} - {>id} "{type} {class} {name} {proto} {size} {>do} {>bufsize}" {rcode} {>rflags} {rsize} {duration}
# Values explained
{port}: client’s port
{remote}: client’s IP address, for IPv6 addresses these are enclosed in brackets: [::1]
{>id}: query ID
{type}: qtype of the request
{class}: qclass of the request
{name}: qname of the request
{proto}: protocol used (tcp or udp)
{size}: request size in bytes
{>do}: is the EDNS0 DO (DNSSEC OK) bit set in the query
{>bufsize}: the EDNS0 buffer size advertised in the query
{rcode}: response RCODE
{>rflags}: response flags, each set flag will be displayed, e.g. “aa, tc”. This includes the qr bit as well
{rsize}: raw (uncompressed), response size (a client may receive a smaller response)
{duration}: response duration
=Confirm that coredns hosts are possible to resolve=
<pre>
grep match /etc/coredns/Corefile | uniq | sed 's/\[//g;s/\]//g;s/^ *match //g;s/\.\*/test/g;s/^\^//g' | while read i ; do echo $(dig +short ${i}.) ${i}. ; done
</pre>
=Create lets encrypt certificates on dns domain in route53 which is managed by certmanager.=
#Create a domain in route 53.
#Create a user with a token for "Application running outside AWS"
==Fill in below values to be able to update config below.==
Hosted_Zone_id:    <Hosted_Zone_id>
Access_key:        <Access_key>
Secret_access_key: <Secret_access_key>
DNS_Domain:        <DNS_Domain>
DNS_shortname:    <DNS_shortname>
==Attach the following policy to your newly created user.==
(Populate all <Values> below.)
{
    "Version": "2023-11-22",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "route53:GetChange",
            "Resource": "arn:aws:route53:::change/*"
        },
        {
            "Effect": "Allow",
            "Action": "route53:ChangeResourceRecordSets",
            "Resource": "arn:aws:route53:::hostedzone/<Hosted_Zone_id>"
        },
        {
            "Effect": "Allow",
            "Action": "route53:ListHostedZonesByName",
            "Resource": "*"
        }
    ]
}
==Create namespace==
oc create namespace cert-manager
==Install cert-manager community version via graphical fluff.==
==Create secret that includes <Secret_access_key>.==
oc create secret generic route53-secret --from-literal=secret-access-key="<Secret_access_key>" -n cert-manager
==Create ClusterIssuer for letsencrypt which uses route53 to show that you own dns.==
cat <<EOF | oc apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-dns
  namespace: cert-manager
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: support@company.se
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: <DNS_shortname>-issuer-account-key
    solvers:
      - selector:
          dnsZones:
            - "<DNS_Domain>"
        dns01:
          route53:
            accessKeyID: <Access_key>
            secretAccessKeySecretRef:
              name: route53-secret
              key: secret-access-key
            hostedZoneID: <Hosted_Zone_id>
            region: 'us-east-1'
EOF
==Create api certificate.==
cat <<EOF | oc apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cert-api
  namespace: openshift-config
spec:
  issuerRef:
    name: letsencrypt-prod-dns
    kind: ClusterIssuer
  dnsNames:
      - "api.<DNS_Domain>"
  secretName: le-api-cert
  commonName: "api.<DNS_Domain>"
EOF
==Start to use api certificate.==
oc patch apiserver cluster --type=merge -p '{"spec":{"servingCerts": {"namedCertificates": [{"names": ["api.<DNS_Domain>"], "servingCertificate": {"name": "le-api-cert"}}]}}}'
==Create ingress certificate==
cat <<EOF | oc apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: le-wildcard-apps-certificate
  namespace: openshift-ingress
spec:
  issuerRef:
    name: letsencrypt-prod-dns
    kind: ClusterIssuer
  dnsNames:
    - "*.apps.<DNS_Domain>"
  secretName: le-wildcard-apps-certificate
  commonName: "*.apps.<DNS_Domain>"
EOF
==Start to use ingress certificate.==
oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default --patch '{"spec":{"defaultCertificate":{"name":"le-wildcard-apps-certificate"}}}'
=resolv.conf=
ndots 5. This means that the DNS client will automatically consider a domain name to be fully qualified (which will allow it to skip the search path iteration) if it has five or more dots.
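Typical pod resolv.conf for reference (namespace and nameserver depend on the cluster; 172.30.0.10 is the common default service DNS IP):
 search myproject.svc.cluster.local svc.cluster.local cluster.local
 nameserver 172.30.0.10
 options ndots:5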
=bind to external login sources ldap ad=
oc get authentications.operator.openshift.io cluster -o yaml
=get machine name and creation time=
oc get machines -o=custom-columns='NAME:.metadata.name,CREATIONTIMESTAMP:.metadata.creationTimestamp,TYPE:.spec.providerSpec.value.flavor,STATUS:.status.phase' -n openshift-machine-api
=setup nfs server=
nfs export shared between pods.
==Create server==
openstack server create --flavor gp.1x2 --availability-zone europe-se-1a --image rocky-8-x86_64 --boot-from-volume 30 --network abjorklund-01-bmc7w-openshift --security-group ssh_allow --key-name abjorklund_ed25519 abjorklund_$(date_file)
openstack volume create --size 50 --type ssd --description "nfs storage block device 0" nfs_storage_abjorklund-01
openstack server add volume e93d2db1-6d95-4364-a236-0bd1b9255e90 28adbdb9-c88d-4397-9a79-b13c505016a8 --device /dev/vdb
==install nfs dependencies==
dnf -y install cloud-utils-growpart nfs-utils iptables-utils epel-release vim-enhanced
==How to grow filesystem.==
partx growpart
os volume set --size 60 nfs_storage_abjorklund-01 --os-volume-api-version 3.42
==Create partition and filesystem.==
gdisk /dev/sdb
mkfs.ext4 /dev/sdb1
find /dev/ -ls | grep sdb | grep by-uuid
==Mount drive. /etc/fstab==
UUID=66998126-9f18-44ce-a462-827c870a57bd /netstorage                      ext4    defaults        0 0
mkdir /netstorage
mount /netstorage/
mkdir /netstorage/abjorklund-01
chmod 777 /netstorage/abjorklund-01
==export drive==
systemctl enable nfs-server.service --now
/etc/exports
/netstorage/abjorklund-01 10.1.0.0/16(rw,root_squash)
exportfs -rav
==setup deployment==
# deployment.yaml
cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/worker
                    operator: Exists
      serviceAccountName: nfs-client-provisioner
      securityContext:
        supplementalGroups:
          - 65534
          - 1261150637
      containers:
        - name: nfs-client-provisioner
          image: gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: auto-nfs-storage
            - name: NFS_SERVER
              value: 10.1.0.48
            - name: NFS_PATH
              value: "/netstorage/abjorklund-01"
      volumes:
        - name: nfs-client-root
          nfs:
            server: 10.1.0.48
            path: /netstorage/abjorklund-01
EOF
# nfs-clusterrolebinding.yaml
cat <<EOF | oc apply -f -
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
EOF
# nfs-clusterrole.yaml
cat <<EOF | oc apply -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
EOF
# nfs-rolebinding.yaml
cat <<EOF | oc apply -f -
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
EOF
# nfs-role.yaml
cat <<EOF | oc apply -f -
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
EOF
# nfs-sa.yaml
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
EOF
# storageclass.yaml
cat <<EOF | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: auto-nfs-storage # or choose another name, must match the deployment's env PROVISIONER_NAME
parameters:
  onDelete: delete
EOF
# test-claim.yaml
cat <<EOF | oc apply -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
  namespace: default
spec:
  storageClassName: managed-nfs-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF
=set nfs csi driver=
https://github.com/kubernetes-csi/csi-driver-nfs
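A minimal StorageClass sketch for that driver, reusing the NFS server from the setup above (provisioner name nfs.csi.k8s.io is the driver's default; the driver itself must be installed first as described in the repo's README):
# nfs-csi storageclass (sketch)
cat <<EOF | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: 10.1.0.48
  share: /netstorage/abjorklund-01
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF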
=dns=
https://access.redhat.com/solutions/3804501
==confirm upstream dns works==
for UPSTREAM_DNS_IP in 10.46.201.1 10.46.201.2 10.46.201.3 ; do UPSTREAM_DNS_PORT=53 ; echo -e "\nTCP\n"; for dnspod in `oc get pods -n openshift-dns -o name --no-headers -l dns.operator.openshift.io/daemonset-dns=default`; do echo "Pod $dnspod"; oc exec -n openshift-dns -c dns $dnspod -- dig @${UPSTREAM_DNS_IP} redhat.com -p ${UPSTREAM_DNS_PORT} +tcp +short; echo; done ; done
for UPSTREAM_DNS_IP in 10.46.201.1 10.46.201.2 10.46.201.3 ; do UPSTREAM_DNS_PORT=53 ; echo -e "\nUDP\n"; for dnspod in `oc get pods -n openshift-dns -o name --no-headers -l dns.operator.openshift.io/daemonset-dns=default`; do echo "Pod $dnspod"; oc exec -n openshift-dns -c dns $dnspod -- dig @${UPSTREAM_DNS_IP} redhat.com -p ${UPSTREAM_DNS_PORT} +notcp +short; echo; done ; done
=image=
Which image registries are allowed or blocked.
oc get image.config.openshift.io cluster -o yaml
=enable sso with keycloak=
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  annotations: {}
  labels:
    app.kubernetes.io/instance: sso
  name: cluster
spec:
  identityProviders:
    - mappingMethod: add
      name: SSO
      openID:
        claims:
          email:
            - email
          groups:
            - groups
          name:
            - name
          preferredUsername:
            - preferred_username
        clientID: <Client name in keycloak>
        clientSecret:
          name: keycloak-client-secret
        extraScopes: []
        issuer: <URL to issuer>
      type: OpenID
---
apiVersion: v1
data:
  clientSecret: <base64 secret>
kind: Secret
metadata:
  labels:
    app.kubernetes.io/instance: sso
  name: keycloak-client-secret
  namespace: openshift-config
=keepalive/api/ingress=
On the nodes, keepalived serves the same IP (VIP) for the API or ingress.
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'echo "# unicast_peer" > /etc/keepalived/keepalived.conf'
Get info about where ingress is running.
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'ip a' 2>&1 | tee /tmp/tmp ; grep $(host $(oc whoami --show-server | awk -F ':|/' '{print $4}') | awk '{print $NF}') /tmp/tmp
=diff rendered mc=
export OLD_RENDERED=rendered-infra-6c7e5fc796264dd32341950aea971807 ; export NEW_RENDERED=rendered-infra-bac1dd431374a5c4c21742e547739c7c ; diff -NrU 5 <(oc get mc ${OLD_RENDERED} -o json) <(oc get mc ${NEW_RENDERED} -o json)
=secret management=
List secrets of type kubernetes.io/tls.
oc get secrets --field-selector type=kubernetes.io/tls
=ocm=
==ocm install==
(cd /usr/local/bin/ ; sudo curl -vLsk https://github.com/openshift-online/ocm-cli/releases/download/v0.1.72/ocm-linux-amd64 -o ocm ; sudo chmod 755 ocm)
==ocm search examples==
ocm list clusters --parameter search="name like 'da0d9ade-d649-4948-8bc6-744a1fcb0960'"
ocm get /api/clusters_mgmt/v1/clusters --parameter search="name like '0047ccf6-134b-4bff-99e0-5f2d6532a3ea'"
ocm get /api/accounts_mgmt/v1/subscriptions/ --parameter size=1000 | jq -r '.items[]| .display_name +"\t"+ .status +"\t"+ .cluster_id +"\t"+ .created_at' | grep -v Archived | column_tab
Search for two states.
ocm get /api/accounts_mgmt/v1/subscriptions/ --parameter search="status like 'Active' or status like 'Stale'" --parameter size=1000
=PodDisruptionBudget=
API object that specifies the minimum number of replicas that must be up at a time.
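A minimal sketch (name, label and minAvailable are examples):
cat <<EOF | oc apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  minAvailable: 2          # keep at least 2 matching pods up during voluntary disruptions
  selector:
    matchLabels:
      app: example
EOF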
=pod placement=
Check whether the pods running on the worker nodes look sane: list the pods on each worker node and then look up the same pods across all nodes.
oc get nodes --no-headers --selector='node-role.kubernetes.io/worker,!node-role.kubernetes.io/infra' -o=custom-columns='NAME:.metadata.name' | while read NODE ; do oc get pods -A -o wide --no-headers --field-selector "spec.nodeName=$NODE" | while read NAMESPACE POD REST ; do echo '#' $NAMESPACE ${POD%-*} ; oc get pods -n $NAMESPACE -o wide | grep ${POD%-*} ; done ; done | less -ISRM
Are any user pods running outside worker nodes?
oc get project --no-headers  -o=custom-columns='NAME:.metadata.name' | grep -v ^openshift- | while read NAMESPACE ; do echo '*' $NAMESPACE ; oc get pods -o wide -n $NAMESPACE ; done
=wait=
Wait for kafka getting ready.
kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka
=list configured ssh public keys=
oc get machineconfig --no-headers -o custom-columns=":metadata.name" | grep -E '^99-.*-ssh$' | while read MACHINECONFIG ; do echo '*' "${MACHINECONFIG}" ; oc get machineconfig "${MACHINECONFIG}" -o json | jq -r '.spec.config.passwd.users[].sshAuthorizedKeys[]'; done
=Add key for ssh login=
oc get machineconfig --no-headers -o custom-columns=":metadata.name" | grep -E '^99-.*-ssh$' | while read MACHINE_CONFIG_SSH ; do echo '*' $MACHINE_CONFIG_SSH ; oc patch machineconfig $MACHINE_CONFIG_SSH --type=json --patch="[{\"op\":\"add\", \"path\":\"/spec/config/passwd/users/0/sshAuthorizedKeys/-\", \"value\":\"$(cat $HOME/.ssh/id_ed25519.pub)\"}]" ; done
Same as above, but saving the existing machineconfig first (oc_script_log).
oc get machineconfig --no-headers -o custom-columns=":metadata.name" | grep -E '^99-.*-ssh$' | while read MACHINE_CONFIG_SSH ; do echo '*' $MACHINE_CONFIG_SSH ; oc_script_log oc get machineconfig $MACHINE_CONFIG_SSH -o yaml </dev/null ; oc patch machineconfig $MACHINE_CONFIG_SSH --type=json --patch="[{\"op\":\"add\", \"path\":\"/spec/config/passwd/users/0/sshAuthorizedKeys/-\", \"value\":\"$(cat $HOME/.ssh/id_ed25519.pub)\"}]" ; done
=readable output from df.=
df -lh | grep -Ev '^overlay|^tmpfs|^shm|^nsfs|^cgroup|^devtmpfs'
=give me openstack credentials=
oc get secret -n kube-system openstack-credentials -o json | jq -r '.data."clouds.yaml" | @base64d'
=extract content of container=
CONT_ID=$(docker create nginx:latest)
docker export ${CONT_ID} -o nginx.tar   # docker export writes an uncompressed tar
=shut down openshift=
Stolen with pride: https://docs.openshift.com/container-platform/4.12/backup_and_restore/graceful-cluster-shutdown.html
# Etcd backup.
# Do we use proxy.
oc get proxy cluster -o yaml
# Make an etcd backup.
oc debug --as-root node/$(oc get nodes --no-headers --selector='node-role.kubernetes.io/master' -o=custom-columns='NAME:.metadata.name' | head -1) -- chroot /host sh -c '/usr/local/bin/cluster-backup.sh /home/core/assets/backup'
# Copy files locally.
MASTER=node/$(oc get nodes --no-headers --selector='node-role.kubernetes.io/master' -o=custom-columns='NAME:.metadata.name' | head -1) ; oc debug $MASTER -- chroot /host sh -c 'ls /home/core/assets/backup/*' 2>/dev/null | while read ETCD_BACKUP ; do echo '*' Copying ${ETCD_BACKUP##*/} ; oc debug $MASTER -- chroot /host sh -c "cat $ETCD_BACKUP | gzip -9" | zcat > ${ETCD_BACKUP##*/} ; done
# Confirm files are ok.
MASTER=node/$(oc get nodes --no-headers --selector='node-role.kubernetes.io/master' -o=custom-columns='NAME:.metadata.name' | head -1) ; oc debug $MASTER -- chroot /host sh -c 'ls /home/core/assets/backup/*' 2>/dev/null | while read ETCD_BACKUP ; do echo '*' md5sum ${ETCD_BACKUP##*/} ; oc debug $MASTER -- chroot /host sh -c "md5sum $ETCD_BACKUP" 2>/dev/null ; md5sum ${ETCD_BACKUP##*/} ; done
# When does the certificate expire.
oc -n openshift-kube-apiserver-operator get secret kube-apiserver-to-kubelet-signer -o jsonpath='{.metadata.annotations.auth\.openshift\.io/certificate-not-after}{"\n"}'
# kubelet client/server certificate expiration.
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate; openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -noout -enddate'
# If certs expire while the cluster is shut down, we have to approve CSRs manually when the cluster comes back up.
# oc get csr -o name | xargs oc adm certificate approve
# Shutdown all nodes.
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'shutdown -h 1'
# The nodes can now stay down until you revive them.
# To start them up, use a command similar to this one (OpenStack example).
openstack server list -f value | grep SHUTOFF | awk '{print $2}' | xargs openstack server start
=statefulset=
StatefulSet is a Kubernetes controller designed to manage stateful applications that require stable network identities and persistent storage. It handles the deployment, scaling, and management of pods in an ordered and predictable manner, making it ideal for databases, distributed systems, and other applications where state preservation is critical.
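A minimal sketch showing the stable identity and per-replica storage (names, image and sizes are examples; the image is reused from the toolbox examples elsewhere on this page):
cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db
spec:
  serviceName: example-db          # headless service that provides stable network identities
  replicas: 3
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      containers:
        - name: db
          image: halfface/rockylinux-toolbox:v3
          command: ["sleep"]
          args: ["infinity"]
          volumeMounts:
            - name: data
              mountPath: /var/lib/data
  volumeClaimTemplates:            # each replica gets its own PVC (data-example-db-0, -1, -2)
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
EOF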
=oc diff=
See which changes would be made
kubectl diff -f <manifest>.yaml
=taint=
Remove taint from node.
kubectl taint node control-plane0.novalocal control-plane1.novalocal control-plane2.novalocal node.cloudprovider.kubernetes.io/uninitialized-
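Add a taint (the counterpart of the removal above; key, value and effect are examples):
kubectl taint node <node> key1=value1:NoSchedule      # effects: NoSchedule, PreferNoSchedule, NoExecute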
=Sealed secrets=
==get the sealed secret that you want to decrypt==
oc get sealedsecrets -n openshift-config ldap-secret -o yaml > sealedsecrets_-n_openshift-config_ldap-secret
==Decrypt sealed secrets==
kubeseal --recovery-private-key <private_key_file> --recovery-unseal < sealedsecrets_-n_openshift-config_ldap-secret > sealedsecrets_-n_openshift-config_ldap-secret.unsealed
==Get private keys from Sealed Secrets==
oc get secret -n kube-system -l sealedsecrets.bitnami.com/sealed-secrets-key -o json | jq -r '.items[].data."tls.key"' | while read LINE ; do echo $LINE | base64 -d > $(echo "${LINE}" | cut -c -100) ; done
=imagetag=
ImageTag represents a single tag within an image stream and includes the spec, the status history, and the currently referenced image (if any) of the provided tag
"alertname": "SamplesImagestreamImportFailing",
"namespace": "openshift-cluster-samples-operator",
# Remove tags with failed imports
oc -n openshift get imagetag | grep "ImportFailed" | awk -e '{ print $1 }' | xargs oc -n openshift tag -d
=custom-column examples=
oc get machine -n openshift-machine-api -o custom-columns=MACHINE:.metadata.name,SERVERGROUPNAME:.spec.providerSpec.value.serverGroupName,CREATIONTIME:.metadata.creationTimestamp --no-headers

Latest revision as of 15:45, 22 November 2024

What does it mean?

acme                  Automated Certificate Management Environment
annotations           Key=value pairs. That provides metadata for object.
ceph                  Delivers object, block, and file storage in one unified system.
ceph-osd              object storage daemon for the Ceph distributed file system. It is responsible for storing objects on a local file system and providing access to them over the network.
clbo                  CrashLoopBackOff
clo                   Cluster Logging Operator
cmo                   Cluster Monitoring Operator
cncf                  Cloud Native Computing Foundation
cni                   Container Network Interface (OVNKubernetes OpenShiftSDN)
cns                   Cloud Native Storage
cnv                   Container-native Virtualization, add-on to OpenShift Container Platform that allows virtual machine workloads to run and be managed alongside container workloads.
co                    Cluster Operator
cpi                   Cloud Provider Interface
cr                    Custom Resource. (I found it like something added by enabling something. You get it from "oc api-resources")
crd                   Custom Resource Definition. The name of a CRD object must be a valid DNS subdomain name.
cri                   Container Runtime Interface
cri-o                 Lightweight container runtime for kubernetes.
csi                   Container Storage Interface
csm                   Container Storage Modules
csv                   cluster service version
cvo                   Cluster Version Operator
cvss                  Common Vulnerability Scoring System
daemonset             Ensures that all (or some) Nodes run a copy of a Pod
deployment            You describe a desired state in a Deployment. Deployment object describes how to create or modify pods that hold a containerized application by defining the desired state of a particular component. Deployments create and manage how ReplicaSets are deployed.
eo                    ElasticSearch Operator
ephemeral             Short lived, temporary
eus                   Extended Update Support
Fluentd               data collector designed to handle logging by unifying and processing data from various sources.
fluent bit            lightweight and high-performance data collector. logs but can handle metrics too. 
fsgroup               Group which Kubernetes will change the permissions of all files in volumes to when volumes are mounted by a pod. 
geneve                Generic Network Virtualization Encapsulation OVN-Kubernetes uses Geneve.
grpc                  gRPC Remote Procedure Calls (originally from Google), a framework that brings performance benefits and modern features to client-server applications. Like RPC.
icsp                  ImageContentSourcePolicy. Points image pulls at mirror registries (for example a mirrored release payload registry).
idp                   identity provider
idps                  identity providers
implicit              indirect, hinted,
ingressclass          use multiple ingress controllers managing network traffic routing within a cluster.
ipc namespace         Each IPC namespace has its own set of System V IPC identifiers and its own POSIX message queue filesystem.
ipi                   Installer-Provisioned Infrastructure
kcs                   Knowledge Centered Support, Red Hat's way of offering solutions and articles for known questions or problems.
kubelet               Kubelet is the primary "node agent" that runs on each node. Takes a set of PodSpecs (primarily through the apiserver) and ensures the containers described are running and healthy.
kvdb                  key-value store (portworx)
machineset            Manages a group of machines with similar characteristics and keeps the desired number of machines running.
manifest              Manifest is a YAML or JSON file that describes the desired state of a Kubernetes object.
mco                   machine-config-operator
mcp                   machine config pools
Metricbeat            lightweight shipper for metrics
noobaa                data service for cloud environments, providing an S3 object-store interface with flexible tiering, mirroring, and spread placement policies over any storage resource that allows GET/PUT, including S3 and GCS.
nsfs                  virtual filesystem making Linux-kernel namespaces available.
oadp                  openshift api data protection
oci                   Open Container Initiative
ocm                   OpenShift Cluster Manager
ocp                   OpenShift Container Platform
ocs                   OpenShift Container Storage
odf                   OpenShift Data Foundation
oidc                  OpenID Connect, is an identity layer on top of the OAuth 2.0 protocol.
olm                   Operator Lifecycle Manager
osm                   Open Service Mesh. Lightweight, extensible, cloud native service mesh
ovnk                  Open Virtual Network Kubernetes
pdb                   Pod Disruption Budget
pvc                   Persistent volume claim. binding between a Pod and Persistent Volume.
pv                    Persistent volume. Persistent storage. low level representation of a storage volume.
prometheus            Prometheus is a time-series database (TSDB) that handles the collection, storage, and querying of time-series data, and drives alerting.
provisioner           A StorageClass object contains a provisioner that decides which volume plugin is used to provision PersistentVolumes.
quay.io               builds, analyzes, distributes your container images. Owned by IBM
ReadWriteMany         Storage read/write for many.
reconciling           Driving the actual state of the cluster toward the desired state declared in the object spec.
registrar             The node-driver-registrar is a sidecar container that registers the CSI driver with Kubelet using the kubelet plugin registration mechanism.
replicaset            Maintain a stable set of replica Pods running at any given time
rhacm                 Red Hat Advanced Cluster Management for Kubernetes 
rhcos                 Red Hat Enterprise Linux CoreOS
rhcp                  Red Hat Ceph Storage
rhcs                  Red Hat Cluster Suite
rhocp                 Red Hat OpenShift Container Platform
rhol                  Red Hat OpenShift Logging
rook                  Operator. File, block, and object storage for your cloud native environment and is based on battle tested ceph storage.
rosa                  Red Hat OpenShift Service on AWS
s2i                   source-to-image
sa                    Service Account
scc                   security context constraints
sc                    security context
seccomp               Secure computing mode profiles can be associated with a container to restrict available system calls.
SelfLink              URL representing the given object.
service               Logical abstraction for a deployed group of pods in a cluster (which all perform the same function).
skopeo                Command line utility used to interact with local and remote container images and container image registries
StatefulSet           Workload object to manage stateful applications. Deployment and scaling Pods, ordering and uniqueness of Pods.
Storage Class         allows for dynamic provisioning of Persistent Volumes.
svc                   service
taint                 Taints ensure that pods are scheduled onto appropriate nodes. You can apply one or more taints on a node.
tekton                Container-native way to manage CI/CD. It's also the basis for OpenShift Pipelines.
thanos                Long-Term storage for your Prometheus Metrics on OpenShift
toleration            You can apply tolerations to pods. Tolerations allow the scheduler to schedule pods with matching taints.
ubi                   Universal Base Images OCI-compliant container base operating system images with complementary runtime languages and packages that are freely redistributable.
upi                   User-Provisioned Infrastructure
uts                   Unix Timesharing System namespace. Controls the hostname and the NIS domain.
uWSGI                 Project aims at developing a full stack for building hosting services.
vxlan                 Virtual Extensible LAN. The OpenShift SDN uses Open vSwitch VXLAN tunnels, OpenFlow rules, and iptables.
wwn                   world wide names. Fiber channel

where do I start

. <(oc completion bash)  Get bash completion running.
oc help                  Get commands
oc api-resources         What can you use commands on.
oc options               Which options apply to all commands

read

https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/

Projects that I have read about but forgotten

OpenEBS              Storage solution. Possible backends: local, nfs, zfs, nvme. cStor serves iSCSI block storage using the underlying disks or cloud volumes in a cloud-native way.

files of value

metadata.json         File created during install. Used by openshift-install destroy cluster

oc get

Available resources to ask about.

oc api-resources

Get everything

oc api-resources -o name --no-headers | while read i ; do echo '***' $i ; oc get $i -A -o yaml 2>&1 ; done > /tmp/oc_api-resourece.$(oc whoami --show-server | awk -F ':|/' '{print $4}').$(date +%F_%H-%M-%S)

login

oc login https://openshift:6443 --username developer

switch user

oc login --username developer

which clusters have you logged into

oc config get-clusters

List projects

oc projects
oc get projects

select project

oc project $project
kubectl config set-context --current --namespace=kube-public

create project/namespace

oc create namespace redis

list pods

oc get pods
oc get pods --all-namespaces
oc get pods -o wide

wide shows which node each pod is running on.

oc get pods -o wide --all-namespaces

Get pods that are not running.

oc get pods --field-selector status.phase!=Running --all-namespaces
oc get pods -A --no-headers | grep -v Completed | while read LINE ; do PODS=$(awk '{print $3}' <<< "${LINE}") ; if [ "${PODS%%/*}" != "${PODS##*/}" ] ; then echo "${LINE}" ; fi ; done

Get pods matching two states

oc get pods --field-selector=status.phase!=Running,spec.restartPolicy=Always
oc get nodes --no-headers --selector='node-role.kubernetes.io/worker,!node-role.kubernetes.io/infra'

Get pods running on specific node

oc get pods -A -o wide --field-selector spec.nodeName=<node>

Get pods with label name=portworx-proxy

oc get pods -A -l name=portworx-proxy

Get pods with several labels

oc get pod -l 'app in (rook-ceph-mon,rook-ceph-operator,rook-ceph-osd,rook-ceph-rgw,rook-ceph-mgr,rook-ceph-mds,rook-ceph-crashcollector)'

Get pods with extra column port.

kubectl get pods --output=custom-columns=NAME:.metadata.name,NAMESPACE:.metadata.namespace,IP:.status.podIPs[*].ip,POD_PORT:.spec.containers[*].ports[*].containerPort

Get pods with column restarts

oc get pods -o custom-columns='NAMESPACE:.metadata.namespace,POD:.metadata.name,RESTART:.status.containerStatuses[*].restartCount' -A | sort -k3 -n | tail -10

Service

A Kubernetes Service is an abstraction that defines a logical set of Pods and a policy by which to access them. Services enable loose coupling between dependent Pods.
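
A minimal sketch (name, label and ports are examples):

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Service
metadata:
  name: example-svc
spec:
  selector:
    app: example        # pods carrying this label back the service
  ports:
    - port: 80          # port the service listens on
      targetPort: 8080  # port on the pods
EOF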

Endpoint

An Endpoint is an object that represents the IP addresses and ports of the Pods that back a Service. When a Service is created, Kubernetes automatically creates an associated Endpoints object.
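
Look at the Endpoints object that backs a service (service name is an example):

oc get endpoints example-svc -o yaml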

EndpointSlices

EndpointSlices offer a scalable, efficient, and feature-rich alternative to traditional Endpoints, topology.
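
List the EndpointSlices for a service via the kubernetes.io/service-name label (service name is an example):

oc get endpointslices -l kubernetes.io/service-name=example-svc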

get shell on node

It is possible to debug more than nodes (deployments, builds, or jobs).

oc debug node/infra-2.ocpdev.lkl.ltkalmar.se

Get working env

chroot /host

Connect to node in eks.

kubectl debug node/<node> -it --image=halfface/rockylinux-toolbox:v3

get debug information from oc

oc debug --loglevel=10 node/$node

debug pod run as root disable health checks

oc debug deployment/my-deployment-name --as-root

get nodes

oc get nodes
oc get nodes -o jsonpath='{.items[*].metadata.name}'
  1. Get nodes without headers: name, CPUs, disk size, memory, IP address.
oc get nodes --no-headers --selector="node-role.kubernetes.io/worker" -o=custom-columns='NAME:.metadata.name,CPU:.status.capacity.cpu,DISK:.status.capacity.ephemeral-storage,MEM:.status.capacity.memory,IP:.status.addresses[?(@.type=="InternalIP")].address'
  1. Get node name and ip address.
oc get nodes --no-headers --selector="node-role.kubernetes.io/worker" -o=custom-columns='NAME:.metadata.name,IP:.status.addresses[?(@.type=="InternalIP")].address'

ip address of node

Outside pod.

oc get pod --template '{{.status.podIP}}' openshift-gitops-application-controller-0

Inside pod.

echo $POD_IP

get nodes that are overcommitted

oc get nodes -o jsonpath='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{end}' | sed 's/:/=node;/g' | sed 's/;/\n/g' | grep -vE 'MemoryPressure=False|DiskPressure=False|PIDPressure=False|Ready=True'

Does any node stick out.

oc get nodes --no-headers -o=custom-columns=NAME:.metadata.name,CONDITIONS:.status.conditions

connect to pod

oc rsh $pod bash

list containers in pod

oc get pod/router-default-6b76b87c6-5m7h6 -n openshift-ingress -o json | jq -r '.spec.containers[].name'
router
logs

list all containers running in a cluster

kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" | tr -s '[[:space:]]' '\n' | sort | uniq -c

connect to container in pod

oc rsh -c router pod/router-default-6b76b87c6-5m7h6

get logs from all containers excluding namespace ^openshift from last 24 hours with timestamp

oc get pods --no-headers --field-selector status.phase=Running -A -o custom-columns=NAMESPACE:.metadata.namespace,POD:.metadata.name | grep -v ^openshift | while read NAMESPACE POD ; do for CONTAINER in $(oc get pod $POD -n $NAMESPACE -o json | jq -r '.spec.containers[].name') ; do echo oc logs -n ${NAMESPACE} ${POD} -c ${CONTAINER} ; oc logs -n ${NAMESPACE} $POD -c $CONTAINER --since=24h --timestamps=true 2>&1 | grep "Error: getaddrinfo EAI_AGAIN " ; done ; done

tail logs for pods matching label

oc logs -n openshift-storage -l app=csi-cephfsplugin -c driver-registrar -f  --max-log-requests 8 --tail=1
oc logs -n openshift-cluster-storage-operator -l name=vsphere-problem-detector-operator --tail=-1
oc logs -f --tail=0 router-default-6c666984fd-ct8zf logs
oc logs -f --namespace openshift-gitops deployment/openshift-gitops-server

Search for log entries locally on node

ls -la $(ls -la $(grep -l EAI_AGAIN /var/log/containers/*) | awk '{print $NF}')
grep -rl EAI_AGAIN /var/log/pods/

execute command in pod

oc exec pod/router-default-545ffb97db-4h9rx -- $command
kubectl exec --stdin --tty shell-demo -- /bin/bash

execute command on all nodes

oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'echo $HOSTNAME && chronyc sources'

execute command in all containers

oc get pods --no-headers -o 'custom-columns=:.metadata.namespace,:.metadata.name' -A | while read NAMESPACE POD ; do
  for CONTAINER in $(oc get -n $NAMESPACE pod/$POD -o json | jq -r '.spec.containers[].name') ; do
    echo '***' $NAMESPACE $POD $CONTAINER
    echo $(oc exec -c $CONTAINER -n $NAMESPACE $POD -- curl -m1 -skv https://inter.net 2>&1 | tr -d '\n')
  done
done | tee /tmp/$(oc whoami --show-server | awk -F ':|/' '{print $4}').$(date +%F_%H-%M-%S)

where am i

POD_NAME=rook-ceph-operator-6c86f788d5-f8mqf
POD_NAMESPACE=openshift-storage

describe pods

oc describe pods
oc describe pod stage-sales-62-qjd

To get (almost) all objects with a specific label from the current project, execute:

oc get all -l '<label_name>=<label_value>'
oc get pods -n openshift-storage -o name -l app=rook-ceph-operator

get config from pod in yaml format

oc get pods router-default-545ffb97db-kgsdb -o yaml

get deployments

oc get deployments --all-namespaces

set environment variable in pod

oc set env dc/your-app-name COLOR=blue

unset environment variable in pod

oc set env dc/your-app-name COLOR-

list environment variables

oc set env pod/router-default-545ffb97db-lj2t5 --list

list templates

oc get templates -n openshift

Custom resource definitions.(crd)

oc get crd

sort

CREATED AT

oc get crd --sort-by=.metadata.creationTimestamp

edit

oc edit deployment.apps/router-default

Watch changes taking place.

watch -n1 oc get all

grant permission to project

oc adm policy add-role-to-user view developer -n mysecrets

grant permission to group

oc adm policy add-cluster-role-to-group cluster-admin admin

grant a user cluster-admin permissions through group

# create a new group.
oc adm groups new cluster-admin
# Bind cluster-admin Role to the Group
oc adm policy add-cluster-role-to-group cluster-admin cluster-admin
# Add user to group
oc adm groups add-users cluster-admin T1.anbj15

grant unrestricted access to a service account

oc adm policy add-scc-to-user privileged system:serviceaccount:isilon:isilon-node

which pods use scc?

oc get project -o=custom-columns='NAME:.metadata.name' --no-headers | grep -v openshift | while read NAMESPACE ; do echo '*' $NAMESPACE ; oc get pods -o=custom-columns='NAME:.metadata.name,SCC:.metadata.annotations.openshift\.io\/scc' --no-headers -n $NAMESPACE | grep restricted-v2 ; done
oc get pods --all-namespaces -o=jsonpath='{range .items[*]}{@.metadata.name}{"\t"}{@.metadata.namespace}{"\t"}{@.metadata.annotations.openshift\.io/scc}{"\n"}' | column_tab | less

crictl

List running containers

crictl ps
crictl ps --all | grep -i coredns

List all pods

crictl pods

List all images

crictl images

Execute a command in a running container

crictl exec -it 1f73f2d81bf98 /bin/sh

crictl logs

crictl logs

nsenter

run program in different namespaces
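
A sketch for entering a container's network namespace on a node (assumes a container ID taken from crictl ps):

PID=$(crictl inspect --output go-template --template '{{.info.pid}}' <container_id>)
nsenter --target ${PID} --net ip addr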

which version

Get version of various objects

oc version

Only get cluster version

oc get clusterversion
oc get clusterversion -o json|jq -r '.items[0].spec| .channel, .desiredUpdate.version'

copy files from pod

Copy session keys locally.

oc rsync caas-2-8s6cl:/tmp/sslkeylog .

tcpdump from nodes

ssh $node
toolbox

rm toolbox

toolbox rm --force <container>

oc get route -A

get routing.

oc describe route sales -n hlt-prod

Name:                   sales
Namespace:              hlt-prod
Created:                13 months ago
Labels:                 <none>
Annotations:            haproxy.router.openshift.io/balance=roundrobin
                        haproxy.router.openshift.io/disable_cookies=true
Requested Host:         sales.prod.bobcat.hlt.se
                           exposed on router default (host apps.ocpprod.lkl.ltkalmar.se) 13 months ago
Path:                   <none>
TLS Termination:        edge
Insecure Policy:        <none>
Endpoint Port:          port-8000-tcp

Service:        sales
Weight:         100 (100%)
Endpoints:      10.160.7.166:8000, 10.160.7.167:8000, 10.160.7.168:8000 + 35 more...

oc get pods (selecting specific pods)

Only name without headers

oc get pods -o custom-columns=POD:.metadata.name --no-headers -A

Describe Failing pods.

oc get pods -A --field-selector=status.phase=Failed --no-headers | while read NAME_SPACE POD REST_OF_LINE ; do echo '*' $POD ${NAME_SPACE} ; oc describe pod $POD -n "${NAME_SPACE}" ; done | less -ISRM

get pod label:s

oc get pods --show-labels

get subscriptions

oc get subscriptions -A

delete subscription

oc delete subscription openshift-gitops-operator -n openshift-operators

get available channels for subscription

oc get PackageManifest $OPERATOR -o json | jq -r '.status.channels[] | .name,.currentCSV'

update channel

oc patch subscriptions -n $NAMESPACE $OPERATOR --type merge -p '{"spec": {"channel": "stable-4.12"}}'

delete clusterserviceversion

oc delete clusterserviceversion openshift-gitops-operator.v1.7.4

whoami

oc whoami
oc config current-context
oc whoami --show-console=true --show-context=true

Which is the console url?

oc whoami --show-console

Which is the api url?

oc whoami --show-server

get instance url

oc get routes -n openshift-console console

create an htpasswd user

kubernetes create htpasswd user

oc create user imageregistry
oc create identity htpasswd:imageregistry
oc create useridentitymapping htpasswd:imageregistry imageregistry

Create user/password to feed kubernetes with.

htpasswd -c -B -b htpasswd imageregistry P@ssW0rd
oc create secret generic htpass-secret --from-file=htpasswd=htpasswd -n openshift-config

Get htpasswd users.

oc get secret htpass-secret -ojsonpath={.data.htpasswd} -n openshift-config | base64 --decode

Enable htpasswd login.

oc edit oauth cluster
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret

look at oauth config.

oc get oauth cluster -o yaml

Create service account.

https://docs.openshift.com/container-platform/4.13/authentication/understanding-and-creating-service-accounts.html

get list of user

oc config view -o jsonpath='{.users[*].name}'

list contexts

oc config get-contexts

use-context

oc config use-context openshift-marketplace/api-abjorklund-01-rbcloud-net:6443/kube:admin

oc explain pv

oc explain pv

oc get configmap cluster-monitoring-config

put node offline

Mark a node as unschedulable.

oc adm cordon node1

Drain a node in preparation for maintenance.

oc adm drain <node> --force --delete-emptydir-data --ignore-daemonsets
oc adm drain <node> --ignore-daemonsets --force --grace-period=30 --delete-local-data
oc adm drain <node> --force --delete-emptydir-data --grace-period=1 --ignore-daemonsets

Mark node as online.

oc adm uncordon node1

Extend memory on node.

# Add memory to master nodes.
NODE=costest-ph9l4-master-1
oc adm cordon $NODE
oc adm drain $NODE --force --delete-emptydir-data --grace-period=1 --ignore-daemonsets
timeout 10 oc debug node/$NODE -- chroot /host sh -c 'echo $HOSTNAME && sudo shutdown -P now'
govc vm.power -off /RGK/vm/costest-ph9l4/$NODE
govc vm.info /RGK/vm/costest-ph9l4/$NODE
govc vm.change -vm /RGK/vm/costest-ph9l4/$NODE -m 20480
govc vm.power -on /RGK/vm/costest-ph9l4/$NODE
oc adm uncordon $NODE
oc adm top nodes -l node-role.kubernetes.io/master

Get pv:s

oc get pv

Sorted by size.

oc  get pv --sort-by=.spec.capacity.storage -A

Get more info about a pv.

oc describe pv $PV

Access modes for pv:s. AccessMode

RWO  - ReadWriteOnce     the volume can be mounted as read-write by a single node
ROX  - ReadOnlyMany      the volume can be mounted read-only by many nodes
RWX  - ReadWriteMany     the volume can be mounted as read-write by many nodes
RWOP - ReadWriteOncePod  the volume can be mounted as read-write by a single Pod.

get pvc:s

oc get pvc --all-namespaces | less

sort by

oc get pvc --sort-by=.spec.resources.requests.storage -A

create pvc

# oc create pvc
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: abjorklund-pvc1 
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
EOF

use pvc. Create pod using pvc

# Create test pod.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: abjorklund-test-pvc-claim1-pod
spec:
  volumes:
    - name: abjorklund-test-pvc
      persistentVolumeClaim:
        claimName: abjorklund-test-pvc
  containers:
    - name: abjorklund-test-pvc
      image: halfface/rockylinux-toolbox:v3
      volumeMounts:
        - mountPath: "/mnt/abjorklund-test-pvc"
          name: abjorklund-test-pvc
      command: ["sleep"]
      args: ["infinity"]
EOF

extend/increase pvc

PVC=postgres-instance1-x5b8-pgdata ;NAMESPACE=rk-cos-prod ; oc patch pvc ${PVC} --type=merge -p '{"spec":{"resources":{"requests":{"storage": "2Gi"}}}}' -n ${NAMESPACE}

which pods are using pvc

oc get pods --all-namespaces -o=json | jq -c '.items[] | {name: .metadata.name, namespace: .metadata.namespace, claimName:.spec.volumes[]? | select( has ("persistentVolumeClaim") ).persistentVolumeClaim.claimName }'

kubectl

List contexts

kubectl config get-contexts

Select context

kubectl config use-context default/api-blabla-halfface-se:6443/kube:admin

permissions

list groups

oc get groups -o wide

list clusterroles

oc get clusterrole --all-namespaces

list clusterrolebindings

oc get crb
oc get clusterrolebindings

scale

oc scale --replicas=2 rc/postgresql-1
oc scale -n abjorklund deployment stress-hm-6x32 --replicas=0
oc scale --replicas=3 machineset <machineset> -n openshift-machine-api

top(disable wikimedia top)

oc adm top pods --use-protocol-buffers --all-namespaces
oc adm top pods --use-protocol-buffers --all-namespaces --sort-by=cpu | head -20| cut -c -200
oc adm top nodes --sort-by=cpu
oc adm top nodes --sort-by=memory

get memory usage of all running pods in MB

oc get pods -o custom-columns=POD:.metadata.name --no-headers --field-selector status.phase=Running| while read POD ; do echo $POD $(( $(oc exec -it $POD -- cat /sys/fs/cgroup/memory/memory.usage_in_bytes </dev/null 2>/dev/null) / 1024 / 1024 )) MB ; done
oc get pods -A -o wide --no-headers --field-selector spec.nodeName=ocp-04-9lxgz-worker-wlw9p  --field-selector status.phase=Running | while read NAMESPACE POD NULL ; do oc project $NAMESPACE >/dev/null 2>&1 ; oc adm top pod $POD --containers --no-headers ; done | sort -k 4 -n| less

Get memory usage per pod on specific node.

NODE=ocp-01-4dfqx-worker-4n6mk ; oc get pods -A -o wide --no-headers --field-selector "spec.nodeName=${NODE},status.phase=Running" | while read NAMESPACE POD NULL ; do oc project $NAMESPACE >/dev/null 2>&1 ; oc adm top pod $POD --containers --no-headers ; done | sed 's/  */\t/g' | sort -k 4 -n | column -t -s $'\t'

get memory usage of all nodes in % of total available ram

oc get nodes -o name | xargs -I % oc debug % -- chroot /host sh -c 'BUFFER=($(free | grep Mem:)) ; echo $HOSTNAME $(( $(( ${BUFFER[1]} - ${BUFFER[6]} )) / $(( ${BUFFER[1]} / 100 )) ))' 2>/dev/null

oc get crd

Get Custom Resource Definitions.

oc get crd

operators

Operators automate the setup and management of application instances.

list installed operators

oc get ClusterServiceVersions -A
oc get csv -A
oc get operators -o json | jq -r '.items[].status.components.refs[]?|select(.kind=="ClusterServiceVersion")|.name'

Search all namespaces; the namespace column is excluded so duplicates collapse.

oc get csv -A -o=custom-columns='NAME:.metadata.name,VERSION:.spec.version,DISPLAY:.spec.displayName' --no-headers | sort  | uniq

list available operators

oc get packagemanifests

delete operator

Delete via the GUI. If traces are left, or you are unable to install again:

https://access.redhat.com/solutions/6762071 Remove potentially blocking references.
https://access.redhat.com/solutions/7026146 Remove label so operator is not recreated.
oc get operator prometheus.prometheus -o yaml -n openshift-operators | grep -i CustomResourceDefinition -A1     //It will list the CRDs currently being referenced by the operator
oc edit crd thanosrulers.monitoring.coreos.com
-----------output truncated------------
  labels:
    operators.coreos.com/prometheus.prometheus: ""                            //Remove this line and then save and exit
# Remove possibly broken jobs.
oc get jobs.batch -n openshift-marketplace | grep -i 0/1
# If job was not broken then remove all references to that operator. Remove jobs and configmaps.
oc get job -n openshift-marketplace -o json | jq -r '.items[] | select(.spec.template.spec.containers[].env[].value|contains ("elasticsearch-operator")) | .metadata.name' | while read i ; do echo oc delete job $i -n openshift-marketplace ; echo oc delete configmap $i -n openshift-marketplace ; done

Select channel

oc patch clusterversion version --type merge -p '{"spec": {"channel": "candidate-4.12"}}' # candidate-... channel offers unsupported early access to releases as soon as they are built.
oc patch clusterversion version --type merge -p '{"spec": {"channel": "fast-4.12"}}'      # As soon as version as a general availability (GA) release. Fully supported. Used in production environments.
oc patch clusterversion version --type merge -p '{"spec": {"channel": "stable-4.12"}}'    # Delay from fast. Looking at quality from fast. If found good then moved to stable
oc patch clusterversion version --type merge -p '{"spec": {"channel": "eus-4.12"}}'       # Extended Update Support

find if image exists

oc adm release info quay.io/openshift-release-dev/ocp-release:4.15.38-x86_64

Upgrade to a version that you found on GitHub (OKD releases)

oc adm upgrade --to-image=

oc adm upgrade

Upgrade okd images.

Launch a new pod for gathering debug information, then compress it and deliver it in a support case.

cd /tmp && oc adm must-gather && tar czf /tmp/must-gather.$(oc whoami --show-server | awk -F ':|/' '{print $4}').$(date +%F_%H-%M-%S).tar.gz must-gather.local.*

Must gather for odf. (oc get csv -n openshift-storage gives you the version to use.)

cd /tmp && oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:4.10
tar czf /tmp/must-gather.$(oc whoami --show-server | awk -F ':|/' '{print $4}').$(date +%F_%H-%M-%S).tar.gz must-gather.local.*

oc adm certificate approve <csr_name>

Approve csr certificate

Approve all csr

oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
oc get csr -o name | xargs oc adm certificate approve

certmanager

cert-manager design

  (  +---------+  )
  (  | Ingress |  ) Optional                                              ACME Only!
  (  +---------+  )
         |                                                     |
         |   +-------------+      +--------------------+       |  +-------+       +-----------+
         |-> | Certificate |----> | CertificateRequest | ----> |  | Order | ----> | Challenge |
             +-------------+      +--------------------+       |  +-------+       +-----------+
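
A minimal Certificate sketch that kicks off that chain (names, hostname and the referenced ClusterIssuer are examples):

cat <<EOF | oc apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-cert
  namespace: openshift-ingress
spec:
  secretName: example-cert-tls        # cert-manager writes the signed certificate and key here
  dnsNames:
    - example.apps.example.com
  issuerRef:
    name: letsencrypt                 # an existing ClusterIssuer, assumed
    kind: ClusterIssuer
EOF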

look at cert-manager cr

oc api-resources | grep cert | awk '{print $1}' | while read i ; do echo '*' $i ; oc get $i -A ; done

list certificates

oc get certificate -A

list ClusterIssuer

oc get ClusterIssuer -A

list orders by date

oc get orders -n openshift-config --sort-by=.metadata.creationTimestamp

install cmctl

 curl -fsSL https://github.com/cert-manager/cert-manager/releases/latest/download/cmctl-linux-amd64.tar.gz | (cd /usr/local/bin/ ; sudo tar zxf - cmctl)

completion

. <(cmctl completion bash)

renew cert

cmctl renew -n openshift-config cert-api

status of cert

cmctl status certificate -n openshift-ingress le-wildcard-apps-certificate

oc adm release info

# Show information about the cluster's current release
oc adm release info
# Show the source code that comprises a release
oc adm release info 4.2.2 --commit-urls
# Show the source code difference between two releases
oc adm release info 4.2.0 4.2.2 --commits
# Show where the images referenced by the release are located
oc adm release info quay.io/openshift-release-dev/ocp-release:4.2.2 --pullspecs
# Show release info about a release
oc adm release info 4.10.47 --pullspecs

release notes

find changes between ocp versions / release note.

https://access.redhat.com/labs/ocpupgradegraph/update_path
Select source and destination.
At bottom there is graphical display.
Press each bubble and read rhba.

Point releases are listed at the end:

https://docs.openshift.com/container-platform/4.12/release_notes/ocp-4-12-release-notes.html

oc adm node-logs

Look at logs from crio from master nodes.

oc adm node-logs --role master -u crio

Get logs from one node from unit crio

oc adm node-logs abjorklund-01-5tsbc-worker-0-kcr54 -u crio

Look at specific log

oc adm node-logs --role master --path=openshift-apiserver/audit.log

List logs

oc adm node-logs --role=master --path=/

List logs from specific node.

oc adm node-logs nord-ic-bc84t-master-0 --path=/oauth-server/

Logs since older reboots

oc adm node-logs --role=master --path=/

Search recursively for where a pod's log files exist.

oc_debug_run_command_all_nodes 'find /var/log 2>&1 | grep <name_pod>'

openshift upgrade path

https://access.redhat.com/labs/ocpupgradegraph/update_path?channel=stable-4.9&arch=x86_64&is_show_hot_fix=false&current_ocp_version=4.9.15&target_ocp_version=4.10.11

Upgrade openshift/okd

https://docs.okd.io/latest/updating/preparing_for_updates/updating-cluster-prepare.html

Run the command below and check whether any APIs that are flagged for removal still have a request count.

oc get apirequestcounts

upgrade openshift

# look for existing alerts.
# look for troublesome pods.
oc get pods -A  | grep -Ev ' Running | Completed '
# Set channel
oc patch clusterversion version --type merge -p '{"spec": {"channel": "stable-4.10"}}'
oc adm upgrade --to=4.10.47
oc get clusterversion -o json|jq ".items[0].spec"
# View openshift version history.
oc get clusterversion -o json | jq -r '.items[0].status.history[] |  [.version, .startedTime, .completionTime] | join(" ")'
# View progress of update.
watch -n1 oc whoami --show-console \; oc adm upgrade
watch -cn1 "oc get clusteroperators | grep --color=always -E \"$(oc get clusterversions.config.openshift.io version -o json | jq -r .status.desired.version)|\""
# Upgrade all operators
oc get installplan -A | grep Manual | grep false
oc patch installplan $INSTALLPLAN -n $NAMESPACE --type merge --patch '{"spec":{"approved":true}}'

upgrade okd

Get upgrade path. Look here to find latest version https://github.com/okd-project/okd/releases

(cd /usr/local/bin/ ; sudo curl -s -O https://gist.githubusercontent.com/Goose29/ca7debd6aec7d1a4959faa2d1b661d93/raw/4584d89c49d4af197480539bdd873f6d9ca2dd83/upgrade-path.py ; sudo chmod 755 upgrade-path.py ) && (curl -sH 'Accept:application/json' 'https://amd64.origin.releases.ci.openshift.org/graph?channel=stable-4' | upgrade-path.py 4.13.0-0.okd-2023-07-23-051208 4.14.0-0.okd-2024-01-26-175629 )

To view the status of the update process, run the commands below. They are harmless and give information about the ongoing process and any blockers.

oc adm upgrade
watch -cn1 "oc whoami --show-console ; echo ; oc get clusteroperators | grep --color=always -E \"$(oc get clusterversions.config.openshift.io version -o json|jq -r '.spec.desiredUpdate.version')|\""

For a slightly different view: the VERSION column shows each operator's version. When the update is done, all cluster operators will report the same version number.

oc get clusteroperators

Make a report of the cluster status before upgrading, to rule out pre-existing issues that you have not caused. https://halfface.se/wiki/index.php/Openshift#status_of_kubernetes

See "status of kubernetes" below.

Look for APIs still in use that are flagged for removal.

oc get apirequestcounts

Upgrade OKD until there are no more updates or you have reached the wanted version.

oc adm upgrade --to-latest=true --allow-explicit-upgrade

If the upgrade complains about certificate/signature verification, force the desired update:

oc patch --type='merge' --patch='{"spec":{"desiredUpdate":{"force":true}}}' clusterversion version

If the client wants a specific version, pinpoint it:

oc adm upgrade --to=<version from oc adm upgrade> --allow-explicit-upgrade

oc adm upgrade gives: Upgradeable=False Reason: AdminAckRequired. Follow the instructions from the link; the command will be something like below.

oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-<version>-kube-<version>-api-removals-in-<version>":"true"}}' --type=merge

status of kubernetes

Get pods that are less than perfect.

oc get pods -A --no-headers | grep -v Completed | while read LINE ; do PODS=$(awk '{print $3}' <<< "${LINE}") ; if [ "${PODS%%/*}" != "${PODS##*/}" ] ; then echo "${LINE}" ; fi ; done

Get critical alerts.

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")|select(.labels.severity=="critical")'

Get warning alerts.

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")|select(.labels.severity=="warning")'

upgrade odf

# View existing config. 
oc get subscriptions -n openshift-storage odf-operator -o yaml
# Patch subscription
oc patch subscriptions -n openshift-storage odf-operator --type merge -p '{"spec": {"channel": "stable-4.10"}}'
# Get install plans
oc get installplan -n  openshift-storage -o wide
# Approve install plan.
oc get installplans.operators.coreos.com -A --no-headers | awk '$5 ~ /false/' | awk '$4 ~ /Manual/' | while read NAMESPACE INSTALLPLAN END ; do echo '*' $NAMESPACE $INSTALLPLAN ; oc patch installplan $INSTALLPLAN -n $NAMESPACE --type merge --patch '{"spec":{"approved":true}}' ; done

odf troubleshooting

# ceph problem.  Run commands from rook-ceph-operator
oc rsh -n openshift-storage $(oc get pods -n openshift-storage -o name -l app=rook-ceph-operator)
export CEPH_ARGS='-c /var/lib/rook/openshift-storage/openshift-storage.config'
ceph -s
ceph osd pool ls
ceph osd pool autoscale-status
ceph config dump
# disable autoscaling
ceph osd pool ls | while read i ; do echo '*' $i ; ceph osd pool set $i pg_autoscale_mode off ; done
# Look to see how much data is being used for pg:s.
# Number of PGLog Entries, size of PGLog data in megabytes, and Average size of each PGLog item
for i in 0 1 2 ; do echo '*' $i ; osdid=$i ; ceph tell osd.$osdid dump_mempools | jq -r '.mempool.by_pool.osd_pglog | [ .items, .bytes /1024/1024, .bytes / .items ] | @csv' ;done
ceph df

cronjobs

oc get cj
oc get cronjobs -o wide -A

Run cronjob manually

oc create job -n ldap-sync --from=cronjob/ldap-sync ldap-sync-manual-$(date '+%Y-%m-%d-%H-%M-%S')

Disable cronjob

.spec.suspend: true
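
For example, mirroring the enable command below (cronjob name taken from that example):

oc patch cronjobs.batch write-to-nfs --type merge -p '{"spec": {"suspend": true}}'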

Enable cronjob

oc patch cronjobs.batch write-to-nfs --type merge -p '{"spec": {"suspend": false}}'

delete po (stop, kill)

stop pod

oc delete po --all --force
oc delete pod openshift-gitops-server --namespace openshift-gitops
oc delete pods -n openshift-oauth-apiserver --all
oc get po -A | grep -v ^NAME | awk '$4 !~ /Running/' | sort -k4 | while read NAMESPACE POD READY STATUS END ; do echo '****' $POD $STATUS ; echo oc delete po $POD -n $NAMESPACE --force --grace-period=0 ; done
oc get pods -A --field-selector=status.phase!=Running --no-headers | while read NAME_SPACE POD REST_OF_LINE ; do echo oc delete pod $POD -n "${NAME_SPACE}" --force --grace-period=0 ; done
(oc get pods --field-selector="status.phase=Pending" --no-headers -A ; oc get pods --field-selector="status.phase=Failed" --no-headers -A) | while read NAME_SPACE POD REST_OF_LINE ; do echo oc delete pod $POD -n "${NAME_SPACE}" --force --grace-period=0 ; done
# Delete pods and generate report on what has been removed.
LOG=/tmp/oc_delete_pod_$(oc config current-context | awk -F '/|:' '{print $2}').$(date '+%Y-%m-%d_%H-%M-%S').log ; (oc get pods --field-selector="status.phase=Pending" --no-headers -A ; oc get pods --field-selector="status.phase=Failed" --no-headers -A) | while read NAME_SPACE POD REST_OF_LINE ; do oc delete pod $POD -n "${NAME_SPACE}" --force --grace-period=0 ; done | tee $LOG ; awk -F\" '{print $2}' $LOG | sed 's/-[a-z0-9]*$//g'| sed 's/-[a-z0-9]*$//g' | sort | uniq -c | sort -n | tail -20

use other namespace

oc rsh  --namespace namespace-name pod-name
oc rsh --namespace namespace-name-operator pod-name bash -c 'echo $PATH $HOSTNAME'

list namespaces

oc get namespace

use namespace

oc rsh  --namespace openshift-gitops openshift-gitops-application-controller-0

kubectl get netnamespace

NetNamespace is an OpenShift SDN resource used to configure project networking. An egress IP can be set to define the outgoing address, which can also cause other issues.

oc get netnamespace openshift-gitops -oyaml

oc get routes

oc get routes --namespace openshift-gitops

oc get oauth

Describe authentication methods.

oc get oauth cluster -o yaml

decode token. base64

https://jwt.io/

view secrets

oc get secret ca-key-pair -o go-template='{{range $k,$v := .data}}{{$k}}{{"\n"}}{{$v}}{{"\n\n"}}{{end}}'

delete cluster

openshift-install destroy cluster

storageclasses(sc)

oc get storageclasses

get storageclasses defined as default

oc get sc -o json | jq -r '.items[]|select(."metadata".annotations."storageclass.kubernetes.io/is-default-class"=="true")|.metadata.name'

set default storageclass

# Set all sc to default false.
oc get sc -o json | jq -r '.items[]|select(."metadata".annotations."storageclass.kubernetes.io/is-default-class"=="true")|.metadata.name' | while read i ; do echo '*' $i ; oc patch storageclass $i -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'; done
# Set default storageclass.
oc patch storageclass ocs-storagecluster-cephfs -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

get service accounts

oc get serviceaccounts -A
oc get sa -A

which permissions do I have

oc auth can-i --as=fjuza --list
oc get groups -o wide
oc auth can-i --as-group=<group> --list

alerts

How is alertmanager configured

oc get secret -n openshift-monitoring alertmanager-main -o json | jq -r '.data."alertmanager.yaml"|@base64d'

Save alertmanger config

oc get secret alertmanager-main -n openshift-monitoring --template='{{index .data "alertmanager.yaml" | base64decode}}' > /tmp/oc_get_secret_alertmanager-main.alertmanager.yaml.$(oc whoami --show-console=true | awk -F / '{print $3}').$(date '+%Y-%m-%d_%H-%M-%S')
oc extract secret/alertmanager-main --confirm -n openshift-monitoring

Restore alertmanager config

oc set data secret alertmanager-main -n openshift-monitoring --from-file=alertmanager.yaml=<file_alertmanager.yaml>

alertmanager

View Alertmanager configured alerts.

oc get prometheusrules -A -o yaml | grep alert: | sort

View configuration of alert

oc get prometheusrules -A -o json | jq '.items[].spec.groups[].rules[]| select(.alert=="AlertmanagerReceiversNotConfigured")'

view alerts.

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq . | less -ISRM

View specific alert.

oc rsh -n openshift-monitoring -c prometheus prometheus-k8s-0 -- curl 'http://localhost:9090/api/v1/query?query=absent%28up%7Bjob%3D"fluentd"%7D+%3D%3D+1%29' | jq .

View alerts in state firing

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")' | less -ISRM

View alerts in state firing with severity warning

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")|select(.labels.severity=="warning")' | less -ISRM

View historical alerts.

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=2022-08-08T00:00:00.781Z&end=2022-08-09T00:00:00.781Z&step=1m"
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(date '+%Y-%m-%d' --date '-2 days')T00:00:00.781Z&end=$(date '+%Y-%m-%dT%H:%M:%S').781Z&step=1m" | jq . | less -ISRM

Get warning alerts since the last week.

echo '***' $(oc whoami --show-console) ; oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S.000Z' --date '-6 days')&end=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S').000Z&step=1m" | jq -r '.data.result[].metric | {alertname, severity, alertstate}| select(.severity=="warning")|select(.alertstate=="firing") | .alertname'

Get more info about fired alerts.

echo '***' $(oc whoami --show-console) ; oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S.000Z' --date '-6 days')&end=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S').000Z&step=1m" | jq -r '.data.result[].metric | {alertname, severity, alertstate, pod, namespace}| select(.severity=="warning")|select(.alertstate=="firing")'

Get alert during the last 6 days. Give times when alert has fired.

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S.000Z' --date '-6 days')&end=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S').000Z&step=1m" | jq -r . | python3 -c "import sys, re, datetime; print(re.sub(r'\b\d{10}\b', lambda x: datetime.datetime.utcfromtimestamp(int(x.group())).isoformat() + 'Z', sys.stdin.read()))" | less -ISRM

disable alertmanager alert

oc -n openshift-monitoring exec -ti alertmanager-main-0 -c alertmanager -- amtool silence add --alertmanager.url http://localhost:9093  alertname=AlertmanagerReceiversNotConfigured --end="2053-11-07T00:00:00-00:00" --comment "silence alertmanager"

Silence alertmanager not configured alert

oc set data secret alertmanager-main -n openshift-monitoring --from-file=alertmanager.yaml=<(cat <<'EOF'
"global":
  "resolve_timeout": "5m"
"inhibit_rules":
  - "equal":
      - "namespace"
      - "alertname"
    "source_match":
      "severity": "critical"
    "target_match_re":
      "severity": "warning|info"
  - "equal":
      - "namespace"
      - "alertname"
    "source_match":
      "severity": "warning"
    "target_match_re":
      "severity": "info"
"receivers":
  - "name": "Default"
  - "name": "Watchdog"
  - "name": "Critical"
  - "name": "testrec" # Dummy receiver with webhook config
    "webhook_configs":
      - "url": "http://xxxxdumyxxx.com"
"route":
  "group_by":
    - "namespace"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "Default"
  "repeat_interval": "12h"
  "routes":
    - "match":
        "alertname": "dummyalert" # Dummy alert being routed to dummy receiver
      "receiver": "testrec"
EOF
)

prometheus

Url to web interface.

https://prometheus-k8s-openshift-monitoring.apps.<url>
echo https://prometheus-k8s-openshift-monitoring.$(oc whoami --show-console | awk -F 'console-openshift-console.' '{print $2}')
echo https://$(oc get route -n openshift-monitoring prometheus-k8s -o jsonpath="{.spec.host}")

Get disk usage from odf

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query?query=odf_system_raw_capacity_used_bytes" | jq -r .

Get disk usage from odf over time (metrics).

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=odf_system_raw_capacity_used_bytes&start=$(date '+%Y-%m-%d' --date '-20 days')T00:00:00.781Z&end=$(date '+%Y-%m-%dT%H:%M:%S').781Z&step=1h" | jq . | less -ISRM

Search tips

https://prometheus.io/docs/prometheus/latest/querying/basics/

Disk usage per project. Taken from RH ticket.

oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- curl -s -g 'http://localhost:9090/api/v1/query?' --data-urlencode 'query=(sort_desc(topk(25,(sum(kubelet_volume_stats_used_bytes * on (namespace,persistentvolumeclaim) group_left(storageclass, provisioner) (kube_persistentvolumeclaim_info * on (storageclass)  group_left(provisioner) kube_storageclass_info {provisioner=~"(.*cephfs.csi.ceph.com)"})) by (namespace)))))'

openshift-user-workload-monitoring

  "annotations": {
    "description": "Prometheus operator in openshift-user-workload-monitoring namespace rejected 2 prometheus/ServiceMonitor resources.",
    "summary": "Resources rejected by Prometheus operator"
  },...
# Look at what is causing it.
oc logs -l app.kubernetes.io/name=prometheus-operator -n openshift-user-workload-monitoring
# After tweaking with monitoring settings kill pod and view log.
oc delete pod -l app.kubernetes.io/name=prometheus-operator -n openshift-user-workload-monitoring
oc logs -l app.kubernetes.io/name=prometheus-operator -n openshift-user-workload-monitoring | less
# Stop monitoring.
oc label namespace openshift-local-storage openshift.io/cluster-monitoring-
oc label namespace openshift-local-storage openshift.io/user-monitoring=false
# Allow monitoring.
oc label namespace openshift-operators openshift.io/cluster-monitoring=true

Talk to api with Bearer.

HOST=$(oc -n openshift-monitoring get route alertmanager-main -ojsonpath={.spec.host})
TOKEN=$(oc whoami -t)
curl -skH "Authorization: Bearer $TOKEN" "https://$HOST/api/v2/alerts" | jq .

token

token=`oc sa get-token prometheus-k8s -n openshift-monitoring` ## --- In OCP client 4.10 or lower ---

OR

token=`oc create token prometheus-k8s -n openshift-monitoring` ## --- In OCP client 4.11 or higher ---

curl using token

curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main-openshift-monitoring.apps.domain/api/v1/alerts' |  jq '.data[].labels'

ServiceMonitor

Prometheus Operator:

When using the Prometheus Operator, custom resources like ServiceMonitor and PodMonitor define scrape endpoints and relabeling settings that control how Prometheus scrapes metrics from services or pods.
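
A minimal ServiceMonitor sketch (myapp, myproject and the port name metrics are placeholders; the selector must match your service labels):

cat <<EOF | oc apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
  namespace: myproject
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
EOF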

bash completion

. <(oc completion bash)

machineconfig

view settings

oc describe machineconfigpool

set ntp servers

echo 'variant: openshift
version: 4.9.0
metadata:
  name: 99-master-chrony 
  labels:
    machineconfiguration.openshift.io/role: master 
storage:
  files:
  - path: /etc/chrony.conf
    mode: 0644 
    overwrite: true
    contents:
      inline: |
        server ntp.lio.se iburst
        driftfile /var/lib/chrony/drift
        makestep 1.0 3
        rtcsync
        logdir /var/log/chrony' | butane | oc apply -f -

get machineconfig value

oc get mc 00-master -o json | jq -r '.spec.config.storage.files[]|select(.path=="/var/lib/kubelet/config.json")|.contents.source' | perl -pe 's/%([0-9a-f]{2})/sprintf("%s", pack("H2",$1))/eig' | sed 's/^data:,//g' | jq .

List machineconfigs by creation time

oc get mc --sort-by=.metadata.creationTimestamp

get users

oc get users

give me kubeadmin encrypted password

oc get secret kubeadmin -n kube-system -o json  -o=jsonpath='{.data.kubeadmin}' | base64 -d

Give kubeadmin a new password

generate password hash

htpasswd -bnBC 10 "" '<password>' | tr -d ':\n' | base64 -w0

patch password hash

oc patch secret/kubeadmin -n kube-system -p '{"data": {"kubeadmin": "UGFzc3dvcmQK=="}}'

work with oc without login

export KUBECONFIG=/var/lib/kubelet/kubeconfig

if on bootstrap node.

export KUBECONFIG=/etc/kubernetes/kubeconfig

Add the following if the cert is not trusted (ssl/tls).

- cluster:
    insecure-skip-tls-verify: true
    server: https://127.0.0.1:443
  name: my-cluster

run oc when on node

oc get pod -n openshift-monitoring --kubeconfig=/var/lib/kubelet/kubeconfig

etcdctl

oc rsh -c etcdctl -n openshift-etcd $(oc get pod -l app=etcd -oname -n openshift-etcd | awk -F"/" 'NR==1{ print $2 }')
[root@ocp-03-lm8km-master-1 /]# etcdctl --write-out=table endpoint status
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://172.19.14.36:2379 | c4f7b42b92713818 |   3.5.0 |  105 MB |     false |      false |         6 |    2632074 |            2632074 |        |
| https://172.19.14.37:2379 | 5dea668b432969fc |   3.5.0 |  105 MB |     false |      false |         6 |    2632074 |            2632074 |        |
| https://172.19.14.41:2379 | 51cecd971b657ee5 |   3.5.0 |  105 MB |      true |      false |         6 |    2632074 |            2632074 |        |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

create troubleshooting/debug/test pod

oc run abjorklund-redhat-ubi8 --image=redhat/ubi8 -i --tty -- sh
oc run abjorklund-curlimage-curl --image=curlimages/curl -i --tty -- sh
oc run -it busybox --image=busybox --restart=Never -- ash
oc run abjorklund-rocky-rocky --image=rockylinux/rockylinux -i --tty -- bash
oc run ${USER}-rocky-rocky --image=rockylinux/rockylinux -i --tty -- bash # dnf -y install procps-ng iproute
oc run ${USER}-rocky-rocky --image=rockylinux/rockylinux --restart=Never --command -- sleep infinity

install packages to get running

yum install -y lsof procps-ng bind-utils

proxy settings

oc get proxy cluster -o yaml

Change ca

oc patch proxy/cluster --type=merge --patch='{"spec":{"trustedCA":{"name":"custom-ca"}}}'

oc proxy

Run a proxy to the Kubernetes API server

port forward to pod

oc port-forward <my-pod-name> <local-port>:<remote-port>

alertmanager

kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093  # http://localhost:9093/

grafana access.

kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80         # http://localhost:3000 admin prom-operator

prometheus access.

kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090    # http://localhost:9090

Install additional ca certificate

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-company-ca-cert
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURrVENDQW5tZ0F3SUJBZ0lFSC93Skh6QU5CZ2txaGtpRzl3MEJBUXNGQURBM01SVXdFd1lEVlFRS0RBeFMKUlVSQ1VrbEVSMFV1VTBVeEhqQWNCZ05WQkFNTUZVTmxjblJwWm1sallYUmxJRUYxZEdodmNtbDBlVEFlRncweQpNVEF5TWpNd056RTVOVFphRncwME1UQXlNak13TnpFNU5UWmFNRGN4RlRBVEJnTlZCQW9NREZKRlJFSlNTVVJIClJTNVRSVEVlTUJ3R0ExVUVBd3dWUTJWeWRHbG1hV05oZEdVZ1FYVjBhRzl5YVhSNU1JSUJJakFOQmdrcWhraUcKOXcwQkFRRUZBQU9DQVE4QU1JSUJDZ0tDQVFFQW5mY1F3YURwcEdzNWJxaUc5ajE5aFJVaG1sMzhjb2JGT2tzRQpsZFo3Y3RkV1d6VHJqSTFCRGxZSEd5SXBYMEo4ZU1PaDhvbUZqbVR6VTEzTkpWSnJrWm5RaDRhTzA1UGtKRlJRCkg1ZVA2N3R0S2pEb0txOFZVWXRZUldxRlFaalNxY2lQMzJobXZSNG42QVZDWDdCaUVBZjd2Y05ZVys0a1k5OUsKbTluV1BNbEpGU056M1puRnlWc1BtR1ZWeVN2RmFVL0dBTmt1Z25uSGdUM1VUUTNsc2NidU5keUpBcVEya3dHSwpKbkdZKzBSajVrUWpvdXptUjBDZ3pJN0hWSmhwK2Z6R1lyenRYQXA1Zkt0Z3ZTZFRtTndVVXZJR3pLTmU4WklGCmY0WVVUUDFPdU9jUmNIRDJQclVodDgzWlRLYzNwOUhLYk5CazIzWFFtYU85QVBqeEl3SURBUUFCbzRHa01JR2gKTUI4R0ExVWRJd1FZTUJhQUZMbWFrNHdDamtuakZvWkd6M1daRGErY2N4RGxNQjBHQTFVZERnUVdCQlM1bXBPTQpBbzVKNHhhR1JzOTFtUTJ2bkhNUTVUQVBCZ05WSFJNQkFmOEVCVEFEQVFIL01BNEdBMVVkRHdFQi93UUVBd0lCCnhqQStCZ2dyQmdFRkJRY0JBUVF5TURBd0xnWUlLd1lCQlFVSE1BR0dJbWgwZEhBNkx5OXBjR0V0WTJFdWNtVmsKWW5KcFpHZGxMbk5sTDJOaEwyOWpjM0F3RFFZSktvWklodmNOQVFFTEJRQURnZ0VCQURabURvUytJY1ZMcERBRwpiSXM0SWRJKzcxY0xINk90NjNkYWhBT25QRDJnMUhvVUFIZFdUcGdobER3TkFQWjg3UXQybFc4Q1B4eDhCQVZOCnlrZWlEN2paeVA5dmVCcDRxNjBiSTVYSENndWV5U2lGdjBBKzloKzMzekMrYy9WbStJVHJNTkZ0dlZMNE1kRWQKaVE4UVBhaFJEWW1qVkJVb1VIZWErMDdkWEY3TzQxY2t2YzZRb0lad2F5Y1Zhc0gvd05lVGNrdzl1TlNiajNTQwoyNHdpOUthQnpxdDZsWlF3TG5uUjVnNjNWUDZNZUprR2FXMTBxdExiQVM4NGZwQ1NWTUx3U051MGZqeFU2d2lPCkRjaWlKKzNZOG5ldjM5NGJHRkwxcG5ZVmM4YmpoL0xaaHM1dTRQUnhlNFBLRER2Y09NZUhpUkN1M1YySWRRTTgKbDl3enBQZz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQoK
        mode: 0644
        overwrite: true
        path: /etc/pki/ca-trust/source/anchors/company-ca.crt

get raw api data

oc get --raw "/api/v1/nodes/[node]/proxy/stats/summary"

Via proxy.

oc proxy &
Starting to serve on 127.0.0.1:8001
curl -s http://localhost:8001/api/v1/nodes/crc-lgph7-master-0/proxy/stats/summary
curl -s http://localhost:8001/api/v1/nodes/crc-lgph7-master-0/proxy/metrics/resource

explain

Get documentation for a resource. Get available attributes for a resource.

oc explain deployment

events

Get events.

oc get events -A --sort-by=.metadata.creationTimestamp

jsonpath

Get names of MachineConfigs one value per line.

oc get mc -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' --no-headers

ImageStreamTag

ImageStreamTag represents an Image that is retrieved by tag name from an ImageStream.
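
To inspect one (myapp:latest and myproject are placeholders):

oc get istag -n myproject
oc describe istag myapp:latest -n myproject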

imagestream

apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: myapp

Tagging Images: When you tag an image, it is added to the ImageStream with a specified tag.

oc tag myregistry/myapp:latest myapp:latest

Using ImageStreams in Deployment Configurations: Deployment configurations can reference ImageStreams instead of direct image URLs.

apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - name: myapp
          image: image-registry.openshift-image-registry.svc:5000/myproject/myapp:latest
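
To have the DeploymentConfig roll out automatically when the ImageStreamTag changes, an ImageChange trigger can be added under spec (a sketch; containerNames must match the container name above):

  triggers:
    - type: ImageChange
      imageChangeParams:
        automatic: true
        containerNames:
          - myapp
        from:
          kind: ImageStreamTag
          name: myapp:latest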

BuildConfig

Build configurations define a build process for new container images.
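
A quick sketch of a binary docker build flow (myapp is a placeholder, the directory must contain a Dockerfile):

oc new-build --binary --strategy=docker --name=myapp
oc start-build myapp --from-dir=. --follow
oc get buildconfig,build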

download okd openshift-install

# Show latest.
curl -skL https://github.com/okd-project/okd/releases | elinks --dump | sed 's/^ *//g' | grep " Latest"
# Download and install in /usr/local/bin. Keep old versions.
export OKD_VERSION=4.15.0-0.okd-2024-03-10-010116 ; (cd /temp/ ; oc adm release extract --tools quay.io/openshift/okd:${OKD_VERSION} ; cd /usr/local/bin/ ; sudo tar xf /temp/openshift-install-linux-${OKD_VERSION}.tar.gz openshift-install ; sudo mv openshift-install openshift-install.${OKD_VERSION})

setup openshift cluster

Download binary

cd /tmp/ ; curl -L -O https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.10.47/openshift-install-linux.tar.gz && sudo tar xf openshift-install-linux.tar.gz -C /usr/local/bin/

Add vmware certs if using that backend.

(cd /tmp/ ; curl -sk https://${vsphere_server}/certs/download.zip -O) ; cd /etc/pki/ca-trust/source/anchors ; sudo unzip -oj /tmp/download.zip certs/lin/\* ; sudo update-ca-trust

Create config file

install-config.yaml

Then fire off install

openshift-install create cluster

Another example

ln -s install-config.yaml.2023-03-23 install-config.yaml
./openshift-install-4.12.0-0.okd-2023-04-16-041331 create cluster

Edit install config after setup

Save config

 oc get cm cluster-config-v1 -n kube-system --template='{{index .data "install-config" }}' > /tmp/cm_cluster-config-v1_-n_kube-system.$(oc whoami --show-console=true | awk -F / '{print $3}').$(date '+%Y-%m-%d_%H-%M-%S')

Edit downloaded file and apply edited file.

oc set data cm cluster-config-v1 -n kube-system --from-file=install-config=/tmp/cm_cluster-config-v1_-n_kube-system.<suitable_name>

look at install settings

oc get -n kube-system cm/cluster-config-v1 -o yaml

argocd login

argocd login openshift-gitops-server-openshift-gitops.apps.costest.ltkronoberg.se --username kubeadmin --password asdfasfasdfas --sso --insecure
argocd login $(oc get routes -n openshift-gitops openshift-gitops-server -o json | jq -r .spec.host) --username $USER --password $COMPANY_PASSWORD --sso --insecure

git sync heal

argocd app list | grep -v NAME | awk '{print $1}' | while read i ; do echo '*' $i ; argocd app set $i --self-heal ; done

metrics

Get available values

Thanos monitoring points

curl -sk -H "Authorization: Bearer $(oc whoami -t)" https://$(oc get routes -n openshift-monitoring thanos-querier -o jsonpath='{.status.ingress[0].host}')/api/v1/metadata | jq .

node-exporter

oc --request-timeout=3 -n openshift-monitoring exec -c node-exporter $(oc get pod -n openshift-monitoring -l app.kubernetes.io/name=node-exporter -o=custom-columns='NAME:.metadata.name' --no-headers | head -1) -- curl -s 'http://localhost:9100/metrics' | grep -vE "^#|^$"

Cpu usage per node.

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[30m])) * 100)
instance:node_cpu_utilisation:rate1m{job="node-exporter",  cluster=""} != 0
instance:node_cpu_utilisation:rate1m{job="node-exporter"} != 0

iowait

avg by (instance) (irate(node_cpu_seconds_total{mode="iowait"}[30m]))

namespace

cpu usage per namespace.

sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=""}) by (namespace)

load

Load 1 graph

instance:node_load1_per_cpu:ratio{job="node-exporter", cluster=""} != 0

usage for pvc

kubelet_volume_stats_used_bytes
kubelet_volume_stats_available_bytes
kubelet_volume_stats_used_bytes{persistentvolumeclaim="prometheus-prometheus-k8s-1"}

Memory usage

Memory usage of node.

instance:node_memory_utilisation:ratio
node_memory_MemAvailable_bytes
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))

OOMKilled

sum by (namespace, pod) (kube_pod_container_status_restarts_total) * on(namespace, pod) group_left(reason) kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=sum%20by%20(namespace,%20pod)%20(kube_pod_container_status_restarts_total)%20*%20on(namespace,%20pod)%20group_left(reason)%20kube_pod_container_status_last_terminated_reason%7Breason%3D%22OOMKilled%22%7D&start=$(date '+%Y-%m-%d' --date '-20 days')T00:00:00.781Z&end=$(date '+%Y-%m-%dT%H:%M:%S').781Z&step=1h" | jq .

uptime

oc exec -n openshift-monitoring -c prometheus prometheus-k8s-0 -- curl -s 'http://localhost:9090/api/v1/query?query=time%28%29%20-%20node_boot_time_seconds%7Bjob%3D%22node-exporter%22%7D%0A' | jq -r '.data.result[]|.metric.instance +"\t"+ (.value[1] | tonumber | floor | tostring)' | column_tab

install oc and kubectl

curl -fsSL https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/latest/openshift-client-linux.tar.gz | (cd /usr/local/bin/ ; sudo tar zxf - oc kubectl )

time and timezone in first pod (date)

oc get pods --no-headers -o 'custom-columns=:.metadata.namespace,:.metadata.name' -A | grep -v cert-manager | head -1 | while read NAMESPACE POD ; do oc rsh -n $NAMESPACE $POD  bash -c 'date "+%Y-%m-%d %H:%M:%S %Z"' 2>/dev/null ; done

oc get installplan

InstallPlan defines the installation of a set of operators.

oc get installplan install-bk8hw -n openshift-operators -o yaml

Approve all manual updates.

oc get installplans.operators.coreos.com -A --no-headers | awk '$5 ~ /false/' | awk '$4 ~ /Manual/' | while read NAMESPACE INSTALLPLAN END ; do echo '*' $NAMESPACE $INSTALLPLAN ; oc patch installplan $INSTALLPLAN -n $NAMESPACE --type merge --patch '{"spec":{"approved":true}}' ; done

Get selected info from all installplans

oc get installplans.operators.coreos.com -A --no-headers -o=custom-columns='DATE:.metadata.creationTimestamp,NAME:.metadata.name,PHASE:.status.phase,CSV:.spec.clusterServiceVersionNames,NAMESPACE:.metadata.namespace'  --sort-by=.metadata.creationTimestamp

oc extract

Extract secrets or config maps to disk

# Extract only the key "nginx.conf" from config map "nginx" to the /tmp directory
oc extract configmap/nginx --to=/tmp --keys=nginx.conf

dependencies,owner

Search in output from

oc describe ...

Search for this.

Controlled By:  ReplicaSet/rook-ceph-osd-0-6dcdc7fb48

metadata.ownerReferences

Define object that owns object
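
Or read it straight from the object (pod name is a placeholder):

oc get pod <pod> -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'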

nodeAffinity

Pin pod to node with label (kubectl label nodes <your-node-name> disktype=ssd)

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd

Add user to group

oc adm groups add-users openshift-admins rb_janitor

api-int

api-int.<fqdn>
for i in api-int:6443 api:6443 test.apps:443 ; do ping -c1 -W1 ${i%%:*} 2>&1 | xargs ; curl -skI https://${i%%:*}:${i##*:} 2>&1 | xargs ; done | cut -c -150
for i in api-int:6443 api:6443 test.apps:443 ; do ping -c1 -W1 ${i%%:*} 2>&1 | xargs ; set -x ; curl -skv https://${i%%:*}:${i##*:} -o /dev/null 2>&1 | grep "Server certificate:" -A5 ; set +x ; done | cut -c -150

test talk to api-int

CACERT=/tmp/%var%lib%kubelet%kubeconfig%certificate-authority-data ; grep certificate-authority-data: /var/lib/kubelet/kubeconfig | awk '{print $2}' | base64 -d > /$CACERT ; curl -s --key /var/lib/kubelet/pki/kubelet-client-current.pem --cert /var/lib/kubelet/pki/kubelet-client-current.pem --cacert $CACERT -XGET "$(grep server /etc/kubernetes/kubeconfig | awk '{print $2}')/api/v1/namespaces/default/pods?limit=500"

api urls

kubernetes generic:                    reference to the Kubernetes API server.
kubernetes.default:                    reference to the Kubernetes API server within the "default" namespace.
kubernetes.default.svc:                refers to the Kubernetes service within the "default" namespace.
kubernetes.default.svc.cluster.local:  This is the fully-qualified domain name (FQDN) for the Kubernetes service within the "default" namespace.
openshift:                             Similar to "kubernetes," this is a generic reference to the OpenShift API server.
openshift.default:                     reference to the OpenShift API server within the "default" namespace.
openshift.default.svc:                 refers to the OpenShift service within the "default" namespace.
openshift.default.svc.cluster.local:   fully-qualified domain name (FQDN) for the OpenShift service within the "default" namespace.
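
Quick in-pod test of these names, assuming the default serviceaccount token is mounted (pod name is a placeholder):

oc rsh <pod>
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sk -H "Authorization: Bearer $TOKEN" https://kubernetes.default.svc/version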

okd setup fix

# On bootstrap node. Could work on all clusters. First a test to see if it works already.
DOMAIN=$(grep " baseDomain: " /etc/mcc/bootstrap/cluster-dns-02-config.yml | awk '{print $2}')
for i in api-int api ; do ping -c1 -W1 $i.${DOMAIN} 2>&1 | xargs; done | cut -c -150 
echo "10.1.0.5 api-int.${DOMAIN} api.${DOMAIN}" >> /etc/hosts

oc annotate

Update the annotations on one or more resources.

oc annotate pods foo description='my frontend'

setuid setgid

  securityContext:
    runAsUser: 10004000
    runAsGroup: 10004000

patch examples

Look at oc get ... -o json and copy the structure you want to patch, line by line.

oc patch redis redis-standalone --type merge  --patch '{"spec": {"securityContext": {"runAsGroup": 1000400000}}}'

Enable disable clusterlogging # Unmanaged/Managed

oc patch clusterlogging -n openshift-logging instance --type merge -p '{"spec": {"managementState": "Unmanaged"}}' 

Enable disable elasticsearch

oc patch elasticsearch -n openshift-logging elasticsearch --type merge -p '{"spec": {"managementState": "Unmanaged"}}' # Unmanaged/Managed

finalizers

Remove finalizers from pod.

oc patch pod <pod> -n <namespace> -p '{"metadata":{"finalizers":null}}'

Add finalizer

oc patch pod <pod> -n <namespace> -p '{"metadata":{"finalizers":["kubernetes.io/pvc-protection"]}}'

Replace finalizers value with this.

oc patch pod <pod> -n <namespace> --type merge -p '{"metadata":{"finalizers":["kubernetes.io/pvc-protection","kubernetes"]}}'

edit text/cert entry

#!/bin/bash
SSL_URL=halfface.se
SSL_PORT=443
DATE_FILE=$(date +%F_%H-%M-%S)
openssl s_client -connect ${SSL_URL}:${SSL_PORT} -servername ${SSL_URL} -verify 5 -showcerts -certform pem </dev/null 2>/dev/null | sed -n '/^----/,/^----/p' > chain.${SSL_URL}.${SSL_PORT}.${DATE_FILE}.pem
ln chain.${SSL_URL}.${SSL_PORT}.${DATE_FILE}.pem ${SSL_URL}
oc create cm argocd-tls-certs-cm -n argocd --from-file ${SSL_URL} --dry-run=client -o yaml >> /tmp/chain.${SSL_URL}.${SSL_PORT}.${DATE_FILE}.pem.patch
oc patch configmap argocd-tls-certs-cm -n argocd --patch-file /tmp/chain.${SSL_URL}.${SSL_PORT}.${DATE_FILE}.pem.patch

limits

When you need to increase your cpu and memory resources. The cpu limit is written either as a plain number (0.5 for half a cpu) or in milli units (500m for half a cpu).

spec:
  containers:
...
    resources:
      limits:
        cpu: "2"
        memory: 5Gi
      requests:
        cpu: "2"
        memory: 5Gi

quotas on cpu memory pvc... per project

oc get ResourceQuota
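
A minimal ResourceQuota sketch (name, namespace and values are placeholders):

cat <<EOF | oc apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: myproject
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    persistentvolumeclaims: "10"
EOF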

tolerations|node selectors|...

oc describe pod

Node-Selectors:              node-role.kubernetes.io/app=
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 5s
                             node.ocs.openshift.io/storage=true:NoSchedule

enable monitoring

cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:  
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 2d
EOF

retention elasticsearch

Edit the ClusterLogging CR to add or modify the retentionPolicy parameter:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
...
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy: 
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3
...

retention prometheus

Prometheus retention. https://docs.openshift.com/container-platform/4.10/monitoring/configuring-the-monitoring-stack.html#modifying-retention-time-for-prometheus-metrics-data_configuring-the-monitoring-stack
oc edit configmap cluster-monitoring-config -n openshift-monitoring
# Enable prometheus.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 2d
EOF

retention prometheus default

oc get Prometheus k8s -n openshift-monitoring -o json | jq -r .spec.retention
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/status/runtimeinfo" | jq -r '.data.storageRetention'

EFK (ELK)

# ElasticSearch
# Fluentd
processing pipeline
# Kibana
https://kibana-openshift-logging.apps.<url>

grafana

# grafana
https://grafana-openshift-monitoring.apps.<url>

pull secret

oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson"}}' -o json | jq .

Just the keys.

oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson"}}' -o json | jq -r '.data.".dockerconfigjson"' | base64 -d | jq .

Name of each key and email.

oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson"}}' -o json | jq -r '.data.".dockerconfigjson"' | base64 -d | jq -r '.auths | with_entries(.value = .value.email)' | sed 's/{//g;s/}//g;s/"//g' | grep -v '^$' | sed 's/ *//g' | sort

Download pull secret.

oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' > /tmp/pull_secret.$(oc whoami --show-console=true | awk -F / '{print $3}').$(date '+%Y-%m-%d_%H-%M-%S')

Set pull secret.

oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=/tmp/pull_secret_<file_name>

has pull secret been updated

echo '#' pull-secret ; oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' | jq -r '.auths[].email'
echo '#' apiserver ; oc exec deployment/apiserver -n openshift-apiserver -c openshift-apiserver -- cat /var/lib/kubelet/config.json | jq
echo '#' nodes ; oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'cat /var/lib/kubelet/config.json | jq'

Does pull secret work

jq . /tmp/pull_secret.2024-01-10_12-00-01.registry.redhat.io
{
  "auths": {
    "registry.redhat.io": {
      "auth": "YmxhYmxh"
    }
  }
}
podman pull --authfile /tmp/pull_secret.2024-01-10_12-00-01.registry.redhat.io registry.redhat.io/ubi8/ubi:latest

Which pull secret does machineconfig contain

oc get mc 00-master -o json | jq -r '.spec.config.storage.files[]|select(.path=="/var/lib/kubelet/config.json")|.contents.source' | perl -pe 's/%([0-9a-f]{2})/sprintf("%s", pack("H2",$1))/eig' | sed 's/^data:,//g' | jq .

Is pull secret correct in machineconfigpool. Rendered config

oc get mc rendered-master-3626460c7752fc1605e94c19b7a9aba7 -o json | jq -r '.spec.config.storage.files[]|select(.path=="/var/lib/kubelet/config.json")|.contents.source' | sed 's/^data:,//g' | perl -pe 's/%([0-9a-f]{2})/sprintf("%s", pack("H2",$1))/eig'| jq .

change number of nodes

oc get machineset -n openshift-machine-api
oc edit machineset -n openshift-machine-api <MachineSet>
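
Or scale directly instead of editing (MachineSet name is a placeholder):

oc scale machineset <MachineSet> -n openshift-machine-api --replicas=3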

Elasticsearch status

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | head -1) -- es_util --query=_cat/health?v
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | head -1) -- es_util --query=_cluster/health?pretty

talk to elasticsearch

oc rsh elasticsearch-cdm-q8apadpa-1-65f99d99b4-8b9wg
curl -s --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca https://localhost:9200

Oneliner

oc exec -n openshift-logging -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers -n openshift-logging | head -1) -- curl -s --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca https://localhost:9200

which version of elasticsearch operator is installed

oc get csv -n  openshift-operators-redhat -l operators.coreos.com/elasticsearch-operator.openshift-operators-redhat="" -o=custom-columns='VERSION:.spec.version' --no-headers

list nodes

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query="_cat/nodes?v"

Who is master node

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query="_cat/master?v"

Is cluster recovering

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query="_cat/recovery?active_only=true"

Look at all indices

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=_cat/indices?v

look at shards

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=_cat/shards?v

Create audit index

oc exec -n openshift-logging -c elasticsearch $(oc get pods -n openshift-logging -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | head -1) -- es_util --query=audit-000001 -XPUT

Remove all red indices.

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=_cat/indices?v | grep ^red | awk '{print $3}'  | while read i ; do echo '*' $i ; oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=${i} -X DELETE ; done

vsphere creds

oc get -n kube-system cm/cluster-config-v1 -o yaml

does vsphere account have expected permissions

oc logs -n openshift-cluster-storage-operator -l name=vsphere-problem-detector-operator --timestamps --tail=100 | less

Enable openshift/okd logging

Enable redhat-operators

oc patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": false}]'

Or edit

oc edit operatorhubs 
Spec:
  Disable All Default Sources:  true
  Sources:
    Disabled:  false
    Name:      community-operators
    Disabled:  false
    Name:      redhat-operators

Create namespace

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-operators-redhat 
  annotations:
    openshift.io/node-selector: ""
  labels:
   openshift.io/cluster-monitoring: "true"
EOF

Create namespace

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    openshift.io/node-selector: ""
  labels:
    openshift.io/cluster-monitoring: "true"
EOF

Create operatorgroup

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-operators-redhat
  namespace: openshift-operators-redhat 
spec: {}
EOF

Subscribe to OpenShift Elasticsearch Operator

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: "elasticsearch-operator"
  namespace: "openshift-operators-redhat" 
spec:
  channel: "stable" 
  installPlanApproval: "Automatic" 
  source: "redhat-operators" 
  sourceNamespace: "openshift-marketplace"
  name: "elasticsearch-operator"
EOF

Install the openshift logging operator.

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging 
spec:
  targetNamespaces:
  - openshift-logging 
EOF

Create a subscription object yaml file.

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging 
spec:
  channel: "stable" 
  name: cluster-logging
  source: redhat-operators 
  sourceNamespace: openshift-marketplace
EOF

Create OpenShift Logging instance.

cat <<EOF | oc apply -f -
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance" 
  namespace: "openshift-logging"
spec:
  managementState: "Managed"  
  logStore:
    type: "elasticsearch"  
    retentionPolicy: 
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3 
      storage:
        storageClassName: "standard-csi"
        size: 200G
      resources:
        limits:
          memory: "16Gi"
        requests:
          memory: "16Gi"
      proxy: 
        resources:
          limits:
            memory: 256Mi
          requests:
            memory: 256Mi
      redundancyPolicy: "SingleRedundancy"
  visualization:
    type: "kibana"  
    kibana:
      replicas: 1
  collection:
    logs:
      type: "fluentd"  
      fluentd: {}
EOF

telemetry

Restart telemetry.

oc delete pod -n openshift-monitoring -l app.kubernetes.io/component=telemetry-metrics-collector

Update vsphere/openstack creds

oc edit cm cloud-provider-config -n openshift-config
default-datastore = "cl07-2-fc-loc-001"

Get datastore

oc get cm cloud-provider-config -n openshift-config -o json | jq -r .data.config | sed -nr "/^\[Workspace\]/ { :l /^default-datastore[ ]*=/ { s/[^=]*=[ ]*//; p; q;}; n; b l;}"

Manage labels.

Add a label to a node or pod:

oc label node node001.krenger.ch mylabel=myvalue
oc label pod mypod-34-g0f7k mylabel=myvalue

Remove a label (in the example “mylabel”) from a node or pod:

oc label node node001.krenger.ch mylabel-
oc label pod mypod-34-g0f7k mylabel-

Permanently label a node

oc edit machineset ocp-qz7hf-worker-us-west-1b -n openshift-machine-api

rollout

Restart pods in a deployment

oc rollout restart deployment -n openshift-storage csi-rbdplugin-provisioner

api.<URL>

openssl_x509_multi_line <(oc get secrets external-loadbalancer-serving-certkey -n openshift-kube-apiserver -o json | jq -r '.data."tls.crt"|@base64d')

ssl certificates replace

How to replace api.<url> and star.apps.<url> certs.

# api. Create full chain cert. Public - intermediate - root ca.
api.<url>.crt
api.<url>.key
# create secret
oc delete secret api-cert -n openshift-config
oc create secret tls api-cert --cert=api.<url>.crt --key=api.<url>.key -n openshift-config
# patch apiserver
oc patch apiserver cluster --type=merge -p '{"spec":{"servingCerts": {"namedCertificates": [{"names": ["api.<url>"], "servingCertificate": {"name": "api-cert"}}]}}}'
...
# star.apps. Create full chain cert. Public - intermediate - root ca.
star.apps.<url>.crt
star.apps.<url>.key
# create secret
oc delete secret custom-certs-default -n openshift-ingress
oc create secret tls custom-certs-default --cert=star.apps.<url>.crt --key=star.apps.<url>.key -n openshift-ingress
# patch ingress controller
oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'

edit serving certs

look at api cert

oc get secret -n openshift-config $(oc get apiservers cluster -o json | jq -r '.spec.servingCerts.namedCertificates[].servingCertificate.name') -o json | jq -r '.data."tls.crt"' | base64 -d

Patch secret api cert

oc patch secret -n openshift-config $(oc get apiservers cluster -o json | jq -r '.spec.servingCerts.namedCertificates[].servingCertificate.name') -p '{"data":{"tls.crt": "<new-base64-encoded-certificate>"}}'

Look at ingress cert. wildcard.apps.<url>

oc get secret -n openshift-ingress $(oc get -n openshift-ingress-operator ingresscontrollers default -o json | jq -r .spec.defaultCertificate.name) -o json | jq -r '.data."tls.crt"' | base64 -d

Patch secret ingress wildcard.apps.<url>

oc patch secret -n openshift-ingress $(oc get -n openshift-ingress-operator ingresscontrollers default -o json | jq -r .spec.defaultCertificate.name) -p '{"data":{"tls.crt": "<new-base64-encoded-certificate>"}}'

After you update the above certificates, the following config map is updated to reflect that.

openssl_x509_multi_line <(oc get cm kube-root-ca.crt -o json | jq -r '.data."ca.crt"')

get cluster-id

oc get clusterversion/version -o jsonpath="{.spec.clusterID}"

api

Processes running the api server. They scale horizontally and all serve requests.

openshift-kube-apiserver 
kube-apiserver

kube-proxy

kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept.
kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster.
kube-proxy uses the operating system packet filtering layer if there is one and it's available. Otherwise, kube-proxy forwards the traffic itself.
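
To look at the service rules kube-proxy programs on a node, assuming it runs in iptables mode (clusters using OVN-Kubernetes handle services in OVN and have no kube-proxy; node name is a placeholder):

oc debug node/<node> -- chroot /host iptables -t nat -nL KUBE-SERVICES | head -20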

Resource Allocation

OS and Kubernetes overhead. You can see the reserved OS & Kubernetes overhead by comparing the Allocatable (what the Kubernetes Scheduler can allocate to Pods) and the Capacity.

Capacity:
->cpu:                4
  ephemeral-storage:  125293548Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
->memory:             16409360Ki
  pods:               250
Allocatable:
->cpu:                3500m
  ephemeral-storage:  114396791822
  hugepages-1Gi:      0
  hugepages-2Mi:      0
->memory:             15258384Ki
  pods:               250

requests/limits

User pod allocation is calculated by looking at the "Requests" resource columns in the kubectl describe node output.
The relevant columns here are the "Requests", not the Limits.
Requests impact how the pod is scheduled and what resources are allocated to it,
whereas limits are used to enable pods to burst beyond their allocation.
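
To see requests and limits for a single pod (pod name is a placeholder):

oc get pod <pod> -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'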

look at current Allocated resources

oc get nodes --no-headers --selector="node-role.kubernetes.io/worker" -o=custom-columns='NAME:.metadata.name' | while read NODE ; do oc describe node $NODE | grep "Allocated resources:" -A10 | grep -E ' cpu | memory ' | while read RESOURCE ; do echo $NODE $RESOURCE ; done ; done

empty space

Allocatable - Allocated resources = empty

Allocatable:
  cpu:                3500m
  ephemeral-storage:  114396791822
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             15258384Ki
  pods:               250
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                834m (23%)    0 (0%)
  memory             2474Mi (16%)  736Mi (4%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)

status of namespace

Show an overview of the current project

oc status

age of cluster

Looking at age of machines.

oc get nodes -o json | jq -r '.items[].metadata.creationTimestamp' | sort -n | sed 's/T/ /g;s/Z//g'

oc adm inspect

oc adm inspect namespace/isilon
tar cf /tmp/inspect.isilon.$(date_file ) inspect.local.*

Operations Lifecycle manager(olm)

oc logs -l app=olm-operator -n openshift-operator-lifecycle-manager --tail=-1

Reinstall operator that is no longer available with current openshift version

# Force install odf, which is not possible to install because openshift has moved more than 1 version ahead.
# Save subscription 
for i in operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= operators.coreos.com/odf-operator.openshift-storage= ; do 
oc get subscription -o yaml -l $i > oc_get_subscription_${i//\//_}.yaml ; done
...
# Save operators
for i in operators.coreos.com/odf-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= ; do 
oc get csv -l $i -o yaml > oc_get_csv_-l_${i//\//_}.yaml ; done
...
# Confirm backup files contain usable yaml. Have we forgotten any operators or CSVs? Remove resources clearly not related to odf.
...
# delete the existing ODF related subscriptions and the ClusterServiceVersions related:
for i in operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= operators.coreos.com/odf-operator.openshift-storage= ; do 
oc delete subscription -l $i; done
for i in operators.coreos.com/odf-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= ; do 
oc delete csv -l $i  ; done
...
# Make sure you wait for the CSVs to be deleted before creating a subscription again.
...
# create only the Subscription again:
# (optional: edit the subscription before recreating it, changing the channel to the goal version)
...
# Recreate subscription
oc create -f 'oc_get_subscription_operators.coreos.com_odf-operator.openshift-storage=.yaml'
# wait watching the events:
oc get events -w

increase disk on node

Update worker machineset.

oc patch machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') --type merge -p '{"spec": {"template": {"spec": {"providerSpec": {"value": {"rootVolume": {"diskSize" : 50}}}}}}}'

View results from above

oc get machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') -o yaml | tee /tmp/$(oc get DNS cluster -o=jsonpath='{.spec.baseDomain}').$(date +%F_%H-%M-%S).yaml

Update on node only

VOLUME=abjorklund-01-h4sxm-worker-0-rkk87-root
os volume set --size 40 $VOLUME --os-volume-api-version 3.42
dnf install cloud-utils-growpart xfsprogs
ssh core@worker
growpart /dev/sda 4
xfs_growfs /

increase ram on worker nodes

oc patch machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') --type merge -p '{"spec": {"template": {"spec": {"providerSpec": {"value": {"memoryMiB" : 24576}}}}}}'

Change flavor of worker node

oc patch machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') --type merge -p '{"spec": {"template": {"spec": {"providerSpec": {"value": {"flavor" : "hm.4x16"}}}}}}'

set number of worker nodes

oc patch machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') --type merge -p '{"spec": {"replicas" : 2}}'

clusteroperator

ClusterOperator is the Custom Resource object which holds the current state of an operator. A clusteroperator is responsible for core, system-wide functions like dns and so on.

oc get clusteroperators
oc get co
oc get clusteroperators -o custom-columns=NAME:.metadata.name,ANNOTATIONS:.metadata.annotations

ignition

Retrieve rendered ignition data.

curl https://api-int.$(grep ^search /etc/resolv.conf | awk '{print $NF}'):22623/config/master
curl -v https://api-int.$(grep ^search /etc/resolv.conf | awk '{print $2}'):22623/config/worker

redhat ubi container names

ubi ("Standard"): OpenSSL, microdnf, and utilities like gzip and vi
ubi-minimal ("Minimal"): Minimized binaries and minimal yum stack.
ubi-init ("Multi-service"): Less than standard but more than minimal, plus systemd.
ubi-micro ("Micro"): Most minimal image without even a package manager.

create a job/pod/script

Create config map of script

Notice that $ has to be escaped, since the script is fed via a here document where $ would otherwise be expanded.

cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: dns-lookup.sh
data:
  dns-lookup.sh: |
    #!/bin/bash
    # Verify if dns resolution works and how fast.
    while true ; do
      for DNS in \$(awk '/^nameserver / {print \$2}' /etc/resolv.conf) 10.2.0.10 ; do
        echo \$(date '+%F %H:%M:%S %Z') \$DNS \$(host -v -t A ibm.se 2>&1 | tail -3 )
      done
      sleep 5
    done
EOF

create job

cat <<EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: dns-lookup
spec:
  template:
    spec:
      containers:
        - name: dns-lookup
#          image: rockylinux/rockylinux:9
          image: halfface/rockylinux-toolbox:v2
          command: ["/script/dns-lookup.sh"]
          volumeMounts:
            - name: script
              mountPath: "/script"
#          securityContext:
#            runAsUser: 0
#            privileged: true
      volumes:
        - name: script
          configMap:
            name: dns-lookup.sh
            defaultMode: 0755
      restartPolicy: Never
      activeDeadlineSeconds: 1209600
EOF

deployment with command

ConfigMap with script. $ is escaped since it is fed via a here document (bash).

cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: stress.sh
  namespace: abjorklund
data:
  stress.sh: |
    #!/bin/bash
    # stress pod.
    while true ; do
      echo \$(date '+%F %H:%M:%S %Z') \$( stress -m 1 --vm-bytes 3000M --vm-keep -t 300s )
      sleep 5
    done
EOF

Deployment

cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
#  name: stress-gp-6x12
  name: stress-hm-6x32
  namespace: abjorklund
  labels:
    app: stress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress
  template:
    metadata:
      labels:
        app: stress
    spec:
      containers:
      - name: stress
        image: halfface/rockylinux-toolbox:v3
        volumeMounts:
        - mountPath: /mnt/bin/
          name: stress
        command: ["/mnt/bin/stress.sh"]
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 500m
            memory: 1Gi
      volumes:
        - name: stress
          configMap:
            name: stress.sh
            defaultMode: 0755
      nodeSelector:
#        node.kubernetes.io/instance-type: gp.6x12
        node.kubernetes.io/instance-type: hm.6x32
EOF

terminal fix

No line wraps

tput rmam

list operatorhub/catalogsources

oc get catalogsources -n openshift-marketplace
oc get catalogsources -n openshift-marketplace -o custom-columns=NAME:.metadata.name,DISPLAY:.spec.displayName,STATE:.status.connectionState.lastObservedState,TYPE:.spec.sourceType,PUBLISHER:.spec.publisher,IMAGE:.spec.image

remove catalogsources

oc get catalogsources.operators.coreos.com -n openshift-marketplace -l company=cambio --no-headers -o custom-columns=:.metadata.name | while read i ; do echo oc get catalogsources $i -n openshift-marketplace -o yaml \>oc_get_catalogsources.$(oc_api_url).$i.$(date_file).yaml ; echo oc delete catalogsource -n openshift-marketplace $i ; done

which changes will occur

. /etc/node-sizing-enabled.env ; NODE_SIZES_ENV=/tmp/node-sizing.env /usr/local/sbin/dynamic-system-reserved-calc.sh true ${SYSTEM_RESERVED_MEMORY} ${SYSTEM_RESERVED_CPU} ${SYSTEM_RESERVED_ES} ; sdiff /etc/node-sizing.env /tmp/node-sizing.env

SYSTEM_RESERVED

cat <<EOF | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: dynamic-node 
spec:
  autoSizingReserved: true 
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: "" 
EOF

Which changes will occur.

oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'hostname ; . /etc/node-sizing-enabled.env ; NODE_SIZES_ENV=/tmp/node-sizing.env /usr/local/sbin/dynamic-system-reserved-calc.sh true ${SYSTEM_RESERVED_MEMORY} ${SYSTEM_RESERVED_CPU} ${SYSTEM_RESERVED_ES} ; sdiff /etc/node-sizing.env /tmp/node-sizing.env' 2>/dev/null

CNI

oc get networks cluster -o 'custom-columns=NETWORKTYPE:.spec.networkType'

Cni from install

echo -e "$(oc --request-timeout=5 get -n kube-system cm/cluster-config-v1 -o json | jq -r '."data"."install-config"')" | python -c 'import sys, yaml, json; json.dump(yaml.safe_load(sys.stdin), sys.stdout, indent=4)' | jq -r .networking.networkType

autoscale.

https://docs.openshift.com/container-platform/4.12/machine_management/applying-autoscaling.html

ClusterAutoscaler

cat <<EOF | oc apply -f -
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  podPriorityThreshold: -10
  resourceLimits:
    maxNodesTotal: 24
    cores:
      min: 8
      max: 128
    memory:
      min: 4
      max: 256
  logVerbosity: 4
  scaleDown:
    enabled: true
    delayAfterAdd: 10m
    delayAfterDelete: 5m
    delayAfterFailure: 30s
    unneededTime: 5m
    utilizationThreshold: "0.4"
EOF

MachineAutoscaler

cat <<EOF | oc apply -f -
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: abjorklund-01-h4sxm-worker-0
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 12
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: abjorklund-01-h4sxm-worker-0
EOF

autoscaler does not scale down

oc logs -l cluster-autoscaler=default -n openshift-machine-api --tail=-1 --timestamps=true

change dns server for domain

oc edit dns.operator/default
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  name: default
spec:
  servers:
  - name: halffce-server
    zones:
    - halfface.se
    forwardPlugin:
      policy: Random
      upstreams:
      - 10.111.222.2
# View config.
oc get configmap/dns-default -n openshift-dns -o yaml

coredns

# tail logs.
oc get events -A --sort-by=.metadata.creationTimestamp
# Change debug level.
oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Debug"}}' --type=merge

Sets log . { class denial error }

oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Trace"}}' --type=merge

Sets log . { class all }

oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Normal"}}' --type=merge

Sets log . { class error }

Get log files for analyze

oc get pods -l dns.operator.openshift.io/daemonset-dns=default  -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName --no-headers -n openshift-dns | while read i j ; do oc logs $i --tail=-1 -c dns --timestamps=true -n openshift-dns > /tmp/oc_logs_$j.$i.$(oc get DNS cluster -o=jsonpath='{.spec.baseDomain}').$(date +%F_%H-%M-%S) ; done

get instance dns name

oc get DNS cluster -o=jsonpath='{.spec.baseDomain}'

Read values provided by coredns /metrics

oc exec -it -n openshift-dns $(oc get pods -l dns.operator.openshift.io/daemonset-dns=default --no-headers -n openshift-dns| head -1) -- curl -s http://localhost:9153/metrics

coredns default logformat

# Default format
{remote}:{port} - {>id} "{type} {class} {name} {proto} {size} {>do} {>bufsize}" {rcode} {>rflags} {rsize} {duration}
# Values explained
{port}: client’s port
{remote}: client’s IP address, for IPv6 addresses these are enclosed in brackets: [::1]
{>id}: query ID
{type}: qtype of the request
{class}: qclass of the request
{name}: qname of the request
{proto}: protocol used (tcp or udp)
{size}: request size in bytes
{>do}: is the EDNS0 DO (DNSSEC OK) bit set in the query
{>bufsize}: the EDNS0 buffer size advertised in the query
{rcode}: response RCODE
{>rflags}: response flags, each set flag will be displayed, e.g. “aa, tc”. This includes the qr bit as well
{rsize}: raw (uncompressed), response size (a client may receive a smaller response)
{duration}: response duration
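
A hypothetical log line in this format, to map the fields:

10.128.2.14:42312 - 4588 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd 106 0.000362s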

Confirm that coredns hosts are possible to resolve

 grep match /etc/coredns/Corefile | uniq | sed 's/\[//g;s/\]//g;s/^ *match //g;s/\.\*/test/g;s/^\^//g' | while read i ; do echo $(dig +short ${i}.) ${i}. ; done

Create Let's Encrypt certificates for a DNS domain in Route 53, managed by cert-manager.

  1. Create a domain in route 53.
  2. Create a user with a token for "Application running outside AWS"

Fill in below values to be able to update config below.

Hosted_Zone_id:    <Hosted_Zone_id>
Access_key:        <Access_key>
Secret_access_key: <Secret_access_key>
DNS_Domain:        <DNS_Domain>
DNS_shortname:     <DNS_shortname>

Attach the following policy to your newly created user.

(Populate all <Values> below.)

{
    "Version": "2023-11-22",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "route53:GetChange",
            "Resource": "arn:aws:route53:::change/*"
        },
        {
            "Effect": "Allow",
            "Action": "route53:ChangeResourceRecordSets",
            "Resource": "arn:aws:route53:::hostedzone/<Hosted_Zone_id>"
        },
        {
            "Effect": "Allow",
            "Action": "route53:ListHostedZonesByName",
            "Resource": "*"
        }
    ]
}

Create namespace

oc create namespace cert-manager

Install cert-manager community version via graphical fluff.
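
Or subscribe from the CLI instead of the console. Package name and channel are assumptions, verify first with: oc get packagemanifests -n openshift-marketplace | grep -i cert-manager

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cert-manager
  namespace: openshift-operators
spec:
  channel: stable
  name: cert-manager
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF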

Create secret that includes <Secret_access_key>.

oc create secret generic route53-secret --from-literal=secret-access-key="<Secret_access_key>" -n cert-manager

Create a ClusterIssuer for Let's Encrypt which uses Route 53 to prove that you own the DNS domain.

cat <<EOF | oc apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-dns
  namespace: cert-manager
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: support@company.se
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: <DNS_shortname>-issuer-account-key
    solvers:
      - selector:
          dnsZones:
           - "<DNS_Domain>"
        dns01:
          route53:
            accessKeyID: <Access_key>
            secretAccessKeySecretRef:
              name: route53-secret
              key: secret-access-key
            hostedZoneID: <Hosted_Zone_id>
            region: 'us-east-1'
EOF

Create api certificate.

cat <<EOF | oc apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cert-api
  namespace: openshift-config
spec:
  issuerRef:
    name: letsencrypt-prod-dns
    kind: ClusterIssuer
  dnsNames:
     - "api.<DNS_Domain>"
  secretName: le-api-cert
  commonName: "api.<DNS_Domain>"
EOF

Start to use api certificate.

oc patch apiserver cluster --type=merge -p '{"spec":{"servingCerts": {"namedCertificates": [{"names": ["api.<DNS_Domain>"], "servingCertificate": {"name": "le-api-cert"}}]}}}'

Create ingress certificate

cat <<EOF | oc apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: le-wildcard-apps-certificate
  namespace: openshift-ingress
spec:
  issuerRef:
    name: letsencrypt-prod-dns
    kind: ClusterIssuer
  dnsNames:
    - "*.apps.<DNS_Domain>"
  secretName: le-wildcard-apps-certificate
  commonName: "*.apps.<DNS_Domain>"
EOF

Start to use ingress certificate.

oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default --patch '{"spec":{"defaultCertificate":{"name":"le-wildcard-apps-certificate"}}}'

resolv.conf

ndots 5. This means that the DNS client will automatically consider a domain name to be fully qualified (which will allow it to skip the search path iteration) if it has five or more dots.
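
Check what a pod actually got (pod name is a placeholder, output typically looks like this):

oc rsh <pod> cat /etc/resolv.conf
search mynamespace.svc.cluster.local svc.cluster.local cluster.local
nameserver 172.30.0.10
options ndots:5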

bind to external login sources ldap ad

oc get authentications.operator.openshift.io cluster -o yaml

get machine name and creation time

oc get machines -o=custom-columns='NAME:.metadata.name,CREATIONTIMESTAMP:.metadata.creationTimestamp,TYPE:.spec.providerSpec.value.flavor,STATUS:.status.phase' -n openshift-machine-api

setup nfs server

nfs export shared between pods.

Create server

openstack server create --flavor gp.1x2 --availability-zone europe-se-1a --image rocky-8-x86_64 --boot-from-volume 30 --network abjorklund-01-bmc7w-openshift --security-group ssh_allow --key-name abjorklund_ed25519 abjorklund_$(date_file)
openstack volume create --size 50 --type ssd --description "nfs storage block device 0" nfs_storage_abjorklund-01
openstack server add volume e93d2db1-6d95-4364-a236-0bd1b9255e90 28adbdb9-c88d-4397-9a79-b13c505016a8 --device /dev/vdb

install nfs dependencies

dnf -y install cloud-utils-growpart nfs-utils iptables-utils epel-release vim-enhanced

How to grow filesystem.

partx growpart
os volume set --size 60 nfs_storage_abjorklund-01 --os-volume-api-version 3.42

Create partition and filesystem.

gdisk /dev/sdb
mkfs.ext4 /dev/sdb1
find /dev/ -ls | grep sdb | grep by-uuid

Mount drive. /etc/fstab

UUID=66998126-9f18-44ce-a462-827c870a57bd /netstorage                       ext4     defaults        0 0
mkdir /netstorage
mount /netstorage/
mkdir /netstorage/abjorklund-01
chmod 777 /netstorage/abjorklund-01

export drive

systemctl enable nfs-server.service --now

/etc/exports
/netstorage/abjorklund-01 10.1.0.0/16(rw,root_squash)
exportfs -rav

setup deployment

# deployment.yaml
cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/worker
                    operator: Exists
      serviceAccountName: nfs-client-provisioner
      securityContext:
        supplementalGroups:
          - 65534
          - 1261150637
      containers:
        - name: nfs-client-provisioner
          image: gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: auto-nfs-storage
            - name: NFS_SERVER
              value: 10.1.0.48
            - name: NFS_PATH
              value: "/netstorage/abjorklund-01"
      volumes:
        - name: nfs-client-root
          nfs:
            server: 10.1.0.48
            path: /netstorage/abjorklund-01
EOF
# nfs-clusterrolebinding.yaml
cat <<EOF | oc apply -f -
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
EOF
# nfs-clusterrole.yaml
cat <<EOF | oc apply -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
EOF
# nfs-rolebinding.yaml
cat <<EOF | oc apply -f -
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
EOF
# nfs-role.yaml
cat <<EOF | oc apply -f -
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
EOF
# nfs-sa.yaml
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
EOF
# storageclass.yaml
cat <<EOF | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: auto-nfs-storage # or choose another name, must match the deployment's env PROVISIONER_NAME
parameters:
  onDelete: delete
EOF
# test-claim.yaml
cat <<EOF | oc apply -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
  namespace: default
spec:
  storageClassName: managed-nfs-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF
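
A minimal test pod to confirm the claim binds and is writable; the image is just an example.

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: default
spec:
  containers:
    - name: test
      image: registry.access.redhat.com/ubi8/ubi-minimal
      command: ["sh", "-c", "touch /mnt/SUCCESS && sleep 3600"]
      volumeMounts:
        - name: nfs-pvc
          mountPath: /mnt
  volumes:
    - name: nfs-pvc
      persistentVolumeClaim:
        claimName: test-claim
EOF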

setup nfs csi driver

https://github.com/kubernetes-csi/csi-driver-nfs
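
Once that driver is installed, a StorageClass for it could look roughly like this, reusing the NFS server and export from the section above.

cat <<EOF | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: 10.1.0.48
  share: /netstorage/abjorklund-01
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
EOF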

dns

https://access.redhat.com/solutions/3804501

confirm upstream dns works

for UPSTREAM_DNS_IP in 10.46.201.1 10.46.201.2 10.46.201.3 ; do UPSTREAM_DNS_PORT=53 ; echo -e "\nTCP\n"; for dnspod in `oc get pods -n openshift-dns -o name --no-headers -l dns.operator.openshift.io/daemonset-dns=default`; do echo "Pod $dnspod"; oc exec -n openshift-dns -c dns $dnspod -- dig @${UPSTREAM_DNS_IP} redhat.com -p ${UPSTREAM_DNS_PORT} +tcp +short; echo; done ; done
for UPSTREAM_DNS_IP in 10.46.201.1 10.46.201.2 10.46.201.3 ; do UPSTREAM_DNS_PORT=53 ; echo -e "\nUDP\n"; for dnspod in `oc get pods -n openshift-dns -o name --no-headers -l dns.operator.openshift.io/daemonset-dns=default`; do echo "Pod $dnspod"; oc exec -n openshift-dns -c dns $dnspod -- dig @${UPSTREAM_DNS_IP} redhat.com -p ${UPSTREAM_DNS_PORT} +notcp +short; echo; done ; done

image

Show which image registries are allowed (cluster image configuration).

oc get image.config.openshift.io cluster -o yaml
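
To restrict image sources, spec.registrySources can be patched; the registry list below is only an example, and the registries the cluster itself needs must not be blocked.

oc patch image.config.openshift.io cluster --type=merge -p '{"spec":{"registrySources":{"allowedRegistries":["quay.io","registry.redhat.io","image-registry.openshift-image-registry.svc:5000"]}}}'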

enable sso with keycloak

apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  annotations: {}
  labels:
    app.kubernetes.io/instance: sso
  name: cluster
spec:
  identityProviders:
    - mappingMethod: add
      name: SSO
      openID:
        claims:
          email:
            - email
          groups:
            - groups
          name:
            - name
          preferredUsername:
            - preferred_username
        clientID: <Client name in keycloak>
        clientSecret:
          name: keycloak-client-secret
        extraScopes: []
        issuer: <URL to issuer>
      type: OpenID
---
apiVersion: v1
data:
  clientSecret: <base64 secret>
kind: Secret
metadata:
  labels:
    app.kubernetes.io/instance: sso
  name: keycloak-client-secret
  namespace: openshift-config
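
Instead of the Secret manifest above, the client secret can also be created directly; the key must be named clientSecret.

oc create secret generic keycloak-client-secret --from-literal=clientSecret=<client secret> -n openshift-config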

keepalived/api/ingress

When more than one node serves the same IP for API or ingress, reset /etc/keepalived/keepalived.conf on all nodes.

oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'echo "# unicast_peer" > /etc/keepalived/keepalived.conf'

Find which node currently holds the API VIP (adapt the grep for the ingress VIP).

oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'ip a' 2>&1 | tee /tmp/tmp ; grep $(host $(oc whoami --show-server | awk -F ':|/' '{print $4}') | awk '{print $NF}') /tmp/tmp

diff rendered mc

export OLD_RENDERED=rendered-infra-6c7e5fc796264dd32341950aea971807 ; export NEW_RENDERED=rendered-infra-bac1dd431374a5c4c21742e547739c7c ; diff -NrU 5 <(oc get mc ${OLD_RENDERED} -o json) <(oc get mc ${NEW_RENDERED} -o json)

secret management

List secrets of the type tls.

oc get secrets --field-selector type=kubernetes.io/tls

ocm

ocm install

(cd /usr/local/bin/ ; sudo curl -vLsk https://github.com/openshift-online/ocm-cli/releases/download/v0.1.72/ocm-linux-amd64 -o ocm ; sudo chmod 755 ocm)

ocm search examples

ocm list clusters --parameter search="name like 'da0d9ade-d649-4948-8bc6-744a1fcb0960'"
ocm get /api/clusters_mgmt/v1/clusters --parameter search="name like '0047ccf6-134b-4bff-99e0-5f2d6532a3ea'"
ocm get /api/accounts_mgmt/v1/subscriptions/ --parameter size=1000 | jq -r '.items[]| .display_name +"\t"+ .status +"\t"+ .cluster_id +"\t"+ .created_at' | grep -v Archived | column_tab

Search for two states.

ocm get /api/accounts_mgmt/v1/subscriptions/ --parameter search="status like 'Active' or status like 'Stale'" --parameter size=1000

PodDisruptionBudget

API object that limits voluntary disruptions by specifying the minimum number of pods that must stay available (minAvailable) or the maximum that may be unavailable (maxUnavailable) at a time.
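
A minimal example, assuming an application labelled app=my-app that should always keep two pods running.

cat <<EOF | oc apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
EOF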

pod placement

Sanity check of pod placement: list the pods on each worker node and check whether the same pods also run on other nodes.

oc get nodes --no-headers --selector='node-role.kubernetes.io/worker,!node-role.kubernetes.io/infra' -o=custom-columns='NAME:.metadata.name' | while read NODE ; do oc get pods -A -o wide --no-headers --field-selector "spec.nodeName=$NODE" | while read NAMESPACE POD REST ; do echo '#' $NAMESPACE ${POD%-*} ; oc get pods -n $NAMESPACE -o wide | grep ${POD%-*} ; done ; done | less -ISRM

Are any user pods running outside worker nodes?

oc get project --no-headers  -o=custom-columns='NAME:.metadata.name' | grep -v ^openshift- | while read NAMESPACE ; do echo '*' $NAMESPACE ; oc get pods -o wide -n $NAMESPACE ; done

wait

Wait for Kafka to become ready.

kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka
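
The same pattern works for other resources, e.g. waiting for a deployment to become available; name and namespace are placeholders.

oc wait deployment/<name> --for=condition=Available --timeout=120s -n <namespace>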

list configured ssh public keys

oc get machineconfig --no-headers -o custom-columns=":metadata.name" | grep -E '^99-.*-ssh$' | while read MACHINECONFIG ; do echo '*' "${MACHINECONFIG}" ; oc get machineconfig "${MACHINECONFIG}" -o json | jq -r '.spec.config.passwd.users[].sshAuthorizedKeys[]'; done

Add key for ssh login

oc get machineconfig --no-headers -o custom-columns=":metadata.name" | grep -E '^99-.*-ssh$' | while read MACHINE_CONFIG_SSH ; do echo '*' $MACHINE_CONFIG_SSH ; oc patch machineconfig $MACHINE_CONFIG_SSH --type=json --patch="[{\"op\":\"add\", \"path\":\"/spec/config/passwd/users/0/sshAuthorizedKeys/-\", \"value\":\"$(cat $HOME/.ssh/id_ed25519.pub)\"}]" ; done

Same as above, but save the current MachineConfig first.

oc get machineconfig --no-headers -o custom-columns=":metadata.name" | grep -E '^99-.*-ssh$' | while read MACHINE_CONFIG_SSH ; do echo '*' $MACHINE_CONFIG_SSH ; oc_script_log oc get machineconfig $MACHINE_CONFIG_SSH -o yaml </dev/null ; oc patch machineconfig $MACHINE_CONFIG_SSH --type=json --patch="[{\"op\":\"add\", \"path\":\"/spec/config/passwd/users/0/sshAuthorizedKeys/-\", \"value\":\"$(cat $HOME/.ssh/id_ed25519.pub)\"}]" ; done

readable output from df.

df -lh | grep -Ev '^overlay|^tmpfs|^shm|^nsfs|^cgroup|^devtmpfs'

give me openstack credentials

oc get secret -n kube-system openstack-credentials -o json | jq -r '.data."clouds.yaml" | @base64d'

extract content of container

CONT_ID=$(docker create nginx:latest)
docker export ${CONT_ID} -o nginx.tar   # export writes an uncompressed tar archive
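
Then unpack the archive to browse the container filesystem.

mkdir nginx-rootfs
tar -xf nginx.tar -C nginx-rootfs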

shut down openshift

Stolen with pride: https://docs.openshift.com/container-platform/4.12/backup_and_restore/graceful-cluster-shutdown.html

# Etcd backup.
# Do we use a proxy?
oc get proxy cluster -o yaml
# Make an etcd backup.
oc debug --as-root node/$(oc get nodes --no-headers --selector='node-role.kubernetes.io/master' -o=custom-columns='NAME:.metadata.name' | head -1) -- chroot /host sh -c '/usr/local/bin/cluster-backup.sh /home/core/assets/backup'
# Copy files locally.
MASTER=node/$(oc get nodes --no-headers --selector='node-role.kubernetes.io/master' -o=custom-columns='NAME:.metadata.name' | head -1) ; oc debug $MASTER -- chroot /host sh -c 'ls /home/core/assets/backup/*' 2>/dev/null | while read ETCD_BACKUP ; do echo '*' Copying ${ETCD_BACKUP##*/} ; oc debug $MASTER -- chroot /host sh -c "cat $ETCD_BACKUP | gzip -9" | zcat > ${ETCD_BACKUP##*/} ; done
# Confirm files are ok.
MASTER=node/$(oc get nodes --no-headers --selector='node-role.kubernetes.io/master' -o=custom-columns='NAME:.metadata.name' | head -1) ; oc debug $MASTER -- chroot /host sh -c 'ls /home/core/assets/backup/*' 2>/dev/null | while read ETCD_BACKUP ; do echo '*' md5sum ${ETCD_BACKUP##*/} ; oc debug $MASTER -- chroot /host sh -c "md5sum $ETCD_BACKUP" 2>/dev/null ; md5sum ${ETCD_BACKUP##*/} ; done
# When does certificate run out.
oc -n openshift-kube-apiserver-operator get secret kube-apiserver-to-kubelet-signer -o jsonpath='{.metadata.annotations.auth\.openshift\.io/certificate-not-after}{"\n"}'
# kubelet client/server certificate expiration.
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate; openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -noout -enddate'
# If certificates expire while the cluster is shut down, the CSRs have to be approved manually when it comes back up.
# oc get csr -o name | xargs oc adm certificate approve
# Shutdown all nodes.
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'shutdown -h 1'
# The nodes can stay powered off until you want to revive the cluster.
# To start them up again, use something like this (OpenStack example).
openstack server list -f value | grep SHUTOFF | awk '{print $2}' | xargs openstack server start

statefulset

StatefulSet is a Kubernetes controller designed to manage stateful applications that require stable network identities and persistent storage. It handles the deployment, scaling, and management of pods in an ordered and predictable manner, making it ideal for databases, distributed systems, and other applications where state preservation is critical.
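
A minimal sketch with a headless Service and per-pod storage; the image is a placeholder workload and managed-nfs-storage is the storage class created in the NFS section above.

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Service
metadata:
  name: demo-sts
spec:
  clusterIP: None            # headless service gives each pod a stable DNS name
  selector:
    app: demo-sts
  ports:
    - port: 8080
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-sts
spec:
  serviceName: demo-sts
  replicas: 2
  selector:
    matchLabels:
      app: demo-sts
  template:
    metadata:
      labels:
        app: demo-sts
    spec:
      containers:
        - name: app
          image: registry.access.redhat.com/ubi8/ubi-minimal   # placeholder workload
          command: ["sleep", "3600"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:        # one PVC per pod, kept across pod restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: managed-nfs-storage
        resources:
          requests:
            storage: 1Gi
EOF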

oc diff

See which changes would be made.

kubectl diff -f <manifest>.yaml

taint

Remove taint from node.

kubectl taint node control-plane0.novalocal control-plane1.novalocal control-plane2.novalocal node.cloudprovider.kubernetes.io/uninitialized-
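
List the taints that are currently set first.

oc get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'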

Sealed secrets

Get the sealed secret that you want to decrypt.

oc get sealedsecrets -n openshift-config ldap-secret -o yaml > sealedsecrets_-n_openshift-config_ldap-secret

Decrypt sealed secrets

kubeseal --recovery-private-key <private_key_file> --recovery-unseal < sealedsecrets_-n_openshift-config_ldap-secret > sealedsecrets_-n_openshift-config_ldap-secret.unsealed

Get private keys from Sealed Secrets

oc get secret -n kube-system -l sealedsecrets.bitnami.com/sealed-secrets-key -o json | jq -r '.items[].data."tls.key"' | while read LINE ; do echo $LINE | base64 -d > $(echo "${LINE}" | cut -c -100) ; done

imagetag

ImageTag represents a single tag within an image stream and includes the spec, the status history, and the currently referenced image (if any) of the provided tag.

"alertname": "SamplesImagestreamImportFailing",
"namespace": "openshift-cluster-samples-operator",
# Remove tags with failed imports
oc -n openshift get imagetag | grep "ImportFailed" | awk -e '{ print $1 }' | xargs oc -n openshift tag -d

custom-column examples

oc get machine -n openshift-machine-api -o custom-columns=MACHINE:.metadata.name,SERVERGROUPNAME:.spec.providerSpec.value.serverGroupName,CREATIONTIME:.metadata.creationTimestamp --no-headers