Openshift

What does it mean?

annotation            comment, definition; attach metadata to objects.
ceph                  Delivers object, block, and file storage in one unified system.
ceph-osd              object storage daemon for the Ceph distributed file system. It is responsible for storing objects on a local file system and providing access to them over the network.
clbo                  CrashLoopBackOff
cmo                   Cluster Monitoring Operator
cncf                  Cloud Native Computing Foundation
cni                   Container Network Interface (OVNKubernetes OpenShiftSDN)
cnv                   Container-native Virtualization, add-on to OpenShift Container Platform that allows virtual machine workloads to run and be managed alongside container workloads.
co                    Cluster Operator
cr                    Custom Resource. An instance of a CRD; typically appears when an operator or feature is enabled. You get them from "oc api-resources".
crd                   CustomResourceDefinition. Defines a custom resource type. The name of a CRD object must be a valid DNS subdomain name.
cri                   Container Runtime Interface
cri-o                 Lightweight container runtime for kubernetes.
csi                   Container Storage Interface
csm                   Container Storage Modules
csv                   cluster service version
cvo                   Cluster Version Operator
cvss                  Common Vulnerability Scoring System
daemonset             Ensures that all (or some) Nodes run a copy of a Pod
deployment            You describe a desired state in a Deployment. Deployment object describes how to create or modify pods that hold a containerized application by defining the desired state of a particular component. Deployments create and manage how ReplicaSets are deployed.
ephemeral             Short lived, temporary
fsgroup               Group which Kubernetes will change the permissions of all files in volumes to when volumes are mounted by a pod. 
grpc                  gRPC Remote Procedure Calls (originally from Google), framework that brings performance benefits and modern features to client-server applications. Like RPC.
icsp                  ImageContentSourcePolicy. Blocking a payload registry.
idp                   identity provider
ipc namespace         Each IPC namespace has its own set of System V IPC identifiers and its own POSIX message queue filesystem.
ipi                   Installer-Provisioned Infrastructure
kubelet               Kubelet is the primary "node agent" that runs on each node. Takes a set of PodSpecs (primarily through the apiserver) and ensures the containers described are running and healthy.
kvdb                  key-value store (portworx)
mco                   machine-config-operator
mcp                   machine config pools
noobaa                data service for cloud environments, providing an S3 object-store interface with flexible tiering, mirroring, and spread placement policies, over any storage resource that allows GET/PUT, including S3, GCS, etc.
nsfs                  virtual filesystem making Linux-kernel namespaces available.
oadp                  openshift api data protection
oci                   Open Container Initiative
ocm                   OpenShift Cluster Manager
ocp                   OpenShift Container Platform
ocs                   OpenShift Container Storage
odf                   OpenShift Data Foundation
olm                   Operator Lifecycle Manager
osm                   Open Service Mesh. Lightweight, extensible, cloud native service mesh
ovnk                  Open Virtual Network Kubernetes
pvc                   Persistent volume claim. binding between a Pod and Persistent Volume.
pv                    Persistent volume. Persistent storage. low level representation of a storage volume.
provisioner           A StorageClass object contains a provisioner that decides which volume plugin is used to provision PersistentVolumes.
quay.io               builds, analyzes, and distributes your container images. Owned by Red Hat (IBM).
ReadWriteMany         Access mode: the volume can be mounted read/write by many nodes.
reconciling           The controller loop that drives the actual cluster state toward the desired state.
registrar             The node-driver-registrar is a sidecar container that registers the CSI driver with Kubelet using the kubelet plugin registration mechanism.
replicaset            Maintain a stable set of replica Pods running at any given time
rhacm                 Red Hat Advanced Cluster Management for Kubernetes 
rhcos                 Red Hat Enterprise Linux CoreOS
rhcp                  Red Hat Ceph Storage
rhcs                  Red Hat Cluster Suite
rook                  Operator. File, block, and object storage for your cloud native environment and is based on battle tested ceph storage.
rosa                  Red Hat OpenShift Service on AWS
s2i                   source-to-image
sa                    Service Account
scc                   security context constraints
sc                    security context
seccomp               Secure computing mode profiles can be associated with a container to restrict available system calls.
SelfLink              URL representing the given object.
service               Logical abstraction for a deployed group of pods in a cluster (which all perform the same function).
skopeo                Command line utility used to interact with local and remote container images and container image registries
StatefulSet           Workload object to manage stateful applications. Deployment and scaling Pods, ordering and uniqueness of Pods.
Storage Class         allows for dynamic provisioning of Persistent Volumes.
svc                   service
taint                 A taint lets a node repel pods that do not tolerate it, ensuring pods are scheduled onto appropriate nodes. You can apply one or more taints to a node.
tekton                Container-native way to manage CI/CD. It's also the basis for OpenShift Pipelines.
thanos                Long-Term storage for your Prometheus Metrics on OpenShift
toleration            You can apply tolerations to pods. Tolerations allow the scheduler to schedule pods onto nodes with matching taints.
ubi                   Universal Base Images OCI-compliant container base operating system images with complementary runtime languages and packages that are freely redistributable.
upi                   User-Provisioned Infrastructure
uts                   Unix Timesharing System namespace. Controls the hostname and the NIS domain.
uWSGI                 Project aims at developing a full stack for building hosting services.
wwn                   World Wide Name. Fibre Channel identifier.

where do I start

. <(oc completion bash)  Get bash completion running.
oc help                  Get commands
oc api-resources         What can you use commands on.
oc options               Which options apply to all commands

read

https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/

Projects that I have read about but forgotten

OpenEBS              Storage solution. Possible backends: local, NFS, ZFS, NVMe. CStor serves iSCSI block storage using the underlying disks or cloud volumes in a cloud-native way.

files of value

metadata.json         File created during install. Used by openshift-install destroy cluster

oc get

Available resources to ask about.

oc api-resources

Get everything

oc api-resources -o name --no-headers | while read i ; do echo '***' $i ; oc get $i -A -o yaml 2>&1 ; done > /tmp/oc_api-resources_-o_name_--no-headers.$(oc_api_url).$(date_file)

login

oc login --username developer https://openshift:6443

switch user

oc login --username developer

which clusters have you logged into

oc config get-clusters

List projects

oc projects
oc get projects

select project

oc project $project

create project/namespace

oc create namespace redis

list pods

oc get pods
oc get pods --all-namespaces
oc get pods -o wide

wide shows which node each pod is running on.

oc get pods -o wide --all-namespaces

Get pods that are not running.

oc get pods --field-selector status.phase!=Running --all-namespaces
oc get pods -A --no-headers | grep -v Completed | while read LINE ; do PODS=$(awk '{print $3}' <<< "${LINE}") ; if [ "${PODS%%/*}" != "${PODS##*/}" ] ; then echo "${LINE}" ; fi ; done

Get pods matching two states

oc get pods --field-selector=status.phase!=Running,spec.restartPolicy=Always

Get pods running on specific node

oc get pods -A -o wide --field-selector spec.nodeName=<node>

Get pods with label name=portworx-proxy

oc get pods -A -l name=portworx-proxy

Get pods with several labels

oc get pod -l 'app in (rook-ceph-mon,rook-ceph-operator,rook-ceph-osd,rook-ceph-rgw,rook-ceph-mgr,rook-ceph-mds,rook-ceph-crashcollector)'

get services

oc get svc

get shell on node

It is possible to debug more than nodes (also deployments, builds, or jobs).

oc debug node/infra-2.ocpdev.lkl.ltkalmar.se

Get working env

chroot /host

get debug information from oc

oc debug --loglevel=10 node/$node

debug pod, run as root, disable health checks

oc debug deployment/my-deployment-name --as-root

get nodes

oc get nodes
oc get nodes -o jsonpath='{.items[*].metadata.name}'
# Get nodes without headers: name, CPUs, disk size, mem, IP address.
oc get nodes --no-headers --selector="node-role.kubernetes.io/worker" -o=custom-columns='NAME:.metadata.name,CPU:.status.capacity.cpu,DISK:.status.capacity.ephemeral-storage,MEM:.status.capacity.memory,IP:.status.addresses[?(@.type=="InternalIP")].address'
# Get node name and IP address.
oc get nodes --no-headers --selector="node-role.kubernetes.io/worker" -o=custom-columns='NAME:.metadata.name,IP:.status.addresses[?(@.type=="InternalIP")].address'

get nodes that are overcommitted (under pressure or not ready)

oc get nodes -o jsonpath='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{end}' | sed 's/:/=node;/g' | sed 's/;/\n/g' | grep -vE 'MemoryPressure=False|DiskPressure=False|PIDPressure=False|Ready=True'

Does any node stick out?

oc get nodes --no-headers -o=custom-columns=NAME:.metadata.name,CONDITIONS:.status.conditions

connect to pod

oc rsh $pod bash

list containers in pod

oc get pod/router-default-6b76b87c6-5m7h6 -n openshift-ingress -o json | jq -r '.spec.containers[].name'
router
logs

connect to container in pod

oc rsh -c router pod/router-default-6b76b87c6-5m7h6

get logs from all containers

Get logs all pods containers.

for POD in $(oc get pods -o jsonpath='{.items[*].metadata.name}') ; do for CONTAINER in $(oc get pod/$POD -o json | jq -r '.spec.containers[].name') ; do echo '***' pod $POD, container $CONTAINER ; oc logs $POD -c $CONTAINER --tail=30 ; done; done

Get logs all pods containers in all namespaces.

oc get namespaces --no-headers | awk '{print $1}' | while read NAMESPACE ; do oc project $NAMESPACE >/dev/null ; for POD in $(oc get pods -o jsonpath='{.items[*].metadata.name}') ; do for CONTAINER in $(oc get pod/$POD -o json | jq -r '.spec.containers[].name') ; do echo '***' namespace $NAMESPACE pod $POD, container $CONTAINER ; oc logs $POD $CONTAINER | grep vsphere.int.redbridge.se | tail -10 ; done; done ; done | tee /tmp/vsphere.int.redbridge.se

search logs for all pods for string save to file

SEARCH="cosprod-m22s6-worker-m52c8" ; oc get namespaces --no-headers | awk '{print $1}' | while read NAMESPACE ; do oc project $NAMESPACE >/dev/null ; for POD in $(oc get pods -o jsonpath='{.items[*].metadata.name}') ; do for CONTAINER in $(oc get pod/$POD -o json | jq -r '.spec.containers[].name') ; do echo '***' namespace $NAMESPACE pod $POD, container $CONTAINER ; oc logs $POD $CONTAINER | grep "${SEARCH}" | tail -10 ; done; done ; done | tee /tmp/search_all_containers_"${SEARCH}".$(date '+%Y-%m-%d_%H-%M-%S').log

tail logs for pods matching label

oc logs -n openshift-storage -l app=csi-cephfsplugin -c driver-registrar -f  --max-log-requests 8 --tail=1
oc logs -n openshift-cluster-storage-operator -l name=vsphere-problem-detector-operator --tail=-1
oc logs -f --tail=0 router-default-6c666984fd-ct8zf logs
oc logs -f --namespace openshift-gitops deployment/openshift-gitops-server

Search for log entries locally on node

ls -la $(ls -la $(grep -l EAI_AGAIN /var/log/containers/*) | awk '{print $NF}')
grep -rl EAI_AGAIN /var/log/pods/

execute command in pod

oc exec pod/router-default-545ffb97db-4h9rx -- $command

execute command on all nodes

 oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'echo $HOSTNAME && chronyc sources'

where am i

POD_NAME=rook-ceph-operator-6c86f788d5-f8mqf
POD_NAMESPACE=openshift-storage

describe pods

oc describe pods
oc describe pod stage-sales-62-qjd

To get (almost) all objects with a specific label from the current project, execute:

oc get all -l '<label_name>=<label_value>'
oc get pods -n openshift-storage -o name -l app=rook-ceph-operator

get config from pod in yaml format

oc get pods router-default-545ffb97db-kgsdb -o yaml

get deployments

oc get deployments --all-namespaces

set environment variable in pod

oc set env dc/your-app-name COLOR=blue

unset environment variable in pod

oc set env dc/your-app-name COLOR-

list environment variables

oc set env pod/router-default-545ffb97db-lj2t5 --list

list templates

oc get templates -n openshift

Custom resource definitions (crd)

oc get crd

sort by CREATED AT

oc get crd --sort-by=.metadata.creationTimestamp

edit

oc edit deployment.apps/router-default

Watch changes taking place.

watch -n1 oc get all

grant permission to project

oc adm policy add-role-to-user view developer -n mysecrets

grant unrestricted access to service account

oc adm policy add-scc-to-user privileged system:serviceaccount:isilon:isilon-node

crictl

List running containers

crictl ps

List all pods

crictl pods

List all images

crictl images

Execute a command in a running container

crictl exec -it 1f73f2d81bf98 /bin/sh

nsenter

run program in different namespaces
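
A minimal sketch, assuming root on the node, a container id taken from crictl ps, and jq available; enters the container's network namespace:

PID=$(crictl inspect <container_id> | jq .info.pid)
nsenter -t $PID -n ip addr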

which version

oc version

Get clusterversion

oc get clusterversion

copy files from pod

Copy session keys locally.

oc rsync caas-2-8s6cl:/tmp/sslkeylog .
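
Copy in the other direction, a local directory into the pod (a sketch; paths are examples):

oc rsync ./sslkeylog caas-2-8s6cl:/tmp/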

tcpdump from nodes

ssh $node
toolbox
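
Inside toolbox, tcpdump is available (a sketch; the filter and packet count are examples):

tcpdump -nn -i any port 6443 -c 100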

get routing

oc get route -A
oc describe route sales -n hlt-prod

Name:                   sales
Namespace:              hlt-prod
Created:                13 months ago
Labels:                 <none>
Annotations:            haproxy.router.openshift.io/balance=roundrobin
                        haproxy.router.openshift.io/disable_cookies=true
Requested Host:         sales.prod.bobcat.hlt.se
                           exposed on router default (host apps.ocpprod.lkl.ltkalmar.se) 13 months ago
Path:                   <none>
TLS Termination:        edge
Insecure Policy:        <none>
Endpoint Port:          port-8000-tcp

Service:        sales
Weight:         100 (100%)
Endpoints:      10.160.7.166:8000, 10.160.7.167:8000, 10.160.7.168:8000 + 35 more...

oc get pods (select specific pods; only name, without headers)

oc get pods -o custom-columns=POD:.metadata.name --no-headers -A

Describe Failing pods.

oc get pods -A --field-selector=status.phase=Failed --no-headers | while read NAME_SPACE POD REST_OF_LINE ; do echo '*' $POD ${NAME_SPACE} ; oc describe pod $POD -n "${NAME_SPACE}" ; done | less -ISRM

get labels

oc get pods --show-labels

get subscriptions

oc get subscriptions -A

delete subscription

oc delete subscription openshift-gitops-operator -n openshift-operators

get available channels for subscription

oc get PackageManifest $OPERATOR -o json | jq -r '.status.channels[] | .name,.currentCSV'

update channel

oc patch subscriptions -n openshift-storage odf-operator --type merge -p '{"spec": {"channel": "stable-4.12"}}'

delete clusterserviceversion

oc delete clusterserviceversion openshift-gitops-operator.v1.7.4

whoami

oc whoami
oc config current-context
oc whoami --show-console=true --show-context=true

Which is the console url?

oc whoami --show-console

Which is the api url?

oc whoami --show-server

get instance url

oc get routes -n openshift-console console

get list of users

oc config view -o jsonpath='{.users[*].name}'

list contexts

oc config get-contexts

use-context

oc config use-context openshift-marketplace/api-abjorklund-01-rbcloud-net:6443/kube:admin

oc explain pv

oc explain pv

oc get configmap cluster-monitoring-config

put node offline

Mark a node as unschedulable.

oc adm cordon node1

Drain a node in preparation for maintenance.

oc adm drain <node> --force --delete-emptydir-data --ignore-daemonsets
oc adm drain <node> --ignore-daemonsets --force --grace-period=30 --delete-local-data
oc adm drain <node> --force --delete-emptydir-data --grace-period=1 --ignore-daemonsets

Mark node as online.

oc adm uncordon node1

Extend memory on node.

# Add memory to master nodes.
NODE=costest-ph9l4-master-1
oc adm cordon $NODE
oc adm drain $NODE --force --delete-emptydir-data --grace-period=1 --ignore-daemonsets
timeout 10 oc debug node/$NODE -- chroot /host sh -c 'echo $HOSTNAME && sudo shutdown -P now'
govc vm.power -off /RGK/vm/costest-ph9l4/$NODE
govc vm.info /RGK/vm/costest-ph9l4/$NODE
govc vm.change -vm /RGK/vm/costest-ph9l4/$NODE -m 20480
govc vm.power -on /RGK/vm/costest-ph9l4/$NODE
oc adm uncordon $NODE
oc adm top nodes -l node-role.kubernetes.io/master

Get PVs

oc get pv

Sorted by size.

oc get pv --sort-by=.spec.capacity.storage -A

Get more info about a pv.

oc describe pv $PV

Access modes for PVs (AccessMode)

RWO  - ReadWriteOnce
ROX  - ReadOnlyMany
RWX  - ReadWriteMany
RWOP - ReadWriteOncePod

get PVCs

oc get pvc --all-namespaces | less

sort by

oc get pvc --sort-by=.spec.resources.requests.storage -A

create pvc

# oc create pvc
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: abjorklund-pvc1 
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
EOF

extend/increase pvc

PVC=postgres-instance1-x5b8-pgdata ;NAMESPACE=rk-cos-prod ; oc patch pvc ${PVC} --type=merge -p '{"spec":{"resources":{"requests":{"storage": "2Gi"}}}}' -n ${NAMESPACE}

which pods are using pvc

kubectl get pods --all-namespaces -o=json | jq -c '.items[] | {name: .metadata.name, namespace: .metadata.namespace, claimName:.spec.volumes[] | select( has ("persistentVolumeClaim") ).persistentVolumeClaim.claimName }'

kubectl

List contexts

kubectl config get-contexts

Select context

kubectl config use-context default/api-blabla-halfface-se:6443/kube:admin

list groups

oc get groups -o wide

scale

oc scale --replicas=2 rc/postgresql-1
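
The same works for deployments and statefulsets (names are placeholders):

oc scale --replicas=3 deployment/my-deployment
oc scale --replicas=3 statefulset/my-statefulset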

top

oc adm top pods --use-protocol-buffers --all-namespaces
oc adm top pods --use-protocol-buffers --all-namespaces --sort-by=cpu | head -20| cut -c -200
oc adm top nodes --sort-by=cpu
oc adm top nodes --sort-by=memory

get memory usage of all running pods in MB

oc get pods -o custom-columns=POD:.metadata.name --no-headers --field-selector status.phase=Running| while read POD ; do echo $POD $(( $(oc exec -it $POD -- cat /sys/fs/cgroup/memory/memory.usage_in_bytes </dev/null 2>/dev/null) / 1024 / 1024 )) MB ; done
oc get pods -A -o wide --no-headers --field-selector spec.nodeName=ocp-04-9lxgz-worker-wlw9p  --field-selector status.phase=Running | while read NAMESPACE POD NULL ; do oc project $NAMESPACE >/dev/null 2>&1 ; oc adm top pod $POD --containers --no-headers ; done | sort -k 4 -n| less

Get memory usage per pod on specific node.

NODE=ocp-01-4dfqx-worker-4n6mk ; oc get pods -A -o wide --no-headers --field-selector "spec.nodeName=${NODE},status.phase=Running" | while read NAMESPACE POD NULL ; do oc project $NAMESPACE >/dev/null 2>&1 ; oc adm top pod $POD --containers --no-headers ; done | sed 's/  */\t/g' | sort -k 4 -n | column -t -s $'\t'

oc get crd

Get Custom Resource Definitions.

oc get crd

operators

Operators automate the setup and management of application instances.

list installed operators

oc get ClusterServiceVersions -A
oc get csv -A

List across all namespaces, de-duplicated (CSVs are copied into multiple namespaces).

oc get csv -A -o=custom-columns='NAME:.metadata.name,VERSION:.spec.version,DISPLAY:.spec.displayName' --no-headers | sort  | uniq

list available operators

oc get packagemanifests

oc adm upgrade --to-image=

Upgrade to a version that you found on the okd GitHub releases page.

oc adm upgrade

Upgrade okd images.

Launch a new instance of a pod for gathering debug information. Compress and attach to a support case.

cd /tmp && oc adm must-gather
tar czf /tmp/must-gather.$(date +%F_%H-%M-%S).tar.gz must-gather.local.*

Must gather for odf. (oc get csv -n openshift-storage gives you the version to use)

cd /tmp && oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:4.10
tar czf /tmp/must-gather.$(date +%F_%H-%M-%S).tar.gz must-gather.local.*

oc adm certificate approve <csr_name>

Approve csr certificate
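
To approve all pending CSRs at once (common after adding nodes; the go-template filter is the usual pattern):

oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve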

oc adm release info

# Show information about the cluster's current release
oc adm release info
# Show the source code that comprises a release
oc adm release info 4.2.2 --commit-urls
# Show the source code difference between two releases
oc adm release info 4.2.0 4.2.2 --commits
# Show where the images referenced by the release are located
oc adm release info quay.io/openshift-release-dev/ocp-release:4.2.2 --pullspecs
# Show release info about a release
oc adm release info 4.10.47 --pullspecs

oc adm node-logs

Look at logs from crio from master nodes.

oc adm node-logs --role master -u crio

Get logs from one node from unit crio

oc adm node-logs abjorklund-01-5tsbc-worker-0-kcr54 -u crio

Look at specific log

oc adm node-logs --role master --path=openshift-apiserver/audit.log

List logs

oc adm node-logs --role=master --path=/

openshift upgrade path

https://access.redhat.com/labs/ocpupgradegraph/update_path?channel=stable-4.9&arch=x86_64&is_show_hot_fix=false&current_ocp_version=4.9.15&target_ocp_version=4.10.11

upgrade openshift

# look for existing alerts.
# look for troublesome pods.
oc get pods -A  | grep -Ev ' Running | Completed '
# Set channel
oc patch clusterversion version --type merge -p '{"spec": {"channel": "stable-4.10"}}'
oc adm upgrade --to=4.10.47
oc get clusterversion -o json|jq ".items[0].spec"
# View openshift version history.
oc get clusterversion -o json|jq ".items[0].status.history"
# View progress of update.
watch -n1 oc whoami --show-console \; oc adm upgrade
# Upgrade all operators
oc get installplan -A | grep Manual | grep false
oc patch installplan $INSTALLPLAN -n $NAMESPACE --type merge --patch '{"spec":{"approved":true}}'

upgrade okd

# Change channel.
oc adm upgrade channel stable-4.13
# Upgrade to wanted version.
oc adm upgrade --to=4.13.0-0.okd-2023-05-22-052007 --allow-explicit-upgrade
oc adm upgrade --to-latest=true --allow-explicit-upgrade
# If complaining about cert.
oc patch --type='merge' --patch='{"spec":{"desiredUpdate":{"force":true}}}' clusterversion version
# Acknowledge changed api.
oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.13-kube-1.27-api-removals-in-4.14":"true"}}' --type=merge

upgrade odf

# Save existing config. 
oc get subscriptions -n openshift-storage odf-operator -o yaml
# Patch subscription
oc patch subscriptions -n openshift-storage odf-operator --type merge -p '{"spec": {"channel": "stable-4.10"}}'
# Get install plans
oc get installplan -n  openshift-storage -o wide
# Approve install plan.
oc patch installplan install-4gf99 -n openshift-storage --type merge --patch '{"spec":{"approved":true}}'

odf troubleshooting

# ceph problem.  Run commands from rook-ceph-operator
oc rsh -n openshift-storage $(oc get pods -n openshift-storage -o name -l app=rook-ceph-operator)
export CEPH_ARGS='-c /var/lib/rook/openshift-storage/openshift-storage.config'
ceph -s
ceph osd pool ls
ceph osd pool autoscale-status
ceph config dump
# disable autoscaling
ceph osd pool ls | while read i ; do echo '*' $i ; ceph osd pool set $i pg_autoscale_mode off ; done
# Look to see how much data is being used for PGs.
# Number of PGLog Entries, size of PGLog data in megabytes, and Average size of each PGLog item
for i in 0 1 2 ; do echo '*' $i ; osdid=$i ; ceph tell osd.$osdid dump_mempools | jq -r '.mempool.by_pool.osd_pglog | [ .items, .bytes /1024/1024, .bytes / .items ] | @csv' ;done
ceph df

helm

List all helm charts in all namespaces

helm list -aA

cronjobs

oc get cj
oc get cronjobs -o wide -A

Run cronjob manually

oc create job -n ldap-sync --from=cronjob/ldap-sync ldap-sync-manual-$(date '+%Y-%m-%d-%H-%M-%S')

Disable cronjob

.spec.suspend: true
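
For example via patch (using the ldap-sync cronjob from above):

oc patch cronjob ldap-sync -n ldap-sync --type merge -p '{"spec":{"suspend":true}}'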

delete po (stop, kill)

stop pod

oc delete po --all --force
oc delete pod openshift-gitops-server --namespace openshift-gitops
oc delete pods -n openshift-oauth-apiserver --all
oc get po -A | grep -v ^NAME | awk '$4 !~ /Running/' | sort -k4 | while read NAMESPACE POD READY STATUS END ; do echo '****' $POD $STATUS ; echo oc delete po $POD -n $NAMESPACE --force --grace-period=0 ; done
oc get pods -A --field-selector=status.phase!=Running --no-headers | while read NAME_SPACE POD REST_OF_LINE ; do echo oc delete pod $POD -n "${NAME_SPACE}" --force --grace-period=0 ; done
(oc get pods --field-selector="status.phase=Pending" --no-headers -A ; oc get pods --field-selector="status.phase=Failed" --no-headers -A) | while read NAME_SPACE POD REST_OF_LINE ; do echo oc delete pod $POD -n "${NAME_SPACE}" --force --grace-period=0 ; done
# Delete pods and generate report on what has been removed.
LOG=/tmp/oc_delete_pod_$(oc config current-context | awk -F '/|:' '{print $2}').$(date '+%Y-%m-%d_%H-%M-%S').log ; (oc get pods --field-selector="status.phase=Pending" --no-headers -A ; oc get pods --field-selector="status.phase=Failed" --no-headers -A) | while read NAME_SPACE POD REST_OF_LINE ; do oc delete pod $POD -n "${NAME_SPACE}" --force --grace-period=0 ; done | tee $LOG ; awk -F\" '{print $2}' $LOG | sed 's/-[a-z0-9]*$//g'| sed 's/-[a-z0-9]*$//g' | sort | uniq -c | sort -n | tail -20

use other namespace

oc rsh  --namespace namespace-name pod-name
oc rsh --namespace namespace-name-operator pod-name bash -c 'echo $PATH $HOSTNAME'

list namespaces

oc get namespace

use namespace

oc rsh  --namespace openshift-gitops openshift-gitops-application-controller-0

kubectl get netnamespace

NetNamespace tracks per-namespace network information in the OpenShift SDN plugin. An egress address can be set to define the outgoing address, which can also cause other issues.

oc get netnamespace openshift-gitops -oyaml

oc get routes

oc get routes --namespace openshift-gitops

oc get oauth

Describe authentication methods.

oc get oauth cluster -o yaml

decode token. base64

https://jwt.io/

view secrets

oc get secret ca-key-pair -o go-template='{{range $k,$v := .data}}{{$k}}{{"\n"}}{{$v}}{{"\n\n"}}{{end}}'

delete cluster

openshift-install destroy cluster

storageclasses(sc)

oc get storageclasses

get storageclasses defined as default

oc get sc -o json | jq -r '.items[]|select(."metadata".annotations."storageclass.kubernetes.io/is-default-class"=="true")|.metadata.name'

set default storageclass

# Set all sc to default false.
oc get sc -o json | jq -r '.items[]|select(."metadata".annotations."storageclass.kubernetes.io/is-default-class"=="true")|.metadata.name' | while read i ; do echo '*' $i ; oc patch storageclass $i -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'; done
# Set default storageclass.
oc patch storageclass ocs-storagecluster-cephfs -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

get service accounts

oc get serviceaccounts -A
oc get sa -A

which permissions do I have

oc auth can-i --as=fjuza --list
oc get groups -o wide

alerts

How is alertmanager configured

base64 -d <(oc get secret -n openshift-monitoring alertmanager-main -o json | jq -r '.data."alertmanager.yaml"')
oc -n openshift-monitoring get secret alertmanager-main --template='{{index .data "alertmanager.yaml"}}' | base64 --decode

Save alertmanger config

oc get secret alertmanager-main -n openshift-monitoring --template='{{index .data "alertmanager.yaml" | base64decode}}' > /tmp/oc_get_secret_alertmanager-main.alertmanager.yaml.$(oc whoami --show-console=true | awk -F / '{print $3}').$(date '+%Y-%m-%d_%H-%M-%S')

Restore alertmanager config

oc set data secret alertmanager-main -n openshift-monitoring  --from-file=alertmanager.yaml=/tmp/secret_alertmanager-main.alertmanager.yaml...
oc -n openshift-monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml --dry-run=client -o=yaml |  oc -n openshift-monitoring replace secret --filename=-

alertmanager

View Alertmanager configured alerts.

oc get prometheusrules -A -o yaml | grep alert: | sort

View configuration of alert

oc get prometheusrules -A -o json | jq '.items[].spec.groups[].rules[]| select(.alert=="AlertmanagerReceiversNotConfigured")'

view alerts.

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq . | less -ISRM

View alerts in state firing

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")' | less -ISRM

View alerts in state firing with severity warning

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")|select(.labels.severity=="warning")' | less -ISRM

View historical alerts.

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=2022-08-08T00:00:00.781Z&end=2022-08-09T00:00:00.781Z&step=1m"
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(date '+%Y-%m-%d' --date '-2 days')T00:00:00.781Z&end=$(date '+%Y-%m-%dT%H:%M:%S').781Z&step=1m" | jq . | less -ISRM

Get warning alerts since the last week.

echo '***' $(oc whoami --show-console) ; oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(date '+%Y-%m-%d' --date '-7 days')T00:00:00.781Z&end=$(date '+%Y-%m-%dT%H:%M:%S').781Z&step=2m" | jq -r '.data.result[].metric | {alertname, severity}| select(.severity=="warning") | .alertname' | uniq
echo '***' $(oc whoami --show-console) ; oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S.000Z' --date '-6 days')&end=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S').000Z&step=1m" | jq -r '.data.result[].metric | {alertname, severity, alertstate}| select(.severity=="warning")|select(.alertstate=="firing") | .alertname'

disable alertmanager alert

Example config. blackhole is the key.

global:
  resolve_timeout: 5m
receivers:
  - name: Default
  - name: Watchdog
  - name: blackhole
route:
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: blackhole
  repeat_interval: 5m
  routes:
    - match:
        alertname: blackhole
      receiver: blackhole

prometheus

Url to web interface.

https://prometheus-k8s-openshift-monitoring.apps.<url>
echo https://prometheus-k8s-openshift-monitoring.$(oc whoami --show-console | awk -F 'console-openshift-console.' '{print $2}')
echo https://$(oc get route -n openshift-monitoring prometheus-k8s -o jsonpath="{.spec.host}")

Get disk usage from odf

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query?query=odf_system_raw_capacity_used_bytes" | jq -r .

Get disk usage from odf over time.

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=odf_system_raw_capacity_used_bytes&start=$(date '+%Y-%m-%d' --date '-20 days')T00:00:00.781Z&end=$(date '+%Y-%m-%dT%H:%M:%S').781Z&step=1h" | jq . | less -ISRM

Search tips

https://prometheus.io/docs/prometheus/latest/querying/basics/

Disk usage per project. Taken from RH ticket.

oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- curl -s -g 'http://localhost:9090/api/v1/query?' --data-urlencode 'query=(sort_desc(topk(25,(sum(kubelet_volume_stats_used_bytes * on (namespace,persistentvolumeclaim) group_left(storageclass, provisioner) (kube_persistentvolumeclaim_info * on (storageclass)  group_left(provisioner) kube_storageclass_info {provisioner=~"(.*cephfs.csi.ceph.com)"})) by (namespace)))))'

Talk to api with Bearer.

HOST=$(oc -n openshift-monitoring get route alertmanager-main -ojsonpath={.spec.host})
TOKEN=$(oc whoami -t)
curl -skH "Authorization: Bearer $TOKEN" "https://$HOST/api/v2/alerts" | jq .

token2

token=`oc sa get-token prometheus-k8s -n openshift-monitoring` ## --- In OCP client 4.10 or lower ---

OR

token=`oc create token prometheus-k8s -n openshift-monitoring` ## --- In OCP client 4.11 or higher ---

curl using token

curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main-openshift-monitoring.apps.domain/api/v1/alerts' |  jq '.data[].labels'

bash completion

. <(oc completion bash)
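
Make it permanent (assuming the bash-completion package is installed):

oc completion bash | sudo tee /etc/bash_completion.d/oc >/dev/null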

machineconfig

view settings

oc describe machineconfigpool

set ntp servers

echo 'variant: openshift
version: 4.9.0
metadata:
  name: 99-master-chrony 
  labels:
    machineconfiguration.openshift.io/role: master 
storage:
  files:
  - path: /etc/chrony.conf
    mode: 0644 
    overwrite: true
    contents:
      inline: |
        server ntp.lio.se iburst
        driftfile /var/lib/chrony/drift
        makestep 1.0 3
        rtcsync
        logdir /var/log/chrony' | butane | oc apply -f -

Verify settings

oc get nodes --no-headers --selector="node-role.kubernetes.io/worker" -o=custom-columns='NAME:.metadata.name,IP:.status.addresses[?(@.type=="InternalIP")].address' | while read NODE_NAME NODE_IP ; do echo $NODE_NAME $(ssh $NODE_IP "chronyc sourcestats| tail -n +4" </dev/null) ; done

get users

oc get users

work with oc without login

export KUBECONFIG=/var/lib/kubelet/kubeconfig

Add the following if cert is not trusted

- cluster:
    insecure-skip-tls-verify: true
    server: https://127.0.0.1:443
  name: my-cluster

run oc when on node

oc get pod -n openshift-monitoring --kubeconfig=/var/lib/kubelet/kubeconfig

etcdctl

oc rsh -c etcdctl -n openshift-etcd $(oc get pod -l app=etcd -oname -n openshift-etcd | awk -F"/" 'NR==1{ print $2 }')
[root@ocp-03-lm8km-master-1 /]# etcdctl --write-out=table endpoint status
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://172.19.14.36:2379 | c4f7b42b92713818 |   3.5.0 |  105 MB |     false |      false |         6 |    2632074 |            2632074 |        |
| https://172.19.14.37:2379 | 5dea668b432969fc |   3.5.0 |  105 MB |     false |      false |         6 |    2632074 |            2632074 |        |
| https://172.19.14.41:2379 | 51cecd971b657ee5 |   3.5.0 |  105 MB |      true |      false |         6 |    2632074 |            2632074 |        |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

create troubleshooting/debug/test pod

oc run abjorklund-redhat-ubi8 --image=redhat/ubi8 -i --tty -- sh
oc run abjorklund-curlimage-curl --image=curlimages/curl -i --tty -- sh
oc run -it busybox --image=busybox --restart=Never -- ash
oc run abjorklund-rocky-rocky --image=rockylinux/rockylinux -i --tty -- bash
oc run ${USER}-rocky-rocky --image=rockylinux/rockylinux -i --tty -- bash # dnf -y install procps-ng iproute
oc run ${USER}-rocky-rocky --image=rockylinux/rockylinux --restart=Never --command -- sleep infinity

install packages to get running

yum install -y lsof procps-ng bind-utils

proxy settings

oc get proxy cluster -o yaml

Change ca

oc patch proxy/cluster --type=merge --patch='{"spec":{"trustedCA":{"name":"custom-ca"}}}'

oc proxy

Run a proxy to the Kubernetes API server

port forward to pod

oc port-forward <my-pod-name> <local-port>:<remote-port>
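
For example, reach the in-cluster Prometheus locally (a sketch; pod and port as used elsewhere on this page):

oc port-forward -n openshift-monitoring prometheus-k8s-0 9090:9090 &
curl -s http://localhost:9090/api/v1/status/buildinfo | jq .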

Install additional ca certificate

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-redbridge-ca-cert
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURrVENDQW5tZ0F3SUJBZ0lFSC93Skh6QU5CZ2txaGtpRzl3MEJBUXNGQURBM01SVXdFd1lEVlFRS0RBeFMKUlVSQ1VrbEVSMFV1VTBVeEhqQWNCZ05WQkFNTUZVTmxjblJwWm1sallYUmxJRUYxZEdodmNtbDBlVEFlRncweQpNVEF5TWpNd056RTVOVFphRncwME1UQXlNak13TnpFNU5UWmFNRGN4RlRBVEJnTlZCQW9NREZKRlJFSlNTVVJIClJTNVRSVEVlTUJ3R0ExVUVBd3dWUTJWeWRHbG1hV05oZEdVZ1FYVjBhRzl5YVhSNU1JSUJJakFOQmdrcWhraUcKOXcwQkFRRUZBQU9DQVE4QU1JSUJDZ0tDQVFFQW5mY1F3YURwcEdzNWJxaUc5ajE5aFJVaG1sMzhjb2JGT2tzRQpsZFo3Y3RkV1d6VHJqSTFCRGxZSEd5SXBYMEo4ZU1PaDhvbUZqbVR6VTEzTkpWSnJrWm5RaDRhTzA1UGtKRlJRCkg1ZVA2N3R0S2pEb0txOFZVWXRZUldxRlFaalNxY2lQMzJobXZSNG42QVZDWDdCaUVBZjd2Y05ZVys0a1k5OUsKbTluV1BNbEpGU056M1puRnlWc1BtR1ZWeVN2RmFVL0dBTmt1Z25uSGdUM1VUUTNsc2NidU5keUpBcVEya3dHSwpKbkdZKzBSajVrUWpvdXptUjBDZ3pJN0hWSmhwK2Z6R1lyenRYQXA1Zkt0Z3ZTZFRtTndVVXZJR3pLTmU4WklGCmY0WVVUUDFPdU9jUmNIRDJQclVodDgzWlRLYzNwOUhLYk5CazIzWFFtYU85QVBqeEl3SURBUUFCbzRHa01JR2gKTUI4R0ExVWRJd1FZTUJhQUZMbWFrNHdDamtuakZvWkd6M1daRGErY2N4RGxNQjBHQTFVZERnUVdCQlM1bXBPTQpBbzVKNHhhR1JzOTFtUTJ2bkhNUTVUQVBCZ05WSFJNQkFmOEVCVEFEQVFIL01BNEdBMVVkRHdFQi93UUVBd0lCCnhqQStCZ2dyQmdFRkJRY0JBUVF5TURBd0xnWUlLd1lCQlFVSE1BR0dJbWgwZEhBNkx5OXBjR0V0WTJFdWNtVmsKWW5KcFpHZGxMbk5sTDJOaEwyOWpjM0F3RFFZSktvWklodmNOQVFFTEJRQURnZ0VCQURabURvUytJY1ZMcERBRwpiSXM0SWRJKzcxY0xINk90NjNkYWhBT25QRDJnMUhvVUFIZFdUcGdobER3TkFQWjg3UXQybFc4Q1B4eDhCQVZOCnlrZWlEN2paeVA5dmVCcDRxNjBiSTVYSENndWV5U2lGdjBBKzloKzMzekMrYy9WbStJVHJNTkZ0dlZMNE1kRWQKaVE4UVBhaFJEWW1qVkJVb1VIZWErMDdkWEY3TzQxY2t2YzZRb0lad2F5Y1Zhc0gvd05lVGNrdzl1TlNiajNTQwoyNHdpOUthQnpxdDZsWlF3TG5uUjVnNjNWUDZNZUprR2FXMTBxdExiQVM4NGZwQ1NWTUx3U051MGZqeFU2d2lPCkRjaWlKKzNZOG5ldjM5NGJHRkwxcG5ZVmM4YmpoL0xaaHM1dTRQUnhlNFBLRER2Y09NZUhpUkN1M1YySWRRTTgKbDl3enBQZz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQoK
        mode: 0644
        overwrite: true
        path: /etc/pki/ca-trust/source/anchors/redbridge-ca.crt

get raw api data

oc get --raw "/api/v1/nodes/[node]/proxy/stats/summary"

Via proxy.

oc proxy &
Starting to serve on 127.0.0.1:8001
curl -s http://localhost:8001/api/v1/nodes/crc-lgph7-master-0/proxy/stats/summary
curl -s http://localhost:8001/api/v1/nodes/crc-lgph7-master-0/proxy/metrics/resource

explain

Get documentation for a resource

oc explain deployment
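
Dotted paths drill into nested fields:

oc explain deployment.spec.strategy
oc explain pod.spec.containers.resources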

events

Get events.

oc get events -A --sort-by=.metadata.creationTimestamp

jsonpath

Get names of MachineConfigs one value per line.

oc get mc -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' --no-headers

endpoints

check that pods are registered as endpoints behind a service

oc get endpoints -n default

ImageStreamTag

ImageStreamTag represents an Image that is retrieved by tag name from an ImageStream.

BuildConfig

Build configurations define a build process for new container images.

download okd openshift-install

# Show latest release
curl -skL https://github.com/okd-project/okd/releases | elinks --dump | grep Latest
oc adm release extract --tools quay.io/openshift/okd:4.9.0-0.okd-2022-02-12-140851

setup openshift cluster

Download binary

cd /tmp/ ; curl -L -O https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.10.47/openshift-install-linux.tar.gz && sudo tar xf openshift-install-linux.tar.gz -C /usr/local/bin/

Add vmware certs if using that backend.

(cd /tmp/ ; curl -sk https://${vsphere_server}/certs/download.zip -O) ; cd /etc/pki/ca-trust/source/anchors ; sudo unzip -oj /tmp/download.zip certs/lin/\* ; sudo update-ca-trust

Create config file

install-config.yaml

Then fire off install

openshift-install create cluster

Another example

ln -s install-config.yaml.2023-03-23 install-config.yaml
./openshift-install-4.12.0-0.okd-2023-04-16-041331 create cluster

Edit install config after setup

Save config

 oc get cm cluster-config-v1 -n kube-system --template='{{index .data "install-config" }}' > /tmp/cm_cluster-config-v1_-n_kube-system.$(oc whoami --show-console=true | awk -F / '{print $3}').$(date '+%Y-%m-%d_%H-%M-%S')

Edit downloaded file and apply edited file.

oc set data cm cluster-config-v1 -n kube-system --from-file=install-config=/tmp/cm_cluster-config-v1_-n_kube-system.<suitable_name>

look at install settings

oc get -n kube-system cm/cluster-config-v1 -o yaml

argocd

curl -sSL -o argocd-linux-amd64 https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
sudo install -m 555 argocd-linux-amd64 /usr/local/bin/argocd
rm argocd-linux-amd64

argocd login

argocd login openshift-gitops-server-openshift-gitops.apps.costest.ltkronoberg.se --username kubeadmin --password asdfasfasdfas --sso --insecure
argocd login $(oc get routes -n openshift-gitops openshift-gitops-server -o json | jq -r .spec.host) --username $USER --password $COMPANY_PASSWORD --sso --insecure

git sync heal

argocd app list | grep -v NAME | awk '{print $1}' | while read i ; do echo '*' $i ; argocd app set $i --self-heal ; done

metrics

Get available values

Thanos monitoring points

curl -sk -H "Authorization: Bearer $(oc whoami -t)" https://$(oc get routes -n openshift-monitoring thanos-querier -o jsonpath='{.status.ingress[0].host}')/api/v1/metadata | jq .

node-exporter

oc --request-timeout=3 -n openshift-monitoring exec -c node-exporter $(oc get pod -n openshift-monitoring -l app.kubernetes.io/name=node-exporter -o=custom-columns='NAME:.metadata.name' --no-headers | head -1) -- curl -s 'http://localhost:9100/metrics' | grep -vE "^#|^$"

Cpu usage per node.

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[30m])) * 100)
instance:node_cpu_utilisation:rate1m{job="node-exporter",  cluster=""} != 0
instance:node_cpu_utilisation:rate1m{job="node-exporter"} != 0

namespace

cpu usage per namespace.

sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=""}) by (namespace)

load

Load 1 graph

instance:node_load1_per_cpu:ratio{job="node-exporter", cluster=""} != 0

usage for pvc

kubelet_volume_stats_used_bytes
kubelet_volume_stats_available_bytes
kubelet_volume_stats_used_bytes{persistentvolumeclaim="prometheus-prometheus-k8s-1"}
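
Usage ratio per claim (a sketch; kubelet_volume_stats_capacity_bytes is exported alongside the metrics above):

kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes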

Memory usage

Memory usage of node.

instance:node_memory_utilisation:ratio

install oc kubectl

Download oc/kubectl. wget.

wget https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/latest/openshift-client-linux.tar.gz; tar -xzvf openshift-client-linux.tar.gz; chmod +x oc; sudo rm /usr/local/bin/oc 2>/dev/null ; sudo mv oc /usr/local/bin

Download oc/kubectl curl

cd /tmp/ ; curl -vskL https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/latest/openshift-client-linux.tar.gz -O ; tar -xzvf openshift-client-linux.tar.gz; chmod +x oc; sudo rm /usr/local/bin/oc 2>/dev/null ; sudo mv oc /usr/local/bin

time and timezone in first pod(date)

oc get pods --no-headers -o 'custom-columns=:.metadata.namespace,:.metadata.name' | head -1 | while read NAMESPACE POD ; do oc rsh -n $NAMESPACE $POD  bash -c 'date "+%Y-%m-%d %H:%M:%S %Z"' 2>/dev/null ; done

oc get installplan

InstallPlan defines the installation of a set of operators.

oc get installplan install-bk8hw -n openshift-operators -o yaml

Approve all manual updates.

oc get installplans.operators.coreos.com -A --no-headers | awk '$5 ~ /false/' | awk '$4 ~ /Manual/' | while read NAMESPACE INSTALLPLAN END ; do echo '*' $NAMESPACE $INSTALLPLAN ; oc patch installplan $INSTALLPLAN -n $NAMESPACE --type merge --patch '{"spec":{"approved":true}}' ; done

Get selected info from all installplans

oc get installplans.operators.coreos.com -A --no-headers -o=custom-columns='DATE:.metadata.creationTimestamp,NAME:.metadata.name,PHASE:.status.phase,CSV:.spec.clusterServiceVersionNames,NAMESPACE:.metadata.namespace'  --sort-by=.metadata.creationTimestamp

oc extract

Extract secrets or config maps to disk

# Extract only the key "nginx.conf" from config map "nginx" to the /tmp directory
oc extract configmap/nginx --to=/tmp --keys=nginx.conf

dependencies,owner

Search in output from

oc describe ...

Search for this.

Controlled By:  ReplicaSet/rook-ceph-osd-0-6dcdc7fb48

metadata.ownerReferences

Defines the object that owns this object.

nodeAffinity

Pin pod to node with label (kubectl label nodes <your-node-name> disktype=ssd)

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd

Add user to group

oc adm groups add-users openshift-admins rb_janitor

api-int

api-int.<fqdn>
for i in api-int:6443 api:6443 test.apps:443 ; do ping -c1 -W1 ${i%%:*} 2>&1 | xargs ; curl -skI https://${i%%:*}:${i##*:} 2>&1 | xargs ; done | cut -c -150
for i in api-int:6443 api:6443 test.apps:443 ; do ping -c1 -W1 ${i%%:*} 2>&1 | xargs ; set -x ; curl -skv https://${i%%:*}:${i##*:} -o /dev/null 2>&1 | grep "Server certificate:" -A5 ; set +x ; done | cut -c -150

test talk to api-int

CACERT=/tmp/%var%lib%kubelet%kubeconfig%certificate-authority-data ; grep certificate-authority-data: /var/lib/kubelet/kubeconfig | awk '{print $2}' | base64 -d > /$CACERT ; curl -s --key /var/lib/kubelet/pki/kubelet-client-current.pem --cert /var/lib/kubelet/pki/kubelet-client-current.pem --cacert $CACERT -XGET "$(grep server /etc/kubernetes/kubeconfig | awk '{print $2}')/api/v1/namespaces/default/pods?limit=500"

okd setup fix

# On bootstrap node. Could work on all clusters. First a test to see if it works already.
DOMAIN=$(grep " baseDomain: " /etc/mcc/bootstrap/cluster-dns-02-config.yml | awk '{print $2}')
for i in api-int api ; do ping -c1 -W1 $i.${DOMAIN} 2>&1 | xargs; done | cut -c -150 
echo "10.1.0.5 api-int.${DOMAIN} api.${DOMAIN}" >> /etc/hosts

oc annotate

Update the annotations on one or more resources.

oc annotate pods foo description='my frontend'

setuid setgid

  securityContext:
    runAsUser: 10004000
    runAsGroup: 10004000

patch examples

Look at oc get ... -o json and copy line after line.

oc patch redis redis-standalone --type merge  --patch '{"spec": {"securityContext": {"runAsGroup": 1000400000}}}'

limits

When you need to increase your cpu and memory resources. A cpu value is written either as a number (0.5 for half a cpu) or in milli units (500m for half a cpu).

spec:
  containers:
...
   resources:
     limits:
       cpu: "2"
       memory: 5Gi
     requests:
       cpu: "2"
       memory: 5Gi

quotas on cpu memory pvc... per project

oc get ResourceQuota

tolerations|node selectors|...

oc describe pod

Node-Selectors:              node-role.kubernetes.io/app=
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 5s
                             node.ocs.openshift.io/storage=true:NoSchedule

enable monitoring

cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:  
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 2d
EOF

retention elasticsearch

Edit the ClusterLogging CR to add or modify the retentionPolicy parameter:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
...
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy: 
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3
...

retention prometheus

Prometheus retention. https://docs.openshift.com/container-platform/4.10/monitoring/configuring-the-monitoring-stack.html#modifying-retention-time-for-prometheus-metrics-data_configuring-the-monitoring-stack
oc edit configmap cluster-monitoring-config -n openshift-monitoring
# Enable prometheus.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 2d
EOF

EFK(elk)

# Elasticsearch: log store.
# Fluentd: log collection and processing pipeline.
# Kibana: visualization.
https://kibana-openshift-logging.apps.<url>

grafana

# grafana
https://grafana-openshift-monitoring.apps.<url>

pull secret

oc get secret/pull-secret -n openshift-config -o json | jq .

Just the keys.

oc get secret/pull-secret -n openshift-config -o json | jq -r '.data.".dockerconfigjson"' | base64 -d | jq .

Name of each key and email.

oc get secret/pull-secret -n openshift-config -o json | jq -r '.data.".dockerconfigjson"' | base64 -d | jq -r '.auths | with_entries(.value = .value.email)' | sed 's/{//g;s/}//g;s/"//g' | grep -v '^$' | sed 's/ *//g' | sort

Download pull secret.

oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' > /tmp/pull_secret.$(oc whoami --show-console=true | awk -F / '{print $3}').$(date '+%Y-%m-%d_%H-%M-%S')

Set pull secret.

oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=/tmp/pull_secret_<file_name>

change number of nodes

oc get machineset -n openshift-machine-api
oc edit machineset -n openshift-machine-api <MachineSet>

Elasticsearch status

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | head -1) -- es_util --query=_cat/health?v
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | head -1) -- es_util --query=_cluster/health?pretty

talk to elasticsearch

oc rsh elasticsearch-cdm-q8apadpa-1-65f99d99b4-8b9wg
curl -s --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca https://localhost:9200

Oneliner

oc exec -n openshift-logging -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers -n openshift-logging | head -1) -- curl -s --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca https://localhost:9200

which version of elasticsearch operator is installed

oc get csv -n  openshift-operators-redhat -l operators.coreos.com/elasticsearch-operator.openshift-operators-redhat="" -o=custom-columns='VERSION:.spec.version' --no-headers

list nodes

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query="_cat/nodes?v"

Who is master node

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query="_cat/master?v"

Is cluster recovering

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query="_cat/recovery?active_only=true"

Look at all indices

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=_cat/indices?v

look at shards

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | head -1) -- es_util --query=_cat/shards?v

Remove all red indices.

oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=_cat/indices?v | grep ^red | awk '{print $3}'  | while read i ; do echo '*' $i ; oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=${i} -X DELETE ; done

vsphere creds

oc get -n kube-system cm/cluster-config-v1 -o yaml

Enable openshift/okd logging

Enable redhat-operators

oc patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": false}]'

Or edit

oc edit operatorhubs 
Spec:
  Disable All Default Sources:  true
  Sources:
    Disabled:  false
    Name:      community-operators
    Disabled:  false
    Name:      redhat-operators

Create namespace

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-operators-redhat 
  annotations:
    openshift.io/node-selector: ""
  labels:
   openshift.io/cluster-monitoring: "true"
EOF

Create namespace

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    openshift.io/node-selector: ""
  labels:
    openshift.io/cluster-monitoring: "true"
EOF

Create operatorgroup

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-operators-redhat
  namespace: openshift-operators-redhat 
spec: {}
EOF

Subscribe to OpenShift Elasticsearch Operator

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: "elasticsearch-operator"
  namespace: "openshift-operators-redhat" 
spec:
  channel: "stable" 
  installPlanApproval: "Automatic" 
  source: "redhat-operators" 
  sourceNamespace: "openshift-marketplace"
  name: "elasticsearch-operator"
EOF

Install the openshift logging operator.

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging 
spec:
  targetNamespaces:
  - openshift-logging 
EOF

Create a subscription object yaml file.

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging 
spec:
  channel: "stable" 
  name: cluster-logging
  source: redhat-operators 
  sourceNamespace: openshift-marketplace
EOF

Create OpenShift Logging instance.

cat <<EOF | oc apply -f -
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance" 
  namespace: "openshift-logging"
spec:
  managementState: "Managed"  
  logStore:
    type: "elasticsearch"  
    retentionPolicy: 
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3 
      storage:
        storageClassName: "standard-csi"
        size: 200G
      resources: 
        limits:
          memory: "16Gi"
        requests:
          memory: "16Gi"
      proxy: 
        resources:
          limits:
            memory: 256Mi
          requests:
            memory: 256Mi
      redundancyPolicy: "SingleRedundancy"
  visualization:
    type: "kibana"  
    kibana:
      replicas: 1
  collection:
    logs:
      type: "fluentd"  
      fluentd: {}
EOF

telemetry

Restart telemetry.

oc delete pod -n openshift-monitoring -l app.kubernetes.io/component=telemetry-metrics-collector

Update vsphere creds

oc edit cm cloud-provider-config -n openshift-config
default-datastore = "cl07-2-fc-loc-001"

Manage labels.

Add a label to a node or pod:

oc label node node001.krenger.ch mylabel=myvalue
oc label pod mypod-34-g0f7k mylabel=myvalue

Remove a label (in the example “mylabel”) from a node or pod:

oc label node node001.krenger.ch mylabel-
oc label pod mypod-34-g0f7k mylabel-

Permanently label a node

oc edit machineset ocp-qz7hf-worker-us-west-1b -n openshift-machine-api

rollout

Restart pod in an deployment

oc rollout restart deployment -n openshift-storage csi-rbdplugin-provisioner
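
Watch the restart complete (same deployment as above):

oc rollout status deployment -n openshift-storage csi-rbdplugin-provisioner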

ssl certificates replace

How to replace api.<url> and star.apps.<url> certs.

# api. Create full chain cert. Public - intermediate - root ca.
api.<url>.crt
api.<url>.key
# create secret
oc delete secret api-cert -n openshift-config
oc create secret tls api-cert --cert=api.<url>.crt --key=api.<url>.key -n openshift-config
# patch apiserver
oc patch apiserver cluster --type=merge -p '{"spec":{"servingCerts": {"namedCertificates": [{"names": ["api.<url>"], "servingCertificate": {"name": "api-cert"}}]}}}'
...
# star.apps. Create full chain cert. Public - intermediate - root ca.
star.apps.<url>.crt
star.apps.<url>.key
# create secret
oc delete secret custom-certs-default -n openshift-ingress
oc create secret tls custom-certs-default --cert=star.apps.<url>.crt --key=star.apps.<url>.key -n openshift-ingress
# patch ingress controller
oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'
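
Verify which certificate is actually being served (same <url> placeholders as above; the console route is just one example of an apps endpoint):

echo | openssl s_client -connect api.<url>:6443 -servername api.<url> 2>/dev/null | openssl x509 -noout -subject -enddate
echo | openssl s_client -connect console-openshift-console.apps.<url>:443 -servername console-openshift-console.apps.<url> 2>/dev/null | openssl x509 -noout -subject -enddate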

get cluster-id

oc get clusterversion/version -o jsonpath="{.spec.clusterID}"

api

Processes running the API server. They scale horizontally, and all instances serve requests.

openshift-kube-apiserver 
kube-apiserver

kube-proxy

kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept.
kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster.
kube-proxy uses the operating system packet filtering layer if there is one and it's available. Otherwise, kube-proxy forwards the traffic itself.
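
Where kube-proxy runs in iptables mode, the service rules it maintains are visible on the node, e.g.:

iptables -t nat -L KUBE-SERVICES -n | head

(With OVN-Kubernetes there is no kube-proxy; the CNI implements services itself.)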

Resource Allocation

OS and Kubernetes overhead: what is reserved for the OS and Kubernetes is the difference between Capacity (the node's total resources) and Allocatable (what the Kubernetes scheduler can hand out to pods).
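
The figures below come from the node description (node name is a placeholder):

oc describe node <node> | grep -A 6 -e '^Capacity:' -e '^Allocatable:'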

Capacity:
->cpu:                4
  ephemeral-storage:  125293548Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
->memory:             16409360Ki
  pods:               250
Allocatable:
->cpu:                3500m
  ephemeral-storage:  114396791822
  hugepages-1Gi:      0
  hugepages-2Mi:      0
->memory:             15258384Ki
  pods:               250

requests/limits

User pod allocation is calculated from the “Requests” columns in the oc describe node output (the Allocated resources section shown below).
The relevant columns are “Requests”, not “Limits”.
Requests determine how the pod is scheduled and what resources are reserved for it,
whereas limits allow pods to burst beyond their allocation.
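
A minimal sketch of what this looks like on a container (pod name, image and values are arbitrary):

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: requests-demo
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi9/ubi
    command: ["sleep", "infinity"]
    resources:
      requests:      # what the scheduler reserves on the node
        cpu: 100m
        memory: 128Mi
      limits:        # burst ceiling enforced at runtime
        cpu: 500m
        memory: 256Mi
EOF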

empty space

Allocatable - Allocated resources (Requests) = free capacity

Allocatable:
  cpu:                3500m
  ephemeral-storage:  114396791822
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             15258384Ki
  pods:               250
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                834m (23%)    0 (0%)
  memory             2474Mi (16%)  736Mi (4%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
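
Worked out with the numbers above: free cpu = 3500m - 834m = 2666m; free memory = 15258384Ki (about 14901Mi) - 2474Mi requested, leaving roughly 12427Mi for scheduling.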

status of namespace

Show an overview of the current project

oc status

age of cluster

Look at the creation timestamps of the machines.

oc get nodes -o json | jq -r '.items[].metadata.creationTimestamp' | sort -n | sed 's/T/ /g;s/Z//g'

oc adm inspect

oc adm inspect namespace/isilon
# inspect dumps to inspect.local.* directories in the current dir. (date_file: local helper that prints a timestamp.)
tar cf /tmp/inspect.isilon.$(date_file) inspect.local.*
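
--dest-dir puts the dump somewhere specific instead of the current directory, e.g.:

oc adm inspect namespace/isilon --dest-dir=/tmp/inspect.isilon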

Operator Lifecycle Manager (olm)

oc logs -l app=olm-operator -n openshift-operator-lifecycle-manager --tail=-1

Reinstall an operator that is no longer available with the current OpenShift version

# Force reinstall of odf, which cannot be installed normally because OpenShift has moved more than one version ahead.
# Save the subscriptions
for i in operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= operators.coreos.com/odf-operator.openshift-storage= ; do 
oc get subscription -o yaml -l $i > oc_get_subscription_${i//\//_}.yaml ; done
...
# Save operators
for i in operators.coreos.com/odf-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= ; do 
oc get csv -l $i -o yaml > oc_get_csv_-l_${i//\//_}.yaml ; done
...
# Confirm the backup files contain usable yaml and that no operators or CSVs were missed. Remove resources clearly not related to odf.
...
# Delete the existing ODF-related Subscriptions and the related ClusterServiceVersions.
# Note: no redirects here; the files saved above are the backups.
for i in operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= operators.coreos.com/odf-operator.openshift-storage= ; do 
oc delete subscription -l $i ; done
for i in operators.coreos.com/odf-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= ; do 
oc delete csv -l $i ; done
...
# Make sure you wait for the CSVs to be deleted before creating a subscription again.
...
# Create only the Subscription again:
# (optional: edit the subscription before recreating it, changing the channel to the goal version)
...
# Recreate subscription
oc create -f 'oc_get_subscription_operators.coreos.com_odf-operator.openshift-storage=.yaml'
# wait watching the events:
oc get events -w
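
And watch the recreated CSV until it reaches Succeeded:

oc get csv -n openshift-storage -w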

increase disk on node

VOLUME=abjorklund-01-h4sxm-worker-0-rkk87-root
# Grow the backing volume in OpenStack. API microversion 3.42 allows resizing an attached volume.
openstack volume set --size 40 $VOLUME --os-volume-api-version 3.42
# growpart comes from cloud-utils-growpart, xfs_growfs from xfsprogs.
dnf install cloud-utils-growpart xfsprogs
ssh core@worker
# Grow partition 4 on /dev/sda, then the root xfs filesystem.
growpart /dev/sda 4
xfs_growfs /
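
Verify that the partition and filesystem grew:

lsblk /dev/sda
df -h /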

clusteroperator

oc get clusteroperators
oc get co

ignition

Retrieve rendered ignition data.

# The machine-config server serves a cluster-signed cert (-k) and on newer releases requires the Ignition Accept header.
curl -k -H 'Accept: application/vnd.coreos.ignition+json;version=3.2.0' https://api-int.$(grep ^search /etc/resolv.conf | awk '{print $NF}'):22623/config/master
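
The response is Ignition v3 JSON; to just list the files it would write:

curl -sk -H 'Accept: application/vnd.coreos.ignition+json;version=3.2.0' https://api-int.$(grep ^search /etc/resolv.conf | awk '{print $NF}'):22623/config/master | jq -r '.storage.files[].path'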

ubi container names

ubi ("Standard"): OpenSSL, microdnf, and utilities like gzip and vi
ubi-minimal ("Minimal"): Minimized binaries and minimal yum stack.
ubi-init ("Multi-service"): Less than standard but more than minimal, plus systemd.
ubi-micro ("Micro"): Most minimal image without even a package manager.

create a job/pod/script

Create config map of script

Note that $ has to be escaped, since the script is fed through a here document, where the shell would otherwise expand it.

cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: dns-lookup.sh
data:
  dns-lookup.sh: |
    #!/bin/bash
    # Verify if dns resolution works and how fast.
    while true ; do
      for DNS in \$(awk '/^nameserver / {print \$2}' /etc/resolv.conf) 10.2.0.10 ; do
        echo \$(date '+%F %H:%M:%S %Z') \$DNS \$(host -v -t A ibm.se 2>&1 | tail -3 )
      done
      sleep 5
    done
EOF

create job

cat <<EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: dns-lookup
spec:
  template:
    spec:
      containers:
        - name: dns-lookup
#          image: rockylinux/rockylinux:9
          image: halfface/rockylinux-toolbox:v2
          command: ["/script/dns-lookup.sh"]
          volumeMounts:
            - name: script
              mountPath: "/script"
#          securityContext:
#            runAsUser: 0
#            privileged: true
      volumes:
        - name: script
          configMap:
            name: dns-lookup.sh
            defaultMode: 0755
      restartPolicy: Never
      activeDeadlineSeconds: 1209600
EOF
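
Follow the script output through the job's pod:

oc logs -f job/dns-lookup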

deployment with command

ConfigMap with script. $ is escaped since it is fed via a here document (bash).

cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: stress.sh
  namespace: abjorklund
data:
  stress.sh: |
    #!/bin/bash
    # stress pod.
    while true ; do
      echo \$(date '+%F %H:%M:%S %Z') \$( stress -c 1 -i 1 -m 1 --vm-bytes 512M -t 10s)
      sleep 5
    done
EOF

Deployment.

cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress
  namespace: abjorklund
  labels:
    app: stress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress
  template:
    metadata:
      labels:
        app: stress
    spec:
      containers:
      - name: stress
        image: halfface/rockylinux-toolbox:v3
        volumeMounts:
        - mountPath: /mnt/bin/
          name: stress
#        securityContext:
#          privileged: true
        command: ["/mnt/bin/stress.sh"]
      volumes:
        - name: stress
          configMap:
            name: stress.sh
            defaultMode: 0755
EOF
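
Watch the load it generates (needs cluster monitoring for metrics):

oc adm top pods -n abjorklund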

terminal fix

No line wraps

tput rmam
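
Re-enable line wraps:

tput smam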