OpenShift
What does it mean?
acme: Automated Certificate Management Environment.
annotations: Key/value pairs that attach metadata to an object.
atomic: An operation is indivisible. It runs as a single, uninterruptible unit that either completes entirely or has no effect at all. There is no partial execution or intermediate state.
ceph: Delivers object, block, and file storage in one unified system.
ceph-osd: Object storage daemon for the Ceph distributed file system. It stores objects on a local file system and provides access to them over the network.
clbo: CrashLoopBackOff.
clo: Cluster Logging Operator.
cmo: Cluster Monitoring Operator.
cncf: Cloud Native Computing Foundation.
cni: Container Network Interface (in OpenShift: OVN-Kubernetes or OpenShift SDN).
cns: Cloud Native Storage.
cnv: Container-native Virtualization. An add-on to OpenShift Container Platform that lets virtual machine workloads run and be managed alongside container workloads.
co: Cluster Operator.
ControllerRevision: API object used mainly by controllers that manage versioned, declarative resources such as StatefulSets and DaemonSets. It stores a snapshot of the configuration.
cpi: Cloud Provider Interface.
cr: Custom Resource. An object of a resource type that a CRD adds to the API, usually when an operator is installed. "oc api-resources" lists the available types.
crd: Custom Resource Definition. The name of a CRD object must be a valid DNS subdomain name.
cri: Container Runtime Interface.
cri-o: Lightweight container runtime for Kubernetes.
csi: Container Storage Interface.
csm: Container Storage Modules.
csv: ClusterServiceVersion. OLM manifest that defines the version of an Operator together with its metadata, deployment requirements, and permissions.
cvo: Cluster Version Operator.
cvss: Common Vulnerability Scoring System.
daemonset: Ensures that all (or some) nodes run a copy of a pod.
deployment: Describes a desired state. A Deployment object describes how to create or modify the pods that hold a containerized application by defining the desired state of a particular component. Deployments create and manage ReplicaSets.
eo: Elasticsearch Operator.
ephemeral: Short-lived, temporary.
eus: Extended Update Support.
evict: Remove a pod from its node (see preempt).
Fluentd: Data collector that unifies logging by collecting and processing data from various sources.
fluent bit: Lightweight, high-performance data collector. Mainly for logs, but it can handle metrics too.
fsgroup: Group ID that Kubernetes sets as the group owner of all files in a pod's volumes when the volumes are mounted.
geneve: Generic Network Virtualization Encapsulation. OVN-Kubernetes uses Geneve.
grpc: gRPC (originally Google Remote Procedure Call). An RPC framework that brings performance benefits and modern features to client-server applications.
hpa: Horizontal Pod Autoscaler. Automatically scales the number of pods in a deployment, stateful set, or replica set based on CPU, memory, or custom metrics.
icsp: ImageContentSourcePolicy. Configures mirror registries so that image pulls (for example of the release payload) are served from a mirror instead of the source registry.
idp: Identity provider.
idps: Identity providers.
implicit: Indirect, implied rather than stated.
ingressclass: Lets several ingress controllers coexist. Each Ingress selects the controller that routes its traffic within the cluster.
ipc namespace: Each IPC namespace has its own set of System V IPC identifiers and its own POSIX message queue filesystem.
ipi: Installer-Provisioned Infrastructure.
kcs: Knowledge Centered Support. Red Hat's way of publishing solutions and articles for known questions or problems.
kubelet: The primary "node agent" that runs on each node. It takes a set of PodSpecs (mainly through the API server) and ensures that the containers they describe are running and healthy.
kvdb: Key-value store (Portworx).
machineset: Manages a group of machines with similar characteristics and keeps the desired number of machines running.
manifest: A YAML or JSON file that describes the desired state of a Kubernetes object.
mco: Machine Config Operator.
mcp: Machine Config Pool.
Metricbeat: Lightweight shipper for metrics.
noobaa: Data service for cloud environments. Provides an S3 object-store interface with flexible tiering, mirroring, and spread placement policies over any storage resource that supports GET/PUT, including S3 and GCS.
nsfs: Virtual filesystem that makes Linux kernel namespaces available.
oadp: OpenShift API for Data Protection.
oci: Open Container Initiative.
ocm: OpenShift Cluster Manager.
ocp: OpenShift Container Platform.
ocs: OpenShift Container Storage.
odf: OpenShift Data Foundation.
oidc: OpenID Connect, an identity layer on top of the OAuth 2.0 protocol.
olm: Operator Lifecycle Manager.
osm: Open Service Mesh. Lightweight, extensible, cloud native service mesh.
ovnk: OVN-Kubernetes (Open Virtual Network Kubernetes).
pdb: Pod Disruption Budget (resource name poddisruptionbudgets).
pvc: Persistent Volume Claim. A request for storage that binds to a Persistent Volume. Pods use the storage by referencing the claim.
pv: Persistent Volume. Persistent storage. The low-level representation of a storage volume.
preempt: When a higher-priority pod cannot be scheduled because resources are insufficient, the scheduler preempts (evicts) one or more lower-priority pods to free up resources for it.
prometheus: Monitoring system built on a time-series database (TSDB). It collects, stores, and queries time-series data, and handles alerting.
provisioner: A StorageClass object contains a provisioner that decides which volume plugin is used to provision PersistentVolumes.
quay.io: Builds, analyzes, and distributes your container images. Run by Red Hat (owned by IBM).
ReadWriteMany: Storage that can be mounted read-write by many nodes.
Reconciliation: Mechanism that ensures the cluster behaves as intended. It compares the current state of resources with the desired state specified in your manifests or custom resources.
registry: A container registry is a storage and distribution system for container images. In Kubernetes, container images are the building blocks for deploying applications, and a registry stores, manages, and distributes them.
registrar: The node-driver-registrar is a sidecar container that registers the CSI driver with the kubelet through the kubelet plugin registration mechanism.
replicaset: Maintains a stable set of replica pods running at any given time.
rhacm: Red Hat Advanced Cluster Management for Kubernetes.
rhcos: Red Hat Enterprise Linux CoreOS.
rhcp: Red Hat Ceph Storage.
rhcs: Red Hat Cluster Suite.
rhocp: Red Hat OpenShift Container Platform.
rhol: Red Hat OpenShift Logging.
rook: Storage operator that provides file, block, and object storage for cloud native environments. Based on battle-tested Ceph storage.
rosa: Red Hat OpenShift Service on AWS.
runc: Run container. Container runtime that implements the OCI runtime specification.
s2i: Source-to-Image.
sa: Service Account.
scc: Security Context Constraints.
sc: StorageClass (the short name in oc/kubectl, as in "oc get sc"). Not to be confused with a pod's security context.
seccomp: Secure computing mode. Profiles can be associated with a container to restrict the system calls available to it.
SelfLink: URL representing the given object.
service: Logical abstraction for a deployed group of pods in a cluster that all perform the same function.
skopeo: Command line utility for working with local and remote container images and container image registries.
StatefulSet: Workload object for stateful applications. Manages deployment and scaling of pods and guarantees their ordering and uniqueness.
StorageClass: Allows dynamic provisioning of Persistent Volumes.
svc: Service.
taint: Applied to a node (one or more per node). A node repels pods that do not tolerate its taints, which keeps pods on appropriate nodes.
tekton: Container-native way to manage CI/CD. It is also the basis for OpenShift Pipelines.
thanos: Long-term storage for your Prometheus metrics on OpenShift.
toleration: Applied to pods. Tolerations allow the scheduler to place pods on nodes with matching taints.
ubi: Universal Base Images. OCI-compliant container base operating system images, with complementary runtime languages and packages, that are freely redistributable.
upi: User-Provisioned Infrastructure.
uts: UNIX Time-Sharing (UTS) namespace. Controls the hostname and the NIS domain name.
uWSGI: Project that aims to develop a full stack for building hosting services.
vxlan: Virtual Extensible LAN. OpenShift SDN builds VXLAN tunnels with Open vSwitch, OpenFlow rules, and iptables.
wwn: World Wide Name. A unique identifier used in Fibre Channel.
where do I start
Get bash completion running.
. <(oc completion bash)
Get commands.
oc help
List the resource types you can use commands on.
oc api-resources
Show the options that apply to all commands.
oc options
read
https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/
Projects that I have read about but forgotten
OpenEBS: Storage solution. Possible backends: local, NFS, ZFS, NVMe. cStor serves iSCSI block storage from the underlying disks or cloud volumes in a cloud native way.
files of value
metadata.json File created during install. Used by openshift-install destroy cluster
oc get
Available resources to ask about.
oc api-resources
Get everything
oc api-resources --verbs=list -o name | while read i ; do echo '***' $i ; oc get $i -A -o yaml 2>&1 ; done > /tmp/oc_api-resources.$(oc whoami --show-server | awk -F ':|/' '{print $4}').$(date +%F_%H-%M-%S)
login
oc login --username developer https://openshift:6443
switch user
oc login --username developer
which clusters have you logged into
oc config get-clusters
List projects
oc projects
oc get projects
select project
oc project $project
kubectl config set-context --current --namespace=kube-public
create project/namespace
oc create namespace redis
list pods
oc get pods
oc get pods --all-namespaces
oc get pods -o wide
-o wide also shows which node each pod is running on.
oc get pods -o wide --all-namespaces
Get pods that are not running.
oc get pods --field-selector status.phase!=Running --all-namespaces
oc get pods -A --no-headers | grep -v Completed | while read LINE ; do PODS=$(awk '{print $3}' <<< "${LINE}") ; if [ "${PODS%%/*}" != "${PODS##*/}" ] ; then echo "${LINE}" ; fi ; done
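The READY check in the loop above can also be written as a single awk filter. A minimal sketch, assuming the default `oc get pods -A --no-headers` column order (NAMESPACE NAME READY STATUS ...). The function name `pods_not_ready` is made up for this example.

```shell
# pods_not_ready (hypothetical helper): reads `oc get pods -A --no-headers`
# output on stdin and prints pods whose READY column (e.g. "1/2") shows
# fewer ready containers than total. Completed pods are skipped.
pods_not_ready() {
  awk '$4 != "Completed" {
    split($3, ready, "/")          # ready[1] = ready containers, ready[2] = total
    if (ready[1] != ready[2]) print
  }'
}
```

Usage: `oc get pods -A --no-headers | pods_not_ready`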
Get pods matching two field selectors
oc get pods --field-selector=status.phase!=Running,spec.restartPolicy=Always
Label selectors combine the same way, e.g. worker nodes that are not infra nodes:
oc get nodes --no-headers --selector='node-role.kubernetes.io/worker,!node-role.kubernetes.io/infra'
Get pods running on specific node
oc get pods -A -o wide --field-selector spec.nodeName=<node>
Get pods with label name=portworx-proxy
oc get pods -A -l name=portworx-proxy
Get pods with several labels
oc get pod -l 'app in (rook-ceph-mon,rook-ceph-operator,rook-ceph-osd,rook-ceph-rgw,rook-ceph-mgr,rook-ceph-mds,rook-ceph-crashcollector)'
Get pods with an extra port column.
kubectl get pods --output=custom-columns=NAME:.metadata.name,NAMESPACE:.metadata.namespace,IP:.status.podIPs[*].ip,POD_PORT:.spec.containers[*].ports[*].containerPort
Get the 10 pods with the most restarts
oc get pods -o custom-columns='NAMESPACE:.metadata.namespace,POD:.metadata.name,RESTART:.status.containerStatuses[*].restartCount' -A | sort -k3 -n | tail -10
Endpoint
An Endpoint is an object that represents the IP addresses and ports of the Pods that back a Service. When a Service is created, Kubernetes automatically creates an associated Endpoints object.
EndpointSlices
EndpointSlices offer a scalable, efficient, and feature-rich alternative to traditional Endpoints. They also carry topology information (such as the zone) for each endpoint.
get shell on node
You can debug more than nodes, for example a deployment, build, or job.
oc debug node/infra-2.ocpdev.lkl.ltkalmar.se
Switch to the host's root filesystem (inside the debug pod)
chroot /host
Connect to a node in EKS.
kubectl debug node/<node> -it --image=halfface/rockylinux-toolbox:v3
get debug information from oc
oc debug --loglevel=10 node/$node
Debug a deployment as root (the debug copy of the pod runs without health checks)
oc debug deployment/my-deployment-name --as-root
get nodes
oc get nodes
oc get nodes -o jsonpath='{.items[*].metadata.name}'
- Get nodes without headers: name, CPUs, disk size, memory, IP address.
oc get nodes --no-headers --selector="node-role.kubernetes.io/worker" -o=custom-columns='NAME:.metadata.name,CPU:.status.capacity.cpu,DISK:.status.capacity.ephemeral-storage,MEM:.status.capacity.memory,IP:.status.addresses[?(@.type=="InternalIP")].address'
- Get node name and ip address.
oc get nodes --no-headers --selector="node-role.kubernetes.io/worker" -o=custom-columns='NAME:.metadata.name,IP:.status.addresses[?(@.type=="InternalIP")].address'
ip address of pod
Outside the pod.
oc get pod openshift-gitops-application-controller-0 --template '{{.status.podIP}}'
Inside the pod (only if the pod injects POD_IP as an environment variable, e.g. through the downward API).
echo $POD_IP
get nodes that are under pressure or not ready (overcommitted)
oc get nodes -o jsonpath='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{end}' | sed 's/:/=node;/g' | sed 's/;/\n/g' | grep -vE 'MemoryPressure=False|DiskPressure=False|PIDPressure=False|Ready=True'
Does any node stick out.
oc get nodes --no-headers -o=custom-columns=NAME:.metadata.name,CONDITIONS:.status.conditions
connect to pod
oc rsh $pod bash
list containers in pod
oc get pod/router-default-6b76b87c6-5m7h6 -n openshift-ingress -o json | jq -r '.spec.containers[].name'
router
logs
list all containers running in a cluster
kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" | tr -s '[[:space:]]' '\n' | sort | uniq -c
connect to container in pod
oc rsh -c router pod/router-default-6b76b87c6-5m7h6
get logs with timestamps from the last 24 hours from all running containers outside namespaces matching ^openshift, and search them for a string (here EAI_AGAIN)
oc get pods --no-headers --field-selector status.phase=Running -A -o custom-columns=NAMESPACE:.metadata.namespace,POD:.metadata.name | grep -v ^openshift | while read NAMESPACE POD ; do for CONTAINER in $(oc get pod $POD -n $NAMESPACE -o json | jq -r '.spec.containers[].name') ; do echo oc logs -n ${NAMESPACE} ${POD} -c ${CONTAINER} ; oc logs -n ${NAMESPACE} $POD -c $CONTAINER --since=24h --timestamps=true 2>&1 | grep "Error: getaddrinfo EAI_AGAIN " ; done ; done
tail logs for pods matching label
oc logs -n openshift-storage -l app=csi-cephfsplugin -c driver-registrar -f --max-log-requests 8 --tail=1
oc logs -n openshift-cluster-storage-operator -l name=vsphere-problem-detector-operator --tail=-1
oc logs -f --tail=0 router-default-6c666984fd-ct8zf logs
oc logs -f --namespace openshift-gitops deployment/openshift-gitops-server
Search for log entries locally on node
ls -la $(ls -la $(grep -l EAI_AGAIN /var/log/containers/*) | awk '{print $NF}')
grep -rl EAI_AGAIN /var/log/pods/
execute command in pod
oc exec pod/router-default-545ffb97db-4h9rx -- $command
kubectl exec --stdin --tty shell-demo -- /bin/bash
execute command on all nodes
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'echo $HOSTNAME && chronyc sources'
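The same pattern can be wrapped in a small shell function. This is a sketch, and `oc_debug_all_nodes` is a hypothetical name. Stdin is redirected from /dev/null so that `oc debug` cannot consume the node names the loop is reading.

```shell
# oc_debug_all_nodes (hypothetical helper): run a shell command on every
# node through `oc debug`, inside the host's root filesystem.
oc_debug_all_nodes() {
  local node
  oc get nodes -o name | while read -r node ; do
    echo "*** ${node}"
    # stdin comes from /dev/null so oc debug cannot eat the node list.
    oc debug "${node}" -- chroot /host sh -c "$1" </dev/null
  done
}
```

Usage: `oc_debug_all_nodes 'echo $HOSTNAME && chronyc sources'`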
execute command in all containers
oc get pods --no-headers -o 'custom-columns=:.metadata.namespace,:.metadata.name' -A | while read NAMESPACE POD ; do
for CONTAINER in $(oc get -n $NAMESPACE pod/$POD -o json | jq -r '.spec.containers[].name') ; do
echo '***' $NAMESPACE $POD $CONTAINER
echo $(oc exec -c $CONTAINER -n $NAMESPACE $POD -- curl -m1 -skv https://inter.net 2>&1 | tr -d '\n')
done
done | tee /tmp/$(oc whoami --show-server | awk -F ':|/' '{print $4}').$(date +%F_%H-%M-%S)
where am i
POD_NAME=rook-ceph-operator-6c86f788d5-f8mqf POD_NAMESPACE=openshift-storage
describe pods
oc describe pods
oc describe pod stage-sales-62-qjd
To get (almost) all objects with a specific label from the current project, execute:
oc get all -l '<label_name>=<label_value>'
oc get pods -n openshift-storage -o name -l app=rook-ceph-operator
get config from pod in yaml format
oc get pods router-default-545ffb97db-kgsdb -o yaml
get deployments
oc get deployments --all-namespaces
set environment variable (here on a deployment config)
oc set env dc/your-app-name COLOR=blue
unset environment variable (here on a deployment config)
oc set env dc/your-app-name COLOR-
list environment variables
oc set env pod/router-default-545ffb97db-lj2t5 --list
list templates
oc get templates -n openshift
Custom resource definitions.(crd)
oc get crd
sort by creation time (CREATED AT)
oc get crd --sort-by=.metadata.creationTimestamp
edit
oc edit deployment.apps/router-default
Watch changes taking place.
watch -n1 oc get all
grant permission to project
oc adm policy add-role-to-user view developer -n mysecrets
grant permission to group
oc adm policy add-cluster-role-to-group cluster-admin admin
grant a user cluster-admin permissions through group
# Create a new group.
oc adm groups new cluster-admin
# Bind the cluster-admin role to the group.
oc adm policy add-cluster-role-to-group cluster-admin cluster-admin
# Add the user to the group.
oc adm groups add-users cluster-admin T1.anbj15
grant unrestricted access to service account
oc adm policy add-scc-to-user privileged system:serviceaccount:isilon:isilon-node
...
oc adm policy add-scc-to-user anyuid -z ak-authentik
oc adm policy add-scc-to-user privileged -z ak-authentik
which pods use scc?
oc get project -o=custom-columns='NAME:.metadata.name' --no-headers | grep -v openshift | while read NAMESPACE ; do echo '*' $NAMESPACE ; oc get pods -o=custom-columns='NAME:.metadata.name,SCC:.metadata.annotations.openshift\.io\/scc' --no-headers -n $NAMESPACE | grep restricted-v2 ; done
oc get pods --all-namespaces -o=jsonpath='{range .items[*]}{@.metadata.name}{"\t"}{@.metadata.namespace}{"\t"}{@.metadata.annotations.openshift\.io/scc}{"\n"}' | column -t -s $'\t' | less
crictl
List running containers
crictl ps
crictl ps --all | grep -i coredns
List all pods
crictl pods
List all images
crictl images
Execute a command in a running container
crictl exec -it 1f73f2d81bf98 /bin/sh
crictl logs
crictl logs <container-id>
nsenter
run program in different namespaces
which version
Get version of various objects
oc version
Only get cluster version
oc get clusterversion
oc get clusterversion -o json | jq -r '.items[0].spec | .channel, .desiredUpdate.version'
copy files from pod
Copy session keys locally.
oc rsync caas-2-8s6cl:/tmp/sslkeylog .
tcpdump from nodes
ssh $node toolbox
rm toolbox
toolbox rm --force <container>
get routing.
oc get route -A
oc describe route sales -n hlt-prod
Name: sales
Namespace: hlt-prod
Created: 13 months ago
Labels: <none>
Annotations: haproxy.router.openshift.io/balance=roundrobin
haproxy.router.openshift.io/disable_cookies=true
Requested Host: sales.prod.bobcat.hlt.se
exposed on router default (host apps.ocpprod.lkl.ltkalmar.se) 13 months ago
Path: <none>
TLS Termination: edge
Insecure Policy: <none>
Endpoint Port: port-8000-tcp
Service: sales
Weight: 100 (100%)
Endpoints: 10.160.7.166:8000, 10.160.7.167:8000, 10.160.7.168:8000 + 35 more...
oc get pods (selecting specific pods)
Only name without headers
oc get pods -o custom-columns=POD:.metadata.name --no-headers -A
Describe Failing pods.
oc get pods -A --field-selector=status.phase=Failed --no-headers | while read NAME_SPACE POD REST_OF_LINE ; do echo '*' $POD ${NAME_SPACE} ; oc describe pod $POD -n "${NAME_SPACE}" ; done | less -ISRM
get pod label:s
oc get pods --show-labels
get subscriptions
oc get subscriptions -A
delete subscription
oc delete subscription openshift-gitops-operator -n openshift-operators
get available channels for subscription
oc get PackageManifest $OPERATOR -o json | jq -r '.status.channels[] | .name,.currentCSV'
update channel
oc patch subscriptions -n $NAMESPACE $OPERATOR --type merge -p '{"spec": {"channel": "stable-4.12"}}'
delete clusterserviceversion
oc delete clusterserviceversion openshift-gitops-operator.v1.7.4
whoami
oc whoami
oc config current-context
oc whoami --show-console=true --show-context=true
Which is the console url?
oc whoami --show-console
Which is the api url?
oc whoami --show-server
get instance url
oc get routes -n openshift-console console
create an htpasswd user
Create the user, its htpasswd identity, and the mapping between them.
oc create user imageregistry
oc create identity htpasswd:imageregistry
oc create useridentitymapping htpasswd:imageregistry imageregistry
Create user/password to feed kubernetes with.
htpasswd -c -B -b htpasswd imageregistry P@ssW0rd
oc create secret generic htpass-secret --from-file=htpasswd=htpasswd -n openshift-config
Get htpasswd users.
oc get secret htpass-secret -ojsonpath={.data.htpasswd} -n openshift-config | base64 --decode
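The decoded secret holds one `user:hash` line per user, so cutting each line at the first colon lists only the user names. A small sketch. `htpasswd_users` is a hypothetical helper name.

```shell
# htpasswd_users (hypothetical helper): print only the user names from
# htpasswd content on stdin (format "user:hash", one entry per line).
htpasswd_users() {
  cut -d: -f1
}
```

Usage: `oc get secret htpass-secret -ojsonpath={.data.htpasswd} -n openshift-config | base64 --decode | htpasswd_users`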
Enable htpasswd login.
oc edit oauth cluster
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret
look at oauth config.
oc get oauth cluster -o yaml
Create service account.
https://docs.openshift.com/container-platform/4.13/authentication/understanding-and-creating-service-accounts.html
get list of users
oc config view -o jsonpath='{.users[*].name}'
list contexts
oc config get-contexts
use-context
oc config use-context openshift-marketplace/api-abjorklund-01-rbcloud-net:6443/kube:admin
oc explain pv
oc explain pv
oc get configmap cluster-monitoring-config -n openshift-monitoring
put node offline
Mark a node as unschedulable.
oc adm cordon node1
Drain a node in preparation for maintenance.
oc adm drain <node> --force --delete-emptydir-data --ignore-daemonsets
oc adm drain <node> --ignore-daemonsets --force --grace-period=30 --delete-emptydir-data
oc adm drain <node> --force --delete-emptydir-data --grace-period=1 --ignore-daemonsets
Mark node as online.
oc adm uncordon node1
Extend memory on node.
# Add memory to master nodes.
NODE=costest-ph9l4-master-1
oc adm cordon $NODE
oc adm drain $NODE --force --delete-emptydir-data --grace-period=1 --ignore-daemonsets
timeout 10 oc debug node/$NODE -- chroot /host sh -c 'echo $HOSTNAME && sudo shutdown -P now'
govc vm.power -off /RGK/vm/costest-ph9l4/$NODE
govc vm.info /RGK/vm/costest-ph9l4/$NODE
govc vm.change -vm /RGK/vm/costest-ph9l4/$NODE -m 20480
govc vm.power -on /RGK/vm/costest-ph9l4/$NODE
oc adm uncordon $NODE
oc adm top nodes -l node-role.kubernetes.io/master
Get pv:s
oc get pv
Sorted by size.
oc get pv --sort-by=.spec.capacity.storage -A
Get more info about a pv.
oc describe pv $PV
Access modes for pv:s. AccessMode
RWO - ReadWriteOnce: the volume can be mounted read-write by a single node.
ROX - ReadOnlyMany: the volume can be mounted read-only by many nodes.
RWX - ReadWriteMany: the volume can be mounted read-write by many nodes.
RWOP - ReadWriteOncePod: the volume can be mounted read-write by a single pod.
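`oc get pv` and `oc get pvc` show these abbreviations in the ACCESS MODES column, while YAML manifests use the full names. A sketch of the mapping as a shell function. The name `access_mode_short` is made up for this example.

```shell
# access_mode_short (hypothetical helper): translate a full access mode name,
# as written in .spec.accessModes, to the abbreviation shown by oc get pv/pvc.
access_mode_short() {
  case "$1" in
    ReadWriteOnce)    echo RWO ;;
    ReadOnlyMany)     echo ROX ;;
    ReadWriteMany)    echo RWX ;;
    ReadWriteOncePod) echo RWOP ;;
    *) echo "unknown access mode: $1" >&2 ; return 1 ;;
  esac
}
```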
get pvc:s
oc get pvc --all-namespaces | less
sort by
oc get pvc --sort-by=.spec.resources.requests.storage -A
list pvc by creation time
oc get pvc --all-namespaces -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,CREATED:.metadata.creationTimestamp"
create pvc
# oc create pvc
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: abjorklund-pvc1
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
EOF
use pvc. Create pod using pvc
# Create test pod that mounts the pvc created above.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: abjorklund-test-pvc-claim1-pod
spec:
  volumes:
  - name: abjorklund-test-pvc
    persistentVolumeClaim:
      claimName: abjorklund-pvc1
  containers:
  - name: abjorklund-test-pvc
    image: halfface/rockylinux-toolbox:v3
    volumeMounts:
    - mountPath: "/mnt/abjorklund-test-pvc"
      name: abjorklund-test-pvc
    command: ["sleep"]
    args: ["infinity"]
EOF
extend/increase pvc
PVC=postgres-instance1-x5b8-pgdata ;NAMESPACE=rk-cos-prod ; oc patch pvc ${PVC} --type=merge -p '{"spec":{"resources":{"requests":{"storage": "2Gi"}}}}' -n ${NAMESPACE}
which pods are using pvc
oc get pods --all-namespaces -o=json | jq -c '.items[] | {name: .metadata.name, namespace: .metadata.namespace, claimName:.spec.volumes[]? | select( has ("persistentVolumeClaim") ).persistentVolumeClaim.claimName }'
List pvc:s with the pod using them.
kubectl describe pvc -A | awk '/^Name:/ {name=$2} /^Namespace:/ {namespace=$2} /^Used By:/ {usedby=$3; print namespace "\t" name "\t" usedby}' | column -t -s $'\t'
create storage class and create pvc
Install an NFS provisioner (nfs-subdir-external-provisioner)
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=10.111.222.1 \
  --set nfs.path=/storage/temp/kafka_nfs_root \
  --set storageClass.name=nfs
Make storage class default
oc patch storageclass nfs -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
Create pvc using sc
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata: {name: nfs-pvc, namespace: kafka}
spec: {accessModes: [ReadWriteOnce], resources: {requests: {storage: 1Gi}}, storageClassName: nfs}
EOF
kubectl
List contexts
kubectl config get-contexts
Select context
kubectl config use-context default/api-blabla-halfface-se:6443/kube:admin
permissions
list groups
oc get groups -o wide
list clusterroles (cluster-scoped, so no namespace flag is needed)
oc get clusterrole
list clusterrolebindings
oc get crb
oc get clusterrolebindings
scale
oc scale --replicas=2 rc/postgresql-1
oc scale -n abjorklund deployment stress-hm-6x32 --replicas=0
oc scale --replicas=3 machineset <machineset> -n openshift-machine-api
top
oc adm top pods --use-protocol-buffers --all-namespaces
oc adm top pods --use-protocol-buffers --all-namespaces --sort-by=cpu | head -20 | cut -c -200
oc adm top nodes --sort-by=cpu
oc adm top nodes --sort-by=memory
get memory usage of all running pods in MB
oc get pods -o custom-columns=POD:.metadata.name --no-headers --field-selector status.phase=Running | while read POD ; do echo $POD $(( $(oc exec $POD -- cat /sys/fs/cgroup/memory/memory.usage_in_bytes </dev/null 2>/dev/null) / 1024 / 1024 )) MB ; done
oc get pods -A -o wide --no-headers --field-selector spec.nodeName=ocp-04-9lxgz-worker-wlw9p,status.phase=Running | while read NAMESPACE POD NULL ; do oc adm top pod $POD -n $NAMESPACE --containers --no-headers ; done | sort -k 4 -n | less
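The first loop above reads the cgroup v1 file. On nodes that use cgroup v2, a container typically sees its usage in `/sys/fs/cgroup/memory.current` instead. A sketch that tries both. Run it inside a container, for example after `oc rsh`. The function name and the optional root-directory argument are made up so that it can be tested outside a pod.

```shell
# container_mem_mb (hypothetical helper): print the current memory usage of
# the cgroup mounted at $1 (default /sys/fs/cgroup) in MiB.
container_mem_mb() {
  local root="${1:-/sys/fs/cgroup}" bytes
  if [ -r "${root}/memory.current" ] ; then
    bytes=$(cat "${root}/memory.current")                 # cgroup v2
  else
    bytes=$(cat "${root}/memory/memory.usage_in_bytes")   # cgroup v1
  fi
  echo $(( bytes / 1024 / 1024 ))
}
```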
Get memory usage per pod on specific node.
NODE=ocp-01-4dfqx-worker-4n6mk ; oc get pods -A -o wide --no-headers --field-selector "spec.nodeName=${NODE},status.phase=Running" | while read NAMESPACE POD NULL ; do oc adm top pod $POD -n $NAMESPACE --containers --no-headers ; done | tr -s ' ' '\t' | sort -k 4 -n | column -t -s $'\t'
get memory usage of all nodes in % of total available ram
oc get nodes -o name | xargs -I % oc debug % -- chroot /host sh -c 'BUFFER=($(free | grep Mem:)) ; echo $HOSTNAME $(( $(( ${BUFFER[1]} - ${BUFFER[6]} )) / $(( ${BUFFER[1]} / 100 )) ))' 2>/dev/null
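The one-liner above computes (total - available) / (total / 100) from the `Mem:` line of `free`. The same calculation as an awk sketch, assuming the usual procps `free` layout where field 2 is total and field 7 is available. `mem_used_pct` is a hypothetical name. awk divides in floating point, so the result may differ slightly from the integer shell arithmetic above.

```shell
# mem_used_pct (hypothetical helper): read `free` output on stdin and print
# the share of RAM that is not available, as an integer percentage.
mem_used_pct() {
  awk '/^Mem:/ { printf "%d\n", ($2 - $7) * 100 / $2 }'
}
```

Usage: run `free | mem_used_pct` on a node, for example inside `oc debug node/<node>` after `chroot /host`.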
oc get crd
Get Custom Resource Definitions.
oc get crd
operators
Operators automate the setup and management of application instances.
list installed operators
oc get ClusterServiceVersions -A
oc get csv -A
oc get operators -o json | jq -r '.items[].status.components.refs[]?|select(.kind=="ClusterServiceVersion")|.name'
Search all namespaces, drop the namespace column, and list each operator version once.
oc get csv -A -o=custom-columns='NAME:.metadata.name,VERSION:.spec.version,DISPLAY:.spec.displayName' --no-headers | sort | uniq
list available operators
oc get packagemanifests
delete operator
Delete via the GUI first. Use the steps below if traces are left or the operator cannot be installed again.
https://access.redhat.com/solutions/6762071 Remove potentially blocking references.
https://access.redhat.com/solutions/7026146 Remove the label so the operator is not recreated.
# List the CRDs currently referenced by the operator.
oc get operator prometheus.prometheus -o yaml -n openshift-operators | grep -i CustomResourceDefinition -A1
# In each CRD, remove the label line operators.coreos.com/prometheus.prometheus: "" and then save and exit.
oc edit crd thanosrulers.monitoring.coreos.com
# Remove possibly broken jobs.
oc get jobs.batch -n openshift-marketplace | grep -i 0/1
# If the job was not broken, remove all references to that operator: its jobs and configmaps (the loop prints the delete commands).
oc get job -n openshift-marketplace -o json | jq -r '.items[] | select(any(.spec.template.spec.containers[].env[]?; (.value // "") | contains("elasticsearch-operator"))) | .metadata.name' | while read i ; do echo oc delete job $i -n openshift-marketplace ; echo oc delete configmap $i -n openshift-marketplace ; done
Select channel
oc patch clusterversion version --type merge -p '{"spec": {"channel": "candidate-4.12"}}' # candidate-... channels offer unsupported early access to releases as soon as they are built.
oc patch clusterversion version --type merge -p '{"spec": {"channel": "fast-4.12"}}' # Releases arrive as soon as they are declared generally available (GA). Fully supported and usable in production environments.
oc patch clusterversion version --type merge -p '{"spec": {"channel": "stable-4.12"}}' # Releases arrive after a delay from fast. If data from fast shows good quality, the release is moved to stable.
oc patch clusterversion version --type merge -p '{"spec": {"channel": "eus-4.12"}}' # Extended Update Support
find if image exists
oc adm release info quay.io/openshift-release-dev/ocp-release:4.15.38-x86_64
Upgrade to a release image found on the OKD GitHub releases page
oc adm upgrade --to-image=
oc adm upgrade
Upgrade okd images.
Launch a new pod that gathers debug information, then compress the result and attach it to the support case.
cd /tmp && oc adm must-gather && tar czf /tmp/must-gather.$(oc whoami --show-server | awk -F ':|/' '{print $4}').$(date +%F_%H-%M-%S).tar.gz must-gather.local.*
Must gather for odf
DATE=$(date +%F_%H-%M-%S)
mkdir /tmp/${DATE} ; cd /tmp/${DATE} && oc adm must-gather --image=registry.redhat.io/odf4/odf-must-gather-rhel9:v4.16
tar czf /tmp/must-gather.$(oc whoami --show-server | awk -F ':|/' '{print $4}').${DATE}.tar.gz /tmp/${DATE}/
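The archive names above embed the API host name. `awk -F ':|/' '{print $4}'` extracts it from the URL printed by `oc whoami --show-server`: splitting `https://api.example:6443` on `:` and `/` leaves the host in field 4. A sketch of that naming as a function. `must_gather_name` and the example URL are hypothetical.

```shell
# must_gather_name (hypothetical helper): build the archive path used above
# from an API URL ($1) and a timestamp ($2).
must_gather_name() {
  local host
  # "https://api.example.com:6443" split on ':' or '/' -> field 4 is the host.
  host=$(printf '%s\n' "$1" | awk -F ':|/' '{print $4}')
  printf '/tmp/must-gather.%s.%s.tar.gz\n' "${host}" "$2"
}
```

Usage: `tar czf "$(must_gather_name "$(oc whoami --show-server)" "${DATE}")" /tmp/${DATE}/`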
Approve csr certificate
oc adm certificate approve <csr_name>
Approve all pending csr
oc get csr --no-headers | awk '/Pending/ {print $1}' | xargs -r oc adm certificate approve
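The approve pipeline depends on the CONDITION column, which is the last column of `oc get csr --no-headers`. A sketch of the filter as a function, so it can be checked on sample output before anything is approved. `pending_csrs` is a hypothetical name, and the sample rows in the check are made up.

```shell
# pending_csrs (hypothetical helper): read `oc get csr --no-headers` output on
# stdin and print the names of CSRs whose condition is still Pending.
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}
```

Usage: `oc get csr --no-headers | pending_csrs | xargs -r oc adm certificate approve`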
certmanager
cert-manager design
Ingress (optional) -> Certificate -> CertificateRequest -> Order -> Challenge
Order and Challenge are used with ACME only.
look at cert-manager cr
oc api-resources | grep cert | awk '{print $1}' | while read i ; do echo '*' $i ; oc get $i -A ; done
list certificates
oc get certificate -A
list ClusterIssuer
oc get ClusterIssuer -A
list orders by date
oc get orders -n openshift-config --sort-by=.metadata.creationTimestamp
install cmctl
curl -fsSL https://github.com/cert-manager/cert-manager/releases/latest/download/cmctl-linux-amd64.tar.gz | (cd /usr/local/bin/ ; sudo tar zxf - cmctl)
completion
. <(cmctl completion bash)
renew cert
cmctl renew -n openshift-config cert-api
status of cert
cmctl status certificate -n openshift-ingress le-wildcard-apps-certificate
cert-utils
cert check
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cert-utils-operator-certificate-rule-alerts
  namespace: cert-utils-operator
spec:
  groups:
  - name: cert-utils-operator-recording-rules
    rules:
    - expr: certutils_certificate_expiry_time - certutils_certificate_issue_time
      record: cert:validity_duration:sec
    - expr: certutils_certificate_expiry_time - time()
      record: cert:time_to_expiration:sec
  - name: cert-utils-operator-alerting-rules
    rules:
    - alert: CertificateApproachingExpiration
      annotations:
        message: Certificate {{ $labels.namespace }}/{{ $labels.name }} has less than 30 days left.
        summary: Certificate {{ $labels.namespace }}/{{ $labels.name }} is approaching expiration (30 days left).
      expr: |
        cert:validity_duration:sec >= 7776000 and cert:time_to_expiration:sec < 2592000
      labels:
        severity: warning
    - alert: CertificateIsAboutToExpire
      annotations:
        message: Certificate {{ $labels.namespace }}/{{ $labels.name }} has less than 15 days left.
        summary: Certificate {{ $labels.namespace }}/{{ $labels.name }} is about to expire (15 days left).
      expr: |
        cert:validity_duration:sec >= 7776000 and cert:time_to_expiration:sec < 1296000
      labels:
        severity: critical
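The thresholds in these rules are plain seconds: 7776000 s is 90 days, 2592000 s is 30 days, and 1296000 s is 15 days. The alerts therefore only fire for certificates with a total validity of at least 90 days. A quick way to compute other values, as a shell sketch (`days_to_seconds` is a made-up helper):

```shell
# days_to_seconds (hypothetical helper): convert days to seconds for
# PromQL thresholds such as cert:time_to_expiration:sec.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}
```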
oc adm release info
# Show information about the cluster's current release
oc adm release info
# Show the source code that comprises a release
oc adm release info 4.2.2 --commit-urls
# Show the source code difference between two releases
oc adm release info 4.2.0 4.2.2 --commits
# Show where the images referenced by the release are located
oc adm release info quay.io/openshift-release-dev/ocp-release:4.2.2 --pullspecs
# Show release info about a release
oc adm release info 4.10.47 --pullspecs
release notes
Find changes between OCP versions (release notes).
https://access.redhat.com/labs/ocpupgradegraph/update_path Select source and destination versions. At the bottom there is a graphical display. Click each bubble and read the RHBA advisory.
Point releases are listed at the end of the release notes.
https://docs.openshift.com/container-platform/4.12/release_notes/ocp-4-12-release-notes.html
oc adm node-logs
Look at CRI-O logs from master nodes.
oc adm node-logs --role master -u crio
Get CRI-O unit logs from one node.
oc adm node-logs abjorklund-01-5tsbc-worker-0-kcr54 -u crio
Look at a specific log file.
oc adm node-logs --role master --path=openshift-apiserver/audit.log
List available logs.
oc adm node-logs --role=master --path=/
List logs from a specific node.
oc adm node-logs nord-ic-bc84t-master-0 --path=/oauth-server/
Logs from before earlier reboots
oc adm node-logs --role=master --path=/
Search recursively for where a log file exists (oc_debug_run_command_all_nodes is a local helper function, not an oc subcommand).
oc_debug_run_command_all_nodes 'find /var/log 2>&1 | grep <name_pod>'
download logfile
CONTAINER_PATH="/containers" ; SEARCH_STRING="rabbit" ; oc adm node-logs --role=worker --path="${CONTAINER_PATH}" | grep "${SEARCH_STRING}" | while read NODE LOGFILE ; do echo $NODE --path="${CONTAINER_PATH}/${LOGFILE}" ; oc adm node-logs $NODE --path="${CONTAINER_PATH}/${LOGFILE}" > ${NODE}:${CONTAINER_PATH//\//%}%${LOGFILE} ; done
openshift upgrade path
https://access.redhat.com/labs/ocpupgradegraph/update_path?channel=stable-4.9&arch=x86_64&is_show_hot_fix=false&current_ocp_version=4.9.15&target_ocp_version=4.10.11
Upgrade openshift/okd
https://docs.okd.io/latest/updating/preparing_for_updates/updating-cluster-prepare.html
Run the command below and check whether APIs that are being removed have a request count.
oc get apirequestcounts
upgrade openshift
# look for existing alerts.
# look for troublesome pods.
oc get pods -A | grep -Ev ' Running | Completed '
# Set channel
oc patch clusterversion version --type merge -p '{"spec": {"channel": "stable-4.10"}}'
oc adm upgrade --to=4.10.47
oc adm upgrade --include-not-recommended
oc adm upgrade --allow-not-recommended --to=4.10.0 --include-not-recommended
oc get clusterversion -o json|jq ".items[0].spec"
# View openshift version history.
oc get clusterversion -o json | jq -r '.items[0].status.history[] | [.version, .startedTime, .completionTime] | join(" ")'
# View progress of update.
watch -n1 oc whoami --show-console \; oc adm upgrade
watch -cn1 "oc get clusteroperators | grep --color=always -E \"$(oc get clusterversions.config.openshift.io version -o json | jq -r .status.desired.version)|\""
# Upgrade all operators: list pending manual install plans, then approve them.
oc get installplan -A | grep Manual | grep false
oc patch installplan $INSTALLPLAN -n $NAMESPACE --type merge --patch '{"spec":{"approved":true}}'
upgrade okd
Get upgrade path.
Find the latest version here: https://github.com/okd-project/okd/releases
(cd /usr/local/bin/ ; sudo curl -s -O https://gist.githubusercontent.com/Goose29/ca7debd6aec7d1a4959faa2d1b661d93/raw/4584d89c49d4af197480539bdd873f6d9ca2dd83/upgrade-path.py ; sudo chmod 755 upgrade-path.py ) && (curl -sH 'Accept:application/json' 'https://amd64.origin.releases.ci.openshift.org/graph?channel=stable-4' | upgrade-path.py 4.13.0-0.okd-2023-07-23-051208 4.14.0-0.okd-2024-01-26-175629 )
To view the status of the update process, run the following. The commands are harmless and show the ongoing process and any blockers.
oc adm upgrade
watch -cn1 "oc whoami --show-console ; echo ; oc get clusteroperators | grep --color=always -E \"$(oc get clusterversions.config.openshift.io version -o json|jq -r '.spec.desiredUpdate.version')|\""
For a slightly different view, check the VERSION column. When the update is done, all cluster operators report the same version number.
oc get clusteroperators
Make a report of the cluster status before installing the update, to rule out issues that you have not caused.
See "status of kubernetes" below.
Look for APIs in use that are flagged for removal.
oc get apirequestcounts
Upgrade OKD until there are no more updates or you have reached the wanted version.
oc adm upgrade --to-latest=true --allow-explicit-upgrade
If it complains about the certificate (ReleaseAccepted=False):
oc patch --type='merge' --patch='{"spec":{"desiredUpdate":{"force":true}}}' clusterversion version
If the client wants a specific version, pin that version.
oc adm upgrade --to=<version from oc adm upgrade> --allow-explicit-upgrade
If oc adm upgrade reports Upgradeable=False with Reason: AdminAckRequired, follow the instructions from the link in the message. The command will be something like below.
oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-<version>-kube-<version>-api-removals-in-<version>":"true"}}' --type=merge
Get pods that are not fully ready (READY count lower than total).
oc get pods -A --no-headers | grep -v Completed | while read LINE ; do PODS=$(awk '{print $3}' <<< "${LINE}") ; if [ "${PODS%%/*}" != "${PODS##*/}" ] ; then echo "${LINE}" ; fi ; done
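The same check written as a reusable filter, as a sketch: not_fully_ready is a hypothetical helper that reads oc get pods -A --no-headers output on stdin, assumes READY is the 3rd column and STATUS the 4th, and prints pods where the ready count differs from the total.

```shell
#!/usr/bin/env bash
# Read "oc get pods -A --no-headers" lines on stdin and print the ones whose
# READY column (3rd field, e.g. "1/2") shows fewer ready containers than total.
# Pods with STATUS (4th field) "Completed" are skipped.
not_fully_ready() {
  awk '$4 != "Completed" { split($3, r, "/"); if (r[1] != r[2]) print }'
}
```

Usage: oc get pods -A --no-headers | not_fully_ready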
Get critical alerts.
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")|select(.labels.severity=="critical")'
Get warning alerts.
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")|select(.labels.severity=="warning")'
upgrade odf
# View existing config.
oc get subscriptions -n openshift-storage odf-operator -o yaml
# Patch subscription
oc patch subscriptions -n openshift-storage odf-operator --type merge -p '{"spec": {"channel": "<channel>"}}'
# Get install plans
oc get installplan -n openshift-storage -o wide
# Approve install plan.
oc get installplans.operators.coreos.com -A --no-headers | awk '$5 ~ /false/' | awk '$4 ~ /Manual/' | while read NAMESPACE INSTALLPLAN END ; do echo '*' $NAMESPACE $INSTALLPLAN ; oc patch installplan $INSTALLPLAN -n $NAMESPACE --type merge --patch '{"spec":{"approved":true}}' ; done
odf troubleshooting
# ceph problem. Run commands from rook-ceph-operator
oc rsh -n openshift-storage $(oc get pods -n openshift-storage -o name -l app=rook-ceph-operator)
export CEPH_ARGS='-c /var/lib/rook/openshift-storage/openshift-storage.config'
ceph -s
ceph osd pool ls
ceph osd pool autoscale-status
ceph config dump
# disable autoscaling
ceph osd pool ls | while read i ; do echo '*' $i ; ceph osd pool set $i pg_autoscale_mode off ; done
# Look to see how much data is being used for pg:s.
# Number of PGLog Entries, size of PGLog data in megabytes, and Average size of each PGLog item
for i in 0 1 2 ; do echo '*' $i ; osdid=$i ; ceph tell osd.$osdid dump_mempools | jq -r '.mempool.by_pool.osd_pglog | [ .items, .bytes /1024/1024, .bytes / .items ] | @csv' ;done
ceph df
cronjobs
oc get cj
oc get cronjobs -o wide -A
Run cronjob manually
export CRONJOB=ldap-sync ; oc create job --from=cronjob/${CRONJOB} ${CRONJOB}-manual-$(date '+%Y-%m-%d-%H-%M-%S')
Disable cronjob (set .spec.suspend: true)
oc patch cronjobs.batch write-to-nfs --type merge -p '{"spec": {"suspend": true}}'
Enable cronjob
oc patch cronjobs.batch write-to-nfs --type merge -p '{"spec": {"suspend": false}}'
delete po (stop, kill)
stop pod
oc delete po --all --force
oc delete pod openshift-gitops-server --namespace openshift-gitops
oc delete pods -n openshift-oauth-apiserver --all
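# The next three commands only print the oc delete commands (dry run); remove "echo" to actually delete.
oc get po -A | grep -v ^NAME | awk '$4 !~ /Running/' | sort -k4 | while read NAMESPACE POD READY STATUS END ; do echo '****' $POD $STATUS ; echo oc delete po $POD -n $NAMESPACE --force --grace-period=0 ; done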
oc get pods -A --field-selector=status.phase!=Running --no-headers | while read NAME_SPACE POD REST_OF_LINE ; do echo oc delete pod $POD -n "${NAME_SPACE}" --force --grace-period=0 ; done
(oc get pods --field-selector="status.phase=Pending" --no-headers -A ; oc get pods --field-selector="status.phase=Failed" --no-headers -A) | while read NAME_SPACE POD REST_OF_LINE ; do echo oc delete pod $POD -n "${NAME_SPACE}" --force --grace-period=0 ; done
# Delete pods and generate report on what has been removed.
LOG=/tmp/oc_delete_pod_$(oc config current-context | awk -F '/|:' '{print $2}').$(date '+%Y-%m-%d_%H-%M-%S').log ; (oc get pods --field-selector="status.phase=Pending" --no-headers -A ; oc get pods --field-selector="status.phase=Failed" --no-headers -A) | while read NAME_SPACE POD REST_OF_LINE ; do oc delete pod $POD -n "${NAME_SPACE}" --force --grace-period=0 ; done | tee $LOG ; awk -F\" '{print $2}' $LOG | sed 's/-[a-z0-9]*$//g'| sed 's/-[a-z0-9]*$//g' | sort | uniq -c | sort -n | tail -20
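The two sed calls in the report strip the last two -suffix parts of each pod name, so pods from the same Deployment are counted together. The same idea with bash parameter expansion, as a sketch (pod_base_name is a hypothetical helper; StatefulSet pods such as prometheus-k8s-0 lose one segment too many, exactly as with the sed version):

```shell
#!/usr/bin/env bash
# Drop the two trailing "-<suffix>" parts of a pod name, like the two
# sed 's/-[a-z0-9]*$//' calls above, e.g.
# router-default-5d8f7c9b4-abcde -> router-default
pod_base_name() {
  local name=$1
  name=${name%-*}   # drop the random pod suffix
  name=${name%-*}   # drop the ReplicaSet hash
  echo "$name"
}
```

Usage: awk -F\" '{print $2}' $LOG | while read -r p ; do pod_base_name "$p" ; done | sort | uniq -c | sort -n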
use other namespace
oc rsh --namespace namespace-name pod-name
oc rsh --namespace namespace-name-operator pod-name bash -c 'echo $PATH $HOSTNAME'
list namespaces
oc get namespace
use namespace
oc rsh --namespace openshift-gitops openshift-gitops-application-controller-0
kubectl get netnamespace
NetNamespace is the OpenShift SDN resource that holds the network settings of a project. Its egress IP setting can be used to define the outgoing address, which can also cause other issues.
oc get netnamespace openshift-gitops -oyaml
oc get routes
oc get routes --namespace openshift-gitops
oc get oauth
Describe authentication methods.
oc get oauth cluster -o yaml
decode token (JWT, base64url encoded)
https://jwt.io/
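As an offline alternative to jwt.io, the payload of a JWT (the second dot-separated part, base64url encoded without padding) can be decoded locally. A sketch assuming GNU coreutils base64; jwt_payload is a hypothetical helper name. Service account tokens (for example from oc create token) are JWTs, while the sha256~ OAuth tokens returned by oc whoami -t are opaque and cannot be decoded this way.

```shell
#!/usr/bin/env bash
# Print the decoded payload (middle part) of a JWT.
# base64url uses '-' and '_' instead of '+' and '/', and drops '=' padding,
# so translate the alphabet and re-pad before calling base64 -d.
jwt_payload() {
  local p
  p=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  while (( ${#p} % 4 != 0 )); do
    p+='='
  done
  printf '%s' "$p" | base64 -d
}
```

Usage: jwt_payload "$(oc create token prometheus-k8s -n openshift-monitoring)" | jq .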
view secrets
oc get secret ca-key-pair -o go-template='{{range $k,$v := .data}}{{"### "}}{{$k}}{{"\n"}}{{$v|base64decode}}{{"\n\n"}}{{end}}'
delete cluster
openshift-install destroy cluster
storageclasses(sc)
oc get storageclasses
get storageclasses defined as default
oc get sc -o json | jq -r '.items[]|select(."metadata".annotations."storageclass.kubernetes.io/is-default-class"=="true")|.metadata.name'
set default storageclass
# Set all sc to default false.
oc get sc -o json | jq -r '.items[]|select(."metadata".annotations."storageclass.kubernetes.io/is-default-class"=="true")|.metadata.name' | while read i ; do echo '*' $i ; oc patch storageclass $i -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'; done
# Set default storageclass.
oc patch storageclass ocs-storagecluster-cephfs -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
get service accounts
oc get serviceaccounts -A
oc get sa -A
which permissions do I have
oc auth can-i --as=fjuza --list
oc get groups -o wide
oc auth can-i --as-group=<group> --list
alerts
How is alertmanager configured
oc get secret -n openshift-monitoring alertmanager-main -o json | jq -r '.data."alertmanager.yaml"|@base64d'
Save alertmanager config
oc get secret alertmanager-main -n openshift-monitoring --template='{{index .data "alertmanager.yaml" | base64decode}}' > /tmp/oc_get_secret_alertmanager-main.alertmanager.yaml.$(oc whoami --show-console=true | awk -F / '{print $3}').$(date '+%Y-%m-%d_%H-%M-%S')
oc extract secret/alertmanager-main --confirm -n openshift-monitoring
Restore alertmanager config
oc set data secret alertmanager-main -n openshift-monitoring --from-file=alertmanager.yaml=<file_alertmanager.yaml>
alertmanager
View Alertmanager configured alerts.
oc get prometheusrules -A -o yaml | grep alert: | sort
View configuration of alert
oc get prometheusrules -A -o json | jq '.items[].spec.groups[].rules[]| select(.alert=="AlertmanagerReceiversNotConfigured")'
view alerts.
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq . | less -ISRM
View specific alert.
oc rsh -n openshift-monitoring -c prometheus prometheus-k8s-0 -- curl 'http://localhost:9090/api/v1/query?query=absent%28up%7Bjob%3D"fluentd"%7D+%3D%3D+1%29' | jq .
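The query above is percent-encoded by hand. A sketch of a small bash encoder (urlencode is a hypothetical helper; ASCII input only) that produces the query parameter from plain PromQL:

```shell
#!/usr/bin/env bash
# Percent-encode a string for use in a URL query parameter.
# RFC 3986 unreserved characters (A-Z a-z 0-9 . ~ _ -) are kept as-is,
# everything else becomes %XX. Handles ASCII input only.
urlencode() {
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}
```

Usage: oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query?query=$(urlencode 'absent(up{job="fluentd"} == 1)')" | jq .
Alternatively, curl -G together with --data-urlencode 'query=...' lets curl do the encoding.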
View alerts in state firing
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")' | less -ISRM
View alerts in state firing with severity warning
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/alerts" | jq '.data.alerts[]|select(.state=="firing")|select(.labels.severity=="warning")' | less -ISRM
View historical alerts.
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=2022-08-08T00:00:00.781Z&end=2022-08-09T00:00:00.781Z&step=1m"
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(date '+%Y-%m-%d' --date '-2 days')T00:00:00.781Z&end=$(date '+%Y-%m-%dT%H:%M:%S').781Z&step=1m" | jq . | less -ISRM
Get warning alerts that fired during the last 6 days.
echo '***' $(oc whoami --show-console) ; oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S.000Z' --date '-6 days')&end=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S').000Z&step=1m" | jq -r '.data.result[].metric | {alertname, severity, alertstate}| select(.severity=="warning")|select(.alertstate=="firing") | .alertname'
Get more info about fired alerts.
echo '***' $(oc whoami --show-console) ; oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S.000Z' --date '-6 days')&end=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S').000Z&step=1m" | jq -r '.data.result[].metric | {alertname, severity, alertstate, pod, namespace}| select(.severity=="warning")|select(.alertstate=="firing")'
Get alerts from the last 6 days, with Unix timestamps converted to UTC times (shows when each alert fired).
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=ALERTS&start=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S.000Z' --date '-6 days')&end=$(TZ=UTC date '+%Y-%m-%dT%H:%M:%S').000Z&step=1m" | jq -r . | python3 -c "import sys, re, datetime; print(re.sub(r'\b\d{10}\b', lambda x: datetime.datetime.utcfromtimestamp(int(x.group())).isoformat() + 'Z', sys.stdin.read()))" | less -ISRM
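If python3 is not available, GNU date can do the same conversion for a single timestamp. A sketch (epoch_to_iso is a hypothetical helper; assumes GNU date, which accepts @<seconds> with -d):

```shell
#!/usr/bin/env bash
# Convert a Unix timestamp in seconds to an ISO 8601 UTC string,
# the same format the python3 one-liner above produces.
epoch_to_iso() {
  date -u -d "@$1" '+%Y-%m-%dT%H:%M:%SZ'
}
```

Usage: epoch_to_iso 1700000000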
disable alertmanager alert (silence it)
oc -n openshift-monitoring exec -ti alertmanager-main-0 -c alertmanager -- amtool silence add --alertmanager.url http://localhost:9093 alertname=AlertmanagerReceiversNotConfigured --end="2053-11-07T00:00:00-00:00" --comment "silence alertmanager"
list silenced alerts
oc -n openshift-monitoring exec -ti alertmanager-main-0 -c alertmanager -- amtool silence query --alertmanager.url http://localhost:9093
Config as parsed by alertmanager
oc -n openshift-monitoring exec -ti alertmanager-main-0 -c alertmanager -- curl http://localhost:9093/api/v2/status | jq -r .config.original
Version of alertmanager
oc -n openshift-monitoring exec -ti alertmanager-main-0 -c alertmanager -- curl http://localhost:9093/api/v2/status | jq -r .versionInfo.version | strings
Silence alertmanager not configured alert
oc set data secret alertmanager-main -n openshift-monitoring --from-file=alertmanager.yaml=<(cat <<'EOF'
"global":
  "resolve_timeout": "5m"
"inhibit_rules":
- "equal":
  - "namespace"
  - "alertname"
  "source_match":
    "severity": "critical"
  "target_match_re":
    "severity": "warning|info"
- "equal":
  - "namespace"
  - "alertname"
  "source_match":
    "severity": "warning"
  "target_match_re":
    "severity": "info"
"receivers":
- "name": "Default"
- "name": "Watchdog"
- "name": "Critical"
- "name": "testrec" # Dummy receiver with webhook config
  "webhook_configs":
  - "url": "http://xxxxdumyxxx.com"
"route":
  "group_by":
  - "namespace"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "Default"
  "repeat_interval": "12h"
  "routes":
  - "match":
      "alertname": "dummyalert" # Dummy alert being routed to dummy receiver
    "receiver": "testrec"
EOF
)
prometheus
Url to web interface.
https://prometheus-k8s-openshift-monitoring.apps.<url>
echo https://prometheus-k8s-openshift-monitoring.$(oc whoami --show-console | awk -F 'console-openshift-console.' '{print $2}')
echo https://$(oc get route -n openshift-monitoring prometheus-k8s -o jsonpath="{.spec.host}")
Get disk usage from odf
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query?query=odf_system_raw_capacity_used_bytes" | jq -r .
Get disk usage from odf over time.(metrics)
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=odf_system_raw_capacity_used_bytes&start=$(date '+%Y-%m-%d' --date '-20 days')T00:00:00.781Z&end=$(date '+%Y-%m-%dT%H:%M:%S').781Z&step=1h" | jq . | less -ISRM
Search tips
https://prometheus.io/docs/prometheus/latest/querying/basics/
Disk usage per project. Taken from RH ticket.
oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- curl -s -g 'http://localhost:9090/api/v1/query?' --data-urlencode 'query=(sort_desc(topk(25,(sum(kubelet_volume_stats_used_bytes * on (namespace,persistentvolumeclaim) group_left(storageclass, provisioner) (kube_persistentvolumeclaim_info * on (storageclass) group_left(provisioner) kube_storageclass_info {provisioner=~"(.*cephfs.csi.ceph.com)"})) by (namespace)))))'
openshift-user-workload-monitoring
"annotations": {
"description": "Prometheus operator in openshift-user-workload-monitoring namespace rejected 2 prometheus/ServiceMonitor resources.",
"summary": "Resources rejected by Prometheus operator"
},...
# Look at what is causing it.
oc logs -l app.kubernetes.io/name=prometheus-operator -n openshift-user-workload-monitoring
# After tweaking the monitoring settings, delete the pod and view the log.
oc delete pod -l app.kubernetes.io/name=prometheus-operator -n openshift-user-workload-monitoring
oc logs -l app.kubernetes.io/name=prometheus-operator -n openshift-user-workload-monitoring | less
# Stop monitoring.
oc label namespace openshift-local-storage openshift.io/cluster-monitoring-
oc label namespace openshift-local-storage openshift.io/user-monitoring=false
# Allow monitoring.
oc label namespace openshift-operators openshift.io/cluster-monitoring=true
Talk to the API with a Bearer token.
HOST=$(oc -n openshift-monitoring get route alertmanager-main -ojsonpath={.spec.host})
TOKEN=$(oc whoami -t)
curl -skH "Authorization: Bearer $TOKEN" "https://$HOST/api/v2/alerts" | jq .
token
token=`oc sa get-token prometheus-k8s -n openshift-monitoring` ## --- In OCP client 4.10 or lower ---
OR
token=`oc create token prometheus-k8s -n openshift-monitoring` ## --- In OCP client 4.11 or higher ---
curl using token
curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main-openshift-monitoring.apps.domain/api/v1/alerts' | jq '.data[].labels'
ServiceMonitor
Prometheus Operator:
When using the Prometheus Operator, custom resources like ServiceMonitor and PodMonitor tell Prometheus how to scrape metrics from services or pods (port, path, interval and relabeling rules under spec.endpoints or spec.podMetricsEndpoints).
bash completion
. <(oc completion bash)
machineconfig
view settings
oc describe machineconfigpool
set ntp servers
echo 'variant: openshift
version: 4.9.0
metadata:
  name: 99-master-chrony
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
  - path: /etc/chrony.conf
    mode: 0644
    overwrite: true
    contents:
      inline: |
        server ntp.lio.se iburst
        driftfile /var/lib/chrony/drift
        makestep 1.0 3
        rtcsync
        logdir /var/log/chrony' | butane | oc apply -f -
get machineconfig value
oc get mc 00-master -o json | jq -r '.spec.config.storage.files[]|select(.path=="/var/lib/kubelet/config.json")|.contents.source' | perl -pe 's/%([0-9a-f]{2})/sprintf("%s", pack("H2",$1))/eig' | sed 's/^data:,//g' | jq .
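The perl and sed steps above decode the percent-encoded data:, URL stored in the MachineConfig. A pure bash sketch of the same decoding (decode_data_url is a hypothetical helper; it only handles the plain data:, form, not ;base64, sources, and literal backslashes in the input would be interpreted by printf):

```shell
#!/usr/bin/env bash
# Decode an Ignition "data:," URL (percent-encoded contents) to plain text:
# strip the prefix, turn every %XX into \xXX and let printf %b expand it.
decode_data_url() {
  local s=${1#data:,}
  printf '%b\n' "${s//%/\\x}"
}
```

Usage: decode_data_url "$(oc get mc 00-master -o json | jq -r '.spec.config.storage.files[]|select(.path=="/var/lib/kubelet/config.json")|.contents.source')" | jq .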
List machineconfigs by creation time
oc get mc --sort-by=.metadata.creationTimestamp
get users
oc get users
Get the kubeadmin password hash (bcrypt)
oc get secret kubeadmin -n kube-system -o jsonpath='{.data.kubeadmin}' | base64 -d
Give kubeadmin a new password
generate password hash
htpasswd -bnBC 10 "" '<password>' | tr -d ':\n' | base64 -w0
patch password hash
oc patch secret/kubeadmin -n kube-system -p '{"data": {"kubeadmin": "<base64 output from the htpasswd command>"}}'
work with oc without login
export KUBECONFIG=/var/lib/kubelet/kubeconfig
if on bootstrap node.
export KUBECONFIG=/etc/kubernetes/kubeconfig
Add the following to the kubeconfig if the SSL/TLS certificate is not trusted.
- cluster:
    insecure-skip-tls-verify: true
    server: https://127.0.0.1:443
  name: my-cluster
run oc when on node
oc get pod -n openshift-monitoring --kubeconfig=/var/lib/kubelet/kubeconfig
etcdctl
oc rsh -c etcdctl -n openshift-etcd $(oc get pod -l app=etcd -oname -n openshift-etcd | awk -F"/" 'NR==1{ print $2 }')
[root@ocp-03-lm8km-master-1 /]# etcdctl --write-out=table endpoint status
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://172.19.14.36:2379 | c4f7b42b92713818 | 3.5.0 | 105 MB | false | false | 6 | 2632074 | 2632074 | |
| https://172.19.14.37:2379 | 5dea668b432969fc | 3.5.0 | 105 MB | false | false | 6 | 2632074 | 2632074 | |
| https://172.19.14.41:2379 | 51cecd971b657ee5 | 3.5.0 | 105 MB | true | false | 6 | 2632074 | 2632074 | |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Verifying etcd when OpenShift is not fully working
# List etcd containers
crictl ps --label io.kubernetes.pod.namespace=openshift-etcd
# Execute command to verify health of etcd using crictl
crictl exec -it <container-id> etcdctl endpoint status --write-out=table
create troubleshooting/debug/test pod
oc run abjorklund-redhat-ubi8 --image=redhat/ubi8 -i --tty -- sh
oc run abjorklund-curlimage-curl --image=curlimages/curl -i --tty -- sh
oc run -it busybox --image=busybox --restart=Never -- ash
oc run abjorklund-rocky-rocky --image=rockylinux/rockylinux -i --tty -- bash
oc run ${USER}-rocky-rocky --image=rockylinux/rockylinux -i --tty -- bash # dnf -y install procps-ng iproute
oc run ${USER}-rocky-rocky --image=rockylinux/rockylinux --restart=Never --command sleep infinity
install troubleshooting packages in the pod (lsof, ps, dig)
yum install -y lsof procps-ng bind-utils
proxy settings
oc get proxy cluster -o yaml
Change ca
oc patch proxy/cluster --type=merge --patch='{"spec":{"trustedCA":{"name":"custom-ca"}}}'
oc proxy
Run a proxy to the Kubernetes API server
port forward to pod
oc port-forward <my-pod-name> <local-port>:<remote-port>
alertmanager
kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093 # http://localhost:9093/
grafana access.
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80 # http://localhost:3000 admin prom-operator
prometheus access.
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090 # http://localhost:9090
proxy via pod
# Create tunnel pod. Please remove when done.
oc apply -f - << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: proxy-pod
  namespace: default
spec:
  containers:
  - name: socat-proxy
    image: alpine/socat:latest
    command: ["/bin/sh", "-c"]
    args:
    - |
      apk add --no-cache socat && # Install socat if not in image
      socat TCP-LISTEN:443,fork,reuseaddr TCP:scm.devops.cambio.se:443
    ports:
    - containerPort: 443
  restartPolicy: Never
EOF
# Tunnel on localhost reaching destination.
sudo oc --kubeconfig=$KUBECONFIG port-forward pod/proxy-pod -n default 443:443
# Add hosts entry to use browser as "usual".
grep scm.devops.cambio.se /etc/hosts
127.0.0.1 localhost scm.devops.cambio.se
Install additional ca certificate
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-company-ca-cert
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURrVENDQW5tZ0F3SUJBZ0lFSC93Skh6QU5CZ2txaGtpRzl3MEJBUXNGQURBM01SVXdFd1lEVlFRS0RBeFMKUlVSQ1VrbEVSMFV1VTBVeEhqQWNCZ05WQkFNTUZVTmxjblJwWm1sallYUmxJRUYxZEdodmNtbDBlVEFlRncweQpNVEF5TWpNd056RTVOVFphRncwME1UQXlNak13TnpFNU5UWmFNRGN4RlRBVEJnTlZCQW9NREZKRlJFSlNTVVJIClJTNVRSVEVlTUJ3R0ExVUVBd3dWUTJWeWRHbG1hV05oZEdVZ1FYVjBhRzl5YVhSNU1JSUJJakFOQmdrcWhraUcKOXcwQkFRRUZBQU9DQVE4QU1JSUJDZ0tDQVFFQW5mY1F3YURwcEdzNWJxaUc5ajE5aFJVaG1sMzhjb2JGT2tzRQpsZFo3Y3RkV1d6VHJqSTFCRGxZSEd5SXBYMEo4ZU1PaDhvbUZqbVR6VTEzTkpWSnJrWm5RaDRhTzA1UGtKRlJRCkg1ZVA2N3R0S2pEb0txOFZVWXRZUldxRlFaalNxY2lQMzJobXZSNG42QVZDWDdCaUVBZjd2Y05ZVys0a1k5OUsKbTluV1BNbEpGU056M1puRnlWc1BtR1ZWeVN2RmFVL0dBTmt1Z25uSGdUM1VUUTNsc2NidU5keUpBcVEya3dHSwpKbkdZKzBSajVrUWpvdXptUjBDZ3pJN0hWSmhwK2Z6R1lyenRYQXA1Zkt0Z3ZTZFRtTndVVXZJR3pLTmU4WklGCmY0WVVUUDFPdU9jUmNIRDJQclVodDgzWlRLYzNwOUhLYk5CazIzWFFtYU85QVBqeEl3SURBUUFCbzRHa01JR2gKTUI4R0ExVWRJd1FZTUJhQUZMbWFrNHdDamtuakZvWkd6M1daRGErY2N4RGxNQjBHQTFVZERnUVdCQlM1bXBPTQpBbzVKNHhhR1JzOTFtUTJ2bkhNUTVUQVBCZ05WSFJNQkFmOEVCVEFEQVFIL01BNEdBMVVkRHdFQi93UUVBd0lCCnhqQStCZ2dyQmdFRkJRY0JBUVF5TURBd0xnWUlLd1lCQlFVSE1BR0dJbWgwZEhBNkx5OXBjR0V0WTJFdWNtVmsKWW5KcFpHZGxMbk5sTDJOaEwyOWpjM0F3RFFZSktvWklodmNOQVFFTEJRQURnZ0VCQURabURvUytJY1ZMcERBRwpiSXM0SWRJKzcxY0xINk90NjNkYWhBT25QRDJnMUhvVUFIZFdUcGdobER3TkFQWjg3UXQybFc4Q1B4eDhCQVZOCnlrZWlEN2paeVA5dmVCcDRxNjBiSTVYSENndWV5U2lGdjBBKzloKzMzekMrYy9WbStJVHJNTkZ0dlZMNE1kRWQKaVE4UVBhaFJEWW1qVkJVb1VIZWErMDdkWEY3TzQxY2t2YzZRb0lad2F5Y1Zhc0gvd05lVGNrdzl1TlNiajNTQwoyNHdpOUthQnpxdDZsWlF3TG5uUjVnNjNWUDZNZUprR2FXMTBxdExiQVM4NGZwQ1NWTUx3U051MGZqeFU2d2lPCkRjaWlKKzNZOG5ldjM5NGJHRkwxcG5ZVmM4YmpoL0xaaHM1dTRQUnhlNFBLRER2Y09NZUhpUkN1M1YySWRRTTgKbDl3enBQZz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQoK
        mode: 0644
        overwrite: true
        path: /etc/pki/ca-trust/source/anchors/company-ca.crt
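The source value above is the CA certificate base64 encoded into a data URL (LS0tLS1CRUdJTi... decodes to -----BEGIN). A sketch for generating it from a PEM file, assuming GNU coreutils base64 (to_data_url is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Build the Ignition "source" value (base64 data URL) for a file, e.g. a CA
# certificate to be placed under /etc/pki/ca-trust/source/anchors/.
# base64 -w0 (GNU coreutils) disables line wrapping so the value stays on one line.
to_data_url() {
  printf 'data:text/plain;charset=utf-8;base64,%s\n' "$(base64 -w0 "$1")"
}
```

Usage: to_data_url company-ca.crt, then paste the output as the source value.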
get raw api data
oc get --raw "/api/v1/nodes/[node]/proxy/stats/summary"
Via proxy.
oc proxy &
# Starting to serve on 127.0.0.1:8001
curl -s http://localhost:8001/api/v1/nodes/crc-lgph7-master-0/proxy/stats/summary
curl -s http://localhost:8001/api/v1/nodes/crc-lgph7-master-0/proxy/metrics/resource
explain
Get documentation for a resource and the attributes (fields) available on it.
oc explain deployment
events
Get events.
oc get events -A --sort-by=.metadata.creationTimestamp
jsonpath
Get names of MachineConfigs one value per line.
oc get mc -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' --no-headers
ImageStreamTag
ImageStreamTag represents an Image that is retrieved by tag name from an ImageStream.
imagestream
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: myapp
Tagging Images: When you tag an image, it is added to the ImageStream with a specified tag.
oc tag myregistry/myapp:latest myapp:latest
Using ImageStreams in Deployment Configurations: Deployment configurations can reference ImageStreams instead of direct image URLs.
apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
      - name: myapp
        image: image-registry.openshift-image-registry.svc:5000/myproject/myapp:latest
BuildConfig
Build configurations define a build process for new container images.
download okd openshift-install
# Show latest.
curl -skL https://github.com/okd-project/okd/releases | elinks --dump | sed 's/^ *//g' | grep " Latest"
# Download and install in /usr/local/bin. Keep old versions.
export OKD_VERSION=4.15.0-0.okd-2024-03-10-010116 ; (cd /temp/ ; oc adm release extract --tools quay.io/openshift/okd:${OKD_VERSION} ; cd /usr/local/bin/ ; sudo tar xf /temp/openshift-install-linux-${OKD_VERSION}.tar.gz openshift-install ; sudo mv openshift-install openshift-install.${OKD_VERSION})
setup openshift cluster
Download binary
cd /tmp/ ; curl -L -O https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.10.47/openshift-install-linux.tar.gz && sudo tar xf openshift-install-linux.tar.gz -C /usr/local/bin/
Add vmware certs if using that backend.
(cd /tmp/ ; curl -sk https://${vspherer_server}/certs/download.zip -O) ; cd /etc/pki/ca-trust/source/anchors ; sudo unzip -oj /tmp/download.zip certs/lin/\* ; sudo update-ca-trust
Create config file
install-config.yaml
Then fire off install
openshift-install create cluster
Another example
ln -s install-config.yaml.2023-03-23 install-config.yaml
./openshift-install-4.12.0-0.okd-2023-04-16-041331 create cluster
Edit install config after setup
Save config
oc get cm cluster-config-v1 -n kube-system --template='{{index .data "install-config" }}' > /tmp/cm_cluster-config-v1_-n_kube-system.$(oc whoami --show-console=true | awk -F / '{print $3}').$(date '+%Y-%m-%d_%H-%M-%S')
Edit downloaded file and apply edited file.
oc set data cm cluster-config-v1 -n kube-system --from-file=install-config=/tmp/cm_cluster-config-v1_-n_kube-system.<suitable_name>
look at install settings
oc get -n kube-system cm/cluster-config-v1 -o yaml
argocd login
argocd login openshift-gitops-server-openshift-gitops.apps.costest.ltkronoberg.se --username kubeadmin --password asdfasfasdfas --sso --insecure
argocd login $(oc get routes -n openshift-gitops openshift-gitops-server -o json | jq -r .spec.host) --username $USER --password $COMPANY_PASSWORD --sso --insecure
git sync heal
argocd app list | grep -v NAME | awk '{print $1}' | while read i ; do echo '*' $i ; argocd app set $i --self-heal ; done
metrics
Get available values
Thanos monitoring points
curl -sk -H "Authorization: Bearer $(oc whoami -t)" https://$(oc get routes -n openshift-monitoring thanos-querier -o jsonpath='{.status.ingress[0].host}')/api/v1/metadata | jq .
node-exporter
oc --request-timeout=3 -n openshift-monitoring exec -c node-exporter $(oc get pod -n openshift-monitoring -l app.kubernetes.io/name=node-exporter -o=custom-columns='NAME:.metadata.name' --no-headers | head -1) -- curl -s 'http://localhost:9100/metrics' | grep -vE "^#|^$"
Cpu usage per node.
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[30m])) * 100)
instance:node_cpu_utilisation:rate1m{job="node-exporter", cluster=""} != 0
instance:node_cpu_utilisation:rate1m{job="node-exporter"} != 0
cpu usage per pod on node
sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster="", node=~"<node>"}) by (pod)
iowait
avg by (instance) (irate(node_cpu_seconds_total{mode="iowait"}[30m]))
namespace
cpu usage per namespace.
sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster=""}) by (namespace)
load
Load 1 graph
instance:node_load1_per_cpu:ratio{job="node-exporter", cluster=""} != 0
usage for pvc
kubelet_volume_stats_used_bytes
kubelet_volume_stats_available_bytes
kubelet_volume_stats_used_bytes{persistentvolumeclaim="prometheus-prometheus-k8s-1"}
Memory usage
Memory usage of node.
instance:node_memory_utilisation:ratio
node_memory_MemAvailable_bytes
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))
Memory usage per pod on a node
sum(container_memory_usage_bytes{node="<node_name>"}) by (pod, namespace)
OOMKilled
sum by (namespace, pod) (kube_pod_container_status_restarts_total) * on(namespace, pod) group_left(reason) kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/query_range?query=sum%20by%20(namespace,%20pod)%20(kube_pod_container_status_restarts_total)%20*%20on(namespace,%20pod)%20group_left(reason)%20kube_pod_container_status_last_terminated_reason%7Breason%3D%22OOMKilled%22%7D&start=$(date '+%Y-%m-%d' --date '-20 days')T00:00:00.781Z&end=$(date '+%Y-%m-%dT%H:%M:%S').781Z&step=1h" | jq .
uptime
oc exec -n openshift-monitoring -c prometheus prometheus-k8s-0 -- curl -s 'http://localhost:9090/api/v1/query?query=time%28%29%20-%20node_boot_time_seconds%7Bjob%3D%22node-exporter%22%7D%0A' | jq -r '.data.result[]|.metric.instance +"\t"+ (.value[1] | tonumber | floor | tostring)' | column -t -s $'\t'
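The query returns uptime in seconds. A sketch for making it human readable (fmt_uptime is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Format a number of seconds as "<days>d <hours>h <minutes>m <seconds>s".
fmt_uptime() {
  local t=$1
  printf '%dd %dh %dm %ds\n' \
    $(( t / 86400 )) $(( t % 86400 / 3600 )) $(( t % 3600 / 60 )) $(( t % 60 ))
}
```

Usage: pipe the jq output (before column) into while IFS=$'\t' read -r node secs ; do printf '%s\t%s\n' "$node" "$(fmt_uptime "$secs")" ; done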
disk usage
(1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100
disk inode usage
(1 - (node_filesystem_files_free{mountpoint="/"} / node_filesystem_files{mountpoint="/"})) * 100
steal
sum by (instance) (rate(node_cpu_seconds_total{mode="steal"}[1m])) * 100
request memory
sum by (node) ( kube_pod_container_resource_requests{resource="memory"} * on (namespace, pod) group_left kube_pod_status_phase{phase="Running"} ) / 1024 / 1024
install oc and kubectl
curl -fsSL https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/latest/openshift-client-linux.tar.gz | (cd /usr/local/bin/ ; sudo tar zxf - oc kubectl )
time and timezone in first pod(date)
oc get pods --no-headers -o 'custom-columns=:.metadata.namespace,:.metadata.name' -A | grep -v cert-manager | head -1 | while read NAMESPACE POD ; do oc rsh -n $NAMESPACE $POD bash -c 'date "+%Y-%m-%d %H:%M:%S %Z"' 2>/dev/null ; done
oc get installplan
InstallPlan defines the installation of a set of operators.
oc get installplan install-bk8hw -n openshift-operators -o yaml
Approve all manual updates.
oc get installplans.operators.coreos.com -A --no-headers | awk '$5 ~ /false/' | awk '$4 ~ /Manual/' | while read NAMESPACE INSTALLPLAN END ; do echo '*' $NAMESPACE $INSTALLPLAN ; oc patch installplan $INSTALLPLAN -n $NAMESPACE --type merge --patch '{"spec":{"approved":true}}' ; done
Get selected info from all installplans
oc get installplans.operators.coreos.com -A --no-headers -o=custom-columns='DATE:.metadata.creationTimestamp,NAME:.metadata.name,PHASE:.status.phase,CSV:.spec.clusterServiceVersionNames,NAMESPACE:.metadata.namespace' --sort-by=.metadata.creationTimestamp
oc extract
Extract secrets or config maps to disk
# Extract only the key "nginx.conf" from config map "nginx" to the /tmp directory
oc extract configmap/nginx --to=/tmp --keys=nginx.conf
dependencies,owner
Search the output of
oc describe ...
for a line like this:
Controlled By: ReplicaSet/rook-ceph-osd-0-6dcdc7fb48
metadata.ownerReferences
Defines the object that owns this object.
nodeAffinity
Pin a pod to nodes with a given label (label the node first: kubectl label nodes <your-node-name> disktype=ssd)
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
Add user to group
oc adm groups add-users openshift-admins rb_janitor
api-int
api-int.<fqdn>
for i in api-int:6443 api:6443 test.apps:443 ; do ping -c1 -W1 ${i%%:*} 2>&1 | xargs ; curl -skI https://${i%%:*}:${i##*:} 2>&1 | xargs ; done | cut -c -150
for i in api-int:6443 api:6443 test.apps:443 ; do ping -c1 -W1 ${i%%:*} 2>&1 | xargs ; set -x ; curl -skv https://${i%%:*}:${i##*:} -o /dev/null 2>&1 | grep "Server certificate:" -A5 ; set +x ; done | cut -c -150
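In the loops above, ${i%%:*} deletes the longest suffix starting with a colon (leaving the host) and ${i##*:} deletes the longest prefix ending with a colon (leaving the port). A tiny standalone illustration; split_host_port is a hypothetical name:

```shell
# split_host_port HOST:PORT -> "HOST PORT", using the same expansions as the loops.
split_host_port() {
  local i="$1"
  printf '%s %s\n' "${i%%:*}" "${i##*:}"
}
```

Because both expansions split on the colon, this does not work for bare IPv6 addresses.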
test talk to api-int
CACERT=/tmp/%var%lib%kubelet%kubeconfig%certificate-authority-data ; grep certificate-authority-data: /var/lib/kubelet/kubeconfig | awk '{print $2}' | base64 -d > $CACERT ; curl -s --key /var/lib/kubelet/pki/kubelet-client-current.pem --cert /var/lib/kubelet/pki/kubelet-client-current.pem --cacert $CACERT -XGET "$(grep server /etc/kubernetes/kubeconfig | awk '{print $2}')/api/v1/namespaces/default/pods?limit=500"
api urls
kubernetes: generic reference to the Kubernetes API server.
kubernetes.default: the "kubernetes" service in the "default" namespace, i.e. the Kubernetes API server.
kubernetes.default.svc: the same service, addressed explicitly as a service.
kubernetes.default.svc.cluster.local: fully qualified domain name (FQDN) of the "kubernetes" service in the "default" namespace.
openshift: similar to "kubernetes", a generic reference to the OpenShift API server.
openshift.default: the "openshift" service in the "default" namespace.
openshift.default.svc: the same service, addressed explicitly as a service.
openshift.default.svc.cluster.local: FQDN of the "openshift" service in the "default" namespace.
okd setup fix
# On the bootstrap node. Could work on all clusters. First test whether it already works.
DOMAIN=$(grep " baseDomain: " /etc/mcc/bootstrap/cluster-dns-02-config.yml | awk '{print $2}')
for i in api-int api ; do ping -c1 -W1 $i.${DOMAIN} 2>&1 | xargs; done | cut -c -150
echo "10.1.0.5 api-int.${DOMAIN} api.${DOMAIN}" >> /etc/hosts
oc annotate
Update the annotations on one or more resources.
oc annotate pods foo description='my frontend'
setuid setgid
securityContext:
  runAsUser: 10004000
  runAsGroup: 10004000
patch examples
Look at the output of oc get ... -o json and copy the path to the field you want to change, level by level.
oc patch redis redis-standalone --type merge --patch '{"spec": {"securityContext": {"runAsGroup": 1000400000}}}'
Enable/disable cluster logging management (Unmanaged/Managed)
oc patch clusterlogging -n openshift-logging instance --type merge -p '{"spec": {"managementState": "Unmanaged"}}'
Enable/disable Elasticsearch management (Unmanaged/Managed)
oc patch elasticsearch -n openshift-logging elasticsearch --type merge -p '{"spec": {"managementState": "Unmanaged"}}'
Remove finalizers from pod.
oc patch pod <pod> -n <namespace> -p '{"metadata":{"finalizers":null}}'
remove value
Remove .spec.kafka.version
oc patch kafka kafka-cluster --type='json' -p='[{"op": "remove", "path": "/spec/kafka/version"}]'
Add finalizer
oc patch pod <pod> -n <namespace> -p '{"metadata":{"finalizers":["kubernetes.io/pvc-protection"]}}'
Replace finalizers value with this.
oc patch pod <pod> -n <namespace> --type merge -p '{"metadata":{"finalizers":["kubernetes.io/pvc-protection","kubernetes"]}}'
patch replicas deployment
oc patch deployment <deployment-name> --patch '{"spec": {"replicas": 0}}'
Patch a single list entry without wiping the other entries in the list
oc patch deployment -n openshift-kube-apiserver-operator kube-apiserver-operator --type json -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "quay.io/okd/scos-content@sha256:37d6b6c13d864deb7ea925acf2b2cb34305333f92ce64e7906d3f973a8071642"}]'
oc get deployment kube-apiserver-operator -n openshift-kube-apiserver-operator -o json | jq '.spec.template.spec.containers[0].env |= map(if .name == "IMAGE" then .value = "quay.io/okd/scos-content@sha256:5c9128668752a9b891a24a9ec36e0724d975d6d49e6e4e2d516b5ba80ae2fb23" else . end)' | oc apply -f -
oc get deployment kube-apiserver-operator -n openshift-kube-apiserver-operator -o json | jq '.spec.template.spec.containers[0].env |= map(if .name == "OPERATOR_IMAGE" then .value = "quay.io/okd/scos-content@sha256:37d6b6c13d864deb7ea925acf2b2cb34305333f92ce64e7906d3f973a8071642" else . end)' | oc apply -f -
oc get deployment kube-apiserver-operator -n openshift-kube-apiserver-operator -o json | jq '.spec.template.spec.containers[0].env |= map(if .name == "OPERAND_IMAGE_VERSION" then .value = "1.29.6" else . end)' | oc apply -f -
patch service monitor
kubectl patch servicemonitor cert-utils-operator-controller-manager-metrics-monitor -n openshift-operators -p='[{"op": "replace", "path": "/spec/endpoints/0/tlsConfig/serverName", "value": "cert-utils-operator-controller-manager-metrics-service.openshift-operators.svc"}]' --type='json'
edit text/cert entry
#!/bin/bash
SSL_URL=halfface.se
SSL_PORT=443
DATE_FILE=$(date +%F_%H-%M-%S)
openssl s_client -connect ${SSL_URL}:${SSL_PORT} -servername ${SSL_URL} -verify 5 -showcerts -certform pem </dev/null 2>/dev/null | sed -n '/^----/,/^----/p' > chain.${SSL_URL}.${SSL_PORT}.${DATE_FILE}.pem
ln chain.${SSL_URL}.${SSL_PORT}.${DATE_FILE}.pem ${SSL_URL}
oc create cm argocd-tls-certs-cm -n argocd --from-file ${SSL_URL} --dry-run=client -o yaml >> /tmp/chain.${SSL_URL}.${SSL_PORT}.${DATE_FILE}.pem.patch
oc patch configmap argocd-tls-certs-cm -n argocd --patch-file /tmp/chain.${SSL_URL}.${SSL_PORT}.${DATE_FILE}.pem.patch
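The chain file saved by the script above holds every certificate the server sent. To inspect them one at a time, a sketch (assuming awk, plus openssl for the usage line; split_pem_chain is a hypothetical helper) that writes each certificate to its own file:

```shell
# split_pem_chain BUNDLE PREFIX: write each PEM certificate in BUNDLE to
# PREFIX-00.pem, PREFIX-01.pem, ... (lines outside BEGIN/END are skipped).
split_pem_chain() {
  local bundle="$1" prefix="$2"
  awk -v prefix="$prefix" '
    /-----BEGIN CERTIFICATE-----/ { out = sprintf("%s-%02d.pem", prefix, n++) }
    out != ""                     { print > out }
    /-----END CERTIFICATE-----/   { close(out); out = "" }
  ' "$bundle"
}

# Usage:
#   split_pem_chain chain.halfface.se.443.<date>.pem /tmp/cert
#   for f in /tmp/cert-*.pem ; do openssl x509 -in "$f" -noout -subject -issuer -enddate ; done
```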
limits
Use limits and requests when you need to increase the CPU and memory resources of a container. A CPU quantity is written either as a number of cores (0.5 for half a CPU) or in millicores (500m for half a CPU).
spec:
  containers:
  - ...
    resources:
      limits:
        cpu: "2"
        memory: 5Gi
      requests:
        cpu: "2"
        memory: 5Gi
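The values above use Kubernetes quantity notation. A sketch of how those strings map to plain numbers, assuming bash and awk; the helper names are hypothetical and only the common suffixes are handled (m for CPU; Ki, Mi, Gi, k, M, G for memory):

```shell
# cpu_to_millicores: "2" -> 2000, "0.5" -> 500, "500m" -> 500.
cpu_to_millicores() {
  case "$1" in
    *m) printf '%s\n' "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;
  esac
}

# mem_to_bytes: binary suffixes (Ki, Mi, Gi) are powers of 1024, decimal
# suffixes (k, M, G) are powers of 1000; a plain number is bytes.
mem_to_bytes() {
  awk -v q="$1" 'BEGIN {
    n = q + 0; s = q; sub(/^[0-9.]+/, "", s)
    f["Ki"] = 1024; f["Mi"] = 1024^2; f["Gi"] = 1024^3
    f["k"]  = 1000; f["M"]  = 1000^2; f["G"]  = 1000^3
    printf "%.0f\n", (s in f) ? n * f[s] : n
  }'
}
```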
quotas on cpu memory pvc... per project
oc get ResourceQuota
tolerations|node selectors|...
oc describe pod
Node-Selectors:  node-role.kubernetes.io/app=
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 5s
                 node.ocs.openshift.io/storage=true:NoSchedule
tolerate any taint
tolerations:
- operator: Exists
enable monitoring
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 2d
EOF
retention elasticsearch
Edit the ClusterLogging CR to add or modify the retentionPolicy parameter:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
...
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy:
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3
...
retention prometheus
Prometheus retention. https://docs.openshift.com/container-platform/4.10/monitoring/configuring-the-monitoring-stack.html#modifying-retention-time-for-prometheus-metrics-data_configuring-the-monitoring-stack
oc edit configmap cluster-monitoring-config -n openshift-monitoring
# Enable prometheus.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 2d
EOF
retention prometheus default
oc get Prometheus k8s -n openshift-monitoring -o json | jq -r .spec.retention
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s "http://localhost:9090/api/v1/status/runtimeinfo" | jq -r '.data.storageRetention'
EFK(elk)
Elasticsearch (log store), Fluentd (collection and processing pipeline), Kibana (UI): https://kibana-openshift-logging.apps.<url>
grafana
# grafana https://grafana-openshift-monitoring.apps.<url>
pull secret
Get pull secret
oc get secret/pull-secret -n openshift-config -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq .
oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson"}}' | base64 -d | jq .
oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' | jq .
Just the keys.
oc get secret/pull-secret -n openshift-config -o json | jq -r '.data.".dockerconfigjson"' | base64 -d | jq .
Name of each key and email.
oc get secret/pull-secret -n openshift-config -o json | jq -r '.data.".dockerconfigjson"' | base64 -d | jq -r '.auths | to_entries[] | "\(.key):\(.value.email)"' | sort
Download pull secret.
oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' > /tmp/pull_secret.$(oc whoami --show-console=true | awk -F / '{print $3}').$(date '+%Y-%m-%d_%H-%M-%S')
Set pull secret.
oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=/tmp/pull_secret_<file_name>
Has the pull secret been updated
echo '#' pull-secret ; oc get secret/pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' | jq -r '.auths[].email'
echo '#' apiserver ; oc exec deployment/apiserver -n openshift-apiserver -c openshift-apiserver -- cat /var/lib/kubelet/config.json | jq
echo '#' nodes ; oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'cat /var/lib/kubelet/config.json | jq'
Does pull secret work
jq . /tmp/pull_secret.2024-01-10_12-00-01.registry.redhat.io
{
  "auths": {
    "registry.redhat.io": {
      "auth": "YmxhYmxh"
    }
  }
}
podman pull --authfile /tmp/pull_secret.2024-01-10_12-00-01.registry.redhat.io registry.redhat.io/ubi8/ubi:latest
Which pull secret does machineconfig contain
oc get mc 00-master -o json | jq -r '.spec.config.storage.files[]|select(.path=="/var/lib/kubelet/config.json")|.contents.source' | perl -pe 's/%([0-9a-f]{2})/sprintf("%s", pack("H2",$1))/eig' | sed 's/^data:,//g' | jq .
Is pull secret correct in machineconfigpool. Rendered config
oc get mc rendered-master-3626460c7752fc1605e94c19b7a9aba7 -o json | jq -r '.spec.config.storage.files[]|select(.path=="/var/lib/kubelet/config.json")|.contents.source' | sed 's/^data:,//g' | perl -pe 's/%([0-9a-f]{2})/sprintf("%s", pack("H2",$1))/eig'| jq .
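The perl step turns the percent-encoding (%7B, %22, ...) of the MachineConfig file source back into characters. The same can be done in plain bash, sketched here with a hypothetical helper; it assumes the source is a plain data:, URL (not data:;base64,...) and that the payload contains no literal backslashes:

```shell
# urldecode_data_url: strip a leading "data:," and decode %XX escapes.
urldecode_data_url() {
  local s="${1#data:,}"
  # Turn every % into \x so printf %b interprets %XX as a hex byte.
  printf '%b\n' "${s//%/\\x}"
}

# Usage:
#   oc get mc 00-master -o json \
#     | jq -r '.spec.config.storage.files[]|select(.path=="/var/lib/kubelet/config.json")|.contents.source' \
#     | while read -r SRC ; do urldecode_data_url "$SRC" ; done | jq .
```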
change number of nodes
oc get machineset -n openshift-machine-api
oc edit machineset -n openshift-machine-api <MachineSet>
Elasticsearch status
oc exec -n openshift-logging -c elasticsearch $(oc get pods -n openshift-logging -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | head -1) -- es_util --query=_cat/health?v
oc exec -n openshift-logging -c elasticsearch $(oc get pods -n openshift-logging -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | head -1) -- es_util --query=_cluster/health?pretty
talk to elasticsearch
oc rsh elasticsearch-cdm-q8apadpa-1-65f99d99b4-8b9wg curl -s --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca https://localhost:9200
Oneliner
oc exec -n openshift-logging -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers -n openshift-logging | head -1) -- curl -s --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca https://localhost:9200
Free disk space/reclaim
oc exec -n openshift-logging -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers -n openshift-logging | head -1) -- curl -s --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca "https://localhost:9200/_forcemerge?only_expunge_deletes=true" -X POST
which version of elasticsearch operator is installed
oc get csv -n openshift-operators-redhat -l operators.coreos.com/elasticsearch-operator.openshift-operators-redhat="" -o=custom-columns='VERSION:.spec.version' --no-headers
list nodes
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query="_cat/nodes?v"
Who is master node
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query="_cat/master?v"
Is cluster recovering
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query="_cat/recovery?active_only=true"
Look at all indices
oc exec -n openshift-logging -c elasticsearch $(oc get pods -n openshift-logging -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=_cat/indices?v
Delete an index.
export INDICE=<indice> ; oc exec -n openshift-logging -c elasticsearch $(oc get pods -n openshift-logging -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=$INDICE -XDELETE
Look at shards
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=_cat/shards?v
Create audit index
oc exec -n openshift-logging -c elasticsearch $(oc get pods -n openshift-logging -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | head -1) -- es_util --query=audit-000001 -XPUT
Remove all red indices.
oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=_cat/indices?v | grep ^red | awk '{print $3}' | while read i ; do echo '*' $i ; oc exec -c elasticsearch $(oc get pods -l component=elasticsearch -o custom-columns=:.metadata.name --no-headers | tail -1) -- es_util --query=${i} -X DELETE ; done
recreate elasticsearch pvc:s
# Scale down elasticsearch to 0 pods to be able to remove the pvc:s.
oc get deployment -l component=elasticsearch -o custom-columns=NAME:.metadata.name --no-headers -n openshift-logging | while read DEPLOYMENT ; do echo '*' $DEPLOYMENT ; oc scale deployment $DEPLOYMENT -n openshift-logging --replicas 0 ; done
# Remove the pvc:s.
oc delete pvc -l logging-cluster=elasticsearch -n openshift-logging
# Scale elasticsearch back up to 1 pod per deployment so the pvc:s are recreated.
oc get deployment -l component=elasticsearch -o custom-columns=NAME:.metadata.name --no-headers -n openshift-logging | while read DEPLOYMENT ; do echo '*' $DEPLOYMENT ; oc scale deployment $DEPLOYMENT -n openshift-logging --replicas 1 ; done
vsphere creds
oc get -n kube-system cm/cluster-config-v1 -o yaml
does vsphere account have expected permissions
oc logs -n openshift-cluster-storage-operator -l name=vsphere-problem-detector-operator --timestamps --tail=100 | less
Enable openshift/okd logging
Enable redhat-operators
oc patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": false}]'
Or edit individual operator.
oc edit operatorhubs
Spec:
  Disable All Default Sources:  true
  Sources:
    Disabled:  false
    Name:      community-operators
    Disabled:  false
    Name:      redhat-operators
Or patch OperatorHub for individual operators
oc patch operatorhub cluster --type='json' -p='[{"op": "add", "path": "/spec/sources/-", "value":{"name":"redhat-operators","disabled":false}}]'
Create namespace
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-operators-redhat
  annotations:
    openshift.io/node-selector: ""
  labels:
    openshift.io/cluster-monitoring: "true"
EOF
Create namespace
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    openshift.io/node-selector: ""
  labels:
    openshift.io/cluster-monitoring: "true"
EOF
Create operatorgroup
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-operators-redhat
  namespace: openshift-operators-redhat
spec: {}
EOF
Subscribe to OpenShift Elasticsearch Operator
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: "elasticsearch-operator"
  namespace: "openshift-operators-redhat"
spec:
  channel: "stable"
  installPlanApproval: "Automatic"
  source: "redhat-operators"
  sourceNamespace: "openshift-marketplace"
  name: "elasticsearch-operator"
EOF
Install the openshift logging operator.
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  targetNamespaces:
  - openshift-logging
EOF
Create a subscription object yaml file.
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  channel: "stable"
  name: cluster-logging
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
Create OpenShift Logging instance.
cat <<EOF | oc apply -f -
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy:
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3
      storage:
        storageClassName: "standard-csi"
        size: 200G
      resources:
        limits:
          memory: "16Gi"
        requests:
          memory: "16Gi"
      proxy:
        resources:
          limits:
            memory: 256Mi
          requests:
            memory: 256Mi
      redundancyPolicy: "SingleRedundancy"
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
EOF
telemetry
Restart telemetry.
oc delete pod -n openshift-monitoring -l app.kubernetes.io/component=telemetry-metrics-collector
Update vsphere/openstack creds
oc edit cm cloud-provider-config -n openshift-config
# For example change: default-datastore = "cl07-2-fc-loc-001"
Get datastore
oc get cm cloud-provider-config -n openshift-config -o json | jq -r .data.config | sed -nr "/^\[Workspace\]/ { :l /^default-datastore[ ]*=/ { s/[^=]*=[ ]*//; p; q;}; n; b l;}"
Manage labels.
Add a label to a node or pod:
oc label node node001.krenger.ch mylabel=myvalue
oc label pod mypod-34-g0f7k mylabel=myvalue
Remove a label (in the example “mylabel”) from a node or pod:
oc label node node001.krenger.ch mylabel-
oc label pod mypod-34-g0f7k mylabel-
Permanently label a node
oc edit machineset ocp-qz7hf-worker-us-west-1b -n openshift-machine-api
rollout
Restart the pods in a deployment
oc rollout restart deployment -n openshift-storage csi-rbdplugin-provisioner
api.<URL>
openssl_x509_multi_line <(oc get secrets external-loadbalancer-serving-certkey -n openshift-kube-apiserver -o json | jq -r '.data."tls.crt"|@base64d')
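openssl_x509_multi_line is a helper whose definition is not shown here. A hypothetical stand-in, assuming only the openssl CLI, that prints the subject and issuer of every certificate in a PEM file (one certificate or a full chain); the real helper may print more, for example validity dates:

```shell
# Hypothetical stand-in for openssl_x509_multi_line: wrap all certificates
# of a PEM bundle in a PKCS#7 structure and print subject/issuer of each.
openssl_x509_multi_line() {
  openssl crl2pkcs7 -nocrl -certfile "$1" | openssl pkcs7 -print_certs -noout
}
```

It accepts a path, so process substitution as used above (<(oc get secrets ...)) works as well.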
ssl certificates replace
How to replace api.<url> and star.apps.<url> certs.
# api. Create full chain cert. Public - intermediate - root ca.
api.<url>.crt
api.<url>.key
# create secret
oc delete secret api-cert -n openshift-config
oc create secret tls api-cert --cert=api.<url>.crt --key=api.<url>.key -n openshift-config
# patch apiserver
oc patch apiserver cluster --type=merge -p '{"spec":{"servingCerts": {"namedCertificates": [{"names": ["api.<url>"], "servingCertificate": {"name": "api-cert"}}]}}}'
...
# star.apps. Create full chain cert. Public - intermediate - root ca.
star.apps.<url>.crt
star.apps.<url>.key
# create secret
oc delete secret custom-certs-default -n openshift-ingress
oc create secret tls custom-certs-default --cert=star.apps.<url>.crt --key=star.apps.<url>.key -n openshift-ingress
# patch ingress controller
oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'
edit serving certs
look at api cert
oc get secret -n openshift-config $(oc get apiservers cluster -o json | jq -r '.spec.servingCerts.namedCertificates[].servingCertificate.name') -o json | jq -r '.data."tls.crt"' | base64 -d
Patch secret api cert
oc patch secret -n openshift-config $(oc get apiservers cluster -o json | jq -r '.spec.servingCerts.namedCertificates[].servingCertificate.name') -p '{"data":{"tls.crt": "<new-base64-encoded-certificate>"}}'
Look at ingress cert. wildcard.apps.<url>
oc get secret -n openshift-ingress $(oc get -n openshift-ingress-operator ingresscontrollers default -o json | jq -r .spec.defaultCertificate.name) -o json | jq -r '.data."tls.crt"' | base64 -d
Patch secret ingress wildcard.apps.<url>
oc patch secret -n openshift-ingress $(oc get -n openshift-ingress-operator ingresscontrollers default -o json | jq -r .spec.defaultCertificate.name) -p '{"data":{"tls.crt": "<new-base64-encoded-certificate>"}}'
After you update the certificates above, the following ConfigMap is updated to reflect the change.
openssl_x509_multi_line <(oc get cm kube-root-ca.crt -o json | jq -r '.data."ca.crt"')
get cluster-id
oc get clusterversion/version -o jsonpath="{.spec.clusterID}"
api
Pods running the API server. They scale horizontally, and every instance serves requests.
Namespace openshift-kube-apiserver, pods named kube-apiserver-*
kube-proxy
kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept. kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster. kube-proxy uses the operating system packet filtering layer if there is one and it's available. Otherwise, kube-proxy forwards the traffic itself.
Resource Allocation
OS and Kubernetes overhead. You can see the reserved OS & Kubernetes overhead by comparing the Allocatable (what the Kubernetes Scheduler can allocate to Pods) and the Capacity.
Capacity:
  cpu:                4
  ephemeral-storage:  125293548Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16409360Ki
  pods:               250
Allocatable:
  cpu:                3500m
  ephemeral-storage:  114396791822
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             15258384Ki
  pods:               250
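Worked through with the node values above, the reserved OS and Kubernetes overhead is Capacity minus Allocatable. A plain bash arithmetic sketch (variable names are illustrative only):

```shell
# Reserved = Capacity - Allocatable, using the example node values.
cap_cpu_m=$(( 4 * 1000 ))     # cpu: 4 -> 4000 millicores
alloc_cpu_m=3500              # cpu: 3500m
cap_mem_ki=16409360           # memory: 16409360Ki
alloc_mem_ki=15258384         # memory: 15258384Ki

reserved_cpu_m=$(( cap_cpu_m - alloc_cpu_m ))
reserved_mem_mi=$(( (cap_mem_ki - alloc_mem_ki) / 1024 ))
echo "reserved: cpu ${reserved_cpu_m}m, memory ${reserved_mem_mi}Mi"
```

For this node that is 500m of CPU and 1150976Ki (exactly 1124Mi) of memory.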
requests/limits
User pod allocation is calculated from the "Requests" column in the "Allocated resources" section of the kubectl describe nodes output. The relevant column is Requests, not Limits: requests determine where a pod is scheduled and what resources are reserved for it, whereas limits cap how far a pod may burst beyond its requests.
look at current Allocated resources
oc get nodes --no-headers --selector="node-role.kubernetes.io/worker" -o=custom-columns='NAME:.metadata.name' | while read NODE ; do oc describe node $NODE | grep "Allocated resources:" -A10 | grep -E ' cpu | memory ' | while read RESOURCE ; do echo $NODE $RESOURCE ; done ; done
empty space
Allocatable - Allocated resources (Requests) = free space
Allocatable:
  cpu:                3500m
  ephemeral-storage:  114396791822
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             15258384Ki
  pods:               250
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                834m (23%)    0 (0%)
  memory             2474Mi (16%)  736Mi (4%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
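The subtraction can be automated per node. A sketch assuming the oc describe node layout shown above (two-space indented fields, CPU in cores or m, memory in Ki/Mi/Gi or bytes); node_free is a hypothetical helper name:

```shell
# node_free: read "oc describe node" output on stdin and print
# Allocatable minus summed Requests for cpu and memory.
node_free() {
  awk '
    function cpu_m(q)  { return (q ~ /m$/) ? substr(q, 1, length(q) - 1) + 0 : q * 1000 }
    function mem_ki(q) {
      if (q ~ /Ki$/) return q + 0
      if (q ~ /Mi$/) return (q + 0) * 1024
      if (q ~ /Gi$/) return (q + 0) * 1024 * 1024
      return (q + 0) / 1024                 # plain bytes
    }
    /^Allocatable:/         { sec = "alloc"; next }
    /^Allocated resources:/ { sec = "used";  next }
    /^[^ ]/                 { sec = "" }    # any other top-level section
    sec == "alloc" && $1 == "cpu:"    { acpu = $2 }
    sec == "alloc" && $1 == "memory:" { amem = $2 }
    sec == "used"  && $1 == "cpu"     { ucpu = $2 }
    sec == "used"  && $1 == "memory"  { umem = $2 }
    END {
      printf "free cpu: %dm\n", cpu_m(acpu) - cpu_m(ucpu)
      printf "free memory: %dMi\n", (mem_ki(amem) - mem_ki(umem)) / 1024
    }
  '
}

# Usage: oc describe node <node> | node_free
```

With the example values this gives 2666m of CPU and about 12426Mi of memory still free for requests.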
status of namespace
Show an overview of the current project
oc status
age of cluster
Looking at age of machines.
oc get nodes -o json | jq -r '.items[].metadata.creationTimestamp' | sort -n | sed 's/T/ /g;s/Z//g'
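To turn the oldest timestamp into an age, a sketch assuming GNU date (for -d parsing of ISO 8601); days_between is a hypothetical name. If control-plane nodes have been replaced, the oldest node is younger than the cluster itself:

```shell
# days_between START END: whole days between two ISO 8601 UTC timestamps.
days_between() {
  echo $(( ( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ) / 86400 ))
}

# Usage:
#   OLDEST=$(oc get nodes -o json | jq -r '.items[].metadata.creationTimestamp' | sort | head -1)
#   days_between "$OLDEST" "$(date -u +%FT%TZ)"
```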
oc adm inspect
oc adm inspect namespace/isilon
tar cf /tmp/inspect.isilon.$(date +%F_%H-%M-%S) inspect.local.*
Operator Lifecycle Manager (OLM)
oc logs -l app=olm-operator -n openshift-operator-lifecycle-manager --tail=-1
Reinstall operator that is no longer available with current openshift version
# Force-install ODF when it can no longer be installed normally because OpenShift has moved ahead by more than one version.
# Save subscription
for i in operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= operators.coreos.com/odf-operator.openshift-storage= ; do
oc get subscription -o yaml -l $i > oc_get_subscription_${i//\//_}.yaml ; done
...
# Save operators
for i in operators.coreos.com/odf-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= ; do
oc get csv -l $i -o yaml > oc_get_csv_-l_${i//\//_}.yaml ; done
...
# Confirm the backup files contain usable YAML. Check that no operators or CSVs were missed, and remove resources clearly not related to ODF.
...
# delete the existing ODF related subscriptions and the ClusterServiceVersions related:
for i in operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= operators.coreos.com/odf-operator.openshift-storage= ; do
oc delete subscription -l $i; done
for i in operators.coreos.com/odf-operator.openshift-storage= operators.coreos.com/ocs-operator.openshift-storage= operators.coreos.com/mcg-operator.openshift-storage= operators.coreos.com/odf-csi-addons-operator.openshift-storage= ; do
oc delete csv -l $i ; done
...
# Make sure you wait for the CSVs to be deleted before creating a subscription again.
...
# Create only the Subscription again:
# (optional: edit the subscription before recreate, changing the channel version to the goal version)
...
# Recreate subscription
oc create -f 'oc_get_subscription_operators.coreos.com_odf-operator.openshift-storage=.yaml'
# wait watching the events:
oc get events -w
increase disk on node
Update worker machineset.
oc patch machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') --type merge -p '{"spec": {"template": {"spec": {"providerSpec": {"value": {"rootVolume": {"diskSize" : 50}}}}}}}'
View results from above
oc get machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') -o yaml | tee /tmp/$(oc get DNS cluster -o=jsonpath='{.spec.baseDomain}').$(date +%F_%H-%M-%S).yaml
Update on node only
VOLUME=abjorklund-01-h4sxm-worker-0-rkk87-root
os volume set --size 40 $VOLUME --os-volume-api-version 3.42
dnf install cloud-utils-growpart xfsprogs
ssh core@worker
growpart /dev/sda 4
xfs_growfs /
increase ram on worker nodes
oc patch machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') --type merge -p '{"spec": {"template": {"spec": {"providerSpec": {"value": {"memoryMiB" : 24576}}}}}}'
Change flavor of worker node
oc patch machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') --type merge -p '{"spec": {"template": {"spec": {"providerSpec": {"value": {"flavor" : "hm.4x16"}}}}}}'
set number of worker nodes
oc patch machinesets -n openshift-machine-api $(oc get machinesets -n openshift-machine-api -o json | jq -r '.items[] | select(.spec.template.metadata.labels."machine.openshift.io/cluster-api-machine-role" == "worker")| .metadata.name') --type merge -p '{"spec": {"replicas" : 2}}'
clusteroperator
ClusterOperator is the Custom Resource object that holds the current state of an operator. Cluster operators are responsible for core, system-wide functions such as DNS.
oc get clusteroperators
oc get co
oc get clusteroperators -o custom-columns=NAME:.metadata.name,ANNOTATIONS:.metadata.annotations
ignition
Retrieve rendered ignition data.
curl https://api-int.$(grep ^search /etc/resolv.conf | awk '{print $NF}'):22623/config/master
curl -v https://api-int.$(grep ^search /etc/resolv.conf | awk '{print $2}'):22623/config/worker
rockylinux container names
ubi ("Standard"): OpenSSL, microdnf, and utilities like gzip and vi
ubi-minimal ("Minimal"): Minimized binaries and minimal yum stack.
ubi-init ("Multi-service"): Less than standard but more than minimal, plus systemd.
ubi-micro ("Micro"): Most minimal image without even a package manager.
create a job/pod/script
Create config map of script
Note that $ has to be escaped: the script is passed in an unquoted here-document, where the local shell would otherwise expand $ before oc sees the text.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: dns-lookup.sh
data:
  dns-lookup.sh: |
    #!/bin/bash
    # Verify if dns resolution works and how fast, against each nameserver.
    while true ; do
      for DNS in \$(awk '/^nameserver / {print \$2}' /etc/resolv.conf) 10.2.0.10 ; do
        echo \$(date '+%F %H:%M:%S %Z') \$DNS \$(host -v -t A ibm.se \$DNS 2>&1 | tail -3 )
      done
      sleep 5
    done
EOF
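The escaping rule can be seen in isolation. A small bash sketch (the variable names are illustrative only) comparing an unquoted and a quoted here-document delimiter:

```shell
name=local

# Unquoted delimiter: the local shell expands $name, so a literal $ must be escaped.
unquoted=$(cat <<EOF
echo \$name $name
EOF
)

# Quoted delimiter ('EOF'): the body is passed through verbatim, no escaping needed.
quoted=$(cat <<'EOF'
echo $name
EOF
)

printf '%s\n' "$unquoted" "$quoted"
```

With cat <<'EOF' | oc apply -f - the ConfigMap could therefore be written without backslashes, at the cost of not being able to substitute any local variables into it.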
create job
cat <<EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: dns-lookup
spec:
  template:
    spec:
      containers:
      - name: dns-lookup
        # image: rockylinux/rockylinux:9
        image: halfface/rockylinux-toolbox:v2
        command: ["/script/dns-lookup.sh"]
        volumeMounts:
        - name: script
          mountPath: "/script"
        # securityContext:
        #   runAsUser: 0
        #   privileged: true
      volumes:
      - name: script
        configMap:
          name: dns-lookup.sh
          defaultMode: 0755
      restartPolicy: Never
  activeDeadlineSeconds: 1209600
EOF
terminal fix
No line wraps
tput rmam
list operatorhub/catalogsources
oc get catalogsources -n openshift-marketplace
oc get catalogsources -n openshift-marketplace -o custom-columns=NAME:.metadata.name,DISPLAY:.spec.displayName,STATE:.status.connectionState.lastObservedState,TYPE:.spec.sourceType,PUBLISHER:.spec.publisher,IMAGE:.spec.image
remove catalogsources
oc get catalogsources.operators.coreos.com -n openshift-marketplace -l company=cambio --no-headers -o custom-columns=:.metadata.name | while read i ; do echo oc get catalogsources $i -n openshift-marketplace -o yaml \>oc_get_catalogsources.$(oc_api_url).$i.$(date_file).yaml ; echo oc delete catalogsource -n openshift-marketplace $i ; done
Which changes will occur
. /etc/node-sizing-enabled.env ; NODE_SIZES_ENV=/tmp/node-sizing.env /usr/local/sbin/dynamic-system-reserved-calc.sh true ${SYSTEM_RESERVED_MEMORY} ${SYSTEM_RESERVED_CPU} ${SYSTEM_RESERVED_ES} ; sdiff /etc/node-sizing.env /tmp/node-sizing.env
SYSTEM_RESERVED
cat <<EOF | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: dynamic-node
spec:
  autoSizingReserved: true
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
EOF
Which changes will occur.
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'hostname ; . /etc/node-sizing-enabled.env ; NODE_SIZES_ENV=/tmp/node-sizing.env /usr/local/sbin/dynamic-system-reserved-calc.sh true ${SYSTEM_RESERVED_MEMORY} ${SYSTEM_RESERVED_CPU} ${SYSTEM_RESERVED_ES} ; sdiff /etc/node-sizing.env /tmp/node-sizing.env' 2>/dev/null
Which processes is it complaining about? Memory usage per systemd service:
systemd-cgls /system.slice | grep -o '[^─]*\.service' | cat -v | sed 's/^\^\[\[0m//g' | while read i ; do echo -e "$(systemctl show -p MemoryCurrent $i | awk -F = '{print $2}')\t$i" ; done | column -t -s $'\t' | sort -n
CNI
oc get networks cluster -o 'custom-columns=NETWORKTYPE:.spec.networkType'
CNI from the install-config
echo -e "$(oc --request-timeout=5 get -n kube-system cm/cluster-config-v1 -o json | jq -r '."data"."install-config"')" | python -c 'import sys, yaml, json; json.dump(yaml.safe_load(sys.stdin), sys.stdout, indent=4)' | jq -r .networking.networkType
autoscale.
https://docs.openshift.com/container-platform/4.12/machine_management/applying-autoscaling.html
ClusterAutoscaler
# Both resources below (ClusterAutoscaler and MachineAutoscaler) have to be configured.
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  logVerbosity: 4
  podPriorityThreshold: -10
  resourceLimits:
    cores:
      max: 128
      min: 0
    maxNodesTotal: 24
    memory:   # GiB
      max: 256
      min: 0
  scaleDown:
    delayAfterAdd: 10m
    delayAfterDelete: 5m
    delayAfterFailure: 30s
    enabled: true
    unneededTime: 5m
    utilizationThreshold: "0.4"
MachineAutoscaler
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: abjorklund-01-4rp8x-worker-1
  namespace: openshift-machine-api
spec:
  maxReplicas: 12
  minReplicas: 0
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: abjorklund-01-4rp8x-worker-1
MachineSet example where the machines are labeled and tainted.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    autoscaling.openshift.io/machineautoscaler: openshift-machine-api/abjorklund-01-4rp8x-worker-1
    capacity.cluster-autoscaler.kubernetes.io/cpu: "4"
    capacity.cluster-autoscaler.kubernetes.io/memory: "17179869184"
    machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "12"
    machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0"
    machine.openshift.io/memoryMb: "16384"
    machine.openshift.io/vCPU: "4"
  labels:
    machine.openshift.io/cluster-api-cluster: abjorklund-01-4rp8x
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
  name: abjorklund-01-4rp8x-worker-1
  namespace: openshift-machine-api
spec:
  replicas: 0
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: abjorklund-01-4rp8x
      machine.openshift.io/cluster-api-machineset: abjorklund-01-4rp8x-worker-1
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: abjorklund-01-4rp8x
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: abjorklund-01-4rp8x-worker-1
    spec:
      metadata:
        labels:
          stress: stress
      providerSpec:
        value:
          apiVersion: machine.openshift.io/v1alpha1
          cloudName: openstack
          cloudsSecret:
            name: openstack-cloud-credentials
            namespace: openshift-machine-api
          flavor: hm.4x16
          image: ""
          kind: OpenstackProviderSpec
          metadata:
            creationTimestamp: null
          networks:
          - subnets:
            - filter:
                name: abjorklund-01-4rp8x-nodes
                tags: openshiftClusterID=abjorklund-01-4rp8x
          rootVolume:
            diskSize: 64
            sourceUUID: abjorklund-01-4rp8x-rhcos
            volumeType: ssd
          securityGroups:
          - name: abjorklund-01-4rp8x-worker
          - uuid: 1de812c6-ed8b-4212-a486-ca283dbe1444
          serverGroupName: abjorklund-01-4rp8x-worker-1
          serverMetadata:
            Name: abjorklund-01-4rp8x-worker
            openshiftClusterID: abjorklund-01-4rp8x
          tags:
          - openshiftClusterID=abjorklund-01-4rp8x
          userDataSecret:
            name: worker-user-data
      taints:
      - effect: NoExecute
        key: stress
        value: stress
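The two memory annotations on the MachineSet above use different units: `machine.openshift.io/memoryMb` is in MiB, while the `capacity.cluster-autoscaler.kubernetes.io/memory` value in this example is given in bytes (16384 MiB = 17179869184 bytes). A minimal shell sketch for deriving the byte value when writing such annotations by hand; the function name `mib_to_bytes` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert MiB (as in machine.openshift.io/memoryMb)
# to bytes (as in the capacity.cluster-autoscaler.kubernetes.io/memory example).
mib_to_bytes() {
  local mib=$1
  echo $(( mib * 1024 * 1024 ))
}

# Example: the 16384 MiB machine size used in the MachineSet above.
mib_to_bytes 16384
```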
autoscaler does not scale down
oc logs -l cluster-autoscaler=default -n openshift-machine-api --tail=-1 --timestamps=true
Add dynamic load to cluster. deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "7"
  labels:
    app: stress
  name: stress
  namespace: stress
spec:
  progressDeadlineSeconds: 600
  replicas: 0
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: stress
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2025-01-24T09:52:22+01:00"
      creationTimestamp: null
      labels:
        app: stress
    spec:
      containers:
      - command:
        - /mnt/bin/stress.sh
        image: halfface/rockylinux-toolbox:v3
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - bash
            - -c
            # The probe is judged on the exit status, so let grep decide it
            # (an "&& echo 0 || echo 1" suffix would always exit 0).
            - ps uxawww | grep -q '[s]tress'
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: stress
        resources:
          requests:
            cpu: 700m
            memory: 300Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /mnt/bin/
          name: stress
      dnsPolicy: ClusterFirst
      nodeSelector:
        stress: stress
      restartPolicy: Always
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoExecute
        key: stress
        value: stress
      volumes:
      - configMap:
          defaultMode: 493   # decimal for octal 0755
          name: stress.sh
        name: stress
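An `exec` liveness probe is judged only by the exit status of its command; anything it prints is ignored. A pipeline ending in `&& echo 0 || echo 1` therefore always exits 0 and can never fail the probe. A small local sketch comparing the two forms (the function names are hypothetical; the `[s]tress` bracket trick keeps `grep` from matching its own command line):

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; the argument is a grep regex such as '[s]tress'.

# Exit status comes from grep: 0 if a matching process exists, 1 otherwise.
probe_exit_status() {
  ps axww -o args= | grep -q "$1"
}

# Prints 0 or 1, but the function itself always exits 0 (echo succeeds),
# so a probe built like this never fails.
probe_echo_variant() {
  ps axww -o args= | grep -q "$1" && echo 0 || echo 1
}

probe_exit_status '[s]tress'; echo "grep-based probe exit status: $?"
probe_echo_variant '[s]tress' >/dev/null; echo "echo-based probe exit status: $?"
```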
configmap
apiVersion: v1
data:
  stress.sh: |
    #!/bin/bash
    # stress pod.
    while true ; do
      echo $(date '+%F %H:%M:%S %Z') $( stress -m 1 --vm-bytes 1000M --vm-keep -t 300s )
      sleep 5
    done
kind: ConfigMap
metadata:
  name: stress.sh
  namespace: stress
change dns server for domain
oc edit dns.operator/default
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  name: default
spec:
  servers:
  - name: halffce-server
    zones:
    - halfface.se
    forwardPlugin:
      policy: Random
      upstreams:
      - 10.111.222.2
# View config.
oc get configmap/dns-default -n openshift-dns -o yaml
coredns
# List events in all namespaces, sorted by creation time.
oc get events -A --sort-by=.metadata.creationTimestamp
# Change debug level.
oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Debug"}}' --type=merge
Sets log . { class denial error }
oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Trace"}}' --type=merge
Sets log . { class all }
oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Normal"}}' --type=merge
Sets log . { class error }
Save the logs of every DNS pod to files for analysis
oc get pods -l dns.operator.openshift.io/daemonset-dns=default -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName --no-headers -n openshift-dns | while read i j ; do oc logs $i --tail=-1 -c dns --timestamps=true -n openshift-dns > /tmp/oc_logs_$j.$i.$(oc get DNS cluster -o=jsonpath='{.spec.baseDomain}').$(date +%F_%H-%M-%S) ; done
Get the cluster's base DNS domain
oc get DNS cluster -o=jsonpath='{.spec.baseDomain}'
Read values provided by coredns /metrics
oc exec -n openshift-dns -c dns $(oc get pods -l dns.operator.openshift.io/daemonset-dns=default -o name -n openshift-dns | head -1) -- curl -s http://localhost:9153/metrics
coredns default logformat
# Default format
{remote}:{port} - {>id} "{type} {class} {name} {proto} {size} {>do} {>bufsize}" {rcode} {>rflags} {rsize} {duration}
# Values explained
{port}: client’s port
{remote}: client’s IP address, for IPv6 addresses these are enclosed in brackets: [::1]
{>id}: query ID
{type}: qtype of the request
{class}: qclass of the request
{name}: qname of the request
{proto}: protocol used (tcp or udp)
{size}: request size in bytes
{>do}: is the EDNS0 DO (DNSSEC OK) bit set in the query
{>bufsize}: the EDNS0 buffer size advertised in the query
{rcode}: response RCODE
{>rflags}: response flags, each set flag will be displayed, e.g. “aa, tc”. This includes the qr bit as well
{rsize}: raw (uncompressed) response size (a client may receive a smaller response)
{duration}: response duration
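With the default format above, the query fields sit inside the double quotes and the response fields follow them, so splitting on `"` is enough to pull out the interesting columns. A minimal sketch, assuming the default log format; the function name `coredns_log_summary` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize CoreDNS query log lines read from stdin.
# Assumes the default log format; any prefix before the first double quote
# (for example "[INFO] " or an oc --timestamps timestamp) is ignored.
# Output per line: name type rcode duration
coredns_log_summary() {
  awk -F'"' 'NF >= 3 {
    split($2, q, " ")   # q[1]=type q[2]=class q[3]=name q[4]=proto q[5]=size ...
    split($3, r, " ")   # r[1]=rcode r[2]=rflags r[3]=rsize r[4]=duration
    print q[3], q[1], r[1], r[4]
  }'
}
```

Lines without a quoted query (startup and reload messages) are skipped. Piping a saved log file through it and then through `grep -v NOERROR` is a quick way to list failing lookups.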
Confirm that the hosts matched in the CoreDNS Corefile can be resolved
grep match /etc/coredns/Corefile | uniq | sed 's/\[//g;s/\]//g;s/^ *match //g;s/\.\*/test/g;s/^\^//g' | while read i ; do echo $(dig +short ${i}.) ${i}. ; done
Create Let's Encrypt certificates for a DNS domain hosted in Route 53, managed by cert-manager.
- Create a domain (hosted zone) in Route 53.
- Create an IAM user with an access key for "Application running outside AWS".
Fill in the values below; they are used in the configuration that follows.
Hosted_Zone_id: <Hosted_Zone_id>
Access_key: <Access_key>
Secret_access_key: <Secret_access_key>
DNS_Domain: <DNS_Domain>
DNS_shortname: <DNS_shortname>
Attach the following policy to your newly created user.
(Populate all <Values> below.)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "route53:GetChange",
      "Resource": "arn:aws:route53:::change/*"
    },
    {
      "Effect": "Allow",
      "Action": "route53:ChangeResourceRecordSets",
      "Resource": "arn:aws:route53:::hostedzone/<Hosted_Zone_id>"
    },
    {
      "Effect": "Allow",
      "Action": "route53:ListHostedZonesByName",
      "Resource": "*"
    }
  ]
}
Create namespace
oc create namespace cert-manager
Install the community version of cert-manager via the web console (OperatorHub).
Create secret that includes <Secret_access_key>.
oc create secret generic route53-secret --from-literal=secret-access-key="<Secret_access_key>" -n cert-manager
Create a ClusterIssuer for Let's Encrypt that uses Route 53 (DNS-01 challenge) to prove that you own the domain.
cat <<EOF | oc apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-dns
  namespace: cert-manager
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: support@company.se
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: <DNS_shortname>-issuer-account-key
    solvers:
    - selector:
        dnsZones:
        - "<DNS_Domain>"
      dns01:
        route53:
          accessKeyID: <Access_key>
          secretAccessKeySecretRef:
            name: route53-secret
            key: secret-access-key
          hostedZoneID: <Hosted_Zone_id>
          region: 'us-east-1'
EOF
Create api certificate.
cat <<EOF | oc apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cert-api
  namespace: openshift-config
spec:
  issuerRef:
    name: letsencrypt-prod-dns
    kind: ClusterIssuer
  dnsNames:
  - "api.<DNS_Domain>"
  secretName: le-api-cert
  commonName: "api.<DNS_Domain>"
EOF
Start using the API certificate.
oc patch apiserver cluster --type=merge -p '{"spec":{"servingCerts": {"namedCertificates": [{"names": ["api.<DNS_Domain>"], "servingCertificate": {"name": "le-api-cert"}}]}}}'
Create ingress certificate
cat <<EOF | oc apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: le-wildcard-apps-certificate
  namespace: openshift-ingress
spec:
  issuerRef:
    name: letsencrypt-prod-dns
    kind: ClusterIssuer
  dnsNames:
  - "*.apps.<DNS_Domain>"
  secretName: le-wildcard-apps-certificate
  commonName: "*.apps.<DNS_Domain>"
EOF
Start using the ingress certificate.
oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default --patch '{"spec":{"defaultCertificate":{"name":"le-wildcard-apps-certificate"}}}'
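To check that a secret issued by cert-manager really holds the expected certificate, decode `tls.crt` and let `openssl x509` print subject, issuer and end date. A minimal sketch; the function name `cert_summary` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print subject, issuer and expiry of the first PEM
# certificate read from stdin.
cert_summary() {
  openssl x509 -noout -subject -issuer -enddate
}

# Usage against the secrets created above (requires a logged-in oc session):
#   oc get secret le-api-cert -n openshift-config -o jsonpath='{.data.tls\.crt}' | base64 -d | cert_summary
#   oc get secret le-wildcard-apps-certificate -n openshift-ingress -o jsonpath='{.data.tls\.crt}' | base64 -d | cert_summary
```

The issuer line should name Let's Encrypt; if it still shows a staging or self-signed issuer, the Certificate has not been issued yet.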
resolv.conf
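options ndots:5 (the default in pods). A name with at least five dots is first looked up as an absolute name; a name with fewer dots is first tried with each search domain appended, and as-is only after that. A trailing dot (e.g. redhat.com.) marks a name as fully qualified and skips the search list.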
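The lookup order can be modelled in a few lines of shell. This is a simplified sketch of the glibc behaviour described in resolv.conf(5) (other resolvers such as musl or Go's differ in details); the function name and the search domains in the example, modelled on a pod in a namespace `ns`, are illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names a glibc-style resolver would try, in order.
# Usage: resolver_order NAME NDOTS SEARCH_DOMAIN...
resolver_order() {
  local name=$1 ndots=$2
  shift 2
  # A trailing dot marks the name as fully qualified: only that name is tried.
  if [[ $name == *. ]]; then
    echo "$name"
    return
  fi
  local dots domain
  dots=$(tr -cd '.' <<< "$name" | wc -c)
  if (( dots >= ndots )); then
    # Enough dots: try the name as-is first, then the search list.
    echo "$name."
    for domain in "$@"; do echo "$name.$domain."; done
  else
    # Too few dots: walk the search list first, the name as-is comes last.
    for domain in "$@"; do echo "$name.$domain."; done
    echo "$name."
  fi
}

# Typical pod: ndots 5, so an external name first gets every search suffix.
resolver_order redhat.com 5 ns.svc.cluster.local svc.cluster.local cluster.local
```

Writing the external name with a trailing dot (`redhat.com.`) avoids the three extra lookups against the cluster domains.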
Bind to external login sources (LDAP, AD)
oc get authentications.operator.openshift.io cluster -o yaml
get machine name and creation time
oc get machines -o=custom-columns='NAME:.metadata.name,CREATIONTIMESTAMP:.metadata.creationTimestamp,TYPE:.spec.providerSpec.value.flavor,STATUS:.status.phase' -n openshift-machine-api
setup nfs server
nfs export shared between pods.
Create server
openstack server create --flavor gp.1x2 --availability-zone europe-se-1a --image rocky-8-x86_64 --boot-from-volume 30 --network abjorklund-01-bmc7w-openshift --security-group ssh_allow --key-name abjorklund_ed25519 abjorklund_$(date_file)
openstack volume create --size 50 --type ssd --description "nfs storage block device 0" nfs_storage_abjorklund-01
openstack server add volume e93d2db1-6d95-4364-a236-0bd1b9255e90 28adbdb9-c88d-4397-9a79-b13c505016a8 --device /dev/vdb
Install NFS dependencies
dnf -y install cloud-utils-growpart nfs-utils iptables-utils epel-release vim-enhanced
How to grow filesystem.
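# Extend the volume in OpenStack (volume API 3.42 allows extending an attached volume).
openstack volume set --size 60 nfs_storage_abjorklund-01 --os-volume-api-version 3.42
# Then, on the NFS server, grow the partition and the ext4 filesystem
# (partx -u /dev/sdb re-reads the partition table if the kernel has not picked it up).
growpart /dev/sdb 1
resize2fs /dev/sdb1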
Create a partition and a filesystem, then look up the filesystem UUID.
gdisk /dev/sdb
mkfs.ext4 /dev/sdb1
find /dev/ -ls | grep sdb | grep by-uuid
Mount the drive. Add this line to /etc/fstab:
UUID=66998126-9f18-44ce-a462-827c870a57bd /netstorage ext4 defaults 0 0
Then create the mount point and the directory to export:
mkdir /netstorage
mount /netstorage/
mkdir /netstorage/abjorklund-01
chmod 777 /netstorage/abjorklund-01
export drive
systemctl enable nfs-server.service --now
Add the export to /etc/exports and reload the exports:
/netstorage/abjorklund-01 10.1.0.0/16(rw,root_squash)
exportfs -rav
setup deployment
# deployment.yaml
cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/worker
                operator: Exists
      serviceAccountName: nfs-client-provisioner
      securityContext:
        supplementalGroups:
        - 65534
        - 1261150637
      containers:
      - name: nfs-client-provisioner
        image: gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.0
        volumeMounts:
        - name: nfs-client-root
          mountPath: /persistentvolumes
        env:
        - name: PROVISIONER_NAME
          value: auto-nfs-storage
        - name: NFS_SERVER
          value: 10.1.0.48
        - name: NFS_PATH
          value: "/netstorage/abjorklund-01"
      volumes:
      - name: nfs-client-root
        nfs:
          server: 10.1.0.48
          path: /netstorage/abjorklund-01
EOF
# nfs-clusterrolebinding.yaml
cat <<EOF | oc apply -f -
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
- kind: ServiceAccount
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
EOF
# nfs-clusterrole.yaml
cat <<EOF | oc apply -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
- apiGroups: [""]
  resources: ["persistentvolumes"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "update", "patch"]
EOF
# nfs-rolebinding.yaml
cat <<EOF | oc apply -f -
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
subjects:
- kind: ServiceAccount
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
EOF
# nfs-role.yaml
cat <<EOF | oc apply -f -
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
EOF
# nfs-sa.yaml
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
EOF
# storageclass.yaml
cat <<EOF | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: auto-nfs-storage # or choose another name; must match the deployment's env PROVISIONER_NAME
parameters:
  onDelete: delete
EOF
# test-claim.yaml
cat <<EOF | oc apply -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
  namespace: default
spec:
  storageClassName: managed-nfs-storage
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF
set nfs csi driver
https://github.com/kubernetes-csi/csi-driver-nfs
dns
https://access.redhat.com/solutions/3804501
confirm upstream dns works
for UPSTREAM_DNS_IP in 10.46.201.1 10.46.201.2 10.46.201.3 ; do UPSTREAM_DNS_PORT=53 ; echo -e "\nTCP\n"; for dnspod in `oc get pods -n openshift-dns -o name --no-headers -l dns.operator.openshift.io/daemonset-dns=default`; do echo "Pod $dnspod"; oc exec -n openshift-dns -c dns $dnspod -- dig @${UPSTREAM_DNS_IP} redhat.com -p ${UPSTREAM_DNS_PORT} +tcp +short; echo; done ; done
for UPSTREAM_DNS_IP in 10.46.201.1 10.46.201.2 10.46.201.3 ; do UPSTREAM_DNS_PORT=53 ; echo -e "\nUDP\n"; for dnspod in `oc get pods -n openshift-dns -o name --no-headers -l dns.operator.openshift.io/daemonset-dns=default`; do echo "Pod $dnspod"; oc exec -n openshift-dns -c dns $dnspod -- dig @${UPSTREAM_DNS_IP} redhat.com -p ${UPSTREAM_DNS_PORT} +notcp +short; echo; done ; done
image
Which image registries are allowed or blocked (cluster-wide image configuration).
oc get image.config.openshift.io cluster -o yaml
enable sso with keycloak
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  annotations: {}
  labels:
    app.kubernetes.io/instance: sso
  name: cluster
spec:
  identityProviders:
  - mappingMethod: add
    name: SSO
    openID:
      claims:
        email:
        - email
        groups:
        - groups
        name:
        - name
        preferredUsername:
        - preferred_username
      clientID: <Client name in keycloak>
      clientSecret:
        name: keycloak-client-secret
      extraScopes: []
      issuer: <URL to issuer>
    type: OpenID
---
apiVersion: v1
data:
  clientSecret: <base64 secret>
kind: Secret
metadata:
  labels:
    app.kubernetes.io/instance: sso
  name: keycloak-client-secret
  namespace: openshift-config
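Values under `data:` in a Secret must be base64 encoded, and a trailing newline (as added by a plain `echo`) ends up inside the decoded secret. A minimal sketch for producing the `<base64 secret>` value; the function name `b64_secret` is hypothetical. Alternatively, `oc create secret generic keycloak-client-secret --from-literal=clientSecret=<secret> -n openshift-config` does the encoding itself (without the label shown above).

```shell
#!/usr/bin/env bash
# Hypothetical helper: base64-encode a value without a trailing newline,
# suitable for the data: section of a Secret. tr removes the line wrapping
# that GNU base64 adds for long input.
b64_secret() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# echo adds a newline that becomes part of the encoded value:
echo 's3cr3t' | base64      # czNjcjN0Cg==  (includes "\n")
b64_secret 's3cr3t'; echo   # czNjcjN0
```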
keepalived/api/ingress
On nodes that serve the same IP for API or ingress:
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'echo "# unicast_peer" > /etc/keepalived/keepalived.conf'
Find which node holds the API IP (the address the API hostname resolves to).
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'ip a' 2>&1 | tee /tmp/tmp ; grep $(host $(oc whoami --show-server | awk -F ':|/' '{print $4}') | awk '{print $NF}') /tmp/tmp
diff rendered mc
export OLD_RENDERED=rendered-infra-6c7e5fc796264dd32341950aea971807 ; export NEW_RENDERED=rendered-infra-bac1dd431374a5c4c21742e547739c7c ; diff -NrU 5 <(oc get mc ${OLD_RENDERED} -o json) <(oc get mc ${NEW_RENDERED} -o json)
secret management
List secrets of type TLS.
oc get secrets --field-selector type=kubernetes.io/tls
ocm
ocm install
(cd /usr/local/bin/ ; sudo curl -vLsk https://github.com/openshift-online/ocm-cli/releases/download/v0.1.72/ocm-linux-amd64 -o ocm ; sudo chmod 755 ocm)
ocm search examples
ocm list clusters --parameter search="name like 'da0d9ade-d649-4948-8bc6-744a1fcb0960'"
ocm get /api/clusters_mgmt/v1/clusters --parameter search="name like '0047ccf6-134b-4bff-99e0-5f2d6532a3ea'"
ocm get /api/accounts_mgmt/v1/subscriptions/ --parameter size=1000 | jq -r '.items[]| .display_name +"\t"+ .status +"\t"+ .cluster_id +"\t"+ .created_at' | grep -v Archived | column_tab
Search for two states.
ocm get /api/accounts_mgmt/v1/subscriptions/ --parameter search="status like 'Active' or status like 'Stale'" --parameter size=1000
PodDisruptionBudget
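API object that limits voluntary disruptions (for example evictions during a node drain): it specifies the minimum number of replicas that must stay available (minAvailable), or the maximum number that may be unavailable (maxUnavailable), at a time.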
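A minimal example, assuming the `stress` deployment used earlier (namespace `stress`, label `app: stress`); the name `stress-pdb` is illustrative. With `minAvailable: 1`, eviction requests (for example from `oc adm drain`) are refused while they would leave fewer than one matching pod available:

```shell
cat <<EOF | oc apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: stress-pdb
  namespace: stress
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: stress
EOF
```

`oc get pdb -n stress` then shows how many disruptions are currently allowed.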
pod placement
Does it look sane which pods run on worker nodes. Search for pods on worker nodes and look for the same pods on all nodes.
oc get nodes --no-headers --selector='node-role.kubernetes.io/worker,!node-role.kubernetes.io/infra' -o=custom-columns='NAME:.metadata.name' | while read NODE ; do oc get pods -A -o wide --no-headers --field-selector "spec.nodeName=$NODE" | while read NAMESPACE POD REST ; do echo '#' $NAMESPACE ${POD%-*} ; oc get pods -n $NAMESPACE -o wide | grep ${POD%-*} ; done ; done | less -ISRM
Are any user pods running outside worker nodes?
oc get project --no-headers -o=custom-columns='NAME:.metadata.name' | grep -v ^openshift- | while read NAMESPACE ; do echo '*' $NAMESPACE ; oc get pods -o wide -n $NAMESPACE ; done
wait
Wait for Kafka to become ready.
kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka
list configured ssh public keys
oc get machineconfig --no-headers -o custom-columns=":metadata.name" | grep -E '^99-.*-ssh$' | while read MACHINECONFIG ; do echo '*' "${MACHINECONFIG}" ; oc get machineconfig "${MACHINECONFIG}" -o json | jq -r '.spec.config.passwd.users[].sshAuthorizedKeys[]'; done
Add key for ssh login
oc get machineconfig --no-headers -o custom-columns=":metadata.name" | grep -E '^99-.*-ssh$' | while read MACHINE_CONFIG_SSH ; do echo '*' $MACHINE_CONFIG_SSH ; oc patch machineconfig $MACHINE_CONFIG_SSH --type=json --patch="[{\"op\":\"add\", \"path\":\"/spec/config/passwd/users/0/sshAuthorizedKeys/-\", \"value\":\"$(cat $HOME/.ssh/id_ed25519.pub)\"}]" ; done
With a save.
oc get machineconfig --no-headers -o custom-columns=":metadata.name" | grep -E '^99-.*-ssh$' | while read MACHINE_CONFIG_SSH ; do echo '*' $MACHINE_CONFIG_SSH ; oc_script_log oc get machineconfig $MACHINE_CONFIG_SSH -o yaml </dev/null ; oc patch machineconfig $MACHINE_CONFIG_SSH --type=json --patch="[{\"op\":\"add\", \"path\":\"/spec/config/passwd/users/0/sshAuthorizedKeys/-\", \"value\":\"$(cat $HOME/.ssh/id_ed25519.pub)\"}]" ; done
readable output from df.
df -lh | grep -Ev '^overlay|^tmpfs|^shm|^nsfs|^cgroup|^devtmpfs'
give me openstack credentials
oc get secret -n kube-system openstack-credentials -o json | jq -r '.data."clouds.yaml" | @base64d'
extract content of container
CONT_ID=$(docker create nginx:latest)
docker export ${CONT_ID} -o nginx.tar   # uncompressed tar archive of the container filesystem
shut down openshift
Stolen with pride: https://docs.openshift.com/container-platform/4.12/backup_and_restore/graceful-cluster-shutdown.html
# Etcd backup.
# Is a cluster-wide proxy in use?
oc get proxy cluster -o yaml
# Make an etcd backup.
oc debug --as-root node/$(oc get nodes --no-headers --selector='node-role.kubernetes.io/master' -o=custom-columns='NAME:.metadata.name' | head -1) -- chroot /host sh -c '/usr/local/bin/cluster-backup.sh /home/core/assets/backup'
# Copy files locally.
MASTER=node/$(oc get nodes --no-headers --selector='node-role.kubernetes.io/master' -o=custom-columns='NAME:.metadata.name' | head -1) ; oc debug $MASTER -- chroot /host sh -c 'ls /home/core/assets/backup/*' 2>/dev/null | while read ETCD_BACKUP ; do echo '*' Copying ${ETCD_BACKUP##*/} ; oc debug $MASTER -- chroot /host sh -c "cat $ETCD_BACKUP | gzip -9" | zcat > ${ETCD_BACKUP##*/} ; done
# Confirm files are ok.
MASTER=node/$(oc get nodes --no-headers --selector='node-role.kubernetes.io/master' -o=custom-columns='NAME:.metadata.name' | head -1) ; oc debug $MASTER -- chroot /host sh -c 'ls /home/core/assets/backup/*' 2>/dev/null | while read ETCD_BACKUP ; do echo '*' md5sum ${ETCD_BACKUP##*/} ; oc debug $MASTER -- chroot /host sh -c "md5sum $ETCD_BACKUP" 2>/dev/null ; md5sum ${ETCD_BACKUP##*/} ; done
# When does certificate run out.
oc -n openshift-kube-apiserver-operator get secret kube-apiserver-to-kubelet-signer -o jsonpath='{.metadata.annotations.auth\.openshift\.io/certificate-not-after}{"\n"}'
# kubelet client/server certificate expiration.
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate; openssl x509 -in /var/lib/kubelet/pki/kubelet-server-current.pem -noout -enddate'
# If certificates expire while the cluster is shut down, pending CSRs have to be approved manually when it comes back up.
# oc get csr -o name | xargs oc adm certificate approve
# Shutdown all nodes.
oc get nodes -o name | xargs -I {} oc debug {} -- chroot /host sh -c 'shutdown -h 1'
# Now nodes can stay dead until reviving.
# To start up again, use a command similar to this one (OpenStack).
openstack server list -f value | grep SHUTOFF | awk '{print $2}' | xargs openstack server start
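The kubelet certificate checks in the shutdown steps above print `notAfter=<date>`. A minimal sketch that turns such a line into days remaining, assuming GNU `date` (as on RHEL/RHCOS); the function name `days_left` is hypothetical. `openssl x509 -checkend <seconds>` is an alternative when only a yes/no answer is needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: days until a certificate expires, from openssl's
# "notAfter=..." output. Negative means already expired. Needs GNU date -d.
days_left() {
  local end=${1#notAfter=}
  local end_s now_s
  end_s=$(date -d "$end" +%s) || return 1
  now_s=$(date +%s)
  echo $(( (end_s - now_s) / 86400 ))
}

# Example with a fixed date string; on a node it could be fed with e.g.
#   days_left "$(openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate)"
days_left 'notAfter=Jan  1 00:00:00 2099 GMT'
```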
statefulset
StatefulSet is a Kubernetes controller designed to manage stateful applications that require stable network identities and persistent storage. It handles the deployment, scaling, and management of pods in an ordered and predictable manner, making it ideal for databases, distributed systems, and other applications where state preservation is critical.
oc diff
See which changes would be made
kubectl diff -f <manifest>.yaml
taint
Remove taint from node.
kubectl taint node control-plane0.novalocal control-plane1.novalocal control-plane2.novalocal node.cloudprovider.kubernetes.io/uninitialized-
list nodes with taints
oc get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
Sealed secrets
create sealed secret
kubeseal --controller-namespace=kube-system --format yaml --namespace openshift-config <ldap-secret.yaml > ldap-secret-sealed.yaml
Get the sealed secret that you want to decrypt
oc get sealedsecrets -n openshift-config ldap-secret -o yaml > sealedsecrets_-n_openshift-config_ldap-secret
Decrypt the sealed secret
kubeseal --recovery-private-key <private_key_file> --recovery-unseal < sealedsecrets_-n_openshift-config_ldap-secret > sealedsecrets_-n_openshift-config_ldap-secret.unsealed
Get the private keys from Sealed Secrets (one file per key, named after its secret; base64 names can contain "/" and are not usable as file names)
oc get secret -n kube-system -l sealedsecrets.bitnami.com/sealed-secrets-key -o json | jq -r '.items[] | .metadata.name + " " + .data."tls.key"' | while read NAME KEY ; do echo "$KEY" | base64 -d > "$NAME.key" ; done
imagetag
ImageTag represents a single tag within an image stream and includes the spec, the status history, and the currently referenced image (if any) of the provided tag
"alertname": "SamplesImagestreamImportFailing",
"namespace": "openshift-cluster-samples-operator",
# Remove image stream tags whose import failed.
oc -n openshift get imagetag | grep "ImportFailed" | awk -e '{ print $1 }' | xargs -r oc -n openshift tag -d
oc delete pod -l name=cluster-samples-operator -n openshift-cluster-samples-operator
custom-column examples
oc get machine -n openshift-machine-api -o custom-columns=MACHINE:.metadata.name,SERVERGROUPNAME:.spec.providerSpec.value.serverGroupName,CREATIONTIME:.metadata.creationTimestamp --no-headers
/etc/hosts
BASE_URL=$(oc get DNS cluster -o=jsonpath='{.spec.baseDomain}')
cat << EOF
$( host api.${BASE_URL} | awk '{print $NF}') api.${BASE_URL}
$( host oauth-openshift.apps.${BASE_URL} | awk '{print $NF}') oauth-openshift.apps.${BASE_URL}
EOF
--field-selector: examples
List pods on node.
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>
List running pods
oc get pods --field-selector status.phase==Running
List not running pods
oc get pods --field-selector status.phase!=Running
node is not ready. What could cause it
oc logs -n openshift-machine-config-operator -l k8s-app=machine-config-controller
Copy file from pod to your machine
kubectl cp -n kafka $(oc get kafka -n kafka --no-headers -o custom-columns=:.metadata.name)-kafka-0:/opt/kafka/libs/kafka-tools-3.9.0.jar /temp/kafka-tools-3.9.0.jar
category
oc get crd -o jsonpath='{range .items[?(@.spec.names.categories)]}{.metadata.name}{"\t"}{.spec.names.categories}{"\n"}{end}' | awk -F '"' '{print $2}' | sort | uniq | while read i ; do echo '*' $i oc get $i -A ; done
* cluster-api oc get cluster-api -A
* coreoperators oc get coreoperators -A
* olm oc get olm -A
* prometheus-operator oc get prometheus-operator -A
noobaa
recreate noobaa
# Recreate noobaa
oc patch -n openshift-storage noobaa noobaa --type='merge' -p '{"spec":{"cleanupPolicy":{"allowNoobaaDeletion":true}}}'
oc delete -n openshift-storage noobaas.noobaa.io --all
# confirm working.
oc get pv,deployment,pods,sts -n openshift-storage|grep noobaa
HDD to SSD conversion
ROTA 1 means a rotational disk (spinning platter); 0 means non-rotational (SSD).
oc get nodes -l cluster.ocs.openshift.io/openshift-storage="" --no-headers -o custom-columns=:.metadata.name | xargs -I % oc debug node/% -- chroot /host sh -c "lsblk -d -o NAME,ROTA,MODEL" 2>/dev/null | grep -Ev '^loop|^sr0|^nbd|^rbd' ; oc exec -n openshift-storage $(oc get pods -n openshift-storage -o name -l app=rook-ceph-operator) -it -- bash -c "export CEPH_ARGS='-c /var/lib/rook/openshift-storage/openshift-storage.config'; exec ceph osd tree"
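The `ROTA` column comes from the kernel's rotational flag (`/sys/block/<dev>/queue/rotational`), which Ceph normally uses to pick an OSD's device class (hdd/ssd). Virtual disks may report `ROTA 1` even when backed by SSDs, so a mismatch here is a likely reason for OSDs showing up as `hdd`. A minimal sketch that labels the lsblk part of the output; the function name `rota_label` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: label "lsblk -d -o NAME,ROTA,MODEL" output as HDD/SSD.
# Header lines (one per node when run via oc debug) and loop/sr0/nbd/rbd
# devices are skipped.
rota_label() {
  awk '$1 != "NAME" && $1 !~ /^(loop|sr0|nbd|rbd)/ && NF >= 2 {
    print $1, ($2 == 1 ? "HDD" : "SSD")
  }'
}
```

Usage: pipe the `oc debug ... lsblk -d -o NAME,ROTA,MODEL` loop from the command above through `rota_label` instead of the `grep -Ev` filter, and compare the result with the CLASS column of `ceph osd tree`.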
lease
A way of handing out ownership of a resource for a limited time. Each node's kubelet must renew its lease (in kube-node-lease) within the defined lease duration, and controllers use leases for leader election.
oc get leases -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOLDER:.spec.holderIdentity,LEASEDURATION:.spec.leaseDurationSeconds,RENEWTIME:.spec.renewTime -A
download oc for windows
https://mirror.openshift.com/pub/openshift-v4/clients/oc/latest/
missing commands
ps_ls(){
  printf "%-8s %-5s %s\n" PID STATE COMMAND
  for pid in /proc/[0-9]*; do
    [ -d "$pid" ] || continue
    pid_num=${pid##*/}
    cmd=$(cat "$pid/cmdline" 2>/dev/null | tr '\0' ' ')
    # The state is the first field after "(comm)"; strip up to the last ") "
    # because comm itself may contain spaces.
    state=$(sed 's/.*) //' "$pid/stat" 2>/dev/null | cut -d' ' -f1)
    [ -n "$cmd" ] && printf "%-8s %-5s %s\n" "$pid_num" "$state" "$cmd"
  done
}