What I learnt in July 2025
July brings yet another set of updates on what I have learnt.
I bought lifetime access to Iximiuz's courses and challenges, and it has changed my mind on the value of paid courses. Though you might be able to find pretty much all of this information online for free, the value proposition for a working professional with limited time to experiment and constantly search for material is that someone else has done the hard work of consolidating and pruning the various paths, providing a simpler roadmap to pick up the skills and knowledge needed to get productive very quickly. The Iximiuz challenges are definitely worth the money I paid. The challenges start simple and progressively build up towards a more in-depth understanding of modern systems and platform engineering tools, and I find a lot of it immediately relevant to my day-to-day work. If you can afford to buy the course, go for it. It will not be a waste of your money!
A few tidbits I learnt from the course:
- You can use `kubectl debug` to launch a container in a Pod that shares the `net`, `uts` and `mnt` namespaces with a target container, allowing you to debug running pods/containers without having to restart them, rebuild the image with a different entrypoint, or set up elaborate debugging hooks (you might still need those based on the problem you wish to solve, but `kubectl debug` provides 80% of the needed abilities with zero investment). The Kubernetes docs on debugging pods cover this in more detail.

```
kubectl debug -it podname --image=busybox --target=app -- sh
```
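A quick way to convince yourself that the namespaces really are shared, reusing the hypothetical `podname`/`app` names from above; the pod's network namespace is shared by default, and `--target=app` additionally joins `app`'s process namespace:

```
/ # ps              # the app container's processes are visible from the debug shell
/ # netstat -tln    # the app's listening ports show up too (shared net namespace)
```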
- The `pause` container. When you launch a Pod with your container(s), Kubernetes injects (and schedules) a `pause` container first. This pause container is created with its own `uts`, `net`, `mnt` and `pid` namespaces (although it looks like it no longer creates a shared `pid` namespace by default). All the containers of your pod are then attached to the same namespaces, thus allowing the containers in a pod to share resources and communicate with each other. The `pause` container thus sort of reserves the namespace IDs, so that even when your containers restart they attach to the same namespaces and can keep communicating with each other. I found a small post that explains this a bit more in depth, with a few snippets you can use to understand it better. You can also recreate the trick with plain Docker, as sketched below.
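A rough re-creation of the pod pattern with plain Docker (the image tag is illustrative, and `pod-sandbox` is just a name I picked):

```
# Start a pause container to hold the namespaces open
docker run -d --name pod-sandbox registry.k8s.io/pause:3.9

# Attach another container to the pause container's network namespace;
# it sees the same interfaces, and if it restarts it re-attaches to the
# namespace the pause container kept alive.
docker run -it --rm --net=container:pod-sandbox busybox ip addr
```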
- `runc` is the low-level runtime that creates containers from images. Container management solutions (like Docker and Podman) use `runc` internally to create the actual containers at runtime. You can also drive it by hand, as in the sketch below.
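A minimal sketch of using `runc` directly, assuming `runc` and Docker are installed (`mydemo` is just a hypothetical container id):

```
# Build a root filesystem from an existing image
mkdir -p mycontainer/rootfs
docker export $(docker create busybox) | tar -C mycontainer/rootfs -xf -

# Generate a default config.json and run the container it describes
cd mycontainer && runc spec
sudo runc run mydemo
```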
- Sidecar init containers. Previously, you would run a sidecar container alongside the main container in a Pod. The sidecar would perform auxiliary activities, such as pushing logs to the central logging system, or act as a traffic proxy (like Envoy). An `init` container would run before your `main` container to perform startup operations, such as fetching secrets, creating accounts, etc., and would exit before the `main` container starts. The main container doesn't start until the init container(s) exit, and exit successfully. A sidecar init container is an `init` container that doesn't need to exit: Kubernetes will go ahead and start the main container while it keeps running. Sidecar init containers are declared with an `Always` restart policy to make sure that they are always running, and they are killed after the main container(s) in a pod are killed.
```yaml
# ...rest of the pod spec
initContainers:
  - name: log-shipper        # hypothetical sidecar name
    image: busybox           # hypothetical image
    restartPolicy: Always    # this is what marks the init container as a sidecar
```
- The Dockerfile `ADD` instruction (unlike `COPY`) can automatically extract local tar archives when copying them into the layers of an image, as in the sketch below.
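A minimal Dockerfile sketch (`app.tar.gz` is a hypothetical archive in the build context):

```dockerfile
FROM alpine:3.20
# ADD auto-extracts the local tar archive into /opt/;
# COPY would have placed app.tar.gz there as-is.
ADD app.tar.gz /opt/
```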
## Docker

Dockerfile `ENTRYPOINT` and `CMD`: `CMD` is given as arguments to `ENTRYPOINT`.

```dockerfile
ENTRYPOINT ["ls"]
CMD ["--help"]
```

The actual command run is `ls --help`. Running `docker run image args` replaces `CMD`, so `args` gets passed to the `ENTRYPOINT` instead. To override the entrypoint itself, note that the flag goes before the image name:

```
docker run --entrypoint new-entrypoint.sh image
```
- `docker stats` shows live stats (CPU, memory usage, etc.) about running containers.
- `docker run -t` creates a pseudo-TTY.
- Docker has volumes and bind mounts (see the sketch below):
  - volumes are managed by Docker and mounted into containers; they persist after containers are killed
  - bind mounts mount a directory on the host into a container
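A quick sketch of both mount types (`appdata`, `myimage` and the host path are hypothetical):

```
# Named volume: Docker manages the storage; it survives container removal
docker volume create appdata
docker run -v appdata:/var/lib/app myimage

# Bind mount: expose a host directory inside the container
docker run -v /home/me/config:/etc/app myimage
```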
- `docker inspect <image>` shows detailed metadata about an image.
- The full image reference format is `registry/image:tag@sha256:<digest>`.
## Docker BuildKit

BuildKit is the build engine for Docker that replaced the legacy builder. BuildKit can parallelize the Docker build process (for multi-stage builds) and can also cache layers in a Dockerfile. Depot has a good article that goes in depth into the details of BuildKit.

Another option with BuildKit is that we can use different drivers in a `docker build`. For example, the `docker-container` driver runs a container that builds the image. The container-based driver supports parallelized builds and layer caching that the default driver cannot. There is also a `remote` driver that can run the image build on a remote server, for possible speedups or for creating images in constrained environments.

We must first create a builder and then use it as part of the image build process:
```
$ docker buildx create --driver docker-container --name dcbuilder
dcbuilder
$ docker buildx use dcbuilder
```
For example, here is a sample use case where we build a Docker image but do not compress the individual layers within the image:

```
docker buildx build \
  --output type=image,compression=uncompressed \
  -t app:v1.0.0 \
  .
```

This level of customization wouldn't be possible with the default driver and is enabled by the `docker-container` driver.
## Some non-Docker/Kubernetes stuff I was reading about
The creator of Prometheus (the metrics/monitoring solution for cloud software) had an interesting blog post on why Prometheus might be a better choice over using OpenTelemetry, the key points being:

- a pull-based model instead of push, which can keep track of which source of information is up or down
- it adds minimal labels to each metric
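A minimal `prometheus.yml` sketch of the pull model (the target address is hypothetical); because Prometheus scrapes each target on a schedule, it inherently knows which targets are up or down via the `up` metric:

```yaml
scrape_configs:
  - job_name: myapp
    scrape_interval: 15s        # Prometheus pulls /metrics this often
    static_configs:
      - targets: ["myapp:8080"]
```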
## Ablation studies in LLMs

The word *ablation* was showing up in many blogs I was reading. It turns out it's a technical term for removing parts of an LLM (or its pipeline) and then testing it, to see if the removal degrades the output; it is normally done to see how the model performs when context windows are made smaller, when optimizations are applied, etc.
## KV Cache

1. Calculate Q, K & V for the input tokens.
2. Append the newly calculated K and V to the cache.
3. Pass the cache to the model's `forward` method so that the model doesn't have to recompute K and V for all the earlier tokens again.
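A minimal sketch of the idea in plain NumPy (not any particular framework's API); per decoding step we compute K and V only for the newest token and attend over the cached history:

```python
import numpy as np

d = 64                                    # embedding/head dimension
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []                 # grows by one entry per step

def step(x):
    """x: embedding of the newest token, shape (d,)."""
    q = x @ Wq                            # Q is only needed for this token
    k_cache.append(x @ Wk)                # append this token's K ...
    v_cache.append(x @ Wv)                # ... and V, instead of recomputing history
    K, V = np.stack(k_cache), np.stack(v_cache)
    attn = np.exp(q @ K.T / np.sqrt(d))   # unnormalized attention scores
    attn /= attn.sum()
    return attn @ V                       # attention output for the new token

for token_emb in np.random.randn(5, d):   # pretend we decode 5 tokens
    out = step(token_emb)
```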
## Database

How ClickHouse uses Change Data Capture (CDC) to make a copy of data in Postgres to run analytics workloads: it uses Postgres' logical decoding to capture the stream of human-readable, row-level updates and applies them to ClickHouse for better performance.
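You can peek at what that stream looks like straight from `psql`, using the built-in `test_decoding` plugin (the slot and table names here are hypothetical, and the server needs `wal_level = logical`):

```sql
-- Create a logical replication slot that decodes WAL into readable text
SELECT pg_create_logical_replication_slot('cdc_demo', 'test_decoding');

INSERT INTO users (name) VALUES ('alice');

-- Consume the pending changes from the slot
SELECT * FROM pg_logical_slot_get_changes('cdc_demo', NULL, NULL);
-- returns rows like:
--   table public.users: INSERT: id[integer]:1 name[text]:'alice'
```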
That's it for July, nothing fancy! See you in August with more interesting updates, hopefully!