What I learnt in July 2025
July brings yet another set of updates on what I have learnt.
I bought lifetime access to Iximiuz's courses and challenges, and it has changed my mind on the value of paid courses. Though you might be able to find pretty much all of this information online for free, the value proposition for a working professional with limited time to experiment and constantly search for material is that someone else has done the hard work of consolidating and pruning the various paths, providing a simpler roadmap to pick up the skills and knowledge needed to get productive very quickly. The Iximiuz challenges are definitely worth the money I paid. The challenges start simple and progressively build up towards a more in-depth understanding of modern systems and platform engineering tools, and I find a lot of it immediately relevant to my day-to-day work. If you can afford to buy the course, go for it. It will not be a waste of your money!
A few tidbits I learnt from the course:
- You can use `kubectl debug` to launch a container in a Pod that shares the `net`, `uts` and `mnt` namespaces with a target container, allowing you to debug running pods/containers without having to restart them, rebuild the image with a different entrypoint, or set up elaborate debugging hooks (you might still need those based on the problem you wish to solve, but `kubectl debug` provides 80% of the needed abilities with zero investment). The Kubernetes docs on debugging pods cover this in more detail.

```
kubectl debug -it podname --image=busybox --target=app -- sh
```
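A quick way to convince yourself that the namespaces really are shared, reusing the hypothetical `podname`/`app` names from above; the pod's network namespace is shared by default, and `--target=app` additionally joins `app`'s process namespace:

```
/ # ps              # the app container's processes are visible from the debug shell
/ # netstat -tln    # the app's listening ports show up too (shared net namespace)
```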
- The `pause` container. When you launch a Pod with your container(s), Kubernetes injects (and schedules) a `pause` container first. This pause container is created with its own `uts`, `net`, `mnt` and `pid` namespaces (although it looks like it no longer creates a shared `pid` namespace by default). All the containers of your pod are then attached to the same namespaces, thus allowing the containers in a pod to share resources and communicate with each other. The `pause` container thus sort of reserves the namespace IDs, so that even when your containers restart they attach to the same namespaces and can keep communicating with each other. I found a small post that explains this a bit more in depth, with a few snippets you can use to understand it better. You can also recreate the trick with plain Docker, as sketched below.
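A rough re-creation of the pod pattern with plain Docker (the image tag is illustrative, and `pod-sandbox` is just a name I picked):

```
# Start a pause container to hold the namespaces open
docker run -d --name pod-sandbox registry.k8s.io/pause:3.9

# Attach another container to the pause container's network namespace;
# it sees the same interfaces, and if it restarts it re-attaches to the
# namespace the pause container kept alive.
docker run -it --rm --net=container:pod-sandbox busybox ip addr
```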
- `runc` is the low-level runtime that creates containers from images. Container management solutions (like Docker and Podman) use `runc` internally to create the actual containers at runtime. You can also drive it by hand, as in the sketch below.
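A minimal sketch of using `runc` directly, assuming `runc` and Docker are installed (`mydemo` is just a hypothetical container id):

```
# Build a root filesystem from an existing image
mkdir -p mycontainer/rootfs
docker export $(docker create busybox) | tar -C mycontainer/rootfs -xf -

# Generate a default config.json and run the container it describes
cd mycontainer && runc spec
sudo runc run mydemo
```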
- Sidecar init containers. Previously, you would run a sidecar container alongside the main container in a Pod. The sidecar would perform auxiliary activities, such as pushing logs to the central logging system, or act as a traffic proxy (like Envoy). An `init` container would run before your `main` container to perform startup operations, such as fetching secrets, creating accounts, etc., and would exit before the `main` container starts. The main container doesn't start until the init container(s) exit, and exit successfully. A sidecar init container is an `init` container that doesn't need to exit: Kubernetes will go ahead and start the main container while it keeps running. Sidecar init containers are declared with an `Always` restart policy to make sure that they are always running, and they are killed after the main container(s) in a pod are killed.
```yaml
# ...rest of the pod spec
initContainers:
  - name: log-shipper        # hypothetical sidecar name
    image: busybox           # hypothetical image
    restartPolicy: Always    # this is what marks the init container as a sidecar
```
- The Dockerfile `ADD` instruction (unlike `COPY`) can automatically extract local tar archives when copying them into the layers of an image, as in the sketch below.
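A minimal Dockerfile sketch (`app.tar.gz` is a hypothetical archive in the build context):

```dockerfile
FROM alpine:3.20
# ADD auto-extracts the local tar archive into /opt/;
# COPY would have placed app.tar.gz there as-is.
ADD app.tar.gz /opt/
```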
## Docker

Dockerfile `ENTRYPOINT` and `CMD`: `CMD` is given as arguments to `ENTRYPOINT`.

```dockerfile
ENTRYPOINT ["ls"]
CMD ["--help"]
```

The actual command run is `ls --help`. Running `docker run image args` replaces `CMD`, so `args` gets passed to the `ENTRYPOINT` instead. To override the entrypoint itself, note that the flag goes before the image name:

```
docker run --entrypoint new-entrypoint.sh image
```
- `docker stats` shows live stats (CPU, memory usage, etc.) about running containers.
- `docker run -t` creates a pseudo-TTY.
- Docker has volumes and bind mounts (see the sketch below):
  - volumes are managed by Docker and mounted into containers; they persist after containers are killed
  - bind mounts mount a directory on the host into a container
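A quick sketch of both mount types (`appdata`, `myimage` and the host path are hypothetical):

```
# Named volume: Docker manages the storage; it survives container removal
docker volume create appdata
docker run -v appdata:/var/lib/app myimage

# Bind mount: expose a host directory inside the container
docker run -v /home/me/config:/etc/app myimage
```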
- `docker inspect <image>` shows detailed metadata about an image.
- The full image reference format is `registry/image:tag@sha256:<digest>`.
## Docker BuildKit

BuildKit is the build engine for Docker that replaced the legacy builder. BuildKit can parallelize the Docker build process (for multi-stage builds) and can also cache layers in a Dockerfile. Depot has a good article that goes in depth into the details of BuildKit.

Another option with BuildKit is that we can use different drivers in a `docker build`. For example, the `docker-container` driver runs a container that builds the image. The container-based driver supports parallelized builds and layer caching that the default driver cannot. There is also a `remote` driver that can run the image build on a remote server, for possible speedups or for creating images in constrained environments.

We must first create a builder and then use it as part of the image build process:
```
$ docker buildx create --driver docker-container --name dcbuilder
dcbuilder
$ docker buildx use dcbuilder
```
For example, here is a sample use case where we build a Docker image but do not compress the individual layers within the image:

```
docker buildx build \
  --output type=image,compression=uncompressed \
  -t app:v1.0.0 \
  .
```

This level of customization wouldn't be possible with the default driver and is enabled by the `docker-container` driver.
## Some non-Docker/Kubernetes stuff I was reading about
The creator of Prometheus (the metrics/monitoring solution for cloud software) had an interesting blog post on why Prometheus might be a better choice over using OpenTelemetry, the key points being:

- a pull-based model instead of push, which can keep track of which source of information is up or down
- it adds minimal labels to each metric
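A minimal `prometheus.yml` sketch of the pull model (the target address is hypothetical); because Prometheus scrapes each target on a schedule, it inherently knows which targets are up or down via the `up` metric:

```yaml
scrape_configs:
  - job_name: myapp
    scrape_interval: 15s        # Prometheus pulls /metrics this often
    static_configs:
      - targets: ["myapp:8080"]
```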
## Ablation studies in LLMs

The word *ablation* was showing up in many blogs I was reading. It turns out it's a technical term for removing parts of an LLM (or its pipeline) and then testing it, to see if the removal degrades the output; it is normally done to see how the model performs when context windows are made smaller, when optimizations are applied, etc.
## KV Cache

1. Calculate Q, K & V for the input tokens.
2. Append the newly calculated K and V to the cache.
3. Pass the cache to the model's `forward` method so that the model doesn't have to recompute K and V for all the earlier tokens again.
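A minimal sketch of the idea in plain NumPy (not any particular framework's API); per decoding step we compute K and V only for the newest token and attend over the cached history:

```python
import numpy as np

d = 64                                    # embedding/head dimension
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []                 # grows by one entry per step

def step(x):
    """x: embedding of the newest token, shape (d,)."""
    q = x @ Wq                            # Q is only needed for this token
    k_cache.append(x @ Wk)                # append this token's K ...
    v_cache.append(x @ Wv)                # ... and V, instead of recomputing history
    K, V = np.stack(k_cache), np.stack(v_cache)
    attn = np.exp(q @ K.T / np.sqrt(d))   # unnormalized attention scores
    attn /= attn.sum()
    return attn @ V                       # attention output for the new token

for token_emb in np.random.randn(5, d):   # pretend we decode 5 tokens
    out = step(token_emb)
```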
## Database

How ClickHouse uses Change Data Capture (CDC) to make a copy of data in Postgres to run analytics workloads: it uses Postgres' logical decoding to capture the stream of human-readable, row-level updates and applies them to ClickHouse for better performance.
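You can peek at what that stream looks like straight from `psql`, using the built-in `test_decoding` plugin (the slot and table names here are hypothetical, and the server needs `wal_level = logical`):

```sql
-- Create a logical replication slot that decodes WAL into readable text
SELECT pg_create_logical_replication_slot('cdc_demo', 'test_decoding');

INSERT INTO users (name) VALUES ('alice');

-- Consume the pending changes from the slot
SELECT * FROM pg_logical_slot_get_changes('cdc_demo', NULL, NULL);
-- returns rows like:
--   table public.users: INSERT: id[integer]:1 name[text]:'alice'
```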
That's it for July, nothing fancy! See you in August with more interesting updates, hopefully!