Solving Tensor Puzzles


When attempting to learn about neural networks, I ran into constant stumbling blocks with PyTorch, especially around tensor broadcasting and dimension manipulation.

I found out about Tensor Puzzles, a set of exercises for getting good at manipulating tensors. I admit, I cheated: unable to make any progress on my own, I read through the solutions offered by Ishan on his blog.

But is it really cheating if you are trying to learn how it works? **I would say no.**

This is my attempt to really learn the ins and outs of Tensor broadcasting and dimensions.

The puzzles provide two functions, arange and where, and you have to build each required function on top of these two and the functions previously implemented.

arange(i)

arange(5) -> tensor([0, 1, 2, 3, 4])

where(arr, v, w)

where([True, True, False], 1, 0) -> tensor([1, 1, 0]) (when arr[i] is True, out[i] = v, else out[i] = w)

Let's begin!

ones

Solution: where(arange(i) >= 0, 1, 0)

>>> arange(5) >= 0
tensor([True, True, True, True, True])

The >= operator is applied element-wise to each element of arange(5). This is true for every element, giving tensor([True, True, ...]). Passing that to where([True, ...], 1, 0) yields tensor([1, 1, ...]).
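To make this concrete, here is a runnable sketch. The aliases below are my own assumptions built on plain PyTorch; in particular, I implement where with arithmetic so that scalar v and w work, which may differ from the puzzle's actual setup:

```python
import torch

# The two primitives the puzzles give us (my aliases, an assumption)
def arange(i):
    return torch.arange(i)

def where(q, a, b):
    # q * a keeps a where q is True; (~q) * b keeps b where q is False
    return q * a + (~q) * b

# ones: every index of arange(i) is >= 0, so where maps them all to 1
def ones(i):
    return where(arange(i) >= 0, 1, 0)

print(ones(5))  # tensor([1, 1, 1, 1, 1])
```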

sum

Solution: ones(a.shape[0]) @ a[:, None]

I am not 100% sure how @ (the matmul dunder method) works. When a = tensor([0, 1, 2, 3, 4]), a has a shape of (5,). a[:, None] adds a trailing dimension, so a[:, None] has a shape of `(5, 1)`, and a becomes

tensor([[0],
        [1],
        [2],
        [3],
        [4]])

Multiplying tensor([1, 1, 1, 1, 1]) (shape (5,)) with this column vector is effectively a standard vector dot product, resulting in tensor([10]).
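A quick sketch of the shapes involved (using torch.ones as a stand-in for the puzzle's ones, which is an assumption for brevity):

```python
import torch

a = torch.arange(5)                           # tensor([0, 1, 2, 3, 4]), shape (5,)
col = a[:, None]                              # shape (5, 1) column vector
row = torch.ones(a.shape[0], dtype=a.dtype)   # shape (5,), stand-in for ones()

# (5,) @ (5, 1) -> (1,): a dot product that keeps one dimension
print(row @ col)  # tensor([10])
```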

outer

Solution: a[:, None] @ b[None, :]

given

>>> a
tensor([0, 1, 2, 3])
>>> b = arange(5, 9)
>>> b
tensor([5, 6, 7, 8])
>>> a[:, None]
tensor([[0],
        [1],
        [2],
        [3]])

and b[None, :] extends b into

tensor([[5, 6, 7, 8]])

The @ operator multiplies the row [5, 6, 7, 8] by each single-element row on the left, generating a 4x4 matrix:

tensor([[ 0,  0,  0,  0],
        [ 5,  6,  7,  8],
        [10, 12, 14, 16],
        [15, 18, 21, 24]])
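The shape bookkeeping is the whole trick here, so here is a minimal runnable version of the same session:

```python
import torch

a = torch.arange(4)       # tensor([0, 1, 2, 3]), shape (4,)
b = torch.arange(5, 9)    # tensor([5, 6, 7, 8]), shape (4,)

# (4, 1) @ (1, 4) -> (4, 4): every a[i] multiplied by every b[j]
outer = a[:, None] @ b[None, :]
print(outer)
```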

diag

Solution: a[arange(a.shape[0]), arange(a.shape[0])]

The tensor subscript operator can take tensors of indices as input (not just a[i], but a[rows, cols], which gathers a[rows[k], cols[k]] for each k):

>>> g
tensor([[ 0,  0,  0,  0],
        [ 5,  6,  7,  8],
        [10, 12, 14, 16],
        [15, 18, 21, 24]])
>>> g[(0, 1), (0, 0)]
tensor([0, 5])

We use that to index, in a single expression, all the elements on the diagonal. arange(i) generates 0, 1, 2, ..., i-1. By passing arange(a.shape[0]) for both the row and column indices, we fetch every diagonal element: a[(0, 1, 2, 3), (0, 1, 2, 3)].
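Putting that together on the outer-product matrix from the previous puzzle:

```python
import torch

# g is the outer product matrix from the previous puzzle
g = torch.arange(4)[:, None] @ torch.arange(5, 9)[None, :]

# the same index tensor for rows and columns picks g[0,0], g[1,1], ...
idx = torch.arange(g.shape[0])
print(g[idx, idx])  # tensor([ 0,  6, 14, 24])
```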

eye

The Identity matrix

Solution: where(arange(j)[:, None] == arange(j), 1, 0)

>>> tensor([0]) == tensor([1, 2, 3])
tensor([False, False, False])

The comparison is done element-wise. arange(4)[:, None] turns our vector into a column vector tensor([[0], [1], [2], [3]]). Broadcasting compares tensor([0, 1, 2, 3]) against each row of the column vector, so only the positions where the row and column indices are equal are True. We then pass this matrix of booleans to where, which sets each element to 1 if True, turning it into an identity matrix.
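The broadcasting step is easy to verify directly (I use mask * 1 as a stand-in for where(mask, 1, 0)):

```python
import torch

j = 4
col = torch.arange(j)[:, None]   # shape (4, 1): [[0], [1], [2], [3]]
row = torch.arange(j)            # shape (4,):   [0, 1, 2, 3]

# broadcasting compares every column entry with the whole row:
# mask[i][k] is True exactly when i == k
mask = col == row
eye = mask * 1                   # stand-in for where(mask, 1, 0)
print(eye)
```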

triu

Solution: where(arange(j)[:, None] <= arange(j), 1, 0)

We use the same trick as in eye, except we change the comparison from == to <=, so that every element whose column index is greater than or equal to its row index is True (thus 1). The results will be

[0] <= tensor([0, 1, 2, 3])  => [True, True, True, True]
[1] <= tensor([0, 1, 2, 3])  => [False, True, True, True]
[2] <= tensor([0, 1, 2, 3])  => [False, False, True, True]
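The rows above stack into the upper-triangular matrix, which we can confirm in one line (again using mask * 1 for where(mask, 1, 0)):

```python
import torch

j = 4
# (4, 1) <= (4,) broadcasts to a (4, 4) boolean mask:
# True wherever the column index >= the row index
mask = torch.arange(j)[:, None] <= torch.arange(j)
triu = mask * 1
print(triu)
```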

cumsum

out[i] = sum(a[0..i])
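One possible approach (my own sketch, not necessarily the puzzle's intended solution) builds directly on triu: multiplying a by the upper-triangular matrix makes each output column collect the sum of all earlier elements.

```python
import torch

def triu(j):
    # upper-triangular mask from the previous puzzle
    return (torch.arange(j)[:, None] <= torch.arange(j)) * 1

def cumsum(a):
    # (a @ triu)[j] = sum of a[i] for all i <= j, i.e. the running sum
    return a @ triu(a.shape[0])

print(cumsum(torch.tensor([1, 2, 3, 4])))  # tensor([ 1,  3,  6, 10])
```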