
Solving Tensor Puzzles
When attempting to learn about Neural Networks, I ran into constant stumbling blocks with PyTorch, especially around tensor broadcasting and dimension manipulation.
I found out about Tensor Puzzles, a set of exercises for getting good at manipulating tensors. I admit I cheated: unable to make any progress on my own, I read through the solutions offered by Ishan on his blog.
But is it really cheating if you try to learn how it works? **I would say no.**
This is my attempt to really learn the ins and outs of Tensor broadcasting and dimensions.
The puzzles provide two functions, arange
and where
, and you have to build each required function on top of these two
and the functions previously implemented.
arange(i)
arange(5)
-> tensor([0, 1, 2, 3, 4])
where(arr, v, w)
where([True, True, False], 1, 0)
-> tensor([1, 1, 0])
(when arr[i] is True, out[i] = v, else out[i] = w)
Let's begin!
ones
Solution: where(arange(i) >= 0, 1, 0)
>>> arange(5) >= 0
tensor([True, True, True, True, True])
The >=
operation is applied element-wise to each element of arange(5).
It is true for every element, thus tensor([True, ...]).
When we use where([True, ...], 1, 0)
, this results in tensor([1, 1, ...])
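Putting it together as a runnable sketch (assuming, as I do throughout, that the puzzle's arange and where are thin wrappers over torch.arange and torch.where — that wrapping is my assumption, not code from the puzzles):

```python
import torch

# Hypothetical wrappers matching the puzzle's two primitives.
def arange(i):
    return torch.arange(i)

def where(cond, v, w):
    return torch.where(cond, torch.tensor(v), torch.tensor(w))

def ones(i):
    # every index produced by arange is >= 0, so the mask is all True
    # and where fills every position with 1
    return where(arange(i) >= 0, 1, 0)

print(ones(5))  # tensor([1, 1, 1, 1, 1])
```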
sum
Solution: ones(a.shape[0]) @ a[:, None]
I am not 100% sure how the @
operator (the matmul dunder method) works.
when a = tensor([0, 1, 2, 3, 4])
, a has a shape of (5,). a[:, None]
adds a trailing dimension to a,
such that a now has a shape of (5, 1)
thus a now becomes
tensor([[0],
[1],
[2],
[3],
[4]])
multiplying tensor([1, 1, 1, 1, 1]) with this column is a standard vector dot product, thus resulting in tensor([10])
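A small sketch of the shapes involved (plain PyTorch, my own illustration): a (5,) vector matmul'd with a (5, 1) column collapses to a single-element tensor.

```python
import torch

def sum_(a):
    # ones with shape (n,) @ column with shape (n, 1) -> shape (1,),
    # i.e. a dot product against the all-ones vector
    return torch.ones(a.shape[0], dtype=a.dtype) @ a[:, None]

a = torch.arange(5)
print(a[:, None].shape)  # torch.Size([5, 1])
print(sum_(a))           # tensor([10])
```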
outer
Solution: a[:, None] @ b[None, :]
given
>>> a
tensor([0, 1, 2, 3])
>>> b = arange(5, 9)
>>> b
tensor([5, 6, 7, 8])
>>> a[:, None]
tensor([[0],
[1],
[2],
[3]])
and b[None, :]
extends b into
tensor([[5, 6, 7, 8]])
the @
operator multiplies each single-element row on the left with the row [5, 6, 7, 8] on the right,
thus generating a 4x4 matrix:
tensor([[ 0, 0, 0, 0],
[ 5, 6, 7, 8],
[10, 12, 14, 16],
[15, 18, 21, 24]])
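The matrix above can be reproduced in plain PyTorch; as a side note (my own check, not from the original post), broadcasting an element-wise product gives the same result as the matmul:

```python
import torch

a = torch.arange(4)     # tensor([0, 1, 2, 3])
b = torch.arange(5, 9)  # tensor([5, 6, 7, 8])

# (4, 1) @ (1, 4) -> (4, 4): row i is a[i] * b
outer = a[:, None] @ b[None, :]
print(outer)

# broadcasting an element-wise product yields the same matrix
same = a[:, None] * b[None, :]
print(bool((outer == same).all()))  # True
```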
diag
Solution: a[arange(a.shape[0]), arange(a.shape[0])]
The tensor subscript operator can take sequences of indexes as input (not just a[i], but a[(i0, i1), (j0, j1)], which gathers a[i0][j0] and a[i1][j1]). Given g, the outer-product matrix from above:
tensor([[ 0, 0, 0, 0],
[ 5, 6, 7, 8],
[10, 12, 14, 16],
[15, 18, 21, 24]])
>>> g[(0, 1), (0, 0)]
tensor([0, 5])
We use that to index, in a single expression, all the elements on the diagonal. arange(i) generates 0, 1, ..., i-1. By using arange(a.shape[0]) for both the row and column indices, we fetch all the elements on the diagonal: a[(0, 1, 2, 3), (0, 1, 2, 3)]
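A quick sketch of this on the outer-product matrix from the previous section (plain PyTorch, my own illustration):

```python
import torch

# rebuild the outer-product matrix of [0, 1, 2, 3] and [5, 6, 7, 8]
g = torch.arange(4)[:, None] @ torch.arange(5, 9)[None, :]

idx = torch.arange(g.shape[0])
# the same index tensor for rows and columns picks g[0,0], g[1,1], g[2,2], g[3,3]
print(g[idx, idx])  # tensor([ 0,  6, 14, 24])
```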
eye
The Identity matrix
Solution: where(arange(j)[:, None] == arange(j), 1, 0)
>>> tensor([0]) == tensor([1, 2, 3])
tensor([False, False, False])
The comparison is done element-wise for each element in the tensor.
arange(4)[:, None]
turns our vector into a column vector tensor([[0], [1], [2], [3]])
. By comparing the tensor([0, 1, 2, 3])
with each row of the column vector, only the indexes where the elements are equal are set to True. We then pass this matrix of Booleans to where
, which sets an element to 1 if True, thus turning it into an identity matrix
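The broadcast is easy to see in a standalone sketch (plain torch calls, my own illustration): a (j, 1) column compared against a (j,) row broadcasts to a (j, j) Boolean matrix.

```python
import torch

j = 4
# (j, 1) column broadcasts against (j,) row; equal indices mark the diagonal
mask = torch.arange(j)[:, None] == torch.arange(j)
eye = torch.where(mask, torch.tensor(1), torch.tensor(0))
print(eye)
```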
triu
Solution: where(arange(j)[:, None] <= arange(j), 1, 0)
We use the same trick as in eye
, except we change the comparison from ==
to <=
, so that every element whose row index is <= its column index will be True (thus 1)
The results will be
[0] <= tensor([0, 1, 2, 3]) => [True, True, True, True]
[1] <= tensor([0, 1, 2, 3]) => [False, True, True, True]
[2] <= tensor([0, 1, 2, 3]) => [False, False, True, True]
[3] <= tensor([0, 1, 2, 3]) => [False, False, False, True]
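Verified as a standalone sketch (my own illustration in plain torch):

```python
import torch

j = 4
# True wherever row index <= column index, i.e. on and above the diagonal
upper = torch.where(torch.arange(j)[:, None] <= torch.arange(j),
                    torch.tensor(1), torch.tensor(0))
print(upper)
```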
cumsum
out[i] = sum(a[0..i])
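The post breaks off here, so what follows is my own sketch rather than the original solution: since out[i] is the sum of a[r] for all r <= i, multiplying a by the triu matrix we just built selects exactly those terms.

```python
import torch

def cumsum(a):
    n = a.shape[0]
    # upper[r, c] == 1 exactly when r <= c (the triu matrix from above)
    upper = torch.where(torch.arange(n)[:, None] <= torch.arange(n),
                        torch.tensor(1), torch.tensor(0))
    # (a @ upper)[i] = sum over r of a[r] * upper[r, i] = sum of a[0..i]
    return a @ upper

print(cumsum(torch.tensor([1, 2, 3, 4])))  # tensor([ 1,  3,  6, 10])
```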