【2026】 PyTorch Review 01 (Popularized 2018-2020)

Overview

This review walks through the PyTorch basics that show up in almost every project: what a tensor really is, how to create and reshape it, how broadcasting works, and how to use torch.distributions for probabilistic modeling. The examples are intentionally small and direct so you can keep the mental model clean.

1. What Is a Tensor?

A tensor is simply a box that holds numbers. The box can have 0, 1, 2, or any number of dimensions.

0D: a single number (scalar), e.g. tensor(3.14)
1D: a row of numbers (vector), shape (N,)
2D: a table (matrix), shape (H, W)
3D: a cube of data (e.g., image), shape (H, W, C)
4D: a batch of images, shape (B, H, W, C) or (B, C, H, W)

The four most important tensor attributes are:

shape: length of each dimension
ndim: number of dimensions
dtype: element data type and precision (e.g., int64, float32)
device: CPU or GPU
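As a quick illustration of the four attributes, the sketch below inspects a small example tensor. The tensor `t` is a hypothetical example, and the printed values assume PyTorch's default settings (float32 as the default dtype, CPU as the default device).

```python
import torch

# Hypothetical example tensor: 2 rows, 3 columns of zeros.
t = torch.zeros(2, 3)

print(t.shape)   # torch.Size([2, 3])
print(t.ndim)    # 2
print(t.dtype)   # torch.float32 (the default floating dtype)
print(t.device)  # cpu (unless a different default device is configured)
```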

2. Create a Tensor from a List

2.1 Directly from a Python list

import torch

a = torch.tensor([1, 2, 3])
print(a)          # tensor([1, 2, 3])
print(a.shape)    # torch.Size([3])
print(a.ndim)     # 1
print(a.dtype)    # inferred, usually torch.int64

Why int64? Every value in the list is a Python int, so PyTorch infers an integer tensor, and its default integer dtype is int64.

2.2 Convert to float (most common in deep learning)

Two typical ways:

# Option A: put floats in the list
a = torch.tensor([1.0, 2.0, 3.0])
print(a.dtype)  # torch.float32

# Option B: explicitly specify dtype
a = torch.tensor([1, 2, 3], dtype=torch.float32)
print(a.dtype)  # torch.float32

In deep learning, parameters, gradients, and inputs are usually float32, PyTorch's default floating-point dtype: it takes half the memory of float64 and is considerably faster on most GPUs.

3. Factory Functions: zeros / ones / rand / randn / arange

torch.zeros(5)    # five zeros, shape (5,)
torch.ones(5)     # five ones
torch.rand(5)     # uniform random in [0, 1)
torch.randn(5)    # standard normal N(0, 1)
torch.arange(5)   # 0, 1, 2, 3, 4

3.1 Example: arange + ones + division

a = torch.arange(10)
b = torch.ones(10)

print(f'{a = }')
print(f'{b = }')
print(f'{a / b = }')

What happens?

a is an integer tensor: [0, 1, 2, ..., 9]
b is a float tensor: ten ones
a / b is a float tensor: / is true division and always returns a floating-point result, and mixing int64 with float32 promotes to float32 in any case
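The dtype rules described above can be checked directly. This is a minimal sketch, assuming default dtypes; the divisor tensor `d` is a hypothetical addition used only to show integer-by-integer division.

```python
import torch

a = torch.arange(10)        # int64
b = torch.ones(10)          # float32
d = torch.arange(1, 11)     # int64, hypothetical non-zero divisor

mixed = a + b               # int64 + float32 promotes to float32
true_div = a / d            # "/" is true division: float even for two int tensors
floor_div = a // d          # floor division keeps the integer dtype

print(mixed.dtype)               # torch.float32
print(true_div.dtype)            # torch.float32
print(floor_div.dtype)           # torch.int64
print(torch.result_type(a, b))   # torch.float32
```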

4. Basic Tensor Operations

4.1 Element-wise arithmetic

a = torch.arange(10)  # (10,)
b = torch.ones(10)    # (10,)

a + b
a - b
a * b
a / b
a.pow(2)              # element-wise square

These are element-wise operations, not matrix multiplication (matrix multiplication uses @).
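To make the difference between * and @ concrete, here is a small sketch with two hypothetical 2x2 matrices. The values were chosen only so the results are easy to check by hand.

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[10., 20.], [30., 40.]])

elementwise = A * B   # multiplies matching entries
matmul = A @ B        # rows of A dotted with columns of B

print(elementwise)    # values: [[10, 40], [90, 160]]
print(matmul)         # values: [[70, 100], [150, 220]]
```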

4.2 Shape mismatch raises an error (unless the shapes are broadcastable)

a = torch.ones(5)   # (5,)
b = torch.ones(4)   # (4,)
a + b               # RuntimeError: sizes 5 and 4 differ and neither is 1
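If you want to see the failure without crashing a script, one option is to catch the exception. This is a minimal sketch; the tensor `c` is a hypothetical addition showing that a size-1 operand does broadcast.

```python
import torch

a = torch.ones(5)   # (5,)
b = torch.ones(4)   # (4,)

message = None
try:
    a + b                      # sizes 5 and 4 are incompatible
except RuntimeError as err:
    message = str(err)         # PyTorch reports which sizes clashed
    print(message)

c = torch.ones(1)              # (1,) broadcasts against (5,)
print((a + c).shape)           # torch.Size([5])
```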

5. Convert an Image to a Tensor

The usual route from an image file to a tensor goes through a NumPy array:

from PIL import Image
import numpy as np
import torch

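img = Image.open("cat.jpg").convert("RGB")  # force 3 channels (drops alpha, expands grayscale)
arr = np.array(img)          # shape (H, W, 3), dtype uint8
x = torch.tensor(arr)        # copies the data into a new tensor
print(x.shape, x.dtype)      # torch.Size([H, W, 3]) torch.uint8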

Image pixels are 0-255 integers, so uint8 is expected. For training, convert to float and normalize:

x = x.float() / 255.0

5.1 Channel order: HWC vs CHW

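PyTorch's convolution layers expect channels first, (C, H, W) per image and (B, C, H, W) per batch, so move the channel axis to the front: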

x = x.permute(2, 0, 1)  # (H, W, C) -> (C, H, W)
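The whole conversion can be tried without an image file. The sketch below assumes a random uint8 array as a stand-in for the decoded image (the 4x6 size is arbitrary) and adds a batch dimension at the end, which is an extra step not shown above.

```python
import numpy as np
import torch

# Stand-in for np.array(img): a fake 4x6 RGB image of uint8 pixels.
arr = np.random.randint(0, 256, size=(4, 6, 3), dtype=np.uint8)

x = torch.from_numpy(arr)    # shares memory with arr; torch.tensor(arr) would copy
x = x.float() / 255.0        # uint8 in [0, 255] -> float32 in [0.0, 1.0]
x = x.permute(2, 0, 1)       # (H, W, C) -> (C, H, W)
batch = x.unsqueeze(0)       # add a batch dimension: (1, C, H, W)

print(batch.shape)           # torch.Size([1, 3, 4, 6])
print(batch.dtype)           # torch.float32
```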

6. Reshape: view / reshape

Reshaping means viewing the same numbers with a different geometry. The total number of elements must match.

a = torch.arange(6)    # shape (6,)
b = a.reshape(2, 3)    # shape (2, 3)

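view and reshape both change the shape, but view never copies: it only works when the new shape can be expressed over the tensor's existing memory layout, which is always true for a contiguous tensor. reshape returns a view when it can and silently copies when it cannot. When unsure, use reshape.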
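The difference shows up as soon as a tensor is not contiguous. This is a small sketch that uses a transpose to create such a tensor; the variable names are illustrative only.

```python
import torch

a = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
t = a.T                             # same storage, different strides

print(t.is_contiguous())            # False

try:
    t.view(6)                       # cannot flatten this layout without copying
    view_failed = False
except RuntimeError:
    view_failed = True

flat = t.reshape(6)                 # reshape falls back to a copy when needed
print(view_failed)                  # True
print(flat)                         # tensor([0, 3, 1, 4, 2, 5])
```

Calling t.contiguous().view(6) gives the same result as reshape here, at the cost of an explicit copy.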

7. Transpose and permute

7.1 2D transpose

A = torch.rand(2, 4)  # (2, 4)
AT = A.T              # (4, 2)

7.2 permute (general dimension reorder)

a = torch.rand(2, 3, 4)
print(a.permute(1, 2, 0).shape)  # torch.Size([3, 4, 2])

permute reorders dimensions and works for any rank (3D, 4D, etc.).
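A common confusion is treating permute as a reshape. The sketch below uses a hypothetical arange tensor so each element is identifiable, and shows that permute moves axes (the element at a[i, j, k] ends up at p[j, k, i]) while reshape to the same target shape produces a different arrangement.

```python
import torch

a = torch.arange(24).reshape(2, 3, 4)
p = a.permute(1, 2, 0)   # new dims = old dims (1, 2, 0)

print(p.shape)                               # torch.Size([3, 4, 2])
print(a[1, 2, 3].item(), p[2, 3, 1].item())  # 23 23

r = a.reshape(3, 4, 2)   # same shape as p, but elements stay in memory order
print(torch.equal(p, r))                     # False
```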

8. Add or remove size-1 dimensions

8.1 Add a dimension (unsqueeze)

a = torch.arange(6)   # shape (6,)
print(a[None].shape)  # shape (1, 6)

a.unsqueeze(0)        # (1, 6)
a.unsqueeze(1)        # (6, 1)

8.2 Remove size-1 dimensions (squeeze)

x = torch.zeros(3, 2, 1, 1)  # (3, 2, 1, 1)

x1 = x.squeeze(-1)           # (3, 2, 1)
x2 = x1.squeeze(-1)          # (3, 2)

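Prefer squeeze(dim) over a bare squeeze(): with no argument, squeeze removes every size-1 dimension, including ones you may want to keep (such as a batch dimension of size 1).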
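The following sketch illustrates why the explicit form is safer, assuming a hypothetical tensor whose leading dimension is a batch of size 1.

```python
import torch

x = torch.zeros(1, 3, 1)     # e.g. a batch containing a single item

print(x.squeeze().shape)     # torch.Size([3])        batch dimension is gone too
print(x.squeeze(-1).shape)   # torch.Size([1, 3])     only the last dimension removed
print(x.squeeze(1).shape)    # torch.Size([1, 3, 1])  no-op: dim 1 has size 3
```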

9. Broadcasting: The Most Useful Shortcut

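When two tensors take part in an element-wise operation, PyTorch aligns their shapes starting from the last dimension and working backward. Two aligned dimensions are compatible if they are equal or if one of them is 1; a dimension missing from the shorter shape is treated as 1.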

a = torch.rand(4, 1)  # 4 rows, 1 column
b = torch.rand(1, 5)  # 1 row, 5 columns
print((a + b).shape)  # (4, 5)

The single column expands to 5 columns, and the single row expands to 4 rows. The result becomes (4, 5).
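Two more cases come up constantly in practice: adding a vector to every row of a matrix, and subtracting a per-channel value from an image. The sketch below is illustrative; the channel means are made-up numbers, not values from any dataset.

```python
import torch

m = torch.zeros(3, 4)
v = torch.arange(4.)                   # shape (4,), treated as (1, 4)
print((m + v).shape)                   # torch.Size([3, 4])

img = torch.rand(3, 8, 8)                            # (C, H, W)
mean = torch.tensor([0.5, 0.4, 0.3]).view(3, 1, 1)   # hypothetical per-channel means
print((img - mean).shape)                            # torch.Size([3, 8, 8])

# Ask PyTorch for the broadcast result shape without building tensors.
print(torch.broadcast_shapes((4, 1), (1, 5)))        # torch.Size([4, 5])
```

Reshaping mean to (3, 1, 1) is what lines the channel axis up with dimension 0 of the image; a plain (3,) vector would be aligned with the last axis instead and fail against width 8.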

Part 2. Distributions in PyTorch: `torch.distributions`

In PyTorch, a distribution is an object (e.g., dist.Bernoulli, dist.Normal) with two key capabilities:

sample(): generate random samples
mean, variance, log_prob(...): theoretical properties and probability calculations

10. Bernoulli: The Coin Flip (0/1)

Bernoulli(p) produces:

X = 1 with probability p
X = 0 with probability 1 - p

Its theoretical mean is p, and variance is p(1 - p). When p = 0.5, mean is 0.5 and variance is 0.25.
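These theoretical values can be read straight off the distribution object. This is a minimal sketch for p = 0.5; log_prob returns a log-probability, so it is exponentiated to recover p.

```python
import torch
import torch.distributions as dist

bernoulli = dist.Bernoulli(torch.tensor(0.5))

print(bernoulli.mean)        # 0.5, equal to p
print(bernoulli.variance)    # 0.25, equal to p * (1 - p)

# Probability of observing X = 1, recovered from its log-probability.
p_one = bernoulli.log_prob(torch.tensor(1.0)).exp()
print(p_one)                 # 0.5
```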

10.1 Why is the sample shape `(1,)`?

import torch.distributions as dist
bernoulli = dist.Bernoulli(torch.tensor([0.5]))
print(bernoulli.sample())

torch.tensor([0.5]) has shape (1,), so each sample also has shape (1,). With torch.tensor(0.5), a 0-D tensor, each sample is a 0-D tensor with shape torch.Size([]).
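More generally, the sample shape follows the shape of the parameter, and an optional argument to sample() adds leading dimensions in front of it. The sketch below uses hypothetical variable names and probabilities to show each case.

```python
import torch
import torch.distributions as dist

scalar_b = dist.Bernoulli(torch.tensor(0.5))             # 0-D parameter
vector_b = dist.Bernoulli(torch.tensor([0.5]))           # shape (1,)
multi_b = dist.Bernoulli(torch.tensor([0.1, 0.5, 0.9]))  # three independent coins

print(scalar_b.sample().shape)      # torch.Size([])
print(vector_b.sample().shape)      # torch.Size([1])
print(multi_b.sample().shape)       # torch.Size([3])
print(multi_b.sample((4,)).shape)   # torch.Size([4, 3])
```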

11. Sampling Many Times

11.1 Ten samples (looks random)

for _ in range(10):
    print(bernoulli.sample())

11.2 One thousand samples (compute mean and variance)

samples = [bernoulli.sample() for _ in range(1000)]
x = torch.stack(samples)

print(torch.mean(x))
print(torch.var(x))

torch.stack joins a Python list of same-shaped tensors into a single tensor along a new leading dimension, here of shape (1000, 1), so you can apply mean and var.

The sample mean will not be exactly 0.5 because you only sampled 1000 times. As the sample size grows, it converges toward the theoretical mean.
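To watch the convergence, it helps to draw all samples in one call by passing a shape to sample(), which avoids the Python loop. The sketch below is illustrative: the seed and the sample sizes are arbitrary choices, and the exact means printed depend on the PyTorch version and seed.

```python
import torch
import torch.distributions as dist

torch.manual_seed(0)   # arbitrary seed, only for repeatable runs
bernoulli = dist.Bernoulli(torch.tensor(0.5))

for n in (10, 1_000, 100_000):
    x = bernoulli.sample((n,))       # n draws at once, shape (n,)
    print(n, x.mean().item())        # drifts toward 0.5 as n grows
```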

11.3 Note on variance correction

By default, torch.var applies Bessel's correction (dividing by n - 1 instead of n), which makes the sample variance an unbiased estimate of the true variance. To divide by n instead, use the following; with 1000 samples the two results differ only by a factor of 999/1000:

torch.var(x, unbiased=False)
# or in newer versions:
torch.var(x, correction=0)
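The two divisors are easy to verify by hand on a tiny example. This sketch uses five made-up 0/1 values and the unbiased=False spelling; it computes the sum of squared deviations manually and compares against torch.var.

```python
import torch

x = torch.tensor([0., 1., 1., 0., 1.])        # hypothetical samples, mean 0.6
n = x.numel()
sq_dev = ((x - x.mean()) ** 2).sum()          # sum of squared deviations = 1.2

default_var = torch.var(x)                    # divides by n - 1
population_var = torch.var(x, unbiased=False) # divides by n

print(default_var.item(), (sq_dev / (n - 1)).item())   # both about 0.3
print(population_var.item(), (sq_dev / n).item())      # both about 0.24
```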

12. Normal (Gaussian): The Bell Curve

Normal distributions produce real values across the entire number line. Most values cluster near the mean.

normal = dist.Normal(torch.tensor([0.0]), torch.tensor([1.0]))
print(normal.sample())

The second parameter is the standard deviation (sigma), not the variance. For Normal(0, 1) the two coincide: the mean is 0 and both the standard deviation and the variance are 1. Samples can be positive or negative because the distribution's support is the entire real line.
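The sigma-versus-variance distinction only becomes visible when sigma is not 1. The sketch below assumes a hypothetical Normal(0, 2) to show it, and also checks the standard normal density at 0 against the closed form 1 / sqrt(2 * pi). The seed and sample count are arbitrary.

```python
import math
import torch
import torch.distributions as dist

torch.manual_seed(0)   # arbitrary seed, only for repeatable runs

wide = dist.Normal(torch.tensor(0.0), torch.tensor(2.0))   # loc = 0, scale (sigma) = 2
print(wide.stddev)     # 2.0
print(wide.variance)   # 4.0, i.e. sigma squared

x = wide.sample((100_000,))
print(x.std().item())  # close to 2, not 4

standard = dist.Normal(torch.tensor(0.0), torch.tensor(1.0))
density_at_zero = standard.log_prob(torch.tensor(0.0)).exp()
print(density_at_zero.item(), 1 / math.sqrt(2 * math.pi))  # both about 0.3989
```

For a continuous distribution, log_prob returns the log of the probability density, so exponentiating gives a density value rather than a probability.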

Closing Notes

This review covers the operations you will use every day: tensor creation, shape management, broadcasting, and basic probabilistic distributions. Mastering these fundamentals makes debugging and model-building dramatically faster.