[2026] PyTorch Review 01
By Yangming Li
Overview
This review walks through the PyTorch basics that show up in almost every project: what a tensor really is, how to create and reshape it, how broadcasting works, and how to use torch.distributions for probabilistic modeling. The examples are intentionally small and direct so you can keep the mental model clean.
1. What Is a Tensor?
A tensor is simply a box that holds numbers. The box can have 0, 1, 2, or any number of dimensions.
- 0D: a single number (scalar), e.g. tensor(3.14)
- 1D: a row of numbers (vector), shape (N,)
- 2D: a table (matrix), shape (H, W)
- 3D: a cube of data (e.g., an image), shape (H, W, C)
- 4D: a batch of images, shape (B, H, W, C) or (B, C, H, W)
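To make the dimension counts concrete, here is one tensor of each rank (the image sizes are arbitrary examples):

```python
import torch

scalar = torch.tensor(3.14)          # 0D: a single number
vector = torch.rand(5)               # 1D: shape (5,)
matrix = torch.rand(3, 4)            # 2D: shape (3, 4)
image  = torch.rand(224, 224, 3)     # 3D: (H, W, C)
batch  = torch.rand(8, 3, 224, 224)  # 4D: (B, C, H, W)

print(scalar.ndim, vector.ndim, matrix.ndim, image.ndim, batch.ndim)  # 0 1 2 3 4
```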
The four most important tensor attributes are:
- shape: length of each dimension
- ndim: number of dimensions
- dtype: data type (int or float with precision)
- device: CPU or GPU
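You can inspect all four attributes on any tensor; for instance:

```python
import torch

x = torch.rand(2, 3)
print(x.shape)   # torch.Size([2, 3])  - length of each dimension
print(x.ndim)    # 2                   - number of dimensions
print(x.dtype)   # torch.float32       - default float dtype
print(x.device)  # cpu                 - or cuda:0 after x.to("cuda")
```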
2. Create a Tensor from a List
2.1 Directly from a Python list
import torch
a = torch.tensor([1, 2, 3])
print(a) # tensor([1, 2, 3])
print(a.shape) # torch.Size([3])
print(a.ndim) # 1
print(a.dtype) # inferred, usually torch.int64
Why is dtype int64? Because all values are integers, PyTorch infers an integer tensor.
2.2 Convert to float (most common in deep learning)
Two typical ways:
# Option A: put floats in the list
a = torch.tensor([1.0, 2.0, 3.0])
print(a.dtype) # torch.float32
# Option B: explicitly specify dtype
a = torch.tensor([1, 2, 3], dtype=torch.float32)
print(a.dtype) # torch.float32
In deep learning, parameters, gradients, and inputs are usually float32 for speed and memory efficiency.
3. Factory Functions: zeros / ones / rand / randn / arange
torch.zeros(5) # five zeros, shape (5,)
torch.ones(5) # five ones
torch.rand(5) # uniform random in [0, 1)
torch.randn(5) # standard normal N(0, 1)
torch.arange(5) # 0, 1, 2, 3, 4
3.1 Example: arange + ones + division
a = torch.arange(10)
b = torch.ones(10)
print(f'{a = }')
print(f'{b = }')
print(f'{a / b = }')
What happens?
- a is an integer tensor: [0, 1, 2, ..., 9]
- b is a float tensor: ten ones
- a / b becomes float because of type promotion (int is promoted to float to preserve decimals)
4. Basic Tensor Operations
4.1 Element-wise arithmetic
a = torch.arange(10) # (10,)
b = torch.ones(10) # (10,)
a + b
a - b
a * b
a / b
a.pow(2) # element-wise square
These are element-wise operations, not matrix multiplication (matrix multiplication uses @).
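A quick shape check makes the difference visible. Element-wise `*` requires matching (or broadcastable) shapes, while `@` contracts the inner dimension:

```python
import torch

A = torch.ones(2, 3)
B = torch.ones(3, 4)

print((A * torch.ones(2, 3)).shape)  # element-wise: (2, 3)
print((A @ B).shape)                 # matrix product: (2, 4)
print(A @ B)                         # every entry is 3.0 (sum over the inner dim of size 3)
```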
4.2 Shape mismatch (unless broadcastable)
a = torch.ones(5) # (5,)
b = torch.ones(4) # (4,)
a + b # RuntimeError: shape mismatch
5. Convert an Image to a Tensor
PyTorch converts most naturally from a NumPy array:
from PIL import Image
import numpy as np
import torch
img = Image.open("cat.jpg")
arr = np.array(img) # shape (H, W, C)
x = torch.tensor(arr)
print(x.shape, x.dtype) # (H, W, 3), torch.uint8
Image pixels are 0-255 integers, so uint8 is expected. For training, convert to float and normalize:
x = x.float() / 255.0
5.1 Channel order: HWC vs CHW
Many CNNs expect (C, H, W):
x = x.permute(2, 0, 1) # (H, W, C) -> (C, H, W)
6. Reshape: view / reshape
Reshaping means viewing the same numbers with a different geometry. The total number of elements must match.
a = torch.arange(6) # shape (6,)
b = a.reshape(2, 3) # shape (2, 3)
view and reshape both reshape, but view requires contiguous memory. When unsure, use reshape.
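The contiguity difference shows up as soon as you transpose. A transpose is a non-contiguous view, so view fails on it while reshape copies the data when it has to:

```python
import torch

a = torch.arange(6).reshape(2, 3)
t = a.T                          # transpose is a non-contiguous view
print(t.is_contiguous())         # False
# t.view(6) would raise a RuntimeError here
print(t.reshape(6))              # tensor([0, 3, 1, 4, 2, 5]) - reshape copies when needed
print(t.contiguous().view(6))    # same result: make it contiguous first, then view
```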
7. Transpose and permute
7.1 2D transpose
A = torch.rand(2, 4) # (2, 4)
AT = A.T # (4, 2)
7.2 permute (general dimension reorder)
a = torch.rand(2, 3, 4)
print(a.permute(1, 2, 0).shape) # torch.Size([3, 4, 2])
permute reorders dimensions and works for any rank (3D, 4D, etc.).
8. Add or remove size-1 dimensions
8.1 Add a dimension (unsqueeze)
a = torch.arange(6) # shape (6,)
print(a[None].shape) # shape (1, 6)
a.unsqueeze(0) # (1, 6)
a.unsqueeze(1) # (6, 1)
8.2 Remove size-1 dimensions (squeeze)
x = torch.zeros(3, 2, 1, 1) # (3, 2, 1, 1)
x1 = x.squeeze(-1) # (3, 2, 1)
x2 = x1.squeeze(-1) # (3, 2)
Prefer squeeze(dim=...) so you only remove the dimension you intend.
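Here is why the explicit dim matters: squeeze() with no argument removes every size-1 dimension at once, which can silently change a shape you meant to keep:

```python
import torch

x = torch.zeros(3, 2, 1, 1)
print(x.squeeze().shape)    # torch.Size([3, 2])    - ALL size-1 dims removed
print(x.squeeze(-1).shape)  # torch.Size([3, 2, 1]) - only the last one removed
```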
9. Broadcasting: The Most Useful Shortcut
When two tensors do element-wise ops, PyTorch aligns dimensions from the last one backward. Each dimension must match or be 1 to broadcast.
a = torch.rand(4, 1) # 4 rows, 1 column
b = torch.rand(1, 5) # 1 row, 5 columns
print((a + b).shape) # (4, 5)
The single column expands to 5 columns, and the single row expands to 4 rows. The result becomes (4, 5).
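A common practical use of broadcasting is per-row normalization. The keepdim=True flag keeps the reduced dimension as size 1 so the subtraction broadcasts cleanly:

```python
import torch

x = torch.rand(4, 5)
row_mean = x.mean(dim=1, keepdim=True)  # shape (4, 1), not (4,)
centered = x - row_mean                 # (4, 5) - (4, 1) broadcasts to (4, 5)
print(centered.mean(dim=1))             # each row mean is now ~0
```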
Part 2. Distributions in PyTorch: torch.distributions
In PyTorch, a distribution is an object (e.g., dist.Bernoulli, dist.Normal) with two key capabilities:
- sample(): generate random samples
- mean, variance, log_prob(...): theoretical properties and probability calculations
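Both capabilities are available on every distribution object. For example, with a standard normal:

```python
import torch
import torch.distributions as dist

normal = dist.Normal(torch.tensor(0.0), torch.tensor(1.0))
print(normal.sample())                     # one random draw
print(normal.mean)                         # tensor(0.)
print(normal.variance)                     # tensor(1.)
print(normal.log_prob(torch.tensor(0.0)))  # log density at 0: -0.5 * log(2*pi) ~ -0.9189
```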
10. Bernoulli: The Coin Flip (0/1)
Bernoulli(p) produces:
- X = 1 with probability p
- X = 0 with probability 1 - p
Its theoretical mean is p, and variance is p(1 - p). When p = 0.5, mean is 0.5 and variance is 0.25.
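The distribution object exposes these theoretical values directly, so you can read them off without sampling:

```python
import torch
import torch.distributions as dist

bernoulli = dist.Bernoulli(torch.tensor(0.5))
print(bernoulli.mean)      # tensor(0.5000)  - equals p
print(bernoulli.variance)  # tensor(0.2500)  - equals p * (1 - p)
```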
10.1 Why is the sample shape (1,)?
import torch.distributions as dist
bernoulli = dist.Bernoulli(torch.tensor([0.5]))
print(bernoulli.sample())
torch.tensor([0.5]) has shape (1,), so the output sample also has shape (1,). If you use torch.tensor(0.5) instead, the parameter is a scalar and each sample is a scalar tensor with shape ().
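You can see the two cases side by side:

```python
import torch
import torch.distributions as dist

b1 = dist.Bernoulli(torch.tensor([0.5]))  # parameter shape (1,)
b0 = dist.Bernoulli(torch.tensor(0.5))    # scalar parameter, shape ()

print(b1.sample().shape)  # torch.Size([1])
print(b0.sample().shape)  # torch.Size([])
```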
11. Sampling Many Times
11.1 Ten samples (looks random)
for _ in range(10):
print(bernoulli.sample())
11.2 One thousand samples (compute mean and variance)
samples = [bernoulli.sample() for _ in range(1000)]
x = torch.stack(samples)
print(torch.mean(x))
print(torch.var(x))
torch.stack converts a Python list of tensors into a single tensor so you can apply mean and var.
The sample mean will not be exactly 0.5 because you only sampled 1000 times. As the sample size grows, it converges toward the theoretical mean.
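Instead of a Python loop plus torch.stack, you can draw all 1000 samples in one call by passing a sample shape (seeding here only so the run is reproducible):

```python
import torch
import torch.distributions as dist

torch.manual_seed(0)
bernoulli = dist.Bernoulli(torch.tensor(0.5))
x = bernoulli.sample((1000,))   # one vectorized call, shape (1000,)
print(x.shape, x.mean())        # mean is near 0.5, not exactly 0.5
```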
11.3 Note on variance correction
By default, torch.var applies Bessel's correction (dividing by n - 1), which gives an unbiased estimate of the population variance. If you want the plain divide-by-n formula, which for 0/1 samples equals exactly p_hat * (1 - p_hat) of the observed data, use:
torch.var(x, unbiased=False)
# or in newer versions:
torch.var(x, correction=0)
12. Normal (Gaussian): The Bell Curve
Normal distributions produce real values across the entire number line. Most values cluster near the mean.
normal = dist.Normal(torch.tensor([0.0]), torch.tensor([1.0]))
print(normal.sample())
The second parameter is the standard deviation (sigma), not the variance. For Normal(0, 1), the mean is 0 and the variance is 1. Samples can be positive or negative because the distribution is continuous across all real numbers.
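You can confirm the mean and standard deviation empirically with a large vectorized sample (seeded for reproducibility):

```python
import torch
import torch.distributions as dist

torch.manual_seed(0)
normal = dist.Normal(torch.tensor(0.0), torch.tensor(1.0))
x = normal.sample((10000,))
print(x.mean())  # close to 0
print(x.std())   # close to 1 (sigma, the second parameter)
```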
Closing Notes
This review covers the operations you will use every day: tensor creation, shape management, broadcasting, and basic probabilistic distributions. Mastering these fundamentals makes debugging and model-building dramatically faster.