Machine Learning and NLP
Statistical ML, NLP, topic modeling, and trustworthy ML.
By Yangming Li
This review walks through the PyTorch basics that show up in almost every project: what a tensor really is, how to create and reshape it, how broadcasting works, and how to use torch.distributions for probabilistic modeling. The examples are intentionally small and direct so you can keep the mental model clean.
A tensor is simply a box that holds numbers. The box can have 0, 1, 2, or any number of dimensions.
Typical shapes by number of dimensions:
0D: tensor(3.14)
1D: (N,)
2D: (H, W)
3D: (H, W, C)
4D: (B, H, W, C) or (B, C, H, W)
The four most important tensor attributes are:
import torch
a = torch.tensor([1, 2, 3])
print(a) # tensor([1, 2, 3])
print(a.shape) # torch.Size([3])
print(a.ndim) # 1
print(a.dtype) # inferred, usually torch.int64
Why is dtype int64? Because all values are Python integers, PyTorch infers an integer tensor, and its default integer dtype is int64.
There are two typical ways to get a float tensor instead:
# Option A: put floats in the list
a = torch.tensor([1.0, 2.0, 3.0])
print(a.dtype) # torch.float32
# Option B: explicitly specify dtype
a = torch.tensor([1, 2, 3], dtype=torch.float32)
print(a.dtype) # torch.float32
In deep learning, parameters, gradients, and inputs are usually float32 for speed and memory efficiency.
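To make the memory side of that claim concrete, here is a small sketch comparing float32 and float64 storage for the same number of elements. The variable names are illustrative only; element_size() and nelement() are standard tensor methods.

```python
import torch

x32 = torch.ones(1000, dtype=torch.float32)
x64 = torch.ones(1000, dtype=torch.float64)

# element_size() is the number of bytes used by a single element
bytes32 = x32.element_size() * x32.nelement()  # 4 bytes * 1000 elements
bytes64 = x64.element_size() * x64.nelement()  # 8 bytes * 1000 elements

print(bytes32, bytes64)  # 4000 8000
```

Halving the bytes per element halves the memory for parameters, gradients, and activations alike.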
torch.zeros(5) # five zeros, shape (5,)
torch.ones(5) # five ones
torch.rand(5) # uniform random in [0, 1)
torch.randn(5) # standard normal N(0, 1)
torch.arange(5) # 0, 1, 2, 3, 4
a = torch.arange(10)
b = torch.ones(10)
print(f'{a = }')
print(f'{b = }')
print(f'{a / b = }')
What happens?
a is an integer tensor: [0, 1, 2, ..., 9]
b is a float tensor: ten ones
a / b becomes float because of type promotion (int to float to preserve decimals)
a = torch.arange(10) # (10,)
b = torch.ones(10) # (10,)
a + b
a - b
a * b
a / b
a.pow(2) # element-wise square
These are element-wise operations, not matrix multiplication (matrix multiplication uses @).
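The difference between * and @ is easiest to see on a pair of small matrices. This is a minimal sketch with made-up values, chosen so the arithmetic can be checked by hand.

```python
import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[10.0, 20.0], [30.0, 40.0]])

elementwise = A * B  # multiplies matching positions
matmul = A @ B       # rows of A dotted with columns of B

print(elementwise)   # values: [[10, 40], [90, 160]]
print(matmul)        # values: [[70, 100], [150, 220]]
```

For example, the top-left entry of A @ B is 1*10 + 2*30 = 70, while the top-left entry of A * B is just 1*10 = 10.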
a = torch.ones(5) # (5,)
b = torch.ones(4) # (4,)
a + b # RuntimeError: shape mismatch
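If you want to see the failure without stopping a script, you can catch the error. The sketch below also shows one possible fix (slicing to a common length), which is only an assumption about what you might want; the right fix depends on what the shapes are supposed to mean.

```python
import torch

a = torch.ones(5)  # (5,)
b = torch.ones(4)  # (4,)

try:
    a + b
    failed = False
except RuntimeError as err:
    failed = True
    print("shape mismatch:", err)

# One possible fix: slice a so both tensors have length 4
c = a[:4] + b
print(c.shape)  # torch.Size([4])
```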
To load an image, the most natural route into PyTorch is through a NumPy array:
from PIL import Image
import numpy as np
import torch
img = Image.open("cat.jpg")
arr = np.array(img) # shape (H, W, C)
x = torch.tensor(arr)
print(x.shape, x.dtype) # (H, W, 3), torch.uint8
Image pixels are 0-255 integers, so uint8 is expected. For training, convert to float and normalize:
x = x.float() / 255.0
Many CNNs expect (C, H, W):
x = x.permute(2, 0, 1) # (H, W, C) -> (C, H, W)
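The same pipeline can be tried without an image file. The sketch below uses a synthetic all-red array as a stand-in for np.array(img), and also contrasts torch.tensor (which copies the data) with torch.from_numpy (which shares memory with the array). The array size and values are arbitrary.

```python
import numpy as np
import torch

# Stand-in for np.array(Image.open(...)): a 4x6 image that is pure red
arr = np.zeros((4, 6, 3), dtype=np.uint8)
arr[..., 0] = 255

copied = torch.tensor(arr)      # makes its own copy of the data
shared = torch.from_numpy(arr)  # shares memory with arr

arr[0, 0, 0] = 7                # modify the NumPy array afterwards
print(copied[0, 0, 0].item())   # 255, the copy is unaffected
print(shared[0, 0, 0].item())   # 7, the shared tensor sees the change

x = copied.float() / 255.0      # uint8 -> float32 in [0, 1]
x = x.permute(2, 0, 1)          # (H, W, C) -> (C, H, W)
print(x.shape, x.dtype)         # torch.Size([3, 4, 6]) torch.float32
```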
Reshaping means viewing the same numbers with a different geometry. The total number of elements must match.
a = torch.arange(6) # shape (6,)
b = a.reshape(2, 3) # shape (2, 3)
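view and reshape both change the shape, but view never copies data and fails if the memory layout is incompatible (for example, after a transpose or permute). reshape returns a view when it can and copies otherwise. When unsure, use reshape.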
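A transposed tensor is a quick way to see where view and reshape differ. This is a minimal sketch; the flag view_failed is just an illustrative name.

```python
import torch

t = torch.arange(6).reshape(2, 3).T  # shape (3, 2), no longer contiguous
print(t.is_contiguous())             # False

try:
    t.view(6)                        # view cannot flatten this layout
    view_failed = False
except RuntimeError:
    view_failed = True

flat = t.reshape(6)                  # works, copying the data if needed
flat2 = t.contiguous().view(6)       # explicit copy first, then view
print(flat)                          # tensor([0, 3, 1, 4, 2, 5])
```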
A = torch.rand(2, 4) # (2, 4)
AT = A.T # (4, 2)
a = torch.rand(2, 3, 4)
print(a.permute(1, 2, 0).shape) # torch.Size([3, 4, 2])
permute reorders dimensions and works for any rank (3D, 4D, etc.).
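It can help to check exactly where an element ends up after permute. The sketch below picks one arbitrary index and also confirms that permute returns a view over the same memory rather than a copy.

```python
import torch

a = torch.rand(2, 3, 4)
p = a.permute(1, 2, 0)  # new order: old dim 1, old dim 2, old dim 0

# Element [j, k, i] of p is element [i, j, k] of a
same_value = bool(p[2, 3, 1] == a[1, 2, 3])

# No data is copied: both tensors start at the same memory address
shares_memory = p.data_ptr() == a.data_ptr()

print(p.shape)            # torch.Size([3, 4, 2])
print(p.is_contiguous())  # False, only the strides changed
```

For a 2D tensor, A.T gives the same result as A.permute(1, 0).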
a = torch.arange(6) # shape (6,)
print(a[None].shape) # shape (1, 6)
a.unsqueeze(0) # (1, 6)
a.unsqueeze(1) # (6, 1)
x = torch.zeros(3, 2, 1, 1) # (3, 2, 1, 1)
x1 = x.squeeze(-1) # (3, 2, 1)
x2 = x1.squeeze(-1) # (3, 2)
Prefer squeeze(dim=...) so you only remove the dimension you intend.
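The reason for that advice shows up as soon as a batch has size 1. A small sketch, using a hypothetical tensor of shape (1, 3, 1):

```python
import torch

x = torch.zeros(1, 3, 1)    # e.g. a batch of size 1

print(x.squeeze().shape)    # torch.Size([3])        the batch dimension is gone too
print(x.squeeze(-1).shape)  # torch.Size([1, 3])     only the last dimension removed
print(x.squeeze(1).shape)   # torch.Size([1, 3, 1])  dim 1 has size 3, so nothing changes
```

Calling squeeze() with no argument silently drops every size-1 dimension, which can break code that expects a leading batch dimension.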
When two tensors do element-wise ops, PyTorch aligns their dimensions starting from the last one and working backward. Each pair of dimensions must either match or have one of them equal to 1; a missing leading dimension is treated as 1.
a = torch.rand(4, 1) # 4 rows, 1 column
b = torch.rand(1, 5) # 1 row, 5 columns
print((a + b).shape) # (4, 5)
The single column expands to 5 columns, and the single row expands to 4 rows. The result becomes (4, 5).
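Because alignment starts from the last dimension, a 1D tensor lines up with the columns of a matrix, not its rows. The sketch below uses made-up values to show one case that works, one that fails, and the usual fix with unsqueeze. torch.broadcast_shapes is used only to check the resulting shape without allocating anything.

```python
import torch

m = torch.zeros(4, 3)
row = torch.tensor([1.0, 2.0, 3.0])           # (3,)
col = torch.tensor([10.0, 20.0, 30.0, 40.0])  # (4,)

print((m + row).shape)       # torch.Size([4, 3]), (3,) is treated as (1, 3)

try:
    m + col                  # last dims are 3 and 4: neither matches nor is 1
    col_failed = False
except RuntimeError:
    col_failed = True

out = m + col.unsqueeze(1)   # (4, 1) broadcasts against (4, 3)
print(out[3])                # tensor([40., 40., 40.])

print(torch.broadcast_shapes((4, 1), (1, 5)))  # torch.Size([4, 5])
```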
torch.distributions
In PyTorch, a distribution is an object (e.g., dist.Bernoulli, dist.Normal) with two key capabilities:
sample(): generate random samples
mean, variance, log_prob(...): theoretical properties and probability calculations
Bernoulli(p) produces:
X = 1 with probability p
X = 0 with probability 1 - p
Its theoretical mean is p, and variance is p(1 - p). When p = 0.5, mean is 0.5 and variance is 0.25.
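Those theoretical properties can be read straight off the distribution object. Here is a minimal sketch using p = 0.3, a value chosen only for illustration, so the mean should be 0.3 and the variance 0.3 * 0.7 = 0.21.

```python
import torch
import torch.distributions as dist

b = dist.Bernoulli(torch.tensor(0.3))

print(b.mean)      # approximately 0.3
print(b.variance)  # approximately 0.21

# log_prob returns the log of the probability of an outcome
p_one = b.log_prob(torch.tensor(1.0)).exp()   # P(X = 1), about 0.3
p_zero = b.log_prob(torch.tensor(0.0)).exp()  # P(X = 0), about 0.7
print(p_one, p_zero)
```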
Why does the sample have shape (1,)?
import torch.distributions as dist
bernoulli = dist.Bernoulli(torch.tensor([0.5]))
print(bernoulli.sample())
torch.tensor([0.5]) has shape (1,), so the output sample also has shape (1,). If you use torch.tensor(0.5), it behaves like a scalar.
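The shape rule can be checked directly. The sketch below compares a scalar parameter with a (1,) parameter, and also shows that sample() accepts a sample shape, so you can draw many values in one call. The variable names are illustrative.

```python
import torch
import torch.distributions as dist

scalar_b = dist.Bernoulli(torch.tensor(0.5))    # parameter shape ()
vector_b = dist.Bernoulli(torch.tensor([0.5]))  # parameter shape (1,)

s0 = scalar_b.sample()           # shape ()
s1 = vector_b.sample()           # shape (1,)
many = vector_b.sample((1000,))  # shape (1000, 1): sample shape, then parameter shape

print(s0.shape, s1.shape, many.shape)
```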
for _ in range(10):
    print(bernoulli.sample())
samples = [bernoulli.sample() for _ in range(1000)]
x = torch.stack(samples)
print(torch.mean(x))
print(torch.var(x))
torch.stack converts a Python list of tensors into a single tensor so you can apply mean and var.
The sample mean will not be exactly 0.5 because you only sampled 1000 times. As the sample size grows, it converges toward the theoretical mean.
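You can watch that convergence by repeating the experiment at a few sample sizes. This is a rough sketch: the sizes and the seed are arbitrary choices, and the exact errors you see will depend on the random draws.

```python
import torch
import torch.distributions as dist

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable
bernoulli = dist.Bernoulli(torch.tensor(0.5))

errors = {}
for n in (100, 10_000, 1_000_000):
    x = bernoulli.sample((n,))                  # n draws in one call
    errors[n] = abs(x.mean().item() - 0.5)      # distance from the theoretical mean
    print(n, errors[n])
```

The typical size of the error is the standard error sqrt(p(1 - p) / n), which for p = 0.5 and n = 1000 is about 0.016, so a sample mean like 0.49 or 0.51 is entirely expected.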
By default, torch.var applies Bessel's correction (dividing by n - 1), which gives an unbiased estimate of the population variance p(1 - p). With 1000 samples the difference from dividing by n is negligible, but if you want the uncorrected version that divides by n, use:
torch.var(x, unbiased=False)
# or in newer versions:
torch.var(x, correction=0)
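The two estimators are easy to compare by hand on a tiny sample. In this sketch the five values are made up; their mean is 0.6, and the squared deviations sum to 1.2.

```python
import torch

x = torch.tensor([0.0, 1.0, 1.0, 0.0, 1.0])
n = x.numel()

var_corrected = torch.var(x)                    # 1.2 / (n - 1) = 0.3
var_uncorrected = torch.var(x, unbiased=False)  # 1.2 / n = 0.24

# The same quantities written out explicitly
sq_dev = ((x - x.mean()) ** 2).sum()
print(var_corrected, sq_dev / (n - 1))
print(var_uncorrected, sq_dev / n)
```

Note that 0.24 equals p(1 - p) evaluated at the sample mean p = 0.6, which is why the uncorrected version matches the Bernoulli formula when you plug in the observed frequency.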
Normal distributions produce real values across the entire number line. Most values cluster near the mean.
normal = dist.Normal(torch.tensor([0.0]), torch.tensor([1.0]))
print(normal.sample())
The second parameter is the standard deviation (sigma), not the variance. For Normal(0, 1), the mean is 0 and the variance is 1. Samples can be positive or negative because the distribution is continuous across all real numbers.
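A distribution with sigma different from 1 makes the standard deviation versus variance distinction visible. The sketch below assumes sigma = 2 (an arbitrary choice), so the variance should be 4. The seed and sample count are also arbitrary.

```python
import torch
import torch.distributions as dist

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable
normal = dist.Normal(torch.tensor(0.0), torch.tensor(2.0))  # mean 0, sigma 2

print(normal.stddev)    # tensor(2.)
print(normal.variance)  # tensor(4.), sigma squared

samples = normal.sample((100_000,))
sample_std = samples.std().item()            # should land close to 2
has_negative = (samples < 0).any().item()    # values fall on both sides of the mean
print(sample_std, has_negative)
```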
This review covers the operations you will use every day: tensor creation, shape management, broadcasting, and basic probabilistic distributions. Mastering these fundamentals makes debugging and model-building dramatically faster.