Deep Neural Networks (DNNs) have revolutionized machine learning by enabling computers to learn complex patterns directly from data. Training a DNN rests on two phases, forward propagation and backpropagation, combined with an optimization algorithm such as gradient descent. By stacking multiple layers of non-linear transformations and training on large amounts of data, DNNs can approximate complex input-output mappings.
This article breaks down the fundamental concepts of DNNs and provides practical implementation examples using PyTorch.
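A basic DNN consists of multiple layers of interconnected neurons: an input layer that receives the features, one or more hidden layers, and an output layer that produces the predictions.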
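To make the layer structure concrete, here is a minimal sketch using nn.Sequential with the same sizes as the MNIST example later in this article (784 inputs, 256 hidden neurons, 10 outputs). It prints each layer's weight and bias shapes and the total parameter count, which for these sizes is 784 × 256 + 256 + 256 × 10 + 10 = 203,530.

```python
import torch.nn as nn

# 784 inputs (a flattened 28x28 image), one hidden layer of 256 neurons,
# and 10 outputs (one per digit class).
model = nn.Sequential(
    nn.Linear(28 * 28, 256),  # input features -> hidden layer
    nn.ReLU(),                # non-linear activation
    nn.Linear(256, 10),       # hidden layer -> output layer
)

# Each nn.Linear(in_features, out_features) holds a weight matrix of shape
# (out_features, in_features) and a bias vector of length out_features.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

num_params = sum(p.numel() for p in model.parameters())
print("total parameters:", num_params)  # total parameters: 203530
```

Every neuron in one layer connects to every neuron in the next, which is why the weight matrices dominate the parameter count.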
Forward propagation is the process of passing input data through the network, layer by layer, to generate predictions. Each layer applies a linear transformation followed by a non-linear activation; adding more layers and neurons increases the model's capacity.
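In a notation assumed here (it is not taken from the original text), layer \(l\) computes \(a^{(l)} = \sigma(W^{(l)} a^{(l-1)} + b^{(l)})\) with \(a^{(0)} = x\). The sketch below, using arbitrary layer sizes (3, 5, 2) and a random input, writes out a two-layer forward pass by hand and checks it against PyTorch's nn.Linear modules, which store \(W\) with shape (out_features, in_features) and compute x @ W.T + b on a batch of row vectors.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.randn(4, 3)     # batch of 4 examples, 3 features each
layer1 = nn.Linear(3, 5)  # W1: (5, 3), b1: (5,)
layer2 = nn.Linear(5, 2)  # W2: (2, 5), b2: (2,)

# Manual forward pass: affine transform, ReLU activation, affine transform.
h = torch.relu(x @ layer1.weight.T + layer1.bias)
out_manual = h @ layer2.weight.T + layer2.bias

# The same computation using the modules directly.
out_modules = layer2(torch.relu(layer1(x)))

print(out_manual.shape)  # torch.Size([4, 2])
print(torch.allclose(out_manual, out_modules, atol=1e-6))
```

The small absolute tolerance allows for floating-point rounding differences between the two code paths.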
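Loss functions quantify how well the model's predictions match the ground truth. Common choices are mean squared error for regression and cross-entropy for classification, each typically averaged over a mini-batch, where \(m\) denotes the batch size.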
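Written out for a batch of \(m\) examples, these are \(L_{\text{MSE}} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)^2\) and \(L_{\text{CE}} = -\frac{1}{m}\sum_{i=1}^{m}\log \hat{p}_{i,y_i}\), where \(\hat{p}_{i,y_i}\) is the softmax probability assigned to the true class of example \(i\). The following sketch, on random data chosen purely for illustration, computes both by hand and compares them with nn.MSELoss and nn.CrossEntropyLoss, which average over the batch by default.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
m = 8  # batch size

# Regression: mean squared error averaged over the m examples.
y_pred = torch.randn(m)
y_true = torch.randn(m)
mse_manual = ((y_pred - y_true) ** 2).sum() / m
mse_builtin = nn.MSELoss()(y_pred, y_true)

# Classification: cross-entropy on raw scores (logits) for 10 classes.
# nn.CrossEntropyLoss applies log-softmax internally, so the model should
# output unnormalized logits, as the MLP in the MNIST example does.
logits = torch.randn(m, 10)
targets = torch.randint(0, 10, (m,))
log_probs = F.log_softmax(logits, dim=1)
ce_manual = -log_probs[torch.arange(m), targets].sum() / m
ce_builtin = nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(mse_manual, mse_builtin, atol=1e-6))
print(torch.allclose(ce_manual, ce_builtin, atol=1e-6))
```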
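Backpropagation efficiently calculates the gradients of the loss with respect to all parameters. Starting from the loss, it applies the chain rule layer by layer, from the output back toward the input, reusing the intermediate values stored during the forward pass.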
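The chain-rule steps can be made concrete with a small sketch. The network (one hidden layer with ReLU), the mean-squared-error loss, and the random data are assumptions chosen for brevity: the gradients are derived by hand from the output backward and then compared with the ones PyTorch's autograd produces via loss.backward().

```python
import torch

torch.manual_seed(0)
m = 4
x = torch.randn(m, 3)
y = torch.randn(m, 1)

W1 = torch.randn(3, 5, requires_grad=True)
b1 = torch.zeros(5, requires_grad=True)
W2 = torch.randn(5, 1, requires_grad=True)
b2 = torch.zeros(1, requires_grad=True)

# Forward pass, keeping the intermediates the backward pass needs.
z1 = x @ W1 + b1
a1 = torch.relu(z1)
y_hat = a1 @ W2 + b2
loss = ((y_hat - y) ** 2).mean()

# Backward pass by hand, applying the chain rule from the output back.
with torch.no_grad():
    d_yhat = 2 * (y_hat - y) / m    # dL/dy_hat
    dW2 = a1.T @ d_yhat             # dL/dW2
    db2 = d_yhat.sum(dim=0)         # dL/db2
    d_a1 = d_yhat @ W2.T            # dL/da1
    d_z1 = d_a1 * (z1 > 0).float()  # ReLU derivative: 1 where z1 > 0, else 0
    dW1 = x.T @ d_z1                # dL/dW1
    db1 = d_z1.sum(dim=0)           # dL/db1

# Autograd computes the same gradients.
loss.backward()
for name, manual, auto in [("W1", dW1, W1.grad), ("b1", db1, b1.grad),
                           ("W2", dW2, W2.grad), ("b2", db2, b2.grad)]:
    print(name, torch.allclose(manual, auto, atol=1e-6))
```

In the MNIST example, the single call loss.backward() performs this entire computation for every parameter in the model.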
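The most common optimization method is gradient descent and its variants. Each step moves the parameters \(\theta\) against the gradient of the loss, \(\theta \leftarrow \theta - \eta \nabla_\theta L\), where \(\eta\) is the learning rate. Common variants include mini-batch stochastic gradient descent (SGD), SGD with momentum, and adaptive methods such as Adam.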
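As an illustration on assumed toy data (a single nn.Linear layer and random inputs), the sketch below computes one plain gradient-descent update by hand and checks that torch.optim.SGD, with no momentum or weight decay, produces the same parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
x = torch.randn(16, 3)
y = torch.randn(16, 1)
lr = 0.1  # learning rate (eta)

# Compute gradients for one mini-batch.
loss = nn.MSELoss()(model(x), y)
loss.backward()

# Gradient-descent step written out by hand: theta <- theta - lr * grad.
expected = [p.detach() - lr * p.grad for p in model.parameters()]

# Plain torch.optim.SGD applies the same update in place.
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
optimizer.step()

for p, e in zip(model.parameters(), expected):
    print(torch.allclose(p.detach(), e))
```

Swapping in optim.Adam, as the MNIST example does, changes only the update rule; the surrounding zero_grad / backward / step loop stays the same.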
To prevent overfitting, common regularization methods include L2 regularization (weight decay), dropout, and early stopping.
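A minimal sketch of two of these methods in PyTorch: nn.Dropout inside a variant of the article's MLP (the class name RegularizedMLP and the dropout rate of 0.5 are illustrative choices), and weight_decay on the optimizer. It also shows why a training script switches between model.train() and model.eval(): dropout is only active in training mode.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)


class RegularizedMLP(nn.Module):
    """Hypothetical variant of the MLP with dropout after the hidden layer."""

    def __init__(self, p_drop=0.5):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=p_drop)  # zeroes random activations in training
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.dropout(self.relu(self.fc1(x)))
        return self.fc2(x)


model = RegularizedMLP()
# weight_decay adds weight_decay * param to each gradient (an L2 penalty).
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

x = torch.randn(2, 1, 28, 28)  # dummy batch shaped like MNIST images

model.train()  # dropout active: two passes on the same input differ
train_differs = not torch.equal(model(x), model(x))

model.eval()   # dropout disabled: outputs are deterministic
eval_same = torch.equal(model(x), model(x))
print(train_differs, eval_same)
```

Note that weight_decay in optim.Adam is implemented as an L2 term added to the gradient; optim.AdamW applies decoupled weight decay instead, which is not equivalent for adaptive optimizers.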
Below is a complete example of implementing a DNN for MNIST digit classification using PyTorch:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
# 1. Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# 2. Data preparation: MNIST handwritten digits
transform = transforms.Compose([
    transforms.ToTensor(),                       # Convert to Tensor, scale pixels to [0, 1]
    transforms.Normalize((0.1307,), (0.3081,)),  # MNIST mean and std
])
train_dataset = datasets.MNIST(root='./data',
                               train=True,
                               transform=transform,
                               download=True)
test_dataset = datasets.MNIST(root='./data',
                              train=False,
                              transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)
# 3. Model definition: a two-layer fully connected network
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # Flatten: batch x 784
        x = self.relu(self.fc1(x))
        x = self.fc2(x)            # Raw logits; CrossEntropyLoss applies log-softmax
        return x
model = MLP().to(device)
# 4. Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
# 5. Training loop
num_epochs = 5
for epoch in range(1, num_epochs + 1):
    model.train()  # Switch to training mode
    running_loss = 0.0
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()              # Clear gradients
        outputs = model(data)              # Forward pass
        loss = criterion(outputs, target)
        loss.backward()                    # Backpropagation
        optimizer.step()                   # Parameter update
        running_loss += loss.item()
        if (batch_idx + 1) % 100 == 0:
            print(f'Epoch [{epoch}/{num_epochs}], '
                  f'Step [{batch_idx+1}/{len(train_loader)}], '
                  f'Loss: {running_loss / 100:.4f}')
            running_loss = 0.0
# 6. Test evaluation
model.eval() # Switch to evaluation mode
correct = 0
total = 0
with torch.no_grad():
    for data, target in test_loader:
        data, target = data.to(device), target.to(device)
        outputs = model(data)
        predicted = outputs.argmax(dim=1)  # Index of the highest logit
        total += target.size(0)
        correct += (predicted == target).sum().item()
print(f'Test Accuracy: {100 * correct / total:.2f}%')
# 7. Save model
torch.save(model.state_dict(), 'mlp_mnist.pth')
print('Model saved to mlp_mnist.pth')
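The script above saves only the model's state_dict. A sketch of the matching load step follows, assuming the same MLP class is available at load time; the untrained `trained` instance and the temporary file path are stand-ins so the snippet runs on its own without the MNIST training run.

```python
import os
import tempfile

import torch
import torch.nn as nn


class MLP(nn.Module):
    # Same architecture as the training script; the module structure must
    # match for load_state_dict to succeed.
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.fc2(self.relu(self.fc1(x)))


trained = MLP()  # stand-in for the trained model
path = os.path.join(tempfile.mkdtemp(), "mlp_mnist.pth")
torch.save(trained.state_dict(), path)

# Later (e.g. at deployment time): rebuild the model, then load the weights.
restored = MLP()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # switch to evaluation mode before inference

x = torch.randn(3, 1, 28, 28)
with torch.no_grad():
    same = torch.equal(trained(x), restored(x))
print(same)
```

map_location="cpu" lets weights saved on a GPU machine load on a CPU-only one.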
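Two details of the example are worth noting. transforms.Normalize((mean,), (std,)) standardizes pixel values by subtracting the dataset mean and dividing by its standard deviation (0.1307 and 0.3081 for MNIST), which helps accelerate convergence. Saving the state_dict() rather than the whole model object stores only the learned parameters, which makes the model easier to load later for deployment.
Several Python libraries are available for implementing DNNs, including PyTorch (used in this article), TensorFlow with its Keras API, and JAX.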
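In summary, Deep Neural Networks learn through forward propagation to compute predictions, a loss function that measures prediction error, backpropagation to compute gradients, and an optimizer such as gradient descent that updates the parameters, with regularization helping the trained model generalize.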
Understanding these principles provides the foundation for working with more advanced architectures like Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers.