This repository contains two generative modeling projects: a CPU-trained GPT language model and GAN-based handwritten digit generators.
- End-to-End CPU-Trained GPT System: A transformer-based language model trained from scratch on the FineWeb-Edu dataset using only CPU resources.
- Handwritten Digit Generation (DCGAN & cGAN): A Deep Convolutional Generative Adversarial Network (DCGAN) and Conditional GAN (cGAN) trained on the MNIST dataset.
Status: Deployed | Demo: https://helixgpt.azurewebsites.net
This project implements a GPT language model trained entirely on CPU hardware. It covers the full pipeline from raw data acquisition to deployment, demonstrating transformer training feasibility on constrained hardware.
- Architecture: Decoder-only Transformer (GPT)
- Parameters: 11.64 Million
- Context Window: 512 tokens
- Tokenizer: Byte-Level BPE (Vocab size: 20,257)
- Training Hardware: Azure Standard E16as v5 (CPU)
- Inference: FastAPI + Docker (Azure Web App)
| Component | Specification | Comparison to GPT-2 Small |
|---|---|---|
| Layers | 8 | ~0.66x |
| Hidden Size | 256 | ~0.33x |
| Attention Heads | 8 | ~0.66x |
| Total Params | 11M | ~0.10x |
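For orientation, a minimal sketch of how these dimensions might be expressed as a config object; the field names follow common GPT implementations and are not necessarily those used in `src/model.py`:

```python
from dataclasses import dataclass

# Illustrative config mirroring the table above; names are assumptions,
# chosen to match widely used GPT codebases.
@dataclass
class GPTConfig:
    block_size: int = 512    # context window
    vocab_size: int = 20257  # byte-level BPE vocabulary
    n_layer: int = 8         # transformer blocks
    n_head: int = 8          # attention heads
    n_embd: int = 256        # hidden size
```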
The model was trained on the FineWeb-Edu 100M dataset (approx. 100 million tokens).
- Acquisition: Raw text retrieval from HuggingFace.
- Tokenization: Custom-trained BPE tokenizer reserving `<|endoftext|>` (ID 0).
- Serialization: Data encoded to `uint16` binary shards (`.bin`) for memory-mapped loading.
- Splitting: 90% Training / 10% Validation.
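A sketch of this pipeline, assuming the HuggingFace `tokenizers` package and a placeholder corpus filename; the repository's own BPE logic lives in `src/tokenizer.py` and may differ in detail:

```python
import numpy as np
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer; the special token listed first takes ID 0
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(files=['fineweb_edu.txt'], vocab_size=20257,
                special_tokens=['<|endoftext|>'])

# Encode and serialize: a 20,257-token vocabulary fits comfortably in uint16
ids = tokenizer.encode(open('fineweb_edu.txt').read()).ids
np.array(ids, dtype=np.uint16).tofile('train.bin')

# Memory-mapped loading pages tokens in on demand instead of
# holding the full ~100M-token corpus in RAM
tokens = np.memmap('train.bin', dtype=np.uint16, mode='r')
```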
- Optimizer: AdamW (Beta1=0.9, Beta2=0.95)
- Schedule: Cosine learning rate decay with 1000-step warmup (see the sketch after this list).
- Batch Size: 8
- Duration: ~9.5 hours (24,413 steps / 1 epoch).
- Loss: Initial ~9.95 | Final ~4.75.
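A sketch of the warmup-plus-cosine schedule described above; the peak and floor learning rates are left as parameters, since the repository's values are not stated here:

```python
import math

def get_lr(step, max_lr, min_lr, warmup_steps=1000, max_steps=24413):
    # Linear warmup over the first 1000 steps
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay from max_lr down to min_lr over the remaining steps
    progress = min((step - warmup_steps) / (max_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```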
```
End-to-End CPU-Trained GPT System/
├── app/                  # Azure deployment files
├── bpe_tokenizer/        # BPE vocab and merges
├── checkpoints/          # Model weights (.pt)
├── src/
│   ├── data.py           # Binary data loader
│   ├── model.py          # GPT architecture definition
│   ├── tokenizer.py      # BPE logic
│   └── utils.py          # Configuration utilities
├── train_gpt2.py         # Main training loop
└── prepare_dataset.py    # Data processing script
```
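The `app/` directory wraps the trained checkpoint in the FastAPI service mentioned above. A hypothetical sketch of such an endpoint; the route, request schema, and `run_model` stub are illustrative, not the repository's actual API:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64

def run_model(prompt: str, max_new_tokens: int) -> str:
    # Stand-in for loading checkpoints/*.pt and sampling from the
    # GPT model defined in src/model.py
    return prompt

@app.post('/generate')
def generate(req: GenerateRequest):
    return {'completion': run_model(req.prompt, req.max_new_tokens)}
```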
This project implements Generative Adversarial Networks to synthesize handwritten digits resembling the MNIST dataset. It includes two distinct architectures: a standard DCGAN for random generation and a Conditional GAN (cGAN) for targeted digit generation.
The DCGAN generates random digit images by learning to map a latent noise distribution onto the distribution of the training images.
Generator Architecture:
- Input: Random noise vector (100 dimensions).
- Dense Layer: Projects noise to 7x7x256 feature map.
- Upsampling: 3x Conv2DTranspose layers with Batch Normalization and LeakyReLU.
- Output: 28x28x1 image (Tanh activation).
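A Keras sketch of this generator, mirroring the layer list above; the intermediate filter counts (128, 64) and the 5x5 kernels are assumptions:

```python
from tensorflow.keras import Sequential, layers

def build_generator():
    return Sequential([
        # Project the 100-dim noise vector to a 7x7x256 feature map
        layers.Dense(7 * 7 * 256, use_bias=False, input_shape=(100,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Reshape((7, 7, 256)),
        # Upsample: 7x7 -> 7x7 -> 14x14 -> 28x28
        layers.Conv2DTranspose(128, 5, strides=1, padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(64, 5, strides=2, padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # Tanh output scales pixels to [-1, 1] for a 28x28x1 image
        layers.Conv2DTranspose(1, 5, strides=2, padding='same',
                               use_bias=False, activation='tanh'),
    ])
```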
Discriminator Architecture:
- Input: 28x28x1 image.
- Downsampling: 2x Conv2D layers (Strides=2) with LeakyReLU and Dropout (0.3).
- Output: Binary classification (Real vs. Fake).
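A matching sketch of the discriminator; filter counts and kernel size are again assumptions:

```python
from tensorflow.keras import Sequential, layers

def build_discriminator():
    return Sequential([
        # Two strided convolutions downsample 28x28 -> 14x14 -> 7x7
        layers.Conv2D(64, 5, strides=2, padding='same', input_shape=(28, 28, 1)),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Conv2D(128, 5, strides=2, padding='same'),
        layers.LeakyReLU(),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(1),  # raw logit: real vs. fake
    ])
```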
Training Hyperparameters:
- Epochs: 1000
- Batch Size: 256
- Optimizer: Adam (Learning Rate: 0.0002, Beta1: 0.5)
- Loss Function: Binary Crossentropy
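Under these hyperparameters, the standard GAN objectives look like the following; `from_logits=True` assumes the discriminator's final layer emits a raw logit, as in the sketch above:

```python
import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    # Real images should score 1, generated images 0
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    # The generator wants its fakes classified as real (1)
    return cross_entropy(tf.ones_like(fake_output), fake_output)

generator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
```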
The cGAN extends the architecture by conditioning both the generator and the discriminator on class labels (0-9), allowing targeted generation of a specific digit class.
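One common way to implement this conditioning (an assumption about the notebook's exact layout) is to embed the label and merge it with the noise vector before upsampling:

```python
from tensorflow.keras import Model, layers

def build_cgan_generator(latent_dim=100, num_classes=10):
    noise = layers.Input(shape=(latent_dim,))
    label = layers.Input(shape=(1,), dtype='int32')
    # Embed the class label (0-9) into a vector the same size as the noise
    label_vec = layers.Flatten()(layers.Embedding(num_classes, latent_dim)(label))
    # Conditioning: concatenate noise and label embedding
    x = layers.Concatenate()([noise, label_vec])
    x = layers.Dense(7 * 7 * 256, use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Reshape((7, 7, 256))(x)
    x = layers.Conv2DTranspose(64, 5, strides=2, padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    img = layers.Conv2DTranspose(1, 5, strides=2, padding='same', activation='tanh')(x)
    return Model([noise, label], img)
```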
Environment Setup:

```bash
pip install tensorflow imageio tensorflow-docs
```

Generating Random Digits (DCGAN):
Load the trained model `dcgan_generator.keras` and pass a noise vector.

```python
import tensorflow as tf
import matplotlib.pyplot as plt

# Load the saved generator and sample one image from random noise
model = tf.keras.models.load_model('dcgan_generator.keras')
noise = tf.random.normal([1, 100])  # 100-dim latent vector
prediction = model(noise, training=False)

# The generator outputs a 28x28x1 image in [-1, 1] (tanh)
plt.imshow(prediction[0, :, :, 0], cmap='gray')
plt.show()
```

Generating Specific Digits (cGAN):
Load the trained model `cgan_generator.keras` and pass both noise and the target label.

```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Load the saved conditional generator
model = tf.keras.models.load_model('cgan_generator.keras')
noise = tf.random.normal([1, 100])
label = np.array([7])  # Specify digit here (0-9)

# Condition generation on the label to produce the requested digit
prediction = model([noise, label], training=False)
plt.imshow(prediction[0, :, :, 0], cmap='gray')
plt.show()
```

```
Handwritten Digits Generator/
├── generate_handwritten_digit_images_DCGAN.ipynb  # Training notebook
├── dcgan_generator.keras                          # Saved DCGAN model
├── cgan_generator.keras                           # Saved cGAN model
└── training_checkpoints/                          # Training artifacts
```