This repository was archived by the owner on Jun 23, 2025. It is now read-only.

Commit 21e3ce2

Archiving notice
1 parent 2d5b21b commit 21e3ce2

1 file changed: +32 -172 lines changed

README.md

Lines changed: 32 additions & 172 deletions
@@ -1,172 +1,32 @@
-# AI on GKE Assets
-
-This repository contains assets related to AI/ML workloads on
-[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine/docs/integrations/ai-infra).
-
-## Overview
-
-Run optimized AI/ML workloads with Google Kubernetes Engine (GKE) platform orchestration capabilities. A robust AI/ML platform considers the following layers:
-
-- Infrastructure orchestration that support GPUs and TPUs for training and serving workloads at scale
-- Flexible integration with distributed computing and data processing frameworks
-- Support for multiple teams on the same infrastructure to maximize utilization of resources
-
-## Infrastructure
-
-The AI-on-GKE application modules assumes you already have a functional GKE cluster. If not, follow the instructions under [infrastructure/README.md](./infrastructure/README.md) to install a Standard or Autopilot GKE cluster.
-
-```bash
-.
-├── LICENSE
-├── README.md
-├── infrastructure
-│ ├── README.md
-│ ├── backend.tf
-│ ├── main.tf
-│ ├── outputs.tf
-│ ├── platform.tfvars
-│ ├── variables.tf
-│ └── versions.tf
-├── modules
-│ ├── gke-autopilot-private-cluster
-│ ├── gke-autopilot-public-cluster
-│ ├── gke-standard-private-cluster
-│ ├── gke-standard-public-cluster
-│ ├── jupyter
-│ ├── jupyter_iap
-│ ├── jupyter_service_accounts
-│ ├── kuberay-cluster
-│ ├── kuberay-logging
-│ ├── kuberay-monitoring
-│ ├── kuberay-operator
-│ └── kuberay-serviceaccounts
-└── tutorial.md
-```
-
-To deploy new GKE cluster update the `platform.tfvars` file with the appropriate values and then execute below terraform commands:
-```
-terraform init
-terraform apply -var-file platform.tfvars
-```
-
-
-## Applications
-
-The repo structure looks like this:
-
-```bash
-.
-├── LICENSE
-├── Makefile
-├── README.md
-├── applications
-│ ├── jupyter
-│ └── ray
-├── contributing.md
-├── dcgm-on-gke
-│ ├── grafana
-│ └── quickstart
-├── gke-a100-jax
-│ ├── Dockerfile
-│ ├── README.md
-│ ├── build_push_container.sh
-│ ├── kubernetes
-│ └── train.py
-├── gke-batch-refarch
-│ ├── 01_gke
-│ ├── 02_platform
-│ ├── 03_low_priority
-│ ├── 04_high_priority
-│ ├── 05_compact_placement
-│ ├── 06_jobset
-│ ├── Dockerfile
-│ ├── README.md
-│ ├── cloudbuild-create.yaml
-│ ├── cloudbuild-destroy.yaml
-│ ├── create-platform.sh
-│ ├── destroy-platform.sh
-│ └── images
-├── gke-disk-image-builder
-│ ├── README.md
-│ ├── cli
-│ ├── go.mod
-│ ├── go.sum
-│ ├── imager.go
-│ └── script
-├── gke-dws-examples
-│ ├── README.md
-│ ├── dws-queues.yaml
-│ ├── job.yaml
-│ └── kueue-manifests.yaml
-├── gke-online-serving-single-gpu
-│ ├── README.md
-│ └── src
-├── gke-tpu-examples
-│ ├── single-host-inference
-│ └── training
-├── indexed-job
-│ ├── Dockerfile
-│ ├── README.md
-│ └── mnist.py
-├── jobset
-│ └── pytorch
-├── modules
-│ ├── gke-autopilot-private-cluster
-│ ├── gke-autopilot-public-cluster
-│ ├── gke-standard-private-cluster
-│ ├── gke-standard-public-cluster
-│ ├── jupyter
-│ ├── jupyter_iap
-│ ├── jupyter_service_accounts
-│ ├── kuberay-cluster
-│ ├── kuberay-logging
-│ ├── kuberay-monitoring
-│ ├── kuberay-operator
-│ └── kuberay-serviceaccounts
-├── saxml-on-gke
-│ ├── httpserver
-│ └── single-host-inference
-├── training-single-gpu
-│ ├── README.md
-│ ├── data
-│ └── src
-├── tutorial.md
-└── tutorials
-  ├── e2e-genai-langchain-app
-  ├── finetuning-llama-7b-on-l4
-  └── serving-llama2-70b-on-l4-gpus
-```
-
-
-### Jupyter Hub
-
-This repository contains a Terraform template for running JupyterHub on Google Kubernetes Engine. We've also included some example notebooks ( under `applications/ray/example_notebooks`), including one that serves a GPT-J-6B model with Ray AIR (see here for the original notebook). To run these, follow the instructions at [applications/ray/README.md](./applications/ray/README.md) to install a Ray cluster.
-
-This jupyter module deploys the following resources, once per user:
-- JupyterHub deployment
-- User namespace
-- Kubernetes service accounts
-
-Learn more [about JupyterHub on GKE here](./applications/jupyter/README.md)
-
-### Ray
-
-This repository contains a Terraform template for running Ray on Google Kubernetes Engine.
-
-This module deploys the following, once per user:
-- User namespace
-- Kubernetes service accounts
-- Kuberay cluster
-- Prometheus monitoring
-- Logging container
-
-Learn more [about Ray on GKE here](./applications/ray/README.md)
-
-## Important Considerations
-- Make sure to configure terraform backend to use GCS bucket, in order to persist terraform state across different environments.
-
-
-## Licensing
-
-* The use of the assets contained in this repository is subject to compliance with [Google's AI Principles](https://ai.google/responsibility/principles/)
-* See [LICENSE](/LICENSE)
+# AI on GKE (Archived)
+
+>[!WARNING]
+>This repository has been archived to preserve its contents and is no **longer actively maintained**. It is now **read-only**, meaning no further changes or contributions can be made.
+>
+> You can still freely browse all files, commit history, and issues. Please note that most of this repository's content has been migrated to new repositories under the [AI on GKE GitHub Organization](https://github.com/ai-on-gke).
+
+## Content Migration Update
+
+All content, including open PRs and Issues has been successfully migrated and updated! You can now find everything in the new repositories within the [AI on GKE GitHub Organization](https://github.com/ai-on-gke) and on the [GKE AI Labs website](https://gke-ai-labs.dev).
+
+#### Looking for Older Content?
+If you're searching for a previous folder or guide, start by checking the main README.md file of the specific folder you're looking for. This file should include a direct link to where the code was migrated. If you can't find it there, please refer to the table below for overall guidance.
+
+#### Repository Migration Table
+Below is a breakdown of how the content from older folders has been migrated to new repositories within the [AI on GKE GitHub Organization](https://github.com/ai-on-gke). This table will help you locate content that has been moved or updated.
+
+| Original ai-on-gke folder | New Repository |
+| :----------| :--------------------- |
+| `benchmarks` | [scalability-benchmarks](https://github.com/ai-on-gke/scalability-benchmarks) |
+| `gke-batch-refarch` | [batch-reference-architecture](https://github.com/ai-on-gke/batch-reference-architecture) |
+| `ml-platform` | [GoogleCloudPlatform/accelerated-platforms](https://github.com/GoogleCloudPlatform/accelerated-platforms) |
+| `tutorials-and-examples/nvidia*` | [nvidia-ai-solutions](https://github.com/ai-on-gke/nvidia-ai-solutions) |
+| `applications` | [quick-start-guides](https://github.com/ai-on-gke/quick-start-guides) |
+| `ray-on-gke` | [quick-start-guides](https://github.com/ai-on-gke/quick-start-guides) |
+| `slurm-on-gke` | [slurm-on-gke](https://github.com/ai-on-gke/slurm-on-gke) |
+| `tools` | [tools](https://github.com/ai-on-gke/tools) |
+| `tpu-provisioner` | [tpu-provisioner](https://github.com/ai-on-gke/tpu-provisioner) |
+| `tutorials-and-examples` | [tutorials-and-examples](https://github.com/ai-on-gke/tutorials-and-examples) |
+| `ray-on-gke\tpu\kuberay-tpu-webhook` | [kuberay-tpu-webhook](https://github.com/ai-on-gke/kuberay-tpu-webhook) |
+| `modules`, `scripts`, `charts` | [common-infra](https://github.com/ai-on-gke/common-infra) |
+| `website` | [website](https://github.com/ai-on-gke/website) |
