
Athena

Athena is a learned optimizer enhancer, proposed to enhance the query optimizer of a DBMS. It consists of three key designs: (i) an order-centric plan explorer, (ii) a Tree-Mamba plan comparator, and (iii) a time-weighted loss function.

  • Order-centric plan explorer is implemented in Athena_PG.

  • Tree-Mamba plan comparator is implemented in athena_model.py.

  • Time-weighted loss function is implemented in train.py as the function tailr_loss_with_logits.
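The exact form of the time-weighted loss lives in train.py. Purely as an illustration, the sketch below shows one way a time-weighted comparison loss over plan logits could look; the function name, signature, and weighting scheme here are assumptions and are not taken from the actual tailr_loss_with_logits implementation.

import torch
import torch.nn.functional as F

def time_weighted_loss_with_logits(logits, labels, exec_times):
    """Hypothetical sketch of a time-weighted pairwise loss over plan logits.

    logits     : (batch,) comparator scores for plan pairs
    labels     : (batch,) 1.0 if the first plan of the pair is faster, else 0.0
    exec_times : (batch,) execution time associated with each pair
    """
    # per-pair binary cross-entropy on the comparator's raw logits
    per_pair = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
    # weight each pair by its normalized execution time, so mispredictions
    # on long-running plans contribute more to the total loss
    weights = exec_times / (exec_times.sum() + 1e-8)
    return (weights * per_pair).sum()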

To ensure usability and reusability, we provide the following two deployment approaches: (1) Docker and (2) manual step-by-step deployment.

Method 1: Run in Docker

  1. Install Docker and Docker Compose.

    For example, on Ubuntu, please refer to the following link:

    Install Docker using the apt repository

    Note that if you run into network problems and have an HTTP proxy at http://<ip>:<port> (for example, http://127.0.0.1:7890), you can try the following commands. Remember to replace <ip> and <port> below with your actual values.

    export HTTP_PROXY=http://<ip>:<port>
    export HTTPS_PROXY=$HTTP_PROXY
    # create the systemd drop-in directory if it does not exist yet
    sudo mkdir -p /etc/systemd/system/docker.service.d
    cat <<EOF | sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf > /dev/null
    [Service]
    Environment="HTTP_PROXY=$HTTP_PROXY"
    Environment="HTTPS_PROXY=$HTTP_PROXY"
    Environment="NO_PROXY=localhost,127.0.0.1"
    EOF
    # reload systemd and restart the Docker daemon so the proxy settings take effect
    sudo systemctl daemon-reload
    sudo systemctl restart docker

    Then add the following two lines after the line - POSTGRES_PASSWORD=postgres in ./docker-compose.yml:

       - HTTP_PROXY=http://<ip>:<port>
       - HTTPS_PROXY=http://<ip>:<port>
    
  2. Download datasets

    • IMDB

    Download the imdb.tgz to ./postgres_service

  3. Generate configuration for PostgreSQL.

    Go to PGTune, generate a configuration suited to your machine, and save it to ./postgres_service/postgresql.conf.

  4. Train our model in Docker

    docker compose up --build
  5. Run other commands

    Refer to the other commands in the Run section below. In docker-compose.yml, replace the command property of the athena service with the command you want, and then run docker compose up --build again.

Method 2: Manual deployment

I. Dependencies

- Python dependencies

uv pip install -r requirements.txt

# install the Tree-Mamba package (https://github.com/a858438680/TreeMamba.git)
git clone https://github.com/a858438680/tree_mamba.git
uv pip install --no-build-isolation ./python

- Postgres extensions

There is no need to install these extensions here; guidelines are provided when installing Athena_PG.

II. Data

Athena supports four benchmarks: JOB, STATS-CEB, TPC-DS, and DSB.

We have included their workloads in the dir workloads/, where the training and test queries of each benchmark are placed in the dirs xxx and xxx-sample respectively (e.g., JOB and JOB-sample).
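For example, assuming each query is stored as an individual .sql file (an assumption made only for this sketch), you can quickly compare the JOB training and test splits:

from pathlib import Path

# training and test queries for JOB live in workloads/JOB and workloads/JOB-sample
train_dir = Path("workloads/JOB")
test_dir = Path("workloads/JOB-sample")

# the ".sql" extension is assumed here; adjust if the queries are stored differently
print("training queries:", len(list(train_dir.glob("*.sql"))))
print("test queries:", len(list(test_dir.glob("*.sql"))))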

III. Run

Install Athena_PG before the following steps.

1. Prepare plans for model training

# generate plans for JOB queries as an example
python -u src/Athena/generate_dataset.py --database imdb --workload JOB

This code employs Athena's plan explorer to enumerate candidate plans for the input queries, executes them, and stores the results in the dir datasets/ as training samples. The dir datasets/ is organized as datasets/{database}/{workload}/{plan-generation-method}/.

We also support generating training samples for Bao, Lero, and Kepler; for example, for Bao, check the file.
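As a quick sanity check after this step, a sketch like the following can confirm that samples landed in the expected directory. The directory pattern is the one documented above (the method directory name Athena is inferred from the test command in step 3.2), while the file names and formats inside it are not specified here and are treated as opaque.

from pathlib import Path

# layout documented above: datasets/{database}/{workload}/{plan-generation-method}/
dataset_dir = Path("datasets") / "imdb" / "JOB" / "Athena"

# list whatever sample files were produced (their format is project-specific)
files = sorted(p for p in dataset_dir.iterdir() if p.is_file())
print(f"{dataset_dir}: {len(files)} files")
for p in files[:5]:
    print("  ", p.name)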

2. Train the Tree-Mamba plan comparator

# set the environment variable required by torch deterministic algorithms
export CUBLAS_WORKSPACE_CONFIG=:4096:8

# train a Tree-Mamba plan comparator for JOB queries as an example
python scripts/run_experiments.py --models athena --workloads JOB --methods athena

This code will train a Tree-Mamba plan comparator based on the training samples collected in the last step and store the model in the dir models/{model}/{workload}/{plan-generation-method}/{random-seed}/models/. For this example, the dir storing the model is models/athena/JOB/athena/3407/models/.
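For context, CUBLAS_WORKSPACE_CONFIG is the variable PyTorch requires when deterministic algorithms are enabled together with cuBLAS on GPU. A minimal sketch of the corresponding setup inside a training script, using the random seed 3407 mentioned above, might look as follows; the actual calls made by scripts/run_experiments.py may differ.

import os
import random

import numpy as np
import torch

# must be set before the first cuBLAS call; required when
# torch.use_deterministic_algorithms(True) is active on CUDA
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

def seed_everything(seed: int = 3407):
    """Seed all relevant RNGs and enable deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.use_deterministic_algorithms(True)

seed_everything()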

3. Test the model

This step tests the model, using JOB as an example.

  • 3.1 Generate plans for test queries
python -u src/Athena/generate_dataset.py --database imdb --workload JOB-sample --generate_candidate_plans

With the option --generate_candidate_plans, the code only generates candidate plans for the queries and does not execute them. The generated plans are also stored in the dir datasets/.

  • 3.2 Test model
python src/Athena/test.py --dataset imdb/JOB-sample/Athena --model models/athena/JOB/athena/3407/models/199.pt

This code employs the model to select a plan from the candidate plans of each query and stores the indexes of the selected plans as results/{database}/{workload}/{plan-generation-method}/{model}.json; for this example, the file is results/imdb/JOB-sample/Athena/Athena.json (a sketch for inspecting this file follows step 3.3).

  • 3.3 Execute the selected plans in the last step
python -u scripts/test_model.py --database=imdb --workload=JOB-sample --plan_generation_method=Athena --model=Athena --result_path=Athena

This code loads the indexes of the plans selected in the last step, executes the corresponding plans, and stores the execution results in the dir records/. On subsequent runs, it loads the stored records directly instead of executing the queries again.

We have stored the model selection and execution records for Athena on JOB-sample, so you can run step 3.3 directly without running 3.1 and 3.2.
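To inspect the plan choices made in step 3.2, the result file can be loaded as plain JSON. The sketch below assumes (without checking the actual format) that it maps query identifiers, or list positions, to the index of the selected candidate plan.

import json
from pathlib import Path

# file produced in step 3.2
result_path = Path("results/imdb/JOB-sample/Athena/Athena.json")

with result_path.open() as f:
    selected = json.load(f)

# assumed structure: either {query_id: plan_index} or a plain list of indexes
entries = selected.items() if isinstance(selected, dict) else enumerate(selected)
for query_id, plan_idx in list(entries)[:5]:
    print(query_id, "->", plan_idx)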
