-
Notifications
You must be signed in to change notification settings - Fork 7
Description
Build DNA methylation database schema and tables using results generated from d3b-center/OpenPedCan-analysis#165. According to @ewafula, the results in d3b-center/OpenPedCan-analysis#165 are still under preparation.
Following is a brief introduction on how to build the DNA methylation database schema and tables. The implementation of these procedures may require further discussions and modifications, to coordinate with the development of DNA methylation API plot and table endpoints.
Add a new DNA methylation database schema to store all DNA methylation tables, to avoid table name collisions with existing tables in bulk_expression schema. The DNA methylation schema can have any appropriate name, and the following description uses dna_methylation as an example schema name. Following are the steps to add in a new dna_methylation schema:
-
Add
DNA_METHYLATION_SCHEMA=dna_methylationin../OpenPedCan-api-secrets/common_db.env. -
Add
ENV DNA_METHYLATION_SCHEMA="dna_methylation"inDockerfile. -
Add the following code in
db/init_db.sh:CREATE SCHEMA ${DNA_METHYLATION_SCHEMA}; GRANT SELECT ON ALL TABLES IN SCHEMA ${DNA_METHYLATION_SCHEMA} TO ${DB_USERNAME}; ALTER DEFAULT PRIVILEGES IN SCHEMA ${DNA_METHYLATION_SCHEMA} GRANT SELECT ON TABLES TO ${DB_USERNAME};
-
Add
DNA_METHYLATION_SCHEMA = "DNA_METHYLATION_SCHEMA"indb/r_interfaces/db_env_vars.R. -
Add the following code in
db/load_db.sh:GRANT USAGE ON SCHEMA ${DNA_METHYLATION_SCHEMA} TO ${DB_USERNAME}; GRANT SELECT ON ALL TABLES IN SCHEMA ${DNA_METHYLATION_SCHEMA} TO ${DB_USERNAME}; ALTER DEFAULT PRIVILEGES IN SCHEMA ${DNA_METHYLATION_SCHEMA} GRANT SELECT ON TABLES TO ${DB_USERNAME};
-
Add
--schema="$DNA_METHYLATION_SCHEMA"indb/build_tools/build_db_docker_cmd.shto includeDNA_METHYLATION_SCHEMAin the database dump.
Add new DNA methylation tables to store DNA methylation results that are required to generate plots and tables for MTP. The DNA methylation tables can have any appropriate names, and the following description uses dna_methylation_summary as an example table name. Following are the steps to add a new dna_methylation_summary table:
-
Add
DNA_METHYLATION_SUMMARY_TBL=dna_methylation_summaryin../OpenPedCan-api-secrets/common_db.env. -
Add
ENV DNA_METHYLATION_SUMMARY_TBL="dna_methylation_summary"inDockerfile. -
Add
DNA_METHYLATION_SUMMARY_TBL = "DNA_METHYLATION_SUMMARY_TBL"indb/r_interfaces/db_env_vars.R. -
Create an empty table in
db/build_tools/build_db.R:# dna_methylation_summary_tbl is assumed to be the table for loading into the database. # # dna_methylation_summary_tbl does not have to be the full table, and it can only have zero or a few lines. db_write_table( dplyr::slice(dna_methylation_summary_tbl, 0), conn, tolower(db_env_vars$DNA_METHYLATION_SCHEMA), tolower(db_env_vars$DNA_METHYLATION_SUMMARY_TBL))
-
Prepare a CSV file for loading into the database. The CSV files can either be prepared in methylation summary module d3b-center/OpenPedCan-analysis#165 or
db/build_tools/build_db.R.- If prepared in methylation summary module d3b-center/OpenPedCan-analysis#165:
- Upload the prepared CSV file to a S3 bucket as new OpenPedCan-analysis data release.
- The prepared CSV file will be downloaded to
OpenPedCan-analysis/data/bydb/build_db.sh.
- If prepared in
db/build_tools/build_db.R:- Upload a result file to a S3 bucket as new OpenPedCan-analysis data release. The result file can be in any format, which needs further preparation before loading into the database.
- The result file will be downloaded to
OpenPedCan-analysis/data/bydb/build_db.sh. - Load the result file in
db/build_tools/build_db.R. - Prepare the table for database loading, such as filtering, renaming columns, and reordering columns.
- Output the prepared table as CSV file like the procedure in
db/build_tools/build_db.R.
- If prepared in methylation summary module d3b-center/OpenPedCan-analysis#165:
-
Add the following code in
db/build_tools/build_db_docker_cmd.shto load the prepared CSV file to the database:# Assuming the prepared CSV file is at ${BUILD_OUTPUT_DIR_PATH}/${DNA_METHYLATION_SCHEMA}_${DNA_METHYLATION_SUMMARY_TBL}.csv psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$DB_NAME" <<EOSQL COPY ${DNA_METHYLATION_SCHEMA}.${DNA_METHYLATION_SUMMARY_TBL} FROM '${BUILD_OUTPUT_DIR_PATH}/${DNA_METHYLATION_SCHEMA}_${DNA_METHYLATION_SUMMARY_TBL}.csv' WITH (FORMAT csv, HEADER); EOSQL
-
Add the following code in
db/build_tools/build_db_docker_cmd.shto add a table index on the column that will be queried:echo "CREATE INDEX ensg_id_idx ON ${DNA_METHYLATION_SCHEMA}.${DNA_METHYLATION_SUMMARY_TBL} (\"Gene_Ensembl_ID\");" \ | gzip --no-name -c >> "$db_dump_out_path"
The updated database building procedure can be tested using ./db/build_db.sh.