-
Notifications
You must be signed in to change notification settings - Fork 7
Description
Build CNV database schema and tables with the correct structure for the Evidence Plot proposed in d3b-center/ticket-tracker-OPC#288. This ticket is heavily based off the thorough example @logstar wrote for building the methylation database schema in PedatricOpenTargets/OpenPedCan-api#69.
Adding a CNV Database Schema
Add new copy number variant (CNV) database schema, using any appropriate name, to store all CNV tables and to avoid table name collisions with existing tables in bulk_expression schema. Following are the steps to add in a new dna_methylation schema:
-
Add
CNV_SCHEMA=cnvin../OpenPedCan-api-secrets/common_db.env. -
Add
ENV CNV_SCHEMA ="cnv"inDockerfile. -
Add the following code in
db/init_db.sh:CREATE SCHEMA ${CNV}; GRANT SELECT ON ALL TABLES IN SCHEMA ${CNV} TO ${DB_USERNAME}; ALTER DEFAULT PRIVILEGES IN SCHEMA ${CNV} GRANT SELECT ON TABLES TO ${DB_USERNAME}; -
Add
CNV_SCHEMA = "CNV_SCHEMA"indb/r_interfaces/db_env_vars.R. -
Add the following code in
db/load_db.sh:GRANT USAGE ON SCHEMA ${DNA_METHYLATION_SCHEMA} TO ${DB_USERNAME}; GRANT SELECT ON ALL TABLES IN SCHEMA ${DNA_METHYLATION_SCHEMA} TO ${DB_USERNAME}; ALTER DEFAULT PRIVILEGES IN SCHEMA ${DNA_METHYLATION_SCHEMA} GRANT SELECT ON TABLES TO ${DB_USERNAME}; -
Add
--schema="$CNV_SCHEMA"indb/build_tools/build_db_docker_cmd.shto includeCNV_SCHEMAin the database dump.
Adding CNV Tables
Add new CNV tables to store CNV results that are required to generate plots and tables for MTP. The DNA methylation tables can have any appropriate names, and the following description uses dna_methylation_summary as an example table name. Following are the steps to add a new dna_methylation_summary table:
-
Add
CNV_EVIDENCE_SUMMARY_TBL=cnv_evidence_summaryin../OpenPedCan-api-secrets/common_db.env. -
Add
ENV CNV_EVIDENCE_SUMMARY_TBL="cnv_evidence_summary"inDockerfile. -
Add
CNV_EVIDENCE_SUMMARY_TBL = "CNV_EVIDENCE_SUMMARY_TBL"indb/r_interfaces/db_env_vars.R. -
Create a minimal table for testing in
db/build_tools/build_db.R:# cnv_evidence_summary_tbl is assumed to be the table for loading into the database. # # cnv_evidence_summary_tbl does not have to be the full table, and it can only have zero or a few lines. db_write_table( dplyr::slice(cnv_evidence_summary_tbl, 0), conn, tolower(db_env_vars$CNV_SCHEMA), tolower(db_env_vars$CNV_EVIDENCE_SUMMARY_TBL)) -
Prepare a CSV file for loading into the database. Since the file(s) in
OpenPedCan-analysisare not correctly formatted, the file will need to be created usingdb/build_tools/build_db.R. Since the CNV plots will add at least 3 tables to the database, may want to conside splittingbuild_db.Rinto multiple scripts in the future.- Upload a result file to a S3 bucket as new OpenPedCan-analysis data release. The result file can be in any format, which needs further preparation before loading into the database.
- The result file will be downloaded to
OpenPedCan-analysis/data/bydb/build_db.sh. - Load the result file in
db/build_tools/build_db.R. - Prepare the table for database loading, such as filtering, renaming columns, and reordering columns.
- Output the prepared table as CSV file like the procedure in
db/build_tools/build_db.R.
-
Add the following code in
db/build_tools/build_db_docker_cmd.shto load the prepared CSV file to the database:# Assuming the prepared CSV file is at ${BUILD_OUTPUT_DIR_PATH}/${CNV_SCHEMA}_${CNV_EVIDENCE_SUMMARY_TBL}.csv psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$DB_NAME" <<EOSQL COPY ${CNV_SCHEMA}.${CNV_EVIDENCE_SUMMARY_TBL} FROM '${BUILD_OUTPUT_DIR_PATH}/${CNV_SCHEMA}_${CNV_EVIDENCE_SUMMARY_TBL}.csv' WITH (FORMAT csv, HEADER); EOSQL -
Add the following code in db/build_tools/build_db_docker_cmd.sh to add a table index on the column that will be queried:
echo "CREATE INDEX ensg_id_idx ON ${CNV_SCHEMA}.${CNV_EVIDENCE_SUMMARY_TBL} (\"Gene_Ensembl_ID\");" \ | gzip --no-name -c >> "$db_dump_out_path"
The updated database building procedure can be tested using ./db/build_db.sh.