Skip to content

Build DNA methylation database schema and tables #69

@logstar

Description

@logstar

Build DNA methylation database schema and tables using results generated from d3b-center/OpenPedCan-analysis#165. According to @ewafula, the results in d3b-center/OpenPedCan-analysis#165 are still under preparation.

Following is a brief introduction on how to build the DNA methylation database schema and tables. The implementation of these procedures may require further discussions and modifications, to coordinate with the development of DNA methylation API plot and table endpoints.

Add a new DNA methylation database schema to store all DNA methylation tables, to avoid table name collisions with existing tables in bulk_expression schema. The DNA methylation schema can have any appropriate name, and the following description uses dna_methylation as an example schema name. Following are the steps to add in a new dna_methylation schema:

  • Add DNA_METHYLATION_SCHEMA=dna_methylation in ../OpenPedCan-api-secrets/common_db.env.

  • Add ENV DNA_METHYLATION_SCHEMA="dna_methylation" in Dockerfile.

  • Add the following code in db/init_db.sh:

    CREATE SCHEMA ${DNA_METHYLATION_SCHEMA};
    GRANT SELECT ON ALL TABLES IN SCHEMA ${DNA_METHYLATION_SCHEMA} TO ${DB_USERNAME};
    ALTER DEFAULT PRIVILEGES IN SCHEMA ${DNA_METHYLATION_SCHEMA} GRANT SELECT ON TABLES TO ${DB_USERNAME};
  • Add DNA_METHYLATION_SCHEMA = "DNA_METHYLATION_SCHEMA" in db/r_interfaces/db_env_vars.R.

  • Add the following code in db/load_db.sh:

    GRANT USAGE ON SCHEMA ${DNA_METHYLATION_SCHEMA} TO ${DB_USERNAME};
    GRANT SELECT ON ALL TABLES IN SCHEMA ${DNA_METHYLATION_SCHEMA} TO ${DB_USERNAME};
    ALTER DEFAULT PRIVILEGES IN SCHEMA ${DNA_METHYLATION_SCHEMA} GRANT SELECT ON TABLES TO ${DB_USERNAME};
  • Add --schema="$DNA_METHYLATION_SCHEMA" in db/build_tools/build_db_docker_cmd.sh to include DNA_METHYLATION_SCHEMA in the database dump.

Add new DNA methylation tables to store DNA methylation results that are required to generate plots and tables for MTP. The DNA methylation tables can have any appropriate names, and the following description uses dna_methylation_summary as an example table name. Following are the steps to add a new dna_methylation_summary table:

  • Add DNA_METHYLATION_SUMMARY_TBL=dna_methylation_summary in ../OpenPedCan-api-secrets/common_db.env.

  • Add ENV DNA_METHYLATION_SUMMARY_TBL="dna_methylation_summary" in Dockerfile.

  • Add DNA_METHYLATION_SUMMARY_TBL = "DNA_METHYLATION_SUMMARY_TBL" in db/r_interfaces/db_env_vars.R.

  • Create an empty table in db/build_tools/build_db.R:

    # dna_methylation_summary_tbl is assumed to be the table for loading into the database.
    #
    # dna_methylation_summary_tbl does not have to be the full table, and it can only have zero or a few lines.
    db_write_table(
      dplyr::slice(dna_methylation_summary_tbl, 0), conn,
      tolower(db_env_vars$DNA_METHYLATION_SCHEMA),
      tolower(db_env_vars$DNA_METHYLATION_SUMMARY_TBL))
  • Prepare a CSV file for loading into the database. The CSV files can either be prepared in methylation summary module d3b-center/OpenPedCan-analysis#165 or db/build_tools/build_db.R.

    • If prepared in methylation summary module d3b-center/OpenPedCan-analysis#165:
      • Upload the prepared CSV file to a S3 bucket as new OpenPedCan-analysis data release.
      • The prepared CSV file will be downloaded to OpenPedCan-analysis/data/ by db/build_db.sh.
    • If prepared in db/build_tools/build_db.R:
      • Upload a result file to a S3 bucket as new OpenPedCan-analysis data release. The result file can be in any format, which needs further preparation before loading into the database.
      • The result file will be downloaded to OpenPedCan-analysis/data/ by db/build_db.sh.
      • Load the result file in db/build_tools/build_db.R.
      • Prepare the table for database loading, such as filtering, renaming columns, and reordering columns.
      • Output the prepared table as CSV file like the procedure in db/build_tools/build_db.R.
  • Add the following code in db/build_tools/build_db_docker_cmd.sh to load the prepared CSV file to the database:

    # Assuming the prepared CSV file is at ${BUILD_OUTPUT_DIR_PATH}/${DNA_METHYLATION_SCHEMA}_${DNA_METHYLATION_SUMMARY_TBL}.csv
    psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$DB_NAME" <<EOSQL
    COPY ${DNA_METHYLATION_SCHEMA}.${DNA_METHYLATION_SUMMARY_TBL}
    FROM '${BUILD_OUTPUT_DIR_PATH}/${DNA_METHYLATION_SCHEMA}_${DNA_METHYLATION_SUMMARY_TBL}.csv'
    WITH (FORMAT csv, HEADER);
    EOSQL
  • Add the following code in db/build_tools/build_db_docker_cmd.sh to add a table index on the column that will be queried:

    echo "CREATE INDEX ensg_id_idx ON ${DNA_METHYLATION_SCHEMA}.${DNA_METHYLATION_SUMMARY_TBL} (\"Gene_Ensembl_ID\");" \
    | gzip --no-name -c >> "$db_dump_out_path"

The updated database building procedure can be tested using ./db/build_db.sh.

cc @afarrel @chinwallaa

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions