Create a dataset from the PDF content #3

@liadmagen

Description

As part of the corpus creation process, the content of the PDFs should be converted to text and aggregated into one large dataset.

The dataset should be stored in the data/papers/processed folder, and the script that creates it should be saved as src/papers/data/make_dataset.py.
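
A minimal sketch of what such a script might look like, as a starting point rather than a definitive implementation. Several details are assumptions not stated in this issue: the `pypdf` library for text extraction, a hypothetical input folder `data/papers/raw`, a hypothetical output file name `dataset.jsonl`, and the JSON Lines layout with one record per PDF. The function names `extract_pdf_text` and `build_dataset` are illustrative only.

```python
"""Sketch of src/papers/data/make_dataset.py (assumptions are marked inline)."""
import json
from pathlib import Path
from typing import Callable, Iterable


def extract_pdf_text(pdf_path: Path) -> str:
    """Extract the text of one PDF. Assumes pypdf is the chosen library."""
    from pypdf import PdfReader  # lazy import: only needed for real PDFs

    reader = PdfReader(str(pdf_path))
    # Fall back to "" in case a page (e.g. a scanned image) yields no text.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def build_dataset(
    pdf_paths: Iterable[Path],
    output_file: Path,
    extract: Callable[[Path], str] = extract_pdf_text,
) -> int:
    """Write one JSON record per PDF to output_file; return the record count."""
    output_file.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with output_file.open("w", encoding="utf-8") as out:
        for pdf_path in sorted(pdf_paths):
            try:
                text = extract(pdf_path)
            except Exception as err:  # skip unreadable files, keep going
                print(f"Skipping {pdf_path}: {err}")
                continue
            record = {"file": pdf_path.name, "text": text}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    raw_dir = Path("data/papers/raw")  # hypothetical input folder
    out_file = Path("data/papers/processed/dataset.jsonl")  # hypothetical name
    written = build_dataset(raw_dir.glob("*.pdf"), out_file)
    print(f"Wrote {written} documents to {out_file}")
```

Passing the extractor in as a parameter keeps the aggregation step independent of the PDF library, so it could be swapped for another extraction tool without touching the rest of the script.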
