Paper Getting Tool

A Python script to download competition problem sets from URLs and stitch the images together vertically into a single combined image.

Features

Downloads competition problem pages from a list of URLs
Extracts images from the pages in order
Stitches images vertically to create a single combined image
Saves output images to the output/ directory with properly sanitized filenames
Includes retry mechanisms and proper error handling
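The vertical stitching step can be sketched with Pillow as follows. This is a minimal illustration, not the code in paper_getter.py. `stitch_vertically` is a hypothetical helper, and it assumes that narrower images are left-aligned on a white canvas as wide as the widest image.

```python
from __future__ import annotations

from PIL import Image


def stitch_vertically(
    images: list[Image.Image], background: tuple[int, int, int] = (255, 255, 255)
) -> Image.Image:
    """Stack images top to bottom, left-aligned, on a canvas as wide as the widest one."""
    if not images:
        raise ValueError("no images to stitch")
    width = max(img.width for img in images)
    height = sum(img.height for img in images)
    canvas = Image.new("RGB", (width, height), background)
    y = 0
    for img in images:
        # convert() lets palette or RGBA images paste onto the RGB canvas (alpha is dropped)
        canvas.paste(img.convert("RGB"), (0, y))
        y += img.height
    return canvas
```

For example, `stitch_vertically(images).save("combined.png")` would write the result as a lossless PNG.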

Requirements

Python 3.x
requests
BeautifulSoup4
Pillow (PIL)
re (standard library module used for sanitizing filenames; nothing to install)

Setup

Create and activate a virtual environment (recommended):

python -m venv .venv
source .venv/bin/activate  # On Linux/Mac
# or
.venv\Scripts\activate     # On Windows

Install the required packages:

pip install requests beautifulsoup4 pillow

Create a target_urls file with the URLs you want to process, one per line.
Run the script:

python paper_getter.py
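Reading a one-URL-per-line file can be sketched like this. `read_target_urls` is a hypothetical helper; trimming surrounding whitespace and skipping blank lines are assumptions about how the real script treats the file.

```python
from __future__ import annotations

from pathlib import Path


def read_target_urls(path: str = "target_urls") -> list[str]:
    """Return the URLs listed in `path`, one per line, in file order."""
    text = Path(path).read_text(encoding="utf-8")
    # strip() removes surrounding whitespace; blank lines are skipped
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A loop such as `for url in read_target_urls(): ...` would then process the pages in the order they appear in the file.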

File Descriptions

paper_getter.py: Main script that handles newer page structures (post-2021)
- Designed to handle current page layouts
- Looks for elements with classes content-desc and content-text
paper_getter_old.py: Specialized script for older (2021 and earlier) page layouts
- In practice, can only extract competition problem images from 2021 pages
- Does not support competition problem retrieval for years before 2020
- Handles older page structures with classes like newsMain-content-title
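As an illustration of the class-based extraction described for paper_getter.py, the sketch below collects `<img>` sources inside elements with the classes content-desc and content-text, in page order, and resolves relative paths against the page URL. `extract_image_urls` is hypothetical and the script's actual selectors may differ; paper_getter_old.py would need different class names for the older layout.

```python
from __future__ import annotations

from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Class names taken from the README; how the real script combines them is an assumption
NEW_LAYOUT_CLASSES = ("content-desc", "content-text")


def extract_image_urls(
    html: str, base_url: str, classes: tuple[str, ...] = NEW_LAYOUT_CLASSES
) -> list[str]:
    """Return absolute image URLs found inside the given containers, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    selector = ", ".join(f".{cls} img[src]" for cls in classes)
    urls: list[str] = []
    for img in soup.select(selector):  # select() yields matches in document order
        absolute = urljoin(base_url, img["src"])
        if absolute not in urls:  # keep the first occurrence of a repeated image
            urls.append(absolute)
    return urls
```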

Output

Processed images are saved in the output/ directory with filenames built from the competition title plus a timestamp, so repeated runs do not overwrite earlier output.
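The exact naming scheme is not documented here; a sketch assuming names of the form `<sanitized title>_<YYYYmmdd_HHMMSS>.png` might look like this. `sanitize_filename`, `output_path`, the set of replaced characters, and the 100-character limit are all assumptions.

```python
from __future__ import annotations

import re
from datetime import datetime
from pathlib import Path


def sanitize_filename(name: str, max_length: int = 100) -> str:
    """Replace characters that are invalid in Windows or Unix filenames."""
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", name)
    # Collapse runs of whitespace and trim leading/trailing spaces and dots
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return cleaned[:max_length] or "untitled"


def output_path(title: str, out_dir: str = "output", now: datetime | None = None) -> Path:
    """Build <out_dir>/<sanitized title>_<YYYYmmdd_HHMMSS>.png, creating out_dir if needed."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{sanitize_filename(title)}_{stamp}.png"
```

A second-resolution timestamp avoids overwrites across separate runs, but two images with the same title saved within the same second would still collide; adding microseconds (`%f`) or a counter would close that gap.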

Notes

The script adds delays between requests to be respectful to the server
Images are saved in PNG format to preserve quality
Filenames are sanitized to remove potentially problematic characters
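The delay and retry behavior could be sketched as below using only the standard library. `fetch_with_retries`, `fetch_all_politely`, the 2-second delay, and the exponential backoff schedule are illustrative assumptions, not the script's actual values. The network call is passed in as `fetch`, so any HTTP client can be plugged in.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[str], T],
    url: str,
    retries: int = 3,
    backoff: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url); on failure wait backoff * 2**attempt seconds and try again."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fetch(url)
        except Exception:  # broad for the sketch; narrow this in real code
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(backoff * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
            attempt += 1


def fetch_all_politely(
    fetch: Callable[[str], T],
    urls: Iterable[str],
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[T]:
    """Fetch URLs in order, pausing `delay` seconds between consecutive requests."""
    results: list[T] = []
    for i, url in enumerate(urls):
        if i:  # no pause before the very first request
            sleep(delay)
        results.append(fetch_with_retries(fetch, url, sleep=sleep))
    return results
```

With requests, `fetch` could be a small function that calls `requests.get(url, timeout=10)`, then `raise_for_status()`, and returns the response's `.text`; in that case catching `requests.RequestException` instead of `Exception` would avoid retrying on programming errors.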

Repository: Xhen-Starry-Night/PaperGetting