Get papers from the Electronic design competition in China.

Xhen-Starry-Night/PaperGetting

Paper Getting Tool

A Python script to download competition problem sets from URLs and stitch the images together vertically into a single combined image.

Features

  • Downloads competition problem pages from a list of URLs
  • Extracts images from the pages in order
  • Stitches images vertically to create a single combined image
  • Saves output images to the output/ directory with properly sanitized filenames
  • Includes retry mechanisms and proper error handling
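The stitching step listed above can be sketched with Pillow as follows. This is a minimal illustration, not the code in paper_getter.py: the function name stitch_vertically, the white background, and the left alignment of narrower images are all assumptions.

```python
from PIL import Image


def stitch_vertically(images, background=(255, 255, 255)):
    """Stack images top to bottom on one canvas (hypothetical helper).

    The canvas is as wide as the widest image; narrower images are
    left-aligned and the remaining area keeps the background color.
    """
    if not images:
        raise ValueError("no images to stitch")
    width = max(img.width for img in images)
    height = sum(img.height for img in images)
    canvas = Image.new("RGB", (width, height), background)
    y = 0
    for img in images:
        # Convert so palette or RGBA images paste cleanly onto an RGB canvas.
        canvas.paste(img.convert("RGB"), (0, y))
        y += img.height
    return canvas
```

Calling `stitch_vertically(parts).save("combined.png")` would then write the combined image; the file name here is only an example.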

Requirements

  • Python 3.x
  • requests
  • BeautifulSoup4
  • Pillow (PIL)
  • re (part of the Python standard library, used for sanitizing filenames; no installation needed)

Setup

  1. Activate your virtual environment (recommended):
source .venv/bin/activate  # On Linux/Mac
# or
.venv\Scripts\activate     # On Windows
  2. Install the required packages:
pip install requests beautifulsoup4 pillow
  3. Create a target_urls file with the URLs you want to process, one per line.

  4. Run the script:

python paper_getter.py
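For reference, reading the target_urls file could look roughly like the sketch below. The helper name read_target_urls is hypothetical, and skipping blank lines and lines starting with # is an assumption rather than documented behavior of the script.

```python
from pathlib import Path


def read_target_urls(path="target_urls"):
    """Return the URLs listed in the file, one per line (hypothetical helper).

    Assumption: blank lines and lines starting with '#' are ignored.
    """
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```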

File Descriptions

  • paper_getter.py: Main script for the newer page structure (post-2021)

    • Designed to handle current page layouts
    • Looks for elements with the classes content-desc and content-text
  • paper_getter_old.py: Script for the older page structure (2021 competition problems)

    • Currently only able to extract competition problem images from 2021
    • Does not support competition problems from years prior to 2020
    • Handles older page structures with classes such as newsMain-content-title
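The class-based lookup described above can be illustrated with BeautifulSoup. This is a sketch under assumptions: the function name extract_image_urls and the URL de-duplication are not taken from the scripts; only the class names content-desc and content-text come from this README.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_image_urls(html, base_url,
                       container_classes=("content-desc", "content-text")):
    """Collect absolute image URLs, in page order, from matching containers."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # A list passed to class_ matches elements that carry any of the classes.
    for container in soup.find_all(class_=list(container_classes)):
        for img in container.find_all("img"):
            src = img.get("src")
            if not src:
                continue  # skip <img> tags without a usable source
            full_url = urljoin(base_url, src)
            if full_url not in urls:  # avoid duplicates from nested containers
                urls.append(full_url)
    return urls
```

For the older layout, the same function could be called with `container_classes=("newsMain-content-title",)`, though whether that container actually holds the images is an assumption.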

Output

Processed images are saved in the output/ directory with filenames based on the competition title and a timestamp to prevent duplicates.
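A title-plus-timestamp filename of this kind could be built as in the following sketch. The helper name build_output_path, the set of replaced characters, and the timestamp format are assumptions; the actual script may differ.

```python
import re
from datetime import datetime
from pathlib import Path


def build_output_path(title, output_dir="output", now=None):
    """Return output/<sanitized title>_<timestamp>.png (hypothetical helper)."""
    now = now or datetime.now()
    # Replace runs of whitespace and characters that are unsafe in filenames.
    safe = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_") or "untitled"
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(output_dir) / f"{safe}_{stamp}.png"
```

Because the timestamp has one-second resolution, two runs for the same title within the same second would still collide.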

Notes

  • The script adds delays between requests to be respectful to the server
  • Images are saved in PNG format to preserve quality
  • Filenames are sanitized to remove potentially problematic characters
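The retry and delay behavior mentioned in these notes might be structured like the sketch below. It is an illustration only: fetch_with_retries, its parameters, and the linearly growing delay are assumptions, and the fetch callable is passed in so the example does not depend on any particular HTTP library.

```python
import time


def fetch_with_retries(fetch, url, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url), pausing and retrying when it raises (hypothetical helper)."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                # Wait a little longer after each failed attempt.
                sleep(delay * attempt)
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error
```

With requests, the callable could be something like `lambda u: requests.get(u, timeout=10)`; calling `raise_for_status()` on the response inside that callable would make HTTP error codes trigger a retry as well.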
