A high-performance web crawler written in Rust, designed for efficient and reliable web scraping with support for Tor network integration.
- Fast and efficient web crawling using Rust
- HTML parsing and processing capabilities
- Tor network support for anonymous crawling
- Docker containerization for easy deployment
- Bloom filter implementation for efficient URL tracking
- Asynchronous operation with Tokio runtime
- Rust (latest stable version)
- Docker and Docker Compose (for containerized deployment)
- Tor (for anonymous crawling)
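
Since Tor is a prerequisite for anonymous crawling, it can help to confirm that a local Tor daemon is accepting connections before starting a crawl. The sketch below is not part of the project; it assumes Tor is listening on its usual default SOCKS address `127.0.0.1:9050` (Tor Browser typically uses 9150 instead), and the names `ASSUMED_TOR_SOCKS_ADDR` and `is_port_reachable` are hypothetical. It only checks that something accepts TCP connections on that address, not that the listener is actually Tor.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Assumption: Tor's default SOCKS listener. Adjust to match your torrc.
pub const ASSUMED_TOR_SOCKS_ADDR: &str = "127.0.0.1:9050";

/// Returns true if a TCP connection to `addr` succeeds within `timeout`.
/// An unparsable address is treated as unreachable.
pub fn is_port_reachable(addr: &str, timeout: Duration) -> bool {
    match addr.parse::<SocketAddr>() {
        Ok(sock) => TcpStream::connect_timeout(&sock, timeout).is_ok(),
        Err(_) => false,
    }
}
```

A pre-flight check could then call `is_port_reachable(ASSUMED_TOR_SOCKS_ADDR, Duration::from_secs(2))` and abort early with a clear message if it returns `false`.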
- Clone the repository:
git clone https://github.com/yourusername/DeepStalk.git
cd DeepStalk
- Build the project:
cargo build --release
- Build and run using Docker Compose:
docker-compose up --build

Run the release binary directly:
./target/release/crawle-rs

The project includes a utility script, run_with_input.sh, that automates control of the crawler:
./run_with_input.sh

This script:
- Runs the crawler in the background
- Automatically sends 'q' input every minute
- Provides a way to gracefully control the crawler's operation
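
The behavior listed above can be sketched in Rust with only the standard library. This is an illustration of the idea, not the contents of run_with_input.sh: the function names are hypothetical, and it assumes the crawler reads line-based input on stdin, so each input is followed by a newline.

```rust
use std::io::{self, Write};
use std::process::{Child, Command, Stdio};
use std::thread;
use std::time::Duration;

/// Starts `program` with a piped stdin so input can be written to it later.
pub fn spawn_with_piped_stdin(program: &str, args: &[&str]) -> io::Result<Child> {
    Command::new(program).args(args).stdin(Stdio::piped()).spawn()
}

/// Waits `interval`, then writes `input` plus a newline to the child's stdin,
/// repeating up to `max_sends` times. Stops early if the child has exited.
/// Returns how many inputs were actually sent.
pub fn send_input_periodically(
    child: &mut Child,
    input: &str,
    interval: Duration,
    max_sends: usize,
) -> io::Result<usize> {
    let mut sent = 0;
    while sent < max_sends {
        thread::sleep(interval);
        // Do not write to a process that has already terminated.
        if child.try_wait()?.is_some() {
            break;
        }
        let stdin = child.stdin.as_mut().ok_or_else(|| {
            io::Error::new(io::ErrorKind::BrokenPipe, "child stdin is not piped")
        })?;
        stdin.write_all(input.as_bytes())?;
        stdin.write_all(b"\n")?;
        stdin.flush()?;
        sent += 1;
    }
    Ok(sent)
}
```

To mirror the script's description, one would spawn `./target/release/crawle-rs` with `spawn_with_piped_stdin` and call `send_input_periodically(&mut child, "q", Duration::from_secs(60), usize::MAX)`.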

Run the Docker image interactively:
docker run -it deepstalk

- src/ - Source code directory
- target/ - Build output directory
- Dockerfile - Container configuration
- docker-compose.yml - Docker Compose configuration
- Cargo.toml - Rust project configuration and dependencies
- fastbloom - For efficient URL tracking
- html5ever - HTML parsing
- lol_html - HTML processing
- reqwest - HTTP client
- tokio - Asynchronous runtime
- url - URL parsing and manipulation
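
To show why a Bloom filter suits URL tracking, here is a minimal sketch built only on the standard library. It is not the fastbloom API and not the project's code; `UrlBloom`, its sizing parameters, and the double-hashing scheme are assumptions made for illustration. The filter can report a URL as seen when it was not (a false positive), but never reports an inserted URL as unseen, so a crawler may skip a few pages but will not loop on one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative Bloom filter for visited URLs (hypothetical type).
pub struct UrlBloom {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl UrlBloom {
    /// `num_bits` is rounded up to a multiple of 64; at least one hash is used.
    pub fn new(num_bits: usize, num_hashes: u32) -> Self {
        let words = (num_bits.max(64) + 63) / 64;
        UrlBloom {
            bits: vec![0; words],
            num_bits: (words * 64) as u64,
            num_hashes: num_hashes.max(1),
        }
    }

    /// Two base hashes of the URL; probe i uses h1 + i * h2 (double hashing).
    fn base_hashes(url: &str) -> (u64, u64) {
        let mut a = DefaultHasher::new();
        url.hash(&mut a);
        let mut b = DefaultHasher::new();
        (url, 0x9e37_79b9_7f4a_7c15u64).hash(&mut b);
        (a.finish(), b.finish() | 1)
    }

    /// Maps probe `i` to a (word index, bit mask) pair.
    fn probe(&self, h1: u64, h2: u64, i: u32) -> (usize, u64) {
        let bit = h1.wrapping_add(u64::from(i).wrapping_mul(h2)) % self.num_bits;
        ((bit / 64) as usize, 1u64 << (bit % 64))
    }

    /// Records `url`. Returns true if at least one bit was newly set,
    /// meaning the URL had definitely not been inserted before.
    pub fn insert(&mut self, url: &str) -> bool {
        let (h1, h2) = Self::base_hashes(url);
        let mut newly_set = false;
        for i in 0..self.num_hashes {
            let (word, mask) = self.probe(h1, h2, i);
            if self.bits[word] & mask == 0 {
                self.bits[word] |= mask;
                newly_set = true;
            }
        }
        newly_set
    }

    /// Returns true if `url` may have been inserted, false if it definitely was not.
    pub fn contains(&self, url: &str) -> bool {
        let (h1, h2) = Self::base_hashes(url);
        (0..self.num_hashes).all(|i| {
            let (word, mask) = self.probe(h1, h2, i);
            self.bits[word] & mask != 0
        })
    }
}
```

In a crawl loop, a link would be queued only when `insert` returns `true`. Memory use stays fixed at `num_bits` regardless of URL length, which is the main advantage over storing every visited URL in a hash set.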
This project is licensed under the terms of the license included in the repository.
Contributions are welcome! Please feel free to submit a Pull Request.
This project includes Tor network support for anonymous crawling. Please use responsibly and in accordance with applicable laws and regulations.