๐ŸŒ High-performance Rust web crawler with Tor support and Docker integration. Fast, anonymous, and efficient.

aryan-212/DeepStalk

DeepStalk

A high-performance web crawler written in Rust, designed for efficient and reliable web scraping with support for Tor network integration.

Features

  • Fast and efficient web crawling using Rust
  • HTML parsing and processing capabilities
  • Tor network support for anonymous crawling
  • Docker containerization for easy deployment
  • Bloom filter implementation for efficient URL tracking
  • Asynchronous operation with Tokio runtime
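
To illustrate how a Bloom filter can track visited URLs in constant memory, here is a minimal standard-library sketch. It is not DeepStalk's implementation and does not use the `fastbloom` API; the `UrlBloom` type, its methods, and the sizing values are hypothetical names chosen for this example. A Bloom filter can report a URL as seen when it was not (a false positive), but never the reverse, so a crawler may occasionally skip a page but will not fetch the same URL twice.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A tiny fixed-size Bloom filter for URLs (illustrative only, not the fastbloom API).
pub struct UrlBloom {
    bits: Vec<u64>,  // bit array packed into 64-bit words
    num_bits: u64,   // total number of addressable bits
    num_hashes: u32, // number of bit positions probed per URL
}

impl UrlBloom {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        let num_bits = num_bits.max(64);
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: vec![0; words], num_bits, num_hashes: num_hashes.max(1) }
    }

    /// Derives two base hashes; probe i uses a + i * b (double hashing).
    fn hash_pair(url: &str) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        url.hash(&mut h1);
        let mut h2 = DefaultHasher::new();
        (url, 0x9e37_79b9_7f4a_7c15u64).hash(&mut h2);
        (h1.finish(), h2.finish() | 1) // force the step to be odd, hence non-zero
    }

    /// Maps probe i to a (word index, bit mask) pair.
    fn index(&self, a: u64, b: u64, i: u32) -> (usize, u64) {
        let bit = a.wrapping_add(b.wrapping_mul(i as u64)) % self.num_bits;
        ((bit / 64) as usize, 1u64 << (bit % 64))
    }

    /// Records the URL and returns true if it was (possibly) seen before.
    pub fn check_and_insert(&mut self, url: &str) -> bool {
        let (a, b) = Self::hash_pair(url);
        let mut seen = true;
        for i in 0..self.num_hashes {
            let (word, mask) = self.index(a, b, i);
            if self.bits[word] & mask == 0 {
                seen = false;
                self.bits[word] |= mask;
            }
        }
        seen
    }

    /// Returns true if the URL was possibly inserted, false if definitely not.
    pub fn contains(&self, url: &str) -> bool {
        let (a, b) = Self::hash_pair(url);
        (0..self.num_hashes).all(|i| {
            let (word, mask) = self.index(a, b, i);
            self.bits[word] & mask != 0
        })
    }
}

fn main() {
    // Sizing values are arbitrary for the example.
    let mut seen = UrlBloom::new(1 << 16, 4);
    for url in ["https://example.com/", "https://example.com/about", "https://example.com/"] {
        if seen.check_and_insert(url) {
            println!("skip  {url}");
        } else {
            println!("fetch {url}");
        }
    }
}
```

In a real crawler the bit count and number of hashes would be chosen from the expected number of URLs and the acceptable false-positive rate.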

Prerequisites

  • Rust (latest stable version)
  • Docker and Docker Compose (for containerized deployment)
  • Tor (for anonymous crawling)

Installation

Local Development

  1. Clone the repository:
git clone https://github.com/aryan-212/DeepStalk.git
cd DeepStalk
  2. Build the project:
cargo build --release

Docker Deployment

  1. Build and run using Docker Compose:
docker-compose up --build

Usage

Running Locally

./target/release/crawle-rs

Running with Input Control

The project includes a utility script run_with_input.sh that allows for automated control of the crawler:

./run_with_input.sh

This script:

  • Runs the crawler in the background
  • Automatically sends the input 'q' every minute
  • Provides a way to gracefully control the crawler's operation
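
The behavior described above can be sketched in Rust as well. This is not the contents of `run_with_input.sh`; it is a rough equivalent under two assumptions: that the crawler reads the 'q' from its standard input, and that five rounds is a reasonable stopping point for a demo. The helper `send_periodically` is a hypothetical name introduced here.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};
use std::thread;
use std::time::Duration;

/// Waits `interval`, then writes `input` plus a newline to `sink`; repeats `times` times.
fn send_periodically<W: Write>(
    sink: &mut W,
    input: &str,
    interval: Duration,
    times: usize,
) -> io::Result<()> {
    for _ in 0..times {
        thread::sleep(interval);
        writeln!(sink, "{input}")?;
        sink.flush()?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Binary path as given in the "Running Locally" section.
    let spawned = Command::new("./target/release/crawle-rs")
        .stdin(Stdio::piped())
        .spawn();
    let mut child = match spawned {
        Ok(child) => child,
        Err(err) => {
            eprintln!("could not start the crawler: {err}");
            return Ok(());
        }
    };

    let mut stdin = child.stdin.take().expect("stdin was requested as piped");
    // Assumption: send 'q' once a minute, five times in total.
    let result = send_periodically(&mut stdin, "q", Duration::from_secs(60), 5);

    drop(stdin); // close the pipe so the child sees end of input
    child.wait()?;
    result
}
```

Keeping the write loop generic over `Write` means it can be exercised against an in-memory buffer without starting the crawler.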

Running with Docker

docker run -it deepstalk

๐Ÿ“ Project Structure

  • ๐Ÿ“‚ src/ - Source code directory
  • ๐Ÿ“ฆ target/ - Build output directory
  • ๐Ÿณ Dockerfile - Container configuration
  • ๐Ÿณ docker-compose.yml - Docker Compose configuration
  • ๐Ÿ“„ Cargo.toml - Rust project configuration and dependencies

Dependencies

  • fastbloom - Bloom filter for efficient URL tracking
  • html5ever - HTML parsing
  • lol_html - HTML processing
  • reqwest - HTTP client
  • tokio - Asynchronous runtime
  • url - URL parsing and manipulation

License

This project is licensed under the terms of the license included in the repository.

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Security

This project includes Tor network support for anonymous crawling. Please use responsibly and in accordance with applicable laws and regulations.
