Description
What kind of feature would you like to request?
Additional function parameters / changed functionality / changed defaults?
Please describe your wishes
I am the creator of PerturbLab. I recently built a comprehensive perturbation analysis framework from scratch in just 3 days (verifiable via commit history). Based on this experience, I propose three architectural enhancements for Scanpy:
1. Native C++ Backends (It is painless now)
I propose moving high-performance kernels to Native C++.
- Feasibility: In the modern CI/CD era (GitHub Actions + `cibuildwheel`), cross-platform binary distribution is fully automated.
- Maintenance: With AI-assisted coding, writing C++ kernels is no longer a burden. We can focus on algorithm design while automating the implementation.
- Proof: In `PerturbLab/kernels`, I implemented sparse matrix operators in pure C++ that significantly outperform Numba.
2. Dependency Hygiene: Lazy Imports & Vendoring
- Lazy Loading: Heavy submodules (especially those requiring `torch` or specific plotting libs) should use lazy imports.
- Vendoring: Small utility functions should be "vendored" (inlined) rather than adding full package dependencies.
- Benefit: This keeps the core lightweight and prevents "dependency hell."
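To make the lazy-loading idea concrete, here is a minimal sketch using the standard library's `importlib.util.LazyLoader`. It is not how Scanpy or PerturbLab currently handle imports; the helper name `lazy_import` is hypothetical, and `colorsys` merely stands in for a heavy optional dependency such as `torch`.

```python
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Return a module whose body only executes on first attribute access.

    Hypothetical helper for illustration; follows the LazyLoader recipe
    from the Python standard library documentation.
    """
    # Reuse the module if something else has already imported it.
    if name in sys.modules:
        return sys.modules[name]

    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}")

    # Wrap the real loader so that executing the module body is deferred.
    spec.loader = importlib.util.LazyLoader(spec.loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)  # Cheap: only sets up the deferred load.
    return module


# `colorsys` is a lightweight stand-in for a heavy optional dependency.
colorsys = lazy_import("colorsys")

# The module body runs here, on the first attribute access.
hue, saturation, value = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

With this pattern, importing the top-level package stays cheap, and the cost of a heavy dependency is only paid by users who actually call into the submodule that needs it.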
3. A "Transformers-like" Model Hub
I propose adding a standardized sc.models interface.
- In `perturblab/models`, I implemented a unified registry to manage, download, and deploy models (e.g., scGPT, Gears) with a consistent API (`config`, `model`, `io`).
- Scanpy is the ideal place to standardize this for the community.
Alternative Solutions
Continuing to rely solely on Numba/Python for everything limits how far kernels can be optimized and keeps the ecosystem from making effective use of low-level hardware acceleration (CUDA/C++).
Additional Context
My repository krkawzq/PerturbLab serves as a proof-of-concept for this architecture. It demonstrates that a strictly typed, high-performance (C++ backed), and modular system can be built rapidly.
I am happy to discuss contributing the C++ kernels or the Model Hub design to help push this initiative forward. "The lower the level, the better the performance."