A Python reimplementation of the LAMOST Stellar Parameter Pipeline (LASP), supporting CPU and GPU acceleration for large-scale spectroscopic surveys.

LiangJunC/PyLASP

PyLASP - The Python Version of the LASP

Python 3.9+ SciPy Astropy Pandas PyTorch Matplotlib License: GPL v3

🔭 Overview

PyLASP (The Python Version of the LAMOST Stellar Parameter Pipeline) is a modern, modular reimplementation of the original LASP (LASP-MPFit), which was developed in Interactive Data Language (IDL) and employed the ULySS software package to infer radial velocity, effective temperature, surface gravity, and metallicity from observed spectra.

PyLASP refactors LASP-MPFit into two complementary modules:

  • LASP-CurveFit: a new implementation of the LASP-MPFit fitting procedure that runs on CPU, preserving the legacy logic while improving data I/O and multithreaded execution efficiency.
  • LASP-Adam-GPU: a GPU-accelerated method that introduces grouped optimization by constructing a joint residual function over multiple observed and model spectra, enabling high-throughput parameter inference across tens of millions of spectra.

PyLASP provides both No Clean and Clean strategies:

  • No Clean strategy: a computationally efficient strategy that fits spectra without iterative pixel rejection. It is faster but may yield lower accuracy for spectra containing significant artifacts.
  • Clean strategy: an iterative strategy that identifies and rejects anomalous flux points during the fitting process, specifically those whose model–data discrepancies cannot be reasonably explained by the spectral emulator. This approach improves robustness for spectra with defects or irregularities, but is computationally slower than the No Clean strategy.
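The iterative rejection behind a Clean-style strategy can be illustrated with a small NumPy sketch. It is not the code in clean.py: the helper name `fit_with_clipping`, the MAD-based scatter estimate, the 4-sigma threshold, and the constant toy model are all assumptions chosen for illustration.

```python
import numpy as np

def fit_with_clipping(flux, model_fn, n_iter=5, kappa=4.0):
    """Hypothetical helper: refit while masking pixels with large residuals.

    model_fn(mask) must return a model spectrum fitted using only the
    pixels where mask is True.
    """
    mask = np.ones(flux.size, dtype=bool)
    for _ in range(n_iter):
        resid = flux - model_fn(mask)
        # Robust scatter from the median absolute deviation of the kept pixels.
        kept = resid[mask]
        sigma = 1.4826 * np.median(np.abs(kept - np.median(kept)))
        # Re-evaluate every pixel so wrongly rejected ones can re-enter.
        new_mask = np.abs(resid) <= kappa * sigma
        if np.array_equal(new_mask, mask):
            break  # the set of rejected pixels no longer changes
        mask = new_mask
    return model_fn(mask), mask

# Toy spectrum: flat continuum with noise, one spike and one dip.
rng = np.random.default_rng(0)
flux = 1.0 + 0.01 * rng.standard_normal(200)
flux[50] += 5.0    # simulated cosmic-ray hit
flux[120] -= 0.8   # simulated bad pixel

def constant_model(mask):
    # Stand-in for a real spectral fit: the mean of the kept pixels.
    return np.full(flux.size, flux[mask].mean())

model, mask = fit_with_clipping(flux, constant_model)
print("rejected pixels:", np.flatnonzero(~mask))
```

Skipping the loop and fitting once with all pixels corresponds to the No Clean case; in this toy setup the spike alone shifts the fitted continuum by roughly 2%.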

🔧 Installation

Follow the steps below to set up PyLASP in a clean conda environment.

  1. Create an independent conda environment:
     conda create -n PyLASP-env python=3.10
  2. Activate the environment:
     conda activate PyLASP-env
  3. Navigate to the PyLASP project folder:
     cd /path/to/PyLASP
  4. Install the package and its dependencies:
     pip install -e .

⚠️ Note: The above steps install only the dependencies for LASP-CurveFit. To enable LASP-Adam-GPU, install the appropriate PyTorch version in the same environment. See the official PyTorch installation guide for details.


🖥️ Project Structure

The Core Modules in LASP-CurveFit

LASP-CurveFit/
│
├── tgm_model/                            # Spectral emulator
│   └── elodie32_flux_tgm.fits            # ELODIE polynomial coefficients
│
├── test_data/                            # Test data examples
│   ├── LAMOST_spec_fits/                 # LAMOST spectrum FITS files
│   │   └── *.fits
│   └── PyLASP_inferred_results/          # Parameter results inferred by PyLASP
│       └── *.csv
│
├── file_paths.py                         # Obtain the PyLASP file path
│
├── uly_read_lms/                         # Spectrum data reading
│   ├── uly_spect_alloc.py                # Initialize spectrum dictionary
│   ├── uly_spect_get.py                  # Extract fields from the spectrum dictionary
│   ├── uly_spect_extract.py              # Update dictionary entries based on wavelength range
│   └── uly_spect_read_lms.py             # Construct the LAMOST spectrum dictionary (example implementation)
│
├── uly_tgm/                              # Model spectrum structure
│   ├── uly_tgm.py                        # Define model spectrum dictionary
│   └── uly_tgm_init.py                   # Initialize model spectrum dictionary
│
├── uly_tgm_eval/                         # Model spectrum generation
│   └── uly_tgm_eval.py                   # Generate model spectra from parameters
│
├── WRS/                                  # Wavelength resampling
│   ├── xrebin.py                         # Interpolation methods
│   └── uly_spect_logrebin.py             # Spectrum resampling implementation
│
├── resolution_reduction/                 # Resolution matching
│   └── convol.py                         # Spectral resolution reduction
│
├── legendre_polynomial/                  # Shape correction
│   └── mregress.py                       # Legendre polynomial coefficient calculation
│
├── clean_outliers/                       # Outlier rejection
│   └── clean.py                          # Clean strategy
│
└── uly_fit/                              # Parameter fitting core
    ├── robust_sigma.py                   # Robust standard deviation calculation
    ├── uly_fit_init.py                   # Initialize a model spectrum dictionary
    ├── uly_makeparinfo.py                # Configure parameters to be optimized
    ├── uly_fit_conv_weight_poly.py       # Model preprocessing: convolution + weighting + shape correction
    ├── uly_fit_a_cmp.py                  # Compute best-fit parameters
    └── ulyss.py                          # Wrapper integrating uly_fit_a_cmp for parameter inference

The Core Modules in LASP-Adam-GPU

LASP-Adam-GPU/
│
├── config/                               # Configuration files for LASP-Adam-GPU
│   └── config.py                         # Data type and device configuration
│
├── tgm_model/                            # Spectral emulator
│   └── elodie32_flux_tgm.fits            # ELODIE polynomial coefficients
│
├── test_data/                            # Test data examples
│   ├── LAMOST_spec_fits/                 # LAMOST spectrum FITS files
│   │   └── *.fits
│   ├── LAMOST_spec_pt/                   # LAMOST spectrum files in .pt format
│   │   └── *.pt
│   └── PyLASP_inferred_results/          # Parameter results inferred by PyLASP
│       └── *.csv
│
├── file_paths.py                         # Obtain the PyLASP file path
│
├── data_to_pt/                           # Convert FITS spectra to .pt format
│   └── data_to_pt.py                     # Step 1: Convert observed spectra to .pt format
│
├── uly_tgm_eval/                         # Model spectrum generation
│   └── uly_tgm_eval_pytorch.py           # Step 2: Generate N model spectra
│
├── WRS/                                  # Wavelength resampling
│   └── xrebin_pytorch.py                 # Step 3: Resample N model spectra to observed wavelengths
│
├── resolution_reduction/                 # Resolution matching
│   └── convol_pytorch.py                 # Step 4: Reduce the resolution of N model spectra to observed spectra
│
├── legendre_polynomial/                  # Shape correction
│   ├── matrix_inverse_benchmark.py       # Efficiency comparison of matrix inversion methods
│   └── mregress_pytorch.py               # Step 5: Correct the shape of N model spectra to match observed spectra
│
├── clean_outliers/                       # Outlier rejection
│   └── clean_pytorch.py                  # Step 6: Clean strategy
│
├── model_err/                            # Parameter uncertainty estimation
│   ├── loss_reduced.py                   # Compute the flux residuals of N spectra
│   └── model_err.py                      # Step 7: Compute the parameter errors of N spectra
│
└── uly_fit/                              # Parameter fitting core
    ├── ulyss_pytorch.py                  # Initialize spectrum info for .pt storage (depends on LASP-CurveFit)
    └── uly_fit_conv_poly_pytorch.py      # Run steps 1–7 to compute best-fit parameters and save results to CSV

🚀 Workflow

LASP-CurveFit Inference Process

Step 1: Read a target spectrum and store it in a dictionary: uly_spect_read_lms.py

Step 2: Set the initial values for the parameters to be inferred, the Legendre polynomial degree, the model location, and whether to enable the Clean strategy, etc.: ulyss.py

Step 3: ulyss.py further updates the model dictionary and the observed-spectrum dictionary, and passes them to: uly_fit_a_cmp.py

Step 4: uly_fit_a_cmp.py constructs the objective function and iteratively calls: uly_fit_conv_weight_poly.py, which performs:

  • Generate the model spectra: uly_tgm_eval.py
  • Resample the model spectra to the observed wavelength grid: uly_spect_logrebin.py
  • Match the resolution of the model spectra to the observed spectra: convol.py
  • Correct the shape of the model spectra to match observed spectra: mregress.py

Finally, the inferred results are saved as a CSV file.
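The shape-correction step in the list above can be sketched with NumPy alone. The sketch assumes a multiplicative Legendre polynomial whose coefficients are found by linear least squares; the function name `multiplicative_legendre`, the mapping of pixels onto [-1, 1], and the toy spectra are illustrative assumptions rather than the contents of mregress.py.

```python
import numpy as np
from numpy.polynomial import legendre

def multiplicative_legendre(obs, model, degree):
    """Solve obs ~ model * sum_k c_k P_k(x) for the coefficients c_k."""
    x = np.linspace(-1.0, 1.0, obs.size)      # map pixels onto [-1, 1]
    basis = legendre.legvander(x, degree)     # shape (n_pix, degree + 1)
    design = basis * model[:, None]           # each column is model * P_k
    coeffs, *_ = np.linalg.lstsq(design, obs, rcond=None)
    return coeffs, model * (basis @ coeffs)

# Toy data: one absorption line, distorted by a smooth multiplicative shape.
x = np.linspace(-1.0, 1.0, 500)
model = 1.0 - 0.5 * np.exp(-0.5 * (x / 0.05) ** 2)
true_coeffs = np.array([2.0, 0.5, -0.2])
obs = model * legendre.legval(x, true_coeffs)

coeffs, corrected = multiplicative_legendre(obs, model, degree=2)
print("recovered Legendre coefficients:", coeffs)
```

Because the polynomial enters linearly, it can be re-solved at every evaluation of the objective function, leaving only the physical parameters to the nonlinear optimizer.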

LASP-Adam-GPU Inference Process

Step 1: Convert spectrum to .pt format: data_to_pt.py

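Step 2: Configure parameters in uly_fit_conv_poly_pytorch.py, including whether to enable the Clean strategy, then construct the objective function and iteratively call:

  • Generate N model spectra: uly_tgm_eval_pytorch.py
  • Resample the N model spectra to the observed wavelengths: xrebin_pytorch.py
  • Reduce the resolution of the N model spectra to that of the observed spectra: convol_pytorch.py
  • Correct the shape of the N model spectra to match the observed spectra: mregress_pytorch.py
  • Apply the Clean strategy, if enabled: clean_pytorch.py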

Step 3: Once the objective function converges, uly_fit_conv_poly_pytorch.py calls model_err.py to compute the parameter errors of N spectra and saves the final results as a CSV file.
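The idea of optimizing a group of spectra through one joint objective can be sketched as follows. To stay dependency-free, the sketch writes the Adam update by hand in NumPy instead of using PyTorch, and a two-parameter Gaussian line stands in for the spectral emulator. The function `fit_group_adam`, the toy model, and all hyperparameters are assumptions, not PyLASP code.

```python
import numpy as np

def fit_group_adam(x, obs, width=1.0, lr=0.02, n_steps=1500,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit (amplitude, center) of a toy line for all spectra at once."""
    n_spec = obs.shape[0]
    # One row of parameters per spectrum: columns are (amplitude, center).
    theta = np.tile([0.5, 0.0], (n_spec, 1))
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    losses = []
    for t in range(1, n_steps + 1):
        amp, cen = theta[:, :1], theta[:, 1:]
        gauss = np.exp(-0.5 * ((x - cen) / width) ** 2)  # (n_spec, n_pix)
        resid = (1.0 - amp * gauss) - obs
        losses.append(float(np.sum(resid ** 2)))  # joint loss over the group
        # Analytic gradient of the joint loss for every parameter.
        grad = np.empty_like(theta)
        grad[:, 0] = np.sum(-2.0 * resid * gauss, axis=1)
        grad[:, 1] = np.sum(
            -2.0 * resid * amp * gauss * (x - cen) / width**2, axis=1)
        # Standard Adam update with bias correction.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad**2
        m_hat = m / (1 - beta1**t)
        v_hat = v / (1 - beta2**t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, losses

# Eight toy spectra with different line depths and centers.
rng = np.random.default_rng(2)
x = np.linspace(-5.0, 5.0, 200)
true_amp = rng.uniform(0.3, 0.8, size=(8, 1))
true_cen = rng.uniform(-1.0, 1.0, size=(8, 1))
obs = 1.0 - true_amp * np.exp(-0.5 * (x - true_cen) ** 2)
obs += 0.005 * rng.standard_normal(obs.shape)

theta, losses = fit_group_adam(x, obs)
print("first and last joint loss:", losses[0], losses[-1])
```

Each spectrum's parameters only affect its own residuals, so the gradient of the summed loss separates per spectrum while all array operations run over the whole group at once. That batching is what maps well onto a GPU.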


⚙️ Parameter Inference Example

LASP-CurveFit Inference Example

  • LASP-CurveFit is used to infer stellar parameters: see case 2 in tutorial.ipynb
  • Individual spectrum parameter inference using curve_fit
  • Uses joblib to provide multiprocessing support for large spectroscopic datasets
  • Preserves original IDL logic
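For readers unfamiliar with `scipy.optimize.curve_fit`, the sketch below shows a single-spectrum fit in miniature. A three-parameter Gaussian absorption line stands in for the spectral emulator; the function `toy_line`, the wavelength range, and the noise level are assumptions made for illustration and do not reflect the PyLASP interface shown in tutorial.ipynb.

```python
import numpy as np
from scipy.optimize import curve_fit

def toy_line(wave, center, depth, width):
    """Hypothetical stand-in for the emulator: one Gaussian absorption line."""
    return 1.0 - depth * np.exp(-0.5 * ((wave - center) / width) ** 2)

# Simulate one noisy observed spectrum around a single line.
rng = np.random.default_rng(1)
wave = np.linspace(4850.0, 4870.0, 400)
truth = (4861.3, 0.6, 1.5)
flux = toy_line(wave, *truth) + 0.005 * rng.standard_normal(wave.size)

# Without bounds, curve_fit uses the Levenberg-Marquardt method.
popt, pcov = curve_fit(toy_line, wave, flux, p0=(4860.0, 0.5, 1.0))
perr = np.sqrt(np.diag(pcov))  # 1-sigma errors from the covariance matrix

for name, val, err in zip(("center", "depth", "width"), popt, perr):
    print(f"{name}: {val:.4f} +/- {err:.4f}")
```

Since each spectrum is fitted independently in this mode, a batch of such fits can be distributed across worker processes, which is the role joblib plays according to the list above.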

LASP-Adam-GPU Inference Example

  • LASP-Adam-GPU is used to infer stellar parameters for N spectra simultaneously: see case 3 in tutorial.ipynb
  • Performs multi-spectrum parameter inference using the Adam optimizer
  • Provides significantly higher throughput for large-scale datasets
  • Easily extensible to multi-element or joint-parameter inference

🔄 Limitations and Future Work

| Feature | Current Status | Planned Improvement | Implementation Plan |
| --- | --- | --- | --- |
| Initial Parameter Guess | Single initial guess per spectrum | Support for multiple initializations to improve robustness | Grid search over parameter space; select solution with lowest χ² |
| Wavelength Coverage | Single continuous range (e.g., 4200-5700 Å) | Add support for disjoint wavelength segments (e.g., 4200-4500 Å and 5200-5700 Å) | Apply wavelength mask array; set mask=0 for excluded regions (e.g., 4500-5200 Å) |
| Two-Stage Fitting | First-stage implemented | Full two-stage pipeline integration | Remove pseudo-continuum from observed spectra manually, then run PyLASP inference |
| Multi-Abundance Inference | Only RV, $T_{\rm eff}$, log $g$, [Fe/H] | Joint inference of multiple elemental abundances | Multi-objective optimization with extended spectral model |
| Legendre Polynomial | Multiplicative correction supported; additive mode not yet successful | Enable both multiplicative and additive polynomial corrections | Iterative testing and implementation refinement |
| Wavelength Sampling | Tested with log-uniform grids (ln λ) | Support for linear-uniform and non-uniform wavelength grids | Progressive testing across different sampling schemes |
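The wavelength-mask plan for disjoint segments can be illustrated with a short NumPy sketch. Since this feature is listed as planned, nothing here reflects existing PyLASP code; `build_wavelength_mask` and `masked_chi2` are hypothetical helpers, and the 1 Å grid is a toy choice.

```python
import numpy as np

def build_wavelength_mask(wave, keep_ranges):
    """Return 1.0 inside the kept wavelength ranges and 0.0 elsewhere."""
    mask = np.zeros(wave.shape, dtype=float)
    for lo, hi in keep_ranges:
        mask[(wave >= lo) & (wave <= hi)] = 1.0
    return mask

def masked_chi2(obs, model, err, mask):
    """Chi-square in which masked pixels (mask = 0) contribute nothing."""
    return float(np.sum(mask * ((obs - model) / err) ** 2))

# Keep 4200-4500 A and 5200-5700 A; exclude the region in between.
wave = np.arange(4200.0, 5701.0, 1.0)
mask = build_wavelength_mask(wave, [(4200.0, 4500.0), (5200.0, 5700.0)])

model = np.ones_like(wave)
obs = model.copy()
obs[wave == 4800.0] = 10.0          # large defect inside the excluded region
err = np.full_like(wave, 0.01)

chi2 = masked_chi2(obs, model, err, mask)
print("kept pixels:", int(mask.sum()), "chi2:", chi2)
```

The same `masked_chi2` value could serve as the selection criterion for the multiple-initialization plan in the first row: run the fit from several starting points and keep the solution with the lowest χ².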

📄 Citation

When using this code, please cite the following works:

  1. ULySS: a full spectrum fitting package
  2. Coudé-feed stellar spectral library – atmospheric parameters
  3. The first data release (DR1) of the LAMOST regular survey
  4. Scalable Stellar Parameter Inference Using Python-Based LASP: From CPU Optimization to GPU Acceleration

⚖ License

This project is released under the GNU General Public License v3.0. See the LICENSE file for details.

