-
Notifications
You must be signed in to change notification settings - Fork 31
Description
Summary
Two categories of thread-safety problems were identified in CFITSIO that cause data corruption and occasional crashes in multithreaded environments.
They were found while debugging FITS header corruption in the AEGIS preprocessing pipeline, which runs 10–32 concurrent workers using the in-memory FITS driver.
The race conditions were observed and reproduced consistently on both ARM64 (macOS 14.6.1, Clang 17) and x86-64 (Debian 13.1 / Arch Linux, GCC 15.2 / Clang 21).
Affected areas
- Initialization races when multiple threads call CFITSIO entry points simultaneously.
- Multiple data races in the memory driver (
drvrmem.c) when multiple threads open, read, write, or close FITS buffers concurrently.
Impact
In tests using ThreadSanitizer (TSan) with 16 threads and 2000 iterations each (after suppressing the nullptr deref):
- 503 slot collisions (two threads assigned the same handle, at the same time)
- 4,278 failed operations out of 32,000 total
- Frequent segmentation faults and corrupted FITS data
This affects any program using CFITSIO’s memory driver concurrently, including those using fits_open_memfile().
Root Cause
1. Initialization Race
CFITSIO entry points (ffomem, ffopen, ffinit, ffimem) conditionally call fits_init_cfitsio() based on a shared global flag:
if (need_to_initialize) { // unprotected read
*status = fits_init_cfitsio(); // multiple threads can enter here
}If two threads enter this block concurrently, both execute initialization and write to shared global state (driverTable, handleTable, no_of_drivers).
TSan reports races on these variables and on the need_to_initialize flag itself.
2. Memory Driver Races
The memory driver maintains a global memTable[] shared by all threads:
static memdriver memTable[NMAXFILES];
Functions such as mem_openmem(), mem_truncate(), mem_close_*(), mem_read(), and mem_write() access memTable[] without synchronization.
Example (simplified):
if (memTable[i].memaddrptr == 0) { // both threads see slot as free
memTable[i].memaddrptr = buffptr; // both write to same slot
}Some higher-level functions acquire global locks (e.g., FFLOCK in ffomem()), but these do not cover all code paths into the memory driver, leaving certain accesses unprotected
This leads to slot collisions, use-after-free errors, and data corruption when multiple threads allocate or release slots simultaneously.
Reproduction
A deterministic reproducer is available here:
https://github.com/fschledorn/cfitsio_concurrency_fix/tree/develop
cd cfitsio-concurrency-reproducer
make clean && make all && make runWithout patches, TSan reports reproducible races.
With the patches applied, no warnings occur.
The original TSan logs are attached tsan_enhanced_test_output.txt
Fixes
Fix 1 – Initialization Race
Each CFITSIO entry point now calls fits_init_cfitsio() unconditionally.
Inside fits_init_cfitsio(), a lock-protected flag ensures initialization occurs only once.
int fits_init_cfitsio(void) {
FFLOCK;
if (cfitsio_is_initialized) {
FFUNLOCK;
return 0;
}
/* initialization */
cfitsio_is_initialized = 1;
FFUNLOCK;
return 0;
}Implemented in commit d53d489
Fix 2 – Memory Driver Synchronization
Added FFLOCK / FFUNLOCK guards to all operations accessing memTable[], including:
- mem_openmem(), mem_createmem()
- mem_truncate()
- mem_close_*()
- mem_read(), mem_write(), mem_seek(), mem_size()
Example:
int mem_openmem(/* ... */) {
FFLOCK;
/* allocate slot */
FFUNLOCK;
return 0;
}Implemented in commits fschledorn@0fbed9f and 6d38f9c
Feedback on the locking strategy, potential ABI implications, or alternative synchronization approaches is very welcome. I can also provide performance and TSan logs on request.