Skip to content

Data Race Issues in fits_open_memfile() #100

@fschledorn

Description

@fschledorn

Summary

Two categories of thread-safety problems were identified in CFITSIO that cause data corruption and occasional crashes in multithreaded environments.
They were found while debugging FITS header corruption in the AEGIS preprocessing pipeline, which runs 10–32 concurrent workers using the in-memory FITS driver.
The race conditions were observed and reproduced consistently on both ARM64 (macOS 14.6.1, Clang 17) and x86-64 (Debian 13.1 / Arch Linux, GCC 15.2 / Clang 21).

Affected areas

  1. Initialization races when multiple threads call CFITSIO entry points simultaneously.
  2. Multiple data races in the memory driver (drvrmem.c) when multiple threads open, read, write, or close FITS buffers concurrently.

Impact

In tests using ThreadSanitizer (TSan) with 16 threads and 2000 iterations each (after suppressing the nullptr deref):

  • 503 slot collisions (two threads assigned the same handle, at the same time)
  • 4,278 failed operations out of 32,000 total
  • Frequent segmentation faults and corrupted FITS data

This affects any program using CFITSIO’s memory driver concurrently, including those using fits_open_memfile().


Root Cause

1. Initialization Race

CFITSIO entry points (ffomem, ffopen, ffinit, ffimem) conditionally call fits_init_cfitsio() based on a shared global flag:

if (need_to_initialize) {           // unprotected read
    *status = fits_init_cfitsio();  // multiple threads can enter here
}

If two threads enter this block concurrently, both execute initialization and write to shared global state (driverTable, handleTable, no_of_drivers).
TSan reports races on these variables and on the need_to_initialize flag itself.

2. Memory Driver Races

The memory driver maintains a global memTable[] shared by all threads:

static memdriver memTable[NMAXFILES];

Functions such as mem_openmem(), mem_truncate(), mem_close_*(), mem_read(), and mem_write() access memTable[] without synchronization.

Example (simplified):

if (memTable[i].memaddrptr == 0) {     // both threads see slot as free
    memTable[i].memaddrptr = buffptr;  // both write to same slot
}

Some higher-level functions acquire global locks (e.g., FFLOCK in ffomem()), but these do not cover all code paths into the memory driver, leaving certain accesses unprotected

This leads to slot collisions, use-after-free errors, and data corruption when multiple threads allocate or release slots simultaneously.


Reproduction

A deterministic reproducer is available here:

https://github.com/fschledorn/cfitsio_concurrency_fix/tree/develop

cd cfitsio-concurrency-reproducer
make clean && make all && make run

Without patches, TSan reports reproducible races.
With the patches applied, no warnings occur.

The original TSan logs are attached tsan_enhanced_test_output.txt


Fixes

Fix 1 – Initialization Race

Each CFITSIO entry point now calls fits_init_cfitsio() unconditionally.
Inside fits_init_cfitsio(), a lock-protected flag ensures initialization occurs only once.

int fits_init_cfitsio(void) {
    FFLOCK;
    if (cfitsio_is_initialized) {
        FFUNLOCK;
        return 0;
    }
    /* initialization */
    cfitsio_is_initialized = 1;
    FFUNLOCK;
    return 0;
}

Implemented in commit d53d489

Fix 2 – Memory Driver Synchronization

Added FFLOCK / FFUNLOCK guards to all operations accessing memTable[], including:

  • mem_openmem(), mem_createmem()
  • mem_truncate()
  • mem_close_*()
  • mem_read(), mem_write(), mem_seek(), mem_size()

Example:

int mem_openmem(/* ... */) {
    FFLOCK;
    /* allocate slot */
    FFUNLOCK;
    return 0;
}

Implemented in commits fschledorn@0fbed9f and 6d38f9c


Feedback on the locking strategy, potential ABI implications, or alternative synchronization approaches is very welcome. I can also provide performance and TSan logs on request.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions