Skip to content

Conversation

@THE-Amrit-mahto-05
Copy link
Contributor

@THE-Amrit-mahto-05 THE-Amrit-mahto-05 commented Jan 31, 2026

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Description

As discussed on Zulip, I had a question about memory ownership in stream_functions.rs. Some pointers coming from C (like the demuxer pointer) were being wrapped in Box::from_raw, which assumes Rust owns them and will free them. The allocation actually happens on the C side, which could cause undefined behavior (UB) and allocator mismatches.

So proper handling is needed, especially as the platform is moving towards Rust-managed memory for safety and maintainability.

Problem

  • Allocator mismatch / UB: Some C-allocated memory was being freed by Rust using Box::from_raw, which is unsafe.
  • Miri test failures: Isolated operations like open and drop(Box::from_raw) caused errors in Miri.
  • Potential crashes / undefined behavior: If Rust tries to free C-allocated memory incorrectly, it can panic or corrupt memory.

Changes Made

  • Transferred ownership to Rust for certain allocations that Rust should manage.
  • Replaced drop(Box::from_raw(ptr)) with a safe free(ptr) where the memory is still C-managed.
  • Used Rust’s dealloc for Rust-allocated memory, ensuring ownership and deallocation match.
  • Exposed free in libccxr_exports::demuxer to correctly free C-allocated memory where needed.
  • Made CCX_DTVCC_MAX_COLUMNS public for safe Rust usage when calculating layouts in deallocations.
  • Ensured sub.data in simplexml.rs is freed with the correct allocator (free) instead of Box::from_raw.

Flowchart / Old vs New Behavior

  • Old Behavior (Unsafe / UB Risk)
C allocates memory -> Pointer passed to Rust -> Rust does Box::from_raw -> Rust frees

Problems
Allocator mismatch
UB if C malloc and Rust dealloc mismatch
Miri and test errors

  • New Behavior (Safe Ownership / Correct Free)
C allocates memory -> Pointer passed to Rust ->Rust uses pointer but calls `free(ptr)` from C (no mismatch)
      
Rust allocates memory -> Pointer stays in Rust -> Rust deallocs with `dealloc`

Benefits

  • Memory safely freed according to its allocator
  • Ownership matches Rust or C context
  • Eliminates UB and test failures

Why Ownership Changed to Rust

  • The platform is moving towards Rust-first memory management.
  • Centralizing ownership in Rust reduces complexity and makes future maintenance safer.
  • Rust’s compile-time guarantees prevent memory leaks and runtime crashes.

Test Plan

  • cargo test (all decoder, demuxer, and encoder tests passed)
  • cargo miri test (Miri no longer reports UB in ownership paths)

Copy link
Contributor

@cfsmp3 cfsmp3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this important fix! Allocator mismatch between C and Rust is a real source of undefined behavior, and this PR correctly identifies the problem.

However, the implementation could be improved to follow better Rust patterns. Here are the issues I found:

1. Layout calculation duplicated across files (UB risk)

The same layout calculation now appears in 4 places:

Layout::array::<dtvcc_symbol>(CCX_DTVCC_MAX_COLUMNS as usize)
  • window.rs:164 (allocation)
  • window.rs:675 (allocation in tests)
  • window.rs:706 (allocation in tests)
  • lib.rs:278 (deallocation - new in this PR)

If anyone changes the allocation size without updating all deallocation sites, it's instant UB. Allocation and deallocation logic should be encapsulated together.

Suggested fix: Create helper functions in decoder/mod.rs or decoder/window.rs:

fn row_layout() -> Layout {
    Layout::array::<dtvcc_symbol>(CCX_DTVCC_MAX_COLUMNS as usize)
        .expect("row layout calculation")
}

pub unsafe fn alloc_row() -> *mut dtvcc_symbol {
    alloc_zeroed(row_layout()) as *mut dtvcc_symbol
}

pub unsafe fn dealloc_row(ptr: *mut dtvcc_symbol) {
    if !ptr.is_null() {
        dealloc(ptr as *mut u8, row_layout());
    }
}

Then lib.rs just calls decoder::dealloc_row(*row_ptr) - no Layout knowledge needed outside the module.

2. Duplicate free declarations

There are now TWO extern "C" fn free declarations:

  • demuxer/common_types.rs:8 - private
  • libccxr_exports/demuxer.rs:24 - now public

Please consolidate these.

3. free exported from wrong module

// In simplexml.rs (encoder module)
use crate::libccxr_exports::demuxer::free;

free() is a C stdlib function, not a demuxer-specific function. This is confusing.

Suggested fix: Create a central module for FFI memory helpers:

// src/rust/src/ffi_alloc.rs (new file)
use std::os::raw::c_void;

extern "C" {
    fn free(ptr: *mut c_void);
}

/// Frees memory allocated by C code (malloc/calloc)
#[inline]
pub unsafe fn c_free<T>(ptr: *mut T) {
    if !ptr.is_null() {
        free(ptr as *mut c_void);
    }
}

Then use ffi_alloc::c_free(ptr) everywhere.

4. Making CCX_DTVCC_MAX_COLUMNS public

This leaks internal implementation details just for deallocation. With the alloc_row()/dealloc_row() approach above, this constant can stay private.

Summary

The fix is correct in principle, but please refactor to:

  1. Encapsulate row alloc/dealloc in helper functions (prevents Layout mismatch bugs)
  2. Create a central ffi_alloc module for C memory functions (or add libc crate)
  3. Remove duplicate free declaration in common_types.rs
  4. Keep CCX_DTVCC_MAX_COLUMNS private

Happy to discuss if you have questions!

@THE-Amrit-mahto-05
Copy link
Contributor Author

@cfsmp3
Thanks for the detailed review
I’ve refactored the implementation to fully address the points you raised

  1. Centralized row allocation/deallocation via alloc_row/dealloc_row
  2. Removed duplicated Layout calculations
  3. Introduced a central ffi_alloc module for C allocator interactions
  4. Consolidated free declarations and removed demuxer-specific exposure
  5. Kept CCX_DTVCC_MAX_COLUMNS crate-private

All allocation/deallocation paths now match ownership correctly
Please let me know if required any new changes

@ccextractor-bot
Copy link
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 032cd1c...:
Report Name Tests Passed
Broken 13/13
CEA-708 14/14
DVB 6/7
DVD 3/3
DVR-MS 2/2
General 27/27
Hardsubx 1/1
Hauppage 3/3
MP4 3/3
NoCC 10/10
Options 85/86
Teletext 21/21
WTV 13/13
XDS 34/34

Your PR breaks these cases:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --bom c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@ccextractor-bot
Copy link
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 032cd1c...:
Report Name Tests Passed
Broken 13/13
CEA-708 14/14
DVB 7/7
DVD 3/3
DVR-MS 2/2
General 27/27
Hardsubx 1/1
Hauppage 3/3
MP4 3/3
NoCC 10/10
Options 85/86
Teletext 21/21
WTV 13/13
XDS 34/34

Your PR breaks these cases:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants