Skip to content

Conversation

@Aryan-Shan
Copy link

@Aryan-Shan Aryan-Shan commented Dec 5, 2025

Fixes #675

Description

This PR introduces a powerful OCR text selection feature that allows users to extract and copy text from images directly within PictoPy. The implementation uses Tesseract.js on the frontend for a smooth, responsive experience without heavy backend dependencies.

Key Features

  • OCR Text Selection: Press Ctrl+T to activate text recognition on any image
  • Smart Text Detection: Automatically identifies and highlights text regions
  • Selection & Copy: Select any text and copy with Ctrl+C
  • Visual Feedback: Glassmorphism-style notifications for all actions
  • Image Locking: Automatically locks image panning during text selection

Technical Implementation

  • Frontend OCR: Uses Tesseract.js v5.1.0 (stable) for client-side text recognition
  • Performance: Worker-based OCR processing prevents UI blocking
  • Precise Alignment: Calculates scale factors to ensure selection highlights match displayed image
  • Event Handling: Proper event propagation management to work alongside zoom/pan features
  • Responsive UI: Smooth animations and modern glassmorphism design

New Dependencies

User Experience

  1. Open any image in PictoPy
  2. Press Ctrl+T to activate OCR text selection
  3. Hover over detected text (highlighted in light purple)
  4. Select text like in a regular document
  5. Press Ctrl+C to copy selected text
  6. See confirmation notification
  7. Press Ctrl+T again to exit selection mode

Notes

  • The feature works entirely client-side, no server processing required
  • Text recognition supports multiple languages (English by default)
  • Selection works reliably even with zoomed/panned images
  • Memory efficient with proper worker cleanup

Testing

  • Tested with various image types (screenshots, documents, photos)
  • Verified copy functionality across different text densities
  • Confirmed proper handling of scaled images
  • Tested keyboard shortcut reliability

Summary by CodeRabbit

  • New Features
    • OCR added to the image viewer to extract selectable text from images.
    • Press Ctrl+T to toggle OCR; recognized text appears as a selectable, copyable overlay (Ctrl+C) with brief copy confirmation.
    • Visual loading indicator and “Text Selection Active” badge during/after processing.
    • Overlay respects image scale and rotation and adapts on resize; image load errors fallback to a placeholder.
    • New text-overlay and OCR service support added.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 5, 2025

Walkthrough

Adds frontend OCR: tesseract.js dependency, a singleton OCRService (worker lifecycle + recognize/terminate), ImageViewer integration with Ctrl+T toggle, image scale tracking and OCR invocation, and a new TextOverlay component rendering selectable, copyable OCR overlays.

Changes

Cohort / File(s) Change Summary
Dependencies
frontend/package.json
Added dependency tesseract.js: ^5.1.0.
OCR Service
frontend/src/services/OCRService.ts
New singleton ocrService that lazily initializes a Tesseract worker, exposes recognize(imagePath) and terminate(), caches worker state, and logs errors.
Image Viewer
frontend/src/components/Media/ImageViewer.tsx
Integrated OCR state (isOCRActive, isOCRLoading, ocrData), keyboard toggle (Ctrl+T), image ref + ResizeObserver scale calculation, OCR invocation via ocrService.recognize(...), loading/error overlays, rotation handling, and conditional rendering of TextOverlay.
Text Overlay
frontend/src/components/Media/TextOverlay.tsx
New TextOverlay component that consumes Tesseract Page data to render absolutely-positioned, scaled, selectable text spans; handles global Ctrl+C copy-to-clipboard with feedback and includes highlight/animation styling.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ImageViewer
    participant OCRService
    participant TesseractWorker as Tesseract.js Worker
    participant TextOverlay

    User->>ImageViewer: Press Ctrl+T (toggle OCR)
    ImageViewer->>ImageViewer: set isOCRActive=true, isOCRLoading=true
    ImageViewer->>OCRService: recognize(imagePath)
    Note right of ImageViewer: show OCR processing overlay
    OCRService->>OCRService: getWorker() (lazy init)
    OCRService->>TesseractWorker: initialize worker, load language
    TesseractWorker-->>OCRService: worker ready
    OCRService->>TesseractWorker: run recognize(image)
    TesseractWorker-->>OCRService: return OCR Page result
    OCRService-->>ImageViewer: return ocrData
    ImageViewer->>ImageViewer: set isOCRLoading=false, store ocrData
    ImageViewer->>TextOverlay: render with ocrData + scale
    User->>TextOverlay: Select text + Ctrl+C
    TextOverlay->>TextOverlay: capture selection, write to clipboard
    TextOverlay-->>User: show "Copied!" feedback
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Pay attention to:
    • frontend/src/components/Media/ImageViewer.tsx: ResizeObserver lifecycle, image scale calculation, interaction with existing pan/zoom and keyboard handlers, and rotation-based layout adjustments.
    • frontend/src/services/OCRService.ts: worker initialization, error handling, reuse/termination semantics, and concurrency during repeated recognize calls.
    • frontend/src/components/Media/TextOverlay.tsx: global keydown listener registration/removal, clipboard use, and accurate bbox scaling/positioning.

Possibly related PRs

Suggested labels

enhancement, UI

Suggested reviewers

  • rahulharpal1603

Poem

🐇
I nibble shadows, trace each line,
Tesseract wakes letters, crisp and fine.
Ctrl+T opens a meadow of text,
I hop, you copy — the words connect. ✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: adding OCR text selection with Ctrl+T shortcut and copy functionality, which matches the core implementation across all modified files.
Linked Issues check ✅ Passed The PR successfully implements all required objectives from issue #675: OCR text extraction via tesseract.js, selectable text overlay with light-purple highlights, mouse drag selection, Ctrl+C copy functionality, Ctrl+T toggle, automatic panning lock, proper scaling for zoomed/panned images, worker-based processing, and glassmorphism feedback.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #675 requirements: tesseract.js dependency, OCRService for worker management, ImageViewer integration with OCR state and keyboard shortcuts, and TextOverlay component for rendering selectable text.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
frontend/src/components/Media/ImageViewer.tsx (1)

76-107: Consider worker cleanup for memory efficiency.

The OCR worker persists for the app's lifetime. For a Tauri desktop app, this is likely acceptable. However, if memory efficiency is a concern for long-running sessions, consider terminating the worker after a period of inactivity or when the ImageViewer unmounts.

// Optional: Add cleanup on unmount
useEffect(() => {
  return () => {
    // Optionally terminate worker when viewer unmounts
    // ocrService.terminate();
  };
}, []);
frontend/src/components/Media/TextOverlay.tsx (1)

105-125: Consider extracting inline styles to a CSS file.

The inline <style> tag is injected on every render. While functional, this could cause style duplication if multiple overlays exist. For better maintainability, extract these styles to a CSS module or the project's stylesheet.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c37d8df and 24bbb6b.

⛔ Files ignored due to path filters (1)
  • frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (4)
  • frontend/package.json (1 hunks)
  • frontend/src/components/Media/ImageViewer.tsx (3 hunks)
  • frontend/src/components/Media/TextOverlay.tsx (1 hunks)
  • frontend/src/services/OCRService.ts (1 hunks)
🔇 Additional comments (11)
frontend/package.json (1)

69-69: LGTM! Appropriate dependency for OCR functionality.

The tesseract.js library is a well-established client-side OCR solution. Version 5.1.0 uses web workers by default, which aligns with the PR's goal of non-blocking OCR processing.

frontend/src/services/OCRService.ts (2)

7-32: Solid lazy initialization pattern with proper error recovery.

The worker initialization correctly handles the race condition where multiple calls to getWorker() occur simultaneously by caching the promise. The error handling appropriately clears workerPromise to allow retry on failure.


54-54: Singleton pattern is appropriate; verify worker cleanup on app lifecycle.

The singleton ensures a single worker instance is reused across the app. Ensure ocrService.terminate() is called when the application unmounts or when the feature is no longer needed to free up memory.

frontend/src/components/Media/ImageViewer.tsx (4)

1-7: Clean imports for OCR integration.

The imports are well-organized and appropriate for the new OCR functionality.


39-44: Good practice to reset OCR state on image change.

Clearing OCR state when imagePath changes prevents stale overlay data from appearing on a different image.


46-74: Well-implemented scale tracking with proper cleanup.

The ResizeObserver pattern correctly tracks image dimensions for overlay alignment. The cleanup properly removes the event listener and disconnects the observer.


109-183: Well-structured OCR UI with appropriate status indicators.

The loading and active state overlays provide clear user feedback. Disabling panning during OCR mode correctly prioritizes text selection UX.

Minor: convertFileSrc(imagePath) is called in multiple places. Consider memoizing it if performance becomes a concern, though the impact is likely negligible.

frontend/src/components/Media/TextOverlay.tsx (4)

5-8: Clean interface definition.

The props interface is well-defined with appropriate optionality for scale.


54-70: Good event propagation management for text selection.

The onMouseDown stopPropagation correctly prevents the zoom/pan wrapper from capturing mouse events during text selection.


71-104: Clever transparent text overlay approach for selection.

The transparent text with visible hover states allows native text selection while maintaining the original image visibility. The title attribute provides accessibility via tooltips.


38-39: Avoid as any type assertion; use proper typing.

The Page type from tesseract.js v5 includes a lines property as Line[], so the as any assertion is unnecessary and bypasses type safety. Use ocrData.lines ?? [] instead. If TypeScript raises errors after removing the assertion, verify that the tesseract.js type definitions are correctly installed and that ocrData is properly typed as Page.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 24bbb6b and 3bb8d9b.

📒 Files selected for processing (3)
  • frontend/src/components/Media/ImageViewer.tsx (3 hunks)
  • frontend/src/components/Media/TextOverlay.tsx (1 hunks)
  • frontend/src/services/OCRService.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/src/components/Media/TextOverlay.tsx
🧰 Additional context used
🧬 Code graph analysis (1)
frontend/src/components/Media/ImageViewer.tsx (2)
frontend/src/services/OCRService.ts (1)
  • ocrService (59-59)
frontend/src/components/Media/TextOverlay.tsx (1)
  • TextOverlay (10-134)
🔇 Additional comments (5)
frontend/src/services/OCRService.ts (1)

1-59: Well-structured singleton OCR service with proper lifecycle management.

The implementation correctly handles:

  • Lazy worker initialization with caching
  • Race condition in terminate() by awaiting pending workerPromise (previous review issue addressed)
  • Error recovery by clearing workerPromise on initialization failure, allowing retry

One consideration: the worker persists until explicitly terminated. If OCR is used infrequently, you may want to add an idle timeout to auto-terminate the worker and free resources.

frontend/src/components/Media/ImageViewer.tsx (4)

1-29: Clean state management for OCR integration.

The separation of isOCRActive, ocrData, and isOCRLoading provides clear control over the different aspects of the OCR feature. Using imgRef for scale calculation is the right approach.


46-74: Robust scale calculation with proper cleanup.

Good defensive check for naturalWidth > 0 to avoid division issues. The ResizeObserver ensures the overlay stays aligned when the image container resizes.


82-117: Stale data race condition properly addressed.

The imagePathRef pattern correctly prevents OCR results from being applied to a different image (previous review issue addressed).

Minor consideration: if the user rapidly toggles OCR off/on while loading, the single ocrData check at line 93 might cause the second activation to skip OCR since ocrData is still being populated from the first request. This is edge-case behavior that's unlikely to cause real problems.


138-192: Good integration of OCR overlay with zoom/pan controls.

The panning={{ disabled: isOCRActive }} correctly locks image movement during text selection. Placing TextOverlay inside the rotated container ensures proper alignment with rotated images.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
frontend/src/components/Media/ImageViewer.tsx (4)

25-29: Tighten transformRef typing; OCR state additions look fine.

imgRef, isOCRActive, ocrData, isOCRLoading, and imageScale are well-structured for the OCR flow. The only nit is transformRef being any, which loses type safety around zoom/pan methods. If react-zoom-pan-pinch exposes a ref type (e.g., ReactZoomPanPinchRef), prefer:

-import { TransformWrapper, TransformComponent } from 'react-zoom-pan-pinch';
+import { TransformWrapper, TransformComponent, ReactZoomPanPinchRef } from 'react-zoom-pan-pinch';

-const transformRef = useRef<any>(null);
+const transformRef = useRef<ReactZoomPanPinchRef | null>(null);

47-75: Scale calculation is reasonable; verify alignment for extreme aspect ratios.

Using width / naturalWidth with a ResizeObserver on the <img> is a solid approach and should keep overlays aligned under zoom/pan (since both image and overlay sit inside the same transformed container). The only potential edge case is very tall/wide images where objectFit: 'contain' introduces letterboxing; in those cases, you may need to factor in height and potential offsets (e.g., via getBoundingClientRect() plus container size) to keep OCR lines perfectly aligned.


83-118: Ctrl+T OCR toggle and stale-result guard look solid; consider global hotkey scope.

The Ctrl+T handler correctly:

  • Toggles isOCRActive and respects existing ocrData/isOCRLoading.
  • Prevents duplicate OCR runs.
  • Uses imagePathRef so late results don’t overwrite state after an image change.
  • Cleans up the event listener on unmount.

One thing to double‑check: attaching the handler to window means Ctrl+T is intercepted anywhere in the app while this viewer is mounted. If there are (now or later) other views that need Ctrl+T or browser‑style shortcuts, you may want to scope this to when the viewer (or its parent panel) is focused instead of globally.


162-191: Image container + TextOverlay integration looks correct; watch alignment if layout changes.

Placing the <img> and <TextOverlay> inside a relatively positioned, rotated container keeps OCR rectangles and image in the same transform context, and draggable={false} avoids conflicts with selection. Passing scale={imageScale} into TextOverlay lines up with the earlier scale calculation; as long as the image sizing rules remain maxWidth/maxHeight + objectFit: 'contain', overlays should stay in sync under zoom/rotation.

If future layout tweaks change how the image is sized/positioned, remember this coupling between imageScale and the <img>’s rendered box so the overlay doesn’t drift.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3bb8d9b and 9e2cc0d.

📒 Files selected for processing (1)
  • frontend/src/components/Media/ImageViewer.tsx (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
frontend/src/components/Media/ImageViewer.tsx (2)
frontend/src/services/OCRService.ts (1)
  • ocrService (59-59)
frontend/src/components/Media/TextOverlay.tsx (1)
  • TextOverlay (10-134)
🔇 Additional comments (6)
frontend/src/components/Media/ImageViewer.tsx (6)

1-7: OCR-related imports look good; confirm Page type matches ocrService output.

The added imports (OCRService singleton, TextOverlay, tesseract.js Page, Loader2) are cohesive and keep OCR concerns localized to this component. Just ensure ocrService.recognize actually returns a Page (or a compatible shape) so the Page | null typing remains accurate end‑to‑end.


39-45: Resetting OCR state on image/reset change is correct.

Resetting isOCRActive, ocrData, and isOCRLoading together when imagePath/resetSignal changes avoids stale overlays and loading indicators carrying over between images. This aligns well with the intended flow.


77-82: imagePathRef pattern is a good guard against stale OCR updates.

Keeping the latest imagePath in a ref and syncing it via useEffect gives you a simple way to ignore late OCR results from previous images. The setup here works cleanly with the reset effect above.


121-137: Status badges cleanly communicate OCR state.

The “Processing Text…” and “Text Selection Active” glassmorphism badges are clearly conditioned on isOCRLoading vs. isOCRActive && ocrData and are layered with appropriate z‑indices. This gives good feedback without interfering with the image content.


139-147: Disabling panning while OCR is active matches the selection UX requirement.

Using panning={{ disabled: isOCRActive }} on TransformWrapper is a straightforward way to lock drag‑panning during OCR mode while still allowing zoom. This matches the “lock image panning while text selection is active” objective.


148-161: Wrapper/content styles are appropriate for overlays and zoom.

Setting wrapperStyle to full width/height with overflow: 'visible' and centering content ensures the image and overlays remain visible under zoom/pan without clipping. This integrates well with the absolute‑positioned overlay approach.

Copy link
Contributor

@rohan-pandeyy rohan-pandeyy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, kindly go through these reviews.
One more and one of the most suggestion is that you should first get these features discussed and approved with @rahulharpal1603, for the implementation to be merged into PictoPy.

}}
<div className="relative w-full h-full">
{isOCRLoading && (
<div className="absolute top-6 right-6 z-[60] bg-zinc-900/80 text-white px-4 py-2 rounded-full flex items-center gap-3 backdrop-blur-md border border-white/10 shadow-xl animate-in fade-in slide-in-from-top-2 duration-300">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tailwind is warning that the class z-[60] can be written as z-60. Since it’s a standard value, we don’t need the arbitrary syntax here.

)}

{isOCRActive && !isOCRLoading && ocrData && (
<div className="absolute top-6 right-6 z-[60] bg-blue-600/90 text-white px-4 py-2 rounded-full flex items-center gap-2 backdrop-blur-md border border-white/10 shadow-xl animate-in fade-in slide-in-from-top-2 duration-300">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same tailwind warning here, kindly refer to the above review comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the "Processing Text" and "Text Selection Active" visuals, I'd suggest you move these two divs' position. Currently they're covering the buttons on top right, making them inaccessible.

<>
{showCopyFeedback && (
<div
className="fixed bottom-8 left-1/2 transform -translate-x-1/2 z-[100] bg-zinc-900/90 text-white px-4 py-2.5 rounded-full flex items-center gap-2.5 backdrop-blur-md shadow-xl border border-white/10 animate-in fade-in slide-in-from-bottom-4 duration-300"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar tailwind warning can be seen here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another addition required would be that you should add a button for text detection along side the shortcut which most users might never discover.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
frontend/src/components/Media/ImageViewer.tsx (2)

158-161: Use standard Tailwind z-60 instead of arbitrary z-[60]} syntax.

Per previous review feedback, Tailwind warns that z-[60] can be simplified to z-60 since it's a standard value.

-          className={`absolute top-6 left-6 z-60 flex items-center gap-2 ...
+          className={`absolute top-6 left-6 z-60 flex items-center gap-2 ...

Apply the same fix to lines 171 and 178 if they currently use z-[60].

Also applies to: 171-171, 178-178


218-235: Consider adding loading="lazy" for performance.

For a gallery/viewer application, lazy loading can improve initial render performance when multiple images might be in the DOM.

 <img
   ref={imgRef}
   src={convertFileSrc(imagePath) || '/placeholder.svg'}
   alt={alt}
   draggable={false}
+  loading="lazy"
   className="select-none block max-w-full max-h-full"
frontend/src/components/Media/TextOverlay.tsx (3)

43-44: Type safety: Avoid as any cast; define or extend the type properly.

Casting ocrData as any and using line: any loses type safety and IDE assistance. The Page type from tesseract.js should include lines, but if it doesn't match your usage, define a proper interface.

+interface OcrLine {
+  bbox: { x0: number; y0: number; x1: number; y1: number };
+  text: string;
+}
+
+interface OcrPageWithLines extends Page {
+  lines?: OcrLine[];
+}

 interface TextOverlayProps {
-    ocrData: Page | null;
+    ocrData: OcrPageWithLines | null;
     scale?: number;
 }

Then update the map:

-    const lines = (ocrData as any).lines || [];
+    const lines: OcrLine[] = ocrData.lines ?? [];
     // ...
-    {lines.map((line: any, index: number) => {
+    {lines.map((line, index) => {

Also applies to: 76-77


110-130: Consider moving inline styles to a CSS module or styled-components.

Inline <style> tags within component render can cause:

  1. Style duplication if multiple instances render
  2. Potential conflicts with global styles
  3. No CSS scoping guarantees

For this PR, it's acceptable since TextOverlay is typically a singleton, but for maintainability, consider extracting to a CSS file.


16-31: Copy handler intercepts all Ctrl+C events globally.

The handler captures Ctrl+C on window, which could interfere with copy operations in other parts of the app. Consider scoping this to only when the overlay or its content is focused/selected.

Add a check to ensure the selection is within the overlay:

 const handleKeyDown = async (e: KeyboardEvent) => {
     if (e.ctrlKey && e.key.toLowerCase() === 'c') {
         const selection = window.getSelection();
         const text = selection?.toString().trim();
+        
+        // Only handle if selection is within the OCR overlay
+        const anchorNode = selection?.anchorNode;
+        const isWithinOverlay = anchorNode && 
+            document.querySelector('.ocr-overlay-container')?.contains(anchorNode);
+        if (!isWithinOverlay) return;

         if (text && text.length > 0) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e2cc0d and 4a3362c.

📒 Files selected for processing (2)
  • frontend/src/components/Media/ImageViewer.tsx (3 hunks)
  • frontend/src/components/Media/TextOverlay.tsx (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
frontend/src/components/Media/ImageViewer.tsx (2)
frontend/src/services/OCRService.ts (1)
  • ocrService (59-59)
frontend/src/components/Media/TextOverlay.tsx (1)
  • TextOverlay (10-134)
🔇 Additional comments (6)
frontend/src/components/Media/ImageViewer.tsx (3)

47-75: ResizeObserver setup is solid.

The effect correctly handles initial load, load events, and resize observation with proper cleanup. The dependency on imagePath ensures re-subscription when the image changes.


194-194: Disabling panning during OCR mode is a good UX choice.

This correctly prevents accidental panning while the user is trying to select text, aligning with the PR objective to lock image panning during text selection.


220-220: Fallback to placeholder won't trigger if convertFileSrc returns an empty string.

The expression convertFileSrc(imagePath) || '/placeholder.svg' only falls back if convertFileSrc returns a falsy value. If it returns an empty string "", the fallback works, but if it throws or returns an invalid URL, the onError handler is the actual safety net. The current approach is acceptable since onError handles the failure case.

frontend/src/components/Media/TextOverlay.tsx (3)

13-39: Timeout cleanup is properly implemented.

The feedback timeout is correctly cleared on unmount, addressing the previous memory leak concern. The event listener cleanup is also correct.


83-108: Line rendering logic is well-implemented.

The positioning calculations correctly apply the scale factor to bbox coordinates. Using title={text} provides accessibility for the transparent text. The styling choices (transparent color, text cursor) align with the UX goal of invisible-but-selectable text.


50-50: z-100 is not a default Tailwind utility class.

Tailwind's default z-index scale includes z-0, z-10, z-20, z-30, z-40, z-50, and z-auto only. The class z-100 will be ignored unless it's added to your theme configuration or you use the arbitrary value syntax z-[100].

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
frontend/src/components/Media/ImageViewer.tsx (1)

83-105: Consider using refs to prevent unnecessary callback recreation.

The triggerOCR callback includes ocrData and isOCRLoading in its dependency array, causing it to be recreated every time these values change. This triggers the keyboard handler effect (line 125) to re-run and re-register the event listener unnecessarily.

Since these values are only used for a guard check (line 84), consider using refs instead:

+  const ocrDataRef = useRef(ocrData);
+  const isOCRLoadingRef = useRef(isOCRLoading);
+  useEffect(() => {
+    ocrDataRef.current = ocrData;
+    isOCRLoadingRef.current = isOCRLoading;
+  }, [ocrData, isOCRLoading]);

   const triggerOCR = useCallback(async () => {
-    if (ocrData || isOCRLoading) return;
+    if (ocrDataRef.current || isOCRLoadingRef.current) return;

     setIsOCRLoading(true);
     const currentPath = imagePathRef.current;

     try {
       const src = convertFileSrc(currentPath);
       const data = await ocrService.recognize(src);

       if (currentPath === imagePathRef.current) {
         setOcrData(data);
       }
     } catch (error) {
       console.error('Failed to perform OCR', error);
       setIsOCRActive(false);
     } finally {
       if (currentPath === imagePathRef.current) {
         setIsOCRLoading(false);
       }
     }
-  }, [ocrData, isOCRLoading]);
+  }, []);

This prevents the callback from being recreated and stabilizes the keyboard handler registration.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a3362c and a223a88.

📒 Files selected for processing (1)
  • frontend/src/components/Media/ImageViewer.tsx (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
frontend/src/components/Media/ImageViewer.tsx (2)
frontend/src/services/OCRService.ts (1)
  • ocrService (59-59)
frontend/src/components/Media/TextOverlay.tsx (1)
  • TextOverlay (10-134)
🔇 Additional comments (8)
frontend/src/components/Media/ImageViewer.tsx (8)

1-7: LGTM! Clean import organization.

All new imports are properly used throughout the component. The addition of useCallback improves performance optimization for the OCR trigger logic.


25-29: Well-structured state management.

The OCR-related state variables are properly typed and initialized. The use of Page type from tesseract.js ensures type safety for OCR results.


39-45: Proper cleanup on image change.

The effect correctly resets all OCR-related state when the image changes, preventing stale OCR data from being displayed on a new image. Previous reviewer's concern about missing isOCRLoading reset has been properly addressed.


47-75: Comprehensive scale tracking implementation.

The effect handles multiple scenarios correctly:

  • Initial render when image is already loaded (img.complete)
  • Async image loading via the load event
  • Container/image resize via ResizeObserver

The cleanup properly removes listeners and disconnects the observer. The scale calculation is accurate for aligning OCR overlays with zoomed/panned images.


107-125: Keyboard handler correctly implements Ctrl+T toggle.

The event handler properly prevents the default browser behavior and toggles OCR mode as expected. The cleanup ensures no memory leaks from event listeners.


130-150: Excellent UI affordance for OCR feature.

The button provides good visual feedback with:

  • Clear icon and label
  • Active state styling with pulse animation
  • Keyboard shortcut hint for discoverability
  • Proper event propagation handling

This addresses the previous reviewer's request for a button alongside the keyboard shortcut.


152-167: Verify indicator positioning doesn't interfere with other UI elements.

A previous reviewer noted that these indicators might cover buttons on the top right. While the indicators are positioned on the left (left-6), on smaller viewports or with certain zoom levels, they could potentially overlap with other UI controls.

Please test the following scenarios to ensure the indicators don't interfere with other UI elements:

  1. Small viewport sizes (mobile, tablet)
  2. Different zoom levels in the TransformWrapper
  3. Verify accessibility of any top-right buttons (e.g., close, menu) while indicators are visible

Consider adding responsive positioning or adjusting the z-index hierarchy if overlap occurs.

Based on learnings, previous reviewer flagged this as a positioning concern.


169-223: Well-integrated OCR overlay with existing zoom/pan functionality.

The implementation correctly:

  • Disables panning when OCR is active (line 176) to allow text selection
  • Applies rotation at the container level (line 195) so overlay aligns with rotated images
  • Passes accurate imageScale to TextOverlay (line 219) for proper overlay positioning
  • Maintains existing zoom/pan behavior when OCR is inactive

The conditional rendering ensures the overlay only appears when OCR data is available, and the overflow styling allows the overlay to be fully visible.

@Aryan-Shan
Copy link
Author

Hi @rohan-pandeyy I have gone through your comments and implemented the required changes. I'll ask @rahulharpal1603 to review it. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: OCR Text Selection

2 participants