Skip to content

Conversation

@ohmayr
Copy link
Contributor

@ohmayr ohmayr commented Jan 15, 2026

This PR improves the docs generation performance.

@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Jan 15, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @ohmayr, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance of documentation generation by optimizing the Markdown-to-RST conversion process. It introduces a multi-tiered approach, prioritizing a new, fast, pure-Python converter for most common cases and leveraging a caching mechanism. The external pypandoc tool is now reserved as a fallback for more intricate Markdown elements, thereby reducing its invocation and improving overall efficiency.

Highlights

  • Optimized Markdown to RST Conversion: Introduced a new pure-Python Markdown-to-RST converter (_tuned_fast_convert) that efficiently handles common Markdown elements like code blocks, various link types, bold/italics, headings, and lists using regular expressions.
  • Conversion Caching: Implemented a caching mechanism (_RAW_RST_CACHE) to store previously converted RST snippets, significantly reducing redundant conversions and improving performance for frequently encountered text blocks.
  • Tiered Conversion Strategy: Refactored the rst conversion logic to prioritize a 'Super Fast Path' for unformatted text, followed by a cache lookup, then the new fast Python converter, and finally falling back to the slower pypandoc only for complex Markdown structures such as tables and images.
  • Code Refactoring: Extracted common post-processing logic for handling trailing newlines and appending periods into a new helper function _finalize, improving code organization and readability within the rst function.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This PR aims to improve documentation generation performance by introducing a custom, pure-Python Markdown-to-RST converter for common cases, which is a good strategy. While reviewing the changes, I found a few areas for improvement. I've identified a bug in heading conversion that can produce invalid RST, an omission in handling numbered lists, and a formatting inconsistency. Most critically, the new logic is not covered by tests, and the changes appear to break existing tests. Please see my detailed comments for suggestions on how to fix these problems. Adding a robust test suite will be essential to ensure the correctness of this performance optimization.

# Cache for the few complex items we actually send to pandoc
_RAW_RST_CACHE: Dict[str, str] = {}

def _tuned_fast_convert(text: str) -> Optional[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This new conversion logic in _tuned_fast_convert is a critical part of the performance improvement, but it lacks unit tests. Additionally, the changes in the rst function appear to break the existing tests in tests/unit/utils/test_rst.py (e.g., test_rst_formatted), as they now bypass the mocked pypandoc.convert_text call.

It's crucial to:

  1. Add comprehensive unit tests for _tuned_fast_convert covering all the regex conversions and edge cases (links, code blocks, headings, lists).
  2. Update the existing tests for the rst function to reflect the new logic, including caching and the fast-path converter.

Without proper tests, it's hard to verify the correctness of these performance optimizations and prevent future regressions.

Comment on lines +56 to +57
converted = re.sub(r"^# (.*)$", r"\1\n" + "=" * 10, converted, flags=re.MULTILINE)
converted = re.sub(r"^## (.*)$", r"\1\n" + "-" * 10, converted, flags=re.MULTILINE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The RST specification requires that heading underlines are at least as long as the heading text. Using a fixed length of 10 characters for the underline can result in invalid RST for headings longer than 10 characters, which would break the documentation build. The replacement in re.sub should be a function that calculates the correct underline length based on the matched heading text.

Suggested change
converted = re.sub(r"^# (.*)$", r"\1\n" + "=" * 10, converted, flags=re.MULTILINE)
converted = re.sub(r"^## (.*)$", r"\1\n" + "-" * 10, converted, flags=re.MULTILINE)
converted = re.sub(r"^# (.*)$", lambda m: f"{m.group(1)}\n{'=' * len(m.group(1))}", converted, flags=re.MULTILINE)
converted = re.sub(r"^## (.*)$", lambda m: f"{m.group(1)}\n{'-' * len(m.group(1))}", converted, flags=re.MULTILINE)

converted = re.sub(r"^## (.*)$", r"\1\n" + "-" * 10, converted, flags=re.MULTILINE)

# F. LISTS: Markdown (- item) needs a preceding newline for RST.
converted = re.sub(r"(\n[^-*].*)\n\s*([-*] )", r"\1\n\n\2", converted)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regex for list conversion only handles unordered lists starting with - or *. It does not handle numbered lists (e.g., 1. item). This will cause numbered lists to be rendered incorrectly, as they also require a blank line preceding them in RST. The logic should be updated to also handle numbered lists.

if "::" in raw_rst or ".. code" in raw_rst:
answer = raw_rst.replace("\n", f"\n{' ' * indent}")
else:
answer = wrap(raw_rst, indent=indent, offset=indent, width=width - indent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The call to wrap here uses offset=indent, while the "Super Fast Path" on line 73 uses offset=indent + 3. This inconsistency can lead to different wrapping for plain text versus text with markdown, resulting in inconsistent formatting. For consistent output, the offset should be the same in both places.

Suggested change
answer = wrap(raw_rst, indent=indent, offset=indent, width=width - indent)
answer = wrap(raw_rst, indent=indent, offset=indent + 3, width=width - indent)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size: m Pull request size is medium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant