-
Notifications
You must be signed in to change notification settings - Fork 915
Add a configurable LangExtract recognizer for use with any provider. #1815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…with any provider.
…with any provider.
RonShakutai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a great PR !
he lx_factory.ModelConfig approach is elegant !
left a few comments to finalize it
presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml
Outdated
Show resolved
Hide resolved
...nalyzer/presidio_analyzer/predefined_recognizers/third_party/basic_langextract_recognizer.py
Show resolved
Hide resolved
...nalyzer/presidio_analyzer/predefined_recognizers/third_party/basic_langextract_recognizer.py
Show resolved
Hide resolved
...nalyzer/presidio_analyzer/predefined_recognizers/third_party/basic_langextract_recognizer.py
Outdated
Show resolved
Hide resolved
...nalyzer/presidio_analyzer/predefined_recognizers/third_party/basic_langextract_recognizer.py
Show resolved
Hide resolved
|
Good comments! I'm happy to make the requested changes, but it may need to be late this week or early next. |
No worries, whenever you get a chance :) |
|
@telackey |
Hi :) @telackey |
|
Yes, the holidays just intervened. |
|
same here with |
We're really like this PR, |
|
Yes, in fact I'm going to take a look at it today. I'll keep you posted. |
* Address comments * Replace ollama_langextract_recognizer with basic_langextract_recognizer. * Replace ollama_langextract_recognizer with basic_langextract_recognizer. * Replace ollama_langextract_recognizer with basic_langextract_recognizer. * Working so far * Working so far * Working so far * remove dead code * bad comment --------- Co-authored-by: Kassymkhan Bekbolatov <[email protected]>
|
I replaced the OllamaLangExtractRecognizer with this one. I haven't had a chance to run the e2e tests yet, but the basic unit tests are passing and I think I addressed all the questions. One wrinkle is that there is a chicken/egg problem at the moment with the extract_params being loaded from the config, but the config file is read in the parent class. My inelegant workaround is to update the extract_params again after the parent constructor is finished. It is a bit wonky, but it works. I will take a look at the e2e tests on Monday. At the very least this should be much closer to what you were hoping for. |
|
Oh, all the defaults match the previous Ollama recognizer defaults, so it should be a low risk change in some ways, but it does mean yaml/config changes. |
* Replace ollama_langextract_recognizer with basic_langextract_recognizer. * Fix LangExtract error --------- Co-authored-by: Kassymkhan Bekbolatov <[email protected]>
|
@RonShakutai e2e tests are updated. Should be ready for a second look. |
|
Fixed ruff |
|
The explicit calls to: Are to work around this bug: google/langextract#331 |
Change Description
Add a new, basic LangExtract-based recognizer class that is generic. The current implementations focus on ollama or azure support. This one instantiates an lx.ModelConfig from the yaml, so that it can specify different providers and custom configurations (eg, developed using Ollama via a LiteLLM OpenAI proxy).
Issue reference
Fixes #XX
Checklist