feat: implement OOD query filtering in bot logic and enforce scoped i…#11
kpj2006 wants to merge 4 commits into
…nteraction guidelines via .clinerules
Walkthrough
Adds an out-of-domain query guard to
Changes
Out-of-Domain Query Guard and Thread Handling
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 passed
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bot.py`:
- Around line 166-182: The is_query_covered function currently checks if
keywords exist in either the current query or the entire conversation context
history, which allows past in-domain messages to incorrectly mark a new
out-of-domain query as covered. Remove the context checking from the keyword
matching logic in the is_query_covered function by changing the condition that
currently checks both query and context to only check the current query. This
ensures coverage determination is based solely on the current user query and not
the entire thread history.
- Around line 166-182: The keyword matching in the is_query_covered function
uses simple substring checks (`kw in q` and `kw in ctx`), which cause false positives
with short keywords like "pr" matching unrelated words like "price". Replace the
substring matching logic with word boundary checking to ensure only complete
words are matched, not partial substrings. Consider using regular expressions
with word boundaries (\b) to check if each keyword in the categories dictionary
appears as a whole word in the query or context strings rather than allowing
arbitrary substring matches.
- Around line 192-196: In the is_in_configured_channel assignment, replace the
attribute access `message.channel.parent.id` with `message.channel.parent_id`
when is_in_thread is true. The parent attribute can be None when the parent
channel is unavailable in the client cache (in discord.py 2.3.2), which causes
an AttributeError when dereferencing .id. The parent_id attribute is always
stable and available, so use that instead for the channel ID comparison.
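A minimal sketch of how the three findings above might be addressed together. The `CATEGORIES` contents, the `effective_channel_id` helper, and both signatures are assumptions for illustration, since the PR's actual `is_query_covered` is not shown on this page. The channel object is duck-typed so the sketch runs without discord.py installed.

```python
import re
from types import SimpleNamespace

# Hypothetical keyword table; the real one lives in bot.py.
CATEGORIES = {
    "setup": ["setup", "install", "docker", "dev server"],
    "contribute": ["fork", "pr", "pull request", "branch"],
}


def _compile(keywords):
    # \b anchors the alternation to word boundaries, so "pr" does not match "price".
    alternatives = "|".join(re.escape(kw) for kw in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


_PATTERNS = {name: _compile(kws) for name, kws in CATEGORIES.items()}


def is_query_covered(query: str) -> bool:
    """Return True if the current query alone contains a known keyword.

    Thread history is deliberately not consulted, so earlier in-domain
    messages cannot mark a new out-of-domain query as covered.
    """
    return any(pattern.search(query) for pattern in _PATTERNS.values())


def effective_channel_id(channel, is_in_thread: bool) -> int:
    # For a thread, read parent_id directly instead of channel.parent.id,
    # because channel.parent may be None when the parent is not cached.
    return channel.parent_id if is_in_thread else channel.id


# Usage with stand-in objects (assumed shapes, not real discord.py instances).
thread = SimpleNamespace(id=1, parent_id=99, parent=None)
print(is_query_covered("How do I open a PR?"))
print(is_query_covered("What is the price of bitcoin?"))
print(effective_channel_id(thread, is_in_thread=True))
```

Compiling one pattern per category up front keeps the per-message cost to a handful of regex searches; whether that matters depends on message volume.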
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8e50c4d3-a666-4955-8623-21261ec8842a
📒 Files selected for processing (2)
.clinerules
bot.py
…improve LLM prompt context inclusion
… ensuring proper logging and fallback mechanisms
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bot.py (1)
212-215: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Narrow generic keywords to keep the OOD gate effective.
"run", "start", and especially "issue" are so broad that many unrelated queries will be marked as covered, bypassing the OOD clarification path.
Suggested patch
 categories = {
-    "setup": ["setup", "install", "run", "build", "clone", "docker", "env", "start", "dev server", "npm run dev"],
+    "setup": ["setup", "install", "build", "clone", "docker", "environment", "dev server", "npm run dev"],
     "readme": ["readme", "read me", "documentation", "project name", "description", "user flow", "feature"],
-    "contribute": ["contribute", "contributor", "fork", "pr", "pull request", "issue", "branch", "git", "onboarding"],
-    "error": ["error", "exception", "bug", "fail", "crash", "issue", "logs", "broken", "debug", "not working"]
+    "contribute": ["contribute", "contributor", "fork", "pr", "pull request", "github issue", "branch", "git", "onboarding"],
+    "error": ["error", "exception", "bug", "fail", "crash", "stack trace", "logs", "broken", "debug", "not working"]
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bot.py` around lines 212 - 215, The keyword lists in the dictionary are too generic and cause false positives in out-of-distribution detection. Remove the overly broad keywords "run", "start", and "issue" from the respective lists in the setup, contribute, and error categories. Replace "run" and "start" in the setup list with more specific terms that relate directly to project initialization or development setup (such as "startup" or "initialize"). Remove "issue" from both the contribute and error lists as it appears in multiple contexts and is too ambiguous, or replace it with more specific terminology like "github issue" or "bug report" that narrows the intent.
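A small sketch of why narrowing matters, using the `error` keyword list before and after the suggested patch. The `matched_keywords` helper and the sample query are hypothetical, and whole-word matching is assumed here rather than taken from `bot.py`.

```python
import re

# "error" category keywords before and after the suggested patch.
ERROR_BEFORE = ["error", "exception", "bug", "fail", "crash", "issue",
                "logs", "broken", "debug", "not working"]
ERROR_AFTER = ["error", "exception", "bug", "fail", "crash", "stack trace",
               "logs", "broken", "debug", "not working"]


def matched_keywords(query, keywords):
    """Return the keywords that occur in the query as whole words (hypothetical helper)."""
    return [
        kw for kw in keywords
        if re.search(rf"\b{re.escape(kw)}\b", query, re.IGNORECASE)
    ]


off_topic = "I have an issue with my pizza delivery"
print(matched_keywords(off_topic, ERROR_BEFORE))  # ['issue']
print(matched_keywords(off_topic, ERROR_AFTER))   # []
```

With the broad list, the off-topic query is treated as covered because of the single word "issue"; with the narrowed list it would fall through to the OOD clarification path.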
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bot.py`:
- Around line 117-120: The err_msg variable in the Ollama client error handling
is directly including e.response.text which exposes internal backend details to
end users. Instead, log the full response details (e.response.text) to the
internal logger for debugging purposes, but modify the err_msg to return only a
sanitized, user-friendly error message that does not include the raw response
body details. This way, users see a safe generic error message while developers
can still access the full details in the logs.
---
Outside diff comments:
In `@bot.py`:
- Around line 212-215: The keyword lists in the dictionary are too generic and
cause false positives in out-of-distribution detection. Remove the overly broad
keywords "run", "start", and "issue" from the respective lists in the setup,
contribute, and error categories. Replace "run" and "start" in the setup list
with more specific terms that relate directly to project initialization or
development setup (such as "startup" or "initialize"). Remove "issue" from both
the contribute and error lists as it appears in multiple contexts and is too
ambiguous, or replace it with more specific terminology like "github issue" or
"bug report" that narrows the intent.
err_msg = (
    f"Local Ollama configuration or client error (HTTP {e.response.status_code}).\n"
    f"Details: {e.response.text}"
)
Do not return raw Ollama response bodies to end users.
e.response.text can expose backend/internal details in public bot replies. Keep full details in logs; return a sanitized user-facing message.
Suggested patch
elif 400 <= e.response.status_code < 500:
- err_msg = (
- f"Local Ollama configuration or client error (HTTP {e.response.status_code}).\n"
- f"Details: {e.response.text}"
- )
+ logger.error(
+ "Local Ollama client/config error HTTP %s: %s",
+ e.response.status_code,
+ e.response.text[:500],
+ )
+ err_msg = (
+ f"Local Ollama configuration or client error (HTTP {e.response.status_code}). "
+ "Please contact a maintainer."
+ )
return err_msg, True
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@bot.py` around lines 117 - 120, The err_msg variable in the Ollama client
error handling is directly including e.response.text which exposes internal
backend details to end users. Instead, log the full response details
(e.response.text) to the internal logger for debugging purposes, but modify the
err_msg to return only a sanitized, user-friendly error message that does not
include the raw response body details. This way, users see a safe generic error
message while developers can still access the full details in the logs.
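One way to keep the two audiences separate is a small helper that logs the raw response and returns only a generic message. This is a sketch under assumptions: the helper name `client_error_message`, the logger name, and the 500-character cap on the logged body are illustrative, and the status code and body are passed in as plain values so the example does not depend on any particular HTTP client.

```python
import logging

logger = logging.getLogger("bot.ollama")  # hypothetical logger name


def client_error_message(status_code: int, body: str) -> str:
    """Log the raw backend response and return a message safe to post publicly."""
    # Truncate the body so one oversized response cannot flood the log.
    logger.error(
        "Local Ollama client/config error HTTP %s: %s",
        status_code,
        body[:500],
    )
    # The user-facing text carries the status code only, never the body.
    return (
        f"Local Ollama configuration or client error (HTTP {status_code}). "
        "Please contact a maintainer."
    )


reply = client_error_message(404, "model 'internal-model' not found at /srv/ollama")
print(reply)
```

Keeping the status code in the reply still gives maintainers something to search for in the logs without exposing the response body.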