Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds sanitization functionality to filter invisible Unicode characters from GitHub issue and pull request titles and bodies. This helps prevent potential security issues and display problems caused by invisible control characters.
Key Changes
- Created a new
sanitizepackage withFilterInvisibleCharactersfunction that removes various invisible Unicode characters - Applied sanitization to PR and issue titles/bodies in the GitHub API response handlers
- Added comprehensive test coverage for the sanitization logic
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pkg/sanitize/sanitize.go | Implements the FilterInvisibleCharacters function to remove invisible Unicode characters |
| pkg/sanitize/sanitize_test.go | Comprehensive test suite covering various invisible character scenarios |
| pkg/github/pullrequests.go | Applies sanitization to PR titles and bodies in GetPullRequest and ListPullRequests |
| pkg/github/issues.go | Applies sanitization to issue titles and bodies in fragmentToIssue and GetIssue |
| .gitignore | Adds .history to ignored files |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Sanitize title/body on response | ||
| if issue != nil { | ||
| if issue.Title != nil { | ||
| issue.Title = github.Ptr(sanitize.FilterInvisibleCharacters(*issue.Title)) | ||
| } | ||
| if issue.Body != nil { | ||
| issue.Body = github.Ptr(sanitize.FilterInvisibleCharacters(*issue.Body)) | ||
| } | ||
| } |
There was a problem hiding this comment.
The sanitization logic is duplicated in both GetIssue (lines 327-335) and fragmentToIssue (lines 215, 223). Since GetIssue already calls the GitHub API and returns an issue object, the sanitization in GetIssue may be redundant if the issue comes from a fragment that's already sanitized, or vice versa. Consider whether both locations need sanitization or if one location is sufficient to avoid double-processing.
There was a problem hiding this comment.
Nope, GetIssue is using a REST API and fragmentToIssue is used for graphQL.
Implement basic content sanitisation that is (for now) applied only to title and body of issues and pull requests.
It removes hidden Unicode characters that pose a risk to LLM input.