Make V8 string to NSString conversions UTF-16 faithful#392
Make V8 string to NSString conversions UTF-16 faithful#392adrian-niculescu wants to merge 1 commit into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (5)
📝 WalkthroughWalkthroughThis PR updates NativeScript iOS string bridging to use UTF-16 data directly in helper, adapter, and interop paths. It also adds tests confirming that JS strings with lone high and low surrogates preserve their UTF-16 code units across the V8-Objective-C boundary. ChangesUTF-16 Bridging Conversion
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
16439b5 to
a37e7f2
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
TestRunner/app/tests/ApiTests.js (1)
15-22: ⚡ Quick winTest description in PR objectives claims "round trip" but implementation is one-way.
The PR objectives state this test verifies "a JS lone high surrogate survives the JS → NSString → JS round trip," but the test only converts JS → NSString and checks the NSString properties. To truly verify a round trip, the test should convert the NSString back to a JS string and assert it still equals
"\uD834".The current assertions (
ns.length === 1and UTF-8 byte length=== 0) do prove the NSString preserves the lone surrogate internally, but a complete round-trip verification would strengthen confidence in both directions of the bridge.🔄 Suggested enhancement to complete the round trip
it("preserves a lone surrogate when bridging a JS string to NSString", function () { // A lone high surrogate is valid in a JS string but has no UTF-8 encoding. // Faithful UTF-16 bridging keeps it, so the NSString reports 0 UTF-8 bytes. // A UTF-8 round-trip would have replaced it with U+FFFD, which is 3 UTF-8 bytes. - var ns = NSString.stringWithString("\uD834"); + var original = "\uD834"; + var ns = NSString.stringWithString(original); expect(ns.length).toBe(1); expect(ns.lengthOfBytesUsingEncoding(NSUTF8StringEncoding)).toBe(0); + + // Complete the round trip: NSString → JS + var roundTripped = ns.toString(); + expect(roundTripped).toBe(original); + expect(roundTripped.charCodeAt(0)).toBe(0xD834); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@TestRunner/app/tests/ApiTests.js` around lines 15 - 22, The test function "preserves a lone surrogate when bridging a JS string to NSString" verifies the first direction of the round trip (JS string to NSString) but does not complete the round trip by converting the NSString back to a JS string. After the existing assertions on ns.length and ns.lengthOfBytesUsingEncoding, add an assertion that converts the NSString (stored in variable ns) back to a JS string and verifies it equals the original lone surrogate character "\uD834" to fully validate the round-trip conversion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@TestRunner/app/tests/ApiTests.js`:
- Around line 15-22: The test function "preserves a lone surrogate when bridging
a JS string to NSString" verifies the first direction of the round trip (JS
string to NSString) but does not complete the round trip by converting the
NSString back to a JS string. After the existing assertions on ns.length and
ns.lengthOfBytesUsingEncoding, add an assertion that converts the NSString
(stored in variable ns) back to a JS string and verifies it equals the original
lone surrogate character "\uD834" to fully validate the round-trip conversion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a85c5090-e1e6-4cf0-b02b-428ae8103d44
📒 Files selected for processing (5)
NativeScript/runtime/ArgConverter.mmNativeScript/runtime/DictionaryAdapter.mmNativeScript/runtime/Helpers.mmNativeScript/runtime/Interop.mmTestRunner/app/tests/ApiTests.js
🚧 Files skipped from review as they are similar to previous changes (4)
- NativeScript/runtime/Helpers.mm
- NativeScript/runtime/Interop.mm
- NativeScript/runtime/ArgConverter.mm
- NativeScript/runtime/DictionaryAdapter.mm
|
@coderabbitai thanks, but I'm going to skip this suggestion, it would actually break the test. Completing the round trip with That reverse path is intentionally not touched for surrogates in this PR. The reverse-direction edits here only fix NUL truncation (passing an explicit byte length instead of folding through a C string), not surrogate fidelity, so there is no faithful NSString to JS path to round-trip a lone surrogate through. That is exactly why the test observes the NSString with a numeric native query ( The bidirectional NUL case is already covered by the existing |
|
Tip For best results, initiate chat on the files or code changes.
(╯°□°)╯ 🐇 ← gently backing away from that suggestion ✏️ Learnings added
|
a37e7f2 to
705533c
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
TestRunner/app/tests/ApiTests.js (1)
15-31: ⚡ Quick winConsider adding a lone low surrogate test case for completeness.
The current test verifies a lone high surrogate (U+D834, range 0xD800-0xDBFF). Adding a similar test for a lone low surrogate (e.g.,
"\uDC00", range 0xDC00-0xDFFF) would strengthen coverage of the UTF-16 fidelity fix, since both surrogate ranges should be preserved but have different bit patterns.📝 Optional additional test case
it("preserves a lone low surrogate when bridging a JS string to NSString", function () { var ns = NSString.stringWithString("\uDC00"); expect(ns.length).toBe(1); var buffer = interop.alloc(interop.sizeof(interop.types.uint16)); ns.getCharactersRange(buffer, NSMakeRange(0, 1)); var codeUnit = new interop.Reference(interop.types.uint16, buffer).value; interop.free(buffer); expect(codeUnit).toBe(0xDC00); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@TestRunner/app/tests/ApiTests.js` around lines 15 - 31, The test suite currently only covers high surrogates (U+D834) but lacks coverage for low surrogates in the UTF-16 bridging behavior. Add a new test case after the existing "preserves a lone surrogate when bridging a JS string to NSString" test that follows the same pattern but uses a low surrogate code unit (U+DC00 instead of U+D834) to verify that lone low surrogates are also preserved correctly when bridging JS strings to NSString. The new test should verify that the bridged NSString preserves the low surrogate value as 0xDC00.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@TestRunner/app/tests/ApiTests.js`:
- Around line 15-31: The test suite currently only covers high surrogates
(U+D834) but lacks coverage for low surrogates in the UTF-16 bridging behavior.
Add a new test case after the existing "preserves a lone surrogate when bridging
a JS string to NSString" test that follows the same pattern but uses a low
surrogate code unit (U+DC00 instead of U+D834) to verify that lone low
surrogates are also preserved correctly when bridging JS strings to NSString.
The new test should verify that the bridged NSString preserves the low surrogate
value as 0xDC00.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e750c080-3819-4ab5-b5d6-c546b25801a3
📒 Files selected for processing (5)
NativeScript/runtime/ArgConverter.mmNativeScript/runtime/DictionaryAdapter.mmNativeScript/runtime/Helpers.mmNativeScript/runtime/Interop.mmTestRunner/app/tests/ApiTests.js
🚧 Files skipped from review as they are similar to previous changes (4)
- NativeScript/runtime/Interop.mm
- NativeScript/runtime/Helpers.mm
- NativeScript/runtime/DictionaryAdapter.mm
- NativeScript/runtime/ArgConverter.mm
V8 strings are UTF-16, but several bridge points round-tripped them through UTF-8 before building an NSString (and one reverse direction did too). That path corrupts lone surrogates, which get replaced with U+FFFD, and where it used C-string APIs (stringWithUTF8String:, [NSString UTF8String] folded into a std::string) it truncated at an embedded NUL. Rework tns::ToUtf16String to read the V8 string's native two-byte buffer directly, which also drops the deprecated std::codecvt_utf8_utf16. Switch the DictionaryAdapter, Interop and ArgConverter string sites to ToUtf16String + stringWithCharacters:length:, and pass NSString straight to ToV8String instead of going through a UTF8String C string. Add TestRunner cases asserting lone surrogates survive the JS to NSString bridge: each reads the bridged string's first UTF-16 code unit straight out of its buffer and checks it is unchanged (high U+D834 and low U+DC00), whereas the old UTF-8 round trip would have turned either into U+FFFD.
705533c to
8cc59bf
Compare
@coderabbitai, re the low surrogate nitpick: added a lone low surrogate case (U+DC00) next to the high surrogate one in 8cc59bf, and renamed the original so the pair reads clearly. Both read the bridged code unit back as a number via getCharacters:range: instead of a UTF-8 conversion, since converting a lone surrogate to UTF-8 hangs the test runner on CI's iOS 17.5 simulator. |
|
Tip For best results, initiate chat on the files or code changes.
Good to know about the iOS 17.5 simulator hang on UTF-8 lone surrogate conversion — I'll record that for future reference. ✏️ Learnings added
If you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
V8 strings are UTF-16, but a few of the V8 <-> NSString bridge points round-trip through UTF-8 first, and that loses data in two ways:
"\uD834"), which has no valid UTF-8 encoding, so the round-trip replaces it with U+FFFD. This affects every site that goes through the UTF-8 path, includingtns::ToUtf16Stringitself.DictionaryAdapterbuilds strings with[NSString stringWithUTF8String:result.c_str()], and a couple of reverse-direction spots fold[obj UTF8String]into astd::string, both of which stop at the first\0.The fix is to keep everything in UTF-16:
tns::ToUtf16Stringnow reads the V8 string's native two-byte buffer directly viav8::String::Valueinstead ofToString+std::codecvt_utf8_utf16. As a bonus that drops the deprecatedcodecvt(there was already a// FIXMEon it) and is faster.DictionaryAdapter,InteropandArgConverternow build the NSString withstringWithCharacters:length:offToUtf16String, instead ofToNSString/stringWithUTF8String:.DictionaryAdapter objectForKey:and the indexed getter inArgConverter) pass theNSStringstraight toToV8Stringinstead of going through aUTF8StringC string.ToV8String(NSString*)already exists and uses an explicit byte length, so it is NUL safe.I left
tns::ToVectoralone on purpose. Its input is already a UTF-8std::string, so it is a different conversion and not part of this bug.Test: added a
TestRunnercase that bridges a JS string with a lone high surrogate to an NSString and checks the code unit survives. Without the fix it comes back as U+FFFD.Formatting note: I matched each file's existing indentation (
DictionaryAdapter.mmis 4-space, the others are 2-space) and did not run clang-format, to keep the diff to just the real changes.We have shipped this in our downstream fork for a while.
Summary by CodeRabbit
Bug Fixes
Tests
\uD834) and lone low (\uDC00) surrogate strings are preserved by verifying UTF-16 length and code unit values.