
feat: add integration with vision camera v5 #810

Draft

NorbertKlockiewicz wants to merge 12 commits into main from @nk/vision-camera

Conversation

@NorbertKlockiewicz (Contributor) commented Feb 16, 2026

Description

Example of how to use the API with vision camera v5: https://gist.github.com/NorbertKlockiewicz/5d62915d16955979c029303591912d6a

For now this PR is in an experimental phase, so when reviewing please focus on the user-facing API and the implementation of ObjectDetection on both the TypeScript and native sides. The JSI part of the code isn't production-ready yet and requires a refactor plus comprehensive comments.

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

@NorbertKlockiewicz changed the title from "@nk/vision camera" to "feat: add integration with vision camera v5" on Feb 18, 2026
@msluszniak (Member) left a comment

Also handle there:

[image attachment]

Comment on lines +61 to +96
// Create a simple 320x320 test image (all zeros - black image)
// In a real scenario, you would load actual image pixel data here
const width = 320;
const height = 320;
const channels = 3; // RGB

// Create a black image (you can replace this with actual pixel data)
const rgbData = new Uint8Array(width * height * channels);

// Optionally, add some test pattern (e.g., white square in center)
for (let y = 100; y < 220; y++) {
  for (let x = 100; x < 220; x++) {
    const idx = (y * width + x) * 3;
    rgbData[idx + 0] = 255; // R
    rgbData[idx + 1] = 255; // G
    rgbData[idx + 2] = 255; // B
  }
}

const pixelData: PixelData = {
  dataPtr: rgbData,
  sizes: [height, width, channels],
  scalarType: ScalarType.BYTE,
};

console.log('Running forward with hardcoded pixel data...', {
  sizes: pixelData.sizes,
  dataSize: pixelData.dataPtr.byteLength,
});

// Run inference using unified forward() API
const output = await ssdLite.forward(pixelData, 0.3);
console.log('Pixel data result:', output.length, 'detections');
setResults(output);
} catch (e) {
  console.error('Error in runForwardPixels:', e);
I think all the comments from here can be removed, as the code is self-describing.
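As a side note, once the redundant comments are dropped, the hardcoded test-pattern setup in the quoted snippet could be pulled into a small helper to keep the example readable. A sketch (the helper name is hypothetical, not part of the PR):

```typescript
// Hypothetical helper: builds a zero-filled (black) RGB image buffer and
// paints a white rectangle into it, matching the test pattern above.
function makeTestPattern(
  width: number,
  height: number,
  rect: { x0: number; y0: number; x1: number; y1: number }
): Uint8Array {
  const channels = 3; // RGB
  const data = new Uint8Array(width * height * channels); // all zeros = black
  for (let y = rect.y0; y < rect.y1; y++) {
    for (let x = rect.x0; x < rect.x1; x++) {
      const idx = (y * width + x) * channels;
      data[idx + 0] = 255; // R
      data[idx + 1] = 255; // G
      data[idx + 2] = 255; // B
    }
  }
  return data;
}

// Usage mirroring the snippet: 320x320 image, white square in the center.
const rgbData = makeTestPattern(320, 320, { x0: 100, y0: 100, x1: 220, y1: 220 });
```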

Comment on lines +36 to +55
// Get target size from model input shape
const std::vector<int32_t> tensorDims = getAllInputShapes()[0];
cv::Size tensorSize = cv::Size(tensorDims[tensorDims.size() - 1],
                               tensorDims[tensorDims.size() - 2]);

cv::Mat rgb;

// Convert RGBA/BGRA to RGB if needed (for VisionCamera frames)
if (frame.channels() == 4) {
  // Platform-specific color conversion:
  // iOS uses BGRA format, Android uses RGBA format
#ifdef __APPLE__
  // iOS: BGRA → RGB
  cv::cvtColor(frame, rgb, cv::COLOR_BGRA2RGB);
#else
  // Android: RGBA → RGB
  cv::cvtColor(frame, rgb, cv::COLOR_RGBA2RGB);
#endif
} else if (frame.channels() == 3) {
  // Already RGB

Again, these comments are not needed; only the comment `Only resize if dimensions don't match` seems to be a valid one.

auto [inputTensor, originalSize] =
image_processing::readImageToTensor(imageSource, getAllInputShapes()[0]);
ObjectDetection::runInference(cv::Mat image, double detectionThreshold) {
std::lock_guard<std::mutex> lock(inference_mutex_);

`std::scoped_lock` is superior to `std::lock_guard`, and since we use C++ >= 17, use only `scoped_lock` in such situations.

Suggested change
std::lock_guard<std::mutex> lock(inference_mutex_);
std::scoped_lock<std::mutex> lock(inference_mutex_);

Comment on lines +118 to +124
// Store original size for postprocessing
cv::Size originalSize = image.size();

// Preprocess the image using model-specific preprocessing
cv::Mat preprocessed = preprocessFrame(image);

// Create tensor and run inference

These comments are redundant.

} // namespace rnexecutorch::models::object_detection

std::vector<types::Detection>
ObjectDetection::generateFromString(std::string imageSource,

Why do you pass the string by copy rather than by const reference? If it's because this function is called via JSI and a const ref fails there, please resolve this comment.

await moduleInstance.load(model, setDownloadProgress);
setIsReady(true);

// Extract runOnFrame worklet from VisionModule if available

Suggested change
// Extract runOnFrame worklet from VisionModule if available

Comment on lines +47 to +50
// Extract pure JSI function reference (runs on JS thread)
const nativeGenerateFromFrame = this.nativeModule.generateFromFrame;

// Return worklet that captures ONLY the JSI function

Suggested change (remove both comments, keep the assignment):
const nativeGenerateFromFrame = this.nativeModule.generateFromFrame;


// Type detection and routing
if (typeof input === 'string') {
// String path → generateFromString()

Suggested change
// String path → generateFromString()

'scalarType' in input &&
input.scalarType === ScalarType.BYTE
) {
// Pixel data → generateFromPixels()

Suggested change
// Pixel data → generateFromPixels()

Comment on lines +117 to +124
typeof input === 'object' &&
'dataPtr' in input &&
input.dataPtr instanceof Uint8Array &&
'sizes' in input &&
Array.isArray(input.sizes) &&
input.sizes.length === 3 &&
'scalarType' in input &&
input.scalarType === ScalarType.BYTE

Huuuh, abstract this check into a smaller function ;p
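One way to abstract it: a type guard that narrows `unknown` to `PixelData`. The `PixelData` and `ScalarType` shapes below are minimal stand-ins mirroring the diff, so the exact definitions (and the numeric value of `BYTE`) may differ from the PR:

```typescript
// Minimal stand-ins for the PR's types (assumed shapes, not the real definitions).
enum ScalarType {
  BYTE = 1, // placeholder value
}

interface PixelData {
  dataPtr: Uint8Array;
  sizes: [number, number, number]; // [height, width, channels]
  scalarType: ScalarType;
}

// Hypothetical type guard replacing the inline predicate from forward().
function isPixelData(input: unknown): input is PixelData {
  if (typeof input !== 'object' || input === null) return false;
  const candidate = input as Partial<PixelData>;
  return (
    candidate.dataPtr instanceof Uint8Array &&
    Array.isArray(candidate.sizes) &&
    candidate.sizes.length === 3 &&
    candidate.scalarType === ScalarType.BYTE
  );
}
```

With such a guard, the routing in `forward()` would collapse to something like `if (typeof input === 'string') { … } else if (isPixelData(input)) { … }`.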

