Unleash ML Power on iOS: Apple Silicon Optimization Secrets

Community Article Published July 18, 2024

Core ML, Apple's machine learning framework, has introduced new APIs for asynchronous and batch predictions. These powerful features can significantly improve your app's performance when working with machine learning models. In this post, we'll explore how to implement these new APIs and the benefits they bring to your Core ML integration. For more details, check out the WWDC 2023 session: Improve Core ML integration with async prediction.

Understanding Async Prediction

The new async prediction API in Core ML is designed to work seamlessly with Swift concurrency. It allows for better performance and responsiveness in your app by enabling concurrent predictions and integrating well with other asynchronous code.

Key Benefits of Async Prediction:

  • Thread-safe: No need for manual synchronization
  • Supports cancellation: Respects task cancellation for better resource management
  • Improved throughput: Allows for concurrent predictions

Implementing Async Prediction

Let's look at how to implement async prediction:

class ColorizingService {
    private let colorizerModel: colorizer_model

    init() throws {
        let config = MLModelConfiguration()
        self.colorizerModel = try colorizer_model(configuration: config)
    }

    func colorize(_ image: CGImage) async throws -> CGImage {
        // Check for cancellation
        try Task.checkCancellation()

        // Prepare input (non-isolated work)
        let input = try colorizationInputFrom(cgImage: image)

        // Perform prediction
        let output = try await colorizerModel.prediction(input: input)

        // Process output
        return try cgImageFrom(output: output)
    }
}

In this example, we've made the colorize method asynchronous. The prediction call is marked with the await keyword, so it integrates cleanly with Swift's concurrency system. We've also added a cancellation check at the start of the method so that abandoned work exits early and frees resources for other tasks.
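A caller might wrap the prediction in a cancellable Task. The sketch below is hypothetical usage, not code from the post: inputImage and displayColorized(_:) are assumed to exist in the caller's context.

```swift
// Hypothetical caller; `inputImage` and `displayColorized(_:)` are assumptions.
let service = try ColorizingService()

let colorizeTask = Task {
    do {
        let result = try await service.colorize(inputImage)
        // Hand the colorized image back to the UI.
        displayColorized(result)
    } catch is CancellationError {
        // The user navigated away; the prediction was abandoned cleanly.
    }
}

// Later, e.g. when the view disappears:
colorizeTask.cancel()
```

Because colorize checks Task.checkCancellation() up front, cancelling the task before the prediction starts avoids doing any model work at all.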

Batch Prediction

While async prediction is great for handling individual inputs over time, batch prediction is ideal when you have multiple inputs available at once.

When to Use Batch Prediction:

  • When you have a fixed set of inputs to process
  • When you want to leverage Core ML's internal optimizations for processing multiple inputs
  • When you don't need to cancel individual predictions within the batch

Here's an example of how you might implement batch prediction:

func colorizeBatch(_ images: [CGImage]) throws -> [CGImage] {
    let inputs = try images.map { try colorizationInputFrom(cgImage: $0) }
    let outputs = try colorizerModel.predictions(inputs: inputs)
    return try outputs.map { try cgImageFrom(output: $0) }
}
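The generated model class's predictions(inputs:) method wraps Core ML's batch API. If you are working with a raw MLModel instead, the same idea can be expressed directly with MLArrayBatchProvider; a minimal sketch, assuming you already have feature providers matching the model's inputs:

```swift
import CoreML

// Sketch: batch prediction on a raw MLModel. The `providers` array is
// assumed to contain one MLFeatureProvider per input.
func predictBatch(model: MLModel, providers: [MLFeatureProvider]) throws -> [MLFeatureProvider] {
    let batch = MLArrayBatchProvider(array: providers)
    // Core ML can schedule the whole batch at once, which is where the
    // internal optimizations come from.
    let results = try model.predictions(fromBatch: batch)
    return (0..<results.count).map { results.features(at: $0) }
}
```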

Case Study: CLIP-Finder

Let's explore how these concepts are applied in a real-world project: CLIP-Finder. CLIP-Finder is an iOS app that allows users to search their photo gallery using natural language descriptions or camera input. It uses Core ML models optimized for the Neural Engine to perform semantic searches on photos.

Async Prediction in CLIP-Finder

CLIP-Finder implements async prediction in its CLIPImageModel class:

class CLIPImageModel {
    var model: MLModel?

    func performInference(_ pixelBuffer: CVPixelBuffer) async throws -> MLMultiArray? {
        guard let model = model else {
            throw NSError(domain: "ClipImageModel", code: 2, userInfo: [NSLocalizedDescriptionKey: "Model is not loaded"])
        }

        let input = InputFeatureProvider(pixelBuffer: pixelBuffer)

        do {
            let outputFeatures = try await model.prediction(from: input)

            if let multiArray = outputFeatures.featureValue(for: "var_1259")?.multiArrayValue {
                return multiArray
            } else {
                throw NSError(domain: "ClipImageModel", code: 3, userInfo: [NSLocalizedDescriptionKey: "Failed to retrieve MLMultiArray from prediction"])
            }
        } catch {
            print("ClipImageModel: Failed to perform inference: \(error)")
            throw error
        }
    }
}
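Since model is optional, it is presumably loaded separately. One way to do this without blocking the caller is Core ML's async loading API; a hedged sketch, where the resource name CLIPImage is a placeholder rather than CLIP-Finder's actual file name:

```swift
import CoreML

extension CLIPImageModel {
    // Sketch: asynchronous, non-blocking model load. The compiled-model
    // resource name "CLIPImage" is a placeholder assumption.
    func loadModel() async throws {
        guard let url = Bundle.main.url(forResource: "CLIPImage", withExtension: "mlmodelc") else {
            throw NSError(domain: "ClipImageModel", code: 1,
                          userInfo: [NSLocalizedDescriptionKey: "Model file not found"])
        }
        let config = MLModelConfiguration()
        config.computeUnits = .all  // let Core ML use the Neural Engine when available
        self.model = try await MLModel.load(contentsOf: url, configuration: config)
    }
}
```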

Turbo Mode in CLIP-Finder

CLIP-Finder leverages async prediction in its Turbo Mode for faster image processing:

  • Activated by tapping the "Turbo" button in the lower-right corner of the camera interface
  • Enables asynchronous per-frame camera prediction for faster live image search
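One way such a turbo mode can be structured is to launch a prediction per camera frame and cancel the in-flight task whenever a newer frame arrives. This is a hypothetical sketch, not CLIP-Finder's actual code; onFeatures stands in for the app's real search-update path.

```swift
import CoreML
import CoreVideo

// Hypothetical sketch: per-frame async prediction that always chases the
// latest camera frame. `onFeatures` is a placeholder for the search update.
final class TurboPredictor {
    private let model: CLIPImageModel
    private var inFlight: Task<Void, Never>?
    var onFeatures: (MLMultiArray) -> Void = { _ in }

    init(model: CLIPImageModel) { self.model = model }

    func handle(frame: CVPixelBuffer) {
        // Cancel the previous frame's prediction; only the newest frame matters.
        inFlight?.cancel()
        inFlight = Task {
            do {
                guard let features = try await self.model.performInference(frame),
                      !Task.isCancelled else { return }
                self.onFeatures(features)
            } catch {
                // Cancelled or failed; nothing to surface for a dropped frame.
            }
        }
    }
}
```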

Batch Processing in CLIP-Finder

CLIP-Finder uses batch processing to preprocess photos and store their feature vectors in a CoreData database:

private func processAndCachePhotos(_ assetsToProcess: [PHAsset]) async {
    // ... (setup code)

    let batchSize = 512
    let batches = stride(from: 0, to: assetsToProcess.count, by: batchSize).map {
        Array(assetsToProcess[$0..<min($0 + batchSize, assetsToProcess.count)])
    }

    for batch in batches {
        let results = await withTaskGroup(of: (String, CVPixelBuffer?).self) { group in
            for asset in batch {
                group.addTask {
                    let identifier = asset.localIdentifier
                    guard let image = await self.requestImage(for: asset, targetSize: targetSize, options: options) else {
                        return (identifier, nil)
                    }
                    let pixelBuffer = Preprocessing.preprocessImageWithCoreImage(image, targetSize: targetSize)
                    return (identifier, pixelBuffer)
                }
            }

            var batchResults: [(String, CVPixelBuffer?)] = []
            for await result in group {
                batchResults.append(result)
            }
            return batchResults
        }

        // ... (process batch results)
    }

    // ... (finalize processing)
}

This batch processing approach offers several advantages:

  • Efficient initial processing: Entire photo gallery preprocessed using Neural Engine with a batch size of 512 photos
  • Incremental updates: Only new images are processed on subsequent app launches
  • Database maintenance: Updates database when photos are deleted from the device

Performance Analysis

Sync vs Async Prediction

To understand the benefits of async prediction, let's compare it with synchronous prediction:

Synchronous Prediction

[Image: Instruments timeline of synchronous prediction]

In synchronous prediction, tasks are executed sequentially. As we can see in the image, there are noticeable gaps between consecutive predictions. This approach can lead to inefficient use of system resources and potentially slower overall performance, especially when dealing with multiple predictions.

Asynchronous Prediction

[Image: Instruments timeline of asynchronous prediction]

With asynchronous prediction, image preprocessing and prediction tasks are dispatched concurrently. This packs work more tightly along the timeline and increases the likelihood that multiple Core ML prediction requests run simultaneously on the Neural Engine. As the image shows, tasks overlap and use system resources more efficiently, improving overall performance.

Batch Processing Performance

[Image: Instruments trace of batch processing]

This Instruments trace provides insights into the batch processing performance:

  • Image Preprocessing GPU: Two distinct phases of GPU activity for image preprocessing, likely involving resizing and normalizing images.
  • CLIP Image Prediction: Intense Neural Engine activity with a batch size of 512 photos, each taking approximately 1 ms to process on the A17 Pro chip.
  • Neural Engine Prediction: Additional batch of Neural Engine activity, possibly for further processing or feature extraction.
  • Efficient resource utilization: The trace shows how CLIP-Finder efficiently uses both GPU and Neural Engine, allowing for parallel processing.

Detailed Batch Processing

[Image: Detailed Instruments view of parallel batch prediction]

This detailed view of batch processing reveals an important optimization: two instances of prediction occur in parallel, significantly improving performance. By leveraging the Neural Engine's capabilities, CLIP-Finder can process multiple images simultaneously, greatly reducing the overall time required for batch operations.

Performance Comparison: Sync vs Async vs Batch Processing

To better understand the performance benefits of different processing modes, a test was conducted using CLIP-Finder to process a photo gallery containing 6,524 images. The results clearly demonstrate the advantages of batch processing over synchronous and asynchronous methods.

| Processing Mode | Total Time (seconds) | Average Time per Photo (ms) |
|-----------------|----------------------|-----------------------------|
| Synchronous     | 40.4                 | 6.19                        |
| Asynchronous    | 16.28                | 2.49                        |
| Batch           | 15.29                | 2.34                        |

As shown by the results, batch processing significantly outperforms both synchronous and asynchronous methods, with asynchronous processing showing a marked improvement over synchronous processing.

Synchronous Processing

[Image: Instruments timeline, synchronous processing of the gallery]

In synchronous mode, the processing takes approximately 40.4 seconds for the entire gallery, resulting in an average of 6.19 ms per photo. The timeline shows sequential processing with noticeable gaps between operations.

Asynchronous Processing

[Image: Instruments timeline, asynchronous processing of the gallery]

Asynchronous processing improves performance significantly, taking about 16.28 seconds for the gallery, or 2.49 ms per photo on average. The timeline shows more overlap in operations, with increased utilization of the Neural Engine and GPU, leading to better resource utilization compared to synchronous processing.

Batch Processing

[Image: Instruments timeline, batch processing of the gallery]

Batch processing demonstrates the best performance, completing the gallery in just 15.29 seconds, or 2.34 ms per photo on average. The timeline shows intense, parallel utilization of the Neural Engine, resulting in the most efficient processing method.

These results clearly illustrate the performance benefits of batch processing, especially for large datasets like photo galleries. The ability to process multiple images simultaneously using the Neural Engine leads to significant time savings and more efficient resource utilization.

Choosing Between Async and Batch Prediction

The choice between async and batch prediction depends on your specific use case:

  • Use async prediction when:
    • You're in an asynchronous context
    • Inputs become available individually over time
    • You need to support cancellation of individual predictions
  • Use batch prediction when:
    • You have a known quantity of work to be done
    • All inputs are available at once
    • You don't need to cancel individual predictions within the batch

Performance Considerations

While async and batch predictions can significantly improve performance, keep in mind:

  • Profile your workload to ensure it's benefiting from concurrency
  • Be mindful of memory usage, especially when processing large inputs or outputs concurrently
  • Consider implementing flow control to limit the number of concurrent predictions if memory usage is a concern
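Flow control can be as simple as capping how many predictions are in flight at once inside a task group. A sketch of this idea, with predict(_:) standing in for your async model call (the function and its parameters are illustrative, not a Core ML API):

```swift
// Sketch: process inputs with at most `maxConcurrent` predictions in flight.
// `predict(_:)` is a placeholder for your async model call.
func processWithFlowControl<Input, Output>(
    _ inputs: [Input],
    maxConcurrent: Int,
    predict: @escaping (Input) async throws -> Output
) async throws -> [Output] {
    try await withThrowingTaskGroup(of: (Int, Output).self) { group in
        var results = [Output?](repeating: nil, count: inputs.count)
        var nextIndex = 0

        // Seed the group with the first `maxConcurrent` tasks.
        while nextIndex < min(maxConcurrent, inputs.count) {
            let i = nextIndex
            group.addTask { (i, try await predict(inputs[i])) }
            nextIndex += 1
        }

        // Each time a task finishes, start the next one, keeping memory bounded.
        while let (i, output) = try await group.next() {
            results[i] = output
            if nextIndex < inputs.count {
                let j = nextIndex
                group.addTask { (j, try await predict(inputs[j])) }
                nextIndex += 1
            }
        }
        return results.compactMap { $0 }
    }
}
```

This keeps at most maxConcurrent inputs and outputs alive at a time, which matters when each input is a large pixel buffer.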

Conclusion

The new async prediction API and batch processing capabilities in Core ML offer powerful ways to integrate machine learning into your Swift apps with improved performance and responsiveness. By understanding when to use each approach and how to implement them effectively, you can optimize your app's Core ML integration for the best possible user experience. The CLIP-Finder project demonstrates how these techniques can be applied in a real-world scenario, leveraging both async and batch processing to create a responsive and efficient photo search application.
