Multi-camera API

Multi-camera was introduced with Android 9 (API level 28). Since then, devices that support the API have come to market. Many multi-camera use cases are tightly coupled with a specific hardware configuration. In other words, not all use cases are compatible with every device, which makes multi-camera features a good candidate for dynamic delivery of modules.

Some typical use cases include:

  • Zoom: switching between cameras depending on crop region or desired focal length.
  • Depth: using multiple cameras to build a depth map.
  • Bokeh: using inferred depth information to simulate a DSLR-like narrow focus range.

Logical and physical cameras

Understanding the multi-camera API requires understanding the difference between logical and physical cameras. For reference, consider a device with three back-facing cameras. In this example, each of the three back cameras is considered a physical camera. A logical camera is then a grouping of two or more of those physical cameras. The output of the logical camera can be a stream that comes from one of the underlying physical cameras, or a fused stream coming from more than one underlying physical camera simultaneously. Either way, the stream is handled by the camera Hardware Abstraction Layer (HAL).

Many phone manufacturers develop first-party camera applications, which usually come pre-installed on their devices. To use all of the hardware's capabilities, they may use private or hidden APIs or receive special treatment from the driver implementation that other applications do not have access to. Some devices implement the concept of logical cameras by providing a fused stream of frames from the different physical cameras, but only to certain privileged applications. Often, only one of the physical cameras is exposed to the framework. The situation for third-party developers prior to Android 9 is illustrated in the following diagram:

Figure 1. Camera capabilities typically only available to privileged applications

Beginning with Android 9, private APIs are no longer allowed in Android apps. With the inclusion of multi-camera support in the framework, Android best practices strongly recommend that phone manufacturers expose a logical camera for all physical cameras facing the same direction. The following is what third-party developers should expect to see on devices running Android 9 and higher:

Figure 2. Full developer access to all camera devices starting in Android 9

What the logical camera provides is entirely dependent on the OEM implementation of the Camera HAL. For example, a device like Pixel 3 implements its logical camera in such a way that it chooses one of its physical cameras based on the requested focal length and crop region.

The multi-camera API

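The API adds new constants, classes, and methods. The ones used on this page are:

  • CameraMetadata.REQUEST_AVAILABLE_CAPABILITIES_LOGICAL_MULTI_CAMERA
  • CameraCharacteristics.getPhysicalCameraIds()
  • CameraDevice.createCaptureSession(SessionConfiguration config)
  • SessionConfiguration
  • OutputConfiguration.setPhysicalCameraId(String physicalCameraId)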

Due to changes to the Android Compatibility Definition Document (CDD), the multi-camera API also comes with certain expectations from developers. Devices with dual cameras existed prior to Android 9, but opening more than one camera simultaneously involved trial and error. On Android 9 and higher, the multi-camera API provides a set of rules that specify when it is possible to open a pair of physical cameras that are part of the same logical camera.

In most cases, devices running Android 9 and higher expose all physical cameras (except possibly for less-common sensor types like infrared) along with an easier-to-use logical camera. For every combination of streams that are guaranteed to work, one stream belonging to a logical camera can be replaced by two streams from the underlying physical cameras.

Multiple streams simultaneously

The rules for using multiple streams simultaneously in a single camera are covered in "Using multiple camera streams simultaneously." The same rules apply to multiple cameras, with one notable addition. The documentation for CameraMetadata.REQUEST_AVAILABLE_CAPABILITIES_LOGICAL_MULTI_CAMERA explains how to replace a logical YUV_420_888 or RAW stream with two physical streams. That is, each stream of type YUV or RAW can be replaced with two streams of identical type and size. You can start with a camera stream of the following guaranteed configuration for single-camera devices:

  • Stream 1: YUV type, MAXIMUM size from logical camera id = 0

Then, a device with multi-camera support allows you to create a session replacing that logical YUV stream with two physical streams:

  • Stream 1: YUV type, MAXIMUM size from physical camera id = 1
  • Stream 2: YUV type, MAXIMUM size from physical camera id = 2

You can replace a YUV or RAW stream with two equivalent streams if and only if those two cameras are part of a logical camera grouping; that is, listed under CameraCharacteristics.getPhysicalCameraIds().
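
The replacement rule can be expressed as a small check. The following is a minimal sketch in plain Kotlin, not Camera2 code: StreamType, StreamSpec, and replaceWithPhysicalPair are hypothetical names introduced for illustration, and groupIds stands in for the set returned by CameraCharacteristics.getPhysicalCameraIds().

```kotlin
// Hypothetical model types for illustration; these are not Camera2 classes.
enum class StreamType { YUV, RAW, JPEG, PRIV }

data class StreamSpec(
    val cameraId: String,
    val type: StreamType,
    val width: Int,
    val height: Int
)

/**
 * Returns the two physical streams that may stand in for [logical], or null
 * when the replacement rule does not apply.
 *
 * [groupIds] plays the role of the physical camera IDs reported for the
 * logical camera.
 */
fun replaceWithPhysicalPair(
    logical: StreamSpec,
    groupIds: Set<String>,
    physicalId1: String,
    physicalId2: String
): Pair<StreamSpec, StreamSpec>? {
    // Only YUV and RAW streams are covered by the replacement rule
    val replaceable = logical.type == StreamType.YUV || logical.type == StreamType.RAW
    // Both cameras must be distinct members of the same logical grouping
    val sameGroup = physicalId1 != physicalId2 &&
        physicalId1 in groupIds && physicalId2 in groupIds
    if (!replaceable || !sameGroup) return null

    // The replacements keep the type and size of the stream they replace
    return Pair(logical.copy(cameraId = physicalId1), logical.copy(cameraId = physicalId2))
}
```

In this model, a YUV stream from logical camera "0" with physical cameras "1" and "2" yields two YUV streams of the same size, while a JPEG stream or a camera outside the grouping yields null.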

The guarantees provided by the framework are just the bare minimum required to get frames from more than one physical camera simultaneously. Additional streams are supported in most devices, sometimes even allowing opening multiple physical camera devices independently. Since it's not a hard guarantee from the framework, doing that requires performing per-device testing and tuning using trial and error.

Creating a session with multiple physical cameras

When using physical cameras on a multi-camera enabled device, open a single CameraDevice (the logical camera) and interact with it within a single session. Create the single session using the API CameraDevice.createCaptureSession(SessionConfiguration config), which was added in API level 28. The session configuration has a number of output configurations, each of which has a set of output targets and, optionally, a desired physical camera ID.

Figure 3. SessionConfiguration and OutputConfiguration model

Capture requests have an output target associated with them. The framework determines which physical (or logical) camera the requests are sent to based on what output target is attached. If the output target corresponds to one of the output targets that was sent as an output configuration along with a physical camera ID, then that physical camera receives and processes the request.
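
The routing described above can be modeled without the framework. This sketch is only an illustration of the lookup, not how the camera service is implemented: OutputConfig and routeRequest are hypothetical names, and targets are represented by plain strings instead of Surface objects.

```kotlin
// Hypothetical stand-in for illustration; not the Camera2 OutputConfiguration class.
data class OutputConfig(val target: String, val physicalCameraId: String? = null)

/**
 * Resolves which camera serves each target of a request. Targets configured
 * with a physical camera ID go to that camera; all others go to the logical
 * camera. Targets that were never configured are rejected.
 */
fun routeRequest(
    logicalId: String,
    configs: List<OutputConfig>,
    requestTargets: List<String>
): Map<String, String> {
    val byTarget = configs.associateBy { it.target }
    return requestTargets.associateWith { target ->
        val config = requireNotNull(byTarget[target]) {
            "Target $target is not part of the session configuration"
        }
        // No physical camera ID means the logical camera handles the target
        config.physicalCameraId ?: logicalId
    }
}
```

With a session made of a "preview" target without a physical ID and a "tele" target bound to physical camera "2", a request for both targets resolves "preview" to the logical camera and "tele" to camera "2".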

Using a pair of physical cameras

Another addition to the camera APIs for multi-camera is the ability to identify logical cameras and find the physical cameras behind them. You can define a function to help identify potential pairs of physical cameras that you can use to replace one of the logical camera streams:

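// Imports used by the helper functions on this page
import android.hardware.camera2.CameraCaptureSession
import android.hardware.camera2.CameraCharacteristics
import android.hardware.camera2.CameraDevice
import android.hardware.camera2.CameraManager
import android.hardware.camera2.CameraMetadata
import android.hardware.camera2.CaptureRequest
import android.hardware.camera2.params.OutputConfiguration
import android.hardware.camera2.params.SessionConfiguration
import android.os.AsyncTask
import android.view.Surface
import java.util.concurrent.Executor

/**
* Helper class used to encapsulate a logical camera and two underlying
* physical cameras
*/
data class DualCamera(val logicalId: String, val physicalId1: String, val physicalId2: String)

fun findDualCameras(manager: CameraManager, facing: Int? = null): List<DualCamera> {
  val dualCameras = mutableListOf<DualCamera>()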

  // Iterate over all the available camera characteristics
  manager.cameraIdList.map {
    Pair(manager.getCameraCharacteristics(it), it)
  }.filter {
    // Filter by cameras facing the requested direction
    facing == null || it.first.get(CameraCharacteristics.LENS_FACING) == facing
  }.filter {
    // Filter by logical cameras; a missing capabilities list counts as "not logical"
    it.first.get(CameraCharacteristics.REQUEST_AVAILABLE_CAPABILITIES)?.contains(
        CameraMetadata.REQUEST_AVAILABLE_CAPABILITIES_LOGICAL_MULTI_CAMERA) == true
  }.forEach {
    // All possible pairs from the list of physical cameras are valid results
    // NOTE: There could be N physical cameras as part of a logical camera grouping
    val physicalCameras = it.first.physicalCameraIds.toTypedArray()
    for (idx1 in 0 until physicalCameras.size) {
      for (idx2 in (idx1 + 1) until physicalCameras.size) {
        dualCameras.add(DualCamera(
          it.second, physicalCameras[idx1], physicalCameras[idx2]))
      }
    }
  }

  return dualCameras
}
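
The nested loops in findDualCameras enumerate every unordered pair of physical camera IDs. The same enumeration is sketched below in plain Kotlin so that it can be checked off-device; allPairs is a hypothetical helper name, not part of the camera API.

```kotlin
/** Returns every unordered pair of distinct IDs, in list order. */
fun allPairs(ids: List<String>): List<Pair<String, String>> =
    ids.indices.flatMap { i ->
        // Pair each ID only with the IDs that follow it, so no pair repeats
        (i + 1 until ids.size).map { j -> ids[i] to ids[j] }
    }
```

A logical camera backed by N physical cameras therefore produces N * (N - 1) / 2 DualCamera entries: three physical cameras give three pairs, and four give six.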

State handling of the physical cameras is controlled by the logical camera. To open a "dual camera," open the logical camera corresponding to the physical cameras:

fun openDualCamera(cameraManager: CameraManager,
                   dualCamera: DualCamera,
                   executor: Executor = AsyncTask.SERIAL_EXECUTOR,
                   callback: (CameraDevice) -> Unit) {

  cameraManager.openCamera(
        dualCamera.logicalId, executor, object : CameraDevice.StateCallback() {
    override fun onOpened(device: CameraDevice) = callback(device)
    // Omitting for brevity...
    override fun onError(device: CameraDevice, error: Int) = onDisconnected(device)
    override fun onDisconnected(device: CameraDevice) = device.close()
  })
}
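
AsyncTask was deprecated in API level 30, so you may prefer to pass your own executor instead of relying on the AsyncTask.SERIAL_EXECUTOR default. The following is one possible sketch using java.util.concurrent; newCameraExecutor and the thread name are hypothetical choices, not something the camera API requires.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

/**
 * Creates a single-threaded executor for camera callbacks. Using one thread
 * keeps callbacks in submission order, similar to a serial executor.
 */
fun newCameraExecutor(): ExecutorService =
    Executors.newSingleThreadExecutor { runnable ->
        // Daemon thread so that a forgotten executor does not keep the process alive
        Thread(runnable, "camera-callbacks").apply { isDaemon = true }
    }
```

The result can be passed as the executor argument of openDualCamera. Shut it down with shutdown() once the camera has been closed.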

Besides selecting which camera to open, nothing is different compared to opening a camera in past Android versions. Creating a capture session using the new session configuration API tells the framework to associate certain targets with specific physical camera IDs:

/**
 * Helper type definition that encapsulates 3 sets of output targets:
 *
 *   1. Logical camera
 *   2. First physical camera
 *   3. Second physical camera
 */
typealias DualCameraOutputs =
        Triple<MutableList<Surface>?, MutableList<Surface>?, MutableList<Surface>?>

fun createDualCameraSession(cameraManager: CameraManager,
                            dualCamera: DualCamera,
                            targets: DualCameraOutputs,
                            executor: Executor = AsyncTask.SERIAL_EXECUTOR,
                            callback: (CameraCaptureSession) -> Unit) {

  // Create 3 sets of output configurations: one for the logical camera, and
  // one for each of the physical cameras.
  val outputConfigsLogical = targets.first?.map { OutputConfiguration(it) }
  val outputConfigsPhysical1 = targets.second?.map {
      OutputConfiguration(it).apply { setPhysicalCameraId(dualCamera.physicalId1) } }
  val outputConfigsPhysical2 = targets.third?.map {
      OutputConfiguration(it).apply { setPhysicalCameraId(dualCamera.physicalId2) } }

  // Put all the output configurations into a single flat array
  val outputConfigsAll = arrayOf(
          outputConfigsLogical, outputConfigsPhysical1, outputConfigsPhysical2)
          .filterNotNull().flatMap { it }

  // Instantiate a session configuration that can be used to create a session
  val sessionConfiguration = SessionConfiguration(SessionConfiguration.SESSION_REGULAR,
          outputConfigsAll, executor, object : CameraCaptureSession.StateCallback() {
    override fun onConfigured(session: CameraCaptureSession) = callback(session)
    // Omitting for brevity...
    override fun onConfigureFailed(session: CameraCaptureSession) = session.device.close()
  })

  // Open the logical camera using the previously defined function
  openDualCamera(cameraManager, dualCamera, executor = executor) {

    // Finally create the session and return via callback
    it.createCaptureSession(sessionConfiguration)
  }
}

See createCaptureSession for information on which combinations of streams are supported. Those combinations are defined for multiple streams on a single logical camera. The compatibility extends to using the same configuration and replacing one of those streams with two streams from two physical cameras that are part of the same logical camera.

With the camera session ready, you can dispatch the desired capture requests. Each target of the capture request receives its data from its associated physical camera, if any, or falls back to the logical camera.

Zoom example use-case

Grouping physical cameras under a single logical camera makes it possible to let users switch between the physical cameras to get a different field of view, effectively capturing a different "zoom level."

Figure 4. Example of swapping cameras for zoom level use-case (from Pixel 3 Ad)

First, you select the pair of physical cameras to allow users to switch between. For maximum effect, you can choose the pair of cameras that provide the minimum and maximum focal length available, respectively.

fun findShortLongCameraPair(manager: CameraManager, facing: Int? = null): DualCamera? {

  return findDualCameras(manager, facing).map {
    val characteristics1 = manager.getCameraCharacteristics(it.physicalId1)
    val characteristics2 = manager.getCameraCharacteristics(it.physicalId2)

    // Query the focal lengths advertised by each physical camera
    val focalLengths1 = characteristics1.get(
            CameraCharacteristics.LENS_INFO_AVAILABLE_FOCAL_LENGTHS) ?: floatArrayOf(0F)
    val focalLengths2 = characteristics2.get(
            CameraCharacteristics.LENS_INFO_AVAILABLE_FOCAL_LENGTHS) ?: floatArrayOf(0F)

    // Compute both candidate differences: the max focal length of one camera
    // minus the min focal length of the other
    val focalLengthsDiff1 = (focalLengths2.maxOrNull() ?: 0F) - (focalLengths1.minOrNull() ?: 0F)
    val focalLengthsDiff2 = (focalLengths1.maxOrNull() ?: 0F) - (focalLengths2.minOrNull() ?: 0F)

    // Keep the larger difference, ordering the camera IDs as (short, long)
    if (focalLengthsDiff1 > focalLengthsDiff2) {
        Pair(DualCamera(it.logicalId, it.physicalId1, it.physicalId2), focalLengthsDiff1)
    } else {
        Pair(DualCamera(it.logicalId, it.physicalId2, it.physicalId1), focalLengthsDiff2)
    }

    // Return only the pair with the largest difference, or null if no pairs are found
  }.maxByOrNull { it.second }?.first
}
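
The selection of the shortest and longest focal lengths can be exercised off-device with plain data. The following is a minimal sketch, assuming Kotlin 1.4 or higher for minOrNull, maxOrNull, and maxByOrNull; ShortLongPair and pickShortLongPair are hypothetical names, and the map stands in for the values read from CameraCharacteristics.LENS_INFO_AVAILABLE_FOCAL_LENGTHS.

```kotlin
/** Hypothetical result type: camera IDs ordered from short to long focal length. */
data class ShortLongPair(val shortId: String, val longId: String, val focalLengthDiff: Float)

/**
 * Picks, from [focalLengths] (camera ID to advertised focal lengths), the two
 * cameras with the largest gap between one camera's minimum and the other's
 * maximum focal length. Returns null when fewer than two cameras qualify.
 */
fun pickShortLongPair(focalLengths: Map<String, FloatArray>): ShortLongPair? {
    // Ignore cameras that advertise no focal lengths
    val ids = focalLengths.filterValues { it.isNotEmpty() }.keys.toList()
    val candidates = mutableListOf<ShortLongPair>()
    for (shortId in ids) {
        for (longId in ids) {
            if (shortId == longId) continue
            val shortest = focalLengths.getValue(shortId).minOrNull() ?: continue
            val longest = focalLengths.getValue(longId).maxOrNull() ?: continue
            candidates += ShortLongPair(shortId, longId, longest - shortest)
        }
    }
    // The best candidate has the largest (longest - shortest) difference
    return candidates.maxByOrNull { it.focalLengthDiff }
}
```

For example, with made-up focal lengths of 4.38 mm for camera "1", 1.9 mm for camera "2", and 6.9 mm for camera "3", the sketch selects camera "2" as the short end and camera "3" as the long end.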

A sensible architecture for this would be to have two SurfaceViews, one for each stream, which get swapped based on user interaction so that only one is visible at any given time.

The following code shows how to open the logical camera, configure the camera outputs, create a camera session, and start two preview streams:

val cameraManager: CameraManager = ...

// Get the two output targets from the activity / fragment
val surface1 = ...  // from SurfaceView
val surface2 = ...  // from SurfaceView

val dualCamera = findShortLongCameraPair(cameraManager)!!
val outputTargets = DualCameraOutputs(
    null, mutableListOf(surface1), mutableListOf(surface2))

// Here you open the logical camera, configure the outputs and create a session
createDualCameraSession(cameraManager, dualCamera, targets = outputTargets) { session ->

  // Create a single request which has one target for each physical camera
  // NOTE: Each target receives frames from only its associated physical camera
  val requestTemplate = CameraDevice.TEMPLATE_PREVIEW
  val captureRequest = session.device.createCaptureRequest(requestTemplate).apply {
    arrayOf(surface1, surface2).forEach { addTarget(it) }
  }.build()

  // Set the sticky request for the session and you are done
  session.setRepeatingRequest(captureRequest, null, null)
}

All that is left to do is provide a UI for the user to switch between the two surfaces, such as a button or double-tapping the SurfaceView. You could even perform some form of scene analysis and switch between the two streams automatically.

Lens distortion

All lenses produce a certain amount of distortion. In Android, you can query the distortion created by lenses using CameraCharacteristics.LENS_DISTORTION, which replaces the now-deprecated CameraCharacteristics.LENS_RADIAL_DISTORTION. For logical cameras, the distortion is minimal and your application can use the frames more or less as they come from the camera. For physical cameras, there are potentially very different lens configurations, especially on wide lenses.
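
To give a sense of how the coefficients are used, the sketch below applies a radial and tangential distortion model in plain Kotlin. It assumes that LENS_DISTORTION reports five coefficients, three radial (kappa_1 to kappa_3) followed by two tangential (kappa_4 and kappa_5), and that coordinates are normalized relative to the optical center. Verify the coefficient order and the coordinate system against the LENS_DISTORTION reference before relying on it. distortedSamplePoint is a hypothetical helper name.

```kotlin
/**
 * Maps a coordinate (x, y) in the corrected image to the coordinate to sample
 * in the captured image. [k] is assumed to hold [kappa_1 .. kappa_5].
 */
fun distortedSamplePoint(x: Float, y: Float, k: FloatArray): Pair<Float, Float> {
    require(k.size == 5) { "Expected 5 coefficients, got ${k.size}" }
    val r2 = x * x + y * y
    // Radial term: 1 + kappa_1 * r^2 + kappa_2 * r^4 + kappa_3 * r^6
    val radial = 1f + k[0] * r2 + k[1] * r2 * r2 + k[2] * r2 * r2 * r2
    // Tangential terms use kappa_4 and kappa_5
    val xc = x * radial + k[3] * (2f * x * y) + k[4] * (r2 + 2f * x * x)
    val yc = y * radial + k[4] * (2f * x * y) + k[3] * (r2 + 2f * y * y)
    return Pair(xc, yc)
}
```

With all coefficients set to zero the mapping is the identity, which is a quick sanity check when comparing lenses: the further the coefficients are from zero, the more the frames from that physical camera need correcting.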

Some devices may implement automatic distortion correction via CaptureRequest.DISTORTION_CORRECTION_MODE. Distortion correction defaults to being on for most devices.

val cameraSession: CameraCaptureSession = ...

// Use still capture template to build the capture request
val captureRequest = cameraSession.device.createCaptureRequest(
    CameraDevice.TEMPLATE_STILL_CAPTURE)

// Determine if this device supports distortion correction
val characteristics: CameraCharacteristics = ...
val supportsDistortionCorrection = characteristics.get(
    CameraCharacteristics.DISTORTION_CORRECTION_AVAILABLE_MODES)?.contains(
    CameraMetadata.DISTORTION_CORRECTION_MODE_HIGH_QUALITY) ?: false

if (supportsDistortionCorrection) {
    captureRequest.set(
        CaptureRequest.DISTORTION_CORRECTION_MODE,
        CameraMetadata.DISTORTION_CORRECTION_MODE_HIGH_QUALITY)
}

// Add output target, set other capture request parameters...

// Dispatch the capture request
cameraSession.capture(captureRequest.build(), ...)

Setting a capture request in this mode can impact the frame rate that can be produced by the camera. You may choose to set the distortion correction on only still image captures.
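
One way to limit distortion correction to still captures is to isolate the decision in a small function. The following is a minimal sketch in plain Kotlin: chooseDistortionMode is a hypothetical helper, and MODE_HIGH_QUALITY is a local constant assumed to mirror CameraMetadata.DISTORTION_CORRECTION_MODE_HIGH_QUALITY so that the sketch runs without the Android SDK.

```kotlin
// Assumed to mirror CameraMetadata.DISTORTION_CORRECTION_MODE_HIGH_QUALITY;
// use the SDK constant in real code.
val MODE_HIGH_QUALITY = 2

/**
 * Chooses a distortion correction mode: high quality for still captures when
 * the device lists it, otherwise null to leave the request's default untouched.
 */
fun chooseDistortionMode(isStillCapture: Boolean, availableModes: IntArray?): Int? =
    if (isStillCapture && availableModes?.contains(MODE_HIGH_QUALITY) == true) {
        MODE_HIGH_QUALITY
    } else {
        null
    }
```

In an app, availableModes would come from CameraCharacteristics.DISTORTION_CORRECTION_AVAILABLE_MODES, and a non-null result would be set on the capture request under CaptureRequest.DISTORTION_CORRECTION_MODE.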