Overview of measuring app performance

This document helps you to identify and fix key performance issues in your app.

Key performance issues

There are many problems that can contribute to bad performance in an app, but the following are some common issues to look for in your app:

Scroll jank

Jank is the term for the visual hiccup that occurs when the system isn't able to build and provide frames in time to draw them to the screen at the requested cadence of 60Hz or higher. Jank is most apparent when scrolling: instead of a smoothly animated flow, the movement pauses for one or more frames because the app takes longer to render content than the duration of a frame on the system.

Apps must target 90Hz refresh rates, not just the conventional 60Hz. Many newer devices operate in 90Hz mode during user interactions, such as scrolling, and some devices support even higher rates of up to 120Hz.

To see what refresh rate a device is using at a given time, enable an overlay using Developer Options > Show refresh rate in the Debugging section.
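
If you need the current refresh rate from code, for example to log it alongside your own measurements, you can query the active display. The following is a minimal sketch, assuming API level 30 or higher for Context.getDisplay(); on older releases, use WindowManager.getDefaultDisplay() instead:

import android.app.Activity
import android.os.Build
import android.util.Log
import androidx.annotation.RequiresApi

@RequiresApi(Build.VERSION_CODES.R)
fun logRefreshRate(activity: Activity) {
    // Refresh rate of the display the activity is currently on, in Hz.
    val refreshRate = activity.display?.refreshRate
    Log.d("Perf", "Current refresh rate: $refreshRate Hz")
}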

Startup latency

Startup latency is the amount of time it takes between tapping on the app icon, notification, or other entry point, and the user's data showing on the screen.

Aim for the following startup goals in your apps:

  • Cold start in less than 500ms. A cold start happens when the app being launched isn't present in the system's memory. This happens when it's the app's first launch since reboot or since the app process was stopped by either the user or the system.

    In contrast, a warm start occurs when the app is already running in the background. A cold start requires the most work from the system, as it has to load everything from storage and initialize the app. Try to make cold starts take 500ms or less.

  • P95 and P99 latencies very close to the median latency. A long startup makes for a poor user experience. Interprocess communications (IPCs) and unnecessary I/O during the critical path of app startup can experience lock contention and introduce inconsistencies.

Transitions that aren't smooth

Lack of smoothness is apparent during interactions such as switching between tabs or loading a new activity. These types of transitions must be smooth animations and not include delays or visual flicker.

Power inefficiencies

Doing work reduces battery charge, and doing unnecessary work reduces battery life.

Memory allocations, which come from creating new objects in code, can be the cause of significant work in the system. This is because not only do the allocations themselves require effort from the Android Runtime (ART), but freeing these objects later (garbage collection) also requires time and effort. Both allocation and collection are much faster and more efficient than they used to be, especially for temporary objects. Although it used to be best practice to avoid allocating objects whenever possible, we recommend you do what makes the most sense for your app and architecture. Saving on allocations at the risk of unmaintainable code isn't the best practice, given what ART is capable of.

However, allocation still requires effort, so keep in mind that allocating many objects in your inner loop can contribute to performance problems.

Identify issues

We recommend the following workflow to identify and remedy performance issues:

  1. Identify and inspect the following critical user journeys:
    • Common startup flows, including from launcher and notification.
    • Screens where the user scrolls through data.
    • Transitions between screens.
    • Long-running flows, like navigation or music playback.
  2. Inspect what is happening during the preceding flows using the following debugging tools:
    • Perfetto: lets you see what is happening across the entire device with precise timing data.
    • Memory Profiler: lets you see what memory allocations are happening on the heap.
    • Simpleperf: shows a flame graph of which function calls are using the most CPU during a certain period of time. When you identify something that's taking a long time in Systrace but you don't know why, Simpleperf can provide additional information.

To understand and debug these performance issues, it's critical to manually debug individual test runs. You can't replace the preceding steps by analyzing aggregated data. However, to understand what users are actually seeing and identify when regressions might occur, it's important to set up metrics collection in automated testing and in the field:

  • Startup flows
  • Jank
    • Field metrics
      • Play Console frame vitals: the Play Console reports overall jank throughout the app; you can't narrow the metrics down to a specific user journey.
      • Custom measurement with FrameMetricsAggregator: you can use FrameMetricsAggregator to record jank metrics during a particular workflow, as shown in the sketch after this list.
    • Lab tests
      • Scrolling with Macrobenchmark.
      • Macrobenchmark collects frame timing using dumpsys gfxinfo commands that bracket a single user journey. This is a way to understand variation in jank over a specific user journey. The RenderTime metrics, which highlight how long frames are taking to draw, are more important than the count of janky frames for identifying regressions or improvements.
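
As one way to do the custom measurement mentioned above, androidx.core.app.FrameMetricsAggregator can bracket a specific journey. The following is a minimal sketch, not a complete solution; how you obtain the activity and report the histogram is up to your app:

import android.app.Activity
import androidx.core.app.FrameMetricsAggregator

// Start collecting total frame durations just before the journey you care about.
fun startJankCollection(activity: Activity): FrameMetricsAggregator {
    val aggregator = FrameMetricsAggregator(FrameMetricsAggregator.TOTAL_DURATION)
    aggregator.add(activity)
    return aggregator
}

// When the journey ends, read back a histogram of frame durations
// (duration in ms -> number of frames) and stop listening.
fun stopJankCollection(aggregator: FrameMetricsAggregator) {
    val totalDurations = aggregator.metrics?.get(FrameMetricsAggregator.TOTAL_INDEX)
    aggregator.stop()
    // Report totalDurations to your own metrics pipeline here.
}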

Set up your app for performance analysis

It's essential to set up properly to get accurate, repeatable, and actionable benchmarks from an app. Test on a system that is as close to production as possible, while suppressing sources of noise. The following sections show a number of APK- and system-specific steps you can take to prepare a test setup, some of which are use-case specific.

Tracepoints

Apps can instrument their code with custom trace events.

While traces are being captured, tracing does incur a small overhead of roughly 5μs per section, so don't put it around every method. Tracing larger chunks of work (over 0.1ms) can give significant insights into bottlenecks.
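
For example, the androidx.tracing library lets you wrap a chunk of work in a custom section so it appears as a named slice in a captured trace. A minimal sketch; the section name and the work inside it are placeholders:

import androidx.tracing.trace

fun loadItems() {
    // Shows up as a slice named "MyApp:loadItems" in Perfetto captures.
    trace("MyApp:loadItems") {
        // ... the chunk of work you want to measure ...
    }
}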

APK considerations

Debug variants can be helpful for troubleshooting and symbolizing stack samples, but they have severe non-linear impacts on performance. On devices running Android 10 (API level 29) and higher, apps can instead use <profileable android:shell="true" /> in their manifest to enable profiling in release builds.

Use your production-grade code shrinking configuration. Depending on the resources your app uses, this can have a substantial impact on performance. Some ProGuard configurations remove tracepoints, so consider removing those rules for the configuration you're running tests on.

Compilation

Compile your app on-device to a known state, generally speed or speed-profile. Background just-in-time (JIT) activity can have significant performance overhead, which you often trigger if you reinstall the APK between test runs. The following command does this:

adb shell cmd package compile -m speed -f com.google.packagename

The speed compilation mode compiles the app completely. The speed-profile mode compiles the app according to a profile of the utilized code paths, collected during app usage. It can be difficult to collect profiles consistently and correctly, so if you decide to use them, confirm they are collecting what you expect. The profiles are stored in the following location:

/data/misc/profiles/ref/[package-name]/primary.prof

Macrobenchmark lets you directly specify compilation mode.
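
For example, a Macrobenchmark scrolling test can pin the compilation mode and collect frame timing for a single journey. The following is a minimal sketch; the package name and the list's resource id are hypothetical placeholders:

import androidx.benchmark.macro.CompilationMode
import androidx.benchmark.macro.FrameTimingMetric
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Direction
import org.junit.Rule
import org.junit.Test

class ScrollBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun scrollFeed() = benchmarkRule.measureRepeated(
        packageName = "com.example.app",          // placeholder package name
        metrics = listOf(FrameTimingMetric()),    // reports frame durations
        compilationMode = CompilationMode.Full(), // comparable to compiling with -m speed
        startupMode = StartupMode.COLD,
        iterations = 5
    ) {
        startActivityAndWait()
        // Fling the list to exercise scrolling; "item_list" is a placeholder id.
        device.findObject(By.res(packageName, "item_list")).fling(Direction.DOWN)
    }
}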

System considerations

For low-level and high-fidelity measurements, calibrate your devices. Run A/B comparisons across the same device and same OS version. There can be significant variations in performance, even across the same device type.

On rooted devices, consider using a lockClocks script for Microbenchmarks. Among other things, these scripts do the following:

  • Place CPUs at a fixed frequency.
  • Disable small cores and configure the GPU.
  • Disable thermal throttling.

We don't recommend using a lockClocks script for user-experience focused tests such as app launch, DoU testing, and jank testing, but it can be essential for reducing noise in Microbenchmark tests.

When possible, consider using a testing framework like Macrobenchmark, which can reduce noise in your measurements and prevent measurement inaccuracy.

Slow app startup: unnecessary trampoline activity

A trampoline activity can extend app startup time unnecessarily, so it's important to be aware of whether your app is doing it. As shown in the following example trace, one activityStart is immediately followed by another activityStart without any frames being drawn by the first activity.

Figure 1. A trace showing trampoline activity.

This can happen both in a notification entrypoint and a regular app startup entrypoint, and you can often address it by refactoring. For example, if you're using this activity to perform setup before another activity runs, factor this code out into a reusable component or library.
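
For illustration, a trampoline looks roughly like the following: an activity that draws nothing and only forwards the launch in onCreate(). The class names here are hypothetical:

import android.app.Activity
import android.content.Intent
import android.os.Bundle

// Anti-pattern: this activity exists only to route the launch intent, which
// delays the first frame. Move the routing logic into MainActivity or into a
// reusable component instead.
class TrampolineActivity : Activity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        startActivity(Intent(this, MainActivity::class.java))
        finish()
    }
}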

Unnecessary allocations triggering frequent GCs

You might see in a Systrace that garbage collections (GCs) are happening more frequently than you expect.

In the following example, a GC every 10 seconds during a long-running operation is an indicator that the app might be allocating unnecessarily but consistently over time:

Figure 2. A trace showing space between GC events.

You might also notice that a specific call stack is making the vast majority of the allocations when using the Memory Profiler. You don't need to eliminate all allocations aggressively, as this can make code harder to maintain. Instead, start by working on hotspots of allocations.
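
As a hypothetical example of such a hotspot, allocating inside a per-frame callback such as View.onDraw() creates garbage on every frame; hoisting the allocation out of the hot path removes it:

import android.content.Context
import android.graphics.Canvas
import android.graphics.Paint
import android.view.View

class ChartView(context: Context) : View(context) {
    // Allocate once and reuse; avoids creating garbage on every frame.
    private val paint = Paint(Paint.ANTI_ALIAS_FLAG)

    override fun onDraw(canvas: Canvas) {
        // Avoid: creating a new Paint() here would allocate on every frame.
        canvas.drawCircle(width / 2f, height / 2f, 40f, paint)
    }
}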

Janky frames

The graphics pipeline is relatively complicated, and there can be some nuance in determining whether a user ultimately might see a dropped frame. In some cases, the platform can "rescue" a frame using buffering. However, you can ignore most of that nuance to identify problematic frames from your app's perspective.

When frames are being drawn with little work required from the app, the Choreographer.doFrame() tracepoints occur on a 16.7ms cadence on a 60 FPS device:

Figure 3. A trace showing frequent fast frames.

If you zoom out and navigate through the trace, you sometimes see frames take a little longer to complete, but it's still okay because they're not taking more than their allotted 16.7ms time:

Figure 4. A trace showing frequent fast frames with periodic bursts of work.

When you see a disruption to this regular cadence, it is a janky frame, as shown in figure 5:

Figure 5. A trace showing a janky frame.

You can practice identifying janky frames, as in figure 6.

Figure 6. A trace showing more janky frames.

In some cases, you need to zoom into a tracepoint for more information about which views are being inflated or what RecyclerView is doing. In other cases, you might have to inspect further.

For more information about identifying janky frames and debugging their causes, see Slow rendering.

Common RecyclerView mistakes

Invalidating the entire backing data of RecyclerView unnecessarily can lead to long frame rendering times and jank. Instead, to minimize the number of views that need to update, invalidate only the data that changes.

See Present dynamic data for ways to avoid costly notifyDataSetChanged() calls, so that only the content that changes updates rather than being replaced entirely.
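
One common approach is ListAdapter with a DiffUtil.ItemCallback, which computes a diff when you submit a new list and dispatches granular change notifications instead of invalidating everything. A minimal sketch; the Item class and row layout are placeholders:

import android.view.ViewGroup
import android.widget.TextView
import androidx.recyclerview.widget.DiffUtil
import androidx.recyclerview.widget.ListAdapter
import androidx.recyclerview.widget.RecyclerView

data class Item(val id: Long, val title: String)

object ItemDiff : DiffUtil.ItemCallback<Item>() {
    override fun areItemsTheSame(oldItem: Item, newItem: Item) = oldItem.id == newItem.id
    override fun areContentsTheSame(oldItem: Item, newItem: Item) = oldItem == newItem
}

class ItemViewHolder(val text: TextView) : RecyclerView.ViewHolder(text)

class ItemAdapter : ListAdapter<Item, ItemViewHolder>(ItemDiff) {
    override fun onCreateViewHolder(parent: ViewGroup, viewType: Int) =
        ItemViewHolder(TextView(parent.context))

    override fun onBindViewHolder(holder: ItemViewHolder, position: Int) {
        holder.text.text = getItem(position).title
    }
}

// Calling adapter.submitList(newItems) diffs against the current list on a
// background thread and notifies only the positions that changed.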

Nested RecyclerViews that aren't set up properly can cause the inner RecyclerViews to be completely recreated every time. Every nested, inner RecyclerView must have a RecycledViewPool set to help ensure views can be recycled between the inner RecyclerViews.
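
A minimal sketch of sharing a pool, assuming an outer adapter whose rows each contain a horizontal inner RecyclerView; the binding site is a placeholder:

import androidx.recyclerview.widget.RecyclerView

// Create a single pool in the outer adapter and share it with every inner list.
val sharedPool = RecyclerView.RecycledViewPool()

// Call this from the outer adapter's onBindViewHolder for each row's inner list
// so item views can be reused across rows instead of being recreated.
fun bindInnerList(innerRecyclerView: RecyclerView) {
    innerRecyclerView.setRecycledViewPool(sharedPool)
}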

Not prefetching enough data, or not prefetching in a timely manner, can make reaching the bottom of a scrolling list jarring when a user needs to wait for more data from the server. Although this isn't technically jank, as no frame deadlines are being missed, you can significantly improve UX by modifying the timing and quantity of prefetching so that the user doesn't have to wait for data.
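
If you load data with the Paging library, prefetchDistance is one way to control how far ahead of the user's scroll position the next page is requested. A minimal sketch with illustrative values; itemPagingSource() is a placeholder for your own PagingSource factory:

import androidx.paging.Pager
import androidx.paging.PagingConfig

// Load 50 items per page and start fetching the next page while the user is
// still 150 items away from the end of the loaded content.
val pager = Pager(
    config = PagingConfig(pageSize = 50, prefetchDistance = 150)
) {
    itemPagingSource()
}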