Profile-guided optimization
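Profile-guided optimization (PGO) is a well-known compiler optimization technique. In PGO, the compiler uses runtime profiles from a program's executions to make better decisions about inlining and code layout. This improves performance and can reduce code size.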
To deploy PGO in your application or library, follow these steps:
1. Identify a representative workload.
2. Collect profiles.
3. Use the profiles in a release build.
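Step 1: Identify a representative workload
First, identify a representative benchmark or workload for your application. This step is critical because the profiles collected from the workload identify the hot and cold regions of the code. When using the profiles, the compiler optimizes and inlines aggressively in the hot regions. It may also reduce the code size of cold regions, at some cost to their performance.
A well-chosen workload is also useful for tracking performance in general.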
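Step 2: Collect profiles
Profile collection involves three steps:
- building native code with instrumentation,
- running the instrumented app on the device and generating profiles, and
- merging and post-processing the profiles on the host.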
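Create an instrumented build
Profiles are collected by running the workload from step 1 on an instrumented build of the application. To generate an instrumented build, add `-fprofile-generate` to both the compiler flags and the linker flags. Control this flag with a separate build variable, because a default build does not need it.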
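Generate profiles
Next, run the instrumented app on the device to generate profiles. Profiles are collected in memory while the instrumented binary runs and are normally written to a file at exit. However, functions registered with `atexit` are not called in an Android app; the app is simply killed.
The application or workload therefore has to do extra work: set a path for the profile file, then explicitly trigger a profile write.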
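- To set the profile file path, call `__llvm_profile_set_filename(PROFILE_DIR "/default-%m.profraw")`. `%m` is useful when there are multiple shared libraries: it expands to a unique module signature for each library, which produces a separate profile per library. For other useful pattern specifiers, see https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program. `PROFILE_DIR` is a directory that the app can write to. The demo at the end of this page shows how to detect this directory at runtime.
- To explicitly trigger a profile write, call the `__llvm_profile_write_file` function.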
    extern "C" {
    extern int __llvm_profile_set_filename(const char*);
    extern int __llvm_profile_write_file(void);
    }

    #define PROFILE_DIR "<location-writable-from-app>"

    void workload() {
        // ...
        // run workload
        // ...

        // Set the path and write the profiles after the workload has run.
        __llvm_profile_set_filename(PROFILE_DIR "/default-%m.profraw");
        __llvm_profile_write_file();
    }
Note: Generating the profile file is simpler if the workload is a standalone binary. Just set the `LLVM_PROFILE_FILE` environment variable to `%t/default-%m.profraw` before running the binary.
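In an app, the writable directory is usually known only at runtime, so the compile-time `PROFILE_DIR` macro may not be enough. The sketch below shows one possible way to build the filename pattern from a runtime directory and to keep the profile runtime calls out of a default build. `PGO_INSTRUMENT`, `profile_pattern`, and `write_pgo_profile` are hypothetical names introduced for illustration; the two runtime declarations mirror the ones shown above.

```cpp
#include <string>

// PGO_INSTRUMENT is a hypothetical macro. The build is assumed to define it
// only when -fprofile-generate is also passed, so that a default build never
// references the profile runtime symbols.
#if defined(PGO_INSTRUMENT)
extern "C" {
extern int __llvm_profile_set_filename(const char*);
extern int __llvm_profile_write_file(void);
}
#endif

// Builds the filename pattern for a directory discovered at runtime.
// %m is left in place for the profile runtime to expand per shared library.
std::string profile_pattern(const std::string& dir) {
    std::string pattern = dir;
    if (!pattern.empty() && pattern.back() != '/') {
        pattern += '/';
    }
    return pattern + "default-%m.profraw";
}

// Sets the profile path and triggers a write. Returns true if the write
// reported success. In a non-instrumented build it does nothing and
// returns false.
bool write_pgo_profile(const std::string& dir) {
#if defined(PGO_INSTRUMENT)
    // Kept in static storage as a precaution, in case the runtime keeps the
    // pointer instead of copying the string.
    static std::string pattern;
    pattern = profile_pattern(dir);
    __llvm_profile_set_filename(pattern.c_str());
    return __llvm_profile_write_file() == 0;
#else
    (void)dir;
    return false;
#endif
}
```

A workload would call `write_pgo_profile(dir)` once it has finished, passing the directory the app obtained at runtime.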
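Post-process profiles
The profile files are in the `.profraw` format. First fetch them from the device with `adb pull`. Then use the `llvm-profdata` utility in the NDK to convert them from `.profraw` to `.profdata`, which can be passed to the compiler.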
    $NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-profdata \
        merge --output=pgo_profile.profdata \
        <list-of-profraw-files>
Use `llvm-profdata` and `clang` from the same NDK release to avoid a version mismatch between profile file formats.
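Step 3: Use the profiles to build the application
Use the profile from the previous step in a release build of your application by passing `-fprofile-use=<>.profdata` to the compiler and linker. The profiles remain usable as the code evolves, because the Clang compiler tolerates slight mismatches between the source and the profiles.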
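Note: For most libraries, the profiles are common across architectures. For example, profiles generated from an arm64 build of the library can be used for all architectures. The caveat is that if the library has architecture-specific code paths (arm vs. x86, or 32-bit vs. 64-bit), you should use a separate profile for each such configuration.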
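If a library does need separate profiles per architecture, one option is to encode the ABI in the raw profile filename so that files pulled from different devices do not overwrite each other. The sketch below derives an Android ABI name from compiler-predefined macros. `abi_tag`, `arch_profile_pattern`, and the naming scheme are assumptions made for illustration, not part of the demo.

```cpp
#include <string>

// Returns the Android ABI name for the architecture this translation unit
// was compiled for, based on compiler-predefined macros.
constexpr const char* abi_tag() {
#if defined(__aarch64__)
    return "arm64-v8a";
#elif defined(__arm__)
    return "armeabi-v7a";
#elif defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "x86";
#else
    return "unknown";
#endif
}

// Builds a per-architecture filename pattern such as
// <dir>/arm64-v8a-%m.profraw. The naming scheme is an assumption.
std::string arch_profile_pattern(const std::string& dir) {
    return dir + "/" + abi_tag() + "-%m.profraw";
}
```

The `.profraw` files for each ABI can then be merged into their own `.profdata` file and passed to the matching release build.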
Putting it all together
https://github.com/DanAlbert/ndk-samples/tree/pgo/pgo shows an end-to-end demo of using PGO in an app. It provides additional details that this document skims over.
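- The CMake build rules (https://github.com/DanAlbert/ndk-samples/blob/pgo/pgo/app/src/main/cpp/CMakeLists.txt) show how to set up a CMake variable that builds native code with instrumentation. When the build variable is not set, native code is optimized using previously generated PGO profiles.
- In an instrumented build, pgodemo.cpp writes the profiles after workload execution.
- A writable location for the profiles is obtained at runtime in MainActivity.kt using `applicationContext.cacheDir.toString()`.
- To pull profiles from the device without requiring `adb root`, use the `adb` recipe at https://github.com/DanAlbert/ndk-samples/blob/pgo/pgo/app/src/main/cpp/CMakeLists.txt#L11.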
Last updated (UTC): 2024-11-09.