Profile-guided Optimization
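Profile-guided optimization (PGO) is a well-known compiler optimization technique. In PGO, the compiler uses runtime profiles from a program's executions to make better choices about inlining and code layout. This leads to improved performance and reduced code size.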
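PGO can be deployed to your application or library with the following steps:
1. Identify a representative workload.
2. Collect profiles.
3. Use the profiles in a release build.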
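Step 1: Identify a representative workload
First, identify a representative benchmark or workload for your application.
This step is critical because the profiles collected from the workload identify the hot and cold regions in the code. When using the profiles, the compiler performs aggressive optimizations and inlining in the hot regions. The compiler may also choose to reduce the code size of cold regions at the cost of some performance in those regions.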
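To make "hot" and "cold" regions concrete, here is a toy sketch. It uses hand-written counters as a stand-in for the counters that an instrumented build maintains automatically. The names g_branch_hits, sum_clamped, and representative_input are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for compiler-inserted profile counters:
// index 0 counts the common path, index 1 counts the rare path.
static std::array<std::uint64_t, 2> g_branch_hits{};

// Sums the non-negative values. Negative values are treated as bad input
// and contribute nothing to the total.
long long sum_clamped(const std::vector<int>& values) {
    long long total = 0;
    for (int v : values) {
        if (v >= 0) {
            ++g_branch_hits[0];  // hot: taken for almost every element
            total += v;
        } else {
            ++g_branch_hits[1];  // cold: taken only for bad input
        }
    }
    return total;
}

// A workload meant to resemble real usage: mostly valid data, with one
// bad value in every hundred elements.
std::vector<int> representative_input(std::size_t n) {
    std::vector<int> data(n);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = (i % 100 == 99) ? -1 : static_cast<int>(i % 7);
    }
    return data;
}
```

Running sum_clamped over representative_input(1000) drives the first counter far higher than the second, which is the kind of skew a real profile records. If the workload were unrepresentative, for example all negative values, the counts would point the compiler at the wrong region.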
Identifying a good workload is also useful for tracking performance in general.
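Step 2: Collect profiles
Profile collection involves three steps: building native code with instrumentation; running the instrumented app on the device and generating profiles; and merging/post-processing the profiles on the host.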
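Create instrumented build
The profiles are collected by running the workload from step 1 on an instrumented build of the application. To generate an instrumented build, add -fprofile-generate to the compiler and linker flags. Control this flag with a separate build variable, because it is not needed during a default build.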
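Generate profiles
Next, run the instrumented app on the device and generate profiles.
Profiles are collected in memory when the instrumented binary runs and are written to a file at exit. However, functions registered with atexit are not called in an Android app; the app just gets killed.
The application/workload therefore has to do extra work to set a path for the profile file and then explicitly trigger a profile write.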
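- To set the profile file path, call __llvm_profile_set_filename(PROFILE_DIR "/default-%m.profraw"). The %m specifier is useful when there are multiple shared libraries: it expands to a unique module signature for each library, resulting in a separate profile per library. For other useful pattern specifiers, see the Clang source-based code coverage documentation (https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program). PROFILE_DIR is a directory that is writable from the app. See the demo at the end of this document for detecting this directory at runtime.
- To explicitly trigger a profile write, call the __llvm_profile_write_file function.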
extern "C" {
extern int __llvm_profile_set_filename(const char*);
extern int __llvm_profile_write_file(void);
}

#define PROFILE_DIR "<location-writable-from-app>"
void workload() {
    // ...
    // run workload
    // ...

    // set path and write profiles after workload execution
    __llvm_profile_set_filename(PROFILE_DIR "/default-%m.profraw");
    __llvm_profile_write_file();
    return;
}
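When the writable directory is only known at runtime, the compile-time PROFILE_DIR macro can be replaced by a small helper. The following is a minimal sketch, not the demo's actual code. The names make_profile_pattern, dump_profiles, and the PGO_INSTRUMENT macro are hypothetical, with PGO_INSTRUMENT assumed to be defined only by the instrumentation build variable. The pattern string is kept in static storage on the assumption that the profiling runtime may retain the pointer rather than copy it.

```cpp
#include <string>

#ifdef PGO_INSTRUMENT  // hypothetical macro, defined only for instrumented builds
// Declarations as given in this document. The symbols are supplied by the
// profiling runtime when building with -fprofile-generate.
extern "C" {
extern int __llvm_profile_set_filename(const char*);
extern int __llvm_profile_write_file(void);
}
#endif

// Builds the profile filename pattern for a directory discovered at runtime.
std::string make_profile_pattern(const std::string& writable_dir) {
    std::string pattern = writable_dir;
    if (!pattern.empty() && pattern.back() != '/') {
        pattern += '/';
    }
    pattern += "default-%m.profraw";  // %m yields one file per library
    return pattern;
}

// Sets the profile path and triggers a write. Returns true only if an
// instrumented build reported a successful write.
bool dump_profiles(const std::string& writable_dir) {
    // Static so the buffer outlives this call (assumption: the runtime may
    // keep the pointer it is given).
    static std::string pattern;
    pattern = make_profile_pattern(writable_dir);
#ifdef PGO_INSTRUMENT
    __llvm_profile_set_filename(pattern.c_str());
    // Assumption: a return value of 0 indicates success.
    return __llvm_profile_write_file() == 0;
#else
    return false;  // default build: no profiling runtime is linked in
#endif
}
```

In an app, dump_profiles would be called after the workload finishes, with a directory such as the app's cache directory passed in from the Java or Kotlin side. In a default build the function compiles to a no-op, so the call site needs no conditional code of its own.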
Note: Generating the profile file is simpler if the workload is a standalone binary. Just set the LLVM_PROFILE_FILE environment variable to %t/default-%m.profraw before running the binary.
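Post-process profiles
The profile files are in the .profraw format. First fetch them from the device using adb pull. Then use the llvm-profdata utility in the NDK to convert them from .profraw to .profdata, which can then be passed to the compiler.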
$NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-profdata \
merge --output=pgo_profile.profdata \
<list-of-profraw-files>
Use the llvm-profdata and clang from the same NDK release to avoid a version mismatch between profile file formats.
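When many files are pulled from the device, it can help to assemble the merge invocation programmatically so that only .profraw files are passed to the tool. The sketch below only builds the argument list and does not run anything. The helper names is_profraw and merge_args are hypothetical, and the only llvm-profdata options used are the ones shown in the command above.

```cpp
#include <string>
#include <vector>

// Returns true if the name ends with ".profraw" and has a non-empty stem.
bool is_profraw(const std::string& name) {
    const std::string ext = ".profraw";
    return name.size() > ext.size() &&
           name.compare(name.size() - ext.size(), ext.size(), ext) == 0;
}

// Builds the argument vector for:
//   <tool> merge --output=<output> <profraw files...>
// Files without the .profraw extension are skipped.
std::vector<std::string> merge_args(const std::string& tool,
                                    const std::string& output,
                                    const std::vector<std::string>& pulled_files) {
    std::vector<std::string> args{tool, "merge", "--output=" + output};
    for (const auto& file : pulled_files) {
        if (is_profraw(file)) {
            args.push_back(file);
        }
    }
    return args;
}
```

The tool argument would be the llvm-profdata path from the same NDK release as the clang used for the build, for the version-matching reason given above.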
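Step 3: Use the profiles to build the application
Use the profile from the previous step during a release build of your application by passing -fprofile-use=<>.profdata to the compiler and linker. The profiles can be used even as the code evolves, because the Clang compiler can tolerate a slight mismatch between the source and the profiles.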
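Note: For most libraries, the profiles are common across architectures. For example, profiles generated from the arm64 build of a library can be used for all architectures. The caveat is that if the library has architecture-specific code paths (arm vs x86, or 32-bit vs 64-bit), separate profiles should be used for each such configuration.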
Putting it all together
https://github.com/DanAlbert/ndk-samples/tree/pgo/pgo shows an end-to-end demo of using PGO from an app. It provides additional details that this document skims over.
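- The CMake build rules (https://github.com/DanAlbert/ndk-samples/blob/pgo/pgo/app/src/main/cpp/CMakeLists.txt) show how to set up a CMake variable that builds native code with instrumentation. When the build variable is not set, native code is optimized using previously generated PGO profiles.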
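- In an instrumented build, pgodemo.cpp (https://github.com/DanAlbert/ndk-samples/blob/pgo/pgo/app/src/main/cpp/pgodemo.cpp) writes the profiles after workload execution.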
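- A writable location for the profiles is obtained at runtime in MainActivity.kt (https://github.com/DanAlbert/ndk-samples/blob/pgo/pgo/app/src/main/java/com/example/pgodemo/MainActivity.kt) using applicationContext.cacheDir.toString().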
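- To pull profiles from the device without requiring adb root, use the adb recipe in the demo's CMakeLists.txt (https://github.com/DanAlbert/ndk-samples/blob/pgo/pgo/app/src/main/cpp/CMakeLists.txt#L11).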
Last updated 2024-11-09 UTC.