Vulkan design guidelines
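Vulkan differs from earlier graphics APIs in that its drivers do not perform certain optimizations for apps, such as pipeline reuse. Instead, apps that use Vulkan must implement these optimizations themselves. If they don't, they may perform worse than equivalent apps running on OpenGL ES.
When apps implement these optimizations themselves, they can potentially do so more effectively than the driver, because they have more specific information about their own use case. As a result, a skillfully optimized Vulkan app can perform better than the same app running on OpenGL ES.
This page introduces several optimizations that your Android app can implement to gain performance from Vulkan.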
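Hardware acceleration
Most devices support Vulkan 1.1 through hardware acceleration, while a small subset support it through software emulation. Apps can detect a software-based Vulkan device by calling vkGetPhysicalDeviceProperties and checking the deviceType field of the VkPhysicalDeviceProperties structure it fills in. SwiftShader and other CPU-based implementations report the value VK_PHYSICAL_DEVICE_TYPE_CPU. Apps can check specifically for SwiftShader by comparing the vendorID and deviceID fields of the same structure against SwiftShader-specific values.
Performance-critical apps should avoid software-emulated Vulkan implementations and fall back to OpenGL ES instead.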
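Apply display rotation during rendering
When an app's upward-facing direction doesn't match the orientation of the device's display, the compositor rotates the app's swapchain images so that they match. It performs this rotation as it displays the images, which consumes more power, sometimes significantly more, than if it were not rotating them.
By contrast, rotating swapchain images while generating them adds little, if any, power consumption. The VkSurfaceCapabilitiesKHR::currentTransform field indicates the rotation that the compositor applies to the window. After an app applies that rotation during rendering, it sets the VkSwapchainCreateInfoKHR::preTransform field to report that the rotation is already done.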
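Minimize render passes per frame
On most mobile GPU architectures, beginning and ending a render pass is an expensive operation. Your app can improve performance by organizing its rendering operations into as few render passes as possible.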
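Different attachment load and store operations offer different levels of performance. On the tile-based GPUs common in mobile devices, a load operation typically copies the attachment from memory into on-chip tile memory at the start of the render pass, and a store operation writes it back at the end, so avoiding them saves memory bandwidth. For example, if you don't need to preserve the contents of an attachment, you can use the much faster VK_ATTACHMENT_LOAD_OP_CLEAR or VK_ATTACHMENT_LOAD_OP_DONT_CARE instead of VK_ATTACHMENT_LOAD_OP_LOAD. Similarly, if you don't need to write the attachment's final values to memory for later use, you can use VK_ATTACHMENT_STORE_OP_DONT_CARE to get much better performance than VK_ATTACHMENT_STORE_OP_STORE.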
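Also, in most render passes, your app doesn't need to load or store the depth/stencil attachment. In such cases, you can avoid allocating physical memory for the attachment by creating the attachment image with the VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT flag and binding it to a memory type that has VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT, if the device exposes one. This provides the same benefits as glInvalidateFramebuffer (or the older glDiscardFramebufferEXT extension function) in OpenGL ES.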
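Choose appropriate memory types
When allocating device memory, apps must choose a memory type. The memory type determines how an app can use the memory, and it also describes the caching and coherence properties of the memory. Different devices expose different memory types, and different memory types have different performance characteristics.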
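An app can use a simple algorithm to pick the best memory type for a given use. The algorithm picks the first memory type in the VkPhysicalDeviceMemoryProperties::memoryTypes array that meets two criteria: the memory type must be allowed for the buffer or image (that is, its bit is set in VkMemoryRequirements::memoryTypeBits), and it must have at least the properties that the app requires.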
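Mobile systems generally don't have separate physical memory heaps for the CPU and GPU. On such systems, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT is not as significant as it is on systems with discrete GPUs that have their own dedicated memory. An app should not assume that this property is required.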
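Group descriptor sets by frequency
If you have resource bindings that change at different frequencies, use multiple descriptor sets per pipeline rather than rebinding all resources for each draw. For example, you can have one descriptor set for per-scene bindings, another for per-material bindings, and a third for per-mesh-instance bindings.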
Use push constants for the highest-frequency changes, such as changes made with each draw call.
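Content and code samples on this page are subject to the licenses described in the Content License. Java and OpenJDK are trademarks or registered trademarks of Oracle and/or its affiliates.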
Last updated 2024-01-10 UTC.