
haoxli/zjs-demos 1

Sample applications and demos for Zephyr.js

haoxli/cts 0

WebGPU Conformance Test Suite

haoxli/demo-express 0

A demo application to showcase Crosswalk's features and APIs

haoxli/deno 0

A secure TypeScript runtime on V8

haoxli/gpuweb 0

Where the GPU for the Web work happens!

issue commentgpuweb/gpuweb

Pipeline statistics clarifications

@kvark Thanks for clarifying. Metal only specifies what the returned values stand for, while D3D12 has some descriptions explaining what will affect the results.

I think these values are intended to measure the relative complexity of different parts of an application and to give insight into GPU operations while performance tuning. So how about adding a description to remind users, like this: Note: These values are intended to measure relative statistics generated by operations on the device. Be careful when using these values in a release version of an application, because different device architectures may count them differently.

Kangz

comment created time in 4 days

push eventhaoxli/gpuweb

Li, Hao

commit sha 64ff17e02c54a4570e4461b16e25b0ddbd5fb26f

Move pipeline statistics query to render encoder and compute encoder These query APIs are not supported on render bundles.

view details

push time in 4 days

Pull request review commentgpuweb/gpuweb

Move pipeline statistics query to render encoder and compute encoder

 interface GPUComputePassEncoder {
     void dispatch(GPUSize32 x, optional GPUSize32 y = 1, optional GPUSize32 z = 1);
     void dispatchIndirect(GPUBuffer indirectBuffer, GPUSize64 indirectOffset);
+    void beginPipelineStatisticsQuery(GPUQuerySet querySet, GPUSize32 queryIndex);
+    void endPipelineStatisticsQuery(GPUQuerySet querySet, GPUSize32 queryIndex);

Yes, I will remove these args.

haoxli

comment created time in 4 days

Pull request review commentgpuweb/gpuweb

Move pipeline statistics query to render encoder and compute encoder

 interface GPUComputePassEncoder {
     void dispatch(GPUSize32 x, optional GPUSize32 y = 1, optional GPUSize32 z = 1);
     void dispatchIndirect(GPUBuffer indirectBuffer, GPUSize64 indirectOffset);
+    void beginPipelineStatisticsQuery(GPUQuerySet querySet, GPUSize32 queryIndex);

Another consideration: there is already an occlusionQuerySet in GPURenderPassDescriptor. If we add a pipelineStatisticsQuerySet as well, it may duplicate things in the spec, and there will also be timestamp queries in the future.

haoxli

comment created time in 4 days

issue commentgpuweb/gpuweb

Pipeline statistics clarifications

I filed an issue for the nested queries in Vulkan validation layers: https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/1874.

@kvark We now use an array of GPUPipelineStatisticName enums to specify the pipeline statistics, and we also return 0 if a specified pipeline statistic is not supported. The difference is that the number and order of the results written to the buffer match the number and order of the specified pipeline statistics, whereas in Vulkan they are fixed.
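As a concrete illustration of the array-of-enum shape described above, here is a minimal TypeScript sketch. It assumes a GPUDevice and GPUCommandEncoder in hand, the createQuerySet/resolveQuerySet entry points discussed in these PRs, and illustrative GPUPipelineStatisticName strings (the exact enum strings and descriptor field are the proposal's, not shipped typings):

// Sketch only: the pipelineStatistics field and enum strings follow this proposal.
function resolveSelectedStatistics(device: GPUDevice, encoder: GPUCommandEncoder): GPUBuffer {
  const querySet = device.createQuerySet({
    type: 'pipeline-statistics',
    count: 1,
    // The order requested here is the order the results come back in.
    pipelineStatistics: ['fragment-shader-invocations', 'vertex-shader-invocations'],
  } as any);

  const resolveBuffer = device.createBuffer({
    size: 2 * 8, // one u64 per requested statistic
    usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC,
  });

  // ...beginPipelineStatisticsQuery / draws / endPipelineStatisticsQuery happen on a pass...
  encoder.resolveQuerySet(querySet, 0, 1, resolveBuffer, 0);
  // Result 0 is fragment-shader-invocations, result 1 is vertex-shader-invocations.
  return resolveBuffer;
}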

Kangz

comment created time in 6 days

issue openedKhronosGroup/Vulkan-ValidationLayers

Validation layer does not check the queryPool when calling vkCmdBeginQuery nested with the same queryType

For vkCmdBeginQuery and vkCmdBeginQueryIndexedEXT, the Vulkan spec says:

queryPool must have been created with a queryType that differs from that of any queries that are active within commandBuffer

But if we run nested query operations with the same query pool, as below:

  vkCmdBeginQuery(command_buffer,  query_pool, 0 /*slot*/, 0 /*flags*/);
  vkCmdDraw(command_buffer, 3, 1, 0, 0);

  vkCmdBeginQuery(command_buffer, query_pool, 1 /*slot*/, 0 /*flags*/);
  vkCmdDraw(command_buffer, 3, 1, 3, 0);
  vkCmdEndQuery(command_buffer, query_pool, 1 /*slot*/);

  vkCmdEndQuery(command_buffer, query_pool, 0 /*slot*/);

When the first vkCmdBeginQuery is called, that query is active, so the second vkCmdBeginQuery should not be allowed until the first one has ended. However, the validation layer does not catch this problem, and I can get the results of both queries.

created time in 6 days

pull request commentgpuweb/gpuweb

Query API: Timestamp Query

Actually, the timestamps returned to users will be time deltas processed by a compute shader: the first timestamp will be 0, and the following values are deltas relative to the first one.

But no matter how we return the results of a timestamp query, they are in nanoseconds. If we want to sidestep the potential risks of such high precision, how about defining an internal timestamp period to coarsen the nanosecond interval, like Vulkan does (VkPhysicalDeviceLimits::timestampPeriod)?
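A rough sketch of that quantization idea (the period value is purely illustrative, analogous to VkPhysicalDeviceLimits::timestampPeriod but chosen by the implementation):

// Hypothetical helper: coarsen raw nanosecond timestamps to an internal period
// before exposing them, so callers never observe full nanosecond resolution.
function quantizeTimestampNs(rawNs: bigint, periodNs: bigint = 100n): bigint {
  return (rawNs / periodNs) * periodNs; // round down to a multiple of the period
}

// Example: quantizeTimestampNs(123456789n) === 123456700n with a 100 ns period.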

haoxli

comment created time in 9 days

issue commentgpuweb/gpuweb

Pipeline statistics clarifications

Oh, I understand what you mean, but strangely, this case passes when I test it with the Vulkan validation layers. I will file an issue against the validation layers and add a description in the spec to disallow the nesting.

Kangz

comment created time in 10 days

PR opened gpuweb/gpuweb

Move pipeline statistics query to render encoder and compute encoder

These query APIs are not supported on render bundles, so move them to GPUComputePassEncoder and GPURenderPassEncoder. Issue: #794

+6 -3

0 comment

1 changed file

pr created time in 10 days

create branch haoxli/gpuweb

branch : pipeline-statistics-fix

created branch time in 10 days

push eventhaoxli/gpuweb

Paul Kernfeld

commit sha 4b711a8cd99f758e262e72b877fb39f3d1912fc7

Fix *N8 to *N* in struct type overview (#729)

view details

David Neto

commit sha 26088ba8ea1a149e579e65e5b52a93544d40a7d7

Remove TODO for signed and unordered comparisons (#791) We no longer need those builtins: - PR https://github.com/gpuweb/gpuweb/pull/772 added type rules for signed integer comparisons - Resolution of https://github.com/gpuweb/gpuweb/issues/706 is to not have unordered floating point comparisons as a direct language feature.

view details

David Neto

commit sha 7861a34efa673808d0ac3dff0153be3ba34a9d1b

Make tokens for << >> >>> (#792) Make tokens for << >> >>> For example, disallow a < < b from being recognized as a left-shift.

view details

push time in 10 days

pull request commentwebgpu-native/webgpu-headers

Add pipeline statistics queries

Yes, they are not supported on bundles, but we can nest query operations for the same query set or for query sets with different types.

Kangz

comment created time in 10 days

issue commentgpuweb/gpuweb

Pipeline statistics clarifications

Should they be allowed in GPURenderBundle? They currently are, but it seems to conflict with our earlier decision to disallow occlusion queries on render bundles.

Yes, you're right, D3D12 and Metal don't support the Query API on render bundles. I will move the Pipeline Statistics APIs to GPURenderPassEncoder and GPUComputePassEncoder.

queryPool must have been created with a queryType that differs from that of any queries that are active within commandBuffer.

It only disallows creating the queryPool with the same query type, but we can nest the query operations for the same query type, and even for different query types, such as:

pass.beginPipelineStatisticsQuery(queryset, 0);
pass.beginPipelineStatisticsQuery(queryset, 1);
pass.draw(3, 1, 0, 0);
pass.endPipelineStatisticsQuery(queryset, 1);
pass.draw(6, 1, 3, 0);
pass.endPipelineStatisticsQuery(queryset, 0);

or

pass.beginPipelineStatisticsQuery(psQuerySet, 0);
pass.writeTimestamp(tsQuerySet, 0);
pass.draw(3, 1, 0, 0);
pass.writeTimestamp(tsQuerySet, 1);
pass.draw(6, 1, 3, 0);
pass.endPipelineStatisticsQuery(psQuerySet, 0);

Maybe we can remove the queryset arguments because we have separate APIs for the different queries, but if they are removed, we need to tell users which queryset the results are stored in, and the created queryset would only be used when resolving. We could also consider not exposing the queryset at all and resolving the data by query type instead, because there is only one queryset for each type and we can handle them natively. I think that would increase the complexity of the implementation, so currently I prefer to keep these arguments.

Kangz

comment created time in 10 days

push eventhaoxli/gpuweb

Jiawei Shao

commit sha bb3fccc1bde901c127e23f28df6c943eb00af0a0

Disallow source and destination to be the same buffer in B2B copy (#788) * Disallow source and destination to be the same buffer in B2B copy * Small fix * Address reviewer's comment

view details

Dzmitry Malyshau

commit sha 5526a9c3c96ea5dfe6a34b155c58e046f65a8257

Read-only depth-stencil pass (#746)

view details

Kai Ninomiya

commit sha 9a642f397df8e6744c6b9338ef62d36688621bc8

Replace DOMString with USVString for wgsl and debug strings (#787) Issue: #784

view details

Brandon Jones

commit sha 17bdb1723836f63b7046c705c2ffe58c429456cf

Enable 'make online' to output bikeshed validation (#786)

view details

Dzmitry Malyshau

commit sha cc744595378aa737add428161fab0146f90c9ac1

Add Queue/writeTexture method (#761) * Add writeTexture * Refactor writeXxx parameters * Turn copy view validations into proper algorithms * Apply suggestions from code review * wrap writeBuffer and writeTexture in <div algorithm> Co-authored-by: Kai Ninomiya <kainino1@gmail.com> Co-authored-by: Kai Ninomiya <kainino@chromium.org>

view details

Li, Hao

commit sha 8eac1a71a3c523dcf2819755d2a81063f6a65da6

Query API: getElapsedTime on command buffer Add getElapsedTime to get execution time of entire command buffer.

view details

push time in 11 days

push eventhaoxli/gpuweb

Jiawei Shao

commit sha bb3fccc1bde901c127e23f28df6c943eb00af0a0

Disallow source and destination to be the same buffer in B2B copy (#788) * Disallow source and destination to be the same buffer in B2B copy * Small fix * Address reviewer's comment

view details

Dzmitry Malyshau

commit sha 5526a9c3c96ea5dfe6a34b155c58e046f65a8257

Read-only depth-stencil pass (#746)

view details

Kai Ninomiya

commit sha 9a642f397df8e6744c6b9338ef62d36688621bc8

Replace DOMString with USVString for wgsl and debug strings (#787) Issue: #784

view details

Brandon Jones

commit sha 17bdb1723836f63b7046c705c2ffe58c429456cf

Enable 'make online' to output bikeshed validation (#786)

view details

Dzmitry Malyshau

commit sha cc744595378aa737add428161fab0146f90c9ac1

Add Queue/writeTexture method (#761) * Add writeTexture * Refactor writeXxx parameters * Turn copy view validations into proper algorithms * Apply suggestions from code review * wrap writeBuffer and writeTexture in <div algorithm> Co-authored-by: Kai Ninomiya <kainino1@gmail.com> Co-authored-by: Kai Ninomiya <kainino@chromium.org>

view details

push time in 11 days

create branch haoxli/gpuweb

branch : command-buffer-elapsed-time

created branch time in 12 days

push eventhaoxli/gpuweb

Brandon Jones

commit sha 8cb9ebd586afbd95a9827cb467bae850985779ae

Specifies the state machine of GPUCommandEncoder (#752) * Specifies the state machine of GPUCommandEncoder * Use dot syntax, as pointed out by @Kangz * Address feedback from @kvark * Fixing some build errors * Addressing @kainino0x's feedback * Apply suggestions from code review Co-authored-by: Kai Ninomiya <kainino1@gmail.com>

view details

Dean Jackson

commit sha 31031076e77914baf4b8a703d38c51a5750cd340

Sort the 'Under Discussion' column by putting MVP at the top

view details

Dean Jackson

commit sha 23b35a9f7b62e427670ef9eef45c7c02f6a036f3

Ignore node_modules in tools directory

view details

Jeff Gilbert

commit sha 68331a1dc2074e1965baee3cf7dabe90f1144043

Add GPUShaderModuleDescriptor/sourceMap. (#645) * Add GPUShaderModuleDescriptor/sourceMap. * Dictionary members are optional unless required, s/non-null/defined/. * `sourceMap` MAY be a source-map-v3 format, but it's not required.

view details

Jeff Gilbert

commit sha 62536354e07d5d5449bd4fe83cefbb198fe0795e

Add pipeline shader module error enumeration. (#646) * Add pipeline shader module error enumeration. * Move compilationMessages to GPUShaderModule. * Add `GPUCompilationMessage.message` (oops). * Remove sourcemaps references for now, and return info iface that has seq<message>. * Apply suggestions from code review * IDL syntax fixes Co-authored-by: Kai Ninomiya <kainino@chromium.org>

view details

push time in 12 days

push eventhaoxli/cts

Li, Hao

commit sha c2ed70c593ccb5c61b6d16cf9c765c9a2f5f05e7

Index format tests Add test cases for testing drawIndexed, IndexFormat, Primitive restart

view details

push time in 12 days

PR opened gpuweb/cts

Index format tests

Add test cases for testing drawIndexed, IndexFormat, Primitive restart

+255 -0

0 comment

1 changed file

pr created time in 12 days

create branch haoxli/cts

branch : index-format-tests

created branch time in 12 days

Pull request review commentgpuweb/cts

Index format tests

+export const description = `Index format tests.`;++import { TestGroup } from '../../../../../common/framework/test_group.js';+import { GPUTest } from '../../../../gpu_test.js';+import glslangModule from '../../../../util/glslang.js';++export class IndexFormatTest extends GPUTest {+  private glslang: any;+  private pipeline!: GPURenderPipeline;+  private colorAttachment!: GPUTexture;+  private result!: GPUBuffer;+  private swapChainFormat: GPUTextureFormat = 'bgra8unorm';++  async initResources(format: GPUIndexFormat): Promise<void> {+    // glslang module+    this.glslang = await glslangModule();++    // render pipeline+    this.pipeline = this.MakeRenderPipeline(format);++    const context = this.CreateRenderContext(true);

Thanks, I will use the approach from the glsl-dependent branch.

haoxli

comment created time in 12 days

Pull request review commentgpuweb/cts

Index format tests

+export const description = `Index format tests.`;++import { TestGroup } from '../../../../../common/framework/test_group.js';+import { GPUTest } from '../../../../gpu_test.js';+import glslangModule from '../../../../util/glslang.js';++export class IndexFormatTest extends GPUTest {+  private glslang: any;+  private pipeline!: GPURenderPipeline;+  private colorAttachment!: GPUTexture;+  private result!: GPUBuffer;+  private swapChainFormat: GPUTextureFormat = 'bgra8unorm';++  async initResources(format: GPUIndexFormat): Promise<void> {+    // glslang module+    this.glslang = await glslangModule();++    // render pipeline+    this.pipeline = this.MakeRenderPipeline(format);++    const context = this.CreateRenderContext(true);+    if (context !== null) {+      const swapChain: GPUSwapChain = context.configureSwapChain({+        device: this.device,+        format: this.swapChainFormat,+        usage: GPUTextureUsage.COPY_SRC | GPUTextureUsage.OUTPUT_ATTACHMENT,+      });+      this.colorAttachment = swapChain.getCurrentTexture();+    }++    // result buffer+    this.result = this.device.createBuffer({+      size: 4,+      usage: GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,+    });+  }++  CreateRenderContext(display: boolean): any {+    const canvas = document.createElement('canvas');+    canvas.width = 100;+    canvas.height = 100;+    // Display rendering on test page+    if (display) {+      const container = document.getElementById('canvasContainer');+      if (container) {+        container.innerHTML = '';+        container.append(canvas);+      }+    }+    return canvas.getContext('gpupresent');+  }++  MakeRenderPipeline(format: GPUIndexFormat): GPURenderPipeline {+    const vertexShaderGLSL = `#version 450+    layout(location = 0) in vec4 pos;+    void main() {+        gl_Position = pos;+    }`;++    const fragmentShaderGLSL = `#version 450+    layout(location = 0) out vec4 fragColor;+    void main() {+        fragColor = vec4(0.0, 1.0, 0.0, 1.0);+    }`;++    return this.device.createRenderPipeline({+      layout: this.device.createPipelineLayout({ bindGroupLayouts: [] }),++      vertexStage: {+        module: this.device.createShaderModule({+          code: this.glslang.compileGLSL(vertexShaderGLSL, 'vertex'),+        }),+        entryPoint: 'main',+      },+      fragmentStage: {+        module: this.device.createShaderModule({+          code: this.glslang.compileGLSL(fragmentShaderGLSL, 'fragment'),+        }),+        entryPoint: 'main',+      },++      primitiveTopology: 'triangle-strip',++      colorStates: [+        {+          format: this.swapChainFormat,+        },+      ],++      vertexState: {+        indexFormat: format,+        vertexBuffers: [+          {+            arrayStride: 4 * 4,+            stepMode: 'vertex',+            attributes: [+              {+                format: 'float4',+                offset: 0,+                shaderLocation: 0,+              },+            ],+          },+        ],+      },+    });+  }++  MakeBufferMapped(arrayBuffer: ArrayBuffer, usage: number): GPUBuffer {+    const [buffer, bufferMapping] = this.device.createBufferMapped({+      size: arrayBuffer.byteLength,+      usage,+    });++    if (arrayBuffer instanceof Float32Array) {

I tested it: it's true if arrayBuffer = new Float32Array().
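For reference, the behaviour being tested: a value constructed with new Float32Array() does satisfy the instanceof check in the snippet under review, while a plain ArrayBuffer does not, so the branch is taken when a typed array is passed in.

// Quick check of the branch discussed above.
const data = new Float32Array([0.0, 1.0, 2.0]);
console.log(data instanceof Float32Array);                // true
console.log(new ArrayBuffer(12) instanceof Float32Array); // false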

haoxli

comment created time in 12 days

fork haoxli/cts

WebGPU Conformance Test Suite

https://gpuweb.github.io/cts/

fork in 12 days

delete branch haoxli/cts

delete branch : index-format-tests

delete time in 12 days

push eventhaoxli/cts

Kai Ninomiya

commit sha 9cbb800a967708b0a72f252fd045d973f77755be

Make test cases hierarchical, and refactor/cleanup a lot in the process (#181) * update tests, approximately * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * unittests:loading:* now passes * passes 63 unittests, but not all are running... * typechecks!! * 58 unittests passing * reenable getStackTrace tests * wip * finally all tests and checks pass! * fixups * wip * replace all spaces with underscores in test names * wip * wip * wip * Dissolve TestLoader (now just use DefaultTestFileLoader) * tighten ParamSpec type * rename ParamSpec -> CaseParams * make kParamSeparator ";", now standalone runner is working! * add separate icon for leaf run buttons * file renames * remove user-select:all (not really needed anymore, triple click works fine now) * cleanup some tests, add tests for test queries * split interface for defining and running TestGroups * more cleanup, fix URL encoding stuff, and standalone HTML * fix readmes for parents of query * clean up unused ReadmeFile interface * dissolve id.ts * test_suite_listing.ts * split params_utils into param_utils + stringify_params * rename encodeURLSelectively again * add query.isMulti{File,Test,Case} * fix gen_wpt_cts_html (at least for the simple mode; advanced mode probably still broken * better run-one icon * standalone: always make the url bar show the complete url * revert tasks.json * fix number out of bounds * don't drop worker results, report input string on all query parse errors * update docs/terms.md * revisions * more revisions * make comparePublicParamsPaths order-agnostic (now allows queries against params out of order, but tree will not reorder itself accordingly) * revise compareQueries, part 1 * revise compareQueries, part 2: compareOneLevel * remove incorrect comment * undo fast-glob dependency * address Austin's comments * address comments * pass debug flag into test worker * typo fixes

view details

Kai Ninomiya

commit sha 0b380062012789052ed123b78422b87ab1675f54

Don't use variants in reftests (#185) Turns out WPT doesn't support test variants in reftests.

view details

Kai Ninomiya

commit sha ca1351128891b2da026d82edc412911256a3ec90

only checkForDuplicateCases during crawl(); slightly speed up comparePublicParamsPaths (#186) * only checkForDuplicateCases during crawl(); slightly speed up comparePublicParamsPaths * use Set * un-delete error message * expand tests, go back to n^2 (instead of Set) to pass tests * more testing, rename method * nit

view details

push time in 12 days

PR closed gpuweb/cts

Index format tests

Add test cases for testing drawIndexed, IndexFormat, Primitive restart

+320 -0

2 comments

5 changed files

haoxli

pr closed time in 12 days

pull request commentgpuweb/cts

Index format tests

OK, close this one.

haoxli

comment created time in 12 days

push eventhaoxli/cts

Li, Hao

commit sha 45b258a4e759c7981bb953c4190991dde10c6061

Index format tests Add test cases for testing drawIndexed, IndexFormat, Primitive restart

view details

push time in 12 days

push eventhaoxli/cts

Li, Hao

commit sha 89422e23b08223e82d08bc5a4edfafeb43770443

Index format tests Add test cases for testing drawIndexed, IndexFormat, Primitive restart

view details

push time in 12 days

PR opened gpuweb/cts

Index format tests

Add test cases for testing drawIndexed, IndexFormat, Primitive restart

+258 -0

0 comment

5 changed files

pr created time in 13 days

push eventhaoxli/cts

Li, Hao

commit sha c94495c7e395b492cfd4514873892172375831d1

Index format tests Add test cases for testing drawIndexed, IndexFormat, Primitive restart

view details

push time in 13 days

create branch haoxli/cts

branch : index-format-tests

created branch time in 13 days

fork haoxli/cts

WebGPU Conformance Test Suite

https://gpuweb.github.io/cts/

fork in 13 days

push eventhaoxli/gpuweb

Li Hao

commit sha f7165d91de61a8895e3683b89d9cfb49b1cabae7

Add extension requirements in queries

view details

push time in 17 days

push eventhaoxli/gpuweb

Li Hao

commit sha 21c786571438d5288e4766ae47a05d525d900a38

Add writeTimestamp in GPUCommandEncoder

view details

push time in 18 days

Pull request review commentgpuweb/gpuweb

Query API: Timestamp Query

 interface mixin GPUProgrammablePassEncoder {
     void beginPipelineStatisticsQuery(GPUQuerySet querySet, GPUSize32 queryIndex);
     void endPipelineStatisticsQuery(GPUQuerySet querySet, GPUSize32 queryIndex);
+    void writeTimestamp(GPUQuerySet querySet, GPUSize32 queryIndex);

If so, I think we can move writeTimestamp to GPUCommandEncoder and also call encoder.writeTimestamp inside a render or compute pass. For the Metal backend, sampleCountersInBuffer is on the blit/render/compute encoders, so we just need to know which encoder is currently active in the command encoder, then call that encoder's API.

haoxli

comment created time in 18 days

PR opened gpuweb/gpuweb

Query API: Timestamp Query

Add extension and entrypoints for timestamp query according to #614.

+10 -2

0 comment

1 changed file

pr created time in 19 days

push eventhaoxli/gpuweb

Li, Hao

commit sha e995f88362e7958122beb186d602e49c4c185c9f

Query API: Timestamp Query

view details

push time in 19 days

push eventhaoxli/gpuweb

Li, Hao

commit sha b9056ea89f15dddab00218d9303a2cd12fb0b3ca

Query API: Timestamp Query

view details

push time in 19 days

create branch haoxli/gpuweb

branch : timestamp-query

created branch time in 19 days

delete branch haoxli/gpuweb

delete branch : timestamp-query

delete time in 19 days

push eventhaoxli/gpuweb

Kai Ninomiya

commit sha d72ac60be08cda3477089ab1d96244a9145e45c6

Merge GPUTextureCopyView.arrayLayer into .origin.z (#730) * Merge GPUTextureCopyView.arrayLayer into .origin.z Similar to what was done for GPUTextureDescriptor in #613, I just missed it. * update more text as needed * address comments

view details

Li, Hao

commit sha 88808e2c9bb5f86d82fae49c0c59326b90134f9f

Query API: Timestamp Query

view details

push time in 19 days

push eventhaoxli/gpuweb

Kai Ninomiya

commit sha 5b22c22cc4fcc7876c0479ad1270a1026ab60d53

Merge GPUTextureCopyView.arrayLayer into .origin.z (#730) * Merge GPUTextureCopyView.arrayLayer into .origin.z Similar to what was done for GPUTextureDescriptor in #613, I just missed it. * update more text as needed * address comments

view details

push time in 19 days

create branch haoxli/gpuweb

branch : timestamp-query

created branch time in 20 days

push eventhaoxli/gpuweb

Jeff Gilbert

commit sha 77d3936bacf354b89bd8a2a50a3ad8596ec9bfed

Move builtins and intrinsics out of the grammar. (#655) * Move builtins and intrinsics out of the grammar. * Add valid decoration values per key.

view details

Myles C. Maxfield

commit sha 08f7552562fb14e0af9397bb1d148448ce32430b

Remove regardless (#690) We've already forfeited perfect CFG round-trippability of SPIR-V because of the lack of phi operations. So, now we're in a world of degrees, where we have to decide whether a concept is important enough to be required to be exactly faithfully preserved, or unimportant enough to allow the concept to be modified by the round-trip. The benefit of `regardless` is unclear, as it has yet to be demonstrated. This concept doesn't exist in GLSL, HLSL, or MSL, so it is at least conceivable that it is unnecessary. On the other hand, it adds an unfamiliar keyword/block to the language, which makes the language more difficult to read/write. Implementations will have to implement it, test it thoroughly, figure out how to represent it in the platform shading languages, and maintain it over time. For `regardless` specifically, the current cost seems to outweigh the benefit, at least until the benefit is demonstrated. This PR removes `regardless` until that happens. closes https://github.com/gpuweb/gpuweb/issues/580 Co-authored-by: Myles C. Maxfield <mmaxfield@apple.com>

view details

Jiawei Shao

commit sha 03da857a87c78a951208b3e1ce2fe993efa2231e

Add validation rules for copyBufferToTexture (#648) * Add validation rules for copyBufferToTexture * Small fix * Address more reviewer's suggestions * Address more comments from the reviewers * Fix several typos

view details

Corentin Wallez

commit sha 85b0a02b7707cd45751267668e0faf555080fe40

Rename arguments to resolveQuery (#683)

view details

Josh Groves

commit sha b94bb52fc4ef7511e8ef1496ed5e988e7c229d63

Fix typos (#698)

view details

Greg Roth

commit sha 7fe2350ed48571fab717db29f900192372b39b5e

Add approved text on checkin process (#703) This was discussed in the WGSL meeting and also on the broader webgpu mailing list. All feedback was positive and I was asked to upload it here.

view details

David Neto

commit sha 5f9beea7d1c9ea53ea650a09ac9eca20b83975f8

[wgsl] Define the void type (#712)

view details

Jiawei Shao

commit sha adfae01c6a6d83d5bf547f23fff23e1e90ac5b56

Add validation rules for copyTextureToBuffer (#694) * Add validation rules for copyTextureToBuffer This PR also fixes the typo by using "&plus;" instead of "&add;". * Put the common rules of B2T and T2B copies together * Address more reviewer's comments

view details

David Neto

commit sha d05d63c37f34c32f95e084d457b6a9a253e59a1b

Describe the return statement (#710) * Describe the return statement Also, in the grammar, the expression is optional. * Updates from review - Make a return_stmt grammar element - Combine the return value paragraphs into one. * Remove duplicate return_stmt grammar rule

view details

Dean Jackson

commit sha 09cc0adf0100fd3af3671bb768226b81b9b33e73

Add a script to gather the WGSL meeting agenda

view details

michaldybizbanski

commit sha 3c7143ad6ad7e7f5d172f73e89228b3ad7e10584

Update BufferOperations examples with .defaultQueue #490 (#742)

view details

Dzmitry Malyshau

commit sha bc902fb88faba8272d457d7f435585892917887d

Document pipelines (#724) * Document pipeline layouts * Document pipelines * Vertex buffer and attribute validation * Detail the rules of bind group equivalence * Refactor algorithm definitions * Indentation fixes

view details

Jiawei Shao

commit sha be261673a891e870cef52fbf1bdc75092337d6f8

Add validation rules on copyTextureToTexture (#725) * Add validation rules on copyTextureToTexture This patch adds the validation rules on copyTextureToTexture: - Updated "GPUTextureCopyView Valid Usage" and "Valid Texture Copy Range" to reuse them in the validations of copyTextureToTexture. - The format of the source texture must be the same as the one of the destination texture as is required on Metal. - We can only copy the whole subresource when the textures are in depth-stencil formats or the textures are multisampled as is required by D3D12. - The source and destination subresources must be different as is required by D3D12. * Address reviewer's comments * Define "textureCopyView subresource size" in an algorithm * Address more reviewer's comments

view details

David Neto

commit sha 62265731fecd7c1105980764593b53faf92b32e8

[wgsl] Describe loop, break, continue, continuing (#711) Also better explain *why* the loop is structured the way it is. SPIR-V did it a certain way for underlying reasons, and we explain some of that.

view details

David Neto

commit sha 25d37caacc28055765a7ae69a0a314be5c9772d2

[wgsl] Describe switch statements (#713) * Multiple literal values in case conditions * Case values must have same type as selector * Ban fallthrough in the last case body of a switch * If no selector matches and no default, then skip body Fixes #630 Fixes #714

view details

Myles C. Maxfield

commit sha 568b98f9e064f4476af71483e79f336d3b54e5d2

Remove premerge (#688) We've already forfeited perfect CFG round-trippability of SPIR-V because of the lack of phi operations. So, now we're in a world of degrees, where we have to decide whether a concept is important enough to be required to be exactly faithfully preserved, or unimportant enough to allow the concept to be modified by the round-trip. The benefit of `premerge` is unclear, as it has yet to be demonstrated. This concept doesn't exist in GLSL, HLSL, or MSL, so it is at least conceivable that it is unnecessary. On the other hand, it adds an unfamiliar keyword/block to the language, which makes the language more difficult to read/write. Implementations will have to implement it, test it thoroughly, figure out how to represent it in the platform shading languages, and maintain it over time. For `premerge` specifically, the current cost seems to outweigh the benefit, at least until the benefit is demonstrated. This PR removes `premerge` until that happens. closes https://github.com/gpuweb/gpuweb/issues/660

view details

hannni

commit sha d4904bf201cc22879bed0f8675d50a00b4399fac

Fix Typo (#756)

view details

Dzmitry Malyshau

commit sha 498c01a68de10be44f02b4d371e5835d53a1e5a8

Allow postfixes in the assignment lhs (#722) * Allow postfixes in the assignment lhs * Change assignment to use singular_expression * Fix description of the assignment

view details

Doug Moen

commit sha 960a5724b060d6975ed2aa831c2c20a84a6c2de6

fix issue #717: BRACE and BRACKET tokens misnamed (#718) Fixes #717

view details

06wj

commit sha 3493b8af7936de4ae2e0eae3fef2b324507cd938

Fix "VertexState" Typo (#759)

view details

push time in 20 days

pull request commentgpuweb/gpuweb

Query API: Pipeline Statistics Query

@litherum, @RafaelCintron Do you have any comments about pipeline statistics queries?

haoxli

comment created time in 23 days

push eventhaoxli/gpuweb

Li Hao

commit sha ea1cb3bb3b057d52dc628f9a11423273263c1043

Fix nits

view details

push time in a month

push eventhaoxli/gpuweb

Li Hao

commit sha c51dbedd3fd6b76191e01a93a76238517f72b833

Use array of enum insead of bitfield in pipeline statistics

view details

push time in a month

issue commentgpuweb/gpuweb

Investigation: Query API

Regarding the discussion of the Pipeline Statistics Query in the 2020-04-27 meeting, have we decided to use an array of enums instead of the bitfield? And the results of the pipeline statistics query should be in the order of the enums specified in the array, right?

haoxli

comment created time in a month

Pull request review commentgpuweb/gpuweb

Query API: Pipeline Statistics Query

 GPUQuerySet includes GPUObjectBase;

 dictionary GPUQuerySetDescriptor : GPUObjectDescriptorBase {
     required GPUQueryType type;
     required GPUSize32 count;
+    GPUPipelineStatisticFlags pipelineStatistics;
 };
 </script>

+  * {{GPUQuerySetDescriptor/pipelineStatistics}}: a bitmask of {{GPUPipelineStatisticFlags}} specifying which pipeline statistics will be returned in the new query set.
+    |pipelineStatistics| is ignored if type is not {{GPUQueryType/pipeline-statistics}}.
+    <div class=validusage dfn-for=GPUQuerySetDescriptor.pipelineStatistics>
+      <dfn abstract-op>Valid Usage</dfn>
+        1. If {{GPUExtensionName/pipeline-statistics-query}} is not available, |type| must not be {{GPUQueryType/pipeline-statistics}}.
+        2. If |type| is {{GPUQueryType/pipeline-statistics}}, |pipelineStatistics| must be a valid combination of {{GPUPipelineStatisticFlags}} values.
+    </div>
+
 ## QueryType ## {#querytype}

 <script type=idl>
 enum GPUQueryType {
-    "occlusion"
+    "occlusion",
+    "pipeline-statistics"
+};
+</script>
+
+## Pipeline Statistics Query ## {#pipeline-statistics}
+
+<script type=idl>
+typedef [EnforceRange] unsigned long GPUPipelineStatisticFlags;
+interface GPUPipelineStatisticBit {
+    const GPUPipelineStatisticFlags VERTEX_SHADER_INVOCATIONS   = 0x01;
+    const GPUPipelineStatisticFlags CLIPPER_INVOCATIONS         = 0x02;
+    const GPUPipelineStatisticFlags CLIPPER_PRIMITIVES_OUT      = 0x04;
+    const GPUPipelineStatisticFlags FRAGMENT_SHADER_INVOCATIONS = 0x08;
+    const GPUPipelineStatisticFlags COMPUTE_SHADER_INVOCATIONS  = 0x10;
 };
 </script>

+When resolving the results of a pipeline statistics query, the number of pipeline statistics written into GPU buffer depends on the number of enabled {{GPUPipelineStatisticBit}} in {{GPUQuerySetDescriptor/pipelineStatistics}}.
+
+Each result is written into unsigned 64-bit integer and in the order of the lowest-valued member of the {{GPUPipelineStatisticBit}} to the highest.

This behavior follows Vulkan's rules and needs a fixup on D3D12 and Metal: they always query all of the pipeline statistics and resolve the results into a buffer according to the layout of D3D12_QUERY_DATA_PIPELINE_STATISTICS or MTLCounterResultStatistic, so we need to pick the values corresponding to the set bits and write them to the buffer in order.
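A minimal sketch of that fixup (not actual implementation or spec code): read the full per-backend statistics structure and keep only the values whose bits are set, lowest-valued bit first. The FullStatistics shape loosely mirrors D3D12_QUERY_DATA_PIPELINE_STATISTICS, and the bit values follow the GPUPipelineStatisticBit proposal above.

// Bit values from the proposed GPUPipelineStatisticBit.
const VERTEX_SHADER_INVOCATIONS   = 0x01;
const CLIPPER_INVOCATIONS         = 0x02;
const CLIPPER_PRIMITIVES_OUT      = 0x04;
const FRAGMENT_SHADER_INVOCATIONS = 0x08;
const COMPUTE_SHADER_INVOCATIONS  = 0x10;

// Roughly what a backend returns when it always queries everything.
interface FullStatistics {
  vertexShaderInvocations: bigint;
  clipperInvocations: bigint;
  clipperPrimitivesOut: bigint;
  fragmentShaderInvocations: bigint;
  computeShaderInvocations: bigint;
}

// Pick the values corresponding to the set bits, from the lowest-valued bit to the highest.
function selectStatistics(full: FullStatistics, flags: number): bigint[] {
  const ordered: Array<[number, bigint]> = [
    [VERTEX_SHADER_INVOCATIONS, full.vertexShaderInvocations],
    [CLIPPER_INVOCATIONS, full.clipperInvocations],
    [CLIPPER_PRIMITIVES_OUT, full.clipperPrimitivesOut],
    [FRAGMENT_SHADER_INVOCATIONS, full.fragmentShaderInvocations],
    [COMPUTE_SHADER_INVOCATIONS, full.computeShaderInvocations],
  ];
  return ordered.filter(([bit]) => (flags & bit) !== 0).map(([, value]) => value);
}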

haoxli

comment created time in a month

push eventhaoxli/gpuweb

Li Hao

commit sha 8566e69f11ab0930b61ec4ee5a458551e50086d4

Add GPUPipelineStatisticBit and more instructions

view details

push time in a month

pull request commentgpuweb/gpuweb

Query API: Pipeline Statistics Query

For example if I ask a pipeline statistic query with VERTEX_SHADER_INVOCATIONS | FRAGMENT_SHADER_INVOCATIONS, which of these formats will be the one I can use to read data in shaders from the resolve buffer?

The results in the resolve buffer will be stored in the first format:

struct Statistics {
    vertexShaderInvocations : u64;
    fragmentShaderInvocations : u64;
}

Only the pipeline statistics for the specified flags will be written to the resolve buffer, and the order of the results is not affected by the order of the flags; they will always be written in this order:

struct Statistics {
    vertexShaderInvocations : u64;
    clipperInvocations : u64; 
    clipperPrimitives : u64;
    fragmentShaderInvocations : u64;
    computeShaderInvocations : u64;
}

which is the same as Vulkan. If we set GPUQuerySetDescriptor.pipelineStatistics = 0, only a zero value is written to the buffer.
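A small readback sketch for the two-flag example above, assuming the query has already been resolved and copied into a mappable buffer (using mapAsync here for brevity):

async function readStatistics(readbackBuffer: GPUBuffer): Promise<void> {
  await readbackBuffer.mapAsync(GPUMapMode.READ);
  const values = new BigUint64Array(readbackBuffer.getMappedRange());
  // Per the fixed layout above, index 0 is vertexShaderInvocations and
  // index 1 is fragmentShaderInvocations, regardless of the flag order.
  console.log('vertex shader invocations:', values[0]);
  console.log('fragment shader invocations:', values[1]);
  readbackBuffer.unmap();
}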

Do I need to add these notes to the PR now?

haoxli

comment created time in a month

issue commentgpuweb/gpuweb

Investigation: Query API

While investigating using a compute shader to process the query results, we hit an issue: support for unsigned 64-bit integers is not a core feature:

But the type of all query results is uint64 (only Vulkan supports reading the results as uint32). For occlusion and pipeline statistics queries, we just need to read the results from a buffer and write them back to a buffer after checking, so we can simulate a 64-bit integer with two 32-bit integers. But the timestamp query needs subtraction, multiplication, and division operations to get the timestamp delta in nanoseconds on D3D12, which is too complicated to simulate in a shader.
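For illustration, a sketch of the two-32-bit-word simulation mentioned above, written in TypeScript but mirroring the arithmetic a shader without u64 support would have to perform (addition and comparison are manageable; the multiplication and division needed for D3D12 timestamps are the painful part):

type U64 = { lo: number; hi: number }; // both words treated as u32

function addU64(a: U64, b: U64): U64 {
  const lo = (a.lo + b.lo) >>> 0;         // low word, wrapped to 32 bits
  const carry = lo < a.lo ? 1 : 0;        // carry out of the low word
  const hi = (a.hi + b.hi + carry) >>> 0; // high word plus carry
  return { lo, hi };
}

function lessThanU64(a: U64, b: U64): boolean {
  return a.hi !== b.hi ? a.hi < b.hi : a.lo < b.lo;
}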

Is there any specific use case for reading timestamp results in a shader? Otherwise I still propose returning the results of the timestamp query in CPU memory.

haoxli

comment created time in a month

pull request commentgpuweb/gpuweb

Query API: Pipeline Statistics Query

Vulkan allows users to decide which pipeline statistics to query by setting VkQueryPoolCreateInfo::pipelineStatistics when creating the query pool (D3D12 and Metal always query all pipeline statistics data), so maybe we can add a GPUPipelineStatisticFlags to GPUQuerySetDescriptor, such as:

dictionary GPUQuerySetDescriptor : GPUObjectDescriptorBase {
    required GPUQueryType type;
    required GPUSize32 count;
    GPUPipelineStatisticFlags pipelineStatistics;
};

typedef [EnforceRange] unsigned long GPUPipelineStatisticFlags;
interface GPUPipelineStatisticBit {
    const GPUPipelineStatisticFlags VERTEX_SHADER_INVOCATIONS   = 0x01;
    const GPUPipelineStatisticFlags CLIPPER_INVOCATIONS         = 0x02;
    const GPUPipelineStatisticFlags CLIPPER_PRIMITIVES_OUT      = 0x04;
    const GPUPipelineStatisticFlags FRAGMENT_SHADER_INVOCATIONS = 0x08;
    const GPUPipelineStatisticFlags COMPUTE_SHADER_INVOCATIONS  = 0x10;
};
haoxli

comment created time in a month

pull request commentgpuweb/gpuweb

Query API: Pipeline Statistics Query

The results returned by a pipeline statistics query will be the number of vertex shader invocations, the number of primitives sent to the clip stage, the number of primitives output by the clip stage, the number of fragment shader invocations, and the number of compute shader invocations, in that order.

Because all query data stored in the QuerySet will be resolved into a buffer, we just copy the common parts supported by all backends to the destination buffer; if we want to expand the results, we just add the corresponding copy items.

As for usage scenarios, we can use these pipeline statistics results to measure the relative complexity of different parts of an application, which can help find bottlenecks during performance tuning.

haoxli

comment created time in 2 months

PR opened gpuweb/gpuweb

Query API: Pipeline Statistics Query

Add extension and entrypoints for pipeline statistics query according to #614.

+7 -2

0 comment

1 changed file

pr created time in 2 months

create branch haoxli/gpuweb

branch : pipeline-statistics-idl

created branch time in 2 months

push eventhaoxli/gpuweb

Corentin Wallez

commit sha 1abc267071dddc7537edaf5ed02b13f3070e2325

Add default for arguments of draw and drawIndirect. (#632) Co-authored-by: Kai Ninomiya <kainino@chromium.org>

view details

Hao Li

commit sha 02454410d1ae503f5dd967a0ee493d6d96067085

Query API: Add Occlusion Query (#656) * Query API: Add Occlusion Query Add the definition of Occlusion Query and its requirements. * Define QuerySet, QuerySetDescriptor and QueryType. * Add query set in render pass descriptor for occlusion query. * Add begin/endOcclusionQuery on render pass encoder. * Resolve query result from query set on command encoder. * Add a new buffer usage for resolving query result.

view details

Myles C. Maxfield

commit sha fd27b887f65e53cbd529c7af47ee021b8aef9147

[WGSL] Add a blurb that describes variable loads and stores Closes https://github.com/gpuweb/gpuweb/issues/622

view details

push time in 2 months

Pull request review commentgpuweb/gpuweb

Rename arguments to resolveQuery

 interface GPUCommandEncoder {
     void resolveQuerySet(
         GPUQuerySet querySet,
-        GPUSize32 queryFirstIndex,
+        GPUSize32 firstQuery,

If we want the parameter's name to convey its meaning, startIndex better represents what we want to pass. Otherwise, I think firstQuery or firstQueryIndex also makes sense; we will explain in the spec that it specifies the index of the first query to resolve.

Kangz

comment created time in 2 months

push eventhaoxli/gpuweb

Hao Li

commit sha b278bbf1ad473c26f6280c3fd83cc53f32fb7502

Remove flag of precise occlusion Co-Authored-By: Kai Ninomiya <kainino1@gmail.com>

view details

push time in 2 months

push eventhaoxli/gpuweb

Hao Li

commit sha ebcf25a220568b3fcf903cddbc13b7c06d24bc94

Remove precise occlusion extension Co-Authored-By: Kai Ninomiya <kainino1@gmail.com>

view details

push time in 2 months

push eventhaoxli/gpuweb

Hao Li

commit sha e49cefb0e78ce4e434db7232e85d01a648bfa9e2

Remove other query types Co-Authored-By: Kai Ninomiya <kainino1@gmail.com>

view details

push time in 2 months

issue commentgpuweb/gpuweb

Investigation: Query API

I also checked, and it seems only Vulkan supports specifying a pipeline stage in vkCmdWriteTimestamp.

Add a mechanism for getting a GPUCommandBuffer start and end time that's unrelated to queries.

As mentioned above, on Metal, GPUStartTime and GPUEndTime are internal attributes of the command buffer which can always be read once the commands have completed. On D3D12 and Vulkan, we need to use timestamp queries to get the timestamps at the beginning and end of the command buffer. Maybe we also need a flag on encoders to turn the timing mechanism on; otherwise we would need to enable the mechanism for all command buffers.
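A sketch of what the D3D12/Vulkan-style path could look like from the API side, assuming the writeTimestamp-on-GPUCommandEncoder and resolveQuerySet entry points discussed in these PRs (not the final API):

function encodeWithTiming(device: GPUDevice, encode: (e: GPUCommandEncoder) => void) {
  const tsQuerySet = device.createQuerySet({ type: 'timestamp', count: 2 });
  const resolveBuffer = device.createBuffer({
    size: 2 * 8,
    usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC,
  });

  const encoder = device.createCommandEncoder();
  encoder.writeTimestamp(tsQuerySet, 0);  // "start" of the command buffer
  encode(encoder);                        // user-recorded passes
  encoder.writeTimestamp(tsQuerySet, 1);  // "end" of the command buffer
  encoder.resolveQuerySet(tsQuerySet, 0, 2, resolveBuffer, 0);
  return { commandBuffer: encoder.finish(), resolveBuffer };
}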

haoxli

comment created time in 2 months

pull request commentgpuweb/gpuweb

Add Query API

I will split this PR into smaller ones to make it easier to merge. First, for the occlusion query: https://github.com/gpuweb/gpuweb/pull/656

haoxli

comment created time in 2 months

Pull request review commentgpuweb/gpuweb

Add Query API

 dictionary GPUFenceDescriptor : GPUObjectDescriptorBase {
 </script>

+Queries {#queries}
+================
+
+## QuerySet ## {#queryset}
+
+<script type=idl>
+interface GPUQuerySet {
+    GPUQueryType getQueryType();
+    GPUSize32 getQueryCount();
+
+    void destroy();

I'm not sure if I understood correctly: a QuerySet should be allocated in contiguous GPU memory, like a buffer, for storing the query data, so I think it needs the destroy() method.

haoxli

comment created time in 2 months

Pull request review commentgpuweb/gpuweb

Add Query API

 dictionary GPUFenceDescriptor : GPUObjectDescriptorBase {
 </script>

+Queries {#queries}
+================
+
+## QuerySet ## {#queryset}
+
+<script type=idl>
+interface GPUQuerySet {
+    GPUQueryType getQueryType();
+    GPUSize32 getQueryCount();

I've removed the getters and will add them as internal attributes in the spec. Initially I wanted to use this method to get the size of the query data when resolving, instead of passing in an actual number.

haoxli

comment created time in 2 months

Pull request review commentgpuweb/gpuweb

Add Query API

 dictionary GPUFenceDescriptor : GPUObjectDescriptorBase {
 </script>

+Queries {#queries}
+================
+
+## QuerySet ## {#queryset}
+
+<script type=idl>
+interface GPUQuerySet {
+    GPUQueryType getQueryType();
+    GPUSize32 getQueryCount();
+
+    void destroy();
+};
+GPUQuerySet includes GPUObjectBase;
+</script>
+
+### Creation ### {#queryset-creation}
+
+<script type=idl>
+dictionary GPUQuerySetDescriptor : GPUObjectDescriptorBase {
+    required GPUQueryType type;
+    required GPUSize32 count;
+};
+</script>
+
+## QueryType ## {#querytype}
+
+<script type=idl>
+enum GPUQueryType {
+    "occlusion",
+    "pipelinestatistics",

Done

haoxli

comment created time in 2 months

Pull request review commentgpuweb/gpuweb

Add Query API

 interface GPURenderPassEncoder {
     void setBlendColor(GPUColor color);
     void setStencilReference(GPUStencilValue reference);
+    void beginOcclusionQuery(GPUSize32 queryIndex, boolean preciseOcclusion);

Done

haoxli

comment created time in 2 months

push eventhaoxli/gpuweb

Li, Hao

commit sha 2092f73c0eefab364b8562807efb63c6c1f96080

Query API: Add Occlusion Query Add the definition of Occlusion Query and its requirements. * Add precise occlusion query as extension. * Define QuerySet, QuerySetDescriptor and QueryType. * Add query set in render pass descriptor for occlusion query. * Add begin/endOcclusionQuery on render pass encoder. * Resolve query result from query set on command encoder. * Add a new buffer usage for resolving query result.

view details

push time in 2 months

PR opened gpuweb/gpuweb

Query API: Add Occlusion Query

Add the definition of Occlusion Query and its requirements

  • Add precise occlusion query as extension.
  • Define QuerySet, QuerySetDescriptor and QueryType.
  • Add query set in render pass descriptor for occlusion query.
  • Add begin/endOcclusionQuery on render pass encoder.
  • Resolve query result from query set on command encoder.
  • Add a new buffer usage for resolving query result.
+57 -10

0 comment

1 changed file

pr created time in 2 months

create branch haoxli/gpuweb

branch : query-api-occlusion

created branch time in 2 months

push eventhaoxli/gpuweb

dan sinclair

commit sha 7e08b4bb02f900f1294841f7a24b516d40ebbee6

[wgsl] Fixup VertexIdx SPIR-V example. (#651) The VertexIdx was incorrectly set to convert to the VertexId instead of the VertexIndex SPIR-V decoration.

view details

push time in 2 months

push eventhaoxli/gpuweb

Dzmitry Malyshau

commit sha fc49fe6dccc3245c92aa9b70516afe5ec21d4811

Rename bindings to entries for consistency (#611)

view details

dan sinclair

commit sha a7b43af7078c353349018f08cdc880c2537d3cfa

Remove push constant from WGSL. (#615) It was decided that push constants will not be in WebGPU at this point. Remove references from the WGSL spec. Fixes #612

view details

Dzmitry Malyshau

commit sha d097ec095b4ecaeed84d5116403fb43a93bc5909

[wgsl] intrinsic and derivative expression (#604) Fixes #603

view details

Kai Ninomiya

commit sha 7f4b88a205760e1d1e1160a383e40f6802412c00

Spec bytesPerRow and rowsPerImage (#608) These have been renamed from rowPitch and imageHeight. Also touches some surrounding material/definitions.

view details

Kai Ninomiya

commit sha 7531dda46a6fa81b37d804a0c1486d095426aeea

Merge arrayLayerCount into size.depth (#613)

view details

Kai Ninomiya

commit sha 041fd0212f03f6f789cc0b45c45215612f7ae45c

Synchronization section: editing nits (#607)

view details

Dzmitry Malyshau

commit sha 0a48816412b5d08a5fb8b89005e019165a1a2c63

Reorder fields of GPUBindGroupLayoutEntry (#618)

view details

Mehmet Oguz Derin

commit sha 2e86cff113ef035499df95b39b33bef1439c4f5a

Update index.bs (#628) Fix loop example

view details

Jeff Gilbert

commit sha 2ce1971a4617cff5287a472763cfd4cd261af1cf

Bump python to 3.7 for Travis. (#633) * Bump python to 3.7 for Travis. Bikeshed now requires 3.7. * fix extract-idl.py for python3 Co-authored-by: Kai Ninomiya <kainino@chromium.org>

view details

Jiawei Shao

commit sha 29138ee8aeaee31716d29f3c365e545ad227cb86

Add definitions about copy commands with textures (#623) * Define multiple concepts about copy commands with textures This patch adds several definitions that are required in the validation rules of copy commands with textures. - the internal slots of a GPUTexture - texel block - texel block size - texel block width - texel block height - the physical size of a texture subresource * Address reviewer's feedbacks * Format the example in the "physical size" section * Address more comments from reviewers

view details

Jeff Gilbert

commit sha c4b3bca57362b751d0ad65c98b857767e26cbbac

Spec how canvas compositing happens with SwapChains. (#627) * Spec how canvas compositing happens with SwapChains. * Update spec/index.bs Co-Authored-By: Kai Ninomiya <kainino1@gmail.com> Co-authored-by: Kai Ninomiya <kainino1@gmail.com>

view details

Mehmet Oguz Derin

commit sha 78fef8e798604d2e96fd6a78184f192d828c8646

Match VERTEX declaration to the rest of the definitions #634 Matches VERTEX declaration to the rest of the definitions

view details

Mehmet Oguz Derin

commit sha c1252857d9e3eec2c6ac582c26617db0eed40851

Convert SEMICLON to SEMICOLON #635

view details

Dzmitry Malyshau

commit sha 2c7f2621916e347f2dfa8a4fc651b5787218404d

Change threading uniforms to be uint (#636)

view details

Mehmet Oguz Derin

commit sha 22e38b0af2535715e0a8691b7d55935f669746b4

Conform for loop example to the statements syntax #638 As per the definition of statements, the statement given inside the loop needs to be terminated by a semicolon. This pull request aims to match example to the definition.

view details

Jeff Gilbert

commit sha 5048a91354dcfb80df8d136b4c6de13297ff9c1c

Convert 0xa0 to space in wgsl/index.bs. (#641) It looks like this was the only non-ASCII, at least.

view details

Kai Ninomiya

commit sha d00a99c5382f13405ee8e5465654aaf738a6b55e

Minor toplevel copyediting (#637)

view details

push time in 2 months

issue commentgpuweb/gpuweb

Investigation: Query API

iOS has had GPUStartTime and GPUEndTime since iOS 10.3. I think we should modify the existing timestamp queries to be able to be implemented via these methods so WebGPU code would work out-of-the-box on iOS.

Maybe the timestamp query extension could be unsupported on iOS, or a different one designed to reflect the timestamp on the command buffer?

GPUStartTime and GPUEndTime are very different from a timestamp query. They are command buffer properties, do not need those query operations, and are retrieved directly from CPU memory when the commands are completed. So it's better to have a separate interface on the command buffer to get the timestamps. On Metal (iOS and macOS), they are queried in the addCompletedHandler: callback, or after a call to the waitUntilCompleted method. On D3D12 and Vulkan, we can insert timestamp queries at the beginning and end of the command encoder and read the timestamps back from the resolved buffer when the commands are completed (after a call to ID3D12Fence or vkQueueWaitIdle).

haoxli

comment created time in 2 months

issue commentgpuweb/gpuweb

Investigation: Query API

This would also mean timestamp queries could have a single writeTimestamp method instead of begin/end. The proposal explains that we could return time deltas, but imho the application can do a subtraction themselves, and if we really need to, we normalize the timestamps so they don't leak device-specific information.

In addition to eliminating the difference in timestamps across platforms, another reason to add begin/end for timestamps is to make the three types of queries share the same set of interfaces as much as possible. We thought about using a single method like writeTimestamp for the timestamp query; this has the advantage that the user can calculate the delta between any two timestamps. We can normalize the results by storing the difference of each timestamp from the first one, for example (a sketch of this normalization follows the list below):

| Actual timestamps | 10000 | 10002 | 10005 | 9999 | 10003 |
| --- | --- | --- | --- | --- | --- |
| Resolved into buffer | 0 | 2 | 5 | 0 | 4 |

If a timestamp is reset, we will store it as 0, and the next timestamp will be computed based on the reset one.

But there are also some problems, such as:

  1. We need to investigate whether this can be achieved with a compute shader.
  2. Users need to make sure that there is no zero value between the two timestamps when calculating a delta. For example, if we calculate the delta between the second and fifth values in the table above, the result is invalid even though it is positive.
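A sketch of that normalization (illustrative only): deltas are taken from the first timestamp, and a timestamp that goes backwards is treated as a reset and stored as 0, reproducing the table above.

function normalizeTimestamps(raw: bigint[]): bigint[] {
  const resolved: bigint[] = [];
  let base: bigint | undefined;
  for (const t of raw) {
    if (base === undefined || t < base) {
      base = t;           // first timestamp, or the device counter was reset
      resolved.push(0n);
    } else {
      resolved.push(t - base);
    }
  }
  return resolved;
}

// normalizeTimestamps([10000n, 10002n, 10005n, 9999n, 10003n]) -> [0n, 2n, 5n, 0n, 4n]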
haoxli

comment created time in 2 months

PR opened gpuweb/gpuweb

Add Query API

Add the definition of Query API and its requirements according to gpuweb#614.

  • Add extensions.
  • Define QuerySet, QuerySetDescriptor and QueryType.
  • Create QuerySet on GPU device.
  • Set query set in render pass descriptor for occlusion query.
  • Add separated begin/end method for occlusion query on render pass encoder.
  • Add begin/end method for pipeline statistics and timestamp queries on render pass encoder and compute pass encoder.
  • Resolve query result from query set on command encoder.
  • Add a new buffer usage for resolving query result.
+68 -10

0 comment

1 changed file

pr created time in 2 months

push eventhaoxli/gpuweb

Li, Hao

commit sha 8c5ead9f2bb91c7e930d45fc09bed092bb80c7db

Add Query API Add the definition of Query API and its requirements according to gpuweb#614. * Add extensions. * Define QuerySet, QuerySetDescriptor and QueryType. * Create QuerySet on GPU device. * Set query set in render pass descriptor for occlusion query. * Add separated begin/end method for occlusion query on render pass encoder. * Add begin/end method for pipeline statistics and timestamp queries on render pass encoder and compute pass encoder. * Resolve query result from query set on command encoder. * Add a new buffer usage for resolving query result.

view details

push time in 2 months

create branch haoxli/gpuweb

branch : query-api-idl

created branch time in 2 months

fork haoxli/gpuweb

Where the GPU for the Web work happens!

https://webgpu.io

fork in 2 months

issue openedgpuweb/gpuweb

Investigation: Query API

Motivation

Modern graphics APIs have query mechanisms to get information about the processing of a sequence of commands on the GPU, and they mainly support three types:

  • Occlusion Query: Count the number of samples that pass depth/stencil testing, or whether any samples passed the testing. This feature is used to determine visibility or even measure the area of geometry, for example in predicated rendering (#551).
  • Pipeline Statistics Query: Count various aspects of the operation of graphics or compute pipelines, such as the number of vertex shader invocations, the number of primitives processed by the clip stage, etc. We can use this statistical information to get a measure of the relative complexity of different parts of an application, which can help find bottlenecks while performance tuning.
  • Timestamp Query: Get timestamps generated by the device. It can be used to measure the execution time of commands on the GPU while performance tuning.

We expect to have such a mechanism for getting this information in WebGPU; here is an investigation of the support for these queries on D3D12, Metal, and Vulkan.

Native APIs

Native APIs Support

| Query Types | D3D12 | Metal | Vulkan |
| --- | --- | --- | --- |
| Occlusion | Supported | macOS 10.11+, iOS 8+ | Binary occlusion: supported. Precise occlusion: VkPhysicalDeviceFeatures.occlusionQueryPrecise == true (device coverage: 98.9% Windows, 97.3% Linux, 10.6% Android) |
| Pipeline Statistics | Supported | macOS 10.15+, no iOS | VkPhysicalDeviceFeatures.pipelineStatisticsQuery == true (device coverage: 99.5% Windows, 99.5% Linux, 58.7% Android) |
| Timestamp | Supported | macOS 10.15+ | Supported |
  • Binary occlusion query is supported by all native APIs; precise occlusion query and pipeline statistics query are optional features on Vulkan which need to be enabled at device creation time.
  • Pipeline statistics and timestamp queries are not available on Metal until macOS 10.15.
  • iOS 10.3+ supports GPU time (GPUStartTime and GPUEndTime), but only for the whole command buffer.
  • So we can expose binary occlusion query as a core feature and the other queries as extensions.

Query Object

Query object is a collection of a specific number of queries of a particular type.

The query objects on the native APIs are created with a descriptor (D3D12_QUERY_HEAP_DESC, MTLCounterSampleBufferDescriptor, VkQueryPoolCreateInfo) which specifies the query type and query count, except for the visibilityResultBuffer on Metal, which is a MTLBuffer set in the render pass descriptor when the render pass is created.

The query objects are passed as an argument to query operations and need to be destroyed, as in Vulkan.
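In WebGPU terms this maps to descriptor-based creation plus an explicit destroy, following the GPUQuerySetDescriptor shape proposed in the PRs tracked here (a sketch, assuming device is an existing GPUDevice):

const occlusionQuerySet = device.createQuerySet({
  type: 'occlusion',
  count: 32, // number of query slots backed by contiguous GPU memory
});
// ...record begin/end queries against slots 0..31, resolve them, then:
occlusionQuerySet.destroy();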

Query Types

| Query Types | D3D12 | Metal | Vulkan |
| --- | --- | --- | --- |
| Occlusion | D3D12_QUERY_HEAP_TYPE_OCCLUSION | MTLVisibilityResultMode | VK_QUERY_TYPE_OCCLUSION |
| Pipeline Statistics | D3D12_QUERY_HEAP_TYPE_PIPELINE_STATISTICS | MTLCommonCounterSetStatistic | VK_QUERY_TYPE_PIPELINE_STATISTICS |
| Timestamp | D3D12_QUERY_HEAP_TYPE_TIMESTAMP | MTLCommonCounterSetTimestamp | VK_QUERY_TYPE_TIMESTAMP |

Metal has no query type for occlusion: it uses MTLVisibilityResultMode and stores the query results directly in a MTLBuffer. The other queries have their own types for creating query objects on each backend.

Query Operations

| Query Types | [BeginQuery](https://docs.microsoft.com/en-us/windows/desktop/api/d3d12/nf-d3d12-id3d12graphicscommandlist-beginquery) (D3D12) | [EndQuery](https://docs.microsoft.com/en-us/windows/desktop/api/d3d12/nf-d3d12-id3d12graphicscommandlist-endquery) (D3D12) | [setVisibilityResultMode](https://developer.apple.com/documentation/metal/mtlrendercommandencoder/1515556-setvisibilityresultmode) (Metal) | [sampleCountersInBuffer](https://developer.apple.com/search/?q=sampleCountersInBuffer) (Metal) | [vkCmdBeginQuery](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#vkCmdBeginQuery) (Vulkan) | [vkCmdEndQuery](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#vkCmdEndQuery) (Vulkan) | [vkCmdWriteTimestamp](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#vkCmdWriteTimestamp) (Vulkan) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Occlusion | √ | √ | √ | | √ | √ | |
| Pipeline Statistics | √ | √ | | √ | √ | √ | |
| Timestamp | | √ | | √ | | | √ |

Occlusion Query:

  • On Metal, a separate API named setVisibilityResultMode is called with Boolean/Disabled to begin/end a binary occlusion query (Counting for a precise occlusion query).
  • D3D12 and Vulkan have begin and end operations. D3D12 selects binary/precise queries by calling BeginQuery with the query type D3D12_QUERY_TYPE_BINARY_OCCLUSION or D3D12_QUERY_TYPE_OCCLUSION; Vulkan selects them via the control flags passed to vkCmdBeginQuery.

Pipeline Statistics Query:

  • On Metal, a pipeline statistics query is performed by calling a new API available on macOS 10.15+ named sampleCountersInBuffer. It does NOT begin and end statistics over a range of commands like D3D12 and Vulkan, but samples the counters accumulated from the beginning of the render (or compute or blit) encoder up to the point where sampleCountersInBuffer is called.
  • To implement pipeline statistics query on Metal, we can call sampleCountersInBuffer twice (once for Begin() and once for End()) inside a render (or compute or blit) encoder, and store the difference of the two samples in the result buffer.

Timestamp Query:

  • Unlike occlusion and pipeline statistics queries, a timestamp query does NOT operate over a range; it writes timestamps generated by the device to query objects.
  • The meaning of the timestamp values queried from the native APIs is not well defined. Timestamps use different units on D3D12 (GPU ticks), Metal (nanoseconds) and Vulkan (nanoseconds), and not all timestamps can be converted to a specific date, which is platform dependent.
  • So it’s better to have begin/end operations for the timestamp query and expose a time delta instead of raw timestamps, which may be more useful.

These operations on native APIs have different scopes:

| Query Types | D3D12 | Metal | Vulkan |
|---|---|---|---|
| Occlusion | Inside or outside render pass on Direct Command List | Inside render encoder | Inside or outside render pass on Graphics Queue |
| Pipeline Statistics | Inside or outside render pass on Direct Command List | Inside render/compute/blit encoders | Inside or outside render pass on Graphics and Compute Queues |
| Timestamp | Inside or outside render pass on Direct and Compute Command Lists | Inside render/compute/blit encoders | Inside or outside render pass on Graphics and Compute Queues |

Pipeline statistics query is only supported on the Direct Command List on D3D12, but ID3D12GraphicsCommandList::Dispatch() can still execute compute shader work there.

Resolve Query Results

| Query Types | D3D12 | Metal | Vulkan |
|---|---|---|---|
| Resolve APIs | ResolveQueryData | resolveCounters | vkGetQueryPoolResults <br> vkCmdCopyQueryPoolResults |
| Binary Occlusion Result | Binary 0/1 resolved into a buffer | Non-zero or zero integer stored in buffer | Non-zero or zero integer resolved into a buffer |
| Precise Occlusion Result | The number of samples that passed the depth and stencil tests | The number of samples that passed the depth and stencil tests | The number of samples that passed the scissor, exclusive scissor, sample mask, alpha to coverage, stencil, and depth tests |
| Pipeline Statistics Result | <a href="https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ns-d3d12-d3d12_query_data_pipeline_statistics">D3D12_QUERY_DATA_PIPELINE_STATISTICS</a> | <a href="https://developer.apple.com/documentation/metal/mtlcounterresultstatistic?language=objc">MTLCounterResultStatistic</a> | <a href="https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#queries-pipestats">VkQueryPipelineStatisticFlagBits</a> |
| Timestamp Result | GPU ticks resolved into a buffer. <br> Timestamp (in ns) = Timestamp (in ticks) * 10<sup>9</sup> / ID3D12CommandQueue::<a href="https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12commandqueue-gettimestampfrequency">GetTimestampFrequency()</a> | Nanoseconds resolved into a buffer | Nanoseconds resolved into a buffer |
  • All native APIs support resolving the results from query objects into buffer memory; the destination buffer can then be consumed by a following pipeline, for example as the condition for predicated rendering.
  • The resolve operation must be recorded outside a render pass on D3D12 and Vulkan, or outside a render/compute encoder on Metal.
  • The state or usage of the destination buffer must be COPY_DEST on D3D12, MTLStorageModeShared or MTLStorageModePrivate on Metal, and UNIFORM_BUFFER and TRANSFER_DST on Vulkan.
  • The offset in the destination buffer must be a multiple of 8 bytes on D3D12 and Vulkan (results are resolved as 64-bit values).
  • For occlusion query, Vulkan lists more tests in its spec, but these tests also affect the occlusion results on D3D12 and Metal. If the depth/stencil tests are disabled, the result is simply the area of the rasterized primitives.
  • The query results are resolved as 32-bit or 64-bit unsigned integers (selected with a flag) on Vulkan, and always resolved as 64-bit unsigned integers on D3D12 and Metal.
  • We cannot return the result buffer directly, because we need to post-process the raw query results from the native APIs with a compute shader:
    • Unify the results of binary occlusion queries.
    • Convert the counters in the pipeline statistics results, which differ across the three native APIs; we prefer to expose only their common parts.
    • Compute the difference of the two timestamp queries. The time delta may be negative because the timestamp counter may be reset after a long time on some platforms; we can suggest that users skip an invalid (negative) time delta, as in the sketch after this list.
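
To make the timestamp post-processing concrete, here is a minimal TypeScript sketch of the delta computation, assuming the two raw values have already been read back to the CPU; the function and parameter names (timestampDeltaNs, beginRaw, endRaw, ticksPerSecond) are illustrative only, not part of any API. It follows the tick-to-nanosecond formula from the table above and returns 0 for a negative delta so users know to skip that result.

```ts
// Minimal sketch (illustrative names): convert a pair of raw timestamp query
// results into a time delta in nanoseconds.
// On D3D12 the raw values are GPU ticks and must be scaled by the queue's
// timestamp frequency (ID3D12CommandQueue::GetTimestampFrequency); on Metal
// and Vulkan the values are already nanoseconds, so ticksPerSecond is 1e9.
function timestampDeltaNs(
  beginRaw: bigint,       // raw value written by the begin timestamp
  endRaw: bigint,         // raw value written by the end timestamp
  ticksPerSecond: bigint  // 1_000_000_000n when the backend reports nanoseconds
): bigint {
  const deltaTicks = endRaw - beginRaw;
  // The counter may have been reset between the two samples on some platforms,
  // producing a negative delta; report 0 so the caller can skip this result.
  if (deltaTicks < 0n) {
    return 0n;
  }
  return (deltaTicks * 1_000_000_000n) / ticksPerSecond;
}
```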

Proposal

Extensions

Add precise occlusion, pipeline statistics and timestamp queries to GPUExtensionName, so they can be requested at device creation time, as sketched below.
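
As a rough sketch of what requesting these extensions could look like, assuming a GPUDeviceDescriptor.extensions field and placeholder extension name strings; the exact strings are illustrative, not finalized.

```ts
// Sketch only: the extension name strings below are placeholders.
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("WebGPU adapter not available");

const device = await adapter.requestDevice({
  extensions: [
    "precise-occlusion-query",   // hypothetical name
    "pipeline-statistics-query", // hypothetical name
    "timestamp-query",           // hypothetical name
  ],
});
```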

QuerySet

  • Define QuerySet instead of individual query objects, because query objects (or sample buffers on Metal) can be allocated in a contiguous piece of memory.
  • Create and destroy QuerySet on GPUDevice.
  • Set the query set in GPURenderPassDescriptor for occlusion query, because Metal requires visibilityResultBuffer in MTLRenderPassDescriptor at render pass creation time (see the sketch after this list).
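
A minimal sketch of the proposed QuerySet usage for occlusion, assuming a createQuerySet method on GPUDevice, an occlusionQuerySet member in GPURenderPassDescriptor, and previously created device and commandEncoder objects; the names and dictionary fields illustrate this proposal rather than a finalized API.

```ts
// Sketch of the proposed API (names are illustrative, not final).
const occlusionQuerySet = device.createQuerySet({
  type: "occlusion",
  count: 32, // queries are allocated contiguously in one set
});

const renderPass = commandEncoder.beginRenderPass({
  colorAttachments: [/* ... */],
  // Set once per render pass, mirroring Metal's requirement to provide
  // visibilityResultBuffer in MTLRenderPassDescriptor at creation time.
  occlusionQuerySet,
});

// Destroy the set once the GPU work that uses it has completed.
occlusionQuerySet.destroy();
```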

Begin/End Query

  • Occlusion query only supports begin/end on the render pass encoder, without passing a query set, because the set has already been specified in the render pass descriptor.
  • Pipeline statistics and timestamp queries support begin/end on both the render pass encoder and the compute pass encoder.
  • We may need to perform different types of queries in the same render/compute pass encoder, so it’s better to pass a query set to beginQuery/endQuery for pipeline statistics and timestamp queries (see the sketch after this list).
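
A sketch of how the two begin/end styles could look, assuming existing renderPass and computePass encoders and a statisticsQuerySet, and assuming beginOcclusionQuery takes only a query index, endOcclusionQuery takes no arguments, and beginQuery/endQuery take an explicit query set plus index; these signatures illustrate the bullets above and are not final.

```ts
// Sketch of the proposed begin/end operations (signatures are illustrative).

// Occlusion: the query set is already attached to the render pass descriptor,
// so only a query index is needed here.
renderPass.beginOcclusionQuery(0);
renderPass.draw(3); // draw whose passing samples are counted by query 0
renderPass.endOcclusionQuery();

// Pipeline statistics (or timestamp): pass the query set explicitly so that
// several query types can coexist in the same pass encoder.
computePass.beginQuery(statisticsQuerySet, 0);
computePass.dispatch(64, 1, 1);
computePass.endQuery(statisticsQuerySet, 0);
```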

Resolve Query

Retrieve query results from the query set; users can either read the results back from the buffer memory or consume the result buffer directly on the GPU.

  • Query results are resolved into a GPU buffer:

| Query Types | Resolved Results |
|---|---|
| Binary Occlusion | 0/1 |
| Precise Occlusion | The number of samples that passed the depth/stencil tests. |
| Pipeline Statistics | The number of <br> vertex shader invocations, <br> primitives processed by the clip stage, <br> primitives output by the clip stage, <br> fragment shader invocations, <br> compute shader invocations. |
| Timestamp | Time delta in nanoseconds. <br> 0 for invalid results, which need to be skipped. |
  • All results in the GPU buffer are stored as GPUSize64 values. The offset must be a multiple of 8 bytes.
  • Add a new GPUBuffer usage for resolving queries, which avoids exposing more detailed information about buffer usage and allows the buffer to be reused for predicated or conditional rendering (see the sketch after this list).
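
A sketch of resolving and reading back query results, assuming a resolveQuerySet method on the command encoder, a GPUBufferUsage.QUERY_RESOLVE flag as proposed above, and an existing device and querySet holding 4 queries; names and exact signatures are illustrative of this proposal.

```ts
// Sketch of the proposed resolve path (names and signatures are illustrative).
const queryCount = 4;
const resultSize = queryCount * 8; // each result is a 64-bit (GPUSize64) value

// Destination of the resolve; QUERY_RESOLVE is the new usage proposed above,
// COPY_SRC allows copying the results out for CPU readback.
const resolveBuffer = device.createBuffer({
  size: resultSize,
  usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC,
});

// Staging buffer used to map the results on the CPU.
const readbackBuffer = device.createBuffer({
  size: resultSize,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
});

const encoder = device.createCommandEncoder();
// ... record the passes that write into querySet here ...
encoder.resolveQuerySet(querySet, 0, queryCount, resolveBuffer, 0);
encoder.copyBufferToBuffer(resolveBuffer, 0, readbackBuffer, 0, resultSize);
device.queue.submit([encoder.finish()]);

// Read the resolved values back as 64-bit unsigned integers.
await readbackBuffer.mapAsync(GPUMapMode.READ);
const results = new BigUint64Array(readbackBuffer.getMappedRange().slice(0));
readbackBuffer.unmap();
```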

created time in 2 months
