If you are wondering where the data on this site comes from, please visit https://api.github.com/users/antiagainst/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.
Lei Zhang (antiagainst), @google, Toronto. https://www.lei.chat. AI Frameworks & Compilers. Now: Vulkan compute, IREE, MLIR. Previous: Vulkan graphics, SPIR-V toolchain.

antiagainst/codeclimate-cppcheck 23

Code Climate Engine for Cppcheck

antiagainst/DirectXShaderCompiler 4

This repo hosts the source for the DirectX Shader Compiler which is based on LLVM/Clang.

antiagainst/graphics-concepts 3

Graphics concepts

antiagainst/klee-build-scripts 3

Build scripts for KLEE

antiagainst/antiagainst.github.io 2

Generated website for my personal blog

antiagainst/dotfiles 2

Dev environment configuration files

antiagainst/amber 1

Amber is a multi-API shader test framework

antiagainst/clspv 1

Clspv is a prototype compiler for a subset of OpenCL C to Vulkan compute shaders

antiagainst/grub2-themes 1

Grub2 gfxmenu themes

issue comment antiagainst/antiagainst.github.io

GPGPU, ML Inference, and Vulkan Compute | Lei.Chat()

@monkeyking: I meant how much performance (e.g., latency) variation we can expect for similar workloads, particularly due to driver stack overhead. It's quite important for real-time and resource-constrained scenarios.

"Novel Methodologies for Predictable CPU-To-GPU Command Offloading" is a nice paper that offers a thorough analysis on this front. It compares submission and execution predictability among CUDA, OpenCL, and Vulkan, and it also dives into the drivers to explain the causes. You might want to check it out. :)
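The kind of variation discussed here can be quantified with a simple percentile summary over repeated submissions. A minimal sketch, assuming `submit` wraps whatever enqueues GPU work (a real measurement would wrap something like a vkQueueSubmit call; here it is just any callable):

```python
import statistics
import time

def measure_submission_jitter(submit, iterations=1000):
    """Time repeated calls to `submit` and summarize the latency spread.

    Returns median, p99, and standard deviation of per-call latency in
    seconds; the gap between median and p99 is a rough predictability
    indicator of the kind the paper analyzes.
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        submit()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "median": statistics.median(samples),
        "p99": samples[int(0.99 * (len(samples) - 1))],
        "stdev": statistics.stdev(samples),
    }
```

Comparing these summaries across API backends (CUDA vs. OpenCL vs. Vulkan) for the same workload is essentially what the paper does with far more rigor.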

utterances-bot

comment created time in a day

create branch antiagainst/Dash-User-Contributions

branch: vulkan-1.2.193

created branch time in a day

push event llvm/llvm-project

Lei Zhang

commit sha b45476c94ce8ea94e2ad4d93ceda00eb4078e682

[mlir][tosa] Do not fold transpose with quantized types

For such cases, the type of the constant DenseElementsAttr is different from the transpose op return type.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D110446

view details

push time in 3 days

push event llvm/llvm-project

Lei Zhang

commit sha e325ebb9c70bbdd48866926a42d4c4373b832035

[mlir][tosa] Add some transpose folders

* If the input is a constant splat value, we just need to reshape it.
* If the input is a general constant with one user, we can also constant fold it, without bloating the IR.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D110439
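The first folder above rests on a simple observation: transposing a splat constant never moves data, so only the shape needs to change. A hedged sketch of that reasoning (the helper below is illustrative, not actual MLIR API):

```python
def fold_transpose_of_splat(value, shape, perm):
    """Fold transpose(splat) into a new splat constant.

    Every element of a splat is identical, so transposing is a pure
    shape permutation: output dim i takes its extent from input dim
    perm[i]. No element data is touched.
    """
    new_shape = tuple(shape[p] for p in perm)
    return value, new_shape
```

For example, transposing a 2x3x4 splat with permutation (2, 0, 1) yields the same value with shape 4x2x3, which is exactly why the fold can emit a reshaped constant instead of materializing a transpose.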

view details

push time in 3 days

Pull request review comment google/iree

Discuss compilation and runtime settings in best practices doc.

 sample for further guidance on using dynamic shapes.

 ## Practices for compilation settings

-TODO: mention parameters to tune
-
 TODO: which compiler targets to use (try both CUDA and Vulkan?)

 TODO: use the most specific LLVM target triple you can?

-## Practices for runtime use
+### Tuning compilation heuristics

Sorry for the late reply!

As discussed in our 1:1, it would be nice to categorize stuff a bit:

  1. Mention that long term we want to support attaching attributes to the input model to allow autotuning. This should be the preferred way, and there is work underway to make it happen.
  2. CL options that supply a number. These are really knobs that can be tuned. We should mention that they should preferably also be folded into the above category when possible, but there can be outliers, so case by case.
  3. CL options for experimental features. We should mention that these should all eventually be removed, with the feature becoming the default.
ScottTodd

comment created time in 3 days

Pull request review event

Pull request review comment google/iree

Discuss compilation and runtime settings in best practices doc.

 sample for further guidance on using dynamic shapes.

 ## Practices for compilation settings

-TODO: mention parameters to tune
-
 TODO: which compiler targets to use (try both CUDA and Vulkan?)

 TODO: use the most specific LLVM target triple you can?

-## Practices for runtime use
+### Tuning compilation heuristics

-### Do the minimum amount of work: cache queries and reuse buffers
+IREE runs its own suite of benchmarks continuously using the definitions at
+https://github.com/google/iree/tree/main/benchmarks. The flags set for these
+benchmarks represent the latest manually tuned values for workloads we track
+closely and referencing them may help with your own search for peak performance.

-Try to front-load queries, particularly queries using strings that look up into
-maps like `iree_runtime_session_call_by_name`, so that hot sections of code are
-doing the minimum amount of work: routing inputs through buffers, scheduling
-runtime calls, and routing outputs through other buffers.
+Here is a non-exhaustive list of flags which can be tuned when compiling
+through the `iree-translate` tool, while full documentation can be found in the
+project source code for each flag:
+
+* `--iree-flow-inline-constants-max-byte-length=[integer]`: Maximum byte-length of constants
+that can be inlined into dispatch regions. We find values around 16 work best
+for when running on a GPU and values around 2048 work best on CPUs.
+* `--iree-llvm-loop-unrolling=true`: This flag and other flags like it in
+[LLVMTargetOptions.cpp](https://github.com/google/iree/blob/main/iree/compiler/Dialect/HAL/Target/LLVM/LLVMTargetOptions.cpp)
+turn on specific optimizations within LLVM when targeting CPUs.
+* `--iree-flow-dispatch-formation-enable-operand-fusion=true`: Enable fusing operand
+producers during dispatch region formation.
+* `--iree-enable-fusion-with-reduction-ops=true`: Allow fusing generic ops with
+reductions.
+
+## Practices for runtime use

 TODO: sample code, profile numbers
+
+### Tuning runtime settings
+
+When running on the CPU, the task system flags specified in
+[iree/task/api.c](https://github.com/google/iree/blob/main/iree/task/api.c)
+give control over how worker threads will be created. For example, the
+`--task_topology_group_count=3` flag can be set to explicitly run on three
+workers rather than rely on heuristic selection that defaults to one worker
+per detected physical core.
+
+If running on a single thread or system with no threading support, the
+`dylib-sync` HAL driver can be used instead of the more generic `dylib` HAL
+driver. The synchronous driver performs execution inline rather than through
+IREE's task scheduling system.
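A search over compilation knobs like the ones listed in this doc can be mechanized by enumerating flag combinations and benchmarking each resulting module. A sketch under assumptions: the flag names are the ones quoted in the doc text, the `iree-translate` tool name follows the doc, and the sweep helper itself is hypothetical.

```python
import itertools

def candidate_commands(model_path, sweep):
    """Yield one hypothetical iree-translate command line per combination
    of flag values.

    `sweep` maps a flag name (e.g. the inline-constants byte-length flag
    from the doc) to a list of candidate values to try.
    """
    names = sorted(sweep)
    for values in itertools.product(*(sweep[name] for name in names)):
        flags = [f"{name}={value}" for name, value in zip(names, values)]
        yield ["iree-translate", model_path, *flags]
```

Each yielded command would then be executed and the compiled module benchmarked, keeping the flag set with the best latency.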

It would be nice to also mention that for GPU one needs to choose the matching target triple to get the best performance. (We can defer to a link to the main GPU page; I'll update there later.)

ScottTodd

comment created time in 3 days

Pull request review event

push event google/iree

Lei Zhang

commit sha c8f10a4ac2c637258979b8ed74c2178c4e95ae62

Adjust mobile GPU model similarity thresholds (#7161)

These models run much faster now; we are seeing less fluctuating benchmark numbers.

view details

push time in 4 days

PR merged google/iree

Reviewers
Adjust mobile GPU model similarity thresholds

These models run much faster now; we are seeing less fluctuating benchmark numbers.

+3 -3

1 comment

1 changed file

antiagainst

pr closed time in 4 days

pull request commentgoogle/iree

Adjust mobile GPU model similarity thresholds

We are seeing more stable numbers for many commits thus far. For example:

  • https://perf.iree.dev/serie?IREE?PoseNet%20[fp32]%20(TFLite)%20full-inference%20with%20IREE-Vulkan%20@%20SM-G980F%20(GPU-Mali-G77)
  • https://perf.iree.dev/serie?IREE?DeepLabV3%20[fp32]%20(TFLite)%20kernel-execution%20with%20IREE-Vulkan%20@%20SM-G980F%20(GPU-Mali-G77)
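The similarity thresholds being adjusted act roughly as a relative-difference band around a baseline: a new benchmark number is only flagged as a change when it leaves the band. A minimal sketch of that check, assuming a simple relative-difference rule (the exact rule in IREE's benchmarking infrastructure may differ):

```python
def is_similar(current, baseline, threshold):
    """Treat `current` as similar to `baseline` when the relative
    difference stays within `threshold` (0.05 means 5%).

    Lower benchmark fluctuation is what allows tightening the threshold
    without triggering false regression alarms.
    """
    if baseline == 0:
        return current == 0
    return abs(current - baseline) / abs(baseline) <= threshold
```

With stable numbers, a 5% band still catches real regressions like 110 ms against a 100 ms baseline while tolerating noise like 103 ms.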
antiagainst

comment created time in 4 days

PR opened google/iree

Reviewers
Adjust mobile GPU model similarity thresholds

These models run much faster now; we are seeing less fluctuating benchmark numbers.

+3 -3

0 comments

1 changed file

pr created time in 4 days

create branch antiagainst/iree

branch: change-threshold

created branch time in 4 days

push event antiagainst/llvm-project

Lei Zhang

commit sha 3ea0d3cdfdf97d0a5e700c7310b089478a2e591d

Use parallel/windo/reduction terms

view details

push time in 4 days

push event antiagainst/llvm-project

Lei Zhang

commit sha 0b452c6af81698d88156c23deb8dbb0efb0914b4

[mlir][linalg] Add patterns to vectorize convolution ops

view details

push time in 4 days

push event antiagainst/iree

Lei Zhang

commit sha 62c0e13d139b1fe4e9c125b7bd303afead7efe38

Disable trace captures

view details

push time in 4 days

push event antiagainst/iree

Lei Zhang

commit sha e50cc2f12324fcb8ea87a935e1440819a76d7994

Disable trace captures

view details

push time in 4 days

push event antiagainst/llvm-project

Nikita Popov

commit sha dd0226561e86e491f77464b1d3afe5bb53a2c54e

[IR] Add helper to convert offset to GEP indices

We implement logic to convert a byte offset into a sequence of GEP indices for that offset in a number of places. This patch adds a DataLayout::getGEPIndicesForOffset() method, which implements the core logic. I've updated SROA, ConstantFolding and InstCombine to use it, and there's a few more places where it looks relevant.

Differential Revision: https://reviews.llvm.org/D110043
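The core computation this helper centralizes — walking a nested aggregate type, peeling one index per level until the byte offset lands on a scalar — can be sketched with a simplified type model. The layout encoding below is invented for illustration and ignores the padding and scalable-type details the real DataLayout handles.

```python
import bisect

def size_of(layout):
    """Byte size of a simplified type: ("int", size),
    ("array", count, elem), or ("struct", [(field_offset, field_layout),
    ...]) with fields sorted by offset and no tail padding modeled."""
    kind = layout[0]
    if kind == "int":
        return layout[1]
    if kind == "array":
        return layout[1] * size_of(layout[2])
    last_offset, last_field = layout[1][-1]
    return last_offset + size_of(last_field)

def gep_indices_for_offset(layout, offset):
    """Turn a byte offset into a list of GEP-style indices plus any
    remaining offset (0 when the offset lands exactly on a scalar)."""
    indices = []
    while layout[0] != "int":
        if layout[0] == "array":
            _, count, elem = layout
            idx = min(offset // size_of(elem), count - 1)
            offset -= idx * size_of(elem)
            layout = elem
        else:  # struct: pick the last field starting at or before offset
            field_offsets = [off for off, _ in layout[1]]
            idx = bisect.bisect_right(field_offsets, offset) - 1
            field_offset, layout = layout[1][idx]
            offset -= field_offset
        indices.append(idx)
    return indices, offset
```

For an array of four {i32, i32} pairs, byte offset 12 resolves to element 1, field 1, which is the kind of answer SROA and InstCombine need when rewriting raw offsets as GEPs.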

view details

Arthur Eubanks

commit sha b64fdaa86b5b35fa982dd1f41d32b37a9d5208b6

[gn build] Don't pass -Wl,-z,defs for sanitizer builds

-Wl,-z,defs doesn't work with sanitizers. See https://clang.llvm.org/docs/AddressSanitizer.html

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D110086

view details

Alex Langford

commit sha c4a406bbd0fe3afa8366b72c49b1bc494a168624

[lldb][NFC] Remove outdated FIXME

view details

Arthur O'Dwyer

commit sha df81bb71aa452c677984fbeb7c34e8a77ec3e83b

[libc++] [LIBCXX-DEBUG-FIXME] Constexpr char_traits::copy mustn't compare unrelated pointers.

Now that __builtin_is_constant_evaluated() is present on all supported compilers, we can use it to skip the UB-inducing assert in cases where the computation might be happening at constexpr time.

Differential Revision: https://reviews.llvm.org/D101674

view details

Arthur O'Dwyer

commit sha d5db71d19f11d7c31257066aea6bd41ef04f28b7

[libc++] [P0919] Some belated review on D87171.

- Simplify the structure of the new tests.
- Test const containers as well as non-const containers, since it's easy to do so.
- Remove redundant enable-iffing of helper structs' member functions. (They're not instantiated unless they're called, and who would call them?)
- Fix indentation and use more consistent SFINAE method in <unordered_map>.
- Add _LIBCPP_INLINE_VISIBILITY on some swap functions.

Differential Revision: https://reviews.llvm.org/D109011

view details

Craig Topper

commit sha 792101fff749191dfd4dadabe2ecd30a4d8cd973

[RISCV] Add test cases for missed opportunity to use vfmacc.vf. NFC

This is another case of a splat being in another basic block preventing SelectionDAG from optimizing it.

view details

Craig Topper

commit sha a95ba8107359e17cb1669c01f416fd2723a23126

[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FMA.

If either of the multiplicands is a splat, we can sink it to use vfmacc.vf or similar.

view details

Nico Weber

commit sha 55f0b337087136554122f942fea951a357bc4a49

[cmake] Put check from D110016 behind (default-on) flag

See discussion on https://reviews.llvm.org/D110016 for details.

view details

Nico Weber

commit sha 9197834535364efff505580ef940ad41cd293275

Revert "Fix CLANG_ENABLE_STATIC_ANALYZER=OFF building all analyzer source"

This reverts commit 6d7b3d6b3a8dbd62650b6c3dae1fe904a8ae9048. Breaks running cmake with `-DCLANG_ENABLE_STATIC_ANALYZER=OFF` without turning off CLANG_TIDY_ENABLE_STATIC_ANALYZER. See comments on https://reviews.llvm.org/D109611 for details.

view details

Paul Robinson

commit sha fa822a2ee52f8243d29eb035d7002a9ab40788a0

[DebugInfo] Add test for dumping DW_AT_defaulted

view details

Craig Topper

commit sha c6e52b1e85c6d633bda0e268fed16487fea084d1

[RISCV] Add test cases for missed opportunities to use vand/vor/vxor.vx. NFC

These are cases were the splat is in another basic block. CGP needs to sink it to expose the opportunity to SelectionDAG.

view details

Florian Mayer

commit sha 16b5f4502c5b58c7f70afa8e1e1e33d170ba6089

[NFC] [hwasan] Separate outline and inline instrumentation.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D110067

view details

Nikita Popov

commit sha 53720f74e4e32fe11a1688282f7d09dc1828b83a

[Polly] Partially fix scoped alias metadata

This partially addresses the verifier failures caused by D110026. In particular, it does not fix the "second level" alias metadata.

view details

Shilei Tian

commit sha 49e976c9343253956a7de93f1d982537f9c240ab

[OpenMP][NVPTX] Fix a warning that data argument not used by format string

Reviewed By: jhuber6, grokos

Differential Revision: https://reviews.llvm.org/D110104

view details

Saleem Abdulrasool

commit sha 96d3319d6f024b17ac725d9595548acc4787003c

Sema: relax va_start checking further for Windows AArch64

When building in C mode, the VC runtime assumes that it can use pointer aliasing through `char *` for the parameter to `__va_start`. Relax the checks further. In theory we could keep the tests strict for non-system header code, but this takes the less strict approach as the additional check doesn't particularly end up being too much more helpful for correctness. The C++ type system is a bit stricter and requires the explicit cast which we continue to verify.

view details

Amara Emerson

commit sha f9d69a0ab02567933302602238264a38468f9900

[GlobalISel] Implement support for the "trap-func-name" attribute.

This attribute calls a function instead of emitting a trap instruction.

Differential Revision: https://reviews.llvm.org/D110098

view details

Jacob Lambert

commit sha dc6e8dfdfe7efecfda318d43a06fae18b40eb498

[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files.

Test commit for new contributor.

view details

natashaknk

commit sha 4edf46f72a8f3bd9d60628d0c852e8ff91921673

[mlir][tosa] Remove the documentation requirement for elements of several binary elementwise ops to be of the same rank.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D110095

view details

Nico Weber

commit sha f11917057923bce7f9c04282b4a3b15ef0aad0d6

[clang] Fix a few comment typos to cycle bots

view details

natashaknk

commit sha 38ff7e11c04e760570e3cb517f8b78d554c65386

[mlir][tosa] Add several binary elementwise to the list of broadcastable ops.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D110096

view details

push time in 4 days

Pull request review comment google/iree

[spirv] Fix matmul vectorization corner cases

 namespace detail {
 LogicalResult setMatmulOpConfig(linalg::LinalgOp op,
                                 std::array<int64_t, 2> bestWorkgroupSizeXY,
                                 std::array<int64_t, 3> bestThreadTileSizeMNK) {
+  auto lhsType = op.inputs()[0].getType().cast<ShapedType>();
+  auto elementBits = lhsType.getElementType().getIntOrFloatBitWidth();
+  if (elementBits != 16 && elementBits != 32) return success();

Yes, the problem is vectorization. These cases will still be distributed, as it falls back to another configuration in the following logic; the tests also show that. The default distribution config does not consider multiple parallel dimensions, though, and that should be improved.
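The two corner cases this patch guards against can be summarized as an eligibility predicate. A sketch under assumptions: the vector width of 4 and the helper itself are illustrative, not IREE's actual code.

```python
def can_vectorize_matmul(element_bits, k, vector_width=4):
    """Mirror the two bail-outs from the patch: only 16-/32-bit element
    types are supported, and K must be a multiple of the vector width so
    that vector loads along the K dimension are possible."""
    if element_bits not in (16, 32):
        return False
    return k % vector_width == 0
```

When the predicate fails, the pipeline falls back to the default (non-vectorized) distribution configuration rather than rejecting the op outright.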

antiagainst

comment created time in 4 days

Pull request review event

delete branch antiagainst/iree

delete branch : fix-matmul-config

delete time in 4 days

push event antiagainst/iree

iree-copybara-bot

commit sha 2c679e470d6a476f00217bbc5d388cdca93eddb3

Integrate LLVM at llvm/llvm-project@ed2f0ad30719

Updates LLVM usage to match [ed2f0ad30719](https://github.com/llvm/llvm-project/commit/ed2f0ad30719)

PiperOrigin-RevId: 396688867

view details

iree-copybara-bot

commit sha 21a764d99e2547aecdb6c9bc5e70ec28605377c8

Integrate LLVM at llvm/llvm-project@9111635cb78e

Updates LLVM usage to match [9111635cb78e](https://github.com/llvm/llvm-project/commit/9111635cb78e)

PiperOrigin-RevId: 396925318

view details

iree-copybara-bot

commit sha c33f23894bad958e6166fd0383e34758b4ad1fcd

Integrate LLVM at llvm/llvm-project@c90cbb2d3455

Updates LLVM usage to match [c90cbb2d3455](https://github.com/llvm/llvm-project/commit/c90cbb2d3455)

PiperOrigin-RevId: 396979557

view details

iree-copybara-bot

commit sha 27dca8f3645c8bd05dc66a72918427a30d64ac53

Integrate LLVM at llvm/llvm-project@7acf92943b78

Updates LLVM usage to match [7acf92943b78](https://github.com/llvm/llvm-project/commit/7acf92943b78)

PiperOrigin-RevId: 397027144

view details

iree-copybara-bot

commit sha c8d018ae18229fdbbc85ad03872d3c7e43710847

Integrate LLVM at llvm/llvm-project@1613ab8a4a3e

Updates LLVM usage to match [1613ab8a4a3e](https://github.com/llvm/llvm-project/commit/1613ab8a4a3e)

PiperOrigin-RevId: 397146439

view details

iree-copybara-bot

commit sha 29b545071f2bf86bb5fab31b062cfe9f68a81518

Integrate LLVM at llvm/llvm-project@4c1023b4b790

Updates LLVM usage to match [4c1023b4b790](https://github.com/llvm/llvm-project/commit/4c1023b4b790)

PiperOrigin-RevId: 397184602

view details

iree-copybara-bot

commit sha f92733e6bc231d689b1c92443c2540fa6d5b9468

Integrate LLVM at llvm/llvm-project@fc08cfb8884d

Updates LLVM usage to match [fc08cfb8884d](https://github.com/llvm/llvm-project/commit/fc08cfb8884d)

PiperOrigin-RevId: 397264106

view details

iree-copybara-bot

commit sha a97e8ff4ead1d05bedec92f7b2dd7890dffb4f61

Integrate LLVM at llvm/llvm-project@750d5fc65c92

Updates LLVM usage to match [750d5fc65c92](https://github.com/llvm/llvm-project/commit/750d5fc65c92)

PiperOrigin-RevId: 397368903

view details

Submodule Update Action

commit sha 579417c4922a27850404289f7a15aff232f1a0d8

Synchronize submodules with LLVM at llvm/llvm-project@750d5fc65c92

view details

iree-copybara-bot

commit sha dd6c59a8f2a9259977c51071da3f803145f9a5e2

Synchronize submodules with LLVM at llvm/llvm-project@750d5fc65c92

Updates LLVM dependencies to match [750d5fc65c92](https://github.com/llvm/llvm-project/commit/750d5fc65c92).

- TensorFlow to [e9fd972a7e83](https://github.com/tensorflow/tensorflow/commit/e9fd972a7e83)
- MLIR-HLO to [a0685891c145](https://github.com/tensorflow/mlir-hlo/commit/${MLIR_HLO_SHA?})

`./scripts/git/update_to_llvm_syncpoint.py`

Automated submodule bump from .github/workflows/update_llvm_dependent_submodules.yml

PiperOrigin-RevId: 397375258

view details

MaheshRavishankar

commit sha d0e589ce90618b92a46c9ce08d3f5d6a1f511d06

Basic handling of multiple generic ops in dispatch regions on CPU path. (#7102)

If there are no root operations within a dispatch region, assume all generic ops are tiled + distributed the same way, and can therefore use the same configuration.

view details

iree-copybara-bot

commit sha 507e46a42fab275bcae65ffdf690f8be4a828a7c

Integrate LLVM at llvm/llvm-project@08f0cb77197d

Updates LLVM usage to match [08f0cb77197d](https://github.com/llvm/llvm-project/commit/08f0cb77197d)

PiperOrigin-RevId: 397416111

view details

Scott Todd

commit sha 9fc46757b942e6f030ded666d012dc701f76800d

Fix tile_pad_and_vectorize.mlir test.

PiperOrigin-RevId: 397419479

view details

iree-copybara-bot

commit sha 6757214a69392f1bd4fa90fcb702a3dfde250974

Integrate LLVM at llvm/llvm-project@3b14d80ad4af

Updates LLVM usage to match [3b14d80ad4af](https://github.com/llvm/llvm-project/commit/3b14d80ad4af)

PiperOrigin-RevId: 397525626

view details

iree-copybara-bot

commit sha 9905c6b1be6722acc718b94e3132b43a5987b07c

Integrate LLVM at llvm/llvm-project@f5b8f1247cd9

Updates LLVM usage to match [f5b8f1247cd9](https://github.com/llvm/llvm-project/commit/f5b8f1247cd9)

PiperOrigin-RevId: 397710335

view details

Rob Suderman

commit sha 128ce7a111639261a749286642ecba8c94e9b709

Synchronize submodules with LLVM at llvm/llvm-project@f5b8f1247cd9

view details

Geoffrey Martin-Noble

commit sha a94a06c83f5096dcf721d444b29f8a92c6c4ff43

Merge branch 'google' into main-to-google

view details

Geoffrey Martin-Noble

commit sha 5baeb767915f7f4b0fd55ff41b79db7430e77b5e

Merge pull request #7106 from GMNGeoffrey:main-to-google

PiperOrigin-RevId: 397858053

view details

rsuderman

commit sha 34e630d3ba45b7800879affbbdf07c0470130d6d

Merge google -> main #7107

5baeb76 Merge pull request Merge main -> google #7106 from GMNGeoffrey:main-to-google
128ce7a Synchronize submodules with LLVM at llvm/llvm-project@f5b8f12
9905c6b Integrate LLVM at llvm/llvm-project@f5b8f12
6757214 Integrate LLVM at llvm/llvm-project@3b14d80
9fc4675 Fix tile_pad_and_vectorize.mlir test.
507e46a Integrate LLVM at llvm/llvm-project@08f0cb7
dd6c59a Synchronize submodules with LLVM at llvm/llvm-project@750d5fc
a97e8ff Integrate LLVM at llvm/llvm-project@750d5fc
f92733e Integrate LLVM at llvm/llvm-project@fc08cfb
29b5450 Integrate LLVM at llvm/llvm-project@4c1023b
c8d018a Integrate LLVM at llvm/llvm-project@1613ab8
27dca8f Integrate LLVM at llvm/llvm-project@7acf929
c33f238 Integrate LLVM at llvm/llvm-project@c90cbb2
21a764d Integrate LLVM at llvm/llvm-project@9111635
2c679e4 Integrate LLVM at llvm/llvm-project@ed2f0ad

view details

Stella Laurenzo

commit sha 8d3a4dcd10de230fb46ef37586f6d71a566afa6e

[pydm] Add a basic RTL linker pass. (#7089)

* Includes build automation to produce the runtime library assembly file that the pass consumes.
* Not super complicated but gets the job done (and may be all this ever needs).
* Has the RTL builder do basic canonicalization (and verification) before saving.

view details

push time in 4 days

push event antiagainst/llvm-project

Lei Zhang

commit sha 6d206fafa7d1bed4bb7240cb3509abeaee26d8dc

[mlir][linalg] Add patterns to vectorize convolution ops

view details

push time in 4 days

push event google/iree

Lei Zhang

commit sha ebf08a7cdbc6bd882fcd9b9639f2fd7e5562af6d

[spirv] Fix matmul vectorization corner cases (#7137)

* We don't support non-16/non-32 bit element types yet.
* Don't vectorize for odd K sizes. We cannot vector load there.

view details

push time in 5 days

PR merged google/iree

[spirv] Fix matmul vectorization corner cases

Label: codegen/spirv
  • We don't support non-16/non-32 bit element types yet.
  • Don't vectorize for odd K sizes. We cannot vector load there.
+170 -1

0 comments

4 changed files

antiagainst

pr closed time in 5 days

PR opened google/iree

Reviewers
[spirv] Fix matmul vectorization corner cases
  • We don't support non-16/non-32 bit element types yet.
  • Don't vectorize for odd K sizes. We cannot vector load there.
+170 -1

0 comments

4 changed files

pr created time in 5 days

push event antiagainst/iree

Lei Zhang

commit sha d460e5ae4bd4a4604be1cc3039fcd12bac620c38

[spirv] Fix matmul vectorization corner cases

* We don't support non-16/non-32 bit element types yet.
* Don't vectorize for odd K sizes. We cannot vector load there.

view details

push time in 5 days

push event antiagainst/iree

Lei Zhang

commit sha ac3bdab6c15ac765ff36f2dec7c9fe238cc25fb9

[spirv] Fix matmul vectorization corner cases

* We don't support non-16/non-32 bit element types yet.
* Don't vectorize for odd K sizes. We cannot vector load there.

view details

push time in 5 days

create branch antiagainst/iree

branch: fix-matmul-config

created branch time in 5 days