
tensorflow/tflite-support 187

TFLite Support is a toolkit that helps users to develop ML and deploy TFLite models onto mobile / IoT devices.

PullRequestReviewEvent

issue comment tensorflow/tensorflow

Converting Hard Swish activation to TFLite efficiently

you can check out our existing fusion patterns here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/transforms/optimize_patterns.td#L253-L317

I see two reasons why your ops are not fused:

  • operator order
  • 0.1666666666666 does not match 0.16667
yakovdan

comment created time in 19 days

issue comment tensorflow/tensorflow

Converting Hard Swish activation to TFLite efficiently

I think if you do

tf.nn.relu6(x+3.0) * 0.1666666666666 * x

the fusion should work
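
For example, a minimal conversion sketch with hard swish written in that form (the layer sizes and input shape below are placeholders, not from the original report):

import tensorflow as tf

# Hard swish written the way the fusion pattern expects:
# relu6(x + 3) scaled by ~1/6, then multiplied by x.
def hard_swish(x):
    return tf.nn.relu6(x + 3.0) * 0.1666666666666 * x

# Placeholder model just to exercise the conversion.
inputs = tf.keras.Input(shape=(224, 224, 3))
features = tf.keras.layers.Conv2D(8, 3, padding="same")(inputs)
model = tf.keras.Model(inputs, hard_swish(features))

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()  # the expression should fuse into a HARD_SWISH op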

yakovdan

comment created time in 20 days

issue comment tensorflow/model-optimization

MLIR-based PTQ has MEAN ops with different quantization scales for inputs and outputs.

Hi Freedom, https://github.com/tensorflow/tensorflow/commit/8c90a182b8e200a870ddc5dc4fb9d7ee10f04cbe

Can you try bazel run //tensorflow/compiler/mlir/lite/experimental/tac:tac-translate -- <INPUT_MODEL> -o=<OUTPUT_MODEL> --device-specs=NNAPI

And see if it works for your case?

Note that we are replacing the mean with an avg_pool -> requantize; that may cause accuracy issues, so it would be great to get your help validating the model.
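
As a quick float-only sanity check of that rewrite (shapes below are arbitrary, not tied to your model): a mean over the spatial dims matches a "global" average pool, and the quantized path then just adds the requantize step.

import numpy as np
import tensorflow as tf

# MEAN over H, W vs. an average pool whose window covers the whole spatial extent.
x = np.random.rand(1, 7, 7, 8).astype(np.float32)
mean_out = tf.reduce_mean(x, axis=[1, 2], keepdims=True)
pool_out = tf.nn.avg_pool2d(x, ksize=[7, 7], strides=[1, 1], padding="VALID")
print(np.allclose(mean_out.numpy(), pool_out.numpy(), atol=1e-6))  # expect True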

Thanks!

freedomtan

comment created time in 24 days

issue comment tensorflow/model-optimization

MLIR-based PTQ has MEAN ops with different quantization scales for inputs and outputs.

Hi Freedom, do you have a tflite model we can debug internally? thanks!

freedomtan

comment created time in 25 days

issue comment tensorflow/tflite-support

Failed to run the tflite model on Interpreter due to Runtime Error

yes, looks like the result is fine.

Kimidoge

comment created time in a month

issue comment tensorflow/tflite-support

Failed to run the tflite model on Interpreter due to Runtime Error

Hi, can you try setting the input signature as shown here? https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/experimental_new_converter/Keras_LSTM_fusion_Codelab.ipynb

And you don't need to resize the input shape.
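
In case it helps, a minimal sketch along the lines of that codelab (the toy model and the [1, 28, 28] signature are placeholders for your own model and shape):

import tensorflow as tf

# Toy Keras LSTM model standing in for the real one.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(16, input_shape=(28, 28)),
    tf.keras.layers.Dense(10),
])

# Fix the input signature so the converter sees static shapes and can fuse the LSTM.
run_model = tf.function(lambda x: model(x))
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([1, 28, 28], tf.float32))
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
tflite_model = converter.convert()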

thanks

Kimidoge

comment created time in a month

issue comment tensorflow/tflite-support

Failed to run the tflite model on Interpreter due to Runtime Error

Hi, can you try setting a fixed-size input and try again? Thanks!

Kimidoge

comment created time in a month

PullRequestReviewEvent

issue comment tensorflow/tensorflow

Default values of mean and standard deviation for TFLite RNN Models.

if you are using a float model, you don't need to worry about mean and std dev
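
If you want to double-check, a float model's input carries no quantization parameters at all (the model path below is a placeholder):

import tensorflow as tf

# For a float model the reported quantization params are (0.0, 0), i.e. the
# runtime applies no mean/std normalization; you just feed real-valued inputs.
interpreter = tf.lite.Interpreter(model_path="rnn_model_float.tflite")
details = interpreter.get_input_details()[0]
print(details["dtype"], details["quantization"])  # float32, (0.0, 0)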

pranathibl

comment created time in a month

issue comment tensorflow/tensorflow

Adding some Boosted tree ops to the 'allowed' list

Adding Karim who may have a better idea

Koushik667

comment created time in a month

issue comment tensorflow/tensorflow

LSTM & BiLSTM can't run correct results while batch processing on Mobile

I see. In this case, there are two options:

  1. run predictions one by one, which should always work
  2. if you really need batch inference, set the batch_size during model conversion and run inference with that fixed batch size as well
kylechang523

comment created time in a month

started googlefonts/compute-shader-101

started time in a month

issue comment tensorflow/tensorflow

LSTM & BiLSTM can't run correct results while batch processing on Mobile

I see, adding YC.

On mobile (assuming you're using C++), the equivalent C++ call is ResetVariableTensors: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/interpreter.h#L624

kylechang523

comment created time in a month

issue comment tensorflow/tensorflow

LSTM & BiLSTM can't run correct results while batch processing on Mobile

Yes, you need to call interpreter.reset_all_variables() after each interpreter.invoke().

It should work for batch inference as well.
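
A rough sketch of that loop (the model path, dtype, and the four invocations below are placeholders):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="lstm_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Each iteration feeds one input of the model's (fixed) shape, whether that is
# a single sample or a fixed-size batch.
for batch in np.random.rand(4, *inp["shape"]).astype(np.float32):
    interpreter.set_tensor(inp["index"], batch)
    interpreter.invoke()
    result = interpreter.get_tensor(out["index"]).copy()
    interpreter.reset_all_variables()  # clear LSTM state before the next invoke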

kylechang523

comment created time in a month

issue comment tensorflow/tensorflow

Incorrect Results of MatMul on TFLite with GPU Delegate

Meanwhile, can you try TAC: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir/lite/experimental/tac

thanks!

Kipsora

comment created time in a month

issue comment tensorflow/tensorflow

Incorrect Results of MatMul on TFLite with GPU Delegate

SG, I will file a bug to track this

Kipsora

comment created time in a month

PullRequestReviewEvent

pull request comment tensorflow/tensorflow

[TFLite] Update MultiplyByQuantizedMultiplier to use single-rounding instead of double-rounding

Sorry about the long delay. I think we can deprecate the AwayFromZero path and take only the Upward path.

The Upward path should be very similar to what you did, and we already allow the rounding difference for the test.

Tessil

comment created time in a month

pull request comment tensorflow/tensorflow

[tflite] add same scale constraint to tflite's mean op

You can think of mean as more of a "smoothing" operation, so mean can shrink the output range (a lot).

I agree that letting mean have the same constraint for inputs and outputs might work sometimes, but I don't think it can work for all cases.

avg_pool may have this constraint since most of the time the window size is not large; mean is more like a global avg_pool, so the situation may change.

To be on the safe side, I think we should keep the current setting for mean; we can probably push NNAPI to fix the issue.

As for a temporary workaround, I think you can either do a reduce_sum and then divide by the element_count, or just transform the mean op into a global avg_pool.
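
For the first workaround, a hypothetical drop-in for a spatial reduce_mean might look like this (the axis choice and static-shape assumption are illustrative, not from the thread):

import tensorflow as tf

# REDUCE_SUM followed by a multiply with 1/element_count, so no MEAN op (and
# no NNAPI scale restriction) ends up in the converted graph.
def mean_via_sum(x, axis=(1, 2), keepdims=True):
    element_count = 1
    for a in axis:
        element_count *= int(x.shape[a])  # assumes the reduced dims are static
    return tf.reduce_sum(x, axis=list(axis), keepdims=keepdims) * (1.0 / element_count)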

freedomtan

comment created time in a month

pull request comment tensorflow/tensorflow

[tflite] add same scale constraint to tflite's mean op

I think there are a few things:

  • It's not guaranteed that the mean op should have the same scale for both inputs and outputs: imagine an input tensor [-3.0, 0.0, 3.0]; the reduced output will be just [0.0] (the scale shrinks down!)
  • So ensuring the input and output have the same scale does not help preserve accuracy (it actually hurts accuracy).
  • It does not simplify the implementation either: for reduce mean, you still need to do at least one "division" (sum(elements) / element_count), and you actually fuse the input-output scale with the element_count; see how we implemented it here.
  • The TFLite CPU kernel supports different scales for the mean op.

To sum up, I don't think we should add the constraint to the Mean op for the reasons mentioned above; instead, we should fix this on the NNAPI side.

If you need a temporary workaround, I think you should go for a reduce_sum followed by a broadcast_mul with the reciprocal of the reduced element count.

freedomtan

comment created time in 2 months

issue comment tensorflow/tensorflow

Does TFLite Support conv + LSTM ?

Yes, it's supported.

pranathibl

comment created time in 2 months

issue comment tensorflow/tensorflow

Identifying the type of input

Sorry, I'm confused by the question; what problem are you running into?

pranathibl

comment created time in 2 months

PullRequestReviewEvent

issue comment tensorflow/tensorflow

TFLite conversion of LSTM model does not work with multiple batch size

Currently the fusion only works if the input shape is fixed; can you try setting the batch_size to 1? Thanks!

kpei

comment created time in 3 months

Pull request review comment tensorflow/tensorflow

[TFLite] Update MultiplyByQuantizedMultiplier to use single-rounding instead of double-rounding

 inline void BroadcastSub16POTSlow(const ArithmeticParams& params,
   NDOpsHelper<N>(output_desc, sub_func);
 }

-// Element-wise Sub that can often be used for inner loop of broadcast sub as
-// well as the non-broadcast sub.
-inline void SubElementwise(int size, const ArithmeticParams& params,
-                           const uint8_t* input1_data,
-                           const uint8_t* input2_data, uint8_t* output_data) {
-  TFLITE_DCHECK_GT(params.input1_offset, -256);
-  TFLITE_DCHECK_GT(params.input2_offset, -256);
-  TFLITE_DCHECK_LT(params.input1_offset, 256);
-  TFLITE_DCHECK_LT(params.input2_offset, 256);
+template <typename T, int N = 5>
+void BroadcastQuantSubSlow(const ArithmeticParams& params,

I think we should do this incrementally: one PR to fix the sub/div test scaling issue, and another to update MultiplyByQuantizedMultiplier and adjust for the rounding issue.

Tessil

comment created time in 3 months

Pull request review comment tensorflow/tensorflow

[TFLite] Update MultiplyByQuantizedMultiplier to use single-rounding instead of double-rounding

 inline int32_t MultiplyByQuantizedMultiplier(int64_t x,
   assert(x >= -(static_cast<int64_t>(1) << 47) &&
          x < (static_cast<int64_t>(1) << 47));

-  int32_t reduced_multiplier = (quantized_multiplier < 0x7FFF0000)
-                                   ? ((quantized_multiplier + (1 << 15)) >> 16)
-                                   : 0x7FFF;
-  int total_shift = 15 - shift;
-  x = (x * (int64_t)reduced_multiplier) + ((int64_t)1 << (total_shift - 1));
-  int32_t result = x >> total_shift;
-  return result;
+  const int32_t reduced_multiplier =
+      (quantized_multiplier < 0x7FFF0000)
+          ? ((quantized_multiplier + (1 << 15)) >> 16)
+          : 0x7FFF;
+  const int64_t total_shift = 15 - shift;
+  const int64_t round = static_cast<int64_t>(1) << (total_shift - 1);
+  int64_t result = x * static_cast<int64_t>(reduced_multiplier) + round;
+  result = result >> total_shift;
+
+  assert(result >= std::numeric_limits<int32_t>::min() &&
+         result <= std::numeric_limits<int32_t>::max());
+  return static_cast<int32_t>(result);
 }

 #ifdef USE_NEON
-// Round uses ARM's rounding shift right.
 inline int32x4x4_t MultiplyByQuantizedMultiplier4Rows(
     int32x4x4_t input_val, int32_t quantized_multiplier, int shift) {
-  const int left_shift = std::max(shift, 0);
-  const int right_shift = std::min(shift, 0);
+  const int right_shift = std::min(-1, shift);
+  const int left_shift = shift - right_shift;
+
+  const int32x4_t multiplier_dup = vdupq_n_s32(quantized_multiplier);
+  const int32x4_t left_shift_dup = vdupq_n_s32(left_shift);
+  const int32x4_t right_shift_dup = vdupq_n_s32(right_shift);
+
   int32x4x4_t result;
+  result.val[0] = vrshlq_s32(
+      vqdmulhq_s32(vshlq_s32(input_val.val[0], left_shift_dup), multiplier_dup),
+      right_shift_dup);
+
+  result.val[1] = vrshlq_s32(
+      vqdmulhq_s32(vshlq_s32(input_val.val[1], left_shift_dup), multiplier_dup),
+      right_shift_dup);
+
+  result.val[2] = vrshlq_s32(
+      vqdmulhq_s32(vshlq_s32(input_val.val[2], left_shift_dup), multiplier_dup),
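
In plain terms, the single-rounding arithmetic in the int64 overload above works out to roughly the following Python transcription (illustrative only; Python's unbounded ints stand in for int64, so the C++ range asserts are dropped):

def multiply_by_quantized_multiplier(x, quantized_multiplier, shift):
    # Reduce the 32-bit multiplier to 16 bits, then apply a single
    # round-half-up right shift (assumes shift <= 14 so total_shift >= 1).
    reduced_multiplier = ((quantized_multiplier + (1 << 15)) >> 16
                          if quantized_multiplier < 0x7FFF0000 else 0x7FFF)
    total_shift = 15 - shift
    rounding = 1 << (total_shift - 1)
    return (x * reduced_multiplier + rounding) >> total_shift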

thanks for the explanation!

Tessil

comment created time in 3 months

Pull request review comment tensorflow/tensorflow

[TFLite] Update MultiplyByQuantizedMultiplier to use single-rounding instead of double-rounding

 inline void BiasAndClamp(float clamp_min, float clamp_max, int bias_size,
 #endif
 }

-inline int32_t MultiplyByQuantizedMultiplierSmallerThanOneExp(
-    int32_t x, int32_t quantized_multiplier, int left_shift) {
-  using gemmlowp::RoundingDivideByPOT;
-  using gemmlowp::SaturatingRoundingDoublingHighMul;
-  return RoundingDivideByPOT(
-      SaturatingRoundingDoublingHighMul(x, quantized_multiplier), -left_shift);
+inline int32_t MultiplyByQuantizedMultiplier(int32_t x,
+                                             int32_t quantized_multiplier,
+                                             int shift) {
+  assert(quantized_multiplier >= 0);

Instead of assert, please use TFLITE_DCHECK whenever possible. Thanks!

Tessil

comment created time in 3 months