If you are wondering where the data on this site comes from, please visit https://api.github.com/users/tamato1/events. GitMemory does not store any data; it only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.

tamato1/cloudwatch_exporter 0

Metrics exporter for Amazon AWS CloudWatch

tamato1/darknet 0

Convolutional Neural Networks

tamato1/EasyOCR 0

Ready-to-use OCR with 40+ languages supported including Chinese, Japanese, Korean and Thai

tamato1/incubator-mxnet 0

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, JavaScript and more

tamato1/models 0

Models and examples built with TensorFlow

delete branch tensorflow/text

delete branch : test_359676350

delete time in 2 days

PR merged tensorflow/text

Internal repo change cla: yes

Internal repo change

+8 -3

0 comments

1 changed file

tf-text-github-robot

pr closed time in 2 days

push event tensorflow/text

Terry Huang

commit sha a6253a2b47abab230d8e2a58fbe3a0e083725860

Internal repo change PiperOrigin-RevId: 359803101

view details

push time in 2 days

issue closed tensorflow/text

Bert Preprocess Model not working on windows 10

I have the same issue described here: error-with-using-bert-model-from-tensorflow

I got this exception when I try the BERT preprocessor on Windows 10

Trying to access resource using the wrong type. Expected class tensorflow::lookup::LookupInterface got class tensorflow::lookup::LookupInterface

Stack trace

File "C:\work\vpython\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\work\vpython\lib\site-packages\tensorflow\python\eager\def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "C:\work\vpython\lib\site-packages\tensorflow\python\eager\def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\work\vpython\lib\site-packages\tensorflow\python\eager\function.py", line 2942, in __call__
    return graph_function._call_flat(
  File "C:\work\vpython\lib\site-packages\tensorflow\python\eager\function.py", line 1918, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "C:\work\vpython\lib\site-packages\tensorflow\python\eager\function.py", line 555, in call
    outputs = execute.execute(
  File "C:\work\vpython\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:  Trying to access resource using the wrong type. Expected class tensorflow::lookup::LookupInterface got class tensorflow::lookup::LookupInterface
	 [[{{node prediction/keras_layer_1/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/bert_tokenizer/StatefulPartitionedCall/WordpieceTokenizeWithOffsets/WordpieceTokenizeWithOffsets/WordpieceTokenizeWithOffsets}}]] [Op:__inference_train_function_52076]
Function call stack:
train_function

closed time in 2 days

nassimus26

push event tensorflow/text

Terry Huang

commit sha eabb72cb03dc76efbb5027a3ec9beecf2b2354a3

Internal repo change PiperOrigin-RevId: 359676350

view details

push time in 2 days

PR opened tensorflow/text

Internal repo change

Internal repo change

+8 -3

0 comments

1 changed file

pr created time in 3 days

create branch tensorflow/text

branch : test_359676350

created branch time in 3 days

push event tensorflow/text

Rens

commit sha fe4475bb6ba9583ecf6ec1d2525cfd972af5d1f5

Fix pip install command in readme. The correct pip install command is with a dash, not an underscore.

view details

Mark Daoust

commit sha 91d8087e5f13c66087970a4c2121da78f38bc4a6

A tensorflow.org compatible docs generator for tf-text. PiperOrigin-RevId: 358272664

view details

Mark Daoust

commit sha 02ab0ae40620b7009729046afa84a0ed0da376ea

Formatting fixes for tensorflow.org

  • `>>>` blocks can't be in triple-backtick fences.
  • If you need a list in `Returns:`, format it the same way as an arg-list.
  • Avoid using indentation to denote sections - this may accidentally trigger markdown's (bad) "4-space indent is code formatted" rule.

PiperOrigin-RevId: 358480894

view details

thuang513

commit sha 4f12da539b16ea5d76eabfae79be84bd76147b3f

Merge pull request #526 from RensDimmendaal/patch-1 Fix pip install command in readme

view details

Rigel Swavely

commit sha 3d5c0fd29fe3d13e02256f4b583f08bd43fcdb99

Sample random tokens correctly during MLM. PiperOrigin-RevId: 359374836

view details

Yanhui Liang

commit sha 7e115ff94c06bf84b6ab69d4353b7a60f1eac64d

Move `monitoring.py` to `python/tools` dir. PiperOrigin-RevId: 354416703

view details

push time in 4 days

delete branch tensorflow/text

delete branch : test_359358056

delete time in 4 days

PR merged tensorflow/text

Sample random tokens correctly during MLM. cla: yes

Sample random tokens correctly during MLM.
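
For background, the random-token step here refers to the standard BERT masking scheme (80% mask / 10% random token / 10% unchanged). A minimal sketch of that scheme, not the tensorflow_text implementation; corrupt_selected_tokens is a made-up name:

import tensorflow as tf

def corrupt_selected_tokens(token_ids, mask_id, vocab_size):
    # For each already-selected position: 80% -> [MASK], 10% -> a random
    # vocabulary token, 10% -> left unchanged.
    r = tf.random.uniform(tf.shape(token_ids))
    random_ids = tf.random.uniform(tf.shape(token_ids), maxval=vocab_size, dtype=token_ids.dtype)
    masked = tf.where(r < 0.8, tf.fill(tf.shape(token_ids), tf.cast(mask_id, token_ids.dtype)), token_ids)
    return tf.where((r >= 0.8) & (r < 0.9), random_ids, masked)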

+3 -2

0 comments

2 changed files

tf-text-github-robot

pr closed time in 4 days

push event tensorflow/text

Rigel Swavely

commit sha 3d5c0fd29fe3d13e02256f4b583f08bd43fcdb99

Sample random tokens correctly during MLM. PiperOrigin-RevId: 359374836

view details

push time in 4 days

push event tensorflow/text

Rigel Swavely

commit sha 4b7fd29063ac10896a5d6a85d5cdc2884d766ea1

Sample random tokens correctly during MLM. PiperOrigin-RevId: 359358056

view details

push time in 4 days

push event tensorflow/text

Rigel Swavely

commit sha dc3e6c9a206ac6c57610cea1702cf836f8ffe0af

Sample random tokens correctly during MLM. PiperOrigin-RevId: 359358056

view details

push time in 4 days

push event tensorflow/text

Rigel Swavely

commit sha 91b8f4d251fac9993f607787d74eec522df8addd

Sample random tokens correctly during MLM. PiperOrigin-RevId: 359358056

view details

push time in 4 days

PR opened tensorflow/text

Sample random tokens correctly during MLM.

Sample random tokens correctly during MLM.

+2 -1

0 comments

1 changed file

pr created time in 4 days

create branch tensorflow/text

branch : test_359358056

created branch time in 4 days

issue comment tensorflow/text

[Question] Pool bert subwords back to word level?

I wrote my own solution for this:

The following custom layer merges subwords using an extra input that indicates which subwords belong together.

import numpy as np
import tensorflow as tf


class MergeSubwordsLayer(tf.keras.layers.Layer):
    """Merges consecutive subword embeddings to form full-word embeddings."""

    def __init__(self, **kwargs):
        super(MergeSubwordsLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        super(MergeSubwordsLayer, self).build(input_shape)

    def _merge_subwords(self, subword_vectors, full_word_indexes):
        # full_word_indexes has shape (batch, max_words, max_tokens), padded with -1.
        ragged_fw_indexes = tf.RaggedTensor.from_tensor(full_word_indexes, padding=-1, ragged_rank=2)

        # Gather each full word's subword vectors: (batch, max_words, ragged, dim).
        fullword_vectors = tf.gather(subword_vectors, ragged_fw_indexes, batch_dims=1)

        # Reduce subwords by sum; mean or any other reduction also works.
        fullword_vectors = tf.math.reduce_sum(fullword_vectors, axis=-2).to_tensor()

        return fullword_vectors

    def call(self, subword_vectors, full_word_indexes):
        fullword_embeddings = self._merge_subwords(subword_vectors, full_word_indexes)

        # Restore a static shape so downstream layers can infer their dimensions.
        batch_size, _, embedding_dim = subword_vectors.shape
        _, num_fullwords, _ = full_word_indexes.shape
        fullword_embeddings.set_shape((batch_size, num_fullwords, embedding_dim))

        return fullword_embeddings

    def get_config(self):
        config = {}
        config.update(super(MergeSubwordsLayer, self).get_config())
        return config

This input is full_word_indexes. For an example sentence it looks like:

tokens: ['joseph', 'harold', 'greenberg', 'may', '28', '1915', 'may', '7', '2001', 'was', 'an']
subtokens: ['joseph', 'har', '##old', 'green', '##berg', 'may', '2', '##8', '1', '##9', '##1', '##5', 'may', '7', '2', '##0', '##0', '##1', 'was', 'an']
full_word_indexes: [[0] [1 2] [3 4] [5] [6 7] [8 9 10 11] [12] [13] [14 15 16 17] [18] [19]]

However, since I'm doing this inside a tf.py_function, I need to return a NumPy array with a dense shape. So it looks something like:

# curr_index and count must start out defined; -1 and 0 are the values the
# loop implies (curr_index is incremented before its first use).
curr_index = -1
count = 0
full_word_indexes = np.zeros((self.max_words_len, self.max_tokens_len), dtype=np.int32) - 1
for i, subtoken in enumerate(subtokens):
    if subtoken[:2] != '##':
        curr_index += 1
        count = 0
    full_word_indexes[curr_index][count] = i
    count += 1

Where self.max_words_len is the maximum number of full words in an input sentence and self.max_tokens_len is the maximum number of subword tokens in an input sentence.

And here's a small sample of how to use it:

from tensorflow.keras.layers import Conv1D, Dropout, Input
from tensorflow.keras.models import Model

inp = Input((self.max_tokens_len, ), dtype=tf.int32, name='sub_tokens')
inp_full_word_indexes = Input((self.max_words_len, self.max_tokens_len), dtype=tf.int32, name='full_word_indexes')

# bert_model, dropout_rate and self.vocab_size are defined elsewhere in my code.
var = bert_model(inp)
var = Dropout(dropout_rate)(var)
full_var = MergeSubwordsLayer()(var, inp_full_word_indexes)
full_var = Conv1D(self.vocab_size, kernel_size=1, activation='softmax', name='full_out')(full_var)

inputs = {
    'sub_tokens': inp,
    'full_word_indexes': inp_full_word_indexes
}

outputs = {
    'full_out': full_var,
}

model = Model(inputs, outputs, name='word_vec')

Notes:

  • Using tf.py_function affects the performance of the data generator; try to avoid it (see the sketch after these notes).
  • The layer uses tf.gather, which is very fast during training/execution, so it won't slow down your training times.
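
For completeness, a minimal sketch of one way to derive those indexes with plain TF ops instead of tf.py_function. It assumes the standard WordPiece '##' continuation prefix; full_word_row_ids is a made-up name, not part of tensorflow_text:

import tensorflow as tf

def full_word_row_ids(subtokens):
    # subtokens: 1-D string tensor of wordpiece tokens for one example.
    # A token starts a new full word unless it begins with '##'.
    starts_word = tf.logical_not(tf.strings.regex_full_match(subtokens, r'##.*'))
    # Word id per subtoken: running count of word starts, minus one.
    return tf.cumsum(tf.cast(starts_word, tf.int32)) - 1

subtokens = tf.constant(['joseph', 'har', '##old', 'green', '##berg'])
word_ids = full_word_row_ids(subtokens)  # [0, 1, 1, 2, 2]
# Group subtoken positions by word id into a ragged index map: [[0], [1, 2], [3, 4]]
indexes = tf.ragged.stack_dynamic_partitions(
    tf.range(tf.size(subtokens)), word_ids, tf.reduce_max(word_ids) + 1)
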
r-wheeler

comment created time in 4 days

issue commenttensorflow/text

TextVectorization layer vs TensorFlow Text

May I ask what's the difference between Tokenizer (fit_on_texts, texts_to_sequences and pad_sequences) and the TextVectorization layer?

It seems they're doing the same thing.
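
For concreteness, a minimal sketch of the two APIs being compared: both map raw strings to padded integer sequences, but TextVectorization does it as a layer that can run inside the model graph. Module paths are as in recent TF 2.x; in older versions the layer lives under tf.keras.layers.experimental.preprocessing:

import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ['the cat sat', 'the cat sat on the mat']

# Classic Keras preprocessing: fit, convert, pad - all outside the model.
tok = Tokenizer()
tok.fit_on_texts(texts)
padded = pad_sequences(tok.texts_to_sequences(texts), padding='post')

# TextVectorization: the same steps as an in-graph layer.
vec = tf.keras.layers.TextVectorization(output_mode='int', output_sequence_length=6)
vec.adapt(texts)
in_graph = vec(tf.constant(texts))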

dzlab

comment created time in 5 days

push event tensorflow/text

TF.Text Team

commit sha 1e51f07f76c12927f85565215abf2a3bc55c5382

internal PiperOrigin-RevId: 359015537

view details

push time in 5 days

PR opened tensorflow/text

INTERNAL

INTERNAL

+4 -1

0 comments

1 changed file

pr created time in 5 days

create branch tensorflow/text

branch : test_359015537

created branch time in 5 days

push event tensorflow/text

Rens

commit sha fe4475bb6ba9583ecf6ec1d2525cfd972af5d1f5

Fix pip install command in readme. The correct pip install command is with a dash, not an underscore.

view details

thuang513

commit sha 4f12da539b16ea5d76eabfae79be84bd76147b3f

Merge pull request #526 from RensDimmendaal/patch-1 Fix pip install command in readme

view details

push time in 7 days

PR merged tensorflow/text

Fix pip install command in readme cla: yes

I was trying to install this package, but I got errors. Only later did I find that the PyPI name of the package is tensorflow-text, not tensorflow_text. I hope this readme update will make it clear for future readers.
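
For reference, the corrected command from the readme fix:

pip install tensorflow-text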

+1 -1

2 comments

1 changed file

RensDimmendaal

pr closed time in 7 days

delete branch tensorflow/text

delete branch : test_358385108

delete time in 9 days

PR merged tensorflow/text

Formatting fixes for tensorflow.org cla: yes

Formatting fixes for tensorflow.org

  • >>> blocks can't be in triple-backtick fences.
  • If you need a list in Returns:, format it the same way as an arg-list.
  • Avoid using indentation to denote sections - this may accidentally trigger markdown's (bad) "4-space indent is code formatted" rule.
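
To illustrate the three rules above, a hypothetical docstring (the names are made up, not from the PR):

def tokenize(text):
    """Splits text into whitespace-delimited tokens.

    >>> tokenize('a b')
    ['a', 'b']

    Args:
      text: A string to split.

    Returns:
      tokens: A list of string tokens.
    """
    return text.split()
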
+224 -236

0 comments

10 changed files

tf-text-github-robot

pr closed time in 9 days

push event tensorflow/text

Mark Daoust

commit sha 02ab0ae40620b7009729046afa84a0ed0da376ea

Formatting fixes for tensorflow.org

  • `>>>` blocks can't be in triple-backtick fences.
  • If you need a list in `Returns:`, format it the same way as an arg-list.
  • Avoid using indentation to denote sections - this may accidentally trigger markdown's (bad) "4-space indent is code formatted" rule.

PiperOrigin-RevId: 358480894

view details

push time in 9 days
