Metrics exporter for Amazon CloudWatch
Convolutional Neural Networks
Ready-to-use OCR with 40+ languages supported including Chinese, Japanese, Korean and Thai
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dependency Scheduler; for Python, R, Julia, Scala, Go, JavaScript and more
Models and examples built with TensorFlow
PR merged tensorflow/text
Internal repo change
pr closed time in 2 days
push event tensorflow/text
commit sha a6253a2b47abab230d8e2a58fbe3a0e083725860
Internal repo change PiperOrigin-RevId: 359803101
push time in 2 days
issue closed tensorflow/text
Bert Preprocess Model not working on Windows 10
I have the same issue described here: error-with-using-bert-model-from-tensorflow
I get this exception when I try the BERT preprocessor on Windows 10:
Trying to access resource using the wrong type. Expected class tensorflow::lookup::LookupInterface got class tensorflow::lookup::LookupInterface
Stack trace
File "C:\work\vpython\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1100, in fit
tmp_logs = self.train_function(iterator)
File "C:\work\vpython\lib\site-packages\tensorflow\python\eager\def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "C:\work\vpython\lib\site-packages\tensorflow\python\eager\def_function.py", line 888, in _call
return self._stateless_fn(*args, **kwds)
File "C:\work\vpython\lib\site-packages\tensorflow\python\eager\function.py", line 2942, in __call__
return graph_function._call_flat(
File "C:\work\vpython\lib\site-packages\tensorflow\python\eager\function.py", line 1918, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "C:\work\vpython\lib\site-packages\tensorflow\python\eager\function.py", line 555, in call
outputs = execute.execute(
File "C:\work\vpython\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Trying to access resource using the wrong type. Expected class tensorflow::lookup::LookupInterface got class tensorflow::lookup::LookupInterface
[[{{node prediction/keras_layer_1/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/bert_tokenizer/StatefulPartitionedCall/WordpieceTokenizeWithOffsets/WordpieceTokenizeWithOffsets/WordpieceTokenizeWithOffsets}}]] [Op:__inference_train_function_52076]
Function call stack:
train_function
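For reference, a minimal sketch of the kind of setup that triggers the failure above, assuming the preprocessor is loaded from TF Hub (the exact model handle is my assumption, not named in the report):

```python
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401  # registers the custom ops the preprocessor needs

# Assumption: any BERT preprocess SavedModel from TF Hub; the report
# does not name the exact handle.
preprocess = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")

sentences = tf.keras.layers.Input(shape=(), dtype=tf.string)
encoder_inputs = preprocess(sentences)  # runs WordpieceTokenizeWithOffsets internally
model = tf.keras.Model(sentences, encoder_inputs["input_word_ids"])

# On the affected Windows setups, executing the wordpiece lookup op
# (e.g. inside fit/predict) raises the InvalidArgumentError shown above.
model.predict(tf.constant(["hello world"]))
```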
closed time in 2 days
push event tensorflow/text
commit sha eabb72cb03dc76efbb5027a3ec9beecf2b2354a3
Internal repo change PiperOrigin-RevId: 359676350
push time in 2 days
push event tensorflow/text
commit sha fe4475bb6ba9583ecf6ec1d2525cfd972af5d1f5
Fix pip install command in readme. The correct pip install command is with a dash, not an underscore.
commit sha 91d8087e5f13c66087970a4c2121da78f38bc4a6
A tensorflow.org compatible docs generator for tf-text. PiperOrigin-RevId: 358272664
commit sha 02ab0ae40620b7009729046afa84a0ed0da376ea
Formatting fixes for tensorflow.org - `>>>` blocks can't be in triple-backtick fences. - If you need a list in `Returns:`, format it the same way as an arg-list. - Avoid using indentation to denote sections - this may accidentally trigger markdown's (bad) "4-space indent is code formatted" rule. PiperOrigin-RevId: 358480894
commit sha 4f12da539b16ea5d76eabfae79be84bd76147b3f
Merge pull request #526 from RensDimmendaal/patch-1 Fix pip install command in readme
commit sha 3d5c0fd29fe3d13e02256f4b583f08bd43fcdb99
Sample random tokens correctly during MLM. PiperOrigin-RevId: 359374836
commit sha 7e115ff94c06bf84b6ab69d4353b7a60f1eac64d
Move `monitoring.py` to `python/tools` dir. PiperOrigin-RevId: 354416703
push time in 4 days
PR merged tensorflow/text
Sample random tokens correctly during MLM.
pr closed time in 4 days
push event tensorflow/text
commit sha 3d5c0fd29fe3d13e02256f4b583f08bd43fcdb99
Sample random tokens correctly during MLM. PiperOrigin-RevId: 359374836
push time in 4 days
push event tensorflow/text
commit sha 4b7fd29063ac10896a5d6a85d5cdc2884d766ea1
Sample random tokens correctly during MLM. PiperOrigin-RevId: 359358056
push time in 4 days
push event tensorflow/text
commit sha dc3e6c9a206ac6c57610cea1702cf836f8ffe0af
Sample random tokens correctly during MLM. PiperOrigin-RevId: 359358056
push time in 4 days
push event tensorflow/text
commit sha 91b8f4d251fac9993f607787d74eec522df8addd
Sample random tokens correctly during MLM. PiperOrigin-RevId: 359358056
push time in 4 days
PR opened tensorflow/text
Sample random tokens correctly during MLM.
pr created time in 4 days
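The commit title gives no detail on the fix itself, so for context here is only the standard BERT-style recipe for sampling random tokens during MLM, not the tf-text implementation: of the positions selected for prediction, 80% are replaced with the mask token, 10% with a uniformly sampled vocabulary token, and 10% are left unchanged.

```python
import numpy as np

def corrupt_for_mlm(token_ids, mask_id, vocab_size, select_rate=0.15, seed=None):
    """BERT-style MLM corruption sketch (names and rates are the usual defaults)."""
    rng = np.random.default_rng(seed)
    token_ids = np.asarray(token_ids)
    selected = rng.random(token_ids.shape) < select_rate  # positions to predict
    roll = rng.random(token_ids.shape)
    out = token_ids.copy()
    out[selected & (roll < 0.8)] = mask_id                # 80%: [MASK]
    rand_pos = selected & (roll >= 0.8) & (roll < 0.9)    # 10%: random token
    out[rand_pos] = rng.integers(0, vocab_size, size=int(rand_pos.sum()))
    return out, selected                                  # remaining 10%: unchanged
```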
issue comment tensorflow/text
[Question] Pool bert subwords back to word level?
I wrote my own solution for this: the custom layer below merges subword embeddings using an extra input that tells which subwords belong together.
import numpy as np
import tensorflow as tf

class MergeSubwordsLayer(tf.keras.layers.Layer):
    """Merges consecutive subword embeddings to form fullword embeddings."""

    def __init__(self, **kwargs):
        super(MergeSubwordsLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        super(MergeSubwordsLayer, self).build(input_shape)

    def _merge_subwords(self, subword_vectors, full_word_indexes):
        # Drop the -1 padding so each full word keeps only its real subword indexes.
        ragged_fw_indexes = tf.RaggedTensor.from_tensor(
            full_word_indexes, padding=-1, ragged_rank=2)
        # Gather each word's subword vectors within its own batch element.
        fullword_vectors = tf.gather(subword_vectors, ragged_fw_indexes, batch_dims=1)
        # Reduce subwords by sum, mean or any other operation
        fullword_vectors = tf.math.reduce_sum(fullword_vectors, axis=-2).to_tensor()
        return fullword_vectors

    def call(self, subword_vectors, full_word_indexes):
        fullword_embeddings = self._merge_subwords(subword_vectors, full_word_indexes)
        batch_size, _, embedding_dim = subword_vectors.shape
        _, num_fullwords, _ = full_word_indexes.shape
        fullword_embeddings.set_shape((batch_size, num_fullwords, embedding_dim))
        return fullword_embeddings

    def get_config(self):
        config = {}
        config.update(super(MergeSubwordsLayer, self).get_config())
        return config
The extra input is full_word_indexes, which looks like this:

tokens: ['joseph', 'harold', 'greenberg', 'may', '28', '1915', 'may', '7', '2001', 'was', 'an']
subtokens: ['joseph', 'har', '##old', 'green', '##berg', 'may', '2', '##8', '1', '##9', '##1', '##5', 'may', '7', '2', '##0', '##0', '##1', 'was', 'an']
full_word_indexes: [[0] [1 2] [3 4] [5] [6 7] [8 9 10 11] [12] [13] [14 15 16 17] [18] [19]]
However, since I'm doing this inside a tf.py_function, I need to return a NumPy array with a dense shape. So it looks something like:
# -1 marks padding; real entries are subtoken positions.
full_word_indexes = np.zeros((self.max_words_len, self.max_tokens_len), dtype=np.int32) - 1
curr_index = -1  # index of the current full word
for i, subtoken in enumerate(subtokens):
    if subtoken[:2] != '##':  # a new full word starts here
        curr_index += 1
        count = 0
    full_word_indexes[curr_index][count] = i
    count += 1
Where self.max_words_len is the maximum number of full words in an input sentence and self.max_tokens_len is the maximum number of subword tokens in an input sentence.
And here's a small sample of how to use it:
from tensorflow.keras.layers import Conv1D, Dropout, Input
from tensorflow.keras.models import Model

inp = Input((self.max_tokens_len,), dtype=tf.int32, name='sub_tokens')
inp_full_word_indexes = Input((self.max_words_len, self.max_tokens_len), dtype=tf.int32, name='full_word_indexes')
var = bert_model(inp)
var = Dropout(dropout_rate)(var)
full_var = MergeSubwordsLayer()(var, inp_full_word_indexes)
full_var = Conv1D(self.vocab_size, kernel_size=1, activation='softmax', name='full_out')(full_var)
inputs = {
    'sub_tokens': inp,
    'full_word_indexes': inp_full_word_indexes,
}
outputs = {
    'full_out': full_var,
}
model = Model(inputs, outputs, name='word_vec')
Notes:
- Using tf.py_function affects the performance of the data generator. Try to avoid this.
- The layer uses tf.gather, which is super fast during training/execution. It won't slow down your training times.
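A quick way to sanity-check the gather-and-reduce step on toy data (shapes and values are made up for illustration):

```python
import tensorflow as tf

# One sentence, five subword vectors of dim 2.
subword_vectors = tf.constant([[[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]]])
# Three full words, padded with -1: word 0 = subword 0,
# word 1 = subwords 1+2, word 2 = subwords 3+4.
full_word_indexes = tf.constant([[[0, -1], [1, 2], [3, 4]]])

ragged = tf.RaggedTensor.from_tensor(full_word_indexes, padding=-1, ragged_rank=2)
gathered = tf.gather(subword_vectors, ragged, batch_dims=1)
merged = tf.math.reduce_sum(gathered, axis=-2).to_tensor()
print(merged)  # [[[1. 1.] [5. 5.] [9. 9.]]]
```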
comment created time in 4 days
issue comment tensorflow/text
TextVectorization layer vs TensorFlow Text
May I ask what's the difference between Tokenizer (fit_on_texts, texts_to_sequences and pad_sequences) and the TextVectorization layer?
It seems they're doing the same thing.
comment created time in 5 days
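Not an answer from the maintainers, but the overlap the question points at can be seen side by side: both produce padded integer id sequences, while TextVectorization runs in-graph as a layer and can be saved with the model. A minimal sketch (the experimental import path matches TF 2.x releases of that era):

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["the cat sat", "the cat sat on the mat"]

# Classic Keras Tokenizer: three separate Python-side steps.
tok = Tokenizer()
tok.fit_on_texts(texts)
padded = pad_sequences(tok.texts_to_sequences(texts), maxlen=6, padding='post')

# TextVectorization: the same pipeline as a single in-graph layer.
vectorize = tf.keras.layers.experimental.preprocessing.TextVectorization(
    output_sequence_length=6)
vectorize.adapt(texts)
vectorized = vectorize(tf.constant(texts))
```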
push event tensorflow/text
commit sha 1e51f07f76c12927f85565215abf2a3bc55c5382
internal PiperOrigin-RevId: 359015537
push time in 5 days
issue comment tensorflow/text
As a note, this also breaks the English-to-Portuguese Transformer tutorial.
comment created time in 6 days
push event tensorflow/text
commit sha fe4475bb6ba9583ecf6ec1d2525cfd972af5d1f5
Fix pip install command in readme. The correct pip install command is with a dash, not an underscore.
commit sha 4f12da539b16ea5d76eabfae79be84bd76147b3f
Merge pull request #526 from RensDimmendaal/patch-1 Fix pip install command in readme
push time in 7 days
PR merged tensorflow/text
Fix pip install command in readme
I was trying to install this package, but I got errors. Only later did I find that the PyPI name of the package is tensorflow-text, not tensorflow_text. I hope this readme update makes it clear for future readers.
pr closed time in 7 days
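For reference, the install spelling the readme now documents (the reporter says the underscore form errored for them):

```
pip install tensorflow-text   # the PyPI package name uses a dash
```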
PR merged tensorflow/text
Formatting fixes for tensorflow.org
- `>>>` blocks can't be in triple-backtick fences.
- If you need a list in `Returns:`, format it the same way as an arg-list.
- Avoid using indentation to denote sections; this may accidentally trigger markdown's (bad) "4-space indent is code formatted" rule.
pr closed time in 9 days
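For illustration, a hypothetical docstring that follows those three rules (the function and text are made up): the `>>>` example sits outside any fence, the `Returns:` list mirrors the `Args:` layout, and sections are marked by headers rather than indentation.

```python
def tokenize(text):
  """Splits text into whitespace-delimited tokens.

  >>> tokenize("hello world")
  ['hello', 'world']

  Args:
    text: A Python string.

  Returns:
    tokens: A list of string tokens.
  """
  return text.split()
```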
push event tensorflow/text
commit sha 02ab0ae40620b7009729046afa84a0ed0da376ea
Formatting fixes for tensorflow.org - `>>>` blocks can't be in triple-backtick fences. - If you need a list in `Returns:`, format it the same way as an arg-list. - Avoid using indentation to denote sections - this may accidentally trigger markdown's (bad) "4-space indent is code formatted" rule. PiperOrigin-RevId: 358480894
push time in 9 days