Sayak Paul (sayakpaul) · Kolkata, India · sayak.dev · Data Science Educator | Intel Software Innovator | GDE in ML

issue comment tensorflow/tensorflow

Library Conversion: TensorRT

@aaroey I tried running your example in this Colab notebook but, surprisingly, the Colab runtime crashes. Your inputs here would be very helpful.

dynamicwebpaige

comment created time in 12 hours

Pull request review comment tensorflow/docs

Removing redundant use of np.array & unnecessary spaces

         "\n",
         "Let's demonstrate how you can make a neural network \"dream\" and enhance the surreal patterns it sees in an image.\n",
         "\n",
-        "![Dogception](images/dogception.png)"
+        "![Dogception](https://github.com/sayakpaul/docs/blob/master/site/en/tutorials/generative/images/dogception.png?raw=1)"

@yashk2810 could you provide your review on this?

sayakpaul

comment created time in 13 hours

started raghavbali/appliedml_workshop_dhs_av_2019

started time in 13 hours

issue comment pytorch/pytorch

Can not use .cuda() function to load the model into GPU using Pytorch 1.3

@soumith thanks much for the clarification!

phongnhhn92

comment created time in 13 hours

issue comment pytorch/pytorch

Can not use .cuda() function to load the model into GPU using Pytorch 1.3

@soumith yeah! But I am curious: Colab also has CUDA 10.0, yet PyTorch 1.3.1 still runs there.

phongnhhn92

comment created time in a day

started dipanjanS/deep_transfer_learning_nlp_dhs2019

started time in a day

started ShichenLiu/SoftRas

started time in a day

issue comment pytorch/pytorch

Can not use .cuda() function to load the model into GPU using Pytorch 1.3

@soumith it's a P100. I am seeing:

The NVIDIA driver on your system is too old (found version 10000).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.
phongnhhn92

comment created time in 2 days

issue comment pytorch/pytorch

Can not use .cuda() function to load the model into GPU using Pytorch 1.3

I am facing this issue on my GCP instance, which is equipped with CUDA 10.0. Is anyone else also facing the same on a GCP instance?

Also, out of curiosity, I ran !nvcc --version in a Colab notebook and found that the CUDA version there is also 10.0, yet PyTorch 1.3.1 runs successfully.

Anything on this? :(
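(Not part of the original thread — a minimal diagnostic sketch, assuming only that torch is installed, for checking whether the "driver is too old" message comes from a mismatch between the CUDA toolkit the PyTorch wheel was built against and the driver on the machine:)

import subprocess
import torch

print("PyTorch:", torch.__version__)              # e.g. 1.3.1
print("Built against CUDA:", torch.version.cuda)  # toolkit the wheel was compiled with
print("CUDA available:", torch.cuda.is_available())
# The driver, and the highest CUDA version it supports, is reported by nvidia-smi.
print(subprocess.check_output(["nvidia-smi"]).decode())

If the wheel's CUDA version is newer than what the driver supports, installing the wheel built for the machine's CUDA version (or updating the driver) should resolve it.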

phongnhhn92

comment created time in 2 days

Pull request review comment tensorflow/docs

Removing redundant use of np.array & unnecessary spaces

         "\n",
         "Let's demonstrate how you can make a neural network \"dream\" and enhance the surreal patterns it sees in an image.\n",
         "\n",
-        "![Dogception](images/dogception.png)"
+        "![Dogception](https://github.com/sayakpaul/docs/blob/master/site/en/tutorials/generative/images/dogception.png?raw=1)"

tensorflow.org's .markdown parser doesn't recognize lists unless they have a blank line before them.

@MarkDaoust thanks for passing that along :)

sayakpaul

comment created time in 4 days

started gregversteeg/corex_topic

started time in 4 days

issue comment pytorch/pytorch

Can not use .cuda() function to load the model into GPU using Pytorch 1.3

I am facing this issue on my GCP instance, which is equipped with CUDA 10.0. Is anyone else also facing the same on a GCP instance?

Also, out of curiosity, I ran !nvcc --version in a Colab notebook and found that the CUDA version there is also 10.0, yet PyTorch 1.3.1 runs successfully.

phongnhhn92

comment created time in 5 days

started mnielsen/neural-networks-and-deep-learning

started time in 5 days

started AxeldeRomblay/MLBox

started time in 5 days

started ThoughtWorksInc/continuous-intelligence-workshop

started time in 5 days

started jantic/DeOldify

started time in 6 days

issue comment facebookresearch/detectron2

Problem with register_coco_instances while registering a COCO dataset

Thanks for your help throughout @ppwwyyxx. I was able to get the model to train:

[11/10 09:26:37 d2.engine.train_loop]: Starting training from iteration 0
[11/10 09:27:05 d2.utils.events]: eta: 0:06:23  iter: 19  total_loss: 5.656  loss_cls: 4.131  loss_box_reg: 0.188  loss_mask: 0.693  loss_rpn_cls: 0.531  loss_rpn_loc: 0.048  time: 1.3740  data_time: 0.0053  lr: 0.000005  max_mem: 2446M
[11/10 09:27:32 d2.utils.events]: eta: 0:05:45  iter: 39  total_loss: 5.314  loss_cls: 3.944  loss_box_reg: 0.308  loss_mask: 0.692  loss_rpn_cls: 0.208  loss_rpn_loc: 0.032  time: 1.3528  data_time: 0.0047  lr: 0.000010  max_mem: 2446M
[11/10 09:27:59 d2.utils.events]: eta: 0:05:21  iter: 59  total_loss: 5.189  loss_cls: 3.548  loss_box_reg: 0.281  loss_mask: 0.690  loss_rpn_cls: 0.457  loss_rpn_loc: 0.047  time: 1.3535  data_time: 0.0049  lr: 0.000015  max_mem: 2446M
[11/10 09:28:25 d2.utils.events]: eta: 0:04:56  iter: 79  total_loss: 4.186  loss_cls: 2.773  loss_box_reg: 0.151  loss_mask: 0.687  loss_rpn_cls: 0.318  loss_rpn_loc: 0.028  time: 1.3474  data_time: 0.0047  lr: 0.000020  max_mem: 2446M
[11/10 09:28:52 d2.utils.events]: eta: 0:04:30  iter: 99  total_loss: 3.981  loss_cls: 2.038  loss_box_reg: 0.327  loss_mask: 0.686  loss_rpn_cls: 0.427  loss_rpn_loc: 0.053  time: 1.3479  data_time: 0.0043  lr: 0.000025  max_mem: 2471M
[11/10 09:29:21 d2.utils.events]: eta: 0:04:04  iter: 119  total_loss: 2.759  loss_cls: 1.108  loss_box_reg: 0.179  loss_mask: 0.684  loss_rpn_cls: 0.427  loss_rpn_loc: 0.050  time: 1.3643  data_time: 0.0051  lr: 0.000030  max_mem: 2552M
[11/10 09:29:48 d2.utils.events]: eta: 0:03:37  iter: 139  total_loss: 2.177  loss_cls: 0.762  loss_box_reg: 0.128  loss_mask: 0.678  loss_rpn_cls: 0.422  loss_rpn_loc: 0.057  time: 1.3598  data_time: 0.0047  lr: 0.000035  max_mem: 2552M
[11/10 09:30:14 d2.utils.events]: eta: 0:03:09  iter: 159  total_loss: 2.534  loss_cls: 0.803  loss_box_reg: 0.077  loss_mask: 0.670  loss_rpn_cls: 0.375  loss_rpn_loc: 0.076  time: 1.3544  data_time: 0.0045  lr: 0.000040  max_mem: 2552M
[11/10 09:30:42 d2.utils.events]: eta: 0:02:42  iter: 179  total_loss: 1.567  loss_cls: 0.507  loss_box_reg: 0.053  loss_mask: 0.656  loss_rpn_cls: 0.212  loss_rpn_loc: 0.043  time: 1.3572  data_time: 0.0047  lr: 0.000045  max_mem: 2552M
[11/10 09:31:09 d2.utils.events]: eta: 0:02:15  iter: 199  total_loss: 1.537  loss_cls: 0.516  loss_box_reg: 0.068  loss_mask: 0.658  loss_rpn_cls: 0.229  loss_rpn_loc: 0.043  time: 1.3580  data_time: 0.0045  lr: 0.000050  max_mem: 2552M
[11/10 09:31:37 d2.utils.events]: eta: 0:01:49  iter: 219  total_loss: 1.717  loss_cls: 0.639  loss_box_reg: 0.008  loss_mask: 0.653  loss_rpn_cls: 0.169  loss_rpn_loc: 0.022  time: 1.3586  data_time: 0.0048  lr: 0.000055  max_mem: 2552M
[11/10 09:32:04 d2.utils.events]: eta: 0:01:22  iter: 239  total_loss: 1.438  loss_cls: 0.479  loss_box_reg: 0.024  loss_mask: 0.632  loss_rpn_cls: 0.168  loss_rpn_loc: 0.043  time: 1.3592  data_time: 0.0044  lr: 0.000060  max_mem: 2552M
[11/10 09:32:31 d2.utils.events]: eta: 0:00:55  iter: 259  total_loss: 2.169  loss_cls: 0.794  loss_box_reg: 0.052  loss_mask: 0.626  loss_rpn_cls: 0.350  loss_rpn_loc: 0.093  time: 1.3583  data_time: 0.0043  lr: 0.000065  max_mem: 2552M
[11/10 09:32:59 d2.utils.events]: eta: 0:00:28  iter: 279  total_loss: 1.572  loss_cls: 0.559  loss_box_reg: 0.047  loss_mask: 0.605  loss_rpn_cls: 0.213  loss_rpn_loc: 0.037  time: 1.3609  data_time: 0.0043  lr: 0.000070  max_mem: 2552M
[11/10 09:33:26 d2.utils.events]: eta: 0:00:01  iter: 299  total_loss: 1.832  loss_cls: 0.683  loss_box_reg: 0.170  loss_mask: 0.570  loss_rpn_cls: 0.196  loss_rpn_loc: 0.041  time: 1.3593  data_time: 0.0043  lr: 0.000075  max_mem: 2552M
[11/10 09:33:27 d2.engine.hooks]: Overall training speed: 297 iterations in 0:06:45 (1.3639 s / it)
[11/10 09:33:27 d2.engine.hooks]: Total training time: 0:06:46 (0:00:01 on hooks)
OrderedDict()

But I am still confused about why the model does not infer anything. I have updated the Colab notebook with minimal code to reproduce the issue. I have also updated the notebook with TensorBoard.
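(An illustrative sketch, not from the original thread: one common reason a freshly trained model appears to predict nothing is that the score threshold used at inference time filters everything out. Here cfg and image are assumed to be the training config and a BGR test image from the notebook.)

import os
from detectron2.engine import DefaultPredictor

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # weights written by the trainer
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.3  # a model trained for only ~300 iterations rarely produces high-confidence boxes
predictor = DefaultPredictor(cfg)
outputs = predictor(image)  # image: BGR numpy array, e.g. loaded with cv2.imread
print(len(outputs["instances"]), "instances detected")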

sayakpaul

comment created time in 7 days

issue comment facebookresearch/detectron2

Problem with register_coco_instances while registering a COCO dataset

@ppwwyyxx I updated it to a tuple but it still does not help.

sayakpaul

comment created time in 8 days

started pallets/click

started time in 8 days

PublicEvent

started deezer/spleeter

started time in 9 days

PR opened wandb/gitbook

Adding parenthesis after `run.history`
+1 -1

0 comment

1 changed file

pr created time in 9 days

push event sayakpaul/gitbook

Sayak Paul

commit sha 3e59fd31831e29794614be77d681f97a48196b7f

Adding parenthesis after `run.history`

view details

push time in 9 days

fork sayakpaul/gitbook

Documentation synced with GitBook

fork in 9 days

Pull request review comment tensorflow/docs

Removing redundant use of np.array & unnecessary spaces

         "\n",
         "Let's demonstrate how you can make a neural network \"dream\" and enhance the surreal patterns it sees in an image.\n",
         "\n",
-        "![Dogception](images/dogception.png)"
+        "![Dogception](https://github.com/sayakpaul/docs/blob/master/site/en/tutorials/generative/images/dogception.png?raw=1)"

@lamberta any updates on this?

sayakpaul

comment created time in 9 days

started google/active-learning

started time in 10 days

started openai/gpt-2-output-dataset

started time in 11 days

issue opened facebookresearch/detectron2

Problem with register_coco_instances while registering a COCO dataset

Hi, I am following this getting-started Colab notebook. I am trying to train a custom model on the TACO dataset, which comes as a COCO-formatted dataset.

I prepared this Colab notebook for running the experiments with the dataset. After registering the dataset with register_coco_instances, I am not able to start the training process, and the error I get looks like this:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/content/detectron2_repo/detectron2/data/catalog.py in get(name)
     51         try:
---> 52             f = DatasetCatalog._REGISTERED[name]
     53         except KeyError:

KeyError: 'd'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
6 frames
/content/detectron2_repo/detectron2/data/catalog.py in get(name)
     54             raise KeyError(
     55                 "Dataset '{}' is not registered! Available datasets are: {}".format(
---> 56                     name, ", ".join(DatasetCatalog._REGISTERED.keys())
     57                 )
     58             )

KeyError: "Dataset 'd' is not registered! Available datasets are: coco_2014_train, coco_2014_val, coco_2014_minival, coco_2014_minival_100, coco_2014_valminusminival, coco_2017_train, coco_2017_val, coco_2017_val_100, keypoints_coco_2014_train, keypoints_coco_2014_val, keypoints_coco_2014_minival, keypoints_coco_2014_valminusminival, keypoints_coco_2014_minival_100, keypoints_coco_2017_train, keypoints_coco_2017_val, keypoints_coco_2017_val_100, coco_2017_train_panoptic_separated, coco_2017_train_panoptic_stuffonly, coco_2017_val_panoptic_separated, coco_2017_val_panoptic_stuffonly, coco_2017_val_100_panoptic_separated, coco_2017_val_100_panoptic_stuffonly, lvis_v0.5_train, lvis_v0.5_val, lvis_v0.5_val_rand_100, lvis_v0.5_test, cityscapes_fine_instance_seg_train, cityscapes_fine_sem_seg_train, cityscapes_fine_instance_seg_val, cityscapes_fine_sem_seg_val, cityscapes_fine_instance_seg_test, cityscapes_fine_sem_seg_test, voc_2007_trainval, voc_2007_train, voc_2007_val, voc_2007_test, voc_2012_trainval, voc_2012_train, voc_2012_val, my_dataset, taco_dataset"

The above-mentioned notebook can be used to reproduce the issue.
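(Not from the original report, but worth noting: a KeyError on a single character such as 'd' typically appears when a bare string is assigned where Detectron2 expects a tuple or list of dataset names, so the string gets iterated character by character. A minimal sketch with hypothetical paths:)

from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances

register_coco_instances("taco_dataset", {}, "data/annotations.json", "data/images")  # hypothetical paths

cfg = get_cfg()
# DATASETS.TRAIN must be a tuple/list of names; a bare string like "my_dataset" would be
# iterated character by character, producing lookups such as DatasetCatalog.get('d').
cfg.DATASETS.TRAIN = ("taco_dataset",)
cfg.DATASETS.TEST = ()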

created time in 11 days

started pedropro/TACO

started time in 11 days

started facebookresearch/detectron2

started time in 11 days

issue closed GoogleCloudPlatform/training-data-analyst

Number of samples not getting matched in TPU distribution strategy

Hi @martin-gorner. I am following your notebook keras_flowers_gputputpupod_tf2.1.ipynb. I was able to follow this example pretty smoothly; thank you for putting it together and sharing it :)

I wanted to try it on a pet project I was working on. I first defined the strategy (after all the initial configuration on GCP):

# Detect hardware, return appropriate distribution strategy
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    strategy = tf.distribute.get_strategy()

print("REPLICAS: ", strategy.num_replicas_in_sync)

You can verify the output in the following as well:

[screenshot of the REPLICAS output]

My training data is in the form of NumPy arrays, and after compiling the model in the way you had shown in the notebook, I am calling fit on the compiled model:

compiled_model.fit(train_features, y_train_binarized,
                        class_weight=class_weight,
                        steps_per_epoch=len(train_features)//8 * strategy.num_replicas_in_sync,
                        epochs=15,
                        batch_size=8 * strategy.num_replicas_in_sync,
                        validation_split=0.1)

The shape of train_features is (26152, 300) and the model's input is defined accordingly. When the above is run, it gives me:

ValueError                                Traceback (most recent call last)
<ipython-input-29-61629493fd70> in <module>
      4                         epochs=15,
      5                         batch_size=8 * strategy.num_replicas_in_sync,
----> 6                         validation_split=0.1)

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    783         max_queue_size=max_queue_size,
    784         workers=workers,
--> 785         use_multiprocessing=use_multiprocessing)
    786 
    787   def evaluate(self,

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/engine/training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    210         steps_per_epoch,
    211         ModeKeys.TRAIN,
--> 212         validation_split=validation_split)
    213     dist_utils.validate_callbacks(input_callbacks=callbacks,
    214                                   optimizer=model.optimizer)

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/distribute/distributed_training_utils.py in process_batch_and_step_size(strategy, inputs, batch_size, steps_per_epoch, mode, validation_split)
    467     # relax the constraint to consume all the training samples.
    468     steps_per_epoch, batch_size = get_input_params(
--> 469         strategy, num_samples, steps_per_epoch, batch_size, mode=mode)
    470   return batch_size, steps_per_epoch
    471 

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/distribute/distributed_training_utils.py in get_input_params(distribution_strategy, num_samples, steps, batch_size, mode)
    559         raise ValueError('Number of samples %s is less than samples required '
    560                          'for specified batch_size %s and steps %s' % (
--> 561                              num_samples, global_batch_size, steps))
    562 
    563   # We need to return the per replica or global batch size based on the strategy

ValueError: Number of samples 23536 is less than samples required for specified batch_size 64 and steps 408

The number of samples in the above log is different from the original and I am not able to figure out why. You can find the files here in order to reproduce the issue:

Archive.zip

closed time in 12 days

sayakpaul

issue comment GoogleCloudPlatform/training-data-analyst

Number of samples not getting matched in TPU distribution strategy

I think that in your case, you assumed the number of data elements available for training was len(train_features) when in fact it is len(train_features) * (1 - validation_split)

Correct, @martin-gorner. Thanks for pointing that out.
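(For reference — not part of the original exchange — the arithmetic behind this, which matches the numbers in the error message exactly:)

samples = 26152
validation_split = 0.1
available = int(samples * (1 - validation_split))  # 23536, the "Number of samples" in the error
steps, global_batch = 408, 8 * 8                   # 8 examples per replica on an 8-core TPU
required = steps * global_batch                    # 26112 > 23536, hence the ValueError
print(available, required)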

sayakpaul

comment created time in 12 days

push event sayakpaul/TalksGiven

Sayak Paul

commit sha 778f12eb271d6599115d8ac152ebcd1b132d4f34

Update README.md

view details

push time in 14 days

push event sayakpaul/TalksGiven

Sayak Paul

commit sha 0494192c029b356924b18fa1464909af47cefa99

Add files via upload

view details

push time in 14 days

PR closed sayakpaul/Generating-categories-from-arXiv-paper-titles

Bump urllib3 from 1.24.1 to 1.24.2 in /wandb/run-20191103_044405-2ev9pm6c dependencies

⚠️ Dependabot is rebasing this PR ⚠️

If you make any changes to it yourself then they will take precedence over the rebase.


Bumps urllib3 from 1.24.1 to 1.24.2.

Sourced from urllib3's changelog.

1.24.2 (2019-04-17)

  • Don't load system certificates by default when any other ca_certs, ca_certs_dir or ssl_context parameters are specified.

  • Remove Authorization header regardless of case when redirecting to cross-site. (Issue #1510)

  • Add support for IPv6 addresses in subjectAltName section of certificates. (Issue #1269)

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.


+1 -1

0 comment

1 changed file

dependabot[bot]

pr closed time in 14 days

PR closed sayakpaul/Generating-categories-from-arXiv-paper-titles

Bump pillow from 5.4.1 to 6.2.0 in /wandb/run-20191103_044451-eendlfxo dependencies

⚠️ Dependabot is rebasing this PR ⚠️

If you make any changes to it yourself then they will take precedence over the rebase.


Bumps pillow from 5.4.1 to 6.2.0.

Sourced from pillow's releases.

6.2.0

https://pillow.readthedocs.io/en/stable/releasenotes/6.2.0.html

6.1.0

https://pillow.readthedocs.io/en/stable/releasenotes/6.1.0.html

6.0.0

No release notes provided.

Sourced from pillow's changelog.

6.2.0 (2019-10-01)

  • Catch buffer overruns #4104 [radarhere]

  • Initialize rows_per_strip when RowsPerStrip tag is missing #4034 [cgohlke, radarhere]

  • Raise error if TIFF dimension is a string #4103 [radarhere]

  • Added decompression bomb checks #4102 [radarhere]

  • Fix ImageGrab.grab DPI scaling on Windows 10 version 1607+ #4000 [nulano, radarhere]

  • Corrected negative seeks #4101 [radarhere]

  • Added argument to capture all screens on Windows #3950 [nulano, radarhere]

  • Updated warning to specify when Image.frombuffer defaults will change #4086 [radarhere]

  • Changed WindowsViewer format to PNG #4080 [radarhere]

  • Use TIFF orientation #4063 [radarhere]

  • Raise the same error if a truncated image is loaded a second time #3965 [radarhere]

  • Lazily use ImageFileDirectory_v1 values from Exif #4031 [radarhere]

  • Improved HSV conversion #4004 [radarhere]

  • Added text stroking #3978 [radarhere, hugovk]

  • No more deprecated bdist_wininst .exe installers #4029 [hugovk]

  • Do not allow floodfill to extend into negative coordinates #4017 [radarhere] ... (truncated)

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.


+1 -1

0 comment

1 changed file

dependabot[bot]

pr closed time in 14 days

PR closed sayakpaul/Generating-categories-from-arXiv-paper-titles

Bump urllib3 from 1.24.1 to 1.24.2 in /wandb/run-20191103_044451-eendlfxo dependencies

⚠️ Dependabot is rebasing this PR ⚠️

If you make any changes to it yourself then they will take precedence over the rebase.


Bumps urllib3 from 1.24.1 to 1.24.2.

Sourced from urllib3's changelog.

1.24.2 (2019-04-17)

  • Don't load system certificates by default when any other ca_certs, ca_certs_dir or ssl_context parameters are specified.

  • Remove Authorization header regardless of case when redirecting to cross-site. (Issue #1510)

  • Add support for IPv6 addresses in subjectAltName section of certificates. (Issue #1269)

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.


+1 -1

0 comment

1 changed file

dependabot[bot]

pr closed time in 14 days

PR closed sayakpaul/Generating-categories-from-arXiv-paper-titles

Bump notebook from 5.7.6 to 5.7.8 in /wandb/run-20191103_044451-eendlfxo dependencies

⚠️ Dependabot is rebasing this PR ⚠️

If you make any changes to it yourself then they will take precedence over the rebase.


Bumps notebook from 5.7.6 to 5.7.8.

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.


+1 -1

0 comment

1 changed file

dependabot[bot]

pr closed time in 14 days

PR closed sayakpaul/Generating-categories-from-arXiv-paper-titles

Bump python-engineio from 3.7.0 to 3.8.2.post1 in /wandb/run-20191103_044405-2ev9pm6c dependencies

⚠️ Dependabot is rebasing this PR ⚠️

If you make any changes to it yourself then they will take precedence over the rebase.


Bumps python-engineio from 3.7.0 to 3.8.2.post1.

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.


+1 -1

0 comment

1 changed file

dependabot[bot]

pr closed time in 14 days

PR closed sayakpaul/Generating-categories-from-arXiv-paper-titles

Bump werkzeug from 0.15.1 to 0.15.3 in /wandb/run-20191103_044451-eendlfxo dependencies

⚠️ Dependabot is rebasing this PR ⚠️

If you make any changes to it yourself then they will take precedence over the rebase.


Bumps werkzeug from 0.15.1 to 0.15.3.

Sourced from werkzeug's releases.

0.15.3

  • Blog: https://palletsprojects.com/blog/werkzeug-0-15-3-released/
  • Changes: https://werkzeug.palletsprojects.com/en/0.15.x/changes/#version-0-15-3

0.15.2

  • Blog: https://palletsprojects.com/blog/werkzeug-0-15-2-released/
  • Changes: https://werkzeug.palletsprojects.com/en/0.15.x/changes/#version-0-15-2

Sourced from werkzeug's changelog.

Version 0.15.3

Released 2019-05-14

  • Properly handle multi-line header folding in development server in Python 2.7. (:issue:1080)
  • Restore the response argument to :exc:~exceptions.Unauthorized. (:pr:1527)
  • :exc:~exceptions.Unauthorized doesn't add the WWW-Authenticate header if www_authenticate is not given. (:issue:1516)
  • The default URL converter correctly encodes bytes to string rather than representing them with b''. (:issue:1502)
  • Fix the filename format string in :class:~middleware.profiler.ProfilerMiddleware to correctly handle float values. (:issue:1511)
  • Update :class:~middleware.lint.LintMiddleware to work on Python 3. (:issue:1510)
  • The debugger detects cycles in chained exceptions and does not time out in that case. (:issue:1536)
  • When running the development server in Docker, the debugger security pin is now unique per container.

Version 0.15.2

Released 2019-04-02

  • Rule code generation uses a filename that coverage will ignore. The previous value, "generated", was causing coverage to fail. (:issue:1487)
  • The test client removes the cookie header if there are no persisted cookies. This fixes an issue introduced in 0.15.0 where the cookies from the original request were used for redirects, causing functions such as logout to fail. (:issue:1491)
  • The test client copies the environ before passing it to the app, to prevent in-place modifications from affecting redirect requests. (:issue:1498)
  • The "werkzeug" logger only adds a handler if there is no handler configured for its level in the logging chain. This avoids double logging if other code configures logging first. (:issue:1492)
  • 9b1123a release version 0.15.3
  • 00bc43b unique debugger pin in Docker containers
  • 2cbdf2b Merge pull request #1542 from asottile/exceptions_arent_always_hashable
  • 0e669f6 Fix unhashable exception types
  • bdc17e4 Merge pull request #1540 from pallets/break-tb-cycle
  • 44e38c2 break cycle in chained exceptions
  • 777500b Merge pull request #1518 from NiklasMM/fix/1510_lint-middleware-python3-compa...
  • e00c7c2 Make LintMiddleware Python 3 compatible and add tests
  • d590cc7 Merge pull request #1539 from pallets/profiler-format
  • 0388fc9 update filename_format for ProfilerMiddleware.
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.


+1 -1

0 comment

1 changed file

dependabot[bot]

pr closed time in 14 days

PR closed sayakpaul/Generating-categories-from-arXiv-paper-titles

Bump werkzeug from 0.15.1 to 0.15.3 in /wandb/run-20191103_044405-2ev9pm6c dependencies

⚠️ Dependabot is rebasing this PR ⚠️

If you make any changes to it yourself then they will take precedence over the rebase.


Bumps werkzeug from 0.15.1 to 0.15.3.

Sourced from werkzeug's releases.

0.15.3

  • Blog: https://palletsprojects.com/blog/werkzeug-0-15-3-released/
  • Changes: https://werkzeug.palletsprojects.com/en/0.15.x/changes/#version-0-15-3

0.15.2

  • Blog: https://palletsprojects.com/blog/werkzeug-0-15-2-released/
  • Changes: https://werkzeug.palletsprojects.com/en/0.15.x/changes/#version-0-15-2

Sourced from werkzeug's changelog.

Version 0.15.3

Released 2019-05-14

  • Properly handle multi-line header folding in development server in Python 2.7. (:issue:1080)
  • Restore the response argument to :exc:~exceptions.Unauthorized. (:pr:1527)
  • :exc:~exceptions.Unauthorized doesn't add the WWW-Authenticate header if www_authenticate is not given. (:issue:1516)
  • The default URL converter correctly encodes bytes to string rather than representing them with b''. (:issue:1502)
  • Fix the filename format string in :class:~middleware.profiler.ProfilerMiddleware to correctly handle float values. (:issue:1511)
  • Update :class:~middleware.lint.LintMiddleware to work on Python 3. (:issue:1510)
  • The debugger detects cycles in chained exceptions and does not time out in that case. (:issue:1536)
  • When running the development server in Docker, the debugger security pin is now unique per container.

Version 0.15.2

Released 2019-04-02

  • Rule code generation uses a filename that coverage will ignore. The previous value, "generated", was causing coverage to fail. (:issue:1487)
  • The test client removes the cookie header if there are no persisted cookies. This fixes an issue introduced in 0.15.0 where the cookies from the original request were used for redirects, causing functions such as logout to fail. (:issue:1491)
  • The test client copies the environ before passing it to the app, to prevent in-place modifications from affecting redirect requests. (:issue:1498)
  • The "werkzeug" logger only adds a handler if there is no handler configured for its level in the logging chain. This avoids double logging if other code configures logging first. (:issue:1492)
  • 9b1123a release version 0.15.3
  • 00bc43b unique debugger pin in Docker containers
  • 2cbdf2b Merge pull request #1542 from asottile/exceptions_arent_always_hashable
  • 0e669f6 Fix unhashable exception types
  • bdc17e4 Merge pull request #1540 from pallets/break-tb-cycle
  • 44e38c2 break cycle in chained exceptions
  • 777500b Merge pull request #1518 from NiklasMM/fix/1510_lint-middleware-python3-compa...
  • e00c7c2 Make LintMiddleware Python 3 compatible and add tests
  • d590cc7 Merge pull request #1539 from pallets/profiler-format
  • 0388fc9 update filename_format for ProfilerMiddleware.
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.


+1 -1

0 comment

1 changed file

dependabot[bot]

pr closed time in 14 days

PR closed sayakpaul/Generating-categories-from-arXiv-paper-titles

Bump python-engineio from 3.7.0 to 3.8.2.post1 in /wandb/run-20191103_044451-eendlfxo dependencies

⚠️ Dependabot is rebasing this PR ⚠️

If you make any changes to it yourself then they will take precedence over the rebase.


Bumps python-engineio from 3.7.0 to 3.8.2.post1.

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.


+1 -1

0 comment

1 changed file

dependabot[bot]

pr closed time in 14 days

PR closed sayakpaul/Generating-categories-from-arXiv-paper-titles

Bump pillow from 5.4.1 to 6.2.0 in /wandb/run-20191103_044405-2ev9pm6c dependencies

⚠️ Dependabot is rebasing this PR ⚠️

If you make any changes to it yourself then they will take precedence over the rebase.


Bumps pillow from 5.4.1 to 6.2.0.

Sourced from pillow's releases.

6.2.0

https://pillow.readthedocs.io/en/stable/releasenotes/6.2.0.html

6.1.0

https://pillow.readthedocs.io/en/stable/releasenotes/6.1.0.html

6.0.0

No release notes provided.

Sourced from pillow's changelog.

6.2.0 (2019-10-01)

  • Catch buffer overruns #4104 [radarhere]

  • Initialize rows_per_strip when RowsPerStrip tag is missing #4034 [cgohlke, radarhere]

  • Raise error if TIFF dimension is a string #4103 [radarhere]

  • Added decompression bomb checks #4102 [radarhere]

  • Fix ImageGrab.grab DPI scaling on Windows 10 version 1607+ #4000 [nulano, radarhere]

  • Corrected negative seeks #4101 [radarhere]

  • Added argument to capture all screens on Windows #3950 [nulano, radarhere]

  • Updated warning to specify when Image.frombuffer defaults will change #4086 [radarhere]

  • Changed WindowsViewer format to PNG #4080 [radarhere]

  • Use TIFF orientation #4063 [radarhere]

  • Raise the same error if a truncated image is loaded a second time #3965 [radarhere]

  • Lazily use ImageFileDirectory_v1 values from Exif #4031 [radarhere]

  • Improved HSV conversion #4004 [radarhere]

  • Added text stroking #3978 [radarhere, hugovk]

  • No more deprecated bdist_wininst .exe installers #4029 [hugovk]

  • Do not allow floodfill to extend into negative coordinates #4017 [radarhere] ... (truncated)

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.


+1 -1

0 comment

1 changed file

dependabot[bot]

pr closed time in 14 days

push event sayakpaul/Generating-categories-from-arXiv-paper-titles

Sayak Paul

commit sha 1de9d00aed4f9213383b904e5112016cddbcee15

Cleaning up

view details

Sayak Paul

commit sha fd8eda66da0b1c35ac82a7040362f6ec7a5c89d5

Merge pull request #11 from sayakpaul/text_pred_logger Cleaning up

view details

push time in 14 days

push event sayakpaul/Generating-categories-from-arXiv-paper-titles

Sayak Paul

commit sha 1de9d00aed4f9213383b904e5112016cddbcee15

Cleaning up

view details

push time in 14 days

push event sayakpaul/Generating-categories-from-arXiv-paper-titles

Sayak Paul

commit sha 369bae7b97d86105273125cd272f21d42fb2193b

Added the text logger notebook

view details

Sayak Paul

commit sha 508cc54f6a34489cbd9570ef682d49dcae231468

Merge pull request #1 from sayakpaul/text_pred_logger Added the text logger notebook

view details

push time in 14 days

started brohrer/cottonwood_martian_images

started time in 14 days

MemberEvent

started gcastex/PruNet

started time in 15 days

started gcastex/PruNet

started time in 15 days

PublicEvent

issue comment GoogleCloudPlatform/training-data-analyst

Number of samples not getting matched in TPU distribution strategy

@martin-gorner thank you very much for your detailed insights regarding adjusting the learning rate. They will really be helpful. Would you mind if I added those notes to the notebook, with credits?

Regarding the validation_split error, I think you missed the point I made in my previous comment:

The problem was indeed in validation_split. When you specify the training data as train_features and y_train_binarized and at the same time also specify validation_split, the different cores of the Cloud TPU fail to handle it properly, just as you hinted.

I actually tried without steps_per_epoch initially, and it gave me a "not divisible" error.

Also, here's the notebook (the working one) again with the necessary files including the dataset: Archive.zip
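(A minimal sketch, not from the notebook, of the explicit split that sidesteps validation_split entirely; features and labels stand in for the full NumPy arrays:)

from sklearn.model_selection import train_test_split

train_features, val_features, y_train, y_val = train_test_split(
    features, labels, test_size=0.1, random_state=42)

compiled_model.fit(train_features, y_train,
                   validation_data=(val_features, y_val),
                   epochs=15,
                   batch_size=8 * strategy.num_replicas_in_sync)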

sayakpaul

comment created time in 15 days

push event sayakpaul/TF-2.0-Hacks

Sayak Paul

commit sha df666dae4583b159a371ae3b2af66b89cb0c0d3c

Add files via upload

view details

push time in 15 days

push event sayakpaul/TF-2.0-Hacks

Sayak Paul

commit sha 2af61148e942b197028e02372d3584476a05c045

Added the initial versions of TFRecords notebooks

view details

push time in 15 days

issue comment GoogleCloudPlatform/training-data-analyst

Number of samples not getting matched in TPU distribution strategy

Oh yes! I got it up and running just now. The problem was indeed in validation_split. When you specify the training data as train_features and y_train_binarized and at the same time also specify validation_split, the different cores of the Cloud TPU fail to handle it properly, just as you hinted.

So, I modified the fit function like so:

compiled_model.fit(train_features, y_train_binarized,
                        class_weight=class_weight,
                        steps_per_epoch=len(train_features)//(8 * strategy.num_replicas_in_sync),
                        validation_data=(test_features, y_test_binarized),
                        validation_steps=len(test_features)//(8 * strategy.num_replicas_in_sync),
                        epochs=15,
                        batch_size=8 * strategy.num_replicas_in_sync)

Now it was running in its full glory and the speed was blazing fast. Regarding the following:

Also, could you please copy-paste the TF version you tried this on: print("Tensorflow version " + tf.__version__)

It's 2.1.0-dev20191029. I think the logging may have changed a bit:

Train on 26152 samples, validate on 6538 samples
Epoch 1/15
26112/26152 [============================>.] - ETA: 0s - loss: 0.2301 - categorical_accuracy: 0.2482 - val_loss: 0.1260 - val_categorical_accuracy: 0.3073Epoch 2/15
26088/26152 [============================>.] - ETA: 0s - loss: 0.1229 - categorical_accuracy: 0.3073 - val_loss: 0.1210 - val_categorical_accuracy: 0.3073Epoch 3/15

There is no information on how long each epoch took or how much time each step took. Is that normal?

I ran the same version on my V100 GCP instance and the performance was better (although the training time was significantly higher). I am also leaving you the notebook for your perusal.

notebook.ipynb.zip
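(Again not from the original comment — the step counts implied by the call above, assuming an 8-core Cloud TPU:)

cores = 8                      # strategy.num_replicas_in_sync
global_batch = 8 * cores       # 64
print(26152 // global_batch)   # 408 training steps per epoch
print(6538 // global_batch)    # 102 validation steps per epoch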

sayakpaul

comment created time in 16 days

started wandb/examples

started time in 16 days

issue comment GoogleCloudPlatform/training-data-analyst

Number of samples not getting matched in TPU distribution strategy

Cool. I will try that out. Just so you know, train_features and y_train_binarized are NumPy arrays.

sayakpaul

comment created time in 16 days

started GoogleCloudPlatform/training-data-analyst

started time in 16 days

issue opened GoogleCloudPlatform/training-data-analyst

Number of samples not getting matched in TPU distribution strategy

Hi @martin-gorner. I am following your notebook keras_flowers_gputputpupod_tf2.1.ipynb. I was able to follow this example pretty smoothly; thank you for putting it together and sharing it :)

I wanted to try it on a pet project I was working on. I first defined the strategy (after all the initial configuration on GCP):

# Detect hardware, return appropriate distribution strategy
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    strategy = tf.distribute.get_strategy()

print("REPLICAS: ", strategy.num_replicas_in_sync)

You can verify the output in the following as well:

[screenshot of the REPLICAS output]

My training data is in the form of NumPy arrays, and after compiling the model in the way you had shown in the notebook, I am calling fit on the compiled model:

compiled_model.fit(train_features, y_train_binarized,
                        class_weight=class_weight,
                        steps_per_epoch=len(train_features)//8 * strategy.num_replicas_in_sync,
                        epochs=15,
                        batch_size=8 * strategy.num_replicas_in_sync,
                        validation_split=0.1)

The shape of train_features is (26152, 300) and the model's input is defined accordingly. When the above is run, it gives me:

ValueError                                Traceback (most recent call last)
<ipython-input-29-61629493fd70> in <module>
      4                         epochs=15,
      5                         batch_size=8 * strategy.num_replicas_in_sync,
----> 6                         validation_split=0.1)

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    783         max_queue_size=max_queue_size,
    784         workers=workers,
--> 785         use_multiprocessing=use_multiprocessing)
    786 
    787   def evaluate(self,

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/engine/training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    210         steps_per_epoch,
    211         ModeKeys.TRAIN,
--> 212         validation_split=validation_split)
    213     dist_utils.validate_callbacks(input_callbacks=callbacks,
    214                                   optimizer=model.optimizer)

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/distribute/distributed_training_utils.py in process_batch_and_step_size(strategy, inputs, batch_size, steps_per_epoch, mode, validation_split)
    467     # relax the constraint to consume all the training samples.
    468     steps_per_epoch, batch_size = get_input_params(
--> 469         strategy, num_samples, steps_per_epoch, batch_size, mode=mode)
    470   return batch_size, steps_per_epoch
    471 

/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/keras/distribute/distributed_training_utils.py in get_input_params(distribution_strategy, num_samples, steps, batch_size, mode)
    559         raise ValueError('Number of samples %s is less than samples required '
    560                          'for specified batch_size %s and steps %s' % (
--> 561                              num_samples, global_batch_size, steps))
    562 
    563   # We need to return the per replica or global batch size based on the strategy

ValueError: Number of samples 23536 is less than samples required for specified batch_size 64 and steps 408

The number of samples in the above log is different from the original and I am not able to figure out why. You can find the files here in order to reproduce the issue:

Archive.zip

created time in 16 days

started connorhough/Mask_RCNN

started time in 17 days

push event sayakpaul/TalksGiven

Sayak Paul

commit sha ee4f03521d9238b0c9ccdf87a2b963205809b1c9

Add files via upload

view details

push time in 18 days

started anubhavmaity/Sports-Type-Classifier

started time in 18 days

started VowpalWabbit/vowpal_wabbit

started time in 19 days

issue closed tensorflow/tpu

Training tf.keras models (TensorFlow 2.0) using Cloud TPUs

Hi. Referring to this tutorial, I was wondering if we could do the same with TensorFlow 2.0. If yes, what changes would be required?

closed time in 19 days

sayakpaul

started uber/ludwig

started time in 19 days

started tacchinotacchi/distil-bilstm

started time in 19 days

issue opened tensorflow/tpu

Training tf.keras models (TensorFlow 2.0) using Cloud TPUs

Hi. Referring to this tutorial, I was wondering if we could do the same with TensorFlow 2.0. If yes, what changes would be required?

created time in 20 days

started explosion/thinc

started time in 20 days

Pull request review comment tensorflow/docs

Removing redundant use of np.array & unnecessary spaces

         "\n",
         "Let's demonstrate how you can make a neural network \"dream\" and enhance the surreal patterns it sees in an image.\n",
         "\n",
-        "![Dogception](images/dogception.png)"
+        "![Dogception](https://github.com/sayakpaul/docs/blob/master/site/en/tutorials/generative/images/dogception.png?raw=1)"

Hi @lamberta I also changed the bullet points for the following section in this tutorial:

[Screenshot: Screen Shot 2019-10-27 at 2 50 36 PM]

Although "*" allows us to create bullets in Markdown, for some weird reasons it is rendering properly (which you can see above). I replaced the Asterisk with "-".

sayakpaul

comment created time in 21 days

push event sayakpaul/docs

Sayak Paul

commit sha 5fe87d8348f5cf124ad502ed78e2364a512e1c55

Changed the bullet points

view details

push time in 21 days

issue opened tensorflow/tensorflow

Inclusion of a model re-training example

URL(s) with the issue: https://www.tensorflow.org/tutorials/keras/save_and_load

Please provide a link to the documentation entry, for example: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/MyMethod https://www.tensorflow.org/tutorials/keras/save_and_load

Description of issue (what needs changing):

Since model re-training is quite vital in both applied and research settings, I think it would be great to include an example of it in this tutorial.

Clear description

The tutorial shows how to save and load models using various options. It does mention that, using the model checkpoints, it is possible to resume training from the point where it was left off. However, there is currently no section in the tutorial that shows how to do this correctly.

For example, why should someone use this method? How is it useful?

There are several instances where a model may have to be re-trained:

  • There is new data and the model needs to be re-trained on it.
  • On local machines, a power failure or other bottleneck can stop the training process; we can then load up the latest checkpoint and resume training from there, as sketched below.
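A minimal sketch of the kind of example I have in mind (the toy model and data below are placeholders; only the checkpoint callback and tf.train.latest_checkpoint usage mirror the tutorial):

import numpy as np
import tensorflow as tf

# Toy stand-ins for the tutorial's model and data.
x_train = np.random.random((100, 10)).astype('float32')
y_train = np.random.randint(0, 2, size=(100,))

def make_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

checkpoint_dir = 'training_checkpoints'
checkpoint_path = checkpoint_dir + '/cp-{epoch:04d}.ckpt'
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path, save_weights_only=True, verbose=1)

# Initial run: train for a few epochs, saving weights along the way.
model = make_model()
model.fit(x_train, y_train, epochs=5, callbacks=[cp_callback])

# Later (new data arrives, or the first run was interrupted): rebuild the
# architecture, load the latest checkpoint, and resume where we left off.
model = make_model()
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.fit(x_train, y_train, initial_epoch=5, epochs=10, callbacks=[cp_callback])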

created time in 21 days

issue comment tensorflow/tensorflow

Loading and re-training the model

Hi @prakhar21. Did you happen to find a way around this? I am specifically interested in how we could perform the re-training part using the new Save and Load methodologies with Keras models.

prakhar21

comment created time in 21 days

issue comment creme-ml/creme

Inclusion of a scikit-learn wrapper like API for Keras models

Thank you so much @MaxHalford. I will definitely give this a try and let you know my findings.

sayakpaul

comment created time in 21 days

pull request comment jphall663/awesome-machine-learning-interpretability

Added several resources specifically for deep learning

@jphall663 Suggestions incorporated. I could not find the **python interpretable model** section, so I removed lattice from the list.

sayakpaul

comment created time in 21 days

push event sayakpaul/awesome-machine-learning-interpretability

Sayak Paul

commit sha a1b2a1a4615f5a3b6fb0e20dc2ea7d6b88011895

Addressed Patrick's feedback

view details

push time in 21 days

started creme-ml/creme

started time in 21 days

issue comment creme-ml/creme

Inclusion of a scikit-learn wrapper like API for Keras models

Thanks. Mind passing on some boilerplate code so that I can quickly test it out?
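To make the ask concrete, this is roughly the interface I mean, illustrated here with the scikit-learn-style wrapper that already ships with tf.keras (illustration only, not creme code):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

def build_model():
    model = Sequential([
        Dense(16, activation='relu', input_shape=(20,)),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Behaves like a scikit-learn estimator: fit/predict/score, usable inside Pipelines.
clf = KerasClassifier(build_fn=build_model, epochs=3, batch_size=16, verbose=0)

X = np.random.random((64, 20)).astype('float32')
y = np.random.randint(0, 2, size=(64,))
clf.fit(X, y)
print(clf.predict(X[:5]))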

sayakpaul

comment created time in 21 days

push event sayakpaul/docs

Sayak Paul

commit sha e6edc606e1dcc95407805a583bcbd8dd0ac17cbb

Kept the image link original

view details

push time in 21 days

Pull request review comment tensorflow/docs

Removing redundant use of np.array & unnecessary spaces

         "\n",         "Let's demonstrate how you can make a neural network \"dream\" and enhance the surreal patterns it sees in an image.\n",         "\n",-        "![Dogception](images/dogception.png)"+        "![Dogception](https://github.com/sayakpaul/docs/blob/master/site/en/tutorials/generative/images/dogception.png?raw=1)"

Sure. Thanks!

sayakpaul

comment created time in 21 days

PR opened dronedeploy/dd-ml-segmentation-benchmark

ResNet18 import fix

Referring to the models_keras.py script, ResNet18 is imported via from classification_models.keras import Classifiers, but I was unable to find the classification_models module in the first place.

I swapped it with:

from keras.applications.resnet import ResNet18
model = ResNet18(
            weights='imagenet' if pretrained else None,
            input_tensor=input,
            include_top=False)
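As an aside, classification_models appears to come from the image-classifiers package (qubvel/classification_models), so another option would be to install that and keep the original import. If keras.applications turns out not to expose ResNet18 in the installed version, this would be the fallback; the exact call below is my reading of that package's README, not something I have verified against this repo:

# pip install image-classifiers
from classification_models.keras import Classifiers

ResNet18, preprocess_input = Classifiers.get('resnet18')
# `input` and `pretrained` are the same variables already in scope in models_keras.py.
model = ResNet18(
    input_tensor=input,
    weights='imagenet' if pretrained else None,
    include_top=False)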
+1 -2

0 comment

1 changed file

pr created time in 22 days

push event sayakpaul/dd-ml-segmentation-benchmark

Sayak Paul

commit sha a77986fd7481a05a42e0880b973701244d0ceba9

Fixed ResNet18 import

view details

push time in 22 days

started dronedeploy/dd-ml-segmentation-benchmark

started time in 22 days

issue opened dronedeploy/dd-ml-segmentation-benchmark

Clarification regarding the use of elevation

Hi.

Thanks for putting together this dataset. A very interesting problem indeed. Could you please elaborate a bit more on the elevation images: what is their purpose, how can they be utilized for the segmentation task, and so on?

Sorry if those questions are dumb.
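In case it helps frame the question, the naive approach I had in mind is to stack the elevation map as an extra input channel next to RGB. A rough sketch follows; the file names are made up, and I do not know whether this matches how the benchmark intends elevation to be used:

import numpy as np
from PIL import Image

rgb = np.array(Image.open('tile_rgb.png'))          # (H, W, 3), placeholder file name
elev = np.array(Image.open('tile_elevation.png'))   # (H, W), placeholder single-channel map

# Concatenate elevation as a fourth channel so the segmentation
# network sees colour plus height at every pixel.
x = np.concatenate([rgb, elev[..., None]], axis=-1)  # (H, W, 4)
print(x.shape)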

created time in 22 days

push event sayakpaul/TF-2.0-Hacks

Sayak Paul

commit sha abe023a7fd06f434670aac6988b3c40cb6f57038

Update README.md

view details

push time in 22 days

push event sayakpaul/TF-2.0-Hacks

Sayak Paul

commit sha ba03cb60359a2bb6b8cc43fa5de0833408b1f9e0

Update README.md

view details

push time in 22 days

started deepfakes/faceswap

started time in 22 days

push event tirthajyoti/Papers-Literature-ML-DL-RL-AI

Sayak Paul

commit sha 3316e96d284e9c56b4cc0100be83645051beafb1

Added RAdam, Lottery Ticket Hypothesis and Super Convergence papers

view details

Sayak Paul

commit sha 5b52e279534d337ec0a39ec9bc4438404b399d70

Deep Learning and Information Theory

view details

Sayak Paul

commit sha fa7566e2dd5c3e86edfa2950459d5ba975693623

Merge pull request #5 from sayakpaul/master Papers on network training & relevance to information theory

view details

push time in 23 days

PR merged tirthajyoti/Papers-Literature-ML-DL-RL-AI

Papers on network training & relevance to information theory

Adding four papers:

  • Lottery ticket hypothesis
  • Superconvergence
  • RAdam
  • Deep learning and information theory
+0 -0

0 comment

4 changed files

sayakpaul

pr closed time in 23 days

pull request comment tensorflow/docs

Removing redundant use of np.array & unnecessary spaces

@googlebot I signed it!

sayakpaul

comment created time in 23 days

PR opened tensorflow/docs

Removing redundant use of np.array & unnecessary spaces

Hi.

I found the tutorial on DeepDream really interesting. It not only presents the vanilla implementation but also introduces more complex topics like octaves and tiling very elegantly.

However, I found two occurrences where the use of np.array is redundant: the image is already converted to a NumPy array towards the very beginning, when it is first loaded. I have also removed some unnecessary spaces to keep the notebook more compact.
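To illustrate the kind of redundancy being removed (paraphrased, not an exact excerpt from the notebook):

import numpy as np
import PIL.Image

# Stand-in for the image downloaded in the notebook.
img = PIL.Image.new('RGB', (64, 64))

# The image is converted to a NumPy array as soon as it is loaded...
original_img = np.array(img)

# ...so calling np.array on it again later only makes an unnecessary copy:
redundant_copy = np.array(original_img)
print(np.array_equal(redundant_copy, original_img))   # True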

+667 -651

0 comment

1 changed file

pr created time in 23 days

push event sayakpaul/docs

Sayak Paul

commit sha 6ed156a7473714fc7d9f7c3214a1221fbca020df

Added changes to DeepDream notebook

view details

push time in 23 days

fork sayakpaul/docs

TensorFlow documentation

https://www.tensorflow.org

fork in 23 days

started google-research/text-to-text-transfer-transformer

started time in 23 days
