Uber AI introduces a new approach for making neural networks process images faster and more accurately by using JPEG representations.
Link: https://eng.uber.com/neural-networks-jpeg/
Paper: https://papers.nips.cc/paper/7649-faster-neural-networks-straight-from-jpeg
#nn #CV #Uber
🔗 Faster Neural Networks Straight from JPEG
Uber AI Labs introduces a method for making neural networks that process images faster and more accurately by leveraging JPEG representations.
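The gist of the paper: skip most of the JPEG decode and feed the 8x8 block DCT coefficients, which the file already stores, straight into the CNN, so the network starts from a representation that is 8x smaller spatially. Below is a rough sketch of that idea, not Uber's implementation: scipy's dctn stands in for coefficients read from the actual JPEG bitstream, and the tiny model is made up.

```python
import numpy as np
from scipy.fft import dctn
import torch
import torch.nn as nn

def blockwise_dct(gray_img: np.ndarray) -> np.ndarray:
    """Split a grayscale image (H, W) into 8x8 blocks and 2-D DCT each block.

    Returns (H // 8, W // 8, 64): one 64-dim coefficient vector per block,
    i.e. an 8x spatially smaller "image" with 64 channels (roughly what a
    JPEG file already stores before entropy coding).
    """
    h, w = gray_img.shape
    h, w = h - h % 8, w - w % 8
    blocks = gray_img[:h, :w].reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    coeffs = dctn(blocks, axes=(-2, -1), norm="ortho")
    return coeffs.reshape(h // 8, w // 8, 64).astype(np.float32)

# A toy CNN over the coefficient "image"; because the spatial resolution is
# already 8x smaller, the early convolutional layers have far less work to do.
model = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(128, 10),
)

img = np.random.rand(224, 224).astype(np.float32)          # stand-in grayscale image
x = torch.from_numpy(blockwise_dct(img)).permute(2, 0, 1)   # (64, 28, 28)
logits = model(x.unsqueeze(0))                              # (1, 10)
```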
Generalization in Deep Networks: The Role of Distance from Initialization
Why the distance from initialization has to be taken into account to explain generalization in deep networks.
arXiv: https://arxiv.org/abs/1901.01672
#DL #NN
🔗 Generalization in Deep Networks: The Role of Distance from Initialization
Why does training deep neural networks using stochastic gradient descent (SGD) result in a generalization error that does not worsen with the number of parameters in the network? To answer this question, we advocate a notion of effective model capacity that is dependent on a given random initialization of the network and not just the training algorithm and the data distribution. We provide empirical evidence demonstrating that the model capacity of SGD-trained deep networks is in fact restricted through implicit regularization of the ℓ2 distance from the initialization. We also provide theoretical arguments that further highlight the need for initialization-dependent notions of model capacity. We leave as open questions how and why distance from initialization is regularized, and whether it is sufficient to explain generalization.
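To make the abstract concrete: the quantity studied is just ‖θ_t − θ_0‖₂, the distance between the current weights and their random initialization. A minimal sketch for tracking it during SGD training (my own code, not the paper's; the toy model and random data are made up):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 10))
init_params = [p.detach().clone() for p in model.parameters()]  # snapshot of theta_0
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def distance_from_init() -> float:
    """||theta_t - theta_0||_2 with all parameters flattened into one vector."""
    sq = sum(((p - p0) ** 2).sum() for p, p0 in zip(model.parameters(), init_params))
    return sq.sqrt().item()

x = torch.randn(512, 100)             # made-up data
y = torch.randint(0, 10, (512,))
for step in range(201):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    if step % 50 == 0:
        print(step, distance_from_init())
```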
🤓 Interesting note on weight decay vs L2 regularization
In short, there was a difference when moving from Caffe (which implements weight decay) to Keras (which implements L2 regularization): the same network architecture with the same set of hyperparameters produced different results.
Link: https://bbabenko.github.io/weight-decay/
#DL #nn #hyperopt #hyperparams
🔗 weight decay vs L2 regularization
one popular way of adding regularization to deep learning models is to include a weight decay term in the updates. this is the same thing as adding an L2 ...
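A small numeric sketch of the mismatch the note describes (my own made-up numbers, plain SGD only): decaying the weight directly in the update uses wd·w, while an L2 penalty λ·w² added to the loss contributes 2λ·w to the gradient, so copying the same coefficient between the two conventions changes the effective regularization strength, and the gap only widens once momentum or adaptive optimizers are involved.

```python
import numpy as np

lr, coeff = 0.1, 0.01
w = np.array([1.0, -2.0, 3.0])
grad = np.array([0.5, 0.5, 0.5])        # pretend gradient of the data loss

# (1) weight decay applied directly in the update rule (the Caffe-style convention)
w_wd = w - lr * (grad + coeff * w)

# (2) L2 penalty coeff * sum(w**2) added to the loss (as a Keras l2 regularizer does),
#     whose gradient is 2 * coeff * w
w_l2 = w - lr * (grad + 2 * coeff * w)

print(w_wd)  # [ 0.949 -2.048  2.947]
print(w_l2)  # [ 0.948 -2.046  2.944]
# Same coefficient, different updates; matching them needs lambda_l2 = wd / 2.
```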
Implementing a ResNet model from scratch.
A well-written, clearly explained note on how to build and train a ResNet model from the ground up.
Link: https://towardsdatascience.com/implementing-a-resnet-model-from-scratch-971be7193718
#ResNet #DL #CV #nn #tutorial
🔗 Implementing a ResNet model from scratch. – Towards Data Science
A basic description of how ResNet works and a hands-on approach to understanding the state-of-the-art network.
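Not the article's code, just a minimal PyTorch sketch of the building block such a walkthrough revolves around: a residual block that computes F(x) + x, with a 1x1 projection on the shortcut whenever the shape changes.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Identity shortcut unless the spatial size or channel count changes.
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.shortcut(x))  # the skip connection

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64, 128, stride=2)(x).shape)  # torch.Size([1, 128, 28, 28])
```

A full ResNet then just stacks these blocks between an initial conv/pool stem and a global-average-pool plus linear classifier head.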
Understanding Convolutional Neural Networks through Visualizations in PyTorch
An explanation of how a #CNN works, illustrated with visualizations in PyTorch.
Link: https://towardsdatascience.com/understanding-convolutional-neural-networks-through-visualizations-in-pytorch-b5444de08b91
#PyTorch #nn #DL
🔗 Understanding Convolutional Neural Networks through Visualizations in PyTorch
Getting down to the nitty-gritty of CNNs
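A rough sketch (not the article's code) of one standard visualization trick in PyTorch: register a forward hook on an intermediate layer and plot the feature maps it produces. The choice of resnet18 and its layer1 below is just for illustration.

```python
import torch
import torchvision
import matplotlib.pyplot as plt

model = torchvision.models.resnet18(pretrained=True).eval()  # pretrained, so the maps are meaningful
feature_maps = {}

def save_output(name):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()
    return hook

model.layer1.register_forward_hook(save_output("layer1"))

x = torch.randn(1, 3, 224, 224)   # stand-in for a normalized input image
with torch.no_grad():
    model(x)

fmap = feature_maps["layer1"][0].cpu().numpy()   # (64, 56, 56)
fig, axes = plt.subplots(2, 4, figsize=(10, 5))
for ax, channel in zip(axes.flat, fmap[:8]):     # first 8 channels, one per panel
    ax.imshow(channel, cmap="viridis")
    ax.axis("off")
plt.show()
```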