Monday, August 4, 2014

ML RESEARCHERS & TUTORIALS

Deep Learning and Unsupervised Deep Learning Scientists

Jürgen Schmidhuber (Swiss AI Lab IDSIA)
http://www.idsia.ch/~juergen/
http://lifeboat.com/ex/bios.juergen.schmidhuber
https://plus.google.com/100849856540000067209/posts
http://en.wikipedia.org/wiki/J%C3%BCrgen_Schmidhuber
https://chessprogramming.wikispaces.com/J%C3%BCrgen+Schmidhuber
https://www.linkedin.com/pub/j%C3%BCrgen-schmidhuber/72/268/392?trk=biz_employee_pub

https://innsbigdata.wordpress.com/2015/02/09/interview-with-juergen-schmidhuber/

Schmidhuber, J. (2015). Deep Learning in Neural Networks: An Overview. Neural Networks, 61, 85-117.

Abstract

In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarizes relevant work, much of it from the previous millennium. Shallow and Deep Learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.
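
A quick way to make the "credit assignment path" idea concrete: in a net trained by backpropagation, the error signal reaching an early layer is a product of one local derivative per causal link between that layer and the loss, so CAP depth is just the length of that chain. A minimal numpy sketch (my own illustration, not code from the survey):

import numpy as np

def forward(x, weights):
    # Apply a chain of scalar "layers" a_next = tanh(w * a); keep every activation.
    activations = [x]
    for w in weights:
        activations.append(np.tanh(w * activations[-1]))
    return activations

def backward(activations, weights):
    # Push a unit error back through the chain; the credit assignment path to
    # weights[0] contains one learnable causal link per layer it passes through.
    grad = 1.0  # d(loss)/d(output), taken as 1 for illustration
    grads = []
    for w, a_in, a_out in zip(reversed(weights),
                              reversed(activations[:-1]),
                              reversed(activations[1:])):
        local = 1.0 - a_out ** 2            # derivative of tanh at this layer
        grads.append(grad * local * a_in)   # gradient w.r.t. this layer's weight
        grad = grad * local * w             # pass the credit one link further back
    return list(reversed(grads))

weights = [0.9] * 20                        # a "deep" chain of 20 links
acts = forward(0.5, weights)
print(backward(acts, weights)[0])           # credit reaching the very first weight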

Keywords

  • Deep learning
  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning
  • Evolutionary computation
videos:

SF ML meetup on August 11, 2014 at Upsight in San Francisco

DEEP LEARNING RESOURCES

gtu tech conf


https://www.youtube.com/watch?v=JSNZA8jVcm4
http://www.idsia.ch/~juergen/videos.html
http://www.kurzweilai.net/deep-learning-jurgen-schmidhuber-1
http://videolectures.net/jurgen_schmidhuber/

http://www.meetup.com/SF-Bayarea-Machine-Learning/events/198947462/
slides:
http://www.idsia.ch/~juergen/deep2014white.pdf


Jürgen Schmidhuber - Deep Learning and Artificial Intelligence
from Sep 12, 2014
https://www.youtube.com/watch?v=fam49iVeCqY

search: how to avoid local min neural network schmidhuber
https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=how+to+avoid+local+min+neural+network+schmidhuber
A tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the "echo state network" approach


Alex Graves

I'm a CIFAR Junior Fellow supervised by Geoffrey Hinton in the Department of Computer Science at the University of Toronto.

Research Interests

http://www.cs.toronto.edu/~graves/

RNN toolkit

Alex Graves released a toolbox (RNNLIB)
http://sourceforge.net/projects/rnnl/

RNN TUTORIAL
http://www.pdx.edu/sites/www.pdx.edu.sysc/files/Jaeger_TrainingRNNsTutorial.2005.pdf

A GENERAL METHOD FOR MULTI-AGENT REINFORCEMENT LEARNING IN UNRESTRICTED ENVIRONMENTS

http://deeplearning.net/
http://deeplearning.net/datasets/


DeepMind

Learning word embeddings efficiently with noise-contrastive estimation
Andriy Mnih (DeepMind Technologies, andriy@deepmind.com) and Koray Kavukcuoglu (DeepMind Technologies, koray@deepmind.com)
https://www.cs.toronto.edu/~amnih/papers/wordreps.pdf
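
The core trick in that paper, noise-contrastive estimation, replaces the full softmax over the vocabulary with a binary task: tell real (context, target) pairs apart from pairs whose target word is sampled from a noise distribution. A toy numpy sketch of that objective (my own illustration with made-up sizes, not the authors' code):

import numpy as np

rng = np.random.default_rng(0)
V, D, K, lr = 50, 16, 5, 0.1             # vocab size, embedding dim, noise samples, step size

emb_in = rng.normal(0, 0.1, (V, D))      # context-word embeddings
emb_out = rng.normal(0, 0.1, (V, D))     # target-word embeddings
p_noise = np.full(V, 1.0 / V)            # uniform noise here (unigram counts in practice)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nce_step(context, target):
    # One SGD step on the NCE objective for a single (context, target) pair.
    noise = rng.choice(V, size=K, p=p_noise)
    words = np.concatenate(([target], noise))
    labels = np.concatenate(([1.0], np.zeros(K)))      # 1 = data pair, 0 = noise pair
    scores = emb_out[words] @ emb_in[context]          # dot-product scores
    # P(pair came from data) with the NCE correction for K noise samples
    probs = sigmoid(scores - np.log(K * p_noise[words]))
    grad = probs - labels                              # d(-log likelihood)/d(score)
    v = emb_in[context].copy()
    emb_in[context] -= lr * grad @ emb_out[words]
    emb_out[words] -= lr * np.outer(grad, v)

# Toy "corpus": pretend consecutive word ids co-occur.
for _ in range(2000):
    c = int(rng.integers(0, V - 1))
    nce_step(c, c + 1)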

Tomas Mikolov
Facebook
https://research.fb.com/people/mikolov-tomas/

Latest Publications

Advances in Pre-Training Distributed Word Representations
Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, Armand Joulin
LREC 2018 - May 7, 2018


Efficient Large-Scale Multi-Modal Classification
Douwe Kiela, Edouard Grave, Armand Joulin, Tomas Mikolov
AAAI 2018 - February 2, 2018


Enriching Word Vectors with Subword Information
Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov

Andrej Karpathy
Stanford
http://cs.stanford.edu/people/karpathy/

RNNLM and Convolutional NN

Andrej Karpathy blog
http://karpathy.github.io/

The Unreasonable Effectiveness of Recurrent Neural Networks (May 21, 2015)
http://karpathy.github.io/2015/05/21/rnn-effectiveness/

corresponding GitHub code:
Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch

https://github.com/karpathy/char-rnn
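
For a quick feel of what char-rnn does, a character-level language model fits in a few lines of Keras; this is a toy Python sketch, not Karpathy's Torch code:

import numpy as np
from tensorflow import keras

text = "hello world " * 200                           # toy corpus
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}
seq_len = 10

# Build (input sequence, next character) training pairs.
X = np.array([[idx[c] for c in text[i:i + seq_len]]
              for i in range(len(text) - seq_len)])
y = np.array([idx[text[i + seq_len]] for i in range(len(text) - seq_len)])

model = keras.Sequential([
    keras.layers.Embedding(len(chars), 16),
    keras.layers.LSTM(64),
    keras.layers.Dense(len(chars), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=3, batch_size=64, verbose=0)

# Greedily sample a continuation one character at a time.
seed = list("hello worl")
for _ in range(20):
    probs = model.predict(np.array([[idx[c] for c in seed[-seq_len:]]]), verbose=0)[0]
    seed.append(chars[int(np.argmax(probs))])
print("".join(seed))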
Deep Visual-Semantic Alignments for Generating Image Descriptions
http://cs.stanford.edu/people/karpathy/deepimagesent/

Thomas Breuel, Volkmar Frinken, Marcus Liwicki
LSTM RNN Tutorial 2013

Building Fast High-Performance Recognition Systems with Recurrent Neural Networks and LSTM

http://lstm.iupr.com/
Resources
For the tutorial slides, please go to the Files section
Recommended implementations:

RNNLIB - the original C++ library implementing LSTM and many of the ideas about LSTM
JANNlab - Java-based implementation of 1D and BLSTM, no CTC
OCRopus - Python-based implementation of 1D and BLSTM, with CTC (the implementation is in lstm.py; here is an example of using lstm.py).
For other implementations mentioned in the tutorial, please contact us.

Geoffrey Hinton

A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).
http://papers.nips.cc/paper/4824-imagenet-classification-w



Ilya Sutskever

RNN LSTM
Sequence to Sequence Learning with Neural Networks
Parallelization: A C++ implementation of deep LSTM with the configuration from the previous section on a single GPU processes at a speed of approximately 1,700 words per second. This was too slow for our purposes, so we parallelized our model using an 8-GPU machine. Each layer of the LSTM was executed on a different GPU and communicated its activations to the next GPU / layer as soon as they were computed. Our models have 4 layers of LSTMs, each of which resides on a separate GPU. The remaining 4 GPUs were used to parallelize the softmax, so each GPU was responsible for multiplying by a 1000 × 20000 matrix. The resulting implementation achieved a speed of 6,300 (both English and French) words per second with a minibatch size of 128. Training took about ten days with this implementation.
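
A rough sketch of the two ideas in that excerpt (one LSTM layer per device, plus a column-sharded output softmax so no single device multiplies by the full 1000 × 20000 matrix), using PyTorch as a stand-in. Illustrative only: the sizes follow the quote, and it falls back to CPU when no GPUs are present.

import torch
import torch.nn as nn

hidden, vocab, n_layers, n_shards = 1000, 20000, 4, 4
gpus = torch.cuda.device_count()
dev = lambda i: torch.device(f"cuda:{i % gpus}") if gpus else torch.device("cpu")

# One LSTM layer per device: each layer hands its activations to the next device.
layers = [nn.LSTM(hidden, hidden, batch_first=True).to(dev(i)) for i in range(n_layers)]

# Column-sharded softmax: each shard holds a (hidden x vocab/n_shards) slice of the
# output projection, so the 1000 x 20000 multiply is split across devices.
shards = [nn.Linear(hidden, vocab // n_shards).to(dev(n_layers + i)) for i in range(n_shards)]

def forward(x):                            # x: (batch, seq, hidden)
    for i, lstm in enumerate(layers):
        x, _ = lstm(x.to(dev(i)))          # pipeline hand-off between devices
    h = x[:, -1, :]                        # hidden state at the last time step
    logits = torch.cat([s(h.to(dev(n_layers + i))).to(h.device)
                        for i, s in enumerate(shards)], dim=-1)
    return torch.log_softmax(logits, dim=-1)

out = forward(torch.randn(8, 12, hidden))
print(out.shape)                           # torch.Size([8, 20000])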

Research Director of OpenAI
http://www.cs.toronto.edu/~ilya/


Yann LeCun

Y. LeCun, Y. Bengio, G. Hinton. Deep learning. Nature 521, 436–444 (2015).





Alex Graves

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).


Nando de Freitas

Feb 26, 2015: "RNN with LSTM" lecture, Oxford CS department (YouTube)

Karol Gregor, DeepMind (Google)
2015
DRAW: A Recurrent Neural Network For Image Generation
http://arxiv.org/pdf/1502.04623v2.pdf

Yoshua Bengio

Video - Deep Learning -- Yoshua Bengio (Part 1)


Tsvi Achler MD/PhD

http://reason.cs.uiuc.edu/tsvi/
CV

Tutorial Video
http://reason.cs.uiuc.edu/tsvi/TutorialVideo.html
Email: achler@gmail.com

Technical Video for Optimizing Mind

https://www.youtube.com/watch?v=w4aoQUxqlZg&feature=youtu.be

https://www.youtube.com/watch?v=9LJred8R7DY

Tsvi Achler: What is the brain doing different from machine learning algorithms?


http://www.meetup.com/Cognitive-Computing-Enthusiasts/events/226666265/

Tsvi Achler has a unique background focusing on the neural mechanisms of recognition from a multidisciplinary perspective. He has done extensive work in theory and simulations, human cognitive experiments, animal neurophysiology experiments, and clinical training. He has an applied engineering background: he received bachelor's degrees from UC Berkeley in Electrical Engineering and Computer Science, earned advanced degrees from the University of Illinois at Urbana-Champaign in Neuroscience (PhD) and Medicine (MD), and worked as a postdoc in Computer Science, at Los Alamos National Labs, and at IBM Research. He now heads his own startup, Optimizing Mind (http://optimizingmind.com/), whose goal is to provide the next generation of machine learning algorithms.




"The origin of phenomena observed in brain studies such as oscillations and a speed-accuracy tradeoff remain unclear. It also remains unclear how the brain can be computationally flexible (quickly learn, modify, and use new patterns as it encounters them from the environment), and recall (reason with or describe recognizable patterns from memory). I study the brain from multidisciplinary perspectives looking for a single, compact network that can display these phenomena and perform flexible recognition.




Virtually all popular models of the brain and algorithms of machine learning remain “feedforward” even though it has been clear since the early days that this may limit flexibility (and is not optimal for recall, symbolic reasoning, or analysis). Feedforward methods use optimized weights to perform recognition. In feedforward networks “uniqueness information” is encoded into weights based on the frequency of occurrence found in the training set. This requires optimizing weights over the whole training set.




Instead, I suggest uniqueness is estimated during recognition, by performing optimization on the current pattern that is being recognized. This is NOT optimization to learn weights; rather, it is optimization to perform recognition. Subsequently, only simple Hebbian-like relational learning is required during learning, without any uniqueness information. The weights are no longer “feedforward”, but learning is more flexible and can be much faster (>>100x), especially for big data, since it does not require elaborate rehearsal. From a phenomenological perspective, the optimization during recognition displays general properties observed in brain and cognitive experiments, predicting oscillations, initial bursting with unrecognized patterns, and a speed-accuracy tradeoff.




I will compare computational and cognitive properties of both approaches and discuss the state of new research initiatives."
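
To make the contrast concrete, here is a toy numpy illustration of the general idea of optimization during recognition: iteratively solving for the activations that best explain the current input under simple stored associations, rather than computing a single feedforward pass through discriminatively optimized weights. This is a generic sketch of the concept, not Achler's actual algorithm:

import numpy as np

rng = np.random.default_rng(0)

# Simple "memory": each row of W is just a stored pattern (Hebbian-like association),
# with no discriminative weight optimization over a training set.
patterns = (rng.random((5, 20)) > 0.5).astype(float)   # 5 binary patterns, 20 features
W = patterns / np.linalg.norm(patterns, axis=1, keepdims=True)

def recognize(x, steps=200, lr=0.1):
    # Estimate pattern activations y by minimizing ||x - W.T @ y||^2 at recognition
    # time: optimization on the current input, not on the weights.
    y = np.zeros(W.shape[0])
    for _ in range(steps):
        err = x - W.T @ y                  # the part of the input still unexplained
        y += lr * (W @ err)                # gradient step on the activations
        y = np.clip(y, 0.0, None)          # keep activations non-negative
    return y

x = W[2] + 0.05 * rng.standard_normal(20)  # noisy version of stored pattern 2
print(np.round(recognize(x), 2))           # the largest activation should be index 2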










Ruslan Salakhutdinov

DEPARTMENT OF COMPUTER SCIENCE AND STATISTICS

http://www.cs.toronto.edu/~rsalakhu/




https://www.sciencemag.org/content/350/6266/1332.full


Science 11 December 2015:
Vol. 350 no. 6266 pp. 1332-1338
DOI: 10.1126/science.aab3050
RESEARCH ARTICLE




Human-level concept learning through probabilistic program induction

Brenden M. Lake [1,*], Ruslan Salakhutdinov [2], Joshua B. Tenenbaum [3]
[1] Center for Data Science, New York University, 726 Broadway, New York, NY 10003, USA.
[2] Department of Computer Science and Department of Statistics, University of Toronto, 6 King's College Road, Toronto, ON M5S 3G4, Canada.
[3] Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.
*Corresponding author. E-mail: brenden{at}nyu.edu

Handwritten characters drawn by a model


Not only do children learn effortlessly, they do so quickly and with a remarkable ability to use what they have learned as the raw material for creating new stuff. Lake et al. describe a computational model that learns in a similar fashion and does so better than current deep learning algorithms. The model classifies, parses, and recreates handwritten characters, and can generate new letters of the alphabet that look “right” as judged by Turing-like tests of the model's output in comparison to what real humans produce.

In the Spring of 2016, I will be moving to the Machine Learning Department at Carnegie Mellon University. I am looking for strong PhD students, please apply to CMU if you are interested in working with me.

I am an assistant professor of Computer Science and Statistics at the University of Toronto. I work in the field of statistical machine learning (See my CV.) I received my PhD in computer science from the University of Toronto in 2009. After spending two post-doctoral years at MIT, I joined the University of Toronto in 2011.


My research interests include Deep Learning, Probabilistic Graphical Models, and Large-scale Optimization.


Prospective students: Please read this to ensure that I read your email.
Recent Research Highlights:
See our recent Deep Learning Tutorial in Montreal:
Part 1:[Slides (pdf)], [Video]
Part 2:[Slides (pdf)], [Video]
See our recent Deep Learning Tutorial at KDD 2014: [Video], [ Slides].
Check out our new website with demos and software.
I was helping to run Thematic Program on Statistical Inference, Learning, and Big Data at the Fields Institute.
I am teaching an advanced Machine Learning course at the Fields Institute. Videos of my lectures will be available online. Also, check out Live Streaming of my course.


Adam Coates (Director, Baidu)


10 Billion Parameter Neural Networks in Your Basement

Papers:

Deep learning with COTS HPC systems

http://stanford.edu/~acoates/papers/CoatesHuvalWangWuNgCatanzaro_icml2013.pdf




Alex (Sandy) Pentland, MIT

http://www.theverge.com/2014/5/6/5661318/the-wizard-alex-pentland-father-of-the-wearable-computer








Prof. Michael Jordan, who is one of the authors of Latent Dirichlet Allocation, among other works:


August 20, 2014 at 18:30 Pacific, Yelp, SF:

http://www.meetup.com/sfmachinelearning/

recording

https://www.youtube.com/watch?v=zdavG9xbVp0&feature=youtu.be




http://www.meetup.com/SF-Bayarea-Machine-Learning/



AMPLab

TUPAQ
http://www.datasciencecentral.com/profiles/blogs/tupaq-automating-model-search-for-large-scale-machine-learning
Automating Model Search for Large Scale Machine Learning 
Evan R. Sparks (Computer Science Division, UC Berkeley, sparks@cs.berkeley.edu), Ameet Talwalkar (Computer Science Dept., UCLA, ameet@cs.ucla.edu), Daniel Haas (Computer Science Division, UC Berkeley, dhaas@cs.berkeley.edu), Michael J. Franklin (Computer Science Division, UC Berkeley, franklin@cs.berkeley.edu), Michael I. Jordan (Computer Science Division, UC Berkeley, jordan@cs.berkeley.edu), Tim Kraska (Dept. of Computer Science, Brown University, tim_kraska@brown.edu)
Abstract 
The proliferation of massive datasets combined with the development of sophisticated analytical techniques has enabled a wide variety of novel applications such as improved product recommendations, automatic image tagging, and improved speech-driven interfaces. A major obstacle to supporting these predictive applications is the challenging and expensive process of identifying and training an appropriate predictive model. Recent efforts aiming to automate this process have focused on single node implementations and have assumed that model training itself is a black box, limiting their usefulness for applications driven by large-scale datasets. In this work, we build upon these recent efforts and propose an architecture for automatic machine learning at scale comprised of a cost-based cluster resource allocation estimator, advanced hyperparameter tuning techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching and optimal resource allocation. The result is TUPAQ, a component of the MLbase system that automatically finds and trains models for a user’s predictive application with comparable quality to those found using exhaustive strategies, but an order of magnitude more efficiently than the standard baseline approach. TUPAQ scales to models trained on Terabytes of data across hundreds of machines.
http://www.datascienceassn.org/sites/default/files/Automating%20Model%20Search%20for%20Large%20Scale%20Machine%20Learning.pdf
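
One piece of the abstract that translates directly into code is the bandit resource allocation idea: start many candidate configurations, look at partial results, and keep spending compute only on the ones that look promising. A toy successive-halving sketch in Python, in the spirit of that idea (not TUPAQ's actual implementation):

import random

random.seed(0)

def partial_train(config, budget):
    # Stand-in for training a model for `budget` units of work and returning a
    # validation score; here a noisy function of a fake "learning rate".
    score = 1.0 - abs(config["lr"] - 0.1)              # best models have lr near 0.1
    return score + random.gauss(0, 0.05 / budget)      # noise shrinks with more budget

# Start with many random configurations and a small per-model budget.
configs = [{"lr": 10 ** random.uniform(-4, 0)} for _ in range(16)]
budget = 1

while len(configs) > 1:
    scored = sorted(configs, key=lambda c: partial_train(c, budget), reverse=True)
    configs = scored[: max(1, len(scored) // 2)]       # keep the better half
    budget *= 2                                        # give the survivors more resources

print("selected config:", configs[0])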





MLI: An API for Distributed Machine Learning

Evan R. Sparks, Ameet Talwalkar, Virginia Smith, Jey Kottalam, Xinghao Pan, Joseph Gonzalez, Michael J. Franklin, Michael I. Jordan, Tim Kraska
University of California, Berkeley; Brown University




AMPCAMP November 2014

http://ampcamp.berkeley.edu/5/?utm_source=AMP+Camp+Wait+List+and+Abandonded+Registrations&utm_campaign=ae6e8c94fd-AMP_Camp_5_Slides_and_video_12_10_2014&utm_medium=email&utm_term=0_8a10332e0b-ae6e8c94fd-215777105







DEEP LEARNING, PROBABILISTIC PROGRAMMING, PARALLEL LEARNING & MORE
http://www.next.ml/
Jeff Risberg - former Tibco executive, currently startup mentor and investor


Spark and MLlib training material
http://therisbergfamily.com/


https://github.com/JeffRisberg













Sebastian Thrun
http://robots.stanford.edu/





Prof. C.J. Lin:
"Large-scale linear classification: status and challenges"
2014-10-30


https://www.youtube.com/watch?v=GCIJP0cLSmU&feature=youtu.be






Richard Zemel

Professor


Dept. of Computer Science

University of Toronto

http://www.phoenixhollo.com/en/Zemel_1.html





Jeremy Howard

http://www.enlitic.com/


good website design - that's how it's done.


home page of Jeremy Howard, President and Chief Scientist of Kaggle, founder of FastMail.FM (sold to Opera in May 2010), and a co-founder of The Optimal Decisions Group (sold to ChoicePoint in Feb 2008).


http://jhoward.fastmail.fm.user.fm/

https://www.linkedin.com/profile/view?id=54272

Co-founder
The Optimal Decisions Group June 1999 – August 2008 (9 years 3 months)




I came up with the idea for Optimal Decisions Group (http://www.optimaldecisions.com) and worked with my university friend (and math guru) Bruce Davey to turn it into a business. The idea was to move insurance pricing from the risk-minimization approach used up until that time to a profit-maximization approach (incorporating price elasticity, competitor prices, multi-period simulations, and so forth). The idea turned out to work really well in practice, and Optimal Decisions built a strong presence in the US, UK, and Australia. After nearly 10 years of constant growth I sold the company to ChoicePoint. Today the product is sold as "LexisNexis Optimal Decisions Toolkit".
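
The pricing shift described above boils down to a small optimization: instead of charging expected claims plus a fixed margin, pick the price that maximizes expected profit once price elasticity is in the loop. A toy sketch with invented numbers (not ODG's actual model):

import numpy as np

expected_claims = 800.0                    # the risk model's expected cost for this policy
competitor_price = 1000.0

def p_accept(price):
    # Toy price-elasticity curve: acceptance falls as the quote rises above the market.
    return 1.0 / (1.0 + np.exp((price - competitor_price) / 80.0))

prices = np.linspace(800, 1400, 601)
expected_profit = p_accept(prices) * (prices - expected_claims)
best = prices[np.argmax(expected_profit)]

print("risk-based price (cost + 10%):", round(expected_claims * 1.1))
print("profit-maximizing price:", round(best))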




New medical startup with


Rebecca Weiss

https://www.linkedin.com/profile/view?id=16135206





We're looking to apply machine learning to medical diagnostics. Deep learning for medical imaging will be a key component.


"We're looking for additional data/product partners, healthcare advisors, and potential recruits with very strong applied numerical computing skills (particularly linear algebra, convex optimization, GPU programming, and computer vision)."



Datascience Journal Meetup Participants


Michael Rinehart


https://www.linkedin.com/profile/view?id=18146056


Principal Scientist at Elastica






Priya Desai



https://www.linkedin.com/profile/view?id=10882596


Data Scientist-Algorithms at Stanford University, School of Medicine


Arno Candel (0xdata)

http://www.slideshare.net/0xdata/deep-learning-through-examples



Bill MacCartney


http://nlp.stanford.edu/~wcmac/



Jure Leskovec

https://cs.stanford.edu/people/jure/pubs/




Sebastian Thrun

http://robots.stanford.edu/
Steve Omohundro

http://steveomohundro.com/

http://possibilityresearch.com/

http://selfawaresystems.com/





James Kobielus, columnist, IBM


http://www.infoworld.com/author/James-Kobielus/




Dan Rice

Cognitive/Machine Learning Scientist - Rice Analytics/SkyRELR.com; Calculus of Thought (Elsevier: Academic Press, 2014)

Top Contributor




https://www.linkedin.com/groups/Machine-learning-When-data-scientists-35222.S.5859267227743711236





Root Cause Faster with Data Analytics - Webinar


Join Gary Brandt, HP Global IT Functional Architect, to learn how HP IT incorporates best operational practices to collect and analyze structured and unstructured data using big data analytics at enterprise scale.

Found in 30 minutes: How HP IT used Operations Analytics for rapid root cause analysis


http://h30499.www3.hp.com/t5/Business-Service-Management-BAC/Found-in-30-minutes-How-HP-IT-used-Operations-Analytics-for/ba-p/6574864#.VC4wzitdV9k


Awesome RNN

http://jiwonkim.org/awesome-rnn/
Code

Theano - Python

Simple IPython tutorial on Theano
Deep Learning Tutorials
RNN for semantic parsing of speech
LSTM network for sentiment analysis
Pylearn2 : Library that wraps a lot of models and training algorithms in deep learning
Blocks : modular framework that enables building neural network models
Keras : Theano-based deep learning library similar to Torch, but in Python
Lasagne : Lightweight library to build and train neural networks in Theano
theano-rnn by Graham Taylor
Passage : Library for text analysis with RNNs
Theano-Lights : Contains many generative models

Caffe - C++ with MATLAB/Python wrappers

LRCN by Jeff Donahue

Torch - Lua

char-rnn by Andrej Karpathy : multi-layer RNN/LSTM/GRU for training/sampling from character-level language models
LSTM by Wojciech Zaremba : Long Short Term Memory Units to train a language model on word level Penn Tree Bank dataset
Oxford by Nando de Freitas : Oxford Computer Science - Machine Learning 2015 Practicals
rnn by Nicholas Leonard : general library for implementing RNN, LSTM, BRNN and BLSTM (highly unit tested).

Etc.

Neon: new deep learning library in Python, with support for RNN/LSTM, and a fast image captioning model
Brainstorm: deep learning library in Python, developed by IDSIA, including various recurrent structures
Chainer : new, flexible deep learning library in Python
CGT(Computational Graph Toolkit) : replicates Theano's API, but with very short compilation time and multithreading
RNNLIB by Alex Graves : C++ based LSTM library
RNNLM by Tomas Mikolov : C++ based simple code

faster-RNNLM by Yandex : C++ based rnnlm implementation aimed to handle huge datasets
https://github.com/yandex/faster-rnnlm

neuraltalk by Andrej Karpathy : numpy-based RNN/LSTM implementation
gist by Andrej Karpathy : raw numpy code that implements an efficient batched LSTM
Recurrentjs by Andrej Karpathy : a beta javascript library for RNN

My search: RNN LSTM on GitHub

JAVA

Munich Ph.D. Java 2012-2014
BitBucket
https://bitbucket.org/dmonner/xlbp
http://www.cs.umd.edu/~dmonner/papers/nn2012.pdf
http://www.overcomplete.net/
XLBP README
Derek Monner, http://www.cs.umd.edu/~dmonner

XLBP stands for eXtensible Localized Back-Propagation. It is a toolkit for 
building neural networks for use with the LSTM-g training method, which is a 
generalized (-g) descendant of LSTM (the Long Short Term Memory) and of error 
back-propagation methods in general. It can build and train arbitrarily complex 
networks of neurons that can not only add but multiply inputs and save state 
across time. For more information about LSTM-g, see the following paper (also 
available at the project website):

D. Monner and J.A. Reggia (2012). A generalized LSTM-like training algorithm 
for second-order recurrent neural networks. Neural Networks, 25, pp 70-83. 
Available at http://www.cs.umd.edu/~dmonner/papers/nn2012.pdf

XLBP is released under the GNU General Public License, version 3. For more 
information on your rights and responsibilities under this license, see the 
file LICENSE.


INSTALLATION

This XLBP repository doubles as a valid Java project which you can import into 
the Eclipse IDE. This is the recommended way to compile and run XLBP.

XLBP requires Java 6 or above.


USAGE

For a quick start on using XLBP for the most common applications, see the file 
"tutorial.pdf" in the top level of the source tree.

old
https://github.com/evolvingstuff/LongShortTermMemory

java implementation of old Alex Graves C++ RNN LSTM toolkit
http://deeplearning4j.org/recurrentnetwork.html

https://github.com/deeplearning4j/dl4j-0.4-examples/blob/master/src/main/java/org/deeplearning4j/examples/rnn/GravesLSTMCharModellingExample.java

CUDA enabled C++
CURRENNT
http://sourceforge.net/projects/currennt/

TORCH LUA
https://github.com/stanfordnlp/treelstm/tree/master/sentiment

Various
https://www.reddit.com/r/MachineLearning/comments/2j7ytz/whats_the_best_library_out_there_for/


Stat212b: Topics Course on Deep Learning
by Joan Bruna, UC Berkeley, Stats Department. Spring 2016.
Topics in Deep Learning

http://joanbruna.github.io/stat212b/

This topics course aims to present the mathematical, statistical and computational challenges of building stable representations for high-dimensional data, such as images, text and data. We will delve into selected topics of Deep Learning, discussing recent models from both supervised and unsupervised learning. Special emphasis will be on convolutional architectures, invariance learning, unsupervised learning and non-convex optimization.

Richard Socher

CS224D Lecture 7 - Introduction to TensorFlow (19th Apr 2016)
https://www.youtube.com/watch?v=L8Y2_Cq2X5s&feature=youtu.be

NVIDIA TensorRT
High performance deep learning inference for production deployment
https://developer.nvidia.com/tensorrt















