TOMAS MIKOLOV
2012
RNNLM Toolkit
by Tomas Mikolov, 2010-2012
http://www.rnnlm.org/
Introduction
Neural network based language models are nowadays among the most successful techniques for statistical language modeling. The 'rnnlm' toolkit can be used to train, evaluate and use such models. The goal of this toolkit is to speed up research progress in language modeling: first, by providing a useful implementation that demonstrates some of the principles; second, by supporting empirical experiments in speech recognition and other applications; and third, by providing strong state-of-the-art baseline results against which future research that aims to "beat the state of the art" can be compared.
Download
rnnlm-0.1h - some older version of the toolkit
rnnlm-0.2b
rnnlm-0.2c
rnnlm-0.3b
rnnlm-0.3c
rnnlm-0.3d
rnnlm-0.3e
rnnlm-0.4b - latest version of the toolkit
my notes:
written in C++ (compiles with any C++ compiler)
uses stochastic gradient descent
one hidden layer
one (optional) compression layer
uses a softmax output layer, factorized with word classes
needs SRILM installed for the n-gram model to work; the n-gram model is used for comparison with (and combination with) the rnnlm model in example.sh
created around 2012, when SRILM 1.6.0 was current
current SRILM version (2015) is 1.7.1
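A minimal training/evaluation sketch (my notes, not official documentation - flag names are as I recall them from the toolkit's usage message, so double-check by running ./rnnlm with no arguments; hidden/class sizes are arbitrary example values):
./rnnlm -train train.txt -valid valid.txt -rnnlm model.rnn -hidden 100 -class 100 -bptt 4 -bptt-block 10 -debug 2
./rnnlm -rnnlm model.rnn -test test.txt
The first command trains a class-factorized RNN LM with truncated BPTT; the second reports perplexity ("PPL net") on held-out text. example.sh additionally builds an SRILM n-gram model and combines its scores with the RNN scores.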
srilm links:
http://www.speech.sri.com/projects/srilm/
http://www.speech.sri.com/projects/srilm/download.html
SRILM Installation and Running Tutorial
srilm-1.6.0.tar.gz - Google Code
installing srilm 1.6.1beta on macOS
http://www1.icsi.berkeley.edu/~wooters/SRILM/index.html
Basic examples - very useful for quick introduction (training, evaluation, hyperparameter selection, simple n-best list rescoring, etc.) - 35MB
Advanced examples - includes large scale experiments with speech lattices (n-best list rescoring, ...) - 235MB, by Stefan Kombrink
Slides from my presentation at Google - pdf
RNNLM is now integrated into Kaldi toolkit! Check this.
Example of data generated by 4-gram language model, by RNN model and by RNNME model (all models are trained on Broadcast news data, 400M/320M words) - check which generated sentences are easier to read!
Word projections from RNN-80 and RNN-640 models trained on Broadcast news data + tool for computing the closest words. (extra large 1600-dimensional features from 3 models are here)
Frequently asked questions
FAQ archive
------------------------------------------------------------------------------------------------------------------------
6. Something fails! What should I do?
------------------------------------------------------------------------------------------------------------------------
- compilation: the code should be easy to compile with any c++ compiler, let us know if you experience any problems
- if the 'example.sh' fails, check if you have installed SRILM tools (if the combination of models fails)
Known bugs:
- with MSDOS end of line encoding, the rnnlm tool works incorrectly; use 'dos2unix'
- empty lines: SRILM skips empty lines, while rnnlm does not; it is therefore better to remove all empty lines from test sets
  if scores from the rnnlm and SRILM tools are to be combined
- other CPU architectures than x86: the FAST_EXP() macro from rnnlmlib.cpp might fail; in such case, use normal call
to exp()
------------------------------------------------------------------------------------------------------------------------
7. Where can I get more information about the 'recurrent neural network based language model'?
------------------------------------------------------------------------------------------------------------------------
First check the examples on the webpage:
http://www.fit.vutbr.cz/~imikolov/rnnlm/
Contact:
email: tmikolov@gmail.com
References:
[1] Mikolov, T., Karafiát, M., Burget, L., Černocký, J., Khudanpur, S.: Recurrent neural network based language model, In:
Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010),
Makuhari, Chiba, JP, ISCA, 2010
http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf
[2] Mikolov, T., Kombrink, S., Burget, L., Černocký, J., Khudanpur, S.: Extensions of Recurrent Neural Network Language Model,
In: Proc. ICASSP 2011
https://scholar.google.com/citations?view_op=view_citation&hl=en
[3] Mikolov, T., Deoras, A., Kombrink, S., Burget, L., Černocký, J.: Empirical Evaluation and Combination of Advanced
Language Modeling Techniques, submitted to Interspeech 2011
http://research.microsoft.com/pubs/175560/InterSpeech-2011.PDF
Recurrent tweets Project presentation
Mathias Berglund, Petri Kyröläinen, Yu Shen December 9, 2013
http://research.ics.aalto.fi/cog/langtech13/2013-12-09_Recurrent_Tweets.pdf
RNNLM-HS: fast recurrent nnet language model; WSJ example
https://github.com/vimal-manohar91/kaldi-git/tree/master/tools/rnnlm-hs-0.1b
****************************************************************************
Mikolov Tomáš: Statistical Language Models based on Neural Networks. PhD thesis, Brno University of Technology, 2012.
All the details that did not make it into the papers, plus more results on additional tasks.
Mikolov Tomáš, Sutskever Ilya, Deoras Anoop, Le Hai-Son, Kombrink Stefan, Černocký Jan: Subword Language Modeling with Neural Networks. Not published (rejected from ICASSP 2012).
Using subwords as basic units for RNNLMs has several advantages: no OOV rate, smaller model size and better speed. Just split the infrequent words into subword units.
Mikolov Tomáš, Deoras Anoop, Povey Daniel, Burget Lukáš, Černocký Jan: Strategies for Training Large Scale Neural Network Language Models, In: Proceedings of ASRU 2011
How to train an RNN LM on 400M words on a single core in a few days, with 1% absolute improvement in WER on a state-of-the-art setup.
Mikolov Tomáš, Kombrink Stefan, Deoras Anoop, Burget Lukáš, Černocký Jan: RNNLM - Recurrent Neural Network Language Modeling Toolkit, In: ASRU 2011 Demo Session
Brief description of the RNN LM toolkit that is available on this website.
Mikolov Tomáš, Deoras Anoop, Kombrink Stefan, Burget Lukáš, Černocký Jan: Empirical Evaluation and Combination of Advanced Language Modeling Techniques, In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011), Florence, IT
Comparison to other LMs shows that RNN LMs are state of the art by a large margin. Improvements increase with more training data.
Kombrink Stefan, Mikolov Tomáš, Karafiát Martin, Burget Lukáš: Recurrent Neural Network based Language Modeling in Meeting Recognition, In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011), Florence, IT
An easy way to adapt RNN LMs, plus speedup tricks for rescoring (can be faster than 0.05 RT).
Deoras Anoop, Mikolov Tomáš, Kombrink Stefan, Karafiát Martin, Khudanpur Sanjeev: Variational Approximation of Long-span Language Models for LVCSR, In: Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, Prague, CZ
An RNN LM can be approximated by an n-gram model and used directly in the decoder at no extra computational cost.
Mikolov Tomáš, Kombrink Stefan, Burget Lukáš, Černocký Jan, Khudanpur Sanjeev: Extensions of Recurrent Neural Network Language Model, In: Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, Prague, CZ
Better results by using backpropagation through time, and better speed by using classes.
Mikolov Tomáš, Karafiát Martin, Burget Lukáš, Černocký Jan, Khudanpur Sanjeev: Recurrent neural network based language model, In: Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), Makuhari, Chiba, JP
We show that an RNN LM can be trained by simple backpropagation alone, despite popular belief to the contrary.
*******
methods for learning vector space representations of words:
Distributed Representations of Words and Phrases and their Compositionality
https://arxiv.org/abs/1310.4546
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean
(Submitted on 16 Oct 2013)
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
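For reference, a minimal sketch of training skip-gram vectors with negative sampling using the word2vec tool released with this work (corpus name and parameter values are placeholders, not recommendations):
# skip-gram (-cbow 0), 5 negative samples instead of hierarchical softmax (-hs 0), subsampling of frequent words
./word2vec -train text8 -output vectors.bin -cbow 0 -size 200 -window 5 -negative 5 -hs 0 -sample 1e-4 -threads 8 -binary 1
# interactive nearest-neighbour queries on the trained vectors
./distance vectors.bin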
Distributed Representations of Sentences and Documents
Quoc Le (qvl@google.com), Tomas Mikolov (tmikolov@google.com)
Google Inc, 1600 Amphitheatre Parkway, Mountain View, CA 94043
Abstract
Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, "powerful," "strong" and "Paris" are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.
GloVe: Global Vectors for Word Representation
http://nlp.stanford.edu/projects/glove/
Jeffrey Pennington, Richard Socher, Christopher D. Manning
Introduction
GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
GloVe: Global Vectors for Word Representation
http://nlp.stanford.edu/pubs/glove.pdf
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014
Abstract
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
Details of intrinsic word vector evaluation
Word vector analogies: semantic and syntactic examples from
https://code.google.com/archive/p/word2vec/source/default/source
http://code.google.com/p/word2vec/source/browse/trunk/questions-words.txt
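The analogy set above can be scored with the compute-accuracy tool bundled with word2vec (a sketch following the word2vec demo scripts; the second argument caps the vocabulary considered):
./compute-accuracy vectors.bin 30000 < questions-words.txt
It prints per-category and overall accuracy for the semantic and syntactic analogy questions.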
2014-2015
[PDF] Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews
G Mesnil, MA Ranzato, T Mikolov, Y Bengio - arXiv preprint arXiv:1412.5335, 2014
Abstract: Sentiment analysis is a common task in natural language processing that aims to
detect polarity of a text document (typically a consumer review). In the simplest settings, we
discriminate only between positive and negative sentiment, turning the task into a ...
http://arxiv.org/pdf/1412.5335.pdf
https://github.com/mesnilgr/nbsvm
my notes
use https to clone git repository
git clone https://github.com/mesnilgr/iclr15
//git clone git@github.com:mesnilgr/iclr15.git
cd iclr15;
chmod +x oh_my_go.sh
./oh_my_go.sh
This code has been tested on Ubuntu and Fedora. Compilation of word2vec on OSX seems to be an issue
my env
1. alleged patch to allow rnnlm build on macos - doesn't fix exp10 issue
https://gist.github.com/tpeng/9020592
patch makefile:
-----------------------------------------------------------------------------------------------------------------
#CC = x86_64-linux-g++-4.6
CC = llvm-gcc
WEIGHTTYPE = float
CFLAGS = -D WEIGHTTYPE=$(WEIGHTTYPE) -lm -O2 -Wall -funroll-loops -ffast-math -lstdc++
#CFLAGS = -D WEIGHTTYPE=$(WEIGHTTYPE) -lm -O2 -Wall -funroll-loops -ffast-math
#CFLAGS = -lm -O2 -Wall

all: rnnlmlib.o rnnlm

rnnlmlib.o : rnnlmlib.cpp
	$(CC) $(CFLAGS) $(OPT_DEF) -c rnnlmlib.cpp

rnnlm : rnnlm.cpp
	$(CC) $(CFLAGS) $(OPT_DEF) rnnlm.cpp rnnlmlib.o -o rnnlm

clean:
	rm -rf *.o rnnlm
-------------------------------------------------------------------------------------------------------------------
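With this makefile the build is just (assuming llvm-gcc, or whatever compiler CC points at, is on the PATH; note that the recipe lines under each target must be indented with a real tab):
make clean && make
./rnnlm
Running ./rnnlm without arguments should print the usage text, which is a quick check that the binary compiled and linked correctly.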
really good advice at
http://dev.libqxt.org/libqxt-old-hg/issue/156/problem-with-llrint-exp10-not-declared-in
----------------------------------------------------------------------------------------------------------------------
Ok. I looked into this further. It turns out my math.h function in Mac OSX does not have exp10. I changed line 269 to use pow(x,y) instead of exp10(x). So
//qlonglong modv = llrint(exp10(4-n));
//this line fails compilation in Mac OSX10.6.8 - no exp10 function
qlonglong modv = llrint(pow((double)10,(4-n)));
//this should work. Modified to use Mac's existing function library
-----------------------------------------------------------------------------------------------------------------------
[PDF] Learning Longer Memory in Recurrent Neural Networks
T Mikolov, A Joulin, S Chopra, M Mathieu, MA Ranzato - arXiv preprint arXiv: …, 2014
Abstract: Recurrent neural network is a powerful model that learns temporal patterns in
sequential data. For a long time, it was believed that recurrent networks are difficult to train
using simple optimizers, such as stochastic gradient descent, due to the so-called ...
http://arxiv.org/pdf/1412.7753.pdf
One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling (Chelba et al, 2014)
http://arxiv.org/pdf/1312.3005v3.pdf
Summary: NN, RNN, RNNME
- RNN outperforms FNN on language modeling tasks; both are better than n-grams
- the question "are neural nets better than n-grams" is incomplete: the best solution is to use both
- joint training of RNN and maxent with n-gram features works great on large datasets
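As a concrete (hedged) sketch of that joint training with the toolkit: the -direct / -direct-order options add a hash-based maxent component over n-gram features, giving the RNNME model (sizes below are illustrative only):
./rnnlm -train train.txt -valid valid.txt -rnnlm model.rnnme -hidden 200 -class 200 -direct 1000 -direct-order 3
Here -direct sets the size of the maxent hash (in millions of parameters) and -direct-order the maximum n-gram order of its features.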
Maximum Entropy Modeling
http://homepages.inf.ed.ac.uk/lzhang10/maxent.html
Resources (from Tomas Mikolov's COLING 2014 tutorial slides):
- Open-source neural-net based NLP software: RNNLM toolkit, word2vec and other tools; links to large text corpora and pre-trained models; benchmark datasets for advancing the state of the art
- RNNLM toolkit: available at rnnlm.org; allows training of RNN and RNNME models; extensions are actively developed, for example a multi-threaded version with hierarchical softmax: http://svn.code.sf.net/p/kaldi/code/trunk/tools/rnnlm-hs-0.1b/
- Word2vec: available at https://code.google.com/p/word2vec/; tool for training word vectors using the CBOW and skip-gram architectures, supports both negative sampling and hierarchical softmax; optimized for very large datasets (billions of training words); includes links to models pre-trained on large datasets (100B words)
- CSLM (feedforward NNLM code): Continuous Space Language Model toolkit, http://www-lium.univ-lemans.fr/cslm/ - implementation of a feedforward neural network language model by Holger Schwenk
- Other neural net SW: list available at http://deeplearning.net/software_links/ (mostly general machine learning tools, not necessarily NLP)
- Large text corpora: short list available at the word2vec project, https://code.google.com/p/word2vec/#Where_to_obtain_the_training_data - sources are a Wikipedia dump, statmt.org, and the UMBC webbase corpus; altogether around 8 billion words can be downloaded for free
- Benchmark datasets (LMs, word vectors): the Penn Treebank setup including the usual text normalization is part of the example archive at rnnlm.org; WSJ setup (simple ASR experiments, includes N-best lists): http://www.fit.vutbr.cz/~imikolov/rnnlm/kaldi-wsj.tgz; datasets for measuring word / phrase similarity: http://research.microsoft.com/en-us/um/people/gzweig/Pubs/myz_naacl13_test_set.tgz, https://code.google.com/p/word2vec/source/browse/trunk/questions-words.txt, https://code.google.com/p/word2vec/source/browse/trunk/questions-phrases.txt
- Final summary: distributed word representations >= word classes; neural nets >= logistic regression; neural networks are a useful statistical tool, but not the final solution to AI by themselves; deep learning is an interesting research direction, but we need more research to understand how to learn complex patterns in language
Juergen Schmidhuber
https://plus.google.com/100849856540000067209/posts
Recent (2014) benchmark records in speech recognition, machine translation, etc., achieved with the help of deep Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs), often at major IT companies.
Vanishing gradients:
- as we propagate the gradients back in time, their magnitude usually decreases and quickly approaches tiny values: this is called the vanishing gradient
- in practice this means that learning long-term dependencies is difficult
- special architectures address this problem (Long Short-Term Memory - LSTM RNN, Hochreiter & Schmidhuber, 1997)
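In symbols (the standard derivation, not taken from the slides): for a recurrent state $h_t = \sigma(W h_{t-1} + U x_t)$, backpropagation through time multiplies one Jacobian per step:
$\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \operatorname{diag}\!\left(\sigma'(a_i)\right) W$, where $a_i = W h_{i-1} + U x_i$.
The norm of this product scales roughly like $(\gamma \lVert W \rVert)^{t-k}$ with $\gamma$ bounding $|\sigma'|$: below 1 the gradient vanishes geometrically with the time lag, above 1 it can explode. LSTM's gated memory cell is designed to keep this product close to 1 along the cell state.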
http://people.idsia.ch/~juergen/rnn.html
http://people.idsia.ch/~juergen/oldrnn4.html
Why Use Recurrent Neural Networks? Why Use LSTM?
-----------------------------------------------------------------------------------------------------------------------
rnnlmlib.cpp:1651:40: error: use of undeclared identifier 'exp10'
fprintf(flog, "PPL net: %f\n", exp10(-logp/(real)wordcn));
replacing exp10(-logp/(real)wordcn) with pow((double)10, (-logp/(real)wordcn)), i.e.
fprintf(flog, "PPL net: %f\n", pow((double)10, (-logp/(real)wordcn)));
in my case the full set of errors and replacements was:
rnnlmlib.cpp:1651:40: error: use of undeclared identifier 'exp10'
fprintf(flog, "PPL net: %f\n", exp10(-logp/(real)wordcn));
replace with fprintf(flog, "PPL net: %f\n", pow((double)10, (-logp/(real)wordcn))); ^
rnnlmlib.cpp:1798:35: error: use of undeclared identifier 'exp10'
fprintf(flog, "\nPPL net: %f\n", exp10(-logp/(real)wordcn));
replace with fprintf(flog, "\nPPL net: %f\n", pow((double)10, (-logp/(real)wordcn)));
^
rnnlmlib.cpp:1800:43: error: use of undeclared identifier 'exp10'
fprintf(flog, "PPL other: %f\n", exp10(-log_other/(real)wordcn));
replace with fprintf(flog, "PPL other: %f\n", pow((double)10, (-log_other/(real)wordcn)));
^
rnnlmlib.cpp:1801:45: error: use of undeclared identifier 'exp10'
fprintf(flog, "PPL combine: %f\n", exp10(-log_combine/(real)wordcn));
replace with fprintf(flog, "PPL combine: %f\n", pow((double)10, (-log_combine/(real)wordcn)));
^
rnnlmlib.cpp:1936:28: error: use of undeclared identifier 'exp10'
printf("\nPPL net: %f\n", exp10(-logp/(real)wordcn));
replace with printf("\nPPL net: %f\n", pow((double)10, (-logp/(real)wordcn)));
^
rnnlmlib.cpp:1938:36: error: use of undeclared identifier 'exp10'
printf("PPL other: %f\n", exp10(-log_other/(real)wordcn));
replace with printf("PPL net: %f\n", pow((double)10, (-log_other/(real)wordcn)));
^
rnnlmlib.cpp:1939:38: error: use of undeclared identifier 'exp10'
printf("PPL combine: %f\n", exp10(-log_combine/(real)wordcn));
replace with printf("PPL combine: %f\n", pow((double)10, (-log_combine/(real)wordcn)));
^
7 errors generated.
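Instead of patching all seven call sites by hand, the same substitution can be applied in one pass (a sketch - it assumes the literal 'exp10(' occurs only in these perplexity printouts; -i.bak leaves a .bak backup and works with both BSD sed on OS X and GNU sed):
sed -i.bak 's/exp10(/pow((double)10, /g' rnnlmlib.cpp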
--------------------
after replacement, to prevent overwriting during script execution, change the rnnlm.sh script:
#IR: commenting out the lines that download fresh code, because we need to preserve the changed rnnlmlib.cpp with the exp10 calls (which fail the build) replaced
#wget http://www.fit.vutbr.cz/~imikolov/rnnlm/rnnlm-0.3e.tgz
#tar -xvf rnnlm-0.3e.tgz
----------------------
Word2Vec
OSX 10 env errors debugging:
https://code.google.com/p/word2vec/issues/detail?id=17
http://coolestguidesontheplanet.com/install-and-configure-wget-on-os-x/
http://code.google.com/p/word2vec/issues/detail?id=1
Related code:
gensim word2vec (improved performance)
Deep learning via word2vec’s “skip-gram and CBOW models”, using either hierarchical softmax or negative sampling
http://radimrehurek.com/gensim/models/word2vec.html
the problem now is:
mkdir: word2vec: File exists
../iclr15/scripts/paragraph.sh: line 7: shuf: command not found
----------------------
the command-line shuffling utility shuf is part of GNU (Linux) coreutils and is absent on Mac OS
http://superuser.com/questions/760732/randomly-shuffle-rows-in-a-large-text-file
http://brew.sh/
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
command:
brew install coreutils
location:
/usr/local/bin/gshuf
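To let the iclr15 scripts find shuf without editing them, one workaround (mine, not from the scripts) is to expose coreutils' gshuf under its GNU name:
ln -s /usr/local/bin/gshuf /usr/local/bin/shuf
or put the unprefixed coreutils names first on the PATH:
export PATH="/usr/local/opt/coreutils/libexec/gnubin:$PATH"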
----------------------
paragraph.sh
typos in the commands; corrected versions below (they convert the sentence vectors into labeled index:value format, with the first 12,500 lines of each 25,000-line split labeled 1 and the rest labeled -1):
head -n 25000 sentence_vectors.txt | awk 'BEGIN{a=0;}{if (a<12500) printf "1 "; else printf "-1 "; for (b=1; b<NF; b++) printf b ":" $(b+1) " "; print ""; a++;}' > full-train.txt
head -n 50000 sentence_vectors.txt | tail -n 25000 | awk 'BEGIN{a=0;}{if (a<12500) printf "1 "; else printf "-1 "; for (b=1; b<NF; b++) printf b ":" $(b+1) " "; print ""; a++;}' > test.txt
--------------------------------------------
bash-3.2$ ../iclr15/scripts/nbsvm.sh
Cloning into 'nbsvm'...
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
solution - use https:
#git clone git@github.com:mesnilgr/nbsvm.git
git clone https://github.com/mesnilgr/nbsvm.git
---------------------
SOURCE CODE
LSTM source code of Felix Gers (ex-IDSIA)
LSTM source code in the PDP++ software
Understanding LSTM Networks
Posted on August 27, 2015
colah's blog
Linguistic Regularities in Sparse and Explicit Word Representations
https://levyomer.files.wordpress.com/2014/04/linguistic-regularities-in-sparse-and-explicit-word-representations-conll-2014.pdf
ANNOTATING RELATION INFERENCE IN CONTEXT VIA QUESTION ANSWERING
https://levyomer.wordpress.com/2016/05/01/annotating-relation-inference-in-context-via-question-answering/
Yoshua Bengio
How to Construct Deep Recurrent Neural Networks
http://arxiv.org/pdf/1312.6026.pdf
On optimization methods for deep learning
http://cs.stanford.edu/~jngiam/papers/LeNgiamCoatesLahiriProchnowNg2011.pdf
Brown clustering
http://en.wikipedia.org/wiki/Brown_clustering
Perplexity per word
http://en.wikipedia.org/wiki/Perplexity
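For reference, the per-word perplexity that rnnlm prints as "PPL net" (and which the exp10()/pow() calls above compute from summed base-10 log-probabilities) is
$\mathrm{PPL} = 10^{-\frac{1}{N}\sum_{i=1}^{N} \log_{10} P(w_i \mid w_1, \dots, w_{i-1})}$,
i.e. the inverse geometric mean of the per-word probabilities; in the source, logp appears to accumulate the $\log_{10}$ terms and wordcn is $N$.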
What is Maximum Entropy Modeling
http://homepages.inf.ed.ac.uk/lzhang10/maxent.html
Deep Learning. Gregory Piatetsky (@kdnuggets) posted this on Twitter http://www.kdnuggets.com/2014/05/learn-deep-learning-courses-tutorials-overviews.html
Artificial Neural Networks/Neural Network Basics
http://en.wikibooks.org/wiki/Artificial_Neural_Networks/Neural_Network_Basics#Learning_Rate
BIST Parsers
(Yoav Goldberg likes)
Graph & Transition based dependency parsers using BiLSTM feature extractors
The techniques behind the parser are described in the paper Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations.
Required software
Python 2.7 interpreter
PyCNN library
https://github.com/elikip/bist-parser
Google Parser SyntaxNet
Differences between L1 and L2 as Loss Function and Regularization
http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/
TEXT CLASSIFICATION FOR SENTIMENT ANALYSIS – STOPWORDS AND COLLOCATIONS
Bayesian classifier
http://streamhacker.com/2010/05/24/text-classification-sentiment-analysis-stopwords-collocations/
Latest research papers in NLP, advanced applications of RNN with LSTM
- RECURRENT NEURAL NETWORKS - FEEDBACK ... (IDSIA, Dalle Molle Institute for Artificial Intelligence Research) - www.idsia.ch/.../r... (mentions RNNLIB, a recurrent neural network library for sequence learning problems, and LSTM applications by former students and postdocs)
- [1502.06922] Deep Sentence Embedding Using the Long ... (H. Palangi et al., 2015, arXiv; LSTM-RNN trained in a weakly supervised manner, embedding vectors usable in many applications)
- arXiv:1506.00019v4 [cs.LG] 17 Oct 2015 (Z. C. Lipton, 2015; a selective survey of research on recurrent neural networks for sequence learning) - arxiv.org/pdf/1506.00019
- The Unreasonable Effectiveness of Recurrent Neural ... (Andrej Karpathy, May 21, 2015) - karpathy.github.io/2015/05/21/rnn-effectiveness/
- Long short-term memory - Wikipedia - https://en.wikipedia.org/wiki/Long_short-term_memory
- [PDF] Microsoft Research paper (B. Peng) on language understanding with LSTM / Neural Turing Machine - research.microsoft.com/pubs/246720/rnn_em.pdf
- [PDF] Deep Sentence Embedding Using Long Short-Term ... (H. Palangi, Microsoft Research) - research.microsoft.com/.../SentenceEmbedding1502....
- [PDF] Phenotyping of Clinical Time Series with LSTM Recurrent ... (Z. C. Lipton, D. Kale; multilabel classification) - zacklipton.com/media/papers/lipton_kale-nips2015-picu_lstms.pdf
- ChristosChristofidis/awesome-deep-learning (includes neuraltalk by Andrej Karpathy, a numpy-based RNN/LSTM implementation) - https://github.com/ChristosChristofidis/awesome-deep-learning
- Recurrent Neural Networks Tutorial, Part 1 - Introduction to ... (WildML, Sep 17, 2015) - www.wildml.com/.../recurrent-neural-networks-tutorial-part-1-introducti...
- Keras LSTM limitations (Reddit discussion, Jul 18, 2015) - https://www.reddit.com/r/.../comments/.../keras_lstm_limitations/
- Recommend-Papers.org - Explore Deep Learning ... - https://recommend-papers.org/venue?q...
- [PDF] On Efficient Training of Word Classes and Their Application ... (R. Botros, RWTH Aachen, Sep 10, 2015; word clustering methods for RNN LMs) - https://www-i6.informatik.rwth-aachen.de/.../Botr...
- Tutorials - ICASSP 2015 (covers RNN, LSTM, and Computational Network topics) - icassp2015.org/tutorials/
- Asking RNNs+LTSMs: What Would Mozart Write? (Wise.io, Jun 19, 2015) - www.wise.io/tech/asking-rnn-and-ltsm-what-would-mozart-write
- [PDF] Fine-grained Opinion Mining with Recurrent Neural ... (P. Liu, UBC / Qatar Computing Research Institute, EMNLP paper) - www.cs.ubc.ca/.../paper/emnlp-paper-drn...
- The Unreasonable Effectiveness of Recurrent Neural ... - Hacker News discussion (May 21, 2015) - https://news.ycombinator.com/item?id=9584325
- [PDF] Interspeech paper (Y. Luan, 2015, Mitsubishi Electric Research Laboratories; RNN with two sub-networks for goal-oriented spoken dialog) - www.merl.com/.../TR2015-097...
- Newest 'deep-learning' Questions - Cross Validated - stats.stackexchange.com/questions/tagged/deep-learning
- IBM Research creates new foundation to program SyNAPSE ... (KurzweilAI, Aug 8, 2013) - www.kurzweilai.net › News
- [PDF] Part III (MLSS 2015 slides, Fergus; recent uses of NNLMs and RNNs to improve machine translation, RNN encoder-decoder) - mlss.tuebingen.mpg.de/2015/slides/.../Fergus_2.pdf
- arXiv:1506.06726v1 [cs.CL] 22 Jun 2015 (R. Kiros et al.; RNN encoder with GRU activations and RNN decoder) - www.cs.toronto.edu/~zemel/documents/skipThought.pdf
- FYP report on deep learning with GPUs (Academia.edu) - www.academia.edu/.../FYP_Deep_Learning_with_GPU_T...
- Can we build language-independent OCR using LSTM ... (A. Ul-Hasan, 2013, ACM Digital Library) - dl.acm.org/citation.cfm?id...
- BigDat 2016 Course Description - Grammars (covers representative models including CNN, RBMs, LSTM, and RNN) - grammars.grlmc.com/bigdat2016/coursedescription.php
- Machine Learning - Community - Google+ - https://plus.google.com/communities/107785538899595981479
- [PDF] SIGMM Records (Dec 4, 2014; openSMILE 2.1 with LSTM-RNN JSON network file support) - heim.ifi.uio.no/griff/SIGMM-records-1404.pdf
- [PDF] Language Models for Image Captioning (J. Devlin et al., Jul 31, 2015; combines ME and RNN methods) - www.m-mitchell.com/papers/P15-2017.pdf
- Accepted Regular Papers | ASRU 2013 - www.asru2013.org/accepted-regular-papers
- Jonathan Le Roux (LSTM recurrent neural networks and noise-robust ASR) - www.jonathanleroux.org/
- OCR for Bilingual documents using Language Modeling ... (Sep 29, 2015, ResearchGate) - www.researchgate.net/.../282283832_OCR_for_Bilingual_doc...
BEGINNINGS OF WORD2VEC IN 2010 CANADA
From Frequency to Meaning: Vector Space Models of Semantics
https://www.jair.org/media/2934/live-2934-4846-jair.pdf
Peter D. Turney (peter.turney@nrc-cnrc.gc.ca), National Research Council Canada, Ottawa, Ontario, Canada, K1A 0R6; Patrick Pantel (me@patrickpantel.com), Yahoo! Labs, Sunnyvale, CA, 94089, USA
Abstract
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
TEXT GENERATION
DopeLearning: A Computational Approach to Rap Lyrics Generation
May 18 2015
http://arxiv.org/pdf/1505.04771.pdf
Eric Malmi (Aalto University and HIIT, Espoo, Finland, eric.malmi@aalto.fi), Pyry Takala (Aalto University, Espoo, Finland, pyry.takala@aalto.fi), Hannu Toivonen (University of Helsinki and HIIT, Helsinki, Finland, hannu.toivonen@cs.helsinki.fi), Tapani Raiko (Aalto University, Espoo, Finland, tapani.raiko@aalto.fi), Aristides Gionis (Aalto University and HIIT, Espoo, Finland, aristides.gionis@aalto.fi)
Abstract
Writing rap lyrics requires both creativity, to construct a meaningful and an interesting story, and lyrical skills, to produce complex rhyme patterns, which are the cornerstone of a good flow. We present a method for capturing both of these aspects. Our approach is based on two machine learning techniques: the RankSVM algorithm, and a deep neural network model with a novel structure. For the problem of distinguishing the real next line from a randomly selected one, we achieve an 82 % accuracy. We employ the resulting prediction method for creating new rap lyrics by combining lines from existing songs. In terms of quantitative rhyme density, the produced lyrics outperform best human rappers by 21 %. The results highlight the benefit of our rhyme density metric and our innovative predictor of next lines.
demo
http://deepbeat.org/
eSpeak text to speech
http://espeak.sourceforge.net/
Raplysaattori is software that detects rhymes in English and Finnish rap lyrics and computes their lengths
http://mining4meaning.com/2015/02/13/raplyzer/
https://github.com/ekQ/raplysaattori
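A rough sketch of the vowel-assonance idea behind rhyme length, assuming plain orthography (the actual Raplysaattori reportedly works on phonetic transcriptions, e.g. via eSpeak, so this is only an approximation):

# Hypothetical simplification: the rhyme length of two line endings is the
# number of trailing vowels that match, consonants ignored.
VOWELS = set("aeiouy")

def vowel_tail(text):
    return [c for c in text.lower() if c in VOWELS]

def rhyme_length(line_a, line_b):
    a, b = vowel_tail(line_a), vowel_tail(line_b)
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

print(rhyme_length("money in the bank", "honey in the tank"))        # long matching vowel tail
print(rhyme_length("money in the bank", "started from the bottom"))  # no matching tail

Averaging such rhyme lengths over a whole song gives a rhyme-density style metric in the spirit of the one reported in the DopeLearning paper above.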
WebNav: A New Large-Scale Task for Natural Language based Sequential Decision Making
Rodrigo Nogueira (RODRIGONOGUEIRA@NYU.EDU), Tandon School of Engineering, New York University; Kyunghyun Cho (KYUNGHYUN.CHO@NYU.EDU), Courant Institute of Mathematical Sciences, New York University
http://arxiv.org/pdf/1602.02261v1.pdf
Abstract We propose a goal-driven web navigation as a benchmark task for evaluating an agent with abilities to understand natural language and plan on partially observed environments. In this challenging task, an agent navigates through a web site, which is represented as a graph consisting of web pages as nodes and hyperlinks as directed edges, to find a web page in which a query appears. The agent is required to have sophisticated high-level reasoning based on natural languages and efficient sequential decision making capability to succeed. We release a software tool, called WebNav, that automatically transforms a website into this goal-driven web navigation task, and as an example, we make WikiNav, a dataset constructed from the English Wikipedia containing approximately 5 million articles and more than 12 million queries for training. We evaluate two different agents based on neural networks on the WikiNav and provide the human performance. Our results show the difficulty of the task for both humans and machines. With this benchmark, we expect faster progress in developing artificial agents with natural language understanding and planning skills.
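A toy version of the task, with a hypothetical six-page site, makes the setup concrete: the site is a directed graph, and the agent follows hyperlinks until it reaches a page whose text contains the query (the real WebNav tool builds such tasks automatically from sites like Wikipedia):

# Hypothetical mini-site: page -> (text, outgoing links).
pages = {
    "Home":    ("Welcome to the site", ["Animals", "Music"]),
    "Animals": ("Articles about animals", ["Cats", "Dogs"]),
    "Music":   ("Articles about music", ["Jazz"]),
    "Cats":    ("Cats are small domesticated felines", []),
    "Dogs":    ("Dogs are loyal companions", []),
    "Jazz":    ("Jazz originated in New Orleans", []),
}

def overlap(a, b):
    return len(set(a.lower().split()) & set(b.lower().split()))

def greedy_agent(start, query, max_steps=5):
    # Baseline agent: follow the link whose page text best matches the query.
    page = start
    for _ in range(max_steps):
        text, links = pages[page]
        if query.lower() in text.lower():
            return page
        if not links:
            return None
        page = max(links, key=lambda p: overlap(pages[p][0], query))
    return None

print(greedy_agent("Home", "domesticated felines"))  # expected: Cats

The neural agents evaluated in the paper replace this word-overlap heuristic with learned representations of the query and the candidate pages.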
Exploring the Limits of Language Modeling
Google Brain
Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu
(Submitted on 7 Feb 2016 (v1), last revised 11 Feb 2016 (this version, v2))
In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. We perform an exhaustive study on techniques such as character Convolutional Neural Networks or Long-Short Term Memory, on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. We also release these models for the NLP and ML community to study and improve upon.
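For reference, the perplexity figures quoted above are the exponentiated average negative log-probability the model assigns to each test-set word; a minimal sketch with hypothetical per-word probabilities:

import math

def perplexity(word_probs):
    # word_probs: the model's probability for each test word given its context.
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

print(perplexity([0.10, 0.02, 0.30, 0.05]))  # weaker model -> higher perplexity
print(perplexity([0.20, 0.10, 0.40, 0.15]))  # stronger model -> lower perplexity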
Google
Swivel: Improving Embeddings by Noticing What’s Missing
http://arxiv.org/pdf/1602.02215v1.pdf
Noam Shazeer, Ryan Doherty, Colin Evans, Chris Waterson - Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043
Abstract
We present Submatrix-wise Vector Embedding Learner (Swivel), a method for generating lowdimensional feature embeddings from a feature co-occurrence matrix. Swivel performs approximate factorization of the point-wise mutual information matrix via stochastic gradient descent. It uses a piecewise loss with special handling for unobserved co-occurrences, and thus makes use of all the information in the matrix. While this requires computation proportional to the size of the entire matrix, we make use of vectorized multiplication to process thousands of rows and columns at once to compute millions of predicted values. Furthermore, we partition the matrix into shards in order to parallelize the computation across many nodes. This approach results in more accurate embeddings than can be achieved with methods that consider only observed cooccurrences, and can scale to much larger corpora than can be handled with sampling
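A minimal sketch of the core idea, assuming a toy co-occurrence matrix, positive PMI, and a plain squared loss (the real Swivel instead uses a piecewise loss that keeps information from unobserved pairs and shards the matrix across workers):

import numpy as np

counts = np.array([          # toy word-word co-occurrence counts
    [10., 2., 0.],
    [ 2., 8., 1.],
    [ 0., 1., 6.],
])
total = counts.sum()
row = counts.sum(axis=1, keepdims=True)
col = counts.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore"):
    pmi = np.maximum(np.log(counts * total / (row @ col)), 0.0)  # positive PMI

rng = np.random.default_rng(0)
dim, lr = 2, 0.05
U = rng.normal(scale=0.1, size=(counts.shape[0], dim))  # row embeddings
V = rng.normal(scale=0.1, size=(counts.shape[1], dim))  # column embeddings

for _ in range(2000):              # gradient descent on squared reconstruction error
    err = U @ V.T - pmi
    U -= lr * err @ V
    V -= lr * err.T @ U

print(np.round(U @ V.T - pmi, 2))  # residuals of the rank-2 approximation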
Unsupervised and Multimodal Seq2Seq
Dec 15, 2015
Three more papers! These are on multimodal / multilingual translation, as well as an approach to incorporating monolingual data that I’ve also been pursuing. Thanks to Cho (at NIPS) for bringing them to my attention.
http://www.cinjon.com/papers-multimodal-seq2seq/
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
http://arxiv.org/abs/1506.07285
Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, Richard Socher
(Submitted on 24 Jun 2015 (v1), last revised 9 Feb 2016 (this version, v4))
Most tasks in natural language processing can be cast into question answering (QA) problems over language input. We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. Questions trigger an iterative attention process which allows the model to condition its attention on the inputs and the result of previous iterations. These results are then reasoned over in a hierarchical recurrent sequence model to generate answers. The DMN can be trained end-to-end and obtains state-of-the-art results on several types of tasks and datasets: question answering (Facebook's bAbI dataset), text classification for sentiment analysis (Stanford Sentiment Treebank) and sequence modeling for part-of-speech tagging (WSJ-PTB). The training for these different tasks relies exclusively on trained word vector representations and input-question-answer triplets.
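A toy illustration of the episodic-memory mechanism described above: attention over the input "facts" is conditioned on both the question and the memory carried over from the previous pass, and the pass is repeated for a few episodes (bag-of-words encodings and a fixed scoring rule stand in for the trained encoders of the real DMN):

import numpy as np

vocab = sorted(set("john went to the garden mary took the football where is".split()))
idx = {w: i for i, w in enumerate(vocab)}

def encode(sentence):
    # Bag-of-words stand-in for the DMN's learned sentence encoder.
    v = np.zeros(len(vocab))
    for w in sentence.lower().split():
        if w in idx:
            v[idx[w]] += 1
    return v

facts = [encode("John went to the garden"), encode("Mary took the football")]
question = encode("where is John")

memory = question.copy()
for episode in range(2):                      # iterative attention passes
    scores = np.array([f @ question + f @ memory for f in facts])
    attn = np.exp(scores) / np.exp(scores).sum()
    memory = attn @ np.vstack(facts)          # episode summary updates the memory
    print("episode", episode, "attention over facts:", np.round(attn, 2))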
Representation of linguistic form and function in recurrent neural networks
We present novel methods for analysing the activation patterns of RNNs and identifying the types of linguistic structure they learn. As a case study, we use a multi-task gated recurrent network model consisting of two parallel pathways with shared word embeddings trained on predicting the representations of the visual scene corresponding to an input sentence, and predicting the next word in the same sentence. We show that the image prediction pathway is sensitive to the information structure of the sentence, and pays selective attention to lexical categories and grammatical functions that carry semantic information. It also learns to treat the same input token differently depending on its grammatical functions in the sentence. The language model is comparatively more sensitive to words with a syntactic function. Our analysis of the function of individual hidden units shows that each pathway contains specialized units tuned to patterns informative for the task, some of which can carry activations to later time steps to encode long-term dependencies.
Research at Google
Natural Language Processing
http://research.google.com/pubs/NaturalLanguageProcessing.html
Ross Goodwin
http://rossgoodwin.com/
Adventures in Narrated Reality
New forms & interfaces for written language, enabled by machine intelligence
Adventures in Narrated Reality, Part II
Ongoing experiments in writing & machine intelligence
By Ross Goodwin
[DRAFT]
Due to the popularity of Adventures in Narrated Reality, Part I, I’ve decided to continue narrating my research concerning the creative potential of LSTM recurrent neural networks here on Medium. In this installment, I’ll begin by introducing a new short film: Sunspring, an End Cue film, directed by Oscar Sharp and starring Thomas Middleditch, created for the 2016 Sci-Fi London 48 Hour Film Challenge from a screenplay generated with an LSTM trained on science fiction screenplays.
Awni Hannun - speech recognition, currently at Baidu (Deep Speech)
http://arxiv.org/find/cs/1/au:+Hannun_A/0/1/0/all/0/1
1. arXiv:1603.09509 [pdf, other]
2. arXiv:1512.02595 [pdf, other]
3. arXiv:1412.5567 [pdf, other]
4. arXiv:1408.2873 [pdf, ps, other]
5. arXiv:1406.7806 [pdf, other]
New forms & interfaces for written language, enabled by machine intelligence
Adventures in Narrated Reality, Part II
Ongoing experiments in writing & machine intelligence
By Ross Goodwin
[DRAFT]
Due to the popularity of Adventures in Narrated Reality, Part I, I’ve decided to continue narrating my research concerning the creative potential of LSTM recurrent neural networks here on Medium. In this installment, I’ll begin by introducing a new short film: Sunspring, an End Cue film, directed by Oscar Sharp and starring Thomas Middleditch, created for the 2016 Sci-Fi London 48 Hour Film Challenge from a screenplay generated with an LSTM trained on science fiction screenplays.
Avni Hannun - speech recognition, currently at Baidu deep speech
http://arxiv.org/find/cs/1/au:+Hannun_A/0/1/0/all/0/1