TOMAS MIKOLOV
2012
RNNLM Toolkit
by Tomas Mikolov, 2010-2012
http://www.rnnlm.org/
Introduction
Neural network based language models are nowadays among the most successful techniques for statistical language modeling. The 'rnnlm' toolkit can be used to train, evaluate and use such models. The goal of this toolkit is to speed up research progress in language modeling: first, by providing a useful implementation that demonstrates some of the principles; second, by supporting empirical experiments in speech recognition and other applications; and third, by providing strong state-of-the-art baseline results against which future research that aims to "beat the state of the art" can be compared.
Download
rnnlm-0.1h - some older version of the toolkit
rnnlm-0.2b
rnnlm-0.2c
rnnlm-0.3b
rnnlm-0.3c
rnnlm-0.3d
rnnlm-0.3e
rnnlm-0.4b - latest version of the toolkit
my notes:
written in C++ (compiles with any C++ compiler)
uses stochastic gradient descent
one hidden layer
one (optional) compression layer
uses a softmax output layer, factorized with word classes
needs SRILM installed for the n-gram model to work; the n-gram model is used for comparison with (and combination with) the rnnlm model in example.sh
created around 2012, when SRILM 1.6.0 was current
current SRILM version (2015) is 1.7.1
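A minimal training/evaluation sketch (my notes, not official documentation - flag names are as I recall them from the toolkit's usage message, so double-check by running ./rnnlm with no arguments; hidden/class sizes are arbitrary example values):
./rnnlm -train train.txt -valid valid.txt -rnnlm model.rnn -hidden 100 -class 100 -bptt 4 -bptt-block 10 -debug 2
./rnnlm -rnnlm model.rnn -test test.txt
The first command trains a class-factorized RNN LM with truncated BPTT; the second reports perplexity ("PPL net") on held-out text. example.sh additionally builds an SRILM n-gram model and combines its scores with the RNN scores.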
srilm links:
http://www.speech.sri.com/projects/srilm/
http://www.speech.sri.com/projects/srilm/download.html
SRILM Installation and Running Tutorial
srilm-1.6.0.tar.gz - Google Code
installing srilm 1.6.1beta on macOS
http://www1.icsi.berkeley.edu/~wooters/SRILM/index.html
Basic examples - very useful for quick introduction (training, evaluation, hyperparameter selection, simple n-best list rescoring, etc.) - 35MB
Advanced examples - includes large scale experiments with speech lattices (n-best list rescoring, ...) - 235MB, by Stefan Kombrink
Slides from my presentation at Google - pdf
RNNLM is now integrated into Kaldi toolkit! Check this.
Example of data generated by 4-gram language model, by RNN model and by RNNME model (all models are trained on Broadcast news data, 400M/320M words) - check which generated sentences are easier to read!
Word projections from RNN-80 and RNN-640 models trained on Broadcast news data + tool for computing the closest words. (extra large 1600-dimensional features from 3 models are here)
Frequently asked questions
FAQ archive
------------------------------------------------------------------------------------------------------------------------
6. Something fails! What should I do?
------------------------------------------------------------------------------------------------------------------------
- compilation: the code should be easy to compile with any c++ compiler, let us know if you experience any problems
- if the 'example.sh' fails, check if you have installed SRILM tools (if the combination of models fails)
Known bugs:
- with MSDOS end of line encoding, the rnnlm tool works incorrectly; use 'dos2unix'
- empty lines: SRILM skips empty lines, while rnnlm does not; it is therefore better to remove all empty lines from test sets
  if scores from the rnnlm and SRILM tools are to be combined
- other CPU architectures than x86: the FAST_EXP() macro from rnnlmlib.cpp might fail; in such case, use normal call
to exp()
------------------------------------------------------------------------------------------------------------------------
7. Where can I get more information about the 'recurrent neural network based language model'?
------------------------------------------------------------------------------------------------------------------------
First check the examples on the webpage:
http://www.fit.vutbr.cz/~imikolov/rnnlm/
Contact:
email: tmikolov@gmail.com
References:
[1] Mikolov, T., Karafiát, M., Burget, L., Černocký, J., Khudanpur, S.: Recurrent neural network based language model, In:
Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010),
Makuhari, Chiba, JP, ISCA, 2010
http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf
[2] Mikolov, T., Kombrink, S., Burget, L., Černocký, J., Khudanpur, S.: Extensions of Recurrent Neural Network Language Model,
In: Proc. ICASSP 2011
https://scholar.google.com/citations?view_op=view_citation&hl=en
[3] Mikolov, T., Deoras, A., Kombrink, S., Burget, L., Černocký, J.: Empirical Evaluation and Combination of Advanced
Language Modeling Techniques, submitted to Interspeech 2011
http://research.microsoft.com/pubs/175560/InterSpeech-2011.PDF
Recurrent tweets Project presentation
Mathias Berglund, Petri Kyröläinen, Yu Shen December 9, 2013
http://research.ics.aalto.fi/cog/langtech13/2013-12-09_Recurrent_Tweets.pdf
RNNLM-HS: fast recurrent nnet language model; WSJ example
https://github.com/vimal-manohar91/kaldi-git/tree/master/tools/rnnlm-hs-0.1b
****************************************************************************
Mikolov Tomáš: Statistical Language Models based on Neural Networks. PhD thesis, Brno University of Technology, 2012.
All the details that did not make it into the papers, plus more results on additional tasks.
Mikolov Tomáš, Sutskever Ilya, Deoras Anoop, Le Hai-Son, Kombrink Stefan, Černocký Jan: Subword Language Modeling with Neural Networks. Not published (rejected from ICASSP 2012).
Using subwords as basic units for RNNLMs has several advantages: no OOV rate, smaller model size and better speed. Just split the infrequent words into subword units.
Mikolov Tomáš, Deoras Anoop, Povey Daniel, Burget Lukáš, Černocký Jan: Strategies for Training Large Scale Neural Network Language Models, In: Proceedings of ASRU 2011
How to train an RNN LM on 400M words on a single core in a few days, with 1% absolute improvement in WER on a state-of-the-art setup.
Mikolov Tomáš, Kombrink Stefan, Deoras Anoop, Burget Lukáš, Černocký Jan: RNNLM - Recurrent Neural Network Language Modeling Toolkit, In: ASRU 2011 Demo Session
Brief description of the RNN LM toolkit that is available on this website.
Mikolov Tomáš, Deoras Anoop, Kombrink Stefan, Burget Lukáš, Černocký Jan: Empirical Evaluation and Combination of Advanced Language Modeling Techniques, In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011), Florence, IT
Comparison to other LMs shows that RNN LMs are state of the art by a large margin. Improvements increase with more training data.
Kombrink Stefan, Mikolov Tomáš, Karafiát Martin, Burget Lukáš: Recurrent Neural Network based Language Modeling in Meeting Recognition, In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011), Florence, IT
An easy way to adapt RNN LMs, plus speedup tricks for rescoring (can be faster than 0.05 RT).
Deoras Anoop, Mikolov Tomáš, Kombrink Stefan, Karafiát Martin, Khudanpur Sanjeev: Variational Approximation of Long-span Language Models for LVCSR, In: Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, Prague, CZ
An RNN LM can be approximated by an n-gram model and used directly in the decoder at no extra computational cost.
Mikolov Tomáš, Kombrink Stefan, Burget Lukáš, Černocký Jan, Khudanpur Sanjeev: Extensions of Recurrent Neural Network Language Model, In: Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, Prague, CZ
Better results by using backpropagation through time, and better speed by using classes.
Mikolov Tomáš, Karafiát Martin, Burget Lukáš, Černocký Jan, Khudanpur Sanjeev: Recurrent neural network based language model, In: Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), Makuhari, Chiba, JP
We show that an RNN LM can be trained by simple backpropagation alone, despite popular belief to the contrary.
*******
methods for learning vector space representations of words:
Distributed Representations of Words and Phrases and their Compositionality
https://arxiv.org/abs/1310.4546
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean
(Submitted on 16 Oct 2013)
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
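For reference, a minimal sketch of training skip-gram vectors with negative sampling using the word2vec tool released with this work (corpus name and parameter values are placeholders, not recommendations):
# skip-gram (-cbow 0), 5 negative samples instead of hierarchical softmax (-hs 0), subsampling of frequent words
./word2vec -train text8 -output vectors.bin -cbow 0 -size 200 -window 5 -negative 5 -hs 0 -sample 1e-4 -threads 8 -binary 1
# interactive nearest-neighbour queries on the trained vectors
./distance vectors.bin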
Distributed Representations of Sentences and Documents
Quoc Le (qvl@google.com), Tomas Mikolov (tmikolov@google.com)
Google Inc, 1600 Amphitheatre Parkway, Mountain View, CA 94043
Abstract
Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, "powerful," "strong" and "Paris" are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.
GloVe: Global Vectors for Word Representation
http://nlp.stanford.edu/projects/glove/
Jeffrey Pennington, Richard Socher, Christopher D. Manning
Introduction
GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
GloVe: Global Vectors for Word Representation
http://nlp.stanford.edu/pubs/glove.pdf
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014
Abstract
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
Details of intrinsic word vector evaluation
Word vector analogies: semantic and syntactic examples from
https://code.google.com/archive/p/word2vec/source/default/source
http://code.google.com/p/word2vec/source/browse/trunk/questions-words.txt
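The analogy set above can be scored with the compute-accuracy tool bundled with word2vec (a sketch following the word2vec demo scripts; the second argument caps the vocabulary considered):
./compute-accuracy vectors.bin 30000 < questions-words.txt
It prints per-category and overall accuracy for the semantic and syntactic analogy questions.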
2014-2015
[PDF] Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews
G Mesnil, MA Ranzato, T Mikolov, Y Bengio - arXiv preprint arXiv:1412.5335, 2014
Abstract: Sentiment analysis is a common task in natural language processing that aims to
detect polarity of a text document (typically a consumer review). In the simplest settings, we
discriminate only between positive and negative sentiment, turning the task into a ...
http://arxiv.org/pdf/1412.5335.pdf
https://github.com/mesnilgr/nbsvm
my notes
use https to clone git repository
git clone https://github.com/mesnilgr/iclr15
//git clone git@github.com:mesnilgr/iclr15.git
cd iclr15;
chmod +x oh_my_go.sh
./oh_my_go.sh
This code has been tested on Ubuntu and Fedora. Compilation of word2vec on OSX seems to be an issue
my env
1. alleged patch to allow rnnlm build on macos - doesn't fix exp10 issue
https://gist.github.com/tpeng/9020592
patch makefile:
-----------------------------------------------------------------------------------------------------------------
#CC = x86_64-linux-g++-4.6
CC = llvm-gcc
WEIGHTTYPE = float
CFLAGS = -D WEIGHTTYPE=$(WEIGHTTYPE) -lm -O2 -Wall -funroll-loops -ffast-math -lstdc++
#CFLAGS = -D WEIGHTTYPE=$(WEIGHTTYPE) -lm -O2 -Wall -funroll-loops -ffast-math
#CFLAGS = -lm -O2 -Wall

all: rnnlmlib.o rnnlm

rnnlmlib.o : rnnlmlib.cpp
	$(CC) $(CFLAGS) $(OPT_DEF) -c rnnlmlib.cpp

rnnlm : rnnlm.cpp
	$(CC) $(CFLAGS) $(OPT_DEF) rnnlm.cpp rnnlmlib.o -o rnnlm

clean:
	rm -rf *.o rnnlm
-------------------------------------------------------------------------------------------------------------------
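With this makefile the build is just (assuming llvm-gcc, or whatever compiler CC points at, is on the PATH; note that the recipe lines under each target must be indented with a real tab):
make clean && make
./rnnlm
Running ./rnnlm without arguments should print the usage text, which is a quick check that the binary compiled and linked correctly.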
really good advice at
http://dev.libqxt.org/libqxt-old-hg/issue/156/problem-with-llrint-exp10-not-declared-in
----------------------------------------------------------------------------------------------------------------------
Ok. I looked into this further. It turns out my math.h function in Mac OSX does not have exp10. I changed line 269 to use pow(x,y) instead of exp10(x). So
//qlonglong modv = llrint(exp10(4-n));
//this line fails compilation in Mac OSX10.6.8 - no exp10 function
qlonglong modv = llrint(pow((double)10,(4-n)));
//this should work. Modified to use Mac's existing function library
-----------------------------------------------------------------------------------------------------------------------
[PDF] Learning Longer Memory in Recurrent Neural Networks
T Mikolov, A Joulin, S Chopra, M Mathieu, MA Ranzato - arXiv preprint arXiv: …, 2014
Abstract: Recurrent neural network is a powerful model that learns temporal patterns in
sequential data. For a long time, it was believed that recurrent networks are difficult to train
using simple optimizers, such as stochastic gradient descent, due to the so-called ...
http://arxiv.org/pdf/1412.7753.pdf
One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling (Chelba et al, 2014)
http://arxiv.org/pdf/1312.3005v3.pdf
Summary: NN, RNN, RNNME
- RNN outperforms FNN on language modeling tasks; both are better than n-grams
- the question "are neural nets better than n-grams" is incomplete: the best solution is to use both
- joint training of RNN and maxent with n-gram features works great on large datasets
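As a concrete (hedged) sketch of that joint training with the toolkit: the -direct / -direct-order options add a hash-based maxent component over n-gram features, giving the RNNME model (sizes below are illustrative only):
./rnnlm -train train.txt -valid valid.txt -rnnlm model.rnnme -hidden 200 -class 200 -direct 1000 -direct-order 3
Here -direct sets the size of the maxent hash (in millions of parameters) and -direct-order the maximum n-gram order of its features.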
Maximum Entropy Modeling
http://homepages.inf.ed.ac.uk/lzhang10/maxent.html
Resources (from Tomas Mikolov's COLING 2014 tutorial slides):
- Open-source neural-net based NLP software: RNNLM toolkit, word2vec and other tools; links to large text corpora and pre-trained models; benchmark datasets for advancing the state of the art
- RNNLM toolkit: available at rnnlm.org; allows training of RNN and RNNME models; extensions are actively developed, for example a multi-threaded version with hierarchical softmax: http://svn.code.sf.net/p/kaldi/code/trunk/tools/rnnlm-hs-0.1b/
- Word2vec: available at https://code.google.com/p/word2vec/; tool for training word vectors using the CBOW and skip-gram architectures, supports both negative sampling and hierarchical softmax; optimized for very large datasets (billions of training words); includes links to models pre-trained on large datasets (100B words)
- CSLM (feedforward NNLM code): Continuous Space Language Model toolkit, http://www-lium.univ-lemans.fr/cslm/ - implementation of a feedforward neural network language model by Holger Schwenk
- Other neural net SW: list available at http://deeplearning.net/software_links/ (mostly general machine learning tools, not necessarily NLP)
- Large text corpora: short list available at the word2vec project, https://code.google.com/p/word2vec/#Where_to_obtain_the_training_data - sources are a Wikipedia dump, statmt.org, and the UMBC webbase corpus; altogether around 8 billion words can be downloaded for free
- Benchmark datasets (LMs, word vectors): the Penn Treebank setup including the usual text normalization is part of the example archive at rnnlm.org; WSJ setup (simple ASR experiments, includes N-best lists): http://www.fit.vutbr.cz/~imikolov/rnnlm/kaldi-wsj.tgz; datasets for measuring word / phrase similarity: http://research.microsoft.com/en-us/um/people/gzweig/Pubs/myz_naacl13_test_set.tgz, https://code.google.com/p/word2vec/source/browse/trunk/questions-words.txt, https://code.google.com/p/word2vec/source/browse/trunk/questions-phrases.txt
- Final summary: distributed word representations >= word classes; neural nets >= logistic regression; neural networks are a useful statistical tool, but not the final solution to AI by themselves; deep learning is an interesting research direction, but we need more research to understand how to learn complex patterns in language
Juergen Schmidhuber
https://plus.google.com/100849856540000067209/posts
Recent (2014) benchmark records in speech recognition, machine translation, etc., achieved with the help of deep Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs), often at major IT companies.
Vanishing gradients:
- as we propagate the gradients back in time, their magnitude usually decreases and quickly approaches tiny values: this is called the vanishing gradient
- in practice this means that learning long-term dependencies is difficult
- special architectures address this problem (Long Short-Term Memory - LSTM RNN, Hochreiter & Schmidhuber, 1997)
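In symbols (the standard derivation, not taken from the slides): for a recurrent state $h_t = \sigma(W h_{t-1} + U x_t)$, backpropagation through time multiplies one Jacobian per step:
$\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \operatorname{diag}\!\left(\sigma'(a_i)\right) W$, where $a_i = W h_{i-1} + U x_i$.
The norm of this product scales roughly like $(\gamma \lVert W \rVert)^{t-k}$ with $\gamma$ bounding $|\sigma'|$: below 1 the gradient vanishes geometrically with the time lag, above 1 it can explode. LSTM's gated memory cell is designed to keep this product close to 1 along the cell state.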
http://people.idsia.ch/~juergen/rnn.html
http://people.idsia.ch/~juergen/oldrnn4.html
Why Use Recurrent Neural Networks? Why Use LSTM?
-----------------------------------------------------------------------------------------------------------------------
rnnlmlib.cpp:1651:40: error: use of undeclared identifier 'exp10'
fprintf(flog, "PPL net: %f\n", exp10(-logp/(real)wordcn));
replacing exp10(-logp/(real)wordcn) with pow((double)10, (-logp/(real)wordcn)), i.e.
fprintf(flog, "PPL net: %f\n", pow((double)10, (-logp/(real)wordcn)));
in my case the full set of errors and replacements was:
rnnlmlib.cpp:1651:40: error: use of undeclared identifier 'exp10'
fprintf(flog, "PPL net: %f\n", exp10(-logp/(real)wordcn));
replace with fprintf(flog, "PPL net: %f\n", pow((double)10, (-logp/(real)wordcn))); ^
rnnlmlib.cpp:1798:35: error: use of undeclared identifier 'exp10'
fprintf(flog, "\nPPL net: %f\n", exp10(-logp/(real)wordcn));
replace with fprintf(flog, "\nPPL net: %f\n", pow((double)10, (-logp/(real)wordcn)));
^
rnnlmlib.cpp:1800:43: error: use of undeclared identifier 'exp10'
fprintf(flog, "PPL other: %f\n", exp10(-log_other/(real)wordcn));
replace with fprintf(flog, "PPL other: %f\n", pow((double)10, (-log_other/(real)wordcn)));
^
rnnlmlib.cpp:1801:45: error: use of undeclared identifier 'exp10'
fprintf(flog, "PPL combine: %f\n", exp10(-log_combine/(real)wordcn));
replace with fprintf(flog, "PPL combine: %f\n", pow((double)10, (-log_combine/(real)wordcn)));
^
rnnlmlib.cpp:1936:28: error: use of undeclared identifier 'exp10'
printf("\nPPL net: %f\n", exp10(-logp/(real)wordcn));
replace with printf("\nPPL net: %f\n", pow((double)10, (-logp/(real)wordcn)));
^
rnnlmlib.cpp:1938:36: error: use of undeclared identifier 'exp10'
printf("PPL other: %f\n", exp10(-log_other/(real)wordcn));
replace with printf("PPL net: %f\n", pow((double)10, (-log_other/(real)wordcn)));
^
rnnlmlib.cpp:1939:38: error: use of undeclared identifier 'exp10'
printf("PPL combine: %f\n", exp10(-log_combine/(real)wordcn));
replace with printf("PPL combine: %f\n", pow((double)10, (-log_combine/(real)wordcn)));
^
7 errors generated.
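Instead of patching all seven call sites by hand, the same substitution can be applied in one pass (a sketch - it assumes the literal 'exp10(' occurs only in these perplexity printouts; -i.bak leaves a .bak backup and works with both BSD sed on OS X and GNU sed):
sed -i.bak 's/exp10(/pow((double)10, /g' rnnlmlib.cpp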
--------------------
after replacement, to prevent overwriting during script execution, change the rnnlm.sh script:
#IR: commenting out the lines that download fresh code, because we need to preserve the changed rnnlmlib.cpp with the exp10 calls (which fail the build) replaced
#wget http://www.fit.vutbr.cz/~imikolov/rnnlm/rnnlm-0.3e.tgz
#tar -xvf rnnlm-0.3e.tgz
----------------------
Word2Vec
OSX 10 env errors debugging:
https://code.google.com/p/word2vec/issues/detail?id=17
http://coolestguidesontheplanet.com/install-and-configure-wget-on-os-x/
http://code.google.com/p/word2vec/issues/detail?id=1
Related code:
gensim word2vec (improved performance)
Deep learning via word2vec’s “skip-gram and CBOW models”, using either hierarchical softmax or negative sampling
http://radimrehurek.com/gensim/models/word2vec.html
the problem now is:
mkdir: word2vec: File exists
../iclr15/scripts/paragraph.sh: line 7: shuf: command not found
----------------------
the command-line shuffling utility shuf is part of GNU (Linux) coreutils and is absent on Mac OS
http://superuser.com/questions/760732/randomly-shuffle-rows-in-a-large-text-file
http://brew.sh/
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
command:
brew install coreutils
location:
/usr/local/bin/gshuf
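To let the iclr15 scripts find shuf without editing them, one workaround (mine, not from the scripts) is to expose coreutils' gshuf under its GNU name:
ln -s /usr/local/bin/gshuf /usr/local/bin/shuf
or put the unprefixed coreutils names first on the PATH:
export PATH="/usr/local/opt/coreutils/libexec/gnubin:$PATH"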
----------------------
paragraph.sh
typos in the commands; corrected versions below (they convert the sentence vectors into labeled index:value format, with the first 12,500 lines of each 25,000-line split labeled 1 and the rest labeled -1):
head -n 25000 sentence_vectors.txt | awk 'BEGIN{a=0;}{if (a<12500) printf "1 "; else printf "-1 "; for (b=1; b<NF; b++) printf b ":" $(b+1) " "; print ""; a++;}' > full-train.txt
head -n 50000 sentence_vectors.txt | tail -n 25000 | awk 'BEGIN{a=0;}{if (a<12500) printf "1 "; else printf "-1 "; for (b=1; b<NF; b++) printf b ":" $(b+1) " "; print ""; a++;}' > test.txt
--------------------------------------------
bash-3.2$ ../iclr15/scripts/nbsvm.sh
Cloning into 'nbsvm'...
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
solution - use https:
#git clone git@github.com:mesnilgr/nbsvm.git
git clone https://github.com/mesnilgr/nbsvm.git
---------------------
SOURCE CODE
LSTM source code of Felix Gers (ex-IDSIA)
LSTM source code in the PDP++ software
Understanding LSTM Networks
Posted on August 27, 2015
colah's blog
Linguistic Regularities in Sparse and Explicit Word Representations
https://levyomer.files.wordpress.com/2014/04/linguistic-regularities-in-sparse-and-explicit-word-representations-conll-2014.pdf
ANNOTATING RELATION INFERENCE IN CONTEXT VIA QUESTION ANSWERING
https://levyomer.wordpress.com/2016/05/01/annotating-relation-inference-in-context-via-question-answering/
Yoshua Bengio
How to Construct Deep Recurrent Neural Networks
http://arxiv.org/pdf/1312.6026.pdf
On optimization methods for deep learning
http://cs.stanford.edu/~jngiam/papers/LeNgiamCoatesLahiriProchnowNg2011.pdf
Brown clustering
http://en.wikipedia.org/wiki/Brown_clustering
Perplexity per word
http://en.wikipedia.org/wiki/Perplexity
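For reference, the per-word perplexity that rnnlm prints as "PPL net" (and which the exp10()/pow() calls above compute from summed base-10 log-probabilities) is
$\mathrm{PPL} = 10^{-\frac{1}{N}\sum_{i=1}^{N} \log_{10} P(w_i \mid w_1, \dots, w_{i-1})}$,
i.e. the inverse geometric mean of the per-word probabilities; in the source, logp appears to accumulate the $\log_{10}$ terms and wordcn is $N$.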
What is Maximum Entropy Modeling
http://homepages.inf.ed.ac.uk/lzhang10/maxent.html
Deep Learning. Gregory Piatetsky (@kdnuggets) posted this on Twitter http://www.kdnuggets.com/2014/05/learn-deep-learning-courses-tutorials-overviews.html
Artificial Neural Networks/Neural Network Basics
http://en.wikibooks.org/wiki/Artificial_Neural_Networks/Neural_Network_Basics#Learning_Rate
BIST Parsers
(Yoav Goldberg likes)
Graph & Transition based dependency parsers using BiLSTM feature extractors
The techniques behind the parser are described in the paper Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations.
Required software
Python 2.7 interpreter
PyCNN library
https://github.com/elikip/bist-parser
Google Parser SyntaxNet
Differences between L1 and L2 as Loss Function and Regularization
http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/
TEXT CLASSIFICATION FOR SENTIMENT ANALYSIS – STOPWORDS AND COLLOCATIONS
Bayesian classifier
http://streamhacker.com/2010/05/24/text-classification-sentiment-analysis-stopwords-collocations/
Latest research papers in NLP, advanced applications of RNN with LSTM
- RECURRENT NEURAL NETWORKS - FEEDBACK ... (IDSIA, Dalle Molle Institute for Artificial Intelligence Research) - www.idsia.ch/.../r... (mentions RNNLIB, a recurrent neural network library for sequence learning problems, and LSTM applications by former students and postdocs)
- [1502.06922] Deep Sentence Embedding Using the Long ... (H. Palangi et al., 2015, arXiv; LSTM-RNN trained in a weakly supervised manner, embedding vectors usable in many applications)
- arXiv:1506.00019v4 [cs.LG] 17 Oct 2015 (Z. C. Lipton, 2015; a selective survey of research on recurrent neural networks for sequence learning) - arxiv.org/pdf/1506.00019
- The Unreasonable Effectiveness of Recurrent Neural ... (Andrej Karpathy, May 21, 2015) - karpathy.github.io/2015/05/21/rnn-effectiveness/
- Long short-term memory - Wikipedia - https://en.wikipedia.org/wiki/Long_short-term_memory
- [PDF] Microsoft Research paper (B. Peng) on language understanding with LSTM / Neural Turing Machine - research.microsoft.com/pubs/246720/rnn_em.pdf
- [PDF] Deep Sentence Embedding Using Long Short-Term ... (H. Palangi, Microsoft Research) - research.microsoft.com/.../SentenceEmbedding1502....
- [PDF] Phenotyping of Clinical Time Series with LSTM Recurrent ... (Z. C. Lipton, D. Kale; multilabel classification) - zacklipton.com/media/papers/lipton_kale-nips2015-picu_lstms.pdf
- ChristosChristofidis/awesome-deep-learning (includes neuraltalk by Andrej Karpathy, a numpy-based RNN/LSTM implementation) - https://github.com/ChristosChristofidis/awesome-deep-learning
- Recurrent Neural Networks Tutorial, Part 1 - Introduction to ... (WildML, Sep 17, 2015) - www.wildml.com/.../recurrent-neural-networks-tutorial-part-1-introducti...
- Keras LSTM limitations (Reddit discussion, Jul 18, 2015) - https://www.reddit.com/r/.../comments/.../keras_lstm_limitations/
- Recommend-Papers.org - Explore Deep Learning ... - https://recommend-papers.org/venue?q...
- [PDF] On Efficient Training of Word Classes and Their Application ... (R. Botros, RWTH Aachen, Sep 10, 2015; word clustering methods for RNN LMs) - https://www-i6.informatik.rwth-aachen.de/.../Botr...
- Tutorials - ICASSP 2015 (covers RNN, LSTM, and Computational Network topics) - icassp2015.org/tutorials/
- Asking RNNs+LTSMs: What Would Mozart Write? (Wise.io, Jun 19, 2015) - www.wise.io/tech/asking-rnn-and-ltsm-what-would-mozart-write
- [PDF] Fine-grained Opinion Mining with Recurrent Neural ... (P. Liu, UBC / Qatar Computing Research Institute, EMNLP paper) - www.cs.ubc.ca/.../paper/emnlp-paper-drn...
- The Unreasonable Effectiveness of Recurrent Neural ... - Hacker News discussion (May 21, 2015) - https://news.ycombinator.com/item?id=9584325
- [PDF] Interspeech paper (Y. Luan, 2015, Mitsubishi Electric Research Laboratories; RNN with two sub-networks for goal-oriented spoken dialog) - www.merl.com/.../TR2015-097...
- Newest 'deep-learning' Questions - Cross Validated - stats.stackexchange.com/questions/tagged/deep-learning
- IBM Research creates new foundation to program SyNAPSE ... (KurzweilAI, Aug 8, 2013) - www.kurzweilai.net › News
- [PDF] Part III (MLSS 2015 slides, Fergus; recent uses of NNLMs and RNNs to improve machine translation, RNN encoder-decoder) - mlss.tuebingen.mpg.de/2015/slides/.../Fergus_2.pdf
- arXiv:1506.06726v1 [cs.CL] 22 Jun 2015 (R. Kiros et al.; RNN encoder with GRU activations and RNN decoder) - www.cs.toronto.edu/~zemel/documents/skipThought.pdf
- FYP report on deep learning with GPUs (Academia.edu) - www.academia.edu/.../FYP_Deep_Learning_with_GPU_T...
- Can we build language-independent OCR using LSTM ... (A. Ul-Hasan, 2013, ACM Digital Library) - dl.acm.org/citation.cfm?id...
- BigDat 2016 Course Description - Grammars (covers representative models including CNN, RBMs, LSTM, and RNN) - grammars.grlmc.com/bigdat2016/coursedescription.php
- Machine Learning - Community - Google+ - https://plus.google.com/communities/107785538899595981479
- [PDF] SIGMM Records (Dec 4, 2014; openSMILE 2.1 with LSTM-RNN JSON network file support) - heim.ifi.uio.no/griff/SIGMM-records-1404.pdf
- [PDF] Language Models for Image Captioning (J. Devlin et al., Jul 31, 2015; combines ME and RNN methods) - www.m-mitchell.com/papers/P15-2017.pdf
- Accepted Regular Papers | ASRU 2013 - www.asru2013.org/accepted-regular-papers
- Jonathan Le Roux (LSTM recurrent neural networks and noise-robust ASR) - www.jonathanleroux.org/
- OCR for Bilingual documents using Language Modeling ... (Sep 29, 2015, ResearchGate) - www.researchgate.net/.../282283832_OCR_for_Bilingual_doc...
BEGINNINGS OF WORD2VEC IN 2010 CANADA
From Frequency to Meaning: Vector Space Models of Semantics
https://www.jair.org/media/2934/live-2934-4846-jair.pdf
Peter D. Turney (peter.turney@nrc-cnrc.gc.ca), National Research Council Canada, Ottawa, Ontario, Canada, K1A 0R6; Patrick Pantel (me@patrickpantel.com), Yahoo! Labs, Sunnyvale, CA, 94089, USA
Abstract
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
TEXT GENERATION
DopeLearning: A Computational Approach to Rap Lyrics Generation
May 18 2015
http://arxiv.org/pdf/1505.04771.pdf
Eric Malmi (Aalto University and HIIT, Espoo, Finland, eric.malmi@aalto.fi), Pyry Takala (Aalto University, Espoo, Finland, pyry.takala@aalto.fi), Hannu Toivonen (University of Helsinki and HIIT, Helsinki, Finland, hannu.toivonen@cs.helsinki.fi), Tapani Raiko (Aalto University, Espoo, Finland, tapani.raiko@aalto.fi), Aristides Gionis (Aalto University and HIIT, Espoo, Finland, aristides.gionis@aalto.fi)
Abstract
Writing rap lyrics requires both creativity, to construct a meaningful and an interesting story, and lyrical skills, to produce complex rhyme patterns, which are the cornerstone of a good flow. We present a method for capturing both of these aspects. Our approach is based on two machine learning techniques: the RankSVM algorithm, and a deep neural network model with a novel structure. For the problem of distinguishing the real next line from a randomly selected one, we achieve an 82 % accuracy. We employ the resulting prediction method for creating new rap lyrics by combining lines from existing songs. In terms of quantitative rhyme density, the produced lyrics outperform best human rappers by 21 %. The results highlight the benefit of our rhyme density metric and our innovative predictor of next lines.
demo
http://deepbeat.org/
eSpeak text to speech
http://espeak.sourceforge.net/
Raplysaattori is software that detects rhymes in English and Finnish rap lyrics and computes their lengths
http://mining4meaning.com/2015/02/13/raplyzer/
https://github.com/ekQ/raplysaattori
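A rough sketch of the vowel-assonance idea behind rhyme length, assuming plain orthography (the actual Raplysaattori reportedly works on phonetic transcriptions, e.g. via eSpeak, so this is only an approximation):

# Hypothetical simplification: the rhyme length of two line endings is the
# number of trailing vowels that match, consonants ignored.
VOWELS = set("aeiouy")

def vowel_tail(text):
    return [c for c in text.lower() if c in VOWELS]

def rhyme_length(line_a, line_b):
    a, b = vowel_tail(line_a), vowel_tail(line_b)
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

print(rhyme_length("money in the bank", "honey in the tank"))        # long matching vowel tail
print(rhyme_length("money in the bank", "started from the bottom"))  # no matching tail

Averaging such rhyme lengths over a whole song gives a rhyme-density style metric in the spirit of the one reported in the DopeLearning paper above.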
WebNav: A New Large-Scale Task for Natural Language based Sequential Decision Making
Rodrigo Nogueira (RODRIGONOGUEIRA@NYU.EDU), Tandon School of Engineering, New York University; Kyunghyun Cho (KYUNGHYUN.CHO@NYU.EDU), Courant Institute of Mathematical Sciences, New York University
http://arxiv.org/pdf/1602.02261v1.pdf
Abstract We propose a goal-driven web navigation as a benchmark task for evaluating an agent with abilities to understand natural language and plan on partially observed environments. In this challenging task, an agent navigates through a web site, which is represented as a graph consisting of web pages as nodes and hyperlinks as directed edges, to find a web page in which a query appears. The agent is required to have sophisticated high-level reasoning based on natural languages and efficient sequential decision making capability to succeed. We release a software tool, called WebNav, that automatically transforms a website into this goal-driven web navigation task, and as an example, we make WikiNav, a dataset constructed from the English Wikipedia containing approximately 5 million articles and more than 12 million queries for training. We evaluate two different agents based on neural networks on the WikiNav and provide the human performance. Our results show the difficulty of the task for both humans and machines. With this benchmark, we expect faster progress in developing artificial agents with natural language understanding and planning skills.
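A toy version of the task, with a hypothetical six-page site, makes the setup concrete: the site is a directed graph, and the agent follows hyperlinks until it reaches a page whose text contains the query (the real WebNav tool builds such tasks automatically from sites like Wikipedia):

# Hypothetical mini-site: page -> (text, outgoing links).
pages = {
    "Home":    ("Welcome to the site", ["Animals", "Music"]),
    "Animals": ("Articles about animals", ["Cats", "Dogs"]),
    "Music":   ("Articles about music", ["Jazz"]),
    "Cats":    ("Cats are small domesticated felines", []),
    "Dogs":    ("Dogs are loyal companions", []),
    "Jazz":    ("Jazz originated in New Orleans", []),
}

def overlap(a, b):
    return len(set(a.lower().split()) & set(b.lower().split()))

def greedy_agent(start, query, max_steps=5):
    # Baseline agent: follow the link whose page text best matches the query.
    page = start
    for _ in range(max_steps):
        text, links = pages[page]
        if query.lower() in text.lower():
            return page
        if not links:
            return None
        page = max(links, key=lambda p: overlap(pages[p][0], query))
    return None

print(greedy_agent("Home", "domesticated felines"))  # expected: Cats

The neural agents evaluated in the paper replace this word-overlap heuristic with learned representations of the query and the candidate pages.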
Exploring the Limits of Language Modeling
Google Brain
Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu
(Submitted on 7 Feb 2016 (v1), last revised 11 Feb 2016 (this version, v2))
In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. We perform an exhaustive study on techniques such as character Convolutional Neural Networks or Long-Short Term Memory, on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. We also release these models for the NLP and ML community to study and improve upon.
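For reference, the perplexity figures quoted above are the exponentiated average negative log-probability the model assigns to each test-set word; a minimal sketch with hypothetical per-word probabilities:

import math

def perplexity(word_probs):
    # word_probs: the model's probability for each test word given its context.
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

print(perplexity([0.10, 0.02, 0.30, 0.05]))  # weaker model -> higher perplexity
print(perplexity([0.20, 0.10, 0.40, 0.15]))  # stronger model -> lower perplexity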
Google
Swivel: Improving Embeddings by Noticing What’s Missing
http://arxiv.org/pdf/1602.02215v1.pdf
Noam Shazeer, Ryan Doherty, Colin Evans, Chris Waterson - Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043
Abstract
We present Submatrix-wise Vector Embedding Learner (Swivel), a method for generating lowdimensional feature embeddings from a feature co-occurrence matrix. Swivel performs approximate factorization of the point-wise mutual information matrix via stochastic gradient descent. It uses a piecewise loss with special handling for unobserved co-occurrences, and thus makes use of all the information in the matrix. While this requires computation proportional to the size of the entire matrix, we make use of vectorized multiplication to process thousands of rows and columns at once to compute millions of predicted values. Furthermore, we partition the matrix into shards in order to parallelize the computation across many nodes. This approach results in more accurate embeddings than can be achieved with methods that consider only observed cooccurrences, and can scale to much larger corpora than can be handled with sampling
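A minimal sketch of the core idea, assuming a toy co-occurrence matrix, positive PMI, and a plain squared loss (the real Swivel instead uses a piecewise loss that keeps information from unobserved pairs and shards the matrix across workers):

import numpy as np

counts = np.array([          # toy word-word co-occurrence counts
    [10., 2., 0.],
    [ 2., 8., 1.],
    [ 0., 1., 6.],
])
total = counts.sum()
row = counts.sum(axis=1, keepdims=True)
col = counts.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore"):
    pmi = np.maximum(np.log(counts * total / (row @ col)), 0.0)  # positive PMI

rng = np.random.default_rng(0)
dim, lr = 2, 0.05
U = rng.normal(scale=0.1, size=(counts.shape[0], dim))  # row embeddings
V = rng.normal(scale=0.1, size=(counts.shape[1], dim))  # column embeddings

for _ in range(2000):              # gradient descent on squared reconstruction error
    err = U @ V.T - pmi
    U -= lr * err @ V
    V -= lr * err.T @ U

print(np.round(U @ V.T - pmi, 2))  # residuals of the rank-2 approximation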
Unsupervised and Multimodal Seq2Seq
Dec 15, 2015
Three more papers! These are on multimodal / multilingual translation, as well as an approach to incorporating monolingual data that I’ve also been pursuing. Thanks to Cho (at NIPS) for bringing them to my attention.
http://www.cinjon.com/papers-multimodal-seq2seq/
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
http://arxiv.org/abs/1506.07285
Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, Richard Socher
(Submitted on 24 Jun 2015 (v1), last revised 9 Feb 2016 (this version, v4))
Most tasks in natural language processing can be cast into question answering (QA) problems over language input. We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. Questions trigger an iterative attention process which allows the model to condition its attention on the inputs and the result of previous iterations. These results are then reasoned over in a hierarchical recurrent sequence model to generate answers. The DMN can be trained end-to-end and obtains state-of-the-art results on several types of tasks and datasets: question answering (Facebook's bAbI dataset), text classification for sentiment analysis (Stanford Sentiment Treebank) and sequence modeling for part-of-speech tagging (WSJ-PTB). The training for these different tasks relies exclusively on trained word vector representations and input-question-answer triplets.
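A toy illustration of the episodic-memory mechanism described above: attention over the input "facts" is conditioned on both the question and the memory carried over from the previous pass, and the pass is repeated for a few episodes (bag-of-words encodings and a fixed scoring rule stand in for the trained encoders of the real DMN):

import numpy as np

vocab = sorted(set("john went to the garden mary took the football where is".split()))
idx = {w: i for i, w in enumerate(vocab)}

def encode(sentence):
    # Bag-of-words stand-in for the DMN's learned sentence encoder.
    v = np.zeros(len(vocab))
    for w in sentence.lower().split():
        if w in idx:
            v[idx[w]] += 1
    return v

facts = [encode("John went to the garden"), encode("Mary took the football")]
question = encode("where is John")

memory = question.copy()
for episode in range(2):                      # iterative attention passes
    scores = np.array([f @ question + f @ memory for f in facts])
    attn = np.exp(scores) / np.exp(scores).sum()
    memory = attn @ np.vstack(facts)          # episode summary updates the memory
    print("episode", episode, "attention over facts:", np.round(attn, 2))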
Representation of linguistic form and function in recurrent neural networks
We present novel methods for analysing the activation patterns of RNNs and identifying the types of linguistic structure they learn. As a case study, we use a multi-task gated recurrent network model consisting of two parallel pathways with shared word embeddings trained on predicting the representations of the visual scene corresponding to an input sentence, and predicting the next word in the same sentence. We show that the image prediction pathway is sensitive to the information structure of the sentence, and pays selective attention to lexical categories and grammatical functions that carry semantic information. It also learns to treat the same input token differently depending on its grammatical functions in the sentence. The language model is comparatively more sensitive to words with a syntactic function. Our analysis of the function of individual hidden units shows that each pathway contains specialized units tuned to patterns informative for the task, some of which can carry activations to later time steps to encode long-term dependencies.
Research at Google
Natural Language Processing
http://research.google.com/pubs/NaturalLanguageProcessing.html
Ross Goodwin
http://rossgoodwin.com/
Adventures in Narrated Reality
New forms & interfaces for written language, enabled by machine intelligence
Adventures in Narrated Reality, Part II
Ongoing experiments in writing & machine intelligence
By Ross Goodwin
[DRAFT]
Due to the popularity of Adventures in Narrated Reality, Part I, I’ve decided to continue narrating my research concerning the creative potential of LSTM recurrent neural networks here on Medium. In this installment, I’ll begin by introducing a new short film: Sunspring, an End Cue film, directed by Oscar Sharp and starring Thomas Middleditch, created for the 2016 Sci-Fi London 48 Hour Film Challenge from a screenplay generated with an LSTM trained on science fiction screenplays.
Awni Hannun - speech recognition, currently at Baidu (Deep Speech)
http://arxiv.org/find/cs/1/au:+Hannun_A/0/1/0/all/0/1
1. arXiv:1603.09509 [pdf, other]
2. arXiv:1512.02595 [pdf, other]
3. arXiv:1412.5567 [pdf, other]
4. arXiv:1408.2873 [pdf, ps, other]
5. arXiv:1406.7806 [pdf, other]
New forms & interfaces for written language, enabled by machine intelligence
Adventures in Narrated Reality, Part II
Ongoing experiments in writing & machine intelligence
By Ross Goodwin
[DRAFT]
Due to the popularity of Adventures in Narrated Reality, Part I, I’ve decided to continue narrating my research concerning the creative potential of LSTM recurrent neural networks here on Medium. In this installment, I’ll begin by introducing a new short film: Sunspring, an End Cue film, directed by Oscar Sharp and starring Thomas Middleditch, created for the 2016 Sci-Fi London 48 Hour Film Challenge from a screenplay generated with an LSTM trained on science fiction screenplays.
Avni Hannun - speech recognition, currently at Baidu deep speech
http://arxiv.org/find/cs/1/au:+Hannun_A/0/1/0/all/0/1