Thursday, August 14, 2014

NLP Parsers




1. Stanford
The Stanford NLP (Natural Language Processing) Group
Lexicalized PCFG and neural-network dependency parsers, Java
http://nlp.stanford.edu/software/lex-parser.shtml
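
The parser ships as Java jars but can be driven from Python. A minimal sketch using NLTK's nltk.parse.stanford wrapper, assuming Java is installed and the two jars have been downloaded locally (the jar paths below are placeholders, adjust them):

# Minimal sketch: calling the Stanford Parser from Python via NLTK's
# wrapper. Requires Java plus a local download of the parser jars;
# the paths below are placeholders.
from nltk.parse.stanford import StanfordParser

parser = StanfordParser(
    path_to_jar="stanford-parser.jar",               # placeholder path
    path_to_models_jar="stanford-parser-models.jar"  # placeholder path
)

# raw_parse returns an iterator over candidate parse trees
for tree in parser.raw_parse("The quick brown fox jumps over the lazy dog"):
    tree.pretty_print()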

Stanford Parser FAQ

http://nlp.stanford.edu/software/parser-faq.shtml

Stanford Deterministic Coreference Resolution System.

Probabilistic Context-Free Grammars (PCFGs)
http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/pcfgs.pdf

Alphabetical list of part-of-speech tags used in the Penn Treebank Project:
https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
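
To make the PCFG notes concrete, here is a minimal sketch using NLTK's toy PCFG support; the grammar and sentence are invented for illustration, with Penn Treebank-style tags (DT, NN, VBZ):

# Toy PCFG demo with NLTK. ViterbiParser returns the single
# highest-probability parse, as described in the PCFG notes above.
import nltk

grammar = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
NP -> DT NN [0.7] | NN [0.3]
VP -> VBZ NP [1.0]
DT -> 'the' [1.0]
NN -> 'dog' [0.5] | 'cat' [0.5]
VBZ -> 'sees' [1.0]
""")

parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("the dog sees the cat".split()):
    print(tree)         # (S (NP (DT the) (NN dog)) (VP (VBZ sees) (NP (DT the) (NN cat))))
    print(tree.prob())  # product of the probabilities of the rules used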



2. Google SyntaxNet (Parsey McParseface)
Feed-forward neural network, Python (TensorFlow)

GitHub SyntaxNet: Neural Models of Syntax

https://github.com/tensorflow/models/tree/master/syntaxnet
A TensorFlow implementation of the models described in
Andor et al. (2016)
http://arxiv.org/pdf/1603.06042v1.pdf

http://9to5google.com/2016/05/12/google-open-sources-parsey-mcparseface-the-worlds-most-accurate-parser/

http://www.theverge.com/2016/5/12/11666414/google-parsey-mcparseface-tensorflow-open-source-language-tool

Google NLP Research
http://research.google.com/pubs/NaturalLanguageProcessing.html
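
SyntaxNet is a transition-based dependency parser: at each step, a feed-forward network scores the next parser action given the current state. As a rough illustration of the underlying arc-standard state machine (a toy sketch, not Google's code), here a scripted action sequence stands in for the neural network:

# Toy arc-standard transition system for dependency parsing.
# In SyntaxNet a feed-forward network chooses each transition; here a
# hard-coded action sequence plays the role of the model.

def parse(words, actions):
    """Apply SHIFT / LEFT / RIGHT actions; return (head, dependent) arcs."""
    stack, buffer, arcs = [], list(words), []
    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT":    # second-from-top depends on top of stack
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHT":   # top of stack depends on second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# "Bob brought pizza": brought -> Bob (subject), brought -> pizza (object)
print(parse(["Bob", "brought", "pizza"],
            ["SHIFT", "SHIFT", "LEFT", "SHIFT", "RIGHT"]))
# [('brought', 'Bob'), ('brought', 'pizza')]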



3. Yoav Goldberg & Eliyahu Kiperwasser
Bi-directional LSTM, Python
BIST Parsers
Graph- and transition-based dependency parsers using BiLSTM feature extractors
The techniques behind the parser are described in the paper
Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations.
http://arxiv.org/pdf/1603.04351v1.pdf
Eliyahu Kiperwasser and Yoav Goldberg, Computer Science Department, Bar-Ilan University, Ramat-Gan, Israel (elikip@gmail.com, yoav.goldberg@gmail.com)

Abstract: We present a simple and effective scheme for dependency parsing which is based on bidirectional-LSTMs (BiLSTMs). Each sentence token is associated with a BiLSTM vector representing the token in its sentential context, and feature vectors are constructed by concatenating a few BiLSTM vectors. The BiLSTM is trained jointly with the parser objective, resulting in very effective feature extractors for parsing. We demonstrate the effectiveness of the approach by applying it to a greedy transition based parser as well as to a globally optimized graph-based parser. The resulting parsers have very simple architectures, and match or surpass the state-of-the-art accuracies on English and Chinese.

Our proposed feature extractors are based on a bidirectional recurrent neural network (BiRNN), an extension of RNNs that take into account both the past x_{1:i} and the future x_{i:n}. We use a specific flavor of RNN called a long short-term memory network (LSTM).

Going deeper: A deep RNN (or k-layer RNN) is composed of k RNN functions RNN_1, ..., RNN_k that feed into each other: the output h^l_{1:n} of RNN_l becomes the input of RNN_{l+1}. Stacking RNNs in this way was empirically shown to be effective. Finally, in a deep bidirectional RNN, both RNN_F and RNN_R are k-layer RNNs, and BIRNN_l(x_{1:n}, i) = v^l_i = h^l_{F,i} ∘ h^l_{R,i}. In this work, we use BiRNNs and deep-BiRNNs interchangeably, specifying the number of layers when needed.
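
As a concrete (if toy) rendering of the BiRNN construction above, here is a numpy sketch: a plain Elman RNN stands in for the LSTM, and the weights are random rather than trained jointly with a parser objective as in the paper.

# Minimal BiRNN feature extractor in numpy. An Elman RNN replaces the
# LSTM for brevity; weights are random and untrained, purely illustrative.
import numpy as np

def rnn(xs, W_x, W_h):
    """Run a simple Elman RNN over xs; return the hidden state at each step."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return states

def birnn(xs, fwd, bwd):
    """v_i = h_{F,i} ∘ h_{R,i}: concatenate forward and backward states."""
    h_f = rnn(xs, *fwd)              # forward pass reads x_1..x_i (the past)
    h_r = rnn(xs[::-1], *bwd)[::-1]  # backward pass reads x_i..x_n (the future)
    return [np.concatenate([f, r]) for f, r in zip(h_f, h_r)]

rng = np.random.default_rng(0)
d_in, d_h, n = 4, 8, 5               # embedding dim, hidden dim, sentence length
xs = [rng.normal(size=d_in) for _ in range(n)]
fwd = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)))
bwd = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)))

vectors = birnn(xs, fwd, bwd)        # one 2*d_h vector per token
# As in the abstract, a parsing feature is just a concatenation of a few
# such vectors, e.g. one for a candidate head and one for its modifier:
feature = np.concatenate([vectors[1], vectors[3]])
print(feature.shape)                 # (32,)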

Historical Notes: RNNs were introduced by Elman (1990), and extended to BiRNNs by Schuster and Paliwal (1997). The LSTM variant of RNNs is due to Hochreiter and Schmidhuber (1997). BiLSTMs were recently popularized by Graves (2008), and deep BiRNNs were introduced to NLP by Irsoy and Cardie (2014), who used them for sequence tagging.

Required software
Python 2.7 interpreter
PyCNN library
https://github.com/elikip/bist-parser

See additional NLP references at https://levyomer.wordpress.com/2016/05/01/annotating-relation-inference-in-context-via-question-answering/

Independent outlets and blogs

Parsing English in 500 Lines of Python (Matthew Honnibal)


Tutorials
NLP Programming Tutorial 8 - Phrase Structure Parsing, Graham Neubig, Nara Institute of Science and Technology (NAIST)
http://www.phontron.com/slides/nlp-programming-en-10-parsing.pdf
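
Phrase-structure parsers of this kind are built on CKY chart parsing. As a minimal illustration (toy grammar invented for this sketch), here is a CKY recognizer for a grammar in Chomsky normal form:

# Minimal CKY recognizer for a toy grammar in Chomsky normal form.
# chart[i][j] holds the set of nonterminals that can derive words[i:j].
# "barks" maps straight to VP because the unary chain VP -> VBZ has been
# collapsed to keep the grammar in CNF.
from itertools import product

unary  = {"the": {"DT"}, "dog": {"NN"}, "barks": {"VBZ", "VP"}}
binary = {("DT", "NN"): {"NP"}, ("NP", "VP"): {"S"}}

def cky(words):
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(unary.get(w, set()))
    for span in range(2, n + 1):            # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):       # try every split point
                for l, r in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((l, r), set())
    return "S" in chart[0][n]

print(cky("the dog barks".split()))  # True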

A word corpus can be created with Common Crawl:
"We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone."
http://commoncrawl.org/
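
For instance, Common Crawl's WET files contain the plain extracted text of crawled pages, so a rough word-frequency corpus can be bootstrapped from a single file. The URL below is a placeholder; real paths come from each crawl's wet.paths listing on commoncrawl.org:

# Sketch: word counts from one Common Crawl WET file (plain-text extracts).
# The URL is a placeholder -- substitute a real path from the wet.paths
# listing. The whole (large) file is read into memory; fine for a sketch.
import gzip
import io
import re
from collections import Counter
from urllib.request import urlopen

WET_URL = "https://data.commoncrawl.org/crawl-data/.../example.warc.wet.gz"  # placeholder

counts = Counter()
with urlopen(WET_URL) as resp:
    with gzip.open(io.BytesIO(resp.read()), "rt",
                   encoding="utf-8", errors="ignore") as f:
        for line in f:
            # crude skip of WARC record-header lines
            if line.startswith(("WARC", "Content-")):
                continue
            counts.update(re.findall(r"[a-z']+", line.lower()))

print(counts.most_common(20))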

