Wednesday, December 10, 2014

NLP USE CASES AND TESTING

word2vec 2013

project page: https://code.google.com/p/word2vec/

core research:
Tomas Mikolov
Efficient Estimation of Word Representations in Vector Space 
(http://goo.gl/ZvBp8F)
http://arxiv.org/pdf/1301.3781.pdf

follow up -
Distributed Representations of Words and Phrases and their Compositionality
http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf


word2vec Explained: Deriving Mikolov et al.’s
Negative-Sampling Word-Embedding Method
Yoav Goldberg and Omer Levy
{yoav.goldberg,omerlevy}@gmail.com
February 14, 2014



Distributed Representations of Sentences and Documents

Quoc Le, Tomas Mikolov
http://cs.stanford.edu/~quocle/paragraph_vector.pdf


blogs and tutorials:
http://www.i-programmer.info/news/105-artificial-intelligence/6264-machine-learning-applied-to-natural-language.html
Representing words as high dimensional vectors
https://plus.google.com/+ResearchatGoogle/posts/VwBUvQ7PvnZ
http://radimrehurek.com/2014/02/word2vec-tutorial/
Deep learning via word2vec’s “skip-gram and CBOW models”, using either hierarchical softmax or negative sampling [1] [2].
http://radimrehurek.com/gensim/models/word2vec.html
http://mfcabrera.com/research/2013/11/14/word2vec-german.blog.org/
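For concreteness, here is a minimal training sketch with gensim along the lines of the tutorials above. The corpus is a toy stand-in, and parameter names follow the gensim releases of the time (newer versions rename `size` to `vector_size` and move query methods onto `model.wv`):

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences.
sentences = [
    ["human", "interface", "computer"],
    ["survey", "user", "computer", "system", "response", "time"],
    ["graph", "minors", "trees"],
]

model = Word2Vec(
    sentences,
    size=100,      # dimensionality of the word vectors
    window=5,      # context window
    min_count=1,   # keep all words in this toy corpus
    sg=1,          # 1 = skip-gram, 0 = CBOW
    hs=0,          # hierarchical softmax off...
    negative=5,    # ...using negative sampling instead
)

print(model.most_similar("computer", topn=3))
```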




DEEPLEARNING4J 

GloVe 2014

project page: http://nlp.stanford.edu/projects/glove/
core research:
GloVe: Global Vectors for Word Representation
http://stanford.edu/~jpennin/papers/glove.pdf
We provide the source code for the model as well as trained word vectors at http://nlp.stanford.edu/projects/glove/

Best word vectors so far? http://stanford.edu/~jpennin/papers/glove.pdf … 11% more accurate than word2vec, fast to train, statistically efficient, good task accuracy
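For reference, the objective GloVe minimizes (from the paper) is a weighted least-squares fit to the log co-occurrence counts X_ij:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
```

where f is a weighting function that caps the influence of very frequent co-occurrences (the paper uses f(x) = (x/x_max)^{3/4} for x < x_max, and 1 otherwise).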

RNN trained word vectors 2012

http://www.socher.org/index.php/Main/SemanticCompositionalityThroughRecursiveMatrix-VectorSpaces
Semantic Compositionality Through Recursive Matrix-Vector Spaces
Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.
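A toy numpy sketch of the composition the abstract describes: each child contributes a vector and a matrix, the parent vector is built from the matrix-transformed children, and the parent matrix from the stacked child matrices. Dimensions, initializations, and the identity child matrices below are arbitrary placeholders; the real model learns all parameters from a parse-tree corpus:

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d, 2 * d))    # composition weights for vectors
W_M = rng.normal(scale=0.1, size=(d, 2 * d))  # composition weights for matrices

def compose(a, A, b, B):
    """Combine children (a, A) and (b, B) into a parent (p, P)."""
    # Each child's vector is first transformed by its sibling's matrix.
    p = np.tanh(W @ np.concatenate([B @ a, A @ b]))
    # The parent matrix is a linear map of the stacked child matrices.
    P = W_M @ np.vstack([A, B])
    return p, P

a, b = rng.normal(size=d), rng.normal(size=d)
A, B = np.eye(d), np.eye(d)  # placeholder word matrices
p, P = compose(a, A, b, B)
print(p.shape, P.shape)  # (4,) (4, 4)
```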

Download Paper
SocherHuvalManningNg_EMNLP2012.pdf

Download Code
Relation Classification
relationClassification.zip (525MB) - All training code and testing code with trained models for new data. External packages (parser, tagger) included and the whole pipeline should run with one script. This is the package if you just want to use the best model to classify your relations.
relationClassification-No-MVRNN-models.zip (103MB) - All training code and testing code but WITHOUT trained models. External packages (parser, tagger) included. Here you need to first run the full training script, which will take a few hours to run.
relationClassification-Only-code.zip (170kB) - All training code and testing code but WITHOUT trained models, external packages, word vectors or anything else. This package includes only the code so you can study the algorithm.

How to measure quality of the word vectors

Several factors influence the quality of the word vectors:
  • amount and quality of the training data
  • size of the vectors
  • training algorithm
The quality of the vectors is crucial for any application. However, exploration of different hyper-parameter settings for complex tasks might be too time demanding. Thus, we designed simple test sets that can be used to quickly evaluate the word vector quality.
For the word relation test set described in [1], see ./demo-word-accuracy.sh, for the phrase relation test set described in [2], see ./demo-phrase-accuracy.sh. Note that the accuracy depends heavily on the amount of the training data; our best results for both test sets are above 70% accuracy with coverage close to 100%.
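With a gensim-trained model, the same word-relation test can be run directly; questions-words.txt is the analogy test set shipped with word2vec, and the model path here is hypothetical (older gensim exposes this as `accuracy()`, newer versions as `wv.evaluate_word_analogies()`):

```python
from gensim.models import Word2Vec

model = Word2Vec.load("my_word2vec.model")  # hypothetical saved model

# Each section is a dict with 'section', 'correct', and 'incorrect' lists.
results = model.accuracy("questions-words.txt")
for section in results:
    correct, incorrect = len(section["correct"]), len(section["incorrect"])
    total = correct + incorrect
    if total:
        print(section["section"], correct / total)
```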

test methodology: comparing GloVe vs. word2vec

On the importance of comparing apples to apples: a case study using the GloVe model
Yoav Goldberg, 10 August 2014


links from word2vec paper http://arxiv.org/pdf/1301.3781.pdf:

The test set is available at http://www.fit.vutbr.cz/~imikolov/rnnlm/word-test.v1.txt
http://ronan.collobert.com/senna/
http://metaoptimize.com/projects/wordreprs/
http://www.fit.vutbr.cz/~imikolov/rnnlm/
http://ai.stanford.edu/~ehhuang/
Microsoft Research Sentence Completion Challenge
http://research.microsoft.com/en-us/um/people/cburges/tech_reports/MSR-TR-2011-129.pdf

G. Zweig, C.J.C. Burges. The Microsoft Research Sentence Completion Challenge, Microsoft
Research Technical Report MSR-TR-2011-129, 2011.
Appendix: Full List of Training Data
The Microsoft Sentence Completion Challenge was recently introduced as a task for advancing language modeling and other NLP techniques [32]. The task consists of 1040 sentences, each with one word missing; the goal is to select the word that is most coherent with the rest of the sentence, given a list of five reasonable choices.
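One simple, illustrative way to apply word vectors to this task: score each of the five candidates by its average similarity to the other words in the sentence and pick the best. The model path and the toy sentence below are hypothetical; `model.vocab` and `model.similarity` follow the older gensim API:

```python
from gensim.models import Word2Vec

model = Word2Vec.load("my_word2vec.model")  # hypothetical saved model

def score(candidate, context_words):
    """Average word2vec similarity between the candidate and the context."""
    sims = [model.similarity(candidate, w) for w in context_words
            if w in model.vocab and candidate in model.vocab]
    return sum(sims) / len(sims) if sims else float("-inf")

context = ["he", "poured", "a", "glass", "of"]          # toy sentence frame
choices = ["milk", "gravel", "anger", "music", "iron"]  # five candidates
print(max(choices, key=lambda c: score(c, context)))
```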

C.J.C. Burges papers:
http://research.microsoft.com/en-us/um/people/cburges/pubs.htm

CORTICAL

Potential Projects in Fashion AI

Fashion Marketing and Digital Media Group on LinkedIn. 

https://alicemitchellx.wordpress.com/2014/12/09/the-perspective-media-interview/ 
http://box-of-style.blogspot.com/2014/12/recommended-read-imagine-world-where.html 

If you would like to submit predictions for a project related to the future of fashion, please let me know. I welcome your feedback and questions.

Paul


https://www.stylewe.com/

StyleWe is an online fashion shopping platform featuring independent fashion designers. We are committed to providing shoppers with original, high quality, and exclusive fashion products from independent designers.

By working with cutting edge independent fashion designers from around the world, and combining them with our high quality production and digital marketing capabilities, we will turn the fashion designers’ dreams into reality by providing high fashion to customers worldwide. 

Rather than just an online shopping store, we would like to create a community which will be shared by both designers and customers. The community will enable all parties to communicate, share ideas, and recognize each other. It would not only provide instant feedback for fashion designers when launching new concepts or products, but would also allow customers to share their shopping experiences and fashion dreams.

We bring together designers and fashion covering many different styles. We hope that every one of our customers will find their own unique and exclusive designer fashions at StyleWe.

We believe the fashion trend should not be controlled by the few, but rather be guided by the collective actions of every designer and fashion consumer. At StyleWe, our goal is to empower designers so that they no longer feel hidden behind the brand, but are able to proactively communicate directly with their customers throughout the entire fashion life cycle.

We believe fashion should be personal and diversified. Fashion designers should not cater exclusively to the rich and famous. We have dedicated ourselves to enabling talented fashion designers to build their own brands and achieve their dreams. Together with our designers, we will deliver high quality designer fashions to everyone.

Monday, December 8, 2014

Cortical.io



http://www.crunchbase.com/organization/cept-systems

list of suggestions:

1. use cases and implementations for each Cortical API call, similar to the word2vec use cases and more...
2. webinars:
  • end-to-end examples of cortical word representation combined with deep learning
  • end-to-end examples of cortical word representation combined with deep learning, deployed on a Spark cluster
3. schools, similar to http://www.next.ml/
4. meetup presentations - new tech, hackers and founders, etc.
5. participation in summits - Spark, Solr, etc. ... ML, NLP, NLU
6. participation in SemEval
7. Bloomberg
8. CNBC
9. tests:

How to measure quality of the word vectors

simple test sets that can be used to quickly evaluate word vector quality:
  • the word relation test set
  • the phrase relation test set

best result, average

example:
comparing apples to apples: a case study using the GloVe model

https://docs.google.com/document/d/1ydIujJ7ETSZ688RGfU5IMJJsbxAi-kRl8czSwpti15s/mobilebasic?pli=1






Language processing in the brain


http://en.wikipedia.org/wiki/Language_processing_in_the_brain

Computational Linguistics


http://plato.stanford.edu/entries/computational-linguistics/

SEMEVAL NIST COMPETITIONS


http://alt.qcri.org/semeval2014/
http://alt.qcri.org/semeval2014/task1/index.php?id=results
http://alt.qcri.org/semeval2014/task3/index.php?id=results
http://alt.qcri.org/semeval2014/task5/?id=data-and-tools

word2vec


https://groups.google.com/forum/#!msg/word2vec-toolkit/ZcOst7kEjaI/rv_A6LaE9vkJ
word2vec-toolkit

Tim Finin

The SemEval workshop (http://en.wikipedia.org/wiki/SemEval) ran tasks in
2012, 2013 and 2014 where the goal was to compute the semantic
similarity of two sentences on a scale from 0 to 5. Each year they
provided training and test datasets with human judgments. These could
easily be used to evaluate and compare the performance of this and other
ideas using word2vec data. Papers on the participating systems can be
found in the ACL repository
(http://aclanthology.info/events/semeval-201+X for X in range(12:15)).
For an overview of the most recent task, see
http://aclanthology.info/papers/semeval-2014-task-10-multilingual-semantic-textual-similarity.
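A hedged sketch of the kind of word2vec baseline Finin suggests for these STS tasks: average each sentence's word vectors, take the cosine, and rescale to the 0-5 judgment range. The model path is hypothetical, and real submissions typically add a trained regressor on top:

```python
import numpy as np
from gensim.models import Word2Vec

model = Word2Vec.load("my_word2vec.model")  # hypothetical saved model

def sentence_vector(words):
    """Average the vectors of the in-vocabulary words (assumes at least one)."""
    vecs = [model[w] for w in words if w in model.vocab]
    return np.mean(vecs, axis=0)

def sts_score(s1, s2):
    """Cosine similarity of averaged vectors, mapped into the 0-5 scale."""
    v1, v2 = sentence_vector(s1), sentence_vector(s2)
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return 5 * max(cos, 0.0)

print(sts_score("a man is playing a guitar".split(),
                "someone plays an instrument".split()))
```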

What are some standard ways of computing the distance between documents?

There's a number of different ways of going about this depending on exactly how much semantic information you want to retain and how easy your documents are to tokenize (html documents would probably be pretty difficult to tokenize, but you could conceivably do something with tags and context.)
Some of them have been mentioned by ffriend, and the paragraph vectors by user1133029 is a really solid one, but I just figured I would go into some more depth about plusses and minuses of different approaches.
  • Cosine Distance - Tried and true, cosine distance is probably the most common distance metric used generically across multiple domains. With that said, there's very little information in cosine distance that can actually be mapped back to anything semantic, which seems to be non-ideal for this situation.
  • Levenshtein Distance - Also known as edit distance, this is usually just used on the individual token level (words, bigrams, etc.). In general I wouldn't recommend this metric, as it not only discards any semantic information but also tends to treat very different word alterations very similarly; it is, however, an extremely common metric for this kind of thing.
  • LSA - Is a part of a large arsenal of techniques when it comes to evaluating document similarity called topic modeling. LSA has gone out of fashion pretty recently, and in my experience, it's not quite the strongest topic modeling approach, but it is relatively straightforward to implement and has a few open source implementations
  • LDA - Is also a technique used for topic modeling, but it's different from LSA in that it actually learns internal representations that tend to be more smooth and intuitive. In general, the results you get from LDA are better for modeling document similarity than LSA, but not quite as good for learning how to discriminate strongly between topics.
  • Pachinko Allocation - Is a really neat extension on top of LDA. In general, this is just a significantly improved version of LDA, with the only downside being that it takes a bit longer to train and open-source implementations are a little harder to come by
  • word2vec - Google has been working on a series of techniques for intelligently reducing words and documents to more reasonable vectors than the sparse vectors yielded by techniques such as Count Vectorizers and TF-IDF. Word2vec is great because it has a number of open source implementations. Once you have the vector, any other similarity metric (like cosine distance) can be used on top of it with significantly more efficacy.
  • doc2vec - Also known as paragraph vectors, this is the latest and greatest in a series of papers by Google, looking into dense vector representations of documents. The gensim library in python has an implementation of word2vec that is straightforward enough that it can pretty reasonably be leveraged to build doc2vec, but make sure to keep the license in mind if you want to go down this route
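As a quick illustration of the most common baseline in the list above (sparse vectors plus cosine similarity), using scikit-learn; dense word2vec/doc2vec vectors can be swapped in under the same similarity step:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a cat was sitting on a mat",
    "stock markets fell sharply today",
]

# TF-IDF turns each document into a sparse weighted bag-of-words vector.
tfidf = TfidfVectorizer().fit_transform(docs)

print(cosine_similarity(tfidf[0], tfidf[1]))  # near-paraphrases: high
print(cosine_similarity(tfidf[0], tfidf[2]))  # unrelated: low
```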


http://www.fi.muni.cz/usr/sojka/papers/pakray-sojka-raslan2014.pdf
An Architecture for Scientific Document Retrieval
Using Textual and Math Entailment Modules
Partha Pakray and Petr Sojka
Faculty of Informatics, Masaryk University
Botanická 68a, 602 00 Brno, Czech Rep

Excerpt: "plain Word2vec with pretrained Google news data by LSA gave better result ... Technology (NIST), Evaluation Exercises on Semantic Evaluation (SemEval)"

word2vec entity relationship resolution - as a search key

competing frameworks: Stanford NLP vs. other coreference systems

 http://www.ark.cs.cmu.edu/ARKref/ 
submissions to the CoNLL 2011 / 2012 shared task on coreference modeling:

http://conll.cemantix.org/2011/
http://conll.cemantix.org/2012/

2011 was English only; 2012 involved English, Chinese and Arabic.

The Stanford system (Lee et al.'s submission) was the top performing system in 2011, but a few other submissions reported slightly better performance on English in 2012. I'm not sure if any other substantial work has been done on coreference resolution since then.

In my experience, Stanford's system is the winner in usability. Getting a hold of the code for the other submissions can be difficult - your best bet might be to try contacting the authors directly.


 Poesio's BART  
http://www.bart-coref.org/










Tuesday, November 4, 2014

PREDICTIVE ANALYTICS


http://www.mindtree.com/sites/default/files/mindtree-thought-posts-white-paper-enabling-predictive-analysis-in-service-oriented-bpm-solutions.pdf

http://acharyavivek.wordpress.com/2013/02/20/intelligent-bpm-oracle-ibpm/

http://www.oracle.com/us/technologies/bpm/wp-intelligent-bpm-2280473.pdf

Oracle R Enterprise

http://www.oracle.com/technetwork/database/database-technologies/r/r-enterprise/overview/index.html

Oracle R Enterprise, a component of the Oracle Advanced Analytics Option, makes the open source R statistical programming language and environment ready for the enterprise and big data. 



Learn More about Oracle R Enterprise
white papers
tutorials
blogs


http://www.oracle.com/technetwork/database/database-technologies/r/r-enterprise/learnmore/index.html


R Technologies from Oracle

http://www.oracle.com/technetwork/database/database-technologies/r/r-technologies/r-offerings-1566363.html


https://blogs.oracle.com/R/entry/oracle_r_enterprise_tutorial_series

Oracle R Enterprise Tutorial Series on Oracle Learning Library

Oracle Server Technologies Curriculum has just released the Oracle R Enterprise Tutorial Series, which is publicly available on Oracle Learning Library (OLL). This 8 part interactive lecture series with review sessions covers Oracle R Enterprise 1.1 and an introduction to Oracle R Connector for Hadoop 1.1:
  • Introducing Oracle R Enterprise
  • Getting Started with ORE
  • R Language Basics
  • Producing Graphs in R
  • The ORE Transparency Layer
  • ORE Embedded R Scripts: R Interface
  • ORE Embedded R Scripts: SQL Interface
  • Using the Oracle R Connector for Hadoop 
https://stbeehive.oracle.com/teamcollab/wiki/BEAM+BI+12c+Workspace:Predictive+Analytics+Main+Page

http://www.inside-r.org/category/packagetags/machinelearning
http://en.wikipedia.org/wiki/Non-negative_matrix_factorization

terminology

Generalized Additive Model residuals
Extract Model Fitted Values

Fitting Generalized Linear Models
deviance: up to a constant, minus twice the maximized log-likelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero.
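A small numeric illustration of that definition: for a Bernoulli GLM with binary responses, the saturated model has log-likelihood zero, so the deviance reduces to minus twice the fitted log-likelihood (the fitted probabilities below are made up):

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0])                  # observed binary responses
p_hat = np.array([0.8, 0.3, 0.6, 0.9, 0.2])    # hypothetical fitted probabilities

# Bernoulli log-likelihood of the fitted model.
loglik = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

# Saturated model fits p_i = y_i exactly, giving log-likelihood 0,
# so deviance = 2 * (0 - loglik) = -2 * loglik.
deviance = -2 * loglik
print(deviance)
```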

Oracle Nimbula:





Wednesday, October 8, 2014

GOOGLE SCHOLARS, RESEARCH & CODE




http://research.google.com/archive/bigtable.html

PCA papers

Attack Resistant Collaborative Filtering
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/34404.pdf

Tomas Mikolov
Research scientist, Facebook
http://scholar.google.com/citations?user=oBu8kMMAAAAJ&hl=en

[PDF] Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

G Mesnil, MA Ranzato, T Mikolov, Y Bengio - arXiv preprint arXiv:1412.5335, 2014
Abstract: Sentiment analysis is a common task in natural language processing that aims to
detect polarity of a text document (typically a consumer review). In the simplest settings, we
discriminate only between positive and negative sentiment, turning the task into a ...

[PDF] Learning Longer Memory in Recurrent Neural Networks

T Mikolov, A Joulin, S Chopra, M Mathieu, MA Ranzato - arXiv preprint arXiv: …, 2014
Abstract: Recurrent neural network is a powerful model that learns temporal patterns in
sequential data. For a long time, it was believed that recurrent networks are difficult to train
using simple optimizers, such as stochastic gradient descent, due to the so-called ...



RICHARD SOCHER


SENTIMENT ANALYSIS online demo system at

http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
CS224d: Deep Learning for Natural Language Processing
http://cs224d.stanford.edu/


[PDF] Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence

NB Shah, A Parekh, S Balakrishnan, K Ramchandran… - 2015

Page 1. Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology
Dependence Nihar B. Shah Abhay Parekh Sivaraman Balakrishnan Kannan
Ramchandran Joseph Bradley Martin Wainwright UC Berkeley Abstract ...

[PDF] Finding The Best Model Among Representative Compositional Models

M Muraoka, S Shimaoka, K Yamamoto, Y Watanabe… - 2014
Page 1. PACLIC 28 !65 Finding The Best Model Among Representative Compositional
Models Masayasu Muraoka† Sonse Shimaoka‡ Kazeto Yamamoto† Yotaro Watanabe†
Naoaki Okazaki†
Kentaro Inui† Tohoku University ...


[PDF] Transition-based Knowledge Graph Embedding with Relational Mapping Properties

M Fan, Q Zhou, E Chang, TF Zheng - 2014
Page 1. PACLIC 28 !328 Transition-based Knowledge Graph Embedding with Relational
Mapping Properties Miao Fan†,
, Qiang Zhou†, Emily Chang‡, Thomas Fang Zheng†,
†CSLT, Tsinghua National Laboratory for Information ...


[PDF] Deep Multimodal Learning for Audio-Visual Speech Recognition

Y Mroueh, E Marcheret, V Goel - arXiv preprint arXiv:1501.05396, 2015
... In Issues in Visual and Audio-Visual Speech Processing. MIT Press, 2004. [SCMN]
Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. Reasoning
with neural tensor networks for knowledge base completion. ...


[PDF] Visual Knowledge Discovery using Deep Learning

GA Sigurdsson, S Hu
... IEEE, 2013. [5] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet:
A large-scale hierarchical image database. ... Unsupervised discovery of mid-level discrim- inative
patches. Computer VisionECCV 2012, 2012. [16] Richard Socher and Li Fei-Fei. ...


[PDF] An Italian Corpus for Aspect Based Sentiment Analysis of Movie Reviews

A Sorgente, VC Flegrei, G Vettigli, F Mele
... USA. ACM. Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher
D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep mod-
els for semantic compositionality over a sentiment treebank. ...


An Adaptive Search Path Traverse for Large-scale Video Frame Retrieval

DTN NGUYEN, Y KIYOKI - Information Modelling and Knowledge Bases XXVI, 2014
Page 336. Information Modelling and Knowledge Bases XXVI B. Thalheim et al.(Eds.)
324 IOS Press, 2014 © 2014 The authors and IOS Press. All rights reserved. doi:
10.3233/978-1-61499-472-5-324 An Adaptive Search Path ...


[PDF] Distributed index for matching multimedia objects

A Abdelsadek - 2014
Page 1. DISTRIBUTED INDEX FOR MATCHING MULTIMEDIA OBJECTS by Ahmed Abdelsadek
B.Sc., Cairo University, 2010 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF Master of Science in the ...



[PDF] Research Report: A Unified Framework for Salient Object Detection of Single/Multiple Images Based on Object Distributions at Semantic Level

W Tang, Z Shi, Y Wu - 2015

Page 1. Research Report: A Unified Framework for Salient Object Detection of
Single/Multiple Images Based on Object Distributions at Semantic Level Wei Tang,
Zhenwei Shi, Ying Wu January 19, 2015 1 Motivation Conventionally ...

[PDF] Combining Language and Vision with a Multimodal Skip-gram Model

A Lazaridou, NT Pham, M Baroni - arXiv preprint arXiv:1501.02598, 2015
Page 1. Combining Language and Vision with a Multimodal Skip-gram Model
AngelikiLazaridou NghiaThePham MarcoBaroni Center for Mind/Brain Sciences University
of Trento {angeliki.lazaridou|thenghia.pham|marco.baroni}@unitn.it Abstract ...

Applying skip-gram word estimation and SVM-based classification for opinion mining Vietnamese food places text reviews

DH Phan, TD Cao - Proceedings of the Fifth Symposium on Information and …, 2014
... NIPS 2013: 3111-3119. [5] Richard Socher and Christopher Manning. Tutorials on Deep
Learning for NLP. In procedding of NAACL HLT, 2013. ... St. Catherine's College, 2005.
[13] Richard Socher, Christopher D. Manning, Andrew Y. Ng. ...

[PDF] From visual attributes to adjectives through decompositional distributional semantics

A Lazaridou, G Dinu, A Liska, M Baroni - arXiv preprint arXiv:1501.02714, 2015
Page 1. From visual attributes to adjectives through decompositional distributional semantics
AngelikiLazaridou GeorgianaDinu AdamLiska MarcoBaroni Center for Mind/Brain Sciences
University of Trento {angeliki.lazaridou|georgiana.dinu|adam.liska|marco.baroni}@unitn.it ...

[PDF] Navigating the Semantic Horizon using Relative Neighborhood Graphs

AC Gyllensten, M Sahlgren - arXiv preprint arXiv:1501.02670, 2015
Page 1. Navigating the Semantic Horizon using Relative Neighborhood Graphs
Amaru Cuba Gyllensten and Magnus Sahlgren Gavagai Bondegatan 21 116 33
Stockholm Sweden {amaru|mange}@gavagai.se Abstract This ...

[PDF] Emotional Analysis of Personal Narrative

J Lee - 2015
... [PL08] B. Pang and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and
Trends in Information Retrieval, 2(1-2):1– 135. [SHP*11] Richard Socher, Eric H Huang, Jeffrey
Pennington, Andrew Y Ng, and Christo- pher D Manning. 2011a. ...

Semantic video segmentation using both appearance and geometric information

J Woo, K Kitani, S Kim, H Kwak, W Shim - IS&T/SPIE Electronic Imaging, 2015
... We plan to solve these problems in the future work. REFERENCES [1] Li-Jia Li, Richard
Socher, and Li Fei-Fei, “Towards Total Scene Understanding: Classification, Annotation
and Segmentation in an Automatic Framework,” Proc. ...

" Hey# 311, Come Clean My Street!": A Spatio-temporal Sentiment Analysis of Twitter Data and 311 Civil Complaints

R Eshleman, H Yang - Big Data and Cloud Computing (BdCloud), 2014 IEEE …, 2014
Page 1. Abstract— Twitter data has been applied to address a wide range of
applications (eg, political election prediction and disease tracking); however, no
studies have been conducted to explore the interactions and potential ...

[PDF] Words in context: a reference perspective on the lexicon

P Vossen, T Caselli, F Ilievski, R Izquierdo, A Lopopolo…
... In Pro- ceedings of the 9th International Conference on Se- mantic Systems, I-SEMANTICS '13,
pages 121– 124, New York, NY, USA. ACM. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai
Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hi- erarchical image database. ...



[PDF] Random Walks on Context Spaces: Towards an Explanation of the Mysteries of Semantic Word Embeddings

S Arora, Y Li, Y Liang, T Ma, A Risteski - arXiv preprint arXiv:1502.03520, 2015
Page 1. Random Walks on Context Spaces: Towards an Explanation of the Mysteries of
Semantic Word Embeddings Sanjeev Arora ∗ Yuanzhi Li † Yingyu Liang ‡ Tengyu Ma § Andrej
Risteski ¶ February 13, 2015 Abstract The papers of Mikolov et al. ...

[PDF] Combining Compositional and Latent Factorization Methods for Knowledge Base Inference

M Gardner
Page 1. Thesis Proposal Combining Compositional and Latent Factorization Methods
for Knowledge Base Inference Matt Gardner Abstract A recent focus in natural
language processing research has been on the creation of ...

Yoshua Bengio



Scholar Alert: New articles in Yoshua Bengio's profile

[PDF] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

K Xu, J Ba, R Kiros, A Courville, R Salakhutdinov… - arXiv preprint arXiv: …, 2015
Abstract: Inspired by recent work in machine translation and object detection, we introduce
an attention based model that automatically learns to describe the content of images. We
describe how we can train this model in a deterministic manner using standard ...

[PDF] Gated Feedback Recurrent Neural Networks

J Chung, C Gulcehre, K Cho, Y Bengio - arXiv preprint arXiv:1502.02367, 2015
Abstract: In this work, we propose a novel recurrent neural network (RNN) architecture. The
proposed RNN, gated-feedback RNN (GF-RNN), extends the existing approach of stacking
multiple recurrent layers by allowing and controlling signals flowing from upper recurrent ...

[PDF] Towards Biologically Plausible Deep Learning

Y Bengio, DH Lee, J Bornschein, Z Lin - arXiv preprint arXiv:1502.04156, 2015
Abstract: Neuroscientists have long criticised deep learning algorithms as incompatible with
current knowledge of neurobiology. We explore more biologically plausible versions of deep
representation learning, focusing here mostly on unsupervised learning but developing a ...

[PDF] RMSProp and equilibrated adaptive learning rates for non-convex optimization

YN Dauphin, H de Vries, J Chung, Y Bengio - arXiv preprint arXiv:1502.04390, 2015
Abstract: Parameter-specific adaptive learning rate methods are computationally efficient
ways to reduce the ill-conditioning problems encountered when training large deep
networks. Following recent work that strongly suggests that most of the critical points ...


Andrej Karpathy
BLOG
http://karpathy.github.io/

Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov 
http://www.cs.toronto.edu/~nitish/unsupervised_video/




Samuel Bowman
PhD Student, Stanford University
Verified email at stanford.edu - Homepage http://stanford.edu/~sbowman/

News

  • I'm lecturing in CS 224U (Natural Language Understanding) and LING 1 this quarter. Expect slides soon.
  • I proposed my dissertation under the tentative title of Realizing natural language semantics in learned representations.
  • Interested in recursive NNs in MATLAB? I have a new release of my research code up, and I'm happy to offer support.
  • Interested in training data for textual entailment? Be in touch - some colleagues and I are collecting some!
  • I'll be back at Google next summer to work with Oriol Vinyals on machine learning for language meaning.
  • I just posted an expanded version of my manuscript on logical behavior in deep neural networks for language on arXiv.
  • I have a new short paper up on arXiv on learning word vectors that encode lexical relationships that I recently presented at the AAAI Spring Symposium on Knowledge Representation and Reasoning.




Eduard Hovy
Carnegie Mellon University
Verified email at cmu.edu - Homepage





  1. [PDF] What are Sentiment, Affect, and Emotion? Applying the Methodology of Michael Zock to Sentiment Analysis
     www.springer.com/.../9783319080420...
     Springer Science+Business Media
     "Applying the Methodology of Michael Zock to Sentiment Analysis ... Sentiment analysis or opinion mining refers to the application of natural language ..."
  2. Language Production, Cognition, and the Lexicon
  3. Inspired by Michael Zock (retired in 2013 after 30 years of research)
What are Sentiment, Affect, and Emotion? Applying the Methodology of Michael Zock to Sentiment Analysis
EH Hovy
Language Production, Cognition, and the Lexicon, 13-24
2015

Retrofitting Word Vectors to Semantic Lexicons
M Faruqui, J Dodge, SK Jauhar, C Dyer, E Hovy, NA Smith
arXiv preprint arXiv:1411.4166
2014 (cited by 1)

What a Nasty day: Exploring Mood-Weather Relationship from Twitter
J Li, X Wang, E Hovy
Proceedings of the 23rd ACM International Conference on Conference on ...
2014

Sentiment Analysis on the People's Daily
J Li, E Hovy
2014

The C@merata Task at MediaEval 2014: Natural language queries on classical music scores
R Sutcliffe, T Crawford, C Fox, DL Root, E Hovy
MediaEval 2014 Workshop, Barcelona, Spain
2014 (cited by 4)

Application of Prize based on Sentence Length in Chunk-based Automatic Evaluation of Machine Translation
H Echizen’ya, K Araki, E Hovy
Proc. of the Ninth Workshop on Statistical Machine Translation, 381-386
2014 (cited by 1)

Metaphor Detection through Term Relevance
M Schulder, E Hovy
ACL 2014, 18
2014

Major Life Event Extraction from Twitter based on Congratulations/Condolences Speech Acts
J Li, A Ritter, C Cardie, E Hovy
Proceedings of Empirical Methods in Natural Language Processing
2014 (cited by 1)

Data integration from open internet sources and network detection to combat underage sex trafficking
DR Silva, A Philpot, A Sundararajan, NM Bryan, E Hovy
Proceedings of the 15th Annual International Conference on Digital ...
2014

A taxonomy and a knowledge portal for cybersecurity
D Klaper, E Hovy
Proceedings of the 15th Annual International Conference on Digital ...
2014

Scoring coreference partitions of predicted mentions: A reference implementation
S Pradhan, X Luo, M Recasens, E Hovy, V Ng, M Strube
Proceedings of the Association for Computational Linguistics
2014 (cited by 3)

The Functional Perspective on Language and Discourse
MV Escandell Vidal, T Espigares, N Fabb, R Fawcett, Z Fenghui, A Fetzer, ...
The Functional Perspective on Language and Discourse: Applications and ...
2014

Spatial compactness meets topical consistency: jointly modeling links and content for community detection
M Sachan, A Dubey, S Srivastava, EP Xing, E Hovy
Proceedings of the 7th ACM international conference on Web search and data ...
2014 (cited by 1)

Automatic Post-Editing Method Using Translation Knowledge Based on Intuitive Common Parts Continuum for Statistical Machine Translation
H Echizen’ya, K Araki, Y Uchida, E Hovy
Speech and Computer, 129-136
2014

Overview of CLEF QA Entrance Exams Task 2014
A Peñas, Y Miyao, Á Rodrigo, E Hovy, N Kando
CLEF
2014 (cited by 1)

Recursive deep models for discourse parsing
J Li, R Li, E Hovy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language ...
2014 (cited by 2)

An extension of BLANC to system mentions
X Luo, S Pradhan, M Recasens, E Hovy
Proceedings of ACL, Baltimore, Maryland, June
2014 (cited by 1)

Weakly Supervised User Profile Extraction from Twitter
J Li, A Ritter, E Hovy
ACL
2014 (cited by 6)

RIPTIDE: Learning violation prediction models from boarding activity data
H Chalupsky, E Hovy
Technologies for Homeland Security (HST), 2013 IEEE International Conference ...
2013

What Is a Paraphrase?
R Bhagat, E Hovy
Computational Linguistics 39 (3), 463-472
2013 (cited by 13)


Michael Zock
Research director at the CNRS, (LIF) university of Aix-Marseille

Verified email at lif.univ-mrs.fr


How Well Can a Corpus-Derived Co-Occurrence Network Simulate Human Associative Behavior?
http://www.aclweb.org/anthology/W/W14/W14-0509.pdf

Yoshua Bengio
Professor, U. Montreal, Computer Sc. & Op. Res., member/Fellow of CIFAR, CRM, REPARTI, GRSNC, CIRANO
Verified email at umontreal.ca - Homepage


YOSHUA BENGIO'S ANSWER ON QUORA

I want to do an independent study on deep learning, but rather than some tutorial, I am interested in digging deep and implement the workings of a fundamental paper in this field for which code and data is available. Any direction will be deeply appreciated.
http://www.quora.com/What-are-some-fundamental-deep-learning-papers-for-which-code-and-data-is-available-to-reproduce-the-result-and-on-the-way-grasp-deep-learning/answer/Yoshua-Bengio?srid=dCMG&share=1
Here are some (paper, code url) pairs from deep learning research:...

Ben Sandbank

Refining Generative Language Models using Discriminative Learning
Ben Sandbank Blavatnik School of Computer Science Tel-Aviv University Tel-Aviv 69978, Israel sandban@post.tau.ac.il

Generative Language Model:

Language modeling is a fundamental task in natural language processing and is routinely employed in a wide range of applications, such as speech recognition, machine translation, etc. Traditionally, a language model is a probabilistic model that assigns a probability value to a sentence or a sequence of words. We refer to these as generative language models. A very popular example of a generative language model is the n-gram model, which conditions the probability of the next word on the previous (n-1) words.
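A toy bigram (n = 2) model makes the definition concrete: probabilities are estimated from corpus counts, and a sentence's probability is the product of its word-given-previous-word probabilities (toy corpus, no smoothing):

```python
from collections import defaultdict

# Toy corpus of tokenized sentences.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
]

counts = defaultdict(lambda: defaultdict(int))  # bigram counts
totals = defaultdict(int)                       # context counts

for sent in corpus:
    tokens = ["<s>"] + sent + ["</s>"]
    for prev, word in zip(tokens, tokens[1:]):
        counts[prev][word] += 1
        totals[prev] += 1

def prob(sentence):
    """P(sentence) under the unsmoothed bigram model."""
    p = 1.0
    tokens = ["<s>"] + sentence + ["</s>"]
    for prev, word in zip(tokens, tokens[1:]):
        if totals[prev] == 0:
            return 0.0
        p *= counts[prev][word] / totals[prev]
    return p

print(prob(["the", "cat", "sat"]))  # higher for in-corpus-like sentences
```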

 discriminative language model:

Although simple and widely applicable, it has proven difficult to allow n-grams, and other forms of generative language models, to take advantage of non-local and overlapping features. These sorts of features, however, pose no problem for standard discriminative learning methods, e.g. large-margin classifiers. For this reason, a new class of language model, the discriminative language model, has been proposed to augment generative language models (Gao et al., 2005; Roark et al., 2007). Instead of providing probability values, discriminative language models directly classify sentences as either correct or incorrect, where the definition of correctness depends on the application (e.g. grammatical / ungrammatical, correct translation / incorrect translation, etc.). Discriminative learning methods require negative samples. Given that the corpora used for training language models contain only real sentences, i.e. positive samples, obtaining these can be problematic.
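A hedged sketch of the negative-sample workaround the paragraph alludes to: synthesize "incorrect" sentences, e.g. by permuting real ones, so that a standard classifier can be trained on correct vs. incorrect (the corpus and the permutation recipe here are illustrative only):

```python
import random

random.seed(0)

positives = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "chased", "the", "cat"]]

def make_negative(sentence):
    """Create a pseudo-negative by permuting a real sentence's word order."""
    shuffled = sentence[:]
    while shuffled == sentence:
        random.shuffle(shuffled)
    return shuffled

negatives = [make_negative(s) for s in positives]
data = [(s, 1) for s in positives] + [(s, 0) for s in negatives]
# Any standard classifier (perceptron, large-margin, etc.) can now be trained
# on arbitrary sentence features, including non-local and overlapping ones.
```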
  

Hal Daumé

natural language processing blog

Hyperparameter search, Bayesian optimization and related topics

ML Scalability to millions of features

Carnegie Mellon University

Petuum: A Framework for Iterative-Convergent Distributed ML

http://biglearn.org/2013/files/papers/biglearning2013_submission_11.pdf

AMPLabs, Berkeley

Distributed Machine Learning and Graph Processing with Sparse Matrices
Paper #83
https://amplab.cs.berkeley.edu/wp-content/uploads/2013/03/eurosys13-paper83.pdf


Adam Gibson
SlideShare
http://www.slideshare.net/agibsonccc/ir-34811120

Chong Wang


Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS)
Q Diao, M Qiu, CY Wu, AJ Smola, J Jiang, C Wang
Proceedings of the 20th ACM SIGKDD international conference on Knowledge ...
2014
Dynamic Language Models for Streaming Text
D Yogatama, C Wang, BR Routledge, NA Smith, EP Xing
Transactions of the Association for Computational Linguistics 2, 181-192
2014
Personalized collaborative clustering
Y Yue, C Wang, K El-Arini, C Guestrin
Proceedings of the 23rd international conference on World wide web, 75-84
2014 (cited by 1)
Community Specific Temporal Topic Discovery from Social Media
Z Hu, C Wang, J Yao, E Xing, H Yin, B Cui
arXiv preprint arXiv:1312.0860
2013
Asymptotically exact, embarrassingly parallel MCMC
W Neiswanger, C Wang, E Xing
arXiv preprint arXiv:1311.4780
2013 (cited by 29)
A Nested HDP for Hierarchical Topic Models
J Paisley, C Wang, D Blei, MI Jordan
arXiv preprint arXiv:1301.3570
2013
Modeling overlapping communities with node popularities
PK Gopalan, C Wang, D Blei
Advances in Neural Information Processing Systems, 2850-2858
2013 (cited by 3)
Variance reduction for stochastic gradient optimization
C Wang, X Chen, AJ Smola, EP Xing
Advances in Neural Information Processing Systems, 181-189
2013 (cited by 11)


RNN CODE on GITHub

Awesome Recurrent Neural Networks

A curated list of resources dedicated to recurrent neural networks

Maintainers - Myungsub Choi, Jiwon Kim
pages for other topics: awesome-deep-vision, awesome-random-forest
https://github.com/kjw0612/awesome-rnn



Adversarial Attacks on AI APIs DNN online

Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples

http://arxiv.org/pdf/1602.02697v2.pdf
Feb 19, 2016

Nicolas Papernot - The Pennsylvania State University, ngp5056@cse.psu.edu
Patrick McDaniel - The Pennsylvania State University, mcdaniel@cse.psu.edu
Ian Goodfellow - Google Inc., goodfellow@google.com
Somesh Jha - University of Wisconsin-Madison, jha@cs.wisc.edu
Z. Berkay Celik - The Pennsylvania State University, zbc102@cse.psu.edu
Ananthram Swami - US Army Research Laboratory, ananthram.swami.civ@mail.mil
Abstract - Advances in deep learning have led to the broad adoption of Deep Neural Networks (DNNs) to a range of important machine learning problems, e.g., guiding autonomous vehicles, speech recognition, malware detection. Yet, machine learning models, including DNNs, were shown to be vulnerable to adversarial samples—subtly (and often humanly indistinguishably) modified malicious inputs crafted to compromise the integrity of their outputs. Adversarial examples thus enable adversaries to manipulate system behaviors. Potential attacks include attempts to control the behavior of vehicles, have spam content identified as legitimate content, or have malware identified as legitimate software. Adversarial examples are known to transfer from one model to another, even if the second model has a different architecture or was trained on a different set. We introduce the first practical demonstration that this cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data. In our demonstration, we only assume that the adversary can observe outputs from the target DNN given inputs chosen by the adversary. We introduce the attack strategy of fitting a substitute model to the input-output pairs in this manner, then crafting adversarial examples based on this auxiliary model. We evaluate the approach on existing DNN datasets and real-world settings. In one experiment, we force a DNN supported by MetaMind (one of the online APIs for DNN classifiers) to mis-classify inputs at a rate of 84.24%. We conclude with experiments exploring why adversarial samples transfer between DNNs, and a discussion on the applicability of our attack when targeting machine learning algorithms distinct from DNNs.
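A heavily simplified sketch of the attack loop the abstract describes: query the target for labels, fit a local substitute, then craft an FGSM-style perturbation on the substitute and check whether it transfers. Everything here (the linear "remote" oracle, the logistic substitute, the step size) is a toy stand-in for the paper's DNNs:

```python
import numpy as np

rng = np.random.RandomState(0)

def query_target(X):
    # Hypothetical stand-in for the remote API: returns labels only.
    w_secret = np.array([1.0, -2.0, 0.5])
    return (X @ w_secret > 0).astype(float)

# (1) Label a synthetic dataset by querying the "remote" model.
X = rng.randn(200, 3)
y = query_target(X)

# (2) Fit a logistic-regression substitute with gradient descent.
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (p - y)) / len(X)

# (3) FGSM-style step on the substitute: move along the sign of the loss
# gradient w.r.t. the input, then test transfer against the target.
x = rng.randn(3)
y_x = query_target(x[None])[0]
p_x = 1.0 / (1.0 + np.exp(-(x @ w)))
x_adv = x + 0.7 * np.sign((p_x - y_x) * w)  # d(loss)/dx = (p - y) * w
print("target label before:", y_x, "after:", query_target(x_adv[None])[0])
```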



Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion
https://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf

Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, Wei Zhang - Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043, {lunadong|gabr|geremy|wilko|nlao|kpmurphy|tstrohmann|sunsh|weizh}@google.com

ABSTRACT
 Recent years have witnessed a proliferation of large-scale knowledge bases, including Wikipedia, Freebase, YAGO, Microsoft’s Satori, and Google’s Knowledge Graph. To increase the scale even further, we need to explore automatic methods for constructing knowledge bases. Previous approaches have primarily focused on text-based extraction, which can be very noisy. Here we introduce Knowledge Vault, a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories. We employ supervised machine learning methods for fusing these distinct information sources. The Knowledge Vault is substantially bigger than any previously published structured knowledge repository, and features a probabilistic inference system that computes calibrated probabilities of fact correctness. We report the results of multiple studies that explore the relative utility of the different information sources and extraction methods.

Knowledge Vault Slides
http://www.slideshare.net/hustwj/kdd14-constructing-and-mining-webscale-knowledge-graphs
Published on Aug 25, 2014
Antoine Bordes (Facebook)
abordes@fb.com
Evgeniy Gabrilovich (Google)
gabr@google.com
A Review of “Knowledge Vault: A Web-Scale Approach to a Probabilistic Knowledge Fusion”
http://artent.net/2014/11/25/a-review-of-knowledge-vault-a-web-scale-approach-to-a-probabilistic-knowledge-fusion/

Deep Learning with TensorFlow
https://bigdatauniversity.com/courses/deep-learning-tensorflow/
This Deep Learning with TensorFlow course focuses on TensorFlow. If you are new to the subject of deep learning, consider taking our Deep Learning 101 course first.
Traditional neural networks rely on shallow nets, composed of one input, one hidden layer and one output layer. Deep-learning networks are distinguished from these ordinary neural networks by having more hidden layers, i.e. more depth. These kinds of nets are capable of discovering hidden structures within unlabeled and unstructured data (i.e. images, sound, and text), which constitutes the vast majority of data in the world.
TensorFlow is one of the best libraries to implement deep learning. TensorFlow is a software library for numerical computation of mathematical expressions, using data flow graphs. Nodes in the graph represent mathematical operations, while the edges represent the multidimensional data arrays (tensors) that flow between them. It was created by Google and tailored for machine learning. In fact, it is widely used to develop solutions with deep learning.
In this TensorFlow course, you will learn the basic concepts of TensorFlow: the main functions, operations and the execution pipeline. Starting with a simple "Hello World" example, throughout the course you will see how TensorFlow can be used in curve fitting, regression, classification and minimization of error functions. These concepts are then explored in the deep learning world. You will learn how to apply TensorFlow for backpropagation to tune the weights and biases while the neural networks are being trained. Finally, the course covers different types of deep architectures, such as convolutional networks, recurrent networks and autoencoders.
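For orientation, the "Hello World" starting point looks like this in the graph-and-session API of TensorFlow 1.x, which this course assumes (TensorFlow 2.x executes eagerly and drops tf.Session):

```python
import tensorflow as tf

# Build the graph: nodes are operations, edges carry tensors.
hello = tf.constant("Hello, TensorFlow!")
a = tf.constant(3.0)
b = tf.constant(4.0)
total = a + b

# Run the graph inside a session.
with tf.Session() as sess:
    print(sess.run(hello))
    print(sess.run(total))  # 7.0
```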

Course Syllabus
Module 1 – Introduction to TensorFlow
  • HelloWorld with TensorFlow
  • Linear Regression
  • Nonlinear Regression
  • Logistic Regression
  • Activation Functions
Module 2 – Convolutional Neural Networks (CNN)
  • CNN History
  • Understanding CNNs
  • CNN Application
Module 3 – Recurrent Neural Networks (RNN)
  • Intro to RNN Model
  • Long Short-Term Memory (LSTM)
  • Recursive Neural Tensor Network Theory
  • Recurrent Neural Network Model
Module 4 - Unsupervised Learning
  • Applications of Unsupervised Learning
  • Restricted Boltzmann Machine
  • Collaborative Filtering with RBM
Module 5 - Autoencoders
  • Introduction to Autoencoders and Applications
  • Autoencoders
  • Deep Belief Network

GENERAL INFORMATION

  • This TensorFlow course is free.
  • This course uses the Python language.
  • It is self-paced.
  • It can be taken at any time.
  • It can be audited as many times as you wish.

RECOMMENDED SKILLS PRIOR TO TAKING THIS COURSE

  • Neural Network

REQUIREMENTS

  • Python programming