Wednesday, July 30, 2014


SPARK


Intro to Spark's Standard Libraries at Stanford Spark Class


slides: http://stanford.edu/~rezab/sparkclass/



Study Group

Every Wednesday, 6:00 - 9:00 pm
Working through the basics of Scala, followed by the Spark framework. The goal is to be writing machine learning algorithms in Spark.

http://events.hackerdojo.com/event/5810468316774400-spark-study-group

Spark framework study group
Hosted by Richard Walker in Conference Room


Thursday, July 24, 2014


Vicarious FPC, Inc.
c/o Founders Fund
One Letterman Drive, Building C, Suite 420
San Francisco, California 94129

http://news.vicarious.com/


An Introduction to Tachyon - The Next Evolution in Fast Big Data Processing


  • July 23, 2014 · 6:30 PM

VIDEO: http://youtu.be/rIFTgnSRVqQ

Wednesday, July 23, 2014


Tarle Speech and Language Services

https://www.youtube.com/user/TarleSpeech

Tuesday, July 22, 2014

Yann LeCun
Director of AI Research, Facebook
Founding Director of the NYU Center for Data Science
Silver Professor of Computer Science, Neural Science, and Electrical and Computer Engineering,
The Courant Institute of Mathematical Sciences,
Center for Neural Science, and
Electrical and Computer Engineering Department, NYU School of Engineering
New York University.

http://yann.lecun.com/


Silicon Valley Data Science Journal Club Message Board


Richard Socher
http://cs224d.stanford.edu/
CS224d: Deep Learning for Natural Language Processing

April 23, 2015 Mountain View

• M. Iyyer, J. Boyd-Graber, L. Claudino, R. Socher, J. Daume. A Neural Network for Factoid Question Answering over Paragraphs
• R. Socher, J. Bauer, C.D. Manning, A.Y. Ng. Parsing with Compositional Vector Grammars
(same papers as the April 15 meeting below)

April 15, 2015 San Francisco

http://www.meetup.com/Silicon-Valley-Data-Science-Journal-Club/events/221698151/

The papers for the meeting are:

• M. Iyyer, J. Boyd-Graber, L. Claudino, R. Socher, J. Daume. A Neural Network for Factoid Question Answering over Paragraphs

• R. Socher, J. Bauer, C.D. Manning, A.Y. Ng. Parsing with Compositional Vector Grammars

March 3, 2015

• R. Socher, B. Huval, C.D. Manning, A.Y. Ng. Semantic Compositionality through Recursive Matrix-Vector Spaces 

Past Papers

http://www.meetup.com/Silicon-Valley-Data-Science-Journal-Club/messages/boards/thread/40111312

1. Large scale image annotation: learning to rank with joint word-image embeddings
Jason Weston, Samy Bengio, Nicolas Usunier

http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36573.pdf

2. DeepFace: Closing the Gap to Human-Level Performance in Face Verification

Conference on Computer Vision and Pattern Recognition (CVPR)
https://www.facebook.com/publications/546316888800776/

3. Lior Wolf, Tel Aviv University
Technion
Intel Project

http://icri-ci.technion.ac.il/files/2014/05/36-Lior-Wolf-140513.pdf

4. DeViSE: A Deep Visual-Semantic Embedding Model
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/41473.pdf

5. ImageNet Classification with Deep Convolutional Neural Networks
http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

6. 1) Jeffrey Pennington, Richard Socher, and Christopher D. Manning. "GloVe: Global Vectors for Word Representation." EMNLP, 2014. (Thanks Stoney!) 
2) Remi Lebret and Ronan Collobert. "Word Embeddings through Hellinger PCA." 2013. (Thanks Andre and Diego!) 

7. 1) Omer Levy and Yoav Goldberg. "Linguistic Regularities in Sparse and Explicit Word Representations." CoNLL, 2014. (Thanks Andre!) 
2) Marco Baroni, Georgiana Dinu, and German Kruszewski. "Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors." Proceedings of ACL, 2014. (Thanks Andre!)



Saturday, July 19, 2014


ML Samples


http://saldym.com/wiki/index.php?title=ML_Logistic_Regression

https://github.com/andersonvom/ml-class/blob/master/programming.exercises/06.support.vector.machines/mlclass-ex6/dataset3Params.m


Thursday, July 17, 2014


Innovation and Commercialization 

Course from edX

https://www.edx.org/course/mitx/mitx-3-086x-innovation-commercialization-880#

Monday, July 14, 2014


Machine Learning


http://dudarev.com/wiki/ml-class-logistic-regression.html

https://github.com/alexband/ml-class/blob/master/

https://github.com/everpeace/

https://github.com/zhouxc

Sunday, July 13, 2014




AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion. 
Create and explain a template that contains:
• Elastic load balancer 
• Auto scaling group 
• Launch config 
• EC2 instance with user data to turn on a simple webserver to show functionality 
At the end of this template you should be able to go to the Elastic Load Balancer's address and view a webpage. This will all be created with one simple CloudFormation template.
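
The template described above could look roughly like the following. This is only a sketch, written as a Python script that prints the template as JSON. The resource names (WebLoadBalancer, WebLaunchConfig, WebServerGroup, WebSecurityGroup), the parameter names, the default instance type, and the user-data script are all illustrative assumptions, and the user data assumes an Amazon Linux style AMI where yum and httpd are available. A classic Elastic Load Balancer exposes a DNS name, so the output below builds the URL from that.

```python
import json

# Hypothetical user-data script: installs and starts a simple web server.
# Assumes an Amazon Linux style AMI (yum, httpd, service).
USER_DATA = "\n".join([
    "#!/bin/bash",
    "yum install -y httpd",
    "echo 'Hello from CloudFormation' > /var/www/html/index.html",
    "service httpd start",
])


def build_template():
    """Return a CloudFormation template (as a dict) with an ELB,
    an Auto Scaling group, and a launch configuration."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: load-balanced, auto-scaled simple web server",
        "Parameters": {
            # The AMI ID is left to the user; none is assumed here.
            "ImageId": {"Type": "String", "Description": "AMI ID to launch"},
            "InstanceType": {"Type": "String", "Default": "t2.micro"},
        },
        "Resources": {
            "WebSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Allow inbound HTTP",
                    "SecurityGroupIngress": [{
                        "IpProtocol": "tcp", "FromPort": "80",
                        "ToPort": "80", "CidrIp": "0.0.0.0/0",
                    }],
                },
            },
            "WebLoadBalancer": {
                "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
                "Properties": {
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                    "Listeners": [{
                        "LoadBalancerPort": "80",
                        "InstancePort": "80",
                        "Protocol": "HTTP",
                    }],
                    "HealthCheck": {
                        "Target": "HTTP:80/", "HealthyThreshold": "2",
                        "UnhealthyThreshold": "5", "Interval": "10",
                        "Timeout": "5",
                    },
                },
            },
            "WebLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": {"Ref": "InstanceType"},
                    "SecurityGroups": [{"Ref": "WebSecurityGroup"}],
                    "UserData": {"Fn::Base64": USER_DATA},
                },
            },
            "WebServerGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                    "LaunchConfigurationName": {"Ref": "WebLaunchConfig"},
                    "LoadBalancerNames": [{"Ref": "WebLoadBalancer"}],
                    "MinSize": "1",
                    "MaxSize": "2",
                },
            },
        },
        "Outputs": {
            # URL of the load balancer, built from its DNS name.
            "URL": {"Value": {"Fn::Join": ["", [
                "http://", {"Fn::GetAtt": ["WebLoadBalancer", "DNSName"]},
            ]]}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Saving the printed JSON to a file and creating a stack from it (supplying an ImageId) would be the intended use; networking details such as VPC subnets are left out of this sketch.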

Friday, July 11, 2014


Normal Equations, Gradient Descent and Linear Regression


http://puriney.github.io/numb/2013/07/06/normal-equations-gradient-descent-and-linear-regression/
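
As a quick reminder of the two approaches named in the title, here is a minimal sketch for one-variable linear regression, y = b0 + b1*x. It is not taken from the linked post; the function names, the toy data, the learning rate, and the iteration count are all assumptions chosen for illustration.

```python
def fit_normal_equation(xs, ys):
    """Closed-form least squares: solves (X^T X) b = X^T y for the 2x2 case."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of X^T X
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


def fit_gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Batch gradient descent on the mean squared error cost."""
    n = len(xs)
    b0 = b1 = 0.0
    for _ in range(iterations):
        errors = [b0 + b1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= alpha * grad0
        b1 -= alpha * grad1
    return b0, b1


if __name__ == "__main__":
    # Toy data generated from y = 1 + 2x (an assumption for illustration).
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    print(fit_normal_equation(xs, ys))
    print(fit_gradient_descent(xs, ys))
```

Both functions should recover approximately the same coefficients on this data; the normal equation does it in one step, while gradient descent needs a suitable learning rate and enough iterations.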

Monday, July 7, 2014


University of Toronto CSC 411: Machine Learning and Data Mining (Sept-Dec 2006)

http://www.cs.utoronto.ca/~radford/csc411.F06/

APL
one of the greatest programming languages ever
Bernd Ulmann
ulmann@vaxman.de
Vintage Computer Festival Europe 2007
29th April – 1st May 2007
Munich

http://www.vaxman.de/publications/apl_slides.pdf


A geek with a hat

http://swizec.com/blog

by Swizec Teller

First steps with Octave and machine learning

http://swizec.com/blog/first-steps-with-octave-and-machine-learning/swizec/2865

Stanford Machine Learning

The following notes represent a complete, stand alone interpretation of Stanford's machine learning course presented by Professor Andrew Ng and originally posted on the ml-class.org website during the fall 2011 semester. The topics covered are shown below, although for a more detailed summary see lecture 19. The only content not covered here is the Octave/MATLAB programming.
All diagrams are my own or are directly taken from the lectures, full credit to Professor Ng for a truly exceptional lecture course.


http://www.holehouse.org/mlclass/


CS 229
Machine Learning
Course Materials


http://cs229.stanford.edu/materials.html


Algorithms

Algorithms
Robert Sedgewick
and
Kevin Wayne
Princeton University



Sunday, July 6, 2014

Machine Learning: Linear Regression With Multiple Variables


http://blog.singhanuvrat.com/academic/machine-learning-linear-regression-with-multiple-variables


% Octave: use the Qt gnuplot terminal (works around the "set terminal aqua" error linked below)
setenv("GNUTERM","qt")

http://stackoverflow.com/questions/13786754/octave-gnuplot-aquaterm-error-set-terminal-aqua-enhanced-title-figure-1-unk

Thursday, July 3, 2014

Spark Machine Learning Library (MLlib)




Spark Summit 2014

Analyzing endurance-sports activity data with Spark

William Benton (Red Hat, Inc.)

https://spark-summit.org/2014/talk/analyzing-endurance-sports-activity-data-with-spark





some of the academic and open-source background to what Alpine does:

http://madlib.net/

http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-38.html

http://alpinedatalabs.com/download.php?file=DataScienceWithHadoop.pdf
Alpine Labs Blog

full of references to Spark...

http://alpinenow.com/blog/


Alpine plus Spark on KDnuggets By Joel Horwitz, Alpine Data Labs, Apr 16, 2014.

from:
http://www.kdnuggets.com/2014/04/apache-spark-hot-new-trend-big-data.html
Spark solves similar problems as Hadoop MapReduce does but with a fast in-memory approach and a clean functional style API. Leveraging Hadoop Yarn, Alpine has made it very simple to get started with Spark.
Two years ago I was having coffee with a friend of mine and now colleague, Dr. Will Ford, in a cafe in San Mateo. We were talking about data science and analytics when he leaned in real close to say, “Have you heard of Spark? This is going to change everything, again.” I had not heard of Spark and started researching the technology the moment I got back to my desk. I quickly realized what all of the fuss was about when I landed on the Berkeley AMPLab.

Apache Spark

Spark is a new technology that sits on top of the Hadoop Distributed File System (HDFS) and is characterized as “a fast and general engine for large-scale data processing.” Spark has three key features that make it the most interesting up-and-coming technology to rock the big data world since Apache Hadoop in 2005.
  1. For iterative analysis like logistic regression, Random Forests, or other advanced algorithms, Spark has demonstrated a 100X increase in speed that scales to hundreds of millions of rows.
  2. Spark has native support for the latest and greatest programming languages: Java, Scala, and of course Python.
  3. Spark has generality or platform compatibility in both directions meaning it integrates nicely with SQL engines (Shark), Machine Learning (MLlib), and streaming (Spark Streaming) without requiring new software installed on the cluster using Hadoop’s new YARN cluster manager.

  
At Alpine, we have made it dead simple to get started with Spark by including the technology in our latest build out of the box. We require no additional software or hardware to leverage our extensive list of operators for data transformation, exploration, and building advanced analytic models. We leverage Hadoop YARN (Hadoop NextGen) to launch Spark jobs without any pre-installation of Spark or modification of cluster configuration. This empowers our customers to have seamless integration of our Spark implementation and their Hadoop stack. For example, we analyzed 50 million rows of account data in 50 seconds on a 20-node cluster at last month's GigaOM conference.

The screenshot below shows how Spark does a quick in-memory iteration. It uses a standard way to do the gradient aggregation, as implemented by Databricks, a company which commercializes the Apache Spark framework.

[Screenshot: Spark In-memory Iteration]

Also, see a demo at http://video.alpinenow.com/medias/f1nq8m48eu
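
To make the idea of in-memory iteration with gradient aggregation concrete, here is a rough sketch in plain Python. It does not use Spark and is not Alpine's or Databricks' code. The data is held in memory as a list of partitions; each iteration maps every partition to a partial gradient and reduces the partial gradients by summing them. All names, the toy data, the step size, and the iteration count are assumptions for illustration.

```python
import math
from functools import reduce


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def partition_gradient(partition, w):
    """Sum of logistic-loss gradients over the examples in one partition."""
    grad = [0.0] * len(w)
    for features, label in partition:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
        for j, xj in enumerate(features):
            grad[j] += (p - label) * xj
    return grad


def add_vectors(a, b):
    """Element-wise sum, used to aggregate partial gradients."""
    return [x + y for x, y in zip(a, b)]


def train(partitions, dim, step=0.5, iterations=200):
    """Logistic regression by repeated map (per-partition gradient)
    and reduce (sum) over data that stays in memory between iterations."""
    w = [0.0] * dim
    n = sum(len(p) for p in partitions)
    for _ in range(iterations):
        partial = map(lambda part: partition_gradient(part, w), partitions)
        total = reduce(add_vectors, partial)
        w = [wi - step * gi / n for wi, gi in zip(w, total)]
    return w


if __name__ == "__main__":
    # Toy data: features are [bias, x]; label is 1 when x > 0.
    partitions = [
        [([1.0, -2.0], 0), ([1.0, -1.0], 0)],
        [([1.0, 1.0], 1), ([1.0, 2.0], 1)],
    ]
    print(train(partitions, dim=2))
```

The point of the pattern is that the partitions are loaded once and reused on every pass; only the small weight vector and the summed gradient move between iterations.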

Interested in learning more about Alpine Chorus and Spark? Head over to http://start.alpinenow.com to get started.

"How to Become a Data Scientist" Slides and Video


For those who missed it, we have posted the video of the talks. 

You can also find the slides Ryan and Dennis used here: 

Wednesday, July 2, 2014



http://www.physast.uga.edu/~mgeller/QS14main.html


International Conference on Quantum Simulation 2014 
SETI Institute, Mountain View, California
 July 9 and 10, 2014