Tuesday, September 30, 2014

H2O



Here is the link to the GitHub page for preparing a development environment for Sparkling Water: https://github.com/0xdata/h2o-dev/blob/master/h2o-docs/src/docs/develop/sparkling_water.md
Sparkling Water Demo
Slides of the first Sparkling Water meet up
Sparkling Water code is here
Blog on Sparkling Water
Install and Test Instructions


Setting up GitHub access behind the VPN proxy
http://stackoverflow.com/questions/783811/getting-git-to-work-with-a-proxy-server
http://www-proxy.us.oracle.com:80


git config --global http.proxy http://www-proxy.us.oracle.com:80
git config --global https.proxy https://www-proxy.us.oracle.com:80
Setting up Gradle behind the VPN proxy
http://stackoverflow.com/questions/8938994/gradlew-behind-a-proxy

github reference:
----------------------------------------------------------------------------------------------------------------------
Commands to use:
git config --global http.proxy http://proxyuser:proxypwd@proxy.server.com:8080
git config --global https.proxy https://proxyuser:proxypwd@proxy.server.com:8080
  • replace proxyuser with your proxy user name
  • replace proxypwd with your proxy password
  • replace proxy.server.com with your proxy server URL
  • replace 8080 with your proxy port
If you decide at any time to reset this proxy and work without one:
Commands to use:
git config --global --unset http.proxy
git config --global --unset https.proxy
------------------------------
gradle reference:
gradle.properties
systemProp.http.proxyHost=www.somehost.org
systemProp.http.proxyPort=8080
systemProp.http.proxyUser=userid
systemProp.http.proxyPassword=password
systemProp.https.proxyHost=www.somehost.org
systemProp.https.proxyPort=8080
All you have to do is create a file called gradle.properties (with the properties shown above) and place it in your Gradle user home directory (which defaults to USER_HOME/.gradle) or in your project directory.
Gradle (the wrapper too!) automatically picks up gradle.properties files found in the user home directory or in project directories.
For more about this, read the Gradle user guide, especially section 12.2, "Gradle properties and system properties".

NLU


1. Omer Levy



Dependency-Based Word Embeddings


Omer Levy and Yoav Goldberg
Computer Science Department, Bar-Ilan University, Ramat-Gan, Israel
{omerlevy,yoav.goldberg}@gmail.com
http://anthology.aclweb.org/P/P14/P14-2050.pdf

http://levyomer.wordpress.com/category/semantic-similarity/
UKP-BIU: Similarity and Entailment Metrics for Student Response Analysis
http://aclweb.org/anthology/S/S13/S13-2048.pdf


    Yoav Goldberg

    personal page at  BGU

    I am a Senior Lecturer at Bar Ilan University's Computer Science Department. Before that, I did my Post-Doc as a Research Scientist at Google Research New York.

    I work on problems related to Natural Language Processing and Machine Learning. In particular I am interested in syntactic parsing, structured-prediction models, learning for greedy decoding algorithms, multilingual language understanding, confidence estimation and prediction of partial structures.

    I completed my PhD at Ben-Gurion University in 2011, under the supervision of Prof. Michael Elhadad.
    Publications

Mining the Hebrew Web, Yoav Goldberg



Tomas Mikolov
https://www.linkedin.com/pub/tomas-mikolov/88/311/598





The Stanford NLP Group

http://nlp.stanford.edu/

Semantic Matching Project - S-Match - open source
Trento University
http://semanticmatching.org/s-match.html


4. Semantic Indexing and Multimedia Event Detection: ECNU at TRECVID 2012
Feng Wang†, Zhanhu Sun†, Daran Zhang†, Chong-Wah Ngo‡
†Dept. of Computer Science and Technology, East China Normal University
‡Dept. of Computer Science, City University of Hong Kong


5. UNAL-NLP: Combining Soft Cardinality Features for Semantic Textual Similarity, Relatedness and Entailment



6. The Meaning Factory: Formal Semantics for Recognizing Textual Entailment and Determining Semantic Similarity
Johannes Bjerva, University of Groningen
www.let.rug.nl/bos/.../Bjerva2014SemEval.pdf
The Meaning Factory site links to the Groningen Meaning Bank, the VP Ellipsis corpus, and a download of Boxer.

7. Vector Based Models Of Semantic Composition
http://www.freebook.im/vector/vector-based-models-of-semantic-composition-association-for









  • SemEval 2014 - QCRI-ALT server

    alt.qcri.org/semeval2014/cdrom/SemEval-2014.pdf
    SemantiKLUE: Robust Semantic Similarity at Multiple Levels Using Maximum Weight Matching.


  • 2. David Cohen

    Carnegie Mellon University Ph.D.
    Intern at Toyota Info-tech Center
    https://www.linkedin.com/pub/david-cohen/47/14a/b32



    Data annotation and analysis of human interaction in a multi-sensor car environment as a part of the Honda Research Institute sponsored CESAR initiative.

    3. Antoine Raux – Principal Researcher at Lenovo
    Research and design of speech-based and multimodal human-machine interfaces.
    Previous: Honda Research Institute; Carnegie Mellon University; Advanced Telecommunications Research Institute International (NICT)
    Education: Carnegie Mellon University

  • IBM NLP Watson
  • IBM Research
    http://researcher.watson.ibm.com/researcher/view_group.php?id=147
    Natural Language Processing
    publications
    http://researcher.watson.ibm.com/researcher/view_group_pubs.php?grp=147

    2013 Semantic Technologies in IBM Watson™
    http://www.patwardhans.net/papers/GliozzoBPM13.pdf

    Alfio Gliozzo, IBM Watson Research Center, Yorktown Heights, NY 10598, gliozzo@us.ibm.com
    Or Biran, Columbia University, New York, NY 10027, orb@cs.columbia.edu
    Siddharth Patwardhan, IBM Watson Research Center, Yorktown Heights, NY 10598, siddharth@us.ibm.com
    Kathleen McKeown, Columbia University, New York, NY 10027, kathy@cs.columbia.edu


  • commercial page: http://www.ibm.com/smarterplanet/us/en/ibmwatson/implement-watson.html
  • Leader: Dr. Satya V. Nitta is currently the Program Leader of the Cognitive Learning Content research group at IBM's T J Watson Research Center, where he is developing cognitive-computing-based next-generation adaptive learning technologies. http://researcher.ibm.com/researcher/view.php?person=us-svn
  • SALIENCE
  • http://datasift.com/source/19/salience-entities
  • Ahmed Hassan 
  • Discovering Salience in Textual Elements using Graph Mutual Reinforcement, SI508 Project, Ahmed Hassan, hassanam@umich.edu
  • http://www-personal.umich.edu/~ladamic/courses/networks/si508f07/projects/SummarizationHITS.pdf


  • Saturday, September 27, 2014

    FINANCIAL ML

    Reinforcement Learning for Portfolio Management

    Problem definition:

    1. Based on pre-opening signals, achieve a daily Sharpe ratio close to 1 for a directional trade in SPY

    2. Use the game-winning strategies from Atari and Go (deep reinforcement learning)

    Demis Hassabis @demishassabis (see later DeepMind post)

    Modern Portfolio Theory states that adding assets to a diversified portfolio that have correlations of less than one with each other can decrease portfolio risk without sacrificing return. Such diversification will serve to increase the Sharpe ratio of a portfolio.

    Sharpe ratio = (Mean portfolio return − Risk-free rate)/Standard deviation of portfolio return



    The ex-ante Sharpe ratio formula uses expected returns while the ex-post Sharpe ratio uses realized returns.

    Read more: Sharpe Ratio Definition | Investopedia http://www.investopedia.com/terms/s/sharperatio.asp
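
    A minimal Python sketch of the ex-post (realized-return) version of this formula; the daily returns, the zero risk-free rate, and the sqrt(252) annualization below are assumptions for illustration, not data from this post.

import numpy as np

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    # Ex-post Sharpe ratio: mean excess return over its standard deviation,
    # annualized by sqrt(periods_per_year) for daily data
    excess = np.asarray(returns) - risk_free_rate
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

# Hypothetical daily returns of a directional SPY strategy
daily_returns = [0.003, -0.001, 0.002, 0.0005, -0.002, 0.004]
print(sharpe_ratio(daily_returns))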

    AQR Funds
    https://funds.aqr.com/

    MSCI, A LEADER IN FACTOR INDEXING
    https://www.msci.com/factor-indexes

    ALGORITHMIC TRADING

    Classification-Based Financial Markets Prediction Using Deep Neural Networks

    Matthew Francis Dixon
    Illinois Institute of Technology – Stuart School of Business, IIT

    Diego Klabjan
    Northwestern University

    Jin Hoon Bang
    Northwestern University

    May 18, 2016

    Abstract:

    Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition communities for their superior predictive properties, including robustness to overfitting. However, their application to algorithmic trading has not been previously researched, partly because of their computational complexity. This paper describes the application of DNNs to predicting financial market movement directions. In particular, we describe the configuration and training approach and then demonstrate their application to backtesting a simple trading strategy over 43 different Commodity and FX future mid-prices at 5-minute intervals. All results in this paper are generated using a C++ implementation on the Intel Xeon Phi co-processor, which is 11.4x faster than the serial version, and a Python strategy backtesting environment, both of which are available as open source code written by the authors.

    Engineering ML


    1. Recovering And Using Use-Case-Diagram-To-Source-Code
    Traceability Links

    ESEC/FSE’07, September 3–7, 2007, Cavtat near Dubrovnik, Croatia.

    2. Google search for
    mining software repositories for traceability links
    https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=mining+software+repositories+for+traceability+links&revid=521494032

    GITHUB REPOSITORIES

    AMPLabs
    https://github.com/amplab/MLI

    Data Bricks
    https://github.com/databricks/spark-training


    Nitro - Scala + Spark Deep Learning ML
    http://www.meetup.com/SF-Bayarea-Machine-Learning/events/226657565/
    The repo associated with the Nitro talk; a more technical version of the slides is linked from the README in the repo:
    https://github.com/Nitro/data-pipelines


    Deep Networks with Stochastic Depth

    https://github.com/dblN/stochastic_depth_keras

    The stochastic depth training procedure trains short networks (by randomly dropping layers during training) and uses deep networks at test time.
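
    A minimal numpy sketch of the idea (an illustration, not the Keras implementation from the repo above); the toy linear residual blocks and sizes are made up, while the linearly decaying survival probabilities follow the paper's schedule.

import numpy as np

rng = np.random.default_rng(0)

def residual_branch(x, w):
    # Toy residual branch: one linear map plus ReLU
    return np.maximum(w @ x, 0.0)

def forward(x, weights, survival_probs, train=True):
    # Stochastic-depth pass over a stack of residual blocks
    for w, p in zip(weights, survival_probs):
        if train:
            # Training: keep the block with probability p, else skip it,
            # so each pass trains a randomly shortened network
            if rng.random() < p:
                x = x + residual_branch(x, w)
        else:
            # Test: run every block, scaling each branch by its survival
            # probability, so the full deep network is used
            x = x + p * residual_branch(x, w)
    return x

dim, n_blocks = 8, 10
weights = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(n_blocks)]
survival_probs = np.linspace(1.0, 0.5, n_blocks)  # linear decay, as in the paper
out = forward(rng.normal(size=dim), weights, survival_probs, train=True)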

    DEEP LEARNING (DL) CONVOLUTIONAL NEURAL NETWORKS (CNN)

    http://gitxiv.com/posts/vwfa87JJp5QTXE2PJ/deep-networks-with-stochastic-depth

    These notes accompany the Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition.
    For questions/concerns/bug reports, contact Justin Johnson regarding the assignments, or Andrej Karpathy regarding the course notes. You can also submit a pull request directly to our git repo.
    We encourage the use of the hypothes.is extension to annotate comments and discuss these notes inline.
    http://cs231n.github.io/
    http://cs231n.github.io/neural-networks-1/



    https://github.com/irusman

    HOW TO BIG DATA


    http://stackoverflow.com/questions/1533330/writing-data-to-hadoop

    Thursday, September 25, 2014

    DIGITAL REPOSITORIES


    http://repositories.lib.utexas.edu/handle/2152/ETD-UT-2011-08-4329

    Ayers, Debra Lynn
    Consumer preference measurement and its practical application for selecting software product features

    ML SOFTWARE


    OCTAVE


    http://www.gnu.org/software/octave/community-news.html
    Octave 3.8 is a major new release with many new features, including an experimental graphical user interface. But because the GUI is not quite as polished as we would like, we have decided to wait until the 4.0.x release series before making the GUI the default interface.
    See the release notes or the "Experimental GUI Info" button in the GUI for more information about the release and how you can help us with GUI development and speed up the 4.0 release.

    BLAS Library selection

    During the install, a BLAS library was selected. The installer contains 2 BLAS implementations, the NetLib reference BLAS and OpenBLAS.
    Either can be selected after the install by copying librefblas.dll or libopenblas.dll to libblas.dll in the bin folder of the Octave installation.

    Included Octave Forge Packages

    A number of Octave-Forge packages are included with this Octave install; however, they need to be built and installed before they can be used.
    To install:
    • Start Octave and then open the build_packages.m file found in the src folder where Octave was installed.
    • Run the build_packages.m script to build and install the packages.
    Packages must then be loaded with the pkg load PACKAGENAME command before they can be used.

    Other packages are available from Octave-Forge

    Tuesday, September 23, 2014

    Companies in Data Science Space


    https://www.metamind.io/
    GitHub
    https://github.com/MetaMind
    https://github.com/MetaMind/gevent-socketio

    website info:
    https://www.easycounter.com/report/metamind.io
    hosted by Amazon Technologies Inc

    Attacks on MetaMind (paper co-authored by Google researchers)

    Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
    Feb 19, 2016
    http://arxiv.org/pdf/1602.02697v2.pdf

    Nicolas Papernot, The Pennsylvania State University, ngp5056@cse.psu.edu
    Patrick McDaniel, The Pennsylvania State University, mcdaniel@cse.psu.edu
    Ian Goodfellow, Google Inc., goodfellow@google.com
    Somesh Jha, University of Wisconsin-Madison, jha@cs.wisc.edu
    Z. Berkay Celik, The Pennsylvania State University, zbc102@cse.psu.edu
    Ananthram Swami, US Army Research Laboratory, ananthram.swami.civ@mail.mil

    Abstract - Advances in deep learning have led to the broad adoption of Deep Neural Networks (DNNs) to a range of important machine learning problems, e.g., guiding autonomous vehicles, speech recognition, malware detection. Yet, machine learning models, including DNNs, were shown to be vulnerable to adversarial samples—subtly (and often humanly indistinguishably) modified malicious inputs crafted to compromise the integrity of their outputs. Adversarial examples thus enable adversaries to manipulate system behaviors. Potential attacks include attempts to control the behavior of vehicles, have spam content identified as legitimate content, or have malware identified as legitimate software. Adversarial examples are known to transfer from one model to another, even if the second model has a different architecture or was trained on a different set. We introduce the first practical demonstration that this cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data. In our demonstration, we only assume that the adversary can observe outputs from the target DNN given inputs chosen by the adversary. We introduce the attack strategy of fitting a substitute model to the input-output pairs in this manner, then crafting adversarial examples based on this auxiliary model. We evaluate the approach on existing DNN datasets and real-world settings. In one experiment, we force a DNN supported by MetaMind (one of the online APIs for DNN classifiers) to mis-classify inputs at a rate of 84.24%. We conclude with experiments exploring why adversarial samples transfer between DNNs, and a discussion on the applicability of our attack when targeting machine learning algorithms distinct from DNNs.
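
    The crafting step at the heart of this attack can be sketched in a few lines of numpy. This is an illustrative fast-gradient-sign update against a toy logistic-regression substitute; the model, data, and epsilon below are made-up stand-ins, not the paper's setup. In the paper, the substitute is first fitted to input-output pairs observed from the remote DNN, and examples crafted against the substitute then transfer to the target.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, w, b, y, eps=0.1):
    # Fast gradient sign method on a logistic-regression substitute:
    # step x in the sign of the loss gradient to increase the loss
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w  # d(cross-entropy)/dx for this model
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(1)
w, b = rng.normal(size=16), 0.0   # hypothetical substitute parameters
x, y = rng.normal(size=16), 1.0   # input the target currently labels as 1
x_adv = fgsm(x, w, b, y)          # candidate adversarial example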

    AlchemyAPI
    http://www.alchemyapi.com/
    website info:
    https://www.easycounter.com/report/alchemyapi.com
    Alchemyapi.com uses Drupal CMS and is hosted by Amazon Technologies Inc
    Server Technologies
    Nginx Backend server
    Drupal CMS
    see subdomains
    querybuilder.alchemyapi.com
    etc.
    https://www-03.ibm.com/press/us/en/pressrelease/46205.wss


    http://clarifai.com/

    http://www.fastforwardlabs.com/
    http://blog.fastforwardlabs.com/


    http://elastica.net/

    http://cortica.com/
    https://www.crunchbase.com/organization/cortica


    http://www.cortical.io/
    GitHub Public
    https://github.com/cortical-io/Public

    website info from
    https://www.easycounter.com/report/cortical.io
    hosted by fastly in SF



    http://numenta.org/
    http://numenta.org/events.html
    https://github.com/numenta/nupic/wiki/Natural-Language-Processing
    http://numenta.org/htm.html#prettyPhoto/0/

    https://www.cbinsights.com/blog/artificial-intelligence-startups-early-stage/

    Wit.ai - acquired by Facebook
    https://wit.ai/blog/2015/01/05/wit-ai-facebook

    Idibon - http://idibon.com/

    Identified has been acquired by Workday
    http://www.workday.com/identified.php

    Alchemy API - acquired by IBM

    Expect Labs - https://www.expectlabs.com/

    Enlitic

    Wise.io

    Nervana Systems

    PredictionIO

    Scaled Inference

    Kasisto

    x.ai
    http://tweetmotif.com/about

    Monica Anderson - http://www.sens.ai/insights.html

    http://www.silk.co/
    High-Speed Data Analytics, Web Scraping and Open Data with Silk.co
    http://www.meetup.com/USF-Seminar-Series-in-Analytics/events/219199389/

    http://www.occamzrazor.com/
    Accelerating Scientific Discovery
    meaning of the name:
    http://en.wikipedia.org/wiki/Occam%27s_razor
    https://explorable.com/occams-razor
    Occam's Razor - The Simplest Answer is Usually Correct

    http://mafait.org/

    Thinknowlogy is open source







    Saturday, September 20, 2014

    AMPLab MLBase/MLI Projects


    http://mlbase.org/



    From "Nick Pentreath" <nick.pentre...@gmail.com>
    Subject Re: Machine Learning on Spark [long rambling discussion email]
    Date Thu, 25 Jul 2013 20:56:36 GMT
    Cool I totally understand the constraints you're under and it's not really a criticism at all
    - the amplab projects are all awesome!


    If I can find ways to help then all the better

    Sent from Mailbox for iPhone

    On Thu, Jul 25, 2013 at 10:04 PM, Matei Zaharia <matei.zaharia@gmail.com>
    wrote:

    > I fully agree that we need to be clearer with the timelines in AMP Lab. One thing is
    that many of these are still research projects, so it's hard to predict when they will be
    ready for prime-time. Usually with all the things we officially announce (e.g. MLlib, GraphX),
    and especially the things we put in the Spark codebase, the team behind them really wants
    to make them widely available and has committed to spend the engineering to make them usable
    in real applications (as opposed to prototyping and moving on). But even then it can take
    some time to get the first release out. Hopefully we'll improve our communication about this
    through more careful tracking in JIRA.
    > Matei
    > On Jul 25, 2013, at 11:41 AM, Ameet Talwalkar <ameet@eecs.berkeley.edu> wrote:
    >> Hi Nick,
    >>
    >> I can understand your 'frustration' -- my hope is that having discussions
    >> (like the one we're having now) via this mailing list will help mitigate
    >> duplicate work moving forward.
    >>
    >> Regarding your detailed comments, we are aiming to include various
    >> components that you mentioned in our release (basic evaluation for
    >> collaborative filtering, linear model additions, and basic support for
    >> sparse vectors/features).  One particularly interesting avenue that is not
    >> on our immediate roadmap is adding implicit feedback for matrix
    >> factorization.  Algorithms like SVD++ are often used in practice, and it
    >> would be great to add them to the MLI library (and perhaps also MLlib).
    >>
    >> -Ameet
    >>
    >>
    >> On Thu, Jul 25, 2013 at 6:44 AM, Nick Pentreath <nick.pentreath@gmail.com>wrote:
    >>
    >>> Hi
    >>>
    >>> Ok, that all makes sense. I can see the benefit of good standard libraries
    >>> definitely, and I guess the pieces that felt "missing" to me were what you
    >>> are describing as MLI and MLOptimizer.
    >>>
    >>> It seems like the aims of MLI are very much in line with what I have/had in
    >>> mind for a ML library/framework. It seems the goals overlap quite a lot.
    >>>
    >>> I guess one "frustration" I have had is that there are all these great BDAS
    >>> projects, but we never really know when they will be released and what they
    >>> will look like until they are. In this particular case I couldn't wait for
    >>> MLlib so ended up doing some work myself to port Mahout's ALS and of course
    >>> have ended up duplicating effort (which is not a problem as it was
    >>> necessary at the time and has been a great learning experience).
    >>>
    >>> Similarly for GraphX, I would like to develop a project for a Spark-based
    >>> version of Faunus (https://github.com/thinkaurelius/faunus) for batch
    >>> processing of data in our Titan graph DB. For now I am working with
    >>> Bagel-based primitives and Spark RDDs directly, but would love to use
    >>> GraphX, but have no idea when it will be released and have little
    >>> involvement until it is.
    >>>
    >>> (I use "frustration" in the nicest way here - I love the BDAS concepts and
    >>> all the projects coming out, I just want them all to be released NOW!! :)
    >>>
    >>> So yes I would love to be involved in MLlib and MLI work to the extent I
    >>> can assist and the work is aligned with what I need currently in my
    >>> projects (this is just from a time allocation viewpoint - I'm sure much of
    >>> it will be complementary).
    >>>
    >>> Anyway, it seems to me the best course of action is as follows:
    >>>
    >>>   - I'll get involved in MLlib and see how I can contribute there. Some
    >>>   things that jump out:
    >>>
    >>>
    >>>   - implicit preference capability for ALS model since as far as I can see
    >>>      currently it handles explicit prefs only? (Implicit prefs here:
    >>>      http://68.180.206.246/files/HuKorenVolinsky-ICDM08.pdf which is
    >>>      typically better if we don't have actual rating data but instead
    >>> "view",
    >>>      "click", "play" or whatever)
    >>
    >>      - RMSE and other evaluation metrics for ALS as well as test/train
    >>>      split / cross-val stuff?
    >>
    >>      - linear model additions, like new loss functions for hinge loss,
    >>>      least squares etc for SGD, as well as learning rate stuff (
    >>>      http://arxiv.org/pdf/1305.6646) and regularisers (L1/L2/Elasic Net)
    >>> -
    >>>      i.e. bring the SGD stuff in line with Vowpal Wabbit / sklearn (if
    >>> that's
    >>>      desirable, my view is yes)
    >>
    >>      - what about sparse weight and feature vectors for linear models/SGD?
    >>>      Together with hashing allows very large models while still being
    >>> efficient,
    >>>      and with L1 reg is particularly useful.
    >>
    >>      - finally what about online models? ie SGD models currently are
    >>>      "static" ie once trained can only predict, whereas SGD can of course
    >>> keep
    >>>      learning. Or does one simply re-train with the previous initial
    >>> weight
    >>>      vector (I guess that can work just as well)... Also on this
    >>> topic training
    >>>      / predicting on Streams as well as RDDs
    >>>   - I can put up what I have done to a BitBucket account and grant access
    >>>   to whichever devs would like to take a look. The only reason I don't
    >>> just
    >>>   throw it up on GitHub is that frankly it is not really ready and is not
    >>> a
    >>>   fully-fledged project yet (I think anyway). Possibly some of this can be
    >>>   useful (not that there's all that much there apart from the ALS (but it
    >>>   does solve for both explicit and implicit preference data as per
    >>> Mahout's
    >>>   implementation), KMeans (simpler than the one in MLlib as I didn't yet
    >>> get
    >>>   around to doing KMeans++ init) and the arg-parsing / jobrunner (which
    >>> may
    >>>   or may not be interesting both for ML and for Spark jobs in general)).
    >>>
    >>> Let me know your thoughts
    >>> Nick
    >>>
    >>>
    >>> On Wed, Jul 24, 2013 at 10:09 PM, Ameet Talwalkar
    >>> <ameet@eecs.berkeley.edu>wrote:
    >>>
    >>>> Hi Nick,
    >>>>
    >>>> Thanks for your email, and it's great to see such excitement around this
    >>>> work!  Matei and Reynold already addressed the motivation behind MLlib as
    >>>> well as our reasons for not using Breeze, and I'd like to give you some
    >>>> background about MLbase, and discuss how it may fit with your interests.
    >>>>
    >>>> There are three components of MLbase:
    >>>>
    >>>> 1) MLlib: As Matei mentioned, this is an ML library in Spark with core ML
    >>>> kernels and solid implementations of common algorithms that can be used
    >>>> easily by Java/Python and also called into by higher-level systems (e.g.
    >>>> MLI, Shark, PySpark).
    >>>>
    >>>> 2) MLI: this is an ML API that provides a common interface for ML
    >>>> algorithms (the same interface used in MLlib), and introduces high-level
    >>>> abstractions to simplify feature extraction / exploration and ML
    >>> algorithm
    >>>> development.  These abstractions leverage the kernels in MLlib when
    >>>> possible, and also introduce additional kernels.  This work also
    >>> includes a
    >>>> library written against the MLI.  The MLI is currently written against
    >>>> Spark, but is designed to be platform independent, so that code written
    >>>> against MLI could be run on different engines (e.g., Hadoop, GraphX,
    >>> etc.).
    >>>>
    >>>>
    >>>> 3) ML Optimizer: This piece automates the task of model selection.  The
    >>>> optimizer can be viewed as a search problem over feature extraction /
    >>>> algorithms included in the MLI library, and is in part based on efficient
    >>>> cross validation. This work is under active development but is in an
    >>>> earlier stage of development than MLlib and MLI.
    >>>>
    >>>> (note: MLlib will be included with the Spark codebase, while the MLI and
    >>> ML
    >>>> Optimizer will live in separate repositories.)
    >>>>
    >>>> As far as I can tell (though please correct me if I've misunderstood)
    >>> your
    >>>> main goals include:
    >>>>
    >>>> i) "consistency in the API"
    >>>> ii) "some level of abstraction but to keep things as simple as possible"
    >>>> iii) "execute models on Spark ... while providing workflows for
    >>> pipelining
    >>>> transformations, feature extraction, testing and cross-validation, and
    >>> data
    >>>> viz."
    >>>>
    >>>> The MLI (and to some extent the ML Optimizer) is very much in line with
    >>>> these goals, and it would be great if you were interested in contributing
    >>>> to it.  MLI is a private repository right now, but we'll make it public
    >>>> soon though, and Evan Sparks or I will let you know when we do so.
    >>>>
    >>>> Thanks again for getting in touch with us!
    >>>>
    >>>> -Ameet
    >>>>
    >>>>
    >>>> On Wed, Jul 24, 2013 at 11:47 AM, Reynold Xin <rxin@cs.berkeley.edu>
    >>>> wrote:
    >>>>
    >>>>> On Wed, Jul 24, 2013 at 1:46 AM, Nick Pentreath <
    >>>> nick.pentreath@gmail.com
    >>>>>> wrote:
    >>>>>
    >>>>>>
    >>>>>> I also found Breeze to be very nice to work with and like the DSL - hence
    >>>>>> my question about why not use that? (Especially now that Breeze is actually
    >>>>>> just breeze-math and breeze-viz).
    >>>>>>
    >>>>>
    >>>>>
    >>>>> Matei addressed this from a higher level. I want to provide a little
    >>> bit
    >>>>> more context. A common properties of a lot of high level Scala DSL
    >>>>> libraries is that simple operators tend to have high virtual function
    >>>>> overheads and also create a lot of temporary objects. And because the
    >>>> level
    >>>>> of abstraction is so high, it is fairly hard to debug / optimize
    >>>>> performance.
    >>>>>
    >>>>>
    >>>>>
    >>>>>
    >>>>> --
    >>>>> Reynold Xin, AMPLab, UC Berkeley
    >>>>> http://rxin.org
    http://mail-archives.apache.org/mod_mbox/spark-dev/201307.mbox/%3C1374785796360.b9575b2a@Nodemailer%3E


    ========
    Hi Lochana,

    This post is also referring to the MLbase project I mentioned in my
    previous email.  We have not open-sourced this work, but plan to do so.

    Moreover, you might want to check out the following JIRA ticket
    <https://issues.apache.org/jira/browse/SPARK-3530> that includes the design
    doc for ML pipelines and parameters in MLlib.  This design will include
    many of the ideas from our MLbase work.

    -Ameet

    On Sun, Oct 5, 2014 at 7:28 PM, Lochana Menikarachchi <lochanac@gmail.com>
    wrote:

    > Found this thread from April..
    >
    http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%
    > 3CCABjXkq6b7SfAxie4+AqTCmD8jSqBZnsxSFw6V5o0WWWouOBbCw@mail.gmail.com%3E
    >
    ==================
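
    For reference, the pipelines-and-parameters design in SPARK-3530 later shipped as the spark.ml Pipeline API. A minimal pyspark sketch of that pattern; the toy data and column names below are made up.

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

# Toy labeled text data
train = spark.createDataFrame(
    [("spark is great", 1.0), ("hadoop map reduce", 0.0)],
    ["text", "label"])

# Chain feature extraction and a learner; fit() returns a PipelineModel
tokenizer = Tokenizer(inputCol="text", outputCol="words")
tf = HashingTF(inputCol="words", outputCol="features")
lr = LogisticRegression(maxIter=10)
model = Pipeline(stages=[tokenizer, tf, lr]).fit(train)

model.transform(train).select("text", "prediction").show()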

    Status of MLI?



    Wednesday, September 17, 2014

    ML SERVICE



    Ersatz 

    Ersatz provides a unified machine learning environment with support for deep learning, data wrangling, a variety of "model backends", model and data visualization, team collaboration, and GPU computing--all from a browser.
    http://www.ersatzlabs.com/

    SKYMIND

    DEEPLEARNING
    http://www.skymind.io/


    ALCHEMYAPI

    http://www.alchemyapi.com

    SEMANTRIA

    https://semantria.com/

    Nitay Joffe

    Founder and CTO at ActionIQ