Tuesday, September 30, 2014

H2O



Here is the link to the GitHub page for preparing a development environment for Sparkling Water: https://github.com/0xdata/h2o-dev/blob/master/h2o-docs/src/docs/develop/sparkling_water.md
Sparkling Water Demo
Slides of the first Sparkling Water meet up
Sparkling Water code is here
Blog on Sparkling Water
Install and Test Instructions


Setting up GitHub access behind the VPN proxy
http://stackoverflow.com/questions/783811/getting-git-to-work-with-a-proxy-server
http://www-proxy.us.oracle.com:80


git config --global http.proxy http://www-proxy.us.oracle.com:80
git config --global https.proxy https://www-proxy.us.oracle.com:80
Setting up Gradle behind the VPN proxy
http://stackoverflow.com/questions/8938994/gradlew-behind-a-proxy

github reference:
----------------------------------------------------------------------------------------------------------------------
Commands to use:
git config --global http.proxy http://proxyuser:proxypwd@proxy.server.com:8080
git config --global https.proxy https://proxyuser:proxypwd@proxy.server.com:8080
  • replace proxyuser with your proxy user name
  • replace proxypwd with your proxy password
  • replace proxy.server.com with your proxy server URL
  • replace 8080 with your proxy port
If you decide at any time to reset this proxy and work without one:
Commands to use:
git config --global --unset http.proxy
git config --global --unset https.proxy
------------------------------
gradle reference:
gradle.properties
systemProp.http.proxyHost=www.somehost.org
systemProp.http.proxyPort=8080
systemProp.http.proxyUser=userid
systemProp.http.proxyPassword=password
systemProp.https.proxyHost=www.somehost.org
systemProp.https.proxyPort=8080
All you have to do is create a file called gradle.properties (with the properties shown above) and place it in your Gradle user home directory (which defaults to USER_HOME/.gradle) or in your project directory.
Gradle (the wrapper too!) automatically picks up gradle.properties files found in the user home directory or in project directories.
For more about this, read the Gradle user guide, especially section 12.2, "Gradle properties and system properties".

NLU


1. Omer Levy



Dependency-Based Word Embeddings


Omer Levy and Yoav Goldberg
Computer Science Department, Bar-Ilan University, Ramat-Gan, Israel
{omerlevy,yoav.goldberg}@gmail.com
http://anthology.aclweb.org/P/P14/P14-2050.pdf

http://levyomer.wordpress.com/category/semantic-similarity/
UKP-BIU: Similarity and Entailment Metrics for Student Response Analysis
http://aclweb.org/anthology/S/S13/S13-2048.pdf


    Yoav Goldberg

    personal page at  BGU

    I am a Senior Lecturer at Bar Ilan University's Computer Science Department. Before that, I did my Post-Doc as a Research Scientist at Google Research New York.

    I work on problems related to Natural Language Processing and Machine Learning. In particular I am interested in syntactic parsing, structured-prediction models, learning for greedy decoding algorithms, multilingual language understanding, confidence estimation and prediction of partial structures.

    I completed my PhD at Ben-Gurion University in 2011, under the supervision of Prof. Michael Elhadad.
    Publications

Mining the Hebrew Web, Yoav Goldberg



Tomas Mikolov
https://www.linkedin.com/pub/tomas-mikolov/88/311/598





The Stanford NLP Group

http://nlp.stanford.edu/

Semantic Matching Project - S-Match - open source
Trento University
http://semanticmatching.org/s-match.html


4. Semantic Indexing and Multimedia Event Detection: ECNU at TRECVID 2012
Feng Wang†, Zhanhu Sun†, Daran Zhang†, Chong-Wah Ngo‡
†Dept. of Computer Science and Technology, East China Normal University
‡Dept. of Computer Science, City University of Hong Kong


5. UNAL-NLP: Combining Soft Cardinality Features for Semantic Textual Similarity, Relatedness and Entailment



6. The Meaning Factory: Formal Semantics for Recognizing Textual Entailment and Determining Semantic Similarity
Johannes Bjerva, University of Groningen
www.let.rug.nl/bos/.../Bjerva2014SemEval.pdf
The Meaning Factory site links to the Groningen Meaning Bank, the VP Ellipsis corpus, and a download of Boxer.

7. Vector Based Models Of Semantic Composition
http://www.freebook.im/vector/vector-based-models-of-semantic-composition-association-for









  • SemEval 2014 - QCRI-ALT server

    alt.qcri.org/semeval2014/cdrom/SemEval-2014.pdf
    SemantiKLUE: Robust Semantic Similarity at Multiple Levels Using Maximum Weight Matching.


  • 2. David Cohen

    Carnegie Mellon University Ph.D.
    Intern at Toyota Info-tech Center
    https://www.linkedin.com/pub/david-cohen/47/14a/b32



    Data annotation and analysis of human interaction in a multi-sensor car environment as a part of the Honda Research Institute sponsored CESAR initiative.

    3. Antoine Raux – Principal Researcher at Lenovo
    Research and design of speech-based and multimodal human-machine interfaces.
    Previous: Honda Research Institute; Carnegie Mellon University; Advanced Telecommunications Research Institute International (NICT)
    Education: Carnegie Mellon University

  • IBM NLP Watson
  • IBM Research
    http://researcher.watson.ibm.com/researcher/view_group.php?id=147
    Natural Language Processing
    publications
    http://researcher.watson.ibm.com/researcher/view_group_pubs.php?grp=147

    2013 Semantic Technologies in IBM Watson™
    http://www.patwardhans.net/papers/GliozzoBPM13.pdf

    Alfio Gliozzo, IBM Watson Research Center, Yorktown Heights, NY 10598, gliozzo@us.ibm.com
    Or Biran, Columbia University, New York, NY 10027, orb@cs.columbia.edu
    Siddharth Patwardhan, IBM Watson Research Center, Yorktown Heights, NY 10598, siddharth@us.ibm.com
    Kathleen McKeown, Columbia University, New York, NY 10027, kathy@cs.columbia.edu


  • commercial page: http://www.ibm.com/smarterplanet/us/en/ibmwatson/implement-watson.html
  • Leader: Dr. Satya V. Nitta is currently the Program Leader of the Cognitive Learning Content research group at IBM's T J Watson Research Center, where he is developing cognitive-computing-based next-generation adaptive learning technologies. http://researcher.ibm.com/researcher/view.php?person=us-svn
  • SALIENCE
  • http://datasift.com/source/19/salience-entities
  • Ahmed Hassan 
  • Discovering Salience in Textual Elements using Graph Mutual Reinforcement, SI508 Project, Ahmed Hassan, hassanam@umich.edu
  • http://www-personal.umich.edu/~ladamic/courses/networks/si508f07/projects/SummarizationHITS.pdf


  • Saturday, September 27, 2014

    FINANCIAL ML

    Reinforcement Learning for Portfolio Management

    Problem definition:

    1. Based on pre-opening signals, achieve a daily Sharpe ratio close to 1 for a directional trade in SPY

    2. Use the game-winning strategies from Atari and Go (deep reinforcement learning)

    Demis Hassabis @demishassabis (see later DeepMind post)

    Modern Portfolio Theory states that adding assets to a diversified portfolio that have correlations of less than one with each other can decrease portfolio risk without sacrificing return. Such diversification will serve to increase the Sharpe ratio of a portfolio.

    Sharpe ratio = (Mean portfolio return − Risk-free rate)/Standard deviation of portfolio return



    The ex-ante Sharpe ratio formula uses expected returns while the ex-post Sharpe ratio uses realized returns.

    Read more: Sharpe Ratio Definition | Investopedia http://www.investopedia.com/terms/s/sharperatio.asp
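
    A minimal Python sketch of the ex-post (realized-return) version of this formula; the daily returns, the zero risk-free rate, and the sqrt(252) annualization below are assumptions for illustration, not data from this post.

import numpy as np

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    # Ex-post Sharpe ratio: mean excess return over its standard deviation,
    # annualized by sqrt(periods_per_year) for daily data
    excess = np.asarray(returns) - risk_free_rate
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

# Hypothetical daily returns of a directional SPY strategy
daily_returns = [0.003, -0.001, 0.002, 0.0005, -0.002, 0.004]
print(sharpe_ratio(daily_returns))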

    AQR Funds
    https://funds.aqr.com/

    MSCI, A LEADER IN FACTOR INDEXING
    https://www.msci.com/factor-indexes

    ALGORITHMIC TRADING

    Classification-Based Financial Markets Prediction Using Deep Neural Networks

    Matthew Francis Dixon
    Illinois Institute of Technology – Stuart School of Business, IIT

    Diego Klabjan
    Northwestern University

    Jin Hoon Bang
    Northwestern University

    May 18, 2016

    Abstract:

    Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition communities for their superior predictive properties, including robustness to overfitting. However, their application to algorithmic trading has not been previously researched, partly because of their computational complexity. This paper describes the application of DNNs to predicting financial market movement directions. In particular, we describe the configuration and training approach and then demonstrate their application to backtesting a simple trading strategy over 43 different Commodity and FX future mid-prices at 5-minute intervals. All results in this paper are generated using a C++ implementation on the Intel Xeon Phi co-processor, which is 11.4x faster than the serial version, and a Python strategy backtesting environment, both of which are available as open source code written by the authors.

    Engineering ML


    1. Recovering And Using Use-Case-Diagram-To-Source-Code
    Traceability Links

    ESEC/FSE’07, September 3–7, 2007, Cavtat near Dubrovnik, Croatia.

    2. Google search for
    mining software repositories for traceability links
    https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=mining+software+repositories+for+traceability+links&revid=521494032

    GITHUB REPOSITORIES

    AMPLabs
    https://github.com/amplab/MLI

    Data Bricks
    https://github.com/databricks/spark-training


    Nitro - Scala + Spark Deep Learning ML
    http://www.meetup.com/SF-Bayarea-Machine-Learning/events/226657565/
    The repo associated with the Nitro talk; a more technical version of the slides is linked from the README in the repo:
    https://github.com/Nitro/data-pipelines


    Deep Networks with Stochastic Depth

    https://github.com/dblN/stochastic_depth_keras

    The stochastic depth training procedure trains short networks (by randomly dropping layers during training) and uses deep networks at test time.
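
    A minimal numpy sketch of the idea (an illustration, not the Keras implementation from the repo above); the toy linear residual blocks and sizes are made up, while the linearly decaying survival probabilities follow the paper's schedule.

import numpy as np

rng = np.random.default_rng(0)

def residual_branch(x, w):
    # Toy residual branch: one linear map plus ReLU
    return np.maximum(w @ x, 0.0)

def forward(x, weights, survival_probs, train=True):
    # Stochastic-depth pass over a stack of residual blocks
    for w, p in zip(weights, survival_probs):
        if train:
            # Training: keep the block with probability p, else skip it,
            # so each pass trains a randomly shortened network
            if rng.random() < p:
                x = x + residual_branch(x, w)
        else:
            # Test: run every block, scaling each branch by its survival
            # probability, so the full deep network is used
            x = x + p * residual_branch(x, w)
    return x

dim, n_blocks = 8, 10
weights = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(n_blocks)]
survival_probs = np.linspace(1.0, 0.5, n_blocks)  # linear decay, as in the paper
out = forward(rng.normal(size=dim), weights, survival_probs, train=True)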

    DEEP LEARNING (DL) CONVOLUTIONAL NEURAL NETWORKS (CNN)

    http://gitxiv.com/posts/vwfa87JJp5QTXE2PJ/deep-networks-with-stochastic-depth

    These notes accompany the Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition.
    For questions/concerns/bug reports, contact Justin Johnson regarding the assignments, or Andrej Karpathy regarding the course notes. You can also submit a pull request directly to our git repo.
    We encourage the use of the hypothes.is extension to annotate comments and discuss these notes inline.
    http://cs231n.github.io/
    http://cs231n.github.io/neural-networks-1/



    https://github.com/irusman

    HOW TO BIG DATA


    http://stackoverflow.com/questions/1533330/writing-data-to-hadoop

    Thursday, September 25, 2014

    DIGITAL REPOSITORIES


    http://repositories.lib.utexas.edu/handle/2152/ETD-UT-2011-08-4329

    Ayers, Debra Lynn
    Consumer preference measurement and its practical application for selecting software product features

    ML SOFTWARE


    OCTAVE


    http://www.gnu.org/software/octave/community-news.html
    Octave 3.8 is a major new release with many new features, including an experimental graphical user interface. But because the GUI is not quite as polished as we would like, we have decided to wait until the 4.0.x release series before making the GUI the default interface.
    See the release notes or the "Experimental GUI Info" button in the GUI for more information about the release and how you can help us with GUI development and speed up the 4.0 release.

    BLAS Library selection

    During the install, a BLAS library was selected. The installer contains 2 BLAS implementations, the NetLib reference BLAS and OpenBLAS.
    Either can be selected after the install by copying librefblas.dll or libopenblas.dll to libblas.dll in the bin folder of the Octave installation.

    Included Octave Forge Packages

    A number of Octave-Forge packages are included with this Octave install; however, they need to be built and installed before they can be used.
    To install:
    • Start Octave and then open the build_packages.m file found in the src folder where Octave was installed.
    • Run the build_packages.m script to build and install the packages.
    Packages must then be loaded with the pkg load PACKAGENAME command before they can be used.

    Other packages are available from Octave-Forge

    Tuesday, September 23, 2014

    Companies in Data Science Space


    https://www.metamind.io/
    GitHub
    https://github.com/MetaMind
    https://github.com/MetaMind/gevent-socketio

    website info:
    https://www.easycounter.com/report/metamind.io
    hosted by Amazon Technologies Inc

    Attacks on MetaMind (paper co-authored by Google researchers)

    Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
    Feb 19, 2016
    http://arxiv.org/pdf/1602.02697v2.pdf

    Nicolas Papernot, The Pennsylvania State University, ngp5056@cse.psu.edu
    Patrick McDaniel, The Pennsylvania State University, mcdaniel@cse.psu.edu
    Ian Goodfellow, Google Inc., goodfellow@google.com
    Somesh Jha, University of Wisconsin-Madison, jha@cs.wisc.edu
    Z. Berkay Celik, The Pennsylvania State University, zbc102@cse.psu.edu
    Ananthram Swami, US Army Research Laboratory, ananthram.swami.civ@mail.mil

    Abstract - Advances in deep learning have led to the broad adoption of Deep Neural Networks (DNNs) to a range of important machine learning problems, e.g., guiding autonomous vehicles, speech recognition, malware detection. Yet, machine learning models, including DNNs, were shown to be vulnerable to adversarial samples—subtly (and often humanly indistinguishably) modified malicious inputs crafted to compromise the integrity of their outputs. Adversarial examples thus enable adversaries to manipulate system behaviors. Potential attacks include attempts to control the behavior of vehicles, have spam content identified as legitimate content, or have malware identified as legitimate software. Adversarial examples are known to transfer from one model to another, even if the second model has a different architecture or was trained on a different set. We introduce the first practical demonstration that this cross-model transfer phenomenon enables attackers to control a remotely hosted DNN with no access to the model, its parameters, or its training data. In our demonstration, we only assume that the adversary can observe outputs from the target DNN given inputs chosen by the adversary. We introduce the attack strategy of fitting a substitute model to the input-output pairs in this manner, then crafting adversarial examples based on this auxiliary model. We evaluate the approach on existing DNN datasets and real-world settings. In one experiment, we force a DNN supported by MetaMind (one of the online APIs for DNN classifiers) to mis-classify inputs at a rate of 84.24%. We conclude with experiments exploring why adversarial samples transfer between DNNs, and a discussion on the applicability of our attack when targeting machine learning algorithms distinct from DNNs.
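
    The crafting step at the heart of this attack can be sketched in a few lines of numpy. This is an illustrative fast-gradient-sign update against a toy logistic-regression substitute; the model, data, and epsilon below are made-up stand-ins, not the paper's setup. In the paper, the substitute is first fitted to input-output pairs observed from the remote DNN, and examples crafted against the substitute then transfer to the target.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, w, b, y, eps=0.1):
    # Fast gradient sign method on a logistic-regression substitute:
    # step x in the sign of the loss gradient to increase the loss
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w  # d(cross-entropy)/dx for this model
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(1)
w, b = rng.normal(size=16), 0.0   # hypothetical substitute parameters
x, y = rng.normal(size=16), 1.0   # input the target currently labels as 1
x_adv = fgsm(x, w, b, y)          # candidate adversarial example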

    AlchemyAPI
    http://www.alchemyapi.com/
    website info:
    https://www.easycounter.com/report/alchemyapi.com
    Alchemyapi.com uses Drupal CMS and is hosted by Amazon Technologies Inc
    Server Technologies
    Nginx Backend server
    Drupal CMS
    see subdomains
    querybuilder.alchemyapi.com
    etc.
    https://www-03.ibm.com/press/us/en/pressrelease/46205.wss


    http://clarifai.com/

    http://www.fastforwardlabs.com/
    http://blog.fastforwardlabs.com/


    http://elastica.net/

    http://cortica.com/
    https://www.crunchbase.com/organization/cortica


    http://www.cortical.io/
    GitHub Public
    https://github.com/cortical-io/Public

    website info from
    https://www.easycounter.com/report/cortical.io
    hosted by fastly in SF



    http://numenta.org/
    http://numenta.org/events.html
    https://github.com/numenta/nupic/wiki/Natural-Language-Processing
    http://numenta.org/htm.html#prettyPhoto/0/

    https://www.cbinsights.com/blog/artificial-intelligence-startups-early-stage/

    Wit.ai - acquired by Facebook
    https://wit.ai/blog/2015/01/05/wit-ai-facebook

    Idibon - http://idibon.com/

    Identified has been acquired by Workday
    http://www.workday.com/identified.php

    Alchemy API - acquired by IBM

    Expect Labs - https://www.expectlabs.com/

    Enlitic

    Wise.io

    Nervana Systems

    PredictionIO

    Scaled Inference

    Kasisto

    x.ai
    http://tweetmotif.com/about

    Monica Anderson - http://www.sens.ai/insights.html

    http://www.silk.co/
    High-Speed Data Analytics, Web Scraping and Open Data with Silk.co
    http://www.meetup.com/USF-Seminar-Series-in-Analytics/events/219199389/

    http://www.occamzrazor.com/
    Accelerating Scientific Discovery
    meaning of the name:
    http://en.wikipedia.org/wiki/Occam%27s_razor
    https://explorable.com/occams-razor
    Occam's Razor - The Simplest Answer is Usually Correct

    http://mafait.org/

    Thinknowlogy is open source







    Saturday, September 20, 2014

    AMPLab MLBase/MLI Projects


    http://mlbase.org/



    From "Nick Pentreath" <nick.pentre...@gmail.com>
    Subject Re: Machine Learning on Spark [long rambling discussion email]
    Date Thu, 25 Jul 2013 20:56:36 GMT
    Cool I totally understand the constraints you're under and it's not really a criticism at all
    - the amplab projects are all awesome!


    If I can find ways to help then all the better

    Sent from Mailbox for iPhone

    On Thu, Jul 25, 2013 at 10:04 PM, Matei Zaharia <matei.zaharia@gmail.com>
    wrote:

    > I fully agree that we need to be clearer with the timelines in AMP Lab. One thing is
    that many of these are still research projects, so it's hard to predict when they will be
    ready for prime-time. Usually with all the things we officially announce (e.g. MLlib, GraphX),
    and especially the things we put in the Spark codebase, the team behind them really wants
    to make them widely available and has committed to spend the engineering to make them usable
    in real applications (as opposed to prototyping and moving on). But even then it can take
    some time to get the first release out. Hopefully we'll improve our communication about this
    through more careful tracking in JIRA.
    > Matei
    > On Jul 25, 2013, at 11:41 AM, Ameet Talwalkar <ameet@eecs.berkeley.edu> wrote:
    >> Hi Nick,
    >>
    >> I can understand your 'frustration' -- my hope is that having discussions
    >> (like the one we're having now) via this mailing list will help mitigate
    >> duplicate work moving forward.
    >>
    >> Regarding your detailed comments, we are aiming to include various
    >> components that you mentioned in our release (basic evaluation for
    >> collaborative filtering, linear model additions, and basic support for
    >> sparse vectors/features).  One particularly interesting avenue that is not
    >> on our immediate roadmap is adding implicit feedback for matrix
    >> factorization.  Algorithms like SVD++ are often used in practice, and it
    >> would be great to add them to the MLI library (and perhaps also MLlib).
    >>
    >> -Ameet
    >>
    >>
    >> On Thu, Jul 25, 2013 at 6:44 AM, Nick Pentreath <nick.pentreath@gmail.com>wrote:
    >>
    >>> Hi
    >>>
    >>> Ok, that all makes sense. I can see the benefit of good standard libraries
    >>> definitely, and I guess the pieces that felt "missing" to me were what you
    >>> are describing as MLI and MLOptimizer.
    >>>
    >>> It seems like the aims of MLI are very much in line with what I have/had in
    >>> mind for a ML library/framework. It seems the goals overlap quite a lot.
    >>>
    >>> I guess one "frustration" I have had is that there are all these great BDAS
    >>> projects, but we never really know when they will be released and what they
    >>> will look like until they are. In this particular case I couldn't wait for
    >>> MLlib so ended up doing some work myself to port Mahout's ALS and of course
    >>> have ended up duplicating effort (which is not a problem as it was
    >>> necessary at the time and has been a great learning experience).
    >>>
    >>> Similarly for GraphX, I would like to develop a project for a Spark-based
    >>> version of Faunus (https://github.com/thinkaurelius/faunus) for batch
    >>> processing of data in our Titan graph DB. For now I am working with
    >>> Bagel-based primitives and Spark RDDs directly, but would love to use
    >>> GraphX, but have no idea when it will be released and have little
    >>> involvement until it is.
    >>>
    >>> (I use "frustration" in the nicest way here - I love the BDAS concepts and
    >>> all the projects coming out, I just want them all to be released NOW!! :)
    >>>
    >>> So yes I would love to be involved in MLlib and MLI work to the extent I
    >>> can assist and the work is aligned with what I need currently in my
    >>> projects (this is just from a time allocation viewpoint - I'm sure much of
    >>> it will be complementary).
    >>>
    >>> Anyway, it seems to me the best course of action is as follows:
    >>>
    >>>   - I'll get involved in MLlib and see how I can contribute there. Some
    >>>   things that jump out:
    >>>
    >>>
    >>>   - implicit preference capability for ALS model since as far as I can see
    >>>      currently it handles explicit prefs only? (Implicit prefs here:
    >>>      http://68.180.206.246/files/HuKorenVolinsky-ICDM08.pdf which is
    >>>      typically better if we don't have actual rating data but instead
    >>> "view",
    >>>      "click", "play" or whatever)
    >>
    >>      - RMSE and other evaluation metrics for ALS as well as test/train
    >>>      split / cross-val stuff?
    >>
    >>      - linear model additions, like new loss functions for hinge loss,
    >>>      least squares etc for SGD, as well as learning rate stuff (
    >>>      http://arxiv.org/pdf/1305.6646) and regularisers (L1/L2/Elasic Net)
    >>> -
    >>>      i.e. bring the SGD stuff in line with Vowpal Wabbit / sklearn (if
    >>> that's
    >>>      desirable, my view is yes)
    >>
    >>      - what about sparse weight and feature vectors for linear models/SGD?
    >>>      Together with hashing allows very large models while still being
    >>> efficient,
    >>>      and with L1 reg is particularly useful.
    >>
    >>      - finally what about online models? ie SGD models currently are
    >>>      "static" ie once trained can only predict, whereas SGD can of course
    >>> keep
    >>>      learning. Or does one simply re-train with the previous initial
    >>> weight
    >>>      vector (I guess that can work just as well)... Also on this
    >>> topic training
    >>>      / predicting on Streams as well as RDDs
    >>>   - I can put up what I have done to a BitBucket account and grant access
    >>>   to whichever devs would like to take a look. The only reason I don't
    >>> just
    >>>   throw it up on GitHub is that frankly it is not really ready and is not
    >>> a
    >>>   fully-fledged project yet (I think anyway). Possibly some of this can be
    >>>   useful (not that there's all that much there apart from the ALS (but it
    >>>   does solve for both explicit and implicit preference data as per
    >>> Mahout's
    >>>   implementation), KMeans (simpler than the one in MLlib as I didn't yet
    >>> get
    >>>   around to doing KMeans++ init) and the arg-parsing / jobrunner (which
    >>> may
    >>>   or may not be interesting both for ML and for Spark jobs in general)).
    >>>
    >>> Let me know your thoughts
    >>> Nick
    >>>
    >>>
    >>> On Wed, Jul 24, 2013 at 10:09 PM, Ameet Talwalkar
    >>> <ameet@eecs.berkeley.edu>wrote:
    >>>
    >>>> Hi Nick,
    >>>>
    >>>> Thanks for your email, and it's great to see such excitement around this
    >>>> work!  Matei and Reynold already addressed the motivation behind MLlib as
    >>>> well as our reasons for not using Breeze, and I'd like to give you some
    >>>> background about MLbase, and discuss how it may fit with your interests.
    >>>>
    >>>> There are three components of MLbase:
    >>>>
    >>>> 1) MLlib: As Matei mentioned, this is an ML library in Spark with core ML
    >>>> kernels and solid implementations of common algorithms that can be used
    >>>> easily by Java/Python and also called into by higher-level systems (e.g.
    >>>> MLI, Shark, PySpark).
    >>>>
    >>>> 2) MLI: this is an ML API that provides a common interface for ML
    >>>> algorithms (the same interface used in MLlib), and introduces high-level
    >>>> abstractions to simplify feature extraction / exploration and ML
    >>> algorithm
    >>>> development.  These abstractions leverage the kernels in MLlib when
    >>>> possible, and also introduce additional kernels.  This work also
    >>> includes a
    >>>> library written against the MLI.  The MLI is currently written against
    >>>> Spark, but is designed to be platform independent, so that code written
    >>>> against MLI could be run on different engines (e.g., Hadoop, GraphX,
    >>> etc.).
    >>>>
    >>>>
    >>>> 3) ML Optimizer: This piece automates the task of model selection.  The
    >>>> optimizer can be viewed as a search problem over feature extraction /
    >>>> algorithms included in the MLI library, and is in part based on efficient
    >>>> cross validation. This work is under active development but is in an
    >>>> earlier stage of development than MLlib and MLI.
    >>>>
    >>>> (note: MLlib will be included with the Spark codebase, while the MLI and
    >>> ML
    >>>> Optimizer will live in separate repositories.)
    >>>>
    >>>> As far as I can tell (though please correct me if I've misunderstood)
    >>> your
    >>>> main goals include:
    >>>>
    >>>> i) "consistency in the API"
    >>>> ii) "some level of abstraction but to keep things as simple as possible"
    >>>> iii) "execute models on Spark ... while providing workflows for
    >>> pipelining
    >>>> transformations, feature extraction, testing and cross-validation, and
    >>> data
    >>>> viz."
    >>>>
    >>>> The MLI (and to some extent the ML Optimizer) is very much in line with
    >>>> these goals, and it would be great if you were interested in contributing
    >>>> to it.  MLI is a private repository right now, but we'll make it public
    >>>> soon though, and Evan Sparks or I will let you know when we do so.
    >>>>
    >>>> Thanks again for getting in touch with us!
    >>>>
    >>>> -Ameet
    >>>>
    >>>>
    >>>> On Wed, Jul 24, 2013 at 11:47 AM, Reynold Xin <rxin@cs.berkeley.edu>
    >>>> wrote:
    >>>>
    >>>>> On Wed, Jul 24, 2013 at 1:46 AM, Nick Pentreath <
    >>>> nick.pentreath@gmail.com
    >>>>>> wrote:
    >>>>>
    >>>>>>
    >>>>>> I also found Breeze to be very nice to work with and like the DSL - hence
    >>>>>> my question about why not use that? (Especially now that Breeze is actually
    >>>>>> just breeze-math and breeze-viz).
    >>>>>>
    >>>>>
    >>>>>
    >>>>> Matei addressed this from a higher level. I want to provide a little
    >>> bit
    >>>>> more context. A common properties of a lot of high level Scala DSL
    >>>>> libraries is that simple operators tend to have high virtual function
    >>>>> overheads and also create a lot of temporary objects. And because the
    >>>> level
    >>>>> of abstraction is so high, it is fairly hard to debug / optimize
    >>>>> performance.
    >>>>>
    >>>>>
    >>>>>
    >>>>>
    >>>>> --
    >>>>> Reynold Xin, AMPLab, UC Berkeley
    >>>>> http://rxin.org
    http://mail-archives.apache.org/mod_mbox/spark-dev/201307.mbox/%3C1374785796360.b9575b2a@Nodemailer%3E


    ========
    Hi Lochana,

    This post is also referring to the MLbase project I mentioned in my
    previous email.  We have not open-sourced this work, but plan to do so.

    Moreover, you might want to check out the following JIRA ticket
    <https://issues.apache.org/jira/browse/SPARK-3530> that includes the design
    doc for ML pipelines and parameters in MLlib.  This design will include
    many of the ideas from our MLbase work.

    -Ameet

    On Sun, Oct 5, 2014 at 7:28 PM, Lochana Menikarachchi <lochanac@gmail.com>
    wrote:

    > Found this thread from April..
    >
    http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%
    > 3CCABjXkq6b7SfAxie4+AqTCmD8jSqBZnsxSFw6V5o0WWWouOBbCw@mail.gmail.com%3E
    >
    ==================
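
    For reference, the pipelines-and-parameters design in SPARK-3530 later shipped as the spark.ml Pipeline API. A minimal pyspark sketch of that pattern; the toy data and column names below are made up.

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

# Toy labeled text data
train = spark.createDataFrame(
    [("spark is great", 1.0), ("hadoop map reduce", 0.0)],
    ["text", "label"])

# Chain feature extraction and a learner; fit() returns a PipelineModel
tokenizer = Tokenizer(inputCol="text", outputCol="words")
tf = HashingTF(inputCol="words", outputCol="features")
lr = LogisticRegression(maxIter=10)
model = Pipeline(stages=[tokenizer, tf, lr]).fit(train)

model.transform(train).select("text", "prediction").show()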

    Status of MLI?



    Wednesday, September 17, 2014

    ML SERVICE



    Ersatz 

    Ersatz provides a unified machine learning environment with support for deep learning, data wrangling, a variety of "model backends", model and data visualization, team collaboration, and GPU computing--all from a browser.
    http://www.ersatzlabs.com/

    SKYMIND

    DEEPLEARNING
    http://www.skymind.io/


    ALCHEMYAPI

    http://www.alchemyapi.com

    SEMANTRIA

    https://semantria.com/

    Nitay Joffe

    Founder and CTO at ActionIQ