Saturday, May 30, 2015

REST Platform Development for Java and C++ Backend



Web Site Development

Java 8, Tomcat 8, and RESTEasy on Mac OS X
IntelliJ
Remote deployment on Linux server from IntelliJ with Tomcat extras

REST with Java (JAX-RS) using Jersey - Tutorial
http://www.vogella.com/tutorials/REST/article.html
eclipse web development
http://www.vogella.com/tutorials/EclipseWTP/article.html
ubuntu Tomcat installation and configuration
http://www.vogella.com/tutorials/ApacheTomcat/article.html

angularJS calling jersey servlet
Jersey Core Server » 2.17 installation
http://mvnrepository.com/artifact/org.glassfish.jersey.core/jersey-server/2.17
Jersey Servlet Container
https://jersey.java.net/apidocs/latest/jersey/org/glassfish/jersey/servlet/ServletContainer.html

Jersey 2.17 User Guide
https://jersey.java.net/documentation/latest/index.html

Client APIs
https://jersey.java.net/documentation/latest/client.html

AngularJS Example Using a Java RESTful Web Service
http://draptik.github.io/blog/2013/07/13/angularjs-example-using-a-java-restful-web-service/

Example demonstrating the JAX-RS 2.0 asynchronous API on both the client and server side
https://github.com/jersey/jersey/tree/master/examples/server-async-standalone

Eclipse Luna works with Tomcat 8
http://stackoverflow.com/questions/17868232/how-to-use-tomcat-8-in-eclipse
download eclipse
https://www.eclipse.org/downloads/?osType=macosx
extensions, including maven plugin
https://www.eclipse.org/users/
installation on mac osx
http://www.cs.dartmouth.edu/~cs5/install/eclipse-osx/

Real-Time Messaging
metamind.io uses python version of socket.io (originated from Node.js)
Atmosphere 2.0 is natively supported by Jersey server
Cross-Framework and Cross-Browser Asynchronous Framework for Real Time Applications on the JVM
http://async-io.org/release.html
Examples of Jersey with Atmosphere
https://github.com/Atmosphere/atmosphere/wiki/Getting-Started-with-the-samples
contains:
runtime: server side components for Servlet based framework.
javascript: client side library for Atmosphere
wAsync: Java client side library for Atmosphere

Links for java version of socket.io
Getting Started with Socket.IO
This blog introduces Socket.IO for the JVM, or how to write Socket.IO applications running on the JVM, using the Atmosphere Framework!
https://github.com/Atmosphere/atmosphere/wiki/Getting-Started-with-Socket.IO

JSONP for cross-domain calls
http://stackoverflow.com/questions/2067472/what-is-jsonp-all-about
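
JSONP works by having the server wrap its JSON payload in a call to a function whose name the client supplies, so the response can be loaded through a script tag from another domain. The server side of that idea might look like the minimal sketch below. The class name Jsonp, the method wrap, and the callback-name pattern are assumptions made for illustration and are not taken from the linked answer.

```java
import java.util.regex.Pattern;

final class Jsonp {
    // Assumed whitelist: only identifier-like callback names are accepted,
    // so a caller cannot inject arbitrary script through the callback parameter.
    private static final Pattern SAFE_CALLBACK =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$.]*");

    /** Wraps an already-serialized JSON string in a call to the given callback. */
    static String wrap(String callback, String json) {
        if (callback == null || !SAFE_CALLBACK.matcher(callback).matches()) {
            throw new IllegalArgumentException("Unsafe JSONP callback name");
        }
        return callback + "(" + json + ");";
    }

    public static void main(String[] args) {
        // prints handleUser({"name":"Ada"});
        System.out.println(wrap("handleUser", "{\"name\":\"Ada\"}"));
    }
}
```

The response would normally be served with a JavaScript content type rather than a JSON one, since the browser executes it as a script.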

How to avoid OutOfMemoryError when uploading a large file using Jersey client
http://stackoverflow.com/questions/10326460/how-to-avoid-outofmemoryerror-when-uploading-a-large-file-using-jersey-client
In order for your code not to depend on the size of the uploaded file, you need to:
  1. Use streams
  2. Define the chunk size of the Jersey client. For example: client.setChunkedEncodingSize(1024);
Server:
    @POST
    @Path("/upload/{attachmentName}")
    @Consumes(MediaType.APPLICATION_OCTET_STREAM)
    public void uploadAttachment(@PathParam("attachmentName") String attachmentName, InputStream attachmentInputStream) {
        // do something with the input stream
    }
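Client (Jersey 1.x client API):
    ...
    // Send the request body in 1024-byte chunks instead of buffering the whole file
    client.setChunkedEncodingSize(1024);
    WebResource rootResource = client.resource("your-server-base-url");
    File file = new File("your-file-path");
    InputStream fileInStream = new FileInputStream(file);
    String contentDisposition = "attachment; filename=\"" + file.getName() + "\"";
    // The stream is passed as the request entity, so it is read while it is being sent
    ClientResponse response = rootResource.path("attachment").path("upload").path("your-file-name")
            .type(MediaType.APPLICATION_OCTET_STREAM).header("Content-Disposition", contentDisposition)
            .post(ClientResponse.class, fileInStream);
    fileInStream.close(); // release the file handle once the request has completed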
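
The "use streams" advice above could be applied on the receiving side roughly as follows. This is a minimal sketch that uses only the JDK, not Jersey; the class name StreamCopy, the method copyInChunks, and the 1024-byte buffer are assumptions chosen for illustration. It copies an input stream (such as the attachmentInputStream parameter of the server method) to an output stream one chunk at a time, so memory use does not grow with the size of the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    /** Copies all bytes from in to out through a fixed-size buffer; returns the byte count. */
    static long copyInChunks(InputStream in, OutputStream out, int chunkSize) throws IOException {
        byte[] buffer = new byte[chunkSize];   // chunkSize is assumed to be positive
        long total = 0;
        int read;
        // read() returns -1 at end of stream; only 'read' bytes of the buffer are valid
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the upload stream and the target file
        byte[] payload = new byte[5000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyInChunks(new ByteArrayInputStream(payload), sink, 1024);
        System.out.println(copied); // prints 5000
    }
}
```

In a real handler the sink would typically be a FileOutputStream opened in a try-with-resources block.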

JETTY

This tutorial will walk you through how to download, install and run Jetty – a 100 % Java HTTP and Servlet Container.
https://dzone.com/articles/installing-and-running-jetty

RESTful web services with jetty and jersey
http://jlunaquiroga.blogspot.com/2014/01/restful-web-services-with-jetty-and.html

Friday, May 29, 2015

AngularJS and HTML5

AngularJS Tutorial

http://www.w3schools.com/angular/
How to call a Java controller class from angularjs
http://stackoverflow.com/questions/27921957/how-to-call-a-java-controller-class-from-angularjs

GUI Libraries for Angular Javascript Framework
Angular JavaScript Framework: Interacting with Java Servlet Backend
http://www.doublecloud.org/2013/09/angular-javascript-framework-interacting-with-java-servlet-backend/

Allow streaming upload as opposed to multipart upload
https://github.com/nervgh/angular-file-upload/pull/38

Angular File Upload is a module for the AngularJS framework. Supports drag-n-drop upload, upload progress, validation filters and a file upload queue. It supports native HTML5 uploads, but degrades to a legacy iframe upload method for older browsers. Works with any server side platform which supports standard HTML form uploads.

When files are selected or dropped into the component, one or more filters are applied. Files which pass all filters are added to the queue. When a file is added to the queue, an instance of {FileItem} is created for it and the uploader options are copied into this object. After that, the items in the queue (FileItems) are ready for uploading.

You can find this module in Bower as angular-file-upload.

https://github.com/nervgh/angular-file-upload

File Uploading and Streaming with BinaryJS
http://www.olindata.com/blog/2014/01/file-uploading-and-streaming-binaryjs

angular-file-upload
Lightweight Angular JS directive to upload files using input type file or drag&drop with ajax call.
  • file upload for both html5 and non-html5 browsers with $http()
  • upload progress
  • file drag and drop.
  • abort/cancel upload
For non HTML5 browsers it uses FileAPI polyfill.
Please visit the project on GitHub for guides and more information


java - execute a command in linux
http://stackoverflow.com/questions/4264857/java-execute-a-command-in-linux
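
One way to run a Linux command from Java is the JDK's ProcessBuilder. The sketch below is only an illustration, not the code from the linked answer; the class name RunCommand, the helper run, and the choice to merge stderr into stdout are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RunCommand {
    /** Runs a command and returns its output lines (stdout and stderr merged). */
    static List<String> run(String... command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(Arrays.asList(command));
        builder.redirectErrorStream(true);   // read a single stream so neither pipe can fill up
        Process process = builder.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exitCode = process.waitFor();    // output is drained first, then wait for exit
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode + ": " + lines);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("echo", "hello")); // prints [hello]
    }
}
```

Passing the command as separate arguments, rather than one shell string, avoids shell quoting problems; shell features such as pipes would need an explicit "sh", "-c", "..." invocation.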
Mockup Frameworks:
Balsamiq Mockups 3
http://www.filehorse.com/download-balsamiq-mockups/
Balsamiq Mockups is a rapid wireframing tool that helps you Work Faster & Smarter. It reproduces the experience of sketching on a whiteboard, but using a computer. Making mockups is fast. You'll generate more ideas, so you can throw out the bad ones and discover the best solutions.

Using Mockups feels like drawing, but because it's digital, you can tweak and rearrange easily. Teams can come up with a design and iterate over it in real-time in the course of a meeting. Sketchy, low-fidelity wireframes let you focus design conversations on functionality. Linking lets you generate click-through prototypes for demos & usability testing. Seamless integration with all versions of Mockups, for when you're back online. Interfaces with drag and drop components: anyone can use it. Create templates, masters, and re-usable component libraries. Mockups is designed to help you and your team or clients iterate on wireframes as early in the process as possible, when it's cheapest to do so.

Balsamiq Mockups Features:

Sketchy Wireframes
Low-fidelity wireframes let you focus discussion on functionality.

Drag & Drop Simplicity
Create user interfaces with drag and drop components—anyone can use it.

Quick Add for Speed
Lets you build wireframes using your keyboard.

Re-usable Symbols
Create templates, masters, and re-usable component libraries.

UI Components & Icons
Lots of built-in user interface controls and icons, plus many community-generated symbols.

Click-Through Prototypes
Linking lets you generate click-through prototypes for demos & usability testing.

Export to PNG or PDF
Share or present mockups with embedded links using PDF export, or use a 3rd party tool to export to code.

Keyboard Shortcuts
Go really, really fast.

And many more.

Note: The trial is fully functional for 30 days. Balsamiq Mockups for Desktop requires Adobe AIR to run.

Tuesday, May 19, 2015

NLI - Natural Language Inference


Can recursive neural tensor networks learn logical reasoning?

Samuel R. Bowman
NLP Group, Dept. of Linguistics, Stanford University
Stanford, CA 94305-2150
sbowman@stanford.edu

Abstract
Recursive neural network models and their accompanying vector representations for words have seen success in an array of increasingly semantically sophisticated tasks, but almost nothing is known about their ability to accurately capture the aspects of linguistic meaning that are necessary for interpretation or reasoning. To evaluate this, I train a recursive model on a new corpus of constructed examples of logical reasoning in short sentences, like the inference of "some animal walks" from "some dog walks" or "some cat walks", given that dogs and cats are animals. This model learns representations that generalize well to new types of reasoning pattern in all but a few cases, a result which is promising for the ability of learned representation models to capture logical reasoning.


http://web.stanford.edu/~sbowman/arxiv_submission.pdf

depends on coreference resolution
http://nlp.stanford.edu/projects/coref.shtml

Natural Language Processing for the rest of us.

Opinions, Entities and Sentiments in 6 languages 

Sunday, May 17, 2015

NLP STARTUPS AND RESEARCH DIRECTIONS

TRANSFER LEARNING
10 Exciting Ideas of 2018 in NLP
https://t.co/AtEm5CxzVd
Sebastian Ruder
I'm a PhD student in Natural Language Processing and a research scientist at AYLIEN. I blog about Machine Learning, Deep Learning, and NLP.

#NLP 2018 Unsupervised #MT Pretrained LM Common sense inference #datasets Meta-learning Robust #unsupervised methods Understanding reps Clever auxiliary tasks Combining semi-supervised learning w/transfer learning QA & reasoning w/large docs Inductive bias

NLP Startups Analyzing Twitter and Other Social Media Sources in Real Time

QUID

AUGMENTED INTELLIGENCE

https://www.crunchbase.com/person/bob-goodson#/entity
Sean Gourley
http://seangourley.com/about/
https://www.crunchbase.com/person/sean-gourley#/entity

HUMAN + MACHINE = AUGMENTED HUMAN INTELLIGENCE

Big Data and the Rise of Augmented Intelligence: Sean Gourley at TEDxAuckland
https://www.youtube.com/watch?v=mKZCa_ejbfg&feature=youtu.be


Published on Dec 5, 2012

Dr. Sean Gourley is the founder and CTO of Quid. He is a Physicist by training and has studied the mathematical patterns of war and terrorism. This research has taken him all over the world from the Pentagon, to the United Nations and Iraq. Previously, Sean worked at NASA on self-repairing nano-circuits and is a two-time New Zealand track and field champion. Sean is now based in San Francisco where he is building tools to augment human intelligence.

TEDxNewWallStreet - Sean Gourley - High frequency trading and the new algorithmic ecosystem
https://www.youtube.com/watch?v=V43a-KxLFcg
Published on Apr 12, 2012
Speaker Bio:
Dr. Sean Gourley is the founder and CTO of Quid. He is a Physicist by training and has studied the mathematical patterns of war and terrorism. He is building tools to augment human intelligence.

Technologies:

webGL
https://get.webgl.org/
python
spark



*********************************************************************************
The Startup That Helps You Analyze Twitter Chatter in Real Time
http://www.wired.com/2015/02/luminoso/
Luminoso
Compass works with Twitter out of the box, but it also comes with an API, or application programming interface, that lets you plug it into other online forums. And according to Havasi, it can train itself to search for relevant information.

With the tool, the company aims to compete with a long list of other text analytics companies, from the Chicago-based Network Insights to Lexalytics and Clarabridge.

Right now, if businesses want to track a certain topic, an actual person must manually enter keywords they want to look for, while Compass can generate relevant keywords on the fly.

Meltwater is a Business Intelligence company of 1,000+ individuals spread across ~60 offices in ~30 countries, with over 26,000 clients. At Meltwater we see ourselves as an Outside Insights company, meaning we seek to deliver a similar type of business analytics and insights as traditional CRM dashboards and ERP systems used to, except that by leveraging data outside the firewall (social media, news, blogs, etc.) we believe the insights can be much more decisive and predictive for our clients' business. Part of the challenge with this is of course structuring the unstructured data out there. This is why the Data Science team at Meltwater has the mission to ingest, categorize, label, classify, and apply a whole range of other enrichments to the content that we crawl, in order to index it properly in our big data architecture and make it available for our insights dashboard. We do these enrichments in 15+ languages.

The second talk will be by Gregor Stewart of Basis Tech. It will be an example of Basis adaptive tech -- Gregor will complement Babak!

Babak Rasolzadeh is the Director of Data Science at Meltwater and has a team of 24 engineers on this. Prior to Meltwater, Babak was the co-founder of OculusAI, a computer vision start-up in Sweden that was sold to Meltwater in 2013. He holds a PhD in Computer Vision from KTH in Sweden, and has worked on things ranging from self-driving cars to humanoid robots and mobile object recognition. He is an advisor for several startups in the US and Sweden.

Gregor Stewart is the VP of Product Management for Basis Technology, a multilingual text analytics company based in Cambridge, MA. Among other things, it delivers adaptable entity extraction and resolution components in Java, for 17 languages. Currently, Gregor has the Basis teams hard at work readying a web API offering. Previously, Mr Stewart was CTO of a storage services company, and a strategy consultant. He has degrees from the University of Oxford and the London School of Economics, as well as a Masters in NLP from Edinburgh University.

RESEARCH DIRECTIONS

HERE'S WHAT WE CAN EXPECT FROM DEEP LEARNING IN 2016 AND BEYOND
By Sophie Curtis on December 29, 2015
https://re-work.co/blog/deep-learning-experts-discuss-the-next-5-years

NLP for Assessing Credibility of Scientific Papers

Assessing Credibility of Weblogs
Victoria L. Rubin and Elizabeth D. Liddy*
School of Information Studies, *Center for Natural Language Processing
Syracuse University, Syracuse, NY 13244-1190, USA
{vlrubin, liddy}@syr.edu
http://aaaipress.org/Papers/Symposia/Spring/2006/SS-06-03/SS06-03-038.pdf

excerpts:

The study will elicit and test credibility assessment factors (Phase I), perform NLP-based blog profiling (Phase II), and content-analyze blog-readers’ comments for partial profile matching (Phase III).

Credibility is viewed as a perceived quality that is evaluated simultaneously with at least two major components: trustworthiness and expertise.

In this study we will explore how these distinctive features of blogs can be used beneficially for NLP and Machine Learning analysis to allow for automation of blog credibility assessment. Thus, the objectives of this study are: 1) to compile a list of factors that users take into account in credibility assessment of weblog sites; 2) to order these factors in terms of their perceived importance to users, and; 3) to suggest which factors can be accessed and computed with NLP-techniques.

Once the factors that contribute to blog credibility are completed and tested, we can focus specific computational efforts on scanning large amounts of information for blogger profiling and automating credibility assessment.

Weblogs: Credibility and Collaboration in an Online World
http://people.ischool.berkeley.edu/~vanhouse/Van%20House%20trust%20workshop.pdf

Journalist versus news consumer: The perceived credibility of machine written news
http://compute-cuj.org/cj-2014/cj2014_session4_paper2.pdf

Credibility assessment and inference for fusion of hard and soft information
http://hrilab.tufts.edu/publications/premaratneetal12ahfe.pdf

Assessing Credibility with Natural language processing
https://books.google.com/books?id=BVlDAAAAQBAJ&pg=PA331&lpg=PA331&dq=Assessing+Credibility+with+Natural+language+processing



Attrasoft Launches New Automatic Image Tagging Service

http://atdc.org/news-from-our-companies/attrasoft-launches-new-automatic-image-tagging-service/

Google NLP research
http://research.google.com/pubs/NaturalLanguageProcessing.html

ICLR 2016 Best Papers Awards
http://www.iclr.cc/doku.php?id=iclr2016%3Amain#best_paper_awards

Neural Programmer-Interpreters

http://arxiv.org/abs/1511.06279
Scott Reed, Nando de Freitas
(Submitted on 19 Nov 2015 (v1), last revised 29 Feb 2016 (this version, v4))

We propose the neural programmer-interpreter (NPI): a recurrent and compositional neural network that learns to represent and execute programs. NPI has three learnable components: a task-agnostic recurrent core, a persistent key-value program memory, and domain-specific encoders that enable a single NPI to operate in multiple perceptually diverse environments with distinct affordances. By learning to compose lower-level programs to express higher-level programs, NPI reduces sample complexity and increases generalization ability compared to sequence-to-sequence LSTMs. The program memory allows efficient learning of additional tasks by building on existing programs. NPI can also harness the environment (e.g. a scratch pad with read-write pointers) to cache intermediate results of computation, lessening the long-term memory burden on recurrent hidden units. In this work we train the NPI with fully-supervised execution traces; each program has example sequences of calls to the immediate subprograms conditioned on the input. Rather than training on a huge number of relatively weak labels, NPI learns from a small number of rich examples. We demonstrate the capability of our model to learn several types of compositional programs: addition, sorting, and canonicalizing 3D models. Furthermore, a single NPI learns to execute these programs and all 21 associated subprograms.

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

https://arxiv.org/abs/1703.03864
Tim Salimans, Jonathan Ho, Xi Chen, Ilya Sutskever
(Submitted on 10 Mar 2017)
We explore the use of Evolution Strategies, a class of black box optimization algorithms, as an alternative to popular RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: By using hundreds to thousands of parallel workers, ES can solve 3D humanoid walking in 10 minutes and obtain competitive results on most Atari games after one hour of training time. In addition, we highlight several advantages of ES as a black box optimization technique: it is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.
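
The black box idea described in this abstract can be illustrated with a toy problem. The sketch below is not the paper's code: the quadratic reward, the class name EvolutionStrategySketch, the mirrored (plus and minus) perturbation pairs, and all hyperparameter values are assumptions chosen to keep the example small. It perturbs a parameter vector with Gaussian noise, scores each perturbation using only the reward value, and moves the parameters along the reward-weighted average of the noise.

```java
import java.util.Random;

final class EvolutionStrategySketch {
    /** Toy reward: negative squared distance to a fixed target (the maximum is 0). */
    static double reward(double[] theta, double[] target) {
        double sum = 0.0;
        for (int i = 0; i < theta.length; i++) {
            double d = theta[i] - target[i];
            sum += d * d;
        }
        return -sum;
    }

    /** Runs a basic evolution strategy with mirrored noise pairs, starting from the origin. */
    static double[] optimize(double[] target, int iterations, int pairs,
                             double sigma, double alpha, long seed) {
        Random rng = new Random(seed);
        int dim = target.length;
        double[] theta = new double[dim];
        double[] plus = new double[dim];
        double[] minus = new double[dim];
        for (int it = 0; it < iterations; it++) {
            double[] grad = new double[dim];
            for (int p = 0; p < pairs; p++) {
                double[] eps = new double[dim];
                for (int i = 0; i < dim; i++) {
                    eps[i] = rng.nextGaussian();
                    plus[i] = theta[i] + sigma * eps[i];
                    minus[i] = theta[i] - sigma * eps[i];
                }
                // Only reward values are used; no derivative of the reward is needed
                double diff = reward(plus, target) - reward(minus, target);
                for (int i = 0; i < dim; i++) {
                    grad[i] += diff * eps[i];
                }
            }
            // Average over the pairs to estimate the gradient, then take an ascent step
            for (int i = 0; i < dim; i++) {
                theta[i] += alpha * grad[i] / (2.0 * pairs * sigma);
            }
        }
        return theta;
    }

    public static void main(String[] args) {
        double[] target = {1.0, -2.0, 3.0};
        double[] theta = optimize(target, 200, 50, 0.1, 0.1, 42L);
        System.out.println("final reward: " + reward(theta, target));
    }
}
```

Each perturbation can be evaluated independently of the others, which is the property that lets this family of methods spread work across many parallel workers.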

Monday, May 11, 2015

NLP API Comparison



HOW 5 NATURAL LANGUAGE PROCESSING APIS STACK UP

Patricio Robles, Contributing Writer, Jul. 28 2014, 03:20PM EDT
http://www.programmableweb.com/news/how-5-natural-language-processing-apis-stack/analysis/2014/07/28


LICENSING





Developers Prefer GPL, Enterprises Prefer Apache
Posted May 17, 2011 by Scott Merrill (@smerrill)
http://techcrunch.com/2011/05/17/developers-prefer-gpl-enterprises-prefer-apache/

Unirest MIT License
http://unirest.io/java.html
https://github.com/Mashape/unirest-java/blob/master/LICENSE

SCALA



Remotely

http://oncue.github.io/remotely/

Introduction

Remotely is an elegant, reasonable, purely functional remoting system. Remotely is fast, lightweight and models network operations as explicit monadic computations. Remotely is ideally suited for:
  • Client/server programming
  • Service-to-service communication
  • Building network-facing APIs
NB: Remotely is currently an experimental project under active development. Feedback and contributions are welcomed as we look to improve the project.

Rationale

Before talking about how to use Remotely, it's worth discussing why we made this project in the first place. For large distributed service platforms there is typically a large degree of inter-service communication. In this scenario, several factors become really important:
  • Productivity. We want to spend less of our time writing marshalling/unmarshalling boilerplate. Although much progress has been made with things like JSON or XML over HTTP, in larger teams development time is typically spent on transforming to and from some internal representation of the wire content syntax tree (e.g. JValue from lift-json or similar). As this AST is typically some kind of semi-structured data, users have to waste time traversing all these different JSON structures (with either none or extremely minimal reuse) to extract the typed value they care about. Alternatively, some marshalling/unmarshalling libraries resort to runtime reflection or similar magic, which developers find confusing and frustrating. In Remotely we have tried to address this by providing a generic way to serialise any given type over the wire. This immediately removes the need for traversing various types of AST to extract fields, since the serialisation and deserialization code is highly composable and modular (thanks to scodec).
  • Safety. In Remotely, incompatibilities between client and server result in a compile-time rather than run-time failure. Something that is missing from HTTP+JSON services is the ability to know that clients remain compatible when moving between revisions of a service API. Typically this kind of meta-information ends up being encoded in a version number or some other out-of-band knowledge. This often results in runtime failures and incompatibility between services unless exceptional care is taken by the service owner not to break their API in any way between revisions. Using Remotely we build the protocols just like any other compile-time artifact. Those artefacts are then published as JARs and depended upon as build-time contracts by clients. It is then easy to build all dependent services as downstream jobs during the build phase, which gives engineers early warnings about compatibility issues with their service APIs.
  • Reuse. In most typed-protocol definitions there is a low degree of reuse because the serialisation code does not compose, and generally has no higher-order facilities. An example of this would be Thrift's protocol definitions: the definition contains all structures and values used by the client and server, and the entire world of needed files is generated at build time, even if that exact same structure (e.g. tuples) is used by an adjacent service. Within Remotely we wanted to avoid this nasty code-generation step and instead rely on highly composable structures with their associated combinators to get the granularity and level of reuse we wanted.
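
The idea of composable serialisers with combinators can be sketched in Java, the language used elsewhere in these notes. This is not Remotely's or scodec's API (both are Scala libraries); the names CodecSketch, Codec, pair, list, and roundTrip are hypothetical and exist only for this illustration. Small codecs for primitive values are combined into codecs for larger structures, so the same pair or list codec can be reused by any service without a code-generation step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class CodecSketch {
    /** A codec knows how to write a value to the wire and read it back. */
    interface Codec<A> {
        void encode(A value, DataOutputStream out) throws IOException;
        A decode(DataInputStream in) throws IOException;
    }

    static final Codec<Integer> INT = new Codec<Integer>() {
        public void encode(Integer v, DataOutputStream out) throws IOException { out.writeInt(v); }
        public Integer decode(DataInputStream in) throws IOException { return in.readInt(); }
    };

    static final Codec<String> STRING = new Codec<String>() {
        public void encode(String v, DataOutputStream out) throws IOException { out.writeUTF(v); }
        public String decode(DataInputStream in) throws IOException { return in.readUTF(); }
    };

    /** Combinator: builds a codec for a pair from codecs for its two parts. */
    static <A, B> Codec<Map.Entry<A, B>> pair(final Codec<A> first, final Codec<B> second) {
        return new Codec<Map.Entry<A, B>>() {
            public void encode(Map.Entry<A, B> v, DataOutputStream out) throws IOException {
                first.encode(v.getKey(), out);
                second.encode(v.getValue(), out);
            }
            public Map.Entry<A, B> decode(DataInputStream in) throws IOException {
                A a = first.decode(in);
                B b = second.decode(in);
                return new SimpleImmutableEntry<>(a, b);
            }
        };
    }

    /** Combinator: builds a length-prefixed list codec from an element codec. */
    static <A> Codec<List<A>> list(final Codec<A> element) {
        return new Codec<List<A>>() {
            public void encode(List<A> v, DataOutputStream out) throws IOException {
                out.writeInt(v.size());
                for (A a : v) element.encode(a, out);
            }
            public List<A> decode(DataInputStream in) throws IOException {
                int n = in.readInt();
                List<A> result = new ArrayList<>();
                for (int i = 0; i < n; i++) result.add(element.decode(in));
                return result;
            }
        };
    }

    /** Encodes a value to bytes and decodes it again. */
    static <A> A roundTrip(Codec<A> codec, A value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        codec.encode(value, new DataOutputStream(bytes));
        return codec.decode(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    }

    public static void main(String[] args) throws IOException {
        Codec<List<Map.Entry<String, Integer>>> codec = list(pair(STRING, INT));
        List<Map.Entry<String, Integer>> value = Arrays.<Map.Entry<String, Integer>>asList(
                new SimpleImmutableEntry<>("a", 1), new SimpleImmutableEntry<>("b", 2));
        System.out.println(roundTrip(codec, value)); // prints [a=1, b=2]
    }
}
```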