Saturday, August 2, 2014

ML at Oracle and Other Companies


ORACLE
https://labs.oracle.com/pls/apex/f?p=labs:49:::::P49_PROJECT_ID:7

https://oracle.taleo.net/careersection/2/jobdetail.ftl

put in a job cart:
1. Brook Lee Stevens
https://people.us.oracle.com/pls/oracle/f?p=8000:2:0::::PERSON_ID:75043250526760
Oracle Big Data Analytics Group
Job Description

Software Development Manager-14000T1K
10020.Software Development Manager.PRODEV.SWENG.M2
Big Data Discovery Group
Top 3 skill sets / technologies in the ideal candidate: 
1. Java
2. Experience working in an Agile environment
3. NLP exposure
location - Redwood Shores
------------------------------------------------------------------------
2. Abe Taha

Software Developer 4-14000EYS
10540.Software Developer 4.PRODEV.SWENG.IC4
Oracle Marketing Cloud
Cupertino
-------------------------------------------------------------------------------------------
3. Hanlin Chien

Software Developer 3-14000E4X
10530.Software Developer 3.PRODEV.SWENG.IC3
Cluster and Parallel Systems
Redwood City
--------------------------------------------------------
4. Ari W Mozes
Software Developer 4-140004UD
10540.Software Developer 4.PRODEV.SWENG.IC4
Oracle Data Mining
Burlington, MA
----------------------------------
5. Timothy McCandless
Software Developer 4-140004YY
10540.Software Developer 4.PRODEV.SWENG.IC4
Social Platform
 US-CO,Colorado-Longmont
------------------------
ENDECA

https://cloud.oracle.com/bigdatadiscovery
http://busgj05.us.oracle.com:7003/bdd/
my email/admin123
https://www.youtube.com/watch?v=C9wclRqTixo
http://www.oracle.com/technetwork/middleware/endecaserver/downloads/endeca-server-downloads-1721978.html
https://blogs.oracle.com/emeapartnerbiepm/entry/try_out_latest_version_of
For you to try out Endeca 3.1 as a user and self-service dashboard builder, there is a publicly accessible oracle-hosted demonstration URL available with Endeca 3.1, here at:
http://www.oracle.com/technetwork/middleware/endeca/documentation/index.html
http://www.oracle.com/us/solutions/ent-performance-bi/oeid-tech-overview-1674380.pdf

http://www.oracle.com/technetwork/middleware/endeca/overview/index.html

Oracle® Endeca Information Discovery Installation Guide Version 2.3.0 • February 2014 • Revision B
https://docs.oracle.com/cd/E29805_01/general.230/InstallGuide.pdf

Oracle Endeca Platform Services Installation Guide Version 6.1.2 • March 2012
https://docs.oracle.com/cd/E28911_01/PlatformServices.612/pdf/PlatServInstallGuide.pdf

Oracle Endeca Commerce Getting Started Guide Version 6.3.0 • July 2012
http://docs.oracle.com/cd/E35823_01/Common.630/pdf/GettingStarted.pdf

How to install Endeca?
http://stackoverflow.com/questions/13478544/how-to-install-endeca
-------------------------------------------
Amit Zavery - direct report Siddhartha Agarwal
https://people.oracle.com/apex/f?p=8000:PERSON:103822553816814::NO::PERSON:siddhartha.agarwal

Melissa Jacobus
Director Product Management Common Cloud PaaS - Product Mgmt Austin TX, US · 5:03 PM Wed

-----------------------
Product Management
Lauri Kopra Mashoian
Senior Director, Product Management, Fusion Development Management
US lauri.kopra@oracle.com +1 650-633-8812
Justin Knowles
VP, ACS Product Mgmt & Strategy ACS - Global Business Operations Belmont CA, US · 5:18 PM Wed
justin.knowles@oracle.com
+1 650-506-5848
+1 650-215-0891
Adina Simu
Vice President of Product Management Identity Cloud - CASB (Palerra) San Jose CA, US

Phone Number and Email

adina.simu@oracle.com

+1 408-480-0189
--------------------------------------------------------------------------------------------------------

INTERVIEW

how to interview data scientist
linkedin

Strata 2013 - How to Interview a Data Scientist

Daniel Tunkelang

title: System Architect, team lead
osb documentation
JMS Transport
http://docs.oracle.com/middleware/1213/osb/develop/osb-transport-jms.htm#OSBDV89415
JMS 2.0
http://www.oracle.com/technetwork/articles/java/jms20-1947669.html

answer Arun's questions

random stream sampling by 10 elements

https://gregable.com/2007/10/reservoir-sampling.html

https://en.wikipedia.org/wiki/Reservoir_sampling

Optimal Random Sampling from Distributed Streams Revisited
http://www.cs.cmu.edu/afs/cs/user/dwoodruf/www/tw11.pdf

https://www.geeksforgeeks.org/reservoir-sampling/
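
A minimal sketch of reservoir sampling for the "sample 10 elements from a stream" question, assuming the stream is exposed as an Iterator and its length is unknown in advance. The class and method names (ReservoirSampler, sample) are made up for illustration and do not come from the linked articles.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

class ReservoirSampler {
    // Returns up to k items chosen uniformly at random from a stream of unknown length.
    static <T> List<T> sample(Iterator<T> stream, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k items.
                reservoir.add(item);
            } else {
                // Pick j uniformly in [0, seen); the new item survives with probability k / seen.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }
}
```

Calling sample(stream, 10, new Random()) keeps memory at 10 elements no matter how long the stream runs.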


search sorted array in log n time

public static <T> int binarySearch(List<? extends Comparable<? super T>> list,
                   T key)
public static <T> int binarySearch(List<? extends T> list,
                   T key,
                   Comparator<? super T> c)
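
As a sketch of what these library methods do, here is a hand-written iterative binary search next to a call to Collections.binarySearch. The class name BinarySearchDemo and its methods are illustrative assumptions; the negative return value for a missing key follows the convention documented for Collections.binarySearch.

```java
import java.util.Collections;
import java.util.List;

class BinarySearchDemo {
    // Iterative binary search over a sorted int array, O(log n).
    // Returns the index of key, or -(insertionPoint) - 1 when key is absent.
    static int search(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else if (sorted[mid] > key) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }

    // Same lookup using the library method on a sorted list.
    static int searchList(List<Integer> sorted, int key) {
        return Collections.binarySearch(sorted, key);
    }
}
```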

Java 8 Lambda expression to write a Comparator to sort a List.
1. Classic Comparator example.
 Comparator<Developer> byName = new Comparator<Developer>() {
  @Override
  public int compare(Developer o1, Developer o2) {
   return o1.getName().compareTo(o2.getName());
  }
 };

2. Lambda expression equivalent.
 Comparator<Developer> byName = 
  (Developer o1, Developer o2)->o1.getName().compareTo(o2.getName());
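
A short sketch of how such a comparator might be used to sort a list. The Developer class here is a hypothetical stand-in (only getName appears in the notes above; the age field is an assumption added to show a chained comparator).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical Developer class; the age field is an assumption for illustration.
class Developer {
    private final String name;
    private final int age;

    Developer(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() { return name; }
    int getAge() { return age; }
}

class SortDevelopers {
    // Sorts a copy of the list with the lambda comparator shown above.
    static List<String> namesByName(List<Developer> devs) {
        List<Developer> copy = new ArrayList<>(devs);
        copy.sort((o1, o2) -> o1.getName().compareTo(o2.getName()));
        return names(copy);
    }

    // Same idea with key extractors: age first, then name as a tie-breaker.
    static List<String> namesByAgeThenName(List<Developer> devs) {
        List<Developer> copy = new ArrayList<>(devs);
        copy.sort(Comparator.comparingInt(Developer::getAge).thenComparing(Developer::getName));
        return names(copy);
    }

    private static List<String> names(List<Developer> devs) {
        List<String> result = new ArrayList<>();
        for (Developer d : devs) {
            result.add(d.getName());
        }
        return result;
    }
}
```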


YouTube - simple terms

OSB Design and Architecture
https://www.youtube.com/watch?v=nN4_X5ZuP9Q
https://www.youtube.com/watch?v=4SLunYTIgak
https://www.youtube.com/watch?v=qfp1AwFWTH4

oracle service bus design in business terms

what is soa

https://www.youtube.com/watch?v=A3_QlYJRVvk
soa fundamentals
https://www.youtube.com/watch?v=uW8dnVuMZRM

Enterprise architecture
The 10 Minute Guide to Enterprise Architecture
https://www.youtube.com/watch?v=y2vEpglX1eQ

Oracle Learning Library
Robert Wunderlich

Oracle SOA Suite 12c: REST Enabling SOA

https://www.youtube.com/watch?v=1KXKppaOgtY

Webinar : Practical SOA for the Solution Architect

https://www.youtube.com/watch?v=1KXKppaOgtY

John Brunswick

What is Middleware? Service Oriented Architecture Explained

https://www.youtube.com/watch?v=7s_S5Hkm7z0
What is Middleware? Identity and Access Management
https://www.youtube.com/watch?v=7wAF2m88a8s
tarik elmoudden

How to mediate EJB invocation with the help of the Oracle Service Bus
https://www.youtube.com/watch?v=kAZyO_PcQ4c

REST Architecture from google
https://www.youtube.com/watch?v=YCcAE2SCQ6k

Management
http://www.media.mit.edu/research/groups/human-dynamics

http://docs.oracle.com/javase/8/docs/
http://docs.oracle.com/javase/8/docs/technotes/guides/concurrency/index.html
http://docs.oracle.com/javase/8/docs/technotes/guides/collections/

synchronization and concurrency

http://cs.nyu.edu/~lerner/spring12/Preso03-JavaPrimitives.pdf
http://www.javaworld.com/article/2078809/java-concurrency/java-101-the-next-generation-java-concurrency-without-the-pain-part-1.html

Dr. Venkat Subramaniam has released an online course on "Concurrency without Pain". More details @ https://parleys.com/course/54b910e6e4b01b2fa950e332/info

The Concurrency Specialist Course
Dr Heinz Kabutz
https://www.jfokus.se/jfokus13/trainingconcurrency-en.html
Brian Goetz, Java Concurrency in Practice

Questions

Can describe the Java memory model, synchronization and concurrency primitives, and GC behavior under normal and pathological conditions.
http://www.scalingdata.com/careers/engineering-application-engineer
http://www.geekinterview.com/question_details/71549

http://www.codeodor.com/index.cfm/2009/1/29/Answering-the-100-Interview-Questions-for-Software-Developers-Functional-Design/2705

http://noop.nl/2009/01/100-interview-questions-for-software-developers.html

http://codebetter.com/raymondlewallen/2005/07/19/4-major-principles-of-object-oriented-programming/

http://www.smashingmagazine.com/2008/08/05/7-essential-guidelines-for-functional-design/

http://programmers.stackexchange.com/questions/184047/how-do-you-handle-multiple-users-editing-the-same-piece-of-data-in-a-webapp

http://en.wikipedia.org/wiki/Optimistic_concurrency_control

12 principles of extreme programming
http://martinfowler.com/tags/extreme%20programming.html

SOLID principles of object oriented design ***

http://butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod
https://stackify.com/dependency-inversion-principle/

The SOLID design principles were promoted by Robert C. Martin and are some of the best-known design principles in object-oriented software development. SOLID is a mnemonic acronym for the following five principles:
Single Responsibility Principle
Open/Closed Principle
Liskov Substitution Principle
Interface Segregation Principle
Dependency Inversion Principle
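
One of the five, the Dependency Inversion Principle, can be sketched roughly as follows. All names (MessageSender, InMemorySender, Notifier) are hypothetical and chosen only for illustration: the high-level class depends on an interface, and the concrete implementation is passed in from outside.

```java
import java.util.ArrayList;
import java.util.List;

// Abstraction that the high-level code depends on.
interface MessageSender {
    void send(String message);
}

// Low-level detail: one possible implementation, easy to swap out.
class InMemorySender implements MessageSender {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String message) {
        sent.add(message);
    }
}

// High-level policy: knows only the MessageSender interface.
class Notifier {
    private final MessageSender sender;

    Notifier(MessageSender sender) {
        this.sender = sender;
    }

    void notifyUser(String user) {
        sender.send("Hello, " + user);
    }
}
```

Because Notifier never names a concrete sender, a test can hand it an in-memory implementation while production code supplies a real one.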

Design Patterns
http://www.oodesign.com/

four principles of OOP

https://stackify.com/oop-concept-abstraction/
Abstraction:

Abstraction is one of the key concepts of object-oriented programming (OOP) languages. Its main goal is to handle complexity by hiding unnecessary details from the user. That enables the user to implement more complex logic on top of the provided abstraction without understanding or even thinking about all the hidden complexity.

Composition:

Composition is one of the fundamental concepts in object-oriented programming. It describes a class that references one or more objects of other classes in instance variables. This allows you to model a has-a association between objects.

Encapsulation:

Encapsulation means that the internal representation of an object is generally hidden from view outside of the object's definition. ...

Inheritance:

Inheritance is one of the core concepts of object-oriented programming (OOP) languages. It is a mechanism where you can derive a class from another class for a hierarchy of classes that share a set of attributes and methods.

You can use it to declare different kinds of exceptions, add custom logic to existing frameworks, and even map your domain model to a database.

Polymorphism:

The word polymorphism is used in various contexts and describes situations in which something occurs in several different forms. In computer science, it describes the concept that objects of different types can be accessed through the same interface. Each type can provide its own, independent implementation of this interface. It is one of the core concepts of object-oriented programming (OOP).
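
A small sketch that combines several of the concepts above, with hypothetical names (Shape, Circle, Rectangle, Drawing). Drawing holds shapes in a private field (composition and encapsulation), and totalArea calls area() through the Shape interface without knowing the concrete type (abstraction and polymorphism).

```java
import java.util.ArrayList;
import java.util.List;

// Abstraction: callers only see area(), not how it is computed.
interface Shape {
    double area();
}

class Circle implements Shape {
    private final double radius; // encapsulated state

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public double area() {
        return Math.PI * radius * radius;
    }
}

class Rectangle implements Shape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    @Override
    public double area() {
        return width * height;
    }
}

// Composition: a Drawing has shapes rather than being one.
class Drawing {
    private final List<Shape> shapes = new ArrayList<>();

    Drawing add(Shape shape) {
        shapes.add(shape);
        return this;
    }

    // Polymorphism: each element supplies its own area() implementation.
    double totalArea() {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }
}
```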

Java Concurrency

GOTO 2017 • An Overview of Java 9 • Angelika Langer

https://www.youtube.com/watch?v=vQDS4gYs8fA

GOTO 2014 • New Concurrency Utilities in Java 8 • Angelika Langer
https://www.youtube.com/watch?v=Q_0_1mKTlnY

Asynchronous programming in Java 8: how to use CompletableFuture by José Paumard
https://www.youtube.com/watch?v=HdnHmbFg_hw

The ThreadLocal class in Java enables you to create variables that can only be read and written by the same thread. Thus, even if two threads are executing the same code, and the code has a reference to a ThreadLocal variable, the two threads cannot see each other's ThreadLocal variables.
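
A minimal sketch of that isolation, using hypothetical names (ThreadLocalDemo, COUNTER). Two threads run the same increment code against the same static ThreadLocal, and each ends up with its own count.

```java
class ThreadLocalDemo {
    // Each thread gets its own counter, starting at 0.
    private static final ThreadLocal<Integer> COUNTER = ThreadLocal.withInitial(() -> 0);

    // Increments the calling thread's counter n times and returns its value.
    static int incrementTimes(int n) {
        for (int i = 0; i < n; i++) {
            COUNTER.set(COUNTER.get() + 1);
        }
        return COUNTER.get();
    }

    // Runs the same code in two threads; neither sees the other's counter.
    static int[] runInTwoThreads(int a, int b) {
        int[] results = new int[2];
        Thread t1 = new Thread(() -> results[0] = incrementTimes(a));
        Thread t2 = new Thread(() -> results[1] = incrementTimes(b));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```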

Finally Getting the Most out of the Java Thread Pool
https://stackify.com/java-thread-pools/


While my perspective may be biased, my current intention for updating the book would be almost strictly additive, covering fork-join, parallel decomposition, and the new parallel bulk data operations coming in Java SE 8. - Brian Goetz

Divide and Conquer Parallelism
with the Fork/Join Framework
Mark Reinhold (@mreinhold)
Chief Architect, Java Platform Group
2011/7/7
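
A rough sketch of divide-and-conquer parallelism with the Fork/Join framework, summing an array. The class name SumTask and the threshold of 1,000 elements are arbitrary assumptions for illustration and are not taken from the talk.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this size, sum sequentially

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Divide: split the range in half.
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                       // run the left half asynchronously
        long rightSum = right.compute();   // compute the right half in this thread
        return left.join() + rightSum;     // conquer: combine the partial sums
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }
}
```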

Serialization
Variables may be marked transient to indicate that they are not part of the persistent state of an object.
For example, you may have fields that are derived from other fields, and should only be done so programmatically, rather than having the state be persisted via serialization.
https://stackoverflow.com/questions/910374/why-does-java-have-transient-fields
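
A small sketch of the derived-field case, with a hypothetical Box class: area is derived from width and height, so it is marked transient and recomputed in readObject, while the transient note field is simply dropped by serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Box implements Serializable {
    private static final long serialVersionUID = 1L;

    private final int width;
    private final int height;
    private transient int area;    // derived from width and height, not persisted
    private transient String note; // scratch data, not persisted

    Box(int width, int height, String note) {
        this.width = width;
        this.height = height;
        this.note = note;
        this.area = width * height;
    }

    int getArea() { return area; }
    String getNote() { return note; }

    // Called by the serialization machinery; rebuilds the derived field.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        this.area = width * height;
    }

    // Serializes to a byte array and reads the object back.
    static Box roundTrip(Box box) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(box);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Box) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```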

The Double Colon Operator in Java 8
http://www.baeldung.com/java-8-double-colon-operator

We’ve used the :: operator as shorthand for lambdas calling a specific method – by name
Comparator<Computer> c = Comparator.comparing(Computer::getAge);

Very simply put, when we are using a method reference – the target reference is placed before the delimiter :: and the name of the method is provided after it.
For example:
Computer::getAge;
We’re looking at a method reference to the method getAge defined in the Computer class.
We can then operate with that function:
Function<Computer, Integer> getAge = Computer::getAge;
Integer computerAge = getAge.apply(c1);
Notice that we’re referencing the function – and then applying it to the right kind of argument.


LAMBDA
Java 8 Lambda : Comparator example
Data Structures and Algorithms

Algorithms and Data Structures
https://introcs.cs.princeton.edu/java/40algorithms/

Sort elements by value
https://stackoverflow.com/questions/30477275/data-structure-to-sort-elements-by-values

Reactive Programming is a style of micro-architecture involving intelligent routing and consumption of events, all combining to change behaviour.
https://spring.io/blog/2016/06/07/notes-on-reactive-programming-part-i-the-reactive-landscape

CS Dojo
Data Structures & Algorithms #1 - What Are Data Structures?
https://www.youtube.com/watch?v=bum_19loj9A

brilliant.org
https://brilliant.org/courses/?tour=true#featured


Java Programming - Data Structure and Algorithms in Java

https://www.youtube.com/watch?v=0XL1NBUv2NU

Memory Leak

https://stackoverflow.com/questions/29112272/java-memory-leak-from-thread-pool
The allocation hot spots view in JProfiler just tells you where objects that are still on the heap were allocated. It does not mean that those objects cannot be GCed.

For an analysis of what objects are actually strongly referenced, go to the "Heap walker" in JProfiler. Then select all ListAcceptor objects and go to the "Incoming references" view. With a single object, click on "Show path to GC root". Then you will see a chain of references that prevents the object from being GCed.

With the "Cumulated incoming reference" view you can check whether this is the case for all ListAcceptor objects.


Using the Connection Pool Manager

Java 8 Tutorials

http://www.oracle.com/technetwork/java/javase/8-whats-new-2157071.html
http://docs.oracle.com/javase/tutorial/index.html
http://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html
http://docs.oracle.com/javase/tutorial/java/IandI/defaultmethods.html
http://docs.oracle.com/javase/tutorial/java/javaOO/innerclasses.html
https://docs.oracle.com/javase/tutorial/java/javaOO/nested.html

https://user-assets-unbounce-com.s3.amazonaws.com/b93ede49-06e1-44dc-9caf-8ca7fe04896f/f8617cd0-4912-48f4-9827-6a160b406144/javacontentpack1.original.pdf
==============================================
***
The first five principles are principles of class design. They are:
SRP - The Single Responsibility Principle: A class should have one, and only one, reason to change.
OCP - The Open Closed Principle: You should be able to extend a class's behavior without modifying it.
LSP - The Liskov Substitution Principle: Derived classes must be substitutable for their base classes.
ISP - The Interface Segregation Principle: Make fine-grained interfaces that are client specific.
DIP - The Dependency Inversion Principle: Depend on abstractions, not on concretions.

The next six principles are about packages. In this context a package is a binary deliverable like a .jar file, or a dll as opposed to a namespace like a java package or a C++ namespace.

The first three package principles are about package cohesion, they tell us what to put inside packages:

REP - The Release Reuse Equivalency Principle: The granule of reuse is the granule of release.
CCP - The Common Closure Principle: Classes that change together are packaged together.
CRP - The Common Reuse Principle: Classes that are used together are packaged together.

The last three principles are about the couplings between packages, and talk about metrics that evaluate the package structure of a system.

ADP - The Acyclic Dependencies Principle: The dependency graph of packages must have no cycles.
SDP - The Stable Dependencies Principle: Depend in the direction of stability.
SAP - The Stable Abstractions Principle: Abstractness increases with stability.

The Java EE Tutorial Project is the official site for the Java Platform, Enterprise Edition (Java EE) 8 Tutorial that is delivered with the Java EE 8 SDK. The Java EE Tutorial teaches and demonstrates the Java EE features that are used to develop enterprise applications.

Spring Framework 5.0.4.RELEASE API
https://docs.spring.io/spring/docs/5.0.4.RELEASE/javadoc-api/

The Java EE 6 Tutorial

Chapter 32
Introduction to the Java Persistence API

https://docs.oracle.com/javaee/6/tutorial/doc/bnbpz.html

https://docs.oracle.com/javase/8/

Controlling Access to Members of a Class
https://docs.oracle.com/javase/tutorial/java/javaOO/accesscontrol.html

------------------------------------------------------------------------

OTHER COMPANIES

highest paying: NLP, BI

http://www.alchemistaccelerator.com/jobs/


NLP - Data Leaders
$186K | Vault Ranks a Top 10 place to work. - San Francisco, CA

In this role, the selected candidate must have an experience with text classification via logistic regression, decision trees, support vector machines and maximum entropy classifiers.

INDUSTRY
Consulting

EXPERIENCE
8–10 years

DATA SCIENCE
================

Cloudera


Cloudera Certified Professional: Data Scientist (CCP:DS)
http://www.cloudera.com/content/cloudera/en/training/certification/ccp-ds.html

NLP - Data Leaders

$186K | Vault Ranks a Top 10 place to work. - San Jose, CA


In this role, the selected candidate must have an experience with text classification via logistic regression, decision trees, support vector machines and maximum entropy classifiers.

Data Scientist

$150K to $200K | networking - San Jose, CA


Seeking a Data Scientist.

INDUSTRY


Business Intelligence

EXPERIENCE


11–15 years


http://www.interana.com/company/careers


http://www.argyledata.com/careers/research-engineer/

Research Engineer


The engineering team at Argyle is building the next generation of big data risk applications for massive telecom and financial services customers. We are delivering innovative solutions that include: big data management, distributed sql, machine learning, graph analysis and search.


The Data Scientist/Research Engineer will work with the engineering team to research, prototype, design, build, and maintain the next generation prediction, optimization, and analytics technologies.


We are looking for motivated self-starters and extremely fast learners with top notch problem solving and analytic skills. Must have a strong desire to dig deep into petabytes of data and love to play with the innards of complex algorithms, mathematical constructs and code.


Requirements:
A Master’s degree or PhD degree in a closely related field (eg: computer science/engineering, optimization) from a top notch university is strongly preferred although MS degree holders with demonstrably exceptional research abilities may be considered
Solid computing background with a thorough understanding of the fundamentals of computer science, data structures, and algorithms. Significant experience writing code is preferred. Experience writing highly scalable, low-latency, distributed systems is a plus
Experience in one or more of the following areas: machine learning or statistical modeling, optimization, algorithms, big-data
Excellent communication skills – ability to present and communicate complex ideas in simple terms
Strong statistical/math/analytical abilities; Exceptional problem solving skills

complex quantitative modeling problems and advancing the company’s core statistical inference and algorithmic technology for audience targeting 

http://www.quantcast.com/audience/quantcast-lookalikes

Quantcast’s Modeling Team works closely with our Scientific Advisory board, comprised of Stanford professors Jerome Friedman and Trevor Hastie. As leaders in the fields of data mining and statistics, both professors take an active interest in Quantcast and our ongoing work.

http://videolectures.net/kdd2010_feldman_qalbb/

Data Scientist
SAN FRANCISCO, CA, UNITED STATES
https://www.quantcast.com/engineering/quantcast-careers?jobid=o443YfwT


GENERAL

questions to ask when interviewing for engineering management
http://engineering.randstadusa.com/jobseekers/interviewandresumetips/questionstoaskduringaninterview.aspx

Facebook
https://www.facebook.com/careers/department?dept=data&req=a0IA000000CwmViMAJ

Data & Analytics

Manager, Analytics

Location: Menlo Park, CA
Facebook was built to help people connect and share, and over the last decade our tools have played a critical part in changing how people around the world communicate with one another. With over a billion people using the service and more than fifty offices around the globe, a career at Facebook offers countless ways to make an impact in a fast growing organization.
We’re looking for analytics engineering leaders to work on our Identity & Privacy and Growth core products with a passion for social media to help drive informed business decisions for Facebook. You will enjoy working with one of the richest data sets in the world, cutting edge technology, and the ability to see your insights turned into real products on a regular basis. The perfect candidate will have a background in computer science or a related technical field, will have experience working with large data stores, and will have some experience building software. You are scrappy, focused on results, a self-starter, and have demonstrated success in using analytics to drive the understanding, progression, and user engagement of a product. This position is located in our Menlo Park office.

Responsibilities

  • Apply your expertise in quantitative analysis, data mining, and the presentation of data to see beyond the numbers and understand how our users interact with our core products
  • Partner with Product and Engineering teams to solve problems and identify trends and opportunities
  • Inform, influence, support, and execute our product decisions
  • Build/maintain reports, dashboards, and metrics to monitor the performance of our products
  • Mine massive amounts of data and extract useful product insights
  • Manage development of data resources, gather requirements, organize sources, and support product launches

Requirements

  • 4+ years of experience doing quantitative analysis preferably for a social web company
  • BA/BS in Computer Science, Math, Physics, or other technical field. Advanced degrees preferred but not required
  • Experience in managing other team members in a formal or informal capacity
  • Fluency in SQL or similar languages and development experience in at least one scripting language (PHP, Python, Perl, etc.)
  • Experience with large data sets and distributed computing (Hive/Hadoop) a plus
  • Ability to initiate and drive projects to completion with minimal guidance
  • The ability to communicate the results of analyses in a clear and effective manner
  • Basic understanding of statistical analysis, experience with packages such as R, MATLAB, SPSS, SAS, Stata, etc. preferred

SAP

Senior Developer, Engineering Platform Team Job



JAVA INTERVIEW






  • Java Profilers. ...JProfiler
  • JProfiler 10.1
  • https://www.ej-technologies.com/download/jprofiler/files
  • java http load generator 
  • https://jmeter.apache.org/
  • Apache JMeter 4.0 (Requires Java 8 or 9.)
  • Tracing Java Web Requests and Transactions. ...
  • Java Application Performance Management (APM) ...
  • Real User Monitoring (RUM) ...
  • JVM Performance Metrics. ...
  • Web Server (Apache/Nginx) Access Logs. ...
  • Tracking All Java Exceptions. ...
  • Memory Analysis.


  • Qs

    1. Your web application is slow to respond or stopped responding - your actions?

    THE 10 MOST COMMON WEB APP PERFORMANCE PROBLEMS
    https://www.neotys.com/blog/10-most-common-web-app-performance-problems/

    1. Poorly written code can lead to a host of web application issues including inefficient algorithms, memory leaks and application deadlocks.

    2. Missing indexes slow down SQL queries, which can drag down an entire site. Be sure to use scripts and file statistics to check for any inefficient queries. Check response time for the frequently used queries.
    3. Developing a plan to manage and monitor data as it grows is indispensable to your web performance success. 
    4. Major traffic spikes
    5. Poor load distribution 
    6. default configurations
    7. DNS queries make up the majority of web traffic
    8. third-party services, you know that some slowdowns are out of your control
    9. Shared Resources and Virtual Machines
    10.  a failure in one location may affect other spots 
    11. test load performance at higher user levels beforehand


    Azure
    https://docs.microsoft.com/en-us/azure/app-service/app-service-web-troubleshoot-performance-degradation
    Symptom

    When you browse the web app, the pages load slowly and sometimes time out.
    Cause

    This problem is often caused by application level issues, such as:

    network requests taking a long time
    application code or database queries being inefficient
    application using high memory/CPU
    application crashing due to an exception

    Troubleshooting steps

    Troubleshooting can be divided into three distinct tasks, in sequential order:
    Observe and monitor application behavior
    Collect data
    Mitigate the issue

    App Service Web Apps gives you various options at each step.

    1. Observe and monitor application behavior
    Track Service health

    Microsoft Azure publicizes each time there is a service interruption or performance degradation. You can track the health of the service on the Azure portal. For more information, see Track service health.
    Monitor your web app

    This option enables you to find out if your application is having any issues. In your web app’s blade, click the Requests and errors tile. The Metric blade shows you all the metrics you can add.

    Some of the metrics that you might want to monitor for your web app are
    Average memory working set

    Average response time
    CPU time
    Memory working set
    Requests

    Five Root-Cause Reasons Your Applications Are Slow


    JProfiler, instrumenting code

    Debugging Server-side Code through IntelliJ IDEA with BEA Weblogic 8.1
    by Mark Spritzler
    In order to debug server-side code you must start your (app) server in debug mode, and you must have your IDE connect to the remote JVM through a Remote Server Debug configuration (That's what it is called in IDEA). Other IDE's should have something very similar to this that will allow it to "hook" into the remote JVM and find out the calls that are being made, and to stop the code when it hits a breakpoint that you have set in the IDE.

    What are the benefits of a stateless web application?
    https://stackoverflow.com/questions/5539823/what-are-the-benefits-of-a-stateless-web-application

    https://softwareengineering.stackexchange.com/questions/346867/how-to-keep-applications-stateless

    https://stackoverflow.com/questions/34675027/stateless-web-application-an-urban-legend


    3. Apache HDFS, Apache TEZ, Apache Hive
    https://hadoopecosystemtable.github.io/

    4. Kafka - a distributed streaming platform
    Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
    https://kafka.apache.org/
    Apache Kafka® as a Service. A Way to Liberate Developers.
    https://www.confluent.io/confluent-cloud/

    KafkaServlet.java
    https://github.com/ibm-messaging/message-hub-samples/blob/master/kafka-java-liberty-sample/src/main/java/com/messagehub/samples/servlet/KafkaServlet.java

    Redis
    https://redis.io/topics/cluster-spec
    https://redis.io/topics/cluster-tutorial

    Main properties and rationales of the design
    Redis Cluster goals

    Redis Cluster is a distributed implementation of Redis with the following goals, in order of importance in the design:

    High performance and linear scalability up to 1000 nodes. There are no proxies, asynchronous replication is used, and no merge operations are performed on values.
    Acceptable degree of write safety: the system tries (in a best-effort way) to retain all the writes originating from clients connected with the majority of the master nodes. Usually there are small windows where acknowledged writes can be lost. Windows to lose acknowledged writes are larger when clients are in a minority partition.

    Availability: Redis Cluster is able to survive partitions where the majority of the master nodes are reachable and there is at least one reachable slave for every master node that is no longer reachable. Moreover using replicas migration, masters no longer replicated by any slave will receive one from a master which is covered by multiple slaves.

    5. Tomcat
    Tomcat Servlet & Kafka
    https://stackoverflow.com/questions/47034029/connect-kafkaproducer-using-servlet

    6. Auth token for stateless webapp


    Using OAuth 2.0 for Web Server Applications for YouTube
    https://developers.google.com/youtube/v3/guides/auth/server-side-web-apps
    What is JSON Web Token?
    https://jwt.io/introduction/
    This is a stateless authentication mechanism as the user state is never saved in server memory. The server's protected routes will check for a valid JWT in the Authorization header, and if it's present, the user will be allowed to access protected resources. As JWTs are self-contained, all the necessary information is there, reducing the need to query the database multiple times.
    This allows you to fully rely on data APIs that are stateless and even make requests to downstream services. It doesn't matter which domains are serving your APIs, so Cross-Origin Resource Sharing (CORS) won't be an issue as it doesn't use cookies.
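
To make the "self-contained" point concrete, here is a rough sketch of how an HS256-style token could be signed and checked using only the JDK. The class name Hs256Token is hypothetical, the payload is treated as an opaque JSON string, and claims such as expiry are not validated; a production service would normally rely on a vetted JWT library rather than code like this.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class Hs256Token {
    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

    // Builds header.payload.signature, each part base64url-encoded.
    static String sign(String payloadJson, byte[] secret) {
        String header = B64.encodeToString(
                "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = B64.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        return signingInput + "." + B64.encodeToString(hmac(signingInput, secret));
    }

    // Recomputes the signature from the first two parts; no server-side session is needed.
    static boolean verify(String token, byte[] secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        byte[] expected = hmac(parts[0] + "." + parts[1], secret);
        byte[] actual;
        try {
            actual = Base64.getUrlDecoder().decode(parts[2]);
        } catch (IllegalArgumentException e) {
            return false; // signature part is not valid base64url
        }
        return MessageDigest.isEqual(expected, actual); // constant-time comparison
    }

    private static byte[] hmac(String data, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```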

    Where to Store Tokens

    Auth0 API
    https://auth0.com/docs/api/info
    Management: Handles management of your Auth0 account, including functions related to (but not limited to):
    Applications;
    Connections;
    Emails;
    Users.

    7. Shadow table

    8. Scanning the table before and after changes

    IDEA - IntelliJ

    https://intellij-support.jetbrains.com/hc/en-us/articles/207240985-Changing-IDE-default-directories-used-for-config-plugins-and-caches-storage
    Changing IDE default directories used for config, plugins, and caches storage
    ----------------------------


    NLP for LOGS 

    Experience Report: Log Mining using Natural Language Processing and Application to Anomaly Detection Christophe Bertero, Matthieu Roy, Carla Sauvanaud, Gilles Trédan

    A Machine Learning Approach to Log Analytics

    Spark + Parquet In Depth: Spark Summit East talk by: Emily Curtin and Robbie Strickland

    https://www.youtube.com/watch?v=_0Wpwj_gvzg

    KAFKA
    https://kafka.apache.org/

    Introduction to Apache Kafka by James Ward
    https://www.youtube.com/watch?v=UEg40Te8pnE&t=1586s

    MICROSERVICES

    Spring Boot. This is probably the best Java microservices framework; it builds on the Spring Framework's support for Inversion of Control, Aspect-Oriented Programming, and others.
    Jersey. This open source framework supports JAX-RS APIs in Java and is very easy to use.
    Swagger. Helps you in documenting API as well as gives you a development portal, which allows users to test your APIs.
    https://stackify.com/what-are-microservices/

    Tomcat, Jersey and Microserver together
    https://github.com/aol/micro-server/tree/master/micro-tomcat-with-jersey

    KUBERNETES

    Introduction to Microservices, Docker, and Kubernetes
    https://www.youtube.com/watch?v=1xo-0gCVhTU

    Kubernetes as a Service
    https://stackify.com/kubernetes-service/

    Matei Zaharia
    https://www.youtube.com/watch?v=L029ZNBG7bk
    https://www.youtube.com/watch?v=ZFBgY0PwUeY
    https://www.youtube.com/watch?v=Zb9YW8XjxnE&list=PL-x35fyliRwiQN5ozol2pjbtvPDQaSx_Q


    Deep Learning and Streaming in Apache Spark 2.x - Matei Zaharia & Sue Ann Hong
    https://www.youtube.com/watch?v=zom9J9sK6wY&list=PLn3_C5ZW-eVAAyyxQii9fPbVJ8IfPgaaO

    GOOGLE & SPARK

    Using Apache Spark with TensorFlow on Google Cloud Platform
    Tuesday, November 28, 2017
    By Bill Prin and Neeraj Kashyap, Developer Programs Engineers

    https://cloud.google.com/blog/big-data/2017/11/using-apache-spark-with-tensorflow-on-google-cloud-platform
    Google Cloud Platform offers managed services for both Apache Spark, called Cloud Dataproc, and TensorFlow, called Cloud ML Engine. Both of these services deliver the power of their respective open-source frameworks in a managed environment, letting you focus on the data science while we worry about the operations.

    Intuitively, there is some overlap — Spark provides a framework for big data computations, and the type of datasets that power TensorFlow algorithms tends to be large. This leads to a possible intersection between the two frameworks: using Spark to preprocess the input to TensorFlow.

    On our GitHub repo for Cloud Dataproc, we have new TensorFlow-Spark samples which demonstrate two of these use cases. The first is generating TFRecords from CSVs. The second is using Spark to preprocess the data and generate artifacts for the Tensorflow graph. Many of the concepts in the samples are borrowed from TF Transform.
    Deep Learning with Apache Spark and TensorFlow

    How will the distributed version of TensorFlow affect Apache Spark and MLlib?
    https://www.quora.com/How-will-the-distributed-version-of-TensorFlow-affect-Apache-Spark-and-MLlib

    ML AT ORACLE PLAN

    SPARK APIS for processing uploaded build/test results
    sapphire server logs -> SPARK cluster processing -> ML

    structured API example 

    DataFrame API

    # spark is the SparkSession; users is assumed to be a DataFrame with uid and loc columns
    events = spark.read.json("/logs")
    stats = events.join(users, "uid").groupBy("loc", "status").avg("duration")
    errors = stats.where(stats.status == "ERR")

    Optimized Plan

    SCAN logs->filter->join->aggregate
    SCAN users ----------|^

    Specialized Code

    while (logs.hasNext) {
      e = logs.next
      if (e.status == "ERR") {
        u = users.get(e.uid)
        key = (u.loc, e.status)
        sum(key) += e.duration
        count(key) += 1
      }
    }
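
For comparison, a plain-Java sketch of the same filter, join, and aggregate steps without Spark. The names (ErrorDurationStats, Event, avgErrorDurationByLoc) and the choice to represent users as a uid-to-loc map are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ErrorDurationStats {
    // Hypothetical log record with the fields used in the pseudocode above.
    static class Event {
        final String uid;
        final String status;
        final double duration;

        Event(String uid, String status, double duration) {
            this.uid = uid;
            this.status = status;
            this.duration = duration;
        }
    }

    // users maps uid -> loc. Returns loc -> average duration of ERR events.
    static Map<String, Double> avgErrorDurationByLoc(List<Event> logs, Map<String, String> users) {
        Map<String, double[]> acc = new HashMap<>(); // loc -> {sum, count}
        for (Event e : logs) {
            if (!"ERR".equals(e.status)) {
                continue; // filter before the join, as in the optimized plan
            }
            String loc = users.get(e.uid);
            if (loc == null) {
                continue; // inner join: skip events with no matching user
            }
            double[] sumCount = acc.computeIfAbsent(loc, k -> new double[2]);
            sumCount[0] += e.duration;
            sumCount[1] += 1;
        }
        Map<String, Double> averages = new HashMap<>();
        for (Map.Entry<String, double[]> entry : acc.entrySet()) {
            averages.put(entry.getKey(), entry.getValue()[0] / entry.getValue()[1]);
        }
        return averages;
    }
}
```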

    MICROPAYMENT COMPANIES

    https://angel.co/search?q=micropayment%20system

    SALARY

    Who make more money? Software Engineer or Project Manager at Big 4?
    https://www.reddit.com/r/cscareerquestions/comments/33lyz3/who_make_more_money_software_engineer_or_project/

    COURSERA CS COURSES 

    https://www.reddit.com/r/learnprogramming/comments/4huw73/heres_a_list_of_229_free_online_programmingcs/


    Transition from engineer to manager - youtube talk

    David Loftesness
    @dloft
    Platform at eero. Formerly Twitter, Amazon/A9, Xmarks. Just finished writing @scalingteams with @klangberater.
    San Francisco
    medium.com/@dloft
    https://twitter.com/dloft

    questions to send:












