Getting Started with Hadoop, Spark, Hive and Kafka – live blog from Oracle Code

Title: Getting Started with Hadoop, Spark, Hive and Kafka

Speakers: Edelweiss Kammermann

See my live blog table of contents from Oracle Cloud

Nice beginning with a picture of Uruguay and a map

Big data

  • Volume – Lots of data
  • Variety – Many different data formats
  • Velocity – Data created/consumed quickly
  • Veracity – Know data is accurate
  • Value – Data has intrinsic value; but have to find it

Hadoop

  • Manage huge volumes of data
  • Parallel processing
  • Highly scalable
  • HDFS (Hadoop Distributed File System) – for storing info
  • MapReduce – for processing data; the programming model inside Hadoop (see the sketch after this list)
  • Writes data in fixed-size blocks
  • NameNode – like an index; the central entry point
  • DataNode – stores data; sends data to the next DataNode and so on until replication is done
  • Fault tolerant – can survive node failure (each DataNode sends a heartbeat every 3 seconds to the NameNode; assumed dead after 10 minutes), communication failure (DataNode sends an ack), and data corruption (DataNodes send a block report of good blocks to the NameNode)
  • Can have second NameNode for active/standby config. DataNodes report to both.
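To make the MapReduce idea concrete, here is a minimal word-count sketch in the Hadoop Streaming style, where the mapper and reducer are plain scripts reading stdin; the script names and the job itself are my assumptions, not something shown in the talk.

    # mapper.py – emits "word<TAB>1" for every word read from stdin
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

And the matching reducer, which receives the mapper output sorted by key:

    # reducer.py – sums the counts for each word
    import sys

    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t")
        if word != current_word:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")

Hadoop runs the mapper against each block of the input in HDFS and feeds the sorted, shuffled output to the reducers in parallel.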

Hive

  • Analyze and query HDFS data to find patterns
  • Structures the data into tables so you can write SQL-like queries – HiveQL (see the query sketch after this list)
  • HiveQL has multi-table insert and a CLUSTER BY clause
  • HiveQL has high latency and lacks a query cache
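As a sketch of what a HiveQL query might look like from client code – assuming a HiveServer2 endpoint, the third-party PyHive library, and a made-up employees table, none of which were in the talk:

    # hypothetical query against HiveServer2 via the PyHive client
    from pyhive import hive

    conn = hive.connect(host="localhost", port=10000)   # assumed endpoint
    cursor = conn.cursor()
    cursor.execute("""
        SELECT department, COUNT(*) AS cnt
        FROM employees
        GROUP BY department
        CLUSTER BY department
    """)
    for row in cursor.fetchall():
        print(row)

Hive compiles a query like this into jobs over the files in HDFS, which is where the high latency comes from.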

Spark

  • Can write in Java, Scala, Python or R
  • Fast in-memory data processing engine
  • Supports SQL, streaming data, machine learning and graph processing
  • Can run standalone, on Hadoop or on Apache Mesos
  • Much faster than MapReduce. How much faster depends on whether the data can fit into memory
  • Includes packages for core, streaming, SQL, MLlib and GraphX
  • RDD (resilient distributed dataset) – an immutable programming abstraction of a collection of objects that can be split across clusters. Can be created from a text file, SQL, NoSQL, etc (see the sketch after this list)
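A minimal PySpark RDD sketch (word count again, for comparison with the MapReduce version above); the input path is hypothetical:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "rdd-demo")
    rdd = sc.textFile("hdfs:///data/input.txt")           # RDD built from a text file
    counts = (rdd.flatMap(lambda line: line.split())      # split lines into words
                 .map(lambda word: (word, 1))             # pair each word with 1
                 .reduceByKey(lambda a, b: a + b))        # sum counts per word
    print(counts.take(5))
    sc.stop()

The transformations only build up the RDD lineage; nothing actually runs until an action like take() is called.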

Kafka

  • Integrate data from different sources as input/output
  • Producer/consumer pattern (called source and sink)
  • Incoming messages are stored in topics
  • Topics are identified by unique names and split into partitions (for redundancy and parallelism)
  • Partitions are ordered, and each message within a partition has an id called the offset
  • Brokers are Kafka servers in a cluster. Recommended to have three
  • Define the replication factor for data; 2 or 3 is common
  • Producers can choose which acks they need to receive – none, from the leader, or from all replicas (see the sketch after this list)
  • Consumers read data from a topic. They read in order from a partition, but in parallel between partitions.
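A small sketch with the kafka-python client showing the producer acks setting and the partition/offset fields on the consumer side; the broker address, topic name and group id are my assumptions:

    from kafka import KafkaProducer, KafkaConsumer

    # producer – acks can be 0 (none), 1 (leader only), or "all" (all in-sync replicas)
    producer = KafkaProducer(bootstrap_servers="localhost:9092", acks="all")
    producer.send("events", b"hello kafka")
    producer.flush()

    # consumer – messages are ordered within a partition and identified by their offset
    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        group_id="demo-group",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,      # stop iterating if no message arrives for 5 seconds
    )
    for msg in consumer:
        print(msg.partition, msg.offset, msg.value)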

My take

Good simplified intro to a bunch of topics. It was good seeing how things fit together. The audience asked what sounded like detailed questions; I would have liked it if they had held those for the end.

Intro to Docker Containers – live blog at Oracle Code

Title: Intro to Docker Containers
Speakers: Mike Raab

See my live blog table of contents from Oracle Cloud

 

History of containers

  • ex: UNIX containers, Solaris Zones, VMware
  • Docker as a product and company made containerization easy

Use cases

  • Ready-to-run application stacks – setting up a cluster can take a few days even if you know what you’re doing. Preparing a Docker container takes a few minutes once you have it configured.
  • New development/microservices
  • One-time run jobs – the data dies with the container by default, which is good if you don’t need it.
  • Front end app servers
  • Server density
  • Portability – can run the same container anywhere

Architecture/Nomenclature

  • A VM has the entire OS, app, dependencies, binaries, etc. A container includes just the app and its dependencies.
  • Docker client – CLI for interfacing with Docker
  • Dockerfile – text file of docker instructions to assemble image
  • Image – hierarchies of files; the input to the docker build command. A collection of files and metadata. Contains layers.
  • Container – running instance of an image using the docker run command.
  • The end user doesn’t know whether you are using a container. Pure implementation detail.
  • Registry – image repository. DockerHub is largest repo.
  • Docker engine – container execution and admin. Uses Linux kernel namespaces so containers have isolated workspaces. Can do Docker for Windows, but it needs a different image and is not popular; 99.9%+ is Linux

Commands inside a Dockerfile

  • FROM x – the base/parent image
  • COPY x y – copy x to y
  • RUN c – run UNIX command c
  • ENTRYPOINT ["a"] – run command a at startup (see the sketch after this list)
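Putting those four instructions together, a hypothetical Dockerfile for a tiny Python app might look like this (the base image tag, file names, and dependency are made up for illustration):

    FROM python:3.11-slim
    COPY app.py /app/app.py
    RUN pip install flask
    ENTRYPOINT ["python", "/app/app.py"]

You would then build and run it with something like docker build -t username/myapp . and docker run username/myapp, which are covered in the command list below.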

Docker commands

  • docker build -t image . – build an image from the Dockerfile in the current directory
  • docker tag image user/image
  • docker push username/image
  • docker pull username/image
  • docker run – pull the image (if needed) and run it in a container
  • docker logs
  • docker ps – list running containers
  • docker ps -a – all containers, whether running or not
  • docker images – list all images
  • docker rm
  • docker tag
  • docker login – login to registry
  • docker push/pull
  • docker inspect – config metadata
  • docker-compose up -d – run multi-container Docker applications

Why Docker is hot

  • developers love it
  • fast to spin up environment
  • open source
  • code agility, CI/CD pipeline, devops
  • portability
  • managed Kubernetes – running it yourself is hard; use managed/cloud environments – Oracle commercial time 🙂

My take

Good intro/review. I’m giving a session right after this one and wanted to get ready, so this was a good session for me. Not brand new content, but I still got something out of it.

Building a ChatBot – live blog Oracle Code New York

Title: Building a ChatBot
Speakers: Maria Kaval and Shaun Smith

See my live blog table of contents from Oracle Cloud

Started by talking about 6 trends

Serverless functions

  • Spins up when the function is called
  • Goes away after
  • Like Cinderella’s carriage – but with a server. Only there for a short time

DevOps -> NoOps

  • Taking work of ops away from you
  • As developer, just want to write your code
  • Less emphasis on memory management and such

Open Source

  • Oracle cloud based on open source
  • Not focused on profit

Chatbots

  • Teens like texting and emojis
  • Adults like to text too; good interface
  • With chatbots, don’t know if talking to human or bot

Blockchain

  • More than just bitcoin
  • Ledgers build trust

Machine Learning

  • Now have the processing/compute power to enable machine learning

Use case – selling and buying a car

  • The chatbot asks for basic info and calls serverless functions to look up the value
  • Showed Oracle bot builder service – set up intents (phrases that represent what you want to do), train bot (ex: linguistics, machine learning)
  • Test bot by trying a chat in the config screen
  • Artificial intelligence integration trains bot

Serverless Functions

  • Serverless is a category
  • There are still servers; you just don’t own or manage them – the cloud provider does
  • Economics – only pay when the service is being used, i.e. only when called, vs a standing server
  • Agility – small amounts of code. So easier to write/debug/etc
  • Reliability – in cloud
  • Innovation – easier to try things out since easy to deploy

Fn

  • http://fnproject.io
  • Sample commands – init, run, test, deploy, call (see the function sketch after this list)
  • Flow UI lets you watch as code runs. Looks like a sequence diagram, except live; shows how long each step took, etc.
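As a rough sketch of what a deployed Python function looks like with the Fn FDK helper library (the JSON payload and greeting are made up; fn init --runtime python generates something very similar):

    import io
    import json

    from fdk import response


    def handler(ctx, data: io.BytesIO = None):
        # parse the (hypothetical) JSON request body, falling back to a default name
        name = "world"
        try:
            name = json.loads(data.getvalue()).get("name", name)
        except (ValueError, AttributeError):
            pass
        return response.Response(
            ctx,
            response_data=json.dumps({"message": f"Hello {name}"}),
            headers={"Content-Type": "application/json"},
        )

fn deploy packages a function like this into a container image behind the scenes, which is how the serverless and Docker stories connect.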