Getting Started with Hadoop, Spark, Hive and Kafka – live blog from Oracle Code

Title: Getting Started with Hadoop, Spark, Hive and Kafka

Speakers: Edelweiss Kammermann

See my live blog table of contents from Oracle Cloud

Nice beginning with a picture of Uruguay and a map

Big data

  • Volume – Lots of data
  • Variety – Many different data formats
  • Velocity – Data created/consumed quickly
  • Veracity – Know data is accurate
  • Value – Data has intrinsic value; but have to find it

Hadoop

  • Manage huge volumes of data
  • Parallel processing
  • Highly scalable
  • HDFS (Hadoop Distributed File System) – for storing info
  • MapReduce – for processing data; the programming model inside Hadoop (see the sketch after this list)
  • Writes data in fixed-size blocks
  • NameNode – like an index; the central entry point
  • DataNode – stores data; sends data to the next DataNode and so on until replication is done
  • Fault tolerant – can survive node failure (each DataNode sends a heartbeat every 3 seconds to the NameNode; assumed dead after 10 minutes), communication failure (DataNode sends an ack), and data corruption (DataNodes send a block report of good blocks to the NameNode)
  • Can have second NameNode for active/standby config. DataNodes report to both.
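To make the MapReduce idea concrete, here is a minimal word-count sketch in the Hadoop Streaming style, where the mapper and reducer are plain scripts reading stdin; the script names and the job itself are my assumptions, not something shown in the talk.

    # mapper.py – emits "word<TAB>1" for every word read from stdin
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

And the matching reducer, which receives the mapper output sorted by key:

    # reducer.py – sums the counts for each word
    import sys

    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t")
        if word != current_word:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")

Hadoop runs the mapper against each block of the input in HDFS and feeds the sorted, shuffled output to the reducers in parallel.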

Hive

  • Analyze and query HDFS data to find patterns
  • Structures the data into tables so you can write SQL-like queries – HiveQL (see the query sketch after this list)
  • HiveQL has multi-table insert and a CLUSTER BY clause
  • HiveQL has high latency and lacks a query cache
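As a sketch of what a HiveQL query might look like from client code – assuming a HiveServer2 endpoint, the third-party PyHive library, and a made-up employees table, none of which were in the talk:

    # hypothetical query against HiveServer2 via the PyHive client
    from pyhive import hive

    conn = hive.connect(host="localhost", port=10000)   # assumed endpoint
    cursor = conn.cursor()
    cursor.execute("""
        SELECT department, COUNT(*) AS cnt
        FROM employees
        GROUP BY department
        CLUSTER BY department
    """)
    for row in cursor.fetchall():
        print(row)

Hive compiles a query like this into jobs over the files in HDFS, which is where the high latency comes from.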

Spark

  • Can write in Java, Scala, Python or R
  • Fast in-memory data processing engine
  • Supports SQL, streaming data, machine learning and graph processing
  • Can run standalone, on Hadoop or on Apache Mesos
  • Much faster than MapReduce. How much faster depends on whether the data can fit into memory
  • Includes packages for core, streaming, SQL, MLlib and GraphX
  • RDD (resilient distributed dataset) – an immutable programming abstraction of a collection of objects that can be split across clusters. Can be created from a text file, SQL, NoSQL, etc (see the sketch after this list)
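A minimal PySpark RDD sketch (word count again, for comparison with the MapReduce version above); the input path is hypothetical:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "rdd-demo")
    rdd = sc.textFile("hdfs:///data/input.txt")           # RDD built from a text file
    counts = (rdd.flatMap(lambda line: line.split())      # split lines into words
                 .map(lambda word: (word, 1))             # pair each word with 1
                 .reduceByKey(lambda a, b: a + b))        # sum counts per word
    print(counts.take(5))
    sc.stop()

The transformations only build up the RDD lineage; nothing actually runs until an action like take() is called.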

Kafka

  • Integrate data from different sources as input/output
  • Producer/consumer pattern (called source and sink)
  • Incoming messages are stored in topics
  • Topics are identified by unique names and split into partitions (for redundancy and parallelism)
  • Partitions are ordered, and each message within a partition has an id called the offset
  • Brokers are Kafka servers in a cluster. Recommended to have three
  • Define the replication factor for data; 2 or 3 is common
  • Producers can choose which acks they need to receive – none, from the leader, or from all replicas (see the sketch after this list)
  • Consumers read data from a topic. They read in order from a partition, but in parallel between partitions.
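A small sketch with the kafka-python client showing the producer acks setting and the partition/offset fields on the consumer side; the broker address, topic name and group id are my assumptions:

    from kafka import KafkaProducer, KafkaConsumer

    # producer – acks can be 0 (none), 1 (leader only), or "all" (all in-sync replicas)
    producer = KafkaProducer(bootstrap_servers="localhost:9092", acks="all")
    producer.send("events", b"hello kafka")
    producer.flush()

    # consumer – messages are ordered within a partition and identified by their offset
    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        group_id="demo-group",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,      # stop iterating if no message arrives for 5 seconds
    )
    for msg in consumer:
        print(msg.partition, msg.offset, msg.value)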

My take

Good simplified intro to a bunch of topics. It was good seeing how things fit together. The audience asked what sounded like detailed questions; I would have liked it if they had held those for the end.

Intro to Docker Containers – live blog at Oracle Code

Title: Intro to Docker Containers
Speakers: Mike Raab

See my live blog table of contents from Oracle Cloud

 

History of containers

  • ex: UNIX containers, Solaris Zones, VMware
  • Docker as a product and company made containerization easy

Use cases

  • Ready-to-run application stacks – setting up a cluster can take a few days even if you know what you’re doing. Preparing a Docker container takes a few minutes once you have it configured.
  • New development/microservices
  • One-time run jobs – the data dies with the container by default, which is good if you don’t need it.
  • Front end app servers
  • Server density
  • Portability – can run the same container anywhere

Architecture/Nomenclature

  • A VM has the entire OS, app, dependencies, binaries, etc. A container includes just the app and its dependencies.
  • Docker client – CLI for interfacing with Docker
  • Dockerfile – text file of docker instructions to assemble image
  • Image – hierarchies of files; the input to the docker build command. A collection of files and metadata. Contains layers.
  • Container – running instance of an image using the docker run command.
  • The end user doesn’t know whether you are using a container. Pure implementation detail.
  • Registry – image repository. DockerHub is largest repo.
  • Docker engine – container execution and admin. Uses Linux kernel namespaces so containers have isolated workspaces. Can do Docker for Windows, but it needs a different image and is not popular; 99.9%+ is Linux

Commands inside a Dockerfile

  • FROM x – the base/parent image
  • COPY x y – copy x to y
  • RUN c – run UNIX command c
  • ENTRYPOINT ["a"] – run command a at startup (see the sketch after this list)
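Putting those four instructions together, a hypothetical Dockerfile for a tiny Python app might look like this (the base image tag, file names, and dependency are made up for illustration):

    FROM python:3.11-slim
    COPY app.py /app/app.py
    RUN pip install flask
    ENTRYPOINT ["python", "/app/app.py"]

You would then build and run it with something like docker build -t username/myapp . and docker run username/myapp, which are covered in the command list below.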

Docker commands

  • docker build -t image . – build an image from the Dockerfile in the current directory
  • docker tag image user/image
  • docker push username/image
  • docker pull username/image
  • docker run – pull the image (if needed) and run it in a container
  • docker logs
  • docker ps – list running containers
  • docker ps -a – all containers, whether running or not
  • docker images – list all images
  • docker rm
  • docker tag
  • docker login – login to registry
  • docker push/pull
  • docker inspect – config metadata
  • docker-compose up -d – run multi-container Docker applications

Why Docker is hot

  • developers love it
  • fast to spin up environment
  • open source
  • code agility, CI/CD pipeline, devops
  • portability
  • managed Kubernetes – running it yourself is hard; use managed/cloud environments – Oracle commercial time 🙂

My take

Good intro/review. I’m giving a session right after this one and wanted to get ready, so this was a good session for me. Not brand new content, but I still got something out of it.

Building a ChatBot – live blog Oracle Code New York

Title: Building a ChatBot
Speakers: Maria Kaval and Shaun Smith

See my live blog table of contents from Oracle Cloud

Started by talking about 6 trends

Serverless functions

  • Spins up when the function is called
  • Goes away after
  • Like Cinderella’s carriage – but with a server. Only there for a short time

DevOps -> NoOps

  • Taking work of ops away from you
  • As developer, just want to write your code
  • Less emphasis on memory management and such

Open Source

  • Oracle cloud based on open source
  • Not focused on profit

Chatbots

  • Teens like texting and emojis
  • Adults like to text too; good interface
  • With chatbots, don’t know if talking to human or bot

Blockchain

  • More than just bitcoin
  • Ledgers build trust

Machine Learning

  • Now have the processing/compute power to enable machine learning

Use case – selling and buying a car

  • The chatbot asks for basic info and calls serverless functions to look up the value
  • Showed Oracle bot builder service – set up intents (phrases that represent what you want to do), train bot (ex: linguistics, machine learning)
  • Test bot by trying a chat in the config screen
  • Artificial intelligence integration trains bot

Serverless Functions

  • Serverless is a category
  • There are still servers; you just don’t own or manage them – the cloud provider does
  • Economics – only pay when the service is being used, i.e. only when called, vs a standing server
  • Agility – small amounts of code. So easier to write/debug/etc
  • Reliability – in cloud
  • Innovation – easier to try things out since easy to deploy

Fn

  • http://fnproject.io
  • Sample commands – init, run, test, deploy, call (see the function sketch after this list)
  • Flow UI lets you watch as code runs. Looks like a sequence diagram, except live; shows how long each step took, etc.
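As a rough sketch of what a deployed Python function looks like with the Fn FDK helper library (the JSON payload and greeting are made up; fn init --runtime python generates something very similar):

    import io
    import json

    from fdk import response


    def handler(ctx, data: io.BytesIO = None):
        # parse the (hypothetical) JSON request body, falling back to a default name
        name = "world"
        try:
            name = json.loads(data.getvalue()).get("name", name)
        except (ValueError, AttributeError):
            pass
        return response.Response(
            ctx,
            response_data=json.dumps({"message": f"Hello {name}"}),
            headers={"Content-Type": "application/json"},
        )

fn deploy packages a function like this into a container image behind the scenes, which is how the serverless and Docker stories connect.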