[uberconf 2023] Practical AI Tools for Java Developers

Speaker: Ken Kousen

@kenkousen

For more, see the table of contents


Prompt Engineering

  • Tools are improving fast, might not be needed as job
  • Suggest context (ex: “pretend you are”)
  • Give example of what you want

Chat GPT

  • Free version is GPT 3.5 Turbo (improved performance over original 3.5)
  • $20/month for GPT 4. Can make 25 requests in a three hour block.
  • Have not noticed quality control over plugins.
  • Plugins change rapidly.
  • Apologizes when you correct it.
  • Warning about pasting your company’s code in
  • Trained thru summer 2021
  • Can’t read files on local file system (Bard can). Can read link but doesn’t know it can
  • Often wrong if you ask it about whether can do something. Like talking to toddler; says want thinks want to know.
  • Temperature – tweaks creativity vs precision
  • REST API docs
  • REST API: cookbook has examples
  • Must give credit card to call REST APIs. Pennies are for 1000 tokens (about 750 words). Charged for both input and output words. Also limits on context (amount GPT remembers). Not expensive if don’t use it much. Ken’s bill has been pennies and too low to be sent a bill.
  • REST API JSON response says how many tokens used. Can also see graph when log into account
  • Had it make multiple choice questions on a topic

Chat GPT Code Interpreter

  • Code Interpreter beta feature.
  • Need to explicitly enable under settings.
  • From OpenAI, not third party
  • ex: can convert Groovy to Kotlin DSL for Gradle

DALL-E

  • First popular text to image generation tool
  • A generation behind text/GPT.
  • Stable Diffusion free, but behind on quality
  • Prefers MidJourney, more realistic

Whisper

  • Audio to text
  • Takes audio or video and writes transcription.
  • Free (unless use REST API)
  • Mac Whisper – $20 on time fee for larger models. Good for transcribing videos of talks. Slow first time. After that (including other videos, fast. [caching?]
  • Creates .srt file (Subtitles)

Claude.AI

  • Free beta
  • Only available in US and UK
  • Can hold 100K tokens. ex: can summarize a novel
  • Quality comparable to ChatGPT 3.5, but not as good as 4.0
  • Can upload many file types
  • Harder to get back to previous conversations than ChatGPT. Need to click on “A” icon on top to see them
  • Doesn’t do image

Bard

  • Can upload answers to Google docs on Ken’s personal account, but not business account
  • Used to be able to answer who Venkat is but can’t anymore.

Llama 2

  • Meta announced today
  • Pretrained language model
  • Free unless large company (aka: competitors)

Descript

  • Transcribes and edits video
  • Can give instructions – ex: shorten gaps in video, remove filler words
  • If don’t move around much, will make it look like you are looking at camera
  • Can give text and select a voice. With 30 minute sample, can train on your voice

Canva

  • Can describe presentation want and Canva makes a draft
  • Can choose theme from list of choices
  • Magic eraser – brush over part of image don’t want and replaces with background nearby
  • Beats Sync – line of slide transition to beats of music
  • Magic Write – like GPT 3.5
  • Magic Design – give own image and make presentation around that

GitHub Copilot

  • Virtual pair programmer
  • Plugins for VSCode and IntelliJ
  • If hesitate, suggests code
  • Can’t agree to part of suggestion. Need to accept it all or delete
  • Guesses right a lot because knows what have done before in a training class
  • Always looks plausible because trained on own code. Need to look carefully
  • Next generation is GitHub CopilotX. Only available via wait list. VS Code only at this point, can use for pull requests.
  • GitHub Next – tools in a variety of states – https://githubnext.com. “Visualizing a Codebase” runs as github action to see packages

IntelliJ AI Assistant

  • Not much documentation on how it works. Only one blog post
  • In Ultimate, not Community
  • In beta edition
  • Can highlight code and ask to explain it
  • If don’t like suggestion, can request it suggests something else and get more choices
  • Can write commit message for you
  • Find issues with code when know language well
  • Helps in language know less well because it knows the API/syntax
  • Good for nuisance tasks that would take a lot of time

YouTube Summary

  • Get summary or transcript of video
  • Free
  • Up to 20 minute video

My take

I was doing my interview with the Build Propulsion Lab so was a few minutes late. It was a full room so my seat was on the floor. Luckily, the room had a large aisle so I could sit near the front instead of in the very back! And the carpet was comfy.

As far as Ken’s actual talk, it was great. I liked the overview of a bunch of tools and seeing the REST APIs for calling OpenAI. Great breath of topics and fun examples! I learned a lot including some tools I hadn’t heard of. And some very cool functionality!

[2023 kcdc] With Great Power Comes Great Responsibility: The Ethics of AI

Speaker: Matthew Renze

Twitter: @matthewrenze

For more, see the table of contents.


History

Tech has a tendency to be abused

  • land – slaves
  • mechanized war fare – expand influence
  • cyberware – mass surveillance

Alice and Bob

  • Need to decide if want to get cat or dog for kids.
  • One researches cats and one dogts.
  • Get into info bubble thinking cat lovers hate dogs and vice versa and mad at each other
  • Then talk to real people, learn people like both and get a cat and a dog.
  • A generation later they lose their jobs due to robots/AI. Their kids see lots of jobs because tech savvy.
  • Kids convince parents to upskill and get new job
  • Another generation later grandkids want biological augmentation and to marry an AI.
  • Feel lost in world no longer recognize
  • Learn about technology and see it is an evolution. Learn from grandchildren.

Today

  • When search for something, get more of it.
  • Then info bubble/echo chambers
  • Goal is to maximize engagement. This results in more extreme content so people click
  • Lose privacy – ex: shopping data predict pregenancy
  • Can deanonymomize data with data of birth, sez and zip code
  • Little privacy now and soon a lot less
  • Algorithmic bias – ex: racially bias criminal risk score, males preferred in resumes

AI

  • Uncanny valley – distrust things that almost like us
  • Hallucination – making up believeable, but false info
  • Misinformation at scale
  • Lack of AI literacy

What can we do

  • Delete cookies
  • Incognito mode
  • Throwaway emails
  • Stop using “click holes” to get pulled down rabbit holes
  • Opt out
  • Privacy regulations
  • Limit/stop using social media
  • Talk to other people

AI Developers

  • Eliminate bias in data – diverse datasets, exclude protected attributes, retrain algorithm over time
  • Be able to explain how AI made decision. Use decision tree vs neural network where can.
  • Let users choose how much error they allow
  • Don’t allow full autonomous

Fight misinformation

  • Who is the author/publisher?
  • What are their sources?
  • How strong is the evidence?

Near Future

  • Significant unemployment – simple/repetitive/costly jobs. Expect 20%+ jobs to go away by 20230 and be replaced by other higher tech jobs
  • Labor market unprepared for rapid change
  • Society is unprepared for change.
  • Many people left behind in poverty.
  • Synthetic media – indistinguishable from human data. Propaganda/misinformation at scale. Deep fakes. Deep nude (remove clothes without permission), etc
  • With 10 likes, AI knows you well as colleague.
  • Surveillance capitalism – can’t detect being manipulated
  • Greater social stratification – income gap
  • Safety issues – does self driving car protect driver or pedestrian
  • Autonomous weapons – currently a human is in the loop

Solutions

  • Educate everyone/AI literacy, Basics of ML, DL (deep learning), RL (reinforcement learning)
  • Job retraining
  • Retirement options for those too old to reskill
  • Mandatory higher ed – mandatory high school was controversial
  • Universal basic income/negative income task
  • Deep fake detection – arms race
  • Digital alibi – so can prove what doing at all times and therefore not in fake ideo
  • Blockchain for everything so have complete audit trail
  • Default mode of skepticism

Further Future – Speculative

  • AGI (artificial general intelligence) – at least as smart as average person
  • Improve health
  • Solve biggest problem – climate change, politics, government
  • Humans could become obsolete – ex: horses became obsolete to farms. “Peak horse” was in 1915
  • Collapse of modern institutions – could break capitialism.
  • Changes already faster than society can adapt. What happens when new discoveries every day?
  • Dystopian future – authoritarianism, communism, fascism, AI religion, AI super bureaucracy
  • Or a better AI based government
  • ASI (artificial super intelligence) – if create AGI, intelligence exposion can happen fast. AGI can rewrite its own code.
  • Alignment problem – how do we align human and AI values. Reward hacking – find loopholes
  • AI run amok – what happens if robot mine astroids. When does it stop
  • Conflicts – are we pets, ants, raw materials, competition, a threat?

Positives

  • We evolved for short bursts of stress.
  • Modern society is chronic stress
  • Be mindful with tech
  • Respect AI
  • Don’t fear/fight change
  • Use tech when beneficial and skip when not
  • Reward AI goal states
  • Keep ability to intervene if decision doesn’t align

Long run

  • Peacefully coexist with AI
  • AI wins
  • AI and humanity merge – most likely option
  • Humanity ends itself

Merge

  • No “us vs them” problem.
  • Phones an extension of us
  • Younger generation willing to merge with mind
  • VR/AR glasses
  • Gene editing
  • Brain/computer interfaces
  • Next version of people likely to be vary different

My take

The Alice and Bob stories are fun. There was a ton of information. It went very fast and definitely need time to process. I expected more discussion of ethics rather than covering “everything” but I’m happy with how it turned out.

[2023 kcdc] data science: zero to hero

Speaker: Gary Short

Twitter; @garyshort

Repo for presentation/samples: https://github.com/garyshort/kcdc2023

For more, see the table of contents.


Data science overview/rules

  • Applied data science – solving business problems
  • Curiosity is most important
  • The universe does random stuff so you haven’t discovered anything until you prove you’ve discovered something
  • Only qualitative and quantitative data – people lie, Can’t trust what you ask
  • Can only do math with numbers. Some things will pretend to be numbers when they are not. Also, can’t add different things (dollars vs killograms)
  • If you can’t explain it to a six year old, you don’t really understand it
  • Only have to be more than 51% accurate to do better than guessing
  • True random data has some clusters. The cluster will not last forever. Gambler’s paradox. 27 blacks doesn’t mean due a red.
  • If it’s not in production, it doesn’t exist. Can’t just be on your laptop. Most data scientists need to give to someone else to get it to prod. Cultural difference between data scientist and person who is building/deploying.
  • % chance of hypothesis being right or wrong doesn’t have to sum up to 100%. ex: grass is wet. Could be rain or a dog peeing or something else

Types of data

Structured

  • Relational data
  • Get connection, create cursor, fill cursor, close connection
  • Schema is important on data write.

Semi structured

  • ex: JSON/MongoDB.
  • Get connection, name collection, fill cursor, close connection
  • Schema important when read data

Unstructured

  • Blob (binary large object)
  • Stored in pages/blocks
  • Access via URL

Graph

  • Degrees of separation – can you deliver a message directly
  • People in room now more closely connected because in this session (and would stay so if shared contact info)
  • Wide network effect
  • Nodes tend to be nouns
  • Edges tend to be verbs. Can be unidirectional or bidirectional
  • Get connection, state query, fill cursor, close connection

AI/ML works on data types

  • Categorical – segregate data by category where category is not important (ex: blue eyes)
  • Ordinal – order is important but distance between is not important (ex: position in a race)
  • Numeric – order is important but distance is the same (ex: counting)
  • Ratio – numeric but with positive numbers

Can only do math with ordinal and ratio types. A survey on a scale of 1-5 (likert scale) is ordinal, not numeric/ratio. Can’t do average. This is categorical data (ex: very happy, pissed off). Can do math with counts of categorical data but not single items.

Exploratory Data analysis

  • Need to understand the variables. Ex: is it really a number
  • Handle missing values – depends on scenario. Ex: use mean or median (if not looking for that particular thing), delete row with incomplete data
  • Outlier detection – sometimes genuinely an outlier (ex: someone who is 8 feet tall), sometimes it is the important piece of data (ex: which exits people use in a fire; one person went the other way and want to know why). Need to determine why outlier and if care so don’t delete data need
  • Univariate analysis – ex: histogram for categorical data
  • Bivariate analysis – correlated data; could be hidden variable. Don’t need both of them since one predicts the other. Want minimal variables in model so chose the one that brings in the most info.

Feature Selection

  • Preprocess the data
  • Normalize data – units have to be the same. Using variance doesn’t help because unit is now original unit squared. Can use Z-score so everything on scale 0-1 using mean and divisor
  • Encode the categories – make so can do math
  • Booleans are numbers (0 and 1)
  • Word vector – can use math to represent a word. Complicated. Ok to have to look up every time.
  • Bi/multivariate analysis – high correlation means redundant info
  • Feature importance – check coefficients from regressions and scores from gradient boosting

Model Selection

  • People have a favorite model
  • Use one or more models. See which gives best result before making any changes to the model.
  • Good to use a linear and non linear one. Normal the linear model is enough because normally dealing with people (directly or indirectly). Linear equations work for a normal distrobution.
  • Make sure to find global minimum, not local/current one
  • Compete with yourself. Try to have your second best model beat your current best model. Once something in prod, start again

Train/test split

  • 80/20 split
  • 80% data for training
  • 20% data real
  • Model never sees training data because can’t grade own homework

Model evaluation

  • Outcome – model + error
  • Error is difference between predicted and observed values.
  • Sample of population can be model. Get error because of sampling bias

Hyper Parameter Tuning

  • Every models have parameters to govern how works.
  • Hyper param tuning is fiddling with these
  • Will be an optional value for each of these parameters for your particular use case

Model Validation

  • Need to make sure model doesn’t work by chance
  • K-Fold Cross Validation – after do 80/20 split, can feed data back in and do again
  • Stratified Cross Validation – same as K-Fold but unbalanced classes

Bayesian inference in Real Life

  • P(h|e) = P(e|h) * P(h) / P(e)
  • In English: current belief = new evidence

Estimation

  • Important to be able to estimate values when have no data
  • Dumb questions like “how many piano tuners are there in Chicago” was testing this. So few people could do it that pulled question. [I suspect the ridicule and people memorizing the answer was a factor too]
  • Easier to estimate a range than an actual value
  • Pick a minimum that it couldn’t possibly be below. You’d be surprised and skeptical if less than that.
  • Pick a maximum that it couldn’t possible be above.
  • Pick value spits range in two so that the possibility of being above/below has equal probability. Call this the medium. Resist temptation to pick the mean.
  • Repeat finding the minimum to median. Call this Q1
  • Then repeat finding the median to maximum to get Q3.
  • This gets you a five point description of a distribution
  • Use sampling to get mean of distribution

Lab part

The lab was to predict something you want to predict and make a model and/or predict a probability. Can do individually or in groups. He also gave the option to leave. I chose leave because there was a little over an hour left when he finished explaining the lab. I need to go over the material for my own workshop so doing that instead of the lab.

My take

This was a good intro and Gary is a good, engaging speaker. I learned (and re-learned) a bunch of stuff. Both concepts and terms. Having a bunch of rules and getting into them made it fun. (ex: math needs numbers). I like that the concept part was longer (except for the lack of a break), but it would hav been better if it was advertised that way in the intro.

I disagree with Gary’s philosophy on not having a bathroom break. He started by saying there would be 60-90 minutes of lecture and then a lab. [wound up to being just over 2.5 hours] And that we are all adults and can go to the bathroom whenever. Someone asked at the 90 minute mark if there would be a bathroom break and he repeated the all adults thing expanding that you’ll catch up and the slides will be online later. He also said people feel compelled to hold it until break or go when told it is break. However, the tradeoff is that you don’t want to go to the bathroom lest you miss something that will wind up being important during the session. It’s super frustrating to miss stuff and then struggle to understand later. It may be that this workshop isn’t cumulative but there’s no way to know. Also, by not having a break, you aren’t giving people’s brain a break. It’s not just about the bathroom.

Gary stated he puts the materials online after so people don’t read it during the session. That I agree with!