[2024 dev2next] improving llm results with rag

Speaker: Brian Sletten (bsletten@mastodon.social)

For more see the table of contents

PDF of deck on dropbox


Notes

  • Problem: information is expressed in our language, but models were insufficient to extract it
  • Important to capture sequences – ex: context window
  • Problems with word2vec and other embedding approaches: sequences lost impact if they got too long. “New York” and “Soviet Union” work because the paired words appear near each other; words farther apart are harder to predict
  • Next, the transformer architecture used levels of “attention” to get more detailed views between/across sentences
  • Encode in a lower dimensional space and decode into higher dimensional space
  • Positional encoding of words in sentences – picks up some nuance. Attention involves quadratic calculations, but it can be parallelized, so it’s fast
  • Expensive to create a model. Still expensive but less so to tune it
  • Types of RAG: Naive, Advanced, Modular
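The positional encoding mentioned above can be sketched. This is a minimal version of the sinusoidal scheme from the original transformer paper, not necessarily what the speaker showed:

```python
import math

def positional_encoding(position: int, d_model: int) -> list[float]:
    """Sinusoidal positional encoding: even indices use sine, odd use
    cosine, with wavelengths forming a geometric progression. Adding
    this vector to the word embedding lets attention see word order."""
    encoding = []
    for i in range(d_model):
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

# Position 0 is all sin(0)=0 and cos(0)=1; other positions differ,
# giving each slot in the sentence a distinct signature.
print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```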

Emergent behavior

  • Not magic/sentience
  • Avoids the need to retrain all the time
  • Use linguistic skills, not knowledge skills
  • Chain of thought prompting

Causes of hallucinations

  • No logic engine/no way to evaluate correctness
  • Language engine with a stochastic element to avoid memorizing and encourage novelty

Example

  • Showed how the model can access the web
  • Allows you to summarize current news stories
  • Note: you can include the output format in the prompt: as JSON, as CSV, in bulleted list format, etc.
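The output-format trick is just text appended to the prompt; a minimal sketch (the function name and wording are my own, not from the talk):

```python
def build_prompt(question: str, output_format: str) -> str:
    """Ask the question and pin down the response format in one prompt."""
    return f"{question}\n\nRespond {output_format}."

# The same question can yield JSON, CSV, or bullets depending on the suffix.
prompt = build_prompt(
    "Summarize today's top three news stories.",
    "as JSON with keys 'headline' and 'summary'",
)
print(prompt)
```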

Options

  • Basic model
  • Fine-tuning – adjusting the model’s parameters
  • Prompt engineering

RAG

  • Allows getting a lot of custom data
  • Work with vector databases

Searching

  • Find the relevant portion of the data, then do a k-d tree / nearest neighbor search
  • Inverted index
  • Hierarchical Navigable Small Worlds (HNSW) – start with a coarse search at the top layers, then a detailed search lower down
  • Like express to local train in a city
  • Can find docs that mention a keyword and then use those docs to answer questions
  • Want to minimize long contexts because costs lots of tokens
  • Chunking makes docs smaller so you pay less for search – LlamaIndex provides an API to chunk
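A minimal sketch of the chunk-then-search idea: naive fixed-size chunking plus brute-force cosine-similarity nearest neighbor. Real systems use a splitter that respects sentence boundaries (e.g. LlamaIndex’s) and an approximate index like HNSW instead of scanning every vector; the embedding step is omitted here.

```python
def chunk(text: str, size: int = 200) -> list[str]:
    """Naive fixed-size chunking so each piece embeds (and bills) cheaply."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

def nearest_chunks(query_vec: list[float],
                   chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Brute-force k-nearest-neighbor over chunk embeddings; a vector DB
    would use an ANN index (k-d tree, HNSW) to avoid the full scan."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-d "embeddings": the query points along the x axis.
print(nearest_chunks([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]))  # [1, 2]
```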

Limitations of Naive RAG Models

  • Issues with precision and recall: misaligned chunks, irrelevant or missing chunks
  • Can still hallucinate if the answer is not backed by the retrieved chunks
  • Still have toxicity and bias problems

Chaining

  • Initial response
  • Constitutional principle – showed how to add ethics/legality critiques and rewrites
  • Constitutional principle – added a rewrite targeted at a 7th-grade reading level
  • That gives final response
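The chaining steps above can be sketched. Here `llm` is any text-in/text-out model call (a stub below) and the prompt wording is my own; this mirrors the constitutional-principle pattern (e.g. LangChain’s ConstitutionalChain) rather than the speaker’s exact code:

```python
from typing import Callable

def constitutional_chain(question: str, principles: list[str],
                         llm: Callable[[str], str]) -> str:
    """Initial response, then one critique-and-rewrite pass per
    principle (ethics/legality, 7th-grade rewrite, ...), giving the
    final response."""
    answer = llm(question)
    for principle in principles:
        critique = llm(f"Critique this answer against '{principle}':\n{answer}")
        answer = llm(f"Rewrite the answer to address the critique:\n"
                     f"Answer: {answer}\nCritique: {critique}")
    return answer

# Stub model that echoes the last line, just to show the call pattern.
echo = lambda prompt: prompt.splitlines()[-1]
print(constitutional_chain("Is it legal?", ["legality", "7th-grade reading"], echo))
```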

Security

  • Easy to poison data
  • Need data cleansing, but cleverer than traditional approaches
  • http://berryvilleiml.com – machine learning security

Reference: https://www.louisbouchard.ai/top-rag-techniques/

My take

I learned a bunch and the examples were fun. Good to see the code and output. My brain filled up during the session. I needed to switch to an easier talk for the final 5:30 session as I don’t have enough focus left. Cool how the answer to security was a different deck!

[2024 dev2next] reclaiming agility

Speaker: Dave Thomas @pragdave

For more see the table of contents


Notes

  • Started with a vote between this talk and a programming language one. This talk (barely) won

“Agile”

  • Wish people would stop using word “agile”
  • “Agile” is now an industry
  • “Agile” co-opted
  • Was supposed to be about what values to apply, not the practices
  • Can’t sell or package values
  • Can’t be agile if handed a process

“Agile is Dead”

  • Agile is dead. Agile with a capital A is what you pay for. Lowercase agile is the original set of values.
  • Devs are unhappy with Agile – too many meetings and processes
  • Read great quotes from “Agile is Dead” posts, all of which still value agile principles

Agile noun vs adjective

  • Agile has become a proper noun; something people can package/sell/make you follow.
  • agile should not be a noun. A noun is something you can point to
  • agile should be used as an adjective
  • It was made a noun to sell it; you can’t buy a pound of green
  • Agile is lost to us; how do we reclaim it?

History

  • How do we reclaim it?
  • 1995 – 31% failed, 53% partial success (overbudget or late or didn’t deliver all functionality)
  • 2020 – 11% failed, 47% partial success
  • Can’t go back
  • Too few people started programming when everything was specified up front, so too few remember why it was bad.
  • Requirements change

agility

  • Take back control
  • Ignore the branding
  • Make it work for us individually and as a team
  • Create a living, evolving local set of values and practices
  • There are things you’d like to do that you can’t. That’s life. There is still much you can do at a local/team level
  • Can’t change everything – the business has promised things, so initiate change carefully and measure it
  • Be better off than you are today by improving locally

First page of the manifesto from Snowbird:

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.

Value

  • Values are an internal filter through which you make decisions
  • Make decisions based on the four values in the manifesto
  • It’s difficult to do this. But the code is hard to write too

Skills needed

  • Empathy – see things as someone else sees them. Cognitive (thinking) and affective (feeling). Also need empathy for things (systems) so you understand how they will react under stress.
  • Storytelling – everything we do is an abstraction; using stories/metaphors allow people to understand.
  • Feedback – classic feedback loop: input -> process -> result -> compare to expectations -> input. In software, we apply the feedback to the process (or to our expectations) instead, as in unit testing. Could also be the process used to create the code in the first place. I.e., errors when coding, bad assumptions, or systemic errors. The third is the most valuable, but done the least.
  • Change driven – we spend most of our time changing things vs building new things. Development is a nested set of change cycles. Software is the most malleable building component. Think of work as a series of steps instead of an outcome.
  • Value focused – the goal is to help people and add value. Working software is a means to an end. Earlier delivery is more valuable, so cadence is a trade-off against accumulated value.
  • Courage – being an agent of change means making mistakes. Don’t hide behind the process or make excuses. It has to be ok to make mistakes. If we are taking control or handling something, we are accepting responsibility for it. If we communicate effectively and honestly and are ignored, they are responsible. Don’t let an imposed constraint be a stressful responsibility unless you’ve accepted it.

What can I do

  • no-one knows
  • nobody can know what you individually should do
  • except possibly you
  • you are the only person who understands the constraints you live under and your personal values.
  • What to do – Find out where you are now, take a small step towards your goal, adjust your understanding based on what you learned, repeat. That’s agility!
  • How to do it – When faced with 2+ choices that deliver roughly the same value, take the path that makes future changes easier

Example

  • Unproductive meetings – Goal is to feel more productive.
  • The non-agile way is to say nobody can speak for more than 3 minutes and to pass around the teddy bear.
  • Agile way – start with why we have meetings: know what’s going on, identify frustration, planning, serendipitous discovery, etc. Identify downtime. In classic Scrum, 17% of time is in meetings.
  • Small step #1: stop having meetings. Then adjust based on what happened. Will world end if abandon meetings for 2 weeks? Will company collapse? [seems like a big step]
  • Found delivery went up, but there were two cases of duplicated effort and too much lost communication.
  • Small step #2: a senior dev spends an hour a day chatting with individuals. Find common problems, patterns, and opportunities. Helps with small problems and brings people together for larger ones. Found unnecessary activity and local experts
  • Small step #3: change to 45 minutes a day
  • Be prepared that will change over time
  • Called “Developer without a portfolio”. 6 of 45 independently evolved to doing this. Most had a full-time facilitator/ambassador. None wanted to go back.

Another example:

  • Problem: when we refactor, we spend too much time fixing up the existing tests
  • Side story: Tests were brittle and a lot of failures on new machine. Deleted all internal tests and reintroduced as needed.
  • Why have tests: validate functionality, protect against regressions, inform design
  • Cost: time to fix tests when they fail
  • Small step #1: disabled API tests after the code was written. Rely on callers to find issues. Make sure to track error rate metrics before and after the change
  • Months later: no observable change, and refactorings were faster. Found unintended coupling
  • Small step #2: added tests to detect coupling and investigate the cause

Closing

  • agility – bring back the fun

Q&A

  • How was it rewriting Pragmatic Programmer for the 20th anniversary edition? Threw out or rewrote about 30% of the tips. Changed references in many others or described them better. The ideas themselves didn’t change. Common sense. Things people know. Seeing it written down validates it. Gave names to concepts that didn’t have them.
  • Example of storytelling of what we do? Use point of view of customer using product
  • How change more? Get others to make small changes too

My take

Awesome keynote. It was so nice to meet Dave in person. It was great seeing his perspective. The material is helpful in effecting positive change. The info and examples are great. Missed a little of the end of Q&A to set up for my session

[2024 dev2next] gen ai panel

Intro and one of the mic runners: Ixchel Ruiz

Panel:  Frank Greco,  Brian Sletten, Neal Ford, Micah Silverman

For more see the table of contents


Opening remarks

  • The people who made money in the gold rush made the tools
  • We’re supposed to be the experts; start using
  • AI is not going to take your job, but the person who knows how to use AI well will take your job
  • Snyk has a mandate that devs should be using AI at some point every day
  • If you think this isn’t going to change your job, you aren’t paying attention.
  • Tech journalism is dead; don’t be a credulous idiot and believe PR
  • 28% task completion in tasks for junior devs and none for seniors
  • 41% increase in rewrites
  • Radical decrease in moving code around. Keeps adding code, not looking for reuse.
  • Don’t want junior devs adding garbage code nobody needs

Q&A

  • Language model selection? Hard question. Need to build something quickly and iterate. A lot of moving parts in a RAG system. Consider cost (the latest and greatest is more expensive; more work at the inference stage costs more). Excel is an abstraction, and we know how it works. LLMs are nondeterministic black boxes.
  • Tech debt? Need guardrails to protect against blasting out code. Need to know whether code came from a junior dev, a senior dev, or AI. Devs are more trusting of code from LLMs and falsely believe it is more secure. In the past, devs were more pessimistic and verified more. Don’t yet have best practices for probabilistic systems.
  • Have to be careful with prompts: natural language vs a Turing-complete language. Is it a new programming language that looks like natural language? In the near future, will have an LLM whisperer on each team.
  • News article where someone tried to get ChatGPT to admit sentience? We anthropomorphize. The Turing test is not enough. The ARC test (https://lab42.global/arc/) (Abstraction and Reasoning Corpus) is working on a better approach: solving problems by recognizing new patterns. In the field, people are saying LLMs are not the future. In the 70s, Eliza “showed” computers could talk, but it was really a parlor trick. Currently in an AI hype cycle. Need people who understand the limits of the mind / the limits of what is possible. Not on the verge of generalized AI/sentience. May not be in our lifetime, if even possible. Plenty of natural stupidity.
  • LLMs are trained on internet text and generate vast amounts of text, which gets put on the internet. When will the internet be polluted to the point models can’t be trained on it? We’re on the cusp. “Dead internet” theory: generated content exceeds human content. What happens when people don’t create new poems and creativity? “AI has taught me to believe in a soul because I’ve seen art created without it”
  • Definition of AGI? The ARC competition. Referenced books on why we’re not on the cusp [didn’t really answer]
  • Reused joke from yesterday about the real changes are in AV, not in AI (when the mic didn’t work)
  • AI fundamentally lies to us; we call them hallucinations. Companies say LLMs wouldn’t exist if they couldn’t break copyright laws, and 2/3 of ChatGPT users present results as their own work. Integrity of crypto bros? The industry has ignored ethics for so long. Physicists were portrayed in fiction as evil once they could destroy the world. We are next as the bad guy. YouTube is on an apology tour for dumbing down culture. Project Nightshade lets artists poison art to confuse LLMs. [it adds a pixel layer so images are categorized incorrectly]
  • AI tools that help with the daily work cycle? AssemblyAI API to create transcripts from audio, Copilot, Codium (suggests tests). Warp terminal – creates regexes from plain English.
  • Prove work was made by a human vs AI? The false positive rate is too high. Need ethics. Ex: should have musicians making music. A teacher added white-on-white text to a test question so they could tell if students were cheating. Unfortunately that’s not sustainable or scalable
  • Open source LLMs: different from what we’re used to with open source. The key is the data. Not truly open source if you don’t say where the data legally comes from. Chain of thought makes beefing up the model less important due to post-processing. Asymmetric power between big tech companies and others. How to compete? Microsoft doesn’t have a Windows dept; it is spread out over different departments.
  • Hiring changing the narrative, especially for recent grads? AI or other job market trends? Yes. Bias if trained on resumes with western names or traditional education. There will be fallout. Recent story: a manager submitted their own resume and it got rejected; they fired HR. AI is the new electricity in that we just expect it. The difference is we understood electricity before we started using it. Hype cycle where AI needs to be on your resume to get attention. In NY, big companies are hiring junior people expecting AI to help them out. The market for senior folks is dead.

Closing thoughts

  • Learn things. Mediocre people trying to use AI for competitive advantage. Use as tool to be better.
  • Education. AI is not a search engine. Don’t use it as one.

My take

The format was audience Q&A. I enjoyed reading about the ARC project and Nightshade. Great audience questions and great end to the day.