[2024 dev2next] improving llm results with rag

Speaker: Brian Sletten (bsletten@mastodon.social)

For more see the table of contents

PDF of deck on Dropbox


Notes

  • Problem: the information is there in our language, but the models were insufficient to extract it
  • Important to capture sequences – ex: context window
  • Problems with word2vec and other embedding approaches: sequences lost impact if they got too long. Pairs like “New York” and “Soviet Union” work because the words are near each other; words farther apart are harder to predict
  • Next, the transformer architecture used layers of “attention” to get more detailed views between/across sentences
  • Encode into a lower-dimensional space and decode into a higher-dimensional space
  • Positional encoding of words in sentences picks up some nuance. Attention involves quadratic calculations, but they parallelize, so it’s fast (see the formula after this list)
  • Expensive to create a model. Still expensive but less so to tune it
  • Types of RAG: Naive, Advanced, Modular
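
For reference (the talk mentioned the quadratic cost but not the formula; this is the standard scaled dot-product attention from the transformer paper, not from the deck):

    \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V

The QK^T product compares every token with every other token, which is where the quadratic cost in sequence length comes from; those matrix multiplications parallelize well, which is why it’s still fast in practice.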

Emergent behavior

  • Not magical or sentient
  • Avoids the need to retrain all the time
  • Use linguistic skills, not knowledge skills
  • Chain of thought prompting

Causes of hallucinations

  • No logic engine/no way to evaluate correctness
  • Language engine with a stochastic element to avoid memorizing and encourage novelty

Example

  • Showed how the model can access the web.
  • Allows you to summarize current news stories
  • Note: you can include the output format in the prompt: as JSON, as CSV, in bulleted list format, etc. (see the sketch after this list)
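
A minimal sketch of putting the output format in the prompt, in Python (my example, not the speaker’s code; assumes the openai package, an OPENAI_API_KEY environment variable, and the model name is an assumption):

    # Ask for a specific output format directly in the prompt.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[{
            "role": "user",
            "content": "Summarize today's top three tech news stories. "
                       "Respond as JSON with 'headline' and 'summary' fields.",
        }],
    )
    print(response.choices[0].message.content)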

Options

  • Basic model
  • Fine-tuning – adjusting the model’s parameters
  • Prompt engineering

RAG

  • Allows bringing in a lot of custom data
  • Works with vector databases (see the sketch after this list)
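
A minimal naive-RAG sketch in Python (my reconstruction, not from the deck): embed the chunks, find the nearest one to the question, and stuff it into the prompt. The toy chunks and model names are assumptions; a real system would use a vector database for the search step.

    # Naive RAG: embed, retrieve the nearest chunk, answer from that context.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    chunks = [
        "RAG stands for retrieval-augmented generation.",
        "Vector databases store embeddings for similarity search.",
    ]

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    chunk_vecs = embed(chunks)
    question = "What does RAG stand for?"
    q_vec = embed([question])[0]

    # Cosine similarity against every chunk (fine for a toy corpus).
    scores = chunk_vecs @ q_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    context = chunks[int(np.argmax(scores))]

    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Answer using only this context:\n{context}\n\n"
                       f"Question: {question}",
        }],
    )
    print(answer.choices[0].message.content)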

Searching

  • Find the relevant portion of the data, then do a k-d tree nearest-neighbor search
  • Inverted index
  • Hierarchical Navigable Small Worlds (HNSW) – start with a coarse search in the top layers, then a detailed search in the lower layers
  • Like taking the express train, then switching to the local, in a city
  • Can find docs that mention a keyword and then use those docs to answer questions
  • Want to minimize long contexts because they cost lots of tokens
  • Chunking makes docs smaller so you pay less per search – LlamaIndex provides an API to chunk (see the HNSW sketch after this list)
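
An HNSW sketch using the hnswlib package (my example; the talk didn’t show this code). The sparse upper layers of the graph act like the express train, the bottom layer like the local:

    # Approximate nearest-neighbor search with an HNSW index.
    import hnswlib
    import numpy as np

    dim = 384  # e.g., the size of a sentence-embedding vector
    vectors = np.random.rand(10_000, dim).astype(np.float32)  # stand-in embeddings

    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=len(vectors), ef_construction=200, M=16)
    index.add_items(vectors, np.arange(len(vectors)))
    index.set_ef(50)  # search-time speed/accuracy trade-off

    query = np.random.rand(dim).astype(np.float32)
    labels, distances = index.knn_query(query, k=5)
    print(labels[0], distances[0])  # ids and distances of the 5 nearest vectors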

Limitations of Naive RAG Models

  • Issues with precision and recall: misaligned chunks, irrelevant or missing chunks
  • Can still hallucinate if the answer isn’t backed by the retrieved chunks
  • Still have toxicity and bias problems

Chaining

  • Initial response
  • Constitutional principle – showed how to add ethics/legality checks that rewrite the response
  • Constitutional principle – added a rewrite targeting a 7th-grade reading level
  • That gives the final response (see the sketch after this list)
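
A sketch of that chain in plain Python (my reconstruction; the demo may have used a framework, and the principle wording here is hypothetical):

    # Chain: initial response -> ethics/legality rewrite -> 7th-grade rewrite.
    from openai import OpenAI

    client = OpenAI()

    def ask(prompt):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    initial = ask("How do I get ahead of my competitors?")

    # Constitutional principle 1 (hypothetical wording): ethical and legal only.
    ethical = ask(
        "Rewrite this answer so it only suggests ethical, legal actions:\n" + initial
    )

    # Constitutional principle 2 (hypothetical wording): 7th-grade readability.
    final = ask("Rewrite this so a 7th grader could understand it:\n" + ethical)
    print(final)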

Security

  • Easy to poison data
  • Need data cleansing, but a cleverer kind
  • http://berryvilleiml.com – machine learning security

Reference: https://www.louisbouchard.ai/top-rag-techniques/

My take

I learned a bunch and the examples were fun. Good to see the code and output. My brain filled up during the session. I needed to switch to an easier talk for the final 5:30 session as I didn’t have enough focus left. Cool how the answer to security was a different deck!

[2024 dev2next] customgpts

Speaker: Ken Kousen @kenkousen

For more see the table of contents


Notes

  • Goal: Customize ChatGPT without coding
  • Useful for virtual assistants, automate repetitive tasks and shape AI behavior (within limits)
  • GPT Builder: create a profile picture, specify leading questions, upload files (useful for your own info or for things that changed since GPT was last trained), enable the code interpreter, publish via a link or the public “GPT Store”
  • The GPT Store is just a public link and search. There’s no money

chatgpt.com

  • Explore GPTs in left nav
  • Can search for GPTs or browse. Lots are available.
  • 4–5 million custom GPTs. The bar to creating one is very low
  • Ken made a Pragmatic Assistant with rules from a PDF and editor rules. Ran chapters through the guide. [Janeice made a code version of this for our book; no AI though, it long predated ChatGPT]
  • Like a skin on ChatGPT

Demo

  • CustomGPT with text from Venkat’s Agile book
  • Lets you choose how it communicates – formal, casual, pirate speak, Shakespeare, etc. [I’ve used ChatGPT for western-themed stuff for coderanch a few times]
  • Configure tab – description, instructions for GPT to use (like trying to sell the book), 4 generated conversation starters so users see some prompts.
  • Click on upload files to give training data (in this case Venkat’s book). Takes care of all the RAG steps. Limit on number and size of attachments. Was able to upload 8 books.
  • Give it capabilities like web browsing to use certain sites, DALL-E, code interpreter
  • The instructions box is misleading, as it looks like you have way more room than you actually do
  • Can keep private, sharing via the link or via the GPT Store
  • Gets Ken’s name wrong. Loses the “s”

Trying it out

  • Outputs in Markdown
  • Did try to sell the book per the instructions

Stats

  • 10 files per GPT
  • Text, spreadsheets, presentations, images, etc.
  • 2 million tokens per file
  • 20MB for images

“Security”

  • GPT may share file contents.
  • Files can be downloaded when Code Interpreter is enabled
  • [I experimented with Venbot: had it tell me the books available, the table of contents of one, the sections in ch 1, and the full text of two sections. Then I ran out of free tokens for a few hours. When I asked for all of ch 1, it gave me a little and prompted me to read the book]
  • The only reference to copyright is in the docs, about using freely available materials

Venbot

  • https://chatgpt.com/g/g-LsSBgJX2D-venbot-5000
  • Demo was fun
  • Cool that it figured out the weather from the location

Actions

  • GPTs allow you to define Actions – calls to an external API, published via an OpenAPI spec (what we used to call Swagger); see the sketch after this list
  • Doesn’t work that well. Easier to write code.
  • ActionsGPT can help generate
  • Many limitations: no custom headers, must be the same domain (except the Google, Microsoft, and Adobe OAuth domains), 100K request/response limit, 45-second timeout, text only
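
A sketch of publishing an API that an Action could call, using FastAPI (my example; the endpoint and fields are hypothetical). FastAPI generates the OpenAPI spec automatically at /openapi.json, which is what the Action definition needs:

    # A tiny API whose auto-generated OpenAPI spec can back a GPT Action.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI(title="Weather Action", version="1.0.0")

    class Weather(BaseModel):
        location: str
        forecast: str

    @app.get("/weather/{location}", response_model=Weather,
             operation_id="getWeather", summary="Get a forecast for a location")
    def get_weather(location: str) -> Weather:
        # Real logic would call a weather service; this is a stub.
        return Weather(location=location, forecast="sunny, 72F")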

Code version

  • The website approach was no-code
  • Showed using langchain4j
  • Can write your own logic, and langchain4j will identify it and call it (see the sketch after this list)
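
The demo used langchain4j (Java); here is the same idea in Python with the OpenAI tools API, to keep one language across these notes (function name and schema are hypothetical). You describe your function, the model decides when to call it, and your own code runs it with the arguments the model supplies:

    # Function/tool calling: the model picks the function and arguments.
    import json
    from openai import OpenAI

    client = OpenAI()

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function
            "description": "Get the forecast for a location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What's the weather in Denver?"}],
        tools=tools,
    )

    call = resp.choices[0].message.tool_calls[0]
    args = json.loads(call.function.arguments)
    print(call.function.name, args)  # your own logic would run here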

Problem

  • Custom GPT has name on it
  • When wrong, looks like you did it

Competitors

  • Claude AI – can only share with teammates (via a team subscription), limited number of resources. Two books fit. Can’t access the internet.
  • NotebookLM from Google – can generate stuff like a read-only FAQ based on Gemini. Can generate a study guide (short/long essay questions with answers). Also has an audio overview. Meant to be a study tool for docs, websites, and YouTube (transcripts)

My take

It was cool seeing a demo. Also, the interaction with Dave Thomas for Pragmatic prompts was fun. I played some with Venbot and some other GPTs while Ken was talking, which was also fun. Especially getting it to give me parts of the book.