[2024 dev2next] improving llm results with rag

Speaker: Brian Sletten (bsletten@mastodon.social)

For more see the table of contents

PDF of deck on dropbox


Notes

  • Problem: the information is in our language, but models were insufficient to extract it
  • Important to capture sequences – ex: context window
  • Problems with word2vec and other early embedding approaches: sequences lost impact if they got too long. “New York” and “Soviet Union” are useful because the paired words appear near each other; words farther apart are harder to predict (see the embedding-similarity sketch after this list)
  • Next, the transformer architecture used layers of “attention” to build more detailed views within and across sentences
  • Encode into a lower-dimensional space and decode back into a higher-dimensional space
  • Positional encoding of words in sentences picks up some nuance – the attention calculations are quadratic, but they can be parallelized, so it’s fast
  • Expensive to create a model. Still expensive but less so to tune it
  • Types of RAG: Naive, Advanced, Modular
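
To make the embedding idea above concrete, here is a minimal similarity sketch. It assumes the sentence-transformers library and the all-MiniLM-L6-v2 model, which are my choices for illustration, not something from the talk: related phrases land near each other in the vector space.

```python
# Embedding-similarity sketch (my illustration, not the speaker's code).
# Assumes: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model

phrases = ["New York", "the city that never sleeps", "Soviet Union", "Cold War superpower"]
embeddings = model.encode(phrases)  # one fixed-length vector per phrase

# Cosine similarity matrix: higher values mean phrases sit closer in embedding space
print(util.cos_sim(embeddings, embeddings))
```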

Emergent behavior

  • Not magic/sentience
  • Avoids the need to retrain all the time
  • Use linguistic skills, not knowledge skills
  • Chain-of-thought prompting (see the prompt sketch after this list)
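
A quick illustration of chain-of-thought prompting (my example, not from the slides): the only trick is asking the model to show intermediate reasoning before the final answer.

```python
# Chain-of-thought prompt sketch (illustrative example).
question = ("A train leaves at 3:40pm and the trip takes 2 hours and 35 minutes. "
            "When does it arrive?")

prompt = (
    "Answer the question below. Think step by step and show your reasoning, "
    "then give the final answer on its own line prefixed with 'Answer:'.\n\n"
    f"Question: {question}"
)

# Send `prompt` to whatever chat/completions client you use; the
# "think step by step" instruction is what makes it chain-of-thought.
print(prompt)
```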

Causes of hallucinations

  • No logic engine/no way to evaluate correctness
  • Language engine with a stochastic element to avoid memorizing and to encourage novelty

Example

  • Showed how the model can access the web
  • Allows you to summarize current news stories
  • Note: you can include the output format in the prompt: as JSON, as CSV, in bulleted-list format, etc. (see the example below)
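
For the output-format tip, a tiny example prompt (mine, not the speaker's):

```python
# Requesting structured output directly in the prompt (illustrative example).
prompt = (
    "Summarize today's top three technology news stories. "
    "Return the result as JSON: a list of objects with the fields "
    '"title", "source", and "one_sentence_summary".'
)
print(prompt)  # send to your model of choice
```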

Options

  • Basic model
  • Fine-tuning – adjusting parameters
  • Prompt engineering

RAG

  • Allows bringing in a lot of custom data
  • Works with vector databases (see the retrieval sketch after this list)
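
A minimal naive-RAG sketch to show the vector-database flow. It assumes the chromadb library and its default embedding function; the documents, names, and prompt wording are my own placeholders, not the talk's demo.

```python
# Naive RAG sketch: store chunks in a vector database, retrieve the closest ones,
# and stuff them into the prompt. Assumes: pip install chromadb
import chromadb

client = chromadb.Client()                      # in-memory vector database
collection = client.create_collection("docs")   # embeds documents with its default model

chunks = [
    "RAG retrieves relevant chunks and adds them to the prompt.",
    "HNSW is a graph-based approximate nearest-neighbor index.",
    "Fine-tuning adjusts model weights on new training data.",
]
collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

question = "How does RAG reduce hallucinations?"
results = collection.query(query_texts=[question], n_results=2)
context = "\n".join(results["documents"][0])

prompt = (
    "Answer using only the context below. If the context is not enough, say so.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(prompt)  # send this prompt to the model; its answer is grounded in the chunks
```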

Searching

  • Find a portion of the data, then do a k-d tree nearest-neighbor search
  • Inverted file index
  • Hierarchical Navigable Small Worlds (HNSW) – start with a coarse search over the high-dimensional space, then a more detailed local search
  • Like switching from an express train to a local train in a city
  • Can find docs that mention a keyword and then use those docs to answer questions
  • Want to minimize long contexts because they cost lots of tokens
  • Chunking makes docs smaller so you pay less per search – LlamaIndex provides an API to chunk (see the sketch after this list)
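
A rough chunk-then-index sketch. It uses a plain sliding-window chunker (the talk mentioned LlamaIndex's chunking API instead), the hnswlib library for the HNSW index, and a placeholder embed() function that you would swap for a real embedding model.

```python
# Chunk documents, embed the chunks, and search them with an HNSW index.
# Assumes: pip install hnswlib numpy; embed() is a stand-in for a real embedding model.
import hnswlib
import numpy as np

DIM = 64  # embedding dimension (real models use 384, 768, 1536, ...)

def embed(texts):
    """Placeholder embedding: replace with a real model (e.g. sentence-transformers)."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    return rng.standard_normal((len(texts), DIM)).astype(np.float32)

def chunk(text, size=200, overlap=50):
    """Simple sliding-window chunker so each piece stays cheap to embed and retrieve."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

docs = ["..."]  # your source documents
chunks = [c for d in docs for c in chunk(d)]
vectors = embed(chunks)

# HNSW: coarse-to-fine graph search, like taking the express train then the local.
index = hnswlib.Index(space="cosine", dim=DIM)
index.init_index(max_elements=len(chunks), ef_construction=200, M=16)
index.add_items(vectors, np.arange(len(chunks)))
index.set_ef(50)  # search-time accuracy/speed trade-off

labels, distances = index.knn_query(embed(["my question"]), k=min(3, len(chunks)))
for i in labels[0]:
    print(chunks[i])
```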

Limitations of Naive RAG Models

  • Issues with precision and recall: misaligned chunks, irrelevant or missing chunks
  • Can still hallucinate if the answer isn’t backed by the retrieved chunks
  • Still have toxicity and bias problems

Chaining

  • Initial response
  • Constitutional principle – showed how to add an ethics/legality critique, and the model rewrites the response
  • Constitutional principle – added a rewrite-it-for-a-7th-grader principle, and the model rewrites again
  • That gives the final response (see the chaining sketch after this list)
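
A sketch of the chaining flow described above: an initial response, then one critique-and-rewrite pass per "constitutional" principle. The llm() helper is a hypothetical stand-in for a real model call; the talk's demo used an actual library, but the shape is the same.

```python
# Constitutional-style chaining sketch (illustrative; llm() is a hypothetical
# helper that sends a single prompt to your model and returns its text reply).
def llm(prompt: str) -> str:
    return f"[model reply to: {prompt[:60]}...]"  # stub so the sketch runs

PRINCIPLES = [
    "The response must be ethical and must not encourage anything illegal.",
    "The response must be understandable by a 7th grader.",
]

def constitutional_chain(question: str) -> str:
    response = llm(question)  # initial response
    for principle in PRINCIPLES:
        critique = llm(
            f"Critique the response below against this principle:\n{principle}\n\n"
            f"Response:\n{response}"
        )
        response = llm(
            "Rewrite the response so it satisfies the principle, using the critique.\n"
            f"Principle: {principle}\nCritique: {critique}\nResponse:\n{response}"
        )
    return response  # final response after all rewrites

print(constitutional_chain("Explain how vaccines work."))
```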

Security

  • Easy to poison data
  • Need data cleansing, but a cleverer kind
  • http://berryvilleiml.com – machine learning security

Reference: https://www.louisbouchard.ai/top-rag-techniques/

My take

I learned a bunch and the examples were fun. Good to see the code and output. My brain filled up during the session. I needed to switch to an easier talk for the final 5:30 session as I don’t have enough focus left. Cool how the answer to security was a different deck!