[2024 dev2next] improving llm results with rag

Speaker: Brian Sletten (bsletten@mastodon.social)

For more see the table of contents

PDF of deck on dropbox

Notes

Problem: info in our language but models insufficient to extract it
Important to capture sequences – ex: context window
Problems with word2vec and other embedding approaches. Sequences lost impact if got too long. “New York” and “Soviet Union” useful since near each other. Words farther apart are harder to predict
Next transformer architecture used levels of “attention” to have more detailed views between/across sentences
Encode in a lower dimensional space and decode into higher dimensional space
Positional encoding of words in sentences – picks up some nuance – some has quadratic calculations, but can parallelize so fast
Expensive to create a model. Still expensive but less so to tune it
Types of RAG: Naive, Advanced, Modular

Emergent behavior

Not magical/sentinence
Avoids need to have to retrain all the time
Use linguistic skills, not knowledge skills
Chain of thought prompting

Causes of hallucinations

No logic engine/no way to evaluate correctness
Language engine with schocastic element to avoid memorizing and encourage novelty

Example

Showed how can access web.
Allows you to summarize current news stories
Note: can include output format in prompt: As Json, As CSV, In bulleted list format, etc

Options

Basic model
Fine tuning – setting parameters
Prompt engineering

RAG

Allows getting a lot of custom data
Work with vector databases

Searching

Find portion of data. Then do kd tree and nearest neighbor search
Invevertible tree
Hierarchical Navigable Small Worlds (HNSW) – start in high dimensional space then detailed search
Like express to local train in a city
Can find docs that mention a keyword and then use those docs to answer questions
Want to minimize long contexts because costs lots of tokens
Chunking makes docs smaller so pay less for search – llama provides API to chunk

Limitations of Naive RAG Models

Issues with precision and recall: misaligned chunks, irrelevant or missing chunks
Can still hallucinate if no backed by the used chunks
Still have toxicity and bias problems

Chaining

Initial response
Constitutional principal – showed how to add ethics/legality and rewrites
Constitutional principal – added rewrite for 7th grader and rewrites
That gives final response

Security

Easy to poison data
Need data cleansing but cleverer
http://berryvilleiml.com – machine learning security

Reference: https://www.louisbouchard.ai/top-rag-techniques/

My take

I learned a bunch and the examples were fun. Good to see the code and output. My brain filled up during the session. I needed to switch to an easier talk for the final 5:30 session as I don’t have enough focus left. Cool how the answer to security was a different deck!

Down Home Country Coding With Scott Selikoff and Jeanne Boyarsky

Java/J2EE Software Development and Technology Discussion Blog

Leave a Reply

Share this:

Leave a Reply