Speaker: Brian Sletten (bsletten@mastodon.social)
For more, see the table of contents
Notes
- Problem: info in our language but models insufficient to extract it
- Important to capture sequences – ex: context window
- Problems with word2vec and other embedding approaches: sequences lost impact if they got too long. “New York” and “Soviet Union” work well since the words appear near each other; words farther apart are harder to predict
- Next, the transformer architecture used layers of “attention” to get more detailed views within and across sentences
- Encode in a lower dimensional space and decode into higher dimensional space
- Positional encoding of words in sentences – picks up some nuance. Attention involves quadratic calculations, but they can be parallelized, so it’s fast (see the sketch after this list)
- Expensive to create a model. Still expensive but less so to tune it
- Types of RAG: Naive, Advanced, Modular
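As an illustration of the positional-encoding idea mentioned above, here is a minimal numpy sketch of the sinusoidal scheme from the original transformer paper; the sequence length and model dimension are made-up example values.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, np.newaxis]   # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]        # (1, d_model)
    # Each pair of dimensions gets its own frequency.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])     # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])     # odd dimensions: cosine
    return encoding

# Example: positions for a 10-token sentence in a 16-dimensional model.
print(sinusoidal_positional_encoding(10, 16).shape)  # (10, 16)
```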
Emergent behavior
- Not magic/sentience
- Avoids the need to retrain all the time
- Use linguistic skills, not knowledge skills
- Chain of thought prompting
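A minimal sketch of chain-of-thought prompting, assuming a hypothetical `ask_llm` helper that stands in for whatever model call you are using:

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for your actual model call (OpenAI, a local llama, etc.)."""
    raise NotImplementedError

question = "A train leaves at 3:40pm and the trip takes 2h 35m. When does it arrive?"

# Plain prompt: the model may jump straight to a (possibly wrong) answer.
plain = ask_llm(question)

# Chain-of-thought prompt: ask the model to show intermediate reasoning first.
cot = ask_llm(question + "\nLet's think step by step, then state the final answer.")
```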
Causes of hallucinations
- No logic engine/no way to evaluate correctness
- Language engine with a stochastic element to avoid memorizing and to encourage novelty
Example
- Showed how the model can access the web.
- Allows you to summarize current news stories
- Note: can include the output format in the prompt: as JSON, as CSV, in bulleted list format, etc.
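Along the same lines, a small sketch of steering the output format purely through the prompt; `ask_llm` is the same kind of placeholder model call, and the news prompt is just an example:

```python
import json

def ask_llm(prompt: str) -> str:   # placeholder for your actual model call
    raise NotImplementedError

prompt = """Summarize today's top three news stories.
Return the result as JSON: a list of objects with "headline" and "summary" keys.
Do not include any text outside the JSON."""

raw = ask_llm(prompt)
stories = json.loads(raw)   # fails loudly if the model ignores the requested format
```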
Options
- Basic model
- Fine tuning – setting parameters
- Prompt engineering
RAG
- Allows getting a lot of custom data
- Work with vector databases
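A minimal RAG sketch under stated assumptions: `embed` and `ask_llm` are placeholders for an embedding model and a chat model, and the in-memory list stands in for a real vector database:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call your embedding model and return a vector."""
    raise NotImplementedError

def ask_llm(prompt: str) -> str:
    """Placeholder: call your chat/completion model."""
    raise NotImplementedError

# Index step: embed each chunk of custom data once and keep the vectors.
chunks = ["...custom document chunk 1...", "...chunk 2...", "...chunk 3..."]
vectors = np.stack([embed(c) for c in chunks])

def answer(question: str, k: int = 2) -> str:
    q = embed(question)
    # Cosine similarity against every stored chunk (a vector DB does this at scale).
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    best = [chunks[i] for i in np.argsort(sims)[::-1][:k]]
    context = "\n\n".join(best)
    return ask_llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```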
Searching
- Find a portion of the data, then do a k-d tree / nearest-neighbor search
- Inverted file index (IVF)
- Hierarchical Navigable Small Worlds (HNSW) – start with a coarse, high-level search, then a detailed search (see the sketch after this list)
- Like switching from an express train to a local train in a city
- Can find docs that mention a keyword and then use those docs to answer questions
- Want to minimize long contexts because they cost lots of tokens
- Chunking makes docs smaller, so you pay less for search – llama provides an API to chunk
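A sketch of approximate nearest-neighbor search with HNSW using the hnswlib library; the vectors here are random stand-ins for real embeddings:

```python
import numpy as np
import hnswlib

dim, n = 128, 10_000
data = np.random.rand(n, dim).astype(np.float32)   # stand-in embeddings

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data, np.arange(n))
index.set_ef(50)                                    # higher ef = better recall, slower queries

query = np.random.rand(dim).astype(np.float32)
labels, distances = index.knn_query(query, k=5)     # approximate nearest neighbors
```

And a plain-Python sketch of the chunking idea (libraries such as LlamaIndex ship ready-made splitters; this just shows fixed-size chunks with overlap):

```python
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks so passages aren't cut off abruptly."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```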
Limitations of Naive RAG Models
- Issues with precision and recall: misaligned chunks, irrelevant or missing chunks
- Can still hallucinate if the answer is not backed by the retrieved chunks
- Still have toxicity and bias problems
Chaining
- Initial response
- Constitutional principle – showed how to add ethics/legality checks, and the response gets rewritten
- Constitutional principle – added a rewrite for a 7th grader, and it rewrites again
- That gives final response
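A hedged sketch of that chaining pattern (in the spirit of LangChain's constitutional principles, but not its actual API); `ask_llm` is a placeholder model call and the two principles are example wording, not the ones from the talk:

```python
def ask_llm(prompt: str) -> str:
    """Placeholder: call your chat/completion model."""
    raise NotImplementedError

principles = [
    "Critique the response for anything unethical or illegal, then rewrite it to remove those issues.",
    "Rewrite the response so a 7th grader could understand it.",
]

def chained_answer(question: str) -> str:
    response = ask_llm(question)                      # initial response
    for principle in principles:
        critique = ask_llm(f"Response: {response}\n{principle}\nCritique:")
        response = ask_llm(
            f"Response: {response}\nCritique: {critique}\nRewrite the response accordingly:"
        )
    return response                                   # final response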
Security
- Easy to poison data
- Need data cleansing, but it has to be cleverer than usual
- http://berryvilleiml.com – machine learning security
Reference: https://www.louisbouchard.ai/top-rag-techniques/
My take
I learned a bunch and the examples were fun. Good to see the code and output. My brain filled up during the session. I needed to switch to an easier talk for the final 5:30 session as I don’t have enough focus left. Cool how the answer to security was a different deck!