1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240
|
from langchain.document_loaders.csv_loader import CSVLoader loader = CSVLoader(file_path = './example_data/mllb_teams_2012.csv') data = loader.load()
with open('../../state_of_the_union.txt') as f: state_of_the_union = f.read() from langchain.text_splitter import RecursiveCharacterTextSplitter text_splitter = RecursiveCharacterTextSplitter( chunk_size = 100, chunk_overlap = 20, length_function = len, add_start_index = True, )
import os import chromadb from langchain.vectorstores import Chroma from langchain.embeddings import HuggingFaceEmbeddings from langchain.document_transformers import ( LongContextReorder, ) from langchain.chains import StuffDocumentsChain, LLMChain from langchain.prompts import PromptTemplate from langchain.llms import OpenAI
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
texts = [ "Basquetball is a great sport.", "Fly me to the moon is one of my favourite songs.", "The Celtics are my favourite team.", "This is a document about the Boston Celtics", "I simply love going to the movies", "The Boston Celtics won the game by 20 points", "This is just a random text.", "Elden Ring is one of the best games in the last 15 years.", "L. Kornet is one of the best Celtics players.", "Larry Bird was an iconic NBA player.", ]
retriever = Chroma.from_texts(texts, embedding=embeddings).as_retriever( search_kwargs={"k": 10} ) query = "What can you tell me about the Celtics?"
docs = retriever.get_relevant_documents(query) docs
from langchain.embeddings import OpenAIEmbeddings
embeddings_model = OpenAIEmbeddings() embeddings = embeddings_model.embed_documents( [ "Hi there!", "Oh, hello!", "What's your name?", "My friends call me World", "Hello World!" ] ) len(embeddings), len(embeddings[0])
embedded_query = embeddings_model.embed_query("What was the name mentioned in the conversation?") embedded_query[:5]
from langchain.document_loaders import TextLoader from langchain.embeddings.openai import OpenAIEmbeddings from langchain.text_splitter import CharacterTextSplitter from langchain.vectorstores import FAISS
raw_documents = TextLoader('../../../state_of_the_union.txt').load() text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) documents = text_splitter.split_documents(raw_documents) db = FAISS.from_documents(documents, OpenAIEmbeddings())
query = "What did the president say about Ketanji Brown Jackson" docs = db.similarity_search(query) print(docs[0].page_content)
embedding_vector = OpenAIEmbeddings().embed_query(query) docs = db.similarity_search_by_vector(embedding_vector) print(docs[0].page_content)
from abc import ABC, abstractmethod from typing import Any, List from langchain.schema import Document from langchain.callbacks.manager import Callbacks
class BaseRetriever(ABC): ... def get_relevant_documents( self, query: str, *, callbacks: Callbacks = None, **kwargs: Any ) -> List[Document]: """Retrieve documents relevant to a query. Args: query: string to find relevant documents for callbacks: Callback manager or list of callbacks Returns: List of relevant documents """ ...
async def aget_relevant_documents( self, query: str, *, callbacks: Callbacks = None, **kwargs: Any ) -> List[Document]: """Asynchronously get documents relevant to a query. Args: query: string to find relevant documents for callbacks: Callback manager or list of callbacks Returns: List of relevant documents """ ...
from langchain.chains import RetrievalQA from langchain.llms import OpenAI from langchain.document_loaders import TextLoader loader = TextLoader('../state_of_the_union.txt', encoding='utf8') from langchain.indexes import VectorstoreIndexCreator index = VectorstoreIndexCreator().from_loaders([loader]) query = "What did the president say about Ketanji Brown Jackson" index.query_with_sources(query) query = "What did the president say about Ketanji Brown Jackson" index.query_with_sources(query) index.query("Summarize the general content of this document.", retriever_kwargs={"search_kwargs": {"filter": {"source": "../state_of_the_union.txt"}}})
from langchain.vectorstores import Chroma from langchain.document_loaders import WebBaseLoader from langchain.embeddings.openai import OpenAIEmbeddings from langchain.text_splitter import RecursiveCharacterTextSplitter
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/") data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0) splits = text_splitter.split_documents(data)
embedding = OpenAIEmbeddings() vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
from langchain.chat_models import ChatOpenAI from langchain.retrievers.multi_query import MultiQueryRetriever
question = "What are the approaches to Task Decomposition?" llm = ChatOpenAI(temperature=0) retriever_from_llm = MultiQueryRetriever.from_llm( retriever=vectordb.as_retriever(), llm=llm ) unique_docs = retriever_from_llm.get_relevant_documents(query=question) len(unique_docs) from typing import List from langchain import LLMChain from pydantic import BaseModel, Field from langchain.prompts import PromptTemplate from langchain.output_parsers import PydanticOutputParser
class LineList(BaseModel): lines: List[str] = Field(description="Lines of text")
class LineListOutputParser(PydanticOutputParser): def __init__(self) -> None: super().__init__(pydantic_object=LineList)
def parse(self, text: str) -> LineList: lines = text.strip().split("\n") return LineList(lines=lines)
output_parser = LineListOutputParser()
QUERY_PROMPT = PromptTemplate( input_variables=["question"], template="""You are an AI language model assistant. Your task is to generate five different versions of the given user question to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of the distance-based similarity search. Provide these alternative questions seperated by newlines. Original question: {question}""", ) llm = ChatOpenAI(temperature=0)
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)
question = "What are the approaches to Task Decomposition?"
scoring = semantic_similarity + (1.0 - decay_rate) ^ hours_passed
import faiss
from datetime import datetime, timedelta from langchain.docstore import InMemoryDocstore from langchain.embeddings import OpenAIEmbeddings from langchain.retrievers import TimeWeightedVectorStoreRetriever from langchain.schema import Document from langchain.vectorstores import FAISS
embeddings_model = OpenAIEmbeddings()
embedding_size = 1536 index = faiss.IndexFlatL2(embedding_size) vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {}) retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_rate=.0000000000000000000000001, k=1)
|