Azure AI Search
Azure AI
Search
(formerly known as Azure Search
and Azure Cognitive Search
) is a
cloud search service that gives developers infrastructure, APIs, and
tools for building a rich search experience over private, heterogeneous
content in web, mobile, and enterprise applications.
Install Azure AI Search SDK
%pip install --upgrade --quiet azure-search-documents
%pip install --upgrade --quiet azure-identity
Import required librariesβ
import os
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_openai import OpenAIEmbeddings
Configure OpenAI settingsβ
Configure the OpenAI settings to use Azure OpenAI or OpenAI
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "YOUR_OPENAI_ENDPOINT"
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"
model: str = "text-embedding-ada-002"
Configure vector store settingsβ
Set up the vector store settings using environment variables:
vector_store_address: str = "YOUR_AZURE_SEARCH_ENDPOINT"
vector_store_password: str = "YOUR_AZURE_SEARCH_ADMIN_KEY"
Create embeddings and vector store instancesβ
Create instances of the OpenAIEmbeddings and AzureSearch classes:
embeddings: OpenAIEmbeddings = OpenAIEmbeddings(deployment=model, chunk_size=1)
index_name: str = "langchain-vector-demo"
vector_store: AzureSearch = AzureSearch(
azure_search_endpoint=vector_store_address,
azure_search_key=vector_store_password,
index_name=index_name,
embedding_function=embeddings.embed_query,
)
Insert text and embeddings into vector storeβ
Add texts and metadata from the JSON data to the vector store:
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
loader = TextLoader("../../modules/state_of_the_union.txt", encoding="utf-8")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
vector_store.add_documents(documents=docs)
Perform a vector similarity searchβ
Execute a pure vector similarity search using the similarity_search() method:
# Perform a similarity search
docs = vector_store.similarity_search(
query="What did the president say about Ketanji Brown Jackson",
k=3,
search_type="similarity",
)
print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youβre at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, Iβd like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyerβan Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nationβs top legal minds, who will continue Justice Breyerβs legacy of excellence.
Perform a vector similarity search with relevance scoresβ
Execute a pure vector similarity search using the similarity_search_with_relevance_scores() method:
docs_and_scores = vector_store.similarity_search_with_relevance_scores(
query="What did the president say about Ketanji Brown Jackson",
k=4,
score_threshold=0.80,
)
from pprint import pprint
pprint(docs_and_scores)
[(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youβre at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, Iβd like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyerβan Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nationβs top legal minds, who will continue Justice Breyerβs legacy of excellence.', metadata={'source': 'C:\\repos\\langchain-fruocco-acs\\langchain\\docs\\extras\\modules\\state_of_the_union.txt'}),
0.8441472),
(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youβre at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, Iβd like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyerβan Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nationβs top legal minds, who will continue Justice Breyerβs legacy of excellence.', metadata={'source': 'C:\\repos\\langchain-fruocco-acs\\langchain\\docs\\extras\\modules\\state_of_the_union.txt'}),
0.8441472),
(Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since sheβs been nominated, sheβs received a broad range of supportβfrom the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n\nWe can do both. At our border, weβve installed new technology like cutting-edge scanners to better detect drug smuggling. \n\nWeβve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n\nWeβre putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n\nWeβre securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': 'C:\\repos\\langchain-fruocco-acs\\langchain\\docs\\extras\\modules\\state_of_the_union.txt'}),
0.82153815),
(Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since sheβs been nominated, sheβs received a broad range of supportβfrom the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n\nWe can do both. At our border, weβve installed new technology like cutting-edge scanners to better detect drug smuggling. \n\nWeβve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n\nWeβre putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n\nWeβre securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': 'C:\\repos\\langchain-fruocco-acs\\langchain\\docs\\extras\\modules\\state_of_the_union.txt'}),
0.82153815)]
Perform a Hybrid Searchβ
Execute hybrid search using the search_type or hybrid_search() method:
# Perform a hybrid search
docs = vector_store.similarity_search(
query="What did the president say about Ketanji Brown Jackson",
k=3,
search_type="hybrid",
)
print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youβre at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, Iβd like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyerβan Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nationβs top legal minds, who will continue Justice Breyerβs legacy of excellence.
# Perform a hybrid search
docs = vector_store.hybrid_search(
query="What did the president say about Ketanji Brown Jackson", k=3
)
print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youβre at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, Iβd like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyerβan Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nationβs top legal minds, who will continue Justice Breyerβs legacy of excellence.
Create a new index with custom filterable fields
from azure.search.documents.indexes.models import (
ScoringProfile,
SearchableField,
SearchField,
SearchFieldDataType,
SimpleField,
TextWeights,
)
embeddings: OpenAIEmbeddings = OpenAIEmbeddings(deployment=model, chunk_size=1)
embedding_function = embeddings.embed_query
fields = [
SimpleField(
name="id",
type=SearchFieldDataType.String,
key=True,
filterable=True,
),
SearchableField(
name="content",
type=SearchFieldDataType.String,
searchable=True,
),
SearchField(
name="content_vector",
type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
searchable=True,
vector_search_dimensions=len(embedding_function("Text")),
vector_search_configuration="default",
),
SearchableField(
name="metadata",
type=SearchFieldDataType.String,
searchable=True,
),
# Additional field to store the title
SearchableField(
name="title",
type=SearchFieldDataType.String,
searchable=True,
),
# Additional field for filtering on document source
SimpleField(
name="source",
type=SearchFieldDataType.String,
filterable=True,
),
]
index_name: str = "langchain-vector-demo-custom"
vector_store: AzureSearch = AzureSearch(
azure_search_endpoint=vector_store_address,
azure_search_key=vector_store_password,
index_name=index_name,
embedding_function=embedding_function,
fields=fields,
)
Perform a query with a custom filter
# Data in the metadata dictionary with a corresponding field in the index will be added to the index
# In this example, the metadata dictionary contains a title, a source and a random field
# The title and the source will be added to the index as separate fields, but the random won't. (as it is not defined in the fields list)
# The random field will be only stored in the metadata field
vector_store.add_texts(
["Test 1", "Test 2", "Test 3"],
[
{"title": "Title 1", "source": "A", "random": "10290"},
{"title": "Title 2", "source": "A", "random": "48392"},
{"title": "Title 3", "source": "B", "random": "32893"},
],
)
res = vector_store.similarity_search(query="Test 3 source1", k=3, search_type="hybrid")
res
[Document(page_content='Test 3', metadata={'title': 'Title 3', 'source': 'B', 'random': '32893'}),
Document(page_content='Test 1', metadata={'title': 'Title 1', 'source': 'A', 'random': '10290'}),
Document(page_content='Test 2', metadata={'title': 'Title 2', 'source': 'A', 'random': '48392'})]
res = vector_store.similarity_search(
query="Test 3 source1", k=3, search_type="hybrid", filters="source eq 'A'"
)
res
[Document(page_content='Test 1', metadata={'title': 'Title 1', 'source': 'A', 'random': '10290'}),
Document(page_content='Test 2', metadata={'title': 'Title 2', 'source': 'A', 'random': '48392'})]
Create a new index with a Scoring Profile
from azure.search.documents.indexes.models import (
FreshnessScoringFunction,
FreshnessScoringParameters,
ScoringProfile,
SearchableField,
SearchField,
SearchFieldDataType,
SimpleField,
TextWeights,
)
embeddings: OpenAIEmbeddings = OpenAIEmbeddings(deployment=model, chunk_size=1)
embedding_function = embeddings.embed_query
fields = [
SimpleField(
name="id",
type=SearchFieldDataType.String,
key=True,
filterable=True,
),
SearchableField(
name="content",
type=SearchFieldDataType.String,
searchable=True,
),
SearchField(
name="content_vector",
type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
searchable=True,
vector_search_dimensions=len(embedding_function("Text")),
vector_search_configuration="default",
),
SearchableField(
name="metadata",
type=SearchFieldDataType.String,
searchable=True,
),
# Additional field to store the title
SearchableField(
name="title",
type=SearchFieldDataType.String,
searchable=True,
),
# Additional field for filtering on document source
SimpleField(
name="source",
type=SearchFieldDataType.String,
filterable=True,
),
# Additional data field for last doc update
SimpleField(
name="last_update",
type=SearchFieldDataType.DateTimeOffset,
searchable=True,
filterable=True,
),
]
# Adding a custom scoring profile with a freshness function
sc_name = "scoring_profile"
sc = ScoringProfile(
name=sc_name,
text_weights=TextWeights(weights={"title": 5}),
function_aggregation="sum",
functions=[
FreshnessScoringFunction(
field_name="last_update",
boost=100,
parameters=FreshnessScoringParameters(boosting_duration="P2D"),
interpolation="linear",
)
],
)
index_name = "langchain-vector-demo-custom-scoring-profile"
vector_store: AzureSearch = AzureSearch(
azure_search_endpoint=vector_store_address,
azure_search_key=vector_store_password,
index_name=index_name,
embedding_function=embeddings.embed_query,
fields=fields,
scoring_profiles=[sc],
default_scoring_profile=sc_name,
)
# Adding same data with different last_update to show Scoring Profile effect
from datetime import datetime, timedelta
today = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S-00:00")
yesterday = (datetime.utcnow() - timedelta(days=1)).strftime("%Y-%m-%dT%H:%M:%S-00:00")
one_month_ago = (datetime.utcnow() - timedelta(days=30)).strftime(
"%Y-%m-%dT%H:%M:%S-00:00"
)
vector_store.add_texts(
["Test 1", "Test 1", "Test 1"],
[
{
"title": "Title 1",
"source": "source1",
"random": "10290",
"last_update": today,
},
{
"title": "Title 1",
"source": "source1",
"random": "48392",
"last_update": yesterday,
},
{
"title": "Title 1",
"source": "source1",
"random": "32893",
"last_update": one_month_ago,
},
],
)
['NjQyNTI5ZmMtNmVkYS00Njg5LTk2ZDgtMjM3OTY4NTJkYzFj',
'M2M0MGExZjAtMjhiZC00ZDkwLThmMTgtODNlN2Y2ZDVkMTMw',
'ZmFhMDE1NzMtMjZjNS00MTFiLTk0MTEtNGRkYjgwYWQwOTI0']
res = vector_store.similarity_search(query="Test 1", k=3, search_type="similarity")
res
[Document(page_content='Test 1', metadata={'title': 'Title 1', 'source': 'source1', 'random': '10290', 'last_update': '2023-07-13T10:47:39-00:00'}),
Document(page_content='Test 1', metadata={'title': 'Title 1', 'source': 'source1', 'random': '48392', 'last_update': '2023-07-12T10:47:39-00:00'}),
Document(page_content='Test 1', metadata={'title': 'Title 1', 'source': 'source1', 'random': '32893', 'last_update': '2023-06-13T10:47:39-00:00'})]