Commit 934f147

author
Nupur Lal
committed
new recipe notebook for vectordistance function
1 parent 50f7093 commit 934f147

1 file changed

+316
-0
lines changed

@@ -0,0 +1,316 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "bc549e6c-0cc4-4188-94a3-a9bdd3ae3dfa",
6+
"metadata": {},
7+
"source": [
8+
"<header>\n",
9+
" <p style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>\n",
10+
" VectorDistance function in Vantage\n",
11+
" <br>\n",
12+
" <img id=\"teradata-logo\" src=\"https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg\" alt=\"Teradata\" style=\"width: 125px; height: auto; margin-top: 20pt;\">\n",
13+
" </p>\n",
14+
"</header>"
15+
]
16+
},
17+
{
18+
"cell_type": "markdown",
19+
"id": "7ae7611a-0795-4168-b716-01fee6880cbd",
20+
"metadata": {},
21+
"source": [
22+
"<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>\n",
23+
"<p style = 'font-size:16px;font-family:Arial'>VectorDistance computes the similarity or dissimilarity between two vectors in multi-dimensional space. It also supports distance and similarity computation for embeddings stored in the Vector data type. The distance between vectors is calculated using a distance metric such as Euclidean, Manhattan, DotProduct, Minkowski, or Cosine. The function takes a table of target vectors and a table of reference vectors and returns a table that contains the distance for each target-reference pair. If you provide only one table as input, the function takes both the target vectors and the reference vectors from that same table.<br> In this notebook, we show how to use the VectorDistance function available in Vantage.</p>"
24+
]
25+
},
26+
{
27+
"cell_type": "markdown",
28+
"id": "6b3a00b4-6661-4c91-9b2d-cb7b0b403140",
29+
"metadata": {},
30+
"source": [
31+
"<hr style=\"height:2px;border:none;\">\n",
32+
"<b style = 'font-size:20px;font-family:Arial'>1. Initiate a connection to Vantage</b>"
33+
]
34+
},
35+
{
36+
"cell_type": "markdown",
37+
"id": "2346857f-e0d3-488a-8a3f-ac6dff752c2b",
38+
"metadata": {},
39+
"source": [
40+
"<p style = 'font-size:16px;font-family:Arial'>In this section, we import the required libraries and set environment variables and environment paths (if required).</p>"
41+
]
42+
},
43+
{
44+
"cell_type": "code",
45+
"execution_count": null,
46+
"id": "c5af5af3-29d5-4f6a-8334-9df6924e7787",
47+
"metadata": {},
48+
"outputs": [],
49+
"source": [
50+
"from teradataml import *\n",
51+
"\n",
52+
"# Modify the following to match the specific client environment settings\n",
53+
"display.max_rows = 5"
54+
]
55+
},
56+
{
57+
"cell_type": "markdown",
58+
"id": "ad3dd7b4-831c-4fb3-ab71-719c8c99a71c",
59+
"metadata": {},
60+
"source": [
61+
"<hr style=\"height:1px;border:none;\">\n",
62+
"<p style = 'font-size:18px;font-family:Arial'><b>1.1 Connect to Vantage</b></p>\n",
63+
"<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>"
64+
]
65+
},
66+
{
67+
"cell_type": "code",
68+
"execution_count": null,
69+
"id": "2742444c-4349-4b0f-b4e5-b068a8785cd9",
70+
"metadata": {},
71+
"outputs": [],
72+
"source": [
73+
"%run -i ../../UseCases/startup.ipynb\n",
74+
"eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)\n",
75+
"print(eng)"
76+
]
77+
},
78+
{
79+
"cell_type": "code",
80+
"execution_count": null,
81+
"id": "e14915b0-7932-4e03-94ba-20f0599c3707",
82+
"metadata": {},
83+
"outputs": [],
84+
"source": [
85+
"%%capture\n",
86+
"execute_sql('''SET query_band='DEMO=PP_VectorDistance_Python.ipynb;' UPDATE FOR SESSION; ''')"
87+
]
88+
},
89+
{
90+
"cell_type": "markdown",
91+
"id": "efe2fd2d-63ff-4278-9157-8b9110d682e8",
92+
"metadata": {},
93+
"source": [
94+
"<p style = 'font-size:16px;font-family:Arial'>Begin running steps with Shift + Enter keys. </p>"
95+
]
96+
},
97+
{
98+
"cell_type": "markdown",
99+
"id": "f003f332-7489-4bdd-a740-4af2a0a22280",
100+
"metadata": {},
101+
"source": [
102+
"<hr style='height:1px;border:none;'>\n",
103+
"\n",
104+
"<p style = 'font-size:18px;font-family:Arial'><b>1.2 Getting Data for This Demo</b></p>\n",
105+
"\n",
106+
"<p style = 'font-size:16px;font-family:Arial'>Here, we load the example data that is available in the teradataml library and use it to demonstrate the function.</p>"
107+
]
108+
},
109+
{
110+
"cell_type": "code",
111+
"execution_count": null,
112+
"id": "45c86176-734c-4b1c-ace0-d0c88657b4f8",
113+
"metadata": {},
114+
"outputs": [],
115+
"source": [
116+
"load_example_data(\"vectordistance\", [\"target_mobile_data_dense\", \"ref_mobile_data_dense\"])"
117+
]
118+
},
119+
{
120+
"cell_type": "markdown",
121+
"id": "2401d6d3-4fcd-46fc-8a94-7cafcd1258b0",
122+
"metadata": {},
123+
"source": [
124+
"<p style = 'font-size:16px;font-family:Arial'>Next is an optional step – if you want to see the status of databases/tables created and space used.</p>"
125+
]
126+
},
127+
{
128+
"cell_type": "code",
129+
"execution_count": null,
130+
"id": "87429200-db02-450d-9472-4d1e2030124d",
131+
"metadata": {},
132+
"outputs": [],
133+
"source": [
134+
"%run -i ../../UseCases/run_procedure.py \"call space_report();\" # Takes 10 seconds"
135+
]
136+
},
137+
{
138+
"cell_type": "markdown",
139+
"id": "2a3762ac-ba27-4fa3-adba-d577262a4290",
140+
"metadata": {},
141+
"source": [
142+
"<hr style=\"height:2px;border:none;\">\n",
143+
"<b style = 'font-size:20px;font-family:Arial'>2. Data Exploration</b>\n",
144+
"<p style = 'font-size:16px;font-family:Arial'>Create a \"Virtual DataFrame\" that points to the data set in Vantage. Check the shape of the DataFrame as well as the data type of each of its columns.</p>"
145+
]
146+
},
147+
{
148+
"cell_type": "code",
149+
"execution_count": null,
150+
"id": "3d936fab-7ca7-4e94-ba64-95c1da08b74f",
151+
"metadata": {},
152+
"outputs": [],
153+
"source": [
154+
"target_mobile_data_dense=DataFrame(\"target_mobile_data_dense\")\n",
155+
"ref_mobile_data_dense=DataFrame(\"ref_mobile_data_dense\")"
156+
]
157+
},
158+
{
159+
"cell_type": "code",
160+
"execution_count": null,
161+
"id": "3c726cb7-02ba-4874-a04c-65a1c67286b9",
162+
"metadata": {},
163+
"outputs": [],
164+
"source": [
165+
"target_mobile_data_dense"
166+
]
167+
},
168+
{
169+
"cell_type": "code",
170+
"execution_count": null,
171+
"id": "b85f3dd0-9c41-4b56-b989-866a195e8c07",
172+
"metadata": {},
173+
"outputs": [],
174+
"source": [
175+
"ref_mobile_data_dense"
176+
]
177+
},
178+
{
179+
"cell_type": "markdown",
180+
"id": "0d0adaf2-461e-48ff-87ce-b6038db8254a",
181+
"metadata": {},
182+
"source": [
183+
"<p style = 'font-size:16px;font-family:Arial'>Let us compute the vector distance between the target and reference datasets.<br>Detailed help is available by passing the function name to Python's built-in help function.</p>"
184+
]
185+
},
186+
{
187+
"cell_type": "code",
188+
"execution_count": null,
189+
"id": "d413a344-a12d-46ae-a27b-6702733387c4",
190+
"metadata": {},
191+
"outputs": [],
192+
"source": [
193+
"help(VectorDistance)"
194+
]
195+
},
196+
{
197+
"cell_type": "code",
198+
"execution_count": null,
199+
"id": "8db7efa9-25d9-4da9-a142-ceeb29a9273e",
200+
"metadata": {},
201+
"outputs": [],
202+
"source": [
203+
"# Compute the Cosine, Euclidean, and Manhattan distances between the target and reference vectors.\n",
204+
"VectorDistance_out = VectorDistance(target_id_column=\"userid\",\n",
205+
" target_feature_columns=['CallDuration', 'DataCounter', 'SMS'],\n",
206+
" ref_id_column=\"userid\",\n",
207+
" ref_feature_columns=['CallDuration', 'DataCounter', 'SMS'],\n",
208+
" distance_measure=['Cosine', 'Euclidean', 'Manhattan'],\n",
209+
" topk=2,\n",
210+
" target_data=target_mobile_data_dense,\n",
211+
" reference_data=ref_mobile_data_dense)\n",
212+
"\n",
213+
"# Print the result DataFrame.\n",
214+
"VectorDistance_out.result"
215+
]
216+
},
217+
{
218+
"cell_type": "markdown",
219+
"id": "151d5db4-29a9-49d9-8a61-d53f9627a294",
220+
"metadata": {},
221+
"source": [
222+
"<hr style=\"height:2px;border:none;\">\n",
223+
"<b style = 'font-size:20px;font-family:Arial'>3. Cleanup</b>"
224+
]
225+
},
226+
{
227+
"cell_type": "markdown",
228+
"id": "a562f058-fb24-4966-a25d-f2960e6ddfb8",
229+
"metadata": {},
230+
"source": [
231+
"<hr style=\"height:1px;border:none;\">\n",
232+
"<p style = 'font-size:18px;font-family:Arial'> <b>Databases and Tables </b></p>\n",
233+
"<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>"
234+
]
235+
},
236+
{
237+
"cell_type": "code",
238+
"execution_count": null,
239+
"id": "e6b3935b-47c2-4a96-bec2-68106d172116",
240+
"metadata": {},
241+
"outputs": [],
242+
"source": [
243+
"db_drop_table(\"target_mobile_data_dense\")"
244+
]
245+
},
246+
{
247+
"cell_type": "code",
248+
"execution_count": null,
249+
"id": "01e9eb6e-70d8-40a7-8386-9fc51c5f5cab",
250+
"metadata": {},
251+
"outputs": [],
252+
"source": [
253+
"db_drop_table(\"ref_mobile_data_dense\")"
254+
]
255+
},
256+
{
257+
"cell_type": "code",
258+
"execution_count": null,
259+
"id": "157fe3d4-4e0e-4d92-b343-9f758f3bf690",
260+
"metadata": {},
261+
"outputs": [],
262+
"source": [
263+
"remove_context()"
264+
]
265+
},
266+
{
267+
"cell_type": "markdown",
268+
"id": "4317a6cf-1479-4aa8-b30a-ee0a3b5231a8",
269+
"metadata": {},
270+
"source": [
271+
"<hr style=\"height:1px;border:none;\">\n",
272+
"<p style = 'font-size:16px;font-family:Arial'><b>Links:</b></p>\n",
273+
"<ul style = 'font-size:16px;font-family:Arial'>\n",
274+
" <li>Teradataml Python reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>\n",
275+
" <li>VectorDistance function reference: <a href = 'https://docs.teradata.com/search/all?query=VectorDistance&content-lang=en-US'>here</a></li>\n",
276+
"</ul>"
277+
]
278+
},
279+
{
280+
"cell_type": "markdown",
281+
"id": "b2dcca28-5de5-44d7-88cb-45a12153b3f8",
282+
"metadata": {},
283+
"source": [
284+
"<footer style=\"padding-bottom:35px; border-bottom:3px solid #91A0Ab\">\n",
285+
" <div style=\"float:left;margin-top:14px\">ClearScape Analytics™</div>\n",
286+
" <div style=\"float:right;\">\n",
287+
" <div style=\"float:left; margin-top:14px\">\n",
288+
" Copyright © Teradata Corporation - 2025. All Rights Reserved\n",
289+
" </div>\n",
290+
" </div>\n",
291+
"</footer>"
292+
]
293+
}
294+
],
295+
"metadata": {
296+
"kernelspec": {
297+
"display_name": "Python 3 (ipykernel)",
298+
"language": "python",
299+
"name": "python3"
300+
},
301+
"language_info": {
302+
"codemirror_mode": {
303+
"name": "ipython",
304+
"version": 3
305+
},
306+
"file_extension": ".py",
307+
"mimetype": "text/x-python",
308+
"name": "python",
309+
"nbconvert_exporter": "python",
310+
"pygments_lexer": "ipython3",
311+
"version": "3.9.10"
312+
}
313+
},
314+
"nbformat": 4,
315+
"nbformat_minor": 5
316+
}
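
For readers of this diff who want to sanity-check the numbers the notebook's VectorDistance call returns, the three metrics it requests can be sketched in plain Python. This is only an illustration, not the Vantage implementation: the helper names (euclidean, manhattan, cosine_distance, top_k), the sample vectors, and the reference ids are all made up here. It also assumes that Cosine means 1 minus cosine similarity and that topk keeps the k closest reference vectors per target; check the VectorDistance function reference for the exact definitions.

```python
import math


def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity (assumed definition; both vectors must be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(target, references, k, metric):
    """Return the k (reference_id, distance) pairs closest to the target."""
    scored = [(ref_id, metric(target, vec)) for ref_id, vec in references.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Made-up (CallDuration, DataCounter, SMS) values, not the teradataml example data.
target = [3.0, 4.0, 0.0]
references = {
    "r1": [3.0, 4.0, 0.0],
    "r2": [6.0, 8.0, 0.0],
    "r3": [0.0, 0.0, 5.0],
}

for name, metric in [("Cosine", cosine_distance),
                     ("Euclidean", euclidean),
                     ("Manhattan", manhattan)]:
    print(name, top_k(target, references, k=2, metric=metric))
```

Note that r2 points in the same direction as the target, so its cosine distance is zero even though its Euclidean and Manhattan distances are not. That difference is the usual reason for requesting several measures in one call.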
