
Commit ccf8f60

2 parents 083f487 + 616eb08 commit ccf8f60

80 files changed

+5534
-3556
lines changed

.gitignore

+2-1
@@ -1,3 +1,4 @@
+.DS_Store
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
@@ -167,4 +168,4 @@ google-cloud-cli-469.0.0-linux-x86_64.tar.gz
 /backend/src/chunks
 /backend/merged_files
 /backend/chunks
-google-cloud-cli-479.0.0-linux-x86_64.tar.gz
+google-cloud-cli-479.0.0-linux-x86_64.tar.gz

README.md

+23-2
@@ -40,7 +40,7 @@ DIFFBOT_API_KEY="your-diffbot-key"
 
 if you only want OpenAI:
 ```env
-LLM_MODELS="gpt-3.5,gpt-4o"
+LLM_MODELS="diffbot,openai-gpt-3.5,openai-gpt-4o"
 OPENAI_API_KEY="your-openai-key"
 ```
 
@@ -70,6 +70,18 @@ GOOGLE_CLIENT_ID="xxxx"
 
 You can of course combine all (local, youtube, wikipedia, s3 and gcs) or remove any you don't want/need.
 
+### Chat Modes
+
+By default, all of the chat modes are available: vector, graph+vector and graph.
+If no mode is listed in the chat modes variable, all modes will be available:
+```env
+CHAT_MODES=""
+```
+
+If, however, you want to enable only specific modes (for example only vector mode or only graph mode), list them in the env:
+```env
+CHAT_MODES="vector,graph+vector"
+```
 
 #### Running Backend and Frontend separately (dev environment)
 Alternatively, you can run the backend and frontend separately:
@@ -134,12 +146,21 @@ Allow unauthenticated request : Yes
 | BACKEND_API_URL | Optional | http://localhost:8000 | URL for backend API |
 | BLOOM_URL | Optional | https://workspace-preview.neo4j.io/workspace/explore?connectURL={CONNECT_URL}&search=Show+me+a+graph&featureGenAISuggestions=true&featureGenAISuggestionsInternal=true | URL for Bloom visualization |
 | REACT_APP_SOURCES | Optional | local,youtube,wiki,s3 | List of input sources that will be available |
-| LLM_MODELS | Optional | diffbot,gpt-3.5,gpt-4o | Models available for selection on the frontend, used for entities extraction and Q&A Chatbot |
+| LLM_MODELS | Optional | diffbot,openai-gpt-3.5,openai-gpt-4o | Models available for selection on the frontend, used for entities extraction and Q&A |
+| CHAT_MODES | Optional | vector,graph+vector,graph | Chat modes available for Q&A |
 | ENV | Optional | DEV | Environment variable for the app |
 | TIME_PER_CHUNK | Optional | 4 | Time per chunk for processing |
 | CHUNK_SIZE | Optional | 5242880 | Size of each chunk of file for upload |
 | GOOGLE_CLIENT_ID | Optional | | Client ID for Google authentication |
 | GCS_FILE_CACHE | Optional | False | If set to True, will save the files to process into GCS. If set to False, will save the files locally |
+| ENTITY_EMBEDDING | Optional | False | If set to True, it will add embeddings for each entity in the database |
+| LLM_MODEL_CONFIG_azure_ai_<azure_deployment_name> | Optional | | Set azure config as - azure_deployment_name,azure_endpoint or base_url,azure_api_key,api_version |
+| LLM_MODEL_CONFIG_groq_<model_name> | Optional | | Set groq config as - model_name,base_url,groq_api_key |
+| LLM_MODEL_CONFIG_anthropic_<model_name> | Optional | | Set anthropic config as - model_name,anthropic_api_key |
+| LLM_MODEL_CONFIG_fireworks_<model_name> | Optional | | Set fireworks config as - model_name,fireworks_api_key |
+| LLM_MODEL_CONFIG_bedrock_<model_name> | Optional | | Set bedrock config as - model_name,aws_access_key_id,aws_secret__access_key,region_name |
+| LLM_MODEL_CONFIG_ollama_<model_name> | Optional | | Set ollama config as - model_name,model_local_url |
+
 
 

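The `CHAT_MODES` rule described in the README change above (an empty value enables every mode, a comma-separated list restricts them) can be sketched as follows. This is only an illustration: the diff does not show how the application reads the variable, so the helper name `parse_chat_modes`, the `DEFAULT_CHAT_MODES` constant, and the decision to reject unknown modes are all assumptions.

```python
# Hypothetical helper; not taken from the repository.
DEFAULT_CHAT_MODES = ["vector", "graph+vector", "graph"]


def parse_chat_modes(raw):
    """Return the enabled chat modes for a CHAT_MODES value.

    An unset or empty value enables every mode, as the README describes.
    """
    if raw is None or not raw.strip():
        return list(DEFAULT_CHAT_MODES)
    # Split on commas and ignore stray whitespace or empty items.
    modes = [mode.strip() for mode in raw.split(",") if mode.strip()]
    # Assumption: unknown modes are treated as a configuration error.
    unknown = [mode for mode in modes if mode not in DEFAULT_CHAT_MODES]
    if unknown:
        raise ValueError(f"Unknown chat mode(s): {', '.join(unknown)}")
    return modes
```

With this sketch, `parse_chat_modes("")` yields all three modes, while `parse_chat_modes("vector,graph+vector")` yields only the two listed.
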
backend/Dockerfile

+2-1
@@ -16,7 +16,8 @@ RUN apt-get update && \
 ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
 # Copy requirements file and install Python dependencies
 COPY requirements.txt /code/
-RUN pip install --no-cache-dir --upgrade -r requirements.txt
+# --no-cache-dir --upgrade
+RUN pip install -r requirements.txt
 # Copy application code
 COPY . /code
 # Set command

backend/example.env

+13-2
@@ -21,5 +21,16 @@ LANGCHAIN_PROJECT = ""
 LANGCHAIN_TRACING_V2 = ""
 LANGCHAIN_ENDPOINT = ""
 GCS_FILE_CACHE = "" #save the file into GCS or local, should be True or False
-NEO4J_USER_AGENT = ""
-ENABLE_USER_AGENT = ""
+NEO4J_USER_AGENT=""
+ENABLE_USER_AGENT = ""
+LLM_MODEL_CONFIG_model_version=""
+ENTITY_EMBEDDING="" # True or False
+#examples
+LLM_MODEL_CONFIG_azure_ai_gpt_35="azure_deployment_name,azure_endpoint or base_url,azure_api_key,api_version"
+LLM_MODEL_CONFIG_azure_ai_gpt_4o="gpt-4o,https://YOUR-ENDPOINT.openai.azure.com/,azure_api_key,api_version"
+LLM_MODEL_CONFIG_groq_llama3_70b="model_name,base_url,groq_api_key"
+LLM_MODEL_CONFIG_anthropic_claude_3_5_sonnet="model_name,anthropic_api_key"
+LLM_MODEL_CONFIG_fireworks_llama_v3_70b="model_name,fireworks_api_key"
+LLM_MODEL_CONFIG_bedrock_claude_3_5_sonnet="model_name,aws_access_key_id,aws_secret__access_key,region_name"
+LLM_MODEL_CONFIG_ollama_llama3="model_name,model_local_url"
+

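Each `LLM_MODEL_CONFIG_<provider>_<name>` entry above packs several settings into one comma-separated value. A minimal sketch of how such a value could be split into named fields is shown below. The field order is copied from the `example.env` entries (including the `aws_secret__access_key` spelling); the function name `parse_llm_model_config`, the `PROVIDER_FIELDS` table, and the error handling are hypothetical and are not taken from the backend code, which this diff does not show.

```python
# Field names per provider, in the order the example.env entries list them.
# The table itself is an illustration, not the repository's data structure.
PROVIDER_FIELDS = {
    "azure_ai": ["azure_deployment_name", "azure_endpoint", "azure_api_key", "api_version"],
    "groq": ["model_name", "base_url", "groq_api_key"],
    "anthropic": ["model_name", "anthropic_api_key"],
    "fireworks": ["model_name", "fireworks_api_key"],
    "bedrock": ["model_name", "aws_access_key_id", "aws_secret__access_key", "region_name"],
    "ollama": ["model_name", "model_local_url"],
}
PREFIX = "LLM_MODEL_CONFIG_"


def parse_llm_model_config(var_name, value):
    """Split one LLM_MODEL_CONFIG_* value into (provider, {field: value})."""
    if not var_name.startswith(PREFIX):
        raise ValueError(f"{var_name} is not an {PREFIX}* variable")
    suffix = var_name[len(PREFIX):]
    for provider, fields in PROVIDER_FIELDS.items():
        if suffix.startswith(provider + "_"):
            parts = [part.strip() for part in value.split(",")]
            if len(parts) != len(fields):
                raise ValueError(
                    f"{var_name}: expected {len(fields)} comma-separated values, got {len(parts)}"
                )
            return provider, dict(zip(fields, parts))
    raise ValueError(f"{var_name}: unrecognised provider")


# Usage with a made-up local URL (an assumption for illustration only).
provider, config = parse_llm_model_config(
    "LLM_MODEL_CONFIG_ollama_llama3", "llama3,http://localhost:11434"
)
```

Because the values are split on commas, this approach assumes that no individual setting (an endpoint URL or API key, for instance) contains a comma itself.
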
backend/requirements.txt

+49-43
@@ -8,8 +8,8 @@ asyncio==3.4.3
 attrs==23.2.0
 backoff==2.2.1
 beautifulsoup4==4.12.3
-boto3
-botocore
+boto3==1.34.140
+botocore==1.34.140
 cachetools==5.3.3
 certifi==2024.2.2
 cffi==1.16.0
@@ -28,8 +28,8 @@ docstring_parser==0.16
 effdet==0.4.1
 emoji==2.10.1
 exceptiongroup==1.2.0
-fastapi
-fastapi-health
+fastapi==0.111.0
+fastapi-health==0.4.0
 filelock==3.13.1
 filetype==1.2.0
 flatbuffers==23.5.26
@@ -38,24 +38,24 @@ frozenlist==1.4.1
 fsspec==2024.2.0
 google-api-core==2.18.0
 google-auth==2.29.0
-google_auth_oauthlib
-google-cloud-aiplatform
+google_auth_oauthlib==1.2.0
+google-cloud-aiplatform==1.58.0
 google-cloud-bigquery==3.19.0
 google-cloud-core==2.4.1
 google-cloud-resource-manager==1.12.3
-google-cloud-storage
+google-cloud-storage==2.17.0
 google-crc32c==1.5.0
 google-resumable-media==2.7.0
 googleapis-common-protos==1.63.0
 greenlet==3.0.3
 grpc-google-iam-v1==0.13.0
 grpcio==1.62.1
-google-ai-generativelanguage
+google-ai-generativelanguage==0.6.6
 grpcio-status==1.62.1
 h11==0.14.0
 httpcore==1.0.4
 httpx==0.27.0
-huggingface-hub==0.20.3
+huggingface-hub
 humanfriendly==10.0
 idna==3.6
 importlib-resources==6.1.1
@@ -67,21 +67,25 @@ joblib==1.3.2
 jsonpatch==1.33
 jsonpath-python==1.0.6
 jsonpointer==2.4
+json-repair==0.25.2
 kiwisolver==1.4.5
-langchain
-langchain-google-genai
-langchain-community
-langchain-core
-langchain-experimental
-langchain-google-vertexai
-langchain-groq
-langchain-openai
-langchain-text-splitters==0.0.1
+langchain==0.2.6
+langchain-aws==0.1.9
+langchain-anthropic==0.1.19
+langchain-fireworks==0.1.4
+langchain-google-genai==1.0.7
+langchain-community==0.2.6
+langchain-core==0.2.10
+langchain-experimental==0.0.62
+langchain-google-vertexai==1.0.6
+langchain-groq==0.1.6
+langchain-openai==0.1.14
+langchain-text-splitters==0.2.2
 langdetect==1.0.9
-langsmith==0.1.31
+langsmith==0.1.83
 layoutparser==0.3.4
-langserve
-langchain-cli
+langserve==0.2.2
+#langchain-cli==0.0.25
 lxml==5.1.0
 MarkupSafe==2.1.5
 marshmallow==3.20.2
@@ -94,9 +98,9 @@ networkx==3.2.1
 nltk==3.8.1
 numpy==1.26.4
 omegaconf==2.3.0
-onnx==1.15.0
-onnxruntime==1.15.1
-openai==1.14.2
+onnx==1.16.1
+onnxruntime==1.18.1
+openai==1.35.10
 opencv-python==4.8.0.76
 orjson==3.9.15
 packaging==23.2
@@ -110,15 +114,16 @@ pillow_heif==0.15.0
 portalocker==2.8.2
 proto-plus==1.23.0
 protobuf==4.23.4
+psutil==6.0.0
 pyasn1==0.6.0
 pyasn1_modules==0.4.0
 pycocotools==2.0.7
 pycparser==2.21
-pydantic==2.6.4
-pydantic_core==2.16.3
+pydantic==2.8.2
+pydantic_core==2.20.1
 pyparsing==3.0.9
 pypdf==4.0.1
-PyPDF2
+PyPDF2==3.0.1
 pypdfium2==4.27.0
 pytesseract==0.3.10
 python-dateutil==2.8.2
@@ -131,44 +136,45 @@ pytz==2024.1
 PyYAML==6.0.1
 rapidfuzz==3.6.1
 regex==2023.12.25
-requests
+requests==2.32.3
 rsa==4.9
 s3transfer==0.10.1
-safetensors==0.3.2
+safetensors==0.4.1
 scipy==1.10.1
 shapely==2.0.3
 six==1.16.0
 sniffio==1.3.1
 soupsieve==2.5
 SQLAlchemy==2.0.28
-starlette==0.36.3
-starlette-session
+starlette==0.37.2
+sse-starlette==2.1.2
+starlette-session==0.4.3
 sympy==1.12
 tabulate==0.9.0
 tenacity==8.2.3
-tiktoken==0.6.0
+tiktoken==0.7.0
 timm==0.9.12
-tokenizers==0.15.2
+tokenizers==0.19
 tqdm==4.66.2
-transformers==4.37.1
+transformers==4.42.3
 types-protobuf
 types-requests
 typing-inspect==0.9.0
 typing_extensions==4.9.0
 tzdata==2024.1
-unstructured
-unstructured-client
-unstructured-inference
-unstructured.pytesseract
-unstructured[all-docs]
-urllib3
-uvicorn
-gunicorn
+unstructured==0.14.9
+unstructured-client==0.23.8
+unstructured-inference==0.7.36
+unstructured.pytesseract==0.3.12
+unstructured[all-docs]==0.14.9
+urllib3==2.2.2
+uvicorn==0.30.1
+gunicorn==22.0.0
 wikipedia==1.4.0
 wrapt==1.16.0
 yarl==1.9.4
 youtube-transcript-api==0.6.2
 zipp==3.17.0
-sentence-transformers
+sentence-transformers==2.7.0
 google-cloud-logging==3.10.0
 PyMuPDF==1.24.5

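The requirements.txt change above pins most dependencies to exact versions, but a few entries (`huggingface-hub`, `types-protobuf`, `types-requests`) remain unpinned. A small check along the following lines could list such entries. It is a sketch, not part of the commit: the function name `find_unpinned` is hypothetical, and it treats only `==` as a pin, so lines using other specifiers or pip options such as `-r other.txt` would also be reported.

```python
def find_unpinned(requirements_text):
    """Return requirement lines that carry no exact '==' version pin."""
    unpinned = []
    for line in requirements_text.splitlines():
        # Drop comments (including fully commented-out requirements).
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# A few lines taken from the diff above.
sample = """\
boto3==1.34.140
huggingface-hub
#langchain-cli==0.0.25
types-protobuf
tokenizers==0.19
"""
print(find_unpinned(sample))  # ['huggingface-hub', 'types-protobuf']
```
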