Guide/s3 integration with meltano #2206

Open · wants to merge 3 commits into base: main
3 changes: 1 addition & 2 deletions docs/generalTemplates/_airbyte_s3_destination_setup.md
@@ -23,7 +23,6 @@ import TabItem from "@theme/TabItem"

5. Click `Test and save` and wait for Airbyte to confirm the Destination is set up correctly.


</TabItem>

<TabItem value="terraform" label="Terraform">
@@ -71,4 +70,4 @@ variable "workspace_id" {

</TabItem>

</Tabs>
107 changes: 107 additions & 0 deletions docs/generalTemplates/_generic_data_model_setup.md
@@ -0,0 +1,107 @@
import Tabs from "@theme/Tabs"
import TabItem from "@theme/TabItem"

## Data model setup

### Figure out the target schema and mapping

To define the data model, you will need to know the schema of the data you want to ingest.
If you are unsure about the schema that the connector extracts, you can always set up the connection, and during the **stream selection** step, review the expected schema.

Alternatively, you can set up the connection and start the sync, then download the extracted files from S3, review them, and construct the appropriate blueprints and mappings.

:::tip Important
If you set up a connection to S3 before setting up the target blueprints and mappings, you will have to execute a "resync" after the resources in Port have been properly set up.
:::

**To download the extracted S3 files:**

<Tabs groupId="Download files extracted to S3" queryString values={[{label: "AWS CLI", value: "aws_cli"},{label: "Python (Boto3)", value: "python_boto3"}]}>

<TabItem value="aws_cli" label="AWS CLI">

1. Install AWS CLI:
Download and install the AWS CLI from [AWS’s official page](https://aws.amazon.com/cli/).

2. Configure Your Credentials:
Run the command below and input your `ACCESS_KEY`, `SECRET_KEY`, and `region`:

```shell showLineNumbers
aws configure
```

Alternatively, you can set the environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION`.

3. Download Files from S3:
Use the following command, replacing the placeholders with your bucket name and file prefix:

```shell showLineNumbers
aws s3 cp s3://<bucket-name>/<file-prefix> ./local-folder --recursive
```

For example:

```shell showLineNumbers
aws s3 cp s3://org-XXX/data/abc123/ ./my_extracted_data --recursive
```

This command copies every object whose key starts with the specified prefix into the local folder (the CLI creates it if needed).
</TabItem>
<TabItem value="python_boto3" label="Python (Boto3)">

Run the following command to install boto3 if you haven’t already:

```shell showLineNumbers
pip install boto3
```

Copy and paste this code into a file (e.g., `download_s3.py`), replacing the placeholders with your actual details:

```python showLineNumbers
import boto3

# Initialize the S3 client with your credentials and region
s3 = boto3.client(
    's3',
    aws_access_key_id='YOUR_ACCESS_KEY_ID',
    aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
    region_name='YOUR_REGION'
)

bucket_name = 'your-bucket-name'
prefix = 'your/file/prefix/'  # Ensure this ends with '/' if you want folder-like behavior

# List objects within the specified prefix
response = s3.list_objects_v2(Bucket=bucket_name, Prefix=prefix)

# Download each file found
for obj in response.get('Contents', []):
    key = obj['Key']
    # Define a local filename (you might want to recreate the directory structure)
    local_filename = key.split('/')[-1]
    print(f"Downloading {key} to {local_filename}...")
    s3.download_file(bucket_name, key, local_filename)
```

Execute your script from the terminal:

```shell showLineNumbers
python download_s3.py
```

</TabItem>

</Tabs>
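Note that the Boto3 example above flattens all keys into a single folder, so objects in different subdirectories that share a basename will overwrite each other, and `list_objects_v2` returns at most 1,000 keys per call. A minimal sketch (using the same placeholder bucket and prefix conventions) that preserves the directory structure and paginates through larger listings:

```python
from pathlib import Path

def local_path_for_key(key: str, prefix: str, dest_dir: str) -> Path:
    """Map an S3 key to a local path that mirrors the layout under the prefix."""
    relative = key[len(prefix):] if key.startswith(prefix) else key
    return Path(dest_dir) / relative

def download_prefix(bucket: str, prefix: str, dest_dir: str) -> None:
    """Download every object under `prefix`, recreating subdirectories locally."""
    import boto3  # imported here so the path helper works without boto3 installed
    s3 = boto3.client("s3")  # credentials resolved from env vars / `aws configure`
    paginator = s3.get_paginator("list_objects_v2")  # pages past the 1,000-key limit
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            target = local_path_for_key(obj["Key"], prefix, dest_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, obj["Key"], str(target))
```

For example, `download_prefix("org-XXX", "data/abc123/", "my_extracted_data")` would mirror everything under that prefix into `my_extracted_data/`.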

Once the files are on your local device, you can review their contents with your preferred text editor and
construct the appropriate blueprints and mappings for your data.

### Create blueprints

Once you have decided on the desired blueprints you wish to set up, you can refer to the [blueprint creation docs](https://docs.port.io/build-your-software-catalog/customize-integrations/configure-data-model/setup-blueprint/?definition=ui) to set them up in your account.
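Blueprints can also be created programmatically against Port's API instead of through the UI. The sketch below is illustrative only: the `repository` identifier and its properties are assumptions, so shape yours to match the data you reviewed in S3 (the `create_blueprint` helper is defined but not invoked, since it requires a valid API token):

```python
import json

def build_repository_blueprint() -> dict:
    """Illustrative blueprint payload; identifier and properties are assumptions."""
    return {
        "identifier": "repository",  # hypothetical identifier; match it to your data
        "title": "Repository",
        "schema": {
            "properties": {
                "url": {"type": "string", "format": "url", "title": "URL"},
                "language": {"type": "string", "title": "Language"},
            },
            "required": [],
        },
    }

def create_blueprint(token: str, payload: dict) -> None:
    """POST the blueprint to Port's API (not called here; needs a valid token)."""
    import requests  # third-party: pip install requests
    resp = requests.post(
        "https://api.getport.io/v1/blueprints",
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
    )
    resp.raise_for_status()

payload = build_repository_blueprint()
print(json.dumps(payload, indent=2))
```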

### Create webhook integration

Once you have decided on the mappings you wish to set up, you can refer to the [webhook creation docs](https://docs.port.io/build-your-software-catalog/custom-integration/webhook/) to set them up in your portal.

:::tip Important
Make sure to use the generated webhook URL when setting up the connection; otherwise, the data will not be automatically ingested into Port from S3.
:::
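Before writing the webhook's mapping, it can help to prototype the record-to-entity transformation locally. The sketch below is not the webhook mapping syntax itself (the mapping configuration is covered in the webhook docs); it only illustrates, with assumed field names (`id`, `name`, `language`), how one extracted S3 record might translate into a Port-style entity:

```python
def record_to_entity(record: dict) -> dict:
    """Translate one extracted record into a Port-style entity shape.
    The field names ('id', 'name', 'language') are assumptions about your data."""
    return {
        "identifier": str(record["id"]),
        "title": record.get("name") or str(record["id"]),
        "properties": {"language": record.get("language")},
    }

sample = {"id": 42, "name": "checkout-service", "language": "Python"}
entity = record_to_entity(sample)
```

Working this logic out on a sample file first makes it much easier to express the same mapping in the webhook configuration.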
39 changes: 39 additions & 0 deletions docs/generalTemplates/_meltano_prerequisites.md
@@ -0,0 +1,39 @@
import Tabs from "@theme/Tabs"
import TabItem from "@theme/TabItem"

- Ensure you have a Port account and have completed the [onboarding process](https://docs.port.io/quickstart).

- This feature is part of Port's limited-access offering. To obtain the required S3 bucket, please contact our team directly via chat, [Slack](https://www.getport.io/community), or [e-mail](mailto:support@getport.io), and we will create and manage the bucket on your behalf.

- Access to an available Meltano app. For reference, follow the [quick start guide](https://docs.meltano.com/getting-started/installation), or use the following steps:

<Tabs groupId="Install Meltano" queryString values={[{label: "shell", value: "shell"}]}>
<TabItem value="shell" label="shell">

1. Install Python 3 (the example below uses Homebrew on macOS):

```shell
brew install python3
```

2. Create and activate a Python virtual environment:

```shell
python -m venv .venv
source .venv/bin/activate
```

3. Install Meltano:

```shell
pip install meltano
```

4. Create a Meltano project (if you don't have one yet) and change into its directory:

```shell
meltano init <name_of_project>
cd <name_of_project>
```

</TabItem>
</Tabs>
43 changes: 43 additions & 0 deletions docs/generalTemplates/_meltano_s3_destination_setup.md
@@ -0,0 +1,43 @@
import Tabs from "@theme/Tabs"
import TabItem from "@theme/TabItem"

Meltano provides detailed [documentation](https://hub.meltano.com/loaders/target-s3) on how to generate/receive the appropriate credentials for the target-s3 loader.
Once the appropriate credentials are prepared, you can set up the Meltano loader:

<Tabs groupId="Install Meltano S3 Loader" queryString values={[{label: "shell", value: "shell"}]}>
<TabItem value="shell" label="shell">

1. Navigate to your Meltano project:

```shell showLineNumbers
cd path/to/your/meltano/project/
```

2. Install the `target-s3` loader plugin:

```shell
meltano add loader target-s3
```

3. Configure the plugin using the interactive CLI prompt:

```shell
meltano config target-s3 set --interactive
```

Or set the configuration parameters individually using the CLI:

```shell
# required
meltano config target-s3 set cloud_provider.aws.aws_access_key_id $AWS_ACCESS_KEY_ID
meltano config target-s3 set cloud_provider.aws.aws_secret_access_key $AWS_SECRET_ACCESS_KEY
meltano config target-s3 set cloud_provider.aws.aws_bucket $AWS_BUCKET
meltano config target-s3 set cloud_provider.aws.aws_region $AWS_REGION
# recommended
meltano config target-s3 set append_date_to_filename_grain microsecond
meltano config target-s3 set partition_name_enabled true
meltano config target-s3 set prefix 'data/'
```

</TabItem>
</Tabs>
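With the loader configured, a sync pairs an extractor with `target-s3` via `meltano run`. The sketch below simply builds that CLI invocation; `tap-github` is a hypothetical extractor name standing in for whichever source plugin you installed, and the actual subprocess call is left commented out since it only works inside a Meltano project:

```python
import subprocess

def pipeline_command(extractor: str, loader: str = "target-s3") -> list:
    """Build the `meltano run` invocation for an extractor/loader pair."""
    return ["meltano", "run", extractor, loader]

cmd = pipeline_command("tap-github")  # `tap-github` is a hypothetical extractor
# Uncomment to actually run the sync from inside your Meltano project:
# subprocess.run(cmd, check=True)
```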
2 changes: 1 addition & 1 deletion docs/generalTemplates/_s3_integrations_disclaimer.md
@@ -6,4 +6,4 @@ mechanism to remove it from Port. The record simply won’t appear in future syn

If the data includes a flag for deleted records (e.g., is_deleted: "true"), you can configure a webhook delete operation
in your [webhook’s mapping configuration](/build-your-software-catalog/custom-integration/webhook/#configuration-structure) to remove these records from Port automatically.
:::
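The decision that such a delete operation encodes can be sketched locally. This is an illustration of the logic only, not the webhook configuration syntax; the `is_deleted` field name follows the example above:

```python
def webhook_operation(record: dict) -> str:
    """Return the operation a mapping would apply to this record."""
    # Records flagged as deleted should be removed from Port; everything else
    # is created or updated as usual.
    if str(record.get("is_deleted", "")).lower() == "true":
        return "delete"
    return "upsert"
```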