Skip to content

Latest commit

 

History

History
280 lines (199 loc) · 11.1 KB

quick-start-with-docker.md

File metadata and controls

280 lines (199 loc) · 11.1 KB

Quick Start with Docker

To try PostgresML on your system for the first time, Docker is a great tool to get you started quicky. We've prepared a Docker image that comes with the latest version of PostgresML and all of its dependencies. If you have Nvidia GPUs on your machine, you'll also be able to use GPU acceleration.

!!! tip

If you're looking to get started with PostgresML as quickly as possible, sign up for our free serverless cloud. You'll get a database in seconds, and will be able to use all the latest Hugging Face models on modern GPUs.

!!!

Get Started

=== "macOS"

docker run \
    -it \
    -v postgresml_data:/var/lib/postgresql \
    -p 5433:5432 \
    -p 8000:8000 \
    ghcr.io/postgresml/postgresml:2.7.3 \
    sudo -u postgresml psql -d postgresml

=== "Linux with GPUs"

Make sure you have Cuda, the Cuda container toolkit, and matching graphics drivers installed. You can install everything from Nvidia.

On Ubuntu, you can install everything with:

sudo apt install -y \
    cuda \
    cuda-container-toolkit

To run the container with GPU capabilities:

docker run \
    -it \
    -v postgresml_data:/var/lib/postgresql \
    --gpus all \
    -p 5433:5432 \
    -p 8000:8000 \
    ghcr.io/postgresml/postgresml:2.7.3 \
    sudo -u postgresml psql -d postgresml

If your machine doesn't have a GPU, just omit the --gpus all option, and the container will start and use the CPU instead.

=== "Windows"

Install WSL and Docker Desktop. You can then use Linux with GPUs instructions. GPU support is included, make sure to enable CUDA.

===

Once the container is running, setting up PostgresML is as simple as creating the extension and running a few queries to make sure everything is working correctly.

!!! generic

!!! code_block time="41.520ms"

CREATE EXTENSION IF NOT EXISTS pgml;
SELECT pgml.version();

!!!

!!! results

postgresml=# CREATE EXTENSION IF NOT EXISTS pgml;
INFO:  Python version: 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
INFO:  Scikit-learn 1.2.2, XGBoost 1.7.5, LightGBM 3.3.5, NumPy 1.25.1
CREATE EXTENSION
Time: 41.520 ms

postgresml=# SELECT pgml.version();
 version 
---------
 2.7.3
(1 row)

!!!

!!!

You can continue using the command line, or connect to the container using any of the commonly used PostgreSQL tools like psql, pgAdmin, DBeaver, and others:

psql -h 127.0.0.1 -p 5433 -U postgresml

Workflows

PostgresML allows you to generate embeddings with open source models from Hugging Face, easily prompt LLMs with tasks like translation and text generation, and train classical machine learning models on tabular data.

Embeddings

To generate an embedding, all you have to do is use the pgml.embed(model_name, text) function with any open source model available on Hugging Face.

!!! example

!!! code_block time="51.907ms"

SELECT pgml.embed(
    'intfloat/e5-small',
    'passage: PostgresML is so easy!'
);

!!!

!!! results

postgres=# SELECT pgml.embed(
    'intfloat/e5-small',
    'passage: PostgresML is so easy!'
);

{0.02997742,-0.083322115,-0.074212186,0.016167048,0.09899471,-0.08137268,-0.030717574,0.03474584,-0.078880586,0.053087912,-0.027900297,-0.06316991,
 0.04218509,-0.05953648,0.028624319,-0.047688972,0.055339724,0.06451558,-0.022694778,0.029539965,-0.03861752,-0.03565117,0.06457901,0.016581751,
0.030634841,-0.026699776,-0.03840521,0.10052487,0.04131341,-0.036192447,0.036209006,-0.044945586,-0.053815156,0.060391728,-0.042378396,
 -0.008441956,-0.07911099,0.021774381,0.034313954,0.011788908,-0.08744744,-0.011105505,0.04577902,0.0045646844,-0.026846683,-0.03492123,0.068385094,
-0.057966642,-0.04777695,0.11460253,0.010138827,-0.0023120022,0.052329376,0.039127126,-0.100108854,-0.03925074,-0.0064703166,-0.078960024,-0.046833295,
0.04841002,0.029004619,-0.06588247,-0.012441916,0.001127402,-0.064730585,0.05566701,-0.08166461,0.08834854,-0.030919826,0.017261868,-0.031665307,
0.039764903,-0.0747297,-0.079097,-0.063424855,0.057243366,-0.025710078,0.033673875,0.050384883,-0.06700917,-0.020863676,0.001511638,-0.012377004,
-0.01928165,-0.0053149736,0.07477675,0.03526208,-0.033746846,-0.034142617,0.048519857,0.03142429,-0.009989936,-0.018366965,0.098441005,-0.060974542,
0.066505,-0.013180869,-0.067969725,0.06731659,-0.008099243,-0.010721313,0.06885249,-0.047483806,0.004565877,-0.03747329,-0.048288923,-0.021769432,
0.033546787,0.008165753,-0.0018901207,-0.05621888,0.025734955,-0.07408746,-0.053908117,-0.021819277,0.045596648,0.0586417,0.0057576317,-0.05601786,
-0.03452876,-0.049566686,-0.055589233,0.0056059696,0.034660816,0.018012922,-0.06444576,0.036400944,-0.064374834,-0.019948835,-0.09571418,0.09412033,-0.07085108,0.039256454,-0.030016104,-0.07527431,-0.019969895,-0.09996753,0.008969355,0.016372273,0.021206321,0.0041883467,0.032393526,0.04027315,-0.03194125,-0.03397957,-0.035261292,0.061776843,0.019698814,-0.01767779,0.018515844,-0.03544395,-0.08169962,-0.02272048,-0.0830616,-0.049991447,-0.04813149,-0.06792019,0.031181566,-0.04156394,-0.058702122,-0.060489867,0.0020844154,0.18472219,0.05215536,-0.038624488,-0.0029086764,0.08512023,0.08431501,-0.03901469,-0.05836445,0.118146114,-0.053862963,0.014351494,0.0151984785,0.06532256,-0.056947585,0.057420347,0.05119938,0.001644649,0.05911524,0.012656099,-0.00918104,-0.009667282,-0.037909098,0.028913427,-0.056370094,-0.06015602,-0.06306665,-0.030340875,-0.14780329,0.0502743,-0.039765555,0.00015358179,0.018831518,0.04897686,0.014638214,-0.08677867,-0.11336724,-0.03236903,-0.065230116,-0.018204475,0.022788873,0.026926292,-0.036414392,-0.053245157,-0.022078559,-0.01690316,-0.042608887,-0.000196666,-0.0018297597,-0.06743311,0.046494357,-0.013597083,-0.06582122,-0.065659754,-0.01980711,0.07082651,-0.020514658,-0.05147128,-0.012459332,0.07485931,0.037384395,-0.03292486,0.03519196,0.014782926,-0.011726298,0.016492695,-0.0141114695,0.08926231,-0.08323172,0.06442687,0.03452826,-0.015580203,0.009428933,0.06759306,0.024144053,0.055612188,-0.015218529,-0.027584016,0.1005267,-0.054801818,-0.008317948,-0.000781896,-0.0055441647,0.018137401,0.04845575,0.022881811,-0.0090647405,0.00068219384,-0.050285354,-0.05689162,0.015139549,0.03553917,-0.09011886,0.010577362,0.053231273,0.022833975,-3.470906e-05,-0.0027906548,-0.03973121,0.007263015,0.00042456342,0.07092535,-0.043497834,-0.0015815622,-0.03489149,0.050679605,0.03153052,0.037204932,-0.13364139,-0.011497628,-0.043809805,0.045094978,-0.037943177,0.0021411474,0.044974167,-0.05388966,0.03780391,0.033220228,-0.027566046,-0.043608706,0.021699436,-0.011780484,0.04654962,-0.04134961,0.00018980364,-0.0846228,-0.0055453447,0.057337128,0.08390022,-0.019327229,0.10235083,0.048388377,0.042193796,0.025521005,0.013201268,-0.0634062,-0.08712715,0.059367906,-0.007045281,0.0041695046,-0.08747506,-0.015170839,-0.07994115,0.06913491,0.06286314,0.030512255,0.0141608,0.046193067,0.0026272296,0.057590637,-0.06136263,0.069828056,-0.038925823,-0.076347575,0.08457048,0.076567,-0.06237806,0.06076619,0.05488552,-0.06070616,0.10767283,0.008605431,0.045823734,-0.0055780583,0.043272685,-0.05226901,0.035603754,0.04357865,-0.061862156,0.06919797,-0.00086810143,-0.006476894,-0.043467253,0.017243104,-0.08460669,0.07001912,0.025264058,0.048577853,-0.07994533,-0.06760861,-0.034988943,-0.024210323,-0.02578568,0.03488276,-0.0064449264,0.0345789,-0.0155197615,0.02356351,0.049044855,0.0497944,0.053986903,0.03198324,0.05944599,-0.027359396,-0.026340311,0.048312716,-0.023747599,0.041861262,0.017830249,0.0051145423,0.018402847,0.027941752,0.06337417,0.0026447168,-0.057954717,-0.037295196,0.03976777,0.057269543,0.09760822,-0.060166832,-0.039156828,0.05768707,0.020471212,0.013265894,-0.050758235,-0.020386606,0.08815887,-0.05172276,-0.040749934,0.01554588,-0.017021973,0.034403082,0.12543736}

!!!

!!!

Training an XGBoost model

Importing a dataset

PostgresML comes with a few built-in datasets. You can also import your own CSV files or data from other sources like BigQuery, S3, and other databases or files. For our example, let's import the digits dataset from Scikit:

!!! generic

!!! code_block time="47.532ms"

SELECT * FROM pgml.load_dataset('digits');

!!!

!!! results

postgres=# SELECT * FROM pgml.load_dataset('digits');
 table_name  | rows 
-------------+------
 pgml.digits | 1797
(1 row)

!!!

!!!

Training a model

The heart of PostgresML is its pgml.train() function. Using only that function, you can load the data from any table or view in the database, train any number of ML models on it, and deploy the best model to production.

!!! generic

!!! code_block time="222.206ms"

SELECT * FROM pgml.train(
    project_name => 'My First PostgresML Project',
    task => 'classification',
    relation_name => 'pgml.digits',
    y_column_name => 'target',
    algorithm => 'xgboost',
    hyperparams => '{
        "n_estimators": 25
    }'
);

!!!

!!! results

postgres=# SELECT * FROM pgml.train(
    project_name => 'My First PostgresML Project',
    task => 'classification',
    relation_name => 'pgml.digits',
    y_column_name => 'target',
    algorithm => 'xgboost',
    hyperparams => '{
        "n_estimators": 25
    }'
);

[...]

INFO:  Metrics: {
    "f1": 0.88244045,
    "precision": 0.8835865,
    "recall": 0.88687027,
    "accuracy": 0.8841871,
    "mcc": 0.87189955,
    "fit_time": 0.7631203,
    "score_time": 0.007338208
}
INFO:  Deploying model id: 1
           project           |      task      | algorithm | deployed 
-----------------------------+----------------+-----------+----------
 My First PostgresML Project | classification | xgboost   | t
(1 row)

!!!

!!!

Making predictions

After training a model, you can use it to make predictions. PostgresML provides a pgml.predict(project_name, features) function which makes real time predictions using the best deployed model for the given project:

!!! generic

!!! code_block time="8.676ms"

SELECT 
    target,
    pgml.predict('My First PostgresML Project', image) AS prediction
FROM pgml.digits
LIMIT 5;

!!!

!!! results

 target | prediction 
--------+------------
      0 |          0
      1 |          1
      2 |          2
      3 |          3
      4 |          4

!!!

!!!

Automation of common ML tasks

The following common machine learning tasks are performed automatically by PostgresML:

  1. Snapshot the data so the experiment is reproducible
  2. Split the dataset into train and test sets
  3. Train and validate the model
  4. Save it into the model store (a Postgres table)
  5. Load it and cache it during inference

Check out our Training and Predictions documentation for more details. Some more advanced topics like hyperparameter search and GPU acceleration are available as well.

Dashboard

The Dashboard app is running on localhost:8000. You can use it to write experiments in Jupyter-style notebooks, manage projects, and visualize datasets used by PostgresML.